Attribute columns

This page discusses what an attribute column is, when one should be used, and how one can be defined.

For a detailed look at an attribute column's configuration options, see the Reference table at the bottom of this page.

What is an attribute column?

Attribute columns describe or provide additional context to the data set's observed values. They do not identify a particular sub-set of the data set's population. If the column you are configuring does identify such a sub-set of the population, then it is probably a dimension.

When to use one

Unlike dimensions, observations, measures and units, attributes are optional components of a data cube. Their primary use is to qualify observed values by providing additional context about individual data points. Two common examples of attributes are observation status and confidence intervals.

| Year | Location | Number of Arthur's Bakes | Average customer spend | Status | 95% CI (lower bound) | 95% CI (upper bound) |
| --- | --- | --- | --- | --- | --- | --- |
| 2022 | London | 35 | 7.85 | Provisional | 6.54 | 8.06 |

In the table above, there are three attribute columns: Status, 95% CI (lower bound) and 95% CI (upper bound).

The Status attribute column indicates whether the observed value is Provisional or Final, and applies to all of the observed values in this data set.

The 95% CI (lower bound) and 95% CI (upper bound) attribute columns contain the lower and upper bounds of the 95% confidence interval for the Average customer spend observed values.

Resources vs Literals

The configuration of attribute columns in your data set will depend primarily on whether you choose to represent the attribute as a Resource or a Literal.

Resource attributes are suitable for categorical values which can be reused as linked data. Given that the goal of the csvcubed project is to simplify the process of creating 5-star linked data from CSV files, using Resource attributes in your data cube where appropriate is encouraged.

Observation status is a good example of a Resource attribute, since its values are drawn from a small set of categories (Provisional, Final, etc.) which describe the observed value.

See the Resource attributes page for more information on how to configure these columns.
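
As a minimal sketch, the fragment below marks the Status column from the table above as a Resource attribute. It relies only on the values field described in the Reference table, set to true so that csvcubed generates the attribute values from the column's contents. The rest of the configuration file is omitted, and the Resource attributes page remains the authoritative guide.

```json
{
   "columns": {
      "Status": {
         "type": "attribute",
         "values": true
      }
   }
}
```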

Literal attributes are simple values and are not linked data. You should only use Literal attributes when your attribute values are not categorical.

Confidence intervals are a good example of when the use of Literal attributes is appropriate, as the attribute values are numeric (i.e. not categorical), and each value is unique to the observed value it qualifies.

See the Literal attributes page for more information on how to configure these columns.
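
As a minimal sketch, the fragment below marks the two confidence interval columns from the table above as Literal attributes by giving each a data_type, as described in the Reference table. The choice of decimal is an assumption made here to suit the numeric bounds. The rest of the configuration file is omitted, and the Literal attributes page remains the authoritative guide.

```json
{
   "columns": {
      "95% CI (lower bound)": {
         "type": "attribute",
         "data_type": "decimal"
      },
      "95% CI (upper bound)": {
         "type": "attribute",
         "data_type": "decimal"
      }
   }
}
```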

Basic configuration

As mentioned above, the configuration of attribute columns will depend on whether you choose to represent them as Resources or Literals. Please refer to the Resource attributes and Literal attributes pages for further information.

Describing observations

In a pivoted shape data set with multiple observation value columns, attributes must be explicitly associated with the observed values they qualify. In the example below, there are two attribute columns, Number of Stores Status and Revenue Status, which qualify the Number of Arthur's Bakes and Revenue columns respectively.

| Year | Location | Number of Arthur's Bakes | Number of Stores Status | Revenue | Revenue Status |
| --- | --- | --- | --- | --- | --- |
| 2022 | London | 35 | Provisional | 25 | Provisional |
| 2021 | Cardiff | 26 | Final | 18 | Final |

This can be configured as follows:

{ ...
   "columns": {
      "Number of Stores Status": {
         "type": "attribute",
         "describes_observations": "Number of Arthur's Bakes"
      },
      "Revenue Status": {
         "type": "attribute",
         "describes_observations": "Revenue"
      }
   }
}

The describes_observations field has been used to associate each attribute with the observed values it qualifies. The formatting of the fields' values (in this case, Number of Arthur's Bakes and Revenue) must match the relevant column titles exactly in order for csvcubed to recognise the association.

Reference

The table below lists the fields that can be set when configuring an attribute column.

| field name | description | default value |
| --- | --- | --- |
| type | The type of the column; to configure an attribute column use the value attribute (Required) | dimension |
| label | The title of the column (Optional) | The capital case of the header in the CSV file with spaces replacing underscores |
| description | A description of the contents of the column (Optional) | none |
| definition_uri | A URI of a resource to show how the column is created/managed (e.g. a URI of a PDF explaining a list of attribute values) (Optional) | none |
| required | If this boolean value is true csvcubed will flag to the user if there are blank values in this column (Optional) | false |
| data_type | (Literal attributes only) The XML data type of the contents of the column. If this is provided it becomes a Literal attribute column (Optional) | none |
| values | (New Resource Attributes only) If automatically-generated attributes are desired, a boolean value of true is used to signify to csvcubed to create Resource attributes from values in this column; otherwise this should be a list of attribute value objects defining the attributes used in the column. See Attribute values configuration for more details. (Optional) | none |
| from_template | (New/Existing Resource Attributes only) Use a column template (Optional) | none |
| from_existing | The URI of the resource for reuse/extension (Optional) | none |
| describes_observations | Associates this attribute with the relevant observation values. This is only necessary for pivoted shape data sets with multiple observation value columns. (Optional) | none |
| cell_uri_template | (Existing Resource Attributes only) Used to define a template to map the cell values in this column to URIs (Optional) | none |
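
To show how several of these fields might be combined, here is an illustrative sketch for the Revenue Status column from the pivoted example earlier on this page. The label and description text are invented for illustration, required is set to true so that blank cells are flagged, and the rest of the configuration file is omitted.

```json
{
   "columns": {
      "Revenue Status": {
         "type": "attribute",
         "label": "Revenue status",
         "description": "Whether the revenue figure is provisional or final",
         "required": true,
         "values": true,
         "describes_observations": "Revenue"
      }
   }
}
```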