Attribute columns

This page discusses what an attribute column is, where one should be used, and how one can be defined.

For a detailed look at an attribute column's configuration options, see the Reference table at the bottom of this page.

What is an attribute column?

Attribute columns describe, or provide additional context to, the data set's observed values. They do not identify a particular subset of the data set's population. If the column you are configuring does identify such a subset of the population, then it is probably a dimension.

When to use one

Unlike dimensions, observations, measures and units, attributes are optional components of a data cube. Their primary use is to qualify observed values by providing additional context about individual data points. Two common examples of attributes are observation status and confidence intervals.

Year	Location	Number of Arthur's Bakes	Average customer spend	Status	95% CI (lower bound)	95% CI (upper bound)
2022	London	35	7.85	Provisional	6.54	8.06

In the table above, there are three attribute columns: Status, 95% CI (lower bound) and 95% CI (upper bound).

The Status attribute column indicates whether the observed value is Provisional or Final, and applies to all of the observed values in this data set.

The 95% CI (lower bound) and 95% CI (upper bound) attribute columns contain the lower and upper bounds of the 95% confidence interval for the Average customer spend observed values.

Resources vs Literals

The configuration of attribute columns in your data set will depend primarily on whether you choose to represent the attribute as a Resource or a Literal.

Resource attributes are suitable for categorical values that can be reused as linked data. Since the goal of the csvcubed project is to simplify the process of creating 5-star linked data from CSV files, we encourage you to use Resource attributes in your data cube wherever they are appropriate.

Observation status is a good example of a Resource attribute, since its values come from a fixed set of categories (Provisional, Final, etc.) that describe the observed value.
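As a rough sketch, a Status column like the one in the table above might be configured as a new Resource attribute by setting `values` to `true`, which, according to the Reference table, asks csvcubed to create the attribute values from the contents of the column. Only the `columns` section is shown and other top-level fields are omitted. Because the example table has two observation columns, a real configuration would probably also need `describes_observations` (see Describing observations); treat this as a starting point and check the Resource attributes page for the full set of options.

```json
{
   "columns": {
      "Status": {
         "type": "attribute",
         "values": true
      }
   }
}
```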

See the Resource attributes page for more information on how to configure these columns.

Literal attributes are simple values and are not linked data. You should only use Literal attributes when your attribute values are not categorical.

Confidence intervals are a good example of when the use of Literal attributes is appropriate, as the attribute values are numeric (i.e. not categorical), and each value is unique to the observed value it qualifies.
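A minimal sketch of the two confidence interval columns from the earlier table, configured as Literal attributes, is shown below. Supplying `data_type` is what makes each column a Literal attribute. The type name `decimal` (the XML Schema `xsd:decimal` type) is an assumption about the accepted values; confirm it on the Literal attributes page. `describes_observations` is included because the table has two observation columns and the interval bounds qualify only Average customer spend.

```json
{
   "columns": {
      "95% CI (lower bound)": {
         "type": "attribute",
         "data_type": "decimal",
         "describes_observations": "Average customer spend"
      },
      "95% CI (upper bound)": {
         "type": "attribute",
         "data_type": "decimal",
         "describes_observations": "Average customer spend"
      }
   }
}
```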

See the Literal attributes page for more information on how to configure these columns.

Basic configuration

As mentioned above, the configuration of attribute columns will depend on whether you choose to represent them as Resources or Literals. Please refer to the Resource attributes and Literal attributes pages for further information.
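The one field shared by both kinds of attribute is `type`. According to the Reference table it defaults to `dimension`, so it must be set to `attribute` explicitly. The fragment below sketches only that common starting point, using the Status column name from the earlier example. Whether csvcubed accepts a column with only `type` set, and how it would then interpret the column, is not covered on this page, so add the Resource- or Literal-specific fields described on those pages.

```json
{
   "columns": {
      "Status": {
         "type": "attribute"
      }
   }
}
```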

Describing observations

In a pivoted shape data set with multiple observation columns, attributes must be explicitly associated with the observed values they qualify. In the example below, there are two attribute columns, Number of Stores Status and Revenue Status, which qualify the Number of Arthur's Bakes and Revenue columns, respectively.

Year	Location	Number of Arthur's Bakes	Number of Stores Status	Revenue	Revenue Status
2022	London	35	Provisional	25	Provisional
2021	Cardiff	26	Final	18	Final

This can be configured as follows:

{ ...
   "columns": {
      "Number of Stores Status": {
         "type": "attribute",
         "describes_observations": "Number of Arthur's Bakes"
      },
      "Revenue Status": {
         "type": "attribute",
         "describes_observations": "Revenue"
      }
   }
}

The describes_observations field associates each attribute with the observed values it qualifies. Its value (in this case, Number of Arthur's Bakes or Revenue) must exactly match the title of the relevant observation column for csvcubed to recognise the association.

Reference

The table below lists the fields that can be used when configuring an attribute column.

field name	description	default value
`type`	The type of the column; to configure an attribute column use the value `attribute` (Required)	dimension
`label`	The title of the column (Optional)	The capital case of the header in the CSV file with spaces replacing underscores
`description`	A description of the contents of the column (Optional)	none
`definition_uri`	A URI of a resource to show how the column is created/managed (e.g. a URI of a PDF explaining a list of attribute values) (Optional)	none
`required`	If this boolean value is true, csvcubed will flag to the user any blank values in this column (Optional)	false
`data_type`	(Literal attributes only) The XML Schema data type of the contents of the column; providing this field makes the column a Literal attribute column (Optional)	none
`values`	(New Resource attributes only) Set to `true` to have csvcubed automatically create Resource attribute values from the values in this column; otherwise, provide a list of attribute value objects defining the attribute values used in the column. See Attribute values configuration for more details. (Optional)	none
`from_template`	(New/existing Resource attributes only) The column template to use (Optional)	none
`from_existing`	The URI of an existing attribute to reuse or extend (Optional)	none
`describes_observations`	Associates this attribute with the observed values it qualifies; the value must exactly match the title of the relevant observation value column. This is only necessary for pivoted shape data sets with multiple observation value columns. (Optional)	none
`cell_uri_template`	(Existing Resource attributes only) A template used to map the cell values in this column to URIs (Optional)	none
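To show how several of these fields combine, here is a hedged sketch for the Number of Stores Status column from the Describing observations example, configured as a new Resource attribute with an explicit list of values instead of `"values": true`. The shape of each attribute value object (a single `label` field) is an assumption; see Attribute values configuration for the fields that are actually supported.

```json
{
   "columns": {
      "Number of Stores Status": {
         "type": "attribute",
         "label": "Number of Stores Status",
         "description": "Whether the Number of Arthur's Bakes value is Provisional or Final",
         "required": true,
         "describes_observations": "Number of Arthur's Bakes",
         "values": [
            { "label": "Provisional" },
            { "label": "Final" }
         ]
      }
   }
}
```

With `required` set to `true`, csvcubed would flag any blank cells in this column, and `describes_observations` ties the status values to the Number of Arthur's Bakes column rather than to Revenue.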