Dimension columns
This page discusses what a dimension column is, when one should be used, and how one can be defined.
For a detailed look at a dimension column's configuration options, see the Reference table at the bottom of this page.
What is a dimension column?
A dimension column identifies the observations in a data set. In order to be valid, a data cube must include at least one dimension column; however, in practice, it is likely that your data set will contain more than one dimension.
A combined set of dimension values (including the measure) uniquely identifies each observation in the data set. More specifically, the combined dimension values identify the sub-set of the population to which the observed value applies.
Examples of dimensions include the time period to which the observation applies, or the geographic region which the observation covers, as demonstrated with the `Year` and `Region` columns in the table below:
Year | Region | Value |
---|---|---|
2020 | England | 10.6 |
2021 | Scotland | 13.8 |
2022 | Wales | 9.43 |
When to use one
Dimensions are the fundamental building blocks of your data set, so your data set must always include at least one dimension.
If a column groups or identifies a sub-set of the population that your cube describes, then it is a dimension. Care should be taken when deciding whether a column represents a dimension or an attribute. Attributes describe the observed value and should not identify a sub-set of your cube's population.
Referring to the table above, `Year` and `Region` are the dimensions that partition the population into sub-sets. That is, `Year` and `Region` respectively identify the time period and geographic area to which the observed value relates.
Basic configuration
If you do not provide a column configuration in your `qube-config.json` file for a column, then it will be configured by convention. This means that your column will be treated as a dimension by default unless it has a reserved name.
If you provide a column mapping for your column, but you do not specify the `type` field, then csvcubed will automatically assume that your column is a dimension. It is also possible to explicitly set the column as a dimension by setting the column's `type` field to `dimension`. The following examples are therefore equivalent:
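As a minimal sketch of the two equivalent forms, the fragments below assume that column definitions sit under a top-level `columns` object keyed by column title, and that other top-level fields can be omitted for brevity. The first fragment sets `type` explicitly:

```json
{
  "columns": {
    "Year": {
      "type": "dimension"
    },
    "Region": {
      "type": "dimension"
    }
  }
}
```

The second provides a column mapping but leaves `type` out, so each column should be assumed to be a dimension. It also assumes that an empty mapping object is accepted:

```json
{
  "columns": {
    "Year": {},
    "Region": {}
  }
}
```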
This minimal definition results in:
- the `label` field defaulting to the column titles (`Year` and `Region` in this example);
- a `code_list` being automatically generated for each column, containing the column's unique values.
Label, description and definition
Additional details can be associated with the dimensions in your data set through the `label`, `description` and `definition_uri` fields.
As mentioned above, the `label` field will default to the column title unless explicitly configured in the `qube-config.json` file. In the example below, the `Region` label is amended to `Geographic region`:
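A sketch of such a configuration might look like the following. It assumes that column definitions sit under a top-level `columns` object, and it omits other top-level fields:

```json
{
  "columns": {
    "Region": {
      "type": "dimension",
      "label": "Geographic region"
    }
  }
}
```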
The `description` field can be used to provide a longer description of your dimension. If you want to provide information about your methodology, the `description` field is the preferred place for this.
The `definition_uri` field allows you to refer to external human readable resources that further define a dimension's values:
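The fragment below is an illustrative sketch that combines `description` and `definition_uri`. The description text and the `example.org` URI are hypothetical placeholders, and the top-level `columns` wrapper is an assumption about the file layout:

```json
{
  "columns": {
    "Region": {
      "type": "dimension",
      "label": "Geographic region",
      "description": "The geographic region to which the observed value applies.",
      "definition_uri": "https://example.org/definitions/geographic-region"
    }
  }
}
```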
Code list configuration
Code list - A predefined set of codified concepts which represent the distinct values that a dimension can hold.
One of the key principles of linked data is to connect data from different sources by reusing common definitions. Code lists are an important part of your data set where using linked data can make comparability with other cubes easy.
By default, csvcubed will generate code lists for each of the dimensions in your data set. However, there are several configuration options for refining how your code lists are generated and expressed. These are briefly described below - full details can be found on the Code list configuration page.
Link to an externally-defined code list (URI)
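As a hedged sketch, linking to an external code list could be expressed by setting `code_list` to a URI. The `example.org` URI below is a hypothetical placeholder, and the `columns` wrapper is an assumption about the file layout:

```json
{
  "columns": {
    "Region": {
      "type": "dimension",
      "code_list": "https://example.org/code-lists/regions"
    }
  }
}
```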
Use a locally-defined code-list-config.json
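A sketch of this option sets `code_list` to a file path. The file name `regions-code-list-config.json` is hypothetical, as is the assumption that it sits alongside the `qube-config.json` file:

```json
{
  "columns": {
    "Region": {
      "type": "dimension",
      "code_list": "regions-code-list-config.json"
    }
  }
}
```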
Define an in-line code list
Suppress a code list
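Setting `code_list` to `false` suppresses code list generation for the column. A minimal sketch, assuming the same top-level `columns` layout as elsewhere on this page:

```json
{
  "columns": {
    "Region": {
      "type": "dimension",
      "code_list": false
    }
  }
}
```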
Dimension column templates
The `Region` column could also be configured by using a column template - doing so means that the `type`, `from_existing`, `label` and `cell_uri_template` fields will be automatically populated based on the `statistical-geography` template.
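A sketch of a template-based configuration, using the `from_template` field listed in the Reference table. The top-level `columns` wrapper is an assumption about the file layout:

```json
{
  "columns": {
    "Region": {
      "from_template": "statistical-geography"
    }
  }
}
```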
Inheritance
To reuse or extend an existing dimension, the `from_existing` field can be configured to link to a URI where the dimension to be reused or extended is defined.
To reuse a parent dimension without making any changes to it, set the `from_existing` field to the URI defining the dimension to be reused:
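For instance, a sketch that reuses the SDMX `refArea` dimension unchanged might look like this. The `columns` wrapper is an assumption about the file layout:

```json
{
  "columns": {
    "Region": {
      "type": "dimension",
      "from_existing": "http://purl.org/linked-data/sdmx/2009/dimension#refArea"
    }
  }
}
```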
To extend a parent dimension and create a new dimension from it, set the `from_existing` field to the URI defining the dimension to be reused, and set the `label` field to indicate that this is a new child dimension of `http://purl.org/linked-data/sdmx/2009/dimension#refArea`:
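A sketch of the extending form follows. The label text `Geographic region` is an illustrative choice, and the `columns` wrapper is an assumption about the file layout:

```json
{
  "columns": {
    "Region": {
      "type": "dimension",
      "from_existing": "http://purl.org/linked-data/sdmx/2009/dimension#refArea",
      "label": "Geographic region"
    }
  }
}
```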
Advanced configuration
Cell URI templates
Warning
The use of the `cell_uri_template` field is considered an advanced configuration option, and therefore care must be taken to ensure that the values generated are valid.
The `Song` and `Language` columns have both been configured with a `cell_uri_template` field. It is important to note that this field should only be used where the concept scheme is defined externally at an existing URI, or there is no concept scheme, but you want to point to an existing resource to provide additional context about the dimension's value.
If `cell_uri_template` is specified:
- Either `from_existing` must also be defined, in which case `cell_uri_template` should refer to the concepts in the existing dimension's code list;
- Or `code_list` must be set as `false`, in which case `cell_uri_template` should refer to URIs which are existing RDF resources.
The format of the `cell_uri_template` value must follow RFC 6570 guidance for URI Templates. Note that the only variable which can be used in a `cell_uri_template` references the column itself; the name of the variable can be calculated by applying the CSV column name safe transformation to the CSV column title.
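The sketch below illustrates the second case described above, where `code_list` is set to `false` and the template points at existing resources. The `example.org` URI is a hypothetical placeholder. The variable name `language` assumes that the column name safe transformation of the title `Language` produces `language`, and the `{+language}` form is RFC 6570 reserved expansion:

```json
{
  "columns": {
    "Language": {
      "type": "dimension",
      "code_list": false,
      "cell_uri_template": "https://example.org/languages/{+language}"
    }
  }
}
```

Under these assumptions, a cell value of `english` would expand to `https://example.org/languages/english`.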
Reference
This table shows a list of the possible fields that can be entered when configuring a dimension column.
field name | description | default value |
---|---|---|
`type` | The type of the column. This can be left blank to configure a column as a dimension by default. | dimension |
`label` | The title of the column (Optional) | The capital case version of the column header in the CSV file with spaces replacing underscores |
`description` | A description of the contents of the column (Optional) | none |
`from_template` | Use a column template (Optional) | none |
`from_existing` | The URI of the resource for reuse/extension (Optional) | none |
`code_list` | Generate a code list (true), suppress a code list (false), file path to a code-list-config.json, in-line code list (JSON), or link to an externally-defined code list (URI) | true |
`definition_uri` | A URI of a resource to show how the column is created/managed (e.g. a URI of a PDF explaining a list of units) (Optional) | none |
`cell_uri_template` | (Advanced) Override the URI generated for values within the column (Optional) | none |