This page discusses what a dimension column is, when one should be used, and how one can be defined.
For a detailed look at a dimension column's configuration options, see the Reference table at the bottom of this page.
What is a dimension column?
A dimension column identifies the observations in a data set. In order to be valid, a data cube must include at least one dimension column; however, in practice, it is likely that your data set will contain more than one dimension.
A combined set of dimension values (including the measure) uniquely identifies each observation in the data set. More specifically, the combined dimension values identify the sub-set of the population to which the observed value applies.
Examples of dimensions include the time period to which the observation applies, or the geographic region which the observation covers, as demonstrated by the time period and Region columns in the table below:
When to use one
Dimensions are the fundamental building blocks of your data set, so your data set must always include at least one dimension.
If a column groups or identifies a sub-set of the population that your cube describes, then it is a dimension. Care should be taken when deciding whether a column represents a dimension or an attribute. Attributes describe the observed value and should not identify a sub-set of your cube's population.
Referring to the table above, the time period and Region columns are the dimensions that partition the population into sub-sets. They respectively identify the time period and geographic area to which the observed value relates.
If you do not provide a column configuration in your qube-config.json file for a column, then it will be configured by convention. This means that your column will be treated as a dimension by default unless it has a reserved name.
If you provide a column mapping for your column, but you do not specify the type field, then csvcubed will automatically assume that your column is a dimension. It is also possible to explicitly set the column as a dimension by setting the column's type field to dimension. The following examples are therefore equivalent:
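As a sketch of the two equivalent forms, the fragments below configure the Region column first without a type field and then with type set explicitly. They assume that column mappings live in a top-level columns object keyed by the CSV column title, and that an empty mapping is accepted; both points are assumptions to check against the qube-config.json schema version you are using.

```json
{
  "columns": {
    "Region": {}
  }
}
```

```json
{
  "columns": {
    "Region": {
      "type": "dimension"
    }
  }
}
```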
This minimal definition results in:
the label field defaulting to the column title (Region in this example);
a code_list being automatically generated for each column, containing the column's unique values.
Label, description and definition
Additional details can be associated with the dimensions in your data set through the label, description and definition_uri fields.
As mentioned above, the label field will default to the column title unless explicitly configured in the qube-config.json file. In the example below, the Region label is amended.
The description field can be used to provide a longer description of your dimension. If you want to provide information about your methodology, the description field is the preferred place for this.
The definition_uri field allows you to refer to external human-readable resources that further define the dimension.
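A minimal sketch of these three fields applied to the Region column follows. The label text, the description text and the example.org URI are all hypothetical placeholders, and the enclosing columns object is assumed to be the usual top-level mapping keyed by column title.

```json
{
  "columns": {
    "Region": {
      "type": "dimension",
      "label": "Geographic region",
      "description": "The geographic area to which the observed value relates.",
      "definition_uri": "https://example.org/definitions/geographic-region"
    }
  }
}
```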
Code list configuration
Code list - A predefined set of codified concepts which represent the distinct values that a dimension can hold.
One of the key principles of linked data is to connect data from different sources by reusing common definitions. Code lists are an important part of your data set where using linked data can make comparability with other cubes easy.
By default, csvcubed will generate code lists for each of the dimensions in your data set. However, there are several configuration options for refining how your code lists are generated and expressed. These are briefly described below - full details can be found on the Code list configuration page.
Link to an externally-defined code list (URI)
Use a locally-defined code-list-config.json
Define an in-line code list
Suppress a code list
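The options above can be sketched as different values of the code_list field. In the fragment below, the Sector column, the example.org URI and the file path are hypothetical placeholders. The in-line form is omitted because its structure is covered on the Code list configuration page rather than here.

```json
{
  "columns": {
    "Region": {
      "code_list": "https://example.org/code-lists/region"
    },
    "Sector": {
      "code_list": "sector-code-list-config.json"
    },
    "Language": {
      "code_list": false
    }
  }
}
```

Here Region links to an externally-defined code list by URI, Sector points at a locally-defined code list configuration file, and Language has its code list suppressed. Leaving code_list out, or setting it to true, keeps the default behaviour of generating a code list automatically.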
Dimension column templates
The Region column could also be configured by using a column template. Doing so means that fields such as cell_uri_template will be automatically populated based on the template.
To reuse or extend an existing dimension, the from_existing field can be configured to link to a URI where the dimension to be reused or extended is defined.
To reuse a parent dimension without making any changes to it, set the from_existing field to the URI defining the dimension to be reused:
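A sketch of plain reuse might look like the following, where the example.org URI is a hypothetical stand-in for the URI of the existing dimension:

```json
{
  "columns": {
    "Region": {
      "type": "dimension",
      "from_existing": "https://example.org/dimensions/region"
    }
  }
}
```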
To extend a parent dimension and create a new dimension from it, set the from_existing field to the URI defining the dimension to be reused, and set the label field to indicate that this is a new child dimension of the parent dimension.
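Extension can be sketched in the same way, with a label added to mark the column as a new child dimension. Both the example.org URI and the label text are hypothetical placeholders.

```json
{
  "columns": {
    "Region": {
      "type": "dimension",
      "from_existing": "https://example.org/dimensions/region",
      "label": "Region of residence"
    }
  }
}
```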
Cell URI templates
The use of the cell_uri_template field is considered an advanced configuration option, and therefore care must be taken to ensure that the values generated are valid.
Language columns have both been configured with a
cell_uri_template field. It is important to note that this field should only be used where the concept scheme is defined externally at an existing URI, or where there is no concept scheme but you want to point to an existing resource to provide additional context about the dimension's value.
If cell_uri_template is specified, then either:
from_existing must also be defined, in which case cell_uri_template should refer to the concepts in the existing dimension's code list; or
code_list must be set to false, in which case cell_uri_template should refer to URIs which are existing RDF resources.
The format of the cell_uri_template value must follow RFC 6570 guidance for URI Templates. Note that the only variable which can be used in a cell_uri_template references the column itself; the name of the variable can be calculated by applying the CSV column name safe transformation to the CSV column title.
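The two permitted combinations can be sketched as follows. The example.org URIs are hypothetical, and the variable names region and language assume that the CSV column name safe transformation lower-cases the single-word titles Region and Language; confirm the exact variable name for your own column titles. The {+name} form is RFC 6570 reserved expansion, which leaves reserved characters in the cell value unescaped, whereas the plain {name} form percent-encodes them.

```json
{
  "columns": {
    "Region": {
      "from_existing": "https://example.org/dimensions/region",
      "cell_uri_template": "https://example.org/code-lists/region/{+region}"
    },
    "Language": {
      "code_list": false,
      "cell_uri_template": "https://example.org/languages/{+language}"
    }
  }
}
```

In this sketch, Region pairs cell_uri_template with from_existing, so the generated URIs should match concepts in the existing dimension's code list. Language pairs it with a suppressed code list, so the generated URIs should resolve to existing RDF resources.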
This table shows a list of the possible fields that can be entered when configuring a dimension column.
| Field name | Description |
|---|---|
| type | The type of the column. This can be left blank to configure a column as a dimension by default. |
| label | The title of the column (Optional). Defaults to the capital case version of the column header in the CSV file, with spaces replacing underscores. |
| description | A description of the contents of the column (Optional) |
| from_template | Use a column template (Optional) |
| from_existing | The URI of the resource for reuse/extension (Optional) |
| code_list | Generate a code list (true), suppress a code list (false), file path to a code-list-config.json, in-line code list (JSON), or link to an externally-defined code list (URI) |
| definition_uri | A URI of a resource to show how the column is created/managed (e.g. a URI of a PDF explaining a list of units) (Optional) |
| cell_uri_template | (Advanced) Override the URI generated for values within the column (Optional) |