Column definitions

This page explains how to configure a cube's column definitions inside a qube-config.json file. It covers which columns are required for a valid cube, how to define what your columns mean, and how to tell csvcubed to ignore data which isn't part of the cube.

A valid cube

A CSV-W file provides detailed information about the columns beyond their values. In csvcubed, we create CSV-Ws which express data cubes using W3C's RDF Data Cube Vocabulary. The column definitions in a qube-config.json file are designed to map to components in the RDF Data Cube Vocabulary.

In order to be valid, a cube in csvcubed must have:

And it may have:

Configuration

Consider the following data set about the weight of badgers:

Location    Year    Average Badger Weight / kg
Sheffield   1996    9.6
Carlisle    1994    10.5

For each of the columns that we need to configure, we write an entry in the columns section of the qube-config.json document, using the column title as key, and create a new JSON object containing the column's configuration details.

Below, you can see that we've provided definitions for two of the three columns:

{
    "$schema": "https://purl.org/csv-cubed/qube-config/v1",
    "title": "Badger weight watch",
    "columns": {
        "Location": {
            "type": "dimension"
        },
        "Average Badger Weight / kg": {
            "type": "observations",
            "measure": {
                "label": "Average Badger Weight"
            },
            "unit": {
                "label": "kg"
            }
        }
    }
}
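
Because each entry in the columns section is keyed by a column title, a misspelt key will not line up with any column in the CSV. The following is a minimal Python sketch of a check you could run yourself before building the cube. It is not part of csvcubed; the function name unmatched_column_keys is hypothetical and only the standard library is used.

```python
import csv
import io
import json


def unmatched_column_keys(csv_text: str, config_text: str) -> list[str]:
    """Return keys in the config's "columns" section that are not CSV column titles."""
    # The first row of the CSV holds the column titles.
    header = next(csv.reader(io.StringIO(csv_text)))
    config = json.loads(config_text)
    return [key for key in config.get("columns", {}) if key not in header]


csv_text = (
    "Location,Year,Average Badger Weight / kg\n"
    "Sheffield,1996,9.6\n"
    "Carlisle,1994,10.5\n"
)
config_text = json.dumps({
    "columns": {
        "Location": {"type": "dimension"},
        "Locations": {"type": "dimension"},  # deliberate typo for illustration
    }
})

print(unmatched_column_keys(csv_text, config_text))  # ['Locations']
```

Here the key "Locations" is reported because the CSV column is titled "Location".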

If we don't define a column mapping for a column in the CSV file, csvcubed assumes it is a dimension, unless its title is one of the names reserved by configuration by convention.

In our example:

  • we didn't define a mapping for Year so it is assumed by csvcubed to be a dimension,
  • the Location column has been defined as a dimension, and
  • the Average Badger Weight / kg column has been configured as an Observations column with unit and measure definitions.
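
The defaulting behaviour described above can be sketched in a few lines of Python. This is an illustration of the rule as stated on this page, not csvcubed's actual implementation. The names classify_columns and RESERVED_NAMES are hypothetical, and the reserved-name mapping is deliberately left empty because the real names are listed in the configuration by convention documentation rather than here.

```python
# Hypothetical placeholder mapping of reserved column title -> column type.
# The real reserved names are not reproduced in this sketch.
RESERVED_NAMES: dict[str, str] = {}


def classify_columns(csv_titles, config, reserved=RESERVED_NAMES):
    """Map each CSV column title to the column type described on this page.

    Assumes every entry in the "columns" section is an object with a "type" key.
    """
    columns = config.get("columns", {})
    result = {}
    for title in csv_titles:
        if title in columns:
            # An explicit definition wins.
            result[title] = columns[title]["type"]
        elif title in reserved:
            # A reserved name is configured by convention.
            result[title] = reserved[title]
        else:
            # Anything else is assumed to be a dimension.
            result[title] = "dimension"
    return result


config = {
    "columns": {
        "Location": {"type": "dimension"},
        "Average Badger Weight / kg": {"type": "observations"},
    }
}
titles = ["Location", "Year", "Average Badger Weight / kg"]
print(classify_columns(titles, config))
```

For the badger data set, Year has no entry in the columns section, so the sketch reports it as a dimension alongside Location, and reports Average Badger Weight / kg as observations.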

Supported Column Types

Column type            What it means to csvcubed
Dimension              Identifies what the observed value describes.
Observations column    Holds the statistical data which has been recorded.
Measures column        Specifies what was measured.
Units column           Specifies the unit of measure.
Attribute              Further describes the observed value.

Ignoring columns

To ignore a column rather than configure it, set its entry in the columns section to false. csvcubed will then not treat the column as part of the cube.

{ ...
  "columns": {
    "The Column's Title": false
  }
}
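
If the configuration is generated by a script, the same effect can be achieved by setting the entry to Python's False, which the json module serialises as the JSON literal false. The helper below is a hypothetical sketch using only the standard library; ignore_columns is not a csvcubed function.

```python
import json


def ignore_columns(config: dict, titles: list[str]) -> dict:
    """Return a copy of the config with the given column titles set to false."""
    columns = dict(config.get("columns", {}))
    for title in titles:
        columns[title] = False  # written out by json as the literal false
    return {**config, "columns": columns}


config = {
    "title": "Badger weight watch",
    "columns": {"Location": {"type": "dimension"}},
}
updated = ignore_columns(config, ["The Column's Title"])
print(json.dumps(updated, indent=2))
```

The original config dictionary is left unchanged, so the ignored entries appear only in the copy that is written out.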