Shaping your data
This page introduces the input data shapes supported by csvcubed with the aim of helping you understand which to use and how to generate data in the right shape.
csvcubed requires that all CSV data inputs are provided in one of two specialised forms of tidy data:
- the standard approach - the recommended shape for sparse multi-measure data cubes.
- the pivoted approach - the recommended shape for dense data cubes.
These two shapes share a number of similarities in how they require data to be structured; this is explored in the common structure section. Examples of standard shape and pivoted shape data sets are also presented below.
Common Structure
csvcubed requires that data is structured as per the following example, regardless of data shape:
Year | Location | Value | Status |
---|---|---|---|
2022 | London | 35 | Provisional |
2021 | Cardiff | 26 | Final |
2020 | Edinburgh | 90 | Final |
2021 | Belfast | 0 | Final |
Data set representing the number of 'Arthur's Bakes' stores in UK cities from 2020 to 2022
Note that:
- The table has a flat and tidy header row.
    - No attempt has been made to group headers or bring any identifying characteristics such as the year into the headers.
- Each identifying characteristic has its own dimension column.
    - The `Year` and `Location` dimension columns define the subset of the population that the row's observed value covers, i.e. each row describes the number of Arthur's Bakes stores in a particular UK city in a given year.
- Every row has an observed values column containing the measured value of some property.
    - The `Value` column holds the observed values here.
- Each piece of information describing the observed value has its own attribute column.
    - The `Status` attribute column contains information describing the status of the observed value itself.
    - Note that attributes should only describe the observed value and must not be used to identify any subset of the population.
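Source data often arrives with an identifying characteristic, such as the year, spread across the header row. The following is a minimal sketch of moving it back into a dimension column, assuming pandas is available. The `wide` table is a hypothetical starting layout built from the values in the example above; it is not something csvcubed produces, and the `Status` column is left out for brevity.

```python
import pandas as pd

# Hypothetical wide layout: the year has been pushed into the header row.
wide = pd.DataFrame(
    {
        "Location": ["London", "Cardiff", "Edinburgh", "Belfast"],
        "2020": [None, None, 90, None],
        "2021": [None, 26, None, 0],
        "2022": [35, None, None, None],
    }
)

tidy = (
    # Turn each year column into rows: one row per Location/Year pair.
    wide.melt(id_vars=["Location"], var_name="Year", value_name="Value")
    # Drop the Location/Year pairs that have no observation.
    .dropna(subset=["Value"])
    # Year arrives as header text and Value as float (because of the gaps).
    .astype({"Year": int, "Value": int})
    .sort_values(["Year", "Location"])
    .reset_index(drop=True)[["Year", "Location", "Value"]]
)

print(tidy.to_csv(index=False))
```

Each row of `tidy` now carries its own `Year` and `Location`, so the header row is flat again.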
Standard Shape
Examples of single measure and multiple measure standard shape data sets are below. More detailed configuration instructions can be found in the standard shape section. See Converting to standard shape for instructions on how to convert the shape of your data in Python and R.
Single Measure
In this example, the single measure observed is `Count of Arthur's Bakes` and the corresponding unit is `Number`.
Year | Location | Value | Status | Measure | Unit |
---|---|---|---|---|---|
2022 | London | 35 | Provisional | Count of Arthur's Bakes | Number |
2021 | Cardiff | 26 | Final | Count of Arthur's Bakes | Number |
2020 | Edinburgh | 90 | Final | Count of Arthur's Bakes | Number |
2021 | Belfast | 0 | Final | Count of Arthur's Bakes | Number |
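As a rough sketch of how a table in the common structure could be given the extra `Measure` and `Unit` columns shown above, assuming pandas is available (the variable names are illustrative only):

```python
import pandas as pd

# The common-structure example table from earlier on this page.
common = pd.DataFrame(
    {
        "Year": [2022, 2021, 2020, 2021],
        "Location": ["London", "Cardiff", "Edinburgh", "Belfast"],
        "Value": [35, 26, 90, 0],
        "Status": ["Provisional", "Final", "Final", "Final"],
    }
)

# With a single measure, every row gets the same measure and unit.
standard = common.assign(
    Measure="Count of Arthur's Bakes",
    Unit="Number",
)

print(standard.to_csv(index=False))
```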
Multiple Measures
In this example, two measures are recorded: `Count of Arthur's Bakes` and `Revenue`. The corresponding units are `Number` and `GBP Sterling, Millions` respectively.
Year | Location | Value | Status | Measure | Unit |
---|---|---|---|---|---|
2022 | London | 35 | Provisional | Count of Arthur's Bakes | Number |
2022 | London | 25 | Provisional | Revenue | GBP Sterling, Millions |
2021 | Cardiff | 26 | Final | Count of Arthur's Bakes | Number |
2021 | Cardiff | 18 | Final | Revenue | GBP Sterling, Millions |
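If your data currently has one column per measure, a multi-measure standard shape table like the one above can be produced by unpivoting it. The sketch below assumes pandas is available and uses a hypothetical `wide` starting table with a single `Status` column per row; the `units` lookup is likewise an illustrative helper rather than anything csvcubed defines.

```python
import pandas as pd

# Hypothetical starting layout: one row per Year/Location, one column per measure.
wide = pd.DataFrame(
    {
        "Year": [2022, 2021],
        "Location": ["London", "Cardiff"],
        "Status": ["Provisional", "Final"],
        "Count of Arthur's Bakes": [35, 26],
        "Revenue": [25, 18],
    }
)

# Illustrative lookup from measure to unit, taken from the example above.
units = {
    "Count of Arthur's Bakes": "Number",
    "Revenue": "GBP Sterling, Millions",
}

# Unpivot: each measure column becomes its own set of rows.
standard = wide.melt(
    id_vars=["Year", "Location", "Status"],
    var_name="Measure",
    value_name="Value",
)
standard["Unit"] = standard["Measure"].map(units)

# Reorder columns and rows to match the table above.
standard = (
    standard[["Year", "Location", "Value", "Status", "Measure", "Unit"]]
    .sort_values(["Year", "Location", "Measure"], ascending=[False, True, True])
    .reset_index(drop=True)
)

print(standard.to_csv(index=False))
```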
Pivoted Shape
Examples of single measure and multiple measure pivoted shape data sets are below. More detailed configuration instructions can be found in the pivoted shape section. See Converting to pivoted shape for instructions on how to convert the shape of your data in Python and R.
Single Measure
In this example, the single measure recorded is `Count of Arthur's Bakes`.
Year | Location | Count of Arthur's Bakes | Status |
---|---|---|---|
2022 | London | 35 | Provisional |
2021 | Cardiff | 26 | Final |
2020 | Edinburgh | 90 | Final |
2021 | Belfast | 0 | Final |
Multiple Measures
In this example, two measures are recorded: `Count of Arthur's Bakes` and `Revenue`.
Year | Location | Count of Arthur's Bakes | Count of Stores Status | Revenue | Revenue Units | Revenue Status |
---|---|---|---|---|---|---|
2022 | London | 35 | Provisional | 25 | GBP (Sterling) | Provisional |
2021 | Cardiff | 26 | Final | 18 | GBP (Sterling) | Final |
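Going the other way, a standard shape table can be pivoted so that each measure has its own observed values column. This is a sketch only, assuming pandas 1.1 or later (for list-valued `index` in `DataFrame.pivot`). The generated status column names follow a `<measure> Status` pattern chosen for illustration, so they differ slightly from the headers in the table above, and the units column is omitted.

```python
import pandas as pd

# The multi-measure standard shape example, without the Unit column.
standard = pd.DataFrame(
    {
        "Year": [2022, 2022, 2021, 2021],
        "Location": ["London", "London", "Cardiff", "Cardiff"],
        "Value": [35, 25, 26, 18],
        "Status": ["Provisional", "Provisional", "Final", "Final"],
        "Measure": ["Count of Arthur's Bakes", "Revenue"] * 2,
    }
)

keys = ["Year", "Location"]

# One observed values column per measure.
values = standard.pivot(index=keys, columns="Measure", values="Value")

# One status attribute column per measure, named "<measure> Status".
statuses = standard.pivot(index=keys, columns="Measure", values="Status").add_suffix(" Status")

# Both frames share the Year/Location index, so they can be joined directly.
pivoted = values.join(statuses).reset_index()
pivoted.columns.name = None  # drop the leftover "Measure" axis label

print(pivoted.to_csv(index=False))
```

Pivoting the values and the statuses separately keeps the observed values numeric; pivoting both in a single call would mix numbers and text in one intermediate array.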