Shaping your data

This page introduces the input data shapes that csvcubed supports, to help you choose between them and generate data in the right shape.

csvcubed requires all CSV data inputs to be provided in one of two specialised forms of tidy data:

- the standard shape: the recommended shape for sparse multi-measure data cubes.
- the pivoted shape: the recommended shape for dense data cubes.

These two shapes share a number of similarities in how they require data to be structured; this is explored in the common structure section. Examples of standard shape and pivoted shape data sets are also presented below.

Common Structure

csvcubed requires that data is structured as per the following example, regardless of data shape:

Year	Location	Value	Status
2022	London	35	Provisional
2021	Cardiff	26	Final
2020	Edinburgh	90	Final
2021	Belfast	0	Final

Data set representing the number of 'Arthur's Bakes' stores in UK cities from 2020 to 2022

Note that:

The table has a flat and tidy header row.
- No attempt has been made to group headers or to move identifying characteristics, such as the year, into the headers.
Each identifying characteristic has its own dimension column.
- The Year and Location dimension columns define the subset of the population that the row's observed value covers, i.e. each row describes the number of Arthur's Bakes stores in a particular UK city in a given year.
Every row holds the measured value of some property in an observed values column.
- The Value column holds the observed values here.
Each piece of information describing the observed value has its own attribute column.
- The Status attribute column describes the status of the observed value itself.
- Attributes should only describe the observed value and must not be used to identify any subset of the population.
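
As an illustration of the points above, the following is a minimal sketch that uses pandas (not csvcubed itself) to load the example table and check that no combination of dimension values appears on more than one row. The variable names and the choice of Year and Location as the dimension columns are assumptions taken from this example; csvcubed performs its own validation when it builds a cube.

```python
import io

import pandas as pd

# The example data set from above, written out as CSV text.
CSV_TEXT = """Year,Location,Value,Status
2022,London,35,Provisional
2021,Cardiff,26,Final
2020,Edinburgh,90,Final
2021,Belfast,0,Final
"""

df = pd.read_csv(io.StringIO(CSV_TEXT))

# Assumption for this example: Year and Location are the dimension columns.
dimension_columns = ["Year", "Location"]

# Rows that share the same dimension values would describe the same
# subset of the population twice, so list any that do.
duplicate_rows = df[df.duplicated(subset=dimension_columns, keep=False)]

if duplicate_rows.empty:
    print("Each combination of dimension values appears on one row only.")
else:
    print("Rows sharing the same dimension values:")
    print(duplicate_rows)
```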

Standard Shape

Examples of single measure and multiple measure standard shape data sets are below. More detailed configuration instructions can be found in the standard shape section. See Converting to standard shape for instructions on how to convert the shape of your data in Python and R.

Single Measure

In this example, the single measure observed is Count of Arthur's Bakes and the corresponding unit is Number.

Year	Location	Value	Status	Measure	Unit
2022	London	35	Provisional	Count of Arthur's Bakes	Number
2021	Cardiff	26	Final	Count of Arthur's Bakes	Number
2020	Edinburgh	90	Final	Count of Arthur's Bakes	Number
2021	Belfast	0	Final	Count of Arthur's Bakes	Number
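
A table in the common structure can be brought into this single measure form by adding constant Measure and Unit columns. The sketch below shows one way to do that with pandas; the variable names are illustrative and this is not the method documented in the Converting to standard shape guide.

```python
import pandas as pd

# The common-structure example table.
common = pd.DataFrame(
    {
        "Year": [2022, 2021, 2020, 2021],
        "Location": ["London", "Cardiff", "Edinburgh", "Belfast"],
        "Value": [35, 26, 90, 0],
        "Status": ["Provisional", "Final", "Final", "Final"],
    }
)

# Every row records the same measure in the same unit, so both
# new columns hold a single repeated value.
standard = common.assign(Measure="Count of Arthur's Bakes", Unit="Number")

print(standard.to_csv(index=False))
```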

Multiple Measures

In this example, two measures are recorded: Count of Arthur's Bakes and Revenue. The corresponding units are 'Number' and 'GBP Sterling, Millions' respectively.

Year	Location	Value	Status	Measure	Unit
2022	London	35	Provisional	Count of Arthur's Bakes	Number
2022	London	25	Provisional	Revenue	GBP Sterling, Millions
2021	Cardiff	26	Final	Count of Arthur's Bakes	Number
2021	Cardiff	18	Final	Revenue	GBP Sterling, Millions
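
If your source data holds one column per measure, a table like the one above can be produced by unpivoting those columns. The following is a minimal sketch using pandas `melt`, assuming a hypothetical wide input in which both measures share a single Status column; the `wide` and `units` names are illustrative and the unit labels are copied from the example.

```python
import pandas as pd

# Hypothetical wide input: one column per measure, one shared Status column.
wide = pd.DataFrame(
    {
        "Year": [2022, 2021],
        "Location": ["London", "Cardiff"],
        "Status": ["Provisional", "Final"],
        "Count of Arthur's Bakes": [35, 26],
        "Revenue": [25, 18],
    }
)

# Unit for each measure, as given in the example above.
units = {
    "Count of Arthur's Bakes": "Number",
    "Revenue": "GBP Sterling, Millions",
}

# Unpivot the measure columns into Measure/Value pairs.
standard = wide.melt(
    id_vars=["Year", "Location", "Status"],
    value_vars=list(units),
    var_name="Measure",
    value_name="Value",
)

# Look up the unit for each row from its measure.
standard["Unit"] = standard["Measure"].map(units)

# Match the column and row order of the example table.
standard = (
    standard[["Year", "Location", "Value", "Status", "Measure", "Unit"]]
    .sort_values(["Year", "Location", "Measure"], ascending=[False, True, True])
    .reset_index(drop=True)
)

print(standard.to_csv(index=False))
```

Sorting is only there to reproduce the row order of the example.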

Pivoted Shape

Examples of single measure and multiple measure pivoted shape data sets are below. More detailed configuration instructions can be found in the pivoted shape section. See Converting to pivoted shape for instructions on how to convert the shape of your data in Python and R.

Single Measure

In this example, the single measure recorded is Count of Arthur's Bakes.

Year	Location	Count of Arthur's Bakes	Status
2022	London	35	Provisional
2021	Cardiff	26	Final
2020	Edinburgh	90	Final
2021	Belfast	0	Final
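
Compared with the common structure example, the only difference in this table is that the observed values column is named after the measure. A minimal pandas sketch of that renaming, with illustrative variable names:

```python
import pandas as pd

# The common-structure example table.
common = pd.DataFrame(
    {
        "Year": [2022, 2021, 2020, 2021],
        "Location": ["London", "Cardiff", "Edinburgh", "Belfast"],
        "Value": [35, 26, 90, 0],
        "Status": ["Provisional", "Final", "Final", "Final"],
    }
)

measure_name = "Count of Arthur's Bakes"

# Name the observed values column after the measure it records.
pivoted = common.rename(columns={"Value": measure_name})

print(pivoted.to_csv(index=False))
```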

Multiple Measures

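In this example, two measures are recorded: Count of Arthur's Bakes and Revenue. Each measure has its own observed values column, and the columns that describe it (for example Revenue Units and Revenue Status) sit alongside it.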

Year	Location	Count of Arthur's Bakes	Count of Stores Status	Revenue	Revenue Units	Revenue Status
2022	London	35	Provisional	25	GBP (Sterling)	Provisional
2021	Cardiff	26	Final	18	GBP (Sterling)	Final
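
Going the other way, a multiple measure standard shape table can be pivoted so that each measure gets its own observed values column and its own status column. The sketch below uses pandas `pivot` and is illustrative only: the generated status column names follow a "<measure> Status" pattern rather than the exact headers shown above, and the units column is left out for brevity.

```python
import pandas as pd

# The multiple measure standard shape example, without the Unit column.
standard = pd.DataFrame(
    {
        "Year": [2022, 2022, 2021, 2021],
        "Location": ["London", "London", "Cardiff", "Cardiff"],
        "Value": [35, 25, 26, 18],
        "Status": ["Provisional", "Provisional", "Final", "Final"],
        "Measure": [
            "Count of Arthur's Bakes",
            "Revenue",
            "Count of Arthur's Bakes",
            "Revenue",
        ],
    }
)

keys = ["Year", "Location"]

# One observed values column per measure.
values = standard.pivot(index=keys, columns="Measure", values="Value")

# One status column per measure, named "<measure> Status".
statuses = standard.pivot(index=keys, columns="Measure", values="Status").add_suffix(
    " Status"
)

pivoted = values.join(statuses).reset_index()
pivoted.columns.name = None  # drop the leftover "Measure" axis label

print(pivoted.to_csv(index=False))
```

`pivot` raises an error if any Year, Location and Measure combination occurs more than once, which also serves as a check that the dimensions identify each observation uniquely.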