Designing a CSV
A transcribed video walkthrough
Pre-requisites to follow along
csvcubed must be installed before you proceed; if it is not, please go back to installation.
How csvcubed interprets a CSV
csvcubed needs to understand how your statistical data is structured in order to make it more machine readable. There are two ways to tell it; this quick-start covers the configuration by convention approach. Configuration by convention requires you to take a standard CSV data shape with conventional column titles and fill it out with your data, as explained briefly below.
Structuring your data
The standard shape is the recommended way to shape your data for csvcubed. It requires that your CSV has one or more identifying characteristics columns together with Value, Measure and Unit columns.
In the standard shape:
- identifying characteristics are one or more columns which identify the sub-set of the population that has been observed in a given row. These are called dimensions elsewhere in the documentation.
- The Value column contains the value which has been observed or measured; there is only ever one observed value per row in the standard shape.
- The Measure column describes what has been observed or measured; note that the measure should not include any information about the units of measure.
- The Unit column describes the unit of measure in which the Value has been recorded.
In the configuration by convention approach, csvcubed uses the column names to work out what each column contains. Using the column titles Value, Measure and Unit, or one of their synonyms, in your CSV will work. All other columns are assumed to be identifying characteristics (dimensions).
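The naming convention can be illustrated with a minimal sketch. This is not csvcubed's own code: `classify_columns` and `RESERVED_TITLES` are hypothetical names, and the sketch assumes an exact match on the three titles. Synonyms and any case handling are deliberately left out.

```python
# Hypothetical illustration of "configuration by convention".
# This is NOT csvcubed's implementation: it only recognises the three
# titles named in this guide and does not model their synonyms.
RESERVED_TITLES = {
    "Value": "observed value",
    "Measure": "measure",
    "Unit": "unit",
}


def classify_columns(titles):
    """Return a mapping of column title -> the role implied by that title."""
    roles = {}
    for title in titles:
        # Any title that is not reserved is treated as a dimension.
        roles[title] = RESERVED_TITLES.get(title.strip(), "dimension")
    return roles


if __name__ == "__main__":
    header = ["Language", "Value", "Measure", "Unit"]
    for title, role in classify_columns(header).items():
        print(f"{title}: {role}")
```

The point of the sketch is the fallback: a column does not need to be declared as a dimension, because it becomes one simply by not using a reserved title.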
Adding your data
First, we start by taking the above shape and adding columns for each of your identifying characteristics (dimensions).
From here on in we will be creating a data set to represent the competition winners in Eurovision. Our CSV will be structured as per the following extract:
| 1974 | ABBA | Waterloo | English | 6 | People on Stage | Number |
| 2008 | Charlotte Perrelli | Hero | English | 5 | People on Stage | Number |
| 2008 | Charlotte Perrelli | Hero | English | 18 | Final Rank | Unitless |
| 2008 | Charlotte Perrelli | Hero | English | 47 | Final Points | Unitless |
The first four columns, the last of which is Language, are the cube's identifying dimensions.
Note that we have included multiple measures in this dataset, as Final Rank, Final Points and People on Stage have been recorded for each contestant.
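As a rough sketch, the extract above could be written out and inspected with Python's standard `csv` module. The column titles Year, Entrant and Song are assumptions made for this example (only Language, Value, Measure and Unit are named in this guide), and the helper names are hypothetical.

```python
import csv
import io
from collections import defaultdict

# Year, Entrant and Song are ASSUMED titles for the first three columns;
# Language, Value, Measure and Unit come from the guide.
HEADER = ["Year", "Entrant", "Song", "Language", "Value", "Measure", "Unit"]
ROWS = [
    ["1974", "ABBA", "Waterloo", "English", "6", "People on Stage", "Number"],
    ["2008", "Charlotte Perrelli", "Hero", "English", "5", "People on Stage", "Number"],
    ["2008", "Charlotte Perrelli", "Hero", "English", "18", "Final Rank", "Unitless"],
    ["2008", "Charlotte Perrelli", "Hero", "English", "47", "Final Points", "Unitless"],
]
RESERVED = ("Value", "Measure", "Unit")


def to_csv_text(header, rows):
    """Serialise the header and rows as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


def measures_by_entry(csv_text):
    """Group each row's (value, unit) by its dimension values and measure."""
    grouped = defaultdict(dict)
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Every non-reserved column is treated as a dimension.
        dimensions = tuple(v for k, v in row.items() if k not in RESERVED)
        grouped[dimensions][row["Measure"]] = (row["Value"], row["Unit"])
    return dict(grouped)


if __name__ == "__main__":
    csv_text = to_csv_text(HEADER, ROWS)
    for dimensions, measures in measures_by_entry(csv_text).items():
        print(dimensions, measures)
```

Grouping by the dimension columns shows why several rows share the same identifying values: each row carries exactly one observed value, so every additional measure for a contestant needs its own row.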
You can download the full CSV here.
The next step is to build a CSV-W.
Optional: further reading
The other way to configure a CSV-W cube is the explicit configuration approach: you write a JSON configuration file which tells csvcubed exactly how to interpret your data.