Designing a CSV
A transcribed video walkthrough
Prerequisites to follow along
csvcubed must be installed before you proceed; if it is not, please go back to the installation step.
How csvcubed interprets a CSV
csvcubed needs to understand how your statistical data is structured in order to make it more machine-readable. There are two ways to tell it; this quick-start covers the configuration by convention approach. Configuration by convention requires that you take a standard CSV data shape with conventional column titles and fill it with your data, as explained briefly below.
Structuring your data
The standard shape is the recommended way to start using csvcubed. It requires that your CSV has one or more columns of identifying characteristics, together with the columns Value, Measure and Unit.
Of these columns:
- Identifying characteristics are one or more columns which identify the subset of the population that has been observed in a given row. These are called dimensions elsewhere in the documentation.
- The Value column contains the value which has been observed or measured; there is only ever one observed value per row in the standard shape.
- The Measure column describes what has been observed or measured; note that the measure should not include any information about the units of measure.
- The Unit column describes the unit of measure in which the Value has been recorded.
In the configuration by convention approach, csvcubed uses the column titles to interpret what each column contains. Using the column titles Value, Measure and Unit, or one of their synonyms, in your CSV will work. All other columns are assumed to be identifying characteristics (dimensions).
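The convention described above can be sketched in a few lines of Python. This is only an illustration of the idea, not csvcubed's actual implementation: the function name classify_columns is hypothetical, the example column titles are assumed, matching is done on the exact titles only, and the synonyms that csvcubed accepts are deliberately left out.

```python
# Illustrative sketch only: this is NOT how csvcubed is implemented.
# Only the three conventional titles are recognised; synonyms are omitted.
RESERVED_TITLES = {
    "Value": "observed value",
    "Measure": "measure",
    "Unit": "unit",
}


def classify_columns(titles):
    """Map each column title to the role implied by the standard shape.

    Any title that is not one of the reserved titles is treated as an
    identifying characteristic (a dimension).
    """
    return {
        title: RESERVED_TITLES.get(title.strip(), "dimension")
        for title in titles
    }


# Example column titles (assumed for illustration).
roles = classify_columns(
    ["Year", "Entrant", "Song", "Language", "Value", "Measure", "Unit"]
)
for title, role in roles.items():
    print(f"{title}: {role}")
```

Running the sketch lists the first four columns as dimensions and the last three as the observed value, measure and unit respectively.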
Once you have gained some familiarity with using csvcubed, you may find that the pivoted shape is a better way to represent your data. See the Shaping your data section for more information on the pivoted shape.
Adding your data
First, we start by taking the above shape and adding columns for each of your identifying characteristics (dimensions).
From here on, we will be creating a data set to represent the competition winners in Eurovision. Our CSV will be structured as per the following extract, where Year, Entrant, Song and Language are the cube's identifying dimensions. Note that we have included multiple measures in this dataset, as Final Rank, Final Points and People on Stage are recorded for each contestant:
| Year | Entrant | Song | Language | Value | Measure | Unit |
|---|---|---|---|---|---|---|
| 1974 | ABBA | Waterloo | English | 6 | People on Stage | Number |
| 2008 | Charlotte Perrelli | Hero | English | 5 | People on Stage | Number |
| 2008 | Charlotte Perrelli | Hero | English | 18 | Final Rank | Unitless |
| 2008 | Charlotte Perrelli | Hero | English | 47 | Final Points | Unitless |
You can download the full CSV from GitHub.
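If you would rather build a small file by hand than download the full one, the extract above can be written out with Python's standard csv module. The sketch below is a minimal illustration: the dimension column titles Year, Entrant and Song are assumptions, the helper names are hypothetical, and the duplicate check is only a sanity check that each combination of dimensions and measure appears on a single row, which is what one observed value per row implies.

```python
import csv
import io
from collections import Counter

# Column titles: Year, Entrant and Song are assumed for illustration.
COLUMNS = ["Year", "Entrant", "Song", "Language", "Value", "Measure", "Unit"]

# The four rows shown in the extract above.
ROWS = [
    [1974, "ABBA", "Waterloo", "English", 6, "People on Stage", "Number"],
    [2008, "Charlotte Perrelli", "Hero", "English", 5, "People on Stage", "Number"],
    [2008, "Charlotte Perrelli", "Hero", "English", 18, "Final Rank", "Unitless"],
    [2008, "Charlotte Perrelli", "Hero", "English", 47, "Final Points", "Unitless"],
]


def to_csv_text(columns, rows):
    """Serialise a header row and data rows as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(columns)
    writer.writerows(rows)
    return buffer.getvalue()


def duplicate_observations(columns, rows):
    """Return each dimensions-plus-measure key that occurs on more than one row."""
    key_indexes = [
        index for index, title in enumerate(columns)
        if title not in ("Value", "Unit")
    ]
    counts = Counter(tuple(row[index] for index in key_indexes) for row in rows)
    return [key for key, count in counts.items() if count > 1]


csv_text = to_csv_text(COLUMNS, ROWS)
print(csv_text)
print("Duplicate observations:", duplicate_observations(COLUMNS, ROWS))
```

To save the result, write csv_text to a file of your choosing, for example with pathlib.Path("eurovision.csv").write_text(csv_text); the file name is only a suggestion.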
The next step is to build a CSV-W.
Optional: further reading
The other way to configure a CSV-W cube is the explicit configuration approach, in which you write a JSON configuration file that tells csvcubed exactly how to interpret your data.