Handling missing observed values
It is sometimes the case that you cannot provide a value for a given observation but you would still like to include the observation in the dataset to communicate why the value is missing. This could be for a number of reasons such as invalid or failed measurements, suppression to avoid disclosing sensitive data, or because circumstances wouldn't permit an observation.
csvcubed permits missing observation values as long as there is an attribute column which explains why the value is missing. The attribute column must be based on the http://purl.org/linked-data/sdmx/2009/attribute#obsStatus
attribute. The observation which is missing must have the observation status filled in.
Blank values are represented as a zero length string in an observations column in CSV format (i.e. ,,
).
A worked example: Sweden in Eurovision
In the example below, which is an extract of a data set capturing Sweden's Eurovision entries, we see a Marker column which is used to explain why a the given row's observed value is missing.
For example:
- the number of People on the stage is known for Anna Bergendahl's entry into Eurovision as she competed in the second semi-final; however as she didn't make it to the final, her Final Points, and Final Rank are not applicable and therefore the observations are blank in the associated CSV.
- in the case of The Mama's entry in 2020, the entire contest was cancelled so no values are provided however they were to represent Sweden at Eurovision that year.
It is important to note that the Marker
column is based on the existing http://purl.org/linked-data/sdmx/2009/attribute#obsStatus
attribute; this is vital to ensure that csvcubed knows that the attribute describes the status of the observation. An observation status column can also be used to note where an observation is provisional, revised, suppressed, final, etc.
Year | Entrant | Song | Language | Marker | Measure | Observation | Units |
---|---|---|---|---|---|---|---|
2008 | Charlotte Perrelli | Hero | English | Final Rank | 18 | Ordinal | |
2010 | Anna Bergendahl | This is my Life | English | Did not Qualify for Final | Final Rank | Ordinal | |
2015 | Mans Zelmerlow | Heroes | English | Final Rank | 1 | Ordinal | |
2020 | The Mamas | Move | English | Contest Cancelled | Final Rank | Ordinal | |
2021 | Tusse | voices | English | Final Rank | 14 | Ordinal | |
2008 | Charlotte Perrelli | Hero | English | Final Points | 47 | Points | |
2010 | Anna Bergendahl | This is my Life | English | Did not Qualify for Final | Final Points | Points | |
2015 | Mans Zelmerlow | Heroes | English | Final Points | 365 | Points | |
2020 | The Mamas | Move | English | Contest Cancelled | Final Points | Points | |
2021 | Tusse | voices | English | Final Points | 109 | Points | |
2008 | Charlotte Perrelli | Hero | English | People on Stage | 5 | People | |
2010 | Anna Bergendahl | This is my Life | English | People on Stage | 6 | People | |
2015 | Mans Zelmerlow | Heroes | English | People on Stage | 6 | People | |
2020 | The Mamas | Move | English | Contest Cancelled | People on Stage | People | |
2021 | Tusse | voices | English | People on Stage | 6 | People |
The associated CSV file would look like:
Configuration of observation status in qube-config.json
Adding a Marker
column to represent the observation status can be done in the column definitions section of the qube-config file. In this example the Marker column has the following configuration: