Add Datasets

adrianb82 edited this page Jul 20, 2016 · 6 revisions

Conventions

To ingest datasets via the Statistical Data API, follow the conventions below.

Observation ids must be unique per repository. If your datasets use identifiers that can collide across datasets (e.g., plain numbers), you will have to change your id convention. One way to do this is to prefix each identifier with the name of the dataset, as shown in the following table.

Dataset      Id (Identifier)    Recommended API identifier
Dataset 1    1                  ds1_1
Dataset 2    1                  ds2_1
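
The prefixing convention from the table can be sketched in a few lines of Python. The helper name `make_api_id` and the short prefixes `ds1` and `ds2` are illustrative assumptions, not part of the API:

```python
def make_api_id(dataset_prefix, local_id):
    """Prefix a dataset-local id so it stays unique within the repository."""
    return f"{dataset_prefix}_{local_id}"


# The same local id (1) in two datasets yields two distinct API identifiers.
ids = [make_api_id("ds1", 1), make_api_id("ds2", 1)]
print(ids)
```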

If you want to split an observation into multiple observations, you can take the same convention one step further by adding a suffix and a number that tell us how you split the data:

Dataset      Id (Identifier)    Split API identifier
Dataset 1    ds1_111            ds1_111_div1
Dataset 1    ds1_111            ds1_111_div2

Identifiers like ds1_111_div1 and ds1_111_div2 tell us that you split an observation into multiple observations that will later need to be re-assembled when visualizing your data.
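
As a rough illustration of how split identifiers could be matched up again before re-assembly, the sketch below groups ids by their base identifier. It assumes the suffix always has the form `_div<number>`; the function name `group_split_ids` is hypothetical and not part of the API:

```python
import re
from collections import defaultdict

# Assumption: split parts always end in "_div" followed by a number.
SPLIT_ID = re.compile(r"^(?P<base>.+)_div(?P<part>\d+)$")


def group_split_ids(api_ids):
    """Group split identifiers under their base id, ordered by part number."""
    groups = defaultdict(list)
    for api_id in api_ids:
        match = SPLIT_ID.match(api_id)
        if match:
            groups[match.group("base")].append((int(match.group("part")), api_id))
        else:
            # Not a split observation: it forms its own group.
            groups[api_id].append((0, api_id))
    return {base: [api_id for _, api_id in sorted(parts)]
            for base, parts in groups.items()}


grouped = group_split_ids(["ds1_111_div2", "ds1_111_div1", "ds2_5"])
print(grouped)
```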

Converting datasets

Before adding the data to our API you might need to convert it to the Statistical Data API format.

A typical observation in this format will look like this:

{
    "_id": "11102",
    "uri": "http://worldbank.270a.info/dataset/world-bank-indicators/PA.NUS.FCRF/NZ/1982",
    "added_date": "2014-09-10T15:00:02.294083",
    "date": "1982-01-01T00:00:00",
    "indicator_id": "wbexchgrate",
    "indicator_name": "WB Exchange rate",
    "value": 1.33260833233333,
    "repository_id": "worldbank",
    "description": "Official exchange rate (LCU per US$, period average)",
    "producer": "World Bank",
    "sample": "tourism_statistics",
    "frequency": "year",
    "year": 1982,
    "target_country": "NZ",
    "target_type": "country",
    "target_location": [
        {
            "name": "New Zealand",
            "point": {
                "lat": -42,
                "lon": 174
            }
        }
    ],
    "observation_type": "observation"
}
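
A conversion step for a simple yearly, country-level dataset might look like the sketch below. The source row layout (`id`, `year`, `value`, `country`) and the function name `to_observation` are assumptions made for illustration. Only a subset of the fields from the example above is filled in, and this page does not state which fields the API actually requires:

```python
from datetime import datetime


def to_observation(row, repository_id, indicator_id, indicator_name, added=None):
    """Map one yearly source row to the observation layout shown above."""
    year = int(row["year"])
    added = added or datetime.now()
    return {
        "_id": str(row["id"]),
        "added_date": added.isoformat(),
        "date": datetime(year, 1, 1).isoformat(),  # first day of the year
        "indicator_id": indicator_id,
        "indicator_name": indicator_name,
        "value": float(row["value"]),
        "repository_id": repository_id,
        "frequency": "year",
        "year": year,
        "target_country": row["country"],
        "target_type": "country",
        "observation_type": "observation",
    }


# Hypothetical source row, e.g. one line read from a CSV file.
row = {"id": "11102", "year": "1982", "value": "1.33260833233333", "country": "NZ"}
observation = to_observation(
    row, "worldbank", "wbexchgrate", "WB Exchange rate",
    added=datetime(2014, 9, 10, 15, 0, 2, 294083),
)
print(observation["date"], observation["value"])
```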

Creating (Adding) / Retrieving / Updating / Deleting Observations

Basic CRUD operations are described in the API Interface.

Adding datasets

Use this script - datasetuploader.py

Update your token before running the script.

You need to call it like this:

sudo python datasetuploader.py -h

datasetuploader.py -s <serviceurl> -d <datasetpath> -i <indicator>

Example run:

sudo python datasetuploader.py -s https://api.weblyzard.com/0.2/observations/weblyzard.com/test/ -d wl_area_presence.json -i AreaPresence
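
For readers who want to write a similar tool, the command-line interface above could be mirrored with `argparse` roughly as follows. This is only a sketch: the actual script may parse its arguments differently, and treating all three options as required is an assumption:

```python
import argparse


def build_parser():
    """Build a parser with the same -s / -d / -i flags as the usage line."""
    parser = argparse.ArgumentParser(prog="datasetuploader.py")
    parser.add_argument("-s", dest="serviceurl", required=True,
                        help="URL of the observations service")
    parser.add_argument("-d", dest="datasetpath", required=True,
                        help="path to the dataset file")
    parser.add_argument("-i", dest="indicator", required=True,
                        help="indicator name for the uploaded observations")
    return parser


# Parse the arguments of the example run instead of reading sys.argv.
args = build_parser().parse_args([
    "-s", "https://api.weblyzard.com/0.2/observations/weblyzard.com/test/",
    "-d", "wl_area_presence.json",
    "-i", "AreaPresence",
])
print(args.serviceurl, args.datasetpath, args.indicator)
```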

Datasets used in these examples are available in the same folder.

The current version sets indicator_id and indicator_name to the same value.

To prevent clashes between datasets that reuse the same ids, the script currently adds a prefix to each id. The indicator parameter, added in the new version of the script, populates both indicator_name and indicator_id.

A location_id field was also added for datasets that don't have target_location in the classic format.
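
The id prefix and indicator handling described above can be pictured with the following sketch. It is not taken from datasetuploader.py: the function name `apply_uploader_conventions` and the choice of prefix are assumptions, since this page does not say which prefix the script uses:

```python
def apply_uploader_conventions(observation, indicator, id_prefix):
    """Return a copy with a prefixed id and both indicator fields set."""
    updated = dict(observation)
    # Assumption: the prefix is joined to the original id with an underscore.
    updated["_id"] = f"{id_prefix}_{observation['_id']}"
    # indicator_id and indicator_name receive the same value.
    updated["indicator_id"] = indicator
    updated["indicator_name"] = indicator
    return updated


original = {"_id": "1", "value": 0.5}
uploaded = apply_uploader_conventions(original, "AreaPresence", "areapresence")
print(uploaded)
```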

Here are example usages for several datasets:

AREA PRESENCE
sudo python datasetuploader.py -s https://api.weblyzard.com/0.2/observations/weblyzard.com/test/ -d wl_area_presence.json -i AreaPresence

PEAKS
sudo python datasetuploader.py -s https://api.weblyzard.com/0.2/observations/weblyzard.com/test/ -d wl_peaks.json -i Peaks

Deleting Datasets

A script similar to the previous one can be written to delete an entire dataset or all datasets. Currently, you need to know the identifiers of the objects you want to delete.
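
A possible starting point for such a script is sketched below. It only builds the requests and does not send them. It assumes that an observation is addressed by appending its identifier to the service URL and that deletion uses the HTTP DELETE method; both are assumptions that should be checked against the API Interface page. Authentication with your token is left out because this page does not describe how the token is passed:

```python
from urllib.parse import quote
from urllib.request import Request


def build_delete_requests(service_url, observation_ids):
    """Build one DELETE request per known observation identifier."""
    base = service_url.rstrip("/")
    return [
        # Assumption: <service URL>/<observation id> addresses one observation.
        Request(f"{base}/{quote(str(obs_id), safe='')}", method="DELETE")
        for obs_id in observation_ids
    ]


requests_to_send = build_delete_requests(
    "https://api.weblyzard.com/0.2/observations/weblyzard.com/test/",
    ["ds1_1", "ds1_2"],
)
for request in requests_to_send:
    print(request.get_method(), request.full_url)
```

Each request could then be sent with `urllib.request.urlopen(request)` once the endpoint layout and the authentication scheme have been confirmed.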
