Add Datasets
In order to ingest datasets via the Statistical Data API, the following conventions must be respected.
Observation ids must be unique per repository. If your datasets use identifiers that are not unique across datasets (e.g., plain numbers), you will have to change your id convention. One simple method is to prefix each identifier with the name of the dataset, as shown in the following table.
| Dataset | Id (Identifier) | Recommended API identifier |
|---|---|---|
| Dataset 1 | 1 | ds1_1 |
| Dataset 2 | 1 | ds2_1 |
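The prefixing convention from the table can be sketched in a few lines of Python. The helper name `make_api_id` and the short prefixes `ds1` and `ds2` are illustrative assumptions taken from the table, not part of the API.

```python
def make_api_id(dataset_prefix, local_id):
    """Prefix a dataset-local identifier so it stays unique within the repository."""
    return f"{dataset_prefix}_{local_id}"


# Two datasets that both contain an observation with the local id 1.
rows = [("ds1", 1), ("ds2", 1)]
api_ids = [make_api_id(prefix, local_id) for prefix, local_id in rows]
print(api_ids)  # ['ds1_1', 'ds2_1']
```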
If you want to split an observation into multiple observations, you can take the same convention one step further by adding a suffix and a number that indicate how the data was split:
| Dataset | Id (Identifier) | Split API identifier |
|---|---|---|
| Dataset 1 | ds1_111 | ds1_111_div1 |
| Dataset 1 | ds1_111 | ds1_111_div2 |
Identifiers like ds1_111_div1, ds1_111_div2 will tell us that you split your observations into multiple observations that will later need to be re-assembled when visualizing your data.
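A minimal sketch of how such split identifiers could be generated and later grouped for re-assembly is shown here. The `_div<N>` suffix follows the table above; the function names and the regular expression are assumptions made for illustration only.

```python
import re
from collections import defaultdict

# Matches ids of the form <base>_div<N>, e.g. ds1_111_div2.
SPLIT_SUFFIX = re.compile(r"^(?P<base>.+)_div(?P<part>\d+)$")


def split_ids(base_id, parts):
    """Return the ids for an observation that was split into `parts` pieces."""
    return [f"{base_id}_div{n}" for n in range(1, parts + 1)]


def group_by_base(ids):
    """Group ids by the observation they were split from (unsplit ids map to themselves)."""
    groups = defaultdict(list)
    for api_id in ids:
        match = SPLIT_SUFFIX.match(api_id)
        base = match.group("base") if match else api_id
        groups[base].append(api_id)
    return dict(groups)


pieces = split_ids("ds1_111", 2)
print(pieces)  # ['ds1_111_div1', 'ds1_111_div2']
print(group_by_base(pieces + ["ds2_5"]))
```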
Before adding the data to our API you might need to convert it to the Statistical Data API format.
A typical observation in this format will look like this:
{
"_id": "11102",
"uri": "http://worldbank.270a.info/dataset/world-bank-indicators/PA.NUS.FCRF/NZ/1982",
"added_date": "2014-09-10T15:00:02.294083",
"date": "1982-01-01T00:00:00",
"indicator_id": "wbexchgrate",
"indicator_name": "WB Exchange rate",
"value": 1.33260833233333,
"repository_id": "worldbank",
"description": "Official exchange rate (LCU per US$, period average)",
"producer": "World Bank",
"sample": "tourism_statistics",
"frequency": "year",
"year": 1982,
"target_country": "NZ",
"target_type": "country",
"target_location": [
{
"name": "New Zealand",
"point": {
"lat": -42,
"lon": 174
}
}
],
"observation_type": "observation"
}
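As a rough illustration of the conversion step, the following sketch builds a yearly country observation with the field names used in the example above. The function `to_observation` and its parameters are hypothetical, and the example does not state which fields are mandatory, so treat the selection here as an assumption.

```python
import json
from datetime import datetime


def to_observation(obs_id, year, value, indicator_id, indicator_name,
                   repository_id, country_code, country_name, lat, lon,
                   added=None):
    """Build a yearly country observation using the field names from the example."""
    added = added or datetime.now()  # assumption: local time is acceptable here
    return {
        "_id": str(obs_id),
        "added_date": added.isoformat(),
        "date": datetime(year, 1, 1).isoformat(),
        "indicator_id": indicator_id,
        "indicator_name": indicator_name,
        "value": value,
        "repository_id": repository_id,
        "frequency": "year",
        "year": year,
        "target_country": country_code,
        "target_type": "country",
        "target_location": [
            {"name": country_name, "point": {"lat": lat, "lon": lon}}
        ],
        "observation_type": "observation",
    }


observation = to_observation(
    "11102", 1982, 1.33260833233333, "wbexchgrate", "WB Exchange rate",
    "worldbank", "NZ", "New Zealand", -42, 174,
)
print(json.dumps(observation, indent=2))
```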
Basic CRUD operations are described in the API Interface.
Use this script - datasetuploader.py
Update your token before running the script.
You need to call it like this:
sudo python datasetuploader.py -h
datasetuploader.py -s <serviceurl> -d <datasetpath> -i <indicator>
Example run:
sudo python datasetuploader.py -s https://api.weblyzard.com/0.2/observations/weblyzard.com/test/ -d wl_area_presence.json -i AreaPresence
Datasets used in these examples are available in the same folder.
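For readers who want to write a comparable tool, the command-line interface shown above could be parsed roughly as follows. This is not the source of datasetuploader.py; only the `-s`, `-d` and `-i` flags come from the usage line, while the `dest` names and the choice to make all three required are assumptions.

```python
import argparse


def parse_args(argv=None):
    """Parse the -s/-d/-i options shown in the usage line."""
    parser = argparse.ArgumentParser(
        prog="datasetuploader.py",
        description="Upload a dataset to the Statistical Data API.",
    )
    parser.add_argument("-s", dest="serviceurl", required=True,
                        help="service URL the observations are sent to")
    parser.add_argument("-d", dest="datasetpath", required=True,
                        help="path to the dataset JSON file")
    parser.add_argument("-i", dest="indicator", required=True,
                        help="indicator name for the dataset")
    return parser.parse_args(argv)


# Parsing the arguments of the example run explicitly instead of reading sys.argv.
args = parse_args([
    "-s", "https://api.weblyzard.com/0.2/observations/weblyzard.com/test/",
    "-d", "wl_area_presence.json",
    "-i", "AreaPresence",
])
print(args.serviceurl, args.datasetpath, args.indicator)
```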
The current version of the script uses the same value for both indicator_id and indicator_name.
A prefix is added to each id so that different datasets cannot end up with identical ids, and indicator_name and indicator_id are both populated from the indicator parameter (-i) added in the new version of the script.
A location_id field was also added for datasets that don't have target_location in the classic format.
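The id prefixing and indicator handling described above might look roughly like this. It is a sketch under assumptions rather than the script's actual code: the function name `prepare_observations`, the `id_prefix` parameter and the way the prefix is chosen are hypothetical, and the location_id handling is left out because its format is not described here.

```python
import copy


def prepare_observations(records, indicator, id_prefix):
    """Return copies of the records with prefixed ids and indicator fields filled in."""
    prepared = []
    for record in records:
        observation = copy.deepcopy(record)  # leave the input records untouched
        observation["_id"] = f"{id_prefix}_{record['_id']}"
        # Both fields receive the same value, taken from the indicator parameter.
        observation["indicator_id"] = indicator
        observation["indicator_name"] = indicator
        prepared.append(observation)
    return prepared


raw = [{"_id": "1", "value": 2.5}, {"_id": "2", "value": 3.0}]
for observation in prepare_observations(raw, "AreaPresence", "areapresence"):
    print(observation["_id"], observation["indicator_id"])
```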
Here are example usages for several datasets:
AREA PRESENCE
sudo python datasetuploader.py -s https://api.weblyzard.com/0.2/observations/weblyzard.com/test/ -d wl_area_presence.json -i AreaPresence
PEAKS
sudo python datasetuploader.py -s https://api.weblyzard.com/0.2/observations/weblyzard.com/test/ -d wl_peaks.json -i Peaks
A similar script can be written to delete an entire dataset or all datasets. Currently you need to know the identifiers of the objects you want to delete.
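A deletion helper could be sketched along these lines using only the Python standard library. Two things are assumptions and should be checked against the API Interface page: that an observation is addressed by appending its id to the service URL, and that it is removed with an HTTP DELETE request. How the token is transmitted is not described here, so any headers are left to the caller.

```python
from urllib.parse import quote
from urllib.request import Request, urlopen


def build_delete_requests(service_url, observation_ids, headers=None):
    """Build one DELETE request per id (assumed URL layout: <service_url>/<id>)."""
    base = service_url.rstrip("/") + "/"
    return [
        Request(base + quote(str(obs_id), safe=""),
                headers=headers or {}, method="DELETE")
        for obs_id in observation_ids
    ]


def delete_observations(service_url, observation_ids, headers=None):
    """Send the DELETE requests one by one and return the HTTP status codes."""
    statuses = []
    for request in build_delete_requests(service_url, observation_ids, headers):
        with urlopen(request) as response:
            statuses.append(response.status)
    return statuses


# Inspect the requests without sending anything over the network.
requests_to_send = build_delete_requests(
    "https://api.weblyzard.com/0.2/observations/weblyzard.com/test/",
    ["ds1_1", "ds2_1"],
)
for request in requests_to_send:
    print(request.get_method(), request.full_url)
```

Building the requests separately from sending them makes it possible to review the list of ids before anything is deleted.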