# Parameters Mapping, or how to add a metadata header to CSV files

1. Check that the parameters to add are listed in [data-services](https://github.com/aodn/data-services/tree/master/PARAMETERS_MAPPING)
    * ```parameters.csv```: list of parameters and their ids
    * ```qc_flags.csv```: list of quality control flags and their meanings
    * ```qc_scheme.csv```: list of quality control schemes and their definitions
    * ```unit_view.csv```: list of units and their ids (cf. names, long names and ids).
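Looking a parameter up in these files can be sketched in Python; the `id`/`name` column names below are assumptions for illustration, not the actual CSV schema.

```python
import csv
import io

def lookup_id(csv_text, name, key="name", id_col="id"):
    """Return the id of the first row whose `key` column equals `name`, or None."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row[key] == name:
            return row[id_col]
    return None

# Hypothetical excerpt of parameters.csv -- the real column names may differ.
parameters_csv = """id,name
1,TEMP
2,PSAL
"""

print(lookup_id(parameters_csv, "TEMP"))  # -> 1
```

The same lookup applies to ```unit_view.csv```, ```qc_flags.csv``` and ```qc_scheme.csv```, with their own column names.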

2. New parameters
    * new parameters must follow the IMOS vocabulary [BENE PLEASE UPDATE]

3. Map the parameters for your dataset collection
    * update ```parameters_mapping.csv```. This file brings together the information from the other files: it matches a variable name, as written in the column name of the CSV, to a unique id for each parameter found in ```parameters.csv```, unit found in ```unit_view.csv```, and, if the data is quality controlled, qc scheme found in ```qc_scheme.csv``` and ```qc_flags.csv```.
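Conceptually, each row of ```parameters_mapping.csv``` joins a CSV column name to ids defined in the other files. A minimal sketch, with invented column and table names:

```python
def resolve_mapping(mapping_row, parameters, units):
    """Resolve a mapping row's ids against the parameter and unit lookup tables."""
    return {
        "column_name": mapping_row["column_name"],
        "parameter": parameters[mapping_row["parameter_id"]],
        "unit": units[mapping_row["unit_id"]],
    }

# Hypothetical lookup tables built from parameters.csv and unit_view.csv.
parameters = {"1": "sea_water_temperature"}
units = {"7": "degrees_Celsius"}
mapping_row = {"column_name": "TEMP", "parameter_id": "1", "unit_id": "7"}

print(resolve_mapping(mapping_row, parameters, units))
```

A mapping row whose ids do not exist in the other files would fail this resolution, which is why step 1 checks the ids first.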

4. Create view in Parameters mapping harvester: update the liquibase to update/include new views in the [harvester](https://github.com/aodn/harvesters/tree/master/workspace/PARAMETERS_MAPPING)
    * start your stack, restoring the parameters_mapping schema and the schema you are working on. You can add ```restore_data: True``` under each schema if you also want that schema's data in your stack; this may be useful if you need to test the ```PARAMETERS_MAPPING``` harvester.
```
RestoreDatabaseSchemas:
  - schema: parameters_mapping
  - schema: working_schema
```
    * open pgadmin and access your stack-db to test the SQL query that will be used to create/update the view in the parameters_mapping harvester; it is easier to understand the query there before updating the liquibase via Talend
* start your pipeline box and Talend
    * update the liquibase in the second component ```Create parameters_mapping views```
    * the query will crash because 6 views query their respective dataset collection schemas:
`aatams_biologging_shearwater_metadata_summary`;
`aatams_biologging_snowpetrel_metadata_summary`;
`aatams_sattag_dm_metadata_summary`;
`aatams_sattag_nrt_metadata_summary`;
`aodn_nt_sattag_hawksbill_metadata_summary`;
`aodn_nt_sattag_oliveridley_metadata_summary`
    * write the new view you are working on at the top of the liquibase script, so Talend can run and create it before crashing at `aatams_biologging_shearwater_metadata_summary`
    * check in the stack database that the views are created as expected.
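Before touching the liquibase script, it can help to prototype the view logic in isolation. The following toy sketch uses Python's sqlite3 in place of the stack's PostgreSQL database; the table and column names are invented for illustration and are not the real schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE parameters (id INTEGER, name TEXT);
CREATE TABLE parameters_mapping (column_name TEXT, parameter_id INTEGER);
INSERT INTO parameters VALUES (1, 'sea_water_temperature');
INSERT INTO parameters_mapping VALUES ('TEMP', 1);

-- Toy analogue of a *_metadata_summary view: join the mapping to the
-- parameter definitions so each CSV column name gets its metadata.
CREATE VIEW toy_metadata_summary AS
SELECT m.column_name, p.name AS parameter_name
FROM parameters_mapping m
JOIN parameters p ON p.id = m.parameter_id;
""")

for row in conn.execute("SELECT * FROM toy_metadata_summary"):
    print(row)  # -> ('TEMP', 'sea_water_temperature')
```

Once the join logic behaves as expected against the real schemas in pgadmin, it can be moved into the liquibase script via Talend.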

5. Merge the changes made in
* [data-services](https://github.com/aodn/data-services/tree/master/PARAMETERS_MAPPING)
* [harvester](https://github.com/aodn/harvesters/tree/master/workspace/PARAMETERS_MAPPING) to test on RC before merging to production.

6. Test on RC: check the csv files a user can download from the portal
* once the harvester is deployed in RC, in the terminal ssh to 4-nec-hob
* type the command `talend_run parameters_mapping-parameters_mapping`
    * type `talend_log parameters_mapping-parameters_mapping` to check it ran successfully
* to confirm, check the new metadata additions are in the respective view in pgadmin, and ultimately in the downloaded csv file from the [rc-portal](http://portal-rc.aodn.org.au/).
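A quick check of the downloaded file can be scripted; this sketch assumes the metadata header is written as leading comment lines before the data, which is an assumption about the format, not a documented guarantee:

```python
import io

def read_metadata_header(f, prefix="#"):
    """Collect the leading lines starting with `prefix` (assumed header style)."""
    header = []
    for line in f:
        if line.startswith(prefix):
            header.append(line.rstrip("\n"))
        else:
            break
    return header

# Hypothetical downloaded CSV: one metadata line, then the data.
sample = io.StringIO(
    "# TEMP: sea_water_temperature (degrees_Celsius)\n"
    "TIME,TEMP\n"
    "2024-01-01,20.1\n"
)
print(read_metadata_header(sample))
```

If the header is empty or missing the new parameter, the view or the harvester run is the first place to look.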

7. Promoting to production
    * on Jenkins, go to [data-services_tag_promote](https://build.aodn.org.au/view/projectofficer/job/data-services_tag_promote/) and run the next steps `tag_prod` and `push_prod`. The changes will be in production at 5pm the same day.

# Other information
The [PARAMETERS_MAPPING harvester](https://github.com/aodn/harvesters/tree/master/workspace/PARAMETERS_MAPPING) runs on a cron job daily, Monday to Friday.
It harvests the content of these 5 files (```parameters.csv```, ```parameters_mapping.csv```, ```qc_flags.csv```, ```qc_scheme.csv```, ```unit_view.csv```) into the parameters_mapping DB schema and creates a `_metadata_summary` view for each of the collections listed (it is not IMOS specific; for example, we have a mapping for the AODN `_WAVE_DM` + `NRT` collections).