Clean up incoming ID3C data #32

@trvrb

@joverlee521: There are a small handful of upstream fixes we need to make to the shipping views.

  1. The date field in v2/shipping/augur-build-metadata was formatted as 2019-09-25T19:37:35.483+00:00. It should just read 2019-09-25. For the time being, I've fixed this on the augur side here: https://github.com/seattleflu/augur-build/blob/master/scripts/download_sfs_metadata.py#L25
  2. Our strain names should match those used by the rest of the world rather than just being a long UUID. I'd like to match the existing format as closely as possible. Strains in the US are geographically labeled by state, like B/Washington/2/2019. This means that sample UUID fe1a1206-21ef-45ff-8be0-9d7643eef879 would be strain A/Washington/43eef879/2019, i.e., A or B depending on whether the sample is flu A or flu B, the last 8 characters of the UUID, and the year taken from the date.
  3. For location, we need neighborhood within Seattle proper and PUMA outside Seattle proper. I believe that @kairstenfay may have already started on this in ID3C.
  4. Include age_range_coarse as a field in the shipping view.
  5. Restrict rows in shipping.augur-build-metadata to only those samples that have sequencing data.
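All five fixes could be prototyped as a single row-cleaning pass. The following is a minimal Python sketch under stated assumptions, not the actual ID3C view or augur-build script. The input field names (`sample`, `flu_type`, `collection_date`, `neighborhood`, `puma`, `in_seattle`, `has_sequence`) are hypothetical, and every strain is assumed to be labeled `Washington`, following the example in item 2. It needs Python 3.7+ for `datetime.fromisoformat`.

```python
from datetime import datetime
from typing import Iterable, List, Optional

# Assumption: all samples are labeled with the same state, as in item 2.
STATE = "Washington"


def date_only(timestamp: str) -> str:
    """Item 1: reduce e.g. 2019-09-25T19:37:35.483+00:00 to 2019-09-25."""
    return datetime.fromisoformat(timestamp).date().isoformat()


def strain_name(sample_uuid: str, flu_type: str, date: str) -> str:
    """Item 2: build e.g. A/Washington/43eef879/2019 from a sample UUID."""
    if flu_type not in ("A", "B"):
        raise ValueError(f"expected flu type 'A' or 'B', got {flu_type!r}")
    short_id = sample_uuid.replace("-", "")[-8:]  # last 8 hex characters
    year = datetime.fromisoformat(date).year
    return f"{flu_type}/{STATE}/{short_id}/{year}"


def location(neighborhood: Optional[str], puma: Optional[str], in_seattle: bool) -> Optional[str]:
    """Item 3: neighborhood within Seattle proper, PUMA outside it."""
    return neighborhood if in_seattle else puma


def clean_rows(rows: Iterable[dict]) -> List[dict]:
    """Apply items 1-5 to hypothetical shipping-view rows."""
    cleaned = []
    for row in rows:
        # Item 5: keep only samples that have sequencing data.
        if not row.get("has_sequence"):
            continue
        date = date_only(row["collection_date"])
        cleaned.append({
            "strain": strain_name(row["sample"], row["flu_type"], date),
            "date": date,
            "location": location(row.get("neighborhood"), row.get("puma"),
                                 row.get("in_seattle", False)),
            # Item 4: pass age_range_coarse through unchanged.
            "age_range_coarse": row.get("age_range_coarse"),
        })
    return cleaned
```

In the real pipeline these steps would more naturally live in the ID3C shipping view itself, so that downstream consumers like augur-build no longer need their own workarounds.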

Edited to update the strain name format in item 2 and to add items 4 and 5.
