Commit 3ff159d
Update STR data pipeline for new data
Major changes here are:
* Instead of a single `reference_region`, STRs now have a list of `reference_regions` with a single one designated the `main_reference_region`
* Allele size distributions and genotype distributions were previously stored as a number of nested structs, an attempt to represent multidimensional data. That was serviceable when there were only one or two dimensions we might want to filter on, but it was getting increasingly convoluted. Since the new data expands the number of dimensions further, rather than build on the former schema and confuse things more, these distributions are now represented as a flattened list of structs, each of which represents a single subset of the distribution.

1 parent d71262d commit 3ff159d
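To make the two schema changes in the commit message concrete, here is a minimal plain-Python sketch of what a record might look like. Only `reference_regions` and `main_reference_region` are names from the commit message. The other field names (`allele_size_distribution`, `ancestry_group`, `sex`, `distribution`), the example values, and the `select_distribution` helper are all hypothetical, and the real pipeline may define its structs quite differently.

```python
# Hypothetical STR record. Only `reference_regions` and
# `main_reference_region` are names taken from the commit message;
# every other field name and value here is an illustrative assumption.
locus = {
    "reference_regions": [
        {"chrom": "1", "start": 100, "stop": 130},
        {"chrom": "1", "start": 140, "stop": 170},
    ],
    # Exactly one of the regions above is designated as the main one.
    "main_reference_region": {"chrom": "1", "start": 100, "stop": 130},
    # Flattened layout: one struct per subset of the distribution.
    # None in a dimension means "not filtered on that dimension".
    "allele_size_distribution": [
        {"ancestry_group": None, "sex": None, "distribution": [[10, 5], [11, 7]]},
        {"ancestry_group": "afr", "sex": None, "distribution": [[10, 2], [11, 3]]},
        {"ancestry_group": "afr", "sex": "XX", "distribution": [[10, 1], [11, 2]]},
    ],
}


def select_distribution(subsets, ancestry_group=None, sex=None):
    """Return the distribution for the subset matching both dimensions exactly.

    Returns None when no subset matches.
    """
    for subset in subsets:
        if subset["ancestry_group"] == ancestry_group and subset["sex"] == sex:
            return subset["distribution"]
    return None
```

With a flat list like this, adding a dimension would mean adding one more key to each struct and one more comparison in the lookup, instead of another level of nesting.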
3 files changed (+106, -429 lines):
- data-pipeline/src/data_pipeline
- datasets/gnomad_v3
- pipelines