Ingest currently produces two disconnected sibling directories: / for variants and .genotypes/ for genotypes. Annotate and enrich add more siblings. The user manually threads paths between commands. The .cohort/ store knows about cohorts but not the ingest/annotate stages that feed them.
Proposed: everything lives under one root (either .cohort/ or a named dataset dir). Each command discovers what it needs from the store. No manual -o paths for the common case.
.cohort/
  datasets/<name>/
    manifest.json
    variants/chromosome={chr}/...
    genotypes/samples.txt, chromosome={chr}/...
    annotations/chromosome={chr}/...
  cohorts/<id>/
    manifest.json
    sparse_g.bin, variants.parquet, membership.parquet
Ingest writes to datasets/. Annotate finds unannotated datasets and adds annotations/. Staar builds cohorts from annotated datasets. The user only specifies the input VCF and trait file. Everything else is resolved from the store.
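As a rough illustration of the discovery step, here is a minimal Python sketch of how annotate could find work to do and how staar could find its inputs. It is not the tool's actual implementation. The function names are hypothetical, and it assumes a dataset counts as "annotated" when an annotations/ directory exists next to variants/. A real version would more likely read that state from manifest.json, whose schema is not specified here.

```python
from pathlib import Path


def dataset_dirs(store: Path) -> list[Path]:
    """Return every dataset directory under <store>/datasets/, sorted by name."""
    root = store / "datasets"
    if not root.is_dir():
        return []
    return sorted(p for p in root.iterdir() if p.is_dir())


def unannotated_datasets(store: Path) -> list[Path]:
    """Datasets with variants/ but no annotations/ yet (assumed rule)."""
    return [
        d for d in dataset_dirs(store)
        if (d / "variants").is_dir() and not (d / "annotations").is_dir()
    ]


def annotated_datasets(store: Path) -> list[Path]:
    """Datasets that already have annotations/, i.e. candidate cohort inputs."""
    return [d for d in dataset_dirs(store) if (d / "annotations").is_dir()]
```

With helpers like these, annotate would iterate over `unannotated_datasets(Path(".cohort"))` and staar over `annotated_datasets(Path(".cohort"))`, so neither command needs an explicit path in the common case.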
Related: #64, #59, #62, #27, #87