staar: add metasvm_pred + genehancer columns to annotation parquet #126
Merged
STAARpipeline's coding masks (disruptive_missense, plof_ds, ptv_ds) key off MetaSVM_pred=="D". We carried cadd_phred + revel as proxies, which means our masks don't match R. GeneHancer is an opaque identifier string the pipeline passes through for downstream tooling.

Pulls both from FAVOR full-tier (a.dbnsfp.metasvm_pred and a.genehancer.id) and flows them end-to-end: ingest -> cohort parquet -> VariantIndexEntry -> AnnotatedVariant -> MetaSTAAR sumstats schema. MetaSVM parses into a typed enum (Deleterious/Tolerated/Unknown) so A3's mask-predicate flip is a pattern match, not a string compare. GeneHancer stays Box<str>; no predicate reads it today.

Preflight gains require_structural_annotation_catalog alongside the 11-weight catalog lock from #104. It fails loudly at staar start if either column is missing from the cohort, so users with an old cohort get a clear rebuild message. No mask math changes; that's #77.
Closes #107. STAARpipeline masks disruptive_missense / plof_ds / ptv_ds all key off MetaSVM_pred=="D". We were proxying with cadd_phred + revel, so our masks don't match R. GeneHancer is an opaque ID string the pipeline passes through for downstream tools.
Pulls both from FAVOR full-tier (a.dbnsfp.metasvm_pred, a.genehancer.id) and wires them through ingest -> cohort parquet -> VariantIndexEntry -> AnnotatedVariant -> MetaSTAAR sumstats. MetaSvmPred is a typed enum so A3's mask flip is a pattern match rather than a string compare.
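To illustrate why a typed enum makes the mask flip a pattern match, here is a minimal sketch. The variant names (Deleterious/Tolerated/Unknown) come from the commit message; the `parse` helper, the predicate name `is_metasvm_deleterious`, and the mapping of `"T"` to Tolerated are assumptions for illustration, not the code in this PR.

```rust
/// Sketch of the typed MetaSVM prediction. Variant names follow the PR text;
/// everything else here is hypothetical.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MetaSvmPred {
    Deleterious,
    Tolerated,
    Unknown,
}

impl MetaSvmPred {
    /// Hypothetical parser: maps the raw annotation string to the enum.
    /// Assumes "D" means deleterious and "T" means tolerated; anything else
    /// (missing, empty, unexpected) collapses to Unknown.
    pub fn parse(raw: Option<&str>) -> Self {
        match raw.map(str::trim) {
            Some("D") => Self::Deleterious,
            Some("T") => Self::Tolerated,
            _ => Self::Unknown,
        }
    }
}

/// Hypothetical mask predicate: the equivalent of MetaSVM_pred == "D",
/// expressed as a pattern match instead of a string compare.
pub fn is_metasvm_deleterious(pred: MetaSvmPred) -> bool {
    matches!(pred, MetaSvmPred::Deleterious)
}
```

Parsing once at ingest means a malformed value is handled in a single place, and downstream predicates cannot mistype the `"D"` literal.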
Preflight adds require_structural_annotation_catalog next to the 11-weight lock; old cohorts get a clear rebuild message. No mask math changes yet; that's #77.
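The preflight check could look roughly like the sketch below. Only the function name `require_structural_annotation_catalog` comes from this PR; the signature, the `MissingColumns` error type, the column-name constants, and the message wording are all assumptions for illustration.

```rust
use std::fmt;

/// Assumed parquet column names for the two new annotations.
const REQUIRED_STRUCTURAL_COLUMNS: [&str; 2] = ["metasvm_pred", "genehancer"];

/// Hypothetical error type listing the columns the cohort lacks.
#[derive(Debug, PartialEq, Eq)]
pub struct MissingColumns(pub Vec<String>);

impl fmt::Display for MissingColumns {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Wording is illustrative; the real rebuild message may differ.
        write!(
            f,
            "cohort is missing annotation column(s): {}; rebuild the cohort",
            self.0.join(", ")
        )
    }
}

/// Sketch of the preflight: succeed only if every required column is
/// present in the cohort schema, otherwise report all missing ones at once.
pub fn require_structural_annotation_catalog(
    cohort_columns: &[&str],
) -> Result<(), MissingColumns> {
    let missing: Vec<String> = REQUIRED_STRUCTURAL_COLUMNS
        .iter()
        .filter(|required| !cohort_columns.contains(*required))
        .map(|name| (*name).to_string())
        .collect();

    if missing.is_empty() {
        Ok(())
    } else {
        Err(MissingColumns(missing))
    }
}
```

Collecting every missing column before returning lets a user with an old cohort see the full list in one run rather than fixing columns one at a time.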
cargo test: 294/294, clippy clean.