
staar: flip coding-mask predicates to MetaSVM #127

Merged
vineetver merged 1 commit into master from staar/77-coding-mask-metasvm-flip
Apr 16, 2026


@vineetver
Owner

Closes #77. Depends on #126 (A2 MetaSVM column wiring) — merge that first.

STAARpipeline gates disruptive_missense on (nonsynonymous SNV) & (MetaSVM_pred=="D"). We were proxying that with CADD>20 OR REVEL>0.5, so the three dependent masks (disruptive_missense, plof_ds, ptv_ds) returned p-values that didn't match R. CADD and REVEL are weight channels in the score kernel, not damage classifiers.

disruptive_missense now reads annotation.metasvm_pred directly. plof_ds = plof ∪ disruptive_missense. ptv_ds = ptv ∪ splice ∪ disruptive_missense, which matches the variant_type=="variant" branch of R's ptv_ds.R, where splicing is in the union unconditionally; the old splice+CADD>20 gate was ours, not R's. PLofMissense is renamed to PLofDs.
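For readers who want the set logic spelled out, here is a minimal sketch of the three predicates. It is not the code from this PR: the `Variant` struct, the `Consequence` and `MetaSvmPred` enums, and the precomputed `is_plof` / `is_ptv` / `is_splice` flags are hypothetical stand-ins for whatever the annotation layer actually provides. Only the shape of the unions is taken from the description above.

```rust
// Hypothetical types for illustration; names are NOT the crate's real API.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum MetaSvmPred {
    Deleterious, // MetaSVM_pred == "D"
    Tolerated,
}

#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum Consequence {
    NonsynonymousSnv,
    Other,
}

#[derive(Clone, Copy, Debug)]
struct Variant {
    consequence: Consequence,
    metasvm_pred: Option<MetaSvmPred>,
    // Assumed to be computed elsewhere from the functional annotation.
    is_plof: bool,
    is_ptv: bool,
    is_splice: bool,
}

impl Variant {
    /// (nonsynonymous SNV) & (MetaSVM_pred == "D"); CADD and REVEL are not consulted.
    fn is_disruptive_missense(&self) -> bool {
        self.consequence == Consequence::NonsynonymousSnv
            && self.metasvm_pred == Some(MetaSvmPred::Deleterious)
    }

    /// plof ∪ disruptive_missense
    fn is_plof_ds(&self) -> bool {
        self.is_plof || self.is_disruptive_missense()
    }

    /// ptv ∪ splice ∪ disruptive_missense; splice enters unconditionally.
    fn is_ptv_ds(&self) -> bool {
        self.is_ptv || self.is_splice || self.is_disruptive_missense()
    }
}
```

In this sketch a missing MetaSVM prediction (`None`) never counts as disruptive; whether the real predicate treats missing values the same way is an assumption.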

Skipped the score-cache predicate-version bust that the plan mentioned. The cache stores per-variant U and K which are mask-independent; masks select which variants feed into a given gene test at scoring time. Flipping the predicate doesn't invalidate cached U/K.
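The cache argument can be illustrated with a toy model. This is a sketch under loud assumptions: `CachedStat`, `Variant`, and `aggregate` are hypothetical names, K is reduced to one scalar per variant for brevity (the real K is presumably richer), and the aggregation is a plain sum rather than the actual gene test. The point it shows is only that the mask is a filter applied when reading the cache, so swapping the predicate changes which entries are read, not what is stored.

```rust
use std::collections::HashMap;

// Hypothetical cache entry: per-variant score statistic U and a scalar
// stand-in for the kernel term K. Neither depends on any mask.
#[derive(Clone, Copy, Debug, PartialEq)]
struct CachedStat {
    u: f64,
    k: f64,
}

// Hypothetical variant record carrying both the old and new mask inputs.
struct Variant {
    id: u32,
    metasvm_deleterious: bool,
    cadd: f64,
}

/// Sum cached U and K over the variants selected by `mask`.
/// The cache is only read; the mask decides which entries participate.
fn aggregate<F: Fn(&Variant) -> bool>(
    cache: &HashMap<u32, CachedStat>,
    variants: &[Variant],
    mask: F,
) -> (f64, f64) {
    variants
        .iter()
        .filter(|v| mask(*v))
        .filter_map(|v| cache.get(&v.id))
        .fold((0.0, 0.0), |(u, k), s| (u + s.u, k + s.k))
}
```

Calling `aggregate` once with an old-style closure such as `|v| v.cadd > 20.0` and once with `|v| v.metasvm_deleterious` can give different gene-level sums from the same untouched cache, which is why no version bump was needed.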

291/291 tests pass, clippy clean. Four CADD/REVEL-threshold tests deleted, two MetaSVM tests added; net -105 lines.

Base automatically changed from staar/107-metasvm-genehancer-columns to master April 16, 2026 20:47
STAARpipeline gates disruptive_missense on (nonsynonymous SNV) &
(MetaSVM_pred=="D"). We were proxying with CADD>20 OR REVEL>0.5, so the
three dependent masks (disruptive_missense, plof_ds, ptv_ds) returned
p-values that didn't match R. CADD and REVEL are weight channels in the
score kernel, not damage classifiers.

disruptive_missense now reads annotation.metasvm_pred directly.
is_plof_ds = plof ∪ disruptive_missense. is_ptv_ds =
ptv ∪ splice ∪ disruptive_missense (matches ptv_ds.R's variant_type=="variant"
branch where splicing is part of the union unconditionally; the old
splice+CADD>20 gate was our invention, not R's).

PLofMissense renamed to PLofDs; old is_plof_or_missense predicate was the
broad plof ∪ all-missense union which R does not ship. The docs-only
example output path coding_pLoF_missense.parquet becomes coding_plof_ds.parquet.

Test helper v() drops cadd/revel args since no predicate reads them now.
New v_ds() helper sets metasvm_pred=Deleterious for disruptive-missense
tests. Four old CADD/REVEL-threshold tests deleted; two new tests pin the
MetaSVM gate directly.
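
The helper pattern described here might look roughly like the following. Everything in it is illustrative: the real `v()` and `v_ds()` signatures, the `Annotation` fields, and the two actual test bodies are not shown in this PR text, so the names and arguments below are assumptions.

```rust
// Hypothetical annotation type; the real struct has more fields.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum MetaSvmPred {
    Deleterious,
    Tolerated,
}

#[derive(Clone, Debug)]
struct Annotation {
    nonsynonymous_snv: bool,
    metasvm_pred: Option<MetaSvmPred>,
}

fn is_disruptive_missense(a: &Annotation) -> bool {
    a.nonsynonymous_snv && a.metasvm_pred == Some(MetaSvmPred::Deleterious)
}

/// Base helper: no cadd/revel arguments, since no predicate reads them.
fn v(nonsynonymous_snv: bool) -> Annotation {
    Annotation { nonsynonymous_snv, metasvm_pred: None }
}

/// Disruptive-missense helper: a nonsynonymous SNV with MetaSVM = Deleterious.
fn v_ds() -> Annotation {
    Annotation { metasvm_pred: Some(MetaSvmPred::Deleterious), ..v(true) }
}

#[test]
fn metasvm_deleterious_missense_is_disruptive() {
    assert!(is_disruptive_missense(&v_ds()));
}

#[test]
fn missense_without_deleterious_call_is_not_disruptive() {
    assert!(!is_disruptive_missense(&v(true)));
    let tolerated = Annotation { metasvm_pred: Some(MetaSvmPred::Tolerated), ..v(true) };
    assert!(!is_disruptive_missense(&tolerated));
}
```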

Net: 105 fewer lines, 291 tests green.
@vineetver vineetver force-pushed the staar/77-coding-mask-metasvm-flip branch from 2d62651 to 8df2c9d Compare April 16, 2026 20:49
@vineetver vineetver merged commit 9bbc219 into master Apr 16, 2026
3 checks passed
@vineetver vineetver deleted the staar/77-coding-mask-metasvm-flip branch April 16, 2026 21:20

Development

Successfully merging this pull request may close these issues.

Coding masks: add pLoF+disruptive_missense, align PTV+DS with STAARpipeline

1 participant