staar: favor grm subcommand for FastSparseGRM by vineetver · Pull Request #131 · vineetver/favor-cli

vineetver · 2026-04-16T23:44:50Z

Builds a sparse ancestry-adjusted GRM + PCA scores from a pre-built cohort store and KING IBD segment output. Mirrors the full FastSparseGRM pipeline (Lin & Dey, Nature Genetics 2024) end-to-end on the carrier-indexed genotype store.

Per-chromosome architecture: each chromosome opens one ChromosomeView, walks carriers, accumulates, releases. Peak memory is one chromosome mmap plus accumulators. Matches the existing STAAR scoring loop structure so HPC operators can parallelize by chromosome.
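The open, walk, accumulate, release pattern can be sketched roughly as follows. This is an illustration only: `ChromView`, `open_chromosome`, and the nested-`Vec` store are hypothetical stand-ins, since the real `ChromosomeView` API is not shown in this PR.

```rust
/// Hypothetical stand-in for `ChromosomeView`: each variant lists the
/// (sample index, dosage) pairs of its carriers only.
struct ChromView {
    variants: Vec<Vec<(usize, u8)>>,
}

/// Hypothetical loader. The real store would mmap one chromosome here;
/// this sketch just copies the in-memory data.
fn open_chromosome(store: &[Vec<Vec<(usize, u8)>>], chrom: usize) -> ChromView {
    ChromView { variants: store[chrom].clone() }
}

/// Accumulate per-sample alternate-allele counts, one chromosome at a time.
fn accumulate_allele_counts(store: &[Vec<Vec<(usize, u8)>>], n_samples: usize) -> Vec<u64> {
    let mut acc = vec![0u64; n_samples];
    for chrom in 0..store.len() {
        let view = open_chromosome(store, chrom);
        for carriers in &view.variants {
            for &(sample, dosage) in carriers {
                acc[sample] += u64::from(dosage);
            }
        }
        // `view` is dropped at the end of each iteration, so only one
        // chromosome plus the accumulator is resident at any time.
    }
    acc
}
```

Because each iteration touches only its own view and a shared accumulator, the loop body is also the natural unit to hand to separate per-chromosome jobs and sum afterwards.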

New module src/staar/grm/ with: KING .seg parser + union-find components (king.rs), greedy unrelated selection with packed-byte divergence + 256x256 lookup tables (unrelated.rs), carrier-indexed G*v/G'*v + randomized PCA (pca.rs), block-wise ISAF-adjusted kinship with full two-pass re-estimation (estimate.rs), cache under cohorts//grm/ (cache.rs). GRM output is a 3-column TSV loadable directly by the existing kinship::load_kinship path.
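For readers unfamiliar with the union-find step, here is a minimal sketch of grouping related pairs into connected components. It assumes sample IDs have already been mapped to dense indices and that the pairs have already passed degree filtering. `UnionFind` and `related_components` are illustrative names, not the ones in king.rs.

```rust
use std::collections::BTreeMap;

/// Minimal union-find with path halving and union by size.
struct UnionFind {
    parent: Vec<usize>,
    size: Vec<usize>,
}

impl UnionFind {
    fn new(n: usize) -> Self {
        UnionFind { parent: (0..n).collect(), size: vec![1; n] }
    }

    fn find(&mut self, mut x: usize) -> usize {
        while self.parent[x] != x {
            // Path halving: point x at its grandparent, then step up.
            let grandparent = self.parent[self.parent[x]];
            self.parent[x] = grandparent;
            x = grandparent;
        }
        x
    }

    fn union(&mut self, a: usize, b: usize) {
        let (mut ra, mut rb) = (self.find(a), self.find(b));
        if ra == rb {
            return;
        }
        if self.size[ra] < self.size[rb] {
            std::mem::swap(&mut ra, &mut rb);
        }
        self.parent[rb] = ra;
        self.size[ra] += self.size[rb];
    }
}

/// Group samples connected by at least one related pair.
/// Singletons (samples with no relatives) are omitted.
fn related_components(n: usize, pairs: &[(usize, usize)]) -> Vec<Vec<usize>> {
    let mut uf = UnionFind::new(n);
    for &(a, b) in pairs {
        uf.union(a, b);
    }
    let mut by_root: BTreeMap<usize, Vec<usize>> = BTreeMap::new();
    for i in 0..n {
        let root = uf.find(i);
        by_root.entry(root).or_default().push(i);
    }
    let mut comps: Vec<Vec<usize>> =
        by_root.into_values().filter(|c| c.len() > 1).collect();
    comps.sort(); // deterministic order: by smallest member
    comps
}
```

Each component can then be treated as an independent block, which is what keeps the resulting kinship matrix sparse.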

The --grm flag on favor staar wires the sealed GRM artifact into the pipeline. It loads the kinship matrix and injects PCA scores as covariates automatically. Mutually exclusive with --kinship. Rejects PC* columns in --covariates to prevent double-adjustment. Passing --grm with no value infers the path from --cohort.
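The argument checks described above could look roughly like this. It is a sketch under assumptions: the function signature and error type are hypothetical, and "PC* columns" is interpreted here as `PC` followed by one or more digits, which may be narrower or broader than the actual rule.

```rust
#[derive(Debug, PartialEq)]
enum GrmArgError {
    ConflictsWithKinship,
    PcCovariate(String),
}

/// Assumption: a "PC* column" is `PC` followed by at least one digit.
fn is_pc_column(name: &str) -> bool {
    name.strip_prefix("PC")
        .map_or(false, |rest| !rest.is_empty() && rest.bytes().all(|b| b.is_ascii_digit()))
}

/// Hypothetical validation for `favor staar` when `--grm` is present.
fn validate_grm_args(
    grm: bool,
    kinship: Option<&str>,
    covariates: &[&str],
) -> Result<(), GrmArgError> {
    if !grm {
        return Ok(()); // the checks below only apply with --grm
    }
    if kinship.is_some() {
        return Err(GrmArgError::ConflictsWithKinship);
    }
    // PCA scores are injected automatically, so user-supplied PC columns
    // would adjust for ancestry twice.
    if let Some(col) = covariates.iter().copied().find(|c| is_pc_column(c)) {
        return Err(GrmArgError::PcCovariate(col.to_string()));
    }
    Ok(())
}
```

Failing early on these combinations keeps the double-adjustment mistake from surfacing later as a silently biased association result.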

Closes #99

Adds `favor grm --cohort <id> --king-seg <file>` which builds a sparse ancestry-adjusted GRM + PCA scores from a pre-built cohort store and KING IBD segment output.

Mirrors upstream FastSparseGRM (Lin & Dey 2024) end-to-end: KING .seg parsing and degree filtering (R/getUnrels.R, R/calcGRM.R:29-68), greedy unrelated selection with ancestry divergence tie-breaking (R/getUnrels.R:81-125, cppFunct.cpp:calculateDivergence:525-576), randomized PCA via power iteration on carrier-indexed G*v / G'*v operations (R/runPCA.R:drpca:2-78, cppFunct.cpp:postmultiply:252-283 and premultiply:331-366), block-wise ISAF-adjusted kinship estimation with full two-pass re-estimation for large components (R/calcGRM.R:72-173).

Per-chromosome architecture matches the existing STAAR scoring loop: each chromosome opens one ChromosomeView, walks carriers, accumulates, releases. Peak memory is one chromosome mmap plus accumulators. GRM output is a 3-column TSV loadable directly by the existing kinship::load_kinship path.

New module src/staar/grm/ with king.rs (KING parser + union-find components), unrelated.rs (greedy selection + packed-byte divergence with 256x256 lookup tables), pca.rs (allele freq + postmultiply + premultiply + randomized SVD), estimate.rs (block-wise kinship + two-pass), cache.rs (fingerprint + probe + save under .cohort/cache/grm/), types.rs.

CLI surface: one new subcommand, no changes to favor staar.
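To illustrate why a carrier-indexed layout suits the G*v and G'*v products, here is a small sketch with a plain power iteration on top. It makes several simplifying assumptions: `CarrierIndex` is a hypothetical in-memory layout, genotypes are neither centered nor scaled by allele frequency, and only the leading singular value is estimated, whereas the PR describes a randomized PCA over multiple components.

```rust
/// Hypothetical carrier-indexed layout: for each variant, the
/// (sample index, dosage) pairs of its carriers. Non-carriers are implicit zeros.
type CarrierIndex = Vec<Vec<(usize, f64)>>;

/// y = G * v, where G is samples x variants and v has one entry per variant.
fn g_times_v(g: &CarrierIndex, n_samples: usize, v: &[f64]) -> Vec<f64> {
    let mut y = vec![0.0; n_samples];
    for (j, carriers) in g.iter().enumerate() {
        for &(i, dosage) in carriers {
            y[i] += dosage * v[j];
        }
    }
    y
}

/// z = G' * u, where u has one entry per sample.
fn gt_times_u(g: &CarrierIndex, u: &[f64]) -> Vec<f64> {
    g.iter()
        .map(|carriers| carriers.iter().map(|&(i, dosage)| dosage * u[i]).sum::<f64>())
        .collect()
}

fn norm(x: &[f64]) -> f64 {
    x.iter().map(|a| a * a).sum::<f64>().sqrt()
}

/// Estimate the leading singular value of G by iterating v <- G'(G v) / ||G'(G v)||.
fn top_singular_value(g: &CarrierIndex, n_samples: usize, iters: usize) -> f64 {
    let n_variants = g.len();
    if n_variants == 0 {
        return 0.0;
    }
    let mut v = vec![1.0 / (n_variants as f64).sqrt(); n_variants];
    for _ in 0..iters {
        let u = g_times_v(g, n_samples, &v);
        let mut w = gt_times_u(g, &u);
        let len = norm(&w);
        if len == 0.0 {
            return 0.0; // G is all zeros (or v is in its null space)
        }
        for x in &mut w {
            *x /= len;
        }
        v = w;
    }
    norm(&g_times_v(g, n_samples, &v))
}
```

Both products cost time proportional to the number of carrier entries rather than samples times variants, which is the point of indexing by carrier when most variants are rare.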

vineetver added 2 commits April 16, 2026 19:44

fix clippy unnecessary_sort_by on CI (rust 1.95)

15e1943

vineetver merged commit 1ab9f30 into master Apr 17, 2026
3 checks passed

vineetver deleted the staar/99-fast-sparse-grm branch April 17, 2026 16:25
