staar: favor grm subcommand for FastSparseGRM#131

Merged
vineetver merged 2 commits into master from staar/99-fast-sparse-grm
Apr 17, 2026

vineetver (Owner) commented Apr 16, 2026

Builds a sparse ancestry-adjusted GRM + PCA scores from a pre-built cohort store and KING IBD segment output. Mirrors the full FastSparseGRM pipeline (Lin & Dey, Nature Genetics 2024) end-to-end on the carrier-indexed genotype store.

Per-chromosome architecture: for each chromosome, the builder opens one ChromosomeView, walks its carriers, updates the accumulators, and releases the view before moving on. Peak memory is therefore one chromosome's mmap plus the accumulators. This matches the structure of the existing STAAR scoring loop, so HPC operators can parallelize by chromosome.
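A minimal sketch of that per-chromosome loop follows. It assumes a hypothetical in-memory `ChromosomeView` that exposes each variant's carriers as `(sample, dosage)` pairs; the real view is mmap-backed and its API is not shown in this PR. The accumulator here is a plain per-sample alt-allele count, standing in for whatever the GRM and PCA passes actually accumulate.

```rust
/// Hypothetical in-memory stand-in for the cohort store's per-chromosome view:
/// for each variant, the (sample index, alt-allele dosage) pairs of its carriers.
struct ChromosomeView {
    variants: Vec<Vec<(usize, u8)>>,
}

/// Hypothetical store layout: (chromosome name, carrier lists per variant).
type Store = [(String, Vec<Vec<(usize, u8)>>)];

impl ChromosomeView {
    /// Stand-in for opening (mmapping) one chromosome from the store.
    fn open(store: &Store, chrom: &str) -> Option<Self> {
        store
            .iter()
            .find(|(name, _)| name.as_str() == chrom)
            .map(|(_, variants)| ChromosomeView { variants: variants.clone() })
    }
}

/// Walks one chromosome at a time, summing alt-allele dosage per sample.
fn accumulate_dosage(store: &Store, chroms: &[&str], n_samples: usize) -> Vec<u64> {
    let mut acc = vec![0u64; n_samples];
    for &chrom in chroms {
        let Some(view) = ChromosomeView::open(store, chrom) else {
            continue; // chromosome absent from this store
        };
        for carriers in &view.variants {
            for &(sample, dosage) in carriers {
                acc[sample] += u64::from(dosage);
            }
        }
        // `view` goes out of scope here, so only one chromosome is resident at a time.
    }
    acc
}
```

For additive accumulators like this one, each chromosome's contribution is independent, so per-chromosome jobs can each produce a partial accumulator that is summed afterwards.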

New module src/staar/grm/:
- king.rs: KING .seg parser + union-find components
- unrelated.rs: greedy unrelated selection with packed-byte divergence and 256x256 lookup tables
- pca.rs: carrier-indexed `G*v` / `G'*v` and randomized PCA
- estimate.rs: block-wise ISAF-adjusted kinship with full two-pass re-estimation
- cache.rs: cache under cohorts//grm/

GRM output is a 3-column TSV loadable directly by the existing kinship::load_kinship path.
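king.rs groups related samples into connected components. A minimal union-find sketch of that step, assuming the `.seg` rows have already been parsed into `(sample_a, sample_b, score)` triples with integer sample indices (the parsing itself is omitted). The `min_kinship` cutoff and the function names are hypothetical; 0.0442 (about 2^-4.5) is the conventional KING lower bound for third-degree relatives and is used below only as an example.

```rust
use std::collections::BTreeMap;

/// Disjoint-set forest with path halving and union by size.
struct UnionFind {
    parent: Vec<usize>,
    size: Vec<usize>,
}

impl UnionFind {
    fn new(n: usize) -> Self {
        UnionFind { parent: (0..n).collect(), size: vec![1; n] }
    }

    fn find(&mut self, mut x: usize) -> usize {
        while self.parent[x] != x {
            self.parent[x] = self.parent[self.parent[x]]; // path halving
            x = self.parent[x];
        }
        x
    }

    fn union(&mut self, a: usize, b: usize) {
        let (mut ra, mut rb) = (self.find(a), self.find(b));
        if ra == rb {
            return;
        }
        if self.size[ra] < self.size[rb] {
            std::mem::swap(&mut ra, &mut rb);
        }
        self.parent[rb] = ra;
        self.size[ra] += self.size[rb];
    }
}

/// Groups samples into related components, keeping only pairs whose score is
/// at or above `min_kinship` (hypothetical degree cutoff).
fn related_components(n: usize, pairs: &[(usize, usize, f64)], min_kinship: f64) -> Vec<Vec<usize>> {
    let mut uf = UnionFind::new(n);
    for &(a, b, score) in pairs {
        if score >= min_kinship {
            uf.union(a, b);
        }
    }
    let mut groups: BTreeMap<usize, Vec<usize>> = BTreeMap::new();
    for i in 0..n {
        let root = uf.find(i);
        groups.entry(root).or_default().push(i);
    }
    // Singletons have no relative above the cutoff; drop them.
    groups.into_values().filter(|g| g.len() > 1).collect()
}
```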

The --grm flag on favor staar wires the sealed GRM artifact into the pipeline: it loads the kinship matrix and automatically injects the PCA scores as covariates. --grm is mutually exclusive with --kinship, and it rejects PC* columns in --covariates to prevent double adjustment for ancestry. Passing --grm with no value infers the path from --cohort.
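The `--grm` argument checks can be sketched as a small validation function. `StaarArgs`, its fields, and `validate_grm_args` are hypothetical stand-ins for the real CLI types; `PC*` is interpreted as any covariate column whose name starts with `PC`, and inferring the path from `--cohort` when `--grm` has no value is left out.

```rust
/// Hypothetical subset of `favor staar` arguments relevant to `--grm`.
struct StaarArgs {
    grm: Option<String>,
    kinship: Option<String>,
    covariate_columns: Vec<String>,
}

/// Rejects argument combinations that `--grm` does not allow.
fn validate_grm_args(args: &StaarArgs) -> Result<(), String> {
    if args.grm.is_none() {
        return Ok(()); // constraints only apply when --grm is present
    }
    if args.kinship.is_some() {
        return Err("--grm and --kinship are mutually exclusive".to_string());
    }
    // PCA scores from the GRM artifact are injected as covariates, so
    // user-supplied PC columns would adjust for ancestry twice.
    if let Some(col) = args.covariate_columns.iter().find(|c| c.starts_with("PC")) {
        return Err(format!("--covariates column {col} conflicts with PCs injected by --grm"));
    }
    Ok(())
}
```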

Closes #99

Adds `favor grm --cohort <id> --king-seg <file>` which builds a sparse
ancestry-adjusted GRM + PCA scores from a pre-built cohort store and
KING IBD segment output. Mirrors upstream FastSparseGRM (Lin & Dey 2024)
end-to-end: KING .seg parsing and degree filtering (R/getUnrels.R,
R/calcGRM.R:29-68), greedy unrelated selection with ancestry divergence
tie-breaking (R/getUnrels.R:81-125,
cppFunct.cpp:calculateDivergence:525-576), randomized PCA via power
iteration on carrier-indexed `G*v` / `G'*v` operations
(R/runPCA.R:drpca:2-78, cppFunct.cpp:postmultiply:252-283 and
premultiply:331-366), and block-wise ISAF-adjusted kinship estimation
with full two-pass re-estimation for large components
(R/calcGRM.R:72-173).
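The carrier-indexed products are what keep PCA cheap. Assuming standardized genotypes G~_ij = (G_ij - 2p_j) / sqrt(2p_j(1 - p_j)), the mean-centering term factors out, so G~ v and G~' u touch only carrier entries plus one rank-one correction. The sketch below assumes that standardization and a hypothetical `Genotypes` layout, and it deliberately simplifies the upstream block randomized SVD (drpca) to single-vector power iteration on G~ G~', which extracts only the leading PC.

```rust
/// Hypothetical carrier-indexed layout: for each variant j, the (sample, dosage)
/// pairs with nonzero dosage, plus its alt-allele frequency p_j in (0, 1).
struct Genotypes {
    n_samples: usize,
    carriers: Vec<Vec<(usize, f64)>>,
    freq: Vec<f64>,
}

impl Genotypes {
    /// s_j = sqrt(2 p_j (1 - p_j)); zero for monomorphic variants (excluded by assumption).
    fn scale(&self, j: usize) -> f64 {
        let p = self.freq[j];
        (2.0 * p * (1.0 - p)).sqrt()
    }

    /// y = G~ v: sum carrier contributions, then subtract one shared centering term.
    fn postmultiply(&self, v: &[f64]) -> Vec<f64> {
        let mut y = vec![0.0; self.n_samples];
        let mut shift = 0.0; // sum_j 2 p_j v_j / s_j, applied to every sample
        for (j, carriers) in self.carriers.iter().enumerate() {
            let w = v[j] / self.scale(j);
            for &(i, g) in carriers {
                y[i] += g * w;
            }
            shift += 2.0 * self.freq[j] * w;
        }
        y.iter_mut().for_each(|yi| *yi -= shift);
        y
    }

    /// w = G~' u: carrier dot product minus 2 p_j * sum(u), scaled by 1 / s_j.
    fn premultiply(&self, u: &[f64]) -> Vec<f64> {
        let u_sum: f64 = u.iter().sum();
        self.carriers
            .iter()
            .enumerate()
            .map(|(j, carriers)| {
                let dot: f64 = carriers.iter().map(|&(i, g)| g * u[i]).sum();
                (dot - 2.0 * self.freq[j] * u_sum) / self.scale(j)
            })
            .collect()
    }

    /// Leading PC sample scores by power iteration on G~ G~' (unit-normalized).
    fn top_pc(&self, iters: usize) -> Vec<f64> {
        let mut u: Vec<f64> = (0..self.n_samples).map(|i| 1.0 + i as f64).collect();
        for _ in 0..iters {
            u = self.postmultiply(&self.premultiply(&u));
            let norm = u.iter().map(|x| x * x).sum::<f64>().sqrt();
            u.iter_mut().for_each(|x| *x /= norm);
        }
        u
    }
}
```

Neither product ever materializes the dense standardized matrix, so the cost per iteration scales with the number of carrier entries rather than samples times variants.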

Per-chromosome architecture matches the existing STAAR scoring loop:
each chromosome opens one ChromosomeView, walks carriers, accumulates,
releases. Peak memory is one chromosome mmap plus accumulators. GRM
output is a 3-column TSV loadable directly by the existing
kinship::load_kinship path.
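A sketch of writing sparse GRM entries as a 3-column TSV. The header names, the upper-triangle ordering, the skipping of zero entries, and the six-decimal formatting are all assumptions, since the exact format kinship::load_kinship expects is not spelled out here; `sparse_grm_tsv` is a hypothetical name.

```rust
use std::fmt::Write;

/// Writes nonzero GRM entries as `id1<TAB>id2<TAB>kinship` lines, normalizing each
/// pair so the smaller index comes first (assumed upper-triangle convention).
fn sparse_grm_tsv(ids: &[&str], entries: &[(usize, usize, f64)]) -> String {
    let mut out = String::from("ID1\tID2\tKinship\n"); // assumed header
    for &(i, j, kin) in entries {
        if kin == 0.0 {
            continue; // sparse output: omit structural zeros
        }
        let (a, b) = if i <= j { (i, j) } else { (j, i) };
        writeln!(out, "{}\t{}\t{:.6}", ids[a], ids[b], kin)
            .expect("writing to a String cannot fail");
    }
    out
}
```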

New module src/staar/grm/ with king.rs (KING parser + union-find
components), unrelated.rs (greedy selection + packed-byte divergence
with 256x256 lookup tables), pca.rs (allele freq + postmultiply +
premultiply + randomized SVD), estimate.rs (block-wise kinship +
two-pass), cache.rs (fingerprint + probe + save under .cohort/cache/grm/),
types.rs. CLI surface: one new subcommand, no changes to favor staar.
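The packed-byte lookup-table technique in unrelated.rs can be illustrated with 2-bit genotype codes packed four per byte: a 256x256 table precomputes, for every pair of bytes, the summed per-genotype difference and the number of slot pairs where both genotypes are present, so comparing two samples costs one lookup per byte. The encoding (0/1/2 alt-allele counts, 3 = missing), the helper names, and the metric (mean absolute genotype difference) are assumptions; this sketch does not reproduce the upstream calculateDivergence statistic.

```rust
/// Packs genotype codes (0, 1, 2, or 3 = missing) four per byte, low bits first.
fn pack(genotypes: &[u8]) -> Vec<u8> {
    genotypes
        .chunks(4)
        .map(|chunk| {
            // Start all four slots as missing (0b11) so a short final chunk is padded safely.
            chunk.iter().enumerate().fold(0xFFu8, |byte, (slot, &g)| {
                let shift = 2 * slot;
                (byte & !(0b11u8 << shift)) | ((g & 0b11) << shift)
            })
        })
        .collect()
}

/// table[a * 256 + b] = (sum of |g_a - g_b| over non-missing slots, non-missing slot count).
fn build_table() -> Vec<(u8, u8)> {
    let mut table = vec![(0u8, 0u8); 256 * 256];
    for a in 0..256usize {
        for b in 0..256usize {
            let (mut dist, mut n) = (0u8, 0u8);
            for slot in 0..4 {
                let ga = (a >> (2 * slot)) & 0b11;
                let gb = (b >> (2 * slot)) & 0b11;
                if ga != 3 && gb != 3 {
                    dist += (ga as i32 - gb as i32).unsigned_abs() as u8;
                    n += 1;
                }
            }
            table[a * 256 + b] = (dist, n);
        }
    }
    table
}

/// Mean per-genotype distance between two packed rows; None if no shared calls.
fn packed_distance(table: &[(u8, u8)], row_a: &[u8], row_b: &[u8]) -> Option<f64> {
    let (mut dist, mut n) = (0u64, 0u64);
    for (&a, &b) in row_a.iter().zip(row_b) {
        let (d, c) = table[(a as usize) * 256 + b as usize];
        dist += u64::from(d);
        n += u64::from(c);
    }
    (n > 0).then(|| dist as f64 / n as f64)
}
```

The table is built once (65,536 entries) and shared across all sample pairs, which is what makes the byte-at-a-time comparison pay off when many candidate pairs are scored.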
vineetver merged commit 1ab9f30 into master Apr 17, 2026
3 checks passed
vineetver deleted the staar/99-fast-sparse-grm branch April 17, 2026 16:25

Linked issue: Sparse ancestry-adjusted GRM builder (FastSparseGRM / Lin-Dey)
