feat(rvdna): add health biomarker analysis engine with streaming simulation by ruvnet · Pull Request #199 · ruvnet/RuVector

ruvnet · 2026-02-22T14:58:41Z

Implement ADR-014 Health Biomarker Analysis Architecture:

- biomarker.rs: Composite risk scoring engine with 17-SNP weight matrix, gene-gene interaction modifiers (COMT×OPRM1, MTHFR compound, BRCA1×TP53), 64-dim HNSW-aligned profile vectors, clinical reference ranges for 12 biomarkers, and deterministic synthetic population generation
- biomarker_stream.rs: Streaming biomarker simulator with generic RingBuffer, configurable noise/drift/anomaly injection, z-score anomaly detection, linear regression trend analysis, and exponential moving averages
- 35 unit tests + 15 integration tests (168 total, 0 failures)
- Criterion benchmark suite targeting ADR-014 performance budgets

https://claude.ai/code/session_014FpaYVohmyLH5dcBZTgmSY
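
The streaming primitives named in the description (exponential moving average, z-score anomaly detection, linear regression trend) can be sketched roughly as follows. This is an illustrative sketch only, not the code in biomarker_stream.rs: the function names, the use of sample standard deviation, and seeding the EMA with the first reading are all assumptions.

```rust
/// Exponential moving average step: alpha * x + (1 - alpha) * previous.
/// Seeding with the first observation is an assumption of this sketch.
fn ema_update(prev: Option<f64>, x: f64, alpha: f64) -> f64 {
    match prev {
        Some(p) => alpha * x + (1.0 - alpha) * p,
        None => x,
    }
}

/// z-score of `x` against a window of earlier readings (sample std dev).
/// Returns None when the window is too short or has no spread.
fn z_score(window: &[f64], x: f64) -> Option<f64> {
    if window.len() < 2 {
        return None;
    }
    let n = window.len() as f64;
    let mean = window.iter().sum::<f64>() / n;
    let var = window.iter().map(|v| (v - mean).powi(2)).sum::<f64>() / (n - 1.0);
    let std = var.sqrt();
    if std > 0.0 {
        Some((x - mean) / std)
    } else {
        None
    }
}

/// Flags a reading whose absolute z-score exceeds `threshold` (hypothetical helper).
fn is_anomaly(window: &[f64], x: f64, threshold: f64) -> bool {
    z_score(window, x).map_or(false, |z| z.abs() > threshold)
}

/// Least-squares slope of `values` against their indices 0..n.
fn trend_slope(values: &[f64]) -> Option<f64> {
    if values.len() < 2 {
        return None;
    }
    let n = values.len() as f64;
    let x_mean = (n - 1.0) / 2.0;
    let y_mean = values.iter().sum::<f64>() / n;
    let mut num = 0.0;
    let mut den = 0.0;
    for (i, y) in values.iter().enumerate() {
        let dx = i as f64 - x_mean;
        num += dx * (y - y_mean);
        den += dx * dx;
    }
    Some(num / den)
}
```

With a window of `[10.0, 11.0, 9.0, 10.0, 10.0]`, a reading of 20.0 would be flagged at a threshold of 3.0 while 10.5 would not; a series rising by 2 per step has slope 2.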

… halve ring buffer memory

- Fix snp_idx silent fallback: unwrap_or(0) masked missing SNPs with incorrect index-0 lookups; now returns Option<usize>
- RingBuffer: eliminate Option<T> wrapper, halving per-slot memory for f64 (8 bytes vs 16); use T::Default instead
- window_mean_std: replace two-pass sum+variance with single-pass Welford's online algorithm (2x fewer cache misses)
- compute_risk_scores: pre-compute category max scores via category_meta() to avoid re-scanning SNP_WEIGHTS per call; use &str keys in intermediate HashMap to reduce String allocations
- HashMap capacity hints throughout (StreamProcessor, genotypes, biomarker_values, cat_scores) to eliminate rehashing
- generate_synthetic_population: hoist APOE lookup out of inner loop, reserve biomarker_values capacity upfront
- All 48 tests pass (33 unit + 15 integration), benchmark compiles

https://claude.ai/code/session_014FpaYVohmyLH5dcBZTgmSY
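
For readers unfamiliar with the single-pass approach mentioned for window_mean_std, a minimal sketch of Welford's online algorithm follows. The struct and function shapes here are hypothetical, and the choice of sample (n - 1) standard deviation is an assumption; the actual function in the PR may differ.

```rust
/// Running mean/variance accumulator (Welford's online algorithm).
#[derive(Debug, Default, Clone, Copy)]
struct Welford {
    n: u64,
    mean: f64,
    m2: f64, // sum of squared deviations from the current mean
}

impl Welford {
    /// Folds one observation into the running statistics.
    fn push(&mut self, x: f64) {
        self.n += 1;
        let delta = x - self.mean;
        self.mean += delta / self.n as f64;
        // Uses the *updated* mean for the second factor.
        self.m2 += delta * (x - self.mean);
    }

    fn mean(&self) -> f64 {
        self.mean
    }

    /// Sample standard deviation (n - 1 denominator); 0.0 for fewer than 2 points.
    fn sample_std(&self) -> f64 {
        if self.n < 2 {
            0.0
        } else {
            (self.m2 / (self.n - 1) as f64).sqrt()
        }
    }
}

/// Hypothetical stand-in for a window_mean_std-style helper: one pass over the data.
fn window_mean_std(values: &[f64]) -> (f64, f64) {
    let mut acc = Welford::default();
    for &v in values {
        acc.push(v);
    }
    (acc.mean(), acc.sample_std())
}
```

The single pass touches each value once, which is where the reduction in memory traffic relative to a separate sum pass and variance pass comes from.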

…fehacks clinical data

Evidence-based adjustments from geneticlifehacks.com research articles:

- MTHFR C677T (rs1801133): het weight 0.30→0.35 to match documented 40% enzyme activity decrease
- MTHFR A1298C (rs1801131): het 0.15→0.10, hom_alt 0.35→0.25 to match documented ~20% enzyme decrease
- Homocysteine reference range: 4-12→5-15 μmol/L (clinical consensus), critical_high 50→30 (moderate hyperhomocysteinemia threshold)
- Add MTHFR A1298C × COMT interaction (1.25x Neurological): A1298C homozygous + COMT slow = amplified depression risk
- Add DRD2/ANKK1 × COMT interaction (1.2x Neurological): rs1800497 × Val158Met working memory interaction
- Guard vector encoding with .take(4) so expanded interaction table (now 6 entries) doesn't overflow dims 56-59

Sources:

- geneticlifehacks.com/mthfr/ (enzyme activity percentages)
- geneticlifehacks.com/mthfr-c677t/ (MTHFR-COMT depression data)
- geneticlifehacks.com/understanding-homocysteine-levels/ (ref ranges)
- geneticlifehacks.com/dopamine-receptor-genes/ (DRD2×COMT interaction)

All 48 tests pass (33 unit + 15 integration), benchmark compiles.

https://claude.ai/code/session_014FpaYVohmyLH5dcBZTgmSY
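
The `.take(4)` guard can be illustrated with a small sketch. Only the dimension range 56-59 and the table size of 6 come from the commit message; the constant names, the function, and what exactly is written into each slot are assumptions made for illustration.

```rust
const VECTOR_DIM: usize = 64;
const INTERACTION_START: usize = 56; // dims 56..=59 per the commit message
const INTERACTION_SLOTS: usize = 4;

/// Writes at most `INTERACTION_SLOTS` interaction modifiers into the profile
/// vector. Without `.take(..)`, a 6-entry table would spill into dims 60 and 61.
fn encode_interactions(vector: &mut [f32; VECTOR_DIM], modifiers: &[f32]) {
    for (slot, &m) in modifiers.iter().take(INTERACTION_SLOTS).enumerate() {
        vector[INTERACTION_START + slot] = m;
    }
}
```

Passing a 6-element slice leaves dims 60 and above untouched, so whatever the layout reserves there is not overwritten as the interaction table grows.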

Evidence-based refinements from peer-reviewed clinical research:

- TP53 rs1042522 (Pro72Arg): hom_ref 0.10→0.00; CC/Pro/Pro is not independently risk-associated, and the prior non-zero baseline was unjustified
- BRCA2 rs11571833 (K3326X): het 0.25→0.20; aligned with iCOGS meta-analysis OR 1.28 for breast cancer (Meeks et al., JNCI 2016, 76,637 cases / 83,796 controls)
- NQO1 rs1800566 (Pro187Ser): het 0.20→0.15, hom_alt 0.45→0.30; aligned with comprehensive meta-analysis OR 1.18 for TT vs CC (Lajin & Alachkar, Br J Cancer 2013, 92 studies, 21,178 cases); larger 2022 meta-analysis (43,736 cases) found no overall association

Validated unchanged weights against SOTA evidence:

- APOE rs429358: OR 3-4x het, 8-15x hom (Belloy JAMA Neurology 2023)
- SLCO1B1 rs4363657: OR 4.5/allele, 16.9 hom (SEARCH/NEJM; CPIC 2022)
- COMT×OPRM1 interaction: confirmed p=0.037 (orthopedic trauma study)

All 48 tests pass (33 unit + 15 integration).

https://claude.ai/code/session_014FpaYVohmyLH5dcBZTgmSY

…tion, and interaction tests

- Add gene→biomarker correlations in synthetic population: APOE e4→lower HDL/higher triglycerides, MTHFR→lower B12, NQO1 null→higher CRP
- Add CUSUM changepoint detection algorithm to StreamProcessor for detecting sustained biomarker shifts beyond simple anomaly detection
- Add 4 new integration tests: MTHFR×COMT interaction, DRD2×COMT interaction, APOE→HDL population correlation, CUSUM changepoint detection
- Remove unused variant_categories import
- All 172 tests pass, all ADR-014 performance targets exceeded

https://claude.ai/code/session_014FpaYVohmyLH5dcBZTgmSY
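
To show how CUSUM differs from a per-reading z-score check, here is a minimal two-sided tabular CUSUM sketch. It is not the StreamProcessor implementation: the struct, the parameter names (`target`, `slack`, `threshold`), and the reset-on-alarm behavior are assumptions.

```rust
/// Two-sided tabular CUSUM detector (illustrative sketch).
struct Cusum {
    target: f64,    // expected level of the biomarker
    slack: f64,     // allowance: deviations smaller than this are ignored
    threshold: f64, // alarm when a cumulative sum exceeds this
    pos: f64,       // accumulated upward drift
    neg: f64,       // accumulated downward drift
}

impl Cusum {
    fn new(target: f64, slack: f64, threshold: f64) -> Self {
        Self { target, slack, threshold, pos: 0.0, neg: 0.0 }
    }

    /// Feeds one reading; returns true when a sustained shift is detected.
    fn update(&mut self, x: f64) -> bool {
        self.pos = (self.pos + x - self.target - self.slack).max(0.0);
        self.neg = (self.neg + self.target - x - self.slack).max(0.0);
        let alarm = self.pos > self.threshold || self.neg > self.threshold;
        if alarm {
            // Restart accumulation after signalling (a design choice of this sketch).
            self.pos = 0.0;
            self.neg = 0.0;
        }
        alarm
    }
}

/// Index of the first reading at which the detector fires, if any.
fn first_changepoint(readings: &[f64], target: f64, slack: f64, threshold: f64) -> Option<usize> {
    let mut detector = Cusum::new(target, slack, threshold);
    readings.iter().position(|&x| detector.update(x))
}
```

A small persistent shift (for example, 100 → 102 with a slack of 0.5 and threshold of 4) accumulates over consecutive readings and trips the alarm on the third shifted reading, even though no single reading would look anomalous on its own.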

- Add Health Biomarker Engine section to rvDNA README with usage examples for composite risk scoring, streaming processing, and synthetic populations
- Add biomarker.rs and biomarker_stream.rs to Modules table
- Update test count from 102 to 172 (added biomarker tests)
- Add biomarker benchmark results to Speed table
- Add Welford, CUSUM, and PRS to Published Algorithms table
- Update root README Genomics & Health capabilities (49 → 51 features)
- Add health biomarker engine and streaming biomarkers to root feature table
- Update rvDNA details section with risk scoring and streaming capabilities

https://claude.ai/code/session_014FpaYVohmyLH5dcBZTgmSY

…eaming

Structural improvements from deep code review:

- Consolidate 5 parallel arrays (SNP_WEIGHTS, HOM_REF, HOM_ALT, HET, ALLELE_FREQS) into single SnpDef struct array; eliminates entire class of parallel-array misalignment bugs
- Cache category_meta() with LazyLock; avoids per-call Vec allocation (critical in generate_synthetic_population hot path)
- Hoist Normal::new out of inner loop in generate_readings; pre-compute distributions per biomarker instead of per-step*per-biomarker
- Add clinically meaningful lower bounds: LDL normal_low 0→50 mg/dL (critical_low 25), Triglycerides normal_low 0→35 mg/dL (critical_low 20)
- Optimize RingBuffer::clear from O(capacity) to O(1); head/len reset is sufficient since push overwrites before read
- Use NUM_SNPS const for vector encoding bounds instead of magic number 51

All 172 tests pass, zero clippy warnings for rvdna.

https://claude.ai/code/session_014FpaYVohmyLH5dcBZTgmSY
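
The RingBuffer changes described across these commits (slots filled with `T::default()` instead of `Option<T>`, and an O(1) `clear`) can be sketched as below. This is a hypothetical reconstruction, not the PR's code: field names, trait bounds, and the iterator shape are assumptions.

```rust
/// Fixed-capacity ring buffer that stores `T` directly rather than `Option<T>`.
struct RingBuffer<T> {
    buf: Vec<T>,
    head: usize, // index of the next slot to write
    len: usize,  // number of valid elements (<= capacity)
}

impl<T: Default + Clone> RingBuffer<T> {
    fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be non-zero");
        Self { buf: vec![T::default(); capacity], head: 0, len: 0 }
    }

    /// Appends a value, overwriting the oldest one once the buffer is full.
    fn push(&mut self, value: T) {
        let cap = self.buf.len();
        self.buf[self.head] = value;
        self.head = (self.head + 1) % cap;
        if self.len < cap {
            self.len += 1;
        }
    }

    /// Iterates from oldest to newest over the valid elements only.
    fn iter(&self) -> impl Iterator<Item = &T> + '_ {
        let cap = self.buf.len();
        let start = (self.head + cap - self.len) % cap;
        (0..self.len).map(move |i| &self.buf[(start + i) % cap])
    }

    fn len(&self) -> usize {
        self.len
    }

    /// O(1): stale slots are never read because `iter` is bounded by `len`,
    /// and `push` overwrites a slot before it becomes readable again.
    fn clear(&mut self) {
        self.head = 0;
        self.len = 0;
    }
}
```

Because reads are bounded by `len`, leftover values in the backing `Vec` after `clear` are unreachable, which is why resetting the two indices is enough.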

…ence

Add rs10455872 (OR 1.6-1.75/allele CHD) and rs3798220 (OR 1.49-1.54/allele) from 2024 LPA meta-analyses. Include Lp(a) biomarker reference (0-75 nmol/L) and gene-biomarker correlation in population model. Separate NUM_ONEHOT_SNPS (17) from NUM_SNPS (19) to preserve 64-dim vector layout with LPA encoded in summary dimension 63.

https://claude.ai/code/session_014FpaYVohmyLH5dcBZTgmSY

Add PCSK9 R46L loss-of-function variant (NEJM 2006: OR 0.77 CHD, 0.40 MI) as a protective cardiovascular SNP with negative weights. Include PCSK9→LDL-C biomarker correlation (15-21% lower LDL in carriers). Refactor gene-biomarker correlations from match to additive if-chain so multiple gene effects can stack on the same biomarker (e.g., APOE raises LDL while PCSK9 R46L lowers it). Panel expanded to 20 SNPs.

https://claude.ai/code/session_014FpaYVohmyLH5dcBZTgmSY
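
The match-to-if-chain refactor can be pictured with a small sketch. Everything here is illustrative: the `Genotype` struct, the function name, and both effect sizes are hypothetical (the 0.18 factor is simply a value inside the 15-21% range quoted above, and the APOE shift is a made-up placeholder).

```rust
/// Hypothetical per-subject genotype summary, for illustration only.
struct Genotype {
    apoe_e4_alleles: u8,      // 0, 1, or 2 copies of APOE e4
    pcsk9_r46l_carrier: bool, // carries the protective PCSK9 R46L variant
}

// Placeholder effect sizes; NOT the values used in the PR.
const APOE_LDL_SHIFT_PER_ALLELE: f64 = 10.0; // mg/dL, assumed
const PCSK9_LDL_REDUCTION: f64 = 0.18;       // fraction, within the quoted 15-21%

/// Applies every matching gene effect to the LDL baseline.
/// A `match` on the gene would take exactly one arm; independent `if`
/// statements let several effects stack on the same biomarker.
fn adjust_ldl(baseline: f64, g: &Genotype) -> f64 {
    let mut ldl = baseline;
    if g.apoe_e4_alleles > 0 {
        ldl += APOE_LDL_SHIFT_PER_ALLELE * g.apoe_e4_alleles as f64;
    }
    if g.pcsk9_r46l_carrier {
        ldl -= baseline * PCSK9_LDL_REDUCTION;
    }
    ldl
}
```

Under these placeholder numbers, a subject with one e4 allele and the R46L variant starting from 100 mg/dL ends up at 92 mg/dL: the raising and lowering effects both apply.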

Update all references from 17 SNPs to 20 SNPs reflecting the addition of LPA rs10455872/rs3798220 and PCSK9 rs11591147. Document new gene-biomarker correlations (LPA→Lp(a), PCSK9→LDL) in synthetic population section. Update module table line counts.

https://claude.ai/code/session_014FpaYVohmyLH5dcBZTgmSY

…nd benchmarks

ADR-015: Pure-JS biomarker engine mirroring Rust biomarker.rs and biomarker_stream.rs exactly. Includes:

- src/biomarker.js: 20-SNP composite risk scoring, 6 gene-gene interactions, 64-dim L2-normalized profile vectors, synthetic population generation with Mulberry32 PRNG
- src/stream.js: RingBuffer, StreamProcessor with Welford online stats, CUSUM changepoint detection, z-score anomaly detection, linear regression trend analysis, batch reading generation
- tests/test-biomarker.js: 35 tests + 5 benchmarks covering all classification levels, risk scoring, vector encoding, population generation, streaming, anomaly/trend detection
- index.d.ts: Full TypeScript definitions for all biomarker APIs
- package.json: Bump to v0.3.0, add biomarker keywords

Benchmark results (Node.js):

- computeRiskScores: 7.33 us/op
- encodeProfileVector: 9.51 us/op
- RingBuffer push+iter: 3.32 us/op

https://claude.ai/code/session_014FpaYVohmyLH5dcBZTgmSY
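
Mulberry32 is a small 32-bit seeded generator usually seen in JavaScript. For consistency with the Rust modules, the sketch below ports the commonly circulated formulation to Rust using wrapping 32-bit arithmetic. It is an assumption that src/biomarker.js uses this exact variant, and the struct and method names are hypothetical.

```rust
/// Mulberry32: a tiny deterministic PRNG with 32 bits of state.
struct Mulberry32 {
    state: u32,
}

impl Mulberry32 {
    fn new(seed: u32) -> Self {
        Self { state: seed }
    }

    /// Advances the state and returns the next 32-bit output.
    fn next_u32(&mut self) -> u32 {
        self.state = self.state.wrapping_add(0x6D2B_79F5);
        let mut t = self.state;
        t = (t ^ (t >> 15)).wrapping_mul(t | 1);
        t ^= t.wrapping_add((t ^ (t >> 7)).wrapping_mul(t | 61));
        t ^ (t >> 14)
    }

    /// Uniform value in [0, 1), matching the usual `/ 4294967296` scaling in JS.
    fn next_f64(&mut self) -> f64 {
        self.next_u32() as f64 / 4_294_967_296.0
    }
}
```

Seeding two generators with the same value yields the same sequence, which is the property a deterministic synthetic population relies on.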

Optimizations (1.7-2x speedup across all hot paths):

- biomarker.js: Replace O(n) findIndex with pre-built RSID_INDEX Map for O(1) SNP lookups; cache LPA SNP references to avoid repeated array iteration in vector encoding and population generation
- stream.js: Add RingBuffer.pushPop() returning evicted value; replace O(n) windowMeanStd buffer scan with O(1) incremental windowed Welford algorithm in StreamProcessor

Benchmark improvements (before → after):

- computeRiskScores: 7.33 → 3.70 us/op (1.98x)
- encodeProfileVector: 9.51 → 5.25 us/op (1.81x)
- StreamProcessor.processReading: 220 → 110 us/op (2.00x)
- generateSyntheticPopulation(100): 1090 → 595 us/op (1.83x)

Real-data integration tests (25 new tests):

- 4 realistic 23andMe fixture files (29 SNPs each) covering: high-risk cardio, low-risk baseline, multi-risk, PCSK9-protective
- End-to-end pipeline: parse 23andMe → biomarker scoring → streaming
- Clinical scenarios: APOE e4/e4, BRCA1 carrier, MTHFR compound het, COMT×OPRM1 pain, DRD2×COMT, PCSK9 protective
- Cross-validation: 8 JS↔Rust parity assertions on tables, z-scores, classification, vector layout, risk thresholds
- Population correlations: APOE→HDL, LPA→Lp(a), score distribution, clinical biomarker range validation (500 subjects)
- Full pipeline benchmark: 220 us end-to-end

https://claude.ai/code/session_014FpaYVohmyLH5dcBZTgmSY
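
The O(1) windowed update works by correcting the running mean and sum of squared deviations for the value that leaves the window as well as the one that enters. The sketch below shows the idea in Rust rather than JavaScript, using `VecDeque::pop_front` in the role of the evicted value returned by `pushPop()`. Names and the sample-variance convention are assumptions, not the stream.js code.

```rust
use std::collections::VecDeque;

/// Sliding-window mean/variance with constant-time updates.
struct WindowedStats {
    window: VecDeque<f64>,
    capacity: usize,
    mean: f64,
    m2: f64, // sum of squared deviations over the current window
}

impl WindowedStats {
    fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be non-zero");
        Self { window: VecDeque::with_capacity(capacity), capacity, mean: 0.0, m2: 0.0 }
    }

    fn push(&mut self, x: f64) {
        if self.window.len() == self.capacity {
            // Full window: swap the oldest value for the new one.
            let old = self.window.pop_front().expect("window is non-empty");
            self.window.push_back(x);
            let old_mean = self.mean;
            self.mean += (x - old) / self.capacity as f64;
            self.m2 += (x - old) * (x - self.mean + old - old_mean);
        } else {
            // Still filling: ordinary Welford step.
            self.window.push_back(x);
            let delta = x - self.mean;
            self.mean += delta / self.window.len() as f64;
            self.m2 += delta * (x - self.mean);
        }
    }

    fn mean(&self) -> f64 {
        self.mean
    }

    /// Sample standard deviation; clamps tiny negative m2 from rounding to zero.
    fn sample_std(&self) -> f64 {
        let n = self.window.len();
        if n < 2 {
            0.0
        } else {
            (self.m2.max(0.0) / (n - 1) as f64).sqrt()
        }
    }
}
```

One trade-off worth noting: incremental removal can accumulate floating-point error over very long streams, so a periodic full recomputation from the window contents is a reasonable safeguard.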

claude added 13 commits February 22, 2026 05:19

style(rvdna): apply linter formatting to biomarker module

65d671d

https://claude.ai/code/session_014FpaYVohmyLH5dcBZTgmSY

ruvnet merged commit f957eb7 into main Feb 22, 2026
7 checks passed

ruvnet merged 13 commits into main from claude/health-biomarker-adr-ESZy4
