ingest: VariantReader trait for VCF #134

Merged
vineetver merged 1 commit into master from variant-reader-trait
Apr 22, 2026

@vineetver
Owner

Pulls VCF reading behind a small trait so the processing core stops
caring about format. Collapses the two near-duplicate record loops
(ingest/vcf.rs::process_record and staar/genotype.rs::process_record_geno)
into one RecordContext::process path. FormatHandler::open_reader
is the extension point for new readers.
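
The extension point described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the trait signatures, `InMemoryReader`, `StubVcfHandler`, the fields on `RecordContext`, and the `ingest` helper are all assumptions made up for the example. Only the names `VariantReader`, `FormatHandler::open_reader`, and `RecordContext::process` come from the description.

```rust
use std::io;

// Hypothetical, pared-down reader trait: it yields only a position per record.
trait VariantReader {
    fn for_each(&mut self, f: &mut dyn FnMut(u64) -> io::Result<()>) -> io::Result<()>;
}

// Hypothetical handler trait; open_reader is the single place a format plugs in.
trait FormatHandler {
    fn open_reader(&self, path: &str) -> io::Result<Box<dyn VariantReader>>;
}

// Stand-in reader that replays positions from memory instead of parsing a file.
struct InMemoryReader(Vec<u64>);

impl VariantReader for InMemoryReader {
    fn for_each(&mut self, f: &mut dyn FnMut(u64) -> io::Result<()>) -> io::Result<()> {
        for &pos in &self.0 {
            f(pos)?;
        }
        Ok(())
    }
}

// Stand-in handler; a real one would open `path` and build a VCF reader.
struct StubVcfHandler;

impl FormatHandler for StubVcfHandler {
    fn open_reader(&self, _path: &str) -> io::Result<Box<dyn VariantReader>> {
        Ok(Box::new(InMemoryReader(vec![100, 250, 900])))
    }
}

// The one processing path; it never learns which format produced the record.
#[derive(Default)]
struct RecordContext {
    n_records: usize,
    last_pos: u64,
}

impl RecordContext {
    fn process(&mut self, pos: u64) -> io::Result<()> {
        self.n_records += 1;
        self.last_pos = pos;
        Ok(())
    }
}

// Format-agnostic driver: open a reader through the handler, feed every record
// to the shared context.
fn ingest(handler: &dyn FormatHandler, path: &str) -> io::Result<RecordContext> {
    let mut reader = handler.open_reader(path)?;
    let mut ctx = RecordContext::default();
    reader.for_each(&mut |pos: u64| ctx.process(pos))?;
    Ok(ctx)
}
```

Under these assumptions, adding a format means adding one `FormatHandler` implementation; `ingest` and `RecordContext` stay untouched.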

The trait uses a for_each callback instead of a lending iterator.
noodles 0.73's field wrappers (AlternateBases, Samples, ...)
expose their inner &'r str only through AsRef::as_ref(&self),
whose elided lifetime ties the returned &str to the &self borrow
rather than to 'r. A next_record-style API would therefore have
forced either ~225 GB of memcpy per UKB-scale cohort or
ouroboros/unsafe self-references. The callback form keeps the wrapper
locals alive inside the reader's loop, so nothing is copied.
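
A rough sketch of the callback shape, using only the standard library. It does not use noodles and is not the PR's trait: `RecordView`, `TextReader`, the `invalid` helper, and the exact `for_each` signature are hypothetical. The point it tries to show is that the line buffer lives in the reader's loop, and the callback only borrows from it for the duration of one call.

```rust
use std::io::{self, BufRead};

// Hypothetical borrowed view; every &str points into the reader's line buffer.
pub struct RecordView<'a> {
    pub chrom: &'a str,
    pub pos: u64,
    pub alt: &'a str,
}

// Hypothetical trait shape: the reader owns the loop and lends each record out.
pub trait VariantReader {
    fn for_each(
        &mut self,
        f: &mut dyn FnMut(RecordView<'_>) -> io::Result<()>,
    ) -> io::Result<()>;
}

// Toy tab-separated reader standing in for a real VCF parser.
pub struct TextReader<R> {
    pub inner: R,
}

fn invalid(msg: &str) -> io::Error {
    io::Error::new(io::ErrorKind::InvalidData, msg.to_string())
}

impl<R: BufRead> VariantReader for TextReader<R> {
    fn for_each(
        &mut self,
        f: &mut dyn FnMut(RecordView<'_>) -> io::Result<()>,
    ) -> io::Result<()> {
        let mut line = String::new(); // one buffer, reused for every record
        loop {
            line.clear();
            if self.inner.read_line(&mut line)? == 0 {
                return Ok(()); // end of input
            }
            let row = line.trim_end_matches(|c: char| c == '\n' || c == '\r');
            if row.is_empty() || row.starts_with('#') {
                continue; // skip blank lines and header lines
            }
            let mut cols = row.split('\t');
            let chrom = cols.next().ok_or_else(|| invalid("missing CHROM"))?;
            let pos = cols.next().ok_or_else(|| invalid("missing POS"))?;
            let alt = cols.nth(2).ok_or_else(|| invalid("missing ALT"))?; // skips ID, REF
            let pos = pos.parse::<u64>().map_err(|_| invalid("bad POS"))?;
            // The view borrows `line`; the borrow ends when the callback returns.
            f(RecordView { chrom, pos, alt })?;
        }
    }
}
```

A `next_record(&mut self) -> Option<RecordView<'_>>` method on the same reader would need the returned view to borrow from the reader itself, which is the lending-iterator problem the callback form avoids.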

Invariance golden and all ground-truth-vs-R tests bit-identical.
339/339 cargo test, clippy clean with -D warnings.

Side finding (not fixed here)

Writing the reader unit test surfaced a pre-existing MAF bug:
noodles_vcf::Record::samples() includes the FORMAT column
("GT\t0/0\t0/1"), and GenotypeWriter::push's memchr loop
consumes it as sample[0]. Slot 0 gets a spurious 0-dosage,
all real samples shift by one, the last sample is dropped.
Invariance is tautologically green (golden regenerated under the
same bug); ground-truth tests feed JSON fixtures so they miss it.
Affects every real VCF→MAF run. Separate ticket coming; fix is
trivial (strip FORMAT via one tab-split) but needs the golden
regenerated in the same commit.
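
To make the off-by-one concrete, here is a small stand-alone sketch that works on the plain string form quoted above ("GT\t0/0\t0/1"). It does not call noodles or GenotypeWriter; `split_format`, `dosage`, and `fill_slots` are hypothetical helpers, and it assumes GT-only sample fields and a writer with a fixed number of sample slots.

```rust
// Split "FORMAT\tsample1\tsample2..." into (FORMAT keys, remaining sample fields)
// with a single tab-split. If there is no tab, there are no samples.
fn split_format(samples: &str) -> (&str, &str) {
    samples.split_once('\t').unwrap_or((samples, ""))
}

// Hypothetical dosage: number of "1" alleles in a GT field like "0/1" or "1|1".
// Anything unrecognised (including the literal "GT") counts as 0.
fn dosage(gt: &str) -> u8 {
    gt.split(|c: char| c == '/' || c == '|')
        .filter(|allele| *allele == "1")
        .count() as u8
}

// Fill at most `n_samples` slots from tab-separated fields, mimicking a writer
// that was sized from the header's sample count.
fn fill_slots(fields: &str, n_samples: usize) -> Vec<u8> {
    if fields.is_empty() {
        return Vec::new();
    }
    fields.split('\t').take(n_samples).map(dosage).collect()
}
```

With three samples and the input "GT\t0/0\t0/1\t1/1", `fill_slots` on the raw string gives `[0, 0, 1]` (FORMAT consumed as sample 0, last sample dropped), while `fill_slots(split_format(raw).1, 3)` gives `[0, 1, 2]`.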

Closes #87.

Pull VCF reading behind a small VariantReader trait so new formats
(GDS, BGEN) can plug in without touching the processing core. Collapses
the duplicate record loops in ingest/vcf.rs and staar/genotype.rs into
one RecordContext path. FormatHandler::open_reader is the extension
point.

Invariance golden and the ground-truth-vs-R suite bit-identical.

Closes #87.
@vineetver vineetver merged commit cd52272 into master Apr 22, 2026
3 checks passed
@vineetver vineetver deleted the variant-reader-trait branch April 22, 2026 21:50

Development

Successfully merging this pull request may close these issues.

generalize ingest reader: VariantReader trait for VCF/BCF/BGEN/GDS