Add read support for GDS (CoreArray/SeqArray) genotype files so favor ingest cohort.gds --annotations ... works the same as the VCF path.
Approach
Vendor the CoreArray C++ library as a feature-gated -sys crate (crates/corearray-sys/), statically linked with no R dependency. The C++ core handles container parsing, compression (LZMA/LZ4/zlib), and bit2 genotype unpacking. The Rust side reads the /sample.id, /chromosome, /position, /allele, and /genotype/data nodes via FFI and produces the same GenotypeResult as the VCF extractor.
Everything downstream (annotation join, sparse store build, STAAR) stays untouched.
crates/
  corearray-sys/
    Cargo.toml    # links = "corearray", feature-gated
    build.rs      # cc::Build, compiles vendored C++
    src/lib.rs    # unsafe extern "C" bindings
    vendor/       # CoreArray C++ source + bundled zlib/lz4/lzma
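For reviewers unfamiliar with bit2 storage, here is a rough sketch of what the unpacking step amounts to. In this plan the vendored C++ core does this work, so the code below is illustrative only and not part of the proposed change. It assumes four 2-bit values per byte, lowest bits first, with the value 3 marking a missing allele. Both are assumptions to verify against the vendored source, and `unpack_bit2` and `MISSING` are hypothetical names.

```rust
/// 2-bit value assumed to mark a missing allele (assumption, not confirmed here).
const MISSING: u8 = 3;

/// Unpack `n` 2-bit values from `packed`, reading the lowest bits of each byte first.
/// Returns `None` if `packed` is too short to hold `n` values.
fn unpack_bit2(packed: &[u8], n: usize) -> Option<Vec<Option<u8>>> {
    let mut out = Vec::with_capacity(n);
    for i in 0..n {
        // Four values per byte: value i lives in byte i / 4.
        let byte = *packed.get(i / 4)?;
        // Shift the wanted pair of bits down and mask off the rest.
        let value = (byte >> ((i % 4) * 2)) & 0b11;
        out.push(if value == MISSING { None } else { Some(value) });
    }
    Some(out)
}
```

Under these assumptions the byte 0xE4 (binary 11 10 01 00) unpacks to the allele indices 0, 1, 2 followed by one missing call.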
Work
- Vendor CoreArray C++ (strip R layer, keep core + compression)
- C shim: open, walk nodes, read arrays
- Rust FFI + GdsHandler for format detection
- gds.rs genotype extractor producing GenotypeResult
- Wire into FormatRegistry and cohort ingest path
- Test against real SeqArray files
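The format-detection item could be as small as a magic-byte check. The sketch below assumes GDS containers start with the ASCII prefix `COREARRAYx0A`; that prefix should be confirmed against the vendored CoreArray source before relying on it. `has_gds_magic` and `is_gds_file` are hypothetical helper names, not the existing GdsHandler or FormatRegistry API.

```rust
use std::fs::File;
use std::io::{self, Read};
use std::path::Path;

/// Prefix assumed to open every GDS container (assumption, verify against vendor/).
const GDS_MAGIC: &[u8] = b"COREARRAYx0A";

/// True if the given leading bytes of a file start with the assumed GDS prefix.
fn has_gds_magic(header: &[u8]) -> bool {
    header.starts_with(GDS_MAGIC)
}

/// Read just enough of the file to test for the prefix.
/// Files shorter than the prefix are reported as not GDS.
fn is_gds_file(path: &Path) -> io::Result<bool> {
    let mut file = File::open(path)?;
    let mut buf = vec![0u8; GDS_MAGIC.len()];
    let mut filled = 0;
    // `read` may return fewer bytes than requested, so loop until full or EOF.
    while filled < buf.len() {
        let n = file.read(&mut buf[filled..])?;
        if n == 0 {
            break;
        }
        filled += n;
    }
    Ok(has_gds_magic(&buf[..filled]))
}
```

Checking content rather than the .gds extension keeps detection consistent for renamed files; whether the handler should also consult the extension is left open here.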
Depends on