GDS (SeqArray) genotype format support #69

@vineetver

Add read support for GDS (CoreArray/SeqArray) genotype files so favor ingest cohort.gds --annotations ... works the same as the VCF path.

Approach

Vendor the CoreArray C++ library as a feature-gated -sys crate (crates/corearray-sys/), statically linked with no R dependency. The C++ core handles container parsing, compression (LZMA/LZ4/zlib), and bit2 genotype unpacking. The Rust side reads the /sample.id, /chromosome, /position, /allele, and /genotype/data nodes via FFI and produces the same GenotypeResult as the VCF extractor.
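For reviewers unfamiliar with bit2, here is a rough sketch of what 2-bit unpacking amounts to. In the proposal this work stays inside the vendored C++ core, so the code below is illustration only. It assumes values are packed lowest-order bits first and that raw value 3 marks a missing call; both should be checked against the vendored source. The names `unpack_bit2`, `dosage`, and `MISSING` are hypothetical.

```rust
/// Assumption for this sketch: raw value 3 marks a missing allele call.
pub const MISSING: u8 = 3;

/// Unpack `n` 2-bit values from `packed`, taking the lowest-order bits first.
/// Returns `None` when `packed` is too short to hold `n` values.
pub fn unpack_bit2(packed: &[u8], n: usize) -> Option<Vec<u8>> {
    if packed.len() < (n + 3) / 4 {
        return None;
    }
    let out: Vec<u8> = (0..n)
        .map(|i| (packed[i / 4] >> (2 * (i % 4))) & 0b11)
        .collect();
    Some(out)
}

/// Number of non-reference alleles in a diploid call, or `None` if either
/// allele is missing.
pub fn dosage(a: u8, b: u8) -> Option<u8> {
    if a == MISSING || b == MISSING {
        return None;
    }
    Some((a != 0) as u8 + (b != 0) as u8)
}

fn main() {
    // One byte holds four calls: bits 0-1, 2-3, 4-5, 6-7.
    let calls = unpack_bit2(&[0b1110_0100], 4).expect("buffer long enough");
    println!("{:?}", calls);
    println!("{:?}", dosage(calls[0], calls[1]));
}
```

Sites with more alleles than two bits can encode need extra handling that this sketch leaves out.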

Everything downstream (annotation join, sparse store build, STAAR) stays untouched.
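One way to keep the downstream untouched is to hide the FFI behind a small trait and have the extractor build the shared result type from it. The sketch below is a guess at that shape, not the project's code. `GenotypeResult` and `Variant` are hypothetical stand-ins, since the real fields are not shown in this issue. `GdsNodes` and `InMemory` are also hypothetical, and the layout of /genotype/data (variant, then sample, then two alleles, with 3 meaning missing) is an assumption.

```rust
/// Hypothetical stand-in for the project's real `GenotypeResult`.
#[derive(Debug, PartialEq)]
pub struct GenotypeResult {
    pub samples: Vec<String>,
    pub variants: Vec<Variant>,
    /// One row per variant, one non-reference allele count per sample.
    pub dosages: Vec<Vec<Option<u8>>>,
}

#[derive(Debug, PartialEq)]
pub struct Variant {
    pub chrom: String,
    pub pos: u64,
    pub alleles: Vec<String>,
}

/// Hypothetical view over the nodes named in the issue. A real implementation
/// would call into the C shim; tests can use an in-memory one.
pub trait GdsNodes {
    fn sample_id(&self) -> Vec<String>; // /sample.id
    fn chromosome(&self) -> Vec<String>; // /chromosome
    fn position(&self) -> Vec<u64>; // /position
    fn allele(&self) -> Vec<String>; // /allele, comma-separated, e.g. "A,G"
    fn genotype_data(&self) -> Vec<u8>; // /genotype/data, already unpacked
}

pub fn extract<N: GdsNodes>(nodes: &N) -> Result<GenotypeResult, String> {
    let samples = nodes.sample_id();
    let (chroms, positions, alleles) = (nodes.chromosome(), nodes.position(), nodes.allele());
    let raw = nodes.genotype_data();
    let (n_var, n_smp) = (chroms.len(), samples.len());
    if positions.len() != n_var || alleles.len() != n_var {
        return Err("variant-level nodes have different lengths".to_string());
    }
    if raw.len() != n_var * n_smp * 2 {
        return Err("/genotype/data length is not variants x samples x 2".to_string());
    }
    let variants: Vec<Variant> = chroms
        .into_iter()
        .zip(positions)
        .zip(alleles)
        .map(|((chrom, pos), a)| Variant {
            chrom,
            pos,
            alleles: a.split(',').map(String::from).collect(),
        })
        .collect();
    let dosages: Vec<Vec<Option<u8>>> = (0..n_var)
        .map(|v| {
            (0..n_smp)
                .map(|s| {
                    let i = (v * n_smp + s) * 2;
                    let (a, b) = (raw[i], raw[i + 1]);
                    if a == 3 || b == 3 { None } else { Some((a != 0) as u8 + (b != 0) as u8) }
                })
                .collect::<Vec<Option<u8>>>()
        })
        .collect();
    Ok(GenotypeResult { samples, variants, dosages })
}

/// In-memory implementation used to exercise `extract` without any FFI.
pub struct InMemory {
    pub samples: Vec<String>,
    pub chroms: Vec<String>,
    pub positions: Vec<u64>,
    pub alleles: Vec<String>,
    pub genotypes: Vec<u8>,
}

impl GdsNodes for InMemory {
    fn sample_id(&self) -> Vec<String> { self.samples.clone() }
    fn chromosome(&self) -> Vec<String> { self.chroms.clone() }
    fn position(&self) -> Vec<u64> { self.positions.clone() }
    fn allele(&self) -> Vec<String> { self.alleles.clone() }
    fn genotype_data(&self) -> Vec<u8> { self.genotypes.clone() }
}

fn main() {
    let nodes = InMemory {
        samples: vec!["s1".into(), "s2".into()],
        chroms: vec!["1".into()],
        positions: vec![100],
        alleles: vec!["A,G".into()],
        genotypes: vec![0, 1, 3, 3],
    };
    println!("{:?}", extract(&nodes));
}
```

Checking node lengths before indexing keeps a truncated or inconsistent file from panicking inside the extractor.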

crates/
  corearray-sys/
    Cargo.toml        # links = "corearray", feature-gated
    build.rs          # cc::Build, compiles vendored C++
    src/lib.rs        # unsafe extern "C" bindings
    vendor/           # CoreArray C++ source + bundled zlib/lz4/lzma

Work

  • Vendor CoreArray C++ (strip R layer, keep core + compression)
  • C shim: open, walk nodes, read arrays
  • Rust FFI + GdsHandler for format detection
  • gds.rs genotype extractor producing GenotypeResult
  • Wire into FormatRegistry and cohort ingest path
  • Test against real SeqArray files
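As a starting point for the format-detection item, a handler could sniff the first bytes of the file instead of trusting the extension. This is a minimal sketch: `Format`, `detect`, and `detect_path` are hypothetical names, not the existing `FormatRegistry` API. The GDS magic string used here is an assumption that should be confirmed against the vendored CoreArray source before relying on it.

```rust
use std::fs::File;
use std::io::{self, Read};
use std::path::Path;

#[derive(Debug, PartialEq, Clone, Copy)]
pub enum Format {
    Gds,
    Vcf,
    Gzip, // could be a compressed VCF; needs a second look after decompression
    Unknown,
}

// Assumption: GDS containers start with this ASCII prefix. Verify against the
// vendored CoreArray source.
const GDS_MAGIC: &[u8] = b"COREARRAYx0A";
const VCF_MAGIC: &[u8] = b"##fileformat=VCF";
const GZIP_MAGIC: &[u8] = &[0x1f, 0x8b];

/// Classify a file from its leading bytes.
pub fn detect(head: &[u8]) -> Format {
    if head.starts_with(GDS_MAGIC) {
        Format::Gds
    } else if head.starts_with(VCF_MAGIC) {
        Format::Vcf
    } else if head.starts_with(GZIP_MAGIC) {
        Format::Gzip
    } else {
        Format::Unknown
    }
}

/// Read up to 16 bytes from `path` and classify them.
pub fn detect_path(path: &Path) -> io::Result<Format> {
    let mut head = Vec::with_capacity(16);
    File::open(path)?.take(16).read_to_end(&mut head)?;
    Ok(detect(&head))
}

fn main() {
    println!("{:?}", detect(b"##fileformat=VCFv4.2\n"));
    println!("{:?}", detect(&[0x1f, 0x8b, 0x08]));
}
```

Sniffing content means a mislabeled cohort.gds is rejected at the detection step rather than failing deep inside the C++ parser.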

Depends on

Labels: enhancement (New feature or request), ingest (VCF/genotype ingest pipeline)
