Parallel VCF ingest via tabix region splitting #74

VCF is a row-oriented text format, usually stored inside a BGZF-compressed stream. Parsing is sequential because the start of record N is only known once the end of record N-1 has been found. BGZF decompression is already parallel (noodles-bgzf worker threads), but the parse itself is single-threaded.

Tabix-indexed VCFs (.tbi or .csi sidecar) support random access by genomic region. When an index is present we can split the file into non-overlapping chromosome regions and parse them in parallel threads, each writing to its own per-chromosome batch.
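
A minimal sketch of the sidecar probe, using only the standard library. The names `IndexKind`, `sidecar`, and `probe_index` are hypothetical and not taken from the codebase, and preferring `.tbi` over `.csi` when both exist is an assumption.

```rust
use std::path::{Path, PathBuf};

/// Which sidecar index was found next to the VCF (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum IndexKind {
    Tbi,
    Csi,
}

/// Build `<path>.<ext>` by appending to the full file name,
/// so `a.vcf.gz` becomes `a.vcf.gz.tbi` rather than `a.vcf.tbi`.
fn sidecar(vcf: &Path, ext: &str) -> PathBuf {
    let mut s = vcf.as_os_str().to_os_string();
    s.push(".");
    s.push(ext);
    PathBuf::from(s)
}

/// Probe for `<path>.tbi`, then `<path>.csi`.
/// `None` means no index was found and the sequential parse should be used.
pub fn probe_index(vcf: &Path) -> Option<(IndexKind, PathBuf)> {
    let tbi = sidecar(vcf, "tbi");
    if tbi.is_file() {
        return Some((IndexKind::Tbi, tbi));
    }
    let csi = sidecar(vcf, "csi");
    if csi.is_file() {
        return Some((IndexKind::Csi, csi));
    }
    None
}
```

Appending to the whole file name instead of calling `Path::with_extension` matters here, because `with_extension` would replace the trailing `.gz`.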

## Behavior

1. On `favor ingest`, probe for `<path>.tbi` or `<path>.csi` alongside each input VCF.
2. If an index exists, read the region list from it: the chromosome (reference sequence) names and the BGZF virtual offsets of each chromosome's chunks.
3. Partition regions into N worker groups based on the memory budget (each worker needs one batch buffer per chromosome it touches).
4. Each worker opens an independent BGZF reader, seeks to its region range, parses records, and fills thread-local batch builders.
5. A coordinator thread collects full batches from workers and flushes to the per-chromosome parquet writers (single writer per chromosome, fed by multiple parse threads).
6. If no index is present, fall back to the current single-threaded sequential parse.
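
Steps 3 to 5 can be sketched with standard-library threads and a bounded channel. This is an illustration of the worker/coordinator shape only: `Region`, `Batch`, `partition`, and `ingest` are hypothetical names, the `records` field stands in for what a real worker would get by seeking a BGZF reader and parsing, and round-robin grouping is an assumed policy.

```rust
use std::collections::BTreeMap;
use std::sync::mpsc;
use std::thread;

/// Hypothetical stand-in for one indexed region. `records` replaces the
/// seek-and-parse work a real worker would do on its own BGZF reader.
#[derive(Clone, Debug)]
pub struct Region {
    pub chrom: String,
    pub records: Vec<u64>, // placeholder for parsed variant positions
}

/// A full batch handed from a worker to the coordinator.
pub struct Batch {
    pub chrom: String,
    pub positions: Vec<u64>,
}

/// Step 3: deal regions round-robin into at most `n_workers` groups.
pub fn partition(regions: Vec<Region>, n_workers: usize) -> Vec<Vec<Region>> {
    let n = n_workers.max(1).min(regions.len().max(1));
    let mut groups: Vec<Vec<Region>> = (0..n).map(|_| Vec::new()).collect();
    for (i, r) in regions.into_iter().enumerate() {
        groups[i % n].push(r);
    }
    groups
}

/// Steps 4-5: workers build batches locally and send them over a bounded
/// channel; the calling thread acts as coordinator and is the only place
/// that appends to a chromosome's output.
pub fn ingest(
    regions: Vec<Region>,
    n_workers: usize,
    batch_size: usize,
) -> BTreeMap<String, Vec<u64>> {
    // The bound gives backpressure, so workers cannot queue unlimited batches.
    let (tx, rx) = mpsc::sync_channel::<Batch>(n_workers.max(1) * 2);
    let mut handles = Vec::new();
    for group in partition(regions, n_workers) {
        let tx = tx.clone();
        handles.push(thread::spawn(move || {
            for region in group {
                for chunk in region.records.chunks(batch_size.max(1)) {
                    let batch = Batch {
                        chrom: region.chrom.clone(),
                        positions: chunk.to_vec(),
                    };
                    tx.send(batch).expect("coordinator hung up");
                }
            }
        }));
    }
    drop(tx); // the receive loop ends once every worker's sender is dropped

    let mut out: BTreeMap<String, Vec<u64>> = BTreeMap::new();
    for batch in rx {
        out.entry(batch.chrom).or_default().extend(batch.positions);
    }
    for h in handles {
        h.join().expect("worker panicked");
    }
    out
}
```

In this sketch each chromosome is handled by exactly one worker, so batches for a chromosome reach the coordinator in file order. If one chromosome were split across several workers, their batches would interleave, and the writer would need either to reorder them or to accept unsorted output.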

## Constraints

- Memory budget must account for N workers × the batch buffers each worker holds. Derive the worker count from the budget.
- Each worker decompresses independently, so total BGZF throughput should scale with the worker count until CPU or disk I/O becomes the limit.
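- Region boundaries must be BGZF virtual offsets that point at a record start (tabix chunk offsets provide this). A virtual offset combines a compressed block offset with a position inside the decompressed block, so a boundary usually falls inside a block rather than on a block start, and workers must seek by virtual offset.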
- The single-pass genotype extraction path (`geno_writer`) must still work. Either each worker has its own GenotypeWriter and results merge, or genotype extraction stays single-threaded and only variant-site parsing parallelizes.
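
One way to derive the worker count from the memory budget is sketched below. The `Budget` struct, its fields, and `worker_count` are hypothetical, and the formula (budget divided by the per-worker buffer footprint, clamped to the available threads) is an assumption, not the project's sizing rule.

```rust
/// Hypothetical sizing inputs; none of these names come from the codebase.
pub struct Budget {
    /// Total memory the ingest may use for batch buffers, in bytes.
    pub memory_bytes: u64,
    /// Size of one per-chromosome batch buffer, in bytes.
    pub batch_buffer_bytes: u64,
    /// How many chromosomes a single worker may touch at once.
    pub chroms_per_worker: u64,
    /// Upper bound on worker threads, e.g. the number of cores.
    pub max_threads: usize,
}

/// workers = clamp(floor(memory / (buffer * chroms_per_worker)), 1, max_threads)
pub fn worker_count(b: &Budget) -> usize {
    // Guard against overflow and a zero divisor.
    let per_worker = b
        .batch_buffer_bytes
        .saturating_mul(b.chroms_per_worker)
        .max(1);
    let by_memory = usize::try_from(b.memory_bytes / per_worker).unwrap_or(usize::MAX);
    // Always keep at least one worker so ingest can make progress.
    by_memory.clamp(1, b.max_threads.max(1))
}
```

For example, a 1 GiB budget with 64 MiB buffers and two chromosomes per worker allows eight workers by memory, which is then capped by `max_threads`.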

## Depends on

- #18 (zero-alloc record processing -- landed, each worker benefits from the same buffer reuse)
- #70 (single-pass ingest -- workers must handle both variant + genotype output)
