vineetver/favor-cli

FAVOR CLI

Raw variants in. Rare-variant results out.
Annotate. Enrich. Analyze. Interpret.

Install · Quick Start · Commands · Roadmap · Citation

Pre-1.0. Commands and interfaces may change between releases.

Install

curl -fsSL https://raw.githubusercontent.com/vineetver/favor-cli/master/install.sh | sh

Quick Start

# 1. configure: point at a data directory + choose annotation tier
favor setup --root /data/favor --tier base

# 2. pull annotation data (~200 GB base, ~508 GB full)
favor data pull

# 3. ingest and annotate variants
favor ingest variants.vcf.gz
favor annotate variants.ingested

# 4. run STAAR rare-variant association
favor staar --genotypes cohort.vcf.gz --phenotype pheno.tsv \
  --trait-name LDL --covariates age,sex,PC1,PC2 \
  --annotations variants.annotated

Commands

Command             What it does
────────────────    ───────────────────────────────────────────────────────────
favor setup         Configure data root, annotation tier, environment
favor data pull     Download annotation parquets and optional packs
favor ingest        Normalize VCF/TSV/CSV into canonical parquet variant sets
favor annotate      Join variants against FAVOR base or full annotations
favor enrich        Overlay tissue-specific eQTL, regulatory, enhancer-gene data
favor staar         STAAR rare-variant association testing
favor meta-staar    Cross-study meta-analysis from summary statistics
favor schema        Inspect annotation table columns and types
favor manifest      Show installed data and available commands

Use --format json for machine-readable output. Use --dry-run before heavy computation.
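
For scripted use, the JSON output can be consumed from a small wrapper. The following is a minimal sketch, not part of FAVOR CLI: the helper name `favor_json` is hypothetical, and it assumes that `--format json` is accepted after the subcommand and that stdout then holds a single JSON document. The output schema is not described here, so the result is returned as-is.

```python
import json
import subprocess
from typing import Any, Callable, Sequence


def favor_json(args: Sequence[str], run: Callable[..., Any] = subprocess.run) -> Any:
    """Run `favor <args> --format json` and parse stdout as JSON.

    Assumption: the flag may follow the subcommand and stdout is one JSON document.
    """
    cmd = ["favor", *args, "--format", "json"]
    # check=True raises CalledProcessError on a non-zero exit status.
    result = run(cmd, capture_output=True, text=True, check=True)
    return json.loads(result.stdout)
```

With the CLI on PATH, `favor_json(["manifest"])` would return the parsed manifest. The `run` parameter exists only so the call can be stubbed in tests.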

Data layout

FAVOR CLI uses two separate storage areas:

Data root (--root during setup) holds annotation parquets shared across projects:

/data/favor/
  base/chromosome=*/sorted.parquet      # base tier (~200 GB)
  full/chromosome=*/sorted.parquet      # full tier (~508 GB)
  tissue/                               # optional enrichment packs
    reference/                          #   gene index, cCRE regions (40 MB, always installed)
    rollups/                            #   gene-level summaries (49 MB, always installed)
    variant_in_region/                  #   variant-region junction (155 GB, always installed)
    variant_eqtl/                       #   GTEx eQTL (3 GB, optional)
    region_ccre_tissue_signals/         #   ENCODE regulatory (18 GB, optional)
    ...

Project store (.cohort/ in your working directory) holds per-project data:

my_study/
  .cohort/
    cohorts/<id>/                       # built by favor ingest or favor staar
      manifest.json
      samples.txt
      chromosome=*/
        sparse_g.bin                    # sparse genotype matrix (mmap'd)
        variants.parquet                # variant metadata + STAAR weights
        membership.parquet              # gene-variant assignments
    cache/score_cache/                  # reused across mask/MAF reruns
    annotations/refs.toml               # attached annotation databases
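
The layout above can be inspected with a few lines of Python. This is an illustrative sketch based only on the directory names shown (`cohorts/<id>/chromosome=*/`); the function name `list_cohorts` is hypothetical and the script does not read any FAVOR file formats.

```python
from pathlib import Path
from typing import Dict, List


def list_cohorts(store_root: Path) -> Dict[str, List[str]]:
    """Map each cohort id to the chromosome labels found under it."""
    cohorts: Dict[str, List[str]] = {}
    cohorts_dir = Path(store_root) / "cohorts"
    if not cohorts_dir.is_dir():
        return cohorts
    for cohort_dir in sorted(p for p in cohorts_dir.iterdir() if p.is_dir()):
        # Partition directories are named chromosome=<label>.
        chroms = sorted(
            part.name.split("=", 1)[1]
            for part in cohort_dir.glob("chromosome=*")
            if part.is_dir()
        )
        cohorts[cohort_dir.name] = chroms
    return cohorts
```

For example, `list_cohorts(Path("my_study/.cohort"))` returns a dictionary keyed by cohort id, with the chromosome partitions present for each.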

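The store root is resolved in this order of precedence: the --store-path flag, then the FAVOR_STORE environment variable, then the nearest .cohort/ found by walking up from the current directory, and finally <cwd>/.cohort/.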
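
The precedence can be sketched as follows. This is an illustration of the rule as stated, not the CLI's actual code: the function name `resolve_store_root` is hypothetical, and it assumes that the flag and environment values are used as given and that the walk-up starts at the current directory.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_store_root(
    store_path: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
    cwd: Optional[Path] = None,
) -> Path:
    """Precedence: flag > FAVOR_STORE > nearest .cohort/ > <cwd>/.cohort/."""
    env = os.environ if env is None else env
    cwd = Path.cwd() if cwd is None else Path(cwd)

    # 1. An explicit --store-path value wins.
    if store_path:
        return Path(store_path)

    # 2. Then the FAVOR_STORE environment variable.
    if env.get("FAVOR_STORE"):
        return Path(env["FAVOR_STORE"])

    # 3. Then the nearest existing .cohort/ in cwd or any parent directory.
    for directory in (cwd, *cwd.parents):
        candidate = directory / ".cohort"
        if candidate.is_dir():
            return candidate

    # 4. Otherwise fall back to .cohort/ under cwd (it may not exist yet).
    return cwd / ".cohort"
```

Under this reading, running from my_study/results/ finds my_study/.cohort/ as long as neither the flag nor FAVOR_STORE is set.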

See Setup guide for detailed configuration, pack selection, HPC tips, and working directory organization.

Resource requirements

Tested on UKB exome chr22 (~200K samples, ~400K variants, ~17K rare) with 64 GB of RAM. Full-genome runs have not yet been tested.

samples    RAM       notes
───────    ──────    ─────────────────────────────
 10K       32 GB     comfortable
200K       64 GB     tested (UKB exome chr22)

Memory, threads, and the temp directory are auto-detected from SLURM and cgroup settings. Override them with these environment variables:

SLURM_MEM_PER_NODE     memory pool
FAVOR_KINSHIP_MEM_GB   kinship budget (default 16 GB)
TMPDIR                 scratch space
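
To check what a job will see before launching it, the three variables can be read the same way from Python. This is a hedged sketch with hypothetical helper names. It assumes SLURM_MEM_PER_NODE is in megabytes (the unit SLURM uses for it) and that FAVOR_KINSHIP_MEM_GB is a plain number of gigabytes; how FAVOR CLI itself parses these values is not documented here.

```python
import os
from typing import Mapping, Optional

DEFAULT_KINSHIP_MEM_GB = 16.0  # default stated above


def _env(env: Optional[Mapping[str, str]]) -> Mapping[str, str]:
    return os.environ if env is None else env


def kinship_budget_gb(env: Optional[Mapping[str, str]] = None) -> float:
    """FAVOR_KINSHIP_MEM_GB if set, otherwise the 16 GB default."""
    raw = _env(env).get("FAVOR_KINSHIP_MEM_GB", "").strip()
    return float(raw) if raw else DEFAULT_KINSHIP_MEM_GB


def slurm_memory_gb(env: Optional[Mapping[str, str]] = None) -> Optional[float]:
    """SLURM_MEM_PER_NODE converted from MB to GB, or None if unset."""
    raw = _env(env).get("SLURM_MEM_PER_NODE", "").strip()
    return int(raw) / 1024 if raw else None


def scratch_dir(env: Optional[Mapping[str, str]] = None) -> str:
    """TMPDIR if set; the /tmp fallback is an assumption of this sketch."""
    return _env(env).get("TMPDIR") or "/tmp"
```

In a batch script, exporting FAVOR_KINSHIP_MEM_GB=32 before invoking favor would raise the kinship budget from its default.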

Docs

  • Setup guide - installation, configuration, data management, HPC best practices
  • Ingest - VCF ingest patterns, preflight, throughput
  • Genotype store - sparse genotype store for rare-variant analysis
  • STAAR - null model, score test, masks, outputs, meta-analysis
  • Validation - statistical accuracy vs R reference
  • Statistical divergences - known differences from R STAAR/SKAT and why
  • Performance - benchmarks and optimization roadmap
  • Agent reference - machine interface for LLM agents

Roadmap

Milestone                                   Focus
────────────────────────────────────────    ─────────────────────────────────────────────────────────────────
v0.2.0 - STAAR hardening                    GRM, score validation, multi-VCF input, performance profiling
v0.3.0 - MetaSTAAR                          cross-biobank meta-analysis, allele flip, conditional, effect sizes
v0.4.0 - Interpret                          variant interpretation, fine-mapping, colocalization, V2G, tiers
v0.5.0 - memory and thread pool overhaul    one compute handle, bounded scratch, machine-visible resource control
v0.6.0 - storage and query engine           store format, query paths, incremental ingest, cloud I/O, agent-friendly queries
v1.0.0 - Production                         orchestration, provenance, QC, full test suite

Citation

FAVOR CLI implements the STAAR framework and builds on the FAVOR annotation database. If you use this tool, please cite:

Li Z*, Li X*, Zhou H, et al. A framework for detecting noncoding rare variant associations of large-scale whole-genome sequencing studies. Nature Methods, 19(12), 1599-1611 (2022). DOI: 10.1038/s41592-022-01640-x

Li X*, Li Z*, Zhou H, et al. Dynamic incorporation of multiple in silico functional annotations empowers rare variant association analysis of large whole-genome sequencing studies at scale. Nature Genetics, 52(9), 969-983 (2020). DOI: 10.1038/s41588-020-0676-4

Zhou H, Verma V, Li X, et al. FAVOR 2.0: A reengineered functional annotation of variants online resource for interpreting genomic variation. Nucleic Acids Research, 54(D1), D1405-D1414 (2026). DOI: 10.1093/nar/gkaf1217

Zhou H, Arapoglou T, Li X, et al. FAVOR: functional annotation of variants online resource and annotator for variation across the human genome. Nucleic Acids Research, 51(D1), D1300-D1311 (2023). DOI: 10.1093/nar/gkac966

Li TC, Zhou H, Verma V, et al. FAVOR-GPT: a generative natural language interface to whole genome variant functional annotations. Bioinformatics Advances, 4(1), vbae143 (2024). DOI: 10.1093/bioadv/vbae143

License

GPL-3.0
