Raw variants in. Rare-variant results out.
Annotate. Enrich. Analyze. Interpret.
Install · Quick Start · Commands · Roadmap · Citation
Pre-1.0. Commands and interfaces may change between releases.
curl -fsSL https://raw.githubusercontent.com/vineetver/favor-cli/master/install.sh | sh

# 1. configure: point at a data directory + choose annotation tier
favor setup --root /data/favor --tier base
# 2. pull annotation data (~200 GB base, ~508 GB full)
favor data pull
# 3. ingest and annotate variants
favor ingest variants.vcf.gz
favor annotate variants.ingested
# 4. run STAAR rare-variant association
favor staar --genotypes cohort.vcf.gz --phenotype pheno.tsv \
--trait-name LDL --covariates age,sex,PC1,PC2 \
--annotations variants.annotated

| Command | What it does |
|---|---|
| favor setup | Configure data root, annotation tier, environment |
| favor data pull | Download annotation parquets and optional packs |
| favor ingest | Normalize VCF/TSV/CSV into canonical parquet variant sets |
| favor annotate | Join variants against FAVOR base or full annotations |
| favor enrich | Overlay tissue-specific eQTL, regulatory, enhancer-gene data |
| favor staar | STAAR rare-variant association testing |
| favor meta-staar | Cross-study meta-analysis from summary statistics |
| favor schema | Inspect annotation table columns and types |
| favor manifest | Show installed data and available commands |

Use --format json for machine-readable output. Use --dry-run before heavy computation.
FAVOR CLI uses two separate storage areas:
Data root (--root during setup) holds annotation parquets shared across projects:
/data/favor/
base/chromosome=*/sorted.parquet # base tier (~200 GB)
full/chromosome=*/sorted.parquet # full tier (~508 GB)
tissue/ # optional enrichment packs
reference/ # gene index, cCRE regions (40 MB, always installed)
rollups/ # gene-level summaries (49 MB, always installed)
variant_in_region/ # variant-region junction (155 GB, always installed)
variant_eqtl/ # GTEx eQTL (3 GB, optional)
region_ccre_tissue_signals/ # ENCODE regulatory (18 GB, optional)
...
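As a rough illustration of the data root layout above, the following sketch reports which annotation tiers have at least one chromosome partition on disk. The function name `list_installed_tiers` is hypothetical and not part of FAVOR CLI; it only assumes the `base/` and `full/` directory pattern shown above. The documented way to see installed data remains `favor manifest`.

```shell
#!/bin/sh
# Hypothetical helper (not part of FAVOR CLI): report which annotation tiers
# have at least one chromosome partition under a data root laid out as
#   <root>/<tier>/chromosome=*/sorted.parquet
list_installed_tiers() {
    root="$1"
    for tier in base full; do
        found=no
        for f in "$root/$tier"/chromosome=*/sorted.parquet; do
            # An unmatched glob stays literal, so confirm the file exists.
            if [ -f "$f" ]; then
                found=yes
                break
            fi
        done
        printf '%s\t%s\n' "$tier" "$found"
    done
}
```

Calling `list_installed_tiers /data/favor` prints one tab-separated line per tier, which can be handy as a quick sanity check before starting a long annotation run.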
Project store (.cohort/ in your working directory) holds per-project data:
my_study/
.cohort/
cohorts/<id>/ # built by favor ingest or favor staar
manifest.json
samples.txt
chromosome=*/
sparse_g.bin # sparse genotype matrix (mmap'd)
variants.parquet # variant metadata + STAAR weights
membership.parquet # gene-variant assignments
cache/score_cache/ # reused across mask/MAF reruns
annotations/refs.toml # attached annotation databases
The store root is resolved in this order of precedence: --store-path flag > FAVOR_STORE env > nearest .cohort/ found by walking up from the working directory > <cwd>/.cohort/.
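The precedence above can be sketched as a small shell function. This is an illustration of the documented order only, not the CLI's actual implementation: `resolve_store_root` is a hypothetical name, its first argument stands in for the value of `--store-path`, and it assumes that both the flag and `FAVOR_STORE` are used as given.

```shell
#!/bin/sh
# Hypothetical helper mirroring the documented precedence; not the CLI's code.
# Usage: resolve_store_root [explicit_path] [start_dir]
resolve_store_root() {
    explicit="${1:-}"
    # Make the starting directory absolute so the upward walk ends at "/".
    start=$(cd "${2:-.}" && pwd) || return 1

    # 1. An explicit path (what --store-path would carry) wins.
    if [ -n "$explicit" ]; then
        printf '%s\n' "$explicit"
        return 0
    fi

    # 2. Then the FAVOR_STORE environment variable.
    if [ -n "${FAVOR_STORE:-}" ]; then
        printf '%s\n' "$FAVOR_STORE"
        return 0
    fi

    # 3. Then the nearest .cohort/ found by walking up from the start.
    dir="$start"
    while :; do
        if [ -d "$dir/.cohort" ]; then
            printf '%s\n' "${dir%/}/.cohort"
            return 0
        fi
        [ "$dir" = "/" ] && break
        dir=$(dirname "$dir")
    done

    # 4. Fall back to .cohort/ under the starting directory.
    printf '%s\n' "${start%/}/.cohort"
}
```

With this ordering, a command run from a subdirectory of my_study/ picks up my_study/.cohort/ unless a flag or FAVOR_STORE says otherwise.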
See Setup guide for detailed configuration, pack selection, HPC tips, and working directory organization.
Tested on UKB exome chr22 (~200K samples, ~400K variants, ~17K rare) with 64 GB of RAM. Full-genome runs have not yet been tested.
samples RAM notes
─────── ────── ─────────────────────────────
10K 32 GB comfortable
200K 64 GB tested (UKB exome chr22)
Memory, threads, and temp directory are auto-detected from SLURM and cgroup limits. Override with:
SLURM_MEM_PER_NODE memory pool
FAVOR_KINSHIP_MEM_GB kinship budget (default 16 GB)
TMPDIR scratch space
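A minimal sketch of setting these overrides in a job script before invoking favor. The values are illustrative, and `scratch_ok` is a hypothetical helper. SLURM normally sets SLURM_MEM_PER_NODE itself from the job's memory request, so it is left untouched here; how favor interprets a manually exported value is not stated in this document.

```shell
#!/bin/sh
# Illustrative values only; pick ones that fit your node.
export FAVOR_KINSHIP_MEM_GB=32      # raise the kinship budget above the 16 GB default
export TMPDIR="${TMPDIR:-/tmp}"     # keep an existing scratch directory if one is set

# Hypothetical helper: true when the directory exists and is writable.
scratch_ok() {
    [ -d "$1" ] && [ -w "$1" ]
}

if scratch_ok "$TMPDIR"; then
    echo "kinship budget: ${FAVOR_KINSHIP_MEM_GB} GB, scratch: ${TMPDIR}"
else
    echo "scratch directory ${TMPDIR} is missing or not writable" >&2
fi
```

Checking the scratch directory up front avoids discovering a bad TMPDIR partway through a long run.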
- Setup guide - installation, configuration, data management, HPC best practices
- Ingest - VCF ingest patterns, preflight, throughput
- Genotype store - sparse genotype store for rare-variant analysis
- STAAR - null model, score test, masks, outputs, meta-analysis
- Validation - statistical accuracy vs R reference
- Statistical divergences - known differences from R STAAR/SKAT and why
- Performance - benchmarks and optimization roadmap
- Agent reference - machine interface for LLM agents
| Milestone | Focus |
|---|---|
| v0.2.0 - STAAR hardening | GRM, score validation, multi-VCF input, performance profiling |
| v0.3.0 - MetaSTAAR | cross-biobank meta-analysis, allele flip, conditional, effect sizes |
| v0.4.0 - Interpret | variant interpretation, fine-mapping, colocalization, V2G, tiers |
| v0.5.0 - Memory and thread pool overhaul | one compute handle, bounded scratch, machine-visible resource control |
| v0.6.0 - Storage and query engine | store format, query paths, incremental ingest, cloud I/O, agent-friendly queries |
| v1.0.0 - Production | orchestration, provenance, QC, full test suite |
FAVOR CLI implements the STAAR framework and the FAVOR annotation database. If you use this tool, please cite:
Li Z*, Li X*, Zhou H, et al. A framework for detecting noncoding rare variant associations of large-scale whole-genome sequencing studies. Nature Methods, 19(12), 1599-1611 (2022). DOI: 10.1038/s41592-022-01640-x
Li X*, Li Z*, Zhou H, et al. Dynamic incorporation of multiple in silico functional annotations empowers rare variant association analysis of large whole-genome sequencing studies at scale. Nature Genetics, 52(9), 969-983 (2020). DOI: 10.1038/s41588-020-0676-4
Zhou H, Verma V, Li X, et al. FAVOR 2.0: A reengineered functional annotation of variants online resource for interpreting genomic variation. Nucleic Acids Research, 54(D1), D1405-D1414 (2026). DOI: 10.1093/nar/gkaf1217
Zhou H, Arapoglou T, Li X, et al. FAVOR: functional annotation of variants online resource and annotator for variation across the human genome. Nucleic Acids Research, 51(D1), D1300-D1311 (2023). DOI: 10.1093/nar/gkac966
Li TC, Zhou H, Verma V, et al. FAVOR-GPT: a generative natural language interface to whole genome variant functional annotations. Bioinformatics Advances, 4(1), vbae143 (2024). DOI: 10.1093/bioadv/vbae143
GPL-3.0