FAVOR CLI

Raw variants in. Rare-variant results out.
Annotate. Enrich. Analyze. Interpret.

Install · Quick Start · Commands · Roadmap · Citation

Pre-1.0. Commands and interfaces may change between releases.

Install

curl -fsSL https://raw.githubusercontent.com/vineetver/favor-cli/master/install.sh | sh

Quick Start

# 1. configure: point at a data directory + choose annotation tier
favor setup --root /data/favor --tier base

# 2. pull annotation data (~200 GB base, ~508 GB full)
favor data pull

# 3. ingest and annotate variants
favor ingest variants.vcf.gz
favor annotate variants.ingested

# 4. run STAAR rare-variant association
favor staar --genotypes cohort.vcf.gz --phenotype pheno.tsv \
  --trait-name LDL --covariates age,sex,PC1,PC2 \
  --annotations variants.annotated
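
Before step 2, it can help to confirm that the data root has enough free disk space for the chosen tier. The sketch below is not part of favor: `tier_size_gb` and `has_space_for_tier` are hypothetical helper names, and the sizes are the approximate figures quoted in the Quick Start comment.

```shell
# Approximate tier sizes in GB, taken from the Quick Start comment above.
tier_size_gb() {
  case "$1" in
    base) echo 200 ;;
    full) echo 508 ;;
    *) echo "unknown tier: $1" >&2; return 1 ;;
  esac
}

# Succeeds when the filesystem holding directory $1 has at least the
# approximate size of tier $2 available.
has_space_for_tier() {
  _root="$1"
  _need_gb=$(tier_size_gb "$2") || return 1
  # df -Pk prints POSIX-format output in 1 KiB blocks; column 4 is available space.
  _avail_kb=$(df -Pk "$_root" | awk 'NR==2 {print $4}')
  [ "$_avail_kb" -ge $((_need_gb * 1024 * 1024)) ]
}
```

For example, `has_space_for_tier /data/favor base && favor data pull` only starts the download when the check passes. Optional packs are not counted here.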

Commands

Command	What it does
`favor setup`	Configure data root, annotation tier, environment
`favor data pull`	Download annotation parquets and optional packs
`favor ingest`	Normalize VCF/TSV/CSV into canonical parquet variant sets
`favor annotate`	Join variants against FAVOR base or full annotations
`favor enrich`	Overlay tissue-specific eQTL, regulatory, enhancer-gene data
`favor staar`	STAAR rare-variant association testing
`favor meta-staar`	Cross-study meta-analysis from summary statistics
`favor schema`	Inspect annotation table columns and types
`favor manifest`	Show installed data and available commands

Use --format json for machine-readable output. Use --dry-run before heavy computation.
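
One way to make the dry-run habit automatic is a small wrapper. This is a sketch under assumptions, not documented behavior: `favor_checked` is a hypothetical name, and it assumes `--dry-run` is accepted after the subcommand's arguments and exits non-zero when it finds a problem.

```shell
# Hypothetical wrapper: run the command with --dry-run first, and only run it
# for real if the dry run succeeds.
# Assumptions (not confirmed by this README): --dry-run may be appended after
# the other arguments, and a failed dry run returns a non-zero exit status.
favor_checked() {
  favor "$@" --dry-run || return 1
  favor "$@"
}
```

Usage would look like `favor_checked annotate variants.ingested`.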

Data layout

FAVOR CLI uses two separate storage areas:

Data root (--root during setup) holds annotation parquets shared across projects:

/data/favor/
  base/chromosome=*/sorted.parquet      # base tier (~200 GB)
  full/chromosome=*/sorted.parquet      # full tier (~508 GB)
  tissue/                               # optional enrichment packs
    reference/                          #   gene index, cCRE regions (40 MB, always installed)
    rollups/                            #   gene-level summaries (49 MB, always installed)
    variant_in_region/                  #   variant-region junction (155 GB, always installed)
    variant_eqtl/                       #   GTEx eQTL (3 GB, optional)
    region_ccre_tissue_signals/         #   ENCODE regulatory (18 GB, optional)
    ...
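
Given this layout, a quick way to see which chromosome partitions of a tier are present on disk is to glob for the `sorted.parquet` files. The helper below is a hypothetical sketch that relies only on the directory pattern shown above; `favor manifest` remains the supported way to inspect installed data.

```shell
# Print the chromosome label of every partition under <root>/<tier>/ that
# contains a sorted.parquet file, one label per line.
list_tier_chromosomes() {
  _root="$1"
  _tier="$2"
  for _f in "$_root/$_tier"/chromosome=*/sorted.parquet; do
    [ -f "$_f" ] || continue                 # glob matched nothing, or not a file
    _part=$(basename "$(dirname "$_f")")     # e.g. chromosome=22
    printf '%s\n' "${_part#chromosome=}"     # strip the partition-key prefix
  done
}
```

For example, `list_tier_chromosomes /data/favor base` lists the base-tier partitions that have been pulled so far.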

Project store (.cohort/ in your working directory) holds per-project data:

my_study/
  .cohort/
    cohorts/<id>/                       # built by favor ingest or favor staar
      manifest.json
      samples.txt
      chromosome=*/
        sparse_g.bin                    # sparse genotype matrix (mmap'd)
        variants.parquet                # variant metadata + STAAR weights
        membership.parquet              # gene-variant assignments
    cache/score_cache/                  # reused across mask/MAF reruns
    annotations/refs.toml               # attached annotation databases

The store root is resolved as: --store-path flag > FAVOR_STORE env > walk up for .cohort/ > <cwd>/.cohort/.
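
The precedence above can be sketched as a shell function. This mirrors the documented order only; `resolve_store_root` is a hypothetical name, its single argument stands in for the value of `--store-path`, and the real implementation may differ in details such as how empty values are treated.

```shell
# Usage: resolve_store_root [explicit_store_path]
resolve_store_root() {
  # 1. An explicit --store-path value wins when given.
  if [ -n "${1:-}" ]; then
    printf '%s\n' "$1"
    return 0
  fi
  # 2. Otherwise use the FAVOR_STORE environment variable.
  if [ -n "${FAVOR_STORE:-}" ]; then
    printf '%s\n' "$FAVOR_STORE"
    return 0
  fi
  # 3. Otherwise walk up from the current directory looking for .cohort/.
  _dir="$PWD"
  while :; do
    if [ -d "$_dir/.cohort" ]; then
      printf '%s\n' "${_dir%/}/.cohort"
      return 0
    fi
    [ "$_dir" = "/" ] && break
    _dir=$(dirname "$_dir")
  done
  # 4. Fall back to <cwd>/.cohort, whether or not it exists yet.
  printf '%s\n' "${PWD%/}/.cohort"
}
```

Running it from a subdirectory of my_study/ with neither the flag nor FAVOR_STORE set would print the path of my_study/.cohort.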

See Setup guide for detailed configuration, pack selection, HPC tips, and working directory organization.

Resource requirements

Tested on UK Biobank (UKB) exome chr22 (~200K samples, ~400K variants, ~17K rare) with 64 GB of RAM. Full-genome runs have not yet been tested.

samples    RAM       notes
───────    ──────    ─────────────────────────────
 10K       32 GB     comfortable
200K       64 GB     tested (UKB exome chr22)

Memory, thread count, and temp directory are auto-detected from SLURM and cgroup limits. Override them with these environment variables:

SLURM_MEM_PER_NODE     memory pool
FAVOR_KINSHIP_MEM_GB   kinship budget (default 16 GB)
TMPDIR                 scratch space
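
As an illustration, the overrides could be set at the top of a job script before calling favor. The values are examples only. SLURM itself expresses SLURM_MEM_PER_NODE in MB; that favor reads it in the same unit is an assumption here, and the scratch subdirectory name is made up.

```shell
# Example values only; adjust to the node you are running on.
# Assumption: favor interprets SLURM_MEM_PER_NODE in MB, as SLURM does.
export SLURM_MEM_PER_NODE=65536        # 65536 MB = 64 GB memory pool
export FAVOR_KINSHIP_MEM_GB=16         # kinship budget (16 GB is the documented default)

# Point scratch space at a dedicated directory (hypothetical name) and create it.
export TMPDIR="${TMPDIR:-/tmp}/favor-scratch"
mkdir -p "$TMPDIR"
```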

Docs

Setup guide - installation, configuration, data management, HPC best practices
Ingest - VCF ingest patterns, preflight, throughput
Genotype store - sparse genotype store for rare-variant analysis
STAAR - null model, score test, masks, outputs, meta-analysis
Validation - statistical accuracy vs R reference
Statistical divergences - known differences from R STAAR/SKAT and why
Performance - benchmarks and optimization roadmap
Agent reference - machine interface for LLM agents

Roadmap

Milestone	Focus
v0.2.0 - STAAR hardening	GRM, score validation, multi-VCF input, performance profiling
v0.3.0 - MetaSTAAR	cross-biobank meta-analysis, allele flip, conditional, effect sizes
v0.4.0 - Interpret	variant interpretation, fine-mapping, colocalization, V2G, tiers
v0.5.0 - Memory and thread pool overhaul	one compute handle, bounded scratch, machine-visible resource control
v0.6.0 - Storage and query engine	store format, query paths, incremental ingest, cloud I/O, agent-friendly queries
v1.0.0 - Production	orchestration, provenance, QC, full test suite

Citation

FAVOR CLI implements the STAAR framework and builds on the FAVOR annotation database. If you use this tool, please cite:

Li Z*, Li X*, Zhou H, et al. A framework for detecting noncoding rare variant associations of large-scale whole-genome sequencing studies. Nature Methods, 19(12), 1599-1611 (2022). DOI: 10.1038/s41592-022-01640-x

Li X*, Li Z*, Zhou H, et al. Dynamic incorporation of multiple in silico functional annotations empowers rare variant association analysis of large whole-genome sequencing studies at scale. Nature Genetics, 52(9), 969-983 (2020). DOI: 10.1038/s41588-020-0676-4

Zhou H, Verma V, Li X, et al. FAVOR 2.0: A reengineered functional annotation of variants online resource for interpreting genomic variation. Nucleic Acids Research, 54(D1), D1405-D1414 (2026). DOI: 10.1093/nar/gkaf1217

Zhou H, Arapoglou T, Li X, et al. FAVOR: functional annotation of variants online resource and annotator for variation across the human genome. Nucleic Acids Research, 51(D1), D1300-D1311 (2023). DOI: 10.1093/nar/gkac966

Li TC, Zhou H, Verma V, et al. FAVOR-GPT: a generative natural language interface to whole genome variant functional annotations. Bioinformatics Advances, 4(1), vbae143 (2024). DOI: 10.1093/bioadv/vbae143

License

GPL-3.0
