barcodeqc is a lightweight command-line tool for rapid quality control of AtlasXomics epigenomic DBiT-seq experiments. It analyzes Read 2 barcodes and ligation linkers to flag common upstream failure modes and produces a single HTML report for a go/no-go decision.
The tool expects Illumina short-read data using the barcoding schema described in Zhang et al. 2023.
- Read 1: genomic sequence
- Read 2: linker1 | barcodeA | linker2 | barcodeB | genomic sequence
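To make the Read 2 layout concrete, here is a minimal sketch that slices a read into its five segments by fixed offsets. The function name `split_read2` and the segment lengths are hypothetical and are passed in as parameters, because this document does not state them. barcodeqc itself locates the linkers with cutadapt rather than by fixed offsets.

```python
from typing import NamedTuple


class Read2Segments(NamedTuple):
    linker1: str
    barcode_a: str
    linker2: str
    barcode_b: str
    genomic: str


def split_read2(seq: str, linker1_len: int, barcode_len: int, linker2_len: int) -> Read2Segments:
    """Slice Read 2 as linker1 | barcodeA | linker2 | barcodeB | genomic."""
    a_start = linker1_len
    l2_start = a_start + barcode_len
    b_start = l2_start + linker2_len
    g_start = b_start + barcode_len
    if len(seq) < g_start:
        raise ValueError("read is shorter than the linker/barcode prefix")
    return Read2Segments(
        linker1=seq[:a_start],
        barcode_a=seq[a_start:l2_start],
        linker2=seq[l2_start:b_start],
        barcode_b=seq[b_start:g_start],
        genomic=seq[g_start:],
    )


# Toy read with made-up segment lengths (4, 3, 4); these are NOT real assay values.
toy = split_read2("LLLL" + "AAA" + "MMMM" + "BBB" + "GGGGG",
                  linker1_len=4, barcode_len=3, linker2_len=4)
print(toy)
```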
What It Checks
- Linker conservation (L1 and L2)
- Barcode whitelist mismatch
- High or low barcode lanes
- Off-tissue ratio (when a tissue positions file is provided)
Pipeline stages executed by barcodeqc qc:
- Subsample Read 2 with seqtk sample and write ds_<sample_reads>.fastq.gz.
- Run cutadapt for linker 1 and linker 2 independently, writing wildcard barcode files and logs.
- Build spatialTable.csv by merging linker barcode calls and joining to tissue positions.
- For each linker, compute barcode count metrics, whitelist checks, and lane QC flags; write count tables and QC plots.
- If tissue positions are available, compute on/off tissue metrics and generate the on/off density plot.
- Build the summary QC table, print a terminal status table, and render the final HTML report.
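As an illustration of the per-linker counting and whitelist check described in the stages above, the following sketch tallies barcode calls and reports the fraction that fall outside a whitelist. The function `whitelist_summary` and the toy barcodes are hypothetical; the metrics and thresholds barcodeqc actually computes may differ.

```python
from collections import Counter


def whitelist_summary(barcodes, whitelist):
    """Return per-barcode counts and the fraction of calls not in the whitelist."""
    counts = Counter(barcodes)
    total = sum(counts.values())
    off_whitelist = sum(n for bc, n in counts.items() if bc not in whitelist)
    mismatch_fraction = off_whitelist / total if total else 0.0
    return counts, mismatch_fraction


# Made-up barcode calls and whitelist, for illustration only.
calls = ["AAAC", "AAAC", "GGTT", "NNNN"]
counts, mismatch = whitelist_summary(calls, {"AAAC", "GGTT"})
print(counts.most_common(), mismatch)
```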
- macOS or Linux
- Python 3.10+
- seqtk 1.4+ on PATH
- cutadapt (installed automatically via pip, provides the cutadapt CLI)
- pigz optional (recommended for faster subsample compression)
Make sure the interpreter used to create the virtual environment is Python 3.10 or newer. On systems with multiple Python installs, use an explicit binary such as python3.10, python3.11, or python3.12.
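One way to confirm that a given interpreter meets the 3.10+ requirement is to run a short check with that interpreter before creating the environment. This is a minimal sketch; the helper name `python_is_supported` is made up for illustration.

```python
import sys

MIN_VERSION = (3, 10)  # minimum version stated in the requirements above


def python_is_supported(version_info=sys.version_info) -> bool:
    """Compare the (major, minor) part of a version tuple with the minimum."""
    return tuple(version_info[:2]) >= MIN_VERSION


if not python_is_supported():
    print(f"Python {sys.version_info.major}.{sys.version_info.minor} is too old; need 3.10+")
```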
git clone https://github.com/atlasxomics/barcodeqc.git
cd barcodeqc
python -m venv .venv
source .venv/bin/activate
pip install -U pip
pip install .

Conda alternative (instead of venv):
git clone https://github.com/atlasxomics/barcodeqc.git
cd barcodeqc
conda create -n barcodeqc python=3.12 -y
conda activate barcodeqc
pip install -U pip
pip install .

barcodeqc uses seqtk sample during subsampling, so seqtk must be on your PATH.
macOS (Homebrew):
brew install seqtk

Linux (Debian/Ubuntu):
sudo apt-get update
sudo apt-get install -y seqtk

From source:
git clone https://github.com/lh3/seqtk.git
cd seqtk
make
cp seqtk "$VIRTUAL_ENV/bin/" # if using venv (with the venv activated)
cp seqtk "$CONDA_PREFIX/bin/" # if using conda (after `conda activate barcodeqc`)
Verify installation:
which seqtk

Optional speed-up (parallel gzip):
which pigz

barcodeqc qc SAMPLE_NAME /path/to/read2.fastq.gz bc220

If you do not provide --tissue_position_file, the tool uses the packaged tissue positions file for the selected barcode set.
qc runs the full pipeline and generates figures, tables, and the HTML report.
barcodeqc qc SAMPLE_NAME /path/to/read2.fastq.gz bc96

report regenerates the HTML report from an existing run directory.
barcodeqc report -n SAMPLE_NAME -d /path/to/SAMPLE_NAME

- sample_name (positional): label used for the output directory and report name
- r2_path (positional): Read 2 fastq or fastq.gz file
- barcode_set (positional): one of bc50, bc96, fg96, bc220, bc220_05-OCT, bc220_20-MAY
- --sample_reads: number of reads to subsample (default 10000000)
- --random_seed: seed for subsampling (default 42)
- --tissue_position_file: optional tissue_positions_list.csv from AtlasXBrowser
- --count_raw_reads: optional full-file read counting for report metadata (off by default; can be slow on large files)
- --dry_run: create the output directory but skip running the pipeline
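When launching runs from a script, the arguments listed above can be assembled into a command line programmatically. The sketch below is a hypothetical helper (`build_qc_command` is not part of barcodeqc). It uses only the positional arguments and flags documented here and assumes flag values are passed as separate, space-delimited tokens.

```python
# Barcode set names copied from the argument list above.
VALID_BARCODE_SETS = {"bc50", "bc96", "fg96", "bc220", "bc220_05-OCT", "bc220_20-MAY"}


def build_qc_command(sample_name, r2_path, barcode_set, sample_reads=None,
                     random_seed=None, tissue_position_file=None,
                     count_raw_reads=False, dry_run=False):
    """Build an argv list for `barcodeqc qc`; options left unset are omitted."""
    if barcode_set not in VALID_BARCODE_SETS:
        raise ValueError(f"unknown barcode set: {barcode_set}")
    cmd = ["barcodeqc", "qc", sample_name, str(r2_path), barcode_set]
    if sample_reads is not None:
        cmd += ["--sample_reads", str(sample_reads)]
    if random_seed is not None:
        cmd += ["--random_seed", str(random_seed)]
    if tissue_position_file is not None:
        cmd += ["--tissue_position_file", str(tissue_position_file)]
    if count_raw_reads:
        cmd.append("--count_raw_reads")
    if dry_run:
        cmd.append("--dry_run")
    return cmd


cmd = build_qc_command("SAMPLE_NAME", "/path/to/read2.fastq.gz", "bc96",
                       sample_reads=1000000, dry_run=True)
print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the standard library to execute the run.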
Each run creates a directory named after sample_name in the current working directory.
SAMPLE_NAME/
  SAMPLE_NAME_bcQC_report.html
  ds_10000000.fastq.gz
  figures/
    L1_barplot.html
    L2_barplot.html
    L1_pareto.html
    L2_pareto.html
    dense_on_off.html
  tables/
    spatialTable.csv
    L1_counts.csv
    L2_counts.csv
    L1_hiLoWarn.csv
    L2_hiLoWarn.csv
    onoff_tissue_table.csv
    qc_table.csv
    input_parameters.json
  logs/
    cutadapt_L1.log
    cutadapt_L2.log
The console also prints a summary table with PASS or CAUTION statuses for each QC metric.
Notes on optional outputs:
- dense_on_off.html and onoff_tissue_table.csv are only created when a tissue positions file is provided.
- L1_hiLoWarn.csv and L2_hiLoWarn.csv are only created if high or low lanes are detected.
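A quick way to confirm that a run produced the expected files is to look for them by name. The sketch below is a hypothetical helper (`check_run_dir` is not part of barcodeqc). It searches the run directory recursively, so it makes no assumption about which subdirectory a file lives in, and it treats the four files named in the notes above as optional.

```python
from pathlib import Path

# File names taken from the output listing; nesting is deliberately not assumed.
REQUIRED = (
    "L1_barplot.html", "L2_barplot.html", "L1_pareto.html", "L2_pareto.html",
    "spatialTable.csv", "L1_counts.csv", "L2_counts.csv",
    "qc_table.csv", "input_parameters.json",
    "cutadapt_L1.log", "cutadapt_L2.log",
)
OPTIONAL = (
    "dense_on_off.html", "onoff_tissue_table.csv",
    "L1_hiLoWarn.csv", "L2_hiLoWarn.csv",
)


def check_run_dir(run_dir):
    """Return (missing required names, optional names that are present)."""
    run_dir = Path(run_dir)
    present = {p.name for p in run_dir.rglob("*") if p.is_file()}
    required = REQUIRED + (f"{run_dir.name}_bcQC_report.html",)
    missing = [name for name in required if name not in present]
    optional_present = [name for name in OPTIONAL if name in present]
    return missing, optional_present
```

For example, `check_run_dir("SAMPLE_NAME")` returns an empty first list when every required file was written.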
- PASS indicates the metric is within expected bounds.
- CAUTION indicates the metric falls outside the expected range and should be reviewed in the report.
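For batch processing, it may be convenient to pull out the metrics that need review. The sketch below assumes that qc_table.csv stores the PASS and CAUTION strings literally in one of its columns, which this document does not confirm. It does not rely on any column names, and the example table is made up.

```python
import csv
import io


def caution_rows(csv_file):
    """Return the data rows in which any cell equals 'CAUTION'."""
    reader = csv.reader(csv_file)
    next(reader, None)  # skip the header row, whatever its columns are
    return [row for row in reader if "CAUTION" in row]


# Made-up table contents; the real qc_table.csv columns may differ.
example = io.StringIO("metric,status\nlinker1,PASS\nwhitelist,CAUTION\n")
flagged = caution_rows(example)
print(flagged)
```

With a real run, open the file first, for example `with open(path, newline="") as fh: caution_rows(fh)`.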
Use this small dataset to validate your installation and run a quick smoke test.
- Latch portal (GUI): https://console.latch.bio/s/17328881931993962
- Direct download URL: https://latch-public.s3.amazonaws.com/test-data/13502/barcodeqc_example/barcodeqc_example.tar.gz
barcodeqc qc \
barcodeqc_example \
data/barcodeqc_example_R2_001.fastq.gz \
bc220 \
-t data/tissue_positions_list.csv \
--count_raw_reads

On most modern laptops, this example should complete in about a minute.
- seqtk not found: install seqtk and ensure it is on PATH.
- cutadapt not found: ensure the active environment includes cutadapt and the cutadapt CLI is available.
- Slow subsampling: install pigz for parallel compression; barcodeqc will use it automatically when available.
- Missing or incorrect tissue positions: provide a valid tissue_positions_list.csv or rely on the default barcode-set positions file.
- Large input appears stuck right after CLI started: by default this version skips full raw-read counting to avoid startup delays on very large FASTQs. If you enable --count_raw_reads, expect startup to include a full scan of the input file.
- Logs are written to barcodeqc.log and SAMPLE_NAME/logs/.
- Contact your AtlasXomics Support Scientist if you encounter persistent issues.
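The tool-availability items above can be checked in one pass before starting a run. This is a minimal preflight sketch using `shutil.which` from the standard library; the function `preflight` is hypothetical and only reports what is on PATH, without checking tool versions.

```python
import shutil

REQUIRED_TOOLS = ("seqtk", "cutadapt")
OPTIONAL_TOOLS = ("pigz",)


def preflight(which=shutil.which):
    """Return (missing required tools, {optional tool: resolved path or None})."""
    missing = [name for name in REQUIRED_TOOLS if which(name) is None]
    optional = {name: which(name) for name in OPTIONAL_TOOLS}
    return missing, optional


missing, optional = preflight()
if missing:
    print("Missing required tools:", ", ".join(missing))
if optional["pigz"] is None:
    print("pigz not found; subsample compression may be slower")
```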
