
TBAS_pipeline

We present Trio-barcoded ONT Adaptive Sampling (TBAS), a cost-efficient long-read sequencing strategy that combines sample barcoding with adaptive enrichment to sequence rare-disease trios on a single PromethION/P2 flow cell. TBAS achieved near-complete variant phasing and accurate detection of small variants, structural variants, and tandem repeats, with a 77% potential solve rate. This scalable approach retains methylation data and enables clinically relevant, phenotype-guided long-read diagnostics at a fraction of current costs.

Overview

  • The software pipeline entrypoint is tbas-pipeline, implemented in tbas_pipeline/.
  • The packaged version runs tools directly (it does not generate sbatch scripts).
  • The notebook TBAS_pipeline_slurm.ipynb is kept unchanged as a historical reference for the original workflow.
  • The analysis/ folder contains downstream analysis notebooks organized by topic (SNV/Trio calling, coverage, methylation, tandem repeats, and variant counting).
  • Benchmarking materials and example HG002 results are provided under benchmarking/, with data available at the Zenodo record below.

Requirements

  • Python 3.9+.
  • Standard long-read analysis command-line tools used by the pipeline: samtools, minimap2, sniffles, mosdepth, run_clair3.sh, kanpig, bedtools, bgzip, bcftools, whatshap, medaka, modkit, tdb.
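Before a first run, it can help to confirm these tools are installed. A minimal, generic shell check (not part of the pipeline itself; the tool list simply mirrors the Requirements above) might look like:

```shell
# Quick sanity check: report which required command-line tools are on PATH.
# Nothing here modifies data; it only prints ok/MISSING per tool.
missing=0
for tool in samtools minimap2 sniffles mosdepth run_clair3.sh kanpig \
            bedtools bgzip bcftools whatshap medaka modkit tdb; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "ok: $tool"
  else
    echo "MISSING: $tool"
    missing=$((missing + 1))
  fi
done
echo "$missing required tool(s) not found"
```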

Getting started

  1. Install the local package:

     ```shell
     pip install -e .
     ```
  2. Create a manifest CSV/TSV with at least:
  • sample_id (example: 4_6_Gregor_Trio)
  • bed_file (example: Epilepsy, CMRG, WGS, or a direct BED path)
  • proband_gender (used by medaka stages, example: female)

Optional columns:

  • calls_bam (explicit path to calls*.bam; if omitted, the pipeline searches under <output_folder>/<sample_id>/calls*.bam)
  • read_group_prefix (skip BAM-header inference during demultiplex stage)
  • tr_bed_file (TR catalog BED for medaka_local; if omitted, the pipeline derives this from bed_file via built-in mapping)
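For illustration, a minimal manifest with the three required columns could be written like this (the values below just echo the column examples above and are not real data):

```shell
# Write a minimal one-row manifest with the required columns.
# sample_id, bed_file, and proband_gender values are illustrative only.
cat > manifest_example.csv <<'EOF'
sample_id,bed_file,proband_gender
4_6_Gregor_Trio,Epilepsy,female
EOF
```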
  3. Run a dry run starting from demultiplexing:

     ```shell
     tbas-pipeline \
       --manifest example_data/test_subset_chr22/manifest_example.csv \
       --output-folder demo_output \
       --stages demultiplex \
       --dry-run
     ```
  4. Run the full pipeline:

     ```shell
     tbas-pipeline \
       --manifest example_data/test_subset_chr22/manifest_example.csv \
       --output-folder demo_output
     ```

Pipeline stages

Default stage order:

  1. demultiplex
  2. fastq_extract
  3. minimap2
  4. bam_sort
  5. sniffles_global
  6. bam_mosdepth
  7. clair3_local
  8. kanpig_pileup
  9. kanpig_gt
  10. kanpig_trio
  11. whatshap_single_sample_local_phasing
  12. whatshap_haplotag
  13. medaka_local
  14. medaka_patho
  15. modkit
  16. modkit_nohp
  17. tdb

You can run a subset with --stages stage1,stage2,....
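For example, to re-run only the alignment-related stages against an existing output folder (a hypothetical invocation; the stage names come from the default order above):

```shell
# Re-run a subset of stages; add --dry-run first to preview the commands.
tbas-pipeline \
  --manifest example_data/test_subset_chr22/manifest_example.csv \
  --output-folder demo_output \
  --stages minimap2,bam_sort,sniffles_global
```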

Notebook reference

  • TBAS_pipeline_slurm.ipynb is preserved and unchanged.
  • Use it as reference for the original SLURM-oriented workflow.

Downstream analysis notebooks

The analysis/ directory contains topic-focused notebooks and resources:

  • SNV/Trio calling

    • analysis/snv_trio_analysis.ipynb: downstream analysis notebook for small variants and trio evaluation.
  • Coverage

    • analysis/covearge/mosdepth_analysis.ipynb: coverage analysis with mosdepth.
    • analysis/covearge/mosdepth_summary_totals.csv: example summary output.
  • Methylation

    • analysis/methylation/methylation_analysis.ipynb: methylation analysis notebook.
    • analysis/methylation/*/: per-sample directories with supporting files.
  • Tandem repeats (TR)

    • analysis/tr/adaptive_sampling_tr_analysis.ipynb: TR analysis on adaptive sampling data.
    • analysis/tr/tr_regions/: TR region catalogs used by the analysis, including:
      • adotto_catalog.hg38.lite.*.bed
      • strchive/STRchive-disease-loci.hg38.bed
  • Variant counts

    • analysis/variant_counts/count_variants_by_barcode.sh
    • analysis/variant_counts/count_variants_by_barcode_parallel.sh
    • analysis/variant_counts/variant_counts.tsv

Benchmarking

  • Folder: benchmarking/
    • adaptive_methylation_benchmark.ipynb: methylation benchmarking notebook.
    • benchmarking/hg002_variant_calls/: example HG002 variant call files and indices for reference.

Benchmarking data DOI: https://doi.org/10.5281/zenodo.17398577

Data availability and citation

If you use these materials, please cite the Zenodo record: https://doi.org/10.5281/zenodo.17398577

License

This project is distributed under the terms of the license in LICENSE.
