We present Trio-barcoded ONT Adaptive Sampling (TBAS), a cost-efficient long-read sequencing strategy that combines sample barcoding and adaptive enrichment to sequence rare-disease trios on a single PromethION/P2 flow cell. TBAS achieved near-complete variant phasing and detection of small variants, structural variants, and tandem repeats with high accuracy, and a 77% potential solve rate. This scalable approach retains methylation data and enables clinically relevant, phenotype-guided long-read diagnostics at a fraction of current costs.
- The software pipeline entrypoint is `tbas-pipeline`, implemented in `tbas_pipeline/`.
- The software version runs tools directly (no `sbatch` generation).
- The notebook `TBAS_pipeline_slurm.ipynb` is kept as-is for the historical/reference workflow.
- The `analysis/` folder contains downstream analysis notebooks organized by topic (SNV/Trio calling, coverage, methylation, tandem repeats, and variant counting).
- Benchmarking materials and example HG002 results are provided under `benchmarking/`, with data available at the Zenodo record below.
- Python 3.9+.
- Standard long-read analysis command-line tools used by the pipeline:
  `samtools`, `minimap2`, `sniffles`, `mosdepth`, `run_clair3.sh`, `kanpig`, `bedtools`, `bgzip`, `bcftools`, `whatshap`, `medaka`, `modkit`, `tdb`.
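Before running the pipeline, it can help to verify that these external tools are installed. A minimal sketch of such a check, using the tool names from the list above:

```shell
# Sanity-check that the external tools required by the pipeline are on PATH.
# Tool names are taken from the prerequisites list above.
required_tools="samtools minimap2 sniffles mosdepth run_clair3.sh kanpig \
bedtools bgzip bcftools whatshap medaka modkit tdb"
missing=""
for tool in $required_tools; do
  command -v "$tool" >/dev/null 2>&1 || missing="$missing $tool"
done
if [ -z "$missing" ]; then
  echo "All required tools found."
else
  echo "Missing tools:$missing" >&2
fi
```

`command -v` is POSIX, so this works in any standard shell without extra dependencies.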
- Install the local package: `pip install -e .`
- Create a manifest CSV/TSV with at least:
  - `sample_id` (example: `4_6_Gregor_Trio`)
  - `bed_file` (example: `Epilepsy`, `CMRG`, `WGS`, or a direct BED path)
  - `proband_gender` (used by the medaka stages, example: `female`)
- Optional columns:
  - `calls_bam` (explicit path to `calls*.bam`; if omitted, the pipeline searches under `<output_folder>/<sample_id>/calls*.bam`)
  - `read_group_prefix` (skips BAM-header inference during the demultiplex stage)
  - `tr_bed_file` (TR catalog BED for `medaka_local`; if omitted, the pipeline derives this from `bed_file` via a built-in mapping)
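A minimal manifest with only the required columns could be written like this; the column names follow the schema above, and the row values are the examples given there:

```shell
# Write a minimal example manifest with the three required columns.
# The row values are illustrative only.
cat > manifest.csv <<'EOF'
sample_id,bed_file,proband_gender
4_6_Gregor_Trio,Epilepsy,female
EOF
```

Optional columns can be appended as extra CSV columns in the same header row.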
- Run a dry run starting from demultiplexing:

  ```shell
  tbas-pipeline \
    --manifest example_data/test_subset_chr22/manifest_example.csv \
    --output-folder demo_output \
    --stages demultiplex \
    --dry-run
  ```

- Run the full pipeline:

  ```shell
  tbas-pipeline \
    --manifest example_data/test_subset_chr22/manifest_example.csv \
    --output-folder demo_output
  ```

Default stage order:

`demultiplex` → `fastq_extract` → `minimap2` → `bam_sort` → `sniffles_global` → `bam_mosdepth` → `clair3_local` → `kanpig_pileup` → `kanpig_gt` → `kanpig_trio` → `whatshap_single_sample_local_phasing` → `whatshap_haplotag` → `medaka_local` → `medaka_patho` → `modkit` → `modkit_nohp` → `tdb`

You can run a subset with `--stages stage1,stage2,...`.
- `TBAS_pipeline_slurm.ipynb` is preserved and unchanged.
- Use it as a reference for the original SLURM-oriented workflow.
The `analysis/` directory contains topic-focused notebooks and resources:

- **SNV/Trio calling**
  - `analysis/snv_trio_analysis.ipynb`: downstream analysis notebook for small variants and trio evaluation.
- **Coverage**
  - `analysis/covearge/mosdepth_analysis.ipynb`: coverage analysis with mosdepth.
  - `analysis/covearge/mosdepth_summary_totals.csv`: example summary output.
- **Methylation**
  - `analysis/methylation/methylation_analysis.ipynb`: methylation analysis notebook.
  - `analysis/methylation/*/`: per-sample directories with supporting files.
- **Tandem repeats (TR)**
  - `analysis/tr/adaptive_sampling_tr_analysis.ipynb`: TR analysis on adaptive sampling data.
  - `analysis/tr/tr_regions/`: TR region catalogs used by the analysis, including:
    - `adotto_catalog.hg38.lite.*.bed`
    - `strchive/STRchive-disease-loci.hg38.bed`
- **Variant counts**
  - `analysis/variant_counts/count_variants_by_barcode.sh`
  - `analysis/variant_counts/count_variants_by_barcode_parallel.sh`
  - `analysis/variant_counts/variant_counts.tsv`
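The counting scripts themselves are not reproduced here, but the core step a per-barcode variant count typically performs can be sketched as follows. The helper name and the `barcode*.vcf` glob are assumptions for illustration, not the repo scripts' actual interface:

```shell
# Hypothetical helper: count non-header records (i.e. variant lines) in a VCF,
# as a per-barcode counting script might do.
count_vcf_records() {
  grep -vc '^#' "$1"
}

# Example usage over per-barcode VCFs in the current directory
# (the glob pattern is an assumption):
for vcf in barcode*.vcf; do
  [ -e "$vcf" ] || continue
  printf '%s\t%s\n' "$vcf" "$(count_vcf_records "$vcf")"
done
```

The tab-separated `file<TAB>count` output matches the kind of table stored in `variant_counts.tsv`.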
- Folder: `benchmarking/`
  - `benchmarking/adaptive_methylation_benchmark.ipynb`: methylation benchmarking notebook.
  - `benchmarking/hg002_variant_calls/`: example HG002 variant call files and indices for reference.
Benchmarking data DOI: https://doi.org/10.5281/zenodo.17398577
If you use these materials, please reference the Zenodo record:
- TBAS benchmarking data: https://doi.org/10.5281/zenodo.17398577
This project is distributed under the terms of the license in the `LICENSE` file.