A Snakemake pipeline for extracting and mapping variant/barcode sequences from Nanopore reads.
The pipeline runs three steps in sequence:
- A1 — Convert raw input (`bam`/`fasta`/`fastq`) to CSV
- A2 — Extract `var` and `bc` sequences using anchor sequences
- A3 — Match variants to a reference using `massive-seq-finder`
Final output: `output_dir/var_bc_reads_named.csv`
```
.
├── snakefile
├── run_snakemake.sh
├── requirements.txt
├── config/
│   ├── config.yaml   # User configuration — edit this
│   └── sbatch.yaml   # SLURM resource settings
└── module/
    ├── A1.transform_to_csv.py
    ├── A2.Split_Reads_Dask.py
    └── A3.matching_variant.py
```
```
pip install -r requirements.txt
```
`massive-seq-finder` is installed directly from GitHub and is listed in `requirements.txt`.
Only config/config.yaml needs to be edited between runs.
Required:

| Key | Description |
|---|---|
| `samples` | Path to the input file |
| `input_type` | `bam`, `fasta`, or `fastq` |
| `anchor_seqs` | Four comma-separated anchor sequences |
| `oligo_length` | Expected variant sequence length |
| `bc_length` | Expected barcode length |
| `var_dict` | Path to reference variant table (must contain a `seq` column) |
| `output_dir` | Directory where outputs will be written |
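To make the `anchor_seqs`, `oligo_length`, and `bc_length` keys concrete, here is a minimal sketch of anchor-based extraction in the style of A2, assuming the first anchor pair flanks the variant and the second pair flanks the barcode. The function name and anchor layout are illustrative only, not the module's actual API:

```python
def extract_var_bc(read, anchors, oligo_length, bc_length):
    """Locate four anchors in a read and slice out the variant
    (between anchors 1 and 2) and barcode (between anchors 3 and 4).
    Returns (var, bc), or None if an anchor is missing or a slice
    has an unexpected length.
    """
    a1, a2, a3, a4 = anchors
    i1 = read.find(a1)
    if i1 == -1:
        return None
    var_start = i1 + len(a1)
    i2 = read.find(a2, var_start)
    if i2 == -1:
        return None
    var = read[var_start:i2]
    i3 = read.find(a3, i2 + len(a2))
    if i3 == -1:
        return None
    bc_start = i3 + len(a3)
    i4 = read.find(a4, bc_start)
    if i4 == -1:
        return None
    bc = read[bc_start:i4]
    if len(var) != oligo_length or len(bc) != bc_length:
        return None
    return var, bc

# Toy read: AAAA/TTTT flank a 6 nt variant, GGGG/CCCC flank a 4 nt barcode.
read = "xx" + "AAAA" + "CGTACG" + "TTTT" + "GGGG" + "ACGT" + "CCCC" + "yy"
print(extract_var_bc(read, ("AAAA", "TTTT", "GGGG", "CCCC"), 6, 4))
```

Reads where any anchor is absent, or where the extracted slice does not match the configured lengths, are dropped rather than guessed at.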
Runtime (optional, with defaults):

| Key | Default | Description |
|---|---|---|
| `conda_env` | `anaconda_env` | Conda environment to activate |
| `snakemake_jobs` | `45` | Max concurrent SLURM jobs |
| `n_workers` | `32` | Dask workers for A2 |
| `mem_per_worker` | `9GB` | Memory per Dask worker |
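Putting the keys together, a `config/config.yaml` might look like the following. All paths, anchor sequences, and lengths are placeholders, not values shipped with the repository:

```yaml
# --- Required ---
samples: /data/run1/reads.bam                 # input file (placeholder path)
input_type: bam                               # bam, fasta, or fastq
anchor_seqs: "ACGTAC,TTGGCC,GGAATT,CCTTAA"    # four comma-separated anchors
oligo_length: 120                             # expected variant length
bc_length: 16                                 # expected barcode length
var_dict: /data/ref/variants.csv              # must contain a `seq` column
output_dir: /data/run1/output

# --- Runtime (optional, defaults shown) ---
conda_env: anaconda_env
snakemake_jobs: 45
n_workers: 32
mem_per_worker: 9GB
```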
Submit to SLURM cluster:
```
bash run_snakemake.sh
```
Dry-run (no execution):
```
snakemake -n --configfile config/config.yaml
```
Generate DAG image:
```
snakemake --dag --configfile config/config.yaml | dot -Tpng > dag.png
```
All outputs are written to `output_dir/`:
| File | Description |
|---|---|
| `raw_reads.csv` | Converted reads from A1 |
| `var_bc_reads.csv` | Extracted variant/barcode sequences from A2 |
| `var_bc_reads_named.csv` | Variant-matched and named output from A3 |
| `variant_matching.log` | Matching log from A3 |
- A3 uses the `MSF` class from `massive-seq-finder` for nearest-reference matching.
- If the variant table contains a `names`, `name`, `ref_name`, or `variant_name` column, it is automatically mapped to the `var_name` output column.
- SLURM job logs are written to `logs/`.
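The nearest-reference matching itself is delegated to `massive-seq-finder`'s `MSF` class. As a rough illustration of the underlying idea only (not the library's API), a brute-force nearest match by edit distance looks like this:

```python
def edit_distance(a, b):
    """Levenshtein distance via row-by-row dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def nearest_reference(query, references):
    """Return (name, distance) of the reference seq closest to query."""
    return min(((name, edit_distance(query, seq))
                for name, seq in references.items()),
               key=lambda t: t[1])

refs = {"v1": "ACGTACGT", "v2": "ACGGACGT", "v3": "TTTTTTTT"}
print(nearest_reference("ACGTACGA", refs))  # → ('v1', 1)
```

A dedicated tool avoids this O(reads x references) scan with indexing, which is why the pipeline uses `massive-seq-finder` rather than pairwise comparison.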