This Nextflow workflow processes paired-end sequencing reads by:
- Quality trimming and filtering with fastp
- Removing contaminant sequences using bbmap
- Taxonomically classifying the clean reads using kraken2
- Aggregating the QC reports with multiqc
- BBMap database: must contain a `/ref/index/` directory
- Kraken2 database: must contain a `hash.k2d` file
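
The two database requirements above can be checked before launching a run. The following is a minimal sketch and not part of the workflow itself: the function name `check_databases` is hypothetical, and it assumes that `ref/index/` sits directly under the BBMap database path and that `hash.k2d` sits directly under the Kraken2 database path.

```shell
# Hypothetical pre-flight check; not part of the workflow itself.
# Assumption: ref/index/ and hash.k2d sit directly under the given paths.
check_databases() {
    local bbmap_db="$1"
    local kraken_db="$2"
    local status=0

    # BBMap database must contain a ref/index/ directory
    if [ ! -d "${bbmap_db}/ref/index" ]; then
        echo "BBMap database: missing directory ${bbmap_db}/ref/index" >&2
        status=1
    fi

    # Kraken2 database must contain a hash.k2d file
    if [ ! -f "${kraken_db}/hash.k2d" ]; then
        echo "Kraken2 database: missing file ${kraken_db}/hash.k2d" >&2
        status=1
    fi

    return "$status"
}

# Example: check_databases "path/to/bbmap_db" "path/to/kraken_db" || exit 1
```

The function reports every missing item on stderr and returns a non-zero status if either check fails, so it can gate the `nextflow run` command.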
nextflow run main.nf --input_pattern "path/to/reads/*_{1,2}.fastq.gz" \
--contaminants_db "path/to/bbmap_db" \
--kraken_db "path/to/kraken_db"
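
The `--input_pattern` value above suggests that forward and reverse reads are named `<sample>_1.fastq.gz` and `<sample>_2.fastq.gz`. As a rough sketch under that assumption, a helper like the one below could list samples with a missing mate before a run. The function name `find_unpaired_reads` is hypothetical and the workflow does not provide it.

```shell
# Hypothetical helper: print the sample name of every read file without a mate.
# Assumption: files follow the <sample>_1.fastq.gz / <sample>_2.fastq.gz naming
# implied by the input pattern.
find_unpaired_reads() {
    local reads_dir="$1"
    local r1 r2

    # Forward reads lacking a reverse mate
    for r1 in "${reads_dir}"/*_1.fastq.gz; do
        [ -e "$r1" ] || continue    # glob matched nothing
        r2="${r1%_1.fastq.gz}_2.fastq.gz"
        [ -e "$r2" ] || basename "${r1%_1.fastq.gz}"
    done

    # Reverse reads lacking a forward mate
    for r2 in "${reads_dir}"/*_2.fastq.gz; do
        [ -e "$r2" ] || continue    # glob matched nothing
        r1="${r2%_2.fastq.gz}_1.fastq.gz"
        [ -e "$r1" ] || basename "${r2%_2.fastq.gz}"
    done
}

# Example: find_unpaired_reads "path/to/reads"
```

An empty output means every file in the directory has its mate.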
graph TD
A[Input Reads] --> B[FASTP]
B --> C[BBMAP]
C --> D[Clean Reads]
C --> E[Mapped Contaminants]
E --> F[SAMTOOLS_STATS]
D --> G[KRAKEN]
B --> H[QC Reports]
F --> H
H --> I[MULTIQC]
- Quality-filtered and contaminant-free reads
- Taxonomic classification of clean reads
- Comprehensive QC reports (FASTP, SAMTOOLS_STATS, MULTIQC)
Materials prepared for a training session at the Quadram Institute.