This project implements a reproducible Nextflow pipeline for quality control and statistical analysis of long-read sequencing data.
The pipeline takes a FASTQ file as input, performs QC analysis, calculates per-read statistics, and generates visualizations and summary statistics.
- FastQC for general quality control
- NanoPlot as a long-read-specific QC tool
- Per-read calculation of:
  - GC content (%)
  - read length
  - mean read quality score
- CSV output
- Distribution plots
- Summary statistics
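
The per-read calculation listed above can be sketched in a few lines of Python. This is a minimal illustration, not the contents of `scripts/read_stats.py`. It assumes four-line FASTQ records, Phred+33 quality encoding, and a plain arithmetic mean of the Phred scores (some long-read tools average error probabilities instead). The function names and CSV column names are hypothetical.

```python
import csv
import io
from typing import Iterator, TextIO


def parse_fastq(handle: TextIO) -> Iterator[tuple[str, str, str]]:
    """Yield (read_id, sequence, quality) from four-line FASTQ records."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        header = header.strip()
        if not header:
            continue  # tolerate blank lines between records
        seq = handle.readline().strip()
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().strip()
        read_id = header[1:].split()[0] if len(header) > 1 else ""
        yield read_id, seq, qual


def read_stats(read_id: str, seq: str, qual: str, offset: int = 33) -> dict:
    """Compute length, GC content (%) and mean Phred quality for one read."""
    length = len(seq)
    gc_count = sum(base in "GCgc" for base in seq)
    gc_percent = 100.0 * gc_count / length if length else 0.0
    # Assumption: arithmetic mean of Phred scores with the given ASCII offset.
    mean_quality = sum(ord(c) - offset for c in qual) / len(qual) if qual else 0.0
    return {
        "read_id": read_id,
        "length": length,
        "gc_percent": round(gc_percent, 2),
        "mean_quality": round(mean_quality, 2),
    }


def write_csv(rows: list[dict], handle: TextIO) -> None:
    """Write per-read statistics as CSV (column names are hypothetical)."""
    fields = ["read_id", "length", "gc_percent", "mean_quality"]
    writer = csv.DictWriter(handle, fieldnames=fields)
    writer.writeheader()
    writer.writerows(rows)


# Small in-memory example instead of a real FASTQ file.
example = io.StringIO("@r1 sample\nGGCCAATT\n+\nIIIIIIII\n@r2\nATAT\n+\n!!!!\n")
rows = [read_stats(*record) for record in parse_fastq(example)]
out = io.StringIO()
write_csv(rows, out)
print(out.getvalue())
```

For a real run, the in-memory `example` would be replaced by an opened FASTQ file and `out` by a file opened for writing.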

## Project structure

    .
    ├── README.md
    ├── email_draft.md
    ├── environment.yml
    ├── files
    │   └── barcode77.fastq
    ├── main.nf
    ├── nextflow.config
    ├── out
    ├── process
    │   ├── QC
    │   │   ├── fastqc.nf
    │   │   └── nanoplot.nf
    │   ├── STATS
    │   │   └── read_stats.nf
    │   └── VISUALIZATION
    │       └── visualize.nf
    ├── scripts
    │   ├── read_stats.py
    │   └── visualize_stats.py
    └── workflow
        └── QC
            └── qc_workflow.nf

    10 directories, 13 files

Clone the repository:

    git clone https://github.com/realcann/pipeline_long_read.git
    cd pipeline_long_read

## Create the Conda environment

    conda env create -f environment.yml
    conda activate qc_pipeline_env

## Run the pipeline

    nextflow run main.nf

## Output
All results are written to the out/ directory.
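
As a rough illustration of how the per-read CSV could be turned into summary statistics, the sketch below reads a CSV and summarizes one numeric column using only the standard library. The column names (`length`, `gc_percent`, `mean_quality`) and the function name are assumptions made for the example. The actual file names and columns produced in `out/` may differ.

```python
import csv
import io
import statistics
from typing import TextIO


def summarize_column(handle: TextIO, column: str) -> dict:
    """Return count, min, max, mean and median of one numeric CSV column."""
    values = [float(row[column]) for row in csv.DictReader(handle)]
    if not values:
        raise ValueError(f"no rows found for column {column!r}")
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
    }


# In-memory stand-in for a per-read CSV (hypothetical columns and values).
csv_text = (
    "read_id,length,gc_percent,mean_quality\n"
    "r1,100,40.0,12.5\n"
    "r2,200,50.0,10.0\n"
    "r3,600,45.0,14.0\n"
)
length_summary = summarize_column(io.StringIO(csv_text), "length")
print(length_summary)
```

With a real output file, `io.StringIO(csv_text)` would be replaced by the opened CSV, and the same function can be called once per column of interest.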