Sung2021/nextflow-bio-tutorials

nextflow-bio-tutorials

Step-by-step Nextflow DSL2 tutorials for bioinformatics — from basic channels to production-grade pipelines.


Tutorials

01  Hello Channels          Nextflow fundamentals: process, channel, operators
02  FastQC + MultiQC        Real-world QC pipeline with paired-end reads
03  RNA-seq Quantification  STAR → featureCounts → DESeq2
04  Variant Calling         BWA-MEM → GATK HaplotypeCaller (best practices)
05  Proteomics mzML         Spectra → limpa DEA / DROmics BMD (R integration)
06  Scatter-Gather          Split reads → parallel align → merge (scaling pattern)
07  Containers              Docker / Singularity per-process isolation

Repository Structure

nextflow-bio-tutorials/
├── 01_hello_channels/         # Process, channel, collect, view
├── 02_fastqc_multiqc/         # fromFilePairs, named emit, publishDir
├── 03_rnaseq_quantification/  # STAR index caching, inline R process
├── 04_variant_calling/        # GATK best practices, flatMap + groupTuple
├── 05_proteomics_mzml/        # Branching (group vs dose-response), R integration
├── 06_multi_sample_scatter/   # transpose, splitCsv, scatter-gather
├── 07_containers/             # Per-process container, custom Dockerfile
├── modules/                   # Reusable DSL2 modules (fastqc, multiqc, bwa)
├── testdata/                  # Lightweight test inputs
├── nextflow.config            # Shared profiles: local, docker, singularity, slurm
└── README.md

Concept Progression

Tutorial  New Concepts
01        process, Channel.of(), map, collect, view, publishDir
02        Channel.fromFilePairs(), tuple input, named emit:, fan-in
03        storeDir (index caching), conditional params, inline Rscript, mixed channels
04        Multi-stage chaining, flatMap + groupTuple, read groups, GATK pattern
05        Branching (if/else), type: 'dir' channel, R package integration
06        splitCsv, .transpose(), scatter-gather, two-level aggregation
07        container directive, BioContainers, custom Dockerfile, profile switching
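
To give a feel for the tutorial 01 concepts (process, Channel.of(), map, collect, view), here is a minimal sketch. It is not taken from 01_hello_channels/main.nf; the process name SHOUT and the greeting values are made up for illustration.

```nextflow
// Hypothetical process (not from the repository): upper-cases one greeting.
process SHOUT {
    input:
    val greeting

    output:
    stdout

    script:
    """
    printf '%s' "${greeting}" | tr '[:lower:]' '[:upper:]'
    """
}

workflow {
    // Channel.of() emits each value as a separate item;
    // map transforms every item before it reaches the process.
    greetings = Channel.of('hello', 'bonjour', 'hola')
        .map { word -> "${word} world" }

    // One task is launched per item in the channel.
    SHOUT(greetings)

    // collect gathers all task outputs into a single list;
    // view prints that list once every task has finished.
    SHOUT.out
        .collect()
        .view { items -> "Collected ${items.size()} results: ${items}" }
}
```

Tasks run in parallel, so the order of the collected items can differ between runs.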

Quick Start

# Prerequisites: Java 11+, Nextflow
curl -s https://get.nextflow.io | bash
chmod +x nextflow
mkdir -p ~/bin && mv nextflow ~/bin/   # make sure ~/bin is on your PATH

# Run the first tutorial
nextflow run 01_hello_channels/main.nf

# Run with Docker
nextflow run 02_fastqc_multiqc/main.nf -profile docker

# Run on SLURM cluster
nextflow run 03_rnaseq_quantification/main.nf -profile slurm

Profiles

Defined in nextflow.config:

Profile      Executor         Container
local        Local machine    None (tools must be installed)
docker       Local            Docker
singularity  Local            Singularity
slurm        SLURM scheduler  Via cluster config
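
As a rough sketch of how the profiles in the table above could be declared, the fragment below uses standard Nextflow config options. Only the profile names come from this README. The settings inside each profile, including the queue name, are assumptions and may differ from the repository's actual nextflow.config.

```nextflow
// nextflow.config (illustrative sketch, not the repository's file)
profiles {
    local {
        // Tools must already be installed on the machine
        process.executor = 'local'
    }
    docker {
        process.executor = 'local'
        docker.enabled   = true
    }
    singularity {
        process.executor       = 'local'
        singularity.enabled    = true
        singularity.autoMounts = true
    }
    slurm {
        process.executor = 'slurm'
        // Placeholder queue name; replace with one defined on your cluster
        process.queue    = 'general'
    }
}
```

A profile is selected at launch with -profile, as in the Quick Start commands.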

Shared Modules

Reusable DSL2 modules in modules/:

include { FASTQC }  from './modules/fastqc'
include { MULTIQC } from './modules/multiqc'
include { BWA_MEM } from './modules/bwa'
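
The sketch below shows one way included modules might be wired together in a workflow. The input and output shapes are assumptions: it supposes that FASTQC accepts a [sample_id, reads] tuple and exposes a named output zip containing only report files, and that MULTIQC accepts a list of files. The params.reads glob is also hypothetical. Check the module files for their real signatures.

```nextflow
// Illustrative only: module interfaces are assumed, not read from the repository.
include { FASTQC }  from './modules/fastqc'
include { MULTIQC } from './modules/multiqc'

// Hypothetical glob for paired-end test reads
params.reads = 'testdata/*_{1,2}.fastq.gz'

workflow {
    // Emits [ sample_id, [ read1, read2 ] ] for each paired-end sample
    reads_ch = Channel.fromFilePairs(params.reads, checkIfExists: true)

    // One FastQC task per sample
    FASTQC(reads_ch)

    // Fan-in: collect every per-sample report into a single MultiQC run.
    // `zip` is an assumed named output (emit: zip) of FASTQC.
    MULTIQC(FASTQC.out.zip.collect())
}
```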

Portfolio Integration

Tutorial       Related Repository        Relationship
03 RNA-seq     lung-tme-deconv-profiler  Produces the count matrix that feeds deconvolution
05 Proteomics  proteomics                Same R analysis wrapped as Nextflow processes

Requirements

  • Nextflow >= 23.04
  • Java 11+
  • Docker or Singularity (for container tutorials)
  • Bioinformatics tools (if running without containers): FastQC, MultiQC, STAR, BWA, GATK, samtools, R
