This repository contains a Jupyter Notebook workflow for performing Quality Control (QC) and Genome Assembly
of bacterial Whole Genome Sequencing (WGS) data on the local sequencing server.
The notebook automates the standard GHRU workflow, combining:
- BactScout for QC
- GHRU-Assembly for genome assembly
- Automatic setup of directory structure for each sequencing run
- Quality Control using BactScout (with Pixi environment)
- Manual QC interpretation step (final_summary.csv)
- Automatic generation of sample sheet for passed samples
- De novo genome assembly using GHRU-Assembly (Nextflow pipeline)
- Optional cleanup of work and .nextflow.log files
- Reproducible and auditable workflow with clear logs
mkdir -p /data/nihr/nextflow_pipelines/
cd /data/nihr/nextflow_pipelines/
# Clone BactScout and GHRU-Assembly
git clone https://github.com/ghruproject/bactscout.git
git clone https://github.com/ghruproject/GHRU-assembly.git
# Create environment for BactScout (Pixi)
mamba create -n pixi python=3.10 -y
mamba run -n pixi pip install pixi
# Create environment for Nextflow (Assembly)
mamba create -n nextflow python=3.10 -y
mamba run -n nextflow mamba install -c bioconda nextflow -y
- Launch JupyterLab or Jupyter Notebook on the sequencing server.
- Open the notebook Routine_WGS_QC_and_Assembly.ipynb.
- Update the base_dir variable to your current run folder (for example: /data/nihr/ghru2/2025/2025-09-22).
- Run the cells sequentially from top to bottom.
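As a rough illustration of how the run folders could be derived from base_dir, here is a minimal sketch. The function name setup_run_dirs and the subfolder names (fastq, output, qc, assembly) are assumptions made for this example; the notebook's actual layout may differ.

```python
from pathlib import Path


def setup_run_dirs(base_dir):
    """Create the per-run folders under base_dir and return them by name."""
    base = Path(base_dir)
    # Hypothetical subfolder names; check the notebook for the real layout.
    dirs = {
        "input_dir": base / "fastq",
        "output_dir": base / "output",
        "qc_dir": base / "output" / "qc",
        "assembly_dir": base / "output" / "assembly",
    }
    for path in dirs.values():
        # exist_ok=True makes the call safe to repeat on an existing run folder.
        path.mkdir(parents=True, exist_ok=True)
    return dirs
```

Calling setup_run_dirs("/data/nihr/ghru2/2025/2025-09-22") would return a dictionary of the four paths, creating any that do not exist yet.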
- Sets up input_dir, output_dir, qc_dir, and assembly_dir.
- Runs QC inside the Pixi environment.
- Generates final_summary.csv and multiqc_report.html.
- Open final_summary.csv and add a new column overall_status (PASSED or FAILED).
- Runs a script to create samplesheet_passed.csv containing only PASSED samples.
- Runs the Nextflow assembly pipeline inside the nextflow environment.
- Generates the assembly output and summary files.
- Removes work/ and .nextflow.log files after successful completion.
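The sample sheet step can be sketched roughly as follows. This is not the notebook's own script: the sample_id column name, the output header, and the _R1/_R2 read naming are assumptions chosen for illustration. Only overall_status, PASSED, final_summary.csv, and samplesheet_passed.csv come from the workflow described above.

```python
import csv
from pathlib import Path


def write_passed_samplesheet(summary_csv, input_dir, out_csv,
                             id_column="sample_id"):
    """Write a sample sheet with only the rows marked PASSED.

    Returns the list of sample IDs that were written.
    """
    input_dir = Path(input_dir)
    passed = []
    with open(summary_csv, newline="") as handle:
        reader = csv.DictReader(handle)
        # Fail early if the manual QC column has not been added yet.
        if "overall_status" not in (reader.fieldnames or []):
            raise ValueError("overall_status column is missing")
        for row in reader:
            status = (row.get("overall_status") or "").strip().upper()
            if status == "PASSED":
                passed.append(row[id_column].strip())

    with open(out_csv, "w", newline="") as handle:
        writer = csv.writer(handle)
        # Hypothetical header; use the columns GHRU-Assembly actually expects.
        writer.writerow(["sample_id", "short_reads1", "short_reads2"])
        for sample in passed:
            # Hypothetical paired-end naming: <sample>_R1/_R2.fastq.gz
            writer.writerow([
                sample,
                str(input_dir / f"{sample}_R1.fastq.gz"),
                str(input_dir / f"{sample}_R2.fastq.gz"),
            ])
    return passed
```

Raising an error when overall_status is absent keeps the manual QC review from being skipped by accident.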
- Always verify QC results before running the assembly pipeline.
- You can modify the BactScout config file to include additional organisms if needed.
- Run cleanup only after confirming assembly completion.
- The notebook follows a fixed folder structure; avoid renaming or moving directories manually.
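One way to enforce the "cleanup only after confirming completion" rule is to guard the deletion behind a check for expected outputs. The sketch below assumes a hypothetical helper, cleanup_nextflow, and leaves it to the caller to say which output files prove the assembly finished; it is not the notebook's own cleanup cell.

```python
import shutil
from pathlib import Path


def cleanup_nextflow(run_dir, required_outputs):
    """Delete work/ and .nextflow.log* in run_dir if all required outputs exist.

    Returns True if cleanup ran, False if it was skipped.
    """
    run_dir = Path(run_dir)
    missing = [p for p in map(Path, required_outputs) if not p.exists()]
    if missing:
        print("Skipping cleanup; missing outputs:", *missing, sep="\n  ")
        return False
    # work/ holds Nextflow's intermediate task directories.
    shutil.rmtree(run_dir / "work", ignore_errors=True)
    # Nextflow rotates old logs as .nextflow.log.1, .nextflow.log.2, ...
    for log in run_dir.glob(".nextflow.log*"):
        log.unlink()
    return True
```

Note that deleting work/ also removes the cache that Nextflow's -resume option relies on, so a run cleaned this way has to start from scratch if it is repeated.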
Varun Shamanna
Senior Bioinformatician, Central Research Laboratory, KIMS, Bengaluru
PhD Researcher