NanoporeMet is a comprehensive pipeline for analyzing nanopore sequencing data, providing sequencing quality assessment, taxonomic classification with Kraken2, and reference-based coverage analysis, all wrapped in an interactive Shiny dashboard for visualization.
| Script | Description |
|---|---|
| `nanoporemet.py` | Main pipeline for sequencing summary analysis and Kraken2 taxonomic classification |
| `coverage.py` | Reference-based coverage analysis using minimap2 and samtools |
| `app.R` | Interactive Shiny dashboard for visualizing taxonomic results |
- Sequencing Summary Analysis: Generates publication-ready PDF plots with:
  - Mean Q-score distribution (all reads and quality-filtered)
  - Sequence length distribution (log-scale) with median values
  - Vertical median lines and light-grey gridlines
- Taxonomic Classification:
  - Automated Kraken2 analysis on all barcode subdirectories
  - Dual database support: viral-only or viral+bacterial analysis
  - Automatic concatenation of FASTQ files per barcode
  - Combined results with “all” barcode summary
  - Cleanup of intermediate files
- Coverage Analysis:
  - Reference-based mapping with minimap2
  - BAM file generation and sorting with samtools
  - Coverage depth calculation
  - Log-scale coverage plots with horizontal/vertical coverage statistics
- Interactive Visualization:
  - Web-based Shiny dashboard available at: http://172.23.210.220:3838/NGS/NanoporeMet/
  - Barcode-specific result filtering
  - Domain-level filtering (Virus/Bacteria)
  - Taxonomy level selection (Species/Genus)
  - Filter options for phages and endogenous retroviruses
  - Blocklisted virus filtering
  - RPM (Reads Per Million) calculations
  - Bold highlighting for control viruses
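The RPM value mentioned in the feature list is a simple normalization. As a minimal sketch, assuming RPM is the number of reads assigned to a taxon divided by the total analyzed reads of the barcode, scaled to one million (the function name `reads_per_million` is hypothetical and not taken from `app.R`):

```python
def reads_per_million(taxon_reads: int, total_reads: int) -> float:
    """Normalize a read count to reads per million analyzed reads."""
    if total_reads <= 0:
        # Avoid division by zero for empty barcodes.
        return 0.0
    return taxon_reads * 1_000_000 / total_reads


# Example: 150 reads assigned to a taxon out of 2,000,000 analyzed reads.
print(reads_per_million(150, 2_000_000))  # 75.0
```

Normalizing this way makes read counts comparable between barcodes that were sequenced to different depths.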
- Python 3.6+ with packages: `pandas`, `matplotlib`
- R 4.0+ with packages: `shiny`, `tidyverse`
- External tools:
  - Kraken2 databases (paths configurable in script):
    - Viral-only: `/data/kraken_databases/k2_human-viral_20240111/`
    - Viral+bacterial: `/data/kraken_databases/k2_pluspf_08gb_20231009/`
1. Clone this repository:

       git clone https://github.com/yourusername/NanoporeMet.git
       cd NanoporeMet

2. Install Python dependencies:

       pip install pandas matplotlib

3. Install R dependencies:

       R -e "install.packages(c('shiny', 'tidyverse'))"

4. Ensure external tools are in your PATH:
   - Kraken2
   - minimap2
   - samtools

5. Make scripts globally accessible (recommended):

   Copy the scripts to `/usr/bin/` with simplified names and make them executable:

       sudo cp nanoporemet.py /usr/bin/nanoporemet
       sudo cp coverage.py /usr/bin/nanopore_coverage
       sudo chmod +x /usr/bin/nanoporemet /usr/bin/nanopore_coverage

   This allows you to run the commands `nanoporemet` and `nanopore_coverage` from any directory without specifying the path.

   Note: If you prefer not to install globally, you can run the scripts directly with `python nanoporemet.py` and `python coverage.py` from the repository directory.
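Before the first run, it can help to confirm that the external tools are actually reachable. The sketch below is not part of the pipeline; it assumes the executables carry their usual names (`kraken2`, `minimap2`, `samtools`), and the helper `missing_tools` is a hypothetical name.

```python
import shutil

# Executable names are assumptions based on the standard installs of each tool.
REQUIRED_TOOLS = ("kraken2", "minimap2", "samtools")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools that cannot be found on the current PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All external tools found.")
```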
Your nanopore run directory should be organized as follows:

    your_run_directory/
    ├── fastq_pass/
    │   ├── barcode01/
    │   │   └── *.fastq.gz
    │   ├── barcode02/
    │   │   └── *.fastq.gz
    │   └── ...
    └── sequencing_summary_*.txt
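To check a run directory against this layout before starting the pipeline, something like the following could be used. It is a sketch only: the function names `find_barcode_fastqs` and `find_summary_files` are hypothetical, and the real script may discover files differently.

```python
from pathlib import Path


def find_barcode_fastqs(run_dir):
    """Map each barcode folder name to its sorted list of .fastq.gz files."""
    fastq_pass = Path(run_dir) / "fastq_pass"
    barcodes = {}
    if not fastq_pass.is_dir():
        return barcodes
    for barcode_dir in sorted(p for p in fastq_pass.iterdir() if p.is_dir()):
        files = sorted(barcode_dir.glob("*.fastq.gz"))
        if files:
            # Folders without any .fastq.gz file are skipped.
            barcodes[barcode_dir.name] = files
    return barcodes


def find_summary_files(run_dir):
    """Return all sequencing_summary_*.txt files in the run directory."""
    return sorted(Path(run_dir).glob("sequencing_summary_*.txt"))
```

An empty result from either function suggests that the directory does not match the expected structure.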
Navigate to your nanopore run directory and run:

    cd /path/to/your/nanopore_run
    nanoporemet

You will be prompted:

    Do you wish to analyze bacterial reads? (yes/y or no/n):

- Answer `yes`/`y` for viral+bacterial analysis
- Answer `no`/`n` for viral-only analysis
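The mapping from answer to analysis mode can be illustrated as follows. This is a sketch rather than the script's actual input handling: the function name `wants_bacterial` is hypothetical, and ignoring case and surrounding whitespace, as well as rejecting other answers, are assumptions.

```python
def wants_bacterial(answer: str) -> bool:
    """Interpret a yes/y or no/n answer; raise ValueError for anything else."""
    normalized = answer.strip().lower()
    if normalized in ("yes", "y"):
        return True   # viral+bacterial analysis
    if normalized in ("no", "n"):
        return False  # viral-only analysis
    raise ValueError(f"Expected yes/y or no/n, got {answer!r}")
```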
The script will:

- Generate `sequencing_summary.pdf` with Q-score and length distributions
- Process each barcode folder in `fastq_pass/`
- Run Kraken2 analysis on each barcode
- Create combined output file:
  - `virus.kraken.txt` (viral-only mode)
  - `virus_bacteria.kraken.txt` (viral+bacterial mode)
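The per-barcode steps above can be sketched roughly as follows. This is an illustration under assumptions, not the code in `nanoporemet.py`: the helper names, the thread count, and the exact set of Kraken2 options are hypothetical, although `--db`, `--threads`, `--gzip-compressed`, `--report`, and `--output` are standard Kraken2 flags.

```python
import shutil
from pathlib import Path


def concatenate_fastqs(fastq_files, combined_path):
    """Concatenate gzipped FASTQ files byte-for-byte into one .fastq.gz file.

    Concatenated gzip members form a valid gzip stream, so the files
    do not need to be decompressed first.
    """
    with open(combined_path, "wb") as out_handle:
        for fastq in fastq_files:
            with open(fastq, "rb") as in_handle:
                shutil.copyfileobj(in_handle, out_handle)
    return Path(combined_path)


def kraken2_command(db_path, combined_fastq, report_path, output_path, threads=4):
    """Build (but do not run) a Kraken2 command line as an argument list."""
    return [
        "kraken2",
        "--db", str(db_path),
        "--threads", str(threads),
        "--gzip-compressed",
        "--report", str(report_path),
        "--output", str(output_path),
        str(combined_fastq),
    ]
```

The returned list can be passed to `subprocess.run(cmd, check=True)`; building it separately keeps the command easy to inspect or log before execution.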
For reference-based coverage analysis of specific genomes:

    cd /path/to/your/nanopore_run
    nanopore_coverage

You will be prompted:

    Enter the path to the reference sequence directory:

Provide the path to a directory containing a single `.fasta` reference file.
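The single-reference requirement can be checked up front. The sketch below uses a hypothetical helper, `find_reference_fasta`, and makes one design choice that may differ from `coverage.py`: hidden `._*` files (as created by macOS on some file systems) are skipped instead of being counted as additional references.

```python
from pathlib import Path


def find_reference_fasta(reference_dir):
    """Return the only .fasta file in reference_dir, ignoring hidden ._* files."""
    candidates = sorted(
        path for path in Path(reference_dir).glob("*.fasta")
        if not path.name.startswith("._")
    )
    if not candidates:
        raise FileNotFoundError(f"No .fasta file found in {reference_dir}")
    if len(candidates) > 1:
        names = ", ".join(path.name for path in candidates)
        raise ValueError(f"Multiple reference fasta files found: {names}")
    return candidates[0]
```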
The script will:
- Concatenate all FASTQ files
- Map reads to reference with minimap2
- Generate sorted BAM file
- Calculate coverage depth
- Create coverage plot PDF with:
  - Log-scale coverage depth
  - Horizontal coverage percentage
  - Mean vertical coverage (X)
  - Genome position axis
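The mapping and depth steps listed above correspond roughly to the command lines built below. This is a sketch, not the script's code: the function name `coverage_commands`, the `map-ont` preset, the thread count, and the use of `samtools depth -a` (which also reports zero-depth positions) are assumptions about how `coverage.py` invokes the tools.

```python
def coverage_commands(reference_fasta, reads_fastq, out_prefix, threads=4):
    """Build (but do not run) the mapping, sorting, and depth commands."""
    sam = f"{out_prefix}.sam"
    bam = f"{out_prefix}.bam"
    coverage = f"{out_prefix}.coverage"
    return [
        # Map nanopore reads to the reference and write SAM output.
        ["minimap2", "-ax", "map-ont", "-t", str(threads), "-o", sam,
         str(reference_fasta), str(reads_fastq)],
        # Sort alignments by coordinate and write a BAM file.
        ["samtools", "sort", "-o", bam, sam],
        # Report per-base depth, including positions with zero coverage.
        ["samtools", "depth", "-a", "-o", coverage, bam],
    ]
```

Each inner list can be run in order with `subprocess.run(cmd, check=True)`.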
Output structure:

    your_run_directory/
    └── reference_name/
        ├── reference_name.sam
        ├── reference_name.bam
        ├── reference_name.coverage
        └── reference_name.pdf
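The two statistics shown on the coverage plot can be recomputed from the `.coverage` file. The sketch below assumes that the file is tab-separated `samtools depth -a` output (reference name, position, depth), that horizontal coverage means the percentage of positions with depth of at least 1, and that mean vertical coverage is the average depth over all listed positions. The function name `coverage_summary` is hypothetical.

```python
def coverage_summary(coverage_path):
    """Return (horizontal coverage in percent, mean depth) from a depth file."""
    positions = 0
    covered = 0
    total_depth = 0
    with open(coverage_path) as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 3:
                continue  # skip blank or malformed lines
            depth = int(fields[2])
            positions += 1
            total_depth += depth
            if depth > 0:
                covered += 1
    if positions == 0:
        return 0.0, 0.0
    return covered * 100 / positions, total_depth / positions
```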
Access the pre-configured Shiny dashboard at:
http://172.23.210.220:3838/NGS/NanoporeMet/
- Upload: Load `virus.kraken.txt` or `virus_bacteria.kraken.txt` files generated by the pipeline
- Barcode Selection: Choose specific barcode or “all” for summary
- Domain Filter: Select Virus or Bacteria (if available)
- Taxonomy Level: Switch between Species and Genus views
- Filter Options:
  - Hide phages and endogenous retroviruses
  - Hide blocklisted viruses (common contaminants)
- Output:
  - Total analyzed reads counter
  - Domain-level bar plot with consistent color scheme
  - Detailed table with taxonomy, NCBI IDs, read counts, and RPM
If you prefer to run the Shiny app locally:

    R -e "shiny::runApp('app.R')"

Or open `app.R` in RStudio and click “Run App”.
| File | Description |
|---|---|
| `sequencing_summary.pdf` | Q-score and length distribution plots |
| `virus.kraken.txt` | Kraken2 results (viral-only mode) |
| `virus_bacteria.kraken.txt` | Kraken2 results (viral+bacterial mode) |
| `barcodeXX/barcodeXX.kreport.txt` | Individual barcode reports |
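The individual barcode reports can also be inspected outside the dashboard. The sketch below assumes that `barcodeXX.kreport.txt` is a standard six-column Kraken2 report (percentage, clade reads, direct reads, rank code, NCBI taxonomy ID, name); the combined `*.kraken.txt` files may use a different layout. The function name `read_kreport` is hypothetical.

```python
def read_kreport(path, rank_code="S"):
    """Return report rows of one rank (default: species), most reads first."""
    rows = []
    with open(path) as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) != 6:
                continue  # not a standard six-column report line
            percent, clade_reads, _direct_reads, rank, taxid, name = fields
            if rank == rank_code:
                rows.append({
                    "name": name.strip(),  # names are indented by taxonomic depth
                    "taxid": int(taxid),
                    "clade_reads": int(clade_reads),
                    "percent": float(percent),
                })
    return sorted(rows, key=lambda row: row["clade_reads"], reverse=True)
```

Passing `rank_code="G"` returns genus-level rows instead.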
| File | Description |
|---|---|
| `reference_name/reference_name.pdf` | Coverage plot |
| `reference_name/reference_name.coverage` | Per-base coverage depths |
| `reference_name/reference_name.bam` | Sorted alignment file |
| `reference_name/reference_name.sam` | Raw alignment file |
The PDF includes four plots:
- Mean Q-score (all reads)
- Mean Q-score (quality-filtered reads)
- Sequence length (all reads, log-scale)
- Sequence length (quality-filtered reads, log-scale)
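The median values drawn on the length plots can be reproduced from the sequencing summary. The sketch below assumes a tab-separated summary with the columns `sequence_length_template` and `passes_filtering` (as written by common ONT basecallers) and treats reads with `passes_filtering` equal to `TRUE` as the quality-filtered set. Both the column names and the filtering criterion are assumptions about what `nanoporemet.py` uses, and `median_lengths` is a hypothetical name.

```python
import csv
from statistics import median


def median_lengths(summary_path):
    """Return (median length of all reads, median length of passing reads)."""
    all_lengths = []
    passed_lengths = []
    with open(summary_path, newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            length = int(row["sequence_length_template"])
            all_lengths.append(length)
            if row["passes_filtering"].strip().upper() == "TRUE":
                passed_lengths.append(length)
    return (
        median(all_lengths) if all_lengths else None,
        median(passed_lengths) if passed_lengths else None,
    )
```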
Features:
- Log-scale y-axis for coverage depth
- Genome position on x-axis
- Horizontal coverage percentage
- Mean vertical coverage (X)
- Minimalist styling with light-grey gridlines
- Human: Dark blue (#143642)
- Bacterial: Teal (#0F8B8D)
- Fungal: Orange (#EC9A29)
- Viral: Red (#A8201A)
- Unclassified: Purple-grey (#A69CAC)
Edit `nanoporemet.py`:

    # Change these paths to point to your Kraken2 databases
    if analyze_bacterial in ['yes', 'y']:
        kraken_db_path = "/your/path/to/k2_pluspf_database/"
    else:
        kraken_db_path = "/your/path/to/k2_human-viral_database/"

Edit the pattern in `app.R` to add/remove viruses from the blocklist:

    if (input$hide_blocklisted_viruses) {
      data <- data %>%
        filter(!(grepl("virus1|virus2|virus3", X7, ignore.case = TRUE) & X8 == "Virus"))
    }

- “No fastq.gz files found”
  - Ensure your directory structure is correct: `fastq_pass/barcodeXX/*.fastq.gz`
  - Check file permissions
- “Multiple reference fasta files found”
  - Ensure reference directory contains only one `.fasta` file
  - Remove hidden files (`._*`)
- Kraken2 database not found
  - Verify database paths in `nanoporemet.py`
  - Ensure databases are properly formatted
- Shiny app shows no data
  - Confirm file format matches expected Kraken2 output
  - Check that barcodes exist in the uploaded file
  - Ensure you’re using files generated by `nanoporemet.py`
- Cannot access the web dashboard
  - Verify you’re on the correct network
  - Check if the URL is accessible: http://172.23.210.220:3838/NGS/NanoporeMet/
  - Contact your system administrator if issues persist
- Command not found: `nanoporemet` / `nanopore_coverage`
  - If you didn’t install globally, use `python nanoporemet.py` or `python coverage.py` instead
  - Or install globally with the commands in step 5 of the Installation section