medvir/NanoporeMet

NanoporeMet

NanoporeMet is a pipeline for analyzing nanopore sequencing data. It provides sequencing quality assessment, taxonomic classification with Kraken2, and reference-based coverage analysis, together with an interactive Shiny dashboard for visualizing the results.

Features

Core Analysis Scripts

Script           Description
nanoporemet.py   Main pipeline for sequencing summary analysis and Kraken2 taxonomic classification
coverage.py      Reference-based coverage analysis using minimap2 and samtools
app.R            Interactive Shiny dashboard for visualizing taxonomic results

Key Capabilities

  • Sequencing Summary Analysis: Generates publication-ready PDF plots with:
    • Mean Q-score distribution (all reads and quality-filtered)
    • Sequence length distribution (log-scale) with median values
    • Vertical median lines and light-grey gridlines
  • Taxonomic Classification:
    • Automated Kraken2 analysis on all barcode subdirectories
    • Dual database support: viral-only or viral+bacterial analysis
    • Automatic concatenation of FASTQ files per barcode
    • Combined results with “all” barcode summary
    • Cleanup of intermediate files
  • Coverage Analysis:
    • Reference-based mapping with minimap2
    • BAM file generation and sorting with samtools
    • Coverage depth calculation
    • Log-scale coverage plots annotated with horizontal coverage (percentage of the reference covered) and mean vertical coverage (mean depth)
  • Interactive Visualization:
    • Web-based Shiny dashboard available at: http://172.23.210.220:3838/NGS/NanoporeMet/
    • Barcode-specific result filtering
    • Domain-level filtering (Virus/Bacteria)
    • Taxonomy level selection (Species/Genus)
    • Filter options for phages and endogenous retroviruses
    • Blocklisted virus filtering
    • RPM (Reads Per Million) calculations
    • Bold highlighting for control viruses
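
As a rough illustration of the RPM figure listed above, the sketch below scales a taxon's read count to one million reads. The function name is hypothetical, and using the total number of analyzed reads for the selected barcode as the denominator is an assumption; app.R may normalize differently.

```python
def reads_per_million(taxon_reads: int, total_reads: int) -> float:
    """Scale a taxon's read count to reads per million analyzed reads."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    # Multiply first so that integer inputs stay exact until the division.
    return taxon_reads * 1_000_000 / total_reads


# Example: 150 reads assigned to one taxon out of 60,000 analyzed reads.
print(reads_per_million(150, 60_000))  # 2500.0
```

Normalizing this way makes counts comparable between barcodes with different sequencing yields.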

Requirements

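Dependencies

  • Python with pandas and matplotlib
  • R with shiny and tidyverse
  • Kraken2, minimap2, and samtools available in your PATH
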
Databases

  • Kraken2 databases (paths configurable in script):
    • Viral-only: /data/kraken_databases/k2_human-viral_20240111/
    • Viral+bacterial: /data/kraken_databases/k2_pluspf_08gb_20231009/

Installation

  1. Clone this repository:

    git clone https://github.com/medvir/NanoporeMet.git
    cd NanoporeMet
  2. Install Python dependencies:

    pip install pandas matplotlib
  3. Install R dependencies:

    R -e "install.packages(c('shiny', 'tidyverse'))"
  4. Ensure external tools are in your PATH:

    • Kraken2
    • minimap2
    • samtools
  5. Make scripts globally accessible (recommended):

    Copy the scripts to /usr/bin/ with simplified names and make them executable:

    sudo cp nanoporemet.py /usr/bin/nanoporemet
    sudo cp coverage.py /usr/bin/nanopore_coverage
    sudo chmod +x /usr/bin/nanoporemet /usr/bin/nanopore_coverage

    This allows you to run nanoporemet and nanopore_coverage from any directory without specifying a path, provided each script starts with a Python shebang line (for example #!/usr/bin/env python3).

    Note: If you prefer not to install globally, you can run the scripts directly with python nanoporemet.py and python coverage.py from the repository directory.
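
Before the first run, it can help to confirm that the external tools from step 4 are actually visible on your PATH. The sketch below is one way to check; the executable names kraken2, minimap2, and samtools are the usual ones and are an assumption about your installation, and missing_tools is a hypothetical helper, not part of the repository.

```python
import shutil

# Executable names as usually installed; adjust if your installation differs.
REQUIRED_TOOLS = ("kraken2", "minimap2", "samtools")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools that cannot be found on the current PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


missing = missing_tools()
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("All external tools found.")
```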

Usage

Directory Structure

Your nanopore run directory should be organized as follows:

your_run_directory/
├── fastq_pass/
│   ├── barcode01/
│   │   └── *.fastq.gz
│   ├── barcode02/
│   │   └── *.fastq.gz
│   └── ...
└── sequencing_summary_*.txt
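
To check a run directory against this layout before starting the pipeline, a small sketch such as the following may be useful. Both helper names are hypothetical, and the checks only mirror the tree shown above; nanoporemet.py may apply different rules.

```python
from pathlib import Path


def list_barcode_fastqs(run_dir):
    """Map each barcode folder under fastq_pass/ to its sorted .fastq.gz files."""
    fastq_pass = Path(run_dir) / "fastq_pass"
    if not fastq_pass.is_dir():
        raise FileNotFoundError(f"{fastq_pass} does not exist")
    layout = {}
    for barcode_dir in sorted(p for p in fastq_pass.iterdir() if p.is_dir()):
        layout[barcode_dir.name] = sorted(barcode_dir.glob("*.fastq.gz"))
    return layout


def has_sequencing_summary(run_dir):
    """True if at least one sequencing_summary_*.txt sits in the run directory."""
    return any(Path(run_dir).glob("sequencing_summary_*.txt"))
```

Calling list_barcode_fastqs(".") from inside the run directory returns a dictionary in which a barcode with an empty list is a folder that holds no .fastq.gz files.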

1. Run Main Analysis Pipeline

Navigate to your nanopore run directory and run:

cd /path/to/your/nanopore_run
nanoporemet

You will be prompted:

Do you wish to analyze bacterial reads? (yes/y or no/n):
  • Answer yes/y for viral+bacterial analysis
  • Answer no/n for viral-only analysis

The script will:

  1. Generate sequencing_summary.pdf with Q-score and length distributions

  2. Process each barcode folder in fastq_pass/

  3. Run Kraken2 analysis on each barcode

  4. Create combined output file:

    • virus.kraken.txt (viral-only mode)
    • virus_bacteria.kraken.txt (viral+bacterial mode)
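
Steps 2 and 3 can be pictured with the sketch below, which merges the FASTQ files of one barcode and builds a Kraken2 command line. It is not taken from nanoporemet.py: the function names, the thread count, and the choice of flags are assumptions, although every flag shown is a standard Kraken2 option.

```python
import shutil
from pathlib import Path


def concatenate_fastq_gz(barcode_dir, merged_path):
    """Append every .fastq.gz in barcode_dir to merged_path; return the file count.

    Gzip files can be joined byte for byte: the result is a valid
    multi-member gzip stream, so nothing has to be decompressed.
    """
    merged_path = Path(merged_path)
    parts = sorted(
        p for p in Path(barcode_dir).glob("*.fastq.gz")
        if p.resolve() != merged_path.resolve()  # never read the output file
    )
    with open(merged_path, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)
    return len(parts)


def kraken2_command(db_path, merged_fastq, report_path, threads=4):
    """Build a Kraken2 argument list (flag selection is an assumption)."""
    return [
        "kraken2",
        "--db", str(db_path),
        "--threads", str(threads),
        "--gzip-compressed",
        "--report", str(report_path),
        "--output", "-",  # "-" suppresses the per-read output
        str(merged_fastq),
    ]
```

The returned list can be passed to subprocess.run(..., check=True) once the database path points at a real Kraken2 database.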

2. Run Coverage Analysis (Optional)

For reference-based coverage analysis of specific genomes:

cd /path/to/your/nanopore_run
nanopore_coverage

You will be prompted:

Enter the path to the reference sequence directory:

Provide the path to a directory containing a single .fasta reference file.
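
The single-reference requirement can be checked up front with a sketch like the one below. find_reference_fasta is a hypothetical helper; unlike what the Troubleshooting section suggests about coverage.py, it skips hidden files such as ._*.fasta instead of counting them.

```python
from pathlib import Path


def find_reference_fasta(reference_dir):
    """Return the single visible .fasta file in reference_dir."""
    candidates = [
        p for p in sorted(Path(reference_dir).glob("*.fasta"))
        if not p.name.startswith(".")  # pathlib's glob also matches dotfiles
    ]
    if len(candidates) != 1:
        raise ValueError(
            f"Expected exactly one .fasta file in {reference_dir}, "
            f"found {len(candidates)}"
        )
    return candidates[0]
```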

The script will:

  1. Concatenate all FASTQ files
  2. Map reads to reference with minimap2
  3. Generate sorted BAM file
  4. Calculate coverage depth
  5. Create coverage plot PDF with:
    • Log-scale coverage depth
    • Horizontal coverage percentage
    • Mean vertical coverage (X)
    • Genome position axis

Output structure:

your_run_directory/
└── reference_name/
    ├── reference_name.sam
    ├── reference_name.bam
    ├── reference_name.coverage
    └── reference_name.pdf
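
The mapping steps and the file names above can be sketched as follows. This is an illustration, not the contents of coverage.py: the helper names are hypothetical and the exact options (the map-ont preset, samtools depth -a) are assumptions, though all of them are standard minimap2 and samtools options.

```python
import subprocess
from pathlib import Path


def coverage_commands(reference_fasta, reads_fastq, out_dir):
    """Return (argv, stdout_path) pairs for mapping, sorting, and depth."""
    name = Path(reference_fasta).stem
    out = Path(out_dir) / name
    sam, bam, cov = (out / f"{name}{ext}" for ext in (".sam", ".bam", ".coverage"))
    return [
        # Align nanopore reads and write SAM to stdout.
        (["minimap2", "-ax", "map-ont", str(reference_fasta), str(reads_fastq)], sam),
        # Sort the alignments into a BAM file.
        (["samtools", "sort", "-o", str(bam), str(sam)], None),
        # Report depth at every position, including zero-depth positions.
        (["samtools", "depth", "-a", str(bam)], cov),
    ]


def run_commands(commands):
    """Run each command in order, redirecting stdout where a path is given."""
    for argv, stdout_path in commands:
        if stdout_path is None:
            subprocess.run(argv, check=True)
        else:
            Path(stdout_path).parent.mkdir(parents=True, exist_ok=True)
            with open(stdout_path, "w") as handle:
                subprocess.run(argv, check=True, stdout=handle)
```

Keeping command construction separate from execution makes it easy to print the commands for inspection before anything is run.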

3. Interactive Dashboard

Access the pre-configured Shiny dashboard at:

http://172.23.210.220:3838/NGS/NanoporeMet/

Dashboard Features

  1. Upload: Load virus.kraken.txt or virus_bacteria.kraken.txt files generated by the pipeline
  2. Barcode Selection: Choose specific barcode or “all” for summary
  3. Domain Filter: Select Virus or Bacteria (if available)
  4. Taxonomy Level: Switch between Species and Genus views
  5. Filter Options:
    • Hide phages and endogenous retroviruses
    • Hide blocklisted viruses (common contaminants)
  6. Output:
    • Total analyzed reads counter
    • Domain-level bar plot with consistent color scheme
    • Detailed table with taxonomy, NCBI IDs, read counts, and RPM

Running Locally

If you prefer to run the Shiny app locally:

R -e "shiny::runApp('app.R')"

Or open app.R in RStudio and click “Run App”.

Output Files

From nanoporemet.py

File                              Description
sequencing_summary.pdf            Q-score and length distribution plots
virus.kraken.txt                  Kraken2 results (viral-only mode)
virus_bacteria.kraken.txt         Kraken2 results (viral+bacterial mode)
barcodeXX/barcodeXX.kreport.txt   Individual barcode reports

From coverage.py

File                                     Description
reference_name/reference_name.pdf        Coverage plot
reference_name/reference_name.coverage   Per-base coverage depths
reference_name/reference_name.bam        Sorted alignment file
reference_name/reference_name.sam        Raw alignment file

Visualization Examples

Sequencing Summary

The PDF includes four plots:

  1. Mean Q-score (all reads)
  2. Mean Q-score (quality-filtered reads)
  3. Sequence length (all reads, log-scale)
  4. Sequence length (quality-filtered reads, log-scale)
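
The median values drawn on these plots can be approximated outside the pipeline with the standard-library sketch below. It assumes the sequencing summary is tab-separated and contains the columns sequence_length_template, mean_qscore_template, and passes_filtering, as written by common ONT basecallers. Whether nanoporemet.py uses these exact columns, and whether "quality-filtered" corresponds to passes_filtering, is an assumption.

```python
import csv
from statistics import median


def summarize_reads(summary_path):
    """Read counts and medians for all reads and for reads passing the filter."""
    lengths = {"all": [], "passed": []}
    qscores = {"all": [], "passed": []}
    with open(summary_path, newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            length = int(row["sequence_length_template"])
            qscore = float(row["mean_qscore_template"])
            groups = ["all"]
            if row["passes_filtering"].strip().upper() == "TRUE":
                groups.append("passed")
            for group in groups:
                lengths[group].append(length)
                qscores[group].append(qscore)
    return {
        group: {
            "reads": len(lengths[group]),
            "median_length": median(lengths[group]) if lengths[group] else None,
            "median_qscore": median(qscores[group]) if qscores[group] else None,
        }
        for group in ("all", "passed")
    }
```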

Coverage Plot

Features:

  • Log-scale y-axis for coverage depth
  • Genome position on x-axis
  • Horizontal coverage percentage
  • Mean vertical coverage (X)
  • Minimalist styling with light-grey gridlines
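
The two statistics on the plot can be computed from the .coverage file roughly as sketched below. The sketch assumes the three-column, tab-separated format written by samtools depth (reference name, position, depth) and defines a position as covered when its depth is at least 1. Both the function name and that threshold are assumptions, not taken from coverage.py.

```python
def coverage_stats(coverage_path, min_depth=1):
    """Return (horizontal coverage in percent, mean depth) for a depth file."""
    positions = covered = total_depth = 0
    with open(coverage_path) as handle:
        for line in handle:
            fields = line.split("\t")
            if len(fields) < 3:
                continue  # skip blank or malformed lines
            depth = int(fields[2])
            positions += 1
            total_depth += depth
            if depth >= min_depth:
                covered += 1
    if positions == 0:
        raise ValueError(f"no depth records found in {coverage_path}")
    return 100 * covered / positions, total_depth / positions
```

The horizontal coverage is only meaningful if the file lists zero-depth positions too, which samtools depth does when called with -a; without them every listed position counts as covered.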

Shiny Dashboard Colors

  • Human: Dark blue (#143642)
  • Bacterial: Teal (#0F8B8D)
  • Fungal: Orange (#EC9A29)
  • Viral: Red (#A8201A)
  • Unclassified: Purple-grey (#A69CAC)

Customization

Modifying Kraken2 Database Paths

Edit nanoporemet.py:

# Change these paths to point to your Kraken2 databases
if analyze_bacterial in ['yes', 'y']:
    kraken_db_path = "/your/path/to/k2_pluspf_database/"
else:
    kraken_db_path = "/your/path/to/k2_human-viral_database/"
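
If you would rather not edit the script for every machine, one option is to resolve the database path from environment variables, as in the sketch below. The function and the variable names NANOPOREMET_DB_VIRAL and NANOPOREMET_DB_BACTERIAL are hypothetical and not read by nanoporemet.py as shipped; the defaults are the paths listed under Databases. Unlike the snippet above, which treats any answer other than yes/y as viral-only, this sketch rejects unrecognized answers.

```python
import os

# Defaults taken from the Databases section of this README.
DEFAULT_DBS = {
    "viral": "/data/kraken_databases/k2_human-viral_20240111/",
    "viral_bacterial": "/data/kraken_databases/k2_pluspf_08gb_20231009/",
}


def select_kraken_db(answer, env=os.environ):
    """Map the yes/no prompt answer to a Kraken2 database path."""
    normalized = answer.strip().lower()
    if normalized in ("yes", "y"):
        return env.get("NANOPOREMET_DB_BACTERIAL", DEFAULT_DBS["viral_bacterial"])
    if normalized in ("no", "n"):
        return env.get("NANOPOREMET_DB_VIRAL", DEFAULT_DBS["viral"])
    raise ValueError(f"Please answer yes/y or no/n, got {answer!r}")
```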

Updating Blocklisted Viruses

Edit the pattern in app.R to add/remove viruses from the blocklist:

if (input$hide_blocklisted_viruses) {
  data <- data %>%
    filter(!(grepl("virus1|virus2|virus3", X7, ignore.case = TRUE) & X8 == "Virus"))
}

Troubleshooting

Common Issues

  1. “No fastq.gz files found”
    • Ensure your directory structure is correct: fastq_pass/barcodeXX/*.fastq.gz
    • Check file permissions
  2. “Multiple reference fasta files found”
    • Ensure reference directory contains only one .fasta file
    • Remove hidden files (._*), such as the metadata copies macOS creates, since they can also end in .fasta
  3. Kraken2 database not found
    • Verify database paths in nanoporemet.py
    • Ensure databases are properly formatted
  4. Shiny app shows no data
    • Confirm file format matches expected Kraken2 output
    • Check that barcodes exist in the uploaded file
    • Ensure you’re using files generated by nanoporemet.py
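  5. Cannot access the web dashboard
    • 172.23.210.220 is a private network address, so the dashboard is reachable only from within that network (or over a VPN into it)
    • Alternatively, run the Shiny app locally (see Running Locally)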
  6. Command not found: nanoporemet / nanopore_coverage
    • If you didn’t install globally, use python nanoporemet.py or python coverage.py instead
    • Or install globally with the commands in step 5 of the Installation section

Acknowledgments

  • Kraken2 for taxonomic classification
  • minimap2 for long-read alignment
  • samtools for SAM/BAM processing
  • Shiny for interactive visualization
  • Oxford Nanopore Technologies for sequencing platforms
