NanoporeMet

NanoporeMet is a pipeline for analyzing nanopore sequencing data. It provides sequencing quality assessment, taxonomic classification with Kraken2, and reference-based coverage analysis, together with an interactive Shiny dashboard for visualizing the taxonomic results.

Features

Core Analysis Scripts

| Script | Description |
|---|---|
| `nanoporemet.py` | Main pipeline for sequencing summary analysis and Kraken2 taxonomic classification |
| `coverage.py` | Reference-based coverage analysis using minimap2 and samtools |
| `app.R` | Interactive Shiny dashboard for visualizing taxonomic results |

Key Capabilities

Sequencing Summary Analysis: Generates publication-ready PDF plots with:
- Mean Q-score distribution (all reads and quality-filtered)
- Sequence length distribution (log-scale) with median values
- Vertical median lines and light-grey gridlines
Taxonomic Classification:
- Automated Kraken2 analysis on all barcode subdirectories
- Dual database support: viral-only or viral+bacterial analysis
- Automatic concatenation of FASTQ files per barcode
- Combined results with “all” barcode summary
- Cleanup of intermediate files
Coverage Analysis:
- Reference-based mapping with minimap2
- BAM file generation and sorting with samtools
- Coverage depth calculation
- Log-scale coverage plots with horizontal/vertical coverage statistics
Interactive Visualization:
- Web-based Shiny dashboard available at: http://172.23.210.220:3838/NGS/NanoporeMet/
- Barcode-specific result filtering
- Domain-level filtering (Virus/Bacteria)
- Taxonomy level selection (Species/Genus)
- Filter options for phages and endogenous retroviruses
- Blocklisted virus filtering
- RPM (Reads Per Million) calculations
- Bold highlighting for control viruses
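
The RPM value listed above is a simple normalization. A minimal sketch follows, assuming the denominator is the total number of analyzed reads for the selected barcode; the function name `reads_per_million` is hypothetical and is not taken from `app.R`.

```python
def reads_per_million(taxon_reads: int, total_reads: int) -> float:
    """Normalize a taxon's read count to reads per million analyzed reads."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    # Multiply before dividing so small fractions do not lose precision.
    return taxon_reads * 1_000_000 / total_reads


# Example: 250 reads assigned to a taxon out of 500,000 analyzed reads.
print(reads_per_million(250, 500_000))  # 500.0
```

Normalizing this way makes read counts comparable between barcodes with different sequencing depth.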

Requirements

Dependencies

Python 3.6+ with packages:
```
pandas
matplotlib
```
R 4.0+ with packages:
```
shiny
tidyverse
```
External tools:
- Kraken2
- minimap2
- samtools
- seqkit (optional)

Databases

Kraken2 databases (paths configurable in script):
- Viral-only: /data/kraken_databases/k2_human-viral_20240111/
- Viral+bacterial: /data/kraken_databases/k2_pluspf_08gb_20231009/

Installation

Clone this repository:

git clone https://github.com/yourusername/NanoporeMet.git
cd NanoporeMet

Install Python dependencies:
```
pip install pandas matplotlib
```

Install R dependencies:

R -e "install.packages(c('shiny', 'tidyverse'))"

Ensure external tools are in your PATH:
- Kraken2
- minimap2
- samtools
Make scripts globally accessible (recommended):

Copy the scripts to /usr/bin/ with simplified names and make them executable:
```
sudo cp nanoporemet.py /usr/bin/nanoporemet
sudo cp coverage.py /usr/bin/nanopore_coverage
sudo chmod +x /usr/bin/nanoporemet /usr/bin/nanopore_coverage
```
This allows you to run the commands nanoporemet and nanopore_coverage from any directory without specifying the path.

Note: If you prefer not to install globally, you can run the scripts directly with python nanoporemet.py and python coverage.py from the repository directory.

Usage

Directory Structure

Your nanopore run directory should be organized as follows:

your_run_directory/
├── fastq_pass/
│   ├── barcode01/
│   │   └── *.fastq.gz
│   ├── barcode02/
│   │   └── *.fastq.gz
│   └── ...
└── sequencing_summary_*.txt
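
To check that a run directory matches this layout before starting the pipeline, something like the following sketch may help. It is not part of `nanoporemet.py`; the function name `find_barcode_fastqs` is hypothetical, and the only assumption is the `fastq_pass/barcodeXX/*.fastq.gz` layout shown above.

```python
from pathlib import Path
from typing import Dict, List


def find_barcode_fastqs(run_dir: str) -> Dict[str, List[Path]]:
    """Map each barcode folder under fastq_pass/ to its sorted .fastq.gz files."""
    fastq_pass = Path(run_dir) / "fastq_pass"
    if not fastq_pass.is_dir():
        raise FileNotFoundError(f"No fastq_pass/ directory in {run_dir}")
    barcodes: Dict[str, List[Path]] = {}
    for sub in sorted(fastq_pass.iterdir()):
        if not sub.is_dir():
            continue
        files = sorted(sub.glob("*.fastq.gz"))
        if files:  # folders without FASTQ files are skipped
            barcodes[sub.name] = files
    return barcodes
```

Calling `find_barcode_fastqs(".")` from the run directory returns an empty dictionary when no barcode folder contains `.fastq.gz` files, which is a quick way to spot a misplaced directory.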

1. Run Main Analysis Pipeline

Navigate to your nanopore run directory and run:

cd /path/to/your/nanopore_run
nanoporemet

You will be prompted:

Do you wish to analyze bacterial reads? (yes/y or no/n):

Answer yes/y for viral+bacterial analysis
Answer no/n for viral-only analysis

The script will:

Generate sequencing_summary.pdf with Q-score and length distributions
Process each barcode folder in fastq_pass/
Run Kraken2 analysis on each barcode
Create combined output file:
- virus.kraken.txt (viral-only mode)
- virus_bacteria.kraken.txt (viral+bacterial mode)
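
The per-barcode steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in `nanoporemet.py`: the helper names are hypothetical, and the exact Kraken2 options the script passes are not documented here. The sketch uses only the standard Kraken2 options `--db`, `--threads`, `--gzip-compressed`, `--report`, and `--output`.

```python
import shutil
from pathlib import Path
from typing import List


def concatenate_fastqs(fastq_files: List[Path], combined: Path) -> Path:
    """Append gzipped FASTQ files byte for byte.

    Concatenated gzip members form a valid gzip stream, so no
    decompression is needed.
    """
    with open(combined, "wb") as out:
        for fq in fastq_files:
            with open(fq, "rb") as src:
                shutil.copyfileobj(src, out)
    return combined


def build_kraken2_command(
    db_path: str, reads: Path, report: Path, output: Path, threads: int = 4
) -> List[str]:
    """Build (but do not run) a Kraken2 command line for one barcode."""
    return [
        "kraken2",
        "--db", db_path,
        "--threads", str(threads),
        "--gzip-compressed",
        "--report", str(report),
        "--output", str(output),
        str(reads),
    ]
```

The returned list can be passed to `subprocess.run(cmd, check=True)`. Building the command as a list rather than a shell string avoids quoting problems with paths that contain spaces.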

2. Run Coverage Analysis (Optional)

For reference-based coverage analysis of specific genomes:

cd /path/to/your/nanopore_run
nanopore_coverage

You will be prompted:

Enter the path to the reference sequence directory:

Provide the path to a directory containing a single .fasta reference file.
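
The single-file requirement can be checked up front with a sketch like the one below. The function name `find_reference_fasta` and the error text are hypothetical. Unlike what the Troubleshooting section suggests about `coverage.py`, this sketch deliberately skips hidden files such as `._*`; that is a design choice of the example, not documented script behavior.

```python
from pathlib import Path


def find_reference_fasta(reference_dir: str) -> Path:
    """Return the single visible .fasta file in reference_dir."""
    candidates = sorted(
        p
        for p in Path(reference_dir).glob("*.fasta")
        if not p.name.startswith(".")  # ignore hidden files like ._ref.fasta
    )
    if len(candidates) != 1:
        raise ValueError(
            f"Expected exactly one .fasta file in {reference_dir}, "
            f"found {len(candidates)}"
        )
    return candidates[0]
```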

The script will:

Concatenate all FASTQ files
Map reads to reference with minimap2
Generate sorted BAM file
Calculate coverage depth
Create coverage plot PDF with:
- Log-scale coverage depth
- Horizontal coverage percentage
- Mean vertical coverage (X)
- Genome position axis
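
For orientation, the mapping steps above correspond roughly to the three commands built in this sketch. The exact options used by `coverage.py` are not documented here, so treat the choices as assumptions: the `map-ont` preset for minimap2, `samtools depth -a` to report zero-depth positions, and the `-o` output option of `samtools depth`, which is available in recent samtools releases. The function name `build_coverage_commands` is hypothetical.

```python
from typing import List


def build_coverage_commands(reference: str, reads: str, prefix: str) -> List[List[str]]:
    """Build (but do not run) the mapping, sorting, and depth commands."""
    sam = f"{prefix}.sam"
    bam = f"{prefix}.bam"
    coverage = f"{prefix}.coverage"
    return [
        # Align nanopore reads to the reference and write SAM output.
        ["minimap2", "-ax", "map-ont", "-o", sam, reference, reads],
        # Sort alignments by coordinate and write BAM output.
        ["samtools", "sort", "-o", bam, sam],
        # Report depth at every position, including uncovered ones.
        ["samtools", "depth", "-a", "-o", coverage, bam],
    ]
```

Each inner list can be run in order with `subprocess.run(cmd, check=True)`, so a failure in mapping stops the later steps.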

Output structure:

your_run_directory/
└── reference_name/
    ├── reference_name.sam
    ├── reference_name.bam
    ├── reference_name.coverage
    └── reference_name.pdf

3. Interactive Dashboard

Access the pre-configured Shiny dashboard at:

http://172.23.210.220:3838/NGS/NanoporeMet/

Dashboard Features

Upload: Load virus.kraken.txt or virus_bacteria.kraken.txt files generated by the pipeline
Barcode Selection: Choose specific barcode or “all” for summary
Domain Filter: Select Virus or Bacteria (if available)
Taxonomy Level: Switch between Species and Genus views
Filter Options:
- Hide phages and endogenous retroviruses
- Hide blocklisted viruses (common contaminants)
Output:
- Total analyzed reads counter
- Domain-level bar plot with consistent color scheme
- Detailed table with taxonomy, NCBI IDs, read counts, and RPM

Running Locally

If you prefer to run the Shiny app locally:

R -e "shiny::runApp('app.R')"

Or open app.R in RStudio and click “Run App”.

Output Files

From nanoporemet.py

| File | Description |
|---|---|
| `sequencing_summary.pdf` | Q-score and length distribution plots |
| `virus.kraken.txt` | Kraken2 results (viral-only mode) |
| `virus_bacteria.kraken.txt` | Kraken2 results (viral+bacterial mode) |
| `barcodeXX/barcodeXX.kreport.txt` | Individual barcode reports |

From coverage.py

| File | Description |
|---|---|
| `reference_name/reference_name.pdf` | Coverage plot |
| `reference_name/reference_name.coverage` | Per-base coverage depths |
| `reference_name/reference_name.bam` | Sorted alignment file |
| `reference_name/reference_name.sam` | Raw alignment file |

Visualization Examples

Sequencing Summary

The PDF includes four plots:

Mean Q-score (all reads)
Mean Q-score (quality-filtered reads)
Sequence length (all reads, log-scale)
Sequence length (quality-filtered reads, log-scale)
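
The median values drawn on these plots can be reproduced with a short sketch like the one below. It assumes the tab-separated sequencing summary contains the columns `passes_filtering`, `sequence_length_template`, and `mean_qscore_template`, as written by typical MinKNOW runs; the column names used by `nanoporemet.py` are not stated in this README, and the function names are hypothetical.

```python
import csv
from statistics import median
from typing import Dict, Iterable, List, Optional


def _median_or_none(values: List[float]) -> Optional[float]:
    """Return the median, or None when no reads fall in the group."""
    return median(values) if values else None


def summarize_reads(lines: Iterable[str]) -> Dict[str, Optional[float]]:
    """Median Q-score and read length for all reads and for passing reads."""
    all_q, all_len, pass_q, pass_len = [], [], [], []
    for row in csv.DictReader(lines, delimiter="\t"):
        qscore = float(row["mean_qscore_template"])
        length = int(row["sequence_length_template"])
        all_q.append(qscore)
        all_len.append(length)
        # MinKNOW writes TRUE/FALSE; compare case-insensitively to be safe.
        if row["passes_filtering"].strip().upper() == "TRUE":
            pass_q.append(qscore)
            pass_len.append(length)
    return {
        "median_qscore_all": _median_or_none(all_q),
        "median_length_all": _median_or_none(all_len),
        "median_qscore_pass": _median_or_none(pass_q),
        "median_length_pass": _median_or_none(pass_len),
    }
```

To use it on a real file, open the summary in text mode and pass the handle, for example `with open(path, newline="") as fh: stats = summarize_reads(fh)`.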

Coverage Plot

Features:

Log-scale y-axis for coverage depth
Genome position on x-axis
Horizontal coverage percentage
Mean vertical coverage (X)
Minimalist styling with light-grey gridlines
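
The two summary statistics on the plot can be derived from the per-base depths. The sketch below assumes that the `.coverage` file holds `samtools depth -a` style output (tab-separated reference name, position, depth), that horizontal coverage means the percentage of positions with depth of at least 1, and that vertical coverage means the mean depth across all positions. These definitions are assumptions; `coverage.py` may compute them differently.

```python
from typing import Iterable, Tuple


def coverage_stats(depth_lines: Iterable[str]) -> Tuple[float, float]:
    """Return (horizontal coverage in percent, mean depth) from depth lines."""
    positions = 0
    covered = 0
    total_depth = 0
    for line in depth_lines:
        line = line.strip()
        if not line:
            continue
        _ref, _pos, depth_text = line.split("\t")
        depth = int(depth_text)
        positions += 1
        total_depth += depth
        if depth > 0:
            covered += 1
    if positions == 0:
        raise ValueError("no depth records found")
    return 100.0 * covered / positions, total_depth / positions


# Four positions, one of them uncovered.
example = ["ref\t1\t0", "ref\t2\t10", "ref\t3\t20", "ref\t4\t10"]
print(coverage_stats(example))  # (75.0, 10.0)
```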

Shiny Dashboard Colors

Human: Dark blue (#143642)
Bacterial: Teal (#0F8B8D)
Fungal: Orange (#EC9A29)
Viral: Red (#A8201A)
Unclassified: Purple-grey (#A69CAC)

Customization

Modifying Kraken2 Database Paths

Edit nanoporemet.py:

# Change these paths to point to your Kraken2 databases
if analyze_bacterial in ['yes', 'y']:
    kraken_db_path = "/your/path/to/k2_pluspf_database/"
else:
    kraken_db_path = "/your/path/to/k2_human-viral_database/"

Updating Blocklisted Viruses

Edit the pattern in app.R to add/remove viruses from the blocklist:

if (input$hide_blocklisted_viruses) {
  data <- data %>%
    filter(!(grepl("virus1|virus2|virus3", X7, ignore.case = TRUE) & X8 == "Virus"))
}
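
For readers who want to apply the same rule outside the dashboard, the filter logic can be sketched in Python. The keys `name` and `domain` are hypothetical stand-ins for the columns `X7` and `X8` in `app.R`, and the pattern below is a placeholder, as in the R snippet.

```python
import re
from typing import Dict, List


def hide_blocklisted(rows: List[Dict[str, str]], blocklist_pattern: str) -> List[Dict[str, str]]:
    """Drop rows whose name matches the blocklist and whose domain is Virus."""
    pattern = re.compile(blocklist_pattern, re.IGNORECASE)
    return [
        row
        for row in rows
        if not (row["domain"] == "Virus" and pattern.search(row["name"]))
    ]


rows = [
    {"name": "Virus1 strain A", "domain": "Virus"},
    {"name": "Some other virus", "domain": "Virus"},
    {"name": "virus1-like bacterium", "domain": "Bacteria"},
]
kept = hide_blocklisted(rows, "virus1|virus2|virus3")
print([row["name"] for row in kept])
# ['Some other virus', 'virus1-like bacterium']
```

As in the R code, the match is case-insensitive and only rows in the Virus domain are removed, so a bacterial taxon whose name happens to match the pattern is kept.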

Troubleshooting

Common Issues

“No fastq.gz files found”
- Ensure your directory structure is correct: fastq_pass/barcodeXX/*.fastq.gz
- Check file permissions
“Multiple reference fasta files found”
- Ensure reference directory contains only one .fasta file
- Remove hidden files (._*)
Kraken2 database not found
- Verify database paths in nanoporemet.py
- Ensure databases are properly formatted
Shiny app shows no data
- Confirm file format matches expected Kraken2 output
- Check that barcodes exist in the uploaded file
- Ensure you’re using files generated by nanoporemet.py
Cannot access the web dashboard
- Verify you’re on the correct network
- Check if the URL is accessible: http://172.23.210.220:3838/NGS/NanoporeMet/
- Contact your system administrator if issues persist
Command not found: nanoporemet / nanopore_coverage
- If you didn’t install globally, use python nanoporemet.py or python coverage.py instead
- Or install globally with the commands under "Make scripts globally accessible" in the Installation section

Acknowledgments

Kraken2 for taxonomic classification
minimap2 for long-read alignment
samtools for SAM/BAM processing
Shiny for interactive visualization
Oxford Nanopore Technologies for sequencing platforms
