nevermore profiler

Developed by the Bork Group. Raise an issue or contact us. See our other Software & Services. Contributors: Christian Schudoma, Daniel Podlesny
The development of this workflow was supported by NFDI4Microbiota.

Description

The nevermore_profiler is a workflow optimised for alignment-based functional profiling of public metagenomic and metatranscriptomic short-read data sets against large metagenomic gene catalogues (e.g. GMGC or proGenomes). It makes use of the nevermore workflow library.

Input data sets are profiled with gffquant. gffquant aligns the input reads against a reference catalogue using BWA-MEM (or minimap2 for larger catalogues) and distributes the resulting gene counts to functional categories obtained from eggnog-mapper annotations of the reference catalogue, generating read-count-based functional profiles. The nevermore profiler workflow, the nevermore library, and the gffquant software have been and continue to be developed in the Bork and Zeller labs at EMBL Heidelberg. In 2023, maintenance and development were supported by NFDI4Microbiota.
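
To illustrate the idea of distributing gene counts to functional categories, here is a minimal sketch. It is not gffquant's implementation: the function name, the gene and category identifiers, and the choice to add a gene's full count to every category it is annotated with are all assumptions made for illustration. How gffquant actually handles multi-category genes, ambiguous alignments, or normalisation is not described here.

```python
from collections import defaultdict


def aggregate_counts(gene_counts, gene_annotations):
    """Sum per-gene read counts into per-category totals.

    Assumption: a gene annotated with several categories contributes
    its full count to each of them. Unannotated genes are ignored.
    """
    profile = defaultdict(int)
    for gene, count in gene_counts.items():
        for category in gene_annotations.get(gene, ()):
            profile[category] += count
    return dict(profile)


# Hypothetical example data: read counts per catalogue gene ...
gene_counts = {"gene_a": 10, "gene_b": 4, "gene_c": 7}
# ... and functional categories per gene (gene_c is unannotated).
gene_annotations = {
    "gene_a": ["K00001"],
    "gene_b": ["K00001", "K00002"],
}

functional_profile = aggregate_counts(gene_counts, gene_annotations)
print(functional_profile)
```

In this toy example, the category shared by gene_a and gene_b receives the sum of both counts, while gene_c does not contribute to any category.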

Citation

This workflow:

Overview

Requirements

The easiest way to handle dependencies is via Singularity/Docker containers. Alternatively, conda environments, software module systems or native installations can be used.

Preprocessing

Preprocessing and QA are done with bbmap, fastqc, and multiqc.

Decontamination/Host removal

Decontamination is done with kraken2 and additionally requires seqtk.

Kraken2 database

Host removal requires a kraken2 host database.

Gene Catalogue

The workflow requires a bwa or minimap2 index of a gene catalogue (e.g. from GMGC) as well as an SQLite database of eggnog-mapper annotations. The annotation database can be built from an eggnog-mapper annotation table using utility scripts from gffquant.

Usage

Cloud-based Workflow Manager (CloWM)

This workflow will be available on the CloWM platform (coming soon).

Command-Line Interface (CLI)

The workflow run is controlled by environment-specific parameters (see run.config) and study-specific parameters (see params.yml). The parameters in params.yml can also be specified on the command line.

You can either clone this repository from GitHub and run it as follows:

git clone https://github.com/grp-bork/nevermore_profiler.git
nextflow run /path/to/nevermore_profiler [-resume] -c /path/to/run.config -params-file /path/to/params.yml

Alternatively, you can have Nextflow pull it from GitHub and run it from the $HOME/.nextflow directory:

nextflow run grp-bork/nevermore_profiler [-resume] -c /path/to/run.config -params-file /path/to/params.yml

Input files

Fastq files are supported and can be either uncompressed (but shouldn't be!) or compressed with gzip or bzip2. Sample data can be arranged in one single input directory ("flat") or as one directory per sample ("tree").

Mates 1 and 2 can be specified with the suffixes _[12], _R[12], .[12], or .R[12]. Lane IDs or other read ID modifiers have to precede the mate identifier. Files whose names contain none of these patterns are treated as single-end. Samples consisting of both single-end and paired-end files are assumed to be paired-end, with all single-end files treated as orphans (reads whose mate did not survive quality control).
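
The suffix rules above can be sketched as a small filename classifier. This is an illustration only, not the workflow's own parsing code: the function name is hypothetical, and the assumption that files end in .fastq or .fq (optionally followed by .gz or .bz2) is ours.

```python
import re

# Assumption: fastq files end in .fastq or .fq, optionally gzip/bzip2-compressed.
FASTQ_EXT = re.compile(r"\.(?:fastq|fq)(?:\.(?:gz|bz2))?$")
# Mate suffixes named in the text: _[12], _R[12], .[12], .R[12]
MATE_SUFFIX = re.compile(r"[._]R?([12])$")


def classify_mate(filename):
    """Return (prefix, mate), where mate is 1, 2, or None for single-end."""
    stem = FASTQ_EXT.sub("", filename)
    match = MATE_SUFFIX.search(stem)
    if match is None:
        return stem, None
    return stem[:match.start()], int(match.group(1))


for name in (
    "sampleA_R1.fastq.gz",
    "sampleA.2.fq.bz2",
    "sampleA_L001_R2.fastq",  # lane ID precedes the mate identifier
    "sampleA.fastq.gz",       # no mate suffix: single-end
):
    print(name, classify_mate(name))
```

Because the mate pattern is anchored to the end of the name (after removing the fastq suffix), a lane ID such as _L001 is only tolerated in front of the mate identifier, matching the rule stated above.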

All files in one directory -- "flat"

Files in the input directory must have exactly matching prefixes in order to be associated with the same sample. Orphans belonging to the same sample as a paired-end file pair must contain the same sample prefix as well as the string .singles (preceding any R1/R2 suffix and the fastq suffix).
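
As a rough sketch of how files in a flat directory could be grouped under these naming rules, consider the following. The function name, the example file names, and the assumed .fastq/.fq extensions are hypothetical; the workflow's actual grouping logic may differ.

```python
import re
from collections import defaultdict

# Assumption: fastq files end in .fastq or .fq, optionally gzip/bzip2-compressed.
FASTQ_EXT = re.compile(r"\.(?:fastq|fq)(?:\.(?:gz|bz2))?$")
MATE_SUFFIX = re.compile(r"[._]R?[12]$")
ORPHAN_TAG = ".singles"


def group_flat(filenames):
    """Group file names by sample prefix, separating orphan files."""
    samples = defaultdict(lambda: {"reads": [], "orphans": []})
    for name in sorted(filenames):
        # Strip the fastq suffix first, then any mate suffix.
        stem = MATE_SUFFIX.sub("", FASTQ_EXT.sub("", name))
        if stem.endswith(ORPHAN_TAG):
            sample = stem[: -len(ORPHAN_TAG)]
            samples[sample]["orphans"].append(name)
        else:
            samples[stem]["reads"].append(name)
    return dict(samples)


files = [
    "sampleA_R1.fastq.gz",
    "sampleA_R2.fastq.gz",
    "sampleA.singles_R1.fastq.gz",
    "sampleB.fastq.gz",
]
for sample, members in group_flat(files).items():
    print(sample, members)
```

Here the .singles file is attached to sampleA as an orphan because its prefix matches the paired files exactly, while sampleB is kept as a separate single-end sample.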

Per-sample input subdirectories -- "tree"

All files in a sample directory will be associated with the name of the sample folder. Paired-end mate files need to have matching prefixes.
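
For comparison, a tree layout can be read by taking each subdirectory name as the sample name. The sketch below is an assumption-laden illustration (hypothetical function name, no filtering by file type), not the workflow's implementation.

```python
from pathlib import Path


def collect_tree(input_dir):
    """Map each sample subdirectory name to its sorted list of file names."""
    samples = {}
    for sample_dir in sorted(Path(input_dir).iterdir()):
        if sample_dir.is_dir():
            samples[sample_dir.name] = sorted(
                p.name for p in sample_dir.iterdir() if p.is_file()
            )
    return samples
```

With an input directory containing sampleA/ and sampleB/, the returned dictionary has one entry per folder, regardless of how the files inside are prefixed; the mate suffixes are still needed to pair the reads within each sample.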
