Developed by the Bork Group.
The development of this workflow was supported by NFDI4Microbiota.
The nevermore_profiler is a workflow optimised for alignment-based functional profiling of public metagenomic and metatranscriptomic short-read data sets against large metagenomic gene catalogues (e.g. GMGC or proGenomes). It makes use of the nevermore workflow library.
Input data sets are profiled with gffquant. gffquant aligns the input reads against a reference catalogue using BWA-MEM (or minimap2 for larger catalogues) and distributes the resulting gene counts to functional categories obtained from eggnog-mapper annotations of the reference catalogue, generating read-count-based functional profiles. The nevermore_profiler workflow, the nevermore library, and the gffquant software have been and continue to be developed in the Bork and Zeller labs at EMBL Heidelberg. In 2023, maintenance and development were supported by NFDI4Microbiota.
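To illustrate the counting step described above, the following minimal Python sketch sums per-gene read counts into functional categories. It is not gffquant's implementation: the function name `distribute_counts`, the gene IDs, and the category labels are hypothetical, and the sketch assumes that a gene's full count is added to every category the gene is annotated with.

```python
from collections import defaultdict


def distribute_counts(gene_counts, gene_annotations):
    """Sum per-gene read counts into the functional categories of each gene."""
    category_counts = defaultdict(float)
    for gene, count in gene_counts.items():
        # Genes without any annotation contribute to no category.
        for category in gene_annotations.get(gene, ()):
            category_counts[category] += count
    return dict(category_counts)


# Hypothetical example data: read counts per catalogue gene and
# the functional categories assigned to each gene.
gene_counts = {"gene_a": 10, "gene_b": 5, "gene_c": 2}
gene_annotations = {
    "gene_a": ["K00001", "K00002"],
    "gene_b": ["K00001"],
}

profile = distribute_counts(gene_counts, gene_annotations)
print(profile)
```

Other schemes are possible, for example splitting a gene's count evenly across its categories; which one applies depends on the gffquant configuration and is not specified here.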
The easiest way to handle dependencies is via Singularity/Docker containers. Alternatively, conda environments, software module systems or native installations can be used.
Preprocessing and QA are done with bbmap, fastqc, and multiqc.
Decontamination is done with kraken2 and additionally requires seqtk.
Host removal requires a kraken2 host database.
The workflow requires a bwa or minimap2 index of a gene catalogue (e.g. from GMGC) as well as an sqlite database of eggnog-mapper annotations. The database can be built from an eggnog-mapper annotation table using utility scripts from gffquant.
This workflow will be available on the CloWM platform (coming soon).
The workflow run is controlled by environment-specific parameters (see run.config) and study-specific parameters (see params.yml). The parameters in params.yml can also be specified on the command line.
You can either clone this repository from GitHub and run it as follows:
git clone https://github.com/grp-bork/nevermore_profiler.git
nextflow run /path/to/nevermore_profiler [-resume] -c /path/to/run.config -params-file /path/to/params.yml
Or you can have Nextflow pull it from GitHub and run it from the $HOME/.nextflow directory:
nextflow run grp-bork/nevermore_profiler [-resume] -c /path/to/run.config -params-file /path/to/params.yml
FASTQ files are supported and can be either uncompressed (but shouldn't be!) or compressed with gzip or bzip2. Sample data can be arranged in a single input directory ("flat") or as one directory per sample ("tree").
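If you want to sanity-check input files before a run, the three supported encodings can be read uniformly from Python. The helper names `open_fastq` and `count_reads` below are hypothetical and not part of the workflow; the sketch assumes that compression is indicated by a `.gz` or `.bz2` file suffix and that each record spans exactly four lines.

```python
import bz2
import gzip


def open_fastq(path):
    """Open a FASTQ file as text, choosing the decompressor from the suffix."""
    path = str(path)
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    if path.endswith(".bz2"):
        return bz2.open(path, "rt")
    return open(path, "rt")


def count_reads(path):
    """Count records, assuming the common four-lines-per-record layout."""
    with open_fastq(path) as handle:
        return sum(1 for _ in handle) // 4
```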
Mates 1 and 2 can be specified with the suffixes _[12], _R[12], .[12], or .R[12]. Lane IDs or other read ID modifiers have to precede the mate identifier. Files whose names contain none of these patterns are treated as single-end. Samples consisting of both single-end and paired-end files are assumed to be paired-end, with all single-end files treated as orphans (quality control survivors).
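The suffix rules above can be sketched as a regular expression. This is an illustration rather than the workflow's own parsing code: the function name `classify_fastq` is hypothetical, and the sketch assumes that file names end in `.fastq` or `.fq`, optionally followed by `.gz` or `.bz2`.

```python
import re

# prefix, then an optional mate suffix (_1, _R1, .1, .R1 and the same for 2),
# then the FASTQ extension with optional compression suffix.
MATE_PATTERN = re.compile(
    r"(?P<prefix>.+?)"
    r"(?:[._]R?(?P<mate>[12]))?"
    r"\.(?:fastq|fq)(?:\.(?:gz|bz2))?"
)


def classify_fastq(filename):
    """Return (prefix, mate), where mate is 1, 2, or None for single-end."""
    match = MATE_PATTERN.fullmatch(filename)
    if match is None:
        raise ValueError(f"not a recognised FASTQ file name: {filename}")
    mate = match.group("mate")
    return match.group("prefix"), int(mate) if mate else None


for name in ["sampleA_R1.fastq.gz", "sampleA.2.fq.bz2", "sampleB.fastq"]:
    print(name, classify_fastq(name))
```

Because the mate suffix is matched immediately before the extension, a lane ID such as `_L001` ends up in the prefix, which matches the requirement that such modifiers precede the mate identifier.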
In the flat layout, files in the input directory must have exactly matching prefixes to be associated with the same sample. Orphans belonging to the same sample as a paired-end file pair must contain the same sample prefix as well as the string .singles (preceding any R1/R2 suffix and the fastq suffix).
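As a rough sketch of how files in a single input directory could be grouped under these rules, the following Python function assigns each file to a sample and a role. It is not taken from the workflow: the name `group_flat` and the output layout are hypothetical, the whole prefix before the mate suffix is assumed to be the sample name, and file names are assumed to end in `.fastq` or `.fq` with optional `.gz` or `.bz2`.

```python
import re
from collections import defaultdict

FASTQ_NAME = re.compile(
    r"(?P<prefix>.+?)(?:[._]R?(?P<mate>[12]))?\.(?:fastq|fq)(?:\.(?:gz|bz2))?"
)


def group_flat(filenames):
    """Group file names from one input directory by sample prefix."""
    samples = defaultdict(
        lambda: {"R1": [], "R2": [], "single": [], "orphans": []}
    )
    for name in sorted(filenames):
        match = FASTQ_NAME.fullmatch(name)
        if match is None:
            continue  # not a FASTQ file name this sketch recognises
        prefix, mate = match.group("prefix"), match.group("mate")
        if prefix.endswith(".singles"):
            # Explicitly marked orphan: file it under the bare sample prefix.
            samples[prefix[: -len(".singles")]]["orphans"].append(name)
        elif mate:
            samples[prefix]["R" + mate].append(name)
        else:
            samples[prefix]["single"].append(name)
    for files in samples.values():
        # A sample with paired files is paired-end; its unpaired files are orphans.
        if files["R1"] or files["R2"]:
            files["orphans"] += files["single"]
            files["single"] = []
    return dict(samples)


groups = group_flat(
    ["s1_R1.fastq.gz", "s1_R2.fastq.gz", "s1.singles_R1.fastq.gz", "s2.fastq.gz"]
)
for sample, files in groups.items():
    print(sample, files)
```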
In the tree layout, all files in a sample directory are associated with the name of the sample folder. Paired-end mate files need to have matching prefixes.
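For the one-directory-per-sample arrangement, sample assignment reduces to reading the parent directory name. The sketch below is illustrative only: `group_tree` is a hypothetical helper, and it assumes that sample directories sit directly below the input directory.

```python
from collections import defaultdict
from pathlib import Path


def group_tree(input_dir):
    """Map each sample directory name to the sorted file names inside it."""
    samples = defaultdict(list)
    for path in sorted(Path(input_dir).glob("*/*")):
        if path.is_file():
            # The sample name is the directory name, not the file prefix.
            samples[path.parent.name].append(path.name)
    return dict(samples)
```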

