EnsembleAssembler performs de novo assembly of pathogen genomes from metagenomic samples sequenced using Illumina platforms

MaSayFl/EnsembleAssembler

EnsembleAssembler

EnsembleAssembler-1.0.0 Quick Start Guide
Blood Systems Research Institute & UCSF

EnsembleAssembler performs de novo assembly of pathogen genomes from metagenomic samples sequenced on Illumina platforms. It optimizes contig formation by integrating the results of multiple assemblers, including SOAPdenovo2, ABySS, MetaVelvet, and Cap3. The software does NOT perform preprocessing: reads must be preprocessed with other software before assembly.


INSTALLATION

Only Linux is supported. Python 2.6 or 2.7 is required. To install, first download the latest distribution from http://ensembleassembly.sourceforge.net, then unpack it:

tar zxvf ensembleAssembly_version.tar.gz

The resulting folder is referred to below as /assembler_dir/. The directory /assembler_dir/bin/ contains executables for the individual assemblers, which may not be compatible with your machine. We advise obtaining the individual assemblers from the following sources (as of October 2014), testing them on your system, and copying the executables (SOAPdenovo-63mer, velvetg, velveth, meta-velvetg, abyss-pe, and cap3) to /assembler_dir/bin/:

http://www.bcgsc.ca/platform/bioinfo/software/abyss
http://soap.genomics.org.cn/soapdenovo.html
https://www.ebi.ac.uk/~zerbino/velvet/
http://metavelvet.dna.bio.keio.ac.jp/
http://seq.cs.iastate.edu/

Then make everything in the installation directory executable:

chmod -R a+x /assembler_dir/
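Before running an assembly, it can help to confirm that every executable named above is present and executable. The following is a minimal Python 3 sketch, not part of EnsembleAssembler; `REQUIRED_EXECUTABLES` and `missing_executables` are hypothetical names, and the list simply repeats the file names given in this guide.

```python
import os

# Executable names copied from this guide (hypothetical constant, not from the tool).
REQUIRED_EXECUTABLES = [
    "SOAPdenovo-63mer", "velvetg", "velveth",
    "meta-velvetg", "abyss-pe", "cap3",
]


def missing_executables(bin_dir):
    """Return the required executables that are absent or not executable in bin_dir."""
    missing = []
    for name in REQUIRED_EXECUTABLES:
        path = os.path.join(bin_dir, name)
        # os.access(..., os.X_OK) checks execute permission for the current user.
        if not (os.path.isfile(path) and os.access(path, os.X_OK)):
            missing.append(name)
    return missing
```

For example, `missing_executables("/assembler_dir/bin")` returns an empty list when the setup steps above succeeded; any names it returns still need to be copied in or made executable.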


USAGE

We recommend creating a new directory /proj_dir/ for each assembly project and placing a configuration file, config.txt, inside it. An example configuration file is provided at /assembler_dir/example_project/config.txt.

Run the following commands to assemble the data:

cd /proj_dir/
/assembler_dir/ensembleAssembly ./config.txt
./ensemble.sh

You may need to change file permissions before running ./ensemble.sh:

chmod 755 *
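The two run commands above can also be driven from Python, for example when processing many project directories. This is a hedged sketch, assuming (as the command order suggests) that `ensembleAssembly` writes `ensemble.sh` into the current directory; `assembly_commands` and `run_assembly` are hypothetical helpers, not part of the tool.

```python
import os
import subprocess


def assembly_commands(assembler_dir, config_path="./config.txt"):
    """Build the two commands from this guide: process the config, then run ensemble.sh."""
    return [
        [os.path.join(assembler_dir, "ensembleAssembly"), config_path],
        ["./ensemble.sh"],
    ]


def run_assembly(proj_dir, assembler_dir):
    """Run both commands inside proj_dir, stopping at the first non-zero exit status."""
    for cmd in assembly_commands(assembler_dir):
        # With cwd set, a relative program path such as ./ensemble.sh is resolved
        # relative to proj_dir on POSIX systems.
        subprocess.check_call(cmd, cwd=proj_dir)
```

Usage would look like `run_assembly("/proj_dir", "/assembler_dir")`; `check_call` raises `CalledProcessError` if either step fails, so a failed first step never launches the second.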

Each run requires a configuration file. Lines starting with a pound sign (#) are comments and are ignored. A sample configuration file looks like this:

PE=260 30 FULL_PATH/test1.fastq FULL_PATH/test2.fastq
#SE= FULL_PATH/test1.fastq
NUM_THREADS= 8
SOAP_KMER=31
ABYSS_KMER = 31
METAVELVET_KMER=31
CON_LEN_DBG=150
CON_LEN_OLC=300
ASSEMBLY_MODE=optimal
#ASSEMBLY_MODE=quick
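The format above is a simple KEY=VALUE layout with # comments, and the sample mixes `KEY=VALUE`, `KEY= VALUE` and `KEY = VALUE`. The sketch below shows one way to read such a file into a dictionary in Python 3. It is an illustration only, not EnsembleAssembler's own parser; `parse_config` and `SAMPLE` are hypothetical names, and it assumes one setting per line with later duplicates overriding earlier ones.

```python
def parse_config(text):
    """Parse KEY=VALUE lines into a dict, skipping blank lines and '#' comments."""
    config = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError("expected KEY=VALUE, got: %r" % raw)
        # Strip whitespace on both sides of '=', since the sample uses all three spacings.
        config[key.strip()] = value.strip()
    return config


# The sample configuration from this guide.
SAMPLE = """\
PE=260 30 FULL_PATH/test1.fastq FULL_PATH/test2.fastq
#SE= FULL_PATH/test1.fastq
NUM_THREADS= 8
SOAP_KMER=31
ABYSS_KMER = 31
METAVELVET_KMER=31
CON_LEN_DBG=150
CON_LEN_OLC=300
ASSEMBLY_MODE=optimal
#ASSEMBLY_MODE=quick
"""
```

Values are kept as strings; converting numbers is left to whatever consumes the dictionary.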

Use either paired-end (PE) or single-end (SE) data, but not both at once. For PE data, give the mean insert size, the insert-size standard deviation, and the two FASTQ files, in that order; the example above uses a mean of 260 and a standard deviation of 30.
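The PE/SE rule can be checked before a run starts. This Python 3 sketch takes a dictionary of string settings and enforces that exactly one of `PE` or `SE` is present; `read_input` is a hypothetical helper, and it assumes integer insert statistics and a single FASTQ path for SE, as in the commented SE line of the sample.

```python
def read_input(config):
    """Describe the reads, enforcing exactly one of PE or SE."""
    has_pe, has_se = "PE" in config, "SE" in config
    if has_pe == has_se:
        raise ValueError("set exactly one of PE or SE")
    if has_se:
        return {"type": "SE", "fastq": [config["SE"]]}
    fields = config["PE"].split()
    if len(fields) != 4:
        raise ValueError("PE needs: mean insert size, insert size SD, two FASTQ files")
    return {
        "type": "PE",
        "insert_mean": int(fields[0]),  # assumed integer, as in the sample
        "insert_sd": int(fields[1]),
        "fastq": fields[2:],
    }
```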

The following parameters are mandatory:

NUM_THREADS=8

Set this to the number of CPU cores to use for assembly.
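A suitable NUM_THREADS value can be read from the machine itself. This Python 3 sketch (`default_num_threads` is a hypothetical name) counts the CPUs available to the current process, which on Linux respects CPU pinning, and falls back to the total CPU count elsewhere.

```python
import os


def default_num_threads():
    """Suggest a NUM_THREADS value: the number of CPUs usable by this process."""
    try:
        # Linux only: the set of CPUs this process may run on.
        return len(os.sched_getaffinity(0))
    except AttributeError:
        # Other platforms: total logical CPUs (cpu_count may return None).
        return os.cpu_count() or 1


print("NUM_THREADS=%d" % default_num_threads())
```

On a shared server you may prefer a smaller value than this suggests, so that other jobs keep some cores.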

SOAP_KMER =31
ABYSS_KMER = 31
METAVELVET_KMER=31

These are the k-mer sizes used by SOAPdenovo2, ABySS, and MetaVelvet, respectively.
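To illustrate what the k-mer size controls: a de Bruijn graph assembler breaks each read into overlapping substrings of length k, so a read of length L yields L - k + 1 k-mers. Larger k gives fewer, more specific k-mers per read; smaller k gives more overlap between reads at the cost of more ambiguity in repeats. The sketch below is a generic illustration, not code from any of the assemblers; `kmers` is a hypothetical name.

```python
def kmers(seq, k):
    """Return every overlapping substring of length k (L - k + 1 of them)."""
    if k <= 0:
        raise ValueError("k must be positive")
    # If k > len(seq) the range is empty and no k-mers are returned.
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


read = "ACGTACGTAC"
print(len(kmers(read, 4)))  # 10 - 4 + 1 = 7
```

With k = 31, as in the sample configuration, a 100 bp read contributes 70 k-mers.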

CON_LEN_DBG=150
CON_LEN_OLC=300

Length thresholds for contigs produced by the DBG (de Bruijn graph) assemblers (the first assembly) and by the OLC (overlap-layout-consensus) assemblers (the second and final assembly). Only contigs longer than these thresholds are kept in the DBG and OLC output.
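The filtering rule can be reproduced on any FASTA file, for example to preview how a different CON_LEN_OLC would change the final contig set. This Python 3 sketch is an illustration, not EnsembleAssembler's implementation; `read_fasta` and `filter_contigs` are hypothetical names, and "longer than" is read as a strict comparison.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.rstrip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)  # sequences may be wrapped over several lines
    if header is not None:
        yield header, "".join(chunks)


def filter_contigs(records, min_len):
    """Keep contigs strictly longer than min_len, as described for CON_LEN_*."""
    return [(h, s) for h, s in records if len(s) > min_len]
```

For a file on disk, `filter_contigs(read_fasta(open(path)), 300)` applies the sample's CON_LEN_OLC threshold.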

ASSEMBLY_MODE=optimal
ASSEMBLY_MODE=quick

Optimal mode runs SOAPdenovo2, ABySS, partitioned ABySS, MetaVelvet, and Cap3. Quick mode runs ABySS, partitioned ABySS, and Cap3.
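The two modes can be summarized as a lookup table. This Python 3 sketch (`ASSEMBLERS_BY_MODE` and `assemblers_for` are hypothetical names) only restates the lists above and rejects any other ASSEMBLY_MODE value; it makes visible that quick mode drops SOAPdenovo2 and MetaVelvet while keeping Cap3 as the OLC step.

```python
# Assemblers run in each mode, as listed in this guide.
ASSEMBLERS_BY_MODE = {
    "optimal": ["SOAPdenovo2", "ABySS", "Partitioned ABySS", "MetaVelvet", "Cap3"],
    "quick": ["ABySS", "Partitioned ABySS", "Cap3"],
}


def assemblers_for(mode):
    """Return the assemblers for an ASSEMBLY_MODE value, rejecting unknown modes."""
    if mode not in ASSEMBLERS_BY_MODE:
        raise ValueError("ASSEMBLY_MODE must be 'optimal' or 'quick', got %r" % mode)
    return ASSEMBLERS_BY_MODE[mode]
```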

If ASSEMBLY_MODE=optimal, the output is written to /assembler_dir/contig_SAVaC/ensemble_C.contig; if ASSEMBLY_MODE=quick, the output is written to /assembler_dir/contig_AaC/ensemble_C.contig. Both files are in FASTA format.
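To locate and summarize the final contigs, a small Python 3 sketch follows. The folder and file names come from this guide; the base directory is a parameter, so it can be pointed at wherever the output actually appears on your system. `ensemble_output` and `contig_lengths` are hypothetical helpers.

```python
import os

# Output subdirectory per ASSEMBLY_MODE, as named in this guide.
OUTPUT_DIR_BY_MODE = {"optimal": "contig_SAVaC", "quick": "contig_AaC"}


def ensemble_output(base_dir, mode):
    """Path of the final FASTA file for the given ASSEMBLY_MODE."""
    return os.path.join(base_dir, OUTPUT_DIR_BY_MODE[mode], "ensemble_C.contig")


def contig_lengths(fasta_path):
    """Return the length of each sequence in a FASTA file, in file order."""
    lengths = []
    with open(fasta_path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                lengths.append(0)  # a new record starts
            elif line and lengths:
                lengths[-1] += len(line)  # add wrapped sequence lines
    return lengths
```

For example, `len(contig_lengths(ensemble_output("/assembler_dir", "quick")))` gives the number of contigs in a quick-mode result, and `max(...)` of the same list gives the longest contig.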


HELP

Send questions to Xutao Deng (xutaodeng@gmail.com).
