EnsembleAssembler

EnsembleAssembler-1.0.0 Quick Start Guide Blood Systems Research Institute & UCSF

EnsembleAssembler performs de novo assembly of pathogen genomes from metagenomic samples sequenced on Illumina platforms. It optimizes contig formation by integrating results from multiple assemblers, including SOAPdenovo2, ABySS, MetaVelvet, and Cap3. The software does NOT perform preprocessing; users must preprocess their reads with other software before attempting assembly.

INSTALLATION

Only Linux is supported. Python 2.6 or 2.7 is required. To install, first download the latest distribution from http://ensembleassembly.sourceforge.net. Then execute

tar zxvf ensembleAssembly_version.tar.gz

The resulting folder is referred to as /assembler_dir/. The directory /assembler_dir/bin/ contains executables for the individual assemblers, which may or may not be compatible with your machine. Users are advised to obtain the individual assemblers from the following sources (as of October 2014), test them on their own system, and copy the executable files (SOAPdenovo-63mer, velvetg, velveth, meta-velvetg, abyss-pe, and cap3) to /assembler_dir/bin:

http://www.bcgsc.ca/platform/bioinfo/software/abyss
http://soap.genomics.org.cn/soapdenovo.html
https://www.ebi.ac.uk/~zerbino/velvet/
http://metavelvet.dna.bio.keio.ac.jp/
http://seq.cs.iastate.edu/

Then execute chmod a+x /assembler_dir/* -R
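
As a quick sanity check after copying the assemblers, the sketch below reports which of the executables named above are missing from a bin directory or lack execute permission. It is not part of EnsembleAssembler; the function name missing_tools is hypothetical, and the tool list is simply the one given in this guide.

```python
import os

# Executable names as listed in this guide.
REQUIRED_TOOLS = ["SOAPdenovo-63mer", "velvetg", "velveth",
                  "meta-velvetg", "abyss-pe", "cap3"]


def missing_tools(bin_dir, tools=REQUIRED_TOOLS):
    """Return the tools that are absent from bin_dir or not executable."""
    missing = []
    for name in tools:
        path = os.path.join(bin_dir, name)
        # A tool counts as present only if it is a regular file
        # that the current user is allowed to execute.
        if not (os.path.isfile(path) and os.access(path, os.X_OK)):
            missing.append(name)
    return missing
```

Calling missing_tools("/assembler_dir/bin") returns an empty list when all six executables are in place. This only checks for presence and permissions; it does not confirm that a binary actually runs on your machine.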

USAGE

It is advised to create a new directory /proj_dir/ for each assembly project. Then create a configuration file config.txt inside /proj_dir/. An example configuration file is '/assembler_dir/example_project/config.txt'.

Run the following commands to assemble the data:

cd /proj_dir/
/assembler_dir/ensembleAssembly ./config.txt
./ensemble.sh

You may need to change file permissions before issuing the previous commands:

chmod 755 *

Users need to provide a configuration file for each run. Lines starting with a pound sign ('#') are comments and are ignored. The sample configuration file looks like this:

PE=260 30 FULL_PATH/test1.fastq FULL_PATH/test2.fastq
#SE= FULL_PATH/ test1.fastq
NUM_THREADS= 8
SOAP_KMER=31
ABYSS_KMER = 31
METAVELVET_KMER=31
CON_LEN_DBG=150
CON_LEN_OLC=300
ASSEMBLY_MODE=optimal
#ASSEMBLY_MODE=quick

Users can supply either paired-end (PE) or single-end (SE) data, but not both at the same time. For PE data, provide the mean insert size, the insert size standard deviation, and two FASTQ files.
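
To make the file format concrete, here is a minimal sketch of how such a configuration could be read and checked before a run. It is an illustration only, not the parser used by ensembleAssembly: the function name parse_config and the returned dictionary layout are hypothetical, and the sketch assumes that exactly one of PE or SE must be set and that whitespace around '=' is tolerated, as in the sample file.

```python
def parse_config(text):
    """Parse KEY=VALUE lines into a dict; lines starting with '#' are comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError("expected KEY=VALUE, got: " + line)
        settings[key.strip()] = value.strip()

    # Assumption: exactly one of PE or SE must be present.
    if ("PE" in settings) == ("SE" in settings):
        raise ValueError("set exactly one of PE or SE")

    if "PE" in settings:
        # PE = <insert mean> <insert sd> <fastq 1> <fastq 2>
        fields = settings["PE"].split()
        if len(fields) != 4:
            raise ValueError("PE needs: mean sd fastq1 fastq2")
        settings["PE"] = {
            "insert_mean": float(fields[0]),
            "insert_sd": float(fields[1]),
            "fastq": (fields[2], fields[3]),
        }
    return settings
```

All values other than PE are returned as strings, so a caller would convert NUM_THREADS or the k-mer sizes to integers as needed.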

The following parameters are mandatory:

NUM_THREADS=8

Set this to the number of cores on the computer to be used for assembly.

SOAP_KMER=31
ABYSS_KMER=31
METAVELVET_KMER=31

These set the k-mer size used by SOAP, ABYSS, and METAVELVET, respectively.

CON_LEN_DBG=150
CON_LEN_OLC=300

These are the length filters for contigs generated by the de Bruijn graph (DBG) assemblers (first assembly) and by the overlap-layout-consensus (OLC) assemblers (second and final assembly). Only contigs longer than these thresholds are kept in the output of the DBG and OLC assemblers.
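
The effect of such a threshold can be sketched as a simple FASTA length filter. This is an illustration of the idea only, not the code EnsembleAssembler runs; the function name filter_fasta_by_length is hypothetical, and "longer than" is read here as strictly greater than the threshold.

```python
def filter_fasta_by_length(lines, min_len):
    """Yield (header, sequence) for FASTA records strictly longer than min_len."""
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if line.startswith(">"):
            # A new record starts: emit the previous one if it passes.
            if header is not None and sum(map(len, chunks)) > min_len:
                yield header, "".join(chunks)
            header, chunks = line, []
        elif line and header is not None:
            # Sequence data may span several lines per record.
            chunks.append(line)
    # Emit the last record in the input, if it passes.
    if header is not None and sum(map(len, chunks)) > min_len:
        yield header, "".join(chunks)
```

For example, passing an open contig file and min_len=150 would keep only the records whose sequence exceeds CON_LEN_DBG=150 bases.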

ASSEMBLY_MODE=optimal
ASSEMBLY_MODE=quick

The optimal mode runs SOAP, ABYSS, Partitioned ABYSS, MetaVelvet, and Cap3. The quick mode runs ABYSS, Partitioned ABYSS, and Cap3.

If ASSEMBLY_MODE=optimal, the output will be in /assembler_dir/contig_SAVaC/ensemble_C.contig. If ASSEMBLY_MODE=quick, the output will be in /assembler_dir/contig_AaC/ensemble_C.contig. Both are in FASTA format.
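
For scripts that pick up the final contigs, a small helper along the following lines may be convenient. It is a sketch under assumptions: the name expected_contig_path is hypothetical, and it assumes that contig_SAVaC belongs to the optimal mode and contig_AaC to the quick mode, which matches the assemblers each mode runs.

```python
import os

# Output sub-directories as named in this guide (assumed mode mapping).
OUTPUT_SUBDIR = {"optimal": "contig_SAVaC", "quick": "contig_AaC"}


def expected_contig_path(base_dir, mode):
    """Return where the final contig FASTA is expected for a given mode."""
    if mode not in OUTPUT_SUBDIR:
        raise ValueError("ASSEMBLY_MODE must be 'optimal' or 'quick'")
    return os.path.join(base_dir, OUTPUT_SUBDIR[mode], "ensemble_C.contig")
```

The base directory is left as a parameter so the helper does not commit to where the output folders are created on a given installation.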

HELP

Questions should be sent to Xutao Deng xutaodeng@gmail.com
