EnsembleAssembler-1.0.0 Quick Start Guide
Blood Systems Research Institute & UCSF
EnsembleAssembler performs de novo assembly of pathogen genomes from metagenomic samples sequenced on Illumina platforms. It optimizes contig formation by integrating results from multiple assemblers, including SOAPdenovo2, ABySS, MetaVelvet, and CAP3. The software does NOT perform preprocessing; users must preprocess their reads with other software before attempting assembly.
INSTALLATION
Only Linux is supported. Python 2.6 or 2.7 is required. To install, first download the latest distribution from http://ensembleassembly.sourceforge.net. Then execute:
tar zxvf ensembleAssembly_version.tar.gz
The resulting folder is referred to as /assembler_dir/. The directory /assembler_dir/bin/ contains executables for the individual assemblers, which may or may not be compatible with your machine. Users are advised to obtain the individual assemblers from the following sources (as of October 2014), test them on their own system, and copy the executable files (SOAPdenovo-63mer, velvetg, velveth, meta-velvetg, abyss-pe, and cap3) to /assembler_dir/bin/:
http://www.bcgsc.ca/platform/bioinfo/software/abyss
http://soap.genomics.org.cn/soapdenovo.html
https://www.ebi.ac.uk/~zerbino/velvet/
http://metavelvet.dna.bio.keio.ac.jp/
http://seq.cs.iastate.edu/
Then execute:
chmod a+x /assembler_dir/* -R
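As a quick sanity check after copying the assemblers, the following Python sketch reports which of the executables named above are missing from the bin directory or lack execute permission. It is not part of EnsembleAssembler; the function name missing_tools is hypothetical, and /assembler_dir/bin is the placeholder path used in this guide.

```python
import os

# Executable names as listed in this guide.
REQUIRED_TOOLS = [
    "SOAPdenovo-63mer", "velveth", "velvetg",
    "meta-velvetg", "abyss-pe", "cap3",
]

def missing_tools(bin_dir, tools=REQUIRED_TOOLS):
    """Return the tools that are absent from bin_dir or not executable."""
    missing = []
    for name in tools:
        path = os.path.join(bin_dir, name)
        # A usable tool must be a regular file with the execute bit set.
        if not (os.path.isfile(path) and os.access(path, os.X_OK)):
            missing.append(name)
    return missing
```

Calling missing_tools("/assembler_dir/bin") with your real installation path should return an empty list once all six executables are in place.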
USAGE
It is advised to create a new directory /proj_dir/ for each assembly project. Then create a configuration file config.txt inside /proj_dir/. An example configuration file is '/assembler_dir/example_project/config.txt'.
Run the following commands to assemble the data:
cd /proj_dir/
/assembler_dir/ensembleAssembly ./config.txt
./ensemble.sh
You may need to change file permissions before issuing the commands above:
chmod 755 *
Users need to provide a configuration file for each run. Lines starting with a pound sign ('#') are comments and are ignored. The sample configuration file looks like this:
PE=260 30 FULL_PATH/test1.fastq FULL_PATH/test2.fastq
#SE=FULL_PATH/test1.fastq
NUM_THREADS= 8
SOAP_KMER=31
ABYSS_KMER = 31
METAVELVET_KMER=31
CON_LEN_DBG=150
CON_LEN_OLC=300
ASSEMBLY_MODE=optimal
#ASSEMBLY_MODE=quick
Users can supply either paired-end (PE) or single-end (SE) data, but not both at the same time. For PE data, provide the mean insert size, the insert size standard deviation, and two FASTQ files, in that order (260, 30, and the two file paths in the sample above).
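The configuration format described above can be sketched in Python as follows. This is an illustration only, not the parser used by EnsembleAssembler: the function names are hypothetical, and the tolerance for spaces around '=' is an assumption based on the sample file. File paths containing spaces are not handled.

```python
def parse_config(text):
    """Parse KEY=VALUE lines; lines starting with '#' are comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError("expected KEY=VALUE, got: %r" % raw)
        settings[key.strip()] = value.strip()
    return settings

def input_mode(settings):
    """Return 'PE' or 'SE'; exactly one of the two must be configured."""
    has_pe, has_se = "PE" in settings, "SE" in settings
    if has_pe == has_se:
        raise ValueError("set exactly one of PE or SE")
    return "PE" if has_pe else "SE"

def parse_pe(value):
    """Split a PE value into (insert mean, insert std dev, fastq1, fastq2)."""
    fields = value.split()
    if len(fields) != 4:
        raise ValueError("PE needs 4 fields, got %d" % len(fields))
    return float(fields[0]), float(fields[1]), fields[2], fields[3]
```

For the sample file, parse_pe would be given the text after "PE=", that is, the two insert-size numbers followed by the two FASTQ paths.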
The following parameters are mandatory:
NUM_THREADS=8
Set this to the number of CPU cores to be used for assembly.
SOAP_KMER=31
ABYSS_KMER = 31
METAVELVET_KMER=31
The k-mer size to be used by SOAPdenovo, ABySS, and MetaVelvet, respectively.
CON_LEN_DBG=150
CON_LEN_OLC=300
The length filters for contigs generated by the DBG (de Bruijn graph) assemblers (first assembly) and by the OLC (overlap-layout-consensus) assemblers (second and final assembly). Only contigs longer than these thresholds are kept in the output of the DBG and OLC assemblers.
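To illustrate what such a length filter does, here is a minimal Python sketch that reads FASTA records and keeps only contigs strictly longer than a threshold. It is not EnsembleAssembler's own code; the function names are hypothetical, and treating "longer than" as a strict comparison is an assumption taken from the wording above.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

def filter_contigs(records, min_len):
    """Keep contigs strictly longer than min_len bases."""
    return [(h, s) for h, s in records if len(s) > min_len]
```

With CON_LEN_DBG=150, for example, filter_contigs(records, 150) would drop every contig of 150 bases or fewer.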
ASSEMBLY_MODE=optimal
ASSEMBLY_MODE=quick
The optimal mode runs SOAPdenovo, ABySS, partitioned ABySS, MetaVelvet, and CAP3. The quick mode runs ABySS, partitioned ABySS, and CAP3.
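Before launching a long run, it can help to check a configuration against the rules above. The Python sketch below does this for settings held in a dict of strings, which is a hypothetical representation; it is not part of EnsembleAssembler. Treating ASSEMBLY_MODE as mandatory and requiring positive integers for the numeric parameters are assumptions.

```python
# Parameter names from this guide; the last one is the non-numeric mode.
MANDATORY_KEYS = [
    "NUM_THREADS", "SOAP_KMER", "ABYSS_KMER", "METAVELVET_KMER",
    "CON_LEN_DBG", "CON_LEN_OLC", "ASSEMBLY_MODE",
]

# Assemblers run in each mode, as described in this guide.
MODE_ASSEMBLERS = {
    "optimal": ["SOAP", "ABYSS", "Partitioned ABYSS", "MetaVelvet", "Cap3"],
    "quick": ["ABYSS", "Partitioned ABYSS", "Cap3"],
}

def is_positive_int(text):
    """Return True if text parses as an integer greater than zero."""
    try:
        return int(text) > 0
    except ValueError:
        return False

def check_settings(settings):
    """Return a list of problems found in a dict of config settings."""
    problems = ["missing " + k for k in MANDATORY_KEYS if k not in settings]
    mode = settings.get("ASSEMBLY_MODE")
    if mode is not None and mode not in MODE_ASSEMBLERS:
        problems.append("unknown ASSEMBLY_MODE: " + mode)
    for key in MANDATORY_KEYS[:-1]:
        value = settings.get(key)
        if value is not None and not is_positive_int(value):
            problems.append(key + " must be a positive integer: " + value)
    return problems
```

An empty list from check_settings means no problems were detected by these particular checks.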
If ASSEMBLY_MODE=optimal, the output will be in /assembler_dir/contig_SAVaC/ensemble_C.contig; if ASSEMBLY_MODE=quick, the output will be in /assembler_dir/contig_AaC/ensemble_C.contig. Both are in FASTA format.
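Because the final output is plain FASTA, it can be inspected with a few lines of Python. The sketch below, with the hypothetical function name contig_stats, counts the contigs in a file and reports the total and longest lengths; it assumes a standard FASTA layout in which each record starts with a '>' header line.

```python
def contig_stats(path):
    """Return (number of contigs, total bases, longest contig) for a FASTA file."""
    lengths = []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                # Each header starts a new contig of length zero.
                lengths.append(0)
            elif line and lengths:
                lengths[-1] += len(line)
    longest = max(lengths) if lengths else 0
    return len(lengths), sum(lengths), longest
```

Pass the path of the ensemble_C.contig file produced by your run.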
HELP
Questions should be sent to Xutao Deng (xutaodeng@gmail.com).