Issue assembling plant genome with NECAT #47

@LeoVincenzi

Hi,
I'm working on a plant genome and trying to assemble it with NECAT, but the final assembly I obtain is really inconsistent with the expected genome size.
The expected genome size is 1.2 Gbp and I'm working with Oxford Nanopore reads. The input data for the assembly are summarized in the following table:

| Metric | Value |
| --- | --- |
| Number of reads | 1,341,399 |
| Number of bases (bp) | 33,136,270,559 |
| Average read length (bp) | 24,703 |
| Reads N50 (bp) | 40,677 |
| Expected fold-coverage | 28x |
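
As a sanity check on the table above, here is a minimal Python sketch that recomputes the fold-coverage and the average read length from the reported totals. The variable names are illustrative only, and the genome size is the expected 1.2 Gbp stated in this issue, not a measured value.

```python
# Values copied from the table above; genome_size is the *expected* size (1.2 Gbp).
total_bases = 33_136_270_559
num_reads = 1_341_399
genome_size = 1_200_000_000

# Fold-coverage is total sequenced bases divided by genome size.
coverage = total_bases / genome_size
# Average read length is total bases divided by the number of reads.
mean_read_length = total_bases / num_reads

print(f"fold-coverage: {coverage:.1f}x")
print(f"mean read length: {mean_read_length:,.0f} bp")
```

Both figures agree with the table after rounding (about 28x and 24,703 bp), so the input statistics are internally consistent.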

The obtained results are the following:

| Metric | NECAT v.0.0.1 |
| --- | --- |
| Total assembly size (bp) | 604,869 |
| Num. contigs | 12 |
| Contigs average length (bp) | 50,406 |
| N50 (bp) | 153,041 |
| N90 (bp) | 17,942 |
| Longest contig (bp) | 154,607 |
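
To put these numbers in perspective, the following Python sketch computes what fraction of the expected genome the assembly covers and rechecks the average contig length. It uses only the figures reported above; the variable names are illustrative.

```python
# Values copied from the results above; genome_size is the expected 1.2 Gbp.
assembly_size = 604_869
num_contigs = 12
genome_size = 1_200_000_000

# Fraction of the expected genome represented in the assembly.
fraction = assembly_size / genome_size
# Average contig length, to cross-check the reported value.
mean_contig = assembly_size / num_contigs

print(f"assembled fraction: {fraction:.4%}")
print(f"mean contig length: {mean_contig:,.0f} bp")
```

The assembly amounts to roughly 0.05% of the expected genome size, which is what makes the result so surprising for a 28x dataset.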

The command I ran was:
/opt/NECAT/Linux-amd64/bin/necat.pl assemble config.txt
and the config file was filled in as follows:

PROJECT=Plant_genome
ONT_READ_LIST=read_list.txt
GENOME_SIZE=1200000000
THREADS=15
MIN_READ_LENGTH=3000
PREP_OUTPUT_COVERAGE=28
OVLP_FAST_OPTIONS=-n 500 -z 20 -b 2000 -e 0.5 -j 0 -u 1 -a 1000
OVLP_SENSITIVE_OPTIONS=-n 500 -z 10 -e 0.5 -j 0 -u 1 -a 1000
CNS_FAST_OPTIONS=-a 2000 -x 4 -y 12 -l 1000 -e 0.5 -p 0.8 -u 0
CNS_SENSITIVE_OPTIONS=-a 2000 -x 4 -y 12 -l 1000 -e 0.5 -p 0.8 -u 0
TRIM_OVLP_OPTIONS=-n 100 -z 10 -b 2000 -e 0.5 -j 1 -u 1 -a 400
ASM_OVLP_OPTIONS=-n 100 -z 10 -b 2000 -e 0.5 -j 1 -u 0 -a 400
NUM_ITER=1
CNS_OUTPUT_COVERAGE=28
CLEANUP=1
USE_GRID=true
GRID_NODE=8
GRID_OPTIONS=
SMALL_MEMORY=0
FSA_OL_FILTER_OPTIONS=
FSA_ASSEMBLE_OPTIONS=
FSA_CTG_BRIDGE_OPTIONS=
POLISH_CONTIGS=true
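
One thing worth ruling out before tuning parameters is whether every file named in the read list is actually reachable from where the job runs. The sketch below is a hedged, standalone Python helper, not part of NECAT: `parse_config` and `missing_read_files` are hypothetical names, the parser simply assumes `KEY=VALUE` lines as shown above, and whether a missing input explains this particular result is not established here.

```python
from pathlib import Path


def parse_config(text):
    """Parse KEY=VALUE lines into a dict, splitting on the first '=' only."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue  # skip blank lines and anything that is not KEY=VALUE
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def missing_read_files(read_list_path):
    """Return read-list entries that do not exist as files on disk.

    If the read list itself is missing, return its own path.
    """
    read_list = Path(read_list_path)
    if not read_list.is_file():
        return [str(read_list)]
    entries = [l.strip() for l in read_list.read_text().splitlines() if l.strip()]
    return [entry for entry in entries if not Path(entry).is_file()]


# A few lines copied from the config above, used here as example input.
example = """PROJECT=Plant_genome
ONT_READ_LIST=read_list.txt
GENOME_SIZE=1200000000
GRID_OPTIONS=
"""

cfg = parse_config(example)
print(cfg["GENOME_SIZE"], cfg["ONT_READ_LIST"])
```

Calling `missing_read_files(cfg["ONT_READ_LIST"])` from the directory where the assembly is launched lists any entries that cannot be found; an empty list means all paths resolve.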

I would like to understand why the resulting assembly is so poor and how I can improve it. Could the parameters I used be inadequate for this dataset?
