Description
Hello, I previously used a set of mate-pair libraries to scaffold ALLPATHS-LG scaffolds with BESST and was quite successful. I now have a new assembly built from nanopore contigs and tried to apply the same scaffolding procedure.
However, it looks like most of the reads were discarded, and no scaffolds were created in any pass. What could be the reason for this?
Thanks
Ray
Statistics.txt
Initial number of contigs: 48716.
Number of contigs discarded from further analysis (with -filter_contigs set to 10): 1
Time elapsed for reading in contig sequences:7.42810487747
PASS 1
-T 7107.0 -t 5673.0
Contamine mean before filtering : 3169.82170245
Contamine stddev before filtering: 22984.3175557
Contamine mean converged: 677.939688523
Contamine std_est converged: 1121.3850919
LIBRARY STATISTICS
Mean of library set to: 2805.0
Standard deviation of library set to: 717.0
MP library PE contamination:
Contamine rate (rev comp oriented) estimated to: False
lib contamine mean (avg fragmentation size): 0
lib contamine stddev: 0
Number of contamined reads used for this calculation: 10081.0
-T (library insert size threshold) set to: 7107.0
-k set to (Scaffolding with contigs larger than): 5673.0
Number of links required to create an edge: None
Maximum identical contig-end overlap-length to merge of contigs that are adjacent in a scaffold: 200
Read length set to: 62.41
Time elapsed for getting libmetrics, iteration 0: 2.3990881443
Parsing BAM file...
L50: 2662 N50: 126232 Initial contig assembly length: 1505017490
Time initializing BESST objects: 0.231798887253
Total time elapsed for initializing Graph: 0.617565870285
Reading bam file and creating scaffold graph...
ELAPSED reading file: 6581.09070301
NR OF FISHY READ LINKS: 139654
Number of USEFUL READS (reads mapping to different contigs uniquly): 338778484
Number of non unique reads (at least one read non-unique in read pair) that maps to different contigs (filtered out from scaffolding): 478966897
Reads with too large insert size from "USEFUL READS" (filtered out): 304923304
Initial number of edges in G (the graph with large contigs): 858809
Initial number of edges in G_prime (the full graph of all contigs before removal of repats): 2204910
Number of duplicated reads indicated and removed: 26299652
Mean coverage before filtering out extreme observations = 150.34412238
Std dev of coverage before filtering out extreme observations= 888.146107052
Mean coverage after filtering = 0.0386320692138
Std coverage after filtering = 0.0212291479729
Length of longest contig in calc of coverage: 1578503
Length of shortest contig in calc of coverage: 5673
Detecting repeats..
Removed a total of: 43707 repeats. With coverage larger than 0.13706564204
Number of edges in G (after repeat removal): 1503
Number of edges in G_prime (after repeat removal): 5008
Number of BWA buggy edges removed: 0
Number of edges in G (after filtering for buggy flag stats reporting): 1503
Number of edges in G_prime (after filtering for buggy flag stats reporting): 5008
Letting filtering threshold in high complexity regions be 5 for this library.
Letting -e be 5 for this library.
Removed 0 edges from graph G of border contigs.
Remove edges in high complexity areas.
Removed total of 0 edges in high density areas.
Removed an additional of 0 edges with low support from full graph G_prime of all contigs.
Number of significantly spurious edges: 0
Number of edges in G_prime (after removing edges under -e threshold (if not specified, default is -e 3): 5008
Nr of contigs/scaffolds included in this pass: 5008
Out of which 1503 acts as border contigs.
Total time for CreateGraph-module, iteration 0: 6599.10073209
0 link edges created.
Perform inference on scaffold graph...
Remove isolated nodes.
1503 isolated contigs removed from graph.
Searching for paths BETWEEN scaffolds
Entering ELS.BetweenScaffolds single core
iterating until maximum of 0 extensions.
Number of nodes:10016, Number of edges: 5008
Elapsed time single core pathfinder: 0.0146651268005
0 paths detected are with score greater or equal to 1.5
Nr of contigs left: 5008.0 Nr of linking edges left: 0.0
Number of gaps estimated by GapEst-LP module order_contigs in this step is: 0
Time elapsed for making scaffolds, iteration 0: 0.152600049973
(super)Contigs after scaffolding: 5008
param value
detect_haplotype False
hit_path_threshold False
lognormal False
orientation rf
gap_estimations []
hapl_threshold 3
gff_file None
lower_cov_cutoff 0
path_gaps_estimated 0
expected_links_over_mean_plus_stddev 5
read_len 62.41
pass_number 1
path_threshold 100000
std_dev_coverage 0.0212291479729
mean_coverage 0.0386320692138
detect_duplicate True
FASTER_ILP False
development False
std_dev_ins_size 717.0
NO_ILP False
current_N50 126232
print_scores False
mean_ins_size 2805.0
multiprocess False
scaffold_indexer 48716
hapl_ratio 1.3
no_score True
first_lib True
current_L50 2662
plots False
contigfile None
cov_cutoff None
contamination_ratio False
ins_size_threshold 7107.0
edgesupport 5
extend_paths True
tot_assembly_length 1505017490
max_extensions None
score_cutoff 1.5
min_mapq 20
information_file <open file 'scaffold//BESST_output/Statistics.txt', mode 'w' at 0x7f6549be1300>
contamination_mean 0
max_contig_overlap 200
contig_threshold 6958
contamination_stddev 0
dfs_traversal True
PASS 2
-T 8421.0 -t 7243.0
Contamine mean before filtering : 644.91884719
Contamine stddev before filtering: 7618.78938119
Contamine mean converged: 323.448900031
Contamine std_est converged: 136.579587956
LIBRARY STATISTICS
Mean of library set to: 4887.0
Standard deviation of library set to: 589.0
MP library PE contamination:
Contamine rate (rev comp oriented) estimated to: 0.220716849845
lib contamine mean (avg fragmentation size): 323.448900031
lib contamine stddev: 136.579587956
Number of contamined reads used for this calculation: 97730.0
-T (library insert size threshold) set to: 8421.0
-k set to (Scaffolding with contigs larger than): 7243.0
Number of links required to create an edge: None
Maximum identical contig-end overlap-length to merge of contigs that are adjacent in a scaffold: 200
Read length set to: 48.78
Time elapsed for getting libmetrics, iteration 1: 2.95900011063
Parsing BAM file...
L50: 0 N50: 0 Initial contig assembly length: 1505017490
Nr of contigs/scaffolds that was singeled out due to length constraints 368
Time cleaning BESST objects for next library: 0.00483298301697
Total time elapsed for initializing Graph: 0.0218350887299
Reading bam file and creating scaffold graph...
ELAPSED reading file: 325.734697104
NR OF FISHY READ LINKS: 0
Number of USEFUL READS (reads mapping to different contigs uniquly): 0
Number of non unique reads (at least one read non-unique in read pair) that maps to different contigs (filtered out from scaffolding): 0
Reads with too large insert size from "USEFUL READS" (filtered out): 0
Initial number of edges in G (the graph with large contigs): 0
Initial number of edges in G_prime (the full graph of all contigs before removal of repats): 5008
Number of duplicated reads indicated and removed: 0
Mean coverage before filtering out extreme observations = 0.00857007310694
Std dev of coverage before filtering out extreme observations= 0.0171089338111
Mean coverage after filtering = 9.06876577268e-05
Std coverage after filtering = 0.000461890485611
Length of longest contig in calc of coverage: 89136
Length of shortest contig in calc of coverage: 7243
Number of edges in G (after repeat removal): 0
Number of edges in G_prime (after repeat removal): 5008
Number of BWA buggy edges removed: 0
Number of edges in G (after filtering for buggy flag stats reporting): 0
Number of edges in G_prime (after filtering for buggy flag stats reporting): 5008
Letting filtering threshold in high complexity regions be 5 for this library.
Letting -e be 5 for this library.
Removed 0 edges from graph G of border contigs.
Remove edges in high complexity areas.
Removed total of 0 edges in high density areas.
Removed an additional of 0 edges with low support from full graph G_prime of all contigs.
Number of edges in G_prime (after removing edges under -e threshold (if not specified, default is -e 3): 5008
Nr of contigs/scaffolds included in this pass: 5008
Out of which 1135 acts as border contigs.
Total time for CreateGraph-module, iteration 1: 325.839869976
0 link edges created.
Perform inference on scaffold graph...
Remove isolated nodes.
0 isolated contigs removed from graph.
Searching for paths BETWEEN scaffolds
Entering ELS.BetweenScaffolds single core
iterating until maximum of 0 extensions.
Number of nodes:10016, Number of edges: 5008
Elapsed time single core pathfinder: 0.0115258693695
0 paths detected are with score greater or equal to 1.5
Nr of contigs left: 5008.0 Nr of linking edges left: 0.0
Number of gaps estimated by GapEst-LP module order_contigs in this step is: 0
Time elapsed for making scaffolds, iteration 1: 0.149516105652
(super)Contigs after scaffolding: 5008
param value
detect_haplotype False
hit_path_threshold False
lognormal False
orientation rf
gap_estimations []
hapl_threshold 3
gff_file None
lower_cov_cutoff 0
path_gaps_estimated 0
expected_links_over_mean_plus_stddev 5
read_len 48.78
pass_number 2
path_threshold 100000
std_dev_coverage 0.000461890485611
mean_coverage 9.06876577268e-05
detect_duplicate True
FASTER_ILP False
development False
std_dev_ins_size 589.0
NO_ILP False
current_N50 0
print_scores False
mean_ins_size 4887.0
multiprocess False
scaffold_indexer 48716
hapl_ratio 1.3
no_score True
first_lib False
current_L50 0
plots False
contigfile None
cov_cutoff None
contamination_ratio 0.220716849845
ins_size_threshold 8421.0
edgesupport 5
extend_paths True
tot_assembly_length 1505017490
max_extensions None
score_cutoff 1.5
min_mapq 20
information_file <open file 'scaffold//BESST_output/Statistics.txt', mode 'w' at 0x7f6549be1300>
contamination_mean 323.448900031
max_contig_overlap 200
contig_threshold 6958
contamination_stddev 136.579587956
dfs_traversal True
PASS 3
-T 13492.0 -t 11160.0
Contamine mean before filtering : 24633.7478754
Contamine stddev before filtering: 89658.0332439
Contamine mean converged: 6422.97819315
Contamine std_est converged: 4321.78894197
LIBRARY STATISTICS
Mean of library set to: 6496.0
Standard deviation of library set to: 1166.0
MP library PE contamination:
Contamine rate (rev comp oriented) estimated to: False
lib contamine mean (avg fragmentation size): 0
lib contamine stddev: 0
Number of contamined reads used for this calculation: 321.0
-T (library insert size threshold) set to: 13492.0
-k set to (Scaffolding with contigs larger than): 11160.0
Number of links required to create an edge: None
Maximum identical contig-end overlap-length to merge of contigs that are adjacent in a scaffold: 200
Read length set to: 168.11
Time elapsed for getting libmetrics, iteration 2: 3.24759888649
Parsing BAM file...
L50: 0 N50: 0 Initial contig assembly length: 1505017490
Nr of contigs/scaffolds that was singeled out due to length constraints 486
Time cleaning BESST objects for next library: 0.00424909591675
Total time elapsed for initializing Graph: 0.0218479633331
Reading bam file and creating scaffold graph...
ELAPSED reading file: 25.9454369545
NR OF FISHY READ LINKS: 0
Number of USEFUL READS (reads mapping to different contigs uniquly): 0
Number of non unique reads (at least one read non-unique in read pair) that maps to different contigs (filtered out from scaffolding): 0
Reads with too large insert size from "USEFUL READS" (filtered out): 0
Initial number of edges in G (the graph with large contigs): 0
Initial number of edges in G_prime (the full graph of all contigs before removal of repats): 5008
Number of duplicated reads indicated and removed: 0
Mean coverage before filtering out extreme observations = 0.00124294772942
Std dev of coverage before filtering out extreme observations= 0.00507329288087
Mean coverage after filtering = 0.00124294772942
Std coverage after filtering = 0.00507329288087
Length of longest contig in calc of coverage: 89136
Length of shortest contig in calc of coverage: 11160
Number of edges in G (after repeat removal): 0
Number of edges in G_prime (after repeat removal): 5008
Number of BWA buggy edges removed: 0
Number of edges in G (after filtering for buggy flag stats reporting): 0
Number of edges in G_prime (after filtering for buggy flag stats reporting): 5008
Letting filtering threshold in high complexity regions be 5 for this library.
Letting -e be 5 for this library.
Removed 0 edges from graph G of border contigs.
Remove edges in high complexity areas.
Removed total of 0 edges in high density areas.
Removed an additional of 0 edges with low support from full graph G_prime of all contigs.
Number of edges in G_prime (after removing edges under -e threshold (if not specified, default is -e 3): 5008
Nr of contigs/scaffolds included in this pass: 5008
Out of which 649 acts as border contigs.
Total time for CreateGraph-module, iteration 2: 26.0430119038
0 link edges created.
Perform inference on scaffold graph...
Remove isolated nodes.
0 isolated contigs removed from graph.
Searching for paths BETWEEN scaffolds
Entering ELS.BetweenScaffolds single core
iterating until maximum of 0 extensions.
Number of nodes:10016, Number of edges: 5008
Elapsed time single core pathfinder: 0.0116968154907
0 paths detected are with score greater or equal to 1.5
Nr of contigs left: 5008.0 Nr of linking edges left: 0.0
Number of gaps estimated by GapEst-LP module order_contigs in this step is: 0
Time elapsed for making scaffolds, iteration 2: 0.221040964127
(super)Contigs after scaffolding: 5008
param value
detect_haplotype False
hit_path_threshold False
lognormal False
orientation rf
gap_estimations []
hapl_threshold 3
gff_file None
lower_cov_cutoff 0
path_gaps_estimated 0
expected_links_over_mean_plus_stddev 5
read_len 168.11
pass_number 3
path_threshold 100000
std_dev_coverage 0.00507329288087
mean_coverage 0.00124294772942
detect_duplicate True
FASTER_ILP False
development False
std_dev_ins_size 1166.0
NO_ILP False
current_N50 0
print_scores False
mean_ins_size 6496.0
multiprocess False
scaffold_indexer 48716
hapl_ratio 1.3
no_score True
first_lib False
current_L50 0
plots False
contigfile None
cov_cutoff None
contamination_ratio False
ins_size_threshold 13492.0
edgesupport 5
extend_paths True
tot_assembly_length 1505017490
max_extensions None
score_cutoff 1.5
min_mapq 20
information_file <open file 'scaffold//BESST_output/Statistics.txt', mode 'w' at 0x7f6549be1300>
contamination_mean 0
max_contig_overlap 200
contig_threshold 6958
contamination_stddev 0
dfs_traversal True
PASS 4
-T 42537.0 -t 32239.0
Contamine mean before filtering : 29519.8535565
Contamine stddev before filtering: 68616.3455717
Contamine mean converged: 16048.5931953
Contamine std_est converged: 7824.95943874
LIBRARY STATISTICS
Mean of library set to: 11643.0
Standard deviation of library set to: 5149.0
MP library PE contamination:
Contamine rate (rev comp oriented) estimated to: False
lib contamine mean (avg fragmentation size): 0
lib contamine stddev: 0
Number of contamined reads used for this calculation: 676.0
-T (library insert size threshold) set to: 42537.0
-k set to (Scaffolding with contigs larger than): 32239.0
Number of links required to create an edge: None
Maximum identical contig-end overlap-length to merge of contigs that are adjacent in a scaffold: 200
Read length set to: 152.81
Time elapsed for getting libmetrics, iteration 3: 3.09710383415
Parsing BAM file...
L50: 0 N50: 0 Initial contig assembly length: 1505017490
Nr of contigs/scaffolds that was singeled out due to length constraints 589
Time cleaning BESST objects for next library: 0.00451493263245
Total time elapsed for initializing Graph: 0.0214760303497
Reading bam file and creating scaffold graph...
ELAPSED reading file: 16.8508169651
NR OF FISHY READ LINKS: 0
Number of USEFUL READS (reads mapping to different contigs uniquly): 0
Number of non unique reads (at least one read non-unique in read pair) that maps to different contigs (filtered out from scaffolding): 0
Reads with too large insert size from "USEFUL READS" (filtered out): 0
Initial number of edges in G (the graph with large contigs): 0
Initial number of edges in G_prime (the full graph of all contigs before removal of repats): 5008
Number of duplicated reads indicated and removed: 0
Mean coverage before filtering out extreme observations = 0.00143064790959
Std dev of coverage before filtering out extreme observations= 0.00381765871116
Mean coverage after filtering = 0.00143064790959
Std coverage after filtering = 0.00381765871116
Length of longest contig in calc of coverage: 89136
Length of shortest contig in calc of coverage: 32418
Number of edges in G (after repeat removal): 0
Number of edges in G_prime (after repeat removal): 5008
Number of BWA buggy edges removed: 0
Number of edges in G (after filtering for buggy flag stats reporting): 0
Number of edges in G_prime (after filtering for buggy flag stats reporting): 5008
Letting filtering threshold in high complexity regions be 5 for this library.
Letting -e be 5 for this library.
Removed 0 edges from graph G of border contigs.
Remove edges in high complexity areas.
Removed total of 0 edges in high density areas.
Removed an additional of 0 edges with low support from full graph G_prime of all contigs.
Number of edges in G_prime (after removing edges under -e threshold (if not specified, default is -e 3): 5008
Nr of contigs/scaffolds included in this pass: 5008
Out of which 60 acts as border contigs.
Total time for CreateGraph-module, iteration 3: 16.9412498474
0 link edges created.
Perform inference on scaffold graph...
Remove isolated nodes.
0 isolated contigs removed from graph.
Searching for paths BETWEEN scaffolds
Entering ELS.BetweenScaffolds single core
iterating until maximum of 0 extensions.
Number of nodes:10016, Number of edges: 5008
Elapsed time single core pathfinder: 0.0114350318909
0 paths detected are with score greater or equal to 1.5
Nr of contigs left: 5008.0 Nr of linking edges left: 0.0
Number of gaps estimated by GapEst-LP module order_contigs in this step is: 0
Time elapsed for making scaffolds, iteration 3: 0.63897395134
(super)Contigs after scaffolding: 5008
param value
detect_haplotype False
hit_path_threshold False
lognormal False
orientation rf
gap_estimations []
hapl_threshold 3
gff_file None
lower_cov_cutoff 0
path_gaps_estimated 0
expected_links_over_mean_plus_stddev 5
read_len 152.81
pass_number 4
path_threshold 100000
std_dev_coverage 0.00381765871116
mean_coverage 0.00143064790959
detect_duplicate True
FASTER_ILP False
development False
std_dev_ins_size 5149.0
NO_ILP False
current_N50 0
print_scores False
mean_ins_size 11643.0
multiprocess False
scaffold_indexer 48716
hapl_ratio 1.3
no_score True
first_lib False
current_L50 0
plots False
contigfile None
cov_cutoff None
contamination_ratio False
ins_size_threshold 42537.0
edgesupport 5
extend_paths True
tot_assembly_length 1505017490
max_extensions None
score_cutoff 1.5
min_mapq 20
information_file <open file 'scaffold//BESST_output/Statistics.txt', mode 'w' at 0x7f6549be1300>
contamination_mean 0
max_contig_overlap 200
contig_threshold 6958
contamination_stddev 0
dfs_traversal True
PASS 5
-T 333115.0 -t 259485.0
LIBRARY STATISTICS
Mean of library set to: 112225.0
Standard deviation of library set to: 36815.0
MP library PE contamination:
Contamine rate (rev comp oriented) estimated to: False
lib contamine mean (avg fragmentation size): 0
lib contamine stddev: 0
Number of contamined reads used for this calculation: 0.0
-T (library insert size threshold) set to: 333115.0
-k set to (Scaffolding with contigs larger than): 259485.0
Number of links required to create an edge: None
Maximum identical contig-end overlap-length to merge of contigs that are adjacent in a scaffold: 200
Read length set to: 719.6
Time elapsed for getting libmetrics, iteration 4: 0.472044944763
Parsing BAM file...
L50: 0 N50: 0 Initial contig assembly length: 1505017490
Nr of contigs/scaffolds that was singeled out due to length constraints 60
Time cleaning BESST objects for next library: 0.00434803962708
Total time for CreateGraph-module, iteration 4: 0.00948882102966
0 link edges created.
Perform inference on scaffold graph...
Remove isolated nodes.
0 isolated contigs removed from graph.
Searching for paths BETWEEN scaffolds
Entering ELS.BetweenScaffolds single core
iterating until maximum of 0 extensions.
Number of nodes:0, Number of edges: 0
Elapsed time single core pathfinder: 4.2200088501e-05
0 paths detected are with score greater or equal to 1.5
Nr of contigs left: 0.0 Nr of linking edges left: 0.0
Number of gaps estimated by GapEst-LP module order_contigs in this step is: 0
Time elapsed for making scaffolds, iteration 4: 5.06741690636
(super)Contigs after scaffolding: 5008
param value
detect_haplotype False
hit_path_threshold False
lognormal False
orientation fr
gap_estimations []
hapl_threshold 3
gff_file None
lower_cov_cutoff 0
path_gaps_estimated 0
expected_links_over_mean_plus_stddev 5
read_len 719.6
pass_number 5
path_threshold 100000
std_dev_coverage 0.00381765871116
mean_coverage 0.00143064790959
detect_duplicate True
FASTER_ILP False
development False
std_dev_ins_size 36815.0
NO_ILP False
current_N50 0
print_scores False
mean_ins_size 112225.0
multiprocess False
scaffold_indexer 48716
hapl_ratio 1.3
no_score True
first_lib False
current_L50 0
plots False
contigfile None
cov_cutoff None
contamination_ratio False
ins_size_threshold 333115.0
edgesupport None
extend_paths True
tot_assembly_length 1505017490
max_extensions None
score_cutoff 1.5
min_mapq 20
information_file <open file 'scaffold//BESST_output/Statistics.txt', mode 'w' at 0x7f6549be1300>
contamination_mean 0
max_contig_overlap 200
contig_threshold 6958
contamination_stddev 0
dfs_traversal True
L50: 0 N50: 0 Initial contig assembly length: 1505017490
Total time for scaffolding: 7012.52787113
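To put numbers on "most of the reads were discarded", here is a minimal sketch that only redoes arithmetic on values copied from the log above. Two points are observations from this log, not statements about how BESST is implemented: the reported -T and -k values in all five passes happen to equal mean + 6*std and mean + 4*std of the library, and the variable and function names are made up for illustration.

```python
# Library statistics copied from the five passes of the Statistics.txt above:
# (mean insert size, std dev, reported -T, reported -k)
libraries = [
    (2805.0, 717.0, 7107.0, 5673.0),
    (4887.0, 589.0, 8421.0, 7243.0),
    (6496.0, 1166.0, 13492.0, 11160.0),
    (11643.0, 5149.0, 42537.0, 32239.0),
    (112225.0, 36815.0, 333115.0, 259485.0),
]


def thresholds_match(mean, std, big_t, small_k):
    """Check the pattern seen in this log (an observation, not BESST's documented rule):
    -T == mean + 6*std and -k == mean + 4*std."""
    return mean + 6 * std == big_t and mean + 4 * std == small_k


all_match = all(thresholds_match(*lib) for lib in libraries)

# Read-pair counts copied from PASS 1.
useful_reads = 338778484       # "Number of USEFUL READS"
too_large_insert = 304923304   # "Reads with too large insert size" (filtered out)

# Share of the useful reads that were dropped by the insert-size threshold -T.
frac_too_large = float(too_large_insert) / useful_reads
remaining_useful = useful_reads - too_large_insert

print("thresholds follow mean + 6*std and mean + 4*std: %s" % all_match)
print("fraction of useful reads over -T: %.3f" % frac_too_large)
print("useful reads left after the insert-size filter: %d" % remaining_useful)
```

In pass 1, roughly 90% of the read pairs counted as useful were then removed for exceeding -T (7107 for this library), and passes 2 to 4 report 0 useful reads.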