Releases: shelkmike/Mabs
Mabs 2.30
- Previously, calculate_AG always mapped Oxford Nanopore reads using Minimap2 with the option "map-ont". Now, if calculate_AG sees that the median read accuracy, computed from the Phred scores in the FASTQ file, is 97% or higher, Minimap2 is run with the option "lr:hq" instead of "map-ont". This makes Mabs significantly faster.
- Previously, if Mabs-hifiasm saw two assemblies with equal AG, it considered the one with the higher N50 to be better. Now, auN is used instead of N50. auN is a metric of assembly contiguity that is slightly better than N50; see https://lh3.github.io/2020/04/08/a-new-metric-on-assembly-contiguity .
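To illustrate why auN can separate two assemblies that N50 cannot, here is a minimal sketch. The auN formula (sum of squared contig lengths divided by the total length) follows the blog post linked above. The dictionary layout, the function names and the assumption that a higher AG is better are illustrative only and are not taken from the Mabs code.

```python
def n50(lengths):
    """Length of the contig at which the running sum (longest first) reaches half of the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0


def aun(lengths):
    """auN: sum of squared contig lengths divided by the total assembly length."""
    total = sum(lengths)
    return sum(length * length for length in lengths) / total if total else 0.0


def better_assembly(a, b):
    """Pick the assembly with the higher AG; if AG is equal, pick the one with the higher auN.

    'a' and 'b' are hypothetical dicts with the keys 'ag' and 'contig_lengths'.
    """
    return max(a, b, key=lambda asm: (asm["ag"], aun(asm["contig_lengths"])))


# Two toy assemblies with the same AG and the same N50, but different auN.
assembly_1 = {"ag": 100, "contig_lengths": [50, 30, 20]}
assembly_2 = {"ag": 100, "contig_lengths": [50, 25, 25]}
chosen = better_assembly(assembly_1, assembly_2)
```

In this toy case both assemblies have an N50 of 50, so N50 alone cannot break the tie, while auN differs (38.0 versus 37.5) and selects the first assembly.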
Mabs 2.29
Main changes:
- Fixed a problem that caused Mabs-hifiasm to crash on computers where the command "sort" works in a non-standard way. This is the same problem as described in arq5x/bedtools2#323 .
- Now, Mabs uses BUSCO datasets from OrthoDB 10 current as of 2024 instead of BUSCO datasets from OrthoDB 10 current as of 2020.
- Now, Mabs-hifiasm is based on Hifiasm 0.25.0 instead of Hifiasm 0.19.8.
- Now, Mabs-flye is based on Flye 2.9.5 instead of Flye 2.9.3.
- Hifiasm recently gained the ability to assemble raw high-accuracy Oxford Nanopore reads produced on 10.4.1 flow cells ("raw" here means without error correction by tools such as Dorado Correct or HERRO). To take this into account, the option "--pacbio_hifi_reads" of Mabs-hifiasm was renamed to "--long_reads".
By default, if the median accuracy of reads in FASTQ (calculated from Phred scores) is below 99.8%, Mabs-hifiasm assembles the genome as if reads are raw high-accuracy Oxford Nanopore reads. A user can change this threshold by using the option "--hifi_accuracy_threshold". Also, an option "--should_ont_be_used" has been added that allows the user to force Mabs-hifiasm to use the provided reads as raw high-accuracy Oxford Nanopore reads ("--should_ont_be_used true") or more accurate reads ("--should_ont_be_used false"). By "more accurate reads" here I mean either PacBio HiFi reads, or Oxford Nanopore reads corrected by tools such as Dorado Correct or HERRO.
For consistency with older versions of Mabs, the option "--pacbio_hifi_reads" has been kept, and is now synonymous with the new option "--long_reads".
[The above paragraph may sound complex, but Mabs-hifiasm works great with default parameters in most cases]
When assembling a genome from raw Oxford Nanopore reads, take into account that:
a) Hifiasm (and, consequently, Mabs-hifiasm) makes bad assemblies when the raw Nanopore reads have low accuracy (median accuracy below 97%). For such reads, I recommend using NextDenovo or Mabs-flye.
b) Hifiasm (and, consequently, Mabs-hifiasm) cannot use reads in FASTA format as raw Oxford Nanopore reads. In other words, please give raw Oxford Nanopore reads in FASTQ to Mabs-hifiasm.
- Made several improvements to the script plot_gene_coverage_distribution.py, which is located in the folder ./Additional. This script builds sinaplots with gene coverage.
a) Now, this script has a user-friendly interface. Run "python3 plot_gene_coverage_distribution.py --help" to see the list of options. A user can customize the appearance of the produced figure. For example, change the size of points and change the maximum value on the vertical axis.
b) Now, this script can optionally make a per-orthogroup figure alongside a per-gene figure. In the per-gene figure (which is produced by default), every point is a gene, while in the per-orthogroup figure every point is an orthogroup. For orthogroups with several genes ("multicopy orthogroups"), the coverage of the orthogroup is calculated as the arithmetic mean of the coverages of its genes.
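The per-orthogroup averaging for multicopy orthogroups can be sketched as follows. This is only an illustration of the arithmetic mean described above: the function name, the two input dictionaries and the toy gene and orthogroup identifiers are hypothetical and do not reflect the actual input format or code of plot_gene_coverage_distribution.py.

```python
from collections import defaultdict
from statistics import mean


def orthogroup_coverage(gene_coverage, gene_to_orthogroup):
    """Return {orthogroup: mean coverage of its genes}.

    gene_coverage:      hypothetical dict {gene_id: coverage}
    gene_to_orthogroup: hypothetical dict {gene_id: orthogroup_id}
    """
    grouped = defaultdict(list)
    for gene, coverage in gene_coverage.items():
        grouped[gene_to_orthogroup[gene]].append(coverage)
    # Single-copy orthogroups keep the coverage of their only gene;
    # multicopy orthogroups get the arithmetic mean of their genes.
    return {orthogroup: mean(values) for orthogroup, values in grouped.items()}


per_gene = {"gene_1": 30.0, "gene_2": 34.0, "gene_3": 28.0}
membership = {"gene_1": "OG_A", "gene_2": "OG_A", "gene_3": "OG_B"}
per_orthogroup = orthogroup_coverage(per_gene, membership)
```

Here the multicopy orthogroup OG_A contributes a single point with coverage 32.0 to the per-orthogroup figure, instead of two points (30.0 and 34.0) as in the per-gene figure.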
Mabs 2.28
- Fixed a bug that was causing Mabs-flye to ignore the results of the last point (the tenth point if using "--maximum_number_of_points_to_try 10"). That bug was introduced in Mabs 2.24 and may have made some assemblies slightly worse than they could have been.
- Since Mabs 2.24, Mabs-flye determines the option with which it gives reads to Flye, based on the median read accuracy. After a series of experiments, I decided to change the scheme that was used in Mabs 2.24. Now the Flye option depends on the median read accuracy as follows:
(0%; 95%] - "--nano-raw"
(95%; 99.8%] - "--nano-hq"
(99.8%; 100%] - "--pacbio-hifi"
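A minimal sketch of how the median read accuracy could be computed from FASTQ quality lines and mapped to a Flye option using the intervals above. The interval boundaries and option names come from these notes. The per-read accuracy formula (one minus the mean per-base error probability, with Phred+33 encoding) is an assumption, since the notes do not state exactly how Mabs computes it, and all function names are hypothetical.

```python
from statistics import median


def read_accuracy(quality_line, offset=33):
    """Accuracy of one read in percent, assumed to be 100 * (1 - mean error probability)."""
    error_probabilities = [10 ** (-(ord(char) - offset) / 10) for char in quality_line]
    return 100.0 * (1.0 - sum(error_probabilities) / len(error_probabilities))


def median_accuracy(quality_lines):
    """Median accuracy (in percent) over the non-empty quality lines of a FASTQ file."""
    return median(read_accuracy(line) for line in quality_lines if line)


def flye_read_option(median_accuracy_percent):
    """Map the median accuracy to a Flye option, following the Mabs 2.28 intervals."""
    if median_accuracy_percent <= 95.0:
        return "--nano-raw"
    if median_accuracy_percent <= 99.8:
        return "--nano-hq"
    return "--pacbio-hifi"


# Toy quality lines: '5' encodes Q20 (99% accuracy), '+' encodes Q10 (90% accuracy).
toy_quality_lines = ["5555555555", "5555555555", "++++++++++"]
option = flye_read_option(median_accuracy(toy_quality_lines))
```

With these toy reads the median accuracy is about 99%, which falls into the (95%; 99.8%] interval and therefore selects "--nano-hq".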
Mabs 2.27
- Now Mabs-hifiasm is based on Hifiasm 0.19.8 instead of Hifiasm 0.19.5.
- Now Mabs-flye is based on Flye 2.9.3 instead of Flye 2.9.2.
- Several small improvements.
- Starting with version 2.27, Mabs is available in a Singularity container. You can download the container from https://mikeshelk.site/Diff/Mabs_distribution/Singularity_containers/ . To read about the specifics of using Mabs from the container, run the command "singularity run-help mabs.sif".
Mabs 2.24
- The optimization algorithm of Mabs-flye has been changed.
Previously, Mabs-flye assumed that the Flye parameters "assemble_ovlp_divergence" and "repeat_graph_ovlp_divergence" were equal to each other. Mabs-flye optimized the resulting single parameter (called "max_divergence" in Mabs-flye) using the golden section method. Now Mabs-flye optimizes these two parameters independently using the Nelder-Mead method. This allows for a more thorough exploration of the parameter space.
By default, Mabs-flye tests at most 10 points in the two-dimensional parameter space. One of the starting points corresponds to the default values of "assemble_ovlp_divergence" and "repeat_graph_ovlp_divergence" used by Flye.
Fewer than 10 points may be tested if the optimization algorithm (the function scipy.optimize.minimize of the Python library SciPy) decides that convergence has been achieved.
The maximum number of tried points can be set with a new Mabs-flye parameter "--maximum_number_of_points_to_try". Increasing the value of this parameter will increase the computation time of Mabs-flye, but may make the assembly better. By default, the value of "--maximum_number_of_points_to_try" is 10.
- The optimization algorithm of Mabs-hifiasm has been changed.
Previously, if two "-s" values of Hifiasm led to the same AG, Mabs-hifiasm considered the assembly with the smaller "-s" better (smaller "-s" corresponds to stricter pruning of possible haplotypic duplications). Now, if two "-s" values lead to the same AG, Mabs-hifiasm prefers the one whose assembly has the larger N50. If both assemblies have the same N50 then, as previously, the assembly with the smaller "-s" is preferred.
- Previously, Mabs-flye provided all reads to Flye via the option "--nano-raw". Now, Mabs-flye uses the Phred score lines in FASTQ to calculate the accuracy of each read. Then, Mabs-flye calculates the median accuracy among all reads.
The correspondence between the median accuracy and the option with which Mabs-flye provides reads to Flye:
(0%; 95%] - "--nano-raw"
(95%; 97%] - "--nano-hq"
(97%; 99%] - "--nano-corr"
(99%; 100%] - "--pacbio-hifi"
To speed up the read accuracy calculation, Mabs-flye calculates the accuracy only for BUSCO reads (i.e. reads that correspond to BUSCO genes).
- Now Mabs-hifiasm is based on Hifiasm 0.19.5 instead of Hifiasm 0.19.3.
- Several small changes in names of subfolders and files produced by Mabs have been made.
- Several bugs which slightly decreased the assembly accuracy have been fixed.
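The two-parameter Nelder-Mead search introduced in this release can be sketched with scipy.optimize.minimize as below. This is a toy illustration, not the Mabs-flye implementation: the function toy_ag is a made-up stand-in for "assemble with Flye at these two values and compute AG", the starting point is a placeholder rather than Flye's real defaults, and using the "maxfev" option to limit the number of tried points is an assumption about how such a limit could be expressed.

```python
from scipy.optimize import minimize

# Every (assemble_ovlp_divergence, repeat_graph_ovlp_divergence) pair that was tried.
evaluated_points = []


def toy_ag(params):
    """Hypothetical stand-in for running Flye with these two values and computing AG."""
    assemble_div, repeat_graph_div = params
    return 1000.0 - 1e5 * ((assemble_div - 0.03) ** 2 + (repeat_graph_div - 0.05) ** 2)


def objective(params):
    """Record the tried point and return the negated AG, because minimize() looks for a minimum."""
    evaluated_points.append((float(params[0]), float(params[1])))
    return -toy_ag(params)


start = [0.02, 0.02]  # placeholder starting point, NOT Flye's real default values

# "maxfev" limits the number of objective evaluations, i.e. the number of tried points.
result = minimize(objective, start, method="Nelder-Mead", options={"maxfev": 10})

best_assemble_div, best_repeat_graph_div = result.x
best_ag = -result.fun
```

Because each evaluation of the real objective requires a full assembly, the budget of tried points dominates the run time, which is why raising it trades computation time for a potentially better assembly.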
Mabs 2.19
Mabs 2.18
- Now Mabs-flye is based on Flye 2.9.2 instead of Flye 2.9.1. This may increase the assembly quality.
- Now Mabs-hifiasm is based on Hifiasm 0.19.3 instead of Hifiasm 0.16.1. This may increase the assembly quality.
- The ability to use ultralong Nanopore reads was recently added to Hifiasm. Mabs-hifiasm can now use them too, via a new option "--ultralong-nanopore-reads".
- A new option "--additional_flye_parameters" has been added to Mabs-flye. A user can pass some Flye-specific parameters to Flye through it.
- A new option "--additional_hifiasm_parameters" has been added to Mabs-hifiasm. A user can pass some Hifiasm-specific parameters to Hifiasm through it.
- Several small improvements.