Releases: steineggerlab/Metabuli
v1.2.0
Version used in "Sensitive and scalable metagenomic classification using spaced metamers, reduced alphabets, and syncmers"
As of v1.2.0, the documentation has moved to https://jaebeom-kim.github.io/metabuli-doc/
Metabuli v1.2.0
Improved sensitivity via spaced k-mers and reduced amino acid alphabet
- Three layers of mismatch tolerance:
  - Amino acid-level k-mer search allows for synonymous mutations (original feature).
  - Reduced amino acid alphabet groups similar amino acids together, allowing for conservative substitutions (NEW).
  - Spaced k-mers allow for mismatches at specific positions in the k-mer (NEW).
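
The two new layers can be illustrated with a minimal Python sketch. The amino acid groups and the mask below are hypothetical values chosen for illustration; they are not Metabuli's actual reduced alphabet or default mask, and the codon-level layer (synonymous mutations) is not shown.

```python
# Hypothetical reduced alphabet: similar amino acids share one symbol.
# These groups are illustrative only, not Metabuli's actual table.
GROUPS = ["LVIM", "FYW", "KR", "DE", "ST", "NQ"]
REDUCE = {aa: group[0] for group in GROUPS for aa in group}


def reduce_peptide(peptide: str) -> str:
    """Map each amino acid to its group representative (others stay as-is)."""
    return "".join(REDUCE.get(aa, aa) for aa in peptide)


def apply_mask(kmer: str, mask: str) -> str:
    """Keep positions where the mask is '1'; replace ignored positions with '*'."""
    if len(kmer) != len(mask):
        raise ValueError("k-mer and mask must have the same length")
    return "".join(c if m == "1" else "*" for c, m in zip(kmer, mask))


def spaced_reduced_key(peptide: str, mask: str) -> str:
    """Lookup key after alphabet reduction followed by spaced masking."""
    return apply_mask(reduce_peptide(peptide), mask)


MASK = "11111011"       # hypothetical mask: position 6 is ignored
reference = "LKVDSGFA"
query = "IRVESAFA"      # conservative substitutions plus one mismatch (G -> A)

print(reference == query)                                  # exact match
print(reduce_peptide(reference) == reduce_peptide(query))  # reduced alphabet only
print(spaced_reduced_key(reference, MASK) == spaced_reduced_key(query, MASK))
```

In this toy example the two peptides only produce the same key once both the reduced alphabet and the mask are applied.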
Improved scalability via syncmers
- Syncmers are a subsampling technique that selects a subset of k-mers based on the presence of specific s-mers. This reduces the database size and classification time, at the cost of a small decrease in sensitivity in remote homology detection.
- With c = (k - s + 1) / 2 = 2, the database is two times smaller and classification is two times faster.
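
A reduction rate of (k - s + 1) / 2 is what closed syncmers give: a k-mer is kept when its smallest s-mer sits at the first or last of its k - s + 1 s-mer positions, so about 2 / (k - s + 1) of all k-mers survive. The sketch below assumes that variant and uses plain lexicographic order to rank s-mers. Both are assumptions for illustration; the release note does not state which syncmer variant or ordering Metabuli uses.

```python
import random


def is_closed_syncmer(kmer: str, s: int) -> bool:
    """True if the smallest s-mer of kmer is at its first or last s-mer position.

    Ties are resolved by taking the leftmost smallest s-mer.
    """
    smers = [kmer[i:i + s] for i in range(len(kmer) - s + 1)]
    position = smers.index(min(smers))
    return position == 0 or position == len(smers) - 1


def select_syncmers(sequence: str, k: int, s: int) -> list[tuple[int, str]]:
    """Return (start, k-mer) for every k-mer of sequence that is a closed syncmer."""
    selected = []
    for start in range(len(sequence) - k + 1):
        kmer = sequence[start:start + k]
        if is_closed_syncmer(kmer, s):
            selected.append((start, kmer))
    return selected


# Example values chosen so that (k - s + 1) / 2 = 2.
K, S = 8, 5
rng = random.Random(0)
sequence = "".join(rng.choice("ACDEFGHIKLMNPQRSTVWY") for _ in range(20000))

kept = select_syncmers(sequence, K, S)
total = len(sequence) - K + 1
print(f"kept {len(kept)} of {total} k-mers ({len(kept) / total:.2%})")
```

On a random sequence the kept fraction should be close to one half, which matches the two-fold reduction described above.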
Related options in build module
- --syncmer: Use syncmers instead of all k-mers.
  - It reduces the database size and classification time.
  - The reduction rate (k - s + 1) / 2 can be adjusted by setting s with --smer-len.
  - As k-mers are subsampled, sensitivity in remote homology detection is decreased.
- --space-mask: Use spaced k-mers instead of contiguous ones.
- --custom-metamer: Specify k-mer length and customize a translation table.
Related options in classify module
- --precise: preset for more precise but less sensitive classification
- -e: maximum E-value
Metabuli v1.1.1
- Version packaged in Metabuli App
- Import the latest MMseqs2 as a git submodule
- Added FASTA/Q format validators: fastq_utils and fasta_validator
- Added database validation function: validatedb
- Added classifiedRefiner for filtering or manipulating the per-read classification result file.
- Improved createnewtaxalist.
- Improved thread-safety of the database creation process.
Metabuli v1.1.0
- Fix errors in v1.0.9
- Custom DB creation became easier
- Improved the updateDB command
Metabuli v1.0.9
DB creation process improved
- Added updateDB module for adding new sequences to an existing database.
- Added --cds-info parameter in the build module. Users can provide CDS information to skip Prodigal's gene prediction.
  - Currently, only NCBI RefSeq or GenBank CDS files (*cds_from_genomic.fna) are supported.
  - For the accessions included in the files, the provided CDS info will be used, skipping Prodigal's gene prediction.
- Added --max-ram parameter to the build module.
- Added compatibility with taxdump files generated using taxonkit.
- 1.0.9-2: Fixes for Bioconda
Metabuli v1.0.8
- Added extract module: It extracts reads classified under a specific taxon at any rank. It can be used after running classify.
Metabuli v1.0.7
Metabuli became faster than v1.0.6
- Dataset
  - Query: SRR24315757_1.fastq, SRR24315757_2.fastq
    - 22,107,398 paired-end reads
    - 6,632,219,400 nt in total
  - DB: GTDB
    - Complete Genome or Chromosome level assemblies
    - CheckM completeness > 90 and contamination < 5
    - 36,203 genomes from 8,465 species
- Windows: ~8.3 times faster
  - Machine: Intel(R) Core(TM) i9-9900 CPU, 32GB RAM
  - --max-ram: 32
  - --threads: 8
    - v1.0.6: 825s for the first 587,593 reads (2.7% of all). Total time not measured
    - v1.0.7: 100s for the first 587,593 reads. 1h 7m 22s in total
- MacOS: ~1.7 times faster
  - Machine: MacBook Pro 14-inch 2023, M2 Pro chip, 32GB RAM
  - --max-ram: 32
  - --threads: 8
    - v1.0.6: 71m 34s
    - v1.0.7: 42m 58s
- Linux: ~1.3 times faster
  - Machine: A server with 64-core AMD EPYC 7742 CPU and 1 TB of RAM
  - --max-ram: 128
  - --threads: 32
    - v1.0.6: 13m 34s
    - v1.0.7: 9m 58s
  - --threads: 64
    - v1.0.6: 9m 36s
    - v1.0.7: 7m 19s
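
The speed-up factors can be recomputed from the timings listed above. The following is a small arithmetic sketch; the dictionary labels and helper names are made up for this example. Note that the Windows figure compares only the first 587,593 reads, since the total time for v1.0.6 was not measured.

```python
def to_seconds(hours: int = 0, minutes: int = 0, seconds: int = 0) -> int:
    """Convert a wall-clock duration to seconds."""
    return hours * 3600 + minutes * 60 + seconds


# (v1.0.6, v1.0.7) run times in seconds, copied from the benchmark list.
TIMINGS = {
    "Windows, 8 threads (first 587,593 reads)": (825, 100),
    "MacOS, 8 threads": (to_seconds(minutes=71, seconds=34),
                         to_seconds(minutes=42, seconds=58)),
    "Linux, 32 threads": (to_seconds(minutes=13, seconds=34),
                          to_seconds(minutes=9, seconds=58)),
    "Linux, 64 threads": (to_seconds(minutes=9, seconds=36),
                          to_seconds(minutes=7, seconds=19)),
}


def speedup(old: float, new: float) -> float:
    """How many times faster the new run is compared with the old one."""
    return old / new


for label, (old, new) in TIMINGS.items():
    print(f"{label}: {speedup(old, new):.2f}x")

# Share of the query covered by the Windows partial measurement.
print(f"partial run covers {587_593 / 22_107_398:.1%} of the reads")
```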
Metabuli v1.0.6
Windows OS is supported
Metabuli v1.0.5
The CMake file was edited to pass the Bioconda PR test.
Other than that it is the same as v1.0.4.
Metabuli v1.0.4
- Fixed a minor reproducibility issue.
- Fixed a performance-harming bug occurring with sequences containing lowercased bases.
- Automatic adjustment of the --match-per-kmer parameter. Issue #20 solved.
- Record version info in db.parameter
Metabuli v1.0.3
- New parameter: --tie-ratio in classify module. [default 0.95]
  When the best-matching species has a score MAX, any species with score >= (MAX * --tie-ratio) is considered tied with the best score. When tied species occur for a read, the read is classified to their LCA.
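
The tie rule can be sketched as follows. This is a minimal illustration of the rule as described, not Metabuli's implementation: the toy taxonomy, the taxon names, and the function names are all hypothetical.

```python
# Hypothetical toy taxonomy (child -> parent); "root" has no parent.
PARENT = {
    "species_A": "genus_X",
    "species_B": "genus_X",
    "species_C": "genus_Y",
    "genus_X": "family_F",
    "genus_Y": "family_F",
    "family_F": "root",
}


def lineage(taxon: str) -> list[str]:
    """Path from a taxon up to the root, starting with the taxon itself."""
    path = [taxon]
    while taxon in PARENT:
        taxon = PARENT[taxon]
        path.append(taxon)
    return path


def lca(taxa: list[str]) -> str:
    """Lowest common ancestor: the deepest node shared by all lineages."""
    others = [set(lineage(t)) for t in taxa[1:]]
    for node in lineage(taxa[0]):
        if all(node in other for other in others):
            return node
    raise ValueError("taxa do not share a root")


def classify_read(species_scores: dict[str, float], tie_ratio: float = 0.95) -> str:
    """Assign a read to the best species, or to the LCA of all tied species."""
    best = max(species_scores.values())
    tied = [sp for sp, score in species_scores.items() if score >= best * tie_ratio]
    return tied[0] if len(tied) == 1 else lca(tied)


print(classify_read({"species_A": 1.00, "species_B": 0.96, "species_C": 0.50}))
print(classify_read({"species_A": 1.00, "species_B": 0.90, "species_C": 0.50}))
print(classify_read({"species_A": 1.00, "species_B": 0.50, "species_C": 0.97}))
```

With the default ratio of 0.95, the first read has two tied species in the same genus and is assigned to that genus, the second has a clear best species, and the third has tied species in different genera and is assigned to their shared family.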