
Releases: steineggerlab/Metabuli

v1.2.0

06 Apr 02:58
ce1098c

Version used in "Sensitive and scalable metagenomic classification using spaced metamers, reduced alphabets, and syncmers"

Starting with v1.2.0, the documentation has moved to https://jaebeom-kim.github.io/metabuli-doc/

Metabuli v1.2.0

Improved sensitivity via spaced k-mers and a reduced amino acid alphabet

  • Three layers of mismatch tolerance:
    1. Amino acid-level k-mer search allows for synonymous mutations (original feature).
    2. Reduced amino acid alphabet groups similar amino acids together, allowing for conservative substitutions (NEW).
    3. Spaced k-mers allow for mismatches at specific positions in the k-mer (NEW).
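The three layers can be illustrated with a toy comparison of two 5-residue k-mers. This is a minimal sketch, not Metabuli's implementation: the reduced-alphabet grouping (`GROUPS`) and the spaced mask (`mask`) are hypothetical values chosen only to show each layer absorbing a different kind of difference, and the helper names are our own.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1); codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)}

# HYPOTHETICAL reduced alphabet: each group collapses to its first letter.
# This grouping is for illustration only and is not Metabuli's actual table.
GROUPS = ["ILMV", "FWY", "KR", "DE", "ST", "NQ"]
REDUCE = {aa: group[0] for group in GROUPS for aa in group}


def translate(dna: str) -> str:
    """Translate an in-frame DNA string codon by codon."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def reduce_alphabet(protein: str) -> str:
    """Map each residue to its group representative (ungrouped residues stay as-is)."""
    return "".join(REDUCE.get(aa, aa) for aa in protein)


def apply_mask(kmer: str, mask: str) -> str:
    """Keep only the positions marked '1' in a spaced mask."""
    return "".join(c for c, m in zip(kmer, mask) if m == "1")


seq_a = "CTGAAAGATATTTCT"
# CTG->CTT and TCT->TCC are synonymous; K->R and D->E are conservative; I->W is not.
seq_b = "CTTAGAGAATGGTCC"
mask = "11101"  # HYPOTHETICAL mask that ignores the 4th residue

aa_a, aa_b = translate(seq_a), translate(seq_b)
red_a, red_b = reduce_alphabet(aa_a), reduce_alphabet(aa_b)
print(aa_a, aa_b, aa_a == aa_b)      # LKDIS LREWS False
print(red_a, red_b, red_a == red_b)  # IKDIS IKDFS False
print(apply_mask(red_a, mask) == apply_mask(red_b, mask))  # True
```

Translation hides the synonymous codon changes, the reduced alphabet hides K/R and D/E, and the mask skips the one remaining mismatched position, so the two k-mers match only once all three layers are applied.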

Improved scalability via syncmers

  • Syncmers are a subsampling technique that keeps only the k-mers satisfying a condition on the position of their smallest s-mer (a substring of length s < k). This reduces the database size and classification time with only a small loss of sensitivity in remote homology detection.
  • A database half the size and classification twice as fast when c = (k - s + 1) / 2 = 2.
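The stated reduction rate c = (k - s + 1) / 2 matches the expected compression of closed syncmers, where a k-mer is kept when its smallest s-mer is the first or the last s-mer it contains. The sketch below assumes that variant and orders s-mers by a BLAKE2b hash; Metabuli's exact ordering may differ, and k = 8, s = 5 are illustrative values chosen to give c = 2.

```python
import hashlib
import random


def smer_hash(smer: str) -> int:
    """Deterministic 64-bit hash used to rank s-mers (an assumption for illustration)."""
    return int.from_bytes(hashlib.blake2b(smer.encode(), digest_size=8).digest(), "big")


def closed_syncmers(seq: str, k: int, s: int) -> list:
    """Start positions of k-mers whose smallest s-mer is the first or last one inside them."""
    hashes = [smer_hash(seq[i:i + s]) for i in range(len(seq) - s + 1)]
    w = k - s + 1  # number of s-mers contained in one k-mer
    selected = []
    for start in range(len(seq) - k + 1):
        window = hashes[start:start + w]
        pos = window.index(min(window))
        if pos == 0 or pos == w - 1:
            selected.append(start)
    return selected


rng = random.Random(42)
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
protein = "".join(rng.choice(AMINO_ACIDS) for _ in range(20000))

k, s = 8, 5  # c = (k - s + 1) / 2 = 2
picked = closed_syncmers(protein, k, s)
total = len(protein) - k + 1
print(f"kept {len(picked)} of {total} k-mers, compression ~ {total / len(picked):.2f}")
```

On a random sequence, each of the k - s + 1 s-mers in a window is equally likely to be the smallest, so about 2 / (k - s + 1) of the k-mers are kept, which is roughly half for c = 2.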

Related options in build module

  • --syncmer: Use syncmers instead of all k-mers.
    It reduces the database size and speeds up classification.
    The reduction rate, (k - s + 1) / 2, is controlled by setting s with --smer-len.
    Because k-mers are subsampled, sensitivity in remote homology detection decreases.
  • --space-mask: Use spaced k-mers instead of contiguous ones.
  • --custom-metamer: Specify the k-mer length and customize the translation table.

Related options in classify module

  • --precise: Preset for more precise but less sensitive classification.
  • -e: Maximum E-value.

Metabuli v1.1.1

07 Jul 03:22
a65c014

  • Version packaged in the Metabuli App.
  • Imported the latest MMseqs2 as a git submodule.
  • Added FASTA/Q format validators: fastq_utils and fasta_validator.
  • Added a database validation function: validatedb.
  • Added classifiedRefiner for filtering or manipulating the per-read classification result file.
  • Improved createnewtaxalist.
  • Improved thread safety of the database creation process.

Metabuli v1.1.0

10 Feb 08:04
3cd894d

  • Fixed errors in v1.0.9.
  • Made custom DB creation easier.
  • Improved the updateDB command.

Metabuli v1.0.9

27 Dec 06:15
1c5aa20

Pre-release

DB creation process improved

  • Added the updateDB module for adding new sequences to an existing database.
  • Added the --cds-info parameter to the build module. Users can provide CDS information to skip Prodigal's gene prediction.
    • Currently, only NCBI RefSeq or GenBank CDS files (*cds_from_genomic.fna) are supported.
    • For the accessions included in these files, the provided CDS information is used instead of Prodigal's gene prediction.
  • Added the --max-ram parameter to the build module.
  • Added compatibility with taxdump files generated by taxonkit.
  • 1.0.9-2: Fixes for Bioconda.
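The per-accession choice between provided CDS and Prodigal can be sketched as follows. This is an illustration, not Metabuli's code: it assumes the usual NCBI header pattern `>lcl|<accession>_cds_<protein_id>_<n> ...`, and the function names, the example header, and the second accession are illustrative.

```python
def accession_from_cds_header(header: str) -> str:
    """Extract the genomic accession from a *_cds_from_genomic.fna header line."""
    seq_id = header.lstrip(">").split()[0]  # drop the [key=value] attributes
    seq_id = seq_id.split("|", 1)[-1]       # drop the 'lcl|' prefix
    return seq_id.split("_cds_", 1)[0]      # keep what precedes '_cds_'


def plan_gene_sources(genome_accessions, cds_headers):
    """Map each genome accession to the gene source it would use (hypothetical helper)."""
    covered = {accession_from_cds_header(h) for h in cds_headers}
    return {acc: ("provided CDS" if acc in covered else "Prodigal")
            for acc in genome_accessions}


header = ">lcl|NC_000913.3_cds_NP_414542.1_1 [gene=thrL]"
print(accession_from_cds_header(header))  # NC_000913.3
print(plan_gene_sources(["NC_000913.3", "OTHER_GENOME.1"], [header]))
# {'NC_000913.3': 'provided CDS', 'OTHER_GENOME.1': 'Prodigal'}
```

Splitting on the first `_cds_` rather than the first underscore matters because RefSeq accessions such as `NC_000913.3` contain an underscore themselves.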

Metabuli v1.0.8

29 Sep 12:20
4716a6f

  • Added the extract module, which extracts reads classified under a specific taxon at any rank. It can be used after running classify.
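Selecting reads "under a taxon at any rank" amounts to an ancestor check on the taxonomy tree. A minimal sketch, assuming a child-to-parent map like the one in NCBI taxdump `nodes.dmp` (where the root is its own parent); the tree, read IDs, and function names are hypothetical and not the extract module's code.

```python
def is_under(taxid, target, parent):
    """Return True if `target` is `taxid` itself or one of its ancestors."""
    while True:
        if taxid == target:
            return True
        up = parent.get(taxid)
        if up is None or up == taxid:  # unknown taxon or root reached
            return False
        taxid = up


def extract_reads(assignments, target, parent):
    """Read IDs whose assigned taxon lies in the subtree rooted at `target`."""
    return [read for read, taxid in assignments.items() if is_under(taxid, target, parent)]


# HYPOTHETICAL taxonomy (child -> parent) and per-read assignments; 0 = unclassified.
parent = {1: 1, 10: 1, 20: 10, 21: 10, 30: 20, 31: 20}
assignments = {"read1": 30, "read2": 31, "read3": 21, "read4": 0}

print(extract_reads(assignments, 20, parent))  # ['read1', 'read2']
print(extract_reads(assignments, 10, parent))  # ['read1', 'read2', 'read3']
```

Because the check walks upward from each read's assigned taxon, the same target works whether it is a species, genus, or any higher rank.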

Metabuli v1.0.7

12 Sep 05:02
b79cb21

Metabuli v1.0.7 is faster than v1.0.6

  • Dataset

    • Query: SRR24315757_1.fastq, SRR24315757_2.fastq
      • 22,107,398 paired-end reads
      • 6,632,219,400 nt in total
    • DB: GTDB
      • Complete Genome or Chromosome level assemblies
      • CheckM completeness > 90% and contamination < 5%
      • 36,203 genomes from 8,465 species
  • Windows: ~8.3 times faster

    • Machine: Intel(R) Core(TM) i9-9900 CPU, 32GB RAM
    • --max-ram: 32
    • --threads: 8
    • v1.0.6: 825s for the first 587,593 reads (2.7% of all reads); total time not measured
    • v1.0.7: 100s for the first 587,593 reads. 1h 7m 22s in total
  • macOS: ~1.7 times faster

    • Machine: MacBook Pro 14-inch 2023, M2 Pro chip, 32GB RAM
    • --max-ram: 32
    • --threads: 8
    • v1.0.6: 71m 34s
    • v1.0.7: 42m 58s
  • Linux: ~1.3 times faster

    • Machine: A server with 64-core AMD EPYC 7742 CPU and 1 TB of RAM
    • --max-ram: 128
    • --threads: 32
      • v1.0.6: 13m 34s
      • v1.0.7: 9m 58s
    • --threads: 64
      • v1.0.6: 9m 36s
      • v1.0.7: 7m 19s
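The quoted speedups follow directly from the timings above; this snippet only converts them to seconds and divides, using no data beyond what is listed.

```python
def to_seconds(h=0, m=0, s=0):
    """Convert hours, minutes, and seconds to a total in seconds."""
    return 3600 * h + 60 * m + s


# (v1.0.6 time, v1.0.7 time) in seconds, as listed above.
runs = {
    "Windows, first 587,593 reads": (825, 100),
    "macOS": (to_seconds(m=71, s=34), to_seconds(m=42, s=58)),
    "Linux, 32 threads": (to_seconds(m=13, s=34), to_seconds(m=9, s=58)),
    "Linux, 64 threads": (to_seconds(m=9, s=36), to_seconds(m=7, s=19)),
}
speedups = {name: old / new for name, (old, new) in runs.items()}
for name, ratio in speedups.items():
    print(f"{name}: {ratio:.2f}x")
# Windows, first 587,593 reads: 8.25x
# macOS: 1.67x
# Linux, 32 threads: 1.36x
# Linux, 64 threads: 1.31x

print(f"Windows subset: {587_593 / 22_107_398:.1%} of all reads")  # Windows subset: 2.7% of all reads
```

The Linux figure of ~1.3 times summarizes both thread counts (1.36x with 32 threads, 1.31x with 64).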

Metabuli v1.0.6

02 Aug 12:15
ef9723c

Added support for Windows

Metabuli v1.0.5

18 Apr 13:21
19b33ab

The CMake file was edited to pass the Bioconda PR test.
Otherwise, it is identical to v1.0.4.

Metabuli v1.0.4

26 Mar 03:23
5bbc1fd

  • Fixed a minor reproducibility issue.
  • Fixed a performance-harming bug that occurred with sequences containing lowercase bases.
  • Added automatic adjustment of the --match-per-kmer parameter (resolves issue #20).
  • Recorded version information in db.parameter.

Metabuli v1.0.3

06 Feb 05:02
6fdf834

  • New parameter in the classify module: --tie-ratio [default: 0.95].
    When the best-matching species has score MAX, any species with score >= MAX * --tie-ratio is considered tied with the best. If a read has tied species, it is classified to their LCA.
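The tie rule can be written out directly. A minimal sketch, assuming per-species scores have already been computed; the scores, the toy taxonomy, and the function names are hypothetical, and Metabuli's actual scoring is not shown.

```python
def lineage(taxid, parent):
    """Path from `taxid` up to the root (the root is its own parent)."""
    path = [taxid]
    while parent.get(taxid, taxid) != taxid:
        taxid = parent[taxid]
        path.append(taxid)
    return path


def lca(taxa, parent):
    """Lowest common ancestor: the deepest taxon shared by all lineages."""
    common = set(lineage(taxa[0], parent))
    for t in taxa[1:]:
        common &= set(lineage(t, parent))
    return next(t for t in lineage(taxa[0], parent) if t in common)


def classify_read(species_scores, parent, tie_ratio=0.95):
    """Species with score >= MAX * tie_ratio tie with the best; ties resolve to their LCA."""
    best = max(species_scores.values())
    tied = [sp for sp, score in species_scores.items() if score >= best * tie_ratio]
    return tied[0] if len(tied) == 1 else lca(tied, parent)


# HYPOTHETICAL taxonomy (child -> parent) and species scores for one read.
parent = {1: 1, 10: 1, 20: 10, 21: 10, 30: 20, 31: 20, 40: 21}
scores = {30: 100.0, 31: 96.0, 40: 80.0}

print(classify_read(scores, parent))                  # 20 (30 and 31 tie)
print(classify_read(scores, parent, tie_ratio=0.99))  # 30 (no tie)
print(classify_read(scores, parent, tie_ratio=0.75))  # 10 (all three tie)
```

Lowering --tie-ratio widens the tie band, so more species are merged and reads are assigned to higher, less specific ranks.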