Understanding Metabuli scores on real metagenomic datasets #171

@taorong007

Description

Hi Metabuli team,

I'm evaluating Metabuli on real metagenomic samples and noticed significantly
lower scores than on synthetic benchmarks. I'm trying to understand whether
this is expected and how to optimize for real-world data.

Score comparison across datasets:

| Dataset     | MMseqs2 confidence | Metabuli score | Sample type          |
|-------------|--------------------|----------------|----------------------|
| CAMI-Marine | 0.93               | 0.81           | Synthetic            |
| IBS         | 0.89               | 0.45           | Real (gut)           |
| Freshwater  | 0.60               | 0.05           | Real (environmental) |

My concerns:

  1. The dramatic score drop on real data makes it difficult to set confidence
    thresholds for filtering
  2. Many assignments that look taxonomically reasonable have very low scores
  3. It's unclear whether low scores indicate (one way I could probe this is
    sketched after this list):
    • True uncertainty (novel/divergent organisms)
    • Database coverage issues
    • Algorithm behavior on fragmented/noisy real data
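
To try to tease these cases apart, a minimal sketch like the one below could
help: it summarizes scores grouped by the rank of each assignment. It assumes
the tab-separated per-read classification output with the classified flag in
column 1, the score in column 5, and the rank in column 6 (adjust the indices
if your Metabuli version writes a different layout), and
`freshwater_classifications.tsv` is a placeholder filename:

```python
import csv
from collections import defaultdict
from statistics import mean, median

# Summarize per-read scores grouped by the rank of the assignment.
# Assumed tab-separated columns (no header; adjust if your output differs):
#   0: classified flag (0/1), 1: read ID, 2: taxID,
#   3: read length, 4: score, 5: rank
scores_by_rank = defaultdict(list)
with open("freshwater_classifications.tsv") as f:  # placeholder filename
    for row in csv.reader(f, delimiter="\t"):
        if row[0] == "1":  # classified reads only
            scores_by_rank[row[5]].append(float(row[4]))

for rank, scores in sorted(scores_by_rank.items()):
    print(f"{rank:>12}  n={len(scores):>7}  "
          f"mean={mean(scores):.3f}  median={median(scores):.3f}")
```

My thinking: if the low scores concentrate in reads assigned at genus level or
above, that would point toward database coverage gaps or divergent organisms
rather than read-level noise, but I'd appreciate confirmation.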

Questions:

  1. Is this score pattern expected for real metagenomic data?
  2. What factors most influence Metabuli's scoring on real vs. synthetic data?
  3. Are there parameters I should adjust for real samples? (e.g., --min-score,
    --min-sp-score)
  4. How do you recommend filtering classifications from real data: by score
    threshold or by other metrics? (A simple thresholding sketch follows this
    list.)
  5. Would using a more recent GTDB version significantly improve scores on
    real samples?
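
For question 4, the naive approach would be a hard post-hoc score cutoff, as
in the minimal sketch below (same assumed column layout and placeholder
filename as above; the 0.15 cutoff is an arbitrary placeholder, and it is
exactly the value I don't know how to choose for real data):

```python
import csv

SCORE_CUTOFF = 0.15  # arbitrary placeholder, not a recommended value

# Keep classified reads whose score meets the cutoff; write the rest to a
# separate file so they can be re-examined rather than silently dropped.
with open("freshwater_classifications.tsv") as fin, \
     open("kept.tsv", "w", newline="") as fkept, \
     open("dropped.tsv", "w", newline="") as fdrop:
    kept = csv.writer(fkept, delimiter="\t")
    dropped = csv.writer(fdrop, delimiter="\t")
    for row in csv.reader(fin, delimiter="\t"):
        if row[0] == "1" and float(row[4]) >= SCORE_CUTOFF:
            kept.writerow(row)
        else:
            dropped.writerow(row)
```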

My setup:

  • Database: GTDB r214
  • Metabuli version: 1.1.0

I've attached example outputs showing the score distribution. Any guidance
on interpreting and optimizing for real data would be greatly appreciated!

Thanks!
Metabuli on freshwater-ERR4195020: [attached screenshot of score distribution]
MMseqs2 on freshwater-ERR4195020: [attached screenshot of confidence distribution]
