Understanding Metabuli scores on real metagenomic datasets #171

@taorong007

Description

Hi Metabuli team,

I'm evaluating Metabuli on real metagenomic samples and noticed significantly
lower scores than on synthetic benchmarks. I'm trying to understand whether
this is expected and how to optimize for real-world data.

Score comparison across datasets:

| Dataset     | MMseqs2 confidence | Metabuli score | Sample type          |
|-------------|--------------------|----------------|----------------------|
| CAMI-Marine | 0.93               | 0.81           | Synthetic            |
| IBS         | 0.89               | 0.45           | Real (gut)           |
| Freshwater  | 0.60               | 0.05           | Real (environmental) |

My concerns:

  1. The dramatic score drop on real data makes it difficult to set confidence
    thresholds for filtering
  2. Many assignments that look taxonomically reasonable have very low scores
  3. It's unclear whether low scores indicate (one way I could probe this is
    sketched after this list):
    • True uncertainty (novel/divergent organisms)
    • Database coverage issues
    • Algorithm behavior on fragmented/noisy real data
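
To try to tease these cases apart, a minimal sketch like the one below could
help: it summarizes scores grouped by the rank of each assignment. It assumes
the tab-separated per-read classification output with the classified flag in
column 1, the score in column 5, and the rank in column 6 (adjust the indices
if your Metabuli version writes a different layout), and
`freshwater_classifications.tsv` is a placeholder filename:

```python
import csv
from collections import defaultdict
from statistics import mean, median

# Summarize per-read scores grouped by the rank of the assignment.
# Assumed tab-separated columns (no header; adjust if your output differs):
#   0: classified flag (0/1), 1: read ID, 2: taxID,
#   3: read length, 4: score, 5: rank
scores_by_rank = defaultdict(list)
with open("freshwater_classifications.tsv") as f:  # placeholder filename
    for row in csv.reader(f, delimiter="\t"):
        if row[0] == "1":  # classified reads only
            scores_by_rank[row[5]].append(float(row[4]))

for rank, scores in sorted(scores_by_rank.items()):
    print(f"{rank:>12}  n={len(scores):>7}  "
          f"mean={mean(scores):.3f}  median={median(scores):.3f}")
```

My thinking: if the low scores concentrate in reads assigned at genus level or
above, that would point toward database coverage gaps or divergent organisms
rather than read-level noise, but I'd appreciate confirmation.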

Questions:

  1. Is this score pattern expected for real metagenomic data?
  2. What factors most influence Metabuli's scoring on real vs. synthetic data?
  3. Are there parameters I should adjust for real samples? (e.g., --min-score,
    --min-sp-score)
  4. How do you recommend filtering classifications from real data: by score
    threshold or by other metrics? (A simple thresholding sketch follows this
    list.)
  5. Would using a more recent GTDB version significantly improve scores on
    real samples?
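
For question 4, the naive approach would be a hard post-hoc score cutoff, as
in the minimal sketch below (same assumed column layout and placeholder
filename as above; the 0.15 cutoff is an arbitrary placeholder, and it is
exactly the value I don't know how to choose for real data):

```python
import csv

SCORE_CUTOFF = 0.15  # arbitrary placeholder, not a recommended value

# Keep classified reads whose score meets the cutoff; write the rest to a
# separate file so they can be re-examined rather than silently dropped.
with open("freshwater_classifications.tsv") as fin, \
     open("kept.tsv", "w", newline="") as fkept, \
     open("dropped.tsv", "w", newline="") as fdrop:
    kept = csv.writer(fkept, delimiter="\t")
    dropped = csv.writer(fdrop, delimiter="\t")
    for row in csv.reader(fin, delimiter="\t"):
        if row[0] == "1" and float(row[4]) >= SCORE_CUTOFF:
            kept.writerow(row)
        else:
            dropped.writerow(row)
```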

My setup:

  • Database: GTDB r214
  • Metabuli version: 1.1.0

I've attached example outputs showing the score distribution. Any guidance
on interpreting and optimizing for real data would be greatly appreciated!

Thanks!
Metabuli on freshwater-ERR4195020: [attached screenshot of score distribution]
MMseqs2 on freshwater-ERR4195020: [attached screenshot of confidence distribution]
