mag branch 'minigut' is the same sample as modules branch 'bacteroides_fragilis' #2000

@werner291

Hello, people of nf-core. While working on my project to rewrite MaxBin2, I encountered a bit of an odd issue.

I was benchmarking a gene caller on two test datasets: the B. fragilis data from the modules branch, and the "minigut" data from the mag branch. I got identical results on both and dug into why.

Turns out the reads are the same file: test_data/test_minigut_R1.fastq.gz on mag and data/genomics/prokaryotes/bacteroides_fragilis/illumina/fastq/test1_1.fastq.gz on modules have the same SHA-256 (sha256-qQ9NSKMVSIbnKlIhuN1rEVnqCQ5dviBjTpZbTugBFqs=). The contigs are different files but contain the same 272 sequences in a different order.
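
For anyone who wants to reproduce the comparison, here is a minimal Python sketch of one way to do it. It is not necessarily how the check above was actually run. The function names (`sri_sha256`, `fasta_sequences`, `same_contigs`) are made up for illustration, the hash is computed over the compressed file bytes as stored, and the contig comparison assumes that "same sequences" means identical sequence strings regardless of header text, case, or record order.

```python
import base64
import gzip
import hashlib
from collections import Counter


def sri_sha256(path):
    """Return the SHA-256 of a file as 'sha256-<base64 digest>' (hypothetical helper)."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        # Hash the raw bytes on disk in 1 MiB chunks (no decompression).
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            h.update(chunk)
    return "sha256-" + base64.b64encode(h.digest()).decode("ascii")


def fasta_sequences(path):
    """Return a Counter of sequences in a FASTA file, ignoring headers and order."""
    opener = gzip.open if str(path).endswith(".gz") else open
    seqs = Counter()
    current = []
    have_record = False
    with opener(path, "rt") as fh:
        for line in fh:
            line = line.strip()
            if line.startswith(">"):
                # A new header closes the previous record, if there was one.
                if have_record:
                    seqs["".join(current).upper()] += 1
                current = []
                have_record = True
            elif line and have_record:
                current.append(line)
    if have_record:
        seqs["".join(current).upper()] += 1
    return seqs


def same_contigs(path_a, path_b):
    """True if both FASTA files hold the same multiset of sequences."""
    return fasta_sequences(path_a) == fasta_sequences(path_b)
```

Calling `sri_sha256` on both read files and comparing the strings covers the first check. `same_contigs` covers the second and, because it uses a `Counter`, it would also notice a sequence duplicated in only one of the files.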

Looking at the git history, the reads were first committed to mag in 2018 (2c50b6d5, "test gut bact"), then the same file showed up on modules under bacteroides_fragilis/ in 2021 (b2197bc5). The contigs were re-assembled and added to mag in 2023 (95f022cc).

The naming is what tripped me up — "minigut" sounds like a multi-species gut community, not a single-organism B. fragilis dataset. There's nothing on either branch that connects the two. Maybe worth a note in the mag branch README?
