Further details about the construction of new phanta databases #53

@efratmuller

Dear @yipinto and team!

I was wondering whether you could share some more details about how you constructed the UHGGv2 + UHGV "MQ+" database that you have kindly provided.

Specifically, I had 3 questions:

(1) The viral portion, based on UHGV, seems to include viruses that do not meet the "MQ" (medium-quality) criteria as defined on the UHGV GitHub. For example, vOTU-085841 is included in the phanta database but has an "uncertain" "viral-confidence" value (as reported in UHGV's metadata) and therefore does not meet their "MQ" criteria. As another example, vOTU-018648 is only 49% complete (and therefore not "MQ"), yet it appears in the database.

(2) In the UHGGv2 portion, it seems that not all genomes were included (>280K genomes are listed in MGnify-UHGGv2). Could you please explain which genomes were included and how they were defined as strains/species in the db?

(3) Where were the non-bacterial/archaeal/viral genomes sourced from?

Many many thanks in advance!
Efrat
