Dear @yipinto and team!
I was wondering whether you could share some more details about how you constructed the "UHGGv2 + UHGV MQ+" database you have kindly provided.
Specifically, I have three questions:
(1) The viral portion based on UHGV seems to include viruses that do not meet the "MQ" (medium-quality) criteria as defined on the UHGV GitHub. For example, vOTU-085841 is included in the Phanta database but has a "viral-confidence" of "uncertain" (as reported in UHGV's metadata) and therefore does not meet the "MQ" criteria. As another example, vOTU-018648 is only 49% complete (and therefore not "MQ"), yet I see it in the database.
(2) In the UHGGv2 portion, it seems that not all genomes were included (more than 280K genomes are listed in MGnify-UHGGv2). Could you please explain which genomes were included and how they were defined as strains/species in the database?
(3) Where were the genomes that are not bacterial, archaeal, or viral sourced from?
Many many thanks in advance!
Efrat