| Name/Link | Prokaryotic Portion | Viral Portion | Prophage-masked? | Taxonomy for Prokaryotic Portion | Comments |
|---|---|---|---|---|---|
| Default database | HumGut | MGV + RefSeq viral | N | NCBI | Default database (as described in our manuscript) |
| Masked version of default database | HumGut | MGV + RefSeq viral | Y | NCBI | Prophage-masked version of default database (as described in our manuscript) |
| Default database - GTDB | HumGut | MGV + RefSeq viral | N | GTDB | Default database with GTDB taxonomy for prokaryotic portion |
| UHGGv2 + MGV | UHGGv2 | MGV + RefSeq viral | N | GTDB | Default database with UHGGv2 replacing HumGut. UHGGv2 includes the low-prevalence prokaryotes that HumGut filters out |
| HumGut + UHGV "MQ+" | HumGut | UHGV ( | N | NCBI | Same as default database but replacing the viral portion with new viral genome catalog UHGV. Here we included UHGV genomes |
| HumGut + UHGV "HQ+" | HumGut | UHGV ( | N | NCBI | Same as previous line but using only |
| UHGGv2 + UHGV "MQ+" | UHGGv2 | UHGV ( | N | GTDB | UHGGv2 for prokaryotic portion; UHGV for viral portion ( |
Kraken2 database:
- hash.k2d
- taxo.k2d
- opts.k2d
- seqid2taxid.map
Bracken databases (built for use with various read lengths N):
- databaseNmers.kmer_distrib
Additional files required for pipeline to run:
- inspect.out
- taxonomy/nodes.dmp
- taxonomy/names.dmp
- library/species_genome_size.txt
For use with post-processing scripts:
- host_prediction_to_genus.tsv
- species_name_to_vir_score.txt
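
After downloading and extracting a database, it can be useful to confirm that the files listed above are present before starting a run. The following is a minimal sketch, not part of Phanta itself: the names `REQUIRED_FILES`, `POSTPROCESSING_FILES`, `bracken_file`, and `missing_files` are hypothetical helpers introduced here, and the Bracken file name is assumed to follow the `databaseNmers.kmer_distrib` pattern above with N replaced by the read length.

```python
from pathlib import Path

# Files taken from the list above, relative to the database directory.
REQUIRED_FILES = [
    "hash.k2d",
    "taxo.k2d",
    "opts.k2d",
    "seqid2taxid.map",
    "inspect.out",
    "taxonomy/nodes.dmp",
    "taxonomy/names.dmp",
    "library/species_genome_size.txt",
]

# Only needed when running the post-processing scripts.
POSTPROCESSING_FILES = [
    "host_prediction_to_genus.tsv",
    "species_name_to_vir_score.txt",
]


def bracken_file(read_length):
    """Return the assumed Bracken file name for a given read length N."""
    return f"database{read_length}mers.kmer_distrib"


def missing_files(db_dir, read_length=None, include_postprocessing=True):
    """Return the expected files that are absent from db_dir, in list order."""
    db_dir = Path(db_dir)
    expected = list(REQUIRED_FILES)
    if read_length is not None:
        expected.append(bracken_file(read_length))
    if include_postprocessing:
        expected.extend(POSTPROCESSING_FILES)
    return [rel for rel in expected if not (db_dir / rel).is_file()]
```

For example, `missing_files("/path/to/database", read_length=150)` returns an empty list when everything needed for 150 bp reads is in place, and otherwise returns the relative paths that are missing.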
Note: Phanta was developed with human gut metagenomes in mind, and its default database was built from human gut viral and bacterial genomes. If you wish to apply Phanta to metagenomes from other environments, you will probably need to supply a custom database. In that case, please open a new discussion so we can work out the best way to help or collaborate.
The complete tar.gz file should be about 20-25 GB, depending on the exact version.