Semi-automated curation of GTDB reference trees for new releases.
This tool was developed for internal use and is not supported for public use.
In a virtual environment, install the following Python libraries:
python -m pip install dendropy tqdm typer
Then, clone the repository into a directory.
git clone https://github.com/Ecogenomics/curatomatic.git
Within the cloned repository, run the following command to view the CLI help:
python -m curatomatic --help
The following arguments are required:
python -m curatomatic [OPTIONS] TREE RED_DICT OUT_DIR
- TREE = Path to the RED decorated and scaled tree (generated by Phylorank).
  - e.g. gtdb_r232_bac120.decorated.red_decorated.tree
- RED_DICT = Path to the RED dictionary (i.e. the JSON that contains thresholds).
  - e.g. gtdb_r232_bac120.decorated.dict
- OUT_DIR = Output directory to write to.
Note: The TREE is the pre-curation tree and must contain zombies (e.g. D-G123456789).
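As a quick pre-flight check for the note above, the sketch below scans Newick text for zombie-style labels. It is not part of curatomatic: the pattern (`D-G` followed by nine digits) is an assumption inferred only from the single example `D-G123456789`, and the helper name `find_zombies` and the sample tree string are hypothetical.

```python
import re

# ASSUMPTION: zombie labels look like "D-G" followed by nine digits,
# inferred only from the example D-G123456789 in this README.
ZOMBIE_RE = re.compile(r"\bD-G\d{9}\b")


def find_zombies(newick_text):
    """Return the sorted, de-duplicated zombie-style labels found in Newick text."""
    return sorted(set(ZOMBIE_RE.findall(newick_text)))


# Hypothetical miniature tree, for illustration only.
newick = "((D-G123456789:0.1,G000000002:0.2)'100.0:g__Example':0.3,G000000003:0.4);"
zombies = find_zombies(newick)
print(zombies)
```

An empty result would suggest the file is a post-curation tree, or that the real zombie naming differs from the pattern assumed here.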
The following arguments are optional:
- --log [debug|info|warning|error|critical] = Output logging messages at or above this level.
- --min-bs FLOAT = Minimum bootstrap value to consider for decoration (default = 95.0).
- --meta PATH = File containing unique identifiers for novel rank creation (must be supplied for curating new releases).
The --meta file should be a TSV file with a header row containing the columns accession, ncbi_wgs_formatted (optional), and gtdb_taxonomy (optional).
See example/meta.tsv. Additional columns can be present but will be ignored.
This file should come from the GTDB pre-release database and be updated with species clustering information,
e.g. /srv/db/gtdb/metadata/release232/representatives/sp_cluster_update/gtdb_r226_metadata.updated_reps.tsv
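The --meta layout described above can be sanity-checked before a run with a small reader such as the one below. This is a minimal sketch, not curatomatic's own parser: it assumes accession is the only mandatory column (the other two are marked optional), and the function name `read_meta` and the sample row are hypothetical.

```python
import csv
import io

# Column names taken from the README; treating accession as the only
# mandatory column is an ASSUMPTION based on the "(optional)" markers.
REQUIRED = ("accession",)
OPTIONAL = ("ncbi_wgs_formatted", "gtdb_taxonomy")


def read_meta(handle):
    """Read a --meta style TSV, keeping known columns and ignoring the rest."""
    reader = csv.DictReader(handle, delimiter="\t")
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED if col not in header]
    if missing:
        raise ValueError(f"missing required column(s): {missing}")
    keep = [col for col in REQUIRED + OPTIONAL if col in header]
    return [{col: row[col] for col in keep} for row in reader]


# Hypothetical one-row file with an extra column that is dropped.
sample = "accession\tgtdb_taxonomy\textra\nG000000001\td__Archaea\tignored\n"
rows = read_meta(io.StringIO(sample))
print(rows)
```

For a real file, pass an open handle instead of the in-memory sample, e.g. `open(path, newline="")`.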
The example can be run using the following command:
python -m curatomatic example/gtdb_r226_ar53.decorated.scaled.red.tree example/gtdb_r226_ar53.decorated.dict /tmp/output