
curatomatic

Semi-automated curation of GTDB reference trees for new releases.

This tool was developed for internal use; it is not supported for public use.

Installation

In a virtual environment, install the following Python libraries:

python -m pip install dendropy tqdm typer

Then, clone the repository into a directory.

git clone https://github.com/Ecogenomics/curatomatic.git

Running

Help

Within the cloned repository, run the following command to view the CLI help:

python -m curatomatic --help

Arguments

The following arguments are required:

python -m curatomatic [OPTIONS] TREE RED_DICT OUT_DIR
  • TREE = Path to the RED decorated and scaled tree (generated by Phylorank).
    • e.g. gtdb_r232_bac120.decorated.red_decorated.tree
  • RED_DICT = Path to the RED dictionary (i.e. the JSON that contains thresholds).
    • e.g. gtdb_r232_bac120.decorated.dict
  • OUT_DIR = Output directory to write to.

Note: The TREE is the pre-curation tree and must contain zombies (e.g. D-G123456789).
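As a quick sanity check before running, the note above can be sketched as a scan of the tree text for zombie-like labels. This is not part of curatomatic. The pattern is an assumption based solely on the example label D-G123456789 ("D-G" followed by nine digits), and the function name and sample accessions are hypothetical.

```python
import re

# Assumption: zombie labels look like the README's example, i.e. "D-G"
# followed by nine digits. Adjust the pattern if your labels differ.
ZOMBIE_PATTERN = re.compile(r"D-G\d{9}")


def find_zombies(newick_text: str) -> list[str]:
    """Return the sorted, de-duplicated zombie-like labels found in the text."""
    return sorted(set(ZOMBIE_PATTERN.findall(newick_text)))


if __name__ == "__main__":
    # Hypothetical Newick fragment, used only for illustration.
    sample = "((D-G123456789:0.1,G000000001:0.2):0.3,D-G987654321:0.4);"
    print(find_zombies(sample))
```

An empty result for a real TREE file would suggest that the tree has already been curated, or that the assumed label pattern does not match your data.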

The following arguments are optional:

  • --log [debug|info|warning|error|critical] = Output logging messages at or above this level.
  • --min-bs FLOAT = Minimum bootstrap value to consider for decoration (default = 95.0).
  • --meta PATH = File containing unique identifiers for novel rank creation (must be supplied for curating new releases).

Metadata file (--meta):

The --meta file should be a TSV with a header row containing the columns accession, ncbi_wgs_formatted (optional), and gtdb_taxonomy (optional). Additional columns may be present but are ignored. See example/meta.tsv.

This file should come from the GTDB pre-release database and be updated with species clustering information, e.g. /srv/db/gtdb/metadata/release232/representatives/sp_cluster_update/gtdb_r226_metadata.updated_reps.tsv
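The header layout described above can be checked before a run. This is a minimal sketch, not curatomatic's own validation: it assumes accession is the only mandatory column (the other two are marked optional above), and the function name and demo header are hypothetical.

```python
# Column names taken from the description above. Treating "accession" as the
# only required column is an assumption.
REQUIRED = {"accession"}
OPTIONAL = {"ncbi_wgs_formatted", "gtdb_taxonomy"}


def classify_meta_header(header_line: str) -> dict[str, list[str]]:
    """Split a TSV header line and sort its columns into categories."""
    columns = set(header_line.rstrip("\r\n").split("\t"))
    missing = REQUIRED - columns
    if missing:
        raise ValueError(f"missing required column(s): {sorted(missing)}")
    return {
        "optional_present": sorted(OPTIONAL & columns),
        # Extra columns are allowed; the README states they are ignored.
        "ignored": sorted(columns - REQUIRED - OPTIONAL),
    }


if __name__ == "__main__":
    # Hypothetical header line, used only for illustration.
    print(classify_meta_header("accession\tgtdb_taxonomy\tgenome_size\n"))
```

To check a real file, pass its first line, for example the result of readline() on the opened TSV.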

Example

The example can be run using the following command:

python -m curatomatic example/gtdb_r226_ar53.decorated.scaled.red.tree example/gtdb_r226_ar53.decorated.dict /tmp/output
