2. Basic
There are two main query routes in goat-cli. See them by typing goat-cli.
goat-cli 0.2.0
Max Brown, Richard Challis, Sujai Kumar, Cibele Sotero-Caio <goat@genomehubs.org>
Genomes on a Tree. Query metadata across the tree of life.
For a tutorial on usage, visit: https://github.com/genomehubs/goat-cli/wiki
Visit the GoaT website here: https://goat.genomehubs.org/
USAGE:
goat-cli [SUBCOMMAND]
OPTIONS:
-h, --help Print help information
-V, --version Print version information
SUBCOMMANDS:
taxon Query by taxon index.
assembly Query by assembly index.
help Print this message or the help of the given subcommand(s)
goat-cli taxon is the taxon-indexed view of GoaT. It is probably the route that most people will want to take, as it contains a lot of metadata about the taxon you are searching for.
goat-cli assembly, on the other hand, is assembly-focused, as the name suggests.
goat-cli taxon currently has four subcommands. Running goat-cli taxon or goat-cli taxon help shows the output below.
goat-cli-taxon 0.2.0
Query by taxon index.
USAGE:
goat-cli taxon [SUBCOMMAND]
OPTIONS:
-h, --help Print help information
-V, --version Print version information
SUBCOMMANDS:
search Query metadata for any taxon across the tree of life by taxon index.
count Return the count of results for any taxon across the tree of life by taxon index.
lookup Return information relating to a taxon name, e.g. synonyms, authorities.
newick Generate a newick tree from input taxa.
help Print this message or the help of the given subcommand(s)
goat-cli taxon search will be the main subcommand of interest for most users. goat-cli taxon search requires only a taxon (-t, or --taxon) to return something from GoaT. The specified taxon can be at any taxonomic rank, and can be a binomial, common name, or NCBI taxon ID.
# each of these will return the same output.
goat-cli taxon search -t "Arabidopsis thaliana"
goat-cli taxon search -t 3702
goat-cli taxon search -t "Thale cress"
GoaT outputs certain variables from the search by default, as can be seen by running the above commands. GoaT contains many more variables, however, and you can use flags to choose which variables to output. For example, if you were interested in the chromosome numbers of certain taxa, you would run:
# -k for karyotype data
goat-cli taxon search -kt "Arabidopsis thaliana"
Adding a single flag for a variable returns only that variable as output; the defaults are overridden. Stack multiple flags together to get your desired output.
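As a rough sketch of stacking, the helper below asks for karyotype and genome size in one call. The function name karyotype_and_size is made up for illustration; -k, -G and -t come from the option list on this page, and writing them as -kGt assumes short flags combine the same way they do in -kt.

```shell
# Hypothetical helper (the name is not part of goat-cli).
# -k (karyotype) and -G (genome size) are stacked; -t comes last because
# it takes the taxon as its value.
karyotype_and_size() {
    goat-cli taxon search -kGt "$1"
}
```

Calling karyotype_and_size "Arabidopsis thaliana" should then return both sets of variables in place of the defaults.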
Several special flags are worth mentioning at this point. These flags don't correspond to variables, but influence the underlying data or how it is reported.
Adding -r, or --raw to any search will report the source of each value, and coerce the data into tidy format. These are direct measurements from a taxon. This flag will only work with species (or a lower taxonomic level), as there are no direct measurements for higher taxonomic ranks.
Adding -i, --include-estimates will force the search to include estimates for a taxon. Note that with multiple variables, estimates are included by default if at least one variable for that record does not have a direct estimate.
To override this behaviour, you must use the -r flag.
If -i is combined with -d (see below), the search lists all the taxa for a given node, regardless of whether there is direct data for them or not.
-d, --descendents is a powerful flag that returns results for an input taxon and all of its descendents in the tree. For example, an input taxon at the family level returns results for all subfamilies, genera, species, subspecies, and so on.
If you want all the descendents at a given rank, use the --tax-rank flag (see below).
-R, or --ranks will expand a TSV to give all the taxonomic ranks down to and including the rank specified as separate columns. For example, --ranks species returns each taxonomic rank from superkingdom to species as separate columns.
-T, --tidy (notice caps) is a flag that returns tidy data.
The -u, --url flag returns the underlying URL(s) requested by goat-cli. It is mainly used for debugging URLs.
The -U, --goat-ui-url flag returns a URL which should correspond to a GoaT website search. Paste it into the browser and take a look.
The --tax-rank flag returns the result at the level of the taxonomic rank you supply. For example, maybe you want to find all tribes of flowering plants (there are 849 currently):
goat-cli taxon search -dit Magnoliopsida --tax-rank tribe --size 849 | awk '{print $3}'
If you think the command submitted will take a long time, add the --progress-bar flag.
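Putting the flags above together, a descendents query at a fixed rank might be wrapped like this. The function name karyotypes_at_rank and its argument order are assumptions made for this sketch; every flag it passes is listed in the options on this page.

```shell
# Hypothetical helper: karyotype data for all descendents of a taxon,
# reported at one taxonomic rank.
#   $1 = taxon, $2 = rank (e.g. genus), $3 = number of results (default 50)
karyotypes_at_rank() {
    goat-cli taxon search -dkt "$1" --tax-rank "$2" --size "${3:-50}" --progress-bar
}
```

For example, karyotypes_at_rank Fabaceae genus would request up to 50 genera in Fabaceae; pass a third argument to raise the limit.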
Here is the full list of options.
goat-cli-taxon-search 0.2.0
Query metadata for any taxon across the tree of life by taxon index.
USAGE:
goat-cli taxon search [OPTIONS]
OPTIONS:
-t, --taxon <taxon> The taxon to search. An NCBI taxon ID, or the name of a taxon
at any rank.
-f, --file <file> A file of NCBI taxonomy ID's (tips) and/or binomial names.
Each line should contain a single entry.
File size is limited to 500 entries.
-v, --variables <variables> Variable parser. Input a comma separated string of variables.
--size <size> The number of results to return. Max 50,000 currently.
[default: 50]
-R, --ranks <ranks> Choose a rank to display with the results. All ranks up to the
given rank are displayed. [default: none] [possible values:
none, subspecies, species, genus, family, order, class, phylum,
kingdom, superkingdom]
-e, --expression <expression> Use an expression to filter results server-side.
--tax-rank <tax-rank> The taxonomic rank to return the results at.
-a, --assembly Print assembly data (assembly span, assembly level)
-b, --busco Print BUSCO estimates.
-g, --gc-percent Print GC%.
-k, --karyotype Print karyotype data (chromosome number & haploid number).
-G, --genome-size Print genome size data.
-B, --bioproject Print the bioproject and biosample ID of records.
-N, --n50 Print the contig & scaffold n50 of assemblies.
-D, --date Print EBP & assembly dates.
--gene-count Print gene count data.
-m, --mitochondria Print mitochondrial genome assembly size & GC%.
-p, --plastid Print plastid genome assembly size & GC%.
-S, --sex-determination Print sex determination data.
-P, --ploidy Print ploidy estimates.
-c, --c-values Print c-value data.
--legislation Print legislation data.
-l, --lineage Displays lineage information. I.e. from this node in the tree
go back and give all the nodes to the root. Conflicts with
descendents.
--target-lists Print target list data associated with each taxon.
-C, --country-list Print list of countries where taxon is found.
--status Print all data associated with how far this taxon has
progressed with genomic sequencing.
This includes sample collection, acquisition, progress in
sequencing, and whether submitted to INSDC.
-n, --names Print all associated name data (synonyms, Tree of Life ID, and
common names).
-r, --raw Print raw values (i.e. no aggregation/summary).
-d, --descendents Get information for all descendents of a common ancestor.
-T, --tidy Print data in tidy format.
-i, --include-estimates Include ancestral estimates. Omitting this flag includes only
direct estimates from a taxon. Cannot be used with --raw.
--print-expression Print all variables in GoaT currently, with their associated
variants.
Useful for construction of expressions.
--progress-bar Add a progress bar to large queries, to estimate time left.
-u, --url Print the underlying GoaT API URL(s). Useful for debugging.
-U, --goat-ui-url Print the underlying GoaT UI URL(s). View on the browser!
-h, --help Print help information
-V, --version Print version information
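The -f option above is limited to 500 entries per file. One possible way around that for longer lists is to split the list and query each piece in turn. This is only a sketch: the function name search_in_chunks is hypothetical, and it assumes -f can be used on its own in place of -t.

```shell
# Hypothetical helper: run a taxon list of any length through
# `goat-cli taxon search -f`, at most 500 lines per call.
# The body is in ( ) so it runs in a subshell and its variables stay local.
search_in_chunks() (
    list="$1"
    workdir=$(mktemp -d) || exit 1
    # split writes chunk_aa, chunk_ab, ... with up to 500 lines each
    split -l 500 "$list" "$workdir/chunk_"
    for chunk in "$workdir"/chunk_*; do
        [ -e "$chunk" ] || continue   # empty list: nothing to query
        goat-cli taxon search -f "$chunk"
    done
    rm -r "$workdir"
)
```

Each call most likely prints its own header row, so the combined output may need the repeated headers removed before further processing.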
There is not much to say about goat-cli taxon count other than that it returns the number of hits for a search query. Any valid goat-cli taxon search command will be a valid goat-cli taxon count command.
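Since count accepts the same arguments as search, the tribe search shown earlier can be turned into a count by swapping the subcommand. The wrapper below is a minimal sketch; the name count_at_rank is made up for illustration.

```shell
# Hypothetical helper: count the descendents of a taxon at a given rank,
# including taxa that only have estimated values (-i).
#   $1 = taxon, $2 = rank
count_at_rank() {
    goat-cli taxon count -dit "$1" --tax-rank "$2"
}
```

Running count_at_rank Magnoliopsida tribe before the matching search is one way to decide what to pass to --size.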
goat-cli taxon lookup is very simple currently. It only requires a taxon NCBI ID, taxon name, or common name. It returns an NCBI tax-id, along with synonyms, authorities, and common names.
goat-cli taxon lookup will perform a basic spell check if no results are found. For example:
goat-cli taxon lookup -t Pterophorsa -> Did you mean: Pterophorus, Petrophora, Pterophora?
A user can query multiple taxon IDs at the same time in a comma separated list. For example:
goat-cli taxon lookup -t "38942, 2405"
goat-cli-taxon-lookup 0.2.0
Return information relating to a taxon name, e.g. synonyms, authorities.
USAGE:
goat-cli taxon lookup [OPTIONS]
OPTIONS:
-t, --taxon <taxon> The taxon to search. An NCBI taxon ID, or the name of a taxon at any
rank.
-f, --file <file> A file of NCBI taxonomy ID's (tips) and/or binomial names.
Each line should contain a single entry.
File size is limited to 500 entries.
-u, --url Print lookup URL.
-s, --size <size> The number of results to return. [default: 10]
-h, --help Print help information
-V, --version Print version information
GoaT can also return trees. Given a clade, or a string of clades, a newick tree (cladogram) will be returned.
goat-cli-taxon-newick 0.2.0
Generate a newick tree from input taxa.
USAGE:
goat-cli taxon newick [OPTIONS]
OPTIONS:
-t, --taxon <taxon> The taxon to return a newick of. Multiple taxa will return the joint
tree.
-u, --url Print lookup URL.
-r, --rank <rank> The number of results to return. [default: species] [possible values:
species, genus, family, order]
--progress-bar Add a progress bar to large queries, to estimate time left.
-h, --help Print help information
-V, --version Print version information
E.g. get a cladogram of the genera in the family Fabaceae:
goat-cli taxon newick -t "Fabaceae" -r genus
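To keep a tree for use in another program, the output can be redirected to a file. The sketch below assumes the Newick string is written to standard output; the function name save_newick is hypothetical.

```shell
# Hypothetical helper: write a cladogram to a file.
#   $1 = taxon, $2 = rank (species, genus, family or order), $3 = output file
save_newick() {
    goat-cli taxon newick -t "$1" -r "$2" > "$3"
}
```

For example, save_newick Fabaceae genus fabaceae_genera.nwk would store the tree from the command above in fabaceae_genera.nwk.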
This section is in progress. Please check back soon!