
2. Basic

Max Brown edited this page Jul 1, 2022 · 4 revisions

Overview

There are two main query routes in goat-cli. See them by typing goat-cli.

goat-cli 0.2.0
Max Brown, Richard Challis, Sujai Kumar, Cibele Sotero-Caio <goat@genomehubs.org>
Genomes on a Tree. Query metadata across the tree of life.

For a tutorial on usage, visit: https://github.com/genomehubs/goat-cli/wiki
Visit the GoaT website here: https://goat.genomehubs.org/

USAGE:
    goat-cli [SUBCOMMAND]

OPTIONS:
    -h, --help       Print help information
    -V, --version    Print version information

SUBCOMMANDS:
    taxon       Query by taxon index.
    assembly    Query by assembly index.
    help        Print this message or the help of the given subcommand(s)

goat-cli taxon is the taxon indexed view of GoaT. It is probably the route that most people will want to take, as it contains much metadata about the taxon you are searching for.

goat-cli assembly on the other hand is assembly focused, as the name suggests.

Query GoaT by taxon index

goat-cli taxon currently has four subcommands. Running goat-cli taxon or goat-cli taxon help shows the output below.

goat-cli-taxon 0.2.0
Query by taxon index.

USAGE:
    goat-cli taxon [SUBCOMMAND]

OPTIONS:
    -h, --help       Print help information
    -V, --version    Print version information

SUBCOMMANDS:
    search    Query metadata for any taxon across the tree of life by taxon index.
    count     Return the count of results for any taxon across the tree of life by taxon index.
    lookup    Return information relating to a taxon name, e.g. synonyms, authorities.
    newick    Generate a newick tree from input taxa.
    help      Print this message or the help of the given subcommand(s)

goat-cli taxon search

Basic search

goat-cli taxon search will be the main subcommand of interest for most users. goat-cli taxon search requires only a taxon (-t, or --taxon) to return something from GoaT. The specified taxon can be at any taxonomic rank, and can be a binomial, common name, or NCBI taxon ID.

# each of these will return the same output.
goat-cli taxon search -t "Arabidopsis thaliana"
goat-cli taxon search -t 3702
goat-cli taxon search -t "Thale cress"

GoaT outputs certain variables from the search by default, as can be seen by running the above commands. GoaT contains many more variables, however, and by using flags you can determine which variables to output. As an example, if you were interested in the chromosome numbers of certain taxa, you would run:

# -k for karyotype data
goat-cli taxon search -kt "Arabidopsis thaliana"

Adding a single flag for a variable returns only that variable as output - the defaults are overridden. Stack multiple together to get your desired output.
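
As a rough sketch of stacking, the following asks for karyotype and genome size data together. Both flags appear in the option list further down this page; combining them as -kG assumes short flags can be merged in the same way as -kt above. The variable names are only illustrative, and the call is skipped when goat-cli is not on the PATH.

```shell
# Taxon to query (illustrative variable name).
taxon="Arabidopsis thaliana"

# -k (karyotype) and -G (genome size) stacked into one argument.
# Assumption: short flags merge as in the -kt example above.
var_flags="-kG"

if command -v goat-cli >/dev/null 2>&1; then
    goat-cli taxon search "$var_flags" -t "$taxon"
else
    # goat-cli is not installed, so only show the command.
    echo "would run: goat-cli taxon search $var_flags -t \"$taxon\""
fi
```

Writing the flags separately (-k -G) should be equivalent if merging is not wanted.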

Special flags

Several special flags are worth mentioning at this point. These flags don't correspond to variables but influence the underlying data, or how it is reported.

Raw flag

Adding -r, or --raw to any search will report the source of each value, and coerce the data into tidy format. These are direct measurements from a taxon. This flag will only work with species (or a lower taxonomic level), as there are no direct measurements for higher taxonomic ranks.

Include estimates flag

Adding -i, --include-estimates will force the search to include estimates for a taxon. Note that with multiple variables, estimates are included by default if at least one variable for that record does not have a direct estimate.

To override this behaviour you must use the -r flag.

If combined with -d (see below), this will list all the taxa for a given node, regardless of whether there is direct data for them or not.

Descendents flag

-d, --descendents is a really powerful flag that returns results for an input taxon and all of its descendents in the tree. For example, an input taxon at the family level will return results for all subfamilies, genera, species, subspecies, etc.

If you want all the descendents at a given rank, use the --tax-rank flag (see below).
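
Descendent searches can return far more rows than the default of 50, so they are usually paired with --size. A minimal sketch, assuming the 50,000 maximum quoted in the --size help text further down; the family, the requested size, and the variable names are just examples.

```shell
taxon="Fabaceae"
size=200          # number of results requested (example value)
max_size=50000    # maximum quoted in the --size help text

# Clamp the request to the documented maximum.
if [ "$size" -gt "$max_size" ]; then
    size=$max_size
fi

# Karyotype data for the family and everything below it.
if command -v goat-cli >/dev/null 2>&1; then
    goat-cli taxon search -d -k -t "$taxon" --size "$size"
fi
```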

Ranks flag

-R, or --ranks, will expand the TSV output to give all the taxonomic ranks down to and including the rank specified, as separate columns. For example, --ranks species will return each taxonomic rank from superkingdom down to species as separate columns.
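
A small sketch of using --ranks from a script. The list of accepted values is copied from the help text further down this page; the check and the variable names are illustrative and not something goat-cli requires.

```shell
rank="genus"

# Accepted values, as listed in the --ranks help text.
case "$rank" in
    none|subspecies|species|genus|family|order|class|phylum|kingdom|superkingdom)
        rank_ok=yes ;;
    *)
        rank_ok=no ;;
esac

# Only query when the rank is valid and goat-cli is installed.
if [ "$rank_ok" = yes ] && command -v goat-cli >/dev/null 2>&1; then
    goat-cli taxon search -k -t "Arabidopsis thaliana" --ranks "$rank"
fi
```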

Tidy format

-T, --tidy (notice caps) is a flag that returns tidy data.

URL format

The -u, --url flag returns the underlying URL(s) that goat-cli requests. It is mainly used for debugging.

The -U, --goat-ui-url flag returns a URL which should correspond to a GoaT website search. Paste it into the browser and take a look.

Tax-rank flag

The --tax-rank flag returns the result at the level of the taxonomic rank you supply. For example, maybe you want to find all tribes of flowering plants:

# there are currently 849 tribes, hence --size 849
goat-cli taxon search -dit Magnoliopsida --tax-rank tribe --size 849 | awk '{print $3}'

Progress bar flag

If you think the command submitted will take a long time, add the --progress-bar flag.

Current man page

Here is the full list of options.

goat-cli-taxon-search 0.2.0
Query metadata for any taxon across the tree of life by taxon index.

USAGE:
    goat-cli taxon search [OPTIONS]

OPTIONS:
    -t, --taxon <taxon>              The taxon to search. An NCBI taxon ID, or the name of a taxon
                                     at any rank.
    -f, --file <file>                A file of NCBI taxonomy ID's (tips) and/or binomial names.
                                     Each line should contain a single entry.
                                     File size is limited to 500 entries.
    -v, --variables <variables>      Variable parser. Input a comma separated string of variables.
        --size <size>                The number of results to return. Max 50,000 currently.
                                     [default: 50]
    -R, --ranks <ranks>              Choose a rank to display with the results. All ranks up to the
                                     given rank are displayed. [default: none] [possible values:
                                     none, subspecies, species, genus, family, order, class, phylum,
                                     kingdom, superkingdom]
    -e, --expression <expression>    Use an expression to filter results server-side.
        --tax-rank <tax-rank>        The taxonomic rank to return the results at.
    -a, --assembly                   Print assembly data (assembly span, assembly level)
    -b, --busco                      Print BUSCO estimates.
    -g, --gc-percent                 Print GC%.
    -k, --karyotype                  Print karyotype data (chromosome number & haploid number).
    -G, --genome-size                Print genome size data.
    -B, --bioproject                 Print the bioproject and biosample ID of records.
    -N, --n50                        Print the contig & scaffold n50 of assemblies.
    -D, --date                       Print EBP & assembly dates.
        --gene-count                 Print gene count data.
    -m, --mitochondria               Print mitochondrial genome assembly size & GC%.
    -p, --plastid                    Print plastid genome assembly size & GC%.
    -S, --sex-determination          Print sex determination data.
    -P, --ploidy                     Print ploidy estimates.
    -c, --c-values                   Print c-value data.
        --legislation                Print legislation data.
    -l, --lineage                    Displays lineage information. I.e. from this node in the tree
                                     go back and give all the nodes to the root. Conflicts with
                                     descendents.
        --target-lists               Print target list data associated with each taxon.
    -C, --country-list               Print list of countries where taxon is found.
        --status                     Print all data associated with how far this taxon has
                                     progressed with genomic sequencing.
                                     This includes sample collection, acquisition, progress in
                                     sequencing, and whether submitted to INSDC.
    -n, --names                      Print all associated name data (synonyms, Tree of Life ID, and
                                     common names).
    -r, --raw                        Print raw values (i.e. no aggregation/summary).
    -d, --descendents                Get information for all descendents of a common ancestor.
    -T, --tidy                       Print data in tidy format.
    -i, --include-estimates          Include ancestral estimates. Omitting this flag includes only
                                     direct estimates from a taxon. Cannot be used with --raw.
        --print-expression           Print all variables in GoaT currently, with their associated
                                     variants.
                                     Useful for construction of expressions.
        --progress-bar               Add a progress bar to large queries, to estimate time left.
    -u, --url                        Print the underlying GoaT API URL(s). Useful for debugging.
    -U, --goat-ui-url                Print the underlying GoaT UI URL(s). View on the browser!
    -h, --help                       Print help information
    -V, --version                    Print version information

goat-cli taxon count

There is not much to say about goat-cli taxon count, other than that it returns the number of hits a search query would produce. Any valid goat-cli taxon search command will be a valid goat-cli taxon count command.
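
Because the two subcommands accept the same arguments, one way to use count is as a check before a large search. A sketch with an example query; the variable names are illustrative.

```shell
# Arguments shared by both subcommands. Left unquoted on use so the shell
# splits them into words; this works here because no value contains a space.
query_args="-d -k -t Fabaceae"

for sub in count search; do
    if command -v goat-cli >/dev/null 2>&1; then
        goat-cli taxon "$sub" $query_args
    else
        # goat-cli is not installed, so only show the command.
        echo "would run: goat-cli taxon $sub $query_args"
    fi
done
```

If the count is larger than the default of 50, the search needs a matching --size to return everything.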

goat-cli taxon lookup

goat-cli taxon lookup is very simple currently. It only requires a taxon NCBI ID, taxon name, or common name. It returns an NCBI tax-id, along with synonyms, authorities, and common names.

goat-cli taxon lookup will perform a basic spell check if no results are found. For example:

goat-cli taxon lookup -t Pterophorsa -> Did you mean: Pterophorus, Petrophora, Pterophora?

A user can query multiple taxon IDs at the same time in a comma separated list. For example:

goat-cli taxon lookup -t "38942, 2405"

Current man page

goat-cli-taxon-lookup 0.2.0
Return information relating to a taxon name, e.g. synonyms, authorities.

USAGE:
    goat-cli taxon lookup [OPTIONS]

OPTIONS:
    -t, --taxon <taxon>    The taxon to search. An NCBI taxon ID, or the name of a taxon at any
                           rank.
    -f, --file <file>      A file of NCBI taxonomy ID's (tips) and/or binomial names.
                           Each line should contain a single entry.
                           File size is limited to 500 entries.
    -u, --url              Print lookup URL.
    -s, --size <size>      The number of results to return. [default: 10]
    -h, --help             Print help information
    -V, --version          Print version information
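
The -f option in the help text above takes a file with one entry per line, limited to 500 entries. A minimal sketch that builds such a file and checks its length first; the entries reuse taxa already mentioned on this page, and the temporary file and variable names are illustrative.

```shell
# Write one taxon per line to a temporary file.
taxa_file=$(mktemp)
printf '%s\n' "Arabidopsis thaliana" "38942" "2405" > "$taxa_file"

# Count entries; tr strips the padding some wc implementations add.
n_entries=$(wc -l < "$taxa_file" | tr -d ' ')

# The help text limits the file to 500 entries, so check before querying.
if [ "$n_entries" -gt 500 ]; then
    echo "too many entries: $n_entries" >&2
elif command -v goat-cli >/dev/null 2>&1; then
    goat-cli taxon lookup -f "$taxa_file"
fi

rm -f "$taxa_file"
```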

goat-cli taxon newick

GoaT can also return trees. Given a clade, or a string of clades, a newick tree (cladogram) will be returned.

goat-cli-taxon-newick 0.2.0
Generate a newick tree from input taxa.

USAGE:
    goat-cli taxon newick [OPTIONS]

OPTIONS:
    -t, --taxon <taxon>    The taxon to return a newick of. Multiple taxa will return the joint
                           tree.
    -u, --url              Print lookup URL.
    -r, --rank <rank>      The number of results to return. [default: species] [possible values:
                           species, genus, family, order]
        --progress-bar     Add a progress bar to large queries, to estimate time left.
    -h, --help             Print help information
    -V, --version          Print version information

E.g. get a cladogram of the genera in the family Fabaceae:

goat-cli taxon newick -t "Fabaceae" -r genus

Query GoaT by assembly index

This section is in progress. Please check back soon!