command-line-bio

A command-line tool for working with GFF files, FASTA files, and bioinformatics databases.

Installation

You can find the latest test release of command-line-bio on TestPyPI: https://test.pypi.org/project/command-line-bio/

To install from TestPyPI, run:

pip install --index-url https://test.pypi.org/simple/ \
            --extra-index-url https://pypi.org/simple/ \
            command-line-bio
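If the install succeeded, the clb command should be on your PATH. From Python you can also confirm which version pip installed with the standard-library importlib.metadata module. This is a generic sketch rather than part of clb, and installed_version is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "command-line-bio") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    v = installed_version()
    print(v if v else "command-line-bio is not installed")
```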

GFF Transcript Query

Retrieve information about transcripts from a GFF file.

Usage

Before you can query a GFF file, you must first convert it to an SQLite database. This takes a few minutes but only needs to be done once.

$ clb gff createdb --gff /opt/GRCh37_latest_genomic.gff.gz --db /opt/GRCh37_latest_genomic.gff.db

Created /opt/GRCh37_latest_genomic.gff.db from /opt/GRCh37_latest_genomic.gff.gz
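To illustrate what a one-time GFF-to-SQLite conversion involves, here is a minimal sketch that loads GFF3 feature lines into a single indexed SQLite table using only the standard library. It is not clb's implementation: the table layout (a features table with start_pos, end_pos, id and parent columns) and the function names open_gff, parse_attributes and load_gff are hypothetical. The indexes on id and parent are what let later lookups avoid rescanning the whole file.

```python
import gzip
import sqlite3
from typing import Iterable, TextIO
from urllib.parse import unquote


def open_gff(path: str) -> TextIO:
    """Open a plain or gzip-compressed GFF file as text."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path)


def parse_attributes(field: str) -> dict:
    """Split a GFF3 column-9 string like 'ID=x;gene=TP53' into a dict."""
    attrs = {}
    for part in field.strip().split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            attrs[key] = unquote(value)  # GFF3 percent-encodes ; = & ,
    return attrs


def load_gff(lines: Iterable[str], conn: sqlite3.Connection) -> int:
    """Load GFF3 feature lines into a hypothetical 'features' table; return rows inserted."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS features (
               seqid TEXT, type TEXT, start_pos INTEGER, end_pos INTEGER,
               strand TEXT, id TEXT, parent TEXT, attributes TEXT)"""
    )
    rows = []
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip directives, comments and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a nine-column feature line
        attrs = parse_attributes(cols[8])
        rows.append((cols[0], cols[2], int(cols[3]), int(cols[4]), cols[6],
                     attrs.get("ID"), attrs.get("Parent"), cols[8]))
    conn.executemany("INSERT INTO features VALUES (?, ?, ?, ?, ?, ?, ?, ?)", rows)
    # Index the columns a transcript lookup would filter on.
    conn.execute("CREATE INDEX IF NOT EXISTS idx_id ON features(id)")
    conn.execute("CREATE INDEX IF NOT EXISTS idx_parent ON features(parent)")
    conn.commit()
    return len(rows)
```

For example, opening the .gff.gz with open_gff and passing the handle to load_gff along with sqlite3.connect("/tmp/sketch.db") would populate a scratch database; that file follows this sketch's schema, not clb's, so clb gff query cannot read it.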

Look up a cDNA transcript

$ clb gff query --db /opt/GRCh37_latest_genomic.gff.db NM_001290187.2

Chromosome    cDNA            Protein         CCDS         Gene    Strand
------------  --------------  --------------  -----------  ------  --------
NC_000007.13  NM_001290187.2  NP_001277116.1  CCDS78285.1  KRBA1   +
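The summary row above combines attributes from more than one GFF3 feature. Here is a sketch of that join, assuming RefSeq-style attribute names: transcript_id and gene on the mRNA feature, and protein_id plus a CCDS: entry in Dbxref on the CDS features that point back at it through Parent. Treat these names as assumptions to check against your own file. Features are plain dicts here rather than rows from clb's database, and query_transcript and ccds_from_dbxref are hypothetical helpers. A transcript without a CCDS entry gets an empty string, leaving that column blank.

```python
from typing import Dict, List, Optional

# Hypothetical in-memory form of one GFF3 record: seqid, type, strand
# plus the column-9 attributes (ID, Parent, transcript_id, ...).
Feature = Dict[str, str]


def ccds_from_dbxref(dbxref: str) -> str:
    """Pick the CCDS accession out of a Dbxref value such as 'CCDS:CCDS11118.1,GeneID:...'."""
    for ref in dbxref.split(","):
        db, _, acc = ref.partition(":")
        if db == "CCDS":
            return acc
    return ""


def query_transcript(features: List[Feature], cdna: str) -> Optional[dict]:
    """Build one summary row for a cDNA accession, or return None if it is absent."""
    mrna = next((f for f in features
                 if f["type"] == "mRNA" and f.get("transcript_id") == cdna), None)
    if mrna is None:
        return None
    # CDS features reference their transcript through the Parent attribute.
    cds = next((f for f in features
                if f["type"] == "CDS" and f.get("Parent") == mrna["ID"]), {})
    return {
        "Chromosome": mrna["seqid"],
        "cDNA": cdna,
        "Protein": cds.get("protein_id", ""),
        "CCDS": ccds_from_dbxref(cds.get("Dbxref", "")),
        "Gene": mrna.get("gene", ""),
        "Strand": mrna["strand"],
    }
```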

Include exon positions by adding the --exons flag:

$ clb gff query --db /opt/GRCh37_latest_genomic.gff.db NM_000546.6 --exons

Chromosome    cDNA         Protein      CCDS         Gene    Strand
------------  -----------  -----------  -----------  ------  --------
NC_000017.10  NM_000546.6  NP_000537.3  CCDS11118.1  TP53    -

  Exon    Start      End
------  -------  -------
     1  7579839  7579912
     2  7579700  7579721
     3  7579312  7579590
     4  7578371  7578554
     5  7578177  7578289
     6  7577499  7577608
     7  7577019  7577155
     8  7576853  7576926
     9  7573927  7574033
    10  7572927  7573008
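In the TP53 output above, exons are numbered 5' to 3' along the transcript. Because TP53 is on the minus strand, exon 1 has the highest coordinates. The following minimal sketch reproduces that numbering from unordered (start, end) pairs. number_exons is a hypothetical helper, not a clb function, and it assumes GFF-style 1-based inclusive intervals.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 1-based inclusive (start, end), as in GFF


def number_exons(exons: List[Interval], strand: str) -> List[Tuple[int, int, int]]:
    """Return (number, start, end) rows with exon 1 at the transcript's 5' end."""
    if strand not in ("+", "-"):
        raise ValueError(f"unexpected strand: {strand!r}")
    # On the minus strand the 5' end is the highest coordinate, so sort descending.
    ordered = sorted(exons, key=lambda iv: iv[0], reverse=(strand == "-"))
    return [(i, s, e) for i, (s, e) in enumerate(ordered, start=1)]


if __name__ == "__main__":
    # A few TP53 intervals from the table above, deliberately out of order.
    tp53 = [(7573927, 7574033), (7579839, 7579912), (7579700, 7579721)]
    for n, s, e in number_exons(tp53, "-"):
        print(n, s, e, e - s + 1)  # length in bp for an inclusive interval
```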

Look up a transcript by protein accession

$ clb gff query --db /opt/GRCh37_latest_genomic.gff.db NP_001034792.4

Chromosome      cDNA            Protein         CCDS    Gene    Strand
--------------  --------------  --------------  ------  ------  --------
NC_000001.10    NM_001039703.6  NP_001034792.4          NBPF10  +
NW_003871055.3  NM_001039703.6  NP_001034792.4          NBPF10  -
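The NBPF10 protein above is reported twice because the same transcript is annotated on two sequences, NC_000001.10 and NW_003871055.3, with opposite strands. A reverse lookup therefore has to return every placement rather than the first match. Here is a sketch under the same assumptions as a RefSeq-style GFF3 (protein_id on CDS features, transcript_id on their parent mRNA). The dict representation and placements_for_protein are hypothetical. A set de-duplicates results because a multi-exon CDS spans several GFF3 lines that share one protein_id.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory GFF3 record: seqid, type, strand plus column-9 attributes.
Feature = Dict[str, str]


def placements_for_protein(features: List[Feature], protein: str) -> List[Tuple[str, str, str]]:
    """Return sorted (seqid, transcript, strand) tuples for every placement of a protein."""
    transcripts = {f["ID"]: f for f in features if f["type"] == "mRNA"}
    rows = set()
    for f in features:
        if f["type"] == "CDS" and f.get("protein_id") == protein:
            parent = transcripts.get(f.get("Parent", ""))
            if parent is not None:
                # One CDS is split over many lines; the set keeps one row per placement.
                rows.add((f["seqid"], parent.get("transcript_id", ""), f["strand"]))
    return sorted(rows)
```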

Reference Query

Query a reference genome. You can use a FASTA file or a SeqRepo installation as the reference source.

Usage

$ clb ref seq --fasta /opt/Homo_sapiens_assembly19.fasta chr1:111-121
$ clb ref seq --seqrepo /opt/seqrepo/latest chr1:111-121
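A region such as chr1:111-121 has to be turned into a slice of the reference sequence. This sketch assumes the common samtools-style convention of 1-based, inclusive coordinates, under which chr1:111-121 covers 11 bases. This README does not state which convention clb uses, so treat that as an assumption. The helpers parse_region, read_fasta and fetch are hypothetical, and the sketch reads FASTA into memory rather than using an index. SeqRepo is not covered.

```python
import re
from typing import Dict, Iterable, List, Optional, Tuple

REGION = re.compile(r"^(?P<chrom>[^:]+):(?P<start>\d+)-(?P<end>\d+)$")


def parse_region(region: str) -> Tuple[str, int, int]:
    """Parse 'chr1:111-121' into (chrom, start, end), assumed 1-based inclusive."""
    m = REGION.match(region.replace(",", ""))  # tolerate 'chr1:1,000-2,000'
    if m is None:
        raise ValueError(f"bad region: {region!r}")
    start, end = int(m["start"]), int(m["end"])
    if start < 1 or end < start:
        raise ValueError(f"bad coordinates: {region!r}")
    return m["chrom"], start, end


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Minimal FASTA reader mapping the first word of each header to its sequence."""
    seqs: Dict[str, List[str]] = {}
    name: Optional[str] = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            name = line[1:].split()[0]
            seqs[name] = []
        elif line and name is not None:
            seqs[name].append(line)
    return {k: "".join(v) for k, v in seqs.items()}


def fetch(seqs: Dict[str, str], region: str) -> str:
    """Return the bases covered by a region string."""
    chrom, start, end = parse_region(region)
    # Convert 1-based inclusive coordinates to a 0-based half-open Python slice.
    return seqs[chrom][start - 1:end]
```

Reading a whole genome this way is slow and memory-hungry. An indexed reader (a .fai index, as produced by samtools faidx) is the usual approach for large references.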
