Varcontext applies SNVs, indels and complex variants to transcript sequences obtained from Ensembl and returns the resulting amino acid sequence, along with other variant information.
Variants are treated as germline only when they are labeled as such in the `variant_id` column with an identifier of the form `gs[0-9]+` (e.g. `gs123`).
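As an illustration of the labelling rule, the following shell sketch classifies a few identifiers with the same pattern. It assumes the pattern has to match the whole `variant_id` (the anchoring is an assumption, not something stated above), and `is_germline` is a hypothetical helper that is not part of Varcontext.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when a variant_id looks like a germline label.
# Assumption: the whole identifier must match gs[0-9]+ (anchored match).
is_germline() {
    printf '%s\n' "$1" | grep -Eq '^gs[0-9]+$'
}

# The identifiers below are made up for illustration.
for id in gs123 rs456 gs; do
    if is_germline "$id"; then
        echo "$id: germline"
    else
        echo "$id: not treated as germline"
    fi
done
```

A check like this can be run over the first column of an input file before calling Varcontext, to confirm that germline variants carry the expected label.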
To set up a new MySQL ensembl mirror, do the following:
- `sudo apt-get update`
- `sudo apt-get install mysql-server`
- `sudo mysql_secure_installation` and set up MySQL
- to mirror the Ensembl data, follow the instructions here: Installing the Ensembl Data
  - you only need the data in `homo_sapiens_core_**_**`
For the Perl script to run, you might also need the following modules:
- `perl -MCPAN -e 'install DBI'`
- `sudo apt-get install libdbd-mysql-perl`
Input files should contain at least the following columns, in exactly this order:

| variant_id | chromosome | start_position | ref_allele | alt_allele |
|---|---|---|---|---|

Optional columns are:

| dna_ref_read_count | dna_alt_read_count | dna_total_read_count | dna_vaf | rna_ref_read_count | rna_alt_read_count | rna_total_read_count | rna_vaf | rna_alt_expression |
|---|---|---|---|---|---|---|---|---|

NOTE: Columns are read in order, from left to right. No header checking is done (i.e. first line is skipped), so column order is VITAL to proper processing of the input file.
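Because the header is skipped rather than checked, it can help to sanity-check a file before running Varcontext. The sketch below writes a small tab-separated example file and reports data lines with fewer than the five required fields. All values in the file are made up, the chromosome and allele notation shown is an assumption, and `check_columns` is a hypothetical helper that is not part of Varcontext.

```shell
#!/bin/sh
# Write a small example input file. All values are made up for illustration.
input=$(mktemp)
printf 'variant_id\tchromosome\tstart_position\tref_allele\talt_allele\n' > "$input"
printf 'var1\t1\t1000\tA\tG\n' >> "$input"
printf 'gs123\t2\t2000\tC\tT\n' >> "$input"

# Hypothetical check: skip the header line, then report every data line
# that has fewer than five tab-separated fields.
# The exit status is non-zero if at least one such line is found.
check_columns() {
    awk -F '\t' '
        NR > 1 && NF < 5 { printf "line %d: only %d fields\n", NR, NF; bad = 1 }
        END { exit (bad ? 1 : 0) }
    ' "$1"
}

check_columns "$input" && echo "column count looks OK"
```

This only counts fields; it cannot detect columns that are present but in the wrong order, so the order still has to be verified by eye.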
Varcontext can be called from a wrapper script (available in neolution-prep) or directly from the terminal by:
export ENSEMBLAPI=/path/to/ensembl_api/;perl /path/to/varcontext/create_context.pl --ARGUMENTS INPUT_FILE 1> OUTPUT_FILE 2> LOG_FILE
NOTE: the environment variable `ENSEMBLAPI` must be set to the full path of the Ensembl API and must include a trailing slash ('/').
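A missing trailing slash is easy to overlook, so a wrapper could normalise the value before exporting it. The following is a minimal sketch; `with_trailing_slash` is a hypothetical helper, and `/path/to/ensembl_api` is the same placeholder used in the command above, not a real location.

```shell
#!/bin/sh
# Hypothetical helper: print the given path, appending a slash if it
# does not already end with one.
with_trailing_slash() {
    case "$1" in
        */) printf '%s\n' "$1" ;;
        *)  printf '%s/\n' "$1" ;;
    esac
}

# Placeholder path; replace it with the full path to the Ensembl API.
ENSEMBLAPI=$(with_trailing_slash "/path/to/ensembl_api")
export ENSEMBLAPI
echo "$ENSEMBLAPI"   # prints /path/to/ensembl_api/
```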
- `--separator=VALUE` - field separator for input file (default: "\t")
- `--canonical` - only fetch and apply edits to canonical transcripts (default: FALSE)
- `--nmd` - infer nonsense-mediated decay status (default: FALSE)
- `--peptide` - report peptide context only (default: FALSE)

The code is organised into the following components:
- ensembl - a wrapper around the Ensembl API
- EditSeq
- EditTranscript
- Variant - describes a genomic variant using `chromosome`, `start_position`, `ref_allele` and `alt_allele`
- VariantSet - a set of variants and the ability to assign transcripts and edit them

To do:
- VCF input
- NORMAL control samples
- Write tests (e.g. a testset of genes with a couple of designed mutations)