Skip to content

Functions for the analysis of risk scores in GTEX data

Notifications You must be signed in to change notification settings

mbyvcm/MRCIEUGTEx

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

59 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

### Installing Package

```{r,}
# install devtools
install.packages('devtools')
library(devtools)
```

```{r,}
# install package from github
install_github("mbyvcm/MRCIEUGTEx", quiet = T)
library(MRCIEUGTEx)
```

### Adding directories to GTEx data on RDSF

These can be added manually if you know where GTEx files can be found on RDSF:

```{r,}
# path to GTEx VCF file
gtex_vcf_dir <- ""
# path to expression & covariate tar directories downloaded from GTEx Portal 
covariate_matrix_tar  <- ""
expression_matrix_tar <- ""
```
alternatively the GTEx directory on RDSF (should) contain a config.R file which can be sourced to attach these path variables:

```{r,}
source('config.R')
```
"covariate_matrix_tar" and "expression_matrix_tar" are matricies that contain normalised, QC'd data and are downloadable from the GTEx Portal here:

http://www.gtexportal.org/static/datasets/gtex_analysis_v6/single_tissue_eqtl_data/ GTEx_Analysis_V6_eQTLInputFiles_geneLevelNormalizedExpressionMatrices.tar.gz

http://www.gtexportal.org/static/datasets/gtex_analysis_v6/single_tissue_eqtl_data/ GTEx_Analysis_V6_eQTLInputFiles_covariates.tar.gz


### Generating a PRS

In order to generate a polygenic risk score, a "query data.frame" containg information for the sentinel SNPs are required.

SNP|effect.allele|other.allele|beta
---|---|---|---
rs123|A|C|0.136
rs456|G|A|-0.324

The "TwoSampleMR" [https://mrcieu.github.io/TwoSampleMR/] R package can be used to generate a similar dataframe for a given outcome and a p-value threshold.   

```{r,}
# extract SNPs from GTEx VCF files
gtex_query <- extract_query_snps_gtex(query_data_frame = query_data_frame, gtex_vcf_dir = gtex_vcf_dir)

# calculate PRS
geno <- calculate_prs_gtex(query_data_frame = df, gtex_data_frame = gtex_query)
```
If several PRS have been generated for different p-value thresholds, these can be combined in a list. This will be quicker than running each PRS separatly (because this will require expression matricies to be reloaded):

```{r,}
genoList <- list('PRS1' = geno, 'PRS2' = geno2)
```
### Run model

Make a character vector of the tissues you want to analyse. The available_tissues() may help if the 'expression_matrix_tar' path is set. 

```{r,}
tissue <- available_tissues(expression_matrix_tar)
tissue <- tissue[grep(tissue, pattern = 'Brain')]

output <- run_eqtl(geno = genoList, tissue = tissue)
```
Extract 'top hits'

```{r,}
tophits <- extract_top_hits(x = output, fdr = T, pthresh = 0.05)
```

Render volcano plots with gene symbols added to significant genes:

```{r, annotation}
vp <- volcanoplot(output, fdr = T, pthresh = 0.05)
```

About

Functions for the analysis of risk scores in GTEX data

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages