-
Notifications
You must be signed in to change notification settings - Fork 0
Expand file tree
/
Copy pathPractical.Rmd
More file actions
232 lines (181 loc) · 7.1 KB
/
Practical.Rmd
File metadata and controls
232 lines (181 loc) · 7.1 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
---
title: "Functional Analysis Practical"
output: html_document
---
## Instal & Load Required Pacakges
### CRAN packages
Install CRAN packages
```{r}
cran.packages <- c("tidyverse",
"ggrepel",
"kableExtra",
"ggupset")
cran.load <- function(pkg){
new.pkg <- pkg[!(pkg %in% installed.packages()[, "Package"])]
if (length(new.pkg)){
install.packages(new.pkg, dependencies = TRUE)
}
sapply(pkg, require, character.only = TRUE)
}
cran.load(cran.packages)
```
### Bioconductor Packages
Install Biocondustor packages
```{r}
bioconductor.packages <- c("biomaRt",
"org.Hs.eg.db",
"clusterProfiler",
"enrichplot",
"pathview")
bioconductor.load <- function(pkg){
new.pkg <- pkg[!(pkg %in% installed.packages()[, "Package"])]
if (length(new.pkg)){
if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install(new.pkg)
}
sapply(pkg, require, character.only = TRUE)
}
bioconductor.load(bioconductor.packages)
```
## Load Save Data
Load the example results file, the data is in the ./data/ directory and called "Treatment_vs_Control.results.tsv". Note the use of the relative path. The "relative" path to the file is relative to the current working directory. Use the variable name "de_results". Complete the code chunk below.
If you don't have the results file then you can get a copy from the data directory in this Session.
```{r read in previous days results file}
de_results <-
```
## Annotations
Using the example from the lecture slides get annotations from the Ensembl human database at Biomart. Replace the "..." with your code.
```{r}
database <- "hsapiens_gene_ensembl"
mart <- "genes"
filt <- "ensembl_gene_id"
ensembl <- useEnsembl(...)
att <- c("ensembl_gene_id","external_gene_name","chromosome_name","start_position","end_position","gene_biotype","entrezgene_id")
annotations <- getBM(...) %>%
distinct(ensembl_gene_id, .keep_all = TRUE)
```
Merge the annotations with the DESeq2 results
```{r}
de_results <- de_results %>%
left_join(...)%>% # merge annotations by ensembl gene id
arrange(...) # Sort the dataframe by the p adjusted value ascending
```
Save the annotated results to a file in the data directory, call the file "Treatment_vs_Control.annotated.results.tsv"
```{r}
```
## Results Directory
Create a results directory for the ClusterProfiler results files, I would call it "ClusterProfiler", but you can call it whatever you would like.
HINT: Code for this is in the lecture slides
```{r}
results_directory <-
```
## Filtering
Filter the results table for genes that have a p adjusted value of 0.1 and a log fold change of 1.
NOTE: The use of the absolute value for the log2 fold change. Complete the code chunk below
```{r}
pvalueCutoff <- ... # P value
qvalueCutoff <- ... # Adjusted P value
foldchange <- ... # Fold change, usually fold change is log2
```
```{r}
filtered_de_results <- de_results %>%
filter(...) %>% # Filter on p adjusted and the absolute values for the log fold change
drop_na() # Remove any NA values
```
## Gene Ontology Analysis
### Over representation Analysis
Extract a list of entrez ids for the differentially expressed genes, de_results. Use the pull() command from dplyr.
Complete the code chunk below
```{r}
de_genes <- filtered_de_results %>%
...
```
Using the example in the lecture slides perform the over representation analysis
```{r}
go_ora <- enrichGO(gene = ...,
OrgDb = ..., # Organism DB
ont = ..., # Can choose "BP","CC", "MF" or "ALL"
pAdjustMethod = ..., # Can choose "holm", "hochberg", "hommel", "bonferroni", "BH", "BY", "fdr", "none"
pvalueCutoff = ...,
qvalueCutoff = ...,
readable = TRUE)
```
### Gene Set enrichment Analysis
Input is a named vector of log fold changes with entrez ids as the names, the list should be sorted on the log fold changes.
HINT: Example can be fould in the lecture slides.
```{r}
gsea_gene_list <- ... # Get the log fold values in a character vector
names(gsea_gene_list) <- ... # Add the entrez gene ids as the names for the log fold changes
gsea_gene_list <- ... # Sort the character vector bym logo fold change, for lowest to highest†
```
Using the example in the lecture slides perform the g ene set enrichment analysis
```{r}
#| echo: true
#| eval: true
go_gsea <- gseGO(gene = ..., # Add the gene list here
OrgDb = ..., # Organism DB
ont = ..., # Can choose BP,CC, MF or ALL
pAdjustMethod = ..., # Can choose "holm", "hochberg", "hommel", "bonferroni", "BH", "BY", "fdr", "none"
minGSSize = 100,
maxGSSize = 500,
pvalueCutoff = ...)
```
Now save the Gene Ontology results to a file.
```{r}
write_delim(as.data.frame(go_ora),
file = file.path(results_directory, "go_ora_enriched.tsv"),
delim = "\t")
```
Now using the dotplot() command visualise the GO analysis results, You can limit the number of results using the parameter showCategory=10.
```{r}
```
These visualisations are ggplot2 objects and can be saved using ggsave.
## Pathway Analysis
Over representation analysis
Use the over represented gene list, de_genes, from above
```{r}
kegg_ora <- enrichKEGG(gene = ...,
organism = ..., # Organism DB
pvalueCutoff = 0.01)
```
Save the pathway analysis to file, name the file "ora_pathway.tsv"
```{r}
```
## Pathway Analysis
Gene set enrichment analysis using the named gene set list, gsea_gene_list, from above
```{r}
kegg_gsea <- gseKEGG(geneList = ...,
organism = ..., # Organism DB
minGSSize = 10,
pvalueCutoff = 0.05)
```
Save the pathway analysis to file, name the file "gsea_pathway.tsv"
```{r}
```
## Pathway Analysis Visualisation
Using the function function to retrieve the KEG pathways.
Get a list of KEGG path way ids, kegg_pathway_ids_list, from the kegg_gsea results and pass them thpo the function
```{r}
# Pathway visualisation function
kegg_pathview <- function(kegg_pathway_id){
pathview(gene.data = gsea_gene_list, # Named list of fold changes
pathway.id = kegg_pathway_id, # KEGG pathway id
species = "hsa", # KEG species id
gene.idtype = "KEGG",
kegg.native = TRUE)
}
# List of KEGG pathway ids
kegg_pathway_ids_list <- kegg_gsea %>%
...
# Iterate over the list of pathway ids and apply the data to the pathway
lapply(kegg_pathway_ids_list, kegg_pathview)
```
You can use the code below to tidy up the KEGG pathway files
```{r}
results.files <- list.files(path = ".", pattern = "pathview.png") %>% # list all pathview files
file.copy(results_directory) # Copy pathway files to results directory
# Delete results files
remove_files <- list.files(path = ".", pattern = "hsa") %>% # List all files dowloaded from KEGG
file.remove() # Delete the files
```