This repository was archived by the owner on Jul 23, 2025. It is now read-only.
vignettes/tidytranscriptomics_case_study.Rmd
Lines changed: 67 additions & 44 deletions
@@ -15,6 +15,8 @@ vignette: >
15
15
knitr::opts_chunk$set(echo = TRUE)
16
16
```
17
17
18
+
# Workshop introduction
19
+
18
20
## Instructors
19
21
20
22
*Dr. Stefano Mangiola* is currently a Postdoctoral researcher in the laboratory of Prof. Tony Papenfuss at the Walter and Eliza Hall Institute in Melbourne, Australia. His background spans from biotechnology to bioinformatics and biostatistics. His research focuses on prostate and breast tumour microenvironment, the development of statistical models for the analysis of RNA sequencing data, and data analysis and visualisation interfaces.
This workshop will demonstrate a real-world example of using tidy transcriptomics packages, such as tidySingleCellExperiment and tidybulk, to perform a single-cell analysis. This workshop is not a step-by-step introduction to single-cell analysis. For an overview of single-cell analysis steps performed in a tidy way, please see the [ISMB2021 workshop](https://tidytranscriptomics-workshops.github.io/ismb2021_tidytranscriptomics/articles/tidytranscriptomics.html).
45
47
46
-
## Slides
47
-
48
-
*The embedded slides below may take a minute to appear.*
@@ -70,8 +60,19 @@ We will use the Orchestra Cloud platform during the BioC2022 workshop and this m
70
60
71
61
Alternatively, you can view the material at the workshop webpage [here](https://tidytranscriptomics-workshops.github.io/bioc2022_tidytranscriptomics/articles/tidytranscriptomics_case_study.html).
72
62
63
+
## Slides
64
+
65
+
*The embedded slides below may take a minute to appear. You can also download them from [here](https://github.com/tidytranscriptomics-workshops/bioc2022_tidytranscriptomics/blob/master/inst/bioc2022_tidytranscriptomics.pdf).*
It can be interacted with using [SingleCellExperiment commands](https://bioconductor.org/packages/devel/bioc/vignettes/SingleCellExperiment/inst/doc/intro.html) such as `assay`.
110
+
It can be interacted with using [SingleCellExperiment commands](https://bioconductor.org/packages/devel/bioc/vignettes/SingleCellExperiment/inst/doc/intro.html) such as `assays`.
110
111
111
112
```{r}
112
-
Assays(sce_obj)
113
+
assays(sce_obj)
113
114
```
114
115
115
116
We can also interact with our object as we do with any tidyverse tibble.
116
117
117
-
###Tidyverse commands
118
+
## Tidyverse commands
118
119
119
120
We can use tidyverse commands, such as `filter`, `select` and `mutate` to explore the tidySingleCellExperiment object. Some examples are shown below and more can be seen at the tidySingleCellExperiment website [here](https://stemangiola.github.io/tidySingleCellExperiment/articles/introduction.html#tidyverse-commands-1).
120
121
@@ -130,7 +131,7 @@ We can use `select` to choose columns, for example, to see the sample, cell, tot
130
131
sce_obj |> select(.cell, nCount_RNA, Phase)
131
132
```
132
133
133
-
We can use `mutate` to create a column. For example, we could create a new `ident_l` column that contains a lower-case version of `ident`.
134
+
We can use `mutate` to create a column. For example, we could create a new `Phase_l` column that contains a lower-case version of `Phase`.
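
As a minimal sketch of the `mutate` step described above, the example below uses a small plain tibble as a stand-in for the cell metadata. The tibble `cells` and its values are invented for illustration and are not part of the workshop data; the same `mutate` call is assumed to behave the same way on `sce_obj`.

```r
library(dplyr)

# Toy stand-in for the cell metadata; the values are invented for illustration.
cells <- tibble(
  .cell = c("cell_1", "cell_2", "cell_3"),
  Phase = c("G1", "S", "G2M")
)

# Add a lower-case copy of Phase, leaving the original column untouched.
cells_l <- cells |> mutate(Phase_l = tolower(Phase))
cells_l
```

Because `mutate` adds a column rather than overwriting one here, the original `Phase` values remain available for comparison.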
The object `sce_obj` we've been using was created as part of a study on breast cancer systemic immune response. Peripheral blood mononuclear cells have been sequenced for RNA at the single-cell level. The steps used to generate the object are summarised below.
174
175
@@ -188,7 +189,7 @@ The object `sce_obj` we've been using was created as part of a study on breast c
188
189
189
190
- Cells with similar transcriptome profiles were grouped into clusters using Louvain clustering from `scran`.
190
191
191
-
###Analyse custom signature
192
+
## Analyse custom signature
192
193
193
194
The researcher analysing this dataset wanted to identify gamma delta T cells using a gene signature from a published paper [@Pizzolato2019].
ggplot(aes(UMAP_1, UMAP_2, color = signature_score)) +
@@ -257,7 +259,7 @@ sce_obj |>
257
259
bioc2022tidytranscriptomics::theme_multipanel
258
260
```
259
261
260
-
For exploratory analyses, we can select the gamma delta T cells, the red cluster on the left with high signature score. We'll filter for cells with a signature score > 0.8.
262
+
For exploratory analyses, we can select the gamma delta T cells, the red cluster on the left with high signature score. We'll filter for cells with a signature score > 0.7.
FeaturePlot(sce_obj, features = "signature_score")
297
299
298
300
sce_obj |>
299
-
subset(signature_score > 0.8)
301
+
subset(signature_score > 0.7)
300
302
```
301
303
302
-
It is then possible to perform analyses on these gamma delta T cells by simply chaining further commands, such as below.
304
+
It is then possible to focus in and analyse just these gamma delta T cells. We can chain Bioconductor and tidyverse commands to do this.
303
305
304
306
```{r eval = FALSE}
305
307
library(batchelor)
@@ -361,28 +363,32 @@ pbmc |>
361
363
add_markers(size = I(1))
362
364
```
363
365
364
-
##Exercises
366
+
# Exercises
365
367
366
-
1. What proportion of all cells are gamma-delta T cells? Use signature_score > 0.8 to identify gamma-delta T cells.
368
+
1. What proportion of all cells are gamma-delta T cells? Use signature_score > 0.7 to identify gamma-delta T cells.
367
369
368
370
2. There is a cluster of cells characterised by a low RNA output (nCount_RNA < 100). Identify the cell composition (cell_type) of that cluster.
369
371
370
372
# Pseudobulk analyses
371
373
372
-
It is sometime useful to aggregate cell-wise transcript abundance into pseudobulk samples. It is possible to explore data and perform hypothesis testing with tools and data-source that we are more familiar with. For example, we can use edgeR in tidybulk to perform differential expression testing. For more details on pseudobulk analysis see [here](https://hbctraining.github.io/scRNA-seq/lessons/pseudobulk_DESeq2_scrnaseq.html).
374
+
Now we want to identify genes whose transcription is associated with treatment, and pseudobulk analysis is how we can do this. It aggregates cell-wise transcript abundance into pseudobulk samples and enables us to perform hypothesis testing with tools and data sources that we are more familiar with. For example, we can use edgeR in tidybulk to perform differential expression testing. For more details on pseudobulk analysis see [here](https://hbctraining.github.io/scRNA-seq/lessons/pseudobulk_DESeq2_scrnaseq.html).
373
375
376
+
We want to do this for each cell type, and the tidy transcriptomics ecosystem makes this very easy.
374
377
375
-
### Data exploration using pseudobulk samples
378
+
379
+
## Data exploration using pseudobulk samples
376
380
377
381
To do this, we will use a helper function called `aggregate_cells`, available in this workshop package, to combine the single cells into groups for each cell type for each sample.
378
382
379
383
```{r warning=FALSE, message=FALSE, echo=FALSE}
380
384
library(glue)
381
385
library(tidyr)
382
-
library(tidybulk)
383
-
library(tidySummarizedExperiment)
384
386
library(purrr)
385
387
library(patchwork)
388
+
389
+
# bulk RNA-seq libraries
390
+
library(tidybulk)
391
+
library(tidySummarizedExperiment)
386
392
```
387
393
388
394
```{r}
@@ -393,58 +399,75 @@ pseudo_bulk <-
393
399
pseudo_bulk
394
400
```
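
The idea behind pseudobulk aggregation can be sketched with plain dplyr on toy data. This is not the implementation of `aggregate_cells`; it only illustrates summing per-cell counts within each sample and cell type. The tibble `counts`, its column names and its values are hypothetical.

```r
library(dplyr)

# Toy long-format counts: one row per cell and gene (values invented).
counts <- tibble(
  sample    = c("s1", "s1", "s1", "s1", "s2", "s2"),
  cell_type = c("T", "T", "B", "T", "T", "B"),
  .cell     = c("c1", "c2", "c3", "c1", "c4", "c5"),
  gene      = c("CD3E", "CD3E", "CD3E", "MS4A1", "CD3E", "CD3E"),
  count     = c(5, 3, 0, 1, 7, 2)
)

# Sum counts over cells so that each sample/cell_type/gene combination
# becomes a single pseudobulk observation.
pseudo_bulk_toy <- counts |>
  group_by(sample, cell_type, gene) |>
  summarise(count = sum(count), .groups = "drop")

pseudo_bulk_toy
```

Each row of the result plays the role of one gene in one pseudobulk sample, which is the shape that bulk RNA-seq tools expect.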
395
401
396
-
###Tidybulk and tidySummarizedExperiment
402
+
## Tidybulk and tidySummarizedExperiment
397
403
398
404
With `tidySummarizedExperiment` and `tidybulk` it is easy to stratify our dataset for iterative self-contained analyses.
To explore the grouping, we can use tidyverse `slice` to choose a row (cell_type) and `pull` to extract the values from a column. If we pull the data column we can view the SummarizedExperiment object.
404
410
405
411
```{r}
406
412
pseudo_bulk |>
407
413
nest(data = -cell_type)
414
+
```
415
+
To explore the grouping, we can use tidyverse `slice` to choose a row (cell_type) and `pull` to extract the values from a column. If we pull the data column we can view the SummarizedExperiment object.
408
416
409
-
417
+
```{r}
410
418
pseudo_bulk |>
411
419
nest(data = -cell_type) |>
412
420
slice(1) |>
413
421
pull(data)
414
422
```
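
The `nest`, `slice` and `pull` pattern can also be tried on an ordinary tibble. The sketch below uses an invented tibble `toy`; with `pseudo_bulk` the nested `data` column holds SummarizedExperiment objects, whereas here it holds plain tibbles.

```r
library(dplyr)
library(tidyr)

# Toy pseudobulk-like table (values invented for illustration).
toy <- tibble(
  cell_type = c("T", "T", "B"),
  sample    = c("s1", "s2", "s1"),
  count     = c(8, 7, 2)
)

# One row per cell type; all other columns are packed into a list-column.
nested <- toy |> nest(data = -cell_type)

# Take the first cell type and extract its nested data.
first_group <- nested |>
  slice(1) |>
  pull(data)

first_group[[1]]
```

`pull(data)` returns a list, so the nested table for the first cell type is reached with `[[1]]`.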
415
423
416
-
We can then identify differentially expressed genes for each cell type for our condition of interest, progressive versus stable metastatic breast cancer.
424
+
We can then identify differentially expressed genes for each cell type for our condition of interest, treated versus untreated patients.
Now we can create plots for significant genes for each cell type, visualising their transcriptional abundance, without needing to create multiple objects.