This repository was archived by the owner on Jul 23, 2025. It is now read-only.

Commit 59636ef

Edits post meeting today

1 parent: 4266d0f

2 files changed (+67, -44 lines)

File renamed without changes.

vignettes/tidytranscriptomics_case_study.Rmd

Lines changed: 67 additions & 44 deletions
````diff
@@ -15,6 +15,8 @@ vignette: >
 knitr::opts_chunk$set(echo = TRUE)
 ```
 
+# Workshop introduction
+
 ## Instructors
 
 *Dr. Stefano Mangiola* is currently a Postdoctoral researcher in the laboratory of Prof. Tony Papenfuss at the Walter and Eliza Hall Institute in Melbourne, Australia. His background spans from biotechnology to bioinformatics and biostatistics. His research focuses on prostate and breast tumour microenvironment, the development of statistical models for the analysis of RNA sequencing data, and data analysis and visualisation interfaces.
````
````diff
@@ -43,18 +45,6 @@ knitr::opts_chunk$set(echo = TRUE)
 
 This workshop will demonstrate a real-world example of using tidy transcriptomics packages, such as tidySingleCellExperiment and tidybulk, to perform a single cell analysis. This workshop is not a step-by-step introduction to performing single-cell analysis. For an overview of single-cell analysis steps performed in a tidy way please see the [ISMB2021 workshop](https://tidytranscriptomics-workshops.github.io/ismb2021_tidytranscriptomics/articles/tidytranscriptomics.html).
 
-## Slides
-
-*The embedded slides below may take a minute to appear.*
-
-<iframe
-src="https://docs.google.com/gview?url=https://raw.githubusercontent.com/tidytranscriptomics-workshops/bioc2022_tidytranscriptomics/master/inst/bioc2022_tidytranscriptomics.pdf&embedded=true"
-scrolling="yes"
-style="width:100%; height:600px;"
-frameborder="0">
-</iframe>
-
-
 ## Getting started
 
 ### Cloud
````
````diff
@@ -70,8 +60,19 @@ We will use the Orchestra Cloud platform during the BioC2022 workshop and this m
 
 Alternatively, you can view the material at the workshop webpage [here](https://tidytranscriptomics-workshops.github.io/bioc2022_tidytranscriptomics/articles/tidytranscriptomics_case_study.html).
 
+## Slides
+
+*The embedded slides below may take a minute to appear. You can also download them [here](https://github.com/tidytranscriptomics-workshops/bioc2022_tidytranscriptomics/blob/master/inst/bioc2022_tidytranscriptomics.pdf).*
+
+<iframe
+src="https://docs.google.com/gview?url=https://raw.githubusercontent.com/tidytranscriptomics-workshops/bioc2022_tidytranscriptomics/master/inst/bioc2022_tidytranscriptomics.pdf&embedded=true"
+scrolling="yes"
+style="width:100%; height:600px;"
+frameborder="0">
+</iframe>
 
-## Introduction to tidySingleCellExperiment
+
+# Introduction to tidySingleCellExperiment
 
 ```{r message = FALSE}
 # Load packages
````
````diff
@@ -106,15 +107,15 @@ library(tidySingleCellExperiment)
 sce_obj
 ```
 
-It can be interacted with using [SingleCellExperiment commands](https://bioconductor.org/packages/devel/bioc/vignettes/SingleCellExperiment/inst/doc/intro.html) such as `assay`.
+It can be interacted with using [SingleCellExperiment commands](https://bioconductor.org/packages/devel/bioc/vignettes/SingleCellExperiment/inst/doc/intro.html) such as `assays`.
 
 ```{r}
-Assays(sce_obj)
+assays(sce_obj)
 ```
 
 We can also interact with our object as we do with any tidyverse tibble.
 
-### Tidyverse commands
+## Tidyverse commands
 
 We can use tidyverse commands, such as `filter`, `select` and `mutate` to explore the tidySingleCellExperiment object. Some examples are shown below and more can be seen at the tidySingleCellExperiment website [here](https://stemangiola.github.io/tidySingleCellExperiment/articles/introduction.html#tidyverse-commands-1).
 
````
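The hunk above replaces `Assays()` with `assays()`. A minimal sketch of why, on a hypothetical toy object rather than the workshop's `sce_obj` (the gene names, cell names and counts are made up): `assays()` is the SummarizedExperiment accessor that SingleCellExperiment inherits, and it returns the list of all assay matrices. Capital-A `Assays()` is not a SingleCellExperiment accessor; a function of that name belongs to the Seurat ecosystem.

```r
# Requires the Bioconductor package SingleCellExperiment
library(SingleCellExperiment)

# Hypothetical 3-gene x 4-cell count matrix
counts_mat <- matrix(
  c(0, 2, 5, 1,
    3, 0, 0, 4,
    1, 1, 2, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("CD3D", "TRDC", "CD8A"), paste0("cell_", 1:4))
)

sce_toy <- SingleCellExperiment(assays = list(counts = counts_mat))

# assays() (lower case) returns the list of all assays stored in the object
assay_list <- assays(sce_toy)

# Names of the stored assays
assay_names <- assayNames(sce_toy)

# A single assay is retrieved by name with assay()
trdc_counts <- assay(sce_toy, "counts")["TRDC", ]
```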
````diff
@@ -130,7 +131,7 @@ We can use `select` to choose columns, for example, to see the sample, cell, tot
 sce_obj |> select(.cell, nCount_RNA, Phase)
 ```
 
-We can use `mutate` to create a column. For example, we could create a new `ident_l` column that contains a lower-case version of `ident`.
+We can use `mutate` to create a column. For example, we could create a new `Phase_l` column that contains a lower-case version of `Phase`.
 
 ```{r}
 sce_obj |>
````
````diff
@@ -147,16 +148,16 @@ sce_obj |> select(file)
 ```
 
 ```{r}
-# Create columns for sample and group
+# Create column for sample
 sce_obj <- sce_obj |>
-# Extract sample and group
+# Extract sample
 extract(file, "sample", "../data/.*/([a-zA-Z0-9_-]+)/outs.+", remove = FALSE)
 
 # Take a look
-sce_obj |> select(sample)
+sce_obj |> select(sample, everything())
 ```
 
-We could use tidyverse `unite` to combine columns, for example to create a new column for sample id combining the sample and BCB columns.
+We could use tidyverse `unite` to combine columns, for example to create a new column for sample id combining the sample and patient id (BCB) columns.
 
 ```{r}
 sce_obj <- sce_obj |> unite("sample_id", sample, BCB, remove = FALSE)
````
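The `extract()` regex in this hunk keeps only the directory name immediately before `/outs`, and `unite()` then pastes two columns together with `_`. A small sketch on a plain tibble, assuming hypothetical paths shaped like `../data/<batch>/<sample>/outs/...` and made-up patient ids (the real `file` values in `sce_obj` are not shown in this diff). On the workshop object the same verbs act on the SingleCellExperiment through tidySingleCellExperiment.

```r
library(tibble)
library(tidyr)

# Hypothetical cells, file paths and patient ids
cells <- tibble(
  .cell = c("AAAC-1", "AAAG-1", "TTTC-1"),
  file  = c("../data/batch1/SI-GA-A1/outs/raw_feature_bc_matrix",
            "../data/batch1/SI-GA-A1/outs/raw_feature_bc_matrix",
            "../data/batch2/SI-GA-B3/outs/raw_feature_bc_matrix"),
  BCB   = c("BCB102", "BCB102", "BCB110")
)

cells <- cells |>
  # The single capture group keeps the directory just before "/outs";
  # remove = FALSE keeps the original `file` column
  extract(file, "sample", "../data/.*/([a-zA-Z0-9_-]+)/outs.+", remove = FALSE) |>
  # Paste sample and patient id (BCB) with the default "_" separator
  unite("sample_id", sample, BCB, remove = FALSE)
```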
````diff
@@ -166,9 +167,9 @@ sce_obj |> select(sample_id, sample, BCB)
 ```
 
 
-## Case study
+# Case study
 
-### Data pre-processing
+## Data pre-processing
 
 The object `sce_obj` we've been using was created as part of a study on breast cancer systemic immune response. Peripheral blood mononuclear cells have been sequenced for RNA at the single-cell level. The steps used to generate the object are summarised below.
 
````
````diff
@@ -188,7 +189,7 @@ The object `sce_obj` we've been using was created as part of a study on breast c
 
 - Cells with similar transcriptome profiles were grouped into clusters using Louvain clustering from `scran`.
 
-### Analyse custom signature
+## Analyse custom signature
 
 The researcher analysing this dataset wanted to identify gamma delta T cells using a gene signature from a published paper [@Pizzolato2019].
 
````
````diff
@@ -248,7 +249,8 @@ sce_obj |>
 scales::rescale(CD3D + TRDC + TRGC1 + TRGC2, to = c(0, 1)) -
 scales::rescale(CD8A + CD8B, to = c(0, 1))
 ) |>
-
+
+# plot cells with high score last
 arrange(signature_score) |>
 
 ggplot(aes(UMAP_1, UMAP_2, color = signature_score)) +
````
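The signature score in this hunk subtracts the rescaled sum of the negative markers (CD8A, CD8B) from the rescaled sum of the positive markers (CD3D, TRDC, TRGC1, TRGC2), so it lies between -1 and 1. The new comment about plotting high scores last relies on ggplot drawing points in row order, so sorting in ascending order puts high-scoring cells on top. A minimal sketch with four hypothetical cells and made-up expression values:

```r
library(dplyr)

# Hypothetical per-cell expression values for the signature genes
cells <- tibble::tibble(
  .cell = paste0("cell_", 1:4),
  CD3D  = c(2, 3, 0, 1), TRDC  = c(2, 2, 0, 0),
  TRGC1 = c(1, 2, 0, 0), TRGC2 = c(1, 1, 0, 0),
  CD8A  = c(0, 0, 1, 3), CD8B  = c(0, 0, 1, 2)
)

scored <- cells |>
  mutate(
    signature_score =
      # Positive markers: summed, then rescaled to [0, 1] across cells
      scales::rescale(CD3D + TRDC + TRGC1 + TRGC2, to = c(0, 1)) -
      # Negative markers: summed, rescaled the same way, then subtracted
      scales::rescale(CD8A + CD8B, to = c(0, 1))
  ) |>
  # Ascending order: the highest scores end up in the last rows
  arrange(signature_score)
```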
````diff
@@ -257,7 +259,7 @@ sce_obj |>
 bioc2022tidytranscriptomics::theme_multipanel
 ```
 
-For exploratory analyses, we can select the gamma delta T cells, the red cluster on the left with high signature score. We'll filter for cells with a signature score > 0.8.
+For exploratory analyses, we can select the gamma delta T cells, the red cluster on the left with high signature score. We'll filter for cells with a signature score > 0.7.
 
 ```{r}
 
````
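This commit lowers the gamma delta threshold from 0.8 to 0.7. A quick sketch of what the filter keeps, and of one way to compute the proportion of cells passing it, using ten hypothetical scores (not values from `sce_obj`). Note that `>` is strict, so a cell scoring exactly 0.7 is excluded.

```r
library(dplyr)

# Hypothetical signature scores for ten toy cells
cells <- tibble::tibble(
  .cell = paste0("cell_", 1:10),
  signature_score = c(0.95, 0.72, 0.69, 0.10, -0.30, 0.81, 0.00, 0.70, -0.90, 0.40)
)

# Cells above the threshold (strict inequality)
gamma_delta <- cells |> filter(signature_score > 0.7)

# Proportion of all cells that pass the threshold
prop_gd <- nrow(gamma_delta) / nrow(cells)
```

An equivalent one-liner is `mean(cells$signature_score > 0.7)`, since the mean of a logical vector is the fraction of `TRUE` values.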
````diff
@@ -296,10 +298,10 @@ sce_obj$signature_score <- counts_positive - counts_negative
 FeaturePlot(sce_obj, features = "signature_score")
 
 sce_obj |>
-subset(signature_score > 0.8)
+subset(signature_score > 0.7)
 ```
 
-It is then possible to perform analyses on these gamma delta T cells by simply chaining further commands, such as below.
+It is then possible to focus in and analyse just these gamma delta T cells. We can chain Bioconductor and tidyverse commands to do this.
 
 ```{r eval = FALSE}
 library(batchelor)
````
````diff
@@ -361,28 +363,32 @@ pbmc |>
 add_markers(size = I(1))
 ```
 
-## Exercises
+# Exercises
 
-1. What proportion of all cells are gamma-delta T cells? Use signature_score > 0.8 to identify gamma-delta T cells.
+1. What proportion of all cells are gamma-delta T cells? Use signature_score > 0.7 to identify gamma-delta T cells.
 
 2. There is a cluster of cells characterised by a low RNA output (nCount_RNA < 100). Identify the cell composition (cell_type) of that cluster.
 
 # Pseudobulk analyses
 
-It is sometime useful to aggregate cell-wise transcript abundance into pseudobulk samples. It is possible to explore data and perform hypothesis testing with tools and data-source that we are more familiar with. For example, we can use edgeR in tidybulk to perform differential expression testing. For more details on pseudobulk analysis see [here](https://hbctraining.github.io/scRNA-seq/lessons/pseudobulk_DESeq2_scrnaseq.html).
+Now we want to identify genes whose transcription is associated with treatment, and pseudobulk analysis is how we can do this. It aggregates cell-wise transcript abundance into pseudobulk samples, enabling hypothesis testing with tools and data sources that we are more familiar with. For example, we can use edgeR in tidybulk to perform differential expression testing. For more details on pseudobulk analysis see [here](https://hbctraining.github.io/scRNA-seq/lessons/pseudobulk_DESeq2_scrnaseq.html).
 
+We want to do this for each cell type, and the tidy transcriptomics ecosystem makes this straightforward.
 
-### Data exploration using pseudobulk samples
+
+## Data exploration using pseudobulk samples
 
 To do this, we will use a helper function called `aggregate_cells`, available in this workshop package, to combine the single cells into groups for each cell type for each sample.
 
 ```{r warning=FALSE, message=FALSE, echo=FALSE}
 library(glue)
 library(tidyr)
-library(tidybulk)
-library(tidySummarizedExperiment)
 library(purrr)
 library(patchwork)
+
+# bulk RNA-seq libraries
+library(tidybulk)
+library(tidySummarizedExperiment)
 ```
 
 ```{r}
````
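The pseudobulk idea is to sum the counts of all cells that share a sample and a cell type into a single column, giving a bulk-like gene-by-sample matrix. The workshop does this with its own helper, `aggregate_cells`; the sketch below is not that helper's implementation, only the same aggregation written with base R `rowsum()` on a hypothetical 2-gene by 6-cell matrix with made-up labels.

```r
# Hypothetical 2-gene x 6-cell count matrix
counts_mat <- matrix(
  c(1, 0, 2, 3, 0, 1,
    0, 4, 1, 0, 2, 2),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("GENE_A", "GENE_B"), paste0("cell_", 1:6))
)

# Hypothetical per-cell labels
cell_meta <- data.frame(
  sample    = c("s1", "s1", "s1", "s2", "s2", "s2"),
  cell_type = c("T", "T", "B", "T", "B", "B")
)

# One pseudobulk column per sample x cell-type combination
group <- paste(cell_meta$sample, cell_meta$cell_type, sep = "_")

# rowsum() sums rows that share a group label, so transpose first
# (cells become rows), then transpose back to genes x pseudobulk samples.
# Groups come out in sorted order: s1_B, s1_T, s2_B, s2_T
pseudo_counts <- t(rowsum(t(counts_mat), group))
```

Summing (rather than averaging) keeps the values as counts, which is what count-based tools such as edgeR expect.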
````diff
@@ -393,58 +399,75 @@ pseudo_bulk <-
 pseudo_bulk
 ```
 
-### Tidybulk and tidySummarizedExperiment
+## Tidybulk and tidySummarizedExperiment
 
 With `tidySummarizedExperiment` and `tidybulk` it is easy to stratify our dataset for iterative self-contained analyses.
 
-# <img src="../inst/new_SE_usage-01.png" width="800px" />
-
+```{r, echo=FALSE, out.width = "800px"}
+knitr::include_graphics("../inst/vignettes/new_SE_usage-01.png")
+```
 
-To explore the grouping, we can use tidyverse `slice` to choose a row (cell_type) and `pull` to extract the values from a column. If we pull the data column we can view the SummarizedExperiment object.
 
 ```{r}
 pseudo_bulk |>
 nest(data = -cell_type)
+```
+To explore the grouping, we can use tidyverse `slice` to choose a row (cell_type) and `pull` to extract the values from a column. If we pull the data column we can view the SummarizedExperiment object.
 
-
+```{r}
 pseudo_bulk |>
 nest(data = -cell_type) |>
 slice(1) |>
 pull(data)
 ```
 
-We can then identify differentially expressed genes for each cell type for our condition of interest, progressive versus stable metastatic breast cancer.
+We can then identify differentially expressed genes for each cell type for our condition of interest, treated versus untreated patients.
 
 ```{r message=FALSE, warning=FALSE}
 # Differential transcription abundance
 pseudo_bulk <-
 pseudo_bulk |>
 
 nest(data = -cell_type) |>
-
+
+# map iterates over the data column, passing each element as .x
 mutate(data = map(
 data,
 ~ .x |>
+
+# Now using tidybulk on SummarizedExperiment
 identify_abundant(factor_of_interest = treatment) |>
-scale_abundance() |>
+scale_abundance() |>
 test_differential_abundance(~treatment)
 ))
+```
 
+```{r}
+pseudo_bulk
 ```
 
+```{r}
+pseudo_bulk |>
+slice(1) |>
+pull(data)
+```
+
+Now we can create plots for significant genes for each cell type, visualising their transcriptional abundance, without needing to create multiple objects.
 
 ```{r message = FALSE}
 pseudo_bulk <-
 
-pseudo_bulk |>
+pseudo_bulk |>
 
 # Keep only significant genes
 mutate(data = map(data, ~ filter(.x, FDR < 0.5))) |>
 
-# Filter cell types with no differential abundant gene-transcripts
+# Filter out cell types with no differentially abundant gene transcripts
+# map_int is a map that returns an integer vector
 filter(map_int(data, ~ nrow(.x)) > 0) |>
 
 # Plot
+# map2 is a map that accepts two input columns (.x, .y)
 mutate(plot = map2(
 data, cell_type,
 ~ .x |>
````
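The comments added in this hunk describe the purrr functions applied to the nested table. A self-contained sketch of the same nest, `map()`, `map_int()` and `map2()` pattern, on a hypothetical table of per-gene FDR values (not tidybulk output), with `map2_chr()`, the character-returning variant of `map2()`, building a text label instead of a plot:

```r
library(dplyr)
library(tidyr)
library(purrr)

# Hypothetical per-gene test results for three cell types
results <- tibble::tibble(
  cell_type = c("T", "T", "B", "B", "NK", "NK"),
  gene      = c("G1", "G2", "G1", "G2", "G1", "G2"),
  FDR       = c(0.01, 0.60, 0.70, 0.90, 0.20, 0.04)
)

per_type <- results |>
  # One row per cell type; its genes are stored in the list column `data`
  nest(data = -cell_type) |>
  # map(): apply a function to each element of `data` (available as .x)
  mutate(data = map(data, ~ filter(.x, FDR < 0.5))) |>
  # map_int(): like map() but returns an integer vector, usable in filter()
  filter(map_int(data, ~ nrow(.x)) > 0) |>
  # map2_chr(): iterate over two columns in parallel (.x = data, .y = cell_type)
  mutate(label = map2_chr(data, cell_type, ~ paste0(.y, ": ", nrow(.x), " significant"))) |>
  arrange(cell_type)
```

Cell type B drops out because none of its genes pass the FDR cut-off, which is exactly what the `map_int()` filter in the hunk is for.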
