Skip to content

Added covariate modelling in DESeq2 formula and PCA plots.#57

Open
RFillinger wants to merge 2 commits intomasterfrom
modelling_and_PCA_plot
Open

Added covariate modelling in DESeq2 formula and PCA plots.#57
RFillinger wants to merge 2 commits intomasterfrom
modelling_and_PCA_plot

Conversation

@RFillinger
Copy link
Contributor

No description provided.

@kew24 kew24 linked an issue Feb 27, 2026 that may be closed by this pull request
dds <- DESeqDataSet(se, design = ~ group)


if (gsub(" ", "", params$group_reg_formula ) == ""){
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just for ease of reading, I might replace this with something like: if (params$group_reg_formula %in% c("", " ", NA)){.
But, test it out and see what makes sense here with potential user input.


```

#### The model formula used: `formula`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you want this to be evaluated, you might need `r formula` -- but test & fact-check me.

Comment on lines +316 to +319
**Note, again, that covariates can be included in this analysis**;
Include a new column in the `units.tsv` file,
then add a new comparison to the `comparisons.tsv` file.
If you have a more complex experimental setup with additional covariates and/or confounders,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm realizing this type of comment would be helpful to include in the README – perhaps add that as an addition to this PR too. By the time our audience reads this, it feels a little late.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree that this text is not needed here. What is needed is to emphasize the design used, especially if 1) we are enforcing a default design of ~group 2) allowing different designs for each row in comparisons.tsv. This is already kind of implemented at https://github.com/vari-bbc/rnaseq_workflow/pull/57/changes#diff-22a091377f50349798518be7ff92f0fb7f05c174d48a7a65793a4c327c998666L159 but it could be useful to make it super obvious.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the link above doesn't work well. I was referring to this line (L159):

Coefficients in the model are: `r paste(resultsNames(dds), collapse = ", ")`.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, print an obvious warning if a variable is being treated as numeric in case it was meant to be categorical. This should be explained in the README also.


df <- cbind(pcaData$x, col_data)

df[,cols] = factor(df[,cols])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for consistency in code style, I might recommend changing the = to a <-

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know that's super nitpicky and won't really change anything though :)

out_patchwork <- c(out_patchwork, list(ggplot(df, aes(!!!gg_args)) +
geom_point(size=1) +
scale_color_manual(values = setNames(c("#440154FF","#2A788EFF", ggsci::pal_npg()(length(levels(in_dds@colData[[cols[i]]]))-2)),
scale_color_manual(values = setNames(c("#440154FF","#2A788EFF", ggsci::pal_npg()(length(levels(factor(in_dds@colData[[cols[i]]], levels = unique(in_dds@colData[[cols[i]]]))))-2)),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm hopeful that there's a better way to write this. Perhaps by levels addition wasn't the best earlier... try just replacing the whole thing with your unique call instead. Something like:

(length(unique(in_dds@colData[[cols[i]]])) - 2)

(make_PCA(vsd, ntop=10000, cols="group") |
(make_PCA(vsd, ntop=10000, cols="group", PCx = 3, PCy = 4))) +

param_list = strsplit(gsub("~", "", params$group_reg_formula), "[+]")[[1]]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same nitpicky codestyle comment as earlier – replace = with <-

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In addition to Kaitlyn's comment. Check compatibility with spaces in the formula. For example, "~ batch + condition".

@kew24
Copy link
Contributor

kew24 commented Feb 27, 2026

I like this! I've added a few comments to things to change – after those changes + after I actually test out your branch, I'll approve the PR.

out_patchwork <- c(out_patchwork, list(ggplot(df, aes(!!!gg_args)) +
geom_point(size=1) +
scale_color_manual(values = setNames(c("#440154FF","#2A788EFF", ggsci::pal_npg()(length(levels(in_dds@colData[[cols[i]]]))-2)),
scale_color_manual(values = setNames(c("#440154FF","#2A788EFF", ggsci::pal_npg()(length(levels(factor(in_dds@colData[[cols[i]]], levels = unique(in_dds@colData[[cols[i]]]))))-2)),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assumes the added covariate(s) is categorical. It is uncommon for us in the BBC, but it is possible/can be desirable for the covariate to be numeric.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good point. @RFillinger, you should be able to assign a discrete vs. continuous color scale depending on the type of covariate (or if this is too crazy, you can just stick w/ the ggplot defaults). Just make sure to keep group colors consistent between this and the heatmap (line 462).

group_lvls <- unique(ht_col_annot$group)
ht_col_colors <- list(group=setNames(c("#440154FF","#2A788EFF", ggsci::pal_npg()(length(group_lvls)-2)),
                                         nm=group_lvls))

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

add custom model formula to config

3 participants