Added covariate modelling in DESeq2 formula and PCA plots.#57
Added covariate modelling in DESeq2 formula and PCA plots.#57RFillinger wants to merge 2 commits intomasterfrom
Conversation
| dds <- DESeqDataSet(se, design = ~ group) | ||
|
|
||
|
|
||
| if (gsub(" ", "", params$group_reg_formula ) == ""){ |
There was a problem hiding this comment.
just for ease of reading, I might replace this with something like: if (params$group_reg_formula %in% c("", " ", NA)){.
But, test it out and see what makes sense here with potential user input.
|
|
||
| ``` | ||
|
|
||
| #### The model formula used: `formula` |
There was a problem hiding this comment.
if you want this to be evaluated, you might need `r formula` -- but test & fact-check me.
| **Note, again, that covariates can be included in this analysis**; | ||
| Include a new column in the `units.tsv` file, | ||
| then add a new comparison to the `comparisons.tsv` file. | ||
| If you have a more complex experimental setup with additional covariates and/or confounders, |
There was a problem hiding this comment.
I'm realizing this type of comment would be helpful to include in the README – perhaps add that as an addition to this PR too. By the time our audience reads this, it feels a little late.
There was a problem hiding this comment.
Agree that this text is not needed here. What is needed is to emphasize the design used, especially if 1) we are enforcing a default design of ~group 2) allowing different designs for each row in comparisons.tsv. This is already kind of implemented at https://github.com/vari-bbc/rnaseq_workflow/pull/57/changes#diff-22a091377f50349798518be7ff92f0fb7f05c174d48a7a65793a4c327c998666L159 but it could be useful to make it super obvious.
There was a problem hiding this comment.
the link above doesn't work well. I was referring to this line (L159):
Coefficients in the model are: `r paste(resultsNames(dds), collapse = ", ")`.
There was a problem hiding this comment.
Also, print an obvious warning if a variable is being treated as numeric in case it was meant to be categorical. This should be explained in the README also.
|
|
||
| df <- cbind(pcaData$x, col_data) | ||
|
|
||
| df[,cols] = factor(df[,cols]) |
There was a problem hiding this comment.
for consistency in code style, I might recommend changing the = to a <-
There was a problem hiding this comment.
I know that's super nitpicky and won't really change anything though :)
| out_patchwork <- c(out_patchwork, list(ggplot(df, aes(!!!gg_args)) + | ||
| geom_point(size=1) + | ||
| scale_color_manual(values = setNames(c("#440154FF","#2A788EFF", ggsci::pal_npg()(length(levels(in_dds@colData[[cols[i]]]))-2)), | ||
| scale_color_manual(values = setNames(c("#440154FF","#2A788EFF", ggsci::pal_npg()(length(levels(factor(in_dds@colData[[cols[i]]], levels = unique(in_dds@colData[[cols[i]]]))))-2)), |
There was a problem hiding this comment.
I'm hopeful that there's a better way to write this. Perhaps by levels addition wasn't the best earlier... try just replacing the whole thing with your unique call instead. Something like:
(length(unique(in_dds@colData[[cols[i]]])) - 2)| (make_PCA(vsd, ntop=10000, cols="group") | | ||
| (make_PCA(vsd, ntop=10000, cols="group", PCx = 3, PCy = 4))) + | ||
|
|
||
| param_list = strsplit(gsub("~", "", params$group_reg_formula), "[+]")[[1]] |
There was a problem hiding this comment.
same nitpicky codestyle comment as earlier – replace = with <-
There was a problem hiding this comment.
In addition to Kaitlyn's comment. Check compatibility with spaces in the formula. For example, "~ batch + condition".
|
I like this! I've added a few comments to things to change – after those changes + after I actually test out your branch, I'll approve the PR. |
| out_patchwork <- c(out_patchwork, list(ggplot(df, aes(!!!gg_args)) + | ||
| geom_point(size=1) + | ||
| scale_color_manual(values = setNames(c("#440154FF","#2A788EFF", ggsci::pal_npg()(length(levels(in_dds@colData[[cols[i]]]))-2)), | ||
| scale_color_manual(values = setNames(c("#440154FF","#2A788EFF", ggsci::pal_npg()(length(levels(factor(in_dds@colData[[cols[i]]], levels = unique(in_dds@colData[[cols[i]]]))))-2)), |
There was a problem hiding this comment.
This assumes the added covariate(s) is categorical. It is uncommon for us in the BBC, but it is possible/can be desirable for the covariate to be numeric.
There was a problem hiding this comment.
This is a good point. @RFillinger, you should be able to assign a discrete vs. continuous color scale depending on the type of covariate (or if this is too crazy, you can just stick w/ the ggplot defaults). Just make sure to keep group colors consistent between this and the heatmap (line 462).
group_lvls <- unique(ht_col_annot$group)
ht_col_colors <- list(group=setNames(c("#440154FF","#2A788EFF", ggsci::pal_npg()(length(group_lvls)-2)),
nm=group_lvls))
No description provided.