Establishes some general functions relating to single-cell DGE#18
…im if would leave none, py-only catch if no metadata would be added, py-only catch and remove fake pseudobulks created by scanpy
this looks awesome!
I think it would be worthwhile to include a pre-pseudobulk filter -- e.g. only pseudobulk a sample/cell type pair if there are at least X cells of that cell type in that sample. And possibly, downstream, a corresponding DEG filter that only pulls a DEG comparison if there are at least N samples per group?
# Count the pseudobulks built from fewer than 'min.cells' cells
too_small <- sum(psobject@meta.data[,output.metadata.cell.count] < min.cells)
if (too_small == ncol(psobject)) {
    # Trimming would leave no pseudobulks at all, so skip it and warn instead
    warning(paste0("Skipping trimming of pseudobulks smaller than 'min.cells' as NONE were built from at least ", min.cells, " cells."))
} else if (too_small > 0) {
    msg_if("\tTrimming ", too_small, " pseudobulks built from fewer than ", min.cells, " cells.")
    psobject <- psobject[,psobject@meta.data[,output.metadata.cell.count] >= min.cells]
}
I think it would be worthwhile to include a pre-pseudobulk filter -- e.g. only pseudobulk a sample/cell type pair if there are at least X cells of that cell type in that sample
This is included already, here for the R function! It currently runs after the pseudobulking, but I could move it to run beforehand instead if there's good reason.
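For reference, a rough sketch of what the "before" variant could look like on the Python side. This is only an illustration using the standard library: the function name `pairs_with_enough_cells`, the `min_cells` argument, and the toy annotations are all hypothetical and are not taken from this PR.

```python
from collections import Counter


def pairs_with_enough_cells(cells, min_cells):
    """Return the (sample, cell_type) pairs backed by at least `min_cells` cells.

    `cells` is an iterable of (sample, cell_type) tuples, one per cell.
    Hypothetical helper for illustration; not part of this PR.
    """
    counts = Counter(cells)
    return {pair for pair, n in counts.items() if n >= min_cells}


# Toy per-cell annotations (made-up data)
cells = [
    ("s1", "T"), ("s1", "T"), ("s1", "T"),
    ("s1", "B"),
    ("s2", "T"), ("s2", "T"),
]

# Decide which pairs are worth pseudobulking, then drop cells from the others
keep = pairs_with_enough_cells(cells, min_cells=2)
filtered = [c for c in cells if c in keep]
```

Filtering first means undersized pairs never become pseudobulks, so no trimming is needed afterwards. The trade-off is that the count of dropped pseudobulks has to be reported from the per-cell counts instead.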
Oh awesome! Apologies, I should have looked more carefully. I just realized it is not in the dreamlet pseudobulk function but is instead implemented in processAssays, so I was kind of making a note to myself.
I do think the downstream DEG filter requiring at least min.samples per category is also useful, though.
agreed! got pulled away before posting that half =)
Hmm, agreed. Perhaps a function that assesses the requested DGE comps per the
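One possible shape for such a check, sketched with the standard library only. Everything here is an assumption for illustration: the function name `comparison_is_testable`, the `min_samples` argument, and the sample-to-group mapping are hypothetical, not something defined in this PR.

```python
from collections import Counter


def comparison_is_testable(sample_groups, group_a, group_b, min_samples):
    """Check that both groups of a requested comparison have enough samples.

    `sample_groups` maps each pseudobulked sample id to its group label.
    Hypothetical helper for illustration; not part of this PR.
    """
    counts = Counter(sample_groups.values())
    # Counter returns 0 for labels it has never seen, so a missing group fails
    return counts[group_a] >= min_samples and counts[group_b] >= min_samples


# Toy sample metadata (made-up data): 3 control samples, 2 treated samples
sample_groups = {
    "s1": "ctrl", "s2": "ctrl", "s3": "ctrl",
    "s4": "treated", "s5": "treated",
}

ok_with_two = comparison_is_testable(sample_groups, "treated", "ctrl", min_samples=2)
ok_with_three = comparison_is_testable(sample_groups, "treated", "ctrl", min_samples=3)
```

With this toy data the comparison passes at `min_samples=2` and is skipped at `min_samples=3`, since only two treated samples exist.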
In a recent DS Working Group meeting, we discussed the utility of adding some standardized functions for performing DGE with various tools. We also laid out a few helper functions -- pseudobulking, gene filtering -- that felt required across tools.
This PR will directly include the helper functions, and I'd propose that we use this sc-dge-functions-branch as the base branch into which we'll PR all of our tool-specific DGE function builds!
Planned functions:
Side Note:
I have often found it hard to consistently import Python functions across distinct methods of running -- Jupyter notebooks, an interactive shell, running a script with python -u <script>. The method that has been working best for me recently is: