Hello @GavinHaLab and @denniepatton,
Thank you for sharing the analysis pipeline used in your CRPC subtyping paper and for such an important contribution to the community.
I'm working my way through the pipeline -- I ran ichorCNA and Griffin, and I would now like to implement ctdPheno. However, I'm having trouble determining how to format the input feature matrix (before pickling). I understand from the documentation that the matrix contains features for both reference data and samples of interest, and I am guessing the matrix includes the features shared in the CRPCSubtypingPaper/Data/ directory. But what exactly should the matrix look like before it is pickled? For example, which data are the rows vs. columns (features vs. samples, or the transpose?) and is there a required naming convention, order of features, additional normalization or formatting, etc. beyond concatenating the features as given in the Data directory above?
I searched the code and couldn't find an example of the feature matrix before pickling; just the name of the pickle file hard-coded in main() of "ctdPheno.py". I did find a reference to formatting a pickle file with an "ExploreFM.py" pipeline in the SupervisedLearning section:
    # data is formatted in the "ExploreFM.py" pipeline
but found no additional info on that pipeline.
Could you please provide an example file showing the required format of the input matrix, or point me to any documentation on this that I may have missed?
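To make the rows-versus-columns question concrete, here is a minimal sketch of one possible layout. It is a guess, not the confirmed ctdPheno format. It assumes a pandas DataFrame with one row per sample (reference and query together), one column per feature, and a label column for the reference subtype. Every sample name, feature name, label value, and the file name `feature_matrix.pkl` is a hypothetical placeholder.

```python
import os
import tempfile

import pandas as pd

# ASSUMPTION: rows are samples, columns are features. The real pipeline may
# expect the transpose or different naming; all names below are placeholders.
features = pd.DataFrame(
    {
        "label": ["subtype_A", "subtype_B", "unknown"],  # hypothetical label column
        "feature_1": [0.82, 1.05, 0.91],                 # hypothetical feature values
        "feature_2": [1.10, 0.76, 0.98],
    },
    index=["reference_1", "reference_2", "query_sample_1"],
)


def save_and_reload(df, path):
    """Pickle a DataFrame to `path` and read it back."""
    df.to_pickle(path)
    return pd.read_pickle(path)


# Round-trip through a temporary file so nothing is left on disk.
with tempfile.TemporaryDirectory() as tmp:
    reloaded = save_and_reload(features, os.path.join(tmp, "feature_matrix.pkl"))

print(reloaded.shape)          # (n_samples, n_columns)
print(list(reloaded.columns))  # label column followed by feature columns
```

If the expected matrix differs from this (for example features as rows, a specific column order, or a different way of marking reference versus query samples), knowing which of these assumptions is wrong would already answer most of my question.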
Thank you very much,
Alana Weinstein