What is the correct format for the input feature matrix prior to pickling?

Hello @GavinHaLab and @denniepatton,

Thank you for sharing the analysis pipeline used in your CRPC subtyping paper and for such an important contribution to the community. 

I'm working my way through the pipeline -- I ran ichorCNA and Griffin, and I would now like to implement ctdPheno. However, I'm having trouble determining how to format the input feature matrix (before pickling). I understand from the documentation that the matrix contains features for both reference data and samples of interest, and I am guessing the matrix includes the features shared in [the CRPCSubtypingPaper/Data/ directory](https://github.com/GavinHaLab/CRPCSubtypingPaper/tree/52119b92e8383533e3ca12bfac4a677492645f1b/Data). But what exactly should the matrix look like before it is pickled? For example, which data are the rows vs. columns (features vs. samples, or the transpose?) and is there a required naming convention, order of features, additional normalization or formatting, etc. beyond concatenating the features as given in the Data directory above?
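To make the question concrete, here is a minimal sketch of the layout I am guessing at. It is not taken from the repository. It assumes a pandas DataFrame with one row per sample and one column per feature, plus a label column for the reference samples. The sample names, feature names, the `Subtype` column, the values, and the file name are all hypothetical placeholders.

```python
import pandas as pd

# Guessed layout: one row per sample, one column per feature.
# All names and values below are placeholders, not real data.
reference = pd.DataFrame(
    {"feature_A": [0.10, 0.20], "feature_B": [1.5, 2.5]},
    index=["ref_sample_1", "ref_sample_2"],
)
# Hypothetical label column for the reference samples.
reference["Subtype"] = ["subtype_1", "subtype_2"]

# Samples of interest carry the same feature columns but no label.
test = pd.DataFrame(
    {"feature_A": [0.15], "feature_B": [2.0]},
    index=["patient_sample_1"],
)

# Stack reference and test samples; the missing label becomes NaN.
feature_matrix = pd.concat([reference, test])

# Pickle the matrix and read it back to confirm the round trip.
feature_matrix.to_pickle("feature_matrix.pkl")
restored = pd.read_pickle("feature_matrix.pkl")
print(restored)
```

If the expected matrix is instead the transpose of this (features as rows), or is not a DataFrame at all, that is exactly the detail I am hoping you can confirm.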

I searched the code and couldn't find an example of the feature matrix before pickling, only the name of the pickle file hard-coded in main() of "ctdPheno.py". I did find a reference to formatting a pickle file with an "ExploreFM.py" pipeline in the SupervisedLearning section (https://github.com/GavinHaLab/CRPCSubtypingPaper/blob/52119b92e8383533e3ca12bfac4a677492645f1b/SupervisedLearning/XGBClassifier.py#L499), but found no additional info on that pipeline.

Could you please provide an example file showing the required format of the input matrix, or point me to any documentation on this that I may have missed?

Thank you very much,
Alana Weinstein
