What is the correct format for the input feature matrix prior to pickling? #1

@alanasweinstein

Hello @GavinHaLab and @denniepatton,

Thank you for sharing the analysis pipeline used in your CRPC subtyping paper and for such an important contribution to the community.

I'm working my way through the pipeline: I have run ichorCNA and Griffin, and I would now like to run ctdPheno. However, I'm having trouble determining how to format the input feature matrix before it is pickled. I understand from the documentation that the matrix contains features for both the reference data and the samples of interest, and I am guessing that it includes the features shared in the CRPCSubtypingPaper/Data/ directory. But what exactly should the matrix look like before pickling? For example, are the rows samples and the columns features, or the transpose? And beyond concatenating the features as given in the Data directory, is there a required naming convention, feature order, or any additional normalization or formatting?
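To make the question concrete, here is a minimal sketch of the layout I am guessing at. It is only an assumption on my part, not something taken from the repository: samples as rows, features as columns, reference rows stacked above the samples of interest, and the result stored as a pickled pandas DataFrame. The helper `build_feature_matrix`, the sample IDs, the feature names, and all values are hypothetical placeholders.

```python
import pickle

import pandas as pd


def build_feature_matrix(reference: pd.DataFrame, samples: pd.DataFrame) -> pd.DataFrame:
    """Stack reference and sample tables into one matrix (rows = samples, columns = features).

    Hypothetical helper: this is a guess at the layout, not the format ctdPheno requires.
    """
    # Keep only the features present in both tables, in a stable (sorted) order.
    shared = sorted(set(reference.columns) & set(samples.columns))
    # Reference rows first, then the samples of interest.
    return pd.concat([reference[shared], samples[shared]], axis=0)


# Placeholder values; the feature names and sample IDs are made up for illustration.
reference = pd.DataFrame(
    {"feature_A": [0.1, 0.9], "feature_B": [1.2, 0.4]},
    index=["ref_1", "ref_2"],
)
samples = pd.DataFrame(
    {"feature_B": [0.7], "feature_A": [0.5], "feature_C": [3.0]},
    index=["sample_1"],
)

matrix = build_feature_matrix(reference, samples)

# Round-trip through pickle in memory; matrix.to_pickle("some_name.pkl") would write a file instead.
restored = pickle.loads(pickle.dumps(matrix))
```

If the expected layout is the transpose, or if labels or other columns are needed alongside the features, that is exactly the detail I am hoping to confirm.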

I searched the code and couldn't find an example of the feature matrix before pickling, only the name of the pickle file hard-coded in main() of "ctdPheno.py". I did find a reference to formatting a pickle file with an "ExploreFM.py" pipeline in the SupervisedLearning section:

# data is formatted in the "ExploreFM.py" pipeline

but found no additional info on that pipeline.

Could you please provide an example file showing the required format of the input matrix, or point me to any documentation on this that I may have missed?

Thank you very much,
Alana Weinstein
