The MSCRC framework collects the code used to integrate three types of omics data (mRNA expression, microRNA expression, and DNA methylation) with sparse multiple canonical correlation analysis (sparse mCCA) in order to classify colorectal cancer samples.
A multi-omics classifier was trained and is available for colorectal cancer subtype prediction through the MSCRCclassifier R package.
From the TCGA-COAD and TCGA-READ datasets, we obtained 315 primary tumor samples with matched RNA-seq, microRNA-seq, and gene-level DNA methylation profiles for multi-omics data integration. The first step comprises multi-omics data pre-processing and multi-omics data fusion. Before fusion, the mRNA and microRNA expression data must be log2-transformed, and each omics dataset must be z-score scaled. Sparse mCCA was applied for multi-omics data projection, followed by a weighted average for multi-omics data fusion. Sparse mCCA was implemented with the PMA R package.
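The pre-processing and fusion arithmetic described above can be illustrated with a minimal sketch. It is written in stdlib-only Python for illustration and is not the project's R code. The pseudocount of 1, the use of the sample standard deviation, the function names, and the fusion weights are all assumptions; the sparse mCCA projection itself is not reproduced here, so `fuse` simply takes already-projected matrices as input.

```python
import math
from statistics import mean, stdev


def log2_transform(matrix, pseudocount=1.0):
    """Log2-transform an expression matrix (rows = features, columns = samples).

    The pseudocount is an assumption; it avoids taking log2 of zero.
    """
    return [[math.log2(v + pseudocount) for v in row] for row in matrix]


def zscore_rows(matrix):
    """Scale each feature (row) to mean 0 and sample standard deviation 1."""
    scaled = []
    for row in matrix:
        mu, sd = mean(row), stdev(row)
        # A constant feature has sd == 0; map it to zeros instead of dividing by 0.
        scaled.append([(v - mu) / sd if sd > 0 else 0.0 for v in row])
    return scaled


def fuse(projections, weights):
    """Weighted average of per-omics projections.

    Each projection is a samples x components matrix; all share one shape.
    The weights are hypothetical placeholders supplied by the caller.
    """
    total = sum(weights)
    n_samples = len(projections[0])
    n_components = len(projections[0][0])
    return [
        [
            sum(w * p[i][k] for p, w in zip(projections, weights)) / total
            for k in range(n_components)
        ]
        for i in range(n_samples)
    ]


# Toy mRNA matrix: 2 features x 3 samples (made-up numbers).
mrna_scaled = zscore_rows(log2_transform([[0.0, 3.0, 7.0], [1.0, 1.0, 1.0]]))
```

In this sketch the methylation matrix would go through `zscore_rows` only, since the text applies the log2 step to the two expression data types.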
After obtaining the integrated multi-omics data, consensus clustering was applied for colorectal cancer subtyping. The details can be accessed here.
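To make the idea behind consensus clustering concrete, the sketch below computes a consensus matrix from repeated clusterings of resampled data: each entry is the fraction of runs in which two samples landed in the same cluster, out of the runs in which both were drawn. It is a stdlib-only Python illustration, not the project's R code. The base clustering algorithm and resampling scheme are not specified here, so the per-run labels are assumed to be given, and the function name is hypothetical.

```python
def consensus_matrix(runs, n_samples):
    """Build a consensus matrix from repeated clusterings.

    runs: list of dicts, each mapping a sample index to the cluster label it
          received in one resampling run (samples not drawn are absent).
    """
    together = [[0] * n_samples for _ in range(n_samples)]
    sampled = [[0] * n_samples for _ in range(n_samples)]
    for labels in runs:
        drawn = list(labels)
        for i in drawn:
            for j in drawn:
                sampled[i][j] += 1
                if labels[i] == labels[j]:
                    together[i][j] += 1
    # Pairs never drawn together get 0.0 rather than a division by zero.
    return [
        [
            together[i][j] / sampled[i][j] if sampled[i][j] else 0.0
            for j in range(n_samples)
        ]
        for i in range(n_samples)
    ]


# Three toy runs over three samples (made-up labels).
toy_runs = [{0: "a", 1: "a", 2: "b"}, {0: "a", 1: "b"}, {1: "x", 2: "x"}]
toy_consensus = consensus_matrix(toy_runs, 3)
```

Final subtypes are typically obtained by clustering this matrix, with values near 0 or 1 indicating stable assignments.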
Subsequently, using the consensus clustering results as labels, we built a multi-omics classification model with the naïve Bayes algorithm, which produced reliable and accurate classification results.
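As an illustration of how a naïve Bayes model turns cluster labels into a classifier, here is a minimal Gaussian naïve Bayes sketch in stdlib-only Python. It is not the MSCRCclassifier implementation: the Gaussian likelihood, the variance floor, the function names, and the subtype labels in the toy data are all assumptions made for the example.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y, var_floor=1e-9):
    """Estimate a log prior plus per-feature means and variances for each class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # The small floor keeps variances strictly positive.
        variances = [
            sum((v - m) ** 2 for v in col) / n + var_floor
            for col, m in zip(columns, means)
        ]
        model[label] = (math.log(n / len(X)), means, variances)
    return model


def predict_gaussian_nb(model, row):
    """Return the class with the highest log posterior for one sample."""

    def log_posterior(params):
        log_prior, means, variances = params
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
            for v, m, var in zip(row, means, variances)
        )

    return max(model, key=lambda label: log_posterior(model[label]))


# Toy fused features for four samples with hypothetical subtype labels.
toy_X = [[0.0, 0.1], [0.2, -0.1], [5.0, 5.1], [4.8, 4.9]]
toy_y = ["subtype_1", "subtype_1", "subtype_2", "subtype_2"]
toy_model = fit_gaussian_nb(toy_X, toy_y)
```

Calling `predict_gaussian_nb(toy_model, sample)` on a new fused sample returns one of the training labels, which is the same pattern used when a trained classifier is applied to an independent dataset.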
To assess the generalization of the multi-omics classifier, we applied it to independent datasets. The detailed prediction procedures can be accessed here.
