Carpentierbio/MSCRC

MSCRC

The MSCRC framework gathers the code used to study the integration of three omics data types (mRNA expression, microRNA expression, and DNA methylation) with sparse mCCA in order to classify colorectal cancer samples. A multi-omics classifier was trained and is available for colorectal cancer subtype prediction in the MSCRCclassifier R package.

workflow

Multi-omics data integration

From the TCGA-COAD and TCGA-READ datasets, we obtained 315 primary tumor samples with matched RNA-seq, microRNA-seq, and gene-level DNA methylation profiles for multi-omics data integration. The first step comprises multi-omics data pre-processing and multi-omics data fusion. Before fusion, the mRNA and microRNA expression data must be log2-transformed, and each omics dataset must be z-score scaled. Sparse mCCA was applied to project the multi-omics data, and the projections were then fused by a weighted average. Sparse mCCA was implemented with the PMA R package.
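The pre-processing and fusion arithmetic described above can be illustrated with a minimal sketch. It is written in plain Python for illustration only and is not the repository's R code. The function names, the pseudocount of 1, the use of the population standard deviation, and the fusion weights are all assumptions, and the sparse mCCA projection itself (done with PMA in the original work) is not reproduced here. The fusion helper also assumes that the projected matrices share the same shape.

```python
import math
from statistics import mean, pstdev


def log2_transform(matrix, pseudocount=1.0):
    """Apply log2(x + pseudocount) to a samples x features matrix.

    The pseudocount is an assumption; it only avoids log2(0).
    """
    return [[math.log2(x + pseudocount) for x in row] for row in matrix]


def zscore_columns(matrix):
    """Scale every feature (column) to mean 0 and standard deviation 1.

    Uses the population standard deviation; R's scale() divides by the
    sample standard deviation instead, so values differ slightly.
    """
    scaled_columns = []
    for col in zip(*matrix):
        mu = mean(col)
        sd = pstdev(col)
        # A constant feature has sd == 0; map it to 0.0 rather than divide by 0.
        scaled_columns.append([(x - mu) / sd if sd > 0 else 0.0 for x in col])
    return [list(row) for row in zip(*scaled_columns)]


def weighted_average(projections, weights):
    """Fuse equally shaped projected matrices by an element-wise weighted mean."""
    total = sum(weights)
    n_rows, n_cols = len(projections[0]), len(projections[0][0])
    return [
        [
            sum(w * p[i][j] for p, w in zip(projections, weights)) / total
            for j in range(n_cols)
        ]
        for i in range(n_rows)
    ]


# Toy expression matrix (3 samples x 2 features), hypothetical values.
mrna_counts = [[0, 3], [1, 7], [3, 15]]
mrna_scaled = zscore_columns(log2_transform(mrna_counts))

# Two hypothetical projected matrices fused with hypothetical weights 1 and 3.
fused = weighted_average([[[1.0, 2.0]], [[3.0, 4.0]]], weights=[1, 3])
```

Methylation data would skip `log2_transform` and go straight to `zscore_columns`, matching the order of operations given in the text.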

Unsupervised colorectal cancer subtyping

Once the integrated multi-omics data were obtained, consensus clustering was applied for colorectal cancer subtyping. The details can be accessed here.
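To make the idea behind consensus clustering concrete, the sketch below repeatedly subsamples the data, clusters each subsample, and records how often each pair of samples lands in the same cluster. It is an illustration in plain Python, not the procedure used in this repository. The base clusterer (a tiny k-means with farthest-first initialisation), the resampling fraction, the number of resamples, and all function names are assumptions.

```python
import random


def _sqdist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_labels(points, k, n_iter=20):
    """Tiny k-means with deterministic farthest-first initialisation."""
    centers = [points[0]]
    while len(centers) < k:
        # Next centre: the point farthest from all centres chosen so far.
        centers.append(
            max(points, key=lambda p: min(_sqdist(p, c) for c in centers))
        )
    labels = [0] * len(points)
    for _ in range(n_iter):
        labels = [
            min(range(k), key=lambda c: _sqdist(p, centers[c])) for p in points
        ]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centre if a cluster empties out
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def consensus_matrix(points, k, n_resamples=50, fraction=0.8, seed=0):
    """Fraction of resamples in which each pair of samples shares a cluster."""
    rng = random.Random(seed)
    n = len(points)
    together = [[0] * n for _ in range(n)]  # times i and j were co-clustered
    sampled = [[0] * n for _ in range(n)]   # times i and j were both drawn
    size = max(k, round(fraction * n))
    for _ in range(n_resamples):
        idx = rng.sample(range(n), size)
        labels = kmeans_labels([points[i] for i in idx], k)
        for a, i in enumerate(idx):
            for b, j in enumerate(idx):
                sampled[i][j] += 1
                together[i][j] += labels[a] == labels[b]
    return [
        [together[i][j] / sampled[i][j] if sampled[i][j] else 0.0 for j in range(n)]
        for i in range(n)
    ]


# Toy fused data: two well-separated groups of three samples (hypothetical).
toy = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [5.0, 5.0], [5.1, 5.2], [5.2, 5.1]]
consensus = consensus_matrix(toy, k=2)
```

Entries near 1 mark pairs that cluster together stably and entries near 0 mark pairs that are stably separated. Final subtype labels are usually obtained by clustering this consensus matrix, a step left out of the sketch.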

Multi-omics classifier building

Subsequently, using the consensus clustering results as labels, we built a multi-omics classification model with the naïve Bayes algorithm, which produced reliable and accurate classification results.
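As an illustration of how a naïve Bayes classifier can be trained on cluster labels, here is a minimal sketch in plain Python. It assumes Gaussian class-conditional densities for each feature, which the text does not specify. The function names, the variance floor, and the toy data are hypothetical, and this is not the model shipped in MSCRCclassifier.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y, var_floor=1e-9):
    """Estimate per-class log priors, feature means, and feature variances."""
    grouped = defaultdict(list)
    for row, label in zip(X, y):
        grouped[label].append(row)
    model = {}
    for label, rows in grouped.items():
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        # The small floor keeps the variance positive for constant features.
        variances = [
            sum((x - m) ** 2 for x in col) / n + var_floor
            for col, m in zip(zip(*rows), means)
        ]
        model[label] = (math.log(n / len(X)), means, variances)
    return model


def predict_gaussian_nb(model, row):
    """Return the label with the highest log posterior for one sample."""

    def log_posterior(params):
        log_prior, means, variances = params
        # Features are treated as independent given the class (the "naive" part).
        log_likelihood = sum(
            -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
            for x, m, v in zip(row, means, variances)
        )
        return log_prior + log_likelihood

    return max(model, key=lambda label: log_posterior(model[label]))


# Toy fused features with hypothetical cluster labels "A" and "B".
X_train = [[0.0, 0.1], [0.2, -0.1], [3.0, 3.1], [3.2, 2.9]]
y_train = ["A", "A", "B", "B"]
nb_model = fit_gaussian_nb(X_train, y_train)
predicted = predict_gaussian_nb(nb_model, [0.1, 0.0])
```

Working in log space avoids numerical underflow when many features are multiplied together, which matters for high-dimensional omics data.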

Multi-omics classifier prediction

To assess how well the multi-omics classifier generalizes, we applied it to independent datasets. The detailed prediction procedures can be accessed here.
