Ronja S Adam1,2, Dennis Poel3, Leandro Ferreira Moreno1,2, Joey M A Spronck1,2, Tim R de Back1,2, Arezo Torang1,2, Patricia M Gomez Barila1,2, Sanne Ten Hoorn1,2, Florian Markowetz4, Xin Wang5,6, Henk M W Verheul3, Tineke E Buffart1,7, Louis Vermeulen1,2.
Abstract
Previously, colorectal cancer (CRC) has been classified into four distinct molecular subtypes based on transcriptome data. These consensus molecular subtypes (CMSs) have implications for our understanding of tumor heterogeneity and the prognosis of patients. So far, this classification has been based on the use of messenger RNAs (mRNAs), although microRNAs (miRNAs) have also been shown to play a role in tumor heterogeneity and biological differences between CMSs. In contrast to mRNAs, miRNAs have a smaller size and increased stability, facilitating their detection. Therefore, we built a miRNA-based CMS classifier by converting the existing mRNA-based CMS classification using machine learning (training dataset of n = 271). The performance of this miRNA-assigned CMS classifier (CMS-miRaCl) was evaluated in several datasets, achieving an overall accuracy of ~0.72 (0.6329-0.7987) in the largest dataset (n = 158). To gain insight into the biological relevance of CMS-miRaCl, we evaluated the most important features in the classifier. We found that miRNAs previously reported to be relevant in microsatellite-instable CRCs or Wnt signaling were important features for CMS-miRaCl. Following further studies to validate its robustness, this miRNA-based alternative might simplify the implementation of CMS classification in clinical workflows.
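As an illustration of how an overall accuracy and an accompanying interval can be derived from paired class labels, the following minimal Python sketch compares miRNA-predicted classes with mRNA-based reference classes. The record does not state how the bracketed range after the accuracy was computed; the Wilson score interval used here is an assumption chosen for illustration, and the labels are hypothetical, so the output is not expected to reproduce the reported values.

```python
import math

def overall_accuracy(reference, predicted):
    """Return (number correct, total) for two equally long label lists."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(1 for r, p in zip(reference, predicted) if r == p)
    return correct, len(reference)

def wilson_interval(correct, total, z=1.96):
    """Wilson score interval for a proportion (z = 1.96 for ~95% coverage).

    Assumption: the interval type is chosen for illustration only.
    """
    p = correct / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return centre - half, centre + half

# Hypothetical labels: mRNA-based reference classes vs. miRNA-based predictions.
reference = ["CMS1", "CMS2", "CMS2", "CMS3", "CMS4", "CMS4", "CMS2", "CMS1"]
predicted = ["CMS1", "CMS2", "CMS4", "CMS3", "CMS4", "CMS2", "CMS2", "CMS1"]

correct, total = overall_accuracy(reference, predicted)
accuracy = correct / total
low, high = wilson_interval(correct, total)
print(f"accuracy = {accuracy:.2f}, interval = ({low:.2f}, {high:.2f})")
```

With small sample sizes the interval is wide, which is one reason the evaluation on the largest dataset is the most informative.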
Keywords: colorectal cancer; consensus molecular subtypes; miRNA; microRNA
Year: 2022 PMID: 35298091 PMCID: PMC9297751 DOI: 10.1002/1878-0261.13210
Source DB: PubMed Journal: Mol Oncol ISSN: 1574-7891 Impact factor: 7.449
Fig. 5. Clinical implications. (A) Cox proportional hazards model for the effect on overall survival (OS) of the 10 most important miRaCl/miRaCl‐20 features in the EGAS1127 dataset; error bars represent 95% confidence intervals. (B, C) Kaplan–Meier analyses for OS stratified by the CMS classes predicted by miRaCl or miRaCl‐20 in the EGAS1127 dataset (n = 82); log‐rank P‐value from score test. (D) Comparison of miRaCl‐predicted CMS classes in patients with paired primary (prim.) and metastasis (met.) samples, including one recurrent (recur.) colorectal tumor; LN, lymph node; NOS, not otherwise specified. If multiple metastases were available, the primary was duplicated for visualization purposes, as marked by an underscore extension of the patient ID (n = 38).
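The Kaplan–Meier analysis named in panels B and C rests on the product-limit estimator, which can be sketched in a few lines of plain Python. The follow-up times and event flags below are hypothetical and are not taken from the EGAS1127 dataset; the function name is likewise an illustrative choice.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times  : follow-up time per patient
    events : 1 if death was observed at that time, 0 if censored
    Returns a list of (time, survival probability) at each event time.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        leaving = sum(1 for ti in times if ti == t)  # deaths plus censored
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve

# Hypothetical follow-up data (months) for one predicted CMS class.
times = [5, 8, 8, 12, 15, 20]
events = [1, 1, 0, 1, 0, 1]
curve = kaplan_meier(times, events)
for t, s in curve:
    print(f"t = {t:>2}: S(t) = {s:.3f}")
```

Censored patients leave the risk set without lowering the curve, which is why the estimate only steps down at observed deaths.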
Fig. 1. Experimental setup and dataset description. (A) Schematic representation of the workflow used to train and validate the consensus molecular subtype miRNA‐assigned classifier (CMS‐miRaCl). (B) Heatmap of the 10 most significantly differentially expressed miRNAs in the colon adenocarcinoma (COAD) training dataset (n = 271) per CMS class, showing read counts after preprocessing and scaling to z‐scores. On top, the mRNA‐based CMS classification and the P‐value from the CMS classifier are depicted; on the right, the Benjamini–Hochberg adjusted P‐value from Wald statistics and the mRNA‐based class in which the miRNA was significantly upregulated. (C) Counts of samples' mRNA‐based CMS classes in the COAD training dataset (n = 271) and the rectal adenocarcinoma (READ) test dataset (n = 158). (D) Frequencies of samples' pathology‐based stage in training and test datasets.
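Two operations named in panel B, scaling to z‐scores and Benjamini–Hochberg adjustment of P‐values, can be sketched as follows. This is a minimal illustration with hypothetical numbers; the use of the sample standard deviation for scaling is an assumption, since the caption does not specify it.

```python
import statistics

def zscore(values):
    """Scale values to mean 0 and unit sample standard deviation (assumed)."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    if sd == 0:
        raise ValueError("cannot scale constant values")
    return [(v - mean) / sd for v in values]

def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest P-value so that adjusted
    # values stay monotone in the rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical expression values for one miRNA and raw P-values for four miRNAs.
scaled = zscore([1.0, 2.0, 3.0])
padj = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
print(scaled)
print([round(p, 4) for p in padj])
```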
Fig. 2. Performance of classifier. (A) Performance of different classifiers on the COAD training dataset (n = 271): random forest (rf) and support vector machines (SVM), optimizing Kappa or accuracy (acc), respectively. (B) Confusion matrix of CMS predictions from the resulting miRNA‐assigned random forest CMS classifier (miRaCl) compared with mRNA‐based CMS classes using the rectal adenocarcinoma (READ) test dataset (n = 122). (C) Confusion matrix of CMS predictions from the reduced random forest classifier based on the 20 most important features (miRaCl‐20) in rows compared with known mRNA‐based CMS classes in columns using the READ test dataset (n = 122). (D) Alluvial plot of miRaCl predictions from all 381 input features in comparison with the predictions of miRaCl‐20 in READ (n = 158) and EGAS1127 primary tumor samples (n = 126). (E) Confusion matrix of the microarray‐based GSE35834 test dataset comparing CMS predictions from the miRNA‐assigned classifier based on the 20 most important features in rows with mRNA‐based CMS predictions in columns as far as available (n = 23). (F) Confidence of miRaCl predictions, determined as the difference between the probabilities of the first and the second most likely class, in READ (n = 158) and in EGAS1127 primary tumor samples (n = 126); the means tended to differ between the CMS classes (Kruskal–Wallis test). Boxes mark the interquartile range (IQR); whiskers extend to the furthest value within 1.5*IQR (Tukey whiskers). (G) Correlation with tumor purity was tested (Pearson correlation test). (H) miRNA‐based CMS predictions as fractions of primary samples in training and test datasets, including samples where mRNA‐based classification was not possible.
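The confidence measure described in panel F, the difference between the probabilities of the most likely and the second most likely class, can be sketched as below. The class probabilities are hypothetical, and the function name is an illustrative choice rather than part of the published classifier.

```python
def prediction_confidence(class_probs):
    """Return (predicted class, margin between the top two probabilities).

    class_probs : dict mapping class label to predicted probability
    """
    if len(class_probs) < 2:
        raise ValueError("need at least two classes")
    ranked = sorted(class_probs.items(), key=lambda kv: kv[1], reverse=True)
    (top_class, p_first), (_, p_second) = ranked[0], ranked[1]
    return top_class, p_first - p_second

# Hypothetical class probabilities for one sample.
probs = {"CMS1": 0.10, "CMS2": 0.55, "CMS3": 0.05, "CMS4": 0.30}
label, margin = prediction_confidence(probs)
print(label, round(margin, 2))
```

A margin near zero indicates that two classes were almost equally likely, whereas a margin near one indicates a clear-cut call.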
Fig. 3. Important features of miRaCl. (A) Importance (Gini index) identified during the miRaCl training on the colon adenocarcinoma dataset COAD, shown for the 20 features with the highest mean decrease in impurity. Asterisk marks miRNAs previously reported as tumor‐specific [22]. (B) Density distributions (Gaussian kernel) of miRNA expression levels (read counts on log2 scale) in COAD stratified by known mRNA‐based CMS for the 10 most important miRNAs, which are used in miRaCl and miRaCl‐20. (C) Genes predicted to be targets of the miRNAs from miRaCl‐20 were first intersected with genes downregulated in each CMS and afterwards tested for overlap with Hallmark gene sets. For each CMS, we show the miRNA with the lowest P‐value (one‐sided hypergeometric tests).
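The one‐sided hypergeometric test mentioned in panel C asks how likely an overlap at least as large as the observed one would be by chance. A minimal sketch using only the Python standard library follows; the gene counts are hypothetical, and the choice of gene universe is an assumption that the caption does not specify.

```python
from math import comb

def hypergeom_upper_tail(overlap, set_size, n_targets, universe):
    """P(X >= overlap) for a hypergeometric draw.

    overlap   : observed number of target genes inside the gene set
    set_size  : number of genes in the gene set (e.g. one Hallmark set)
    n_targets : number of candidate target genes drawn
    universe  : total number of genes considered (assumed background)
    """
    max_k = min(set_size, n_targets)
    total = comb(universe, n_targets)
    favourable = sum(
        comb(set_size, k) * comb(universe - set_size, n_targets - k)
        for k in range(overlap, max_k + 1)
    )
    return favourable / total

# Hypothetical counts: 4 target genes, 2 of which fall in a 5-gene set,
# drawn from a background of 20 genes.
p_value = hypergeom_upper_tail(overlap=2, set_size=5, n_targets=4, universe=20)
print(round(p_value, 4))
```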
Fig. 4. Features of miRaCl in regulatory networks. Regulatory networks were constructed from the colon adenocarcinoma dataset COAD using up to 200 of the most differentially expressed mRNAs with absolute log2 fold change |log2FC| > 0.85 and adjusted P‐value (Padj) < 0.001 per CMS, and the most differentially expressed miRNAs with |log2FC| > 0.71 and Padj < 0.05 (Wald statistic, Benjamini–Hochberg corrected). Regulatory elements were identified amongst upregulated miRNAs with Padj < 0.001 in each CMS, respectively (A–D). Importance in miRaCl is indicated as node size; members of miRaCl‐20 are named in bold font for regulators and regular font for targets.
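The selection thresholds stated in this caption can be expressed as a simple filter. In the sketch below, the record layout, the gene names, and the ranking by adjusted P‐value are assumptions made for illustration; the caption only gives the cut-offs and the cap of 200 mRNAs.

```python
def select_de_features(results, min_abs_lfc, max_padj, top_n=None):
    """Keep features passing both thresholds, optionally capped at top_n.

    results : list of dicts with keys 'name', 'log2fc', 'padj'
    Ranking by adjusted P-value is an assumption, not stated in the caption.
    """
    kept = [
        r for r in results
        if abs(r["log2fc"]) > min_abs_lfc and r["padj"] < max_padj
    ]
    kept.sort(key=lambda r: r["padj"])
    return kept if top_n is None else kept[:top_n]

# Hypothetical differential expression results for one CMS.
mrna_results = [
    {"name": "geneA", "log2fc": 1.2, "padj": 1e-5},
    {"name": "geneB", "log2fc": -0.9, "padj": 5e-4},
    {"name": "geneC", "log2fc": 0.5, "padj": 1e-6},   # fold change too small
    {"name": "geneD", "log2fc": 2.0, "padj": 0.01},   # not significant enough
]

# mRNA thresholds from the caption: |log2FC| > 0.85, Padj < 0.001, up to 200.
selected = select_de_features(mrna_results, 0.85, 0.001, top_n=200)
print([r["name"] for r in selected])
```

The same function applies to miRNAs with the caption's miRNA thresholds (|log2FC| > 0.71, Padj < 0.05).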