Mi-Kyoung Seo, Hyundeok Kang, Sangwoo Kim.
Abstract
Detecting microsatellite instability (MSI) in colorectal cancers (CRCs) is essential because MSI status is a determinant of treatment strategies, including immunotherapy and chemotherapy. Yet no attempt has been made to exploit the transcriptomic profile and tumor microenvironment (TME) of CRC to infer MSI status. Hence, we developed a novel TME-aware, single-transcriptome predictor of MSI for CRC, called MAP (Microsatellite instability Absolute single sample Predictor). MAP was developed using recursive feature elimination-random forest with 466 CRC samples from The Cancer Genome Atlas, and its performance was validated in independent cohorts, including 1118 samples. MAP showed robustness and predictive power in predicting MSI status in CRC. Additional advantages of MAP were demonstrated through comparative analysis with an existing MSI classifier and with other cancer types. Our approach will provide access to vast amounts of untapped, publicly available transcriptomic data, widen the door for MSI CRC research, and be useful for gaining insights that help with translational medicine.
Year: 2022 PMID: 35428835 PMCID: PMC9012745 DOI: 10.1038/s41598-022-10182-3
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.996
Figure 1. Overview of the MAP model and design of the MAP signature from RFE-RF analysis of gene expression data. (a) Overview of the MAP model. MAP was developed through a workflow consisting of four strategies: (1) identification of the MAP signature (MAPgene model); (2) modeling based on pairwise gene expression of the MAP signature genes (MAPpairs model); (3) modeling based on ssGSEA scores of cancer-, molecular-, TME-, and immune-related signatures (MAPsig model); and (4) post-refinement of the final model and prediction of MSI status. (b) A volcano plot of DEGs between MSI and MSS samples. The x-axis represents log2 fold changes in gene expression for MSI versus MSS samples. Colored dots are significant DEGs in the MAP signature; red and blue indicate up- and downregulated genes, respectively. (c) The importance of the 31 features, based on accuracy and Gini index scores. The mean decrease in accuracy measures how much a feature contributes to classification accuracy. The mean decrease in Gini measures how much a feature reduces impurity when it is used to split nodes. Genes in red and blue are up- and downregulated, respectively, in MSI compared with MSS. (d) MAP signature. Box plots of MAP signature ssGSEA scores according to MSI status (left) and CMS-MSI and MSS subtypes (right). The dots represent samples. MAP signature scores differ significantly between MSI and MSS samples independent of CMS subtype. CMS2-MSI did not reach statistical significance because the number of samples was small. * P < 0.05, ** P < 0.01, *** P < 0.005. DEG, differentially expressed gene; MSI, microsatellite instability; MSS, microsatellite stability; RFE-RF, recursive feature elimination-random forest; CMS, consensus molecular subtype; ssGSEA, single-sample gene set enrichment analysis; FDR, false discovery rate.
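The "decrease in Gini" used in panel (c) can be illustrated for a single node split. The following is a minimal sketch, not the authors' implementation: it assumes the standard Gini impurity definition (1 minus the sum of squared class proportions), and the sample labels are hypothetical toy data. In a random forest, the mean decrease in Gini for a feature is typically obtained by accumulating such reductions over all splits that use the feature and averaging over trees.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_k ** 2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def gini_decrease(parent, left, right):
    """Impurity reduction from splitting `parent` into `left` and `right`."""
    n = len(parent)
    weighted_child_impurity = (
        (len(left) / n) * gini_impurity(left)
        + (len(right) / n) * gini_impurity(right)
    )
    return gini_impurity(parent) - weighted_child_impurity


# Hypothetical toy node with 4 MSI and 4 MSS samples (not data from the study).
parent = ["MSI"] * 4 + ["MSS"] * 4
left = ["MSI", "MSI", "MSI", "MSS"]   # samples sent to the left child
right = ["MSI", "MSS", "MSS", "MSS"]  # samples sent to the right child

decrease = gini_decrease(parent, left, right)
perfect = gini_decrease(parent, ["MSI"] * 4, ["MSS"] * 4)
```

A split that separates the two classes completely removes all of the parent's impurity, whereas the partially mixed split above removes only part of it.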
Figure 2. MAP model. (a) Top 30 important features of the MAPpairs model. The mean decrease in accuracy (left) measures how much a feature contributes to classification accuracy. The mean decrease in Gini (right) measures how much a feature reduces impurity when it is used to split nodes. (b) Scatter plots and density plots of the gene pairs. The scatter plot shows the relationship between the expression of two genes in the MSI and MSS groups, and the density plots at the upper and lower right corners show the expression values of each gene. MLH1-related rules and the TGFBR2/TYMS rule are shown. (c) Top 20 important features (signatures) of the MAPsig model. (d) Performance (accuracy, sensitivity, specificity, and F1) of the MAP model. (e) Confusion matrices of the validation dataset. The actual MSI is the MSI status provided in the dataset study. The red color scale reflects the percentage of class predictions against the actual class.
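The four performance measures named in panel (d) follow directly from the cells of a confusion matrix such as those in panel (e). The sketch below uses the standard definitions and treats MSI as the positive class, which is an assumption; the function name `binary_metrics` and the counts are hypothetical and are not results from the study.

```python
def binary_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity, specificity, and F1 from confusion-matrix counts.

    Assumes MSI is the positive class and every denominator is non-zero.
    """
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)   # recall for the MSI class
    specificity = tn / (tn + fp)   # recall for the MSS class
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }


# Hypothetical counts for illustration only.
metrics = binary_metrics(tp=40, fp=10, tn=140, fn=10)
```

Because MSI tumors are usually the minority class in CRC cohorts, sensitivity and F1 are more informative than accuracy alone when comparing classifiers.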
Figure 3. MSI signatures. (a) MAP signature and UCEC MSI signature on TCGA-UCEC. (b) MAP signature and STAD MSI signature on TCGA-STAD. (c) MAP signature and preMSIm signature on TCGA-COADREAD. (d) MAP signature and preMSIm signature on TCGA-STAD. (e) MAP signature and preMSIm signature on TCGA-UCEC. The x-axis represents log2 fold changes in gene expression for MSI versus MSS samples. The colored dots represent the genes of the corresponding signatures marked in each panel. The blue dotted lines mark −1 and 1 on the log2 fold change scale (x-axis) and 2 (−log10(0.01)) on the y-axis.
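The dotted reference lines described above correspond to a simple rule for calling a gene up- or downregulated. The sketch below is illustrative only: the function name `classify_gene`, the inclusive boundaries, and the example values are assumptions, and the caption does not state whether the y-axis uses raw or adjusted P values.

```python
import math


def classify_gene(log2_fc, p_value, fc_cutoff=1.0, p_cutoff=0.01):
    """Label a gene 'up', 'down', or 'ns' using volcano-plot cutoffs.

    Assumed rule: |log2 fold change| >= fc_cutoff and p_value <= p_cutoff.
    """
    if p_value > p_cutoff or abs(log2_fc) < fc_cutoff:
        return "ns"  # not significant under the assumed cutoffs
    return "up" if log2_fc > 0 else "down"


def volcano_y(p_value):
    """y-axis coordinate of a gene: -log10 of its P value."""
    return -math.log10(p_value)


# Hypothetical genes as (log2 fold change, P value) pairs, not study data.
examples = [(1.5, 0.001), (-2.0, 0.005), (0.5, 0.0001), (3.0, 0.05)]
labels = [classify_gene(fc, p) for fc, p in examples]
```

Under these assumptions, a P value of 0.01 sits at a height of about 2 on the y-axis, which is where the horizontal dotted line is drawn.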