Wenjing Qiu1,2, Jiasheng Yang1, Bing Wang1, Min Yang1,2, Geng Tian2,3, Peizhen Wang1, Jialiang Yang2,3.
Abstract
Microsatellite instability (MSI), an important biomarker for immunotherapy and for the diagnosis of Lynch syndrome, refers to changes in microsatellite (MS) sequence length caused by insertions or deletions during DNA replication. However, traditional wet-lab MSI detection is time-consuming and depends on experimental conditions. In addition, no comprehensive study has examined the associations between MSI status and molecular data types such as mRNA and miRNA. In this study, we first examined the association between MSI status and several molecular data types, including mRNA, miRNA, lncRNA, DNA methylation, and copy number variation (CNV), using colorectal cancer data from The Cancer Genome Atlas (TCGA). We then developed a novel deep learning framework to predict MSI status based solely on hematoxylin and eosin (H&E) staining images, and combined the H&E images with the above-mentioned molecular data by multimodal compact bilinear pooling. Our results showed significant differences in mRNA, miRNA, and lncRNA between the high microsatellite instability (MSI-H) patient group and the low microsatellite instability or microsatellite stable (MSI-L/MSS) patient group. Using the H&E image alone, MSI status can be predicted with an acceptable area under the curve (AUC) of 0.809 in 5-fold cross-validation. The fusion models integrating H&E images with a single type of molecular data achieved higher prediction accuracy than the model using H&E images alone, with the highest AUC of 0.952 achieved when combining H&E images with DNA methylation data. However, prediction accuracy decreased when H&E images were combined with all types of molecular data. In conclusion, deep learning on H&E images can predict the MSI status of colorectal cancer, and the accuracy can be further improved by integrating appropriate molecular data. This study may have clinical significance in practice.
Keywords: H&E images; compact bilinear pooling; microsatellite instability; multi-omics data; multimodal deep learning
Year: 2022 PMID: 35865460 PMCID: PMC9295995 DOI: 10.3389/fonc.2022.925079
Source DB: PubMed Journal: Front Oncol ISSN: 2234-943X Impact factor: 5.738
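The abstract names multimodal compact bilinear pooling as the step that fuses H&E image features with molecular features. The following is a minimal NumPy sketch of that general technique (Count Sketch projections combined by FFT-based circular convolution), not the authors' implementation. The function names `make_hash` and `mcb_fuse`, the output dimension `d`, and the 512-dimensional image vector are assumptions made for illustration; only the miRNA feature count (1,881) comes from the dataset table.

```python
import numpy as np


def make_hash(n, d, rng):
    """Draw a random bucket index in [0, d) and a random sign for each of n inputs."""
    h = rng.integers(0, d, size=n)
    s = rng.choice([-1.0, 1.0], size=n)
    return h, s


def count_sketch(x, h, s, d):
    """Count Sketch projection: out[h[i]] += s[i] * x[i]."""
    out = np.zeros(d)
    np.add.at(out, h, s * x)  # unbuffered add, so repeated buckets accumulate
    return out


def mcb_fuse(x, y, hx, sx, hy, sy, d):
    """Fuse two feature vectors into one d-dimensional vector.

    The circular convolution of the two sketches equals a Count Sketch of the
    outer product of x and y, so the full outer product is never materialised.
    """
    fx = np.fft.rfft(count_sketch(x, hx, sx, d))
    fy = np.fft.rfft(count_sketch(y, hy, sy, d))
    # Element-wise product in the frequency domain is circular convolution.
    return np.fft.irfft(fx * fy, n=d)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = 1024                          # assumed fused dimension
    image_vec = rng.normal(size=512)  # assumed pooled image feature length
    mirna_vec = rng.normal(size=1881) # miRNA feature count from the table
    hx, sx = make_hash(image_vec.size, d, rng)
    hy, sy = make_hash(mirna_vec.size, d, rng)
    fused = mcb_fuse(image_vec, mirna_vec, hx, sx, hy, sy, d)
    print(fused.shape)
```

In a trained model the hash indices and signs would be drawn once and then kept fixed, so that every sample is projected the same way.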
The properties of the dataset.
| Data Category | Abbreviation | Number of features |
|---|---|---|
| Messenger RNA | mRNA | 19,531 |
| MicroRNAs | miRNA | 1,881 |
| Long non-coding RNA | lncRNA | 7,308 |
| DNA methylation | Met | 27,578 |
| Copy number variation | CNV | 60,483 |
Figure 1. The network architecture of ResNet34.
Figure 2. Experimental flowchart. (A) H&E image data only. (B) H&E images combined with multi-omics data.
Figure 3. Differential analysis of mRNA, miRNA, and lncRNA. (A) Heat map of the top 40 differentially expressed mRNAs. (B) Heat map of the top 40 differentially expressed miRNAs. (C) Heat map of the top 40 differentially expressed lncRNAs. (D) GO analysis, including biological process (BP), cellular component (CC), and molecular function (MF). (E) KEGG enrichment analysis.
Figure 4. Performance of H&E images and of images combined with omics data. (A) AUC scores for the image alone and for the image combined with omics data. (B) Performance of each mode on the accuracy, precision, recall, and F1 score metrics. HE_omi: H&E image features combined with multi-omics features.
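The metrics named in Figure 4 (AUC, accuracy, precision, recall, and F1 score) can be computed from per-patient predictions as in the sketch below. It is an illustration only, written in plain Python under two assumptions that the source does not state: MSI-H is coded as the positive class (label 1), and scores are thresholded at 0.5 to obtain hard predictions. The function names are hypothetical.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for 0/1 labels (1 = MSI-H, assumed)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy labels and scores, invented for illustration only.
    y_true = [1, 1, 0, 0, 1, 0]
    scores = [0.9, 0.4, 0.35, 0.8, 0.7, 0.1]
    y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold
    print(binary_metrics(y_true, y_pred))
    print(roc_auc(y_true, scores))
```

Under 5-fold cross-validation, these functions would be applied to the held-out fold of each split and the five results averaged.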