Jiajie She1,2, Danna Su1, Ruiying Diao1, Liping Wang1.
Abstract
Endometriosis (EM), an estrogen-dependent inflammatory disease of unknown etiology, affects thousands of couples of childbearing age, and its early diagnosis remains very difficult. With the rapid development of sequencing technology in recent years, the accumulation of large amounts of sequencing data has made it possible to screen for important diagnostic biomarkers among EM-related genes. In this study, we used public datasets from the Gene Expression Omnibus (GEO) and ArrayExpress databases and identified seven important differentially expressed genes (DEGs) (COMT, NAA16, CCDC22, EIF3E, AHI1, DMXL2, and CISD3) with a random forest classifier. Among these DEGs, AHI1, DMXL2, and CISD3 have not previously been reported to be associated with the pathogenesis of EM. Our study indicated that these three genes might participate in the pathogenesis of EM through oxidative stress, epithelial-mesenchymal transition (EMT) with activation of the Notch signaling pathway, and mitochondrial homeostasis, respectively. We then used these seven DEGs as inputs to an artificial neural network to construct a novel diagnostic model for EM and verified its diagnostic efficacy in two public datasets. Furthermore, these seven DEGs were among the 15 hub genes identified from the constructed protein-protein interaction (PPI) network, which confirmed the reliability of the diagnostic model. We hope the diagnostic model can provide novel insights into the pathogenesis of EM and contribute to its clinical diagnosis and treatment.
Keywords: artificial neural network; diagnostic efficacy; diagnostic model; endometriosis; random forest
Year: 2022 PMID: 35350240 PMCID: PMC8957986 DOI: 10.3389/fgene.2022.848116
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
FIGURE 1. Flow chart.
FIGURE 2. Differential expression analysis. (A) Volcano plot of the differential expression results. The x-axis is log2(fold change) and the y-axis is −log10(adjusted p-value). Red dots represent significantly upregulated genes, green dots represent significantly downregulated genes, and gray dots represent genes with no significant change in expression. (B) Heatmap of these DEGs. Colors from red to pink indicate high to low expression levels. In the band above the heatmap, blue indicates disease samples and red indicates normal samples.
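The volcano-plot coordinates and color classes described in this caption can be sketched as follows. This is only an illustration: the cutoffs (`LOG2FC_CUTOFF`, `ADJ_P_CUTOFF`), the function name `volcano_point`, and the gene values are assumptions, since the record does not state the thresholds or list per-gene statistics.

```python
import math

# Hypothetical cutoffs: the source does not state the thresholds it used.
LOG2FC_CUTOFF = 1.0
ADJ_P_CUTOFF = 0.05


def volcano_point(log2_fc, adj_p):
    """Return (x, y, label) for one gene on a volcano plot."""
    x = log2_fc
    y = -math.log10(adj_p)  # larger y means stronger significance
    if adj_p < ADJ_P_CUTOFF and log2_fc >= LOG2FC_CUTOFF:
        label = "up"      # drawn as red dots in the caption
    elif adj_p < ADJ_P_CUTOFF and log2_fc <= -LOG2FC_CUTOFF:
        label = "down"    # drawn as green dots
    else:
        label = "ns"      # drawn as gray dots (no significant change)
    return x, y, label


# Made-up values for illustration only (not data from the study).
genes = {"geneA": (2.3, 0.001), "geneB": (-1.7, 0.01), "geneC": (0.2, 0.6)}
points = {name: volcano_point(fc, p) for name, (fc, p) in genes.items()}
```

A gene is colored only when it passes both the fold-change and the adjusted p-value cutoff, which is why the gray class can include genes with large but non-significant fold changes.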
FIGURE 3. Results of GO and KEGG enrichment analyses. (A) The top five GO terms for significantly upregulated genes. (B) The top five GO terms for significantly downregulated genes. (C) The top 10 KEGG pathways for significantly upregulated genes. (D) The top 10 KEGG pathways for significantly downregulated genes.
FIGURE 4. Screening DEGs with the random forest model. (A) Relationship between the number of decision trees and the model error. The x-axis represents the number of decision trees, and the y-axis represents the error rate of the constructed model. When the number of decision trees is around 219, the error rate of the constructed model is relatively stable. (B) Importance of all variables in the random forest classifier, measured by the Gini coefficient method. The x-axis represents the mean decrease in the Gini index, and the y-axis represents all variables. (C) Heatmap of k-means clustering in the GSE6364 dataset. Colors from red to blue indicate high to low expression levels. In the band above the heatmap, blue indicates disease samples and red indicates normal samples.
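To make the "mean decrease of the Gini index" in panel (B) concrete, here is a minimal sketch of the impurity decrease produced by a single split on one variable. A random forest averages such decreases over all splits on that variable and over all trees; this sketch does not reproduce that averaging or the authors' implementation. The helper names and the toy expression values are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_k ** 2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(values, labels, threshold):
    """Impurity decrease from splitting the samples at values <= threshold."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    weighted = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(labels) - weighted


# Toy expression values for illustration only (not data from the study).
labels = ["EM", "EM", "EM", "normal", "normal", "normal"]
informative = [5.1, 4.8, 5.3, 2.0, 2.2, 1.9]   # separates the two classes
noisy = [3.0, 1.0, 2.5, 3.1, 0.9, 2.4]         # barely related to the class

dec_informative = gini_decrease(informative, labels, threshold=3.0)
dec_noisy = gini_decrease(noisy, labels, threshold=2.45)
```

The informative variable yields the larger decrease because its split leaves two pure groups, so a variable like it would rank higher on the x-axis of panel (B).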
FIGURE 5. The artificial neural network model and ROC curve evaluation. (A) Visualization of the artificial neural network model. (B) ROC curve evaluation results in the GSE6364 dataset. (C) ROC curve verification results in the E-MTAB-694 dataset. (D) ROC curve verification results in the GSE7307 dataset. The x-axis and y-axis represent specificity and sensitivity, respectively. The AUC value is the area under the ROC curve.
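The AUC reported for each ROC curve can be computed without drawing the curve, because the area under the ROC curve equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below uses that equivalence. The function name `roc_auc` and the labels and scores are made up for illustration and are not outputs of the authors' model.

```python
def roc_auc(labels, scores):
    """AUC via pairwise comparison of positive and negative scores.

    labels: 1 for disease, 0 for normal; scores: model outputs,
    where a higher score means "more likely disease".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Made-up labels and scores for illustration only.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
auc = roc_auc(labels, scores)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of disease and normal samples. The pairwise loop is quadratic in the number of samples, which is acceptable for datasets of the size typical of expression studies.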