Ge Zhang, Zijing Xue, Chaokun Yan, Jianlin Wang, Huimin Luo.
Abstract
As a complex disease, gastric cancer has a high mortality rate, and few effective treatments exist for patients at an advanced stage. With the development of biological technology, a large amount of multi-omics data on gastric cancer has been generated, which enables computational methods to discover potential biomarkers of gastric cancer. Such biomarkers are important for detecting gastric cancer at earlier stages and thus for providing timely treatment. However, most biological data are high-dimensional with small sample sizes, which makes them hard to process directly without feature selection. In addition, using only some omics data, such as gene expression data, provides limited evidence for investigating gastric cancer-associated biomarkers. In this research, gene expression data and DNA methylation data are integrated to analyze gastric cancer, and a feature selection approach is proposed to identify possible biomarkers of gastric cancer. After the original data are pre-processed, mutual information (MI) is applied to select the top-ranked genes. Then, fold change (FC) and the t-test are adopted to identify differentially expressed genes (DEGs). In particular, the false discovery rate (FDR) is introduced to adjust the p-values and further screen genes. For the chosen genes, a deep neural network (DNN) model is used as the classifier to measure the quality of classification. The experimental results show that the approach achieves superior performance in terms of accuracy and other metrics. Biological analysis of the chosen genes further validates the effectiveness of the approach.
Entities:
Keywords: biomarkers; deep neural network; feature selection; gastric cancer; machine learning; omics data
Year: 2021 PMID: 33868380 PMCID: PMC8044773 DOI: 10.3389/fgene.2021.644378
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
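The screening step described in the abstract (fold change plus FDR-adjusted p-values) can be illustrated with a minimal sketch. The abstract does not say which FDR procedure is used, so the Benjamini-Hochberg adjustment below is an assumption. The function names, the cutoffs (`fc_cutoff`, `fdr_cutoff`), and the gene labels `A` to `D` are hypothetical and chosen only for illustration. The p-values are assumed to come from a per-gene t-test computed beforehand.

```python
import math


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order.

    Assumption: BH is used here only as one common FDR procedure; the
    source does not name the procedure it applies.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_degs(genes, fold_changes, p_values, fc_cutoff=2.0, fdr_cutoff=0.05):
    """Keep genes whose |log2 FC| and adjusted p-value pass both cutoffs.

    Both cutoffs are hypothetical defaults, not values from the source.
    """
    adjusted = benjamini_hochberg(p_values)
    min_log_fc = math.log2(fc_cutoff)
    return [
        gene
        for gene, fc, q in zip(genes, fold_changes, adjusted)
        if abs(math.log2(fc)) >= min_log_fc and q <= fdr_cutoff
    ]


# Toy input: tumor/normal fold changes and raw t-test p-values for four genes.
genes = ["A", "B", "C", "D"]
fold_changes = [4.0, 0.25, 1.1, 0.9]
p_values = [0.01, 0.04, 0.03, 0.005]
selected = select_degs(genes, fold_changes, p_values)
print(selected)
```

Using the absolute log2 fold change treats up-regulated and down-regulated genes symmetrically, so a fold change of 0.25 passes the same cutoff as a fold change of 4.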
Figure 1. The workflow of gastric cancer biomarker identification approach (GCBMI).
Figure 2. The process of combining data.
Benchmark dataset.
| GEO ID | GSE29272 | GSE30601 |
| Normal samples | 134 | 203 |
| Tumor samples | 134 | 94 |
| Features | 13515 | 14476 |
Parameter setting.
| GCBMI | MI: |
| ET | Default parameters |
| IG-MBKH | |
| Elastic Net | |
| MOBBA-LS | opN = 500, Population = 20, iteration = 300, alpha = 0.9, sigma = 0.7, injRate = 0.01, extRate = 0.01 |
Performance comparison on different metrics (the accuracy, precision, recall, F1-score, and AUC values are averages).
| GCBMI + DNN | 0.9836 | ||||
| ET + SVM | 0.9259 | 0.8571 | 0.9230 | 0.9333 | |
| Elastic Net + SVM | 0.8922 | 0.9003 | 0.9433 | 0.9210 | 0.8598 |
| IG-MBKH + KNN | 0.9518 | 0.9730 | 0.9166 | 0.9437 | 0.9483 |
| MOBBA-LS + SVM | 0.94 | 0.9477 | 0.9327 | 0.9401 | 0.9412 |
The bold values represent the highest value of each metric.
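For reference, the threshold-based metrics reported in the table can be computed from a confusion matrix as in the sketch below. The function name and the counts in the example are hypothetical. AUC is omitted because it requires ranked classifier scores rather than counts. Note that the table reports averages, so the F1-score of the averaged precision and recall need not equal the averaged F1-score exactly.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute accuracy, precision, recall, and F1 from confusion counts.

    tp/fp/tn/fn are true positives, false positives, true negatives,
    and false negatives, with tumor samples taken as the positive class.
    """
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical counts for a single test fold.
metrics = classification_metrics(tp=45, fp=5, tn=40, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```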
Figure 3. The experimental results of gastric cancer biomarker identification approach (GCBMI) compared with other methods.
Results with different classifiers (the accuracy, precision, recall, F1-score, and AUC values are averages).
| DNN | |||||
| KNN | 0.9776 | 0.9934 | 0.9729 | 0.9830 | 0.9795 |
| SVM | 0.9819 | 0.9878 | 0.9826 | 0.9862 | 0.9803 |
| NB | 0.9651 | 0.9698 | 0.9777 | 0.9737 | 0.9557 |
The bold values represent the highest value of each metric.
Figure 4. The experimental results of gastric cancer biomarker identification approach (GCBMI) with different classifiers.
Selected genes from the integrated gene expression and DNA methylation datasets.
| 17 | FAHD2A,PGC,FIGF,PPAP2B,FOXA1,IFITM2,HOXC10, GPRC5C,CLEC3B,FBN1,LIF,C5,PSCA,PDGFD,KCNE2, RORC,C3 | |
| 19 | PGC,FIGF,NID2,PPAP2B,IFITM2,RAB31,RORC,GPRC5C, FSCN1,TEAD4,CLEC3B,RAB17,IGFALS,C5,PSCA,PDGFD, KCNE2,COL4A1,C3 | |
| 17 | FAHD2A,PGC,PPAP2B,FOXA1,IFITM2,IGFALS,GPRC5C, TEAD4,DNM1,ORM1,PTPRN2,FBN1,PSCA,PDGFD, KCNE2,RORC,C3 | |
| 24 | PGC,FIGF,PDGFRB,PSMA7,TEAD4,C5,RORC,ADA, IFITM1,FAHD2A,PPAP2B,IGFALS,SLC1A2,GPRC5C, CLEC3B,CAPN9,KCNE2,PSCA,IFITM2,FSCN1,RPRM, PDGFD,SERPINA4,FBN1 | |
| 17 | IFITM1,PGC,FIGF,PPAP2B,KCNE2,IFITM2,HOXC10, GPRC5C,CAPN9,FBN1,HRAS,C5,PSCA,PDGFD, SERPINA4,RORC,C3 | |
| Overlapped genes in 5-CV | 8 | PGC,RORC,GPRC5C,PDGFD,KCNE2,PSCA,IFITM2, PPAP2B |
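The overlapped genes in the last row can be reproduced as a plain set intersection of the five gene lists above. The sketch assumes that each of the five unlabeled rows corresponds to one fold of the 5-fold cross-validation; the variable names are illustrative.

```python
# Gene lists copied from the five rows of the table (assumed: one row per fold).
folds = [
    {"FAHD2A", "PGC", "FIGF", "PPAP2B", "FOXA1", "IFITM2", "HOXC10", "GPRC5C",
     "CLEC3B", "FBN1", "LIF", "C5", "PSCA", "PDGFD", "KCNE2", "RORC", "C3"},
    {"PGC", "FIGF", "NID2", "PPAP2B", "IFITM2", "RAB31", "RORC", "GPRC5C",
     "FSCN1", "TEAD4", "CLEC3B", "RAB17", "IGFALS", "C5", "PSCA", "PDGFD",
     "KCNE2", "COL4A1", "C3"},
    {"FAHD2A", "PGC", "PPAP2B", "FOXA1", "IFITM2", "IGFALS", "GPRC5C", "TEAD4",
     "DNM1", "ORM1", "PTPRN2", "FBN1", "PSCA", "PDGFD", "KCNE2", "RORC", "C3"},
    {"PGC", "FIGF", "PDGFRB", "PSMA7", "TEAD4", "C5", "RORC", "ADA", "IFITM1",
     "FAHD2A", "PPAP2B", "IGFALS", "SLC1A2", "GPRC5C", "CLEC3B", "CAPN9",
     "KCNE2", "PSCA", "IFITM2", "FSCN1", "RPRM", "PDGFD", "SERPINA4", "FBN1"},
    {"IFITM1", "PGC", "FIGF", "PPAP2B", "KCNE2", "IFITM2", "HOXC10", "GPRC5C",
     "CAPN9", "FBN1", "HRAS", "C5", "PSCA", "PDGFD", "SERPINA4", "RORC", "C3"},
]

# Genes selected in every fold.
overlap = sorted(set.intersection(*folds))
print(len(overlap), overlap)
# 8 ['GPRC5C', 'IFITM2', 'KCNE2', 'PDGFD', 'PGC', 'PPAP2B', 'PSCA', 'RORC']
```

Requiring a gene to appear in all five lists is a strict stability criterion: C3, for example, is selected in four of the five lists and is therefore excluded.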
Figure 5. Heatmap of eight overlapped genes.
GO analysis of selected genes.
| GOTERM_BP_DIRECT | GO:0071560 cellular response to transforming growth factor beta stimulus | 0.003912643 | CLEC3B,FBN1, PDGFD |
| GOTERM_BP_DIRECT | GO:0043406 positive regulation of MAP kinase activity | 0.005625548 | HRAS,PDGFRB, PDGFD |
| GOTERM_BP_DIRECT | GO:0008284 positive regulation of cell proliferation | 0.01138237 | LIF,HOXC10,HRAS, PDGFRB,PDGFD |
| GOTERM_BP_DIRECT | GO:0002576 platelet degranulation | 0.016395992 | ORM1,CLEC3B, SERPINA4 |
| GOTERM_BP_DIRECT | GO:0035456 response to interferon-beta | 0.017024892 | IFITM1,IFITM2 |
| GOTERM_BP_DIRECT | GO:0035455 response to interferon-alpha | 0.018899122 | IFITM1,IFITM2 |
| GOTERM_MF_DIRECT | GO:0048407 platelet-derived growth factor binding | 0.020021643 | COL4A1,PDGFRB |
| GOTERM_MF_DIRECT | GO:0005102 receptor binding | 0.026443684 | LIF,C3,C5,PDGFRB |
| GOTERM_MF_DIRECT | GO:0005161 platelet-derived growth factor receptor binding | 0.02720561 | PDGFRB,PDGFD |
| GOTERM_BP_DIRECT | GO:0036120 cellular response to platelet-derived growth factor stimulus | 0.033768846 | PDGFRB,PDGFD |
| GOTERM_BP_DIRECT | GO:0046597 negative regulation of viral entry into host cell | 0.033768846 | IFITM1,IFITM2 |
| GOTERM_BP_DIRECT | GO:0030335 positive regulation of cell migration | 0.047784333 | HRAS,PDGFRB, PDGFD |
| GOTERM_BP_DIRECT | GO:0048008 platelet-derived growth factor receptor signaling pathway | 0.053858697 | PDGFRB, PDGFD |