A computational method for large-scale identification of esophageal cancer-related genes.

Xin He, Wei-Song Li, Zhen-Gang Qiu, Lei Zhang, He-Ming Long, Gui-Sheng Zhang, Yang-Wen Huang, Yun-Mei Zhan, Fan Meng.

Abstract

Esophageal cancer shows clear genetic susceptibility, and identifying esophageal cancer-related genes plays an important role in its prevention and treatment. Through various sequencing methods, researchers have found only a small number of genes associated with esophageal cancer. To improve the efficiency of research on the genetic susceptibility of esophageal cancer, this paper proposes a computational method for large-scale identification of esophageal cancer-related genes. The method fuses a graph convolutional network with logistic matrix factorization to identify esophageal cancer-related genes through the associations between genes. We call this method GCNLMF; it achieved an AUC of 0.927 and an AUPR of 0.86 and performed best in a comparison with five other methods. We conducted a case study of the top three predicted genes. Although the association of these three genes with esophageal cancer has not been reported in the database, studies by other researchers have shown that they are significantly associated with esophageal cancer, which supports the accuracy of the predictions of GCNLMF.
Copyright © 2022 He, Li, Qiu, Zhang, Long, Zhang, Huang, Zhan and Meng.

Keywords:  esophageal cancer; gene; gene interaction; graph convolutional network; logical matrix factorization

Year:  2022        PMID: 36052230      PMCID: PMC9425068          DOI: 10.3389/fonc.2022.982641

Source DB:  PubMed          Journal:  Front Oncol        ISSN: 2234-943X            Impact factor:   5.738


Introduction

Esophageal cancer is a common gastrointestinal malignancy whose typical clinical symptoms include retrosternal pain and progressive dysphagia (1). The incidence of esophageal cancer in China is high relative to the rest of the world, and esophageal squamous cell carcinoma (ESCC) is the more common pathological type. Symptoms are not obvious in the early stage and the disease progresses slowly, so early detection is difficult. Once esophageal cancer reaches the middle and advanced stages, treatment becomes more difficult and the prognosis is poor (2). At present, patients with esophageal cancer are treated mainly with surgery, radiotherapy, and chemotherapy; patients at an advanced stage respond poorly and have high mortality (3).
The occurrence of esophageal squamous cell carcinoma is usually a long-term, multi-stage process. Under continuous exposure to carcinogenic factors, the originally efficient and orderly epithelial renewal cycle is disrupted. The basal cells first show morphological changes, atypical hyperplasia, and invasion toward the surface, and the squamous epithelial cells show nuclear atypia and abnormal differentiation. In the early stage of carcinogenesis, this pathological change is confined to the mucosal layer and does not break through the basement membrane to infiltrate downward. It is called squamous epithelial dysplasia and is the only recognized precancerous lesion of esophageal squamous cell carcinoma (4). A 13-year prospective cohort study (5) followed normal and precancerous individuals in Linzhou, Henan Province. It found that, compared with normal individuals, the relative risk of esophageal squamous cell carcinoma in patients with precancerous lesions (regardless of the grade of the lesion) was 12.7 (5.5-29.6).
Moreover, by the end of the study the cumulative incidence of esophageal squamous cell carcinoma was 58% in patients initially diagnosed with precancerous lesions, compared with 8% in the population initially diagnosed with no abnormality. Atypical hyperplasia of the squamous epithelium is therefore a high-risk factor for, and a predictor of, esophageal squamous cell carcinoma, and timely diagnosis of patients with precancerous lesions is an important means of reducing its incidence. At present, regular gastroscopy screening of high-risk groups is an effective method for early diagnosis of esophageal cancer. However, because of heterogeneity between patients, patients with the same diagnosis can still have different outcomes. An in-depth understanding of how the esophageal epithelium progresses from normal tissue to precancerous lesions to tumors, together with a comprehensive analysis of the molecular mechanism of tumorigenesis, is therefore essential for evaluating the risk of progression in patients with precancerous lesions, improving diagnosis and cure rates, and increasing the opportunities for early diagnosis and treatment.
With the progress of next-generation sequencing technology, multi-omics research on tumors has become an indispensable means of exploring the mechanisms of tumor occurrence and development. In recent years, a number of esophageal cancer genomic studies, including The Cancer Genome Atlas (TCGA) project, have identified a large number of genomic variants in esophageal squamous cell carcinoma by whole-exome or whole-genome sequencing of clinically collected tumor tissue samples (6).
Although these studies reveal the important role of the identified genomic alterations in ESCC, their cross-sectional design leaves unanswered the question of how normal epithelial cells are transformed into invasive carcinomas through mutations in precancerous lesions. Compared with esophageal squamous cell carcinoma, esophageal precancerous lesions remain little studied. Some researchers used microdissection to collect tumor lesions and adjacent precancerous lesions from paraffin sections of 45 cases of esophageal squamous cell carcinoma, as well as lesions from paraffin sections of 13 precancerous cases, for whole-exome sequencing. The analysis showed that epithelial cells in the precancerous stage already carry mutations similar to those of tumors, including high-frequency mutations in esophageal cancer driver genes such as TP53, NFE2L2, NOTCH1, and FAT1, indicating that epithelial cells in the precancerous stage have already been affected by genomic variation (7). Similarly, in another report, researchers performed whole-exome sequencing on 227 samples at different pathological stages from 70 patients with esophageal squamous cell carcinoma and also found that dysplasia and esophageal squamous cell carcinoma share similar driver genes. They further found that simple non-dysplastic tissue did not carry the genomic alterations seen in the cancer foci, indicating that most genomic events related to canceration begin at the stage of precancerous lesions (8). Researchers have also studied genomic mutations in the pathologically normal esophagus (9). Their results showed that although the exon mutation burden of normal esophageal epithelial cells (from individuals without esophageal squamous cell carcinoma) increases with age, no cancer-related morphological changes occur from a histopathological point of view.
The above studies suggest that, in the overall tissue environment, genomic changes in epithelial cells are not enough to fully explain the occurrence of esophageal cancer. Other factors, such as immunosuppression in the tumor microenvironment (TME) and cell-cell interaction, may also play an important role in the occurrence of esophageal squamous cell carcinoma. A number of experimental studies and clinical analyses have revealed the impact of the TME on tumorigenesis and development in esophageal squamous cell carcinoma. By staining FFPE tissue sections from patients with esophageal squamous cell carcinoma, Kashima et al. (10) found that the staining intensity of cancer-associated fibroblasts (CAFs) was significantly positively correlated with lymph node metastasis. They then verified through in vitro experiments and in situ metastasis mouse models that CAFs promote the metastatic ability of cancer cells and can serve as a marker of patient prognosis. Another experimental study on the microenvironment cells of esophageal squamous cell carcinoma found that upregulation of the transcription factor FOXO1 promotes the polarization of macrophages from M0 to M2 by regulating the expression of CCL20 and CSF1; M2 cells exert anti-inflammatory and immunosuppressive regulatory functions and promote tumor occurrence (11). Similarly, Yang et al. found that blocking the recruitment of tumor-associated macrophages (TAMs) significantly reduces tumor incidence in a mouse tumorigenesis model and enhances the anti-tumor effect of CD8+ T cells in the tumor microenvironment.
More importantly, M2 polarization increases the expression of PD-L2 in TAMs, leading to immune evasion and tumor promotion through the PD-1 signaling pathway (12).
Despite a large number of biological experiments, only a small number of genes related to esophageal cancer have been found. In recent years, some researchers have identified esophageal cancer-related genes through computational methods such as machine learning. Liu et al. (13) identified genetic biomarkers of esophageal cancer by SALP-seq and machine learning methods. Wang et al. (14) identified the survival risk of esophageal cancer through the Kohonen network clustering algorithm and a kernel extreme learning machine. Li et al. (15) used five conventional machine learning methods to identify key prognostic molecules in esophageal squamous cell carcinoma. Most of these studies obtained esophageal cancer-related genes by differential expression analysis of data from a small number of patients, so their sample sizes are insufficient and their results are subject to sample-specific bias. Predicting disease-related features through associations between biomolecules has become a trend (16, 17). We therefore identify esophageal cancer-related genes through their associations and correlation signatures: starting from the gene signatures known to be associated with esophageal cancer, we construct a computational model to explore the association of other genes with esophageal cancer.

Materials and methods

DisGeNET (18) reports 41 genes as related to esophageal cancer. We constructed a gene interaction network for these genes with STRING (19), as shown in Figure 1.
Figure 1

Gene interaction network of 41 esophageal cancer-related genes.

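We implemented a graph convolutional network (GCN) to extract the features of each gene from the gene interaction network. A graph network requires a node feature matrix and an adjacency matrix as input so that the aggregation operation over nodes can be performed. The input feature matrix $H^{0}$ has dimension $N \times F^{0}$, where $N$ is the number of nodes in the graph and $F^{0}$ is the number of input features per node. The adjacency matrix $A$ is the matrix representation of the graph structure and has dimension $N \times N$. The aggregated representation of a node is the feature aggregation of its neighboring nodes and does not contain the node's own features, so only nodes with self-loops contain their own features in this aggregation. The propagation rule of the network is

$$H^{i+1} = \sigma\left(A H^{i} W^{i}\right),$$

where $H^{i}$ is the node representation at the $i$-th layer, $W^{i}$ is the weight matrix of the $i$-th layer, $\sigma(\cdot)$ is a nonlinear activation function, and the weights are shared among nodes. Under this rule a node with a large degree has large values in its feature representation and a node with a small degree has small values, which may cause gradients to vanish or explode and also affects the stochastic gradient descent algorithm. The feature representation therefore needs to be normalized: the matrix $A$ is multiplied by the inverse of the degree matrix $D$, that is, $A$ is replaced by $D^{-1}A$.
We implemented logistic matrix factorization (LogisticMF) to identify esophageal cancer-related genes. Unlike most previous matrix factorization models, LogisticMF does not use RMSE as its loss function but takes a probabilistic approach. Specifically, the observation matrix $R$ is approximated by the inner product of two low-dimensional matrices $X$ and $Y$, whose rows $x_u$ and $y_i$ are latent vectors of dimension $f$. Let $l_{ui}$ denote the event that esophageal cancer ($u$) is related to gene $i$. Its conditional probability is

$$p(l_{ui} \mid x_u, y_i, \beta_u, \beta_i) = \frac{\exp\left(x_u y_i^{T} + \beta_u + \beta_i\right)}{1 + \exp\left(x_u y_i^{T} + \beta_u + \beta_i\right)},$$

where $\beta_u$ and $\beta_i$ are bias terms.
As in collaborative filtering for implicit feedback datasets, LogisticMF uses a confidence value to represent how often an association is observed. The confidence mapping function can take the form

$$c_{ui} = \alpha r_{ui},$$

where $r_{ui}$ is the corresponding entry of $R$ and $\alpha$ is a smoothing parameter that adjusts the weight of positive and negative examples. Combining the above formulas gives the likelihood

$$L(R \mid X, Y, \beta) = \prod_{u,i} p(l_{ui} \mid x_u, y_i, \beta_u, \beta_i)^{\alpha r_{ui}} \left(1 - p(l_{ui} \mid x_u, y_i, \beta_u, \beta_i)\right).$$

Furthermore, the latent factor matrices underlying the association between esophageal cancer and genes are assumed to follow zero-mean Gaussian priors:

$$p(X \mid \sigma^{2}) = \prod_{u} \mathcal{N}(x_u \mid 0, \sigma_u^{2} I), \qquad p(Y \mid \sigma^{2}) = \prod_{i} \mathcal{N}(y_i \mid 0, \sigma_i^{2} I).$$

The log posterior is then, up to an additive constant,

$$\log p(X, Y, \beta \mid R) = \sum_{u,i} \left[ \alpha r_{ui}\left(x_u y_i^{T} + \beta_u + \beta_i\right) - \left(1 + \alpha r_{ui}\right) \log\left(1 + \exp\left(x_u y_i^{T} + \beta_u + \beta_i\right)\right) \right] - \frac{\lambda}{2} \sum_{u} \lVert x_u \rVert^{2} - \frac{\lambda}{2} \sum_{i} \lVert y_i \rVert^{2},$$

where $\lambda$ is the regularization weight implied by the prior variance. We maximize the posterior probability with alternating gradient updates, fixing one set of factors while updating the other. The partial derivatives with respect to $x_u$ and $\beta_u$ are

$$\frac{\partial}{\partial x_u} = \sum_{i} \left[ \alpha r_{ui}\, y_i - \frac{\left(1 + \alpha r_{ui}\right) y_i \exp\left(x_u y_i^{T} + \beta_u + \beta_i\right)}{1 + \exp\left(x_u y_i^{T} + \beta_u + \beta_i\right)} \right] - \lambda x_u, \qquad \frac{\partial}{\partial \beta_u} = \sum_{i} \left[ \alpha r_{ui} - \frac{\left(1 + \alpha r_{ui}\right) \exp\left(x_u y_i^{T} + \beta_u + \beta_i\right)}{1 + \exp\left(x_u y_i^{T} + \beta_u + \beta_i\right)} \right],$$

and the derivatives with respect to $y_i$ and $\beta_i$ are analogous.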
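
The two building blocks described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the ReLU activation, the addition of self-loops before normalization, the plain gradient-ascent step, and all hyperparameter values (`f`, `alpha`, `lam`, `lr`, `n_iter`) are assumptions chosen for illustration, and the toy matrices are hypothetical. How the GCN features enter the factorization in GCNLMF is not specified here and is not modeled.

```python
import numpy as np

def normalize_adjacency(A):
    """Add self-loops (assumption) and row-normalize: D^{-1} (A + I)."""
    A_hat = A + np.eye(A.shape[0])
    deg = A_hat.sum(axis=1)
    return A_hat / deg[:, None]

def gcn_layer(A_norm, H, W):
    """One propagation step H' = sigma(A_norm H W), with sigma = ReLU (assumption)."""
    return np.maximum(A_norm @ H @ W, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic_mf(R, f=4, alpha=2.0, lam=0.1, lr=0.05, n_iter=200, seed=0):
    """Alternating gradient ascent on the LogisticMF log posterior.

    R has one row per disease and one column per gene; returns the matrix
    of predicted association probabilities.
    """
    rng = np.random.default_rng(seed)
    n, m = R.shape
    X = rng.normal(scale=0.1, size=(n, f))   # row (disease) latent factors
    Y = rng.normal(scale=0.1, size=(m, f))   # column (gene) latent factors
    bu = np.zeros((n, 1))                    # row biases beta_u
    bi = np.zeros((1, m))                    # column biases beta_i
    C = alpha * R                            # confidence c_ui = alpha * r_ui
    for _ in range(n_iter):
        # Update row factors and biases with the column side fixed.
        P = sigmoid(X @ Y.T + bu + bi)
        G = C - (1.0 + C) * P                # derivative w.r.t. the logit
        X += lr * (G @ Y - lam * X)
        bu += lr * G.sum(axis=1, keepdims=True)
        # Update column factors and biases with the row side fixed.
        P = sigmoid(X @ Y.T + bu + bi)
        G = C - (1.0 + C) * P
        Y += lr * (G.T @ X - lam * Y)
        bi += lr * G.sum(axis=0, keepdims=True)
    return sigmoid(X @ Y.T + bu + bi)

# Hypothetical toy inputs: a 3-node path graph and one disease with 5 genes.
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
A_norm = normalize_adjacency(A)
H1 = gcn_layer(A_norm, np.eye(3), np.ones((3, 2)))
R = np.array([[1, 1, 0, 0, 0]], dtype=float)
scores = fit_logistic_mf(R)
```

In this toy run the two genes marked as known associations receive higher scores than the three unlabeled ones, which is the ranking behavior the method relies on.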

Results

Experiment workflow

We obtained 41 genes related to esophageal cancer, and we also need negative samples to build our model. We therefore randomly selected 200 genes as negative samples. We used 10-fold cross-validation to verify the accuracy of our model: the samples were divided into 10 groups, nine of which were used to build the model and the remaining one to test it.
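
The split described above can be sketched in plain Python. The stratification by class (keeping positives and negatives spread evenly over the folds) is an assumption, since the text only states that the samples were divided into 10 groups; the function name and the seed are likewise illustrative.

```python
import random

def stratified_kfold(labels, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs with a roughly constant class ratio per fold."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # Deal the shuffled indices of this class round-robin into the folds.
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    for t in range(k):
        test = sorted(folds[t])
        train = sorted(i for s in range(k) if s != t for i in folds[s])
        yield train, test

# 41 positive genes and 200 randomly selected negative genes, as in the text.
labels = [1] * 41 + [0] * 200
splits = list(stratified_kfold(labels, k=10))
```

With 41 positives, each test fold holds four or five positive genes and 20 negatives, so every fold contains both classes and AUC and AUPR are defined for each of the ten tests.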

Performance of GCNLMF

We apply two evaluation metrics, AUC and AUPR, to evaluate our method. The experimental results of the ten tests are shown in Figures 2 and 3.
Figure 2

AUC curves of GCNLMF in 10-fold cross-validation.

Figure 3

AUPR of GCNLMF in 10-fold cross-validation.

The mean AUC is 0.927 with a standard deviation of 0.035, and the mean AUPR is 0.86 with a standard deviation of 0.021. The cross-validation experiment shows that the predictive performance of GCNLMF is high and stable across the ten folds.
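
For readers unfamiliar with the two metrics, the following plain-Python sketch shows one way to compute them for a single fold and to summarize them over folds. It makes several assumptions: AUC is computed as the fraction of positive-negative pairs ranked correctly, AUPR is estimated by average precision (other estimators, such as trapezoidal integration, give slightly different values), the standard deviation is the sample standard deviation, and the toy labels and scores are hypothetical rather than taken from the experiments.

```python
from statistics import mean, stdev

def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def average_precision(y_true, scores):
    """Mean of the precision values at the rank of each positive (needs >= 1 positive)."""
    ranked = sorted(zip(scores, y_true), key=lambda t: -t[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits

def summarize(per_fold_values):
    """Return (mean, sample standard deviation) over the folds."""
    return mean(per_fold_values), stdev(per_fold_values)

# Hypothetical scores for one small test fold.
y_true = [1, 1, 0, 1, 0, 0]
y_score = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]
auc = roc_auc(y_true, y_score)
ap = average_precision(y_true, y_score)
```

Because only about one sample in six is positive in this dataset, AUPR is the more demanding of the two metrics: a random ranking gives an AUC near 0.5 but an AUPR near the positive fraction.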

Comparison experiments

To demonstrate the advantage of GCNLMF, we compared it with five other methods: random forest (RF), gradient boosting decision tree (GBDT), GCN, LMF, and support vector machine (SVM). The AUC for each method is the average value obtained by 10-fold cross-validation. In RF, the number of decision trees was set to 100. The results are shown in Figure 4. GCNLMF had the highest performance among all methods according to both AUC and AUPR. Compared with GBDT, RF, GCN, LMF, and SVM, the AUC of GCNLMF increased by 14%, 9.6%, 1.4%, 3%, and 7.4%, respectively, and the AUPR increased by 15%, 9.7%, 1.4%, 2.3%, and 9.6%, respectively.
Figure 4

Results of GCNLMF compared to the other five methods.
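
A comparison of this kind can be sketched with scikit-learn for the three conventional baselines (RF, GBDT, SVM). This is an illustration only: apart from the 100 trees in RF, all settings are library defaults rather than the settings used in the experiments, the function name `compare_baselines` is hypothetical, and the feature matrix is synthetic data standing in for the real gene features, so the numbers it produces say nothing about the reported results.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC

def compare_baselines(X, y, seed=0):
    """Return {name: (mean AUC, mean AUPR)} from 10-fold cross-validation."""
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
    models = {
        "RF": RandomForestClassifier(n_estimators=100, random_state=seed),
        "GBDT": GradientBoostingClassifier(random_state=seed),
        "SVM": SVC(random_state=seed),
    }
    results = {}
    for name, model in models.items():
        # Both scorers rank samples by decision_function or predict_proba.
        auc = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
        aupr = cross_val_score(model, X, y, cv=cv, scoring="average_precision")
        results[name] = (float(auc.mean()), float(aupr.mean()))
    return results

# Synthetic stand-in data: 41 positives and 200 negatives with 8 features each.
rng = np.random.default_rng(0)
y = np.array([1] * 41 + [0] * 200)
X = rng.normal(size=(241, 8)) + 1.5 * y[:, None]
results = compare_baselines(X, y)
```

Using the same fold assignment for every model, as done here through a shared `cv` object, keeps the comparison between methods from being confounded by differences in the split.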


Case study

After building the GCNLMF model, we used it to predict novel esophageal cancer-related genes. IL-10 is not reported as related to esophageal cancer in the public database, but GCNLMF predicted it as an esophageal cancer-related gene. By comparing the IL-10 genotypes of 246 pathologically confirmed esophageal cancer patients and 492 healthy control subjects, Yang et al. (20) found that the -1082G/A (rs1800896) genetic variation can be used as a candidate biomarker to predict susceptibility to esophageal cancer. Sun et al. (21) found that ETV5 is upregulated in esophageal squamous cell carcinoma and is associated with tumor staging and prognosis; knockdown of ETV5 or of its downstream genes SKA1 and TRPV2 significantly suppresses the migration and invasion of esophageal squamous cell carcinoma cells. Kuerbanjiang et al. (22) detected the expression of BRAF in esophageal cancer samples by tissue microarray; their results showed that BRAF plays an important role in the proliferation, invasion, and metastasis of esophageal cancer and that overexpression of BRAF leads to shortened overall survival.

Conclusions

Esophageal cancer shows obvious familial aggregation, which is related to population susceptibility and environmental conditions. In areas with a high incidence of esophageal cancer, it is not uncommon for families to have esophageal cancer patients in three or more consecutive generations. It is therefore important to discover the genetic factors of esophageal cancer. Most previous studies have compared esophageal cancer patients with healthy people by DNA sequencing and RNA sequencing to find gene mutations and abnormal gene expression associated with esophageal cancer. However, such methods are costly in time and money, their sample sizes are limited, and samples differ from one another. As a result, few genes have been associated with esophageal cancer, and the associations found are not always reliable. Our previous studies have also confirmed the critical role of key genes and signaling pathways in the progression of esophageal cancer (23–25).
This paper proposes GCNLMF, a method for large-scale identification of esophageal cancer-related genes, which can effectively identify the characteristics of esophageal cancer-related genes. Through the correlations between genes and their characteristics, more genes related to esophageal cancer can be predicted. To verify the accuracy of GCNLMF, we used 10-fold cross-validation. The AUC of GCNLMF was 0.927 and the AUPR was 0.86, and the standard deviations of these two indicators over the ten folds were small (0.035 and 0.021), which shows that the method is robust. We also compared GCNLMF with five other commonly used methods and found that its AUC and AUPR were higher than those of the other methods. To verify the esophageal cancer-related genes predicted by GCNLMF, we selected the top three genes in the prediction results for a case study. Although the association of these three genes with esophageal cancer has not been reported in the database, studies by other researchers have shown that these three genes are significantly associated with esophageal cancer, which supports the accuracy of the prediction results of GCNLMF.

Data availability statement

The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found in the article/.

Ethics statement

Ethical review and approval was not required for the study on human participants in accordance with the local legislation and institutional requirements. Written informed consent for participation was not required for this study in accordance with the national legislation and the institutional requirements.

Author contributions

XH and FM participated in its design. W-SL, Z-GQ, LZ, H-ML, Y-WH and Y-MZ interpreted and analyzed the data. XH, W-SL and G-sZ wrote the paper. All authors contributed to the article and approved the submitted version.

Funding

Financial support comes from the National Natural Science Foundation of China (No. 81960438, 82103067), Natural Science Foundation of Jiangxi Province (No. 20212BAB206078) and Jiangxi Provincial Education Foundation (No. GJJ190790, GJJ190792).

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
  24 in total

1.  Role of IL-10 gene polymorphisms on the susceptibility for esophageal cancer and its association with environmental factors.

Authors:  Yingtao Yang; Xianen Fa
Journal:  Int J Clin Exp Pathol       Date:  2015-08-01

2.  Genetic landscape of esophageal squamous cell carcinoma.

Authors:  Yi-Bo Gao; Zhao-Li Chen; Jia-Gen Li; Xue-Da Hu; Xue-Jiao Shi; Zeng-Miao Sun; Fan Zhang; Zi-Ran Zhao; Zi-Tong Li; Zi-Yuan Liu; Yu-Da Zhao; Jian Sun; Cheng-Cheng Zhou; Ran Yao; Su-Ya Wang; Pan Wang; Nan Sun; Bai-Hua Zhang; Jing-Si Dong; Yue Yu; Mei Luo; Xiao-Li Feng; Su-Sheng Shi; Fang Zhou; Feng-Wei Tan; Bin Qiu; Ning Li; Kang Shao; Li-Jian Zhang; Lan-Jun Zhang; Qi Xue; Shu-Geng Gao; Jie He
Journal:  Nat Genet       Date:  2014-08-24       Impact factor: 38.330

3.  Prediction and collection of protein-metabolite interactions.

Authors:  Tianyi Zhao; Jinxin Liu; Xi Zeng; Wei Wang; Sheng Li; Tianyi Zang; Jiajie Peng; Yang Yang
Journal:  Brief Bioinform       Date:  2021-02-08       Impact factor: 11.622

4.  Genetic Alterations in Esophageal Tissues From Squamous Dysplasia to Carcinoma.

Authors:  Xi Liu; Min Zhang; Songmin Ying; Chong Zhang; Runhua Lin; Jiaxuan Zheng; Guohong Zhang; Dongping Tian; Yi Guo; Caiwen Du; Yuping Chen; Shaobin Chen; Xue Su; Juan Ji; Wanting Deng; Xiang Li; Shiyue Qiu; Ruijing Yan; Zexin Xu; Yuan Wang; Yuanning Guo; Jiancheng Cui; Shanshan Zhuang; Huan Yu; Qi Zheng; Moshe Marom; Sitong Sheng; Guoqiang Zhang; Songnian Hu; Ruiqiang Li; Min Su
Journal:  Gastroenterology       Date:  2017-03-30       Impact factor: 22.682

5.  Squamous dysplasia--the precursor lesion for esophageal squamous cell carcinoma.

Authors:  Philip R Taylor; Christian C Abnet; Sanford M Dawsey
Journal:  Cancer Epidemiol Biomarkers Prev       Date:  2013-04       Impact factor: 4.254

6.  ETV5 overexpression promotes progression of esophageal squamous cell carcinoma by upregulating SKA1 and TRPV2.

Authors:  Ming-Chuang Sun; Kang Fang; Zhao-Xing Li; Yuan Chu; Ai-Ping Xu; Zi-Ying Zhao; Zhu-Yun Leng; Yun-Wei Zhang; Ze-Hua Zhang; Li Zhang; Tao Chen; Mei-Dong Xu
Journal:  Int J Med Sci       Date:  2022-06-21       Impact factor: 3.642

7.  PLCD1 Suppressed Cellular Proliferation, Invasion, and Migration via Inhibition of Wnt/β-Catenin Signaling Pathway in Esophageal Squamous Cell Carcinoma.

Authors:  Xin He; Fan Meng; Zhong-Jian Yu; Xiong-Jie Zhu; Ling-Yu Qin; Xiao-Ran Wu; Zhi-le Liu; Ying Li; Yan-Fang Zheng
Journal:  Dig Dis Sci       Date:  2020-03-31       Impact factor: 3.199

Review 8.  Cytokine-chemokine network driven metastasis in esophageal cancer; promising avenue for targeted therapy.

Authors:  Ajaz A Bhat; Sabah Nisar; Selma Maacha; Tatiana Correa Carneiro-Lobo; Sabah Akhtar; Kodappully Sivaraman Siveen; Nissar A Wani; Arshi Rizwan; Puneet Bagga; Mayank Singh; Ravinder Reddy; Shahab Uddin; Jean-Charles Grivel; Gyan Chand; Michael P Frenneaux; Mushtaq A Siddiqi; Davide Bedognetti; Wael El-Rifai; Muzafar A Macha; Mohammad Haris
Journal:  Mol Cancer       Date:  2021-01-04       Impact factor: 27.401

9.  Somatic mutant clones colonize the human esophagus with age.

Authors:  Iñigo Martincorena; Joanna C Fowler; Agnieszka Wabik; Andrew R J Lawson; Federico Abascal; Michael W J Hall; Alex Cagan; Kasumi Murai; Krishnaa Mahbubani; Michael R Stratton; Rebecca C Fitzgerald; Penny A Handford; Peter J Campbell; Kourosh Saeb-Parsy; Philip H Jones
Journal:  Science       Date:  2018-10-18       Impact factor: 47.728

10.  Genomic comparison of esophageal squamous cell carcinoma and its precursor lesions by multi-region whole-exome sequencing.

Authors:  Xi-Xi Chen; Qian Zhong; Yang Liu; Shu-Mei Yan; Zhang-Hua Chen; Shan-Zhao Jin; Tian-Liang Xia; Ruo-Yan Li; Ai-Jun Zhou; Zhe Su; Yu-Hua Huang; Qi-Tao Huang; Li-Yun Huang; Xing Zhang; Yan-Na Zhao; Jin-Ping Yun; Qiu-Liang Wu; Dong-Xin Lin; Fan Bai; Mu-Sheng Zeng
Journal:  Nat Commun       Date:  2017-09-12       Impact factor: 14.919
