BACKGROUND: Esophageal cancer (ESCA) is one of the most deadly malignancies in the world. Although the management and treatment of patients with ESCA have improved, the overall 5-year survival rate is still very poor. METHODS: The study aimed to identify potential key genes associated with the pathogenesis and prognosis of ESCA. In the study, integrated bioinformatics methods were used to screen differentially expressed genes (DEGs) between ESCA and normal tissue in the data set of gene expression profiles. The hub gene in DEGs was further analyzed by protein-protein interaction (PPI) network and survival analysis to explore its relationship with the pathogenesis and poor prognosis of ESCA. RESULTS: 134 up-regulated genes and 183 down-regulated genes were obtained in ESCA compared with normal tissues. Moreover, the PPI network was established with 176 nodes and 800 interactions. Ten hub genes (AURKA, CDC20, BUB1, TOP2A, ASPM, DLGAP5, TPX2, CENPF, UBE2C, and NEK2) were filtered out based on the degree value. Functional enrichment analysis indicated that a variety of extracellular related items and ECM-receptor interaction pathway were all correlated with the ESCA. CONCLUSIONS: The results of this study would provide some guidance for further study of diagnostic and prognostic biomarkers to promote ESCA treatment.
BACKGROUND: Esophageal cancer (ESCA) is one of the most deadly malignancies in the world. Although the management and treatment of patients with ESCA have improved, the overall 5-year survival rate is still very poor. METHODS: The study aimed to identify potential key genes associated with the pathogenesis and prognosis of ESCA. In the study, integrated bioinformatics methods were used to screen differentially expressed genes (DEGs) between ESCA and normal tissue in the data set of gene expression profiles. The hub gene in DEGs was further analyzed by protein-protein interaction (PPI) network and survival analysis to explore its relationship with the pathogenesis and poor prognosis of ESCA. RESULTS: 134 up-regulated genes and 183 down-regulated genes were obtained in ESCA compared with normal tissues. Moreover, the PPI network was established with 176 nodes and 800 interactions. Ten hub genes (AURKA, CDC20, BUB1, TOP2A, ASPM, DLGAP5, TPX2, CENPF, UBE2C, and NEK2) were filtered out based on the degree value. Functional enrichment analysis indicated that a variety of extracellular related items and ECM-receptor interaction pathway were all correlated with the ESCA. CONCLUSIONS: The results of this study would provide some guidance for further study of diagnostic and prognostic biomarkers to promote ESCA treatment.
Esophageal cancer (ESCA) ranks seventh in terms of incidence and sixth in mortality overall.[ The incidence of ESCA varies from region to region, with Eastern Asia having the highest incidence and China was referred to as the “esophageal cancer belt”.[ There are 2 common histological subtypes of ESCA, which are esophageal squamous cell carcinoma (ESCC) and esophageal adenocarcinoma (EA).[ The pathogenesis of ESCA is still unclear, but heavy drinking, smoking, poor nutritional status, low intake of fruits and vegetables, drinking high-temperature drinks, chewing betel nut, gastroesophageal reflux disease, overweight, and obesity can all be the cause of ESCA.[ ESCA is one of the most deadly malignant tumors in the world. Although the management and treatment of patients with ESCA have improved, the overall 5-year survival rate is still very poor. Its 5-year survival rate is about 15% to 25%. The best results are related to early diagnosis, commonly referred to as “early stage”, however, ESCA patients are often diagnosed in the advanced stage, mainly due to the lack of early clinical symptoms.[ In recent years, in order to improve the survival rate of patients with ESCA, multimodal neoadjuvant concurrent chemoradiotherapy (CCRT) has become more and more widely used in therapy.[ After receiving CCRT, the 5-year overall survival rate and recurrence survival rate of patients with ESCA were higher, however, not all patients with ESCA could respond to neoadjuvant radiotherapy and chemotherapy. According to reports, about 60% of patients have no response to neoadjuvant chemoradiation, reducing the success rate of surgery.[ Identifying susceptible genes and biomarkers can help predict a patient's response to treatment while improving patient survival. Therefore, more genetic information remains urgently needed to provide reference for precision medical treatment.[ Recently, cancer microarrays and high-throughput sequencing technologies have been frequently used to explore common biomarkers associated with cancer, as well as drugs that are directly used in cancer treatment, diagnosis, and prognosis, revealing key genetic or epigenetic variations in tumorigenesis.[ He[ performed an informatic analysis of EA chip data in the GEO database to find relevant key genes and pathways; Zhang[ used bioinformatics to analyze esophageal squamous cell carcinoma and found that 5 genes such as SPP1 are closely related to the pathogenesis and prognosis of ESCC. In this study, we tried to detect new indicators of poor prognosis in ESCA patients through integrated bioinformatics methods, and endeavor to provide potential therapeutic targets for this challenging disease.In this study, we selected 5 microarray datasets published in the GEO database to identify differentially expressed genes (DEGs) between ESCA tissues and normal tissues. Later, further functional enrichment analysis was performed on DEGs to find the main biological functions of their regulation. In addition, hub genes affecting the pathogenesis and prognosis of ESCA patients were identified by using protein–protein interaction (PPI) networks and survival analysis. The detailed workflow of the study is shown in Figure 1.
Figure 1
Workflow for identification of hub genes and pathways for ESCA.
Workflow for identification of hub genes and pathways for ESCA.
Materials and methods
Gene expression profile data
The gene expression profile datasets (GSE17351, GSE20347, GSE29001, GSE92396, GSE100942) were obtained from Gene Expression Omnibus (GEO, http://www.ncbi.nlm.nih.gov/geo/).[ All included datasets met the following criteria:they employed tissue samples gathered from human esophageal cancer and corresponding adjacent or normal tissues;they included at least 10 samples.
Integrated analysis of microarray datasets
The matrix data of each GEO data set is normalized and log2 converted using the Limma[ software package in the R software, and the DEG in each microarray is also filtered by the Limma software package. Gene integration of DEGs identified from 5 data sets was performed using RobustRankAggreg.[ |log 2FC| ≥ 1 and adjust P value < .05 were considered statistically significant for the DEGs.
GO functional and KEGG pathway enrichment analysis
To illustrate the role of DEGs in gene function and signaling pathways, this study used the DAVID v 6.8[ (https://david.ncifcrf.gov/) database to perform gene ontology (GO) functional enrichment analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis. The results of the GO functional enrichment analysis were visualized via OmicShare platform (http://www.omicshare.com/. Accessed 19 July 2018) and GOplot[ software package in the R software.
PPI network and module analysis
Search Tool for the Retrieval of Interacting Genes/Proteins (STRING) 10.5(https://string-db.org/) is a database of known and predicted protein interactions that contains the direct and indirect association of proteins. The number and quality of interacting proteins can be set according to their confidence settings and it has a score for each protein interaction information. The higher the score, the higher the confidence of the protein interaction.[ The confidence score was set no less than 0.7 in this study.After this, the genes were introduced into Cytoscape 3.5.1 (http://www.cytoscape.org/)[ to obtain PPI network map. MCODE app in Cytoscape was used for cluster analysis, with the degree cutoff = 2, node score cutoff = 0.2 and K-Core = 2 were set as the advanced options.
Survival analysis of hub genes
Kaplan Meier-plotter (KM plotter, http://kmplot.com/analysis/) is an online tool for further understanding the molecular basis of disease and identifying biomarkers associated with survival. This tool is suitable for real-time meta-analysis of published cancer microarray datasets to identify survival-related biomarkers.[
Expression level analysis and correlation analysis of the hub genes
The Gene Expression Profiling Interactive Analysis (GEPIA) (http://gepia.cancer-pku.cn/index.html) is a web server that provides analysis of different tumor types or pathological stages, differential expression analysis of tumor/normal tissues, survival analysis, correlation analysis, and principal component analysis. It contains 9736 tumor samples and 8587 normal tissue samples covering 33 malignancies. Among them, we used 182 esophageal cancer tissue samples and 256 normal tissue samples to study the difference in expression of the same genes in ESCA and normal tissues.[
Results
In this study, we have adopted a total of 5 datasets including 60 cancer tissues and 61 normal tissues (Table 1). After analysis, we obtained 134 up-regulated genes and 183 down-regulated genes in ESCA compared with normal tissues (Supplementary file 1). Figure 2 shows the top 20 down- and up-regulated genes in the integrated microarray analysis.
Table 1
Information for the 5 GEO datasets included in the current study.
Figure 2
Volcano plot of gene expression profile data in ESCA samples and normal ones and heat map of differentially expressed gene (DEGs). (A) Volcano plot of GSE17351. (B) Volcano plot of GSE20347. (C) Volcano plot of GSE29001. (D) Volcano plot of GSE92396. (E) Volcano plot of GSE100942. (F) Heat map of differentially expressed genes. Green represents a lower expression level, red represents higher expression levels, and white represents that there is no different expression amongst the genes. Each column represents one dataset and each row represents 1 gene. The number in each rectangle represents the normalized gene expression level. The gradual color ranged from green to red represents the changing process from down-regulation to up-regulation.
Information for the 5 GEO datasets included in the current study.Volcano plot of gene expression profile data in ESCA samples and normal ones and heat map of differentially expressed gene (DEGs). (A) Volcano plot of GSE17351. (B) Volcano plot of GSE20347. (C) Volcano plot of GSE29001. (D) Volcano plot of GSE92396. (E) Volcano plot of GSE100942. (F) Heat map of differentially expressed genes. Green represents a lower expression level, red represents higher expression levels, and white represents that there is no different expression amongst the genes. Each column represents one dataset and each row represents 1 gene. The number in each rectangle represents the normalized gene expression level. The gradual color ranged from green to red represents the changing process from down-regulation to up-regulation.
Functional enrichment analysis of DEGs
DEGs were put into David for GO and KEGG functional enrichment analysis. The functional enrichment analysis results were shown in Figure 3. According to KEGG pathway enrichment analysis, the DEGs were mainly involved in Amoebiasis and ECM–receptor interaction pathway (Fig. 4). Likewise, in GO functional enrichment analysis, 21 GO entries satisfy FDR and P values both less than .05, most of which are biological processes, followed by molecular functions and cellular components. The first 20 entries are extracellular space, extracellular region, extracellular exosome, collagen catabolic process, extracellular matrix organization, proteinaceous extracellular matrix, extracellular matrix, keratinocyte differentiation, keratinization, peptide cross-linking, collagen fibril organization, collagen trimer, cornified envelope, positive regulation of cell proliferation, serine-type endopeptidase inhibitor activity, extracellular matrix disassembly, midbody, serine-type endopeptidase activity, epidermis development, and extracellular matrix structural constituent.
Figure 3
Graph of the functional enrichment analysis results. (A) KEGG pathway and DEGs enriched in pathways. The yellow triangle represents the KEGG pathways. The red dots are the common DEGs enriched in the 2 pathways, and the pink dots represent the other DEGs enriched in the pathways. (B) Top 20 of pathway GO enrichment. (C) The circular map of the DEGs distribution of the top 5 GO pathway. The red indicates the up-regulated gene and the blue indicates the down-regulated gene. (D) DEGs clustering map of the first 5 GO pathways. In the figure, from the inside to the outside are gene clustering, |log 2FC| and GO pathway.
Figure 4
Graph of the ECM–receptor interaction pathway (17). The orange nodes show the DEGs in the pathway and the green nodes indicate other genes.
Graph of the functional enrichment analysis results. (A) KEGG pathway and DEGs enriched in pathways. The yellow triangle represents the KEGG pathways. The red dots are the common DEGs enriched in the 2 pathways, and the pink dots represent the other DEGs enriched in the pathways. (B) Top 20 of pathway GO enrichment. (C) The circular map of the DEGs distribution of the top 5 GO pathway. The red indicates the up-regulated gene and the blue indicates the down-regulated gene. (D) DEGs clustering map of the first 5 GO pathways. In the figure, from the inside to the outside are gene clustering, |log 2FC| and GO pathway.Graph of the ECM–receptor interaction pathway (17). The orange nodes show the DEGs in the pathway and the green nodes indicate other genes.
PPI network analysis and module analysis
The PPI network was constructed in the STRING 10.5 database including 176 nodes and 800 interactions. As shown in Figure 5, the size of the nodes was proportional to the degree value. The top 10 genes with the greater degree were considered to be hub genes. Besides, in order to detect important cluster modules in this PPI network, we performed module analysis and obtained the first 4 modules of high scores (Fig. 6). Module 1 has the highest number of nodes, edges, and scores, and may be the main functional module.
Figure 5
PPI network of DEGs in ESCA. Purple represents the hub genes in DEGs, and the depth of the color and size of the nodes are proportional to the degree value. Yellow dot represents the gene which degree value is greater than average and red represents other genes.
Figure 6
Four significant modules identified from the PPI network. (A) Module 1 contained 28 nodes and 355 interactions, MCODE score = 26; (B) module 2 contained 11 nodes and 55 sides, MCODE score = 11; (C) module 3 contained 9 nodes and 36 interactions, MCODE, score = 9; (D) module 4 contained 9 nodes and 36 interactions, MCODE, score = 9.
PPI network of DEGs in ESCA. Purple represents the hub genes in DEGs, and the depth of the color and size of the nodes are proportional to the degree value. Yellow dot represents the gene which degree value is greater than average and red represents other genes.Four significant modules identified from the PPI network. (A) Module 1 contained 28 nodes and 355 interactions, MCODE score = 26; (B) module 2 contained 11 nodes and 55 sides, MCODE score = 11; (C) module 3 contained 9 nodes and 36 interactions, MCODE, score = 9; (D) module 4 contained 9 nodes and 36 interactions, MCODE, score = 9.
The Kaplan Meier-plotter and expression level of hub genes correlation and correlated analysis
The Kaplan Meier-plotter was adopted to analyze the prognostic information of 10 hub genes in which ESCA were divided into EA and ESCC. It was found that the high expression of AURKA (HR = 2.75 (1.15–6.59), P = .018), BUB1 (HR = 2.01 (1.02–3.94), P = .038), DLGAP5 (HR = 2.24 (1.17–4.26), P = .012), TPX2 (HR = 2.78 (1.44–5.36), P = .0015), UBE2C (HR = 3.73, (1.78–7.81), P = 2e−04), and NEK2 (HR = 2.76 (1.07–7.15), P = .029) were associated with the worse overall survival(OS) of for EA patients (Fig. 7), as well as the high expression of AURKA (HR = 0.32 (0.13–0.82), P = .013), BUB1 (HR = 0.35 (0.15–0.81), P = .011), TOP2A (HR = 0.31 (0.11–0.86), P = .018), ASPM (HR = 0.35 (0.15–0.84), P = .014), DLGAP5 (HR = 0.23 (0.1–0.53), P = .00019), TPX2 (HR = 0.33 (0.14–0.79), P = .0094), CENPF (HR = 0.41 (0.18–0.95), P = .031), and NEK2(HR = 0.4 (0.18–0.92), P = .025) were relevant to the worse OS for the ESCC patients (Fig. 8). Furthermore, GEPIA was devoted to analyze the different expression of hub genes in cancer tissues and normal tissues and 10 hub genes were definitely highly expressed in ESCA cancer tissues (Fig. 9). Moreover, the correlation between hub genes was analyzed by GEPIA. The results showed that remaining 9 hub genes were strongly associated with AURKA in the expression of ESCA cells (Fig. 10).
Figure 7
Prognostic roles of 10 hub genes in the EA patients. Survival curves are plotted for EA cancer patients. (A) ASPM; (B) AURKA; (C) BUB1; (D) CDC20; (E) CENPF; (F) DLGAP5; (G) NEK2; (H) TOP2A; (I) TPX2; and (J) UBE2C.
Figure 8
Prognostic roles of 10 hub genes in the ESCC patients. Survival curves are plotted for ESCC cancer patients. (A) ASPM; (B) AURKA; (C) BUB1; (D) CDC20; (E) CENPF; (F) DLGAP5; (G) NEK2; (H) TOP2A; (I) TPX2; and (J) UBE2C.
Figure 9
Analysis of 10 hub genes expression level in human ESCA. The red and gray boxes represent cancer and normal tissues, respectively. (A) ASPM; (B) AURKA; (C) BUB1; (D) CDC20; (E) CENPF; (F) DLGAP5; (G) NEK2; (H) TOP2A; (I) TPX2; and (J) UBE2C.
Figure 10
Correlation analysis of 9 hub genes and AURKA in ESCA. (A) AURKA; (B) BUB1; (C) CDC20; (D) CENPF; (E) DLGAP5; (F) NEK2; (G) TOP2A; (H) TPX2; and (I) UBE2C.
Prognostic roles of 10 hub genes in the EA patients. Survival curves are plotted for EA cancerpatients. (A) ASPM; (B) AURKA; (C) BUB1; (D) CDC20; (E) CENPF; (F) DLGAP5; (G) NEK2; (H) TOP2A; (I) TPX2; and (J) UBE2C.Prognostic roles of 10 hub genes in the ESCC patients. Survival curves are plotted for ESCC cancerpatients. (A) ASPM; (B) AURKA; (C) BUB1; (D) CDC20; (E) CENPF; (F) DLGAP5; (G) NEK2; (H) TOP2A; (I) TPX2; and (J) UBE2C.Analysis of 10 hub genes expression level in human ESCA. The red and gray boxes represent cancer and normal tissues, respectively. (A) ASPM; (B) AURKA; (C) BUB1; (D) CDC20; (E) CENPF; (F) DLGAP5; (G) NEK2; (H) TOP2A; (I) TPX2; and (J) UBE2C.Correlation analysis of 9 hub genes and AURKA in ESCA. (A) AURKA; (B) BUB1; (C) CDC20; (D) CENPF; (E) DLGAP5; (F) NEK2; (G) TOP2A; (H) TPX2; and (I) UBE2C.
Discussion
ESCA is one of the most serious malignant gastrointestinal tumors, although recent advances in multimodal treatment have shown promise in reducing morbidity and improving survival, this is not enough.[ Therefore, understanding the pathogenesis of ESCA is crucial to improve survival and reduce morbidity. The rapid development of microarray technology, making breakthroughs in identifying cancer marker genes, and strengthening disease prognosis.[In the present study, a total of 317 DEGs were screened, consisting of 134 up-regulated genes and 183 down-regulated genes. Most of these DEGs were enriched in pathways closely relevant to cancer, such as extracellular space, extracellular region, extracellular exosome, and ECM–receptor interaction pathway. Among all DEGs, the top 10 in the degree value were the hub genes. Additionally, all hub genes were up-regulated in ESCA cancer tissues compared with normal tissues and closely related to the first hub gene AURKA.Aurora kinase A (AURKA) – is a serine/threonine kinase that is essential for normal mitotic spindle formation and centrosome maturation as well as separation during cell division. When an abnormality occurs in AURKA, it often leads to the functional defect of the centrosome in mitosis and the disorder of the bipolar spindle, which conduces to the asymmetric separation of chromosomes during division, induces chromosomal instability, forms aneuploidy, and causes abnormal mutations in cells. At the same time, it begins to transform cancer and form a tumor.[ The AURKA gene is clearly overexpressed in several types of cancer, including ESCA.[ It is worth noting that AURKA overexpression is significantly associated with advanced cancer and poor prognosis.[ Besides, a variety of AURKA inhibitors have been developed for the treatment of cancers of the digestive system.[ CDC20 (cell division cycle 20 homologue) is overexpressed in quantities of humantumors and also is currently a potential target for the treatment of multiple cancers. It mainly regulates cell cycle progression, and cells with CDC20 mutants block cell division and stop the progression of the cell cycle to late and chromosome separation.[ BuB1 is the only Mad1 centromere receptor in yeast, which contributes to the localization of Mad1, thereby loading MAD1 and MAD2 onto CDC20, catalyzing the formation of mitotic checkpoint complex (MCC).[ As with MAD2, overexpression of BUB1 could be a consequence of loss of normal protein functioning.[ Topoisomerase IIα (TOP2A) is a gene-encoding enzyme involved in DNA replication and correlates with the response of anthracyclines to various cancers.[ Its overexpression is common in a diversity of cancers including ESCA. Besides, TOP2A inhibitors have been used in a variety of solid tumors such as small cell lung cancer.[ ASPM regulates the duration of mitosis and passage through the G1 restriction point and it also shows overexpression in a range of cancers.[ It has been experimentally proven that knocking down ASPM can inhibit tumor growth and lead to apoptosis.[ Disc large (Drosophila) homolog-associated protein 5 (DLGAP5) is a mitotic spindle protein that promotes the formation of tubulin polymers, resulting in tubulin fragments around the ends of microtubules. DLGAP5 contains the guanylate kinase-associated protein (GKAP) domain, which is conserved among various species.[ In addition, the involvement of DLGAP5 in the formation and development of cancer affects cell migration, invasion, and adhesion ratio, suggesting that the gene and its products may be potential therapeutic targets.[ TPX2 acts as a microtubule-associated protein (MAP) to control the nucleation, function, and interaction of microtubules with other cellular structures. Studies have shown that improper expression of TPX2 leads to chromosomal instability, resulting in centrosome expansion and development into aneuploidy which is highly correlated with the occurrence and development of various tumors.[ As a downstream regulatory gene of MMP13, CENPF plays an important role in cell cycle, mitosis, and regulation of PLK1 activity in G2/M transition.[ Furthermore, different kinds of experiments have confirmed that the high expression of CEPNF is a prognostic indicator of survival and metastasis in ESCA patients.[ Recently, the ubiquitin-conjugating enzyme E2C (UBE2C), a prominent tumor biomarker candidate, is considered to be a key player in the cell cycle progression. And by qRT-PCR analysis and immunohistochemical analysis, UBE2C protein was up-regulated in ESCA tissues.[ Never in mitosis (NIMA)-associated kinase 2 (NEK2) plays a key role in the regulation of mitosis. Due to its abnormal overexpression and malignant transformation in a variety of humancancers it has become a key target for the treatment of cancer.[In enrichment analysis, GO enrichment analysis revealed that a series of extracellular related items were closely associated with ESCA. The interaction of multiple genes with the extracellular matrix combined with poor signaling can also enhance the metastasis of ESCA cells.[ Exosomes is an important component of the tumor microenvironment and plays a complex role in the progression and treatment of ESCA.[ On the other hand, in KEGG enrichment analysis, ECM–receptor interaction pathway affects cellular activities such as adhesion, migration, differentiation, proliferation, and apoptosis.[ There have also been related RNA array studies in the early stage to confirm the correlation between ECM–receptor interaction pathway and ESCA.[The limitations of our study were as follows: Firstly, our research is based on the data already in the database and research results need to be verified by corresponding experimental studies. For example, chemistry experiments comparing expression in ESCA tissues and normal tissues, should be conducted to confirm the target genes and potential key functional enrich pathway. Secondly, we obtained data in the GEO database, the sample size is small and its quality cannot be verified. Finally, our study has some limitations because it focuses on genes that are typically identified as significant changes in multiple data sets, without regard to gender, age, tumor classification, and staging.
Conclusion
In conclusion, we have identified a total of 10 hub genes associated with the development and poor prognosis of ESCA in this integrated bioinformatics analysis. However, since our research is based on data analysis, further experiments are needed to confirm. At the same time, we hope that our research results can have certain guiding significance for the future prognosis and treatment of ESCA.
Author contributions
ZW and WJR conceived and designed the study. JSS, MZQ, and ZXM collected the data. LXK, LSY, GSY, and NMW performed the data analysis, and ZW and ZJY wrote the manuscript. All authors were responsible for reviewing data. All authors read and approved the final manuscript.
Authors: Man-Li Luo; Zhuan Zhou; Lichao Sun; Long Yu; Lixin Sun; Jun Liu; Zhihua Yang; Yuliang Ran; Yandan Yao; Hai Hu Journal: Cancer Lett Date: 2018-02-21 Impact factor: 8.679
Authors: Antonio Palumbo; Nathalia Meireles Da Costa; Marco De Martino; Romina Sepe; Simona Pellecchia; Vanessa Paiva Leite de Sousa; Pedro Nicolau Neto; Cleber Dario Kruel; Anke Bergman; Luiz Eurico Nasciutti; Alfredo Fusco; Luis Felipe Ribeiro Pinto Journal: Oncotarget Date: 2016-10-04