Feng Sun1, Chen Zhang1, Shichao Ai1, Zhijian Liu1, Xiaofeng Lu1. 1. Department of Gastrointestinal Surgery, Nanjing Drum Tower Hospital, the Affiliated Hospital of Nanjing University Medical School, Nanjing, China.
Abstract
BACKGROUND: Gastric cancer (GC) is one of the most common cancer worldwide. With the high rates of metastasis and recurrence, its overall survival remains poor at the present time. Hence, seeking new potential therapeutic targets of GC is important and urgent. METHODS: We retrieved the gene expression profiles and clinical data from The Cancer Genome Atlas (TCGA) datasets. After screening differentially expressed genes (DEGs), we carried out the survival analysis for overall survival to pick out robust DEGs. To explore the role of these robust DEGs, we conducted Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) functional enrichment analyses. Subsequently, protein interactions network was constructed utilizing the Search Tool for the Retrieval of Interacting Genes (STRING) database. We then presented the module analysis and filtered out hub genes by the Cytoscape software. Finally, Kaplan-Meier analysis was utilized to demonstrate the prognostic role of these hub genes. RESULTS: According to the gene expression profiles of TCGA and the survival analysis, 238 robust DEGs were filtered out, consisting of 140 up-regulated and 98 down-regulated genes. The up-regulated DEGs were mainly enriched in systemic lupus erythematosus, cytokine activity, and alcoholism, while down-regulated DEGs were mainly enriched in steroid hormone receptor activity, immune response, and metabolism. Through the construction of the protein-protein interaction (PPI) network, eight hub genes were finally screened out, including CCR8, HIST1H3B, HIST1H2AH, HIST1H2AJ, NPY, HIST2H2BF, GNG7, and CCL25. CONCLUSIONS: Our study picked out eight hub genes, which might be potential prognostic biomarkers for GC and even be treatment targets for clinical implication in the future. 2021 Translational Cancer Research. All rights reserved.
BACKGROUND: Gastric cancer (GC) is one of the most common cancer worldwide. With the high rates of metastasis and recurrence, its overall survival remains poor at the present time. Hence, seeking new potential therapeutic targets of GC is important and urgent. METHODS: We retrieved the gene expression profiles and clinical data from The Cancer Genome Atlas (TCGA) datasets. After screening differentially expressed genes (DEGs), we carried out the survival analysis for overall survival to pick out robust DEGs. To explore the role of these robust DEGs, we conducted Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) functional enrichment analyses. Subsequently, protein interactions network was constructed utilizing the Search Tool for the Retrieval of Interacting Genes (STRING) database. We then presented the module analysis and filtered out hub genes by the Cytoscape software. Finally, Kaplan-Meier analysis was utilized to demonstrate the prognostic role of these hub genes. RESULTS: According to the gene expression profiles of TCGA and the survival analysis, 238 robust DEGs were filtered out, consisting of 140 up-regulated and 98 down-regulated genes. The up-regulated DEGs were mainly enriched in systemic lupus erythematosus, cytokine activity, and alcoholism, while down-regulated DEGs were mainly enriched in steroid hormone receptor activity, immune response, and metabolism. Through the construction of the protein-protein interaction (PPI) network, eight hub genes were finally screened out, including CCR8, HIST1H3B, HIST1H2AH, HIST1H2AJ, NPY, HIST2H2BF, GNG7, and CCL25. CONCLUSIONS: Our study picked out eight hub genes, which might be potential prognostic biomarkers for GC and even be treatment targets for clinical implication in the future. 2021 Translational Cancer Research. All rights reserved.
Entities:
Keywords:
Gastric cancer (GC); The Cancer Genome Atlas (TCGA); bioinformatics analysis; hub genes; survival analysis
Gastric cancer (GC) is the fifth most common cancer worldwide, with almost one million new cases every year. Over half of them occur in Eastern Asia, China particularly (1). Although the incidence of GC has declined and the treatment of GC has seen dramatic progress over the years, it remains the third leading cause of cancer-related death worldwide (2). So far, surgical resection is the only way to cure GC. However, 20% of patients lost their chance of surgery at the first clinic visit because of distant metastasis (3). In those cases, chemotherapy becomes the main treatment they can rely on. At present, chemotherapy for GC is still dominated by conventional chemotherapeutic drugs such as platinum and fluorouracil. The molecularly targeted drugs remain scarce (4). Therefore, it is warranted to learn the underlying biological variation for developing more efficient therapeutic strategies.Currently, carcinoembryonic antigen (CEA), cancer antigen 19-9 (CA19-9), and cancer antigen 72-4 (CA72-4) are the most common diagnostic markers for GC (5-7). However, the appliance of these biomarkers cannot meet the clinical requirements because of their low sensitivity (8,9). As for the treatment of GC, several tumor-specific proteins have been identified as therapeutic targets, including EGFR, HER-2, VEGFR, mTOR, PD-1, and PD-L1 (10-12). Still to this day, only three molecularly targeted drugs (trastuzumab, ramucirumab, and pembrolizumab) have been approved and marketed for GC treatment worldwide. Besides, a large number of molecules have been reported to be related to clinical outcomes of GC, including cancer-associated genes and non-coding RNAs (13-15). However, there are still no reliable prognosis biomarkers due to the heterogeneity of GC (16). Hence, it is meaningful to seek novel and reliable biomarkers for GC.Recently, with the advancement of sequencing platforms and the establishment of public databases such as The Cancer Genome Atlas (TCGA) and Gene Expression Omnibus (GEO), many bioinformatics analysis studies on GC have been published in these years. For example, using GEO data sets, Zheng et al. identified 7 hub genes that affect the prognosis of GC, including COL4A1, COL1A2, COL6A3, VCAN, THBS2, TIMP1, and SERPINH1 (17). Similarly, another study identified 9 key genes (COL1A1, CDKN3, COL1A2, COL3A1, NDC80, TPX2, TOP2A, TIMP1, and CEP55) correlated with the pathogenesis of GC (18). Beyond these differentially expressed genes, non-coding RNAs, including lncRNAs (long non-coding RNAs) and miRNAs (microRNAs), were also reported to be related to the pathogenesis and prognosis of GC (19-22).Although an increasing number of integrated bioinformatics analyses on GC have appeared recently, the results differed between these studies because of the different analysis methods. In our study, we focused on DEGs (differentially expressed genes) which affected the prognosis of GC and obtained eight hub genes. This study might provide potential prognostic biomarkers and treatment targets for GC.We present the following article in accordance with the MDAR checklist (available at https://dx.doi.org/10.21037/tcr-20-3540).
Methods
Data procession
We downloaded the gene expression profiling datasets from TCGA database. The selection criteria were that samples contained complete RNA sequencing data and clinical information. According to the selection criteria, 407 samples were involved in this study, which was composed of 375 GC primary tumor specimens and 32 solid normal tissue specimens. The data were analyzed by the DeSeq2 package and edgeR package in the R language (23,24). Differentially expressed genes (DEGs) were defined by |log2FC| ≥1, and adjust P value <0.05. Overlapping DEGs between these two kinds of algorithms were retained for further analyses.
Survival analysis
A total of 367 GC patients from TCGA database were enrolled in survival analysis, including 234 (63.8%) men and 133 (36.2%) women. The median age was 67 (range, 35–90 years). Survival outcomes were calculated from the date of surgery to the date of last follow-up or the date of death. Follow-up data were downloaded from the TCGA database. Before statistical analysis, DeSeq2 package was used to transform the raw dataset to the normalized gene expression level. According to the median expression of a specific gene, the patients were divided into two groups. Log-rank test and Kaplan-Meier curve were conducted by the survival package in the R language to evaluate the prognostic value of all overlapping DEGs (25). DEGs which showed a significant correlation with overall survival (P<0.05) were referred to robust DEGs.
Functional enrichment analysis
First, we performed Gene Ontology (GO) analysis on the Database for Annotation, Visualization and Integrated Discovery (DAVID, version 6.8) to elucidate the biological function of these robust DEGs (26,27). BP (Biological process), MF (molecular function), and CC (cellular component) were performed, respectively, P<0.05 and count >2 were considered as statistically significant. Then, Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways were carried out utilizing the KEGG Orthology-Based Annotation System (KOBAS, version 3.0) (28). Corrected P<0.05 was defined as statistically significant.
PPI network and module screening
For robust DEGs, the Search Tool for the Retrieval of Interacting Genes (STRING, version 10.5) online database was utilized to build protein-protein interaction (PPI) network. Then, we presented it by Cytoscape software (version 3.6.1). Also, the Molecular Complex Detection (MCODE) algorithm was applied to screen neighborhoods of densely connected proteins.
Identification of hub genes
CytoHubba in the Cytoscape software was utilized to screen hub genes among these robust DEGs (29). Both of MCC (maximal clique centrality) and DMNC (density of maximum neighborhood component) were computed, and the overlapping genes were filtered out as hub genes of GC.
Statistical analysis
Survival analysis was performed by the Survival package in the R software. Survival plots were showed by the Kaplan-Meier method, and the significance was calculated by the log-rank test. P<0.05 was defined as statistically significant.
Ethical statement
The study was conducted in accordance with the Declaration of Helsinki (as revised in 2013). All information from TCGA is available and free for public, so the agreement of the medical ethics committee board is not necessary.
Results
Identification of DEGs
407 samples from TCGA database were enrolled in this study, including 375 GC primary tumor samples and 32 solid normal tissue samples. We screened DEGs using the DeSeq2 package and edgeR package, respectively. The cut-off criteria were corrected P<0.05 and |log2FC| >1. Overall, 5,770 DEGs were screened by DeSeq2 package, including 3,118 up-regulated and 2,652 down-regulated genes. A total of 5,991 DEGs were screened by edgeR package, including 3,479 up-regulated and 2,512 down-regulated genes. showed the volcano plots of DEGs for each method. We further intersected the results and obtained 5,468 overlapping DEGs, including 2,994 up-regulated and 2,474 down-regulated genes ().
Figure 1
Identification of DEGs. (A) The volcano plots of the DEGs by DeSeq2. (B) The volcano plots of the DEGs by edgeR. (C) Venn diagrams of the DEGs between the DeSeq2 and edgeR. DEG, differentially expressed gene.
Identification of DEGs. (A) The volcano plots of the DEGs by DeSeq2. (B) The volcano plots of the DEGs by edgeR. (C) Venn diagrams of the DEGs between the DeSeq2 and edgeR. DEG, differentially expressed gene.
Identification of robust DEGs associated with overall survival of GC
Log-rank test for all the overlapping DEGs was performed to explore the robust DEGs that were associated with the survival performance of GC patients. The cut-off criteria were P<0.05. Finally, we obtained 238 significantly robust DEGs, including 140 up-regulated and 98 down-regulated genes (Table S1).
GO and KEGG pathway analysis
GO analysis and KEGG analysis were performed to explore the potential biological function of the above robust DEGs. The GO analysis involved three categories: MF, BP, and CC. In MF category, the up-regulated DEGs were enriched in cytokine activity and protein heterodimerization activity. As for BP, the up-regulated DEGs were significantly enriched in nucleosome assembly, chemokine-mediated signaling pathway, skeletal system development, and immune response. For CC, the up-regulated DEGs were enriched in the nucleosome, centromeric region, extracellular space, and extracellular region (). About down-regulated DEGs, they were enriched in steroid hormone receptor activity, lipid binding, and kinase binding in the MF category. In BP category, the down-regulated DEGs were significantly enriched in steroid hormone-mediated signaling pathway, response to calcium ion, response to the drug, and monocyte chemotaxis ().
Figure 2
Functional enrichment analysis of robust DEGs. (A) GO analysis of up-regulated DEGs. (B) GO analysis of down-regulated DEGs. (C) KEGG analysis of up-regulated DEGs. (D) KEGG analysis of down-regulated DEGs. DEG, differentially expressed gene; GO, Gene Ontology; KEGG, Kyoto Encyclopedia of Genes and Genomes.
Functional enrichment analysis of robust DEGs. (A) GO analysis of up-regulated DEGs. (B) GO analysis of down-regulated DEGs. (C) KEGG analysis of up-regulated DEGs. (D) KEGG analysis of down-regulated DEGs. DEG, differentially expressed gene; GO, Gene Ontology; KEGG, Kyoto Encyclopedia of Genes and Genomes.As for KEGG pathway analysis, the most significant pathways of the up-regulated DEGs were systemic lupus erythematosus, cytokine-cytokine receptor interaction, alcoholism, and jak-STAT signaling pathway (). For down-regulated DEGs, the most significantly enriched pathways were the intestinal immune network for IgA production, retinol metabolism, metabolism of xenobiotics by cytochrome P450, bile secretion, and chemical carcinogenesis ().
PPI network and modules analysis
To characterize the interaction between the robust DEGs, we constructed the PPI network using STRING database and presented it by Cytoscape. Overall, there were 82 edges and 140 nodes in this network, including 48 up-regulated and 34 down-regulated genes (). Subsequently, three key modules were extracted utilizing MCODE (). Module 1 was mainly concerned with the chemokine signaling pathway. Module 2 was mainly concerned with nucleosome assembly. Besides, module 3 involved ECM-receptor interaction.
Figure 3
PPI network and hub clustering modules. (A) The PPI network of robust DEGs; (B) module 1; (C) module 2; (D) module 3. Red circles represent up-regulated genes and blue circles represent down-regulated genes. PPI, protein-protein interaction.
PPI network and hub clustering modules. (A) The PPI network of robust DEGs; (B) module 1; (C) module 2; (D) module 3. Red circles represent up-regulated genes and blue circles represent down-regulated genes. PPI, protein-protein interaction.
Screening for hub genes and survival analysis of hub genes
We further screened the hub genes among robust DEGs with cytoHubba. MCC and DMNC were computed to identify hub genes. The top 10 genes were selected based on the two algorithms. Then, we intersected the results by Venn diagram and obtained 8 overlapping hub genes, including CCR8, HIST1H3B, HIST1H2AH, HIST1H2AJ, NPY, HIST2H2BF, GNG7, and CCL25 ().
Figure 4
Prognostic value of eight hub genes. (A) Venn diagram of overlapped hub genes based on two methods. (B-I) Kaplan-Meier survival analysis of CCR8, HIST1H3B, HIST1H2AH, HIST1H2AJ, NPY, HIST2H2BF, GNG7, and CCL25.
Prognostic value of eight hub genes. (A) Venn diagram of overlapped hub genes based on two methods. (B-I) Kaplan-Meier survival analysis of CCR8, HIST1H3B, HIST1H2AH, HIST1H2AJ, NPY, HIST2H2BF, GNG7, and CCL25.Besides, the survival package in R language was utilized to perform the Kaplan-Meier analysis. As shown in , patients with higher expression levels of HIST1H3B, HIST1H2AH, HIST1H2AJ, HIST2H2BF, and CCL25 show worse OS, while those with lower expression levels of CCR8, NPY, and GNG7 show worse OS.
Discussion
Owing to the high heterogeneity of GC, it still lacks effective therapeutic targets. Although a large number of studies had been performed to identify the driving genes of GC, there were still no reliable biomarkers and drug targets up to now. In recent years, public databases, such as TCGA and GEO, provided a platform to screen the molecular targets of the tumor. Bioinformatics analysis studies of GC have been increasingly reported (17-20). However, because of the limited sample size and impertinent methods, the plentiful DEGs might show no biological roles and clinical significance. Therefore, we introduced the survival analysis at the beginning of our study to obtain clinically significant DEGs. According to the gene expression profiles of TCGA and the survival analysis, 238 robust DEGs were filtered out, consisting of 140 up-regulated and 98 down-regulated genes. The up-regulated DEGs were mainly enriched in systemic lupus erythematosus, cytokine activity, and alcoholism, while down-regulated DEGs were mainly enriched in steroid hormone receptor activity, immune response, and metabolism. Through the construction of the PPI network, eight hub genes were finally screened out, including CCR8, HIST1H3B, HIST1H2AH, HIST1H2AJ, NPY, HIST2H2BF, GNG7, and CCL25.CCR8 and CCL25 belong to the CXC subfamily of chemokines, which are important for cell migration. CCR8 (Chemokine (C-C motif) receptor 8) is preferentially expressed in the thymus, participating in regulating monocyte chemotaxis and thymic cell apoptosis. Villarreal et al. have shown that CCR8 is elevated in tumor-resident Tregs, and mAb therapy targeting CCR8 significantly inhibited tumor growth in the colorectal cancer model (30). Additionally, CCR8 was shown to enhance cell migration, invasion, and EMT in bladder cancer (31). In GC, a recent study elucidated that CCR8 was highly expressed and associated with the OS of GC patients (32). CCL25 [Chemokine (C-C motif) ligand 25], along with its specific receptor CCR9 (chemokine receptor 9), has been well reported to regulate gut mucosal immunity (33). Moreover, several studies demonstrated that CCL25/CCR9 was implicated in proliferation, migration, and anti-apoptosis of cancer cells (34,35). Interestingly, Chen et al. revealed that CCL25/CCR9 promoted tumor growth in early-stage of CRC, while suppressed cell invasion and metastasis in the late-stage (36). Further studies are warranted for elucidating the role of CCL25 in GC.HIST1H3B, HIST1H2AH, HIST1H2AJ, and HIST2H2BF belong to core histones. Two copies of each of the core histones compose histone octamer, which further form nucleosomes along with approximately 146 bp of DNA. It is well established that histone modifications played a significant role in tumor initiation and progression (37,38). However, the effects of histone expression levels alteration on tumors remain largely unknown (39,40). In the present study, we investigated that HIST1H3B, HIST1H2AH, HIST1H2AJ, and HIST2H2BF were up-regulated in GC and were related to the poor prognosis of GC patients.Neuropeptide Y (NPY) is a 36 amino acids peptide, which is widely produced in nervous systems. Beyond regulating several physiological processes, such as cognitive function and cardiovascular regulation, NPY was also revealed to promote proliferation, migration, and vascularization and in several tumors (41-45). Concerning GC, a recent bioinformatics analysis based on GEO database showed that NPY was elevated in GC and was correlated with poor survival (17). However, in our analysis, NYP was down-regulated in GC and high expression of NYP was related to better survival.G protein γ subunit 7 (GNG7) is a member of the large G protein γ family. There have been compelling suggestions that GNG7 functions as a tumor suppressor gene in many tumors, including pancreatic cancer, esophageal cancer, and renal cell cancer (46-48). Consistent with the previous studies, we elucidated that GNG7 was down-regulated in GC and negatively correlated with overall survival.In the present study, KEGG functional enrichment analysis indicated that SLE (systemic lupus erythematosus) might be associated with GC. A cohort study observed a significantly higher risk of GC in patients with SLE (49). The correlation was further confirmed by a recent meta-analysis (50). The underlying mechanisms why SLE patients were more likely to develop GC remain unclear. Various types of drugs used in SLE treatment, such as TNF-a inhibitors, corticosteroids, and nonsteroid anti-inflammatory drugs were speculated to be implicated in this process (51,52).We acknowledged some potential limitations in this study. First, it was a bioinformatics study using TCGA datasets. The lack of clinical validity remains a drawback of our study. Second, the sample size was small and the majority of patients enrolled in this study were white, which might be ethnically homogeneous. Therefore, more studies are warranted for verifying these findings in real-world clinical practice.
Conclusions
Our present study disclosed eight hub genes and several molecular pathways of GC. These findings provided prognostic biomarkers and potential treatment targets for GC. Nevertheless, further molecular biological experiments are warranted to verify the function of the hub genes in GC.
Authors: Sasha Bernatsky; Rosalind Ramsey-Goldman; Jeremy Labrecque; Lawrence Joseph; Jean-Francois Boivin; Michelle Petri; Asad Zoma; Susan Manzi; Murray B Urowitz; Dafna Gladman; Paul R Fortin; Ellen Ginzler; Edward Yelin; Sang-Cheol Bae; Daniel J Wallace; Steven Edworthy; Soren Jacobsen; Caroline Gordon; Mary Anne Dooley; Christine A Peschken; John G Hanly; Graciela S Alarcón; Ola Nived; Guillermo Ruiz-Irastorza; David Isenberg; Anisur Rahman; Torsten Witte; Cynthia Aranow; Diane L Kamen; Kristjan Steinsson; Anca Askanase; Susan Barr; Lindsey A Criswell; Gunnar Sturfelt; Neha M Patel; Jean-Luc Senécal; Michel Zummer; Janet E Pope; Stephanie Ensworth; Hani El-Gabalawy; Timothy McCarthy; Lene Dreyer; John Sibley; Yvan St Pierre; Ann E Clarke Journal: J Autoimmun Date: 2013-02-12 Impact factor: 7.094