Haitao Chen1, Jun Luo2,3, Jianchun Guo2,3. 1. Department of Orthopedic Surgery, Zhongnan Hospital of Wuhan University, Wuhan, Hubei, China (mainland). 2. Department of Pathology, Zhongnan Hospital of Wuhan University, Wuhan, Hubei, China (mainland). 3. Wuhan University Center for Pathology and Molecular Diagnostics, Wuhan, Hubei, China (mainland).
Abstract
BACKGROUND We constructed a predictive risk model of esophageal carcinoma (EC) for prognostic prediction. MATERIAL AND METHODS Immune genes and the expression data were downloaded from the ImmPort database and The Cancer Genome Atlas database. Univariate analysis, Lasso regression, and multivariate analysis were applied to screen the ultimately included prognostic immune genes for the model based on the training cohort. Survival analysis and receiver operating characteristic (ROC) curve were applied to evaluate the model. The model was further validated in the testing and entire cohorts, and the clinical utility of the model and its ability to assess the subtypes of EC were evaluated in the entire cohort. RESULTS We detected 297 differentially expressed immune genes, including 241 upregulated genes and 56 downregulated genes in EC patients. Based on these genes, we developed a 7-immune gene model of EC, including HSPA6, S100A12, NOS2, DKK1, OSM, AR, and OXTR. The area under the curve (AUC) of the model at 1 year was 0.825. Similarly, the AUC values for the validating cohorts were 0.813 and 0.816, respectively. Pathological stage and risk score of the model were independent prognostic factors. This model was effective for both subtypes of EC. CONCLUSIONS We constructed a 7-gene model consisting of HSPA6, S100A12, NOS2, DKK1, OSM, AR, and OXTR. This risk model could be used for prognostic prediction of EC.
BACKGROUND We constructed a predictive risk model of esophageal carcinoma (EC) for prognostic prediction. MATERIAL AND METHODS Immune genes and the expression data were downloaded from the ImmPort database and The Cancer Genome Atlas database. Univariate analysis, Lasso regression, and multivariate analysis were applied to screen the ultimately included prognostic immune genes for the model based on the training cohort. Survival analysis and receiver operating characteristic (ROC) curve were applied to evaluate the model. The model was further validated in the testing and entire cohorts, and the clinical utility of the model and its ability to assess the subtypes of EC were evaluated in the entire cohort. RESULTS We detected 297 differentially expressed immune genes, including 241 upregulated genes and 56 downregulated genes in EC patients. Based on these genes, we developed a 7-immune gene model of EC, including HSPA6, S100A12, NOS2, DKK1, OSM, AR, and OXTR. The area under the curve (AUC) of the model at 1 year was 0.825. Similarly, the AUC values for the validating cohorts were 0.813 and 0.816, respectively. Pathological stage and risk score of the model were independent prognostic factors. This model was effective for both subtypes of EC. CONCLUSIONS We constructed a 7-gene model consisting of HSPA6, S100A12, NOS2, DKK1, OSM, AR, and OXTR. This risk model could be used for prognostic prediction of EC.
Esophageal carcinoma (EC) is one of the most common types of cancer worldwide [1,2]. The incidence and mortality rates of EC are high, with 0.59 new cases and 0.548 deaths per 1000 people worldwide in 2017 [3]. Male sex and age over 70 years were risk factors of EC [3,4]. However, patients are usually diagnosed at the terminal stage, when it is often too late for optimal treatment [3-5]. In addition, patients with a similar pathological stage may still have different overall survival (OS) as a result of individual differences [6]. Therefore, it is urgent to find molecular biomarkers for early diagnosis and accurate prognosis prediction of EC.Despite the advances made in multiple therapies, the prognosis of EC patients remains poor [7]. Immunotherapy is an innovative treatment strategy that has shown encouraging efficacy in several types of cancer, including EC [8]. Recently, checkpoint inhibitors have achieved encouraging success in treating lung cancers and melanoma [9,10], which increases optimism about immunotherapy as a rational strategy for EC treatment. EC has very high rates of somatic mutations [11], which may be exploitable, promoting recognition by the immune system and subsequent tumor elimination. The immunotherapeutic approach may be superior to conventional anticancer agents due to the targeted therapy. However, immunotherapy has clinical and economic limitations, and response rates are limited to a small patient population [12,13]. Currently, biomarkers that predict response to immune therapy for clinical application are still in the early stages of development [14]. Thus, it is essential to identify gene models to reflect the immune status and to assess the prognosis of EC.In the present study, we developed a molecular model according to the differentially expressed (DE) immune genes, which can contribute to the early diagnosis and prognostic evaluation of EC patients. In addition, our study identified many DE immune genes and detected the enriched functions, pathways, and transcription factors (TFs), which could lay a foundation for further basic research and provide useful information for molecularly-targeted therapy and immunotherapy in EC patients.
Material and Methods
Database processing
Transcriptomic data and clinical information, including survival outcome, age, sex, the histologic grade, as well as pathological stage, were obtained from The Cancer Genome Atlas (TCGA) database. In addition, the list of immune genes was obtained from the ImmPort database ().
Identification of DE immune genes
The Wilcox test was applied for differential analysis. False discovery rate (FDR) <0.05 and log2 (fold change [FC]) >1 were set as the cut-offs for DE genes. Then, the DE immune genes were detected based on immune genes and the identified DE genes. The Pheatmap package and gplots package with R software (v3.6.1) was applied to construct the heatmaps and volcano maps of the DE genes and the DE immune genes.
Function and pathway enrichment analysis
We used the clusterProfiler package and the org.Hs.eg.db package to perform gene ontology (GO) analysis and the Kyoto Encyclopedia of Genes and Genomes (KEGG) was used for the analysis of the DE immune genes. GO terms included cellular component, biological process, and molecular function. P.adjust <0.05 was set as the cut-off for significantly enriched GO and KEGG terms.
Development of a DE immune genes-TFs network
The Database for Annotation, Visualization, and Integrated Discovery (DAVID) (v6.8) was used to detect the enriched TFs of the DE immune genes. Bonferroni <0.05 and Benjamini <0.05 were the cut-off criteria used to identify significantly enriched TFs. The interactions between the identified immune genes and the significantly enriched TFs were imported into Cytoscape (v3.7.2) to construct a DE immune genes-TFs network.
Identification of a risk model
To build the prognostic model, univariate Cox regression analysis, LASSO regression, and multivariate Cox regression analysis were applied using R software (v3.6.1) based on the methodology of previously published studies [15-17]. Firstly, univariate analysis was performed to screen the significant DE immune genes based on the detected immune genes using the “survival” package. The genes with P<0.05 were defined as being related to OS. Then, Lasso regression with the “glmnet” package was applied to remove functionally similar genes to avoid model overfitting. Finally, based on the results of Lasso regression, multivariate analysis was performed with the “survival” package to determine the ultimately included genes of the model. The differential expression of these genes between normal and cancer samples in the TCGA was shown in the boxplot. The Oncomine database () was used to validate the expression of these genes. The risk score was determined according to the following formula:N, Exp, and Coe represented the number of signature genes, the expression level of immune genes, and the coefficient value, respectively. On the basis of the median of the risk score, patients were separated into 2 subgroups. A low-risk score represents good survival for EC patients. We utilized Kaplan-Meier analysis to compare the survival rate of the 2 groups. We used the log-rank test to assess the survival difference between the low-risk and high-risk groups. The receiver operating characteristic (ROC) curve and the area under the ROC (AUC) value was applied to evaluate the risk model. We used risk score plots and heatmap of gene expression to assess the model.
Validation of the risk model
To verify the universality of the immune gene model, the testing group and the entire group were also analyzed. The survival analysis and the ROC curve were also used to assess the model. Similarly, risk score plots and gene expression heatmap using R software (v3.6.1) were applied for evaluation of the model.
Clinical utility of the model
We applied univariate and multivariate Cox analyses to evaluate our model and other prognostic variables, involving age, sex (0 for female and 1 for male), grade, and pathological stage. Independent prognostic variables were identified by Cox analyses.
Application of the model in both subtypes of EC
The expression of the identified DE immune genes in the model was detected in both subtypes of EC. Survival analysis was conducted to assess the model in both subtypes of EC.
Results
Gene expression data of 171 samples (160 EC and 11 normal) were obtained from TCGA database. One sample without follow-up data was excluded. The basic information of the remaining 159 samples is presented in Table 1. Further, these tumor samples were randomly divided into a training group and a testing group (Table 2). The former was applied for the construction of the model. The latter and the entire group were used for validation. The workflow is depicted in Figure 1.
Table 1
Clinical information of the 159 esophageal carcinoma patients in the entire cohort.
Clinical traits
Variable
N (total=159)
Percentage (%)
Survival status
Alive
95
59.7
Dead
64
40.3
Age (years)
≤60
81
50.9
>60
78
49.1
Gender
Female
23
14.5
Male
136
85.5
Pathological stage
Stage I
19
11.9
Stage II
74
46.5
Stage III
55
34.6
Stage IV
11
6.9
T
T0
2
1.3
T1
28
17.6
T2
38
23.9
T3
87
54.7
T4
4
2.5
M
M0
130
81.8
M1
11
6.9
MX
18
11.3
N
N0
68
42.8
N1
67
42.1
N2
16
10.1
N3
6
3.8
NX
2
1.3
Table 2
Grouping of the esophageal carcinoma patients.
Clinical traits
Variable
Training cohort
Testing cohort
Entire cohort
Survival status
Alive
56 (35.2%)
39 (24.5%)
95 (59.7%)
Dead
13 (8.2%)
51 (32.1%)
64 (40.3%)
Figure 1
The workflow of this study. DE – differentially expressed; GO – gene ontology; KEGG – Kyoto Encyclopedia of Genes and Genomes; TFs – transcription factors.
In total, 2835 DE genes (2231 upregulated and 604 downregulated) were detected in EC samples. Subsequently, 297 DE immune genes, including 241 upregulated genes and 56 downregulated genes, were identified. These genes are depicted in a volcano map and heatmap (Figure 2A–2D). The top 10 downregulated and upregulated DE immune genes are presented in Table 3.
Figure 2
Identification of the differentially expressed (DE) immune genes. (A, B) Volcano plot and the heatmap of the DE genes. (C, D) Volcano plot and the heatmap of the DE immune genes.
Table 3
The top 10 upregulated and downregulated differentially expressed immune genes.
Type
Genes
LogFC
FDR
Up-regulated
THRB
−1.24375
0.000272
GREM2
−3.02945
0.000334
CHGA
−2.54828
0.000449
ESRRB
−4.74163
0.001325
PGC
−5.71914
0.001895
CXCL17
−2.33765
0.002064
RORC
−2.00001
0.002064
LIFR
−3.12882
0.002175
NR3C2
−1.24081
0.002511
CCL14
−1.48323
0.002892
Down-regulated
PSMC2
1.229994
4.39E-05
ADRM1
1.286225
4.39E-05
CACYBP
1.322565
4.39E-05
BIRC5
2.846006
4.39E-05
ISG15
3.294875
4.39E-05
PLAU
3.634761
4.39E-05
IL11
3.976631
4.39E-05
ESM1
4.540454
4.39E-05
MMP12
4.340459
5.01E-05
ULBP2
3.589895
5.14E-05
FC – fold change; FDR – false discovery rate.
GO and KEGG analysis
1754 significantly enriched GO terms (43 molecular function terms, 1652 biological process terms, and 59 cellular component terms) were detected. Similarly, 75 significant KEGG terms were discovered. The first 9 GO terms and the first 10 KEGG terms were shown in Figure 3A and 3B, respectively.
Figure 3
Functions and transcription factors (TFs) enrichment analysis of the differentially expressed (DE) immune genes. (A) Gene ontology (GO) terms. (B) Kyoto Encyclopedia of Genes and Genomes (KEGG) terms. (C) A DE immune genes-TFs network.
Two DE TFs were identified based on the DAVID database. Then, the qualified 119 DE immune genes-DE TFs interactions were imported into Cytoscape (v3.7.2) to construct a network, including 58 DE upregulated immune genes, 15 DE downregulated immune genes, and 2 DE TFs (Figure 3C).We applied univariate Cox analysis to screen 11 significant immune genes (Figure 4A). Further, we identified 10 candidate genes using Lasso regression (Figure 4B, 4C). At last, 7 immune genes were obtained, including heat shock proteins A6 (HSPA6), S100A12, NOS2, Dickkopf-1 (DKK1), Oncostatin M (OSM), androgen receptors (AR), and oxytocin receptor (OXTR) (Figure 4D). Only one of these genes is a low-hazard gene (Figure 4E). The expression of these genes was also validated using the Oncomine database (Figure 5A–5G). To predict the prognosis of EC in the training group, the risk score model was created with the following formula: the risk score=(0.0095×gene expression level of HSPA6)+ (0.0034×gene expression level of S100A12)+ (0.016×gene expression level of NOS2)+ (0.0129×gene expression level of DKK1)+ (0.2706×gene expression level of OSM)+ (−1.6913×gene expression level of AR)+ (0.1402×gene expression level of OXTR).
Figure 4
Construction of the model based on the training group. (A) Univariate Cox analysis. (B, C) Lasso regression. (D) Multivariate Cox analysis. (E) The expression of 7 DE immune genes in esophageal carcinoma patients.
Figure 5
(A–G) The expression of 7 DE immune genes between normal and esophageal carcinoma samples in the Oncomine database.
All EC patients were classified into a high-risk group (n=39) and a low-risk group (n=40) according to the median risk score in the training cohort. The ROC curve showed a significant difference between the high-risk and the low-risk groups (P=1.224e-03) (Figure 6A). Additionally, the AUC values of the risk model at 1 and 3 years were 0.825 and 0.677, respectively (Figure 6B). The risk curve and the heatmap are depicted in Figure 6C–6E.
Figure 6
Evaluation of the prognostic signature in the training group. (A) Overall survival (OS) in the training cohort. (B) Time-dependent receiver operating characteristic (ROC) curve analysis in the training cohort. (C) The risk score distribution plot in the training cohort. (D) Survival status plot in the training cohort. (E) Heatmap of risk genes in the training cohort.
Patients in the validating groups were classified into 2 groups. Firstly, the survival curves differed significantly between the high-risk and low-risk subgroups in both validating groups (all P<0.05) (Figure 7A, 7C). Meanwhile, the AUC values of the model at 1 and 3 years were 0.813 and 0.596 in the testing group (Figure 7B). Similarly, the AUC values were 0.816 and 0.65 in the entire group (Figure 7D). The risk curve and the heatmap of the immune gene expression in the validating groups are illustrated in Figure 8A–8F.
Figure 7
Validation of the prognostic model in the testing and entire cohort. (A) Overall survival (OS) in the testing group. (B) The receiver operating characteristic (ROC) curve in the testing group. (C) OS in the entire group. (D) The ROC curve in the entire group.
Figure 8
Evaluation of the prognostic signature in the validating cohorts. (A) The risk score distribution plot in the testing cohort. (B) Survival status plots in the testing cohort. (C) Heatmap of risk genes in the testing cohort. (D) The risk score distribution plot in the entire cohort. (E) Survival status plots in the entire cohort. (F) Heatmap of risk genes in the entire cohort.
We found that pathological stage and risk score of the model were associated with OS in the entire group (all P<0.05) (Figure 9A, 9B). This means that these 2 factors were independent variables for the prognostic assessment of EC. These 2 variables were compared, showing that the risk score of the model was more accurate in predicting 1-year OS compared with the pathological stage (Figure 9C).
Figure 9
Clinical utility of the model in the entire cohort. (A) Univariate Cox analysis. (B) Multivariate Cox analysis. (C) The receiver operating characteristic (ROC) curve of the prognostic variables in the entire group at 1 year.
The expression of the identified DE immune genes of the model in both subtypes was shown in Figure 10A and 10B, and the survival analysis of the model in both subtypes was shown in Figure 10C and 10D.
Figure 10
Application of the model in both subtypes of esophageal carcinoma (EC). (A) The expression of the DE immune genes in adenocarcinoma of EC. (B) The expression of the DE immune genes in squamous cell carcinoma of EC. (C) Survival analysis of the risk model in adenocarcinoma of EC. (D) Survival analysis of the risk model in squamous cell carcinoma of EC.
Discussion
The prognosis of EC patients is poor due to the high rates of recurrence and death after surgery [18]. In addition, there have been few effective and reliable biomarkers available for accurate prognosis prediction of EC. Thus, it is urgent to develop a robust prediction model to forecast EC outcomes.In the present study, an immune gene model containing 7 DE immune genes was developed, including HSPA6, S100A12, NOS2, DKK1, OSM, AR, and OXTR. First, we constructed this model with the detected DE immune genes. All these DE genes between EC and normal tissues appear to play crucial roles in the early diagnosis of EC. Furthermore, several methods were used in model selection, and the model was verified by the validating groups. This means that the model is accurate and reliable. Furthermore, these identified immune genes are potential molecular targets in immunotherapy. S100A12, a member of the S100 gene family, was reported to be closely associated with EC [19]. DKK1, which is a secreted glycoprotein, was demonstrated to promote tumor growth and may be a new therapeutic target in EC [20]. AR was suggested to play a crucial role in the genesis of EC [21]. Lagergren et al. [22] found that OXTR was associated with an increased risk of EC. Upregulation of HSPA6 was associated with poor outcomes in hepatocellular carcinoma [23]. NOS2 was identified as a prognostic factor for the malignancy and recurrence of glioblastoma [24]. OSM, as an inflammatory cytokine, was found to be upregulated in breast cancer, and early therapeutic inhibition of OSM could prevent breast cancer metastasis [25]. The roles of HSPA6, NOS2, and OSM in the progression of EC need further research.EC was reported to be sex-linked, with a male predominance [21], which is consistent with the present study, in which sex was a prognostic factor for EC. Our multivariate Cox analysis demonstrated that the pathological stage and the risk score were independent prognostic variables. Further, the comparison revealed that the risk model was not inferior to pathological stage in predicting the prognosis of EC. Thus, the model in the present study could be used clinically for the prognosis assessment of EC.In this study, significantly enriched function terms included cytokine activity, leukocyte migration, cell chemotaxis, and receptor ligand activity. Cytokines are soluble proteins that mediate intercellular communication. An increasing number of clinical trials are exploring the efficacy and safety of cytokine-based drugs combined with other immunomodulatory medications in view of its anti-tumor properties [26-28]. Factors beyond tumor cells, including immune cells and extracellular matrix components, are also essential to esophageal tumorigenesis [29]. It was reported that leukocytes can interact with cancer cells at the site of solid tumors and in the bloodstream [30]. Migration of leukocytes, as an immune response, might reflect the status of the tumor microenvironment [31]. Enriched KEGG terms comprised IL-17 signaling pathway, cytokine-cytokine receptor interaction, and chemokine signaling pathway. IL-17A, a pro-inflammatory cytokine, was proved to promote the invasiveness and migration of EC cells and could be a potential therapeutic target for EC [32]. In addition, chemokines, including IL8, CXCL1, and CXCL3, were overexpressed during esophageal carcinogenesis, and the chemokine signaling pathway plays important roles in the development of EC [33]. Enrichment analysis of TFs identified STAT and NF-κB. A previous study reported increased expression of STAT3 in EC patients and might be involved in esophageal tumorigenesis [34]. NF-κB overexpression was also found to be related to the poor prognosis of EC [35]. All the enriched functions, pathways, and TFs reveal the crucial role of immunity in tumors and may provide potential targets for immunotherapy of EC.Considering the differences between these 2 subtypes in terms of etiology, epidemiology, molecular features, and responses to current therapeutic regimens, we assessed the application of the model in both subtypes of EC. The results of the differential expression of the immune genes in the 2 subtypes were consistent with the overall results. In addition, the model could be applied to the 2 subtypes of EC, although there was no significant difference between the high-risk group and the low-risk group in squamous cell carcinoma of EC (P=9.061e-02). The limited number of patients with a follow-up of 1 to 3 years may be the reason why we found no significant difference in squamous cell carcinoma.Several limitations of the current study should be addressed. Firstly, the AUC values of the model at 3-year follow-up were inferior to those at 1 year, possibly due to the limited number of patients with more than 3 years of follow-up. Secondly, this study was based on public databases, so the results need to be confirmed by clinical trials. Thirdly, the biological functions of these 7 immune-related genes and their role in the progression of EC need further evaluation in basic experiments.
Conclusions
We identified a 7-gene model consisting of HSPA6, S100A12, NOS2, DKK1, OSM, AR, and OXTR. This risk model was validated and can be used for prognosis evaluation in EC.
Authors: Peter S N van Rossum; Nadia Haj Mohammad; Frank P Vleggaar; Richard van Hillegersberg Journal: Nat Rev Gastroenterol Hepatol Date: 2017-12-13 Impact factor: 46.802
Authors: Michael S Lawrence; Petar Stojanov; Paz Polak; Gregory V Kryukov; Kristian Cibulskis; Andrey Sivachenko; Scott L Carter; Chip Stewart; Craig H Mermel; Steven A Roberts; Adam Kiezun; Peter S Hammerman; Aaron McKenna; Yotam Drier; Lihua Zou; Alex H Ramos; Trevor J Pugh; Nicolas Stransky; Elena Helman; Jaegil Kim; Carrie Sougnez; Lauren Ambrogio; Elizabeth Nickerson; Erica Shefler; Maria L Cortés; Daniel Auclair; Gordon Saksena; Douglas Voet; Michael Noble; Daniel DiCara; Pei Lin; Lee Lichtenstein; David I Heiman; Timothy Fennell; Marcin Imielinski; Bryan Hernandez; Eran Hodis; Sylvan Baca; Austin M Dulak; Jens Lohr; Dan-Avi Landau; Catherine J Wu; Jorge Melendez-Zajgla; Alfredo Hidalgo-Miranda; Amnon Koren; Steven A McCarroll; Jaume Mora; Brian Crompton; Robert Onofrio; Melissa Parkin; Wendy Winckler; Kristin Ardlie; Stacey B Gabriel; Charles W M Roberts; Jaclyn A Biegel; Kimberly Stegmaier; Adam J Bass; Levi A Garraway; Matthew Meyerson; Todd R Golub; Dmitry A Gordenin; Shamil Sunyaev; Eric S Lander; Gad Getz Journal: Nature Date: 2013-06-16 Impact factor: 49.962