Qianli Ma1, Yang Chen2, Fei Xiao1, Yang Hao1, Zhiyi Song1, Jin Zhang1, Katsuhiro Okuda3, Sang-Won Um4, Mario Silva5, Yoshihisa Shimada6, Chaozeng Si7, Chaoyang Liang1. 1. Department of Thoracic Surgery, China-Japan Friendship Hospital, Beijing, China. 2. Department of Biochemistry and Molecular Biology, The State Key Laboratory of Medical Molecular Biology, Institute of Basic Medical Sciences, School of Basic Medicine, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China. 3. Department of Oncology, Immunology and Surgery, Nagoya City University Graduate School of Medical Sciences, Nagoya, Japan. 4. Division of Pulmonary and Critical Care Medicine, Department of Medicine, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, South Korea. 5. Section of Scienze Radiologiche, Department of Medicine and Surgery (DiMeC), University of Parma, Parma, Italy. 6. Department of Thoracic Surgery, Tokyo Medical University, Tokyo, Japan. 7. Department of Information Management, China-Japan Friendship Hospital, Beijing, China.
Abstract
BACKGROUND: Immune and stromal component evaluation is necessary to establish accurate prognostic markers for the prediction of clinical outcomes in lung adenocarcinoma (LUAD). We aimed to develop a gene signature based on the Estimation of STromal and Immune cells in MAlignant Tumor tissues using Expression data (ESTIMATE)-stromal-immune score in LUAD. METHODS: The transcriptomic profiles of patients with LUAD were obtained from The Cancer Genome Atlas (TCGA), and the immune and stromal scores were derived using the ESTIMATE algorithm. The prognostic signature genes were selected from the differentially expressed genes (DEGs) using the robust partial likelihood-based cox proportional hazards regression method. The negative log-likelihood and the Akaike Information Criterion (AIC) were used to identify the optimal gene signature. The validation was carried out in 2 independent datasets from the Gene Expression Omnibus (GSE68571 and GSE72094). RESULTS: Patients with high ESTIMATE, stromal, and immune scores had better overall survivals (P=0.0035, P=0.066, and P=0.0077). The expression of thirty-seven genes was related to ESTIMATE-stromal-immune score. A risk stratification model was developed based on a gene signature containing CD74, JCHAIN, and PTGDS. The ESTIMATE-stromal-immune risk score was revealed to be a prognostic factor (P=0.009) after multivariate analysis. Four groups were classified based on this risk stratification model, yielding increasing survival outcomes (log-rank test, P=0.0051). This risk stratification model and other clinicopathological factors were combined to generate a nomogram. The calibration curves showed perfect agreement between the nomogram-predicted outcomes and those actually observed. Similar observations were made in 2 independent cohorts. CONCLUSIONS: The gene signature based on the ESTIMATE-stromal-immune score could predict the prognosis of patients with LUAD. 2021 Translational Lung Cancer Research. All rights reserved.
BACKGROUND: Immune and stromal component evaluation is necessary to establish accurate prognostic markers for the prediction of clinical outcomes in lung adenocarcinoma (LUAD). We aimed to develop a gene signature based on the Estimation of STromal and Immune cells in MAlignant Tumor tissues using Expression data (ESTIMATE)-stromal-immune score in LUAD. METHODS: The transcriptomic profiles of patients with LUAD were obtained from The Cancer Genome Atlas (TCGA), and the immune and stromal scores were derived using the ESTIMATE algorithm. The prognostic signature genes were selected from the differentially expressed genes (DEGs) using the robust partial likelihood-based cox proportional hazards regression method. The negative log-likelihood and the Akaike Information Criterion (AIC) were used to identify the optimal gene signature. The validation was carried out in 2 independent datasets from the Gene Expression Omnibus (GSE68571 and GSE72094). RESULTS: Patients with high ESTIMATE, stromal, and immune scores had better overall survivals (P=0.0035, P=0.066, and P=0.0077). The expression of thirty-seven genes was related to ESTIMATE-stromal-immune score. A risk stratification model was developed based on a gene signature containing CD74, JCHAIN, and PTGDS. The ESTIMATE-stromal-immune risk score was revealed to be a prognostic factor (P=0.009) after multivariate analysis. Four groups were classified based on this risk stratification model, yielding increasing survival outcomes (log-rank test, P=0.0051). This risk stratification model and other clinicopathological factors were combined to generate a nomogram. The calibration curves showed perfect agreement between the nomogram-predicted outcomes and those actually observed. Similar observations were made in 2 independent cohorts. CONCLUSIONS: The gene signature based on the ESTIMATE-stromal-immune score could predict the prognosis of patients with LUAD. 2021 Translational Lung Cancer Research. All rights reserved.
Lung cancer is the primary cause of cancer death and the most frequently diagnosed malignancy globally. Worldwide lung cancer incidence and mortality were estimated 2.1 million new lung cancer cases and 1.8 million related deaths in 2018 (1). In the past 15 years, lung adenocarcinoma (LUAD) has become the most common subtype of lung cancer (2). For patients with LUAD, outcomes are variable and difficult to predict (3,4). Novel technologies and strategies for detection of LUAD and stratification of prognosis are being developed, yet standardized predictive and surrogate biomarkers still lack for personalized prognostication.For many years, the prognostication of lung cancer was based on the tumor-node-metastasis (TNM) staging system, which allowed major clustering for management and prognostication (5). However, the static measurement method of TNM system does not fully reflect the dynamic process of the disease, such as the time to occurrence or death, and lacks relevant prognostic information: clinical outcomes can vary 8–47% among stage I–II lung cancer (6,7). The TNM staging assumes homogeneous growth, which is not reflected by actual neoplastic processes where also the host immune response plays a key role (8). Thus, novel biomarkers and risk evaluation models are warranted to improve prognostic prediction and precision treatment in addition to the TNM staging system.LUAD tissue comprise tumor cells, infiltrating immune cells, tumor-related stromal cells, airway epithelial cells, and other normal endothelial cells. Recent studies have found that stromal cells play an essential role in the progression and treatment of cancer (9). Along with the tumor growth, immune cells also play important roles (10). Stromal and immune cells constitute the major components of the tumor microenvironment of LUAD (11,12). Therefore, the prognostic stratification of LUAD patients can be improved by comprehensive evaluation that includes metrics for stromal and immune cells to support personalize prognostication over the TNM staging system. The Estimation of STromal and Immune cells in MAlignant Tumor tissues using Expression data (ESTIMATE) algorithm was recently developed. It integrates the transcriptional profiles of cancer tissues and various infiltrating normal cells (13). Using this algorithm, the levels of infiltrating stromal and immune cells are predicted by calculating the scores based on gene signatures. Previously, researchers have used this method to predict the prognosis of patients with cancers of the digestive system cancer (14). However, data on stromal and immune scores in lung cancer are lacking.In this study, we aimed to test immune and stromal scores in association with prognosis in LUAD. In particular, we aimed to investigate the interaction between the tumor and tumor immune microenvironment (TIME) during lung tumorigenesis and progression, and to introduce a new potential gene signature risk stratification model based on the estimate-stromal-immune scores that could be clinically used to predict the survival outcomes of LUAD patients. We present the following article in accordance with the TRIPOD reporting checklist (available at http://dx.doi.org/10.21037/tlcr-21-223).
Methods
Population
This retrospective study included derivation and validation cohorts from publicly available database, as follows:❖ Derivation cohort: 514 LUAD samples from The Cancer Genome Atlas (TCGA);❖ Validation cohorts: external validation was operated in two independent cohorts from the Gene Expression Omnibus (GEO) database:96 LUAD samples from GSE68571 database;442 LUAD samples from GSE72094 database.Clinicopathological data for the corresponding patients were obtained including age, sex, pathologic stage, and survival outcome. The inclusion criteria for this study were available survival information and available expression data.In accordance with the database policy, access to the dataset was obtained from TCGA and GEO. The study was conducted in accordance with the Declaration of Helsinki (as revised in 2013).
Estimation of stromal and immune scores
The gene expression data of LUAD tissues in derivation population were downloaded from the Genomic Data Commons (GDC, available at: http://potal.gdc.cancer.gov/) Data Portal on November 6, 2019. The FPKM (fragments per kilobase of exon per million reads mapped) method was used to quantify gene expression.The expression matrix for estimating the stromal and immune scores was normalized by the ESTIMATE algorithm. Stromal and immune scores were calculated by performing single-sample gene set enrichment analysis. These scores formed the basis for the ESTIMATE score (15).
Statistical analysis
The association of the Estimate, stromal, and immune scores with the clinicopathological characteristics of LUAD patients was analyzed by comparing the score distributions among sex, pathologic T, pathologic M, and TNM stage subgroups.
Relationship of estimate-stromal-immune scores and prognosis
The Kaplan–Meier estimator was used to estimate the patients’ overall survival. The derivation cohort was divided into 2 groups based on their ESTIMATE, stromal, and immune scores: maximally selected rank statistics were employed to identify the optimal score cutoff for classifying patients by each score. The R package “maxstat” was used during identification (16). Log-rank tests were used to compare the survival outcomes of the 2 groups.
Identification of differentially expressed genes (DEGs)
The high and low ESTIMATE score groups (stromal or immune) were compared for gene expression: the DEGs were identified using the “limma” package in R (17). For the identification of DEGs, the threshold was a simultaneously absolute value of log2 (fold change) >1 and false discovery rate (FDR) adjusted P value <0.05. The low-risk group was used as reference: genes that were up-regulated in the high-risk group were considered “up-regulated DEGs” and those that were comparatively down-regulated were considered “down-regulated DEGs”. A heatmap was created to show the expression patterns of significant DEGs. Complete linkage was used to measure distances between unsupervised hierarchical clusters.
Analyses of Gene Ontology and Kyoto Encyclopedia of Genes and Genomes Pathway Enrichment
Gene Ontology (GO) biological process (BP), cellular component (CC), and molecular function (MF) enrichment analyses were performed. The Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis was performed for all DEGs. The criterion for statistical significance was an FDR adjusted P value of <0.05.
Gene signature and risk evaluation model
Prognostic genes were defined as those with a log-rank P value of <0.01 for survival comparison at the optimal expression cutoff. All DEGs performed the prognostic identification. A robust partial likelihood-based Cox proportional hazards regression survival model was used to select prognostic signature genes from all prognostic DEGs (18). A series of gene signatures were generated by forward selection. The Akaike information criterion (AIC) and negative log-likelihood methods were used to identify the optimal gene signature.A risk stratification model was developed as follows:❖ positive prognostic genes earned a score of 0 when up-regulated and 1 when down-regulated;❖ negative prognostic genes earned a score of 1 when up-regulated and 0 when down-regulated.This model was applied in the external validation cohorts by summing the scores of all signature genes. Then, the patients were stratified into risk groups according to their risk score level. The clinicopathological characteristics of patients in the risk score =0, risk score =1, risk score =2, and risk score =3 groups are list in .
Table 1
Clinicopathologic characteristics of patients in different risk groups
Characteristics
Whole cohort (n=514)
Risk score =0 (n=124)
Risk score =1 (n=130)
Risk score =2 (n=139)
Risk score =3 (n=121)
P value
Age [median (IQR)]
66.00 [59.00, 72.25]
68.50 [61.00, 74.00]
68.00 [60.00, 74.25]
65.00 [57.00, 71.50]
62.00 [56.50, 70.00]
<0.001
Gender (%)
0.006
Female
274 (53.6)
82 (66.1)
68 (52.7)
71 (51.4)
53 (44.2)
Male
237 (46.4)
42 (33.9)
61 (47.3)
67 (48.6)
67 (55.8)
Stage (%)
<0.001
I
275 (54.3)
85 (70.2)
73 (57.5)
60 (43.5)
57 (47.5)
II
121 (23.9)
29 (24.0)
28 (22.0)
29 (21.0)
35 (29.2)
III
84(16.6)
6 (5.0)
18 (14.2)
40 (29.0)
20 (16.7)
IV
26 (5.1)
1 (0.8)
8 (6.3)
9 (6.5)
8 (6.7)
Survival (%)
0.023
Alive
327 (63.6)
92 (74.2)
84 (64.6)
81 (58.3)
70 (57.9)
Dead
187 (36.4)
32 (25.8)
46 (35.4)
58 (41.7)
51 (42.1)
Stage (%)*
<0.001
i
274 (53.6)
85 (68.5)
73 (56.6)
60 (43.5)
56 (46.7)
ii
120 (23.5)
29 (23.4)
28 (21.7)
28 (20.3)
35 (29.2)
iii
84 (16.4)
6 (4.8)
18 (14.0)
40 (29.0)
20 (16.7)
iv
26 (5.1)
1 (0.8)
8 (6.2)
9 (6.5)
8 (6.7)
M (%)*
0.322
M0
343 (67.1)
83 (66.9)
83 (64.3)
93 (67.4)
84 (70.0)
M1
25 (4.9)
1 (0.8)
7 (5.4)
9 (6.5)
8 (6.7)
Mx
139 (27.2)
38 (30.6)
39 (30.2)
35 (25.4)
27 (22.5)
N (%)*
<0.001
N0
328 (64.2)
91 (73.4)
87 (67.4)
74 (53.6)
76 (63.3)
N1
95 (18.6)
24 (19.4)
21 (16.3)
22 (15.9)
28 (23.3)
N2
74 (14.5)
4 (3.2)
15 (11.6)
39 (28.3)
16 (13.3)
N3
2 (0.4)
1 (0.8)
0 (0.0)
1 (0.7)
0 (0.0)
Nx
11 (2.2)
3 (2.4)
6 (4.7)
2 [1–4]
0 (0.0)
T (%)
0.002
T1
168 (32.9)
55 (44.4)
51 (39.5)
35 (25.4)
27 (22.5)
T2
276 (54.0)
60 (48.4)
61 (47.3)
87 (63.0)
68 (56.7)
T3
45 (8.8)
6 (4.8)
11 (8.5)
9 (6.5)
19 (15.8)
T4
19 (3.7)
3 (2.4)
4 (3.1)
6 (4.3)
6 (5.0)
Tx
3 (0.6)
0 (0.0)
2 [1–6]
1 (0.7)
0 (0.0)
Estimate score (%)
<0.001
High
257 (50.0)
111 (89.5)
82 (63.1)
49 (35.3)
15 (12.4)
Low
257 (50.0)
13 (10.5)
48 (36.9)
90 (64.7)
106 (87.6)
Immune score (%)
<0.001
High
453 (88.1)
124 (100.0)
128 (98.5)
122 (87.8)
79 (65.3)
Low
61 (11.9)
0 (0.0)
2 [1–5]
17 (12.2)
42 (34.7)
Stromal score (%)
0.003
High
26 (5.1)
13 (10.5)
8 (6.2)
4 (2.9)
1 (0.8)
Low
488 (94.9)
111 (89.5)
122 (93.8)
135 (97.1)
120 (99.2)
*Patients with information unavailable on stage (7 patients, 1.36%), pathologic M (4 patients, 0.78%), and pathologic N (1 patient, 0.19%) were excluded from the comparison.
*Patients with information unavailable on stage (7 patients, 1.36%), pathologic M (4 patients, 0.78%), and pathologic N (1 patient, 0.19%) were excluded from the comparison.R version 3.5.2 (http://www.R-project.org) was used for the statistical analyses. The “estimate” package with default parameters was used to calculate the ESTIMATE, stromal, and immune scores.The “rbsurv” package with 3-fold cross-validation and 1,000 iterations was used to construct the robust likelihood-based survival model. GO and KEGG enrichment analyses were performed using the “clusterProfiler” package. Heatmaps and Venn diagrams were constructed with the “pheatmap” and “VennDiagram” packages, respectively. The FDR method was used to adjust for multiple testing. Multivariate Cox regression analysis was performed to identify the independent risk factors for overall survival. And the protein-protein interaction (PPI) network was constructed. After the integration of clinicopathological risk factors and risk group (based on the Estimate, stromal, and immune scores), a nomogram was built for precise prediction of overall survival.
Results
Estimation of estimate-stromal-immune scores
The expression data and clinical information from 514 LUAD patients in the TCGA database were analyzed. The clinicopathological characteristics of patients in the risk score =0, risk score =1, risk score =2, and risk score =3 groups are listed in . Patients in the group with the highest risk (risk score =3) were more likely to be younger and male (P<0.001, and P=0.006 respectively). Furthermore, patients in the highest risk group had a larger tumor size and lymph node metastases (P=0.002, and P<0.001 respectively). The lower risk groups included more early-stage patients (P<0.001).In the validaiton cohort, the estimate scores ranged from −2,963.63 to 4,485.76; the stromal scores ranged from −1,959.31 to 2,098.77; and the immune scores ranged from −1,355.85 to 3,286.67. Patients in higher risk groups yielded lower Estimate scores (P<0.001), lower immune scores (P<0.001), and lower stromal scores (P=0.003).
Association of estimate, stromal, and immune scores with the clinicopathological characteristics and prognoses of LUAD patients
The association between the scores (Estimate, stromal, or immune) and patients data is summarized in . The estimate, stromal, and immune scores of female patients were increased compared to those of males (P=0.002, P=0.0058, and P=0.0033, respectively). Significant correlations were found between the estimate, stromal, immune scores and pathologic T factor. Larger tumors (T3 and T4) yielded lower estimate and immune scores than those of smaller sizes (T1 and T2) (Kruskal Wallis test, P=0.04 for Estimate score, P=0.22 for stromal score, and P=0.012 for immune score). Significant correlations were also observed between the estimate and stromal scores and pathologic M factor. Patients with metastases (M1) yielded lower estimate and stromal scores than those without metastases or with unknown metastatic status (M0 and Mx) (Kruskal Wallis test, P=0.025 and P=0.016 respectively). Both the estimate and immune scores were variously distributed among TNM stage. Tumors with an advanced stage (stage III and stage IV) yielded lower estimate and immune scores than early-stage tumors (stage I and stage II) (Kruskal Wallis test, P=0.028 and P=0.027 for estimate score and immune score, respectively).
Figure 1
Association of Estimate, Stromal and Immune scores with lung adenocarcinoma demographic (gender), pathology and prognosis. (A) Comparisons and distributions of Estimate, Stromal and Immune scores among gender, different pathologic T, pathologic M, and TNM stage. (B) The optimal cutoff identification for Estimate, Stromal and Immune scores. The standardized log-rank statistic value is shown in the scatter plot. A vertical dashed line indicates the optimal cutoff (Estimate score =1,057.39, Stromal score =−102.58, Immune score =1,211.05). The density distribution for low- and high-Estimate-Stromal-Immune score groups is shown in the lower histogram. (C) Overall survival for patients with high vs. low Estimate scores (Kaplan-Meier plot). (D) Overall survival for patients with high vs. low Stromal scores (Kaplan-Meier plot). (E) Overall survival for patients with high vs. low Immune scores (Kaplan-Meier plot).
Association of Estimate, Stromal and Immune scores with lung adenocarcinoma demographic (gender), pathology and prognosis. (A) Comparisons and distributions of Estimate, Stromal and Immune scores among gender, different pathologic T, pathologic M, and TNM stage. (B) The optimal cutoff identification for Estimate, Stromal and Immune scores. The standardized log-rank statistic value is shown in the scatter plot. A vertical dashed line indicates the optimal cutoff (Estimate score =1,057.39, Stromal score =−102.58, Immune score =1,211.05). The density distribution for low- and high-Estimate-Stromal-Immune score groups is shown in the lower histogram. (C) Overall survival for patients with high vs. low Estimate scores (Kaplan-Meier plot). (D) Overall survival for patients with high vs. low Stromal scores (Kaplan-Meier plot). (E) Overall survival for patients with high vs. low Immune scores (Kaplan-Meier plot).Illustrations of optimal cutoff identification for estimate, stromal, and immune score are displayed in . Patients with high estimate scores had better overall survival than those with low estimate scores (log-rank test, P=0.0035) (). Patients with high stromal scores showed a trend of better overall survival than lower group (log-rank test, P=0.066), and this trend seemed more obvious after 5 years of follow-up (). Patients with high immune scores yielded better overall survival than those with low immune scores (log-rank test, P=0.0077), and this survival advantage was more remarkable after 5 years of follow-up ().
Comparison of gene expression profiles by estimate, stromal, and immune scores in LUAD
The expression profiles of LUAD patients with high estimate (high stromal and high immune) scores were compared with those of patients with low estimate (low stromal and low immune) scores to identify DEGs related to estimate-stromal-immune score.The DEGs related to Estimate, stromal, and immune score are visualized in volcano plots (). There were 86 shared up-regulated DEGs () and 3 down-regulated DEGs () in the estimate, stromal, and immune score groups. A heatmap was created to visualize the expression profiles of the selected 89 DEGs.
Figure 2
The profiles of expression and Estimate, Stromal and Immune score-related DEGs’ biological functions. (A) Volcano plots showing the DEGs of Estimate, Stromal and Immune score (high vs. low). (B) Overlap of Estimate score-, Stromal score- and Immune score-related up-regulated DEGs. (C) Overlap of Estimate score-, Stromal score- and Immune score-related down-regulated DEGs. (D) Heatmaps showing expression profiles for Estimate-Stromal-Immune score related DEGs. (E) Top 10 KEGG pathways enriched by the DEGs. (F) Top 10 Gene Ontology terms enriched by the DEGs. DEGs, differentially expressed genes.
The profiles of expression and Estimate, Stromal and Immune score-related DEGs’ biological functions. (A) Volcano plots showing the DEGs of Estimate, Stromal and Immune score (high vs. low). (B) Overlap of Estimate score-, Stromal score- and Immune score-related up-regulated DEGs. (C) Overlap of Estimate score-, Stromal score- and Immune score-related down-regulated DEGs. (D) Heatmaps showing expression profiles for Estimate-Stromal-Immune score related DEGs. (E) Top 10 KEGG pathways enriched by the DEGs. (F) Top 10 Gene Ontology terms enriched by the DEGs. DEGs, differentially expressed genes.Identified DEGs had distinct expression profiles from the unsupervised hierarchical clustering analyses. Patients with high and low Estimate (stromal or immune) scores could be distinguished effectively according to these DEGs ().KEGG pathways enriched by the DEGs were related to staphylococcus aureus infection, phagosome, tuberculosis, and cell adhesion molecules (). The stromal-related DEGs were enriched in protein digestion and absorption, focal adhesion, and extracellular matrix receptor interaction (Figure S1A). Meanwhile, the immune score-related DEGs were enriched in antigen processing and presentation, the chemokine signaling pathway, and Th1 and Th2 cell differentiation (Figure S1B). GO analysis showed that the DEGs were enriched in the immune response-activating and -regulating cell surface receptor signaling pathway, MHC class II protein complex antigen processing, and presentation of peptide antigen via MHC class II (). Separate GO enrichment analyses were carried out and showed that the stromal scores were enriched in extracellular matrix organization, connective tissue development, collagen trimer, and extracellular matrix structural constituent (Figure S1C). The immune scores were enriched in T-cell activation, positive regulation of innate immune response, T-cell receptor complex, and C-C chemokine receptor activity (Figure S1D).
Identification of prognostic DEGs in LUAD
These prognostic genes were defined from the above 89 estimate-stromal-immune score-related DEGs and comprised 36 up-regulated DEGs and 1 down-regulated DEG. A forest plot showing the prognostic effects of these 37 DEGs is displayed in .
Figure 3
Forest plot of hazard ratios for 37 Estimate, Stromal-Immune score-related prognostic DEGs. The Cox proportional hazard regression model was used to estimate the Hazard ratios and corresponding 95% confidence intervals. DEGs, differentially expressed genes.
Forest plot of hazard ratios for 37 Estimate, Stromal-Immune score-related prognostic DEGs. The Cox proportional hazard regression model was used to estimate the Hazard ratios and corresponding 95% confidence intervals. DEGs, differentially expressed genes.As recently illustrated in clinical trials boosting T-cell responses, 7 genes were summarized as follows: PDCD1, LAG3, HAVCR2, CTLA4, KIR2DL3, TNFRSF18, and TNFRSF9. These 7 genes and the 37 prognostic DEGs selected from TCGA were screened using the STRING (Search Tool for the Retrieval of Interacting Genes/Proteins) database, and the protein–protein interaction (PPI) network was constructed. Finally, IL-7R (interleukin 7 receptor), CD52, CD53, HCK (hemopoietic cell kinase), and CYBB (Cytochrome B-245 Beta Chain) were found to interact most closely with the immune checkpoint inhibitors (ICIs) based on the number of links (Figure S2).
Gene signature based on the estimate, stromal, and immune scores and risk stratification model for LUAD
Thirty-seven DEGs were selected to identify the gene signature with the most prognostic significance. Based on the AIC and the smallest values for the negative log-likelihood statistics, CD74, JCHAIN, and PTGDS were identified as the optimal signature genes (). These 3 genes were significantly associated with favorable survival outcomes (). For risk stratification, an individual-level score was developed based on the 3 signature genes (). A lower risk score was found to be significantly associated with a longer survival time, and patient living months increased as the risk score decreased (). Using this risk score stratification model, 4 groups were classified according to their survival outcomes (log-rank test, P=0.005) (). Further validation of the risk stratification model was carried out among patients with stage I, II, and III LUAD ().
Table 2
Prognostic gene signature selection based on a robust likelihood- based survival model
Gene Symbol
Negative log-likelihood
AIC
Forward selection
CD74
973.92
1,949.83
Y
JCHAIN
971
1,946
Y
PTGDS
969.98
1,945.96
Y
HLA-DRA
969.95
1,947.9
-
HLA-DRB5
969.07
1,948.13
-
HLA-DOA
968.91
1,949.82
-
C7
968.89
1,951.78
-
PTPRC
968.89
1,953.77
-
HLA-DQA1
968.7
1,955.39
-
HLA-DMB
966.91
1,953.81
-
CHIT1
966.85
1,955.7
-
CPA3
965.89
1,955.78
-
CSF2RB
965.89
1,957.78
-
HLA-DPB1
965.86
1,959.72
-
CD79A
965.06
1,960.13
-
S100P
963.13
1,958.25
-
HLA-DQB2
962.83
1,959.66
-
HLA-DRB1
962.72
1,961.45
-
CD52
962.65
1,963.31
-
AIC, Akaike Information Criterion; a smaller statistic value indicates better predictive ability.
Figure 4
Estimate-Stromal-Immune score-based gene signature and risk stratification mode. (A) Overall survival for patients grouped by expression of three signature genes, CD74, JCHAIN, PTGDS were shown in the Kaplan-Meier plots. (B) The algorithm for risk stratification model was based on the expression of three signature genes. (C) Correlation between lung adenocarcinoma patient living months and risk score. (D) Overall survival for all patients according to risk score (Kaplan-Meier plot). (E) Overall survival for patients with stage I lung adenocarcinoma according to risk score (Kaplan-Meier plot). (F) Overall survival for patients with stage II lung adenocarcinoma according to risk score (Kaplan-Meier plot). (G) Overall survival for patients with stage III lung adenocarcinoma according to risk score (Kaplan-Meier plot).
AIC, Akaike Information Criterion; a smaller statistic value indicates better predictive ability.Estimate-Stromal-Immune score-based gene signature and risk stratification mode. (A) Overall survival for patients grouped by expression of three signature genes, CD74, JCHAIN, PTGDS were shown in the Kaplan-Meier plots. (B) The algorithm for risk stratification model was based on the expression of three signature genes. (C) Correlation between lung adenocarcinoma patient living months and risk score. (D) Overall survival for all patients according to risk score (Kaplan-Meier plot). (E) Overall survival for patients with stage I lung adenocarcinoma according to risk score (Kaplan-Meier plot). (F) Overall survival for patients with stage II lung adenocarcinoma according to risk score (Kaplan-Meier plot). (G) Overall survival for patients with stage III lung adenocarcinoma according to risk score (Kaplan-Meier plot).
Relationship of estimate-stromal-immune score-based risk group with overall survival in patients with LUAD
The prognostic value of the estimate-stromal-immune score-based risk stratification model was evaluated through multivariate cox regression (). After adjusting the variables of age, sex, race, and pathologic stage, risk group was identified as a significant prognostic factor (P=0.009).
Table 3
Multivariate Cox regression analyses of risk factors for overall survival
Characteristics
Hazard ratio
95% Cl
P value
Risk Score
1 vs. 0
1.31
(0.82–2.11)
0.262
2 vs. 0
1.41
(0.88–2.24)
0.151
3 vs. 0
1.84
(1.16–2.91)
0.009
Stage
II vs. 1
2.42
(1.68–3.48)
<0.001
III vs. 1
3.42
(2.30–5.08)
<0.001
IV vs. 1
3.53
(2.02–6.18)
<0.001
After the integration of clinicopathological risk factors and risk group (based on the estimate, stromal, and immune scores), a nomogram was built for precise prediction of OS (). A higher points total was associated with improved survival. However, the difference in points between pathologic stages III and IV was not obvious. This result was also observed in relation to sex, with the difference between females and males being less than 7 points. Compared against the ideal model, the calibration plot showed our nomogram to perform well ().
Figure 5
Nomogram for predicting overall survival of lung adenocarcinoma. (A) The 1-, 3-, and 5-year overall survival probability was predicted by integrating the Estimate, Stromal, and Immune score-based risk group and clinicopathologic risk factors from nomogram. (B) Plot shows the calibration of nomogram in terms of the agreement between predicted and observed outcomes. The dotted line shows the nomogram’s performance in the plot. This represents the perfect prediction.
Nomogram for predicting overall survival of lung adenocarcinoma. (A) The 1-, 3-, and 5-year overall survival probability was predicted by integrating the Estimate, Stromal, and Immune score-based risk group and clinicopathologic risk factors from nomogram. (B) Plot shows the calibration of nomogram in terms of the agreement between predicted and observed outcomes. The dotted line shows the nomogram’s performance in the plot. This represents the perfect prediction.
Validation of the risk stratification model in 2 external cohorts
GSE68571 cohort: among 96 patients, those with a high expression of JCHAIN and PTGDS had better OS (log-rank test, P=0.025, and P=0.038, respectively) (). A negative correlation was found between the risk scores calculated from these 3 signature genes and patient living months (). The patients in the validation cohort were classified into 4 groups using the 3-gene model, and the survival outcomes in the groups were found to be significantly different (log-rank test, P=0.027) ().
Figure 6
The Estimate-Stromal-Immune score-based risk stratification model was validated in two independent cohorts. (A) Overall survival for patients from GSE68571, grouped by expression of three signature genes, CD74, JCHAIN, and PTGDs (Kaplan-Meier plots). (B) Correlation between living months for patients and risk scores from GSE68571. (C) Overall survival according to risk score for patients from GSE68571 (Kaplan-Meier plot). (D) Overall survival for patients fromGSE72094, grouped by expression of three signature genes, CD74, JCHAIN, and PTGDs (Kaplan-Meier plots). (E) Correlation between living months for patients and risk scores from GSE72094. (F) Overall survival according to risk score for patients from GSE72094 (Kaplan-Meier plot).
The Estimate-Stromal-Immune score-based risk stratification model was validated in two independent cohorts. (A) Overall survival for patients from GSE68571, grouped by expression of three signature genes, CD74, JCHAIN, and PTGDs (Kaplan-Meier plots). (B) Correlation between living months for patients and risk scores from GSE68571. (C) Overall survival according to risk score for patients from GSE68571 (Kaplan-Meier plot). (D) Overall survival for patients fromGSE72094, grouped by expression of three signature genes, CD74, JCHAIN, and PTGDs (Kaplan-Meier plots). (E) Correlation between living months for patients and risk scores from GSE72094. (F) Overall survival according to risk score for patients from GSE72094 (Kaplan-Meier plot).GSE72094 cohort: among 442 patients, high expression levels of CD74, JCHAIN, and PTGDS were significantly associated with favorable overall survival (log-rank test, P<0.001, P=0.023, and P=0.008, respectively) (). A negative correlation was also found between risk score and patient living months (). Significantly different survival outcomes were also observed in the 4 risk-score groups (log-rank test, P<0.001) ().
Discussion
We report a signature for prognostication of LUAD beyond TNM stage, notably to assist the use of adjuvant therapy after surgery for early stage LUAD. The signature based on the ESTIMATE-stromal-immune score could predict the prognosis of patients with LUAD. Eight to forty-seven percent of LUAD patients experience disease progression, indicating that occult metastases already exist at the time of “curative” surgery (7). Traditional TNM staging classification relies solely on the characteristics of tumor cells, regardless of the effect of the host immune response. In fact, the onsite and progression of tumors are affected both by the tumor cells and the TIME. Tumor progression and patient outcomes have been found to be related to the TIME. Tumor-infiltrating immune cells and stromal cells are 2 major components of the TIME. These cells have been shown to play important roles in recent studies. It is essential to consider immune parameters as prognostic factors and to improve the accuracy of cancer classification using the immunescore.The immunescore is derived from the immune contexture, and is defined by immune cell infiltration based on single-sample gene set enrichment analysis. It was initially established to evaluate the prognosis of patients with stage I, II, and III colon cancer (19). For early-stage patients, the relapse and 5-year survival rates for the high-immunescore group were 4.8% and 86.2%, respectively. However, the relapse and 5-year survival rates for the low-immunescore group were 72% and 27.5%, respectively (20). Therefore, patients in the low-immunescore group could potentially have benefited from adjuvant therapy. Recent studies have proved that the immunescore can also play a role in prognostic prediction in other human malignancies such as hepatocellular carcinoma, pancreatic cancer, melanoma, lung cancer, and even brain metastasis (21,22). Two advantages of the immunescore are as follows: firstly, it appears to be the strongest prognostic factor for survival outcome; and secondly, it provides a tool or target for novel therapeutic approaches (23,24).In 2006, the relationship between the immunescore and clinicopathological features of patients with non-small cell lung cancer (NSCLC) was studied (25). To improve the prognostic, and most possible a predictive tool for NSCLC, Roy M. recommended that the immunescore should be added to the TNM classification (TNM-I) (26). A study showed that the former displayed T-cells, Interferon, and M1 macrophage signatures and was associated with a better prognosis, while the latter presented with M2 macrophage signatures and immunosuppressive factors such as WNT/transforming growth factor-β (27).In LUAD, T cells may cooperate to suppress the tumor. Many subtypes of T cells have been discovered due to the development of immunohistochemical technology. CD3+ is expressed in all T lymphocytes and is thus considered to be a biomarker. CD4+ T helper lymphocytes (Th), CD8+ cytotoxic T lymphocytes (CTLs), CD45RO+ memory T cells, and FoxP3+ (forkhead box P3 lymphocytes, regulatory T cells) regulatory cells (Tregs) are other subtypes of T-lymphocyte cell surface markers (28). High CD3+, CD8+, and CD4+ T-cell infiltration is an independent favorable prognostic factor (29). CD45RO+ has been proposed as a risk factor for disease-free survival (HR =1.79, P=0.001), disease-specific survival (HR =1.80, P=0.001), and OS (HR =1.61, P=0.001) (30). FoxP3 contributes to more aggressive behavior of LUAD (such as solid-predominant subtype), and is associated with poor OS (31). Our result revealed that the stromal component in pulmonary tissue might act as a barrier in tumorigenesis by constraining tumor cell proliferation. Detailed molecular analysis of lung stroma might be important for long-term health management of LUAD patients. Patients with high immune scores yielded better overall survival than those with low immune scores, and this survival advantage was more remarkable after 5 years of follow-up. This finding suggested that, from the beginning of the tumor formation, there was a stronger adaptive immune response in LUAD patients with higher immune scores than in patients with lower immune scores.Apart from the subtypes of TILs, spatial organization of tumor cells, stromal cells, and lymphocytes also plays an important role in tumor progression and metastasis. The immunescore comprises 2 parts, namely the tumor core and its invasive margin. Compared with those in the tumor nest, the numbers of infiltrating CD8+ or CD4+ T cells in the tumor stroma are much higher. Furthermore, the CD8+ T cell stromal score also seems to be superior to the tumor score. More specifically, CD8+ T-cell stromal scoring at the invasive margin is a strong prognostic factor (32).It is unclear how tumor-infiltrating B cells (TIBs) and antibodies interact with each other in the microenvironment. For Kirsten RAt Sarcoma (KRAS) mutation-type LUAD, a low proportion of IgA isotype and a high proportion of IgG1 among all intratumorally produced immunoglobulins were specifically associated with improved overall survival. In STK11 mutation-type LUAD, a high level of IgG4-producing TIBs was associated with better overall survival. The specificity of protective B-cell populations might provide basic information for the development of efficient targeted immunotherapies (33).MHC-II is an essential component of the adaptive immune system and is critical for antigen presentation to CD4+ T lymphocytes. Accumulating evidence has demonstrated that tumor-specific MHC-II is associated with favorable outcomes (34). Furthermore, positive MHC-II expression in tumor cells was found to be associated with improved disease-free survival in triple-negative breast cancer patients with lymph node metastasis (35). Park IA also found that aberrant expression of MHC II in triple-negative breast cancer might trigger an antitumor immune response that reduces the rate of relapse and enhances progression-free survival (36). In anti-PD-1-treated melanoma patients, MHC-II positivity of tumor cells was associated with the therapeutic response (37). For Hodgkin lymphoma, Roemer MGM. reported that genetically driven PD-L1 expression and MHC-II positivity were potential predictors of favorable outcome after PD-1 blockade (38).In this study, a TIME-related prognostic model was constructed to predict the outcomes of LUAD patients based on differentially expressed stromal and immune signature genes. Yue found that a 3-gene signature (ADAM12, BTK, and ERG) could be an independent prognostic factor (39). A total of 37 DEGs related to the estimate, stromal, and immune scores were identified in our study. A gene signature containing CD74, JCHAIN, and PTGDS was utilized to develop a risk stratification model.CD74, also known as the invariant chain (Ii), is the molecular chaperone of MHC-II. It is a high-affinity receptor of the macrophage migration inhibitory factor (MIF), a proinflammatory cytokine. Lee found that CD74-ROS1 fusion represented a genomic signature in early oncogenesis (40). Secretory polymeric immunoglobulins (IgA dimers and IgM pentamers) are unique because, as well as light and heavy chains, they contain joining chains (J-chains) responsible for their oligomerization. These antibodies constitute parts of the local adaptive immune system, acting on the mucosa membranes of the respiratory and digestive systems as the first barrier of protection against potential infectious agents (41). JCHAIN is the coding gene of immunoglobulin J-chains. It has been reported to be related to the clinical outcomes of acute lymphoblastic leukemia patients (42). Along with CHAC2, CLEC9A, and 8 other genes, JCHAIN was also found to be associated with survival in head and neck squamous cell carcinoma (43). Moreover, You reported that JCHAIN might exert a critical influence on resistance and tumorigenesis in Marek’s disease (44).Prostaglandin is well known to be essential for tumor angiogenesis and growth. Mast cell-derived prostaglandin D2 governs the tumor microenvironment by limiting excessive responses to TNF-α production and vascular permeability (45). When added exogenously, PTGDS reportedly suppressed the hyperproliferation and migration of A549 cells (46). Furthermore, Shyu found that overexpression of PTGDS could inhibit the invasion of H-rev107-mediated testis tumor cells through the PGD2-cAMP-SOX9 signaling pathway (47).Immunotherapy has emerged as a key pillar of cancer treatment. Seven 7 genes (PDCD1, LAG3, HAVCR2, CTLA4, KIR2DL3, TNFRSF18, and TNFRSF9) and the 37 prognostic DEGs selected from TCGA were screened using the STRING (Search Tool for the Retrieval of Interacting Genes/Proteins) database, and the protein–protein interaction (PPI) network was constructed. In the current study, PTPRC, FGL2, CD74, HLA-DRB1, CD52, CSF2RB, LY86, PLEK, HLA-DPA1, and HLA-DQB1 were the top 10 immunotherapy target genes in the PPI network. PTPRC, also known as CD45, encodes some members from the protein tyrosine phosphatase (PTP) family. Proteins from this family are commonly activated in tumors. PTPRC has been shown to be an essential regulator of T- and B-cell antigen receptor signaling (48). Fibrinogen-like protein 2 (FGL2) is essential for both innate and adaptive immunity. FGL2 might promote tumor progression by activating cancer-associated fibroblasts in the tumor microenvironment (49). HLA-DRB1 is strongly associated with the risk of lung cancer and seemed to affect antigen presentation. Qin provided evidence for the substantial contributions of HLA II molecules to lung cancer susceptibility (50). CD52 is present on the surface of mature lymphocytes. It inhibits T-cell activation by impairing phosphorylation of the T-cell receptor-associated kinases Lck and Zap70 (51). Taken together, most prognostic DEGs from this study play essential roles in the tumor immune response and interact with immune checkpoint genes that have been verified in recent clinical trials. These newly discovered genes may be the potential targets for future immunotherapy. A nomogram was built after integration of clinicopathological risk factors and estimate, stromal, and immune scores. A higher points total was associated with improved survival. This objective evaluation method was consistent with the risk score model established in the current study.In conclusion, tumor progression and survival outcomes in LUAD depend on the microenvironmental characteristics in the tumor tissue. Gene signatures based on estimate, stromal, and immune scores can provide more prognostic information for the TNM staging system. This risk score model for stratifying prognosis is essential for individualized treatment and follow-up.The article’s supplementary files as
Authors: Matthew E Ritchie; Belinda Phipson; Di Wu; Yifang Hu; Charity W Law; Wei Shi; Gordon K Smyth Journal: Nucleic Acids Res Date: 2015-01-20 Impact factor: 16.971
Authors: Li Wang; Robert P Sebra; John P Sfakianos; Kimaada Allette; Wenhui Wang; Seungyeul Yoo; Nina Bhardwaj; Eric E Schadt; Xin Yao; Matthew D Galsky; Jun Zhu Journal: Genome Med Date: 2020-02-28 Impact factor: 11.117