Literature DB >> 32802850

Identification of an Immune Gene Expression Signature for Predicting Lung Squamous Cell Carcinoma Prognosis.

Yubo Yan1, Minghui Zhang2, Shanqi Xu2, Shidong Xu1.   

Abstract

Growing evidence indicates that immune-related biomarkers play an important role in tumor processes. This study investigates immune-related gene expression and its prognostic value in lung squamous cell carcinoma (LUSC). A cohort of 493 samples of patients with LUSC was collected and analyzed from data generated by the TCGA Research Network and ImmPort database. The R coxph package was employed to mine significant immune-related genes using univariate analysis. Lasso and stepwise regression analyses were used to construct the LUSC prognosis prediction model, and clusterProfiler was used for gene functional annotation and enrichment analysis. The Kaplan-Meier analysis and ROC were used to evaluate the model efficiency in predicting and classifying LUSC case prognoses. We identified 14 immune-related genes to incorporate into our prognosis model. The patients were divided into two subgroups (Risk-H and Risk-L) according to their risk score values. Compared to Risk-L patients, Risk-H patients showed significantly improved overall survival (OS) in both training and testing sets. Functional annotation indicated that the 14 identified genes were mainly enriched in several immune-related pathways. Our results also revealed that a risk score value was correlated with various signaling pathways, such as the JAK-STA signaling pathway. Establishment of a nomogram for clinical application demonstrated that our immune-related model exhibited good predictive prognostic performance. Our predictive prognosis model based on immune signatures has potential clinical implications for assessing the overall survival and precise treatment for patients with LUSC.
Copyright © 2020 Yubo Yan et al.

Entities:  

Mesh:

Year:  2020        PMID: 32802850      PMCID: PMC7338973          DOI: 10.1155/2020/5024942

Source DB:  PubMed          Journal:  Biomed Res Int            Impact factor:   3.411


1. Introduction

Lung cancer remains the leading cause of cancer incidence and mortality worldwide [1]. Non-small cell lung cancer (NSCLC) is the most common type of lung cancer and is classified into two major histological subtypes, lung adenocarcinoma (LUAD) and lung squamous cell carcinoma (LUSC), each with distinct genomic and immunological profiles [2]. The discovery of epidermal growth factor receptor (EGFR), anaplastic lymphoma kinase (ALK), and ROS proto-oncogene 1 (ROS1) gene targets and the development of corresponding target drugs have prolonged the survival of patients with NSCLC [3]. Currently, progress has been slow in the development of LUSC treatments due to the lack of effective targets; however, continuous developments in immunotherapy have provided a new direction for LUSC treatment [4]. Immunocyte infiltration, which is speculated to represent the active tumor response, can be detected among most solid tumors in humans; specifically, lymphocyte infiltration in LUSC has certain survival benefits [5]. Therefore, understanding the immune gene signatures of LUSC is highly significant as it could have predictive prognosis implications. At present, the tumor-node-metastasis (TNM) classification system has been recognized as the most meaningful indicator for prognosis and can inform therapeutic decisions for LUAD as well as LUSC treatment [6]. Nonetheless, this classification system is imprecise because various progression levels and overall survival (OS) results can be observed among cases in the same stage. Therefore, novel markers are urgently needed to recognize patients with high recurrence risk. A precisely indicated prognosis significantly affects a clinician's decision to recommend adjuvant therapy. Additionally, there is increasing need to improve prognosis prediction tools. Biomarkers can reliably predict disease prognosis as well as patient survival. As a result, they are meaningful in the decision-making process for clinical LUSC treatment. In recent years, an increasing number of articles have recommended that gene expression profiles can be applied to predict and stratify the survival prognosis of LUSC cases [7, 8]. However, the role of immune-related genes in LUSC is unclear. Therefore, openly accessible large databases that contain gene expression profiles allow us to mine creditable biomarkers for predicting and classifying LUSC prognosis. This study aimed at establishing and verifying a prognosis prediction model for LUSC based on genes related to immunity and patient clinical features derived from the Cancer Genome Atlas (TCGA) Research Network and ImmPort database.

2. Materials and Methods

2.1. Data Collection

Gene expression and clinical LUSC patient data were downloaded from the TCGA Research Network (https://www.cancer.gov/tcga), and the gene set related to immunity was obtained from the ImmPort database (https://www.immport.org). The raw data were preprocessed as follows: (1) samples without clinical data were removed; (2) normal tissue sample data were removed; (3) genes with fragments per kilobase per million reads (FPKM) values of 0 in more than half the samples were removed; and (4) the expression profiles of immune-related genes were saved. After preprocessing, 493 samples comprising 1421 immune-related genes were utilized for further model analysis. The 493 samples were randomized into training and test sets. All samples underwent 500 iterations of random grouping with replacement to eliminate the impact of random allocation bias on model stability. Data in the training (n = 245) and test (n = 248) sets are presented in Table 1. There was no statistically significant difference between the two sets, which indicated reasonable sample grouping.
Table 1

Patient characteristics with lung squamous cell carcinoma in training and testing sets.

Clinical featuresOverallTraining setTesting set p value
OS4932452480.9383
Event4932452480.9293
Alive284140144
Dead209105104
T4932452480.4717
 T11144965
 T2286146140
 T3704030
 T4231013
N4932422460.7437
 N0316160156
 N11276265
 N2401921
 N3514
 NX532
M4932082040.6093
 M0405206199
 M1725
 MX813744
Stage4932442450.4364
 I241116125
 II1588870
 III833845
 IV725
 X413
Age4932452480.6387
 0~5022913
 50~60734033
 60~701818596
 70~801919794
 80~100261412
Subdivision4932352400.8473
 Bronchial1055
 L-Lower743341
 L-Upper1356867
 R-Lower1065254
 R-Middle18108
 R-Upper1326765
Gender4932452480.9648
 Female1286365
 Male365182183

2.2. Prognostic Signature

The correlation of immune-related gene expression with patient OS was assessed through the univariate Cox proportional hazards regression analysis using the survival coxph function of the R package. Genes with p values < 0.05 were identified as candidate genes. Subsequently, the number of candidate genes was reduced according to the least absolute shrinkage and selection operator lasso-Cox method using the glmnet and MASS function of the R package. Genes most significantly related to immunity were selected to construct the prognosis risk score model. The risk score model was formulated as follows: where βi represents the coefficient of every gene and χi stands for gene expression level (FPKM). The median risk score value was the threshold for classifying samples into high-risk (Risk-H) or low-risk (Risk-L) groups. ROC and the Kaplan-Meier (KM) analyses were carried out to evaluate model efficiency, stability, and accuracy in predicting and classifying LUSC case prognoses.

2.3. Functional Annotations

Eventually, 14 genes were selected and their gene families annotated according to human gene classification within the HUGO Gene Nomenclature (HGNC) database. The R package clusterProfiler was employed to carry out enrichment analysis on the 14 screened genes related to immunity and specific to prognosis. The KEGG enrichment analysis score was evaluated using the ssGSEA function of the R package GSVA [9]. Association with the risk score value was also calculated. Clustering analysis was then carried out according to the pathway enrichment score for each sample.

2.4. Association between Risk Score Value and Clinical Features

Associations between relevant clinical factors (such as stage (T, N, or M), subdivision, age, and smoking habit) and risk score value were analyzed. Then, a nomogram model was constructed, and a forest plot was drawn according to relevant clinical features and risk score values. The associations between risk score value and clinical features related to patient survival were also analyzed.

2.5. Statistical Analyses

Independent subgroups were analyzed using the Chi-square test or Fisher's exact test. Univariate and multivariate analyses were performed using the Cox regression. Differences in OS between high- and low-risk groups were evaluated according to the Kaplan-Meier survival curve. The sensitivity and specificity of the diagnosis and prognosis prediction model were determined and assessed using the ROC area under the curve (AUC). The Kruskal-Wallis test was used to evaluate the relationships of risk score with different clinical factors. A two-tailed p value of < 0.05 was recognized as statistically significant. Statistical analyses were performed using the R software (Version 3.5.5; R Core Team, 2016).

3. Results

3.1. Data Processing

Sixty-six immune-related, prognosis-specific genes were mined. The p value relationships of the 66 genes with hazard ratios (HRs) and expression levels are displayed in Figure 1.
Figure 1

Differential gene expression in lung squamous cell carcinoma. (a) The relationships of the -log10 (p values) and HR. (b) The expression levels of 66 differentially expressed genes. Red dots represent significantly different immune-related genes (p < 0.05) regarding prognosis.

3.2. Establishment of the Prognosis Prediction Model

Sixty-six immune-related genes were identified, although that number was inappropriately high for use in clinical detection. Therefore, the scope of genes related to immunity was narrowed to maintain high accuracy. The 66 genes were compressed through lasso regression to reduce the number of genes incorporated in the risk model. The variation trajectories for all independent variables (Figure 2(a)) suggested that the coefficients of a larger number of independent parameters were close to 0 as lambda gradually increased. The confidence interval (CI) under every lambda (Figure 2(b)) revealed that the best model was obtained at a lambda value of 0.03, which was consequently chosen for the eventual model that included 26 immunity-related genes. In addition, the MASS of the R package was utilized in stepwise regression analysis based on Akaike data criteria to obtain 14 genes used to construct the risk model.
Figure 2

Construction of the prognosis prediction model for LUSC patients by LASSO. (a) The changing trajectory of each independent variable. The horizontal axis represents the log value of the independent variable lambda, and the vertical axis represents the coefficient of the independent variable. (b) Confidence intervals for each lambda.

Each sample from the training cohort was then incorporated into the formula for calculating the risk score value. The OS for all samples is shown in Figure S1. Analysis of the model efficiency in predicting the 1-5-year OS resulted in a mean AUC value reaching 0.703 (Figure 3(a)). Sample distributions in Risk-H and Risk-L groups under different OS durations suggested that the 5-year sample size of the Risk-H group was reduced relative to that of Risk-L group (Figures 3(b) and 3(c)). The sample clustering results in the training cohort are presented in Figure 3(d). The 14 genes were clustered into high and low expression groups (Figure 3(e)). To verify the credibility of the prognosis prediction model, the expression profiles of the 14 genes were collected from the test cohort and incorporated into the verification model. The risk score values for the samples in the test cohort corresponded with those in the training cohort (Figure S2). To further verify model creditability and stability in prognosis prediction, the expression profiles of the 14 genes collected from 493 samples were incorporated into the model to calculate the risk score values. The results were consistent with the test set validation results (Figure S3). Taken together, the prognosis prediction model based on 14 immune-related gene expression profiles displayed superb stability and predictive accuracy in identifying immune-related characteristics.
Figure 3

Verification of the stability of the prognosis prediction model for patients with lung squamous cell carcinoma in the training cohort. (a) Survival predicted ROC curves for the training cohort. (b) Distribution of samples in Risk-H and Risk-L groups of the training cohort divided by different OS. (c) The proportion of low-risk samples in total samples varies with OS. (d) Clustering results of the training cohort. (e) Differences in risk score values between Risk-H and Risk-L groups clustered by gene expression in the training cohort.

KM survival curves were plotted for the risk model based on 14 genes in the Risk-H and Risk-L groups of the training cohort, test cohort, and the whole dataset (combined cohort). The KM survival curves of the training, test, and combined cohorts are displayed in Figure 4(a) (p < 0.001), Figure 4(b) (p = 0.003), and Figure 4(c) (p < 0.001), respectively.
Figure 4

The Kaplan-Meier survival curve of the 14-gene risk model in predicting the Risk-H and Risk-L groups on the training set (a), testing set (b), and all samples (c).

3.3. Functional Annotation of Immunity-Related Genes

The 14 gene families annotated based on human gene classification in the HGNC database (Table 2) were enriched in the endogenous ligands and latent transforming growth factor β-binding proteins (LTBP) gene families. Moreover, the expression levels of four genes (END2, CXCL5, APLN, and LTBP2) from these two gene families differed significantly between the Risk-H and Risk-L groups (Figure 5).
Table 2

Gene function annotation results.

Gene familyGenes p value
Endogenous ligandsEDN2/CXCL5/APLN0.0003
Latent transforming growth factor beta-binding proteinsLTBP20.0030
Heat shock 70 kDa proteinsHSPA50.0108
M10 matrix metallopeptidasesMMP90.0150
Caspase recruitment domain containingMAVS0.0185
SH2 domain containingPTPN110.0599
Pleckstrin homology domain containingAKT20.1180
UnknownIGLV8.61/IGHV3.73/IGLV4.60/PLAU/IGKV1.6:1
Figure 5

The expression differences of the EDN2 (a), CXCL5 (b), APLN (c), and LTBP2 (d) between the Risk-H and Risk-L groups.

3.4. Association between Risk Score Value, Signal Pathways, and Sample Clinical Features

The KEGG functional enrichment scores of all samples analyzed using the ssGSEA function of the R software GSVA package were correlated with risk score values and resulted in the acquisition of 41 relevant KEGG pathways. Cluster analysis was performed according to enrichment scores as shown in Figure 6. The most correlated pathway was the JAK/STAT signaling pathway.
Figure 6

Correlation of risk score with signaling pathways.

The relationship between different clinical parameters (including stage (T, N, or M), gender, subdivision, age, and smoking habit) and risk score value was explored (Figure S4). The clinical features did not reveal a relationship with risk score value, except for age, indicating that risk score was relatively independent of the evaluated clinical characteristics.

3.5. Nomogram Prediction Model Establishment

Risk score value was used in combination with clinical features to establish the nomogram model (Figure 7) in which risk score exhibited a pronounced association, with the greatest influence on survival rate prediction. This suggested that the risk model based on 14 genes displayed favorable performance in predicting the prognosis of LUSC. The forest plot based on risk score value and clinical features (Figure 8) indicated a risk score HR of 1.54 (p < 0.001).
Figure 7

The nomogram model constructed by combining the stage-T, stage-N, stage-M, age, subdivision, and risk score.

Figure 8

The forest plot constructed by combining the stage-T, stage-N, age, smoking, and risk score.

4. Discussion

Our study developed a novel prognostic model employing 14 immune-related genes using data from the TCGA Research Network and ImmPort database. This prognostic model successfully predicted LUSC patient prognosis. Surgical resection offers the most effective treatment for early-stage LUSC [10]. Adjuvant chemotherapy or EGFR-TKI improves the survival of stage II–III lung cancer patients after surgery [11, 12]. Therefore, adjuvant chemotherapy has been the standard care for resected stage II–III LUSC patients albeit many patients do not benefit from this form of chemotherapy. This phenomenon may be related to tumor heterogeneity. Our prediction model accurately identified early LUSC patients at high risk of recurrence. The association between the immune system and pathogenesis, as well as the progression of malignancies, has drawn increasing attention in recent years. Unlike the rapid development of LUAD treatment strategies, LUSC treatment options have progressed more slowly. Recently, immune checkpoint inhibitors that target programmed cell death 1 (PD-1) and its ligand (PD-L1) have shifted the paradigm in LUSC treatment. To date, several anti-PD-1/PD-L1 antibodies have been approved for patients with advanced NSCLC [13-15]. Emerging evidence indicates that PD-L1 expression could predict anti-PD-1/PD-L1 therapy response in patients with NSCLC [16]. Inspiringly, the latest reports have demonstrated that gene profiling has the potential to predict patient response to immune checkpoint inhibitors [17-19]. In addition, the association of risk score value with relevant signal pathways was explored with JAK/STAT revealed as the most significantly correlated pathway. A previous study indicated that the JAK/STAT signaling pathway plays an important role in immunity regulation in the tumor microenvironment [20]. Given our results, drug-induced interference with the expression of this pathway may provide a new direction for LUSC treatment. Distribution of the 14 immune-related genes was investigated in Risk-H and Risk-L samples. Seven of the 14 genes, including PTPN11, MAVS, CXCL5, PLAU, MMP9, AKT2, and HSPA5, reportedly participate in the pathological processes of the immune microenvironment, as well as the pathogenesis, malignant transformation, and progression of LUSC, which exhibited marked correlation with patient survival and prognosis [21-26]. Our results demonstrated that bioinformatics mining using available research is highly reliable and accurate. Nonetheless, the association between LUSC and EDN2 and LTBP2 genes, which may be enriched in the endogenous ligand and LTBP gene families, has not been verified in either basic or clinical studies. EDN2 is reportedly involved in regulating malignant cancer cell proliferation and invasion, which can affect cytokine-mediated signaling pathways as well as modulate the activation and chemotaxis of immunocytes [27]. At the same time, LTBP2 has been established as a prognostic marker for diverse cancer types and can control tumor cell sensitivity to immunotherapy [28, 29]. Elucidation of the roles of END2 and LTBP2 in NSCLC is currently underway in our laboratory. There were several limitations of the present study. First, our study was based on data from public datasets without prospective testing. Second, of the immune-related genes used in the prognostic model, the roles of seven genes in NSCLC are unclear. Their prognostic value should be validated by other cohorts. Third, whether patients received immunotherapy is uncertain; therefore, the predictive value of the prognostic model for immunotherapy could not be directly evaluated.

5. Conclusions

We identified new prognostic markers for LUSC that contribute to classifying patients with LUSC based on their immune molecular subtypes. Our predictive prognosis model based on immune signatures has potential clinical implications for assessing the overall survival. These findings should be validated in prospective studies.
  29 in total

1.  Atezolizumab versus docetaxel in patients with previously treated non-small-cell lung cancer (OAK): a phase 3, open-label, multicentre randomised controlled trial.

Authors:  Achim Rittmeyer; Fabrice Barlesi; Daniel Waterkamp; Keunchil Park; Fortunato Ciardiello; Joachim von Pawel; Shirish M Gadgeel; Toyoaki Hida; Dariusz M Kowalski; Manuel Cobo Dols; Diego L Cortinovis; Joseph Leach; Jonathan Polikoff; Carlos Barrios; Fairooz Kabbinavar; Osvaldo Arén Frontera; Filippo De Marinis; Hande Turna; Jong-Seok Lee; Marcus Ballinger; Marcin Kowanetz; Pei He; Daniel S Chen; Alan Sandler; David R Gandara
Journal:  Lancet       Date:  2016-12-13       Impact factor: 79.321

2.  Pembrolizumab versus docetaxel for previously treated, PD-L1-positive, advanced non-small-cell lung cancer (KEYNOTE-010): a randomised controlled trial.

Authors:  Roy S Herbst; Paul Baas; Dong-Wan Kim; Enriqueta Felip; José L Pérez-Gracia; Ji-Youn Han; Julian Molina; Joo-Hang Kim; Catherine Dubos Arvis; Myung-Ju Ahn; Margarita Majem; Mary J Fidler; Gilberto de Castro; Marcelo Garrido; Gregory M Lubiniecki; Yue Shentu; Ellie Im; Marisa Dolled-Filhart; Edward B Garon
Journal:  Lancet       Date:  2015-12-19       Impact factor: 79.321

Review 3.  Jak Stat signaling and cancer: Opportunities, benefits and side effects of targeted inhibition.

Authors:  Bernd Groner; Viktoria von Manstein
Journal:  Mol Cell Endocrinol       Date:  2017-05-30       Impact factor: 4.102

4.  Loss of PTEN causes SHP2 activation, making lung cancer cells unresponsive to IFN-γ.

Authors:  Chia-Ling Chen; Tzu-Hui Chiang; Po-Chun Tseng; Yu-Chih Wang; Chiou-Feng Lin
Journal:  Biochem Biophys Res Commun       Date:  2015-09-16       Impact factor: 3.575

5.  Comparison of lung cancer cell lines representing four histopathological subtypes with gene expression profiling using quantitative real-time PCR.

Authors:  Takashi Watanabe; Tomohiro Miura; Yusuke Degawa; Yuna Fujita; Masaaki Inoue; Makoto Kawaguchi; Chie Furihata
Journal:  Cancer Cell Int       Date:  2010-01-21       Impact factor: 5.722

6.  NCCN Guidelines Insights: Non-Small Cell Lung Cancer, Version 5.2018.

Authors:  David S Ettinger; Dara L Aisner; Douglas E Wood; Wallace Akerley; Jessica Bauman; Joe Y Chang; Lucian R Chirieac; Thomas A D'Amico; Thomas J Dilling; Michael Dobelbower; Ramaswamy Govindan; Matthew A Gubens; Mark Hennon; Leora Horn; Rudy P Lackner; Michael Lanuti; Ticiana A Leal; Rogerio Lilenbaum; Jules Lin; Billy W Loo; Renato Martins; Gregory A Otterson; Sandip P Patel; Karen Reckamp; Gregory J Riely; Steven E Schild; Theresa A Shapiro; James Stevenson; Scott J Swanson; Kurt Tauer; Stephen C Yang; Kristina Gregory; Miranda Hughes
Journal:  J Natl Compr Canc Netw       Date:  2018-07       Impact factor: 11.908

7.  A meta-analysis of adjuvant EGFR-TKIs for patients with resected non-small cell lung cancer.

Authors:  Hua Cheng; Xiao-Jian Li; Xiao-Jin Wang; Zuo-Wen Chen; Rui-Qi Wang; Hong-Cheng Zhong; Tian-Chi Wu; Qing-Dong Cao
Journal:  Lung Cancer       Date:  2019-08-05       Impact factor: 5.705

8.  LTBP2 promotes the migration and invasion of gastric cancer cells and predicts poor outcome of patients with gastric cancer.

Authors:  Jun Wang; Wen-Jia Liang; Guang-Tao Min; Hong-Peng Wang; Wei Chen; Nan Yao
Journal:  Int J Oncol       Date:  2018-04-04       Impact factor: 5.650

9.  Identification of seven-gene signature for prediction of lung squamous cell carcinoma.

Authors:  Zhe Wang; Zhongmiao Wang; Xing Niu; Jie Liu; Zhuning Wang; Lijie Chen; Baoli Qin
Journal:  Onco Targets Ther       Date:  2019-07-24       Impact factor: 4.147

10.  GSVA: gene set variation analysis for microarray and RNA-seq data.

Authors:  Sonja Hänzelmann; Robert Castelo; Justin Guinney
Journal:  BMC Bioinformatics       Date:  2013-01-16       Impact factor: 3.169

View more
  4 in total

1.  Genomic Analysis Reveals the Prognostic and Immunotherapeutic Response Characteristics of Ferroptosis in Lung Squamous Cell Carcinoma.

Authors:  Yinhe Feng; Xingyu Xiong; Yubin Wang; Ding Han; Chunfang Zeng; Hui Mao
Journal:  Lung       Date:  2022-05-05       Impact factor: 2.584

2.  A nomogram based on A-to-I RNA editing predicting overall survival of patients with lung squamous carcinoma.

Authors:  Li Liu; Jun Liu; Xiaoliang Deng; Li Tu; Zhuxiang Zhao; Chenli Xie; Lei Yang
Journal:  BMC Cancer       Date:  2022-06-29       Impact factor: 4.638

3.  The clinical significance of preoperative serum triglyceride, high-density lipoprotein, and low-density lipoprotein levels in lung squamous cell carcinoma.

Authors:  Zhupeng Li; Jianfeng Xu; Weizhong Feng; Zhifeng Ma; Yuanling Wu; Ting Zhu; Peng Xu; Lingjun Dong; JianYi Ding; Junqing Zhou; Guangmao Yu
Journal:  Sci Rep       Date:  2022-10-07       Impact factor: 4.996

4.  A Novel Immune-Related Seventeen-Gene Signature for Predicting Early Stage Lung Squamous Cell Carcinoma Prognosis.

Authors:  Tao Fan; Zhiliang Lu; Yu Liu; Liyu Wang; He Tian; Yujia Zheng; Bo Zheng; Liyan Xue; Fengwei Tan; Qi Xue; Shugeng Gao; Chunxiang Li; Jie He
Journal:  Front Immunol       Date:  2021-06-11       Impact factor: 7.561

  4 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.