Literature DB >> 33081731

Development and validation of a fourteen- innate immunity-related gene pairs signature for predicting prognosis head and neck squamous cell carcinoma.

Fujun Zhang1, Yu Liu2, Yixin Yang1, Kai Yang3.   

Abstract

BACKGROUND: Immune-related genes is closely related to the occurrence and prognosis of head and neck squamous cell carcinoma (HNSCC). At the same time, immune-related genes have great potential as prognostic markers in many types of cancer. The prognosis of HNSCC is still poor currently, and it may be effective to predict the clinical outcome of HNSCC by immunogenic analysis.
METHODS: RNASeq and clinical follow-up information were downloaded from The Cancer Genome Atlas (TCGA), the MINiML format GSE65858 chip expression data was downloaded from NCBI, and immune-related genes was downloaded from the InnateDB database. Immune-related genes in 519 HNSC patients were integrated from TCGA dataset. By using multivariate COX analysis and Lasso regression, robust immune-related gene pairs (IRGPs) that predict clinical outcomes of HNSCC were identified. Finally, a risk prognostic model related to immune gene pair was established and verified by clinical features, test sets and GEO external validation set.
RESULTS: A total of 699 IRGPs were significantly correlated with the prognosis of HNSCC patients. Fourteen robust IRGPs were finally obtained by Lasso regression and a prognostic risk prediction model was constructed. Risk score of each sample were calculated based on Risk models and divided into the high-risk group (Risk-H) and low Risk group (Risk-L). Risk models were able to stratify the risk in patients with TNM Stage, Age, gender, and smoking history, and the AUC > 0.65 in training set and test set, shows that 14-IRGPs signature in patients with HNSCC has excellent classification performance. In addition, 14-IRGPs had the highest average C index compared with the prognostic characteristics and T, N, and Age of the 3 previously reported HNSCC.
CONCLUSION: This study constructed 14-IRGPs as a novel prognostic marker for predicting survival in HNSCC patients.

Entities:  

Keywords:  Bioinformatics; HNSCC; Immune-related gene pairs; Riskscore; TCGA

Mesh:

Substances:

Year:  2020        PMID: 33081731      PMCID: PMC7574345          DOI: 10.1186/s12885-020-07489-7

Source DB:  PubMed          Journal:  BMC Cancer        ISSN: 1471-2407            Impact factor:   4.430


Background

Human head and neck squamous cell carcinoma (HNSCC) is one of the most common tumors today, with approximately 550,000 people worldwide suffering from this disease each year, and approximately 300,000 patients die [1]. Long-term repeated inflammatory stimuli are considered to be one of the main causes of the disease, including smoking, drinking, repeated trauma, and human papillomavirus (HPV) infection [2]. HNSCC is characterized by high proliferative, regional lymph node metastasis and poor prognosis [3]. It is urgent to investigate the development of novel and sensitive HNSCC tumor prognostic markers to reduce the number of HNSCC patients not diagnosed prior to the onset of invasive disease. Cancer immunotherapy aims to enhance the activity of the immune system to fight cancer, has always been the main driving force of personalized medicine [4, 5]. In recent decades, immunotherapy has developed rapidly and has become a treatment for many cancers [6]. The expression of PD-L1 is usually higher in HNSCC tumors with a positive rate of 46 to 100% in several studies [7]. Tadalafil and anti-tumor vaccine-mediated immune rejection reversal also lead to up-regulation of PDL1 in recurrent HNSCC, suggesting that immunological checkpoint treatment may be effective in patients with HNSCC [8]. In 2016, the US food and drug administration (FDA) approved the first immunotherapy treatments- nivolumab and pembrolizumab for patients with recurrent (HNSCC with platinum-based regimens that are difficult to treat) [9]. Although these findings support the importance of immunology in HNSCC, the molecular mechanisms remain unclear, especially for immune-related genomic effects. With the advent of public large-scale gene expression data sets, cancer researchers have been able to accurately identify tumor-related prognostic biomarkers [10]. Li et al analyzed the prognostic value of IRGPs to develop individualized immune features that improve prognosis in patients with non-squamous non-small cell lung cancer [11]. However, the clinical relevance and prognostic significance of IRGPs in HNSCC have not been studied in depth. In this study, we integrated immune-related genes in 519 HNSCC patients based on the TCGA dataset. Multivariate COX analysis and Lasso regression were used to identify robust IRGPs that predicted HNSCC clinical outcomes and establish a risk prognosis model related to immune gene pairs. IRGPs was found to be a strong prognostic biomarker and predictor of HNSCC.

Methods

Data collection and processing

In April 30, 2019, RNA-seq data and the latest clinical follow-up information were downloaded from TCGA using GDC API, including 612 RNA-seq data samples. Similarly, a set of chip data set GSE65858 in MINiML format were downloaded from NCBI, including the expression profile data and clinical follow-up information of 270 HNSCC sample. All patients underwent surgery with a negative surgical margin, receive no adjuvant or neoadjuvant therapy. A total of 1039 immune-related genes (removing the name-repeated gene) were downloaded from the InnateDB database (https://www.innatedb.com/). For the TCGA RNAseq data, we screened 517 tumor samples with follow-up information and OS > 0, extracted the expression profile of the immune-related gene set and removed the gene with 0 expression level in 50% of the samples. For chip data sets, we screened samples with follow-up information and OS > 0, R package GEOquery was used to map the chip probes to GeneSymbol, the probes was mapped to multiple genes were removed, multiple probes were mapped to a single gene to take the median, gene expression profile were obtained, and the expression profile of the immune gene set were extracted. The clinical information of TCGA and GEO patients is shown in Table 1. The workflow is shown in Fig. 1.
Table 1

Clinical information of data sets

CharacteristicTCGA dataset (n = 517)GSE65858 (n = 270)
Age (years)<=6025641
> 60261229
Survival statusLiving297176
Dead22094
Genderfemale13647
male381223
GradeG 161
G 2303--
G 3124--
G 47--
pathologic_TT 13635
T 214980
T 313658
T 418497
pathologic_NN 024494
N 18332
N 2162132
N 3912
pathologic_MM 0491263
M 1/ M X237
Tumor stageStage I2718
Stage II8137
Stage III9337
Stage IV316178
SmokingNon-Smoking11748
Smoking388222
Fig. 1

Work flow chart

Clinical information of data sets Work flow chart

Sample grouping

For better model building and validation, we randomly divided the TCGA data set into two groups, one as a training set (N = 260), one as an internal validation set (N = 259), and the GSE65858 data set as an independent external validation set. During the random grouping process of TCGA, we kept the two groups of samples similar in age distribution, clinical stage, follow-up time, and proportion of patient deaths, while the number of samples after clustering the gene expression profiles of the two groups was close to each other, and the statistical characteristics of the two samples are shown in Table 2.
Table 2

Sample statistics for training set and validation set

CharacteristicTrainingSet (n = 260)TestingSet (n = 257)p value
Age (years)<=601241320.38
> 60136125
Survival statusLiving1481490.878
Dead112108
Genderfemale69670.983
male191190
GradeG 129320.653
G 2152151
G 36559
G 452
pathologic_TT 119170.886
T 27277
T 37066
T 49589
pathologic_NN 01311130.004
N 15033
N 26498
N 363
pathologic_MM 02442470.1
M 1/ M X167
Tumor stageStage I16110.444
Stage II4041
Stage III5241
Stage IV152164
SmokingNon-Smoking55620.511
Smoking198190
Sample statistics for training set and validation set

Construction of IRGPs

A total of 539,241 gene pairs were obtained by randomly permutation and combination of 1039 immune genes. For arbitrary gene i (IRG) and gene j (IRG), IRGP,were calculated. The IRGP values were defined as follows: Where IRG indicates the amount of gene expression, we calculated all IRGPs values for all samples and further filtered IRGPs with a standard deviation of 0, a total of 18,182 IRGPs were obtained.

Univariate cox survival analysis

Univariate Cox proportional hazard regression analysis was performed on each IRGPs as in Jin-Cheng et al. [12] to screen for those genes that were significantly associated with OS in the training data set, with p < 0.05 as the threshold.

Screening for robust immune-related prognostic features

LASSO is a popular method for regression modeling with a large number of potential prognostic features, because it can perform automatic feature selection in a manner that results in signatures with generally good prognostic performance [13]. The LASSO method has been extended to the Cox model for survival analysis and has been successfully applied for the purpose of building sparse signatures for survival prognosis in many application areas including oncology [14-16], First, we used training set samples to conduct univariate Cox proportional risk regression analysis for each IRGPs, with log rank p < 0.05 as the threshold, 669 IRGPs with significantly correlated prognoses were identified. Furthermore, R software package glmnet [17] was used to screen robust prognostic immune-related gene pairs, and 3-fold cross validation was used to evaluate the optimal characteristics. The degree of LASSO regression complexity adjustment is controlled by the parameter λ, where the larger λ is, the greater the penalty for a linear model with more variables, so that a model with fewer variables is eventually obtained. In this study, the optimal model is obtained when λ = 0.1218186, and we choose the features incorporated in the model at this time as the optimal combination of features, i.e., 14-IRGPs. Multivariate Cox regression analysis was conducted using the stepwise regression method to determine the coefficient of each IRGPs in the 14-IRGPs, and the following risk score model was constructed: Where N is the number of prognostic IRGPs, Exp is the IRGP value of prognostic IRGPs, and e is the estimated regression coefficient of IRGPs in the multivariate Cox regression analysis.

Validation and assessment of the IRGPs signature

To validate the IRGPs signature, patients in test datasets were divided into low risk and high risk group according to the median value of the risk score, which calculated according to the prognostic signature. The log-rank test and Cox regression analysis were conducted to evaluate overall survival difference between the low risk and high risk groups. Receiver operating characteristic curve (ROC) curve was used to assess the categorization of IRGPs signature. The IRGPs signature was also compared with the published signature by KM survival curve, ROC curve, and C-index.

RiskScore and clinical characteristics

In order to observe the relationship between riskScore and clinical phenotype, the samples were divided into two groups based on the riskScore median of the samples, and the prognosis differences between high riskScore and low riskScore were compared respectively. Similarly, the relationship Grade, Age and Stage in High and Low TMEScore was analyzed.

Functional enrichment analysis

We used R package clusterProfiler, v3.8 [18] for GO and KEGG enrichment analysis with a p value of less than 0.05 as the threshold. GSEA [19] was performed by R package GSVA using the MSigDB [20]. Gene sets with a false discovery rate (FDR) value less than 0.05 after performing 1000 permutations were considered to be significantly enriched.

Statistical analysis

The Kaplan-Meier (KM) curve was plotted when the median risk score in each data set was used as a cutoff to compare the risk of survival between the high risk group and the low risk group. Multivariate Cox regression analysis was performed to test whether gene markers are independent prognostic factors. Significance was defined as P < 0.05. AUC analysis was performed using the R package pROC. All analyses use default parameters except for special instructions, which are performed in R software version 3.4.3.

Results

Identification of IRGPs in patients with HNSCC

For the TCGA training set samples, we used a univariate Cox proportional hazard regression model to establish the relationship between patient overall survival and immune-related gene expression, and obtained 164 prognostic genes. According to the calculation rule of the IRGPs value, a total of 7374 IRGPs are obtained. The univariate Cox proportional hazards regression model was used to establish the relationship between IRGPs and overall patient survival. Finally, we obtained 699 IRGPs with significant prognostic differences (Fig. 2a). In order to screen robust immune-related prognostic gene pairs, we used lasso regression to perform dimensionality reduction analysis on these 699 IRGPs. The results show that as the lambda increases, the number of independent coefficients tends to 0 (Fig. 2b), 3-fold cross-validation was used to build the model, and the model is optimal when lambda = 0.1218186 (Fig. 2c). We select the model when lambda = 0.1218186 as the final model, which contains a total of 14 IRGPs, 19 genes (Table 3). The risk scores of these 14 IRGPs in each sample are shown in Fig. 2d. Furthermore, we calculated the Risk Score of each sample based on the Risk model, and the formula is as follows:
Fig. 2

a: The relationship between the P value of 699 IRGPs and HR. Red indicated log rank p < 0.05 IRGPs. b: The trajectory of each independent variable, the horizontal axis represents the log value of the independent variable lambda, and the vertical axis represents the coefficient of the independent variable. c: The confidence interval under each lambda. d: Relationship between 14 IRGPs and risk scores

Table 3

14 IRGPs associated with prognosis

IRGPsCoefP valueHRLow.95.CI.High.95.CI.
ARHGAP15_VS_CTSG−0.04150.0018339360.5070.3310.778
BTK_VS_CCR7−0.00150.0004716430.5120.3520.745
BTK_VS_CTSG−0.03710.001013360.5120.3430.763
CAMK2A_VS_CLEC6A−0.07280.0010001550.3350.1750.642
CAMK2A_VS_MASP1−0.15352.82E-050.4520.3120.656
CCL17_VS_SEMA3A0.02420.000235242.0191.3882.935
CDKN2A_VS_CLTC0.14320.0009242652.8661.5375.344
DUSP16_VS_TRIM60.12027.15E-054.0762.0378.154
IKBKB_VS_TRIB30.0070.0010151711.8791.2902.738
MASP1_VS_SEMA3A0.00330.0002002492.1221.4273.154
MASP1_VS_TRIM60.07280.0002888982.3071.4683.626
ORAI1_VS_SUGT10.07510.0004571452.6171.5284.481
SEMA3A_VS_SLAMF1−0.06630.0001909530.4590.3050.691
STAP2_VS_TRIB30.15491.47E-055.5932.56712.186
a: The relationship between the P value of 699 IRGPs and HR. Red indicated log rank p < 0.05 IRGPs. b: The trajectory of each independent variable, the horizontal axis represents the log value of the independent variable lambda, and the vertical axis represents the coefficient of the independent variable. c: The confidence interval under each lambda. d: Relationship between 14 IRGPs and risk scores 14 IRGPs associated with prognosis

14-IRGPs signature could be used as a prognostic marker

Multivariate regression analysis was used to establish a risk model for 14 IRGPs in the training set, validation set, TCGA dataset and independent test set data (GSE65858 dataset) for 1, 3, and 5 years. The results suggest that the average AUC of the training set is 0.758, the average AUC of the validation set is 0.659, the average AUC of the TCGA dataset is 0.709, and the average AUC of the independent test set data is 0.685 (Fig. 3a-d). With the median risk score as the threshold, the training set samples were divided into risk-H and risk-L, and the KM survival curves of 14-IRGPs in training set, validation set, all data sets of TCGA and one independent GEO testsets (GSE65858 dataset) were drawn. The results showed that the prognosis of the risk-L group of all data sets was significantly better than that of the risk-H group (Fig. 3e-h). In summary, IRGPs have great potential as prognostic markers.
Fig. 3

a: The risk model ROC of 14 IRGPs after lasso regression in TCGA training dataset. b: The risk model ROC of 14 IRGPs after lasso regression in TCGA validation dataset. c: The risk model ROC of 14 IRGPs after lasso regression in TCGA dataset. d: The risk model ROC of 14 IRGPs after lasso regression in GSE65858 dataset. e: KM survival curve of 14 IRGPs in TCGA training dataset. f: KM survival curve of 14 IRGPs in TCGA validation dataset. g: KM survival curve of 14 IRGPs in TCGA dataset. h: KM survival curve of 14 IRGPs in GSE65858 dataset

a: The risk model ROC of 14 IRGPs after lasso regression in TCGA training dataset. b: The risk model ROC of 14 IRGPs after lasso regression in TCGA validation dataset. c: The risk model ROC of 14 IRGPs after lasso regression in TCGA dataset. d: The risk model ROC of 14 IRGPs after lasso regression in GSE65858 dataset. e: KM survival curve of 14 IRGPs in TCGA training dataset. f: KM survival curve of 14 IRGPs in TCGA validation dataset. g: KM survival curve of 14 IRGPs in TCGA dataset. h: KM survival curve of 14 IRGPs in GSE65858 dataset

Predictive power of risk models in different clinical samples

In order to observe the robustness of risk models in different clinical characteristics, we observed the predictive power of risk models in different TNMstages, Age, gender and smoking history. We found that the 14-IRGPs signature model can be significantly distinguished into high-risk group and low-risk group not only in early patients and late-stage patients (Fig. 4a, b) (log rank p = 0.00023, log rank p < 0.0001), but also in young data sets and elderly data sets (Fig. 4c, d) (logrank p < 0.0001, log rank p < 0.0001), and in female data sets and male data sets (Fig. 4e, f) (log rank p = 0.01, log rank p < 0.0001). Finally, our analysis of the samples with and without smoking history shows that 14-IRGPs signature can also significantly distinguish the high-risk group from the low-risk group (Fig. 4g, h) (log rank p = 0.002, log rank p < 0.0001), those results indicated that our model has a very stable predictive power in patients of different ages, stages and genders.
Fig. 4

a: KM curve of 14-IRGPs signature in early samples. b: KM curve of 14-IRGPs signature in advanced cancer sample. c: KM curve of 14-IRGPs signature in young. d: KM curve of 14-IRGPs signature in age. e: KM curve of 14-IRGPs signature in female. f: KM curve of 14-IRGPs signature in male. g: KM curve of 14-IRGPs signature in No smoking history sample. h: KM curve of 14-IRGPs signature in smoking history sample

a: KM curve of 14-IRGPs signature in early samples. b: KM curve of 14-IRGPs signature in advanced cancer sample. c: KM curve of 14-IRGPs signature in young. d: KM curve of 14-IRGPs signature in age. e: KM curve of 14-IRGPs signature in female. f: KM curve of 14-IRGPs signature in male. g: KM curve of 14-IRGPs signature in No smoking history sample. h: KM curve of 14-IRGPs signature in smoking history sample

Univariate and multivariate analysis of 14-IRGPs signature

In order to identify the independence of 14-IRGPs signature model in clinical application, we used univariate and multivariate COX regression analysis to analyze relevant HR, 95%CI of HR, p value in TCGA training set, TCGA verification data set and all data of TCGA. We systematically analyzed clinical information recorded by TCGA patients, including age, T, N, AJCC Stage, Grade, Smoking, and our 14-IRGPs signature grouping information (Table 4).
Table 4

Univariate and multivariate COX regression analysis identified clinical factors associated with prognosis

VariablesUnivariate analysisMultivariable analysis
HR95%CI of HRP valueHR95%CI of HRP value
TCGA training datasets
 14-IRGPs signature
  Risk score (High/Low)2.971.99–4.437.79E-082.531.54–4.130.0002
  Age1.021.01–1.040.0091.020.99–1.040.0550
  Gender (Male vs Female)0.700.47–1.040.0770.940.56–1.50.8236
  T3/T4 vs T1/T20.970.66–1.430.8950.430.24–0.740.0025
  N1/N2/N3 VS N01.210.78–1.850.391.000.59–1.680.9985
  Stage IV vs Stage I/ II/III1.971.31–2.950.0012.651.46–4.780.0012
  G3/G4 vs G1/G21.160.76–1.770.4910.940.57–1.520.7969
  Smoking vs Non-smoking0.950.59–1.528.35E-010.850.48–1.490.5707
TCGA validation datasets
 14-IRGPs signature
  Risk score (High/Low)1.961.32–2.917.93E-041.721.12–2.620.0123
  Age1.020.99–1.040.0641.020.99–1.030.1549
  Gender (Male vs Female)0.770.51–1.170.2230.790.47–1.310.3577
  T3/T4 vs T1/T21.751.13–2.710.0112.051.17–3.590.0120
  N1/N2/N3 VS N01.290.87–1.910.2091.390.86–2.230.1674
  Stage IV vs Stage I/ II/III1.360.90–2.050.1420.870.49–1.530.6379
  G3/G4 vs G1/G21.220.77–1.930.3931.200.74–1.930.4439
  Smoking vs Non-smoking1.330.82–2.150.2471.340.77–2.310.2987
TCGA entire datasets
 14-IRGPs signature
  Risk score (High/Low)2.541.92–3.356.51E-112.011.41–2.849.36E-05
  Age1.021.01–1.030.00131.021.01–1.040.0087
  Gender (Male vs Female)0.740.55–0.970.0340.970.66–1.410.8830
  T3/T4 vs T1/T21.280.96–1.700.0930.720.47–1.090.1107
  N1/N2/N3 VS N01.200.87–1.660.2711.070.73–1.540.7276
  Stage IV vs Stage I/ II/III1.651.24–2.206.20E-041.851.21–2.820.0046
  G3/G4 vs G1/G21.150.84–1.570.3650.830.58–1.180.3067
  Smoking vs Non-smoking1.140.81–1.590.451.070.7–1.630.7539
GSE65858
 14-IRGPs signature
  Risk score (High/Low)1.941.27–2.970.00211.901.23–2.920.0035
  Age1.031.01–1.050.01301.041.01–1.060.0036
  Gender (Male vs Female)1.050.62–1.770.86801.060.62–1.820.8378
  T3/T4 vs T1/T22.921.81–4.721.23E-052.161.26–3.690.0049
  N1/N2/N3 VS N02.141.38–3.320.00071.390.73–2.650.3216
  Stage IV vs Stage I/ II/III2.921.72–4.957.22E-051.550.68–3.550.2987
  Smoking vs Non-smoking0.940.55–1.590.82101.140.64–2.030.6493
Univariate and multivariate COX regression analysis identified clinical factors associated with prognosis In the training set of TCGA, univariate COX regression analysis found that Risk score, AJCC Stage and Smoking were significantly correlated with survival, but the corresponding multi-factor COX regression analysis found that Risk score (HR = 2.53, 95%CI = 1.54–4.13, p = 0.0002), T Stage and AJCC Stage were significantly correlated with survival. In the verification set of TCGA, univariate COX regression analysis found that Risk score and T staging were significantly correlated with survival, but the corresponding multivariate COX regression analysis found that Risk score (HR = 1.72, 95%CI = 1.12–2.62, p = 0.0123) and T staging were significantly correlated with survival. In all data sets of TCGA, univariate COX regression analysis found that Risk score, age, gender and AJCC stage were significantly correlated with survival, but the corresponding multivariate COX regression analysis found that Risk score (HR = 2.01, 95%CI = 1.41–2.84, p < 0.0001), age and AJCC stage staging were significantly correlated with survival. Finally, in the GEO external data set, univariate COX regression analysis found that Risk score, age, T stage, N stage and AJCC stage were significantly correlated with survival, but the corresponding multivariate COX regression analysis found that Risk score (HR = 1.90, 95%CI = 1.23–2.92, p = 0.0035), age and T stage were significantly correlated with survival. The above conditions indicate that our model 14-IRGPs signature has a good predictive performance in terms of clinical application value in TCGA data set, and our model may be a prognostic indicator independent of other clinical factors and has an independent predictive performance in terms of clinical application value.

Functional analysis and immune analysis of IRGPs

In order to further analyze the functions of 14-IRGPs, we first used clusterProfiler to conduct GO and KEGG enrichment analysis on 19 genes, and finally retained the results of p < 0.05. The results showed that these pathways were enriched to 356 GO BP, which were mainly T cell receptor signaling pathway, stress-activated MAPK cascade and other biological processes, and we show the most significant top 20 (Fig. 5a), Furthermore, we found that these 19 genes were significantly enriched in 39 GO CCs and 73 GO MFs, the most prominent of which were the top 20 (Fig. 5b, c).
Fig. 5

a: Top 20 GO BP enrichment results of 19 immune-related genes. b: Top 20 GO CC enrichment results for 19 immune-related genes. c: Top 20 GO MF enrichment results for 19 immune-related genes. d: Clustering of correlation coefficients between KEGG pathways and RiskScores with a risk score correlation > 0.25. e: The KEGG pathway with a correlation with risk scores greater than 0.25 has a relationship with the ssGSEA score in each sample as the risk score increases. The horizontal axis represents the sample, and the risk score from left to right increases in turn. f: The difference of B cell and T cell activation scores between high and low risk groups. g: Expression differences of 19 genes in 14-IRGPs in patients with different response states after PD-L1 treatment; CR: complete response, PD: progressive disease, SD: stable disease, PR: partial response

a: Top 20 GO BP enrichment results of 19 immune-related genes. b: Top 20 GO CC enrichment results for 19 immune-related genes. c: Top 20 GO MF enrichment results for 19 immune-related genes. d: Clustering of correlation coefficients between KEGG pathways and RiskScores with a risk score correlation > 0.25. e: The KEGG pathway with a correlation with risk scores greater than 0.25 has a relationship with the ssGSEA score in each sample as the risk score increases. The horizontal axis represents the sample, and the risk score from left to right increases in turn. f: The difference of B cell and T cell activation scores between high and low risk groups. g: Expression differences of 19 genes in 14-IRGPs in patients with different response states after PD-L1 treatment; CR: complete response, PD: progressive disease, SD: stable disease, PR: partial response ssGSEA was used to analyze the enrichment scores of each sample in each pathway in the TCGA data set, calculate the correlation between these pathways and risk scores, and 26 pathways with correlation > 0.25 were selected (Fig. 5d), we found that most of the samples with risk score present negative correlation, a small number of positively related with risk score. Cluster analysis was conducted according to the 26 KEGG pathway enrichment scores (Fig. 5e), it can be seen that among the 26 pathways, B CELL SIGNALING PATHWAY, PRIMARY IMMUNODEFICIENCY and other pathways increase with the increase of RiskScore, and FOCAL ADHESION, GALACTOSE_METABOLISM and other metabolism-related pathways decrease with the increase of RiskScore. Results suggested that the imbalance of these pathways is closely related to the development of tumors. In addition, we obtained the signature genes of three immune cells, Activated B cell, Activated CD4 T cell, and Activated CD8 T cell, from a previous study [21], and calculated enrichment scores in each sample using the method of ssGSEA to assess the sample’s corresponding Immune cell scores. The differences in these three immune cell scores in the high and low risk groups of patients were analyzed and observed that B cell and T cell activation scores were all significantly lower in high risk patients with poor prognosis (Fig. 5f). Current immunotherapy-related datasets are rare. We found a cohort of PD-L1-treated patients with metastatic uroepithelial carcinoma shared by Sanjeev Mariathasan et al [22] and analyzed the differential expression of 19 genes in 14 IRGPs in patients with different response states after PD-L1 treatment. We observed a significant differential expression of 14 (73.6%) genes (Fig. 5g), suggesting that these genes are associated with immunotherapy.

Establishment and evaluation of nomogram model

In addition to 14-IRGPs, clinical features Stage and Age are also independent prognostic factors, indicating that they have complementary values. In order to further improve the accuracy of prediction, a new nomogram was established by integrating Stage, Age and 14-IRGPs using Cox model. According to this model, 14-IRGPs contribute the most to OS, followed by Age and Stage (Fig. 6a). By calculating the total score, oncologists could easily obtain the OS probability predicted by the nomogram of an individual patient. Furthermore, we used the calibration curve to evaluate the prediction accuracy of the model (Fig. 6b), results show that the predicted calibration curves of the three calibration points in 1, 3, and 5 years were close to the standard curve, which indicated that the model has good prediction performance. In addition, we also used DCA (Decision curve) to evaluate the reliability of the model (Fig. 6c). It was observed that RiskScore (14-IRGPs) and nomogram benefit significantly higher than the extreme curve, and nomogram is higher than RiskScore, and Age and Stage are close to the extreme curve. This suggests that RiskScore (14-IRGPs) and nomogram have good reliability.
Fig. 6

Establishment and evaluation of nomogram model. a: The nomogram model combined with Stage, Age and 14-IRGPs. b: Calibration curves of the nomogram for 1, 3 and 5 years. c: Decision curves for Stages, Ages, 14-IRGPs and nomograms

Establishment and evaluation of nomogram model. a: The nomogram model combined with Stage, Age and 14-IRGPs. b: Calibration curves of the nomogram for 1, 3 and 5 years. c: Decision curves for Stages, Ages, 14-IRGPs and nomograms

14-IRGPs was compared with other signatures and clinical features

In order to observe the performance of 14-IRGPs, the prognostic signature of three head and neck cancers reported in the past (3-gene signature of Cui L et al [23], 6-gene signature of Weidong Zhang et al [24] and 3-gene signature of Hongbo Zhou et al [25]) and four clinical features of T, N, age and Stage were selected. In order to make the model comparable, we calculated the Risk score of each head and neck cancer sample in the TCGA training set with the same method according to the corresponding genes in the 3 models, evaluated the ROC of each model and C-index (Table 5). We observe that the 6-gene signature model has the highest AUC above among the three models, while the average AUC for 1, 3 and 5 years is 0.63. The 1, 3, and 5 years AUC of 14-IRGPs signature were all above 0.73. In addition, in the C-index of all models, 14-IRGPs was significantly higher than other clinical features and models, indicating that our model has good application value.
Table 5

Comparision for 4 models and clinical features

CharacteristicsC-index (95%CI)1-year AUC (95%CI)3-year AUC (95%CI)5-year AUC (95%CI)
T0.50 (0.426–0.583, 0.571)0.53 (0.47–0.6)0.52 (0.46–0.58)0.6 (0.52–0.67)
N0.53 (0.452–0.610,0.835)0.52 (0.47–0.59)0.53 (0.47–0.59)0.45 (0.46–0.54)
Age0.56 (0.490–0.626,0.0096)0.62 (0.53–0.71)0.55 (0.48–0.62)0.51 (0.42–0.6)
AJCC stage0.52 (0.434–0.600,0.412)0.56 (0.5–0.63)0.58 (0.52–0.64)0.5 (0.41–0.59)
14-IGPS signature0.78 (0.693–0.859,3.65E-08)0.73 (0.66–0.8)0.82 (0.77–0.87)0.75 (0.66–0.82)
3-gene signature0.67 (0.579–0.767, 0.002)0.56 (0.48–0.63)0.64 (0.57–0.71)0.63 (0.52–0.74)
6-gene signature0.65 (0.560–0.750, 7.07E-5)0.56 (0.48–0.63)0.63 (0.56–0.7)0.71 (0.62–0.81)
3-gene signature0.62 (0.519–0.715,0.0430.59 (0.50–0.67)0.58 (0.51–0.66)0.51 (0.40–0.61)
Comparision for 4 models and clinical features

Discussion

Due to the heterogeneity of HNSCC, patients are still at great risk of recurrence and death even after complete surgical resection. The management of adjuvant chemotherapy for early HNSCC remains controversial. Therefore, it is important to develop a personalized management approach for HNSCC. Reliable prognostic biomarkers can identify patients with poor prognosis, and predictive biomarkers can inform patients who may benefit from additional systemic therapy, regardless of treatment, and therefore have more direct clinical relevance. In this study, we developed immune-related genes for signature prediction of HNSCC prognosis. Their potential for molecular stratification of HNSCC suggests different immune characteristics at different stages of the tumor. In the past decade, important studies based on prognostic signals of immune gene expression have shown that immune genes have a strong prognostic ability. Several gene expression scores have been proposed for predicting the risk of recurrence, and both Tadalafil and anti-tumor vaccine-mediated immune rejection reversals also lead to up-regulation of PDL1 in recurrent HNSCC, suggesting that immune checkpoint therapy may be effective in patients with HNSCC [8]. Immune-related gene signature reflecting immune infiltration can predict the prognosis of colorectal cancer [26]. AP001056.1 is a key immune-related ceRNA in SCCHN, and ICOSLG encodes immune checkpoint protein as its regulatory target, which can be used as a prognostic molecule of HNSCC [27]. The 14-IRGPs we developed could be risk stratified in four data sets, with AUC higher than 0.659. The KM curve of the risk score in the four data sets indicates that high risk predicts poor prognosis, and those results indicated that the immune-related gene can be used as a factor for stratifying the prognosis risk of HNSCC. In order to observe whether 14-IRGPs signature is dependent on TP53 and EGFR mutation characteristics, we first compared the relationship among 14-IRGPs signature, TP53 and EGFR mutation using single-factor and multi-factor analysis (Figure S1A-B). The results showed that 14-IRGPs signature had significant difference in prognosis, suggesting that 14-IRGPs signature is an independent factor. Furthermore, we compared the ROC analysis of 14-IRGPs signature in mutant and non-mutant samples and, considering the small number of EGFR mutations, only TP53 mutations were analyzed here (Figure S1C-D). We observed that the 14-IRGPs signature had higher AUC in both TP53 mutant and non-mutant samples. We also observed the lowest AUC at 1 year in TP53 mutant samples and the lowest AUC at 5 years in non-mutated samples, suggesting that the 14-IRGPs signature has better predictive performance for long-term survival in TP53 mutant samples and for short-term survival in non-mutated samples. We downloaded exon datasets of TCGA samples and extracted mutation data from HNSCC samples, in which a total of 508 patients were tested. Nineteen genes in the 14-IRGPs signature were analyzed for their mutation frequencies in these patients (Figure S1E), which had the highest frequency of CDKN2A mutations, especially in high-risk patients, mainly Nonsense_Mutation. Go and KEGG analysis were conducted to identify the functions of the 19 genes involved in HNSCC. T cell receptor signaling pathway, stress-activated MAPK cascade, B cell signaling pathway and primary immunodeficiency were enriched in TCGA samples. These immune-related pathways are involved in various biological processes, such as differentiation, growth, and apoptosis, and promote cell interaction and migration [28, 29]. Taken together, those pathways may facilitate the metastasis of HNSCC. Comprehensive analysis shows that risk score is a prognostic biomarker for HNSCC and can be used to molecularly stratify prognosis. Clinical features Age, Stage and Grade are key prognostic factors in head and neck squamous cell carcinoma. Factor [30], As expected, there is a significant association between the risk score and Age, Stage, and Grade, and found that the combination of risk score and Age has a superior prognostic effect. Although we identify potential candidate IRGPs involved in tumorigenesis in large samples by bioinformatics techniques, some limitations of this study should be noted. First, the sample lacks some clinical follow-up information, so we did not consider factors such as the presence of other health status of the patient to distinguish prognostic biomarkers. Second, the results obtained only through bioinformatics analysis are inadequate and experimental validation is needed to confirm these results. Therefore, further genetic and experimental studies of larger sample sizes and experimental validation are needed.

Conclusions

In conclusion, we studied the immunological characteristics of HNSCC and systematically studied the expression profile of immune genes. We found immune-related gene pair features in HNSCC, and have better AUC in both training and validation sets. Compared with clinical features, immune gene pair classifiers could improve survival risk prediction. Therefore, we recommend using this classifier as a molecular diagnostic test to assess the prognostic risk of patients with HNSCC. Additional file 1.
  30 in total

1.  Molecular signatures database (MSigDB) 3.0.

Authors:  Arthur Liberzon; Aravind Subramanian; Reid Pinchback; Helga Thorvaldsdóttir; Pablo Tamayo; Jill P Mesirov
Journal:  Bioinformatics       Date:  2011-05-05       Impact factor: 6.937

Review 2.  Biology of human papillomavirus infections in head and neck carcinogenesis.

Authors:  Jaana Rautava; Stina Syrjänen
Journal:  Head Neck Pathol       Date:  2012-07-03

3.  Pan-cancer Immunogenomic Analyses Reveal Genotype-Immunophenotype Relationships and Predictors of Response to Checkpoint Blockade.

Authors:  Pornpimol Charoentong; Francesca Finotello; Mihaela Angelova; Clemens Mayer; Mirjana Efremova; Dietmar Rieder; Hubert Hackl; Zlatko Trajanoski
Journal:  Cell Rep       Date:  2017-01-03       Impact factor: 9.423

4.  Regularization Paths for Generalized Linear Models via Coordinate Descent.

Authors:  Jerome Friedman; Trevor Hastie; Rob Tibshirani
Journal:  J Stat Softw       Date:  2010       Impact factor: 6.440

5.  CD38-Mediated Immunosuppression as a Mechanism of Tumor Cell Escape from PD-1/PD-L1 Blockade.

Authors:  Limo Chen; Lixia Diao; Yongbin Yang; Xiaohui Yi; B Leticia Rodriguez; Yanli Li; Pamela A Villalobos; Tina Cascone; Xi Liu; Lin Tan; Philip L Lorenzi; Anfei Huang; Qiang Zhao; Di Peng; Jared J Fradette; David H Peng; Christin Ungewiss; Jonathon Roybal; Pan Tong; Junna Oba; Ferdinandos Skoulidis; Weiyi Peng; Brett W Carter; Carl M Gay; Youhong Fan; Caleb A Class; Jingfen Zhu; Jaime Rodriguez-Canales; Masanori Kawakami; Lauren Averett Byers; Scott E Woodman; Vassiliki A Papadimitrakopoulou; Ethan Dmitrovsky; Jing Wang; Stephen E Ullrich; Ignacio I Wistuba; John V Heymach; F Xiao-Feng Qin; Don L Gibbons
Journal:  Cancer Discov       Date:  2018-07-16       Impact factor: 39.397

6.  Assessing the clinical utility of cancer genomic and proteomic data across tumor types.

Authors:  Yuan Yuan; Eliezer M Van Allen; Larsson Omberg; Nikhil Wagle; Ali Amin-Mansour; Artem Sokolov; Lauren A Byers; Yanxun Xu; Kenneth R Hess; Lixia Diao; Leng Han; Xuelin Huang; Michael S Lawrence; John N Weinstein; Josh M Stuart; Gordon B Mills; Levi A Garraway; Adam A Margolin; Gad Getz; Han Liang
Journal:  Nat Biotechnol       Date:  2014-06-22       Impact factor: 54.908

7.  TGFβ attenuates tumour response to PD-L1 blockade by contributing to exclusion of T cells.

Authors:  Sanjeev Mariathasan; Shannon J Turley; Dorothee Nickles; Alessandra Castiglioni; Kobe Yuen; Yulei Wang; Edward E Kadel; Hartmut Koeppen; Jillian L Astarita; Rafael Cubas; Suchit Jhunjhunwala; Romain Banchereau; Yagai Yang; Yinghui Guan; Cecile Chalouni; James Ziai; Yasin Şenbabaoğlu; Stephen Santoro; Daniel Sheinson; Jeffrey Hung; Jennifer M Giltnane; Andrew A Pierce; Kathryn Mesh; Steve Lianoglou; Johannes Riegler; Richard A D Carano; Pontus Eriksson; Mattias Höglund; Loan Somarriba; Daniel L Halligan; Michiel S van der Heijden; Yohann Loriot; Jonathan E Rosenberg; Lawrence Fong; Ira Mellman; Daniel S Chen; Marjorie Green; Christina Derleth; Gregg D Fine; Priti S Hegde; Richard Bourgon; Thomas Powles
Journal:  Nature       Date:  2018-02-14       Impact factor: 49.962

8.  A 3-mRNA-based prognostic signature of survival in oral squamous cell carcinoma.

Authors:  Ruoyan Cao; Qiqi Wu; Qiulan Li; Mianfeng Yao; Hongbo Zhou
Journal:  PeerJ       Date:  2019-07-31       Impact factor: 2.984

9.  Clinical and biological implications of driver mutations in myelodysplastic syndromes.

Authors:  Elli Papaemmanuil; Moritz Gerstung; Luca Malcovati; Sudhir Tauro; Gunes Gundem; Peter Van Loo; Chris J Yoon; Peter Ellis; David C Wedge; Andrea Pellagatti; Adam Shlien; Michael John Groves; Simon A Forbes; Keiran Raine; Jon Hinton; Laura J Mudie; Stuart McLaren; Claire Hardy; Calli Latimer; Matteo G Della Porta; Sarah O'Meara; Ilaria Ambaglio; Anna Galli; Adam P Butler; Gunilla Walldin; Jon W Teague; Lynn Quek; Alex Sternberg; Carlo Gambacorti-Passerini; Nicholas C P Cross; Anthony R Green; Jacqueline Boultwood; Paresh Vyas; Eva Hellstrom-Lindberg; David Bowen; Mario Cazzola; Michael R Stratton; Peter J Campbell
Journal:  Blood       Date:  2013-09-12       Impact factor: 22.113

10.  DAVID Knowledgebase: a gene-centered database integrating heterogeneous gene annotation resources to facilitate high-throughput gene functional analysis.

Authors:  Brad T Sherman; Da Wei Huang; Qina Tan; Yongjian Guo; Stephan Bour; David Liu; Robert Stephens; Michael W Baseler; H Clifford Lane; Richard A Lempicki
Journal:  BMC Bioinformatics       Date:  2007-11-02       Impact factor: 3.169

View more
  7 in total

1.  Identification of an Innate Immune-Related Prognostic Signature in Early-Stage Lung Squamous Cell Carcinoma.

Authors:  Liang Li; Xue Yu; Guanqiang Ma; Zhiqi Ji; Shihao Bao; Xiaopeng He; Liang Song; Yang Yu; Mo Shi; Xiangyan Liu
Journal:  Int J Gen Med       Date:  2021-11-30

2.  Inflammation-Related Gene Signature for Predicting the Prognosis of Head and Neck Squamous Cell Carcinoma.

Authors:  Yilong Lu; Zengrong Jia
Journal:  Int J Gen Med       Date:  2022-05-09

3.  An Epithelial-Mesenchymal Transition Hallmark Gene-Based Risk Score System in Head and Neck Squamous-Cell Carcinoma.

Authors:  Feifei Liang; Rensheng Wang; Qinghua Du; Shangyong Zhu
Journal:  Int J Gen Med       Date:  2021-08-06

4.  Prediction of Genetic Alterations in Oncogenic Signaling Pathways in Squamous Cell Carcinoma of the Head and Neck: Radiogenomic Analysis Based on Computed Tomography Images.

Authors:  Linyong Wu; Peng Lin; Yujia Zhao; Xin Li; Hong Yang; Yun He
Journal:  J Comput Assist Tomogr       Date:  2021 Nov-Dec 01       Impact factor: 1.826

5.  A novel immune-related lncRNA pair signature for prognostic prediction and immune response evaluation in gastric cancer: a bioinformatics and biological validation study.

Authors:  Jun Wang; Beidi Wang; Biting Zhou; Jing Chen; Jia Qi; Le Shi; Shaojun Yu; Guofeng Chen; Muxing Kang; Xiaoli Jin; Lie Wang; Jinghong Xu; Linghua Zhu; Jian Chen
Journal:  Cancer Cell Int       Date:  2022-02-10       Impact factor: 5.722

6.  Immunotherapy Advances in Locally Advanced and Recurrent/Metastatic Head and Neck Squamous Cell Carcinoma and Its Relationship With Human Papillomavirus.

Authors:  Huanhuan Wang; Qin Zhao; Yuyu Zhang; Qihe Zhang; Zhuangzhuang Zheng; Shiyu Liu; Zijing Liu; Lingbin Meng; Ying Xin; Xin Jiang
Journal:  Front Immunol       Date:  2021-07-08       Impact factor: 7.561

7.  Establishment and Validation of a Comprehensive Prognostic Model for Patients With HNSCC Metastasis.

Authors:  Yajun Shen; Lingyu Li; Yunping Lu; Min Zhang; Xin Huang; Xiaofei Tang
Journal:  Front Genet       Date:  2021-07-12       Impact factor: 4.599

  7 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.