Juncheng Wang, Xun Chen, Yuxi Tian, Gangcai Zhu, Yuexiang Qin, Xuan Chen, Leiming Pi, Ming Wei, Guancheng Liu, Zhexuan Li, Changhan Chen, Yunxia Lv, Gengming Cai.
Abstract
The prognosis of patients with head and neck squamous cell carcinoma (HNSCC) remains poor. High-throughput sequencing data have laid a solid foundation for identifying genes related to cancer prognosis, but a gene marker that predicts clinical outcomes in HNSCC is still needed. In this study, we downloaded RNA-seq, single nucleotide polymorphism, copy number variation, and clinical follow-up data from TCGA. The samples were randomly divided into a training set and a test set. In the training set, we screened candidate genes and used random forests for feature selection. Gene-based prognostic models were established and then validated in the test set and in a GEO validation set. Random forest feature selection ultimately yielded six genes (PEX11A, NLRP2, SERPINE1, UPK, CTTN, D2HGDH). Cox regression analysis confirmed that the 6-gene signature is an independent prognostic factor in HNSCC patients. The signature effectively stratified samples in the training, test, and external validation sets (P < 0.01). The AUC for 5-year survival was greater than 0.74 in both the training and validation sets. Thus, we have constructed a 6-gene signature as a new prognostic marker for predicting the survival of HNSCC patients.
Keywords: CNV; TCGA; bioinformatics; prognostic markers
Year: 2020 PMID: 31927533 PMCID: PMC6977678 DOI: 10.18632/aging.102655
Source DB: PubMed Journal: Aging (Albany NY) ISSN: 1945-4589 Impact factor: 5.682
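The abstract does not state how the signature's risk score is computed or how samples are split into risk groups. A common convention for Cox-derived gene signatures is a linear score (each gene's Cox coefficient times its expression, summed over the signature), with a median cut-off. The sketch below assumes that convention. The gene names, coefficients, and expression values are hypothetical.

```python
from statistics import median

def risk_scores(samples, coefficients):
    """Cox-style linear risk score: sum of coefficient * expression over the signature genes."""
    return [sum(coef * sample[gene] for gene, coef in coefficients.items()) for sample in samples]

def stratify(scores, cutoff=None):
    """Label each sample 'high' or 'low' risk; uses the median score as cut-off unless one is given."""
    if cutoff is None:
        cutoff = median(scores)
    return ["high" if s > cutoff else "low" for s in scores], cutoff

# Hypothetical coefficients and expression values, for illustration only
coefficients = {"GENE_A": 0.5, "GENE_B": -0.25}
samples = [
    {"GENE_A": 1.0, "GENE_B": 0.0},
    {"GENE_A": 0.0, "GENE_B": 1.0},
    {"GENE_A": 2.0, "GENE_B": 1.0},
    {"GENE_A": 0.0, "GENE_B": 0.0},
]
scores = risk_scores(samples, coefficients)
groups, cutoff = stratify(scores)
print(scores, cutoff, groups)
```

A cut-off fixed on the training set (rather than recomputed per dataset) can be passed explicitly through `cutoff` when scoring the test or external sets.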
The 20 most relevant genes.
| Ensembl gene ID | HR | coef (ln HR) | z | P value |
|---|---|---|---|---|
| ENSG00000127084 | 0.630 | -0.462 | -4.562 | 5.07E-06 |
| ENSG00000254656 | 1.360 | 0.308 | 4.202 | 2.64E-05 |
| ENSG00000041515 | 1.337 | 0.290 | 4.200 | 2.67E-05 |
| ENSG00000126353 | 0.647 | -0.435 | -3.969 | 7.22E-05 |
| ENSG00000115085 | 0.649 | -0.432 | -3.916 | 9.01E-05 |
| ENSG00000170482 | 0.538 | -0.620 | -3.896 | 9.80E-05 |
| ENSG00000162545 | 1.503 | 0.407 | 3.880 | 0.0001 |
| ENSG00000125910 | 0.638 | -0.449 | -3.838 | 0.0001 |
| ENSG00000174652 | 0.674 | -0.395 | -3.795 | 0.0001 |
| ENSG00000197540 | 0.655 | -0.423 | -3.722 | 0.0001 |
| ENSG00000184903 | 1.392 | 0.331 | 3.719 | 0.0001 |
| ENSG00000132465 | 0.692 | -0.369 | -3.698 | 0.0002 |
| ENSG00000089199 | 1.382 | 0.324 | 3.683 | 0.0002 |
| ENSG00000189319 | 0.696 | -0.362 | -3.674 | 0.0002 |
| ENSG00000198198 | 0.692 | -0.369 | -3.609 | 0.0003 |
| ENSG00000153531 | 1.318 | 0.276 | 3.553 | 0.0003 |
| ENSG00000159176 | 1.359 | 0.307 | 3.552 | 0.0003 |
| ENSG00000127241 | 0.598 | -0.515 | -3.537 | 0.0004 |
| ENSG00000197992 | 0.617 | -0.483 | -3.537 | 0.0004 |
| ENSG00000182866 | 0.700 | -0.357 | -3.494 | 0.0004 |
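The columns of the 20-gene table are internally consistent with standard Cox regression output. HR equals exp(coef), and the P value is the two-sided Wald tail probability of z. The check below recomputes both for the six rows whose P values are given in scientific notation. Treating the columns this way is an inference from the arithmetic, not something the source states.

```python
import math

# First six rows of the 20-gene table: Ensembl ID -> (HR, coef, z, reported P value)
ROWS = {
    "ENSG00000127084": (0.630, -0.462, -4.562, 5.07e-06),
    "ENSG00000254656": (1.360, 0.308, 4.202, 2.64e-05),
    "ENSG00000041515": (1.337, 0.290, 4.200, 2.67e-05),
    "ENSG00000126353": (0.647, -0.435, -3.969, 7.22e-05),
    "ENSG00000115085": (0.649, -0.432, -3.916, 9.01e-05),
    "ENSG00000170482": (0.538, -0.620, -3.896, 9.80e-05),
}

def two_sided_p(z):
    """Two-sided Wald P value, 2 * (1 - Phi(|z|)), written with the complementary error function."""
    return math.erfc(abs(z) / math.sqrt(2))

for gene, (hr, coef, z, p) in ROWS.items():
    print(f"{gene}: exp(coef)={math.exp(coef):.3f} (HR {hr}), "
          f"P from z={two_sided_p(z):.2e} (reported {p:.2e})")
```

Small discrepancies come only from the three-decimal rounding of coef and z.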
Figure 1. Identification of genes with significant amplification or deletion. (A and B) Genes located in focal CNA peaks in HNSCC. GISTIC 2.0 false-discovery rates (q values) and scores for each alteration (x-axis) are plotted against genomic position (y-axis); dotted lines indicate the centromeres. Amplifications (A) are shown in red and deletions (B) in blue. The green line marks the q-value cut-off of 0.25 used to determine significance. (C) The 50 most significantly mutated genes. The upper bar chart shows, for each patient, the total number of synonymous and non-synonymous mutations in these 50 genes. The bar chart on the right shows, for each of the 50 genes, the number of samples in which it is mutated. Colors in the heatmap indicate the mutation type; gray indicates no mutation.
Figure 2. Functional enrichment analysis of the 1,321 genes with genomic variants. (A) Enriched KEGG pathways. (B) Enriched GO terms in the “biological process” category. Color indicates the significance level and symbol size indicates the number of genes.
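The caption does not name the enrichment test. KEGG/GO over-representation is commonly assessed with a one-sided hypergeometric test: given a background of N genes, K of which belong to a term, what is the chance that a query list of n genes contains at least k of them? A minimal sketch follows. The function name and all counts except the 1,321-gene list size are hypothetical.

```python
from math import comb

def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) for X ~ Hypergeometric: N background genes, K of them annotated to the term,
    n genes in the query list, k of the query genes annotated to the term."""
    upper = min(n, K)
    # Exact integer arithmetic; Python's int / int true division returns a correctly rounded float
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / comb(N, n)

# Hypothetical counts: 20,000 background genes, a 150-gene pathway,
# the 1,321 variant genes as the query list, 25 of which fall in the pathway.
print(hypergeom_upper_tail(25, 1321, 150, 20000))
```

In practice the per-term P values would then be adjusted for multiple testing before terms are called enriched.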
Figure 3. Identification of genomic variant genes and prognosis-related genes in head and neck cancer. (A) Relationship between the error rate and the number of classification trees. (B) Out-of-bag importance ranking of the 6 genes. (C) Kaplan-Meier survival curves for the risk groups defined by the 6-gene signature in the TCGA training set. (D) ROC curve and AUC for classification by the 6-gene signature. (E) Risk score, survival time, survival status, and expression of the 6 signature genes in the TCGA training set.
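The Kaplan-Meier curves in panel C estimate survival within each risk group as a product of conditional survival probabilities at each death time. A minimal pure-Python sketch of the estimator follows. The follow-up times and event indicators in the example are hypothetical.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate. events[i] is 1 for a death and 0 for a censored observation.
    Returns [(t, S(t))] at each distinct death time."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all observations tied at time t; censored ties still count as at risk at t
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve

# Hypothetical follow-up times (years) and event indicators for one risk group
print(kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]))
```

Running the estimator separately on the high- and low-risk samples gives the two curves that are compared in panel C.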
Six genes significantly associated with overall survival (OS) in the training set.
| Ensembl gene ID | HR | z | P value | Importance | Relative importance |
|---|---|---|---|---|---|
| ENSG00000180902 | 0.77 | -2.61 | 8.88E-03 | 0.012 | 1 |
| ENSG00000166821 | 1.33 | 2.98 | 2.87E-03 | 0.0076 | 0.6359 |
| ENSG00000022556 | 1.32 | 2.96 | 3.03E-03 | 0.0074 | 0.6214 |
| ENSG00000106366 | 1.43 | 3.33 | 8.47E-04 | 0.0052 | 0.4369 |
| ENSG00000110375 | 1.29 | 2.98 | 2.84E-03 | 0.005 | 0.4175 |
| ENSG00000085733 | 1.27 | 2.69 | 7.08E-03 | 0.0048 | 0.4 |
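In the six-gene table, the last column appears consistent with each gene's random forest importance (the second-to-last column) divided by the largest importance, which is 0.012. The source does not label the columns, so this reading is an assumption. The check below shows that the ratios agree to within the rounding of the importance column.

```python
# Importance and (assumed) relative-importance columns of the six-gene table
importance = {
    "ENSG00000180902": (0.012, 1.0),
    "ENSG00000166821": (0.0076, 0.6359),
    "ENSG00000022556": (0.0074, 0.6214),
    "ENSG00000106366": (0.0052, 0.4369),
    "ENSG00000110375": (0.005, 0.4175),
    "ENSG00000085733": (0.0048, 0.4),
}

top = max(vimp for vimp, _ in importance.values())
for gene, (vimp, reported) in importance.items():
    # Gaps of a few thousandths reflect the importance column being rounded to 2 significant digits
    print(f"{gene}: {vimp / top:.4f} (reported {reported})")
```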
Figure 4. Relationship between the 6-gene signature and cancer risk. (A) Kaplan-Meier curves for the test set samples. (B) Kaplan-Meier curves for all TCGA tumor samples. (C) Relationship between expression of the 6 signature genes and risk scores in the test set samples. (D) Relationship between expression of the 6 signature genes and risk scores in all TCGA samples.
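The captions do not name the test behind the reported stratification P values (P < 0.01 in the abstract). Separation between two Kaplan-Meier curves is most often tested with the log-rank test, which is assumed here. The sketch compares observed and expected deaths in one group at every death time. The example data are hypothetical.

```python
import math

def logrank(times, events, groups):
    """Two-group log-rank test.

    times: follow-up times; events: 1 = death, 0 = censored; groups: 1 = high risk, 0 = low risk.
    Returns (chi-square statistic, P value) with 1 degree of freedom.
    """
    death_times = sorted({t for t, e in zip(times, events) if e})
    o_minus_e = 0.0
    variance = 0.0
    for t in death_times:
        n = sum(1 for ti in times if ti >= t)  # at risk overall
        n1 = sum(1 for ti, g in zip(times, groups) if ti >= t and g == 1)  # at risk in group 1
        d = sum(1 for ti, e in zip(times, events) if ti == t and e)
        d1 = sum(1 for ti, e, g in zip(times, events, groups) if ti == t and e and g == 1)
        o_minus_e += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = o_minus_e ** 2 / variance
    # Upper tail of chi-square(1) equals the two-sided normal tail at sqrt(chi2)
    return chi2, math.erfc(math.sqrt(chi2 / 2))

chi2, p = logrank([1, 2, 3, 4], [1, 1, 1, 1], [1, 1, 0, 0])
print(f"chi2={chi2:.3f}, P={p:.3f}")
```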
Figure 5. Performance of the 6-gene signature model on GEO data. (A) Kaplan-Meier survival curves for the risk groups defined by the 6-gene signature in the GSE65858 dataset. (B) ROC curve and AUC for classification by the 6-gene signature. (C) Risk score, survival time, survival status, and expression of the 6 signature genes in the GSE65858 dataset.
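The abstract reports a 5-year survival AUC above 0.74. Proper time-dependent ROC methods weight observations to account for censoring. The simplified sketch below drops patients censored before the horizon. It then computes the AUC as the share of (death by the horizon, alive after it) pairs in which the patient who died has the higher risk score. This is an illustrative approximation, not the paper's method, and the example data are hypothetical.

```python
def auc_at_horizon(times, events, scores, horizon):
    """Simplified fixed-horizon AUC for a risk score.

    Cases: deaths at or before `horizon`. Controls: patients still followed after `horizon`.
    Patients censored before the horizon are dropped (no censoring weights);
    at least one case and one control are assumed.
    """
    cases = [s for t, e, s in zip(times, events, scores) if e and t <= horizon]
    controls = [s for t, s in zip(times, scores) if t > horizon]
    # Mann-Whitney form of the AUC: share of case/control pairs ranked correctly, ties count half
    wins = sum(1.0 if c > k else 0.5 if c == k else 0.0 for c in cases for k in controls)
    return wins / (len(cases) * len(controls))

print(auc_at_horizon([1, 2, 6, 7], [1, 1, 0, 1], [0.9, 0.4, 0.5, 0.1], horizon=5))
```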
Univariate and multivariate Cox regression analyses of the 6-gene risk score and clinical factors, testing their independent association with prognosis.
| Variable | Univariate HR | Univariate 95% CI | Univariate P | Multivariate HR | Multivariate 95% CI | Multivariate P |
|---|---|---|---|---|---|---|
| 6-gene risk score | | | | | | |
| Low risk group | 1 (reference) | | | 1 (reference) | | |
| High risk group | 2.66 | 1.79-3.92 | 1.060E-06 | 2.18 | 1.04-4.53 | 0.038 |
| Age | 1.03 | 1.01-1.04 | 8.550E-04 | 0.99 | 0.95-1.02 | 0.604231 |
| Gender: female | 1 (reference) | | | 1 (reference) | | |
| Gender: male | 0.73 | 0.49-1.07 | 0.11 | 0.50 | 0.23-1.09 | 0.081554 |
| Grade 1 | 1 (reference) | | | 1 (reference) | | |
| Grade 2 | 1.78 | 0.95-3.32 | 0.07 | 0.91 | 0.25-3.18 | 0.88025 |
| Grade 3/4 | 1.47 | 0.75-2.87 | 0.26 | 1.94 | 0.53-7.04 | 0.315185 |
| Pathologic T1/T2 | 1 (reference) | | | 1 (reference) | | |
| Pathologic T3 | 2.26 | 1.28-3.95 | 4.53E-03 | 2.07 | 0.51-8.37 | 0.309 |
| Pathologic T4 | 2.46 | 1.50-4.00 | 3.19E-04 | 7.59 | 1.90-30.22 | 0.004 |
| Pathologic N0 | 1 (reference) | | | 1 (reference) | | |
| Pathologic N1 | 0.89 | 0.37-2.08 | 0.782 | 0.76 | 0.19-3.01 | 0.694 |
| Pathologic N2 | 2.29 | 1.40-3.72 | 0.001 | 5.02 | 1.97-12.75 | 0.001 |
| Pathologic M0 | 1 (reference) | | | 1 (reference) | | |
| Pathologic M1/MX | 1.31 | 0.66-2.55 | 0.433 | 1.16 | 0.43-3.08 | 0.76 |
| Tumor stage I | 1 (reference) | | | 1 (reference) | | |
| Tumor stage II | 1.08 | 0.23-4.90 | 0.918 | 0.55 | 0.03-9.85 | 0.686335 |
| Tumor stage III | 1.40 | 0.30-6.42 | 0.667 | 0.77 | 0.05-10.60 | 0.845959 |
| Tumor stage IV | 2.75 | 0.67-11.22 | 0.158 | 0.27 | 0.02-3.63 | 0.32 |
| 6-gene risk score | | | | | | |
| Low risk group | 1 (reference) | | | 1 (reference) | | |
| High risk group | 1.62 | 1.08-2.42 | 0.020 | 1.45 | 0.67-3.12 | 0.339 |
| Age | 1.01 | 0.98-1.02 | 0.444 | 1.00 | 0.96-1.03 | 0.993 |
| Gender: female | 1 (reference) | | | 1 (reference) | | |
| Gender: male | 0.84 | 0.55-1.28 | 0.420 | 0.45 | 0.19-1.01 | 0.054 |
| Grade 1 | 1 (reference) | | | 1 (reference) | | |
| Grade 2 | 1.86 | 0.92-3.77 | 0.08 | 0.90 | 0.28-2.88 | 0.865 |
| Grade 3 | 1.57 | 0.73-3.36 | 0.24 | 1.54 | 0.42-5.52 | 0.508 |
| Pathologic T1/T2 | 1 (reference) | | | 1 (reference) | | |
| Pathologic T3 | 1.84 | 1.09-3.11 | 0.022 | 1.76 | 0.47-6.54 | 0.399 |
| Pathologic T4 | 1.36 | 0.84-2.21 | 0.213 | 1.20 | 0.35-4.07 | 0.770 |
| Pathologic N0 | 1 (reference) | | | 1 (reference) | | |
| Pathologic N1 | 1.14 | 0.57-2.25 | 0.706 | 1.74 | 0.50-6 | 0.379 |
| Pathologic N2 | 2.49 | 1.50-4.12 | 0.000 | 1.87 | 0.65-5.32 | 0.239 |
| Pathologic N3 | 2.90 | 0.87-9.6 | 0.082 | 1.86 | 0.26-12.82 | 0.529 |
| Pathologic M0 | 1 (reference) | | | 1 (reference) | | |
| Pathologic M1 | 1.05 | 0.49-2.21 | 0.905 | 1.34 | 0.54-3.28 | 0.524 |
| Tumor stage I | 1 (reference) | | | 1 (reference) | | |
| Tumor stage II | 2.94 | 0.34-0.66 | 0.157 | 0.55 | 0.03-9.20 | 0.677 |
| Tumor stage III | 3.04 | 0.32-0.70 | 0.137 | 1.00 | 0.08-11.42 | 0.997 |
| Tumor stage IV | 4.03 | 0.24-0.98 | 0.053 | 1.46 | 0.12-17.69 | 0.765 |
| 6-gene risk score | | | | | | |
| Low risk group | 1 (reference) | | | 1 (reference) | | |
| High risk group | 1.75 | 1.13-2.67 | 0.011 | 1.83 | 1.16-2.87 | 0.008 |
| Age | 1.03 | 1.00-1.04 | 0.01 | 1.03 | 1.01-1.05 | 0.00 |
| Gender: female | 1 (reference) | | | 1 (reference) | | |
| Gender: male | 1.05 | 0.61-1.77 | 0.87 | 1.02 | 0.59-1.75 | 0.94 |
| Pathologic T1 | 1 (reference) | | | 1 (reference) | | |
| Pathologic T2 | 0.49 | 0.21-1.15 | 0.10 | 0.53 | 0.15-1.78 | 0.31 |
| Pathologic T3 | 1.73 | 0.81-3.67 | 0.15 | 1.61 | 0.55-4.69 | 0.38 |
| Pathologic T4 | 1.89 | 0.91-3.87 | 0.08 | 1.22 | 0.42-3.53 | 0.71 |
| Pathologic N0 | 1 (reference) | | | 1 (reference) | | |
| Pathologic N1 | 0.36 | 0.12-1.02 | 0.06 | 0.34 | 0.10-1.08 | 0.07 |
| Pathologic N2 | 1.60 | 0.99-2.56 | 0.05 | 1.03 | 0.51-2.05 | 0.94 |
| Pathologic N3 | 2.93 | 1.34-6.42 | 0.01 | 1.27 | 0.41-3.84 | 0.67 |
| Pathologic M0 | 1 (reference) | | | 1 (reference) | | |
| Pathologic M1/MX | 3.71 | 1.63-8.4 | 0.00 | 2.18 | 0.70-6.79 | 0.18 |
| Tumor stage I | 1 (reference) | | | 1 (reference) | | |
| Tumor stage II | 0.39 | 0.11-1.33 | 0.13 | 0.60 | 0.10-3.40 | 0.56 |
| Tumor stage III | 0.46 | 0.14-1.45 | 0.19 | 0.66 | 0.13-3.31 | 0.61 |
| Tumor stage IV | 1.49 | 0.60-3.70 | 0.39 | 1.14 | 0.24-5.20 | 0.87 |
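Each row of the Cox table reports an HR with its 95% CI and P value. Assuming the usual Wald interval on the log-hazard scale, the standard error, z statistic, and P value can be recovered from the HR and CI alone. This is a handy consistency check when reading such tables. The function name is illustrative.

```python
import math
from statistics import NormalDist

def wald_from_ci(hr, lower, upper, level=0.95):
    """Recover SE of ln(HR), the Wald z, and the two-sided P value from an HR and its CI,
    assuming the CI is symmetric on the log scale: exp(ln(HR) +/- z_crit * SE)."""
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(hr) / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return se, z, p

# Multivariate result for the high-risk group in the first block of the table
se, z, p = wald_from_ci(2.18, 1.04, 4.53)
print(f"SE={se:.3f}, z={z:.2f}, P={p:.3f}")
```

For the multivariate high-risk row (HR 2.18, 95% CI 1.04-4.53), this gives P ≈ 0.038, matching the reported value. Rows where the recovered P differs noticeably from the reported one may reflect rounding of the CI bounds.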
Figure 6. GSEA showing four pathways enriched in the high-risk group: focal adhesion, TGF-β signaling pathway, WNT signaling pathway, and ERBB signaling pathway.
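GSEA scores a pathway by walking down a gene list ranked by association with the phenotype (here, high versus low risk). The running sum steps up at genes in the pathway and down at genes outside it. The enrichment score is the maximum deviation from zero. The sketch below implements that weighted running-sum statistic. The toy ranking and gene names are hypothetical, and the permutation-based significance testing that GSEA also performs is omitted.

```python
def enrichment_score(ranked, gene_set, weight=1.0):
    """GSEA-style enrichment score.

    ranked: list of (gene, score) pairs sorted by score, most positive first.
    gene_set: set of gene names. Assumes at least one hit and one miss.
    Returns the running-sum value with the largest absolute deviation from zero.
    """
    is_hit = [gene in gene_set for gene, _ in ranked]
    hit_norm = sum(abs(score) ** weight for (_, score), hit in zip(ranked, is_hit) if hit)
    miss_step = 1.0 / (len(ranked) - sum(is_hit))
    running = best = 0.0
    for (_, score), hit in zip(ranked, is_hit):
        # Hits step up in proportion to their weighted score; misses step down uniformly
        running += abs(score) ** weight / hit_norm if hit else -miss_step
        if abs(running) > abs(best):
            best = running
    return best

ranked = [("g1", 4.0), ("g2", 3.0), ("g3", 2.0), ("g4", 1.0)]
print(enrichment_score(ranked, {"g1", "g2"}), enrichment_score(ranked, {"g3", "g4"}))
```

A set concentrated at the top of the ranking gets a positive score, which corresponds to enrichment in the high-risk group when genes are ranked from high-risk to low-risk association.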
Clinical characteristics of the three datasets.
| Characteristic | Group | Dataset 1 | Dataset 2 | Dataset 3 |
|---|---|---|---|---|
| Age | ≤50 | 38 | 50 | 41 |
| | >50 | 212 | 201 | 229 |
| Vital status | Living | 136 | 146 | 176 |
| | Dead | 141 | 105 | 94 |
| Gender | Female | 68 | 66 | 47 |
| | Male | 182 | 185 | 223 |
| Grade | G1 | 34 | 28 | |
| | G2 | 136 | 163 | |
| | G3 | 68 | 51 | |
| | G4 | 2 | 0 | |
| Pathologic T | T1 | 17 | 28 | 35 |
| | T2 | 67 | 65 | 80 |
| | T3 | 48 | 48 | 58 |
| | T4 | 89 | 63 | 97 |
| Pathologic N | N0 | 80 | 90 | 94 |
| | N1 | 26 | 39 | 32 |
| | N2 | 94 | 72 | 132 |
| | N3 | 2 | 5 | 12 |
| Pathologic M | M0 | 94 | 93 | 263 |
| | M1/MX | 32 | 30 | 7 |
| Tumor stage | I | 9 | 16 | 18 |
| | II | 37 | 32 | 37 |
| | III | 30 | 48 | 37 |
| | IV | 141 | 120 | 178 |