Xing Wu, Linlin Wang, Fan Feng, Suyan Tian.
Abstract
OBJECTIVE: To construct a diagnostic signature that distinguishes lung adenocarcinoma from lung squamous cell carcinoma, and a prognostic signature that predicts the risk of death for patients with nonsmall-cell lung cancer, both with satisfactory predictive performance, good stability, small size and meaningful biological implications.
Keywords: GeneRank; Lung adenocarcinoma; classification; pathway information; prognosis; squamous cell carcinoma
Year: 2019 PMID: 31854219 PMCID: PMC7607763 DOI: 10.1177/0300060519893837
Source DB: PubMed Journal: J Int Med Res ISSN: 0300-0605 Impact factor: 1.671
Discriminative gene list for distinguishing lung adenocarcinoma from lung squamous cell carcinoma.
| Gene symbols | β | Percentage (%) | Biological relevance |
|---|---|---|---|
|  | 0.8913 | 53 | I |
|  | −1.1502 | 87 | I |
|  | 0.4041 | 37 | I |
|  | 0.0822 | 31 | D |
|  | −1.3647 | 73 | D |
|  | 1.0274 | 63 | D |
|  | 2.4164 | 54 | I |
|  | 9.6217 | 100 | D |
|  | −0.2808 | 51 | D |
|  | −3.2149 | 93 | D |
|  | 0.4458 | 53 | D |
|  | −0.2771 | 59 | D |
|  | −2.3844 | 90 | I |
|  | −3.0412 | 99 | D |
β is the coefficient estimated for each gene by the LASSO model (the magnitude of its association with the outcome); percentage (%) is the frequency with which the gene received a non-zero β over 100 replicates.
I, indirectly related to nonsmall-cell lung cancer according to the GeneCards database; D, directly related to nonsmall-cell lung cancer according to the GeneCards database.
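The Percentage column records how often the LASSO gave a gene a non-zero β across the 100 replicates. A minimal Python sketch of that bookkeeping follows. It assumes the per-replicate coefficient estimates are already available as dictionaries (the resampling and LASSO fitting steps are not shown), and the gene names and coefficient values are hypothetical. The `keep_stable` helper applies a strict "greater than" cutoff, matching the > 80% and > 50% rules stated in the note to the performance table.

```python
from typing import Dict, List


def selection_frequency(replicate_betas: List[Dict[str, float]],
                        genes: List[str]) -> Dict[str, float]:
    """Percentage of replicates in which each gene has a non-zero coefficient.

    A gene missing from a replicate's dictionary is treated as beta = 0.
    """
    n = len(replicate_betas)
    freq: Dict[str, float] = {}
    for g in genes:
        hits = sum(1 for betas in replicate_betas if betas.get(g, 0.0) != 0.0)
        freq[g] = 100.0 * hits / n
    return freq


def keep_stable(freq: Dict[str, float], threshold: float) -> List[str]:
    """Keep genes whose selection frequency is strictly above `threshold` (%)."""
    return [g for g, f in freq.items() if f > threshold]


# Hypothetical LASSO coefficients from four replicates (not data from the paper).
replicates = [
    {"GENE_A": 0.9, "GENE_B": 0.0},
    {"GENE_A": 1.1, "GENE_B": -0.2},
    {"GENE_A": 0.7},
    {"GENE_A": 0.0, "GENE_B": 0.3},
]
freq = selection_frequency(replicates, ["GENE_A", "GENE_B"])
print(freq)                     # {'GENE_A': 75.0, 'GENE_B': 50.0}
print(keep_stable(freq, 50.0))  # ['GENE_A']  (50% is not > 50%)
```

With 100 real replicates the same code yields percentages on the 0 to 100 scale used in the table.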
Performance statistics for both discriminative and prognostic gene signatures.
| | Classification (AC versus SCC) | | Prognosis (AC & SCC) | |
|---|---|---|---|---|
| | Error rate (%) | GBS | P-value | C-index |
| Training set (integrated microarray data) | 6.49 | 0.056 | 8.05 × 10⁻⁹ | 0.667 |
| Test set (RNA-seq data) | 14.4 | 0.109 | 0.252 | 0.577 |
| Using bagging to eliminate the genes with low frequencies* | | | | |
| Training set (integrated microarray data) | 11.50 | 0.076 | 3.01 × 10⁻⁵ | 0.630 |
| Test set (RNA-seq data) | 12.80 | 0.108 | 0.03 | |
*For the classification problem, genes with frequencies > 80% were kept; for the prognosis problem, genes with frequencies > 50% were kept.
AC, lung adenocarcinoma; SCC, lung squamous cell carcinoma; GBS, Generalized Brier Score; C-index, concordance index.
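The performance table reports an error rate and a GBS for classification and a C-index for prognosis. The sketch below shows common textbook forms of these metrics under stated assumptions. The classification score here is the binary Brier score (mean squared difference between the predicted probability and the 0/1 label), which may differ from the paper's "generalized" variant; for example, a version summed over both classes would be exactly twice as large. The C-index is Harrell's version, which skips pairs with tied survival times. All data values are made up.

```python
from typing import Sequence


def error_rate(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Percentage of samples whose predicted class differs from the true class."""
    wrong = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return 100.0 * wrong / len(y_true)


def brier_score(y_true: Sequence[int], prob_pos: Sequence[float]) -> float:
    """Binary Brier score: mean of (predicted P(class 1) - observed 0/1 label)^2.

    Assumption: the paper's 'generalized' Brier score may be defined differently.
    """
    return sum((p - t) ** 2 for t, p in zip(y_true, prob_pos)) / len(y_true)


def harrell_c_index(times: Sequence[float], events: Sequence[int],
                    risk: Sequence[float]) -> float:
    """Harrell's C: share of comparable pairs in which the patient who died first
    had the higher risk score (ties in risk count as 1/2).

    A pair (i, j) is comparable when times[i] < times[j] and patient i had an
    observed death (events[i] == 1); pairs with tied times are skipped.
    """
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored patient cannot be the earlier death in a pair
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable


# Hypothetical toy data.
print(error_rate([1, 0, 1, 1], [1, 0, 0, 1]))                             # 25.0
print(brier_score([1, 0, 1, 1], [0.9, 0.2, 0.4, 0.8]))
print(harrell_c_index([2, 4, 6, 8], [1, 1, 0, 1], [0.9, 0.5, 0.7, 0.1]))  # 0.8
```

A C-index of 0.5 corresponds to random ordering of patients, so the test-set values in the table (0.577, 0.630 on training after bagging) sit between chance and perfect concordance.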
Figure 1. Scatterplots of the discriminative gene signature. (a) The training set (integrated microarray data). (b) The test set (RNA-Seq data). The three genes shown, keratin 5 (KRT5), mucin 1 (MUC1) and triggering receptor expressed on myeloid cells 1 (TREM1), are both highly stable (selected in > 80% of replicates) and directly related to nonsmall-cell lung cancer. In both plots, AC patients (blue dots) and SCC patients (red dots) are well separated by these three genes. The weighted gene expression values for KRT5 are given on the x-axis, for MUC1 on the y-axis and for TREM1 on the z-axis. AC, lung adenocarcinoma; SCC, lung squamous cell carcinoma. The colour version of this figure is available at: http://imr.sagepub.com.
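Figure 1 plots "weighted" expression values. A minimal sketch, assuming "weighted" means each gene's expression multiplied by its LASSO β, and assuming a logistic model turns the summed weighted values into a class probability. The β values, expression values and zero intercept are hypothetical (the gene symbols are missing from the extracted table, so the real coefficients for KRT5, MUC1 and TREM1 cannot be read off), and which class corresponds to a probability near 1 is also an assumption. Summing only the three plotted genes is a simplification, since the full signature contains more genes.

```python
import math
from typing import Dict, Sequence, Tuple

AXES = ("KRT5", "MUC1", "TREM1")  # x-, y- and z-axis genes in Figure 1


def weighted_coordinates(expr: Dict[str, float], betas: Dict[str, float],
                         axes: Sequence[str] = AXES) -> Tuple[float, ...]:
    """Return beta_g * expression_g for each axis gene (assumed meaning of 'weighted')."""
    return tuple(betas[g] * expr[g] for g in axes)


def class_probability(coords: Sequence[float], intercept: float = 0.0) -> float:
    """Logistic transform of intercept + sum of weighted values (assumed model form)."""
    linear_predictor = intercept + sum(coords)
    return 1.0 / (1.0 + math.exp(-linear_predictor))


# Hypothetical coefficients and one hypothetical patient's expression values.
betas = {"KRT5": -2.0, "MUC1": 1.5, "TREM1": 0.5}
patient = {"KRT5": 3.0, "MUC1": 2.0, "TREM1": 4.0}
coords = weighted_coordinates(patient, betas)
print(coords)                     # (-6.0, 3.0, 2.0)
print(class_probability(coords))  # logistic(-1.0), about 0.269
```

Each patient's `coords` tuple would be one point in the 3-D scatterplot.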
The prognostic gene list for nonsmall-cell lung cancer.
| Gene symbols | β | Percentage (%) | Biological relevance |
|---|---|---|---|
|  | −0.3811 | 89 | D |
|  | −0.7664 | 87 | I |
|  | −0.4586 | 46 | I |
|  | 0.8026 | 65 | D |
|  | −1.1324 | 66 | D |
|  | −0.7247 | 47 | I |
|  | −0.9063 | 49 | I |
|  | 0.4625 | 44 | D |
|  | −0.3124 | 50 | D |
|  | 0.5301 | 56 | D |
|  | 0.0574 | 31 | D |
|  | −0.089 | 45 | I |
|  | −0.1602 | 44 | D |
|  | −0.013 | 35 | I |
β is the coefficient estimated for each gene by the LASSO model (the magnitude of its association with the outcome); percentage (%) is the frequency with which the gene received a non-zero β over 100 replicates.
D, directly related to nonsmall-cell lung cancer according to the GeneCards database; I, indirectly related to nonsmall-cell lung cancer according to the GeneCards database.
Figure 2. Kaplan–Meier plots for the five-gene prognostic signature. (a) The training set (the integrated microarray dataset). (b) The test set (the RNA-Seq dataset). Patients were divided into a high-risk group (red solid line) and a low-risk group (blue solid line) using the mean risk score as the cutoff, and a log-rank test was used to assess whether the survival curves of the two groups differed. The P-value shown is the log-rank P-value. The colour version of this figure is available at: http://imr.sagepub.com.
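The grouping and test described in the Figure 2 caption can be sketched as follows. Assumptions: the risk score is the linear combination of β × expression over the signature genes (the usual Cox-model convention, in which a higher score means a higher hazard), patients strictly above the mean score form the high-risk group, and the log-rank statistic is compared with a chi-square distribution with one degree of freedom. The expression values, coefficients, survival times and events are made up; this is a from-scratch illustration, not the authors' code.

```python
import math
from typing import List, Sequence, Tuple


def risk_scores(expr_rows: Sequence[Sequence[float]], betas: Sequence[float]) -> List[float]:
    """Linear risk score per patient: sum of beta_g * expression_g."""
    return [sum(b * x for b, x in zip(betas, row)) for row in expr_rows]


def split_by_mean(scores: Sequence[float]) -> List[bool]:
    """True = high-risk (score above the mean score), False = low-risk."""
    cutoff = sum(scores) / len(scores)
    return [s > cutoff for s in scores]


def logrank_test(times: Sequence[float], events: Sequence[int],
                 group: Sequence[bool]) -> Tuple[float, float]:
    """Two-group log-rank test; returns (chi-square statistic, P-value with 1 df)."""
    event_times = sorted({t for t, e in zip(times, events) if e})
    obs_minus_exp = 0.0
    variance = 0.0
    for t in event_times:
        at_risk = [i for i, ti in enumerate(times) if ti >= t]
        n = len(at_risk)
        n1 = sum(1 for i in at_risk if group[i])                     # at risk in group 1
        d = sum(1 for i in at_risk if times[i] == t and events[i])   # deaths at time t
        d1 = sum(1 for i in at_risk if times[i] == t and events[i] and group[i])
        obs_minus_exp += d1 - d * n1 / n  # observed minus expected deaths in group 1
        if n > 1:  # hypergeometric variance of d1
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("log-rank variance is zero; groups cannot be compared")
    chi2 = obs_minus_exp ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p_value


# Hypothetical data: two signature genes, six patients, all deaths observed.
betas = [1.0, -0.5]
expr_rows = [[2.0, 0.0], [2.0, 0.4], [1.5, 0.0], [0.2, 0.0], [0.1, 0.0], [0.0, 0.0]]
times = [1, 2, 3, 4, 5, 6]
events = [1, 1, 1, 1, 1, 1]
high = split_by_mean(risk_scores(expr_rows, betas))
chi2, p = logrank_test(times, events, high)
print(high)     # [True, True, True, False, False, False]
print(chi2, p)
```

Using the mean (rather than, say, the median) as the cutoff means the two groups need not be the same size.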
Figure 3. Interaction network built from the five prognostic genes and the five diagnostic genes. Isolated genes (those with no interactions) were excluded from the graph. The identified diagnostic genes (keratin 5 [KRT5], mucin 1 [MUC1] and complement C3 [C3]) are highlighted in yellow and the prognostic genes (alpha-2-glycoprotein 1, zinc-binding [AZGP1], clusterin [CLU] and cyclin dependent kinase 1 [CDK1]) in pink. The colour version of this figure is available at: http://imr.sagepub.com.
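The Figure 3 caption notes that isolated genes were dropped from the network. A minimal sketch of that filtering step, assuming the interactions are available as an undirected edge list (the caption does not say which interaction source was used). The gene names and edges below are placeholders, not the network shown in the figure.

```python
from typing import Dict, Iterable, List, Set, Tuple


def drop_isolated(genes: List[str],
                  edges: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Build an undirected adjacency map and keep only genes with at least one neighbour."""
    adjacency: Dict[str, Set[str]] = {g: set() for g in genes}
    for a, b in edges:
        # Ignore self-loops and edges that involve genes outside the signature.
        if a in adjacency and b in adjacency and a != b:
            adjacency[a].add(b)
            adjacency[b].add(a)
    return {g: nbrs for g, nbrs in adjacency.items() if nbrs}


# Placeholder genes and interactions (hypothetical).
genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
edges = [("GENE_A", "GENE_B"), ("GENE_B", "GENE_C")]
network = drop_isolated(genes, edges)
print(sorted(network))  # ['GENE_A', 'GENE_B', 'GENE_C']
```

Here GENE_D has no partner in the edge list, so it is left out, mirroring how genes without interactions are absent from the plotted network.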