Baoshan Ma, Yao Geng, Fanyu Meng, Ge Yan, Fengju Song.
Abstract
Objectives: Lung adenocarcinoma (LUAD) accounts for a majority of cancer-related deaths worldwide annually. Identifying prognostic biomarkers and predicting prognosis for LUAD patients are necessary.
Keywords: Forward selection model; Lung adenocarcinoma; Prognosis prediction; RNA-Seq data; Random survival forest
Year: 2020 PMID: 31956375 PMCID: PMC6959071 DOI: 10.7150/jca.34585
Source DB: PubMed Journal: J Cancer ISSN: 1837-9664 Impact factor: 4.207
Figure 1. Identification of the prognostic gene signature. A) Flowchart of RNA-Seq analysis and signature generation. Briefly, survival-related seed genes of the 506 TCGA LUAD patients were first identified in the TCGA cohort I by the Cox model and by a machine learning model (random survival forest, RSF). Next, the forward selection model was used to select four sets of key genes for prognosis prediction. Survival risk score systems were then built from the expression data of the gene signatures in the TCGA cohort II, the GSE72094 cohort, and the GSE11969 cohort, and were used to divide patients into high- and low-risk groups. B) KEGG enrichment pathway analysis of the 5376 survival-related seed genes obtained by the Cox model. C) KEGG enrichment pathway analysis of the 1113 survival-related seed genes obtained by the RSF model. D) Venn diagram showing the common key genes obtained from RNA-Seq data and clinically-integrated RNA-Seq data using both the Cox model and the RSF model.
Figure 2. Validation of the prognosis-related key genes in the TCGA cohort II. In every panel, patients of the TCGA cohort II were divided into high- and low-risk groups based on the 50th percentile of risk score. A) Kaplan-Meier (KM) survival curve generated by the Cox model and RNA-Seq data. B) KM survival curve generated by the Cox model and clinically-integrated RNA-Seq data. C) KM survival curve generated by the RSF model and RNA-Seq data. D) KM survival curve generated by the RSF model and clinically-integrated RNA-Seq data.
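The 50th-percentile split described in the caption above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `split_by_median`, the patient IDs, the scores, and the rule that ties at the cutoff fall into the low-risk group are all assumptions made for the example.

```python
from statistics import median


def split_by_median(risk_scores):
    """Label each patient 'high' or 'low' risk using the median score as cutoff.

    risk_scores: dict mapping a patient ID to a numeric risk score.
    Assumption: scores equal to the cutoff are assigned to the low-risk group.
    """
    cutoff = median(risk_scores.values())
    return {
        patient: ("high" if score > cutoff else "low")
        for patient, score in risk_scores.items()
    }


# Hypothetical risk scores, for illustration only.
scores = {"P1": 0.2, "P2": 1.4, "P3": -0.5, "P4": 0.9}
groups = split_by_median(scores)
```

The two resulting groups are what a Kaplan-Meier plot and a log-rank or Cox comparison would then be computed on.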
Survival analysis based on the TCGA cohort II and GSE72094 cohort
| Model | TCGA cohort II: HR (95% CI) | TCGA cohort II: P value | TCGA cohort II: C-index | GSE72094 cohort: HR (95% CI) | GSE72094 cohort: P value | GSE72094 cohort: C-index |
|---|---|---|---|---|---|---|
| Model 1 | 2.07 (1.25-3.41) | 4.38e-03 | 0.610 | 2.22 (1.50-3.28) | 6.30e-05 | 0.615 |
| Model 2 | 2.72 (1.64-4.50) | 1.02e-04 | 0.645 | 2.77 (1.85-4.15) | 8.28e-07 | 0.630 |
| Model 3 | 2.53 (1.53-4.17) | 2.88e-04 | 0.643 | 2.94 (1.95-4.43) | 2.94e-07 | 0.623 |
| Model 4 | 3.80 (2.20-6.55) | 1.63e-06 | 0.656 | 4.12 (2.68-6.35) | 1.34e-10 | 0.672 |

HR = hazard ratio; CI = confidence interval; C-index = concordance index. Model 1: the Cox model and RNA-Seq data. Model 2: the Cox model and clinically-integrated RNA-Seq data. Model 3: the RSF model and RNA-Seq data. Model 4: the RSF model and clinically-integrated RNA-Seq data.
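To make the C-index column concrete, the sketch below computes a concordance index on a tiny invented data set. It assumes Harrell's estimator, which the text here does not confirm; the function name `concordance_index` and all values are hypothetical.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index (assumed estimator, for illustration).

    times: follow-up time per patient.
    events: 1 if death was observed, 0 if the patient was censored.
    risk_scores: higher score means higher predicted risk.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i has an observed event
            # strictly earlier than patient j's follow-up time.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0  # earlier death had the higher score
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical data: four patients, the third one censored.
times = [5, 10, 15, 20]
events = [1, 1, 0, 1]
risk = [0.9, 0.4, 0.6, 0.1]
c_index = concordance_index(times, events, risk)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the values of roughly 0.61 to 0.67 in the table should be read.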
Figure 3. Validation of the prognosis-related key genes in the GSE72094 cohort. In every panel, patients of the GSE72094 cohort were divided into high- and low-risk groups based on the 50th percentile of risk score. A) Kaplan-Meier (KM) survival curve generated by the Cox model and RNA-Seq data. B) KM survival curve generated by the Cox model and clinically-integrated RNA-Seq data. C) KM survival curve generated by the RSF model and RNA-Seq data. D) KM survival curve generated by the RSF model and clinically-integrated RNA-Seq data.
Figure 4. Analysis of the sixteen prognosis-related key genes. A) Heat map of the key genes obtained by the RSF model and clinically-integrated RNA-Seq data. The abscissa indicates genes, and the ordinate indicates the 393 samples from GSE72094. The rightmost column is the patient's risk score, sorted in ascending order from top to bottom. The low-risk group lies above the red dashed line and the high-risk group below it. B) KEGG enrichment pathway analysis of the key genes obtained by the RSF model and clinically-integrated RNA-Seq data. C) Protein-gene network. The yellow hexagons indicate the genes obtained from the RSF model and clinically-integrated RNA-Seq data, and the orange-red ovals indicate the associated proteins. D) KM survival curve of the cross-tumor model.
Comparison of the sixteen-gene prognostic signature with five published lung cancer prognostic signatures
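| Study | TCGA cohort II: HR (95% CI) | TCGA cohort II: P value | TCGA cohort II: C-index | GSE72094 cohort: HR (95% CI) | GSE72094 cohort: P value | GSE72094 cohort: C-index | GSE11969 cohort: HR (95% CI) | GSE11969 cohort: P value | GSE11969 cohort: C-index |
|---|---|---|---|---|---|---|---|---|---|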
| Present study | 3.80 (2.20-6.55) | 1.63e-06 | 0.656 | 4.12 (2.68-6.35) | 1.34e-10 | 0.672 | 3.87 (2.27-6.61) | 6.81e-07 | 0.670 |
| Shukla et al. | 2.24 (1.36-3.68) | 1.48e-03 | 0.613 | 3.01 (2.00-4.51) | 1.07e-07 | 0.639 | 3.02 (1.85-4.92) | 9.42e-06 | 0.641 |
| Boutros et al. | 1.70 (1.04-2.77) | 3.49e-02 | 0.561 | 3.17 (2.10-4.79) | 3.84e-08 | 0.656 | 3.09 (1.89-5.03) | 6.10e-06 | 0.646 |
| Chen et al. | 2.20 (1.34-3.61) | 1.93e-03 | 0.625 | 2.83 (1.88-4.27) | 7.07e-07 | 0.631 | 3.09 (1.89-5.03) | 6.27e-06 | 0.644 |
| Lau et al. | 1.80 (1.10-2.94) | 1.88e-02 | 0.588 | 2.81 (1.88-4.22) | 5.55e-07 | 0.625 | 2.62 (1.62-4.23) | 7.90e-05 | 0.628 |
| Bianchi et al. | 3.13 (1.84-5.34) | 2.78e-05 | 0.619 | 2.92 (1.94-4.39) | 2.53e-07 | 0.640 | 3.41 (2.08-5.59) | 1.23e-06 | 0.655 |