Abstract
Fundamental differences in the mechanisms underlying lung adenocarcinoma (AC) and squamous cell carcinoma (SCC) motivated us to postulate that specific genes relevant to prognosis might exist for each histological subtype. To test this hypothesis, we previously proposed a simple feature selection algorithm based on the Cox regression model and successfully identified some subtype-specific prognostic genes when applying it to real-world data. In this article, we continue our effort to identify subtype-specific prognostic genes for AC and SCC, and propose a novel embedded feature selection method that extends the Threshold Gradient Descent Regularization (TGDR) algorithm and minimizes the corresponding negative partial likelihood function. Using real-world and simulated datasets, we show that the two proposed methods have comparable performance, whereas the new proposal is superior in terms of model parsimony. Our analysis provides some evidence for the existence of such subtype-specific prognostic genes; more investigation is warranted.
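The combination named in the abstract, threshold gradient descent applied to a Cox negative partial likelihood, can be illustrated with a minimal sketch. This is generic TGDR on a single Cox model, not the subtype-specific extension, whose details the abstract does not give. The function names, the threshold `tau`, the step size, the iteration count, and the simulated data are all assumptions made for illustration, and the gradient assumes no tied event times.

```python
import numpy as np

def cox_neg_partial_loglik_grad(beta, X, time, event):
    """Gradient of the Cox negative log partial likelihood, scaled by 1/n.

    Assumes no tied event times (illustrative simplification).
    """
    order = np.argsort(-time)              # descending time: prefix = risk set
    Xs, ds = X[order], event[order]
    eta = Xs @ beta
    w = np.exp(eta - eta.max())            # shift for numerical stability
    cum_w = np.cumsum(w)                   # sum of weights over each risk set
    cum_wx = np.cumsum(w[:, None] * Xs, axis=0)
    resid = Xs - cum_wx / cum_w[:, None]   # x_i minus risk-set weighted mean
    return -(ds[:, None] * resid).sum(axis=0) / len(time)

def tgdr_cox(X, time, event, tau=0.9, step=0.01, n_iter=100):
    """Threshold gradient descent: update only coefficients whose gradient
    magnitude is at least tau times the largest gradient magnitude."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        g = cox_neg_partial_loglik_grad(beta, X, time, event)
        g_max = np.abs(g).max()
        if g_max == 0:
            break
        mask = np.abs(g) >= tau * g_max    # thresholding step
        beta -= step * g * mask            # descend on the selected coordinates
    return beta

# Hypothetical simulated data: only the first feature affects the hazard.
rng = np.random.default_rng(0)
n, p = 300, 10
X = rng.standard_normal((n, p))
time = rng.exponential(scale=np.exp(-1.5 * X[:, 0]))
event = np.ones(n)                         # no censoring in this toy example
beta_hat = tgdr_cox(X, time, event)
print(np.round(beta_hat, 3))
```

A threshold near 1 updates only the steepest coordinates and tends to give sparse models, while a threshold of 0 reduces to ordinary gradient descent; in practice the threshold and the number of iterations would be tuned, for example by cross-validation.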
Year: 2015 PMID: 26226392 PMCID: PMC4520527 DOI: 10.1371/journal.pone.0134630
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Study schema.
A graphical illustration of how Cox-TGDR-specific and Cox-filter were applied to select subtype-specific prognostic genes for AC and SCC lung cancer.
Performance of Cox-TGDR-specific and Cox-filter on NSCLC data.
| | Cox-TGDR: # genes (# for AC/SCC) | Cox-TGDR: Accuracy (%) | Cox-TGDR: AUC (%) AC/SCC | Cox-TGDR: p-value (log rank) | Cox-filter: # genes (# for AC/SCC) | Cox-filter: Accuracy (%) | Cox-filter: AUC (%) AC/SCC | Cox-filter: p-value (log rank) |
|---|---|---|---|---|---|---|---|---|
| A. RNA-seq data with patients in early stages (I and II) | | | | | | | | |
| Microarray itself | 33 (18/18) | 70.4 | 70.10/86.06 | 5.55×10⁻¹⁵ | 83 (81/2) | 76.3 | 85.23/78.43 | 1.11×10⁻¹⁵ |
| RNA-seq as test | | 60.8 | 53.88/67.26 | 0.154 | | 64 | 60.34/56.55 | 0.623 |
| RNA-seq itself | 13 (10/5) | 74.4 | 55.32/72.32 | 3.86×10⁻³ | 0 (0/0) | — | — | — |
| Microarray as test | | 50.3 | 51.63/48.32 | 0.119 | | — | — | — |
| B. RNA-seq data with patients in all stages | | | | | | | | |
| Microarray itself | 30 (15/22) | 65.7 | 67.21/86.78 | 3.25×10⁻⁸ | 46 (46/0) | 66.9 | 77.24/50 | 0.385 |
| RNA-seq as test | | 60.9 | 56.08/52.89 | 0.167 | | — | — | — |
| RNA-seq itself | 16 (16/7) | 72.8 | 56.85/72.82 | 2.90×10⁻³ | 0 (0/0) | — | — | — |
| Microarray as test | | 51.5 | 54.57/51.68 | 0.313 | | — | — | — |
1 A higher ICC cut-off (90%) was used.
Fig 2. Venn diagrams of the 33- and 13-gene signatures.
A) On the individual gene level. B) On the enriched pathway level. The 33-gene and 13-gene signatures were obtained with the Cox-TGDR-specific algorithm trained on the microarray data and the RNA-seq data, respectively. Here, ↓ and ↑ indicate a negative and a positive association with the hazard of death, respectively.
Performance of Cox-TGDR-specific and Cox-filter on simulated data.
| | Cox-filter: AC (%) | Cox-filter: SCC (%) | Cox-TGDR-specific: AC (%) | Cox-TGDR-specific: SCC (%) |
|---|---|---|---|---|
| A. Simulation 1: mutually exclusive markers for each subtype | | | | |
| Gene1 | 0 | 100 | 100 | 98 |
| Gene2 | 0 | 0 | 0 | 0 |
| Gene3 | 100 | 0 | 100 | 14 |
| Gene4 | 100 | 0 | 100 | 26 |
| No. of selected genes | 9.76 | 7.72 | 3.4 | 3.82 |
| B. Simulation 2: no subtype-specific prognostic genes | | | | |
| Gene1 | 100 | 100 | 100 | 100 |
| Gene2 | 100 | 100 | 100 | 100 |
| Gene3 | 100 | 100 | 100 | 98 |
| Gene4 | 100 | 100 | 0 | 0 |
| No. of selected genes | 24.48 | 31.33 | 3.54 | 4.14 |
1 Each gene-level entry is the percentage of the 50 replicates in which that gene had a non-zero coefficient.
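The summary statistics in the simulation table can be computed as in the short sketch below. It assumes the fitted coefficients are stored as a replicates-by-genes array and that "No. of selected genes" is an average over replicates, which the non-integer values suggest but the source does not state. The function name and the toy array are hypothetical.

```python
import numpy as np

def selection_summary(coefs):
    """Summarise feature selection across simulation replicates.

    coefs: array of shape (n_replicates, n_genes) of fitted coefficients.
    Returns the per-gene percentage of replicates with a non-zero
    coefficient and the mean number of selected genes per replicate.
    """
    coefs = np.asarray(coefs, dtype=float)
    selected = coefs != 0                        # True where a gene was selected
    pct_selected = 100.0 * selected.mean(axis=0)
    avg_n_selected = selected.sum(axis=1).mean()
    return pct_selected, avg_n_selected

# Toy input: 4 replicates, 3 genes (values are made up for illustration).
toy = np.array([
    [0.5, 0.0, 0.0],
    [0.3, 0.0, -0.2],
    [0.4, 0.0, 0.0],
    [0.6, 0.0, 0.0],
])
pct_selected, avg_n_selected = selection_summary(toy)
```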