Balázs Győrffy, Pawel Surowiak, Jan Budczies, András Lánczky.
Abstract
In the last decade, optimized treatment for non-small cell lung cancer (NSCLC) has led to improved prognosis, but overall survival is still very short. To further understand the molecular basis of the disease, we need to identify biomarkers related to survival. Here we present the development of an online tool for the real-time meta-analysis of published lung cancer microarray datasets to identify biomarkers related to survival. We searched the caBIG, GEO and TCGA repositories to identify samples with published gene expression data and survival information. Univariate and multivariate Cox regression analyses, Kaplan-Meier survival plots, hazard ratios and logrank P values are calculated and plotted in R. The complete analysis tool can be accessed online at www.kmplot.com/lung. Altogether, 1,715 samples from ten independent datasets were integrated into the system. As a demonstration, we used the tool to validate 21 previously published survival-associated biomarkers. Of these, survival was best predicted by CDK1 (also known as CDC2; p<1E-16), CD24 (p<1E-16) and CADM1 (p = 7E-12) in adenocarcinomas and by CCNE1 (p = 2.3E-09) and VEGF (p = 3.3E-10) in all NSCLC patients. Additional genes significantly correlated with survival include RAD51, CDKN2A, OPN, EZH2, ANXA3, ADAM28 and ERCC1. In summary, we established an integrated database and an online tool capable of uni- and multivariate analysis for the in silico validation of new biomarker candidates in non-small cell lung cancer.
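The abstract states that Kaplan-Meier plots, hazard ratios and logrank P values are computed in R; the tool's own code is not part of this record. As an illustration only, here is a minimal, standard-library Python sketch of the two textbook procedures, not the authors' implementation. The function names `kaplan_meier` and `logrank` and the toy data are hypothetical, and the hazard ratio returned is the common (O/E) ratio approximation rather than a Cox estimate.

```python
import math
from typing import List, Tuple


def kaplan_meier(times: List[float], events: List[int]) -> List[Tuple[float, float]]:
    """Kaplan-Meier estimator: returns (time, survival) at each event time.

    events[i] is 1 if a death was observed at times[i] and 0 if censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Collect every observation tied at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed  # deaths and censorings both leave the risk set
    return curve


def logrank(times_a, events_a, times_b, events_b) -> Tuple[float, float, float]:
    """Two-group logrank test.

    Returns (chi-square statistic, p value with 1 degree of freedom,
    approximate hazard ratio of group b versus group a as (O_b/E_b)/(O_a/E_a)).
    """
    pooled = sorted([(t, e, 0) for t, e in zip(times_a, events_a)] +
                    [(t, e, 1) for t, e in zip(times_b, events_b)])
    n_a, n_b = len(times_a), len(times_b)
    obs_a = obs_b = exp_a = exp_b = var = 0.0
    i = 0
    while i < len(pooled):
        t = pooled[i][0]
        d_a = d_b = r_a = r_b = 0
        while i < len(pooled) and pooled[i][0] == t:
            _, e, group = pooled[i]
            if group == 0:
                d_a += e
                r_a += 1
            else:
                d_b += e
                r_b += 1
            i += 1
        n, d = n_a + n_b, d_a + d_b
        if d:
            e_a = d * n_a / n  # deaths expected in group a under H0
            exp_a += e_a
            exp_b += d - e_a
            obs_a += d_a
            obs_b += d_b
            if n > 1:  # hypergeometric variance of d_a
                var += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
        n_a -= r_a
        n_b -= r_b
    chi2 = (obs_a - exp_a) ** 2 / var
    p = math.erfc(math.sqrt(chi2 / 2.0))  # upper tail of chi-square, 1 df
    hr = (obs_b / exp_b) / (obs_a / exp_a)
    return chi2, p, hr


# Toy data: group a = "low expression", group b = "high expression".
chi2, p, hr = logrank([5, 8, 12, 20, 25], [1, 0, 1, 1, 0],
                      [2, 3, 4, 6, 9], [1, 1, 1, 1, 1])
print(f"logrank chi2 = {chi2:.2f}, p = {p:.3g}, HR = {hr:.2f}")
```

In the online tool the two groups come from splitting patients at an expression cutoff; here they are simply given as two lists.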
Year: 2013 PMID: 24367507 PMCID: PMC3867325 DOI: 10.1371/journal.pone.0082241
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Clinical characteristics of the datasets included in the analysis.
| Dataset | Platform | Reference | Sample size | Median follow-up (OS) | No. of deaths | Median follow-up (FP) | No. of progressions | Age | Sex (% male) | % of never smokers | Histology (% A/S/L) | Stage (% 1/2/3/4) | % surgical margins negative | Grade (% poor/moderate/well) | % chemotherapy | % radiotherapy |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| GSE4573 | GPL96 | | 130 | 34.5 | 67 | - | - | 67±9.8 | 63% | 3.1% | 0/100/0 | 56/26/18/0 | - | 17/71/12 | - | - |
| GSE14814 | GPL96 | | 90 | 5.4 | 38 | - | - | 62±8.5 | 74% | - | 31/58/11 | 50/50/0/0 | - | - | 56% | - |
| GSE8894 | GPL570 | | 138 | - | - | 36 | 69 | 62±10 | 75% | - | 46/54/0 | - | - | - | - | - |
| GSE19188 | GPL570 | | 156 | 30.4 | 50 | - | - | - | 75% | - | 49/30/21 | - | - | - | - | - |
| GSE3141 | GPL570 | | 109 | 31.1 | 58 | - | - | - | - | - | 52/48/0 | - | - | - | - | - |
| GSE31210 | GPL570 | | 246 | 58.2 | 35 | 54.4 | 64 | 60±8.1 | 47% | 50% | 100/0/0 | 74/26/0/0 | 90% | - | - | - |
| caArray | GPL96 | | 462 | 45.8 | 257 | 28 | 219 | 64±10 | 51% | 14% | - | - | 98% | 39/47/14 | 27% | 21% |
| TCGA | GPL3921 | | 133 | 18.3 | 30 | - | - | 66±9.3 | 67% | 7.5% | 0/100/0 | - | 95% | - | - | - |
| GSE29013 | GPL570 | | 55 | 32.9 | 18 | 31.4 | 28 | 64±8.7 | 69% | 3.6% | 55/45/0 | 44/25/31/0 | - | - | 62% | - |
| GSE37745 | GPL570 | | 196 | 42.5 | 145 | - | - | 64±9.2 | 55% | - | 54/37/12 | 66/18/14/2 | - | - | - | - |
| Entire database | | | 1715 | 40 | 698/1443 | 37 | 380/821 | 64±10 | 58% (n = 886) | 17.8% (n = 187) | 50/45/5 | 63/27/10/1 | 95% (n = 705) | 34/53/13 | 29% (n = 178) | 21% (n = 73) |
OS: overall survival, FP: first progression, A/S/L: adenocarcinoma/squamous cell carcinoma/large cell carcinoma.
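The pooled row can be cross-checked against the per-dataset rows. The short Python sketch below (counts transcribed from the table; `None` marks a value reported as "-") sums sample sizes, deaths and progressions. The denominators in the pooled row (1443 and 821) presumably count only patients with overall-survival or progression data; they cannot be rebuilt from the per-dataset sample sizes, so they are only used, not checked.

```python
# Counts transcribed from the clinical-characteristics table:
# dataset -> (sample size, no. of deaths, no. of progressions); None = not reported ("-").
DATASETS = {
    "GSE4573": (130, 67, None),
    "GSE14814": (90, 38, None),
    "GSE8894": (138, None, 69),
    "GSE19188": (156, 50, None),
    "GSE3141": (109, 58, None),
    "GSE31210": (246, 35, 64),
    "caArray": (462, 257, 219),
    "TCGA": (133, 30, None),
    "GSE29013": (55, 18, 28),
    "GSE37745": (196, 145, None),
}


def column_total(index: int) -> int:
    """Sum one column, skipping datasets where the value was not reported."""
    return sum(row[index] for row in DATASETS.values() if row[index] is not None)


total_samples = column_total(0)
total_deaths = column_total(1)
total_progressions = column_total(2)

print(f"samples={total_samples} deaths={total_deaths} progressions={total_progressions}")
print(f"death rate among patients with OS data: {total_deaths / 1443:.1%}")
```

With these values the sums agree with the pooled row: 1,715 samples, 698 deaths and 380 progressions.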
Figure 1. Survival characteristics of the patients included in the database, by histology (adenocarcinoma [adeno], squamous cell carcinoma [SCC] and large cell carcinoma [large]), gender, stage (overall survival only) and smoking history.
Performance of previously published biomarker candidates associated with survival in non-small-cell lung cancer.
| Gene symbol | Ref. | n (literature) | Method used | Cohort | Probe ID* | n ✠ | Cutoff ✠ | HR ✠ | p value: univariate | p value: multivariate |
|---|---|---|---|---|---|---|---|---|---|---|
| VEGF | | 5386 | IHC, RT-PCR | NSCLC | 211527_x_at | 1404 | 244 | 1.9 | 3.3e-10 | <1e-16 |
| MMP9 | | 2029 | IHC, RT-PCR | NSCLC | 203936_s_at | 1404 | 1865 | 1.21 | 0.012 | - |
| | | | | ADE | | 486 | 734 | 1.51 | 0.02 | - |
| CCNE1 | | 2606 | IHC | NSCLC | 213523_at | 1404 | 276 | 1.59 | 2.3e-09 | 0.0096 |
| | | | | ADE | | 486 | 167 | 2.44 | 4.8e-08# | 0.0013 |
| BIRC5 | | 2703 | IHC, FISH, RT-PCR | NSCLC stage 2 | 202095_s_at | 185 | 295 | 1.56 | 0.077 | - |
| CDC2 | | 2731 | IHC, RT-PCR | NSCLC | 210559_s_at | 1404 | 266 | 2.56 | <1e-16# | 0.0019 |
| CADM1 | | 617 | Array + IHC | ADE | 209031_at | 486 | 1793 | 0.38 | 7e-12# | 0.0001 |
| CEA | | 97 | IHC | NSCLC | 206199_at | 1404 | 110 | 1.21 | 0.02 | - |
| RAD51 | | 383 | IHC | NSCLC | 205023_at | 1404 | 44 | 1.4 | 2.4e-05 | 0.24 |
| | | | | ADE | | 486 | 34 | 1.36 | 0.046 | - |
| | | | | SCC | | 421 | 45 | 1.2 | 0.18 | - |
| CDKN2A | | 106 | IHC | NSCLC | 209644_x_at | 1404 | 1382 | 1.65 | 1.8e-09 | 0.12 |
| | | | | ADE | | 486 | 486 | 2.23 | 6.8e-08 | 0.012 |
| OPN | | 25 | IHC | All patients | 209875_s_at | 1404 | 4151 | 1.5 | 2.8e-06 | 0.0001 |
| | | 82 | RT-PCR | NSCLC surgical margin neg. | | 704 | 4101 | 1.93 | 1.5e-06 | 0.0032 |
| EZH2 | | 106 | IHC | NSCLC stage 1 | 203358_s_at | 440 | 600 | 2.07 | 2.6e-06 | 0.32 |
| IFNAR2 | | 113 | IHC | NSCLC PFS | 204785_x_at | 764 | 799 | 1.41 | 0.0012 | 0.05 |
| ANXA3 | | 125 | MS, 2D-DIGE | ADE | 209369_at | 486 | 811 | 0.49 | 9.2e-07 | 0.0093 |
| S100A4 | | 400 | IHC | SCC | 203186_s_at | 421 | 2844 | 1.24 | 0.12 | - |
| ADAM28 | | 90 | ELISA | NSCLC | 205997_at | 1404 | 143 | 0.69 | 8.3e-06 | 0.003 |
| XIAP | | 144 | IHC | NSCLC | 206536_s_at | 1404 | 85 | 0.86 | 0.071 | - |
| XAF1 | | 51 | RT-PCR | SCC | 206133_at | 421 | 253 | 0.72 | 0.025 | - |
| CD24 | | 267 | IHC | ADE | 209772_s_at | 486 | 618 | 2.45 | 3.6e-10 | <1e-16 |
| ERCC1 | | 51 | RT-PCR | NSCLC | 203719_at | 1404 | 685 | 1.65 | 1.4e-10 | <1e-16 |
| HER2 | | 83 | RT-PCR | NSCLC | 216836_s_at | 1404 | 898 | 1.25 | 0.0057 | 0.12 |
| CD82 | | 151 | RT-PCR | NSCLC | 203904_x_at | 1404 | 506 | 1.27 | 0.0029 | 0.09 |
| 139-gene | | 253 | Array | NSCLC stage I | see | 440 | 3368.7 | 3.59 | 8.9e-16# | <1e-16 |
| 59-gene | | 100 | Array | NSCLC | see | 1404 | 4038.6 | 0.66 | 9.9e-08 | 0.035 |
| 15-gene | | 133 | Array + RT-PCR | NSCLC + chemo | see | 173 | 573.7 | 0.6 | 0.042 | - |
| 50-gene | | 129 | Array + RT-PCR + IHC | SCC | see | 421 | 754.3 | 0.65 | 0.0016 | 0.0023 |
| 17-gene | | 91 | Array | NSCLC | see | 1404 | 618.3 | 1.27 | 0.0027 | 0.48 |
| 6-gene | | 138 | Array + RT-PCR | NSCLC PFS | see | 764 | 543.5 | 0.77 | 0.017 | - |
| 38-gene | | 462 | Array | ADE | see | 468 | 437.7 | 0.64 | 0.0031 | 0.092 |
ADE: adenocarcinoma; SCC: squamous cell carcinoma; PFS: progression-free survival; HR: hazard ratio; IHC: immunohistochemistry; RT-PCR: reverse transcription polymerase chain reaction; FISH: fluorescence in situ hybridization; ELISA: enzyme-linked immunosorbent assay; 2D-DIGE: two-dimensional difference gel electrophoresis; MS: mass spectrometry; n: number of tumor samples included in the study; *highest-quality probe (when several high-quality probes were available, the best-performing one); #see Figure 2 for the survival plots; ✠from the univariate analysis. The multivariate analysis was adjusted for the two clinical parameters with the most available data (histology and gender for NSCLC; gender and stage for adenocarcinoma and squamous cell carcinoma) and was performed only for biomarker candidates significant at p<0.01 in the univariate analysis.
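The table reports, for each candidate, a cutoff, a hazard ratio and univariate and multivariate p values, and the abstract says the Cox regression is run in R. The fitting code is not given, so the following is only a minimal, standard-library Python sketch of a univariate Cox proportional hazards fit for one covariate, such as a hypothetical 0/1 indicator of expression above the cutoff. The function `cox_univariate` and the toy data are illustrative assumptions. A multivariate model (for example, adding histology and gender) replaces the scalar score and information with a vector and a matrix, which is what standard routines such as `coxph` in R's survival package handle.

```python
import math
from typing import List, Tuple


def cox_univariate(times: List[float], events: List[int], x: List[float],
                   max_iter: int = 50, tol: float = 1e-10) -> Tuple[float, float]:
    """Fit h(t | x) = h0(t) * exp(beta * x) by Newton-Raphson.

    Ties are handled with the Breslow approximation. Returns the hazard ratio
    exp(beta) for a one-unit increase in x and the two-sided Wald p value.
    O(n^2) per iteration: fine for an illustration, not for large cohorts.
    """
    event_times = sorted({t for t, e in zip(times, events) if e})
    beta = 0.0
    info = 0.0
    for _ in range(max_iter):
        score = 0.0  # first derivative of the partial log-likelihood
        info = 0.0   # observed information (minus the second derivative)
        for t in event_times:
            # Risk set: everyone with follow-up time >= t.
            risk = [xj for tj, xj in zip(times, x) if tj >= t]
            w = [math.exp(beta * xj) for xj in risk]
            s0 = sum(w)
            s1 = sum(wi * xj for wi, xj in zip(w, risk))
            s2 = sum(wi * xj * xj for wi, xj in zip(w, risk))
            # Covariate values of the patients who died at time t.
            dead = [xj for tj, ej, xj in zip(times, events, x) if tj == t and ej]
            d = len(dead)
            score += sum(dead) - d * s1 / s0
            info += d * (s2 / s0 - (s1 / s0) ** 2)
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    # The information from the last iteration approximates that at the MLE.
    se = 1.0 / math.sqrt(info)
    p_value = math.erfc(abs(beta / se) / math.sqrt(2.0))
    return math.exp(beta), p_value


# Toy data: high = 1 marks expression above a hypothetical cutoff.
times = [5, 8, 12, 20, 25, 2, 3, 4, 6, 9]
events = [1, 0, 1, 1, 0, 1, 1, 1, 1, 1]
high = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
hr, p = cox_univariate(times, events, high)
print(f"HR = {hr:.2f}, Wald p = {p:.3g}")
```

An HR above 1 means that high expression is associated with shorter survival, and an HR below 1 (as for CADM1 or ANXA3 in the table) means it is associated with longer survival.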
Figure 2. Validation of 29 previously published NSCLC biomarkers.
Meta-analysis of these genes and signatures in the respective sample cohorts identified CCNE1, CDC2 and CADM1 as the best-performing individual genes (A–C) and the signature of Yamauchi et al. as the best-performing signature (D). Funnel plots of the hazard ratios (with confidence intervals) versus sample number for CDC2 and VEGF show that the estimates become more reliable as the database grows (E–F).
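The funnel plots rest on a simple relation: the standard error of a log hazard ratio shrinks roughly with the square root of the number of events, so confidence intervals narrow as more samples (and deaths) are pooled. The sketch below assumes the common large-sample approximation SE(log HR) ≈ 1/sqrt(D·p·(1 - p)), with D the number of deaths and p the fraction of patients in the high-expression group. This is an illustration, not necessarily how the intervals in Figure 2 were computed. The HR of 1.9 is VEGF's value from the table; the death counts and the helper `hr_confidence_interval` are hypothetical.

```python
import math
from typing import Tuple


def hr_confidence_interval(hr: float, deaths: int, frac_high: float = 0.5,
                           z: float = 1.96) -> Tuple[float, float]:
    """Approximate two-sided 95% CI for a hazard ratio.

    Assumes SE(log HR) ~= 1 / sqrt(D * p * (1 - p)), where D is the total number
    of deaths and p the fraction of patients in the 'high' group (a large-sample
    approximation under proportional hazards).
    """
    se = 1.0 / math.sqrt(deaths * frac_high * (1.0 - frac_high))
    log_hr = math.log(hr)
    return math.exp(log_hr - z * se), math.exp(log_hr + z * se)


# Same HR, increasing (illustrative) numbers of deaths: the interval narrows.
for d in (50, 200, 698):
    lo, hi = hr_confidence_interval(1.9, d)
    print(f"deaths={d:4d}  95% CI {lo:.2f}-{hi:.2f}")
```

Plotting such intervals against cohort size produces the funnel shape: small cohorts scatter widely around the pooled estimate, while large ones converge on it.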