Rachel M Ostroff, William L Bigbee, Wilbur Franklin, Larry Gold, Mike Mehan, York E Miller, Harvey I Pass, William N Rom, Jill M Siegfried, Alex Stewart, Jeffrey J Walker, Joel L Weissfeld, Stephen Williams, Dom Zichi, Edward N Brody.
Abstract
BACKGROUND: Lung cancer is the leading cause of cancer deaths worldwide. Early-stage lung cancer may be cured with surgery, but most cases are diagnosed too late for curative surgery, so new diagnostics are needed to detect the disease early. Here we present a comprehensive clinical biomarker study of lung cancer and the first large-scale clinical application of a new aptamer-based proteomic technology to discover blood protein biomarkers in disease.
Year: 2010 PMID: 21170350 PMCID: PMC2999620 DOI: 10.1371/journal.pone.0015003
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Study Flow for Algorithm Training and Verification.
Sample cohort by independent study site.
| Site | Cases (n = 291) | Nodule Controls (n = 565) | Smoker Controls (n = 470) | Total/Site |
| BS | 43 | 0 | 63 | 106 |
| RPCI | 72 | 66 | 110 | 248 |
| NYU | 88 | 238 | 172 | 498 |
| PITT | 88 | 261 | 125 | 474 |
Clinical characteristics of NSCLC case and control sets for training and verification.
| | | Training Set (n = 985) | | | Verification Set (n = 341) | | |
| | | Cases | Controls | p-value | Cases | Controls | p-value |
| Individuals, no. (%) | | 213 (21.6) | 772 (78.4) | | 78 (22.9) | 263 (77.1) | |
| Sex (%) | Male | 51.2 | 47.4 | | 43.6 | 47.9 | |
| | Female | 48.8 | 52.6 | 0.3305 | 56.4 | 52.1 | 0.5015 |
| Age, mean (SD) | | 67.6 (9.8) | 59.0 (10.2) | <0.0001 | 68.3 (10.2) | 58.8 (9.6) | <0.0001 |
| Control Nodule Status, no. (%) | Benign nodule | n/a | 420 (54.4) | | n/a | 145 (55.1) | |
| | No nodule | n/a | 222 (28.8) | | n/a | 75 (28.5) | |
| | Unknown | n/a | 130 (16.8) | | n/a | 43 (16.4) | |
| Smoking Status, no. | Current | 54 | 421 | <0.0001 | 25 | 150 | <0.0001 |
| | Ex | 85 | 310 | <0.0001 | 31 | 108 | <0.0001 |
| | Never | 11 | 6 | <0.0001 | 7 | 3 | <0.0001 |
| | Unknown | 63 | 35 | <0.0001 | 15 | 2 | <0.0001 |
| Smoking (PKY), mean (SD) | | 47.1 (33.7) | 42.3 (24.2) | 0.0258 | 40.9 (30.8) | 42.3 (24.6) | 0.7003 |
For continuous data, differences were tested using t-tests. For categorical data, differences were tested using the Pearson chi-squared test for independence.
Pack-years (PKY): product of the self-reported number of packs of cigarettes smoked per day and the number of years of smoking.
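As an illustration of the Pearson chi-squared test named in the table notes, the following minimal sketch tests sex against case/control status in the training set. The counts are an assumption: they are reconstructed by rounding the tabulated percentages (51.2% of 213 cases and 47.4% of 772 controls male), and the function name `chi2_2x2` is hypothetical. No continuity correction is applied, which may or may not match the original analysis.

```python
import math

def chi2_2x2(a, b, c, d):
    """Pearson chi-squared statistic and p-value (1 d.f.) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    observed = ((a, b), (c, d))
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom the chi-squared survival function is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Assumed counts (reconstructed from percentages, not taken from the source):
# cases: 109 male, 104 female; controls: 366 male, 406 female.
stat, p = chi2_2x2(109, 104, 366, 406)
print(f"chi2 = {stat:.3f}, p = {p:.3f}")
```

Under these assumed counts the p-value comes out at roughly 0.33, in line with the tabulated value for sex in the training set.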
Clinical characteristics of NSCLC cases in the training and verification sets.
| | | Training Cases, n = 213, no. (%) | Verification Cases, n = 78, no. (%) |
| Stage NSCLC | I | 99 (46.5) | 38 (49) |
| | II | 32 (15.0) | 11 (14) |
| | III | 82 (38.5) | 27 (35) |
| | Not reported | - | 2 (2) |
| Histology | Adenocarcinoma | 120 (56.3) | 49 (62.8) |
| | Squamous | 71 (33.3) | 18 (23.1) |
| | Large cell | 2 (1.0) | 2 (2.6) |
| | NSCLC NOS | 20 (9.4) | 9 (11.5) |
Staging was clinical for 17 Stage I, 5 Stage II and 29 Stage III cases. NOS: not otherwise specified.
Potential NSCLC biomarkers§.
| # | Protein Name | UniProt ID | KS | q-value | NB Freq |
| 1 | BCA-1 | O43927 | 0.34 | 2.51E-17 | 1 |
| 2 | BMP-1 | P13497 | 0.35 | 3.49E-18 | 10 |
| 3 | C1s | P09871 | 0.29 | 3.92E-13 | 1 |
| 4 | C9 | P02748 | 0.41 | 1.33E-24 | 6 |
| 5 | Cadherin-1 | P12830 | 0.32 | 1.47E-15 | 206 |
| 6 | Calpain I | P07384 P04632 | 0.4 | 8.46E-24 | 72 |
| 7 | Catalase | P04040 | 0.32 | 1.21E-15 | 2 |
| 8 | CD30 Ligand | P32971 | 0.28 | 1.22E-12 | 51 |
| 9 | CDK5/p35 | Q00535 Q15078 | 0.27 | 1.34E-11 | 31 |
| 10 | CK-MB | P12277 P06732 | 0.33 | 2.51E-16 | 19 |
| 11 | Contactin-5 | O94779 | 0.29 | 1.67E-13 | 3 |
| 12 | Endostatin | P39060 | 0.28 | 8.48E-13 | 33 |
| 13 | ERBB1 | P00533 | 0.46 | 6.32E-31 | 136 |
| 14 | FGF-17 | O60258 | 0.31 | 6.12E-15 | 6 |
| 15 | FYN | P06241 | 0.13 | 5.19E-04 | 14 |
| 16 | HSP 90α | P07900 | 0.51 | 7.86E-37 | 85 |
| 17 | HSP 90β | P08238 | 0.39 | 1.50E-22 | 7 |
| 18 | IGFBP-2 | P18065 | 0.36 | 1.87E-19 | 54 |
| 19 | IL-15 Rα | Q13261 | 0.29 | 2.62E-13 | 4 |
| 20 | IL-17B | Q9UHF5 | 0.28 | 1.07E-12 | 1 |
| 21 | Importin β1 | Q14974 | 0.4 | 1.31E-23 | 30 |
| 22 | Kallikrein 7 | P49862 | 0.31 | 1.79E-14 | 43 |
| 23 | LDH-H 1 | P07195 | 0.3 | 8.64E-14 | 3 |
| 24 | Legumain | Q99538 | 0.28 | 2.52E-12 | 1 |
| 25 | LRIG3 | Q6UXM1 | 0.34 | 1.13E-17 | 25 |
| 26 | Macrophage mannose receptor | P22897 | 0.37 | 6.21E-21 | 21 |
| 27 | MAPK13 | O15264 | 0.34 | 4.66E-18 | 1 |
| 28 | MEK1 | Q02750 | 0.29 | 2.62E-13 | 5 |
| 29 | MetAP2 | P50579 | 0.44 | 3.40E-28 | 7 |
| 30 | Midkine | P21741 | 0.11 | 1.67E-03 | 7 |
| 31 | MIP-4 | P55774 | 0.29 | 2.69E-13 | 43 |
| 32 | MIP-5 | Q16663 | 0.31 | 1.53E-14 | 27 |
| 33 | MMP-7 | P09237 | 0.38 | 1.67E-21 | 36 |
| 34 | NACα | Q13765 | 0.33 | 7.57E-17 | 5 |
| 35 | NAGK | Q9UJ70 | 0.37 | 1.25E-20 | 5 |
| 36 | Pleiotrophin | P21246 | 0.29 | 5.02E-13 | 107 |
| 37 | PRKCI | P41743 | 0.41 | 3.81E-25 | 97 |
| 38 | Renin | P00797 | 0.25 | 1.69E-10 | 2 |
| 39 | RGM-C | Q6ZVN8 | 0.27 | 5.43E-12 | 84 |
| 40 | SCF sR | P10721 | 0.35 | 6.97E-19 | 107 |
| 41 | sL-Selectin | P14151 | 0.29 | 7.88E-13 | 57 |
| 42 | Ubiquitin+1 | P62988 | 0.33 | 4.09E-17 | 1 |
| 43 | VEGF | P15692 | 0.29 | 5.47E-13 | 1 |
| 44 | YES | P07947 | 0.28 | 1.73E-12 | 47 |
Measures of the relative importance of potential biomarkers: KS distance (KS), KS FDR-corrected q-value (q-value), and frequency for naïve Bayes (NB Freq).
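The KS column can be read as a two-sample Kolmogorov-Smirnov distance: the largest vertical gap between the empirical cumulative distributions of a protein's signal in cases and in controls. The sketch below is a minimal pure-Python version of that statistic. The function name `ks_distance` and the sample values are hypothetical, and the source does not state how the statistic or the FDR correction was actually computed.

```python
def ks_distance(xs, ys):
    """Two-sample KS distance: maximum gap between the two empirical CDFs."""
    xs, ys = sorted(xs), sorted(ys)
    i = j = 0
    d = 0.0
    while i < len(xs) and j < len(ys):
        v = min(xs[i], ys[j])
        # Advance both samples past every value <= v so ties are handled together.
        while i < len(xs) and xs[i] <= v:
            i += 1
        while j < len(ys) and ys[j] <= v:
            j += 1
        d = max(d, abs(i / len(xs) - j / len(ys)))
    return d

# Made-up signal values for one protein (illustrative only).
cases = [2.9, 3.4, 3.8, 4.1, 4.6]
controls = [2.1, 2.5, 2.8, 3.0, 3.3]
print(ks_distance(cases, controls))
```

A distance of 0 means the two distributions coincide and 1 means they do not overlap at all, so the tabulated values between 0.11 and 0.51 indicate partial separation for each individual protein.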
Criteria for algorithm performance on training and cross-validation.
| Criteria | Minimum Performance | |
| Biomarker frequency in greedy algorithm classifiers | 10 | 250 |
| Sensitivity (Stage I-III) + Specificity | 1.7 | 94 |
| Stage I Sensitivity | 0.85 | 80 |
| Cross-validation Sensitivity (Stage I-III) + Specificity | 1.7 | 50 |
| Cross-validation Stage I Sensitivity | 0.85 | 50 |
| Severe COPD Specificity | 0.65 | 45 |
Figure 2. ROC curve for 12-biomarker naïve Bayes classifier.
Performance of Bayesian Classifier to distinguish NSCLC cases from controls.
| | | Sensitivity (%), (95% CI) | Specificity (%), (95% CI) |
| NSCLC Cases | Training Stage I-III | 91 (87-95) | |
| | Training Stage I | 90 (84-96) | |
| | 10-fold Cross Validation | 91 (87-95) | |
| | Verification Stage I-III | 89 (81-96) | |
| | Verification Stage I | 87 (78-96) | |
| Controls | Training All Controls | | 84 (81-86) |
| | Training Benign Nodules | | 82 (78-85) |
| | 10-fold Cross Validation | | 83 (80-86) |
| | Verification All Controls | | 83 (79-88) |
| | Verification Benign Nodules | | 85 (79-91) |
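To show how an entry such as "91 (87-95)" can arise, here is a minimal sketch of a proportion with a 95% confidence interval. It assumes a normal-approximation (Wald) interval, which the source does not confirm, and the count of 194 correctly classified training cases is hypothetical, chosen only so that the proportion rounds to 91% of 213.

```python
import math

def proportion_ci(successes, total, z=1.96):
    """Proportion with a normal-approximation (Wald) confidence interval."""
    p = successes / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)

# Hypothetical count: 194 of the 213 training cases classified as NSCLC.
sens, lo, hi = proportion_ci(194, 213)
print(f"{100 * sens:.0f} ({100 * lo:.0f}-{100 * hi:.0f})")
```

With these inputs the sketch prints `91 (87-95)`. Other interval methods (for example Wilson or exact binomial) would give slightly different bounds, especially for the smaller verification subsets.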
Figure 3. ROC curve performance of the 12-biomarker naïve Bayes NSCLC classifier by study site.
Twelve biomarker classifier proteins§.
| Biomarker | UniProt ID | Direction* | Description |
| Cadherin-1 | P12830 | down | cell adhesion, transcription regulation |
| CD30 Ligand | P32971 | up | cytokine |
| Endostatin | P39060 | up | inhibition of angiogenesis |
| HSP 90α | P07900 | up | chaperone |
| LRIG3 | Q6UXM1 | down | protein binding, tumor suppressor |
| MIP-4 | P55774 | up | monokine |
| Pleiotrophin | P21246 | up | growth factor |
| PRKCI | P41743 | up | serine/threonine protein kinase, oncogene |
| RGM-C | Q6ZVN8 | down | iron metabolism |
| SCF sR | P10721 | down | decoy receptor |
| sL-Selectin | P14151 | down | cell adhesion |
| YES | P07947 | up | tyrosine kinase, oncogene |
*Up- or down-regulation in NSCLC cases relative to controls.
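A naïve Bayes classifier of this kind scores a sample by summing per-biomarker log density ratios, treating the markers as independent. The sketch below illustrates the idea with two markers instead of twelve. It assumes Gaussian class-conditional densities and equal priors, neither of which is stated in the source, and every number and function name in it is made up for illustration.

```python
import math
from statistics import mean, stdev

def fit(samples):
    """Per-feature (mean, sd) for one class; samples is a list of feature vectors."""
    return [(mean(col), stdev(col)) for col in zip(*samples)]

def log_pdf(x, mu, sd):
    """Log density of a normal distribution with mean mu and standard deviation sd."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mu) ** 2 / (2 * sd * sd)

def log_likelihood_ratio(x, case_params, control_params):
    """Sum of per-feature log density ratios (case vs control), assuming independence."""
    return sum(
        log_pdf(v, *cp) - log_pdf(v, *kp)
        for v, cp, kp in zip(x, case_params, control_params)
    )

# Toy signals for two markers: the first higher in cases, the second lower.
case_train = [[3.1, 1.0], [3.4, 1.2], [3.0, 0.9], [3.6, 1.1]]
control_train = [[2.4, 1.8], [2.6, 1.6], [2.2, 1.9], [2.5, 1.7]]
case_params, control_params = fit(case_train), fit(control_train)

score = log_likelihood_ratio([3.3, 1.0], case_params, control_params)
print("case" if score > 0 else "control")
```

Moving the decision threshold away from 0 trades sensitivity against specificity, which is what an ROC curve traces out.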
Performance of classifier in demographic subsets.
| | | Cases | Controls | Sensitivity (%) (95%CI) | Specificity (%) (95%CI) | Accuracy (%) (95%CI) | AUC |
| Age | ≤61 | 57 | 467 | 84 (75-94) | 89 (86-92) | 88 (85-91) | 0.91 |
| | >61 | 156 | 304 | 93 (89-97) | 76 (71-80) | 82 (78-85) | 0.89 |
| Smoking Status | Current | 54 | 421 | 93 (86-100) | 86 (83-90) | 87 (84-90) | 0.91 |
| | Ex | 85 | 310 | 91 (84-97) | 85 (80-89) | 86 (82-89) | 0.93 |
| Pack Years | ≤40 | 84 | 381 | 91 (84-97) | 86 (83-90) | 87 (84-90) | 0.93 |
| | >40 | 76 | 347 | 97 (94-100) | 84 (81-88) | 87 (84-90) | 0.94 |
Classifier specificity by level of airflow obstruction.
| Airflow Obstruction | FEV1 % Predicted | Number of Patients | Specificity (%), (95% CI) |
| GOLD 0/I | >80% | 411 | 89 (86-92) |
| GOLD II | 50–80% | 167 | 84 (78-89) |
| GOLD III/IV | <50% | 32 | 72 (56-87) |
Spirometric classification of airflow obstruction based on GOLD staging [60].
Figure 4. Heat map shows the magnitude of difference for each protein measured (columns) between subject populations for the comparison of NSCLC to controls (top row) and comparisons of cases or controls between study sites (bottom row).
Top row: KS distances for NSCLC versus control distributions. Bottom row: mean KS distances for all 12 pair-wise comparisons, between the four sites, of case and control samples analyzed separately. Proteins were ordered by subtracting the NSCLC KS distance from the mean site KS distance. This revealed groups of NSCLC biomarkers (top right) contrasting with preanalytical markers (bottom left).
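The ordering described in the caption can be sketched as a sort on the difference between the two KS values for each protein. In the example below the NSCLC KS values for HSP 90α and ERBB1 and the site KS values for C3 and C3a are borrowed from the tables in this record; the remaining numbers are invented placeholders, so the output is illustrative only.

```python
# (protein, NSCLC-vs-control KS, mean between-site KS)
# Placeholder values (assumptions): 0.10 and 0.12 for site KS, 0.05 and 0.06 for NSCLC KS.
proteins = [
    ("HSP 90α", 0.51, 0.10),
    ("C3a", 0.05, 0.71),
    ("ERBB1", 0.46, 0.12),
    ("C3", 0.06, 0.70),
]

# Order by (mean site KS - NSCLC KS): disease markers sort first, handling-sensitive proteins last.
ordered = sorted(proteins, key=lambda p: p[2] - p[1])
print([name for name, _, _ in ordered])
```

Proteins with a strongly negative difference separate cases from controls more than they separate sites, while a strongly positive difference flags proteins whose variation is dominated by site-specific sample handling.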
Proteins with Preanalytical Variability§.
| Protein | UniProt ID | Avg. KS |
| C3 | P01024 | 0.7 |
| C3a | P01024 | 0.71 |
| C3adesArg | P01024 | 0.64 |
| iC3b | P01024 | 0.66 |
| C3b | P01024 | 0.59 |
| BMP-14 | P43026 | 0.45 |
| Coagulation Factor IXab | P00740 | 0.42 |
Proteins that exhibited an average KS distance ≥0.4 for within-site, class-dependent comparisons of preanalytical variation as shown in Figure 4 and that have been identified independently as subject to significant variations due to sample handling in serum.