Balázs Győrffy, Thomas Karn, Zsófia Sztupinszki, Boglárka Weltz, Volkmar Müller, Lajos Pusztai.
Abstract
The molecular diversity of breast cancer makes it impossible to identify prognostic markers that are applicable to all breast cancers. To overcome limitations of previous multigene prognostic classifiers, we propose a new dynamic predictor: instead of using a single universal training cohort and an identical list of informative genes to predict the prognosis of new cases, a case-specific predictor is developed for each test case. Gene expression data from 3,534 breast cancers with clinical annotation including relapse-free survival were analyzed. For each test case, we select a case-specific training subset that includes only molecularly similar cases and generate a case-specific predictor from it. This method yields different training sets and different predictors for each new patient. Model performance was assessed in leave-one-out validation and also in 325 independent cases. Prognostic discrimination was high for all cases (n = 3,534, HR = 3.68, p = 1.67E-56). The dynamic predictor showed higher overall accuracy (0.68) than genomic surrogates for Oncotype DX (0.64), Genomic Grade Index (0.61) or MammaPrint (0.47). The dynamic predictor was also effective in triple-negative cancers (n = 427, HR = 3.08, p = 0.0093), where the above classifiers all failed. Validation in independent patients yielded similar classification power (HR = 3.57). The dynamic classifier is available online at http://www.recurrenceonline.com/?q=Re_training. In summary, we developed a new method to make personalized prognostic predictions using case-specific training cohorts. The dynamic predictors outperform static models developed from single historical training cohorts, and they also predict well in triple-negative cancers.
Keywords: breast cancer; gene expression; survival
Year: 2014 PMID: 25274406 PMCID: PMC4354298 DOI: 10.1002/ijc.29247
Source DB: PubMed Journal: Int J Cancer ISSN: 0020-7136 Impact factor: 7.396
Clinical characteristics of patients included in the pooled datasets
| Characteristic | All patients | HER2−, ER+, no systemic therapy | HER2−, ER+, adjuvant therapy | HER2−, ER−, adjuvant therapy | HER2+ (ER+ and ER−), adjuvant therapy |
|---|---|---|---|---|---|
| Number of patients | 3,534 | 672 | 1,316 | 427 | 551 |
| ER+ | 2,960/3,534 (83.1%) | (all) | (all) | (none) | 372/551 (66.8%) |
| LN+ | 992/3,220 (30.8%) | 3/672 (0.4%) | 564/1,083 (52.1%) | 195/324 (60.2%) | 147/465 (31.6%) |
| Grade 1 | 329/2,185 (15.6%) | 143/528 (27.0%) | 132/815 (16.1%) | 10/326 (3.1%) | 17/291 (5.8%) |
| Grade 2 | 842/2,185 (38.5%) | 306/528 (58.0%) | 355/815 (43.6%) | 45/326 (13.8%) | 97/291 (33.3%) |
| Grade 3 | 964/2,185 (44.1%) | 78/528 (14.8%) | 297/815 (36.5%) | 271/326 (83.1%) | 177/291 (60.8%) |
| Recurrence events | 1,160/3,534 | 229/672 | 357/1,316 | 107/427 | 237/551 |
| Median RFS (years) | 5.85 | 7.85 | 5.42 | 3.44 | 5.51 |
| Median age (years) | 53.2 | 55.2 | 55.5 | 49.9 | 51.5 |
| Median size (cm) | 2.3 | 2.0 | 2.49 | 2.0 | 2.35 |
ER = estrogen receptor, RFS = recurrence free survival.
Figure 1. Dynamic predictor development process. A large database is used to select a subset of training cases that are molecularly the most similar to the test case. This training subset is used to identify predictive features and to develop the test-case specific predictor (“molecular classification”). The training set is compared to the remaining samples (“training set assessment”) and the final classification takes into account both the “molecular classification” and the “training set assessment” results.
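The selection and "molecular classification" steps described in the Figure 1 caption can be illustrated with a minimal Python sketch. It is not the authors' implementation: the similarity measure (Pearson correlation), the gene ranking (standardized mean difference between relapsed and non-relapsed training cases), the nearest-centroid call, and all function names are assumptions chosen for illustration. The defaults of 400 training cases and 25 genes follow the settings quoted in the Figure 2 caption. The "training set assessment" step is omitted because the text here does not describe how it is computed.

```python
from math import sqrt
from statistics import mean, pstdev

def pearson(a, b):
    """Pearson correlation of two equal-length expression profiles."""
    ma, mb = mean(a), mean(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return num / den if den else 0.0

def centroid(rows, genes):
    """Mean expression of the selected genes over a group of cases."""
    return [mean(row[g] for row in rows) for g in genes]

def distance(profile, genes, center):
    """Euclidean distance between a profile and a centroid on selected genes."""
    return sqrt(sum((profile[g] - c) ** 2 for g, c in zip(genes, center)))

def dynamic_predict(database, relapsed, test_case, n_train=400, n_genes=25):
    """Classify one test case using only its most similar database cases.

    database  : list of expression profiles (one list of floats per case)
    relapsed  : list of bools, True where the case relapsed
    test_case : expression profile of the new case
    """
    # 1. Rank database cases by similarity to the test case (assumed: Pearson r).
    order = sorted(range(len(database)),
                   key=lambda i: pearson(database[i], test_case), reverse=True)
    train = order[:n_train]

    # 2. Split the case-specific training subset by outcome.
    rel = [database[i] for i in train if relapsed[i]]
    non = [database[i] for i in train if not relapsed[i]]
    if not rel or not non:
        raise ValueError("training subset must contain both outcome classes")

    # 3. Case-specific gene list (assumed: largest standardized mean difference).
    def score(g):
        spread = pstdev([row[g] for row in rel + non]) or 1e-9
        return abs(mean(r[g] for r in rel) - mean(r[g] for r in non)) / spread
    genes = sorted(range(len(test_case)), key=score, reverse=True)[:n_genes]

    # 4. Nearest-centroid call on the selected genes (assumed classifier).
    d_rel = distance(test_case, genes, centroid(rel, genes))
    d_non = distance(test_case, genes, centroid(non, genes))
    return "high risk" if d_rel < d_non else "low risk"
```

Because both the training subset and the gene list are recomputed inside `dynamic_predict`, two test cases passed to it will generally be classified by different predictors, which is the property that distinguishes the dynamic approach from a static signature.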
Figure 2. Relapse-free survival curves for the dynamic classifier computed using the top 25 genes and a training set size of 400 samples and genomic surrogates of three commercially available prognostic signatures applied to the same 3,534 cases.
Performance comparison of the different predictors for overall sensitivity, specificity, accuracy, positive predictive value (PPV) and negative predictive value (NPV) including confidence intervals (A) and their concordance in terms of designated prediction (B)
| Metric | Dynamic re-classification | 21-gene signature | 70-gene signature | 97-gene signature |
|---|---|---|---|---|
| Sensitivity | 0.84 (0.81–0.86) | 0.80 (0.76–0.82) | 0.98 (0.96–0.98) | 0.81 (0.78–0.84) |
| Specificity | 0.58 (0.55–0.61) | 0.55 (0.53–0.58) | 0.14 (0.12–0.16) | 0.49 (0.46–0.52) |
| Accuracy | 0.68 (0.66–0.70) | 0.64 (0.62–0.65) | 0.47 (0.46–0.47) | 0.61 (0.59–0.63) |
| PPV | 0.56 (0.54–0.58) | 0.53 (0.52–0.55) | 0.42 (0.42–0.43) | 0.51 (0.49–0.52) |
| NPV | 0.85 (0.82–0.87) | 0.81 (0.79–0.83) | 0.92 (0.86–0.95) | 0.80 (0.77–0.83) |
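The five metrics in the table follow from the standard confusion-matrix definitions, where a "positive" is a case predicted to relapse. The sketch below shows how they are computed. The function name and the example counts are hypothetical and are not taken from the study, which reports only the resulting proportions.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return standard binary-classification metrics from confusion-matrix counts.

    tp: relapsed cases predicted high risk      fn: relapsed cases predicted low risk
    tn: relapse-free cases predicted low risk   fp: relapse-free cases predicted high risk
    """
    return {
        "sensitivity": tp / (tp + fn),            # share of relapses that were flagged
        "specificity": tn / (tn + fp),            # share of non-relapses that were cleared
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "ppv": tp / (tp + fp),                    # share of high-risk calls that relapsed
        "npv": tn / (tn + fn),                    # share of low-risk calls that did not
    }

# Illustrative counts only (not data from the paper).
example = classification_metrics(tp=40, fp=20, tn=30, fn=10)
```

A classifier that labels nearly every case high risk reaches high sensitivity and NPV at the cost of specificity and accuracy, which is the pattern visible in the 70-gene signature column.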
Figure 3. Performance of the dynamic classifier and genomic surrogates for three other prognostic signatures in 325 independent validation samples that were not included in the pool of 3,534 samples used for selection of the training set samples.