Fabio Parisi, Ana M Gonzalez, Yasmine Nadler, Robert L Camp, David L Rimm, Harriet M Kluger, Yuval Kluger.
Abstract
INTRODUCTION: Multi-marker molecular assays have impacted management of early stage breast cancer, facilitating adjuvant chemotherapy decisions. We generated prognostic models that incorporate protein-based molecular markers and clinico-pathological variables to improve survival prediction.
Year: 2010 PMID: 20809974 PMCID: PMC3096952 DOI: 10.1186/bcr2633
Source DB: PubMed Journal: Breast Cancer Res ISSN: 1465-5411 Impact factor: 6.466
Primary antibodies and the company that supplied them
| Protein (species) | Company |
|---|---|
| BCL2 (mouse) | Dako, Carpinteria, CA, USA |
| BAG1 (mouse) | Chemicon, Millipore, Billerica, MA, USA |
| BIRC5 (rabbit) | Novus Biologicals, Littleton, CO, USA |
| MKI67 (mouse) | BD Pharmingen, San Jose, CA, USA |
| CD68 (Cd68) | GeneTex, Irvine, CA, USA |
| MYBL2 (rabbit) | GeneTex, Irvine, CA, USA |
| MMP11 (mouse) | Chemicon, Millipore, Billerica, MA, USA |
| GRB7 (rabbit) | Santa Cruz, Santa Cruz, CA, USA |
| AURKA (rabbit) | Cell Signaling, Danvers, MA, USA |
| GSTM1 (mouse) | Novus Biologicals, Littleton, CO, USA |
| CCNB1 (mouse) | Novus Biologicals, Littleton, CO, USA |
| CTSL2 (mouse) | R&D, Minneapolis, MN, USA |
| ESR1 (mouse) | Dako, Carpinteria, CA, USA |
| PGR (mouse) | Dako, Carpinteria, CA, USA |
| ERBB2 (rabbit) | Dako, Carpinteria, CA, USA |
Pseudocode of nested cross-validation for model selection and model assessment
- Repeat 100 times:
  - Divide the data into 10 outer folds
  - Repeat 10 times:
    - Keep 1 outer fold for testing
    - Select the remaining 9 outer folds for training
    - Divide the 9 outer training folds into 10 inner folds
    - Repeat 10 times:
      - Keep 1 inner fold for testing
      - Select the remaining 9 inner folds for training
      - Move all variables into the list of available variables
      - Create an empty list of nested model variables
      - Iterate this backward selection procedure until only 1 variable is left in the list of available variables:
        - Train Cox models on the inner training set. Each Cox model contains all available variables except 1, omitting a different variable each time
        - Select the variable that contributes the least to the model likelihood
        - Move the selected variable from the list of available variables to the top of the list of nested model variables
      - Move the last available variable to the top of the list of nested model variables
      - Iterate over the list of nested variables:
        - Train the Cox model containing the present variable and the variables above it in the list of nested variables using the inner training set
        - Evaluate the average time-dependent area under the receiver operating characteristic curve (ATD-AUCROC)
        - Record the variable usage U and the size of the present Cox model
    - Estimate:
      - the expected model size: $\langle n \rangle = \sum_X h_X n_X / \sum_X h_X$
      - the (inner) variable stability score for each variable $v_m$: $\langle v_m \rangle = \sum_X h_X U_X(v_m) / \sum_X h_X$
    - Train the Cox model containing the most stable $\langle n \rangle$ variables using the outer training set
    - Evaluate the ATD-AUCROC
    - Record the variable usage T and the size of the present Cox model

$T_X(v_m) = 1$ if $v_m$ is in model $X$, 0 otherwise.
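The nested cross-validation procedure above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The callbacks `loglik` and `auc` are hypothetical stand-ins for fitting a Cox model and evaluating its ATD-AUCROC. The weights $h_X$ are assumed here to be the inner-fold ATD-AUCROC of each nested model, since the pseudocode does not define them. Rounding the expected size to the nearest integer is also an assumption.

```python
import random


def k_folds(indices, k, rng):
    """Shuffle the indices and deal them into k nearly equal folds."""
    shuffled = list(indices)
    rng.shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def backward_order(variables, train_idx, loglik):
    """Backward elimination; returns variables ordered most important first."""
    available, nested = list(variables), []
    while len(available) > 1:
        # The variable whose omission leaves the highest likelihood
        # contributes the least, so it is eliminated next.
        drop = max(available, key=lambda v: loglik(
            [u for u in available if u != v], train_idx))
        available.remove(drop)
        nested.insert(0, drop)       # move to the top of the nested list
    nested.insert(0, available[0])   # the last survivor goes on top
    return nested


def select_model(variables, train_idx, loglik, auc, k, rng):
    """Inner loop: expected model size and variable stability scores."""
    records = []  # one (h_X, n_X, variables in X) tuple per nested model X
    for fold in k_folds(train_idx, k, rng):
        held_out = set(fold)
        inner_train = [i for i in train_idx if i not in held_out]
        nested = backward_order(variables, inner_train, loglik)
        for size in range(1, len(nested) + 1):
            used = nested[:size]
            # ASSUMPTION: h_X is the inner-fold ATD-AUCROC of model X.
            records.append((auc(used, inner_train, fold), size, used))
    total = sum(h for h, _, _ in records)
    expected_size = sum(h * n for h, n, _ in records) / total
    stability = {v: sum(h for h, _, used in records if v in used) / total
                 for v in variables}
    ranked = sorted(variables, key=lambda v: -stability[v])
    # ASSUMPTION: <n> is rounded to the nearest integer, at least 1.
    return ranked[:max(1, round(expected_size))], expected_size, stability


def nested_cv(variables, n_samples, loglik, auc, repeats=100, k=10, seed=0):
    """Outer loop: assess the selected reduced model on held-out outer folds."""
    rng = random.Random(seed)
    results = []  # one (outer test score, chosen variables) pair per outer fold
    for _ in range(repeats):
        for fold in k_folds(range(n_samples), k, rng):
            held_out = set(fold)
            outer_train = [i for i in range(n_samples) if i not in held_out]
            chosen, _, _ = select_model(
                variables, outer_train, loglik, auc, k, rng)
            results.append((auc(chosen, outer_train, fold), chosen))
    return results
```

In practice `loglik` and `auc` would wrap a survival library of the reader's choice. Keeping them as callbacks keeps the fold bookkeeping separate from model fitting, so the selection step never sees the outer test fold.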
Figure 1. Performance, model size distribution and variable stability of reduced models for predicting 15-year breast cancer-specific survival. Upper row: Circles denote the average time-dependent area under the receiver operating characteristic curve (ATD-AUCROC) of the full Cox models (FM) and reduced models (RM) derived from 14 of the proteins included in the Oncotype DX assay (left column), from the 18-variable full model that combines these 14 markers with four additional clinico-pathological variables (middle column), and from seven standard clinico-pathological variables (right column). Plus signs denote the corresponding performances on the training sets. Error bars span ± 1 standard deviation from the average performance of the models. Combining protein and clinico-pathological variables improved model performance, and variable reduction resulted in further improvement. Middle row: The sizes of the 15-year survival reduced Cox models were derived from the expected model size distributions. Bottom row: The variables incorporated in these reduced models were chosen according to their stability (frequency) in the nested cross-validation procedure. Distributions of model sizes and frequency-based stability were derived from the reduced models trained on the outer training set. For example, the expected size of the reduced models derived from the protein-only variables (left column) is four, and thus the final reduced model includes AURKA, BCL2, CD68 and MYBL2. ER, estrogen receptor; HER, human epidermal growth factor receptor; PR, progesterone receptor.
Univariate analysis for each of the 18 markers included in the full model of Figure 1
| Variable | 95% CI, lower bound | 95% CI, upper bound | P value |
|---|---|---|---|
| Tumor size | 1.064 | 1.200 | 0.0003 |
| Age <50 years | 0.959 | 2.075 | 0.0719 |
| Nuclear grade | 1.017 | 1.676 | 0.0353 |
| Nodal status | 1.717 | 3.456 | 2.3 × 10⁻⁷ |
| ER (IHC) | 0.709 | 0.926 | 0.0017 |
| PR (IHC) | 0.751 | 0.986 | 0.0281 |
| HER2 (IHC) | 1.018 | 1.379 | 0.0337 |
| AURKA | 1.008 | 1.019 | 0.0001 |
| BAG1 | 0.986 | 1.008 | 0.5826 |
| BCL2 | 0.988 | 0.999 | 0.0084 |
| BIRC5 | 0.999 | 1.023 | 0.0642 |
| CCNB1 | 0.968 | 1.060 | 0.5854 |
| CD68 | 0.975 | 1.017 | 0.6866 |
| GRB7 | 1.006 | 1.019 | 0.0008 |
| GSTM1 | 0.996 | 1.030 | 0.1412 |
| KI67 | 0.994 | 1.041 | 0.1722 |
| MMP11 | 0.996 | 1.018 | 0.2349 |
| MYBL2 | 0.984 | 1.016 | 0.9950 |
For each marker a univariate Cox proportional hazards model is fit to the data using the entire cohort. The 95% confidence interval (CI) is shown together with the log-likelihood test P value. Estrogen receptor (ER), progesterone receptor (PR) and human epidermal growth factor receptor (HER) 2 were measured by immunohistochemistry (IHC).
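The table reports interval bounds without point estimates. If the bounds are read as a Wald interval for the hazard ratio, symmetric on the log scale, then the point estimate and its standard error can be back-calculated as sketched below. Both the hazard-ratio reading and the Wald form are assumptions, since the caption does not state how the interval was computed. The helper `wald_summary` is hypothetical, and the recovered numbers are not values reported by the source.

```python
import math


def wald_summary(lower, upper, z=1.959964):
    """Back-calculate (hazard ratio, SE of log hazard ratio) from a 95% CI.

    ASSUMPTION: the interval is exp(beta - z*se) to exp(beta + z*se).
    """
    log_lo, log_hi = math.log(lower), math.log(upper)
    beta = (log_lo + log_hi) / 2          # midpoint on the log scale
    se = (log_hi - log_lo) / (2 * z)      # half-width divided by z
    return math.exp(beta), se


# Nodal status row of the table: 95% CI 1.717 to 3.456.
hr, se = wald_summary(1.717, 3.456)
print(f"implied hazard ratio {hr:.3f}, SE(log HR) {se:.3f}")
```

Under these assumptions the implied hazard ratio is the geometric mean of the two bounds. An interval that excludes 1, as in this row, corresponds to a two-sided Wald P value below 0.05.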
Figure 2. Performance, model size distribution and variable stability of reduced models as described in Figure 1 for the lower and higher Nottingham Prognostic Index (NPI) risk groups. Left column: Patients with an NPI of more than 4.4. Right column: Patients with an NPI of 4.4 or less. The final reduced model (RM) for the lower NPI group consists of 7 variables, whereas the final reduced model for the higher NPI group consists of 11 partially overlapping variables. For example, CCNB1 is one of the most robust variables in the higher NPI group, but is the least robust variable in the lower NPI group. ATD-AUCROC, average time-dependent area under the receiver operating characteristic curve; ER, estrogen receptor; FM, full models; HER, human epidermal growth factor receptor; PR, progesterone receptor.
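The split into NPI risk groups can be illustrated with the standard NPI formula: 0.2 × tumor size in cm, plus grade (1 to 3), plus a lymph node score (1 to 3). The sketch below applies the 4.4 cutoff from the caption. How the authors scored each component is not stated here, so the function names, the argument encoding and the input checks are assumptions.

```python
def nottingham_prognostic_index(size_cm, grade, node_score):
    """Standard NPI: 0.2 * tumor size (cm) + grade (1-3) + node score (1-3)."""
    if grade not in (1, 2, 3) or node_score not in (1, 2, 3):
        raise ValueError("grade and node_score must each be 1, 2 or 3")
    if size_cm < 0:
        raise ValueError("size_cm must be non-negative")
    return 0.2 * size_cm + grade + node_score


def npi_group(npi, cutoff=4.4):
    """Return 'higher' for NPI above the cutoff, 'lower' for NPI at or below it."""
    return "higher" if npi > cutoff else "lower"


# Hypothetical patients, for illustration only.
small_low_grade = nottingham_prognostic_index(2.0, 2, 1)   # 0.4 + 2 + 1
large_high_grade = nottingham_prognostic_index(3.0, 3, 2)  # 0.6 + 3 + 2
print(npi_group(small_low_grade), npi_group(large_high_grade))
```

Fitting separate models within each group, as the caption describes, would then amount to partitioning the cohort by `npi_group` before running the selection procedure.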
Figure 3. Performance, model size distribution and variable stability of reduced models as described in Figure 1 for the node-negative (node(-)), hormone receptor-positive (+) subpopulation. We included patients whose tumors were estrogen receptor (ER) positive, progesterone receptor (PR) positive or both. The left column shows the models for all variables excluding nodal status, and the right column shows models for the clinico-pathological variables alone (tumor size, nuclear grade, age, human epidermal growth factor receptor (HER) 2, ER, PR). The compact, reduced model (RM) derived from molecular and clinico-pathological covariates dramatically outperformed the full models (FM), and included AURKA, tumor size, HER2, CD68 and nuclear grade. ATD-AUCROC, average time-dependent area under the receiver operating characteristic curve.