Jorma J de Ronde, Marc Jan Bonder, Esther H Lips, Sjoerd Rodenhuis, Lodewyk F A Wessels.
Abstract
INTRODUCTION: Despite continuous efforts, not a single predictor of breast cancer chemotherapy resistance has made it into the clinic yet. However, it has become clear in recent years that breast cancer is a collection of molecularly distinct diseases. With ever-increasing amounts of breast cancer data becoming available, we set out to study whether gene expression-based predictors of chemotherapy resistance that are specific for breast cancer subtypes can improve upon the performance of generic predictors.
Year: 2014 PMID: 24558399 PMCID: PMC3928239 DOI: 10.1371/journal.pone.0088551
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Distribution of samples in the subgroups.
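| Stratification | pCR (%) | No pCR (%) |
|---|---|---|
| HER2 positive | 31 (38) | 51 (62) |
| Luminal | 14 (9) | 185 (91) |
| Triple negative | 42 (37) | 71 (63) |
| HER2 positive, ER negative | 25 (56) | 20 (44) |
| HER2 positive, ER positive* | 6 (16) | 31 (84) |
| Total | 87 (22) | 307 (78) |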
The sample sizes shown are those of the expression-based predictors. The sample sizes for the clinical predictors are slightly lower due to missing data and can be found in Table S1.
*The HER2 positive, ER positive group was not included in the analysis due to the small sample size.
Figure 1. Cartoon of the double loop cross-validation scheme.
Our analysis employed a double loop cross-validation. The inner loop determines the optimal number of features to be used by a specific combination of feature selection method and classifier, here depicted by the green block. This inner loop uses 2/3 of all data (i.e. the training data); the remaining 1/3 is used to measure the performance of the trained classifier (i.e. a 3-fold cross-validation setup). The outer loop is repeated 15 times to obtain an average AUC for each predictor.
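The double loop scheme described in this caption can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the nearest-mean scoring rule, the mean-difference feature ranking, the candidate feature counts, the use of a 3-fold inner loop, and all function names (`double_loop_cv`, `cv_auc`, `fit_and_score`, and so on) are placeholders chosen for the example.

```python
import random
from statistics import mean


def auc(scores, labels):
    """AUC: probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for s, t in zip(scores, labels) if t == 1]
    neg = [s for s, t in zip(scores, labels) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def class_means(X, y, feats):
    """Per-class mean of each selected feature."""
    return {c: [mean(x[j] for x, t in zip(X, y) if t == c) for j in feats]
            for c in (0, 1)}


def rank_features(X, y):
    """Placeholder filter: rank features by absolute difference of class means."""
    mu = class_means(X, y, range(len(X[0])))
    return sorted(range(len(X[0])), key=lambda j: -abs(mu[1][j] - mu[0][j]))


def fit_and_score(X_tr, y_tr, X_te, n_feats):
    """Select n_feats on the training data only, then score test samples with a
    nearest-mean rule (higher score = closer to the class-1 mean)."""
    feats = rank_features(X_tr, y_tr)[:n_feats]
    mu = class_means(X_tr, y_tr, feats)

    def dist(x, c):
        return sum((x[j] - m) ** 2 for j, m in zip(feats, mu[c]))

    return [dist(x, 0) - dist(x, 1) for x in X_te]


def stratified_folds(y, k, rng):
    """Split sample indices into k folds with similar class proportions."""
    folds = [[] for _ in range(k)]
    for c in (0, 1):
        idx = [i for i, t in enumerate(y) if t == c]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds


def cv_auc(X, y, n_feats, k, rng):
    """Mean AUC of one k-fold cross-validation for a fixed number of features."""
    out = []
    for test in stratified_folds(y, k, rng):
        held_out = set(test)
        train = [i for i in range(len(y)) if i not in held_out]
        scores = fit_and_score([X[i] for i in train], [y[i] for i in train],
                               [X[i] for i in test], n_feats)
        out.append(auc(scores, [y[i] for i in test]))
    return mean(out)


def double_loop_cv(X, y, feature_counts=(1, 2, 5), repeats=15, k=3, seed=0):
    """Outer loop: repeated k-fold CV. Inner loop: choose the feature count."""
    rng = random.Random(seed)
    outer_aucs = []
    for _ in range(repeats):
        for test in stratified_folds(y, k, rng):
            held_out = set(test)
            train = [i for i in range(len(y)) if i not in held_out]
            X_tr, y_tr = [X[i] for i in train], [y[i] for i in train]
            # Inner loop: pick the feature count using the training data only.
            best_n = max(feature_counts,
                         key=lambda n: cv_auc(X_tr, y_tr, n, k, rng))
            scores = fit_and_score(X_tr, y_tr, [X[i] for i in test], best_n)
            outer_aucs.append(auc(scores, [y[i] for i in test]))
    return mean(outer_aucs)
```

The key design point is that the held-out third never influences feature ranking or the choice of feature count, so the averaged outer AUC is not biased by the tuning step.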
Figure 2. The AUC scores for the best performing predictors on each subtype.
AUCs for the (A) HER2 positive subtype; (B) Luminal subtype; (C) Triple negative subtype and (D) HER2 positive and ER negative subtype. The red bars represent the clinical predictors, the blue bars the expression-based predictors, and darker colors represent non-subtype specific predictors. When two boxplots are connected with a U-shaped line, the means of the AUC distributions of the corresponding experiments are significantly different (two-sided t-test, p<0.05, Bonferroni corrected for multiple testing).
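The Bonferroni step mentioned in this caption can be illustrated with a minimal sketch. It assumes the raw two-sided t-test p-values have already been computed elsewhere. The function name `bonferroni` and the example p-values in the usage note are hypothetical and are not taken from the paper, and the number of comparisons the authors corrected for is not stated in this section.

```python
def bonferroni(p_values, alpha=0.05):
    """Multiply each raw p-value by the number of tests (capped at 1.0) and
    flag the comparisons that remain below alpha after correction."""
    m = len(p_values)
    adjusted = [min(1.0, p * m) for p in p_values]
    significant = [p < alpha for p in adjusted]
    return adjusted, significant
```

For instance, calling `bonferroni([0.001, 0.02, 0.5])` on three hypothetical raw p-values leaves only the first comparison significant at the 0.05 level, because the second is adjusted to roughly 0.06.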
Characteristics of the optimal predictors for the different subtypes.
| Stratification | Clinical, subtype specific | Clinical, non specific | Gene expression, subtype specific | Gene expression, non specific |
|---|---|---|---|---|
|  | LREG-Relief | NM-Relief | NM-WMW | NB-BWR |
|  | NM-CFS | NB-Relief | NB-BWR | LREG-WMW-uncor. |
|  | NB-Relief | NB-CFS | NB-WMW | NB-WMW |
|  | 3NN-CFS | NM-CFS | NB-WMW | LREG-BWR |
Each cell shows the optimal combination of classifier and feature selection method.
Legend: Classifiers: NB = Naive Bayes, NM = Nearest Mean, LREG = Logistic regression, SVM = Support vector machine, 3NN = 3-Nearest Neighbor. Feature selection methods: CFS = Correlated feature selection, WMW = Wilcoxon-Mann-Whitney, BWR = ratio of between-class to within-class sum of squares, WMW-uncor. = Wilcoxon-Mann-Whitney where correlated features are removed, Inf.gain = information gain.
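To make the BWR criterion from the legend concrete, the sketch below scores a single feature as the ratio of its between-class to within-class sum of squares. It assumes the commonly used formulation of this ratio; the exact variant used by the authors is not given in this section, and the function name `bwr_score` is a placeholder.

```python
from statistics import mean


def bwr_score(values, labels):
    """Between-class over within-class sum of squares for one feature.
    Higher values mean the class means are far apart relative to the spread
    of the samples inside each class."""
    overall = mean(values)
    between = 0.0
    within = 0.0
    for c in set(labels):
        group = [v for v, t in zip(values, labels) if t == c]
        mu = mean(group)
        between += len(group) * (mu - overall) ** 2
        within += sum((v - mu) ** 2 for v in group)
    # A feature that is constant within every class has no within-class spread.
    return between / within if within > 0 else float("inf")
```

Features would then be ranked by this score, and the top-ranked ones passed to the classifier.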