Sabri Boughorbel, Rashid Al-Ali, Naser Elkum.
Abstract
We compared the performance of several prediction techniques for breast cancer prognosis, based on AU-ROC (area under the ROC curve) performance for different prognosis periods. The analyzed dataset contained 1,981 patients, and from an initial 25 variables the 11 most common clinical predictors were retained. We compared eight models from a wide spectrum of predictive models, namely Generalized Linear Model (GLM), GLM-Net, Partial Least Square (PLS), Support Vector Machines (SVM), Random Forests (RF), Neural Networks, k-Nearest Neighbors (k-NN) and Boosted Trees. To compare these models, a paired t-test was applied to the model performance differences obtained from data resampling. Random Forests, Boosted Trees, Partial Least Square and GLMNet have superior overall performance; however, their AU-ROC is only slightly higher than that of the other models. The comparative analysis also allowed us to define a relative variable importance as the average of the variable importance from the different models. Two sets of variables are identified from this analysis. The first includes the number of positive lymph nodes, tumor size, cancer grade and estrogen receptor status, all of which have an important influence on model predictability. The second set includes variables related to histological parameters and treatment types. The short-term versus long-term contribution of the clinical variables is also analyzed from the comparative models. Among the various cancer treatment plans, the combination of chemotherapy and radiotherapy has the largest impact on cancer prognosis.
Year: 2016 PMID: 26771838 PMCID: PMC4714871 DOI: 10.1371/journal.pone.0146413
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Description of the clinical variables included in the comparative analysis.
| Variables | Description | Value examples |
|---|---|---|
| Age at diagnosis | Age at cancer diagnosis | 42.7 years |
| Tumor size | Tumor size in mm (used in TNM classification) | 23, 41 mm |
| Lymph nodes positive | The number of positive lymph nodes (used in TNM classification) | 1, 3, 10 |
| Grade | Grouping of Nottingham scores into three groups | 1, 2, 3 |
| Histological type | Histology outcome | DCIS, IDC, ILC |
| Estrogen Receptor IHC status | Estrogen receptor status measured by Immunohistochemistry | positive, negative |
| PR Expr | Progesterone Receptor Expression | positive, negative |
| Her2 Expr | Human Epidermal growth factor receptor 2 (Her2) | positive, negative |
| Treatment | One or a combination of the three treatments (CT: Chemotherapy, HT: Hormonal Therapy, RT: Radiation Therapy) | CT, CT/, CT/HT/RT, NONE, RT |
| Stage | TNM stage | 1, 2, 3, 4 |
| Lymph nodes removed | Number of removed lymph nodes | 8, 14, 25 |
Sample size for the different selected prognosis periods.
Depending on the survival period, a sample is included in or excluded from one or more of the four datasets.
| Prognosis period | #Survived > T | #Not survived > T | Total #samples |
|---|---|---|---|
| 2 years | 1311 | 76 | 1387 |
| 5 years | 976 | 253 | 1229 |
| 8 years | 674 | 343 | 1017 |
| 11 years | 434 | 398 | 832 |
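
The shrinking totals in this table are consistent with a rule in which a patient enters the T-year dataset only when the outcome at T is known. The sketch below illustrates one such rule. It is an assumption, not the authors' documented procedure: the fields `followup_years` and `died` and the four toy patients are hypothetical.

```python
from typing import NamedTuple, Optional

class Patient(NamedTuple):
    followup_years: float  # hypothetical field: time to death or last contact
    died: bool             # hypothetical field: True if death was observed

def label_at_horizon(p: Patient, horizon: float) -> Optional[int]:
    """Return 1 if the patient survived beyond the horizon, 0 if they died
    within it, and None if the outcome at the horizon is unknown."""
    if p.followup_years > horizon:
        return 1      # observed beyond T, so survived > T
    if p.died:
        return 0      # death observed at or before T
    return None       # follow-up ended before T without a death: exclude

def build_dataset(patients, horizon):
    """Keep only the patients whose outcome at the horizon is known."""
    labelled = [(p, label_at_horizon(p, horizon)) for p in patients]
    return [(p, y) for p, y in labelled if y is not None]

# Toy patients, for illustration only.
patients = [Patient(12.0, False), Patient(3.5, True),
            Patient(4.0, False), Patient(9.0, True)]

for horizon in (2, 5, 8, 11):
    labels = [y for _, y in build_dataset(patients, horizon)]
    print(horizon, "survived:", labels.count(1), "not survived:", labels.count(0))
```

Under this rule a patient can appear in several datasets with different labels, for example as a survivor at 8 years and a non-survivor at 11 years.
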
Tuning parameters of the listed predictive models.
| Models | Tuning parameters |
|---|---|
| Partial Least Square (PLS) | |
| Neural Networks (NN) | |
| Support Vector Machine (SVM) | |
| Boosted Trees (B-Trees) | |
| Random Forests (RF) | |
| GLMNet | |
| GLM | no tuning parameter |
| k-Nearest Neighbors (k-NN) | |
Fig 1. Tuning results of model parameters using re-sampling (GLM is not included since it does not have tuning parameters).
Fig 2. Comparison of model performances for the different use cases (prognosis periods of 2, 5, 8 and 11 years).
Performance summary in terms of AUC-ROC.
| Model | 2 years | 5 years | 8 years | 11 years |
|---|---|---|---|---|
| GLM | 0.62 ± 0.07 | 0.76 ± 0.02 | 0.74 ± 0.02 | 0.71 ± 0.01 |
| GLMNet | 0.75 ± 0.05 | 0.77 ± 0.01 | 0.76 ± 0.03 | 0.72 ± 0.03 |
| Support Vector Machines | 0.64 ± 0.08 | 0.77 ± 0.04 | 0.76 ± 0.04 | 0.72 ± 0.01 |
| Random Forests | 0.73 ± 0.09 | 0.77 ± 0.02 | 0.77 ± 0.03 | 0.75 ± 0.02 |
| Neural Network | 0.67 ± 0.09 | 0.73 ± 0.03 | 0.74 ± 0.04 | 0.72 ± 0.02 |
| k-NN | 0.58 ± 0.13 | 0.72 ± 0.02 | 0.71 ± 0.05 | 0.68 ± 0.03 |
| Boosted Trees | 0.75 ± 0.07 | 0.78 ± 0.02 | 0.75 ± 0.03 | 0.74 ± 0.03 |
| Partial Least Square | 0.75 ± 0.05 | 0.77 ± 0.02 | 0.76 ± 0.03 | 0.73 ± 0.02 |
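
For readers less familiar with the metric in this table, AUC-ROC equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one. The following is a minimal sketch of that pairwise definition. The labels and scores are illustrative, and which outcome is treated as the positive class is an assumption here.

```python
def auc_roc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.
    Tied scores count as half a correct pair."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Illustrative labels (1 = positive class) and model scores.
labels = [1, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3]
print(round(auc_roc(labels, scores), 3))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, which puts the 0.58 to 0.78 range in the table into context.
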
Overall model performance average in terms of AUC-ROC.
| Model | AUC |
|---|---|
| Random Forests | 0.76 ± 0.05 |
| Boosted Trees | 0.75 ± 0.04 |
| Partial Least Square | 0.75 ± 0.04 |
| GLMNet | 0.75 ± 0.04 |
| Support Vector Machines | 0.72 ± 0.07 |
| Neural Network | 0.71 ± 0.06 |
| GLM | 0.71 ± 0.06 |
| k-NN | 0.67 ± 0.09 |
Performance average of top models sorted by AUC-ROC for the different prognosis periods.
| Period | AUC |
|---|---|
| 5 years | 0.77 ± 0.02 |
| 8 years | 0.76 ± 0.03 |
| 2 years | 0.74 ± 0.07 |
| 11 years | 0.73 ± 0.03 |
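
The two summary tables above can be approximately recomputed from the per-period AUC-ROC table. The sketch below averages the rounded published means, so results may differ from the reported figures by about 0.01, and the ± spreads cannot be recovered this way. Treating the four best models overall as the "top models" is an assumption.

```python
from statistics import mean

PERIODS = ("2 years", "5 years", "8 years", "11 years")

# Mean AUC-ROC per model and period, copied from the performance summary table.
AUC = {
    "GLM":                     (0.62, 0.76, 0.74, 0.71),
    "GLMNet":                  (0.75, 0.77, 0.76, 0.72),
    "Support Vector Machines": (0.64, 0.77, 0.76, 0.72),
    "Random Forests":          (0.73, 0.77, 0.77, 0.75),
    "Neural Network":          (0.67, 0.73, 0.74, 0.72),
    "k-NN":                    (0.58, 0.72, 0.71, 0.68),
    "Boosted Trees":           (0.75, 0.78, 0.75, 0.74),
    "Partial Least Square":    (0.75, 0.77, 0.76, 0.73),
}

# Assumption: "top models" means the four models with the best overall average.
TOP_MODELS = ("Random Forests", "Boosted Trees", "Partial Least Square", "GLMNet")

per_model = {m: mean(v) for m, v in AUC.items()}
per_period = {p: mean(AUC[m][i] for m in TOP_MODELS)
              for i, p in enumerate(PERIODS)}

for model, value in sorted(per_model.items(), key=lambda kv: -kv[1]):
    print(f"{model:25s} {value:.3f}")
for period, value in sorted(per_period.items(), key=lambda kv: -kv[1]):
    print(f"{period:9s} {value:.3f}")
```
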
Fig 3. Model comparison in terms of AU-ROC differences and confidence intervals for the different prognosis periods.
Paired t-test for the statistical evaluation of model differences for the different prognosis periods.
The upper triangle of each matrix gives the average AU-ROC difference between the model in the row and the model in the column. The lower triangle gives the adjusted p-values.
| 2 years | GLM | GLM-Net | SVM | RF | NN | k-NN | B-Trees | PLS |
|---|---|---|---|---|---|---|---|---|
| GLM | – | -0.12 | -0.02 | -0.11 | -0.04 | 0.04 | -0.12 | -0.13 |
| GLM-Net | 0.02 | – | 0.10 | 0.01 | 0.08 | 0.00 | -0.01 | |
| SVM | 1.00 | 0.06 | – | -0.09 | -0.03 | 0.06 | -0.10 | -0.11 |
| RF | 0.04 | 1.00 | 1.00 | – | 0.07 | 0.15 | -0.01 | -0.02 |
| NN | 1.00 | 0.32 | 1.00 | 0.07 | – | 0.09 | -0.09 | |
| k-NN | 1.00 | 0.04 | 1.00 | 0.08 | 0.89 | – | -0.16 | |
| B-Trees | 0.02 | 1.00 | 0.25 | 1.00 | 0.05 | 0.04 | – | -0.01 |
| PLS | 0.03 | 1.00 | 0.07 | 1.00 | 0.32 | 0.04 | 1.00 | – |

| 5 years | GLM | GLM-Net | SVM | RF | NN | k-NN | B-Trees | PLS |
|---|---|---|---|---|---|---|---|---|
| GLM | – | -0.02 | -0.01 | -0.01 | 0.03 | 0.04 | -0.02 | -0.01 |
| GLM-Net | 0.06 | – | 0.01 | 0.00 | 0.04 | -0.00 | 0.01 | |
| SVM | 1.00 | 1.00 | – | -0.01 | 0.04 | -0.01 | -0.00 | |
| RF | 1.00 | 1.00 | 1.00 | – | 0.04 | -0.01 | 0.00 | |
| NN | 0.65 | 0.03 | 0.03 | 0.26 | – | 0.01 | -0.04 | |
| k-NN | 0.08 | 0.00 | 0.39 | 0.01 | 1.00 | – | ||
| B-Trees | 0.27 | 1.00 | 1.00 | 1.00 | 0.01 | 0.02 | – | 0.01 |
| PLS | 0.32 | 1.00 | 1.00 | 1.00 | 0.05 | 0.01 | 1.00 | – |

| 8 years | GLM | GLM-Net | SVM | RF | NN | k-NN | B-Trees | PLS |
|---|---|---|---|---|---|---|---|---|
| GLM | – | -0.02 | -0.01 | -0.03 | 0.00 | 0.04 | -0.01 | -0.02 |
| GLM-Net | 0.53 | – | 0.01 | -0.01 | 0.02 | 0.01 | 0.00 | |
| SVM | 1.00 | 1.00 | – | -0.01 | 0.02 | 0.00 | -0.00 | |
| RF | 0.48 | 1.00 | 1.00 | – | 0.03 | 0.01 | 0.01 | |
| NN | 1.00 | 1.00 | 1.00 | 0.92 | – | 0.03 | -0.01 | -0.02 |
| k-NN | 0.30 | 0.00 | 0.01 | 0.00 | 1.00 | – | -0.05 | |
| B-Trees | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 0.14 | – | -0.01 |
| PLS | 0.53 | 1.00 | 1.00 | 1.00 | 1.00 | 0.01 | 1.00 | – |

| 11 years | GLM | GLM-Net | SVM | RF | NN | k-NN | B-Trees | PLS |
|---|---|---|---|---|---|---|---|---|
| GLM | – | -0.01 | -0.01 | -0.01 | 0.03 | -0.03 | -0.02 | |
| GLM-Net | 1.00 | – | -0.00 | -0.03 | 0.00 | -0.02 | -0.01 | |
| SVM | 0.91 | 1.00 | – | -0.03 | 0.01 | -0.02 | -0.01 | |
| RF | 0.01 | 0.35 | 0.23 | – | 0.03 | 0.01 | 0.02 | |
| NN | 1.00 | 1.00 | 1.00 | 0.33 | – | 0.04 | -0.03 | -0.01 |
| k-NN | 0.21 | 0.04 | 0.03 | 0.00 | 0.17 | – | ||
| B-Trees | 0.09 | 1.00 | 1.00 | 1.00 | 1.00 | 0.01 | – | 0.01 |
| PLS | 1.00 | 1.00 | 1.00 | 0.11 | 1.00 | 0.01 | 1.00 | – |
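
Each entry in these matrices comes from a paired t-test on per-resample AU-ROC differences between two models. The sketch below shows that computation using only the standard library. The record does not name the p-value adjustment, so the Bonferroni correction over the 28 model pairs is an assumption, and the per-resample AUC values are illustrative.

```python
import math
from statistics import mean, stdev

def t_two_sided_p(t, df, steps=2000):
    """Two-sided p-value for Student's t, using Simpson's rule on the density.
    `steps` must be even."""
    a = abs(t)
    if a == 0:
        return 1.0
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    pdf = lambda x: coef * (1 + x * x / df) ** (-(df + 1) / 2)
    h = a / steps
    area = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * pdf(i * h)
    area *= h / 3                      # integral of the density over [0, |t|]
    return max(0.0, 1.0 - 2.0 * area)

def paired_t_test(auc_a, auc_b, n_comparisons=1):
    """Return (mean difference, Bonferroni-adjusted p-value) for paired AUCs."""
    diffs = [a - b for a, b in zip(auc_a, auc_b)]
    n = len(diffs)
    if n < 2 or stdev(diffs) == 0:
        raise ValueError("need at least two resamples with varying differences")
    d_mean = mean(diffs)
    t = d_mean / (stdev(diffs) / math.sqrt(n))
    p = t_two_sided_p(t, n - 1)
    return d_mean, min(1.0, p * n_comparisons)   # Bonferroni, capped at 1

# Illustrative per-resample AUCs for two models evaluated on the same resamples.
auc_a = [0.75, 0.78, 0.74, 0.77, 0.76]
auc_b = [0.70, 0.74, 0.73, 0.71, 0.72]
n_pairs = 8 * 7 // 2                   # 28 pairwise comparisons among 8 models
diff, p_adj = paired_t_test(auc_a, auc_b, n_comparisons=n_pairs)
print(f"mean difference = {diff:.3f}, adjusted p = {p_adj:.3f}")
```

Capping the adjusted value at 1 is consistent with the many 1.00 entries in the lower triangles, although other adjustment methods can produce the same pattern.
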
Fig 4. Normalized average relative variable importance for the different prognosis periods.
Relative importance of the top 9 predictors for the different prognosis periods.
| Predictors | 2 years | 5 years | 8 years | 11 years |
|---|---|---|---|---|
| lymph_nodes_positive | 37.52 | 67.91 | 99.49 | 82.09 |
| ER.Expr- | 30.95 | 32.59 | 24.22 | 16.61 |
| size | 28.81 | 56.39 | 71.54 | 59.24 |
| stage | 28.65 | 40.19 | 38.17 | 29.68 |
| ER_IHC_statuspos | 26.28 | 17.24 | 7.30 | 8.23 |
| PR.Expr- | 25.18 | 30.15 | 44.47 | 22.97 |
| grade3 | 23.15 | 32.66 | 25.54 | 21.38 |
| TreatmentCT/RT | 22.72 | 31.55 | 25.94 | 17.33 |
| lymph_nodes_removed | 22.06 | 28.06 | 37.02 | 36.08 |
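
The short-term versus long-term contribution of each predictor can be read off this table by comparing its 2-year and 11-year columns. The sketch below does only that, using the published values. It is a simple way to summarize the table, not a method taken from the paper.

```python
# Relative importance at the 2-year and 11-year horizons, copied from the table.
IMPORTANCE = {
    "lymph_nodes_positive": (37.52, 82.09),
    "ER.Expr-":             (30.95, 16.61),
    "size":                 (28.81, 59.24),
    "stage":                (28.65, 29.68),
    "ER_IHC_statuspos":     (26.28, 8.23),
    "PR.Expr-":             (25.18, 22.97),
    "grade3":               (23.15, 21.38),
    "TreatmentCT/RT":       (22.72, 17.33),
    "lymph_nodes_removed":  (22.06, 36.08),
}

# Positive shift: the predictor matters more for long-term prognosis.
shift = {name: late - early for name, (early, late) in IMPORTANCE.items()}
ranked = sorted(shift, key=shift.get, reverse=True)

for name in ranked:
    print(f"{name:22s} {shift[name]:+7.2f}")
```

On these numbers, the lymph node and tumor size variables gain the most importance at the long horizon, while the two estrogen receptor variables lose the most.
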