Raffaella Massafra, Maria Colomba Comes, Samantha Bove, Vittorio Didonna, Sergio Diotaiuti, Francesco Giotta, Agnese Latorre, Daniele La Forgia, Annalisa Nardone, Domenico Pomarico, Cosmo Maurizio Ressa, Alessandro Rizzo, Pasquale Tamborra, Alfredo Zito, Vito Lorusso, Annarita Fanizzi.
Abstract
Designing targeted treatments for breast cancer patients after primary tumor removal is necessary to prevent invasive disease events (IDEs), such as recurrence, metastasis, contralateral and second tumors, over time. However, due to the molecular heterogeneity of this disease, predicting the outcome and efficacy of adjuvant therapy is challenging. A novel ensemble machine learning classification approach was developed to produce prognostic predictions of the occurrence of breast cancer IDEs at both 5 and 10 years. The method is based on voting among multiple models to give a final prediction for each individual patient. Promising results were achieved on a cohort of 529 patients, whose data, related to primary breast cancer, were provided by Istituto Tumori "Giovanni Paolo II" in Bari, Italy. The proposed approach improves on the performance of the baseline original model, i.e., the model without voting, reaching median AUC values of 77.1% and 76.3% for IDE prediction at 5 and 10 years, respectively. Finally, because the voting procedure returns an explanation of the IDE prediction for each individual patient, the approach supports more intelligible decisions and hence greater acceptability in clinical practice.
Year: 2022 PMID: 36121822 PMCID: PMC9484691 DOI: 10.1371/journal.pone.0274691
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.752
Feature distributions of the collected patients.
| Features | Distribution | Features | Distribution |
|---|---|---|---|
| | 529; 100% | N2 (abs.; %) | 41; 7.8% |
| | | N3 (abs.; %) | 21; 4.0% |
| Median; [q1, q3] | 51; [45, 60] | NA (abs.; %) | 9; 1.7% |
| | | | |
| Yes (abs.; %) | 16; 3.0% | No (abs.; %) | 52; 9.8% |
| No (abs.; %) | 513; 97.0% | Yes (abs.; %) | 466; 88.0% |
| | | NA (abs.; %) | 11; 2.2% |
| T1a (abs.; %) | 19; 3.6% | | |
| T1b (abs.; %) | 45; 8.5% | Not Done (abs.; %) | 438; 82.8% |
| T1c (abs.; %) | 227; 42.9% | Positive (abs.; %) | 33; 6.3% |
| T2 (abs.; %) | 187; 35.4% | Negative (abs.; %) | 52; 9.8% |
| T3 (abs.; %) | 14; 2.6% | NA (abs.; %) | 6; 1.1% |
| T4 (abs.; %) | 21; 4.0% | | |
| NA (abs.; %) | 16; 3.0% | Median; [q1, q3] | 19; [14, 24] |
| | | NA (abs.; %) | 18; 3.7% |
| Yes (abs.; %) | 108; 20.4% | | |
| No (abs.; %) | 419; 79.2% | Median; [q1, q3] | 0; [0, 2] |
| NA (abs.; %) | 2; 0.4% | NA (abs.; %) | 29; 6.0% |
| | | | |
| Ductal (abs.; %) | 468; 88.5% | No (abs.; %) | 148; 28.0% |
| Lobular (abs.; %) | 43; 8.1% | Yes (abs.; %) | 379; 71.6% |
| Other (abs.; %) | 18; 3.4% | NA (abs.; %) | 2; 0.4% |
| | | | |
| Quadrantectomy (abs.; %) | 339; 64.0% | No (abs.; %) | 157; 29.7% |
| Mastectomy (abs.; %) | 190; 36.0% | Yes (abs.; %) | 370; 69.9% |
| | | NA (abs.; %) | 2; 0.4% |
| Median; [q1, q3] | 44; [0, 80] | | |
| NA (abs.; %) | 5; 1.0% | No (abs.; %) | 465; 87.9% |
| | | Yes (abs.; %) | 63; 11.9% |
| Median; [q1, q3] | 21; [0, 70] | NA (abs.; %) | 1; 0.2% |
| NA (abs.; %) | 6; 1.2% | | |
| | | Absent (abs.; %) | 148; 28.0% |
| Median; [q1, q3] | 22; [10, 40] | Anthra. + taxane (abs.; %) | 82; 15.5% |
| NA (abs.; %) | 11 22.6% | Anthra. (abs.; %) | 123; 23.2% |
| | | taxane (abs.; %) | 3; 0.5% |
| G1 (abs.; %) | 48; 9.1% | CMF (abs.; %) | 100; 18.9% |
| G2 (abs.; %) | 231; 43.6% | other (abs.; %) | 68; 12.9% |
| G3 (abs.; %) | 229; 43.3% | NA (abs.; %) | 5; 1.0% |
| NA (abs.; %) | 21; 4.0% | | |
| | | Absent (abs.; %) | 157; 29.7% |
| Negative (abs.; %) | 336; 63.5% | Tamoxifen (abs.; %) | 29; 5.5% |
| Positive (abs.; %) | 85; 16.1% | LHRHa (abs.; %) | 4; 0.8% |
| NA (abs.; %) | 108; 20.4% | Tamoxifen + LHRHa (abs.; %) | 85; 16.1% |
| | | AI (abs.; %) | 163; 30.8% |
| 0 (abs.; %) | 162; 30.7% | Tamoxifen + AI (abs.; %) | 28; 5.3% |
| 1 (abs.; %) | 99; 18.7% | LHRHa + AI (abs.; %) | 13; 2.5% |
| 2 (abs.; %) | 61; 11.5% | other (abs.; %) | 44; 8.3% |
| 3 (abs.; %) | 71; 13.4% | NA (abs.; %) | 6; 1.0% |
| NA (abs.; %) | 136; 25.7% | | |
| | | No (abs.; %) | 4; 0.8% |
| Absent (abs.; %) | 405; 76.6% | HT (abs.; %) | 157; 29.7% |
| G1 (abs.; %) | 22; 4.2% | CT (abs.; %) | 116; 21.9% |
| G2 (abs.; %) | 15; 2.8% | CT + HT (abs.; %) | 187; 35.3% |
| G3 (abs.; %) | 16; 3.0% | CT + trastuzumab (abs.; %) | 24; 4.5% |
| present, not typed (abs.; %) | 69; 13.0% | CT + HT + trastuzumab (abs.; %) | 39; 7.4% |
| NA (abs.; %) | 2; 0.4% | NA (abs.; %) | 2; 0.4% |
| | | | |
| Absent (abs.; %) | 339; 64.1% | Median; [q1, q3] | 3; [0, 5] |
| Focal (abs.; %) | 101; 19.1% | NA (abs.; %) | 14; 2.9% |
| Extensive (abs.; %) | 29; 5.5% | | |
| present, not typed (abs.; %) | 60; 11.3% | Median; [q1, q3] | 0; [0, 0] |
| | | | |
| N0 (abs.; %) | 271; 51.2% | Median; [q1, q3] | 1; [1, 1] |
| N1 (abs.; %) | 187; 35.3% | NA (abs.; %) | 18; 3.7% |
Absolute counts and percentages (abs.; %) are reported for categorical features. For age, ER, PgR, Ki67, eradicated l. and metastatic l., CT months, diag.–surg. months, and surg.–ther. months, the median value is reported with the first (q1) and third (q3) quartiles of the distribution in square brackets. The number of missing values (NA) is also specified.
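As a minimal sketch of how a "Median; [q1, q3]" entry with its NA count could be derived, the following uses only the Python standard library. The function name `summarize`, the sample values, and the choice of the inclusive quartile method are assumptions made for illustration; the source does not state which quartile convention was used.

```python
from statistics import quantiles


def summarize(values):
    """Return (median, q1, q3, n_missing) for a list in which None marks NA.

    Assumption: quartiles follow the 'inclusive' convention of
    statistics.quantiles; other conventions give slightly different values.
    """
    observed = [v for v in values if v is not None]
    n_missing = len(values) - len(observed)
    q1, median, q3 = quantiles(observed, n=4, method="inclusive")
    return median, q1, q3, n_missing


# Made-up ages, not taken from the cohort.
ages = [45, 51, None, 60, 38, 72, 51, None, 49]
median, q1, q3, n_na = summarize(ages)
pct_na = 100 * n_na / len(ages)
print(f"Median; [q1, q3]: {median:g}; [{q1:g}, {q3:g}]")
print(f"NA (abs.; %): {n_na}; {pct_na:.1f}%")
```

Missing values are excluded before the quartiles are computed, so the NA count is reported alongside the summary rather than influencing it.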
Fig 1. Workflow of the baseline models composing the proposed ensemble machine learning approach.
Model 1 is the original model, trained on the raw features of the training set. Model 2 is obtained after applying a cleaning-up procedure to the training set of Model 1: it excludes the so-called confounding patients identified by that procedure. Model 3 uses all the identified confounding patients as its training set. Each of the three models first selects features with the Boruta technique and then trains an XGBoost (XGB) classifier, which is validated on an independent test set. The scores of the three models are then combined according to specific rules into an ensemble model, which returns the final prediction of IDE occurrence. The whole procedure is performed ten times independently, and separately for the 5- and 10-year follow-ups.
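The combination rules themselves are not given in this excerpt, so the following is only a hypothetical sketch of how three per-patient scores could be merged by voting while still allowing a "no answer" outcome. The function names, the 0.5 threshold, and the 0.1 margin are all assumptions for illustration and should not be read as the authors' actual rules.

```python
from typing import Optional, Sequence


def ensemble_vote(scores: Sequence[float], threshold: float = 0.5,
                  margin: float = 0.1) -> Optional[int]:
    """Combine per-model scores for one patient.

    Returns 1 (IDE predicted), 0 (no IDE predicted) or None (no answer).
    Hypothetical rule: a model votes only if its score lies at least
    `margin` away from `threshold`; the ensemble answers only when at
    least two models vote and a strict majority of the votes agree.
    """
    votes = [int(s >= threshold) for s in scores
             if abs(s - threshold) >= margin]
    if len(votes) < 2:
        return None
    positives = sum(votes)
    negatives = len(votes) - positives
    if positives == negatives:
        return None
    return int(positives > negatives)


def no_answer_rate(predictions: Sequence[Optional[int]]) -> float:
    """Percentage of patients for which the ensemble gave no answer."""
    return 100 * sum(p is None for p in predictions) / len(predictions)


# Made-up scores from Model 1, Model 2 and Model 3 for four patients.
patients = [
    (0.9, 0.8, 0.2),    # two confident positive votes, one negative
    (0.55, 0.45, 0.9),  # only one model is confident enough to vote
    (0.1, 0.9, 0.52),   # the two confident votes disagree
    (0.1, 0.2, 0.45),   # two confident negative votes
]
predictions = [ensemble_vote(p) for p in patients]
print(predictions, no_answer_rate(predictions))
```

Keeping the individual votes visible for each patient is what makes a voting scheme of this kind inspectable: a clinician can see which models supported the final call and when the ensemble declined to answer.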
Fig 2. Consensus maps between each pair of the four classifiers, Random Forest (RF), Support Vector Machine (SVM), XGBoost (XGB) and Naïve Bayes (NB), (a) before and (b) after the cleaning-up procedure applied for the 10-year IDE prediction. Consensus is measured by computing Cohen's kappa (κ) coefficient between each pair of classifiers on the training sets and then averaging it over 20 rounds of 5-fold cross-validation.
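For reference, Cohen's kappa compares the observed agreement between two classifiers with the agreement expected by chance, κ = (p_o − p_e) / (1 − p_e). A minimal sketch in plain Python follows; the function name `cohen_kappa` and the example label vectors are made up for illustration.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa between two equally long sequences of class labels."""
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("label sequences must be non-empty and equally long")
    n = len(labels_a)
    # Observed agreement: fraction of samples on which both classifiers agree.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the marginal label frequencies, summed.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    classes = count_a.keys() | count_b.keys()
    p_e = sum(count_a[c] * count_b[c] for c in classes) / (n * n)
    if p_e == 1.0:
        # Both classifiers always output the same single class.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Made-up predictions of two classifiers on eight training patients.
pred_rf = [1, 1, 0, 0, 1, 0, 1, 0]
pred_svm = [1, 0, 0, 0, 1, 0, 1, 1]
print(cohen_kappa(pred_rf, pred_svm))  # 0.5
```

A consensus map like the one described would hold one such value per classifier pair, averaged over the cross-validation rounds.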
Fig 3. Maps representing the statistical frequency of features selected in nested cross-validation over the ten training sets for Model 1, Model 2 and Model 3.
(a) At the 5-year follow-up and (b) at the 10-year follow-up. The statistical frequency is expressed as a percentage on the color bar: lower values correspond to darker colors and higher values to brighter colors. The 28 collected features are arranged on the y-axis, and the training set identifiers from 1 to 10 on the x-axis.
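The selection frequency shown in such a map can be understood as the percentage of cross-validation rounds in which a feature was retained. A minimal sketch, assuming each round yields a set of selected feature names, is given below; the function name and the feature subsets are invented for illustration and do not reproduce the study's selections.

```python
from collections import Counter


def selection_frequency(selected_per_round, all_features):
    """Percentage of rounds in which each feature was selected."""
    n_rounds = len(selected_per_round)
    if n_rounds == 0:
        raise ValueError("at least one round is required")
    # set() guards against a feature being listed twice in the same round.
    counts = Counter(f for selected in selected_per_round
                     for f in set(selected))
    return {f: 100 * counts[f] / n_rounds for f in all_features}


# Made-up example: four rounds on one training set, four candidate features.
features = ["age", "ER", "Ki67", "grading"]
rounds = [{"age", "ER"}, {"ER"}, {"ER", "Ki67"}, {"age", "ER"}]
freq = selection_frequency(rounds, features)
print(freq)
```

One such dictionary per training set gives one column of the map; features never selected keep a frequency of 0 and so appear in the darkest color.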
Summary of the performances at the 5- and 10-year follow-ups achieved by Model 1, Model 2, and Model 3 over the ten training sets after 20 rounds of 5-fold cross-validation.
| Follow-up | Model | AUC (%) | Accuracy (%) | Sensitivity (%) | Specificity (%) |
|---|---|---|---|---|---|
| 5 years | Model 1 | 68.1 [66.8–69.4] | 66.8 [65.9–68.1] | 52.8 [50.8–55.0] | 72.3 [71.1–73.6] |
| 5 years | Model 2 | 94.8 [93.7–95.7] | 88.8 [87.7–90.0] | 81.9 [78.7–84.6] | 91.2 [90.1–92.0] |
| 5 years | Model 3 | 83.8 [79.5–86.8] | 75.8 [71.6–79.1] | 77.1 [71.4–81.3] | 73.7 [69.6–78.6] |
| 10 years | Model 1 | 68.0 [66.7–69.6] | 64.3 [62.9–65.3] | 58.0 [55.9–60.1] | 68.0 [66.3–69.4] |
| 10 years | Model 2 | 94.4 [93.9–95.5] | 87.0 [85.6–88.2] | 83.5 [81.5–85.2] | 88.9 [87.4–90.3] |
| 10 years | Model 3 | 89.9 [86.0–92.3] | 82.4 [78.5–85.1] | 84.0 [79.6–87.6] | 80.0 [73.5–84.4] |
The performances are evaluated in percentage median values of AUC, accuracy, sensitivity, and specificity. The percentage first and third quartile values are also computed and reported in brackets.
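The four reported metrics can be computed from true labels, predicted scores and thresholded predictions. The sketch below uses plain Python with made-up labels and scores; the 0.5 decision threshold is an assumption, and AUC is computed through its pairwise-ranking interpretation rather than any specific library routine.

```python
def auc(y_true, scores):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def confusion_metrics(y_true, y_pred):
    """Accuracy, sensitivity and specificity from binary labels and predictions."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),  # fraction of IDE patients detected
        "specificity": tn / (tn + fp),  # fraction of non-IDE patients cleared
    }


# Made-up labels (1 = IDE occurred) and classifier scores.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2, 0.5, 0.1]
y_pred = [int(s >= 0.5) for s in scores]  # assumed 0.5 threshold
metrics = confusion_metrics(y_true, y_pred)
print(auc(y_true, scores), metrics)
```

Collecting these values over repeated rounds and taking the median with the first and third quartiles yields entries in the form used by the table.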
Summary of the performances at the 5- and 10-year follow-ups achieved by Model 1, Model 2, Model 3, and the ensemble model over the ten independent test sets.
| Follow-up | Model | AUC (%) | Accuracy (%) | Sensitivity (%) | Specificity (%) |
|---|---|---|---|---|---|
| 5 years | Model 1 | 65.8 [63.1–68.9] | 65.1 [63.4–70.6] | 49.2 [42.8–57.1] | 70.7 [67.6–74.3] |
| 5 years | Model 2 | 70.5 [67.7–73.4] | 71.1 [67.3–74.5] | 50.1 [42.9–63.7] | |
| 5 years | Model 3 | 34.3 [29.1–40.6] | 35.9 [32.7–40.4] | 35.5 [35.3–57.1] | 31.9 [28.2–35.1] |
| 5 years | Ensemble Model | | | | 75.5 |
| 10 years | Model 1 | 67.9 [59.8–70.3] | 63.2 [58.5–67.9] | 62.2 [56.0–66.7] | 62.1 [58.8–71.9] |
| 10 years | Model 2 | 70.7 [59.6–76.7] | 66.0 [62.3–69.8] | 51.3 [47.6–57.9] | 75.8 [ |
| 10 years | Model 3 | 34.8 [31.7–43.6] | 41.5 [37.7–47.2] | 50.0 [42.1–52.4] | 38.5 [32.3–46.4] |
| 10 years | Ensemble Model | | | | |
The performances are evaluated in percentage median values of AUC, accuracy, sensitivity, and specificity. The percentage first and third quartile values are also computed and reported in brackets. For each metric, the best performances are highlighted in bold.
Fig 4. Comparison of the AUC values achieved by the proposed ensemble model (orange lines) and the original model (blue lines), together with the percentage of no answers, over each of the ten independent tests (a) at the 5-year follow-up and (b) at the 10-year follow-up.