Patrizia Ferroni, Fabio M Zanzotto, Silvia Riondino, Noemi Scarpato, Fiorella Guadagni, Mario Roselli.
Abstract
Machine learning (ML) has recently been introduced to develop prognostic classification models that can predict outcomes in individual cancer patients. Here, we report the value of an ML-based decision support system (DSS), combined with random optimization (RO), for extracting prognostic information from routinely collected demographic, clinical and biochemical data of breast cancer (BC) patients. A DSS model was developed in a training set (n = 318); in the testing set (n = 136), it achieved a C-index for progression-free survival of 0.84 and an accuracy of 86%. Furthermore, the model stratified the testing set into two groups of patients at low or high risk of progression, with a hazard ratio (HR) of 10.9 (p < 0.0001). Validation in multicenter prospective studies and appropriate management of the privacy issues raised by electronic health record (EHR) data are still needed. Nonetheless, we conclude that applying ML algorithms and RO models to EHR data may help obtain prognostic information and has the potential to revolutionize the practice of personalized medicine.
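The abstract summarizes discrimination for progression-free survival as a C-index. The record does not say how the authors computed it, so the following is only a minimal sketch of the standard Harrell's concordance index, assuming right-censored follow-up times, a 0/1 progression flag and a numeric risk score where higher means earlier expected progression. The function name and toy data are hypothetical, and pairs tied in follow-up time are simply skipped in this simplified version.

```python
from itertools import combinations


def harrell_c_index(times, events, risks):
    """Harrell's concordance index for right-censored survival data.

    times:  follow-up time for each patient
    events: 1 if progression was observed, 0 if censored
    risks:  predicted risk score (higher = expected to progress sooner)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` is the patient with the shorter follow-up.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Comparable only if the earlier time is an observed event;
        # ties in follow-up time are skipped in this simplified sketch.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0
        elif risks[a] == risks[b]:
            concordant += 0.5  # tied predictions count as half-concordant
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: follow-up in years, progression flag, model risk score
print(harrell_c_index([1.0, 2.5, 3.0, 4.2], [1, 1, 0, 1], [0.9, 0.4, 0.6, 0.1]))  # prints 0.8
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ranking of who progresses first; survival libraries usually also handle tied times, which this sketch does not.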
Keywords: artificial intelligence; breast cancer prognosis; decision support systems; machine learning
Year: 2019 PMID: 30866535 PMCID: PMC6468737 DOI: 10.3390/cancers11030328
Source DB: PubMed Journal: Cancers (Basel) ISSN: 2072-6694 Impact factor: 6.639
Analytical performance of machine learning with random optimization in the training set.
| ML Predictor | AUC (SE) | 95% CI | Sensitivity, % (95% CI) | Specificity, % (95% CI) | +LR | −LR |
|---|---|---|---|---|---|---|
|  | 0.778 (0.0290) | 0.728–0.822 | 67.1 (55.4–77.5) | 88.4 (83.7–92.2) | 5.80 | 0.37 |
|  | 0.769 (0.0293) | 0.719–0.814 | 65.8 (54.0–76.3) | 88.0 (83.2–91.8) | 5.49 | 0.39 |
|  | 0.767 (0.0293) | 0.717–0.813 | 67.1 (55.4–77.5) | 86.4 (81.4–90.4) | 4.92 | 0.38 |
|  | 0.759 (0.0296) | 0.708–0.805 | 65.8 (54.0–76.3) | 86.0 (80.9–90.1) | 4.68 | 0.40 |
|  | 0.759 (0.0296) | 0.708–0.805 | 65.8 (54.0–76.3) | 86.0 (80.9–90.1) | 4.68 | 0.40 |
|  | 0.755 (0.0297) | 0.703–0.801 | 65.8 (54.0–76.3) | 85.1 (80.0–89.4) | 4.42 | 0.40 |
|  | 0.753 (0.0297) | 0.701–0.799 | 65.8 (54.0–76.3) | 84.7 (79.5–89.0) | 4.30 | 0.40 |
|  | 0.748 (0.0299) | 0.697–0.795 | 64.5 (52.7–75.1) | 85.1 (80.0–89.4) | 4.33 | 0.42 |
|  | 0.739 (0.0302) | 0.687–0.786 | 61.8 (50.0–72.8) | 86.0 (80.9–90.1) | 4.40 | 0.44 |
|  | 0.722 (0.0306) | 0.669–0.770 | 59.2 (47.3–70.4) | 85.1 (80.0–89.4) | 3.98 | 0.48 |
AUC: Area under the curve; CI: Confidence interval; LR: Likelihood ratio; ML: Machine learning; RO: Random optimization.
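The likelihood-ratio columns follow from sensitivity and specificity through the usual definitions +LR = sensitivity / (1 − specificity) and −LR = (1 − sensitivity) / specificity. The sketch below, with a hypothetical function name, reproduces that arithmetic for the first row of the training-set table, taking the published percentages as proportions.

```python
def likelihood_ratios(sensitivity: float, specificity: float) -> tuple[float, float]:
    """Return (+LR, -LR) from sensitivity and specificity given as proportions (0-1)."""
    if not 0.0 < specificity < 1.0:
        raise ValueError("specificity must be strictly between 0 and 1")
    positive_lr = sensitivity / (1.0 - specificity)  # odds multiplier for a positive result
    negative_lr = (1.0 - sensitivity) / specificity  # odds multiplier for a negative result
    return positive_lr, negative_lr


# First row of the training-set table: sensitivity 67.1%, specificity 88.4%
plr, nlr = likelihood_ratios(0.671, 0.884)
print(f"+LR = {plr:.2f}, -LR = {nlr:.2f}")  # prints +LR = 5.78, -LR = 0.37
```

The small gap from the tabulated +LR of 5.80 presumably reflects rounding of the published sensitivity and specificity, since the ratio is very sensitive to the specificity when it is close to 1.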
Weights of attribute groups in the training set.
| Method | Group 1 | Group 2 | Group 3 | Group 4 | Group 5 | Sum of the Weights | Normalized Group 1 | Normalized Group 2 | Normalized Group 3 | Normalized Group 4 | Normalized Group 5 |
|---|---|---|---|---|---|---|---|---|---|---|---|
|  | 0.41890 | 1.04551 | 0.60311 | 0.33909 | 0.58969 | 2.996321 | 0.13980 | 0.34893 | 0.20128 | 0.11316 | 0.19680 |
|  | 0.77299 | 1.86062 | 1.39445 | 0.90456 | 1.00740 | 5.940053 | 0.13013 | 0.31323 | 0.23475 | 0.15228 | 0.16959 |
|  | 0.42756 | 0.91373 | 1.16514 | 0.39297 | 0.58755 | 3.486968 | 0.12261 | 0.26204 | 0.33414 | 0.11269 | 0.16849 |
|  | 0.44878 | 1.28224 | 0.63075 | 0.44350 | 0.53398 | 3.339267 | 0.13439 | 0.38399 | 0.18888 | 0.13281 | 0.15991 |
|  | 0.46149 | 1.17742 | 0.55782 | 0.34141 | 0.47660 | 3.014770 | 0.15307 | 0.39055 | 0.18503 | 0.11324 | 0.15809 |
|  | 0.54682 | 1.40025 | 0.79264 | 0.59119 | 0.61023 | 3.941154 | 0.13874 | 0.35529 | 0.20112 | 0.15000 | 0.15483 |
|  | 0.64274 | 1.13249 | 0.36078 | 0.39482 | 0.45241 | 2.983255 | 0.21545 | 0.37961 | 0.12093 | 0.13234 | 0.15165 |
Data are absolute numbers for group weights. ML: Machine Learning; RO: Random Optimization.
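The tabulated values are consistent with each normalized group weight being that group's weight divided by the row's sum of weights (for the first row, 0.41890 / 2.996321 ≈ 0.13980). The record does not describe how the ML methods assign the raw weights, so this sketch, with a hypothetical function name, only reproduces the normalization step.

```python
def normalize_group_weights(group_weights):
    """Return the sum of the weights and each weight divided by that sum."""
    total = sum(group_weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return total, [w / total for w in group_weights]


# First row of the group-weight table (attribute groups 1-5)
total, normalized = normalize_group_weights([0.41890, 1.04551, 0.60311, 0.33909, 0.58969])
print(f"sum = {total:.5f}")
print(", ".join(f"{w:.5f}" for w in normalized))
```

Because the published group weights are rounded to five decimals, recomputed normalized values can differ from the table in the last digit; by construction they always sum to 1.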
Analytical performance of machine learning with random optimization in the testing set.
| Performance Parameter | ML-RO-0 | ML-RO-4 | DSS Model a |
|---|---|---|---|
|  | 0.696 | 0.677 | 0.698 |
| Accuracy | 0.853 | 0.838 | 0.860 |
|  | 0.822 | 0.813 | 0.815 |
| +LR (95% C.I.) | 9.1 (4.3–20.8) | 8.5 (3.9–19.6) | 8.6 (4.2–18.0) |
| −LR (95% C.I.) | 0.4 (0.3–0.6) | 0.4 (0.3–0.6) | 0.4 (0.2–0.5) |
| HR (95% C.I.) | 10.7 (4.6–24.8) | 10.3 (4.5–23.7) | 10.9 (4.5–26.6) |
LR: Likelihood ratio; C.I.: Confidence interval; HR: Hazard ratio; a Analytical performance was evaluated after 0/1 categorization based on the risk estimate achieved by both predictors; b The F-measure is the harmonic mean of precision (P, the positive predictive value) and recall (R, the sensitivity), calculated as 2PR/(P + R).
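Footnote b gives the F-measure as 2PR/(P + R). As a minimal sketch with hypothetical function names, precision and recall can be derived from confusion-matrix counts and then combined; the counts below are illustrative only and are not taken from the study.

```python
def precision_recall(tp, fp, fn):
    """Precision (positive predictive value) and recall (sensitivity) from counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall


def f_measure(precision, recall):
    """Harmonic mean of precision and recall: 2PR / (P + R)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical confusion-matrix counts, not taken from the study
p, r = precision_recall(tp=20, fp=5, fn=10)
print(f"P = {p:.3f}, R = {r:.3f}, F = {f_measure(p, r):.3f}")  # prints P = 0.800, R = 0.667, F = 0.727
```

Being a harmonic mean, the F-measure is pulled toward the smaller of P and R, so a classifier cannot score well by maximizing only one of them.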
Figure 1. Kaplan–Meier curves of progression-free survival (PFS) of the 136 BC women included in the testing set, comparing patients at high (>1) or low (≤1) risk of progression according to the combined decision support system (DSS) model.
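Figure 1 compares groups formed by cutting the DSS risk estimate at 1. The sketch below assumes that estimate is a single numeric score per patient, splits patients at that threshold, and computes a product-limit (Kaplan–Meier) curve for each group. Function names and data are hypothetical; the authors' software is not stated, and the hazard ratio and p-value reported in the abstract would require a Cox model and log-rank test, which are not implemented here.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Product-limit (Kaplan-Meier) estimate of the survival function.

    Returns a list of (time, S(time)) pairs, one per distinct event time.
    Censored observations (event == 0) leave the risk set without
    lowering the curve.
    """
    event_counts = Counter(t for t, e in zip(times, events) if e)
    survival = 1.0
    curve = []
    for t in sorted(event_counts):
        at_risk = sum(1 for x in times if x >= t)  # patients still under follow-up at t
        survival *= 1.0 - event_counts[t] / at_risk
        curve.append((t, survival))
    return curve


def split_by_risk(times, events, risk_scores, threshold=1.0):
    """Assign each patient to 'high' (score > threshold) or 'low' (score <= threshold)."""
    groups = {"high": ([], []), "low": ([], [])}
    for t, e, r in zip(times, events, risk_scores):
        key = "high" if r > threshold else "low"
        groups[key][0].append(t)
        groups[key][1].append(e)
    return groups


# Hypothetical toy data: years of follow-up, progression flag, risk score
times = [0.5, 1.2, 2.0, 3.1, 4.0, 5.5]
events = [1, 1, 0, 1, 1, 0]
scores = [2.3, 1.8, 0.4, 1.2, 0.7, 0.9]

groups = split_by_risk(times, events, scores)
for name, (t, e) in groups.items():
    print(name, kaplan_meier(t, e))
```

In this toy example every high-risk patient progresses, so that curve falls to 0, while the low-risk curve drops only once to 0.5.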
Clinical-pathological characteristics of breast cancer (BC) patients. Comparison between training and testing set.
| Clinical-Pathological Characteristics | Training Set (n = 318) | Testing Set (n = 136) |
|---|---|---|
| Age (years), Mean ± SD | 56 ± 13 | 57 ± 12 |
| Menopausal status, N (%) | ||
| Pre | 141 (44) | 51 (38) |
| Post | 177 (56) | 85 (63) |
| Body Mass Index, Mean ± SD | 25.2 ± 4.5 | 25.7 ± 5.2 |
| Histological diagnosis, N (%) | ||
| Ductal | 263 (83) | 121 (89) |
| Lobular | 37 (12) | 9 (7) |
| Others | 18 (5) | 6 (4) |
| Molecular Type a, N (%) | ||
| Triple-negative | 39 (12) | 17 (12) |
| Luminal-like A | 97 (31) | 37 (27) |
| Luminal-like B | 172 (54) | 77 (57) |
| HER2 pos | 10 (3) | 5 (4) |
| Grading, N (%) b | ||
| 1 | 20 (7) | 15 (13) |
| 2 | 108 (39) | 45 (38) |
| 3 | 151 (54) | 58 (49) |
| Tumor, N (%) b | ||
| T1 | 141 (50) | 59 (50) |
| T2 | 91 (33) | 42 (36) |
| T3 | 28 (10) | 5 (4) |
| T4 | 19 (7) | 12 (10) |
| Node, N (%) b | ||
| N0 | 134 (48) | 54 (46) |
| N+ | 145 (52) | 64 (54) |
| Prognostic stage, N (%) | ||
| I | 177 (56) | 70 (50) |
| II | 53 (17) | 20 (15) |
| III | 45 (14) | 26 (19) |
| IV | 4 (1) | 2 (1) |
| Metastatic | 39 (12) | 18 (13) |
| Receptor status, N (%) c | ||
| ER+/PR+ | 235 (74) | 94 (69) |
| ER+/PR− | 29 (9) | 19 (14) |
| ER−/PR+ | 5 (2) | 1 (1) |
| ER−/PR− | 49 (15) | 22 (16) |
| HER2/neu+, N (%) c | 66 (21) | 34 (25) |
| Ki67 proliferation index ≥20%, N (%) c | 204 (67) | 93 (71) |
| Type 2 Diabetes, N (%) | 39 (12) | 11 (8) |
| Glucose metabolic profile d | ||
| Fasting blood glucose (mg/dl), Mean ± SD | 105 ± 31 | 102 ± 32 |
| Fasting insulin (µIU/ml), Median (IQR) | 11.9 (6.4–27.0) | 10.6 (5.6–19.6) |
| HbA1c (%), Mean ± SD | 5.8 ± 0.8 | 5.8 ± 0.7 |
| HOMA Index, Median (IQR) | 3.0 (1.4–8.3) | 2.9 (1.2–6.3) |
| Follow-up (years) | ||
| Mean (range) | 3.4 (0.29–10.5) | 3.5 (0.26–9.65) |
a According to St. Gallen Consensus Conference. b Evaluated at time of diagnosis. c Evaluated in a population of 397 primary breast cancer patients. d Evaluated at time of enrollment and prior to any treatment. ER/PR: estrogen/progesterone receptors; HER2: Human epidermal growth factor receptor 2; IQR: Interquartile range; HbA1c: Glycosylated hemoglobin; HOMA Index: Homeostasis model assessment index.
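The footnote expands HOMA but does not give its formula. Assuming the widely used HOMA-IR approximation (fasting glucose × fasting insulin / 405 with glucose in mg/dl, or / 22.5 with glucose in mmol/l), the index could be computed per patient from the units used in the table. The function names are hypothetical, and it is not confirmed here that the study used exactly this variant.

```python
def homa_ir_mg_dl(fasting_glucose_mg_dl, fasting_insulin_uiu_ml):
    """HOMA-IR with glucose in mg/dl and insulin in µIU/ml (assumed standard formula)."""
    return fasting_glucose_mg_dl * fasting_insulin_uiu_ml / 405.0


def homa_ir_mmol_l(fasting_glucose_mmol_l, fasting_insulin_uiu_ml):
    """Same index with glucose in mmol/l; 405 / 18 = 22.5."""
    return fasting_glucose_mmol_l * fasting_insulin_uiu_ml / 22.5


# Hypothetical patient: glucose 90 mg/dl (about 5.0 mmol/l), insulin 9 µIU/ml
print(homa_ir_mg_dl(90, 9), homa_ir_mmol_l(5.0, 9))  # prints 2.0 2.0
```

Because HOMA is computed per patient, the median HOMA in the table cannot be recovered by plugging the group mean glucose and median insulin into the formula.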
Features included in the model.
| Patient-Related | Tumor-Related | Biochemical |
|---|---|---|