Abstract
For a patient affected by breast cancer, after tumor removal it is necessary to decide which adjuvant therapy can prevent tumor relapse and the formation of metastases. Predicting the outcome of adjuvant therapy for an individual patient is hard because of the heterogeneous nature of the disease. We devised a methodology for predicting 5-year survival based on the new machine learning paradigm of coherent voting networks, with improved accuracy over state-of-the-art prediction methods. The 'coherent voting communities' metaphor provides a certificate justifying the survival prediction for an individual patient, thus facilitating its acceptance in practice, in the vein of explainable artificial intelligence. The proposed method is flexible and applicable to other types of cancer.
Year: 2021 PMID: 34282236 PMCID: PMC8289832 DOI: 10.1038/s41598-021-94243-z
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Distribution of patient categorical features over training, validation, and testing sets.
| Category | Train n | Train % | Validation n | Validation % | Testing n | Testing % |
|---|---|---|---|---|---|---|
| Num | 326 | – | 186 | – | 200 | – |
| NEG | 113 | 34% | 56 | 30% | 67 | 33% |
| POS | 213 | 65% | 130 | 69% | 133 | 66% |
| No data | 4 | – | 4 | – | 2 | – |
| Num | 240 | – | 141 | – | 155 | – |
| 1 | 50 | 20% | 25 | 17% | 27 | 17% |
| 0 | 1 | 0% | 0 | 0% | 0 | 0% |
| 3 | 27 | 11% | 25 | 17% | 25 | 16% |
| 2 | 161 | 67% | 91 | 64% | 100 | 64% |
| 4 | 1 | 0% | 0 | 0% | 3 | 1% |
| No data | 90 | – | 49 | – | 47 | – |
| Num | 323 | – | 185 | – | 197 | – |
| 1 | 19 | 5% | 5 | 2% | 9 | 4% |
| 3 | 207 | 64% | 120 | 64% | 127 | 64% |
| 2 | 97 | 30% | 60 | 32% | 61 | 30% |
| No data | 7 | – | 5 | – | 5 | – |
| Num | 328 | – | 187 | – | 202 | – |
| Normal | 27 | 8% | 15 | 8% | 10 | 4% |
| Basal | 75 | 22% | 27 | 14% | 32 | 15% |
| Her2 | 52 | 15% | 27 | 14% | 39 | 19% |
| LumB | 67 | 20% | 50 | 26% | 42 | 20% |
| Claudin-low | 37 | 11% | 23 | 12% | 29 | 14% |
| LumA | 70 | 21% | 45 | 24% | 50 | 24% |
| No data | 2 | – | 3 | – | 0 | – |
| Num | 326 | – | 186 | – | 200 | – |
| MASTECTOMY | 209 | 64% | 118 | 63% | 130 | 65% |
| BREAST-CONSERVING | 117 | 35% | 68 | 36% | 70 | 35% |
| No data | 4 | – | 4 | – | 2 | – |
| Num | 330 | – | 190 | – | 202 | – |
| IDC+ILC | 11 | 3% | 14 | 7% | 10 | 4% |
| IDC-MUC | 6 | 1% | 6 | 3% | 5 | 2% |
| ILC | 25 | 7% | 9 | 4% | 15 | 7% |
| OTHER-INVASIVE | 1 | 0% | 1 | 0% | 0 | 0% |
| OTHER | 1 | 0% | 0 | 0% | 0 | 0% |
| IDC-MED | 6 | 1% | 2 | 1% | 3 | 1% |
| INVASIVE-TUMOUR | 3 | 0% | 0 | 0% | 1 | 0% |
| IDC-TUB | 6 | 1% | 3 | 1% | 3 | 1% |
| DCIS | 1 | 0% | 0 | 0% | 0 | 0% |
| IDC | 270 | 81% | 155 | 81% | 165 | 81% |
| No data | 0 | – | 0 | – | 0 | – |
| Num | 330 | – | 190 | – | 202 | – |
| Pre | 111 | 33% | 56 | 29% | 60 | 29% |
| Post | 219 | 66% | 134 | 70% | 142 | 70% |
| No data | 0 | – | 0 | – | 0 | – |
| Num | 330 | – | 190 | – | 202 | – |
| NEUT | 224 | 67% | 131 | 68% | 137 | 67% |
| LOSS | 20 | 6% | 8 | 4% | 8 | 3% |
| GAIN | 86 | 26% | 51 | 26% | 57 | 28% |
| No data | 0 | – | 0 | – | 0 | – |
| Num | 306 | – | 181 | – | 189 | – |
| r | 140 | 45% | 92 | 50% | 87 | 46% |
| l | 166 | 54% | 89 | 49% | 102 | 53% |
| No data | 24 | – | 9 | – | 13 | – |
| Num | 330 | – | 190 | – | 202 | – |
| 4.5 | 34 | 10% | 19 | 10% | 21 | 10% |
| 10 | 78 | 23% | 36 | 18% | 33 | 16% |
| 1 | 24 | 7% | 17 | 8% | 10 | 4% |
| 3 | 33 | 10% | 22 | 11% | 23 | 11% |
| 2 | 8 | 2% | 12 | 6% | 8 | 3% |
| 5 | 49 | 14% | 27 | 14% | 30 | 14% |
| 4 | 17 | 5% | 5 | 2% | 12 | 5% |
| 7 | 20 | 6% | 11 | 5% | 8 | 3% |
| 6 | 10 | 3% | 7 | 3% | 10 | 4% |
| 9 | 25 | 7% | 16 | 8% | 19 | 9% |
| 8 | 32 | 9% | 18 | 9% | 28 | 13% |
| No data | 0 | – | 0 | – | 0 | – |
| Num | 330 | – | 190 | – | 202 | – |
| 1 | 95 | 28% | 58 | 30% | 57 | 28% |
| 3 | 115 | 34% | 61 | 32% | 73 | 36% |
| 2 | 44 | 13% | 29 | 15% | 36 | 17% |
| 5 | 27 | 8% | 19 | 10% | 14 | 6% |
| 4 | 49 | 14% | 23 | 12% | 22 | 10% |
| No data | 0 | – | 0 | – | 0 | – |
| Num | 328 | – | 189 | – | 199 | – |
| Neg | 150 | 45% | 65 | 34% | 72 | 36% |
| Pos | 178 | 54% | 124 | 65% | 127 | 63% |
| No data | 2 | – | 1 | – | 3 | – |
| Num | 292 | – | 167 | – | 185 | – |
| HER2+ | 53 | 18% | 24 | 14% | 31 | 16% |
| ER−/HER2− | 86 | 29% | 37 | 22% | 43 | 23% |
| ER+/HER2–High-Prolif | 82 | 28% | 65 | 38% | 65 | 35% |
| ER+/HER2–Low-Prolif | 71 | 24% | 41 | 24% | 46 | 24% |
| No data | 38 | – | 23 | – | 17 | – |
| Num | 320 | – | 186 | – | 200 | – |
| High | 180 | 56% | 92 | 49% | 106 | 53% |
| Moderate | 108 | 33% | 70 | 37% | 75 | 37% |
| Low | 32 | 10% | 24 | 12% | 19 | 9% |
| No data | 10 | – | 4 | – | 2 | – |
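In the categorical table, percentages are taken over patients with a recorded value (Num excludes No data), and they appear to be truncated rather than rounded: 113/326 = 34.7% is reported as 34%, and 111/330 = 33.6% as 33%. The sketch below reproduces one block of the table under that inferred rule. The function name and the use of `None` for a missing value are illustrative assumptions, not taken from the paper.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple


def tabulate(values: List[Optional[str]]) -> Tuple[int, Dict[str, Tuple[int, int]], int]:
    """Return (Num, {category: (count, percent)}, No data) for one feature in one split.

    Missing values (None) are reported as 'No data' and excluded from Num.
    Assumption inferred from the table: percentages are floored, not rounded.
    """
    missing = sum(1 for v in values if v is None)
    num = len(values) - missing
    if num == 0:
        return 0, {}, missing
    counts = Counter(v for v in values if v is not None)
    # Integer arithmetic avoids float rounding at exact percentage boundaries
    table = {cat: (n, 100 * n // num) for cat, n in counts.items()}
    return num, table, missing


# Hypothetical column rebuilt from the counts of the first block, training split
column = ["NEG"] * 113 + ["POS"] * 213 + [None] * 4
num, table, missing = tabulate(column)
print(num, table, missing)  # 326 {'NEG': (113, 34), 'POS': (213, 65)} 4
```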
Distribution of patient continuous features over training, validation, and testing sets.
| Statistic | Train | Validation | Testing |
|---|---|---|---|
| Num | 330 | 190 | 202 |
| Mean | 4.49 | 4.56 | 4.54 |
| Std dev | 1.12 | 1.00 | 1.17 |
| Median | 4.14 | 5.02 | 5.03 |
| Min | 1.03 | 2.01 | 1.05 |
| Max | 6.36 | 6.26 | 6.12 |
| Num | 330 | 190 | 202 |
| Mean | 103.24 | 110.35 | 112.99 |
| Std dev | 76.09 | 76.35 | 77.80 |
| Median | 88.88 | 102.00 | 98.40 |
| Min | 4.17 | 0.10 | 5.83 |
| Max | 337.03 | 301.23 | 322.83 |
| Num | 330 | 190 | 202 |
| Mean | 56.48 | 58.25 | 57.09 |
| Std dev | 13.29 | 14.36 | 12.69 |
| Median | 55.30 | 58.97 | 57.38 |
| Min | 28.29 | 26.72 | 21.93 |
| Max | 90.00 | 96.29 | 84.73 |
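The continuous-feature table reports Num, mean, standard deviation, median, minimum and maximum per split. Because Num equals the full split size (330, 190, 202), these features appear to have no missing values. Here is a minimal sketch of one column, computed with Python's `statistics` module. The source does not say which standard-deviation estimator was used, so the sample version is an assumption.

```python
import statistics
from typing import Dict, Sequence, Union


def summarize(values: Sequence[float]) -> Dict[str, Union[int, float]]:
    """Rows of the continuous-feature table for one feature in one split.

    Assumption: 'Std dev' is the sample standard deviation (n - 1 in the
    denominator); statistics.pstdev would give the population version.
    """
    return {
        "Num": len(values),
        "Mean": round(statistics.mean(values), 2),
        "Std dev": round(statistics.stdev(values), 2),
        "Median": round(statistics.median(values), 2),
        "Min": round(min(values), 2),
        "Max": round(max(values), 2),
    }


# Hypothetical values, not taken from the study
print(summarize([2.0, 4.0, 4.0, 6.0]))
```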
Performance of therapy-based stratification.
| Therapy | Yes-no-yes | No-no-yes | No-no-no | Yes-no-no | Yes-yes-yes |
|---|---|---|---|---|---|
| n.p. | 43 | 31 | 21 | 13 | 35 |
|  | 17 | 14 | 7 | 8 | 8 |
|  | 26 | 17 | 14 | 5 | 27 |
| n.a. | 37 | 30 | 21 | 13 | 30 |
| Sen. | 0.65 | 0.81 | 0.66 | 0.85 | 0.8 |
| Spec. | 0.92 | 0.78 | 0.8 | 0.66 | 0.84 |
| OR | 24.3 | 16.8 | 8.0 | 12.0 | 21.0 |
| OR p-val | 0.0006 | 0.002 | 0.11 | 0.1 | 0.01 |
| CI-Lo | 2.6 | 2.55 | 0.96 | 0.79 | 1.8 |
| CI-Hi | 221 | 111 | 66 | 180 | 240 |
| Kappa | 0.52 | 0.58 | 0.44 | 0.53 | 0.51 |
| AUC | 0.85 | 0.87 | 0.77 | 0.77 | 0.63 |
| AUC p-val | 0.0001 | 0.0002 | 0.02 | 0.06 | 0.13 |
| lrt p-val | 0.02 | 0.0006 | 0.06 | 0.33 | 0.03 |
| lh | 2 | 2 | 4 | 1 | 3 |
| fp | 7 | 12 | 17 | 8 | 5 |
Results on test data with automatic hyperparameter optimization and feature (gene) selection. Therapy class labels are (RAD, CHE, HOR). n.p. number of patients, n.a. number of answers, Sen. sensitivity, Spec. specificity, OR odds ratio, CI-Lo/CI-Hi lower/upper bounds of the 95% confidence interval for the OR, lrt p-val p value of the log-rank test, lh lookahead number, fp fingerprint size.
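The table reports, for each therapy class, the usual statistics of a 2×2 table of predicted versus observed outcome. The sketch below computes them from hypothetical counts. Two choices in it are assumptions that the source does not state: the Woolf (log-scale) method for the 95% confidence interval of the odds ratio, and adding 0.5 to every cell when any cell is zero.

```python
import math
from dataclasses import dataclass
from typing import Dict


@dataclass
class Confusion:
    tp: int  # predicted positive, actually positive
    fn: int  # predicted negative, actually positive
    fp: int  # predicted positive, actually negative
    tn: int  # predicted negative, actually negative


def metrics(c: Confusion, z: float = 1.96) -> Dict[str, float]:
    """Sensitivity, specificity, PPV, NPV, odds ratio with 95% CI, and Cohen's kappa."""
    n = c.tp + c.fn + c.fp + c.tn
    out = {
        "Sen.": c.tp / (c.tp + c.fn),
        "Spe.": c.tn / (c.tn + c.fp),
        "PPV": c.tp / (c.tp + c.fp),
        "NPV": c.tn / (c.tn + c.fn),
    }
    # Assumption: Haldane-Anscombe correction (+0.5 per cell) when any cell is zero,
    # since the odds ratio and its log-scale CI are otherwise undefined.
    cells = [c.tp, c.fn, c.fp, c.tn]
    if 0 in cells:
        cells = [x + 0.5 for x in cells]
    tp, fn, fp, tn = cells
    odds_ratio = (tp * tn) / (fp * fn)
    # Woolf (logit) standard error of log(OR)
    se = math.sqrt(1 / tp + 1 / fn + 1 / fp + 1 / tn)
    out["OR"] = odds_ratio
    out["CI-Lo"] = math.exp(math.log(odds_ratio) - z * se)
    out["CI-Hi"] = math.exp(math.log(odds_ratio) + z * se)
    # Cohen's kappa: observed agreement corrected for chance agreement
    p_obs = (c.tp + c.tn) / n
    p_exp = ((c.tp + c.fp) * (c.tp + c.fn) + (c.fn + c.tn) * (c.fp + c.tn)) / n ** 2
    out["Kappa"] = (p_obs - p_exp) / (1 - p_exp)
    return out


# Hypothetical confusion matrix, not taken from the study
for name, value in metrics(Confusion(tp=20, fn=10, fp=5, tn=45)).items():
    print(f"{name}: {value:.2f}")
```

Because the source does not say how zero cells or the confidence interval were handled, values from this sketch need not reproduce the reported CI-Lo/CI-Hi exactly.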
Secondary stratification by ER status.
| Type | n.p. | | | n.a. | Sen. | Spe. | OR | OR p-val | CI-Lo | CI-Hi | Kappa | lrt p-val | PPV | NPV |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Pos | 116 | 78 | 38 | 107 | 0.67 | 0.83 | 9.83 | 6.67e-07 | 3.88 | 24.93 | 0.50 | 0.001 | 0.67 | 0.83 |
| Neg | 24 | 10 | 14 | 21 | 0.86 | 0.71 | 15.00 | 0.02 | 1.63 | 138.16 | 0.57 | 0.01 | 0.86 | 0.71 |
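For a 2×2 table, the odds ratio can be written in terms of sensitivity and specificity alone. This gives a quick consistency check on the rows above. TP, FN, FP and TN denote the cells of the predicted-versus-observed table; this notation is introduced here and does not come from the source.

$$
\mathrm{OR} = \frac{TP \cdot TN}{FP \cdot FN}
= \frac{\mathrm{Sen}/(1-\mathrm{Sen})}{(1-\mathrm{Spe})/\mathrm{Spe}}
= \frac{\mathrm{Sen}\cdot\mathrm{Spe}}{(1-\mathrm{Sen})(1-\mathrm{Spe})}
$$

The identity holds since $\mathrm{Sen}/(1-\mathrm{Sen}) = TP/FN$ and $(1-\mathrm{Spe})/\mathrm{Spe} = FP/TN$.

For the ER-positive row, $0.67 \cdot 0.83 / (0.33 \cdot 0.17) = 0.5561/0.0561 \approx 9.91$, close to the reported 9.83. For the ER-negative row, $0.86 \cdot 0.71 / (0.14 \cdot 0.29) = 0.6106/0.0406 \approx 15.04$, versus the reported 15.00. The small gaps are consistent with Sen. and Spe. being rounded to two decimals.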
Secondary stratification by intrinsic status.
| Type | n.p. | | | n.a. | Sen. | Spe. | OR | OR p-val | CI-Lo | CI-Hi | Kappa | lrt p-val | PPV | NPV |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| LumA | 45 | 37 | 8 | 41 | 0.25 | 0.88 | 2.42 | 0.58 | 0.36 | 16.34 | 0.14 | 0.11 | 0.33 | 0.83 |
| LumB | 41 | 26 | 15 | 37 | 0.92 | 0.83 | 60.00 | 1.09e-05 | 5.98 | 601.61 | 0.72 | 0.05 | 0.75 | 0.95 |
| Claudin-low | 14 | 7 | 7 | 13 | 0.71 | 0.83 | 12.50 | 0.10 | 0.84 | 186.31 | 0.54 | 0.24 | 0.83 | 0.71 |
| Her2 | 22 | 12 | 10 | 20 | 0.90 | 0.60 | 13.50 | 0.06 | 1.20 | 152.22 | 0.50 | 0.79 | 0.69 | 0.86 |
| Basal | 14 | 3 | 11 | 14 | 0.82 | 0.67 | 9.00 | 0.18 | 0.52 | 155.25 | 0.43 | 0.06 | 0.90 | 0.50 |
Secondary stratification by 3-gene status.
| Type | n.p. | | | n.a. | Sen. | Spe. | OR | OR p-val | CI-Lo | CI-Hi | Kappa | lrt p-val | PPV | NPV |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| her2+ | 18 | 8 | 10 | 15 | 1.00 | 0.80 | 40.00 | 0.01 | 1.98 | 807.14 | 0.84 | 0.01 | 0.91 | 1.00 |
| er+/her2− | 98 | 68 | 30 | 90 | 0.57 | 0.84 | 6.93 | 1.37e-04 | 2.53 | 19.02 | 0.42 | 0.07 | 0.62 | 0.81 |
| er−/her2− | 16 | 6 | 10 | 15 | 0.90 | 1.00 | 45.00 | 7.62e-03 | 2.29 | 885.65 | 0.86 | 0.004 | 1.00 | 0.83 |
Secondary stratification by tumor stage.
| Type | n.p. | | | n.a. | Sen. | Spe. | OR | OR p-val | CI-Lo | CI-Hi | Kappa | lrt p-val | PPV | NPV |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 27 | 20 | 7 | 26 | 0.71 | 0.84 | 13.33 | 0.01 | 1.71 | 103.76 | 0.53 | 0.03 | 0.62 | 0.89 |
| 2 | 68 | 43 | 25 | 61 | 0.80 | 0.78 | 14.00 | 1.66e-05 | 3.99 | 49.16 | 0.57 | 0.009 | 0.71 | 0.85 |
| 3 | 13 | 5 | 8 | 13 | 0.62 | 1.00 | 8.33 | 0.14 | 0.63 | 110.03 | 0.56 | 0.02 | 1.00 | 0.62 |
Secondary stratification by tumor grade.
| Type | n.p. | | | n.a. | Sen. | Spe. | OR | OR p-val | CI-Lo | CI-Hi | Kappa | lrt p-val | PPV | NPV |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | 54 | 39 | 15 | 45 | 0.77 | 0.84 | 18.00 | 1.75e-04 | 3.62 | 89.58 | 0.59 | 0.0006 | 0.67 | 0.90 |
| 3 | 75 | 40 | 35 | 72 | 0.74 | 0.76 | 8.99 | 4.38e-05 | 3.09 | 26.13 | 0.50 | 0.02 | 0.74 | 0.76 |
Secondary stratification by lymph node status.
| Type | n.p. | | | n.a. | Sen. | Spe. | OR | OR p-val | CI-Lo | CI-Hi | Kappa | lrt p-val | PPV | NPV |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| POS | 82 | 48 | 34 | 75 | 0.70 | 0.83 | 11.50 | 4.19e-06 | 3.83 | 34.54 | 0.54 | 0.0003 | 0.77 | 0.78 |
| NEG | 61 | 41 | 20 | 56 | 0.79 | 0.81 | 16.07 | 2.41e-05 | 4.06 | 63.63 | 0.58 | 0.005 | 0.68 | 0.88 |
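The lrt p-val column in these tables, as in the therapy-based table, is the p value of a log-rank test. It presumably compares the survival of the groups separated by the prediction. The following is a minimal two-group sketch of the standard log-rank statistic. The time unit and censoring rules used by the authors are not given here, so the `(time, event)` input format is a hypothetical choice.

```python
import math
from typing import List, Tuple

Subject = Tuple[float, bool]  # (follow-up time, True if the event was observed)


def log_rank(group1: List[Subject], group2: List[Subject]) -> Tuple[float, float]:
    """Two-group log-rank test; returns (chi-square statistic, p value with 1 df)."""
    event_times = sorted({t for t, event in group1 + group2 if event})
    o_minus_e = 0.0  # observed minus expected events in group 1
    variance = 0.0
    for t in event_times:
        n1 = sum(1 for ti, _ in group1 if ti >= t)  # at risk just before t
        n2 = sum(1 for ti, _ in group2 if ti >= t)
        d1 = sum(1 for ti, e in group1 if e and ti == t)  # events at t
        d2 = sum(1 for ti, e in group2 if e and ti == t)
        n, d = n1 + n2, d1 + d2
        o_minus_e += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (n2 / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = o_minus_e ** 2 / variance
    # Survival function of chi-square with 1 df: P(Z^2 > x) = erfc(sqrt(x / 2))
    return chi2, math.erfc(math.sqrt(chi2 / 2))


# Hypothetical groups (times in years), not taken from the study
chi2, p = log_rank([(1.0, True), (2.0, True), (3.0, True)],
                   [(4.0, True), (5.0, True), (6.0, True)])
print(f"chi2 = {chi2:.2f}, p = {p:.3f}")
```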
Figure 1. Stratification by hormonal type: ER−/Her2−.
Figure 2. Stratification by hormonal type: Her2+.
Figure 3. Stratification by intrinsic type: Luminal B.
Figure 4. Stratification by lymph node status: positive.
Figure 5. Stratification by lymph node status: negative.
Kappa statistics on the training data sets for various Auto-WEKA/Weka feature-selection settings.
| Therapy | No filter | cfs-best | cfs-greedy | Corr-ranker | Gain-ranker | j48-ranker | j48-greedy | CVN (lh) |
|---|---|---|---|---|---|---|---|---|
| Yesnoyes | 0.30 (rf) | 0.52 (mp) | 0.52 (mp) | 0.23 (lo) | 0.38 (smo) | 0.33 (lwl) | 0.52 (2) | |
| Nonoyes | 0.35 (sl) | 0.34 (nb) | 0.15 (rf) | 0.22 (smo) | 0.16 (lwl) | 0.35 (bn) | 0.15 (rf) | |
| Nonono | 0.09 (rf) | 0.5 (smo) | 0.35 (ibk) | 0.35 (nb) | 0.35 (nb) | 0.50 (rf) | 0.44 (4) | |
| Yesnono | 0.53 (sgd) | 0.69 (rf) | 0.39 (lo) | 0.56 (rf) | 0.56 (rf) | 0.53 (1) | ||
| Yesyesno | −0.09 (dt) | 0.36 (rc) | 0.05 (nb) | 0.22 (lwl) | 0.10 (lwl) | 0.0 (ab) | 0.05 (nb) | |
| Yesyesyes | −0.07 (nbm) | −0.07 (ibk) | −0.01 (ibk) | 0.19 (smo) | 0.14 (lwl) | 0.26 (rss) | −0.07 (ibk) | |
| Noyesno | −0.26 (rf) | −0.03 (mp) | −0.03 (mp) | 0.11 (mp) | −0.26 (smo) | −0.22 (mp) | −0.03 (mp) | |
| Noyesyes (*) | 0.0 (rpt) | −0.53 (rf) | −0.53 (rf) | 0.13 (rf) | 0.17 (rf) | 0.23 (rf) | −0.54 (rf) |
Therapy class (RAD, CHE, HOR). lh: lookahead number, or manually determined (m). Legend for Auto-WEKA methods: rf random forest, mp multilayer perceptron, nb Naive Bayes, bn Bayes Net, sgd stochastic gradient descent, rc random committee, ibk k-nearest neighbour classifier, sl simple logistic, nbm Naive Bayes Multinomial, rpt REPTree fast decision tree learner, smo support vector machine trained with sequential minimal optimization, lo logistic regression, lwl locally weighted learning, ab AdaBoostM1, rss random subspace, dt decision table. (*) Result for the validation dataset.
Independent cohorts.
| GEO | 45255 (ch) | 45255 (ho) | 45255 (chho) | 37181 | 7390 | 2034 |
|---|---|---|---|---|---|---|
| End point | os | os | os | dfs | os | rfs |
| Therapy | No-yes-no | No-no-yes | No-yes-yes | No-no-no | No-no-no | Yes-no-no |
| n.p. | 8 | 16 | 13 | 119 | 181 | 264 |
|  | 3 | 6 | 4 | 59 | 24 | 95 |
|  | 5 | 10 | 9 | 60 | 157 | 169 |
| n.a. | 8 | 16 | 13 | 106 | 179 | 258 |
| Kappa | 1.0 | 0.58 | 1.0 | 0.35 | 0.36 | 0.12 |
| Sen. | 1.0 | 0.8 | 1.0 | 0.66 | 0.7 | 0.90 |
| Spe. | 1.0 | 0.81 | 1.0 | 0.70 | 0.89 | 0.67 |
| OR | 15.0 | 18.0 | 36.0 | 4.59 | 20.86 | 20.12 |
| CI-Lo | 0.66 | 1.24 | 1.77 | 2.01 | 4.93 | 2.53 |
| CI-Hi | 339 | 260 | 731 | 10 | 88.2 | 159 |
| OR p-val | 0.19 | 0.03 | 0.01 | 3.7E−4 | 3.0E−5 | 1.8E−4 |
| AUC | 1.0 | 0.89 | 1.0 | 0.72 | 0.70 | 0.63 |
| AUC p-val | 0.01 | 0.005 | 0.003 | 2.6E−5 | 7.5E−4 | 1.0E−4 |
Results of leave-one-out evaluation with optimal multigene fingerprints derived from the METABRIC data sets. Therapy class: (RAD, CHE, HOR). End point: os overall survival, dfs disease-free survival, rfs relapse-free survival. n.p. number of patients, n.a. number of answers. CI-Lo/CI-Hi: bounds of the 95% confidence interval for the odds ratio.
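In standard leave-one-out evaluation, each patient is predicted by a model built on the remaining patients of the same cohort. Here, the gene fingerprint is fixed in advance from METABRIC. The loop below sketches that protocol together with AUC computed as the Mann-Whitney statistic. The nearest-centroid scorer is a hypothetical placeholder chosen for brevity. It is not the coherent voting network method, whose details are not in this section. The data and label meanings are invented.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]
Scorer = Callable[[List[Vector], List[int], Vector], float]


def nearest_centroid_score(train_x: List[Vector], train_y: List[int], x: Vector) -> float:
    """Hypothetical stand-in scorer (NOT the coherent voting network): positive when x
    is closer to the class-1 centroid than to the class-0 centroid.
    Requires at least one training sample of each class."""
    def centroid(label: int) -> List[float]:
        rows = [r for r, y in zip(train_x, train_y) if y == label]
        return [sum(col) / len(rows) for col in zip(*rows)]

    return math.dist(x, centroid(0)) - math.dist(x, centroid(1))


def leave_one_out(xs: List[Vector], ys: List[int], score: Scorer) -> List[float]:
    """Score each sample with a model fitted on all the other samples."""
    return [score(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:], xs[i]) for i in range(len(xs))]


def auc(scores: Sequence[float], ys: Sequence[int]) -> float:
    """Mann-Whitney AUC: chance a random class-1 sample outscores a class-0 one (ties = 1/2)."""
    pos = [s for s, y in zip(scores, ys) if y == 1]
    neg = [s for s, y in zip(scores, ys) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical two-gene profiles; class labels are illustrative only
xs = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [1.0, 0.9], [0.9, 1.1], [1.1, 1.0]]
ys = [0, 0, 0, 1, 1, 1]
print(auc(leave_one_out(xs, ys, nearest_centroid_score), ys))  # 1.0
```

Keeping the scorer as a plain function argument lets the same loop wrap any classifier. The sketch does not compute the AUC p-value reported in the table.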