André Pfob1,2, Chris Sidey-Gibbons2,3, Richard G Barr4, Volker Duda5, Zaher Alwafai6, Corinne Balleyguier7, Dirk-André Clevert8, Sarah Fastner1, Christina Gomez1, Manuela Goncalo9, Ines Gruber10, Markus Hahn10, André Hennigs1, Panagiotis Kapetas11, Sheng-Chieh Lu2,3, Juliane Nees1, Ralf Ohlinger6, Fabian Riedel1, Matthieu Rutten12,13, Benedikt Schaefgen1, Maximilian Schuessler14, Anne Stieber1, Riku Togawa1, Mitsuhiro Tozaki15, Sebastian Wojcinski16, Cai Xu2,3, Geraldine Rauch17, Joerg Heil1, Michael Golatta18.
Abstract
OBJECTIVES: AI-based algorithms for medical image analysis showed comparable performance to human image readers. However, in practice, diagnoses are made using multiple imaging modalities alongside other data sources. We determined the importance of this multi-modal information and compared the diagnostic performance of routine breast cancer diagnosis to breast ultrasound interpretations by humans or AI-based algorithms.
Keywords: Artificial intelligence; Breast cancer; Machine learning; Ultrasonography
Year: 2022 PMID: 35175381 PMCID: PMC9123064 DOI: 10.1007/s00330-021-08519-z
Source DB: PubMed Journal: Eur Radiol ISSN: 0938-7994 Impact factor: 7.034
Distribution of baseline and outcomes variables in the whole cohort and in the development and validation datasets
| Variable | Whole cohort (n = 1288) | Development set (n = 915) | Validation set (n = 373) | p value** |
|---|---|---|---|---|
| Patient age —yr. (SD) | 46.5 (16.0) | 47.0 (16.0) | 45.3 (16.0) | 0.089# |
| - < 50 years —no. (%) | 785 (60.9) | 549 (60.0) | 236 (63.3) | 0.304 |
| - ≥ 50 years —no. (%) | 503 (39.1) | 366 (40.0) | 137 (36.7) | 0.304 |
| Clinically suspicious palpability | | | | < 0.001* |
| No —no. (%) | 631 (49.0) | 495 (54.1) | 136 (36.5) | |
| Yes —no. (%) | 657 (51.0) | 420 (45.9) | 237 (63.5) | |
| Breast mass dimensions on B-mode breast ultrasound | | | | |
| Mass size in longest axis —mm. (SD) | 31.2 (14.6) | 30.9 (14.7) | 31.8 (14.5) | 0.317# |
| Mass size in perpendicular plane —mm. (SD) | 19.9 (10.4) | 18.1 (9.2) | 24.2 (11.9) | < 0.001# |
| Mass size in orthogonal plane —mm. (SD) | 23.8 (12.5) | 24.6 (12.9) | 22.0 (11.2) | < 0.001# |
| Tissue composition | | | | |
| Homogeneous background texture; fat —no. (%) | 259 (20.1) | 199 (21.7) | 60 (16.1) | 0.026* |
| Heterogeneous background texture —no. (%) | 563 (43.7) | 395 (43.2) | 168 (45.0) | 0.581 |
| Homogeneous background texture; fibroglandular —no. (%) | 466 (36.2) | 321 (35.1) | 145 (38.9) | 0.222 |
| Shape of mass | | | | |
| Oval —no. (%) | 906 (70.3) | 654 (71.5) | 252 (67.6) | 0.184 |
| Round —no. (%) | 68 (5.3) | 43 (4.7) | 25 (6.7) | 0.187 |
| Irregular —no. (%) | 314 (24.4) | 218 (23.8) | 96 (25.7) | 0.513 |
| Orientation of mass | | | | 0.948 |
| Parallel —No. (%) | 1046 (81.2) | 744 (81.3) | 302 (81.0) | |
| Not parallel —No. (%) | 242 (18.8) | 171 (18.7) | 71 (19.0) | |
| Margin of mass | | | | 0.011 |
| Circumscribed —No. (%) | 619 (48.1) | 461 (50.4) | 158 (42.4) | |
| Non-Circumscribed —No. (%) | 669 (51.9) | 454 (49.6) | 215 (57.6) | |
| - Microlobulated margin —no. (%) | 69 (5.4) | 48 (5.2) | 21 (5.6) | 0.888 |
| - Indistinct margin —no. (%) | 573 (44.5) | 391 (42.7) | 182 (48.8) | 0.054 |
| - Angular margin —no. (%) | 99 (7.7) | 71 (7.8) | 28 (7.5) | 0.969 |
| - Spiculated margin —no. (%) | 19 (1.5) | 13 (1.4) | 6 (1.6) | 1.00 |
| Echo pattern | | | | |
| Anechoic —no. (%) | 2 (0.2) | 0 (0.0) | 2 (0.5) | 0.151 |
| Complex cystic and solid —no. (%) | 21 (1.6) | 16 (1.7) | 5 (1.3) | 0.778 |
| Hypoechoic —no. (%) | 1065 (82.7) | 775 (84.7) | 290 (77.7) | 0.004* |
| Isoechoic —no. (%) | 101 (7.8) | 62 (6.8) | 39 (10.5) | 0.035* |
| Heterogeneous —no. (%) | 90 (7.0) | 57 (6.2) | 33 (8.8) | 0.121 |
| Hyperechoic —no. (%) | 9 (0.7) | 5 (0.5) | 4 (1.1) | 0.510 |
| Posterior features | | | | |
| None —no. (%) | 759 (58.9) | 522 (57.0) | 237 (63.5) | 0.037* |
| Enhancement —no. (%) | 370 (28.7) | 275 (30.1) | 95 (25.5) | 0.114 |
| Combined pattern —no. (%) | 15 (1.2) | 9 (1.0) | 6 (1.6) | 0.508 |
| Shadowing —no. (%) | 144 (11.2) | 109 (11.9) | 35 (9.4) | 0.227 |
| Calcification | | | | 0.552 |
| No calcification —no. (%) | 1231 (95.6) | 877 (95.8) | 354 (94.9) | |
| Calcification —no. (%) | 57 (4.4) | 38 (4.2) | 19 (5.1) | |
| Histopathological results | | | | 0.010 |
| Benign —no. (%) | 920 (71.4) | 673 (73.6) | 247 (66.2) | |
| - Fibroadenoma —no. (%) | 528 (57.5) | 381 (72.2) | 147 (27.8) | 0.504 |
| - Lipoma —no. (%) | 4 (0.4) | 3 (75.0) | 1 (25.0) | 1 |
| - Atypia —no. (%) | 1 (0.1) | 1 (100) | 0 (0.0) | 1 |
| - Condense cyst —no. (%) | 65 (7.1) | 55 (84.6) | 10 (15.4) | 0.043 |
| - Other —no. (%) | 320 (34.9) | 231 (72.2) | 89 (27.8) | 0.708 |
| Malignant —no. (%) | 368 (28.6) | 242 (26.4) | 126 (33.8) | |
| - No special type —no. (%) | 250 (67.9) | 154 (61.6) | 96 (38.4) | 0.020 |
| - Invasive lobular carcinoma —no. (%) | 25 (6.8) | 13 (52.0) | 12 (48.0) | 0.199 |
| - Invasive tubular carcinoma —no. (%) | 7 (1.9) | 7 (100) | 0 (0.0) | 0.127 |
| - Medullary carcinoma —no. (%) | 5 (1.4) | 2 (40.0) | 3 (60.0) | 0.455 |
| - Papillary carcinoma —no. (%) | 12 (3.3) | 11 (91.7) | 1 (8.3) | 0.107 |
| - Ductal carcinoma in situ —no. (%) | 26 (7.1) | 22 (84.6) | 4 (15.4) | 0.059 |
| - Other —no. (%) | 43 (11.7) | 33 (76.7) | 10 (23.3) | 0.149 |
p values refer to chi-square tests for binary feature evaluation (feature true vs. feature not true)
# p values refer to t-test to evaluate mean differences of continuous data
** p values refer to differences in the development and validation sets
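The chi-square comparison between the development and validation sets can be sketched from the counts in the table. The snippet below is a minimal, standard-library illustration and not the authors' code: the function name `chi2_2x2_yates` is hypothetical, and the use of a Yates continuity correction is an assumption made here, since the source does not say which variant of the test was applied.

```python
import math

def chi2_2x2_yates(a, b, c, d):
    """Yates-corrected chi-square test for the 2x2 table [[a, b], [c, d]].

    Returns (statistic, p_value) for 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    # Continuity correction (assumption): shrink |ad - bc| by n/2, not below zero.
    diff = max(abs(a * d - b * c) - n / 2, 0.0)
    stat = n * diff ** 2 / (row1 * row2 * col1 * col2)
    # Survival function of chi-square with 1 df: P(X > s) = erfc(sqrt(s / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Counts from the table; first row = development set, second row = validation set.
orientation = (744, 171, 302, 71)    # parallel, not parallel
palpability = (495, 420, 136, 237)   # no, yes

_, p_orientation = chi2_2x2_yates(*orientation)
_, p_palpability = chi2_2x2_yates(*palpability)
print(f"orientation p = {p_orientation:.3f}")
print(f"palpability p = {p_palpability:.2e}")
```

Under these assumptions, the orientation comparison gives a p value close to the tabulated 0.948, and the palpability comparison falls well below 0.001.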
Diagnostic performance of routine clinical breast diagnosis, the three ultrasound experts, the unimodal ultrasound machine learning algorithms, and the multi-modal ultrasound machine learning algorithms in the validation set
| Reader or model | AUROC – value (95% CI) | Sensitivity – % (95% CI); no. | Specificity – % (95% CI); no. | Negative predictive value – % (95% CI); no. | Positive predictive value – % (95% CI); no. |
|---|---|---|---|---|---|
| Clinical routine | 0.95 (0.93 – 0.97) | 100 (97.1 – 100), 126 of 126 | 35.6 (29.7 – 41.9), 88 of 247 | 100 (95.9 – 100), 88 of 88 | 44.2 (38.6 – 50.2), 126 of 285 |
| US expert 1 | 0.82 (0.77 – 0.87) | 88.1 (81.1 – 93.2), 111 of 126 | 49.4 (43.0 – 55.8), 122 of 247 | 89.1 (82.6 – 93.7), 122 of 137 | 47.0 (40.5 – 53.6), 111 of 236 |
| US expert 2 | 0.82 (0.77 – 0.87) | 96.0 (91.0 – 98.7), 121 of 126 | 24.3 (19.1 – 30.1), 60 of 247 | 92.3 (83.0 – 97.5), 60 of 65 | 39.3 (33.8 – 45.0), 121 of 308 |
| US expert 3 | 0.84 (0.79 – 0.89) | 91.3 (84.9 – 95.6), 115 of 126 | 31.2 (25.4 – 37.4), 77 of 247 | 87.5 (78.7 – 93.6), 77 of 88 | 40.4 (34.6 – 46.3), 115 of 285 |
| Unimodal ultrasound ML algorithms* | |||||
| Logistic regression with elastic net penalty | 0.83 (0.78 – 0.87) | 100 (97.1 – 100), 126 of 126 | 9.3 (6.0 – 13.6), 23 of 247 | 100 (85.2 – 100), 23 of 23 | 36.0 (31.0 – 41.3), 126 of 350 |
| XGBoost tree | 0.82 (0.77 – 0.86) | 100 (97.1 – 100), 126 of 126 | 18.2 (13.6 – 23.6), 45 of 247 | 100 (92.1 – 100), 45 of 45 | 38.4 (33.1 – 43.9), 126 of 328 |
| Multi-modal ultrasound ML algorithms** | |||||
| Logistic regression with elastic net penalty | 0.90 (0.87 – 0.93) | 100 (97.1 – 100), 126 of 126 | 27.1 (21.7 – 33.1), 67 of 247 | 100 (94.6 – 100), 67 of 67 | 41.2 (35.6 – 46.9), 126 of 306 |
| XGBoost tree | 0.89 (0.85 – 0.92) | 100 (97.1 – 100), 126 of 126 | 19.0 (14.3 – 24.5), 47 of 247 | 100 (92.5 – 100), 47 of 47 | 38.7 (33.3 – 44.2), 126 of 326 |
* Trained on ultrasound features
** Trained on ultrasound features as well as patient age and palpability
AUROC, area under the receiver operating characteristic curve; CI, confidence interval; ML, machine learning; US, ultrasound
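The sensitivity, specificity, and predictive values in the table follow directly from the reported counts. The sketch below recomputes them for the clinical routine row. The function name `diagnostic_metrics` is hypothetical, and the false-positive count (159) is derived here as 247 benign cases minus 88 true negatives rather than quoted from the source.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Return sensitivity, specificity, NPV, and PPV as fractions."""
    return {
        "sensitivity": tp / (tp + fn),   # malignant cases correctly flagged
        "specificity": tn / (tn + fp),   # benign cases correctly cleared
        "npv": tn / (tn + fn),           # negative calls that are truly benign
        "ppv": tp / (tp + fp),           # positive calls that are truly malignant
    }

# Clinical routine, validation set: 126 of 126 malignant and 88 of 247 benign
# cases were classified correctly, leaving 0 false negatives and 159 false positives.
clinical_routine = diagnostic_metrics(tp=126, fn=0, tn=88, fp=159)
for name, value in clinical_routine.items():
    print(f"{name}: {100 * value:.1f}%")
```

The confidence intervals in the table are not reproduced here; they would need an interval method that the source does not name.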
Fig. 1. Performance comparison between the clinical routine, the ultrasound experts, the unimodal machine learning algorithms, and the multi-modal machine learning algorithms
Fig. 2. Receiver operating characteristic curves of the clinical routine, the ultrasound experts, the unimodal machine learning algorithms, and the multi-modal machine learning algorithms
Predictive coefficients of the uni- and multi-modal logistic regression with elastic net penalty algorithms
| Variable | Regularized coefficients* for unimodal algorithm (95% CI) | Regularized coefficients* for multi-modal algorithm (95% CI) |
|---|---|---|
| Clinical information | ||
| Patient age | NA | 7.60 (7.53 to 7.73) |
| Clinically suspicious palpability | NA | 0.84 (0.56 to 1.28) |
| Breast mass dimensions on B-mode breast ultrasound | ||
| Mass size in longest axis | − 0.51 (− 0.58 to − 0.43) | − 1.04 (− 1.11 to − 0.96) |
| Mass size in perpendicular plane | 0.65 (0.54 to 0.75) | 1.50 (1.41 to 1.60) |
| Mass size in orthogonal plane | − 0.15 (− 0.21 to − 0.09) | 0.00 (− 0.06 to 0.06) |
| Tissue composition | ||
| Homogeneous background texture; fat | 0.09 (− 0.42 to 0.61) | − 0.01 (− 0.53 to 0.51) |
| Heterogeneous background texture | 0.0 (− 0.45 to 0.46) | 0.00 (− 0.45 to 0.46) |
| Homogeneous background texture; fibroglandular | − 0.07 (− 0.76 to 0.62) | 0.00 (− 0.69 to 0.70) |
| Shape of mass | ||
| Oval | − 0.35 (− 0.65 to − 0.06) | − 0.04 (− 0.34 to 0.31) |
| Round | − 0.01 (− 0.48 to 0.40) | − 0.32 (− 0.73 to − 0.09) |
| Irregular | 0.42 (− 0.12 to 0.96) | 0.54 (0.0 to 1.08) |
| Orientation of mass | ||
| Not parallel | 0.61 (0.25 to 0.97) | 0.88 (0.52 to 1.24) |
| Margin of mass | ||
| Non-circumscribed | 0.45 (0.11 to 0.79) | 0.55 (0.21 to 0.89) |
| Microlobulated margin | 0.50 (0.29 to 0.71) | 0.62 (0.41 to 0.83) |
| Indistinct margin | 0.43 (− 0.41 to 1.24) | 0.51 (− 0.33 to 1.35) |
| Angular margin | 0.50 (0.24 to 0.76) | 0.55 (0.29 to 0.81) |
| Spiculated margin | 0.96 (0.07 to 1.85) | 1.10 (0.21 to 1.99) |
| Echo pattern | ||
| Anechoic | 0.0 (− 2.3 to 2.3) | 0.0 (− 2.3 to 2.3) |
| Complex cystic and solid | 0.07 (− 0.52 to 0.69) | 0.11 (− 0.51 to 0.73) |
| Hypoechoic | 0.0 (− 0.7 to 0.7) | 0.0 (− 0.6 to 0.7) |
| Isoechoic | 0.13 (− 0.33 to 0.59) | 0.20 (− 0.26 to 0.66) |
| Heterogeneous | − 0.05 (− 0.56 to 0.46) | − 0.23 (− 0.74 to 0.28) |
| Hyperechoic | − 0.32 (− 1.03 to 0.41) | − 0.58 (− 1.61 to 0.15) |
| Posterior features | ||
| None | − 0.09 (− 0.33 to 0.15) | − 0.20 (− 0.44 to 0.04) |
| Enhancement | 0.0 (− 0.81 to 0.81) | 0.0 (− 0.80 to 0.81) |
| Combined pattern | − 0.50 (− 1.62 to 0.62) | − 0.84 (− 1.96 to 0.18) |
| Shadowing | 0.31 (0.14 to 0.48) | 0.26 (0.09 to 0.43) |
| Calcification | ||
| Calcification | 0.48 (0.25 to 0.61) | 0.76 (0.53 to 0.99) |
* Positive values indicate a positive association with malignancy
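For orientation, logistic regression with an elastic net penalty is commonly written as the penalized objective below. This is the standard textbook parametrization, not a form confirmed by the source. The symbols \lambda (overall penalty strength) and \alpha (mixing weight between the L1 and L2 terms) are assumptions introduced here, and the source does not report their values.

```latex
(\hat{\beta}_0, \hat{\beta}) = \arg\min_{\beta_0, \beta}
  \left\{ -\frac{1}{n} \sum_{i=1}^{n}
      \left[ y_i \eta_i - \log\left(1 + e^{\eta_i}\right) \right]
    + \lambda \left( \alpha \lVert \beta \rVert_1
    + \frac{1 - \alpha}{2} \lVert \beta \rVert_2^2 \right) \right\},
\qquad \eta_i = \beta_0 + x_i^{\top} \beta
```

Under this form, the L1 term can shrink individual coefficients exactly to zero, which would be consistent with the several 0.0 entries in the table.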
Fig. 3. Ultrasound images. a The three ultrasound experts rated this patient’s ultrasound images as showing a benign breast mass, whereas full clinical breast evaluation rated the mass as malignant. This patient was 41 years old with a positive family history of breast cancer and a clinically suspicious palpable tumor. Histopathology showed a luminal B, NST, G3 carcinoma. b Both the three physician experts and full clinical breast evaluation rated this patient’s ultrasound images as showing a benign breast mass. This patient was 25 years old without any clinically suspicious signs. Histopathology showed a fibroadenoma
Traditional multivariable logistic regression
| Variable | Odds ratio (95% CI) | p value |
|---|---|---|
| Clinical information | ||
| Patient age | 1.10 (1.09 – 1.12) | < 0.001 |
| Clinically suspicious palpability | ||
| No | 1 [reference] | |
| Yes | 3.90 (2.59 – 5.94) | < 0.001 |
| Breast mass dimensions on B-mode breast ultrasound | ||
| Mass size in longest axis | 0.89 (0.82 – 0.95) | 0.001 |
| Mass size in perpendicular plane | 1.12 (1.05 – 1.20) | < 0.001 |
| Mass size in orthogonal plane | 1.05 (1.00 – 1.12) | 0.098 |
| Tissue composition | ||
| Homogeneous background texture; fat | 1 [reference] | |
| Heterogeneous background texture | 1.21 (0.76 – 1.93) | 0.432 |
| Homogeneous background texture; fibroglandular | 1.02 (0.61 – 1.69) | 0.947 |
| Shape of mass | ||
| Oval | 1 [reference] | |
| Round | 0.86 (0.39 – 1.84) | 0.697 |
| Irregular | 1.36 (0.82 – 2.27) | 0.232 |
| Orientation of mass | ||
| Parallel | 1 [reference] | |
| Not parallel | 2.66 (1.59 – 4.48) | < 0.001 |
| Margin of mass | ||
| Circumscribed | 1 [reference] | |
| Non-circumscribed | 2.67 (1.27 – 5.55) | 0.009 |
| Microlobulated margin | 2.42 (1.17 – 5.04) | 0.017 |
| Indistinct margin | 1.42 (0.74 – 2.80) | 0.298 |
| Angular margin | 2.11 (1.08 – 4.21) | 0.031 |
| Spiculated margin | 2.69 (0.44 – 5.25) | 0.372 |
| Echo pattern | ||
| Anechoic | 1 [reference] | |
| Complex cystic and solid | - | - |
| Hypoechoic | - | - |
| Isoechoic | - | - |
| Heterogeneous | - | - |
| Hyperechoic | - | - |
| Posterior features | ||
| None | 1 [reference] | |
| Enhancement | 1.40 (0.88 – 2.21) | 0.152 |
| Combined pattern | 1.03 (0.16 – 5.07) | 0.969 |
| Shadowing | 2.17 (1.23 – 3.83) | 0.007 |
| Calcification | ||
| No | 1 [reference] | |
| Yes | 3.08 (1.42 – 6.70) | < 0.001 |
“-” variable did not converge
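Odds ratios from a logistic regression are exponentiated coefficients, so they can be converted back to log-odds or rescaled to a larger change in the predictor. The sketch below illustrates this with two values from the table. The helper name `scale_odds_ratio` is hypothetical, and reading the age odds ratio as "per year of age" is an assumption, since the source does not state the unit.

```python
import math

def scale_odds_ratio(odds_ratio, units):
    """Odds ratio for a change of `units` in the predictor.

    Assumes the effect is linear on the log-odds scale, as in a
    logistic regression with an untransformed predictor.
    """
    return math.exp(math.log(odds_ratio) * units)

# Patient age: 1.10 per unit (assumed to be one year) implies this over ten units.
age_or_per_decade = scale_odds_ratio(1.10, 10)

# Clinically suspicious palpability: odds ratio 3.90 as a log-odds coefficient.
palpability_log_odds = math.log(3.90)

print(f"age, per 10 units: {age_or_per_decade:.2f}")
print(f"palpability log-odds: {palpability_log_odds:.2f}")
```

These back-transformed values come from the unpenalized model and are not directly comparable to the regularized coefficients, which are shrunk by the penalty and may be fitted on differently scaled inputs.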
Diagnostic performance of the clinical routine and of the three ultrasound experts in the whole cohort (n = 1288)
| Metric | Clinical routine* | Ultrasound expert 1 | Ultrasound expert 2 | Ultrasound expert 3 |
|---|---|---|---|---|
| AUC, whole cohort (95% CI) | 0.94 (0.92–0.95) | 0.76 (0.73–0.79) | 0.79 (0.76–0.82) | 0.82 (0.79–0.85) |
| AUC, validation set (95% CI) | 0.95 (0.93 to 0.97) | 0.82 (0.77 to 0.87) | 0.82 (0.77 to 0.87) | 0.84 (0.79 to 0.89) |
| Performance difference compared to validation set (p value) | 0.390 | 0.121 | 0.659 | 0.739 |
| Sensitivity —% (no.) | 98.4% (362 of 368) | 85.6% (315 of 368) | 94.8% (349 of 368) | 78.8% (290 of 368) |
| Specificity —% (no.) | 46.2% (425 of 920) | 41.4% (381 of 920) | 21.5% (198 of 920) | 44.9% (413 of 920) |
| Negative predictive value —% (no.) | 98.6% (425 of 431) | 87.8% (381 of 434) | 91.2% (198 of 217) | 84.1% (413 of 491) |
| Positive predictive value —% (no.) | 42.2% (362 of 857) | 36.9% (315 of 854) | 32.6% (349 of 1071) | 36.4% (290 of 797) |
* Evaluation of different imaging modalities (mammography, 2D B-mode ultrasound, and/or MRI, as applicable in clinical routine) alongside additional demographic and clinical information about the patients’ age, disease history, and family medical history
Fig. 4. Shapley Additive Explanations (SHAP) value summary plot of the Extreme Gradient Boosting (XGBoost) tree model. a XGBoost, unimodal algorithm. SHAP values on the left side of the x-axis indicate that the variable was important for predicting malignancy; values on the right side indicate that the variable was important for predicting a benign breast mass. Purple indicates a high variable value (e.g., margin, non-circumscribed: yes); yellow indicates a low variable value (e.g., age: lower patient age). The values on the y-axis represent the overall global variable importance. b XGBoost, multi-modal algorithm