Ling Huo, Yao Tan, Shu Wang, Cuizhi Geng, Yi Li, XiangJun Ma, Bin Wang, YingJian He, Chen Yao, Tao Ouyang.
Abstract
PURPOSE: This study aimed to establish and evaluate the usefulness of a simple, practical, and easy-to-promote machine learning model based on ultrasound imaging features for diagnosing breast cancer (BC).
Keywords: breast cancer; diagnostic accuracy; machine learning; patient stratification; screening modalities; ultrasound imaging
Year: 2021 PMID: 33889025 PMCID: PMC8057795 DOI: 10.2147/CMAR.S297794
Source DB: PubMed Journal: Cancer Manag Res ISSN: 1179-1322 Impact factor: 3.989
AUC of the Two Models in Our Previous Study [24]
| Strategies | Logistic Regression (95% CI) | Random Forest (95% CI) |
|---|---|---|
| Full models | 0.7812 (0.7325–0.8298) | 0.7878 (0.7392–0.8365) |
| Logistic | 0.7727 (0.7227–0.8227) | 0.7757 (0.7258–0.8255) |
| Random forest | 0.7880 (0.7395–0.8364) | 0.7868 (0.7377–0.8359) |
Variable Assignment
| Variables | Name | Value |
|---|---|---|
| Breast left/right | zyc | 0-left, 1-right |
| Direction | FX | 0-parallel, 1-unparallel |
| Margins blur | bqxcd1 | 0-identifiable, 1-non-identifiable but no blur, 2-non-identifiable and blurred |
| Margins angulation | bqxcd2 | 0-identifiable, 1-non-identifiable but no angulation, 2-non-identifiable and angled |
| Margins microlobulation | bqxcd3 | 0-identifiable, 1-non-identifiable but no microlobulation, 2-non-identifiable and microlobulated |
| Margin burr | bqxcd4 | 0-identifiable, 1-non-identifiable but no burr, 2-non-identifiable and burr |
| Posterior echoes | hfhs | 0-no change, 1-enhanced, 2-attenuated (including mixed) |
| Surrounding tissue edema | shuiz | 0-no, 1-yes |
| Benign vs malignant | End | 0-benign, 1-malignant |
| Clinicians | biras | 0-benign tendency (follow-up), 1-malignant tendency (biopsy) |
| Biopsy results | Path | 0-benign, 1-malignant |
| Follow-up results | Path3 | 0-benign, 1-malignant |
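The coding scheme above can be kept as a small lookup structure so that coded records are decoded consistently. The following is a minimal sketch: the variable names and value labels come from the table, while the dictionary layout and the helper name `decode` are illustrative assumptions, not part of the study's software.

```python
# Value labels copied from the Variable Assignment table.
# The structure and the helper below are hypothetical, for illustration only.
MARGIN_LEVELS = {
    "bqxcd1": "blur",
    "bqxcd2": "angulation",
    "bqxcd3": "microlobulation",
    "bqxcd4": "burr",
}

CODEBOOK = {
    "zyc": {0: "left", 1: "right"},
    "FX": {0: "parallel", 1: "unparallel"},
    "hfhs": {0: "no change", 1: "enhanced", 2: "attenuated (including mixed)"},
    "shuiz": {0: "no", 1: "yes"},
    "End": {0: "benign", 1: "malignant"},
    "biras": {0: "benign tendency (follow-up)", 1: "malignant tendency (biopsy)"},
    "Path": {0: "benign", 1: "malignant"},
    "Path3": {0: "benign", 1: "malignant"},
}

# The four margin variables share one three-level pattern.
for name, feature in MARGIN_LEVELS.items():
    CODEBOOK[name] = {
        0: "identifiable",
        1: f"non-identifiable but no {feature}",
        2: f"non-identifiable with {feature}",
    }


def decode(record):
    """Translate a coded record into readable labels, rejecting unknown codes."""
    decoded = {}
    for name, code in record.items():
        levels = CODEBOOK[name]
        if code not in levels:
            raise ValueError(f"{name}: unexpected code {code!r}")
        decoded[name] = levels[code]
    return decoded


example = decode({"FX": 1, "bqxcd4": 2, "hfhs": 2, "shuiz": 0})
print(example)
```

Rejecting codes outside the table, rather than passing them through, makes data-entry errors visible before modeling.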
Comparison Between the Modeling Data Set and the Validation Data Set
| Variables | Category | Modeling Data Set (n=1125) | Validation Data Set (n=1965) | χ² | P |
|---|---|---|---|---|---|
| zyc | Left | 0 (0.00%) | 942 (47.94%) | – | – |
| | Right | 0 (0.00%) | 1023 (52.06%) | | |
| FX | Parallel | 826 (73.42%) | 1566 (79.69%) | 16.096 | 0.000 |
| | Unparallel | 299 (26.58%) | 399 (20.31%) | | |
| bqxcd1 | Identifiable | 160 (14.22%) | 1074 (54.66%) | 609.309 | 0.000 |
| | Non-identifiable but no blur | 80 (7.11%) | 240 (12.21%) | | |
| | Non-identifiable and blurred | 885 (78.67%) | 651 (33.13%) | | |
| bqxcd2 | Identifiable | 160 (14.22%) | 1073 (54.61%) | 504.371 | 0.000 |
| | Non-identifiable but no angulation | 525 (46.67%) | 401 (20.41%) | | |
| | Non-identifiable and angled | 440 (39.11%) | 491 (24.99%) | | |
| bqxcd3 | Identifiable | 160 (14.22%) | 1073 (54.61%) | 629.396 | 0.000 |
| | Non-identifiable but no microlobulation | 363 (32.27%) | 574 (29.21%) | | |
| | Non-identifiable and microlobulated | 602 (53.51%) | 318 (16.18%) | | |
| bqxcd4 | Identifiable | 160 (14.22%) | 1074 (54.66%) | 497.430 | 0.000 |
| | Non-identifiable but no burr | 720 (64.00%) | 717 (36.49%) | | |
| | Non-identifiable and burr | 245 (21.78%) | 174 (8.85%) | | |
| hfhs | No change | 687 (61.07%) | 1549 (78.83%) | 114.225 | 0.000 |
| | Enhanced | 198 (17.60%) | 204 (10.38%) | | |
| | Attenuated (including mixed) | 240 (21.33%) | 212 (10.79%) | | |
| shuiz | No | 1079 (95.91%) | 1823 (92.77%) | 12.326 | 0.000 |
| | Yes | 46 (4.09%) | 142 (7.23%) | | |
| End | Benign | 393 (34.93%) | 1467 (74.66%) | 471.132 | 0.000 |
| | Malignant | 732 (65.07%) | 498 (25.34%) | | |
Note: The values are presented in n (%).
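The test statistic listed for FX (16.096) is consistent with a Pearson chi-square test without continuity correction applied to the 2×2 table of counts. The sketch below recomputes it from the counts in the table. The choice of test is an assumption inferred from the numbers, not something the source states, and the function name is illustrative.

```python
def pearson_chi_square(table):
    """Pearson chi-square statistic for an r x c table of counts.

    No continuity correction is applied (an assumption about the original analysis).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    return statistic


# FX counts from the table: rows = modeling / validation data set,
# columns = parallel / unparallel.
fx_counts = [[826, 299], [1566, 399]]
chi2_fx = pearson_chi_square(fx_counts)
print(f"FX chi-square: {chi2_fx:.3f} (table reports 16.096)")
```

The same function accepts the three-level variables as 2×3 tables, for example `[[687, 198, 240], [1549, 204, 212]]` for hfhs.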
Comparison Between the Benign and Malignant Groups in the Validation Set
| Variables | Category | Benign (n=1467) | Malignant (n=498) | χ² | P |
|---|---|---|---|---|---|
| FX | Parallel | 1352 (92.16%) | 214 (42.97%) | 555.895 | 0.000 |
| | Unparallel | 115 (7.84%) | 284 (57.03%) | | |
| bqxcd1 | Identifiable | 1040 (70.89%) | 34 (6.83%) | 656.956 | 0.000 |
| | Non-identifiable but no blur | 152 (10.36%) | 88 (17.67%) | | |
| | Non-identifiable and blurred | 275 (18.75%) | 376 (75.50%) | | |
| bqxcd2 | Identifiable | 1040 (70.89%) | 33 (6.63%) | 657.869 | 0.000 |
| | Non-identifiable but no angulation | 232 (15.81%) | 169 (33.94%) | | |
| | Non-identifiable and angled | 195 (13.29%) | 296 (59.44%) | | |
| bqxcd3 | Identifiable | 1040 (70.89%) | 33 (6.63%) | 679.549 | 0.000 |
| | Non-identifiable but no microlobulation | 323 (22.02%) | 251 (50.40%) | | |
| | Non-identifiable and microlobulated | 104 (7.09%) | 214 (42.97%) | | |
| bqxcd4 | Identifiable | 1040 (70.89%) | 34 (6.83%) | 808.091 | 0.000 |
| | Non-identifiable but no burr | 415 (28.29%) | 302 (60.64%) | | |
| | Non-identifiable and burr | 12 (0.82%) | 162 (32.53%) | | |
| hfhs | No change | 1271 (86.64%) | 278 (55.82%) | 231.661 | 0.000 |
| | Enhanced | 116 (7.91%) | 88 (17.67%) | | |
| | Attenuated (including mixed) | 80 (5.45%) | 132 (26.51%) | | |
| shuiz | No | 1440 (98.16%) | 383 (76.91%) | 250.462 | 0.000 |
| | Yes | 27 (1.84%) | 115 (23.09%) | | |
Note: The values are presented in n (%).
Figure 1. Representative ultrasound images showing malignant breast lesions. (A) A hypoechoic malignant lesion with irregular shape, calcification (thick arrow), and non-circumscribed margin (thin arrow). (B) A hypoechoic lesion with an oval shape, circumscribed margins (thin arrow), and posterior enhancement (thick arrow). (C) A heterogeneous, hypoechoic area of structural disorder with irregular shape and parallel orientation.
Performance Evaluation of the Different Models
| Data set | Model | Accuracy | Precision Class 1 | Recall Class 1 | AUC of ROC | AUC of PRC | F1 Score |
|---|---|---|---|---|---|---|---|
| Test set | Logistic regression | 0.720 | 0.734 | 0.891 | 0.771 | 0.846 | 0.805 |
| | Random forest | 0.727 | 0.755 | 0.858 | 0.747 | 0.812 | 0.803 |
| | Extra trees | 0.723 | 0.754 | 0.852 | 0.746 | 0.820 | 0.800 |
| | Support vector | 0.709 | 0.717 | 0.913 | 0.638 | 0.736 | 0.803 |
| | Multilayer perceptron | 0.738 | 0.756 | 0.880 | 0.775 | 0.838 | 0.813 |
| | XGBoost | 0.713 | 0.730 | 0.885 | 0.769 | 0.839 | 0.800 |
| Validation set | Logistic regression | 0.772 | 0.528 | 0.936 | 0.906 | 0.794 | 0.675 |
| | Random forest | 0.814 | 0.598 | 0.813 | 0.865 | 0.735 | 0.689 |
| | Extra trees | 0.813 | 0.597 | 0.807 | 0.855 | 0.709 | 0.687 |
| | Support vector | 0.768 | 0.524 | 0.936 | 0.852 | 0.632 | 0.671 |
| | Multilayer perceptron | 0.818 | 0.596 | 0.869 | 0.901 | 0.792 | 0.708 |
| | XGBoost | 0.781 | 0.542 | 0.876 | 0.898 | 0.776 | 0.669 |
Figure 2. ROC plots of the calibrated model in the test set (A) and validation set (B).
Figure 3. Calibration plots of the calibrated model in the test set (A) and validation set (B).
Comparison Between Clinician Diagnosis and Gold Standard Diagnosis
| Data set | Clinician diagnosis | Gold standard: Benign | Gold standard: Malignant | Total |
|---|---|---|---|---|
| All validation set | Benign | 1318 | 36 | 1354 |
| | Malignant | 149 | 462 | 611 |
| | Total | 1467 | 498 | 1965 |
| Primary hospitals | Benign | 535 | 11 | 546 |
| | Malignant | 54 | 81 | 135 |
| | Total | 589 | 92 | 681 |
| Tertiary class A hospitals | Benign | 783 | 25 | 808 |
| | Malignant | 95 | 381 | 476 |
| | Total | 878 | 406 | 1284 |
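The summary metrics reported for clinicians can be derived directly from these counts. As a minimal sketch, the function below computes accuracy, precision, recall, specificity, and F1 from the four cells of a 2×2 table, treating malignant as the positive class. The function name and dictionary keys are illustrative.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # also called sensitivity
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "specificity": tn / (tn + fp),
        "f1": 2 * precision * recall / (precision + recall),
    }


# All validation set, malignant = positive:
# clinician malignant & gold malignant = 462 (TP), clinician malignant & gold benign = 149 (FP),
# clinician benign & gold malignant = 36 (FN), clinician benign & gold benign = 1318 (TN).
all_validation = diagnostic_metrics(tp=462, fp=149, fn=36, tn=1318)
for name, value in all_validation.items():
    print(f"{name}: {value:.3f}")
```

Passing the primary-hospital or tertiary-hospital counts from the table gives the corresponding subgroup values.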
Comparison Between Clinician and Model Diagnosis
| Data set | Model | Accuracy | Precision Class 1 | Recall Class 1 | AUC of ROC | AUC of PRC | F1 Score | Threshold | FPR | TPR |
|---|---|---|---|---|---|---|---|---|---|---|
| All validation set | Clinicians | 0.906 | 0.756 | 0.927 | 0.913 | 0.851 | 0.833 | – | – | – |
| | Logistic regression | 0.772 | 0.528 | 0.936 | 0.906 | 0.794 | 0.675 | 0.571 | 0.181 | 0.829 |
| | Random forest | 0.814 | 0.598 | 0.813 | 0.865 | 0.735 | 0.689 | 0.491 | 0.185 | 0.815 |
| | Extra trees | 0.813 | 0.597 | 0.807 | 0.855 | 0.709 | 0.687 | 0.505 | 0.185 | 0.807 |
| | Support vector | 0.768 | 0.524 | 0.936 | 0.852 | 0.632 | 0.671 | 0.710 | 0.206 | 0.793 |
| | Multilayer perceptron | 0.818 | 0.596 | 0.869 | 0.901 | 0.792 | 0.708 | 0.573 | 0.187 | 0.827 |
| | XGBoost | 0.781 | 0.542 | 0.876 | 0.898 | 0.776 | 0.669 | 0.557 | 0.183 | 0.817 |
| Tertiary class A hospitals | Clinicians | 0.906 | 0.790 | 0.932 | 0.915 | 0.874 | 0.855 | – | – | – |
| | Logistic regression | 0.798 | 0.618 | 0.941 | 0.915 | 0.839 | 0.746 | 0.584 | 0.155 | 0.833 |
| | Random forest | 0.798 | 0.641 | 0.825 | 0.861 | 0.778 | 0.721 | 0.565 | 0.198 | 0.788 |
| | Extra trees | 0.795 | 0.638 | 0.813 | 0.850 | 0.750 | 0.715 | 0.548 | 0.213 | 0.796 |
| | Support vector | 0.793 | 0.612 | 0.941 | 0.851 | 0.687 | 0.742 | 0.712 | 0.210 | 0.791 |
| | Multilayer perceptron | 0.807 | 0.643 | 0.877 | 0.903 | 0.829 | 0.742 | 0.573 | 0.210 | 0.837 |
| | XGBoost | 0.792 | 0.621 | 0.877 | 0.900 | 0.816 | 0.727 | 0.581 | 0.208 | 0.828 |
| Primary hospitals | Clinicians | 0.905 | 0.683 | 0.918 | 0.894 | 0.807 | 0.784 | – | – | – |
| | Logistic regression | 0.797 | 0.388 | 0.870 | 0.873 | 0.544 | 0.537 | 0.584 | 0.199 | 0.783 |
| | Random forest | 0.747 | 0.321 | 0.783 | 0.771 | 0.446 | 0.456 | 0.627 | 0.246 | 0.739 |
| | Extra trees | 0.746 | 0.318 | 0.772 | 0.766 | 0.409 | 0.451 | 0.644 | 0.251 | 0.750 |
| | Support vector | 0.717 | 0.314 | 0.924 | 0.797 | 0.304 | 0.468 | 0.696 | 0.246 | 0.750 |
| | Multilayer perceptron | 0.749 | 0.329 | 0.826 | 0.860 | 0.578 | 0.471 | 0.715 | 0.248 | 0.750 |
| | XGBoost | 0.725 | 0.309 | 0.837 | 0.836 | 0.481 | 0.452 | 0.587 | 0.243 | 0.750 |
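The Threshold, FPR, and TPR columns describe an operating point: a probability cutoff and the false-positive and true-positive rates it produces. The sketch below shows how a cutoff is turned into those two rates. The labels and probabilities are hypothetical toy values, and the `>=` rule for calling a case malignant is an assumption; only the cutoff 0.571 is taken from the table (logistic regression, first block).

```python
def rates_at_threshold(y_true, y_prob, threshold):
    """Return (FPR, TPR) when cases with probability >= threshold are called malignant."""
    tp = fp = fn = tn = 0
    for label, prob in zip(y_true, y_prob):
        predicted = 1 if prob >= threshold else 0  # assumed decision rule
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 1:
            fn += 1
        else:
            tn += 1
    return fp / (fp + tn), tp / (tp + fn)


# Hypothetical toy data: 0 = benign, 1 = malignant.
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_prob = [0.10, 0.30, 0.60, 0.40, 0.55, 0.80, 0.90, 0.58]

fpr, tpr = rates_at_threshold(y_true, y_prob, threshold=0.571)
print(f"FPR={fpr:.3f}, TPR={tpr:.3f}")
```

Sweeping the threshold from 0 to 1 and plotting each (FPR, TPR) pair traces the ROC curve whose area is reported in the AUC of ROC column.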
Performance of the Logistic Regression Model
| Variable | B | SE | OR | 95% CI | P | β |
|---|---|---|---|---|---|---|
| fx | 1.454 | 0.165 | 4.281 | 3.098–5.917 | <0.001 | 0.322239 |
| bqxcd1 | 0.235 | 0.143 | 1.265 | 0.956–1.674 | 0.100 | 0.118155 |
| bqxcd2 | 0.334 | 0.142 | 1.396 | 1.058–1.844 | 0.019 | 0.155041 |
| bqxcd3 | 0.716 | 0.154 | 2.047 | 1.513–2.768 | <0.001 | 0.295653 |
| bqxcd4 | 1.184 | 0.247 | 3.267 | 2.013–5.303 | <0.001 | 0.425586 |
| hfhs | 0.340 | 0.101 | 1.405 | 1.152–1.714 | 0.001 | 0.123337 |
| shuiz | 1.193 | 0.269 | 3.298 | 1.947–5.586 | <0.001 | 0.170345 |
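The OR and 95% CI columns follow from B and SE through the usual relations OR = exp(B) and CI = exp(B ± 1.96·SE). The sketch below recomputes them from the rounded B and SE values in the table, so results can differ from the printed ones in the last digit. The dictionary and function names are illustrative.

```python
import math

# (B, SE) pairs copied from the table.
COEFFICIENTS = {
    "fx": (1.454, 0.165),
    "bqxcd1": (0.235, 0.143),
    "bqxcd2": (0.334, 0.142),
    "bqxcd3": (0.716, 0.154),
    "bqxcd4": (1.184, 0.247),
    "hfhs": (0.340, 0.101),
    "shuiz": (1.193, 0.269),
}


def odds_ratio_with_ci(b, se, z=1.96):
    """Odds ratio and Wald confidence interval from a logistic coefficient."""
    return math.exp(b), math.exp(b - z * se), math.exp(b + z * se)


results = {name: odds_ratio_with_ci(b, se) for name, (b, se) in COEFFICIENTS.items()}
for name, (odds_ratio, lower, upper) in results.items():
    print(f"{name}: OR={odds_ratio:.3f}, 95% CI {lower:.3f} to {upper:.3f}")
```

An interval that includes 1, as for bqxcd1, corresponds to a P value above 0.05 in the table.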
Figure 4. Probability distribution by model.
Predicted Probability of Different Proportions of People by Model
| Proportion of people | Logistic Regression | Random Forest | Extra Trees | Support Vector | Multilayer Perceptron | XG Boost |
|---|---|---|---|---|---|---|
| 1% | 0.2158926 | 0.0870467 | 0 | 0.2690289 | 0.1223317 | 0.1271728 |
| 2% | 0.2481656 | 0.2063348 | 0.1830000 | 0.2690872 | 0.1924786 | 0.2399745 |
| 5% | 0.2953400 | 0.2432472 | 0.2500000 | 0.2691355 | 0.2032477 | 0.2851146 |
| 10% | 0.2953400 | 0.2826738 | 0.2857143 | 0.2691355 | 0.2580033 | 0.2999176 |
| 50% | 0.2953400 | 0.2826738 | 0.2857143 | 0.2691355 | 0.2580033 | 0.2999176 |
| 90% | 0.8769365 | 0.8999733 | 0.9291429 | 0.7422661 | 0.8754747 | 0.8494976 |
| 95% | 0.9327307 | 0.9831579 | 1 | 0.7428197 | 0.9669854 | 0.9255747 |
| 98% | 0.9648594 | 1 | 1 | 0.7554798 | 0.9834885 | 0.9730366 |
| 99% | 0.9675776 | 1 | 1 | 0.7882681 | 0.9877369 | 0.9751260 |