Hongyan Wang, Yuxin Jiang, Yang Gu, Wen Xu, Bin Lin, Xing An, Jiawei Tian, Haitao Ran, Weidong Ren, Cai Chang, Jianjun Yuan, Chunsong Kang, Youbin Deng, Hui Wang, Baoming Luo, Shenglan Guo, Qi Zhou, Ensheng Xue, Weiwei Zhan, Qing Zhou, Jie Li, Ping Zhou, Man Chen, Ying Gu, Wu Chen, Yuhong Zhang, Jianchu Li, Longfei Cong, Lei Zhu.
Abstract
BACKGROUND: Studies on deep learning (DL)-based models in breast ultrasound (US) remain at an early stage because large datasets for training and independent test sets for verification are lacking. We aimed to develop a DL model for differentiating benign from malignant breast lesions on US using a large multicenter dataset and to explore the model's ability to assist radiologists.
Keywords: Artificial intelligence; Breast neoplasms; Deep learning; Diagnosis; Ultrasonography
Year: 2022 PMID: 35900608 PMCID: PMC9334487 DOI: 10.1186/s13244-022-01259-8
Source DB: PubMed Journal: Insights Imaging ISSN: 1869-4101
Fig. 1 Study workflow. US, ultrasound; Conv, convolutional layer; ReLU, rectified linear unit; DL, deep learning; BI-RADS, Breast Imaging Reporting and Data System
Patient demographics and breast lesion characteristics
| Characteristic | Training set | Internal test set | External test sets |
|---|---|---|---|
| Mean age (range) (y) | 43.67 ± 13.14 (10–90) | 43.21 ± 13.65 (16–85) | 44.91 ± 12.06 (17–80) |
| Mean size (range) (cm) | 1.84 ± 0.99 (0.30–7.46) | 1.76 ± 0.94 (0.40–6.80) | 2.05 ± 1.08 (0.35–7.00) |

| Characteristic | Training set, benign | Training set, malignant | p | Internal test set, benign | Internal test set, malignant | p | External test sets, benign | External test sets, malignant | p |
|---|---|---|---|---|---|---|---|---|---|
| Patients (n) | 2724 | 1425 | | 305 | 161 | | 191 | 206 | |
| Images (n) | 6574 | 4904 | | 728 | 546 | | 489 | 802 | |
| Age (y) | 38.99 ± 11.31 | 52.62 ± 11.70 | 0.000 | 38.78 ± 11.94 | 51.60 ± 12.75 | 0.000 | 39.07 ± 10.45 | 50.32 ± 10.89 | 0.000 |
| < 30 | 609 (22.36) | 24 (1.68) | | 77 (25.24) | 3 (1.86) | | 38 (19.90) | 1 (0.49) | |
| 30–39 | 824 (30.25) | 162 (11.37) | | 95 (31.15) | 25 (15.53) | | 63 (32.98) | 28 (13.59) | |
| ≥ 40 | 1291 (47.39) | 1239 (86.95) | | 133 (43.61) | 133 (82.61) | | 90 (47.12) | 177 (85.92) | |
| Age at menarche (y) | 13.52 ± 1.50 | 14.32 ± 1.71 | 0.000 | 13.33 ± 1.67 | 14.32 ± 1.84 | 0.000 | 13.85 ± 1.61 | 14.21 ± 1.78 | 0.036 |
| Age at first live childbirth (y) | 25.21 ± 3.16 | 25.03 ± 3.51 | 0.044 | 25.38 ± 3.50 | 24.70 ± 3.39 | 0.056 | 26.27 ± 3.56 | 26.00 ± 3.00 | 0.457 |
| BMI (kg/m²) | 22.01 ± 3.05 | 23.64 ± 3.19 | 0.000 | 21.73 ± 2.95 | 23.21 ± 2.92 | 0.000 | 22.76 ± 3.32 | 24.81 ± 3.50 | 0.000 |
| Maximum diameter (cm) | 1.58 ± 0.87 | 2.34 ± 1.03 | 0.000 | 1.53 ± 0.79 | 2.19 ± 1.05 | 0.000 | 1.81 ± 1.03 | 2.26 ± 1.07 | 0.000 |
| ≤ 2 cm | 2107 (77.35) | 624 (43.79) | | 241 (79.02) | 82 (50.93) | | 132 (69.11) | 97 (47.09) | |
| > 2 cm, ≤ 5 cm | 597 (21.92) | 775 (54.39) | | 62 (20.33) | 76 (47.21) | | 58 (30.37) | 105 (50.97) | |
| > 5 cm | 20 (0.73) | 26 (1.82) | | 2 (0.65) | 3 (1.86) | | 1 (0.52) | 4 (1.94) | |
| Distance from nipple (cm) | 2.48 ± 1.49 | 2.67 ± 1.57 | 0.000 | 2.44 ± 1.42 | 2.56 ± 1.52 | 0.364 | 2.63 ± 1.70 | 3.10 ± 1.69 | 0.006 |
| Family history (first-degree relatives) | | | 0.943 | | | 0.514 | | | 0.345 |
| No | 2660 (97.65) | 1385 (97.19) | | 294 (96.39) | 157 (97.52) | | 186 (94.41) | 197 (94.98) | |
| Yes | 64 (2.35) | 40 (2.81) | | 11 (3.61) | 4 (2.48) | | 5 (5.59) | 9 (5.02) | |
| History of benign breast disease | | | 0.000 | | | 0.045 | | | 0.005 |
| No | 2552 (93.69) | 1373 (96.35) | | 284 (93.11) | 157 (97.52) | | 171 (89.53) | 199 (96.60) | |
| Yes | 172 (6.31) | 52 (3.65) | | 21 (6.89) | 4 (2.48) | | 20 (10.47) | 7 (3.40) | |
| Clinical symptom(s) | | | 0.000 | | | 0.008 | | | 0.004 |
| No | 675 (24.78) | 136 (9.54) | | 76 (24.92) | 23 (14.29) | | 78 (40.84) | 56 (27.18) | |
| Yes | 2049 (75.22) | 1289 (90.46) | | 229 (75.08) | 138 (85.71) | | 113 (59.16) | 150 (72.82) | |
| Position | | | 0.130 | | | 0.486 | | | 0.513 |
| Left | 1390 (51.03) | 760 (53.33) | | 160 (52.46) | 79 (49.07) | | 92 (48.17) | 106 (51.46) | |
| Right | 1334 (48.97) | 665 (46.67) | | 145 (47.54) | 82 (50.93) | | 99 (51.83) | 100 (48.54) | |
Data in parentheses are percentages
BMI body mass index
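The p values for the continuous characteristics can be roughly cross-checked from the reported means, standard deviations, and group sizes. The sketch below uses a large-sample normal approximation to an unequal-variance two-sample test. This is an assumption: the excerpt does not name the test the authors used, and it also assumes that every patient in each group contributed a value. The function name `two_sample_p` is hypothetical.

```python
import math


def two_sample_p(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Two-sided p value from summary statistics (normal approximation).

    Assumption: groups are independent and large enough that the
    z approximation to an unequal-variance t-test is reasonable.
    """
    standard_error = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    z = (mean_b - mean_a) / standard_error
    # erfc(|z| / sqrt(2)) equals the two-sided tail area of a standard normal.
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# Distance from nipple, external test sets: benign (n = 191) vs malignant (n = 206).
z, p = two_sample_p(2.63, 1.70, 191, 3.10, 1.69, 206)
print(f"z = {z:.2f}, p = {p:.4f}")  # the table reports p = 0.006 for this row
```

Under these assumptions the approximation agrees with the tabulated value to three decimals for this row. Rows with missing data or small groups may differ.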
Fig. 2 Areas under the receiver operating characteristic curves (AUCs) of the model with the internal test set (a), the external test sets (b), the external test set A (c), and the external test set B (d). The confusion matrices of the model in distinguishing benign and malignant breast lesions with the internal test set (e), the external test sets (f), the external test set A (g), and the external test set B (h). Actual class, the pathology diagnosis; Predicted class, the binary prediction result of the deep learning model
Performance metrics for the DL model in the test sets
| Test set | AUC (95% CI) | Sensitivity, % (95% CI) | Specificity, % (95% CI) | PPV, % (95% CI) | NPV, % (95% CI) | ACC, % (95% CI) | F1 | MCC |
|---|---|---|---|---|---|---|---|---|
| Internal test set | 0.908 (0.879–0.933) | 83.23 (76.55–88.65) | 83.61 (78.97–87.58) | 72.83 (67.33–77.71) | 90.43 (86.96–93.04) | 83.48 (79.79–86.73) | 0.777 | 0.650 |
| External test sets | 0.913 (0.881–0.939) | 88.84 (83.72–92.79) | 83.77 (77.76–88.70) | 85.51 (81.00–89.10) | 87.43 (82.48–91.13) | 86.40 (82.63–89.61) | 0.871 | 0.728 |
| External test set A | 0.908 (0.859–0.945) | 88.00 (79.98–93.64) | 85.57 (76.97–91.88) | 86.28 (79.39–91.12) | 87.37 (80.17–92.21) | 86.80 (81.26–91.19) | 0.871 | 0.736 |
| External test set B | 0.918 (0.871–0.952) | 89.62 (82.19–94.71) | 81.92 (72.63–89.10) | 84.82 (78.34–89.62) | 87.50 (79.87–92.51) | 86.00 (80.41–90.49) | 0.872 | 0.719 |
DL deep learning, AUC area under the receiver operating characteristic curve, PPV positive predictive value, NPV negative predictive value, ACC accuracy, MCC Matthews correlation coefficient, CI confidence interval
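The threshold-based metrics in this table all derive from a single 2 × 2 confusion matrix. The sketch below computes them for the internal test set. The counts (134 true positives, 27 false negatives, 255 true negatives, 50 false positives) are an assumption: they are back-calculated from the reported sensitivity, specificity, and the 161 malignant and 305 benign patients, not read from the confusion matrix in Fig. 2. The function name `binary_metrics` is hypothetical.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Compute confusion-matrix metrics; malignant is the positive class."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    f1 = 2 * tp / (2 * tp + fp + fn)
    # Matthews correlation coefficient uses all four cells of the matrix.
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "accuracy": accuracy,
        "f1": f1,
        "mcc": mcc,
    }


# Assumed counts for the internal test set (inferred, see lead-in).
internal = binary_metrics(tp=134, fn=27, tn=255, fp=50)
for name, value in internal.items():
    print(f"{name}: {value:.4f}")
```

With these assumed counts, every point estimate in the internal test set row is reproduced after rounding, which supports the back-calculation. AUC is not included because it requires the model's continuous output scores.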
Performance metrics for the DL model versus the prospective BI-RADS assessment and the five radiologists in the comparison set
| Reader | AUC (95% CI) | p | Sensitivity, % (95% CI) | p | Specificity, % (95% CI) | p | PPV, % (95% CI) | p | NPV, % (95% CI) | p | ACC, % (95% CI) | p |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| DL | 0.924 (0.879–0.957) | | 89.77 (81.47–95.22) | | 82.30 (74.00–88.84) | | 79.80 (72.51–85.54) | | 91.18 (84.69–95.08) | | 85.57 (79.94–90.12) | |
| Pro | 0.969 (0.934–0.988) | 0.0058* | 98.86 (93.83–99.97) | 0.0078* | 53.10 (43.48–62.55) | < 0.0001* | 62.14 (57.40–66.67) | 0.0036* | 98.36 (89.45–99.77) | 0.0652 | 73.13 (66.45–79.13) | 0.0005* |
| R1 | 0.935 (0.892–0.965) | 0.5629 | 95.46 (88.77–98.75) | 0.2266 | 74.34 (65.27–82.09) | 0.0784 | 74.34 (67.84–79.91) | 0.3478 | 95.46 (88.90–98.22) | 0.2454 | 83.58 (77.72–88.42) | 0.5966 |
| R2 | 0.901 (0.851–0.939) | 0.2112 | 97.73 (92.03–99.72) | 0.0391* | 52.21 (42.61–61.70) | < 0.0001* | 61.43 (56.71–65.94) | 0.0025* | 96.72 (88.11–99.16) | 0.1734 | 72.14 (65.40–78.22) | 0.0002* |
| R3 | 0.852 (0.795–0.898) | 0.0021* | 100 (95.90–100) | 0.0039* | 21.24 (14.11–29.93) | < 0.0001* | 49.72 (47.33–52.11) | < 0.0001* | 100 | 0.1325 | 55.72 (48.57–62.71) | < 0.0001* |
| R4 | 0.795 (0.733–0.849) | < 0.0001* | 93.18 (85.75–97.46) | 0.5488 | 46.90 (37.45–56.52) | < 0.0001* | 57.75 (53.25–62.12) | 0.0004* | 89.83 (79.93–95.15) | 0.7778 | 67.16 (60.21–73.61) | < 0.0001* |
| R5 | 0.778 (0.714–0.834) | < 0.0001* | 97.73 (92.03–99.72) | 0.0391* | 17.70 (11.16–26.00) | < 0.0001* | 48.05 (45.77–50.33) | < 0.0001* | 90.91 (70.60–97.66) | 0.9682 | 52.74 (45.59–59.80) | < 0.0001* |
p values are for the comparison of diagnostic performance with the DL model
DL deep learning, BI-RADS Breast Imaging Reporting and Data System, AUC area under the receiver operating characteristic curve, PPV positive predictive value, NPV negative predictive value, ACC accuracy, CI confidence interval, Pro prospective BI-RADS assessment, R radiologist
*Statistically significant difference
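The PPV and NPV columns depend on sensitivity, specificity, and the proportion of malignant lesions in the comparison set. The sketch below applies Bayes' rule to the DL model and to the prospective BI-RADS assessment. The comparison set size (88 malignant and 113 benign lesions) and the fractions used for sensitivity and specificity are assumptions, back-calculated from the reported percentages. The function name `predictive_values` is hypothetical.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) from sensitivity, specificity, and prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Assumed comparison set composition: 88 malignant of 201 lesions (inferred).
PREVALENCE = 88 / 201

# DL model: sensitivity 79/88 (89.77%), specificity 93/113 (82.30%).
dl_ppv, dl_npv = predictive_values(79 / 88, 93 / 113, PREVALENCE)

# Prospective BI-RADS: sensitivity 87/88 (98.86%), specificity 60/113 (53.10%).
pro_ppv, pro_npv = predictive_values(87 / 88, 60 / 113, PREVALENCE)

print(f"DL : PPV {dl_ppv:.4f}, NPV {dl_npv:.4f}")
print(f"Pro: PPV {pro_ppv:.4f}, NPV {pro_npv:.4f}")
```

Under these assumptions the calculation illustrates the trade-off in the table: the prospective assessment has higher sensitivity than the DL model, but its lower specificity produces more false positives and therefore a lower PPV.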
Fig. 3 Areas under the receiver operating characteristic curves (AUCs) of the model versus the prospective BI-RADS assessment and the five radiologists in the comparison set (a). Performance metrics for the DL model versus the prospective BI-RADS assessment and the five radiologists in the comparison set (b). AUC, area under the receiver operating characteristic curve; DL, deep learning model; Pro, the prospective Breast Imaging Reporting and Data System (BI-RADS) assessment; R, radiologist. *Statistically significant difference in diagnostic performance compared with the DL model
Performance metrics for the experienced and inexperienced radiologists with and without model assistance in the comparison set
| DL assistance | Readers | AUC (95% CI) | p | Sensitivity, % (95% CI) | p | Specificity, % (95% CI) | p | PPV, % (95% CI) | p | NPV, % (95% CI) | p | ACC, % (95% CI) | p |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Without | All | 0.843 (0.819–0.865) | < 0.0001* | 96.82 (94.72–98.25) | < 0.0001* | 42.48 (38.36–46.67) | < 0.0001* | 56.72 (54.93–58.50) | < 0.0001* | 94.49 (91.03–96.66) | 0.1065 | 66.27 (63.25–69.19) | < 0.0001* |
| Without | Ex | 0.919 (0.888–0.944) | 0.6778 | 96.59 (92.73–98.74) | 0.0118* | 63.27 (56.63–69.57) | < 0.0001* | 67.19 (63.27–70.90) | 0.0029* | 95.97 (91.52–98.14) | 0.0774 | 77.86 (73.48–81.83) | 0.0009* |
| Without | Inex | 0.798 (0.764–0.830) | < 0.0001* | 96.97 (94.12–98.68) | 0.0005* | 28.61 (23.86–33.75) | < 0.0001* | 51.41 (49.64–53.17) | < 0.0001* | 92.38 (85.72–96.08) | 0.7031 | 58.54 (54.49–62.51) | < 0.0001* |
| With | All | 0.861 (0.838–0.881) | < 0.0001# | 97.27 (95.28–98.58) | 0.8036 | 64.25 (60.14–68.21) | < 0.0001# | 67.94 (65.46–70.32) | < 0.0001# | 96.80 (94.52–98.15) | 0.1533 | 78.71 (76.04–81.20) | < 0.0001# |
| With | Ex | 0.932 (0.903–0.954) | 0.1044 | 96.59 (92.73–98.74) | 1.0000 | 80.09 (74.28–85.09) | < 0.0001# | 79.07 (74.39–83.09) | 0.0041# | 96.79 (93.20–98.52) | 0.6885 | 87.31 (83.66–90.41) | < 0.0001# |
| With | Inex | 0.819 (0.786–0.849) | < 0.0001# | 97.73 (95.12–99.16) | 0.7266 | 53.69 (48.22–59.09) | < 0.0001# | 62.17 (59.40–64.86) | 0.0011# | 96.81 (93.18–98.54) | 0.0890 | 72.97 (69.23–76.48) | < 0.0001# |
| With | All | 0.908 (0.888–0.925) | < 0.0001# | 97.73 (95.86–98.91) | 0.4545 | 66.90 (62.85–70.77) | < 0.0001# | 69.69 (67.14–72.13) | < 0.0001# | 97.42 (95.33–98.59) | 0.0555 | 80.40 (77.81–82.81) | < 0.0001# |
| With | Ex | 0.933 (0.904–0.955) | 0.1202 | 97.16 (93.49–99.07) | 1.0000 | 75.66 (69.53–81.11) | < 0.0001# | 75.66 (71.16–79.67) | 0.0412# | 97.16 (93.49–98.79) | 0.5564 | 85.07 (81.21–88.41) | 0.0001# |
| With | Inex | 0.902 (0.875–0.924) | < 0.0001# | 98.11 (95.64–99.38) | 0.5488 | 61.06 (55.65–66.28) | < 0.0001# | 66.24 (63.17–69.18) | < 0.0001# | 97.64 (94.54–99.00) | 0.0265# | 77.28 (73.72–80.57) | < 0.0001# |
AUC area under the receiver operating characteristic curve, PPV positive predictive value, NPV negative predictive value, ACC accuracy, CI confidence interval, DL deep learning, All all the five radiologists, Ex experienced radiologists, Inex inexperienced radiologists
*Significant difference; p values compare radiologists without DL assistance with the DL model
#Significant difference; p values compare radiologists with DL assistance with radiologists without DL assistance
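The pooled rows for radiologists without DL assistance appear to be sums over the individual readers' readings reported for R1 to R5. The sketch below pools per-reader counts to illustrate this. Several things here are assumptions: the per-reader counts are back-calculated from the individual sensitivities and specificities with 88 malignant and 113 benign lesions per reader, and the grouping of R1 and R2 as experienced and R3 to R5 as inexperienced is inferred from the arithmetic rather than stated in this excerpt. The names `readers` and `pooled` are hypothetical.

```python
# Assumed per-reader correct calls (inferred from reported percentages):
# "tp" = malignant lesions called malignant, "tn" = benign lesions called benign.
readers = {
    "R1": {"tp": 84, "tn": 84},
    "R2": {"tp": 86, "tn": 59},
    "R3": {"tp": 88, "tn": 24},
    "R4": {"tp": 82, "tn": 53},
    "R5": {"tp": 86, "tn": 20},
}
N_MALIGNANT = 88  # assumed malignant lesions read by each radiologist
N_BENIGN = 113    # assumed benign lesions read by each radiologist


def pooled(names):
    """Pool readings across radiologists; return (sensitivity, specificity)."""
    tp = sum(readers[name]["tp"] for name in names)
    tn = sum(readers[name]["tn"] for name in names)
    sensitivity = tp / (len(names) * N_MALIGNANT)
    specificity = tn / (len(names) * N_BENIGN)
    return sensitivity, specificity


all_sens, all_spec = pooled(["R1", "R2", "R3", "R4", "R5"])
ex_sens, ex_spec = pooled(["R1", "R2"])            # assumed experienced group
inex_sens, inex_spec = pooled(["R3", "R4", "R5"])  # assumed inexperienced group

print(f"All : sensitivity {all_sens:.4f}, specificity {all_spec:.4f}")
print(f"Ex  : sensitivity {ex_sens:.4f}, specificity {ex_spec:.4f}")
print(f"Inex: sensitivity {inex_sens:.4f}, specificity {inex_spec:.4f}")
```

Pooling by summing readings weights every reading equally, so with equal case counts per reader it matches the simple average of the readers' sensitivities and specificities.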
Fig. 4 Performance metrics for the radiologists with and without model assistance in the comparison set. a All the five radiologists; b experienced radiologists; c inexperienced radiologists. PPV, positive predictive value; NPV, negative predictive value; DL, deep learning