Jianming Li (1), Yunyun Bu (1), Shuqiang Lu (2), Hao Pang (3), Chang Luo (2), Yujiang Liu (1), Linxue Qian (1)
(1) Department of Ultrasound, Beijing Friendship Hospital, Capital Medical University, Beijing, China.
(2) Department of Computer Science and Technology, Tsinghua University, Beijing, China.
(3) School of Software, Beijing University of Posts and Telecommunications, Beijing, China.
Abstract
OBJECTIVES: Artificial intelligence (AI) has become an important addition to medicine. We aimed to explore the use of deep learning (DL) to distinguish benign from malignant lesions on breast ultrasound (BUS).
METHODS: The DL model was trained with BUS nodule data acquired using a standard protocol (1271 malignant nodules, 1053 benign nodules, and 2144 images of the contralateral normal breast). The model was tested with 692 images of 256 breast nodules. We used accuracy, precision, recall, the harmonic mean of recall and precision (F1 score), and mean average precision as the indices to assess the DL model. We used 100 BUS images to evaluate differences in diagnostic accuracy among the AI system, experts (>25 years of experience), and physicians with varying levels of experience. A receiver operating characteristic curve was generated to evaluate the accuracy for distinguishing between benign and malignant breast nodules.
RESULTS: The DL model showed 73.3% sensitivity and 94.9% specificity for the diagnosis of benign versus malignant breast nodules (area under the curve, 0.943). No significant difference in diagnostic ability was found between the AI system and the expert group (P = .951), whereas the physicians with lower levels of experience differed significantly from the AI and expert groups (P = .01 and .03, respectively).
CONCLUSIONS: Deep learning could distinguish between benign and malignant breast nodules on BUS. On BUS images, DL achieved diagnostic accuracy equivalent to that of expert physicians.
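The assessment indices named in the METHODS can be illustrated with a minimal Python sketch. It is not the authors' evaluation code: the names `ConfusionCounts`, `classification_metrics`, and `roc_auc` are hypothetical, and the counts and scores are invented for illustration only, not taken from the study. It assumes "malignant" is the positive class and computes the area under the curve as the fraction of malignant-benign pairs ranked correctly, which is one standard way to obtain it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts, with malignant as the positive class."""
    tp: int  # malignant nodules called malignant
    fp: int  # benign nodules called malignant
    tn: int  # benign nodules called benign
    fn: int  # malignant nodules called benign


def _ratio(numerator: float, denominator: float) -> float:
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def classification_metrics(c: ConfusionCounts) -> dict:
    """Compute accuracy, precision, recall (sensitivity), specificity, and F1."""
    precision = _ratio(c.tp, c.tp + c.fp)
    recall = _ratio(c.tp, c.tp + c.fn)
    return {
        "accuracy": _ratio(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn),
        "precision": precision,
        "recall": recall,  # identical to sensitivity
        "specificity": _ratio(c.tn, c.tn + c.fp),
        # F1 is the harmonic mean of precision and recall.
        "f1": _ratio(2 * precision * recall, precision + recall),
    }


def roc_auc(scores_pos: list, scores_neg: list) -> float:
    """AUC as the probability that a positive outscores a negative.

    Ties count as half a correct ranking (Mann-Whitney formulation).
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return _ratio(wins, len(scores_pos) * len(scores_neg))


# Illustrative values only; these are NOT the study's data.
example_counts = ConfusionCounts(tp=40, fp=10, tn=90, fn=10)
example_metrics = classification_metrics(example_counts)
example_auc = roc_auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.4])

for name, value in example_metrics.items():
    print(f"{name}: {value:.3f}")
print(f"auc: {example_auc:.3f}")
```

Sensitivity and specificity depend on the decision threshold applied to the model's output score, whereas the area under the curve summarizes ranking quality across all thresholds, which is why a model can report a high area under the curve alongside moderate sensitivity at one operating point.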