Literature DB >> 30993926

Effect of a Deep Learning Framework-Based Computer-Aided Diagnosis System on the Diagnostic Performance of Radiologists in Differentiating between Malignant and Benign Masses on Breast Ultrasonography.

Ji Soo Choi, Boo Kyung Han, Eun Sook Ko, Jung Min Bae, Eun Young Ko, So Hee Song, Mi Ri Kwon, Jung Hee Shin, Soo Yeon Hahn.

Abstract

OBJECTIVE: To investigate whether a computer-aided diagnosis (CAD) system based on a deep learning framework (deep learning-based CAD) improves the diagnostic performance of radiologists in differentiating between malignant and benign masses on breast ultrasound (US).
MATERIALS AND METHODS: B-mode US images were prospectively obtained for 253 breast masses (173 benign, 80 malignant) in 226 consecutive patients. Breast mass US findings were retrospectively analyzed by deep learning-based CAD and four radiologists. In predicting malignancy, the CAD results were dichotomized (possibly benign vs. possibly malignant). The radiologists independently assessed Breast Imaging Reporting and Data System final assessments for two datasets (US images alone or with CAD). For each dataset, the radiologists' final assessments were classified as positive (category 4a or higher) and negative (category 3 or lower). The diagnostic performances of the radiologists for the two datasets (US alone vs. US with CAD) were compared.
RESULTS: When the CAD results were added to the US images, the radiologists showed significant improvement in specificity (range of all radiologists for US alone vs. US with CAD: 72.8-92.5% vs. 82.1-93.1%; p < 0.001), accuracy (77.9-88.9% vs. 86.2-90.9%; p = 0.038), and positive predictive value (PPV) (60.2-83.3% vs. 70.4-85.2%; p = 0.001). However, there were no significant changes in sensitivity (81.3-88.8% vs. 86.3-95.0%; p = 0.120) and negative predictive value (91.4-93.5% vs. 92.9-97.3%; p = 0.259).
CONCLUSION: Deep learning-based CAD could improve radiologists' diagnostic performance by increasing their specificity, accuracy, and PPV in differentiating between malignant and benign masses on breast US.
Copyright © 2019 The Korean Society of Radiology.

Keywords:  Breast; CAD; Deep learning; Diagnostic performance; Radiologist; Ultrasound

Year:  2019        PMID: 30993926      PMCID: PMC6470083          DOI: 10.3348/kjr.2018.0530

Source DB:  PubMed          Journal:  Korean J Radiol        ISSN: 1229-6929            Impact factor:   3.500


INTRODUCTION

Ultrasound (US) is an important non-radiating imaging method for the detection and characterization of breast masses; it is well tolerated by patients and easily integrated into interventional procedures for patient treatment (1–4). However, breast US has the inherent limitation of being operator dependent: differences between operators in their knowledge and understanding of various breast US techniques lead to interobserver variability in the diagnosis of breast masses (3, 5). The Breast Imaging Reporting and Data System (BI-RADS) for breast US mitigates this limitation by providing standardized terms for describing breast mass features and assessments, as well as management recommendations for breast masses (6), and it has proven effective in differentiating between benign and malignant masses (7, 8). However, many BI-RADS US descriptors are found in both malignant and benign masses, an issue that is especially common with category 4 masses. Thus, category 4 breast masses have a wide range of malignancy risk (3–94%) (6, 9), and their classification into subcategories 4a, 4b, and 4c is poorly reproducible among radiologists (10). To date, no specific US descriptor accurately predicts the risk of malignancy in breast masses (10, 11).

Computer-aided diagnosis (CAD) is a computerized procedure that provides a second, objective opinion to assist radiologists' image interpretation and diagnosis (12). To increase diagnostic accuracy and decrease interobserver variability, CAD systems for breast US have been applied to differentiate between malignant and benign masses (13–17). Previous studies have shown that several breast US CAD systems had excellent diagnostic performance, with a receiver operating characteristic (ROC) area under the curve (AUC) of approximately 0.9 for differentiating between benign and malignant masses (13, 14, 16). Moreover, these systems decreased interobserver variability in biopsy recommendations (15). These studies used conventional CAD systems developed by individual research teams. Conventional CAD pipelines consist of feature extraction, selection, and classification (18–21). In tuning the overall performance of conventional CAD, the most important issue is effective feature extraction, which can alleviate the burden of feature selection and classification (19, 21). However, extracting meaningful features is a complex and time-consuming task requiring many image processing steps (21), which makes fine-tuning the overall performance of conventional CAD difficult.

Currently, deep learning techniques are considered the most advanced technology for image classification (22, 23). Their main benefit is that they reduce the burden of feature selection and classification by generating a set of transformation functions and image features directly from the data (21). Deep learning techniques have been applied in radiology with promising results (24–26). A recent study applied deep learning to CAD for breast lesions on US as well as lung nodules on computed tomography, and showed that CAD with deep learning techniques (deep learning-based CAD) outperforms conventional CAD (27). However, no study has yet evaluated the effect of deep learning-based CAD on radiologists' decision processes when diagnosing breast masses. Recently, a deep learning-based CAD system for breast US (S-Detect™ for Breast in RS80A; Samsung Medison Co., Ltd., Seoul, Korea) became commercially available (21). Therefore, the purpose of this study was to investigate whether deep learning-based CAD could improve radiologists' diagnostic performance in differentiating between malignant and benign masses on breast US.

MATERIALS AND METHODS

Participants and Breast Masses

This study was approved by the Institutional Review Board of Samsung Medical Center. Written informed consent was obtained from all participants regarding the use of their medical information for research purposes. Women referred for breast US for diagnostic purposes were recruited at Samsung Medical Center (Seoul, Korea) between January and December 2015. Eligible patients were women aged ≥ 20 years with breast masses detected by US. Women who had masses without definite final diagnoses were excluded. This study included 816 patients with 1043 breast masses, and their US images were used to build a database. From the database, 790 masses were randomly selected to construct datasets for training the deep learning-based CAD system (21). Thus, the remaining 253 breast masses (80 malignant, 173 benign) from 226 patients were enrolled in this study. The median age of these patients was 47 years (interquartile range [IQR], 42.0–53.5 years). Their US images were used to construct datasets for image analysis. One hundred ninety-nine patients had one breast mass, and 27 patients had two breast masses. The final diagnosis for each mass was based on the histopathologic results of US-guided biopsy (n = 48) or surgery (n = 99), or on typical imaging findings alone if the mass showed stability on follow-up imaging (n = 106) (Table 1). The mean follow-up duration was 21.5 months (range, 17–28 months). BI-RADS category 3 masses with insufficient follow-up were excluded from the study population.
Table 1

Characteristics of the Overall 253 Breast Masses

Characteristics | Benign (n = 173) | Malignant (n = 80) | P
Age of patients (years) | 44.0 (40.0–51.0) | 51.5 (46.0–61.0) | < 0.001
B-mode US | | |
 Size (cm) | 1.0 (0.7–1.3) | 1.7 (1.2–2.5) | < 0.001
Pathologic diagnosis | | | –
 Fibroadenoma | 43 (24.8) | – |
 Fibrocystic change | 6 (3.4) | – |
 Intraductal papilloma | 6 (3.4) | – |
 Phyllodes tumor | 5 (2.9) | – |
 Stromal fibrosis | 2 (1.2) | – |
 Fibroadenomatoid mastopathy | 2 (1.2) | – |
 Adenosis | 2 (1.2) | – |
 Lobular carcinoma in situ | 1 (0.6) | – |
 Cyst* | 1 (0.6) | – |
 N/A† | 105 (60.7) | – |
 Invasive ductal carcinoma | – | 67 (83.7) |
 Ductal carcinoma in situ | – | 9 (11.3) |
 Invasive lobular carcinoma | – | 3 (3.7) |
 Invasive papillary carcinoma | – | 1 (1.3) |

Numeric data are presented as median (interquartile range). Non-numeric data are presented as number of lesions (percentage). *One cyst was diagnosed based on typical ultrasonographic features, without biopsy. †Benign masses assessed as Breast Imaging Reporting and Data System category 2 or 3, all with stability on follow-up US for at least 1 year. N/A = not available, US = ultrasound

US Examination

Three board-certified radiologists with more than eight years of experience in breast imaging were involved in image acquisition. US images were obtained using an RS80A system (Samsung Medison Co., Ltd.) with a 3–12-MHz linear high-frequency transducer. The radiologists performed bilateral whole-breast B-mode US and obtained three directional (i.e., transverse, longitudinal, and radial) static images showing the most suspicious features of each mass. For CAD analysis, video clips that included the entire mass and the surrounding normal breast tissue were subsequently recorded. Video clips were recorded in one direction, starting at one end of the mass and ending at the other. Without considering other imaging findings, the radiologists independently assessed the BI-RADS final category based on the B-mode US findings (6). Biopsies were performed on masses assessed as BI-RADS category 4a or higher (n = 105), palpable category 3 masses (n = 11), and masses increasing in size (n = 5). Biopsies were also performed on category 2 or 3 masses upon patient request (n = 26). US-guided core needle biopsy was performed with at least four passes using a 14-gauge automated biopsy gun (Acecut; TSK Laboratory, Soja, Japan). US-guided vacuum-assisted biopsy was performed with an 8- or 11-gauge needle (Mammotome; Devicor Medical, Cincinnati, OH, USA).

Image Analysis by the Deep Learning-Based CAD System and Radiologists

For CAD analysis, the three radiologists who performed US data acquisition retrospectively reviewed the video clips for each mass. They chose representative static images (i.e., transverse and longitudinal, or radial and anti-radial) showing the most suspicious features and identified the location of the mass. On the chosen static image, a two-dimensional region of interest (ROI) was automatically drawn along the mass margin by the deep learning-based CAD system, which is based on the GoogLeNet convolutional neural network (S-Detect™ for Breast [High Accuracy Mode] in RS80A) (21) (Supplementary Materials, in the online-only Data Supplement). When the automatically generated ROI was considered inaccurate by a radiologist, it was manually adjusted (Fig. 1). Based on the given ROI, the deep learning-based CAD system automatically analyzed the US features and provided a final assessment of the mass displayed on the screen. CAD final assessments were divided into two categories, “possibly benign” or “possibly malignant.”
Fig. 1

24-year-old woman diagnosed with fibroadenoma using US-guided biopsy.

A. Transverse B-mode US image shows 15-mm oval hypoechoic mass (arrows). B. After radiologist clicked on center point of mass on US image shown, two-dimensional region of interest (green line) was automatically drawn along mass margin through deep learning-based CAD. Following this, deep learning-based CAD analyzed US features of mass according to BI-RADS lexicon and displayed final assessment of “possibly benign” on screen. During first reading session (US images alone), two readers classified mass as BI-RADS category 4a because they assessed that margin of mass was angular (right arrow in A), whereas other two readers did not and classified mass as category 3. During second reading session (US images with CAD), two readers who previously classified mass as category 4a reassessed it as category 3, whereas two readers who previously classified it as category 3 did not change their classifications. BI-RADS = Breast Imaging Reporting and Data System, CAD = computer-aided diagnosis, US = ultrasound

Image analysis was independently performed by four radiologists who had not performed the US examinations. Two radiologists (with 11 years and 3 years of experience, respectively) were experienced in breast imaging, and the other two radiologists were in training, with less than one year of breast imaging experience. In clinical practice, radiologists first perform B-mode US and then apply CAD to masses detected by B-mode US. Thus, two sequential reading sessions resembling actual practice were performed. In the first session, the radiologists interpreted each mass using only the static B-mode US images (the same images used to derive the CAD results) and recorded its BI-RADS final assessment category (6). In the second session, the radiologists interpreted each mass by considering the B-mode US images together with the associated CAD results; they recorded the category of each mass again, taking into account their category assignments from the first session. During the reading sessions, the radiologists were blinded to patient names, ages, identification numbers, other imaging modality findings, histopathological diagnoses, and clinical information.

Data and Statistical Analysis

Age and the mass size measured at US were compared between the malignant and benign groups using the Mann-Whitney U test. To analyze the diagnostic performances of the radiologists and deep learning-based CAD in differentiating malignant from benign masses, the radiologists' final assessments were categorized as positive (category 4a or higher) or negative (category 3 or lower) for each dataset. The deep learning-based CAD results were likewise categorized as positive (possibly malignant) or negative (possibly benign). The sensitivities, specificities, accuracies, positive predictive values (PPVs), and negative predictive values (NPVs) of the radiologists for the two datasets (US images alone or with the CAD results) and of deep learning-based CAD were calculated against the final breast mass diagnoses. To investigate the effect of deep learning-based CAD on the radiologists' diagnostic performance, the corresponding diagnostic values of each radiologist for the two datasets were compared using McNemar's, chi-square, and Bennett's tests (28). Differences in the diagnostic values of the experienced radiologists (readers 1 and 2), training radiologists (readers 3 and 4), and all radiologists (readers 1–4) between the two datasets were further analyzed by a generalized estimating equations (GEE) approach (29). Moreover, to evaluate changes in the radiologists' decision making regarding the final BI-RADS category for predicting malignancy risk, an ROC curve analysis was performed using the seven-point BI-RADS rating score (i.e., 1, 2, 3, 4a, 4b, 4c, or 5). The ROC AUCs of the radiologists were calculated, and the readers' AUCs for the two datasets were compared using a nonparametric approach (30). Because biopsy is performed for masses assessed as BI-RADS category 4a or higher, changes in the radiologists' management decisions after CAD application were evaluated by counting, for each reader, the cases with management decision changes (i.e., biopsy or follow-up) (31). Interobserver agreement between the four radiologists on the final assessments (positive or negative) was also evaluated for each dataset using κ statistics. The κ values were interpreted as follows: ≤ 0.20, slight agreement; 0.21–0.40, fair agreement; 0.41–0.60, moderate agreement; 0.61–0.80, substantial agreement; and 0.81–1.00, excellent agreement (32). Analysis was performed using SAS version 9.4 (SAS Institute, Cary, NC, USA). Statistical significance was accepted when p values were less than 0.05.
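The per-reader comparisons described above reduce to arithmetic on 2 × 2 tables plus a paired test on the discordant calls. As a minimal illustration (not the authors' SAS code; the function names are ours), the sketch below computes the five diagnostic values from a 2 × 2 table and an exact two-sided McNemar p value, using the CAD-alone counts reported in Table 2:

```python
from math import comb

def diagnostic_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, accuracy, PPV, NPV from a 2 x 2 table."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }

def mcnemar_exact(b, c):
    """Exact two-sided McNemar p value from the two discordant counts
    (cases correct under one reading mode but wrong under the other)."""
    n = b + c
    p = 2 * sum(comb(n, i) for i in range(min(b, c) + 1)) * 0.5 ** n
    return min(p, 1.0)

# CAD-alone counts from Table 2: 68/80 malignant masses called positive,
# 165/173 benign masses called negative
m = diagnostic_metrics(tp=68, fp=8, tn=165, fn=12)
print({k: round(v * 100, 1) for k, v in m.items()})
# → {'sensitivity': 85.0, 'specificity': 95.4, 'accuracy': 92.1, 'ppv': 89.5, 'npv': 93.2}
```

The same helpers apply to each reader's paired readings; the study's GEE pooling across readers is a separate modeling step not sketched here.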

RESULTS

The median diameter of all of the breast masses at US was 1.1 cm (IQR, 0.8–1.7 cm). Patients diagnosed with malignant masses were significantly older than those with benign masses (p < 0.001), and the median size of the malignant masses was significantly larger than that of the benign masses (p < 0.001) (Table 1). The diagnostic performances of deep learning-based CAD and of the radiologists for the two datasets (US images alone or with CAD) in differentiating malignant from benign masses are summarized in Table 2. The sensitivity, specificity, accuracy, PPV, and NPV of deep learning-based CAD were 85.0%, 95.4%, 92.1%, 89.5%, and 93.2%, respectively. Each radiologist's diagnostic performance changed when the deep learning-based CAD results were added to the US images. With the CAD results, the two experienced radiologists and one of the training radiologists (readers 1–3) had significantly higher specificities, accuracies, and PPVs than with the US images alone (range of readers 1–3 for US images alone vs. US images with CAD: specificity, 72.8–83.2% vs. 82.1–93.1% [p < 0.001, p = 0.006, and p = 0.014 for readers 1, 2, and 3, respectively]; accuracy, 77.9–84.2% vs. 86.2–90.9% [p < 0.001, p = 0.046, and p = 0.045]; and PPV, 60.2–70.4% vs. 71.0–85.2% [p < 0.001, p = 0.003, and p = 0.004]). For sensitivity and NPV, readers 2 and 3 showed higher, but non-significant, values with the CAD results than with the US images alone (range of readers 2 and 3 for US images alone vs. US images with CAD: sensitivity, 86.3–88.8% vs. 90.0–95.0%; NPV, 92.9–93.5% vs. 95.1–97.3%; all p > 0.05), and reader 1 had a similar sensitivity and NPV (US images alone vs. US images with CAD: sensitivity, 88.8% vs. 86.3%; NPV, 93.3% vs. 93.6%; all p > 0.05).

The other training radiologist (reader 4) showed no significant differences in any of the diagnostic values when the CAD results were added to the US images (p > 0.05). The experienced radiologists (readers 1 and 2) had higher specificities (†p < 0.001), accuracies (†p < 0.001), and PPVs (†p < 0.001) for the US images with CAD than for the US images alone, whereas the training radiologists (readers 3 and 4) had higher sensitivities (‡p = 0.040) and NPVs (‡p = 0.045). In the comparative analysis of overall radiologist performance between the two datasets, the radiologists showed significantly higher specificity (*p < 0.001), accuracy (*p = 0.038), and PPV (*p = 0.001) for the combination of the CAD results and US images than for the US images alone (Fig. 1, Table 2).
Table 2

Diagnostic Performance of Deep Learning-Based CAD and Radiologists for Two Datasets (US Images Alone or with CAD Results)

Sensitivity (%)
 CAD: 85.0 (68/80)
 Reader 1: US alone 88.8 (71/80) vs. US with CAD 86.3 (69/80), p = 0.683
 Reader 2: US alone 86.3 (69/80) vs. US with CAD 90.0 (72/80), p = 0.371; experienced readers† p = 0.782
 Reader 3: US alone 88.8 (71/80) vs. US with CAD 95.0 (76/80), p = 0.182
 Reader 4: US alone 81.3 (65/80) vs. US with CAD 86.3 (69/80), p = 0.221; training readers‡ p = 0.040
 All readers* p = 0.120

Specificity (%)
 CAD: 95.4 (165/173)
 Reader 1: US alone 72.8 (126/173) vs. US with CAD 93.1 (161/173), p < 0.001
 Reader 2: US alone 83.2 (144/173) vs. US with CAD 90.2 (156/173), p = 0.006; experienced readers† p < 0.001
 Reader 3: US alone 75.1 (130/173) vs. US with CAD 82.1 (142/173), p = 0.014
 Reader 4: US alone 92.5 (160/173) vs. US with CAD 89.0 (154/173), p = 0.211; training readers‡ p = 0.373
 All readers* p < 0.001

Accuracy (%)
 CAD: 92.1 (233/253)
 Reader 1: US alone 77.9 (197/253) vs. US with CAD 90.9 (230/253), p < 0.001
 Reader 2: US alone 84.2 (213/253) vs. US with CAD 90.1 (228/253), p = 0.046; experienced readers† p < 0.001
 Reader 3: US alone 79.4 (201/253) vs. US with CAD 86.2 (218/253), p = 0.045
 Reader 4: US alone 88.9 (225/253) vs. US with CAD 88.1 (223/253), p = 0.780; training readers‡ p = 0.066
 All readers* p = 0.038

PPV (%)
 CAD: 89.5 (68/76)
 Reader 1: US alone 60.2 (71/118) vs. US with CAD 85.2 (68/81), p < 0.001
 Reader 2: US alone 70.4 (69/98) vs. US with CAD 80.1 (72/89), p = 0.003; experienced readers† p < 0.001
 Reader 3: US alone 62.3 (71/114) vs. US with CAD 71.0 (76/107), p = 0.004
 Reader 4: US alone 83.3 (65/78) vs. US with CAD 78.4 (69/88), p = 0.214; training readers‡ p = 0.267
 All readers* p = 0.001

NPV (%)
 CAD: 93.2 (165/177)
 Reader 1: US alone 93.3 (126/135) vs. US with CAD 93.6 (161/172), p = 0.860
 Reader 2: US alone 92.9 (144/155) vs. US with CAD 95.1 (156/164), p = 0.106; experienced readers† p = 0.250
 Reader 3: US alone 93.5 (130/139) vs. US with CAD 97.3 (142/146), p = 0.155
 Reader 4: US alone 91.4 (160/175) vs. US with CAD 93.3 (154/165), p = 0.155; training readers‡ p = 0.045
 All readers* p = 0.259

Data in parentheses were used to calculate percentages. Unmarked p values compare B-mode US alone with the combination of B-mode US and the CAD results for each reader. *Adjusted p value comparing all radiologists between B-mode US alone and the combination of B-mode US and the CAD results by the GEE approach, †Adjusted p value for the experienced radiologists by the GEE approach, ‡Adjusted p value for the training radiologists by the GEE approach. CAD = computer-aided diagnosis, GEE = generalized estimating equations, NPV = negative predictive value, PPV = positive predictive value

In predicting malignancy risk with the BI-RADS categories, the radiologists' AUCs for the US images with CAD (range, 0.914–0.951) were significantly higher than those for the US images alone (0.884–0.919; p < 0.001) (Fig. 2).
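The AUC comparison above rests on the rank identity between the ROC area and the probability that a malignant mass receives a higher ordinal rating than a benign one. The sketch below illustrates this for seven-point BI-RADS ratings (the reader scores are hypothetical, not study data; categories 1, 2, 3, 4a, 4b, 4c, and 5 are mapped to 1–7):

```python
def auc_from_ordinal(pos_scores, neg_scores):
    """Nonparametric AUC: fraction of (malignant, benign) pairs in which the
    malignant mass is rated higher, counting ties as half."""
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))

# hypothetical reader ratings on the mapped 1-7 scale
malignant = [7, 6, 5, 4, 6]
benign = [2, 3, 3, 4, 1, 2]
print(round(auc_from_ordinal(malignant, benign), 3))  # → 0.983
```

The study's significance test between paired AUCs uses a nonparametric covariance estimate (reference 30), which this sketch does not reproduce.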
Fig. 2

ROC curves for radiologists for two datasets (US images alone vs. US images with CAD) based on probability of malignancy risk.

When deep learning-based CAD results were added to US, the readers' AUCs (right; range, 0.914–0.951) were significantly higher than those for US images alone (left; range, 0.884–0.919; p < 0.001). AUC = area under curve, ROC = receiver operating characteristic

Regarding radiologist management decision changes, deep learning-based CAD led to biopsy decisions being correctly changed to follow-up decisions for a mean of 10.1% (17.5/173) of the benign masses; however, follow-up decisions were incorrectly changed to biopsy decisions for a mean of 2.5% (4.3/173) of the benign masses (Table 3). In the malignant masses, follow-up decisions were correctly changed to biopsy decisions in a mean of 5.6% (4.5/80) of the masses (Fig. 3); however, biopsy decisions were incorrectly changed to follow-up decisions in a mean of 2.5% (2.0/80) of the masses (Fig. 4).
Table 3

Changes in Radiologists' Decision Making for Biopsy Recommendations When Deep Learning-Based CAD Results Were Added to US

Radiologists | Benign (n = 173): FU to Bx | Benign: Bx to FU | Malignant (n = 80): FU to Bx | Malignant: Bx to FU
Reader 1 | 0 | 35 | 2 | 4
Reader 2 | 2 | 14 | 4 | 1
Reader 3 | 4 | 16 | 7 | 2
Reader 4 | 11 | 5 | 5 | 1
Mean ± standard deviation | 4.3 ± 4.8 | 17.5 ± 12.6 | 4.5 ± 2.1 | 2.0 ± 1.4

Data are numbers of masses. Bx = biopsy, FU = follow-up
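The summary row of Table 3 and the percentages quoted in the text follow directly from the per-reader counts. As a quick check (reading the benign biopsy-to-follow-up column from Table 3; the variable name is ours):

```python
from statistics import mean, stdev

# Per-reader counts from Table 3: benign masses changed from biopsy to follow-up
bx_to_fu_benign = [35, 14, 16, 5]

m, sd = mean(bx_to_fu_benign), stdev(bx_to_fu_benign)
print(f"{m:.1f} ± {sd:.1f}")    # → 17.5 ± 12.6 (matches the table's summary row)
print(f"{100 * m / 173:.1f}%")  # → 10.1% (the rate quoted in the text)
```

Note that `stdev` is the sample standard deviation (n − 1 denominator), which is what the table's ± values reflect.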

Fig. 3

50-year-old woman diagnosed with ductal carcinoma in situ using US-guided biopsy and surgical excision.

A. Transverse B-mode US image shows 13-mm oval mass with slightly heterogeneous echo pattern (arrows). B. Deep learning-based CAD analyzed US features of mass (green line) and displayed final assessment of “possibly malignant” on screen. During first reading session (US images alone), all four readers classified mass as BI-RADS category 3. During second reading session (US images with CAD), three of four readers changed their assessment to category 4a.

Fig. 4

48-year-old woman diagnosed with invasive ductal carcinoma using US-guided biopsy and surgical excision.

A. Transverse B-mode US image shows 19-mm isoechoic mass (arrows). B. Deep learning-based CAD analyzed US features of mass (green line) and displayed final assessment of “possibly benign” on screen. During first reading session (US images alone), mass was classified as BI-RADS category 4b by one reader, category 4a by another reader, and category 3 by other two readers. During second reading session (US images with CAD), reader who previously classified mass as category 4b reassessed it as category 4a, whereas reader who classified it as category 4a reassessed it as category 3. Two readers who classified mass as category 3 did not change their classifications.

For diagnosing malignant breast masses with US images alone, the four readers showed moderate to substantial agreement. When the CAD results were added to the US images, all of the readers showed substantial agreement (Table 4).
Table 4

Changes in Interobserver Agreement among Radiologists' Final Assessments when Deep Learning-Based CAD Results Were Added to US

Reading Modes | Reader 1 | Reader 2 | Reader 3 | Reader 4
US alone
 Reader 1 | – | 0.663 (0.571–0.755) | 0.538 (0.434–0.643) | 0.546 (0.447–0.645)
 Reader 2 | 0.663 (0.571–0.755) | – | 0.563 (0.461–0.666) | 0.706 (0.616–0.796)
 Reader 3 | 0.538 (0.434–0.643) | 0.563 (0.461–0.666) | – | 0.556 (0.456–0.656)
 Reader 4 | 0.546 (0.447–0.645) | 0.706 (0.616–0.796) | 0.556 (0.456–0.656) | –
US with CAD
 Reader 1 | – | 0.788 (0.707–0.868) | 0.632 (0.536–0.728) | 0.760 (0.675–0.845)
 Reader 2 | 0.788 (0.707–0.868) | – | 0.718 (0.631–0.805) | 0.783 (0.702–0.863)
 Reader 3 | 0.632 (0.536–0.728) | 0.718 (0.631–0.805) | – | 0.743 (0.659–0.827)
 Reader 4 | 0.760 (0.675–0.845) | 0.783 (0.702–0.863) | 0.743 (0.659–0.827) | –

Data are κ values (95% confidence intervals). CAD = computer-aided diagnosis, US = ultrasound
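The pairwise values in Table 4 are Cohen's κ for two readers' dichotomized (positive/negative) final assessments. A minimal sketch with hypothetical reader calls (not study data; the function name is ours):

```python
def cohens_kappa(x, y):
    """Cohen's kappa for two raters' binary calls (1 = positive, 0 = negative)."""
    n = len(x)
    po = sum(a == b for a, b in zip(x, y)) / n  # observed agreement
    # chance agreement from each rater's marginal positive rate
    p1, p2 = sum(x) / n, sum(y) / n
    pe = p1 * p2 + (1 - p1) * (1 - p2)
    return (po - pe) / (1 - pe)

# hypothetical calls for two readers over 10 masses (1 = biopsy, 0 = follow-up)
r1 = [1, 1, 0, 0, 1, 0, 0, 1, 0, 0]
r2 = [1, 0, 0, 0, 1, 0, 1, 1, 0, 0]
print(round(cohens_kappa(r1, r2), 3))  # → 0.583 ("moderate" on the scale above)
```

The confidence intervals in Table 4 would additionally require a standard-error estimate for κ, which this sketch omits.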

DISCUSSION

To our knowledge, few studies have investigated the effect of deep learning-based CAD on the decision-making process of radiologists diagnosing breast masses. We found that adding deep learning-based CAD results to B-mode US images significantly improved the specificities, accuracies, and PPVs of three of the four radiologists in differentiating malignant from benign masses, without loss of sensitivity or NPV. The limited impact on sensitivity and NPV may be due to the radiologists' already high sensitivity (81.3–88.8%) and NPV (91.4–93.5%) with US images alone. In addition, deep learning-based CAD significantly increased the AUCs of all of the radiologists for predicting malignancy risk with the BI-RADS categories. These results suggest that deep learning-based CAD can improve the performance of radiologists in diagnosing breast masses on breast US.

Our results agree with previous studies using CAD systems developed by individual research teams (13, 14, 16, 27). Unlike those studies, however, we used a commercially available deep learning-based CAD system. We therefore believe that our results are of clinical value, as this CAD system is directly applicable to clinical practice. Prior to this study, we anticipated that the CAD effects might be minimal for experienced radiologists compared with training radiologists. However, the diagnostic performance of the experienced radiologists improved significantly after the application of deep learning-based CAD, whereas the performance of only one of the training radiologists improved. Given the relatively high specificity (92.5%) of the other training radiologist (reader 4) with US images alone, there was little room for deep learning-based CAD to further increase this reader's specificity. Consequently, our results suggest that deep learning-based CAD can improve the performance of both experienced and inexperienced radiologists by increasing specificity.

In addition, CAD improved interobserver agreement on the radiologists' final assessments in differentiating between malignant and benign masses, indicating that CAD may give radiologists greater consistency when using breast US for diagnosis and management. After the application of deep learning-based CAD, we found that most of the masses for which management decisions changed had initially been assessed as BI-RADS category 3 or 4a; management decisions did not change for typically benign (category 2) or moderate- to high-suspicion (category 4c or 5) masses. These findings indicate that deep learning-based CAD can improve diagnostic performance by guiding radiologists toward correct biopsy decisions in cases where it is difficult to decide whether to biopsy a BI-RADS 3 or 4a mass. We therefore believe that the commercially available deep learning-based CAD system used in our study can serve as an adjunctive tool similar to shear-wave elastography (SWE), which has been used as an ancillary tool to reduce the number of benign biopsies by further discriminating between category 3 and 4a masses detected by US (33, 34). By decreasing false-positive (10.1%, 17.5/173) and increasing true-positive (5.6%, 4.5/80) biopsies, deep learning-based CAD led to correct management decision changes by the radiologists. However, for malignant masses, incorrect decision changes from biopsy to follow-up occurred at a mean rate of 2.5% (2.0/80); radiologists should be aware of this possibility when applying CAD in clinical practice.

This study has several limitations. First, the radiologists selected a representative image and confirmed the ROI for deep learning-based CAD, which means that the CAD results may show interobserver variability due to differences in the features observed between representative images. However, in a recent study by Sultan et al. (35), differences in US BI-RADS features between observations did not change the diagnostic performance of conventional CAD in differentiating between breast masses, owing to continual retraining. Considering this study as well as the widespread use of the US BI-RADS lexicon in breast imaging, we think that if radiologists who are familiar with the US BI-RADS use our deep learning-based CAD, the CAD results will not vary much between radiologists. Second, non-mass lesions (e.g., architectural distortion, calcifications not associated with a mass) were excluded from our analysis because their margins were not clearly distinguishable from normal breast tissue, which made it difficult to confirm non-mass lesion ROIs for CAD. Therefore, our results are not directly applicable to the diagnosis of non-mass lesions detected by breast US. Finally, we potentially included benign or typically benign masses that did not undergo biopsy. However, for such masses, follow-up US is generally recommended without biopsy (6), and all of these masses were stable or decreased in size during follow-up.

In conclusion, the diagnostic performance of deep learning-based CAD was higher than that of the radiologists in differentiating between malignant and benign masses on breast US. When the CAD results were added to the US images, the radiologists improved in specificity, accuracy, and PPV without significant changes in sensitivity and NPV. The use of deep learning-based CAD may thus improve radiologists' diagnostic performance by increasing their specificity, accuracy, and PPV.
References (32 in total; first 10 shown below)

1.  Application of the mutual information criterion for feature selection in computer-aided diagnosis.

Authors:  G D Tourassi; E D Frederick; M K Markey; C E Floyd
Journal:  Med Phys       Date:  2001-12       Impact factor: 4.071

2.  Shear-wave elastography improves the specificity of breast US: the BE1 multinational study of 939 masses.

Authors:  Wendie A Berg; David O Cosgrove; Caroline J Doré; Fritz K W Schäfer; William E Svensson; Regina J Hooley; Ralf Ohlinger; Ellen B Mendelson; Catherine Balu-Maestro; Martina Locatelli; Christophe Tourasse; Barbara C Cavanaugh; Valérie Juhan; A Thomas Stavros; Anne Tardivon; Joel Gay; Jean-Pierre Henry; Claude Cohen-Bacrie
Journal:  Radiology       Date:  2012-02       Impact factor: 11.105

3.  Accuracy of screening mammography interpretation by characteristics of radiologists.

Authors:  William E Barlow; Chen Chi; Patricia A Carney; Stephen H Taplin; Carl D'Orsi; Gary Cutter; R Edward Hendrick; Joann G Elmore
Journal:  J Natl Cancer Inst       Date:  2004-12-15       Impact factor: 13.506

4.  BI-RADS lexicon for US and mammography: interobserver variability and positive predictive value.

Authors:  Elizabeth Lazarus; Martha B Mainiero; Barbara Schepps; Susan L Koelliker; Linda S Livingston
Journal:  Radiology       Date:  2006-03-28       Impact factor: 11.105

5.  Computer-aided diagnosis in medical imaging: historical review, current status and future potential.

Authors:  Kunio Doi
Journal:  Comput Med Imaging Graph       Date:  2007-03-08       Impact factor: 4.790

6.  Deep Learning at Chest Radiography: Automated Classification of Pulmonary Tuberculosis by Using Convolutional Neural Networks.

Authors:  Paras Lakhani; Baskaran Sundaram
Journal:  Radiology       Date:  2017-04-24       Impact factor: 11.105

7.  Ultrasound positive predictive values by BI-RADS categories 3-5 for solid masses: An independent reader study.

Authors:  A Thomas Stavros; Andrea G Freitas; Giselle G N deMello; Lora Barke; Dennis McDonald; Terese Kaske; Ducly Wolverton; Arnold Honick; Daniela Stanzani; Adriana H Padovan; Ana Paula C Moura; Marilia C V de Campos
Journal:  Eur Radiol       Date:  2017-04-10       Impact factor: 5.315

8.  Hierarchical feature representation and multimodal fusion with deep learning for AD/MCI diagnosis.

Authors:  Heung-Il Suk; Seong-Whan Lee; Dinggang Shen
Journal:  Neuroimage       Date:  2014-07-18       Impact factor: 6.556

9.  A deep learning framework for supporting the classification of breast lesions in ultrasound images.

Authors:  Seokmin Han; Ho-Kyung Kang; Ja-Yeon Jeong; Moon-Ho Park; Wonsik Kim; Won-Chul Bang; Yeong-Kyeong Seong
Journal:  Phys Med Biol       Date:  2017-09-15       Impact factor: 3.609

10.  Breast cancer detection using automated whole breast ultrasound and mammography in radiographically dense breasts.

Authors:  Kevin M Kelly; Judy Dean; W Scott Comulada; Sung-Jae Lee
Journal:  Eur Radiol       Date:  2009-09-02       Impact factor: 5.315

Cited by (26 in total)

1.  Diagnostic performance improvement with combined use of proteomics biomarker assay and breast ultrasound.

Authors:  Su Min Ha; Hong-Kyu Kim; Yumi Kim; Dong-Young Noh; Wonshik Han; Jung Min Chang
Journal:  Breast Cancer Res Treat       Date:  2022-01-27       Impact factor: 4.872

2.  Application of Artificial Intelligence Computer-Assisted Diagnosis Originally Developed for Thyroid Nodules to Breast Lesions on Ultrasound.

Authors:  Si Eun Lee; Eunjung Lee; Eun-Kyung Kim; Jung Hyun Yoon; Vivian Youngjean Park; Ji Hyun Youk; Jin Young Kwak
Journal:  J Digit Imaging       Date:  2022-07-28       Impact factor: 4.903

3.  Application of endoscopic ultrasonography for detecting esophageal lesions based on convolutional neural network.

Authors:  Gao-Shuang Liu; Pei-Yun Huang; Min-Li Wen; Shuai-Shuai Zhuang; Jie Hua; Xiao-Pu He
Journal:  World J Gastroenterol       Date:  2022-06-14       Impact factor: 5.374

4.  Usefulness of a deep learning system for diagnosing Sjögren's syndrome using ultrasonography images.

Authors:  Yoshitaka Kise; Mayumi Shimizu; Haruka Ikeda; Takeshi Fujii; Chiaki Kuwada; Masako Nishiyama; Takuma Funakoshi; Yoshiko Ariji; Hiroshi Fujita; Akitoshi Katsumata; Kazunori Yoshiura; Eiichiro Ariji
Journal:  Dentomaxillofac Radiol       Date:  2019-12-11       Impact factor: 2.419

5.  An investigation of the classification accuracy of a deep learning framework-based computer-aided diagnosis system in different pathological types of breast lesions.

Authors:  Mengsu Xiao; Chenyang Zhao; Qingli Zhu; Jing Zhang; He Liu; Jianchu Li; Yuxin Jiang
Journal:  J Thorac Dis       Date:  2019-12       Impact factor: 2.895

6.  The diagnostic performance of ultrasound computer-aided diagnosis system for distinguishing breast masses: a prospective multicenter study.

Authors:  Qi Wei; Yu-Jing Yan; Ge-Ge Wu; Xi-Rong Ye; Fan Jiang; Jie Liu; Gang Wang; Yi Wang; Juan Song; Zhi-Ping Pan; Jin-Hua Hu; Chao-Ying Jin; Xiang Wang; Christoph F Dietrich; Xin-Wu Cui
Journal:  Eur Radiol       Date:  2022-01-23       Impact factor: 5.315

7.  Artificial Intelligence for Breast Ultrasound: Will It Impact Radiologists' Accuracy?

Authors:  Manisha Bahl
Journal:  J Breast Imaging       Date:  2021-04-26

8.  Automatic Detection and Classification of Rib Fractures on Thoracic CT Using Convolutional Neural Network: Accuracy and Feasibility.

Authors:  Qing Qing Zhou; Jiashuo Wang; Wen Tang; Zhang Chun Hu; Zi Yi Xia; Xue Song Li; Rongguo Zhang; Xindao Yin; Bing Zhang; Hong Zhang
Journal:  Korean J Radiol       Date:  2020-07       Impact factor: 3.500

9.  A Glimpse on Trends and Characteristics of Recent Articles Published in the Korean Journal of Radiology.

Authors:  Yeon Hyeon Choe
Journal:  Korean J Radiol       Date:  2019-12       Impact factor: 3.500

10.  Diagnostic Value of Breast Lesions Between Deep Learning-Based Computer-Aided Diagnosis System and Experienced Radiologists: Comparison the Performance Between Symptomatic and Asymptomatic Patients.

Authors:  Mengsu Xiao; Chenyang Zhao; Jianchu Li; Jing Zhang; He Liu; Ming Wang; Yunshu Ouyang; Yixiu Zhang; Yuxin Jiang; Qingli Zhu
Journal:  Front Oncol       Date:  2020-07-07       Impact factor: 6.244

