L Barinov, A Jairaj, M Becker, S Seymour, E Lee, A Schram, E Lane, A Goldszal, D Quigley, L Paster.
Abstract
Ultrasound (US) is a valuable imaging modality for detecting primary breast malignancy. However, radiologists have a limited ability to distinguish benign from malignant lesions on US, leading to false-positive and false-negative results that limit specificity and the positive predictive value of lesions sent for biopsy (PPV3). A recent study demonstrated that incorporating an AI-based decision support (DS) system into US image analysis could improve US diagnostic performance. Although the DS system is promising, its impact must also be measured when it is integrated into existing clinical workflows. The current study evaluates workflow schemas for DS integration and their impact on diagnostic accuracy. Two reading methodologies, sequential and independent, were assessed. The study demonstrates significant differences in accuracy between the two workflow schemas, as measured by area under the receiver operating characteristic curve (AUC), as well as differences in inter-operator variability, as measured by Kendall's tau-b. Compared with previous studies, this evaluation has practical implications for how such technologies are used in diagnostic environments.
Keywords: Artificial intelligence; Breast cancer; Clinical workflow; Computer-aided diagnosis (CAD); Decision support; Machine learning
Year: 2019 PMID: 30324429 PMCID: PMC6499739 DOI: 10.1007/s10278-018-0132-5
Source DB: PubMed Journal: J Digit Imaging ISSN: 0897-1889 Impact factor: 4.056
Fig. 1 Lesion population statistics. a Tumor size. b Tumor grade. c Benign or malignant. d DCIS (non-invasive) vs invasive. e Lymph node status. f BI-RADS designation for the three radiologists tested
Fig. 2 Ultrasound equipment characteristics. a Manufacturer. b Transducer frequency. “n/a” refers to cases in which US transducer frequency was not recorded
Table 1 Score ranges and their corresponding categorical outputs. These ranges and categories are inherent to the system and were not designed or altered for this study
| Categorical output | Score range |
|---|---|
| Benign | [0, 0.25) |
| Probably benign | [0.25, 0.5) |
| Suspicious | [0.5, 0.75) |
| Malignant | [0.75, 1.0] |
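The mapping from a DS score to its categorical output can be sketched in a few lines. This is a minimal illustration of the half-open intervals in the table above, not the system's actual implementation; the function name `categorize` is hypothetical.

```python
def categorize(score: float) -> str:
    """Map a DS score in [0, 1] to the categorical output from the table.

    Hypothetical helper: it only mirrors the published intervals, in which
    every bin is closed on the left and open on the right, except the last
    bin, which also includes 1.0.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie in [0, 1]")
    if score < 0.25:
        return "Benign"
    if score < 0.5:
        return "Probably benign"
    if score < 0.75:
        return "Suspicious"
    return "Malignant"


if __name__ == "__main__":
    for s in (0.0, 0.25, 0.6, 1.0):
        print(s, categorize(s))
```

Note that a score sitting exactly on a boundary (for example 0.25) falls into the higher category, because each interval is closed on the left.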
Fig. 3 Screen capture of the study platform. The left side shows two orthogonal views with ROIs. The right side shows the DS output and the radiologist case assessment input (BI-RADS assessment and likelihood of malignancy percentage)
Table 2 Summary of the three readers involved in this study
| Radiologist ID | Post-educational training experience (years) | ABR certified | Breast fellowship training |
|---|---|---|---|
| 1 | 20+ | x | x |
| 2 | 10+ | x | x |
| 3 | 5+ | x | |
Fig. 4 Schematic representation of the a sequential and b independent reading paradigms. The combination approach shown in c was used in this study
Fig. 5 Results of the system evaluation. a ROC curves and corresponding AUCs assessing the impact of ROI boundary variation. b Assessment of class switching due to ROI boundary variation
Fig. 6 Sensitivity and specificity of each reader’s BI-RADS grading compared with those of the system’s corresponding categorical output
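Comparing sensitivity and specificity as in Fig. 6 requires dichotomizing an ordinal grade into "positive" or "negative". The record does not state which cutoff was used, so the sketch below takes the cutoff as a parameter; the function name and the toy ratings are hypothetical.

```python
def sensitivity_specificity(truth, ratings, cutoff):
    """Return (sensitivity, specificity) for ordinal ratings.

    truth   : iterable of 1 (malignant) / 0 (benign) ground-truth labels
    ratings : iterable of ordinal grades (e.g. numeric BI-RADS categories)
    cutoff  : grades >= cutoff count as a positive call (an assumption;
              the study's own cutoff is not given in this record)
    """
    tp = fn = tn = fp = 0
    for is_malignant, rating in zip(truth, ratings):
        called_positive = rating >= cutoff
        if is_malignant:
            if called_positive:
                tp += 1
            else:
                fn += 1
        else:
            if called_positive:
                fp += 1
            else:
                tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one malignant and one benign case")
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    # Toy data, not taken from the study.
    truth = [1, 1, 0, 0, 0]
    ratings = [5, 3, 4, 2, 3]
    print(sensitivity_specificity(truth, ratings, cutoff=4))
```

The same function can be applied to the system's categorical output by encoding the four categories as ordered integers.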
Table 3 Each reader’s performance was assessed before the system’s output was presented. The results of these control reads, as measured by AUC, are shown in this table
| Radiologist ID | AUC, 95% CI |
|---|---|
| 1 | 0.7618 [0.7244–0.7934] |
| 2 | 0.7543 [0.7197–0.7887] |
| 3 | 0.7325 [0.6897–0.7689] |
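For readers unfamiliar with how an AUC and its confidence interval can be obtained from per-case likelihood scores, the following is a minimal sketch. It uses the rank-based (Mann-Whitney) AUC estimate and a percentile bootstrap; the record does not say which estimator or CI method the authors used, so both choices, along with all function names and the toy data, are assumptions.

```python
import random


def auc(labels, scores):
    """Empirical AUC: probability that a malignant case outscores a benign
    one, counting ties as one half (Mann-Whitney formulation)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUC (assumed method, for illustration)."""
    auc(labels, scores)  # validates that both classes are present
    rng = random.Random(seed)
    indices = range(len(labels))
    stats = []
    while len(stats) < n_boot:
        sample = [rng.choice(indices) for _ in indices]
        ys = [labels[i] for i in sample]
        if 0 < sum(ys) < len(ys):  # skip resamples that contain one class only
            stats.append(auc(ys, [scores[i] for i in sample]))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lo, hi


if __name__ == "__main__":
    # Toy likelihood-of-malignancy scores, not taken from the study.
    labels = [0, 0, 1, 1, 0, 1, 0, 1]
    scores = [0.10, 0.40, 0.35, 0.80, 0.20, 0.90, 0.55, 0.60]
    print(auc(labels, scores), bootstrap_ci(labels, scores))
```

The pairwise loop is quadratic in the number of cases, which is acceptable for a reader study of modest size; a sort-based rank computation would be the usual choice for larger data sets.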
Table 4 To compare the two reading methodologies, each reader’s performance was assessed via AUC and compared with the control reads summarized in Table 3. No reader improved significantly with sequential reads, whereas all readers improved significantly with the independent read strategy
| Radiologist ID | Sequential read AUC, 95% CI | p value | Independent read AUC | p value |
|---|---|---|---|---|
| 1 | 0.7935 [0.7567–0.8229] | 0.235 | 0.8213 | 0.0285* |
| 2 | 0.7674 [0.7327–0.8001] | 0.601 | 0.8305 | 0.00155* |
| 3 | 0.7859 [0.7527–0.8174] | 0.0532 | 0.7988 | 0.0160* |
*Significant
Fig. 7 Comparative assessment of a control, b sequential, and c independent reading workflows. d Operating-point-specific improvement for independent vs control assessments
Table 5 To further characterize reader performance, intra-reader variability was measured via Kendall’s tau-b
| Radiologist ID | Kendall’s tau-b (intra-reader variability) |
|---|---|
| 1 | 0.597 |
| 2 | 0.595 |
| 3 | 0.529 |
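Kendall's tau-b quantifies agreement between two sets of ordinal ratings while correcting for ties, which are common when readers assign a small number of discrete grades. A minimal sketch of the standard tau-b formula follows; the function name and the example ratings are hypothetical, and the study's own computation may have used a library routine instead.

```python
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall's tau-b for two equal-length sequences of ordinal ratings.

    tau_b = (C - D) / sqrt((C + D + Tx) * (C + D + Ty)), where C and D are
    the concordant and discordant pair counts, Tx counts pairs tied only
    in x, and Ty counts pairs tied only in y. Pairs tied in both are ignored.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    concordant = discordant = ties_x = ties_y = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx == 0 and dy == 0:
                continue
            if dx == 0:
                ties_x += 1
            elif dy == 0:
                ties_y += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    denom = sqrt((concordant + discordant + ties_x)
                 * (concordant + discordant + ties_y))
    if denom == 0:
        raise ValueError("tau-b is undefined when one variable is constant")
    return (concordant - discordant) / denom


if __name__ == "__main__":
    # Toy ordinal grades from two hypothetical reads of the same four cases.
    first_read = [1, 2, 2, 3]
    second_read = [1, 2, 3, 3]
    print(kendall_tau_b(first_read, second_read))
```

Values near 1 indicate strong agreement (low variability) and values near 0 indicate little agreement, which is the sense in which the tables report tau-b.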
Table 6 Pairwise inter-reader variability, measured via Kendall’s tau-b (KTB), for the control, sequential read (SR), and independent read (IR) methodologies. Interestingly, the independent read (IR) variability was lower than intra-reader variability
| Radiologist ID (KTB for reading methods) | 1 (control, SR, IR) | 2 (control, SR, IR) | 3 (control, SR, IR) |
|---|---|---|---|
| 1 (control, SR, IR) | (1, 1, 1) | (0.5505, 0.6263*, 0.7231**) | (0.4944, 0.6640*, 0.7476**) |
| 2 (control, SR, IR) | – | (1, 1, 1) | (0.4229, 0.5641*, 0.6231**) |
| 3 (control, SR, IR) | – | – | (1, 1, 1) |
*Significant difference with p < 0.01; **significant difference with p < 1e−8