Sun Yeop Lee¹, Sangwoo Ha¹, Min Gyeong Jeon¹, Hao Li¹, Hyunju Choi¹, Hwa Pyung Kim¹, Ye Ra Choi²,³, Hoseok I⁴,⁵, Yeon Joo Jeong⁶, Yoon Ha Park⁷, Hyemin Ahn⁸, Sang Hyup Hong⁸, Hyun Jung Koo⁸, Choong Wook Lee⁸, Min Jae Kim⁹, Yeon Joo Kim¹⁰, Kyung Won Kim⁸, Jong Mun Choi¹¹.
Abstract
While many deep-learning-based computer-aided detection systems (CADs) have been developed and commercialized for detecting abnormalities on chest radiographs (CXRs), their ability to localize a target abnormality is rarely reported. Localization accuracy matters for model interpretability, which is crucial in clinical settings. Moreover, diagnostic performance is likely to vary with the threshold that defines an accurate localization. In a multi-center, stand-alone clinical trial using temporal and external validation datasets of 1,050 CXRs, we evaluated the localization accuracy, localization-adjusted discrimination, and calibration of a commercially available deep-learning-based CAD for detecting consolidation and pneumothorax. For consolidation, the CAD achieved an image-level AUROC (95% CI) of 0.960 (0.945, 0.975), sensitivity of 0.933 (0.899, 0.959), specificity of 0.948 (0.930, 0.963), Dice coefficient of 0.691 (0.664, 0.718), and moderate calibration. For pneumothorax, it achieved an image-level AUROC of 0.978 (0.965, 0.991), sensitivity of 0.956 (0.923, 0.978), specificity of 0.996 (0.989, 0.999), Dice coefficient of 0.798 (0.770, 0.826), and moderate calibration. Diagnostic performance varied substantially when localization accuracy was taken into account but remained high at the minimum threshold of clinical relevance. In a separate trial of diagnostic impact using 461 CXRs, we estimated the causal effect of CAD assistance on clinicians' diagnostic performance. After adjusting for age, sex, dataset, and abnormality type, CAD assistance improved clinicians' diagnostic performance on average (OR [95% CI] = 1.73 [1.30, 2.32]; p < 0.001), although the effect varied substantially by clinical background. The CAD showed high stand-alone diagnostic performance and may improve clinicians' diagnostic performance when used in clinical settings.
Year: 2022 PMID: 35908091 PMCID: PMC9339006 DOI: 10.1038/s41746-022-00658-x
Source DB: PubMed Journal: NPJ Digit Med ISSN: 2398-6352
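The abstract summarizes localization accuracy with the Dice coefficient, 2|A ∩ B| / (|A| + |B|), where A is the region the CAD predicts and B is the reference annotation. The sketch below is a minimal illustration, assuming both are available as same-shaped binary pixel masks. The `dice` helper and the toy masks are hypothetical, and scoring two empty masks as a perfect match is an assumed convention, not something the study specifies.

```python
def dice(pred: list[list[int]], truth: list[list[int]]) -> float:
    """Dice = 2|A n B| / (|A| + |B|) for two same-shaped binary (0/1) masks."""
    if len(pred) != len(truth) or any(len(a) != len(b) for a, b in zip(pred, truth)):
        raise ValueError("masks must have the same shape")
    pred_px = [v for row in pred for v in row]
    truth_px = [v for row in truth for v in row]
    overlap = sum(1 for p, t in zip(pred_px, truth_px) if p and t)
    total = sum(1 for p in pred_px if p) + sum(1 for t in truth_px if t)
    # Assumed convention: two empty masks count as perfect agreement.
    return 1.0 if total == 0 else 2 * overlap / total


# Toy 3x4 masks (hypothetical): each marks 4 pixels and they share 2.
predicted = [[0, 1, 1, 0],
             [0, 1, 1, 0],
             [0, 0, 0, 0]]
reference = [[0, 0, 1, 1],
             [0, 0, 1, 1],
             [0, 0, 0, 0]]
print(dice(predicted, reference))  # 2 * 2 / (4 + 4) = 0.5
```

Because Dice measures overlap relative to the combined area, a prediction that merges several nearby lesions into one large region (as in Fig. 3a) can score lower even when every lesion is touched.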
Sample characteristics for the two trials by data sources and abnormalities.
| The CAD stand-alone trial | Total | BMC | PNUH |
|---|---|---|---|
| **Total** | | | |
| N | 1050 | 500 | 550 |
| Age | 48.4 (15.4) | 48.9 (14.5) | 47.9 (16.2) |
| Male | 702 (66.9) | 318 (63.6) | 384 (69.8) |
| Lesion size (cm²) | 64.6 [33.1, 112.1] | 63.0 [35.7, 108.2] | 66.0 [32.4, 114.4] |
| More than 1 lesion | 98 (17.8) | 56 (22.4) | 42 (14.0) |
| **Pneumothorax** | | | |
| N | 250 | 100 | 150 |
| Age | 37.6 (18.2) | 41.6 (18.2) | 35.0 (17.8) |
| Male | 215 (86.0) | 81 (81.0) | 134 (89.3) |
| Lesion size (cm²) | 69.0 [36.4, 127.2] | 49.1 [27.3, 103.1] | 90.7 [40.7, 142.4] |
| More than 1 lesion | 20 (8.0) | 10 (10.0) | 10 (6.7) |
| **Consolidation** | | | |
| N | 300 | 150 | 150 |
| Age | 57.1 (14.5) | 53.0 (15.7) | 61.2 (11.9) |
| Male | 208 (69.3) | 106 (70.7) | 102 (68.0) |
| Lesion size (cm²) | 60.7 [31.4, 100.7] | 71.0 [41.0, 118.9] | 46.7 [21.5, 90.3] |
| More than 1 lesion | 78 (26.0) | 46 (30.7) | 32 (21.3) |
| **Normal** | | | |
| N | 500 | 250 | 250 |
| Age | 48.5 (10.1) | 49.3 (10.5) | 47.7 (9.7) |
| Male | 279 (55.8) | 131 (52.4) | 148 (59.2) |
CAD Computer-aided detection system, BMC Boramae Medical Center, PNUH Pusan National University Hospital, N The number of cases.
Discrimination performances and localization accuracy of the CAD by abnormalities and hospitals.
| | Consolidation, pooled | Consolidation, BMC | Consolidation, PNUH | Pneumothorax, pooled | Pneumothorax, BMC | Pneumothorax, PNUH |
|---|---|---|---|---|---|---|
| Image-level AUROC (95% CI) | 0.960 (0.945, 0.975) | 0.986 (0.974, 0.999) | 0.932 (0.905, 0.960) | 0.978 (0.965, 0.991) | 0.965 (0.940, 0.990) | 0.987 (0.974, 1.000) |
| Image-level sensitivity (95% CI) | 0.933 (0.899, 0.959) | 0.980 (0.943, 0.996) | 0.887 (0.825, 0.933) | 0.956 (0.923, 0.978) | 0.930 (0.861, 0.971) | 0.973 (0.933, 0.993) |
| Image-level specificity (95% CI) | 0.948 (0.930, 0.963) | 0.986 (0.967, 0.995) | 0.915 (0.883, 0.940) | 0.996 (0.989, 0.999) | 0.998 (0.986, 1.000) | 0.995 (0.982, 0.999) |
| Lesion-level sensitivity (95% CI) | 0.912 (0.879, 0.938) | 0.949 (0.909, 0.976) | 0.872 (0.816, 0.916) | 0.941 (0.906, 0.966) | 0.901 (0.830, 0.949) | 0.969 (0.929, 0.990) |
| False positive lesions per case (false positive lesions/cases) | 0.104 (109/1050) | 0.078 (39/500) | 0.127 (70/550) | 0.020 (21/1050) | 0.018 (9/500) | 0.022 (12/550) |
| Dice (IQR) | 0.691 (0.664, 0.718) | 0.766 (0.733, 0.798) | 0.613 (0.572, 0.655) | 0.798 (0.770, 0.826) | 0.747 (0.696, 0.799) | 0.833 (0.803, 0.863) |
CAD Computer-aided detection system, BMC Boramae Medical Center, PNUH Pusan National University Hospital, AUROC Area under the receiver operating characteristic curve, CI Confidence interval, IQR Interquartile range.
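The sensitivity and specificity intervals in the table above are asymmetric (for example, 0.980 (0.943, 0.996)), as expected for an interval on a proportion near 1. This section does not name the interval method, so the sketch below assumes an exact Clopper-Pearson interval and computes it with the standard library only. The helper names are hypothetical, and the count of 147/150 is back-calculated from 0.980 × 150 rather than reported directly.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p), by exact summation (fine for n up to ~1000)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _solve_increasing(g, target: float, iters: int = 60) -> float:
    """Bisection on [0, 1] for an increasing function g with g(p) = target."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if g(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x: int, n: int, alpha: float = 0.05) -> tuple[float, float]:
    """Exact two-sided (1 - alpha) confidence interval for the proportion x / n."""
    # Lower bound solves P(X >= x | p) = alpha / 2; that tail grows with p.
    lower = 0.0 if x == 0 else _solve_increasing(
        lambda p: 1 - binom_cdf(x - 1, n, p), alpha / 2)
    # Upper bound solves P(X <= x | p) = alpha / 2; that tail shrinks with p, so negate it.
    upper = 1.0 if x == n else _solve_increasing(
        lambda p: -binom_cdf(x, n, p), -alpha / 2)
    return lower, upper


# Hypothetical count back-calculated from the table: 0.980 x 150 = 147 detected cases.
lo, hi = clopper_pearson(147, 150)
print(f"{147 / 150:.3f} ({lo:.3f}, {hi:.3f})")
```

The exact binomial sum is adequate for denominators up to roughly 1,000, which covers every cell in these tables; for much larger samples, a beta-quantile routine such as `scipy.stats.beta.ppf` would be the usual route.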
Logistic regression results (image-level) and mixed effects logistic regression results (lesion-level) for associations between lesion characteristics and diagnostic accuracy of the CAD.
| | Image-level OR (95% CI) | P | Lesion-level OR (95% CI) | P |
|---|---|---|---|---|
| Intercept | 2.26 (0.33, 16.87) | 0.414 | 2.22 (0.08, 2.72) | 0.387 |
| Age | 0.98 (0.96, 1.01) | 0.267 | 0.99 (0.99, 1.04) | 0.201 |
| Female (vs. male) | 0.85 (0.34, 2.23) | 0.732 | 0.83 (0.59, 2.41) | 0.619 |
| PNUH (vs. BMC) | 0.60 (0.23, 1.50) | 0.282 | 0.82 (0.61, 2.45) | 0.586 |
| Pneumothorax (vs. consolidation) | 0.84 (0.30, 2.37) | 0.736 | 1.22 (0.39, 1.78) | 0.626 |
| More than 1 lesion (vs. a single lesion) | 0.55 (0.10, 4.33) | 0.513 | – | – |
| Lesion size (cm²) | 1.11 (1.07, 1.16) | < 0.001 | 1.11 (1.07, 1.14) | < 0.001 |
CAD Computer-aided detection system, OR Odds ratio, CI Confidence interval, PNUH Pusan National University Hospital, BMC Boramae Medical Center.
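The lesion-size odds ratios in the table above are per 1 cm², a small increment for lesions whose median size is around 60 cm². Assuming lesion size entered both models as a linear term in cm² (the row label suggests this, but the model specification is not shown here), the odds ratio for a larger increment is the per-unit OR raised to that increment. The helper names are hypothetical.

```python
import math


def or_for_change(or_per_unit: float, delta: float) -> float:
    """Odds ratio for a `delta`-unit increase, assuming a linear term in the model."""
    return math.exp(delta * math.log(or_per_unit))


def ci_for_change(ci: tuple[float, float], delta: float) -> tuple[float, float]:
    """Rescale both confidence bounds the same way (valid for delta > 0)."""
    low, high = ci
    return or_for_change(low, delta), or_for_change(high, delta)


# Image-level lesion-size estimate from the table: OR 1.11 (1.07, 1.16) per cm^2.
print(f"per 10 cm^2: {or_for_change(1.11, 10):.2f}")
low, high = ci_for_change((1.07, 1.16), 10)
print(f"95% CI: ({low:.2f}, {high:.2f})")
```

Because the table rounds the OR to 1.11, the rescaled value is approximate: per-unit ORs of 1.105 and 1.115 would give roughly 2.71 and 2.97 per 10 cm².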
Fig. 1 CAD sensitivities across Dice thresholds in the stand-alone trial.
Sensitivities are presented (a) at the image level and (b) at the lesion level. Error bars show the 95% confidence interval at each point. The black dashed line marks the minimum threshold of clinical relevance (Dice = 0.2). Raw values are presented in Supplementary Table 1. CAD Computer-aided detection system.
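Fig. 1 varies the Dice threshold a detection must reach to count as correctly localized, with Dice = 0.2 marked as the minimum of clinical relevance. As a minimal sketch, assuming each reference lesion already has one Dice score against the CAD output (0.0 if the CAD missed it), the lesion-level curve is the share of lesions at or above each threshold. The helper names and the scores are hypothetical, and the study's actual lesion-matching rule is not described in this section.

```python
def sensitivity_at_threshold(dice_scores: list[float], threshold: float) -> float:
    """Share of reference lesions whose Dice with the CAD output is >= threshold.

    Assumption: one Dice value per reference lesion, 0.0 for a missed lesion.
    """
    if not dice_scores:
        raise ValueError("need at least one reference lesion")
    if not 0.0 < threshold <= 1.0:
        # A threshold of 0 would count missed lesions (Dice 0.0) as detected.
        raise ValueError("threshold must be in (0, 1]")
    return sum(1 for d in dice_scores if d >= threshold) / len(dice_scores)


def sensitivity_curve(dice_scores: list[float], thresholds: list[float]) -> dict[float, float]:
    """Localization-adjusted sensitivity at each Dice threshold."""
    return {t: sensitivity_at_threshold(dice_scores, t) for t in thresholds}


# Hypothetical per-lesion Dice values (not study data).
scores = [0.0, 0.15, 0.35, 0.6, 0.72, 0.8, 0.85, 0.9]
for t, sens in sensitivity_curve(scores, [0.1, 0.2, 0.5, 0.8]).items():
    print(f"Dice >= {t:.1f}: sensitivity {sens:.3f}")
```

Sensitivity can only stay flat or fall as the threshold rises, which is why performance drops once localization is required and why the choice of a minimum clinically relevant threshold matters.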
Pooled accuracy, sensitivity, and specificity of six readers with and without CAD assistance.
| | Consolidation, without CAD assistance | Consolidation, with CAD assistance | P | Pneumothorax, without CAD assistance | Pneumothorax, with CAD assistance | P |
|---|---|---|---|---|---|---|
| Accuracy (95% CI) | 0.952 (0.943, 0.959) | 0.967 (0.960, 0.974) | 0.001 | 0.988 (0.983, 0.991) | 0.992 (0.988, 0.995) | 0.044 |
| Sensitivity (95% CI) | 0.931 (0.915, 0.945) | 0.980 (0.970, 0.987) | < 0.001 | 0.943 (0.914, 0.964) | 0.956 (0.930, 0.975) | 0.278 |
| Specificity (95% CI) | 0.967 (0.957, 0.976) | 0.958 (0.947, 0.967) | 0.104 | 0.995 (0.991, 0.997) | 0.997 (0.994, 0.999) | 0.069 |
P-values are from comparisons of readers' accuracy, sensitivity, or specificity with and without CAD assistance.
CAD Computer-aided detection system, CI Confidence interval.
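The P values in the table above come from paired comparisons, but this section does not name the test. For one reader reading the same cases with and without assistance, an exact McNemar test on the discordant cases is one common choice. The sketch below implements it as an illustration only, not as the authors' method. Pooling reads across six readers would violate the test's independence assumption, so it applies to one reader at a time, and the counts used here are hypothetical.

```python
from math import comb


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar P value from discordant case counts.

    b: cases the reader got right only with CAD assistance
    c: cases the reader got right only without CAD assistance
    Under H0 each discordant case is equally likely to fall either way,
    so min(b, c) follows Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2**n
    return min(1.0, 2 * tail)


print(round(mcnemar_exact(10, 2), 4))  # 0.0386 (hypothetical counts)
```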
Fig. 2 Individual readers' diagnostic accuracy with and without CAD assistance in the impact trial.
Accuracies are presented separately for (a) consolidation and (b) pneumothorax. Red circles represent accuracy with CAD assistance and blue circles accuracy without it; arrows indicate the direction of change in accuracy when the CAD was used. Raw values are presented in Supplementary Table 3. CAD Computer-aided detection system, TR Thoracic radiologist, RS Respiratory specialist, NTR Non-thoracic radiologist, NRS Non-respiratory specialist, RR Radiology resident, GP General practitioner.
Mixed effects logistic regression results for the causal effect of CAD assistance on readers’ diagnostic accuracy.
| | Overall OR (95% CI) | P | CAD correct OR (95% CI) | P | CAD incorrect OR (95% CI) | P |
|---|---|---|---|---|---|---|
| Intercept | 84.18 (32.67, 216.91) | < 0.001 | 85.79 (30.41, 242.05) | < 0.001 | 85.00 (9.92, 728.35) | < 0.001 |
| Age | 1.00 (0.98, 1.01) | 0.548 | 1.00 (0.99, 1.01) | 0.951 | 0.99 (0.96, 1.02) | 0.417 |
| Female (vs. male) | 0.79 (0.56, 1.37) | 0.349 | 1.14 (0.67, 1.94) | 0.627 | 0.77 (0.28, 2.09) | 0.611 |
| CheXpert (vs. PadChest) | 1.40 (0.38, 5.13) | 0.608 | 0.56 (0.13, 2.40) | 0.439 | 2.78 (0.08, 93.14) | 0.569 |
| Consolidation (vs. normal) | 0.89 (0.52, 1.53) | 0.670 | 0.37 (0.20, 0.69) | 0.002 | 0.67 (0.10, 4.48) | 0.677 |
| Pneumothorax (vs. normal) | 0.77 (0.28, 2.10) | 0.609 | 1.13 (0.33, 3.91) | 0.847 | 0.04 (0.01, 0.30) | 0.001 |
| CAD assistance | 1.73 (1.30, 2.32) | < 0.001 | 4.84 (3.13, 7.49) | < 0.001 | 0.29 (0.17, 0.52) | < 0.001 |
CAD Computer-aided detection system, OR Odds ratio, CI Confidence interval.
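The CAD-assistance effects in the table above are odds ratios. To get a feel for their scale, one can apply each OR to a baseline probability: convert it to odds p0 / (1 - p0), multiply by the OR, and convert back with odds / (1 + odds). The baseline accuracy of 0.90 is hypothetical, and because these are conditional ORs from a mixed-effects model with covariates, the output is only a rough illustration of magnitude, not a prediction from the fitted model.

```python
def apply_odds_ratio(p0: float, odds_ratio: float) -> float:
    """Probability after multiplying the baseline odds p0 / (1 - p0) by odds_ratio."""
    if not 0.0 < p0 < 1.0:
        raise ValueError("baseline probability must be strictly between 0 and 1")
    odds = p0 / (1.0 - p0) * odds_ratio
    return odds / (1.0 + odds)


baseline = 0.90  # hypothetical unassisted accuracy
for label, ratio in [("overall", 1.73), ("CAD correct", 4.84), ("CAD incorrect", 0.29)]:
    print(f"{label}: {baseline:.2f} -> {apply_odds_ratio(baseline, ratio):.3f}")
```

The asymmetry is the point of the CAD correct and CAD incorrect columns: at this baseline, a correct CAD output pushes accuracy toward 0.98, while an incorrect one pulls it down to about 0.72.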
Details of the CAD effects.
| | Values |
|---|---|
| Corrected cases when assisted / incorrect cases when unassisted (%) | 102/147 (69.39) |
| (FP to TN)/FP (%) | 20/43 (46.51) |
| (FN to TP)/FN (%) | 82/104 (78.85) |
| Misguided cases when assisted / originally correct cases when unassisted (%) | 43/2619 (1.64) |
| (TP to FN)/TP (%) | 6/1462 (0.41) |
| (TN to FP)/TN (%) | 37/1157 (3.20) |
| FN corrected (FN to TP) | 0.602 |
| FN not affected (FN to FN) | 0.366 |
| Difference (95% CI) | 0.236 (0.030, 0.442) |
CAD Computer-aided detection system, FP False positive, TN True negative, FN False negative, TP True positive, CI Confidence interval.
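The table above counts how reads changed between the unassisted and assisted sessions. Assuming each reader's unassisted and assisted calls on the same case are available as paired booleans, the sketch below tallies the same transitions. The `reads` structure and the helper names are hypothetical, and the example reads are made up.

```python
from collections import Counter


def outcome(truth: bool, call: bool) -> str:
    """Label a single read as TP, FN, FP, or TN (True = abnormal)."""
    if truth:
        return "TP" if call else "FN"
    return "FP" if call else "TN"


def transition_summary(reads: list[tuple[bool, bool, bool]]) -> dict[str, tuple[int, int]]:
    """Count how CAD assistance changed paired reads.

    Each item is (truth, unassisted_call, assisted_call) for one reader-case pair.
    Each value is (reads that changed, unassisted reads in the source state).
    """
    moves = Counter((outcome(t, u), outcome(t, a)) for t, u, a in reads)
    before = Counter(outcome(t, u) for t, u, _ in reads)
    return {
        "corrected": (moves["FP", "TN"] + moves["FN", "TP"], before["FP"] + before["FN"]),
        "FP->TN": (moves["FP", "TN"], before["FP"]),
        "FN->TP": (moves["FN", "TP"], before["FN"]),
        "misguided": (moves["TP", "FN"] + moves["TN", "FP"], before["TP"] + before["TN"]),
        "TP->FN": (moves["TP", "FN"], before["TP"]),
        "TN->FP": (moves["TN", "FP"], before["TN"]),
    }


# Hypothetical paired reads: (truth, unassisted call, assisted call).
reads = [
    (True, False, True),    # FN -> TP
    (True, False, False),   # FN -> FN
    (False, True, False),   # FP -> TN
    (True, True, True),     # TP -> TP
    (False, False, True),   # TN -> FP
    (False, False, False),  # TN -> TN
]
for name, (changed, base) in transition_summary(reads).items():
    pct = f"{100 * changed / base:.2f}%" if base else "n/a"
    print(f"{name}: {changed}/{base} ({pct})")
```

The study's rows add up the same way: corrected cases are (20 + 82)/(43 + 104) = 102/147, and misguided cases are (6 + 37)/(1462 + 1157) = 43/2619.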
Fig. 3 Examples of CAD predictions on chest radiographs.
(a) Closely located, diffuse lesions are predicted as a single large lesion. (b) Dense bone structures, such as the intersection of a rib with the clavicle or sternum, are falsely identified as consolidation. CAD Computer-aided detection system.