Jae Hyun Kim, Jin Young Kim, Gun Ha Kim, Donghoon Kang, In Jung Kim, Jeongkuk Seo, Jason R Andrews, Chang Min Park.
Abstract
Early identification of pneumonia is essential in patients with acute febrile respiratory illness (FRI). We evaluated the performance and added value of a commercial deep learning (DL) algorithm in detecting pneumonia on chest radiographs (CRs) of patients visiting the emergency department (ED) with acute FRI. This single-centre, retrospective study included 377 consecutive patients who visited the ED between August 2018 and January 2019 and their 387 CRs. The performance of the DL algorithm in detecting pneumonia on CRs was evaluated using the area under the receiver operating characteristic curve (AUROC), sensitivity, specificity, negative predictive value (NPV), and positive predictive value (PPV). Three ED physicians independently reviewed the CRs in an observer performance test to detect pneumonia and re-evaluated them with the algorithm's assistance eight weeks later. AUROC, sensitivity, and specificity were compared between "DL algorithm" vs. "physicians-only" and between "physicians-only" vs. "physicians aided with the algorithm". Among 377 patients, 83 (22.0%) had pneumonia. AUROC, sensitivity, specificity, PPV, and NPV of the algorithm for detection of pneumonia on CRs were 0.861, 58.3%, 94.4%, 74.2%, and 89.1%, respectively. For the detection of 'visible pneumonia on CR' (60 CRs from 59 patients), AUROC, sensitivity, specificity, PPV, and NPV were 0.940, 81.7%, 94.4%, 74.2%, and 96.3%, respectively. In the observer performance test, the algorithm performed better than the physicians for pneumonia (AUROC, 0.861 vs. 0.788, p = 0.017; specificity, 94.4% vs. 88.7%, p < 0.0001) and visible pneumonia (AUROC, 0.940 vs. 0.871, p = 0.007; sensitivity, 81.7% vs. 73.9%, p = 0.034; specificity, 94.4% vs. 88.7%, p < 0.0001). Detection of pneumonia (sensitivity, 59.9% vs. 53.2%, p = 0.008; specificity, 98.1% vs. 88.7%, p < 0.0001) and 'visible pneumonia' (sensitivity, 82.2% vs. 73.9%, p = 0.014; specificity, 98.1% vs. 88.7%, p < 0.0001) significantly improved when the algorithm was used by the physicians. Mean reading time for the physicians decreased from 165 to 101 min with the assistance of the algorithm. Thus, the DL algorithm detected pneumonia more accurately than the ED physicians, particularly pneumonia visible on CR, and improved the physicians' diagnosis in patients with acute FRI.
Keywords: acute febrile respiratory illness; artificial intelligence; chest radiograph; deep learning algorithm; emergency department
Year: 2020 PMID: 32599874 PMCID: PMC7356293 DOI: 10.3390/jcm9061981
Source DB: PubMed Journal: J Clin Med ISSN: 2077-0383 Impact factor: 4.241
Clinical Characteristics of Patients with Acute Febrile Respiratory Illness.
| Characteristic | All Patients | Pneumonia | Non-Pneumonia | p Value * |
|---|---|---|---|---|
| Age, years | 20.0 (20.0–21.0) | 20.0 (20.0–21.0) | 20.0 (20.0–21.0) | 0.737 |
| >50 | 1 (0.3%) | 0 (0.0%) | 1 (0.3%) | 1.000 |
| ≤50 | 376 (99.7%) | 83 (100.0%) | 293 (99.7%) | |
| Sex | | | | 1.000 |
| Male | 375 (99.5%) | 83 (100.0%) | 292 (99.3%) | |
| Female | 2 (0.5%) | 0 (0.0%) | 2 (0.7%) | |
| Fever | 377 (100.0%) | 83 (100.0%) | 294 (100.0%) | NA |
| Maximum temperature, °C | 38.6 (38.3–39.1) | 38.6 (38.4–39.1) | 38.6 (38.3–39.0) | 0.669 |
| 38–39 | 282 (74.8%) | 61 (73.5%) | 221 (75.2%) | 0.775 |
| >39 | 95 (25.2%) | 22 (26.5%) | 73 (24.8%) | |
| Dyspnea | 6 (1.6%) | 4 (4.8%) | 2 (0.7%) | 0.023 |
| Cough | 377 (100.0%) | 83 (100.0%) | 294 (100.0%) | NA |
| Sputum | 287 (76.1%) | 63 (75.9%) | 224 (76.2%) | 1.000 |
| Rhinorrhea | 200 (53.1%) | 39 (47.0%) | 161 (54.8%) | 0.216 |
| Sore throat | 275 (73.0%) | 50 (60.2%) | 225 (76.5%) | 0.005 |
| Headache | 202 (53.6%) | 43 (51.8%) | 159 (54.1%) | 0.803 |
| Nausea | 69 (18.3%) | 18 (21.7%) | 51 (17.3%) | 0.421 |
| Vomiting | 23 (6.1%) | 9 (10.8%) | 14 (4.8%) | 0.064 |
| Diarrhea | 22 (5.8%) | 5 (6.0%) | 17 (5.8%) | 1.000 |
Data are median (IQR) or n (%). NA = not available. * Difference between pneumonia and non-pneumonia groups.
Figure 1. Flow chart for the determination of reference standard. FRI = febrile respiratory illness, CR = chest radiograph, CT = computed tomography.
Figure 2. AUROCs of DL algorithm and ED physicians (pneumonia vs. non-pneumonia). (a) The DL algorithm showed significantly higher performance than that for ED physicians (0.861 vs. 0.788; p = 0.019). (b) ED physicians’ performance was improved after assistance with DL algorithm (0.788 vs. 0.816; p = 0.068). AUROC = area under the receiver operating characteristics curve, DL = deep learning, ED = emergency department.
Diagnostic Performance of DL algorithm and ED physicians (pneumonia vs. non-pneumonia).
| Reader | AUROC (95% CI) | p Value | Sensitivity (95% CI) | p Value | Specificity (95% CI) | p Value | Reading Time (min) |
|---|---|---|---|---|---|---|---|
| DL algorithm | 0.861 (0.823–0.894) | NA | 0.583 * (0.471–0.690) | NA | 0.944 * (0.912–0.967) | NA | 13 |
| Session 1 (ED physicians only) | |||||||
| Observer 1 | 0.788 (0.743–0.827) | 0.019 a | 0.595 (0.483–0.701) | 1.000 a | 0.690 (0.634–0.741) | <0.0001 a | 156 |
| Observer 2 | 0.814 (0.771–0.851) | 0.132 a | 0.500 (0.389–0.611) | 0.119 a | 0.974 (0.949–0.989) | 0.093 a | 160 |
| Observer 3 | 0.808 (0.766–0.846) | 0.043 a | 0.500 (0.389–0.611) | 0.065 a | 0.997 (0.982–1.000) | 0.0001 a | 179 |
| Group | 0.788 (0.763–0.811) | 0.017 a | 0.532 (0.468–0.595) | 0.053 a | 0.887 (0.864–0.907) | <0.0001 a | 165 |
| Session 2 (ED physicians with DL algorithm assistance) | |||||||
| Observer 1 | 0.838 (0.798–0.874) | 0.111 b | 0.655 (0.543–0.755) | 0.302 b | 0.954 (0.924–0.975) | <0.0001 b | 97 |
| Observer 2 | 0.807 (0.765–0.846) | 0.801 b | 0.560 (0.447–0.668) | 0.227 b | 1.000 (0.988–1.000) | 0.008 b | 87 |
| Observer 3 | 0.806 (0.763–0.844) | 0.913 b | 0.583 (0.471–0.690) | 0.065 b | 0.990 (0.971–0.998) | 0.625 b | 119 |
| Group | 0.816 (0.793–0.838) | 0.068 b | 0.599 (0.536–0.660) | 0.008 b | 0.981 (0.970–0.989) | <0.0001 b | 101 |
AUROC = the area under the receiver operating characteristics curve, DL = deep learning, ED = emergency department. * Sensitivity and specificity of DL algorithm were determined at high-sensitivity threshold. a Comparison of performance with DL algorithm. b Comparison of performance with session 1.
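The algorithm's sensitivity and specificity in the table above, and the PPV and NPV quoted in the abstract, all follow from a 2×2 table of algorithm output against the reference standard. The sketch below shows that arithmetic. The source does not report the cell counts: the values used here (49 true positives, 35 false negatives, 286 true negatives, 17 false positives for all pneumonia; 11 false negatives for visible pneumonia) are an assumption, back-calculated so that they sum to the 387 CRs and reproduce the reported percentages. The class name `Confusion` is likewise illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """2x2 table of algorithm output vs. reference standard (counts of CRs)."""
    tp: int  # pneumonia, flagged by the algorithm
    fn: int  # pneumonia, missed by the algorithm
    tn: int  # non-pneumonia, not flagged
    fp: int  # non-pneumonia, flagged

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    @property
    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)

    @property
    def npv(self) -> float:
        return self.tn / (self.tn + self.fn)


# ASSUMPTION: cell counts are inferred from the reported percentages,
# not taken from the source.
all_pneumonia = Confusion(tp=49, fn=35, tn=286, fp=17)      # 84 + 303 = 387 CRs
visible_pneumonia = Confusion(tp=49, fn=11, tn=286, fp=17)  # 60 visible-pneumonia CRs

for name, c in [("pneumonia", all_pneumonia), ("visible pneumonia", visible_pneumonia)]:
    print(
        f"{name}: sens={c.sensitivity:.1%} spec={c.specificity:.1%} "
        f"PPV={c.ppv:.1%} NPV={c.npv:.1%}"
    )
```

Under these assumed counts, restricting the positives to visible pneumonia changes only the false-negative cell, which is why sensitivity and NPV rise while specificity and PPV stay the same across the two analyses.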
Figure 3. AUROCs of DL algorithm and ED physicians (visible pneumonia on CR vs. non-pneumonia). (a) The DL algorithm showed significantly higher performance than that for ED physicians (0.940 vs. 0.871; p = 0.007). (b) ED physicians’ performance was significantly improved after assistance with DL algorithm (0.871 vs. 0.916; p = 0.002).
Diagnostic Performance of DL algorithm and ED physicians (visible pneumonia on CR vs. non-pneumonia).
| Reader | AUROC (95% CI) | p Value | Sensitivity (95% CI) | p Value | Specificity (95% CI) | p Value |
|---|---|---|---|---|---|---|
| DL algorithm | 0.940 (0.910–0.962) | NA | 0.817 * (0.696–0.905) | NA | 0.944 * (0.912–0.967) | NA |
| Session 1 (ED physicians only) | ||||||
| Observer 1 | 0.856 (0.816–0.891) | 0.003 a | 0.833 (0.715–0.917) | 1.000 a | 0.690 (0.634–0.741) | <0.0001 a |
| Observer 2 | 0.887 (0.850–0.918) | 0.053 a | 0.700 (0.568–0.812) | 0.119 a | 0.974 (0.949–0.989) | 0.093 a |
| Observer 3 | 0.920 (0.887–0.946) | 0.455 a | 0.683 (0.550–0.797) | 0.022 a | 0.997 (0.982–1.000) | 0.0001 a |
| Group | 0.871 (0.849–0.890) | 0.007 a | 0.739 (0.668–0.801) | 0.034 a | 0.887 (0.864–0.907) | <0.0001 a |
| Session 2 (ED physicians with DL algorithm assistance) | ||||||
| Observer 1 | 0.936 (0.905–0.958) | 0.007 b | 0.867 (0.754–0.941) | 0.774 b | 0.954 (0.924–0.975) | <0.0001 b |
| Observer 2 | 0.907 (0.873–0.935) | 0.412 b | 0.783 (0.658–0.879) | 0.227 b | 1.000 (0.988–1.000) | 0.008 b |
| Observer 3 | 0.907 (0.872–0.934) | 0.609 b | 0.817 (0.696–0.905) | 0.022 b | 0.990 (0.971–0.998) | 0.625 b |
| Group | 0.916 (0.898–0.931) | 0.002 b | 0.822 (0.758–0.875) | 0.014 b | 0.981 (0.970–0.989) | <0.0001 b |
AUROC = the area under the receiver operating characteristics curve, CR = chest radiograph, DL = deep learning, ED = emergency department. * Sensitivity and specificity of DL algorithm were determined at high-sensitivity threshold. a Comparison of performance with DL algorithm. b Comparison of performance with session 1.
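The source does not say how the 95% confidence intervals for sensitivity and specificity were computed. One candidate that appears consistent with the tabulated values is the exact (Clopper-Pearson) binomial interval, sketched below with the standard library only. Both the choice of method and the counts passed in (49/84, 286/303, 303/303) are assumptions inferred from the reported proportions; the function names are illustrative.

```python
from math import comb
from typing import Callable, Tuple


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p). Fine for n in the hundreds."""
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k + 1))


def _bisect_increasing(f: Callable[[float], float], iterations: int = 60) -> float:
    """Root on (0, 1) of a function that goes from negative to positive."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def clopper_pearson(x: int, n: int, alpha: float = 0.05) -> Tuple[float, float]:
    """Exact two-sided (1 - alpha) interval for a proportion x / n."""
    half = alpha / 2.0
    # Lower bound: the p at which P(X >= x) equals alpha / 2.
    lower = 0.0 if x == 0 else _bisect_increasing(
        lambda p: (1.0 - binom_cdf(x - 1, n, p)) - half
    )
    # Upper bound: the p at which P(X <= x) equals alpha / 2.
    upper = 1.0 if x == n else _bisect_increasing(
        lambda p: half - binom_cdf(x, n, p)
    )
    return lower, upper


# ASSUMPTION: counts are inferred from the reported proportions.
for label, x, n in [
    ("DL sensitivity (all pneumonia)", 49, 84),
    ("DL specificity", 286, 303),
    ("Observer 2 specificity, session 2", 303, 303),
]:
    low, high = clopper_pearson(x, n)
    print(f"{label}: {x / n:.3f} ({low:.3f}-{high:.3f})")
```

For the 303/303 case the lower bound has the closed form 0.025^(1/303) ≈ 0.988, which agrees with the 1.000 (0.988–1.000) reported for Observer 2. This is compatible with an exact interval but does not by itself confirm the authors' method.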
Figure 4. Representative case of the observer performance test. (a) The CR demonstrates patchy opacity in the left middle lung field (arrow), which was initially detected by only one of three observers. (b) The CT taken on the same day shows branching opacities and centrilobular nodules at the left upper lobe. (c) The DL algorithm correctly detected the lesion (probability score, 0.577). After assistance from the DL algorithm, all observers detected the lesion.
Figure 5. False positive interpretations of the DL algorithm. (a,b) The CR shows radio-opaque letters “ROK ARMY” (arrows) on the shirt at the left middle lung field. (c) The DL algorithm wrongly localised the radio-opaque letters (probability score, 0.348). (d) There is an accidentally included abdominal shield at the lower part of the CR. (e) The DL algorithm wrongly detected the abdominal shield (probability score, 0.684). None of the three observers identified these foreign bodies as lesions.