Eui Jin Hwang, Sunggyun Park, Kwang-Nam Jin, Jung Im Kim, So Young Choi, Jong Hyuk Lee, Jin Mo Goo, Jaehong Aum, Jae-Joon Yim, Chang Min Park.
Abstract
BACKGROUND: Detection of active pulmonary tuberculosis on chest radiographs (CRs) is critical for the diagnosis and screening of tuberculosis. An automated system may help streamline the tuberculosis screening process and improve diagnostic performance.
Keywords: chest radiograph; computer-aided detection; deep learning; tuberculosis
Year: 2019; PMID: 30418527; PMCID: PMC6695514; DOI: 10.1093/cid/ciy967
Source DB: PubMed; Journal: Clin Infect Dis; ISSN: 1058-4838; Impact factor: 9.079
Demographic Description of the 6 External Validation Datasets
| Demographic Variables | Seoul National University Hospital Dataset | Boramae Medical Center Dataset | Kyunghee University Hospital at Gangdong Dataset | Daejeon Eulji Medical Center Dataset | Montgomery Dataset | Shenzhen Dataset |
|---|---|---|---|---|---|---|
| Patients with TB | | | | | | |
| Number of patients | 83 | 70 | 103 | 70 | 52 | 320 |
| Gender (male:female) | 52:31 | 42:28 | 66:37 | 47:23 | 32:20 | 220:100 |
| Age (years)<sup>a</sup> | 59 (17–88) | 59 (25–94) | 51 (15–93) | 50 (20–86) | 48 (15–89) | 34 (2–89) |
| Mode of diagnosis (culture:polymerase chain reaction only) | 68:15 | 35:35 | 95:8 | 70:0 | U/A | U/A |
| Time interval between diagnosis and CR (days)<sup>a</sup> | 4 (0–14) | 2 (0–13) | 3 (0–30) | 2 (0–31) | U/A | U/A |
| Time interval between CR and CT (days)<sup>a</sup> | 7 (0–29) | 1 (0–28) | 3 (0–29) | 0 (0–7) | U/A | U/A |
| Total number of TB lesions on CR | 132 | 145 | 191 | 231 | 82 | 493 |
| Location of TB lesion (right:left:bilateral) | 36:12:35 | 26:8:36 | 34:20:49 | 22:10:38 | 17:16:19 | 126:69:125 |
| Patients without TB | | | | | | |
| Number of patients | 100 | 70 | 70 | 100 | 80 | 326 |
| Gender (male:female) | 49:51 | 24:46 | 26:44 | 45:55 | 25:59<sup>b</sup> | 220:106 |
| Age (years)<sup>a</sup> | 55 (25–80) | 54 (28–86) | 49.5 (15–73) | 44 (19–86) | 33.5 (4–70) | 31 (0–85) |
| Time interval between CR and CT (days)<sup>a</sup> | 0 (0–16) | 0 (0–13) | 4 (0–15) | 0 (0–20) | U/A | U/A |
Abbreviations: CR, chest radiograph; CT, computed tomography; TB, tuberculosis; U/A, unavailable.
<sup>a</sup>Data are median values (range).
<sup>b</sup>Information for 1 case was unavailable.
Performance of Deep Learning–based Automatic Detection Algorithm in the 6 External Validation Datasets
| Performance Measures | Seoul National University Hospital Dataset | Boramae Medical Center Dataset | Kyunghee University Hospital at Gangdong Dataset | Daejeon Eulji Medical Center Dataset | Montgomery Dataset | Shenzhen Dataset |
|---|---|---|---|---|---|---|
| Area under the receiver operating characteristic curve | 0.993 (0.984–1.002) | 0.979 (0.954–1.005) | 1.000 (0.999–1.000) | 1.000 (0.999–1.000) | 0.996 (0.991–1.001) | 0.977 (0.967–0.987) |
| Area under the alternative free-response receiver operating characteristic curve | 0.993 (0.983–1.003) | 0.981 (0.960–1.001) | 0.994 (0.987–1.001) | 1.000 (0.999–1.000) | 0.996 (0.990–1.002) | 0.973 (0.963–0.984) |
| Sensitivity<sub>SEN</sub><sup>a</sup> | 0.952 (0.881–0.987) | 0.943 (0.860–0.984) | 1.000 (0.965–1.000) | 1.000 (0.949–1.000) | 1.000 (0.932–1.000) | 0.947 (0.916–0.969) |
| Specificity<sub>SEN</sub><sup>a</sup> | 1.000 (0.964–1.000) | 0.957 (0.880–0.991) | 0.914 (0.823–0.968) | 0.980 (0.930–0.998) | 0.938 (0.860–0.979) | 0.911 (0.875–0.940) |
| True detection rate<sub>SEN</sub><sup>a</sup> | 0.962 (0.914–0.988) | 0.945 (0.894–0.976) | 1.000 (0.981–1.000) | 1.000 (0.984–1.000) | 1.000 (0.956–1.000) | 0.953 (0.931–0.970) |
| Sensitivity<sub>SPE</sub><sup>a</sup> | 0.843 (0.747–0.914) | 0.900 (0.805–0.959) | 0.990 (0.947–1.000) | 0.986 (0.923–1.000) | 0.846 (0.719–0.931) | 0.841 (0.796–0.879) |
| Specificity<sub>SPE</sub><sup>a</sup> | 1.000 (0.964–1.000) | 1.000 (0.949–1.000) | 1.000 (0.949–1.000) | 1.000 (0.964–1.000) | 1.000 (0.955–1.000) | 0.991 (0.973–0.998) |
| True detection rate<sub>SPE</sub><sup>a</sup> | 0.750 (0.667–0.821) | 0.759 (0.681–0.826) | 0.806 (0.743–0.860) | 0.719 (0.656–0.776) | 0.719 (0.609–0.813) | 0.771 (0.731–0.807) |
<sup>a</sup>Subscript SEN indicates the high-sensitivity cutoff; subscript SPE indicates the high-specificity cutoff.
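This record does not state how the 95% confidence intervals for sensitivity and specificity were derived. The lower bounds reported for proportions of 1.000 (for example, 0.964 with 100 patients and 0.949 with 70 patients) are consistent with an exact binomial (Clopper-Pearson) interval, so the following is a minimal sketch under that assumption. The function names are illustrative and are not taken from the study.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _root_increasing(g, iters=200):
    """Bisection on [0, 1] for a function g that rises from negative to positive."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x, n, alpha=0.05):
    """Exact two-sided (1 - alpha) interval for x successes in n trials.

    Assumption: this is one plausible way to obtain intervals like those in
    the table; the source does not name the method.
    """
    # Lower bound: the p at which P(X >= x) equals alpha / 2 (0 when x == 0).
    lower = 0.0 if x == 0 else _root_increasing(
        lambda p: (1 - binom_cdf(x - 1, n, p)) - alpha / 2
    )
    # Upper bound: the p at which P(X <= x) equals alpha / 2 (1 when x == n).
    upper = 1.0 if x == n else _root_increasing(
        lambda p: alpha / 2 - binom_cdf(x, n, p)
    )
    return lower, upper


# 100 of 100 patients without TB classified correctly (specificity 1.000).
low, high = clopper_pearson(100, 100)
print(f"{low:.3f}-{high:.3f}")
```

When every case is classified correctly, the lower bound reduces to the closed form (alpha/2)^(1/n), which gives about 0.964 for n = 100 and about 0.949 for n = 70.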
Figure 1. Performance of the deep learning–based automatic detection algorithm (DLAD) in in-house and external validation. Original (a) and zoomed (b) receiver operating characteristic (ROC) curves for DLAD in the in-house validation and external validation datasets. The DLAD showed consistently high performance in image-wise classification, not only in the internal validation dataset but also in the 6 external validation datasets; AUROC values ranged from 0.977 to 1.000. For lesion-wise localization performance assessed by jackknife alternative free-response ROC (c, d), DLAD showed consistently high performance across datasets; AUAFROC ranged from 0.973 to 1.000. Abbreviations: AUAFROC, area under the alternative free-response receiver operating characteristic curve; AUROC, area under the receiver operating characteristic curve; BMC, Boramae Medical Center; DEMC, Daejeon Eulji Medical Center; KUHG, Kyunghee University Hospital at Gangdong; SNUH, Seoul National University Hospital.
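To make the image-wise metrics concrete, the sketch below computes AUROC as the fraction of (TB, non-TB) image pairs in which the TB image receives the higher score, and shows how moving the cutoff trades sensitivity against specificity. The scores and cutoffs are hypothetical values chosen for illustration; they are not study data, and this is not the authors' evaluation code.

```python
def auroc(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def sens_spec(pos_scores, neg_scores, cutoff):
    """Sensitivity and specificity when scores >= cutoff are called positive."""
    sensitivity = sum(s >= cutoff for s in pos_scores) / len(pos_scores)
    specificity = sum(s < cutoff for s in neg_scores) / len(neg_scores)
    return sensitivity, specificity


# Hypothetical per-image scores (illustrative only, not from the study).
tb_scores = [0.91, 0.85, 0.40, 0.77]
non_tb_scores = [0.10, 0.35, 0.55, 0.05]

print(auroc(tb_scores, non_tb_scores))           # 15 of 16 pairs ranked correctly
print(sens_spec(tb_scores, non_tb_scores, 0.3))  # lower cutoff favors sensitivity
print(sens_spec(tb_scores, non_tb_scores, 0.6))  # higher cutoff favors specificity
```

With these toy scores, the lower cutoff detects every TB image at the cost of more false positives, while the higher cutoff removes the false positives but misses one TB image. This is the same trade-off that separates a high-sensitivity cutoff from a high-specificity one.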
Performance of Physicians According to Reader Groups
| Reader Groups | Area Under the Receiver Operating Characteristic Curve | Area Under the Alternative Free-response Receiver Operating Characteristic Curve | Sensitivity | Specificity | True Detection Rate |
|---|---|---|---|---|---|
| Session 1 (physician reading only) | | | | | |
| Nonradiology physicians | 0.746 (0.552–0.940) | 0.664 (0.466–0.861) | 0.723 (0.677–0.765) | 0.670 (0.627–0.711) | 0.582 (0.543–0.620) |
| P value<sup>a</sup> | .0230 | .0088 | | | |
| Board-certified radiologists | 0.946 (0.911–0.982) | 0.900 (0.856–0.943) | 0.906 (0.874–0.932) | 0.948 (0.925–0.966) | 0.797 (0.764–0.827) |
| P value<sup>a</sup> | .0082 | .0003 | | | |
| Thoracic radiologists | 0.971 (0.948–0.993) | 0.925 (0.890–0.959) | 0.952 (0.927–0.970) | 0.930 (0.904–0.951) | 0.870 (0.842–0.894) |
| P value<sup>a</sup> | .0218 | .0001 | | | |
| Session 2 (physician reading with DLAD assistance) | | | | | |
| Nonradiology physicians | 0.850 (0.694–1.005) | 0.781 (0.598–0.965) | 0.848 (0.810–0.881) | 0.800 (0.762–0.834) | 0.724 (0.688–0.758) |
| P value<sup>b</sup> | .0610 | .0236 | <.0001 | <.0001 | <.0001 |
| Board-certified radiologists | 0.961 (0.933–0.988) | 0.924 (0.891–0.957) | 0.930 (0.901–0.953) | 0.954 (0.932–0.971) | 0.849 (0.819–0.875) |
| P value<sup>b</sup> | .0606 | .0353 | .0075 | .0833 | <.0001 |
| Thoracic radiologists | 0.977 (0.957–0.997) | 0.942 (0.913–0.971) | 0.964 (0.941–0.980) | 0.936 (0.911–0.956) | 0.897 |
| P value<sup>b</sup> | .1623 | .0036 | .0587 | .2568 | .0004 |
<sup>a</sup>Comparison of performance with the deep learning–based automatic detection (DLAD) algorithm.
<sup>b</sup>Comparison of performance with session 1.
Figure 2. Comparison of diagnostic performance between the deep learning–based automatic detection algorithm (DLAD) and physician groups. The DLAD showed significantly higher performance than all reader groups in terms of both image-wise classification (a) and lesion-wise localization (b) in the observer performance test. Abbreviations: AUAFROC, area under the alternative free-response receiver operating characteristic curve; AUROC, area under the receiver operating characteristic curve.