Eui Jin Hwang, Sunggyun Park, Kwang-Nam Jin, Jung Im Kim, So Young Choi, Jong Hyuk Lee, Jin Mo Goo, Jaehong Aum, Jae-Joon Yim, Julien G Cohen, Gilbert R Ferretti, Chang Min Park.
Abstract
Importance: Interpretation of chest radiographs is a challenging task prone to errors, requiring expert readers. An automated system that can accurately classify chest radiographs may help streamline the clinical workflow.
Year: 2019 PMID: 30901052 PMCID: PMC6583308 DOI: 10.1001/jamanetworkopen.2019.1095
Source DB: PubMed Journal: JAMA Netw Open ISSN: 2574-3805
Performance of the Deep Learning–Based Automatic Detection Algorithm in the 5 External Validation Tests
| Measure | Institution A | Institution B | Institution C | Institution D | Institution E |
|---|---|---|---|---|---|
| AUROC | 0.983 (0.961-1.004) | 0.979 (0.960-0.998) | 0.979 (0.962-0.996) | 1.000 (1.000-1.000) | 0.973 (0.949-0.996) |
| AUAFROC | 0.985 (0.967-1.004) | 0.965 (0.941-0.989) | 0.972 (0.953-0.990) | 0.984 (0.971-0.997) | 0.923 (0.879-0.967) |
| **High sensitivity threshold** | | | | | |
| Sensitivity | 0.913 (0.841-0.959) | 0.973 (0.931-0.992) | 1.000 (0.964-1.000) | 1.000 (0.957-1.000) | 0.979 (0.927-0.997) |
| Specificity | 1.000 (0.963-1.000) | 0.880 (0.800-0.936) | 0.633 (0.525-0.732) | 0.940 (0.874-0.978) | 0.566 (0.462-0.665) |
| Precision | 1.000 (0.962-1.000) | 0.922 (0.868-0.959) | 0.752 (0.670-0.823) | 0.933 (0.825-0.948) | 0.688 (0.604-0.764) |
| F1 score | 0.955 (0.897-0.979) | 0.947 (0.898-0.975) | 0.858 (0.791-0.903) | 0.965 (0.886-0.973) | 0.808 (0.731-0.865) |
| **High specificity threshold** | | | | | |
| Sensitivity | 0.845 (0.760-0.909) | 0.945 (0.895-0.976) | 0.970 (0.915-0.994) | 1.000 (0.957-1.000) | 0.918 (0.844-0.964) |
| Specificity | 1.000 (0.963-1.000) | 0.980 (0.930-0.998) | 0.878 (0.792-0.937) | 1.000 (0.964-1.000) | 0.848 (0.762-0.913) |
| Precision | 1.000 (0.959-1.000) | 0.986 (0.949-0.998) | 0.898 (0.825-0.948) | 1.000 (0.957-1.000) | 0.856 (0.773-0.917) |
| F1 score | 0.916 (0.848-0.952) | 0.965 (0.921-0.987) | 0.933 (0.868-0.970) | 1.000 (0.957-1.000) | 0.886 (0.807-0.940) |

Values are performance (95% CI).
Abbreviations: AUAFROC, area under the alternative free-response receiver operating characteristic curve; AUROC, area under the receiver operating characteristic curve.
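The F1 scores in the table above can be recomputed from the reported precision and sensitivity (recall) values. A minimal sketch, assuming the standard harmonic-mean definition of F1 (not code from the paper):

```python
# Hedged sketch: recompute F1 from precision and recall, assuming the
# conventional definition F1 = 2 * P * R / (P + R).

def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Institution A, high sensitivity threshold:
# precision 1.000, sensitivity 0.913 -> F1 rounds to 0.955, as tabulated.
print(round(f1_score(1.000, 0.913), 3))  # 0.955
```

Spot-checking a cell this way (e.g., Institution A at the high sensitivity threshold) reproduces the tabulated point estimate; the confidence intervals, of course, cannot be recovered from the point estimates alone.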
Figure 1. Results of External Validation Tests and Observer Performance Tests
The deep learning–based automatic detection algorithm (DLAD) showed consistently high image-wise classification (area under the receiver operating characteristic curve [AUROC], 0.973-1.000) (A) and lesion-wise localization (area under the alternative free-response receiver operating characteristic curve [AUAFROC], 0.923-0.985) (B) performance in the external validation tests. In the comparison with physicians, the DLAD showed significantly higher classification (AUROC, 0.983 vs 0.814-0.932) (C) and localization (AUAFROC, 0.985 vs 0.781-0.907) (D) performance than all observer groups.
Performance of Physicians in the Observer Performance Test
| Observer Group | AUROC (95% CI) | P Value^a | AUAFROC (95% CI) | P Value^a | Sensitivity (95% CI) | P Value^b | Specificity (95% CI) | P Value^b |
|---|---|---|---|---|---|---|---|---|
| **Session 1** | | | | | | | | |
| Nonradiology physicians | 0.814 (0.764-0.864) | <.001 | 0.781 (0.731-0.832) | <.001 | 0.699 (0.657-0.738) | NA | 0.901 (0.871-0.926) | NA |
| Board-certified radiologists | 0.896 (0.856-0.937) | <.001 | 0.870 (0.830-0.910) | <.001 | 0.812 (0.775-0.845) | NA | 0.948 (0.925-0.966) | NA |
| Thoracic radiologists | 0.932 (0.901-0.963) | .002 | 0.907 (0.874-0.940) | <.001 | 0.876 (0.844-0.903) | NA | 0.946 (0.922-0.965) | NA |
| **Session 2 (with DLAD)** | | | | | | | | |
| Nonradiology physicians | 0.904 (0.852-0.957) | <.001 | 0.873 (0.815-0.931) | <.001 | 0.835 (0.800-0.866) | <.001 | 0.924 (0.896-0.946) | .006 |
| Board-certified radiologists | 0.939 (0.911-0.968) | <.001 | 0.919 (0.886-0.951) | <.001 | 0.893 (0.863-0.919) | <.001 | 0.948 (0.925-0.966) | .62 |
| Thoracic radiologists | 0.958 (0.935-0.982) | .002 | 0.938 (0.914-0.961) | <.001 | 0.924 (0.898-0.946) | <.001 | 0.948 (0.925-0.966) | >.99 |

Abbreviations: AUAFROC, area under the alternative free-response receiver operating characteristic curve; AUROC, area under the receiver operating characteristic curve; DLAD, deep learning–based automatic detection algorithm; NA, not applicable.
^a Comparison of performance with the DLAD.
^b Comparison of performance with session 1.
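The AUROC values compared throughout these tables admit a simple probabilistic reading: the chance that a randomly chosen abnormal radiograph receives a higher score than a randomly chosen normal one. A minimal sketch of that rank-based (Mann-Whitney) formulation, using hypothetical scores rather than the study's data:

```python
# Hedged sketch (not the authors' code): AUROC via the Mann-Whitney
# formulation -- the fraction of (positive, negative) pairs in which the
# positive case scores higher, with ties counted as half.

def auroc(scores_pos, scores_neg):
    wins = sum((p > n) + 0.5 * (p == n)
               for p in scores_pos for n in scores_neg)
    return wins / (len(scores_pos) * len(scores_neg))

# Toy example with hypothetical probability scores.
pos = [0.9, 0.8, 0.371, 0.291]   # abnormal radiographs
neg = [0.2, 0.1, 0.05]           # normal radiographs
print(auroc(pos, neg))  # 1.0 -- every positive outranks every negative
```

Perfect separation yields an AUROC of 1.0 (as the DLAD achieved at institution D), while chance-level scoring yields 0.5.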
Figure 2. Representative Case From the Observer Performance Test (Malignant Neoplasm)
A, The chest radiograph (CR) shows nodular opacity at the right lower lung field (arrowhead), which was initially detected by 2 of 15 observers. B, The corresponding computed tomographic (CT) image reveals a nodule at the right middle lobe. C, The deep learning–based automatic detection algorithm (DLAD) correctly localized the lesion (probability score, 0.291). Four observers additionally detected the lesion after checking the output.
Figure 3. Representative Case From the Observer Performance Test (Pneumonia)
A, The chest radiograph (CR) shows subtle patchy increased opacity at the left middle lung field, which was initially missed by all 15 observers. B, The corresponding computed tomographic (CT) image shows patchy ground glass opacity at the left upper lobe. C, The deep learning–based automatic detection algorithm (DLAD) correctly localized the lesion (probability score, 0.371). Seven observers correctly detected the lesion after checking the result.