Alexander Risman, Miguel Trelles, David W Denning.
Abstract
Purpose: Chest x-rays are difficult to report accurately, and viral pneumonia is often subtle in its radiological appearance. In the context of the COVID-19 pandemic, rapid triage of cases and exclusion of other pathologies with artificial intelligence (AI) can assist over-stretched radiology departments. We aim to validate three open-source AI models on an external test set. Approach: We tested three open-source deep learning models, COVID-Net, COVIDNet-S-GEO, and CheXNet, for their ability to detect COVID-19 pneumonia and to determine its severity using 129 chest x-rays from two different vendors, Philips and Agfa.
Keywords: COVID-19; artificial intelligence; x-ray
Year: 2021 PMID: 35005058 PMCID: PMC8734487 DOI: 10.1117/1.JMI.8.6.064502
Source DB: PubMed Journal: J Med Imaging (Bellingham) ISSN: 2329-4302
Summary of dataset characteristics. For age, the parenthetical value is the standard deviation; for all other characteristics, the parenthetical values are percentages of the dataset.
| Variable | Level | Overall |
|---|---|---|
| n | | 129 |
| Age | | 44.32 (19.37) |
| Sex | F | 73 (56.59) |
| | M | 56 (43.41) |
| X-ray manufacturer | Agfa | 83 (64.34) |
| | Philips | 46 (35.66) |
| PCR-positive | Negative | 107 (82.95) |
| | Positive | 22 (17.05) |
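As a rough illustration of how a summary table of this kind can be assembled, a minimal pandas sketch follows; the column names ("age", "sex", "manufacturer", "pcr_positive") and the toy values are assumptions for illustration only, not taken from the study data.

```python
# Minimal sketch: a "Table 1"-style dataset summary with pandas.
# Column names and values are illustrative assumptions, not study data.
import pandas as pd

def summarize(df: pd.DataFrame) -> None:
    n = len(df)
    print(f"n = {n}")
    print(f"Age: {df['age'].mean():.2f} ({df['age'].std():.2f})")  # mean (SD)
    for col in ["sex", "manufacturer", "pcr_positive"]:
        for level, count in df[col].value_counts().items():
            print(f"{col} = {level}: {count} ({100 * count / n:.2f}%)")  # count (%)

df = pd.DataFrame({
    "age": [35, 52, 61, 28],
    "sex": ["F", "M", "F", "F"],
    "manufacturer": ["Agfa", "Philips", "Agfa", "Agfa"],
    "pcr_positive": ["Negative", "Positive", "Negative", "Negative"],
})
summarize(df)
```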
Fig. 1 AUROC curves and point estimates, with bootstrapped 95% confidence intervals, evaluating three open-source machine learning models on the classification task of distinguishing PCR-confirmed COVID-positive x-rays from COVID-negative ones. The performance of two radiologists is included for comparison.
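The kind of AUROC point estimate with a bootstrapped 95% confidence interval reported in Fig. 1 can be computed along the lines of the sketch below. The paper's evaluation code is not included in this record, so the variable names, the toy data, and the resample count of 2000 are assumptions.

```python
# Minimal sketch: AUROC point estimate with a bootstrapped 95% CI.
# y_true (0 = PCR-negative, 1 = PCR-positive) and y_score (model output)
# are toy placeholders; 2000 resamples is an assumed, not reported, value.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=129)            # toy labels for illustration
y_score = y_true * 0.4 + rng.random(129) * 0.6   # toy model scores

point_estimate = roc_auc_score(y_true, y_score)

boot_aucs = []
for _ in range(2000):
    idx = rng.integers(0, len(y_true), size=len(y_true))  # resample with replacement
    if len(np.unique(y_true[idx])) < 2:                    # AUC needs both classes
        continue
    boot_aucs.append(roc_auc_score(y_true[idx], y_score[idx]))

lo, hi = np.percentile(boot_aucs, [2.5, 97.5])
print(f"AUROC = {point_estimate:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```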
Pearson’s correlation coefficients and 95% confidence intervals, evaluating three open-source machine learning models on the regression task of rating the geographic severity of COVID-19 in PCR-confirmed positive chest x-rays, with radiologist-assigned severity scores as ground truth.
| CheXNet | COVID-Net-S-GEO | COVID-Net |
|---|---|---|
| 0.83 (0.64 to 0.93) | 0.93 (0.83 to 0.97) | −0.17 (−0.55 to 0.27) |
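A Pearson correlation with a 95% confidence interval of the kind tabulated above can be obtained as sketched below. The record does not state how the paper's intervals were computed; a Fisher z-transform interval is used here as one common choice, and the severity scores are toy values.

```python
# Minimal sketch: Pearson correlation between model severity scores and
# radiologist-assigned geographic severity, with a Fisher z-transform 95% CI.
# Data are illustrative toy values, not study data.
import numpy as np
from scipy.stats import pearsonr, norm

radiologist_score = np.array([1.0, 2.0, 3.5, 4.0, 5.5, 6.0, 7.0, 8.0])
model_score       = np.array([1.2, 2.5, 3.0, 4.4, 5.0, 6.3, 6.8, 8.1])

r, _p = pearsonr(radiologist_score, model_score)

# Fisher z-transform 95% CI for r
n = len(radiologist_score)
z = np.arctanh(r)
se = 1.0 / np.sqrt(n - 3)
z_lo, z_hi = z + np.array([-1, 1]) * norm.ppf(0.975) * se
print(f"r = {r:.2f} (95% CI {np.tanh(z_lo):.2f} to {np.tanh(z_hi):.2f})")
```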
Fig. 2 Scatter plot and best-fit line for the machine learning model that best predicted geographic severity scores for COVID-positive x-rays.
AUCs and bootstrapped 95% CIs for each model in the likelihood detection task, by x-ray machine manufacturer. COVID-Net's poor overall performance appears to be driven largely by x-rays from Philips machines: it fails to outperform chance on Philips x-rays despite outperforming chance on Agfa ones, whereas the other models show no critical disparity in performance across manufacturers.
| Model | Agfa AUC | Philips AUC |
|---|---|---|
| COVID-Net | 0.74 (0.54 to 0.90) | 0.60 (0.40 to 0.79) |
| CheXNet | 0.74 (0.60 to 0.87) | 0.81 (0.65 to 0.94) |
| COVID-Net-S-GEO | 0.73 (0.51 to 0.92) | 0.72 (0.52 to 0.90) |
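The per-manufacturer breakdown in this table amounts to computing the AUC separately within each vendor subgroup, roughly as in the sketch below; the DataFrame columns ("manufacturer", "pcr_positive", "model_score") are assumed names and the data are toy values, not the study's.

```python
# Minimal sketch: AUC computed separately for each x-ray manufacturer subgroup.
# Column names and data are illustrative assumptions only.
import numpy as np
import pandas as pd
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "manufacturer": rng.choice(["Agfa", "Philips"], size=129),
    "pcr_positive": rng.integers(0, 2, size=129),
})
df["model_score"] = df["pcr_positive"] * 0.3 + rng.random(len(df)) * 0.7

for vendor, group in df.groupby("manufacturer"):
    auc = roc_auc_score(group["pcr_positive"], group["model_score"])
    print(f"{vendor}: AUC = {auc:.2f}")
```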