| Literature DB >> 35923381 |
Erich Kummerfeld, Christopher J Tignanelli, Ju Sun, Le Peng, Taihui Li, Dyah Adila, Zach Zaiman, Genevieve B Melton-Meaux, Nicholas E Ingraham, Eric Murray, Daniel Boley, Sean Switzer, John L Burns, Kun Huang, Tadashi Allen, Scott D Steenburg, Judy Wawira Gichoya.
Abstract
Purpose: To conduct a prospective observational study across 12 U.S. hospitals to evaluate real-time performance of an interpretable artificial intelligence (AI) model to detect COVID-19 on chest radiographs.
Keywords: Application Domain; Classification; Diagnosis; Infection; Lung
Year: 2022 PMID: 35923381 PMCID: PMC9344211 DOI: 10.1148/ryai.210217
Source DB: PubMed Journal: Radiol Artif Intell ISSN: 2638-6100
Patient Demographics for Training and Validation Datasets
Figure 1: Overview of the COVID-19 diagnostic model pipeline shows segmentation module (top), outlier detection module (middle), and classification module (bottom). DICOM = Digital Imaging and Communications in Medicine, GAN = generative adversarial network, PNG = portable network graphics format.
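The caption above describes a three-stage pipeline: lung segmentation, outlier detection (rejecting unsuitable images before classification), then classification. A minimal sketch of that control flow is below; every function body is a trivial hypothetical stand-in (the study's actual modules are deep networks, with GAN-based outlier detection), so only the stage ordering and rejection logic reflect the description.

```python
def segment_lungs(image):
    """Return a binary lung mask (trivial threshold stand-in for the segmentation module)."""
    return [[1 if px > 0.5 else 0 for px in row] for row in image]

def is_outlier(image, mask):
    """Reject images whose mask covers too little area (stand-in for GAN-based outlier detection)."""
    area = sum(sum(row) for row in mask)
    total = sum(len(row) for row in mask)
    return area / total < 0.1

def classify(image, mask):
    """Return a diagnostic score in [0, 1] (stand-in for the classification module)."""
    masked = [px for row_img, row_m in zip(image, mask)
              for px, m in zip(row_img, row_m) if m]
    return sum(masked) / len(masked) if masked else 0.0

def run_pipeline(image):
    mask = segment_lungs(image)
    if is_outlier(image, mask):
        return None  # image rejected before classification
    return classify(image, mask)

score = run_pipeline([[0.9, 0.8], [0.7, 0.2]])
```

The key design point the caption implies is that outlier detection gates the classifier: an out-of-distribution image returns no score rather than an unreliable one.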
Figure 2: COVID-19–negative (top) and COVID-19–positive images (bottom) with representative lung masks. Representative examples of a chest radiograph (left) and the accompanying lung segmentation mask (right) for participants with a negative COVID-19 diagnosis and a positive COVID-19 diagnosis are shown in each pairing.
Figure 3: COVID-19 diagnostic artificial intelligence (AI) scores for participants positive and negative for COVID-19. Box and whisker plots show (A) COVID-19 diagnostic scores (y-axis) for non–COVID-19 versus polymerase chain reaction–confirmed COVID-19 from real-time implementation at M Health Fairview and (B) initial COVID-19 diagnostic scores (y-axis) for non–COVID-19 versus mild or moderate (mild/mod) COVID-19 versus severe COVID-19. Boxes represent the IQR (25%–75%), with the median denoted by the horizontal line within each box. The Wilcoxon rank sum test, which compares two groups, was used to evaluate differences in COVID-19 diagnostic AI score by COVID-19 positivity. The Kruskal-Wallis test, which compares three or more groups, was used to evaluate differences in COVID-19 diagnostic AI score by disease severity. * = P < .001 compared with COVID-19–negative disease. Circles represent outlier points in the box distribution plots.
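The two tests named in the caption are both rank-based. A pure-Python sketch of their test statistics is below, run on made-up score arrays (not the study's data); the rank assignment assumes distinct values and omits the tie correction that a full implementation would apply.

```python
import math

def rank_sum_z(x, y):
    """Wilcoxon rank-sum z statistic (normal approximation, no tie correction)."""
    n1, n2 = len(x), len(y)
    pooled = sorted(x + y)
    rank = {v: i + 1 for i, v in enumerate(pooled)}  # 1-based ranks, distinct values assumed
    w = sum(rank[v] for v in x)                      # rank sum of the first group
    mean = n1 * (n1 + n2 + 1) / 2
    var = n1 * n2 * (n1 + n2 + 1) / 12
    return (w - mean) / math.sqrt(var)

def kruskal_h(*groups):
    """Kruskal-Wallis H statistic across any number of groups (no tie correction)."""
    n_total = sum(len(g) for g in groups)
    pooled = sorted(v for g in groups for v in g)
    rank = {v: i + 1 for i, v in enumerate(pooled)}
    s = sum(sum(rank[v] for v in g) ** 2 / len(g) for g in groups)
    return 12 / (n_total * (n_total + 1)) * s - 3 * (n_total + 1)

# Hypothetical diagnostic scores for illustration only
neg = [0.1, 0.2, 0.15, 0.3, 0.25]
mild_mod = [0.5, 0.6, 0.55, 0.45, 0.65]
severe = [0.8, 0.85, 0.9, 0.75, 0.95]

z = rank_sum_z(neg, mild_mod + severe)   # two groups: negative vs positive
h = kruskal_h(neg, mild_mod, severe)     # three groups: negative vs mild/mod vs severe
```

A large negative z (negatives ranked lower) and a large H (compared against a chi-square distribution with k−1 degrees of freedom) correspond to the significant differences the caption reports.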
Figure 4: Interpretable artificial intelligence (AI) heatmaps from real-time implementation indicating features used for AI model prediction. COVID-19–negative and –positive chest radiographs (grayscale) and representative heatmaps (in color) from real-time implementation output are shown. Features of importance for model predictions are represented by the red end of the heatmap spectrum, and blue represents features of least importance. Representative images with both high and low diagnostic scores are provided. Black boxes were added to obscure potentially patient-identifying information.
Evaluation of Artificial Intelligence Model Performance over Week 8 through Week 19
Area under the Receiver Operating Characteristic Curve for Evaluation of Model Equity in Validation Datasets
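The equity evaluation referenced above compares area under the receiver operating characteristic curve (AUROC) across patient subgroups. The AUROC equals the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, which yields a short pure-Python computation; the subgroup names and score arrays below are hypothetical, for illustration only.

```python
def auroc(pos_scores, neg_scores):
    """AUROC via the pairwise (Mann-Whitney) formulation; ties count half."""
    wins = sum((p > n) + 0.5 * (p == n)
               for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))

# Hypothetical per-subgroup scores: (positive cases, negative cases)
subgroups = {
    "subgroup_A": ([0.9, 0.8, 0.4], [0.3, 0.5, 0.2]),
    "subgroup_B": ([0.7, 0.95], [0.1, 0.6]),
}

per_group_auc = {name: auroc(pos, neg) for name, (pos, neg) in subgroups.items()}
```

Comparing the resulting per-subgroup AUROCs (and their gaps) is one common way to check whether a model performs equitably across demographic strata.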