Chris K Kim, Ji Whae Choi, Zhicheng Jiao, Dongcui Wang, Jing Wu, Thomas Y Yi, Kasey C Halsey, Feyisope Eweje, Thi My Linh Tran, Chang Liu, Robin Wang, John Sollee, Celina Hsieh, Ken Chang, Fang-Xue Yang, Ritambhara Singh, Jie-Lin Ou, Raymond Y Huang, Cai Feng, Michael D Feldman, Tao Liu, Ji Sheng Gong, Shaolei Lu, Carsten Eickhoff, Xue Feng, Ihab Kamel, Ronnie Sebro, Michael K Atalay, Terrance Healey, Yong Fan, Wei-Hua Liao, Jianxin Wang, Harrison X Bai.
Abstract
While COVID-19 diagnosis and prognosis artificial intelligence models exist, very few can be implemented for practical use given their high risk of bias. We aimed to develop a diagnosis model that addresses notable shortcomings of prior studies, integrating it into a fully automated triage pipeline that examines chest radiographs for the presence, severity, and progression of COVID-19 pneumonia. Scans were collected using the DICOM Image Analysis and Archive, a system that communicates with a hospital's image repository. The authors collected over 6,500 non-public chest X-rays comprising diverse COVID-19 severities, along with radiology reports and RT-PCR data. The authors provisioned one internally held-out and two external test sets to assess model generalizability and compare performance to traditional radiologist interpretation. The pipeline was evaluated on a prospective cohort of 80 radiographs, reporting a 95% diagnostic accuracy. The study mitigates bias in AI model development and demonstrates the value of an end-to-end COVID-19 triage platform.
Year: 2022 PMID: 35031687 PMCID: PMC8760275 DOI: 10.1038/s41746-021-00546-w
Source DB: PubMed Journal: NPJ Digit Med ISSN: 2398-6352
Demographic variance across diagnosis model test datasets.
| | Training vs. Brown-April | Training vs. External | Training vs. Xiangya-February |
|---|---|---|---|
| Sex | 0.499 | <0.001 | 0.343 |
| Age | 0.079 | 0.010 | <0.001 |
| Sex | <0.001 | <0.001 | <0.001 |
| Age | <0.001 | <0.001 | <0.001 |
| Sex | <0.001 | <0.001 | <0.001 |
| Age | <0.001 | <0.001 | <0.001 |
P-values were calculated using ANOVA and two-sample t-tests between the training dataset and each testing sample set.
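As a rough illustration of the two-sample comparison mentioned above, the sketch below computes a Welch t statistic for a continuous variable such as age and converts it to a two-sided p-value. Several things here are assumptions rather than details from the study: the source does not say whether a pooled-variance or Welch test was used, the p-value uses a large-sample normal approximation instead of the exact t distribution, the ANOVA step is not shown, and the function names and sample ages are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def welch_t(sample_a, sample_b):
    """Welch's t statistic for two independent samples (unequal variances allowed)."""
    # Squared standard error of each sample mean.
    se2_a = variance(sample_a) / len(sample_a)
    se2_b = variance(sample_b) / len(sample_b)
    return (mean(sample_a) - mean(sample_b)) / sqrt(se2_a + se2_b)


def approx_two_sided_p(t_stat):
    """Two-sided p-value via a normal approximation (assumption: large samples)."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(t_stat)))


# Hypothetical ages, for illustration only (not study data).
training_ages = [60, 62, 64, 66, 68]
external_ages = [40, 42, 44, 46, 48]

t_stat = welch_t(training_ages, external_ages)
p_value = approx_two_sided_p(t_stat)
print(f"t = {t_stat:.2f}, p < 0.001: {p_value < 0.001}")
```

With cohorts of the size described in the abstract, the normal approximation and the exact t distribution give nearly identical p-values; for small samples an exact t-test from a statistics library would be the safer choice.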
Demographic and clinical variance across prognosis model test datasets.
| | Training vs. External test | Internal test vs. External test |
|---|---|---|
| Sex | <0.001 | <0.001 |
| Age | <0.001 | <0.001 |
| Temperature | | |
| O2 Saturation on room air | <0.001 | <0.001 |
| White blood cell count | <0.001 | 0.007 |
| Lymphocyte count | <0.001 | <0.001 |
| Creatinine | | |
| C-Reactive protein | <0.001 | <0.001 |
| Cardiovascular disease | 0.029 | 0.020 |
| Hypertension | 0.043 | |
| COPD | | |
| Diabetes | | |
| Chronic liver disease | | |
| Chronic kidney disease | 0.014 | |
| Cancer | | |
| Human Immunodeficiency Virus | | |
P-values were calculated using ANOVA and two-sample t-tests between the training dataset and each testing sample set, with values >0.05 marked in bold.
Fig. 1 COVID-19 diagnosis AUROC curves for the internally held-out and external test sets.
The true and false positive rates for the study’s radiologists are also portrayed to assess model performance relative to traditional clinical methods. a Internally held-out test set and b external test set. TPR true positive rate, FPR false positive rate.
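To make the quantities in this caption concrete, the following minimal sketch shows how a single (FPR, TPR) operating point, such as one radiologist's binary reads, and a model's AUROC can be computed from ground-truth labels. It is not the study's evaluation code; the function names and the toy labels and scores are hypothetical, and the AUROC is computed with the pairwise-ranking definition rather than any particular library routine.

```python
def roc_point(labels, predictions):
    """Return (FPR, TPR) for binary predictions against binary labels."""
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)
    positives = sum(labels)
    negatives = len(labels) - positives
    return fp / negatives, tp / positives


def auroc(labels, scores):
    """AUROC as the probability that a random positive case outscores a
    random negative case, with ties counted as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data for illustration only (not study data).
labels = [1, 1, 0, 0]                 # 1 = COVID-19 positive by reference standard
model_scores = [0.9, 0.4, 0.5, 0.1]   # hypothetical model probabilities
reader_calls = [1, 0, 1, 0]           # hypothetical radiologist reads

print(auroc(labels, model_scores))      # 0.75
print(roc_point(labels, reader_calls))  # (0.5, 0.5)
```

A reader whose (FPR, TPR) point falls below the model's ROC curve is outperformed by the model at that operating point, which is the comparison the figure is drawing.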
Fig. 2 COVID-19 diagnosis model gradient-weighted class activation mapping (Grad-CAM) visualization.
All images were predicted correctly as COVID-19 positive. Grad-CAM heatmaps visualize which portions of the input chest radiograph were important for the classification decision. a Brown-April, original chest radiographs, b External, original chest radiographs, c Xiangya-February, original chest radiographs, d Brown-April, Grad-CAM overlay, e External, Grad-CAM overlay, and f Xiangya-February, Grad-CAM overlay.
Fig. 3 COVID-19 triage pipeline.
The blue arrows represent an illustrative example of how a patient presenting with severe COVID-19 and high risk for critical deterioration would be triaged via the automated pipeline. Recommended patient outcomes would require physician approval before execution.
Fig. 4 Features and value propositions of individual pipeline components.
The outlined components work together to deliver an end-to-end pipeline to rapidly identify and triage COVID-19 patients within an emergency department. Tangible value propositions are outlined for each component.
Fig. 5 COVID-19 diagnosis prediction model development.
The flowchart delineates the inclusion and exclusion criteria for the training and testing cohorts, as well as the methods to train, test, and evaluate the model.