| Literature DB >> 33969323 |
Jannis Born1,2, David Beymer3, Deepta Rajan3, Adam Coy3,4, Vandana V Mukherjee3, Matteo Manica1, Prasanth Prasanna5,3, Deddeh Ballah6,3, Michal Guindy7,8, Dorith Shaham9, Pallav L Shah10,11,12, Emmanouil Karteris13, Jan L Robertus10,12, Maria Gabrani1, Michal Rosen-Zvi14,15.
Abstract
Although a plethora of research articles on AI methods for COVID-19 medical imaging has been published, their clinical value remains unclear. We conducted the largest systematic review of the literature addressing the utility of AI in imaging for COVID-19 patient care. Through keyword searches on PubMed and preprint servers throughout 2020, we identified 463 manuscripts and performed a systematic meta-analysis to assess their technical merit and clinical relevance. Our analysis reveals a significant disparity between the clinical and AI communities, both in the imaging modalities studied (AI experts neglected CT and ultrasound, favoring X-ray) and in the tasks performed (71.9% of AI papers centered on diagnosis). The vast majority of manuscripts were found deficient with regard to potential use in clinical practice, but 2.7% (N = 12) were assigned a high maturity level and are summarized in greater detail. We provide an itemized discussion of the challenges in developing clinically relevant AI solutions, with recommendations and remedies.
Keywords: ACR, (American College of Radiology); AI, (Artificial Intelligence); Artificial Intelligence; COVID-19; CT, (Computed Tomography); CXR, (Chest Radiographs); Chest CT; Chest Ultrasound; Chest X-ray; Coronavirus; DL, (Deep Learning); Deep Learning; Digital Healthcare; LUS, (Lung Ultrasound); Lung imaging; MI, (Medical Imaging); Machine Learning; Medical Imaging; Meta Review; PRISMA; PRISMA, (Preferred Reporting Items for Systematic Reviews and Meta-Analyses); RT-PCR, (Reverse Transcriptase Polymerase Chain reaction); SARS-CoV-2; US, (Ultrasound)
Year: 2021 PMID: 33969323 PMCID: PMC8086827 DOI: 10.1016/j.patter.2021.100269
Source DB: PubMed Journal: Patterns (N Y) ISSN: 2666-3899
Figure 1Overview of systematic review and meta-analysis
(A) PRISMA flowchart illustrating the study selection used in the systematic review. Publication keyword searches on PubMed, arXiv, bioRxiv, and medRxiv covering all of 2020 were performed in two parallel streams. After duplicate matches were removed, titles were screened manually and 463 relevant manuscripts were selected for manual review.
(B) Flowchart for quality/maturity assessment of papers. Each manuscript received a score between 0 and 1 for each of five categories. Based on the total grade, a low, medium, or high maturity level was assigned. Details on the scoring system and scores for individual papers can be found in supplemental information.
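The scoring scheme in (B) can be sketched as follows. The category names and the total-score cut-offs for the three maturity levels are illustrative assumptions; the actual rubric is described in the supplemental information of the original paper.

```python
# Illustrative sketch of the maturity scoring in Figure 1B.
# Category names and level cut-offs are hypothetical, for demonstration only.

CATEGORIES = ["data", "evaluation", "reproducibility",
              "clinical_validation", "reporting"]

def maturity_level(scores: dict) -> str:
    """Map five per-category scores (each in [0, 1]) to a maturity level."""
    if set(scores) != set(CATEGORIES):
        raise ValueError("expected exactly one score per category")
    if any(not 0.0 <= s <= 1.0 for s in scores.values()):
        raise ValueError("scores must lie in [0, 1]")
    total = sum(scores.values())  # total grade lies in [0, 5]
    if total >= 4.0:              # hypothetical cut-off for "high"
        return "high"
    if total >= 2.0:              # hypothetical cut-off for "medium"
        return "medium"
    return "low"

paper = {"data": 1.0, "evaluation": 0.8, "reproducibility": 1.0,
         "clinical_validation": 0.7, "reporting": 0.9}
print(maturity_level(paper))  # total 4.4 -> "high"
```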
Figure 2Venn diagrams for AI in MI
MI received growing attention in 2020, at least partially due to the COVID-19 pandemic. Automatic keyword searches on PubMed and preprint servers revealed that AI is a major and rapidly growing subfield of MI and that 827 publications in 2020 mentioned the terms MI, AI, and COVID-19.
Figure 3Number of papers per keyword and platform
Left: paper counts for AI applied to breast or lung imaging. At half-year resolution, the trends persisted: a >100% growth rate for lung imaging was visible in the first half (H1) of 2020, whereas the second half (H2) brought an additional growth of approximately one-third (not shown). The lightly shaded bars exclude COVID-19-related papers, showing that publication activity continued independently of COVID-19. Right: paper counts comparing the use of AI across lung imaging modalities. COVID-19 was accompanied by a shift toward CXR papers relative to CT papers. For each keyword, multiple synonyms were used (for details see appendix Table A1).
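The half-year growth rates quoted above follow the usual period-over-period definition. The paper counts below are hypothetical placeholders chosen only to reproduce growth of the stated magnitudes (>100% in H1, roughly one-third in H2), not figures from the review.

```python
# Period-over-period growth as used for Figure 3 (left).
# The counts are hypothetical, chosen to illustrate the stated trend.

def growth_rate(prev: int, curr: int) -> float:
    """Relative growth from one half-year period to the next."""
    return (curr - prev) / prev

# Hypothetical counts of AI-on-lung-imaging papers per half year:
h2_2019, h1_2020, h2_2020 = 120, 260, 345

print(f"H1 2020 vs H2 2019: {growth_rate(h2_2019, h1_2020):+.0%}")  # >+100%
print(f"H2 2020 vs H1 2020: {growth_rate(h1_2020, h2_2020):+.0%}")  # ~+33%
```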
Figure 4Imaging modality comparison during the COVID-19 pandemic
CT takes the lion's share of clinical papers on lung imaging of COVID-19 (left). The AI community (right) published disproportionately more papers on CXR than clinicians did, whereas CT and ultrasound were under-represented. Multimodal papers used more than one imaging modality.
Figure 5Distribution of manually reviewed papers on AI and MI during the COVID-19 pandemic
Relative proportions for the primary task performed (A), quality (B), and data origin (C) are given. N is smaller for (B) and (C) because review papers were excluded from those analyses.
Figure 6Maturity score as function of task (N = 437)
Publications focusing on COVID-19 diagnosis/detection or pure segmentation achieved a significantly lower maturity score than publications addressing severity assessment/monitoring or prognostic tasks (asterisks indicate significance at the 0.05, 0.01, and 0.001 levels, respectively).
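A group-wise comparison like the one in Figure 6 can be sketched with a nonparametric rank test. The exact statistical test used in the paper is not stated in this excerpt; a Mann-Whitney U statistic is shown here as one plausible choice, and the maturity scores below are synthetic, not the paper's data.

```python
# Sketch of a nonparametric group comparison and of the asterisk
# convention in Figure 6. Scores are synthetic; the test choice is an
# assumption, as the excerpt does not name the test used in the paper.

def mann_whitney_u(x, y):
    """U statistic: count of (x_i, y_j) pairs with x_i < y_j (ties count 0.5)."""
    u = 0.0
    for xi in x:
        for yj in y:
            if xi < yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u

def stars(p):
    """Asterisk convention: * p<0.05, ** p<0.01, *** p<0.001."""
    return "***" if p < 0.001 else "**" if p < 0.01 else "*" if p < 0.05 else "ns"

diagnosis = [1.0, 1.5, 2.0, 1.0, 2.5, 1.5, 2.0, 1.0]   # lower maturity scores
prognosis = [3.0, 3.5, 4.0, 2.5, 4.5, 3.0, 3.5, 4.0]   # higher maturity scores

u = mann_whitney_u(diagnosis, prognosis)
print(f"U = {u} of {len(diagnosis) * len(prognosis)} possible pairs")  # U = 63.5
print(stars(0.0004))  # a p-value below 0.001 maps to '***'
```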
Detailed information on the 12 papers assigned a high maturity score in our systematic meta-review of 463 papers
| Paper title | Primary task; modality | Key findings | Limitations | Patients (train/val/test) | No. of data sites | Labels | Architecture, dimensionality | Pretraining | Metrics | Results | Reproducibility (code/data open source) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Artificial intelligence-enabled rapid diagnosis of patients with COVID-19 | diagnosis, CT | system identified 68% of RT-PCR-positive patients with normal CT (asymptomatic). Clinical information is important for diagnosis, and the model is as sensitive as a senior radiologist | small data size, mild cases have few abnormal findings on chest CT, severity of pathological findings variable in CT | 534/92/279 | 18 | RT-PCR tests | Inception-ResNet-v2 (pretrained ImageNet), 3-layer MLP, 2D | transfer learning (pulmonary tuberculosis model) | AUROC, sensitivity, specificity | 0.92 AUC, 84.3% sens, 82.8% spec | code—yes, data—no |
| Artificial intelligence augmentation of radiologist performance in distinguishing COVID-19 from pneumonia of other origin at chest CT | Diagnosis, CT | AI assistance improved radiologists' performance in diagnosing COVID-19. AI alone outperformed radiologists on sensitivity and specificity | bias in radiologist-annotation, heterogeneous data, bias in location of COVID (China) versus non-COVID pneumonia patients (USA) | 830/237/119 | 13 | RT-PCR tests, slice-level by radiologist | EfficientNet-B4, 2D | transfer learning (ImageNet) | AUROC, sensitivity, specificity, accuracy, AUPRC | 0.95 AUC, 95% sens, 96% spec, 96% acc, 0.9 AUPRC | code—yes, data—no |
| Automated assessment of CO-RADS and chest CT severity scores in patients with suspected COVID-19 using artificial intelligence | diagnosis, CT | a freely accessible algorithm that assigns CO-RADS and CT severity scores to non-contrast CT scans of patients suspected of COVID-19 with high diagnostic performance | only one data center, high COVID prevalence, low prevalence for other diseases | 476/105 | 1 | RT-PCR, radiology report | lobe segmentation 3D UNet, CO-RADS scoring, 3D Inception Net | transfer learning (ImageNet and Kinetics) | AUC, sensitivity, specificity | internal: 0.95 AUC, external: 0.88 AUC | code—yes, data—no |
| Diagnosis of Covid-19 pneumonia using chest radiography: value of artificial intelligence | diagnosis, X-ray | AI surpassed senior radiologists in COVID-19 differential diagnosis | high COVID prevalence, human ROC-AUC were averaged from 3 readers | 5,208/2,193 | 5 hospitals, 30 clinics | RT-PCR, natural language processing on radiology report | CV19-Net | 3-stage transfer learning (ImageNet) | AUC, sensitivity, specificity | 0.92 AUC, 88.0% sens, 79.0% spec | code—yes, data—no |
| Development and evaluation of an artificial intelligence system for COVID-19 diagnosis | diagnosis, multimodal | paired cohort of chest X-ray (CXR)/CT data: CT is superior to CXR for diagnosis by wide margin. AI system outperforms all radiologists in 4-class classification | more data on more pneumonia subtypes needed, no clinical information used (could enable severity assessment) | 2,688/2,688/3,649 | 7 | – | lung seg 2D UNet, slice diagnosis 2D ResNet152 | transfer learning (pretrained ImageNet) | AUC, sensitivity, specificity | AUC 0.978 | code—yes, data—no |
| AI-assisted CT imaging analysis for COVID-19 screening: building and deploying a medical AI system | diagnosis, CT | system was deployed in 4 weeks in 16 hospitals; AI outperformed radiologists in sensitivity by wide margin | model fails when multiple lesions, metal or motion artifacts are present, system depends on fully annotated CT data | 1,136 | 5 | Nucleic acid test, 6 annotators (lesions, lung) | 3D UNet++, ResNet50 | full training | sensitivity, specificity | sens 97.4%, spec 92.2% | code—no, data—no |
| Automated assessment and tracking of COVID-19 pulmonary disease severity on chest radiographs using convolutional Siamese neural networks | severity, X-ray | continuous severity score used for longitudinal evaluation and risk stratification (admission CXR score predicts intubation and death, AUC = 0.8). Follow-up CXR score by AI is concordant with radiologist (r = 0.74) | patients only from urban areas in USA, no generalization to posteroanterior radiographs | 160,000/267 (images) | 2 | RT-PCR tests, 2–5 annotators, mRALE | Siamese DenseNet-121 | DenseNet-121 (ImageNet, fine-tuned on CheXpert) | PXS score, Pearson, AUC | r = 0.86, AUC = 0.8 | code—yes, data—partial (COVID CXR not released) |
| Development and clinical implementation of tailored image analysis tools for COVID-19 in the midst of the pandemic | severity, CT | developed algorithms for quantification of pulmonary opacity in 10 days. Human-level performance with <200 CT scans. Model integrated into clinical workflow | data not carefully acquired (neither complete, consecutive, nor a fully random sample); empirical HU thresholds for quantification | 146/66 | 1 | RT-PCR, 3 radiologist annotators | 3D UNet | full training | Dice coefficient, Hausdorff distance | Dice = 0.97 | code—yes, data—no |
| Clinically applicable AI system for accurate diagnosis, quantitative measurements, and prognosis of COVID-19 pneumonia using computed tomography | prognosis, CT | AI with diagnostic performance comparable with senior radiologist. AI lifts junior radiologists to senior level. AI predicts drug efficacy and clinical prognosis. Identifies biomarkers for novel coronavirus pneumonia lesions. Data available | – | 3,777 | 4 | pixel-level annotation (5 radiologists) | lung-lesion seg DeepLabV3, diagnosis analysis 3D ResNet-18, gradient boosting decision tree | full training | Dice coefficient, AUC, accuracy, sensitivity, specificity | AUC 0.9797, acc 92.49%, sens 94.93%, spec 91.13% | code—yes, data—yes |
| Relational modeling for robust and efficient pulmonary lobe segmentation in CT scans | segmentation, CT | leverages structured relationships with non-local module. Can enlarge receptive field of convolution features. Robustly segments COVID-19 infections | errors on border of segmentations, gross pathological changes not represented in data | 4,370/1,100 | 2 (pretraining: 21 centers) | radiology report | RTSU-Net (2-stage 3D UNet) | pretraining on COPDGene | intersection over union, average asymmetric surface distance | IOU 0.953, AASD 0.541 | code—yes, data—no/partial |
| Dual-branch combination network (DCN): toward accurate diagnosis and lesion segmentation of COVID-19 using CT images | diagnosis, CT | DCN for combined segmentation and classification. Lesion attention (LA) module improves sensitivity to CT images with small lesions and facilitates early screening. Interpretability: LA provides meaningful attention maps | diagnosis depends on accuracy of segmentation module, no slice-level annotation | 1,202 | 10 | RT-PCR, pixel-level annotation by 6 radiologists | UNet, ResNet-50 | full training | accuracy, Dice, sensitivity, specificity, AUC, average accuracy | acc 92.87%, Dice 99.11%, sens 92.86%, spec 92.91%, AUC 0.977, average acc 92.89% | code—no, data—no |
| AI-driven quantification, staging and outcome prediction of COVID-19 pneumonia | prognosis, CT | 2D/3D COVID-19 quantification, roughly on par with radiologists. Facilitates prognosis/staging which outperforms radiologists. Rich set of model ensembles, uses clinical features | test dataset partly split by centers | 693 (321,000 slices)/513 for test | 8 | RT-PCR | AtlasNet, 2D | full training | Dice coefficient, correlation, accuracy | Dice 0.7, balanced accuracy 0.7 | code—no, data—yes (without images) |
For discussion, please see the text.
Differences between the imaging modalities
| | CT | CXR | LUS |
|---|---|---|---|
| Benefit | high sensitivity; high specificity | fast; broadly available | portable; radiation-free; broadly available |
| Drawback | patient transportation; low availability; radiation dose; increased workload for disinfection | low sensitivity; non-specific; large volume of radiographs increases workload | user-dependent; non-specific; long acquisition time; requires patient interaction |
| Clinical role | diagnosis of additional complications; ruling out other etiologies of symptoms (effusions, bacterial pneumonia) | initial diagnosis; monitoring of clinical progression; detection of complications | triage; point-of-care monitoring for specific tasks |
Figure 7Workflow of collaboration between AI and clinical experts
Top: typical process of developing healthcare AI technology, including task definition, data curation, building ML systems, and human-in-the-loop evaluation. Bottom: our proposed workflow, highlighting key components that need to be incorporated into the process to improve collaboration between AI and clinical experts. Note the disparity in how the two communities interpret the value of the developed solutions.