
Artificial intelligence model on chest imaging to diagnose COVID-19 and other pneumonias: A systematic review and meta-analysis.

Lu-Lu Jia1, Jian-Xin Zhao1, Ni-Ni Pan1, Liu-Yan Shi1, Lian-Ping Zhao2, Jin-Hui Tian3, Gang Huang2.   

Abstract

Objectives: When diagnosing coronavirus disease 2019 (COVID-19), radiologists often struggle to make accurate judgments because the imaging characteristics of COVID-19 and other pneumonias are similar. As machine learning advances, artificial intelligence (AI) models show promise in diagnosing COVID-19 and other pneumonias. We performed a systematic review and meta-analysis to assess the diagnostic accuracy and methodological quality of these models.
Methods: We searched PubMed, the Cochrane Library, Web of Science, and Embase, as well as preprints from medRxiv and bioRxiv, for studies published before December 2021, with no language restrictions. The QUADAS-2 quality assessment tool, the Radiomics Quality Score (RQS), and the CLAIM checklist were used to assess the quality of each study. We used random-effects models to calculate pooled sensitivity and specificity, I2 values to assess heterogeneity, and Deeks' test to assess publication bias.
Results: We included 32 of the 2001 retrieved articles in the meta-analysis, with 6737 participants in the test or validation groups. AI models based on chest imaging distinguished COVID-19 from other pneumonias with a pooled area under the curve (AUC) of 0.96 (95 % CI, 0.94–0.98), a pooled sensitivity of 0.92 (95 % CI, 0.88–0.94), and a pooled specificity of 0.91 (95 % CI, 0.87–0.93). The average RQS of the 13 studies using radiomics was 7.8, accounting for 22 % of the total score. The 19 studies using deep learning methods had an average CLAIM score of 20, slightly less than half (48.24 %) of the ideal score of 42.00.
Conclusions: AI models based on chest imaging performed well in distinguishing COVID-19 from other pneumonias. However, they have not yet been implemented as clinical decision-making tools. Future researchers should pay more attention to the quality of research methodology and further improve the generalizability of the predictive models they develop.
© 2022 The Authors.


Keywords: 2D, two-dimensional; 3D, three-dimensional; AI, artificial intelligence; AUC, area under the curve; Artificial Intelligence; CNN, Convolutional neural network; COVID-19; COVID-19, Coronavirus disease 2019; CRP, C-reactive protein; CT, Computed tomography; CXR, Chest X-Ray; Diagnostic Imaging; GGO, ground-glass opacities; KNN, K-nearest neighbor; LASSO, least absolute shrinkage and selection operator; MERS-CoV, Middle East respiratory syndrome coronavirus; ML, machine learning; Machine learning; NLR, negative likelihood ratio; PLR, positive likelihood ratio; Pneumonia; ROI, regions of interest; RT-PCR, Reverse transcriptase polymerase chain reaction; SARS, severe acute respiratory syndrome; SARS-CoV-2, severe acute respiratory syndrome coronavirus 2; SROC, summary receiver operating characteristic; SVM, Support vector machine

Year:  2022        PMID: 35996746      PMCID: PMC9385733          DOI: 10.1016/j.ejro.2022.100438

Source DB:  PubMed          Journal:  Eur J Radiol Open        ISSN: 2352-0477


Introduction

Since the beginning of 2020, coronavirus disease 2019 (COVID-19) has spread widely around the world. As of July 15, 2022, there had been more than 557,917,904 confirmed cases of COVID-19 and 6,358,899 deaths worldwide [1]. The estimated basic reproduction number (R0), that is, the average number of people infected by one infected individual in a completely non-immune population, is about 3.77 [2], indicating that the disease is highly contagious. It is therefore crucial to identify infected individuals as early as possible so that they can be quarantined and treated. The diagnosis of COVID-19 relies on clinical symptoms, epidemiological history, chest imaging, and laboratory tests [3], [4]. The most common clinical symptoms are fever, cough, dyspnea, malaise, fatigue, and phlegm/discharge, among others [5]. However, these symptoms are nonspecific, and non-COVID-19 pneumonia can produce similar symptoms [6]. Reverse transcriptase polymerase chain reaction (RT-PCR) is the gold standard for diagnosing COVID-19; however, RT-PCR has been reported to be insufficiently sensitive for the early detection of suspected cases, and in many cases the test must be repeated several times to confirm the result [7], [8], [9]. Chest imaging is another major diagnostic tool for COVID-19. On chest CT, COVID-19 is characterized by ground-glass opacities (GGO), including crazy-paving, and consolidation [10], [11], [12]. Although typical CT findings may be useful for early screening of suspected cases, the images of various viral pneumonias are highly similar and overlap with the imaging features of other lung infections [13]. For example, GGOs are common in other atypical and viral pneumonias such as influenza, severe acute respiratory syndrome (SARS), and Middle East respiratory syndrome (MERS) [14], making it difficult for radiologists to diagnose COVID-19.
A meta-analysis showed that chest CT can be used to rule out COVID-19 pneumonia but cannot distinguish COVID-19 from other lung infections: because both types of pneumonia can appear on chest CT as exudative lesions and GGOs, CT cannot differentiate SARS-CoV-2 infection from other respiratory diseases [15]. Radiomics is an emerging field that extracts high-throughput features from biomedical images and converts them into mineable data for quantitative analysis. Its underlying assumption is that changes and heterogeneity of lesions at the microscopic scale (for example, at the cellular or molecular level) are reflected in the images [16], which offers hope for distinguishing COVID-19 from other pneumonias. In the past 3 years, there have been many studies on the diagnosis of COVID-19 based on radiomics methods. However, no study has systematically summarized current research on artificial intelligence (AI) models that distinguish COVID-19 from other pneumonias on images, and the overall efficacy of these predictive models is still unknown. Additionally, because radiomics research is a complex, multi-step process, it is crucial to evaluate the quality of its methods before clinical application to ensure that the models are dependable and repeatable. Our systematic review aimed to (1) provide an overview of radiomics studies distinguishing COVID-19 from other pneumonias and evaluate the efficacy of the prediction models; (2) assess the methodological quality and risk of bias of radiomics workflows; and (3) determine which algorithms are most commonly used to distinguish COVID-19 from other pneumonias.
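Radiomic features range from simple intensity statistics to texture descriptors. As a minimal sketch of the simplest family, the function below computes a few first-order (histogram) features from the voxel intensities inside a segmented ROI. The function name, the fixed 32-bin discretization, and the example intensities are illustrative assumptions; dedicated toolkits such as PyRadiomics, the software most often used by the included studies, compute many more features with their own discretization settings.

```python
import math
from collections import Counter


def first_order_features(roi_values, n_bins=32):
    """Hypothetical helper: a few first-order radiomic features of ROI intensities."""
    n = len(roi_values)
    mean = sum(roi_values) / n
    var = sum((v - mean) ** 2 for v in roi_values) / n  # population variance
    std = math.sqrt(var)
    # Standardized third and fourth moments (non-excess kurtosis); 0 for a constant ROI.
    skewness = sum((v - mean) ** 3 for v in roi_values) / (n * std ** 3) if std > 0 else 0.0
    kurtosis = sum((v - mean) ** 4 for v in roi_values) / (n * var ** 2) if var > 0 else 0.0
    # Shannon entropy (bits) of the intensity histogram after fixed-count binning.
    lo, hi = min(roi_values), max(roi_values)
    width = (hi - lo) / n_bins or 1.0  # avoid zero width when all values are equal
    counts = Counter(min(int((v - lo) / width), n_bins - 1) for v in roi_values)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return {"mean": mean, "std": std, "skewness": skewness,
            "kurtosis": kurtosis, "entropy": entropy}


# Illustrative CT attenuation values (HU) from a hypothetical ground-glass lesion.
lesion = [-690, -655, -610, -598, -640, -702, -575, -620]
print(first_order_features(lesion))
```

In a real radiomics pipeline, features like these (plus shape and texture features) form the high-dimensional input that the feature-selection step then reduces.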

Materials and methods

We followed the Standards for the Reporting of Diagnostic Accuracy Studies (STARD) [17] and the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) [18] guidelines. The review was registered under number CRD42021272433.

Search strategy

We searched PubMed, Web of Science, Embase, and the Cochrane Library for studies conducted before November 30, 2021, and searched medRxiv and bioRxiv for preprints, combining subject headings with free-text terms. The main subject headings were "COVID-19", "Artificial Intelligence", and "Diagnostic Imaging". We aimed to identify all relevant studies regardless of language or publication status. We also screened the search results to identify relevant systematic reviews. Meeting abstracts, letters, short communications, and opinion articles were excluded. Details of the search are provided in Table S1.

Eligibility criteria

Two doctors independently screened the articles retrieved electronically. Articles that met all of the following criteria were included: (1) the index test was performed on chest CT, chest X-ray, or lung ultrasound; (2) the index test was interpreted by algorithms rather than by humans (studies that also involved human interpretation were included if they provided data on the diagnostic accuracy of the algorithmic interpretation); and (3) the study provided information distinguishing COVID-19 from other pneumonias (community-acquired pneumonia, bacterial pneumonia, viral pneumonia, influenza, interstitial pneumonia, etc.). The exclusion criteria were as follows: (1) the study population included normal individuals or patients with lung cancer, lung nodules, or other non-pneumonic conditions; (2) only a training-group model was reported, with no validation group or diagnostic accuracy data; (3) fewer than 10 participants in the validation group received both the index test and the reference standard; and (4) the exact numbers of COVID-19 and other pneumonia cases were not provided and diagnostic accuracy was calculated on the basis of the number of CT image slices.

Data extraction

We extracted the following items: date of the study; number of participants and their demographic information; type of common pneumonia; type of images used in the model; basis for selecting the region of interest; diagnostic performance of the model in the training group; diagnostic accuracy data of the validation model; whether external validation was performed; detailed information on the AI algorithm; technical parameters of the index test; and reference standard results and details. Two reviewers independently assessed and extracted relevant information from each included study. For each study, we extracted 2 × 2 data (true positive (TP), true negative (TN), false positive (FP), and false negative (FN)) for the validation group according to the following rules. If a study reported accuracy data for more than one model, we took the 2 × 2 contingency table for the model with the largest Youden index. If a study reported accuracy data for both AI and one or more radiologists, we extracted only the table corresponding to the AI. If a study reported accuracy data for both a combined model (clinical information plus radiomics signature) and a radiomics-only model, we extracted only the table for the radiomics model. If a study reported both internal and external validation, we extracted only the table for external validation; if a study reported training, validation, and test groups, we extracted only the table for the test group. If a study reported more than one external validation, we extracted the table for the validation group with the largest number of participants.
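The rule for studies that report several models can be sketched in a few lines. The example below computes per-study accuracy measures from a 2 × 2 table and keeps the model with the largest Youden index (J = sensitivity + specificity − 1). The `Table2x2` type, the model names, and the counts are hypothetical, and the use of Wilson score intervals for the 95 % CIs is an assumption, because the review does not state which interval method was used.

```python
import math
from dataclasses import dataclass

Z95 = 1.959963984540054  # two-sided 95 % standard normal quantile


@dataclass
class Table2x2:
    """Hypothetical container for one model's validation 2 x 2 table."""
    tp: int
    fp: int
    fn: int
    tn: int


def wilson_ci(k, n, z=Z95):
    """Wilson score interval for the binomial proportion k/n (assumed CI method)."""
    p = k / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half, centre + half


def accuracy(t):
    """Sensitivity/specificity (with 95 % CIs), predictive values, LRs, and Youden J."""
    sens = t.tp / (t.tp + t.fn)
    spec = t.tn / (t.tn + t.fp)
    return {
        "sensitivity": (sens, wilson_ci(t.tp, t.tp + t.fn)),
        "specificity": (spec, wilson_ci(t.tn, t.tn + t.fp)),
        "ppv": t.tp / (t.tp + t.fp),
        "npv": t.tn / (t.tn + t.fn),
        "plr": sens / (1 - spec) if spec < 1 else math.inf,
        "nlr": (1 - sens) / spec,
        "youden": sens + spec - 1,
    }


def select_model(tables):
    """Return the name of the model whose table has the largest Youden index."""
    return max(tables, key=lambda name: accuracy(tables[name])["youden"])


# Two hypothetical models reported by the same study on one validation set.
models = {
    "model_a": Table2x2(tp=40, fp=5, fn=10, tn=45),  # sens 0.80, spec 0.90, J 0.70
    "model_b": Table2x2(tp=46, fp=8, fn=4, tn=42),   # sens 0.92, spec 0.84, J 0.76
}
best = select_model(models)
print(best, accuracy(models[best])["sensitivity"])
```

The Youden index weights sensitivity and specificity equally, so here the model with higher sensitivity but lower specificity is kept.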

Quality assessment

The Radiomics Quality Score (RQS) [19] and the Checklist for Artificial Intelligence in Medical Imaging (CLAIM) [20] were used to assess the methodological quality of the included studies, and the Quality Assessment of Diagnostic Accuracy Studies 2 (QUADAS-2) tool [21] was used to assess study-level risk of bias. Studies based on machine learning (ML) (radiomics) methods were evaluated with the RQS (Table S2), and studies relying on deep learning (DL) methods were evaluated with the CLAIM checklist (Table S3). QUADAS-2 consists of four domains: patient selection, index test, reference standard, and flow and timing (Table S4). Two graduate students independently assessed study quality and discussed disagreements with an evidence-based medicine teacher to reach consensus.

Statistical analysis

We constructed a 2 × 2 table for each study from data extracted directly from the article and calculated the diagnostic accuracy measures [22] (sensitivity, specificity, positive predictive value, negative predictive value, positive likelihood ratio (PLR), and negative likelihood ratio (NLR)) with 95 % CIs for each study. We analyzed the data at the participant level rather than at the image or lesion level, because participant-level diagnosis is what is relevant to the treatment of pneumonia and is how most studies report their data. We performed meta-analyses using a bivariate random-effects model, which accounts for any correlation between sensitivity and specificity [23]. We did not perform a meta-analysis for any comparison with fewer than four studies, because the number of studies was too small for a reliable assessment. We plotted sensitivity and specificity with 95 % confidence intervals (CIs) in forest plots and performed the meta-analyses with the midas command in Stata 14. We explored heterogeneity between studies by visually examining the forest plots of sensitivity and specificity and the summary receiver operating characteristic (SROC) plot. Where sufficient data and information were available, we planned subgroup analyses to explore heterogeneity and to assess the impact of various factors on the models' diagnostic performance. The potential sources of heterogeneity we considered were imaging modality (CT vs. CXR), modeling method (radiomics vs. DL models), sample size, region of interest (infection regions vs. others), and segmentation method (2D vs. 3D). Because more than ten studies were included in this systematic review, we assessed publication bias, first by visually inspecting funnel plot asymmetry, plotting a measure of effect size against a measure of study precision.
We then conducted a formal evaluation using Deeks' test, with the diagnostic odds ratio (DOR) as the measure of test accuracy [24].
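Deeks' test regresses the log diagnostic odds ratio, ln(DOR), on 1/√ESS, weighting each study by its effective sample size ESS = 4·n1·n2/(n1 + n2), where n1 and n2 are the numbers of diseased and non-diseased participants; a slope significantly different from zero suggests funnel-plot asymmetry. The standard-library sketch below returns the slope, its standard error, and the t statistic. The function name and the 0.5 continuity correction for zero cells are assumptions, and this is not the Stata implementation used in the review; a two-sided P value could be obtained, for example, as `2 * scipy.stats.t.sf(abs(t), df)`.

```python
import math


def deeks_test(tables):
    """Deeks' funnel-plot asymmetry test (sketch).

    tables: one (tp, fp, fn, tn) tuple per study.
    Returns (slope, standard error, t statistic, degrees of freedom).
    """
    if len(tables) < 3:
        raise ValueError("need at least three studies")
    xs, ys, ws = [], [], []
    for tp, fp, fn, tn in tables:
        if 0 in (tp, fp, fn, tn):  # add 0.5 to every cell when any cell is zero
            tp, fp, fn, tn = (c + 0.5 for c in (tp, fp, fn, tn))
        n_dis, n_non = tp + fn, fp + tn
        ess = 4 * n_dis * n_non / (n_dis + n_non)  # effective sample size
        xs.append(1 / math.sqrt(ess))
        ys.append(math.log(tp * tn / (fp * fn)))  # ln(DOR)
        ws.append(ess)
    # Weighted least squares of ln(DOR) on 1/sqrt(ESS), weights = ESS.
    sw = sum(ws)
    x_bar = sum(w * x for w, x in zip(ws, xs)) / sw
    y_bar = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    rss = sum(w * (y - intercept - slope * x) ** 2 for w, x, y in zip(ws, xs, ys))
    df = len(xs) - 2
    se = math.sqrt(rss / df / sxx)
    return slope, se, slope / se, df


studies = [(45, 5, 5, 45), (20, 4, 3, 18), (90, 12, 10, 80), (8, 2, 1, 9)]  # hypothetical
slope, se, t_stat, df = deeks_test(studies)
print(f"slope={slope:.2f}, t={t_stat:.2f}, df={df}")
```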

Results

Literature search

As of November 30, 2021, we had retrieved a total of 2001 articles, of which 1509 remained after duplicates were removed. Two reviewers independently screened the titles and abstracts and excluded 862 articles that did not match the research topic. After assessing the full texts of the remaining 647 studies of AI-assisted imaging for the diagnosis, classification, and detection of COVID-19, we excluded 603 articles: 165 that did not specify the type of non-COVID-19 pneumonia in the participants; 279 whose participants included patients with COVID-19, patients with common pneumonia, and healthy individuals; 8 whose participants included patients with COVID-19, pneumonia, and other non-inflammatory lung diseases; 53 whose participants included patients with COVID-19 or pneumonia, healthy people, and patients with other lung diseases; 16 whose participants had COVID-19 or other lung diseases; and 82 whose participants were patients with COVID-19 and healthy individuals. Of the remaining 44 articles, we ultimately included 32: 8 were excluded because they lacked sufficient data to construct a 2 × 2 table, 2 because they lacked a validation group, and 2 because the analysis was conducted at the image level. The selection process is shown in Fig. 1.
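The screening counts above can be checked for internal consistency with simple arithmetic; the sketch also derives the number of duplicates removed (2001 − 1509), which the text does not state explicitly. Variable names and category labels are illustrative.

```python
retrieved = 2001
after_deduplication = 1509
excluded_on_title_abstract = 862

# Full-text exclusions by reason (counts as reported in the text).
full_text_exclusions = {
    "non-COVID-19 pneumonia type not specified": 165,
    "COVID-19, common pneumonia, and healthy individuals": 279,
    "COVID-19, pneumonia, and other non-inflammatory lung diseases": 8,
    "COVID-19, pneumonia, healthy people, and other lung diseases": 53,
    "COVID-19 and other lung diseases": 16,
    "COVID-19 and healthy individuals": 82,
}
# Exclusions among the final 44 full-text articles.
final_exclusions = {
    "insufficient data for a 2 x 2 table": 8,
    "no validation group": 2,
    "image-level analysis": 2,
}

duplicates = retrieved - after_deduplication
full_texts_assessed = after_deduplication - excluded_on_title_abstract
remaining = full_texts_assessed - sum(full_text_exclusions.values())
included = remaining - sum(final_exclusions.values())
print(duplicates, full_texts_assessed, remaining, included)
```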
Fig. 1

Flow diagram of the study selection process for this meta-analysis.

We included 32 studies with a total of 6737 participants in the validation or test groups, of whom 4076 (60.5 %) were diagnosed with COVID-19 and the rest with other pneumonias, including viral pneumonia (MERS-CoV, adenovirus, respiratory syncytial virus, influenza virus), bacterial pneumonia, fungal pneumonia, pneumonia caused by atypical pathogens (mycoplasma, chlamydia, and legionella), pulmonary mycosis, interstitial pneumonia, and community-acquired pneumonia. The number of participants per study ranged from 105 to 5372. The mean age of the participants ranged from 40.92 ± 20.41 years to 61.45 ± 15.04 years. The percentage of male participants ranged from 40.7 % to 62.0 % among those with COVID-19 and from 36.8 % to 64.0 % among those with other pneumonias. The characteristics of the included studies are summarized in Table 1 and Table 2.
Table 1

Summary of general study characteristics.

Study ID | Country of corresponding author | Study type | Index test | Data source | Eligibility criteria | Reference standard | Common type of pneumonia | Training: COVID-19 vs. other pneumonias | Training AUC | Type of validation | Validation/testing: COVID-19 vs. other pneumonias | SEN | SPC
Ardakani 2020 | Iran | R | CT | Single hospital | Yes | RT-PCR | Atypical, viral pneumonia | 86 vs. 69 | 0.999 | Random split | 22 vs. 17 | 1.00 | 0.99
Ardakani 2021 | Iran | R | CT | Single hospital | Yes | RT-PCR | Atypical and viral pneumonia | 244 vs. 244 | 0.988 | Random split | 62 vs. 62 | 0.935 | 0.903
Ali 2021 | Turkey | R | CXR | Single database | No | NA | Viral pneumonia | 146 vs. 901 | NR | 3-fold CV | 73 vs. 444 | 0.973 | NR
Han 2021 | Korea | R | CT | 2 datasets | No | NA | Viral pneumonia, bacterial pneumonia, fungal pneumonia | 164 vs. 320 | NR | External validation | 21 vs. 40 | 0.997 | 0.959
Di 2020 | China | R | CT | 5 hospitals | No | RT-PCR | CAP | 1933 vs. 1064 | NR | 10-fold CV | 215 vs. 118 | 0.932 | 0.840
Bai 2020 | China | R | CT | 10 hospitals | Yes | RT-PCR | Pneumonia of other origin | 377 vs. 453 | NR | Random split | 42 vs. 77 | 0.950 | 0.960
Panwar 2020 | Mexico | R | CXR | 3 datasets | No | NA | Pneumonia | 133 vs. 231 | NR | Random split | 29 vs. 85 | 0.966 | 0.953
Kang 2020 | China | R | CT | 3 hospitals | No | RT-PCR | CAP | 1046 vs. 719 | NR | Random split | 449 vs. 308 | 0.966 | 0.932
Liu 2021 | China | R | CT | 2 hospitals | Yes | RT-PCR | Viral infections, mycoplasma infections, chlamydia infections, fungus infections, co-infections | 66 vs. 313 | 1.000 | External validation | 20 vs. 20 | 0.850 | 0.900
Chen 2021 | China | R | CT | Single hospital | Yes | RT-PCR | Other types of pneumonia | 54 vs. 60 | 0.984 | Random split | 9 vs. 11 | 0.816 | 0.923
Song 2020 | China | R | CT | 2 hospitals | Yes | RT-PCR | CAP | 66 vs. 66 | 0.979 | External validation | 15 vs. 20 | 0.800 | 0.750
Sun 2020 | China | R | CT | 6 hospitals | No | RT-PCR | CAP | 1196 vs. 822 | NR | 5-fold CV | 299 vs. 205 | 0.931 | 0.899
Wang 2021 | China | R | CT | 3 hospitals | Yes | RT-PCR | Other types of viral pneumonia | 74 vs. 73 | 0.970 | External validation | 17 vs. 17 | 0.722 | 0.751
Zhou 2021 | China | R | CT | 12 hospitals | Yes | RT-PCR | Influenza pneumonia | 118 vs. 157 | NR | External validation | 57 vs. 50 | 0.860 | 0.772
Azouji 2021 | Switzerland | R | CXR | 7 datasets | No | NA | MERS, SARS | 338 vs. 222 | NR | 5-fold CV | 85 vs. 56 | 0.989 | NR
Cardobi 2021 | Italy | R | CT | Single hospital | No | Swab test | Interstitial pneumonias | 54 vs. 30 | 0.830 | Random split | 14 vs. 17 | 0.570 | 0.930
Yang 2021 | China | R | CT | Single hospital | No | RT-PCR | Other pneumonias | 70 vs. 70 | NR | 10-fold CV | 20 vs. 20 | 0.942 | 0.854
Chikontwe 2021 | Korea | R | CT | Single hospital | No | RT-PCR | Bacterial pneumonia | 38 vs. 49 | NR | Random split | 30 vs. 39 | 1.000 | 0.975
Zhu 2021 | China | R | CT | 6 hospitals | No | RT-PCR | CAP | 1345 vs. 924 | NR | 10-fold CV | 150 vs. 103 | 0.913 | 0.910
Xie 2020 | China | R | CT | 5 hospitals | Yes | RT-PCR | Bacterial infection, viral infection | 227 vs. 153 | NR | Prospective RWD | 243 vs. 73 | 0.810 | 0.820
Qi 2021 | China | R | CT | 3 hospitals + dataset | Yes | RT-PCR | CAP | 127 vs. 90 | NR | 10-fold CV | 14 vs. 10 | 0.972 | 0.940
Wang 2020 | China | R | CT | 7 hospitals | Yes | RT-PCR | Bacterial pneumonia, mycoplasma pneumonia, viral pneumonia, fungal pneumonia | 560 vs. 149 | 0.900 | External validation | 102 vs. 124 | 0.804 | 0.766
Yang 2020 | China | R | CT | 8 hospitals | No | RT-PCR | CAP | 960 vs. 628 | 0.976 | External validation | 1605 vs. 452 | 0.869 | 0.901
Wu 2020 | China | R | CT | 3 hospitals | No | RT-PCR | Other pneumonia | 294 vs. 101 | 0.767 | Random split | 37 vs. 13 | 0.811 | 0.615
Zhang 2021 | China | R | CT | 3 hospitals | No | RT-PCR | CAP, influenza, mycoplasma pneumonia | 72 vs. 127 | 0.987 | 5-fold CV | 31 vs. 62 | 0.879 | 0.887
Xin 2021 | China | R | CT | 2 hospitals | Yes | Swab tests | CAP | 34 vs. 48 | NR | 5-fold CV | 9 vs. 12 | 0.957 | 0.984
Guo 2020 | China | R | CT | 2 hospitals | No | RT-PCR | Seasonal flu, CAP | 8 vs. 42 | 0.970 | External validation | 11 vs. 44 | 0.889 | 0.935
Fang 2020 | China | R | CT | 2 hospitals | Yes | Nucleic acid detection | Viral pneumonia | 136 vs. 103 | 0.959 | Random split | 56 vs. 34 | 0.929 | 0.971
Xia 2021 | China | R | CXR | 2 hospitals | Yes | Nucleic acid | Influenza A/B pneumonia | 246 vs. 44 | NR | Random split | 266 vs. 62 | 0.869 | 0.742
Huang 2020 | China | R | CT | 15 hospitals | Yes | RT-PCR | Viral pneumonia | 62 vs. 64 | 0.849 | 5-fold CV | 27 vs. 28 | 0.778 | 0.786
Wu 2021 | China | R | CT | Single hospital | Yes | Nucleic acid | Other infectious pneumonia | 76 vs. 77 | NR | 5-fold CV | 19 vs. 19 | 0.809 | 0.842
Chen 2021 | China | R | CT | 2 hospitals | No | RT-PCR | Viral pneumonia | 81 vs. 81 | 0.807 | Random split | 27 vs. 19 | 0.733 | 0.822

Abbreviations: AUC: area under the curve; CAP: community-acquired pneumonia; CT: computed tomography; CV: cross-validation; CXR: chest X-ray; R: retrospective; RT-PCR: reverse transcriptase polymerase chain reaction; RWD: real-world dataset; SEN: sensitivity; SPC: specificity

Table 2

Summary of artificial intelligence-based prediction model characteristics described in included studies.

Study ID | ROI | Segmentation style | AI method | Labeling procedure | Pre-processing | Augmentations | Model structure | Loss function | Comparison between algorithms | AI vs. radiologist
Ardakani 2020 | Regions of infections | 2D | DL | By a radiologist with more than 15 years of experience in thoracic imaging | Manual ROI extraction by cropping, normalization, transfer learning | NA | Ten well-known CNNs | NA | Ten well-known CNNs | Yes
Ardakani 2021 | CT chest | 2D | ML | By two radiologists | Feature extraction | Random scaling, shearing, horizontal flip | Ensemble method | NA | DT, KNN, Naïve Bayes, SVM | Yes
Ali 2021 | Whole image | 2D | DL | NA | Normalization, transfer learning | Horizontal and vertical flip, zoom, shift | ResNet50, ResNet101, ResNet152 | NA | ResNet50, ResNet101, ResNet152 | No
Han 2021 | CT slices | 2D | DL | Using the labeled COVID-19 dataset | Both labeled and unlabeled data can be used | Random scaling, random translation, random shearing, horizontal flip | A semi-supervised deep neural network | Standard cross-entropy loss | Supervised learning | No
Di 2020 | Infected lesions | 2D | ML | NA | Extraction of both regional and radiomics features, segmentation | NA | UVHL | Cross-entropy | SVM, MLP, iHL, tHL | No
Bai 2020 | Lung regions | 2D | DL | Lesions (COVID-19 or pneumonia) manually labeled by 2 radiologists | Normalization, segmentation | Flips, scaling, rotations, random brightness and contrast manipulations, random noise, and blurring | DNN | NA | No | Yes
Panwar 2020 | Whole image | 2D | DL | NA | Filter, dimension reduction, deep transfer learning | Shear, rotation, zoom, shift | A DL model and Grad-CAM | Binary cross-entropy loss | No | No
Kang 2020 | Lesion region | 3D | ML | NA | Segmentation, feature extraction, normalization | NA | Structured latent multi-view representation learning | Cross-entropy loss | LR, SVM, GNB, KNN, NN | No
Liu 2021 | Each pneumonia lesion | 3D | ML | By three experienced radiologists | Feature extraction, filters | NA | LASSO regression | NA | No | Yes
Chen 2021 | Consolidation and ground-glass opacity lesions | 3D | ML | By fifteen radiologists | Feature extraction, wavelet filters, Laplacian of Gaussian filters, feature selection | NA | SVM | NA | No | No
Song 2020 | CT images | 2D | DL | NA | Semantic feature extraction | NA | BigBiGAN | NA | SVM, KNN | Yes
Sun 2020 | Infected lung regions | 3D | DL | NA | Feature extraction | NA | AFS-DF | NA | LR, SVM, RF, NN | No
Wang 2021 | Pneumonia lesions | 3D/2D | ML | By four radiologists | Manual segmentation, feature extraction | NA | Linear, LASSO, RF, KNN | NA | Linear, LASSO, RF, KNN | Yes
Zhou 2021 | Lesion regions | 2D | DL | Annotated by 2 radiologists | Segmentation | Random flipping, cropping | Trinary scheme (DL) | Binary cross-entropy loss | Plain scheme (DL) | Yes
Azouji 2021 | X-ray images | 2D | DL | NA | Resizing of X-ray images, contrast-limited adaptive histogram equalization, deep feature extraction, deep feature fusion | Rotation, translation | LMPL classifier | Hinge loss function | Naïve Bayes, KNN, SVM, DT, AdaBoostM2, TotalBoost, RF, SoftMax, VGG-Net | No
Cardobi 2021 | Lung area | 3D | ML | NA | Segmentation, feature extraction | NA | LASSO model | NA | No | No
Yang 2021 | Pneumonia lesion | 3D | ML | Manually delineated | Segmentation, feature extraction | Spatial resampling | SVM | NA | Sigmoid-SVM, Poly-SVM, Linear-SVM, RBF-SVM | No
Chikontwe 2021 | CT slices | 3D | DL | NA | Segmentation | Random transformations, flipping | DA-CMIL | NA | DeCoVNet, MIL, DeepAttentionMIL, JointMIL | No
Zhu 2021 | CT images | 3D | DL | NA | Segmentation, feature extraction | NA | GACDN | Binary cross-entropy | SVM, KNN, NN | No
Xie 2020 | CT slices | 3D | DL | NA | Segmentation, extraction of 2D local features and 3D global features | Random horizontal flip, random rotation, random scale, random translation, and random elastic transformation | DNN | NA | No | Yes
Qi 2021 | Lung field | 3D | DL | NA | Segmentation of the lung field, extraction of deep features, feature representation | Image rotation, reflection, and translation | DR-MIL | NA | MResNet-50-MIL, MmedicalNet, MResNet-50-MIL-max-pooling, MResNet-50-MIL-Noisy-AND-pooling, MResNet-50-Voting, MResNet-50-Montages | Yes
Wang 2020 | Lung area | 3D | DL | NA | Fully automatic DL model for segmentation, normalization, convolutional filter | NA | DL | NA | No | No
Yang 2020 | Infection regions | 3D | DL | NA | Class re-sampling strategies, attention mechanism | Scaling | Dual-sampling attention network | Binary cross-entropy loss | RN34 + US, Attention RN34 + US, Attention RN34 + SS, Attention RN34 + DS | No
Wu 2020 | CT slices | 3D | DL | NA | Segmentation | NA | Multi-view deep learning fusion model | NA | Single-view model | No
Zhang 2021 | Major lesions | 3D | DL | NA | Segmentation, feature extraction, feature selection | Scaling | DL-MLP | NA | DL-SVM, DL-LR, DL-XGBoost | Yes
Xin 2021 | Lungs, lobes, and detected opacities | 2D | DL | Confirmed by 3 experienced radiologists and human auditing | Segmentation, feature extraction | NA | LR, MLP, SVM, XGBoost | NA | LR, MLP, SVM, XGBoost | No
Guo 2020 | NR | NA | ML | By two radiologists | Segmentation, feature extraction | NA | RF | NA | No | No
Fang 2020 | Primary lesion | 3D/2D | ML | By two chest radiologists | Segmentation, feature extraction, feature reduction and selection | NA | LASSO regression | NA | No | No
Xia 2021 | Lung areas | 2D | DL | NA | Segmentation, feature extraction | Random rotation, scale, transmit | DNN | Categorical cross-entropy | No | Yes (pulmonary physicians)
Huang 2020 | Pneumonia lesion | 3D | ML | By two chest radiologists | Segmentation, feature extraction, filter | NA | Logistic model | NA | No | No
Wu 2021 | Maximal regions involving inflammatory lesions | 2D | ML | By two radiologists | Feature extraction, manual delineation | NA | RF | NA | No | No
Chen 2021 | Lesion region | 2D | ML | By two radiologists | Segmentation, feature extraction, feature dimensionality reduction | NA | WSVM | NA | RF, SVM, LASSO | Yes

Abbreviations: AFS-DF: adaptive feature selection guided deep forest; AI: artificial intelligence; BigBiGAN: bi-directional generative adversarial network; CT: computed tomography; CXR: chest X-ray; CNN: convolutional neural network; DA-CMIL: dual attention contrastive multiple instance learning; DT: decision tree; DNN: deep neural networks; DR-MIL: deep represented multiple instance learning; DL: deep learning; RF: random forests; GNB: Gaussian Naive Bayes; Grad-CAM: gradient-weighted class activation mapping; GACDN: generative adversarial feature completion and diagnosis network; IHL: inductive hypergraph learning; KNN: K-nearest neighbor; LR: logistic regression; LASSO: least absolute shrinkage and selection operator; LMPL: large margin piecewise linear; ML: machine learning; MLA: machine learning algorithms; MLP: multilayer perceptron; MERS: Middle East respiratory syndrome; NN: neural networks; ROI: region of interest; SVM: support vector machine; THL: transductive hypergraph learning; 2D: two-dimensional; 3D: three-dimensional; UVHL: uncertainty vertex-weighted hypergraph learning; WSVM: weighted support vector machine

Most studies (20/32) included participants from two or more hospitals, 7 studies included participants from a single hospital, 4 studies used image data from public databases, and one study included participants from both hospitals and a public database [25]. Most studies (28/32) used CT scans, and the remaining four used chest X-rays.
Most studies (28/32) used RT-PCR or another nucleic acid (swab) test as the reference standard for SARS-CoV-2 infection, and the reference standard of the remaining four studies was not reported [26], [27], [28], [29]. Sixteen studies performed automatic segmentation, 12 performed manual segmentation, and the remaining four input full-slice images. Fourteen studies performed two-dimensional (2D) segmentation, 15 performed three-dimensional (3D) segmentation, two performed both 2D and 3D segmentation [30], [31], and one did not describe the segmentation method [32]. Fifteen studies used the infected lesions as the region of interest (ROI), 10 used the entire image, 6 used the entire lung region, and one did not describe its ROI [32]. PyRadiomics (6/32) was the most frequently used software for extracting image features, followed by MATLAB (4/32), PyTorch (3/32), and Python (3/32). Thirteen studies used radiomics models and 19 used DL models. When developing radiomics models, feature selection and dimensionality reduction are essential to prevent overfitting, because the number of radiomic features typically exceeds the sample size [33]; least absolute shrinkage and selection operator (LASSO) regression was the most commonly used algorithm for this purpose. Twenty studies used two or more models, and 12 used a single model. The three most common models were the convolutional neural network (CNN), the support vector machine (SVM), and K-nearest neighbor (KNN).
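LASSO performs feature selection by shrinking the coefficients of uninformative features exactly to zero. The sketch below is a minimal cyclic coordinate-descent solver, assuming standardized features, a centred outcome, and the objective (1/2n)·‖y − Xβ‖² + λ‖β‖₁ (the same objective scikit-learn's `Lasso` minimizes). The helper names, the fixed iteration count, and the toy data are hypothetical; radiomics studies with a binary outcome often use a penalized logistic variant instead, which this squared-error sketch does not implement.

```python
import math


def soft_threshold(z, lam):
    """Proximal operator of the L1 penalty."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def standardize_columns(X):
    """Return column-major features scaled to mean 0 and population variance 1."""
    n = len(X)
    cols = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / n) or 1.0
        cols.append([(v - mean) / sd for v in col])
    return cols


def lasso_select(X, y, lam, n_iter=200):
    """Cyclic coordinate descent; returns (indices of non-zero features, coefficients)."""
    cols = standardize_columns(X)
    n = len(y)
    y_mean = sum(y) / n
    resid = [v - y_mean for v in y]  # residual of the all-zero model
    beta = [0.0] * len(cols)
    for _ in range(n_iter):
        for j, xj in enumerate(cols):
            # Correlation of feature j with the partial residual (feature j added back).
            rho = sum(x * r for x, r in zip(xj, resid)) / n + beta[j]
            new = soft_threshold(rho, lam)
            if new != beta[j]:
                delta = new - beta[j]
                resid = [r - delta * x for r, x in zip(resid, xj)]
                beta[j] = new
    return [j for j, b in enumerate(beta) if b != 0.0], beta


# Toy data: feature 0 drives the outcome, feature 1 is noise.
X = [[1, 0.3], [2, -0.1], [3, 0.4], [4, -0.2], [5, 0.1]]
y = [2, 4, 6, 8, 10]
selected, coefs = lasso_select(X, y, lam=0.1)
print(selected, coefs)
```

Increasing `lam` removes more features; in practice the penalty is usually tuned by cross-validation rather than fixed by hand.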
Twenty studies reported only the diagnostic performance of the AI model, 11 compared the AI model with radiologists, and one compared it with pulmonary physicians [34]. All 12 of these comparative studies reported that the AI model outperformed the radiologists or pulmonary physicians in distinguishing COVID-19 from other pneumonias.

Risk of bias assessment

The mean RQS score of the 13 included radiomics studies was 7.8, accounting for 22 % of the total score. The highest RQS score was 13 (out of a maximum of 36), achieved by only one study [35], and the lowest was 4 [32], [36]. Because no study addressed the six items "Phantom study", "Imaging at multiple time points", "Biological correlates", "Cut-off analyses", "Prospective study", and "Comparison to 'gold standard'", these items all scored zero. Other underperforming items, each with an average score below 15 %, were "Multivariable analysis with non-radiomics features", "Calibration statistics", "Potential clinical utility", "Cost-effectiveness analysis", and "Open science and data" (Fig. 2). Table S5 provides a detailed description of the RQS scores. The average CLAIM score of the 19 included studies using DL was 20, slightly less than half (48.24 %) of the ideal score of 42.00; the highest score was 29 [37] and the lowest was 14 [28] (Fig. 3, Table S6).
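The percentages above are the mean score divided by the maximum attainable score. A quick check, with hypothetical variable names: 7.8/36 gives about 21.7 % (reported as 22 %), and 48.24 % of 42 corresponds to an unrounded mean CLAIM score of about 20.26, which the text rounds to 20.

```python
RQS_MAX = 36      # maximum total RQS
CLAIM_IDEAL = 42  # ideal CLAIM score used in this review


def percent_of_max(score, maximum):
    """Express a mean quality score as a percentage of the maximum score."""
    return 100 * score / maximum


rqs_pct = percent_of_max(7.8, RQS_MAX)      # about 21.7 %
claim_mean = 48.24 / 100 * CLAIM_IDEAL      # unrounded mean implied by 48.24 %
print(f"RQS {rqs_pct:.1f} %, implied CLAIM mean {claim_mean:.2f}")
```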
Fig. 2

Methodological quality evaluated using the Radiomics Quality Score (RQS) tool. (A) Proportion of studies by RQS percentage score. (B) Average score of each RQS item (gray bars show the full points of each item, and red bars show the actual points).

Fig. 3

CLAIM items of the 19 included studies expressed as percentage of the ideal score according to the six key domains. CLAIM, Checklist for Artificial Intelligence in Medical Imaging.

Risk of bias and applicability concerns for the 32 diagnostic accuracy studies according to QUADAS-2 are shown in Fig. S1. Overall, the methodological quality of the 32 selected studies was poor, and most studies had an unclear or high risk of bias in each domain (Table S7). Regarding patient selection, 22 studies were considered at high or unclear risk of bias because it was unclear how participants were selected and/or the exclusion criteria were not described in detail. Regarding the index test, 30 studies were considered at high or unclear risk of bias because it was unclear whether a threshold was used or the threshold was not pre-specified. Regarding the reference standard, 5 studies were considered at high or unclear risk of bias because the reference standard was not described. Regarding flow and timing, 30 studies were considered at high or unclear risk of bias because the time interval between the index test and the reference standard was unclear and/or it was unclear whether all participants received the same reference standard.

Data analysis

A total of 32 studies were included in the meta-analysis. For the validation or test groups of all studies, the pooled sensitivity, specificity, PLR, NLR and AUC were 0.92 (95 % CI, 0.88–0.94), 0.91 (95 % CI, 0.87–0.93), 9.7 (95 % CI, 6.8–13.9), 0.09 (95 % CI, 0.06–0.13) and 0.96 (95 % CI, 0.94–0.98), respectively. We observed substantial heterogeneity between studies in both sensitivity (I² = 84.7 %) and specificity (I² = 81.1 %). The forest plots are shown in Fig. 4, and the SROC curve in Fig. 5 shows a clear difference between the 95 % confidence region and the 95 % prediction region, indicating considerable heterogeneity across studies.
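To make the relationship between these summary measures concrete, the sketch below computes likelihood ratios from a sensitivity/specificity pair and shows how a random-effects pooled proportion and the I² statistic can be obtained. It is a simplified illustration under stated assumptions: it pools one proportion (for example, sensitivity) at a time with the univariate DerSimonian-Laird method on the logit scale, which is a swapped-in technique and not the authors' pooling model, and the study counts in the example are hypothetical.

```python
import math


def likelihood_ratios(sens: float, spec: float) -> tuple[float, float]:
    """Positive and negative likelihood ratios implied by sensitivity and specificity."""
    plr = sens / (1.0 - spec)  # factor by which a positive result raises the odds of disease
    nlr = (1.0 - sens) / spec  # factor by which a negative result lowers the odds of disease
    return plr, nlr


def pool_logit_dl(events: list[int], totals: list[int]) -> tuple[float, float]:
    """DerSimonian-Laird random-effects pooling of proportions on the logit scale.

    events[i] / totals[i] is one study's proportion (e.g. true positives / diseased).
    Returns (pooled proportion, I^2 in percent). A 0.5 continuity correction is
    added to every cell so that proportions of 0 or 1 do not break the logit.
    """
    y, v = [], []
    for e, n in zip(events, totals):
        y.append(math.log((e + 0.5) / (n - e + 0.5)))    # logit of corrected proportion
        v.append(1.0 / (e + 0.5) + 1.0 / (n - e + 0.5))  # approximate within-study variance
    w = [1.0 / vi for vi in v]                           # inverse-variance (fixed-effect) weights
    y_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, y))  # Cochran's Q
    df = len(y) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0      # between-study variance
    w_re = [1.0 / (vi + tau2) for vi in v]               # random-effects weights
    y_re = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    i2 = 100.0 * max(0.0, (q - df) / q) if q > 0 else 0.0
    return 1.0 / (1.0 + math.exp(-y_re)), i2             # back-transform to a proportion


# Hypothetical test-set counts: true positives and COVID-19 cases per study
tp = [45, 88, 30, 120]
diseased = [50, 100, 40, 125]
pooled_sens, i2_sens = pool_logit_dl(tp, diseased)
plr, nlr = likelihood_ratios(0.92, 0.91)
print(f"pooled sensitivity {pooled_sens:.2f}, I^2 {i2_sens:.1f} %")
print(f"PLR {plr:.1f}, NLR {nlr:.2f}")
```

Applied to the rounded pooled estimates, `likelihood_ratios(0.92, 0.91)` gives a PLR of about 10.2 and an NLR of about 0.09; the difference from the reported PLR of 9.7 presumably reflects rounding of the inputs and the authors' pooling model. I² expresses the share of variation across studies attributable to heterogeneity rather than chance, so values above 75 %, as observed here, are conventionally read as considerable heterogeneity.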
Fig. 4

Coupled forest plots of the pooled sensitivity and specificity of chest imaging-based AI models for distinguishing COVID-19 from other pneumonias. The numbers are pooled estimates with 95 % CIs in parentheses; horizontal lines indicate 95 % CIs.

Fig. 5

SROC curve of the diagnostic performance of artificial intelligence models for distinguishing COVID-19 from other pneumonias on chest imaging. The clear difference between the 95 % confidence and 95 % prediction regions indicates considerable heterogeneity across studies.


Subgroup analysis

We performed subgroup analyses across five factors comprising ten subgroups: imaging modality (CT, CXR), modeling method (radiomics, deep learning), sample size (<100 or >100), region of interest (infection regions, others) and segmentation method (2D, 3D). Every subgroup showed moderate to high diagnostic value. The results are shown in Table 3.
Table 3

The results of subgroup analysis.

Subgroup | No. of studies | Sensitivity (95 % CI) | I² (%) | Specificity (95 % CI) | I² (%) | PLR (95 % CI) | I² (%) | NLR (95 % CI) | I² (%) | AUC
Imaging modality
CXR | 4 | 0.91 (0.88, 0.94) | 85.6 | 0.96 (0.95, 0.98) | 95.3 | 26.04 (3.73, 181.94) | 93.3 | 0.04 (0.00, 0.41) | 92.6 | 0.9914
CT | 28 | 0.89 (0.88, 0.90) | 78.9 | 0.89 (0.87, 0.90) | 62.1 | 6.92 (5.35, 8.96) | 69.5 | 0.14 (0.11, 0.19) | 80.0 | 0.9427
Modeling method
Radiomic algorithm | 13 | 0.92 (0.90, 0.94) | 78.4 | 0.90 (0.87, 0.92) | 36.8 | 7.16 (4.96, 10.33) | 53.0 | 0.15 (0.08, 0.28) | 85.6 | 0.9446
Deep learning | 19 | 0.88 (0.87, 0.89) | 78.0 | 0.91 (0.90, 0.92) | 88.5 | 8.32 (5.69, 12.18) | 82.5 | 0.12 (0.09, 0.17) | 76.9 | 0.9702
Sample size
<100 | 18 | 0.87 (0.83, 0.90) | 65.4 | 0.89 (0.86, 0.92) | 47.8 | 6.50 (4.42, 9.58) | 49.3 | 0.18 (0.12, 0.28) | 59.0 | 0.9371
>100 | 14 | 0.89 (0.88, 0.90) | 87.0 | 0.91 (0.90, 0.92) | 90.8 | 8.81 (6.02, 12.89) | 86.2 | 0.10 (0.07, 0.14) | 88.6 | 0.9725
ROI
Infection regions | 15 | 0.89 (0.88, 0.90) | 81.0 | 0.89 (0.88, 0.91) | 48.8 | 6.89 (5.20, 9.12) | 58.0 | 0.14 (0.09, 0.20) | 81.3 | 0.9409
Others | 16 | 0.88 (0.86, 0.90) | 80.4 | 0.92 (0.90, 0.94) | 89.5 | 9.33 (5.64, 15.45) | 83.3 | 0.11 (0.07, 0.19) | 83.2 | 0.9691
Segmentation
2D | 14 | 0.91 (0.89, 0.93) | 71.6 | 0.93 (0.91, 0.95) | 88.9 | 9.71 (5.78, 16.33) | 79.3 | 0.10 (0.06, 0.17) | 77.3 | 0.9740
3D | 15 | 0.88 (0.87, 0.90) | 85.1 | 0.89 (0.87, 0.90) | 64.8 | 6.77 (4.79, 9.57) | 76.6 | 0.15 (0.10, 0.22) | 85.9 | 0.9386

Abbreviations: AUC: area under the curve; CI: confidence interval; CT: computed tomography; CXR: chest X-ray; NLR: negative likelihood ratio; PLR: positive likelihood ratio; ROI: region of interest; 2D: two-dimensional; 3D: three-dimensional.


Publication bias

We assessed publication bias across the 32 included studies. First, visual inspection showed that the funnel plot (Fig. 6) was symmetric, with studies evenly distributed along both axes. Second, Deeks' test showed that the slope coefficient was not statistically significant (P = 0.89), indicating no significant asymmetry and suggesting a low likelihood of publication bias.
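Deeks' test regresses the log diagnostic odds ratio (DOR) of each study on the inverse square root of its effective sample size (ESS), weighting by ESS; a slope that differs significantly from zero (P < 0.10 in this review) suggests funnel-plot asymmetry. The pure-Python sketch below assumes per-study 2 x 2 counts (TP, FP, FN, TN) and a 0.5 continuity correction; the function name `deeks_test` and the counts in the example are hypothetical, and the function returns the t statistic rather than a P value so that it needs no statistics library.

```python
import math


def deeks_test(tp, fp, fn, tn):
    """Deeks' funnel-plot asymmetry test for diagnostic test accuracy meta-analysis.

    Fits ln(DOR) = a + b / sqrt(ESS) by weighted least squares with weights ESS
    and returns (slope b, standard error of b, t statistic, degrees of freedom).
    """
    xs, ys, ws = [], [], []
    for a, b, c, d in zip(tp, fp, fn, tn):
        n_dis, n_non = a + c, b + d                  # diseased and non-diseased counts
        ess = 4.0 * n_dis * n_non / (n_dis + n_non)  # effective sample size
        ln_dor = math.log((a + 0.5) * (d + 0.5) / ((b + 0.5) * (c + 0.5)))
        xs.append(1.0 / math.sqrt(ess))
        ys.append(ln_dor)
        ws.append(ess)
    sw = sum(ws)
    x_bar = sum(w * x for w, x in zip(ws, xs)) / sw  # weighted means
    y_bar = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    df = len(xs) - 2
    rss = sum(w * (y - intercept - slope * x) ** 2 for w, x, y in zip(ws, xs, ys))
    se = math.sqrt(rss / df / sxx)                   # residual-scaled standard error
    return slope, se, slope / se, df


# Hypothetical 2 x 2 counts for five studies
slope, se, t_stat, df = deeks_test(
    tp=[40, 18, 90, 25, 60],
    fp=[5, 4, 8, 6, 7],
    fn=[4, 3, 10, 5, 6],
    tn=[45, 20, 95, 30, 70],
)
print(f"slope {slope:.2f} (SE {se:.2f}), t = {t_stat:.2f} on {df} df")
```

The two-sided P value can then be read from a t distribution with `df` degrees of freedom, for example as `2 * scipy.stats.t.sf(abs(t_stat), df)` if SciPy is available.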
Fig. 6

Effective sample size (ESS) funnel plots and the associated regression test of asymmetry, as reported by Deeks et al. A p value < 0.10 was considered evidence of asymmetry and potential publication bias.


Discussion

In this systematic review, we aimed to determine the diagnostic accuracy of chest imaging-based AI models in distinguishing COVID-19 from other pneumonias, and we used QUADAS-2, the RQS tool and the CLAIM checklist to assess the quality of the included studies. Furthermore, our meta-analysis is the first to quantitatively combine and interpret data from independent studies on this question, which may provide key clues for clinical application and further research. Although the pooled sensitivity, specificity and AUC were favorable (0.92 [95 % CI, 0.88–0.94], 0.91 [95 % CI, 0.87–0.93] and 0.96 [95 % CI, 0.94–0.98], respectively), the immature stage and relatively poor methodological quality of these imaging studies mean that they do not support clear conclusions about clinical implementation and widespread use. Combining the complete RQS tool, the CLAIM checklist and the QUADAS-2 assessments revealed several common methodological limitations, some of which apply to both DL and ML studies. A substantial proportion of studies (13/32) did not have images segmented by multiple radiologists; because inter-observer variability is unavoidable even among experienced radiologists [38], this limits the generalizability of the resulting predictive models. Some studies applied automatic segmentation, which removes the variability introduced by human readers. However, models built on different segmentations can perform differently even when trained on the same dataset with the same AI techniques, adding another source of heterogeneity to the field. More than half of the studies did not describe their algorithms and software in sufficient detail to allow replication, and only 6 % of the studies published the code for their models, which would give readers access to the full protocol.
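The inter-observer variability in manual segmentation discussed above is commonly quantified with the Dice similarity coefficient, 2|A ∩ B| / (|A| + |B|), which is 1 for identical masks and 0 for masks that do not overlap. The sketch below is illustrative only: it assumes binary NumPy masks of equal shape, the two readers' masks in the example are hypothetical, and the included studies did not necessarily use this metric.

```python
import numpy as np


def dice(mask_a: np.ndarray, mask_b: np.ndarray) -> float:
    """Dice similarity coefficient between two binary segmentation masks."""
    a = mask_a.astype(bool)
    b = mask_b.astype(bool)
    total = a.sum() + b.sum()
    if total == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return float(2.0 * np.logical_and(a, b).sum() / total)


# Hypothetical lesion masks drawn by two readers on the same 8 x 8 slice
reader_1 = np.zeros((8, 8), dtype=bool)
reader_1[2:6, 2:6] = True  # 16 pixels
reader_2 = np.zeros((8, 8), dtype=bool)
reader_2[3:7, 2:6] = True  # same size, shifted down by one row
print(f"Dice = {dice(reader_1, reader_2):.2f}")
```

Here the readers share 12 of their 16 pixels each, so Dice is 24 / 32 = 0.75. Lower values indicate that the two readers' regions of interest, and therefore any radiomic features extracted from them, may differ.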
Open data and code allow independent researchers to validate results using the same methodology on the same or different datasets, making research findings more robust. However, only two studies published data, and only in small amounts [37], [39]. Practical issues such as the reproducibility and generalizability of AI models should therefore be resolved before these models are translated into routine clinical applications.

The typical imaging manifestations of SARS-CoV-2 infection are ground-glass opacities (GGO) and foci of consolidation. GGO is a hazy increase in attenuation that occurs in various interstitial and alveolar processes while sparing the bronchial [40] and vascular margins, whereas consolidation is an area of opacity that obscures the margins of vessels and airway walls [41]. However, other types of pneumonia, especially other viral pneumonias, may share similar CT features with SARS-CoV-2 infection [42], [43], [44], which can prevent radiologists from correctly distinguishing SARS-CoV-2 infection from other pneumonias. Eleven studies in our systematic review also assessed the diagnostic performance of radiologists, and one study assessed that of pulmonologists [34]; each compared it with the diagnostic accuracy of the AI models. All of these studies found that the AI models outperformed the radiologists/pulmonologists, suggesting that AI models have great potential for diagnosing SARS-CoV-2 infection and other pneumonias. We performed subgroup analyses using five key factors. In the imaging modality subgroup, chest X-ray-based AI models performed better than CT-based models, but only four chest X-ray studies (including 453 COVID-19 patients among 1100 subjects) were included, and all of them used deep learning models.
Therefore, the pooled result suggesting that chest X-ray is superior to chest CT is not entirely convincing. Another subgroup analysis showed that studies using DL models performed slightly better than those using radiomics-based ML (AUC 0.9702 vs 0.9446; Table 3). The main disadvantage of radiomics-based ML is its reliance on hand-crafted feature extractors, which require considerable manpower and effort [45]. Furthermore, radiomic signatures are hand-designed and rely on domain-specific expertise [46]. The advantage of DL is that it does not require manual feature extraction during learning, which avoids the shortcomings of the hand-designed features used in radiomics analysis [47]. Because classifier training, feature selection and classification take place jointly within a DL model, researchers need only input images rather than clinical data or radiomics features. The most commonly used DL model in research is the convolutional neural network (CNN), which is inspired by the natural visual cognition mechanism of biological systems and is built from convolutional layers, rectified linear unit (ReLU) layers, pooling layers and fully connected layers [48], [49]. For example, VGG and ResNet are built by adapting and combining simple CNN components [50]. In addition, studies with larger sample sizes had better diagnostic accuracy than studies with smaller sample sizes (AUC 0.9725 vs 0.9371; Table 3), suggesting that increasing the sample size in future studies may improve the ability to distinguish SARS-CoV-2 infection from other pneumonias.

This review has several limitations. First, many articles published in authoritative journals that used AI models to diagnose COVID-19 were not included because their models were not validated. Unvalidated models have limited value, and validation is an integral part of a complete radiomics analysis [19]; models must be validated internally or externally.
Second, heterogeneity between studies was evident. We performed subgroup analyses to explore its sources, but this was of limited help; heterogeneity is a recognized feature of reviews of diagnostic test accuracy [23], and not all of its sources can be identified. To date, no systematic review or meta-analysis has included all types of imaging techniques for distinguishing COVID-19 from other pneumonias. Kao et al. [51] evaluated CT-based radiomics signature models for distinguishing COVID-19 from other viral pneumonias and reached conclusions similar to ours, including high heterogeneity between studies. However, their search ended on February 26, 2021, so only 6 studies were included, all of them conducted in China. Several other systematic reviews have addressed AI-based diagnosis of COVID-19 [52], [53], [54], [55], but their participants included a range of non-pneumonia populations, such as patients with lung cancer or lung nodules and healthy individuals. Chest images from these populations have their own typical features, which differ markedly from those of COVID-19 and are easily distinguished by radiologists, so we did not include such studies.

In conclusion, artificial intelligence approaches show potential for diagnosing COVID-19 and other pneumonias. However, the immature stage and unsatisfactory quality of the research mean that the proposed models cannot currently be used for clinical implementation. Before AI models can be successfully introduced into the clinical management of COVID-19, further large-sample, multicenter research, together with open science and open data, is needed to improve the generalizability of the models. Furthermore, technical hurdles must be addressed before image-mining tools can be applied in daily practice.
Persistent efforts are required to make this tool widely available in clinical practice.
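The CNN building blocks named earlier in the Discussion (convolutional, ReLU, pooling and fully connected layers) can be illustrated with a minimal NumPy forward pass. This is a toy sketch under stated assumptions, not any model from the included studies: a single-channel 16 x 16 input, four random 3 x 3 kernels, random untrained weights, and two hypothetical output classes (COVID-19 versus other pneumonia). Training by backpropagation is omitted.

```python
import numpy as np


def conv2d(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Valid (no padding) 2D cross-correlation, as computed by a convolutional layer."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


def relu(x: np.ndarray) -> np.ndarray:
    """Rectified linear unit: keep positive activations, zero out the rest."""
    return np.maximum(x, 0.0)


def max_pool(x: np.ndarray, size: int = 2) -> np.ndarray:
    """Non-overlapping max pooling; rows/columns that do not fill a window are dropped."""
    h, w = x.shape[0] // size, x.shape[1] // size
    x = x[:h * size, :w * size]
    return x.reshape(h, size, w, size).max(axis=(1, 3))


def softmax(z: np.ndarray) -> np.ndarray:
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()


def tiny_cnn_forward(image, kernels, fc_weights, fc_bias):
    """conv -> ReLU -> max-pool per kernel, then flatten -> fully connected -> softmax."""
    feature_maps = [max_pool(relu(conv2d(image, k))) for k in kernels]
    features = np.concatenate([f.ravel() for f in feature_maps])
    return softmax(fc_weights @ features + fc_bias)


rng = np.random.default_rng(0)
image = rng.random((16, 16))                         # stand-in for a preprocessed image patch
kernels = [rng.standard_normal((3, 3)) for _ in range(4)]
# Each 14 x 14 convolution output pools to 7 x 7 = 49 values; 4 kernels give 196 features
fc_weights = rng.standard_normal((2, 196)) * 0.01
fc_bias = np.zeros(2)
probs = tiny_cnn_forward(image, kernels, fc_weights, fc_bias)
print(probs)
```

In practice, deep learning frameworks provide these layers, and architectures such as VGG and ResNet stack many of them; the loop-based convolution here favors readability over speed.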

Ethical statement

This manuscript has not been published or presented elsewhere and is not under consideration for publication elsewhere. All the authors have approved the manuscript and agree with submission to your esteemed journal. There are no conflicts of interest to declare.

Funding

This study was supported by the Health Commission of Gansu Province, China [GSWSKY2020–15]. The funder had no role in the planning, design or implementation of the project, in data analysis or interpretation, or in writing the manuscript.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
