José Daniel López-Cabrera, Rubén Orozco-Morales, Jorge Armando Portal-Díaz, Orlando Lovelle-Enríquez, Marlén Pérez-Díaz.
Abstract
Since the outbreak of the COVID-19 pandemic, computer vision researchers have been working on the automatic identification of this disease using radiological images. The results achieved by automatic classification methods far exceed those of human specialists, with sensitivities as high as 100% being reported. However, prestigious radiology societies have stated that the use of this type of imaging alone is not recommended as a diagnostic method, since according to some experts the patterns presented in these images are unspecific and subtle, overlapping with other viral pneumonias. This report evaluates the robustness and generalizability of different approaches that use artificial intelligence, deep learning, and computer vision to identify COVID-19 in chest X-ray images. We also seek to alert researchers and reviewers to the issue of "shortcut learning", and we present recommendations for identifying whether COVID-19 automatic classification models are affected by it. First, papers using explainable artificial intelligence (XAI) methods are reviewed. Then, the results of applying external validation sets are evaluated to determine the generalizability of these methods. Finally, studies that apply traditional computer vision methods to perform the same task are considered. It is evident that, whether the whole chest X-ray image or a bounding box of the lungs is used, the image regions that contribute most to the classification often lie outside the lung region, which is not clinically plausible. In addition, in the investigations that evaluated their models on data sets external to the training set, the effectiveness of the models decreased significantly; this may provide a more realistic representation of how the models will perform in the clinic. The results indicate that, so far, the existing models often involve shortcut learning, which makes their use less appropriate in the clinical setting.
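The abstract's second recommendation is to compare a model's performance on an internal (iid) test split against an external (ood) dataset: a large drop suggests shortcut learning. A minimal sketch of that comparison in plain Python, with an AUC computed via the Mann-Whitney statistic; the function names and the example scores are illustrative assumptions, not data from any cited study:

```python
# Minimal sketch: compare a classifier's AUC on an internal (iid) test
# split against an external (ood) dataset to flag possible shortcut learning.

def auc(labels, scores):
    """AUC via the Mann-Whitney statistic (ties counted as 0.5 wins)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def generalization_gap(iid_labels, iid_scores, ood_labels, ood_scores):
    """Return (iid_auc, ood_auc, gap); a large gap suggests shortcut learning."""
    a_iid = auc(iid_labels, iid_scores)
    a_ood = auc(ood_labels, ood_scores)
    return a_iid, a_ood, a_iid - a_ood
```

Gaps of the magnitude reported below (e.g. AUC = 0.99 internally vs 0.76 externally) would stand out immediately in such a check.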
© IUPESM and Springer-Verlag GmbH Germany, part of Springer Nature 2021.
Keywords: Artificial Intelligence; COVID-19; Chest X-Rays; Deep Learning
Year: 2021 PMID: 34660166 PMCID: PMC8502237 DOI: 10.1007/s12553-021-00609-8
Source DB: PubMed Journal: Health Technol (Berl) ISSN: 2190-7196
Fig. 1 Example of CXR image (A) and CT image (B) for a COVID-19 positive patient. Red arrows show a lesion visible on CT, but not detectable using CXR, extracted from [15]
Main studies using XAI techniques to identify COVID-19 using CXR
| Ref | Lung Segmentation | XAI method used | Performance index | Evaluation using ood set |
|---|---|---|---|---|
| [ | No | Grad-Cam, Grad-Cam++, LRP | Precision = 92% Recall = 92% Fscore = 0.92 | No |
| [ | No | Grad-Cam | Acc = 95.57% | No |
| [ | No | Grad-Cam, Grad-Cam++ | Precision = 96.58% Recall = 96.59% Fscore = 0.96 | No |
| [ | No | Grad-Cam | Precision = 96.44% Recall = 96.33% Fscore = 0.96 Acc = 96.33% | No |
| [ | No | Grad-Cam | Two class Acc = 100% Sensitivity = 99% Specificity = 100% AUC = 1 Three class Acc = 98% Sensitivity = 96% Specificity = 99% AUC = 0.99 | No |
| [ | No | Occlusion, Saliency, Input X Gradient, Integrated Gradients, Guided Backpropagation, DeepLIFT | Micro-F1 = 0.89 | No |
| [ | No | RISE | Sensitivity = 100% Acc = 90.5% | No |
| [ | No | LIME, Saliency Map, Grad-Cam | Two class Acc = 98.02% Three class Acc = 97.12% | No |
| [ | No | Grad-Cam++ | Acc = 91.26% | No |
| [ | No | CycleGAN, Expected Gradients | Internal Partition (iid) AUC = 0.99 External Dataset (ood) AUC = 0.76 | Yes |
| [ | No | Grad-Cam | Acc = 96.3% | No |
| [ | - | Grad-Cam | Positive Predictive Value = 95% Sensitivity = 94% Fscore = 0.95 | No |
| [ | Yes | Grad-Cam | Acc = 91.67% Fscore = 0.94 | No |
| [ | Yes (bounding box of the lungs) | Grad-Cam, LIME | Fscore = 0.92 | No |
| [ | Yes | Saliency Map, Guided Backpropagation, Grad-Cam | Acc = 97.94% AUC = 0.984 | No |
| [ | Yes | Grad-Cam | Acc = 98.67% Fscore = 0.98 | No |
| [ | Yes | LIME, Grad-Cam | Fscore = 0.88 | No |
| [ | Yes | Grad-Cam | Acc = 88.9% Fscore = 0.84 Specificity = 96.4% | No |
Fig. 2 Activation map for a modification of the CNN COVID-Net [60], obtained with the Grad-Cam method, using the whole image to perform the classification. Image “a” belongs to the normal class, “b” to the pneumonia class, and “c” to the COVID-19 class. In all cases, the regions on which the network bases its decision are outside the lungs
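A simple way to turn the visual inspection in Fig. 2 into a quantitative check is to measure how much of an XAI heatmap's activation falls outside a lung mask; a dominant out-of-lung share is a red flag for shortcut learning. A minimal NumPy sketch of that idea (the function name and any decision threshold are illustrative assumptions, not part of the reviewed papers):

```python
import numpy as np

def fraction_outside_lungs(heatmap, lung_mask):
    """Fraction of total activation falling outside the lung mask.

    heatmap: 2-D non-negative array (e.g. a Grad-Cam map resized to image size)
    lung_mask: 2-D boolean array, True inside the lungs
    """
    total = heatmap.sum()
    if total == 0:
        return 0.0
    # Sum the activation over pixels where the mask is False (outside lungs)
    return float(heatmap[~lung_mask].sum() / total)
```

A model whose heatmaps repeatedly yield, say, more than half of their activation outside the lungs is basing its decision on regions with no clinical meaning, as the figure illustrates.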
Summary of research using an external image set (ood) as a method of evaluating their models
| Ref | Region used in the image | Performance index on iid set | Performance index on ood set | Sets of images |
|---|---|---|---|---|
| [ | Whole Image | Dataset 1 AUC = 0.992 Dataset 2 AUC = 0.995 | Dataset 1 AUC = 0.76 Dataset 2 AUC = 0.70 | Dataset 1 (GitHub-COVID)2 (ChestX-ray14)3 Dataset 2 (BIMCV-COVID-19 +)4 (PadChest)5 |
| [ | Bounding box of lungs | Dataset 1 Using COVID-Net CXR model [ Sensitivity = - Specificity = - Acc = 93.33% Using COVID-CAPS model [ Sensitivity = 90% Specificity = 95.8% Acc = 95.3% | Dataset 2 Using COVID-Net CXR model [ Sensitivity = 99.29% Specificity = 0.23% Acc = 49.76% COVID-CAPS model [ Sensitivity = 69.01% Specificity = 26.30% Acc = 47.66% | Dataset 1 (COVIDx)6 Dataset 2 (COVIDGR)7 |
| [ | Bounding box of lungs | Dataset 1 AUC = 1 Dataset 2 AUC = 0.96 | Dataset 1 AUC = 0.38 Dataset 2 AUC = 0.63 | Dataset 1 (V2-COV19-NII)8 (ChestX-ray14)3 Dataset 2 (COVID-19-AR)9 (BIMCV-COVID-19 +)4, (Chexpert)10 (Padchest)5 |
| [ | Segmented Lungs | Dataset 1 Sensitivity = 100% Specificity = 100% AUC = 1 Acc = 100% | Dataset 2 Sensitivity = 56% Specificity = 58% AUC = 0.59 Acc = 57% | Dataset 1 (GitHub-COVID)2 Dataset 2 CORDA (Private) |
| [ | Segmented Lungs | Dataset 1 State 1 (classify pneumonia) Sensitivity = 92.85% Specificity = 90.05% AUC = 0.9672 State 2 (classify COVID-19) Sensitivity = 85.26% Specificity = 85.86% AUC = 0.8804 | Dataset 2 State 1 (classify pneumonia) Sensitivity = 63.64% Specificity = 90.48% AUC = 0.9394 State 2 (classify COVID-19) Sensitivity = 50% Specificity = 40% AUC = 0.4 | Dataset 1 (GitHub-COVID)2 (Padchest)5 (RSNA)11 Dataset 2 Private image sets from Taiwanese hospitals |
2 https://github.com/ieee8023/covid-chestxray-dataset
3 https://nihcc.app.box.com/v/ChestXray-NIHCC
4 https://bimcv.cipf.es/bimcv-projects/bimcv-covid19/
5 https://bimcv.cipf.es/bimcv-projects/padchest/
6 https://github.com/lindawangg/COVID-Net
7 https://dasci.es/es/transferencia/open-data/covidgr/
8 https://data.uni-hannover.de/dataset/cov-19-img/resource/38e72a9b-30a9-422a-a481-c7491e655437
9 https://wiki.cancerimagingarchive.net/pages/viewpage.action?pageId=70226443
10 https://stanfordmlgroup.github.io/competitions/chexpert/
11 https://www.kaggle.com/c/rsna-pneumonia-detection-challenge/data
Summary of works using a traditional computer vision approach to identify COVID-19 using chest X-ray imaging
| Ref | Feature Extraction Method | Feature Selection/Reduction Method | Classification Algorithm | Performance Index | Image | Use of ood |
|---|---|---|---|---|---|---|
| [ | New orthogonal exponent moments of fractional orders | New feature selection method: Manta Ray Foraging Optimization (MRFO) using differential evolution | Knn | Acc = 93% | Whole Image | No |
| [ | VGG-19 + DenseNet-121 | - | SVM | Acc = 98.28% | Whole Image | No |
| [ | Each pixel as Feature; CNN; LBP; Gray Level Co-occurrence Matrix | - | MLP CNN | AUC = 0.93 | Whole Image | No |
| [ | Alexnet VGG-16 VGG-19 Xception Resnet18 Resnet50 Resnet101 Inceptionv3 Inceptionresnetv2 GoogleNet Densenet201 | - | SVM | Acc = 95.38% Sensitivity = 97.29% Specificity = 93.47% | Whole Image | No |
| [ | ChexNet [ | PCA | MLP SVM Knn SRC-Dalm SRC-Hom CRC-light CRC CSEN1 CSEN2 ReconNet ResNet-50 Inception-v3 | Acc = 99.26% Sensitivity = 97.14% Specificity = 99.49% | Whole Image, and lateral view | No |
| [ | MobileNet DenseNet121 DenseNet201 Xception InceptionV3 InceptionResNetV2 ResNet50 ResNet152 VGG16 VGG19 NASNetLarge NASNetMobile ResNet50V2 ResNet101V2 ResNet152V2 | - | Decision Tree Random Forest XGBoost AdaBoost Bagging LightGBM | Acc = 98.00% Precision = 98.00% Recall = 98.00% | Whole Image | No |
| [ | New CNN | - | SVM Decision Tree Knn | Acc = 98.97% Sensitivity = 89.39% Specificity = 99.75% Fscore = 0.96 | Whole Image | No |
| [ | New architecture of CNN; Texture-based; FFT; Wavelet; GLCM; GLDM | DNE Relief LPP Fast-ICA recursive feature elimination variable ranking techniques | SVM GLM Random Forest | Precision = 95% Sensitivity = 94% Fscore = 0.94 | Whole Image | No |
| [ | Inception-v3 LBP LPQ LDN EQP LETRIST BSIFT OBIF | - | SVM Knn MLP Decision Tree Random Forest Ensemble (Sum rule, Product rule, Voting Rule) Clus-HMC | Fscore = 0.83 | Lung bounding box | No |
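Several of the traditional pipelines in the table above pair a hand-crafted texture descriptor such as LBP with a simple classifier such as Knn. A self-contained NumPy sketch of that kind of pipeline, for orientation only: a basic 8-neighbour LBP histogram plus a 1-NN classifier, where the neighbourhood, binning, and L1 distance are illustrative choices and not those of any cited study:

```python
import numpy as np

def lbp_histogram(img):
    """Basic 8-neighbour Local Binary Pattern histogram (256 bins, L1-normalised)."""
    c = img[1:-1, 1:-1]  # centre pixels (borders skipped)
    # The 8 neighbours of each centre pixel, clockwise from top-left
    neighbours = [img[0:-2, 0:-2], img[0:-2, 1:-1], img[0:-2, 2:],
                  img[1:-1, 2:],   img[2:, 2:],     img[2:, 1:-1],
                  img[2:, 0:-2],   img[1:-1, 0:-2]]
    codes = np.zeros_like(c, dtype=np.int32)
    for bit, n in enumerate(neighbours):
        # Set bit if the neighbour is at least as bright as the centre
        codes |= (n >= c).astype(np.int32) << bit
    hist = np.bincount(codes.ravel(), minlength=256).astype(float)
    return hist / hist.sum()

def nearest_neighbour(train_feats, train_labels, query_feat):
    """1-NN classifier on L1 distance between feature histograms."""
    dists = [np.abs(f - query_feat).sum() for f in train_feats]
    return train_labels[int(np.argmin(dists))]
```

As the "Use of ood" column shows, none of these traditional pipelines were evaluated on external data either, so the same generalizability concern applies to them.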