Weronika Hryniewska¹, Przemysław Bombiński², Patryk Szatkowski², Paulina Tomaszewska¹, Artur Przelaskowski¹, Przemysław Biecek¹,³.
Abstract
The sudden outbreak and uncontrolled spread of COVID-19 disease is one of the most important global problems today. In a short period of time, it has led to the development of many deep neural network models for COVID-19 detection with modules for explainability. In this work, we carry out a systematic analysis of various aspects of the proposed models. Our analysis revealed numerous mistakes made at different stages of data acquisition, model development, and explanation construction. We give an overview of the approaches proposed in the surveyed machine learning articles and indicate typical errors emerging from the lack of a deep understanding of the radiography domain. We present the perspectives of both radiologists, as experts in the field, and deep learning engineers dealing with model explanations. The final result is a proposed checklist with the minimum conditions to be met by a reliable COVID-19 diagnostic model.
Keywords: COVID-19; X-ray; computed tomography; deep learning; explainable AI; lungs
Year: 2021 PMID: 34054148 PMCID: PMC8139442 DOI: 10.1016/j.patcog.2021.108035
Source DB: PubMed Journal: Pattern Recognit ISSN: 0031-3203 Impact factor: 7.740
Fig. 1. PRISMA flow diagram showing the flow of information through the different phases of the systematic review, including inclusions and exclusions.
Fig. 2. Taxonomy of AI applications in the 25 reviewed studies.
This table presents the data sources used in the studies reviewed in this survey. For each data source, we list the articles that use it. The JPEG quality factor (QF) is 75 for most images; other values are indicated. Note that COVID-Net is not a data source but a study collating 5 datasets; some other studies refer to it instead of referring to the original sources.
| No. | Institution | Link to dataset | Used in article | Dynamic range of images | Data processing | Prepared for scientific experiments |
|---|---|---|---|---|---|---|
| 1) | University of Waterloo | github.com/lindawangg/COVID-Net | | | | |
| 1a) | University of Waterloo | github.com/agchung/Figure1-COVID-chestxray-dataset | | 8 bits, 48 cases | JPG, PNG | X-ray database for research purposes only, continuously growing; Metadata: offset, sex, age, finding, survival, temperature, pO2 saturation, view, modality, artifacts/distortion, notes; Categories: covid, pneumonia, no finding |
| 1b) | University of Waterloo | github.com/agchung/Actualmed-COVID-chestxray-dataset | | 8 bits, 237 cases | PNG, BMP | X-ray database for research purposes only, continuously growing; Metadata: finding, view, modality, notes; Categories: covid, no finding |
| 1c) | Qatar & Bangladesh Universities | kaggle.com/tawsifurrahman/covid19-radiography-database | | 8 bits, 21165 cases | PNG, resized | X-ray database; No metadata; Categories: COVID-19 positive cases (3,616), normal (10,192), lung opacity (non-COVID lung infection, 6,012), viral pneumonia (1,345) |
| 1d) | University of Montreal | github.com/ieee8023/covid-chestxray-dataset | | 8 bits, 951 cases | JPG, PNG, resized | X-ray database; Metadata: covid severity scores, sex, age, finding, RT_PCR_positive, survival, intubated, intubation_present, went_icu, in_icu, needed_supplemental_O2, extubated, temperature, pO2_saturation, leukocyte_count, neutrophil_count, lymphocyte_count, clinical_notes, other_notes; Categories: covid, viral, bacterial, fungal, lipoid, aspiration, unknown |
| 1f) | National Institutes of Health | kaggle.com/c/rsna-pneumonia-detection-challenge | | 8 bits, 30227 (training) + 3000 (test) cases | DICOM, resized | X-ray database of the Pneumonia Detection Challenge; No metadata; Categories: normal, lung opacity, no lung opacity/not normal |
| 7) | National Institutes of Health | nihcc.app.box.com/v/ChestXray-NIHCC | | 8 bits, 112120 cases | PNG, resized | X-ray database of Common Thorax Disease; Metadata: finding ROI; Categories: no findings and 14 disease categories (Atelectasis, Cardiomegaly, Effusion, Infiltration, Mass, Nodule, Pneumonia, Pneumothorax, Consolidation, Edema, Emphysema, Fibrosis, Pleural_Thickening, Hernia) |
| 8) | National Institutes of Health | kaggle.com/nih-chest-xrays/sample | | 8 bits, random sample of 5606 from 112,120 images of 30,805 unique patients | PNG, resized | X-ray database; Metadata: finding labels, follow-up, age, gender, view; Categories: Atelectasis, Cardiomegaly, Effusion, Infiltration, Mass, Nodule, Pneumonia, Pleural_Thickening, Hernia, Pneumothorax, Consolidation, Edema, Emphysema, Fibrosis |
| 9) | University of Montreal | kaggle.com/praveengovi/coronahack-chest-xraydataset | | 8 bits, 5910 cases (normal: 1576, covid: 58, SARS: 4, virus: 1493, bacteria: 2777, ARDS: 2) | JPG, PNG, resized | Collection of chest X-rays (anterior-posterior) of healthy vs pneumonia (Corona) affected patients, along with a few other categories such as SARS (Severe Acute Respiratory Syndrome), Streptococcus, and ARDS (Acute Respiratory Distress Syndrome); No metadata |
| 10) | University of California San Diego | kaggle.com/paultimothymooney/chest-xray-pneumonia | | 8 bits, 5863 cases | JPG | Chest X-ray images (anterior-posterior) were selected from retrospective cohorts of pediatric patients of one to five years old from Guangzhou Women and Children's Medical Center, Guangzhou. All chest X-ray imaging was performed as part of patients' routine clinical care; Categories: normal and pneumonia; No metadata |
| 11) | University of California San Diego | github.com/UCSD-AI4H/COVID-CT | | 8 bits, 349 cases | Images collected (scanned) from covid-related and medical papers in PNG (covid) or JPG (normal) | This dataset has 349 CT images containing clinical findings of COVID-19 from 216 patients; Categories: covid and noncovid cases; Metadata: age, gender, location, medical history (unfortunately modest), time after the onset of illness, severity, other diseases |
| 12) | University of California San Diego | data.mendeley.com/datasets/rscbjbr9sj/2 | | 8 bits, 5233 cases | JPG (QF=95 for normal and QF=75 for pneumonia) | Chest X-ray collection; Categories: normal (1349 cases) vs pneumonia (3884 cases) including subcategories of bacteria and virus; No metadata |
| 13) | Elazig in Turkey | github.com/muhammedtalo/COVID-19 | | 8 bits, 1125 cases | JPG (QF=90, subsampling 2x2), PNG (resized) | X-ray image collection; No metadata; Categories: covid (125 cases), no findings (500 cases), pneumonia (500 cases) |
| 14) | National Library of Medicine | openi.nlm.nih.gov/gridquery?it=xg&coll=cxr&m=1&n=100 | | 8 bits or full bits, 7470 cases | PNG (resized), full DICOM | Chest X-ray collection with 3,955 radiology reports; Categories: 14 pulmonary categories; Metadata: time after the onset of illness, severity, other diseases, captions of symptoms as unstructured symptom description |
| 15) | Stanford University School of Medicine | stanfordmlgroup.github.io/competitions/chexpert | | 8 bits, 224,316 chest radiographs of 65,240 patients | JPG | Large dataset of chest X-rays which features uncertainty labels and radiologist-labeled reference standard evaluation sets; Categories: each report was labeled for the presence of 14 observations (no finding, enlarged cardiom., cardiomegaly, lesion, opacity, edema, consolidation, pneumonia, atelectasis, pneumothorax, pleural effusion, pleural other, fracture, support devices) as positive, negative, or uncertain; Metadata: related to above categories (blank for unmentioned, 0 for negative, -1 for uncertain, and 1 for positive) |
| 16) | Hospital San Juan de Alicante - University of Alicante | bimcv.cipf.es/bimcv-projects/padchest | | 8 bits, more than 160,000 images from 67,000 patients | PNG | PadChest: a large chest X-ray image dataset with multi-label annotated reports; the reports were labeled with 174 different radiographic findings, 19 differential diagnoses, and 104 anatomic locations; 27% of the reports were manually annotated by trained physicians; Metadata: age, sex |
| 17) | Hospital Universitario San Cecilio | github.com/ari-dasci/OD-covidgr | | 8 bits, 852 images | JPEG (QF=90) | X-ray images: 426 positive covid cases and 426 negative cases; only the posterior-anterior view is considered; Categories: covid severity - normal-PCR+ (76), mild (100), moderate (171), severe (79); General metadata: positive images correspond to patients who have been tested positive with RT-PCR within a time span of at most 24h between the X-ray image and the test; every image has been taken using the same type of equipment and with the same format |
| 18) | Beth Israel Deaconess Medical Center in Boston | physionet.org/content/mimic-cxr/2.0.0 | | full bits, 227,835 imaging studies for 65,379 patients | full DICOM | Chest radiographs with metadata: electronic health record data, DICOM metadata, free-text radiology reports; Categories: 14 pulmonary observations with an additional uncertain category |
| 19) | Società Italiana di Radiologia Medica e Interventistica | sirm.org/category/senza-categoria/covid-19 | | 8 bits | mostly JPG (QF=95, subsampling 2x2) | Chest radiographs with free-text radiology and clinical reports, covid confirmation; Metadata includes selected information from electronic health record (e.g. symptoms, lab exams, ARDS, ventilatory assistance, previous exams); Categories: covid confirmation or no with 14 pulmonary observations |
| 20) | National Cancer Institute | wiki.cancerimagingarchive.net/display/Public/LIDC-IDRI | | full bits, 1308 cases | full DICOM | The Lung Image Database consists of diagnostic and lung cancer screening thoracic CT scans with marked-up annotated lesions (XML); it includes three categories ("nodule >= 3 mm", "nodule < 3 mm", and "non-nodule >= 3 mm") |
| 21) | University of Brescia | brixia.github.io#dataset | | full bits, 4,707 cases | full DICOM | COVID-19 subjects, acquired with both CR and DX modalities, in AP or PA projection with highly expressive multi-zone COVID-19 severity score, fully annotated; Metadata: the multi-region 6-valued Brixia-score defined for six zones, sex, age |
| 22) | open-edit radiology resource | radiopaedia.org | | 8 bits, a significant number of cases, constantly updated | JPG with different QF, resized | Database for general radiological purposes; free-text radiology and clinical reports in selected cases; generally, quantitatively and qualitatively differentiated case reports |
| 23) | generated using data augmentation | kaggle.com/nabeelsajid917/covid-19-x-ray-10000-images | | 8 bits, 104 cases | JPEG with different QF, resized | Corona Virus X-ray Dataset; Categories: covid and normal; No metadata |
| 24) | offline database or from hospital | | | | | |
Class balance in the reviewed studies. Class balance is crucial for developing an accurate model. The rows present, in order: the number of COVID-19 images in the study, the total number of images, the number of classes into which the images were divided, and the aspect ratio. The aspect ratio is calculated by dividing the number of COVID-19 images by the total number of images and then multiplying the result by the number of classes. The largest COVID-19 dataset and the largest total number of images are marked in bold. The smaller the aspect ratio, the smaller the share of COVID-19 cases in the study's whole dataset, and vice versa. For this reason, the best-balanced dataset (aspect ratio nearest to 1) is also marked in bold. Studies that do not include full information about the number of cases were excluded.
| Study | ||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 250 | 230 | 345 | 502 | 127 | 377 | 112/137 | 76 | 358 | 400 | 400 | 58 | 234 | 68 | 829 | 99 | 120 | 269 | |||
| 6523 | 460 | 720 | 1004 | 2186 | 1125 | 754 | 366 | 5949 | 750 | 800 | 2800 | 1234 | 5941 | 1,865 | 239 | 5 801 | ||||
| 2/3 | 3 | 3 | 2 | 4 | 2 | 3 | 3/2 | 3 | 3 | 2 | 2 | 4 | 12 | 4 | 2 | 3 | 2 | 3 | ||
| 0.08/0.11 | 1.5 | 0.16 | 2 | 1.36 | 0.34 | 0.63 | 0.92/0.75 | 0.04 | 0.08 | 1.07 | 0.08 | 2.28 | 0.05 | 0.89 | 0.02 | 0.14 |
multilabel classification, training slides (106 COVID-19 images), four COVID-19 classes (NormalPCR+: 76 cases, Mild cases: 80, Moderate: 145 cases, Severe: 76 cases) and one Negative (377 cases)
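
As a worked illustration of the aspect ratio defined in the caption of the class balance table, a minimal Python sketch follows. The function name `balance_ratio` is hypothetical and not taken from the paper; the inputs are the first column of the table (250 COVID-19 images, 6523 images in total, 2 or 3 classes).

```python
def balance_ratio(n_covid: int, n_total: int, n_classes: int) -> float:
    """Share of COVID-19 images multiplied by the number of classes.

    A value of 1 means COVID-19 images make up exactly 1/n_classes of the data.
    (Hypothetical helper name; the formula follows the table caption.)
    """
    if n_total <= 0 or n_classes <= 0:
        raise ValueError("n_total and n_classes must be positive")
    return n_covid / n_total * n_classes


# First column of the table: 250 COVID-19 images out of 6523, with 2 or 3 classes.
print(round(balance_ratio(250, 6523, 2), 2))  # 0.08
print(round(balance_ratio(250, 6523, 3), 2))  # 0.11
```

Both values agree with the "0.08/0.11" entry in the last row of the table.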
Fig. 3. Differences between AP and PA chest projections.
Image preprocessing techniques in the reviewed studies.
| Preprocessing technique | Reference |
|---|---|
| Resize to the same size | |
| Normalize pixel intensity | |
| Eliminate noise | |
| Use Perona-Malik filter | |
| Limit image intensity | |
| Equalize histogram | |
| Perform image enhancement | |
| Cast data type | |
| Change color space | |
| Crop image | |
| Zoom image / augmentation | |
| Add pixels | |
| Feature encoding | |
| Rotate image | |
| Use 2D wavelet transform | |
| Feature extraction | |
| Lack of preprocessing or description | |
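
Two of the techniques listed above, intensity normalization and histogram equalization, can be sketched in a few lines. This is only an illustration under stated assumptions: images are 8-bit grayscale and represented as nested Python lists, and the function names are hypothetical. The reviewed studies do not specify their implementations at this level.

```python
from typing import List

Image = List[List[int]]  # assumed: 8-bit grayscale image as rows of pixel values


def normalize_intensity(img: Image) -> List[List[float]]:
    """Min-max scale pixel values to the range [0, 1]."""
    flat = [p for row in img for p in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:  # constant image: avoid division by zero
        return [[0.0 for _ in row] for row in img]
    return [[(p - lo) / (hi - lo) for p in row] for row in img]


def equalize_histogram(img: Image, levels: int = 256) -> Image:
    """Classic histogram equalization based on the cumulative histogram."""
    flat = [p for row in img for p in row]
    hist = [0] * levels
    for p in flat:
        hist[p] += 1
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)
    total = len(flat)
    if total == cdf_min:  # a single gray level: nothing to spread out
        return [row[:] for row in img]
    scale = (levels - 1) / (total - cdf_min)
    return [[round((cdf[p] - cdf_min) * scale) for p in row] for row in img]


tiny = [[10, 20], [30, 40]]
print(equalize_histogram(tiny))  # [[0, 85], [170, 255]]
```

The four distinct gray levels of the toy image are spread evenly over the full 8-bit range, which is the intended effect of equalization.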
Data augmentation techniques used in the studies. Some techniques are parametrizable, so the table indicates both the techniques and the parameters used. Indentation is used to show the subtypes of a method.
| Data augmentation technique | Values and studies |
|---|---|
| Affine transformations | |
| Rotation | |
| Scaling / Zooming | |
| Flip | |
| Horizontal | |
| Vertical | |
| Shifting / Translation | |
| Shearing | |
| Brightness change | |
| Crop | |
| Contrast change | |
| Gaussian noise | |
| ZCA whitening transformation | |
| Elastic transformation | |
| Grid distortion | steps=5, limit=0.3 |
| Optical distortion | distort=0.2, shift=0.05 |
| Warping | 10% |
| Multiple patches from each image | |
| Class-inherent transformations Network* | |
| Augmentation used but parameters are not specified | |
| No augmentation used | |
* Inspired by generative adversarial networks.
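
To make a few rows of the augmentation table concrete, the sketch below implements a horizontal flip, a translation, and a brightness change on a toy image. It assumes 8-bit grayscale images stored as nested lists; the function names and the zero fill value are illustrative choices, not taken from any reviewed study.

```python
from typing import List

Image = List[List[int]]  # assumed: 8-bit grayscale image as rows of pixel values


def flip_horizontal(img: Image) -> Image:
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def shift_right(img: Image, pixels: int, fill: int = 0) -> Image:
    """Translate the image `pixels` columns to the right, padding with `fill`."""
    if pixels < 0:
        raise ValueError("pixels must be non-negative")
    width = len(img[0])
    pixels = min(pixels, width)
    return [[fill] * pixels + row[: width - pixels] for row in img]


def change_brightness(img: Image, factor: float) -> Image:
    """Multiply intensities by `factor` and clip to the 8-bit range."""
    return [[min(255, max(0, round(p * factor))) for p in row] for row in img]


img = [[1, 2, 3], [4, 5, 6]]
print(flip_horizontal(img))                    # [[3, 2, 1], [6, 5, 4]]
print(shift_right(img, 1))                     # [[0, 1, 2], [0, 4, 5]]
print(change_brightness([[100, 200]], 1.5))    # [[150, 255]]
```

Note that the shift discards the rightmost column and the brightness change saturates at 255. Whether such information loss is acceptable for chest images is exactly what the checklist items on lung visibility and sensible transformations ask about.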
Fig. 4. Examples of explanations for COVID-related models from studies: [34], [36], [39], [42], [43], [44]. The following explanations are used: a) Grad-CAM, b) CAM, c) saliency, d) guided backpropagation, e) integrated gradients, f) LIME. Such explanations can be divided into 4 types: heat maps (images a) to c)), contour lines (image d)), points (image e)), and image pieces (image f)).
The depth, number of parameters, and types of layers of the neural networks in the considered papers. For large networks, gradient-based explanations are noisy. Some explanation techniques assume specific types of layers.
| Model architectures | Depth | No. of parameters | Layer types |
|---|---|---|---|
| ResNet18, ResNet34, ResNet50, ResNet15V2, ResNet50V2 | - | 11.7M-25.6M | ZeroPadding2D, Conv2D, BatchNormalization, Activation, MaxPooling2D, Add, GlobalAveragePooling2D, Dense |
| DenseNet121, DenseNet-161, DenseNet-201 | 121-201 | 8.1M-20.0M | ZeroPadding2D, Conv2D, BatchNormalization, Activation, MaxPooling2D, Concatenate, AveragePooling2D, GlobalAveragePooling2D, Dense |
| VGG-16, VGG-19 | 23-26 | 138M-144M | Conv2D, Dense, Flatten, InputLayer, MaxPooling2D |
| InceptionV3 | 159 | 23.9M | Conv2D, BatchNormalization, Activation, MaxPooling2D, AveragePooling2D, Concatenate, GlobalAveragePooling2D, Dense |
| InceptionResNetV2 | 572 | 55.9M | Conv2D, BatchNormalization, Activation, MaxPooling2D, AveragePooling2D, Concatenate, Lambda, GlobalAveragePooling2D, Dense |
| MobileNet | 88 | 4.3M | Conv2D, BatchNormalization, ReLU, DepthwiseConv2D, ZeroPadding2D, GlobalAveragePooling2D, Reshape, Dropout, Activation |
| MobileNetV2 | 88 | 3.5M | Conv2D, BatchNormalization, ReLU, DepthwiseConv2D, ZeroPadding2D, Add, GlobalAveragePooling2D, Dense |
| NASNetMobile, NASNetLarge | - | 5.3M-88.9M | Conv2D, BatchNormalization, Activation, ZeroPadding2D, SeparableConv2D, Add, MaxPooling2D, AveragePooling2D, Cropping2D, Concatenate, GlobalAveragePooling2D, Dense |
| EfficientNet-B0 | - | 5.3M | Rescaling, Normalization, ZeroPadding2D, Conv2D, BatchNormalization, Activation, DepthwiseConv2D, GlobalAveragePooling2D, Reshape, Multiply, Dropout, Add, Dense |
| Efficient TBCNN | | 0.23M | Conv2D, MaxPool2D, BatchNormalization, GlobalAveragePooling2D, Add, Dense |
| Attention-56 | 115 | 31.9M | Conv2D, Lambda, MaxPool2D, UpSampling2D, AveragePooling2D, ZeroPadding2D, Dense, Add, Multiply, BatchNormalization, Dropout |
| Xception | 126 | 22.9M | Conv2D, BatchNormalization, Activation, SeparableConv2D, MaxPooling2D, Add, GlobalAveragePooling2D, Dense |
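
For orientation on the "No. of parameters" column, the sketch below shows how trainable parameters are counted for the two most common layer types in the table, Conv2D and Dense. The helper names are hypothetical; the example shapes (a 3x3 convolution from 3 to 64 channels, and a dense layer from 4096 to 1000 units) are assumed here as typical VGG-style layers and are not taken from the paper.

```python
def conv2d_params(kernel_h: int, kernel_w: int, in_channels: int,
                  out_channels: int, bias: bool = True) -> int:
    """Each output channel has one kernel per input channel plus an optional bias."""
    return (kernel_h * kernel_w * in_channels + (1 if bias else 0)) * out_channels


def dense_params(in_features: int, out_features: int, bias: bool = True) -> int:
    """Each output unit has one weight per input feature plus an optional bias."""
    return (in_features + (1 if bias else 0)) * out_features


print(conv2d_params(3, 3, 3, 64))   # 1792
print(dense_params(4096, 1000))     # 4097000
```

The comparison illustrates why architectures with large fully connected layers accumulate parameters much faster than purely convolutional ones.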
XAI techniques used in considered papers.
| Name of the XAI technique | Reference |
|---|---|
| Grad-CAM (gradient-weighted class activation mapping) | |
| LIME (local interpretable model-agnostic explanations) | |
| CAM (class activation mapping) | |
| Saliency (saliency map) | |
| Guided Backpropagation | |
| LRP (layer-wise relevance propagation) | |
| Occlusion (occlusion sensitivity) | |
| AM (activation mapping) | |
| Attribution maps | |
| DeepLIFT | |
| Feature maps | |
| Grad-CAM++ | |
| Guided Grad-CAM | |
| GSInquire | |
| Input X Gradient | |
| Integrated Gradients |
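
Of the techniques listed above, occlusion sensitivity is the simplest to sketch without a deep learning framework, because it only needs a scoring function. The code below is a minimal illustration under stated assumptions: `toy_predict` is a made-up stand-in for a trained classifier, images are nested lists of floats, and the patch size and fill value are arbitrary defaults.

```python
from typing import Callable, List

Image = List[List[float]]


def occlusion_map(img: Image, predict: Callable[[Image], float],
                  patch: int = 1, fill: float = 0.0) -> Image:
    """Drop in the model score when each patch x patch block is replaced by `fill`."""
    height, width = len(img), len(img[0])
    baseline = predict(img)
    heat = [[0.0] * width for _ in range(height)]
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            rows = range(top, min(top + patch, height))
            cols = range(left, min(left + patch, width))
            occluded = [row[:] for row in img]
            for r in rows:
                for c in cols:
                    occluded[r][c] = fill
            drop = baseline - predict(occluded)
            for r in rows:
                for c in cols:
                    heat[r][c] = drop
    return heat


def toy_predict(img: Image) -> float:
    """Made-up model: the score is the mean of the top-left 2x2 block."""
    return (img[0][0] + img[0][1] + img[1][0] + img[1][1]) / 4


img = [[1.0, 1.0, 0.0], [1.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
heat = occlusion_map(img, toy_predict)
print(heat[0][0], heat[2][2])  # 0.25 0.0
```

Only the pixels the toy model actually uses receive non-zero relevance, which is the behavior one would hope to see over lung regions for a real model.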
Fig. 5. Examples of biased model explanations: a) [36], b) [27], c) [40], d) [47]. The red arrows in image b) were marked by a radiologist to help locate the lesions; they were not present in the training set.
Summary showing which points from the checklist are fulfilled by the reviewed data resources.
| Checklist / Data resource | 2) | 3) | 4) | 5) | 6) | 7) | 8) | 9) | 10) | 11) | 12) | 13) | 14) | 17) | 23) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| [D] Does the data and its associated information provide sufficient diagnostic quality? | Y? | N? | N | N? | N? | N | N? | N | N? | N | N | N | Y | N? | N |
| [R] Are the low quality images rejected? | N | N | N | N | N | N | N | N | N | n/a | Y | N | N | Y? | N |
| [D] Is the dataset balanced in terms of sex and age? | Y? | ? | ? | Y? | Y? | Y | Y | ? | N | N | ? | ? | ? | Y? | ? |
| [R] Does the dataset contain one type of images (CT or X-ray or the same projection)? | Y | Y | Y | N | Y | Y | N | N | Y | Y | Y | Y | N | Y | N |
| [R] Are the lung structures visible (lung window) on CT images? | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | N | n/a | n/a | n/a | n/a | n/a |
| [D] Are images of children and of adults labeled as such within the dataset? | not all | N | N | Y? | Y | Y | Y | N | Y | not all | N | N | N | N | N |
| [R] Are images correctly categorized in relation to class of pathology? | N | N | Y | N | N | N | N | N | Y | N | Y | N | Y | N? | N |
| [D] Are AP/PA projections described for every X-ray image? | N | Y | N | Y | Y | Y | Y | N | Y | n/a | N | N | Y | Y | N |
Summary showing which points from the checklist are fulfilled by the peer-reviewed studies.
| Checklist / Study | ||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| [D] Is the data preprocessing described? | Y | N | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y |
| [D] Are artifacts (such as captions) removed? | ? | ? | Y | Y | Y | n/a | ? | Y | N | N | Y | n/a |
| [D] Are the lungs fully present after transformations? | ? | ? | ? | n/a | Y? | Y | n/a | ? | ? | ? | ? | N? |
| [R] Are lung structures visible after brightness or contrast transformations? | n/a | ? | n/a | n/a | n/a | Y | n/a | ? | n/a | ? | ? | n/a |
| [D] Are only sensible transformations applied? | Y | ? | Y | n/a | Y | Y | n/a | ? | ? | N | N | N |
| [D] Is the transfer learning procedure described? | Y | n/a | n/a | Y? | Y | n/a | Y | Y | Y | Y | Y | Y |
| [D] Is the applied transfer learning appropriate for this case? | N | n/a | n/a | N | N | n/a | N | Y? | N | N | N | N |
| [D] Are at least a few metrics of those proposed in | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y |
| [D] Is the model validated on a different database than the one used for training? | N | N | N | N | Y | Y | N | N | N | N | N | N |
| [R] Are other structures (i.e., bowel loops) misinterpreted as lungs in segmentation? | n/a | n/a | n/a | n/a | N | Y? | n/a | N | n/a | n/a | n/a | Y |
| [R] Are all the areas marked as highly explanatory located inside the lungs? | Y | n/a | Y | n/a | Y | Y | Y? | Y | N | Y | Y | |
| [R] Are artifacts misidentified as part of the explanations? | Y | n/a | N | n/a | n/a | n/a | n/a | N | n/a | n/a | n/a | |
| [R] Are areas indicated as explanations consistent with opinions of radiologists? | N | n/a | n/a | n/a | n/a | n/a | n/a | n/a | Y | n/a | n/a | n/a |
| [R] Do explanations accurately indicate lesions? | Y? | n/a | Y? | n/a | Y? | Y | N | N | N | Y? | N | Y |
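
The checklist item asking whether highly explanatory areas lie inside the lungs can be approximated numerically when a lung segmentation mask is available. The sketch below is one possible way to do this, not the procedure used in the paper, where such points are assessed by radiologists. It assumes a relevance heatmap scaled to [0, 1], a binary lung mask of the same shape, and an arbitrary relevance threshold of 0.5; the function name is hypothetical.

```python
from typing import List, Optional


def fraction_inside_mask(heatmap: List[List[float]],
                         lung_mask: List[List[int]],
                         threshold: float = 0.5) -> Optional[float]:
    """Fraction of high-relevance pixels (>= threshold) that fall inside the mask.

    Returns None when no pixel reaches the threshold.
    """
    inside = total = 0
    for heat_row, mask_row in zip(heatmap, lung_mask):
        for value, in_lung in zip(heat_row, mask_row):
            if value >= threshold:
                total += 1
                if in_lung:
                    inside += 1
    return inside / total if total else None


heatmap = [[0.9, 0.1], [0.8, 0.7]]
lung_mask = [[1, 0], [1, 0]]  # 1 = lung, 0 = background
print(fraction_inside_mask(heatmap, lung_mask))  # two of three hot pixels are in the lungs
```

A low fraction would flag an explanation for closer inspection, for example because the model attends to captions or other artifacts outside the lungs.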