Pablo Vieira1,2, Orrana Sousa1, Deborah Magalhães3, Ricardo Rabêlo4, Romuere Silva1,3,4. 1. Electrical Engineering Department, Federal University of Piauí, Picos, Brazil. 2. Development and Research, Maida.Health, Teresina, Piauí, Brazil. 3. Information Systems Department, Federal University of Piauí, Picos, Brazil. 4. Computer Science Department, Federal University of Piauí, Teresina, Brazil.
Abstract
COVID-19 leads to radiological evidence of lower respiratory tract lesions, which supports screening for this disease using chest X-rays. In this scenario, deep learning techniques are applied to detect COVID-19 pneumonia in X-ray images, aiding a fast and precise diagnosis. Here, we investigate seven deep learning architectures combined with data augmentation and transfer learning techniques to detect different pneumonia types. We also propose an image resizing method with a maximum window function that preserves the anatomical structures of the chest. The results are promising, reaching an accuracy of 99.8% considering the COVID-19, normal, viral pneumonia, and bacterial pneumonia classes. The differentiation between viral pneumonia and COVID-19 achieved an accuracy of 99.8%, and that between COVID-19 and bacterial pneumonia an accuracy of 99.9%. We also evaluated the impact of the proposed image resizing method on classification performance, comparing it with bilinear interpolation; this pre-processing increased the classification rate regardless of the deep learning architecture used. We compared our results with ten related works in the state-of-the-art using eight sets of experiments, which showed that the proposed method outperformed them in most cases. Therefore, we demonstrate that deep learning models trained with pre-processed X-ray images can precisely assist the specialist in COVID-19 detection.
Acute infections of the lower respiratory tract have long been a significant cause of disease and mortality worldwide [1]. Among them, pneumonia affects millions of people each year and poses significant risks for children, adults aged 65 and over, and individuals with health problems such as diabetes, obesity, and high blood pressure. Pneumonia has more than 30 different causes, but its most common agents are viruses and bacteria [2].

In December 2019, a new virus of the coronavirus family appeared, called Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2). In March 2020, the World Health Organization (WHO) declared COVID-19 a pandemic; it already accounts for 83 million confirmed cases and 1.8 million deaths globally [3]. At the beginning of 2021, new, more transmissible variants of the virus were detected in various parts of the world [4], worrying world leaders, health professionals, and the population with their rapid spread, which affects economies and health systems worldwide.

Although COVID-19 causes mild or moderate symptoms in most cases, some infected people develop severe conditions, such as pneumonia, which is characterized by peripheral distribution, ground-glass opacity, fine reticular opacity, and vascular spacing [5]. Acute cases of respiratory syndrome can lead to multiple organ failure and death. Early diagnosis is crucial to prevent the proliferation of SARS-CoV-2; consequently, consistent and effective testing is necessary.

The standard test method is the Real-Time Reverse Transcription Polymerase Chain Reaction (real-time RT-PCR), which can cost between US$ 120 and US$ 130. It requires a specialized biosafety laboratory to host the Polymerase Chain Reaction (PCR) machine, costing between US$ 15,000 and US$ 90,000 [6]. Different body fluids, such as blood, sputum, feces, and urine, can be used. However, the pharyngeal swab sample is commonly adopted, with a positivity rate of only 63% [7].
Thus, RT-PCR is expensive, time-consuming, and produces a relatively low positivity rate, making it difficult to identify and treat patients.

Chest X-rays can help screen cases of COVID-19. They were already applied to detect the SARS-CoV-1 and Middle East Respiratory Syndrome (MERS) coronaviruses, as they allow quick, consistent, and cheaper pneumonia detection. COVID-19 leads to radiological evidence of lower respiratory tract lesions even in patients who do not have clinical pneumonia [8], which favors detection by radiographs of a larger group of contaminated people. To this end, it is critical to differentiate COVID-19 pneumonia radiographs from those with other types of pneumonia, since they may share similarities in the lung's affected areas.

Computer-Aided Diagnosis (CAD) systems can reduce observational oversights and, consequently, false-negative rates, supporting the fast screening of pneumonia caused by SARS-CoV-2 and reducing the health professional's workload [9]. In this scenario, deep learning techniques have been adopted to detect COVID-19 pneumonia in X-ray images [10], [11]. These works obtain significant results, reaching more than 90% correctness in COVID-19 pneumonia classification.

Inspired by these studies, this work investigates pulmonary disease detection based on X-ray image processing and deep learning techniques. Thus, our objective lies in radiographic image classification considering several types of pneumonia, including COVID-19.
To address this objective, we present three contributions: i) a complete methodology for automatic detection of the presence of pulmonary diseases in X-ray images, including COVID-19; ii) a pre-processing methodology which includes a new image resizing method with the maximum window function that preserves anatomical structures of the chest in X-ray; iii) an evaluation methodology that comprises seven different CNN architectures combined with transfer-learning and fine-tuning of all layers of CNN. In the evaluation, we contemplate viral and bacterial pneumonia, no finding, pneumonia in comorbidity with other diseases, and COVID-19.
Related works
Chowdhury et al. [12] adopted data augmentation in an X-ray image dataset composed of 423 COVID-19, 1485 Viral Pneumonia, and 1579 Normal images. The authors also applied transfer learning with different CNN architectures: MobileNetv2, SqueezeNet, ResNet18, ResNet101, DenseNet201, CheXNet, InceptionV3, and VGG19, with ImageNet-trained weights. The results for the COVID-19 vs. Normal classification had an accuracy of 0.997, a precision of 0.997, a sensitivity of 0.996, an F1-Score of 0.997, and a specificity of 0.995. The classification involving COVID-19, Normal, and Viral Pneumonia had an accuracy of 0.979, a precision of 0.975, a sensitivity of 0.979, an F1-Score of 0.979, and a specificity of 0.988. Given these promising results, it is essential to evaluate the classification methodology considering other classes provided by the adopted dataset, such as bacterial pneumonia.

Waheed et al. [13] carried out COVID-19 X-ray image classification based on an Auxiliary Classifier Generative Adversarial Network (ACGAN) methodology. The authors fine-tuned the fully connected layer weights and resized the input images to (112 × 112). In the pre-processing stage, the authors performed data augmentation on a dataset with 403 COVID-19 and 721 Normal images. The results achieved an accuracy of 0.960, a sensitivity of 0.900, and a specificity of 0.970. These results are promising; however, the (112 × 112) resizing significantly reduced the images' dimensions, so much important information may have been lost.

Brunese et al. [14] detected pneumonia using a method that highlights chest radiography areas using the VGG16 CNN architecture. Their study classified 250 COVID-19, 3520 Healthy, and 2753 Pneumonia images; the results had a sensitivity of 0.960, a specificity of 0.980, an F1-Score of 0.940, and an accuracy of 0.960.
Despite the encouraging results, the authors did not explore other possibilities, such as pre-processing the images.

Khan et al. [15] proposed CoroNet, a CNN model to automatically detect pneumonia in chest X-ray images. The model was based on the Xception architecture pre-trained on the ImageNet dataset. The authors used 284 COVID-19, 310 Normal, 320 Bacterial Pneumonia, and 327 Viral Pneumonia images. They obtained an accuracy of 0.900, a recall of 0.890, a specificity of 0.960, an F1-Score of 0.890, and a precision of 0.890. As in [14], the authors did not explore pre-processing.

In the study [16], a deep learning model was developed to classify pneumonia caused by COVID-19 in X-ray images. The authors employed transfer learning on their CNN. For this experiment, 181 COVID-19 and 364 No-Finding images were used. The results had an accuracy of 0.963 and a loss of 0.151. This research used private images, which makes it difficult for third parties to reproduce the results.

Toğaçar, Ergen, and Cömert [11] detected pneumonia caused by COVID-19 in X-ray images using the Fuzzy Color technique. They also extracted image features with MobileNet and SqueezeNet and classified them with an SVM. The authors used 295 COVID-19, 65 Healthy, and 98 Pneumonia images. This study obtained an accuracy of 0.993 and an F1-Score of 0.993. Applying the proposed methodology to larger datasets and using different evaluation metrics could make the results more robust and comparable with the literature.

Mahmud et al. [17] performed several classification experiments on X-ray images to detect different pneumonia types. To this end, they developed the CNN CovXNet. The dataset comprised 305 COVID-19 Pneumonia, 1538 Normal, 1493 Viral Pneumonia, and 2780 Bacterial Pneumonia images.
Considering COVID-19 and Normal samples, the classification results had an accuracy of 0.974, an AUC of 0.969, a precision of 0.963, a recall of 0.970, a specificity of 0.947, and an F1-Score of 0.971. Considering the COVID-19 and Viral Pneumonia classes, the results had an accuracy of 0.870, an AUC of 0.920, a precision of 0.880, a recall of 0.870, a specificity of 0.850, and an F1-Score of 0.870. Considering COVID-19 and Bacterial Pneumonia, the results had an accuracy of 0.940, an AUC of 0.950, a precision of 0.930, a recall of 0.940, a specificity of 0.930, and an F1-Score of 0.930. The results considering only the pneumonia classes had an accuracy of 0.890, an AUC of 0.900, a precision of 0.880, a recall of 0.900, a specificity of 0.876, and an F1-Score of 0.890. In the last experiment, the authors considered all cases, including Normal ones. The results had an accuracy of 0.900, an AUC of 0.910, a precision of 0.900, a recall of 0.890, a specificity of 0.890, and an F1-Score of 0.900. The images used in this work came from private datasets, which hinders reproduction by third parties.

To identify different types of pneumonia, Rajaraman et al. [10] performed lung segmentation of X-ray images with the U-Net CNN. The authors then classified the images with VGG-16, Inception-V3, Xception, DenseNet-121, and NASNet-Mobile. The dataset comprised 314 COVID-19 Pneumonia, 1583 Normal, 2780 Bacterial Pneumonia, 1493 Viral Pneumonia, and 11,002 Varied Pneumonia samples. The classification results achieved an accuracy of 0.930, an AUC of 0.950, a sensitivity of 0.970, a specificity of 0.860, a precision of 0.860, and an F1-Score of 0.940 for the COVID-19, Viral Pneumonia, Bacterial Pneumonia, and Normal classes. The results considering the COVID-19 and Pneumonia classes had an accuracy of 0.910, an AUC of 0.960, a sensitivity of 0.950, a specificity of 0.850, a precision of 0.910, and an F1-Score of 0.930.
Although the U-Net segmentation improved the results, it involves a high computational cost. The U-Net approach is a good proposal; however, the authors could additionally explore other techniques, such as fine-tuning and cross-validation.

In the work [18], the authors applied data augmentation and adopted DarkCovidNet, inspired by DarkNet, to detect pneumonia. The dataset comprised 127 COVID-19 Pneumonia, 500 Pneumonia, and 500 No-Finding images. The results for the COVID-19 Pneumonia and No-Finding classes achieved a sensitivity of 0.951, a specificity of 0.953, a precision of 0.980, an F-Score of 0.965, and an accuracy of 0.980. Considering all images, the results had a sensitivity of 0.853, a specificity of 0.921, a precision of 0.899, an F-Score of 0.873, and an accuracy of 0.870. It was not possible to identify whether the images used in this work were public.

Rahimzadeh and Attar [19] adopted data augmentation to balance the dataset and resized the images to (300 × 300). They used transfer learning with ImageNet weights on the Xception and ResNet50 architectures. The dataset comprised 180 COVID-19 Pneumonia, 6054 Pneumonia, and 8851 Normal images. The results achieved an accuracy of 0.914. Despite the small number of COVID-19 Pneumonia images, the result was representative; however, the study did not use cross-validation.

The state-of-the-art survey showed the difficulty of acquiring X-ray images with COVID-19. Most of the datasets are private, and the public ones are small compared with other problem classes, such as viral or bacterial pneumonia. We therefore face an unbalanced data problem. We also observed that, despite being essential in computer vision systems, few studies have investigated the pre-processing step. These gaps led us to the proposal of this work.
Proposed methodology
The methodology followed in this study consists of 5 stages: 1) acquisition of X-ray images; 2) application of pre-processing techniques to improve essential characteristics for the problem; 3) data augmentation to reduce the imbalance between datasets; 4) training of CNNs to classify the images; 5) validation of the results. Fig. 1
illustrates these steps.
Fig. 1
Steps of the proposed method. 1st Acquisition of X-ray datasets; 2nd pre-processing steps; 3rd data augmentation techniques; 4th Features extraction and classification; and 5th Validation.
Image acquisition
We used five different image datasets, which allowed us to perform several experiments. In the first stage, we collected frontal chest X-ray images from the COVID-DB [20], COVID-19 [21], and COVID-19-AR [22] datasets. In this work, we merged these three datasets and named the result COVID-DB. The other two datasets are composed of frontal chest X-ray images of types of pneumonia other than COVID-19 pneumonia: the NIH Chest X-ray [23] and the Pneumonia Chest X-ray [24]. Table 1
provides a summary of datasets class distribution.
Table 1
Overview of image datasets with their distribution per class.
Classes                              COVID-DB [20], [21], [22]   NIH Chest X-ray [23]   Pneumonia Chest X-ray [24]
Healthy/No finding                   -                           60,412                 1583
Pneumonia                            -                           307                    -
Pneumonia with other comorbidities   -                           1035                   -
Viral pneumonia                      -                           -                      1493
Bacterial pneumonia                  -                           -                      2782
COVID-19 pneumonia                   717                         -                      -
The COVID-DB [20] is an open dataset of chest X-ray and computed tomography (CT) images of positive or suspected COVID-19 and other pneumonia patients. This dataset provided 565 images. The COVID-19 Dataset [21] is a public dataset provided by the Società Italiana di Radiologia Medica e Interventistica, which offers X-ray and chest CT images of patients positive for COVID-19; we used the 57 samples of this dataset. COVID-19-AR [22] is a chest image dataset, with both X-ray and CT, with clinical and genomic correlations representing a rural population positive for COVID-19; we obtained 95 images from that dataset. Our investigation in this direction started in March 2020, when our first dataset contained 292 images. After extensive searches in trusted repositories, we were able to collect a total of 717 X-ray images positive for COVID-19, organizing the most extensive public dataset in the state-of-the-art.

The NIH Chest X-ray [23] was made available by the National Institutes of Health (NIH), with a set of 112,120 chest X-ray images from 30,805 patients. This dataset covers 14 thoracic pathologies. However, we used three groups of the dataset: 1) images with no findings (healthy), with 60,412 samples, an essential class in any method of classifying medical images; 2) images with only Pneumonia, with 307 samples; and 3) images with Pneumonia in comorbidity with other diseases, with 1035 samples. We chose images of patients with pneumonia because their symptoms are similar to those caused by COVID-19, which can lead a patient to the hospital under suspicion of this infection.

Pneumonia Chest X-ray [24] is a dataset of frontal chest X-ray images of patients with pneumonia. It consists of 5863 images divided into three classes: bacterial pneumonia with 2782 images; viral pneumonia with 1493 images; and healthy with 1583 images.
The idea of using this dataset is to test the classification behavior of the COVID-19 with different types of pneumonia.
Pre-processing
To improve the detection and classification of diseases in X-ray images, we applied pre-processing techniques. In X-ray images, bones and lung tissues have smooth borders, making it difficult to detect anomalies. To overcome this problem, we improved the contrast of the images by applying a variant of Adaptive Histogram Equalization (AHE), the Contrast Limited Adaptive Histogram Equalization (CLAHE), which divides the image into blocks and equalizes each block by its regional histogram; if there is noise in a region, it is treated by limiting the contrast [25]. In Fig. 2b, we present an example of the CLAHE filter applied to the chest radiograph of Fig. 2a.
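A simplified version of this tile-based equalization can be sketched as follows. This is a didactic approximation under our own naming: full CLAHE implementations (e.g. OpenCV's cv2.createCLAHE) additionally blend neighboring tile mappings bilinearly to avoid block artifacts, which this sketch omits, and the clip fraction below is illustrative.

```python
import numpy as np

def equalize_tile(tile, clip=0.01):
    """Histogram-equalize one tile, clipping bins to limit contrast."""
    hist, _ = np.histogram(tile, bins=256, range=(0, 256))
    limit = max(1, int(clip * tile.size))
    excess = np.clip(hist - limit, 0, None).sum()
    hist = np.minimum(hist, limit) + excess // 256   # redistribute excess
    cdf = hist.cumsum()
    lut = np.round(255 * cdf / cdf[-1]).astype(np.uint8)
    return lut[tile]

def clahe(img, tiles=8):
    """Simplified CLAHE: equalize each block of the image independently
    (no bilinear blending between tiles, unlike the full algorithm)."""
    h, w = img.shape
    out = np.empty_like(img)
    ys = np.array_split(np.arange(h), tiles)
    xs = np.array_split(np.arange(w), tiles)
    for y in ys:
        for x in xs:
            out[np.ix_(y, x)] = equalize_tile(img[np.ix_(y, x)])
    return out
```

In practice a library implementation would be preferred; the sketch only illustrates why low-contrast lung borders become sharper after region-wise equalization.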
Fig. 2
Steps of pre-processing: (a) Chest X-ray original image; (b) application of CLAHE filter; (c) adding zero padding.
The CNNs employed in this study take square images of sizes (224 × 224), (299 × 299), and (331 × 331) as input, making it necessary to resize the images to those dimensions. We applied zero-padding to avoid distortions in resizing; it yields a square shape that preserves the image proportions. In Fig. 2c, we show the application of the zero-padding process to the chest X-ray image.

Resizing images to the standard CNN input size is usually done with bilinear interpolation [26], which considers a 2 × 2 pixel neighborhood and computes a weighted average of that neighborhood to arrive at the final interpolated value. As a result, the image becomes smooth and blurry; the interpolation implies a loss of essential information. Thus, we developed a method to minimize this loss in chest radiograph classification, using a function based on Max Pooling.

Usually, the main anomalies in a chest X-ray appear in brighter intensities [27]. When bilinear interpolation is applied to such regions, this information is lost. The scenario worsens with the resizing factor, and the anomalous regions may disappear entirely after this process. Therefore, we proposed to use a maximum window (Max Pooling) instead of the average one. This process leads to a faster convergence rate, selecting important information to improve chest radiograph classification.

By adopting this form of reducing the dimensions of the images, we managed to preserve a significant number of pixels of greater intensity, as can be seen in Fig. 3
. It is important to note that what characterizes pneumonia caused by COVID-19, viruses, or bacteria in X-ray images is the alveoli full of secretions. However, X-rays cannot pass through the lung completely: part of the rays is absorbed by the secretion accumulated in the lung, leaving white spots in places that should be darker [27].
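The zero-padding and maximum-window resizing steps described above can be sketched with NumPy as follows. This is an illustrative implementation under our own function names; it assumes a single-channel image whose padded side is at least as large as the target size, and it crops at most block-1 trailing pixels so the side divides evenly.

```python
import numpy as np

def pad_to_square(img):
    """Zero-pad a 2-D image so height == width, keeping proportions."""
    h, w = img.shape
    side = max(h, w)
    out = np.zeros((side, side), dtype=img.dtype)
    top, left = (side - h) // 2, (side - w) // 2
    out[top:top + h, left:left + w] = img
    return out

def max_pool_resize(img, target=224):
    """Downscale with a maximum window: each output pixel keeps the
    brightest value of its block, preserving bright anomaly regions
    that an averaging (bilinear) resize would wash out."""
    img = pad_to_square(img)
    side = img.shape[0]
    block = side // target          # assumes side >= target
    crop = block * target           # trim so the side divides evenly
    img = img[:crop, :crop]
    return img.reshape(target, block, target, block).max(axis=(1, 3))
```

For example, downscaling a 448 × 448 radiograph to 224 × 224 keeps, for every 2 × 2 block, its brightest pixel, which is where secretion-filled alveoli tend to appear.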
Fig. 3
Examples of resizing, COVID-19 type image with its original dimensions (a); Standard Resize Bilinear Interpolation (b); Resize produced using the Max Pool (c). Visualization of the preservation of the alveoli containing secretions: we can visualize the heat map of the cuts in (d), (e) and (f).
To better understand how the alveoli filled with secretions are preserved, we generated a heat map of the crops (Figs. 3a, 3b, and 3c). The result can be seen in Fig. 3d. We observe in Fig. 3f that the characteristics that discriminate COVID-19 are better preserved than by the standard resize method shown in Fig. 3e.
Data augmentation
One of the most significant difficulties in the chest X-ray classification problem is the scarcity of images, which makes the datasets unbalanced. To overcome this problem, we adopted data augmentation, which encompasses techniques that increase the size and quality of training datasets. In this way, deep learning models can improve their learning by enhancing generalization.

In this research, we used six data augmentation techniques: 1) image rotation, varying from 0° to 6°, clockwise and counterclockwise; these values were chosen because they preserve the natural characteristics of the image, mainly of the lung; 2) defocusing using a median filter with 3×3, 5×5, and 7×7 kernels, as larger values lose important information; 3) zoom ranging from 0% to 30%, since larger values are inadequate for this type of image, risking the loss of essential parts such as pieces of the lung; 4) mirroring; 5) height and width shifts ranging from 0% to 10%; and 6) brightness increases from 10% to 20%. We also combined more than one operation randomly. Fig. 4
exemplifies the data augmentation.
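Three of the simpler operations (mirroring, shifting, and brightness change) can be sketched in NumPy as below; rotation and median-filter defocusing would typically rely on a library such as SciPy (scipy.ndimage.rotate and median_filter), which we omit here. Function names and the zero-fill behavior are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def mirror(img):
    """Horizontal mirroring (left-right flip)."""
    return img[:, ::-1]

def shift(img, dy, dx):
    """Shift the image down/right by fractions dy, dx of its size,
    filling vacated pixels with zeros (black)."""
    h, w = img.shape
    py, px = int(h * dy), int(w * dx)
    out = np.zeros_like(img)
    out[py:, px:] = img[:h - py, :w - px]
    return out

def brighten(img, factor):
    """Scale intensities, clipping to the valid 8-bit range."""
    return np.clip(img * factor, 0, 255).astype(img.dtype)
```

Randomly composing several such operations per sample, as described above, multiplies the effective size of the minority classes.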
Fig. 4
Examples of data augmentation used in our method. (a) Original image; (b) negative zoom, rotation and mirroring; (c) zoom and rotation; (d) zoom and rotation; (e) zoom and mirroring; (f) zoom, mirroring; (g) negative zoom, mirroring, rotation and change of track; (h) negative zoom and change of track, (i) zoom and rotation; (j) mirroring, rotation and change of lane; (k) rotation and negative zoom; (l) mirroring, zoom and rotation; (m) defocusing; (n) negative zoom and change of track; (o) mirroring and negative zoom.
Convolutional neural network
In this work, we use CNNs for the classification of chest X-ray images. This choice was due to their power of generalization in the classification of medical images demonstrated in recent years [11], [12], [18]. Our literature survey also showed that CNNs are the best-performing method for classifying COVID-19 in digital images.

CNNs are a particular type of multilayered neural network that can be trained with a version of the backpropagation algorithm [28]. CNNs differ from other neural networks only in architecture; they add convolution layers responsible for highlighting characteristics that can facilitate image classification. They are designed to recognize visual patterns from pixel images and are inspired by biological processes: their connectivity pattern between neurons mimics the organization of the visual cortex of animals.

Fig. 5 shows the CNN process of extracting features from the input image. Convolution filters are applied to extract characteristics, and the pooling layer reduces the image's dimensions. This process can be repeated n times depending on the architecture. In the end, a feature vector is generated and used by the multilayer neural network, also known as the fully connected layers, to classify the input.
Fig. 5
Generic CNN architecture based on LeCun et al. [28]. It illustrates the main processes of a CNN and its layers.
This work applied seven different CNN architectures: DenseNet201 (DN201), proposed by Huang et al. [29]; InceptionResNetV2 (IRNV2), proposed by Szegedy et al. [30]; InceptionV3 (IV3), proposed by Szegedy et al. [31]; NASNetLarge (NASNL), proposed by Zoph et al. [32]; ResNet50 (RN50), proposed by He et al. [33]; VGG16, proposed by Simonyan and Zisserman [34]; and Xception (XC), proposed by Chollet [35]. We chose these architectures based on their performance in the ImageNet challenge [36], a database with more than one million images across one thousand classes used to assess architecture performance.
Fine-tuning
The term fine-tuning is borrowed from theoretical physics, which studies how a model's parameters must be adjusted very precisely to meet specific requirements [37]. The study in [36] observed that transfer learning using networks pre-trained on non-medical images, like ImageNet, performed better than networks trained from scratch for classification problems in X-ray images. It has also been reported that pre-training with grayscale images results in better accuracy than training with color images when using transfer learning for grayscale chest radiographs.

Tajbakhsh et al. [38] showed that training a CNN from scratch is complex, as it requires a large amount of data to guarantee adequate convergence. An alternative is to fine-tune a pre-trained CNN. Analyzing radiological images of cardiology and gastroenterology in classification, detection, and segmentation tasks, they reached four conclusions: a pre-trained CNN with adequate fine-tuning surpassed, or at least matched, a CNN trained from scratch; fine-tuned CNNs were more robust to the size of the training set than CNNs trained from scratch; neither shallow nor deep tuning alone was the ideal choice for every application; and their layer-wise fine-tuning scheme could offer a practical way to achieve the best performance based on the amount of available data.

When using pre-trained CNNs to classify X-ray images, we realized the need to adjust the models. Fine-tuning uses transfer learning, in which the architecture is pre-trained on another dataset; in our work, we use ImageNet [36]. Fine-tuning consists of removing the fully connected layers and inserting a new set of layers with randomly initialized weights. The fine adjustment then retrains only the new fully connected layers, with a low learning rate to adjust the parameters.
Convolutional layers are usually frozen, so their kernels are not trainable. However, based on the number of COVID-19 images we obtained (Table 1), we chose to perform Deep-Fine-Tuning: no convolutional layers are frozen, enabling the adjustment of every model parameter.
Metrics
To evaluate the performance of our method, we use k-fold cross-validation. In this approach, samples are randomly divided into k sets of equal size. In each fold, a single one of the k sets is retained as validation data, and the remaining k-1 sets are used as training data. This process is repeated k times, with each of the k sets used exactly once for validation. The results of all k folds are then averaged to evaluate the model, and this average performance is used as the evaluation index. This approach is computationally expensive but allows testing on the entire dataset. The technique is especially suitable when the number of samples per class is unbalanced, and it also demonstrates how well the trained model generalizes to unseen data.

To analyze our methodology's performance, we use k = 10 in the cross-validation. For each fold, we use the following metrics: Kappa index, Accuracy (Acc), F-Score, area under the ROC curve (AUC), Sensitivity, and Specificity. These metrics are computed in terms of True Positives (TP), False Positives (FP), True Negatives (TN), and False Negatives (FN).

The Kappa index (κ) is used as a measure to summarize the confusion matrix [39]; for this reason, it was the metric used to select the best models. This index is an agreement coefficient for nominal scales that measures the relationship between observed and chance agreement. It can be computed as in Eq. 1:

κ = (p_o - p_e) / (1 - p_e),    (1)

where

p_o = (TP + TN) / (TP + TN + FP + FN)

is the observed agreement, and

p_e = [(TP + FP)(TP + FN) + (TN + FN)(TN + FP)] / (TP + TN + FP + FN)^2

is the agreement expected by chance. A practical interpretation of the Kappa values can be found in Table 2
.
Table 2
Kappa index agreement.
Value of Kappa   Level of Agreement   % of Data that are Reliable
0 - 0.20         None                 0 - 4%
0.21 - 0.39      Minimal              4 - 15%
0.40 - 0.59      Weak                 15 - 35%
0.60 - 0.79      Moderate             35 - 63%
0.80 - 0.90      Strong               64 - 81%
Above 0.90       Almost Perfect       82 - 100%
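For a binary problem, the Kappa index of Eq. 1 can be computed directly from the confusion-matrix counts. The snippet below is a minimal sketch (the helper name is ours) following the observed/expected agreement definitions; the paper's multi-class experiments generalize the same formula to larger confusion matrices.

```python
def kappa(tp, fp, tn, fn):
    """Cohen's kappa from binary confusion-matrix counts."""
    n = tp + fp + tn + fn
    p_o = (tp + tn) / n                       # observed agreement
    p_e = ((tp + fp) * (tp + fn) +            # chance agreement from the
           (tn + fn) * (tn + fp)) / n ** 2    # row/column marginals
    return (p_o - p_e) / (1 - p_e)
```

A perfect classifier on balanced classes yields kappa = 1, while one indistinguishable from chance yields kappa near 0, matching the agreement levels in Table 2.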
Experiments and results
Experiments
Using the three datasets detailed in Section 3.1, we performed eleven experiments. Our goal was to explore the datasets using the available information to identify pulmonary diseases. The first set of experiments was carried out using the COVID-DB and NIH Chest X-ray datasets, in which five experiments were performed. Four experiments were performed with the COVID-DB and Pneumonia Chest X-ray datasets, which allowed us to analyze the method's behavior on different types of pneumonia. We also experimented with the three datasets simultaneously: COVID-DB, NIH Chest X-ray, and Pneumonia Chest X-ray. In this experiment, we analyzed the classification behavior for images with COVID-19 and all Pneumonia, Normal, and No-Finding samples. The last experiment was the classification of the COVID-19 class of the COVID-DB dataset and the Pneumonia class of the NIH Chest X-ray dataset, redone using the bilinear interpolation resizing technique, which allowed us to quantify the gain that our resizing method brought to this classification problem. Table 3
summarizes all the adopted experiments.
Table 3
Summary of the experiments carried out in our research.
Datasets and experiment classes:

COVID-DB & NIH Chest X-ray:
- COVID-19 & Pneumonia
- COVID-19 & Pneumonia with Comorbidity
- COVID-19 & No Finding
- COVID-19 & No Finding & Pneumonia
- COVID-19 & No Finding & Pneumonia with Comorbidity

COVID-DB & Pneumonia Chest X-ray:
- COVID-19 & Normal
- COVID-19 & Bacterial Pneumonia
- COVID-19 & Viral Pneumonia
- COVID-19 & Normal & Viral Pneumonia & Bacterial Pneumonia

COVID-DB & NIH Chest X-ray (validation of the max pool resizing method):
- COVID-19 & Pneumonia & No Finding
These experiments allowed us to evaluate the proposed method in several scenarios. All of them were carried out using 10-fold cross-validation, with Kappa, Acc, F-Score, AUC, Sensitivity, and Specificity as performance metrics.
Results
In this section, we report the results obtained for each experiment (Tables 4–6). We also evaluate the proposed resize method compared with bilinear interpolation (Table 7). Finally, we compare our results with those in the state-of-the-art (Table 8).
Table 4
Results obtained over all the experiments carried out to detect COVID-19 in X-ray images in the NIH Chest X-ray image set.
Method | Kappa | Acc | F-Score | AUC | Sensitivity | Specificity

COVID-19 Pneumonia & Pneumonia
DN201 | 0.872±0.060 | 0.944±0.028 | 0.937±0.032 | 0.955±0.026 | 0.989±0.016 | 0.864±0.069
IRNV2 | 0.916±0.057 | 0.963±0.026 | 0.958±0.029 | 0.968±0.018 | 0.991±0.007 | 0.910±0.071
IV3 | 0.895±0.068 | 0.953±0.031 | 0.947±0.034 | 0.964±0.026 | 0.995±0.008 | 0.878±0.073
NASNL | 0.977±0.028 | 0.990±0.012 | 0.989±0.014 | 0.989±0.011 | 0.995±0.007 | 0.982±0.036
RN50 | 0.977±0.024 | 0.990±0.010 | 0.989±0.012 | 0.989±0.010 | 0.995±0.007 | 0.982±0.032
VGG16 | 0.970±0.032 | 0.987±0.014 | 0.985±0.016 | 0.985±0.016 | 0.992±0.010 | 0.978±0.029
XC | 0.894±0.059 | 0.953±0.027 | 0.947±0.030 | 0.964±0.020 | 0.995±0.007 | 0.890±0.051

COVID-19 Pneumonia & Pneumonia Comorbidity
DN201 | 0.896±0.029 | 0.951±0.014 | 0.948±0.015 | 0.940±0.016 | 0.992±0.008 | 0.931±0.022
IRNV2 | 0.926±0.028 | 0.965±0.013 | 0.963±0.014 | 0.958±0.016 | 0.989±0.010 | 0.955±0.022
IV3 | 0.897±0.025 | 0.951±0.012 | 0.948±0.013 | 0.941±0.014 | 0.991±0.013 | 0.929±0.016
NASNL | 0.951±0.031 | 0.977±0.015 | 0.975±0.015 | 0.972±0.018 | 0.993±0.021 | 0.957±0.030
RN50 | 0.966±0.022 | 0.983±0.010 | 0.983±0.011 | 0.981±0.013 | 0.990±0.012 | 0.975±0.026
VGG16 | 0.968±0.020 | 0.985±0.010 | 0.984±0.010 | 0.983±0.011 | 0.990±0.012 | 0.981±0.014
XC | 0.903±0.021 | 0.951±0.010 | 0.951±0.010 | 0.944±0.011 | 0.992±0.008 | 0.932±0.013

COVID-19 Pneumonia & No Finding
DN201 | 0.923±0.011 | 0.986±0.002 | 0.961±0.005 | 0.960±0.003 | 0.933±0.020 | 0.992±0.000
IRNV2 | 0.929±0.003 | 0.987±0.001 | 0.965±0.002 | 0.967±0.008 | 0.932±0.017 | 0.994±0.002
IV3 | 0.925±0.022 | 0.986±0.004 | 0.962±0.011 | 0.966±0.013 | 0.934±0.024 | 0.993±0.001
NASNL | 0.932±0.029 | 0.988±0.005 | 0.966±0.014 | 0.962±0.017 | 0.950±0.024 | 0.992±0.004
RN50 | 0.981±0.006 | 0.997±0.001 | 0.990±0.003 | 0.987±0.002 | 0.995±0.012 | 0.997±0.001
VGG16 | 0.969±0.005 | 0.994±0.001 | 0.985±0.003 | 0.984±0.007 | 0.972±0.010 | 0.997±0.001
XC | 0.923±0.023 | 0.986±0.004 | 0.961±0.012 | 0.960±0.013 | 0.928±0.032 | 0.993±0.002

COVID-19 Pneumonia & No Finding & Pneumonia
DN201 | 0.910±0.024 | 0.979±0.006 | 0.919±0.019 | 0.947±0.012 | 0.953±0.033 | 0.992±0.002
IRNV2 | 0.913±0.020 | 0.980±0.005 | 0.924±0.017 | 0.952±0.014 | 0.942±0.027 | 0.992±0.002
IV3 | 0.910±0.022 | 0.979±0.005 | 0.919±0.020 | 0.943±0.014 | 0.957±0.021 | 0.993±0.003
NASNL | 0.950±0.029 | 0.988±0.007 | 0.952±0.029 | 0.971±0.017 | 0.966±0.018 | 0.995±0.003
RN50 | 0.957±0.014 | 0.990±0.003 | 0.961±0.014 | 0.972±0.013 | 0.968±0.016 | 0.996±0.003
VGG16 | 0.957±0.015 | 0.990±0.004 | 0.957±0.015 | 0.974±0.010 | 0.981±0.022 | 0.994±0.004
XC | 0.908±0.029 | 0.979±0.007 | 0.919±0.024 | 0.947±0.015 | 0.945±0.028 | 0.992±0.004

COVID-19 Pneumonia & No Finding & Pneumonia Comorbidity
DN201 | 0.925±0.014 | 0.974±0.005 | 0.945±0.010 | 0.958±0.010 | 0.951±0.033 | 0.991±0.003
IRNV2 | 0.923±0.013 | 0.973±0.005 | 0.944±0.010 | 0.958±0.011 | 0.947±0.033 | 0.991±0.004
IV3 | 0.930±0.016 | 0.976±0.005 | 0.947±0.012 | 0.960±0.010 | 0.963±0.024 | 0.990±0.004
NASNL | 0.930±0.022 | 0.976±0.007 | 0.949±0.016 | 0.961±0.016 | 0.952±0.022 | 0.992±0.004
RN50 | 0.974±0.009 | 0.991±0.003 | 0.979±0.008 | 0.984±0.007 | 0.986±0.012 | 0.995±0.003
VGG16 | 0.975±0.009 | 0.991±0.003 | 0.980±0.008 | 0.983±0.007 | 0.983±0.018 | 0.996±0.003
XC | 0.928±0.017 | 0.976±0.006 | 0.948±0.014 | 0.959±0.012 | 0.959±0.026 | 0.991±0.004
Table 6
COVID-19 Pneumonia, Pneumonia, and Normal/No Finding classes in X-ray images using the combination of the COVID-DB, NIH Chest X-ray, and Pneumonia Chest X-ray datasets.
Method | Kappa | Acc | F-Score | AUC | Sensitivity | Specificity

COVID-19 Pneumonia & All Pneumonia & Normal/No Finding
DN201 | 0.953±0.015 | 0.974±0.009 | 0.965±0.009 | 0.978±0.009 | 0.929±0.029 | 0.997±0.003
IRNV2 | 0.952±0.025 | 0.973±0.014 | 0.967±0.013 | 0.978±0.011 | 0.936±0.023 | 0.998±0.003
IV3 | 0.952±0.016 | 0.973±0.009 | 0.965±0.010 | 0.975±0.010 | 0.946±0.034 | 0.997±0.003
NASNL | 0.957±0.021 | 0.976±0.012 | 0.971±0.012 | 0.981±0.006 | 0.942±0.046 | 0.998±0.003
RN50 | 0.939±0.019 | 0.966±0.010 | 0.954±0.010 | 0.972±0.009 | 0.894±0.025 | 0.998±0.002
VGG16 | 0.951±0.014 | 0.973±0.008 | 0.966±0.013 | 0.978±0.010 | 0.928±0.031 | 0.992±0.017
XC | 0.958±0.015 | 0.973±0.008 | 0.971±0.007 | 0.982±0.007 | 0.934±0.034 | 0.999±0.002
Table 7
Classification results for the COVID-19 Pneumonia, Pneumonia, and No Finding classes with the proposed max pool resizing method and with bilinear interpolation.
Method | Kappa | Acc | F-Score | AUC | Sensitivity | Specificity

COVID-19 Pneumonia & No Finding & Pneumonia

Bilinear interpolation
DN201 | 0.894±0.047 | 0.975±0.009 | 0.888±0.058 | 0.921±0.044 | 0.972±0.027 | 0.992±0.003
IRNV2 | 0.871±0.100 | 0.970±0.019 | 0.874±0.092 | 0.914±0.076 | 0.943±0.054 | 0.989±0.011
IV3 | 0.884±0.106 | 0.969±0.021 | 0.892±0.108 | 0.927±0.070 | 0.913±0.096 | 0.988±0.013
NASNL | 0.904±0.054 | 0.977±0.010 | 0.901±0.060 | 0.928±0.047 | 0.945±0.040 | 0.993±0.003
RN50 | 0.935±0.033 | 0.984±0.008 | 0.930±0.038 | 0.957±0.026 | 0.971±0.043 | 0.995±0.004
VGG16 | 0.936±0.023 | 0.984±0.007 | 0.931±0.028 | 0.958±0.016 | 0.948±0.027 | 0.995±0.003
XC | 0.919±0.022 | 0.969±0.006 | 0.924±0.033 | 0.949±0.022 | 0.966±0.027 | 0.992±0.005

Proposed Resize
DN201 | 0.910±0.024 | 0.979±0.006 | 0.919±0.019 | 0.947±0.012 | 0.953±0.033 | 0.992±0.002
IRNV2 | 0.913±0.020 | 0.980±0.005 | 0.924±0.017 | 0.952±0.014 | 0.942±0.027 | 0.992±0.002
IV3 | 0.910±0.022 | 0.979±0.005 | 0.919±0.020 | 0.943±0.014 | 0.957±0.021 | 0.993±0.003
NASNL | 0.950±0.029 | 0.988±0.007 | 0.952±0.029 | 0.971±0.017 | 0.966±0.018 | 0.995±0.003
RN50 | 0.957±0.014 | 0.990±0.003 | 0.961±0.014 | 0.972±0.013 | 0.968±0.016 | 0.996±0.003
VGG16 | 0.957±0.015 | 0.990±0.004 | 0.957±0.015 | 0.974±0.010 | 0.981±0.022 | 0.994±0.004
XC | 0.908±0.029 | 0.979±0.007 | 0.919±0.024 | 0.947±0.015 | 0.945±0.028 | 0.992±0.004
Table 8
State-of-the-art results for COVID-19 Pneumonia detection in comparison with our methodology.
Methods | Number of Images | Kappa | Acc | F-Score | AUC | Sensitivity | Specificity

Normal & COVID-19
[13] | 721/403 | - | 0.960 | - | - | 0.900 | 0.970
[12] | 1,579/423 | - | 0.997 | 0.997 | - | 0.996 | 0.995
[17] | 1,538/305 | - | 0.974 | 0.971 | 0.969 | - | 0.947
Ours | 1,583/717 | 0.991 | 0.996 | 0.995 | 0.995 | 0.996 | 0.997

Pneumonia & COVID-19
[10] | 11,002/314 | - | 0.910 | 0.930 | 0.960 | 0.850 | 0.950
Ours | 307/717 | 0.977 | 0.990 | 0.989 | 0.989 | 0.995 | 0.982

Normal & Pneumonia & COVID-19
[10] | 1,583/11,002/314 | - | 0.930 | 0.940 | 0.950 | - | -
[11] | 65/98/295 | - | 0.993 | 0.993 | - | - | -
[14] | 3,520/2,753/250 | - | 0.960 | 0.940 | - | 0.960 | 0.980
[19] | 8,851/6,054/180 | - | 0.914 | - | - | - | -
Ours | 1,583/4,275/717 | 0.993 | 0.998 | 0.996 | 0.997 | 0.992 | 0.999

Viral Pneumonia & COVID-19
[17] | 1,493/305 | - | 0.870 | 0.870 | 0.920 | - | 0.850
Ours | 1,493/717 | 0.995 | 0.998 | 0.997 | 0.998 | 0.999 | 0.999

Bacterial Pneumonia & COVID-19
[17] | 2,780/305 | - | 0.940 | 0.930 | 0.950 | - | 0.930
Ours | 2,782/717 | 0.996 | 0.999 | 0.998 | 0.997 | 0.998 | 0.998

Normal & Viral & Bacterial & COVID-19
[10] | 1,583/1,493/2,780/314 | - | 0.930 | 0.940 | 0.950 | - | -
[15] | 310/327/320/284 | - | 0.900 | 0.890 | - | - | 0.960
[17] | 1,583/1,493/2,780/305 | - | 0.900 | 0.900 | 0.910 | - | 0.890
Ours | 1,583/1,493/2,782/717 | 0.993 | 0.998 | 0.996 | 0.997 | 0.992 | 0.999

No Finding & COVID-19
[16] | 364/181 | - | 0.963 | - | - | - | -
[18] | 500/127 | - | 0.980 | 0.965 | - | 0.951 | 0.953
Ours | 60,412/717 | 0.981 | 0.997 | 0.990 | 0.987 | 0.995 | 0.997

No Finding & Pneumonia & COVID-19
[18] | 500/500/127 | - | 0.870 | 0.873 | - | 0.853 | 0.8212
Ours | 60,412/307/717 | 0.957 | 0.990 | 0.961 | 0.974 | 0.981 | 0.996
Table 4 shows the results obtained in the experiments performed with the COVID-DB and NIH Chest X-ray datasets. Corroborating the literature, the ResNet50 and VGG16 architectures present better results in the classification of radiographic images containing COVID-19 than the other architectures; the NASNetLarge architecture also performed well, owing to a regularization technique called ScheduledDropPath that significantly improves generalization [32].

The classification between COVID-19 Pneumonia and Pneumonia with Comorbidity shows how well the VGG16 and ResNet50 models generalize on these images. VGG16 obtained good results both because of the way it organizes its convolution and pooling layers in stacks and because it is shallower than the other architectures, which makes it less complex and easier to generalize from few image samples. ResNet50, in turn, mitigates the vanishing gradient by adding the so-called "identity shortcut connection" pointed out by He et al. [33]. These classifications are essential since they simulate real situations, yet they are complex because the two classes share many characteristics, which requires adequate generalization from the models. The sensitivity and specificity results show that the method can solve the problem.

In the classification between COVID-19 and No Finding, ResNet50 presented the best results, with a Kappa of 0.981 and an accuracy of 0.997. The other networks also achieved Kappa values above 0.92. The images in this experiment are anatomically more distant than those in the COVID-19 Pneumonia vs. Pneumonia experiments, for example, which is why it had the highest Kappa reported in Table 4.

The experiments with the classes COVID-19 Pneumonia, No Finding, and Pneumonia are more complicated because there are two nearby classes (COVID-19 and Pneumonia) and a distant one (No Finding). Even in this scenario, we reached a Kappa rated as almost perfect according to the agreement table. It is interesting to note that the Pneumonia Comorbidity class combines very different characteristics, since the images of patients with pneumonia also contain other diseases.

Table 5 shows the results obtained in the experiments carried out with the COVID-DB and Pneumonia Chest X-ray datasets. In these experiments, we evaluated the proposed method on more specific pneumonia cases, viral and bacterial. The scenario with four classes (COVID-19 Pneumonia, Normal, Viral Pneumonia, and Bacterial Pneumonia) reached a Kappa of 0.993 and a Specificity of 0.999 using the NASNetLarge architecture. These results can be explained by the distinct visual patterns of bacterial, viral, and COVID-19 pneumonia. Viral pneumonia usually appears in X-ray images as diffuse interstitial patterns in both lungs, while bacterial pneumonia produces areas of focal lobar consolidation [40], [41]. Comparing COVID-19 with other viral pneumonias, SARS-CoV-2 attacks lung cells, triggering an exacerbated protective response from the immune system. This process increases neutrophils, serum Interleukin-6, and C-reactive protein, decreases total lymphocytes, and sharply increases pro-inflammatory cytokines and chemokines; it is more pronounced in patients with COVID-19 than in other types of viral pneumonia. These conditions correlate with the severity and mortality of COVID-19, as they enhance inflammation [42], [43]. This implies more defined lesion characteristics on the radiographs: COVID-19 patients are more likely to present greater ground-glass opacity than patients with other viral pneumonias such as, for example, H1N1 [43], [44], [45]. The results show that our proposal can differentiate patients into more specific clinical conditions, making it possible to reach proper clinical treatment faster, a crucial point for any CAD system.
Table 5
Results obtained throughout all the experiments carried out using COVID-DB and Pneumonia Chest X-ray datasets.
Method | Kappa | Acc | F-Score | AUC | Sensitivity | Specificity

COVID-19 Pneumonia & Normal
DN201 | 0.982±0.011 | 0.992±0.005 | 0.991±0.006 | 0.992±0.006 | 0.984±0.012 | 0.996±0.004
IRNV2 | 0.980±0.017 | 0.991±0.007 | 0.990±0.008 | 0.992±0.007 | 0.978±0.018 | 0.997±0.003
IV3 | 0.988±0.015 | 0.995±0.006 | 0.994±0.007 | 0.994±0.007 | 0.991±0.016 | 0.997±0.005
NASNL | 0.983±0.013 | 0.993±0.005 | 0.991±0.006 | 0.992±0.006 | 0.985±0.015 | 0.996±0.004
RN50 | 0.991±0.009 | 0.996±0.004 | 0.995±0.005 | 0.995±0.005 | 0.996±0.007 | 0.996±0.004
VGG16 | 0.987±0.014 | 0.994±0.006 | 0.993±0.007 | 0.992±0.008 | 0.996±0.009 | 0.994±0.007
XC | 0.983±0.011 | 0.995±0.005 | 0.991±0.005 | 0.992±0.004 | 0.986±0.013 | 0.996±0.003

COVID-19 Pneumonia & Bacterial Pneumonia
DN201 | 0.992±0.007 | 0.997±0.002 | 0.996±0.003 | 0.995±0.004 | 0.997±0.006 | 0.995±0.005
IRNV2 | 0.991±0.007 | 0.997±0.002 | 0.996±0.003 | 0.994±0.004 | 0.997±0.006 | 0.995±0.006
IV3 | 0.993±0.007 | 0.998±0.002 | 0.997±0.004 | 0.996±0.004 | 0.997±0.006 | 0.996±0.006
NASNL | 0.996±0.006 | 0.999±0.002 | 0.998±0.003 | 0.997±0.004 | 0.998±0.005 | 0.998±0.005
RN50 | 0.991±0.008 | 0.997±0.002 | 0.996±0.004 | 0.994±0.005 | 0.997±0.006 | 0.995±0.006
VGG16 | 0.991±0.006 | 0.997±0.002 | 0.996±0.003 | 0.994±0.003 | 0.997±0.006 | 0.995±0.005
XC | 0.994±0.006 | 0.998±0.002 | 0.997±0.003 | 0.996±0.004 | 0.998±0.005 | 0.997±0.004

COVID-19 Pneumonia & Viral Pneumonia
DN201 | 0.991±0.009 | 0.996±0.004 | 0.995±0.004 | 0.996±0.004 | 0.993±0.012 | 0.997±0.003
IRNV2 | 0.993±0.009 | 0.997±0.004 | 0.997±0.004 | 0.997±0.004 | 0.994±0.012 | 0.999±0.003
IV3 | 0.992±0.009 | 0.996±0.004 | 0.996±0.005 | 0.996±0.005 | 0.995±0.011 | 0.997±0.005
NASNL | 0.995±0.009 | 0.998±0.004 | 0.997±0.004 | 0.998±0.003 | 0.995±0.011 | 0.999±0.002
RN50 | 0.991±0.006 | 0.996±0.003 | 0.995±0.003 | 0.995±0.003 | 0.996±0.007 | 0.996±0.003
VGG16 | 0.993±0.005 | 0.997±0.002 | 0.996±0.002 | 0.996±0.003 | 0.999±0.004 | 0.996±0.003
XC | 0.991±0.008 | 0.996±0.003 | 0.995±0.004 | 0.994±0.005 | 0.997±0.006 | 0.995±0.005

COVID-19 Pneumonia & Normal & Viral & Bacterial
DN201 | 0.977±0.011 | 0.995±0.003 | 0.988±0.006 | 0.992±0.005 | 0.970±0.020 | 0.998±0.002
IRNV2 | 0.984±0.006 | 0.996±0.001 | 0.992±0.003 | 0.994±0.004 | 0.980±0.012 | 0.998±0.002
IV3 | 0.981±0.008 | 0.995±0.002 | 0.991±0.004 | 0.994±0.005 | 0.975±0.011 | 0.999±0.002
NASNL | 0.993±0.007 | 0.998±0.002 | 0.996±0.004 | 0.997±0.003 | 0.992±0.011 | 0.999±0.001
RN50 | 0.984±0.009 | 0.996±0.002 | 0.992±0.004 | 0.994±0.006 | 0.982±0.013 | 0.998±0.002
VGG16 | 0.985±0.010 | 0.996±0.002 | 0.993±0.005 | 0.994±0.006 | 0.983±0.013 | 0.998±0.002
XC | 0.979±0.008 | 0.995±0.002 | 0.990±0.004 | 0.992±0.004 | - | -
Table 6 shows the results obtained in the experiment performed with the COVID-DB, NIH Chest X-ray, and Pneumonia Chest X-ray datasets. In this scenario, the experiment classes are: COVID-19 Pneumonia, composed of all images in COVID-DB; All Pneumonia, composed of all pneumonia images in the NIH Chest X-ray and Pneumonia Chest X-ray datasets; and Normal/No Finding, selected from the Pneumonia Chest X-ray and NIH Chest X-ray datasets, respectively. This experiment therefore evaluated the proposal's performance using all available chest X-ray images. The best result was obtained using the Xception architecture, with Kappa = 0.958. All other CNNs obtained Kappa rates above 0.930, showing the robustness of the proposed method.

To illustrate the gain of the proposed resizing method, we compared it with bilinear interpolation, a widely used method available in several libraries, such as Scikit-Image, TensorFlow, and PyTorch. The experiment used the COVID-DB and NIH Chest X-ray datasets with the COVID-19 Pneumonia, No Finding, and Pneumonia classes. We chose this experiment based on the results presented in Table 4, as it was the most challenging one for the proposed methodology (lowest Kappa in the best scenarios). We present the results in Table 7. Using our proposed method to resize the X-ray images improved six out of the seven architectures; the exception was Xception, whose Kappa was 0.919 with bilinear interpolation against 0.908 with our methodology. Kappa's most significant increase, of 0.053, was obtained with the NASNetLarge architecture. It is also possible to observe that our proposed resizing method had a lower standard deviation than bilinear interpolation, indicating the proposal's higher robustness.

Fig. 6 presents the distribution of the 10 folds of the cross-validation. We chose the Kappa index because it improves the perception of unbalanced class combinations. Besides providing the best results, our method also shows more uniformity across the ten folds in the boxplot. Our best results were achieved with VGG16 and ResNet50.
Fig. 6
Boxplot of the experiments of Table 7 using Kappa index to compare the bilinear interpolation with the proposed resizing method.
Table 8 compares our results with those presented in the state-of-the-art. Our method outperformed the state-of-the-art in all scenarios except for the Normal and COVID-19 Pneumonia classes, where the best accuracy and F-Score were achieved by [12]. However, our work presented superior specificity and the same sensitivity, and there is no significant difference in any of the metrics, with a disparity no higher than 0.02%. Further, the work of Chowdhury et al. [12] has two points of improvement: (1) the number of images used is smaller than ours; and (2) the authors did not use cross-validation, a basic methodology, especially considering unbalanced datasets. Regarding the COVID-19 Pneumonia and Viral Pneumonia experiment, our proposal represents an improvement of 14.71% in Acc compared with the method in [17].
Visual results
CNNs are known as black boxes because it is not easy to explain their excellent results. Selvaraju et al. [46] proposed Gradient-weighted Class Activation Mapping (Grad-CAM), a methodology that makes it possible to visualize important regions after the learning process; it uses class-specific gradient information to locate essential regions. Fig. 7 presents Grad-CAM applied to our methodology. It shows that the features are obtained from the lung regions where the disease appears. Thereby, outside artifacts, such as markers, patient information, and the equipment used to obtain the image, are not responsible for our high performance rates.
Fig. 7
Examples of Grad-CAM extracted from our method. Left (Figures (a), (c) and (e)) are examples of Pneumonia, Viral Pneumonia, and Bacterial Pneumonia, respectively. Right (Figures (b), (d) and (f)) are examples of COVID-19 X-ray images. For each one, we present the original image and its feature map computed with Grad-CAM, where red regions produce more important features than blue ones. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
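The Grad-CAM computation used to produce maps like these can be sketched as follows. This is a minimal illustration in the spirit of Selvaraju et al. [46], written with TensorFlow/Keras; the VGG16 layer name is Keras' own, but the rest (random input, untrained weights) is purely for demonstration.

```python
# Minimal Grad-CAM sketch: weight each feature map of the last conv layer
# by the gradient of the class score, then combine and ReLU.
import numpy as np
import tensorflow as tf

def grad_cam(model, image, last_conv_layer_name, class_index=None):
    """Return a [0, 1] heatmap of the regions driving the prediction."""
    grad_model = tf.keras.Model(
        model.inputs,
        [model.get_layer(last_conv_layer_name).output, model.output])
    with tf.GradientTape() as tape:
        conv_out, preds = grad_model(image[np.newaxis, ...])
        if class_index is None:
            class_index = int(tf.argmax(preds[0]))
        class_score = preds[:, class_index]
    grads = tape.gradient(class_score, conv_out)
    # Global-average-pool the gradients to get one weight per feature map...
    weights = tf.reduce_mean(grads, axis=(0, 1, 2))
    # ...then combine the maps, keeping only positive influence (ReLU).
    cam = tf.nn.relu(tf.reduce_sum(conv_out[0] * weights, axis=-1))
    cam = cam / (tf.reduce_max(cam) + 1e-8)  # normalize to [0, 1]
    return cam.numpy()

# Demo with VGG16, whose last conv layer is named 'block5_conv3' in Keras.
# weights=None keeps the demo offline; a trained model is needed in practice.
model = tf.keras.applications.VGG16(weights=None)
img = np.random.rand(224, 224, 3).astype("float32")
heatmap = grad_cam(model, img, "block5_conv3")
print(heatmap.shape)  # coarse 14x14 map, upsampled over the X-ray for display
```

For display, the coarse map is upsampled to the input size and overlaid on the X-ray, as in Fig. 7.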
To explain why the proposed resize method achieved better results than bilinear interpolation, Fig. 8 compares how the two methodologies impact CNN learning. Fig. 8 shows the same X-ray image of a COVID-19 patient (Fig. 8(a)) resized with the proposed method and with the bilinear interpolation algorithm (Fig. 8(b)). Fig. 8(c) shows the filters of the last convolutional layer of VGG16, according to the experiment reported in Table 7. We highlight 36 of the 512 filters, ordered by the highest activation value, with common positions for each resizing method. Despite the visual similarities between the two methods' filters, there are distinct regions; the selected filters thus suggest that resizing impacts the feature extraction. The output images show two points: (1) the attribute maps activate different regions for the two resizing methods; and (2) the proposed method leads the CNN to focus on the region of interest. The max window allows the CNN to recognize the regions of the injuries caused by pneumonia, which are characterized by high-intensity regions in X-ray images [27] and are preserved by the maximum window method. Bilinear interpolation, on the other hand, does not guarantee that such characteristics will be preserved, since it uses an average window.
Fig. 8
Comparison of the impact of the two resize methods on CNN feature extraction. (a) original image; (b) resizing with bilinear interpolation and the proposed method; (c) 36 filters with highest activation value with common positions for each resize method; (d) Grad-CAMs corresponding to both resize methods.
Comparison of the impact of the two resize methods on CNN feature extraction. (a) original image; (b) resizing with bilinear interpolation and the proposed method; (c) 36 filters with highest activation value with common positions for each resize method; (d) Grad-CAMs corresponding to both resize methods.
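The contrast between the two resizing strategies can be sketched as below. The windowing logic here is our illustration of the idea (a max-pooling reduction versus an averaging interpolation), not the authors' exact implementation; the toy image and shapes are assumptions.

```python
# Sketch: max-window downscaling keeps the MAXIMUM of each source window,
# preserving high-intensity (lesion-like) pixels, whereas bilinear
# interpolation averages neighboring pixels and can dilute them.
import numpy as np
from skimage.measure import block_reduce
from skimage.transform import resize

def max_window_resize(img, out_shape):
    """Downscale `img` to `out_shape` with a max-pooling window.

    Assumes the input is larger than the output; for simplicity the image
    is first cropped to an exact multiple of the window size.
    """
    fy = img.shape[0] // out_shape[0]
    fx = img.shape[1] // out_shape[1]
    img = img[:fy * out_shape[0], :fx * out_shape[1]]
    pooled = block_reduce(img, block_size=(fy, fx), func=np.max)
    # If cropping did not land exactly on out_shape, finish with a plain
    # resize of the (already max-pooled) map.
    return resize(pooled, out_shape, preserve_range=True)

rng = np.random.default_rng(0)
xray = rng.random((1024, 1024))  # stand-in for a chest X-ray

max_resized = max_window_resize(xray, (224, 224))
bilinear = resize(xray, (224, 224), order=1, preserve_range=True)

# Max pooling keeps bright structures: its mean intensity is higher than
# the average-based bilinear result on the same image.
print(max_resized.shape, max_resized.mean() >= bilinear.mean())
```

This is why, in Fig. 8(d), the Grad-CAMs of the max-window input concentrate on the bright lesion regions that bilinear averaging attenuates.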
Challenges and limitations
In the image classification process, we noticed overfitting caused by the depth of the models. To overcome this problem, we adopted a smaller set of parameters for each new set of fully connected layers. In binary classifications, we set 1024 neurons for the input layer and 128 for the hidden layer, which decreased the required processing power and the convergence time. When there are more than two classes, we adopted 2048 neurons for the fully connected input layer to capture more information.

We observed that when classifying two classes, using the Sigmoid function in the output layer improved the results more quickly; when the classification comprised more than two classes, Softmax performed better. We also employed a dropout rate of 40% to avoid overfitting.

The classification of lung diseases is a challenge, as the diseases included in our study have similar symptoms, and signs of a lung infection may share similar characteristics. We also observed some images of patients positive for COVID-19 Pneumonia with few signs of infection in the lung. Another factor that makes classification difficult is the low quality of some images. Examples can be seen in Fig. 9.
Fig. 9
Misclassified images of our methodology: COVID-19 (a), (b), (c), and (d); Viral Pneumonia (e); Bacterial Pneumonia (f); Pneumonia (g); and No Finding (h).
Figs. 9(a)-9(d) show examples of X-ray images of patients with COVID-19 that our methodology misclassified. Figs. 9(e)-9(h) show misclassified X-ray images from patients with viral pneumonia, bacterial pneumonia, pneumonia, and no finding, respectively.
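The fully connected heads described earlier in this section can be sketched as follows. This is a hypothetical Keras assembly based on the hyperparameters stated in the text (1024/128 neurons and sigmoid for binary, 2048 neurons and softmax for multi-class, 40% dropout); the authors' exact code is not shown in the paper.

```python
# Sketch of the classification head on top of a pre-trained backbone
# (transfer learning: the convolutional base is frozen).
import tensorflow as tf

def build_head(backbone, n_classes):
    backbone.trainable = False  # freeze convolutional base
    x = tf.keras.layers.GlobalAveragePooling2D()(backbone.output)
    if n_classes == 2:
        # Binary case: 1024 -> 128 dense layers, sigmoid output
        x = tf.keras.layers.Dense(1024, activation="relu")(x)
        x = tf.keras.layers.Dropout(0.4)(x)  # 40% dropout against overfitting
        x = tf.keras.layers.Dense(128, activation="relu")(x)
        out = tf.keras.layers.Dense(1, activation="sigmoid")(x)
    else:
        # Multi-class case: wider 2048-unit input layer, softmax output
        x = tf.keras.layers.Dense(2048, activation="relu")(x)
        x = tf.keras.layers.Dropout(0.4)(x)
        out = tf.keras.layers.Dense(n_classes, activation="softmax")(x)
    return tf.keras.Model(backbone.input, out)

# Demo with ResNet50; weights=None keeps the example offline
# (in practice, ImageNet weights would be loaded for transfer learning).
backbone = tf.keras.applications.ResNet50(include_top=False, weights=None,
                                          input_shape=(224, 224, 3))
model = build_head(backbone, n_classes=3)
model.compile(optimizer="adam", loss="categorical_crossentropy")
print(model.output_shape)
```

Swapping the backbone for VGG16, DenseNet201, etc. reuses the same head logic across all seven architectures.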
Conclusions and future works
Our study shows that deep learning classification methods could screen patients with pulmonary diseases, particularly COVID-19, since the standard PCR exam is scarce and expensive. Therefore, even though it cannot replace PCR or blood tests, it can be used to control the number of people being examined and, consequently, bring more agility and precision to health professionals' work. X-ray image classification to identify pneumonia is a complex and vital task, especially in a COVID-19 pandemic scenario: first, because pneumonia caused by COVID-19 shares many aspects with other types of pneumonia; second, because there is a scarcity of images of COVID-19 pneumonia, which are essential to deploy a CAD system to support the COVID-19 diagnosis.

In this work, we presented an image resizing method based on the max pool function that improved the classification of pulmonary diseases in chest X-ray images. Our method was evaluated through eleven experiments related to COVID-19 pneumonia image classification. The most important experiment was classifying the classes COVID-19, Pneumonia, and No Finding/Normal, given the real possibility of a patient reaching a hospital with pneumonia symptoms and being a positive case of COVID-19. Our experiment shows that it is possible to classify this scenario, thus serving as a filter for which patients need the PCR exam.

Our work carried out more experiments than the others found in the state-of-the-art. When comparing similar experiments, our method proved to be more robust and effective in practically all scenarios.

Finally, as future work, we intend to segment the lung region, investigate COVID-19 pneumonia detection in early stages, and evaluate the detection method on larger datasets, resulting in a more realistic scenario.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.