
COVID-19 detection on Chest X-ray images: A comparison of CNN architectures and ensembles.

Fabricio Aparecido Breve

Abstract

COVID-19 became a global pandemic only four months after its first detection. It is crucial to detect this disease as soon as possible to decrease its spread. The use of chest X-ray (CXR) images became an effective screening strategy, complementary to the reverse transcription-polymerase chain reaction (RT-PCR). Convolutional neural networks (CNNs) are often used for automatic image classification and they can be very useful in CXR diagnostics. In this paper, 21 different CNN architectures are tested and compared in the task of identifying COVID-19 in CXR images. They were applied to the COVIDx8B dataset, a large COVID-19 dataset with 16,352 CXR images coming from patients from at least 51 countries. Ensembles of CNNs were also employed and they showed better efficacy than individual instances. The best individual CNN instance results were achieved by DenseNet169, with an accuracy of 98.15% and an F1 score of 98.12%. These were further increased to 99.25% and 99.24%, respectively, through an ensemble with five instances of DenseNet169. These results are higher than those obtained in recent works using the same dataset.
© 2022 Elsevier Ltd. All rights reserved.

Keywords:  Chest X-ray images; Convolutional neural networks; Transfer learning

Year:  2022        PMID: 35615621      PMCID: PMC9122742          DOI: 10.1016/j.eswa.2022.117549

Source DB:  PubMed          Journal:  Expert Syst Appl        ISSN: 0957-4174            Impact factor:   8.665


Introduction

COVID-19 is an infectious disease caused by the Severe Acute Respiratory Syndrome CoronaVirus 2 (SARS-CoV-2) (Khan et al., 2021). It became a global pandemic less than four months after its first detection in December 2019 in Wuhan, China (Monshi et al., 2021). As of February 2022, over 434 million confirmed cases and almost 6 million deaths had been reported to the World Health Organization (World Health Organization, 2022). Early detection of positive COVID-19 cases is critical for avoiding the virus’s spread. The most common technique for diagnosing COVID-19 is the reverse transcription-polymerase chain reaction (RT-PCR). It detects SARS-CoV-2 in respiratory specimens collected with nasopharyngeal or oropharyngeal swabs. However, RT-PCR testing is expensive, time-consuming, and shows poor sensitivity (Monshi et al., 2021, Mostafiz et al., 2020), especially in the first days of exposure to the virus (Long et al., 2020). Up to 54% of COVID-19 patients may have an initial negative RT-PCR result (Arevalo-Rodriguez et al., 2020). Patients who receive a false negative diagnosis may contact and infect other people before they are tested again. Therefore, it is important to have alternative methods to detect the disease, such as chest X-ray (CXR) images. CXR equipment is widely available in hospitals and CXR images are cheap and fast to acquire. They can be inspected by radiologists to find visual indicators of the virus (Feng et al., 2020).
In the past decade, the rise of deep learning methods (Goodfellow et al., 2016, LeCun et al., 2015, Schmidhuber, 2015), especially convolutional neural networks (CNNs), was responsible for many advances in automatic image classification (Krizhevsky et al., 2012). CXR diagnosis using deep learning methods is a mechanism that can be explored to overcome the limitations of RT-PCR: insufficient test kits, waiting time for test results, and test costs (Mostafiz et al., 2020).
Many studies concerning the application of CNNs to COVID-19 diagnosis on CXR images have been published since last year (Abbas et al., 2021, Alawad et al., 2021, Chhikara et al., 2021, Heidari et al., 2020, Hira et al., 2021, Ismael and Şengür, 2021, Jia et al., 2021, Karthik et al., 2021, Khan et al., 2021, Mohammad Shorfuzzaman, 2020, Monshi et al., 2021, Mostafiz et al., 2020, Narin et al., 2021, Nigam et al., 2021). However, most of them used relatively small and more homogeneous datasets. In this paper, the COVIDx8B dataset (Zhao et al., 2021) is used. It has 16,352 CXR images, of which 2,358 are COVID-19 positive and the remaining are from both healthy and pneumonia patients. Released in March 2021, this dataset is composed of images from six other open-source chest radiography datasets. Therefore, it is larger and more heterogeneous than earlier available datasets. However, only a few works have used this dataset so far (Dominik, 2021, Pavlova et al., 2021, Zhao et al., 2021). A recent survey on applications of artificial intelligence in the COVID-19 pandemic (Khan et al., 2021) reviewed dozens of papers, including papers on CNNs applied to CXR images, and all of them used earlier available datasets, which are smaller than COVIDx8B. In this paper, a comparison of different CNN models applied to the COVIDx8B dataset is presented, including popular architectures such as VGG (Simonyan & Zisserman, 2015), ResNet (He et al., 2016a), DenseNet (Huang et al., 2017), and EfficientNet (Tan & Le, 2019). They were all trained under the same conditions with the training and test subsets defined by the dataset authors. The initial weights of all models were set to those trained on the ImageNet dataset (Russakovsky et al., 2015), which is commonly used in transfer learning scenarios (Oquab et al., 2014). The accuracy, sensitivity (TPR), precision (PPV), and F1 score were evaluated using the test subset.
Later, the continuous outputs of some models (before the classification layer) were combined into ensembles to overcome individual limitations and provide better classification results. The remainder of this paper is organized as follows. Section 2 shows related work, in which CNNs were used to detect COVID-19 on CXR images. Section 3 presents the COVIDx8B dataset. Section 4 shows the CNN architectures employed in this paper. Section 5 shows the computer simulations comparing the proposed models and other recent approaches from the literature for COVID-19 classification on CXR images using the same dataset. Section 6 shows the computer simulations with CNN ensembles, improving the classification performance of individual models. Finally, the conclusions are drawn in Section 7.

Related work

Many studies have investigated the use of machine learning techniques to detect COVID-19. Many of the researchers used CNN techniques and CXR images and faced challenges due to the lack of available datasets (Alawad et al., 2021). While many authors provided tables comparing results achieved in different works, the comparisons are not fair, since the datasets used are frequently different and pose different levels of challenge. Therefore, the related works are described here with a focus on which architectures have been used to handle the problem of COVID-19 detection on CXR images and on the size of the evaluated datasets. Nigam et al. (2021) used VGG16, DenseNet121, Xception, NASNet, and EfficientNet on a dataset with 16,634 images. Though this dataset is slightly larger than COVIDx8B, unfortunately, the authors did not make it publicly available. The highest accuracy was 93.48%, obtained with EfficientNetB7. Ismael and Şengür (2021) used ResNet18, ResNet50, ResNet101, VGG16, and VGG19 for deep feature extraction and support vector machines (SVM) for CXR image classification. The highest accuracy was 94.7%, obtained with ResNet50. However, they used a small dataset with only 380 CXR images. Abbas et al. (2021) validated a deep CNN called Decompose, Transfer, and Compose (DeTraC) for COVID-19 CXR image classification with 93.1% accuracy. They used a combination of two small datasets, totaling 196 images. Hira et al. (2021) used the AlexNet, GoogleNet, ResNet-50, Se-ResNet-50, DenseNet121, Inception V4, Inception ResNet V2, ResNeXt-50, and Se-ResNeXt-50 architectures. Se-ResNeXt-50 achieved the highest classification accuracy of 99.32%. They used a combination of four datasets, totaling 8,830 CXR images. Alawad et al. (2021) used VGG16 both as a stand-alone classifier and as a feature extractor for SVM, Random-Forests (RF), and Extreme-Gradient-Boosting (XGBoost) classifiers. The VGG16 and VGG16+SVM models provided the best performance with 99.82% accuracy.
They used a combination of five datasets, totaling 7,329 CXR images. Narin et al. (2021) used ResNet50, ResNet101, ResNet152, InceptionV3, and Inception-ResNetV2. ResNet50 achieved the highest classification performance with 96.1%, 99.5%, and 99.7% accuracy on three different datasets, totaling 7,406 CXR images. Monshi et al. (2021) focused on data augmentation and CNN hyperparameter optimization, increasing VGG19 and ResNet50 accuracy. They also proposed CovidXrayNet, a model based on EfficientNet-B0, which achieved an accuracy of 95.82% on an earlier version of the COVIDx dataset with 15,496 CXR images. Heidari et al. (2020) focused on preprocessing algorithms to improve the performance of VGG16. They used a dataset with 8,474 CXR images and reached 94.5% accuracy. Jia et al. (2021) proposed a modified MobileNet to classify CXR and CT images. They applied their method to a dataset with 7,592 CXR images and achieved 99.3% accuracy. They also applied it to an earlier version of COVIDx with 13,975 CXR images, achieving 95.0% accuracy. Karthik et al. (2021) proposed a custom CNN architecture which they called Channel-Shuffled Dual-Branched (CSDB). They achieved an accuracy of 99.80% on a combination of seven datasets, totaling 15,265 images. Mostafiz et al. (2020) used a hybridization of CNN (ResNet50) and discrete wavelet transform (DWT) features. A random forest-based bagging approach was used for classification. They combined different datasets and used data augmentation techniques to produce a total of 4,809 CXR images and achieved 98.5% accuracy. Mohammad Shorfuzzaman (2020) used VGG16, ResNet50V2, Xception, MobileNet, and DenseNet121 in a transfer learning scenario. They collected CXR images from different sources to compose a dataset with images. The best accuracy (98.15%) was achieved with ResNet50V2.
They also made an ensemble of the four best models (ResNet50V2, Xception, MobileNet, and DenseNet121) with the final output obtained by majority voting, raising the accuracy to 99.26%. Chhikara et al. (2021) proposed an InceptionV3-based model and applied it to three different datasets with 11,244, 8,246, and 14,486 CXR images, respectively. The model reached an accuracy of 97.7%, 84.95%, and 97.03% on these datasets, respectively. Pavlova et al. (2021) proposed the COVIDx8B dataset, which they claim is the largest and most diverse COVID-19 CXR dataset in open access form, and the COVID-Net CXR-2 model, a CNN specially tailored for COVID-19 detection on CXR images using machine-driven design, which achieved an accuracy of 95.5%. Zhao et al. (2021) used ResNet50V2 to classify the COVIDx8B dataset with an accuracy of 96.5% in the best scenario. Dominik (2021) proposed a lightweight architecture called BaseNet and achieved an accuracy of 95.50% on COVIDx8B. He also used an ensemble composed of BaseNet, VGG16, VGG19, ResNet50, DenseNet121, and Xception to achieve 97.75% accuracy. It was further increased to 99.25% using an optimal classification threshold.

Dataset

Most of the early research regarding COVID-19 detection on CXR images suffered from the lack of available datasets (Alawad et al., 2021). The authors would frequently combine different smaller datasets, so fairly comparing the results was impossible. COVIDx8B is a large and heterogeneous COVID-19 CXR benchmark dataset with 16,352 CXR images coming from patients from at least 51 countries (Pavlova et al., 2021). It is constructed with images extracted from six open-source chest radiography datasets, which are shown in Table 1. Notice that the sum of the images in the source datasets is much larger than the size of COVIDx8B, since not all of them were selected by the authors. Example images from the COVIDx8B dataset are shown in Fig. 1.
Table 1

List of datasets that compose the COVIDx8B benchmark dataset.

| Source dataset | Size | Reference |
|---|---|---|
| Covid-chestxray-dataset | 950 | Cohen et al. (2020) |
| COVID-19 Chest X-ray Dataset Initiative | 55 | Chung (2020b) |
| Actualmed COVID-19 Chest X-ray Dataset Initiative | 238 | Chung (2020a) |
| COVID-19 Radiography Database-Version 3 | 21,165 | Chowdhury et al., 2020, Rahman et al., 2021 |
| RSNA Pneumonia Detection Challenge | 29,684 | Wang et al. (2017) |
| RSNA International COVID-19 Open Radiology Database (RICORD) | 1,257 | Tsai et al. (2021) |

Fig. 1

Examples of CXR images from the COVIDx8B dataset. The first row shows COVID-19 negative patient cases, and the second row shows COVID-19 positive patient cases.

Though COVIDx8B does not include information on patients’ demographics, half of its source datasets do. The covid-chestxray-dataset has registers from male patients and registers from female patients. The average age is years old. The COVID-19 positive registers are from male and female patients, with an average age of years old. The Figure 1 COVID-19 Chest X-ray Dataset Initiative has only 55 registers, most of which do not indicate sex. Among the remaining, there are male patients and female patients. Only patients have their exact age registered and the average is years old. All patients with the exact age described are COVID-19 positive or unlabeled. The RSNA International COVID-19 Open Radiology Database (RICORD) only has COVID-19 positive cases. They come from male and female patients, with an average age of years old.
Four of the source datasets have both COVID-19 positive and negative cases. The RSNA Pneumonia Detection Challenge has only COVID-19 negative cases (non-COVID pneumonia, normal, etc.) and the RSNA International COVID-19 Open Radiology Database (RICORD) has only COVID-19 positive cases. The COVIDx8B training subset is composed of 15,952 images, of which 2,158 are COVID-19 positive and 13,794 are COVID-19 negative. The negative group also includes images of patients with non-COVID-19 pneumonia, which poses a major challenge, as it is usually difficult to distinguish between COVID-19 and non-COVID-19 pneumonia. The test subset holds the remaining 400 images: 200 COVID-19 positive images from different patients and 200 COVID-19 negative images. In the negative group, images are from healthy patients. The other images are from non-COVID pneumonia patients. The test images were randomly selected from international patient groups curated by the Radiological Society of North America (RSNA) (Tsai et al., 2021, Wang et al., 2017).
The images were annotated by an international group of scientists and radiologists from different institutes around the world. The test set was selected so as to ensure no patient overlap between training and test sets (Pavlova et al., 2021).

CNN architectures

This section presents the CNN architectures explored in this work. It also describes the layers added to complete the models and perform the CXR images classification. Table 2 shows the tested architectures, some of their characteristics, and their respective literature references.
Table 2

CNN architectures, some of their characteristics, and their references.

| Model | Input Image Resolution | Output of Last Conv. Layer | Trainable Parameters | Reference |
|---|---|---|---|---|
| DenseNet121 | 224 × 224 | 7×7×1024 | 7,216,770 | Huang et al. (2017) |
| DenseNet169 | 224 × 224 | 7×7×1664 | 12,911,234 | Huang et al. (2017) |
| DenseNet201 | 224 × 224 | 7×7×1920 | 18,585,218 | Huang et al. (2017) |
| EfficientNetB0 | 224 × 224 | 7×7×1280 | 4,335,998 | Tan and Le (2019) |
| EfficientNetB1 | 240 × 240 | 8×8×1280 | 6,841,634 | Tan and Le (2019) |
| EfficientNetB2 | 260 × 260 | 9×9×1408 | 8,062,212 | Tan and Le (2019) |
| EfficientNetB3 | 300 × 300 | 10×10×1536 | 11,090,218 | Tan and Le (2019) |
| InceptionResNetV2 | 299 × 299 | 8×8×1536 | 54,670,178 | Szegedy et al. (2017) |
| InceptionV3 | 299 × 299 | 8×8×2048 | 22,293,410 | Szegedy et al. (2016) |
| MobileNet | 224 × 224 | 7×7×1024 | 3,469,890 | Howard et al. (2017) |
| MobileNetV2 | 224 × 224 | 7×7×1280 | 2,552,322 | Sandler et al. (2018) |
| NASNetMobile | 224 × 224 | 7×7×1056 | 4,504,084 | Zoph et al. (2018) |
| ResNet101 | 224 × 224 | 7×7×2048 | 43,077,890 | He et al. (2016a) |
| ResNet101V2 | 224 × 224 | 7×7×2048 | 43,053,954 | He et al. (2016b) |
| ResNet152 | 224 × 224 | 7×7×2048 | 58,744,578 | He et al. (2016a) |
| ResNet152V2 | 224 × 224 | 7×7×2048 | 58,712,962 | He et al. (2016b) |
| ResNet50 | 224 × 224 | 7×7×2048 | 24,059,650 | He et al. (2016a) |
| ResNet50V2 | 224 × 224 | 7×7×2048 | 24,044,418 | He et al. (2016b) |
| VGG16 | 224 × 224 | 7×7×512 | 14,846,530 | Simonyan and Zisserman (2015) |
| VGG19 | 224 × 224 | 7×7×512 | 20,156,226 | Simonyan and Zisserman (2015) |
| Xception | 299 × 299 | 10×10×2048 | 21,332,010 | Chollet (2017) |

The output of the last convolutional layer of the original CNN is fed to a global average pooling layer. It is followed by a dense layer with 256 neurons using the ReLU (Rectified Linear Unit) activation function, a dropout layer with a 20% rate, and a softmax classification layer. This proposed architecture is illustrated in Fig. 2, which indicates the horizontal and vertical input size of the CNN (image size) and the three dimensions of the CNN output in its last convolutional layer. These values depend on the original CNN architecture and they are listed in Table 2. The table also shows the number of trainable parameters in each CNN architecture, including both their original layers and the dense layers added for COVID-19 classification.
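The parameter counts in Table 2 can be split into backbone and added layers with a little arithmetic. The sketch below is illustrative only: the function name and the dictionary are hypothetical, the two-class softmax follows the positive/negative task, and the hidden width of 256 is an assumption (it is the width that makes the VGG16 total in Table 2 agree with the 14,714,688 parameters of the Keras VGG16 convolutional base).

```python
def head_parameters(depth: int, hidden: int = 256, classes: int = 2) -> int:
    """Trainable parameters of the layers added after global average pooling.

    depth   -- channels of the last convolutional output (third value in Table 2)
    hidden  -- width of the ReLU dense layer (assumed to be 256)
    classes -- outputs of the softmax layer (COVID-19 negative / positive)
    """
    dense = depth * hidden + hidden        # weights + biases of the dense layer
    softmax = hidden * classes + classes   # weights + biases of the softmax layer
    return dense + softmax                 # pooling and dropout add no parameters


# (depth of last conv. output, total trainable parameters) copied from Table 2
table2 = {
    "VGG16": (512, 14_846_530),
    "DenseNet169": (1664, 12_911_234),
    "EfficientNetB2": (1408, 8_062_212),
}

for name, (depth, total) in table2.items():
    head = head_parameters(depth)
    print(f"{name}: head={head:,} backbone={total - head:,}")
```

Under these assumptions the added layers are a small fraction of every model, so the differences in Table 2 are driven almost entirely by the backbones.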
Fig. 2

The proposed CNN Transfer Learning architecture.


CNN comparison

In this section, the computer simulations comparing the CNN models applied to the COVIDx8B dataset are presented. All simulations were performed using Python and TensorFlow on three desktop computers with NVIDIA GeForce GPU boards: GTX 970, GTX 1080, and RTX 2060 SUPER, respectively. No pre-processing was applied, except for the steps pre-defined by each CNN architecture, which basically consist of resizing the image to the CNN input size and normalizing the input range. In all tested scenarios, each CNN had its weights initially set to those pre-trained on the ImageNet dataset (Russakovsky et al., 2015), which has millions of images and hundreds of classes. This dataset is frequently used in transfer learning scenarios. The training phase was conducted using the Adam optimizer (Kingma & Ba, 2014). The learning rate was set to $10^{-5}$ for the original CNN layers and $10^{-3}$ for the dense layers proposed in this work. The idea is to allow bigger weight changes in the classification layers, which need to be trained from scratch, while only fine-tuning the CNN layers, taking advantage of the weights previously learned from the ImageNet dataset. From the training subset, 20% of the images are randomly taken to compose the validation subset, using stratification to keep the same class proportions. Since the training subset is unbalanced, different class weights were defined for each class: 0.5782 and 3.6960 for the negative and positive classes, respectively. These values were calculated based on the TensorFlow documentation: $w_c = \frac{1}{n_c} \cdot \frac{N}{2}$, where $w_c$ is the class weight, $n_c$ is the number of examples belonging to class $c$, and $N$ is the total number of examples. All models were trained for up to epochs. An early stopping criterion was set to interrupt the training phase if the loss on the validation set did not decrease during the last epochs. The final weights are always restored to those adjusted in the epoch that achieved the lowest validation loss.
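The two class weights quoted above can be reproduced from the training-subset counts given in the dataset section. The following is a minimal sketch, assuming the usual balanced-weight rule from the TensorFlow imbalanced-data tutorial (total examples divided by the number of classes times the class count); the function and variable names are hypothetical.

```python
def class_weights(counts: dict[str, int]) -> dict[str, float]:
    """Balanced class weights: total / (number_of_classes * class_count)."""
    total = sum(counts.values())
    n_classes = len(counts)
    return {label: total / (n_classes * n) for label, n in counts.items()}


# Training-subset composition reported for COVIDx8B (15,952 images in total).
training_counts = {"negative": 13_794, "positive": 2_158}

weights = class_weights(training_counts)
print({label: round(w, 4) for label, w in weights.items()})
```

With two classes this reduces to the total divided by twice the class count, so the rarer positive class receives a weight roughly 6.4 times larger than the negative class.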
For each CNN model, the training phase was performed five times with different training/validation splits, generating five instances with different adjusted weights. The same five training/validation splits were used for all models. Each instance was evaluated on the test subset and the following measures were obtained: accuracy (ACC), sensitivity (TPR), precision (PPV), and F1 score. The results are shown in Table 3. Each value is the average of the measures obtained on the five different instances of each model. The standard deviation is also presented. Results of the same evaluation applied to the training and validation subsets are available in the Appendix.
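As an illustration of how the four measures and their per-model summary can be obtained, the sketch below computes them from confusion-matrix counts and then averages over five instances. The counts are hypothetical and are not results from the paper; the use of the sample standard deviation (`statistics.stdev`) is also an assumption, since the text does not say which estimator was used.

```python
from statistics import mean, stdev


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Accuracy, sensitivity (TPR), precision (PPV), and F1 from confusion counts."""
    acc = (tp + tn) / (tp + fp + tn + fn)
    tpr = tp / (tp + fn) if tp + fn else 0.0
    ppv = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * ppv * tpr / (ppv + tpr) if ppv + tpr else 0.0
    return {"ACC": acc, "TPR": tpr, "PPV": ppv, "F1": f1}


# Hypothetical (tp, fp, tn, fn) counts for five instances of one model.
# Illustrative values only; they are not taken from the paper.
runs = [(194, 1, 199, 6), (196, 2, 198, 4), (192, 0, 200, 8),
        (195, 3, 197, 5), (193, 1, 199, 7)]

per_run = [binary_metrics(*counts) for counts in runs]
summary = {
    name: (mean(m[name] for m in per_run), stdev(m[name] for m in per_run))
    for name in ("ACC", "TPR", "PPV", "F1")
}
for name, (avg, sd) in summary.items():
    print(f"{name}: mean={avg:.4f} sd={sd:.4f}")
```

Reporting the spread alongside the mean is what allows models with similar averages but very different stability to be told apart.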
Table 3

Comparison of 21 different CNN models applied to the COVIDx8B dataset. Each model is executed five times. The highest values for each measure are highlighted in bold.

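| Model | ACC Mean | ACC S.D. | TPR Mean | TPR S.D. | PPV Mean | PPV S.D. | F1 Mean | F1 S.D. |
|---|---|---|---|---|---|---|---|---|
| DenseNet169 | 0.9815 | 0.0056 | 0.9700 | 0.0138 | 0.9930 | 0.0075 | 0.9812 | 0.0058 |
| EfficientNetB2 | 0.9760 | 0.0049 | 0.9600 | 0.0141 | 0.9918 | 0.0051 | 0.9756 | 0.0052 |
| InceptionResNetV2 | 0.9755 | 0.0099 | 0.9590 | 0.0246 | 0.9919 | 0.0051 | 0.9749 | 0.0106 |
| InceptionV3 | 0.9750 | 0.0065 | 0.9520 | 0.0144 | 0.9979 | 0.0041 | 0.9744 | 0.0069 |
| MobileNet | 0.9710 | 0.0060 | 0.9430 | 0.0136 | 0.9990 | 0.0021 | 0.9701 | 0.0064 |
| EfficientNetB0 | 0.9705 | 0.0051 | 0.9510 | 0.0086 | 0.9896 | 0.0033 | 0.9699 | 0.0053 |
| EfficientNetB3 | 0.9700 | 0.0163 | 0.9470 | 0.0337 | 0.9927 | 0.0051 | 0.9690 | 0.0177 |
| DenseNet201 | 0.9695 | 0.0176 | 0.9400 | 0.0342 | 0.9989 | 0.0022 | 0.9683 | 0.0186 |
| ResNet152V2 | 0.9695 | 0.0244 | 0.9420 | 0.0510 | 0.9970 | 0.0040 | 0.9679 | 0.0268 |
| ResNet152 | 0.9660 | 0.0223 | 0.9370 | 0.0443 | 0.9947 | 0.0033 | 0.9644 | 0.0243 |
| DenseNet121 | 0.9630 | 0.0053 | 0.9270 | 0.0103 | 0.9989 | 0.0022 | 0.9616 | 0.0057 |
| Xception | 0.9615 | 0.0077 | 0.9230 | 0.0154 | 1.0000 | 0.0000 | 0.9599 | 0.0083 |
| VGG19 | 0.9580 | 0.0198 | 0.9170 | 0.0385 | 0.9989 | 0.0023 | 0.9558 | 0.0216 |
| EfficientNetB1 | 0.9570 | 0.0224 | 0.9240 | 0.0413 | 0.9892 | 0.0075 | 0.9551 | 0.0242 |
| ResNet50 | 0.9545 | 0.0172 | 0.9090 | 0.0344 | 1.0000 | 0.0000 | 0.9520 | 0.0192 |
| VGG16 | 0.9525 | 0.0123 | 0.9090 | 0.0282 | 0.9958 | 0.0052 | 0.9501 | 0.0138 |
| ResNet101V2 | 0.9530 | 0.0302 | 0.9100 | 0.0643 | 0.9959 | 0.0050 | 0.9497 | 0.0342 |
| MobileNetV2 | 0.9485 | 0.0172 | 0.9030 | 0.0359 | 0.9935 | 0.0019 | 0.9457 | 0.0190 |
| ResNet101 | 0.9410 | 0.0170 | 0.8830 | 0.0333 | 0.9988 | 0.0023 | 0.9370 | 0.0190 |
| ResNet50V2 | 0.9280 | 0.0075 | 0.8590 | 0.0153 | 0.9966 | 0.0046 | 0.9226 | 0.0087 |
| NASNetMobile | 0.8530 | 0.0653 | 0.7090 | 0.1317 | 0.9960 | 0.0034 | 0.8212 | 0.0918 |
| Average | 0.9569 | 0.0162 | 0.9178 | 0.0334 | 0.9957 | 0.0036 | 0.9536 | 0.0187 |
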
It is worth noting that most related work only shows the results of a single execution of each tested CNN architecture. This may lead to wrong conclusions, as some variance is always expected across multiple executions of neural networks, which are stochastic by nature. DenseNet169 achieved the highest accuracy (98.15%), TPR (97.00%), and F1 score (98.12%) among all the tested models. The highest PPV (100%) was achieved by the Xception and ResNet50 models. EfficientNetB2 achieved the second-best accuracy, TPR, and F1 score. Compared to other recent approaches applied to the same dataset, DenseNet169, EfficientNetB2, and InceptionResNetV2 achieved the best accuracy, TPR, and F1 score, as shown in Table 4. It is also worth noting that EfficientNetB2 has fewer trainable parameters (8.06 million) than all the other architectures in this comparison, including the Covid-Net CXR-2 (9.2 million), which was specially tailored for the COVIDx8B dataset.
Table 4

Comparison of the best four models tested in this paper (in italic) with other recently proposed models applied to the COVIDx8B dataset. The highest values for each measure are highlighted in bold. The results obtained by other authors were compiled from the respective cited references.

| Model | ACC | TPR | PPV | F1 | Source |
|---|---|---|---|---|---|
| DenseNet169 | 0.9815 | 0.9700 | 0.9930 | 0.9812 | this paper |
| EfficientNetB2 | 0.9760 | 0.9600 | 0.9918 | 0.9756 | this paper |
| InceptionResNetV2 | 0.9755 | 0.9590 | 0.9919 | 0.9749 | this paper |
| InceptionV3 | 0.9750 | 0.9520 | 0.9979 | 0.9744 | this paper |
| VGG16 (ImageNet) | 0.9750 | 0.9500 | 1.0000 | 0.9744 | Dominik (2021) |
| Covid-Net | 0.9400 | 0.9350 | 1.0000 | 0.9664 | Pavlova et al. (2021) |
| DenseNet121 (ChestXray) | 0.9650 | 0.9350 | 0.9947 | 0.9639 | Dominik (2021) |
| ResNet50V2 (Bit-M) | 0.9650 | 0.9300 | 1.0000 | 0.9637 | Zhao et al. (2021) |
| Covid-Net CXR-2 | 0.9630 | 0.9550 | 0.9700 | 0.9624 | Pavlova et al. (2021) |
| VGG19 (ImageNet) | 0.9625 | 0.9250 | 1.0000 | 0.9610 | Dominik (2021) |
| ResNet-50 (ImageNet) | 0.9575 | 0.9200 | 0.9946 | 0.9558 | Dominik (2021) |
| DenseNet121 (ImageNet) | 0.9575 | 0.9150 | 1.0000 | 0.9556 | Dominik (2021) |
| Xception (ImageNet) | 0.9550 | 0.9100 | 1.0000 | 0.9529 | Dominik (2021) |
| ResNet50V2 (Bit-S) | 0.9480 | 0.8950 | 1.0000 | 0.9446 | Zhao et al. (2021) |
| ResNet50V2 (Random) | 0.9280 | 0.8550 | 1.0000 | 0.9218 | Zhao et al. (2021) |
| ResNet50 | 0.9050 | 0.8850 | 0.9220 | 0.9031 | Pavlova et al. (2021) |

There are some common characteristics among the two best-performing CNN architectures. DenseNet and EfficientNet are newer approaches (2017 and 2019, respectively) than VGG (2015) and ResNet (2016). DenseNet and EfficientNet also focus on architecture efficiency, using fewer trainable parameters than the earlier approaches. In this case, the strategy used in these newer models was more suitable for these types of CXR images. Unfortunately, many related works compared fewer and/or earlier models only. Therefore, future studies should consider a wider variety of models to verify whether this tendency holds. In particular, from the related work section, only Nigam et al. (2021) and Monshi et al. (2021) explored EfficientNet, and they also reported good results with it, suggesting that this is a promising architecture for CXR images. Table 5 compares the results reported in the individual papers described in Section 2 with the best result found in this paper for an individual CNN architecture. In those papers, the authors are motivated to use a setup in which their algorithm is the best performing, whereas in this paper there is no motivation to implement optimizations that boost any particular architecture. Despite that, the best result from this paper is still in the top half of the accuracy ranking. For each paper, the CNN architecture used and the dataset size are provided for reference.
Table 5

Comparison of different CNN-based models applied to different COVID-19 datasets found in individual papers and the best result by an individual CNN model applied to the COVIDx8B dataset in this paper.

| Reference | Architecture | Dataset Size | Accuracy |
|---|---|---|---|
| Alawad et al. (2021) | VGG16 | 7,329 | 99.82% |
| Karthik et al. (2021) | CSDB | 15,265 | 99.80% |
| Hira et al. (2021) | Se-ResNeXt-50 | 8,830 | 99.32% |
| Jia et al. (2021) | MobileNet | 7,592 | 99.30% |
| Mostafiz et al. (2020) | ResNet50 | 4,809 | 98.50% |
| Narin et al. (2021) | ResNet50 | 7,406 | 98.43% |
| this paper | DenseNet169 | 16,352 | 98.15% |
| Chhikara et al. (2021) | InceptionV3 | 11,244 | 97.70% |
| Chhikara et al. (2021) | InceptionV3 | 14,486 | 97.03% |
| Monshi et al. (2021) | EfficientNetB0 | 15,496 | 95.82% |
| Jia et al. (2021) | MobileNet | 13,975 | 95.00% |
| Ismael and Şengür (2021) | ResNet50 | 380 | 94.70% |
| Heidari et al. (2020) | VGG16 | 8,474 | 94.50% |
| Nigam et al. (2021) | EfficientNetB7 | 16,634 | 93.48% |
| Abbas et al. (2021) | DeTraC | 196 | 93.10% |
| Chhikara et al. (2021) | InceptionV3 | 8,246 | 84.95% |


CNN ensembles

This section presents the computer simulations with ensembles of different CNN models and ensembles of multiple instances of the same model. All the ensemble experiments used the output of the last dense layer, just before the softmax activation function. Therefore, for each image, each model will output two continuous values, which can be interpreted as the probability of each class. Then, the output of the ensemble will be the average of its members’ outputs. The same weights trained for the experiments in Section 5 were used for the experiments in this section. In the first ensemble experiment, the first configuration combines the two models that achieved the best individual F1 score (DenseNet169 and EfficientNetB2). The second ensemble configuration adds the third-best model (InceptionResNetV2). The third ensemble configuration adds the fourth-best model (InceptionV3) and so on, with up to seven models. Then, in the last ensemble configuration, all the models are combined. In this first experiment, only one instance of each model composes each ensemble, so there are five ensembles for each configuration. Table 6 shows the average and standard deviation of the measures obtained for each ensemble configuration.
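The averaging rule described above can be sketched in a few lines. This is an illustrative sketch rather than the code used in the paper: the function name, the nested-list layout, and the convention that index 0 is the negative class and index 1 the positive class are all assumptions, and the member outputs are made-up values.

```python
def ensemble_predict(member_outputs: list[list[list[float]]]) -> list[int]:
    """Average the per-class outputs of all members and pick the larger one.

    member_outputs[m][i] holds the continuous outputs (one per class)
    that member m produced for image i.
    """
    n_members = len(member_outputs)
    n_images = len(member_outputs[0])
    n_classes = len(member_outputs[0][0])
    predictions = []
    for i in range(n_images):
        averaged = [
            sum(member_outputs[m][i][c] for m in range(n_members)) / n_members
            for c in range(n_classes)
        ]
        # Index of the class with the highest averaged output.
        predictions.append(max(range(n_classes), key=lambda c: averaged[c]))
    return predictions


# Made-up outputs of three members for two images (index 0: negative, 1: positive).
outputs = [
    [[0.9, 0.1], [0.4, 0.6]],
    [[0.8, 0.2], [0.7, 0.3]],
    [[0.6, 0.4], [0.2, 0.8]],
]
print(ensemble_predict(outputs))
```

In the second image one member disagrees with the other two, yet the averaged output still favours the positive class; this is the mechanism by which an ensemble can correct individual mistakes.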
Table 6

Ensembles of CNN models applied to the COVIDx8B dataset. Each ensemble configuration is executed five times with different instances of the models. The highest values for each measure are highlighted in bold.

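| Models | ACC Mean | ACC S.D. | TPR Mean | TPR S.D. | PPV Mean | PPV S.D. | F1 Mean | F1 S.D. |
|---|---|---|---|---|---|---|---|---|
| Top 2 models | 0.9855 | 0.0024 | 0.9730 | 0.0040 | 0.9980 | 0.0025 | 0.9853 | 0.0025 |
| Top 3 models | 0.9885 | 0.0034 | 0.9770 | 0.0068 | 1.0000 | 0.0000 | 0.9884 | 0.0035 |
| Top 4 models | 0.9870 | 0.0019 | 0.9740 | 0.0037 | 1.0000 | 0.0000 | 0.9868 | 0.0019 |
| Top 5 models | 0.9865 | 0.0020 | 0.9730 | 0.0040 | 1.0000 | 0.0000 | 0.9863 | 0.0021 |
| Top 6 models | 0.9880 | 0.0010 | 0.9760 | 0.0020 | 1.0000 | 0.0000 | 0.9879 | 0.0010 |
| Top 7 models | 0.9865 | 0.0025 | 0.9730 | 0.0051 | 1.0000 | 0.0000 | 0.9863 | 0.0026 |
| All models | 0.9775 | 0.0032 | 0.9550 | 0.0063 | 1.0000 | 0.0000 | 0.9770 | 0.0033 |
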
The best accuracy, TPR, and F1 score were achieved when the three best models were combined (DenseNet169, EfficientNetB2, and InceptionResNetV2). All the ensembles, except for the one with the best two models, achieved a PPV of 100%. Except for the ensemble of all models, all the other ensembles achieved higher accuracy, TPR, and F1 scores than the best individual model. For the second ensemble experiment, the five instances of each model are combined to form an ensemble. It is expected that five instances, even if they are from the same model, will improve the measures by alleviating the randomness effects of the training. Table 7 shows the measures obtained with these ensembles for each model. It also shows the gain obtained by the ensemble when compared to the average of the single instances. All the models gained from the ensembles. The highest measures were obtained by DenseNet169, with an F1 score of 99.24% and an accuracy of 99.25%. This is the same accuracy obtained by Dominik (2021) using an ensemble of multiple models and an optimized threshold. To the best of my knowledge, this is the highest accuracy achieved on this dataset at the time this paper is being written.
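The gain figures can be checked against the single-instance averages. The sketch below assumes the gain is relative, that is, the ensemble measure divided by the single-instance mean, minus one; this reading is an inference, chosen because it reproduces the DenseNet169 gains listed in Table 7 from the values in Table 3. The function and variable names are hypothetical.

```python
def relative_gain(ensemble: float, single_mean: float) -> float:
    """Relative improvement of the ensemble over the single-instance mean, in percent."""
    return (ensemble / single_mean - 1.0) * 100.0


# DenseNet169: mean of five single instances (Table 3) and five-instance ensemble (Table 7).
densenet169_single = {"ACC": 0.9815, "TPR": 0.9700, "PPV": 0.9930, "F1": 0.9812}
densenet169_ensemble = {"ACC": 0.9925, "TPR": 0.9850, "PPV": 1.0000, "F1": 0.9924}

for measure, single in densenet169_single.items():
    gain = relative_gain(densenet169_ensemble[measure], single)
    print(f"{measure}: {gain:.2f}%")
```

Because the gain is relative, models with a low single-instance TPR, such as NASNetMobile, can show large percentage gains while still ranking last in absolute terms.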
Table 7

Ensembles of CNN models applied to the COVIDx8B dataset. Each ensemble is composed of five instances of the same model, with different training/validation splits. The highest values for each measure and the highest gains in comparison to single instances of each model are highlighted in bold.

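| Models | ACC Mean | ACC Gain | TPR Mean | TPR Gain | PPV Mean | PPV Gain | F1 Mean | F1 Gain |
|---|---|---|---|---|---|---|---|---|
| DenseNet169 | 0.9925 | 1.12% | 0.9850 | 1.55% | 1.0000 | 0.70% | 0.9924 | 1.14% |
| EfficientNetB2 | 0.9850 | 0.92% | 0.9750 | 1.56% | 0.9949 | 0.31% | 0.9848 | 0.94% |
| InceptionResNetV2 | 0.9875 | 1.23% | 0.9750 | 1.67% | 1.0000 | 0.82% | 0.9873 | 1.27% |
| InceptionV3 | 0.9800 | 0.51% | 0.9600 | 0.84% | 1.0000 | 0.21% | 0.9796 | 0.53% |
| MobileNet | 0.9825 | 1.18% | 0.9650 | 2.33% | 1.0000 | 0.10% | 0.9822 | 1.25% |
| EfficientNetB0 | 0.9750 | 0.46% | 0.9600 | 0.95% | 0.9897 | 0.01% | 0.9746 | 0.48% |
| EfficientNetB3 | 0.9850 | 1.55% | 0.9750 | 2.96% | 0.9949 | 0.22% | 0.9848 | 1.63% |
| DenseNet201 | 0.9825 | 1.34% | 0.9650 | 2.66% | 1.0000 | 0.11% | 0.9822 | 1.44% |
| ResNet152V2 | 0.9900 | 2.11% | 0.9800 | 4.03% | 1.0000 | 0.30% | 0.9899 | 2.27% |
| ResNet152 | 0.9800 | 1.45% | 0.9650 | 2.99% | 0.9948 | 0.01% | 0.9797 | 1.59% |
| DenseNet121 | 0.9725 | 0.99% | 0.9450 | 1.94% | 1.0000 | 0.11% | 0.9717 | 1.05% |
| Xception | 0.9625 | 0.10% | 0.9250 | 0.22% | 1.0000 | 0.00% | 0.9610 | 0.11% |
| VGG19 | 0.9700 | 1.25% | 0.9400 | 2.51% | 1.0000 | 0.11% | 0.9691 | 1.39% |
| EfficientNetB1 | 0.9725 | 1.62% | 0.9500 | 2.81% | 0.9948 | 0.57% | 0.9719 | 1.76% |
| ResNet50 | 0.9650 | 1.10% | 0.9300 | 2.31% | 1.0000 | 0.00% | 0.9637 | 1.23% |
| VGG16 | 0.9550 | 0.26% | 0.9100 | 0.11% | 1.0000 | 0.42% | 0.9529 | 0.29% |
| ResNet101V2 | 0.9650 | 1.26% | 0.9300 | 2.20% | 1.0000 | 0.41% | 0.9637 | 1.47% |
| MobileNetV2 | 0.9650 | 1.74% | 0.9350 | 3.54% | 0.9947 | 0.12% | 0.9639 | 1.92% |
| ResNet101 | 0.9575 | 1.75% | 0.9150 | 3.62% | 1.0000 | 0.12% | 0.9556 | 1.99% |
| ResNet50V2 | 0.9350 | 0.75% | 0.8700 | 1.28% | 1.0000 | 0.34% | 0.9305 | 0.86% |
| NASNetMobile | 0.8750 | 2.58% | 0.7500 | 5.78% | 1.0000 | 0.40% | 0.8571 | 4.37% |
| Average | 0.9683 | 1.20% | 0.9383 | 2.28% | 0.9983 | 0.26% | 0.9666 | 1.38% |
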
For the third and last ensemble experiment, the first experiment is repeated, but now using all five instances of each model in the ensemble. Table 8 shows the measures obtained with each ensemble and the gain obtained by these ensembles when compared to the ensembles that used only a single instance of each model. In this case, there were only small differences and some of them were negative. Therefore, the best ensemble overall is still the one with multiple instances of DenseNet169.
Table 8

Ensembles of CNN models applied to the COVIDx8B dataset. Each ensemble configuration has five instances of each participant model. The highest values for each measure and the highest gains in comparison with the ensembles of single instances for each model are highlighted in bold.

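| Models | ACC Mean | ACC Gain | TPR Mean | TPR Gain | PPV Mean | PPV Gain | F1 Mean | F1 Gain |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Top 2 models | 0.9850 | −0.05% | 0.9700 | −0.31% | 1.0000 | 0.20% | 0.9848 | −0.05% |
| Top 3 models | 0.9875 | −0.10% | 0.9750 | −0.20% | 1.0000 | 0.00% | 0.9873 | −0.11% |
| Top 4 models | 0.9875 | 0.05% | 0.9750 | 0.10% | 1.0000 | 0.00% | 0.9873 | 0.05% |
| Top 5 models | 0.9875 | 0.10% | 0.9750 | 0.21% | 1.0000 | 0.00% | 0.9873 | 0.10% |
| Top 6 models | 0.9875 | −0.05% | 0.9750 | −0.10% | 1.0000 | 0.00% | 0.9873 | −0.06% |
| Top 7 models | 0.9875 | 0.10% | 0.9750 | 0.21% | 1.0000 | 0.00% | 0.9873 | 0.10% |
| All models | 0.9775 | 0.00% | 0.9550 | 0.00% | 1.0000 | 0.00% | 0.9770 | 0.00% |
| Average | 0.9857 | 0.01% | 0.9714 | −0.01% | 1.0000 | 0.03% | 0.9855 | 0.00% |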
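
This section does not spell out how the members of an ensemble are combined. As a minimal sketch, assuming each member outputs class probabilities and the ensemble simply averages them before picking the most probable class (soft voting), the combination step could look as follows. The function name and the toy probabilities are hypothetical and are not taken from the paper.

```python
def ensemble_predict(member_probs):
    """Average class probabilities over ensemble members (assumed rule).

    member_probs[m][i][c] is the probability that member m assigns
    to class c for image i. Returns (labels, mean_probs).
    """
    n_members = len(member_probs)
    n_samples = len(member_probs[0])
    n_classes = len(member_probs[0][0])
    mean_probs = [
        [sum(member[i][c] for member in member_probs) / n_members
         for c in range(n_classes)]
        for i in range(n_samples)
    ]
    # Predicted label is the class with the highest averaged probability.
    labels = [max(range(n_classes), key=row.__getitem__) for row in mean_probs]
    return labels, mean_probs


# Toy example: 3 members, 2 images, classes [negative, positive].
toy_probs = [
    [[0.40, 0.60], [0.90, 0.10]],
    [[0.55, 0.45], [0.80, 0.20]],  # this member alone would miss image 0
    [[0.30, 0.70], [0.60, 0.40]],
]
labels, mean_probs = ensemble_predict(toy_probs)
print(labels)
```

In the toy data, the second member on its own would label the first image as negative, but the averaged probabilities still favour the positive class. This kind of error cancelling is one plausible reason an ensemble can outperform its individual instances.
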

Conclusions

In this paper, different CNN architectures were applied to the detection of COVID-19 on CXR images. The comparison was performed using COVIDx8B, a large and heterogeneous COVID-19 CXR image dataset composed of six open-source CXR datasets. The training was repeated five times for each model, with different training and validation splits, to obtain more reliable results; most related works tested fewer models and performed only a single execution for each one. CNN ensembles were also explored, combining both different models and multiple instances of the same model. DenseNet169 achieved the best accuracy and F1 score, both as a single instance and as an ensemble of five instances. The classification accuracies were 98.15% for the single instance and 99.25% for the ensemble, while the F1 scores were 98.12% and 99.24%, respectively. These results are better than those achieved in recent works that used the same dataset. The simulations performed for this paper add more evidence of the efficacy of CNNs in the detection of COVID-19 on CXR images, which is important for assisting quick diagnosis and limiting the spread of the disease. Moreover, these experiments may guide future research, as they tested a large number of CNN architectures and identified which of them produce the best results for this particular task.

CRediT authorship contribution statement

Fabricio Aparecido Breve: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Project administration, Resources, Software, Validation, Visualization, Writing – original draft, Writing – review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Table A.9

Classification accuracy (ACC) achieved by the CNN architectures when applied to the train, validation, and test subsets individually.

| Model | Train | Validation | Test |
| --- | --- | --- | --- |
| DenseNet169 | 0.9951 | 0.9794 | 0.9815 |
| EfficientNetB2 | 0.9936 | 0.9793 | 0.9760 |
| InceptionResNetV2 | 0.9835 | 0.9681 | 0.9755 |
| InceptionV3 | 0.9960 | 0.9784 | 0.9750 |
| MobileNet | 0.9936 | 0.9788 | 0.9710 |
| EfficientNetB0 | 0.9894 | 0.9761 | 0.9705 |
| EfficientNetB3 | 0.9948 | 0.9803 | 0.9700 |
| ResNet152V2 | 0.9945 | 0.9757 | 0.9695 |
| DenseNet201 | 0.9971 | 0.9816 | 0.9695 |
| ResNet152 | 0.9923 | 0.9783 | 0.9660 |
| DenseNet121 | 0.9962 | 0.9806 | 0.9630 |
| Xception | 0.9909 | 0.9777 | 0.9615 |
| VGG19 | 0.9922 | 0.9804 | 0.9580 |
| EfficientNetB1 | 0.9802 | 0.9697 | 0.9570 |
| ResNet50 | 0.9955 | 0.9806 | 0.9545 |
| ResNet101V2 | 0.9909 | 0.9707 | 0.9530 |
| VGG16 | 0.9913 | 0.9772 | 0.9525 |
| MobileNetV2 | 0.9987 | 0.9808 | 0.9485 |
| ResNet101 | 0.9923 | 0.9803 | 0.9410 |
| ResNet50V2 | 0.9859 | 0.9662 | 0.9280 |
| NASNetMobile | 0.9798 | 0.9660 | 0.8530 |

Table A.10

Classification sensitivity (TPR) achieved by the CNN architectures when applied to the train, validation, and test subsets individually.

| Model | Train | Validation | Test |
| --- | --- | --- | --- |
| DenseNet169 | 0.9987 | 0.9611 | 0.9700 |
| EfficientNetB2 | 0.9936 | 0.9662 | 0.9600 |
| InceptionResNetV2 | 0.9830 | 0.9491 | 0.9590 |
| InceptionV3 | 0.9965 | 0.9458 | 0.9520 |
| EfficientNetB0 | 0.9760 | 0.9361 | 0.9510 |
| EfficientNetB3 | 0.9935 | 0.9662 | 0.9470 |
| MobileNet | 0.9957 | 0.9440 | 0.9430 |
| ResNet152V2 | 0.9928 | 0.9375 | 0.9420 |
| DenseNet201 | 0.9964 | 0.9454 | 0.9400 |
| ResNet152 | 0.9854 | 0.9509 | 0.9370 |
| DenseNet121 | 0.9973 | 0.9421 | 0.9270 |
| EfficientNetB1 | 0.9750 | 0.9505 | 0.9240 |
| Xception | 0.9847 | 0.9333 | 0.9230 |
| VGG19 | 0.9874 | 0.9338 | 0.9170 |
| ResNet101V2 | 0.9882 | 0.9278 | 0.9100 |
| VGG16 | 0.9915 | 0.9324 | 0.9090 |
| ResNet50 | 0.9918 | 0.9338 | 0.9090 |
| MobileNetV2 | 0.9933 | 0.9241 | 0.9030 |
| ResNet101 | 0.9701 | 0.9162 | 0.8830 |
| ResNet50V2 | 0.9758 | 0.8921 | 0.8590 |
| NASNetMobile | 0.8874 | 0.8255 | 0.7090 |

Table A.11

Classification precision (PPV) achieved by the CNN architectures when applied to the train, validation, and test subsets individually.

| Model | Train | Validation | Test |
| --- | --- | --- | --- |
| Xception | 0.9502 | 0.9057 | 1.0000 |
| ResNet50 | 0.9759 | 0.9242 | 1.0000 |
| MobileNet | 0.9587 | 0.9040 | 0.9990 |
| VGG19 | 0.9566 | 0.9228 | 0.9989 |
| DenseNet121 | 0.9754 | 0.9169 | 0.9989 |
| DenseNet201 | 0.9822 | 0.9215 | 0.9989 |
| ResNet101 | 0.9727 | 0.9366 | 0.9988 |
| InceptionV3 | 0.9748 | 0.8998 | 0.9979 |
| ResNet152V2 | 0.9682 | 0.8904 | 0.9970 |
| ResNet50V2 | 0.9244 | 0.8630 | 0.9966 |
| NASNetMobile | 0.9602 | 0.9159 | 0.9960 |
| ResNet101V2 | 0.9501 | 0.8701 | 0.9959 |
| VGG16 | 0.9474 | 0.9044 | 0.9958 |
| ResNet152 | 0.9598 | 0.8964 | 0.9947 |
| MobileNetV2 | 0.9973 | 0.9337 | 0.9935 |
| DenseNet169 | 0.9672 | 0.8946 | 0.9930 |
| EfficientNetB3 | 0.9702 | 0.8977 | 0.9927 |
| InceptionResNetV2 | 0.9077 | 0.8408 | 0.9919 |
| EfficientNetB2 | 0.9631 | 0.8925 | 0.9918 |
| EfficientNetB0 | 0.9486 | 0.8928 | 0.9896 |
| EfficientNetB1 | 0.8924 | 0.8470 | 0.9892 |

Table A.12

Classification F1 score achieved by the CNN architectures when applied to the train, validation, and test subsets individually.

| Model | Train | Validation | Test |
| --- | --- | --- | --- |
| DenseNet169 | 0.9825 | 0.9266 | 0.9812 |
| EfficientNetB2 | 0.9774 | 0.9273 | 0.9756 |
| InceptionResNetV2 | 0.9428 | 0.8905 | 0.9749 |
| InceptionV3 | 0.9855 | 0.9221 | 0.9744 |
| MobileNet | 0.9768 | 0.9233 | 0.9701 |
| EfficientNetB0 | 0.9619 | 0.9139 | 0.9699 |
| EfficientNetB3 | 0.9815 | 0.9304 | 0.9690 |
| DenseNet201 | 0.9892 | 0.9331 | 0.9683 |
| ResNet152V2 | 0.9801 | 0.9128 | 0.9679 |
| ResNet152 | 0.9722 | 0.9226 | 0.9644 |
| DenseNet121 | 0.9862 | 0.9292 | 0.9616 |
| Xception | 0.9670 | 0.9190 | 0.9599 |
| VGG19 | 0.9716 | 0.9281 | 0.9558 |
| EfficientNetB1 | 0.9312 | 0.8954 | 0.9551 |
| ResNet50 | 0.9837 | 0.9287 | 0.9520 |
| VGG16 | 0.9688 | 0.9173 | 0.9501 |
| ResNet101V2 | 0.9679 | 0.8966 | 0.9497 |
| MobileNetV2 | 0.9953 | 0.9287 | 0.9457 |
| ResNet101 | 0.9714 | 0.9262 | 0.9370 |
| ResNet50V2 | 0.9493 | 0.8773 | 0.9226 |
| NASNetMobile | 0.9209 | 0.8675 | 0.8212 |