
Automated medical diagnosis of COVID-19 through EfficientNet convolutional neural network.

Gonçalo Marques¹, Deevyankar Agarwal¹, Isabel de la Torre Díez¹.

Abstract

COVID-19 infection was first reported in December 2019 in Wuhan, China. The virus has critically affected several countries, such as the USA, Brazil, India and Italy. Numerous research units are working intensively to develop novel methods to prevent and control this pandemic. The main objective of this paper is to propose a medical decision support system based on a convolutional neural network (CNN). This CNN has been developed using the EfficientNet architecture. To the best of the authors' knowledge, there is no similar study that proposes an automated method for COVID-19 diagnosis using EfficientNet. Therefore, the main contribution is to present the results of a CNN developed using EfficientNet and evaluated with stratified 10-fold cross-validation. This paper presents two main experiments. First, the binary classification results using images from COVID-19 patients and normal patients are shown. Second, the multi-class results using images from COVID-19, pneumonia and normal patients are discussed. The results show average accuracy values of 99.62% and 96.70% for binary and multi-class classification, respectively. The proposed CNN model presents average recall values of 99.63% and 96.69% for binary and multi-class classification, respectively. The average precision is 99.64% for binary classification and 97.54% for multi-class classification. Finally, the average F1-score is 99.62% for binary classification and 97.11% for multi-class classification. In conclusion, the proposed architecture can provide an automated medical diagnostics system to support healthcare specialists for enhanced decision making during this pandemic.


Keywords: Automated decision support system; COVID-19; Convolutional Neural Network (CNN); Deep learning; Machine learning

Year: 2020 PMID: 33519327 PMCID: PMC7836808 DOI: 10.1016/j.asoc.2020.106691

Source DB: PubMed Journal: Appl Soft Comput ISSN: 1568-4946 Impact factor: 6.725

Introduction

COVID-19 has had critical consequences for the economic and social structures of developed and developing countries [1], [2]. The first case of infection by the coronavirus was reported in December 2019 in Wuhan, China [3]. However, the current pandemic situation is still to be resolved [4], [5]. This virus critically affects several countries such as the USA [6], Brazil [7], India [8] and Italy [9]. Therefore, several research units are working to develop policies, vaccines and novel methods to control this pandemic scenario [10]. On the one hand, numerous researchers from the medicine domain are developing drugs to stop the virus proliferation [11], [12]. First, it is necessary to develop new methods to help infected people [13], [14], [15]. Second, it is also crucial to plan sanitary policies to prevent infected patients from disseminating the virus [16], [17]. On the other hand, computer science researchers have a critical role in the development of new methods to support virus diagnostics [18], [19]. Several innovations have been developed using a range of technologies, such as mobile applications to monitor and track the interaction between people [20], [21]. Artificial intelligence (AI) is also crucial in this field for developing solutions to support diagnosis [22]. AI methods have been used to create automated systems for COVID-19 diagnosis [23], [24], [25]. These methods will never replace human care. However, they can be a relevant solution to combat the virus. AI is widely used in the medicine domain [26]. Despite the ethical concerns regarding the application of AI to patients' data in the current pandemic scenario, these methods should be used to support medical staff [27], [28], [29]. The stress factors that affect medical professionals during this pandemic, related to the increase of patients in hospitals, significantly affect their work and performance [30].
Consequently, it is necessary to create novel methods that can support their work. Nowadays, researchers are using convolutional neural networks (CNNs), a class of deep learning neural networks, for several applications [31], [32], [33]. CNNs have an input layer, an output layer, and hidden layers [34]. The hidden layers usually consist of convolutional layers, ReLU layers, pooling layers, and fully connected layers [35], [36]. CNNs represent a huge breakthrough in automatic image classification systems, as they do not require the image pre-processing that was needed in traditional machine learning algorithms [37], [38], [39]. The main objective of this research is to propose a medical decision support system using CNNs. To the best of the authors' knowledge, there is no similar study that proposes an automated method to detect COVID-19 in chest X-ray images using EfficientNet [40], [41]. Therefore, the main contribution of this paper is to present an automated medical diagnosis system implemented using EfficientNet. Numerous studies and applications have been reported in the literature. The authors believe that in the critical pandemic scenario it is crucial to share all the methods and materials to allow the readers to reproduce the results. The authors share the Python scripts developed on the Google Colab platform. In this way, the software is accessible to all readers, who can execute the scripts for future research activities. The EfficientNet architecture can be used for transfer learning, and it is more efficient than most of its predecessors such as VGG (e.g. VGG16 or VGG19), GoogLeNet (e.g. InceptionV3), and Residual Network (e.g. ResNet50) [42]. The EfficientNet family consists of 8 models from B0 to B7, with each subsequent model number referring to a variant with more parameters and higher accuracy. Using EfficientNet for transfer learning saves time and computational power.
Consequently, it provides higher accuracy values than known competitor models. This is due to the use of compound scaling of depth, width, and resolution. The authors have used the B4 model, as it contains 19M parameters, which is feasible for the experimental setup, whereas B5, B6 and B7 include 30M, 43M and 66M parameters, respectively [41]. Furthermore, the authors have used separate datasets to validate the proposed CNN models using images that are not included during the training and testing phases. The proposed model has been evaluated using stratified 10-fold cross-validation. This paper presents two main experiments, including different datasets for testing and validation to check for overfitting. First, the binary classification results using images from COVID-19 patients and normal patients are shown. Second, the multi-class results using images from COVID-19, pneumonia and normal patients are discussed. The source code is provided with this document as supplementary files. The remainder of this paper is structured as follows. Section 2 introduces the related work. The materials and methods used in this research are described in Section 3. Section 4 presents the results of the proposed CNN model. The discussion and comparison of the proposed method with the related work available in the state of the art are presented in Section 5. Finally, the conclusions are presented in Section 6.

Related work

Numerous researchers are using AI technologies to develop novel systems to support COVID-19 diagnosis. These studies aim to create new automated systems for COVID-19 diagnosis. These methods should be used to support medical staff in the current pandemic scenario. Furthermore, machine learning technologies can be used to decrease the stress factors that affect medical professionals during this pandemic due to the increased workload in healthcare facilities.

Ozturk et al. [43] propose an automated detection system for COVID-19 cases using deep neural networks and chest X-ray images. The proposed method is based on the DarkNet model for real-time detection and implements 17 convolutional layers. This study aims to support the decision making of radiologists to validate their screening process. The heatmaps produced by this automated system have been evaluated by radiologists. The dataset used includes 1125 images. In total, 125 samples are used for the COVID-19 class, 500 for the pneumonia class and 500 for the normal class. The authors have used 5-fold cross-validation to validate the performance of the proposed method. Average accuracies of 98.08% and 87.02% are reported for binary and multi-class classification, respectively. The limitation on the number of samples used for the COVID-19 class is reported by the authors.

A deep learning model to improve the accuracy of binary classification of COVID-19 is proposed in [44]. The proposed CNN was implemented based on the VGG-19 classifier. The dataset used includes a total of 545 X-ray scans, 364 of the normal class and 181 of the COVID-19 class. The model performance has been validated using random sampling. The ratio used for train, validation and testing was 80:20:20. The number of samples for the normal class and the COVID-19 class was 233 and 115 during training, 56 and 32 during validation, and 75 and 34 during testing, respectively. The results show an accuracy of 96.3%. The limited number of samples for COVID-19 cases is stated by the authors.

Apostolopoulos et al.
[45] propose a transfer learning approach using VGG-19 and MobileNet v2 for automated detection of patients with pneumonia and COVID-19. Two different datasets have been included in this study. The first dataset has a total of 1427 samples, including 504, 700 and 224 images of normal, pneumonia and COVID-19 cases, respectively. The second dataset includes 224 samples of COVID-19, 714 samples of pneumonia patients and 504 of normal patients. 10-fold cross-validation has been used to evaluate the proposed models. VGG-19 and MobileNet v2 reported accuracies of 98.75% and 97.40% for binary classification and 93.48% and 92.85% for multi-class classification on the first dataset. Furthermore, MobileNet v2 has been applied to the second dataset, presenting an accuracy of 96.78% for binary classification and 94.72% for multi-class classification. The authors state that a more in-depth analysis using more data from COVID-19 patients is required.

The authors of [46] proposed a fast screening system for COVID-19 detection based on deep learning neural networks. The presented method is based on nCOVnet and uses chest X-ray images. The dataset included in this study has a total of 337 samples. In total, 192 of the samples are from COVID-19 positive patients and 142 are images of normal patients. The model's performance has been evaluated using random sampling, with 70% for training and 30% for testing. The proposed system for binary classification provides an accuracy of 88.10%. The authors of this study state the limitations on the number of samples included in the dataset.

A novel artificial neural network system for COVID-19 detection is proposed in [47]. The proposed method is based on Convolutional CapsNet and uses chest X-ray images. The system includes binary and multi-class classification features. The dataset used includes a total of 3150 samples, with 1050 images for each class (normal, pneumonia and COVID-19).
10-fold cross-validation is used to evaluate the performance of the proposed system. The results show an accuracy of 97.24% and 84.22% for binary and multi-class classification, respectively. The limitations reported by the authors focus on the hardware resources needed to process a massive number of images and the processing time.

Nour et al. [48] propose a novel medical diagnosis model of COVID-19 to support clinical applications. The system is based on deep features and Bayesian optimization. The CNN model is applied for automated extraction of features, which are processed by different machine learning methods such as kNN, SVM and Decision Tree. The dataset contains 2033 samples: 135 for COVID-19, 939 for the normal class and 941 for pneumonia. The authors have used data augmentation to increase the number of samples of the COVID-19 class. The performance of the proposed system has been evaluated using 70% and 30% of the dataset for training and testing, respectively. The proposed CNN presents an accuracy of 97.14%.

In [49], the authors have used stationary wavelets to augment the training dataset and compared different transfer learning CNN architectures. The dataset used includes 349 samples for COVID-19 and 397 samples for the normal class. The authors also applied data augmentation techniques to increase the number of samples for both classes. In this study, 70% of the samples are used for training, and 30% have been considered for validation. The proposed method provides 99.4% accuracy during testing for a binary classifier using the ResNet 18 model.

Konar et al. [50] propose a semi-supervised shallow neural network model for automated diagnosis of COVID-19. This study included two datasets. The first consists of a total of 2482 samples, of which 1252 are from COVID-19 positive patients and 1230 are from non-infected patients. The second dataset includes 20 samples of COVID-19 positive patients.
The proposed model has been tested using random sampling with a ratio of 70% for training and 30% for testing. Moreover, the model has also been evaluated using 5-fold and 10-fold cross-validation. The proposed method presents an accuracy of 93.1%.

In summary, several methods have been proposed in the literature for the automated diagnosis of COVID-19. These studies use different numbers of images and datasets from multiple sources. Moreover, different approaches have been used to evaluate the performance of the models, such as cross-validation and random sampling. Most of the studies state the limitation associated with the number of samples available to conduct the experiments. Table 1 summarizes the related work on COVID-19 detection systems.

Table 1

Related work on COVID-19 detection systems.

Reference	Model	Data used	Number of images	Classification	Evaluation method
[43]	DarkNet model	Chest X-ray	Total: 1125; COVID-19: 125; Pneumonia: 500; Normal: 500	Binary and multi-class	5-fold cross-validation
[44]	VGG-19	Chest X-ray	Total: 545; COVID-19: 181; Normal: 364	Binary	Random sampling, 80:20:20 for training, validation and testing
[45]	VGG-19 and MobileNet v2	Chest X-ray	Dataset 1, total: 1427; Normal: 504; Pneumonia: 700; COVID-19: 224. Dataset 2, total: 1442; Normal: 504; Pneumonia: 714; COVID-19: 224	Binary and multi-class	10-fold cross-validation
[46]	nCOVnet	Chest X-ray	Total: 337; COVID-19: 192; Normal: 142	Binary	Random sampling, 70% for training and 30% for testing
[47]	CapsNet	Chest X-ray	Total: 3150; Normal: 1050; Pneumonia: 1050; COVID-19: 1050	Binary and multi-class	10-fold cross-validation
[48]	Proposed CNN	Chest X-ray	Total: 2033; COVID-19: 135; Normal: 939; Pneumonia: 941	Binary and multi-class	Random sampling, 70% for training and 30% for testing
[49]	ResNet 18	Chest X-ray	Total: 746; COVID-19: 349; Normal: 397	Binary	Random sampling, 70% for training and 30% for testing
[50]	Proposed semi-supervised model	Chest X-ray	Dataset 1, total: 2482; Normal: 1230; COVID-19: 1252. Dataset 2, COVID-19: 20	Binary	Random sampling, 70% for training and 30% for testing

Methods and materials

This section presents the methods and materials used in this study. Section 3.1 details the datasets of X-ray images used to test and train the proposed method. The proposed CNN is presented in Section 3.2. Finally, the validation method and experimental setup are presented in Section 3.3.

X-ray Image DataSet

The samples used to train and test the proposed method have been collected from public datasets. It is critical to use an equal number of samples for each analyzed class to properly validate the performance of the model. Consequently, the authors have used the same number of images per class to train and test. Table 2 presents the reference and number of images used by the authors. In total, 404 samples have been used for each of the Normal, Pneumonia and COVID-19 classes. These samples have been used in the stratified 10-fold cross-validation.

Table 2

Dataset information.

Class	Reference	Number of images for training/testing	Number of images for validation
NORMAL	[51]	404	96
PNEUMONIA	[51]	404	100
COVID-19	[52]	404	100

Furthermore, the authors have tested the model using a separate dataset for validation. The number of samples used to validate the proposed model was 96 for the normal class and 100 each for the pneumonia and COVID-19 classes. These datasets were not used in the training phase of the model, and this experiment has been conducted to check for overfitting. On the one hand, pneumonia and normal images have been retrieved from the dataset available in [51]. This dataset is public and contains validated chest X-ray images of pneumonia patients and normal patients. It is freely available on the Kaggle website. This dataset only contains the folders named Pneumonia and Normal, and there is no other information available. The authors have downloaded these folders directly from the Kaggle website to be included in the proposed work. On the other hand, the COVID-19 Image DataSet, available in [52], has been used to retrieve the COVID-19 positive samples. It is a public dataset of validated chest X-ray images of COVID-19 positive patients and is hosted in a GitHub repository. The images are collected from public sources as well as through indirect collection from hospitals and physicians. This project is approved by the University of Montreal's Ethics Committee (CERSES-20-058-D). The research community is continuously adding images to this dataset. The main purpose of developing this dataset is to improve prognostic predictions to triage and manage patient care. The authors used only two attributes: the findings column, to identify COVID-19 images, and the name of the image.

Proposed CNN

The authors have used the EfficientNetB4 model for the transfer learning process and added a global_average_pooling2d layer to minimize overfitting by reducing the total number of parameters. In addition, a sequence of 3 inner dense layers with ReLU activation functions, each followed by a dropout layer, has been added. A dropout rate of 30%, chosen randomly, has been used to avoid overfitting. Finally, one output dense layer with a softmax activation function has been added to create the proposed automated detection system; it contains 2 output units for binary classification and 3 output units for multi-class classification. The details of the layers and their order in the proposed model, the output shape of each layer, the number of parameters (weights) in each layer, and the total number of parameters (weights) are presented in Table 3. The total number of parameters is 17,913,755.

Table 3

Layer types and parameters used in the proposed model.

Layer (type)	Output shape	Param #
EfficientNetB4 (Model)	7 × 7 × 1792	17,673,816
global_average_pooling2d	1792	0
dense (Dense)	128	229,504
dropout (Dropout)	128	0
dense_1 (Dense)	64	8256
dropout_1 (Dropout)	64	0
dense_2 (Dense)	32	2080
dropout_2 (Dropout)	32	0
dense_3 (Dense)	2/3	99
Total Parameters: 17,913,755
Trainable Parameters: 17,788,555
Non-trainable Parameters: 125,200
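The dense-layer rows of Table 3 can be checked with a short calculation. The sketch below assumes standard fully connected layers (one weight per input-unit pair plus one bias per unit); the helper names `dense_params` and `head_params` are illustrative and do not come from the authors' scripts.

```python
BACKBONE_PARAMS = 17_673_816  # EfficientNetB4 row of Table 3


def dense_params(n_inputs, n_units):
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_units + n_units


def head_params(n_classes):
    """Parameter counts of the four dense layers that follow global average pooling."""
    widths = [1792, 128, 64, 32, n_classes]
    return [dense_params(a, b) for a, b in zip(widths, widths[1:])]


multi_class = head_params(3)
print(multi_class)                         # [229504, 8256, 2080, 99]
print(BACKBONE_PARAMS + sum(multi_class))  # 17913755
print(head_params(2)[-1])                  # 66
```

Under this assumption, the 99 parameters and the total of 17,913,755 listed in Table 3 correspond to the three-unit (multi-class) output layer; a two-unit output layer would have 66 parameters.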

All the software and libraries used in the proposed work are open source. To reproduce the results, the readers should use a Google Colab notebook with the GPU runtime type. This platform can be used without costs since it is provided by Google for research activities, using a Tesla K80 GPU with 12 GB of memory. The EfficientNet models are pre-trained, scaled CNN models that can be used for transfer learning in image classification problems. They were developed by Google AI in May 2019 and are available from GitHub repositories. The ImageDataAugmentor is a custom image data generator for Keras which supports augmentation modules. It is an open-source project available from GitHub. Finally, the Albumentations library is also an open-source project and can be installed from its GitHub repository. In summary, the software used can be used without license concerns as it is free and open source.

The authors have used three main libraries in the proposed method: the EfficientNet module, the Albumentations module and the ImageDataAugmentor module. On the one hand, the EfficientNet models are based on a simple and highly effective compound scaling method. This method makes it possible to scale up a baseline ConvNet to any target resource constraints while maintaining model efficiency, and the resulting models can be used for transfer learning. In general, EfficientNet models achieve both higher accuracy and better efficiency than existing CNNs such as AlexNet, GoogLeNet, and MobileNetV2 [41]. EfficientNet could serve as a new foundation for future computer vision tasks. To the best of the authors' knowledge, there is no similar study to date that uses EfficientNet for transfer learning concerning COVID-19 classification. EfficientNet includes models from B0 to B7, with parameter counts ranging from 5.3M to 66M.
The authors used EfficientNetB4, which contains 19M parameters, as it is suitable for the available resources and purpose. On the other hand, the Albumentations library is widely used in industry, deep learning research, machine learning competitions, and open source projects. This module efficiently implements a variety of image transform operations that are optimized for performance. The library provides an image augmentation interface for different computer vision tasks, including object classification, segmentation, and detection. The authors have used the Compose method of the Albumentations library. This library reduces overfitting, improves the performance of classifiers and decreases execution time [53]. After applying this library for augmentation in each fold, the accuracy of the model increased and the processing time decreased. Finally, the ImageDataAugmentor is a custom image data generator for Keras supporting the use of modern augmentation modules (e.g. imgaug and albumentations) [54]. This library is used to configure the image data generator according to the Albumentations settings to decrease execution time. The data generator is created by calling the constructor of the ImageDataAugmentor class with two arguments. The first is rescale, whose value is set to 1/255 to transform every pixel value from the range [0, 255] to [0, 1]. The second is augment, which is set to the output of the Compose function of the Albumentations library. The data generator is then used to process the image datasets. Fig. 1 presents the block diagram of the proposed work.

Fig. 1

Block Diagram of the proposed work.
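The two data generator arguments described above (a rescale factor and a composed augmentation) can be illustrated without the actual libraries. The sketch below is a plain-Python stand-in and not the Albumentations or ImageDataAugmentor API: `compose`, `horizontal_flip`, `rescale` and `make_generator` are hypothetical helpers, and the horizontal flip is an assumed example transform, since the text does not list the transforms that were used.

```python
import random


def compose(transforms):
    """Chain transforms so the output of one becomes the input of the next."""
    def apply(image, rng):
        for transform in transforms:
            image = transform(image, rng)
        return image
    return apply


def horizontal_flip(p=0.5):
    """Mirror every row with probability p (assumed example transform)."""
    def transform(image, rng):
        if rng.random() < p:
            return [row[::-1] for row in image]
        return image
    return transform


def rescale(image, factor=1 / 255):
    """Map pixel values from [0, 255] to [0, 1]."""
    return [[pixel * factor for pixel in row] for row in image]


def make_generator(images, augment, rescale_factor=1 / 255, seed=0):
    """Yield augmented, rescaled images one at a time."""
    rng = random.Random(seed)
    for image in images:
        yield rescale(augment(image, rng), rescale_factor)


augment = compose([horizontal_flip(p=1.0)])
batch = list(make_generator([[[0, 255], [51, 102]]], augment))
print(batch[0])  # values close to [[1.0, 0.0], [0.4, 0.2]]
```

Applying the augmentation before rescaling mirrors the order implied by the two arguments; whether the real generator uses the same order is an assumption of this sketch.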

Validation and experimental setup

The model has been validated in two different phases. On the one hand, the stratified 10-fold cross-validation method has been applied using the same dataset for training and testing. On the other hand, a separate dataset containing samples that have not been used during the training phase has been applied to validate the performance of the model. The confusion matrix has been extracted. From it, the precision, recall and F1-score have been computed for each class. Finally, the average values for each fold have been calculated. The experimental setup used to conduct this study is detailed in Algorithm 1.
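As an illustration of the first validation phase, the following sketch builds stratified folds in plain Python. It is not the authors' script or Algorithm 1: the `stratified_kfold` helper, the round-robin assignment and the seed are assumptions chosen for clarity, and the label list simply mirrors the 404 images per class of the binary experiment.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs that keep class proportions in every fold."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for position, index in enumerate(indices):
            folds[position % k].append(index)  # deal each class round-robin

    for i in range(k):
        test = sorted(folds[i])
        train = sorted(idx for j, fold in enumerate(folds) if j != i for idx in fold)
        yield train, test


labels = ["COVID-19"] * 404 + ["NORMAL"] * 404
splits = list(stratified_kfold(labels))
print(sorted({len(test) for _, test in splits}))  # [80, 82]
```

Because 404 is not divisible by 10, each test fold holds 40 or 41 images per class, so the training and testing sizes vary slightly from fold to fold.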

Results

The experiments were carried out on a Google Colab notebook using the GPU runtime type. The proposed CNN model was trained using the stratified 10-fold cross-validation method. In total, 11 epochs were used in each fold. Moreover, 69 steps for multi-class and 46 steps for binary classification were used in each epoch. The mini-batch size was 16. The training of the model was completed in a total of 7590 iterations for multi-class and 5060 iterations for binary classification. The time elapsed for the training of the model was 111.83 min for the multi-class CNN and 79.16 min for the binary CNN. The initial learning rate was 0.0001. The authors employed the ReduceLROnPlateau callback, which reduces the learning rate when the monitored metric stops improving: if no improvement is observed for a 'patience' number of epochs, the learning rate is reduced. The authors have defined patience=3 and min_lr=0.000001 in the proposed method. The Adam optimization method was used as the solver. The training and validation graphs with the loss, confusion matrix, and area under the receiver operating characteristic curve for each fold of the proposed CNN can be found in the supplementary files. After training all 10 fold models, the best model was identified and used for the validation testing with different datasets. The performance reported in the validation experiment is promising. The scripts and detailed information concerning the experiments can be consulted in the supplementary files. It is crucial to provide all the details about the methods and materials used to allow the readers to reproduce the results. Therefore, the Python scripts developed on the Google Colab platform are included as supplementary files. Supplementary file A presents the results for binary classification, and supplementary file B contains the results for multi-class classification.
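The step and iteration counts follow from the training-set sizes reported in the next subsections (1092 and 728 images), the mini-batch size and the number of epochs and folds. The sketch below reproduces that arithmetic and adds a simplified simulation of a reduce-on-plateau schedule. The helper names are hypothetical, the monitored quantity is assumed to be a loss, and `factor=0.1` is an assumption because the reduction factor is not stated in the text; this is not the Keras callback itself.

```python
import math


def steps_per_epoch(n_train, batch_size=16):
    """Number of mini-batches needed to see every training image once."""
    return math.ceil(n_train / batch_size)


def total_iterations(n_train, epochs=11, folds=10, batch_size=16):
    return steps_per_epoch(n_train, batch_size) * epochs * folds


print(steps_per_epoch(1092), total_iterations(1092))  # 69 7590
print(steps_per_epoch(728), total_iterations(728))    # 46 5060


def reduce_lr_on_plateau(losses, lr=1e-4, factor=0.1, patience=3, min_lr=1e-6):
    """Return the learning rate in effect after each epoch's monitored loss.

    Simplified logic: the rate is multiplied by `factor` (assumed value) after
    `patience` consecutive epochs without improvement, and never drops below min_lr.
    """
    best, wait, history = float("inf"), 0, []
    for loss in losses:
        if loss < best:
            best, wait = loss, 0
        else:
            wait += 1
            if wait >= patience:
                lr, wait = max(lr * factor, min_lr), 0
        history.append(lr)
    return history


schedule = reduce_lr_on_plateau([0.5, 0.4, 0.41, 0.42, 0.43, 0.3])
print(schedule)  # the rate drops after the third epoch without improvement
```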

Experimental results of binary classification

In this section, the results for the binary classification are presented. In total, 728 images have been used for training, and 80 images have been used for testing. The results are presented for each fold, and the average value is also reported. The accuracy, precision, recall and F1-score results are presented for each class and for both classes. Table 4 presents the results for the binary classification considering the COVID-19 class. The lower performance values of the model concerning the COVID-19 class samples are reported for folds 2, 3 and 8. The lowest precision values, 97.61% and 97.56%, occurred in fold 2 and fold 8, respectively. Moreover, the minimum recall value is 97.56% in fold 3. The F1-score is 98.79% in fold 2 and 98.76% in folds 3 and 8. The average precision, recall, and F1-score are 99.51%, 99.75% and 99.63%, respectively.

Table 4

Results of binary classification for COVID-19 class.

Fold	Precision	Recall	F1 score
1	100%	100%	100%
2	97.61%	100%	98.79%
3	100%	97.56%	98.76%
4	100%	100%	100%
5	100%	100%	100%
6	100%	100%	100%
7	100%	100%	100%
8	97.56%	100%	98.76%
9	100%	100%	100%
10	100%	100%	100%
Average	99.51%	99.75%	99.63%
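The averages in the last row of Table 4 can be recomputed from the per-fold values. The sketch below is only an arithmetic check; the `truncated_mean` helper is hypothetical, and truncating instead of rounding is an assumption made because it reproduces the reported figures (the recall mean of 99.756 would otherwise round to 99.76).

```python
import math


def truncated_mean(values, decimals=2):
    """Mean of percentage values, truncated (not rounded) to the given decimals."""
    scale = 10 ** decimals
    return math.floor(sum(values) / len(values) * scale) / scale


# Per-fold values of Table 4 (COVID-19 class, binary classification).
precision = [100, 97.61, 100, 100, 100, 100, 100, 97.56, 100, 100]
recall = [100, 100, 97.56, 100, 100, 100, 100, 100, 100, 100]
f1 = [100, 98.79, 98.76, 100, 100, 100, 100, 98.76, 100, 100]

print(truncated_mean(precision), truncated_mean(recall), truncated_mean(f1))
# 99.51 99.75 99.63
```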

The performance results for binary classification concerning the normal class are presented in Table 5. The lowest recall value is 97.56%, reported for fold 8. Furthermore, the precision is 97.61% and 97.56% for fold 2 and fold 3, respectively. Finally, the F1-score is 98.79% in fold 2 and 98.76% in folds 3 and 8.

Table 5

Results of binary classification for normal class.

Fold	Precision	Recall	F1 score
1	100%	100%	100%
2	97.61%	100%	98.79%
3	97.56%	100%	98.76%
4	100%	100%	100%
5	100%	100%	100%
6	100%	100%	100%
7	100%	100%	100%
8	100%	97.56%	98.76%
9	100%	100%	100%
10	100%	100%	100%
Average	99.51%	99.75%	99.63%

Table 6 presents the accuracy, precision, recall and F1-score results for binary classification for both classes. The average accuracy reported is 99.62%. Moreover, the average precision, recall, and F1-score are 99.64%, 99.63% and 99.62%, respectively.

Table 6

Results of binary classification between classes.

Fold	Accuracy	Precision	Recall	F-1 Score
1	100%	100%	100%	100%
2	98.76%	98.88%	98.75%	98.76%
3	98.76%	98.78%	98.78%	98.76%
4	100%	100%	100%	100%
5	100%	100%	100%	100%
6	100%	100%	100%	100%
7	100%	100%	100%	100%
8	98.76%	98.78%	98.78%	98.76%
9	100%	100%	100%	100%
10	100%	100%	100%	100%
Average	99.62%	99.64%	99.63%	99.62%


Experimental validation results of binary classification

An external dataset containing samples that have not been used during the training phase has been applied to validate the performance of the model and to check that the model is not overfitted. This external dataset contains 96 samples for the normal class and 100 samples for the COVID-19 class. To the best of the authors' knowledge, there is no similar study that has used this method to validate the proposed CNN model. The model trained in the 10-fold cross-validation has been used for validation. The confusion matrix for the validation test is presented in Fig. 2.

Fig. 2

Confusion matrix of the validation testing for the binary classifier.

The precision value is 100% for COVID-19 class and 98.96% for normal class. Moreover, the reported recall value is 99% for COVID-19 class and 100% for normal class. Finally, the F1-score is 99.49% for COVID-19 class and 99.48% for normal class. The average accuracy is 99.49%. Therefore, this experiment ensures the efficiency of the proposed method. The receiver operating characteristic for binary-class data is presented in Fig. 3.

Fig. 3

ROC curve of the validation testing for the binary classifier.
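The per-class metrics reported for the binary validation test follow directly from a confusion matrix. The sketch below shows the computation; the counts in `matrix` are an assumption inferred from the reported metrics and class sizes (99 of 100 COVID-19 images and all 96 normal images classified correctly), not values read from Fig. 2, and the function names are illustrative.

```python
def per_class_metrics(matrix, labels):
    """Precision, recall and F1 per class; rows are true labels, columns predictions."""
    results = {}
    for i, label in enumerate(labels):
        tp = matrix[i][i]
        predicted = sum(row[i] for row in matrix)  # column sum
        actual = sum(matrix[i])                    # row sum
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        denominator = precision + recall
        f1 = 2 * precision * recall / denominator if denominator else 0.0
        results[label] = (precision, recall, f1)
    return results


def accuracy(matrix):
    """Share of samples on the diagonal of the confusion matrix."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / sum(map(sum, matrix))


# Assumed counts (inferred, see above): rows and columns are COVID-19, Normal.
matrix = [[99, 1],
          [0, 96]]

for label, scores in per_class_metrics(matrix, ["COVID-19", "Normal"]).items():
    print(label, [f"{100 * score:.2f}%" for score in scores])
print(f"accuracy {100 * accuracy(matrix):.2f}%")
```

With these assumed counts the computed values agree with the reported ones to within 0.01 percentage points; the small differences are consistent with truncation rather than rounding.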


Experimental results of multi-class classification

The classification results for multi-class are presented in this section. In total, 1092 samples have been used for training, and 122 images have been used for testing. Three different classes are analyzed, namely COVID-19, normal and pneumonia. Table 7 presents the classification results concerning the COVID-19 class. On the one hand, the lowest precision value of 97.56% has been reported for folds 8, 9 and 10. On the other hand, the lowest F1-score of 98.76% is also reported for folds 8, 9 and 10. The average precision, recall and F1-score values are 99.26%, 100% and 99.62%, respectively.

Table 7

Results of multi-class classification for COVID-19 class.

Fold	Precision	Recall	F1 Score
1	100%	100%	100%
2	100%	100%	100%
3	100%	100%	100%
4	100%	100%	100%
5	100%	100%	100%
6	100%	100%	100%
7	100%	100%	100%
8	97.56%	100%	98.76%
9	97.56%	100%	98.76%
10	97.56%	100%	98.76%
Average	99.26%	100%	99.62%

The results for the normal class are presented in Table 8. The maximum precision value is 100%, reported in fold 6, and the minimum precision value is 92.50% in fold 10. Moreover, the lowest recall result is 92.50% in fold 10. Finally, the F1-score ranges from 92.50% in fold 10 to 98.79% in fold 8. The average results of 96.06%, 96.51% and 96.27% are presented for precision, recall and F1-score, respectively.

Table 8

Results of multi-class classification for normal class.

Fold	Precision	Recall	F1 Score
1	92.68%	95.00%	93.82%
2	95.00%	95.00%	95.00%
3	97.50%	97.50%	97.50%
4	95.00%	95.00%	95.00%
5	97.50%	97.50%	97.50%
6	100%	97.56%	98.76%
7	95.34%	100%	97.61%
8	97.61%	100%	98.79%
9	97.50%	95.12%	96.29%
10	92.50%	92.50%	92.50%
Average	96.06%	96.51%	96.27%

The results for the pneumonia class are presented in Table 9. These results show that precision ranges from 95% in fold 1 to 100% in folds 6, 7 and 8. The lowest value for recall is 87.80% in fold 10. Finally, the best F1-score value is reported in fold 6, and the lowest value is 92.30%, presented in fold 10. The average values reported for precision, recall and F1-score concerning pneumonia samples are 97.47%, 93.58% and 95.45%, respectively.

Table 9

Results of multi-class classification for pneumonia class.

Fold	Precision	Recall	F1 Score
1	95.00%	92.68%	93.82%
2	95.12%	95.12%	95.12%
3	97.50%	95.12%	96.29%
4	95.12%	95.12%	95.12%
5	97.43%	95.00%	96.20%
6	100%	100%	100%
7	100%	92.50%	96.10%
8	100%	92.50%	96.10%
9	97.29%	90.00%	93.50%
10	97.29%	87.80%	92.30%
Average	97.47%	93.58%	95.45%

The average score values across the COVID-19, normal and pneumonia classes are presented in Table 10. The average accuracy, precision, recall and F1-score are 96.70%, 97.59%, 96.69% and 97.11%, respectively. The highest average precision is reported at fold 6, and the lowest value is 95.78%, presented at fold 10. The lowest recall is 93.43%, presented at fold 10, and the highest is 99.18%, reported at fold 6. Furthermore, the average F1-score ranges from 94.52% at fold 10 to 99.58% at fold 6. The results presented in Table 10 support the performance of the proposed method for automated medical diagnosis.

Table 10

Results of multi-class classification between all classes.

Fold	Accuracy	Precision	Recall	F1 Score
1	95.90%	95.89%	95.89%	95.88%
2	96.72%	96.70%	96.70%	96.70%
3	97.54%	98.33%	97.54%	97.93%
4	96.72%	96.70%	96.70%	96.70%
5	97.52%	98.31%	97.50%	97.90%
6	99.17%	100%	99.18%	99.58%
7	97.52%	98.44%	97.50%	97.90%
8	97.52%	98.39%	97.50%	97.88%
9	95.04%	97.45%	95.04%	96.18%
10	93.38%	95.78%	93.43%	94.52%
Average	96.70%	97.59%	96.69%	97.11%
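
The "Average" row of Table 10 summarizes the ten folds. As a minimal sketch, the accuracy average and the best and worst folds can be recomputed from the per-fold accuracy column as shown below. The variable names are illustrative; only the ten accuracy values are taken from the table.

```python
from statistics import mean

# Per-fold accuracy values (%) copied from the Accuracy column of Table 10.
fold_accuracy = [95.90, 96.72, 97.54, 96.72, 97.52,
                 99.17, 97.52, 97.52, 95.04, 93.38]

average_accuracy = mean(fold_accuracy)

# Fold numbers are 1-based, so add 1 to the list index.
best_fold = max(range(len(fold_accuracy)), key=fold_accuracy.__getitem__) + 1
worst_fold = min(range(len(fold_accuracy)), key=fold_accuracy.__getitem__) + 1

print(f"average={average_accuracy:.2f}% best=fold {best_fold} worst=fold {worst_fold}")
```

The arithmetic mean of the fold accuracies agrees with the reported 96.70%. For the other columns, the last digit of a recomputed mean may differ from the table, since the published per-fold values are already rounded.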

Experimental validation results of multi-class classification

Similarly to the experiment presented in Section 4.2, an external dataset has been applied to validate the performance of the model. This external dataset contains 96 samples for the normal class, 100 samples for the COVID-19 class and 100 samples for the pneumonia class. The trained model of 10-fold has been used for validation. The confusion matrix extracted from the validation test for multi-class classification is presented in Fig. 4.

Fig. 4

Confusion matrix of the validation testing for the multi-class classifier.

The average reported accuracy is 96.62%. The precision, recall and F1-score for the COVID-19 class are all 98.00%. Moreover, 98.92%, 95.83% and 97.35% are reported for precision, recall and F1-score, respectively, concerning samples of the normal class. Finally, the precision, recall and F1-score reported for the pneumonia class are all 96.00%. The receiver operating characteristic (ROC) curve for multi-class data is presented in Fig. 5.

Fig. 5

ROC curve of the validation testing for the multi-class classifier.
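
Overall accuracy and per-class precision, recall and F1-score, such as those reported for the external validation set, are all derived from a confusion matrix like the one in Fig. 4. The following is a minimal sketch of that derivation. The function name and the 3x3 matrix are hypothetical and serve only as an illustration; they are not the counts shown in Fig. 4.

```python
def per_class_metrics(matrix, labels):
    """Compute accuracy and per-class (precision, recall, F1).

    matrix[i][j] is the number of samples of true class i predicted as class j.
    """
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(labels)))
    report = {}
    for k, label in enumerate(labels):
        tp = matrix[k][k]
        predicted = sum(row[k] for row in matrix)  # column sum: predicted as k
        actual = sum(matrix[k])                    # row sum: truly class k
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[label] = (precision, recall, f1)
    return correct / total, report


# Hypothetical matrix for illustration only; NOT the counts in Fig. 4.
labels = ["COVID-19", "normal", "pneumonia"]
matrix = [
    [9, 0, 1],  # true COVID-19
    [0, 8, 2],  # true normal
    [1, 1, 8],  # true pneumonia
]
accuracy, report = per_class_metrics(matrix, labels)
for label, (p, r, f1) in report.items():
    print(f"{label}: precision={p:.3f} recall={r:.3f} F1={f1:.3f}")
print(f"accuracy={accuracy:.3f}")
```

Rows are true classes and columns are predicted classes in this sketch; if a plotting tool uses the transposed convention, precision and recall swap roles.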

In summary, the results of the proposed CNN model for automated medical diagnostics support are promising. The reported average accuracy values for binary and multi-class are 99.62% and 96.70%, respectively. On the one hand, the proposed CNN model using EfficientNet architecture presents an average recall value of 99.63% and 96.69% concerning binary and multi-class, respectively. On the other hand, the average precision of 99.64% is reported by binary classification, and 97.54% is presented in multi-class. Finally, the average F1-score value for multi-class is 97.11%, and 99.62% is presented for binary classification.

Discussion

The proposed model is compared with the related work. It is crucial to mention that this comparison is limited by differences in the samples used and in the parameters of the machine learning methods. Furthermore, since most of the studies do not provide their software, it is not possible to apply the methods available in the literature to the samples included in our research, or vice versa. The number of new methods proposed in the literature by computer science researchers increases every day. Currently, the use of these systems to support health professionals is a trending topic. Different architectures such as DarkCovidNet, VGG19, MobileNet v2, CapsNet and nCOVnet have been proposed for automated medical diagnosis of COVID-19. This section aims to compare the results of the proposed model with similar studies available in the state of the art. Table 11 presents the results reported by binary classification studies available in the literature.

Table 11

Comparison of the state-of-art models for binary classification.

Ref.	Architecture	Accuracy	Recall	Specificity	Precision	F1-Score
[43]	DarkCovidNet	98.08%	95.13%	95.3%	98.03%	96.51%
[44]	VGG19	96.33%	97.05%	96.0%	91.6%	94.24%
[45]	VGG19	98.75%	92.85%	98.75%	–	–
[45]	MobileNet v2	97.40%	99.10%	97.09%	–	–
[46]	nCOVnet	88.10%	82.00%	97.06%	97.62%	89.13%
[47]	CapsNet	97.24%	97.42%	97.04%	97.08%	97.24%
[49]	ResNet 18	99.4%	100%	98.6%	99.00%	99.5%
[50]	Semi-supervised model	93.1%	83.5%	–	89.0%	82.6%
Proposed	EfficientNet	99.62%	99.63%	–	99.64%	99.62%

The proposed method reports the highest accuracy, precision and F1-score among the works compared for binary classification. Nevertheless, the authors of [45] proposed an architecture with a relevant accuracy of 98.75%. Moreover, the method proposed in [43] also presents a significant accuracy of 98.08%. Consequently, the implementation of the EfficientNet architecture presents promising results for automated medical diagnosis of COVID-19 concerning binary classification. Different methods for multi-class classification of COVID-19 patients are presented in Table 12.

Table 12

Comparison of the state-of-art models for multi-class classification.

Ref.	Architecture	Accuracy	Recall	Specificity	Precision	F1-Score
[43]	DarkCovidNet	87.02%	85.35%	92.18%	89.96%	87.37%
[45]	VGG19	93.48%	92.85%	98.75%	–	–
[45]	MobileNet v2	92.85%	99.10%	97.09%	–	–
[47]	CapsNet	84.22%	84.22%	91.79%	84.61%	84.21%
[48]	Proposed CNN	97.14%	94.61%	98.29%	–	95.75%
Proposed	EfficientNet	96.70%	96.69%	–	97.59%	97.11%

The accuracy levels reported by the methods in Table 12 range from 84.22% [47] to 97.14% [48]. Compared with the system proposed by the authors of [48], our method provides lower accuracy but higher recall and F1-score. The authors of [48] present a recall of 94.61% and an F1-score of 95.75%, whereas the proposed method provides 96.69% and 97.11%, respectively.
The authors have used stratified 10-fold cross-validation to evaluate the proposed models. On the one hand, Table 3 contains the precision, recall and F1-score of each fold for the binary classification considering the COVID-19 class. From the analysis of Table 3, the authors identified the best model and consequently chose the model trained in fold 10 for validation using samples that have not been included in the training process. On the other hand, Table 6 presents the precision, recall and F1-score of each fold for the multi-class classification considering the COVID-19 class. Based on the results of Table 6, the authors selected the model trained in fold 6 for validation using samples that have not been included in the training process. The output of validation testing is detailed in Sections 4.2 and 4.4 for binary and multi-class, respectively. Fig. 2 presents the confusion matrix and Fig. 3 shows the AUC-ROC curve concerning binary classification. The accuracy of 99.49% obtained in the validation process is similar to the testing accuracy of 99.62%, which indicates that the model is not overfitted. Fig. 4 shows the confusion matrix, and Fig. 5 presents the AUC-ROC curve for multi-class classification. The accuracy achieved during the validation process for multi-class is 96.62%, which is similar to the 96.70% accuracy reported during testing and indicates that the proposed model is not overfitted.
To the best of the authors’ knowledge, there is no automated system for COVID-19 diagnosis in the literature that combines the following features. First, the proposed model uses EfficientNet for transfer learning. Second, the proposed methods are evaluated using 10-fold stratified cross-validation. This method is used for selecting the images for testing and training. It reduces bias and ensures that every image is used nine times for training and once for testing. Third, the proposed model includes validation using an external dataset of images. In total, 296 images that have not been used during the training of the model are used for validation. The validation accuracy is similar to the testing accuracy, which indicates that the model is not overfitted. This is detailed in Sections 4.2 and 4.4. Furthermore, the Albumentations library is used to pre-process images during each fold. This library is not used to increase the size of the datasets as presented in the related work. Instead, Albumentations is implemented to increase transfer learning performance, reduce overfitting, and improve execution time. Finally, the source code of all the experiments is available as supplementary files to allow the readers to reproduce the experiments. The proposed models have been developed and tested using Google Colab. Therefore, the software can be executed on Google’s cloud servers. The proposed study uses the Adam optimization method, which combines the benefits of the AdaGrad and RMSProp methods. Most similar studies also used the Adam optimizer. Furthermore, Adam is currently recommended as the default algorithm, as it usually presents better results than RMSProp. Nevertheless, it is often also worth trying SGD with Nesterov momentum as an alternative. The authors aim to integrate the Adam optimizer with SOM (Self-Organizing Map) and PCA (Principal Component Analysis) to improve performance, as proposed by the authors of [55].
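
The property of 10-fold stratified cross-validation mentioned above, that every image is used nine times for training and once for testing while class proportions are preserved in each fold, can be illustrated with a small pure-Python sketch. This is not the authors' implementation (their scripts are in the supplementary files). The function `stratified_k_fold`, the seed, and the class sizes of 40 images each are assumptions made only for this example.

```python
import random
from collections import Counter, defaultdict


def stratified_k_fold(labels, k=10, seed=0):
    """Yield (train_indices, test_indices) with classes spread evenly over folds."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the samples of each class round-robin across the k folds.
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    for test_fold in range(k):
        test = sorted(folds[test_fold])
        train = sorted(i for f in range(k) if f != test_fold for i in folds[f])
        yield train, test


# Hypothetical label list: 40 images per class (NOT the paper's dataset sizes).
labels = ["COVID-19"] * 40 + ["normal"] * 40 + ["pneumonia"] * 40

train_count, test_count = Counter(), Counter()
per_fold_classes = []
for train, test in stratified_k_fold(labels, k=10):
    train_count.update(train)
    test_count.update(test)
    per_fold_classes.append(Counter(labels[i] for i in test))

print(per_fold_classes[0])
```

In this sketch each test fold holds four images of every class, and counting index occurrences confirms the nine-to-one usage pattern. In practice a library routine such as scikit-learn's `StratifiedKFold` would typically be preferred over a hand-written splitter.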
In the proposed work, the authors have used 10-fold stratified cross-validation and the Albumentations library to apply augmentations when pre-processing images in each fold, and not to increase the size of the training datasets as proposed by the authors of [48], [49]. In this way, the authors are able to use every image for both training and testing, which also improves the learning of the model. Augmentation can serve two purposes: increasing the training set size, or pre-processing images in each fold of k-fold cross-validation to improve the performance of the model, as proposed in [53]. In summary, the authors state the promising results of the EfficientNet architecture for automated diagnosis of COVID-19 for binary and multi-class classification. Moreover, the authors recommend the use of the Albumentations and ImageDataAugmentor modules. The presented work contributes to the current body of knowledge since it provides an effective solution for automated diagnosis of COVID-19. Systems such as the one proposed do not aim to replace medical professionals. Instead, these methods will support them and also reduce their exposure during the current pandemic scenario.

Conclusion

This paper has presented an automated system to support the diagnosis of COVID-19 patients. The proposed method implements the EfficientNet architecture and has been tested using 10-fold cross-validation. Furthermore, an external dataset has been used for validation. On the one hand, the average accuracy, recall, precision and F1-score for binary classification are 99.62%, 99.63%, 99.64% and 99.62%, respectively. On the other hand, the proposed model presents an average accuracy of 96.70% for multi-class. The average recall, precision and F1-score reported by our method are 96.69%, 97.59% and 97.11%, respectively. To the best of the authors’ knowledge, there is no similar study that proposes an automated method to detect COVID-19 using EfficientNet. The authors state relevant limitations that are transversal to all the methods available in the literature. First, despite the critical number of individuals that have been affected by COVID-19, the datasets available are not yet robust. Nevertheless, empirical knowledge regarding CNN applications states that performance will increase with the number of samples available and used for training. Moreover, it is critical to study in detail how this type of method performs considering the evolution of the disease in the patient. The proposed methods can be effective in detecting the disease at an advanced stage, but it is essential to focus on the initial stage, where the methods can have lower performance. The coding scripts are provided as supplementary files. It is crucial to share all the methods and materials to allow the readers to reproduce the results. Moreover, in this way, it is possible to support future research activities. The readers can consult, update and implement different parameters to improve the results.

CRediT authorship contribution statement

Gonçalo Marques: Literature analysis, Interpretation of results, Preparation of the manuscript. Deevyankar Agarwal: Literature analysis, Interpretation of results, Preparation of the manuscript. Isabel de la Torre Díez: Literature analysis, Interpretation of results, Preparation of the manuscript, Project administration.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
