CNN-based transfer learning-BiLSTM network: A novel approach for COVID-19 infection detection.

Muhammet Fatih Aslan¹, Muhammed Fahri Unlersen², Kadir Sabanci¹, Akif Durdu³.

Abstract

Coronavirus disease 2019 (COVID-19), which emerged in Wuhan, China, in 2019 and has spread rapidly all over the world since the beginning of 2020, has infected millions of people and caused many deaths. In response to this ongoing pandemic, countries all over the world have mobilized and introduced various restrictions and precautions to prevent the spread of the disease. Controlling the infection also requires identifying infected people. However, because the number of available Reverse Transcription Polymerase Chain Reaction (RT-PCR) tests is inadequate, chest computed tomography (CT) has become a popular tool to assist the diagnosis of COVID-19. In this study, two deep learning architectures are proposed that automatically detect positive COVID-19 cases using chest X-ray images. Lung segmentation (preprocessing) of the images given as input to these architectures is performed automatically with Artificial Neural Networks (ANN). Since both architectures contain the AlexNet architecture, the proposed method is a transfer learning application. The second architecture is a hybrid structure, as it also contains a Bidirectional Long Short-Term Memory (BiLSTM) layer, which takes temporal properties into account. The COVID-19 classification accuracy is 98.14% for the first architecture and 98.70% for the second, hybrid architecture. The results show that the proposed architectures are highly successful in infection detection; this study therefore contributes to previous studies in terms of both deep architectural design and high classification success.

Keywords: AlexNet; BiLSTM; COVID-19; Hybrid architecture; Transfer learning

Year: 2020 PMID: 33230395 PMCID: PMC7673219 DOI: 10.1016/j.asoc.2020.106912

Source DB: PubMed Journal: Appl Soft Comput ISSN: 1568-4946 Impact factor: 6.725

Introduction

A new disease caused by a virus of the coronavirus family, named COVID-19, spread from Asia to the rest of the world as a new wave of respiratory infections at the end of 2019. After the rapid worldwide spread of COVID-19 and its severe clinical manifestations, the World Health Organization (WHO) officially declared COVID-19 an unprecedented health crisis and a pandemic. Since it first emerged, the COVID-19 pandemic has had devastating economic consequences and has threatened human lives. The outbreak has spread around the world, with the total numbers of cases and deaths reported worldwide exceeding 36,000,000 and 1,000,000, respectively [1]. During this period, the presence and severity of pneumonia caused by COVID-19 have been evaluated in many research proposals. These studies aim to detect infected patients as early as possible so that they can be isolated to prevent further spread and can receive appropriate treatment. While the main technique for COVID-19 diagnosis is reverse transcription polymerase chain reaction (RT-PCR) testing, medical imaging such as computed tomography (CT), magnetic resonance imaging (MRI), positron emission tomography (PET), ultrasound (US), and chest X-ray (CXR) is an important alternative means of detection. Both CT and CXR can reveal abnormalities of lung diseases, including COVID-19 [2], [3]. Even though CXR is more widely available in hospitals around the world, chest CT is more sensitive than CXR for the early detection of COVID-19-related changes, as well as for staging the disease and monitoring its progression [4], [5]. Moreover, CT imaging is considered a powerful analysis tool [6], [7] that is widely applied in biomedical imaging [8] and clinical diagnosis [9] and provides non-destructive 3D visualization of internal structures. However, the features of community-acquired bacterial pneumonia are difficult to distinguish from those of COVID-19 [10].
Many feature-selection studies have used metaheuristic optimizers such as the Genetic Algorithm (GA) [11], Grey Wolf Optimizer (GWO) [12], Particle Swarm Optimization (PSO) [13], hybrid PSO–GWO [14], Satin Bowerbird Optimizer (SBO) [15], Biogeography-Based Optimizer (BBO) [16], Bat Algorithm (BA) [17], Multiverse Optimization (MVO) [18], and Firefly Algorithm (FA) [19], evaluated with measures such as average error, mean fitness, average selection size, standard deviation of fitness, worst fitness, and best fitness. Nowadays, huge datasets can be evaluated much more easily thanks to deep learning models. As in many other areas, the most widely used deep learning models in medicine are based on Convolutional Neural Networks (CNN) [20]. Early diagnosis gives patients time for better medical care and more personalized therapies [21]. A CNN is a type of artificial neural network and one of the machine learning algorithms. Several CNN models, such as GoogLeNet [22], VGG-Net [23], ResNet [24], and AlexNet [25], have been popular for image classification over the last decade. Ardakani, Kanafi, Acharya, Khadem and Mohammadi [26] applied the state-of-the-art CNN architectures mentioned above to differentiate COVID-19 cases from other (non-COVID) cases. According to their experiments, deep learning techniques using radiograph images could be a usable method to identify COVID-19. As can be seen in the Related Works section, many of the deep learning-based studies for the diagnosis of COVID-19 are only CNN-based. CNNs can learn local responses from temporal or spatial data, but they lack the ability to learn sequential correlations. Unlike CNNs, Recurrent Neural Networks (RNNs) specialize in sequential modeling but cannot extract features in parallel. Moreover, RNNs suffer from the vanishing gradient problem, which prevents them from learning long data sequences. Long short-term memory (LSTM) [27] is an RNN architecture that effectively solves the vanishing gradient problem.
Moreover, it can learn long data sequences. Bidirectional LSTM (BiLSTM) [28] processes a sequence in both directions; that is, the signal propagates backward as well as forward in time. Whereas an LSTM uses only past (historical) context, a BiLSTM also uses future context; therefore, BiLSTM can perform sequential modeling tasks better than LSTM [29]. In this research, deep learning architectures for COVID-19 infection detection are proposed. The COVID-19 Radiography Database [30] is used as the dataset. First, ANN-based segmentation is applied to the raw chest X-ray images before training to improve classification accuracy. As a result of the segmentation, the lung region of each raw image is cropped. To provide data diversity, the number of segmented images is increased with a data augmentation technique. Then, 85% of these images are used to train the proposed architectures. Both architectures include the pre-trained AlexNet architecture (transfer learning). The first architecture is a version of AlexNet modified for chest X-ray images. The second architecture extends the first with a BiLSTM layer, which takes into account the temporal features in the image. This study contributes to previous studies by providing ANN-based lung segmentation, proposing a hybrid structure that combines a BiLSTM layer with transfer learning, and achieving high classification success. The contributions of this research are as follows: (i) ANN-based automatic lung segmentation is performed to obtain robust features; (ii) a CNN-based transfer learning–BiLSTM network is developed for early detection of COVID-19 infection; (iii) the proposed hybrid method is benchmarked against other state-of-the-art models; and (iv) the proposed model is uncomplicated and detects COVID-19 fully automatically. The rest of the paper is organized as follows. Section 2 surveys the related work on COVID-19. The methodology and dataset are described in Section 3.
Performance evaluation and results are presented in Section 4. Section 5 discusses the results and compares them with previous studies. Finally, Section 6 concludes the article and outlines future work.
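The overview above states that 85% of the segmented and augmented images are used for training. The split procedure itself is not described in this section, so the following Python sketch assumes a per-class (stratified) random split; the function name `stratified_split` and the fixed seed are illustrative assumptions, not the authors' code. The class sizes are those reported in the Methodology (COVID-19 after augmentation, Viral Pneumonia, Normal).

```python
import random

def stratified_split(labels, train_fraction=0.85, seed=0):
    """Return (train_idx, test_idx) so that each class contributes train_fraction of its samples to training.

    Hypothetical helper: assumes the split is done independently within each class.
    """
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = round(train_fraction * len(indices))
        train_idx.extend(indices[:n_train])
        test_idx.extend(indices[n_train:])
    return sorted(train_idx), sorted(test_idx)

# Class sizes after augmentation, as reported in the text (COVID-19 class x5).
counts = {"COVID-19": 1095, "Viral Pneumonia": 1345, "Normal": 1341}
labels = [name for name, n in counts.items() for _ in range(n)]
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```

Under this assumption, 931 + 1143 + 1140 = 3214 of the 3781 images go to training and 567 to testing.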

Related works

In the literature, there are many studies that employ artificial intelligence for various purposes, such as Alzheimer's disease diagnosis, cancer estimation, and biopsy and dermoscopy analysis [31], [32], [33], [34], [35]. Recently, the COVID-19 pandemic has placed a heavy workload on health workers. Therefore, any assistance that artificial intelligence can offer physicians eases their work and supports sounder decisions. In this context, various methods have been proposed in the literature to interpret X-ray or computed tomography images for COVID-19. Some of the previous studies can be summarized as follows. Khan and Aslam [36] proposed a tool based on deep-learning techniques for the diagnosis of COVID-19 from X-ray images. Several deep-learning models were investigated in that study, and the VGG-16 and VGG-19 models were reported to perform better than the others. Aman Jaiswal [37] used X-ray images to diagnose COVID-19 with various deep learning algorithms. In that study, the performances of CNN architectures were compared, and a majority rule was additionally suggested as a novel approach. The best performance reported in the paper was 98.96%. Nour, Cömert and Polat [38] proposed a CNN-based automatic diagnosis system for detecting positive COVID-19 cases from X-ray images. The suggested CNN model, a serial network with five convolutional layers, was trained from scratch. The discriminative features extracted by the CNN model were used to feed machine learning algorithms such as k-NN, SVM, and DT. The best results were obtained with the SVM classifier, with an accuracy of 98.97%. Chowdhury, Rahman, Khandakar, Mazhar, Kadir, Mahbub, Islam, Khan, Iqbal and Al-Emadi [30] presented a deep CNN (DCNN)-based transfer learning approach for the automatic detection of COVID-19 pneumonia from X-ray images. Eight popular CNN-based deep learning algorithms (SqueezeNet, ResNet18, InceptionV3, etc.) were trained.
In the three-class experiments with data augmentation, the highest classification accuracy, 97.94%, was achieved with DenseNet. Asif and Wenhui [39] proposed a DCNN-based automatic COVID-19 diagnosis system using X-ray images. The X-ray images were fed, without any preprocessing, to the DCNN-based Inception-V3 model with transfer learning. The classification accuracy of the diagnosis system was 96%. Toğaçar, Ergen and Cömert [40] used a deep learning model to detect COVID-19 from X-ray images. In that study, the images were restructured using the Fuzzy Color technique as a preprocessing step, and the restructured images were stacked with the original images. The stacked dataset was then trained using the MobileNetV2 and SqueezeNet deep learning models, and Support Vector Machines (SVM) were employed for image classification. The classification rate obtained with this approach was 99.27%. Ucar and Korkmaz [41] demonstrated an artificial intelligence-based structure that uses chest X-ray images to diagnose COVID-19. SqueezeNet was tuned for COVID-19 diagnosis with Bayesian optimization, and the dataset was augmented. The reported COVID-19 classification accuracy was 98.3%. Ozturk, Talo, Yildirim, Baloglu, Yildirim and Acharya [42] proposed an automatic COVID-19 detection system based on deep learning. The DarkCovidNet model was designed for the automatic detection of COVID-19 from chest X-ray images. In that study, no preprocessing, such as augmentation or segmentation, was applied to the X-ray images. For three-class classification, the reported accuracy was 87.02%. Khan, Shah and Bhat [43] introduced a CNN architecture (CoroNet) to detect COVID-19 from CT and X-ray scans. This model was based on the Xception architecture pre-trained on the ImageNet dataset.
The proposed architecture was shown to achieve 89.6% and 95% accuracy for the four-class and three-class problems, respectively. Sharma, Rani and Gupta [44] created deep learning models to identify COVID-19 patients from X-ray images. To increase the number of X-ray images, they applied 25 different types of augmentation to the original images, and they reported better performance than previous studies. Narin, Kaya and Pamuk [45] performed COVID-19 diagnosis from chest X-ray images by developing ResNet50, Inception-ResNetv2, and InceptionV3 deep learning models. In that study, binary classification was performed and the models were validated with 5-fold cross-validation. An average accuracy of 98% was achieved with the ResNet50 model. Ahuja, Panigrahi, Dey, Rajinikanth and Gandhi [46] developed a three-step implementation to perform COVID-19 detection using CT images. In the first step, they increased the amount of data using three-level stationary wavelet decomposition. In the second step, they performed classification based on transfer learning using pre-trained models. In the last step, abnormalities in the CT images were localized. Finally, in a different study, Singh, Bansal, Ahuja, Dubey, Panigrahi and Dey [47] fine-tuned the VGG16 architecture after image augmentation and preprocessing, and used the model to extract features from lung CT scan images. Classification was performed using four different classifiers (CNN, Extreme Learning Machine (ELM), online sequential ELM, and Bagging Ensemble with SVM). The highest success rate, 95.7%, was achieved with Bagging Ensemble with SVM. Many more such studies could be listed. In the diagnosis of COVID-19 infection, both the images given to the network and the network architecture strongly affect the results.
As seen above, methods such as CNNs, transfer learning, and machine learning have frequently been used for the diagnosis of COVID-19. In addition, most of the works apply operations such as image augmentation, cropping, and image size reduction to the raw images and give the resulting images to the deep network. According to Liu and Guo [29], a BiLSTM layer contributes more to classification accuracy than a convolutional layer. However, BiLSTM, a relatively recent architecture reported to achieve higher classification success than CNN, has not been used until now in previous studies on the diagnosis of COVID-19. What distinguishes this study from others is that ANN-segmented lung images are given to a CNN-based transfer learning–BiLSTM network. The results show that the proposed method provides successful and easy-to-apply COVID-19 diagnosis.

Methodology

In this section, detailed information will be given about the COVID-19 Radiography Database, lung segmentation, data augmentation and finally the architectures used for classification. A general block diagram of the study is given in Fig. 1. The information about each block in Fig. 1, the architectures used and the results obtained are discussed under five headings.

Fig. 1

Block diagram of the proposed algorithm.

COVID-19 radiography database

In this study, an open-access database of posterior-to-anterior chest X-ray images is used [30]. A team of researchers from Qatar University, Doha, Qatar, and the University of Dhaka, Bangladesh, together with collaborators from Pakistan and Malaysia and in collaboration with medical doctors, created this chest X-ray image database of COVID-19 positive cases along with Normal and Viral Pneumonia images. The current version of the database contains 219 COVID-19 positive images, 1345 viral pneumonia images, and 1341 normal images. As shown in Table 1, the dataset contains a total of 2905 images in three classes.

Table 1

Number of samples belonging to each class in the COVID-19 Radiography Database.

Class	Number of samples
COVID-19	219
Viral Pneumonia	1345
Normal	1341
Total	2905

The COVID-19 Radiography Database was created by collecting samples from different sources, such as the Italian Society of Medical and Interventional Radiology (SIRM) COVID-19 Database [48], the Novel Corona Virus 2019 Dataset developed on GitHub by Joseph Paul Cohen, Paul Morrison, and Lan Dao [49], and images extracted from 43 different publications. A sample image from each of the three classes is shown in Fig. 2.

Fig. 2

Sample images of COVID-19 Radiography Database.

Lung segmentation

Image segmentation is an important issue for artificial intelligence, because noise or irrelevant patterns in an image can lead to false predictions. As seen in Fig. 3, the raw X-ray images in the dataset contain various types of noise. The segmentation process is applied so that this noise is not taken into account by the artificial intelligence algorithm.

Fig. 3

Noise or irrelevant patterns in a sample X-ray image.

In a chest X-ray image, the lung region is examined for COVID-19 detection. For this reason, an image containing only the lung region enables more successful detection. The positions of the lungs in the raw images are not fixed, so each image must be considered separately to segment the lung region. Since the COVID-19 Radiography Database contains 2905 images, the segmentation process should be performed automatically rather than manually. To use only the lung region in the classification process, ANN-based automatic segmentation is adopted in this study. In the ANN-based automatic segmentation, 50 images from each of the COVID-19, Normal, and Viral Pneumonia classes are selected as input for the ANN. For each selected image, three selection points (bottom left, bottom right, and middle top) are determined manually, as in Fig. 4(a). A mask is then created from these three selection points (see Fig. 4(b)). Finally, this mask is multiplied with the original image to obtain the result image shown in Fig. 4(c), and the result images are converted to grayscale.
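The manual masking step can be sketched in NumPy as follows. This is a minimal sketch under two assumptions not stated in the text: the mask is the filled triangle spanned by the three selection points, given as (x, y) pixel coordinates, and grayscale conversion uses the ITU-R BT.601 luma weights (the weights MATLAB's `rgb2gray` uses). The helper names `triangle_mask` and `masked_grayscale` are hypothetical.

```python
import numpy as np

def triangle_mask(shape, points):
    """Boolean mask (rows x cols) that is True inside the triangle spanned by three (x, y) points."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    (x1, y1), (x2, y2), (x3, y3) = points

    def edge(ax, ay, bx, by):
        # Cross product of (b - a) and (pixel - a): its sign says which side of edge a->b a pixel is on.
        return (bx - ax) * (ys - ay) - (by - ay) * (xs - ax)

    d1 = edge(x1, y1, x2, y2)
    d2 = edge(x2, y2, x3, y3)
    d3 = edge(x3, y3, x1, y1)
    has_neg = (d1 < 0) | (d2 < 0) | (d3 < 0)
    has_pos = (d1 > 0) | (d2 > 0) | (d3 > 0)
    return ~(has_neg & has_pos)  # inside the triangle or on one of its edges

def masked_grayscale(rgb, points):
    """Multiply an RGB image (H x W x 3) by the triangle mask, then convert it to grayscale."""
    mask = triangle_mask(rgb.shape[:2], points)
    masked = rgb * mask[..., None]
    return masked @ np.array([0.2989, 0.5870, 0.1140])  # BT.601 luma weights (assumed)

# Synthetic 10 x 10 white image; points are bottom-left, bottom-right, middle-top.
rgb = np.ones((10, 10, 3))
gray = masked_grayscale(rgb, [(0, 9), (9, 9), (4, 0)])
```

Pixels outside the triangle become zero, while pixels inside keep their (grayscale) intensity.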

Fig. 4

Manually segmented X-ray images.

After 50 images from each class are converted into result images as in Fig. 4, an ANN is trained to create these result images automatically. Each 1024 × 1024 result image is reshaped into a 1 × 1048576 row vector. This vector is used as the input of the ANN, and the coordinates of the selection points are used as its output. The ANN is trained with the Levenberg–Marquardt backpropagation method. The network has one hidden layer with 100 neurons, and its output layer consists of 6 neurons, which represent the two coordinates (x and y) of each of the three selection points. The training error of the ANN is 0.94. After training, all of the remaining images are given as input to the trained ANN, and cropping is applied according to the estimated coordinates. Fig. 5 shows an original X-ray image and the corresponding result image.
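The input and output format of the segmentation ANN can be sketched as follows. The Levenberg–Marquardt training itself is not reproduced here; the sketch only shows how an image becomes the 1 × 1048576 input vector and how six predicted values could be turned into a crop. Cropping to the axis-aligned bounding box of the three predicted points, and the output ordering [x1, y1, x2, y2, x3, y3], are assumptions made for illustration, and both helper names are hypothetical.

```python
import numpy as np

def to_row_vector(image):
    """Flatten a 1024 x 1024 grayscale image into the 1 x 1048576 ANN input vector."""
    return image.reshape(1, -1)

def crop_from_prediction(image, coords):
    """Crop the bounding box of three predicted points.

    coords: six values ordered [x1, y1, x2, y2, x3, y3] (assumed ordering of the 6 output neurons).
    """
    pts = np.asarray(coords, dtype=float).reshape(3, 2)
    h, w = image.shape[:2]
    x_min, y_min = np.floor(pts.min(axis=0)).astype(int)
    x_max, y_max = np.ceil(pts.max(axis=0)).astype(int)
    # Clip to the image so that slightly out-of-range predictions do not fail.
    x_min, y_min = max(x_min, 0), max(y_min, 0)
    x_max, y_max = min(x_max, w - 1), min(y_max, h - 1)
    return image[y_min:y_max + 1, x_min:x_max + 1]

x = to_row_vector(np.zeros((1024, 1024)))
img = np.arange(100).reshape(10, 10)
crop = crop_from_prediction(img, [2, 8, 7, 8, 4.5, 1])
```

In this toy example the predicted points span columns 2 to 7 and rows 1 to 8, so the crop is 8 × 6 pixels.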

Fig. 5

Proposed ANN-based segmentation.

Data augmentation

One way to achieve successful classification in deep learning is to have a large dataset. However, it is not always possible to obtain a large amount of data. For this reason, the data can be augmented artificially to increase classification success. However, data augmentation methods do not introduce genuinely new visual features that could significantly improve the learning ability of the algorithm or the generalization ability of the network. Color-, texture-, and geometry-based data augmentation techniques are not equally popular, because each has different disadvantages. Currently, mainly geometric transformations are used, although a wide variety of other interesting methods have been developed in the past. In this study, image rotation, one of the geometric transformation methods, is used to increase the number of chest X-ray images. While this technique provides data diversity and more accurate classification, it also has disadvantages such as additional memory, transformation computation costs, and additional training time [50], [51]. Rotation is applied to the images of the COVID-19 class, which has far fewer samples than the other classes (see Table 1). Four rotated copies are generated for each COVID-19 image, so after data augmentation the COVID-19 class grows from 219 to 1095 samples (219 × 5). The cropped chest X-ray images are rotated counterclockwise by a randomly generated angle between 0° and 359° (see Fig. 6).
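The rotation-based augmentation described above can be sketched in NumPy as follows. The sketch assumes integer angles drawn uniformly from 0° to 359°, nearest-neighbour interpolation, zero fill, and an unchanged image size; none of these implementation details is stated in the text, and the helper names are hypothetical. With four rotated copies per image, 219 COVID-19 images become 219 × 5 = 1095.

```python
import numpy as np

def rotate_nearest(image, angle_deg):
    """Rotate a 2-D image counterclockwise about its centre.

    Nearest-neighbour sampling; the output keeps the input size and pixels that map
    outside the source image are set to zero.
    """
    h, w = image.shape
    theta = np.deg2rad(angle_deg)
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    ys, xs = np.mgrid[0:h, 0:w]
    u, v = xs - cx, cy - ys  # centred coordinates with v pointing up
    # Inverse mapping: rotate each output position by -theta to find its source pixel.
    us = u * np.cos(theta) + v * np.sin(theta)
    vs = -u * np.sin(theta) + v * np.cos(theta)
    src_x = np.rint(us + cx).astype(int)
    src_y = np.rint(cy - vs).astype(int)
    valid = (src_x >= 0) & (src_x < w) & (src_y >= 0) & (src_y < h)
    out = np.zeros_like(image)
    out[valid] = image[src_y[valid], src_x[valid]]
    return out

def augment_with_rotations(images, copies=4, seed=0):
    """Keep every original image and add `copies` randomly rotated versions of each."""
    rng = np.random.default_rng(seed)
    augmented = list(images)
    for img in images:
        for _ in range(copies):
            angle = int(rng.integers(0, 360))  # random integer angle in [0, 359] degrees
            augmented.append(rotate_nearest(img, angle))
    return augmented

covid = [np.zeros((8, 8)) for _ in range(219)]  # stand-ins for the 219 cropped COVID-19 images
augmented = augment_with_rotations(covid)
```

Inverse mapping (looking up a source pixel for every output pixel) avoids the holes that forward mapping would leave in the rotated image.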

Fig. 6

Rotation operation.

Proposed mAlexNet architecture

Nowadays, deep learning-based artificial intelligence studies provide state-of-the-art solutions in computer vision. CNNs, a deep learning method, are now preferred for image recognition applications in many disciplines. CNNs can easily distinguish small details that people may not notice, and they recognize visual patterns directly from pixel images with minimal preprocessing. The CNN structure was introduced with the LeNet architecture [52], and AlexNet [53] made CNNs popular. With the various designs and applications developed since then, the popularity of CNNs has grown exponentially. AlexNet was the first CNN to win the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), in 2012. AlexNet is a deep CNN consisting of five convolutional layers, two fully connected layers, and a single Softmax output layer. Its image input layer requires 227 × 227 × 3 images. Each convolutional layer is followed by a Rectified Linear Unit (ReLU) activation function [53]. Max pooling after the first, second, and fifth convolutional layers reduces the size of the feature maps. After the last convolutional layer, there are two fully connected layers with 4096 outputs each. Finally, a classification layer is added after the fully connected layers; it classifies 1000 object categories using the Softmax function [54], [55], [56]. Many different classification studies have been performed with AlexNet. Lately, however, transfer learning, which is more efficient, has increasingly been preferred in deep learning studies over designing a network from scratch or using an existing network unchanged, because using and modifying a pre-trained CNN model is much easier and faster than training a new CNN model with randomly initialized weights. In CNN-based architectures, visual features are usually extracted and learned in the first layers.
Therefore, the first layers are kept unchanged, and changes are made only in the last layers to take advantage of an existing architecture. In transfer learning, architectures trained on large datasets are reused: previously learned parameters, especially weights, are transferred to the modified new model [57]. Although the new dataset differs in content from the network's original training data, the low-level features are similar. By transferring the parameters of the pre-trained model, the new model gains a powerful feature extraction capability, and its training computation and memory costs are reduced [55]. An example of transfer learning is shown in Fig. 7: the last three layers of the pre-trained CNN model are removed, new layers are added for the new task, and the modified network is then used for the new classification task.

Fig. 7

An example transfer learning process.

In the first step of this study, chest X-ray images are classified using a transfer learning-based modified AlexNet (mAlexNet) architecture. AlexNet consists of 25 layers, including convolution, fully connected (fc), ReLU, normalization, and pooling layers. The AlexNet architecture is configured for 1000 classes, so its last three layers are removed and replaced with layers that classify COVID-19, Viral Pneumonia, and Normal images. The remaining parameters of the original AlexNet model are preserved. The removed layers and the modified or newly added layers are shown in Fig. 8. Whereas the fc8 layer of the original AlexNet model has 1000 neurons, the newly added fc8 layer has 25 neurons, because the proposed second (hybrid) model requires 25 features. Fig. 8 shows the proposed mAlexNet architecture, and Table 2 lists the mAlexNet layers, their parameters, and the parameters of the training algorithm.
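To make the layer replacement concrete, the sketch below represents the network as a plain Python list of layer names. The names assume the 25-layer `alexnet` shipped with MATLAB's Deep Learning Toolbox, and the composition of the new head (fc8 with 25 neurons and ReLU, then a three-class layer) is an illustrative reading of Table 2 and the text rather than the authors' exact layer graph. In a real framework, the kept layers would carry their pre-trained weights while the new layers are freshly initialized.

```python
# Layer names of MATLAB's pretrained 25-layer `alexnet` network (assumed for illustration).
ALEXNET_LAYERS = [
    "data",
    "conv1", "relu1", "norm1", "pool1",
    "conv2", "relu2", "norm2", "pool2",
    "conv3", "relu3", "conv4", "relu4", "conv5", "relu5", "pool5",
    "fc6", "relu6", "drop6", "fc7", "relu7", "drop7",
    "fc8", "prob", "output",  # 1000-class head that is removed
]

def replace_head(layers, new_head, n_remove=3):
    """Return a copy of `layers` with the last `n_remove` layers replaced by `new_head`.

    The kept layers stand for pre-trained layers whose weights are transferred unchanged.
    """
    if not 0 < n_remove <= len(layers):
        raise ValueError("n_remove must be between 1 and len(layers)")
    return layers[:-n_remove] + list(new_head)

# Hypothetical new head: fc8 with 25 neurons + ReLU, then a 3-class fully connected
# layer (COVID-19, Viral Pneumonia, Normal), softmax and classification output.
NEW_HEAD = ["fc8 (25)", "relu8", "fc9 (3)", "softmax", "classoutput"]
m_alexnet = replace_head(ALEXNET_LAYERS, NEW_HEAD)
print(m_alexnet[-6:])
```

The first 22 layers, which hold the transferable convolutional and fully connected features, are left untouched; only the task-specific head changes.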

Fig. 8

Proposed mAlexNet architecture.

Table 2

Layers and parameters of the proposed mAlexNet.

Layer name	Output size	Filter size	Stride	Padding	Output channels	Activation function
conv1	55 × 55	11 × 11	4	0	96	relu
maxpool1	27 × 27	3 × 3	2	0	96	–
conv2	27 × 27	5 × 5	1	2	256	relu
maxpool2	13 × 13	3 × 3	2	0	256	–
conv3	13 × 13	3 × 3	1	1	384	relu
conv4	13 × 13	3 × 3	1	1	384	relu
conv5	13 × 13	3 × 3	1	1	256	relu
maxpool5	6 × 6	3 × 3	2	0	256	–
fc6	–	–	–	–	4096	relu
fc7	–	–	–	–	4096	relu
fc8	–	–	–	–	25	relu
fc9	–	–	–	–	3	softmax

Training parameters

Optimizer	Max. epochs	Mini-batch size	Initial learning rate (α)	Momentum (γ)
SGDM	100	60	0.001	0.95
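The spatial sizes in Table 2 follow from the standard output-size formula for convolution and pooling layers, ⌊(n + 2p − k)/s⌋ + 1, applied layer by layer from the 227 × 227 input. The short Python check below recomputes them from the filter size (k), stride (s), and padding (p) columns; it is a consistency check, not part of the authors' pipeline.

```python
def out_size(n, kernel, stride, padding):
    """Spatial output size of a convolution or pooling layer: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * padding - kernel) // stride + 1

# (layer, filter size k, stride s, padding p) taken from Table 2
LAYERS = [
    ("conv1", 11, 4, 0), ("maxpool1", 3, 2, 0),
    ("conv2", 5, 1, 2), ("maxpool2", 3, 2, 0),
    ("conv3", 3, 1, 1), ("conv4", 3, 1, 1), ("conv5", 3, 1, 1),
    ("maxpool5", 3, 2, 0),
]

n = 227  # input images are 227 x 227 x 3
sizes = {}
for name, k, s, p in LAYERS:
    n = out_size(n, k, s, p)
    sizes[name] = n

fc6_inputs = sizes["maxpool5"] ** 2 * 256  # 6 x 6 x 256 values flattened into fc6
print(sizes, fc6_inputs)
```

The computed sizes (55, 27, 27, 13, 13, 13, 13, 6) match the size column of Table 2, and fc6 receives 6 × 6 × 256 = 9216 inputs.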

The training parameters show that the mini-batch size, which determines how the training data are divided into small groups, is 60, and that the optimization algorithm used to reduce the training error is Stochastic Gradient Descent with Momentum (SGDM). The parameter values of this algorithm are also given in Table 2. The parameter update equation of the SGDM algorithm is given in Eq. (1). The goal of this update is to adjust the weights according to the error value given by the loss function so that the next error value decreases. For this purpose, in each iteration the weights are moved in small steps, scaled by the learning rate (α), in the direction of the negative gradient of the loss function; the gradient is computed with the backpropagation algorithm. The momentum coefficient (γ) determines how much the weight change of the previous iteration contributes to the current update. Using the parameter values in Table 2 in Eq. (1) minimizes the classification error on the chest X-ray training images. In short, the network shown in Fig. 8 is trained with the SGDM algorithm using the parameters in Table 2. The accuracy values obtained on the test images after training are given in the Results and Discussion section.
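Eq. (1) itself is not reproduced in this section. The Python sketch below implements SGDM in the form documented for MATLAB's `sgdm` solver, θ(ℓ+1) = θ(ℓ) − α∇E(θ(ℓ)) + γ(θ(ℓ) − θ(ℓ−1)), with the Table 2 values α = 0.001 and γ = 0.95. The toy loss E(θ) = θ² and its exact (noise-free) gradient are assumptions used only to show the update converging; they are not the network's loss.

```python
def sgdm(grad, theta0, lr=0.001, momentum=0.95, steps=2000):
    """Gradient descent with momentum (heavy-ball form).

    theta_{l+1} = theta_l - lr * grad(theta_l) + momentum * (theta_l - theta_{l-1})
    """
    theta_prev = theta = theta0
    for _ in range(steps):
        theta_next = theta - lr * grad(theta) + momentum * (theta - theta_prev)
        theta_prev, theta = theta, theta_next
    return theta

# Toy loss E(theta) = theta**2, whose gradient is 2 * theta; the minimum is at 0.
theta_final = sgdm(lambda t: 2.0 * t, theta0=1.0)
print(theta_final)
```

With γ = 0.95, the accumulated momentum makes the effective step roughly α/(1 − γ) = 0.02 times the gradient, which is why a small α still converges quickly here.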

Proposed hybrid model

The second architecture designed for the detection of COVID-19 includes the first architecture completely; its convolutional structure is exactly the same as that of the previous architecture. The most important feature that distinguishes the hybrid structure from the previous architecture is the BiLSTM layer. As seen in Fig. 9, Flatten and Bidirectional Long Short-Term Memory (BiLSTM) layers are added to the previous architecture. The Flatten layer simply transforms the data into a one-dimensional array to prepare it as input for the BiLSTM layer. LSTM and BiLSTM are Recurrent Neural Network (RNN) architectures used to process sequential data. Unlike traditional neural network algorithms, an RNN assumes a dependency between successive inputs, so it is suitable for sequential and temporal data. LSTM has become very popular because it solves the vanishing gradient problem of RNNs. Unlike LSTM, which stores only past (historical) information, BiLSTM processes the sequence in both directions and therefore also captures the relationship between each element and the elements that follow it. For sequential data, BiLSTM therefore generally provides more successful results than LSTM.
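The bidirectional processing described above can be illustrated with a small NumPy forward pass. The sketch follows the usual LSTM cell equations with sigmoid gate activations and tanh state activations (the activation choices listed in Table 3) and uses the hidden sizes of BiLSTM-1 (125) and BiLSTM-2 (100). The random weights, the sequence length, the 25-dimensional input per step, and summarizing the second layer by its final forward and backward states are all assumptions, because this section does not specify how the fc8 features are arranged into a sequence.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_lstm(rng, in_dim, hidden):
    """Random weights for one LSTM direction: W (4h x d), U (4h x h), b (4h)."""
    s = 1.0 / np.sqrt(hidden)
    return (rng.uniform(-s, s, (4 * hidden, in_dim)),
            rng.uniform(-s, s, (4 * hidden, hidden)),
            np.zeros(4 * hidden))

def lstm_forward(x_seq, W, U, b):
    """Run one LSTM direction over x_seq (T x d) and return all hidden states (T x h)."""
    hidden = U.shape[1]
    h = np.zeros(hidden)
    c = np.zeros(hidden)
    states = []
    for x_t in x_seq:
        z = W @ x_t + U @ h + b
        i = sigmoid(z[:hidden])                # input gate
        f = sigmoid(z[hidden:2 * hidden])      # forget gate
        g = np.tanh(z[2 * hidden:3 * hidden])  # candidate cell state
        o = sigmoid(z[3 * hidden:])            # output gate
        c = f * c + i * g
        h = o * np.tanh(c)
        states.append(h)
    return np.stack(states)

def bilstm_forward(x_seq, fwd, bwd):
    """Forward pass over t = 1..T, backward pass over t = T..1, outputs concatenated per step."""
    h_fwd = lstm_forward(x_seq, *fwd)
    h_bwd = lstm_forward(x_seq[::-1], *bwd)[::-1]  # re-align backward states to time order
    return np.concatenate([h_fwd, h_bwd], axis=1)  # T x 2h

rng = np.random.default_rng(0)
T, d = 5, 25                      # hypothetical sequence: 5 steps of 25 features each
x = rng.standard_normal((T, d))
layer1 = (init_lstm(rng, d, 125), init_lstm(rng, d, 125))      # BiLSTM-1
layer2 = (init_lstm(rng, 250, 100), init_lstm(rng, 250, 100))  # BiLSTM-2
h1 = bilstm_forward(x, *layer1)   # T x 250
h2 = bilstm_forward(h1, *layer2)  # T x 200
# Summary vector: last forward state and first (fully processed) backward state.
summary = np.concatenate([h2[-1, :100], h2[0, 100:]])
W_fc = rng.standard_normal((3, 200)) * 0.01  # fc9 with 3 outputs
logits = W_fc @ summary
probs = np.exp(logits - logits.max())
probs /= probs.sum()              # softmax over COVID-19, Viral Pneumonia, Normal
```

The backward direction sees the sequence from the end, so at every step the concatenated output combines past context (forward half) with future context (backward half).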

Fig. 9

Proposed mAlexNet–BiLSTM (Hybrid) architecture.

Since the BiLSTM layer operates on sequential data, features are first extracted from the images using the convolutional architecture. Therefore, the new architecture includes both convolutional layers and BiLSTM layers, as shown in Fig. 9. The layer hyperparameters and training parameters of the designed mAlexNet–BiLSTM hybrid architecture are given in Table 3.

Table 3

Hyperparameters of layers used for hybrid architecture and training options.

Layer	Number of hidden units	Output size	State activation function	Gate activation function
BiLSTM-1	125	–	tanh	sigmoid
BiLSTM-2	100	–	tanh	sigmoid
fc9	–	3	tanh	–

Training parameters

Optimizer	Gradient decay factor (β1)	Squared gradient decay factor (β2)	Max. epochs	Mini-batch size	Initial learning rate (α)	Epsilon (ϵ)
Adam	0.9	0.999	200	512	0.001	10⁻⁸

As seen in Table 3 and Fig. 9, two consecutive BiLSTM layers (BiLSTM-1 and BiLSTM-2) are used. The temporal features obtained from these layers are given as input to the fully connected layer (fc9), and the classification is completed using Softmax. In the proposed architecture, hyperparameters such as the numbers of hidden units and the activation functions were determined by trial and error. During the training phase, the Adam optimization algorithm is used to reduce the error in each iteration. Adam is an adaptive learning rate algorithm designed specifically for training deep neural networks, and it is often preferred over other optimization algorithms, partly because of its relatively low memory requirements [58]. Being adaptive means that Adam computes individual learning rates for different parameters from the statistics of their past gradients, rather than reducing a single global learning rate according to a predefined schedule. Adam combines SGDM and Root Mean Square Propagation (RMSProp): it uses an RMSProp-like parameter update but, unlike RMSProp, also includes a momentum term. The name Adam is derived from adaptive moment estimation, because Adam uses estimates of the first and second moments of the gradient (the expected values of the gradient and of its square) to adapt the learning rate for each weight of the neural network. To estimate these moments, Adam calculates exponentially weighted moving averages of the gradient and of the squared gradient. Two decay parameters, β1 and β2, control the decay rates of these moving averages. The parameter update equations for Adam are given in Eqs. (2)–(4) [58], [59], [60].
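Eqs. (2)–(4) are not reproduced in this section. As a hedged illustration, the sketch below implements the standard Adam update of Kingma and Ba, including the bias-corrected moment estimates, with the Table 3 values β1 = 0.9, β2 = 0.999, α = 0.001, and ϵ = 10⁻⁸; whether the paper's equations include the bias-correction step is not visible here. The scalar toy loss E(θ) = θ² is an assumption for demonstration only.

```python
import math

def adam(grad, theta0, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8, steps=5000):
    """Standard Adam on a scalar parameter with bias-corrected moment estimates."""
    theta, m, v = theta0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(theta)
        m = beta1 * m + (1 - beta1) * g      # first-moment (mean) estimate
        v = beta2 * v + (1 - beta2) * g * g  # second-moment (uncentred variance) estimate
        m_hat = m / (1 - beta1 ** t)         # bias correction for zero initialization
        v_hat = v / (1 - beta2 ** t)
        theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta

# Toy loss E(theta) = theta**2 with gradient 2 * theta; the minimum is at 0.
theta_adam = adam(lambda t: 2.0 * t, theta0=1.0)
print(theta_adam)
```

Because the step is scaled by m̂/√v̂, each update moves the parameter by at most about α in the early, consistent-gradient phase, independently of the gradient's raw magnitude.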
The parameter values in Table 3 are used in the Adam optimization algorithm given in Eqs. (2)–(4), and the weights are updated accordingly in each iteration of the training phase. The accuracy values calculated on the test data after the specified iteration limit is reached are shown in Fig. 11 and Table 4.

Fig. 11

Confusion matrices of proposed methods.

Table 4

Performance metrics of the proposed architectures.

Architecture	Acc. (%)	Error	Recall	Specificity	Precision	False Positive rate	F1-score	AUC	MCC	Kappa
mAlexNet	98.14	0.0186	0.9826	0.9906	0.9816	0.0094	0.9820	0.9855	0.9726	0.9581
mAlexNet + BiLSTM	98.70	0.0130	0.9876	0.9933	0.9877	0.0067	0.9876	0.9900	0.9809	0.9707

Results

Using the architectural and training parameters of the two architectures described above, the performance of both models is evaluated on the test data. The experiments are run on a laptop with an Intel Core i7-7700HQ CPU, 16 GB RAM and an NVIDIA GeForce GTX 1050 (4 GB) GPU. The deep learning models are implemented and the results are calculated in MATLAB. The training graphs of mAlexNet and the hybrid architecture are shown in Fig. 10, where both the training and test losses approach their minimum by the end of training. The training times of the CNN and BiLSTM networks are 139 s and 85 s, respectively; these times depend directly on the number of epochs and iterations. mAlexNet is trained for 5 epochs (1520 iterations), which takes 139 s. For the hybrid architecture, the number of epochs is 50 and the number of iterations is 1150, and training takes 224 s in total (139 s + 85 s). The proposed method is not complicated and is easy to implement, since segmentation is performed automatically and, owing to the end-to-end learning architecture, no handcrafted feature extraction step is required.

The confusion matrices of the classification results are shown in Fig. 11. In addition, accuracy, recall, specificity, precision, F1-score, Matthews correlation coefficient (MCC), Kappa and Area Under Curve (AUC) are calculated to measure performance, and the Receiver Operating Characteristic (ROC) curves are shown in Fig. 12. The formula for each metric is defined in Eqs. (5)–(12) [61]; more detailed information on these metrics can be found in [62]. The performance values obtained with these formulas are shown in Table 4. In these equations, TP, TN, FP and FN denote the numbers of true positives, true negatives, false positives and false negatives, respectively.
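Since Eqs. (5)–(12) are not reproduced in this text, the sketch below computes the standard definitions of the threshold-based metrics in Table 4 from a binary confusion matrix (positive class = COVID-19). The counts are hypothetical and are not taken from Fig. 11. How per-class values are averaged in a multi-class setting is not stated in this section, and AUC is omitted because it requires ranked classifier scores rather than only the four counts.

```python
import math


def binary_metrics(tp, tn, fp, fn):
    """Standard metrics for a binary confusion matrix."""
    n = tp + tn + fp + fn
    accuracy = (tp + tn) / n
    recall = tp / (tp + fn)          # sensitivity, true positive rate
    specificity = tn / (tn + fp)     # true negative rate
    precision = tp / (tp + fp)
    fpr = fp / (fp + tn)             # false positive rate = 1 - specificity
    f1 = 2 * precision * recall / (precision + recall)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Kappa: observed agreement corrected for agreement expected by chance
    p_e = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (accuracy - p_e) / (1 - p_e)
    return {"accuracy": accuracy, "error": 1 - accuracy, "recall": recall,
            "specificity": specificity, "precision": precision, "fpr": fpr,
            "f1": f1, "mcc": mcc, "kappa": kappa}


# Hypothetical counts, for illustration only
print(binary_metrics(tp=90, tn=95, fp=5, fn=10))
```

The Error and False Positive rate columns of Table 4 follow directly from accuracy and specificity (for example, 0.9906 + 0.0094 = 1 for mAlexNet).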

Fig. 10

Training graphics of proposed architectures.

Fig. 12

ROC curves of the proposed methods.

Considering the results in Table 4, both architectures perform well. However, the hybrid structure formed by adding BiLSTM to the mAlexNet model achieves higher classification success: the accuracy is 98.14% for mAlexNet and 98.70% for mAlexNet + BiLSTM. As Table 4 also shows, the precision, recall, F1-score, specificity, MCC, Kappa and AUC values are higher for the hybrid architecture. This indicates that the hybrid architecture achieves better overall performance and more balanced classification.
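One way to express the gap between the two architectures is the relative reduction in test error, computed below from the Error column of Table 4. The variable names are illustrative.

```python
# Error rates from the Error column of Table 4
err_malexnet = 0.0186
err_hybrid = 0.0130

absolute_gain = err_malexnet - err_hybrid            # difference in error rate
relative_reduction = absolute_gain / err_malexnet    # share of mAlexNet's errors removed
print(f"absolute: {absolute_gain:.4f}, relative: {relative_reduction:.1%}")
```

The 0.56 percentage-point accuracy gain thus corresponds to roughly 30% fewer misclassifications on the test data.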

Discussion

The main goal of this study is to show that the CNN-based transfer learning–BiLSTM hybrid structure is highly effective for the diagnosis of COVID-19. Like previous studies, our study is based on deep learning, but it differs in its methodological contribution. Table 5 compares the accuracy of previous deep learning studies that proposed different methods. Accordingly, the proposed method is comparable with previous studies in terms of accuracy.

Table 5

Comparison of the proposed hybrid method with previous studies.

Study	Method	Accuracy (%)
Wang and Wong [63]	Deep Learning	92.30
Afshar, Heidarian, Naderkhani, Oikonomou, Plataniotis and Mohammadi [64]	Capsule network	95.70
Chowdhury, Rahman, Khandakar, Mazhar, Kadir, Mahbub, Islam, Khan, Iqbal and Al-Emadi [30]	Transfer Learning	97.94
Farooq and Hafeez [65]	Transfer Learning	96.20
Ucar and Korkmaz [41]	Bayes-SqueezeNet	98.30
Apostolopoulos and Mpesiana [66]	Transfer Learning	93.48
Xu, Jiang, Ma, Du, Li, Lv, Yu, Ni, Chen and Su [67]	ResNet + Location Attention	86.70
Ozturk, Talo, Yildirim, Baloglu, Yildirim and Acharya [42]	DarkCovidNet	87.02
Narin, Kaya and Pamuk [45]	Transfer Learning	98.00
Asif and Wenhui [39]	Transfer Learning	96.00
Nour, Cömert and Polat [38]	Deep-Machine Learning	98.97
Khan, Shah and Bhat [43]	Transfer Learning	95.00
Gupta, Anjum, Gupta and Katarya [68]	InstaCovNet-19	99.08
Sethy and Behera [69]	ResNet50 + SVM	95.40
Hemdan, Shouman and Karar [70]	VGG19	90.00
Rahimzadeh and Attar [71]	Xception + ResNet50V2	91.40
Proposed Method	Hybrid	98.70

As can be seen in Table 5, many studies based on CNN architectures have been carried out so far. The biggest advantage of these architectures is their end-to-end learning structure, i.e. there is no handcrafted feature extraction step. In addition, transfer learning-based CNN architectures have become a recent trend because transfer learning improves classification accuracy. Therefore, combinations of different pre-trained models, or of pre-trained models with machine learning methods, have frequently been proposed recently. However, the approach suggested in this study differs from the previous ones. The proposed method owes its success to lung segmentation and to the hybrid architecture. As shown in Table 5, most deep learning-based studies for the diagnosis of COVID-19 are purely CNN-based. In addition, most studies give raw images to the CNN as input without lung segmentation, so the features extracted from the X-ray image represent its class less well. Since the proposed study performs segmentation automatically, it provides both high classification accuracy and convenience. Also, according to the study by Liu and Guo [29], BiLSTM layers have a greater effect on classification accuracy than convolutional layers. In this study, the CNN is used for feature extraction and BiLSTM is used to classify COVID-19 based on these features, which yields higher classification success than most previous studies. Moreover, the proposed architecture passes the features extracted by the CNN directly to the BiLSTM layer, so it is simple to implement.

The general disadvantage of deep learning studies is that their ability to generalize depends largely on the training data. Millions of people around the world have now been infected with COVID-19, so it is not certain whether the proposed deep learning-based methods will perform equally well on chest X-ray images of different patients. This uncertainty could be reduced by training on millions of images. Data augmentation can increase the number of chest X-ray images and the classification accuracy; however, augmented images do not support learning as strongly as real, new samples. Therefore, the number of real images should be increased and the models retrained to achieve genuinely general success, since results obtained with limited data do not fully reflect real-world performance. Of course, more data requires a more powerful computer and longer training. Although the transfer learning-based CNN–BiLSTM structure proposed in this study achieves high success, it requires more training time because it includes both a CNN and a BiLSTM. In addition, the ANN-based segmentation step used in this application does not separate the two lungs from each other. Although the extracted features represent the infection better than the raw image does, it would be more accurate to segment each lung as a separate region.
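The pipeline discussed above, in which CNN features are passed directly to two stacked BiLSTM layers followed by a fully connected layer and Softmax, can be sketched as a NumPy forward pass. This is not the authors' MATLAB implementation: the weights are random, all sizes are hypothetical, and since this section does not state how the AlexNet features are arranged into a sequence, they are assumed to form T feature vectors. Applying the fully connected layer to the output at the last time step is one common choice, also assumed here.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def init_lstm(rng, input_size, hidden_size):
    """Random weights for one LSTM direction (gates stacked as i, f, g, o)."""
    scale = 1.0 / np.sqrt(hidden_size)
    W = rng.uniform(-scale, scale, (4 * hidden_size, input_size))
    U = rng.uniform(-scale, scale, (4 * hidden_size, hidden_size))
    b = np.zeros(4 * hidden_size)
    return W, U, b


def lstm_forward(params, xs):
    """Run one LSTM direction over a sequence xs of shape (T, input_size)."""
    W, U, b = params
    H = U.shape[1]
    h, c = np.zeros(H), np.zeros(H)
    outputs = []
    for x in xs:
        z = W @ x + U @ h + b
        i = sigmoid(z[:H])              # input gate
        f = sigmoid(z[H:2 * H])         # forget gate
        g = np.tanh(z[2 * H:3 * H])     # candidate cell values
        o = sigmoid(z[3 * H:])          # output gate
        c = f * c + i * g               # cell state update
        h = o * np.tanh(c)              # hidden state
        outputs.append(h)
    return np.stack(outputs)            # shape (T, H)


def bilstm_forward(fwd, bwd, xs):
    """Concatenate a forward pass and a time-reversed backward pass."""
    out_f = lstm_forward(fwd, xs)
    out_b = lstm_forward(bwd, xs[::-1])[::-1]    # re-align with time steps
    return np.concatenate([out_f, out_b], axis=1)  # shape (T, 2H)


def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()


rng = np.random.default_rng(0)
T, D, H, C = 8, 64, 32, 3  # hypothetical: sequence length, feature dim, hidden units, classes
layer1 = (init_lstm(rng, D, H), init_lstm(rng, D, H))
layer2 = (init_lstm(rng, 2 * H, H), init_lstm(rng, 2 * H, H))
W_fc = rng.normal(0.0, 0.1, (C, 2 * H))
b_fc = np.zeros(C)

features = rng.normal(size=(T, D))       # stand-in for CNN (AlexNet) features
seq = bilstm_forward(*layer1, features)  # BiLSTM-1
seq = bilstm_forward(*layer2, seq)       # BiLSTM-2
probs = softmax(W_fc @ seq[-1] + b_fc)   # fully connected layer on last step, then Softmax
print(probs, probs.sum())
```

Each BiLSTM layer doubles the feature width (forward and backward states are concatenated), which is why the second layer and the fully connected layer take inputs of size 2H.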

Conclusion and future work

Early detection of COVID-19 is crucial to preventing the disease from spreading to other people. This study uses chest X-ray images to diagnose COVID-19 automatically. First, ANN-based segmentation is applied to the raw images so that only the lung area is evaluated for COVID-19 detection. Then, to increase data diversity, the images belonging to the COVID-19 class are augmented. Finally, after the segmentation and data augmentation steps, the resulting images are given as input to the designed deep learning network. Both proposed architectures include the pretrained AlexNet architecture. While the first architecture is purely a transfer learning application, the second architecture additionally includes BiLSTM layers. The results show that the second, hybrid architecture is more successful for COVID-19 detection. What distinguishes this study from others is that it proposes ANN-based segmentation and uses a hybrid architecture. Since the proposed model has an end-to-end learning structure, it detects COVID-19 automatically from chest X-ray images without requiring any handcrafted feature extraction technique. In this way, a fast and stable system can assist expert radiologists as a decision support system, reducing their workload and helping to prevent misdiagnosis. Although the proposed method is successful, further deep learning-based methods for the detection of COVID-19 will be proposed in future studies. The first planned study aims to increase success by increasing the amount of data, since the success of deep learning depends largely on the amount of labeled data. Therefore, a structure combining a generative adversarial network (GAN) with a deep neural network (DNN) will be developed. Another planned study is to develop a stronger CNN-based lung segmentation method.

CRediT authorship contribution statement

Muhammet Fatih Aslan: Investigation, Methodology, Writing - review & editing. Muhammed Fahri Unlersen: Investigation, Methodology, Writing - review. Kadir Sabanci: Methodology, Investigation. Akif Durdu: Methodology, Supervision.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
