
Stacked-autoencoder-based model for COVID-19 diagnosis on CT images.

Abstract

With the outbreak of COVID-19, diagnosis based on medical imaging such as computed tomography (CT) has proved to be an effective way to fight the rapid spread of the virus. It is therefore important to study computerized models for detecting infection from CT images. New deep learning-based approaches have been developed for CT-assisted diagnosis of COVID-19. However, most current studies rely on small datasets of COVID-19 CT images, because few datasets are publicly available for patient privacy reasons. As a result, the performance of deep learning-based detection models trained on small datasets needs to be improved. In this paper, a stacked autoencoder detector model is proposed to improve detection performance, in particular the precision and recall rates. Firstly, four autoencoders are constructed as the first four layers of the stacked autoencoder detector model to extract better features from CT images. Secondly, the four autoencoders are cascaded and connected to a dense layer and a softmax classifier to constitute the model. Finally, a new classification loss function is constructed by superimposing the reconstruction loss to enhance the detection accuracy of the model. The experimental results show that our model performs well on a small COVID-19 CT image dataset. Our model achieves an average accuracy, precision, recall, and F1-score of 94.7%, 96.54%, 94.1%, and 94.8%, respectively. The results reflect the ability of our model to discriminate COVID-19 images, which might help radiologists in the diagnosis of suspected COVID-19 patients. © Springer Science+Business Media, LLC, part of Springer Nature 2020.


Keywords: COVID-19 diagnosis; Computed tomography; Deep learning; Stacked autoencoder

Year: 2020 PMID: 34764564 PMCID: PMC7652058 DOI: 10.1007/s10489-020-02002-w

Source DB: PubMed Journal: Appl Intell (Dordr) ISSN: 0924-669X Impact factor: 5.086

Introduction

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) broke out in December 2019. Patients infected with the virus developed symptoms of the mild or severe respiratory disease COVID-19 [1, 2]. In the following months, COVID-19 spread rapidly around the world. On March 11, 2020, the World Health Organization declared COVID-19 a global pandemic [3, 4]. Inefficient detection of the disease worldwide is one of the reasons for its rapid spread [5]. Since the isolation and genome sequencing of the virus [6, 7], the diagnostic methods for COVID-19 have included the nucleic acid detection kit method (RT-PCR) and nucleic acid sequencing. However, the RT-PCR method requires at least 4 hours to obtain test results, and nucleic acid sequencing takes much longer [8, 9]. Moreover, in countries and regions with insufficient funds, the reagents and equipment needed for these diagnostic methods are in short supply, which delays the diagnosis of infected people and has contributed to the rapid spread of COVID-19 [10]. Chest CT images play a significant role in the auxiliary diagnosis of COVID-19. A deep learning-based chest CT assisted diagnosis method might take only a few seconds to obtain accurate test results [11-13]. Many researchers have proposed chest CT diagnostic models for COVID-19 [14-17], such as a patch-based deep neural network architecture [14] and a dual-sampling attention network [17]. However, few chest CT images are available because of patient privacy. Therefore, most of these diagnostic models are trained on a small chest CT dataset of COVID-19 patients. As a consequence, these deep learning-based detection models have large variances and are prone to gradient disappearance and overfitting, and their performance still has considerable room for improvement.
To address gradient disappearance and overfitting, this paper proposes a stacked autoencoder detector model to improve the performance of COVID-19 diagnosis. The model is trained on the currently available small chest CT dataset of COVID-19 patients. Compared with the baseline model, our model achieves an average improvement of about 10 percentage points in accuracy. Our main contributions are: (1) A new stacked-autoencoder-based model is proposed for COVID-19 diagnosis that can, to some extent, overcome the gradient disappearance and overfitting caused by training deep neural networks on a small dataset. (2) A new reconstruction loss is constructed as a regular term, which improves detection accuracy. (3) The average performance of our model exceeds that of the current binary baseline COVID-19 diagnostic model on the same small chest CT dataset. The remainder of this paper is organized as follows: previous work is presented in Section 2; the methodology, including the proposed model, datasets, and training strategy, is described in Section 3; and the experimental design and results are presented and analyzed in Section 4. Sections 5 and 6 discuss and summarize the paper.

Previous works

In recent years, deep learning has achieved promising results in the automatic analysis of multimodal medical images for radiological tasks [18-20]. Deep convolutional neural networks are a powerful deep learning architecture that has been widely applied to image classification, pattern recognition, and other fields [21]. In previous studies, deep convolutional neural networks have been used to classify chest CT images and have been applied successfully to tasks such as tuberculosis screening [22] and the analysis of mediastinal lymph nodes in CT images [23]. During the current COVID-19 outbreak, researchers are working to alleviate the epidemic through their research [14-17]. Based on these previous studies, applying deep convolutional neural networks to the auxiliary diagnosis of COVID-19 from chest CT is a worthwhile research direction, and many researchers are pursuing it. Generally, there are two kinds of deep learning diagnostic models for COVID-19. The first kind is the binary classification diagnostic model, including DenseNet [24], DRE-Net [25], and M-Inception [26]. In [24], a publicly available COVID-CT dataset was built, containing 275 chest CT scans of COVID-19 positive patients. A deep convolutional neural network trained on this dataset achieved an accuracy of 84.7%. In [25], a deep learning model was built for COVID-19 pneumonia classification. With hyperparameters tuned on the validation set, the model achieved an accuracy of 86%. In [26], an inception migration neural network was constructed, which achieved an accuracy of 82.9%. The second kind is the multi-class diagnostic model, including the location-attention oriented model [27], CoroNet [28], and COV-Net [29].
In [27], ResNet was used to extract features from CT images together with a location-attention mechanism, and the resulting model could distinguish COVID-19 cases from Influenza-A viral pneumonia cases and healthy cases with an overall accuracy of 86.7%. In [28], a deep convolutional neural network was trained on a dataset that included COVID-19 positive chest CT images, bacterial pneumonia images, viral pneumonia images, and normal CT images. The model achieved an overall accuracy of 89.5%. In [29], a deep learning-based CT diagnostic system was developed to identify COVID-19 patients based on the collected CT datasets. The experimental results showed that the accuracy of the model was 87%. These deep learning-based chest CT assisted diagnostic models for COVID-19 are mostly built on a limited number of COVID-19 CT datasets. Their performance indicators, such as accuracy and recall, have not yet reached the level required for practical detection of COVID-19, and the application of deep learning techniques to detecting COVID-19 in chest CT is still quite limited. Therefore, this paper proposes a new deep learning classifier framework to assist radiologists in automatically diagnosing COVID-19 from chest CT images.

Methodology

Overall architecture of the proposed model

Generally, better experimental results are obtained by training deep networks on large datasets. Deep learning usually involves building deeper network structures, and in theory the deeper the network, the better the model can fit the data. However, because traditional multi-layer neural networks are trained by backpropagating the loss, the propagated gradient becomes smaller as the network gets deeper, which results in the problem of gradient disappearance (vanishing gradients) [30]. When deep neural networks are trained on a small dataset, the problem becomes more serious. Solving the gradient disappearance problem when training on a small CT image dataset is therefore of great importance. Kaiming He et al. proposed the deep residual network structure to mitigate gradient disappearance. By adding shortcut connections to the deep neural network, the gradient can be propagated from the last layer back to the first layer of the network, which alleviates, to some extent, the problem of the gradient getting smaller and smaller. In contrast, the stacked autoencoder addresses gradient disappearance by changing the way the network is trained. A stacked autoencoder gives each convolution layer its own encoding and decoding network and trains each one separately. Gradient disappearance rarely occurs in a single-layer encoding and decoding network, so the problem that may arise when training the entire deep network is avoided. This is the core idea of a stacked autoencoder neural network [31, 32], a modeling method built from autoencoder neural networks. The overall architecture of the stacked autoencoder detector model is shown in Fig. 1.
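The shrinking of the backpropagated gradient with depth can be illustrated with a small numerical sketch. It is not taken from the paper: it assumes a chain of layers with sigmoid activations (whose derivative is at most 0.25) and unit weights, and the function names are hypothetical.

```python
import math


def sigmoid_derivative(z: float) -> float:
    """Derivative of the logistic sigmoid at pre-activation z (maximum 0.25 at z = 0)."""
    s = 1.0 / (1.0 + math.exp(-z))
    return s * (1.0 - s)


def gradient_scale(depth: int, z: float = 0.0, weight: float = 1.0) -> float:
    """Factor by which a gradient is scaled after passing back through `depth` layers.

    Assumption for illustration only: every layer contributes the same factor
    weight * sigmoid'(z), so the total is that factor raised to the power `depth`.
    """
    return (weight * sigmoid_derivative(z)) ** depth


if __name__ == "__main__":
    for depth in (1, 4, 8, 16):
        print(f"depth={depth:2d}  gradient scale={gradient_scale(depth):.3e}")
```

Even in this best case for the sigmoid, the scale factor falls geometrically with depth, which is why training one shallow encoder-decoder pair at a time sidesteps the problem.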

Fig. 1

Stacked autoencoder model structure: (a) the overall architecture of the stacked autoencoder detection model, (b) stacked autoencoder layer 1 structure, (c) stacked autoencoder layer 2 structure

In Fig. 1a, 3 by 3 convolution layers are used to extract features layer by layer. Local Response Normalization (LRN) and Max-Pooling are applied before the convolution operations that produce features h2, h3, and h4 (the encoding outputs of Autoencoders 2, 3, and 4). LRN was introduced with AlexNet [33]. The LRN layer simulates the lateral inhibition mechanism of the biological nervous system and creates competition among the activities of local neurons, making relatively large responses relatively larger and improving the generalization ability of the model. In Fig. 1b, chest CT images are used directly for encoding and decoding. The layer 1 network includes an encoder part for feature extraction and a decoder part for restoring the original input. Unlike in Fig. 1b, the input feature map in Fig. 1c is first processed by LRN and Max-Pooling. Figure 1c shows the network of Autoencoder 2 of the stacked autoencoder diagnosis model. The networks of layers 3 and 4 are the same as that of layer 2. It is important to note that the loss functions of the four autoencoder networks are different. The loss of the layer 1 autoencoder network is the reconstruction loss, as shown in Eq. (1):

$$L_1 = \frac{1}{m}\sum_{i=1}^{m} J\left(W_1; x^{(i)}\right) \tag{1}$$

where $m$ denotes the batch size, $W_1$ is the parameter matrix of layer 1, $x^{(i)}$ is the $i$-th input CT image, and $J$ is the Mean-Squared-Error (MSE) loss function, as shown in Eq. (2):

$$J\left(W; x\right) = \frac{1}{n}\sum_{j=1}^{n}\left(\hat{x}_j - x_j\right)^2 \tag{2}$$

where $\hat{x}$ is the reconstruction of $x$ produced by the autoencoder with parameters $W$ and $n$ is the number of elements of $x$. The loss function term selected for the first four layers of the model is the mean squared error, which aims to approximate the real data as closely as possible [34]. The value of the loss function decreases over the training iterations until it reaches its minimum.
The layer 2 loss function is the reconstruction loss of layer 2 itself plus the layer 1 autoencoder network loss function as the regularization term, as shown in Eq. (3):

$$L_2 = \frac{1}{m}\sum_{i=1}^{m} J\left(W_2; h_1^{(i)}\right) + L_1 \tag{3}$$

where $L_1$ denotes the layer 1 autoencoder network loss function used as the regularization term, $W_2$ is the parameter matrix of layer 2, and $h_1$ is the output of layer 1, as shown in Eq. (4):

$$h_1 = f\left(W_1; x\right) \tag{4}$$

where $f$ is the layer 1 encoding function. The loss functions of layers 3 and 4 are similar to that of layer 2. The loss function of layer 3 is the reconstruction loss of layer 3 plus the loss functions of the first two autoencoder networks, as shown in Eq. (5). The loss function of layer 4 is the reconstruction loss of layer 4 plus the loss functions of the first three autoencoder networks, as shown in Eq. (6):

$$L_3 = \frac{1}{m}\sum_{i=1}^{m} J\left(W_3; h_2^{(i)}\right) + L_1 + L_2 \tag{5}$$

$$L_4 = \frac{1}{m}\sum_{i=1}^{m} J\left(W_4; h_3^{(i)}\right) + L_1 + L_2 + L_3 \tag{6}$$

where $h_2$ and $h_3$ are the outputs of layer 2 and layer 3. When calculating the classification loss, we add the loss function of each of the four autoencoder networks as regular terms, as shown in Eq. (7):

$$L_{cls} = \frac{1}{m}\sum_{i=1}^{m} C\left(W_{cls}; y^{(i)}, h_5^{(i)}\right) + L_1 + L_2 + L_3 + L_4 \tag{7}$$

where $W_{cls}$ denotes the parameter matrix of the last layer, $y$ is the label of the CT image, $h_5$ is the output of layer 5, and $C$ is the loss function term shown in Eq. (8):

$$C\left(W_{cls}; y, h_5\right) = -\sum_{c} \mathbb{1}[y = c]\,\log p_c\left(h_5\right) \tag{8}$$

where $p_c(h_5)$ is the softmax probability of class $c$ computed from $h_5$. The loss function term selected for the last layer of the model is the cross-entropy loss, which yields the probability values of COVID-19 and non-COVID-19. The optimization function acts on the loss function during backpropagation, and the optimizer seeks a minimum loss value, that is, a local optimal solution. We build new global loss functions by continually adding the loss functions of the previous layers as regular terms to the loss of the current layer. On this new loss function, the optimizer can find a better local optimal solution. In other words, the new global loss function not only improves the feature extraction of each layer but also improves the final classification.
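The way the layer losses build on one another can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: it reads "the loss function of the first k layers" as the sum of those layers' losses, the function names are hypothetical, and the reconstruction errors are made-up numbers.

```python
import math


def mse(x, x_hat):
    """Mean squared error between two equally sized flat lists of pixel values."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def stacked_losses(recon_errors):
    """Turn per-layer reconstruction errors [R1, R2, ...] into layer losses.

    Assumed reading of the text: each layer's loss is its own reconstruction
    error plus the losses of all earlier layers as regular terms.
    """
    losses = []
    for r in recon_errors:
        losses.append(r + sum(losses))
    return losses


def cross_entropy(probs, label):
    """Cross-entropy for one sample given class probabilities and an integer label."""
    return -math.log(probs[label])


def classification_loss(probs, label, layer_losses):
    """Classification loss with the autoencoder layer losses added as regular terms."""
    return cross_entropy(probs, label) + sum(layer_losses)


if __name__ == "__main__":
    layer_losses = stacked_losses([0.5, 0.4, 0.3, 0.2])  # illustrative values only
    print(layer_losses)
    print(classification_loss([0.1, 0.9], 1, layer_losses))
```

Because every later loss contains the earlier ones, minimizing the classification loss in the cascaded network keeps penalizing poor reconstructions in the early layers.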

The detection model

In this section, we introduce the modeling approach of the stacked autoencoder detection model. Our model is a neural network composed of autoencoders that is trained layer by layer: the convolution kernel of each layer is trained by an autoencoder, in order from front to back. The output of the last layer is taken as the input feature of the softmax classifier, which outputs the classification results. Here we present the modeling process of our model in three steps. Firstly, an autoencoder is trained to obtain the first-order feature h1 of the original CT scan image data, as shown in Fig. 2. This process can be written as:

$$h_1 = f\left(W_1; x\right), \qquad \hat{x} = d\left(W_1'; h_1\right)$$

where $x$ represents the input CT image, $W_1$ is the weight matrix of the layer 1 encoder, $W_1'$ is the weight matrix of the layer 1 decoder, $f$ is the encoding function, $d$ is the decoding function, and $\hat{x}$ is the reconstructed input.

Fig. 2

Stacked autoencoder layer 1 structure

In general, training a convolutional neural network is equivalent to obtaining a complex encoding function $f$. The entire convolutional network is a high-level function with numerous parameters, and there is generally no decoding process. Here we train each layer separately by adding a decoding process to obtain the corresponding encoding function $f$. The output of the decoding function is required to be as similar as possible to the input. Thus, after each layer is trained separately, the weights learned by the encoding function extract features better than the weights obtained by the traditional training method. We then obtain the feature h1, which is the output of the encoding function, as shown in Eq. (4). Secondly, the output feature h1 of the previous step is taken as the input to obtain feature h2 through the layer 2 autoencoder, as shown in Fig. 3. In the same way, feature h2 is used as the input to obtain feature h3 through the autoencoder of the next layer, and feature h3 is used to obtain feature h4. Feature h4 is then used as the input to obtain h5 through the dense layer.

Fig. 3

Stacked autoencoder layer 2 structure

Thirdly, we connect the feature h5 of the previous step to the softmax classifier to obtain the class label of the image, as shown in Fig. 4. In this step, the softmax classifier computes the probability of each of the two labels from the features calculated by the previous layers. One label is Y1, which denotes CT scan images of non-COVID-19 cases. The other is Y2, which denotes CT images of COVID-19 positive patients. Finally, these six layers are combined to form a stacked autoencoder detector with four convolution layers, one dense layer, and one softmax layer, which can classify CT scan images.

Fig. 4

Stacked autoencoder layer 5 structure
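The final softmax step described above can be sketched as follows. This is a minimal illustration that assumes the classifier receives two raw scores (logits), one per label; the names `softmax`, `classify`, and `LABELS` are hypothetical and not taken from the paper.

```python
import math

# Y1 and Y2 in the text: index 0 is non-COVID-19, index 1 is COVID-19.
LABELS = ("non-COVID-19", "COVID-19")


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1 (shifted for numerical stability)."""
    shift = max(logits)
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(logits):
    """Return the most probable label together with the class probabilities."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs


if __name__ == "__main__":
    label, probs = classify([1.0, 3.0])  # illustrative scores only
    print(label, probs)
```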

Training

Datasets

A baseline chest CT dataset of COVID-19 collected and published by UC San Diego is used in our research [24]. Artificial intelligence methods for detecting COVID-19 in CT images have the advantages of fast speed, low cost, and high accuracy. However, due to privacy concerns, the CT scans used in these works are not shared with the public. This greatly hinders the research and development of more advanced AI methods for more accurate CT-based testing of COVID-19. To address this issue, the dataset collectors built the COVID-CT dataset, which contains 275 COVID-19 positive chest scan images and 195 COVID-19 negative chest scan images. The dataset is open-sourced to the public to foster the R&D of CT-based testing of COVID-19. The website for this dataset is given in a footnote. The COVID-CT dataset has been uploaded to GitHub by the dataset collectors and is being continuously supplemented. The dataset collectors extracted publicly available CT images from 760 preprints on medRxiv and bioRxiv and manually selected images with clinical manifestations of COVID-19 by reading the image descriptions. The CT images vary in size, with minimum, average, and maximum heights of 153, 491, and 1853 pixels. The minimum, average, and maximum widths are 124, 383, and 1485 pixels. These scans are from 143 patient cases. Before the model is trained on the CT scan images, the data are uniformly resized and standardized. The final image size is 224 by 224.
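The resizing and standardization step can be sketched in pure Python. The paper does not state the interpolation method or the exact form of standardization, so this sketch assumes nearest-neighbour resizing and per-image zero-mean, unit-variance scaling; both function names are hypothetical.

```python
def resize_nearest(img, out_h=224, out_w=224):
    """Resize a 2-D list of pixels to out_h x out_w by nearest-neighbour sampling (assumed method)."""
    in_h, in_w = len(img), len(img[0])
    return [
        [img[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def standardize(img):
    """Scale one image to zero mean and unit variance (assumed per-image standardization)."""
    pixels = [p for row in img for p in row]
    mean = sum(pixels) / len(pixels)
    var = sum((p - mean) ** 2 for p in pixels) / len(pixels)
    std = var ** 0.5 or 1.0  # guard against constant images
    return [[(p - mean) / std for p in row] for row in img]


if __name__ == "__main__":
    # Synthetic 153 x 124 image, the smallest height and width reported for the dataset.
    image = [[(r + c) % 256 for c in range(124)] for r in range(153)]
    prepared = standardize(resize_nearest(image))
    print(len(prepared), len(prepared[0]))
```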

The training strategy

In this section, we describe the training of our model from three aspects: the division of the dataset, the training process, and the training results. Before training the model, the first step is to partition the dataset. Using the hold-out method, the original dataset is divided into three mutually exclusive sets: a training set, a validation set, and a test set. Table 1 describes the partitioning of the dataset. The training set includes 183 COVID-19 positive CT images and 146 COVID-19 negative CT images. The validation set includes 57 COVID-19 positive CT images and 15 COVID-19 negative CT images. The test set includes 35 COVID-19 positive CT images and 34 COVID-19 negative CT images. The dataset is partitioned in this way to stay consistent with the baseline model for COVID-19 binary classification, so that we can provide fairer comparisons in the experiments.

Table 1

Original statistics of data split

Classes	Non-COVID	COVID-19	Total
Train	146	183	329
Validation	15	57	72
Test	34	35	69

To obtain a reliable and stable model, 5-fold cross-validation is used. Cross-validation is effective in overcoming the overfitting problem. It makes full use of all CT images in the limited dataset for training, and the results of the folds are averaged, which makes the evaluation more convincing. We merge the training set and the test set into a new dataset, which is then split for 5-fold cross-validation, as shown in Fig. 5. The new dataset is divided into five mutually exclusive subsets; each time one of them is selected as the test set, so the model is trained five times. The test results of the five test sets are averaged to obtain the overall performance evaluation of the model.

Fig. 5

Schematic representation of training and testing schemes employed in the 5-fold cross-validation procedure
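The 5-fold scheme can be sketched as an index split. This is only an illustration: the shuffling, the seed, and the function name `k_fold_indices` are assumptions, and the sample count of 398 is simply the sum of the training and test set sizes in Table 1 (329 + 69).

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) for each of k mutually exclusive folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # assumed: samples are shuffled before splitting
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


if __name__ == "__main__":
    for fold, (train, test) in enumerate(k_fold_indices(398), start=1):
        print(f"Fold-{fold}: {len(train)} training images, {len(test)} test images")
```

Each image appears in exactly one test fold, so averaging the five test results uses every image once for evaluation.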

After the dataset is divided, the next step is to train the model with the training set. Following the order of the network layers, training starts from the layer 1 autoencoder of the stacked autoencoder detector model. To improve the robustness of the model, Gaussian noise is added to the CT scan images. As a result, the layer 1 network acquires a certain denoising ability on the input image once the layer 1 autoencoder has been trained. We save the trained parameters of the layer 1 autoencoder to provide good initial weights for the training of layer 2. When the layer 2 network is trained, the original data are first fed into the layer 1 network to obtain the input of layer 2. Similarly, layers 3 and 4 are trained separately after the layer 2 network is trained. When each layer is trained separately, the Adam optimizer is used as the optimization function [35]. The key computation of the neural network is that of each layer of neurons, as follows:

$$a^{[l]} = g\left(W^{[l]} a^{[l-1]} + b^{[l]}\right)$$

where $l$ is the index of the layer, $W^{[l]}$ is the parameter matrix, $b^{[l]}$ is the bias, $a^{[l]}$ is the output matrix of each layer, and $g$ is the activation function. The ReLU function is used for better normalization with LRN [36]. With this formula, all neurons in a layer of the neural network are computed simultaneously. The predicted value is obtained through a series of such calculations in the hidden layers. Then, the stochastic gradient descent method shown in Eq. (11) is applied during backpropagation, so that the weights and biases of each layer can be fitted appropriately [37, 38]. To further improve the performance of the model, a cascaded approach is used to further optimize the network parameters. All layers are cascaded to create a new network. Here, the output of the first layer is reused, and the input data is again the original CT scan image data.
Like layer 1 of the cascaded network structure, the other layers are redefined, but the parameters such as the weights trained in each layer are shared. After the whole model is trained, we obtain the end-to-end stacked autoencoder detection model. In the third step, we introduce the model hyperparameters and the loss during training. Table 2 shows the model parameters, which were selected after extensive testing and which give the best stacked autoencoder detection model. The training loss of the four autoencoder networks is shown in Fig. 6a-d. As shown in Table 2, we set the number of epochs to 500. During training, the loss value is recorded every epoch. In Fig. 6a, the initial loss value of layer 1 is above 1.0, which is relatively high, because the initial weights and other parameters of the layer 1 autoencoder network are randomly initialized [39] and the CT images fed into the network for training have added noise. The loss drops to about 0.5 after 30 epochs, after which it stays near its minimum with slight fluctuations. In Fig. 6b, since the input of the layer 2 autoencoder network is obtained through the layer 1 encoder network, the initial value of the layer 2 training loss is small, about 0.55. It then drops to about 0.45. The descending curves of the training loss in layers 3 and 4 are similar to that of layer 2, as shown in Fig. 6c-d.
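The noise step used when training the layer 1 autoencoder can be sketched as below. The paper does not report the noise level, so the standard deviation `sigma=0.1`, the seed handling, and the function names are assumptions made for illustration.

```python
import random


def add_gaussian_noise(img, sigma=0.1, seed=None):
    """Return a copy of a 2-D image with zero-mean Gaussian noise added to every pixel."""
    rng = random.Random(seed)
    return [[p + rng.gauss(0.0, sigma) for p in row] for row in img]


def denoising_pair(img, sigma=0.1, seed=None):
    """Build one (input, target) pair: the noisy image is the input, the clean image the target."""
    return add_gaussian_noise(img, sigma, seed), img


if __name__ == "__main__":
    clean = [[0.0, 0.5], [0.5, 1.0]]  # tiny stand-in for a standardized CT image
    noisy, target = denoising_pair(clean, seed=42)
    print(noisy)
    print(target)
```

Training the autoencoder to reconstruct the clean target from the noisy input is what gives the first layer its denoising ability.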

Table 2

The parameters of the model

Parameters	Values
Hidden layer	5
Neurons	#1 3x3x16
	#2 3x3x32
	#3 3x3x64
	#4 3x3x128
	#5 6272
Learning rate	0.001
Activation function	#1#2#3#4#5 Relu
Activation function	#6 Softmax
Loss function	#1#2#3#4 Mean squared loss
Loss function	#6 Sparse softmax cross entropy
Optimizer	Adam
Epochs	500
Batch size	128

Fig. 6

Training loss of stacked autoencoder model: (a) training loss of autoencoder layer 1, (b) training loss of autoencoder layer 2, (c) training loss of autoencoder layer 3, (d) training loss of autoencoder layer 4, (e) training loss of the last classification layer, (f) training loss and accuracy of the cascaded stacked autoencoder model

The decline of the classification loss in the last layer is much more gradual than that in the first four layers, as shown in Fig. 6e. The training loss after the drop also remains relatively large in Fig. 6b-d, which shows that the overall loss has not fallen much. This is because the loss functions of layers 2, 3, 4 and the last layer each add the previous reconstruction losses. During gradient descent and backpropagation, the encoding function of each layer can therefore further improve its feature extraction ability. In Fig. 6f, we can see a preliminary effect of the new loss function constructed by superimposing the reconstruction losses. Figure 6f shows the training loss obtained from unified training after connecting all layers. This cascaded network shares the parameters, such as the weights, trained in each layer, so the initial training loss in Fig. 6f is small, below 0.4, and then drops to about 0.3. The training accuracy stays above 0.9.

Experimental design and results

Evaluation metrics

In the experiment, the following performance metrics are used to measure the performance of the stacked autoencoder detection model. TP denotes the true positives, TN the true negatives, FP the false positives, and FN the false negatives. The matrix formed by these four counts is the confusion matrix. The value of each performance metric can be calculated from the confusion matrix [40], where:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

$$\text{Precision} = \frac{TP}{TP + FP}$$

$$\text{Recall} = \frac{TP}{TP + FN}$$

$$F1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$

Accuracy reflects the ability of the detection model to judge the whole test set correctly, that is, to judge positives as positive and negatives as negative. Recall is the proportion of actual positive cases that the model predicts as positive. Precision is the proportion of true positive cases among the cases that the detection model judges as positive. F1 is the harmonic mean of precision and recall and represents the discriminative ability of the model for each category.
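The four metrics defined above can be computed directly from confusion matrix counts. The sketch below uses hypothetical counts (27 true positives, 34 true negatives, 0 false positives, 8 false negatives) chosen to match the test set size in Table 1 (35 positive, 34 negative images); they are not read from the paper's figures, and the function name is an assumption.

```python
def classification_metrics(tp, tn, fp, fn):
    """Return (accuracy, precision, recall, f1) as fractions from confusion matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return accuracy, precision, recall, f1


if __name__ == "__main__":
    # Hypothetical counts for a 69-image test set (35 positive, 34 negative).
    acc, prec, rec, f1 = classification_metrics(tp=27, tn=34, fp=0, fn=8)
    print([round(100 * v, 1) for v in (acc, prec, rec, f1)])  # [88.4, 100.0, 77.1, 87.1]
```

With zero false positives the precision is 100% even though eight positive cases are missed, which is why recall and F1 need to be reported alongside it.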

Experimental results

In this section, we report the experimental results of our model on the test set and compare them with some existing models. The stacked autoencoder detection model was trained a total of six times. The first time, we trained and tested our model with the dataset obtained by the original partitioning method in Table 1. The resulting confusion matrix is shown in Fig. 7a. Figures 7b-f show the confusion matrices obtained with the 5-fold cross-validation method shown in Fig. 5. The accuracy, precision, recall, and F1-score results for the binary classification task are given in Table 3.

Fig. 7

The original and 5-fold confusion matrix (CM) results for the binary classification task: (a) Original confusion matrix, (b) Fold-1 CM, (c) Fold-2 CM, (d) Fold-3 CM, (e) Fold-4 CM, and (f) Fold-5 CM

Table 3

Performance metrics of the proposed model, including each fold, the average of the 5 folds, and the original dataset

Folds	Accuracy (%)	Precision (%)	Recall (%)	F1 Score (%)
Fold-1	86.2	100	75.0	85.7
Fold-2	93.7	89.7	100	94.6
Fold-3	97.4	97.6	97.6	97.6
Fold-4	97.5	97.7	97.7	97.7
Fold-5	98.7	97.7	100	98.8
Average	94.7	96.54	94.1	94.8
Original	88.4	100	77.1	87.1

It can be noted from Table 3 that the proposed model achieves an average accuracy of 94.7% in detecting COVID-19, with average precision, recall, and F1-score values of 96.54%, 94.1%, and 94.8%, respectively. This result is significantly better than the performance of the model trained on the dataset divided by the original method. From the results of 5-fold cross-validation, we speculate that this is because the distribution of the dataset divided by the original method deviates from the distribution of standard COVID-19 CT image data. As a result, the model trained on this dataset can only detect COVID-19 CT images with certain features, so its results on the test set are worse than the results of 5-fold cross-validation. The training sets obtained by Fold-1 and by the original division method both contain CT images that deviate from the standard distribution, so the experimental result of Fold-1 is similar to that of the original division method. The other four cross-validation models showed high generalization on the test set. In addition to training our model, we also trained the baseline model in [26] on the original partitioned dataset and obtained the test results shown in the first row of Table 4. Moreover, we built a convolutional network detection model with the same framework as our model. We trained this convolutional network detection model on the original partitioned dataset as well and obtained the test results shown in the second row of Table 4. The first three models in Table 4 are all trained and tested with the dataset obtained by the original data partitioning method.
The fourth and fifth rows show the performance of models presented in recent related studies. Both models focus on the binary classification of COVID-19 CT images. The performance of all models is shown in Table 4. The comparison shows that our model achieves the best performance. This is also the first time that a stacked autoencoder has been used to train a COVID-19 detection model on a small CT dataset. Our technical contribution is that we build a stacked convolutional autoencoder and design a new loss function for our detection model that adds the reconstruction loss to the classification loss. The comparison results show that our stacked autoencoder model is indeed effective. This is mainly because the stacked autoencoder neural network combines a strong feature expression ability with the advantages of a deep convolutional neural network. It can usually capture the hierarchical grouping structure or the part-whole structure of the input. The stacked autoencoder tends to learn the feature vector corresponding to each sample, which better represents the high-level characteristics of the data.

Table 4

Performance comparison of different deep learning models

The Model	Accuracy (%)	Precision (%)	Recall (%)	F1 Score (%)
Baseline Model [26]	84.7	97	76.2	85.3
Convolution Model	84.2	90	77.1	83.1
Our Model (Original dataset)	88.4	100	77.1	87.1
DRE-Net [27]	86.0	79.0	96.0	87.0
M-Inception [28]	82.9	73.0	88.0	77.0
Our Model (Average of 5 folds)	94.7	96.54	94.1	94.8
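
The F1 scores in Table 4 can be cross-checked against the precision and recall columns, since F1 is the harmonic mean of the two. The sketch below recomputes it for three rows copied from the table; the function name is an assumption. Small differences are expected where the source values were rounded. The 5-fold average row need not satisfy the identity exactly if, as seems likely, F1 was averaged per fold rather than computed from the averaged precision and recall.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both given in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# (precision, recall, reported F1) copied from Table 4.
rows = {
    "Convolution Model": (90.0, 77.1, 83.1),
    "Our Model (Original dataset)": (100.0, 77.1, 87.1),
    "DRE-Net [27]": (79.0, 96.0, 87.0),
}
for name, (p, r, reported) in rows.items():
    print(f"{name}: recomputed F1 = {f1_score(p, r):.1f}, reported = {reported}")
```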

Besides, after 5-fold cross-validation, the average performance metrics of our model are much better than those of the baseline model, except for precision: the average precision of our model (96.54%) is not higher than that of the baseline model (97%). Further analysis and verification show that this is because many samples in the Fold-2 training set have special characteristics that interfere with the model's discrimination. This is visible in the Fold-2 experiment in Table 3, where the precision of the model is only 89.7%, well below that of the other four folds, which supports our analysis. It suggests screening out, during training, those special samples that strongly influence the model.
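
The 5-fold protocol discussed here can be sketched as follows. The shuffling, the fixed seed, and the function names are assumptions made for illustration; the sample count is a placeholder, and this section does not specify how the authors actually drew their folds.

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Return a list of (train_idx, test_idx) pairs, one per fold."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)          # hypothetical fixed seed
    folds = [idx[i::k] for i in range(k)]     # k nearly equal parts
    splits = []
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        splits.append((train_idx, test_idx))
    return splits

def average(per_fold_values):
    """Average one metric over the folds."""
    return sum(per_fold_values) / len(per_fold_values)

splits = k_fold_indices(n_samples=23, k=5)    # placeholder dataset size
for fold, (train_idx, test_idx) in enumerate(splits, start=1):
    print(f"Fold-{fold}: {len(train_idx)} training, {len(test_idx)} test samples")
```

Each sample is used for testing exactly once, so a single unrepresentative fold such as Fold-2 lowers the average but does not dominate it.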

Discussions

In this section, we discuss why our model achieves better detection performance. This is mainly because the stacked autoencoder detector model has the following advantages. Firstly, each layer of the stacked autoencoder detector model can be trained separately, which keeps the dimensionality reduction of the CT image features controllable, and the training result of each convolution layer can be inspected. Figure 8 shows the feature extraction process of each layer in the stacked autoencoder detection model. The figure has ten columns, each showing the feature maps extracted from a different CT image, and 12 rows, each showing the feature maps produced by one feature extraction operation. The 12 operations are divided into four blocks, each consisting of a 3 × 3 convolution, max-pooling, and local response normalization (LRN). The first, fourth, seventh, and tenth rows are the feature maps obtained after the convolution operations; these four convolution layers have been trained by the autoencoders. After each convolution, max-pooling and LRN are applied to reduce the dimensionality of the feature maps and strengthen the response of useful features.
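
The two operations that follow each convolution in a block can be sketched in plain Python. This is a minimal illustration, not the authors' code: the 2 × 2 pooling window and the LRN hyperparameters `n`, `k`, `alpha`, and `beta` are assumptions (commonly used defaults), since this section does not state the values used in the model.

```python
def max_pool_2x2(fmap):
    """2x2 max pooling with stride 2 on a 2-D list (even height and width)."""
    return [
        [max(fmap[r][c], fmap[r][c + 1], fmap[r + 1][c], fmap[r + 1][c + 1])
         for c in range(0, len(fmap[0]), 2)]
        for r in range(0, len(fmap), 2)
    ]

def lrn(channels, n=5, k=2.0, alpha=1e-4, beta=0.75):
    """Local response normalization across channels at one spatial position.

    channels: list of activations, one value per channel.
    Each activation is divided by (k + alpha * sum of squares over a window
    of up to n neighbouring channels) ** beta.
    """
    half = n // 2
    out = []
    for i, a in enumerate(channels):
        lo, hi = max(0, i - half), min(len(channels) - 1, i + half)
        denom = (k + alpha * sum(channels[j] ** 2 for j in range(lo, hi + 1))) ** beta
        out.append(a / denom)
    return out

pooled = max_pool_2x2([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]])
normalized = lrn([0.5, 2.0, 0.1, 1.5])
```

Pooling halves each spatial dimension, which is the dimensionality reduction mentioned above, while LRN rescales each activation relative to its neighbouring channels.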

Fig. 8

The process of extracting features in the stacked autoencoder detection model. From top to bottom: twelve different feature extraction operations. From left to right: feature maps of ten different CT scan images

From the last row of feature maps in Fig. 8, we can see that our model can extract sample features useful for binary classification from the original CT input image after the four-layer autoencoder training alone. It is worth noting, however, that special features of the input image that are unrelated to classification affect the final classification result. For example, in the sixth column of feature maps in Fig. 8 there are some useless features outside the chest CT range. These features persist into the last row of the feature maps and interfere with the subsequent classification, as mentioned in the earlier discussion of the experimental results.

Secondly, the regularization term added in each layer also plays an important role in improving the detection accuracy of the model. It helps the model find a better local optimum during gradient descent, so that the model finally converges well. Figures 9 and 10 show the test results of autoencoder layer 1 and layer 2, respectively. In Fig. 9, the first row is the original image after adding noise, the second row is the original CT image, and the third row is the output recovered by the layer 1 decoder. As Fig. 9 shows, the Gaussian noise added to the original CT image can be removed, and some high-dimensional features of the original data can be extracted. In Fig. 10, the first two rows are the original CT images and the CT images after adding noise, the third row is the output of the original CT image after the dimensionality reduction of the first layer, and the fourth row is the decoding output of layer 2. Fig. 10 shows that the features after deeper extraction are low-dimensional. The detection model can quickly obtain the low-dimensional features of this layer and optimize its parameters through gradient descent. Figs. 9 and 10 also show that the input and output feature maps of autoencoder layers 1 and 2 are very close, an obvious manifestation of the regularization effect of adding the reconstruction loss as the regularization term in each layer.
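
The denoising test shown in Figs. 9 and 10 follows the usual denoising-autoencoder pattern: corrupt the input with Gaussian noise, then score the reconstruction against the clean image. The sketch below illustrates only that pattern; the noise level `sigma`, the clipping to [0, 1], and the function names are assumptions, and no trained network is involved.

```python
import random

def add_gaussian_noise(image, sigma=0.1, seed=0):
    """Corrupt pixel values in [0, 1] with zero-mean Gaussian noise, then clip."""
    rng = random.Random(seed)
    return [min(1.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in image]

def reconstruction_loss(clean, reconstructed):
    """Mean squared error measured against the CLEAN image, not the noisy input."""
    return sum((c - r) ** 2 for c, r in zip(clean, reconstructed)) / len(clean)

clean = [0.0, 0.25, 0.5, 0.75, 1.0]
noisy = add_gaussian_noise(clean, sigma=0.1)

# A trained autoencoder would map `noisy` back towards `clean`. Here the noisy
# image itself stands in for an untrained, identity-like "reconstruction".
baseline = reconstruction_loss(clean, noisy)
print(f"loss of the unprocessed noisy image: {baseline:.5f}")
```

A layer that has learned to denoise should reach a reconstruction loss below this baseline, which corresponds to the close match between input and output rows in the figures.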

Fig. 9

Test results of layer 1

Fig. 10

Test results of layer 2

Thirdly, a stacked autoencoder network is a neural network composed of multiple trained autoencoder layers. Since each layer is trained separately, this is equivalent to initializing the parameters of each layer with reasonable values before the cascade training, so the network is easier to train, converges faster, and reaches higher accuracy. It is usually not easy to build a usable model for high-dimensional classification problems: blindly increasing depth only makes the results more and more uncontrollable, and the network ends up as an uncontrollable black box [41]. Layer-by-layer dimensionality reduction simplifies the problem. The stacked autoencoder detector can be used to train any deep network. For the stacked autoencoder, the features after dimensionality reduction are directly used for the secondary training, so the depth can be increased by an arbitrary number of layers without worrying about vanishing gradients during training.
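
The layer-by-layer training described here can be illustrated with a deliberately tiny example. Each "layer" below is a scalar linear autoencoder, which is far simpler than the convolutional autoencoders of the model; the learning rate, step count, initial weights, and function names are all assumptions. Only the training order is the point: every layer is fitted on the codes produced by the layers already trained below it.

```python
def train_linear_autoencoder(data, steps=200, lr=0.5):
    """Fit a scalar linear autoencoder x -> w*x -> v*(w*x) by gradient descent
    on the mean squared reconstruction error. Returns (w, v)."""
    w, v = 0.5, 0.5
    n = len(data)
    for _ in range(steps):
        grad_w = sum(2 * (v * w * x - x) * v * x for x in data) / n
        grad_v = sum(2 * (v * w * x - x) * w * x for x in data) / n
        w, v = w - lr * grad_w, v - lr * grad_v
    return w, v

def pretrain_stack(data, n_layers=4):
    """Greedy layer-wise pretraining: train one autoencoder at a time and feed
    its codes to the next one. Returns the encoder weights, bottom to top."""
    encoders, codes = [], list(data)
    for _ in range(n_layers):
        w, _decoder = train_linear_autoencoder(codes)
        encoders.append(w)                 # keep the encoder, drop the decoder
        codes = [w * c for c in codes]     # input for the next layer
    return encoders

encoder_weights = pretrain_stack([0.2, 0.5, 0.9])
```

The returned encoder weights would then serve as the initial values for the cascade training of the full network with its classifier on top.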

Conclusions

In this paper, we have proposed a fast and accurate stacked autoencoder detection model to detect COVID-19 cases from chest CT images. The model is fully automated, with an end-to-end structure that requires no manual feature extraction. In the current severe epidemic, it can detect COVID-19 positive cases quickly and efficiently and can help front-line clinicians diagnose suspected cases. Auxiliary diagnostic models developed with artificial intelligence methods such as deep learning are of great significance for epidemic prevention and control in countries and regions that lack medical materials and equipment. Moreover, as more COVID-19 chest CT image datasets are released, the detection accuracy of deep learning models such as the stacked autoencoder detector is expected to improve considerably, helping to prevent and control the COVID-19 epidemic and to cut off its transmission chain.

24 in total

1. How poorer countries are scrambling to prevent a coronavirus disaster.

Authors: Amy Maxmen
Journal: Nature Date: 2020-04 Impact factor: 49.962

2. Theory of deep convolutional neural networks: Downsampling.

Authors: Ding-Xuan Zhou
Journal: Neural Netw Date: 2020-01-25

Review 3. Backpropagation and the brain.

Authors: Timothy P Lillicrap; Adam Santoro; Luke Marris; Colin J Akerman; Geoffrey Hinton
Journal: Nat Rev Neurosci Date: 2020-04-17 Impact factor: 34.870

4. Deep Learning Enables Accurate Diagnosis of Novel Coronavirus (COVID-19) With CT Images.

Authors: Ying Song; Shuangjia Zheng; Liang Li; Xiang Zhang; Xiaodong Zhang; Ziwang Huang; Jianwen Chen; Ruixuan Wang; Huiying Zhao; Yutian Chong; Jun Shen; Yunfei Zha; Yuedong Yang
Journal: IEEE/ACM Trans Comput Biol Bioinform Date: 2021-12-08 Impact factor: 3.710

Review 5. Molecular advances in severe acute respiratory syndrome-associated coronavirus (SARS-CoV).

Authors: Ken Yan Ching Chow; Chung Chau Hon; Raymond Kin Hi Hui; Raymond Tsz Yeung Wong; Chi Wai Yip; Fanya Zeng; Frederick Chi Ching Leung
Journal: Genomics Proteomics Bioinformatics Date: 2003-11 Impact factor: 7.691

6. The establishment of reference sequence for SARS-CoV-2 and variation analysis.

Authors: Changtai Wang; Zhongping Liu; Zixiang Chen; Xin Huang; Mengyuan Xu; Tengfei He; Zhenhua Zhang
Journal: J Med Virol Date: 2020-03-20 Impact factor: 20.693

7. Pandemic potential of 2019-nCoV.

Authors: Robin Thompson
Journal: Lancet Infect Dis Date: 2020-02-07 Impact factor: 25.071

8. Digital technology and COVID-19.

Authors: Daniel Shu Wei Ting; Lawrence Carin; Victor Dzau; Tien Y Wong
Journal: Nat Med Date: 2020-04 Impact factor: 53.440

9. CoroNet: A deep neural network for detection and diagnosis of COVID-19 from chest x-ray images.

Authors: Asif Iqbal Khan; Junaid Latief Shah; Mohammad Mudasir Bhat
Journal: Comput Methods Programs Biomed Date: 2020-06-05 Impact factor: 5.428
