Deep learning based model for classification of COVID-19 images for healthcare research progress.

Saroj Kumar¹, L Chandra Sekhar Redd², Susheel George Joseph³, Vinay Kumar Sharma⁴, Sabireen H⁵.

Abstract

Imaging technology plays an important role in the diagnosis and evaluation of the novel coronavirus pneumonia (COVID-19). COVID-19 data sets have been published one after another, but the related literature contains relatively few surveys of these data sets and of the research progress made on them. To this end, this paper draws on COVID-19-related journal papers, reports, and open-source data set websites to organize and analyze COVID-19 data sets and the deep learning models applied to them, covering computed tomography (CT) image data sets and chest X-ray (CXR) image data sets. It analyzes the characteristics of the medical images in these data sets, with a focus on open-source data sets and on the classification and segmentation models that perform well on them. Finally, the future development trend of lung imaging technology is discussed.

Copyright © 2022 Elsevier Ltd. All rights reserved. Selection and peer-review under responsibility of the scientific committee of the International Conference on Innovative Technology for Sustainable Development.

Entities: Chemical

Keywords: COVID-19 Data Set; Deep Learning; Image Classification; Image Segmentation

Year: 2022 PMID: 35602305 PMCID: PMC9113957 DOI: 10.1016/j.matpr.2022.04.884

Source DB: PubMed Journal: Mater Today Proc ISSN: 2214-7853

Introduction

Medical imaging has become a common and effective method for the auxiliary diagnosis of disease. Since the outbreak of COVID-19, although CT images have limited sensitivity to the features of early COVID-19 lesions, CT is still needed to assist in the diagnosis and treatment tracking of patients with COVID-19 pneumonia, and X-ray images are used for auxiliary detection [1], [2], [3]. The use of artificial intelligence to perform image classification and lesion segmentation on COVID-19 CT and CXR images has become a topic of widespread interest in medical image analysis, and a large number of related studies have been carried out. Data sets and algorithm models are the two most important factors in current research based on deep learning. In the early stage of the outbreak, relevant COVID-19 image data sets were rarely made public because of patient privacy. The COVID-19 image data sets used in most research work contain tens to hundreds of images. Some papers use larger private data sets, but these are not available for widespread use [4], [5], [6], [7], [8]. Because sufficient training data are lacking, most research work uses data augmentation to expand the training set and proposes detection and segmentation models based on small-sample COVID-19 data sets. As research has progressed and medical image data have accumulated, a number of large-scale COVID-19 image data sets have been released to the public. This article sorts through a large number of scattered open-source data sets mentioned in different papers and reports and provides descriptions and download links; analyzes and summarizes the mainstream algorithm models and application characteristics of COVID-19 image classification and image segmentation; and compares and describes the characteristics of CT and CXR images.

Research model based on deep learning

COVID-19 research based on deep learning can be categorized by model task: classification or segmentation. Lung lesions of different severity present differently, which makes both tasks challenging. A convolutional neural network (CNN) learns high-level features of the image, maps them to a one-dimensional vector, and outputs the classification result through a softmax layer. Segmentation is based on a U-shaped structure: the encoder first extracts features through convolution, and the decoder then performs pixel-wise classification through deconvolution and outputs the segmentation label. As more data sets are released, high-quality data sets help models extract lesion features accurately, and most of these data sets have been used by researchers for efficient diagnosis and prognosis of COVID-19. Most models are trained on multiple data sets to improve their generalization ability. Some of the earlier open-source data sets are used more often, while others have not yet been widely used.
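As a minimal sketch of the final classification step described above, the following plain-Python example turns the raw class scores of a network into probabilities with a softmax and picks the most probable class. The scores and the label order are hypothetical values chosen for illustration, not outputs of any model discussed in this paper.

```python
import math

def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict(logits, labels):
    """Return the most probable label together with its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]

# Hypothetical scores for one image from the last fully connected layer.
labels = ["COVID-19", "ordinary pneumonia", "normal"]
label, prob = predict([2.0, 0.5, -1.0], labels)
print(label, round(prob, 3))
```

In a segmentation decoder the same normalization is applied at every pixel rather than once per image.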

COVID-19 classification model

For COVID-19 classification tasks, there are generally two settings: binary classification (COVID-19 versus non-COVID-19) and three-class classification (COVID-19, ordinary pneumonia, and normal).

CT image classification

The performance comparison of CT image classification models is shown in Table 1. 3D classification models generally perform better than 2D models, but there is currently no universally available pre-trained 3D model. A 3D model has a large number of parameters, so it converges with difficulty when data are scarce and overfits easily. Models trained on COVID-CT-Dataset classify poorly, possibly because the data set is too small or of poor quality. On the SARS-CoV-2 CT dataset, DenseNet201 has the best classification performance.

Table 1

CT Image Texture Features Of Lesions And Normal Tissues.

Group	Type	Average Gray	Standard Deviation	Skewness	Energy	Entropy
First Group	Lesion	179.3	58.0	0.026	0.019	6.86
First Group	Normal	94.3	43.3	0.950	0.012	6.84
Second Group	Lesion	166.5	50.2	0.250	0.008	7.31
Second Group	Normal	86.4	45.1	1.340	0.012	6.84
Third Group	Lesion	160.1	49.9	0.170	0.007	7.47
Third Group	Normal	87.3	53.6	0.840	0.010	7.06

CXR image classification

The CXR classification models are similar to the CT classification models: they use methods such as data augmentation, transfer learning, and ensemble learning, and achieve high classification accuracy. Notably, some lightweight CNNs tend to classify better than more complex architectures.
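Ensemble learning, mentioned above, can take several forms. One common form is soft voting, in which the class probabilities of several models are averaged before the final decision. The sketch below assumes three hypothetical classifiers and made-up probabilities; it is an illustration of the idea, not the ensemble used by any of the surveyed models.

```python
def soft_vote(prob_lists, weights=None):
    """Average per-class probabilities from several models (soft voting)."""
    if weights is None:
        weights = [1.0] * len(prob_lists)
    total_weight = sum(weights)
    n_classes = len(prob_lists[0])
    return [
        sum(w * probs[c] for w, probs in zip(weights, prob_lists)) / total_weight
        for c in range(n_classes)
    ]

# Hypothetical outputs of three CXR classifiers for one image,
# ordered as (COVID-19, ordinary pneumonia, normal).
model_outputs = [
    [0.70, 0.20, 0.10],
    [0.40, 0.50, 0.10],
    [0.60, 0.30, 0.10],
]
averaged = soft_vote(model_outputs)
winner = max(range(len(averaged)), key=averaged.__getitem__)
print(winner, [round(p, 3) for p in averaged])
```

Here the second model alone would predict ordinary pneumonia, but the averaged probabilities favor the first class.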

COVID-19 segmentation model

The segmentation of COVID-19 lesion areas is mainly applied to CT images. At present, few data sets are available for segmentation, and the edges of COVID-19 lesions are blurred. As a result, the Dice index for lesion area segmentation has not reached 90%, and the generalization ability of the models is not strong. Challenges therefore remain in this segmentation field.

Attention mechanism

Incorporating an attention mechanism can better highlight the features of the segmentation area and enrich context-dependent information. The authors of [8] proposed combining the attention mechanism with the U-Net architecture, introducing the scSE attention module to capture contextual information and obtain a better feature representation; the encoder and decoder use residual blocks with dilated convolution (Res_dil) to enlarge the receptive field. The authors of [9] proposed a dynamically deformable attention network, DDANet, which introduces the CCA (Criss-Cross Attention) module [7] into the U-Net architecture to continuously learn attention coefficients. The segmentation performance of this model is significantly better than that of U-Net and Inf-Net. The attention-based model D2A U-Net automatically segments lung infections in CT slices, enlarges the receptive field with dilated convolution to prevent information loss, and introduces a gate attention module (GAM).
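The Dice index used above to score lesion segmentation is, for a predicted mask A and a ground-truth mask B, 2|A ∩ B| / (|A| + |B|). A minimal sketch in plain Python follows; the two small masks are made-up examples, and treating two empty masks as a perfect match is a convention assumed here.

```python
def dice(pred, truth):
    """Dice coefficient between two binary masks given as nested lists of 0/1."""
    flat_pred = [p for row in pred for p in row]
    flat_truth = [t for row in truth for t in row]
    if len(flat_pred) != len(flat_truth):
        raise ValueError("masks must have the same shape")
    overlap = sum(p & t for p, t in zip(flat_pred, flat_truth))
    size = sum(flat_pred) + sum(flat_truth)
    # Two empty masks agree perfectly; define Dice as 1.0 in that case.
    return 1.0 if size == 0 else 2.0 * overlap / size

# Hypothetical 3x4 masks: 1 marks a lesion pixel.
truth = [[0, 1, 1, 0],
         [0, 1, 1, 0],
         [0, 0, 0, 0]]
pred = [[0, 1, 1, 1],
        [0, 1, 0, 0],
        [0, 0, 0, 0]]
score = dice(pred, truth)
print(score)
```

Because the denominator counts only foreground pixels, a one-pixel error along a blurred lesion edge costs more on a small lesion than on a large one.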

COVID-19 imaging manifestations

Computed tomography (CT) and X-ray (CXR) images are common and important chest imaging data. In medical image analysis, the statistical and texture features of lesion images are an important basis for image detection and recognition, and are widely used to describe lesion images quantitatively [9].

Statistical characteristics

Most medical images are grayscale images, and their gray-value statistics show low contrast. Two statistical features that can effectively distinguish gray levels in medical images are introduced below.

Skewness: a measure of the degree of asymmetry of the gray-level distribution relative to the mean gray value. The skewness coefficient determines the degree and direction of asymmetry of the data distribution. The formula is as follows:

$$\text{skewness} = \frac{1}{\sigma^{3}} \sum_{j} (r_j - m)^{3}\, p(r_j)$$

In the formula, σ is the standard deviation; m is the average gray level; r_j is a gray value of the histogram whose probability density is not 0; p(r_j) is the probability density corresponding to r_j.

Entropy: reflects the average amount of information in the image and is often used to describe image complexity. With p(r_j) denoting the probability density corresponding to r_j, entropy is defined as follows:

$$H = -\sum_{j} p(r_j) \log_2 p(r_j)$$

Texture features

Texture features are usually not derived directly from the image; instead, characteristics of the original image are first extracted through some calculation and stored in an intermediate matrix. In medical imaging research, the most commonly used texture representation is the gray-level co-occurrence matrix [10], [11], [12], and texture is measured by characteristics of that matrix, such as the inverse difference moment (inverse gap) and correlation.

Inverse difference moment: reflects local changes in the image texture. In its standard form:

$$\text{IDM} = \sum_{i} \sum_{j} \frac{p_{ij}}{1 + (i - j)^{2}}$$

In the formula, p_ij is the normalized number of times gray level j occurs at the specified distance d from gray level i.
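
The statistical and texture features described above can be sketched in plain Python as follows. The sketch assumes the standard definitions: skewness as the third central moment of the histogram divided by σ³, entropy in bits, and the inverse difference moment with weight 1 / (1 + (i − j)²). The 4x4 four-level image and the horizontal offset are illustrative choices, not data from this paper.

```python
import math

def histogram(pixels, levels=256):
    """Normalized gray-level histogram p(r_j) of a flat list of pixel values."""
    counts = [0] * levels
    for value in pixels:
        counts[value] += 1
    return [c / len(pixels) for c in counts]

def skewness(p):
    """Third central moment of the histogram divided by sigma cubed."""
    m = sum(r * pr for r, pr in enumerate(p))
    var = sum((r - m) ** 2 * pr for r, pr in enumerate(p))
    if var == 0:
        return 0.0  # a constant image has no asymmetry
    return sum((r - m) ** 3 * pr for r, pr in enumerate(p)) / var ** 1.5

def entropy(p):
    """Shannon entropy in bits; empty histogram bins contribute nothing."""
    return -sum(pr * math.log2(pr) for pr in p if pr > 0)

def glcm(image, dx=1, dy=0, levels=256):
    """Normalized gray-level co-occurrence matrix for the offset (dx, dy)."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    total = 0
    for y in range(rows):
        for x in range(cols):
            y2, x2 = y + dy, x + dx
            if 0 <= y2 < rows and 0 <= x2 < cols:
                counts[image[y][x]][image[y2][x2]] += 1
                total += 1
    if total == 0:
        raise ValueError("image too small for this offset")
    return [[c / total for c in row] for row in counts]

def inverse_difference_moment(p):
    """Sum of p_ij / (1 + (i - j)^2); close to 1 for locally uniform texture."""
    return sum(pij / (1 + (i - j) ** 2)
               for i, row in enumerate(p) for j, pij in enumerate(row))

# Illustrative 4x4 image with four gray levels (0..3).
image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 2, 2, 2],
         [2, 2, 3, 3]]
p = histogram([v for row in image for v in row], levels=4)
idm = inverse_difference_moment(glcm(image, levels=4))
print(round(skewness(p), 3), round(entropy(p), 3), round(idm, 3))
```

For real 8-bit images the default of 256 levels applies, which bounds the entropy at 8 bits.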
CT imaging performance

Chest computed tomography is a non-invasive scan that produces accurate images of the patient's chest. Chest CT images of patients with COVID-19 of different severity show different characteristics [13]. The most common manifestations of COVID-19 on lung CT are ground-glass opacity (GGO) and consolidation (CL). As the disease worsens, the GGO and consolidation lesions increase in number and are mainly distributed at the edge of the lung; as the disease improves, the lesions are gradually absorbed and form fibrotic stripes [14], [15], [16], [17], [18], [19]. Most patients also show imaging features such as thickened interlobular septa and thickened bronchial blood vessels [20]. Fig. 1 shows the CT imaging findings of a patient's lungs.

Fig. 1

CT image of lungs of a COVID-19 patient.

Combining the statistical characteristics of the image, three groups of normal and COVID-19-infected lung CT samples were randomly selected from the CC-CCII data set [21], and the statistical characteristics of the lesion areas and normal lung tissue areas were analyzed and compared. The results are shown in Table 1. It can be seen from Table 1 that the standard deviation, skewness, and entropy of the lesion area and the normal tissue are significantly different. Because the lesion area has ground-glass characteristics, its mean gray value is higher, its gray-level distribution is more symmetric, and its skewness is therefore small. Although these statistical features cannot be used directly for lesion detection and judgment, their differences have some reference value for feature learning and for the structural design of deep learning networks.

X-ray image performance

Compared with CT, CXR images are easier to obtain and are widely used for chest imaging. In COVID-19 imaging diagnosis, the main obstacle to the use of CXR is the lack of detail that can be confirmed visually. On CXR images, COVID-19 appears as airspace opacities distributed mainly at the edge of the lung [22], as shown in Fig. 2. In practice, CXR and CT are usually combined for better diagnostic analysis [23] (see Fig. 3).

Fig. 2

CT Image Texture Features Of Lesions And Normal Tissues.

Fig. 3

Lungs CXR of normal and COVID-19 patients.

Because CXR images lack detailed information, the texture characteristics of the entire image are compared. Three groups of normal and patient lung CXR samples were randomly selected from the COVID-19 Radiography Database data set [24]. The results of the texture feature analysis are shown in Fig. 2. Compared with CXR images of lungs infected with COVID-19, CXR images of normal lungs show some differences in the texture features based on the gray-level co-occurrence matrix, but most of these differences are not very obvious. Only the contrast differs clearly: the contrast of an infected patient's image is 2 to 3 times that of a normal lung image. Researchers have also proposed several security protocols [24], [25], [26], [27], [28], [29], [30] for healthcare systems to protect healthcare information exchanged among users and servers in wireless networks.
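The contrast feature compared above is usually computed from the gray-level co-occurrence matrix as the sum of (i − j)² p_ij over all gray-level pairs. The sketch below assumes that standard definition and a horizontal offset of one pixel. The two tiny images are toy examples meant only to show that rapidly alternating gray levels raise the contrast; they are not CXR data.

```python
from collections import Counter

def glcm_contrast(image, dx=1, dy=0):
    """Contrast = sum of (i - j)^2 * p_ij over co-occurring gray-level pairs."""
    rows, cols = len(image), len(image[0])
    pairs = Counter()
    for y in range(rows):
        for x in range(cols):
            y2, x2 = y + dy, x + dx
            if 0 <= y2 < rows and 0 <= x2 < cols:
                pairs[(image[y][x], image[y2][x2])] += 1
    total = sum(pairs.values())
    if total == 0:
        raise ValueError("image too small for this offset")
    return sum((i - j) ** 2 * n for (i, j), n in pairs.items()) / total

# Toy images: one with large uniform regions, one alternating at every pixel.
smooth = [[1, 1, 2, 2],
          [1, 1, 2, 2],
          [1, 1, 2, 2]]
rough = [[1, 3, 1, 3],
         [3, 1, 3, 1],
         [1, 3, 1, 3]]
print(glcm_contrast(smooth), glcm_contrast(rough))
```

Storing only the pairs that occur keeps the computation small even for 256 gray levels, since the full matrix is never materialized.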

Conclusion

This article analyzes the application of different COVID-19 imaging data sets to different tasks. It collects and organizes 18 open-source image data sets, 13 of which contain CT images and 8 of which contain CXR images, and provides a description and download link for each. Because the data sets come from different countries, institutions, and equipment, the quality of the collected image data is uneven and common quality standards are lacking, so some data sets cannot be used well. Drawing on some of TCIA's standards for image data collection, it is therefore recommended that new data collection follow uniform requirements for the consistency of the image data format, the standardization of metadata (data, date, location, image resolution, etc.), and the integrity of data labels, or that research be carried out on quality evaluation standards for collected images. In addition, since medical imaging data often include the patient's personal information, collected data should be de-identified so that the image and lesion annotation information is separated from the patient's information.

The applications of the current mainstream deep learning models to COVID-19 image classification and segmentation are also compared. Attention mechanisms have achieved clear gains in medical image analysis, but the mechanisms currently used are global, whereas the regions of interest in medical images are typically local. The study of local attention mechanisms is therefore likely to become a more effective research direction. At the same time, methods for small-sample and imbalanced data remain a problem worthy of in-depth discussion in medical image processing.
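The separation of image and annotation data from patient information recommended above can be illustrated with a small sketch. The field names, the record, and the random link key are all hypothetical; real imaging headers use standardized tags, and real de-identification involves more than dropping a few fields.

```python
import uuid

# Hypothetical field names treated as identifying for this illustration.
IDENTIFYING_FIELDS = {"patient_name", "patient_id", "birth_date"}

def deidentify(record):
    """Split a record into a shareable research part and a protected identity part."""
    key = uuid.uuid4().hex  # random key that links the two parts
    identity = {k: v for k, v in record.items() if k in IDENTIFYING_FIELDS}
    research = {k: v for k, v in record.items() if k not in IDENTIFYING_FIELDS}
    identity["study_key"] = key
    research["study_key"] = key
    return research, identity

# Hypothetical record combining patient details with image and label data.
record = {
    "patient_name": "Jane Doe",
    "patient_id": "H-0001",
    "birth_date": "1970-01-01",
    "modality": "CT",
    "image_file": "scan_0001.png",
    "lesion_mask_file": "mask_0001.png",
    "label": "COVID-19",
}
research, identity = deidentify(record)
print(sorted(research))
```

Only the research part would be published; the identity part and its key stay with the collecting institution.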

CRediT authorship contribution statement

Saroj Kumar: Data curation, Formal analysis, Conceptualization, Writing – review & editing, Investigation, Methodology. L Chandra Sekhar Redd: Data curation, Investigation, Methodology, Validation, Software, Writing – review & editing. Susheel George Joseph: Data curation, Visualization, Investigation, Validation, Software. Vinay Kumar Sharma: Data curation, Visualization, Methodology, Validation, Software, Writing – review & editing. Sabireen H: Visualization, Investigation, Methodology, Validation, Software, Writing – review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
