
Chest X ray and cough sample based deep learning framework for accurate diagnosis of COVID-19.

Santosh Kumar1, Rishab Nagar1, Saumya Bhatnagar1, Ramesh Vaddi2, Sachin Kumar Gupta3, Mamoon Rashid4,5, Ali Kashif Bashir6, Tamim Alkhalifah7.   

Abstract

The world has witnessed the terrible effects of the COVID-19 pandemic on the health and working lives of people everywhere. It is hard to diagnose all infected people in real time, because conventional medical diagnosis of COVID-19 takes a couple of days to produce accurate results. In this paper, a novel learning framework is proposed for the early diagnosis of COVID-19 patients using hybrid deep fusion learning models. The proposed framework performs early classification of patients based on collected chest X-ray images and Coswara cough (sound) samples of possibly infected people. The captured cough samples are pre-processed using speech signal processing techniques. Mel-frequency cepstral coefficient features are then extracted and classified using deep convolutional neural networks. Finally, the proposed system fuses the extracted features using the weighted sum-rule fusion method. It achieves accuracies of 98.70% and 82.7% for early diagnosis based on chest X-ray images and cough (audio) samples, respectively.
© 2022 Elsevier Ltd. All rights reserved.


Keywords:  COVID-19; Chest X-ray; Cough-breathing sounds; Deep learning; Multimodal; Segmentation

Year:  2022        PMID: 36119394      PMCID: PMC9472671          DOI: 10.1016/j.compeleceng.2022.108391

Source DB:  PubMed          Journal:  Comput Electr Eng        ISSN: 0045-7906            Impact factor:   4.152


Introduction

The COVID-19 pandemic has had an unprecedented global social and economic impact cite1. As of mid-July 2020, there were more than 50 million infected cases and more than 500,000 deaths. The pandemic continues to spread globally, with new coronavirus strains and no sign of abating, and the number of infected people worldwide is still increasing. Various organizations, public bodies, and states are using disruptive technologies and emerging computing algorithms to find efficient solutions for early patient diagnosis [1]. Interdisciplinary researchers and clinical testing organizations have deployed diagnosis models for patient tracing, testing, and treatment strategies. These models have helped to contain the pandemic and supported medical services around the world [2]. The clinical testing methods include the polymerase chain reaction (PCR), a procedure that rapidly makes millions to billions of copies of a specific deoxyribonucleic acid (DNA) sample. PCR is widely used to identify COVID-19 patients from the viral ribonucleic acid (RNA) in nasopharyngeal or oropharyngeal swabs of infected people [3]. The real-time reverse transcription polymerase chain reaction (rRT-PCR) method is also used to confirm COVID-19 cases by detecting unique sequences of viral RNA. Although rRT-PCR testing is the current gold standard cite4, it is insufficient for controlling the pandemic for several reasons, including, but not limited to: (1) the limited availability of COVID-19 testing due to geographical and temporal factors, and (2) the need for in-person visits to a hospital, clinical lab, or mobile lab center.
To address these issues, several researchers [4], [5] proposed models that use widely available and simple chest X-ray (CXR) imaging for the early diagnosis and screening of COVID-19 patients. This approach has attracted considerable interest in both clinical practice and AI-based early diagnosis. Other researchers cite9-cite12 proposed and tested frameworks for screening patients based on statistical modeling techniques such as epidemiological methods [6]. These unimodal models process accumulated data about susceptible, infected, recovered, and deceased populations. From a patient database, they compute statistical measures for the early classification (e.g., positive, negative, and inconclusive) of COVID-19 patients. Because properly annotated COVID-19 databases are lacking, these models rely only on statistical measures for further data processing and produce inaccurate results. The lack of properly labeled datasets also hampers the performance of these frameworks and models, which is a significant difficulty for early classification. As a result, statistical learning-based unimodal frameworks cannot produce meaningful results for early diagnosis. Hence, deep multimodal learning frameworks need to be designed and developed that can process massive databases for the early diagnosis of infected patients [4]. Such frameworks are attracting the attention of many researchers. These techniques help provide better solutions to fight the virus through population screening, medical help, notification, and infection-control suggestions [7].
These emerging techniques improve the planning, treatment, and reported outcomes of this pandemic. Deep multimodal learning frameworks are widely used to extract better discriminative feature representations from multiple heterogeneous databases of COVID-19 patients. A multimodal framework processes the captured data with different deep learning techniques and extracts features from each dataset separately. The scores measured on the unimodal datasets can then be integrated using deep multimodal fusion methods for accurate classification. By analyzing the collected patient datasets, multimodal frameworks can also predict mortality risk [8].

Motivation

The second wave of the COVID-19 pandemic increased the demand for testing. Testing requires expensive kits that are not always easily obtainable, and developing and underdeveloped countries are still working hard to improve their detection capability [8]. Consequently, low-cost, easily available, and dependable pre-screening tests are critical for identifying and diagnosing COVID-19 and for limiting the local emergence and spread of infection [9]. To support the early diagnosis of COVID-19 cases, numerous researchers have used deep learning and machine learning techniques to significantly simplify early diagnostic procedures [10], [11]. In this paper, we address the following question: how can COVID-19 patients be diagnosed early? To solve this problem, we propose a novel multimodal learning framework based on chest X-ray images, cough (sound) samples, and facial expressions using deep learning techniques. The framework contains a diagnostic model based on chest X-rays and a model based on cough for early COVID-19 detection. In addition, a fatigue detection model detects patient fatigue from facial expressions captured in video. It uses machine learning techniques to estimate the patient's fatigue level after classifying the patient's facial emotions. Cough (audio) samples are known to carry most of their energy in the lower frequencies, and statistical analysis of cough samples has been used to predict various respiratory diseases. Cough-based analysis therefore plays an important role in addressing COVID-19. It also helps to investigate how the pathomorphological alterations that COVID-19 induces in the human respiratory system differ from those of other respiratory infections [12], [13], [14]. For these reasons, we used cough samples for the early diagnosis of COVID-19.
The proposed framework uses speech signal processing techniques to remove noise from cough (audio) samples collected from COVID-19 patients. The proposed cough model computes the Mel spectrogram of the processed cough samples for better representation and analysis. Discriminative features are extracted from the spectrograms and classified using a convolutional neural network (CNN) model. The multimodal framework also integrates the chest X-ray-based diagnostic model, which uses the deep learning-based U-Net model to perform early diagnosis of COVID-19 from captured chest X-ray images. CT-scan-based diagnosis has also been recognized as an informative tool for diagnosing critical diseases. The chest X-ray model extracts multiple types of features (such as radiomic features and handcrafted features) and uses deep learning techniques to classify cases as COVID-19 or non-COVID-19.

Contribution

The major contributions of this work are as follows. We propose a novel multimodal framework for early diagnosis of COVID-19 that consists of a chest X-ray imaging model and a cough (voice) sample model, combined using deep multimodal fusion and weighted sum-rule fusion techniques. In the proposed framework, the chest X-ray-based model uses deep U-Net-based learning techniques and the Darknet model to segment the images and extract texture and holistic features from the chest X-ray image datasets. Structured latent representation learning techniques give a better representation of the extracted features. This provides robustness, generalization, and stability to the proposed framework. To improve performance further, a cough diagnostic model extracts features from the cough samples of infected patients, because cough (audio) samples from different respiratory syndromes have distinct latent features. Discriminative Mel-frequency cepstral coefficient (MFCC) features are extracted from the cough samples with speech signal processing techniques. These features train a CNN-based cough diagnostic model that performs a preliminary diagnosis based solely on cough and differentiates COVID-19 cough samples from non-COVID-19 ones. The proposed framework fuses the features extracted from chest X-ray images, cough (voice) samples, and facial expression images using the weighted sum-rule fusion technique, giving accurate classification for the early diagnosis of patients. Experimental results of existing handcrafted texture feature descriptors and of holistic feature extraction and representation methods are used for comparison when evaluating the proposed framework. The rest of the paper is organized as follows. Section 2 reviews chest X-ray- and cough-based methods for the early diagnosis of patients.
Section 3 presents the proposed framework for early classification using deep learning methods and the weighted sum-rule fusion method. Section 4 presents the experimental results and analysis and compares the proposed framework with existing cough- and chest X-ray-based methods. Finally, Section 5 gives the conclusion and future directions.

Literature work

In this section, the literature is divided into subsections that separately survey each part of the system. Table 1 summarizes existing work on the early detection of COVID-19 based on chest X-ray images and cough samples.
Table 1

Existing works for COVID-19 detection using machine learning techniques based on chest X-ray images and cough (audio) samples.

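Study | Year | R / RSS / NS / NR / PS / TM | Per (%)
[3] | 2020 | CXR, 247, DL, FC | 92.5
[4] | 2019 | CXR, 125200, DNET, ML, SP | 98
[5] | 2020 | X-ray, 230, DL, CON, SP, SP | 90
[6] | 2022 | B, BB, 220, MFCCS, DL, Low-D | 70
[7] | 2022 | Sound, 200, ChestHSRSFs, CNN | 70
[15] | 2020 | cough, 4352, SP, MFCC, DL, CNN | 80
[16] | 2020 | Web, Cough, COD, 5320, DL, R2 | 97.10
[17] | 2020 | X-ray, 30, R1, CNN, DL, SP | 95.4
[13] | 2021 | X, CXR, 200, CNN, CNET, SP | 92.4
[18] | 2020 | SP, Cough, CO, 3621, ST, R-18 | AUC 0.72
[19] | 2021 | ES, B, COV10, 10, 2DFT, Inv3 | 80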

Abbreviations: FC: FC-DenseNet103; CON: COVIDX-Net; R1: ResNet-50; S: SFT, SVM; CNET: COVID-Net; DNET: Dark-CoviDNet; MultiM: Multimodal; In-v3: Inception v3; R: ResNet-18; COT: 7 COVID-19, 9 healthy; SFT: Singlet, Fourier transformation; SVM: support vector machine; SP: speech processing; T: dataset of 4352 unique people collected from the web app and 2261 unique people from the Android app (4352 and 5634 samples); Cough: crowdsourced respiratory sound data; SF: shape chest features; 2DFT: two-dimensional (2D) Fourier transformation; COVID10: 5 COVID-19, 5 healthy (10 in total); ES: electronic stethoscope; BB: 120 COVID-19, 100 healthy; B: breathing; Sp: smartphone app; TM: trained model; Per.: performance (%); R: recordings source; RSS: respiratory sound; NS: number of subjects; NR: number of recordings; PS: pre-processing steps; ST: short-term magnitude spectrogram; CO: COVID-19, 1620 healthy; Mel: Mel-frequency cepstral coefficients (MFCC); COD: 2660 COVID-19, 2660 healthy; CODD: 114 COVID-19, 1388 healthy; MFCC: spectrogram, Mel spectrum, power spectrum, Tona, Mel-frequency cepstral coefficients (MFCC); ENCNN: ensemble CNN; Total: COVID-19 34 516 samples; 2100 (1500 chest X-ray positive, 600 chest X-ray negative).

Chest-X-ray modality based system

This subsection reviews work on COVID-19 detection using chest X-ray or similar inputs. Linda et al. [1] proposed COVID-Net for predicting COVID-19 cases and reported 93.3% accuracy. Soares et al. [2] proposed a system that uses three learning architectures, Xception, ResNet, and VGG-16, to classify infected patients from chest X-ray images. The three models serve as base networks, and only the fully connected layers are optimized. The reported accuracies are 95.90% for Xception, 94.60% for ResNet, and 97.30% for VGG-16. However, Xception and ResNet show significant overfitting, while VGG-16 shows little. Yujin et al. [3] added a lung segmentation algorithm to the FC-DenseNet learning model and obtained 88.90% accuracy with segmentation and 79.80% without it. Tulin et al. [4] built a classifier without transfer learning and obtained good accuracy without overfitting. They implemented two kinds of classifiers: a binary one (COVID-19 vs. no findings) and a ternary one (COVID-19, pneumonia, and no findings). However, their pipeline uses a single algorithm that feeds chest X-ray images directly into the classifier. An additional step such as segmentation could make the results more trustworthy.

Cough based diagnostic model

This subsection reviews cough-based COVID-19 diagnosis methods reported in the literature. Imran et al. [7] collected cough samples from COVID-19, bronchitis, and pertussis patients along with healthy individuals and built an Artificial Intelligence (AI) engine. In a similar direction, Laguarta et al. [8] collected variable-length cough audio recordings at a bit rate of 16 kBs. The dataset contains 2660 COVID-19-positive samples, with a 1:10 ratio of positive to control subjects. Each sample is split into 6 s audio chunks and processed with the MFCC feature extraction technique. The proposed model uses four biomarkers: (1) changes in the vocal cords, (2) changes in the lungs and respiratory tract, (3) muscular degradation, and (4) changes in sentiment/mood. It classifies with an accuracy of 97.10% [12], [13], [14], [18], [19], [20]. Overall, we conclude that the open-source chest X-ray datasets currently available are insufficient for early diagnosis and accurate prediction of COVID-19 using machine learning techniques. The available open-source cough and chest X-ray datasets are limited and not properly annotated or labeled for training ML-based models for fast diagnosis and accurate prediction of COVID-19. For a better analysis across subject studies, chest X-ray images and cough samples need to be acquired from the same subjects (patients). Such paired data would support the early diagnosis of COVID-19 and post-COVID-19 symptoms using deep learning techniques.

Proposed framework

This section describes the working of the proposed multimodal framework. The framework takes input data from (1) chest X-ray images, (2) cough (voice/speech) samples, and (3) facial expressions (fatigue detection model) of infected users (shown in Fig. 1).
Fig. 1

Illustrates working of the proposed multimodal framework for the diagnosis of COVID-19.

Chest X-ray images are acquired with an X-ray system (Optima IGS 330). The surveillance system captures the patient's facial expressions on video. The system collects cough (speech) data through the microphones of the users' mobile phones, and users also pronounce selected letters/vowels. The aim is to build a diagnostic tool for COVID-19 based on respiratory, cough, and speech sounds. Participants record breathing sounds, cough sounds, sustained vowel sounds, and a counting exercise, which takes around 5–7 min. No personally identifiable data is collected from the participants. The system then performs pre-processing and feature extraction in the different models for accurate prediction of COVID-19. The extracted features are fused using the weighted sum-rule fusion method. Each model is briefly described in the following subsections.
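The weighted sum-rule fusion step can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes that each modality's model outputs class probabilities in the order [non-COVID-19, COVID-19]. The weights (0.7 for chest X-ray, 0.3 for cough) and the helper name `weighted_sum_fusion` are hypothetical, because the text does not report the weights used.

```python
import numpy as np

def weighted_sum_fusion(scores, weights):
    """Fuse per-modality class-probability scores with the weighted sum rule.

    scores:  list of arrays, one per modality, each of shape (n_samples, n_classes)
    weights: one non-negative weight per modality (normalized to sum to 1 here)
    """
    w = np.asarray(weights, dtype=float)
    if w.ndim != 1 or len(w) != len(scores) or np.any(w < 0):
        raise ValueError("need one non-negative weight per modality")
    w = w / w.sum()  # normalize so fused scores stay in [0, 1]
    stacked = np.stack([np.asarray(s, dtype=float) for s in scores])  # (modalities, n, c)
    fused = np.tensordot(w, stacked, axes=1)  # weighted sum over modalities -> (n, c)
    return fused, fused.argmax(axis=1)  # fused scores and predicted class index

# Hypothetical softmax outputs for two patients, classes [non-COVID-19, COVID-19]
cxr = np.array([[0.10, 0.90], [0.70, 0.30]])
cough = np.array([[0.40, 0.60], [0.45, 0.55]])
fused, pred = weighted_sum_fusion([cxr, cough], weights=[0.7, 0.3])
```

Normalizing the weights keeps the fused output a convex combination of the modality scores, so it can still be read as a probability. More modalities can be added by appending another score array and weight.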

Chest X-ray based COVID-19 detection model

The chest X-ray image database is collected from different sources to extract rich and distinct information for early classification. The primary objective is the early diagnosis of COVID-19 and community-acquired pneumonia. This is done by characterizing the relationships between these diseases and multiple types of discriminatory features from captured chest X-ray images, which provides a possible pipeline for the early diagnosis of COVID-19 using deep learning techniques [9]. In the proposed multimodal system for COVID-19 detection, a deep learning model classifies infected and non-infected persons based on the collected database. The chest X-ray-based model consists of several steps: (1) database description and pre-processing, (2) image segmentation, (3) feature extraction and representation, and (4) classification (shown in Fig. 2). The chest X-ray classification model assigns each input chest X-ray image to one of two classes, COVID-19 or non-COVID-19; that is, it performs binary classification. We found that the system may become less accurate as the number of classes increases. The chest X-ray images are divided into distinct regions using a U-Net-based segmentation technique, and the working of this model is shown in Fig. 3. Table 2 lists the chest X-ray image databases used for the classification of COVID-19 using deep learning techniques.
Fig. 2

Illustrates working of pipeline chest X-ray imaging-based model.

Fig. 3

Shows segmentation model based on U-net learning architecture.

Table 2

Database of chest X-ray images.

Dataset | Size | Used for
A | 468 | Used to make COVID-19 class data
B | 5860 | Used to make non-COVID-19 class data
C | 566 | Segmentation model
D | 138 | Used for validation of segmentation model
E | 852 | 426N (190W + 236M) + D

A: IEEE-8023 CXR (Cohen dataset) [21]; B: Pneumonia and normal chest X-ray; C: Shenzhen CXR with masks; D: Montgomery County CXR images; E: COVIDGR 1.0; W: women; M: men; N: negative cases; P: positive cases; D 426P (239W 187M) used training model.


Data collection

We collected the COVID-19 chest X-ray images and cough samples from open datasets [10] available on open-source platforms such as GitHub and Kaggle. Details of the X-ray image databases are summarized in Table 2. The four datasets described below are each used differently in the model. The first chest X-ray dataset, IEEE-8023, consists of 180 COVID-19-positive chest X-ray images, which are used to build the COVID-19 class. The pneumonia dataset from Kaggle contains 5863 CXR images of pneumonia and normal patients, which are used to build the non-COVID-19 class. The Shenzhen and Montgomery datasets are used to learn lung segmentation, because lung masks are available for both. The Shenzhen dataset is used to train the segmentation model, and the Montgomery dataset is used to validate it. This study involves 180 COVID-19-positive chest X-ray images taken from GitHub. For the non-COVID-19 class, normal and pneumonia chest X-ray images are taken from Kaggle. These two sources are used together for the classification of COVID-19 and non-COVID-19 patients. After data collection, the images are pre-processed with image processing techniques that remove noise and other artifacts before they are fed to the chest X-ray-based classification model. As shown in Fig. 2, the next step is lung segmentation using the deep U-Net learning architecture. This step finds the region of interest from which features are extracted to train the proposed model.
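The text does not name the technique used to remove noise from the X-ray images. One possible version is sketched below. It assumes a 3 × 3 median filter (our choice, effective against impulse noise) followed by min-max intensity scaling of a grayscale image stored as a NumPy array. The helper `denoise_and_normalize` is hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def denoise_and_normalize(img, k=3):
    """Median-filter a grayscale X-ray (k x k window) and rescale intensities to [0, 1]."""
    img = np.asarray(img, dtype=float)
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")               # replicate borders so shape is preserved
    windows = sliding_window_view(padded, (k, k))        # shape (H, W, k, k)
    filtered = np.median(windows, axis=(-2, -1))         # median of each k x k neighbourhood
    lo, hi = filtered.min(), filtered.max()
    if hi == lo:                                         # constant image: avoid division by zero
        return np.zeros_like(filtered)
    return (filtered - lo) / (hi - lo)

# Toy 5 x 5 "image": a horizontal intensity ramp with one impulse-noise pixel
img = np.tile(np.arange(5.0), (5, 1)) * 10
img[2, 2] = 1000.0
clean = denoise_and_normalize(img)
```

In this toy example the median filter removes the single bright pixel without blurring the ramp, and the scaling maps the remaining range onto [0, 1].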

Segmentation of chest-X-ray image

Segmentation is the process of dividing the input image into distinct regions of interest. In the chest X-ray image processing model, the lung images of infected patients are divided into regions of interest for classification using a deep U-Net-based architecture (see Fig. 4). The pre-processing of chest X-ray images is shown in Algorithm 1.
Fig. 4

Depicts the architecture of the proposed classification model.

The main objective of segmenting the collected data is to extract the discriminatory information in the chest X-ray database that is essential for separating COVID-19 from no-finding cases. The deep U-Net learning architecture divides each X-ray image into distinct partitions, and labelled information is extracted from these regions for classification. The dataset is segmented to mark the lungs and other regions of interest and to localize the labelled infected parts in each chest X-ray image. The segmentation algorithm localizes the lungs, which are the essential part of a chest X-ray study. A deep learning U-Net-based segmentation model is used to improve the accuracy of the proposed system. We trained the U-Net model on the Shenzhen Hospital X-ray dataset and validated it on the Montgomery County dataset. Since the region of interest in this model is the lungs, the Montgomery and Shenzhen datasets are used together with their lung masks. The model is trained on these masks so that it outputs the lung region (shown in Fig. 4).
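Validating the segmentation model on the Montgomery masks requires an overlap measure, and the predicted mask is then used to isolate the lungs. The sketch below covers these two steps, assuming the Dice coefficient as the validation metric and a 0.5 threshold on the U-Net's per-pixel lung probabilities. Neither choice is stated in the text, the U-Net itself is not shown, and the helpers `dice_coefficient` and `crop_to_lungs` are hypothetical.

```python
import numpy as np

def dice_coefficient(pred_mask, true_mask, eps=1e-7):
    """Dice overlap, 2|P & T| / (|P| + |T|), between predicted and reference lung masks."""
    p = np.asarray(pred_mask, dtype=bool)
    t = np.asarray(true_mask, dtype=bool)
    inter = np.logical_and(p, t).sum()
    return (2.0 * inter + eps) / (p.sum() + t.sum() + eps)

def crop_to_lungs(image, prob_mask, threshold=0.5):
    """Zero out non-lung pixels and crop the image to the mask's bounding box."""
    mask = np.asarray(prob_mask) >= threshold
    if not mask.any():
        return np.asarray(image)                 # no lung pixels found: keep the full image
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    masked = np.where(mask, image, 0)
    return masked[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]

# Toy 4 x 4 example: predicted mask covers 4 pixels, reference mask covers 6
pred = np.zeros((4, 4)); pred[1:3, 1:3] = 1
true = np.zeros((4, 4)); true[1:3, 1:4] = 1
score = dice_coefficient(pred, true)               # 2 * 4 / (4 + 6)
roi = crop_to_lungs(np.arange(16).reshape(4, 4), pred * 0.9)
```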

Classification

This step builds a classification model from the extracted, labelled X-ray image feature vectors. The proposed system employs a U-Net-based convolutional neural network (CNN) to classify COVID-19 and non-COVID-19 patients. The basic architecture of this CNN, shown in Fig. 4, contains convolution, pooling, and fully connected layers. For the classification model, we found it more rational to start from an existing architecture, Darknet-19, rather than from scratch. Darknet-19 is a well-accepted deep learning model with 19 convolutional layers and 5 max-pooling layers. Each convolutional layer has a different number of filters, and the number of filters increases deeper in the architecture. We use fewer layers in the proposed architecture because the lack of sufficient chest X-ray and cough data is a major challenge for training the proposed model. To address this, the proposed model is trained on images segmented with the generated masks. Using the Darknet model directly (without segmentation) led to overfitting because of the large number of parameters. The proposed system therefore uses 17 convolutional layers with batch normalization and the leaky ReLU nonlinear activation function, trained on the collected image database, to mitigate overfitting. Batch normalization makes the proposed model more stable and standardizes its inputs. It is a regularizer that is easy to implement and compatible with many other models and training algorithms. The statistics used to normalize each variable are estimated from a mini-batch, and the noise in these estimates lets batch normalization reduce generalization error and can allow dropout to be omitted.
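A minimal NumPy sketch of the batch normalization forward pass described above is given below, assuming convolutional activations in (batch, channels, height, width) layout. It covers training mode only; the running statistics used at inference time are omitted. The name `batch_norm_train` is hypothetical.

```python
import numpy as np

def batch_norm_train(x, gamma, beta, eps=1e-5):
    """Training-mode batch normalization for conv activations of shape (N, C, H, W).

    Each channel is standardized with the mean and variance of the current
    mini-batch, then rescaled by learnable gamma and shifted by learnable beta.
    """
    mean = x.mean(axis=(0, 2, 3), keepdims=True)     # per-channel batch mean
    var = x.var(axis=(0, 2, 3), keepdims=True)       # per-channel (biased) batch variance
    x_hat = (x - mean) / np.sqrt(var + eps)          # zero mean, unit variance per channel
    return gamma.reshape(1, -1, 1, 1) * x_hat + beta.reshape(1, -1, 1, 1)

rng = np.random.default_rng(0)
acts = rng.normal(loc=2.0, scale=5.0, size=(8, 3, 4, 4))   # hypothetical conv outputs
out = batch_norm_train(acts, gamma=np.ones(3), beta=np.zeros(3))
```

Because the mean and variance come from each mini-batch, they change slightly from batch to batch. This noise produces the regularizing effect mentioned in the text.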
Leaky ReLU is a variant of the Rectified Linear Unit (ReLU) that fixes the dying ReLU problem. Instead of being zero for negative inputs (x < 0), leaky ReLU has a small slope there, about 0.01, which prevents neurons from dying. We use leaky ReLU activation in the proposed architecture. An activation function takes a number and applies a mathematical operation to it, and its output decides whether a neuron is activated. The pooling layer reduces the dimensions of the feature maps by down-sampling, summarizing the features in each region. Max pooling keeps the maximum value from each cluster of neurons. Since the task is binary classification, the last layer outputs two classes, COVID-19 and non-COVID-19, for early diagnosis. The model is compiled with the Adam optimizer to update the weights and binary cross-entropy as the loss function. The fully connected layer computes the probability scores of the output classes. These layers can be repeated according to the system requirements. The CNN uses backpropagation to learn suitable connection weights from the loss computed on the output scores. Fig. 4 shows the architecture of the proposed classification model. It consists of 17 convolutional layers, 16 batch normalization layers, 5 max-pooling layers, 1 flatten layer, and 1 linear layer. For clarity, the architecture is divided into blocks. The basic block consists of one convolutional layer followed by batch normalization and leaky ReLU. The triple block applies the basic block three times, one after the other. Finally, the flatten layer and the linear layer produce the output of the early diagnosis.
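The element-wise operations named above (leaky ReLU, max pooling, and the binary cross-entropy loss) can be illustrated in NumPy. This sketch is written only to clarify the definitions and is not the paper's training code. The slope 0.01 follows the text. The 2 × 2 pooling window with stride 2 is an assumption, as are the function names.

```python
import numpy as np

def leaky_relu(x, alpha=0.01):
    """Identity for x >= 0 and slope alpha for x < 0, so gradients never vanish entirely."""
    return np.where(x >= 0, x, alpha * x)

def max_pool2x2(fmap):
    """2 x 2 max pooling with stride 2 on a (H, W) feature map; H and W must be even."""
    h, w = fmap.shape
    return fmap.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Mean BCE loss; p_pred is the predicted probability of the COVID-19 class."""
    y = np.asarray(y_true, dtype=float)
    p = np.clip(np.asarray(p_pred, dtype=float), eps, 1 - eps)   # avoid log(0)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

act = leaky_relu(np.array([-2.0, 0.0, 3.0]))
pooled = max_pool2x2(np.arange(16.0).reshape(4, 4))
loss = binary_cross_entropy([1, 0], [0.5, 0.5])   # an uninformative prediction costs ln 2
```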
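A small, framework-free specification can check the layer budget stated above: 17 convolutional, 16 batch normalization, 5 max-pooling, 1 flatten, and 1 linear layer. The block ordering, filter counts, and kernel sizes below are our assumptions, loosely modelled on Darknet-style designs rather than taken from the paper. Only the Basic Block / Triple Block structure and the layer totals come from the text. The final convolution is assumed to have no batch normalization, which is what reconciles 17 convolutions with 16 normalization layers.

```python
def basic_block(filters, kernel=3):
    """Basic Block: convolution -> batch normalization -> leaky ReLU."""
    return [("conv", filters, kernel), ("batchnorm",), ("leaky_relu",)]

def triple_block(filters):
    """Triple Block: three Basic Blocks in a row (3x3, 1x1 bottleneck, 3x3 assumed)."""
    return basic_block(filters, 3) + basic_block(filters // 2, 1) + basic_block(filters, 3)

POOL = [("maxpool", 2)]

# Filter counts and kernel sizes are illustrative; only the layer totals come from the text.
LAYERS = (
    basic_block(8) + POOL
    + basic_block(16) + POOL
    + triple_block(32) + POOL
    + triple_block(64) + POOL
    + triple_block(128) + POOL
    + triple_block(256)
    + basic_block(128, 1) + basic_block(256, 3)
    + [("conv", 2, 3)]                  # output convolution without batch norm (assumed)
    + [("flatten",), ("linear", 2)]     # two classes: non-COVID-19, COVID-19
)

def count(kind):
    """Number of layers of a given type in LAYERS."""
    return sum(1 for layer in LAYERS if layer[0] == kind)

summary = {k: count(k) for k in ("conv", "batchnorm", "leaky_relu", "maxpool", "flatten", "linear")}
```

The same list could be translated layer by layer into any deep learning framework. Keeping it as plain data makes the counts easy to audit against the description.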

Cough sample-based COVID-19 diagnosis

Noticeable symptoms of COVID-19 include cough and breathing difficulties. When breathing and cough samples are analyzed with machine learning techniques, patients' respiratory sounds are expected to give useful insights. This makes it possible to design an early-classification diagnostic tool based on deep learning techniques.

Cough sample collection

To detect COVID-19 patients, the cough, breathing, and speech sound samples of individual patients (speakers/users) are used to train the proposed framework. We used several datasets from different sources available on open-source platforms. The Coswara cough sample database, provided by IISc Bangalore, India, under the Coswara project [9], is used for the diagnosis of COVID-19 patients. In the Coswara database, audio recordings from different speakers are collected via worldwide crowdsourcing using web applications. The dataset has several categories: cough (two kinds: heavy and shallow), breathing (two kinds: heavy and shallow), sustained vowel phonation (three kinds: a, e, o), and digit counting (two kinds: fast and normal). Metadata accompanies the recordings (shown in Table 2). The Coughvid [10], DetectNow [10], and Virufy datasets are also publicly available. Each month's audio recordings are stored in a separate folder and compressed, along with their metadata, into multiple files, which are then uncompressed to extract the recordings. The cough recordings are in .WAV format at 44.1 kHz and come from both COVID-19 and non-COVID-19 people. The cough audio dataset contains 5000 samples from 500 subjects (200 COVID-19, 150 healthy, and 150 non-COVID-19 patients), with 10 samples per subject. The COVID-19-positive and negative samples were chosen randomly to obtain a balanced distribution. Fig. 5 shows the spectrogram of the cough sample based on different respiratory characteristics of the patient. Fig. 6 shows the spectrogram of the cough sample based on different respiratory systems.
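The text says that positive and negative samples were chosen randomly to obtain a balanced distribution. A simple way to do this is random undersampling of the larger class, sketched below. The approach, the fixed seed, and the helper `balanced_sample` are assumptions for illustration, not the authors' procedure.

```python
import random

def balanced_sample(records, seed=0):
    """Randomly undersample so positive and negative recordings are equally frequent.

    records: list of (recording_id, label) pairs with label 1 = COVID-19, 0 = negative.
    """
    rng = random.Random(seed)                        # fixed seed for a reproducible split
    pos = [r for r in records if r[1] == 1]
    neg = [r for r in records if r[1] == 0]
    n = min(len(pos), len(neg))                      # size of the smaller class
    sample = rng.sample(pos, n) + rng.sample(neg, n)  # draw without replacement
    rng.shuffle(sample)
    return sample

# Hypothetical recording IDs: 20 positive and 30 negative recordings
records = [(f"pos_{i}", 1) for i in range(20)] + [(f"neg_{i}", 0) for i in range(30)]
subset = balanced_sample(records)
```

Undersampling discards some negative recordings. Class weighting or oversampling would keep them but change the training dynamics.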
Fig. 5

Working prototype for the cough-based model.

Fig. 6

Shows the spectrogram of cough (voice) based on pronounced vowels (a, e, i, o, u) by patients.


Pre-processing step

The audio recording datasets of patients consist of cough, breathing, phonation, and counting samples. These datasets are manually divided into two classes, positive and negative. Samples whose COVID-19 status is labeled positive (including mild cases) are grouped as positive. Samples whose status is labeled healthy or no respiratory illness exposed are grouped as negative. Feature extraction serves as a dimensionality reduction step for the statistical analysis of cough samples. Mel-frequency cepstral coefficient (MFCC) features are extracted from the segmented signals using a Hamming window, and the MFCCs of the cough signal are used as features for classification. The pre-processing steps for the cough samples are given in Algorithm 1, and the short-time spectral analysis of the processed cough samples is shown in Algorithm 2. The recordings were down-sampled to a 16 kHz sampling rate. The working of the prototype model is shown in Fig. 7, and the computation of the MFCC features is shown in Fig. 8.
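The binary relabelling described above can be written as a small lookup. The status strings below are hypothetical placeholders and should be checked against the dataset's own metadata. Statuses that fall in neither group (for example, recovered subjects) are assumed to be excluded from both classes.

```python
# Hypothetical metadata status strings; verify against the actual dataset metadata.
POSITIVE = {"positive_mild", "positive_moderate"}
NEGATIVE = {"healthy", "no_resp_illness_exposed"}

def binary_label(status):
    """Map a COVID-19 status string to 1 (positive), 0 (negative) or None (excluded)."""
    s = status.strip().lower()
    if s in POSITIVE:
        return 1
    if s in NEGATIVE:
        return 0
    return None  # e.g. recovered or other respiratory illness: kept out of both classes

labels = [binary_label(s) for s in ("positive_mild", " Healthy ", "recovered_full")]
```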
Fig. 7

Working prototype for the cough-based model.

Fig. 8

Extraction of MFCCs from the cough (voice) samples.

The proposed system acquires cough samples (speech data) through users' mobile phone microphones. The speech signal is then divided into short frames, and the Discrete Fourier Transform (DFT) converts each frame from the time domain to the frequency domain to form a spectrogram. We computed the power spectrum of each frame and mapped it onto the Mel scale using triangular filters. Fig. 9 shows the volume of a cough input and its power spectrum. Next, the logarithm of the Mel filter outputs is taken and the Discrete Cosine Transform (DCT) is applied. Finally, the delta (Δ) and delta-delta (ΔΔ) coefficients are calculated: the delta coefficient of the MFCC vector of window frame t is computed over a regression window of neighboring frames, as shown in Eq. (1), where the window is usually set to 6 to 10 frames. Because speech input from consumer devices may differ in duration, the MFCC feature vectors also vary in length; therefore, the proposed system normalizes the feature vectors by padding shorter signals with MFCCs of silence (no sound). A detailed description of feature extraction is given in Algorithm 2.
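The chain described above (framing, DFT power spectrum, Mel filtering, log, DCT, deltas, and silence padding) can be sketched in NumPy as follows. This is a minimal illustration, not the authors' Algorithm 2: the 25 ms frame (400 samples) and 10 ms hop at 16 kHz, the 512-point FFT, 26 Mel filters, 12 cepstral coefficients, the half-window N = 3 (a 7-frame regression window, within the 6 to 10 frames mentioned above), and the 3 s fixed length are all assumptions. The delta regression is the common form d_t = Σ n (c_{t+n} − c_{t−n}) / (2 Σ n²), summed over n = 1..N.

```python
import numpy as np


def hz_to_mel(f):
    """Convert frequency in Hz to the Mel scale (common 2595 * log10 form)."""
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters evenly spaced on the Mel scale, shape (n_filters, n_fft // 2 + 1)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):  # rising edge
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):  # falling edge
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb


def dct_ii(x, n_out):
    """Orthonormal DCT-II along the last axis, keeping the first n_out coefficients."""
    n = x.shape[-1]
    k = np.arange(n_out)[:, None]
    m = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * m + 1) / (2 * n))
    scale = np.full(n_out, np.sqrt(2.0 / n))
    scale[0] = np.sqrt(1.0 / n)
    return (x @ basis.T) * scale


def mfcc(signal, sr=16000, frame_len=400, hop=160, n_fft=512, n_filters=26, n_mfcc=12):
    """Frames -> Hamming window -> DFT power spectrum -> Mel filters -> log -> DCT."""
    signal = np.asarray(signal, dtype=float)
    if len(signal) < frame_len:
        signal = np.pad(signal, (0, frame_len - len(signal)))
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2 / n_fft
    mel_energies = power @ mel_filterbank(n_filters, n_fft, sr).T
    return dct_ii(np.log(mel_energies + 1e-10), n_mfcc)  # epsilon avoids log(0) on silence


def deltas(c, N=3):
    """Regression deltas over +/- N frames; edge frames are repeated at the borders."""
    T = len(c)
    padded = np.pad(c, ((N, N), (0, 0)), mode="edge")
    denom = 2.0 * sum(n * n for n in range(1, N + 1))
    return sum(n * (padded[N + n:N + n + T] - padded[N - n:N - n + T])
               for n in range(1, N + 1)) / denom


def cough_features(signal, n_samples=3 * 16000, sr=16000):
    """Pad with silence (zeros) or truncate to a fixed length, then stack MFCC, delta, delta-delta."""
    x = np.zeros(n_samples)
    clip = np.asarray(signal, dtype=float)[:n_samples]
    x[:len(clip)] = clip
    c = mfcc(x, sr=sr)
    d1 = deltas(c)
    return np.hstack([c, d1, deltas(d1)])
```

With these settings, `cough_features` returns a 298 × 36 matrix for any input length: 298 frames of 12 MFCCs plus 12 delta and 12 delta-delta coefficients. Padding the waveform with zeros before feature extraction is one way to obtain "MFCCs with no sound" for short recordings; the 36 columns match the width of the 25 000 × 36 matrix reported in the feature extraction step, but the paper does not state its exact configuration.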
Fig. 9

Illustration of cough volume and its power spectrum.


Feature extraction

Coughing is one of the most important respiratory symptoms for the early diagnosis of critical diseases, and cough sounds carry distinct statistical information at lower frequencies, reflected in significant spectrogram coefficients or vectors. Therefore, we represent the cough (audio) signal as a sequence of spectral vectors for COVID-19 diagnosis. This time-frequency representation of cough samples, called a spectrogram, exposes the high-energy bands used to analyze multi-pulmonary cough signals for early diagnosis of COVID-19. The MFCC features of the Intrinsic Mode Functions (IMFs) and of the windowed cough signal are extracted from the cough spectrograms and used by a convolutional neural network (CNN) model to classify COVID-19 and non-COVID-19 cases. MFCCs are used because they contain discriminative features that capture the time-varying characteristics of the human vocal and respiratory tract. These features are therefore extracted from the cough sample together with their first-order (Δ) and second-order (ΔΔ) derivatives, computed on the short-term power spectrogram of the cough sound. The MFCC features extracted from the prepared cough audio datasets form a 25 000 × 36 feature matrix. Principal Component Analysis (PCA) is then used to reduce this matrix to its most significant components. The computation of MFCC features is given in Algorithm 3. Adding time derivatives to the static MFCC parameters greatly improves the performance of speech-based classification tasks; these derivatives are known as the delta (Δ) and delta-delta (ΔΔ) coefficients. They describe the dynamics of the vocal tract, with the delta-delta coefficients capturing its acceleration. We therefore computed these derivatives of the MFCC features using a regression formula.
The regression formula for the delta coefficients is given in Eq. (2); the framing, DFT, Mel mapping, log, DCT, and silence-padding steps follow the procedure described in the pre-processing step.
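The PCA reduction of the feature matrix can be sketched with a singular value decomposition of the mean-centered data. This NumPy sketch is illustrative only: the paper does not say how many components it keeps, so the 95% explained-variance threshold and the function name `pca_reduce` are hypothetical choices.

```python
import numpy as np


def pca_reduce(X, variance_to_keep=0.95):
    """Project rows of X onto the fewest principal components reaching the variance target.

    Returns (scores, components, mean) so new data can be projected with
    (X_new - mean) @ components.T.
    """
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of Vt are the principal directions; S**2 is proportional to their variance.
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = S ** 2 / np.sum(S ** 2)
    k = int(np.searchsorted(np.cumsum(explained), variance_to_keep)) + 1
    k = min(k, Vt.shape[0])
    components = Vt[:k]
    return Xc @ components.T, components, mean
```

For a 25 000 × 36 MFCC matrix `F`, `scores, comps, mu = pca_reduce(F)` returns a 25 000 × k score matrix, where k depends on how concentrated the variance is. The returned mean and components should be fitted on training data only and reused for validation data.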

Classification

This step builds a classification model using the extracted MFCC feature vectors with their delta and delta-delta coefficients. The proposed system employs a deep CNN to classify the speech signal. The basic CNN architecture contains convolution, pooling, and fully connected layers (shown in Fig. 8). The CNN starts with an input layer that takes a spectrogram of the obtained features (i.e., a feature map) and passes it to the convolution layer, which uses a ReLU activation function to generate feature maps. The MFCC features are used as input for training the CNN model, which consists of two blocks of layers. Each block comprises two convolutional layers followed by a 2 × 2 max-pooling layer, a batch normalization layer, and a dropout layer with a rate of 0.20 to prevent overfitting (shown in Fig. 9). The pooling layer reduces the dimension of the feature maps by down-sampling, which summarizes the features. The basic architecture of the CNN is shown in Fig. 12. The batch normalization layer standardizes its inputs and stabilizes the learning process. Both convolutional layers in the first block use a 5 × 5 kernel with 32 filters, whereas both convolutional layers in the second block use a 3 × 3 kernel with 32 filters. The complex features learned by these four convolutional layers are flattened and passed to a fully connected layer of 256 neurons, followed by a dropout layer with a rate of 0.30 to prevent overfitting. Finally, an output layer with two neurons and a softmax activation function classifies the input as a COVID-19 or non-COVID-19 case.
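The layer stack described above can be checked with simple shape bookkeeping. The pure-Python sketch below traces output shapes and trainable-parameter counts rather than building a network. It assumes "same" padding in every convolution, 2 × 2 pooling with stride 2, two trainable parameters (scale and shift) per channel for batch normalization, and a hypothetical 298 × 36 single-channel input, since the paper does not give the input size.

```python
def conv_params(k, c_in, c_out):
    """Weights (k * k * c_in per filter) plus one bias per filter."""
    return k * k * c_in * c_out + c_out


def cnn_summary(height, width, channels=1):
    """Return (layer name, output shape, trainable params) for the described CNN."""
    layers = []
    h, w, c = height, width, channels
    for block, k in ((1, 5), (2, 3)):
        for j in (1, 2):
            # 'same' padding keeps the spatial size; each conv has 32 filters.
            layers.append((f"block{block}_conv{j} {k}x{k}x32 ReLU", (h, w, 32), conv_params(k, c, 32)))
            c = 32
        h, w = h // 2, w // 2  # 2 x 2 max pooling, stride 2
        layers.append((f"block{block}_maxpool", (h, w, c), 0))
        layers.append((f"block{block}_batchnorm", (h, w, c), 2 * c))  # gamma and beta
        layers.append((f"block{block}_dropout 0.20", (h, w, c), 0))
    flat = h * w * c
    layers.append(("flatten", (flat,), 0))
    layers.append(("dense 256", (256,), flat * 256 + 256))
    layers.append(("dropout 0.30", (256,), 0))
    layers.append(("dense 2 softmax", (2,), 256 * 2 + 2))
    return layers


if __name__ == "__main__":
    for name, shape, params in cnn_summary(298, 36):
        print(f"{name:32s} {str(shape):16s} {params}")
```

For that hypothetical input, the flattened vector has 74 × 9 × 32 = 21 312 values, so the 256-neuron dense layer holds about 5.46 million of the roughly 5.50 million trainable parameters, which shows where most of the model's capacity sits.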
Fig. 12

Segmentation of chest X-ray images using the proposed model.

Weighted sum-rule fusion method

The weighted sum-rule fusion method integrates the matching scores of the chest X-ray detection model and the cough sample-based model to predict infected people accurately. Combining the results of the two models provides a more robust and reliable result. The fatigue detection model analyzes the facial expressions of infected people for doctors, so that the physical appearance of the patient can be taken into account when diagnosing symptoms. The weighted sum-rule fusion method integrates the score probabilities of the chest X-ray and cough models, and this fusion makes the result more accurate for early diagnosis of patients (shown in Fig. 9). The individual weights of the chest X-ray model and the cough sample model are given by Eq. (9). The confusion matrix is shown in Table 3; it shows the predicted versus actual positive and negative classes (Eq. (3)).
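Score-level weighted sum-rule fusion can be sketched as a normalized weighted average of the two models' COVID-19 probabilities for the same person. This is a minimal sketch, not the authors' implementation: the function name, the 0.5 decision threshold, and the weight normalization are assumptions, and the paper derives its actual weights from each model's confusion matrix (Eq. (9)).

```python
def fuse_scores(p_xray, p_cough, w_xray, w_cough, threshold=0.5):
    """Weighted sum-rule fusion of two COVID-19 probabilities.

    The weights are normalized so the fused score stays in [0, 1].
    Returns (fused probability, predicted label) with 1 = COVID-19 positive.
    """
    if w_xray < 0 or w_cough < 0 or w_xray + w_cough == 0:
        raise ValueError("weights must be non-negative and not both zero")
    fused = (w_xray * p_xray + w_cough * p_cough) / (w_xray + w_cough)
    return fused, int(fused >= threshold)
```

For example, with hypothetical scores of 0.9 (X-ray) and 0.4 (cough) and the weights 0.80 and 0.52 from Table 6, `fuse_scores(0.9, 0.4, 0.80, 0.52)` gives a fused score of 0.928 / 1.32 ≈ 0.703 and label 1, so the more heavily weighted X-ray model dominates the decision.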
Table 3

Performance of the proposed chest X-ray model based on fold cross-validation.

Fold | Sensitivity | Specificity | Precision | Accuracy | F1
1 | 0.9459 | 1.0000 | 1.0000 | 0.9890 | 0.9722
2 | 1.0000 | 91.0000 | 1.0000 | 1.0000 | 1.0000
3 | 0.9453 | 0.91235 | 0.9367 | 0.9578 | 0.9646
4 | 0.9354 | 0.94285 | 0.9669 | 0.9677 | 0.9879
5 | 0.9891 | 0.97685 | 0.9556 | 0.9576 | 0.9789
Next, we assigned the individual weights W1 and W2 to the chest X-ray diagnostic model and the cough sample-based model to combine their accuracies for accurate prediction of COVID-19. The proposed multimodal framework fuses both models to calculate the mean weighted accuracy for predicting COVID-19 and non-COVID-19 cases. The mean weighted accuracy is computed by Eq. (4), where S1 and S2 are the matching scores of the models. This weighted average accuracy provides robust and more reliable results about whether the patient is likely to be COVID-19 positive, supporting early diagnosis of COVID-19.

Experimental results and discussions

In this section, the experimental results of the multimodal learning framework are evaluated on different datasets, namely the chest X-ray, cough, and facial image datasets. The proposed system is trained with 566 images with lung masks using a U-Net-based deep learning architecture. The segmentation model provided the segmented input images to the chest X-ray classification model with a validation accuracy of 98.46%. The input chest X-ray image, the actual mask, and the predicted output mask are shown in Fig. 10. The training and validation loss and accuracy curves versus the number of epochs for the segmentation model are shown in Fig. 11.
Fig. 10

Base image, mask, and predicted mask.

Fig. 11

Loss and accuracy curves of the segmentation model.


Classification model

The segmented data is used as input to the proposed multimodal classification model. For training, 180 images of COVID-19 positive cases and 720 images of the non-COVID class were used. Of these 900 images, 80% were used for training and 20% for validation, and the proposed framework was validated using 5-fold cross-validation. An automatic hyperparameter selection method is used to tune the framework for early diagnosis of COVID-19 cases; this removes the need to hand-design a selection strategy for training the models, although automatic hyperparameter selection is computationally costly. Fig. 11 shows the loss and accuracy curves of the segmentation model. To evaluate the classification methods, we used the confusion-matrix-based measures given in Eqs. (5)–(9), where TP, FP, TN, and FN denote true positive, false positive, true negative, and false negative cases, respectively. The F1 score was used as the criterion for early stopping. Finally, the overall metric scores of the algorithm were obtained by averaging each metric over the classes, as shown in Tables 3 and 4. The proposed multimodal framework for distinguishing COVID-19 from non-COVID-19 cases uses segmented chest X-ray images as input. For training the model, we have 468 images of COVID-19-positive patients and 720 images of non-COVID-19 patients. We used 80% of the chest X-ray image database for training and 20% for validation and for measuring system performance. Fig. 12 shows the confusion-matrix-based measures for chest X-ray image and cough-based diagnosis.
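The equations themselves are not shown in the text. Assuming Eqs. (5)–(9) are the standard definitions (sensitivity = TP/(TP + FN), specificity = TN/(TN + FP), precision = TP/(TP + FP), accuracy = (TP + TN)/(TP + TN + FP + FN), and F1 as the harmonic mean of precision and sensitivity), they can be sketched together with a plain shuffled 5-fold split and per-fold averaging. The function names and the random seed are hypothetical; this is not the authors' evaluation code.

```python
import random


def confusion_metrics(tp, fp, tn, fn):
    """Standard confusion-matrix measures; returns 0.0 when a denominator is zero."""
    def ratio(num, den):
        return num / den if den else 0.0

    sensitivity = ratio(tp, tp + fn)  # true positive rate (recall)
    specificity = ratio(tn, tn + fp)  # true negative rate
    precision = ratio(tp, tp + fp)
    accuracy = ratio(tp + tn, tp + tn + fp + fn)
    f1 = ratio(2 * precision * sensitivity, precision + sensitivity)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "precision": precision, "accuracy": accuracy, "f1": f1}


def kfold_indices(n, k=5, seed=0):
    """Yield (train, val) index lists; each fold holds out about 1/k of the data (20% for k=5)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    start = 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        yield idx[:start] + idx[start + size:], idx[start:start + size]
        start += size


def average_over_folds(per_fold):
    """Mean of each metric across folds."""
    return {key: sum(m[key] for m in per_fold) / len(per_fold) for key in per_fold[0]}
```

For the 900 chest X-ray images described above, `kfold_indices(900)` yields five splits of 720 training and 180 validation images, and every image is used for validation exactly once. A stratified split would additionally keep the COVID-19 to non-COVID ratio equal across folds.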
Table 4

Performance of proposed cough model based on fold cross-validation.

Fold | Sensitivity | Specificity | Precision | Accuracy | F1
1 | 0.9279 | 0.9641 | 0.9538 | 0.9867 | 0.8792
2 | 0.9722 | 0.9861 | 0.9459 | 0.9833 | 0.9589
3 | 0.8657 | 0.91736 | 0.9367 | 0.9479 | 0.9746
4 | 0.9484 | 0.97288 | 0.9289 | 0.9576 | 0.9689
5 | 0.89594 | 0.94686 | 0.9266 | 0.9386 | 0.9979
To obtain higher performance, the chest X-ray images are segmented into distinct regions of interest using the U-Net model. Fig. 13 depicts the segmentation of chest X-ray images using the proposed model. The confusion matrix of the proposed method is shown in Fig. 14(a) for chest X-ray images and in Fig. 14(b) for cough audio signals. We used 20% of the chest X-ray images for validation and the remaining 80% for training the proposed model (Fig. 15 shows the accuracy for folds 3 and 4).
Fig. 13

Confusion Matrix features (a) chest X-ray images and (b) cough based diagnosis.

Fig. 14

Confusion Matrix for (a) chest X-ray images and (b) cough based diagnosis.

Fig. 15

Accuracy of (a) fold 3 and (b) fold 4 for CXR classification.


Cough sound analysis

The experimental results demonstrate that the proposed cough detection algorithm classifies the given cough samples into two classes, COVID-19 positive and non-COVID-19 (no findings), with an overall accuracy of 82.30% (the average accuracy is shown in Tables 4 and 5).
Table 5

Performance of proposed cough model based on fold cross-validation.

Fold | Sensitivity | Specificity | Precision | Accuracy | F1
1 | 0.9279 | 0.9641 | 0.9538 | 0.9867 | 0.8792
2 | 0.8131 | 0.8214 | 0.8131 | 0.8230 | 0.8131
The error graph of the average (mean) loss versus epochs for the neural network-based model is shown in Fig. 15. Fig. 16 and Fig. 17 show the confusion matrix for cough-based diagnosis and the accuracy of the proposed framework with the segmentation method on chest X-ray images. Fig. 18 illustrates the average loss of the proposed model for the classification of symptoms with the segmentation method.
Fig. 16

Model accuracy versus epoch for COVID-19 classification based on CXR images.

Fig. 17

Accuracy of the proposed framework with the segmentation method on chest X-ray images.

Fig. 18

Losses of the CXR classification model with the segmentation method.

The L2 norm-based regularization technique is used to reduce overfitting. Also known as ridge regression, it adds the squared magnitude of the coefficients as a penalty term to the loss function. Moreover, hyperparameters of the deep neural network-based models, such as the dropout rate, activation functions, number of hidden layers, and learning rate, were tuned to address overfitting. The decay of the model loss over the number of epochs was also examined to rule out overfitting [21], [22]. These performance metrics are based on the confusion matrix (shown in Tables 4 and 6 and Fig. 16).
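The ridge-style penalty can be illustrated in a few lines of NumPy. This is a sketch, not the authors' training code: the penalty strength `lam`, the learning rate, and penalizing every weight array (the paper does not say whether biases are excluded) are assumptions.

```python
import numpy as np


def l2_regularized_loss(data_loss, weights, lam):
    """Data loss plus lam times the sum of squared entries of every weight array."""
    return data_loss + lam * sum(float(np.sum(w ** 2)) for w in weights)


def sgd_step_with_l2(w, grad_data, lam, lr):
    """One gradient-descent step; the penalty's gradient 2 * lam * w shrinks w toward zero."""
    return w - lr * (grad_data + 2.0 * lam * w)


if __name__ == "__main__":
    w = np.array([1.0, -1.0])
    # With no data gradient, each step multiplies w by (1 - 2 * lr * lam) = 0.9 here.
    print(sgd_step_with_l2(w, np.zeros(2), lam=0.1, lr=0.5))
```

Because the penalty's gradient is proportional to the weights themselves, each update shrinks them by a constant factor in addition to the data-driven step, which is why L2 regularization in gradient descent is also called weight decay.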
Table 6

Accuracy of the proposed framework with the selected weights.

Model type | Weight | Accuracy
X-ray based detection | 0.80 | 0.983
Cough-based detection | 0.52 | 0.827
After calculating the accuracy of the individual models, our next task was to calculate the average accuracy of the two models, i.e., the chest X-ray-based detection model and the cough-based detection model. Fusing the two models' accuracies increases the reliability and robustness of the overall framework. Since the two models are trained and tested on separate datasets, combining them was a difficult task. The weighted average method is used to calculate the average accuracy: a weight is assigned to each model, and the weights are calculated from each model's confusion matrix. The weights assigned to both models are listed in Table 6. The weighted mean accuracy is calculated using Eqs. (8) and (9). Therefore, the classification accuracy of the framework is 90.50%.

Comparison with the existing methods

In this section, existing approaches for COVID-19 detection are compared in Tables 7 and 8. Each learning-based modality of the multimodal framework (chest X-ray-based, fatigue-based, and cough sound-based) is compared with current state-of-the-art methods to validate the accuracy of the proposed framework.
Table 7

Comparisons with existing techniques on cough for COVID-19 detection.

References | Dataset | Technique used | Performance (%) | Remark
[23] | chest-CT | Res-A | 86.7 | Takes more time, lower accuracy
[24] | Chest CT | MI | 82.90 | Small dataset
[7] | Chest X-ray | DTL | 92.1 | Overfitting problem
[25] | CC | ED | 77.1 | Lower accuracy
Prop. | Coswara | CNN | 82.70 | Less processing time

Abbreviations: Ref., Reference; Prop., Proposed; DLT, DLT-based classifier; ED, Ensembles DNN; TL, Transfer Learning with VGGish; chest-CT, Chest CT Scan; Res-A, ResNet and Location Attention; MI, M-Inception techniques; CNN, Convolutional Neural Network; Coswara, Coswara cough audio database; CC, Coswara/Coughvid.

Table 8

Existing work on early detection of COVID-19 using ML techniques based on chest X-ray and cough samples.

StudyYearRRSSNSNRPSTMPer (%)
[13]2021XCXR200CNNCNETSP92.4
[17]2020X-ray30R1CNNDLSP95.4%
[4]2019CXR125200DNETMLSP98%
[5]2020X-ray230DLCONSPSP90%
[7]2022Sound200ChestHSRSFsCNN70%
[15]2020cough4352SPMFCCDLCNN80%
[16]2020WebCoughCOD5320DLR297.10%
[18]2020SPCoughCO3621STR-18AUC(0.72)
[19]2021ESBCOV10102DFTInv380%
[16]2022BBB220MFCCSDLLow-D70%
Prop.2022CT+CTotal(98.33+82.7%)
To demonstrate the significance of our chest X-ray model, we compare it with previous work on the early diagnosis of COVID-19, including X-ray models evaluated on different datasets and current state-of-the-art methods. Similarly, we compare our cough-based model with existing cough-based COVID-19 diagnosis models evaluated on different datasets. Table 8 summarizes the existing and proposed works, with the type of data used and the results obtained.

Conclusion and future directions

The proposed multimodal framework fuses two models using the weighted sum-rule fusion technique to classify COVID-19 patients and reports the average accuracy achieved by the models. For chest X-ray classification, an accuracy of 98.33% is obtained, and the cough sound model achieved 82.7% accuracy for the classification of patients. Based on the overall observations, the final accuracy of the proposed framework is 92.03% for the early diagnosis of patients. The proposed system can be used remotely, especially in places that lack medical facilities, COVID-19 detection centers, monitoring systems, and other diagnostic centers. The major shortcomings and limitations faced in developing the multimodal framework are: (1) no sufficiently large database contains both chest X-ray and cough (audio) samples of the same patients for training the proposed models to detect COVID-19 positive cases. Open-source datasets provide labeled and annotated chest X-ray and cough samples individually, but they contain only a small number of properly labeled samples, particularly for non-COVID-19 cases, mild infection cases, and other severity levels; (2) there is room for improvement in cough (audio) sample-based diagnosis using deep multimodal fusion techniques. This work can also be improved with new disruptive technologies and methods for fast computation and early prediction of COVID-19 cases at different levels. The future directions for the proposed multimodal framework are: (1) the databases will be enlarged to evaluate the results, and (2) an Android-based smart application will be developed for the early diagnosis of patients and other users of medical services.

CRediT authorship contribution statement

Santosh Kumar: Conception and design of study, Analysis and/or interpretation of data, Writing – original draft, Writing – review & editing. Rishab Nagar: Conception and design of study, Analysis and/or interpretation of data, Writing – original draft, Writing – review & editing. Saumya Bhatnagar: Conception and design of study, Analysis and/or interpretation of data, Writing – original draft, Writing – review & editing. Ramesh Vaddi: Conception and design of study, Analysis and/or interpretation of data, Writing – original draft, Writing – review & editing. Sachin Kumar Gupta: Conception and design of study, Analysis and/or interpretation of data, Writing – original draft, Writing – review & editing. Mamoon Rashid: Conception and design of study, Analysis and/or interpretation of data, Writing – original draft, Writing – review & editing. Ali Kashif Bashir: Conception and design of study, Analysis and/or interpretation of data, Writing – original draft, Writing – review & editing. Tamim Alkhalifah: Conception and design of study, Analysis and/or interpretation of data, Writing – original draft, Writing – review & editing.

Declaration of Competing Interest

No author associated with this paper has disclosed any potential or pertinent conflicts which may be perceived to have impending conflict with this work. For full disclosure statements refer to https://doi.org/10.1016/j.compeleceng.2022.108391.
  14 in total

1.  AI4COVID-19: AI enabled preliminary diagnosis for COVID-19 from cough samples via an app.

Authors:  Ali Imran; Iryna Posokhova; Haneya N Qureshi; Usama Masood; Muhammad Sajid Riaz; Kamran Ali; Charles N John; Md Iftikhar Hussain; Muhammad Nabeel
Journal:  Inform Med Unlocked       Date:  2020-06-26

2.  Deep Learning COVID-19 Features on CXR Using Limited Training Data Sets.

Authors:  Yujin Oh; Sangjoon Park; Jong Chul Ye
Journal:  IEEE Trans Med Imaging       Date:  2020-05-08       Impact factor: 10.048

3.  Automatic Identification of Cough Events from Acoustic Signals.

Authors:  Renard Xaviero Adhi Pramono; Syed Anas Imtiaz; Esther Rodriguez-Villegas
Journal:  Conf Proc IEEE Eng Med Biol Soc       Date:  2019-07

4.  RCoNet: Deformable Mutual Information Maximization and High-Order Uncertainty-Aware Learning for Robust COVID-19 Detection.

Authors:  Shunjie Dong; Qianqian Yang; Yu Fu; Mei Tian; Cheng Zhuo
Journal:  IEEE Trans Neural Netw Learn Syst       Date:  2021-08-03       Impact factor: 10.451

5.  Automated detection of COVID-19 cases using deep neural networks with X-ray images.

Authors:  Tulin Ozturk; Muhammed Talo; Eylul Azra Yildirim; Ulas Baran Baloglu; Ozal Yildirim; U Rajendra Acharya
Journal:  Comput Biol Med       Date:  2020-04-28       Impact factor: 4.589

6.  Prediction of muscular paralysis disease based on hybrid feature extraction with machine learning technique for COVID-19 and post-COVID-19 patients.

Authors:  Prabu Subramani; Srinivas K; Kavitha Rani B; Sujatha R; Parameshachari B D
Journal:  Pers Ubiquitous Comput       Date:  2021-03-03       Impact factor: 3.006

7.  AI-enabled radiologist in the loop: novel AI-based framework to augment radiologist performance for COVID-19 chest CT medical image annotation and classification from pneumonia.

Authors:  Hemant Ghayvat; Muhammad Awais; A K Bashir; Sharnil Pandya; Mohd Zuhair; Mamoon Rashid; Jamel Nebhen
Journal:  Neural Comput Appl       Date:  2022-03-01       Impact factor: 5.606

8.  A deep-learning based multimodal system for Covid-19 diagnosis using breathing sounds and chest X-ray images.

Authors:  Unais Sait; Gokul Lal K V; Sanjana Shivakumar; Tarun Kumar; Rahul Bhaumik; Sunny Prajapati; Kriti Bhalla; Anaghaa Chakrapani
Journal:  Appl Soft Comput       Date:  2021-05-26       Impact factor: 6.725
