| Literature DB >> 34249142 |
V P Jayachitra, S Nivetha, R Nivetha, R Harini.
Abstract
COVID-19 emerged at the end of 2019 and has become a global pandemic. Many methods predict COVID-19 from a single modality, but none achieves 100% accuracy, as individuals exhibit varied symptoms of the disease. To decrease the rate of misdiagnosis, multiple modalities can be combined for prediction. There is also a need for a self-diagnosis system to reduce the risk of virus spread at testing centres. We therefore propose a robust IoT and deep learning-based multi-modal data classification method for accurate prediction of COVID-19. Highly accurate models generally require deep architectures; in this work, we instead introduce two lightweight models, namely CovParaNet for audio (cough, speech, breathing) classification and CovTinyNet for image (X-ray, CT scan) classification. These two models were identified as the best unimodal models after comparative analysis against existing benchmark models. Finally, the outputs of the five independently trained unimodal models are integrated by a novel dynamic multimodal Random Forest classifier. The lightweight CovParaNet and CovTinyNet models attain maximum accuracies of 97.45% and 99.19%, respectively, even with a small dataset. The proposed dynamic multimodal fusion model predicts the final result with 100% accuracy, precision, and recall, and its online retraining mechanism extends its support to noisy environments. Furthermore, the computational complexity of all the unimodal models is reduced substantially, and the system functions with 100% reliability even when any one input modality is absent during testing.
Keywords: Audio; Deep learning; Image; Multimodal fusion; Reduced network complexity; Self-diagnosis
Year: 2021 PMID: 34249142 PMCID: PMC8260502 DOI: 10.1016/j.bspc.2021.102960
Source DB: PubMed Journal: Biomed Signal Process Control ISSN: 1746-8094 Impact factor: 3.880
Summary of previous methodologies on COVID-19 prediction.
| Literature/Year | Modality | Finding | Methods/Features | Result | Challenges/Research Gap |
|---|---|---|---|---|---|
| [3] (2020) | Cough | Classification: COVID-19/ Pneumonia/ Pertussis/ Others | LSTM, MFCC features, SVM | LSTM - 88% (Acc.), SVM - 94% (Acc.) | |
| [17] (2020) | Cough | Classification: COVID-19/ Others | CNN, MFCC | 98.5% (Acc.), 94.2% (Spec.), 0.97 (AuC) | |
| [12] (2020) | Voice | Comparison of healthy and COVID-19 patients | Two-way ANOVA and Wilcoxon's rank-sum test | Significant differences observed between COVID-19 patients and healthy participants. | |
| [14] (2020) | Respiratory characteristics | COVID-19 prediction | Bi-GRU | 83.69% (Acc.), 90.23% (Sens.), 76.31% (Spec.) | |
| [22] (2020) | X-rays | COVID-19 diagnosis | Patch-based CNN | 91.9% (Acc.) | |
| [9] (2020) | Clinical symptoms - age, fever, cough, etc. | To identify the features most correlated with COVID-19 | XGBoost | Significant symptoms identified: fever, cough, lung infection. | |
| [28] (2021) | Laboratory findings | COVID-19 diagnosis | Naive Bayes, Random Forest, and SVM | SVM - 95% (Acc.) | |
| [1] (2020) | Cough | Classification: COVID-19/ Bronchitis/ Pertussis/ Normal | Transfer learning based ML and DL models | 92.85% (Acc.) | |
| [23] (2020) | X-rays | To identify normal, COVID-19, and viral pneumonia | Fusion of ResNet-101 and ResNet-152 | 96.1% (Acc.) | |
| [29] (2020) | X-rays | COVID-19 diagnosis | COVID-CheXNet (score-level fusion of ResNet-34 and HRNets) | 99.99% (Acc.) | |
| [32] (2021) | X-rays | COVID-19 diagnosis | COVID-DeepNet model (Deep Belief Network (DBN) and Convolutional DBN) | 99.93% (Acc.) | |
| [35] (2021) | CT images | COVID-19 infected area segmentation | UNet, dynamic retraining method | 95% (confidence level) | Increased system overhead due to image retraining. |
| [21] (2020) | CT scans | COVID-19 prediction | Attention-based multiple instance learning | 97.9% (Acc.), 99.0% (AuC) | 3D imaging increases training and spatial complexity. |
| [30] (2020) | X-rays | Benchmarking study to identify the best ML model for COVID-19 | Entropy and TOPSIS methods | SVM was identified as the best model. | |
| [31] (2021) | X-rays | Comparative study of deep learning models for COVID-19 diagnosis | Deep learning models | ResNet-50 (98.8%), MobileNetV2 (93.5%) (Acc.) | |
| [19] (2020) | CT scans | COVID-19 prediction | AlexNet | 0.995 (AuC) | |
| [13] (2020) | Cough and speech | COVID-19 prediction | RNN, ensemble stacking | 78% (Rec.) | |
| [10] (2020) | Fever, cough, shortness of breath | IoT-based COVID-19 diagnosis | ML models (SVM, KNN, Naive Bayes, etc.) | 92.95% (Acc.) | |
| [15] (2020) | Temperature, cough rate, respiratory rate, and blood oxygen saturation | IoT-based COVID-19 diagnosis | Fog-based ML | SVM - 74.7% (Acc.) | |
Fig. 1 Cognitive Disease Prediction System Architecture.
Fig. 2 CovParaNet Architecture.
Fig. 3 CovTinyNet Architecture.
Hyperparameters considered for optimization of the proposed models.
| Model | Hyperparameter | Values explored | Selected value |
|---|---|---|---|
| CovParaNet | Learning rate | [0.0003, 0.0001, 0.01] | 0.0001 |
| | Batch size | [32, 64] | 32 |
| | Epochs | [30, 50, 70, 100] | 70 |
| | Sampling rate | [22050, 44100] | 44100 Hz |
| | No. of MFCCs | [13, 26] | 13 |
| | Convolution layers | [3, 6, 7, 9, 13] | 7 |
| | Activation function | Fixed | ReLU |
| | Optimizer | Fixed | Adam |
| | Loss function | Fixed | Sparse categorical cross-entropy |
| | Train-test split | Fixed | 75% - 25% |
| CovTinyNet | Learning rate | [0.0003, 0.0001, 0.01] | 0.0003 |
| | Batch size | [32, 64] | 32 |
| | Epochs | [30, 50, 70, 100] | 50 |
| | Convolution layers | [11, 12, 13] | 12 |
| | Activation function | Fixed | ReLU |
| | Optimizer | Fixed | Adam |
| | Loss function | Fixed | Cross-entropy |
| | Train-test split | Fixed | 80% - 20% |
| Dynamic multimodal fusion | Number of decision trees | Fixed | 10 |
| | Minimum samples to split internal node | Fixed | 2 |
| | Train-test split | Fixed | 75% - 25% |
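The tuned hyperparameters above were chosen from small discrete search spaces, which amounts to an exhaustive grid search. A minimal sketch of that selection loop, where `score` is a hypothetical stand-in for training and validating a model under one configuration (the real objective would be validation accuracy):

```python
from itertools import product

# Search space for CovParaNet, taken from the table above.
SEARCH_SPACE = {
    "learning_rate": [0.0003, 0.0001, 0.01],
    "batch_size": [32, 64],
    "epochs": [30, 50, 70, 100],
}

def score(cfg):
    # Hypothetical objective: in practice this would train the model with
    # cfg and return validation accuracy. Here it is a toy function that
    # peaks at the table's selected values so the sketch runs end to end.
    target = {"learning_rate": 0.0001, "batch_size": 32, "epochs": 70}
    return -sum(abs(cfg[k] - target[k]) for k in cfg)

def grid_search(space, score_fn):
    keys = list(space)
    best_cfg, best_score = None, float("-inf")
    for values in product(*(space[k] for k in keys)):
        cfg = dict(zip(keys, values))
        s = score_fn(cfg)
        if s > best_score:
            best_cfg, best_score = cfg, s
    return best_cfg

best = grid_search(SEARCH_SPACE, score)
# best -> {'learning_rate': 0.0001, 'batch_size': 32, 'epochs': 70}
```

With 3 × 2 × 4 = 24 configurations per model, exhaustive search is cheap; for larger spaces a random or Bayesian search would be preferred.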
Fig. 4 Training and validation accuracy of CovParaNet.
Fig. 5 Training and validation loss of CovParaNet.
Fig. 6 Accuracy Graph of CovTinyNet.
Fig. 7 Loss Graph of CovTinyNet.
Fig. 8 Confusion Matrices of CovParaNet.
Fig. 9 ROC Curve of CovTinyNet.
Fig. 10 Confusion Matrix of CovTinyNet.
Comparison of CovParaNet with existing models on the acoustic datasets.
| Modality | Model | Accuracy | Precision | Recall | F1-score |
|---|---|---|---|---|---|
| Cough | CovParaNet | | | | |
| | CNN | 92.57% | 86.45% | 88.12% | 87.28% |
| | RNN | 91.84% | 90.90% | 82.05% | 86.25% |
| Speech | CovParaNet | 100% | | | |
| | CNN | 94.63% | 100% | 82.54% | 90.43% |
| | RNN | 93.91% | 100% | 77.65% | 87.42% |
| Breathing | CovParaNet | 93.53% | | | |
| | CNN | 96.33% | 95.65% | 95.25% | |
| | RNN | 96.73% | 98.44% | 92.88% | 95.58% |
Comparison of CovTinyNet with state-of-the-art models on the visual datasets.
| Dataset | Model | Layers | Accuracy | Precision | Recall | F1-score | AUC |
|---|---|---|---|---|---|---|---|
| Chest X-ray | CovTinyNet | | | 100% | 89% | 94% | 99.45% |
| | ResNet-18 | 18 | 97.60% | 100% | 84.21% | 91.43% | |
| | ResNet-50 | 50 | 98.33% | 100% | 99.75% | | |
| | MobileNetV2 | 53 | 96.80% | 100% | 86.67% | 92.86% | 99.64% |
| | DenseNet-121 | 121 | 96.80% | 100% | 78.95% | 88.24% | 98.51% |
| | UNet | 23 | 97.38% | 97.90% | 96.68% | 97.29% | 98.80% |
| Chest CT | CovTinyNet | | | | | | |
| | ResNet-18 | 18 | 97.38% | 98.08% | 96.96% | 97.51% | 99.84% |
| | ResNet-50 | 50 | 96.17% | 98.03% | 94.68% | 96.32% | 99.52% |
| | MobileNetV2 | 53 | 93.55% | 92.62% | 95.44% | 94.01% | 98.91% |
| | DenseNet-121 | 121 | 97.38% | 98.45% | 96.58% | 97.50% | 99.67% |
| | UNet | 23 | 96.00% | 85.00% | 89.47% | 87.18% | 95.60% |
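The F1-scores reported in the comparison tables are the harmonic mean of precision and recall, which makes the rows easy to sanity-check. A quick verification against two rows from the tables above:

```python
def f1(precision, recall):
    # Harmonic mean of precision and recall, in percent.
    return 2 * precision * recall / (precision + recall)

# ResNet-18 chest X-ray row: precision 100%, recall 84.21% -> F1 91.43%
print(round(f1(100.0, 84.21), 2))   # 91.43
# CNN cough row: precision 86.45%, recall 88.12% -> F1 87.28%
print(round(f1(86.45, 88.12), 2))   # 87.28
```

Both reported F1 values match the precision/recall pairs, which supports the inferred column ordering.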
Comparison of dynamic multimodal fusion with max-voting fusion.
| Fusion method | Accuracy | Precision | Recall | F1-score |
|---|---|---|---|---|
| Dynamic multimodal fusion | 100% | 100% | 100% | 100% |
| MaxVoting fusion | 98% | 96.15% | 100% | 98.04% |
Pseudocode for CovParaNet audio classification (only partially recovered; "…" marks steps lost in extraction):
1: Initialize R = 44100, p = 13
2: S = R * D
3: N = S / H
6: Load i with …
8: St = Segment from l to r, where …
12: Convert to mel scale
18: Create M with the specification given in …
19: Compile M with the Adam optimizer and sparse categorical cross-entropy loss
21: Train M with …
24: Y = Predict(M, i)
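The framing arithmetic at the start of the CovParaNet listing (S = R * D samples split into N = S / H frames, each yielding p MFCC coefficients) can be sketched directly. The clip duration D and hop length H are not given in the recovered listing; the values below are illustrative assumptions (512 samples is a common hop-length default):

```python
R = 44100        # sampling rate in Hz (from the hyperparameter table)
p = 13           # MFCC coefficients per frame (from the hyperparameter table)
D = 5.0          # clip duration in seconds (assumed example value)
H = 512          # hop length in samples (assumption)

S = int(R * D)           # total samples in the clip
N = S // H               # number of analysis frames
feature_shape = (N, p)   # shape of the MFCC matrix fed to the model
print(S, N, feature_shape)   # 220500 430 (430, 13)
```

So a 5-second clip at 44.1 kHz yields roughly a 430 × 13 MFCC matrix under these assumptions.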
Pseudocode for CovTinyNet image classification (only partially recovered; "…" marks steps lost in extraction):
1: Initialize parameters T = 50, …
3: Preprocess the images
4: F = generate …
6: Feature vector V = flatten(…)
7: Produce class probabilities P
8: W = update(W)
10: Load the trained model M with W
12: Y = Predict(M, i)
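Steps 6-7 of the CovTinyNet listing flatten the final feature maps into a vector V and turn the resulting logits into class probabilities P, which is conventionally done with a softmax. A minimal sketch (the two-class logit values are illustrative assumptions):

```python
import math

def softmax(logits):
    # Numerically stable softmax: subtract the max before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for (COVID-19, non-COVID) from the flattened vector V.
P = softmax([2.0, 0.5])
print(P)   # probabilities summing to 1, with the first class dominant
```

The predicted label Y is then simply the index of the largest entry of P.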
Pseudocode for dynamic multimodal fusion (only partially recovered; "…" marks steps lost in extraction):
1: Initialize n = 200, k = 5, n_estimators = 10, min_samples_split = 2
7: Split …
8: Build the RFC model with max_features = …
9: Train the RFC with …
10: Y = Test …
13: Append …
16: Retrain the RFC with F to update the learned parameters
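The fusion listing initializes n = 200 subjects and k = 5 modalities and trains a Random Forest classifier (RFC) with the hyperparameters from the table above (10 trees, min_samples_split = 2). A minimal sketch of that step using scikit-learn on synthetic data, where each unimodal prediction agrees with the true label 90% of the time (an illustrative assumption, not the paper's data):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n, k = 200, 5                        # subjects and modalities, as initialized
labels = rng.integers(0, 2, n)       # synthetic COVID / non-COVID ground truth

# Synthetic unimodal outputs: each modality matches the label ~90% of the time.
X = np.column_stack([
    np.where(rng.random(n) < 0.9, labels, 1 - labels) for _ in range(k)
])

rfc = RandomForestClassifier(n_estimators=10, min_samples_split=2,
                             max_features=k, random_state=0)
rfc.fit(X[:150], labels[:150])       # 75% - 25% split, per the table
acc = rfc.score(X[150:], labels[150:])

# Step 16 (online retraining): refit on the original data plus newly
# appended samples F; RandomForestClassifier has no incremental update,
# so retraining means calling fit again on the extended set.
rfc.fit(X, labels)
```

Because the five inputs are mostly consistent with the label, the fused classifier recovers near-majority-vote behaviour while also being able to learn per-modality reliability weights, which is what distinguishes it from plain max-voting.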