Ankush Manocha, Munish Bhatia.
Abstract
Over the last two years, the novel coronavirus has become a significant threat to public health, and numerous approaches have been developed to determine the symptoms of COVID-19. To deal with the complex symptoms of COVID-19, a Deep Learning-assisted Multi-modal Data Analysis (DMDA) approach is introduced to determine COVID-19 symptoms by utilizing acoustic and image-based data. The classified events are then forwarded to the proposed Dynamic Fusion Strategy (DFS) to confirm the health status of the individual. The performance of the proposed solution is first evaluated on acoustic and image-based samples, on which it attains maximum accuracies of 96.88% and 98.76%, respectively. The DFS achieves an overall symptom determination accuracy of 98.72%, which is highly acceptable for decision-making. Moreover, the proposed solution remains reliable, with an accuracy of 95.64%, even when any one of the data modalities is absent during testing.
Keywords: Covid-19; Deep learning; Semi-supervised learning; Smart healthcare; Smart monitoring
Year: 2022 PMID: 35938050 PMCID: PMC9346103 DOI: 10.1016/j.compeleceng.2022.108274
Source DB: PubMed Journal: Comput Electr Eng ISSN: 0045-7906 Impact factor: 4.152
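The abstract above describes a two-stage design: modality-specific deep models first classify acoustic and image samples, and the Dynamic Fusion Strategy (DFS) then combines the per-modality predictions into a final health-status decision. The record does not reproduce the authors' interfaces, so the sketch below is only a minimal, hypothetical illustration of that flow; the model objects, function names, and probability shapes are assumptions, not the paper's API.

```python
import numpy as np

def dmda_predict(acoustic_model, image_model, fusion_model,
                 acoustic_features, ct_image):
    """Hypothetical two-stage DMDA/DFS flow: per-modality predictions
    are concatenated into one vector for the fusion classifier."""
    # Stage 1: modality-specific class-probability vectors
    p_acoustic = acoustic_model.predict(acoustic_features)  # e.g. shape (1, n_acoustic_classes)
    p_image = image_model.predict(ct_image)                 # e.g. shape (1, n_image_classes)

    # Stage 2: Dynamic Fusion Strategy input = joined prediction vector
    fused_vector = np.concatenate([p_acoustic.ravel(), p_image.ravel()])
    return fusion_model.predict(fused_vector.reshape(1, -1))  # final health status
```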
A comparative analysis of COVID-19 determination methods.
| Sr. No. | Modality | Data samples | Technique | Accuracy | Limitation |
|---|---|---|---|---|---|
| 1. | Acoustic | Cough | LSTM and SVM | 88.2% | Only a single input modality was employed, which limits confidence in the determination. |
| 2. | Acoustic | Voice | Two-way ANOVA | 83.69% | Only a single input modality was employed, which limits confidence in the determination. |
| 3. | Acoustic | Cough | Transfer learning | 92.85% | Using two or three deep models with several layers for a single input modality increases implementation complexity; a more compact framework is needed. |
| 4. | Acoustic | Cough | CNN and MFCC | 98.54% | Only a single input modality was employed, which limits confidence in the determination. |
| 5. | Image | CT scans | AlexNet | AUC | Because symptoms vary across patients, a single input modality cannot be relied on; such models are also unsuitable for noisy conditions. |
| 6. | Image | CT scans | Attention-based multiple | 97.9% | 3D imaging increases training and spatial complexity. |
| 7. | Image | X-rays | Patch-based CNN | 91.9% | Only a single input modality was employed, which limits confidence in the determination. |
| 8. | Image | X-rays | Fusion of ResNet-101 | 96.1% | Using two or three deep models with several layers for a single input modality increases implementation complexity; a more compact framework is needed. |
Fig. 1. The conceptual framework of the proposed solution.
Fig. 2. The detailed flow of the proposed solution.
Fig. 3. Irregular acoustic event classification.
Fig. 4. Image-based irregular event determination.
Details of the parameters of the DFS.
| Sr. No. | Parameters | Value |
|---|---|---|
| 1. | Validated dataset X | |
| 2. | Test samples | |
| 3. | Sample of cough | |
| 4. | Sample of speech | |
| 5. | Sample of breathing | |
| 6. | Sample of CT scan | |
| 7. | Test subject | |
| 8. | Models | |
| 9. | Looping variables | |
| 10. | Prediction vector | |
| 11. | Training-based prediction vector | |
| 12. | Test-based prediction vector | |
| 13. | DT classification model | |
| 14. | Number of trees | |
| 15. | Number of features | |
| 16. | Misclassified prediction vector |
Hyperparameters of the acoustic model.
| Model | Hyperparameter | Range | Optimized value |
|---|---|---|---|
| Proposed | Rate of learning | [0.0003, 0.001, 0.01] | 0.001 |
| | Size of batch | [16, 32, 64] | 32 |
| | Number of epochs | [20, 40, 60, 80, 100] | 80 |
| | Sampling rate | [22,050, 44,100] Hz | 44,100 Hz |
| | No. of MFCCs | [13, 26] | 13 |
| | Convolution layer | [3, 6, 7, 9, 13] | 13 |
| | No. of GRU cells | [32, 64, 128] | 128 |
| | Activation function | Fixed | ReLU |
| | Optimizer | Fixed | Adam |
| | Loss function | Fixed | Sparse categorical cross-entropy |
| | Train and test ratio | Fixed | 80%–20% |
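The table above fixes the acoustic model's training configuration (13 MFCCs at 44,100 Hz, 128 GRU cells, ReLU, Adam at a learning rate of 0.001, sparse categorical cross-entropy, batch size 32, 80 epochs) but not its full layer-by-layer architecture. The Keras sketch below wires those published values into one plausible convolutional-recurrent stack; the number and widths of the Conv1D layers, the frame count, and the three-class output are illustrative assumptions only.

```python
import tensorflow as tf

def build_acoustic_model(n_mfcc=13, time_steps=216, n_classes=3):
    """Illustrative CNN-GRU acoustic classifier using the reported
    hyperparameters; the exact layer layout is an assumption."""
    # MFCC inputs could be computed as, e.g., librosa.feature.mfcc(y=audio, sr=44100, n_mfcc=13).T
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(time_steps, n_mfcc)),               # MFCC frames x 13 coefficients
        tf.keras.layers.Conv1D(64, 3, padding="same", activation="relu"),
        tf.keras.layers.MaxPooling1D(2),
        tf.keras.layers.Conv1D(128, 3, padding="same", activation="relu"),
        tf.keras.layers.GRU(128),                                 # 128 GRU cells (optimized value)
        tf.keras.layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Training would then use the remaining fixed values, e.g.:
# model.fit(x_train, y_train, batch_size=32, epochs=80, validation_split=0.2)
```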
Fig. 5. Training and validation accuracy of the model.
Fig. 6. Training and validation loss of the model.
Fold-1-based cross-validation.
| Dataset | Cough | Speech | Breathing |
|---|---|---|---|
| Cough | – | 0.0% | 0.78% |
| Speech | 0.0% | – | 0.0% |
| Breathing | 0.0% | 0.0% | – |
Fold-2-based cross-validation.
| Dataset | Cough | Speech | Breathing |
|---|---|---|---|
| Cough | – | 0.0% | 0.98% |
| Speech | 0.0% | – | 0.0% |
| Breathing | 0.85% | 0.0% | – |
Fold-3-based cross-validation.
| Dataset | Cough | Speech | Breathing |
|---|---|---|---|
| Cough | – | 0.0% | 0.78% |
| Speech | 0.0% | – | 0.0% |
| Breathing | 1.2% | 0.0% | – |
Fold-4-based cross-validation.
| Dataset | Cough | Speech | Breathing |
|---|---|---|---|
| Cough | – | 0.0% | 1.3% |
| Speech | 0.0% | – | 0.0% |
| Breathing | 0.81% | 0.0% | – |
Fold-5-based cross-validation.
| Dataset | Cough | Speech | Breathing |
|---|---|---|---|
| Cough | – | 0.0% | 1.27% |
| Speech | 0.0% | – | 0.0% |
| Breathing | 0.81% | 0.0% | – |
Overall prediction accuracy.
| Dataset | Accuracy | Precision | Recall | F-measure |
|---|---|---|---|---|
| Cough | 97.12% | 96.96% | 94.11% | 96.42% |
| Speech | 96.08% | 100% | 92.87% | 91.68% |
| Breathing | 97.45% | 100% | 93.53% | 97.18% |
| Mean | 96.88% | 98.99% | 92.61% | 95.09% |
Fig. 7. Event-based symptom determination performance evaluation of the model.
Comparison of the proposed solution with state-of-the-art models.
| Dataset | Model | Accuracy | Precision | Recall | F-measure |
|---|---|---|---|---|---|
| Cough | Proposed | 97.12% | 96.96% | 94.11% | 95.14% |
| | CNN | 92.57% | 86.45% | 88.12% | 87.28% |
| | RNN | 91.84% | 90.90% | 82.05% | 86.25% |
| | LSTM | 94.25% | 92.52% | 91.41% | 90.48% |
| Speech | Proposed | 96.08% | 100% | 92.87% | 91.41% |
| | CNN | 94.63% | 100% | 88.54% | 90.43% |
| | RNN | 93.91% | 100% | 85.65% | 87.42% |
| | LSTM | 92.98% | 100% | 89.74% | 89.52% |
| Breathing | Proposed | 97.45% | 100% | 93.53% | 97.24% |
| | CNN | 94.33% | 95.65% | 94.86% | 93.25% |
| | RNN | 92.73% | 93.44% | 92.88% | 91.58% |
| | LSTM | 93.56% | 94.84% | 92.98% | 95.43% |
Hyperparameters of the proposed image model.
| Model | Hyperparameter | Range | Optimized value |
|---|---|---|---|
| Proposed | Rate of learning | [0.0003, 0.001, 0.01] | 0.001 |
| | Size of batch | [16, 32, 64, 128] | 32 |
| | Epochs | [20, 40, 60, 80, 100] | 80 |
| | Convolution layer | [11, 12, 13] | 12 |
| | Activation function | Fixed | ReLU |
| | Optimizer function | Fixed | Adam |
| | Loss function | Fixed | Cross-entropy |
| | Train and test ratio | Fixed | 80%–20% |
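Likewise, the image model's table specifies 12 convolution layers, ReLU, Adam at 0.001, cross-entropy loss, batch size 32, 80 epochs, and an 80%/20% split, but not the filter counts or pooling scheme. The sketch below assembles a 12-convolution-layer Keras CNN consistent with those values; the filter widths, pooling placement, input resolution, and binary output are illustrative assumptions.

```python
import tensorflow as tf

def build_image_model(input_shape=(224, 224, 3), n_classes=2):
    """Illustrative CNN with 12 convolution layers (the optimized value
    in the table); filter widths and pooling are assumptions."""
    stack = [tf.keras.Input(shape=input_shape)]
    filters = [32, 32, 64, 64, 128, 128, 128, 256, 256, 256, 512, 512]  # 12 conv layers
    for i, f in enumerate(filters):
        stack.append(tf.keras.layers.Conv2D(f, 3, padding="same", activation="relu"))
        if i % 2 == 1:                                   # downsample after every second block
            stack.append(tf.keras.layers.MaxPooling2D(2))
    stack += [tf.keras.layers.GlobalAveragePooling2D(),
              tf.keras.layers.Dense(n_classes, activation="softmax")]
    model = tf.keras.Sequential(stack)
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```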
Fig. 8. Training accuracy graph of the proposed network.
Fig. 9. Training loss graph of the proposed network.
Comparison of the proposed model with state-of-the-art models on the image dataset.
| Dataset | Framework | Accuracy | Precision | Recall | F-Measure | RoC |
|---|---|---|---|---|---|---|
| CT-Images | Proposed model | 90.64% | | | | |
| | ResNet-50 | 98.33% | 100% | 98.90% | | |
| | DenseNet-121 | 96.80% | 94.35% | 78.95% | 88.24% | 96.51% |
| | MobileNetV2 | 96.80% | 100% | 86.67% | 92.86% | 97.64% |
| | ResNet-18 | 97.60% | 100% | 84.21% | 91.43% | 98.75% |
| | UNet | 97.38% | 97.90% | 96.68% | 97.29% | 98.80% |
Hyperparameters of the dynamic fusion model.
| Model | Hyperparameter | Range | Optimized value |
|---|---|---|---|
| Dynamic fusion | Number of decision trees | Fixed | 10 |
| | Samples per split | Fixed | 2 |
| | Training ratio | Fixed | 80% |
| | Testing ratio | Fixed | 20% |
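The DFS hyperparameters above (10 decision trees, 2 samples per split, an 80%/20% split) describe an ensemble of decision trees trained on the per-modality prediction vectors listed in the earlier DFS parameter table. A hedged scikit-learn sketch of such a fusion classifier follows; the use of RandomForestClassifier and the synthetic prediction-vector features are stand-ins for the authors' exact DFS procedure, not a reproduction of it.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Hypothetical fused features: each row concatenates the acoustic and image
# models' class-probability outputs for one subject (values are synthetic).
rng = np.random.default_rng(0)
X = rng.random((500, 6))             # e.g. 3 acoustic + 3 image probabilities
y = rng.integers(0, 2, size=500)     # 0 = healthy, 1 = symptomatic (synthetic labels)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=42)   # 80%/20% split from the table

fusion = RandomForestClassifier(
    n_estimators=10,        # number of decision trees (table value)
    min_samples_split=2,    # samples per split (table value)
    random_state=42)
fusion.fit(X_train, y_train)
print("held-out accuracy:", fusion.score(X_test, y_test))
```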
Overall symptom determination accuracy.
| Model | Accuracy | Precision | Recall | F-measure |
|---|---|---|---|---|
| Dynamic fusion strategy | 98.72% | 100% | 94.87% | 96.68% |
| MaxVoting fusion | 95.54% | 100% | 93.53% | 94.18% |
Fig. 10. Reliability analysis.
Fig. 11. Stability analysis.