Alok Aggarwal, Madam Chakradar, Manpreet Singh Bhatia, Manoj Kumar, Thompson Stephan, Sachin Kumar Gupta, S H Alsamhi, Hatem Al-Dois.
Abstract
Individuals with pre-existing diabetes appear vulnerable to COVID-19 because of changes in blood sugar levels and diabetes complications. Globally, around 20-50% of individuals affected by coronavirus had diabetes. There is no recent finding that diabetic patients are more likely to contract COVID-19 than nondiabetic patients; however, recent findings suggest that diabetic patients who contract COVID-19 could be at least twice as likely to die from its complications. Considering the multifold mortality rate of COVID-19 in diabetic patients, this study proposes a COVID-19 risk prediction model for diabetic patients using a fuzzy inference system and machine learning approaches. The aim is to estimate the risk level of COVID-19 in diabetic patients without a medical practitioner's advice, enabling timely action and reducing the multifold mortality rate. The proposed model takes eight input parameters, identified as the most influential symptoms in diabetic patients. Fifteen models were built over the rule base using various state-of-the-art machine learning techniques. The CatBoost classifier gave the best accuracy, recall, precision, F1 score, and kappa score. After hyper-parameter optimization, the CatBoost classifier reached 76% accuracy with improvements in recall, precision, F1 score, and kappa score, followed by logistic regression and XGBoost with 75.1% and 74.7% accuracy, respectively. Stratified k-fold cross-validation was used for validation.
Year: 2022 PMID: 35368915 PMCID: PMC8974235 DOI: 10.1155/2022/4096950
Source DB: PubMed Journal: J Healthc Eng ISSN: 2040-2295 Impact factor: 2.682
Figure 1. Proposed inference pipeline.
Performance characteristics of ML techniques on COVID-19 symptoms.
| S. no | Model | Accuracy | Recall | Precision | F1 score | Kappa |
|---|---|---|---|---|---|---|
| 1 | Logistic regression | 0.7391 | 0.503 | 0.7536 | 0.7195 | 0.5995 |
| 2 | AdaBoost classifier | 0.7324 | 0.549 | 0.7433 | 0.7093 | 0.5908 |
| 3 | CatBoost classifier | 0.7166 | 0.601 | 0.7159 | 0.7136 | 0.5817 |
| 4 | Light gradient boosting machine | 0.7041 | 0.557 | 0.7031 | 0.6997 | 0.561 |
| 5 | Gradient boosting classifier | 0.6968 | 0.483 | 0.7052 | 0.6816 | 0.537 |
| 6 | Extreme gradient boosting | 0.6935 | 0.473 | 0.7037 | 0.6757 | 0.5303 |
| 7 | Extra trees classifier | 0.6928 | 0.562 | 0.6929 | 0.6908 | 0.5494 |
| 8 | Decision tree classifier | 0.6909 | 0.59 | 0.697 | 0.6922 | 0.5501 |
| 9 | Random forest classifier | 0.6909 | 0.558 | 0.6898 | 0.6884 | 0.5459 |
| 10 | SVM-linear kernel | 0.6733 | 0.449 | 0.703 | 0.639 | 0.4971 |
| 11 | K-neighbor classifier | 0.6534 | 0.495 | 0.6474 | 0.6461 | 0.485 |
| 12 | Ridge classifier | 0.6487 | 0.345 | 0.4885 | 0.5572 | 0.4365 |
| 13 | Quadratic discriminant analysis | 0.5182 | 0.426 | 0.5352 | 0.5067 | 0.3164 |
| 14 | Naive Bayes | 0.4943 | 0.493 | 0.6474 | 0.5279 | 0.3152 |
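Most columns above are standard classification metrics; the kappa score is less common. As a quick reference, a minimal pure-Python sketch of Cohen's kappa (chance-corrected agreement between predicted and true labels) is shown below; this is generic illustration, not the paper's own code:

```python
def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: (observed agreement - chance agreement) / (1 - chance agreement)."""
    labels = sorted(set(y_true) | set(y_pred))
    n = len(y_true)
    # Observed agreement: fraction of exact matches.
    po = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Chance agreement: product of marginal frequencies, summed over classes.
    pe = sum(
        (sum(t == c for t in y_true) / n) * (sum(p == c for p in y_pred) / n)
        for c in labels
    )
    return (po - pe) / (1 - pe)
```

A kappa of 1 indicates perfect agreement and 0 indicates agreement no better than chance, which is why it complements raw accuracy in the table.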
Input/output variables and their fuzzy sets.

| Input/output variables | Fuzzy sets |
|---|---|
| Cough (Input 1) | Low, Medium, High |
| Fever (Input 2) | Low, Medium, High |
| Sore throat (Input 3) | Low, Medium, High |
| Cardiovascular disease (Input 4) | Low, Medium, High |
| High blood pressure (Input 5) | Low, Medium, High |
| Age (Input 6) | Low, Medium, High, Very high |
| Sex (Input 7) | Low, High |
| Travel history during the last 3 weeks (Input 8) | Low, High |
| Prescription (output) | Risk level 1, Risk level 2, Risk level 3, Risk level 4, Risk level 5 |
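Each fuzzy set is defined by a membership function over the crisp input range. The paper does not list its exact breakpoints, so the sketch below uses assumed triangular membership functions for the Fever input (°F) purely for illustration:

```python
def tri_membership(x, a, b, c):
    """Triangular membership: 0 at a, rising to 1 at peak b, falling to 0 at c."""
    if x <= a or x >= c:
        return 0.0
    if x == b:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

# Illustrative fuzzy sets for Fever; these breakpoints are assumptions,
# not the values used in the paper.
fever_sets = {
    "Low":    lambda t: tri_membership(t, 96.0, 98.0, 100.0),
    "Medium": lambda t: tri_membership(t, 99.0, 101.0, 103.0),
    "High":   lambda t: tri_membership(t, 102.0, 104.0, 106.0),
}
```

Overlapping triangles let a reading such as 99.5 °F belong partly to "Low" and partly to "Medium", which is what the membership diagrams in Figure 2 depict.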
Figure 2. Fuzzy set membership diagrams.
Rule base of the fuzzy inference system.
| Sl. no. | Cough | Fever | Sore throat | Cardio. disease | B.P. | Age | Sex | Travel history | Risk level |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Low | Low | Low | Low | Low | Low | Low | Low | Risk level 1 |
| 2 | Medium | Low | Low | Low | Low | Low | Low | Low | Risk level 1 |
| 3 | High | Low | Low | Low | Low | Low | Low | Low | Risk level 1 |
| 4 | Low | Medium | Low | Low | Low | Low | Low | Low | Risk level 1 |
| 5 | Medium | Medium | Low | Low | Low | Low | Low | Low | Risk level 1 |
| 6 | High | Medium | Low | Low | Low | Low | Low | Low | Risk level 1 |
| 7 | Low | High | Low | Low | Low | Low | Low | Low | Risk level 1 |
| 8 | Medium | High | Low | Low | Low | Low | Low | Low | Risk level 1 |
| 9 | High | High | Low | Low | Low | Low | Low | Low | Risk level 1 |
| 10 | Low | Low | Medium | Low | Low | Low | Low | Low | Risk level 1 |
| … | … | … | … | … | … | … | … | … | … |
| 3888 | High | High | High | High | High | Very high | High | High | Risk level 5 |
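The paper does not publish its inference code, but AND-type rules like those above are conventionally evaluated Mamdani-style: a rule's firing strength is the minimum of its antecedent membership degrees, and the strongest rule determines the output. The sketch below is an illustrative minimal version with hypothetical names, not the authors' implementation:

```python
def rule_strength(memberships, antecedent):
    """Mamdani min-AND: firing strength of one rule.
    memberships: {variable: {set_name: degree}}; antecedent: {variable: set_name}."""
    return min(memberships[var][fset] for var, fset in antecedent.items())

def infer_risk(memberships, rules):
    """Return the output label of the strongest-firing rule (max aggregation)."""
    best = max(rules, key=lambda r: rule_strength(memberships, r["if"]))
    return best["then"]

# Tiny two-rule example with made-up membership degrees.
memberships = {
    "cough": {"Low": 0.2, "High": 0.8},
    "fever": {"Low": 0.9, "High": 0.1},
}
rules = [
    {"if": {"cough": "Low", "fever": "Low"}, "then": "Risk level 1"},
    {"if": {"cough": "High", "fever": "Low"}, "then": "Risk level 2"},
]
```

A full system would aggregate all 3888 rules and defuzzify; the max-rule shortcut here is only meant to show how a fuzzified input reaches one of the five risk levels.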
Sample of eight outputs.
| Cough | Fever (°F) | Sore throat | Cardiovascular disease | BP (mmHg) | Age (years) | Sex | Travel history | Prescription |
|---|---|---|---|---|---|---|---|---|
| 3 | 101.5 | 0.5 | 4.5 | 120 | 27 | 0.5 | 0.5 | Risk level 2 |
| 2 | 105 | 0.6 | 5 | 110 | 37.5 | 0.8 | 0.8 | Risk level 3 |
| 5 | 103 | 0.3 | 4 | 140 | 47 | 0.6 | 0.8 | Risk level 4 |
| 5 | 105 | 0.8 | 5 | 150 | 32 | 0.4 | 0.7 | Risk level 5 |
| 2 | 99 | 0.3 | 4 | 110 | 48 | 0.5 | 0.4 | Risk level 1 |
| 6 | 99 | 0.6 | 3 | 120 | 25 | 0.4 | 0.4 | Risk level 2 |
| 6 | 103 | 0.2 | 6 | 125 | 20 | 0.5 | 0.8 | Risk level 4 |
| 2 | 98 | 0.6 | 5 | 135 | 55 | 0.3 | 0.3 | Risk level 3 |
Figure 3. Comparison of accuracy after hyper-parameter optimization.
Figure 4. Comparison of recall after hyper-parameter optimization.
Figure 5. Comparison of precision after hyper-parameter optimization.
Figure 6. Comparison of kappa score after hyper-parameter optimization.
Figure 7. Comparison of F1 score after hyper-parameter optimization.
Figure 8. Confusion matrices of CatBoost classifier before hyper-parameter tuning.
Figure 9. Confusion matrices of CatBoost classifier after hyper-parameter tuning.
Figure 10. ROC curve for CatBoost classifier with AUC scores.
Figure 11. Validation of training and cross-validation scores.
Top three best-performing models after hyper-parameter tuning.
| S. no. | Model | Accuracy | Recall | Precision | F1 score | Kappa score |
|---|---|---|---|---|---|---|
| 1 | CatBoost classifier | 0.7582 | 0.64 | 0.772 | 0.767 | 0.663 |
| 2 | Logistic regression | 0.751 | 0.57 | 0.753 | 0.731 | 0.631 |
| 3 | XGBoost classifier | 0.7471 | 0.55 | 0.727 | 0.721 | 0.591 |
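The abstract states that stratified k-fold cross-validation was used to validate these scores. In practice one would use a library implementation such as scikit-learn's `StratifiedKFold`; the minimal pure-Python sketch below only illustrates the core idea of stratification, i.e. keeping each class's proportion roughly equal across folds:

```python
from collections import defaultdict

def stratified_kfold_indices(labels, k):
    """Minimal stratified k-fold: each class's sample indices are dealt
    round-robin across k folds so class ratios stay balanced per fold."""
    folds = [[] for _ in range(k)]
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    for idxs in by_class.values():
        for j, i in enumerate(idxs):
            folds[j % k].append(i)
    return folds
```

With imbalanced risk-level classes, plain random folds could leave a rare class absent from a validation fold, inflating or deflating per-fold metrics; stratification avoids that.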