Elif Ceren Gök, Mehmet Onur Olgun.
Abstract
Rising patient numbers and death rates have made Covid-19 a serious pandemic. Its effects extend to health security, economic security, social life, and many other areas. The long and unreliable diagnosis process lets the disease spread even faster, so fast and efficient diagnosis is essential for dealing with the pandemic. Computer-aided medical diagnosis systems are widely used, and given the importance of the problem, they must provide accurate predictions. In this study, blood samples of patients from Einstein Hospital in Brazil were collected and used to predict the severity level of Covid-19 with machine learning algorithms. The study was conducted in two stages: in stage one, no preprocessing was applied, while in stage two, preprocessing was emphasized to achieve better prediction results. At the end of the study, an accuracy of 0.98 was obtained with the tuned Random Forest algorithm and several preprocessing methods.
Keywords: Covid-19; Imputation; Machine learning; Random forest; SMOTE-NC
Year: 2021 PMID: 34131365 PMCID: PMC8193596 DOI: 10.1007/s00521-021-06189-y
Source DB: PubMed Journal: Neural Comput Appl ISSN: 0941-0643 Impact factor: 5.606
Fig. 1 Workflow diagram of the study
Fig. 2 SMOTE
Fig. 3 Random forest
Fig. 4 Confusion matrix
Fig. 5 Illustration of grid search CV
Fig. 6 (a) Initial data and (b) cleared data
Fig. 7 Distribution of each feature under target variable
Fig. 8 Confusion matrix of gradient boosting
Fig. 9 (a) Data before imputation and (b) after imputation and duplicate row elimination
Fig. 10 Target distribution of data (a) before SMOTE-NC and (b) after SMOTE-NC
Tuned hyperparameters of random forest
| max_depth | min_samples_leaf | min_samples_split | n_estimators | random_state |
|---|---|---|---|---|
| 15 | 0.001 | 3 | 750 | 21 |
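The parameter names in the table match those of scikit-learn's `RandomForestClassifier`, so the tuned configuration can be sketched as follows. This is a minimal illustration, assuming scikit-learn is the library in use; the synthetic data from `make_classification` is a stand-in, since the Einstein data set is not reproduced here, and the resulting score says nothing about the study's results.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Hyperparameters copied from the table above.
TUNED_PARAMS = {
    "max_depth": 15,
    "min_samples_leaf": 0.001,  # a float is read as a fraction of the training samples
    "min_samples_split": 3,
    "n_estimators": 750,
    "random_state": 21,
}

# Synthetic four-class stand-in data (assumption, not the study's data).
X, y = make_classification(
    n_samples=400,
    n_features=12,
    n_informative=6,
    n_classes=4,
    random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

model = RandomForestClassifier(**TUNED_PARAMS)
model.fit(X_train, y_train)

# Mean accuracy on the held-out synthetic split.
test_accuracy = model.score(X_test, y_test)
print(f"accuracy on synthetic data: {test_accuracy:.3f}")
```

Because `min_samples_leaf` is given as a float, scikit-learn treats it as a fraction of the training set and rounds the resulting minimum leaf size up, so with small data sets it effectively allows single-sample leaves.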
Fig. 11 Confusion matrix of random forest
Classification report of gradient boosting classifier before preprocessing
| Target | Precision | Recall | F1-score | Accuracy |
|---|---|---|---|---|
| 0 | 0.96 | 1.00 | 0.98 | 0.9492 |
| 1 | 0.80 | 1.00 | 0.89 | |
| 2 | 0.00 | 0.00 | 0.00 | |
| 3 | 0.00 | 0.00 | 0.00 | |
Classification report of random forest classifier after preprocessing
| Target | Precision | Recall | F1-score | Accuracy |
|---|---|---|---|---|
| 0 | 1.00 | 0.98 | 0.99 | 0.9796 |
| 1 | 0.87 | 1.00 | 0.93 | |
| 2 | 0.70 | 1.00 | 0.82 | |
| 3 | 1.00 | 1.00 | 1.00 | |
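As a consistency check on the two classification reports, the F1-score column can be recomputed from the precision and recall columns using F1 = 2PR / (P + R). The sketch below uses only the rounded values printed in the tables; the function and variable names are illustrative, and the macro average is derived here from those rounded values, not reported by the source.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; taken as 0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) per target class, copied from the two tables above.
BEFORE_PREPROCESSING = {0: (0.96, 1.00), 1: (0.80, 1.00), 2: (0.00, 0.00), 3: (0.00, 0.00)}
AFTER_PREPROCESSING = {0: (1.00, 0.98), 1: (0.87, 1.00), 2: (0.70, 1.00), 3: (1.00, 1.00)}


def per_class_f1(report: dict) -> dict:
    """F1 per class, rounded to two decimals like the printed reports."""
    return {target: round(f1_score(p, r), 2) for target, (p, r) in report.items()}


def macro_f1(report: dict) -> float:
    """Unweighted mean of the per-class F1 scores."""
    scores = [f1_score(p, r) for p, r in report.values()]
    return sum(scores) / len(scores)


f1_before = per_class_f1(BEFORE_PREPROCESSING)
f1_after = per_class_f1(AFTER_PREPROCESSING)

print("F1 before preprocessing:", f1_before)
print("F1 after preprocessing: ", f1_after)
print("macro F1 before:", round(macro_f1(BEFORE_PREPROCESSING), 2))
print("macro F1 after: ", round(macro_f1(AFTER_PREPROCESSING), 2))
```

The macro average weights every class equally, so the two classes with zero recall in the first report pull it far below the 0.9492 accuracy, which is dominated by the majority class.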
Similar approaches with Einstein data set
| Studies | Recall Score |
|---|---|
| Barbosa et al. | 0.9989 |
| | 0.9981 |
| | 0.9903 |
| Banerjee et al. | 0.92 |
| | 0.65 |
| Schwab et al. | 0.82 |
| | 0.80 |
| | 0.75 |