Tarik Alafif, Reem Alotaibi, Ayman Albassam, Abdulelah Almudhayyani.
Abstract
The COVID-19 respiratory syndrome pandemic has become a serious public health issue. The virus has affected tens of millions of people worldwide. Some have recovered and been released, others have been isolated, and a few have unfortunately died. In this paper, we apply and compare several machine learning approaches, namely decision tree models, random forest, and multinomial logistic regression, to predict the isolation, release, and decease states of COVID-19 patients in South Korea. The prediction can help health providers and decision makers distinguish the states of infected patients from their features, supporting early intervention by either releasing or isolating a patient after infection. The proposed approaches are evaluated on the Data Science for COVID-19 (DS4C) dataset, and an analysis of the dataset is also provided. Experimental results show that multinomial logistic regression outperforms the other approaches, with a state prediction accuracy of 95% and a weighted average F1-score of 95%.
Keywords: COVID-19; Classification; Decease; Decision tree; Isolation; Multinomial logistic regression; Prediction; Random forest; Release
Year: 2021 PMID: 33451801 PMCID: PMC7785285 DOI: 10.1016/j.isatra.2020.12.053
Source DB: PubMed Journal: ISA Trans ISSN: 0019-0578 Impact factor: 5.911
Fig. 1. Distribution of the number of samples for each state in the DS4C dataset.
Fig. 2. Distribution of samples for the sex feature in the DS4C dataset.
Fig. 3. Distribution of samples for the age feature in the DS4C dataset.
Fig. 4. Causes of COVID-19 infection in South Korea according to the infection cases in the DS4C dataset.
Fig. 5. The generated DT architecture of our approach using a maximum DT depth of 3.
Prediction accuracy, error rates and the weighted average F1-scores for the applied algorithms using DS4C dataset.
| Algorithms implemented | Accuracy | Error rate | Weighted average F1-score |
|---|---|---|---|
| DT (Depth | 82.92% | 17.08% | 81.74% |
| DT (Depth | 85.63% | 14.37% | 84.67% |
| DT (Depth | 88.21% | 11.79% | 87.47% |
| RF | 92.55% | 07.45% | 92.28% |
| MLR | | | |
Fig. 6. DT tuning based on the maximum tree depth.
Fig. 7. RF error rates as the number of classification trees increases. The number of trees and the estimated error rates are shown on the x-axis and y-axis, respectively.
Confusion matrices for actual versus predicted patients’ states.
| DT (Depth | | | |
|---|---|---|---|
| | Deceased | Isolated | Released |
| Deceased | 0 | 0 | 0 |
| Isolated | 21 | 1,444 | 90 |
| Released | 57 | 714 | 2,839 |
| DT (Depth | | | |
| | Deceased | Isolated | Released |
| Deceased | 0 | 0 | 0 |
| Isolated | 23 | 1,586 | 92 |
| Released | 55 | 572 | 2,837 |
| DT (Depth | | | |
| | Deceased | Isolated | Released |
| Deceased | 0 | 0 | 0 |
| Isolated | 25 | 1,808 | 181 |
| Released | 53 | 350 | 2,748 |
| RF | | | |
| | Deceased | Isolated | Released |
| Deceased | 22 | 18 | 38 |
| Isolated | 0 | 1,992 | 166 |
| Released | 0 | 163 | 2,766 |
| MLR | | | |
| | Deceased | Isolated | Released |
| Deceased | 44 | 14 | 20 |
| Isolated | 15 | 2,042 | 101 |
| Released | 19 | 89 | 2,821 |
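
The summary figures in the accuracy table can be cross-checked against these confusion matrices. The sketch below is illustrative and is not the authors' code: the function names `accuracy` and `weighted_f1` and the `rows_are_actual` flag are assumptions introduced here. Accuracy is the diagonal sum divided by the total, the error rate is its complement, and the weighted average F1-score weights each per-class F1 by the number of actual samples in that class. The tables do not state which axis holds the actual classes, so the flag makes that choice explicit. Under these assumptions, the RF matrix appears to agree with the tabulated 92.28% when rows are treated as actual classes, while the first DT matrix appears to agree with 81.74% when columns are treated as actual classes.

```python
from typing import List

Matrix = List[List[int]]

# Confusion matrices copied from the table above
# (class order: Deceased, Isolated, Released).
DT_FIRST: Matrix = [[0, 0, 0], [21, 1444, 90], [57, 714, 2839]]
RF: Matrix = [[22, 18, 38], [0, 1992, 166], [0, 163, 2766]]
MLR: Matrix = [[44, 14, 20], [15, 2042, 101], [19, 89, 2821]]


def accuracy(cm: Matrix) -> float:
    """Fraction of samples on the diagonal (independent of orientation)."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total


def weighted_f1(cm: Matrix, rows_are_actual: bool = True) -> float:
    """Support-weighted mean of per-class F1-scores.

    rows_are_actual is an assumption about the table layout: it only
    decides whether row sums or column sums serve as class supports.
    """
    n = len(cm)
    row_sums = [sum(cm[i]) for i in range(n)]
    col_sums = [sum(cm[i][j] for i in range(n)) for j in range(n)]
    support = row_sums if rows_are_actual else col_sums
    score = 0.0
    for k in range(n):
        # F1 = 2*TP / (2*TP + FP + FN) = 2*TP / (row sum + column sum)
        denom = row_sums[k] + col_sums[k]
        f1 = 2 * cm[k][k] / denom if denom else 0.0
        score += support[k] * f1
    return score / sum(support)


if __name__ == "__main__":
    for name, cm, rows_actual in [
        ("DT (first)", DT_FIRST, False),
        ("RF", RF, True),
        ("MLR", MLR, True),
    ]:
        acc = 100 * accuracy(cm)
        f1 = 100 * weighted_f1(cm, rows_are_actual=rows_actual)
        print(f"{name}: accuracy={acc:.2f}% error={100 - acc:.2f}% weighted F1={f1:.2f}%")
```

Accuracy does not depend on the orientation of the matrix, so it is the safer of the two checks; the weighted F1-score changes with the choice of axis because the class supports differ.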