L J Muhammad, Md Milon Islam, Sani Sharif Usman, Safial Islam Ayon.
Abstract
The novel coronavirus (COVID-19 or 2019-nCoV) pandemic has neither a clinically proven vaccine nor proven drugs; however, patients are recovering with the aid of antibiotic medications, antiviral drugs, and chloroquine, as well as vitamin C supplementation. It is now evident that the world needs a speedy solution to contain the further spread of COVID-19, drawing on non-clinical approaches such as data mining, augmented intelligence, and other artificial intelligence techniques, so as to mitigate the huge burden on the healthcare system while providing the best possible means for the diagnosis and prognosis of patients. In this study, data mining models were developed to predict the recovery of COVID-19 infected patients using an epidemiological dataset of COVID-19 patients in South Korea. The decision tree, support vector machine, naive Bayes, logistic regression, random forest, and K-nearest neighbor algorithms were applied directly to the dataset using the Python programming language to develop the models. The models predicted the minimum and maximum number of days for COVID-19 patients to recover from the virus, the age groups of patients at high risk of not recovering, those who are likely to recover, and those who are likely to recover quickly. The results show that the model developed with the decision tree algorithm is the most efficient at predicting the possibility of recovery of infected patients, with an overall accuracy of 99.85%, making it the best of the models developed, ahead of those built with the support vector machine, naive Bayes, logistic regression, random forest, and K-nearest neighbor algorithms. © Springer Nature Singapore Pte Ltd 2020.
Keywords: COVID-19; Coronavirus; Data mining; Decision tree; Pandemic; Patients’ recovery
Year: 2020 PMID: 33063049 PMCID: PMC7306186 DOI: 10.1007/s42979-020-00216-w
Source DB: PubMed Journal: SN Comput Sci ISSN: 2661-8907
Data type of each attribute
| S/N | Attribute | Data type |
|---|---|---|
| 1 | Gender | Object |
| 2 | Age | Object |
| 3 | Infection_case | Object |
| 4 | No_day | Int64 |
| 5 | State | Object |
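As a rough illustration of the typing in the table above, one row of the dataset could be held in a small Python record, with the four Object attributes kept as strings and No_day as an integer. The class name `PatientRecord` and the helpers `parse_row` and `age_decade` are hypothetical names introduced here for illustration; they are not taken from the study's code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PatientRecord:
    """One dataset row (hypothetical container, not from the paper)."""
    gender: str          # Object
    age: str             # Object, a decade label such as "50s"
    infection_case: str  # Object
    no_day: int          # Int64
    state: str           # Object


def parse_row(fields):
    """Build a record from five raw text fields given in table order."""
    gender, age, infection_case, no_day, state = fields
    return PatientRecord(gender, age, infection_case, int(no_day), state)


def age_decade(age_label):
    """Convert a decade label such as "50s" to the integer 50."""
    return int(age_label.rstrip("s"))


record = parse_row(["Male", "50s", "Overseas inflow", "13", "Released"])
print(record)
print(age_decade(record.age))
```

Keeping Age as a label preserves the dataset's own encoding; the numeric decade is only one possible derived feature and is an assumption of this sketch.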
Sample of the instances of the dataset
| S/N | Sex | Age | Infection_case | No_day | State |
|---|---|---|---|---|---|
| 1 | Male | 50s | Overseas inflow | 13 | Released |
| 2 | Male | 30s | Overseas inflow | 32 | Released |
| 3 | Male | 50s | Contact with patient | 20 | Released |
| 4 | Male | 20s | Overseas inflow | 16 | Released |
| 5 | Female | 20s | Contact with patient | 24 | Released |
| 6 | Female | 50s | Contact with patient | 19 | Released |
| 7 | Male | 20s | Contact with patient | 10 | Released |
| 8 | Male | 20s | Overseas inflow | 22 | Released |
| 9 | Male | 30s | Overseas inflow | 16 | Released |
| 10 | Female | 60s | Contact with patient | 24 | Released |
| 11 | Female | 50s | Overseas inflow | 23 | Released |
| 12 | Male | 20s | Overseas inflow | 20 | Released |
| 13 | Male | 80s | Contact with patient | 11 | Released |
| 14 | Female | 60s | Contact with patient | 25 | Released |
| 15 | Male | 70s | Contact with patient | 21 | Released |
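The per-attribute frequencies shown in the figures can be tallied with a simple counter. The sketch below does this only for the 15 sample instances listed above, so its counts describe the sample and not the full dataset behind the figures. The names `SAMPLE`, `COLUMNS`, and `attribute_frequencies` are hypothetical.

```python
from collections import Counter

# The 15 sample instances from the table above:
# (Sex, Age, Infection_case, No_day, State)
SAMPLE = [
    ("Male", "50s", "Overseas inflow", 13, "Released"),
    ("Male", "30s", "Overseas inflow", 32, "Released"),
    ("Male", "50s", "Contact with patient", 20, "Released"),
    ("Male", "20s", "Overseas inflow", 16, "Released"),
    ("Female", "20s", "Contact with patient", 24, "Released"),
    ("Female", "50s", "Contact with patient", 19, "Released"),
    ("Male", "20s", "Contact with patient", 10, "Released"),
    ("Male", "20s", "Overseas inflow", 22, "Released"),
    ("Male", "30s", "Overseas inflow", 16, "Released"),
    ("Female", "60s", "Contact with patient", 24, "Released"),
    ("Female", "50s", "Overseas inflow", 23, "Released"),
    ("Male", "20s", "Overseas inflow", 20, "Released"),
    ("Male", "80s", "Contact with patient", 11, "Released"),
    ("Female", "60s", "Contact with patient", 25, "Released"),
    ("Male", "70s", "Contact with patient", 21, "Released"),
]

COLUMNS = ("Sex", "Age", "Infection_case", "No_day", "State")


def attribute_frequencies(rows):
    """Return {attribute name: Counter of its values} for every column."""
    return {
        name: Counter(row[index] for row in rows)
        for index, name in enumerate(COLUMNS)
    }


freqs = attribute_frequencies(SAMPLE)
for name in ("Sex", "Age", "Infection_case", "State"):
    print(name, dict(freqs[name]))

# No_day is numeric, so its range is often more useful than raw counts.
days = [row[3] for row in SAMPLE]
print("No_day range:", min(days), "to", max(days))
```

In this sample every instance has the state Released, so a classifier would need the full dataset to see any other state.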
Fig. 1 Frequency of sex attribute
Fig. 2 Frequency of age attribute
Fig. 3 Frequency of infection_case attribute
Fig. 4 Frequency of no_days attribute
Fig. 5 Frequency of state attribute
Fig. 6 Decision Tree model for COVID-19 infected patients’ recovery
Performance evaluation of predictive data mining models
| S/N | Predictive data mining models | Accuracy (%) |
|---|---|---|
| 1 | Decision tree | 99.85 |
| 2 | Support vector machine | 98.85 |
| 3 | Naive Bayes | 97.52 |
| 4 | Logistic regression | 97.49 |
| 5 | Random forest | 99.60 |
| 6 | K-nearest neighbor | 98.06 |
Fig. 7 Performance evaluation results of the models
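To make the comparison in the performance table easier to read, the reported accuracies can be sorted and the gap between the two best models computed. This is a minimal sketch that only rearranges the published figures; it does not retrain or re-evaluate any model, and the names `ACCURACY` and `rank_models` are hypothetical.

```python
# Accuracy (%) per model, copied from the performance table above.
ACCURACY = {
    "Decision tree": 99.85,
    "Support vector machine": 98.85,
    "Naive Bayes": 97.52,
    "Logistic regression": 97.49,
    "Random forest": 99.60,
    "K-nearest neighbor": 98.06,
}


def rank_models(scores):
    """Return (model, accuracy) pairs sorted from highest to lowest."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


ranking = rank_models(ACCURACY)
best, runner_up = ranking[0], ranking[1]
# Round to two decimals to hide floating-point noise in the subtraction.
margin = round(best[1] - runner_up[1], 2)

for position, (model, accuracy) in enumerate(ranking, start=1):
    print(f"{position}. {model}: {accuracy:.2f}%")
print(f"{best[0]} leads {runner_up[0]} by {margin} percentage points")
```

Sorted this way, the decision tree's lead over the random forest is 0.25 percentage points, while logistic regression ranks last.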