Celestine Iwendi, Ali Kashif Bashir, Atharva Peshkar, R Sujatha, Jyotir Moy Chatterjee, Swetha Pasupuleti, Rishita Mishra, Sofia Pillai, Ohyun Jo.
Abstract
Demand is now high for integrating artificial intelligence (AI) techniques into wireless infrastructure and into the real-time collection and processing of data from end-user devices. It is now imperative to use AI to detect and predict large-scale pandemics. The Coronavirus disease 2019 (COVID-19) pandemic, which originated in Wuhan, China, has had disastrous effects on the global community and has overburdened advanced healthcare systems throughout the world. Globally, over 4,063,525 confirmed cases and 282,244 deaths had been recorded as of 11 May 2020, according to the European Centre for Disease Prevention and Control. The current rapid, exponential rise in the number of patients has necessitated efficient and quick prediction, using AI techniques, of the possible outcome of an infected patient so that appropriate treatment can be given. This paper proposes a fine-tuned Random Forest model boosted by the AdaBoost algorithm. The model uses the COVID-19 patient's geographical, travel, health, and demographic data to predict the severity of the case and the possible outcome: recovery or death. The model has an accuracy of 94% and an F1 score of 0.86 on the dataset used. The data analysis reveals a positive correlation between patients' gender and deaths, and also indicates that the majority of patients are aged between 20 and 70 years.
Keywords: COVID-19; boosting; healthcare analytics; infection; patient data; random forest classification
Year: 2020 PMID: 32719767 PMCID: PMC7350612 DOI: 10.3389/fpubh.2020.00357
Source DB: PubMed Journal: Front Public Health ISSN: 2296-2565
Dataset description.
| Column | Description | Values | Type |
| id | Patient ID | NA | Numeric |
| location | Location the patient belongs to | Multiple cities located throughout the world | String, Categorical |
| country | Patient's native country | Multiple countries | String, Categorical |
| gender | Patient's gender | Male, Female | String, Categorical |
| age | Patient's age | NA | Numeric |
| sym_on | Date the patient started noticing symptoms | NA | Date |
| hosp_vis | Date the patient visited the hospital | NA | Date |
| vis_wuhan | Whether the patient visited Wuhan, China | Yes (1), No (0) | Numeric, Categorical |
| from_wuhan | Whether the patient is from Wuhan, China | Yes (1), No (0) | Numeric, Categorical |
| death | Whether the patient passed away due to COVID-19 | Yes (1), No (0) | Numeric, Categorical |
| Recov | Whether the patient recovered | Yes (1), No (0) | Numeric, Categorical |
| symptom1, symptom2, symptom3, symptom4, symptom5, symptom6 | Symptoms noticed by the patient | Multiple symptoms noticed by the patients | String, Categorical |
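The columns above mix dates, binary flags, and free-text categories, so they need to be encoded numerically before a tree-based model can use them. The following is a minimal sketch of one possible encoding, not the authors' preprocessing: the sample record, the ISO date format, the `encode` helper, and the three-word symptom vocabulary are all assumptions made for illustration.

```python
from datetime import date

# Hypothetical record using the column names from the dataset description.
# The values and the ISO date format are assumptions, not data from the paper.
record = {
    "gender": "female",
    "age": 45,
    "sym_on": "2020-01-20",
    "hosp_vis": "2020-01-23",
    "vis_wuhan": 1,
    "from_wuhan": 0,
    "symptom1": "fever",
    "symptom2": "cough",
}


def encode(rec, symptom_vocab):
    """Turn one patient record into a flat list of numeric features."""
    onset = date.fromisoformat(rec["sym_on"])
    visit = date.fromisoformat(rec["hosp_vis"])
    delay_days = (visit - onset).days  # days from symptom onset to hospital visit
    gender = 1 if rec["gender"].lower() == "male" else 0
    # Collect symptom1..symptom6, skipping columns that are missing or empty.
    symptoms = {rec.get(f"symptom{i}") for i in range(1, 7)} - {None, ""}
    multi_hot = [1 if s in symptoms else 0 for s in symptom_vocab]
    return [rec["age"], gender, delay_days, rec["vis_wuhan"], rec["from_wuhan"]] + multi_hot


vocab = ["fever", "cough", "fatigue"]  # assumed symptom vocabulary
features = encode(record, vocab)
print(features)  # [45, 0, 3, 1, 0, 1, 1, 0]
```

Replacing the two date columns with a single onset-to-visit delay is one design choice among several; it keeps the feature numeric without tying the model to calendar dates.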
Figure 1. Symptoms in patients.
Figure 2. Correlation between data features.
Figure 3. Evaluation metrics for decision tree.
Figure 6. Evaluation metrics for Boosted Random Forest.
Figure 7. Decision tree.
Figure 8. Decision tree 1.
Figure 11. Decision tree 100.
Optimal hyperparameters returned by grid search.
| Hyperparameter | Value |
| n_estimators | 100 |
| max_depth | 2 |
| min_samples_leaf | 2 |
| min_samples_split | 2 |
| criterion | gini |
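As a rough illustration of how these values might be plugged into a Random Forest boosted by AdaBoost, the sketch below uses scikit-learn. It rests on several assumptions: that the listed hyperparameters belong to the Random Forest rather than to the booster, that synthetic data from `make_classification` is an acceptable stand-in for the patient dataset, and that five boosting rounds are enough for a demonstration. None of these details come from the source.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; the real patient dataset is not reproduced here.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Random Forest configured with the hyperparameters listed in the table.
forest = RandomForestClassifier(
    n_estimators=100,
    max_depth=2,
    min_samples_leaf=2,
    min_samples_split=2,
    criterion="gini",
    random_state=0,
)

# AdaBoost wraps the forest as its base estimator. The estimator is passed
# positionally because its keyword name differs across scikit-learn versions.
# n_estimators=5 boosting rounds is an arbitrary choice, not from the source.
model = AdaBoostClassifier(forest, n_estimators=5, random_state=0)
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
print(f"held-out accuracy on synthetic data: {accuracy:.2f}")
```

The accuracy printed here describes the synthetic data only and says nothing about the results reported for the real dataset.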
Evaluation results.
| Metric | Value |
| Recall score | 0.75 |
| Precision score | 1.0 |
| F1 score | 0.86 |
| Accuracy | 0.94 |
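The reported F1 score can be cross-checked against the precision and recall in the same table, since F1 is the harmonic mean of the two. The short sketch below does that. The confusion counts used to show where precision and recall come from are hypothetical, chosen only because they reproduce the reported ratios.

```python
def precision_recall(tp, fp, fn):
    """Precision and recall from true-positive, false-positive, false-negative counts."""
    return tp / (tp + fp), tp / (tp + fn)


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: no false positives, one positive in four missed.
precision, recall = precision_recall(tp=3, fp=0, fn=1)
f1 = f1_score(precision, recall)
print(precision, recall, round(f1, 2))  # 1.0 0.75 0.86
```

A precision of 1.0 with a recall of 0.75 means that every case the model flagged as positive was correct, while a quarter of the actual positive cases were missed.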
Figure 12. Comparison of models' performance.