Prediction of Length of Hospital Stay of COVID-19 Patients Using Gradient Boosting Decision Tree.

GholamReza Askari, Mohammad Hossein Rouhani, Mohammad Sattari.

Abstract

This paper aims to predict the length of hospital stay of patients with coronavirus disease 2019 (COVID-19). It applies several data mining techniques, such as random forest, to the dataset and derives many rules from them. The extracted rules were mainly related to people over 55 years old. The rule with the most support states that if the person is between 70 and 80 years old, has cardiovascular disease, and is female, then the person will be hospitalized for at least five days. The gradient boosting random forest technique performed better than the other techniques. A limitation of the study is that a few features were unavailable, as they had not been recorded. Patients with diabetes, chronic respiratory problems, and cardiovascular diseases have relatively long hospitalizations, so hospital managers should give these patients suitable priority. Older people were also more likely to appear in the selected rules.
Copyright © 2022 GholamReza Askari et al.

Year:  2022        PMID: 36160183      PMCID: PMC9507755          DOI: 10.1155/2022/6474883

Source DB:  PubMed          Journal:  Int J Biomater        ISSN: 1687-8787


1. Introduction

Coronavirus disease 2019 (COVID-19) has afflicted and killed many people in a large number of countries [1]. COVID-19 is an infectious disease caused by a new strain of coronavirus, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). Its devastating effects are still being seen worldwide, including in the cultural and social dimensions. The disease spread rapidly, disrupted people's ordinary lives, and prevented them from attending many gatherings. Masks and social distancing have been proposed as approaches to combat the disease. These approaches have led to dramatic changes in business conditions and have raised new technology issues [2, 3]. The clinical features of COVID-19 vary from asymptomatic infection to severe disease and death [4]. Many underlying disorders, including cardiovascular disease, chronic kidney disease, chronic respiratory disease, diabetes mellitus (DM), hypertension, and obesity, are reported as potential risk factors for severe COVID-19, which can lead to admission to the intensive care unit (ICU) and even death [5, 6]. COVID-19 is a multisymptom disease. The symptoms include fever, cough, fatigue, sputum production, diarrhea, and taste disturbances [7, 8]. Some patients also experience muscle pain, fatigue, and loss of taste or smell [9]. Prolonged hospitalization has a high cost for the individual and the health system. It causes a significant burden, especially for poor and low-income groups [10]. Prolonged hospitalization also put a lot of pressure on hospitals and medical staff, which made it challenging to manage ICU beds. Considering that the mortality rate of hospitalized patients varies from 5% to 25% [11], a system that can predict the patient hospitalization time would support an effective strategy for overcoming this issue.
In fact, by knowing the hospitalization time, managers can make appropriate decisions about the allocation of hospital beds. It will also help improve decisions about managing the disease. In recent years, data mining has been proposed as a way to reduce the workload of physicians and to provide suitable models for making better decisions. The primary purpose of this paper is to predict the hospitalization time of COVID-19 patients using data mining techniques.

2. Materials and Methods

This section consists of four parts: data collection, data preprocessing, modeling, and evaluation measures.

2.1. Data Collection

The database contains information on hospitalized COVID-19 patients. This information was available in the SIB system. The integrated health system (IHS) called “SIB” was launched in 2016 to act as an electronic health record (EHR) in the field of health. The system holds information on more than 35,000 COVID-19 patients. Unfortunately, more than 14,000 of these people died. As the records of deceased patients are unrelated to hospitalization time prediction, only 21,000 patient records were used. Records of patients whose COVID-19 result was negative were also ignored (n = 7,000), leaving approximately 14,000 patient records. Moreover, about 1,700 patients were registered in the ICU, and their records were excluded from the dataset. So, the final evaluation was applied to 12,300 patient records. Database attributes include the patient's age, gender, COVID-19 outcome, underlying diseases (cancer, chronic kidney disease, diabetes, cardiovascular disease, chronic neurological disease, AIDS, chronic blood disease, chronic liver disease, chronic respiratory disease, and hypertension), malnutrition, obesity, date of admission, sampling date, date of discharge, date of death, date of COVID-19 outcome, and pregnancy. The COVID-19 result takes four values: negative, positive, repeated samples, and re-sampling. The underlying disease features were binary.
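The exclusion steps described above can be followed as simple arithmetic. The sketch below only replays the rounded counts quoted in the text; the variable names are illustrative and the figures are approximate in the source.

```python
# Approximate counts quoted in the text (all figures are rounded in the source).
total_records = 35_000
exclusions = [
    ("died", 14_000),
    ("negative COVID-19 result", 7_000),
    ("registered in ICU", 1_700),
]

remaining = total_records
for reason, count in exclusions:
    remaining -= count
    print(f"after excluding patients who {reason}: {remaining}")
```

The last printed value corresponds to the 12,300 records used in the final evaluation.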

2.2. Data Preprocessing

The first step in data preprocessing was to select a subset of relevant features, because most of the database features are unrelated to the study. The list of influential factors was first determined using the opinion of cardiologists. As a result, 19 out of 45 features were selected as the most relevant features of the dataset, and the unrelated features were removed from the attribute set. The removed features include the sampling date, the date of the COVID-19 outcome, and the date of death. The date of the COVID-19 outcome is the date on which the result of the COVID-19 test became ready. Age was divided into 18 to 55, 56 to 64, 65 to 69, 70 to 79, and 80 years and above. The hospitalization time is the target attribute, which is calculated by subtracting the date of admission from the date of discharge. Hospitalization time was divided into intervals of less than 24 hours [12], one to three days, four to five days, six to eight days, nine to ten days, and more than ten days. Some records lack a discharge date. As the hospitalization time (target attribute) is calculated from the discharge date, records without a discharge date were removed.
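The discretization described above could be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, dates are assumed to be stored at whole-day resolution (so "less than 24 hours" is treated as a difference of zero days), and the "<18" band is taken from Table 2 rather than from this paragraph.

```python
from datetime import date

def age_group(age):
    """Map an age in years to one of the age bands used in the paper."""
    if age < 18:
        return "<18"
    if age <= 55:
        return "18-55"
    if age <= 64:
        return "56-64"
    if age <= 69:
        return "65-69"
    if age <= 79:
        return "70-79"
    return ">=80"

def stay_class(admission, discharge):
    """Return the length-of-stay class, or None when the discharge date is missing."""
    if discharge is None:
        return None  # records without a discharge date are dropped
    days = (discharge - admission).days  # discharge date minus admission date
    if days < 1:
        return "<1 day"
    if days <= 3:
        return "1-3 days"
    if days <= 5:
        return "4-5 days"
    if days <= 8:
        return "6-8 days"
    if days <= 10:
        return "9-10 days"
    return ">10 days"

print(age_group(72), stay_class(date(2021, 3, 1), date(2021, 3, 6)))
```

Returning None for a missing discharge date makes it easy to filter such records out before modeling.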

2.3. Modeling

Patients may have different hospitalization times, and the number of patients in each class will differ. For example, the class of one day has 1000 patients, and the class of more than 10 days has 40 patients. This difference will cause an imbalance in the number of patients in each class. So, the first step in modeling is using techniques to eliminate the imbalance in the dataset.

2.3.1. Modeling-Imbalance in the Dataset

There are several ways to resolve the imbalance. One way is data balancing, which has two methods. The first is random majority under-sampling, which balances the class distribution by randomly deleting majority-class instances. The second is random minority oversampling (ROS), which adds randomly selected instances of the minority class (with replacement) to the original dataset.
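Both balancing methods can be illustrated with a short standard-library sketch. The function names, the toy class sizes, and the choice to balance every class to the largest (or smallest) class are assumptions made for illustration; the paper does not state the target class sizes it used.

```python
import random
from collections import Counter, defaultdict

def _group_by_class(records, labels):
    groups = defaultdict(list)
    for record, label in zip(records, labels):
        groups[label].append(record)
    return groups

def random_oversample(records, labels, seed=0):
    """Add randomly drawn copies (with replacement) until each class matches the largest one."""
    rng = random.Random(seed)
    groups = _group_by_class(records, labels)
    target = max(len(g) for g in groups.values())
    out = []
    for label, group in groups.items():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        out.extend((record, label) for record in group + extra)
    return out

def random_undersample(records, labels, seed=0):
    """Randomly keep only as many records per class as the smallest class has."""
    rng = random.Random(seed)
    groups = _group_by_class(records, labels)
    target = min(len(g) for g in groups.values())
    out = []
    for label, group in groups.items():
        out.extend((record, label) for record in rng.sample(group, target))
    return out

# Toy data: 10 records in a majority class and 2 in a minority class.
records = list(range(12))
labels = ["1-3 days"] * 10 + [">10 days"] * 2
over_counts = Counter(label for _, label in random_oversample(records, labels))
under_counts = Counter(label for _, label in random_undersample(records, labels))
print(over_counts, under_counts)
```

Oversampling keeps every original record but repeats minority records, while under-sampling discards majority records, so the choice trades overfitting risk against information loss.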

2.3.2. Modeling Techniques

The techniques used in this section are gradient boosting random forest, ID3, and random forest. Random forest [13] builds several trees at random and makes its decision by voting among them. ID3 is a fuzzy decision-tree-based technique that builds a tree of minimal depth, in which each feature appears only once along the tree growth path. Unlike random forest, ID3 is a deterministic technique. The proposed method is the gradient boosting technique, which aims to create a robust final model from a series of weak models.
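The idea of building a strong model from a series of weak models can be sketched with a tiny gradient boosting loop. This is a simplified illustration and not the authors' implementation: it boosts one-split regression stumps under squared loss on a numeric length of stay, whereas the paper classifies stay intervals. The toy data, function names, number of rounds, and learning rate are all assumptions.

```python
def fit_stump(X, residuals):
    """Return (feature, threshold, left_value, right_value) minimizing squared error."""
    mean = sum(residuals) / len(residuals)
    best_err = sum((r - mean) ** 2 for r in residuals)
    best = (0, float("inf"), mean, mean)  # fallback: no split, predict the mean
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            if not right:
                continue  # a threshold at the maximum value does not split the data
            lval = sum(left) / len(left)
            rval = sum(right) / len(right)
            err = sum((r - lval) ** 2 for r in left) + sum((r - rval) ** 2 for r in right)
            if err < best_err:
                best_err, best = err, (j, t, lval, rval)
    return best

def stump_predict(stump, row):
    j, t, lval, rval = stump
    return lval if row[j] <= t else rval

def fit_gradient_boosting(X, y, n_rounds=20, learning_rate=0.5):
    """Each round fits a weak stump to the current residuals and adds it to the model."""
    base = sum(y) / len(y)
    preds = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, preds)]  # negative gradient of squared loss
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump_predict(stump, row) for p, row in zip(preds, X)]
    return base, stumps, learning_rate

def predict(model, row):
    base, stumps, learning_rate = model
    return base + learning_rate * sum(stump_predict(s, row) for s in stumps)

def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical toy data: [age, has_underlying_disease] -> days in hospital.
X = [[30, 0], [45, 0], [50, 1], [60, 1], [72, 1], [85, 1]]
y = [2.0, 2.0, 5.0, 6.0, 7.0, 8.0]
model = fit_gradient_boosting(X, y)
baseline = [sum(y) / len(y)] * len(y)
boosted = [predict(model, row) for row in X]
print(mse(y, baseline), mse(y, boosted))
```

Each stump alone is a poor predictor, but because every round corrects the residual error left by the previous rounds, the combined model fits the training data much better than the constant baseline.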

2.4. Evaluation Measures

The evaluation used different measures. One of these measures was accuracy [14], the proportion of records whose class is predicted correctly. The closer the accuracy is to 1, the better the performance of the method. Another criterion is confidence [15]; the closer it is to one, the better the performance of the method.
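As an illustration of these measures, the sketch below computes accuracy as the share of correct predictions, and computes the support and confidence of an if-then rule in the usual association-rule sense. The paper does not spell out its definitions, so treating support as the number of records matching both the condition and the outcome, and confidence as that count divided by the number of records matching the condition, is an assumption. The records and function names are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Share of records whose predicted class equals the true class."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def rule_support_confidence(records, antecedent, consequent):
    """Support: records matching 'if' and 'then'. Confidence: support / records matching 'if'."""
    matches_if = [r for r in records if antecedent(r)]
    support = sum(1 for r in matches_if if consequent(r))
    confidence = support / len(matches_if) if matches_if else 0.0
    return support, confidence

# Hypothetical toy records.
records = [
    {"age_group": "70-79", "gender": "female", "stay_days": 6},
    {"age_group": "70-79", "gender": "female", "stay_days": 7},
    {"age_group": "70-79", "gender": "female", "stay_days": 2},
    {"age_group": "18-55", "gender": "male", "stay_days": 1},
]
support, confidence = rule_support_confidence(
    records,
    antecedent=lambda r: r["age_group"] == "70-79" and r["gender"] == "female",
    consequent=lambda r: r["stay_days"] >= 5,
)
acc = accuracy(["a", "b", "a", "c"], ["a", "b", "c", "c"])
print(acc, support, round(confidence, 2))
```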

3. Findings

According to Table 1, the number of COVID-19 patients with diabetes, the most common underlying disease, was 1485. Cardiovascular disease is the second most common underlying disease among COVID-19 patients. The third most common is hypertension, which affects 1,201 COVID-19 patients. Moreover, about 200 COVID-19 patients also had cancer. AIDS and splenectomy were the least common underlying conditions. Table 2 lists the values of gender, age, discharge, COVID-19 result, length of hospital stay, and pregnancy.
Table 1

The number of COVID-19 patients with each underlying disease.

Underlying disease | Number of patients
Cancer | 184
Cardiovascular disease | 1105
Chronic liver disease | 73
Chronic neurological disease | 287
Chronic respiratory disease | 478
Chronic kidney disease | 326
Diabetes | 1094
AIDS | 4
Hypertension | 1098
Obesity | 65
Chronic blood disease | 68
Other immunodeficiency diseases | 54
Splenectomy | 4
Table 2

List of the other attributes and their values.

Attribute | Values
Pregnancy | Yes; No
Discharge | Outpatient; Hospitalization
Age | <18; 18–55; 56–64; 65–69; 70–79; ≥80
Gender | Female; Male
COVID-19 result | Negative; Positive; Positive again; Need for re-sampling
Length of hospital stay | <1 day; 1–3 days; 4–5 days; 6–8 days; 9–10 days; >10 days
As shown in Table 3, the gradient boosting random forest method performed better than the other techniques. It classifies more than 73% of cases correctly and improves on the accuracy of the random forest by about 3.4 percentage points.
Table 3

The accuracy of different data mining techniques.

Technique | Accuracy (%)
ID3 | 72.28
Random forest | 70.13
Gradient boosting random forest | 73.51
By applying the different methods, 34 rules were derived. The proposed method chose the rules with a support of more than 200. So, 9 out of the 34 rules were selected, which are shown in Table 4.
Table 4

Selected extracted rules for predicting the hospitalization time of COVID-19 patients.

Extracted selected rule | Support | Confidence (%)
If the person is between 18 and 55 years old, has cancer, the gender is male, and the COVID-19 result is positive; then, the person will be hospitalized for between 1 and 3 days | 285 | 72
If the person is between 70 and 80 years old, has a cardiovascular disease, and the gender is female; then, the person will be hospitalized for at least 5 days | 2570 | 76
If the person is between 55 and 64 years old, has a chronic kidney disease, and the gender is male; then, the person will be hospitalized for between 1 and 5 days | 874 | 83
If the person is between 18 and 54 years old and has a chronic liver disease; then, the person will be hospitalized for 4 to 5 days | 547 | 70
If the person is over 65 years old, has a chronic neurological disease, and the gender is female; then, the person will be hospitalized for between 1 and 5 days | 475 | 75
If the person is between 65 and 70 years old, has a chronic respiratory disease, and the gender is male; then, the person will be hospitalized for between 1 and 8 days | 319 | 68
If the person is between 18 and 54 years old and has diabetes; then, the person will be hospitalized for between 5 and 10 days | 727 | 81
If the person is between 55 and 64 years old, has diabetes, and the gender is male; then, the person will be hospitalized for more than 8 days | 673 | 71
If the person is a woman, over 80 years old, and has hypertension; then, the person will be hospitalized for less than five days | 823 | 69
The rule with the most support states that if the person is between 70 and 80 years old, has cardiovascular disease, and is female, then the person will be hospitalized for at least five days. The rule with the most confidence states that if the person is between 55 and 64 years old, has chronic kidney disease, and is male, then the person will be hospitalized for between 1 and 5 days.
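The selection by support and the two "top" rules can be checked with a few lines of code. The values below are read from the Support and Confidence columns of Table 4, taking the last two digits of each entry as the confidence percentage, which is an interpretation of the table. The short rule labels are abbreviations introduced here for illustration.

```python
# (abbreviated rule, support, confidence in %) as read from Table 4.
rules = [
    ("18-55, cancer, male, positive -> 1-3 days", 285, 72),
    ("70-80, cardiovascular disease, female -> at least 5 days", 2570, 76),
    ("55-64, chronic kidney disease, male -> 1-5 days", 874, 83),
    ("18-54, chronic liver disease -> 4-5 days", 547, 70),
    (">65, chronic neurological disease, female -> 1-5 days", 475, 75),
    ("65-70, chronic respiratory disease, male -> 1-8 days", 319, 68),
    ("18-54, diabetes -> 5-10 days", 727, 81),
    ("55-64, diabetes, male -> more than 8 days", 673, 71),
    (">80, hypertension, female -> less than 5 days", 823, 69),
]

MIN_SUPPORT = 200  # selection threshold stated in the text
selected = [rule for rule in rules if rule[1] > MIN_SUPPORT]
most_support = max(selected, key=lambda rule: rule[1])
most_confident = max(selected, key=lambda rule: rule[2])

print(len(selected))
print("most support:", most_support)
print("most confidence:", most_confident)
```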

4. Discussion

This paper predicted the hospitalization time of COVID-19 patients using various data mining techniques. Many rules were derived by applying these techniques to the dataset. The extracted rules were mainly related to people over 55 years old. Patients with diabetes, chronic respiratory disease, and cardiovascular disease have relatively long hospitalizations. The rule with the most support states that if the person is between 70 and 80 years old, has cardiovascular disease, and is female, then the person will be hospitalized for at least five days. Older patients are more prone to COVID-19, which affects their hospitalization time [13]. Cardiovascular patients were referred to hospitals less often than before, so their number of hospitalizations decreased during the COVID-19 pandemic. One rule states with 83% confidence that if the person is between 55 and 64 years old, has chronic kidney disease, and is male, then the person will be hospitalized for between 1 and 5 days. Unlike people with heart disease, people with kidney disease made more visits to medical centers during the COVID-19 period than before. Studies have also shown that kidney patients are more likely to develop COVID-19 infection because of a weakened immune system [16]. Another rule states with 71% confidence that if the person is between 55 and 64 years old, has diabetes, and is male, then the person will be hospitalized for more than 8 days. A further rule states with 81% confidence that if the person is between 18 and 54 years old and has diabetes, then the person will be hospitalized for between 5 and 10 days. Examining these two rules and comparing them with the other rules makes the role of diabetes as an underlying disease evident: this disease accounts for the most hospitalization time.
One interesting rule states with 72% confidence that if the person is between 18 and 55 years old, has cancer, and is male, then the hospitalization time will be between 1 and 3 days. The hospitalization time of cancer patients between 18 and 55 years old is shorter than that of other patients. During the COVID-19 period, the number of cancer patients decreased compared with before this period [17]. The gradient boosting random forest technique performed better than the other techniques. A limitation of the study is that a few features were unavailable, as they had not been recorded.

5. Conclusion

This paper aimed to predict the hospitalization time of COVID-19 patients using decision tree-based techniques. The output of the article was in the form of rules. Patients with diseases such as diabetes, chronic respiratory disease, and cardiovascular disease had longer hospital stays than patients with other diseases. So, hospital managers should give these patients suitable priority. Older people were also more likely to appear in the selected rules.
  13 in total

1.  Clinical characteristics of 276 hospitalized patients with coronavirus disease 2019 in Zengdu District, Hubei Province: a single-center descriptive study.

Authors:  Yiping Wei; Weibiao Zeng; Xiangyun Huang; Junyu Li; Xingting Qiu; Huadong Li; Dinghua Liu; Zhaofeng He; Wenzhong Yao; Ping Huang; Chao Li; Min Zhu; Chunlan Zhong; Xingen Zhu; Jiansheng Liu
Journal:  BMC Infect Dis       Date:  2020-07-29       Impact factor: 3.090

2.  A Review of Coronavirus Disease-2019 (COVID-19).

Authors:  Tanu Singhal
Journal:  Indian J Pediatr       Date:  2020-03-13       Impact factor: 1.967

3.  Fewer Hospitalizations for Acute Cardiovascular Conditions During the COVID-19 Pandemic.

Authors:  Ankeet S Bhatt; Alea Moscone; Erin E McElrath; Anubodh S Varshney; Brian L Claggett; Deepak L Bhatt; James L Januzzi; Javed Butler; Dale S Adler; Scott D Solomon; Muthiah Vaduganathan
Journal:  J Am Coll Cardiol       Date:  2020-05-26       Impact factor: 24.094

4.  Clinical characteristics of fatal and recovered cases of coronavirus disease 2019 in Wuhan, China: a retrospective study.

Authors:  Yan Deng; Wei Liu; Kui Liu; Yuan-Yuan Fang; Jin Shang; Ling Zhou; Ke Wang; Fan Leng; Shuang Wei; Lei Chen; Hui-Guo Liu
Journal:  Chin Med J (Engl)       Date:  2020-06-05       Impact factor: 2.628

5.  Has the chief complaint of patients with COVID-19 disease changed over time?

Authors:  Mohammad Dehghani Firouzabadi; Fatemeh Dehghani Firouzabadi; Sogand Goudarzi; Hesam Jahandideh; Maryam Roomiani
Journal:  Med Hypotheses       Date:  2020-06-07       Impact factor: 1.538

6.  Risk Factors of Fatal Outcome in Hospitalized Subjects With Coronavirus Disease 2019 From a Nationwide Analysis in China.

Authors:  Ruchong Chen; Wenhua Liang; Mei Jiang; Weijie Guan; Chen Zhan; Tao Wang; Chunli Tang; Ling Sang; Jiaxing Liu; Zhengyi Ni; Yu Hu; Lei Liu; Hong Shan; Chunliang Lei; Yixiang Peng; Li Wei; Yong Liu; Yahua Hu; Peng Peng; Jianming Wang; Jiyang Liu; Zhong Chen; Gang Li; Zhijian Zheng; Shaoqin Qiu; Jie Luo; Changjiang Ye; Shaoyong Zhu; Xiaoqing Liu; Linling Cheng; Feng Ye; Jinping Zheng; Nuofu Zhang; Yimin Li; Jianxing He; Shiyue Li; Nanshan Zhong
Journal:  Chest       Date:  2020-04-15       Impact factor: 9.410

7.  COVID-19 Infection in a Patient with End-Stage Kidney Disease.

Authors:  Dian Fu; Bo Yang; Jing Xu; Zhiguo Mao; Chenchen Zhou; Cheng Xue
Journal:  Nephron       Date:  2020-03-27       Impact factor: 2.847

8.  Locked-down digital work.

Authors:  Alexander Richter
Journal:  Int J Inf Manage       Date:  2020-06-01

9.  Hospital admission rates, length of stay, and in-hospital mortality for common acute care conditions in COVID-19 vs. pre-COVID-19 era.

Authors:  A A Butt; A B Kartha; N A Masoodi; A M Azad; N A Asaad; M U Alhomsi; H A H Saleh; R Bertollini; A-B Abou-Samra
Journal:  Public Health       Date:  2020-09-21       Impact factor: 2.427

10.  Characteristics of and Important Lessons From the Coronavirus Disease 2019 (COVID-19) Outbreak in China: Summary of a Report of 72 314 Cases From the Chinese Center for Disease Control and Prevention.

Authors:  Zunyou Wu; Jennifer M McGoogan
Journal:  JAMA       Date:  2020-04-07       Impact factor: 56.272
