| Literature DB >> 34900191 |
Abstract
An improved nonlinear weighted extreme gradient boosting (XGBoost) technique is developed to forecast length of stay for patients with imbalanced data. The algorithm first chooses an effective technique for fitting the length of stay and determining its distribution law, and then optimizes the negative log-likelihood loss function using a heuristic nonlinear weighting method based on sample percentage. Theoretical and experimental results reveal that, compared with existing algorithms, the nonlinearly weighted XGBoost method achieves higher classification accuracy and better prediction performance, which helps treat more patients with fewer hospital beds.
Year: 2021 PMID: 34900191 PMCID: PMC8654524 DOI: 10.1155/2021/4714898
Source DB: PubMed Journal: J Healthc Eng ISSN: 2040-2295 Impact factor: 2.682
Figure 1. Framework of the XGBoost model.
Pseudocode for the nonlinear weighted XGBoost algorithm.
| Model: nonlinear weighted XGBoost algorithm |
|---|
| Input: high-dimensional patient medical data |
| Output: hospital length of stay |
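The core idea of the listing, reweighting the negative log-likelihood so that rare length-of-stay classes count more, can be illustrated with a short sketch. This is not the paper's implementation: the power exponent `gamma`, the function names, and the normalization scheme are all assumptions; the sketch only shows how per-class weights derived from a nonlinear function of each class's sample percentage can rescale the loss.

```python
import math
from collections import Counter

def nonlinear_class_weights(labels, gamma=0.5):
    """Per-class weights that grow nonlinearly as a class's sample
    percentage shrinks: w_c ∝ (n / n_c) ** gamma.
    gamma is a hypothetical tuning exponent, not a value from the paper."""
    n = len(labels)
    counts = Counter(labels)
    raw = {c: (n / cnt) ** gamma for c, cnt in counts.items()}
    # Normalize so the weights average to 1 over the whole dataset,
    # keeping the overall loss scale comparable to the unweighted case.
    total = sum(raw[c] * counts[c] for c in counts)
    scale = n / total
    return {c: w * scale for c, w in raw.items()}

def weighted_nll(true_class_probs, labels, weights):
    """Weighted negative log-likelihood: each sample contributes
    -w_y * log p(y), so minority-class mistakes are penalized more.
    true_class_probs[i] is the predicted probability of sample i's true class."""
    return sum(-weights[y] * math.log(p)
               for p, y in zip(true_class_probs, labels)) / len(labels)
```

With a 90/10 class split and `gamma=0.5`, the minority class receives roughly three times the majority-class weight before normalization, which is the imbalance-correcting effect the abstract describes.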
Parameter settings for the different classifiers.
| Classifier | Predefined parameter settings |
|---|---|
| The proposed method | Number of trees: [10, 30, 50, 100]; maximum tree depth: [3, 5, 8, 10]; minimum leaf node weight sum: [1, 3, 6, 9]; learning rate: [0.05, 0.1, 0.15, 0.2] |
| Naive Bayes | Weight control parameters: [0.5, 1, 1.5, 2, 2.5] |
| XGBoost | Default parameters; smoothing parameters |
| SVM | Kernel function: RBF; penalty coefficient: [0.01, 0.1, 1, 10]; kernel parameter: [0.01, 0.001, 0.0001] |
| KNN | Number of nearest neighbors: [3, 5, 8, 10]; maximum number of leaves: [5, 8, 10, 30] |
| Decision tree | Number of trees: [10, 30, 50, 100]; maximum depth: [3, 5, 8, 10]; learning rate: [0.05, 0.1, 0.15, 0.2] |
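The value grids above are small enough to search exhaustively. The sketch below enumerates every combination for the proposed method's four grids; the XGBoost-style parameter names (`n_estimators`, `max_depth`, `min_child_weight`, `learning_rate`) are assumed mappings, since the table lists only value ranges, not the exact parameter identifiers used.

```python
from itertools import product

# Candidate grids for the proposed method, as listed in the table above.
# The dictionary keys are assumed XGBoost-style names.
grid = {
    "n_estimators": [10, 30, 50, 100],
    "max_depth": [3, 5, 8, 10],
    "min_child_weight": [1, 3, 6, 9],
    "learning_rate": [0.05, 0.1, 0.15, 0.2],
}

def grid_configs(grid):
    """Yield every parameter combination as a dict (Cartesian product)."""
    keys = list(grid)
    for values in product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))

configs = list(grid_configs(grid))  # 4 * 4 * 4 * 4 = 256 candidate settings
```

Each of the 256 configurations would then be scored, typically by cross-validation, and the best-scoring setting retained.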
Distribution of length of stay by gender.
| | Number of patients | Mean | Median | Standard deviation | Kurtosis | Skewness |
|---|---|---|---|---|---|---|
| Total | 1986 | 9.10 | 8 | 14.25 | 9.55 | 2.289 |
| Male | 1028 | 9.69 | 8 | 14.68 | 7.18 | 2.158 |
| Female | 958 | 8.26 | 9 | 13.01 | 15.67 | 2.731 |
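The kurtosis and skewness columns can be reproduced from raw stay durations with standard moment formulas. A minimal sketch, assuming moment-based skewness (m3 / m2^1.5) and excess kurtosis (m4 / m2² − 3); whether the paper reports excess or raw kurtosis is not stated, and the function name is illustrative.

```python
def moment_stats(xs):
    """Mean, standard deviation, skewness, and excess kurtosis
    from population central moments."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n  # variance
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    skew = m3 / m2 ** 1.5          # > 0 for a right-skewed stay distribution
    excess_kurt = m4 / m2 ** 2 - 3 # heavy tails give large positive values
    return mean, m2 ** 0.5, skew, excess_kurt
```

The large positive skewness and kurtosis in the table are consistent with a right-skewed, heavy-tailed length-of-stay distribution: most patients are discharged quickly, while a few stay much longer.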
Figure 2. Distribution of the length of stay of patients.
Quantitative prediction indicators of different algorithms.
| Model | ACC | RMSE | F1-score | Kappa coefficient |
|---|---|---|---|---|
| Naive Bayes | 0.7912 | 2.211 | 0.725 | 0.8017 |
| Decision tree | 0.8247 | 1.885 | 0.829 | 0.8182 |
| SVM | 0.8661 | 1.592 | 0.865 | 0.8611 |
| KNN | 0.8117 | 1.721 | 0.773 | 0.8482 |
| XGBoost | 0.7958 | 1.807 | 0.782 | 0.8251 |
| The proposed model | 0.8211 | 1.523 | 0.851 | 0.8622 |
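Of the four reported indicators, the kappa coefficient is the least routine to compute by hand. A minimal sketch of Cohen's kappa (observed agreement corrected for chance agreement), assuming the paper uses the standard multi-class definition; the function name is illustrative.

```python
def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e), where p_o is observed
    agreement and p_e is the agreement expected by chance from the
    marginal label frequencies."""
    n = len(y_true)
    labels = sorted(set(y_true) | set(y_pred))
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    p_e = sum((y_true.count(c) / n) * (y_pred.count(c) / n) for c in labels)
    return (p_o - p_e) / (1 - p_e)
```

Values near 1 indicate agreement far above chance; the proposed model's 0.8622 in the table is the highest kappa among the compared classifiers.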
Figure 3. Comparison of ROC curves for different models.
Figure 4. Comparison of PR curves for different models.
Figure 5. Comparison of learning curves for different models.