Sooyoung Yoo, Jinwook Choi, Borim Ryu, Seok Kim.
Abstract
Although several studies have attempted to develop models for predicting 30-day rehospitalization, few have been sufficiently validated or extended to multiple centers for clinical use. In this study, we developed a model that predicts unplanned hospital readmission within 30 days of discharge. The model is based on a common data model, considers weather and air quality factors, and can be easily extended to multiple hospitals. We developed and compared four tree-based machine learning methods: decision tree, random forest, AdaBoost, and gradient boosting machine (GBM). GBM showed the highest AUC, 0.751, in the clinical model, while the clinical and W-score model performed best for musculoskeletal diseases, with an AUC of 0.739. PM10, rainfall, and maximum temperature were the weather and air quality variables with the greatest impact on the model. External validation confirmed that the model based on weather and air quality factors is transportable to other hospital systems.
Year: 2021 PMID: 34857799 PMCID: PMC8639801 DOI: 10.1038/s41598-021-02395-9
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Basic characteristics of study data for each visit type.
| Characteristics | Category | Derived cohort: readmitted (N = 5794) | Derived cohort: non-readmitted (N = 56,128) | P value |
|---|---|---|---|---|
| Age, year, mean (SD) | | 75.2 (6.8) | 74.7 (6.7) | 0.000 |
| Gender | Male, n (%) | 54.8 | 49.7 | |
| | Female, n (%) | 45.2 | 50.3 | |
| Age during hospital visit | 60s | 23.8 | 26.0 | |
| | 70s | 49.7 | 50.2 | |
| | 80s | 23.7 | 21.4 | |
| | 90s | 2.7 | 2.4 | |
| Season during admission | Spring | 25.4 | 24.1 | 0.049 |
| | Summer | 25.5 | 26.8 | |
| | Fall | 24.4 | 24.8 | |
| | Winter | 24.6 | 24.3 | |
| Average length of stay, mean (SD) | | 2.5 (4.3) | 0.2 (0.4) | |
| Charlson comorbidity index, mean | | 1.11 | 0.52 | |
Number of visits in each disease group and outcome incidence rate in our research cohorts.
| Disease groups | Train/test population (internal): target size (N) | Train/test population (internal): % incidence | Valid population (external): target size (N) | Valid population (external): % incidence |
|---|---|---|---|---|
| Diseases of the circulatory system (I00–I99) | 9357 | 14.0 | 87,063 | 10.3 |
| Mental and behavioral disorders (F00–F99) | 3174 | 16.3 | 7228 | 17.5 |
| Diseases of the musculoskeletal system and connective tissue (M00–M99) | 13,564 | 11.8 | 41,015 | 11.7 |
| Diseases of the respiratory system (J00–J99) | 10,310 | 15.7 | 87,604 | 15.1 |
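To give a sense of scale, the target sizes and incidence rates in the table above can be converted into approximate readmission counts. The sketch below is only illustrative arithmetic: the dictionary keys and the helper `approx_events` are hypothetical names, and the counts are approximate because the published percentages are rounded to one decimal place.

```python
# (target size N, % incidence) copied from the table above.
# Keys and helper names are illustrative, not from the study.
cohorts = {
    "internal": {
        "circulatory": (9357, 14.0),
        "mental_behavioral": (3174, 16.3),
        "musculoskeletal": (13564, 11.8),
        "respiratory": (10310, 15.7),
    },
    "external": {
        "circulatory": (87063, 10.3),
        "mental_behavioral": (7228, 17.5),
        "musculoskeletal": (41015, 11.7),
        "respiratory": (87604, 15.1),
    },
}


def approx_events(n, pct):
    """Approximate number of outcome visits implied by N and a rounded percentage."""
    return round(n * pct / 100)


approx_counts = {
    cohort: {group: approx_events(n, pct) for group, (n, pct) in groups.items()}
    for cohort, groups in cohorts.items()
}

for cohort, groups in approx_counts.items():
    for group, events in groups.items():
        print(f"{cohort:8s} {group:18s} ~{events} readmissions")
```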
Comparison of disease-specific performance in each model based on the area under the receiver operating characteristic curve.
| Disease groups | Prediction models | Internal validation: clinical covariates only | Internal validation: clinical covariates and W-scores | External validation: clinical covariates only | External validation: clinical covariates and W-scores |
|---|---|---|---|---|---|
| Diseases of the circulatory system | DT | 0.653 | 0.674 | 0.664 | 0.679 |
| | RF | 0.693 | 0.686 | 0.688 | 0.681 |
| | ADA | 0.698 | 0.708 | 0.672 | 0.670 |
| | GBM | 0.726^a | 0.717^a | 0.704^a | 0.696^a |
| Mental and behavioral disorders | DT | 0.612 | 0.691 | 0.706 | 0.737 |
| | RF | 0.703 | 0.692^a | 0.743 | 0.686^a |
| | ADA | 0.716 | 0.654 | 0.747 | 0.728 |
| | GBM | 0.747^a | 0.676 | 0.751^a | 0.727 |
| Diseases of the musculoskeletal system and connective tissue | DT | 0.680 | 0.690^b | 0.856 | 0.889^b |
| | RF | 0.719 | 0.734 | 0.909 | 0.882 |
| | ADA | 0.726^b | 0.739^a | 0.917^b | 0.915^a |
| | GBM | 0.751^a | 0.725 | 0.883^a | 0.900 |
| Diseases of the respiratory system | DT | 0.634 | 0.607 | 0.651 | 0.622 |
| | RF | 0.653 | 0.643 | 0.658 | 0.638 |
| | ADA | 0.663 | 0.667 | 0.639 | 0.655 |
| | GBM | 0.672^a | 0.675^a | 0.669^a | 0.667^a |

^a Best performances for each disease.
^b Major improvements in external validation.
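For readers less familiar with the metric reported above, the area under the ROC curve can be read as the probability that a randomly chosen readmitted visit receives a higher predicted risk than a randomly chosen non-readmitted visit. The following is a minimal pure-Python sketch of that pairwise definition. The function name `roc_auc` and the toy labels and scores are made up for illustration and are not study data.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    This O(n_pos * n_neg) loop is meant for illustration, not for large cohorts.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


# Toy example (hypothetical values): 1 = readmitted within 30 days, 0 = not.
toy_labels = [1, 0, 1, 0, 0, 1, 0, 0]
toy_scores = [0.81, 0.35, 0.40, 0.52, 0.10, 0.66, 0.40, 0.22]
toy_auc = roc_auc(toy_labels, toy_scores)
print(f"toy AUC = {toy_auc:.3f}")
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is the scale on which the tabulated values should be read.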
Weather and air quality predictors in W-score.
| Disease groups | covariateName | covariateValue | CovariateMeanWithOutcome | CovariateMeanWithNoOutcome |
|---|---|---|---|---|
| Diseases of the circulatory system | PM10 | 0.0016 | 12.59 | 13.13 |
| | Rainfall | 0.0011 | 2.22 | 2.32 |
| | Humidity | 0.0005 | 0.29 | 0.28 |
| | Min Temperature | 0.0006 | 0.65 | 0.59 |
| | Max Temperature | 0.0005 | 0.85 | 0.83 |
| Mental and behavioral disorders | PM10 | 0.0016 | 12.36 | 13.24 |
| | Rainfall | 0.0012 | 1.95 | 2.30 |
| | Humidity | 0.0014 | 0.41 | 0.31 |
| | Min Temperature | 0.0008 | 0.67 | 0.58 |
| | Max Temperature | 0.0004 | 0.71 | 0.81 |
| Diseases of the musculoskeletal system and connective tissue | PM10 | 0.0015 | 12.85 | 13.18 |
| | Rainfall | 0.0012 | 2.24 | 2.32 |
| | Humidity | 0.0008 | 0.31 | 0.30 |
| | Min Temperature | 0.0007 | 0.63 | 0.55 |
| | Max Temperature | 0.0007 | 0.86 | 0.84 |
| Diseases of the respiratory system | PM10 | 0.0038 | 12.86 | 13.01 |
| | Rainfall | 0.0032 | 2.30 | 2.29 |
| | Humidity | 0.0005 | 0.31 | 0.29 |
| | Min Temperature | 0.0012 | 0.60 | 0.54 |
| | Max Temperature | 0.0036 | 1.01 | 0.97 |
Summary of parameter values in each model.
| Models | Parameters | Values | Parameter meaning |
|---|---|---|---|
| DT | classWeight | “Balance” or “None” | |
| | maxDepth | 10 | The maximum depth of the tree |
| | minImpuritySplit | 10⁻⁷ | Threshold for early stopping in tree growth. A node splits if its impurity is above the threshold; otherwise it is a leaf |
| | minSamplesLeaf | 10 | The minimum number of samples per leaf |
| | minSamplesSplit | 2 | The minimum number of samples per split |
| RF | Max depth | 4, 10, 17 | Max levels in a tree |
| | mtries | −1 (square root of total features), 5, 20 | Number of features in each tree |
| | ntrees | 500 | Number of trees |
| ADA | Learning rate | 1 | Shrinks the contribution of each classifier. There is a trade-off between the learning rate and the number of estimators |
| | n estimators | 4 | The maximum number of estimators at which boosting is terminated |
| GBM | Learning rate | 0.005, 0.01, 0.1 | The boosting learning rate |
| | earlyStopRound | 25 | Number of rounds without improvement after which training stops |
| | Max depth | 4, 6, 17 | Max levels in a tree |
| | minRows | 2 | Min data points in a node |
| | ntrees | 100, 1000 | Number of trees |
Figure 1. ROC curves for the validation of the AdaBoost and decision tree models.
Figure 2. Study cohort design.
Figure 3. Overall methodology of the study.
Figure 4. Prediction window.