Jingyi Wu, Guilan Kong, Yu Lin, Hong Chu, Chao Yang, Ying Shi, Haibo Wang, Luxia Zhang.
Abstract
BACKGROUND: The hospital admission rate is high in patients treated with peritoneal dialysis (PD), and the length of stay (LOS) in the hospital is a key indicator of medical resource allocation. This study aimed to develop a scoring tool for predicting prolonged LOS (pLOS) in PD patients by combining machine learning and traditional logistic regression (LR).
Keywords: Length of stay (LOS); logistic regression (LR); machine learning; peritoneal dialysis (PD); scoring methods
Year: 2020 PMID: 33313182 PMCID: PMC7723539 DOI: 10.21037/atm-20-1006
Source DB: PubMed Journal: Ann Transl Med ISSN: 2305-5839
Candidate predictor variables considered for the scoring system
| Category | Variable name |
|---|---|
| Demographic characteristics | Age |
| | Sex |
| | Nationality |
| | Place of residence |
| | Insurance type |
| Disease characteristics† | Admission reason |
| | Specific cause of CKD |
| Comorbidities† | Diabetes |
| | Hypertension |
| | Heart failure |
| | Cardiac arrhythmia |
| | Coronary heart disease |
| | Stroke |
| | Pulmonary infections |
| | Infections except pulmonary infections |
| | Tumor |
| | Gastrointestinal hemorrhage |
| | Gastrointestinal inflammation/ulcer |
| | Gallbladder disease |
| | Liver disease |
| | Kidney stone |
| | Peripheral vascular disease |
| | Gout |
| | Hyperparathyroidism |
| | Hypoparathyroidism |
| | Hyperlipidemia |
| | Fracture |
| Clinical characteristics | Admission type |
| | Number of hospitalizations within 6 months |
| | Number of emergency admissions within 6 months |
| | Admission department |
| | Planned admission or not |
| | Admission day of the week |
| | Admission to the same hospital as the last or not |
†, disease characteristics and comorbidities were extracted using ICD-10 codes. CKD, chronic kidney disease.
Figure 1. Diagram of our proposed approach for building a scoring tool. LR, logistic regression; RF, random forest; CART, classification and regression tree; GBDT, gradient boosting decision tree.
Characteristics of patients treated with PD in the final cohort
| Item | Total, n (%) | With pLOS, n (%) | Without pLOS, n (%) | Proportion difference (%) (with pLOS − without pLOS) |
|---|---|---|---|---|
| N | 22,859 | 5,754 (25.2) | 17,105 (74.8) | – |
| Age, years (mean ± SD) | 51.9±14.9 | 52.7±15.0 | 51.6±14.9 | – |
| Sex | ||||
| Female | 10,142 (44.4) | 2,597 (45.1) | 7,545 (44.1) | 1.0 |
| Male | 12,717 (55.6) | 3,157 (54.9) | 9,560 (55.9) | −1.0 |
| Nationality | ||||
| Han | 18,089 (79.1) | 4,597 (79.9) | 13,492 (78.9) | 1.0 |
| Others | 931 (4.1) | 225 (3.9) | 706 (4.1) | −0.2 |
| Unclear | 3,839 (16.8) | 932 (16.2) | 2,907 (17.0) | −0.8 |
| Place of residence | ||||
| Eastern China | 9,023 (39.5) | 1,992 (34.6) | 7,031 (41.1) | −6.5 |
| Northern China | 2,138 (9.4) | 680 (11.8) | 1,458 (8.5) | 3.3 |
| Central China | 3,267 (14.3) | 909 (15.8) | 2,358 (13.8) | 2.0 |
| Southern China | 3,803 (16.6) | 1,054 (18.3) | 2,749 (16.1) | 2.2 |
| Southwestern China | 2,788 (12.2) | 670 (11.6) | 2,118 (12.4) | −0.8 |
| Northwestern China | 1,043 (4.6) | 188 (3.3) | 855 (5.0) | −1.7 |
| Northeastern China | 797 (3.5) | 261 (4.5) | 536 (3.1) | 1.4 |
| Insurance type | ||||
| UEBMI | 8,635 (37.8) | 2,121 (36.9) | 6,514 (38.1) | −1.2 |
| URBMI | 2,083 (9.1) | 520 (9.0) | 1,563 (9.1) | −0.1 |
| NRCMS | 5,821 (25.5) | 1,544 (26.8) | 4,277 (25.0) | 1.8 |
| Free medical care | 312 (1.4) | 80 (1.4) | 232 (1.4) | 0.0 |
| Self-paid treatment | 3,318 (14.5) | 788 (13.7) | 2,530 (14.8) | −1.1 |
| Others | 2,690 (11.8) | 701 (12.2) | 1,989 (11.6) | 0.6 |
| Admission reason | ||||
| ESKD† | 13,329 (58.3) | 2,003 (34.8) | 11,326 (66.2) | −31.4 |
| Dialysis access | 4,213 (18.4) | 2,271 (39.5) | 1,942 (11.4) | 28.1 |
| Dialysis complications | 3,502 (15.3) | 902 (15.7) | 2,600 (15.2) | 0.5 |
| Diabetes | 188 (0.8) | 50 (0.9) | 138 (0.8) | 0.1 |
| Hypertension | 349 (1.5) | 85 (1.5) | 264 (1.5) | 0.0 |
| Heart failure | 35 (0.2) | 10 (0.2) | 25 (0.1) | 0.1 |
| Coronary heart disease | 125 (0.5) | 27 (0.5) | 98 (0.6) | −0.1 |
| Stroke | 90 (0.4) | 26 (0.5) | 64 (0.4) | 0.1 |
| Infection | 228 (1.0) | 42 (0.7) | 186 (1.1) | −0.4 |
| Hypertension | 114 (0.5) | 36 (0.6) | 78 (0.5) | 0.1 |
| Gastrointestinal hemorrhage | 20 (0.1) | 6 (0.1) | 14 (0.1) | 0.0 |
| Tumor | 51 (0.2) | 15 (0.3) | 36 (0.2) | 0.1 |
| Severe anemia | 84 (0.4) | 51 (0.9) | 33 (0.2) | 0.7 |
| Surgery | 531 (2.3) | 230 (4.0) | 301 (1.8) | 2.2 |
| Specific cause of CKD | ||||
| Diabetic nephropathy | 2,657 (11.6) | 752 (13.1) | 1,905 (11.1) | 2.0 |
| Hypertensive nephropathy | 2,846 (12.5) | 636 (11.1) | 2,210 (12.9) | −1.8 |
| Glomerulonephropathy | 5,061 (22.1) | 1,311 (22.8) | 3,750 (21.9) | 0.9 |
| Tubulointerstitial nephropathy | 394 (1.7) | 106 (1.8) | 288 (1.7) | 0.1 |
| Obstructive nephropathy | 337 (1.5) | 113 (2.0) | 224 (1.3) | 0.7 |
| Others | 11,564 (50.6) | 2,836 (49.3) | 8,728 (51.0) | −1.7 |
| Number of comorbidities | ||||
| 0 | 3,460 (15.1) | 709 (12.3) | 2,751 (16.1) | −3.8 |
| 1 | 7,853 (34.4) | 1,741 (30.3) | 6,112 (35.7) | −5.4 |
| 2 | 6,239 (27.3) | 1,678 (29.2) | 4,561 (26.7) | 2.5 |
| 3 | 3,409 (14.9) | 983 (17.1) | 2,426 (14.2) | 2.9 |
| ≥4 | 1,898 (8.3) | 643 (11.2) | 1,255 (7.3) | 3.9 |
†, admission reason was recorded as ESKD in the electronic inpatient discharge record. UEBMI, urban employee basic medical insurance; URBMI, urban resident basic medical insurance; NRCMS, new rural cooperative medical insurance; ESKD, end-stage kidney disease; pLOS, prolonged length of stay; CKD, chronic kidney disease.
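The last column of the table can be checked by hand. The following is a minimal sketch, assuming the proportion difference is the rounded percentage in the pLOS group minus the rounded percentage in the non-pLOS group; the function name `proportion_difference` is hypothetical and the counts are copied from the table.

```python
# Group sizes copied from the "N" row of the table above.
N_WITH_PLOS = 5754
N_WITHOUT_PLOS = 17105


def proportion_difference(n_with: int, n_without: int) -> float:
    """Percentage-point difference (with pLOS minus without pLOS).

    Assumption: each percentage is rounded to one decimal before
    subtracting, which reproduces the rows checked below.
    """
    pct_with = round(100 * n_with / N_WITH_PLOS, 1)
    pct_without = round(100 * n_without / N_WITHOUT_PLOS, 1)
    return round(pct_with - pct_without, 1)


# Admission reason "ESKD": 2,003 with pLOS vs 11,326 without.
eskd_diff = proportion_difference(2003, 11326)
# Admission reason "Dialysis access": 2,271 with pLOS vs 1,942 without.
access_diff = proportion_difference(2271, 1942)
print(eskd_diff, access_diff)
```

These two rows show the largest gaps in the table, in opposite directions: ESKD admissions are under-represented among pLOS patients, while dialysis access admissions are over-represented.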
Prediction performance of the four models
| Model | Brier score | AUROC | ECI |
|---|---|---|---|
| LR | 0.161 | 0.743 | 8.036 |
| CART | 0.163 | 0.731 | 8.173 |
| RF | 0.158 | 0.756 | 7.883 |
| GBDT | 0.158 | 0.755 | 7.891 |
LR, logistic regression; CART, classification and regression tree; RF, random forest; GBDT, gradient boosting decision tree; AUROC, area under the receiver operating characteristic curve; ECI, estimated calibration index.
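For readers unfamiliar with the first two metrics, here is a minimal sketch of how a Brier score and an AUROC are computed from predicted probabilities and observed outcomes. The toy labels and probabilities are invented for illustration and are not study data; the ECI is not shown because its definition is not given in this record.

```python
from typing import Sequence


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and outcome (lower is better)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def auroc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Probability that a random positive case is ranked above a random negative case.

    Ties count as half a win (the Mann-Whitney formulation of the AUROC).
    """
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum((pp > pn) + 0.5 * (pp == pn) for pp in pos for pn in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = prolonged LOS, 0 = not prolonged.
y_toy = [0, 0, 1, 1]
p_toy = [0.1, 0.4, 0.35, 0.8]
print(brier_score(y_toy, p_toy), auroc(y_toy, p_toy))
```

On both metrics the four models in the table are close: RF and GBDT have the lowest Brier scores and the highest AUROCs, with LR only slightly behind.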
Figure 2. Ten most predictive factors identified by the RF model. ESKD, end-stage kidney disease; RF, random forest; NRCMS, new rural cooperative medical insurance.
The LR model derived using top 10 predictive variables identified by the RF model
| Variable | Coefficient | P value | OR | 95% confidence interval |
|---|---|---|---|---|
| Admission reason | ||||
| ESKD | −1.506 | <0.001* | 0.222 | 0.206, 0.240 |
| Dialysis complications | −0.861 | <0.001* | 0.423 | 0.382, 0.468 |
| Admission to the same hospital as the last or not | ||||
| No | ||||
| Yes | −0.552 | <0.001* | 0.576 | 0.522, 0.636 |
| Number of hospitalizations within 6 months | ||||
| 0 | ||||
| 1 | 0.096 | 0.087 | 1.100 | 0.986, 1.228 |
| 2 | −0.117 | 0.137 | 0.889 | 0.762, 1.038 |
| 3 | −0.174 | 0.162 | 0.840 | 0.659, 1.073 |
| ≥4 | −0.324 | 0.033* | 0.723 | 0.537, 0.973 |
| Comorbidity | ||||
| Pulmonary infections | 0.619 | <0.001* | 1.856 | 1.665, 2.071 |
| Admission department | ||||
| Nephrology department | −0.155 | <0.001* | 0.856 | 0.804, 0.911 |
| Others | ||||
| Place of residence | ||||
| Northern China | 0.126 | 0.033* | 1.134 | 1.010, 1.273 |
| Central China | 0.433 | <0.001* | 1.542 | 1.392, 1.707 |
| Southern China | 0.357 | <0.001* | 1.429 | 1.297, 1.575 |
| Insurance type | ||||
| NRCMS | −0.038 | 0.341 | 0.962 | 0.889, 1.042 |
*, P<0.05. NRCMS, new rural cooperative medical insurance; LR, logistic regression; RF, random forest; ESKD, end-stage kidney disease.
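The OR column is the exponential of the coefficient column. The sketch below illustrates that relationship and, assuming the intervals are Wald intervals at the 95% level (an assumption, not stated in this record), backs out an approximate standard error from a reported interval. The function names are hypothetical.

```python
import math


def odds_ratio(coefficient: float) -> float:
    """Odds ratio implied by a logistic regression coefficient."""
    return math.exp(coefficient)


def approx_standard_error(ci_lower: float, ci_upper: float, z: float = 1.96) -> float:
    """Approximate SE of the coefficient, assuming a Wald interval on the log-odds scale."""
    return (math.log(ci_upper) - math.log(ci_lower)) / (2 * z)


# Coefficients copied from the table above.
or_eskd = odds_ratio(-1.506)      # table reports 0.222
or_central = odds_ratio(0.433)    # table reports 1.542
se_eskd = approx_standard_error(0.206, 0.240)
print(round(or_eskd, 3), round(or_central, 3), round(se_eskd, 3))
```

Because the published coefficients are rounded to three decimals, a recomputed OR can differ from the tabulated value in the last digit for some rows.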
A scoring system for LOS prediction
| Variable | Score |
|---|---|
| Admission reason | |
| ESKD | 2 |
| Dialysis complications | 5 |
| Others | 7 |
| Admission to the same hospital as the last or not | |
| No | 2 |
| Yes | 0 |
| Number of hospitalizations within 6 months | |
| <4 | 1 |
| ≥4 | 0 |
| Comorbidity | |
| Pulmonary infections | 2 |
| Admission department | |
| Nephrology department | 1 |
| Others | 0 |
| Place of residence | |
| Northern China | 1 |
| Central China | 2 |
| Southern China | 1 |
LOS, length of stay; ESKD, end-stage kidney disease.
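Applying the scoring system amounts to a table lookup and a sum. The following is a minimal sketch: the point values are copied from the table, while the field names, the `total_score` function, and the rule that any level not listed in the table contributes 0 points (for example, no pulmonary infection, or a region of residence other than the three shown) are assumptions made for illustration.

```python
# Point values copied from the scoring table; field names are hypothetical.
SCORES = {
    "admission_reason": {"ESKD": 2, "Dialysis complications": 5, "Others": 7},
    "same_hospital_as_last": {"No": 2, "Yes": 0},
    "hospitalizations_6m": {"<4": 1, ">=4": 0},
    "pulmonary_infection": {"Yes": 2},
    "admission_department": {"Nephrology department": 1, "Others": 0},
    "place_of_residence": {"Northern China": 1, "Central China": 2, "Southern China": 1},
}


def total_score(patient: dict) -> int:
    """Sum the points for each variable.

    Assumption: levels absent from the table (or missing fields) score 0.
    """
    return sum(SCORES[field].get(patient.get(field), 0) for field in SCORES)


# Two hypothetical patients at the extremes of the scale.
high_risk = {
    "admission_reason": "Others",
    "same_hospital_as_last": "No",
    "hospitalizations_6m": "<4",
    "pulmonary_infection": "Yes",
    "admission_department": "Nephrology department",
    "place_of_residence": "Central China",
}
low_risk = {
    "admission_reason": "ESKD",
    "same_hospital_as_last": "Yes",
    "hospitalizations_6m": ">=4",
    "pulmonary_infection": "No",
    "admission_department": "Others",
    "place_of_residence": "Eastern China",
}
print(total_score(high_risk), total_score(low_risk))
```

Under these assumptions the total ranges from 2 to 15. Callers would need to map any admission reason other than ESKD or dialysis complications to "Others" before scoring.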
Figure 3. Comparison of the LR-generated and observed pLOS probabilities in different pLOS risk groups. LR, logistic regression; pLOS, prolonged length of stay.
Figure 4. Distribution of averaged LOS across different pLOS risk groups. LOS, length of stay; pLOS, prolonged LOS.