Ke Wang, Jing Tian, Chu Zheng, Hong Yang, Jia Ren, Chenhao Li, Qinghua Han, Yanbo Zhang.
Abstract
PURPOSE: This study sought to develop models that reliably identify adverse outcomes in patients with heart failure (HF) and to find the factors that most strongly affect prognosis.
PATIENTS AND METHODS: A total of 5004 qualifying cases were selected, of which 498 had adverse outcomes and 4506 were discharged after improvement. The study subjects were hospitalized patients diagnosed with HF at a regional cardiovascular hospital and the cardiology department of a medical university hospital in Shanxi Province, China, between January 2014 and June 2019. The synthetic minority oversampling technique combined with edited nearest neighbors (SMOTE+ENN) was used to preprocess the unbalanced data. Traditional logistic regression (LR), k-nearest neighbor (KNN), support vector machine (SVM), random forest (RF), and extreme gradient boosting (XGBoost) were used to build risk identification models, and each model was run 100 times. Model discrimination and calibration were assessed using the F1-score, the area under the receiver-operating characteristic curve (AUROC), and the Brier score. The best performing of the five models was used to identify the risk of adverse outcomes and evaluate the influencing factors.
Keywords: SHAP; SMOTE+ENN; XGBoost; heart failure; machine learning
Year: 2021 PMID: 34149290 PMCID: PMC8206455 DOI: 10.2147/RMHP.S310295
Source DB: PubMed Journal: Risk Manag Healthc Policy ISSN: 1179-1594
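The SMOTE+ENN preprocessing named in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' code: the toy two-feature data, the 90:10 class ratio, the neighbour counts (k=5 for SMOTE, k=3 for ENN), and the majority-vote editing rule are all assumptions chosen for illustration, and the helper names `k_nearest`, `smote`, `edited_nearest_neighbours`, and `smote_enn` are hypothetical.

```python
import math
import random
from collections import Counter


def k_nearest(point, candidates, k):
    """Indices of the k candidates closest to `point` (Euclidean distance)."""
    order = sorted(range(len(candidates)),
                   key=lambda i: math.dist(point, candidates[i]))
    return order[:k]


def smote(minority, n_new, k=5, rng=None):
    """Create n_new synthetic minority rows by interpolating towards neighbours."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [p for p in minority if p is not base]
        neighbour = others[rng.choice(k_nearest(base, others, k))]
        gap = rng.random()  # position on the segment from base to neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def edited_nearest_neighbours(X, y, k=3):
    """Keep only rows whose label matches the majority of their k neighbours."""
    keep_X, keep_y = [], []
    for i, (row, label) in enumerate(zip(X, y)):
        rest_X = X[:i] + X[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        votes = Counter(rest_y[j] for j in k_nearest(row, rest_X, k))
        if votes.most_common(1)[0][0] == label:
            keep_X.append(row)
            keep_y.append(label)
    return keep_X, keep_y


def smote_enn(X, y, minority_label=1, rng=None):
    """Oversample the minority class to parity, then clean with ENN."""
    minority = [row for row, label in zip(X, y) if label == minority_label]
    n_new = (len(X) - len(minority)) - len(minority)
    new_rows = smote(minority, n_new, rng=rng)
    return edited_nearest_neighbours(X + new_rows,
                                     y + [minority_label] * len(new_rows))


# Toy data (assumed): 90 majority rows near (0, 0), 10 minority rows near (2, 2).
rng = random.Random(42)
X = ([[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(90)]
     + [[rng.gauss(2, 1), rng.gauss(2, 1)] for _ in range(10)])
y = [0] * 90 + [1] * 10

X_res, y_res = smote_enn(X, y, rng=rng)
print("before:", dict(Counter(y)), "after:", dict(Counter(y_res)))
```

In practice a maintained implementation such as the `SMOTEENN` class in imbalanced-learn would normally be preferred; the sketch only shows the two steps (interpolation, then neighbour-based editing) in the order the abbreviation suggests.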
Figure 1. Architecture of the system.
Figure 2. Results of feature screening by RFE-RF with fivefold CV.
Risk Factors Selected for Adverse Outcomes in Patients with HF
| Variable | No adverse outcome | Adverse outcome | P value | Variable | No adverse outcome | Adverse outcome | P value |
|---|---|---|---|---|---|---|---|
| Age (years) | 67.0(59.0–76.0) | 76.0(68.0–81.0) | <0.001 | HDLC (μmol/L) | 1.0(0.8–1.1) | 1.0(0.9–1.2) | 0.004 |
| DP (mmHg) | 130(120–140) | 130(118–150) | 0.029 | LDLC (μmol/L) | 2.4(1.9–2.9) | 2.3(1.8–2.9) | 0.008 |
| SP (mmHg) | 80(70–85) | 76(70–84) | <0.001 | BUN (mmol/L) | 6.0(4.9–7.6) | 7.0(5.4–9.41) | <0.001 |
| Height (cm) | 167.0(160.0–171.0) | 165.0(160.0–170.0) | 0.013 | CR (mmol/L) | 78.0(66.0–92.9) | 91.2(74.9–115.6) | <0.001 |
| Weight (kg) | 69.0(60.0–75.0) | 65.0(55.0–71.0) | <0.001 | UA (μmol/L) | 365.0(297.0–443.0) | 403.0(324.0–502.1) | <0.001 |
| BMI (kg/m²) | 24.9(22.5–27.2) | 23.4(21.1–25.9) | <0.001 | K.1 (mmol/L) | 4.1(3.8–4.3) | 4.1(3.8–4.4) | 0.007 |
| WBC (10⁹/L) | 6.6(5.5–7.9) | 6.9(5.7–8.4) | 0.003 | NA (mmol/L) | 140.0(138.0–142.0) | 139.3(137.0–141.2) | <0.001 |
| RBC (10¹²/L) | 4.4(4.0–4.8) | 4.2(3.8–4.6) | <0.001 | CL (mmol/L) | 104.0(101.8–107.0) | 102.2(99.4–105.0) | <0.001 |
| RDW (%) | 13.8(13.3–14.5) | 14.4(13.7–15.3) | <0.001 | CYSC (mg/L) | 1.1(0.9–1.3) | 1.27(1.04–1.6) | <0.001 |
| HGB (g/L) | 137.0(125.0–149.0) | 130.0(117.0–143.0) | <0.001 | NTPROBNP | 869.8(324.8–2427.7) | 3072.1(1324.3–6324.1) | <0.001 |
| NEU (10¹⁰/L) | 4.2(3.3–5.3) | 4.7(3.6–5.9) | <0.001 | SG | 1.0(1.0–1.0) | 1.0(1.0–1.0) | 0.007 |
| N (%) | 63.5(57.1–70.0) | 68.5(62.3–75.1) | <0.001 | Heartrate | 70(62–82) | 78.5(67–92) | <0.001 |
| ALT (U/L) | 19.0(13.4–29.0) | 17.0(11.8–28.0) | <0.001 | QRS (ms) | 96(88–108) | 102(90–122) | <0.001 |
| ALB (g/L) | 43.6(40–46.9) | 40.8(37.0–43.8) | <0.001 | QTC (ms) | 431(406–462) | 447(420–478) | <0.001 |
| TBIL (μmol/L) | 14.5(11.0–19.6) | 15.3(11.3–21.7) | 0.006 | LA (mm) | 38.4(36.0–42.0) | 41.0(38.0–46.0) | <0.001 |
| DBIL (μmol/L) | 3.5(2.4–5.2) | 4.8(3.1–6.6) | <0.001 | RA (mm) | 35.0(31.0–40.0) | 37.8(33.0–45.0) | <0.001 |
| x.GT (U/L) | 27.0(18.1–43.7) | 33.0(20.0–56.0) | <0.001 | RA1 (mm) | 43.0(39.0–47.0) | 45.0(40.0–50.0) | <0.001 |
| GLU (μmol/L) | 5.1(4.5–6.2) | 5.3(4.6–6.8) | <0.001 | LVDD (mm) | 52.0(47.0–58.0) | 55.0(49.0–61.0) | <0.001 |
| TG (mmol/L) | 1.4(1.0–1.9) | 1.2(0.9–1.6) | <0.001 | EF (%) | 53.0(41.0–62.0) | 45.0(35.0–56.3) | <0.001 |
| Healthcare | | | <0.001 | NYHA | | | <0.001 |
| Urban employee | 2270(50.4%) | 263(52.8%) | | I | 18(0.4%) | 0(0.0%) | |
| Urban residents | 559(13.30%) | 56(11.2%) | | II | 2025(44.9%) | 96(19.3%) | |
| Rural cooperative | 1160(25.7%) | 103(20.7%) | | III | 1696(37.6%) | 193(38.8%) | |
| Poverty relief | 6(0.1%) | 0(0.0%) | | IV | 767(17.0%) | 209(42.0%) | |
| Full public | 24(0.5%) | 11(2.2%) | | Pulmonary | | | <0.001 |
| Self-paying | 142(3.2%) | 31(6.2%) | | No | 3968(88.1%) | 327(65.7%) | |
| Other | 305(6.8%) | 34(6.8%) | | Yes | 538(11.9%) | 171(34.3%) | |
| Lung Rales | | | <0.001 | PVS1AI | | | <0.001 |
| No | 3648(81.0%) | 285(57.2%) | | No | 2507(55.6%) | 179(35.9%) | |
| Moist rales | 830(18.4%) | 205(41.2%) | | Little | 1718(38.1%) | 246(49.4%) | |
| Dry rales | 28(0.6%) | 8(1.6%) | | Moderate | 246(5.5%) | 59(11.8%) | |
| Infection | | | <0.001 | Massive | 35(0.8%) | 14(2.8%) | |
| No | 4129(91.6%) | 376(75.5%) | | | | | |
| Yes | 377(8.4%) | 122(24.5%) | | | | | |
Note: Values are median (interquartile range) or n (%).
Results of ML Models for the Unbalanced Data and the Data After Pretreatment with SMOTE+ENN (SME) [Mean (95% CI)]
| Models | F1-Score | AUC | Brier Score |
|---|---|---|---|
| LR | 0.0000 (0.0000,0.0000) | 0.7583 (0.7542,0.7624) | 0.7583 (0.7542,0.7624) |
| KNN | 0.0375 (0.0322,0.0429) | 0.6721 (0.6675,0.6768) | 0.0904 (0.0898,0.0909) |
| SVM | 0.0000 (0.0000,0.0000) | 0.7218 (0.7117,0.7318) | 0.0869 (0.0865,0.0873) |
| RF | 0.0000 (0.0000,0.0000) | 0.7993 (0.7957,0.8030) | 0.0796 (0.0793,0.0798) |
| XGBoost | 0.3515 (0.3458,0.3572) | 0.7918 (0.7879,0.7957) | 0.1733 (0.1728,0.1737) |
| SME-LR | 0.2914 (0.2891,0.2936) | 0.7819 (0.7784,0.7853) | 0.2801 (0.2782,0.2820) |
| SME-KNN | 0.2667 (0.2631,0.2703) | 0.6481 (0.6437,0.6525) | 0.3256 (0.3230,0.3283) |
| SME-SVM | 0.1976 (0.1922,0.2030) | 0.6963 (0.6925,0.7001) | 0.1632 (0.1615,0.1650) |
| SME-RF | 0.3606 (0.3567,0.3645) | 0.7983 (0.7947,0.8019) | 0.1577 (0.1565,0.1588) |
| SME-XGBoostᵇ | 0.3673 (0.3633,0.3712) | 0.8010 (0.7974,0.8046) | 0.1769 (0.1748,0.1789) |
| P valueᵃ | <0.001 | <0.001 | <0.001 |
Notes: ᵃP value is the result of a one-way analysis of variance across models for each of the three indicators. ᵇIn multiple comparisons by least-significant difference (LSD), SME-XGBoost differed significantly from the other models.
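The three indicators in the table above can be computed as in the following minimal sketch. The ten toy labels and predicted probabilities are invented for illustration and are not study data; the 0.5 and 0.2 decision thresholds are assumptions, since the record does not state which threshold was applied. The example also shows one way an F1-score of zero can coexist with a reasonable AUC on unbalanced data: if no predicted probability reaches the threshold, there are no true positives, yet the ranking of patients can still be good.

```python
def f1_score(y_true, y_prob, threshold=0.5):
    """F1 for the positive class after thresholding the probabilities."""
    pred = [int(p >= threshold) for p in y_prob]
    tp = sum(1 for t, p in zip(y_true, pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: F1 is taken as zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auroc(y_true, y_prob):
    """Probability that a random positive is scored above a random negative."""
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - t) ** 2 for t, p in zip(y_true, y_prob)) / len(y_true)


# Toy example (assumed values): 8 patients without and 2 with an adverse outcome.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_prob = [0.02, 0.05, 0.05, 0.08, 0.10, 0.12, 0.15, 0.30, 0.25, 0.40]

print("F1 at 0.5:", f1_score(y_true, y_prob))
print("F1 at 0.2:", f1_score(y_true, y_prob, threshold=0.2))
print("AUROC:", auroc(y_true, y_prob))
print("Brier:", brier_score(y_true, y_prob))
```

A lower Brier score indicates better-calibrated probabilities, whereas higher F1 and AUROC values indicate better discrimination, so the three columns are not read in the same direction.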
Figure 3. Categorization threshold of prediction score (A) and prediction distributions of adverse outcomes in patients with HF (B).
Figure 4. SHAP summary plot for the risk of adverse outcomes in patients with HF. The top 20 risk factors are ranked by importance using the SME-XGBoost model. The SHAP value (x-axis) is a unified index reflecting the impact of a feature on the model output. In each feature row, every patient's attribution to the outcome is plotted as a colored dot: red dots represent high values of the risk factor and blue dots represent low values.
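SHAP values are built on Shapley values from cooperative game theory, and the idea behind the x-axis of Figure 4 can be shown with a brute-force sketch. This is not how the figure was produced: tree models are normally explained with a dedicated, far more efficient algorithm, whereas the code below enumerates every feature ordering and is only practical for a handful of features. The scoring function `toy_risk`, its coefficients, and the all-zero baseline are hypothetical.

```python
from itertools import permutations


def shapley_values(model, instance, baseline):
    """Exact Shapley values by averaging marginal contributions over all orders.

    Features not yet "switched on" take their baseline value.
    """
    n = len(instance)
    phi = [0.0] * n
    orders = list(permutations(range(n)))
    for order in orders:
        current = list(baseline)
        previous = model(current)
        for j in order:
            current[j] = instance[j]          # reveal feature j
            value = model(current)
            phi[j] += value - previous        # marginal contribution of j
            previous = value
    return [total / len(orders) for total in phi]


def toy_risk(x):
    """Hypothetical risk score: two additive terms plus one interaction."""
    return 2.0 * x[0] + 1.0 * x[1] + 0.5 * x[0] * x[2]


instance = [1.0, 1.0, 1.0]   # the patient being explained (assumed values)
baseline = [0.0, 0.0, 0.0]   # reference input (assumed)

phi = shapley_values(toy_risk, instance, baseline)
print(phi)
print(sum(phi), toy_risk(instance) - toy_risk(baseline))
```

The attributions sum to the difference between the score for the instance and the score for the baseline, and the interaction term is split equally between the two features involved. That additivity is what allows per-patient dots for different features to be compared on one axis.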