Lijuan Wu, Yong Hu, Xiaoxiao Liu, Xiangzhou Zhang, Weiqi Chen, Alan S L Yu, John A Kellum, Lemuel R Waitman, Mei Liu.
Abstract
Acute kidney injury (AKI) is a common complication among hospitalized patients, imposing significantly increased cost, morbidity, and mortality. Early prediction of AKI has profound clinical implications because no treatment currently exists for AKI once it develops. Feature selection (FS) is an essential process for building accurate and interpretable prediction models, but to the best of our knowledge no study has investigated the robustness and applicability of such a selection process for AKI. In this study, we compared eight widely applied FS methods for AKI prediction using nine years of electronic medical records (EMR) and examined heterogeneity in the feature rankings produced by the methods. FS methods were compared in terms of stability with respect to data sampling variation, similarity between selection results, and AKI prediction performance. Prediction accuracy did not by itself guarantee feature ranking stability. Across different FS methods, the prediction performance did not change significantly, while the importance rankings of features were quite different. A positive correlation was observed between the complexity of the suitable FS method and sample size. This study provides several practical implications, including recognizing the importance of feature stability, which is desirable for model reproducibility, identifying important AKI risk factors for further investigation, and facilitating early prediction of AKI.
Year: 2018 PMID: 30470779 PMCID: PMC6251919 DOI: 10.1038/s41598-018-35487-0
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Clinical demographics of patients in the analysis cohort.
| Characteristic, n (%) | AKI-1 (n = 6,396) | AKI-2 (n = 678) | AKI-3 (n = 185) | non-AKI (n = 69,698) | P value |
|---|---|---|---|---|---|
| Age (years) | | | | | |
| 18–25 | 303 (4.74) | 29 (4.28) | 25 (13.51) | 4596 (6.59) | <0.001 |
| 26–35 | 514 (8.04) | 44 (6.49) | 23 (12.43) | 7339 (10.53) | <0.001 |
| 36–45 | 711 (11.12) | 76 (11.21) | 25 (13.51) | 8601 (12.34) | 0.004 |
| 46–55 | 1218 (19.04) | 157 (23.16) | 35 (18.92) | 14374 (20.62) | 0.016 |
| 56–64 | 1672 (26.14) | 185 (27.29) | 49 (26.49) | 16192 (23.23) | <0.001 |
| >64 | 1978 (30.93) | 187 (27.58) | 28 (15.14) | 18596 (26.68) | <0.001 |
| Race | | | | | |
| White | 4791 (74.91) | 487 (71.83) | 130 (70.27) | 53177 (76.30) | <0.001 |
| African American | 918 (14.35) | 111 (16.37) | 36 (19.46) | 9336 (13.39) | 0.003 |
| Asian | 45 (0.70) | 7 (1.03) | 2 (1.08) | 600 (0.86) | 0.302 |
| Other | 642 (10.04) | 73 (10.77) | 17 (9.19) | 6585 (9.45) | 0.079 |
| Gender | | | | | |
| Male | 3822 (59.76) | 378 (55.75) | 109 (58.92) | 37850 (54.31) | <0.001 |
Note: P values compare the combined AKI group (any stage) with the non-AKI group and were obtained using the chi-square test.
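As an illustration of the chi-square comparison described in the note, the sketch below recomputes the test for the "Male" row using only the counts printed in the table. It assumes a Pearson chi-square test on a 2x2 table (any AKI vs. non-AKI, male vs. not male) without continuity correction; the source does not state these details, and the helper name `chi_square_2x2` is hypothetical.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom, the upper-tail probability is erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Counts taken from the table: all AKI stages are pooled into one group.
aki_total = 6396 + 678 + 185
aki_male = 3822 + 378 + 109
non_aki_total, non_aki_male = 69698, 37850

stat, p_value = chi_square_2x2(
    aki_male, aki_total - aki_male,              # AKI: male, not male
    non_aki_male, non_aki_total - non_aki_male,  # non-AKI: male, not male
)
print(f"chi2 = {stat:.1f}, p < 0.001: {p_value < 0.001}")
```

Under these assumptions the result agrees with the `<0.001` entry reported for that row.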
Clinical variables considered in the encounters.
| Feature Category | # of Variables | Details |
|---|---|---|
| Demographics | 3 | Age, gender, race |
| Patients’ status | 5 | BMI, diastolic BP, systolic BP, pulse, temperature |
| Lab tests | 14 | Albumin, ALT, AST, Ammonia, Blood Bilirubin, BUN, Ca, CK-MB, CK, Glucose, Lipase, Platelets, Troponin, WBC |
| Comorbidities | 29 | University Health System Consortium (UHC) comorbidity |
| Admission diagnosis | 315 | University Health System Consortium (UHC) APR-DRG |
| Medications | 1271 | All medications are mapped to RxNorm ingredient |
| Medical History | 280 | ICD9 codes mapped to CCS major diagnoses |
Figure 1. The comparison flow chart of feature selection methods (t denotes the feature ranking of the t-th bootstrap sample, where 0 < t ≤ 100; i (or j) stands for the i-th (or j-th) feature selection method, where 1 ≤ i, j ≤ 8).
Figure 2. The stability of different feature selection methods.
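The stability evaluation over bootstrap samples can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the stability measure is not named in this record, so the sketch assumes the mean pairwise Jaccard index between top-k feature sets, and it swaps in a toy ranker (absolute difference of class means) and synthetic data in place of the eight FS methods and the EMR cohort. All function names are hypothetical.

```python
import random
from itertools import combinations

def jaccard(a, b):
    """Jaccard index between two collections of feature indices."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

def rank_features(rows, labels):
    """Toy ranker (assumption): sort features by |mean(class 1) - mean(class 0)|."""
    pos = [r for r, y in zip(rows, labels) if y == 1]
    neg = [r for r, y in zip(rows, labels) if y == 0]
    scores = []
    for j in range(len(rows[0])):
        mean_pos = sum(r[j] for r in pos) / len(pos)
        mean_neg = sum(r[j] for r in neg) / len(neg)
        scores.append(abs(mean_pos - mean_neg))
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)

def bootstrap_stability(rows, labels, n_boot=100, top_k=3, seed=0):
    """Mean pairwise Jaccard index of the top-k sets across bootstrap samples."""
    rng = random.Random(seed)
    n = len(rows)
    top_sets = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        ranking = rank_features([rows[i] for i in idx], [labels[i] for i in idx])
        top_sets.append(ranking[:top_k])
    pairs = list(combinations(top_sets, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)

# Synthetic data: 200 encounters, 10 features, only the first 3 are informative.
data_rng = random.Random(42)
labels = [i % 2 for i in range(200)]
rows = [[data_rng.gauss(2.0 * y if j < 3 else 0.0, 1.0) for j in range(10)]
        for y in labels]

stability = bootstrap_stability(rows, labels)
print(f"stability = {stability:.3f}")
```

A value near 1 means the same features are selected regardless of which bootstrap sample is drawn; values near 0 indicate rankings that depend heavily on sampling variation.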
Similarity of the 8 feature ranking methods with top 50 features.
| AKI | Methods | Chi2 | ILFS | ReliefF | LS | LLCFS | mRMR | RF | GBM |
|---|---|---|---|---|---|---|---|---|---|
| Stage 1 | Chi2 | 1.00 | 0.32 | 0.30 | 0.22 | 0.25 | 0.35 | 0.28 | 0.35 |
| | ILFS | | 1.00 | 0.39 | 0.35 | 0.45 | 0.25 | 0.45 | 0.35 |
| | ReliefF | | | 1.00 | 0.41 | 0.45 | 0.19 | 0.64 | 0.39 |
| | LS | | | | 1.00 | 0.59 | 0.12 | 0.52 | 0.25 |
| | LLCFS | | | | | 1.00 | 0.22 | 0.61 | 0.30 |
| | mRMR | | | | | | 1.00 | 0.25 | 0.45 |
| | RF | | | | | | | 1.00 | 0.45 |
| | GBM | | | | | | | | 1.00 |
| Stage 2 | Chi2 | 1.00 | 0.32 | 0.32 | 0.27 | 0.30 | 0.56 | 0.45 | 0.39 |
| | ILFS | | 1.00 | 0.39 | 0.35 | 0.37 | 0.27 | 0.39 | 0.33 |
| | ReliefF | | | 1.00 | 0.37 | 0.43 | 0.27 | 0.52 | 0.39 |
| | LS | | | | 1.00 | 0.79 | 0.19 | 0.47 | 0.30 |
| | LLCFS | | | | | 1.00 | 0.23 | 0.52 | 0.32 |
| | mRMR | | | | | | 1.00 | 0.37 | 0.41 |
| | RF | | | | | | | 1.00 | 0.52 |
| | GBM | | | | | | | | 1.00 |
| Stage 3 | Chi2 | 1.00 | 0.20 | 0.25 | 0.22 | 0.20 | 0.54 | 0.27 | 0.37 |
| | ILFS | | 1.00 | 0.22 | 0.27 | 0.28 | 0.18 | 0.28 | 0.28 |
| | ReliefF | | | 1.00 | 0.30 | 0.30 | 0.25 | 0.32 | 0.28 |
| | LS | | | | 1.00 | 0.79 | 0.16 | 0.45 | 0.30 |
| | LLCFS | | | | | 1.00 | 0.15 | 0.47 | 0.32 |
| | mRMR | | | | | | 1.00 | 0.23 | 0.33 |
| | RF | | | | | | | 1.00 | 0.56 |
| | GBM | | | | | | | | 1.00 |
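The similarity measure behind this table is not named in this record. The sketch below assumes the Jaccard index between the top-k feature sets of two methods; the reported values appear consistent with that choice for k = 50 (for example, 24 shared features would give 24/76 ≈ 0.32), but this is an inference rather than something the source confirms. The method names and rankings in the example are hypothetical.

```python
def jaccard(a, b):
    """Jaccard index: size of the intersection over size of the union."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

def jaccard_from_overlap(shared, k):
    """Jaccard index of two size-k sets that have `shared` elements in common."""
    return shared / (2 * k - shared)

def similarity_matrix(rankings, top_k):
    """Upper-triangular pairwise similarity between the methods' top-k sets."""
    names = list(rankings)
    return {
        (m1, m2): jaccard(rankings[m1][:top_k], rankings[m2][:top_k])
        for i, m1 in enumerate(names)
        for m2 in names[i:]
    }

# Hypothetical rankings from three methods (most important feature first).
rankings = {
    "method_a": ["WBC", "BUN", "Age", "Glucose"],
    "method_b": ["Age", "WBC", "Pulse", "BMI"],
    "method_c": ["Pulse", "BMI", "AST", "WBC"],
}
matrix = similarity_matrix(rankings, top_k=3)
for (m1, m2), value in matrix.items():
    print(f"{m1} vs {m2}: {value:.2f}")
```

Only the upper triangle is computed because the measure is symmetric, which matches the layout of the table above.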
Figure 3. The prediction performance of different feature selection methods.
Figure 4. The trade-off between stability and prediction performance.
Top 10 features selected by 8 feature ranking methods for AKI stages 1–3.
| AKI | Chi2 | ILFS | ReliefF | LS | LLCFS | mRMR | RF | GBM |
|---|---|---|---|---|---|---|---|---|
| Stage 1 | MED134 | MED746 | MED1086 | Age | Pulse | MED134 | Age | WBC |
| | MED1086 | MED1100 | WBC | Temperature | Systolic BP | MED1086 | Pulse | MED1086 |
| | MED516 | MED582 | MED321 | Pulse | BMI | WBC | WBC | MED1039 |
| | BUN | MED880 | Glucose | WBC | WBC | Glucose | BMI | MED134 |
| | WBC | MED308 | MED12 | BMI | BUN | MED548 | Systolic BP | CCS58 |
| | MED548 | Calcium | MED880 | Systolic BP | Age | MED1039 | MED1086 | BUN |
| | MED746 | Glucose | MED134 | AST | Glucose | BUN | MED134 | COM24 |
| | MED939 | MED134 | Calcium | BUN | AST | DRG0 | BUN | DRG179 |
| | MED321 | WBC | Age | Glucose | Bilirubin | DRG3 | Calcium | DRG97 |
| | MED880 | MED139 | MED677 | Diastolic BP | Temperature | COM2 | MED516 | MED319 |
| Stage 2 | MED1086 | MED1100 | MED1086 | Age | Age | MED1086 | MED1086 | WBC |
| | MED321 | Calcium | Glucose | Temperature | BMI | WBC | Age | MED1086 |
| | MED516 | MED582 | MED321 | Pulse | Pulse | DRG0 | WBC | DRG261 |
| | WBC | WBC | WBC | BMI | Systolic BP | COM12 | MED321 | DRG0 |
| | DRG261 | MED746 | MED655 | WBC | WBC | DRG261 | Systolic BP | MED1039 |
| | DRG0 | MED321 | MED880 | Systolic BP | AST | MED516 | Pulse | MED677 |
| | Glucose | MED134 | MED12 | AST | Glucose | Glucose | BMI | MED321 |
| | Bilirubin | Glucose | AST | Diastolic BP | BUN | Temperature | MED516 | COM2 |
| | Temperature | MED1086 | MED134 | MED655 | MED655 | Bilirubin | Glucose | Calcium |
| | MED548 | Albumin | COM12 | MED314 | Platelets | COM2 | Diastolic BP | DRG3 |
| Stage 3 | MED1086 | MED321 | DRG178 | Age | Age | MED1086 | Age | Age |
| | MED321 | Glucose | DRG3 | Temperature | WBC | DRG0 | Systolic BP | MED1086 |
| | DRG0 | MED1086 | DRG97 | Pulse | Pulse | Systolic BP | MED1086 | MED321 |
| | Temperature | WBC | MED1086 | BMI | BMI | Temperature | Pulse | MED516 |
| | MED516 | BMI | COM24 | Systolic BP | Systolic BP | WBC | Diastolic BP | Temperature |
| | DRG3 | MED139 | CCS71 | WBC | AST | MED314 | BMI | DRG261 |
| | Systolic BP | MED308 | MED314 | AST | BUN | DRG3 | MED321 | DRG0 |
| | DRG261 | Systolic BP | BMI | Diastolic BP | Diastolic BP | COM12 | Temperature | WBC |
| | WBC | MED582 | DRG0 | MED655 | Platelets | MED321 | WBC | CCS219 |
| | DRG263 | MED880 | COM12 | MED314 | Temperature | DRG261 | MED314 | MED314 |
Abbreviations: DRG0: Liver transplant; DRG3: Tracheotomy w/dmv w exten proc; DRG97: Maj small & large bowel proc; DRG178: Kidney/urinary tract malignancy; DRG179: Kidney/urinary tract-nonmalig; DRG261: Infect & parasitic disease; DRG263: Septicemia & dissem infect; COM2: Renal failure; COM12: Obesity; COM24: Hypertension; MED12: oxycodone; MED134: benzoic acid; MED139: 1,2,6-hexanetriol; MED308: (all-z)-4,7,10,13,16-docosapentaenoic acid; MED314: lactate; MED319: amphotericin b liposome; MED321: vancomycin; MED516: glucose; MED548: insulin regular, human buffered; MED582: levofloxacin; MED655: calcium chloride; MED677: polyethylene glycol 3350; MED746: insulin, aspart, human/rdna; MED880: heparin, porcine; MED939: amiodarone; MED1039: aldesleukin; MED1086: tazobactam; MED1100: magnesium sulfate; CCS58: Cystic fibrosis; CCS71: Skin and subcutaneous tissue infections; CCS219: Cancer of liver and intrahepatic bile duct.
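One simple way to read the top-10 table is to count how many of the eight methods select each feature. The sketch below does this for the Stage 1 lists, transcribed from the table under the assumption that each row's cells follow the header's method order; the variable names are illustrative and the tally is not an analysis reported by the source.

```python
from collections import Counter

# Stage 1 top-10 lists per method, copied column by column from the table.
stage1_top10 = {
    "Chi2": ["MED134", "MED1086", "MED516", "BUN", "WBC",
             "MED548", "MED746", "MED939", "MED321", "MED880"],
    "ILFS": ["MED746", "MED1100", "MED582", "MED880", "MED308",
             "Calcium", "Glucose", "MED134", "WBC", "MED139"],
    "ReliefF": ["MED1086", "WBC", "MED321", "Glucose", "MED12",
                "MED880", "MED134", "Calcium", "Age", "MED677"],
    "LS": ["Age", "Temperature", "Pulse", "WBC", "BMI",
           "Systolic BP", "AST", "BUN", "Glucose", "Diastolic BP"],
    "LLCFS": ["Pulse", "Systolic BP", "BMI", "WBC", "BUN",
              "Age", "Glucose", "AST", "Bilirubin", "Temperature"],
    "mRMR": ["MED134", "MED1086", "WBC", "Glucose", "MED548",
             "MED1039", "BUN", "DRG0", "DRG3", "COM2"],
    "RF": ["Age", "Pulse", "WBC", "BMI", "Systolic BP",
           "MED1086", "MED134", "BUN", "Calcium", "MED516"],
    "GBM": ["WBC", "MED1086", "MED1039", "MED134", "CCS58",
            "BUN", "COM24", "DRG179", "DRG97", "MED319"],
}

# Count, for each feature, the number of methods whose top 10 contains it.
counts = Counter(f for features in stage1_top10.values() for f in set(features))

for feature, n_methods in counts.most_common(5):
    print(f"{feature}: selected by {n_methods} of {len(stage1_top10)} methods")
```

In this tally WBC is the only Stage 1 feature chosen by all eight methods, which illustrates how much the individual rankings differ despite similar prediction performance.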