Aaron Chuah1, Giles Walters2, Daniel Christiadi2, Krishna Karpe2, Alice Kennard2, Richard Singer2, Girish Talaulikar2, Wenbo Ge3, Hanna Suominen3,4, T Daniel Andrews1,5, Simon Jiang1,2,5.
Abstract
Keywords: XGBoost (Extreme Gradient Boosting); chronic kidney disease; end stage kidney disease (ESKD); machine learning (ML); prediction model
Year: 2022 PMID: 35372378 PMCID: PMC8965763 DOI: 10.3389/fmed.2022.837232
Source DB: PubMed Journal: Front Med (Lausanne) ISSN: 2296-858X
Baseline characteristics.
| Characteristic | Train set | Test set | p-value |
|---|---|---|---|
| Age at presentation (median, IQR) | 62 (48–72) | 64 (51–73) | 0.13 |
| Sex: male | 1,101 (58%) | 274 (57%) | 0.9 |
| Diagnosis | | | 0.8 |
| DKD | 412 (22%) | 112 (23%) | |
| HTN | 357 (19%) | 83 (17%) | |
| GN and Vasculitis | 272 (14%) | 65 (14%) | |
| Genetic | 50 (2.6%) | 10 (2.1%) | |
| Other | 819 (43%) | 208 (44%) | |
| CKD Stage | | | 0.14 |
| Stage 1 | 137 (7.2%) | 20 (4.2%) | |
| Stage 2 | 571 (30%) | 153 (32%) | |
| Stage 3a | 379 (20%) | 102 (21%) | |
| Stage 3b | 489 (26%) | 114 (24%) | |
| Stage 4 | 334 (17%) | 89 (19%) | |
P-values are from chi-squared tests of homogeneity between the train and test sets. Because none of them is significant (all are above 0.05), the train and test sets are treated as homogeneous. CKD, Chronic Kidney Disease; DKD, Diabetic Kidney Disease; GN, Glomerulonephritis.
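The homogeneity check described in the footnote can be reproduced approximately from the table counts. The sketch below is an illustration, not the authors' code: it computes Pearson's chi-squared statistic for the CKD stage rows and, to stay within the standard library, uses the closed-form upper-tail probability that holds only for even degrees of freedom. The function names are hypothetical.

```python
import math

def chi2_homogeneity(group_a, group_b):
    """Pearson chi-squared statistic and degrees of freedom for two count vectors."""
    total_a, total_b = sum(group_a), sum(group_b)
    grand_total = total_a + total_b
    stat = 0.0
    for a, b in zip(group_a, group_b):
        category_total = a + b
        for observed, group_total in ((a, total_a), (b, total_b)):
            expected = group_total * category_total / grand_total
            stat += (observed - expected) ** 2 / expected
    # Two groups, so df = (number of categories - 1) * (2 - 1).
    return stat, len(group_a) - 1

def chi2_sf_even_df(stat, df):
    """Upper-tail probability of a chi-squared variable; valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires positive, even degrees of freedom")
    half = stat / 2.0
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))

# CKD stage counts (stages 1, 2, 3a, 3b, 4) copied from the table above.
train_stage = [137, 571, 379, 489, 334]
test_stage = [20, 153, 102, 114, 89]

stat, df = chi2_homogeneity(train_stage, test_stage)
p_value = chi2_sf_even_df(stat, df)
print(f"chi2 = {stat:.2f}, df = {df}, p = {p_value:.2f}")
```

With five stage categories the test has four degrees of freedom, and the resulting p-value is close to the 0.14 reported for CKD stage.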
The 25 most predictive features of the initial and optimized models.
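| Variable | Feature | Initial model rank | Initial model importance | Optimized model rank | Optimized model importance |
|---|---|---|---|---|---|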
| eGFR | 80th percentile | 1 | 0.25254 | 1 | 0.24691 |
| eGFR | Minimum | 2 | 0.15594 | 2 | 0.14371 |
| eGFR | 20th percentile | 3 | 0.13539 | 3 | 0.13772 |
| eGFR | 10th percentile | 4 | 0.11008 | 4 | 0.11848 |
| Standing heart rate | Minimum | 5 | 0.09653 | 25 | 0.01417 |
| Age | Initial | 6 | 0.09118 | 5 | 0.08176 |
| eGFR | CWT coefficients (2,5,10,20; 2,2) | 7 | 0.05538 | 10 | 0.03539 |
| Glucose | 80th percentile | 8 | 0.05421 | 8 | 0.05527 |
| eGFR | 70th percentile | 9 | 0.04888 | 6 | 0.06204 |
| eGFR | 60th percentile | 10 | 0.03283 | 14 | 0.02945 |
| Standing-sitting pulse pressure difference | Mean | 11 | 0.0327 | – | – |
| eGFR | Mean change in 40 and 80th percentile | 12 | 0.03149 | 11 | 0.03376 |
| Glucose | CWT coefficients (2,5,10,20; 2,20) | 13 | 0.02902 | 18 | 0.02302 |
| Glucose | 10th percentile | 14 | 0.02861 | – | – |
| Standing diastolic blood pressure | Minimum | 15 | 0.02707 | – | – |
| Glucose | Unnormalized CID Complexity Estimate | 16 | 0.026 | 24 | 0.01522 |
| Sitting heart rate | Minimum | 17 | 0.02561 | 9 | 0.03748 |
| eGFR | Mean central second derivative | 18 | 0.02501 | 12 | 0.03189 |
| eGFR | 40th percentile | 19 | 0.0224 | – | – |
| eGFR | Mean change | 20 | 0.02229 | 23 | 0.01572 |
| Glucose | Absolute energy | 21 | 0.02136 | 16 | 0.02408 |
| eGFR | Linear trend rightmost-value | 22 | 0.01957 | 15 | 0.02541 |
| Sitting heart rate | Maximum | 23 | 0.01931 | – | – |
| Sitting systolic blood pressure | Minimum | 24 | 0.01916 | – | – |
| Glucose | Median | 25 | 0.01889 | 21 | 0.01802 |
| Sitting heart rate | Linear trend rightmost-value | – | – | 7 | 0.06066 |
| Sitting heart rate | Welch density (coeff = 2) | – | – | 13 | 0.02951 |
| Sitting systolic blood pressure | C3 non-linearity in timeseries (lag = 1) | – | – | 17 | 0.02353 |
| Glucose | 10th percentile | – | – | 19 | 0.023 |
| Standing-sitting pulse pressure difference | CWT coefficients (2,5,10,20; 0,2) | – | – | 20 | 0.02133 |
| Glucose | Maximum | – | – | 22 | 0.01694 |
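Several of the feature names in this table are simple summary statistics of a patient's measurement series. The source does not say how they were computed, so the sketch below is only an illustration under assumed, commonly used definitions (linearly interpolated percentiles, mean first difference, mean central second derivative, sum of squares, and root of summed squared differences). The function names and the eGFR values are hypothetical.

```python
import math

def percentile(values, q):
    """Linearly interpolated q-th percentile (q in 0-100) of a list of numbers."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)

def summarize(series):
    """Summary features for one time-ordered series (needs at least 3 points)."""
    if len(series) < 3:
        raise ValueError("need at least three measurements")
    diffs = [b - a for a, b in zip(series, series[1:])]
    second = [
        (series[i + 2] - 2 * series[i + 1] + series[i]) / 2.0
        for i in range(len(series) - 2)
    ]
    return {
        "minimum": min(series),
        "p80": percentile(series, 80),
        "mean_change": sum(diffs) / len(diffs),
        "mean_second_derivative_central": sum(second) / len(second),
        "absolute_energy": sum(x * x for x in series),
        # Assumed definition of the unnormalized complexity estimate.
        "cid_ce": math.sqrt(sum(d * d for d in diffs)),
    }

# Hypothetical eGFR readings for one patient, oldest first.
egfr = [62.0, 58.0, 55.0, 49.0, 44.0]
features = summarize(egfr)
for name, value in features.items():
    print(f"{name}: {value:.3f}")
```

A steadily declining series like this one gives a negative mean change, which is the kind of trend signal the ranked eGFR features would capture.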
Figure 1. Representative SHAP dependency plots taken from the top 25 features in the optimal model. SHAP values represent the predictive value of a feature in the models in which it is integrated. Positive SHAP values imply a contribution to ESKD risk, while negative values are protective against ESKD. Selected panels are (A) estimated glomerular filtration rate (eGFR) at the 80th percentile, the most predictive of ESKD in both models, (B) minimum eGFR, the second-most predictive value in both models, (C) standing heart rate, (D) patient age at study initiation, (E) blood glucose levels at the 80th percentile, and (F) individual points represent the SHAP and feature values of an individual evaluated by the optimal model. The points in all plots are colored spectrally by initial age to differentiate between younger patients (yellow) and older patients (dark blue). The top 25 SHAP dependency plots are shown in full in Supplementary Figure 3.
Figure 2. Boxplots of (A) predictions of year of ESKD for each patient by all six clinicians and (B) predictions of ESKD across the patient cohort by each clinician.
Comparative performance of 2-year ESKD predictions by optimized model and expert clinicians.
| Metric | Optimized model | Clinician 1 | Clinician 2 | Clinician 3 | Clinician 4 | Clinician 5 | Clinician 6 |
|---|---|---|---|---|---|---|---|
| Predicted ESKD (actual = 5/49 patients) | 4 | 2 | 9 | 13 | 13 | 14 | 22 |
| True positives | 3 | 0 | 2 | 2 | 4 | 4 | 4 |
| True negatives | 43 | 42 | 37 | 33 | 35 | 34 | 26 |
| False positives | 1 | 2 | 7 | 11 | 9 | 10 | 18 |
| False negatives | 2 | 5 | 3 | 3 | 1 | 1 | 1 |
| Accuracy | 0.939 | 0.857 | 0.796 | 0.714 | 0.796 | 0.776 | 0.612 |
| Sensitivity | 0.600 | 0.000 | 0.400 | 0.400 | 0.800 | 0.800 | 0.800 |
| Specificity | 0.977 | 0.955 | 0.841 | 0.750 | 0.795 | 0.773 | 0.591 |
| Precision (PPV) | 0.750 | 0.000 | 0.222 | 0.154 | 0.308 | 0.286 | 0.182 |
| F1 score | 0.667 | 0.000 | 0.286 | 0.222 | 0.444 | 0.421 | 0.296 |
| MCC | 0.638 | 0.000 | 0.188 | 0.103 | 0.408 | 0.384 | 0.238 |
ESKD, End Stage Kidney Disease; MCC, Matthews Correlation Coefficient; PPV, Positive Predictive Value.
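Each metric in this table follows from the four confusion-matrix counts. As a minimal sketch (the function name is hypothetical, and returning 0.0 for an undefined ratio is an assumed convention rather than something the source states), the standard formulas can be written as:

```python
import math

def classification_metrics(tp, tn, fp, fn):
    """Standard binary classification metrics from confusion-matrix counts."""
    def ratio(num, den):
        # Assumed convention: report 0.0 when the denominator is zero.
        return num / den if den else 0.0

    sensitivity = ratio(tp, tp + fn)
    specificity = ratio(tn, tn + fp)
    precision = ratio(tp, tp + fp)
    f1 = ratio(2 * precision * sensitivity, precision + sensitivity)
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
        "mcc": ratio(tp * tn - fp * fn, mcc_den),
    }

# Counts from the first numeric column of the table above.
model = classification_metrics(tp=3, tn=43, fp=1, fn=2)
for name, value in model.items():
    print(f"{name}: {value:.3f}")
```

Applied to the counts in the first numeric column (3 true positives, 43 true negatives, 1 false positive, 2 false negatives), these formulas give the same six values listed there when rounded to three decimals.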
Comparative performance of 2-year ESKD predictions by optimized model and 4- and 8- variable KFRE.
| Metric | Optimized model | 4-variable KFRE | 8-variable KFRE |
|---|---|---|---|
| Predicted end-stage (actual = 5/49 patients) | 4 | 2 | 1 |
| True positives | 3 | 1 | 0 |
| True negatives | 43 | 41 | 41 |
| False positives | 1 | 1 | 1 |
| False negatives | 2 | 3 | 4 |
| Accuracy | 0.939 | 0.913 | 0.891 |
| Sensitivity | 0.600 | 0.250 | 0.000 |
| Specificity | 0.977 | 0.976 | 0.976 |
| Precision (PPV) | 0.750 | 0.500 | 0.000 |
| F1 score | 0.667 | 0.613 | 0.488 |
| MCC | 0.638 | 0.271 | 0.000 |
ESKD, End Stage Kidney Disease; KFRE, Kidney Failure Risk Equation; MCC, Matthews Correlation Coefficient; PPV, Positive Predictive Value.