Daijo Inaguma, Akimitsu Kitagawa, Ryosuke Yanagiya, Akira Koseki, Toshiya Iwamori, Michiharu Kudo, Yukio Yuzawa.
Abstract
Artificial intelligence is increasingly being adopted in medical fields to predict various outcomes. Chronic kidney disease (CKD) is a particular concern because it often progresses to end-stage kidney disease; however, the trajectory of kidney function differs from patient to patient. In this study, we propose a machine learning-based model to predict rapid decline in kidney function among CKD patients, using a large hospital database constructed from the electronic medical records of 118,584 patients. The database included the estimated glomerular filtration rate (eGFR) of each patient, recorded at least twice over a period of 90 days. Of these patients, 19,894 (16.8%) satisfied the CKD criteria. We defined rapid decline of kidney function as a decline of 30% or more in eGFR within two years and classified the available patients into two groups: those exhibiting rapid eGFR decline and those exhibiting non-rapid eGFR decline. We then constructed predictive models based on two machine learning algorithms. Longitudinal laboratory data including urine protein, blood pressure, and hemoglobin were used as covariates. We used longitudinal statistics computed over 90-, 180-, and 360-day windows prior to the baseline point. The longitudinal statistics included the exponentially smoothed average (ESA), where the weight was defined to be 0.9^(t/b), where t denotes the number of days prior to the baseline point and b denotes the decay parameter. In this study, b was taken to be 7 (7-day ESA). We used logistic regression (LR) and random forest (RF) algorithms, implemented in Python with the scikit-learn library (https://scikit-learn.org/), for model creation. The areas under the curve for LR and RF were 0.71 and 0.73, respectively. The 7-day ESA of urine protein ranked within the first two places in terms of importance according to both models. Further, other features related to urine protein were likely to rank higher than the rest. The LR and RF models revealed that the degree of urine protein, especially if it exhibited an increasing tendency, served as a prominent risk factor associated with rapid eGFR decline.
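The ESA definition in the abstract can be illustrated with a minimal sketch. It reads the weight as 0.9 raised to the power t/b, so that older measurements count less, and it normalizes by the sum of the weights; the normalization, the function name `exponentially_smoothed_average`, and the sample values are assumptions of this sketch, not details given in the abstract.

```python
def exponentially_smoothed_average(values, days_before_baseline, b=7.0, base=0.9):
    """Weighted average with weights base ** (t / b).

    t is the number of days before the baseline point and b is the decay
    parameter (7 in the study). Dividing by the sum of the weights is an
    assumption of this sketch.
    """
    weights = [base ** (t / b) for t in days_before_baseline]
    weighted_sum = sum(w * v for w, v in zip(weights, values))
    return weighted_sum / sum(weights)


# Hypothetical urine protein dipstick grades measured 0, 7, and 28 days
# before the baseline point (illustrative values only).
grades = [3, 2, 1]
days = [0, 7, 28]
esa = exponentially_smoothed_average(grades, days)
print(f"7-day ESA: {esa:.3f}")
```

Because the most recent grade carries the largest weight, a rising series yields an ESA above its plain mean, which is one way such a feature could capture an increasing tendency.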
Year: 2020 PMID: 32941535 PMCID: PMC7497987 DOI: 10.1371/journal.pone.0239262
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Patient flow.
Fig 2. Representative examples of reference points in each group.
Patient characteristics and laboratory data at the reference point.
| Variables | All (n = 19,732) | RD group (n = 9,866) | Non-RD group (n = 9,866) | p value |
|---|---|---|---|---|
| Age (years old) | 68.5, 13.7 | 68.5, 13.7 | 68.5, 13.6 | 1.000 |
| Female gender (%) | 41.7 | 41.7 | 41.7 | 1.000 |
| Comorbidity of diabetes (%) | 31.3 | 35.6 | 27.1 | < 0.001 |
| History of AKI (%) | 4.6 | 4.6 | 4.5 | 0.707 |
| SBP (mmHg) | 131, 26 | 136, 26 | 128, 26 | < 0.001 |
| DBP (mmHg) | 73, 15 | 74, 15 | 72, 15 | < 0.001 |
| Use of RASIs (%) | 61.8 | 56.8 | 66.8 | < 0.001 |
| eGFR (ml/min/1.73m2) | 39.9, 26.0 | 39.9, 26.0 | 39.9, 26.1 | 0.760 |
| Serum creatinine (mg/dL) | 2.23, 2.04 | 2.25, 2.05 | 2.21, 2.04 | 0.061 |
| BUN (mg/dL) | 29.5, 19.1 | 29.8, 17.9 | 29.3, 20.1 | < 0.001 |
| Hemoglobin (g/dL) | 11.5, 2.2 | 11.4, 2.1 | 11.5, 2.3 | 0.001 |
| Hematocrit (%) | 34.8, 6.4 | 34.7, 6.1 | 34.9, 6.8 | < 0.001 |
| Serum T-C (mg/dL) | 181, 49 | 186, 50 | 175, 47 | < 0.001 |
| Serum TG (mg/dL) | 142, 91 | 151, 100 | 133, 79 | < 0.001 |
| Serum uric acid (mg/dL) | 6.2, 2.0 | 6.3, 1.9 | 6.0, 2.0 | < 0.001 |
| Urine protein* | 1.9, 1.8 | 2.3, 1.9 | 1.4, 1.6 | < 0.001 |
| Urine protein** | 2 [0, 3] | 2 [0, 5] | 1 [0, 3] | |
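Values are mean, standard deviation for continuous variables and % for categorical variables.
* Continuous value of the urine protein dipstick test
** Semi-quantitative value of the urine protein dipstick test, shown as median [25th percentile, 75th percentile]
Dipstick grades: 0; -, 1; ±, 2; +, 3; ++, 4; +++, 5; ++++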
RD; rapid decline, AKI; acute kidney injury, SBP; systolic blood pressure, DBP; diastolic blood pressure, RASI; renin angiotensin system inhibitor, eGFR; estimated glomerular filtration rate, BUN; blood urea nitrogen, T-C; total cholesterol, TG; triglyceride
Fig 3. Receiver operating characteristic curves for prediction of RD.
A. Pattern 1 (LR model). B. Pattern 2 (LR model). C. Pattern 3 (LR model). D. Pattern 1 (RF model). E. Pattern 2 (RF model). F. Pattern 3 (RF model).
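To make concrete what a receiver operating characteristic curve and its area measure, here is a minimal pure-Python sketch. It sweeps the decision threshold over the predicted scores, collects (false positive rate, true positive rate) points, and integrates with the trapezoid rule. The helper names `roc_curve_points` and `auc_trapezoid` and the toy labels and scores are hypothetical; the study itself reports using scikit-learn, whose routines would normally be used in practice.

```python
def roc_curve_points(y_true, scores):
    """Return (FPR, TPR) points obtained by lowering the decision threshold."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    points = [(0.0, 0.0)]
    # Visit each distinct score as a threshold, from highest to lowest.
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= thr and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= thr and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc_trapezoid(points):
    """Area under a piecewise-linear curve given as ordered (x, y) points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Toy example: 1 = rapid decline, 0 = non-rapid decline (illustrative only).
labels = [1, 1, 0, 1, 0, 0]
predicted = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
auc = auc_trapezoid(roc_curve_points(labels, predicted))
print(f"AUC = {auc:.3f}")
```

The resulting area equals the fraction of (rapid, non-rapid) patient pairs in which the rapid-decline patient receives the higher score, which is why 0.5 corresponds to chance-level ranking.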
Comparison of AUC by models.
| Model | Pattern | AUC |
|---|---|---|
| Logistic regression model | 1 | 0.67 |
| Logistic regression model | 2 | 0.69 |
| Logistic regression model | 3 | 0.71 |
| Random forest model | 1 | 0.68 |
| Random forest model | 2 | 0.71 |
| Random forest model | 3 | 0.73 |
Features in every pattern: comorbidity of diabetes, history of AKI, SBP, DBP, use of RASIs, urine protein, hemoglobin, serum uric acid, BUN, serum total cholesterol, serum triglyceride
Pattern 1; features at baseline (the start point of rapid eGFR decline)
Pattern 2; features at baseline, their average and standard deviation during the 180 days prior to the baseline, and their 7-day exponentially smoothed average
Pattern 3; features at baseline, their average and standard deviation during the 90, 180, and 360 days prior to the baseline, and their 7-day exponentially smoothed average
AUC; area under curve, AKI; acute kidney injury, SBP; systolic blood pressure, DBP; diastolic blood pressure, RASI; renin angiotensin system inhibitor, BUN; blood urea nitrogen, eGFR; estimated glomerular filtration rate
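The windowed statistics that distinguish the three patterns can be sketched as follows. This is a rough illustration only: the function name `window_features`, the inclusion of the baseline measurement in each window, the use of the sample standard deviation, and the sample readings are all assumptions not stated in the source.

```python
from statistics import mean, stdev


def window_features(measurements, windows=(90, 180, 360)):
    """Mean and SD of one feature over look-back windows before baseline.

    measurements: list of (days_before_baseline, value) pairs.
    A statistic is None when the window holds too few values to compute it.
    """
    features = {}
    for w in windows:
        # Assumption: a window covers day 0 (baseline) through day w inclusive.
        vals = [v for t, v in measurements if 0 <= t <= w]
        features[f"{w}_mean"] = mean(vals) if vals else None
        features[f"{w}_sd"] = stdev(vals) if len(vals) >= 2 else None
    return features


# Hypothetical SBP readings as (days before baseline, mmHg).
sbp = [(0, 140), (30, 136), (120, 130), (300, 128)]
feats = window_features(sbp)
for name, value in feats.items():
    print(name, value)
```

Under these assumptions, Pattern 2 would correspond to calling the function with `windows=(180,)` and Pattern 3 to the default three windows.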
Ranking of the top 10 features in the logistic regression and random forest models.
| Rank | Logistic regression (features 2) | Logistic regression (features 3) | Random forest (features 2) | Random forest (features 3) |
|---|---|---|---|---|
| 1 | hemoglobin (7-day ESA) | urine protein (7-day ESA) | urine protein (7-day ESA) | hemoglobin (90 SD) |
| 2 | urine protein (7-day ESA) | hemoglobin (7-day ESA) | hemoglobin (180 SD) | urine protein (7-day ESA) |
| 3 | hemoglobin (180 mean) | SBP (7-day ESA) | urine protein (180 mean) | urine protein (180 mean) |
| 4 | total cholesterol (baseline) | hemoglobin (90 SD) | urine protein (baseline) | urine protein (360 mean) |
| 5 | hemoglobin (180 SD) | total cholesterol (baseline) | uric acid (180 SD) | urine protein (90 mean) |
| 6 | SBP (7-day ESA) | total cholesterol (7-day ESA) | uric acid (7-day ESA) | hemoglobin (180 SD) |
| 7 | total cholesterol (7-day ESA) | hemoglobin (360 mean) | uric acid (180 mean) | urine protein (baseline) |
| 8 | SBP (180 mean) | hemoglobin (180 mean) | total cholesterol (baseline) | hemoglobin (360 SD) |
| 9 | urine protein (180 mean) | hemoglobin (90 mean) | BUN (baseline) | total cholesterol (90 SD) |
| 10 | hemoglobin (baseline) | uric acid (90 SD) | SBP (baseline) | uric acid (90 SD) |
Features
2; at baseline, average and standard deviation of features during 180 days prior to the baseline, and 7-day ESA of features
3; at baseline, average and standard deviation of features during 90, 180, and 360 days prior to the baseline, and 7-day ESA of features
ESA; exponentially smoothed average, SBP; systolic blood pressure, SD; standard deviation, BUN; blood urea nitrogen
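A ranking of this kind can be produced with scikit-learn along the following lines. The source does not state how importance was measured, so this sketch assumes the absolute value of standardized coefficients for logistic regression and impurity-based importances for the random forest. The data are synthetic, and the feature names and the helper `top_k` are placeholders.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data with placeholder names; NOT the study's cohort.
feature_names = [f"feature_{i}" for i in range(8)]
X, y = make_classification(
    n_samples=1000, n_features=8, n_informative=4, random_state=0
)

# Standardize so that coefficient magnitudes are comparable across features.
X_std = StandardScaler().fit_transform(X)
lr = LogisticRegression(max_iter=1000).fit(X_std, y)
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)


def top_k(scores, names, k=5):
    """Return the k feature names with the largest scores, best first."""
    order = np.argsort(scores)[::-1][:k]
    return [names[i] for i in order]


# Assumed importance measures (not confirmed by the source).
lr_ranking = top_k(np.abs(lr.coef_[0]), feature_names)
rf_ranking = top_k(rf.feature_importances_, feature_names)
print("LR:", lr_ranking)
print("RF:", rf_ranking)
```

The two measures are not on a common scale, so only the rank order within each model is comparable, which matches how the table above presents the results.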