Angier Allen, Zohora Iqbal, Abigail Green-Saxena, Myrna Hurtado, Jana Hoffman, Qingqing Mao, Ritankar Das.
Abstract
INTRODUCTION: Diabetic kidney disease (DKD) accounts for the majority of increased risk of mortality for patients with diabetes, and eventually manifests in approximately half of those patients diagnosed with type 2 diabetes mellitus (T2DM). Although increased screening frequency can avoid delayed diagnoses, this is not uniformly implemented. The purpose of this study was to develop and retrospectively validate a machine learning algorithm (MLA) that predicts stages of DKD within 5 years upon diagnosis of T2DM.
RESEARCH DESIGN AND METHODS: Two MLAs were trained to predict stages of DKD severity, and compared with the Centers for Disease Control and Prevention (CDC) risk score to evaluate performance. The models were validated on a hold-out test set as well as an external dataset sourced from separate facilities.
Keywords: algorithms; decision support techniques; diabetes mellitus; kidney diseases; type 2
Year: 2022 PMID: 35046014 PMCID: PMC8772425 DOI: 10.1136/bmjdrc-2021-002560
Source DB: PubMed Journal: BMJ Open Diabetes Res Care ISSN: 2052-4897
Figure 1. Patient inclusion diagram. Hold-out test set and external validation set both consist of patients who are not seen during training and validation of the MLAs. The external validation set consists only of patients from clinical sites that are not used in training, validation and hold-out test sets. MLAs, machine learning algorithms; T2DM, type 2 diabetes mellitus.
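The partitioning described in the Figure 1 caption can be illustrated with a minimal sketch. It assumes each patient record carries a site identifier and that whole sites are reserved for external validation. The function name `split_by_site`, the `(patient_id, site_id)` record layout, the 20% hold-out fraction and the random seed are all hypothetical; the source does not state how the split was actually performed.

```python
import random


def split_by_site(patients, external_sites, test_fraction=0.2, seed=0):
    """Partition (patient_id, site_id) records into three disjoint groups.

    Patients from `external_sites` go only to the external validation set.
    The remaining patients are shuffled and divided into a hold-out test
    set and a training/validation pool. `test_fraction` and `seed` are
    illustrative assumptions, not values taken from the study.
    """
    external = [p for p in patients if p[1] in external_sites]
    internal = [p for p in patients if p[1] not in external_sites]

    rng = random.Random(seed)
    rng.shuffle(internal)

    n_test = int(len(internal) * test_fraction)
    return {
        "train_val": internal[n_test:],
        "hold_out": internal[:n_test],
        "external": external,
    }
```

Splitting at the site level, rather than the patient level, is what keeps the external set free of any facility seen during training.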
Measurements used as inputs for machine learning algorithms (MLAs) and for calculating CDC risk score.
| | Measurements used as inputs to MLA | Measurements used as inputs to CDC risk score |
| Demographics | Age | Age |
| | Sex | Sex |
| Clinical measurements | BMI | None |
| | Blood pressure (systolic and diastolic) | |
| Laboratory values | Blood urea nitrogen | None |
| | Creatinine | |
| | eGFR | |
| | Cholesterol (HDL and LDL) | |
| | White cell count | |
| Medical history | Presence of past acute kidney injury | Cardiovascular disease |
| | History of chronic heart failure | Congestive heart failure |
| | Reported smoking history | Peripheral vascular disease |
| | Reported alcohol history | Proteinuria |
Demographics (age, sex), clinical measurements (BMI, blood pressure (systolic and diastolic)), laboratory values (blood urea nitrogen, creatinine and eGFR, cholesterol (high-density lipoprotein and low-density lipoprotein), white cell count), and medical history (presence of past acute kidney injury, history of chronic heart failure, reported smoking history, reported alcohol history) served as input features for the MLA models. The clinical and laboratory measurement values were pooled using 5th and 95th percentiles, median, and last available result over 1 year prior to T2DM diagnosis.
BMI, body mass index; CDC, Centers for Disease Control and Prevention; eGFR, estimated glomerular filtration rate; HDL, high-density lipoprotein; LDL, low-density lipoprotein; T2DM, type 2 diabetes mellitus.
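The pooling step described in the table note (5th and 95th percentiles, median, and last available result over the year before T2DM diagnosis) can be sketched as below. This is only an illustration under stated assumptions: the function names, the `(date, value)` input layout, the 365-day window that excludes the diagnosis date itself, and the linear-interpolation percentile are choices made here, not details given in the source.

```python
from datetime import date, timedelta
from statistics import median


def percentile(values, q):
    """Linearly interpolated percentile (q in [0, 100]) of a non-empty list."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def pool_measurements(readings, diagnosis_date, window_days=365):
    """Summarise one measurement type over the window before diagnosis.

    `readings` is a list of (date, value) pairs. Returns a dict with the
    four pooled features, or None if no reading falls inside the window.
    The window bounds are an assumption of this sketch.
    """
    start = diagnosis_date - timedelta(days=window_days)
    in_window = sorted((d, v) for d, v in readings if start <= d < diagnosis_date)
    if not in_window:
        return None

    values = [v for _, v in in_window]
    return {
        "p5": percentile(values, 5),
        "median": median(values),
        "p95": percentile(values, 95),
        "last": in_window[-1][1],  # most recent reading inside the window
    }
```

Calling `pool_measurements` once per measurement type (for example creatinine or eGFR) would yield four features per type for each patient.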
Figure 2. Area under the receiver operating characteristic curve (AUROC) plots of machine learning models random forest (RF) and gradient boosted tree (XGB), and Centers for Disease Control and Prevention (CDC) CKD scoring system for (A) hold-out dataset and (B) external validation dataset for prediction of DKD stages 3–5 in the 5 years following T2DM diagnosis. A random classifier was used as the baseline. CKD, chronic kidney disease; DKD, diabetic kidney disease; T2DM, type 2 diabetes mellitus.
Results on hold-out test set
| | XGB | RF | CDC |
| Any-stage DKD | | | |
| AUROC | 0.750 | 0.748 | 0.634 |
| Sensitivity | 0.700 | 0.700 | 0.633 |
| Specificity | 0.670 | 0.662 | 0.560 |
| LR+ | 2.120 | 2.071 | 1.440 |
| LR− | 0.447 | 0.453 | 0.655 |
| DOR | 4.738 | 4.575 | 2.197 |
| DKD stages 3–5 | | | |
| AUROC | 0.825 | 0.823 | 0.672 |
| Sensitivity | 0.750 | 0.750 | 0.637 |
| Specificity | 0.742 | 0.739 | 0.614 |
| LR+ | 2.906 | 2.870 | 1.652 |
| LR− | 0.336 | 0.338 | 0.591 |
| DOR | 8.638 | 8.492 | 2.794 |
| DKD stages 4–5 | | | |
| AUROC | 0.830 | 0.821 | 0.617 |
| Sensitivity | 0.751 | 0.751 | 0.581 |
| Specificity | 0.739 | 0.712 | 0.581 |
| LR+ | 2.876 | 2.606 | 1.387 |
| LR− | 0.337 | 0.349 | 0.721 |
| DOR | 8.544 | 7.461 | 1.923 |
Comparison of XGB, RF and the CDC DKD performance includes AUROC, sensitivity, specificity, LR+ and LR‒, and DOR. Prediction of DKD within 5 years following T2DM diagnosis is divided into any-stage DKD, DKD stages 3–5 and DKD stages 4–5.
AUROC, area under the receiver operating characteristic; CDC, Centers for Disease Control and Prevention; DKD, diabetic kidney disease; DOR, diagnostic OR; LR+, positive likelihood ratio; LR−, negative likelihood ratio; RF, random forest; T2DM, type 2 diabetes mellitus; XGB, gradient boosted tree.
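The likelihood ratios and diagnostic odds ratio in the table follow from sensitivity and specificity by their standard definitions: LR+ = sensitivity / (1 − specificity), LR− = (1 − sensitivity) / specificity, and DOR = LR+ / LR−. The short sketch below applies those definitions; the function name `likelihood_metrics` is introduced here for illustration only.

```python
def likelihood_metrics(sensitivity, specificity):
    """Return (LR+, LR-, DOR) from sensitivity and specificity.

    Uses the standard definitions; assumes 0 < specificity < 1 and
    sensitivity < 1 so that no denominator is zero.
    """
    lr_pos = sensitivity / (1.0 - specificity)
    lr_neg = (1.0 - sensitivity) / specificity
    dor = lr_pos / lr_neg
    return lr_pos, lr_neg, dor


# XGB, any-stage DKD, hold-out test set (sensitivity 0.700, specificity 0.670)
lr_pos, lr_neg, dor = likelihood_metrics(0.700, 0.670)
```

Because the tabulated sensitivity and specificity are rounded to three decimals, values recomputed this way agree with the reported LR+, LR− and DOR only to within rounding.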
Results on external validation set
| | XGB | RF | CDC |
| Any-stage DKD | | | |
| AUROC | 0.769 | 0.769 | 0.643 |
| Sensitivity | 0.761 | 0.758 | 0.651 |
| Specificity | 0.622 | 0.619 | 0.573 |
| LR+ | 2.015 | 1.989 | 1.522 |
| LR− | 0.384 | 0.391 | 0.610 |
| DOR | 5.251 | 5.089 | 2.496 |
| DKD stages 3–5 | | | |
| AUROC | 0.831 | 0.832 | 0.676 |
| Sensitivity | 0.807 | 0.804 | 0.640 |
| Specificity | 0.690 | 0.692 | 0.623 |
| LR+ | 2.605 | 2.608 | 1.697 |
| LR− | 0.279 | 0.283 | 0.578 |
| DOR | 9.322 | 9.215 | 2.937 |
| DKD stages 4–5 | | | |
| AUROC | 0.826 | 0.827 | 0.620 |
| Sensitivity | 0.819 | 0.826 | 0.608 |
| Specificity | 0.664 | 0.643 | 0.576 |
| LR+ | 2.438 | 2.313 | 1.436 |
| LR− | 0.273 | 0.270 | 0.680 |
| DOR | 8.933 | 8.555 | 2.111 |
Comparison of XGB, RF and the CDC DKD performance includes AUROC, sensitivity, specificity, LR+ and LR‒, DOR, and threshold. Prediction of DKD within 5 years following T2DM diagnosis is divided into any-stage DKD, DKD stages 3–5 and DKD stages 4–5.
AUROC, area under the receiver operating characteristic; CDC, Centers for Disease Control and Prevention; DKD, diabetic kidney disease; DOR, diagnostic OR; LR−, negative likelihood ratio; LR+, positive likelihood ratio; RF, random forest; T2DM, type 2 diabetes mellitus; XGB, gradient boosted tree.