Christoph Weber, Lena Röschke, Luise Modersohn, Christina Lohr, Tobias Kolditz, Udo Hahn, Danny Ammon, Boris Betz, Michael Kiehntopf.
Abstract
Automated identification of advanced chronic kidney disease (CKD ≥ III) and of no known kidney disease (NKD) can support both clinicians and researchers. We hypothesized that identification of CKD and NKD can be improved by combining information from different electronic health record (EHR) resources, comprising laboratory values, discharge summaries and ICD-10 billing codes, compared with using each component alone. We included EHRs from 785 elderly multimorbid patients, hospitalized between 2010 and 2015, who were divided into a training and a test (n = 156) dataset. We used both the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUCPR), each with a 95% confidence interval, to evaluate different classification models. In the test dataset, the combination of EHR components as a simple classifier identified CKD ≥ III (AUROC 0.96 [0.93-0.98]) and NKD (AUROC 0.94 [0.91-0.97]) better than laboratory values (AUROC CKD 0.85 [0.79-0.90], NKD 0.91 [0.87-0.94]), discharge summaries (AUROC CKD 0.87 [0.82-0.92], NKD 0.84 [0.79-0.89]) or ICD-10 billing codes (AUROC CKD 0.85 [0.80-0.91], NKD 0.77 [0.72-0.83]) alone. Logistic regression and machine learning models improved recognition of CKD ≥ III compared to the simple classifier if only laboratory values were used (AUROC 0.96 [0.92-0.99] vs. 0.86 [0.81-0.91], p < 0.05) and improved recognition of NKD if information from previous hospital stays was used (AUROC 0.99 [0.98-1.00] vs. 0.95 [0.92-0.97], p < 0.05). Depending on the availability of data, correct automated identification of CKD ≥ III and NKD from EHRs can be improved by generating classification models based on the combination of different EHR components.
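The evaluation described in the abstract (an AUROC with a 95% confidence interval for a classifier that combines EHR components) can be illustrated with a minimal sketch. The abstract does not specify how the simple classifier combines the three sources or how the intervals were computed, so two assumptions are made here purely for illustration: the combined score is the number of sources that flag the patient (`combined_score`, a hypothetical helper), and the interval is a percentile bootstrap. The toy data are invented and are not from the study.

```python
import random

def auroc(scores, labels):
    """AUROC as the probability that a random positive outranks a random
    negative, counting ties as 0.5 (Mann-Whitney formulation)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_ci(scores, labels, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUROC (assumed method)."""
    rng = random.Random(seed)
    idx = range(len(scores))
    stats = []
    while len(stats) < n_boot:
        sample = [rng.choice(idx) for _ in idx]
        ys = [labels[i] for i in sample]
        if 0 < sum(ys) < len(ys):  # skip resamples that contain only one class
            stats.append(auroc([scores[i] for i in sample], ys))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

def combined_score(egfr_flag, ds_flag, icd_flag):
    """Hypothetical combination rule: count how many sources flag the patient."""
    return egfr_flag + ds_flag + icd_flag

# Invented toy patients: (eGFR flag, discharge summary flag, ICD flag, reference label)
toy = [(1, 1, 1, 1), (1, 1, 0, 1), (0, 1, 1, 1), (1, 0, 0, 1), (0, 0, 0, 1),
       (0, 0, 0, 0), (0, 1, 0, 0), (0, 0, 0, 0), (1, 0, 0, 0), (0, 0, 0, 0)]
scores = [combined_score(e, d, i) for e, d, i, _ in toy]
labels = [y for *_, y in toy]

print(auroc(scores, labels))  # 0.82
print(bootstrap_ci(scores, labels, n_boot=200))
```

With so few toy patients the bootstrap interval is wide; the point of the sketch is the mechanics, not the numbers.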
Keywords: ICD-10 billing codes; area under the precision-recall curve (AUCPR); area under the receiver operating characteristic (AUROC); artificial neural network (ANN); clinical natural language processing (clinical NLP); chronic kidney disease (CKD); discharge summaries; electronic health record (EHR); estimated glomerular filtration rate (eGFR); generalized linear model network (GLMnet); laboratory values; machine learning (ML); no known kidney disease (NKD); phenotyping; random forest (RF)
Year: 2020 PMID: 32932685 PMCID: PMC7563476 DOI: 10.3390/jcm9092955
Source DB: PubMed Journal: J Clin Med ISSN: 2077-0383 Impact factor: 4.241
Epidemiological Characteristics from all Individuals and from Individuals with CKD ≥ III or NKD Identified by the Reference Standard, Respectively.
| Characteristics | Cohort ( | CKD ≥ III ( | NKD ( |
|---|---|---|---|
| Age, years, mean [SD] | 74.6 | 77.9 | 68.4 |
| Sex, male | 476 | 215 | 79 |
| eGFR at admission, | ( | ( | 88.6 |
| ( | |||
| Charlson morbidity category ≥1 | 711 (95.3%) | 366 (98.1%) | 113 (87.6%) |
| ≥3 | 387 (49.3%) | 224 (60.1%) | 36 (27.9%) |
| Median | 2 | 3 | 2 |
| Myocardial infarction | 128 (16.3%) | 75 (20.1%) | 11 (8.5%) |
| Chronic heart failure | 419 (54.4%) | 247 (66.2%) | 33 (25.6%) |
| Peripheral vascular disease | 131 (16.7%) | 75 (20.1%) | 17 (13.2%) |
| Cerebrovascular disease | 51 (6.5%) | 28 (7.5%) | 7 (5.4%) |
| Dementia | 31 (3.9%) | 18 (4.8%) | 4 (3.1%) |
| Chronic pulmonary disease | 183 (23.3%) | 73 (16.9%) | 23 (17.8%) |
| Rheumatic diseases | 13 (1.7%) | 4 (1.1%) | 3 (2.3%) |
| Peptic ulcer disease | 21 (2.7%) | 11 (2.9%) | 1 (0.8%) |
| Hemiplegia or paraplegia | 29 (3.7%) | 8 (2.1%) | 6 (4.7%) |
| Liver disease | 137 (17.5%) | 44 (11.8%) | 35 (25.1%) |
| Diabetes mellitus | 332 (42.3%) | 152 (40.7%) | 51 (39.5%) |
| Any malignancy | 137 (17.5%) | 32 (8.6%) | 38 (29.5%) |
| Hypertension | 567 (72.3%) | 270 (72.4%) | 93 (72.1%) |
| Major cause for admission | | | |
| Infectious diseases | 58 (7.4%) | 28 (7.5%) | 6 (4.7%) |
| Oncology disorders | 119 (15.2%) | 30 (8.0%) | 34 (26.4%) |
| Cardiovascular diseases | 315 (40.1%) | 192 (51.5%) | 40 (31.0%) |
| Pulmonary diseases | 82 (10.4%) | 25 (6.7%) | 12 (9.3%) |
| Gastrointestinal and liver diseases | 118 (15.0%) | 35 (9.4%) | 27 (20.9%) |
| Kidney diseases | 47 (6.0%) | 36 (9.7%) | 2 (1.6%) |
| Other | 46 (5.9%) | 27 (7.2%) | 8 (6.2%) |
1 eGFR at admission could not be calculated for all individuals because bilirubin or hemoglobin strongly interfered with the creatinine measurement at admission.
Figure 1. Venn diagrams comparing identification of CKD ≥ III by laboratory results (eGFR values), discharge summaries or ICD-10 billing codes within all patients (a) and within patients with CKD ≥ III according to the reference standard (b). (a) Numbers of patients from the study cohort with CKD recognized by laboratory results (eGFR values), discharge summaries or ICD-10 billing codes. (b) Numbers of patients from the study cohort with CKD correctly recognized by laboratory results (eGFR values), discharge summaries or ICD-10 billing codes. A total of 19 patients were recognized by none of the three formal criteria, but by manual review only.
Epidemiological characteristics from patients with CKD identified by reference standard or recognized by laboratory results (eGFR values), discharge summaries or ICD-10 billing codes.
| Characteristics | Reference Standard ( | eGFR | Discharge Summaries ( | ICD-10 Billing Codes ( |
|---|---|---|---|---|
| Age, years, mean [SD] | 77.9 | 78.0 | 76.4 | 77.2 |
| Sex, male | 215 | 189 | 258 | 182 |
| eGFR at admission, | ( | 26.8 | ( | 25.7 |
| Charlson morbidity category ≥1 | 366 (98.1%) | 326 (97.9%) | 413 (98.1%) | 297 (99%) |
| ≥3 | 224 (60.1%) | 198 (59.5%) | 257 (61.1%) | 220 (73.3%) |
| Median | 3 | 3 | 3 | 3 |
1 eGFR could not be calculated for all individuals because bilirubin or hemoglobin strongly interfered with the creatinine measurement at admission.
Figure 2. Venn diagrams comparing identification of no known kidney disease (NKD) by laboratory results (eGFR values), discharge summaries or ICD-10 billing codes within all patients (a) and within patients with NKD according to the reference standard (b). (a) Numbers of patients from the study cohort with NKD recognized via the eHealth sources laboratory results (eGFR values), discharge summaries or ICD-10 billing codes. (b) Numbers of patients from the study cohort with NKD correctly recognized via laboratory results (eGFR values), discharge summaries or ICD-10 billing codes.
Epidemiological characteristics from patients with NKD identified by reference standard or recognized by laboratory results (eGFR values), discharge summaries or ICD-10 billing codes.
| Characteristics | Reference Standard ( | eGFR ( | Discharge Summaries ( | ICD-10 Billing Codes ( |
|---|---|---|---|---|
| Age, years, mean [SD] | 68.4 | 69.3 | 72.9 | 73.3 |
| Sex, male | 79 | 161 | 196 | 265 |
| eGFR at admission, | 88.6 | 84.5 | 76.0 *,1 | 69.9 *,2 |
| Charlson morbidity score ≥1 | 113 (87.6%) | 232 (91.7%) | 308 (92.2%) | 403 (92.2%) |
| ≥3 | 36 (27.9%) | 91 (36.0%) | 116 (34.7%) | 145 (33.2%) |
| Median | 2 | 2 | 2 | 2 |
* eGFR could not be calculated for all individuals because bilirubin or hemoglobin strongly interfered with the creatinine measurement at admission. 1 n = 331; 2 n = 434.
Performance of different rules for identification of patients with CKD compared to the reference standard.
| Category | Sensitivity | Specificity | PPV | NPV | AUROC | AUCPR (CI) |
|---|---|---|---|---|---|---|
| ICD-10 billing codes | 0.71 | 0.91 | 0.88 | 0.78 | 0.81 | 0.86 |
| Discharge summary | 0.86 | 0.76 | 0.76 | 0.86 | 0.81 | 0.84 |
| eGFR < 60 mL/min/1.73 m² | 0.81 | 0.92 | 0.91 | 0.84 | 0.87 | 0.90 |
| eGFR_at_admission | 0.96 | 0.75 | 0.77 | 0.95 | 0.85 | 0.88 |
| eGFR_at_discharge | 0.91 | 0.82 | 0.82 | 0.91 | 0.86 | 0.89 |
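The columns of this table follow from a 2×2 comparison of each rule against the reference standard. A minimal sketch of that arithmetic is shown below; the function name `rule_metrics` and the counts are illustrative assumptions, not values from the study. For a rule with a single yes/no output, the ROC curve has one operating point, so the AUROC reduces to the mean of sensitivity and specificity. The table agrees with this: for ICD-10 billing codes, (0.71 + 0.91) / 2 = 0.81.

```python
def rule_metrics(tp, fp, fn, tn):
    """Performance of a binary rule from its confusion-matrix counts.

    tp/fn: reference-positive patients flagged / missed by the rule
    fp/tn: reference-negative patients flagged / correctly not flagged
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    # One operating point: area under the two-segment ROC curve
    auroc = (sensitivity + specificity) / 2
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "auroc": auroc,
    }

# Invented counts for illustration only
metrics = rule_metrics(tp=80, fp=10, fn=20, tn=90)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```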
Performance of different rules for identification of patients with NKD compared to the reference standard.
| Category | Sensitivity | Specificity | PPV | NPV | AUROC | AUCPR |
|---|---|---|---|---|---|---|
| ICD-10 billing codes | 0.99 | 0.53 | 0.29 | 1 | 0.76 (0.74–0.78) | 0.64 (0.56–0.73) |
| Discharge summary | 0.98 | 0.68 | 0.38 | 1 | 0.83 (0.81–0.86) | 0.68 (0.60–0.76) |
| eGFR ≥ 60 mL/min/1.73 m² | 1.00 | 0.82 | 0.52 | 1 | 0.91 | 0.75 |
| eGFR_at_admission | 1.00 | 0.71 | 0.41 | 1.00 | 0.86 | 0.70 |
| eGFR_at_discharge | 1.00 | 0.64 | 0.35 | 1.00 | 0.82 | 0.68 |
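The AUCPR of a yes/no rule depends on how the precision-recall curve is interpolated, which the source does not state. The following is a minimal sketch under one assumed convention: linear (trapezoidal) interpolation through the points (recall 0, precision 1), (sensitivity, PPV) and (recall 1, precision = prevalence). The function name `binary_rule_aucpr` and the example inputs are hypothetical. The sketch also shows why a rule with high sensitivity but low PPV, as in the NKD rows above, yields a modest AUCPR even when its AUROC is high.

```python
def binary_rule_aucpr(sensitivity, ppv, prevalence):
    """Trapezoidal area under a three-point precision-recall curve.

    Assumed curve: (0, 1) -> (sensitivity, ppv) -> (1, prevalence),
    where recall equals sensitivity and precision equals PPV.
    """
    # Segment from recall 0 to the rule's operating point
    left = sensitivity * (1.0 + ppv) / 2
    # Segment from the operating point to "flag everyone" (precision = prevalence)
    right = (1.0 - sensitivity) * (ppv + prevalence) / 2
    return left + right

# Invented inputs for illustration only
print(round(binary_rule_aucpr(sensitivity=0.8, ppv=0.5, prevalence=0.2), 2))  # 0.67
```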
Figure 3. Area under the receiver operating characteristic (AUROC) and under the precision-recall curve (AUCPR) for simple categorical classifiers based on combinations of EHR components for CKD ≥ III (a) and NKD (b) on the test dataset. eGFR values = “eGFR”, discharge summaries = “DS” and ICD-10 billing codes = “ICD”. For the complete list of all combinations, see Supplementary Materials. Logistic regression was calculated on the training dataset. Performance is calculated on the test dataset (n = 156). * Indicates p < 0.05 for difference in AUROC compared to eGFR.
Figure 4. AUROC (a,c) and AUCPR (b,d) of the simple categorical classifier and of models calculated from logistic regression and the three ML methods for identification of CKD (a,b) and NKD (c,d) in different scenarios of data availability. (a) AUROC and (b) AUCPR for identification of CKD ≥ III; (c) AUROC and (d) AUCPR for identification of NKD. SC = simple categorical classifier, LR = logistic regression, GLMnet = generalized linear model network, RF = random forest, NN = artificial neural network. N = 156 patients (test dataset). Scenarios: (1) all data from the index hospital stay, including laboratory values, demographics, ICD-10 billing codes and ICDs from discharge summaries; (2) laboratory values and demographics from the index hospital stay; (3) and (4) include, in addition to (1) or (2), respectively, laboratory values from previous hospital stays. * Indicates p < 0.05 for difference in AUROC between SC and all other models.