Yiannis Kokkinos, JoAnn Morrison, Richard Bradley, Theodoros Panagiotakos, Jennifer Ogeer, Dennis Chew, Ciaran O'Flynn, Geert De Meyer, Phillip Watson, Ilias Tagkopoulos.
Abstract
The aim of this study was to derive a model to predict the risk of dogs developing chronic kidney disease (CKD) using data from electronic health records (EHR) collected during routine veterinary practice. Data from 57,402 dogs were included in the study. Two thirds of the EHRs were used to build the model, which included feature selection and identification of the optimal neural network type and architecture. The remaining unseen EHRs were used to evaluate model performance. The final model was a recurrent neural network with 6 features (creatinine, blood urea nitrogen, urine specific gravity, urine protein, weight, age). When identifying CKD at the time of diagnosis, the model displayed a sensitivity of 91.4% and a specificity of 97.2%. When predicting future risk of CKD, model sensitivity was 68.8% at 1 year and 44.8% at 2 years before diagnosis. Positive predictive value (PPV) varied between 15% and 23% and was influenced by the age of the patient, while the negative predictive value (NPV) remained above 99% under all tested conditions. While the modest PPV limits its use as a stand-alone diagnostic screening tool, high specificity and NPV make the model particularly effective at identifying patients that will not go on to develop CKD.
Year: 2022; PMID: 36008537; PMCID: PMC9411602; DOI: 10.1038/s41598-022-18793-6
Source DB: PubMed; Journal: Sci Rep; ISSN: 2045-2322; Impact factor: 4.996
Figure 1. Schematic representation of EHR processing. Starting from the EHR subset with sufficient information for classification (‘Included EHR’), EHR are classified into 3 CKD status groups based on clinical diagnosis, risk and other pointers for CKD in the record. Subsequently, records with sufficient visits and blood/urine data prior to the evaluation visit are selected. Finally, the EHR are split randomly into training and test sets. For the ‘CKD probable’ status group, the training set is not shown as it was not used in the study. EHR, electronic health record.
Figure 2. Schematic representation of CKD status assignment, EHR data use, and reference time T0 scaling for 3 hypothetical dog EHR profiles. CKD, chronic kidney disease; EHR, electronic health record.
Demographics and summaries for the study data set at the time of evaluation (T0).
| Measure | Training: CKD | Training: No CKD | Test: CKD | Test: No CKD |
|---|---|---|---|---|
| Number of dogs | 15,299 | 22,944 | 7,515 | 11,644 |
| Mean visits per dog | 15.18 | 11.99 | 15.22 | 11.91 |
| Male to female ratio | 1:1.1 | 1:0.93 | 1:1.1 | 1:0.94 |
| Mean (SD) age (years) at T0 | 11.56 (3.37) | 7.19 (2.93) | 11.59 (3.37) | 7.12 (2.90) |
| Mean (SD) weight (kg) at T0 | 13.04 (11.20) | 15.10 (12.42) | 13.34 (11.32) | 14.81 (12.11) |
| Mean (SD) BUN (mg/dL) at T0 | 56.23 (32.72) | 17.38 (5.59) | 55.86 (32.80) | 17.50 (5.78) |
| Mean (SD) creatinine (mg/dL) at T0 | 2.67 (1.86) | 1.09 (0.29) | 2.67 (1.88) | 1.09 (0.27) |
| Mean (SD) urine protein (mg/dL) at T0 | 91.01 (180.81) | 49.52 (136.21) | 94.29 (190.89) | 48.61 (146.51) |
| Mean (SD) USG at T0 | 1.020 (0.011) | 1.039 (0.012) | 1.020 (0.011) | 1.039 (0.012) |
| Percent missing creatinine | 11.32% | 7.26% | 10.91% | 7.72% |
| Percent missing USG | 57.52% | 60.83% | 57.58% | 60.89% |
Mean and standard deviation (SD) are shown for continuous measures. USG, urine-specific gravity; BUN, blood urea nitrogen.
Figure 3. Distribution of age, creatinine, BUN and USG at the time of evaluation (T0) in the study EHR set, differentiated by CKD status. Dotted lines on the creatinine and USG plots show the limits used for staging in the IRIS guidelines. CKD, chronic kidney disease; IRIS, International Renal Interest Society; USG, urine-specific gravity; BUN, blood urea nitrogen.
Figure 4. Model sensitivity and specificity with 95% confidence intervals as a function of age at evaluation.
CKD prevalence, model sensitivity, specificity, positive predictive value and negative predictive value estimates for CKD prediction at the time of diagnosis for different life stage groups.
| Life stage | Sensitivity (%) | Specificity (%) | Prevalence (%) | PPV (%) | NPV (%) |
|---|---|---|---|---|---|
| Adult | 62.42 | 99.69 | 0.15 | 22.86 | 99.94 |
| Mature | 82.40 | 97.99 | 0.48 | 16.57 | 99.91 |
| Senior | 94.99 | 92.88 | 1.46 | 16.47 | 99.92 |
| Geriatric | 98.90 | 69.52 | 5.21 | 15.14 | 99.91 |
Adult is defined as 1.5 to 6.5 years, mature as 6.5 to 9.75 years, senior as 9.75 to 13 years, and geriatric as over 13 years. Disease prevalence is estimated from the life stage groups within the study population, and model performance is determined using the test data set.
Figure 5. Model sensitivity at T0 with 95% confidence intervals as a function of the number of visits in the EHR before the time of diagnosis T0 (a), and model sensitivity with 95% confidence intervals as a function of the time before diagnosis, where each prediction was made using only the data available up to that point (b).
Model predictive performance metrics.
| Years before T0 | Sensitivity (%) | Specificity (%) | Accuracy (%) | PPV (%) | NPV (%) |
|---|---|---|---|---|---|
| 0 | 91.42 | 97.16 | 93.04 | 21.22 | 99.93 |
| 0.5 | 80.22 | 97.63 | 90.34 | 22.07 | 99.83 |
| 1 | 68.79 | 98.18 | 84.83 | 24.02 | 99.73 |
| 1.5 | 55.00 | 98.51 | 78.79 | 23.59 | 99.62 |
| 2 | 44.76 | 98.95 | 73.03 | 26.29 | 99.54 |
| 2.5 | 35.64 | 99.36 | 68.62 | 31.78 | 99.46 |
| 3 | 26.48 | 99.49 | 61.81 | 30.28 | 99.39 |
| 3.5 | 23.20 | 99.65 | 59.85 | 35.67 | 99.36 |
For all PPV and NPV calculations, the prevalence was fixed at 0.83% (overall data set prevalence).
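The dependence of PPV and NPV on a fixed prevalence can be illustrated with the standard Bayes relationship between sensitivity, specificity and prevalence. The record does not state how the authors computed these values, so the sketch below is an assumption: the function name `predictive_values` is hypothetical, and the inputs are taken from the "0 years before T0" row together with the stated prevalence of 0.83%.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) as fractions; all inputs are fractions in [0, 1].

    Standard Bayes relationship; assumed here, not confirmed as the
    authors' exact procedure.
    """
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Row "0 years before T0": sensitivity 91.42%, specificity 97.16%,
# prevalence fixed at 0.83% (overall data set prevalence).
ppv, npv = predictive_values(0.9142, 0.9716, 0.0083)
print(f"PPV = {100 * ppv:.2f}%, NPV = {100 * npv:.2f}%")
# prints: PPV = 21.22%, NPV = 99.93%
```

Under this assumption the result reproduces the tabulated PPV and NPV for that row. Other rows may differ in the last digit because the published sensitivity and specificity are rounded. The calculation also shows why PPV stays modest despite high specificity: at a prevalence below 1%, even a small false-positive rate produces more false positives than true positives.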