Ross S Kleiman1,2, Eric R LaRose1, Jonathan C Badger1,3, David Page2,3, Michael D Caldwell4, James A Clay5, Peggy L Peissig1.
Abstract
Calciphylaxis is a disorder that results in necrotic cutaneous lesions with a high rate of mortality. Due to its rarity and complexity, the risk factors for and the disease mechanism of calciphylaxis are not fully understood. This work focuses on the use of machine learning to both predict disease risk and model the contributing factors learned from an electronic health record data set. We present the results of four modeling approaches on several subpopulations of patients with chronic kidney disease (CKD). We find that modeling calciphylaxis risk with random forests learned from binary feature data produces strong models, and in the case of predicting calciphylaxis development among stage 4 CKD patients, we achieve an AUC-ROC of 0.8718. This ability to successfully predict calciphylaxis may provide an excellent opportunity for clinical translation of the predictive models presented in this paper.
Year: 2018 PMID: 29888059 PMCID: PMC5961821
Source DB: PubMed Journal: AMIA Jt Summits Transl Sci Proc
Description of the five experimental patient populations. For each population, we list the associated ICD-9 and ICD-10 diagnosis codes, the number of case and control patients, and the total number of non-zero features on their records that were included in the model.
| | Any Stage | Stage 3 | Stage 4 | Stage 5 | ESRD |
|---|---|---|---|---|---|
| ICD-9 Code(s) | 585-585.9 | 585.3 | 585.4 | 585.5 | 585.6 |
| ICD-10 Code(s) | N18-N18.9 | N18.3 | N18.4 | N18.5 | N18.6 |
| # Features | 9,288 | 5,037 | 6,662 | 5,864 | 6,974 |
| # Cases | 38 | 10 | 15 | 12 | 17 |
| # Controls | 363 | 100 | 148 | 117 | 165 |
| # Patients (cases + controls) | 401 | 110 | 163 | 129 | 182 |
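The case and control counts above imply a case prevalence of roughly 9% in every population. That figure is a useful yardstick when reading AUC-PR and precision values, because a classifier that ranks patients at random is expected to reach a precision close to the prevalence. The sketch below only redoes that arithmetic from the table; the names `populations` and `case_prevalence` are illustrative and do not come from the paper.

```python
# Case and control counts copied from the population table above.
populations = {
    "Any Stage": (38, 363),
    "Stage 3": (10, 100),
    "Stage 4": (15, 148),
    "Stage 5": (12, 117),
    "ESRD": (17, 165),
}


def case_prevalence(cases, controls):
    """Fraction of patients in a population who are calciphylaxis cases."""
    return cases / (cases + controls)


for name, (cases, controls) in populations.items():
    total = cases + controls
    prevalence = case_prevalence(cases, controls)
    print(f"{name}: {total} patients, prevalence {prevalence:.3f}")
```

For the any-stage population this gives 38 / 401, or about 0.095, so a precision of 0.15 to 0.25 at 80% recall is roughly two to three times the chance level.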
AUC-ROC, AUC-PR, accuracy, and precision at 80% recall (Pre. @ 80%) for each experimental condition and algorithm. For each experimental condition, the best-performing algorithm is highlighted in bold if it is statistically significantly better than the other three algorithms, as determined by a sign test. Only binary random forests achieved statistical significance, and they did so in two experimental conditions. For CKD stage 4, binary random forests outperformed their competitors with p ≤ 3.78e-04. For ESRD, binary random forests outperformed their competitors with p ≤ 1.92e-04.
| Stage | Model | Features | AUC-ROC | AUC-PR | Accuracy | Pre. @ 80% |
|---|---|---|---|---|---|---|
| 3 | Logistic Regression | Binary | 0.789 | 0.189 | 0.545 | 0.190 |
| | | Continuous | 0.785 | 0.215 | 0.545 | 0.164 |
| | Random Forest | Binary | 0.846 | 0.261 | 0.545 | 0.212 |
| | | Continuous | 0.847 | 0.279 | 0.545 | 0.230 |
| 4 | Logistic Regression | Binary | 0.791 | 0.190 | 0.515 | 0.188 |
| | | Continuous | 0.635 | 0.125 | 0.528 | 0.131 |
| | Random Forest | Binary | | | | |
| | | Continuous | 0.791 | 0.200 | 0.546 | 0.194 |
| 5 | Logistic Regression | Binary | 0.707 | 0.194 | 0.543 | 0.164 |
| | | Continuous | 0.689 | 0.169 | 0.532 | 0.195 |
| | Random Forest | Binary | 0.796 | 0.212 | 0.535 | 0.248 |
| | | Continuous | 0.770 | 0.184 | 0.535 | 0.213 |
| ESRD | Logistic Regression | Binary | 0.746 | 0.168 | 0.538 | 0.168 |
| | | Continuous | 0.750 | 0.198 | 0.549 | 0.170 |
| | Random Forest | Binary | | | | |
| | | Continuous | 0.838 | 0.425 | 0.538 | 0.244 |
| Any | Logistic Regression | Binary | 0.713 | 0.188 | 0.534 | 0.156 |
| | | Continuous | 0.734 | 0.162 | 0.541 | 0.173 |
| | Random Forest | Binary | 0.738 | 0.199 | 0.536 | 0.151 |
| | | Continuous | 0.725 | 0.185 | 0.536 | 0.156 |
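The table caption reports significance from a sign test. As a rough illustration of how such a test works, the sketch below compares two algorithms on paired scores, counts how often the first one wins, and computes the probability of at least that many wins if each run were a fair coin flip. The source does not say how runs were paired or whether the test was one- or two-sided, so the one-sided form, the function name `sign_test_p_value`, and the example scores are all assumptions made for illustration.

```python
from math import comb


def sign_test_p_value(scores_a, scores_b):
    """One-sided sign test p-value for 'A tends to beat B' on paired scores."""
    if len(scores_a) != len(scores_b):
        raise ValueError("scores must be paired, one value per run")
    wins = sum(a > b for a, b in zip(scores_a, scores_b))
    losses = sum(a < b for a, b in zip(scores_a, scores_b))
    n = wins + losses  # tied pairs carry no information and are dropped
    if n == 0:
        return 1.0
    # Probability of seeing `wins` or more successes in n fair coin flips.
    return sum(comb(n, k) for k in range(wins, n + 1)) / 2 ** n


# Hypothetical paired AUC-ROC values from five repeated runs (not from the paper).
forest = [0.86, 0.88, 0.85, 0.87, 0.89]
logistic = [0.79, 0.80, 0.78, 0.81, 0.77]
print(sign_test_p_value(forest, logistic))  # 5 wins out of 5, i.e. 1/32
```

Because the test looks only at which algorithm wins each pair and not by how much, it makes no assumption about the distribution of the score differences.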
Figure 1. ROC and PR curves for the binary-feature random forest across the five experimental conditions.
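Two of the reported metrics, AUC-ROC and precision at 80% recall, can be computed directly from a list of true labels and predicted risk scores. The sketch below is one plain-Python way to do so, assuming labels of 1 for cases and 0 for controls. It takes "precision at 80% recall" to mean the precision at the first score threshold where recall reaches 0.8, which is an assumption because the source does not define the metric further. The function names and the toy data are illustrative only.

```python
def auc_roc(labels, scores):
    """Probability that a random case is scored above a random control."""
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    # Count case/control pairs ranked correctly; ties count as half.
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


def precision_at_recall(labels, scores, target_recall=0.8):
    """Precision at the highest threshold whose recall reaches target_recall.

    Tied scores are processed in sorted order rather than as one group.
    """
    n_cases = sum(labels)
    if n_cases == 0:
        raise ValueError("need at least one case")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    true_positives = 0
    for rank, (_, label) in enumerate(ranked, start=1):
        true_positives += label
        if true_positives / n_cases >= target_recall:
            return true_positives / rank


# Toy data for illustration only (not from the paper).
labels = [1, 1, 0, 1, 0, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
print(auc_roc(labels, scores))
print(precision_at_recall(labels, scores))
```

On the toy data, 18 of the 25 case/control pairs are ranked correctly, and four of the five cases are recovered within the top six patients.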
Ranking of the top 10 random forest model features for each of the 5 CKD experiments: any CKD stage, stage 3 CKD, stage 4 CKD, stage 5 CKD, and ESRD. For each experiment, feature importance values were first averaged across the k cross-validation folds, and those fold averages were then averaged across the 30 repetitions to produce a final importance value for each feature.
| Rank | Any Stage | Stage 3 | Stage 4 | Stage 5 | ESRD |
|---|---|---|---|---|---|
| 1 | Hepatitis B Surface Ag | Obesity | Anemia in CKD | Obesity | Thyroxine (T4) |
| 2 | Anemia in CKD | Lactescence/Chylomicrons | Morbid Obesity | Morbid Obesity | Obesity |
| 3 | Secondary Hyperparathyroidism of Renal Origin | Chylomicrons | Thyroid Stimulating Hormone-Reg’l C | Parathyroid Hormone (PTH), 1-84 | Amylase-Pancreatic |
| 4 | Direct Microscopy | ALT (GPT) | Obesity | Radiologic exam knee complete 4/more views | Age |
| 5 | Ulcer of Lower Limb, Unspecified | Differential Polychromatophili | Chylomicrons | Instrument Neutrophil # | Frac.O2Hb, Arterial |
| 6 | Iron Defic Anemia Nos | Differential Poikilocytosis | Lactescence/Chylomicrons | Age | Arthropathy, unspecified |
| 7 | Hepatitis B Surface (HBs) Ab | Uric Acid, Blood | Thyroxine (T4) | Uric Acid, Bld | Prothrombin Time (PT) |
| 8 | % O2 Saturation | Differential Activated Lymph | Secondary Hyperparathyroidism of Renal Origin | Non-HDL Cholesterol | Abdominal Pain |
| 9 | Differential Poikilocytosis | Sodium serum plasma or whole blood | Prescription transmit via erx system | Prescription transmit via erx system | Chronic Liver Disease Nec |
| 10 | Blood Urea Nitrogen-Post-Dial | Platelet Estimate | Hypercholesterolemia | Skin Suture Nec | Blood count complete automated |
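The two-level averaging described in the caption of this table (over folds within a repetition, then over repetitions) can be sketched as follows. The nested-list layout `runs[repetition][fold]`, the function names, and the toy importance values are assumptions made for illustration. So is the choice to treat a feature that is absent from a fold as contributing zero importance there, which the source does not specify.

```python
from collections import defaultdict


def average_importances(runs):
    """Average importances over folds, then over repetitions.

    runs[r][f] is a dict mapping feature name -> importance for
    repetition r, fold f. Features absent from a fold count as 0.
    """
    final = defaultdict(float)
    for folds in runs:
        fold_totals = defaultdict(float)
        for fold in folds:
            for feature, value in fold.items():
                fold_totals[feature] += value
        # Mean over this repetition's folds, weighted equally across repetitions.
        for feature, total in fold_totals.items():
            final[feature] += (total / len(folds)) / len(runs)
    return dict(final)


def top_features(importances, k=10):
    """Names of the k features with the highest averaged importance."""
    return sorted(importances, key=importances.get, reverse=True)[:k]


# Toy values for two repetitions of two folds each (not from the paper).
runs = [
    [{"Obesity": 0.4, "Age": 0.2}, {"Obesity": 0.2, "Age": 0.2}],
    [{"Obesity": 0.3, "Age": 0.1}, {"Obesity": 0.3}],
]
averaged = average_importances(runs)
print(top_features(averaged, k=2))
```

With equally sized folds the result matches a single average over all fold-level values, but keeping the two steps separate mirrors the procedure as the caption describes it.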