Alexander Decruyenaere, Philippe Decruyenaere, Patrick Peeters, Frank Vermassen, Tom Dhaene, Ivo Couckuyt.
Abstract
BACKGROUND: Predictive models for delayed graft function (DGF) after kidney transplantation are usually developed using logistic regression. We want to evaluate the value of machine learning methods in the prediction of DGF.
Year: 2015 PMID: 26466993 PMCID: PMC4607098 DOI: 10.1186/s12911-015-0206-y
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 2.796
Optimal hyper-parameters after exhaustive grid search
| Statistical method | Hyper-parameter | Values |
|---|---|---|
| Decision tree | Class weights | auto, 0 to 0.20 and 1 to 0.80, 0 to 0.10 and 1 to 0.90, 0 to 0.05 and 1 to 0.95 |
| | Maximum depth | 1 to 10 (8) |
| | Minimum samples split | 2 to nVars+1 (18) |
| | Maximum features | auto, sqrt, log2 |
| Random forest | Number of estimators | 1000 |
| | Class weights | auto, 0 to 0.20 and 1 to 0.80, 0 to 0.10 and 1 to 0.90, 0 to 0.05 and 1 to 0.95 |
| | Maximum depth | 1 to 10 (9) |
| | Minimum samples split | 2 to nVars+1 (24) |
| | Maximum features | auto, sqrt, log2 |
| Random forest (full) | Number of estimators | 1000 |
| | Class weights | auto, 0 to 0.20 and 1 to 0.80, 0 to 0.10 and 1 to 0.90, 0 to 0.05 and 1 to 0.95 |
| | Maximum depth | 1 to 10 (1) |
| | Minimum samples split | 2 to nVars+1 (63) |
| | Maximum features | auto, sqrt, log2 |
| Gradient boosting | Number of estimators | 1000 |
| | Maximum depth | 1 to 10 (1) |
| | Minimum samples split | 2 to nVars+1 (9) |
| | Maximum features | auto, sqrt, log2 |
| | Learning rate | 0.1, 0.05, 0.02, 0.01 |
| LDA | Number of components | None or 1 to nVars+1 |
| QDA | Regularizing parameter | 0 to 1 (0.89) |
| Linear SVM | Class weights | auto, 0 to 0.20 and 1 to 0.80, 0 to 0.10 and 1 to 0.90, 0 to 0.05 and 1 to 0.95 |
| | C | 0.001, 0.01, 0.1, 1, 10, 100, 1000 |
| Radial SVM | Class weights | auto, 0 to 0.20 and 1 to 0.80, 0 to 0.10 and 1 to 0.90, 0 to 0.05 and 1 to 0.95 |
| | C | 0.001, 0.01, 0.1, 1, 10, 100, 1000 |
| | Gamma | 0.1, 0.01, 0.001, 0.0001 |
| Polynomial SVM | Class weights | auto, 0 to 0.20 and 1 to 0.80, 0 to 0.10 and 1 to 0.90, 0 to 0.05 and 1 to 0.95 |
| | C | 0.001, 0.01, 0.1, 1, 10, 100, 1000 |
| | Gamma | 0.1, 0.01, 0.001, 0.0001 |
| Logistic regression | Class weights | auto, 0 to 0.20 and 1 to 0.80, 0 to 0.10 and 1 to 0.90, 0 to 0.05 and 1 to 0.95 |
| | C | 0.001, 0.01, 0.1, 1, 10, 100, 1000 |
The hyper-parameters that are not described in this table are set to the default values used in the scikit-learn library [27].
Abbreviations: LDA linear discriminant analysis, QDA quadratic discriminant analysis, SVM support vector machine
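As an illustration of how such an exhaustive grid search might be set up in scikit-learn, the following is a minimal sketch for the decision tree only. Several things here are assumptions rather than details from the table: the data are synthetic, the grid is thinned to keep the run short, `roc_auc` is assumed as the selection criterion, entries such as "0 to 0.20 and 1 to 0.80" are read as per-class weights, and `"balanced"` stands in for the older `"auto"` class-weight option. The `"auto"` value for maximum features is left out because newer scikit-learn releases no longer accept it.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced stand-in data; NOT the study's data set.
X, y = make_classification(
    n_samples=300, n_features=10, weights=[0.87, 0.13], random_state=0
)
n_vars = X.shape[1]

# Thinned version of the decision-tree grid (assumption: fewer values per axis).
param_grid = {
    "class_weight": [
        "balanced",  # assumed modern counterpart of "auto"
        {0: 0.20, 1: 0.80},
        {0: 0.10, 1: 0.90},
        {0: 0.05, 1: 0.95},
    ],
    "max_depth": [1, 4, 7, 10],
    "min_samples_split": [2, 6, n_vars + 1],
    "max_features": ["sqrt", "log2"],
}

search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid,
    scoring="roc_auc",  # assumption: the selection criterion is not stated here
    cv=StratifiedKFold(n_splits=10, shuffle=True, random_state=0),
)
search.fit(X, y)  # every parameter combination is evaluated on every fold
print(search.best_params_, round(search.best_score_, 3))
```

Every hyper-parameter not listed in `param_grid` keeps its library default, which mirrors the note under the table.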
Baseline characteristics (n = 497)
| Characteristic | Value |
|---|---|
| Donor | |
| Sex | |
| male | 60.4 % (300) |
| female | 39.6 % (197) |
| Subtype | |
| DBD | 90.3 % (449) |
| DCD | 9.7 % (48) |
| Age (year) | 42.6 ± 14.77 |
| Terminal SCr (mg/dL) | 0.878 ± 0.4757 |
| Preservation/Operation | |
| Preservation solution | |
| HTK | 31.0 % (154) |
| HTK + UW | 0.2 % (1) |
| UW | 68.6 % (341) |
| missing | 0.2 % (1) |
| CIT (hour) | 14.19 ± 4.328 |
| WIT (min) | 22.3 ± 7.09 |
| Recipient | |
| Sex | |
| male | 66.6 % (331) |
| female | 33.4 % (166) |
| Modality of dialysis | |
| hemodialysis | 71.2 % (354) |
| peritoneal dialysis | 22.7 % (113) |
| pre-emptive | 6.0 % (30) |
| HLA mismatches | |
| 0 | 8.9 % (44) |
| 1 | 7.8 % (39) |
| 2 | 26.4 % (131) |
| 3 | 40.8 % (203) |
| 4 | 10.9 % (54) |
| 5 | 4.0 % (20) |
| 6 | 1.2 % (6) |
| Age (year) | 52.8 ± 11.68 |
| Duration of dialysis (year) | 2.7 ± 1.68 |
| PRA at time of Tx (%) | 2.7 ± 11.44 |
Abbreviations: CIT cold ischemia time, DBD donor after brain death, DCD donor after cardiac/circulatory death, HLA human leukocyte antigen, HTK histidine-tryptophan-ketoglutarate, PRA panel reactive antibody, SCr serum creatinine, Tx transplantation, UW University of Wisconsin, WIT warm ischemia time
Performance of the statistical methods after 10-fold stratified cross-validation
| Statistical method | Sensitivity, no DGF (%) | Sensitivity, DGF (%) | PPV, no DGF (%) | PPV, DGF (%) | AUROC (%) |
|---|---|---|---|---|---|
| Decision tree | 75.4 ± 6.64 | 29.5 ± 16.29 | 88.2 ± 2.73 | 14.2 ± 8.13 | 52.5 ± 8.55 |
| Gradient boosting | 98.8 ± 1.55 | 16.2 ± 12.94 | 89.2 ± 1.67 | 58.3 ± 38.19 | 77.2 ± 9.64 |
| Random forest | 96.3 ± 4.05 | 16.4 ± 14.92 | 89.0 ± 2.09 | 43.9 ± 38.19 | 73.9 ± 9.94 |
| Random forest (full) | 100.0 ± 0.00 | 0.0 ± 0.00 | 87.5 ± 0.64 | 0.0 ± 0.00 | 71.6 ± 12.38 |
| LDA | 94.7 ± 2.92 | 27.6 ± 15.10 | 90.2 ± 2.00 | 42.3 ± 19.94 | 82.2 ± 6.14 |
| QDA | 89.9 ± 5.35 | 37.6 ± 17.26 | 91.0 ± 2.55 | 37.9 ± 20.82 | 79.6 ± 7.55 |
| Linear SVM | 72.0 ± 6.29 | 83.8 ± 7.51 | 96.9 ± 1.34 | 30.6 ± 5.60 | 84.3 ± 4.11 |
| Radial SVM | 57.9 ± 7.45 | 88.8 ± 7.38 | 97.2 ± 1.87 | 23.6 ± 4.14 | 83.3 ± 4.05 |
| Polynomial SVM | 97.5 ± 1.90 | 10.9 ± 12.20 | 88.5 ± 1.14 | 24.0 ± 24.17 | 79.8 ± 5.33 |
| Logistic regression | 65.0 ± 8.25 | 85.5 ± 8.94 | 96.9 ± 1.84 | 26.5 ± 4.75 | 81.7 ± 5.82 |
Abbreviations: AUROC area under the receiver operating characteristic curve, DGF delayed graft function, LDA linear discriminant analysis, PPV positive predictive value, QDA quadratic discriminant analysis, SVM support vector machine
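To make the per-class columns concrete, here is a small sketch of how sensitivity and PPV can be computed for either class from predicted labels. The function name and the toy labels are hypothetical, and returning 0.0 when a class is never predicted is an assumed convention (it would be consistent with a row that shows 0.0 for both DGF columns). Note that the sensitivity for the "No DGF" class is the same quantity as the specificity for DGF.

```python
def per_class_metrics(y_true, y_pred, label):
    """Return (sensitivity, PPV) treating `label` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    # Assumed convention: report 0.0 when the denominator is empty.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    ppv = tp / (tp + fp) if (tp + fp) else 0.0
    return sensitivity, ppv


# Toy labels (1 = DGF, 0 = no DGF); illustrative only, not study data.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 1, 1, 1, 0]

sens_dgf, ppv_dgf = per_class_metrics(y_true, y_pred, label=1)
sens_no_dgf, ppv_no_dgf = per_class_metrics(y_true, y_pred, label=0)
print(sens_dgf, ppv_dgf)        # DGF: 1 of 2 cases found, 1 of 3 alarms correct
print(sens_no_dgf, ppv_no_dgf)  # no DGF: 6 of 8 found, 6 of 7 predictions correct
```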
Fig. 1. Receiver operating characteristic curves after 10-fold stratified cross-validation. Abbreviations: AUROC area under the receiver operating characteristic curve, LDA linear discriminant analysis, QDA quadratic discriminant analysis, SVM support vector machine
Fig. 2. P-values (%) of pairwise model comparison using Wilcoxon signed-rank test. Abbreviations: LDA linear discriminant analysis, QDA quadratic discriminant analysis, SVM support vector machine
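A pairwise comparison of this kind could be sketched with `scipy.stats.wilcoxon`, which performs a paired Wilcoxon signed-rank test. The assumption here is that the pairs are per-fold AUROC values of two models evaluated on the same 10 folds; the caption does not say what was paired, and the numbers below are invented placeholders, not results from the study.

```python
from scipy.stats import wilcoxon

# Hypothetical per-fold AUROC values for two models on the same 10 folds.
auroc_model_a = [0.84, 0.80, 0.88, 0.79, 0.86, 0.83, 0.90, 0.81, 0.85, 0.87]
auroc_model_b = [0.80, 0.79, 0.83, 0.805, 0.79, 0.81, 0.84, 0.78, 0.77, 0.78]

# Two-sided paired test on the fold-wise differences.
statistic, p_value = wilcoxon(auroc_model_a, auroc_model_b)

# Report the p-value as a percentage, matching the figure's unit.
print(f"W = {statistic}, p = {100 * p_value:.1f} %")
```

Repeating the call for every pair of models would fill a matrix like the one in the figure.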
Weights of the selected features
| Feature | Odds ratio (LR)ᵃ | Z-score (linear SVM)ᵃ | Gini index (RF)ᵇ | Rank (RF)ᵇ |
|---|---|---|---|---|
| Donor | | | | |
| Age (per 1 year) | 1.060 | 0.744 | 0.037 | (#9) |
| BMI (per 1 kg/m²) | 0.751 | −1.700 | 0.023 | (#20) |
| Terminal SCr (per 1 mg/dL) | 6.512 | 1.126 | 0.024 | (#17.5) |
| Hypotensive episodes: yes | 1.784 | 0.165 | 0.001 | (#48.5) |
| Diabetes mellitus: yes | 0.013 | −1.041 | 0.001 | (#48.5) |
| History of hypertension: yes | 3.585 | 0.940 | 0.011 | (#28) |
| Donor after cardiac death: yes | 25.789 | 1.534 | 0.080 | (#1) |
| Preservation/Operation | | | | |
| Machine perfusion: yes | 0.003 | −1.078 | 0.000 | (#60) |
| Perioperative graft reperfusionᶜ | 0.740 | −0.844 | 0.027 | (#14.5) |
| Preservation solution | | | | |
| HTK + UW | 0.00005 | −0.510 | 0.000 | (#60) |
| UW | 0.080 | −1.557 | 0.016 | (#25) |
| HTK | 0.050 | −1.725 | 0.007 | (#32.5) |
| Male donor-to-female recipient: yes | 0.352 | −0.750 | 0.019 | (#23) |
| Recipient | | | | |
| BMI (per 1 kg/m²) | 1.144 | 0.941 | 0.054 | (#4) |
| Duration of dialysis (per 1 day) | 1.0005 | 0.324 | 0.057 | (#3) |
| PRA at time of Tx (per 1 %) | 0.977 | −0.557 | 0.008 | (#30.5) |
| Peak PRA (per 1 %) | 1.017 | 0.585 | 0.025 | (#16) |
| Acute CNI toxicity: yes | 22.044 | 0.964 | 0.007 | (#32.5) |
| Reduced cardiac function: yes | 5.570 | 0.897 | 0.033 | (#13) |
| Impaired ECV: yes | 0.003 | −1.141 | 0.000 | (#60) |
| Urinary tract obstruction: yes | 6.638 | 0.942 | 0.004 | (#38.5) |
| Iliac artery | | | | |
| normal | 1.520 | 0.221 | 0.001 | (#48.5) |
| atheromatosis | 2.389 | 0.573 | 0.006 | (#34.5) |
| stenosis | 28.465 | 0.948 | 0.037 | (#9) |
ᵃ Fitted on the reduced data set
ᵇ Fitted on the full data set. Tied rank amongst all 68 features is given in parentheses
ᶜ Perioperative graft reperfusion is an ordinal feature (poor – patchy – moderate – good)
Abbreviations: BMI body mass index, CNI calcineurin inhibitor, ECV effective circulating volume, HTK histidine-tryptophan-ketoglutarate, LR logistic regression, PRA panel reactive antibody, RF random forest, SCr serum creatinine, SVM support vector machine, Tx transplantation, UW University of Wisconsin
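Two details of the table can be illustrated with a short sketch. First, a logistic regression odds ratio is the exponential of the fitted coefficient and scales multiplicatively with the size of the change, so a per-day odds ratio can be converted to a per-year one. Second, half-integer ranks such as #48.5 are what average ranking of tied values would produce; that this is how the ranks were derived is an assumption. The helper names are hypothetical, and because the tabulated odds ratios are rounded, the rescaled value is only approximate.

```python
import math


def rescale_odds_ratio(or_per_unit, units):
    """Odds ratio for a change of `units`, given the odds ratio per 1 unit."""
    return math.exp(units * math.log(or_per_unit))


def average_ranks(importances):
    """Rank values in descending order; tied values share their mean rank."""
    order = sorted(range(len(importances)), key=lambda i: -importances[i])
    ranks = [0.0] * len(importances)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next value ties with the current one.
        while (end + 1 < len(order)
               and importances[order[end + 1]] == importances[order[pos]]):
            end += 1
        mean_rank = (pos + 1 + end + 1) / 2  # mean of 1-based positions
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


# Duration of dialysis: odds ratio 1.0005 per day, rescaled to 365 days.
or_per_year = rescale_odds_ratio(1.0005, 365)
print(round(or_per_year, 2))  # roughly 1.20

# Toy importances with one tie; the tied entries share rank 3.5.
print(average_ranks([0.080, 0.001, 0.057, 0.001]))  # [1.0, 3.5, 2.0, 3.5]
```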