Tian Chen, Pamela Brewster, Katherine R Tuttle, Lance D Dworkin, William Henrich, Barbara A Greco, Michael Steffes, Sheldon Tobe, Kenneth Jamerson, Karol Pencina, Joseph M Massaro, Ralph B D'Agostino, Donald E Cutlip, Timothy P Murphy, Christopher J Cooper, Joseph I Shapiro.
Abstract
BACKGROUND: Data derived from the Cardiovascular Outcomes in Renal Atherosclerotic Lesions (CORAL) study were analyzed with machine learning methods to predict the composite endpoint described in the original study.
Keywords: cardiovascular disease; chronic kidney disease; glomerular filtration rate; hypertension; ischemic renal disease; renal artery stenosis
Year: 2019 PMID: 30962703 PMCID: PMC6433104 DOI: 10.2147/IJNRD.S194727
Source DB: PubMed Journal: Int J Nephrol Renovasc Dis ISSN: 1178-7058
ROC values (area under the ROC curve) achieved with training and testing sets
| Method | ROC (training), % | ROC (testing), % |
|---|---|---|
| GLM | 62.8±1.3 | 62.7±3.7 |
| SVM | 63.1±1.3 | 65.3±4.1 |
| RPART | 52.4±1.5 | 53.0±1.2 |
| nnet | 59.8±1.7 | 63.1±3.2 |
| RF | 67.7±1.9 | 68.1±4.3 |

Pairwise comparisons of training ROC (P values)

| Method | GLM | nnet | RF | RPART |
|---|---|---|---|---|
| nnet | <0.01 | | | |
| RF | <0.01 | <0.01 | | |
| RPART | <0.01 | <0.01 | <0.01 | |
| SVM | NS | <0.01 | <0.01 | <0.01 |

Pairwise comparisons of testing ROC (P values)

| Method | GLM | nnet | RF | RPART |
|---|---|---|---|---|
| nnet | NS | | | |
| RF | <0.01 | <0.05 | | |
| RPART | <0.01 | <0.01 | <0.01 | |
| SVM | NS | NS | NS | <0.01 |
Notes: Results are expressed as mean ± SD of n=10 trials, each using a different seed value to split the CORAL data set into training and testing subsets. ANOVA showed highly significant differences among methods for both training and testing ROC. Pairwise comparisons of group means used the Holm–Sidak correction for multiple comparisons; significance is reported as NS, P<0.05, or P<0.01.
Abbreviations: CORAL, Cardiovascular Outcomes in Renal Atherosclerotic Lesions; GLM, generalized linear model; NS, nonsignificant; ROC, receiver operating characteristic; nnet, neural network; RF, random forest; RPART, recursive partitioning; SVM, support vector machine.
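With five methods there are 10 pairwise comparisons per table, and the Holm–Sidak step-down correction adjusts all 10 together. The raw pairwise P values are not reported, so the sketch below uses hypothetical inputs; it is the textbook adjusted-P-value form of the procedure, not the authors' own software, and the helper names `holm_sidak` and `label` are illustrative. The m P values are sorted in ascending order, the i-th smallest is replaced by 1 - (1 - p)^(m - i + 1), and a running maximum keeps the adjusted values monotone.

```python
def holm_sidak(pvalues):
    """Holm-Sidak step-down adjusted P values, returned in the input order."""
    m = len(pvalues)
    # Indices of the P values from smallest to largest
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # rank is 0-based, so m - rank hypotheses are still being tested at this step
        adj = 1.0 - (1.0 - pvalues[idx]) ** (m - rank)
        # Step-down rule: an adjusted P value never falls below an earlier one
        running_max = max(running_max, adj)
        adjusted[idx] = min(running_max, 1.0)
    return adjusted


def label(p_adj):
    """Map an adjusted P value onto the NS / P<0.05 / P<0.01 labels used in the table."""
    if p_adj < 0.01:
        return "<0.01"
    if p_adj < 0.05:
        return "<0.05"
    return "NS"


if __name__ == "__main__":
    # Hypothetical raw P values for four comparisons (not taken from the study)
    raw = [0.001, 0.012, 0.03, 0.20]
    for p, adj in zip(raw, holm_sidak(raw)):
        print(f"raw={p:.3f} adjusted={adj:.4f} label={label(adj)}")
```

With these inputs the labels come out as <0.01, <0.05, NS and NS: the raw value of 0.03 would pass an unadjusted 0.05 threshold but not the corrected one.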
Figure 1. Representative ROC curves generated by the different models with a seed of 2. Red is the generalized linear model, green the support vector machine, blue the decision tree (RPART), orange the neural network, and purple the random forest.
Abbreviation: ROC, receiver operating characteristic.
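The ROC values in the table are areas under the ROC curve expressed as percentages, so 50% is chance-level discrimination and 100% is perfect separation. A minimal pure-Python sketch follows, using made-up scores rather than CORAL data and not the authors' own code (all function names are illustrative). It builds the (false-positive rate, true-positive rate) points that a curve like those in Figure 1 is drawn from, integrates them with the trapezoid rule, and checks the result against the equivalent rank (Mann–Whitney) reading of the AUC: the probability that a randomly chosen positive case scores higher than a randomly chosen negative one.

```python
def roc_points(scores, labels):
    """(false-positive rate, true-positive rate) pairs from sweeping the decision threshold."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    # Lower the threshold through every distinct score, highest first
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def trapezoid_area(points):
    """Area under a piecewise-linear curve given as (x, y) points sorted by x."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


def auc_rank(scores, labels):
    """AUC as P(random positive outscores random negative); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Made-up predicted risks and outcomes (1 = composite endpoint), not CORAL data
    scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
    labels = [1, 1, 0, 1, 0, 0]
    print(trapezoid_area(roc_points(scores, labels)))
    print(auc_rank(scores, labels))
```

Both calls give 8/9 (about 0.889) for this toy data, i.e. an ROC value of roughly 89% in the table's units.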
Confusion matrices for the different models
| Method | True neg (n) | False neg (n) | False pos (n) | True pos (n) | Sens (%) | Spec (%) | Acc (%) |
|---|---|---|---|---|---|---|---|
| GLM | 68 | 29 | 15 | 14 | 33 | 82 | 65 |
| SVM | 81 | 39 | 2 | 4 | 9 | 98 | 67 |
| RPART | 72 | 35 | 11 | 8 | 19 | 87 | 63 |
| nnet | 60 | 24 | 23 | 19 | 44 | 72 | 63 |
| RF | 80 | 30 | 3 | 13 | 30 | 96 | 74 |
Notes: Results are from the analysis using seed 2 to generate the training and testing sets. Sens refers to sensitivity for detecting a composite outcome (true pos/[true pos + false neg]). Spec refers to specificity for excluding a composite outcome (true neg/[true neg + false pos]), and Acc refers to the accuracy of the assignment ([true pos + true neg]/total). Although results are shown only for seed 2, results with other seeds were very similar, differing by only a few percentage points.
Abbreviations: GLM, generalized linear model; neg, negative; pos, positive; nnet, neural network; RF, random forest; RPART, recursive partitioning; SVM, support vector machine.
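The sensitivity, specificity and accuracy definitions in the notes above translate directly into code. In this minimal sketch (an illustration, not the authors' code; `confusion_metrics` is a hypothetical name), the reported GLM percentages of 33, 82 and 65 follow from 14 true positives, 29 false negatives, 68 true negatives and 15 false positives. Every model row sums to the same 126 test cases, 43 with the composite outcome and 83 without.

```python
def confusion_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity and accuracy (%) from confusion-matrix counts."""
    sensitivity = 100 * tp / (tp + fn)                # true pos / (true pos + false neg)
    specificity = 100 * tn / (tn + fp)                # true neg / (true neg + false pos)
    accuracy = 100 * (tp + tn) / (tp + fn + tn + fp)  # correct assignments / all cases
    return sensitivity, specificity, accuracy


if __name__ == "__main__":
    # Counts consistent with the reported GLM row (seed 2)
    sens, spec, acc = confusion_metrics(tp=14, fn=29, tn=68, fp=15)
    print(round(sens), round(spec), round(acc))  # prints: 33 82 65
```

Because only 43 of the 126 test cases had the outcome, a model that rarely predicts it (the SVM, with 4 true positives) can still reach high specificity and about two-thirds accuracy, which is why sensitivity is reported alongside accuracy.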
Top four most important variables in each model
| Method | 1 | 2 | 3 | 4 |
|---|---|---|---|---|
| GLM | SBP | Chol | Htn Meds | Potassium |
| SVM | SBP | Creat | Cyst C | eGFR |
| RPART | SBP | Protein | HbA1c | Diabetes |
| nnet | LDL | TIA | DBP | Creat |
| RF | SBP | Creat | HbA1c | DBP |
Notes: Data derived from seed 2. Results were similar with other seeds for all models.
Abbreviations: Chol, cholesterol; Creat, creatinine; Cyst C, cystatin C; DBP, diastolic blood pressure; eGFR, estimated glomerular filtration rate; GLM, generalized linear model; HbA1c, glycated hemoglobin; Htn, hypertension; LDL, low-density lipoprotein; nnet, neural network; RF, random forest; RPART, recursive partitioning; SBP, systolic blood pressure; SVM, support vector machine; TIA, transient ischemic attack.
| Variable | Description |
|---|---|
| “Age.at.Enrollment” | Age of subject |
| “HTN.Total.Meds.Baseline” | Number of antihypertensive meds at baseline |
Abbreviations: CKD, chronic kidney disease; conc., concentration; HCRI, Harvard Clinical Research Institute; MI, myocardial infarction.