Ross K K Leung, Ying Wang, Ronald C W Ma, Andrea O Y Luk, Vincent Lam, Maggie Ng, Wing Yee So, Stephen K W Tsui, Juliana C N Chan.
Abstract
BACKGROUND: Multi-causality and heterogeneity of phenotypes and genotypes characterize complex diseases. In a database with a comprehensive collection of phenotypes and genotypes, we compared the performance of common machine learning methods in generating mathematical models to predict diabetic kidney disease (DKD).
Year: 2013 PMID: 23879411 PMCID: PMC3726338 DOI: 10.1186/1471-2369-14-162
Source DB: PubMed Journal: BMC Nephrol ISSN: 1471-2369 Impact factor: 2.388
Baseline characteristics of 673 Chinese patients with type 2 diabetes, stratified by the onset of diabetic kidney disease (DKD) after a median follow-up period of 8 years
| Characteristic | No DKD | DKD | P value |
|---|---|---|---|
| Number | 554 | 119 | |
| Age (years) | 56 (47 to 63) | 64 (58 to 69) | <0.001b |
| Male sex | 39.7% (220) | 47.9% (57) | 0.100a |
| Age of onset (years) | 45 (38 to 55) | 53 (45 to 60) | <0.001b |
| Duration of diabetes (years) | 9 (2 to 13) | 10 (6 to 13) | 0.032b |
| Smoking | | | 0.003a |
| Ex smokers | 31.8% (176) | 20.2% (24) | |
| Current smokers | 9.7% (54) | 20.2% (24) | |
| BMI (kg/m²) | 24.7 (22.3 to 27.0) | 25.2 (22.7 to 27.2) | 0.160b |
| Waist circumference, men (cm) | 88.0 (83.0 to 92.8) | 89.0 (84.0 to 96.0) | 0.325b |
| Waist circumference, women (cm) | 83.0 (77.0 to 89.0) | 85.0 (77.8 to 93.0) | 0.109b |
| Waist to hip ratio | 0.88 (0.84 to 0.92) | 0.91 (0.87 to 0.96) | <0.001b |
| Systolic BP (mmHg) | 132 (120 to 145) | 156 (140 to 171) | <0.001b |
| Diastolic BP (mmHg) | 77 (70 to 85) | 83 (76 to 93) | <0.001b |
| HbA1c (%) | 7.5 (6.7 to 8.7) | 7.6 (6.6 to 8.8) | 0.919b |
| Fasting plasma glucose (mmol/L) | 7.9 (6.5 to 10.5) | 8.3 (6.2 to 10.1) | 0.849b |
| LDL cholesterol (mmol/L) | 3.20 (2.70 to 3.90) | 3.70 (2.80 to 4.38) | 0.008b |
| HDL cholesterol (mmol/L) | 1.20 (1.00 to 1.50) | 1.11 (0.90 to 1.40) | <0.001b |
| Triglyceride (mmol/L) | 1.25 (0.87 to 1.95) | 1.87 (1.14 to 2.55) | <0.001b |
| Total cholesterol (mmol/L) | 5.3 (4.6 to 6.0) | 5.7 (4.8 to 6.7) | 0.001b |
| White blood cell count (×10⁹/L) | 7.0 (5.8 to 8.3) | 7.7 (6.8 to 9.1) | <0.001b |
| ACR (mg/mmol) | 1.5 (0.8 to 4.7) | 245.6 (81.5 to 423.4) | <0.001b |
| eGFR (ml/min/1.73 m²) | 119.9 (101.7 to 138.1) | 38.0 (27.0 to 48.9) | <0.001b |
| Lipid lowering drugs | 5.4% (30) | 23.5% (28) | <0.001a |
| ACEI/ARB | 6.1% (34) | 28.6% (34) | <0.001a |
| Other blood pressure lowering drugs | 21.3% (118) | 63.9% (76) | <0.001a |
| Oral blood glucose lowering drugs | 53.2% (295) | 36.1% (43) | 0.001a |
| Insulin | 16.8% (93) | 39.5% (47) | <0.001a |
a Derived from Chi-square test, % (N); b Mann–Whitney two-sample test, median (25th to 75th percentiles).
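Abbreviations: ACEI, angiotensin-converting enzyme inhibitor; ACR, albumin-to-creatinine ratio; ARB, angiotensin receptor blocker; BMI, body mass index; BP, blood pressure; eGFR, estimated glomerular filtration rate; HbA1c, glycated hemoglobin; HDL, high density lipoprotein; LDL, low density lipoprotein.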
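As a worked check of footnote a, the sketch below recomputes the Chi-square test for the "Male sex" row from the counts in the table (220 of 554 versus 57 of 119). It assumes an uncorrected Pearson test on a 2×2 table, which the record does not state; the function name `chi_square_2x2` is illustrative only. Under that assumption the result agrees with the reported P value of 0.100.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns (statistic, p_value) with 1 degree of freedom.
    """
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    observed = ((a, b), (c, d))
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom the chi-square tail probability is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# "Male sex" row: 220 of 554 in the first group, 57 of 119 in the second.
stat, p = chi_square_2x2(220, 554 - 220, 57, 119 - 57)
print(f"chi-square = {stat:.2f}, p = {p:.3f}")
```

The Mann–Whitney rows (footnote b) cannot be rechecked this way, because that test needs the individual patient values rather than medians and quartiles.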
Figure 1. Ten-fold cross-validation predictive performance by different machine learning methods in the DKD training dataset using A) clinical and genetic attributes, B) genetic-only attributes, C) clinical-only attributes. Abbreviations: svmradial, support vector machine using radial basis kernel function; rpart, recursive partitioning and regression trees; nnet, feed-forward neural networks and multinomial log-linear models; nb, naïve Bayes classifier; cforest, random forest utilizing conditional inference trees as base learners; C5.0 Tree, C5.0 decision tree; pls, partial least squares regression.
Figure 2. Prediction accuracy by different machine learning methods in the DKD training and testing datasets using A) clinical and genetic attributes, B) genetic-only attributes, C) clinical-only attributes. Pink and blue circles represent prediction accuracy on the training and testing data, respectively.
Figure 3. Prediction performance by support vector machine (svmRadial) and random forest (cforest) using the 10 and 5 most frequently selected clinical and genetic attributes, respectively. A) prediction accuracy in the DKD training and testing datasets, B) ranking of importance of attributes based on svmRadial, C) ranking of importance of attributes based on cforest.
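The ten-fold cross-validation named in the Figure 1 caption can be illustrated with a minimal sketch. This is not the authors' pipeline: it uses only the Python standard library, a hand-written Gaussian naïve Bayes classifier standing in for the "nb" method, and synthetic two-feature data invented for the example. All function names and the data generator are assumptions.

```python
import math
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_gaussian_nb(X, y):
    """Estimate a prior plus per-feature mean and variance for each class."""
    model = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        n_feat = len(rows[0])
        means = [sum(r[j] for r in rows) / len(rows) for j in range(n_feat)]
        variances = [
            sum((r[j] - means[j]) ** 2 for r in rows) / len(rows) + 1e-9
            for j in range(n_feat)
        ]
        model[label] = (len(rows) / len(y), means, variances)
    return model


def predict_gaussian_nb(model, x):
    """Return the class with the highest log posterior for one sample."""
    def log_posterior(params):
        prior, means, variances = params
        total = math.log(prior)
        for v, m, s2 in zip(x, means, variances):
            total += -0.5 * math.log(2 * math.pi * s2) - (v - m) ** 2 / (2 * s2)
        return total
    return max(model, key=lambda label: log_posterior(model[label]))


def cross_validate(X, y, k=10, seed=0):
    """Train on k-1 folds, score on the held-out fold, repeat for every fold."""
    accuracies = []
    for held_out in k_fold_indices(len(X), k, seed):
        held = set(held_out)
        train = [i for i in range(len(X)) if i not in held]
        model = fit_gaussian_nb([X[i] for i in train], [y[i] for i in train])
        hits = sum(predict_gaussian_nb(model, X[i]) == y[i] for i in held_out)
        accuracies.append(hits / len(held_out))
    return accuracies


# Synthetic stand-in data (hypothetical, not the study cohort):
# two Gaussian features whose mean shifts with the class label.
rng = random.Random(42)
X, y = [], []
for _ in range(200):
    label = int(rng.random() < 0.3)
    centre = 1.5 if label else 0.0
    X.append([rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)])
    y.append(label)

scores = cross_validate(X, y, k=10)
print(f"mean 10-fold accuracy: {sum(scores) / len(scores):.3f}")
```

Each sample is held out exactly once, so the mean of the ten fold accuracies estimates performance on unseen data. Comparing it with accuracy on a separate testing set is the kind of contrast Figure 2 describes.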