Zeid Khitan, Tanmay Nath, Prasanna Santhanam.
Abstract
Albuminuria and estimated glomerular filtration rate (e-GFR) are early markers of renal disease and cardiovascular outcomes in persons with diabetes. Although body composition has been shown to predict systolic blood pressure, its value in predicting albuminuria is unknown. In this study, we used machine learning methods to assess the risk of albuminuria in persons with diabetes from body composition and other determinants of metabolic health. The study is a comparative analysis of different methods for predicting albuminuria in persons with diabetes mellitus who are older than 40 years of age, using the baseline characteristics of the LOOK AHEAD study cohort. The predictors of albuminuria were age, different metrics of body composition, duration of diabetes, hemoglobin A1c, serum creatinine, serum triglycerides, serum cholesterol, serum HDL, serum LDL, maximum exercise capacity, systolic blood pressure (SBP), diastolic blood pressure (DBP), and the ankle-brachial index. We used the area under the receiver operating characteristic curve (AUC) to compare the classification results of the different algorithms. The AUCs of the models were as follows: random forest classifier, 0.65; gradient boosting classifier, 0.61; logistic regression, 0.66; support vector classifier, 0.61; multilayer perceptron, 0.67; and stacking classifier, 0.62. Using the random forest model, we show that the duration of diabetes, A1c, serum triglycerides, SBP, maximum exercise capacity, serum creatinine, subtotal lean mass, DBP, and subtotal fat mass are important features for the classification of albuminuria. In summary, when applied to metabolic imaging (using DXA), machine learning techniques offer unique insights into the risk factors that determine the development of albuminuria in diabetes.
Keywords: albuminuria; diabetes; machine learning; metabolic syndrome; proteinuria
Year: 2021 PMID: 34847294 PMCID: PMC8696217 DOI: 10.1111/jch.14397
Source DB: PubMed Journal: J Clin Hypertens (Greenwich) ISSN: 1524-6175 Impact factor: 3.738
The value of the tuned parameters used for the machine learning algorithms
| Algorithm | Feature space | Tuned model |
|---|---|---|
| SVC | 'C': [0.01, 0.02, 0.03, 0.04, 0.05, 0.005] | SVC(C=0.03, break_ties=False, cache_size=200, class_weight=None, coef0=0.0, decision_function_shape='ovr', degree=3, gamma='auto', kernel='rbf', max_iter=-1, probability=True, random_state=42, shrinking=True, tol=0.001, verbose=False) |
| RFC | 'min_samples_leaf': [1, 2, 3, 4, 5], 'min_samples_split': [2, 3, 4, 5], 'n_estimators': [80, 100, 120] | RandomForestClassifier(bootstrap=True, ccp_alpha=0.0, class_weight=None, criterion='entropy', max_depth=4, max_features='auto', max_leaf_nodes=None, max_samples=None, min_impurity_decrease=0.0, min_impurity_split=None, min_samples_leaf=4, min_samples_split=2, min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=None, oob_score=False, random_state=42, verbose=0, warm_start=False) |
| GBC | 'learning_rate': [0.01, 0.001, 0.0001], 'n_estimators': [80, 100, 120], 'min_samples_split': [1, 2, 3, 4, 5], 'min_samples_leaf': [2, 3, 4, 5] | GradientBoostingClassifier(ccp_alpha=0.0, criterion='friedman_mse', init=None, learning_rate=0.01, loss='deviance', max_depth=4, max_features='auto', max_leaf_nodes=None, min_impurity_decrease=0.0, min_impurity_split=None, min_samples_leaf=4, min_samples_split=2, min_weight_fraction_leaf=0.0, n_estimators=120, n_iter_no_change=None, presort='deprecated', random_state=42, subsample=1.0, tol=0.0001, validation_fraction=0.1, verbose=0, warm_start=False) |
| LR | 'penalty': ['l1', 'l2'], 'C': [0.1, 1, 5, 10, 50, 100, 1000] | LogisticRegression(C=0.1, class_weight=None, dual=False, fit_intercept=True, intercept_scaling=1, l1_ratio=None, max_iter=1000, multi_class='auto', n_jobs=None, penalty='l2', random_state=42, solver='lbfgs', tol=0.0001, verbose=0, warm_start=False) |
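The table pairs each algorithm with a candidate grid and the configuration that was selected from it. A minimal sketch of how such a search could be run with scikit-learn's `GridSearchCV` is given here for the support vector classifier row. The synthetic data, the train/test split, the 5-fold setting, and the `roc_auc` scoring choice are assumptions for illustration only; the record does not state how the authors ran their search.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Hypothetical stand-in for the study data: 13 predictors, roughly 18% positives.
X, y = make_classification(
    n_samples=400, n_features=13, n_informative=5,
    weights=[0.82, 0.18], random_state=42,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# Candidate values of C copied from the support vector classifier row of the table.
param_grid = {"C": [0.005, 0.01, 0.02, 0.03, 0.04, 0.05]}

# kernel and gamma follow the tuned model in the table; cv=5 and
# scoring="roc_auc" are assumptions. ROC AUC scoring uses decision_function,
# so probability=True is not needed for the search itself.
search = GridSearchCV(
    SVC(kernel="rbf", gamma="auto", random_state=42),
    param_grid,
    scoring="roc_auc",
    cv=5,
)
search.fit(X_train, y_train)

best_C = search.best_params_["C"]
print("best C:", best_C, "mean CV AUC:", round(search.best_score_, 3))
```

On real data the selected value depends on the cohort and the folds, so this sketch is not expected to reproduce `C = 0.03`.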
Descriptive statistics of over 1300 participants showing the different factors and their distribution
| Parameter | Mean | SD | Min | 25% | 50% | 75% | Max |
|---|---|---|---|---|---|---|---|
| Subtotal Lean (g) | 50 618.84 | 10 026.94 | 28 947.80 | 42 762.02 | 49 439.23 | 57 956.04 | 80 409.82 |
| Subtotal Fat (g) | 38 873.46 | 10 400.89 | 17 980.00 | 31 008.20 | 36 900.25 | 45 606.53 | 72 435.67 |
| Diabetes Duration (years) | 6.63 | 6.20 | 0.00 | 2.00 | 5.00 | 10.00 | 46.00 |
| Age (years) | 58.38 | 6.59 | 45.00 | 55.00 | 58.00 | 63.00 | 75.00 |
| A1C (%) | 7.31 | 1.21 | 4.70 | 6.40 | 7.10 | 7.98 | 12.50 |
| Serum Creatinine (mg/dL) | 0.80 | 0.20 | 0.40 | 0.70 | 0.80 | 0.90 | 1.80 |
| Serum Triglycerides (mg/dL) | 194.16 | 131.48 | 21.00 | 115.00 | 165.00 | 233.75 | 1527.00 |
| Total Cholesterol (mg/dL) | 194.43 | 37.07 | 82.00 | 167.00 | 192.00 | 217.00 | 405.00 |
| HDL Cholesterol (mg/dL) | 43.40 | 11.63 | 15.00 | 35.00 | 42.00 | 50.00 | 112.00 |
| Maximum Exercise Capacity (METs) | 7.47 | 1.94 | 3.70 | 6.00 | 7.15 | 8.70 | 15.30 |
| Systolic Blood Pressure (mmHg) | 129.85 | 17.22 | 77.00 | 117.00 | 129.00 | 141.50 | 209.50 |
| Diastolic Blood Pressure (mmHg) | 69.85 | 9.41 | 42.50 | 63.50 | 70.00 | 76.50 | 100.00 |
| Ankle Brachial Index (ratio) | 1.17 | 0.14 | 0.67 | 1.08 | 1.16 | 1.24 | 2.68 |
| Albumin to Creatinine Ratio | 0.18 | 0.38 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 |
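The columns of this table (mean, SD, minimum, quartiles, maximum) match what pandas' `DataFrame.describe` reports, so a summary of this shape could be produced as sketched here. The column names and the randomly generated values are hypothetical placeholders, not the study data. The 0/1 coding of the outcome column is also an assumption, suggested by the last row's range of 0 to 1 with a mean of 0.18.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# Hypothetical placeholder data; names and distributions are illustrative only.
df = pd.DataFrame({
    "a1c_pct": rng.normal(7.3, 1.2, 1300),
    "sbp_mmhg": rng.normal(130, 17, 1300),
    "albuminuria": rng.binomial(1, 0.18, 1300),  # assumed 0/1 indicator
})

# describe() returns count, mean, std, min, 25%, 50%, 75%, max per column.
# Transpose so each variable is a row, then keep the columns the table shows.
summary = (
    df.describe()
      .T[["mean", "std", "min", "25%", "50%", "75%", "max"]]
      .rename(columns={"mean": "Mean", "std": "SD", "min": "Min", "max": "Max"})
      .round(2)
)
print(summary)
```

For a 0/1 column the mean is the share of positives, which is why such a row can have quartiles of 0 and a maximum of 1.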
FIGURE 1. The correlation matrix after removing the highly correlated variables of body composition
FIGURE 2. The confusion matrices of the different machine learning models in the training dataset
FIGURE 3. The confusion matrices of the different machine learning models in the testing dataset
FIGURE 4. The results of the cross‐validation of the different models
FIGURE 5. The ROC curves of the different models showing the areas under the curve (AUCs)
FIGURE 6. The feature selection based on the level of importance (based on the Random Forest Classifier)
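Figures 5 and 6 report a held-out AUC for each model and a feature ranking from the random forest. The sketch here shows one way those two quantities could be computed with scikit-learn, for two of the six models. The synthetic data, the split, and the `feature_i` names are hypothetical; the estimator settings are taken from the tuned-parameter table where it lists them, and everything else is left at library defaults.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the study data: 13 predictors, roughly 18% positives.
X, y = make_classification(
    n_samples=600, n_features=13, n_informative=5,
    weights=[0.82, 0.18], random_state=42,
)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]  # placeholder names
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

models = {
    "random_forest": RandomForestClassifier(
        n_estimators=100, criterion="entropy", max_depth=4,
        min_samples_leaf=4, min_samples_split=2, random_state=42,
    ),
    "logistic_regression": LogisticRegression(C=0.1, max_iter=1000),
}

# Fit each model and score the predicted probability of the positive class.
aucs = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    proba = model.predict_proba(X_test)[:, 1]
    aucs[name] = roc_auc_score(y_test, proba)

# Impurity-based importances from the fitted forest, sorted from high to low.
importances = models["random_forest"].feature_importances_
ranking = sorted(zip(feature_names, importances), key=lambda p: p[1], reverse=True)

for name, auc in aucs.items():
    print(f"{name}: AUC = {auc:.2f}")
for name, score in ranking[:5]:
    print(f"{name}: importance = {score:.3f}")
```

Impurity-based importances sum to 1 across features and can favor continuous, high-variance predictors, so a ranking like the one in Figure 6 is best read as relative rather than absolute.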