Tibor V Varga, Jinxi Liu, Ronald B Goldberg, Guannan Chen, Samuel Dagogo-Jack, Carlos Lorenzo, Kieren J Mather, Xavier Pi-Sunyer, Søren Brunak, Marinella Temprosa.
Abstract
INTRODUCTION: Although various lipid and non-lipid analytes measured by nuclear magnetic resonance (NMR) spectroscopy have been associated with type 2 diabetes, a structured comparison of the ability of NMR-derived biomarkers and standard lipids to predict individual diabetes risk has not been undertaken in larger studies or among individuals at high risk of diabetes.
RESEARCH DESIGN AND METHODS: The cumulative discriminative utilities of several groups of biomarkers, including NMR lipoproteins, related non-lipid biomarkers, standard lipids, and demographic and glycemic traits, were compared for short-term (3.2 years) and long-term (15 years) diabetes development in the Diabetes Prevention Program, a multiethnic, placebo-controlled, randomized controlled trial of individuals with pre-diabetes in the USA (N=2590). Logistic regression, the Cox proportional hazards model, and six hyperparameter-tuned machine learning algorithms were compared. The Matthews Correlation Coefficient (MCC) was used as the primary measure of discriminative utility.
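As an illustration of the primary measure named in the abstract, the MCC can be computed from the four cells of a binary confusion matrix as (TP·TN − FP·FN) / √((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The sketch below is a minimal, assumed implementation and not the authors' code; the function name, the 0/1 label coding, and the example labels are hypothetical.

```python
import math


def matthews_corrcoef(y_true, y_pred):
    """Return the MCC for binary labels coded as 0 (no diabetes) and 1 (diabetes)."""
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        elif truth == 0 and pred == 1:
            fp += 1
        else:
            # Remaining case under 0/1 coding: truth == 1 and pred == 0.
            fn += 1
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention assumed here: report 0 when any marginal total is zero.
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Hypothetical labels for eight participants (not study data).
observed = [1, 1, 1, 0, 0, 0, 0, 0]
predicted = [1, 1, 0, 0, 0, 0, 0, 1]
print(round(matthews_corrcoef(observed, predicted), 3))  # 0.467
```

The value ranges from −1 (every prediction wrong) through 0 (no better than chance) to 1 (every prediction right), and unlike accuracy it uses all four cells, which matters when incident cases are the minority class.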
Keywords: diabetes mellitus, type 2; lipids; lipoproteins; prediabetic state
Year: 2021 PMID: 33789908 PMCID: PMC8016090 DOI: 10.1136/bmjdrc-2020-001953
Source DB: PubMed Journal: BMJ Open Diabetes Res Care ISSN: 2052-4897
Clinical characteristics at baseline among participants with available data (N=2590)
| Characteristic | All | Placebo | Metformin | Lifestyle | P value |
| Treatment (n) | 2590 | 867 | 865 | 858 | |
| Age | 50.8 (44.4, 58.3) | 50.4 (45.0, 57.9) | 50.7 (43.5, 59.2) | 51.0 (44.9, 57.9) | 0.508 |
| Sex | | | | | 0.441 |
| Male | 898 (34.7) | 292 (33.7) | 294 (34.0) | 312 (36.4) | |
| Female | 1692 (65.3) | 575 (66.3) | 571 (66.0) | 546 (63.6) | |
| Ethnicity | | | | | 0.290 |
| Non-Hispanic white | 1395 (53.9) | 464 (53.5) | 445 (51.4) | 486 (56.6) | |
| African American | 510 (19.7) | 170 (19.6) | 170 (19.7) | 170 (19.8) | |
| Hispanic | 428 (16.5) | 144 (16.6) | 154 (17.8) | 130 (15.2) | |
| Asian American | 116 (4.5) | 37 (4.3) | 49 (5.7) | 30 (3.5) | |
| American Indian | 141 (5.4) | 52 (6.0) | 47 (5.4) | 42 (4.9) | |
| HbA1c (%) | 5.9 (5.6, 6.2) | 5.9 (5.6, 6.2) | 5.9 (5.6, 6.2) | 5.9 (5.6, 6.2) | 0.702 |
| Fasting glucose (mg/dL) | 105.0 (100.0, 111.0) | 105.0 (100.0, 111.0) | 105.0 (100.0, 111.0) | 104.0 (100.0, 111.0) | 0.748 |
| BMI (kg/m²) | 32.6 (28.9, 37.2) | 32.8 (28.8, 37.5) | 32.5 (28.9, 36.9) | 32.6 (29.1, 37) | 0.795 |
| Waist (cm) | 103.8 (94.8, 113.2) | 103.5 (94.1, 113.2) | 103.2 (95.1, 113) | 104.4 (94.9, 113.2) | 0.779 |
| Systolic blood pressure (mm Hg) | 122.0 (113.0, 133.0) | 122.0 (113, 132.0) | 122.0 (113.0, 133.0) | 123.5 (113.0, 133.8) | 0.429 |
| Family history of diabetes | | | | | 0.786 |
| No | 803 (31.0) | 270 (31.1) | 261 (30.2) | 272 (31.7) | |
| Yes | 1787 (69.0) | 597 (68.9) | 604 (69.8) | 586 (68.3) | |
| GDM history | | | | | 0.645 |
| No | 1450 (56.0) | 491 (56.6) | 485 (56.1) | 474 (55.2) | |
| Yes | 242 (9.3) | 84 (9.7) | 86 (9.9) | 72 (8.4) | |
| Not applicable (male) | 898 (34.7) | 292 (33.7) | 294 (34.0) | 312 (36.4) | |
| Blood pressure medication | | | | | 0.569 |
| No | 2166 (83.6) | 733 (84.5) | 715 (82.7) | 718 (83.7) | |
| Yes | 424 (16.4) | 134 (15.5) | 150 (17.3) | 140 (16.3) | |
| 2-hour glucose (mg/dL) | 162.0 (149.0, 178.0) | 163.0 (149.0, 178.0) | 162.0 (150.0, 178.0) | 163.0 (149.0, 179.0) | 0.801 |
| Insulinogenic index (uU/mg) | 103.8 (66.7, 158.2) | 105.8 (66.3, 163.6) | 104.8 (67.2, 160.3) | 101.8 (66.7, 152.3) | 0.729 |
| IFI (mL/uU) | 4.2×10−2 (0, 6.2×10−2) | 4.2×10−2 (0, 6.2×10−2) | 4.4×10−2 (0, 6.2×10−2) | 4.2×10−2 (0, 6.2×10−2) | 0.600 |
| Triglycerides (mg/dL) | 144.0 (102.0, 204.0) | 149.0 (105.0, 206.5) | 142.0 (98.0, 201.0) | 142.0 (100.2, 201.0) | 0.096 |
| Total cholesterol (mg/dL) | 203.0 (179.0, 227.8) | 202.0 (177.0, 227.5) | 205.0 (180.0, 229.0) | 202.5 (178.2, 226) | 0.541 |
| LDL-C (mg/dL) | 124.0 (102.0, 145.0) | 123.0 (101.0, 146.5) | 125.0 (102.0, 145.0) | 122.0 (103.0, 144.0) | 0.720 |
| HDL-C (mg/dL) | 44.0 (37.0, 52.0) | 43.0 (37.0, 51.0) | 45.0 (37.0, 53.0) | 44.0 (38.0, 53.0) | 0.018 |
Values are medians (25th, 75th percentiles) for continuous variables and counts (percentages) for categorical variables.
P values are from Kruskal-Wallis rank sum tests (continuous variables) and χ² tests (categorical variables) and assess differences between the three treatment arms.
BMI, body mass index; GDM, gestational diabetes mellitus; HbA1c, hemoglobin A1c; HDL-C, high-density lipoprotein cholesterol; IFI, reciprocal of the fasting insulin level; LDL-C, low-density lipoprotein cholesterol.
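To make the categorical P values in the footnote concrete, the following sketch computes a Pearson χ² statistic for the Sex rows of the table (male and female counts across the placebo, metformin and lifestyle arms). It is an illustration under stated assumptions, not the authors' analysis code: the function names are hypothetical, and the P value uses the closed form exp(−χ²/2), which is valid only for 2 degrees of freedom. Under those assumptions the result is about 0.44, in line with the reported 0.441.

```python
import math


def pearson_chi_square(table):
    """Return (statistic, degrees of freedom) for a rows x columns table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if row and column membership were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


def chi_square_p_value_dof2(statistic):
    """Upper-tail P value; this closed form holds only for 2 degrees of freedom."""
    return math.exp(-statistic / 2.0)


# Counts from the Sex rows of the table: placebo, metformin, lifestyle.
sex_by_arm = [
    [292, 294, 312],  # male
    [575, 571, 546],  # female
]
statistic, dof = pearson_chi_square(sex_by_arm)
print(dof, round(statistic, 2), round(chi_square_p_value_dof2(statistic), 2))
```

Tables with other degrees of freedom (for example, ethnicity across three arms) need the general χ² survival function from a statistics library instead of the closed form used here.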
Figure 1. MCC and ROC AUC statistics across all machine learning algorithms and baseline lipid-related prediction models in relation to short-term and long-term diabetes incidence (N=2590). MCC averages are represented by circles and ROC AUC averages are represented by squares. The averages are calculated from the five obtained MCC and ROC AUC values from the five separate test sets in the nested cross-validation framework. The error bars represent SD of the five obtained MCC and ROC AUC values. The left panel shows the discriminative utilities for short-term, while the right panel shows the discriminative utilities for long-term diabetes incidence. Model 1 includes predictors: age at randomization, sex, self-reported ethnicity, all laboratory lipids, lipid-lowering medication use and treatment arm. Model 2 includes all Model 1 predictors and all baseline lipid-related NMR analytes. ANN, artificial neural network; GLM, generalized linear model (refers to logistic regression here); MCC, Matthews Correlation Coefficient; NMR, nuclear magnetic resonance; RF, random forest; ROC AUC, receiver operating characteristic area under the curve; SGB, stochastic gradient boosting; SVM-L, support vector machine with linear kernel; SVM-P, support vector machine with polynomial kernel; SVM-R, support vector machine with radial kernel; T2D, type 2 diabetes.
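The nested cross-validation summary described in this caption (one score per outer test set, then the mean and SD of the five scores) can be sketched as follows. This is a simplified stand-in and not the study pipeline: a one-dimensional k-nearest-neighbour classifier replaces the algorithms listed in the caption, the data are synthetic, the inner loop uses three folds, and the sample SD is assumed. All function names and parameters are hypothetical.

```python
import math
import random
import statistics


def mcc(y_true, y_pred):
    """Matthews Correlation Coefficient for 0/1 labels (0 if undefined)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def knn_predict(train, test_x, k):
    """Classify each test value by majority vote of its k nearest training values."""
    preds = []
    for value in test_x:
        nearest = sorted(train, key=lambda pair: abs(pair[0] - value))[:k]
        votes = sum(label for _, label in nearest)
        preds.append(1 if 2 * votes > k else 0)
    return preds


def folds(items, n_folds):
    """Round-robin split of a list into n_folds disjoint folds."""
    return [items[i::n_folds] for i in range(n_folds)]


def score(train, test, k):
    """MCC on `test` for a k-NN model that only sees `train`."""
    preds = knn_predict(train, [x for x, _ in test], k)
    return mcc([y for _, y in test], preds)


def nested_cv(data, k_grid=(1, 5, 9), outer_folds=5, inner_folds=3):
    """Return one test-set MCC per outer fold, tuning k in an inner loop."""
    outer = folds(data, outer_folds)
    test_scores = []
    for i, test in enumerate(outer):
        train = [row for j, fold in enumerate(outer) if j != i for row in fold]
        inner = folds(train, inner_folds)

        def inner_mean(k):
            # Inner loop: tune k using only the outer training data.
            return statistics.mean(
                score([r for b, f in enumerate(inner) if b != a for r in f], val, k)
                for a, val in enumerate(inner)
            )

        best_k = max(k_grid, key=inner_mean)
        # Outer loop: the held-out test fold is scored once, with the tuned k.
        test_scores.append(score(train, test, best_k))
    return test_scores


# Synthetic (value, label) pairs; the means and spread are made up for illustration.
rng = random.Random(0)
data = [(rng.gauss(115.0 if i % 2 else 100.0, 5.0), i % 2) for i in range(100)]
scores = nested_cv(data)
print(len(scores), statistics.mean(scores), statistics.stdev(scores))
```

Keeping hyperparameter tuning inside the outer training data means each of the five test-set scores comes from data the tuned model never saw, so the plotted averages are not inflated by the tuning step.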
Figure 2. (A) Univariate discriminative utilities of continuous analytes at baseline in relation to short-term and long-term diabetes incidence (N=2590). The MCC and ROC AUC values are averages, calculated from the five obtained MCC and ROC AUC values from the five separate test sets in the nested cross-validation framework using the GLM method (logistic regression). The black circles represent MCC and ROC AUC values for short-term diabetes, while the red circles represent MCC and ROC AUC values for long-term diabetes. The predictors are sorted according to their MCC values for short-term diabetes. Model 1 includes predictors: age at randomization, sex, self-reported ethnicity, all laboratory lipids, lipid-lowering medication use and treatment arm. Model 2 includes all Model 1 predictors and all baseline lipid-related NMR analytes. (B) Distributions of the six best performing univariate predictors for short-term diabetes, stratified by incident diabetes status (N=2590). The upper panel of the figure shows a schematic explanation for distributions that generally indicate good versus poor discriminative utility. The lower panel of the figure shows the density plots of the variables fasting glucose, 2-hour glucose, HbA1c, insulinogenic index, TRL size and glycine. AcAc, acetoacetate; ApoA1, apolipoprotein A1; ApoB, apolipoprotein B; BHB, beta-hydroxy-butyrate; BMI, body mass index; GlycA, glycoprotein acetylation; GLM, generalized linear model; HbA1c, hemoglobin A1c; HDL-C, high-density lipoprotein cholesterol; IFI, reciprocal of the fasting insulin level; LDL-C, low-density lipoprotein cholesterol; MCC, Matthews Correlation Coefficient; NMR, nuclear magnetic resonance; PPD, peak particle density; ROC AUC, receiver operating characteristic area under the curve; SBP, systolic blood pressure; T2D, type 2 diabetes; TC, total cholesterol; TG, triglycerides; TRL, triglyceride rich lipoprotein; TRL-C, TRL-cholesterol; TRL-G, TRL-triacylglycerol.
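The link between panel B (how far the two distributions overlap) and the ROC AUC in panel A can be illustrated with the pairwise definition of the AUC: the probability that a randomly chosen participant who developed diabetes has a higher predictor value than a randomly chosen participant who did not. The sketch below scores the raw predictor directly instead of fitting a cross-validated logistic regression as in the figure, which is a simplification; the function names and the glucose values are hypothetical.

```python
def roc_auc(cases, controls):
    """Probability that a random case value exceeds a random control value (ties count 0.5)."""
    wins = 0.0
    for case in cases:
        for control in controls:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def univariate_auc(cases, controls):
    """Direction-free AUC: a predictor that is lower in cases is just as informative."""
    auc = roc_auc(cases, controls)
    return max(auc, 1.0 - auc)


# Hypothetical fasting glucose values in mg/dL (not study data).
developed_diabetes = [103.0, 107.0, 110.0]
remained_free = [100.0, 102.0, 104.0, 108.0]
print(roc_auc(developed_diabetes, remained_free))  # 0.75
```

Fully overlapping distributions give a value near 0.5, and fully separated distributions give 1.0, which is the contrast the schematic in the upper part of panel B describes.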
Figure 3. MCC and ROC AUC statistics across all machine learning algorithms and all prediction models in relation to short-term and long-term diabetes incidence (N=2590). MCC averages are represented by circles and ROC AUC averages are represented by squares. The averages are calculated from the five obtained MCC and ROC AUC values from the five separate test sets in the nested cross-validation framework. The error bars represent SD of the five obtained MCC and ROC AUC values. The left panel shows the discriminative utilities for short-term, while the right panel shows the discriminative utilities for long-term diabetes incidence. This figure demonstrates the model results for all baseline models (N=2590). ANN, artificial neural network; GLM, generalized linear model (refers to logistic regression here); MCC, Matthews Correlation Coefficient; RF, random forest; ROC AUC, receiver operating characteristic area under the curve; SGB, stochastic gradient boosting; SVM-L, support vector machine with linear kernel; SVM-P, support vector machine with polynomial kernel; SVM-R, support vector machine with radial kernel; T2D, type 2 diabetes.