
Predictive utilities of lipid traits, lipoprotein subfractions and other risk factors for incident diabetes: a machine learning approach in the Diabetes Prevention Program.

Tibor V Varga¹,²,³, Jinxi Liu⁴, Ronald B Goldberg⁵, Guannan Chen⁴, Samuel Dagogo-Jack⁶, Carlos Lorenzo⁷, Kieren J Mather⁸, Xavier Pi-Sunyer⁹, Søren Brunak², Marinella Temprosa¹⁰.

Abstract

INTRODUCTION: Although various lipid and non-lipid analytes measured by nuclear magnetic resonance (NMR) spectroscopy have been associated with type 2 diabetes, a structured comparison of the ability of NMR-derived biomarkers and standard lipids to predict individual diabetes risk has not been undertaken in larger studies nor among individuals at high risk of diabetes.
RESEARCH DESIGN AND METHODS: Cumulative discriminative utilities of various groups of biomarkers including NMR lipoproteins, related non-lipid biomarkers, standard lipids, and demographic and glycemic traits were compared for short-term (3.2 years) and long-term (15 years) diabetes development in the Diabetes Prevention Program, a multiethnic, placebo-controlled, randomized controlled trial of individuals with pre-diabetes in the USA (N=2590). Logistic regression, Cox proportional hazards model and six different hyperparameter-tuned machine learning algorithms were compared. The Matthews Correlation Coefficient (MCC) was used as the primary measure of discriminative utility.
RESULTS: Models with baseline NMR analytes and their changes did not improve the discriminative utility of simpler models including standard lipids or demographic and glycemic traits. Across all algorithms, models with baseline 2-hour glucose performed the best (max MCC=0.36). Sophisticated machine learning algorithms performed similarly to logistic regression in this study.
CONCLUSIONS: NMR lipoproteins and related non-lipid biomarkers were associated but did not augment discrimination of diabetes risk beyond traditional diabetes risk factors except for 2-hour glucose. Machine learning algorithms provided no meaningful improvement for discrimination compared with logistic regression, which suggests a lack of influential latent interactions among the analytes assessed in this study.
TRIAL REGISTRATION NUMBER: Diabetes Prevention Program: NCT00004992; Diabetes Prevention Program Outcomes Study: NCT00038727.
© Author(s) (or their employer(s)) 2021. Re-use permitted under CC BY-NC. No commercial re-use. See rights and permissions. Published by BMJ.


Keywords: diabetes mellitus, type 2; lipids; lipoproteins; prediabetic state

Year: 2021 PMID: 33789908 PMCID: PMC8016090 DOI: 10.1136/bmjdrc-2020-001953

Source DB: PubMed Journal: BMJ Open Diabetes Res Care ISSN: 2052-4897

A large number of lipid and lipoprotein biomarkers demonstrate robust associations with type 2 diabetes, and certain lipid biomarkers such as triglycerides and high-density lipoprotein cholesterol are components of established clinical type 2 diabetes risk prediction models. High-throughput, large-scale, low-cost assessments of previously unconsidered biomarkers, such as full nuclear magnetic resonance (NMR)-derived biomarker panels, are becoming commonplace. Lipoproteins and other NMR-derived analytes do not offer clinically meaningful improvement in the prediction of type 2 diabetes compared with standard laboratory lipids or a minimal model that comprised age, sex, ethnicity and fasting glucose in a population of individuals with pre-diabetes. Association is not prediction: while numerous biomarkers demonstrate robust statistical associations with type 2 diabetes, their cumulative discriminative utility can be low. Baseline postprandial 2-hour glucose levels offer a meaningful improvement in discriminating future diabetes when compared with simpler models including fasting glucose. Future studies should evaluate NMR-derived analytes and biomarkers from other ‘omics’ profiles with regard to their discriminatory utilities for type 2 diabetes in diverse, prospective, general population cohorts using statistically appropriate predictive models.

Introduction

Lipid and lipoprotein abnormalities are well-established risk factors for type 2 diabetes.1 Elevated triglyceride and reduced high-density lipoprotein cholesterol (HDL-C) levels have been shown to associate with incident diabetes after adjustment of other standard diabetes risk factors,2–4 and triglyceride values >250 mg/dL and HDL-C <35 mg/dL have been recommended by the American Diabetes Association as screening criteria for pre-diabetes and diabetes5 and are routinely used in diabetes risk scores.4 Furthermore, alterations in lipoprotein size and concentration such as those characterized by nuclear magnetic resonance (NMR) have been found to associate with incident diabetes6–10 and have been shown to identify insulin resistance-based dyslipoproteinemia early in the course of diabetes development.6 7 In most studies to date, these associations remained statistically significant after adjusting for standard lipid measurements.6–9 While alterations in lipids and lipoproteins demonstrate reproducible, robust statistical associations with type 2 diabetes, it is unknown whether standard lipid measurements have predictive utility (good classification of future cases) for diabetes incidence and whether lipoproteins improve prediction over standard lipids or standard diabetes risk factors.11 This is especially pertinent in subjects at high risk of developing diabetes, such as those with pre-diabetes, where improved individualized prediction12 might allow more targeted implementation of prevention strategies. The statistical methods underlying evaluation of risk factor association differ from those used for assessing outcome prediction. 
Specifically, to properly assess whether a biomarker can classify an individual correctly according to whether they eventually develop a disease or not requires specific statistical testing pertaining to outcome prediction and discriminative utility13 14 using measures such as the receiver operating characteristic area under the curve (ROC AUC) and other metrics. We leveraged data from the Diabetes Prevention Program (DPP) and the Diabetes Prevention Program Outcomes Study (DPPOS) to evaluate the predictive utility of standard lipid measurements as well as NMR-measured lipoprotein size and concentration for incident diabetes. The DPP was a randomized clinical trial that tested the effect of lifestyle and metformin interventions compared with placebo in preventing diabetes in a large cohort with pre-diabetes who were at high risk of diabetes development.15 We evaluated whether lipid measures added predictive utility to standard glycemic, anthropometric and other established risk factors in the three intervention groups. Since these interventions have significant effects on metabolic markers,16 we included in the analysis both baseline measures and their changes 1 year after randomization, and we tested whether these factors predicted incident diabetes differently in participants who progressed relatively rapidly compared with those who progressed more slowly. In addition, because the NMR method has been extended to include several novel non-lipid biomarkers that have been shown to associate with diabetes,17 these were tested as well. Lastly, based on the assumption that state-of-the-art machine learning algorithms might have advantages over logistic regression models when latent interactions exist in the data matrix,18 we examined whether there were differences in discriminative utilities using a range of standard statistical and machine learning algorithms. All models were internally validated using a robust, nested cross-validation framework.

Materials and methods

Participants

The DPP was a multiethnic, multicenter, randomized controlled trial (RCT) located in the USA. Initially, 3234 individuals with fasting glucose levels 95–125 mg/dL and impaired glucose tolerance who were overweight or obese were randomized into four arms: intensive lifestyle intervention, metformin, troglitazone and placebo control.15 19 The troglitazone arm was subsequently terminated due to side effects. Individuals in the metformin arm received 850 mg metformin two times per day, and those in the lifestyle arm received individual and group-based counseling and were encouraged to maintain a moderate level of physical activity and reduce their dietary fat consumption.15 The placebo arm received general advice on healthy lifestyle habits. The primary endpoint of the DPP was type 2 diabetes incidence, assessed semiannually by a fasting glucose test and annually by an oral glucose tolerance test, and the RCT was terminated at 3.2 years.20 The DPPOS was established as a continuation of the DPP. By maintaining the three original intervention groups, the main aim of the DPPOS was to investigate whether the treatment effects on diabetes would translate into long-lasting health effects.21 After removing individuals who were initially randomized to the troglitazone arm and those with no NMR analytes measured, the total sample size for this study was 2590 at baseline. All participants provided written, informed consent.

Standard laboratory and NMR methods

Information on basic and clinical variables in the DPP has been reported elsewhere.19 22 In brief, anthropometric measures, blood pressure and clinical data were collected using standard methods. Measures of insulin, glycemia and standard lipids were obtained at the Central Biochemistry Laboratory (Northwest Lipid Research Laboratories, University of Washington, Seattle, Washington).16 The reciprocal of the fasting insulin level (1/FI or IFI) was used as a marker of insulin resistance, and the insulinogenic index (Δ-insulin (30–0 min)/Δ-glucose (30–0 min)) was used as a marker of insulin secretion. The insulinogenic index was determined during an oral glucose tolerance test.23 Lipoprotein subclass concentrations and lipoprotein sizes at randomization (the beginning of the DPP) and 1 year after randomization were quantified by NMR spectroscopy at LipoScience using fasting heparin samples stored at −70°C.16 Laboratory lipids included serum triglycerides, total cholesterol, HDL-C and low-density lipoprotein cholesterol (LDL-C) levels. NMR analytes included lipid/lipoprotein and non-lipid measures. The lipid-related analytes included HDL-related measures: large, medium, small HDL and H1P, H2P, H3P, H4P, H5P, H6P, and H7P concentrations, and HDL size; LDL-related measures: large, small LDL concentrations and LDL size; and triacylglycerol-rich lipoprotein (TRL)-related measures: very large, large, medium, small TRL concentrations, TRL-carried cholesterol and triglyceride levels, TRL size, LDL peak particle density, apolipoprotein B and apolipoprotein A1.16 22 The H1P–H7P subclasses represent a refined classification of HDL particles from the smallest (H1P) to the largest particles (H7P). 
Non-lipid analytes were also measured using NMR, including amino acids (glycine, valine, leucine, isoleucine, alanine), ketones (acetone, beta-hydroxy-butyrate, acetoacetate, total ketones), citrate, and glycoprotein acetylation (GlycA), a novel inflammatory biomarker.24 The predictive utility of these analytes was evaluated for incident diabetes at the end of the DPP (short-term, 3.2 years) and at the end of the DPPOS (long-term, 15 years).
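The two insulin-derived indices described above are simple ratios. The following is a minimal sketch, assuming insulin and glucose values measured at 0 and 30 minutes of the oral glucose tolerance test. The function names and the example values are hypothetical, and any unit conversion applied in the study is not shown.

```python
def inverse_fasting_insulin(fasting_insulin: float) -> float:
    """Reciprocal of the fasting insulin level (1/FI, called IFI in the text)."""
    if fasting_insulin <= 0:
        raise ValueError("fasting insulin must be positive")
    return 1.0 / fasting_insulin


def insulinogenic_index(ins_0: float, ins_30: float,
                        glu_0: float, glu_30: float) -> float:
    """Delta-insulin (30-0 min) divided by delta-glucose (30-0 min)."""
    delta_glucose = glu_30 - glu_0
    if delta_glucose == 0:
        raise ValueError("glucose did not change between 0 and 30 min")
    return (ins_30 - ins_0) / delta_glucose


# Hypothetical participant (values invented for illustration only).
ifi = inverse_fasting_insulin(25.0)
igi = insulinogenic_index(ins_0=20.0, ins_30=80.0, glu_0=100.0, glu_30=160.0)
print(f"IFI = {ifi:.3f}, insulinogenic index = {igi:.2f}")
```

Both functions return the raw ratio in whatever units the inputs carry, so the results are only comparable with Table 1 after matching its units.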

Statistical analyses

Statistical analyses were performed using R v.3.6.1.25 In the analytic framework, single analytes measured at baseline were evaluated in univariate prediction models. In addition, the following multivariable models were evaluated:

Model 1: age at randomization, sex (male, female), self-reported ethnicity (non-Hispanic white, African American, Hispanic, American Indian and Asian American), laboratory lipids, lipid-lowering medication use (yes/no), and treatment arm (placebo, lifestyle, metformin).
Model 2: Model 1 + all baseline lipid-related NMR analytes.
Model 3: age at randomization, sex, self-reported ethnicity, fasting glucose, baseline hemoglobin A1c (HbA1c), and treatment arm.
Model 4: Model 3 + family history of diabetes (yes/no), gestational diabetes mellitus history (yes/no for women and not applicable for men), systolic blood pressure (SBP), blood pressure medication use (yes/no), waist circumference, and body mass index (BMI).
Model 5: Model 3 + all laboratory lipids and lipid-lowering medication use.
Model 6: Model 3 + all laboratory lipids, all baseline lipid-related NMR analytes, and lipid-lowering medication use.
Model 7a: Model 6 + all baseline NMR analytes.
Model 7b: Model 6 + all baseline NMR analytes and their changes.
Model 8: Model 7a + family history of diabetes, gestational diabetes mellitus history, SBP, blood pressure medication use, waist circumference, and BMI.
Model 9: Model 4 + postprandial glucose, insulinogenic index, and IFI.

In Model 7b, NMR analyte changes refer to the change in each analyte from baseline to 1 year after randomization. As NMR data at 1 year were available in a smaller sample (n=2067 vs N=2590), model comparison was undertaken in two separate analytic steps: (1) using the total sample at baseline (N=2590) and not considering Model 7b; and (2) using the sample with data at both baseline and follow-up (n=2067) and considering all models, including Model 7b. The data contained no missing values.
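Because several models are defined as extensions of earlier ones, the predictor sets can be built incrementally. The sketch below is one possible encoding, not the study's code: all column names are hypothetical labels, the NMR lists are abbreviated placeholders rather than the full panels, and Models 7a and 7b are read as Model 6 plus the remaining (non-lipid) NMR analytes, with 7b also adding the 1-year changes.

```python
# Predictor groups; names are hypothetical column labels, and the NMR
# lists are abbreviated placeholders rather than the full panels.
DEMOGRAPHIC = ["age", "sex", "ethnicity"]
ARM = ["treatment_arm"]
LAB_LIPIDS = ["triglycerides", "total_cholesterol", "hdl_c", "ldl_c"]
LIPID_MED = ["lipid_lowering_medication"]
GLYCEMIC = ["fasting_glucose", "hba1c"]
CLINICAL = ["family_history", "gdm_history", "sbp",
            "bp_medication", "waist", "bmi"]
NMR_LIPID = ["hdl_size", "ldl_size", "trl_size"]      # abbreviated
NMR_NON_LIPID = ["glycine", "valine", "glyca"]        # abbreviated
NMR_CHANGES = [f"delta_{name}" for name in NMR_LIPID + NMR_NON_LIPID]
POST_LOAD = ["glucose_2h", "insulinogenic_index", "ifi"]


def combine(*groups):
    """Concatenate predictor groups, keeping first occurrences only."""
    seen = {}
    for group in groups:
        for name in group:
            seen.setdefault(name, None)
    return list(seen)


MODELS = {}
MODELS["1"] = combine(DEMOGRAPHIC, LAB_LIPIDS, LIPID_MED, ARM)
MODELS["2"] = combine(MODELS["1"], NMR_LIPID)
MODELS["3"] = combine(DEMOGRAPHIC, GLYCEMIC, ARM)
MODELS["4"] = combine(MODELS["3"], CLINICAL)
MODELS["5"] = combine(MODELS["3"], LAB_LIPIDS, LIPID_MED)
MODELS["6"] = combine(MODELS["5"], NMR_LIPID)
MODELS["7a"] = combine(MODELS["6"], NMR_NON_LIPID)
MODELS["7b"] = combine(MODELS["7a"], NMR_CHANGES)
MODELS["8"] = combine(MODELS["7a"], CLINICAL)
MODELS["9"] = combine(MODELS["4"], POST_LOAD)

for label, predictors in MODELS.items():
    print(f"Model {label}: {len(predictors)} predictors")
```

Writing the models this way makes the nesting explicit: for instance, 2-hour glucose enters only through Model 9.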
Before analyses, variables with zero or near-zero variances and linearly dependent variables were removed. To determine the effect of correlation among included variables on the subsequent analytic framework, three distinct pairwise correlation filters were used in separate models: (1) Pearson’s |r|>0.6; (2) Pearson’s |r|>0.8; and (3) no correlation filter. Model comparisons were undertaken in a 5-fold nested cross-validation framework (illustrated by online supplemental figure 1) using the caret package.26 In brief, the outer cross-validation loops split the data into five training and validation sets (80%) and five hold-out test sets (20%). The inner cross-validation loops split the five training + validation sets into five training (80%) and validation sets (20%). In this construct, the inner loops are used for hyperparameter optimization using a grid search, and the outer loops are used to establish discriminative utility using the hold-out test sets. The hyperparameter optimization step was implemented as machine learning algorithms have a large number of parameters (eg, number of hidden units and layers in a neural network, number of trees in random forest) that can alter the performance of the algorithms. As the performance of the algorithms depends on the used data, it is of importance to systematically evaluate a wide range of these tunable parameters during the training + validation phase. All numeric variables have been scaled to mean=0 and SD=1 in the training sets, and the summary statistics of the training data were used to scale the test data in a separate step.27 Downsampling of the majority outcome class in the training set was implemented to ensure outcome balance. No downsampling was undertaken in the test sets. 
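The nested scheme described above can be sketched with plain index bookkeeping. This is an illustrative outline in Python rather than the R/caret code used in the study: the function names are hypothetical, the toy outcome vector is invented, and downsampling the training part of each inner split is an assumption about where balancing is applied.

```python
import random
from statistics import mean, stdev


def k_fold(indices, k, rng):
    """Yield (train, held_out) index lists for k-fold cross-validation."""
    shuffled = list(indices)
    rng.shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i, held_out in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, held_out


def fit_scaler(train_values):
    """Return (mean, SD) estimated from training values only."""
    sd = stdev(train_values)
    return mean(train_values), (sd if sd > 0 else 1.0)


def apply_scaler(values, mu, sd):
    """Standardize values with previously fitted training statistics."""
    return [(v - mu) / sd for v in values]


def downsample(train_idx, y, rng):
    """Randomly drop majority-class rows until both classes have equal size."""
    pos = [i for i in train_idx if y[i] == 1]
    neg = [i for i in train_idx if y[i] == 0]
    minority, majority = sorted((pos, neg), key=len)
    return minority + rng.sample(majority, len(minority))


def nested_cv_plan(y, k=5, seed=0):
    """Build outer test sets, inner tuning splits and balanced training sets."""
    rng = random.Random(seed)
    plan = []
    for outer_train, test in k_fold(range(len(y)), k, rng):
        inner = [(downsample(tr, y, rng), val)      # balance training part only
                 for tr, val in k_fold(outer_train, k, rng)]
        plan.append({"test": test,                  # hold-out set, never downsampled
                     "inner": inner,                # used for the grid search
                     "final_train": downsample(outer_train, y, rng)})
    return plan


# Toy outcome vector: 20 cases and 80 non-cases (invented data).
y = [1] * 20 + [0] * 80
plan = nested_cv_plan(y)

# Scaling: statistics come from training data and are reused on test data.
mu, sd = fit_scaler([1.0, 2.0, 3.0])
scaled_test = apply_scaler([4.0], mu, sd)
```

Each entry of `plan` holds one outer fold: hyperparameters would be chosen on the `inner` splits, the chosen configuration refit on `final_train`, and MCC computed once on `test`.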
In the inner and outer cross-validation loops, the Matthews Correlation Coefficient (MCC) was used as a measure of discriminative utility, as it has been shown to be one of the most robust measures in binary classification problems.28 MCC is defined as:

$$\mathrm{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}}$$

where TP, TN, FP and FN correspond to the number of true positives, true negatives, false positives and false negatives from the confusion matrix, respectively. ROC AUC values were also presented as a secondary measure. MCC and ROC AUC values from the five test sets were averaged. Of note, ROC AUC values range between 0 and 1, with 0.5 representing random guess. In contrast, MCC values range between −1 (perfect negative correlation) and 1 (perfect correlation), with 0 representing random guess (no correlation). Logistic regression (generalized linear model, GLM), Cox proportional hazards model and six hyperparameter-tuned machine learning algorithms were employed to assess the discriminative utilities of the models. The six algorithms were stochastic gradient boosting, random forest, support vector machines with linear kernel (SVM-L), polynomial kernel (SVM-P) and radial kernel (SVM-R), and artificial neural network (ANN). The methods and hyperparameters are described in online supplemental file 2. We hypothesized that any improvement in discriminative utilities between GLM and the more elaborate machine learning algorithms would be due to linear and/or non-linear interactions that the logistic regression framework would not be able to detect without adding explicit interaction terms. To test this hypothesis, we conducted an experiment on simulated data to assess whether latent interactions would be detected or not using the eight algorithms above. This simulation experiment and its results are described in detail in online supplemental file 3.
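The MCC can be computed directly from the four confusion-matrix counts named above, using its standard definition: (TP×TN − FP×FN) divided by the square root of (TP+FP)(TP+FN)(TN+FP)(TN+FN). A minimal sketch follows; the function name is illustrative, the example counts are invented, and returning 0 when the denominator is zero is a convention chosen here rather than something stated in the paper.

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """MCC from confusion-matrix counts; returns 0.0 if a margin is empty."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # convention chosen for this sketch, not taken from the study
    return (tp * tn - fp * fn) / denominator


# Three reference cases with invented counts.
perfect = matthews_corrcoef(tp=50, tn=50, fp=0, fn=0)
chance = matthews_corrcoef(tp=25, tn=25, fp=25, fn=25)
inverted = matthews_corrcoef(tp=0, tn=0, fp=50, fn=50)
print(perfect, chance, inverted)
```

The three cases correspond to the anchors of the scale described in the text: perfect agreement, no better than random guessing, and perfect disagreement.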

Results

Baseline and 1-year characteristics

Baseline clinical characteristics are shown in table 1. Concentrations and sizes of the main lipoprotein classes and their 1-year changes have been reported previously in a smaller subset of the DPP.16 Additional NMR analysis contributed more detailed phenotypic resolution with additional measured metabolites, in a larger sample size. Thus, the descriptive statistics of baseline and 1-year NMR analytes and their changes by treatment and overall were recalculated and shown in online supplemental tables 1–3, respectively. At 3.2 and 15 years following randomization, 20.9% and 50.4% of the study cohort had developed diabetes, respectively. A heatmap representing Spearman correlations among laboratory lipids and NMR analytes is shown in the interactive online supplemental figure 2.

Table 1

Clinical characteristics at baseline among participants with available data (N=2590)

	All	Placebo	Metformin	Lifestyle	P value
Treatment	2590	867	865	858
Age	50.8 (44.4, 58.3)	50.4 (45.0, 57.9)	50.7 (43.5, 59.2)	51.0 (44.9, 57.9)	0.508
Sex					0.441
Male	898 (34.7)	292 (33.7)	294 (34.0)	312 (36.4)
Female	1692 (65.3)	575 (66.3)	571 (66.0)	546 (63.6)
Ethnicity					0.290
Non-Hispanic white	1395 (53.9)	464 (53.5)	445 (51.4)	486 (56.6)
African American	510 (19.7)	170 (19.6)	170 (19.7)	170 (19.8)
Hispanic	428 (16.5)	144 (16.6)	154 (17.8)	130 (15.2)
Asian American	116 (4.5)	37 (4.3)	49 (5.7)	30 (3.5)
American Indian	141 (5.4)	52 (6.0)	47 (5.4)	42 (4.9)
HbA1c (%)	5.9 (5.6, 6.2)	5.9 (5.6, 6.2)	5.9 (5.6, 6.2)	5.9 (5.6, 6.2)	0.702
Fasting glucose (mg/dL)	105.0 (100.0, 111.0)	105.0 (100.0, 111.0)	105.0 (100.0, 111.0)	104.0 (100.0, 111.0)	0.748
BMI (kg/m²)	32.6 (28.9, 37.2)	32.8 (28.8, 37.5)	32.5 (28.9, 36.9)	32.6 (29.1, 37)	0.795
Waist (cm)	103.8 (94.8, 113.2)	103.5 (94.1, 113.2)	103.2 (95.1, 113)	104.4 (94.9, 113.2)	0.779
Systolic blood pressure (mm Hg)	122.0 (113.0, 133.0)	122.0 (113, 132.0)	122.0 (113.0, 133.0)	123.5 (113.0, 133.8)	0.429
Family history of diabetes					0.786
No	803 (31.0)	270 (31.1)	261 (30.2)	272 (31.7)
Yes	1787 (69.0)	597 (68.9)	604 (69.8)	586 (68.3)
GDM history					0.645
No	1450 (56.0)	491 (56.6)	485 (56.1)	474 (55.2)
Yes	242 (9.3)	84 (9.7)	86 (9.9)	72 (8.4)
Not applicable (male)	898 (34.7)	292 (33.7)	294 (34.0)	312 (36.4)
Blood pressure medication					0.569
No	2166 (83.6)	733 (84.5)	715 (82.7)	718 (83.7)
Yes	424 (16.4)	134 (15.5)	150 (17.3)	140 (16.3)
2-hour glucose (mg/dL)	162.0 (149.0, 178.0)	163.0 (149.0, 178.0)	162.0 (150.0, 178.0)	163.0 (149.0, 179.0)	0.801
Insulinogenic index (uU/mg)	103.8 (66.7, 158.2)	105.8 (66.3, 163.6)	104.8 (67.2, 160.3)	101.8 (66.7, 152.3)	0.729
IFI (mL/uU)	4.2×10⁻² (0, 6.2×10⁻²)	4.2×10⁻² (0, 6.2×10⁻²)	4.4×10⁻² (0, 6.2×10⁻²)	4.2×10⁻² (0, 6.2×10⁻²)	0.600
Triglycerides (mg/dL)	144.0 (102.0, 204.0)	149.0 (105.0, 206.5)	142.0 (98.0, 201.0)	142.0 (100.2, 201.0)	0.096
Total cholesterol (mg/dL)	203.0 (179.0, 227.8)	202.0 (177.0, 227.5)	205.0 (180.0, 229.0)	202.5 (178.2, 226)	0.541
LDL-C (mg/dL)	124.0 (102.0, 145.0)	123.0 (101.0, 146.5)	125.0 (102.0, 145.0)	122.0 (103.0, 144.0)	0.720
HDL-C (mg/dL)	44.0 (37.0, 52.0)	43.0 (37.0, 51.0)	45.0 (37.0, 53.0)	44.0 (38.0, 53.0)	0.018

The table shows the median and 25th and 75th percentiles for continuous variables and counts and percentages for categorical variables.

P values are calculated using Kruskal-Wallis rank sum tests (for continuous variables) and χ² tests (for categorical variables) to assess mean differences between the three treatment arms.

BMI, body mass index; GDM, gestational diabetes mellitus; HbA1c, hemoglobin A1c; HDL-C, high-density lipoprotein cholesterol; IFI, reciprocal of the fasting insulin level; LDL-C, low-density lipoprotein cholesterol.


Comparison of lipid-related models

We first evaluated Model 1 and Model 2 to compare the predictive utilities of standard lipids and all lipid-related NMR analytes. MCC and ROC AUC values from these models are presented in figure 1. The means and SD for MCC and ROC AUC for all models, methods, and correlation filters are browsable in the interactive online supplemental table 4. The discriminative utilities of models for short-term and long-term diabetes demonstrated a maximum observed MCC of 0.16 and a maximum ROC AUC of 0.62. Model 2 offered a small improvement in discriminative utility compared with Model 1 for both short-term and long-term diabetes. A maximum MCC of 0.12 was observed for Model 1 using the GLM and ANN methods for short-term diabetes (max ROC AUC=0.61) and a maximum MCC of 0.14 using ANN for long-term diabetes (max ROC AUC=0.58). In comparison, a maximum MCC of 0.16 was observed for Model 2 using the SVM-L method for short-term diabetes (max ROC AUC=0.62) and a maximum MCC of 0.16 using SVM-R for long-term diabetes (max ROC AUC=0.60).

Figure 1

MCC and ROC AUC statistics across all machine learning algorithms and baseline lipid-related prediction models in relation to short-term and long-term diabetes incidence (N=2590). MCC averages are represented by circles and ROC AUC averages are represented by squares. The averages are calculated from the five obtained MCC and ROC AUC values from the five separate test sets in the nested cross-validation framework. The error bars represent SD of the five obtained MCC and ROC AUC values. The left panel shows the discriminative utilities for short-term, while the right panel shows the discriminative utilities for long-term diabetes incidence. Model 1 includes predictors: age at randomization, sex, self-reported ethnicity, all laboratory lipids, lipid-lowering medication use and treatment arm. Model 2 includes all Model 1 predictors and all baseline lipid-related NMR analytes. ANN, artificial neural network; GLM, generalized linear model (refers to logistic regression here); MCC, Matthews Correlation Coefficient; NMR, nuclear magnetic resonance; RF, random forest; ROC AUC, receiver operating characteristic area under the curve; SGB, stochastic gradient boosting; SVM-L, support vector machine with linear kernel; SVM-P, support vector machine with polynomial kernel; SVM-R, support vector machine with radial kernel; T2D, type 2 diabetes.

Univariate prediction models

As a second step, we examined univariate prediction models. The results from these models for short-term and long-term diabetes, using GLM, are presented in figure 2A. In these analyses, standard lipid and lipid-related NMR analytes as well as clinical and glycemic risk factors and non-lipid NMR analytes were included. Using MCC, the strongest predictors of both short-term and long-term diabetes were glycemic traits, although the insulinogenic index and IFI were considerably weaker than others. Among standard lipids, triglycerides showed the strongest prediction for short-term diabetes using both MCC and ROC AUC (MCC=0.15; ROC AUC=0.62), while HDL-C was the strongest predictor of long-term diabetes development (MCC=0.08; ROC AUC=0.56). Among NMR lipid analytes, TRL size had the highest prediction for short-term diabetes (MCC=0.15; ROC AUC=0.56), while large and small LDL particles, LDL size and TRL size demonstrated the highest predictive utility for long-term diabetes (MCC ~0.10; ROC AUC ~0.57 for these four lipid analytes).

Figure 2

(A) Univariate discriminative utilities of continuous analytes at baseline in relation to short-term and long-term diabetes incidence (N=2590). The MCC and ROC AUC values are averages, calculated from the five obtained MCC and ROC AUC values from the five separate test sets in the nested cross-validation framework using the GLM method (logistic regression). The black circles represent MCC and ROC AUC values for short-term diabetes, while the red circles represent MCC and ROC AUC values for long-term diabetes. The predictors are sorted according to their MCC values for short-term diabetes. Model 1 includes predictors: age at randomization, sex, self-reported ethnicity, all laboratory lipids, lipid-lowering medication use and treatment arm. Model 2 includes all model 1 predictors and all baseline lipid-related NMR analytes. (B) Distributions of the six best performing univariate predictors for short-term diabetes, stratified by incident diabetes status (N=2590). The upper panel of the figure shows a schematic explanation for distributions that generally indicate good versus poor discriminative utility. The lower panel of the figure shows the density plots of the variables fasting glucose, 2-hour glucose, HbA1c, insulinogenic index, TRL size and glycine. AcAc, acetoacetate; ApoA1, apolipoprotein A1; ApoB, apolipoprotein B; BHB, beta-hydroxy-butyrate; BMI, body mass index; GlycA, glycoprotein acetylation; GLM, generalized linear model; HbA1c, hemoglobin A1c; HDL-C, high-density lipoprotein cholesterol; IFI, reciprocal of the fasting insulin level; LDL-C, low-density lipoprotein cholesterol; MCC, Matthews Correlation Coefficient; NMR, nuclear magnetic resonance; PPD, peak particle density; ROC AUC, receiver operating characteristic area under the curve; SBP, systolic blood pressure; T2D, type 2 diabetes; TC, total cholesterol; TG, triglycerides; TRL, triglyceride rich lipoprotein; TRL-C, TRL-cholesterol; TRL-G, TRL-triacylglycerol.

Comparison of all models

In order to test whether the inclusion of lipid data augmented the predictive utility of models incorporating usual epidemiological measures, we next evaluated seven additional models of increasing complexity. Baseline models are presented in figure 3, while all models (including model 7b with the 1-year change variables and a smaller sample size) are presented in online supplemental figure 3. The means and SD for MCC and ROC AUC for all methods, models and correlation filters are browsable in the interactive online supplemental table 4.

Figure 3

MCC and ROC AUC statistics across all machine learning algorithms and all prediction models in relation to short-term and long-term diabetes incidence (N=2590). MCC averages are represented by circles and ROC AUC averages are represented by squares. The averages are calculated from the five obtained MCC and ROC AUC values from the five separate test sets in the nested cross-validation framework. The error bars represent SD of the five obtained MCC and ROC AUC values. The left panel shows the discriminative utilities for short-term, while the right panel shows the discriminative utilities for long-term diabetes incidence. This figure demonstrates the model results for all baseline models (N=2590). ANN, artificial neural network; GLM, generalized linear model (refers to logistic regression here); MCC, Matthews Correlation Coefficient; RF, random forest; ROC AUC, receiver operating characteristic area under the curve; SGB, stochastic gradient boosting; SVM-L, support vector machine with linear kernel; SVM-P, support vector machine with polynomial kernel; SVM-R, support vector machine with radial kernel; T2D, type 2 diabetes.

The standard and NMR lipid measures or the full NMR panel did not improve prediction (max MCC=0.30; max ROC AUC=0.74) when added to the simpler clinical model including fasting glucose (max MCC=0.30; max ROC AUC=0.73). The highest discriminative utility was observed for Model 9 (including 2-hour glucose). This was seen in evaluation of short-term diabetes prediction (using SVM-R; MCC=0.36) and long-term diabetes prediction (using SVM-P and ANN; MCC=0.34). For prediction of short-term diabetes the ANN, SVM-L and SVM-R resulted in ROC AUC=0.77, and for long-term diabetes the SVM-P, SVM-R and ANN methods produced ROC AUC=0.73. In a sensitivity analysis, Model 9 was repeated without the IFI and insulinogenic index variables. This analysis yielded very similar results to the original Model 9 metrics, indicating that 2-hour glucose alone here is sufficient to achieve the highest MCC and ROC AUC values. Although Cox models and random forest generally underperformed the other methods in both short-term and long-term T2D classification (figure 3), there were no large differences in discriminative utilities between GLM and the other machine learning algorithms. No meaningful differences were observed when comparing the results from models filtered by the other evaluated correlation thresholds (|r|>0.8 and no filter). Rank transformation of NMR analytes did not materially affect the results.

Discussion

We have undertaken analyses of the predictive utility of traditional clinical factors, biochemical measures of routinely measured blood lipids, and NMR measures of lipoproteins as predictors of short-term and long-term incident diabetes in the DPP and the DPPOS. We found that standard lipids such as triglycerides and HDL-C were poor predictors of both short-term and long-term incident type 2 diabetes in the DPP and the DPPOS. Triglyceride level was a somewhat better predictor of short-term diabetes than HDL-C in univariate analysis, whereas HDL-C was the better predictor of longer-term diabetes development. This may at least in part be due to the fact that HDL-C is a more stable measure over time than triglycerides. Although TRL size appeared overall to be the best individual NMR-based predictor of incident diabetes, a model containing NMR-measured lipoprotein measures provided minimal added discriminative utility in predicting incident diabetes over the model containing only standard lipids in our study. The best predictive models included measures of glycemia; the inclusion of standard lipids or NMR-based lipoprotein size and concentration measures did not augment the predictive utility of models incorporating glycemia. Multiple studies demonstrate that elevated triglyceride and reduced HDL-C are strongly associated with insulin resistance and diabetes development.2–4 In our analyses, total HDL and triglycerides were also significantly associated with diabetes development. 
These lipid alterations were shown to be due to lipoprotein size and concentration abnormalities resulting from insulin resistance, which manifest prior to the development of dysglycemia.6–10 There may also be other mechanisms linking lipoprotein abnormalities to diabetes development.29 These studies also showed that these epidemiological associations remained statistically significant after the adjustment of conventional risk factors for diabetes development such as BMI, family history and glycemic measures. This observation suggests that lipid and lipoprotein markers may have value for risk stratification in patients with dysglycemia. In addition, in most studies,6–9 lipoprotein abnormalities such as higher very-low-density lipoprotein (VLDL) size remained associated with incident diabetes after adjusting for triglyceride and HDL-C concentrations, suggesting increased sensitivity as markers of risk compared with standard lipid measurements. Single or multiple NMR analytes, related measures, or composite scores based on NMR analytes such as the lipoprotein insulin resistance score have been shown to strongly associate with glycemia30 31 and type 2 diabetes.10 32 33 However, the question as to whether various lipid and non-lipid NMR analytes can offer improvement in the classification of future type 2 diabetes status is debated. Recent studies showing strong associations between the lipoprotein insulin resistance score and type 2 diabetes incidence showed no or very small improvement in discriminative utilities of NMR lipoproteins when compared with established predictors.9 32 The apparent contradiction between our findings (showing limited predictive utility) versus our supplemental analysis of associations and previous studies (reporting strong associations of the same biomarkers) showcases a common, yet poorly understood, phenomenon. 
Biomarkers with robust statistical associations in populations are often poor classifiers of future disease status in individuals13 34 35 and therefore may not have value for individualized (n=1) prediction.12 While association studies are important in demonstrating a link between a biomarker and a disease, and may point to potential interventions for preventing or treating a disease, prediction of disease is more useful in making clinical decisions for a given individual. Thus, while lipids and lipoprotein abnormalities are linked to the pathophysiological changes underlying the development of type 2 diabetes and may have importance in identifying individuals with pre-diabetes at increased risk of cardiovascular disease, in our analyses they did not add discriminative value in predicting incident diabetes.
Of note, the expanded NMR assessment that included measurement of branched chain amino acids and glycine, which have been shown to be associated with insulin resistance and incident diabetes,36 also did not add discriminative utility to the glycemic model, although glycine and TRL size were among the best individual non-glucose predictors of short-term diabetes and branched chain amino acids were the best univariate predictors of long-term diabetes after glycemic traits in this cohort. Interestingly, TRL size has been shown to improve risk prediction for diabetes elsewhere, particularly in individuals with lower HbA1c values.37 Since lifestyle intervention and, to a lesser extent, metformin treatment caused beneficial changes in lipids and lipoproteins16 in the DPP, we also tested an NMR model that included changes in analytes after 1 year of lifestyle and metformin treatment, but found that these treatment-related changes added no discriminative utility for incident diabetes.
Other than known strengths and limitations of using data emanating from randomized trials,38 a further limitation of our analysis should be considered.
Our findings were obtained in a clinical trial of individuals with pre-diabetes who were at high risk of developing diabetes and whose risk factor distributions may differ from those of individuals in the general population. Thus, these results might not be generalizable to populations with different distributions of diabetes risk factors. Specifically, our findings might not apply when assessing risk or predicting diabetes in people with normal glucose response. Predictive models for incident diabetes that also include data from healthier populations might provide evidence for stronger discriminative utilities of non-glycemic markers.39 On the other hand, the importance of more precise prediction of diabetes in high-risk subjects remains, since in the DPPOS, even after 15 years of follow-up, over 40% of participants in the placebo group did not develop diabetes.40 An additional potential limitation is the long-term storage of blood samples before the NMR analysis: as all samples were analyzed >7 years after they had been obtained, it is possible that some more sensitive molecules, for instance amino acids, could have been affected.41
In our study, the fasting glucose and 2-hour glucose levels at baseline were the best univariate predictors (mean MCC ~0.25 and ROC AUC ~0.67), outperforming all other risk factors including BMI and measures of insulin resistance and insulin secretion. This is not unexpected since these glycemic measures define the criteria for the diagnosis of incident diabetes in the DPP/DPPOS. The addition of 2-hour glucose to the glycemic model including fasting glucose increased its discriminative utility, and although 2-hour glucose measurement is less widely used than it once was to identify people with pre-diabetes, it clearly improved prediction of diabetes over other measures.
This improvement in discriminative utility when adding 2-hour glucose to the model has been observed elsewhere in large populations.42 Of note, even the best model (Model 9, max MCC=0.36, max ROC AUC=0.77) that included both the fasting and 2-hour glucose measurements was not able to predict future diabetes well in our study. New biomarkers arising from various ‘omics’ platforms, environmental and lifestyle determinants, personal disease histories and other layers of personal data may prove useful in improving prediction models.43 Lipidomics and metabolomics, in particular, offer promising avenues for further research;44 multiple recent reports demonstrate that these more refined assessments have the potential to yield well-performing predictive models, even when compared with simpler models incorporating glycemic measures.45–50 Future studies should evaluate lipidomic and metabolomic profiles, sampled at multiple occasions in both fasting and metabolically challenged states, to gain a more holistic picture and, hopefully, superior models for predicting diabetes.
Overall prediction was somewhat superior for all models for short-term diabetes (metrics of discriminative utility consistently higher in these models compared with those in the long-term diabetes models), although this was only apparent when assessing predictive validities using ROC AUC. This could reflect a higher predictive validity of risk factors in those at highest risk of diabetes development.
In this study, we chose to compare standard statistical methods for prediction with sophisticated machine learning algorithms that can provide improvement to established methods in prediction modeling.51 We acknowledge that the relatively small sample size in our study is a limitation when applying machine learning algorithms.
We aimed to offset the small sample size by establishing a nested cross-validation framework in which every observation is used in a test set, thereby maximizing test data size and decreasing the chance of a mismatch between the random test data and the whole cohort. A simulation study was undertaken to test the utility of machine learning algorithms to detect latent interactions that might impact discriminative utilities. The simulation showed that latent interactions, if present, would be detected more readily by some of the more sophisticated machine learning algorithms than by simpler methods, such as GLM and SVM-L. As no large improvement in the discriminative utilities was observed when comparing prediction models, the results of this study indicate a lack of interactions that would meaningfully affect the performance of the predictive models used, for example, between our measures and the treatment arms.
In conclusion, although lipid and lipoprotein size and concentration measures associate strongly with incident diabetes, they did not add predictive utility to other standard clinical and glycemic risk factors in the DPP/DPPOS. Even using the best predictors, namely fasting and 2-hour glucose measurements, binary prediction of diabetes development was only moderate. Given that machine learning algorithms were not superior to traditional logistic regression in this setting, we conclude that influential non-linearities in the analyzed data were limited.
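The nested cross-validation idea and the MCC metric discussed above can be sketched in a few lines of Python. This is a minimal illustration and not the study's actual pipeline: the fold counts (5 outer, 3 inner), the seed, and the helper names `k_fold_indices` and `nested_cv_splits` are assumptions made for the example, and model fitting and hyperparameter tuning are left out. The sketch only shows the index bookkeeping that lets every observation appear in exactly one outer test set, and how MCC is computed from confusion-matrix counts.

```python
import math
import random


def mcc(tp, fp, fn, tn):
    """Matthews Correlation Coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal total is empty.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def k_fold_indices(indices, k, rng):
    """Shuffle the indices and split them into k disjoint, near-equal folds."""
    shuffled = list(indices)
    rng.shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def nested_cv_splits(n, outer_k=5, inner_k=3, seed=0):
    """Yield (outer_test, inner_splits) for each outer fold.

    inner_splits is a list of (inner_train, inner_validation) pairs drawn
    only from the outer training data, so tuning never sees the outer
    test observations. Fold counts are illustrative assumptions.
    """
    rng = random.Random(seed)
    outer_folds = k_fold_indices(range(n), outer_k, rng)
    for i, outer_test in enumerate(outer_folds):
        outer_train = [j for f, fold in enumerate(outer_folds)
                       if f != i for j in fold]
        inner_folds = k_fold_indices(outer_train, inner_k, rng)
        inner_splits = []
        for v, inner_val in enumerate(inner_folds):
            inner_train = [j for f, fold in enumerate(inner_folds)
                           if f != v for j in fold]
            inner_splits.append((inner_train, inner_val))
        yield outer_test, inner_splits
```

In use, a model would be tuned on the inner splits, refit on the full outer training data, and scored with `mcc` on `outer_test`; pooling the outer folds then gives one out-of-sample prediction per participant.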

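The distinction drawn in the discussion between a strong population-level association and weak individual-level discrimination can be illustrated with a small simulation. The numbers here are hypothetical and not taken from the DPP data: a single biomarker is assumed to be normally distributed with a mean 0.3 SD higher in cases than in non-cases, with 2000 participants per group. Under these assumptions the group difference is highly significant, while the ROC AUC stays close to its theoretical value of about 0.58.

```python
import math
import random
from statistics import mean, variance


def roc_auc(cases, controls):
    """Share of case-control pairs in which the case has the higher value.

    Ties are not handled specially, which is fine for continuous
    simulated values.
    """
    labelled = sorted([(x, 1) for x in cases] + [(x, 0) for x in controls])
    controls_seen = 0
    wins = 0
    for _, is_case in labelled:
        if is_case:
            wins += controls_seen  # this case outranks all controls seen so far
        else:
            controls_seen += 1
    return wins / (len(cases) * len(controls))


def z_statistic(cases, controls):
    """Two-sample z statistic for the difference in group means."""
    se = math.sqrt(variance(cases) / len(cases)
                   + variance(controls) / len(controls))
    return (mean(cases) - mean(controls)) / se


rng = random.Random(42)
n = 2000  # hypothetical group size
controls = [rng.gauss(0.0, 1.0) for _ in range(n)]
cases = [rng.gauss(0.3, 1.0) for _ in range(n)]  # assumed shift of 0.3 SD

z = z_statistic(cases, controls)
auc = roc_auc(cases, controls)
print(f"z = {z:.1f}, ROC AUC = {auc:.2f}")
```

A z statistic of this size corresponds to a very small p value, yet an AUC near 0.58 means the marker ranks a random case above a random non-case only slightly more often than chance.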