Literature DB >> 34707419

Machine Learning-Based Prediction for 4-Year Risk of Metabolic Syndrome in Adults: A Retrospective Cohort Study.

Hui Zhang¹, Dandan Chen¹, Jing Shao¹, Ping Zou², Nianqi Cui³, Leiwen Tang¹, Xiyi Wang⁴, Dan Wang¹, Jingjie Wu¹, Zhihong Ye¹.

Abstract

PURPOSE: Machine learning (ML) techniques have emerged as a promising tool to predict risk and make decisions in different medical domains. We aimed to compare the predictive performance of machine learning-based methods for 4-year risk of metabolic syndrome in adults with the previous model using logistic regression. PATIENTS AND METHODS: This was a retrospective cohort study that employed a temporal validation strategy. Three popular ML techniques were selected to build the prognostic models. These techniques were artificial neural networks, classification and regression tree, and support vector machine. The logistic regression algorithm and ML techniques used the same five predictors. Discrimination, calibration, Brier score, and decision curve analysis were compared for model performance.
RESULTS: Discrimination was above 0.7 for all models except classification and regression tree model in internal validation, while the logistic regression model showed the highest discrimination in external validation (0.782) and the smallest discrimination differences. The logistic regression model had the best calibration performance, and ANN also showed satisfactory calibration in internal validation and external validation. For overall performance, logistic regression had the smallest Brier score differences in internal validation and external validation, and it also had the largest net benefit in external validation.
CONCLUSION: Overall, this study indicated that the logistic regression model performed as well as the flexible ML-based prediction models at internal validation, while the logistic regression model had the best performance at external validation. For clinical use, when the performance of the logistic regression model is similar to ML-based prediction models, the simplest and more interpretable model should be chosen.

Entities: Chemical

Keywords: calibration; discrimination; machine learning; metabolic syndrome; prognosis model

Year: 2021 PMID： 34707419 PMCID： PMC8543031 DOI： 10.2147/RMHP.S328180

Source DB: PubMed Journal: Risk Manag Healthc Policy ISSN： 1179-1594

Introduction

Metabolic Syndrome (MetS) refers to a group of risk factors including hypertension, hyperglycemia, dyslipidemia, hypertension, and abdominal obesity.1 It is well known that metabolic risk factors can increase the likelihood of developing heart disease and diabetes mellitus. Research has suggested that MetS predicts a 5-fold increase in the risk of type 2 diabetes mellitus, a 1.5-fold increase in all-cause mortality, and a two-fold increase in the risk of cardiovascular disease.2–4 Moreover, evidence has shown that MetS is related to the occurrence of cancers and chronic kidney disease.5,6 All these influences are associated with increased healthcare costs. Consequently, it is crucial to develop a prediction model to identify individuals who are at a high risk of MetS early and provide the appropriate treatment strategy. A prediction model can estimate the individualized absolute risk probability of a particular outcome. Prediction models can be classified into two categories: (1) diagnostic models, which are developed to identify whether a disease is present; (2) prognostic models, which are developed to detect whether an outcome will occur in the future.7 A prediction model can motivate both physicians and patients in their clinical risk-management decisions, guide patient management, and inform health initiatives.7 Clinical practice would therefore benefit from accurate individual estimates of MetS through the use of prediction models. A systematic review was performed previously by our team to assess the risk of bias of the prognostic prediction models for MetS.8 We found that existing prognostic prediction models for metabolic syndrome had a high risk of bias in their methodological quality. This means that the predictive performance of models can be distorted and cannot be applied in clinical practice. Therefore, we developed a prediction model for 4-year risk of metabolic syndrome in adults using logistic regression which was internally and externally validated based on health examination cohorts.9,10 The reproducibility and generalizability of this prognostic model was also determined. Although the discrimination and calibration of this previous prognostic model were satisfactory, we have only applied the logistic regression technique to develop this model based on these datasets. Over the last few years, an increasing number of advanced and more flexible machine learning (ML) techniques have been developed and we are now in an era of employing artificial intelligence in medicine.11 ML techniques have also emerged as a promising tool to predict risk and make decisions in different medical domains. Data can be learned directly and automatically by ML without any prior assumptions since ML relies on patterns and inferences from the data itself.12 Compared to conventional statistical techniques (eg, logistic regression), ML is capable of “big”, non-linear, and high-dimensional data. In our previous systematic review, some studies used ML to develop prognostic models for metabolic syndrome, but they were at high risk bias due to inadequate reporting of model performance, selection of predictors, and sample size.8 Inaccurate estimation of these models creates barriers to their use in clinical practice. Herein, we present ML-based methods for the prediction of 4-year risk of metabolic syndrome in adults and compare ML model performance with the previous model using logistic regression.9,10

Methods

Source of Data

The healthcare information and management systems of a tertiary hospital provided the data used in this study. For model development and internal validation, a retrospective cohort of health examinations from 1 January 2011 to 31 December 2014 was obtained, and the datasets were used to develop and internally validate the logistic regression model. For external validation, a retrospective cohort of health examinations from January 2015 to 31 December 2018 was obtained. The following inclusion criteria were chosen: (1) participants ≥18 years old; (2) participants who were not diagnosed with metabolic syndrome in 2011 and 2018; (3) participants who attended a health examination for 4 consecutive years.

Outcomes

The outcome was defined as metabolic syndrome (MetS), and the diagnostic criteria used was the 2009 Joint Scientific Statement (harmonizing criteria 2009).13 According to the criteria, the diagnosis of MetS can be established if an individual has three of the following five criteria: Central obesity: waist circumference ≥ 85 cm in men; waist circumference ≥ 80 cm in women; Triglycerides of 1.7 mmol/L or greater or treatment; Plasma high-density lipoprotein cholesterol (HDL-C) <1.0 mmol/L in men or treatment; Plasma high-density lipoprotein cholesterol <1.3 mmol/L in women or treatment; Blood Pressure of 130/85 or greater or treatment; fasting plasma glucose of 5.6 mg/dl or greater or treatment.

Predictor Variables

Based on our previous work, age (years), total cholesterol (TC, mmol/l), serum uric acid (UA, μmol/l), alanine transaminase (ALT, U/L), and body mass index (BMI, Kg/m2) were identified as predictors in the prognostic prediction model.9,10 Therefore, these specified predictors were included in ML-based models.

Sample Size

Usually, the number of events per variable (EPV) are used to evaluate the appropriate sample size in prediction modeling studies. It is recommended that EPV should be above 200 for ML-based prediction models.14 EPV in this study met the criteria.

Machine Learning

In this study, we have chosen three popular ML techniques to develop the prognostic models:15,16 Artificial Neural Networks (ANN), Classification and Regression Tree (CART), and Support Vector Machine (SVM). An ANN includes many artificial neurons called processing units.17 They can simulate the signal transmission which consists of an input layer, several hidden layers, and an output layer. There are many perceptron in each layer, and different weights connect the perceptron between layers. An ANN has self-learning capabilities to produce the best prediction by matching each input with corrected output.17 A CART is a type of decision tree methodology and is a graphical depiction of a decision that helps show a set of decisions followed by potential outcomes.18 A parent node is the starting point for a CART. A possible decision or an action results in binary groups. Two child nodes are then generated from a parent node and this tree-growing methodology leads to the best split based on the splitting criterion. During this course, every child node will become a parent node when it splits. The decision-making process stops when no contribution exists in the further branching. An SVM involves a quadratic optimization problem, which includes minimizing penalties and maximizing margin width. This means that an SVM will iteratively generate the hyperplane to minimize the error, and the datasets are separated into classes to find a maximum marginal hyperplane (MMH) using a mathematical transformation known as the kernel trick. By using a logistic transformation, a rescaled version of the original classifiers scores is generated, which is posterior estimates.

Statistical Analysis

Adhering to the TRIPOD guidelines, continuous variables will not be dichotomized to avoid a loss of prognostic information. For variables with missing data, we employed multiple imputation to generate five imputed data sets.19 The multiple imputation used fully conditional specification methods for final estimates. Variables were removed if they contained missing values above 50%. The ML-based prediction model employed the same inclusion criteria, candidate predictors, and outcome definition, which were used in the logistic regression-based prediction model.9,10 Optimal hyperparameters were selected to enable the ML algorithms to work optimally (). The performance of ML-based prognostic models was evaluated. Discrimination (whether the model separates individuals who suffer from events from those who do not) was evaluated using the C-statistic, with the ideal value as 1. Calibration (whether the estimations of risk are accurate) was assessed using a calibration plot, calibration intercept, and calibration slope. The ideal value for calibration intercept and slope are 0 and 1, respectively. Additionally, the Brier score was evaluated because it can be viewed as overall performance of a model. Decision curve analysis (DCA) was employed to evaluate clinical utility since this technique is increasingly used in supporting decision-making.20 Clinicians can make a judgment about using prediction models as a strategy associated with benefits (treating a true-positive case) and harms (treating a false-positive case). In DCA, the net benefit is the key, which can be interpreted as the proportion of true positives in the observed proportion of false positives weighted by the different threshold probability. We compared ML model performance with the previous logistic regression model.9,10 In order to adhere to TRIPOD guidelines, training and validated performance were reported in this study. The 10-fold cross-validation technique was used for calibration plot, and the bootstrapping method was used for other indexes for internal validation. All analyses were performed using R software (version 3.6.2).

Ethical Approval

This study was approved by the ethics committee of Zhejiang University School of Medicine Sir Run Run Shaw Hospital (20181220-3). The ethics committee has approved a request to waive the documentation of informed consent because of the secondary use of the data. This study was conducted in accordance with the principles of the Declaration of Helsinki.

Results

Baseline Characteristics

In the development and internal validation cohort, there were 6793 participants followed up for 4 years. A total of 1750 participants suffered from MetS (25.76%) by the end of the 4 year period. In the external validation cohort, there were 7681 participants followed up for 4 years and 2222 participants developed MetS (28.93%) during the study period. Baseline characteristics can be found in Table 1.

Table 1

Characteristics of Participants

Candidate Predictor	Derivation Cohort (N=6793) n (%)/Mean±SD	Missing Values n (%)	External Cohort (N=7681) n (%)/Mean±SD	Missing Values n (%)
Age	41.488±10.411	0	41.988±11.246	0
TC	4.419±0.846	0.471	4.834±0.893	0.234
UA	303.305±86.971	0.501	331.758±83.226	0.234
BMI	22.802±2.726	30.59	22.697±2.681	31.194
ALT	21.453±20.271	0.118	20.984±15.619	0.169
Outcome Indicator	Events (N=1750) n (%)/Mean±SD	Missing Values n (%)	Events (N=2222) n (%)/Mean±SD	Missing Values n (%)
WC	86.445±6.972	16.971	86.445±7.327	17.057
TG	1.969±1.383	0.114	1.83±1.165	0.045
HDL-c	1.151±0.29	0.114	1.183±0.3	0.045
SBP	127.648±14.457	4.914	125.101±14.657	3.375
DBP	78.301±10.582	4.914	75.841±10.361	3.375
FPG	5.306±0.993	0.114	5.324±0.762	0.045

Characteristics of Participants

Discrimination

For all models, the C-statistic at internal validation was above 0.7 except CART algorithms (0.663, 95% CI: 0.645–0.681). There was a decrease in the C-statistic for 2 models (logistic regression and ANN) when external validation was performed (Table 2), and the C-statistic of the CART algorithms was the lowest at internal validation (0.663, 95% CI:0.645–0.681) and external validation (0.740, 95% CI: 0.728–0.752). The logistic regression model showed the highest level of discrimination at external validation (0.782, 95% CI: 0.771–0.793), and it had the smallest discrimination differences (0.001).

Table 2

Results of the Discrimination at Internal and External Validation

Model	Internal Validation	External Validation
Logistic regression	0.783(0.772–0.795)	0.782(0.771–0.793)
ANN	0.788 (0.777–0.800)	0.780 (0.769–0.791)
CART	0.663 (0.645–0.681)	0.740 (0.728–0.752)
SVM	0.740 (0.726–0.754)	0.742(0.729–0.755)

Results of the Discrimination at Internal and External Validation

Calibration

In the internal validation, calibration slopes and intercepts were similar for all models except the model based on SVM algorithms. For calibration in the external validation, the calibration intercept (−0.045, 95% CI: −0.113–0.022) and the calibration slope (1.006, 95% CI: −0.011–1.063) of logistic regression were close to 0 and 1, respectively. The SVM model had unsatisfactory calibration performance in internal validation and external validation (Figures 1, 2, and Table 3).

Figure 1

Calibration plot in internal validation.

Figure 2

Calibration plot in external validation.

Table 3

Results of the Calibration Intercept, Calibration Slope, and the Brier Score at Internal and External Validation

Model	Internal Validation	External Validation
Calibration Intercept
Logistic regression	−0.008(−0.088–0.073)	−0.045(−0.113–0.022)
ANN	0.009 (−0.068–0.086)	0.102 (0.032–0.172)
CART	0 (−0.077–0.078)	−0.072 (−0.139–0.004)
SVM	0.310 (0.201–0.420)	0.341(0.250–0.433)
Calibration Slope
Logistic regression	0.995(0.934–1.058)	1.006(−0.011–1.063)
ANN	0.996 (0.934–1.059)	0.974 (0.918–1.031)
CART	1.000 (0.939–1.062)	0.812 (0.760–0.864)
SVM	0.559(0.518–0.600)	0.574(0.539–0.611)
Brier Score
Logistic regression	0.156	0.164
ANN	0.153	0.165
CART	0.218	0.176
SVM	0.185	0.192

Results of the Calibration Intercept, Calibration Slope, and the Brier Score at Internal and External Validation Calibration plot in internal validation. Calibration plot in external validation.

Brier Score

At internal validation, the Brier score of ANN (0.153) was better than that of logistic regression (0.156), and CART (0.218) had the poorest Brier score. The Brier score was very similar for logistic regression (0.164) and ANN (0.165) at external validation, but logistic regression was slightly higher.

Clinical Utility

Figure 3 presents the results of the decision curve analysis. Based on the plotted net benefit, the logistic regression model was superior to the other models (CART, SVM), and was slightly higher than the ANN model.

Figure 3

Decision-curve analysis in external validation dataset.

Discussion

Development and validation of clinical risk prediction models help healthcare providers to identify disease and make clinical decisions. Additionally, it can classify true positive patients with significant risk factors earlier in order to optimize hospital resources. ML methods are starting to be used to improve medical research and clinical care with tremendous potential since electronic health records gained popularity in healthcare systems. Therefore, the results of this study can offer important insights on the differences between flexible ML algorithms and traditional logistic regression in both internal validation and external validation. We found that discrimination was above 0.7 for all models except the CART model in internal validation, while the logistic regression model showed the highest discrimination in external validation and the discrimination difference was the smallest. Based on the results of calibration plot, slope, and intercept, we found the logistic regression model had the best calibration performance, and ANN showed satisfactory calibration. For overall performance, logistic regression had the smallest brier score in external validation, and it also had the smallest brier score differences in internal validation and external validation. Although ANN had the best brier score at internal validation, logistic regression was slightly better than ANN in external validation. The results of the decision curve analysis indicated that the logistic regression model had the largest net benefit in external validation. This could be explained by the fact that the most important prognostic effects are more likely to serve as independent, linear effects.16 According to TRIPOD, external validation provides important information about the transportability of models. The predictive performance of the logistic regression model at internal validation indicated that traditional algorithms can perform as well as ML algorithms, and the logistic regression model had the best performance at external validation. As the logistic regression models can be presented as a formula, a linear predictor called a prognostic index can be calculated. It is easy to use and understand the prediction model based on traditional algorithms, while ML algorithms are limited by the difficulty of explaining these prediction models. Additionally, based on the logistic regression model, the risk calculator can be produced to facilitate clinical application. Therefore, for clinical use, when the performance of the logistic regression model is similar or superior to ML-based prediction models, the simplest and more interpretable model should be chosen.21 Due to enhanced access to electronic health records and increased international collaborations, available data are becoming increasingly large and ML techniques are being widely used for clinical prediction. However, a recent systematic review suggested that compared to the traditional statistical approach, flexible ML techniques did not find incremental value.15 The findings of this study supported this conclusion. It may be true that compared to traditional statistical methods, flexible ML techniques are likely to be data hungry.22 The comparison between the logistic regression model and ML based-models highlighted key findings from a recent systematic review.15 First, reporting of methodology lacked transparency. Second, the findings were incomplete and unclear. Additionally, calibration plot, calibration intercept, and calibration slope were seldom examined. Consequently, there were several strengths in the present study. Adhering to TRIPOD, ML-based prediction models in our study were both internally and externally validated with optimal hyperparameters. Calibration was accessed using a multidimensional approach (Calibration plot, calibration intercept, and calibration slope). Additionally, the Brier score and decision curve analysis were applied to evaluate overall performance and clinical utility, respectively.

Limitations

Some limitations must be considered in this study. First, datasets were collected from one hospital. The multicenter dataset could be a more reliable and generalizable source. Second, we only used five predictors to develop ML-based prediction models according to the previous results.9,10 This may limit the performance of ML-based prediction models as it is well known that flexible ML methods perform better than the classic regression approach when numerous predictors and high-dimensional data are used. However, limited numbers of predictors are less likely to be over-fitting, and using high-dimensional data usually represents fewer details or lower quality.16

Conclusion

Overall, this study indicated that the logistic regression model performed as well as the flexible ML-based prediction models at internal validation, and the logistic regression model had the best performance at external validation. For clinical use, when the performance of the logistic regression model is similar to ML-based prediction models, the simplest and more interpretable model should be chosen. The results indicated that continuous updating of prediction models in different datasets was crucial to ensure the logistic regression model performed as well as the flexible ML-based prediction models. Additionally, the utility was emphasized by comparing classic algorithms to ML through a small number of evidence-based predictor variables.

21 in total

Review 1. Definition of metabolic syndrome: Report of the National Heart, Lung, and Blood Institute/American Heart Association conference on scientific issues related to definition.

Authors: Scott M Grundy; H Bryan Brewer; James I Cleeman; Sidney C Smith; Claude Lenfant
Journal: Circulation Date: 2004-01-27 Impact factor: 29.690

2. The metabolic syndrome and chronic kidney disease in U.S. adults.

Authors: Jing Chen; Paul Muntner; L Lee Hamm; Daniel W Jones; Vecihi Batuman; Vivian Fonseca; Paul K Whelton; Jiang He
Journal: Ann Intern Med Date: 2004-02-03 Impact factor: 25.391

3. Machine learning algorithms performed no better than regression models for prognostication in traumatic brain injury.

Authors: Benjamin Y Gravesteijn; Daan Nieboer; Ari Ercole; Hester F Lingsma; David Nelson; Ben van Calster; Ewout W Steyerberg
Journal: J Clin Epidemiol Date: 2020-03-20 Impact factor: 6.437

Review 4. Reporting and Interpreting Decision Curve Analysis: A Guide for Investigators.

Authors: Ben Van Calster; Laure Wynants; Jan F M Verbeek; Jan Y Verbakel; Evangelia Christodoulou; Andrew J Vickers; Monique J Roobol; Ewout W Steyerberg
Journal: Eur Urol Date: 2018-09-19 Impact factor: 20.096

5. PROBAST: A Tool to Assess Risk of Bias and Applicability of Prediction Model Studies: Explanation and Elaboration.

Authors: Karel G M Moons; Robert F Wolff; Richard D Riley; Penny F Whiting; Marie Westwood; Gary S Collins; Johannes B Reitsma; Jos Kleijnen; Sue Mallett
Journal: Ann Intern Med Date: 2019-01-01 Impact factor: 25.391

6. Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis (TRIPOD): explanation and elaboration.

Authors: Karel G M Moons; Douglas G Altman; Johannes B Reitsma; John P A Ioannidis; Petra Macaskill; Ewout W Steyerberg; Andrew J Vickers; David F Ransohoff; Gary S Collins
Journal: Ann Intern Med Date: 2015-01-06 Impact factor: 25.391

10. Development and Internal Validation of a Prognostic Model for 4-Year Risk of Metabolic Syndrome in Adults: A Retrospective Cohort Study.

Authors: Hui Zhang; Dandan Chen; Jing Shao; Ping Zou; Nianqi Cui; Leiwen Tang; Dan Wang; Zhihong Ye
Journal: Diabetes Metab Syndr Obes Date: 2021-05-18 Impact factor: 3.168