
Derivation and Validation of a Prediction Model for Predicting the 5-Year Incidence of Type 2 Diabetes in Non-Obese Adults: A Population-Based Cohort Study.

Xin-Tian Cai¹, Li-Wei Ji², Sha-Sha Liu¹, Meng-Ru Wang¹, Mulalibieke Heizhati¹, Nan-Fang Li¹.

Abstract

PURPOSE: The aim of this study was to derive and validate a nomogram based on independent predictors to better evaluate the 5-year risk of T2D in non-obese adults. PATIENTS AND METHODS: This is a historical cohort study drawn from a collection of databases that included 12,940 non-obese participants without diabetes at baseline. All participants were randomly assigned to a derivation cohort (n = 9651) or a validation cohort (n = 3289). In the derivation cohort, the least absolute shrinkage and selection operator (LASSO) regression model was used to determine the optimal risk factors for T2D. Multivariate Cox regression analysis was used to establish the nomogram of T2D prediction. The receiver operating characteristic (ROC) curve, C-index, calibration curve, and decision curve analysis were computed with 1000 bootstrap resamplings to evaluate the discrimination ability, calibration, and clinical practicability of the nomogram.
RESULTS: After LASSO regression analysis of the derivation cohort, it was found that age, fatty liver, γ-glutamyltranspeptidase, triglycerides, glycosylated hemoglobin A1c and fasting plasma glucose were risk predictors, which were integrated into the nomogram. The C-index of derivation cohort and validation cohort were 0.906 [95% confidence interval (CI), 0.878-0.934] and 0.837 (95% CI, 0.760-0.914), respectively. The AUC of 5-year T2D risk in the derivation cohort and validation cohort was 0.916 (95% CI, 0.889-0.943) and 0.829 (95% CI, 0.753-0.905), respectively. The calibration curve indicated that the predicted probability of nomogram is in good agreement with the actual probability. The decision curve analysis demonstrated that the predicted nomogram was clinically useful.
CONCLUSION: Our nomogram can be used as a reasonable, affordable, simple, and widely implementable tool to predict the 5-year risk of T2D in non-obese adults. With this model, early identification of high-risk individuals can support timely intervention and reduce the risk of T2D in non-obese adults.


Keywords: nomogram; prediction model; risk factor; type 2 diabetes

Year: 2021 PMID: 34007195 PMCID: PMC8123981 DOI: 10.2147/DMSO.S304994

Source DB: PubMed Journal: Diabetes Metab Syndr Obes ISSN: 1178-7007 Impact factor: 3.168

Introduction

Type 2 diabetes (T2D) is a disease characterized by hyperglycemia caused by insulin resistance and relative insulin deficiency.1 In recent decades, owing to the rise in obesity and the spread of sedentary lifestyles, the incidence of T2D has increased rapidly worldwide, and it is becoming a serious public health problem in both developed and developing countries.2 In the Western Pacific region, which includes China, Japan and other countries, T2D is considered an epidemic, and this region has the largest number of people with diabetes in the world.3 T2D can lead to a variety of complications, such as cardiovascular disease, diabetic retinopathy, diabetic nephropathy, diabetic neuropathy and diabetic foot.4 T2D and its complications are among the main causes of death and place a huge burden on patients, especially those living in economically underdeveloped areas. In 2019, the International Diabetes Federation Diabetes Atlas, 9th edition, estimated that more than 4 million adults die of T2D and its complications, accounting for 11.3% of all-cause mortality. 
Approximately half (46.2%, 1.9 million) of the deaths attributable to T2D occurred in those younger than 60 years, the working age group.5 Obesity caused by a sedentary lifestyle and a high-energy diet is generally considered to be the main risk factor for the onset of T2D.6 However, many T2D patients are not obese, and the specific etiology of T2D in these non-obese individuals is still unclear.7 Some studies have reported that T2D in non-obese individuals may be caused by lifestyle, intestinal flora structure, and genetic and environmental factors.7–9 Because T2D is a debilitating chronic epidemic, the core of T2D prevention strategies is to identify individuals at high risk of developing T2D,10,12 and early detection, early diagnosis, and early treatment are important components of T2D prevention and health care.11 Studies have shown that lifestyle changes and early pharmacological interventions can prevent or delay the onset of T2D and reduce the harmful effects of T2D in non-obese adults.13,14 In addition, screening individuals at high risk for prediabetes will facilitate the targeted implementation of low-cost, time-consuming intervention programs while avoiding the burden of prevention and treatment in low-risk populations.1,15 Therefore, it is very important to investigate the risk factors for T2D in non-obese people and to find a reliable, simple and accurate screening tool that identifies the non-obese people at high risk of T2D. This will be conducive to the effective implementation of T2D prevention programs in non-obese adults. Risk prediction models have considerable potential to support decision-making for sub-healthy populations and patient management. 
They can screen individuals to identify an increased risk of undiagnosed disease, thereby initiating secondary prevention management and treatment, and ultimately improving patient prognosis.16–18 Researchers around the world have developed dozens of T2D risk prediction models in different populations. Although there are a large number of risk prediction models, only a few are routinely used in clinical practice.19–22 In addition to the heterogeneity of populations and complex mathematical formulas, the lack of simple and intuitive tools to promote the use of these risk prediction models is another important obstacle to risk communication between patients and clinicians. The theory of the nomogram was put forward by the French engineer Philbert Maurice d’Ocagne in 1884. In the field of medicine, the advantage of the nomogram is that it gives an individualized prediction of a certain clinical outcome or the probability of a certain type of event, so it has great value in clinical practice.23 The nomogram transforms a complex regression equation into a simple and intuitive graph. Each predictor is assigned a score according to the size of its effect on the outcome event, and the scores are summed to obtain a total score. Finally, the total score is converted, through the function relating it to the probability of the outcome event, into the predicted probability of the outcome event for the individual.24 In the East Asian population, only a limited number of reliable T2D prediction nomogram models have been established, and all have several limitations.25–28 First of all, most do not consider lifestyle factors, such as physical activity, smoking and alcohol consumption behaviour. Others are based on invasive and cost-effective data, or on small-scale and inappropriate cohort selection. Others are based on short-term follow-up or lack transparent reporting of the steps that produced the model. 
Most importantly, these nomogram models are based on the general population rather than the non-obese population. Using a simple and intuitive nomogram model to accurately estimate the risk of T2D in non-obese adults can help high-risk people take timely intervention measures to reduce the incidence of T2D and improve the quality of life. Therefore, in this study, we constructed and validated a nomogram based on independent predictors to better assess the 5-year risk of T2D in non-obese adults.

Materials and Methods

Data Source

We obtained the data from the “DATADRYAD” database, a website that allows users to download raw data freely. The original authors have waived their copyright on the original research data. Therefore, we can use these data for secondary analysis without infringing the rights of the authors. In using these data, we cite the Dryad data package in accordance with the Dryad Terms of Service (Dryad data package: Okamura Takuro, Hashimoto Yoshitaka, Hamaguchi Masahide, Ohobra Akihiro, Kojima Takao, Fukui Michiaki (2018) Data from: Ectopic fat obesity presents the greatest risk for incident type 2 diabetes: a population-based longitudinal study. Dryad Digital Repository). The variables of raw data in the database file included baseline information, incident T2D and follow-up duration. The variables were extracted as follows: baseline age, gender, baseline ethanol consumption, baseline fatty liver, baseline body mass index (BMI), baseline waist circumference (WC), baseline alanine aminotransferase (ALT), baseline aspartate transaminase (AST), baseline body weight, baseline habit of exercise, baseline γ-glutamyltranspeptidase (GGT), baseline high density lipoprotein-cholesterol (HDL-C), baseline total cholesterol (TC), baseline triglycerides (TG), baseline glycosylated haemoglobin A1c (HbA1c), baseline alcohol consumption, baseline smoking status, baseline fasting plasma glucose (FPG), baseline systolic blood pressure (SBP), baseline diastolic blood pressure (DBP), follow-up duration, incident T2D.

Study Population

It is worth noting that Okamura Takuro and his collaborators completed the entire original study.29 To give readers a clearer understanding of the design and implementation of that study, we briefly summarize it here. Okamura Takuro et al conducted a population-based longitudinal analysis at the Murakami Memorial Hospital in Gifu City, Japan, from 2004 to 2015. In this study, Okamura Takuro et al used the NAGALA (NAfld in the Gifu Area, Longitudinal Analysis) database to investigate the effect of obesity phenotype on the risk of developing T2D. Since most of the participants underwent repeated examinations, the researchers conducted a follow-up study of incident T2D diagnosed by blood tests and fatty liver diagnosed by abdominal ultrasound.30 A total of 15,744 subjects were recruited in the original study and screened according to exclusion criteria. Exclusion criteria: (1) alcoholic fatty liver disease, (2) viral hepatitis (defined by measuring hepatitis B antigen and hepatitis C antibody), (3) any drug use during baseline examination, (4) diabetic patients at baseline, (5) participants with missing covariates, and (6) participants diagnosed with T2D at baseline (participants diagnosed by self-report or diagnosed by a fasting plasma glucose ≥ 6.1 mmol/L).

Data Collection and Measurements

In the original study, standardized self-administered questionnaires were used to investigate the medical history and lifestyle factors of all participants, including physical activity, drinking and smoking habits. The researchers assessed alcohol consumption by asking participants about the type and amount of alcohol consumed per week in the previous month, and then estimated the mean weekly alcohol intake. The participants were divided into the following four groups: no or minimal alcohol consumption, <40 g/week; light alcohol consumption, 40–140 g/week; moderate alcohol consumption, 140–280 g/week; or heavy alcohol consumption, >280 g/week. The researchers also classified the participants into three groups according to their smoking status: never smoker, past smoker or current smoker. Never smokers were defined as participants who had never smoked, past smokers as participants who had smoked in the past but quit before the baseline examination, and current smokers as participants who smoked at the time of the baseline examination. In addition, the participants were asked about their weekly frequency of physical activity, such as jogging, bicycling, and swimming, that lasted long enough to produce perspiration. Exercise status was characterized as regular if any sport that lasted long enough to produce perspiration was performed > 1×/week. The questions in the original study were drawn from a validated questionnaire.29,31,32 Blood samples were collected from the participants after fasting for at least 8 h at each visit. Samples were centrifuged immediately and stored at −80°C until analysis. The clinical measurements of GGT, TC, TG, HDL-C, ALT and other analytes were performed on an automatic analyzer (HITACHI High-Technologies Co., Ltd., Tokyo, Japan). FPG was measured with either the enzymatic or glucose oxidase peroxidative electrode method. HbA1c was measured using a latex agglutination immunoassay, high-performance liquid chromatography or the enzymatic method.
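The four alcohol groups can be written as a small lookup, shown here as a minimal Python sketch. The function name and the group labels are hypothetical. The text does not say which group an intake of exactly 140 g/week belongs to, so assigning it to the light group is an assumption.

```python
def alcohol_category(grams_per_week: float) -> str:
    """Map mean weekly ethanol intake (g/week) to the four groups in the text.

    The text fixes two boundaries: <40 is none/minimal (so 40 is light) and
    >280 is heavy (so 280 is moderate). Exactly 140 is ambiguous in the text;
    treating it as 'light' is an assumption of this sketch.
    """
    if grams_per_week < 0:
        raise ValueError("intake cannot be negative")
    if grams_per_week < 40:
        return "none_or_minimal"
    if grams_per_week <= 140:
        return "light"
    if grams_per_week <= 280:
        return "moderate"
    return "heavy"


if __name__ == "__main__":
    for grams in (0, 40, 150, 300):
        print(grams, alcohol_category(grams))
```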

Definitions

T2D was defined as any of the following: FPG ≥ 126 mg/dL, HbA1c ≥ 48 mmol/mol, or self-reported T2D during follow-up.33 In Asian populations, obesity was defined using a BMI cut-off of 25 kg/m2,29,34 which has been validated by several studies.35,36 The diagnosis of fatty liver was made by abdominal ultrasound examination performed by trained medical technicians. The images were critically reviewed by a gastroenterologist, and fatty liver was diagnosed without reference to the personal data of the participant. Of the four known criteria (hepatorenal echo contrast, liver brightness, deep attenuation, and vascular blurring), participants with hepatorenal echo contrast and liver brightness were diagnosed with fatty liver.
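The outcome and obesity definitions above can be sketched in Python as follows. The function names are hypothetical. The glucose conversion factor (about 18.016 mg/dL per mmol/L, from a molar mass of roughly 180.16 g/mol) is not in the source; it is included only to put the baseline exclusion threshold of 6.1 mmol/L on the same scale as the FPG cut-off. Treating a BMI of exactly 25 kg/m2 as obese is also an assumption.

```python
# Approximate conversion for glucose: 1 mmol/L is about 18.016 mg/dL.
# This factor is an assumption of the sketch, not a value from the source.
MG_DL_PER_MMOL_L = 18.016


def mmol_l_to_mg_dl(glucose_mmol_l: float) -> float:
    """Convert a plasma glucose concentration from mmol/L to mg/dL."""
    return glucose_mmol_l * MG_DL_PER_MMOL_L


def incident_t2d(fpg_mg_dl: float, hba1c_mmol_mol: float, self_reported: bool) -> bool:
    """Incident T2D: FPG >= 126 mg/dL, HbA1c >= 48 mmol/mol, or self-report."""
    return fpg_mg_dl >= 126 or hba1c_mmol_mol >= 48 or self_reported


def is_obese_asian_cutoff(bmi_kg_m2: float) -> bool:
    """Obesity by the BMI cut-off of 25 kg/m2 (exactly 25 counted as obese: assumption)."""
    return bmi_kg_m2 >= 25


if __name__ == "__main__":
    print(round(mmol_l_to_mg_dl(6.1), 1))  # baseline exclusion threshold in mg/dL
    print(incident_t2d(fpg_mg_dl=118, hba1c_mmol_mol=49, self_reported=False))
```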

Ethical Approval

As this study was based on a secondary analysis of previous data and the personal information of the patients in the original data was anonymous, informed consent from the participants was not required. In a previously published article, Okamura et al29 made it clear that the study was approved by the Murakami Memorial Hospital Ethics Committee and that written informed consent was obtained from each participant.

Statistical Analyses

The study follows the transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD) statement.37 For the derivation and validation of the nomogram, all participants were randomly divided into a derivation cohort and a validation cohort at a theoretical ratio of 3:1. Baseline characteristics were expressed as mean ± standard deviation (normal distribution) or median (quartiles) (skewed distribution) for continuous variables, and as frequency or percentage for categorical variables. Differences between the derivation and validation cohorts were analyzed with two-sample t-tests for normally distributed continuous variables, Wilcoxon rank-sum tests for non-normally distributed continuous variables, and chi-square tests for categorical variables. Risk factors were screened by least absolute shrinkage and selection operator (LASSO) regression analysis, a method of shrinkage and variable selection for linear regression models. To obtain a subset of predictors, LASSO regression shrinks the coefficient estimates of weakly associated variables toward zero, and variables whose regression coefficients are zero after shrinkage are excluded from the model. The analysis proceeded in three steps. Step 1: valid variables from the dataset were entered into the LASSO regression analysis, and the optimal penalty parameter λ was determined by k-fold (here, 10-fold) cross-validation. Step 2: a multivariate Cox regression analysis was used to build the predictive model from the features selected by the LASSO regression model; each feature was reported with its hazard ratio (HR), 95% confidence interval (CI), and P-value. Step 3: the results of the multivariate Cox regression analysis were visualized using a forest plot and a nomogram. The discrimination ability of the prediction model was evaluated and compared using Harrell’s concordance index (C-index). 
A C-index > 0.7 was taken to indicate good discrimination. Time-dependent receiver operating characteristic (ROC) curve analysis was applied to evaluate the prediction model’s performance at different time points. The area under the ROC curve (AUC) was used to summarize this performance; the AUC usually ranges from 0.5 to 1.0, and the closer it is to 1, the better the discrimination ability of the prediction model. Calibration curves were drawn separately for the derivation cohort and the validation cohort, and the calibration of the prediction model was evaluated using the Hosmer-Lemeshow goodness-of-fit test. The clinical usefulness of the nomogram prediction model was evaluated in the whole cohort by decision curve analysis (DCA). DCA is a method for determining the clinical practicability of a prediction model according to its net benefit at different threshold probabilities. The C-index, ROC curve, calibration curve and DCA were analyzed with 1000 bootstrap resamplings to reduce over-fitting bias. All statistical analyses were performed using R software (R Development Core Team; version 3.6.1). For all analyses, P < 0.05 was deemed statistically significant and all tests were two-sided unless otherwise indicated.
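To make the discrimination measure concrete, the following is a minimal pure-Python sketch of Harrell's C-index for right-censored data. It is not the authors' R code. The function name is hypothetical, and skipping pairs with tied follow-up times is a simplifying assumption of this sketch.

```python
from itertools import combinations


def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored data.

    times: follow-up time per subject.
    events: 1 if the event was observed, 0 if censored.
    risk_scores: higher score means higher predicted risk (earlier event).
    Pairs with tied times are skipped (simplification).
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue
        # 'a' is the subject with the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # shorter time is censored, so the pair is not comparable
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    # Toy data only, not from the study.
    print(harrell_c_index([1, 2, 3, 4], [1, 1, 0, 1], [0.9, 0.6, 0.7, 0.1]))
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking, which is why the text reads values above 0.7 as good discrimination.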

Results

Characteristics of the Derivation and Validation Cohorts

A total of 12,940 participants were included in this study, of which 9651 were in the derivation cohort and 3289 were in the validation cohort. A flow diagram of the study design is depicted in Figure 1. The crude incidence was 1.6 and 1.4 cases per 100 person-years for the derivation cohort and validation cohort, respectively. The median follow-up time for the derivation cohort was 1972 days (interquartile range: 994–3427), and the median follow-up time for the validation cohort was 2131 days (interquartile range: 1065–3584). There were no statistically significant differences in baseline demographics, clinical characteristics, follow-up time, or incidence of T2D between the two cohorts, with the exception of smoking status (P = 0.033). Baseline characteristics of the derivation and validation cohorts are summarised in Table 1.

Figure 1

Flow diagram of study design.

Table 1

Demographic and Clinical Characteristics of Study Population in the Derivation and Validation Cohorts

Characteristic	All	Derivation Cohort	Validation Cohort	P-value
No. of participants	12,940	9651	3289
Age (years)	43.64 ± 8.99	43.64 ± 9.01	43.63 ± 8.94	0.983
BMI (kg/m2)	21.11 ± 2.14	21.11 ± 2.14	21.12 ± 2.11	0.789
WC (cm)	74.03 ± 7.32	74.04 ± 7.32	73.99 ± 7.32	0.757
ALT (IU/L)	16.00 (12.00–21.00)	16.00 (12.00–21.00)	16.00 (12.00–21.00)	0.311
AST (IU/L)	17.73 ± 8.16	17.70 ± 6.55	17.81 ± 11.69	0.641
GGT (IU/L)	14.00 (11.00–20.00)	14.00 (11.00–20.00)	14.00 (11.00–20.00)	0.568
HDL-c (mmol/L)	1.51 ± 0.40	1.51 ± 0.41	1.51 ± 0.39	0.836
TC (mmol/L)	5.07 ± 0.85	5.07 ± 0.85	5.08 ± 0.85	0.654
TG (mmol/L)	0.68 (0.46–1.00)	0.68 (0.46–1.00)	0.68 (0.46–0.99)	0.680
HbA1c (mmol/mol)	32.83 ± 3.44	32.81 ± 3.43	32.88 ± 3.47	0.350
FPG (mg/dl)	92.21 ± 7.35	92.22 ± 7.34	92.17 ± 7.38	0.773
SBP (mmHg)	112.35 ± 14.07	112.34 ± 14.09	112.38 ± 14.01	0.879
DBP (mmHg)	70.12 ± 9.94	70.10 ± 9.96	70.18 ± 9.87	0.686
Gender [n(%)]				0.280
Female	6406 (49.51%)	4751 (49.23%)	1655 (50.32%)
Male	6534 (50.49%)	4900 (50.77%)	1634 (49.68%)
Fatty liver [n(%)]				0.359
No	11,603 (89.67%)	8640 (89.52%)	2963 (90.09%)
Yes	1337 (10.33%)	1011 (10.48%)	326 (9.91%)
Habit of exercise [n(%)]				0.222
No	10,614 (82.02%)	7893 (81.78%)	2721 (82.73%)
Yes	2326 (17.98%)	1758 (18.22%)	568 (17.27%)
Alcohol consumption [n(%)]				0.889
Non	9955 (76.93%)	7421 (76.89%)	2534 (77.04%)
Light	1467 (11.34%)	1101 (11.41%)	366 (11.13%)
Moderate	1104 (8.53%)	826 (8.56%)	278 (8.45%)
Heavy	414 (3.20%)	303 (3.14%)	111 (3.37%)
Smoking status [n(%)]				0.033
Never	7846 (60.63%)	5789 (59.98%)	2057 (62.54%)
Past	2349 (18.15%)	1786 (18.51%)	563 (17.12%)
Current	2745 (21.21%)	2076 (21.51%)	669 (20.34%)
Follow-up duration (days)	2016.00 (1017.00–3465.25)	1972.00 (994.00–3426.50)	2131.00 (1065.00–3584.00)	0.074
Incident T2D [n(%)]				0.504
No	12,739 (98.45%)	9497 (98.40%)	3242 (98.57%)
Yes	201 (1.55%)	154 (1.60%)	47 (1.43%)

Notes: Data are presented as n (%), mean ± SD or median (interquartile range).

Abbreviations: BMI, body mass index; WC, waist circumference; ALT, alanine aminotransferase; AST, aspartate aminotransferase; GGT, γ-glutamyl transpeptidase; HDL-c, high-density lipoprotein cholesterol; TC, total cholesterol; TG, triglyceride; HbA1c, glycosylated hemoglobin A1c; FPG, fasting plasma glucose; SBP, systolic blood pressure; DBP, diastolic blood pressure; T2D, type 2 diabetes.


Characteristics Selection by LASSO Regression Analysis

Through LASSO regression analysis, we obtained 6 features with non-zero coefficients, reducing the 18 indicators to 6. Figure 2A shows the selection of the tuning parameter (lambda) of the LASSO regression model by 10-fold cross-validation. Dotted vertical lines were drawn at the optimal values given by the minimum criterion and by 1 SE of the minimum criterion (the 1-SE criterion). Figure 2B shows the LASSO coefficient profiles of the features, plotted against the log (lambda) sequence. A vertical line was drawn at the value selected using 10-fold cross-validation, where the optimal lambda resulted in 6 features with non-zero coefficients. These features were age, fatty liver, GGT, TG, HbA1c and FPG (Table 2).
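The two selection criteria can be illustrated with a short Python sketch. The 1-SE criterion is commonly described as choosing the largest lambda whose mean cross-validated error is within one standard error of the minimum. The function name and the cross-validation values below are made up for illustration and are not the study's results.

```python
def select_lambda(lambdas, cv_mean, cv_se):
    """Return (lambda_min, lambda_1se) from cross-validation summaries.

    lambda_min: the lambda with the lowest mean cross-validated error.
    lambda_1se: the largest lambda whose mean error is no more than one
                standard error above that minimum (a sparser model).
    """
    i_min = min(range(len(lambdas)), key=lambda i: cv_mean[i])
    threshold = cv_mean[i_min] + cv_se[i_min]
    lambda_1se = max(lam for lam, err in zip(lambdas, cv_mean) if err <= threshold)
    return lambdas[i_min], lambda_1se


if __name__ == "__main__":
    # Hypothetical cross-validation output, for illustration only.
    lambdas = [0.0005, 0.001, 0.002, 0.005, 0.01, 0.02]
    cv_mean = [0.160, 0.155, 0.152, 0.154, 0.161, 0.175]
    cv_se = [0.004] * 6
    print(select_lambda(lambdas, cv_mean, cv_se))
```

Because a larger lambda penalizes coefficients more heavily, the 1-SE choice trades a small loss in cross-validated fit for fewer retained predictors.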

Figure 2

Demographic and clinical feature selection using the LASSO regression model. (A) 10-fold cross-validation via the minimum criterion was applied to select the optimal parameter (lambda) of the LASSO model. The partial likelihood deviance (binomial deviance) curve was plotted versus log (lambda). Dotted vertical lines were drawn at the optimal values given by the minimum criterion and by 1 SE of the minimum criterion (the 1-SE criterion). (B) LASSO coefficient profiles of the 19 features. A coefficient profile plot was produced against the log (lambda) sequence. A vertical line was drawn at the value selected by 10-fold cross-validation, where the optimal lambda resulted in six features with nonzero coefficients.

Table 2

Risk Factors of Type 2 Diabetes According to the LASSO Regression Model in Non-Obese Adults

Factors	Coefficients	Lambda.1se
Fatty liver	0.465048274	0.0051
Age	0.002806796
GGT	0.001415237
TG	0.094786652
HbA1c	0.200263334
FPG	0.084919892

Abbreviations: LASSO, least absolute shrinkage and selection operator; GGT, γ-glutamyl transpeptidase; TG, triglyceride; HbA1c, glycosylated hemoglobin A1c; FPG, fasting plasma glucose.


Multivariate Cox Regression Analysis in the Training Cohort

With T2D as the dependent variable, the 6 potential risk factors selected in the LASSO regression method were used as independent variables, including age, fatty liver, GGT, TG, HbA1c and FPG. The results of multivariate Cox regression analysis showed that age (HR 1.04; 95% CI 1.02–1.06), fatty liver (HR 1.98; 95% CI 1.46–2.68), GGT (HR 1.03; 95% CI 1.01–1.05), TG (HR 1.20; 95% CI 1.03–1.41), HbA1c (HR 1.28; 95% CI 1.22–1.34), and FPG (HR 1.12; 95% CI 1.09–1.15) were independent risk factors for T2D in the non-obese population. The results of multivariate Cox regression analysis are shown in the forest plot (Figure 3).
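Under the proportional hazards assumption, the reported HRs combine multiplicatively, which the following Python sketch illustrates. It assumes each HR is per one unit of the predictor in the units of Table 1 (years, IU/L, mmol/L, mmol/mol, mg/dL) and per presence versus absence of fatty liver; the source does not state the units of the HRs explicitly. The dictionary keys and function name are hypothetical. The baseline survival is not reported in this section, so only relative hazards, not absolute 5-year risks, can be computed this way.

```python
import math

# Hazard ratios as reported in the text. Units per predictor are assumed to
# follow Table 1; fatty liver is coded 1 = yes, 0 = no (assumption).
HAZARD_RATIOS = {
    "age": 1.04,
    "fatty_liver": 1.98,
    "ggt": 1.03,
    "tg": 1.20,
    "hba1c": 1.28,
    "fpg": 1.12,
}


def relative_hazard(person, reference):
    """Hazard of `person` relative to `reference` under a Cox model.

    The log hazard ratio is the sum over predictors of
    ln(HR) * (difference in predictor value).
    """
    log_hr = sum(
        math.log(hr) * (person[name] - reference[name])
        for name, hr in HAZARD_RATIOS.items()
    )
    return math.exp(log_hr)


if __name__ == "__main__":
    # Illustrative profiles only, not participants from the study.
    reference = {"age": 44, "fatty_liver": 0, "ggt": 14, "tg": 0.7, "hba1c": 33, "fpg": 92}
    person = dict(reference, fatty_liver=1, fpg=100)
    print(round(relative_hazard(person, reference), 2))
```

Because the published HRs are rounded to two decimals, results from this sketch are approximate.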

Figure 3

Forest plot of the HRs of the selected features, visualizing the results of the multivariate Cox regression analysis.

Establishment of a Predicting Nomogram

The nomogram was established to predict the 5-year risk of T2D in non-obese adults based on the significant predictors (age, fatty liver, GGT, TG, HbA1c and FPG) in the training cohort (Figure 4). The points for each of an individual's predictor values were read from the top Points scale and then summed. Finally, a personalized 5-year risk of T2D was obtained from the Total Points scale.

Figure 4

Nomogram for predicting the 5-year risk of T2D in non-obese adults. To use the nomogram, find the position of each variable on the corresponding axis. Draw a vertical line from that value to the top Points scale to determine the number of points assigned to that variable value. Then add the points for all variables. Finally, draw a line down from the Total Points axis to read the 5-year risk of T2D on the bottom line of the nomogram.

Model Performance for Derivation and Validation Cohort

The C-index of the nomogram in derivation cohort was 0.906 (95% CI, 0.878–0.934), whereas in validation cohort was 0.837 (95% CI, 0.760–0.914) (Table 3), indicating that the nomogram has good ability of discrimination and prediction. The results of time-dependent ROC analyses are shown in Figure 5A and B. The predictive model of the nomogram resulted in AUC ranging from 0.906 to 0.929 for different time points in derivation cohort. The predictive model of the nomogram resulted in AUC ranging from 0.740 to 0.869 for different time points in validation cohort. In the derivation cohort, the AUC for 5-year risk of T2D in non-obese adults was 0.916 (95% CI, 0.889–0.943). Likewise, in the validation cohort, the AUC for 5-year risk of T2D in non-obese adults was 0.829 (95% CI, 0.753–0.905), indicating that the nomogram has good ability of discrimination. The calibration of the nomogram prediction model was evaluated by Hosmer-Lemeshow fitting test, and calibration curves were obtained (Figure 6A and B). Using the Hosmer-Lemeshow test, there was no statistically significant difference between the predicted risk of T2D and the observed risk (P > 0.05). The calibration curves of the nomogram model for the probability of T2D in the non-obese population showed good prediction accuracy between the predicted and observed values, both in the derivation and validation cohorts.

Table 3

C-Index of the Nomogram in the Derivation and Validation Cohorts

	C-Index	95% CI	Dxy	Variance	n
Derivation cohort	0.906	0.878–0.934	0.812	0.028	9651
Validation cohort	0.837	0.760–0.914	0.674	0.077	3289
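The columns of Table 3 are arithmetically related, as the short Python check below shows. Somers' Dxy is the rescaled C-index, Dxy = 2(C − 0.5). The values in the column labeled Variance coincide with the half-width of the reported 95% CI; this is an observation from the tabulated numbers, not a statement from the authors. The function names are hypothetical.

```python
def somers_dxy(c_index: float) -> float:
    """Somers' Dxy rank correlation from the C-index: Dxy = 2 * (C - 0.5)."""
    return 2.0 * (c_index - 0.5)


def ci_half_width(lower: float, upper: float) -> float:
    """Half the width of a confidence interval."""
    return (upper - lower) / 2.0


if __name__ == "__main__":
    # (cohort, C-index, CI lower, CI upper) as reported in Table 3.
    rows = [("Derivation", 0.906, 0.878, 0.934), ("Validation", 0.837, 0.760, 0.914)]
    for name, c, lo, hi in rows:
        print(name, round(somers_dxy(c), 3), round(ci_half_width(lo, hi), 3))
```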

Figure 5

Time-dependent receiver-operating characteristic (ROC) curves of the model in the derivation and validation cohort. (A) Time-dependent ROC curve of the model in the derivation cohort. (B) Time-dependent ROC curve of the model in the validation cohort. The solid and dashed lines depict the AUC and random chance, respectively. *Using bootstrap resampling (times = 1000).

Figure 6

Calibration curves for the derivation and validation cohort models. (A) Calibration curve of the model in the derivation cohort. (B) Calibration curve of the model in the validation cohort. The red solid line represents an ideal predictive model, and the solid black line shows the actual performance of the predictive model. The yellow shadow represents 95% confidence interval. The calibration curves showed a good correlation between the predicted probability and actual probability. *Using bootstrap resampling (times = 1000).


Clinical Usefulness of the Predicting Nomogram

Using data from the whole cohort, DCA showed clinical utility for predicting the 5-year risk of T2D in a non-obese population. The farther the model curve is from the black and light gray lines, the better the clinical application effect of the nomogram prediction model. When the threshold probability was between 1% and 69.5%, the nomogram of T2D risk prediction provided more net benefit than either reference strategy, that is, considering all individuals to develop T2D or considering no individuals to develop T2D (Figure 7), indicating that the nomogram was clinically useful.

Figure 7

Decision curve analysis of the nomogram in the whole cohort. Net benefit was plotted against the high-risk threshold. The black line represents the net benefit when none of the participants are considered to develop diabetes, while the light gray line represents the net benefit when all participants are considered to develop diabetes. The area between the “no treatment line” (black line) and “all treatment line” (light gray line) in the model curve indicates the clinical utility of the model. The farther the model curve is from the black and light gray lines, the better the clinical use of the nomogram. *Using bootstrap resampling (times = 1000).
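The net benefit plotted in a decision curve can be sketched in Python with the standard formula, net benefit = TP/n − (FP/n) × pt/(1 − pt), where pt is the threshold probability. This simplified sketch treats the outcome as binary at 5 years and ignores censoring, which is an assumption. The function names and toy data are hypothetical.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of classifying as high risk everyone with y_prob >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Net benefit of the 'all' reference strategy (everyone high risk)."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


if __name__ == "__main__":
    # Toy data only. The 'none' reference strategy has net benefit 0 by definition.
    y_true = [1, 0, 0, 0]
    y_prob = [0.9, 0.1, 0.2, 0.6]
    for pt in (0.3, 0.5, 0.7):
        print(pt, net_benefit(y_true, y_prob, pt), net_benefit_treat_all(y_true, pt))
```

A model is useful at a given threshold when its net benefit exceeds both reference strategies, which is the comparison the figure makes across thresholds.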

Discussion

T2D is a major health problem that is growing in severity worldwide.38 The number of people with T2D aged 20–79 years was predicted to rise to 642 million by 2040.39 The number of people with T2D in the Western Pacific region is also increasing rapidly and is expected to exceed 202 million by 2035, which has been attributed to rapid urbanization and changes in lifestyle and epigenetics.40,41 T2D can lead to a variety of complications that can cause severe physical and psychological distress and place an increasing burden on the socioeconomic and public health care systems.42 Obesity is now a well-known independent risk factor for T2D.8 Many mechanisms have been reported for the association of obesity with T2D, including insulin resistance, inhibition of insulin action, and relative insufficiency of insulin action.43,44 Therefore, weight loss in obese individuals is effective in controlling T2D.43 Recently, T2D in non-obese individuals has attracted attention. According to an epidemiological study in Japan, over 60% of people with T2D in that population were not obese.45 Another study reported that 68.2% of newly detected T2D cases in southern China occurred in non-obese individuals.46 Compared with obese individuals, the key defect that causes the development of hyperglycemia in T2D in non-obese individuals is impaired pancreatic insulin secretion and decreased insulin resistance.45,47 It is worth noting that the increased risk of cardiovascular disease in non-obese type 2 diabetic patients is similar to that of obese type 2 diabetic patients.48,49 In addition, a meta-analysis showed that adults with normal weight at the time of incident T2D had higher non-cardiovascular and cardiovascular mortality compared with obese adults.50 Therefore, non-obese T2D may involve underlying pathophysiological changes that lead to a worse prognosis than obese T2D. 
However, a possible mechanism explaining this phenomenon has yet to be elucidated.51–53 Early detection of individuals at high risk for T2D in the non-obese population is therefore essential to reduce the incidence of T2D, its complications, and the associated socioeconomic burden, which prompted us to carry out this study.

Although various T2D prediction models based on demographic information and clinical measurements have been established, they have been used mainly in European and American populations. In East Asian populations, only a limited number of reliable T2D prediction models have been established, each containing different risk predictors, and their predictive performance and clinical utility vary greatly. In 2019, Lin et al26 used Cox proportional hazards regression to develop a nomogram for predicting the 5-year incidence of T2D based on age, sex, hypertensive dyslipidemia, smoking status, BMI, and family history of T2D. The C-index of the model was 0.815. However, they did not perform a decision curve analysis to assess the clinical utility of the model. In addition, categorizing continuous risk predictors causes a detrimental loss of information and reduces the ability to detect real relationships. In 2019, Wang et al28 developed a nomogram to predict the 2-year risk of T2D in healthy residents of mainland China based on BMI, age, FPG, HDL-C, LDL-C, and TG. The AUCs for females and males were 0.847 and 0.755, respectively. Like our nomogram, their nomogram contains continuous predictor variables. However, they did not consider HbA1c, fatty liver, smoking, or history of alcohol consumption, and they did not measure how closely the predicted risk matched the actual risk.

Compared with these similar studies, our nomogram fills these gaps. First, our study has a large sample size (n = 12,940).
Second, we comprehensively considered new factors, such as HbA1c, fatty liver, and smoking and drinking history. Third, nomograms and decision curves were used to visualize risk scores to improve their clinical utility. Fourth, the performance of the nomogram was evaluated using multiple new methods, including calibration plots and decision curves.

A nomogram is considered a practical and reliable prediction tool that estimates the individual probability of a clinical event by integrating multiple prognostic and determinant variables, thereby quantifying individual risk.24,28 Nomograms rely on a user-friendly digital interface to improve accuracy and provide an easier-to-understand prognosis, which supports better clinical decision-making.24,54,55 In this population-based cohort study, we developed and validated an individualized nomogram for predicting the 5-year risk of T2D in a non-obese population. To the best of our knowledge, this is the first such nomogram for a non-obese population. Because this is a retrospective cohort study, it can significantly reduce the risk of selection bias and information bias. Discrimination was high in both the derivation cohort and the validation cohort (C-index 0.906 and 0.837, respectively), which indicates that the nomogram distinguishes individuals at risk of T2D from those not at risk relatively well. The calibration curve indicated that the prediction model was relatively accurate in predicting the risk of T2D. In addition, the decision curve analysis indicated that clinical application of the nomogram could avoid additional T2D screening for non-obese adults at low risk of T2D and thereby reduce economic burden and medical costs.

Risk predictors in this prediction model included age, GGT, TG, FPG, HbA1c, and fatty liver.
These variables, identified as risk factors for T2D, are consistent with previous studies.25,26,28,56 T2D usually occurs in adults and is more common in the elderly. Advanced age is an unalterable risk factor for T2D. Ageing of pancreatic β-cells leads to decreased glucose sensitivity and defective insulin secretion, resulting in hyperglycaemia and T2D.57 Age-related glucose intolerance is usually accompanied by insulin resistance and β-cell dysfunction. The epigenetic changes caused by ageing may affect gene expression and insulin secretion in pancreatic islets.58 Bacos et al59 found that age-related changes in DNA methylation in human pancreatic islets are related to insulin secretion and T2D.

GGT exists on the surface of most cell types and is highly active in tissues, especially in the liver, pancreas, kidney, and bile duct. Traditionally, serum GGT has been considered a marker of hepatobiliary disease or excessive alcohol consumption.60 Although many epidemiological studies have confirmed that serum GGT levels are closely associated with the onset of T2D and are important biochemical risk indicators for predicting T2D, the specific biological mechanism linking serum GGT and T2D is not completely clear.61 Researchers have proposed several possible mechanisms to explain the relationship between GGT and T2D. Firstly, GGT is not only a sensitive indicator of oxidative stress but also a direct contributor to it.
Oxidative stress is a factor that reduces insulin secretion by destroying pancreatic cells.60 Secondly, we propose that GGT is associated with T2D through hepatic steatosis (such as non-alcoholic fatty liver), which is associated with hepatic insulin resistance.62 Finally, genetic variants of GGT may increase the risk of T2D.60,63

According to previous studies, dyslipidemia is a well-known independent risk factor for T2D and impaired fasting glucose.64,65 Similar to these reports, in our prediction model, non-obese individuals with dyslipidemia had a higher T2D risk score.25,56 Dyslipidemia and T2D often coexist in the same individual. As an endocrine organ, adipose tissue can affect glucose and lipid metabolism, and TG is the most abundant lipid in adipose tissue.66 TG itself may directly lead to disordered glucose metabolism. Excess adipose tissue releases many lipid metabolites and pro-inflammatory cytokines and induces cellular stress, all of which mediate insulin resistance.67,68

The FPG level reflects the secretion level and function of basal insulin. Elevated FPG levels are associated with an increased risk of T2D, which may be closely related to insulin response and insulin sensitivity.69 Rather than reflecting blood glucose at a single time point, the HbA1c level reflects the average blood glucose level over the past three months.70 Previous studies have shown that HbA1c can be used as a predictor of T2D, diabetic complications, and diabetic drug response.71,72 A large prospective open cohort study in England found that both FPG and HbA1c improved the predictive ability of 10-year T2D risk prediction models.73

Our study has some limitations. First, the diagnosis of T2D was based on fasting blood glucose ≥ 126 mg/dL, HbA1c ≥ 48 mmol/mol, or self-reported T2D rather than on a 2-hour oral glucose tolerance test, so the incidence of T2D may have been underestimated.
Secondly, data on LDL-C, insulin, and fasting C-peptide levels were not available in the database, so it was not possible to compare the accuracy of the triglyceride-glucose index and the homeostasis model assessment of insulin resistance in predicting the risk of T2D. Third, this large cohort study was conducted in Japan; therefore, whether the results can be extended to other races and to special groups, such as pregnant women and children, requires further validation in external cohorts. Fourth, the nomogram is based on a retrospective cohort that excluded individuals with incomplete data, which may lead to selection bias. Finally, this report is a secondary analysis of an existing database. Although many confounding factors were adjusted for, some variables not included in the database, such as family history, diet, rest and sleep, psychological factors, and data on participants receiving treatment, could not be adjusted for. Therefore, the potential impact of these residual confounding factors on the results cannot be ignored.
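
As a concrete illustration of the outcome definition and the two indices mentioned in the limitations above, the sketch below encodes the T2D case definition stated in the text together with the commonly used formulas for the triglyceride-glucose (TyG) index and HOMA-IR. The function and parameter names are hypothetical, the formulas are the widely used forms (TyG = ln[TG × FPG / 2] with both in mg/dL; HOMA-IR = insulin in µU/mL × glucose in mg/dL / 405) rather than anything taken from this study, and the units are assumptions.

```python
import math


def meets_t2d_definition(fpg_mg_dl, hba1c_mmol_mol, self_reported_t2d):
    """Case definition described in the text: any one criterion suffices."""
    return (
        fpg_mg_dl >= 126.0
        or hba1c_mmol_mol >= 48.0
        or bool(self_reported_t2d)
    )


def tyg_index(tg_mg_dl, fpg_mg_dl):
    """Triglyceride-glucose index; both inputs assumed to be in mg/dL."""
    return math.log(tg_mg_dl * fpg_mg_dl / 2.0)


def homa_ir(fasting_insulin_uU_ml, fpg_mg_dl):
    """HOMA-IR with glucose in mg/dL (divide by 22.5 instead if in mmol/L)."""
    return fasting_insulin_uU_ml * fpg_mg_dl / 405.0


# Illustrative values only; they are not data from the study.
example_case = meets_t2d_definition(fpg_mg_dl=100.0, hba1c_mmol_mol=48.0,
                                    self_reported_t2d=False)
example_tyg = tyg_index(tg_mg_dl=150.0, fpg_mg_dl=100.0)
example_homa = homa_ir(fasting_insulin_uU_ml=10.0, fpg_mg_dl=81.0)
```

Note that HOMA-IR requires a fasting insulin measurement, which is why it cannot be computed from the database described above.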

Conclusion

We developed and validated a personalized nomogram for predicting the 5-year risk of T2D in non-obese adults that incorporates age, GGT, TG, FPG, HbA1c, and fatty liver. This model can help clinicians, especially community medical workers, to evaluate the risk of T2D in non-obese adults and to formulate effective primary prevention strategies based on the results, thereby reducing the risk of T2D.
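
For readers unfamiliar with how a Cox-based nomogram yields an individual risk, the sketch below shows the standard relationship: a linear predictor is formed from the predictors and their coefficients, and the risk at a time horizon is 1 − S0(t)^exp(lp), where S0(t) is the baseline survival at that horizon. The fitted coefficients and baseline survival of this nomogram are not given in this section, so every number below is a hypothetical placeholder, and the function names are illustrative only.

```python
import math


def linear_predictor(values, coefs, references):
    """Sum of coefficient * (value - reference value) over all predictors."""
    return sum(coefs[name] * (values[name] - references[name]) for name in coefs)


def risk_at_horizon(lp, baseline_survival):
    """Predicted event probability from a Cox model at a fixed horizon.

    baseline_survival is S0(t): the survival probability at the horizon
    for an individual whose linear predictor is zero.
    """
    return 1.0 - baseline_survival ** math.exp(lp)


# HYPOTHETICAL placeholders, NOT the coefficients of the published nomogram.
placeholder_coefs = {"age_years": 0.02, "fpg_mg_dl": 0.10}
placeholder_refs = {"age_years": 40.0, "fpg_mg_dl": 95.0}
placeholder_s0_5y = 0.98

person = {"age_years": 50.0, "fpg_mg_dl": 100.0}
lp = linear_predictor(person, placeholder_coefs, placeholder_refs)
risk_5y = risk_at_horizon(lp, placeholder_s0_5y)
```

A printed nomogram performs the same calculation graphically: each predictor contributes points proportional to its term in the linear predictor, and the total-points axis is mapped to the predicted risk.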

73 in total

Review 1. Managing Dyslipidemia in Type 2 Diabetes.

Authors: Adam J Nelson; Simon K Rochelau; Stephen J Nicholls
Journal: Endocrinol Metab Clin North Am Date: 2017-12-20 Impact factor: 4.741

Review 2. The global agenda for the prevention of type 2 diabetes.

Authors: William H Herman
Journal: Nutr Rev Date: 2017-01 Impact factor: 7.110

Review 3. Nomograms in oncology: more than meets the eye.

Authors: Vinod P Balachandran; Mithat Gonen; J Joshua Smith; Ronald P DeMatteo
Journal: Lancet Oncol Date: 2015-04 Impact factor: 41.316

4. Non alcoholic fatty liver disease and risk of incident diabetes in subjects who are not obese.

Authors: K-C Sung; D-C Seo; S-J Lee; M-Y Lee; S H Wild; C D Byrne
Journal: Nutr Metab Cardiovasc Dis Date: 2019-02-07 Impact factor: 4.222

5. IDF Diabetes Atlas estimates of 2014 global health expenditures on diabetes.

Authors: Joao da Rocha Fernandes; Katherine Ogurtsova; Ute Linnenkamp; Leonor Guariguata; Till Seuring; Ping Zhang; David Cavan; Lydia E Makaroff
Journal: Diabetes Res Clin Pract Date: 2016-04-27 Impact factor: 5.602

6. The metabolic syndrome as a predictor of nonalcoholic fatty liver disease.

Authors: Masahide Hamaguchi; Takao Kojima; Noriyuki Takeda; Takayuki Nakagawa; Hiroya Taniguchi; Kota Fujii; Tatsushi Omatsu; Tomoaki Nakajima; Hiroshi Sarui; Makoto Shimazaki; Takahiro Kato; Junichi Okuda; Kazunori Ida
Journal: Ann Intern Med Date: 2005-11-15 Impact factor: 25.391

7. Total and High Molecular Weight Adiponectin Levels and Prediction of Cardiovascular Risk in Diabetic Patients.

Authors: Dagmar Horáková; Kateřina Azeem; Radka Benešová; Dalibor Pastucha; Vladimír Horák; Lenka Dumbrovská; Arnošt Martínek; Dalibor Novotný; Zdeněk Švagera; Milada Hobzová; Dana Galuszková; Vladimír Janout; Sandra Doněvská; Jana Vrbková; Helena Kollárová
Journal: Int J Endocrinol Date: 2015-05-05 Impact factor: 3.257

8. Age-specific impact of diabetes mellitus on the risk of cardiovascular mortality: An overview from the evidence for Cardiovascular Prevention from Observational Cohorts in the Japan Research Group (EPOCH-JAPAN).

Authors: Yoichiro Hirakawa; Toshiharu Ninomiya; Yutaka Kiyohara; Yoshitaka Murakami; Shigeyuki Saitoh; Hideaki Nakagawa; Akira Okayama; Akiko Tamakoshi; Kiyomi Sakata; Katsuyuki Miura; Hirotsugu Ueshima; Tomonori Okamura
Journal: J Epidemiol Date: 2017-01-05 Impact factor: 3.211

9. Aging and stress induced β cell senescence and its implication in diabetes development.

Authors: Na Li; Furong Liu; Ping Yang; Fei Xiong; Qilin Yu; Jinxiu Li; Zhiguang Zhou; Shu Zhang; Cong-Yi Wang
Journal: Aging (Albany NY) Date: 2019-11-13 Impact factor: 5.682

10. Personalized risk prediction for type 2 diabetes: the potential of genetic risk scores.

Authors: Kristi Läll; Reedik Mägi; Andrew Morris; Andres Metspalu; Krista Fischer
Journal: Genet Med Date: 2016-08-11 Impact factor: 8.822
