Hyerim Kim, Dong Hoon Lim, Yoona Kim.
Abstract
Few studies have used deep learning, such as a deep neural network (DNN), to classify and predict the influence of nutritional intake on overweight/obesity, dyslipidemia, hypertension and type 2 diabetes mellitus (T2DM). The present study aims to classify and predict associations between nutritional intake and the risk of overweight/obesity, dyslipidemia, hypertension and T2DM by developing a DNN model, and to compare the DNN model with popular machine learning models such as logistic regression and decision tree. Subjects aged 40 to 69 years in the 4th to 7th (2007 through 2018) Korea National Health and Nutrition Examination Survey (KNHANES) were included. Diagnostic criteria of dyslipidemia (n = 10,731), hypertension (n = 10,991), T2DM (n = 3889) and overweight/obesity (n = 10,980) were set as dependent variables. Nutritional intakes were set as independent variables. A DNN model comprising one input layer with 7 nodes, three hidden layers with 30, 12 and 8 nodes, and one output layer with one node was implemented in Python using Keras with the TensorFlow backend. The binary cross-entropy loss function for binary classification was used with the Adam optimizer. To avoid overfitting, dropout was applied to each hidden layer. Structural equation modelling (SEM) was also performed to simultaneously estimate multivariate causal associations between nutritional intake and overweight/obesity, dyslipidemia, hypertension and T2DM. With five-fold cross-validation, the DNN model showed higher prediction accuracy (0.58654 for dyslipidemia, 0.79958 for hypertension, 0.80896 for T2DM and 0.62496 for overweight/obesity) than the two other machine learning models. Prediction accuracies for dyslipidemia, hypertension, T2DM and overweight/obesity were 0.58448, 0.79929, 0.80818 and 0.62486, respectively, with logistic regression, and 0.52148, 0.66773, 0.71587 and 0.54026, respectively, with a decision tree. In this study, a DNN model with three hidden layers of 30, 12 and 8 nodes had better prediction accuracy than two conventional machine learning models, logistic regression and decision tree.
Keywords: deep neural network; dyslipidemia; hypertension; nutritional intake; overweight/obesity; prediction; type 2 diabetes mellitus
Year: 2021 PMID: 34073854 PMCID: PMC8197245 DOI: 10.3390/ijerph18115597
Source DB: PubMed Journal: Int J Environ Res Public Health ISSN: 1660-4601 Impact factor: 3.390
Figure 1. DNN structure.
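The architecture described in the abstract (7 input nodes, hidden layers of 30, 12 and 8 nodes, one output node, binary cross-entropy loss) can be illustrated with a minimal plain-Python sketch. This is not the authors' Keras code: the helper names `count_parameters` and `binary_cross_entropy` are hypothetical, and the parameter count assumes fully connected layers with one bias per node, which the source does not state explicitly.

```python
import math

# Layer widths as stated in the abstract: 7 inputs, hidden layers of 30, 12 and 8 nodes, 1 output.
LAYER_SIZES = [7, 30, 12, 8, 1]


def count_parameters(layer_sizes):
    """Count weights plus biases, assuming fully connected layers with one bias per node."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean binary cross-entropy for 0/1 labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


TOTAL_PARAMS = count_parameters(LAYER_SIZES)
print(TOTAL_PARAMS)
```

Under these assumptions the network is small, so overfitting control such as the dropout mentioned in the abstract acts on only a few hundred weights.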
Classification of independent variables.
| Abbreviations | Full Names |
|---|---|
| N_INTK | Food intake (g) |
| N_EN | Energy intake (Kcal) |
| N_PROT | Protein intake (g) |
| N_FAT | Fat intake (g) |
| N_CHO | Carbohydrate intake (g) |
| N_NA | Sodium intake (mg) |
| N_K | Potassium intake (mg) |
N_CHO, carbohydrate intake (g); N_EN, energy intake (Kcal); N_FAT, fat intake (g); N_INTK, food intake (g); N_K, potassium intake (mg); N_NA, sodium intake (mg); N_PROT, protein intake (g).
Classification of dependent variables.
| Abbreviations | Full Names | Diagnostic Criteria | Diagnosis |
|---|---|---|---|
| HE_sbp | Systolic blood pressure (mean value of 2–3 BP measurements) | ≥140 mmHg | Hypertension |
| HE_dbp | Diastolic blood pressure (mean value of 2–3 BP measurements) | ≥80 mmHg | |
| HE_BMI | Body mass index | ≥23 kg/m2 | Overweight/obesity |
| HE_glu | Fasting blood glucose | ≥126 mg/dL | T2DM |
| HE_HbA1c | Glycated hemoglobin | ≥6.5% | |
| HE_chol | Total cholesterol | ≥240 mg/dL | Dyslipidemia |
| HE_HDL_st2 | Calibration of high-density lipoprotein cholesterol | <40 mg/dL | |
| HE_TG | Triglyceride | ≥200 mg/dL | |
HE_BMI, body mass index; HE_chol, total cholesterol; HE_dbp, diastolic blood pressure (mean value of 2–3 blood pressure measurements); HE_glu, fasting blood glucose; HE_HbA1c, glycated hemoglobin; HE_HDL_st2, calibration of high-density lipoprotein cholesterol; HE_sbp, systolic blood pressure (mean value of 2–3 blood pressure measurements); HE_TG, triglyceride; T2DM, type 2 diabetes mellitus. Dyslipidemia, according to the Korean Society of Lipid and Atherosclerosis, is defined as any one of the following: total cholesterol level ≥ 240 mg/dL, high-density lipoprotein cholesterol level < 40 mg/dL, triglyceride level ≥ 200 mg/dL, low-density lipoprotein cholesterol level ≥ 160 mg/dL, or the use of a lipid-lowering drug; hypertension, according to Korean hypertension criteria, is defined as systolic blood pressure ≥ 140 mmHg or diastolic blood pressure ≥ 80 mmHg; overweight/obesity, according to the Korean Society for the Study of Obesity, is defined as a body mass index of 23 kg/m2 or higher; T2DM, according to the Korean Diabetes Association (KDA), is defined as fasting plasma glucose ≥ 126 mg/dL (7.0 mmol/L) or glycated hemoglobin ≥ 6.5%.
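To make the binary outcome labels concrete, the following sketch applies the diagnostic definitions to one subject's measurements. It follows the footnote's definition of hypertension (systolic ≥ 140 mmHg or diastolic ≥ 80 mmHg) and the cut-offs for the listed variables. The function name `label_outcomes` and its argument names are hypothetical, and the low-density lipoprotein and lipid-lowering-drug criteria are left out because they are not among the listed variables.

```python
def label_outcomes(sbp, dbp, bmi, glucose, hba1c, chol, hdl, tg):
    """Return True/False disease labels from one subject's measurements.

    Units: blood pressure in mmHg, BMI in kg/m2, glucose and lipids in mg/dL,
    HbA1c in percent. Illustrative only; LDL cholesterol and drug use are omitted.
    """
    return {
        "hypertension": sbp >= 140 or dbp >= 80,
        "overweight_obesity": bmi >= 23,
        "t2dm": glucose >= 126 or hba1c >= 6.5,
        "dyslipidemia": chol >= 240 or hdl < 40 or tg >= 200,
    }


# Hypothetical subject: only the diastolic pressure crosses a threshold.
example = label_outcomes(sbp=120, dbp=85, bmi=22.0, glucose=100,
                         hba1c=5.4, chol=190, hdl=50, tg=150)
print(example)
```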
Figure 2. Flow chart of the proposed method for classification and prediction. DNN, Deep Neural Network; KNHANES, Korea National Health and Nutrition Examination Survey.
Figure 3. DNN model for classifying and predicting the effects of nutritional intake on overweight/obesity, dyslipidemia, hypertension and T2DM.
Confusion matrix (a) and accuracy formula (b).
| | Predicted Positive | Predicted Negative |
|---|---|---|
| Actual Positive | TP | FN |
| Actual Negative | FP | TN |
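The accuracy formula from the caption does not survive in this record. As a sketch, the standard definition of accuracy from a confusion matrix is the share of correct predictions, (TP + TN) / (TP + FN + FP + TN). The function name `accuracy` and the example counts below are illustrative and are not taken from the study.

```python
def accuracy(tp, fn, fp, tn):
    """Fraction of correct predictions: (TP + TN) / (TP + FN + FP + TN)."""
    total = tp + fn + fp + tn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    return (tp + tn) / total


# Made-up counts for illustration: 85 of 100 predictions are correct.
example_accuracy = accuracy(tp=40, fn=10, fp=5, tn=45)
print(example_accuracy)
```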
Baseline characteristics of datasets.
| Characteristic | Level | Dyslipidemia (Training) | Dyslipidemia (Testing) | Hypertension (Training) | Hypertension (Testing) | T2DM (Training) | T2DM (Testing) | Overweight/Obesity (Training) | Overweight/Obesity (Testing) |
|---|---|---|---|---|---|---|---|---|---|
| Total number (training + testing) | | 10,731 | | 10,991 | | 3889 | | 10,980 | |
| Age (yr) | 40–50 | 3376 | 844 | 3436 | 860 | 1004 | 251 | 3431 | 858 |
| | 51–60 | 2804 | 702 | 2868 | 718 | 1091 | 273 | 2867 | 717 |
| | 61–69 | 2404 | 601 | 2487 | 622 | 1016 | 254 | 2485 | 622 |
| Gender | Male | 3523 | 881 | 3608 | 902 | 1374 | 344 | 3601 | 901 |
| | Female | 5061 | 1266 | 5184 | 1297 | 1736 | 435 | 5182 | 1296 |
| Nutrition | N_INTK | 1408.12 (905.87–1736.19) | 1393.82 (888.89–1734.68) | 1401.18 (901.75–1728.11) | 1393.78 (880.41–1733.28) | 1517.06 (966.93–1878.52) | 1521.52 (1009.11–1898.02) | 1400.03 (903.68–1728.97) | 1403.96 (874.05–1743.19) |
| | N_EN | 1871.35 (1353.14–2232.98) | 1861.38 (1330.12–2246.35) | 1865.20 (1344.48–2228.89) | 1862.07 (1343.46–2240.20) | 1916.55 (1352.02–2302.21) | 1928.98 (1345.41–2326.51) | 1863.99 (1346.98–2229.44) | 1871.48 (1337.82–2249.78) |
| | N_PROT | 65.91 (42.93–81.01) | 65.74 (41.80–82.08) | 65.79 (42.55–81.24) | 65.07 (42.49–80.52) | 68.23 (43.58–84.02) | 68.61 (44.77–85.80) | 65.65 (42.70–80.79) | 65.87 (42.11–81.96) |
| | N_FAT | 33.97 (16.32–43.12) | 33.44 (15.82–44.36) | 33.75 (16.06–43.32) | 33.41 (16.09–42.50) | 38.22 (19.45–48.17) | 39.90 (19.445–51.32) | 33.73 (16.14–49.32) | 33.60 (15.89–42.82) |
| | N_CHO | 308.87 (228.71–371.44) | 306.66 (222.05–370.35) | 307.98 (227.32–370.88) | 307.78 (226.78–370.31) | 299.82 (215.42–362.88) | 300.67 (218.14–358.39) | 307.78 (227.34–370.57) | 309.33 (226.97–369.69) |
| | N_NA | 4467.38 (2518.68–5678.46) | 4435.47 (2480.86–5644.35) | 4460.92 (2495.87–5661.77) | 4423.25 (2526.00–5651.50) | 3745.26 (2117.68–4753.61) | 3787.99 (2122.55–4659.50) | 4446.75 (2504.61–5650.57) | 4490.53 (2512.68–5733.19) |
| | N_K | 3017.37 (2007.16–3722.56) | 2982.82 (1941.35–3705.11) | 3009.58 (1989.25–3721.10) | 2975.15 (1980.56–3657.30) | 2940.81 (1636.24–3636.82) | 2956.82 (2049.80–3594.30) | 3008.45 (1997.65–3711.75) | 2984.71 (1947.75–3685.11) |
| Disease (training + testing) | 0 | 6294 | | 8788 | | 3143 | | 4118 | |
| | 1 | 4437 | | 2203 | | 746 | | 6862 | |
| Dataset by diagnostic criteria | | HE_CHOL: 193.84 (169–217) | HE_CHOL: 194.38 (169–218) | HE_SBP: 120.36 (108–131) | HE_SBP: 120.48 (108–131) | HE_GLU: 113.78 (93.0–124.0) | HE_GLU: 114.17 (93–125) | HE_BMI: 24.12 (21.99–25.99) | HE_BMI: 24.10 (21.88–26.03) |
| | | HE_: 48.42 (39.95–55.0) | HE_: 48.07 (39.95–54.0) | HE_DBP: 77.99 (71.0–84.0) | HE_DBP: 78.22 (70.0–85.0) | HE_HbA1c: 6.17 (5.4–6.5) | HE_HbA1c: 6.16 (5.4–6.5) | | |
| | | HE_TG: 143.47 (79.0–171.0) | HE_TG: 147.60 (80.0–172.5) | | | | | | |
N_CHO, carbohydrate intake (g); N_EN, energy intake (Kcal); N_FAT, fat intake (g); N_INTK, food intake (g); N_K, potassium intake (mg); N_NA, sodium intake (mg); N_PROT, protein intake (g); HE_BMI, body mass index; HE_chol, total cholesterol; HE_dbp, diastolic blood pressure (mean value of 2–3 blood pressure measurements); HE_glu, fasting blood glucose; HE_HbA1c, glycated hemoglobin; HE_HDL_st2, calibration of high-density lipoprotein cholesterol; HE_sbp, systolic blood pressure (mean value of 2–3 blood pressure measurements); HE_TG, triglyceride; T2DM, type 2 diabetes mellitus. Dyslipidemia, according to the Korean Society of Lipid and Atherosclerosis, is defined as any one of the following: total cholesterol level ≥ 240 mg/dL, high-density lipoprotein cholesterol level < 40 mg/dL, triglyceride level ≥ 200 mg/dL, low-density lipoprotein cholesterol level ≥ 160 mg/dL, or the use of a lipid-lowering drug; hypertension, according to Korean hypertension criteria, is defined as systolic blood pressure ≥ 140 mmHg or diastolic blood pressure ≥ 80 mmHg; overweight/obesity, according to the Korean Society for the Study of Obesity, is defined as a body mass index of 23 kg/m2 or higher; T2DM, according to the Korean Diabetes Association (KDA), is defined as fasting plasma glucose ≥ 126 mg/dL (7.0 mmol/L) or glycated hemoglobin ≥ 6.5%.
Figure 4. Five-fold cross-validation for data of dyslipidemia, hypertension, T2DM and overweight/obesity.
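Five-fold cross-validation can be sketched as follows: the sample indices are shuffled and split into five folds, each fold serves once as the test set, and the five accuracies are averaged. This is a generic standard-library illustration, not the authors' pipeline. The names `k_fold_indices` and `mean_accuracy`, the fixed seed and the interleaved fold assignment are assumptions.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) for each of k shuffled folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k near-equal, disjoint folds
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, test_idx


def mean_accuracy(fold_accuracies):
    """Average the per-fold accuracies into one cross-validated score."""
    return sum(fold_accuracies) / len(fold_accuracies)


# Example with the dyslipidemia sample size reported in the abstract (n = 10,731).
splits = list(k_fold_indices(10731, k=5, seed=0))
print([len(test) for _, test in splits])
```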
Results of accuracy analysis for classification models: DNN, logistic regression and decision tree.
| | DNN | Logistic Regression | Decision Tree |
|---|---|---|---|
| Dyslipidemia | 0.58654 | 0.58448 | 0.52148 |
| Hypertension | 0.79958 | 0.79929 | 0.66773 |
| T2DM | 0.80896 | 0.80818 | 0.71587 |
| Overweight/obesity | 0.62496 | 0.62486 | 0.54026 |
T2DM, type 2 diabetes mellitus.
Evaluation of the fitted model.
| Diagnostic Criteria | Model | CMIN | CMIN/DF | NFI | CFI | TLI | IFI | GFI | RMSEA |
|---|---|---|---|---|---|---|---|---|---|
| Dyslipidemia | Research Model | 15.022 | 1.878 | 1.000 | 1.000 | 0.999 | 1.000 | 1.000 | 0.009 |
| Hypertension | | 5.829 | 1.457 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 0.006 |
| T2DM | | 7.300 | 1.217 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 0.007 |
| Overweight/Obesity | | 7.444 | 2.481 | 1.000 | 1.000 | 0.999 | 1.000 | 1.000 | 0.012 |
| Acceptance Model Criteria | | | ≤3 | ≥0.9 | ≥0.9 | ≥0.9 | ≥0.9 | ≥0.9 | ≤0.08 |
CFI, comparative fit index; CMIN, minimum chi-square; CMIN/DF, minimum chi-square/degrees of freedom; GFI, goodness of fit index; IFI, incremental fit index; NFI, normed fit index; RMSEA, root mean square error of approximation; T2DM, type 2 diabetes mellitus; TLI, Tucker-Lewis index.
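The acceptance criteria in the last row of the table can be checked mechanically. The sketch below encodes the thresholds (CMIN/DF ≤ 3; NFI, CFI, TLI, IFI and GFI ≥ 0.9; RMSEA ≤ 0.08) and applies them to the dyslipidemia row. The names `CRITERIA` and `check_fit` are hypothetical.

```python
import operator

# Acceptance thresholds as listed in the table: (comparison, cut-off).
CRITERIA = {
    "CMIN/DF": (operator.le, 3.0),
    "NFI": (operator.ge, 0.9),
    "CFI": (operator.ge, 0.9),
    "TLI": (operator.ge, 0.9),
    "IFI": (operator.ge, 0.9),
    "GFI": (operator.ge, 0.9),
    "RMSEA": (operator.le, 0.08),
}


def check_fit(indices):
    """Return, for each fit index, whether it meets its acceptance criterion."""
    return {name: compare(indices[name], cutoff)
            for name, (compare, cutoff) in CRITERIA.items()}


# Fit indices reported for the dyslipidemia model.
dyslipidemia_fit = {"CMIN/DF": 1.878, "NFI": 1.000, "CFI": 1.000, "TLI": 0.999,
                    "IFI": 1.000, "GFI": 1.000, "RMSEA": 0.009}
dyslipidemia_check = check_fit(dyslipidemia_fit)
print(dyslipidemia_check)
```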
Analysis of estimated parameters’ significance.
| | | | | | | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| B | 4.071 | −0.729 | −1.342 | −2.237 | 0.685 | −0.133 | −0.328 | 6.593 | −0.729 | −1.342 | −2.237 | 0.685 | −0.133 | −1.837 |
| β | 0.428 | −0.077 | −0.141 | −0.234 | 0.072 | −0.014 | −0.034 | 0.385 | −0.077 | −0.141 | −0.234 | 0.072 | −0.014 | −0.170 |
| S.E. | 0.351 | 0.196 | 0.171 | 0.227 | 0.116 | 0.171 | 0.171 | 0.570 | 0.196 | 0.171 | 0.227 | 0.116 | 0.171 | 0.306 |
| Coefficient | 0.037 | 0.042 | 0.036 | 0.046 | 0.025 | 0.037 | 0.070 | 0.047 | 0.042 | 0.036 | 0.046 | 0.025 | 0.037 | 0.078 |
| C.R. | 11.586 | −3.727 | −7.833 | −9.868 | 5.898 | −0.779 | −1.916 | 11.563 | −3.727 | −7.833 | −9.868 | 5.898 | −0.779 | −6.014 |
| | 0.105 | 0.214 | 0.210 | 0.202 | 0.215 | 0.216 | 0.409 | 0.082 | 0.214 | 0.210 | 0.202 | 0.215 | 0.216 | 0.254 |
| p-value | *** | *** | *** | *** | *** | 0.436 | 0.055 | *** | *** | *** | *** | *** | 0.436 | *** |

| | | | | | | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| B | 5.084 | −2.987 | −6.107 | −1.369 | 7.416 | 5.895 | −9.175 | 0.422 | −0.088 | −0.102 | −0.173 | 0.136 | −0.034 | −0.073 |
| β | 0.152 | −0.089 | −0.182 | −0.041 | 0.221 | 0.176 | −0.274 | 0.135 | −0.028 | −0.033 | −0.055 | 0.044 | −0.011 | −0.023 |
| S.E. | 1.912 | 1.183 | 1.005 | 1.261 | 0.723 | 1.033 | 0.997 | 0.105 | 0.064 | 0.054 | 0.070 | 0.037 | 0.056 | 0.056 |
| Coefficient | 0.097 | −0.2848 | 0.088 | 0.098 | 0.052 | 0.079 | 0.153 | 0.020 | 0.075 | 0.037 | 0.049 | 0.027 | 0.038 | 0.038 |
| C.R. | 2.659 | −2.524 | −6.077 | −1.085 | 10.258 | 5.709 | −9.205 | 4.017 | −1.38 | −1.891 | −2.487 | 3.655 | −0.607 | 1.295 |
| | 0.05 | −0.24 | 0.087 | 0.077 | 0.071 | 0.076 | 0.153 | 0.190 | 1.171 | 0.685 | 0.7 | 0.729 | 0.678 | 0.678 |
| p-value | 0.008 | 0.012 | *** | 0.278 | *** | *** | *** | *** | 0.167 | 0.059 | 0.013 | *** | 0.544 | 0.195 |
p-value, probability value; ***, p < 0.001. B, unstandardized regression coefficient; β, standardized regression coefficient; C.R., critical ratio; S.E., standard error; T2DM, type 2 diabetes mellitus; N_CHO, carbohydrate intake (g); N_EN, energy intake (Kcal); N_FAT, fat intake (g); N_INTK, food intake (g); N_K, potassium intake (mg); N_NA, sodium intake (mg); N_PROT, protein intake (g).
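The relation between B, S.E., C.R. and the p-value can be illustrated with a short sketch. It assumes the usual convention that the critical ratio is the unstandardized estimate divided by its standard error and that the p-value is two-sided under a standard normal distribution. The source does not state this explicitly, although the tabulated values are consistent with it (for example, B = −0.133 with S.E. = 0.171 gives a ratio near −0.78 and p ≈ 0.436). The function names are hypothetical.

```python
import math


def critical_ratio(b, se):
    """Critical ratio: estimate divided by its standard error (assumed convention)."""
    return b / se


def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


# One column of the table: B = -0.133, S.E. = 0.171, reported C.R. = -0.779.
cr = critical_ratio(-0.133, 0.171)
p_reported_cr = two_sided_p(-0.779)
print(round(cr, 3), round(p_reported_cr, 3))
```

Small differences between the recomputed ratio and the tabulated C.R. are expected because B and S.E. are shown rounded to three decimals.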
Figure 5. Path diagrams for structural equation models: (a) dyslipidemia to nutritional intake; (b) hypertension to nutritional intake; (c) T2DM to nutritional intake; (d) overweight/obesity to nutritional intake. N_CHO, carbohydrate intake (g); N_EN, energy intake (kcal); N_FAT, fat intake (g); N_INTK, food intake (g); N_K, potassium intake (mg); N_NA, sodium intake (mg); N_PROT, protein intake (g); HE_BMI, body mass index; HE_chol, total cholesterol; HE_dbp, diastolic blood pressure (mean value of 2–3 blood pressure measurements); HE_glu, fasting blood glucose; HE_HbA1c, glycated hemoglobin; HE_HDL_st2, calibration of high-density lipoprotein cholesterol; HE_sbp, systolic blood pressure (mean value of 2–3 blood pressure measurements); HE_TG, triglyceride; T2DM, type 2 diabetes mellitus. Dyslipidemia, according to the Korean Society of Lipid and Atherosclerosis, is defined as any one of the following: total cholesterol level ≥ 240 mg/dL, high-density lipoprotein cholesterol level < 40 mg/dL, triglyceride level ≥ 200 mg/dL, low-density lipoprotein cholesterol level ≥ 160 mg/dL, or the use of a lipid-lowering drug; hypertension, according to Korean hypertension criteria, is defined as systolic blood pressure ≥ 140 mmHg or diastolic blood pressure ≥ 80 mmHg; overweight/obesity, according to the Korean Society for the Study of Obesity, is defined as a body mass index of 23 kg/m2 or higher; T2DM, according to the Korean Diabetes Association (KDA), is defined as fasting plasma glucose ≥ 126 mg/dL (7.0 mmol/L) or glycated hemoglobin ≥ 6.5%.