Rosy Oh, Hong Kyu Lee, Youngmi Kim Pak, Man-Suk Oh.
Abstract
The early prediction and identification of risk factors for diabetes may prevent or delay diabetes progression. In this study, we developed an interactive online application that provides the predictive probabilities of prediabetes and diabetes in 4 years based on a Bayesian network (BN) classifier, which is an interpretable machine learning technique. The BN was trained using a dataset from the Ansung cohort of the Korean Genome and Epidemiology Study (KoGES) in 2008, with a follow-up in 2012. The dataset contained not only traditional risk factors for future diabetes (current diabetes status, sex, age, etc.) but also serum biomarkers that quantified the individual level of exposure to environment-polluting chemicals (EPC). Based on accuracy and the area under the curve (AUC), a tree-augmented BN with 11 variables derived from feature selection was used as our prediction model. The online application that implemented our BN prediction system provided a tool that performs customized diabetes prediction and allows users to simulate the effects of controlling risk factors for the future development of diabetes. The prediction results of our method demonstrated that the EPC biomarkers had interactive effects on diabetes progression and that the use of the EPC biomarkers contributed to a substantial improvement in prediction performance.
Keywords: Bayesian network; diabetes mellitus; environmental pollutants; glucose intolerance; machine learning
Year: 2022 PMID: 35627338 PMCID: PMC9142138 DOI: 10.3390/ijerph19105800
Source DB: PubMed Journal: Int J Environ Res Public Health ISSN: 1660-4601 Impact factor: 4.614
Description and discretization of variables.
| Variable | Description | Class |
|---|---|---|
| fGTOL | Glucose tolerance at 4-year follow-up | NGT, IGT, DM |
| cGTOL | Glucose tolerance at the time of data collection | NGT, IGT, DM |
| Sex | Sex | Male, Female |
| Drink | Alcohol intake | Non-drinker, Ex-drinker, Current drinker |
| Smoke | Smoking status | Non-smoker, Ex-smoker, Current smoker |
| Exercise | Exercise | No, Yes |
| DMFMY | DM family history | No, Yes |
| Age | Age (years) | <50; 50–60; 60–70; |
| Waist | Waist circumference (cm) | <85; |
| BMI | Body mass index (kg/m²) | <23; 23–25; 25–30; 30–35; |
| sysBP | Systolic blood pressure (mm Hg) | <120; 120–130; 130–140; ≥140 |
| HbA1c | Glycated haemoglobin A1c | <5.5; 5.5–6.6; |
| HOMA-β | Homeostasis model assessment of β-cell function | <76; 76–114; |
| HOMA-IR | Homeostasis model assessment for insulin resistance | <1.6; 1.6–2.5; |
| TCHL | Total cholesterol (mg/dL) | <200; 200–230; |
| HDL | High-density lipoprotein cholesterol (mg/dL) | <40; 40–60; |
| TG | Triglycerides (mg/dL) | <150; 150–200; |
| ALT | Alanine aminotransferase (IU/L) | <40; |
| AST | Aspartate aminotransferase (IU/L) | <40; |
| hsCRP | High-sensitivity C-reactive protein (mg/L) | <1; 1–3; |
| AhRL | Aryl hydrocarbon receptor ligands (pM, TCDDeq) | <2.7; ≥2.7 |
| MIS-ATP | Mitochondria-inhibiting substances determined by ATP contents (%) | <88.07; ≥88.07 |
| MIS-ROS | Mitochondria-inhibiting substances determined by ROS levels (%) | <120; ≥120 |
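The discretization in the table above can be sketched in a few lines of Python. This is only an illustration: the cut points for sysBP and AhRL come from the table and from the Figure 4 and Figure 5 captions, while the helper name `discretize`, the ASCII labels, and the choice of left-closed intervals are assumptions, since the text does not say which class a value exactly on a cut point belongs to.

```python
from bisect import bisect_right

# Cut points taken from the variable table; the closing ">=" classes are
# taken from the Figure 4 and Figure 5 captions.
CUTS = {
    "sysBP": [120, 130, 140],
    "AhRL": [2.7],
}
LABELS = {
    "sysBP": ["<120", "120-130", "130-140", ">=140"],
    "AhRL": ["<2.7", ">=2.7"],
}


def discretize(variable, value):
    """Map a continuous measurement to its class label.

    Assumption: intervals are left-closed, so a value equal to a cut
    point falls into the higher class (sysBP = 130 gives "130-140").
    """
    index = bisect_right(CUTS[variable], value)
    return LABELS[variable][index]


# Example: the cohort means reported for sysBP and AhRL.
mean_sysbp_class = discretize("sysBP", 119.77)
mean_ahrl_class = discretize("AhRL", 2.03)
```

Keeping the cut points in a single dictionary makes it easy to extend the sketch to the other variables once their full class lists are known.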
Baseline characteristics of candidate predictor variables by fGTOL, the glucose tolerance status after 4 years (2012).
| Variable (mean ± SD or N (%)) | Total | fGTOL = NGT | fGTOL = IGT | fGTOL = DM | Assoc p-value | Post hoc (Tukey) |
|---|---|---|---|---|---|---|
| Age | 59.74 ± 8.34 | 59.0 ± 8.32 | 62.54 ± 7.98 | 61.6 ± 8.43 | <0.001 | a,b |
| Sex | | | | | <0.001 | |
| Male | 499 (43.1%) | 401 (44.2%) | 58 (31.7%) | 40 (58.0%) | | |
| Female | 660 (56.9%) | 506 (55.8%) | 125 (68.3%) | 29 (42.0%) | | |
| BMI | 24.13 ± 3.13 | 23.90 ± 3.05 | 25.08 ± 3.30 | 24.58 ± 3.19 | <0.001 | a |
| Waist | 87.57 ± 8.50 | 86.80 ± 8.35 | 90.08 ± 8.26 | 91.09 ± 8.92 | <0.001 | a,b |
| sysBP | 119.77 ± 15.74 | 118.34 ± 15.13 | 123.90 ± 16.20 | 127.72 ± 18.04 | <0.001 | a,b |
| HbA1c | 5.50 ± 0.39 | 5.44 ± 0.36 | 5.62 ± 0.41 | 5.89 ± 0.42 | <0.001 | a,b,c |
| HOMA-β | 112.86 ± 67.21 | 113.88 ± 69.07 | 116.88 ± 62.10 | 88.77 ± 49.00 | 0.007 | b,c |
| HOMA-IR | 2.12 ± 1.30 | 2.03 ± 1.27 | 2.50 ± 1.48 | 2.20 ± 0.95 | <0.001 | a |
| TCHL | 191.75 ± 32.91 | 190.65 ± 31.97 | 195.57 ± 34.79 | 195.99 ± 38.91 | 0.099 | |
| HDL | 46.10 ± 10.68 | 46.72 ± 10.74 | 44.38 ± 10.61 | 42.39 ± 8.73 | <0.001 | a,b |
| TG | 132.15 ± 80.21 | 124.82 ± 69.82 | 150.48 ± 90.67 | 179.80 ± 136.70 | <0.001 | a,b,c |
| ALT | 22.01 ± 15.40 | 21.27 ± 12.97 | 23.08 ± 21.10 | 28.84 ± 23.44 | <0.001 | b,c |
| AST | 24.64 ± 10.81 | 24.28 ± 8.99 | 24.61 ± 10.36 | 29.38 ± 24.51 | <0.001 | b,c |
| hsCRP | 1.61 ± 5.08 | 1.63 ± 5.60 | 1.44 ± 2.29 | 3.55 ± 2.71 | 0.863 | |
| DMFMY | | | | | 0.332 | |
| No | 1052 (90.8%) | 827 (91.2%) | 161 (88.0%) | 64 (92.8%) | | |
| Yes | 107 (9.2%) | 80 (8.8%) | 22 (12.0%) | 5 (7.2%) | | |
| Smoke | | | | | <0.001 | |
| Non-smoker | 777 (67.1%) | 614 (67.7%) | 133 (72.7%) | 30 (43.5%) | | |
| Ex-smoker | 194 (16.7%) | 146 (16.1%) | 32 (17.5%) | 16 (23.2%) | | |
| Current smoker | 188 (16.2%) | 147 (16.2%) | 18 (9.8%) | 23 (33.3%) | | |
| Drink | | | | | 0.041 | |
| Non-drinker | 587 (50.6%) | 454 (50.0%) | 106 (57.9%) | 27 (39.1%) | | |
| Ex-drinker | 61 (5.3%) | 45 (5.0%) | 9 (4.9%) | 7 (10.1%) | | |
| Current drinker | 511 (44.1%) | 408 (45.0%) | 68 (37.2%) | 35 (50.7%) | | |
| Exercise | | | | | 0.216 | |
| No | 788 (68.0%) | 623 (68.7%) | 115 (62.8%) | 50 (72.5%) | | |
| Yes | 371 (32.0%) | 284 (31.3%) | 68 (37.2%) | 19 (27.5%) | | |
| cGTOL | | | | | <0.001 | |
| NGT | 917 (79.1%) | 783 (86.3%) | 109 (59.6%) | 25 (36.2%) | | |
| IGT | 242 (20.9%) | 124 (13.7%) | 74 (40.4%) | 44 (63.8%) | | |
| AhRL (pM) | 2.03 ± 1.24 | 1.73 ± 1.02 | 2.96 ± 1.27 | 3.55 ± 1.42 | <0.001 | a,b,c |
| MIS-ATP (%) | 91.99 ± 12.06 | 93.79 ± 11.94 | 86.53 ± 10.07 | 82.76 ± 9.66 | <0.001 | a,b |
| MIS-ROS (%) | 112.31 ± 11.91 | 111.12 ± 10.69 | 116.2 ± 14.17 | 117.46 ± 16.35 | <0.001 | a,b |
cGTOL, glucose tolerance at the time of data collection (current); fGTOL, glucose tolerance after 4 years (future); BMI, body mass index; Waist, waist circumference; sysBP, systolic blood pressure; TCHL, total cholesterol; HDL, high-density lipoprotein cholesterol; TG, triglycerides; ALT, alanine aminotransferase; AST, aspartate aminotransferase; hsCRP, high-sensitivity C-reactive protein; DMFMY, DM family history. “Assoc p-value” is the p-value from ANOVA or the chi-square test of association between each row variable and fGTOL. “Post hoc (Tukey)” lists the pairs of fGTOL classes between which the row variable differs significantly (5% level) in Tukey’s post hoc test: ‘a’ between NGT and IGT, ‘b’ between NGT and DM, and ‘c’ between IGT and DM.
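As a rough check on how the association p-values for categorical variables arise, the sketch below computes a Pearson chi-square test for Sex against fGTOL from the counts in the table. It is a minimal illustration, not the authors' code; the function and variable names are hypothetical. For 2 degrees of freedom the chi-square survival function has the closed form exp(-x/2), so no statistics library is needed.

```python
import math


def chi_square_statistic(table):
    """Pearson chi-square statistic for a contingency table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


# Rows: Male, Female; columns: fGTOL = NGT, IGT, DM (counts from the table).
sex_by_fgtol = [[401, 58, 40], [506, 125, 29]]
stat = chi_square_statistic(sex_by_fgtol)

# df = (2 - 1) * (3 - 1) = 2; for 2 df, P(X > x) = exp(-x / 2).
p_value = math.exp(-stat / 2)
print(f"chi-square = {stat:.2f}, p = {p_value:.2g}")
```

The resulting p-value falls below 0.001, in line with the entry reported for Sex.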
Figure 1. Process of developing the BN prediction app for diabetes progression.
Figure 2. Feature selection results from (a) filter method, (b) wrapper for TAN, and (c) wrapper for GBN.
Classification accuracy and the AUC of Bayesian network classifiers.
| Feature selection | Metric | TAN | GBN |
|---|---|---|---|
| All variables | Accuracy (%) | 77.68 ± 2.60 | 76.35 ± 1.81 |
| All variables | AUC | 0.7459 ± 0.0570 | 0.7868 ± 0.0528 |
| Filter | Accuracy (%) | 78.02 ± 2.64 | 77.77 ± 2.61 |
| Filter | AUC | 0.7740 ± 0.0505 | 0.7618 ± 0.0513 |
| Wrapper | Accuracy (%) | 79.43 ± 2.94 | 78.23 ± 0.42 |
| Wrapper | AUC | 0.8120 ± 0.0436 | 0.7886 ± 0.0384 |
Figure 3. Structure of the BN prediction model and mutual information (MI) between the target and each predictor node. Variables connected by edges are conditionally dependent on each other. The MI score between the target node fGTOL and each predictor node is shown on the corresponding blue edge. The variable nodes are color coded according to their MI scores. Variables in the box are sorted in descending order of MI score.
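Mutual information between two discrete variables can be estimated directly from a contingency table. The sketch below is an assumption-laden illustration: it uses the cGTOL and Sex counts from the baseline characteristics table, reports MI in nats, and is not expected to reproduce the scores in the figure, whose units and estimation details are not given here. The function name is hypothetical.

```python
import math


def mutual_information(table):
    """Empirical mutual information (in nats) from a table of joint counts."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    mi = 0.0
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            if count == 0:
                continue  # a zero cell contributes nothing to the sum
            p_xy = count / n
            # p_xy / (p_x * p_y) simplifies to count * n / (row * col).
            ratio = count * n / (row_totals[i] * col_totals[j])
            mi += p_xy * math.log(ratio)
    return mi


# Columns: fGTOL = NGT, IGT, DM.
cgtol_by_fgtol = [[783, 109, 25], [124, 74, 44]]  # rows: cGTOL = NGT, IGT
sex_by_fgtol = [[401, 58, 40], [506, 125, 29]]    # rows: Male, Female

mi_cgtol = mutual_information(cgtol_by_fgtol)
mi_sex = mutual_information(sex_by_fgtol)
print(f"MI(cGTOL; fGTOL) = {mi_cgtol:.4f}, MI(Sex; fGTOL) = {mi_sex:.4f}")
```

On these counts, current glucose tolerance carries more information about the 4-year outcome than sex does.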
Figure 4. An illustration of DiabetesBN [32], the online interactive app that implements the BN prediction model for diabetes progression. This is an example of the predictive probabilities of normal (NGT), prediabetes (IGT), and diabetes (DM) in 4 years for a subject with Sex = male, Waist ≥ 90 cm, sysBP ≥ 140 mm Hg, Smoke = current, cGTOL = NGT, AhRL ≥ 2.7, MIS-ATP < 88.07, and MIS-ROS ≥ 120.
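The kind of "what-if" prediction the app offers can be illustrated with a deliberately simplified model. The sketch below is a naive Bayes classifier, not the tree-augmented network used in the paper: it ignores the predictor-to-predictor edges that TAN adds, uses only two of the 11 variables (Sex and cGTOL), and estimates probabilities from the raw counts in the baseline characteristics table without smoothing. All names are hypothetical.

```python
# fGTOL class sizes and, within each class, the number of subjects at
# each predictor level (counts from the baseline characteristics table).
CLASS_COUNTS = {"NGT": 907, "IGT": 183, "DM": 69}
LEVEL_COUNTS = {
    "Sex": {
        "Male": {"NGT": 401, "IGT": 58, "DM": 40},
        "Female": {"NGT": 506, "IGT": 125, "DM": 29},
    },
    "cGTOL": {
        "NGT": {"NGT": 783, "IGT": 109, "DM": 25},
        "IGT": {"NGT": 124, "IGT": 74, "DM": 44},
    },
}


def predict(evidence):
    """Posterior over fGTOL given evidence, under the naive Bayes assumption."""
    total = sum(CLASS_COUNTS.values())
    scores = {}
    for cls, n_cls in CLASS_COUNTS.items():
        score = n_cls / total  # prior P(fGTOL = cls)
        for variable, level in evidence.items():
            # Likelihood P(variable = level | fGTOL = cls).
            score *= LEVEL_COUNTS[variable][level][cls] / n_cls
        scores[cls] = score
    norm = sum(scores.values())
    return {cls: score / norm for cls, score in scores.items()}


baseline = predict({"Sex": "Male", "cGTOL": "NGT"})
what_if = predict({"Sex": "Male", "cGTOL": "IGT"})
print(baseline)
print(what_if)
```

Changing a single piece of evidence and comparing the two posteriors mirrors how a user would simulate the effect of a risk factor in the app, although the actual model conditions on more variables and on their dependencies.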
Figure 5. Predictive probabilities of DM (red) and IGT (yellow) for the joint levels of AhRL and MIS-ATP given cGTOL and HbA1c. The levels of AhRL are low (<2.7) and high (≥2.7), and the levels of MIS-ATP are low (<88.07) and high (≥88.07). The lines in each figure show the predictive probabilities of DM (red) and IGT (yellow), marginalized over AhRL and MIS-ATP.
Figure 6. GBN structure on the variables. The GBN assumes no structural constraint. Variables connected by edges are conditionally dependent on each other.