Tomoki Uchida, Takeshi Kanamori, Takanori Teramoto, Yuji Nonaka, Hiroki Tanaka, Satoshi Nakamura, Norihito Murayama.
Abstract
We aimed to identify the glucose metabolism status of nondiabetic Japanese adults using a machine learning model based on a questionnaire. In this cross-sectional study, Japanese adults (aged 20-64 years) from Tokyo and surrounding areas were recruited. Participants underwent an oral glucose tolerance test (OGTT) and completed a questionnaire regarding lifestyle and physical characteristics. They were classified into four glycometabolic categories based on the OGTT results: category 1, best glucose metabolism; category 2, low insulin sensitivity; category 3, low insulin secretion; and category 4, combined characteristics of categories 2 and 3. A total of 977 individuals were included; the proportions of participants in categories 1, 2, 3, and 4 were 46%, 21%, 14%, and 19%, respectively. Machine learning models (decision tree, support vector machine, random forest, and XGBoost) were developed to identify the glycometabolic category from the questionnaire responses. The 10 most important variables in the random forest model were then selected, and another random forest model was developed using only these variables. Its areas under the receiver operating characteristic curve (AUCs) for classifying category 1 versus the others, category 2 versus the others, category 3 versus the others, and category 4 versus the others were 0.68 (95% confidence interval: 0.62-0.75), 0.66 (0.58-0.73), 0.61 (0.51-0.70), and 0.70 (0.62-0.77), respectively. For external validation of the model, the same data items were obtained from 452 Japanese adults in Hokkaido. The corresponding AUCs for categories 1, 2, 3, and 4 versus the others were 0.66 (0.61-0.71), 0.57 (0.51-0.62), 0.60 (0.50-0.69), and 0.64 (0.57-0.71), respectively. In conclusion, our model could identify the glucose metabolism status using only 10 lifestyle and physical characteristics. This model may help the larger general population without diabetes to understand their glucose metabolism status and encourage lifestyle improvement to prevent diabetes.
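The "category k versus the others" AUCs reported in the abstract are one-vs-rest quantities. The following is a minimal sketch of how such an AUC can be computed from predicted scores, using the rank (Mann-Whitney) formulation. The function name `auc_one_vs_rest` and the toy data are hypothetical illustrations and are not taken from the study; the abstract does not state which software computed the AUCs.

```python
def auc_one_vs_rest(scores, labels, positive):
    """AUC for 'positive category vs. the others' (rank / Mann-Whitney form).

    scores: predicted score for the positive category, one per participant
    labels: true category of each participant
    """
    pos = [s for s, y in zip(scores, labels) if y == positive]
    neg = [s for s, y in zip(scores, labels) if y != positive]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    # Count (positive, negative) pairs in which the positive example scores
    # higher; tied pairs count as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: true categories and model scores for category 2.
labels = [1, 2, 2, 3, 4, 1, 2, 4]
scores_cat2 = [0.10, 0.80, 0.40, 0.30, 0.50, 0.20, 0.70, 0.40]
auc = auc_one_vs_rest(scores_cat2, labels, positive=2)
print(f"AUC, category 2 vs. the others: {auc:.2f}")
```

Repeating the call with each category as `positive`, using that category's predicted scores, gives four one-vs-rest AUCs of the kind listed above.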
Year: 2022 PMID: 36118835 PMCID: PMC9481387 DOI: 10.1155/2022/1026121
Source DB: PubMed Journal: Comput Math Methods Med ISSN: 1748-670X Impact factor: 2.809
Review of the recent and important studies on prediabetes screening.
| Ref. no. | Screening target | Factors | Models | Tool challenges |
|---|---|---|---|---|
| [ | FPG 100–125 mg/dL, 120 mPG 100–125 mg/dL, or HbA1c 5.7–6.4% | 25 of socioeconomic, clinical, and biochemical factors | RF, GBM, LR, and ANN | Invasive measurement factors were required for screening |
| [ | FPG ≥100 mg/dL | Global diet quality score, age, smoking, alcohol drinking, unable to walk, use of rations card, time spent in sedentary activities | RF, GLMM, LASSO, and EN | Well-trained interviewers were needed to obtain dietary information |
| [ | HbA1c 5.7–6.4% | Age, sex, BMI, waist circumference, and blood pressure | RF, GBM, XGB, LR, and DL | Lack of individuals with high blood glucose levels from screening targets |
| [ | FPG 110–125 mg/dL or HbA1c 5.7–6.4% | Age, BMI, waist-to-hip ratio, systolic blood pressure, waist circumference, sleep duration, smoking status, and vigorous recreational activity time per week | XGB and LR | Lack of individuals with hyperglycemia after glucose loading from screening targets |
| [ | FPG ≥110 mg/dL or 120 mPG ≥140 mg/dL | Age, sex, BMI, smoking, FPG, fasting plasma triglyceride level, and history of high FPG | LR | Research participants were limited to staff at an oil field in China; invasive measurement factors were required for screening |
| [ | FPG 100–125 mg/dL, HbA1c 5.7–6.4%, or 120 mPG 140–199 mg/dL | Semiquantitative food frequency questionnaire answers and clinical and anthropometric measurements scores | LR | Well-trained interviewers were needed to obtain dietary information |
Abbreviation: FPG: fasting plasma glucose level; 120 mPG: 120-min postload plasma glucose level during OGTT; HbA1c: hemoglobin A1c; BMI: body mass index; RF: random forest; GBM: gradient boosting machine; LR: logistic regression; ANN: artificial neural network; GLMM: generalized linear mixed model; LASSO: least absolute shrinkage and selection operator; EN: elastic net; XGB: XGBoost; DL: deep learning.
Figure 1. Training, testing, and validation processes of the models. Abbreviations: RF: random forest; XGB: XGBoost; DT: decision tree; SVM: support vector machine.
Characteristics of the participants in each glycometabolic category.
| | Category 1 | Category 2 | Category 3 | Category 4 |
|---|---|---|---|---|
| Number of individuals (n) | 448 | 206 | 133 | 190 |
| Sex (% women) | 53.1 | 56.3 | 40.6 | 44.2 |
| Age (years) | 42.3 (41.2–43.4) | 43.8 (42.3–45.3) | 46.7 (44.7–48.6)∗ | 48.9 (47.4–50.4)∗ |
| Height (cm) | 164.8 (164.1–165.6) | 164.8 (163.6–166.0) | 165.6 (164.2–166.9) | 166.0 (164.9–167.1) |
| BMI (kg/m²) | 21.4 (21.1–21.6) | 23.5 (23.1–24.0)∗ | 21.6 (21.2–22.0) | 23.4 (22.9–23.8)∗ |
| 30 mPG (mg/dL) | 129.5 (127.9–131.1) | 139.5 (137.7–141.2)∗ | 171.9 (169.7–174.2)∗ | 178.1 (175.5–180.8)∗ |
| 120 mPG (mg/dL) | 94.4 (92.9–96.0) | 127.6 (124.4–130.8)∗ | 99.3 (96.3–102.3) | 134.6 (130.5–138.6)∗ |
| Matsuda index | 9.8 (9.4–10.2) | 5.8 (5.4–6.3)∗ | 7.7 (7.3–8.1)∗ | 5.0 (4.6–5.4)∗ |
Data are presented as mean (95% confidence interval), percentage, or number of individuals. ∗p < 0.05 vs. category 1. Abbreviations: BMI: body mass index; x mPG: x-min postload plasma glucose level during the OGTT.
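As a quick consistency check, the category counts in the table above can be converted to percentages and compared with the proportions given in the abstract. This is only an arithmetic sketch; the variable names are illustrative.

```python
# Participants per glycometabolic category, as listed in the table.
counts = {1: 448, 2: 206, 3: 133, 4: 190}

total = sum(counts.values())
# Percentage of participants in each category, rounded to whole percent.
shares = {cat: round(100 * n / total) for cat, n in counts.items()}

print(total)   # total number of participants
print(shares)  # rounded percentage per category
```

The counts sum to the 977 participants reported in the abstract, and the rounded shares match the stated 46%, 21%, 14%, and 19%.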
Performance of the models for identifying glycometabolic category (95% confidence intervals).
| Model | AUC for classifying category 1 and the others | AUC for classifying category 2 and the others | AUC for classifying category 3 and the others | AUC for classifying category 4 and the others | Mean of AUCs | Sensitivity to detect categories 2, 3, and 4 | Specificity to detect category 1 |
|---|---|---|---|---|---|---|---|
| Decision tree | 0.63 (0.58-0.70) | 0.68 (0.60-0.75) | 0.56 (0.45-0.66) | 0.61 (0.53-0.70) | 0.62 | 0.71 | 0.41 |
| Support vector machine | 0.64 (0.57-0.70) | 0.65 (0.57-0.73) | 0.58 (0.47-0.68) | 0.55 (0.48-0.64) | 0.61 | 0.70 | 0.55 |
| Random forest | 0.69 (0.63-0.74) | 0.68 (0.61-0.76) | 0.63 (0.55-0.72) | 0.67 (0.59-0.74) | 0.67 | 0.70 | 0.46 |
| XGBoost | 0.62 (0.56-0.68) | 0.58 (0.50-0.66) | 0.59 (0.49-0.69) | 0.60 (0.52-0.68) | 0.60 | 0.70 | 0.45 |
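The last two columns of the table above collapse the four-class problem into a binary one: category 1 versus categories 2, 3, and 4. The sketch below shows one plausible reading of those columns, in which any prediction other than category 1 counts as a detection of a non-category-1 participant. This interpretation, the function name, and the toy labels are assumptions; the page does not describe how the decision threshold was chosen.

```python
def collapsed_sensitivity_specificity(y_true, y_pred):
    """Sensitivity for categories 2-4 and specificity for category 1.

    Assumption: a participant counts as 'detected' when the predicted
    category is anything other than 1, even if the exact category is wrong.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t != 1 and p != 1)  # non-1 flagged as non-1
    fn = sum(1 for t, p in pairs if t != 1 and p == 1)  # non-1 missed
    tn = sum(1 for t, p in pairs if t == 1 and p == 1)  # category 1 kept as 1
    fp = sum(1 for t, p in pairs if t == 1 and p != 1)  # category 1 flagged
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical true and predicted categories for ten participants.
y_true = [1, 1, 1, 1, 2, 2, 3, 4, 4, 4]
y_pred = [1, 2, 1, 4, 2, 1, 4, 4, 1, 3]
sens, spec = collapsed_sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Under this reading, a category 3 participant predicted as category 4 still contributes to sensitivity, which is why these columns can look favorable even when the per-category AUCs are modest.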
Performance of the random forest model using the ten variables (95% confidence interval).
| Model | AUC for classifying category 1 and the others | AUC for classifying category 2 and the others | AUC for classifying category 3 and the others | AUC for classifying category 4 and the others | Mean of AUCs | Sensitivity to detect categories 2, 3, and 4 | Specificity to detect category 1 |
|---|---|---|---|---|---|---|---|
| Random forest using 10 variables | 0.68 (0.62–0.75) | 0.66 (0.58–0.73) | 0.61 (0.51–0.70) | 0.70 (0.62–0.77) | 0.66 | 0.70 | 0.41 |
Figure 2. Receiver operating characteristic (ROC) curves of the random forest model using the ten variables.
Ten most important variables of the model and their importance.
| Variable | Mean decrease in Gini coefficient |
|---|---|
| Body mass index | 10.3 |
| Age | 8.1 |
| Height | 3.3 |
| Do you wake up in the middle of the night? | 3.1 |
| Which do you usually eat: rice or bread? | 2.5 |
| Frequency of tea intake per week at lunch | 2.1 |
| Do you wake up late on nonworking days? | 1.9 |
| Frequency of mobile phone and tablet computer use at bedtime | 1.4 |
| Frequency of soup intake | 1.4 |
| Frequency of toothbrush replacement | 0.8 |
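The importance measure in the table above is built from the decrease in Gini impurity produced by a split. The sketch below computes that decrease for a single hypothetical split. It assumes the usual definition, in which a variable's importance accumulates these decreases over all nodes that split on it and averages them over the trees; the page does not state the software or exact formula used, and the toy node contents are invented.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity of a node: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def gini_decrease(parent, left, right):
    """Impurity of the parent minus the size-weighted impurity of its children."""
    n = len(parent)
    weighted = (len(left) / n) * gini_impurity(left) \
        + (len(right) / n) * gini_impurity(right)
    return gini_impurity(parent) - weighted


# Hypothetical node: glycometabolic categories of 8 participants, split in two
# by a threshold on some variable (for example, body mass index).
parent = [1, 1, 1, 1, 2, 2, 4, 4]
left = [1, 1, 1, 1]
right = [2, 2, 4, 4]
decrease = gini_decrease(parent, left, right)
print(decrease)
```

A variable that often yields large decreases of this kind, as body mass index and age appear to do here, ends up with a high value in the table.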
Performance of the random forest model using the ten variables in the external validation (95% confidence interval).
| Model | AUC for classifying category 1 and the others | AUC for classifying category 2 and the others | AUC for classifying category 3 and the others | AUC for classifying category 4 and the others | Mean of AUCs | Sensitivity to detect categories 2, 3, and 4 | Specificity to detect category 1 |
|---|---|---|---|---|---|---|---|
| Random forest using 10 variables | 0.66 (0.61–0.71) | 0.57 (0.51–0.62) | 0.60 (0.50–0.69) | 0.64 (0.57–0.71) | 0.62 | 0.70 | 0.55 |
Figure 3. Receiver operating characteristic curves of the random forest model using the ten variables in the external validation.
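To compare internal testing with external validation, the reported point estimates for the ten-variable random forest model can be set side by side. The sketch below recomputes the "Mean of AUCs" column and the per-category drop from the published values; it uses only numbers from the tables, and the variable names are illustrative.

```python
# Point-estimate AUCs of the ten-variable random forest model, by category.
internal = {1: 0.68, 2: 0.66, 3: 0.61, 4: 0.70}  # test set (Tokyo area)
external = {1: 0.66, 2: 0.57, 3: 0.60, 4: 0.64}  # external validation (Hokkaido)


def mean_auc(aucs):
    """Unweighted mean of the four one-vs-rest AUCs, rounded to 2 decimals."""
    return round(sum(aucs.values()) / len(aucs), 2)


# Drop in AUC from internal testing to external validation, per category.
drops = {cat: round(internal[cat] - external[cat], 2) for cat in internal}
largest_drop = max(drops, key=drops.get)

print(mean_auc(internal), mean_auc(external))
print(drops, largest_drop)
```

The recomputed means agree with the tabulated 0.66 and 0.62, and the largest drop is for category 2 (low insulin sensitivity). These are differences between point estimates only; they say nothing about statistical significance.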