Tadao Ooka1, Hisashi Johno2, Kazunori Nakamoto3, Yoshioki Yoda4, Hiroshi Yokomichi1, Zentaro Yamagata1.
Abstract
INTRODUCTION: Early intervention in type 2 diabetes can prevent exacerbation of insulin resistance. Interventions can be made more effective by predicting the change in glycated haemoglobin A1c (HbA1c) early and precisely. Artificial intelligence (AI), which has been introduced into various medical fields, may be useful for predicting changes in HbA1c. However, deep learning, the leading AI technology, cannot readily explain which factors drive its predictions, and this has been a problem in its use. We therefore applied a highly interpretable AI method, random forest (RF), to large-scale health check-up data and examined whether it had an advantage over a conventional prediction model.
RESEARCH DESIGN AND METHODS: This study included a cumulative total of 42 908 subjects with an HbA1c <6.5% who were not receiving treatment for diabetes. The objective variable was the change in HbA1c in the next year. Each prediction model was built from 51 health-check items and, for part of those items, their change values from the previous year. We compared the predictive power of two analytical methods: RF as a new model and multivariate logistic regression (MLR) as a conventional model. We also created models that excluded the change values to determine whether the change values improved the predictions. In addition, variable importance was calculated in the RF analysis, and standard regression coefficients were calculated in the MLR analysis, to identify the predictors.
Keywords: diabetes mellitus
Year: 2021 PMID: 34308121 PMCID: PMC8258057 DOI: 10.1136/bmjnph-2020-000200
Source DB: PubMed Journal: BMJ Nutr Prev Health ISSN: 2516-5542
Table 1. Characteristics of the study participants
| | | | Amount of HbA1c increase in the next year* | | | | | | |
| Variable | Unit | All | <0 | 0–0.1 | 0.2–0.3 | 0.4–0.5 | 0.6–0.7 | 0.8–0.9 | ≥1.0 |
| N | – | 42 908 | 13 459 | 15 761 | 9945 | 2898 | 536 | 158 | 151 |
| Gender† | % | 53.10 | 53.51 | 52.20 | 52.64 | 54.07 | 60.63 | 71.52 | 74.83 |
| Age | – | 55.05 | 55.00 | 54.78 | 55.33 | 55.58 | 56.32 | 54.79 | 55.53 |
| Height | cm | 161.71 | 161.64 | 161.77 | 161.60 | 161.66 | 162.57 | 163.35 | 164.68 |
| Drink† | % | 46.00 | 46.92 | 45.66 | 44.64 | 46.52 | 49.72 | 56.41 | 55.78 |
| Smoke† | % | 45.82 | 46.04 | 44.87 | 45.68 | 47.13 | 53.79 | 60.65 | 67.35 |
| Weight | kg | 60.08 | 59.99 | 59.86 | 60.00 | 60.94 | 61.82 | 65.14 | 68.13 |
| BMI | – | 22.87 | 22.85 | 22.77 | 22.87 | 23.21 | 23.32 | 24.28 | 24.96 |
| Body fat | % | 24.19 | 24.19 | 24.10 | 24.22 | 24.58 | 24.30 | 24.53 | 24.92 |
| GTP | U/L | 36.70 | 36.12 | 36.02 | 36.96 | 38.80 | 44.47 | 51.11 | 58.94 |
| HDL-C | mg/dL | 57.64 | 58.10 | 57.85 | 57.28 | 56.70 | 55.49 | 52.78 | 49.96 |
| LDL-C | mg/dL | 124.31 | 124.69 | 124.20 | 124.25 | 124.06 | 122.05 | 119.15 | 123.44 |
| FBG | mg/dL | 98.42 | 98.43 | 97.51 | 98.36 | 100.35 | 104.96 | 113.41 | 120.47 |
| HbA1c | % | 5.03 | 5.13 | 5.00 | 4.96 | 4.97 | 5.12 | 5.44 | 5.66 |
| S-BP | mm Hg | 124.79 | 125.28 | 124.00 | 124.50 | 126.56 | 127.90 | 129.91 | 130.65 |
| D-BP | mm Hg | 77.31 | 77.57 | 76.94 | 77.23 | 77.93 | 78.21 | 79.77 | 80.36 |
The definition of each variable’s abbreviation can be seen in table 2. Other characteristics can also be found in online supplemental table S2.
*The lower limit of HbA1c increase is −2.0% and the upper limit of HbA1c increase is +5.6%.
†Gender: prevalence of males; drink: prevalence of drinking more than twice a week; smoke: prevalence of a current smoking habit.
BMI, body mass index; D-BP, diastolic blood pressure; FBG, fasting blood glucose; GTP, glutamyl transpeptidase; HDL-C, high-density lipoprotein cholesterol; LDL-C, low-density lipoprotein cholesterol; S-BP, systolic blood pressure.
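The bin counts in the N row also show how many subjects fall at or above each of the six thresholds (≥0% to ≥1.0%) that define the objective variables. The following is a minimal sketch of that tally, assuming the bins map directly onto those thresholds; the names `bin_counts` and `positives_at_or_above` are ours and do not come from the study.

```python
# Counts from the N row of the table above, keyed by the lower edge of each
# HbA1c-increase bin (None stands for the "<0" bin).
bin_counts = [
    (None, 13459),
    (0.0, 15761),
    (0.2, 9945),
    (0.4, 2898),
    (0.6, 536),
    (0.8, 158),
    (1.0, 151),
]

total = sum(n for _, n in bin_counts)


def positives_at_or_above(threshold):
    """Number of subjects whose bin starts at or above `threshold`."""
    return sum(
        n for lower, n in bin_counts if lower is not None and lower >= threshold
    )


for threshold in (0.0, 0.2, 0.4, 0.6, 0.8, 1.0):
    n_pos = positives_at_or_above(threshold)
    share = 100 * n_pos / total
    print(f">= {threshold:.1f}%: {n_pos} of {total} ({share:.2f}%)")
```

Under this reading, only 151 of 42 908 subjects (about 0.35%) are positive at the ≥1.0% threshold, so the higher-threshold models are fitted to strongly imbalanced classes.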
Table 2. Variables used in random forest method and multiple logistic regression models
| Objective variable | Value | Definition | Value | Definition |
| Model 1 (≥0%) | 1 | HbA1c increase of ≥0% from previous year† | 0 | HbA1c increase of <0% from previous year‡ |
| Model 2 (≥0.2%) | 1 | HbA1c increase of ≥0.2% from previous year† | 0 | HbA1c increase of <0.2% from previous year‡ |
| Model 3 (≥0.4%) | 1 | HbA1c increase of ≥0.4% from previous year† | 0 | HbA1c increase of <0.4% from previous year‡ |
| Model 4 (≥0.6%) | 1 | HbA1c increase of ≥0.6% from previous year† | 0 | HbA1c increase of <0.6% from previous year‡ |
| Model 5 (≥0.8%) | 1 | HbA1c increase of ≥0.8% from previous year† | 0 | HbA1c increase of <0.8% from previous year‡ |
| Model 6 (≥1.0%) | 1 | HbA1c increase of ≥1.0% from previous year† | 0 | HbA1c increase of <1.0% from previous year‡ |
| Explanatory variables | Items |
| Single-year value and change from previous year (92 in total: 46+46) | Weight, BMI, body fat, WCC, RCC, Hb, Ht, MCV, MCH, MCHC, PLAT, TP, ALB, A/G, ChE, T-Bil, D-Bil, I-Bil, ALP, LAP, GTP, LDH, AST, ALT, BUN, CRE, UA, Na, K, Cl, Ca, CK, TG, TC, LDL-C, HDL-C, FBG, HbA1c, S-BP, D-BP, FVC, FEV1, P-FVC, P-FEV1, CRP, RF |
| Only single-year value (five in total) | Gender, age, height, drink*, smoke* |
*Drink: drinking more than twice a week (1) or not (0); smoke: having a current smoking habit (1) or not (0).
†The upper limit of HbA1c increase is +5.6%.
‡The lower limit of HbA1c increase is −2.0%.
A/G, albumin/globulin ratio; ALB, albumin; ALP, alkaline phosphatase; ALT, alanine aminotransferase; AST, aspartate aminotransferase; BMI, body mass index; BUN, blood urea nitrogen; Ca, calcium; ChE, cholinesterase; CK, creatine kinase; Cl, chloride; CRE, creatinine; CRP, C reactive protein; D-Bil, direct bilirubin; D-BP, diastolic blood pressure; FBG, fasting blood glucose; FEV1, forced expiratory volume in 1 s; FVC, forced vital capacity; GTP, glutamyl transpeptidase; Hb, haemoglobin; HbA1c, glycated haemoglobin A1c; HDL-C, high-density lipoprotein cholesterol; Ht, haematocrit; I-Bil, indirect bilirubin; K, potassium; LAP, leucine aminopeptidase; LDH, lactate dehydrogenase; LDL-C, low-density lipoprotein cholesterol; MCH, mean corpuscular haemoglobin; MCHC, mean corpuscular haemoglobin concentration; MCV, mean corpuscular volume; Na, sodium; P-FEV1, forced expiratory volume % in 1 s; P-FVC, forced vital capacity %; PLAT, platelet; RCC, red cell count; RF, rheumatoid factor; S-BP, systolic blood pressure; T-Bil, total bilirubin; TC, total cholesterol; TG, triglyceride; TP, total protein; UA, uric acid; WCC, white cell count.
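The variable definitions above can be illustrated with a short sketch that builds the explanatory features (single-year values plus `_dif` change values) and the six binary objective variables for one subject. It rests on assumptions of ours: a change value is taken as this year's value minus the previous year's, results are rounded to two decimals to avoid floating-point noise, and the function names and the example subject are hypothetical rather than taken from the study.

```python
# Thresholds (in HbA1c percentage points) for Models 1-6, as listed in the table.
THRESHOLDS = {1: 0.0, 2: 0.2, 3: 0.4, 4: 0.6, 5: 0.8, 6: 1.0}

# Items used only as single-year values; every other item also gets a "_dif"
# feature (assumed here to be this year's value minus the previous year's).
SINGLE_YEAR_ONLY = {"Gender", "age", "height", "drink", "smoke"}


def build_features(current, previous):
    """Return the single-year values plus `<name>_dif` change values."""
    features = dict(current)
    for name, value in current.items():
        if name not in SINGLE_YEAR_ONLY:
            # Rounded to 2 decimals so that 5.1 - 5.0 becomes exactly 0.1.
            features[f"{name}_dif"] = round(value - previous[name], 2)
    return features


def build_labels(hba1c_now, hba1c_next):
    """Return {model number: 0 or 1} for the six objective variables."""
    change = round(hba1c_next - hba1c_now, 2)
    return {model: int(change >= cut) for model, cut in THRESHOLDS.items()}


# Hypothetical subject with three illustrative items (not real study data).
previous = {"age": 54, "HbA1c": 5.0, "FBG": 97.0}
current = {"age": 55, "HbA1c": 5.1, "FBG": 99.0}

features = build_features(current, previous)
labels = build_labels(hba1c_now=5.1, hba1c_next=5.5)
print(features)
print(labels)
```

For this made-up subject the HbA1c increase is 0.4 points, so the objective variable is 1 for Models 1 to 3 and 0 for Models 4 to 6.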
Figure 1. Receiver operating characteristic (ROC) curves showing the prediction performance of selected models for changes in HbA1c. ROC curves of the random forest model (red line), the multiple logistic regression model (stepwise logistic regression, blue line) and the variable-restricted multiple logistic regression model (logistic regression with nine variables, green line) are displayed according to the increase in HbA1c.
Figure 2. Receiver operating characteristic (ROC) curves showing the prediction performance of selected models for changes in HbA1c. ROC curves of the random forest model (using two consecutive years of values for prediction, red line) and the variable-restricted random forest model (using a single year of values for prediction, blue line) are displayed according to the increase in HbA1c.
Table 3. Sensitivity and specificity of best model and AUC on each ROC curve
| | | Best model on ROC curve | | | | Best model on ROC curve | |
| Model | AUC | Sensitivity | Specificity | Model | AUC | Sensitivity | Specificity |
| RF model1 | 0.719 | 0.714 | 0.617 | MLR model1 | 0.699* | 0.648 | 0.648 |
| RF model2 | 0.716 | 0.608 | 0.720 | MLR model2 | 0.711 | 0.648 | 0.668 |
| RF model3 | 0.743 | 0.607 | 0.778 | MLR model3 | 0.734 | 0.629 | 0.729 |
| RF model4 | 0.864 | 0.804 | 0.823 | MLR model4 | 0.817* | 0.748 | 0.773 |
| RF model5 | 0.940 | 0.870 | 0.898 | MLR model5 | 0.840* | 0.685 | 0.889 |
| RF model6 | 0.967 | 0.929 | 0.877 | MLR model6 | 0.854* | 0.750 | 0.834 |
| vrRF model1 | 0.606* | 0.467 | 0.685 | vrMLR model1 | 0.635* | 0.610 | 0.605 |
| vrRF model2 | 0.602* | 0.516 | 0.632 | vrMLR model2 | 0.622* | 0.654 | 0.542 |
| vrRF model3 | 0.638* | 0.541 | 0.671 | vrMLR model3 | 0.634* | 0.607 | 0.594 |
| vrRF model4 | 0.796* | 0.594 | 0.874 | vrMLR model4 | 0.680* | 0.517 | 0.835 |
| vrRF model5 | 0.895† | 0.741 | 0.919 | vrMLR model5 | 0.801* | 0.630 | 0.950 |
| vrRF model6 | 0.918 | 0.821 | 0.944 | vrMLR model6 | 0.798* | 0.643 | 0.959 |
*Significantly lower than that in the corresponding RF model: p<0.01.
†Significantly lower than that in the corresponding RF model: p<0.05.
AUC, area under the curve; MLR, multiple logistic regression; RF, random forest; ROC, receiver operating characteristic; vrMLR, variable-restricted multiple logistic regression (uses only nine variables, following a previous study); vrRF, variable-restricted random forest (uses only single-year values for prediction).
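The table reports an AUC and the sensitivity and specificity at one operating point per ROC curve. The text here does not state how that point was chosen; the sketch below assumes Youden's J (sensitivity + specificity − 1), which is one common criterion and may differ from the authors' choice. The function names and the labels and scores are made up for illustration.

```python
def auc_rank(labels, scores):
    """AUC as the probability that a random positive outscores a random
    negative, counting ties as one half (Mann-Whitney formulation)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def best_youden_point(labels, scores):
    """Return (threshold, sensitivity, specificity) maximising
    sensitivity + specificity - 1 over all observed score thresholds."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best = None
    for cut in sorted(set(scores)):
        # A subject is predicted positive when its score is >= cut.
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= cut)
        tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < cut)
        sens, spec = tp / n_pos, tn / n_neg
        if best is None or sens + spec > best[1] + best[2]:
            best = (cut, sens, spec)
    return best


# Made-up labels and predicted probabilities, for illustration only.
labels = [0, 0, 0, 0, 1, 0, 1, 1]
scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.9]

print(auc_rank(labels, scores))
print(best_youden_point(labels, scores))
```

Applied to the tabulated values, J for RF model 6 is 0.929 + 0.877 − 1 = 0.806, against 0.750 + 0.834 − 1 = 0.584 for MLR model 6.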
Table 4. Variable importance on random forest models and standard partial regression coefficient on multiple logistic regression models
| Rank | RF model 1 | | RF model 2 | | RF model 3 | | RF model 4 | | RF model 5 | | RF model 6 | | Total | |
| | Variables | VI* | Variables | VI* | Variables | VI* | Variables | VI* | Variables | VI* | Variables | VI* | Variables | VI* |
| 1 | HbA1c_dif | 100 | HbA1c_dif | 100 | HbA1c_dif | 100 | HbA1c_dif | 100 | HbA1c | 100 | HbA1c | 100 | HbA1c_dif | 100 |
| 2 | HbA1c | 66.8 | HbA1c | 41.6 | FBG | 53.3 | HbA1c | 96.6 | HbA1c_dif | 98.2 | FBG | 81.0 | HbA1c | 97.0 |
| 3 | RF_dif | 27.5 | MCV_dif | 28.7 | HbA1c | 50.8 | FBG | 87.2 | FBG | 81.9 | HbA1c_dif | 78.5 | FBG | 79.1 |
| 4 | A/G_dif | 27.0 | FBG | 28.4 | FBG_dif | 40.2 | FBG_dif | 65.0 | FBG_dif | 80.1 | FBG_dif | 62.7 | FBG_dif | 59.5 |
| 5 | MCV_dif | 25.2 | RF_dif | 25.7 | MCV_dif | 35.1 | ALP | 43.2 | ALP | 43.2 | Weight | 50.2 | Weight | 42.2 |
| 6 | P-FEV1 | 23.3 | MCHC_dif | 25.1 | CRP_dif | 34.1 | TC | 43.1 | Weight | 41.7 | ALP | 42.9 | ALP | 41.2 |
| 7 | P-FEV1_dif | 23.2 | FBG_dif | 25.0 | PLAT | 33.3 | Weight | 43.0 | PLAT | 40.0 | TG | 42.4 | PLAT | 38.7 |
| 8 | GTP | 22.7 | P-FEV1_dif | 25.0 | FVC | 33.3 | CRP_dif | 42.5 | Weight_dif | 38.4 | AST | 40.8 | CRP_dif | 38.1 |
| 9 | P-FVC_dif | 22.4 | P-FVC_dif | 24.6 | P-FEV1 | 32.6 | PLAT | 41.8 | CK_dif | 37.4 | Weight_dif | 39.3 | ALP_dif | 38.0 |
| 10 | CRP_dif | 22.2 | CRP_dif | 24.4 | ALP_dif | 32.6 | ChE | 41.6 | ALP_dif | 37.2 | ALP_dif | 38.3 | TG | 37.7 |
| Rank | MLR model 1 | | MLR model 2 | | MLR model 3 | | MLR model 4 | | MLR model 5 | | MLR model 6 | | Total | |
| | Variables | SPRC* | Variables | SPRC* | Variables | SPRC* | Variables | SPRC* | Variables | SPRC* | Variables | SPRC* | Variables | SPRC* |
| 1 | MCH | 100.0 | HbA1c_dif | 100.0 | MCH | 100.0 | MCH | 100.0 | MCH | 100.0 | Ht | 100.0 | MCH | 100.0 |
| 2 | MCV | 82.5 | HbA1c | 84.6 | MCHC | 53.2 | MCV | 61.9 | MCV | 64.7 | Hb | 80.0 | MCV | 72.6 |
| 3 | MCHC | 34.5 | FBG | 81.3 | MCV | 52.7 | MCHC | 46.2 | MCHC | 46.2 | Hb_dif | 32.9 | MCHC | 43.6 |
| 4 | HbA1c | 33.0 | TC | 75.7 | Ht | 46.5 | RCC | 30.0 | FEV1 | 33.6 | Ht_dif | 30.9 | Ht | 43.2 |
| 5 | Ht_dif | 32.9 | ALB_dif | 71.4 | RCC | 43.7 | Ht | 29.1 | MCH_dif | 33.0 | RCC | 29.6 | HbA1c_dif | 39.1 |
| 6 | TC | 30.7 | MCH | 67.3 | FVC | 42.8 | Hb_dif | 24.6 | FVC | 29.5 | MCHC | 23.6 | HbA1c | 33.6 |
| 7 | LDL-C | 26.2 | LDL-C | 63.3 | MCH_dif | 41.5 | RCC_dif | 20.5 | MCV_dif | 28.3 | MCV | 18.8 | FBG | 32.6 |
| 8 | HbA1c_dif | 24.7 | MCV | 58.4 | MCV_dif | 39.4 | MCV_dif | 14.8 | MCHC_dif | 25.9 | MCHC_dif | 14.9 | TC | 29.5 |
| 9 | FBG | 20.6 | HDL-C | 51.1 | HbA1c_dif | 37.3 | HbA1c_dif | 12.8 | Ht | 24.4 | HbA1c | 8.2 | RCC | 27.1 |
| 10 | HDL-C | 19.4 | A/G_dif | 43.6 | FBG | 31.9 | Cl | 10.6 | RCC | 23.1 | TC | 6.7 | LDL-C | 25.1 |
*Variable importance (VI) and the standard partial regression coefficient (SPRC) are expressed as percentages, with the most influential variable in each model set to 100% (the SPRC is converted to an absolute value because it can be negative).
A/G, albumin/globulin ratio; CRP, C reactive protein; dif, change value from the previous year; FBG, fasting blood glucose; FEV1, forced expiratory volume in 1 s; FVC, forced vital capacity; GTP, glutamyl transpeptidase; Hb, haemoglobin; HbA1c, glycated haemoglobin A1c; HDL-C, high-density lipoprotein cholesterol; Ht, haematocrit; MCHC, mean corpuscular haemoglobin concentration; MLR, multiple logistic regression; PLAT, platelet; RCC, red cell count; RF, random forest; SPRC, standard partial regression coefficient; TC, total cholesterol; VI, variable importance.
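The scaling described in the first footnote (the most influential variable set to 100%, with coefficients taken as absolute values) can be sketched as follows. The raw coefficients are made up for illustration, and `scale_to_percent` is a hypothetical helper name, not part of the study's code.

```python
def scale_to_percent(raw):
    """Rescale absolute values so the largest becomes 100, sorted descending."""
    magnitudes = {name: abs(value) for name, value in raw.items()}
    top = max(magnitudes.values())
    scaled = {name: round(100 * m / top, 1) for name, m in magnitudes.items()}
    return sorted(scaled.items(), key=lambda item: item[1], reverse=True)


# Made-up standardised coefficients, for illustration only (not study results).
raw_coefficients = {"MCH": -0.80, "MCV": 0.66, "MCHC": 0.28, "HbA1c": -0.26}

ranking = scale_to_percent(raw_coefficients)
for name, percent in ranking:
    print(f"{name}: {percent}")
```

Because each model is rescaled separately, the percentages rank variables within a model but are not directly comparable across models.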