Yong Whi Jeong, Yeojin Jung, Hoyeon Jeong, Ji Hye Huh, Ki-Chul Sung, Jeong-Hun Shin, Hyeon Chang Kim, Jang Young Kim, Dae Ryong Kang.
Abstract
Hypertension and diabetes mellitus are major chronic diseases and important factors in the management of cardiovascular disease. Preventing these diseases requires proper health management through periodic health check-ups. This study aimed to determine the incidence of hypertension and diabetes mellitus according to health check-up frequency and to develop a predictive model for hypertension and diabetes based on health check-up data. We used the National Health Insurance Corporation database of Korea and examined whether hypertension or diabetes subsequently occurred, according to the number of health check-ups attended over the preceding 10 years. Compared with individuals who underwent five health check-ups, those who had attended only their first screening had higher odds of hypertension (OR = 2.18, 95% CI = 2.14-2.22), diabetes mellitus (OR = 1.33, 95% CI = 1.30-1.35) and both diseases (OR = 2.46, 95% CI = 2.39-2.53); individuals who underwent 10 screenings had lower odds of hypertension (OR = 0.86, 95% CI = 0.83-0.88), diabetes mellitus (OR = 0.83, 95% CI = 0.81-0.85) and both diseases (OR = 0.83, 95% CI = 0.79-0.87). Individuals who attended fewer than five screenings, compared with those who attended five or more, had higher odds of hypertension (OR = 1.61, 95% CI = 1.59-1.62; AUC = 0.66), diabetes mellitus (OR = 1.21, 95% CI = 1.20-1.22; AUC = 0.59) and both diseases (OR = 1.75, 95% CI = 1.72-1.78; AUC = 0.63). The machine learning-based prediction model using XGBoost outperformed the conventional logistic regression model on all datasets in predicting hypertension (accuracy, 0.828 vs. 0.628; F1-score, 0.800 vs. 0.633; AUC, 0.828 vs. 0.630), diabetes mellitus (accuracy, 0.707 vs. 0.575; F1-score, 0.663 vs. 0.576; AUC, 0.710 vs. 0.575) and both diseases (accuracy, 0.950 vs. 0.612; F1-score, 0.950 vs. 0.614; AUC, 0.952 vs. 0.612).
Health check-ups had a strong influence on the occurrence of hypertension and diabetes, and screening frequency ranked above the other factors in variable importance.
Keywords: XGBoost; diabetes mellitus; health check-up; hypertension; logistic regression; random forest
Year: 2022 PMID: 36010317 PMCID: PMC9407141 DOI: 10.3390/diagnostics12081967
Source DB: PubMed Journal: Diagnostics (Basel) ISSN: 2075-4418
Figure 1. Study design.
Figure 2. Flow chart of study design.
Baseline Characteristics after Propensity Score Matching.
| Variables | Total | Screening < 5 Times | Screening ≥ 5 Times | p-value |
|---|---|---|---|---|
| Sex | | | | 0.4629 |
| male | 438,586 (66.35) | 219,152 (66.31) | 219,434 (66.39) | |
| female | 222,448 (33.65) | 111,365 (33.69) | 111,083 (33.61) | |
| Age, years | 53.79 (11.75) | 53.83 (11.84) | 53.77 (11.65) | 0.3539 |
| Age group | | | | <0.0001 |
| 30s | 80,435 (12.17) | 39,861 (12.06) | 40,574 (12.28) | |
| 40s | 160,385 (24.26) | 80,951 (24.49) | 79,434 (24.03) | |
| 50s | 209,695 (31.72) | 103,871 (31.43) | 105,824 (32.02) | |
| 60s | 137,638 (20.82) | 68,455 (20.71) | 69,183 (20.93) | |
| 70s | 72,881 (11.03) | 37,379 (11.31) | 35,502 (10.74) | |
| Income level | | | | <0.0001 |
| quartile 1 | 241,403 (36.52) | 109,280 (33.06) | 132,123 (39.97) | |
| quartile 2 | 173,063 (26.18) | 87,821 (26.41) | 85,782 (25.95) | |
| quartile 3 | 122,972 (18.60) | 67,950 (20.56) | 55,022 (16.65) | |
| quartile 4 | 123,596 (18.70) | 66,006 (19.97) | 57,590 (17.42) | |
| BMI, kg/m² | | | | <0.0001 |
| <18.5 | 11,321 (1.71) | 6,234 (1.89) | 5,087 (1.54) | |
| 18.5–22.9 | 188,290 (28.48) | 93,845 (28.39) | 94,445 (28.57) | |
| 23.0–24.9 | 173,636 (26.27) | 84,532 (25.58) | 89,104 (26.96) | |
| ≥25.0 | 287,797 (43.54) | 145,906 (44.14) | 141,881 (42.93) | |
| Diastolic blood pressure, mmHg | 80.68 (10.39) | 80.66 (10.42) | 80.71 (10.36) | 0.0473 |
| Systolic blood pressure, mmHg | 129.18 (15.11) | 129.16 (15.19) | 129.20 (15.02) | 0.2487 |
| Fasting blood sugar, mg/dL | 98.14 (21.48) | 98.16 (22.79) | 98.12 (20.07) | 0.4133 |
| Total cholesterol, mg/dL | 200.45 (29.69) | 200.79 (41.43) | 200.12 (37.86) | <0.0001 |
| Alcohol consumption, times/week | | | | <0.0001 |
| 0 | 322,171 (48.74) | 164,182 (49.67) | 157,989 (47.80) | |
| 1 | 118,182 (17.88) | 52,691 (15.94) | 65,491 (19.81) | |
| 2–3 | 153,556 (23.23) | 72,804 (22.03) | 80,752 (24.43) | |
| 4–7 | 67,125 (10.15) | 40,840 (12.36) | 26,285 (7.95) | |
| Smoking | | | | <0.0001 |
| never | 350,333 (53.00) | 173,998 (52.64) | 176,335 (53.35) | |
| ex | 130,701 (19.77) | 57,251 (17.32) | 73,450 (22.22) | |
| current | 180,000 (27.23) | 99,268 (30.03) | 80,732 (24.43) | |
| Physical activity, METs-min/week | 953.57 (1227.30) | 774.92 (1174.12) | 1160.56 (1255.62) | <0.0001 |
| Outcomes | | | | |
| Hypertension | | | | <0.0001 |
| no | 490,256 (74.17) | 232,065 (70.21) | 258,191 (78.12) | |
| yes | 170,778 (25.83) | 98,452 (29.79) | 72,326 (21.88) | |
| Diabetes mellitus | | | | <0.0001 |
| no | 400,243 (60.55) | 191,519 (57.95) | 208,724 (63.15) | |
| yes | 260,791 (39.45) | 138,998 (42.05) | 121,793 (36.85) | |
| Hypertension and diabetes mellitus | | | | <0.0001 |
| no | 603,723 (91.33) | 294,778 (89.19) | 308,945 (93.47) | |
| yes | 57,311 (8.67) | 35,739 (10.81) | 21,572 (6.53) | |
Data are presented as n (%) or mean (SD).
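As a rough cross-check on the outcome rows of the baseline table, the unadjusted (crude) odds ratio for attending fewer than five screenings can be computed directly from the 2x2 counts. The sketch below is not the study's analysis: it assumes a plain Wald interval on the log odds ratio and makes no adjustment for age or sex, and the function name `crude_odds_ratio` is hypothetical. The crude values come out at roughly 1.51, 1.24, and 1.74, in the same range as the adjusted odds ratios of 1.61, 1.21, and 1.75 reported for the screening frequency groups.

```python
import math


def crude_odds_ratio(exp_yes, exp_no, ref_yes, ref_no, z=1.96):
    """Return (OR, lower, upper) for a 2x2 table using a Wald interval.

    exp_* are counts in the exposed group (<5 screenings),
    ref_* are counts in the reference group (>=5 screenings).
    """
    odds_ratio = (exp_yes * ref_no) / (exp_no * ref_yes)
    # Standard error of log(OR) for a 2x2 table.
    se_log_or = math.sqrt(1 / exp_yes + 1 / exp_no + 1 / ref_yes + 1 / ref_no)
    log_or = math.log(odds_ratio)
    return (
        odds_ratio,
        math.exp(log_or - z * se_log_or),
        math.exp(log_or + z * se_log_or),
    )


# Counts copied from the outcome rows of the baseline table:
# (yes, no) for <5 screenings, then (yes, no) for >=5 screenings.
TABLE_COUNTS = {
    "hypertension": (98452, 232065, 72326, 258191),
    "diabetes mellitus": (138998, 191519, 121793, 208724),
    "hypertension and diabetes mellitus": (35739, 294778, 21572, 308945),
}

for outcome, counts in TABLE_COUNTS.items():
    or_, lower, upper = crude_odds_ratio(*counts)
    print(f"{outcome}: crude OR = {or_:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```

Because the cohort is large, the crude intervals are narrow; the gap between crude and adjusted estimates reflects the age and sex adjustment, which this sketch omits.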
Figure 3. Odds ratios with 95% CIs for hypertension and diabetes mellitus according to screening frequency, adjusted by age and sex.
Logistic regression of hypertension and diabetes mellitus according to screening frequency group.
| Variable | Hypertension, OR (95% CI) | Diabetes Mellitus, OR (95% CI) | Hypertension and Diabetes Mellitus, OR (95% CI) |
|---|---|---|---|
| Screening frequency | | | |
| ≥5 times | Ref. | Ref. | Ref. |
| <5 times | 1.61 (1.59–1.62) | 1.21 (1.20–1.22) | 1.75 (1.72–1.78) |
Adjusted by age and sex; OR, Odds Ratio; CI, Confidence Interval; Ref., reference.
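The odds ratios in this table can be read through the usual logistic regression form. The expression below is a sketch under assumptions: it treats screening frequency as a single binary indicator and enters age and sex as simple covariates, whereas the exact coding of age and sex is not stated here.

```latex
\begin{aligned}
\operatorname{logit}\,\Pr(Y = 1)
  &= \beta_0 + \beta_1\,\mathbf{1}[\text{screenings} < 5]
     + \beta_2\,\text{age} + \beta_3\,\text{sex},\\
\mathrm{OR} &= e^{\hat\beta_1}, \qquad
95\%\ \mathrm{CI} = \exp\!\bigl(\hat\beta_1 \pm 1.96\,\mathrm{SE}(\hat\beta_1)\bigr).
\end{aligned}
```

Under this reading, the reported odds ratio of 1.61 for hypertension corresponds to a coefficient of about $\hat\beta_1 = \ln 1.61 \approx 0.48$ on the log-odds scale.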
Model evaluation results.
| Outcomes | Classifier | Accuracy | F1-score | AUC (95% CI) | Variable Importance * |
|---|---|---|---|---|---|
| Hypertension | Logistic regression | 0.628 | 0.633 | 0.630 | |
| | Random forest | 0.824 | 0.798 | 0.825 | |
| | XGBoost | **0.828** | **0.800** | **0.828** | |
| Diabetes Mellitus | Logistic regression | 0.575 | 0.576 | 0.575 | |
| | Random forest | 0.693 | 0.647 | 0.647 | |
| | XGBoost | **0.707** | **0.663** | **0.710** | |
| Hypertension and Diabetes Mellitus | Logistic regression | 0.612 | 0.614 | 0.612 | |
| | Random forest | 0.948 | 0.946 | 0.949 | |
| | XGBoost | **0.950** | **0.950** | **0.952** | |
* The five variables with the highest importance are indicated. Boldface means the highest value in each metric column. AUC, area under curve; CI, confidence interval; BMI, body mass index; FBS, fasting blood sugar.
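For readers comparing the three metric columns, the sketch below shows how accuracy, F1-score, and AUC are commonly computed for a binary classifier. It is illustrative only: the labels and scores are made-up toy values rather than study data, the 0.5 decision threshold is an assumption, and the record does not say which F1 averaging the authors used (this version is the F1 of the positive class). All function names are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_positive(y_true, y_pred):
    """F1-score of the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive case outscores
    a random negative case, counting ties as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy example (not study data): true labels and predicted probabilities.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.2, 0.6, 0.4, 0.5, 0.1, 0.8, 0.3]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print("accuracy:", accuracy(y_true, y_pred))
print("F1:", f1_positive(y_true, y_pred))
print("AUC:", roc_auc(y_true, scores))
```

Accuracy and F1 depend on the chosen threshold, while AUC summarizes ranking quality across all thresholds, which is why the three columns can disagree for the same classifier.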
Figure 4. Comparison of receiver operating characteristic curves for (A) hypertension, (B) diabetes mellitus, and (C) hypertension and diabetes mellitus.