Ning Chen1, Feng Fan2, Jinsong Geng3, Yan Yang4, Ya Gao1, Hua Jin5,6,7,8, Qiao Chu1, Dehua Yu5,6,7,8, Zhaoxin Wang9,10,11, Jianwei Shi5,6,10.
Abstract
Objective: The prevention of hypertension in primary care requires an effective and suitable hypertension risk assessment model. The aim of this study was to develop and compare the performance of three machine learning algorithms in predicting the risk of hypertension among residents in primary care in Shanghai, China.
Keywords: hypertension; machine learning algorithms; primary care; risk assessment model; risk of hypertension
Year: 2022 PMID: 36267989 PMCID: PMC9577109 DOI: 10.3389/fpubh.2022.984621
Source DB: PubMed Journal: Front Public Health ISSN: 2296-2565
Characteristics of the participants in primary care settings.
| Characteristic | Hypertension (n = 25,038) | No hypertension (n = 15,223) | Z/χ² | P |
|---|---|---|---|---|
| Age (years) | 72.00 (68.00–78.00) | 70.00 (66.00–75.00) | 683.51 | <0.01 |
| Diabetes status | | | 2077.18 | <0.01 |
| No | 16,512 (65.95) | 13,177 (86.56) | | |
| Yes | 8,526 (34.05) | 2,046 (13.44) | | |
| Urinary protein level | | | 32.33 | <0.01 |
| Negative | 8,261 (32.99) | 8,392 (55.13) | | |
| Positive | 581 (2.32) | 405 (2.66) | | |
| BMI (kg/m²) | 24.98 (23.01–27.30) | 24.16 (22.10–26.30) | 458.44 | <0.01 |
| EHSA | | | 563.15 | <0.01 |
| 1 | 6,973 (27.85) | 5,973 (39.24) | | |
| 2 | 12,604 (50.34) | 6,387 (41.96) | | |
| 3 | 358 (1.43) | 219 (1.44) | | |
| 4 | 277 (1.11) | 149 (0.98) | | |
| 5 | 163 (0.65) | 46 (0.30) | | |
| Cr level (μmol/L) | 69.00 (58.00–84.00) | 66.00 (56.00–77.70) | 229.09 | <0.01 |
| SBP (mmHg) | 140.00 (130.00–153.00) | 139.00 (126.00–148.00) | 326.93 | <0.01 |
| WC (cm) | 87.00 (81.00–93.00) | 85.00 (79.00–91.00) | 157.52 | <0.01 |
| Smoking status | | | 200.85 | <0.01 |
| 1 | 19,171 (76.57) | 10,238 (67.25) | | |
| 2 | 1,159 (4.63) | 857 (5.63) | | |
| 3 | 2,028 (8.10) | 1,700 (11.17) | | |
| LDL-C level (mmol/L) | 2.89 (2.20–3.41) | 2.99 (2.46–3.63) | 402.35 | <0.01 |
| HDL-C level (mmol/L) | 1.35 (1.11–1.54) | 1.40 (1.20–1.66) | 586.65 | <0.01 |
| Frequency of drinking | | | 97.64 | <0.01 |
| 1 | 18,096 (72.27) | 9,837 (64.62) | | |
| 2 | 2,753 (11.00) | 1,771 (11.63) | | |
| 3 | 199 (0.79) | 151 (0.99) | | |
| 4 | 918 (3.67) | 764 (5.02) | | |
| Glucose level (mmol/L) | 5.60 (5.13–6.90) | 5.50 (5.00–6.33) | 247.31 | <0.01 |
| Urea nitrogen level (mmol/L) | 5.63 (4.80–6.83) | 5.63 (4.80–6.37) | 306.45 | <0.01 |
| TC level (mmol/L) | 4.82 (4.01–5.52) | 4.99 (4.35–5.72) | 267.34 | <0.01 |
| DBP (mmHg) | 78.00 (72.00–84.00) | 78.00 (70.00–82.00) | 235.77 | <0.01 |
| Exercise frequency | | | 17.48 | <0.01 |
| 1 | 14,751 (58.91) | 8,460 (55.57) | | |
| 2 | 815 (3.26) | 391 (2.57) | | |
| 3 | 1,495 (5.97) | 926 (6.08) | | |
| 4 | 5,471 (21.85) | 3,331 (21.88) | | |
| High salt consumption | | | 17.24 | <0.01 |
| No | 24,938 (99.60) | 15,199 (99.80) | | |
| Yes | 100 (0.40) | 24 (0.20) | | |
| TG level (mmol/L) | 1.39 (1.12–1.84) | 1.39 (1.00–1.80) | 13.22 | <0.01 |
| Time spent engaged in exercise (min) | 30.00 (30.00–30.00) | 30.00 (30.00–30.00) | 0.41 | 0.52 |
Nonnormally distributed continuous variables are reported as the median (25th percentile–75th percentile); Z refers to the rank-sum test statistic. Categorical variables are reported as n (%); χ² refers to the chi-square test statistic.
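As an illustrative sketch (not the authors' code), the two tests named in the footnotes are available in `scipy.stats`: the rank-sum (Mann-Whitney U) test for the nonnormally distributed measures and the chi-square test for the categorical counts. The age values below are made up; the 2 × 2 table uses the diabetes-status counts from the table above.

```python
from scipy.stats import mannwhitneyu, chi2_contingency

# Rank-sum (Mann-Whitney U) test for a nonnormally distributed measure,
# e.g. age, reported as median (25th percentile, 75th percentile).
group_a = [72, 68, 78, 75, 70, 74, 69, 77]  # hypothetical ages, group 1
group_b = [70, 66, 75, 68, 71, 65, 69, 67]  # hypothetical ages, group 2
u_stat, u_p = mannwhitneyu(group_a, group_b, alternative="two-sided")

# Chi-square test for a categorical variable (diabetes status):
# rows are the two participant groups, columns are No/Yes counts.
table = [[16512, 8526],
         [13177, 2046]]
chi2, chi_p, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, P < 0.01: {chi_p < 0.01}")
```

With counts this large and this imbalanced, the chi-square statistic is on the order of the 2077.18 reported in the table (the exact value depends on whether a continuity correction is applied).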
Configuration of parameters in each ML algorithm.
| Algorithm | Parameter | Tuning range | Final value |
|---|---|---|---|
| XGBoost | learning_rate | [0, 0.3] | 0.05 |
| | n_estimators | [100, 500] | 200 |
| | gamma | [0, 20] | 5 |
| | subsample | [0, 0.9] | 0.4 |
| | colsample_bytree | [0.5, 0.9] | 0.9 |
| | min_child_weight | (1, 6) | 5 |
| | max_depth | (2, 8) | 6 |
| | objective | - | binary:logistic |
| Random forest | n_estimators | [1, 50] | 40 |
| | criterion | gini | gini |
| | max_depth | none | none |
| | min_samples_split | [5, 200] | 200 |
| | min_samples_leaf | [1, 50] | 1 |
| | max_features | auto | auto |
| Logistic regression | C | [0, 200] | 100 |
| | class_weight | none | none |
| | max_iter | [10, 100] | 10 |
| | solver | - | liblinear |
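A minimal sketch of how the final configurations in the table could be instantiated, assuming the standard `xgboost` and `scikit-learn` parameter names (this is not the authors' code). The XGBoost settings are collected in a plain dict so the sketch runs without the `xgboost` package installed:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

# XGBoost final settings; pass as xgboost.XGBClassifier(**xgb_params)
# if the xgboost package is available.
xgb_params = {
    "learning_rate": 0.05,
    "n_estimators": 200,
    "gamma": 5,
    "subsample": 0.4,
    "colsample_bytree": 0.9,
    "min_child_weight": 5,
    "max_depth": 6,
    "objective": "binary:logistic",
}

rf = RandomForestClassifier(
    n_estimators=40,
    criterion="gini",
    max_depth=None,
    min_samples_split=200,
    min_samples_leaf=1,
    max_features="sqrt",  # the table's "auto" means sqrt(n_features) for classifiers
)

lr = LogisticRegression(
    C=100,
    class_weight=None,
    max_iter=10,
    solver="liblinear",
)
```

Note that `max_features="auto"` was removed in recent scikit-learn releases; `"sqrt"` is its equivalent for classification forests.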
Figure 1. Feature importance in the XGBoost model.
Figure 3. Feature importance in the logistic regression model.
The fitting results for the XGBoost, random forest, and logistic regression models for the training, validation, and testing sets.
| Model | Dataset | Sensitivity | Specificity | Precision | NPV | Accuracy | F1-score | AUC |
|---|---|---|---|---|---|---|---|---|
| XGBoost | Training | 0.886 | 0.530 | 0.756 | 0.739 | 0.752 | 0.816 | 0.818 |
| | Validation | 0.862 | 0.480 | 0.732 | 0.678 | 0.717 | 0.791 | 0.753 |
| | Testing | 0.864 | 0.488 | 0.735 | 0.686 | 0.722 | 0.795 | 0.765 |
| Random forest | Training | 0.896 | 0.434 | 0.723 | 0.718 | 0.722 | 0.800 | 0.782 |
| | Validation | 0.871 | 0.446 | 0.721 | 0.678 | 0.711 | 0.789 | 0.745 |
| | Testing | 0.816 | 0.548 | 0.748 | 0.644 | 0.714 | 0.780 | 0.756 |
| Logistic regression | Training | 0.827 | 0.411 | 0.698 | 0.591 | 0.670 | 0.757 | 0.705 |
| | Validation | 0.822 | 0.418 | 0.699 | 0.588 | 0.669 | 0.756 | 0.692 |
| | Testing | 0.829 | 0.430 | 0.705 | 0.604 | 0.678 | 0.762 | 0.707 |
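The fit metrics in the table above are internally consistent: assuming the columns are sensitivity, specificity, precision, NPV, accuracy, F1-score, and AUC (an inference from the arithmetic, not stated in this extract), the last five follow from sensitivity, specificity, and the hypertension prevalence of 25,038/40,261 ≈ 0.622. A minimal consistency check, not the authors' code:

```python
def derived_metrics(sens, spec, prevalence):
    """Derive confusion-matrix metrics from sensitivity, specificity,
    and positive-class prevalence (all cells as fractions of n)."""
    tp = sens * prevalence              # true positives
    fn = (1 - sens) * prevalence        # false negatives
    tn = spec * (1 - prevalence)        # true negatives
    fp = (1 - spec) * (1 - prevalence)  # false positives
    precision = tp / (tp + fp)
    npv = tn / (tn + fn)
    accuracy = tp + tn
    f1 = 2 * precision * sens / (precision + sens)
    return precision, npv, accuracy, f1

# XGBoost training row: sensitivity 0.886, specificity 0.530
precision, npv, accuracy, f1 = derived_metrics(0.886, 0.530, 25038 / 40261)
print(round(precision, 3), round(npv, 3), round(accuracy, 3), round(f1, 3))
```

The result reproduces the reported 0.756 / 0.739 / 0.752 / 0.816 to within rounding of the published sensitivity and specificity.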
Figure 4. ROC curves for the XGBoost, random forest, and logistic regression models. X axis: 1 − specificity; Y axis: sensitivity. The reference (chance) line is shown as a black dashed line.
AUCs for the XGBoost, random forest, and logistic regression models for the training, validation, and testing sets.
| Model | Dataset | AUC |
|---|---|---|
| XGBoost | Training | 0.818 |
| | Validation | 0.753 |
| | Testing | 0.765 |
| Random forest | Training | 0.782 |
| | Validation | 0.745 |
| | Testing | 0.756 |
| Logistic regression | Training | 0.705 |
| | Validation | 0.692 |
| | Testing | 0.707 |
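The AUC values above have a direct probabilistic reading: the probability that a randomly chosen hypertensive participant receives a higher predicted risk score than a randomly chosen non-hypertensive one, with ties counted as half. A minimal sketch of that computation on made-up scores (illustrative only; the authors presumably used a library ROC routine):

```python
def auc(scores_pos, scores_neg):
    """AUC as the pairwise win rate of positive over negative scores
    (the Mann-Whitney interpretation of the ROC area)."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# Hypothetical predicted risks for positive (hypertensive) and negative cases
pos = [0.9, 0.8, 0.75, 0.6, 0.55]
neg = [0.7, 0.5, 0.4, 0.35, 0.2]
print(auc(pos, neg))  # 1.0 for a perfect ranker; 0.5 at chance level
```

On these toy scores two of the 25 positive-negative pairs are misordered, giving an AUC of 23/25 = 0.92.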