Xi Su1,2, Yongyong Xu1, Zhijun Tan1, Xia Wang1, Peng Yang1, Yani Su3, Yangyang Jiang2, Sijia Qin4, Lei Shang1.
Abstract
BACKGROUND: To establish a prediction model for cardiovascular diseases (CVD) in the general population based on random forests.
Keywords: cardiovascular disease; prediction model; random forest; risk factors
Year: 2020 PMID: 32725839 PMCID: PMC7521325 DOI: 10.1002/jcla.23421
Source DB: PubMed Journal: J Clin Lab Anal ISSN: 0887-8013 Impact factor: 2.352
The basic characteristics of samples in the training set and testing set (x̄ ± s)/[M(Q1, Q3)]
| Variables | Training set (n = 335) | Testing set (n = 163) | χ2/t/Z | P |
|---|---|---|---|---|
| CVD (n, %) | ||||
| Yes | 169 (50.40) | 78 (47.90) | 0.29 | .587 |
| No | 166 (49.60) | 85 (52.10) | ||
| Age (y) | 42.44 ± 9.24 | 42.27 ± 9.54 | −1.89 | .850 |
| Gender (male/female) | 235 (70.10)/100 (29.90) | 121 (74.20)/42 (25.80) | 0.90 | .244 |
| BMI (kg/m2) | 23.97 ± 3.58 | 24.40 ± 4.01 | 1.15 | .248 |
| Waist circumference (cm) | 83.90 ± 14.82 | 85.25 ± 14.71 | 0.96 | .339 |
| FBG (mmol/L) | 5.02 (4.37, 5.67) | 5.00 (4.43, 5.65) | −0.02 | .986 |
| Diastolic blood pressure (DBP, mm Hg) | 80.84 ± 13.25 | 80.79 ± 11.76 | −0.04 | .967 |
| Systolic blood pressure (SBP, mm Hg) | 123.29 ± 17.69 | 123.70 ± 17.09 | 0.25 | .805 |
| TC (mmol/L) | 4.26 (3.70, 5.08) | 4.48 (3.80, 5.20) | −1.55 | .121 |
| TG (mmol/L) | 1.65 (1.10, 2.50) | 1.62 (1.10, 2.69) | −0.37 | .707 |
| HDL‐C (mmol/L) | 1.47 (1.17, 2.26) | 1.57 (1.18, 2.51) | −1.13 | .260 |
| LDL‐C (mmol/L) | 1.76 (1.23, 2.50) | 1.69 (1.20, 2.40) | 0.69 | .485 |
| Activity level | ||||
| Low | 93 (27.80) | 54 (33.10) | 2.78 | .249 |
| Middle | 193 (57.60) | 81 (49.70) | ||
| High | 49 (14.60) | 28 (17.20) | ||
| Smoking | ||||
| No | 188 (56.10) | 86 (52.80) | 0.69 | .707 |
| Smoking cessation | 27 (8.10) | 16 (9.80) | ||
| Yes | 120 (35.80) | 61 (37.40) | ||
| Stroke | ||||
| No | 332 (99.10) | 162 (98.80) | 0.12 | .728 |
| Yes | 3 (0.90) | 2 (1.20) | ||
| Pulmonary tuberculosis | ||||
| No | 334 (99.70) | 162 (99.40) | – | .602 |
| Yes | 1 (0.30) | 1 (0.60) | ||
| Chronic bronchitis | ||||
| No | 331 (98.80) | 162 (99.40) | – | .999 |
| Yes | 4 (1.20) | 1 (0.60) | ||
| Pneumonia | ||||
| No | 333 (99.40) | 161 (98.80) | – | .600 |
| Yes | 2 (0.60) | 2 (1.20) | ||
| Lung cancer | ||||
| No | 335 (100.00) | 163 (100.00) | – | .999 |
| Yes | 0 (0.00) | 0 (0.00) | ||
| Pulmonary emphysema | ||||
| No | 334 (99.70) | 163 (100.00) | – | .999 |
| Yes | 1 (0.30) | 0 (0.00) | ||
| Family history of hypertension | ||||
| No | 222 (66.30) | 113 (69.30) | 0.46 | .495 |
| Yes | 113 (33.70) | 50 (30.70) | ||
| Family history of CHD | ||||
| No | 288 (86.00) | 137 (84.00) | 0.32 | .570 |
| Yes | 47 (14.00) | 26 (16.00) | ||
| Family history of diabetes mellitus | ||||
| No | 283 (84.50) | 140 (85.90) | 0.17 | .670 |
| Yes | 52 (15.50) | 23 (14.10) | ||
| Family history of stroke | ||||
| No | 305 (91.00) | 153 (93.90) | 1.18 | .277 |
| Yes | 30 (9.00) | 10 (6.10) | ||
| Family history of lung cancer | ||||
| No | 328 (97.90) | 159 (97.50) | – | .755 |
| Yes | 7 (2.10) | 4 (2.50) | ||
A dash (–) in the statistic column indicates that the data were analyzed by Fisher's exact test.
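As an illustration of how the χ2 entries in this table can be checked, the following minimal sketch recomputes the statistic for the CVD row (169 vs 166 in the training set, 78 vs 85 in the testing set). It assumes an uncorrected Pearson χ2 test on a 2 × 2 table with 1 degree of freedom; the source does not state whether a continuity correction was applied, and the function name `chi_square_2x2` is hypothetical.

```python
from math import erfc, sqrt

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns the statistic and its P value for 1 degree of freedom.
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value

# CVD row: Yes = 169 (training), 78 (testing); No = 166 (training), 85 (testing).
chi2, p_value = chi_square_2x2(169, 78, 166, 85)
print(f"chi2 = {chi2:.3f}, P = {p_value:.3f}")
```

Under these assumptions the result should be close to the reported χ2 of 0.29 and P of .587, which supports reading the split between training and testing sets as balanced with respect to CVD status.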
The basic characteristics of subjects in case group and control group (x̄ ± s)/[M(Q1, Q3)]
| Variables | Case group (n = 247) | Control group (n = 251) | χ2/t/Z | P |
|---|---|---|---|---|
| Age (y) | 47.04 ± 7.87 | 37.82 ± 8.36 | 12.66 | <.001 |
| Gender (male/female) | 202 (81.78)/45 (18.20) | 154 (61.35)/97 (38.60) | 25.48 | <.001 |
| BMI (kg/m2) | 25.25 ± 3.55 | 23.01 ± 3.57 | 7.02 | <.001 |
| Waist circumference (cm) | 87.74 ± 14.19 | 81.00 ± 14.61 | 5.22 | <.001 |
| FBG (mmol/L) | 5.20 (4.64, 6.15) | 4.81 (4.26, 5.33) | 5.12 | <.001 |
| DBP (mm Hg) | 84.84 ± 13.87 | 76.87 ± 10.16 | 7.30 | <.001 |
| SBP (mm Hg) | 128.66 ± 19.36 | 118.28 ± 13.61 | 6.91 | <.001 |
| TC (mmol/L) | 4.41 (3.84, 5.25) | 4.20 (3.60, 4.93) | 2.59 | .010 |
| TG (mmol/L) | 1.93 (1.32, 3.00) | 1.40 (0.95, 2.13) | 5.48 | <.001 |
| HDL‐C (mmol/L) | 1.48 (1.15, 2.43) | 1.50 (1.19, 2.26) | −0.14 | .886 |
| LDL‐C (mmol/L) | 1.74 (1.18, 2.54) | 1.70 (1.25, 2.40) | −0.014 | .989 |
| Activity level | ||||
| Low | 74 (30.00) | 73 (29.10) | 9.55 | .008 |
| Middle | 147 (59.50) | 127 (50.60) | ||
| High | 26 (10.50) | 51 (20.30) | ||
| Smoking | ||||
| No | 110 (44.50) | 164 (65.30) | 22.11 | <.001 |
| Smoking cessation | 28 (11.30) | 15 (6.00) | ||
| Yes | 109 (44.10) | 72 (28.70) | ||
| Stroke | ||||
| No | 243 (98.40) | 250 (99.60) | – | .213 |
| Yes | 4 (1.60) | 1 (0.40) | ||
| Pulmonary tuberculosis | ||||
| No | 246 (99.60) | 250 (99.60) | 0.00 | .991 |
| Yes | 1 (0.40) | 1 (0.40) | ||
| Chronic bronchitis | ||||
| No | 234 (94.70) | 248 (98.80) | 6.63 | .011 |
| Yes | 13 (5.30) | 3 (1.20) | ||
| Pneumonia | ||||
| No | 246 (99.60) | 248 (98.80) | – | .624 |
| Yes | 1 (0.40) | 3 (1.20) | ||
| Lung cancer | ||||
| No | 247 (100.00) | 251 (100.00) | – | .999 |
| Yes | 0 (0.00) | 0 (0.00) | ||
| Pulmonary emphysema | ||||
| No | 247 (100.00) | 250 (99.60) | – | .330 |
| Yes | 0 (0.00) | 1 (0.40) | ||
| Family history of hypertension | ||||
| No | 150 (60.70) | 185 (73.70) | 9.52 | .002 |
| Yes | 97 (39.30) | 66 (26.30) | ||
| Family history of CHD | ||||
| No | 208 (84.20) | 217 (86.50) | 0.50 | .479 |
| Yes | 39 (15.80) | 34 (13.50) | ||
| Family history of diabetes mellitus | ||||
| No | 207 (83.80) | 216 (86.10) | 0.49 | .483 |
| Yes | 40 (16.20) | 35 (13.90) | ||
| Family history of stroke | ||||
| No | 220 (89.10) | 238 (94.80) | 5.58 | .018 |
| Yes | 27 (10.90) | 13 (5.20) | ||
| Family history of lung cancer | ||||
| No | 244 (98.80) | 247 (98.40) | – | .999 |
| Yes | 3 (1.20) | 4 (1.60) | ||
A dash (–) in the statistic column indicates that the data were analyzed by Fisher's exact test.
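For the sparse rows analyzed by Fisher's exact test, the P value can be reproduced directly from the hypergeometric distribution. The sketch below does this for the Stroke row (4 vs 1 with stroke, 243 vs 250 without). It assumes the common two-sided definition that sums the probabilities of all tables no more likely than the observed one; the source does not say which two-sided rule was used, and the function name is hypothetical.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2 x 2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(k):
        # Hypergeometric probability of k in the top-left cell, margins fixed.
        return comb(row1, k) * comb(n - row1, col1 - k) / denom

    lowest = max(0, row1 + col1 - n)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Sum every table that is at most as probable as the observed one
    # (small relative tolerance guards against floating-point ties).
    return sum(
        p for p in (prob(k) for k in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )

# Stroke row: Yes = 4 (case), 1 (control); No = 243 (case), 250 (control).
p_stroke = fisher_exact_two_sided(4, 1, 243, 250)
print(f"P = {p_stroke:.3f}")
```

Under this definition the result should agree with the reported P of .213 for the Stroke row.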
Figure 1. The influencing factors of cardiovascular diseases, ordered by mean decrease in the Gini index. AL, activity level; BMI, body mass index; CB, chronic bronchitis; DBP, diastolic blood pressure; FBG, fasting blood glucose; Fh_CHD, family history of coronary heart disease; Fh_DM, family history of diabetes mellitus; Fh_H, family history of hypertension; Fh_LC, family history of lung cancer; Fh_stroke, family history of stroke; HDL‐C, high‐density lipoprotein‐cholesterol; LDL‐C, low‐density lipoprotein‐cholesterol; SBP, systolic blood pressure; TC, total cholesterol; TG, triglycerides; WC, waist circumference
Figure 2. Multidimensional classification chart of the random forest. Red dot: training samples; blue dot: testing samples
Figure 3. Prediction error of the random forest, with its 95% confidence interval, as a function of the number of decision trees
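To make the Gini-based ranking in Figure 1 more concrete, the sketch below computes the decrease in Gini impurity for a single hypothetical split on family history of hypertension, using the case and control counts from the table above (Yes: 97 cases, 66 controls; No: 150 cases, 185 controls). This is only a one-split illustration on the full sample: a random forest averages such decreases over many nodes and trees built on bootstrap samples of the training set, so the number will not match the importance values in the figure. The function names are assumptions.

```python
def gini(cases, controls):
    """Gini impurity of a node holding two classes."""
    p = cases / (cases + controls)
    return 2 * p * (1 - p)

def gini_decrease(left, right):
    """Impurity of the parent node minus the size-weighted impurity of its children.

    `left` and `right` are (cases, controls) tuples for the two child nodes.
    """
    parent = (left[0] + right[0], left[1] + right[1])
    n_left, n_right = sum(left), sum(right)
    weighted_children = (
        n_left * gini(*left) + n_right * gini(*right)
    ) / (n_left + n_right)
    return gini(*parent) - weighted_children

# Family history of hypertension: Yes = (97, 66), No = (150, 185).
decrease = gini_decrease((97, 66), (150, 185))
print(f"Gini decrease for this split: {decrease:.4f}")
```

A larger decrease means the split separates cases from controls more cleanly, which is the quantity the mean decrease in the Gini index aggregates across the forest.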
Multifactorial logistic regression analysis
| Variables | OR (95% CI) | B | P |
|---|---|---|---|
| Age (y) | 1.14 (1.10‐1.17) | 0.13 | <.001 |
| BMI (kg/m2) | 1.13 (1.06‐1.20) | 0.12 | <.001 |
| TG (mmol/L) | 1.11 (1.02‐1.22) | 0.11 | .023 |
| DBP (mm Hg) | 1.04 (1.02‐1.06) | 0.04 | .001 |
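The OR and B columns are linked by OR = exp(B), since B is the logistic regression coefficient on the log-odds scale. The short sketch below checks this for the four reported variables and shows how B translates into an odds ratio for a change of more than one unit. The helper name `odds_ratio_for_change` is hypothetical, and small gaps are expected because the table rounds B to two decimals.

```python
from math import exp

# (B, reported OR) pairs taken from the logistic regression table.
coefficients = {
    "Age (y)": (0.13, 1.14),
    "BMI (kg/m2)": (0.12, 1.13),
    "TG (mmol/L)": (0.11, 1.11),
    "DBP (mm Hg)": (0.04, 1.04),
}

def odds_ratio_for_change(b, delta):
    """Odds ratio for a `delta`-unit increase, other covariates held fixed."""
    return exp(b * delta)

for name, (b, reported_or) in coefficients.items():
    print(f"{name}: exp(B) = {exp(b):.3f}, reported OR = {reported_or:.2f}")

# Largest gap between exp(B) and the reported OR (rounding of B explains it).
max_gap = max(abs(exp(b) - r) for b, r in coefficients.values())

# Example: odds ratio implied by the age coefficient for a 10-year difference.
or_age_10y = odds_ratio_for_change(0.13, 10)
print(f"OR for a 10-year age increase: {or_age_10y:.2f}")
```

Read this way, the age coefficient implies that, with the other variables held fixed, the odds of CVD are multiplied by roughly exp(1.3) for each additional 10 years of age.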
Figure 4. The receiver operating characteristic curves of testing samples in the two prediction models