Yujie Yang1,2, Jing Zheng3, Zhenzhen Du1, Ye Li1,4, Yunpeng Cai1.
Abstract
BACKGROUND: Stroke risk assessment is an important means of primary prevention, but the applicability of existing stroke risk assessment scales in the Chinese population has always been controversial. A prospective study is a common method of medical research, but it is time-consuming and labor-intensive. Medical big data has been demonstrated to promote disease risk factor discovery and prognosis, attracting broad research interest.
Keywords: electronic health records; hypertension; machine learning; medical big data; risk prediction; stroke
Year: 2021; PMID: 34757322; PMCID: PMC8663532; DOI: 10.2196/30277
Source DB: PubMed; Journal: JMIR Med Inform
Figure 1. The screening process of the study population.
Figure 2. Age distribution of stroke and nonstroke patients.
Gender and age distribution before and after stratified sampling.
| Characteristics | Positive cases (N=8,827), n (%) | Negative cases (N=42,088), n (%) | Negative cases after sampling (N=11,126), n (%) |
| Gender, male | 5251 (59.49) | 25990 (61.75) | 6174 (55.49) |
| Age, years | | | |
| 30-40 | 414 (4.69) | 5843 (13.88) | 522 (4.69) |
| 40-50 | 1746 (19.78) | 17342 (41.20) | 2204 (19.81) |
| 50-60 | 2104 (23.84) | 10415 (24.75) | 2656 (23.87) |
| 60-70 | 2462 (27.89) | 5448 (12.94) | 3108 (27.93) |
| 70-85 | 2088 (23.65) | 2636 (6.26) | 2636 (23.69) |
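The source does not spell out the sampling procedure, so the following is only one reading that happens to be consistent with the counts in the table above. It assumes the negatives were downsampled so that every age stratum keeps the same negative-to-positive ratio, with that ratio capped by the scarcest stratum (70-85, where all 2636 negatives are retained). The function name `matched_quotas` is hypothetical.

```python
from fractions import Fraction
from math import floor

# Counts per age stratum, copied from the table above.
positives = {"30-40": 414, "40-50": 1746, "50-60": 2104, "60-70": 2462, "70-85": 2088}
negatives = {"30-40": 5843, "40-50": 17342, "50-60": 10415, "60-70": 5448, "70-85": 2636}


def matched_quotas(pos, neg):
    """Largest per-stratum negative quotas that keep the positives' age mix.

    Assumption (not stated in the source): one common negative-to-positive
    ratio is applied to every stratum, limited by the scarcest stratum.
    """
    # The stratum with the fewest negatives per positive caps the ratio.
    ratio = min(Fraction(neg[k], pos[k]) for k in pos)
    # Exact rational arithmetic avoids float rounding at the floor boundary.
    return {k: floor(pos[k] * ratio) for k in pos}


quotas = matched_quotas(positives, negatives)
print(quotas)                # per-stratum number of negatives to draw
print(sum(quotas.values()))  # total negatives after sampling
```

Under this assumption the quotas come out as 522, 2204, 2656, 3108, and 2636, summing to 11,126, which matches the "after sampling" column. Drawing the actual records would then be a matter of sampling that many negatives at random within each stratum.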
Distribution of the basic characteristics.
| Characteristics | Positive cases (N=8,827) | Negative cases (N=11,126) | P value^a |
| Gender, male, n (%) | 5251 (59.49) | 6174 (55.49) | <.001 |
| Age, mean (SD), years | 60.21 (11.88) | 59.73 (11.94) | .005 |
| Years_after_hypertension, mean (SD), years | 6.25 (5.64) | 6.78 (5.27) | <.001 |
| Smoking, n (%) | 768 (8.70) | 1233 (11.08) | <.001 |
| Drinking, n (%) | 1000 (11.33) | 1643 (14.77) | <.001 |
| FAM_hypertension, n (%) | 239 (2.71) | 489 (4.40) | <.001 |
| FAM_diabetes, n (%) | 57 (0.65) | 116 (1.04) | .002 |
| SBP^b, mean (SD), mmHg | 133.76 (13.42) | 131.33 (10.02) | <.001 |
| DBP^c, mean (SD), mmHg | 81.93 (9.56) | 80.17 (7.45) | <.001 |
| PPD^d, mean (SD), mmHg | 52.16 (10.59) | 51.15 (8.81) | <.001 |
| N_followup_1year, mean (SD) | 4.13 (3.66) | 5.89 (3.84) | <.001 |
| SBP_max, mean (SD), mmHg | 140.29 (14.56) | 142.77 (13.46) | <.001 |
| SBP_min, mean (SD), mmHg | 127.17 (14.09) | 122.81 (10.21) | <.001 |
| SBP_mean, mean (SD), mmHg | 133.20 (11.58) | 131.75 (7.90) | <.001 |
| DBP_max, mean (SD), mmHg | 86.47 (9.69) | 89.10 (8.51) | <.001 |
| DBP_min, mean (SD), mmHg | 76.67 (10.16) | 73.42 (7.36) | <.001 |
| DBP_mean, mean (SD), mmHg | 81.35 (8.17) | 80.71 (5.80) | <.001 |
| PPD_max, mean (SD), mmHg | 58.41 (12.02) | 61.48 (10.83) | <.001 |
| PPD_min, mean (SD), mmHg | 46.01 (11.05) | 41.69 (8.28) | <.001 |
| PPD_mean, mean (SD), mmHg | 51.89 (8.75) | 51.04 (6.26) | <.001 |
| HR_max^e, mean (SD), times/min | 78.57 (7.08) | 79.57 (7.11) | <.001 |
| HR_min, mean (SD), times/min | 74.29 (6.68) | 72.97 (5.97) | <.001 |
| SBP_delta_mean, mean (SD), mmHg | 4.37 (3.53) | 4.01 (3.17) | <.001 |
| DBP_delta_mean, mean (SD), mmHg | 3.46 (2.44) | 3.24 (2.10) | <.001 |
| PPD_delta_mean, mean (SD), mmHg | 4.30 (3.08) | 4.04 (2.68) | <.001 |
| HR_delta_mean, mean (SD), times/min | 1.23 (1.83) | 1.08 (1.51) | <.001 |
| Prior cardiovascular diseases, n (%) | 176 (1.99) | 11 (0.10) | <.001 |
| Atrial fibrillation, n (%) | 53 (0.60) | 16 (0.14) | <.001 |
| Atherosclerosis, n (%) | 488 (5.53) | 358 (3.22) | <.001 |
| Sleep disorder, n (%) | 99 (1.12) | 475 (4.27) | <.001 |
| Dizziness and headache, n (%) | 1094 (12.39) | 1804 (16.21) | <.001 |
| Malaise and fatigue, n (%) | 6 (0.07) | 55 (0.49) | <.001 |
| Giddiness, n (%) | 9 (0.10) | 55 (0.49) | <.001 |
| Migraine, n (%) | 7 (0.08) | 38 (0.34) | <.001 |
| Antihypertensive treatment, n (%) | 8551 (96.87) | 10905 (98.01) | <.001 |
| Lipid-lowering drug, n (%) | 1123 (12.72) | 1046 (9.40) | <.001 |
^a Pearson chi-square test was applied.
^b SBP: systolic blood pressure.
^c DBP: diastolic blood pressure.
^d PPD: pulse pressure difference.
^e HR: heart rate.
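To illustrate how a Pearson chi-square test applies to one of the categorical rows, here is a minimal sketch using the Smoking counts from the table above (768 of 8827 positive cases, 1233 of 11,126 negative cases). It assumes a plain 2x2 test without continuity correction; the source does not say which variant was used, and the helper name `pearson_chi2_2x2` is hypothetical.

```python
from math import erfc, sqrt


def pearson_chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom, the upper-tail probability is erfc(sqrt(stat / 2)).
    p_value = erfc(sqrt(stat / 2))
    return stat, p_value


# Rows: positive cases, negative cases. Columns: smokers, nonsmokers.
stat, p_value = pearson_chi2_2x2(768, 8827 - 768, 1233, 11126 - 1233)
print(f"chi2 = {stat:.2f}, p = {p_value:.2e}")
```

The statistic exceeds 10.83, the critical value for P=.001 at one degree of freedom, which agrees with the "<.001" reported for that row.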
Model performance of four different algorithms.
| Methods | AUC^a | Accuracy | Recall | F1-score | Specificity |
| Logistic regression | 0.8544 | 0.7726 | 0.7141 | 0.7354 | 0.8191 |
| SVM^b | 0.8898 | 0.8112 | 0.7844 | 0.7861 | 0.8325 |
| Random forest | 0.8956 | 0.8343 | 0.8157 | 0.8133 | 0.8490 |
| XGBoost^c | 0.9220 | 0.8478 | 0.8512 | 0.8319 | 0.8451 |
^a AUC: area under the receiver operating characteristic curve.
^b SVM: support vector machine.
^c XGBoost: extreme gradient boosting.
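As a reminder of how the threshold-based columns relate to a confusion matrix, the sketch below defines them from true/false positive and negative counts and then runs a small consistency check on the reported values. The check relies on the identity accuracy = recall × p + specificity × (1 − p), where p is the share of positive cases. It assumes p equals the overall share 8827/(8827+11,126); the actual evaluation split is not given here, so this is only a plausibility check. The names `binary_metrics` and `implied_accuracy` are hypothetical.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard confusion-matrix metrics for a binary classifier."""
    recall = tp / (tp + fn)            # sensitivity: share of positives found
    specificity = tn / (tn + fp)       # share of negatives correctly rejected
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "recall": recall,
            "specificity": specificity, "precision": precision, "f1": f1}


# Assumed prevalence: positives / (positives + sampled negatives).
PREVALENCE = 8827 / (8827 + 11126)

# (accuracy, recall, specificity) as reported in the table above.
reported = {
    "Logistic regression": (0.7726, 0.7141, 0.8191),
    "SVM": (0.8112, 0.7844, 0.8325),
    "Random forest": (0.8343, 0.8157, 0.8490),
    "XGBoost": (0.8478, 0.8512, 0.8451),
}

# Accuracy implied by recall and specificity at the assumed prevalence.
implied_accuracy = {
    name: rec * PREVALENCE + spec * (1 - PREVALENCE)
    for name, (_, rec, spec) in reported.items()
}

for name, (acc, _, _) in reported.items():
    print(f"{name}: reported {acc:.4f}, implied {implied_accuracy[name]:.4f}")
```

Under that assumption, the implied accuracy for each of the four models agrees with the reported value to within rounding.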
Figure 3. The receiver operating characteristic curves of the four algorithms.
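For readers less familiar with the area under the receiver operating characteristic curve, it can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The sketch below computes it that way on toy data; the labels and scores are invented for illustration and are not taken from the study.

```python
def roc_auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Quadratic in the number of samples, so suitable for illustration only.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    # A correctly ordered pair counts 1, a tie counts 0.5, otherwise 0.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical data): 3 positives, 3 negatives.
toy_labels = [1, 1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
toy_auc = roc_auc(toy_labels, toy_scores)
print(round(toy_auc, 4))
```

In the toy data, eight of the nine positive-negative pairs are ordered correctly, so the result is 8/9.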
Figure 4. Top 20 features by importance in the XGBoost model. DBP: diastolic blood pressure; HR: heart rate; PPD: pulse pressure difference; SBP: systolic blood pressure; XGBoost: extreme gradient boosting.
Figure 5. Nonlinear effects of six continuous features on stroke morbidity. DBP: diastolic blood pressure; PPD: pulse pressure difference; SBP: systolic blood pressure.
Figure 6. Receiver operating characteristic curves of the XGBoost model compared with three traditional risk scales. AUROC: area under the receiver operating characteristic curve; CMCS: Chinese Multi-provincial Cohort Study; FSRP: Framingham Stroke Risk Profile; XGBoost: extreme gradient boosting.