Elizabeth Hunter, John D Kelleher.
Abstract
Age is one of the most important risk factors in stroke risk prediction. However, including age as a risk factor in a stroke prediction model can cause difficulties: age often dominates the risk score, and not all risk factors contribute proportionally to stroke risk across ages. In this study we investigate a number of common stroke risk factors, using Framingham Heart Study data from the NHLBI Biologic Specimen and Data Repository Information Coordinating Center, to determine whether they appear to contribute proportionally by age to a stroke risk score. Because we find evidence of some non-proportionality by age, we then create a set of logistic regression risk models that each predict the 5-year stroke risk for a different age group. The age group models are shown to be better calibrated than a model for all ages that includes age as a risk factor. This suggests that better stroke risk predictions may require alternative methods for including age in stroke risk prediction models, methods that account for the non-proportionality of the other risk factors as age changes.
Keywords: epidemiology; machine learning; predictive modeling; risk; stroke
Year: 2022 PMID: 35250810 PMCID: PMC8891452 DOI: 10.3389/fneur.2022.803749
Source DB: PubMed Journal: Front Neurol ISSN: 1664-2295 Impact factor: 4.003
Ischemic strokes by age group.
| | All ages | | | | |
|---|---|---|---|---|---|
| Stroke in 5 years | 2,114 | 106 | 277 | 586 | 1,145 |
| No stroke in 5 years | 111,600 | 36,709 | 28,510 | 24,540 | 21,841 |
| Lifetime stroke | 14,983 | 3,821 | 4,066 | 3,929 | 3,167 |
| No lifetime stroke | 98,731 | 32,994 | 24,721 | 21,197 | 19,819 |
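The counts in the table above can be turned into 5-year event rates with simple arithmetic. The sketch below is illustrative only: the column meanings are an assumption (the first numeric column equals the sum of the other four, so it is treated as the all-ages total, and the remaining columns are taken in table order because their age ranges are not given here). The names `stroke`, `no_stroke`, and `event_rate` are hypothetical.

```python
# Counts copied from the "Stroke in 5 years" and "No stroke in 5 years" rows,
# in column order. ASSUMPTION: column 0 is the all-ages total, columns 1-4 are
# the four age groups in the order they appear in the table.
stroke = [2114, 106, 277, 586, 1145]
no_stroke = [111600, 36709, 28510, 24540, 21841]


def event_rate(events, non_events):
    """Share of records followed by a stroke within 5 years."""
    return events / (events + non_events)


# Sanity check: the first column is the sum of the remaining columns.
assert stroke[0] == sum(stroke[1:])
assert no_stroke[0] == sum(no_stroke[1:])

rates = [event_rate(s, n) for s, n in zip(stroke, no_stroke)]

for i, rate in enumerate(rates):
    label = "all ages" if i == 0 else f"age group column {i}"
    print(f"{label}: {100 * rate:.2f}%")
```

The rate rises from column to column, which is the kind of age dominance the abstract describes.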
Distributions of continuous stroke risk factors.
| | Min | Median | Mean | Max | Missing |
|---|---|---|---|---|---|
| Systolic blood pressure | | | | | |
| Total | 52 | 130.0 | 134.1 | 390.0 | 1,417 |
| Stroke | 79.0 | 150.0 | 155.1 | 390.0 | 70 |
| No stroke | 52.0 | 130.0 | 133.8 | 383.0 | 1,347 |
| Diastolic blood pressure | | | | | |
| Total | 25.0 | 79.0 | 78.9 | 163.0 | 354 |
| Stroke | 38.0 | 80.0 | 80.7 | 148.0 | 14 |
| No stroke | 25.0 | 79.0 | 78.9 | 164.0 | 340 |
| BMI | | | | | |
| Total | 12.0 | 25.8 | 26.4 | 66.6 | 1,437 |
| Stroke | 14.0 | 26.4 | 26.9 | 53.6 | 55 |
| No stroke | 12.0 | 25.8 | 26.4 | 66.6 | 1,382 |
| Total cholesterol | | | | | |
| Total | 27.0 | 212.0 | 214.4 | 1124.0 | 27,906 |
| Stroke | 49.0 | 174.0 | 215.1 | 608.0 | 879 |
| No stroke | 27.0 | 212.0 | 214.4 | 1124.0 | 27,027 |
| Cigarettes per day | | | | | |
| Total | 0 | 20.0 | 19.4 | 100 | 13,019 |
| Stroke | 0 | 20.0 | 20.1 | 60 | 175 |
| No stroke | 0 | 20 | 19.4 | 100 | 12,844 |
Distributions of categorical stroke risk factors.
| | Stroke in 5 years | No stroke in 5 years | Stroke (%) |
|---|---|---|---|
| Sex | | | |
| Male | 958 | 48,936 | 1.9 |
| Female | 1,156 | 62,664 | 1.8 |
| Missing | 0 | 0 | |
| Smoking | | | |
| Yes | 470 | 27,880 | 1.7 |
| No | 1,470 | 70,913 | 2.0 |
| Missing | 174 | 12,807 | |
| Atrial fibrillation | | | |
| Yes | 431 | 3,578 | 10.8 |
| No | 1,683 | 108,022 | 2.0 |
| Missing | 0 | 0 | |
| Diabetes | | | |
| Yes | 245 | 4,261 | 5.4 |
| No | 1,856 | 107,023 | 1.5 |
| Missing | 13 | 316 | |
| High blood pressure treatment | | | |
| Yes | 901 | 21,459 | 4.0 |
| No | 1,041 | 74,027 | 1.4 |
| Missing | 172 | 16,114 | |
Median value of continuous variables by age group and stroke or non-stroke outcome.
| | | | | |
|---|---|---|---|---|
| Systolic blood pressure | | | | |
| Total | 127.0 | 137.0 | 141.0 | 148.0 |
| Stroke | 135.5 | 145.0 | 149.0 | 153.0 |
| No stroke | 124.0 | 133.0 | 138.0 | 145.0 |
| Diastolic blood pressure | | | | |
| Total | 82.0 | 84.0 | 80.00 | 73 |
| Stroke | 84.0 | 90.0 | 84.00 | 75.00 |
| No stroke | 81.0 | 82.0 | 80.00 | 72.00 |
| BMI | | | | |
| Total | 26.1 | 26.57 | 26.71 | 26.28 |
| Stroke | 27.2 | 27.57 | 27.06 | 26.43 |
| No stroke | 25.6 | 26.20 | 26.55 | 26.16 |
| Total cholesterol | | | | |
| Total | 222.5 | 234.0 | 224.0 | 188.0 |
| Stroke | 247.0 | 238.0 | 227.0 | 191.0 |
| No stroke | 213.0 | 233.5 | 222.0 | 186.0 |
| Cigarettes per day | | | | |
| Total | 20.0 | 20.0 | 20.00 | 15.00 |
| Stroke | 21.7 | 20.00 | 20.00 | 20.00 |
| No stroke | 23.7 | 20.00 | 20.00 | 12.00 |
Percent of clinical exams with an ischemic stroke in 5 years for the categorical risk factors.
| | | | | |
|---|---|---|---|---|
| Sex | | | | |
| Male | 31.1 | 32.8 | 38.2 | 38.7 |
| Female | 36.0 | 34.0 | 36.3 | 35.3 |
| Smoking | | | | |
| Yes | 31.0 | 27.8 | 44.6 | 41.0 |
| No | 35.4 | 37.6 | 34.5 | 36.8 |
| Atrial fibrillation | | | | |
| Yes | 43.8 | 65.6 | 78.5 | 86.1 |
| No | 32.6 | 31.9 | 34.5 | 30.6 |
| Diabetes | | | | |
| Yes | 22.2 | 50.6 | 58.7 | 39.9 |
| No | 33.8 | 31.3 | 34.3 | 37.0 |
| High blood pressure treatment | | | | |
| Yes | 47.6 | 37.2 | 40.8 | 38.5 |
| No | 32.0 | 32.5 | 35.3 | 36.0 |
Pearson's Chi-squared p-value for categorical variables and Kruskal–Wallis p-value for continuous variables by age group.
| Risk factor | p-value |
|---|---|
| Sex | 1.936e-06 |
| Smoking | <2.2e-16 |
| Atrial fibrillation | <2.2e-16 |
| Diabetes | 1.536e-07 |
| High blood pressure treatment | <2.2e-16 |
| Systolic blood pressure | 7.619e-09 |
| Diastolic blood pressure | <2.2e-16 |
| Total cholesterol | <2.2e-16 |
| BMI | 0.0005 |
| Cigarettes per day (all individuals) | <2.2e-16 |
| Cigarettes per day (only smokers) | 0.1071 |
Pearson's Chi-squared p-value for categorical variables and Wilcoxon rank sum p-value for continuous variables by age group (*indicates simulated p-value due to small sample sizes).
| Risk factor | | | | |
|---|---|---|---|---|
| Sex | 0.497 | 0.777 | 0.441 | 0.058 |
| Smoking | 0.557 | 0.003 | 0.0002 | 0.136 |
| Atrial fibrillation | 0.522 | 0.0002 | <2.2e-16 | <2.2e-16 |
| Diabetes | 0.717* | 0.0008 | 1.75e-11 | 0.345 |
| High blood pressure treatment | 0.226 | 0.356 | 0.03 | 0.163 |
| Systolic blood pressure | 1.19e-05 | 4.407e-11 | <2.2e-16 | 9.492e-10 |
| Diastolic blood pressure | 0.003 | 7.041e-11 | 5.162e-13 | 1.534e-09 |
| Total cholesterol | 0.003 | 0.098 | 0.016 | 0.118 |
| BMI | 0.025 | 2.934e-05 | 0.150 | 0.058 |
| Cigarettes per day (all individuals) | 0.317 | 0.003 | 1.61e-05 | 0.067 |
| Cigarettes per day (only smokers) | 0.340 | 0.874 | 0.004 | 0.0001 |
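As a rough illustration of how a Pearson chi-squared p-value for a categorical risk factor is obtained, the sketch below tests stroke outcome against sex using the pooled Male/Female counts from the categorical distribution table (958 and 48,936; 1,156 and 62,664). It is not a reproduction of the per-age-group values above, which would need counts by age group that are not listed here. The function name `chi2_2x2` is hypothetical, no continuity correction is applied (whether the authors used one is not stated), and the closed form for the p-value holds only for 1 degree of freedom.

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson chi-squared statistic and p-value for the 2x2 table [[a, b], [c, d]].

    Uses the shortcut formula for 2x2 tables, without continuity correction.
    With 1 degree of freedom the upper-tail probability is erfc(sqrt(x / 2)).
    """
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Rows: male, female. Columns: stroke in 5 years, no stroke in 5 years.
stat, p_value = chi2_2x2(958, 48936, 1156, 62664)
print(f"chi2 = {stat:.2f}, p = {p_value:.3f}")
```

For very large statistics the p-value underflows to zero in floating point, which is why statistical software reports such results as a bound like `<2.2e-16`.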
Coefficients for the multi-variable logistic regression model including age as a risk factor (*indicates a scaled variable).
| | Coefficient | p-value |
|---|---|---|
| Intercept | −0.989 | <2e-16 |
| Sex | 0.07 | 0.29 |
| Systolic blood pressure* | 0.16 | 3.8e-5 |
| Diastolic blood pressure* | 0.35 | <2e-16 |
| BMI* | 0.08 | 0.016 |
| Cigarettes smoked per day | 0.02 | 3.5e-8 |
| Atrial fibrillation | 2.47 | <2e-16 |
| Diabetes | 0.59 | 1.6e-8 |
| Age* | 0.18 | 2.8e-5 |
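To show how coefficients like these map to a model output, here is a minimal sketch that applies the logistic function to the values in the table above. Several things are assumptions: the dictionary keys are hypothetical names, the starred variables are expected as already-standardised z-scores because the scaling constants are not given, and the coding of sex (which value is 0 and which is 1) is not stated in this record.

```python
import math

# Coefficients copied from the all-ages model table above.
INTERCEPT = -0.989
COEF = {
    "sex": 0.07,            # ASSUMPTION: binary 0/1, coding not stated
    "sbp_z": 0.16,          # systolic blood pressure, standardised
    "dbp_z": 0.35,          # diastolic blood pressure, standardised
    "bmi_z": 0.08,          # BMI, standardised
    "cigs_per_day": 0.02,   # cigarettes smoked per day, unscaled
    "afib": 2.47,           # atrial fibrillation, 0/1
    "diabetes": 0.59,       # diabetes, 0/1
    "age_z": 0.18,          # age, standardised
}


def predict(features):
    """Logistic model output for a dict of feature values (missing keys count as 0)."""
    linear = INTERCEPT + sum(COEF[name] * features.get(name, 0.0) for name in COEF)
    return 1.0 / (1.0 + math.exp(-linear))


baseline = predict({})              # all scaled variables at their mean, all flags 0
with_afib = predict({"afib": 1})    # same profile with atrial fibrillation
print(f"baseline: {baseline:.3f}, with atrial fibrillation: {with_afib:.3f}")
```

The gap between the two outputs reflects the atrial fibrillation coefficient, which is by far the largest in the table.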
Discrimination and calibration metrics when the model is tested on all ages and by age group.
| | All ages | | | | |
|---|---|---|---|---|---|
| AUC | 0.69 (0.03) | 0.52 | 0.64 | 0.69 | 0.69 |
| F1 | 0.42 (0.03) | 0.31 | 0.43 | 0.37 | 0.49 |
| Accuracy | 0.70 (0.70) | 0.64 | 0.66 | 0.69 | 0.73 |
| Hosmer and Lemeshow test | 0.26 (7) | 3.4e-5 | 0.03 | 0.44 | 0.12 |
| Spiegelhalter's test | 0.54 (9) | 0.002 | 0.09 | 0.24 | 0.38 |
Coefficients for the age group specific logistic regression models (*indicates a scaled variable).
| | | | | |
|---|---|---|---|---|
| Intercept | −0.87 | −1.05 | −0.95 | −0.95 |
| p-value | 0.0009 | 3.4e-11 | <2e-16 | <2e-16 |
| Sex | 0.05 | 0.14 | −0.15 | 0.12 |
| p-value | 0.87 | 0.48 | 0.24 | 0.20 |
| Systolic blood pressure* | 0.57 | 0.20 | 0.31 | 0.15 |
| p-value | 0.04 | 0.17 | 0.0007 | 0.0008 |
| Diastolic blood pressure* | 0.09 | 0.42 | 0.24 | 0.25 |
| p-value | 0.71 | 0.006 | 0.009 | 1.12e-07 |
| BMI* | 0.21 | 0.17 | 0.02 | 0.05 |
| p-value | 0.24 | 0.07 | 0.76 | 0.28 |
| Cigarettes smoked per day | 0.003 | −0.007 | 0.03 | 0.02 |
| p-value | 0.77 | 0.35 | 1.48e-07 | 0.0003 |
| Atrial fibrillation | 1.18 | 1.78 | 2.28 | 2.68 |
| p-value | 0.07 | 0.0001 | 7.00e-16 | <2e-16 |
| Diabetes | −0.42 | 1.04 | 1.12 | 0.21 |
| p-value | 0.64 | 0.0004 | 1.43e-09 | 0.17 |
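One way to read the non-proportionality in these coefficients is to convert them to odds ratios, since exponentiating a logistic regression coefficient gives the multiplicative change in odds per unit of the variable. The sketch below does this for the atrial fibrillation and diabetes rows, taking the four columns in table order (the age ranges are not given here, so the lists are indexed by column position). The names `afib_coefs`, `diabetes_coefs`, and `odds_ratios` are hypothetical.

```python
import math

# Coefficients copied from the table above, one per age-group column,
# in the order the columns appear.
afib_coefs = [1.18, 1.78, 2.28, 2.68]
diabetes_coefs = [-0.42, 1.04, 1.12, 0.21]


def odds_ratios(coefs):
    """Convert logistic regression coefficients to odds ratios."""
    return [math.exp(b) for b in coefs]


afib_or = odds_ratios(afib_coefs)
diabetes_or = odds_ratios(diabetes_coefs)

for i, (a, d) in enumerate(zip(afib_or, diabetes_or), start=1):
    print(f"column {i}: atrial fibrillation OR = {a:.2f}, diabetes OR = {d:.2f}")
```

The atrial fibrillation odds ratio grows steadily from the first column to the last, while the diabetes odds ratio peaks in the middle columns. A single all-ages coefficient cannot represent either pattern.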
Discrimination and calibration metrics for each age model.
| | | | | |
|---|---|---|---|---|
| AUC | 0.67 (0.16) | 0.70 (0.05) | 0.72 (0.03) | 0.70 (0.03) |
| F1 | 0.22 (0.16) | 0.41 (0.09) | 0.47 (0.06) | 0.44 (0.04) |
| Accuracy | 0.70 (0.10) | 0.71 (0.05) | 0.68 (0.04) | 0.71 (0.02) |
| Hosmer and Lemeshow test | 0.23 (7) | 0.34 (9) | 0.32 (8) | 0.35 (10) |
| Spiegelhalter's test | 0.34 (7) | 0.52 (9) | 0.44 (7) | 0.58 (10) |
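For readers unfamiliar with the calibration tests named in the table above, the following is a minimal sketch of Spiegelhalter's z-test as it is commonly defined. It is not the authors' code, and it does not explain the bracketed values in the table. The function name `spiegelhalter_z` and the toy inputs are hypothetical. A large p-value means the test finds no evidence of miscalibration.

```python
import math


def spiegelhalter_z(y_true, p_pred):
    """Spiegelhalter's z statistic and two-sided p-value.

    y_true: observed binary outcomes (0 or 1).
    p_pred: predicted probabilities, strictly between 0 and 1.
    Under good calibration the statistic is approximately standard normal.
    Note: the variance is zero if every prediction is exactly 0.5.
    """
    numerator = sum((y - p) * (1 - 2 * p) for y, p in zip(y_true, p_pred))
    variance = sum((1 - 2 * p) ** 2 * p * (1 - p) for p in p_pred)
    z = numerator / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return z, p_value


# Toy example with four predictions; real use needs a full test set.
z, p_value = spiegelhalter_z([0, 0, 1, 1], [0.1, 0.2, 0.8, 0.9])
print(f"z = {z:.3f}, p = {p_value:.3f}")
```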
Figure 1. Sankey diagram showing the change in feature importance ranking by age group.