Weiting Huang, Tan Wei Ying, Woon Loong Calvin Chin, Lohendran Baskaran, Ong Eng Hock Marcus, Khung Keong Yeo, Ng See Kiong.
Abstract
This study evaluated novel data sources for cardiovascular risk prediction, including a detailed lifestyle questionnaire and continuous blood pressure monitoring, using ensemble machine learning algorithms (MLAs). The conventional reference risk score was the Framingham Risk Score (FRS). The outcome variables were low risk (calcium score of 0) and high risk (calcium score of 100 and above). Ensemble MLAs were built from naive Bayes, random forest and support vector classifier for the low-risk category, and from generalized linear regression, support vector regressor and stochastic gradient descent regressor for the high-risk category. MLAs were trained on 600 Southeast Asians aged 21 to 69 years who were free of cardiovascular disease. All MLAs outperformed the FRS for the low- and high-risk categories. The MLA based on the lifestyle questionnaire alone achieved an AUC of 0.715 (95% CI 0.681, 0.750) and 0.710 (95% CI 0.653, 0.766) for low and high risk respectively. Combining all groups of risk factors (lifestyle survey questionnaires, clinical blood tests, 24-h ambulatory blood pressure and heart rate monitoring) with feature selection further improved prediction of the low and high CVD risk groups to 0.791 (95% CI 0.759, 0.822) and 0.790 (95% CI 0.745, 0.836) respectively. Besides conventional predictors, self-reported physical activity, average daily heart rate, awake blood pressure variability and percentage time in diastolic hypertension were important contributors to CVD risk classification.
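The abstract does not state how the outputs of the base learners were combined into an ensemble. As a minimal sketch, assuming simple (optionally weighted) averaging of each base model's predicted score, the combination step might look like the following. The function name `soft_vote` and the example scores are hypothetical and are not taken from the study.

```python
def soft_vote(score_lists, weights=None):
    """Average per-participant scores across base models.

    score_lists: one list of scores per base model, all the same length.
    weights: optional per-model weights (assumption: equal weights by default).
    """
    if not score_lists:
        raise ValueError("need at least one base model")
    n_samples = len(score_lists[0])
    if any(len(scores) != n_samples for scores in score_lists):
        raise ValueError("all base models must score the same participants")
    if weights is None:
        weights = [1.0] * len(score_lists)
    if len(weights) != len(score_lists):
        raise ValueError("need exactly one weight per base model")
    total = float(sum(weights))
    return [
        sum(w * scores[i] for w, scores in zip(weights, score_lists)) / total
        for i in range(n_samples)
    ]


# Hypothetical scores from three base models for three participants.
naive_bayes = [0.9, 0.2, 0.6]
random_forest = [0.7, 0.4, 0.5]
support_vector = [0.8, 0.3, 0.1]
ensemble_scores = soft_vote([naive_bayes, random_forest, support_vector])
```

The same averaging applies whether the base models output class probabilities (the low-risk ensemble) or regression scores (the high-risk ensemble); other combination rules, such as stacking, are equally plausible.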
Year: 2022 PMID: 35058500 PMCID: PMC8776753 DOI: 10.1038/s41598-021-04649-y
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
List of risk factors used for prediction in this study.
| ALT, AST, Albumin, CholesterolHDL, CholesterolLDL, CholesterolTotal, Creatinine, Glucose, Haemoglobin, Triglycerides, Urea, WBCCount |
| Calories burned, Steps, Distance, Floors, Minutes sedentary, Minutes lightly active, Minutes fairly active, Minutes very active, Activity calories, Minutes asleep, Minutes awake, Number of awakenings, Time in bed, Minutes REM sleep, Minutes light sleep, Minutes deep sleep |
Figure 1. Modelling flow chart using ensemble MLA for cardiovascular risk prediction.
Demographics by risk categories.
| Risk factors | Total (n = 600) | Low risk (Agatston = 0) (n = 421) | High risk (Agatston ≥ 100) (n = 55) | Intermediate risk (Agatston 1–99) (n = 124) | *P-values |
|---|---|---|---|---|---|
| Age | 49.6 ± 9.2 | 47.02 ± 8.68 | 58.55 ± 6.55 | 54.39 ± 7.34 | 0.0000 |
| Gender (Male 1, Female 0) | 276 (46%) | 155 (36.82%) | 43 (78.18%) | 78 (62.9%) | 0.0001 |
| Body Mass Index (BMI) | 23.63 ± 3.72 | 23.59 ± 3.71 | 23.85 ± 3.63 | 23.7 ± 3.82 | 0.0000 |
| Waist circumference | 83.09 ± 11.01 | 82.38 ± 11.11 | 85.95 ± 9.71 | 84.24 ± 11 | 0.0000 |
| Hip circumference | 95.15 ± 9.82 | 95.37 ± 9.26 | 93.27 ± 14.54 | 95.21 ± 9.07 | 0.0000 |
| Chinese | 561 (93.5%) | 396 (94.06%) | 51 (92.73%) | 114 (91.94%) | 0.0000 |
| Indian | 18 (3%) | 10 (2.38%) | 3 (5.45%) | 5 (4.03%) | 0.0000 |
| Malay | 10 (1.67%) | 7 (1.66%) | 1 (1.82%) | 2 (1.61%) | 0.0001 |
| Others | 11 (1.83%) | 8 (1.9%) | 0 (0%) | 3 (2.42%) | 0.9993 |
| < $3000 | 235 (39.17%) | 150 (35.63%) | 25 (45.45%) | 60 (48.39%) | 0.5008 |
| $3000–$4999 | 146 (24.33%) | 116 (27.55%) | 14 (25.45%) | 16 (12.9%) | 0.0005 |
| ≥ $5000 | 219 (36.5%) | 155 (36.82%) | 16 (29.09%) | 48 (38.71%) | 0.0027 |
| Not working | 115 (19.17%) | 73 (17.34%) | 14 (25.45%) | 28 (22.58%) | 0.0005 |
| Blue-collar worker | 32 (5.33%) | 20 (4.75%) | 4 (7.27%) | 8 (6.45%) | 0.0000 |
| Pink-collar worker | 45 (7.5%) | 32 (7.6%) | 2 (3.64%) | 11 (8.87%) | 0.0000 |
| White-collar worker | 404 (67.33%) | 293 (69.6%) | 35 (63.64%) | 76 (61.29%) | 0.0459 |
| Other workers | 4 (0.67%) | 3 (0.71%) | 0 (0%) | 1 (0.81%) | 0.9993 |
| Marital status (Married 1, else 0) | 473 (78.83%) | 327 (77.67%) | 52 (94.55%) | 94 (75.81%) | 0.0000 |
| Highest education (at least university degree 1, else 0) | 310 (51.67%) | 225 (53.44%) | 27 (49.09%) | 58 (46.77%) | 0.8927 |
| Smoking history | 48 (8%) | 31 (7.36%) | 8 (14.55%) | 9 (7.26%) | 0.0000 |
| Alcohol consumption | 59 (9.83%) | 44 (10.45%) | 6 (10.91%) | 9 (7.26%) | 0.0000 |
| Personal/family history of Diabetes Mellitus | 201 (33.5%) | 135 (32.07%) | 15 (27.27%) | 51 (41.13%) | 0.0012 |
| Personal/family history of Hyperlipidemia | 110 (18.33%) | 74 (17.58%) | 9 (16.36%) | 27 (21.77%) | 0.0000 |
| Personal/family history of Hypertension | 275 (45.83%) | 191 (45.37%) | 22 (40%) | 62 (50%) | 0.1407 |
| Personal/family history of ischemic heart disease | 69 (11.5%) | 44 (10.45%) | 8 (14.55%) | 17 (13.71%) | 0.0000 |
| Medication for BP and dyslipidemia | 12 (2%) | 2 (0.48%) | 4 (7.27%) | 6 (4.84%) | 0.0000 |
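The column headers of this table define the three risk categories by Agatston calcium score: 0 for low risk, 1–99 for intermediate risk, and 100 or above for high risk. A minimal sketch of that labelling follows. The function names are hypothetical, and treating each outcome as a one-versus-rest binary target (for example, low risk against everyone else) is an assumption; the record does not state which comparison group the models used.

```python
def agatston_category(score):
    """Map an Agatston calcium score to the risk category used in the tables."""
    if score < 0:
        raise ValueError("Agatston score cannot be negative")
    if score == 0:
        return "low"
    if score < 100:
        # Assumption: any positive score below 100 counts as intermediate.
        return "intermediate"
    return "high"


def binary_targets(scores):
    """Build one-vs-rest targets for the low-risk and high-risk tasks (assumed setup)."""
    categories = [agatston_category(s) for s in scores]
    is_low = [1 if c == "low" else 0 for c in categories]
    is_high = [1 if c == "high" else 0 for c in categories]
    return is_low, is_high


# Hypothetical scores for five participants.
low_target, high_target = binary_targets([0, 12, 99, 100, 450])
```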
Self-reported lifestyle factors and 24 h blood pressure and heart rate monitoring data by risk categories.
| Lifestyle factors | Total (n = 600) | Low risk (Agatston = 0) (n = 421) | High risk (Agatston ≥ 100) (n = 55) | Intermediate risk (Agatston 1–99) (n = 124) | *P-values |
|---|---|---|---|---|---|
| Coffee (number of cups per day) | 0.98 ± 1.02 | 0.98 ± 1.03 | 1.07 ± 1.03 | 0.93 ± 0.97 | 0.0000 |
| Fruits (servings per day) | 1.32 ± 0.86 | 1.3 ± 0.9 | 1.31 ± 0.6 | 1.38 ± 0.83 | 0.0000 |
| Vegetables (servings per day) | 1.93 ± 0.95 | 1.94 ± 0.99 | 1.87 ± 0.77 | 1.9 ± 0.9 | 0.0000 |
| Sleep Hours | 6.57 ± 1.03 | 6.55 ± 1.03 | 6.4 ± 0.87 | 6.71 ± 1.07 | 0.0000 |
| Bad | 7 (1.17%) | 7 (1.66%) | 0 (0%) | 0 (0%) | 0.9993 |
| Fairly bad | 53 (8.83%) | 32 (7.6%) | 9 (16.36%) | 12 (9.68%) | 0.0000 |
| Fairly good | 397 (66.17%) | 281 (66.75%) | 33 (60%) | 83 (66.94%) | 0.1407 |
| Very good | 138 (23%) | 98 (23.28%) | 13 (23.64%) | 27 (21.77%) | 0.0002 |
| Stress Level | 4.46 ± 2.14 | 4.6 ± 2.15 | 4.47 ± 2 | 3.98 ± 2.12 | 0.0000 |
| Lifestyle Active | 5.55 ± 2.26 | 5.29 ± 2.28 | 6.25 ± 1.99 | 6.11 ± 2.13 | 0.0000 |
| Traditional medicine, Therapies and Vitamins | 268 (44.67%) | 187 (44.42%) | 30 (54.55%) | 51 (41.13%) | 0.5008 |
| Systolic BP single reading | 128.1 ± 17.25 | 124.91 ± 16.43 | 137.8 ± 13.14 | 134.61 ± 18.22 | 0.0000 |
| Diastolic BP single reading | 78.19 ± 12.92 | 76.21 ± 12.82 | 84.09 ± 11.14 | 82.26 ± 12.31 | 0.0000 |
| Average daily systolic BP | 116.59 ± 13.27 | 113.99 ± 12.29 | 125.09 ± 11.94 | 121.63 ± 14.1 | 0.0000 |
| Average daily diastolic BP | 73.93 ± 8.72 | 72.3 ± 7.94 | 79.18 ± 8.59 | 77.14 ± 9.54 | 0.0000 |
| Average daily mean arterial pressure (MAP) | 88.09 ± 9.36 | 86.24 ± 8.52 | 93.96 ± 8.73 | 91.73 ± 10.23 | 0.0000 |
| Average daily pulse pressure (PP) | 42.57 ± 7.3 | 41.57 ± 7.09 | 46 ± 5.79 | 44.46 ± 7.83 | 0.0000 |
| Average daily heart rate (HR) | 71.47 ± 8.59 | 71.33 ± 8.71 | 73.51 ± 8.39 | 71.03 ± 8.19 | 0.0000 |
| % time awake systolic BP ≥ 135 | 0.18 ± 0.25 | 0.14 ± 0.22 | 0.34 ± 0.27 | 0.27 ± 0.3 | 0.0000 |
| % time awake diastolic BP ≥ 85 | 0.23 ± 0.26 | 0.19 ± 0.23 | 0.37 ± 0.3 | 0.32 ± 0.31 | 0.0000 |
| % time nocturnal systolic BP ≥ 120 | 0.23 ± 0.29 | 0.19 ± 0.27 | 0.42 ± 0.34 | 0.3 ± 0.32 | 0.0000 |
| % time nocturnal diastolic BP ≥ 70 | 0.42 ± 0.32 | 0.37 ± 0.31 | 0.61 ± 0.31 | 0.52 ± 0.33 | 0.0000 |
| % time average daily systolic BP ≥ 120 | 0.4 ± 0.31 | 0.34 ± 0.29 | 0.62 ± 0.26 | 0.5 ± 0.31 | 0.0000 |
| % time average daily diastolic BP ≥ 80 | 0.31 ± 0.27 | 0.27 ± 0.24 | 0.47 ± 0.28 | 0.4 ± 0.31 | 0.0000 |
| Awake systolic BP ARV | 9.04 ± 2.12 | 8.66 ± 1.88 | 10.32 ± 2.23 | 9.75 ± 2.42 | 0.0000 |
| Nocturnal systolic BP ARV | 8.97 ± 3.11 | 8.77 ± 3.16 | 9.61 ± 3.28 | 9.35 ± 2.79 | 0.0000 |
| Awake diastolic BP ARV | 43.39 ± 7.48 | 42.32 ± 7.15 | 47.04 ± 5.96 | 45.43 ± 8.25 | 0.0000 |
| Nocturnal diastolic BP ARV | 40.92 ± 7.16 | 40.25 ± 6.98 | 43.43 ± 6.97 | 42.06 ± 7.51 | 0.0000 |
| Average daily systolic BP ARV | 8.87 ± 1.93 | 8.53 ± 1.78 | 10.03 ± 1.94 | 9.49 ± 2.09 | 0.0000 |
| Average daily diastolic BP ARV | 42.64 ± 7.14 | 41.67 ± 6.88 | 45.98 ± 5.79 | 44.46 ± 7.76 | 0.0000 |
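Several rows above are features derived from the 24 h ambulatory readings: pulse pressure, mean arterial pressure, the share of readings at or above a threshold, and average real variability (ARV). The record does not give the formulas used, so the sketch below relies on the common textbook definitions as an assumption: PP = SBP − DBP, MAP ≈ DBP + PP/3, and ARV as the mean absolute difference between successive readings (unweighted by time). The function names and sample readings are hypothetical.

```python
from statistics import fmean


def pulse_pressure(sbp, dbp):
    """Pulse pressure: systolic minus diastolic (mmHg)."""
    return sbp - dbp


def mean_arterial_pressure(sbp, dbp):
    """Common approximation: diastolic plus one third of pulse pressure."""
    return dbp + (sbp - dbp) / 3.0


def average_real_variability(readings):
    """Mean absolute difference between consecutive readings (unweighted ARV)."""
    if len(readings) < 2:
        raise ValueError("ARV needs at least two readings")
    return fmean([abs(b - a) for a, b in zip(readings, readings[1:])])


def fraction_at_or_above(readings, threshold):
    """Share of readings at or above a threshold, on a 0 to 1 scale."""
    if not readings:
        raise ValueError("need at least one reading")
    return sum(1 for r in readings if r >= threshold) / len(readings)


# Hypothetical awake systolic readings (mmHg) for one participant.
awake_sbp = [120, 130, 125, 140]
arv = average_real_variability(awake_sbp)            # variability feature
share_high = fraction_at_or_above(awake_sbp, 135)    # awake systolic threshold from the table
```

The thresholds themselves (for example 135 mmHg for awake systolic and 85 mmHg for awake diastolic BP) are the ones listed in the table rows.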
Blood test variables by risk categories.
| Blood tests | Total (n = 600) | Low risk (Agatston = 0) (n = 421) | High risk (Agatston ≥ 100) (n = 55) | Intermediate risk (Agatston 1–99) (n = 124) | *P-values |
|---|---|---|---|---|---|
| Alanine aminotransferase (ALT) | 21.38 ± 13.02 | 20.09 ± 12.05 | 28.13 ± 19.49 | 22.75 ± 11.59 | 0.0000 |
| Aspartate transaminase (AST) | 26.54 ± 8.33 | 25.57 ± 7.71 | 31.82 ± 12.43 | 27.5 ± 7.08 | 0.0000 |
| Albumin | 43.14 ± 2.36 | 42.96 ± 2.43 | 43.75 ± 2.15 | 43.45 ± 2.17 | 0.0000 |
| Cholesterol high-density lipoprotein (HDL) | 1.49 ± 0.34 | 1.5 ± 0.33 | 1.48 ± 0.35 | 1.47 ± 0.35 | 0.0000 |
| Cholesterol low-density lipoprotein (LDL) | 3.39 ± 0.83 | 3.29 ± 0.82 | 3.68 ± 0.83 | 3.6 ± 0.81 | 0.0000 |
| Cholesterol total | 5.43 ± 0.94 | 5.31 ± 0.91 | 5.8 ± 0.96 | 5.64 ± 0.94 | 0.0000 |
| Creatinine | 68.52 ± 15.72 | 66.39 ± 15.42 | 74.42 ± 16.49 | 73.13 ± 14.81 | 0.0000 |
| Glucose | 5.29 ± 0.69 | 5.21 ± 0.66 | 5.67 ± 1.05 | 5.41 ± 0.51 | 0.0000 |
| Haemoglobin | 13.64 ± 1.47 | 13.45 ± 1.5 | 14.26 ± 1.28 | 14.02 ± 1.31 | 0.0000 |
| Triglycerides | 1.18 ± 0.68 | 1.12 ± 0.65 | 1.42 ± 0.68 | 1.29 ± 0.73 | 0.0000 |
| White blood cell count (WBC) | 5.81 ± 1.6 | 5.86 ± 1.66 | 5.61 ± 1.27 | 5.74 ± 1.52 | 0.0000 |
| Urea | 4.45 ± 1.13 | 4.33 ± 1.1 | 4.63 ± 1.02 | 4.79 ± 1.19 | 0.0000 |
Fitbit Charge HR data by risk categories.
| Wearables | Total (n = 600) | Low risk (Agatston = 0) (n = 421) | High risk (Agatston ≥ 100) (n = 55) | Intermediate risk (Agatston 1–99) (n = 124) | *P-values |
|---|---|---|---|---|---|
| Calories burned | 2161.41 ± 478.39 | 2102.27 ± 451.04 | 2447.03 ± 559.49 | 2213.68 ± 470.08 | 0.0000 |
| Steps | 9406.76 ± 3198.63 | 9207.3 ± 3170.53 | 10274.93 ± 3326.31 | 9631.18 ± 3174.39 | 0.0000 |
| Distance | 6.52 ± 2.33 | 6.36 ± 2.34 | 7.18 ± 2.33 | 6.73 ± 2.24 | 0.0000 |
| Floors | 9.02 ± 7.44 | 8.99 ± 7.7 | 8.74 ± 6.75 | 9.26 ± 6.93 | 0.0000 |
| Minutes sedentary | 873.27 ± 120.03 | 881.41 ± 121.88 | 836.56 ± 112.48 | 864.75 ± 114.6 | 0.0000 |
| Minutes lightly active | 220.44 ± 64.78 | 219.02 ± 63.86 | 224.04 ± 69.9 | 223.37 ± 65.73 | 0.0000 |
| Minutes fairly active | 18.39 ± 17.48 | 15.62 ± 13.59 | 34.37 ± 27.21 | 19.53 ± 18.53 | 0.0000 |
| Minutes very active | 19.77 ± 19.56 | 17 ± 16.99 | 34.29 ± 28.56 | 21.67 ± 18.76 | 0.0000 |
| Activity calories | 943.78 ± 359.86 | 892.63 ± 337.24 | 1204.36 ± 448.75 | 982.18 ± 323.04 | 0.0000 |
| Minutes asleep | 447.4 ± 90.75 | 451.4 ± 90.18 | 452.61 ± 93.08 | 431.44 ± 90.7 | 0.0000 |
| Minutes awake | 37.19 ± 16.39 | 37.29 ± 15.95 | 35.6 ± 20.82 | 37.68 ± 15.42 | 0.0000 |
| Number of awakenings | 2.78 ± 2.91 | 2.77 ± 2.91 | 2.22 ± 1.31 | 3.12 ± 3.43 | 0.0001 |
| Time in bed | 485.65 ± 98.04 | 489.78 ± 96.68 | 489.23 ± 101.39 | 470.1 ± 100.36 | 0.0000 |
| Minutes REM sleep | 1.33 ± 7.52 | 1.17 ± 7.37 | 0.28 ± 1.85 | 2.42 ± 9.48 | 0.7914 |
| Minutes light sleep | 4.02 ± 21.58 | 3.41 ± 20.18 | 0.92 ± 6.14 | 7.61 ± 29.43 | 0.7492 |
| Minutes deep sleep | 0.92 ± 5.1 | 0.82 ± 5.05 | 0.14 ± 0.95 | 1.61 ± 6.33 | 0.8419 |
*P-values compare the low-risk and high-risk categories.
Performance of conventional Framingham Risk Score and MLA models by variable groups in low risk categories.
| Model 1: Survey questionnaire |
| Model 2: 24 h ambulatory blood pressure and heart rate |
| Model 3: Clinical blood results |
| Model 4: Model 1 + Model 2 |
| Model 5: Model 1 + Model 3 |
| Model 6: Model 1 + Model 2 + Model 3 |
| Model 6*: Model 1 to Model 3 with feature selection |
| Model 7: Physical activity and sleep trackers |
Figure 2. ROC curves for the low-risk group (left) and the high-risk group (right). Colours and line styles represent the prediction performance of the different models. Prediction performance for both low- and high-risk groups was significantly better in model 5* compared to FRS.
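The AUC values reported with 95% confidence intervals summarise ROC curves like these. The record does not say how the intervals were obtained; as a minimal sketch, assuming a percentile bootstrap over participants, the AUC and its interval could be computed as follows. The function names, the number of resamples and the toy data are hypothetical.

```python
import random


def auc(y_true, scores):
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(y_true, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (assumed method, not the study's)."""
    auc(y_true, scores)  # fail early if only one class is present
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y_true[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # resample lacks one class, so AUC is undefined; draw again
        stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


# Toy labels and model scores for four participants.
labels = [0, 0, 1, 1]
model_scores = [0.1, 0.4, 0.35, 0.8]
point_estimate = auc(labels, model_scores)
ci_low, ci_high = bootstrap_auc_ci(labels, model_scores, n_boot=200)
```

The pairwise definition is quadratic in sample size, which is acceptable for a cohort of 600; a rank-based formula would scale better for larger data.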
Figure 3. The top 15 features of MLA models showing the relative importance of the different variables in CVD risk prediction. Age, glucose, cholesterol LDL, wake period blood pressure variability, medication for BP and dyslipidemia, triglycerides and albumin reading were some common predictors across the different versions.