Abstract
OBJECTIVES: We aimed to test whether (1) adding nutrition predictor variables and/or (2) using machine learning models improves cardiovascular death prediction compared with standard Cox models without nutrition predictor variables.
Keywords: cardiovascular disease; machine learning; nutrition; risk prediction
Year: 2019 PMID: 31784446 PMCID: PMC6924849 DOI: 10.1136/bmjopen-2019-032703
Source DB: PubMed Journal: BMJ Open ISSN: 2044-6055 Impact factor: 2.692
Descriptive statistics on the study sample (National Health and Nutrition Examination Survey, 1999–2010 linked to the 2011 National Death Index, n=41 990)
| | Training data for model derivation | Test data for model evaluation | P value for difference* |
| | n=29 390 | n=12 600 | |
| CVD death | |||
| No | 28 211 (96.0) | 12 093 (96.0) | 0.96 |
| Yes | 1179 (4.0) | 507 (4.0) | |
| Heart disease death | |||
| No | 28 507 (97.0) | 12 214 (96.9) | 0.76 |
| Yes | 883 (3.0) | 386 (3.1) | |
| Cerebrovascular death | |||
| No | 29 094 (99.0) | 12 479 (99.0) | 0.71 |
| Yes | 296 (1.0) | 121 (1.0) | |
| Time since interview (months) | 79.3 (±41.4) | 79.4 (±41.6) | 0.84 |
| Wave | |||
| 99–00 | 3810 (13.0) | 1633 (13.0) | 1.0 |
| 01–02 | 8853 (30.1) | 3795 (30.1) | |
| 03–04 | 3926 (13.4) | 1684 (13.4) | |
| 05–06 | 3891 (13.2) | 1669 (13.2) | |
| 07–08 | 4353 (14.8) | 1866 (14.8) | |
| 09–10 | 4557 (15.5) | 1953 (15.5) | |
| Age | 50.0 (±20.4) | 50.1 (±20.6) | 0.60 |
| Sex | |||
| Male | 13 924 (47.4) | 5887 (46.7) | 0.22 |
| Female | 15 466 (52.6) | 6713 (53.3) | |
| Black | |||
| No | 14 807 (50.4) | 6335 (50.3) | 0.94 |
| Yes | 5882 (20.0) | 2511 (19.9) | |
| Missing | 8701 (29.6) | 3754 (29.8) | |
| Hispanic | |||
| No | 21 871 (74.4) | 9359 (74.3) | 0.77 |
| Yes | 7519 (25.6) | 3241 (25.7) | |
| Education level | |||
| <9th | 3942 (13.4) | 1756 (13.9) | 0.087 |
| 9–11 | 4538 (15.4) | 1954 (15.5) | |
| HS degree | 6543 (22.3) | 2716 (21.6) | |
| Some college or Associate’s | 7138 (24.3) | 2986 (23.7) | |
| College degree | 5061 (17.2) | 2268 (18.0) | |
| Missing | 2168 (7.4) | 920 (7.3) | |
| Ratio of family income to poverty threshold | 2.5 (±1.6) | 2.5 (±1.6) | 0.59 |
| Missing | 2655 (9.0) | 1109 (8.8) | |
| Total cholesterol | 198.0 (±43.1) | 198.0 (±43.9) | 0.86 |
| Missing | 3641 (12.4) | 1484 (11.8) | |
| HDL | 45.5 (±23.0) | 45.6 (±23.0) | 0.36 |
| Missing | 3643 (12.4) | 1484 (11.8) | |
| SBP | 125.4 (±20.6) | 125.6 (±21.1) | 0.38 |
| Missing | 3175 (10.8) | 1348 (10.7) | |
| DBP | 69.9 (±12.6) | 69.8 (±12.7) | 0.50 |
| Missing | 3374 (11.5) | 1431 (11.4) | |
| Number of blood pressure medications | |||
| 0 | 19 892 (67.7) | 8436 (67.0) | 0.32 |
| 1 | 7851 (26.7) | 3452 (27.4) | |
| 2 or more | 1647 (5.6) | 712 (5.7) | |
| Type 2 diabetes | |||
| No | 10 537 (35.9) | 4541 (36.0) | 0.42 |
| Yes | 4783 (16.3) | 2008 (15.9) | |
| Missing | 14 070 (47.9) | 6051 (48.0) | |
| Smoking | |||
| No | 23 774 (80.9) | 10 185 (80.8) | 0.90 |
| Yes | 5615 (19.1) | 2414 (19.2) | |
| Missing | 1 (0.0) | 1 (0.0) | |
| HEI | 47.0 (±11.0) | 47.2 (±11.0) | 0.28 |
| Missing | 3277 (11.2) | 1361 (10.8) | |
| AHEI | 47.1 (±11.1) | 47.1 (±11.0) | 0.76 |
| Missing | 3263 (11.1) | 1353 (10.7) | |
| MDS | 5.1 (±1.2) | 5.1 (±1.2) | 0.095 |
| Missing | 3270 (11.1) | 1368 (10.9) | |
| DASH | 47.4 (±9.3) | 47.4 (±9.4) | 0.75 |
| Missing | 8835 (30.1) | 3661 (29.1) |
Mean (±SD) reported for continuous variables and N (%) reported for categorical variables.
Statistics are grouped to reflect participants in the training (n=29 390/41 990=70%) or test (n=12 600/41 990=30%) data subsets.
*Wilcoxon rank sum test for continuous variables, eg, age, and Fisher’s exact test for categorical variables, eg, black race.
AHEI, Alternative Healthy Eating Index; CVD, cardiovascular disease; DASH, Dietary Approaches to Stop Hypertension diet score; HDL, high-density lipoprotein; HEI, Healthy Eating Index; MDS, Mediterranean Diet Score.
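The Fisher's exact P values for the categorical rows can be recomputed from the cell counts alone. The following is a minimal pure-Python sketch, not the authors' code: the function names `log_comb` and `fisher_exact_two_sided` are illustrative, and the two-sided definition used here (summing the probabilities of all tables no more likely than the observed one) is an assumption, since the source does not say which variant was applied.

```python
import math


def log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k), via log-gamma."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Assumed definition: sum the hypergeometric probabilities of every
    table with the same margins that is no more likely than the observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    log_denom = log_comb(row1 + row2, col1)

    def log_prob(x):
        # Log-probability that the top-left cell equals x, margins fixed.
        return log_comb(row1, x) + log_comb(row2, col1 - x) - log_denom

    lo = max(0, col1 - row2)   # smallest feasible top-left count
    hi = min(row1, col1)       # largest feasible top-left count
    log_obs = log_prob(a)
    tol = 1e-7                 # guards against floating-point ties
    p = sum(
        math.exp(lp)
        for lp in (log_prob(x) for x in range(lo, hi + 1))
        if lp <= log_obs + tol
    )
    return min(p, 1.0)


# "CVD death" rows of the table above: Yes (training, test), then No.
p_value = fisher_exact_two_sided(1179, 507, 28211, 12093)
print(f"Fisher's exact P = {p_value:.2f}")
```

The printed value can be compared with the 0.96 reported for the CVD death row; the same call works for any other two-level row of the table.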
Comparisons of participant characteristics by outcome (National Health and Nutrition Examination Survey, 1999–2010 linked to the 2011 National Death Index, n=41 990)
| | No CVD | CVD | P value for difference* |
| | n=40 304 | n=1686 | |
| Time since interview (months) | 80.3 (±41.4) | 55.7 (±34.9) | <0.0001 |
| Wave | |||
| 99–00 | 5168 (12.8) | 275 (16.3) | <0.0001 |
| 01–02 | 11 681 (29.0) | 967 (57.4) | |
| 03–04 | 5401 (13.4) | 209 (12.4) | |
| 05–06 | 5451 (13.5) | 109 (6.5) | |
| 07–08 | 6127 (15.2) | 92 (5.5) | |
| 09–10 | 6476 (16.1) | 34 (2.0) | |
| Age | 49.0 (±20.1) | 74.3 (±11.9) | <0.0001 |
| Sex | |||
| Male | 18 883 (46.9) | 928 (55.0) | <0.0001 |
| Female | 21 421 (53.1) | 758 (45.0) | |
| Black | |||
| No | 20 005 (49.6) | 1137 (67.4) | <0.0001 |
| Yes | 8110 (20.1) | 283 (16.8) | |
| Missing | 12 189 (30.2) | 266 (15.8) | |
| Hispanic | |||
| No | 29 781 (73.9) | 1449 (85.9) | <0.0001 |
| Yes | 10 523 (26.1) | 237 (14.1) | |
| Education level | |||
| <9th | 5223 (13.0) | 475 (28.2) | <0.0001 |
| 9–11 | 6201 (15.4) | 291 (17.3) | |
| HS degree | 8923 (22.1) | 336 (19.9) | |
| Some college or Associate’s | 9776 (24.3) | 348 (20.6) | |
| College degree | 7111 (17.6) | 218 (12.9) | |
| Missing | 3070 (7.6) | 18 (1.1) | |
| Ratio of family income to poverty threshold | 2.5 (±1.6) | 2.1 (±1.4) | <0.0001 |
| Missing | 3565 (8.8) | 199 (11.8) | |
| Total cholesterol | 198.1 (±43.2) | 196.2 (±47.0) | 0.1 |
| Missing | 4670 (11.6) | 455 (27.0) | |
| HDL | 45.5 (±23.0) | 45.0 (±24.2) | 0.002 |
| Missing | 4672 (11.6) | 455 (27.0) | |
| SBP | 124.8 (±20.3) | 142.9 (±26.8) | <0.0001 |
| Missing | 4114 (10.2) | 409 (24.3) | |
| DBP | 70.0 (±12.5) | 67.5 (±14.7) | <0.0001 |
| Missing | 4359 (10.8) | 446 (26.5) | |
| Number of blood pressure medications | |||
| 0 | 27 894 (69.2) | 434 (25.7) | <0.0001 |
| 1 | 10 205 (25.3) | 1098 (65.1) | |
| 2 or more | 2205 (5.5) | 154 (9.1) | |
| Type 2 diabetes | |||
| No | 14 680 (36.4) | 398 (23.6) | <0.0001 |
| Yes | 6229 (15.5) | 562 (33.3) | |
| Missing | 19 395 (48.1) | 726 (43.1) | |
| Smoking | |||
| No | 32 508 (80.7) | 1451 (86.1) | <0.0001 |
| Yes | 7794 (19.3) | 235 (13.9) | |
| Missing | 2 (0.0) | 0 (0.0) | |
| HEI | 46.9 (±11.0) | 51.0 (±10.3) | <0.0001 |
| Missing | 4179 (10.4) | 459 (27.2) | |
| AHEI | 47.1 (±11.1) | 48.0 (±10.9) | 0.006 |
| Missing | 4158 (10.3) | 458 (27.2) | |
| MDS | 5.1 (±1.2) | 5.1 (±1.2) | 0.1 |
| Missing | 4472 (11.1) | 166 (9.8) | |
| DASH | 47.4 (±9.4) | 48.1 (±9.2) | 0.01 |
| Missing | 11 774 (29.2) | 722 (42.8) |
Descriptive summary of variables in participants without a CVD event (n=40 304) versus those with a CVD event (n=1686) during the follow-up period. Mean (±SD) reported for continuous variables and N (%) reported for categorical variables.
*Wilcoxon rank sum test for continuous variables, eg, age, and Fisher’s exact test for categorical variables, eg, black race.
AHEI, Alternative Healthy Eating Index; CVD, cardiovascular disease; DASH, Dietary Approaches to Stop Hypertension diet score; HDL, high-density lipoprotein; HEI, Healthy Eating Index; MDS, Mediterranean Diet Score.
Figure 1. Calibration slopes and CIs of models in the hold-out test set (National Health and Nutrition Examination Survey, 1999–2010 linked to the 2011 National Death Index, n=12 600). All models included demographic variables age, sex and race (black race, Hispanic ethnicity); covariates of total cholesterol (mg/dL), high-density lipoprotein (HDL) cholesterol (mg/dL), systolic blood pressure (mm Hg), blood pressure treatment status (yes/no), diabetes status (yes/no) and current smoking status (yes/no). ACC, American College of Cardiology; AHEI, Alternative Healthy Eating Index; DASH, Dietary Approaches to Stop Hypertension diet score; GBM, gradient boosted machine; GND, Greenwood-Nam-D’Agostino; HEI, Healthy Eating Index; MDS, Mediterranean Diet Score; RF, random forest.
Figure 2. Model discrimination (C-statistic) in the hold-out test set (National Health and Nutrition Examination Survey, 1999–2010 linked to the 2011 National Death Index, n=12 600). All models included demographic variables age, sex and race (black race, Hispanic ethnicity); covariates of total cholesterol (mg/dL), high-density lipoprotein (HDL) cholesterol (mg/dL), systolic blood pressure (mm Hg), blood pressure treatment status (yes/no), diabetes status (yes/no) and current smoking status (yes/no). ACC, American College of Cardiology; AHEI, Alternative Healthy Eating Index; DASH, Dietary Approaches to Stop Hypertension diet score; GBM, gradient boosted machine; HEI, Healthy Eating Index; MDS, Mediterranean Diet Score; RF, random forest.
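For readers unfamiliar with the C-statistic as a discrimination measure for survival models, the sketch below shows one common estimator, Harrell's concordance index. This is an assumption for illustration: the record does not state which C-statistic estimator the authors used, the function name `harrell_c` is hypothetical, and the four-participant data set is a toy example, not study data.

```python
def harrell_c(times, events, risks):
    """Harrell's concordance index for right-censored data.

    times  : follow-up time for each participant
    events : 1 if the event (e.g. death) was observed, 0 if censored
    risks  : model-predicted risk score (higher = earlier event expected)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable only when i had an observed event
            # strictly before j's follow-up time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0   # model ranked the pair correctly
                elif risks[i] == risks[j]:
                    concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (hypothetical): months of follow-up, event flag, predicted risk.
toy_times = [5, 10, 15, 20]
toy_events = [1, 0, 1, 0]
toy_risks = [0.9, 0.4, 0.1, 0.2]
c_stat = harrell_c(toy_times, toy_events, toy_risks)
print(c_stat)  # 0.75: three of the four comparable pairs are ranked correctly
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the models in Figure 2 are compared.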