Sebastiano Barbieri1, Suneela Mehta2, Billy Wu2, Chrianna Bharat3, Katrina Poppe2, Louisa Jorm1, Rod Jackson2.
Abstract
BACKGROUND: Machine learning-based risk prediction models may outperform traditional statistical models in large datasets with many variables, by identifying both novel predictors and the complex interactions between them. This study compared deep learning extensions of survival analysis models with Cox proportional hazards models for predicting cardiovascular disease (CVD) risk in national health administrative datasets.
Keywords: Cardiovascular diseases; deep learning; health planning; machine learning; population health; primary prevention; risk assessment; survival analysis
Year: 2022 PMID: 34910160 PMCID: PMC9189958 DOI: 10.1093/ije/dyab258
Source DB: PubMed Journal: Int J Epidemiol ISSN: 0300-5771 Impact factor: 9.685
Figure 1. A schematic representation of the neural network used to map a person’s predictors and clinical history to the log of the relative risk function. Code embeddings indicate vector representations of diagnoses, procedures and medications. Type embeddings describe the type of code (primary diagnosis, secondary diagnosis, external cause of injury, or procedure or operation). An extended description is reported in the main text. The ‘‖’ symbol indicates vector concatenation. Gated recurrent units (GRUs) are a gating mechanism in recurrent neural networks.
Participant characteristics (N = 2 164 872)
| Characteristic | Women | Men |
|---|---|---|
| Participants | 1 141 925 (52.7%) | 1 022 947 (47.3%) |
| Age in years, mean (standard deviation) | 49.0 (11.8) | 49.0 (11.6) |
| Ethnicity | ||
| European | 797 571 (69.8%) | 734 891 (71.8%) |
| Māori | 132 802 (11.6%) | 106 912 (10.5%) |
| Pacific | 60 965 (5.3%) | 54 659 (5.3%) |
| Indian | 38 481 (3.4%) | 36 248 (3.5%) |
| Other | 112 106 (9.8%) | 90 237 (8.8%) |
| Deprivation quintile | ||
| 1 | 272 564 (23.9%) | 242 794 (23.7%) |
| 2 | 244 140 (21.4%) | 216 602 (21.2%) |
| 3 | 227 684 (19.9%) | 202 118 (19.8%) |
| 4 | 212 257 (18.6%) | 190 774 (18.6%) |
| 5 | 185 280 (16.2%) | 170 659 (16.7%) |
| Diabetes | 67 143 (5.9%) | 65 290 (6.4%) |
| Atrial fibrillation | 6393 (0.6%) | 11 900 (1.2%) |
| Medications dispensed at baseline | ||
| Blood-pressure-lowering | 194 670 (17.0%) | 167 839 (16.4%) |
| Lipid-lowering | 110 428 (9.7%) | 137 529 (13.4%) |
| Antiplatelet/anticoagulant | 64 158 (5.6%) | 79 443 (7.8%) |
| Follow-up | ||
| Total follow-up, years (mean) | 5 451 552 (4.8) | 4 792 390 (4.7) |
| Cardiovascular disease deaths | 2986 (0.3%) | 5153 (0.5%) |
| Cardiovascular disease events (non-fatal and fatal) | 23 592 (2.1%) | 38 335 (3.7%) |
| Median time to cardiovascular disease event, years | 2.8 (1.4, 3.9) | 2.7 (1.4, 3.9) |
| Non-cardiovascular disease deaths | 13 771 (1.2%) | 15 660 (1.5%) |
| Censored at 5 years | 1 021 829 (89.5%) | 866 167 (84.7%) |
Values are N (%) unless otherwise stated.
Among those with an event between 2013 and 2017 inclusively.
Adjusted local hazard ratios (HRs) for time to cardiovascular disease event within 5 years for women, determined by the deep learning model (only the 10 diagnoses and procedures, and the 10 medications, associated with the largest hazard ratios are reported)
| Predictors | Women, N (%) | Deep learning model: adjusted local HRs (95% CI) |
|---|---|---|
| Age (per year) | | 1.09 (1.06, 1.11) |
| Ethnicity | ||
| European | 797 571 (69.8%) | 1 |
| Māori | 132 802 (11.6%) | 1.96 (1.95, 1.97) |
| Pacific | 60 965 (5.3%) | 1.68 (1.67, 1.69) |
| Indian | 38 481 (3.4%) | 0.925 (0.918, 0.932) |
| Other | 112 106 (9.8%) | 0.720 (0.716, 0.723) |
| Deprivation quintile (per quintile) | | 1.16 (1.15, 1.16) |
| Diabetes | 67 143 (5.9%) | 1.39 (1.37, 1.40) |
| Atrial fibrillation | 6393 (0.6%) | 1.68 (1.66, 1.69) |
| Medications dispensed at baseline | ||
| Blood-pressure-lowering | 194 670 (17.0%) | 1.31 (1.29, 1.33) |
| Lipid-lowering | 110 428 (9.7%) | 0.998 (0.990, 1.01) |
| Antiplatelet/anticoagulant | 64 158 (5.6%) | 1.46 (1.45, 1.47) |
| Interactions | ||
| Age (years)*blood-pressure-lowering medication | | 0.980 (0.978, 0.982) |
| Age (years)*diabetes | | 0.999 (0.997, 1.00) |
| Age (years)*atrial fibrillation | | 0.963 (0.961, 0.966) |
| Blood-pressure-lowering medication*diabetes | | 1.10 (1.09, 1.11) |
| Antiplatelet/anticoagulant medications*diabetes | | 0.883 (0.874, 0.892) |
| Blood-pressure-lowering medication*lipid-lowering medication | | 0.997 (0.989, 1.01) |
| Top 10 diagnoses and procedures | ||
| Z72.0: Tobacco use, current | 84 589 (7.4%) | 2.04 (1.99, 2.10) |
| I10: Essential (primary) hypertension | 14 167 (1.2%) | 1.98 (1.91, 2.06) |
| R07.4: Chest pain, unspecified | 17 208 (1.5%) | 1.69 (1.63, 1.76) |
| 92514-39: General anaesthesia, ASA 3 (Patient with severe systemic disease that limits activity), nonemergency or not known | 10 961 (1.0%) | 1.55 (1.49, 1.61) |
| 56001-00: Computerized tomography of brain | 16 845 (1.5%) | 1.53 (1.47, 1.58) |
| J44.1: Chronic obstructive pulmonary disease with acute exacerbation, unspecified | 1096 (0.1%) | 1.52 (1.47, 1.58) |
| Z92.2: Personal history of long-term (current) use of other medicaments | 2661 (0.2%) | 1.52 (1.47, 1.58) |
| H35.0: Background retinopathy and retinal vascular changes | 692 (0.1%) | 1.51 (1.46, 1.57) |
| Z92.22: Personal history of long-term (current) use of other medicaments, insulin | 2169 (0.2%) | 1.47 (1.42, 1.53) |
| Z72.1: Alcohol use | 957 (0.1%) | 1.45 (1.40, 1.50) |
| Top 10 medications | ||
| Nicotine | 79 506 (7.0%) | 1.74 (1.70, 1.78) |
| Varenicline tartrate | 31 750 (2.8%) | 1.54 (1.50, 1.58) |
| Furosemide | 13 340 (1.2%) | 1.44 (1.40, 1.49) |
| Tiotropium bromide | 4078 (0.4%) | 1.43 (1.39, 1.47) |
| Bupropion hydrochloride | 30 796 (2.7%) | 1.40 (1.36, 1.43) |
| Cilazapril | 76 762 (6.7%) | 1.38 (1.35, 1.41) |
| Malathion | 22 441 (2.0%) | 1.37 (1.33, 1.41) |
| Salbutamol with ipratropium bromide | 22 240 (1.9%) | 1.35 (1.32, 1.39) |
| Quinapril | 48 373 (4.2%) | 1.33 (1.30, 1.37) |
| Glyceryl trinitrate | 15 899 (1.4%) | 1.31 (1.26, 1.37) |
The local hazard ratios for each predictor are adjusted for all other predictors. Values in parentheses are 95% confidence intervals unless otherwise stated.
Age was centred at the mean value of 49.021. Deprivation quintile was centred around quintile three. The baseline survival estimate at 5 years for the deep learning model, relevant to the mean value of age, deprivation quintile three and the reference group of categorical variables, was 0.9926104519395.
Average and range (in parentheses) of estimated local hazard ratios for all values of the continuous predictor.
Adjusted local hazard ratios (HRs) for time to cardiovascular disease event within 5 years for men, determined by the deep learning model (only the 10 diagnoses and procedures and the 10 medications associated with the largest hazard ratios are reported)
| Predictors | Men, N (%) | Deep learning model: adjusted local HRs (95% CI) |
|---|---|---|
| Age (per year) | | 1.09 (1.06, 1.13) |
| Ethnicity | ||
| European | 734 891 (71.8%) | 1 |
| Māori | 106 912 (10.5%) | 1.69 (1.69, 1.70) |
| Pacific | 54 659 (5.3%) | 1.44 (1.43, 1.44) |
| Indian | 36 248 (3.5%) | 1.40 (1.39, 1.41) |
| Other | 90 237 (8.8%) | 0.785 (0.781, 0.790) |
| Deprivation quintile (per quintile) | | 1.10 (1.09, 1.10) |
| Diabetes | 65 290 (6.4%) | 1.46 (1.45, 1.47) |
| Atrial fibrillation | 11 900 (1.2%) | 1.61 (1.59, 1.62) |
| Medications dispensed at baseline | ||
| Blood-pressure-lowering | 167 839 (16.4%) | 1.12 (1.11, 1.13) |
| Lipid-lowering | 137 529 (13.4%) | 0.937 (0.929, 0.945) |
| Antiplatelet/anticoagulant | 79 443 (7.8%) | 1.43 (1.42, 1.44) |
| Interactions | ||
| Age (years)*blood-pressure-lowering medication | | 0.987 (0.986, 0.989) |
| Age (years)*diabetes | | 0.993 (0.991, 0.995) |
| Age (years)*atrial fibrillation | | 0.994 (0.991, 0.996) |
| Blood-pressure-lowering medication*diabetes | | 0.969 (0.960, 0.978) |
| Antiplatelet/anticoagulant medications*diabetes | | 0.855 (0.848, 0.863) |
| Blood-pressure-lowering medication*lipid-lowering medication | | 1.01 (1.01, 1.02) |
| Top 10 diagnoses and procedures | ||
| J44.0: Chronic obstructive pulmonary disease with acute lower respiratory infection | 1529 (0.1%) | 1.56 (1.50, 1.62) |
| N18.90: Unspecified chronic renal failure | 909 (0.1%) | 1.54 (1.49, 1.60) |
| R07.3: Other chest pain | 7665 (0.7%) | 1.51 (1.45, 1.57) |
| E11.71: Non-insulin-dependent diabetes mellitus with multiple complications, stated as uncontrolled | 663 (0.1%) | 1.51 (1.45, 1.56) |
| L97: Ulcer of lower limb, not elsewhere classified | 896 (0.1%) | 1.50 (1.46, 1.55) |
| E11.72: Type 2 diabetes mellitus with features of insulin resistance | 6209 (0.6%) | 1.50 (1.45, 1.55) |
| R07.4: Chest pain, unspecified | 15 470 (1.5%) | 1.47 (1.43, 1.52) |
| G62.9: Polyneuropathy, unspecified | 694 (0.1%) | 1.47 (1.41, 1.53) |
| Z92.2: Personal history of long-term (current) use of other medicaments | 2336 (0.2%) | 1.47 (1.42, 1.53) |
| J44.9: Chronic obstructive pulmonary disease, unspecified | 523 (0.1%) | 1.46 (1.42, 1.51) |
| Top 10 medications | ||
| Quinapril | 46 541 (4.5%) | 1.73 (1.68, 1.78) |
| Varenicline tartrate | 26 037 (2.5%) | 1.73 (1.69, 1.76) |
| Nicotine | 64 493 (6.3%) | 1.68 (1.65, 1.71) |
| Simvastatin | 140 134 (13.7%) | 1.66 (1.62, 1.70) |
| Glyceryl trinitrate | 14 227 (1.4%) | 1.65 (1.58, 1.72) |
| Cilazapril | 79 241 (7.7%) | 1.60 (1.55, 1.64) |
| Bupropion hydrochloride | 25 139 (2.5%) | 1.58 (1.54, 1.61) |
| Tiotropium bromide | 3399 (0.3%) | 1.52 (1.46, 1.58) |
| Salbutamol with ipratropium bromide | 14 745 (1.4%) | 1.46 (1.42, 1.49) |
| Felodipine | 38 670 (3.8%) | 1.39 (1.36, 1.43) |
The local hazard ratios for each predictor are adjusted for all other predictors. Values in parentheses are 95% confidence intervals unless otherwise stated.
Age was centred at the mean value of 49.027. Deprivation quintile was centred around quintile three. The baseline survival estimate at 5 years for the deep learning model, relevant to the mean value of age, deprivation quintile three and the reference group of categorical variables was 0.9812879278038.
Average and range (in parentheses) of estimated local hazard ratios for all values of the continuous predictor.
Figure 2. Calibration and discrimination of the deep learning models and Cox proportional hazards models for women and men. The calibration plots show the mean estimated 5-year risk plotted against the proportion of cardiovascular disease events that occurred over 5 years, for deciles of predicted risk. The diagonal line represents perfect calibration. The discrimination plots show the proportion of total observed events that occurred in each decile of predicted risk.
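The decile summaries described in this caption can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `decile_summary` is hypothetical, and it assumes each person has a simple 0/1 indicator for an event within 5 years, which ignores the censoring adjustments a full analysis may apply.

```python
def decile_summary(risks, events, n_groups=10):
    """Return (mean predicted risk, observed event proportion, share of all events)
    for each group of predicted risk, ordered from lowest to highest risk.

    Assumption: `events` holds 0/1 flags for an event within 5 years.
    """
    order = sorted(range(len(risks)), key=lambda i: risks[i])
    total_events = sum(events)
    n = len(order)
    rows = []
    for g in range(n_groups):
        idx = order[g * n // n_groups:(g + 1) * n // n_groups]
        group_events = sum(events[i] for i in idx)
        mean_predicted = sum(risks[i] for i in idx) / len(idx)  # x-axis of the calibration plot
        observed = group_events / len(idx)                      # y-axis of the calibration plot
        share = group_events / total_events                     # bar height in the discrimination plot
        rows.append((mean_predicted, observed, share))
    return rows


# Toy data (not from the study): 100 people, events only among the 10 highest risks.
toy_risks = [i / 100 for i in range(100)]
toy_events = [1 if i >= 90 else 0 for i in range(100)]
toy_rows = decile_summary(toy_risks, toy_events)
```

A well-calibrated model has the first two values of each row close to each other; a model with good discrimination concentrates the third value in the highest deciles.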
Performance metrics for the deep learning models and traditional Cox proportional hazards models
| Performance metric | Women: statistic (95% CI) | Women: P-value | Men: statistic (95% CI) | Men: P-value |
|---|---|---|---|---|
| R² | 0.425 (0.423, 0.428) | <0.0001 | 0.348 (0.346, 0.350) | <0.0001 |
| D statistic | 1.76 (1.75, 1.77) | <0.0001 | 1.49 (1.49, 1.50) | <0.0001 |
| Harrell’s C | 0.795 (0.794, 0.797) | <0.0001 | 0.759 (0.758, 0.759) | <0.0001 |
| Integrated Brier score | 0.00978 (0.00977, 0.00979) | <0.0001 | 0.0177 (0.0177, 0.0177) | <0.0001 |
Better results are in bold. 95% confidence intervals (CIs) are computed using 5 × 2 cross-validation.
Royston and Sauerbrei’s R² measures how much of the variation in time to event is explained by the model. Higher values indicate that more variation is accounted for by the model.
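Royston and Sauerbrei define this explained-variation measure in terms of their D statistic, which is reported in the same table: R² = (D²/κ²) / (π²/6 + D²/κ²), with κ = √(8/π). The sketch below applies that published relationship to the tabulated D values; the function name is hypothetical, and whether the authors computed R² exactly this way is an assumption.

```python
import math


def r2_from_d(d):
    """Royston and Sauerbrei's R^2 computed from their D statistic."""
    kappa_sq = 8.0 / math.pi           # kappa^2, the scaling constant used in D
    scaled = d * d / kappa_sq          # D^2 / kappa^2
    return scaled / (math.pi ** 2 / 6.0 + scaled)


r2_women = r2_from_d(1.76)  # D for women from the table; close to the tabulated 0.425
r2_men = r2_from_d(1.49)    # D for men from the table; close to the tabulated 0.348
```

Small differences from the tabulated R² are expected because D is reported to only three significant figures.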
Royston and Sauerbrei’s D statistic and Harrell’s C statistic are measures of discrimination. Better discrimination is indicated by higher values.
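As an illustration of what Harrell’s C measures, the following minimal sketch counts, among comparable pairs of people, how often the person with the earlier event was assigned the higher predicted risk. The function name and the toy data are hypothetical, and this quadratic-time version is meant for small examples only, not for datasets of the size used in the study.

```python
def harrell_c(times, events, risks):
    """Concordance index: fraction of comparable pairs ranked correctly.

    A pair (i, j) is comparable when person i had an observed event and
    person j was still under follow-up at that time (times[i] < times[j]).
    Tied predicted risks count as half concordant.
    """
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored person cannot be the earlier member of a pair
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    return concordant / comparable


# Toy data (not from the study): the third person is censored at year 3.
toy_times = [1, 2, 3, 4]
toy_events = [1, 1, 0, 1]
c_perfect = harrell_c(toy_times, toy_events, [0.9, 0.7, 0.4, 0.1])
c_reversed = harrell_c(toy_times, toy_events, [0.1, 0.4, 0.7, 0.9])
c_uninformative = harrell_c(toy_times, toy_events, [0.5, 0.5, 0.5, 0.5])
```

A value of 1 means every comparable pair is ranked correctly, and 0.5 corresponds to a model whose ranking carries no information.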
The integrated Brier score is affected by both calibration and discrimination. Better calibration and discrimination are indicated by lower values.
Computed using combined 5 × 2 F tests.
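The combined 5 × 2 cross-validation F test compares two models using the difference in a performance metric on each of the two folds of five replications. A minimal sketch of the test statistic follows; the function name and the example differences are illustrative and are not taken from the study. Under the null hypothesis of no difference, the statistic is approximately F-distributed with 10 and 5 degrees of freedom.

```python
def combined_5x2_f(diffs):
    """Combined 5x2 cross-validation F statistic.

    `diffs` holds five (p1, p2) pairs: the difference in the metric between
    the two models on fold 1 and fold 2 of each replication.
    """
    numerator = sum(p1 ** 2 + p2 ** 2 for p1, p2 in diffs)
    variance_sum = 0.0
    for p1, p2 in diffs:
        mean = (p1 + p2) / 2.0
        variance_sum += (p1 - mean) ** 2 + (p2 - mean) ** 2  # per-replication variance estimate
    return numerator / (2.0 * variance_sum)


# Illustrative differences only (hypothetical, not the study's folds).
toy_diffs = [(0.04, 0.05), (0.05, 0.04), (0.04, 0.05), (0.05, 0.04), (0.04, 0.05)]
f_stat = combined_5x2_f(toy_diffs)

# The 5% critical value of F(10, 5) is about 4.74.
significant_at_5_percent = f_stat > 4.74
```

Consistent differences across folds produce a large numerator relative to the fold-to-fold variance, and therefore a large statistic and a small P-value.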
Figure 3. Calibration plots for the deep learning models and Cox proportional hazards models in specific New Zealand sub-populations (women and men aged 30–44 years, Māori women and men and most deprived women and men), suggesting improved calibration for the deep learning models.
Adjusted hazard ratios for time to cardiovascular disease event within 5 years, determined by the Cox proportional hazards models
| Predictors | Women: adjusted hazard ratios (95% CI) | Men: adjusted hazard ratios (95% CI) |
|---|---|---|
| Age (per year) | 1.09 (1.09, 1.09) | 1.08 (1.08, 1.08) |
| Ethnicity | ||
| European | 1 | 1 |
| Māori | 1.84 (1.78, 1.91) | 1.55 (1.51, 1.61) |
| Pacific | 1.40 (1.33, 1.48) | 1.26 (1.20, 1.32) |
| Indian | 0.910 (0.837, 0.989) | 1.18 (1.11, 1.25) |
| Other | 0.688 (0.647, 0.732) | 0.751 (0.717, 0.787) |
| Deprivation quintile (per quintile) | 1.15 (1.14, 1.16) | 1.11 (1.10, 1.12) |
| Diabetes | 2.43 (2.26, 2.62) | 2.20 (2.07, 2.33) |
| Atrial fibrillation | 2.54 (2.14, 3.01) | 1.99 (1.80, 2.20) |
| Medications dispensed at baseline | ||
| Blood-pressure-lowering | 2.24 (2.13, 2.35) | 1.86 (1.79, 1.94) |
| Lipid-lowering | 1.02 (0.956, 1.08) | 0.942 (0.903, 0.982) |
| Antiplatelet/anticoagulant | 1.48 (1.42, 1.55) | 1.32 (1.27, 1.37) |
| Interactions | ||
| Age (years)*blood-pressure-lowering medication | 0.975 (0.972, 0.978) | 0.976 (0.974, 0.979) |
| Age (years)*diabetes | 0.983 (0.980, 0.987) | 0.982 (0.979, 0.985) |
| Age (years)*atrial fibrillation | 0.984 (0.975, 0.994) | 0.985 (0.979, 0.991) |
| Blood-pressure-lowering medication*diabetes | 0.878 (0.807, 0.956) | 0.858 (0.803, 0.917) |
| Antiplatelet/anticoagulant medications*diabetes | 0.804 (0.744, 0.868) | 0.855 (0.803, 0.910) |
| Blood-pressure-lowering medication*lipid-lowering medication | 0.858 (0.797, 0.923) | 0.941 (0.892, 0.994) |
The hazard ratios for each predictor are adjusted for all other predictors.
Age was centred in women and men separately using their mean values. For age, the mean value in women was 49.021 and the mean value in men was 49.027. Deprivation quintile was centred around quintile three in women and men. The baseline survival estimate at 5 years relevant to the mean value of age, deprivation quintile three and the reference group of categorical variables was 0.9905071151673 among women and 0.9782399916755 among men.
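The hazard ratios and baseline survival above are enough to sketch how a Cox model of this kind yields an absolute 5-year risk: risk = 1 − S₀(5)^exp(lp), where lp is the sum of log hazard ratios multiplied by the centred predictor values. The example below does this for women. It is a sketch under stated assumptions: the helper `five_year_risk` and the dictionary keys are hypothetical, the interaction terms are assumed to use centred age multiplied by 0/1 indicators, and the published hazard ratios are rounded, so results are approximate.

```python
import math

# Hazard ratios for women, copied from the Cox table above (rounded as published).
HR_WOMEN = {
    "age": 1.09, "deprivation": 1.15,
    "maori": 1.84, "pacific": 1.40, "indian": 0.910, "other": 0.688,
    "diabetes": 2.43, "af": 2.54,
    "bp": 2.24, "lipid": 1.02, "antithrombotic": 1.48,
    "age_x_bp": 0.975, "age_x_diabetes": 0.983, "age_x_af": 0.984,
    "bp_x_diabetes": 0.878, "antithrombotic_x_diabetes": 0.804, "bp_x_lipid": 0.858,
}
BASELINE_SURVIVAL_5Y = 0.9905071151673  # women, from the footnote
MEAN_AGE = 49.021                       # women, from the footnote


def five_year_risk(age, quintile, ethnicity="european", diabetes=False,
                   af=False, bp=False, lipid=False, antithrombotic=False):
    """Hypothetical helper: approximate 5-year CVD risk for a woman."""
    a = age - MEAN_AGE   # age centred at the mean
    q = quintile - 3     # deprivation centred at quintile three
    d, f, b, l, t = (int(v) for v in (diabetes, af, bp, lipid, antithrombotic))
    x = {
        "age": a, "deprivation": q,
        "maori": int(ethnicity == "maori"), "pacific": int(ethnicity == "pacific"),
        "indian": int(ethnicity == "indian"), "other": int(ethnicity == "other"),
        "diabetes": d, "af": f, "bp": b, "lipid": l, "antithrombotic": t,
        # Assumed coding of the interaction terms: product of the two predictors.
        "age_x_bp": a * b, "age_x_diabetes": a * d, "age_x_af": a * f,
        "bp_x_diabetes": b * d, "antithrombotic_x_diabetes": t * d, "bp_x_lipid": b * l,
    }
    lp = sum(math.log(HR_WOMEN[name]) * value for name, value in x.items())
    return 1.0 - BASELINE_SURVIVAL_5Y ** math.exp(lp)


reference_risk = five_year_risk(MEAN_AGE, 3)         # reference woman: 1 - baseline survival
older_risk = five_year_risk(60, 3)                   # same profile at age 60
older_diabetes_risk = five_year_risk(60, 3, diabetes=True)
```

For the reference profile (mean age, quintile three, European, no conditions or medications) the linear predictor is zero, so the risk reduces to one minus the baseline survival, just under 1%.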