Stephen F Weng, Jenna Reps, Joe Kai, Jonathan M Garibaldi, Nadeem Qureshi.
Abstract
BACKGROUND: Current approaches to predict cardiovascular risk fail to identify many people who would benefit from preventive treatment, while others receive unnecessary intervention. Machine learning offers an opportunity to improve accuracy by exploiting complex interactions between risk factors. We assessed whether machine learning can improve cardiovascular risk prediction.
Year: 2017 PMID: 28376093 PMCID: PMC5380334 DOI: 10.1371/journal.pone.0174944
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Variables included in the machine-learning algorithms.
| Variable | Description | Reference |
|---|---|---|
| Gender | male/female | [ |
| Age | Years | [ |
| Total cholesterol | mmol/L | [ |
| HDL cholesterol | mmol/L | [ |
| Systolic blood pressure | mmHg | [ |
| Blood pressure treatment (anti-hypertensives prescribed) | yes/no | [ |
| Smoking | yes/no | [ |
| Diabetes | yes/no | [ |
| Body mass index (BMI) | kg/m^2 | [ |
| LDL cholesterol | mmol/L | [ |
| Triglycerides | mmol/L | [ |
| C-reactive protein (CRP) | mg/L | [ |
| Serum fibrinogen | g/L | [ |
| Gamma glutamyl transferase (gamma GT) | IU/L | [ |
| Serum creatinine | µmol/L | [ |
| Glycated haemoglobin (HbA1c) | % | [ |
| Forced Expiratory Volume (FEV1) | % | [ |
| AST/ALT ratio | — | [ |
| Family history of CHD < 60 years | yes/no | [ |
| Ethnicity | White Caucasian; South Asian; Black/Afro-Caribbean; Chinese/East Asian; Other/Mixed; Unknown | [ |
| Townsend deprivation index | 1st quintile (most affluent) to 5th quintile (most deprived); unknown | [ |
| Hypertension | yes/no | [ |
| Rheumatoid arthritis | yes/no | [ |
| Chronic kidney disease | yes/no | [ |
| Atrial fibrillation | yes/no | [ |
| Chronic obstructive pulmonary disease (COPD) | yes/no | [ |
| Severe mental illness | yes/no | [ |
| Prescribed anti-psychotic drug | yes/no | [ |
| Prescribed oral corticosteroids | yes/no | [ |
| Prescribed immunosuppressant | yes/no | [ |
* Measures area-level deprivation in the population based on unemployment, non-car ownership, non-home ownership, and household overcrowding
+ Inclusion in published cardiovascular risk algorithms or literature on other potential cardiovascular risk factors
Characteristics of patients aged 30 to 84 in the CPRD study cohort who were free from CVD at baseline.
Patients are stratified by first CVD event during the 10-year follow-up period.
| Risk Factor Variables | Units | CVD (n = 24,970) | No CVD (n = 353,286) | P-Value |
|---|---|---|---|---|
| Age | years (SD) | 65.3 (11.1) | 57.6 (12.8) | < 0.001 |
| BMI | kg/m^2 (SD) | 27.9 (4.94) | 27.9 (5.21) | 0.323 |
| Systolic blood pressure | mmHg (SD) | 141 (17.6) | 137 (17.2) | < 0.001 |
| Total cholesterol | mmol/L (SD) | 5.60 (1.11) | 5.56 (1.06) | < 0.001 |
| HDL cholesterol | mmol/L (SD) | 1.39 (0.41) | 1.46 (0.43) | < 0.001 |
| LDL cholesterol | mmol/L (SD) | 3.45 (0.91) | 3.40 (0.88) | < 0.001 |
| Triglycerides | mmol/L (SD) | 1.69 (0.85) | 1.57 (0.83) | < 0.001 |
| CRP | mg/L (SD) | 10.0 (13.7) | 8.37 (11.5) | < 0.001 |
| Serum fibrinogen | g/L (SD) | 3.86 (1.22) | 3.73 (1.33) | 0.129 |
| gamma GT | IU/L (SD) | 41.3 (33.7) | 39.3 (33.6) | < 0.001 |
| Serum creatinine | µmol/L (SD) | 91.9 (17.3) | 87.6 (16.0) | < 0.001 |
| HbA1c | % (SD) | 7.26 (1.61) | 7.14 (1.64) | < 0.001 |
| FEV1 | % (SD) | 66.2 (16.3) | 67.8 (16.9) | 0.007 |
| AST/ALT ratio | — (SD) | 1.04 (0.36) | 1.01 (0.35) | < 0.001 |
| Female | % | 41.8 | 52.8 | < 0.001 |
| Smoking | % | 23.4 | 20.5 | < 0.001 |
| Family history CHD < 60 years | % | 5.00 | 5.51 | < 0.001 |
| Ethnicity^a: South Asian | % | 2.27 | 1.90 | 0.004 |
| Ethnicity^a: Black/Afro-Caribbean | % | 0.66 | 1.20 | < 0.001 |
| Ethnicity^a: Chinese/East Asian | % | 0.54 | 0.58 | 0.465 |
| Ethnicity^a: Other/Mixed | % | 0.85 | 1.32 | < 0.001 |
| Ethnicity^a: Unknown | % | 43.5 | 57.1 | < 0.001 |
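| SES^b: 2nd Townsend quintile | % | 15.8 | 16.0 | < 0.001 |
| SES^b: 3rd Townsend quintile | % | 13.7 | 13.6 | < 0.001 |
| SES^b: 4th Townsend quintile | % | 12.6 | 11.8 | < 0.001 |
| SES^b: 5th Townsend quintile (most deprived) | % | 7.95 | 6.91 | < 0.001 |
| SES^b: Unknown | % | 34.6 | 34.5 | < 0.001 |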
| Hypertension | % | 31.8 | 25.2 | < 0.001 |
| Diabetes | % | 15.0 | 10.1 | < 0.001 |
| Blood pressure treatment | % | 28.3 | 21.9 | < 0.001 |
| Rheumatoid arthritis | % | 1.55 | 0.91 | < 0.001 |
| Chronic kidney disease | % | 0.99 | 0.48 | < 0.001 |
| Atrial fibrillation | % | 4.64 | 2.20 | < 0.001 |
| COPD | % | 3.97 | 2.02 | < 0.001 |
| Severe mental illness | % | 0.34 | 0.32 | 0.563 |
| Anti-psychotic drug prescribed | % | 15.2 | 12.7 | < 0.001 |
| Oral corticosteroid prescribed | % | 13.2 | 9.55 | < 0.001 |
| Immunosuppressant prescribed | % | 13.3 | 9.70 | < 0.001 |
| BMI missing | % | 3.48 | 5.87 | < 0.001 |
| LDL cholesterol missing | % | 25.1 | 24.6 | 0.041 |
| Triglycerides missing | % | 11.7 | 12.3 | 0.004 |
| CRP missing | % | 88.5 | 89.9 | < 0.001 |
| Serum fibrinogen missing | % | 99.0 | 99.0 | 0.207 |
| gamma GT missing | % | 64.8 | 69.1 | < 0.001 |
| Serum creatinine missing | % | 16.1 | 21.5 | < 0.001 |
| HbA1c missing | % | 79.6 | 85.9 | < 0.001 |
| FEV1 missing | % | 96.3 | 97.7 | < 0.001 |
| AST/ALT ratio missing | % | 85.2 | 88.2 | < 0.001 |
* Core risk factor for ACC/AHA 10-year CVD risk equations
+ Missing values present
^a Reference category is White Caucasian
^b Reference category is 1st Townsend quintile (most affluent)
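The "missing" rows in the table above report, for each incompletely recorded variable, the share of patients with no recorded value. As an illustration only, the Python sketch below shows one way such missingness flags and percentages could be computed. The function names and the toy records are hypothetical and are not taken from the study.

```python
import math


def add_missing_indicators(records, fields):
    """Return copies of the records with a 0/1 '<field> missing' flag per field."""
    flagged = []
    for record in records:
        row = dict(record)
        for field in fields:
            value = record.get(field)
            # Treat both None and NaN as "not recorded".
            is_missing = value is None or (
                isinstance(value, float) and math.isnan(value)
            )
            row[field + " missing"] = 1 if is_missing else 0
        flagged.append(row)
    return flagged


def percent_missing(records, field):
    """Percentage of records in which the given field is not recorded."""
    flagged = add_missing_indicators(records, [field])
    return 100.0 * sum(row[field + " missing"] for row in flagged) / len(flagged)


# Toy records with invented values (not study data).
patients = [
    {"BMI": 27.9, "CRP": None},
    {"BMI": None, "CRP": None},
    {"BMI": 31.2, "CRP": 8.4},
    {"BMI": 24.0, "CRP": float("nan")},
]

print(percent_missing(patients, "BMI"))  # 25.0
print(percent_missing(patients, "CRP"))  # 75.0
```

Keeping the flag as a separate column lets a model treat "not measured" as information in its own right, which may matter when a test is ordered only for certain patients.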
Top 10 risk factor variables for CVD algorithms listed in descending order of coefficient effect size (ACC/AHA; logistic regression), weighting (neural networks), or selection frequency (random forest, gradient boosting machines).
Algorithms were derived from a training cohort of 295,267 patients.
| ACC/AHA: Men | ACC/AHA: Women | ML: Logistic Regression | ML: Random Forest | ML: Gradient Boosting Machines | ML: Neural Networks |
|---|---|---|---|---|---|
| Age | Age | Ethnicity | Age | Age | Atrial Fibrillation |
| Total Cholesterol | HDL Cholesterol | Age | Gender | Gender | Ethnicity |
| *HDL Cholesterol* | Total Cholesterol | SES: Townsend Deprivation Index | Ethnicity | Ethnicity | Oral Corticosteroid Prescribed |
| Smoking | Smoking | Gender | Smoking | Smoking | Age |
| Age x Total Cholesterol | Age x *HDL Cholesterol* | Smoking | | | Severe Mental Illness |
| Treated Systolic Blood Pressure | Age x Total Cholesterol | Atrial Fibrillation | HbA1c | Triglycerides | SES: Townsend Deprivation Index |
| Age x Smoking | Treated Systolic Blood Pressure | Chronic Kidney Disease | Triglycerides | Total Cholesterol | Chronic Kidney Disease |
| Age x *HDL Cholesterol* | Untreated Systolic Blood Pressure | Rheumatoid Arthritis | SES: Townsend Deprivation Index | HbA1c | |
| Untreated Systolic Blood Pressure | Age x Smoking | Family history of premature CHD | BMI | Systolic Blood Pressure | Smoking |
| Diabetes | Diabetes | COPD | Total Cholesterol | SES: Townsend Deprivation Index | Gender |
Italics: Protective Factors
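The caption describes ranking variables by coefficient effect size for the regression-type algorithms. A minimal Python sketch of that idea is given below, assuming that "effect size" means the absolute value of a fitted coefficient and that a negative coefficient marks a protective factor. The function name, the placeholder variable names, and the coefficient values are all invented for illustration and do not come from the study.

```python
def top_k_by_effect_size(coefficients, k=10):
    """Rank variables by absolute coefficient, largest first.

    Each variable is labelled 'protective' when its coefficient is negative
    and 'risk' otherwise (an assumption made for this sketch).
    """
    ranked = sorted(coefficients.items(), key=lambda item: abs(item[1]), reverse=True)
    return [
        (name, "protective" if coefficient < 0 else "risk")
        for name, coefficient in ranked[:k]
    ]


# Invented coefficients for placeholder variables (not study results).
toy_coefficients = {
    "factor_a": 1.2,
    "factor_b": -0.8,
    "factor_c": 0.6,
    "factor_d": 0.3,
}

print(top_k_by_effect_size(toy_coefficients, k=3))
# [('factor_a', 'risk'), ('factor_b', 'protective'), ('factor_c', 'risk')]
```

A ranking like this is only comparable across variables when the inputs are on a common scale, so in practice the predictors would usually be standardised first.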
Performance of the machine-learning (ML) algorithms predicting 10-year cardiovascular disease (CVD) risk, obtained by applying the trained algorithms to the validation cohort of 82,989 patients.
A higher c-statistic indicates better algorithm discrimination. The baseline (BL) ACC/AHA 10-year risk prediction algorithm is provided for comparative purposes.
| Algorithm | AUC c-statistic | Standard Error* | 95% CI: LCL | 95% CI: UCL | Absolute Change from Baseline |
|---|---|---|---|---|---|
| BL: ACC/AHA | 0.728 | 0.002 | 0.723 | 0.735 | — |
| ML: Random Forest | 0.745 | 0.003 | 0.739 | 0.750 | +1.7% |
| ML: Logistic Regression | 0.760 | 0.003 | 0.755 | 0.766 | +3.2% |
| ML: Gradient Boosting Machines | 0.761 | 0.002 | 0.755 | 0.766 | +3.3% |
| ML: Neural Networks | 0.764 | 0.002 | 0.759 | 0.769 | +3.6% |
*Standard error estimated by jack-knife procedure [30]
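The quantities in the table can be illustrated with a short Python sketch. It computes the c-statistic as the share of case/non-case pairs ranked correctly, estimates a standard error with a leave-one-out jackknife, and reproduces the "Absolute Change from Baseline" column as the AUC difference in percentage points. The leave-one-out form is an assumption here, since the exact procedure of reference [30] is not described in this text. The function names and toy scores are hypothetical.

```python
import math


def c_statistic(scores, labels):
    """AUC: fraction of (case, non-case) pairs ranked correctly; ties count 0.5."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    non_cases = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for case_score in cases:
        for other_score in non_cases:
            if case_score > other_score:
                wins += 1.0
            elif case_score == other_score:
                wins += 0.5
    return wins / (len(cases) * len(non_cases))


def jackknife_se(scores, labels):
    """Leave-one-out jackknife standard error of the c-statistic.

    Assumes at least two cases and two non-cases, so that every
    leave-one-out sample still contains both classes.
    """
    n = len(scores)
    estimates = [
        c_statistic(scores[:i] + scores[i + 1:], labels[:i] + labels[i + 1:])
        for i in range(n)
    ]
    mean = sum(estimates) / n
    variance = (n - 1) / n * sum((e - mean) ** 2 for e in estimates)
    return math.sqrt(variance)


def absolute_change_points(auc, baseline_auc):
    """AUC difference from baseline, in percentage points, to one decimal."""
    return round((auc - baseline_auc) * 100, 1)


# Toy predicted risks and outcomes (invented, not study data).
toy_scores = [0.9, 0.8, 0.35, 0.6, 0.4, 0.3, 0.2, 0.1]
toy_labels = [1, 1, 1, 0, 0, 0, 0, 0]

print(c_statistic(toy_scores, toy_labels))   # 13 of 15 pairs ranked correctly
print(jackknife_se(toy_scores, toy_labels))

# AUC values taken from the table above.
print(absolute_change_points(0.764, 0.728))  # 3.6
```

The pairwise loop is quadratic in cohort size, so for a cohort as large as the one in the table a rank-based AUC formula would be a more practical choice.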