Ajay Kesar1, Adel Baluch1, Omer Barber1, Henry Hoffmann1, Milan Jovanovic1, Daniel Renz1, Bernard Leon Stopak1, Paul Wicks1, Stephen Gilbert1,2.
Abstract
Cardiovascular diseases (CVDs) are the leading cause of death globally. Timely and accurate identification of people at risk of developing an atherosclerotic CVD and its sequelae is a central pillar of preventive cardiology. One widely used approach is risk prediction models; however, currently available models consider only a limited set of risk factors and outcomes, yield no actionable advice to individuals based on their holistic medical state and lifestyle, are often not interpretable, and were built with small cohorts or are based on lifestyle data from the 1960s (e.g. the Framingham model). The risk of developing atherosclerotic CVDs is heavily lifestyle dependent, potentially making many occurrences preventable. Providing actionable and accurate risk prediction tools to the public could assist in atherosclerotic CVD prevention. Accordingly, we developed a benchmarking pipeline to find the best combination of data preprocessing steps and algorithms to predict absolute 10-year atherosclerotic CVD risk. Based on the data of 464,547 UK Biobank participants without atherosclerotic CVD at baseline, we used a comprehensive set of 203 consolidated risk factors associated with atherosclerosis and its sequelae (e.g. heart failure). Our two best-performing absolute atherosclerotic CVD risk prediction models achieved higher performance (AUROC: 0.7573, 95% CI: 0.755-0.7595; and AUROC: 0.7544, 95% CI: 0.7522-0.7567) than Framingham (AUROC: 0.680, 95% CI: 0.6775-0.6824) and QRisk3 (AUROC: 0.725, 95% CI: 0.7226-0.7273). Using a subset of 25 risk factors identified with feature selection, our reduced model achieves similar performance (AUROC: 0.7415, 95% CI: 0.7392-0.7438) while being less complex. Further, it is interpretable, actionable and highly generalizable. The model could be incorporated into clinical practice and might allow continuous personalized predictions with automated intervention suggestions.
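The abstract reports each model as an AUROC with a 95% confidence interval but does not say how the intervals were obtained. As a minimal sketch, assuming a percentile bootstrap (an assumption, not the authors' stated method), the quantity could be computed as follows. The function names `auroc` and `bootstrap_ci` and the toy data are hypothetical and chosen only for illustration.

```python
import random


def auroc(labels, scores):
    """AUROC via the Mann-Whitney U statistic, with average ranks for ties."""
    n = len(scores)
    n_pos = sum(labels)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one positive and one negative")
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j to cover the whole group of tied scores.
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUROC (assumed method, see lead-in)."""
    n = len(labels)
    if not 0 < sum(labels) < n:
        raise ValueError("need both classes to bootstrap the AUROC")
    rng = random.Random(seed)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain a single class
            stats.append(auroc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Synthetic toy data (not from the study): 1 = event, 0 = no event.
toy_labels = [0, 0, 1, 1, 0, 1, 0, 1, 0, 0]
toy_scores = [0.10, 0.40, 0.35, 0.80, 0.20, 0.70, 0.30, 0.55, 0.60, 0.15]
point = auroc(toy_labels, toy_scores)
low, high = bootstrap_ci(toy_labels, toy_scores)
print(f"AUROC = {point:.3f}, 95% CI ({low:.3f}, {high:.3f})")
```

With a cohort of several hundred thousand participants, as here, bootstrap intervals become narrow, which is consistent with the tight intervals quoted in the abstract.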
Year: 2022 PMID: 35148360 PMCID: PMC8836294 DOI: 10.1371/journal.pone.0263940
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Overview of the experimental setup of the proposed approach.
CVD outcome statistics according to the definition in the current study and the comparator study definition by Alaa et al. [29].
| Statistic measured | Number |
|---|---|
| No. of atherosclerotic CVD outcomes that developed in 10-year follow-up according to definition in current study | 28,561 |
| No. of CVD outcomes that developed in 10-year follow-up according to comparator study definition | 28,242 |
| No. of participants whose outcome status after 10-year follow-up agrees under the current study and comparator study definitions | 456,184 out of 464,547 (98%) |
| No. of CVD outcomes identified in the current study but not in the comparator study | 4,341 |
| No. of CVD outcomes included in the comparator study, but not in the current study | 4,022 |
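The rows of this table are linked by simple arithmetic, which the following sketch checks. It reads the 456,184 figure as the number of participants whose outcome status is the same under both definitions; that reading is an interpretation of the table, not something the source states explicitly. All variable names are hypothetical.

```python
# Values copied from the table above.
n_participants = 464_547
events_current = 28_561      # events under the current study's definition
events_comparator = 28_242   # events under the comparator definition
only_current = 4_341         # events under the current definition only
only_comparator = 4_022      # events under the comparator definition only
agreeing = 456_184           # reported overlap

# Events recognised by both definitions, derived two independent ways.
shared_from_current = events_current - only_current
shared_from_comparator = events_comparator - only_comparator

# Participants whose label differs between the two definitions.
disagreeing = only_current + only_comparator

print("shared events:", shared_from_current, shared_from_comparator)
print("agreeing participants:", n_participants - disagreeing)
print(f"agreement: {100 * agreeing / n_participants:.1f}%")
print(f"10-year event rate (current definition): "
      f"{100 * events_current / n_participants:.2f}%")
```

Both routes to the shared-event count give the same number, and subtracting the disagreeing participants from the cohort size reproduces the reported overlap, so the table is internally consistent under this reading.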
Performance of all tested classifiers including baseline models.
| No. | Algorithm Name | AUROC and 95% confidence intervals |
|---|---|---|
| 1 | Extreme Gradient Boosting (XGB) | 0.7573 (0.755–0.7595) |
| 2 | Logistic regression with L1 regularization | 0.7544 (0.7522–0.7567) |
| 3 | QRisk3 | 0.725 (0.7226–0.7273) |
| 4 | Framingham (lipid-based and BMI-based variants) | 0.680 (0.6775–0.6824) and 0.681 (0.6788–0.6837), respectively |
| 5 | Random Forest | 0.6690 (0.6666–0.6715) |
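One quick way to read this table is to check which pairs of models have overlapping 95% confidence intervals. The sketch below does only that; non-overlap is a conservative heuristic for a difference between models, not a formal significance test, and the source does not report one. The helper name `intervals_overlap` is hypothetical, and the two Framingham values are assigned to the lipid and BMI variants in the order the table lists them.

```python
from itertools import combinations

# (AUROC, CI lower bound, CI upper bound), copied from the table above.
results = {
    "XGBoost": (0.7573, 0.7550, 0.7595),
    "Logistic regression (L1)": (0.7544, 0.7522, 0.7567),
    "QRisk3": (0.7250, 0.7226, 0.7273),
    "Framingham (lipid)": (0.6800, 0.6775, 0.6824),
    "Framingham (BMI)": (0.6810, 0.6788, 0.6837),
    "Random Forest": (0.6690, 0.6666, 0.6715),
}


def intervals_overlap(a, b):
    """True if the confidence intervals of two result tuples intersect."""
    return a[1] <= b[2] and b[1] <= a[2]


for (name_a, res_a), (name_b, res_b) in combinations(results.items(), 2):
    verdict = "overlap" if intervals_overlap(res_a, res_b) else "disjoint"
    print(f"{name_a} vs {name_b}: {verdict}")
```

On these numbers, the intervals of the two best models overlap each other, as do those of the two Framingham variants, while both best models are clearly separated from QRisk3.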
Fig 2. AUROC of logistic regression with L1 regularization and XGBoost.
Fig 3. AUROC curves of baseline models on imputed data.
Fig 4. AUROC curves of baseline models on unimputed data.
Fig 5. Performance of best logistic regression model depending on number of features.
AUROC of the best-performing logistic regression model with L1 regularization (solid blue line) plotted against the number of features retained at each iterative feature elimination step (orange line). The dotted blue horizontal line marks the elimination step at which 25 features remain, from which the performance of the 25-feature model can be read off.
Performance of best logistic regression model depending on number of features.
| Number of Features | AUROC (%) |
|---|---|
| 203 | 75.44 |
| 40 | 75.01 |
| 25 | 74.15 |
| 20 | 73.32 |
| 17 | 72.76 |
| 10 | 70.88 |
| 2 | 68.98 |
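The table shows the trade-off between model size and discrimination. As an illustrative sketch, the smallest feature set whose AUROC stays within a chosen tolerance of the full 203-feature model can be picked as below. The tolerance values are hypothetical; the source does not state the criterion by which 25 features were chosen.

```python
# Number of features -> AUROC in percent, copied from the table above.
auroc_by_n_features = {
    203: 75.44,
    40: 75.01,
    25: 74.15,
    20: 73.32,
    17: 72.76,
    10: 70.88,
    2: 68.98,
}

full_model_auroc = auroc_by_n_features[203]


def smallest_within(tolerance):
    """Smallest feature count losing at most `tolerance` AUROC points."""
    candidates = [
        n for n, value in auroc_by_n_features.items()
        if full_model_auroc - value <= tolerance
    ]
    return min(candidates)


# Print the AUROC lost relative to the full model at each feature count.
for n in sorted(auroc_by_n_features, reverse=True):
    drop = full_model_auroc - auroc_by_n_features[n]
    print(f"{n:>3} features: {auroc_by_n_features[n]:.2f} (drop {drop:.2f})")

print("within 1.5 points:", smallest_within(1.5))
print("within 0.5 points:", smallest_within(0.5))
```

The per-row drop makes the shape of the curve explicit: the loss stays small down to 40 features and grows faster below 25.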
Model performance when trained on Whites and tested on non-Whites.
| Model | AUROC in the generalizability experiment | AUROC in the main experiment |
|---|---|---|
| Logistic Regression with L1 regularization | 75.86% | 75.44% |
| XGBoost | 76.26% | 75.73% |
Fig 6. AUROC of logistic regression with L1 regularization and XGBoost when trained on Whites and tested on non-Whites.
Relative regression feature weights of the 25 most informative risk factors from the best logistic regression model.
| Feature number | Risk factor name | Relative informative value (descending) |
|---|---|---|
| 1 | Age | 0.0938 |
| 2 | Biological sex | 0.0485 |
| 3 | Systolic blood pressure | 0.0284 |
| 4 | Social visits: About once a week | 0.0277 |
| 5 | Social visits: 2–4 times a week | 0.0273 |
| 6 | Walking pace: Brisk pace | 0.0268 |
| 7 | Total cholesterol HDL ratio | 0.0267 |
| 8 | Total cholesterol | 0.0239 |
| 9 | LDL cholesterol | 0.0235 |
| 10 | Familial CVD | 0.0218 |
| 11 | Social visits: About once a month | 0.0203 |
| 12 | Sleep problems: Not at all | 0.0188 |
| 13 | Alcohol with meals: Yes | 0.0184 |
| 14 | Smoking | 0.0184 |
| 15 | Social visits: Almost daily | 0.0178 |
| 16 | No. of cigarettes daily | 0.0163 |
| 17 | Hypertension | 0.0160 |
| 18 | Walking pace: Steady average pace | 0.0154 |
| 19 | Waist circumference | 0.0150 |
| 20 | Alcohol with meals: It varies | 0.0141 |
| 21 | Social visits: Once every few months | 0.0139 |
| 22 | Overall health rating: Excellent | 0.0134 |
| 23 | Other heart arrhythmias | 0.0129 |
| 24 | Overall health rating: Poor | 0.0123 |
| 25 | Sleep problems: Several days | 0.0122 |
Categorization of the 25 most informative risk factors from the best logistic regression model.
| Category | Risk Factors |
|---|---|
| Demographics | Age, Biological sex |
| Biomarkers | Waist circumference, systolic blood pressure, total cholesterol, LDL cholesterol, total cholesterol HDL ratio |
| Comorbidities | Hypertension, sleep problems: not at all, sleep problems: several days, other heart arrhythmias |
| Family History | Familial CVD |
| Lifestyle Factors | Social visits: about once/week, social visits: 2–4 times/week, social visits: about once/month, social visits: almost daily, social visits: once every few months, smoking, no. of cigarettes daily, alcohol with meals: yes, alcohol with meals: it varies, walking pace: steady average pace, walking pace: brisk pace, overall health rating: excellent, overall health rating: poor |
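Combining the two tables above, the relative informative values can be summed per category to see how much weight each group of risk factors carries among the top 25. This aggregation is an illustration added here, not an analysis reported in the source; the dictionary names and shortened feature labels are hypothetical.

```python
# Relative informative values, copied from the feature-weight table.
weights = {
    "age": 0.0938, "biological sex": 0.0485,
    "systolic blood pressure": 0.0284, "total cholesterol HDL ratio": 0.0267,
    "total cholesterol": 0.0239, "LDL cholesterol": 0.0235,
    "waist circumference": 0.0150,
    "hypertension": 0.0160, "sleep problems: not at all": 0.0188,
    "sleep problems: several days": 0.0122, "other heart arrhythmias": 0.0129,
    "familial CVD": 0.0218,
    "social visits: once a week": 0.0277, "social visits: 2-4 a week": 0.0273,
    "social visits: once a month": 0.0203, "social visits: almost daily": 0.0178,
    "social visits: every few months": 0.0139,
    "smoking": 0.0184, "cigarettes daily": 0.0163,
    "alcohol with meals: yes": 0.0184, "alcohol with meals: it varies": 0.0141,
    "walking pace: brisk": 0.0268, "walking pace: steady average": 0.0154,
    "health rating: excellent": 0.0134, "health rating: poor": 0.0123,
}

# Category membership, following the categorization table.
categories = {
    "Demographics": ["age", "biological sex"],
    "Biomarkers": ["waist circumference", "systolic blood pressure",
                   "total cholesterol", "LDL cholesterol",
                   "total cholesterol HDL ratio"],
    "Comorbidities": ["hypertension", "sleep problems: not at all",
                      "sleep problems: several days",
                      "other heart arrhythmias"],
    "Family History": ["familial CVD"],
    "Lifestyle Factors": [
        "social visits: once a week", "social visits: 2-4 a week",
        "social visits: once a month", "social visits: almost daily",
        "social visits: every few months", "smoking", "cigarettes daily",
        "alcohol with meals: yes", "alcohol with meals: it varies",
        "walking pace: steady average", "walking pace: brisk",
        "health rating: excellent", "health rating: poor",
    ],
}

# Sum the relative informative values of the features in each category.
totals = {
    category: sum(weights[name] for name in names)
    for category, names in categories.items()
}

for category, total in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(f"{category}: {total:.4f}")
```

Lifestyle factors carry the largest combined value among the top 25, which fits the abstract's point that atherosclerotic CVD risk is heavily lifestyle dependent. The category also contains 13 of the 25 features, so the sum partly reflects how finely the categorical answers were split.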