
Improved cardiovascular risk prediction using targeted plasma proteomics in primary prevention.

Renate M Hoogeveen¹, João P Belo Pereira¹, Nick S Nurmohamed^1,2, Veronica Zampoleri³, Michiel J Bom², Andrea Baragetti³, S Matthijs Boekholdt⁴, Paul Knaapen², Kay-Tee Khaw⁵, Nicholas J Wareham⁶, Albert K Groen¹, Alberico L Catapano^3,7, Wolfgang Koenig^8,9,10, Evgeni Levin^1,11, Erik S G Stroes¹.

Abstract

AIMS: In the era of personalized medicine, it is of utmost importance to be able to identify subjects at the highest cardiovascular (CV) risk. To date, single biomarkers have failed to markedly improve the estimation of CV risk. Using novel technology, simultaneous assessment of large numbers of biomarkers may hold promise to improve prediction. In the present study, we compared a protein-based risk model with a model using traditional risk factors in predicting CV events in the primary prevention setting of the European Prospective Investigation (EPIC)-Norfolk study, followed by validation in the Progressione della Lesione Intimale Carotidea (PLIC) cohort.
METHODS AND RESULTS: Using the proximity extension assay, 368 proteins were measured in a nested case-control sample of 822 individuals from the EPIC-Norfolk prospective cohort study and 702 individuals from the PLIC cohort. Using tree-based ensemble and boosting methods, we constructed a protein-based prediction model, an optimized clinical risk model, and a model combining both. In the derivation cohort (EPIC-Norfolk), we defined a panel of 50 proteins, which outperformed the clinical risk model in the prediction of myocardial infarction [area under the curve (AUC) 0.754 vs. 0.730; P < 0.001] during a median follow-up of 20 years. The clinically more relevant prediction of events occurring within 3 years showed an AUC of 0.732 using the clinical risk model and an AUC of 0.803 for the protein model (P < 0.001). The predictive value of the protein panel was confirmed to be superior to the clinical risk model in the validation cohort (AUC 0.705 vs. 0.609; P < 0.001).
CONCLUSION: In a primary prevention setting, a proteome-based model outperforms a model comprising clinical risk factors in predicting the risk of CV events. Validation in a large prospective primary prevention cohort is required to address the value for future clinical implementation in CV prevention.


Keywords: Cardiovascular event risk; Clinical risk score; Machine learning; Prediction; Proteomics; Targeted proteomics


Year: 2020 PMID: 32808014 PMCID: PMC7672529 DOI: 10.1093/eurheartj/ehaa648

Source DB: PubMed Journal: Eur Heart J ISSN: 0195-668X Impact factor: 29.983

See page 4008 for the editorial comment on this article (doi:

Introduction

Identification of asymptomatic people at the greatest cardiovascular (CV) risk remains a major challenge in primary prevention. Clinically used risk algorithms, including the Framingham risk score, pooled cohort equations, and Systemic Coronary Risk Evaluation (SCORE) system, are based on traditional risk factors for CV disease and predict future events with limited accuracy. Accordingly, a substantial proportion of the general population at risk remains unidentified until their first clinical event. Despite adding individual plasma biomarkers such as pro-brain natriuretic peptide (BNP), high sensitivity troponins, and high sensitivity C-reactive protein (CRP) to clinical risk engines, the overall improvement has been limited. This may be explained by the fact that the vast majority of single markers are selected based on specific pathophysiological concepts, which do not reflect the true complexity of atherosclerosis. In fact, CV risk is the result of an interplay between comorbidities (chronic inflammatory diseases, metabolic derangements) and exogenous risk factors, propagated by a variety of pathophysiological axes, comprising but not limited to lipids, coagulation, and inflammation. Simultaneous assessment of a large number of plasma proteins may hold promise to further refine risk assessment. To this end, either discovery proteomics, aiming to identify new diagnostic markers or therapeutic targets, or targeted proteomics, aimed at quantification of proteins of specific interest, can be applied. Widespread use of proteomics has been precluded by labour intensiveness, high costs, and the complex clinical interpretation of the bulky results. More recently, these limitations have largely been resolved. Technical advances now allow for high-throughput proteomic analysis in a reproducible and cost-effective manner.
In parallel, advanced computational modelling has facilitated the interpretation of large data sets for clinical implementation. Using these innovations, a targeted protein panel was found to modestly improve the prediction of incident atherosclerotic CV disease in primary prevention, whereas Ganz et al. substantiated that targeted proteomics also outperformed refit Framingham in predicting recurrent coronary events. In support, we recently identified two complementary protein signatures predicting the presence of high-risk plaque and the absence of coronary atherosclerosis in subjects referred for the analysis of anginal complaints, clearly outperforming the traditional risk algorithm. In the present study, we hypothesized that a protein-based risk model can outperform prediction using traditional risk factors in the primary prevention setting. Therefore, we tested the ability of a targeted proteomics panel comprising 368 proteins, related to pathways and/or risk factors involved in atherogenesis, to predict CV event risk in a nested case–control sample of the European Prospective Investigation (EPIC)-Norfolk population study, using advanced machine learning techniques. The findings were subsequently validated in the independent, external primary prevention cohort [Progressione della Lesione Intimale Carotidea (PLIC)].

Methods

Study populations

The derivation cohort was a nested case–control sample derived from the EPIC-Norfolk prospective population study, comprising 25 633 individuals recruited from general practices in the Norfolk area, UK. Study participants aged between 39 and 79 years were enrolled between 1993 and 1997. At baseline, participants completed general health questionnaires and a panel of measurements was performed. During follow-up, all individuals were flagged for mortality at the UK Office of National Statistics and vital status was ascertained for the entire cohort. Data on all hospital contacts throughout England and Wales were obtained using National Health Service numbers through linkage with the East Norfolk Health Authority (ENCORE) database. Hospital records and death certificates were coded by trained nosologists and categorized according to the International Classification of Disease 10th revision (ICD-10). The study protocol was approved by the Norwich District Health Authority Ethical Committee. All individuals gave written informed consent. For the current study, we selected 822 apparently healthy individuals in a nested case–control sample from the EPIC-Norfolk study. Apparently healthy individuals were defined as study participants who did not report a history of CV disease. A total of 411 individuals who developed an acute myocardial infarction (either hospitalization or death with ICD code I21-22 coded as the underlying cause) between baseline and follow-up through 2016 were selected together with 411 apparently healthy individuals who remained free of any CV disease during follow-up (Figure ).

Figure: Machine learning workflow of model construction and validation. AHT med, antihypertensive medication; BMI, body mass index; CV, cardiovascular; EPIC, European Prospective Investigation; HDL-C, high-density lipoprotein cholesterol; PLIC, Progressione della Lesione Intimale Carotidea; SBP, systolic blood pressure; TC, total cholesterol; TG, triglycerides.

The validation cohort was the PLIC cohort, a single-centre, observational, cross-sectional, and prospective study of subjects enrolled on a voluntary basis in 1998–2000 and followed for 11 years on average in the northern area of Milan. The 2606 Caucasian subjects who were enrolled in the study underwent four periodic visits. Data about clinical, pathological, familial, and pharmacological history and lifestyle habits were collected based on medical records and self-reporting during these visits. Blood samples were withdrawn, and subjects underwent carotid ultrasound to assess the presence or absence of carotid vascular damage. The presence of documented stenosis or vascular damage on aorta and limb arteries was included in the definition of subclinical atherosclerosis. For the validation cohort, 702 subjects were selected: 351 who developed atherosclerosis during follow-up (comprising subjects with subclinical atherosclerosis and 44 subjects who suffered a CV event) and 351 gender-matched controls (Figure ). Cardiovascular events were defined as a combined endpoint of coronary heart disease (myocardial infarction, unstable angina, coronary revascularization, silent ischaemia) and cerebrovascular disease (ischaemic stroke and transient ischaemic attack). This study was approved by the ethics committee and was performed in accordance with the Declaration of Helsinki. All participating subjects signed informed consent.

Biochemical analyses

In EPIC-Norfolk, non-fasting blood was drawn at baseline from study participants, from which total cholesterol, high-density lipoprotein (HDL) cholesterol, and triglycerides were determined with the RA1000 analyser (Bayer Diagnostics, Basingstoke, UK). The Friedewald formula was used for the calculation of low-density lipoprotein (LDL) cholesterol levels. After blood withdrawal, ethylenediaminetetraacetic acid (EDTA) samples were kept overnight at room temperature before transport to the EPIC-Norfolk laboratory for centrifugation. Thereafter, the remaining plasma was stored at −80°C for future analyses. In the PLIC cohort, blood samples were collected after overnight fasting. Samples were kept on ice after blood withdrawal and centrifuged within 1 h at 3000 revolutions per minute for 12 min (Eppendorf 5810R centrifuge). Plasma samples were subsequently stored in 200 μL aliquots at −80°C. Since multiple aliquots were stored, multiple freeze/thaw cycles were prevented. Total cholesterol, HDL cholesterol, triglyceride and glucose levels were determined in serum samples with the Cobas Mira Plus Analyser (Horiba, ABX, France). Again, the Friedewald formula was used for the calculation of LDL cholesterol levels. In 2019, we selected cases and controls from both cohorts, whereupon aliquots were thawed and plasma was transferred, on ice, to 96-well plates. The 96-well plates were shipped to Olink proteomics AB (Uppsala, Sweden) on dry ice for analysis using the proximity extension assay technology. Levels of 368 proteins were measured from the CV II, CV III, Cardiometabolic, and Inflammation panels. These panels were selected for their known associations with CV disease. Cases and controls were randomly distributed across plates and assays were performed in a blinded fashion. Data are expressed as Normalized Protein eXpression (NPX) values. Using an internal extension control and an inter-plate control, data quality was controlled and the data were normalized. All assay validation data are available on the manufacturer’s website (www.olink.com).
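The Friedewald calculation referred to above can be sketched in Python as follows. This is a minimal illustration, not the laboratories' actual procedure: the function name is hypothetical, values are assumed to be in mg/dL, and the 400 mg/dL triglyceride cut-off is the commonly cited validity limit of the formula rather than something stated in this article.

```python
def friedewald_ldl_mg_dl(total_chol: float, hdl_chol: float, triglycerides: float) -> float:
    """Estimate LDL cholesterol (mg/dL) as TC - HDL - TG/5.

    All inputs are in mg/dL. The TG/5 term approximates VLDL cholesterol.
    """
    # Assumption: the formula is generally considered unreliable at TG >= 400 mg/dL.
    if triglycerides >= 400:
        raise ValueError("Friedewald estimate is not valid for triglycerides >= 400 mg/dL")
    return total_chol - hdl_chol - triglycerides / 5.0


# Illustrative input values only (not a re-derivation of any table entry).
ldl = friedewald_ldl_mg_dl(total_chol=250, hdl_chol=50, triglycerides=168)
print(f"Estimated LDL cholesterol: {ldl:.1f} mg/dL")
```

Because the estimate depends on triglycerides, it is sensitive to fasting status, which differs between the two cohorts described above.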

Statistical analysis

Data are presented as mean ± standard deviation for normally distributed variables or median with inter-quartile range for skewed data. Categorical variables are expressed as absolute number and percentages. Independent sample t-tests and Mann–Whitney U tests were used where appropriate. Two-sided P-values ≤0.05 were considered statistically significant. Data were analysed using R version 3.5.1 (R Foundation, Vienna, Austria).

Model construction

A combination of a stacked generalization framework, tree-based ensemble methods, and multiple gradient boosting classifiers was used to best discriminate between cases and controls. Using these techniques, explained in detail below, different models were constructed. First, a clinical risk model was built. The clinical risk model included parameters of different validated risk scores: the Framingham Risk Score, pooled cohort equations, and SCORE. Parameters included in the clinical risk model were age, gender, body mass index, systolic blood pressure, smoking status, presence of diabetes, use of antihypertensive medication, total cholesterol levels, HDL cholesterol levels, and triglyceride levels. Second, a protein-based model was constructed using the measured plasma proteins only. A third model was formed by stacking the clinical risk parameters with the protein parameters. The proteins and clinical parameters were allowed to compete in the formation of this model. All three models were validated in the validation cohort without adjustments. Next, considering the long-term follow-up of subjects and the fact that proteins are subject to change due to lifestyle and medical interventions, we assessed the optimal time point of prediction of acute myocardial infarction in the derivation cohort using Markov-Chain Monte Carlo techniques. For this optimal time point of prediction, similar to the long-term modelling, a clinical risk model, protein model, and a combined model were formed. Specifically for this time point, we calculated the net reclassification improvement (NRI) as described by Pencina et al. for case–control studies. To this end, we used the acute myocardial infarction prevalence of the total EPIC-Norfolk cohort in the same period. In addition, we constructed survival models for both the protein model and the clinical risk model in the derivation cohort, to compare model performance across all possible time points. This time-to-event analysis was performed using the same machine learning techniques as the binary models, with the implementation of a survival loss function. Inverse probability of censoring weighting was used to cope with the right-censored data. Using these survival models, time-dependent areas under the curve (AUCs) were calculated with a 2-year interval starting from 3 years up to the median follow-up of 20 years.
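The net reclassification improvement mentioned above can be illustrated with a minimal two-category sketch in Python. It uses the generic definition (net proportion of events moved to the higher-risk category plus net proportion of non-events moved to the lower-risk category) and does not reproduce the case–control adjustment or the risk categories used in the study. The function name, the 0.2 threshold, and the toy data are all assumptions made for illustration.

```python
def net_reclassification_improvement(old_risk, new_risk, events, threshold):
    """Two-category NRI: reclassification across a single risk threshold.

    old_risk, new_risk: predicted risks from the reference and the new model.
    events: 1 for subjects with the event, 0 for subjects without.
    """
    up_events = down_events = up_nonevents = down_nonevents = 0
    n_events = sum(events)
    n_nonevents = len(events) - n_events
    for old, new, event in zip(old_risk, new_risk, events):
        moved_up = new >= threshold and old < threshold
        moved_down = new < threshold and old >= threshold
        if event:
            up_events += moved_up
            down_events += moved_down
        else:
            up_nonevents += moved_up
            down_nonevents += moved_down
    nri_events = (up_events - down_events) / n_events
    nri_nonevents = (down_nonevents - up_nonevents) / n_nonevents
    return nri_events + nri_nonevents


# Toy data (hypothetical): two events, two non-events.
old = [0.10, 0.30, 0.05, 0.30]
new = [0.25, 0.35, 0.02, 0.10]
had_event = [1, 1, 0, 0]
print(net_reclassification_improvement(old, new, had_event, threshold=0.2))
```

In this toy example one of two events moves up and one of two non-events moves down, so both components equal 0.5.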

Machine learning techniques

All binary models were constructed using the same machine learning techniques (Figure ). First, to avoid overfitting of the models, the derivation data set was split into two sets: a training set of 80% and a test set of 20%. The model was not exposed to data from the 20% test set; this was only used for the performance measurements. Ten per cent of the 80% training set was used for model refinement before the model performance was tested in the test set. In construction of the models and identification of the most reliable biomarker signature in our datasets (both proteomics and clinical), we used stability selection with extreme gradient boosting. Gradient boosting is a statistical learning technique, which produces a non-linear model in the form of an ensemble of weak prediction tree-based models. It builds the model in a stage-wise fashion and generalizes other boosting methods by allowing optimization of an arbitrary differentiable loss function. The extreme gradient boosting classification algorithm optimizes a cost function by iteratively choosing a weak hypothesis that points in the negative gradient direction. Using a fivefold cross-validation by random reshuffling of the training set, overfitting was avoided. For increased confidence, this procedure was repeated multiple times on a completely reshuffled dataset. Furthermore, the method was coupled with a rigorous stability selection procedure to ensure the reliability and robustness of the obtained parameters. Finally, we applied a permutation (randomization) test to evaluate the statistical validity of the results, since standard univariate significance tests cannot be applied to the models used due to the large number of features. The permutation test comprised 1000 reruns of the model, every time randomly permuting the output variable (presence/absence of the event). By evaluating the distribution of all the results obtained in these simulations and comparing it to the true outcomes, we computed the statistical significance associated with the joint panel of the selected markers. We also reported importance scores for each of the proteins, which demonstrate the preferences of the model when constructing the non-linear prediction function based on the selected biomarkers. Python version 3.7 (www.python.org), with the packages NumPy, SciPy, and scikit-learn, was used for machine learning models and visualizations.
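The idea behind the permutation test can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the study's pipeline: the study reran the entire model for each permutation, whereas this sketch only shuffles the outcome labels against a fixed set of prediction scores. The function names, the toy data, and the add-one correction in the P-value (a common convention) are assumptions.

```python
import random


def roc_auc(labels, scores):
    """AUC as the probability that a random case outscores a random control (ties count 0.5)."""
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


def permutation_p_value(labels, scores, n_permutations=1000, seed=0):
    """Share of label permutations whose AUC is at least as high as the observed AUC."""
    rng = random.Random(seed)
    observed = roc_auc(labels, scores)
    shuffled = list(labels)
    at_least_as_high = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # break any link between outcome and score
        if roc_auc(shuffled, scores) >= observed:
            at_least_as_high += 1
    # Add-one correction so the estimated P-value is never exactly zero.
    return observed, (at_least_as_high + 1) / (n_permutations + 1)


# Toy data (hypothetical): four controls followed by four cases.
y = [0, 0, 0, 0, 1, 1, 1, 1]
score = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
auc, p = permutation_p_value(y, score)
print(f"AUC = {auc:.3f}, permutation P = {p:.4f}")
```

With the add-one correction, the smallest attainable P-value is 1/(n_permutations + 1), so the number of permutations sets the resolution of the test.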

Results

Baseline characteristics of individuals from the derivation and validation cohorts are provided in Table 1. In short, cases in the derivation cohort were more likely to have traditional risk factors for CV disease (they were older, more likely to be male and to smoke, and had higher blood pressure and cholesterol levels). In the validation cohort, cases were more likely to be older, to smoke, and to have high blood pressure. In all participants, 368 preselected proteins were measured; proteins were excluded if ≥90% of the values were below the lower limit of detection. Due to the latter and overlap between the panels, the final analysis included 333 unique proteins (Supplementary material online, ).
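The exclusion rule for proteins with too many values below the lower limit of detection (LOD) can be expressed as a short Python sketch. The function name, the data layout, and the toy values are assumptions; only the 90% cut-off comes from the text.

```python
def keep_protein(values, lod, max_fraction_below=0.90):
    """Return True if fewer than 90% of the measured values fall below the LOD."""
    n_below = sum(1 for v in values if v < lod)
    return n_below / len(values) < max_fraction_below


# Toy measurements (hypothetical): ten samples per protein, LOD of 1.0.
mostly_undetected = [0.2] * 9 + [3.1]   # 90% below LOD, so excluded
mostly_detected = [0.2] * 8 + [3.1, 2.7]  # 80% below LOD, so retained
print(keep_protein(mostly_undetected, lod=1.0))
print(keep_protein(mostly_detected, lod=1.0))
```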

Prediction of acute myocardial infarction

Prediction of myocardial infarction using a machine learning model consisting of 50 plasma proteins over a median follow-up of 20 years resulted in a receiver operating characteristic (ROC) AUC of 0.754 ± 0.011 (permutation test P = 0.0099; Figures and Table 2). In comparison, the use of the clinical risk model resulted in an ROC AUC of 0.730 ± 0.015 (permutation test P = 0.0099). Combining the protein panel with the clinical risk model resulted in an ROC AUC of 0.764 ± 0.015 (permutation test P = 0.0099). The biomarker model was superior to the clinical risk model (P < 0.001). The combined protein and clinical risk model showed a small incremental AUC of 0.01 in comparison with the protein model alone. Using Markov-Chain Monte Carlo techniques, the optimal time point for prediction was found at 1132 days (∼3 years), which included 66 acute myocardial infarctions and all 411 non-myocardial infarction controls. Of the 50 proteins that were selected for the ∼3 years prediction model, 33 overlapped with the original 20 years model (Supplementary material online, ). Focusing on the events occurring within ∼3 years after baseline blood withdrawal, the ROC AUC increased to 0.803 ± 0.093 (permutation test P = 0.0145; Figure ), as opposed to 0.732 ± 0.164 (permutation test P = 0.0099) using the clinical risk model. The combination of the protein and clinical risk parameters resulted in an ROC AUC of 0.808 ± 0.085 (permutation test P = 0.0178). Here, the biomarker model was superior to the clinical risk model with an incremental AUC of 0.07 (P = 0.025) but not to the combination of the protein and clinical risk model (P = 0.721). For the short-term prediction, the NRI of the protein model in comparison to the clinical risk model was 6.6%.
In the survival analysis, the protein model resulted in a mean time-dependent AUC of 0.717 ± 0.027, which was superior across all time points compared to the clinical risk model mean AUC of 0.653 ± 0.031 (P < 0.001; Supplementary material online, ).

Figure: Importance plot of proteins. Relative importance of 50 proteins predictive in derivation cohort.

Figure: Receiver operating characteristics of prediction models. (A) Prediction of events with protein, clinical risk, and combined model in derivation cohort. (B) Short-term prediction (<3 years) of events with protein, clinical risk, and combined model in derivation cohort. (C) Prediction of events with protein, clinical risk, and combined model in validation cohort. AUC, area under the curve; ROC, receiver operating characteristic.

Figure: Derivation and validation of a plasma proteomic model improves cardiovascular risk prediction in a primary prevention setting, demonstrating the potential of a proteomics panel to further refine risk assessment. CV, cardiovascular; NPX, Normalized Protein eXpression; PEA, proximity extension assay.

Validation of the predictive value

We validated the discriminatory ability of the 50 proteins from the derivation cohort in the validation cohort. First, we investigated the ability of the proteins to predict subclinical atherosclerosis. The prediction was relatively poor with an ROC AUC of 0.648 ± 0.056 (permutation test P = 0.0297; Supplementary material online, ). When validating the proteins in the 44 participants who suffered from CV events vs. the 351 participants with no signs of atherosclerosis, the protein model resulted in an ROC AUC of 0.705 ± 0.071 (permutation test P = 0.0099; Figure ), compared to the clinical risk model ROC AUC 0.609 ± 0.057 (permutation test P = 0.0700; Table ). The protein model was significantly better than the clinical risk model in the validation cohort (P < 0.001). The combined protein and clinical risk model resulted in an ROC AUC of 0.692 ± 0.090 (permutation test P = 0.0099), which was not better than the protein model alone (P = 0.618).

Discussion

Using targeted proteomics, we show that a panel of 50 proteins outperforms the clinical risk model in predicting the risk of myocardial infarction (<3 years) in a primary prevention setting with an increase in ROC AUC of 0.07. Improvement in predicting CV events during the entire (median) 20-year follow-up period was significant, albeit modest. In an external independent validation cohort, the predictive value of the protein panel for CV events was confirmed and superior to the clinical risk model (incremental AUC 0.10). Survival analysis showed superiority of the protein model to the clinical risk model at all tested time points (P < 0.001). Collectively, these data show that a novel proteomic panel offers a significant improvement in CV risk discrimination compared to a clinical risk model based on traditional risk factors.

Protein-based risk prediction outperforms traditional risk factors

We substantiate the predictive value of a panel comprising 50 proteins for a first MI with an ROC AUC of 0.754 ± 0.011, using targeted proteomics. Although the panel outperformed the prediction by the clinical risk model (P < 0.001), the AUC increase of 0.02 is very modest. Interestingly, the prediction of earlier MI (within 3 years after baseline blood sampling) using the plasma protein panel performed better, with an incremental ROC AUC of 0.07. Where genetic prediction models are advocated to predict lifelong risk, the ability of our protein model particularly in shorter-term risk prediction most likely highlights the property of plasma proteomics to reflect a more proximate timeframe. Confronted with continuous changes in lifestyle as well as medical interventions during the course of a life, repeated proteome-based risk estimation as a ‘liquid health check’ may help to further improve lifetime risk estimation. The prediction of predominantly short-term MI substantiates our previous findings that proteomics also predicts the presence of high-risk plaques in patients, which are closely associated with an increased risk for ensuing MI. Previous cohort studies have also reported benefit of proteins in risk prediction. The Framingham Heart Study investigators evaluated a panel of 85 plasma proteins in relation to CV events in a primary care setting. Using a multi-marker analysis, they reported eight biomarkers predictive for incident CVD, which on top of clinical parameters achieved an ROC AUC of 0.758. These data are in line with data reported for the prediction of recurrent coronary events, where a panel of 9 out of 1130 proteins modestly improved risk prediction (AUC 0.70) compared to the clinical risk algorithm (AUC 0.64).

Proteins predictive of cardiovascular events

Based on previous findings, we applied targeted proteomics to proteins relating to cardiometabolic disease, CV disease, and inflammation/immune responses. The majority of proteins in our model were related to immune system response, particularly proteins involved in chemotaxis, migration, apoptosis, and angiogenesis. Most of the proteins found to predict early vs. all events overlapped (33/50). Several proteins merit further attention considering their marked contribution to the final model. Growth Differentiation Factor 15 (GDF-15) was the protein with the largest contribution. In chronic diseases, GDF-15 produced by leukocytes has been shown to enhance inflammation. Other prominent candidates involve the N-terminal pro-B-type natriuretic peptides and BNP, which are established markers for heart failure and predictors of CV events. There is also a preponderance of inflammatory proteins, comprising metalloproteinase-12 (MMP-12), TRAIL receptor 2 and interleukin-6. These proteins, involved in matrix degradation, apoptosis and inflammation induction, reflect major pathways contributing to atherosclerotic lesion formation and destabilization. Interestingly, there is clear overlap in proteins and pathways when comparing our data to previous CVD-proteomic studies. Thus, GDF-15 was also identified as a predictive candidate in previous studies. Similarly, the relevance of plasma MMP-12 and various chemo/cytokines underscores consistency between these studies.

Validation in the Progressione della Lesione Intimale Carotidea cohort

Validation of our findings was performed in the primary prevention PLIC cohort, in which both repetitive non-invasive measures for atherosclerosis and CV events were collected during an 11-year follow-up. The 50-protein model from the derivation cohort showed reasonable prediction of CV events with an ROC AUC of 0.705 ± 0.071, with an incremental AUC of 0.10 compared to the clinical risk model in the PLIC cohort (ROC AUC of 0.609 ± 0.057). We also assessed the value of the proteomic model to predict the presence of subclinical atherosclerotic lesions assessed using ultrasound, revealing an ROC AUC of 0.648 ± 0.056. The failure of plasma proteomics to accurately predict the presence of subclinical atherosclerosis is in line with the findings in the derivation cohort, where the protein signature performed better for early/mid-term CV events than for long-term events.

Clinical perspective

In previous studies, adding single plasma markers to clinical risk algorithms resulted in only modest improvements of risk prediction. Here, we report a marked improvement in CVD risk prediction using a targeted proteomics approach. The hurdles for using proteomic panels in clinical practice have been largely removed with the advent of affordable high-throughput technology requiring only minimal amounts of plasma. More importantly, machine learning technology further facilitates the use of complex, massive data (such as proteomics) in clinical decision making. The need for better discrimination of subjects at highest CV event risk is underscored by the advent of expensive medication in CVD preventive therapy beyond generic statins, among which are PCSK9 antibodies, low-dose Xa inhibition, SGLT2 inhibitors, and GLP1 agonists. Whereas a high-risk proteomic panel holds promise to help identify higher-risk subjects, it is tempting to speculate that pathway analysis of the proteomic signature may also allow for the guidance of what medication to use in specific patient categories. This concept is underscored by the CANTOS study, where predominantly CRP responders demonstrated CV benefit of interleukin 1 beta-antibody administration. However, this concept needs further validation with special emphasis on relationships between biomarkers and protein network analysis. Hypothetically, the development of a targeted-proteomic based risk score might enable a more patient-tailored approach for the primary prevention of CV events.

Strengths and limitations

The combination of proteomics with machine learning technology is highly promising. Machine learning technology can process data that surpasses the capacity of traditional statistics and the human brain to comprehend. One of the most important differences is that our predictive machine learning model is based on multiple proteins in a panel, which collectively leads to a reliable prediction. Using machine learning, non-linear relationships and interactions among proteins are taken into account, in contrast to univariate models that only address up- and/or down-regulation of individual proteins. In the current analyses, we refitted the clinical risk factors from the Framingham risk score and SCORE to best fit our cohort data, aiming to improve the performance of traditional risk factors. Because analogous machine learning methods were applied to the traditional risk factors, the observed superiority of our protein model over the clinical risk model is distinct. Several potential limitations deserve closer attention. First, the cohorts used in this study were collected over a decade ago. Over the years, risk factor management has improved, plaque characteristics have altered, and patient characteristics have changed. Second, our validation cohort had a limited number of CVD events. However, validation of our protein model on these events was reasonable and the model outperformed the clinical risk model in the validation cohort, in an even stronger manner than in the derivation cohort. Third, we used targeted rather than untargeted proteomics in our study. Proteins were preselected as potential biomarkers for CV disease, since clinical verification, rather than protein discovery, was the goal in our study. Despite analysing a broad range of proteins, we may have missed other predictors of CV event risk due to the use of targeted proteomics only. As a result, we may have underestimated the true potential of proteomics in CV risk estimation. Fourth, in contrast to other primary prevention risk scores such as the Pooled Cohort Equation, our constructed models do not predict lifetime risk, which could be useful in primary prevention patients characterized by a relatively low short-term CV risk, such as in subjects below 50 years of age. However, in the present study, we preferred shorter-term prediction for several reasons. Most importantly, the mean age of both our derivation and validation cohorts was well above the age of 50 years, resulting in a higher short-term risk even in primary prevention. Furthermore, diagnostic improvement in detecting high-risk patients is currently needed to make decisions on initiating novel medication on top of routine regimens, and for these decisions, relatively short time horizons are routinely used. Fifth, the samples in the derivation cohort were non-fasting, while the samples in the validation cohort were collected after an overnight fast. Despite this difference, the protein model performed comparably in the validation cohort. Finally, the current analyses were performed in subjects primarily of European ancestry. Hence, the predictive power remains to be validated in different ethnicities.

Conclusions

In primary prevention, proteome-based risk prediction significantly outperforms prediction using clinical risk factors in predicting the risk of acute myocardial infarction and CV events, especially in the first 3 years. In the midst of novel, expensive drugs, prediction of individual CVD risk and treatment benefit is increasingly important. Further large prospective studies will have to determine the true value of proteome-based risk scores in primary prevention.

Table 1

Baseline characteristics

	EPIC—case (n = 411)	EPIC—control (n = 411)	PLIC—case (n = 351)	PLIC—control (n = 351)
Age (years)	66 ± 7.8	62 ± 7.7	55 ± 8.1	54 ± 8.2
Male gender	282 (68.6)	254 (61.8)	117 (33.3)	116 (33.0)
BMI (kg/m²)	26.8 ± 3.7	26.6 ± 3.6	26.9 ± 4.2	26.4 ± 3.2
Systolic blood pressure (mmHg)	144 ± 19	136 ± 17	134 ± 17	130 ± 16
Diastolic blood pressure (mmHg)	86 ± 12	83 ± 11	84 ± 9	82 ± 9
Current smoker	61 (15)	22 (5.4)	78 (22.2)	56 (16.0)
Total cholesterol (mg/dL)^a	250 ± 47	243 ± 43	225 ± 39	220 ± 38
HDL cholesterol (mg/dL)^a	50 ± 14	53 ± 15	55 ± 15	58 ± 15
LDL cholesterol (mg/dL)^a	164 ± 41	157 ± 39	147 ± 37	142 ± 35
Triglycerides (mg/dL)^b	168 (115–239)	151 (106–222)	102 (66–143)	86 (61–119)
hsCRP (mg/L)	2.1 (1.1–5.0)	1.3 (0.7–2.9)	—	—
HbA1c (%)	5.77 ± 1.28	5.38 ± 0.79	—	—
Antidiabetic drug use baseline	—	—	3 (0.9)	2 (0.6)
Lipid lowering drug use baseline	9 (2.2)	6 (1.5)	—	—
Antihypertensive drug use baseline	150 (36.5)	75 (18.2)	92 (26.2)	68 (19.4)
Median time of follow-up (years)	15.1 (7.7–19.6)	20.5 (19.6–21.2)	11.1 (10.9–11.3)	11.1 (11.0–11.3)

Values are n (%), mean ± standard deviation, or median (IQR) for skewed data.

BMI, body mass index; EPIC, European Prospective Investigation; HDL, high-density lipoprotein; hsCRP, high sensitivity C-reactive protein; IQR, inter-quartile range; LDL, low-density lipoprotein; PLIC, Progressione della Lesione Intimale Carotidea.

^a To convert to mmol/L, divide by 38.7.

^b To convert to mmol/L, divide by 88.6.
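
The two conversion factors in the Table 1 footnotes (38.7 for cholesterol, 88.6 for triglycerides) can be applied as in the minimal Python sketch below. The function and constant names are illustrative assumptions and do not come from the study; the example inputs are the EPIC case values for total cholesterol and triglycerides in Table 1.

```python
# Conversion factors from the Table 1 footnotes (mg/dL per mmol/L).
# Names below are hypothetical, chosen for illustration only.
CHOLESTEROL_MG_DL_PER_MMOL_L = 38.7    # total, HDL and LDL cholesterol
TRIGLYCERIDES_MG_DL_PER_MMOL_L = 88.6  # triglycerides


def cholesterol_to_mmol_l(mg_dl: float) -> float:
    """Convert a cholesterol concentration from mg/dL to mmol/L."""
    return mg_dl / CHOLESTEROL_MG_DL_PER_MMOL_L


def triglycerides_to_mmol_l(mg_dl: float) -> float:
    """Convert a triglyceride concentration from mg/dL to mmol/L."""
    return mg_dl / TRIGLYCERIDES_MG_DL_PER_MMOL_L


if __name__ == "__main__":
    # EPIC cases in Table 1: total cholesterol 250 mg/dL, triglycerides 168 mg/dL.
    print(f"{cholesterol_to_mmol_l(250):.2f} mmol/L")    # 6.46 mmol/L
    print(f"{triglycerides_to_mmol_l(168):.2f} mmol/L")  # 1.90 mmol/L
```

The factors differ because they reflect the different molar masses of cholesterol and of an average triglyceride, so the two conversions should not be interchanged.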

Table 2

Receiver operating characteristic area under the curve of prediction

	Derivation cohort	Derivation (<3 years)	Validation cohort
Protein model	0.754 ± 0.011	0.803 ± 0.093	0.705 ± 0.071
Clinical risk model	0.730 ± 0.015	0.732 ± 0.164	0.609 ± 0.057
Combined clinical and protein model	0.764 ± 0.015	0.808 ± 0.085	0.692 ± 0.090

Average receiver operating characteristics area under the curve of the prediction models.
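
For readers less familiar with the metric in Table 2, the sketch below shows one standard way to compute a receiver operating characteristic AUC: as the fraction of case-control pairs in which the case receives the higher predicted risk, with ties counted as one half. It is a generic illustration, not the authors' pipeline. The toy labels and scores are invented, and the `summarize` helper assumes that the ± values are standard deviations across repeated evaluation splits, which the table legend does not state.

```python
from statistics import mean, stdev


def roc_auc(labels, scores):
    """AUC as the probability that a random case outscores a random control.

    labels: 1 for cases, 0 for controls; scores: predicted risk per subject.
    Tied case-control pairs contribute 0.5.
    """
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                wins += 1.0
            elif case_score == control_score:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def summarize(split_aucs):
    """Mean and sample standard deviation of AUCs from repeated splits (assumed)."""
    return mean(split_aucs), stdev(split_aucs)


if __name__ == "__main__":
    # Invented toy data: three cases and three controls.
    toy_labels = [1, 1, 1, 0, 0, 0]
    toy_scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
    print(roc_auc(toy_labels, toy_scores))  # 8 of 9 pairs ranked correctly
    print(summarize([0.7, 0.8, 0.9]))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of cases from controls, which is the scale on which the 0.705 vs. 0.609 comparison in the validation cohort should be read.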

