Kenneth Westerman1, Alba Fernández-Sanlés2,3, Prasad Patil4, Paola Sebastiani4, Paul Jacques1, John M Starr5,6, Ian J Deary5,6, Qing Liu7, Simin Liu7, Roberto Elosua2,8,9, Dawn L DeMeo10, José M Ordovás1,11,12. 1. JM-USDA Human Nutrition Research Center on Aging at Tufts University Boston MA. 2. Cardiovascular Epidemiology and Genetics Research Group REGICOR Study Group IMIM (Hospital del Mar Medical Research Institute) Barcelona Catalonia Spain. 3. Pompeu Fabra University (UPF) Barcelona Catalonia Spain. 4. Department of Biostatistics Boston University School of Public Health Boston MA. 5. Department of Psychology University of Edinburgh United Kingdom. 6. Centre for Cognitive Ageing and Cognitive Epidemiology University of Edinburgh United Kingdom. 7. Department of Epidemiology Brown University School of Public Health Providence RI. 8. CIBER Cardiovascular Diseases (CIBERCV) Madrid Spain. 9. Medicine Department Medical School University of Vic-Central University of Catalonia (UVic-UCC) Vic Catalonia Spain. 10. Channing Division of Network Medicine Department of Medicine Brigham and Women's Hospital Boston MA. 11. IMDEA Alimentación CEI UAM Madrid Spain. 12. Centro Nacional de Investigaciones Cardiovasculares (CNIC) Madrid Spain.
Abstract
Background Epigenome-wide association studies for cardiometabolic risk factors have discovered multiple loci associated with incident cardiovascular disease (CVD). However, few studies have sought to directly optimize a predictor of CVD risk. Furthermore, it is challenging to train multivariate models across multiple studies in the presence of study- or batch effects. Methods and Results Here, we analyzed existing DNA methylation data collected using the Illumina HumanMethylation450 microarray to create a predictor of CVD risk across 3 cohorts: Women's Health Initiative, Framingham Heart Study Offspring Cohort, and Lothian Birth Cohorts. We trained Cox proportional hazards-based elastic net regressions for incident CVD separately in each cohort and used a recently introduced cross-study learning approach to integrate these individual scores into an ensemble predictor. The methylation-based risk score was associated with CVD time-to-event in a held-out fraction of the Framingham data set (hazard ratio per SD=1.28, 95% CI, 1.10-1.50) and predicted myocardial infarction status in the independent REGICOR (Girona Heart Registry) data set (odds ratio per SD=2.14, 95% CI, 1.58-2.89). These associations remained after adjustment for traditional cardiovascular risk factors and were similar to those from elastic net models trained on a directly merged data set. Additionally, we investigated interactions between the methylation-based risk score and both genetic and biochemical CVD risk, showing preliminary evidence of an enhanced performance in those with less traditional risk factor elevation. Conclusions This investigation provides proof-of-concept for a genome-wide, CVD-specific epigenomic risk score and suggests that DNA methylation data may enable the discovery of high-risk individuals who would be missed by alternative risk metrics.
Keywords:
DNA methylation; cardiovascular disease; epigenomics; risk prediction
BMI indicates body mass index; BMIQ, Beta‐Mixture Quantile Dilation normalization; CpG, cytosine‐phosphate‐guanine site; CSL, cross‐study learner; CVD, cardiovascular disease; FHS, Framingham Heart Study Offspring Cohort; FHS‐JHU, Framingham Heart Study (Johns Hopkins University subset); FHS‐UM, Framingham Heart Study (University of Minnesota subset); FRS, Framingham Risk Score; GRS, genomic risk score; HDL, high‐density lipoprotein; ICC, intraclass correlation coefficient; LBC, Lothian Birth Cohorts 1936; LDL, low‐density lipoprotein; MI, myocardial infarction; MRS, methylation‐based risk score; REGICOR, Registre Gironí del COR; SNP, single‐nucleotide polymorphism; SSL, single‐study learner; and WHI, Women's Health Initiative.
Clinical Perspective
What Is New?
An epigenomic (DNA methylation‐based) cardiovascular risk score was developed using a recently introduced statistical approach for combining risk models across cohorts.
Interactions between an epigenomic risk score and existing genomic and clinical risk scores for cardiovascular disease were assessed.
What Are the Clinical Implications?
DNA methylation may add a new molecular dimension to the prediction of cardiovascular risk.
This epigenomic risk score may perform best in individuals with lower Framingham Risk Scores and thus identify high‐risk individuals who would otherwise go undetected.
Introduction
DNA methylation is an important epigenetic pathway through which genetic variants and environmental exposures impact disease risk.1, 2 Methylation at specific cytosine‐phosphate‐guanine (CpG) sites has been associated with disease in epigenome‐wide association studies, including associations detected in blood, a convenient but non‐target tissue, as for type 2 diabetes mellitus.3 Methylation‐based risk scores (MRS) allow genome‐wide aggregation of epigenetic information, similarly to the more established genetic risk scores, and allow for the use of models with arbitrary complexity. These risk scores are often developed initially by using methylation as a proxy for disease risk factors, such as body mass index4 and general aging‐related morbidity.5 Alternatively, given sufficient sample size, epigenetic associations with disease risk can be modeled directly.6
Associations between DNA methylation and cardiovascular disease (CVD) have been explored in many different cohorts and using diverse approaches.
Cross‐sectional associations have been found across multiple relevant tissues, namely blood, aorta, and other vascular tissues.7 Some investigations aimed at cardiovascular risk factors have discovered CpGs predictive of CVD development,8, 9 while Mendelian randomization approaches have suggested causality of at least some of these CpG‐risk factor associations.10 A few studies directly modeling incident CVD as a primary outcome have either been conducted using only global (not locus‐specific) methylation levels,11 or have found limited additional predictive power in the presence of known risk factors.12 A recent large‐scale meta‐analysis found multiple CpG sites predictive of incident coronary heart disease, but focused on univariate approaches.13 We have previously investigated methylation regions and modules associating with incident CVD, generating mechanistic insights but without aggregating these results into a direct predictor of risk.14 Additionally, it is unclear how the CVD risk tracked by DNA methylation is redundant with or complementary to existing risk metrics, including genetic scores15 and those based on traditional cardiovascular risk factors (eg, the Framingham Risk Score for generalized CVD).16
Combining signal across population‐scale cohorts can increase sample size while attenuating the effect of study‐specific biases and confounding factors, but can be prone to emergent sources of confounding from “batch” effects or other systematic biases in methylation data across cohorts. This is especially problematic when there is notable class imbalance (ie, different outcome frequencies) across cohorts.17 The most common method for dealing with this heterogeneity is meta‐analysis, but standard meta‐analysis approaches are restricted to univariate (one CpG site at a time) models.
Other approaches include batch effect correction on the input data set (eg, ComBat18), direct adjustment for batch/study in linear models, or adjustment for derived variables intended to capture technical biases (eg, surrogate variable analysis19), but these approaches can often lead to over‐ or under‐estimates of true biological effects.17 An alternative approach described recently, cross‐study learning, instead trains an ensemble predictor consisting of one or multiple models per cohort.20 This strategy allows the use of arbitrarily complex models while avoiding technical confounding from direct combination of the data sets.
To develop an improved DNA methylation‐based cardiovascular risk predictor using multiple heterogeneous training cohorts, we used a cross‐study learning method to develop an ensemble of penalized time‐to‐event regression risk models. The resulting composite risk score performed well in a held‐out data subset, associating with survival even in the presence of traditional risk factors, and showing similar performance to models trained on naively merged data sets. External validation was achieved in a case‐control study for prevalent myocardial infarction (MI). Further, interactions were assessed between the composite methylation‐based risk score and other risk predictors, finding that it is potentially most effective in those with low Framingham Risk Scores.
Methods
Study Participants and Phenotype Collection
Phenotypes (demographic, anthropometric, biochemical, and clinical), DNA methylation data, and imputed genotypes were available either from publicly available controlled‐access databases or upon request from the cohorts. Cohort‐specific details are provided in Data S1. Blood‐based biochemical markers (total cholesterol, low‐density lipoprotein cholesterol, high‐density lipoprotein cholesterol, triglycerides, fasting glucose, high‐sensitivity C‐reactive protein, and systolic blood pressure) were log10‐transformed for all analyses. In the Lothian Birth Cohort 1936, LDL was estimated from total cholesterol and triglycerides using the Friedewald equation. Diabetes mellitus was defined as either use of diabetes mellitus medication or a measured fasting blood glucose level of >125 mg/dL. Antihypertensive medication use, smoking status, and diabetes mellitus status were assumed to be false where missing, though missing data rates for these variables in the held‐out FHS (Framingham Heart Study) subset were low (0.1%, 0.1%, and 7%, respectively). Analysis of these data sets was approved by the Tufts University Health Sciences Institutional Review Board (protocol 12592), and all subjects gave informed consent.
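The two derived phenotypes described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the standard form of the Friedewald equation (LDL = total cholesterol − HDL − triglycerides/5, all in mg/dL), which also requires HDL, and it applies the conventional 400 mg/dL triglyceride cutoff, which the text does not mention. The function names are hypothetical.

```python
from typing import Optional


def friedewald_ldl(total_chol: float, hdl: float, triglycerides: float) -> Optional[float]:
    """Estimate LDL cholesterol (mg/dL) as TC - HDL - TG/5.

    Assumption: returns None when triglycerides are >= 400 mg/dL, where the
    approximation is conventionally considered unreliable.
    """
    if triglycerides >= 400:
        return None
    return total_chol - hdl - triglycerides / 5.0


def has_diabetes(on_medication: Optional[bool], fasting_glucose: Optional[float]) -> bool:
    """Diabetes = medication use OR fasting glucose > 125 mg/dL.

    Missing inputs (None) are treated as False, mirroring the
    'assumed to be false where missing' rule in the text.
    """
    return bool(on_medication) or (fasting_glucose is not None and fasting_glucose > 125)
```

For example, `friedewald_ldl(200, 50, 150)` gives 120.0 mg/dL, and a participant with no medication record and no glucose measurement is classified as non-diabetic.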
DNA Methylation Data Processing
DNA methylation data for all initial cohorts (Women's Health Initiative [WHI], FHS, and Lothian Birth Cohorts [LBC]) were collected using the Illumina HumanMethylation450 microarray platform21 and downloaded as raw intensity files. FHS methylation data were collected in 2 primary batches in 2 centers: 1 in subjects from a nested case‐control for CVD measured at Johns Hopkins University (FHS‐JHU), and the other in a larger set of remaining Framingham Offspring participants measured at the University of Minnesota (FHS‐UM). Preprocessing was performed using the minfi and wateRmelon packages for R.22, 23 Sample‐wise filters were as follows: robust overall signal in the main cluster based on visual inspection of an intensity plot, <10% of probes undetected at a detection threshold of P<1e‐16, and a reported sex matching methylation‐based sex prediction. Probes were removed using the following criteria: >10% of samples undetected at a detection threshold of P<1e‐16, location in the X or Y chromosomes, non‐CpG probes, cross‐hybridizing probes, probes measuring SNPs, and probes with an annotated single‐nucleotide polymorphism at the CpG site or in the single‐base extension region. Samples were normalized using the Noob method for background correction and dye‐bias normalization, followed by the BMIQ method for probe type correction.24, 25 Blood cell fractions for 6 blood cell types (CD4+ T‐cells, CD8+ T‐cells, B‐cells, natural killer cells, monocytes, and granulocytes) were estimated using a common reference‐based method,26 and 5 of these (excluding granulocytes) were included in cell count‐adjusted statistical models.
After quality control and filtering steps, 390 597 CpG sites were shared between the 3 data sets, formatted as beta values (roughly equal to the ratio of methylated signal to total microarray signal, or M/(M+U), where M and U denote the methylated and unmethylated signal intensities).
DNA methylation data for the REGICOR (Registre Gironí del COR) cohort were collected using the Illumina MethylationEPIC microarray platform27 and analyzed using the wateRmelon23 and methylumi28 R packages. Samples were excluded based on detection P>0.05 in at least 1% of probes or failure to cluster in the appropriate sex based on X chromosome methylation. Probes were excluded based on detection P>0.05 in at least 1% of samples, a bead count <3 in at least 5% of samples, discarding by Illumina based on underperformance (n=1031) or changes in the manufacturing process (n=977), non‐CpG targets, and cross‐hybridization (n=43 979). A batch normalization was performed by standardizing beta values to mean zero and unit variance within each bisulfite conversion batch before analysis. After quality control and preprocessing, 811 610 CpG sites across 391 individuals were available for analysis. Participants were further excluded from analysis because of unknown smoking habits (n=10) and unavailable information regarding diabetes mellitus, hypertension, or hyperlipidemia (n=53). Surrogate variable analysis19 was used to calculate 2 surrogate variables, representing potential technical and biological confounders, for adjustment in MRS replication models.
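The within‐batch standardization applied to the REGICOR beta values can be illustrated with a short sketch for a single probe. This is not the authors' implementation: the function name is hypothetical, the sample (n−1) standard deviation is an assumption, and batches with a single sample or zero variance are mapped to 0 purely for convenience.

```python
from collections import defaultdict
from statistics import mean, stdev


def standardize_within_batch(betas, batches):
    """Z-score one probe's beta values separately within each batch.

    betas:   list of beta values, one per sample
    batches: list of batch labels (e.g. bisulfite conversion batch), same order
    """
    # Group sample indices by batch label.
    by_batch = defaultdict(list)
    for i, label in enumerate(batches):
        by_batch[label].append(i)

    out = [0.0] * len(betas)
    for idx in by_batch.values():
        vals = [betas[i] for i in idx]
        mu = mean(vals)
        # Assumption: sample SD (n-1); undefined for a single-sample batch.
        sd = stdev(vals) if len(vals) > 1 else 0.0
        for i in idx:
            out[i] = (betas[i] - mu) / sd if sd > 0 else 0.0
    return out
```

After this step each batch has mean zero and unit variance for the probe, so mean shifts between conversion batches no longer contribute to between‐sample differences.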
CVD Risk Modeling
Study‐specific CVD risk models were trained using penalized Cox proportional hazards regressions with the elastic net penalty. CVD events were defined as including coronary heart disease, stroke, and death from CVD (see Data S1 for cohort‐specific details), and times were right‐censored based on the most recent exam available in each cohort. The elastic net alpha parameter was initially set at 0.05 (closer to ridge regression) to retain a higher number of CpGs with non‐zero weights while still performing feature selection.29 Inner cross‐validation loops varying alpha between 0.05 and 0.95 showed negligible differences in model performance (evaluated by mean squared error). The penalty parameter λ was optimized through 5‐fold cross‐validation (use of 10‐fold cross‐validation did not meaningfully change the results). For each model, only the most variable 100 000 CpGs according to median absolute deviation (≈25% of all available sites shared across platforms) were included to decrease the computational burden and ensure that the selected CpGs would have meaningful interindividual variation.
The cross‐study learner (CSL) was constructed as an ensemble of study‐specific regression models. Scores from each single‐study learner (SSL) were combined using the “stacking” approach,20 implemented as follows. First, predictions from each SSL to both itself and the other training data sets were combined into a design matrix (with dimensions Ntotal×# SSLs). This formed the input to an additional penalized Cox regression (ridge regression with λ optimized through 5‐fold cross‐validation and coefficients restricted to be non‐negative) of all training studies at once. Coefficients from this regression, corresponding to input study‐specific SSLs, were normalized to sum to 1 to produce the CSL weights.
For use in new data sets, SSL scores were each standardized to mean zero and unit variance before calculating their weighted sum (using the “stacking” weights) as the final CSL score.
A series of approaches for combining information across cohorts were tested as alternatives to the CSL. The naive “combined” approach consisted of simply aggregating observations from all training sets into a single data set and training an elastic net regression as described above while adjusting for study as a fixed effect. The ComBat method trained across all studies as with the “combined” approach, but included an empirical Bayes‐based preprocessing step to directly remove mean differences across studies that were not associated with the outcome of interest (incident CVD events).18
MRS evaluation in FHS‐UM was performed using Cox proportional hazards models, with a series of models adjusting for covariates including demographics, anthropometrics, biochemical values, and cell subtype estimates. Additional sensitivity models incorporated flexible spline bases for age and cell type fractions (pspline function) and an interaction between age and sex. Robust standard errors were used to account for family structure as has been suggested for clustered data30 and used for epigenetic risk models in FHS.31 The proportional hazards assumption was assessed using the cox.zph R function, and no violation was detected (P>0.05). To compare risk scores generated using different models (combined and ComBat‐preprocessed) to the CSL, Cox regressions adjusting for the “basic” covariate set were used to evaluate each MRS alone, the CSL MRS plus the combined MRS, and the CSL MRS plus the ComBat‐preprocessed MRS in the held‐out FHS‐UM data set. Likelihood ratio tests were then used to compare each of the 2‐MRS models to the CSL‐only model, with the resulting P values indicating whether either of these alternative scores provided additional benefit.
MRS evaluation in the REGICOR case‐control used logistic regression models, adjusting for the same sets of covariates where possible, though traditional biochemical risk factors were only available in discrete low versus high categories.
The biology underlying the CSL model was evaluated through a series of enrichment tests using the component CpG loci and annotated genes. Gene ontology‐based enrichment analysis of each cohort‐specific model was performed using the gometh function from the missMethyl package for R.32 This procedure uses gene annotations for CpGs from the HumanMethylation450 microarray annotation from Illumina (v1.0 B2). Enrichment analysis is then performed for each gene ontology category using Wallenius’ non‐central hypergeometric distribution to account for inconsistent representation of CpG sites across genes. The overall merged set of CpGs included in the final CSL model was then tested for enrichment in transcription factor binding sites using the HOMER tool.33 CpG loci (with respect to genome build hg19) were provided as inputs, with 200 base‐pair windows and repeat‐masked sequences.
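The final scoring step of the cross‐study learner (normalizing the non‐negative stacking coefficients to sum to 1, then taking a weighted sum of standardized single‐study scores) can be sketched as below. This is a hedged illustration only: the stacking regression itself is not reproduced, the function names are hypothetical, and the use of the sample (n−1) standard deviation is an assumption.

```python
from statistics import mean, stdev


def normalize_weights(coefs):
    """Rescale non-negative stacking coefficients so they sum to 1."""
    if any(c < 0 for c in coefs):
        raise ValueError("stacking coefficients are expected to be non-negative")
    total = sum(coefs)
    if total <= 0:
        raise ValueError("at least one coefficient must be positive")
    return [c / total for c in coefs]


def standardize(scores):
    """Center to mean zero and scale to unit variance (assumption: sample SD)."""
    mu, sd = mean(scores), stdev(scores)
    if sd == 0:
        raise ValueError("scores have zero variance")
    return [(s - mu) / sd for s in scores]


def csl_score(ssl_scores, weights):
    """Weighted sum of standardized single-study learner scores.

    ssl_scores: list of score lists, one per SSL, each with one value per subject
    weights:    list of CSL weights, one per SSL (summing to 1)
    """
    z = [standardize(s) for s in ssl_scores]
    n_subjects = len(z[0])
    return [sum(w * zs[i] for w, zs in zip(weights, z)) for i in range(n_subjects)]
```

Because each single‐study score is standardized in the new data set before weighting, differences in scale between cohort‐specific models do not affect their relative contributions; only the stacking weights do.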
Genomic Risk Score Calculation
Imputed genotype data for WHI were retrieved from dbGaP (accession: phs000746.v2.p3). Variants were filtered for imputation R2>0.3 and annotated with rsIDs, loci, and allelic information using the 1000 Genomes Phase 3 download from dbSNP (download date: April 13, 2018). Weights for the genetic risk score calculation (6 630 151 variants) were based on the genome‐wide CVD score developed by Khera et al.15 We note that these scores were developed only for populations of European descent, and thus are not optimized for the mixed‐ancestry WHI population. Genomic risk scores (GRS) were then calculated as the weighted sum of allelic dosages, normalized by the number of relevant SNPs available. Genotype data processing and GRS calculation were performed using PLINK 2.0.
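The GRS definition above (a weighted sum of allelic dosages divided by the number of scoring variants available for that individual) can be written as a small function. This sketch is illustrative only and does not reproduce the PLINK 2.0 computation; the function name, the dictionary layout, and the variant identifiers in the example are hypothetical.

```python
def genomic_risk_score(dosages, weights):
    """Weighted sum of allelic dosages, normalized by the number of available SNPs.

    dosages: dict mapping variant ID -> effect-allele dosage (0 to 2), or None if missing
    weights: dict mapping variant ID -> per-allele weight from the published score
    Returns None when no scoring variant is available.
    """
    used = [
        (dosages[v], w)
        for v, w in weights.items()
        if dosages.get(v) is not None
    ]
    if not used:
        return None
    return sum(d * w for d, w in used) / len(used)


# Hypothetical three-variant score; the third variant is missing for this person.
example_weights = {"rsA": 0.5, "rsB": 0.25, "rsC": 1.0}
example_dosages = {"rsA": 2, "rsB": 1, "rsC": None}
example_grs = genomic_risk_score(example_dosages, example_weights)  # (1.0 + 0.25) / 2
```

Dividing by the number of available variants keeps scores comparable between individuals with different amounts of missing genotype data.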
Risk Score Interaction Analysis
Interaction analysis was performed using similar Cox regression models to those above, adjusting for the “basic” set of covariates and using robust standard error estimates. To facilitate visual comparisons, main‐effect regressions for the MRS were fitted within risk strata defined by the Framingham Risk Score (FRS) or genomic risk score (GRS), both separately in each data set having >25 events in the group, and after merging these data sets and allowing for stratified baseline hazards (strata() argument to the coxph function). To obtain overall interaction effect estimates, an interaction between MRS and either FRS or GRS was introduced into a combined regression including all data sets, while allowing stratified baseline hazards. We note that main effects in the interaction analysis are biased away from the null since the regression data sets were used for training the MRS. Regressions assessing the GRS excluded non‐European ancestry participants to match the ancestry used to develop the CVD score.15
For quasi‐replication of these associations in the REGICOR data set, stratified logistic regressions were used to discriminate MI cases from controls using the MRS, while adjusting for estimated cell count fractions as well as 2 surrogate variable analysis components (as in the main REGICOR models). In the absence of continuous values for blood pressure and lipids, an empirical risk function was generated by first performing a logistic regression on the following cardiovascular risk factors: age, sex, estimated cell count fractions, body mass index, diabetes mellitus, smoking status, hyperlipidemia (binary), and hypertension (binary), along with 2 surrogate variable analysis components. Predicted risks based on this model were then used to stratify subjects into 4 risk groups by evenly splitting the range of predicted risks into 4 segments (thus resulting in strata based on raw risk, rather than percentiles).
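The REGICOR stratification rule (4 equal‐width segments of the predicted risk range, rather than quartiles) can be made concrete with a short sketch. The function name is hypothetical, and the handling of the upper boundary and of a zero‐width range are assumptions not specified in the text.

```python
def equal_width_strata(risks, n_strata=4):
    """Assign each predicted risk to one of n_strata equal-width segments.

    Strata are numbered 0 .. n_strata-1 and are defined on the raw risk
    scale (range split evenly), not by percentiles, so group sizes can differ.
    """
    lo, hi = min(risks), max(risks)
    width = (hi - lo) / n_strata
    if width == 0:
        # Assumption: identical risks all fall in the lowest stratum.
        return [0] * len(risks)
    # The maximum risk would index one past the end, so clamp it to the top stratum.
    return [min(int((r - lo) / width), n_strata - 1) for r in risks]


example_strata = equal_width_strata([0.0, 0.1, 0.3, 0.5, 0.9, 1.0])
```

In this example the six subjects are assigned to strata 0, 0, 1, 2, 3, and 3; a quartile‐based split would instead force roughly equal group sizes regardless of how the risks are distributed.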
Results
Cross‐Study Learner Model Development
Epigenomic model development was performed in 3 cohorts, including the WHI, FHS, and LBC 1936. The FHS data set was divided into 2 functionally separate groups (FHS‐JHU and FHS‐UM) based on differences in subject selection and geographic location of laboratory methylation analysis (see Methods). Further population details can be found in Table 1.
Table 1
Baseline Parameters of the Populations Used for Model Development
| Study/Subset | WHI | FHS‐JHU | LBC | FHS‐UM |
|---|---|---|---|---|
| Sample size | 2023 | 484 | 818 | 2103 |
| Age, y | 65 (59–70) | 71 (64–77) | 69 (68–70) | 64 (59–71) |
| Sex (women) | 2023 (100%) | 145 (30%) | 406 (50%) | 1270 (60%) |
| Ancestry: % European | 959 (47%) | 484 (100%) | 818 (100%) | 2103 (100%) |
| Ancestry: % African American | 651 (32%) | 0 (0%) | 0 (0%) | 0 (0%) |
| Ancestry: % Hispanic | 413 (20%) | 0 (0%) | 0 (0%) | 0 (0%) |
| Body mass index, kg/m2 | 29.1 (25.5–33.3) | 28.2 (25.5–31.3) | 27.5 (24.9–30.3) | 27.4 (24.3–31) |
| LDL cholesterol, mg/dL | 150 (126–175) | 88 (73–107) | 118 (89.5–150.3) | 107 (87–128) |
| HDL cholesterol, mg/dL | 51 (43–60) | 49 (40–60) | 56.1 (47.2–68.3) | 56 (45.8–69) |
| Triglycerides, mg/dL | 127 (92–177) | 101.5 (75–141.2) | 128.4 (97.4–171.2) | 102 (73–142) |
| Fasting glucose, mg/dL | 96 (88.6–108) | 106 (97–116) | Unavailable | 100 (94–109) |
| Systolic blood pressure, mm Hg | 131 (120–143) | 130 (117–143) | 148.7 (137–161.3) | 126 (116–138) |
| No. CVD events: prior only | 0 | 127 | 70 | 112 |
| No. CVD events: incident only | 1009 | 67 | 133 | 146 |
| No. CVD events: prior and incident | 0 | 58 | 164 | 34 |
| No. CVD events: total | 1009 | 252 | 367 | 292 |
| Follow‐up time, y | 22 | 10 | 14 | 10 |
Continuous values shown as: median (interquartile range). CVD indicates cardiovascular disease; FHS‐JHU, Framingham Heart Study Offspring Cohort (Johns Hopkins University subset); FHS‐UM, Framingham Heart Study Offspring Cohort (University of Minnesota subset); HDL, high‐density lipoprotein; LBC, Lothian Birth Cohorts 1936; LDL, low‐density lipoprotein; and WHI, Women's Health Initiative.
Figure 1 outlines the computational workflow. Briefly, a cross‐study learning (CSL) model was developed by training time‐to‐event elastic net regressions on 3 of the data sets, while holding out the FHS‐UM subset for evaluation. The FHS‐UM subset was chosen as the held‐out set because it more closely represents the larger free‐living Framingham population. While there is moderate heterogeneity between the included cohorts (for example, in original cohort study designs, details of CVD definitions, and length of follow‐up), the intent of the present investigation was to explore the extraction of shared signal across cohorts with recognized heterogeneity. Next, a model re‐trained on all 4 data sets was subjected to external replication in the REGICOR study. CSL model CpGs were characterized as to their potential biological function, and model performance was assessed across strata of alternative cardiovascular risk metrics.
Figure 1
Computational workflow for MRS development and evaluation.
The initial MRS was trained in 3 cohorts with Framingham Heart Study Offspring Cohort (University of Minnesota subset) held out to evaluate performance. The final MRS was then trained using all 4 data sets and examined for biological significance, before testing for prevalent myocardial infarction discrimination in an independent cohort and assessment of interactions with genetic and traditional risk scores. FHS‐JHU indicates Framingham Heart Study Offspring Cohort (Johns Hopkins University subset); FHS‐UM, Framingham Heart Study Offspring Cohort (University of Minnesota subset); LBC, Lothian Birth Cohorts 1936; MI, myocardial infarction; MRS, methylation‐based risk score; and WHI, Women's Health Initiative.
The initial predictor was developed by training individual penalized Cox proportional hazards regression models (single‐study learners, or SSLs) in each of the 3 training cohorts (WHI, FHS‐JHU, and LBC). Scores from these models were aggregated through a “stacking” method, in which the outcomes and model predictions from each of the individual data sets are combined, and a regression is used to assign weights to each of the model scores (see Methods). This procedure led to FHS‐JHU dropping out of the ensemble model, with weights for this initial predictor as follows: 0.57 (WHI), 0.0 (FHS‐JHU), and 0.43 (LBC). This result means that the FHS‐JHU score did not transfer to the rest of the data sets (ie, to WHI and LBC) as well as the scores from the other 2 component models.
Assessment in Held‐Out FHS Subset
Stacking of the 3 initial predictors resulted in model weights of 0.57, 0, and 0.43 for WHI, FHS‐JHU, and LBC, respectively (ie, the FHS‐JHU sub‐model did not contribute to the initial stacked ensemble model). The resulting ensemble predictor was evaluated using robust Cox proportional hazards models in FHS‐UM, showing strong associations with incident CVD in an unadjusted model (hazard ratio [HR]=1.58, 95% CI, 1.37–1.83), which was attenuated partially through adjustment for standard covariates (age, sex, and estimated cell type fractions; HR, 1.28; 95% CI, 1.10–1.50) as well as CVD risk factors (HR, 1.29; 95% CI, 1.09–1.51). Results for the unadjusted model and 3 risk factor‐adjusted models are shown in Table 2, and associated Kaplan–Meier curves across epigenetic risk tertiles are shown in Figure 2.
Figure 2
Kaplan–Meier survival curves in the held‐out Framingham Heart Study Offspring Cohort (University of Minnesota subset) data set.
Individual curves correspond to tertiles of the initial (3‐data set) methylation‐based risk score. Vertical ticks correspond to censored observations, and colored bands represent 95% CI for tertile‐specific survival curves. X‐axis is limited to the time span in which at least 50 uncensored observations remained for each tertile (3275 days). MRS indicates methylation‐based risk score.
Table 2
MRS Performance in Held‐Out FHS Subset
FHS indicates Framingham Heart Study; FRS, Framingham Risk Score; HR, hazard ratio; and MRS, methylation‐based risk score.
Estimated hazard ratio per SD of the methylation‐based risk score [95% CI].
No covariates.
Adjusted for age, sex, and estimated cell type fractions.
Additionally adjusted for body mass index, low‐density lipoprotein cholesterol, high‐density lipoprotein cholesterol, systolic blood pressure, diabetes mellitus status, and current smoking.
Adjusted for Framingham Risk Score (uses all risk factors other than body mass index and cell type fractions).
Additional sensitivity analyses were performed to assess the robustness of these results to variations in the model‐building or evaluation approach. Hazard ratios in the held‐out FHS‐UM were no higher using penalized logistic regression in training (unadjusted HR, 1.52; 95% CI, 1.32–1.76), excluding individuals with past events in training (unadjusted HR, 1.55; 95% CI, 1.33–1.81), or adjusting for race in WHI (unadjusted HR, 1.20; 95% CI, 1.03–1.39). Neither were these results affected by training using the full set of 390 597 CpGs. Similarly, variations in the evaluation regressions did not produce meaningfully different results, either when excluding all individuals who experienced prior CVD events (Table S1), analyzing incident CVD as a binary outcome using logistic regression (unadjusted odds ratio per SD=2.15, 95% CI, 1.91–2.42), or stratifying by sex. Adjustment for age and cell type fractions as flexible spline functions as well as an age‐sex interaction to assess possible residual confounding did not decrease estimated HRs from the basic model (saturated model HR, 1.31; 95% CI, 1.12–1.52). Use of the MRS for binary incident CVD prediction resulted in a c‐statistic of 0.642 (95% CI, 0.599–0.685), compared with 0.691 (95% CI, 0.653–0.729) for the Framingham Risk Score alone and 0.695 (95% CI, 0.655–0.734) using the 2 scores together.
Results from comparison of CSL performance to models trained on combined data sets (either naive combination or including preprocessing using ComBat) are shown in Figure S1.
The ComBat‐preprocessed model had modestly higher hazard ratios in FHS‐UM, while relative differences with the combined model depended on the covariates included. However, likelihood ratio tests using the basic model covariates (age, sex, and cell type fraction‐adjusted) did not reveal a strong added benefit of either the combined (P=0.58) or ComBat (P=0.08) risk scores over that using only the CSL.
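The likelihood ratio comparison described above can be sketched as follows. This is a minimal illustration, assuming that the larger Cox model adds exactly one term (the alternative risk score) to a model that already contains the CSL score and the basic covariates, so that the test statistic is referred to a chi-square distribution with 1 degree of freedom. The function name and the log-likelihood values are hypothetical and are not taken from the analysis.

```python
import math


def lrt_pvalue_df1(loglik_reduced: float, loglik_full: float) -> float:
    """P value for a likelihood ratio test with 1 degree of freedom.

    loglik_reduced: maximized (partial) log-likelihood of the smaller model.
    loglik_full: maximized (partial) log-likelihood of the model with one extra term.
    """
    stat = 2.0 * (loglik_full - loglik_reduced)
    if stat < 0:
        raise ValueError("the full model cannot fit worse than the nested model")
    # For a chi-square variable with 1 df, P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(stat / 2.0))


# Hypothetical log-likelihoods, for illustration only.
p_value = lrt_pvalue_df1(loglik_reduced=-1520.40, loglik_full=-1518.90)
print(f"LRT P value: {p_value:.3f}")
```

A large P value from such a test would indicate that the added score carries little information beyond the score already in the model, which is the sense in which the text reports no strong added benefit.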
Final CSL Model Characterization
The stacking regression in the final CSL model defining the methylation‐based risk score (MRS) gave the most weight to WHI (0.48) and LBC (0.38), while retaining non‐zero weights for FHS‐JHU (0.06) and FHS‐UM (0.08). This result indicates that the WHI‐ and LBC‐trained models were better able to transfer across the combined‐cohort set of outcomes compared with the other models. There was little overlap of specific CpG sites across cohort‐specific models, with a maximum of 13 CpGs shared between 2 models (WHI and FHS‐UM) and no CpGs shared among 3 or more models (Figure 3A). This could result from heterogeneity in the complex relationships between DNA methylation and CVD across populations. However, it may also reflect the tendency of the elastic net regression to select only a single feature from a group of correlated features, where the specific CpGs selected in different data sets varied because of the presence of biological and technical noise. Even if the SSLs were capturing different biological mechanisms, the CSL model is designed to capture such heterogeneous signal from across cohorts. Despite the lack of site‐specific overlap, there was broad agreement for 3 of the 4 component SSL models at the level of enriched biological processes, with all except FHS‐JHU enriched most strongly for proximity to genes involved in homophilic cell adhesion (Figure 3C). MRS component CpGs tended to be found in similar genomic loci to the overall set of variable CpGs and were enriched in gene bodies and depleted in CpG islands compared with the full microarray CpG set. However, MRS CpGs did show a modest enrichment in and around CpG islands compared with the set of variable CpGs (Figure 3D). To seek more clarity as to potential biological mechanisms represented by the MRS, the HOMER tool was used to calculate enrichment of transcription factor binding motifs in the MRS component CpG sites. 
Using the union of all individual SSL CpG sites as input, no strong enrichments were found (all q>0.5).
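As an illustration of how the stacking weights reported above combine the cohort‐specific models, the sketch below computes an ensemble score as a weighted sum of single‐study learner (SSL) linear predictors. The weights are those given in the text; the CpG identifiers, coefficients, and methylation values are hypothetical placeholders, and treating a missing CpG as contributing zero is an assumption made only for this example.

```python
from typing import Dict

# Stacking weights reported for the final CSL model.
STACKING_WEIGHTS: Dict[str, float] = {
    "WHI": 0.48,
    "LBC": 0.38,
    "FHS-JHU": 0.06,
    "FHS-UM": 0.08,
}


def ssl_score(coefficients: Dict[str, float], methylation: Dict[str, float]) -> float:
    """Linear predictor of one single-study learner (sum of coefficient x value).

    CpGs absent from the sample contribute zero (an assumption of this sketch).
    """
    return sum(beta * methylation.get(cpg, 0.0) for cpg, beta in coefficients.items())


def ensemble_mrs(
    ssl_models: Dict[str, Dict[str, float]],
    weights: Dict[str, float],
    methylation: Dict[str, float],
) -> float:
    """Weighted sum of the SSL linear predictors."""
    return sum(weights[name] * ssl_score(ssl_models[name], methylation) for name in weights)


# Hypothetical SSL coefficients and one hypothetical sample.
toy_models = {
    "WHI": {"cg_a": 1.0},
    "LBC": {"cg_b": 2.0},
    "FHS-JHU": {"cg_a": -1.0},
    "FHS-UM": {"cg_c": 0.5},
}
toy_sample = {"cg_a": 0.5, "cg_b": 0.25, "cg_c": 1.0}
print(ensemble_mrs(toy_models, STACKING_WEIGHTS, toy_sample))
```

Because the component models share few CpGs, each contributes largely distinct features, and the weights determine how much each cohort's signal influences the final score.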
Figure 3
Characterization of the final cross study learner model.
A, Overlap of cytosine‐phosphate‐guanine (CpG) sites in the 4 individual predictors constituting the final model. B, Study‐specific weights for constructing the ensemble model (derived from the “stacking” regression). C, Results from Gene Ontology (GO)‐based enrichment analysis using genes annotated to single‐study learner component CpGs. All GO terms with false discovery rate <0.001 in any cohort are shown and colored according to −log(P value) for enrichment in each single‐study learner. Values were cut at −log(P)=20 for visualization purposes. D, Proportion of CpGs in the full set of cross study learner CpGs (union of CpG sets in each component SSL) compared with the 100 000 most variable CpGs (as used in single‐study learner model development) and the full set of available CpGs. Groupings according to both gene‐based and CpG island‐based CpG annotations are shown. CpG indicates cytosine‐phosphate‐guanine; FHS‐JHU, Framingham Heart Study Offspring Cohort (Johns Hopkins University subset); FHS‐UM, Framingham Heart Study Offspring Cohort (University of Minnesota subset); LBC, Lothian Birth Cohorts 1936; MRS, methylation‐based risk score; WHI, Women's Health Initiative; UTR, untranslated region; and TSS, transcription start site.
To better understand the stability of the risk score over time, intraclass correlation coefficients (ICCs) were calculated for 2 sets of grouped samples: 26 technical replicates from FHS and ≈1000 longitudinal samples (across 3 visits, or about 6 years total) from LBC (Table S2). The technical replicates showed an ICC of 0.85, while the longitudinal samples showed an ICC of 0.68. As would be expected, the ICC for samples closer in time (Waves 1 & 2; ICC=0.69) was higher than that for samples more distant in time (Waves 1 & 3; ICC=0.61). 
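For readers unfamiliar with the ICC, the sketch below computes a one‐way random‐effects ICC from repeated measurements of the same quantity. The specific ICC formulation used in the analysis is not stated here, so the one‐way form, the balanced design, and the function name are assumptions made for illustration.

```python
from statistics import mean
from typing import Sequence


def icc_oneway(groups: Sequence[Sequence[float]]) -> float:
    """One-way random-effects ICC for a balanced design.

    groups: one inner sequence per subject, each holding k repeated scores.
    """
    n = len(groups)
    k = len(groups[0]) if n else 0
    if n < 2 or k < 2 or any(len(g) != k for g in groups):
        raise ValueError("need >= 2 subjects, each with the same number (>= 2) of scores")
    grand_mean = mean(x for g in groups for x in g)
    group_means = [mean(g) for g in groups]
    # Between-subject and within-subject mean squares from one-way ANOVA.
    ms_between = k * sum((m - grand_mean) ** 2 for m in group_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    ) / (n * (k - 1))
    denominator = ms_between + (k - 1) * ms_within
    if denominator == 0:
        raise ValueError("ICC is undefined when all scores are identical")
    return (ms_between - ms_within) / denominator


# Hypothetical risk scores for 3 subjects measured twice each.
print(icc_oneway([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]))
```

Values near 1 indicate that most of the variance lies between subjects rather than between repeated measurements of the same subject.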
Based on the observation of imperfect stability of the MRS over time as well as the partial attenuation in held‐out hazard ratios after adjustment for age, its component CpGs (the 1305‐element union of all CpGs in any of the 4 individual SSL models) were examined for overlap with established epigenetic age metrics. While no enrichment was seen for the original cross‐tissue DNAm age from Horvath,34 strong enrichment was seen for the morbidity‐directed PhenoAge5 (9 of 513 CpGs; P=2.3e‐5) and especially the blood‐specific aging marker from Hannum et al35 (13 of 71 CpGs; P=5.9e‐21). We note that these overlaps do not constitute a major fraction of either CpG set but are nonetheless highly statistically significant. The PhenoAge metric is based on some known cardiovascular risk factors (eg, C‐reactive protein) and is known to associate with CVD but is not trained in any of the cohorts included here.
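One common way to assess whether an overlap between 2 CpG sets exceeds chance is an upper‐tail hypergeometric test, sketched below. The test actually used for the enrichment P values above, and the background set it was computed against, are not specified in this section; the background size and the assumption that every marker CpG lies within the background are therefore hypothetical, and this sketch is not expected to reproduce the reported P values.

```python
from fractions import Fraction
from math import comb


def overlap_pvalue(background: int, marker_set: int, score_set: int, overlap: int) -> float:
    """Upper-tail hypergeometric probability P(X >= overlap).

    background: number of CpGs that could have been selected.
    marker_set: number of background CpGs belonging to the external metric.
    score_set: number of CpGs in the risk score.
    overlap: observed number of CpGs in both sets.
    """
    max_overlap = min(marker_set, score_set)
    favorable = sum(
        comb(marker_set, i) * comb(background - marker_set, score_set - i)
        for i in range(overlap, max_overlap + 1)
    )
    # Exact integer arithmetic avoids overflow before converting to float.
    return float(Fraction(favorable, comb(background, score_set)))


# Set sizes from the text (1305 score CpGs; 9 of 513 PhenoAge CpGs shared).
# The background of 390 597 CpGs is an assumption made for illustration.
print(overlap_pvalue(background=390_597, marker_set=513, score_set=1305, overlap=9))
```

The result depends strongly on the choice of background, which is why a small absolute overlap can still be highly significant when both sets are small relative to the array.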
Discrimination in Myocardial Infarction Case‐Control Study
As one form of replication, the MRS was investigated for its discriminative performance in a nested case‐control study of prior myocardial infarction in the REGICOR cohort (Table 3; cohort description in Table S3), which was matched for sex and age and thus free of potential confounding by these variables. We note that this data set contained prevalent (rather than incident) events, and thus provides replication in a similar but not identical biological context. These methylation data were collected on the next generation of Illumina methylation microarray (MethylationEPIC), which does not perfectly overlap with the HumanMethylation450 platform but contains ≈93% of the CpGs input to the MRS model training procedure. The MRS was able to discriminate cases and controls in both unadjusted (odds ratio=1.79, P=6.33e‐6) and, to a lesser degree, risk factor‐adjusted models (odds ratio=1.61, P=0.019). Odds ratios were qualitatively similar across modeling strategies (Combined, ComBat, and CSL) for all of the adjustment models (Figure S1B).
Table 3
Results From Replication in REGICOR Myocardial Infarction Case Control

| Model | ComBat | Combined | CSL |
| --- | --- | --- | --- |
| Minimal (a) | 1.79 [1.39–2.31] | 1.86 [1.45–2.38] | 1.83 [1.41–2.37] |
| Basic (b) | 2.16 [1.58–2.93] | 2.12 [1.57–2.87] | 2.14 [1.58–2.89] |
| Plus risk factors (c) | 1.76 [1.22–2.54] | 1.66 [1.15–2.4] | 1.61 [1.11–2.34] |

Results are presented as: odds ratio per SD methylation‐based risk score [95% CI]. CSL indicates cross study learner; MRS, methylation‐based risk score; and REGICOR, Registre Gironí del COR.
(a) Adjusted for 2 surrogate variable analysis components.
(b) Additionally adjusted for age, sex, and estimated cell type fractions.
(c) Further adjusted for body mass index, low‐density lipoprotein, high‐density lipoprotein cholesterol, systolic blood pressure, diabetes mellitus status, and current smoking.
Interactions With Alternate Risk Metrics
To understand how the present risk score interacts with other established CVD risk metrics, the performance of the MRS was re‐evaluated after stratifying individuals by risk scores reflecting either demographic and biochemical features (Framingham Risk Score [FRS]) or genetic variants (genetic risk score [GRS] based on Khera et al15). First, the marginal effects of these risk scores were confirmed in each population. The FRS was strongly predictive in WHI and FHS, while surprisingly showing no association with CVD incidence in LBC (Table S4). Because WHI was the data set with the largest number of available events, imputed genotypes were retrieved and the GRS was calculated in this cohort, demonstrating a moderate association with CVD (odds ratio per SD=1.28, P=1.1e‐6).
In pooled Cox models using study‐specific baseline hazards and performed using the final 4‐study MRS, it appeared that the MRS was more effective in those in lower “traditional” risk strata (based on models stratified by FRS categories; Figure 4A). As a sensitivity analysis, the cohorts were fully stratified into separate models, in which this pattern was visually clear in WHI and FHS‐JHU (Figure S2). The pattern did not appear in LBC, although we note that the Framingham Risk Score also did not show a “main effect” for incident CVD in this cohort. A similar pattern appeared with respect to genetic risk in WHI (European ancestry participants only based on the formulation of the relevant risk score), in which maximum MRS performance was achieved in the lowest alternative risk stratum. Supplementing these visual comparisons, combined Cox regressions across all cohorts (allowing for different baseline hazards across studies) showed a strong MRS‐FRS interaction effect (7% reduction in HR for the MRS per 10% increase in FRS; P=8.27e‐05), while that for the MRS‐GRS interaction did not reach nominal statistical significance (2% reduction in HR for the MRS per SD increase in GRS; P=0.719).
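To make the reported interaction estimate more concrete, the sketch below shows how a 7% reduction in the MRS hazard ratio per 10% increase in FRS would scale the hazard ratio across FRS levels. It assumes that the interaction is linear on the log‐hazard scale and that "10%" refers to 10 percentage points of estimated 10‐year risk; the reference hazard ratio and reference FRS are hypothetical values chosen only for illustration.

```python
def mrs_hr_at_frs(
    hr_reference: float,
    frs_reference: float,
    frs: float,
    hr_ratio_per_10_points: float = 0.93,
) -> float:
    """MRS hazard ratio at a given FRS under a log-linear interaction.

    hr_reference: MRS hazard ratio at frs_reference (hypothetical in this sketch).
    frs, frs_reference: estimated 10-year risk in percentage points.
    hr_ratio_per_10_points: multiplicative change in the MRS hazard ratio per
        10-point FRS increase (0.93 corresponds to the reported 7% reduction).
    """
    return hr_reference * hr_ratio_per_10_points ** ((frs - frs_reference) / 10.0)


# Hypothetical reference: an MRS hazard ratio of 1.5 at an FRS of 5%.
for frs_value in (5.0, 15.0, 25.0):
    print(frs_value, round(mrs_hr_at_frs(1.5, 5.0, frs_value), 3))
```

Under these assumptions the per‐SD association of the MRS shrinks multiplicatively as traditional risk rises, which matches the qualitative pattern described for the stratified models.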
Figure 4
Interactions of MRS with other biomarkers of CVD risk.
A, Hazard ratios for the MRS within subsets of 10‐year generalized CVD risk according to the Framingham Risk Score. B, Hazard ratios for the MRS within quartiles of a genetic cardiovascular risk score (in European‐ancestry WHI participants only). Hazard ratios are estimated using the final MRS, which was trained using each of these data sets. Cox regressions included stratum‐specific baseline hazards and were adjusted for age, sex, and estimated cell subtype fractions. Error bars represent standard errors for the hazard ratio estimates. Annotated P values describe the test of interaction between the MRS and the alternative risk metric. FRS indicates Framingham Risk Score; GRS, genetic risk score; and MRS, methylation‐based risk score.
To explore the clinical potential of these interactions further, we returned to the initial MRS (trained in 3 data sets with FHS‐UM held out). The FHS‐UM data set was filtered to include only participants with lower CVD risk based on the FRS (<10% estimated 10‐year risk). Within this lower‐risk subset, participants in the upper MRS quintile had more than double the risk of the remainder of the participants: 7% (12/176) of the upper MRS quintile experienced incident events, while 3% (19/701) of the remaining 4 MRS quintiles experienced incident events.
FRS could not be calculated in the REGICOR data set, as not all risk factors were available as continuous values. However, stratified models replicated the observation of greater MRS discrimination in the lowest alternative risk stratum. An empirical risk function was generated through logistic regression of MI status on cardiovascular risk factors (age, sex, body mass index, diabetes mellitus, smoking status, hyperlipidemia [binary], and hypertension). Predicted MI risk using this model was used to stratify subjects into 4 risk groups, with MRS odds ratios (per SD) of 4.49 in the lowest‐risk group versus 1.20 in the highest‐risk group. 
More detailed results from these analyses are shown in Table S5.
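The comparison of event rates within the lower‐risk FHS‐UM subset reported above (12/176 in the upper MRS quintile versus 19/701 in the remaining quintiles) can be checked with a few lines of arithmetic. The helper name is hypothetical; the counts are those given in the text.

```python
def risk(events: int, total: int) -> float:
    """Proportion of participants who experienced an incident event."""
    if total <= 0:
        raise ValueError("total must be positive")
    return events / total


upper = risk(12, 176)   # upper MRS quintile
rest = risk(19, 701)    # remaining 4 MRS quintiles
risk_ratio = upper / rest

print(f"upper quintile {upper:.1%}, remaining quintiles {rest:.1%}, risk ratio {risk_ratio:.2f}")
# prints: upper quintile 6.8%, remaining quintiles 2.7%, risk ratio 2.52
```

The crude risk ratio of about 2.5 is consistent with the statement that the upper quintile had more than double the risk of the remaining participants.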
Discussion
Epigenetic signatures of cardiometabolic diseases and aging in general are being actively explored as biomarkers of disease risk that are potentially modifiable and reveal underlying biological mechanisms. Here, in a novel application of a cross‐study ensembling method, we introduce a DNA methylation‐based score specific to cardiovascular disease risk. The model performs similarly to one trained on a direct combination of the component data sets and may perform best in individuals predicted to be at lower risk based on traditional risk factors.
We opted to use cross‐study learning to train our risk model based on the expectation that differences across cohorts (eg, demographic, behavioral) may contribute to heterogeneity in both the marginal distribution of the CpG features and the conditional distribution of the CVD outcome. Under these conditions, the generalizability of a single‐study predictor is often obscured or overstated.36, 37 The performance of the CSL model was similar to that of models trained on the merged cohorts with or without batch adjustment via ComBat. This suggests that the assumptions made by these direct combination strategies (ie, that the heterogeneity structure can be captured by variation in the marginal effects of each CpG site) are met. In practice, this underlying structure is unknown, and we highlight that the CSL was able to produce similar gains in accuracy without making specific assumptions.
In assessing the stability of the MRS, we observed reasonable reproducibility between technical replicates (ICC=0.85). ICCs for LBC subjects over time were somewhat lower (ICC=0.68), which is to be expected because of not only changes in environment, but also the known epigenetic evolution with age that we observed to be enriched in the components of our score. 
Furthermore, this value is at the upper end of the range of single‐CpG repeatability measurements over time calculated in the combined Lothian Birth Cohorts (1921 and 1936).38 These ICC values suggest an imperfect but usable reproducibility of the MRS, and an aggregate marker that is fairly robust considering the low replicability that has been observed for individual sites in technical replicates (general median ICC of 0.3 and mode of 0.75 in a “high reliability” cluster).39
Our observation that different CpGs tended to be selected across studies (Figure 3) is in agreement with the relative lack of replication seen in prior cardiovascular epigenomic studies.7 However, the enrichment of the MRS component CpGs for proximity to genes related to cell‐cell adhesion (in all subsets except FHS‐JHU) is indicative of shared underlying biological mechanisms. As we have previously observed in the WHI and FHS cohorts, it appears that immune activation is central to the prognostic information contained in leukocyte DNA methylation.14 For example, epigenetic processes have been shown to be involved in the activation and increased adhesion of monocytes in response to environmental insults and metabolic stress, though these have been explored primarily in relationship to histone modifications.40 Our results provide preliminary support for an attractive model in which a methylation‐based score could act as a monitor of cumulative stress in leukocytes and their corresponding activation towards a more atherogenic state.
Existing epigenetic scores have shown varying strengths of association with incident cardiovascular disease. 
An early investigation examined blood‐based methylation in LINE‐1 elements, finding strong associations of global hypomethylation with prevalent and incident ischemic heart disease, though additional reports showed opposite associations of methylation at repetitive elements with CVD.41 Guarrera et al developed a biomarker for MI based on global LINE‐1 and ZBTB12 gene methylation that provided a modest net reclassification index improvement (0.23–0.47) compared with traditional risk factors only. Multiple epigenetic aging metrics, though not developed specifically for CVD, have been shown to predict incident CHD, including PhenoAge (odds ratios from 1.02–1.08) and GrimAge (hazard ratio=1.07, adjusted for age and technical factors).5, 31 While these associations are statistically significant, they do not represent clinically meaningful improvements in discrimination. Our observed hazard ratio of 1.28 (Basic model in the held‐out FHS‐UM data set) indicates that this MRS may be closer to clinical relevance. We note that our component CpG sites show statistically significant overlap with those of established epigenetic aging metrics, including PhenoAge, suggesting that the MRS captures some of the same biological patterns. However, the mechanistic significance of the specific methylation signals captured by these aging‐related metrics, whether as markers of epigenetic regulation breakdown or the work of an “epigenetic maintenance system”, is still unclear.34, 42
In examining the potential clinical utility of a novel risk score for CVD, it is important to understand to what extent it is redundant with or complementary to existing risk metrics. 
We first note that the strength of this epigenetic score in adjusted models is lower than that found for traditional risk scores (Table S4) and some novel biochemical risk measures such as high‐sensitivity Troponin I (adjusted HR for global CVD=3.01).43 However, analysis of interactions between different risk metrics can be clinically relevant, as demonstrated for example in a recent investigation exploring the interaction between genetic and lifestyle‐based risk prediction for dementia.44 Here, we saw a pattern of stronger epigenetic risk associations in individuals whose cardiovascular risk based on traditional metrics (here, the Framingham Risk Score) was low. This pattern replicated in the REGICOR data set (though FRS could not be directly calculated), with improved MRS discrimination in lower‐risk subjects based on an empirical risk function. While these associations are preliminary, they suggest that an epigenetic risk score could help identify higher‐risk individuals who otherwise would not have been detected by other metrics. While we did not identify any robust patterns of differential MRS performance in strata based on a genetic cardiovascular risk score, there may have been lower power to detect any such patterns from the outset given the modest discriminatory performance of the GRS in WHI.
Multiple limitations should be acknowledged. While lymphocytes are known to be important in CVD pathogenesis, epigenetic signals have been reported in other CVD‐relevant tissues, such as aorta and other vascular tissues,7 that were not examined here. Additionally, the present definition of CVD was chosen to balance specificity of CVD subtypes with sample size, but this balance could be altered to focus on more specific disease subtypes (eg, myocardial infarction) or a broader definition of CVD (eg, including heart failure). 
Finally, while the REGICOR data set provided an important age‐ and sex‐matched case‐control setting for replication of the MRS, this work would benefit from future replication in an independent cohort enabling assessment of incident disease.
In summary, we have developed an epigenetic risk score for cardiovascular disease that provides additional value beyond existing risk measures and may show improved performance in populations otherwise designated as low risk. Furthermore, we have shown a novel application of a cross‐cohort ensembling method that may provide significant value to future investigations in genomic epidemiology.
Sources of Funding
This work was supported by the US Department of Agriculture, Agriculture Research Service (8050–51000‐098‐00D). Dr. Westerman was additionally supported by National Institutes of Health predoctoral training grant 5T32HL069772‐14. The WHI program is funded by the National Heart, Lung, and Blood Institute, National Institutes of Health, US Department of Health and Human Services through contracts HHSN268201600018C, HHSN268201600001C, HHSN268201600002C, HHSN268201600003C, and HHSN268201600004C. This article was prepared in collaboration with investigators of the WHI but has not been reviewed by the WHI and does not necessarily reflect the opinions of the WHI investigators or the National Heart, Lung, and Blood Institute. The Framingham Heart Study is conducted and supported by the National Heart, Lung, and Blood Institute in collaboration with Boston University (Contract No. N01‐HC‐25195 and HHSN268201500001I). This article was not prepared in collaboration with investigators of the Framingham Heart Study and does not necessarily reflect the opinions or views of the Framingham Heart Study, Boston University, or National Heart, Lung, and Blood Institute. The LBC 1936 is supported by Age UK (Disconnected Mind program) and the Medical Research Council (MR/M01311/1). Methylation typing in LBC 1936 was supported by Centre for Cognitive Ageing and Cognitive Epidemiology (Pilot Fund award), Age UK, The Wellcome Trust Institutional Strategic Support Fund, The University of Edinburgh, and The University of Queensland. LBC 1936 work was conducted in the Centre for Cognitive Ageing and Cognitive Epidemiology, which supported Dr. Deary and is supported by the Medical Research Council and Biotechnology and Biological Sciences Research Council (MR/K026992/1).
Disclosures
None.
Supplementary material: Data S1; Tables S1–S5; Figures S1–S2; References 45–49.