Stephen Burgess, Robert A Scott, Nicholas J Timpson, George Davey Smith, Simon G Thompson.
Abstract
Finding individual-level data for adequately-powered Mendelian randomization analyses may be problematic. As publicly-available summarized data on genetic associations with disease outcomes from large consortia are becoming more abundant, use of published data is an attractive analysis strategy for obtaining precise estimates of the causal effects of risk factors on outcomes. We detail the necessary steps for conducting Mendelian randomization investigations using published data, and present novel statistical methods for combining data on the associations of multiple (correlated or uncorrelated) genetic variants with the risk factor and outcome into a single causal effect estimate. A two-sample analysis strategy may be employed, in which evidence on the gene-risk factor and gene-outcome associations is taken from different data sources. These approaches allow risk factors that are suitable targets for clinical intervention to be identified efficiently from published data, although the ability to assess the assumptions necessary for causal inference is diminished. Methods and guidance are illustrated using the example of the causal effect of serum calcium levels on fasting glucose concentrations. The estimated causal effect of a 1 standard deviation (0.13 mmol/L) increase in calcium levels on fasting glucose (mM) using a single lead variant from the CASR gene region is 0.044 (95 % credible interval -0.002, 0.100). In contrast, using our method to account for the correlation between variants, the corresponding estimate using 17 genetic variants is 0.022 (95 % credible interval 0.009, 0.035), a more clearly positive causal effect.
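The idea of combining the associations of several, possibly correlated, variants into a single causal estimate can be sketched with a generalized least squares (correlated inverse-variance weighted) estimator. This is a simpler frequentist estimator swapped in for illustration; it is not the Bayesian likelihood-based method used for the estimates reported in this record. The function name `ivw_correlated` is hypothetical, the correlation matrix `rho` is assumed to be available (e.g. from a reference panel or individual-level data), and uncertainty in the gene-risk factor associations `bx` is ignored (first-order weights).

```python
import numpy as np


def ivw_correlated(bx, by, se_by, rho):
    """Correlated inverse-variance weighted (generalized least squares) estimate.

    bx    : associations of each variant with the risk factor
    by    : associations of each variant with the outcome
    se_by : standard errors of by
    rho   : variant-by-variant correlation matrix (assumed positive definite)
    Returns (causal estimate, standard error) using first-order weights.
    """
    bx = np.asarray(bx, dtype=float)
    by = np.asarray(by, dtype=float)
    se_by = np.asarray(se_by, dtype=float)
    # Covariance of the gene-outcome estimates: Omega[j, k] = se_j * se_k * rho[j, k]
    omega = np.outer(se_by, se_by) * np.asarray(rho, dtype=float)
    # Omega^{-1} bx, computed without forming the inverse explicitly
    w = np.linalg.solve(omega, bx)
    precision = bx @ w                  # bx' Omega^{-1} bx
    estimate = (w @ by) / precision     # bx' Omega^{-1} by / bx' Omega^{-1} bx
    return estimate, np.sqrt(1.0 / precision)
```

With `rho` equal to the identity matrix, the formula reduces to the usual inverse-variance weighted estimate for uncorrelated variants; with a single variant, it reduces to the ratio `by / bx` with standard error `se_by / |bx|`.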
Year: 2015 PMID: 25773750 PMCID: PMC4516908 DOI: 10.1007/s10654-015-0011-z
Source DB: PubMed Journal: Eur J Epidemiol ISSN: 0393-2990 Impact factor: 8.082
Fig. 1 Schematic diagram outlining the Mendelian randomization approach
Fig. 2 Associations of weighted allele scores, based on genetic variants associated with calcium levels, with a range of covariates for: (top) 17 variants in and around the CASR gene region; (bottom) 10 variants in different gene regions. Estimates are coefficients for the difference in the covariate, measured in standard deviations, per unit increase in the allele score [a unit increase in the allele score is scaled to be associated with a 1 standard deviation (0.13 mmol/L) increase in calcium levels]. Coefficients are obtained from the EPIC-InterAct dataset using linear regression with adjustment for age, sex and centre. Lines are 95 % confidence intervals
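The scaling described in the Fig. 2 legend can be sketched as follows: build a weighted allele score from allele counts, then rescale it so that its regression coefficient for the standardized risk factor is exactly 1. This is a minimal sketch under stated assumptions: the function name `scaled_allele_score` and the choice of weights are hypothetical, and, unlike the analysis in the legend, it does not adjust for age, sex or centre.

```python
import numpy as np


def scaled_allele_score(genotypes, weights, risk_factor):
    """Weighted allele score rescaled so one unit ~ 1 SD of the risk factor.

    genotypes   : (n_individuals, n_variants) counts of risk-factor-increasing alleles
    weights     : one weight per variant (e.g. per-allele associations)
    risk_factor : measured risk factor for the same individuals
    """
    g = np.asarray(genotypes, dtype=float)
    w = np.asarray(weights, dtype=float)
    x = np.asarray(risk_factor, dtype=float)
    raw = g @ w                                   # unscaled weighted allele score
    x_sd = (x - x.mean()) / x.std(ddof=1)         # risk factor in SD units
    # OLS slope of the standardized risk factor on the raw score (no covariates)
    slope = np.cov(raw, x_sd, ddof=1)[0, 1] / np.var(raw, ddof=1)
    # After multiplying by the slope, a unit increase in the score is
    # associated with a 1 SD increase in the risk factor
    return raw * slope
```

Rescaling by the fitted slope makes the score's units directly interpretable, so covariate associations such as those in Fig. 2 can be read per 1 SD of genetically predicted calcium.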
Fig. 3 Association of genetic variants with fasting glucose (mM), obtained from publicly-available data from the MAGIC consortium, against their association with calcium levels (mmol/L), obtained from EPIC-InterAct, per calcium-increasing allele for: (top) 17 variants in and around the CASR gene region; (bottom) the subset of 6 variants in and around the CASR gene region associated with calcium levels (p < 0.1). Lines represent 95 % confidence intervals
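Each point in Fig. 3 corresponds to a variant-specific ratio estimate: the slope of the line from the origin to that point. A minimal sketch of the ratio estimate with its first-order standard error, plus a second-order standard error that also propagates uncertainty in `bx`. It assumes the two associations are estimated in independent samples (as in a two-sample design using EPIC-InterAct and MAGIC), so their covariance is taken as zero; the function name `ratio_estimates` is hypothetical.

```python
import numpy as np


def ratio_estimates(bx, by, se_bx, se_by):
    """Variant-specific ratio (Wald) estimates of the causal effect.

    Returns (ratio, first-order SE, second-order SE). The second-order SE
    assumes bx and by come from independent samples (zero covariance).
    """
    bx, by = np.asarray(bx, dtype=float), np.asarray(by, dtype=float)
    se_bx, se_by = np.asarray(se_bx, dtype=float), np.asarray(se_by, dtype=float)
    ratio = by / bx                               # slope from origin to (bx, by)
    se_first = se_by / np.abs(bx)                 # ignores uncertainty in bx
    se_second = np.sqrt(se_by**2 / bx**2 + by**2 * se_bx**2 / bx**4)
    return ratio, se_first, se_second
```

The second-order term matters most for variants with weak associations with the risk factor, where small changes in `bx` move the ratio substantially.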
Causal estimates for a 1 standard deviation (0.13 mmol/L) increase in calcium levels on fasting glucose (mM) using genetic variants in and around the CASR gene region
| Variants used | Number of variants | Partial F statistic | Causal estimate | 95 % credible interval |
|---|---|---|---|---|
| All variants | 17 | 3.4 | 0.022 | 0.009, 0.035 |
| Variants associated with calcium at p < 0.1 | 6 | 7.9 | 0.028 | −0.003, 0.062 |
| Lead variant only | 1 | 30.6 | 0.044 | −0.002, 0.100 |
Estimates and 95 % credible intervals are obtained from a Bayesian likelihood-based method using all 17 measured variants, the 6 variants associated with calcium in the EPIC-InterAct dataset (p < 0.1), and the lead variant (rs1801725) only. Partial F statistics are taken from the multivariable regression of calcium on the genetic variants (with adjustment for age, sex, and centre)
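The partial F statistic can be illustrated as a nested-model comparison: regress the risk factor on the covariates alone, then on the covariates plus all variants, and compare residual sums of squares. A minimal sketch with ordinary least squares in NumPy; the function name `partial_f_statistic` is hypothetical, the design matrices are assumed to have full rank, and covariates such as centre are assumed to be already encoded as numeric (e.g. indicator) columns.

```python
import numpy as np


def partial_f_statistic(x, genotypes, covariates):
    """Partial F statistic for the variants, given the covariates.

    x          : risk factor (n,)
    genotypes  : (n, k) allele counts for the k variants
    covariates : (n, m) adjustment covariates (e.g. age, sex, centre dummies)
    """
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    g = np.asarray(genotypes, dtype=float).reshape(n, -1)
    c = np.asarray(covariates, dtype=float).reshape(n, -1)
    ones = np.ones((n, 1))
    reduced = np.hstack([ones, c])        # intercept + covariates
    full = np.hstack([ones, c, g])        # intercept + covariates + variants

    def rss(design):
        # Residual sum of squares from an OLS fit of x on the design matrix
        coef, *_ = np.linalg.lstsq(design, x, rcond=None)
        resid = x - design @ coef
        return float(resid @ resid)

    k = g.shape[1]
    rss_reduced, rss_full = rss(reduced), rss(full)
    df_resid = n - full.shape[1]
    return ((rss_reduced - rss_full) / k) / (rss_full / df_resid)
```

For a single variant, the partial F statistic equals the square of that variant's t statistic in the full model; with many correlated variants, the same total explained variance is divided by more degrees of freedom, which is consistent with the smaller F values for larger variant sets in the table.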