Stephen Burgess, Verena Zuber, Elsa Valdes-Marquez, Benjamin B Sun, Jemma C Hopewell.
Abstract
Mendelian randomization uses genetic variants to make causal inferences about the effect of a risk factor on an outcome. With fine-mapped genetic data, there may be hundreds of genetic variants in a single gene region, any of which could be used to assess this causal relationship. However, using too many genetic variants in the analysis can lead to spurious estimates and inflated Type 1 error rates. But if only a few genetic variants are used, then the majority of the data is ignored and estimates are highly sensitive to the particular choice of variants. We propose an approach based on summarized data only (genetic association and correlation estimates) that uses principal components analysis to form instruments. This approach has desirable theoretical properties: it takes the totality of data into account and does not suffer from numerical instabilities. It also has good properties in simulation studies: it is not particularly sensitive to varying the genetic variants included in the analysis or the genetic correlation matrix, and it does not have greatly inflated Type 1 error rates. Overall, the method gives estimates that are less precise than those from variable selection approaches (such as using a conditional analysis or pruning approach to select variants), but are more robust to seemingly arbitrary choices in the variable selection step. Methods are illustrated by an example using genetic associations with testosterone for 320 genetic variants to assess the effect of sex hormone-related pathways on coronary artery disease risk, in which variable selection approaches give inconsistent inferences.
Keywords: Mendelian randomization; allele score; conditional analysis; correlated variants; summarized data
Year: 2017 PMID: 28944551 PMCID: PMC5725678 DOI: 10.1002/gepi.22077
Source DB: PubMed Journal: Genet Epidemiol ISSN: 0741-0395 Impact factor: 2.135
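The abstract describes forming instruments by principal components analysis of summarized data (genetic association and correlation estimates). A minimal sketch of one way this could be done follows. The function name `pca_ivw`, the matrix `psi` that is decomposed, the generalized least squares form of the estimate, and the toy inputs are all assumptions made for illustration; the abstract does not specify them.

```python
import numpy as np

def pca_ivw(bx, by, se_y, rho, var_explained=0.99):
    """Hypothetical helper: IVW estimate using principal components as instruments.

    bx, by : variant associations with the risk factor and with the outcome
    se_y   : standard errors of the outcome associations
    rho    : correlation matrix between the variants
    """
    bx, by, se_y, rho = (np.asarray(a, dtype=float) for a in (bx, by, se_y, rho))
    # Assumed covariance of the outcome associations: se_j * se_k * rho_jk.
    omega = np.outer(se_y, se_y) * rho
    # Assumed matrix to decompose: correlations weighted by standardized bx.
    psi = np.outer(bx / se_y, bx / se_y) * rho
    eigvals, eigvecs = np.linalg.eigh(psi)        # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]             # reorder, largest first
    eigvals = np.clip(eigvals[order], 0.0, None)  # guard against tiny negatives
    eigvecs = eigvecs[:, order]
    # Keep the fewest components whose cumulative share reaches the target.
    share = np.cumsum(eigvals) / eigvals.sum()
    k = min(int(np.searchsorted(share, var_explained)) + 1, len(bx))
    w = eigvecs[:, :k]
    # Project the summarized data onto the retained components.
    bx_pc, by_pc = w.T @ bx, w.T @ by
    omega_pc = w.T @ omega @ w
    # Generalized least squares of by_pc on bx_pc, with no intercept.
    weighted_bx = np.linalg.solve(omega_pc, bx_pc)
    variance = 1.0 / (bx_pc @ weighted_bx)
    estimate = variance * (weighted_bx @ by_pc)
    return estimate, np.sqrt(variance), k

# Toy data (made up): four correlated variants, outcome associations exactly 0.3 * bx.
idx = np.arange(4)
rho = 0.9 ** np.abs(idx[:, None] - idx[None, :])
bx = np.array([0.12, 0.10, 0.08, 0.05])
by = 0.3 * bx
se_y = np.array([0.02, 0.02, 0.03, 0.03])
estimate, se, k = pca_ivw(bx, by, se_y, rho)
print(estimate, se, k)
```

Because the projected weighting matrix is built from orthonormal components of a valid correlation matrix, it stays positive definite in this sketch, which is in the spirit of the abstract's claim that the approach avoids numerical instabilities.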
Estimates in Motivating Example
| | Threshold Correlation | | | |
|---|---|---|---|---|
| Selection Approach | ρ | r² | Number of Variants | Estimate (SE) |
| Conditional analysis in independent dataset (Coviello) | – | – | 8 | −0.258 (0.097) |
| GCTA at | – | – | 6 | −0.009 (0.058) |
| GCTA at | – | – | 19 | −0.068 (0.042) |
| Pruning | 0.2 | 0.04 | 8 | −0.110 (0.094) |
| Pruning | 0.4 | 0.16 | 20 | −0.085 (0.067) |
| Pruning | 0.6 | 0.36 | 39 | −0.017 (0.051) |
| Pruning | 0.8 | 0.64 | 62 | −0.137 (0.031) |
| Pruning | 0.9 | 0.81 | 85 | −0.537 (‐) |
| Pruning | 0.95 | 0.9025 | 104 | −1.099 (0.001) |
Estimates (SE) of causal effect of testosterone on CAD risk (estimates are log odds ratios per unit increase in log‐transformed testosterone) from IVW method (accounting for correlation) with variants selected using three different approaches and (for the pruning method) six different threshold correlations (measured by ρ and by r²).
Where no standard error is reported (‐), the variance estimate was negative, indicating that the weighting matrix was not positive definite. This means that the standard errors in the weighting matrix were either imprecisely estimated or incompatible with the correlation matrix.
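The note on the negative variance estimate can be illustrated with a small sketch. It assumes that the IVW method accounting for correlation takes a generalized least squares form with weighting matrix entries `se_j * se_k * rho_jk`; that form, the helper name `ivw_correlated`, and all numbers below are assumptions for illustration and are not taken from the table.

```python
import numpy as np

def ivw_correlated(bx, by, se_y, rho):
    """Hypothetical helper: IVW estimate allowing for correlated variants."""
    bx, by, se_y, rho = (np.asarray(a, dtype=float) for a in (bx, by, se_y, rho))
    omega = np.outer(se_y, se_y) * rho           # assumed weighting matrix
    min_eig = float(np.linalg.eigvalsh(omega).min())
    weighted_bx = np.linalg.solve(omega, bx)
    precision = float(bx @ weighted_bx)          # reciprocal of the variance estimate
    estimate = float(weighted_bx @ by) / precision
    # The standard error is undefined when the variance estimate is negative.
    se = precision ** -0.5 if precision > 0 else float("nan")
    return estimate, se, min_eig

se_y = np.array([0.05, 0.05, 0.05])

# A valid (positive definite) correlation matrix gives a defined standard error.
rho_ok = np.array([[1.00, 0.50, 0.25],
                   [0.50, 1.00, 0.50],
                   [0.25, 0.50, 1.00]])
bx_ok = np.array([0.10, 0.08, 0.06])
est_ok, se_ok, eig_ok = ivw_correlated(bx_ok, -0.1 * bx_ok, se_y, rho_ok)

# Mutually inconsistent pairwise correlations give a matrix that is not
# positive definite, and here the variance estimate turns negative.
rho_bad = np.array([[ 1.0, 0.9, -0.9],
                    [ 0.9, 1.0,  0.9],
                    [-0.9, 0.9,  1.0]])
bx_bad = np.array([0.10, -0.10, 0.10])
est_bad, se_bad, eig_bad = ivw_correlated(bx_bad, -0.1 * bx_bad, se_y, rho_bad)

print(est_ok, se_ok, eig_ok)
print(est_bad, se_bad, eig_bad)
```

Checking the smallest eigenvalue of the weighting matrix before inverting it is one simple way to detect this situation in advance.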
Figure 1. Estimated genetic associations and 95% confidence intervals with testosterone (nmol/L, then log‐transformed) and with coronary artery disease risk (log odds ratios): (left) for 104 genetic variants included in Mendelian randomization analysis with threshold correlation 0.95; (right) for 62 genetic variants with threshold correlation 0.8.
Note: The heavy dashed line is the IVW estimate (accounting for correlation between variants).
Simulations Varying Choice of Variants and Correlation Matrix
| | Varying Choice of Variants | | | Varying Correlation Matrix | | |
|---|---|---|---|---|---|---|
| Selection Approach | Mean Estimate | SD | Mean SE | Mean Estimate | SD | Mean SE |
| Pruning at | −0.100 | 0.044 | 0.094 | −0.114 | 0.035 | 0.090 |
| Pruning at | −0.093 | 0.032 | 0.078 | −0.074 | 0.027 | 0.065 |
| Pruning at | −0.009 | 0.049 | 0.060 | −0.018 | 0.052 | 0.046 |
| Pruning at | −0.024 | 0.402 | 0.048 | – | – | – |
| PCA at 99% of variance | −0.053 | 0.028 | 0.098 | −0.051 | 0.027 | 0.096 |
| PCA at 99.9% of variance | −0.045 | 0.025 | 0.084 | −0.047 | 0.017 | 0.083 |
Means of estimates, SDs of estimates, and mean SEs for 10,000 iterations based on motivating example: (i) varying the choice of variants and (ii) varying the correlation matrix. Six approaches for selecting genetic variants are performed: four based on pruning at different correlation thresholds (ρ) and two based on PCA.
(a) Excluding 536 iterations in which the standard error was not defined.
(b) Estimates were highly variable and the standard error was not defined for a large proportion of iterations.
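The pruning approach used in these tables selects variants so that no retained pair exceeds a threshold correlation. The record does not describe the exact procedure, so the sketch below assumes a simple greedy rule: rank variants by strength of association with the risk factor, then keep each variant only if its absolute correlation with every variant already kept is below the threshold. The function name `prune` and the toy values are hypothetical.

```python
def prune(strength, rho, threshold):
    """Hypothetical greedy pruning of correlated variants.

    strength  : association strength per variant (e.g. absolute z-score)
    rho       : correlation matrix as nested lists
    threshold : maximum absolute correlation allowed between kept variants
    """
    # Visit variants from strongest to weakest association.
    order = sorted(range(len(strength)), key=lambda j: -strength[j])
    kept = []
    for j in order:
        # Keep variant j only if it is not too correlated with any kept variant.
        if all(abs(rho[j][k]) < threshold for k in kept):
            kept.append(j)
    return sorted(kept)

# Toy data (made up): four variants with decreasing association strength.
strength = [5.0, 4.0, 3.0, 2.0]
rho = [[1.0, 0.9, 0.5, 0.1],
       [0.9, 1.0, 0.3, 0.1],
       [0.5, 0.3, 1.0, 0.7],
       [0.1, 0.1, 0.7, 1.0]]

print(prune(strength, rho, 0.4))  # → [0, 3]
print(prune(strength, rho, 0.8))  # → [0, 2, 3]
```

As in the motivating example, a higher threshold retains more variants, at the cost of stronger correlation among them.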
Simulation Rounding Association Estimates
| | Unrounded | | | Three Decimal Places | | | Two Decimal Places | | |
|---|---|---|---|---|---|---|---|---|---|
| Selection Approach | SD | Mean SE | Power (%) | SD | Mean SE | Power (%) | SD | Mean SE | Power (%) |
| Null causal effect | |||||||||
| Pruning at | 0.080 | 0.079 | 5.0 | 0.080 | 0.080 | 4.9 | 0.086 | 0.077 | 7.3 |
| Pruning at | 0.067 | 0.066 | 5.0 | 0.067 | 0.066 | 5.1 | 0.073 | 0.063 | 9.2 |
| Pruning at | 0.049 | 0.049 | 5.0 | 0.050 | 0.050 | 4.9 | 0.066 | 0.047 | 16.5 |
| Pruning at | 0.027 | 0.022 | 10.5 | 0.175 | 0.022 | 40.8 | 0.418 | 0.020 | 62.2 |
| PCA at 99% of variance | 0.089 | 0.090 | 4.6 | 0.090 | 0.090 | 4.6 | 0.094 | 0.083 | 8.0 |
| PCA at 99.9% of variance | 0.075 | 0.075 | 4.6 | 0.075 | 0.076 | 4.5 | 0.079 | 0.069 | 9.0 |
| Positive causal effect of 0.1 | |||||||||
| Pruning at | 0.080 | 0.079 | 24.8 | 0.080 | 0.080 | 24.6 | 0.086 | 0.077 | 27.9 |
| Pruning at | 0.067 | 0.066 | 33.6 | 0.067 | 0.066 | 33.2 | 0.073 | 0.063 | 37.0 |
| Pruning at | 0.049 | 0.049 | 54.3 | 0.050 | 0.050 | 51.9 | 0.066 | 0.047 | 53.1 |
| Pruning at | 0.027 | 0.022 | 88.8 | 0.172 | 0.022 | 86.7 | 0.644 | 0.020 | 79.3 |
| PCA at 99% of variance | 0.089 | 0.090 | 19.6 | 0.090 | 0.090 | 19.5 | 0.095 | 0.083 | 25.1 |
| PCA at 99.9% of variance | 0.075 | 0.075 | 26.1 | 0.075 | 0.076 | 25.6 | 0.079 | 0.069 | 32.6 |
SD of estimates, mean SEs, and empirical power (%) based on the 95% confidence interval for 10,000 simulated datasets using six approaches for selecting genetic variants. Results are also given after rounding the association estimates to a fixed number of decimal places.
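Empirical power based on the 95% confidence interval can be read as the percentage of simulated datasets whose interval excludes zero; under a null causal effect this percentage is the empirical Type 1 error rate. A minimal sketch, assuming a normal-approximation interval of estimate ± 1.96 × SE (the function name `empirical_power` and the four example values are made up):

```python
def empirical_power(estimates, ses, z=1.96):
    """Percentage of datasets whose 95% confidence interval excludes zero."""
    # The interval estimate +/- z * se excludes zero when |estimate| > z * se.
    rejections = sum(1 for est, se in zip(estimates, ses) if abs(est) > z * se)
    return 100.0 * rejections / len(estimates)

# Made-up estimates and standard errors from four hypothetical simulated datasets.
estimates = [0.10, 0.02, -0.15, 0.05]
ses = [0.04, 0.04, 0.05, 0.05]

print(empirical_power(estimates, ses))  # → 50.0
```

When the mean SE is much smaller than the SD of the estimates, as in some rows of the table, intervals are too narrow and this percentage rises well above 5% even under the null.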