Fatima Batool1, Ashish Patel1, Dipender Gill2,3,4, Stephen Burgess1,5.
Abstract
When genetic variants in a gene cluster are associated with a disease outcome, the causal pathway from the variants to the outcome can be difficult to disentangle. For example, the chemokine receptor gene cluster contains genetic variants associated with various cytokines. Associations between variants in this cluster and stroke risk may be driven by any of these cytokines. Multivariable Mendelian randomization is an extension of standard univariable Mendelian randomization that estimates the direct effects of related exposures with shared genetic predictors. However, when genetic variants are clustered because they are located in a single genetic region, a Goldilocks dilemma arises: including too many highly correlated variants in the analysis can lead to ill-conditioning, but pruning variants too aggressively can lead to imprecise estimates or even lack of identification. We propose multivariable methods that use principal component analysis to reduce many correlated genetic variants into a smaller number of orthogonal components that are used as instrumental variables. We show in simulations that these methods result in more precise estimates that are less sensitive to numerical instability due to both strong correlations and small changes in the input data. We apply the methods to demonstrate that the most likely causal risk factor for stroke at the chemokine gene cluster is monocyte chemoattractant protein-1.
Keywords: Mendelian randomization; causal inference; correlated variants; dimension reduction; gene cluster
Year: 2022 PMID: 35638254 PMCID: PMC9541575 DOI: 10.1002/gepi.22462
Source DB: PubMed Journal: Genet Epidemiol ISSN: 0741-0395 Impact factor: 2.344
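The dimension-reduction idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes summary-level inputs (variant associations with each exposure and with the outcome, outcome standard errors, and a variant correlation matrix), runs the principal component analysis on the unweighted correlation matrix, and keeps enough components to explain 99% of its variance. The function name `mv_ivw_pca`, the 99% cut-off, and the toy data are all hypothetical choices made for illustration.

```python
import numpy as np

def mv_ivw_pca(beta_x, beta_y, se_y, rho, var_explained=0.99):
    """Sketch of a PCA-based multivariable IVW estimate (illustrative only).

    beta_x : (J, K) variant associations with K exposures
    beta_y : (J,)   variant associations with the outcome
    se_y   : (J,)   standard errors of beta_y
    rho    : (J, J) variant correlation matrix
    """
    n_variants, n_exposures = beta_x.shape

    # Principal components of the correlation matrix (assumption: unweighted PCA).
    eigval, eigvec = np.linalg.eigh(rho)
    order = np.argsort(eigval)[::-1]
    eigval, eigvec = eigval[order], eigvec[:, order]

    # Keep the leading components that reach the chosen share of variance,
    # but never fewer components than exposures (needed for identification).
    cum_share = np.cumsum(eigval) / eigval.sum()
    n_pc = int(np.searchsorted(cum_share, var_explained)) + 1
    n_pc = min(max(n_pc, n_exposures), n_variants)
    loadings = eigvec[:, :n_pc]

    # Project the associations and their covariance onto the components.
    omega = np.outer(se_y, se_y) * rho
    bx_pc = loadings.T @ beta_x
    by_pc = loadings.T @ beta_y
    omega_pc = loadings.T @ omega @ loadings

    # Generalized least squares on the projected data.
    omega_inv_bx = np.linalg.solve(omega_pc, bx_pc)
    cov_theta = np.linalg.inv(bx_pc.T @ omega_inv_bx)
    theta = cov_theta @ (omega_inv_bx.T @ by_pc)
    return theta, np.sqrt(np.diag(cov_theta)), n_pc

# Toy, noise-free data (hypothetical values, not taken from the study).
rng = np.random.default_rng(0)
J, K = 30, 3
idx = np.arange(J)
rho = 0.9 ** np.abs(idx[:, None] - idx[None, :])
beta_x = rng.normal(0.0, 0.1, size=(J, K))
theta_true = np.array([0.3, 0.0, -0.2])
beta_y = beta_x @ theta_true
se_y = np.full(J, 0.05)

theta_hat, se_hat, n_pc = mv_ivw_pca(beta_x, beta_y, se_y, rho)
print(theta_hat, se_hat, n_pc)
```

Because the toy outcome associations are generated without noise, the estimate should reproduce `theta_true` up to rounding; with real data the components trade a small loss of information for a much better conditioned estimation problem.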
Results from the main simulation study
| Parameter | Method | Pruning | Mean | SD | Mean SE | Power (%) |
|---|---|---|---|---|---|---|
| Parameter 1 | MV‐IVW | Oracle | 0.353 | 0.133 | 0.120 | 81.4 |
| Parameter 1 | MV‐IVW | 0.4 | 0.304 | 0.164 | 0.147 | 57.4 |
| Parameter 1 | MV‐IVW | 0.6 | 0.207 | 0.115 | 0.094 | 60.9 |
| Parameter 1 | MV‐IVW | 0.8 | −0.083 | 0.417 | 0.051 | 76.5 |
| Parameter 1 | MV‐LIML | Oracle | 0.379 | 0.143 | 0.133 | 81.4 |
| Parameter 1 | MV‐LIML | 0.4 | 0.340 | 0.188 | 0.163 | 58.9 |
| Parameter 1 | MV‐LIML | 0.6 | 0.316 | 0.212 | 0.103 | 77.0 |
| Parameter 1 | MV‐LIML | 0.8 | 0.083 | 2.372 | 0.179 | 78.8 |
| Parameter 1 | MV‐IVW‐PCA | – | 0.296 | 0.130 | 0.119 | 69.3 |
| Parameter 1 | MV‐LIML‐PCA | – | 0.347 | 0.152 | 0.130 | 74.5 |
| Parameter 2 | MV‐IVW | Oracle | −0.005 | 0.133 | 0.120 | 7.4 |
| Parameter 2 | MV‐IVW | 0.4 | −0.012 | 0.166 | 0.147 | 7.5 |
| Parameter 2 | MV‐IVW | 0.6 | −0.003 | 0.112 | 0.094 | 9.1 |
| Parameter 2 | MV‐IVW | 0.8 | 0.037 | 0.408 | 0.051 | 75.4 |
| Parameter 2 | MV‐LIML | Oracle | −0.001 | 0.144 | 0.133 | 6.2 |
| Parameter 2 | MV‐LIML | 0.4 | −0.007 | 0.192 | 0.163 | 7.3 |
| Parameter 2 | MV‐LIML | 0.6 | 0.006 | 0.186 | 0.103 | 20.2 |
| Parameter 2 | MV‐LIML | 0.8 | 0.010 | 2.522 | 0.179 | 76.6 |
| Parameter 2 | MV‐IVW‐PCA | – | −0.013 | 0.129 | 0.119 | 7.1 |
| Parameter 2 | MV‐LIML‐PCA | – | −0.006 | 0.154 | 0.130 | 8.7 |
| Parameter 3 | MV‐IVW | Oracle | −0.545 | 0.132 | 0.120 | 98.6 |
| Parameter 3 | MV‐IVW | 0.4 | −0.487 | 0.166 | 0.148 | 87.6 |
| Parameter 3 | MV‐IVW | 0.6 | −0.315 | 0.139 | 0.094 | 86.9 |
| Parameter 3 | MV‐IVW | 0.8 | 0.220 | 0.418 | 0.051 | 77.8 |
| Parameter 3 | MV‐LIML | Oracle | −0.576 | 0.142 | 0.134 | 98.8 |
| Parameter 3 | MV‐LIML | 0.4 | −0.531 | 0.190 | 0.164 | 88.6 |
| Parameter 3 | MV‐LIML | 0.6 | −0.451 | 0.212 | 0.103 | 92.6 |
| Parameter 3 | MV‐LIML | 0.8 | 0.005 | 2.250 | 0.179 | 79.3 |
| Parameter 3 | MV‐IVW‐PCA | – | −0.476 | 0.130 | 0.119 | 96.1 |
| Parameter 3 | MV‐LIML‐PCA | – | −0.538 | 0.152 | 0.131 | 97.0 |
Note: Mean estimates, standard deviation (SD) of estimates, mean standard error (mean SE) of estimates, and empirical power (%) of the 95% confidence interval for each of the three parameters. We consider four methods and various pruning thresholds for the MV‐IVW and MV‐LIML methods, plus an oracle setting in which only the 15 variants that truly affect the traits are included in the analysis.
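The four summary columns can be computed from per-dataset estimates and standard errors as in the sketch below. It assumes that empirical power means the percentage of simulated datasets whose 95% confidence interval excludes zero, and that the interval uses a normal critical value of 1.96; the function name `summarize_simulation` and the toy inputs are hypothetical. Comparing the SD and mean SE columns is a useful check: for MV‐LIML at a pruning threshold of 0.8, the SD of estimates (2.372) far exceeds the mean SE (0.179), which indicates that the reported standard errors understate the true variability.

```python
import numpy as np

def summarize_simulation(estimates, std_errors, z_crit=1.96):
    """Summarize one method/parameter across simulated datasets.

    Assumption: 'power' is the share of 95% confidence intervals
    that exclude zero, expressed as a percentage.
    """
    estimates = np.asarray(estimates, dtype=float)
    std_errors = np.asarray(std_errors, dtype=float)

    lower = estimates - z_crit * std_errors
    upper = estimates + z_crit * std_errors
    excludes_zero = (lower > 0) | (upper < 0)

    return {
        "mean": float(estimates.mean()),
        "sd": float(estimates.std(ddof=1)),
        "mean_se": float(std_errors.mean()),
        "power_pct": float(100.0 * excludes_zero.mean()),
    }

# Toy inputs (hypothetical, four simulated datasets).
summary = summarize_simulation(
    estimates=[0.5, 0.1, -0.4, 0.0],
    std_errors=[0.1, 0.1, 0.1, 0.1],
)
print(summary)
```

For a parameter whose true value is zero, the same quantity is the empirical type 1 error rate rather than power.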
Results from the main simulation study for conditional selection method
| Parameter | Method | Mean | SD | Mean SE | Power (%) |
|---|---|---|---|---|---|
| Parameter 1 | Oracle | 0.357 | 0.129 | 0.120 | 82.5 |
| Parameter 1 | Cond select | 0.334 | 0.137 | 0.126 | 73.3 |
| Parameter 1 | MV‐IVW‐PCA | 0.299 | 0.126 | 0.118 | 70.8 |
| Parameter 1 | MV‐LIML‐PCA | 0.349 | 0.150 | 0.130 | 76.3 |
| Parameter 2 | Oracle | −0.012 | 0.133 | 0.121 | 7.2 |
| Parameter 2 | Cond select | −0.015 | 0.140 | 0.126 | 7.4 |
| Parameter 2 | MV‐IVW‐PCA | −0.018 | 0.126 | 0.119 | 5.8 |
| Parameter 2 | MV‐LIML‐PCA | −0.009 | 0.148 | 0.131 | 7.1 |
| Parameter 3 | Oracle | −0.543 | 0.130 | 0.121 | 98.7 |
| Parameter 3 | Cond select | −0.515 | 0.142 | 0.127 | 96.5 |
| Parameter 3 | MV‐IVW‐PCA | −0.475 | 0.130 | 0.119 | 95.9 |
| Parameter 3 | MV‐LIML‐PCA | −0.539 | 0.151 | 0.131 | 96.6 |
Note: Results from oracle, conditional selection, MV‐IVW‐PCA, and MV‐LIML‐PCA methods across 1000 simulated datasets: mean estimates, standard deviation (SD) of estimates, mean standard error (mean SE) of estimates, and empirical power (%) of the 95% confidence interval for each of the three parameters.
Results from the main simulation study with a correlation matrix estimated in an independent sample of 10,000 individuals
| Parameter | Method | Pruning | Mean | SD | Mean SE | Power (%) |
|---|---|---|---|---|---|---|
| Parameter 1 | MV‐IVW | 0.4 | 0.303 | 0.166 | 0.147 | 57.1 |
| Parameter 1 | MV‐IVW | 0.6 | 0.206 | 0.114 | 0.090 | 62.4 |
| Parameter 1 | MV‐LIML | 0.4 | 0.340 | 0.189 | 0.162 | 59.0 |
| Parameter 1 | MV‐LIML | 0.6 | 0.301 | 0.183 | 0.098 | 76.7 |
| Parameter 1 | MV‐IVW‐PCA | – | 0.295 | 0.130 | 0.118 | 69.2 |
| Parameter 1 | MV‐LIML‐PCA | – | 0.347 | 0.153 | 0.130 | 74.7 |
| Parameter 2 | MV‐IVW | 0.4 | −0.013 | 0.167 | 0.148 | 7.3 |
| Parameter 2 | MV‐IVW | 0.6 | −0.005 | 0.114 | 0.090 | 11.2 |
| Parameter 2 | MV‐LIML | 0.4 | −0.009 | 0.193 | 0.163 | 7.3 |
| Parameter 2 | MV‐LIML | 0.6 | 0.003 | 0.174 | 0.098 | 20.8 |
| Parameter 2 | MV‐IVW‐PCA | – | −0.012 | 0.129 | 0.118 | 7.2 |
| Parameter 2 | MV‐LIML‐PCA | – | −0.006 | 0.154 | 0.130 | 8.9 |
| Parameter 3 | MV‐IVW | 0.4 | −0.485 | 0.165 | 0.148 | 86.8 |
| Parameter 3 | MV‐IVW | 0.6 | −0.322 | 0.126 | 0.090 | 88.9 |
| Parameter 3 | MV‐LIML | 0.4 | −0.528 | 0.188 | 0.164 | 87.9 |
| Parameter 3 | MV‐LIML | 0.6 | −0.436 | 0.207 | 0.099 | 93.6 |
| Parameter 3 | MV‐IVW‐PCA | – | −0.476 | 0.130 | 0.119 | 96.1 |
| Parameter 3 | MV‐LIML‐PCA | – | −0.538 | 0.152 | 0.131 | 97.0 |
Note: Mean estimates, standard deviation (SD) of estimates, mean standard error (mean SE) of estimates, and empirical power (%) of the 95% confidence interval for each of the three parameters.
Results from the main simulation study with genetic associations (coefficients and standard errors) rounded to three decimal places
| Parameter | Method | Pruning | Mean | SD | Mean SE | Power (%) |
|---|---|---|---|---|---|---|
| Parameter 1 | MV‐IVW | 0.4 | 0.305 | 0.165 | 0.147 | 57.6 |
| Parameter 1 | MV‐IVW | 0.6 | 0.200 | 0.138 | 0.090 | 59.7 |
| Parameter 1 | MV‐LIML | 0.4 | 0.341 | 0.187 | 0.162 | 59.4 |
| Parameter 1 | MV‐LIML | 0.6 | 0.302 | 0.192 | 0.099 | 75.6 |
| Parameter 1 | MV‐IVW‐PCA | – | 0.296 | 0.130 | 0.119 | 69.2 |
| Parameter 1 | MV‐LIML‐PCA | – | 0.346 | 0.153 | 0.130 | 74.7 |
| Parameter 2 | MV‐IVW | 0.4 | −0.013 | 0.165 | 0.148 | 7.3 |
| Parameter 2 | MV‐IVW | 0.6 | −0.012 | 0.131 | 0.090 | 15.1 |
| Parameter 2 | MV‐LIML | 0.4 | −0.009 | 0.190 | 0.163 | 7.1 |
| Parameter 2 | MV‐LIML | 0.6 | −0.001 | 0.183 | 0.099 | 24.4 |
| Parameter 2 | MV‐IVW‐PCA | – | −0.013 | 0.129 | 0.119 | 7.1 |
| Parameter 2 | MV‐LIML‐PCA | – | −0.006 | 0.154 | 0.130 | 8.8 |
| Parameter 3 | MV‐IVW | 0.4 | −0.487 | 0.166 | 0.148 | 87.5 |
| Parameter 3 | MV‐IVW | 0.6 | −0.330 | 0.145 | 0.090 | 89.3 |
| Parameter 3 | MV‐LIML | 0.4 | −0.529 | 0.189 | 0.164 | 88.4 |
| Parameter 3 | MV‐LIML | 0.6 | −0.459 | 0.195 | 0.099 | 94.3 |
| Parameter 3 | MV‐IVW‐PCA | – | −0.476 | 0.131 | 0.119 | 96.0 |
| Parameter 3 | MV‐LIML‐PCA | – | −0.536 | 0.152 | 0.131 | 96.9 |
Note: Mean estimates, standard deviation (SD) of estimates, mean standard error (mean SE) of estimates, and empirical power (%) of the 95% confidence interval for each of the three parameters.
Applied example: effect of three cytokines on stroke risk
| Method | Pruning | Variants/PCs | Cond number | MCP‐1 estimate (SE) | MCP‐1 p value | MCP‐3 estimate (SE) | MCP‐3 p value | Eotaxin‐1 estimate (SE) | Eotaxin‐1 p value |
|---|---|---|---|---|---|---|---|---|---|
| | | | | | | | | | |
| MV‐IVW | 0.1 | 20 | 27.7 | 0.091 (0.045) | 0.046 | −0.062 (0.046) | 0.18 | 0.062 (0.082) | 0.45 |
| MV‐IVW | 0.4 | 75 | 1,224 | 0.057 (0.035) | 0.11 | −0.014 (0.024) | 0.55 | 0.110 (0.050) | 0.028 |
| MV‐IVW | 0.6 | 151 | 17,762 | −0.038 (0.022) | 0.09 | −0.014 (0.017) | 0.41 | 0.040 (0.029) | 0.17 |
| MV‐IVW‐PCA | – | 30 | 24.9 | 0.075 (0.041) | 0.071 | −0.029 (0.027) | 0.29 | 0.000 (0.063) | 0.99 |
| | | | | | | | | | |
| MV‐IVW | 0.1 | 19 | 22.7 | 0.270 (0.095) | 0.005 | 0.151 (0.104) | 0.14 | −0.174 (0.169) | 0.30 |
| MV‐IVW | 0.4 | 70 | 870 | 0.141 (0.073) | 0.053 | 0.003 (0.051) | 0.96 | −0.132 (0.107) | 0.21 |
| MV‐IVW | 0.6 | 145 | 15,790 | 0.089 (0.046) | 0.056 | 0.040 (0.036) | 0.27 | 0.019 (0.062) | 0.76 |
| MV‐IVW‐PCA | – | 29 | 25.9 | 0.254 (0.089) | 0.004 | −0.018 (0.065) | 0.78 | −0.108 (0.145) | 0.46 |
Note: Estimates (standard errors, SE) and p values from MV‐IVW and MV‐IVW‐PCA methods. Variants/PCs indicates the number of genetic variants (MV‐IVW method) or principal components (PCs, MV‐IVW‐PCA method) included in the analysis. Cond number indicates the condition number of the variance‐covariance matrix; larger numbers signal worse problems due to ill‐conditioning. Estimates represent log odds ratios per 1 standard deviation increase in the cytokine.
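The condition number reported in the table can be illustrated with a short sketch. It assumes the matrix in question has the form "outer product of standard errors times variant correlation matrix" and uses the 2-norm condition number (ratio of largest to smallest singular value) from `numpy.linalg.cond`; both choices, the function name `vcov_condition_number`, and the two-variant toy inputs are assumptions made for illustration. The sketch shows why retaining highly correlated variants inflates the condition number, in line with the rise from about 28 to over 17,000 as the pruning threshold is relaxed from 0.1 to 0.6.

```python
import numpy as np

def vcov_condition_number(std_errors, rho):
    """Condition number of a variance-covariance matrix built from
    standard errors and a correlation matrix (assumed form)."""
    std_errors = np.asarray(std_errors, dtype=float)
    rho = np.asarray(rho, dtype=float)
    vcov = np.outer(std_errors, std_errors) * rho
    return float(np.linalg.cond(vcov))

# Two variants with equal standard errors (hypothetical values).
se = [0.05, 0.05]
weakly_correlated = [[1.0, 0.10], [0.10, 1.0]]
highly_correlated = [[1.0, 0.99], [0.99, 1.0]]

cond_low = vcov_condition_number(se, weakly_correlated)
cond_high = vcov_condition_number(se, highly_correlated)
print(cond_low, cond_high)
```

For two variants with equal standard errors and correlation r, the condition number equals (1 + r) / (1 − r), so it grows without bound as r approaches 1.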