Jue-Sheng Ong, Stuart MacGregor.
Abstract
With the advent of very large-scale genome-wide association studies (GWASs), the promise of Mendelian randomization (MR) has begun to be fulfilled. However, whilst GWASs have provided essential information on the single nucleotide polymorphisms (SNPs) associated with modifiable risk factors needed for MR, the availability of large numbers of SNP instruments raises issues of how best to use this information and how to deal with potential problems such as pleiotropy. Here we provide commentary on some of the recent advances in MR analysis, including an overview of the different genetic architectures that are being uncovered for a variety of modifiable risk factors and how users ought to take these into consideration when designing MR studies.
Keywords: causal inference; genome-wide complex trait analysis-generalized summary mendelian randomization (GCTA-GSMR); mendelian randomization; mendelian randomization pleiotropy RESidual sum and outlier (MR-PRESSO); pleiotropy assessment
Year: 2019 PMID: 31045282 PMCID: PMC6767464 DOI: 10.1002/gepi.22207
Source DB: PubMed Journal: Genet Epidemiol ISSN: 0741-0395 Impact factor: 2.135
Comparison of SNP‐heterogeneity tests across MR‐PRESSO, GCTA‐GSMR, and classical two‐sample MR methods
| Method | Formulation of heterogeneity test | Test statistic | Description |
|---|---|---|---|
| GCTA‐GSMR HEIDI | $d_j = \hat{\beta}_j - \hat{\beta}_{\text{top}}$ | $T_j = d_j^2 / \operatorname{var}(d_j)$, approximately $\chi^2_1$ | Computes SNP‐level heterogeneity only, and uses the HEIDI test to discard outliers (e.g. …). |
| MR‐PRESSO | $\text{RSS}_{\text{obs}} = \sum_j \dfrac{(\hat{\beta}_{Y,j} - \hat{\beta}_{\text{IVW},-j}\,\hat{\beta}_{X,j})^2}{\sigma_{Y,j}^2}$; note that this can also be rewritten as $\text{RSS}_{\text{obs}} = \sum_j w_j (\hat{\beta}_j - \hat{\beta}_{\text{IVW},-j})^2$ | Empirical | Relies on a bootstrap to generate an empirical distribution for the causal estimates. The main difference from the Cochran Q test is that it uses a "leave‐one‐out" approach to obtain unbiased RSS values. Evaluates both SNP‐level and global heterogeneity. |
| Mode, median, inverse variance weighted models | Cochran Q test: $Q = \sum_j w_j (\hat{\beta}_j - \hat{\beta}_{\text{IVW}})^2$ | Converges to $\chi^2_{J-1}$ | Estimates global heterogeneity. |
| Modified MR‐Egger | Cochran Q′ test: $Q' = \sum_j w_j \left(\hat{\beta}_j - \dfrac{\hat{\beta}_{0E}}{\hat{\beta}_{X,j}} - \hat{\beta}_{1E}\right)^2$ | Converges to $\chi^2_{J-2}$ | Fits an additional intercept term that adjusts for directional pleiotropy. Also models global heterogeneity. |

Abbreviations: GSMR, generalized summary mendelian randomization; HEIDI, heterogeneity in dependent instrument; MR‐PRESSO, mendelian randomization pleiotropy residual sum and outlier.
$\hat{\beta}_{X,j}$ and $\hat{\beta}_{Y,j}$ refer to the SNP‐exposure and SNP‐outcome association estimates for SNP $j$ ($j = 1, \dots, J$), and $\sigma_{Y,j}$ is the standard error of $\hat{\beta}_{Y,j}$.
$\hat{\beta}_j$ refers to the Wald‐type estimator for SNP $j$, given by $\hat{\beta}_j = \hat{\beta}_{Y,j} / \hat{\beta}_{X,j}$, and $w_j = \hat{\beta}_{X,j}^2 / \sigma_{Y,j}^2$ is its first‐order inverse‐variance weight.
$\hat{\beta}_{\text{top}}$ is the GSMR causal estimate for the SNP at the top 25th percentile of −log10(p value) for the SNP‐exposure association. The SNP with the highest −log10(p value) is not used, to avoid potential SNP pleiotropy biasing the test statistic.
$\hat{\beta}_{\text{IVW}}$ is the inverse‐variance weighted estimate for all SNP instruments.
$\hat{\beta}_{\text{IVW},-j}$ refers to the IVW estimate for all SNP instruments excluding SNP $j$. Finally, $\hat{\beta}_{0E}$ and $\hat{\beta}_{1E}$ are the intercept and slope of the MR‐Egger regression.
Each model above also assumes that the SNP association estimates follow Gaussian distributions. In other two‐sample MR models, heterogeneity is often quantified via the Cochran Q statistic (or Q′ for modified MR‐Egger).
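The classical statistics in the table above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used by MR‐PRESSO, GCTA‐GSMR or any MR package: it assumes independent SNPs (no LD), first‐order weights that ignore uncertainty in the SNP‐exposure estimates and, for the MR‐Egger fit, SNPs already oriented so that every SNP‐exposure estimate is positive. All function names and the toy data are hypothetical.

```python
def wald_ratios(bx, by):
    """Per-SNP Wald-type ratio estimates: beta_j = beta_Y,j / beta_X,j."""
    return [y / x for x, y in zip(bx, by)]


def weights(bx, se_y):
    """First-order inverse-variance weights: w_j = beta_X,j^2 / se(beta_Y,j)^2."""
    return [x * x / (s * s) for x, s in zip(bx, se_y)]


def ivw(bx, by, se_y):
    """Fixed-effect IVW estimate: weighted mean of the Wald ratios."""
    w = weights(bx, se_y)
    b = wald_ratios(bx, by)
    return sum(wi * bi for wi, bi in zip(w, b)) / sum(w)


def cochran_q(bx, by, se_y):
    """Cochran's Q around the IVW estimate (approx. chi-square, J - 1 df)."""
    est = ivw(bx, by, se_y)
    return sum(wi * (bi - est) ** 2
               for wi, bi in zip(weights(bx, se_y), wald_ratios(bx, by)))


def egger_fit(bx, by, se_y):
    """Weighted regression of beta_Y on beta_X with an intercept.

    Weights are 1 / se(beta_Y,j)^2; assumes SNPs are oriented so beta_X,j > 0.
    Returns (intercept, slope).
    """
    v = [1.0 / (s * s) for s in se_y]
    mx = sum(vi * x for vi, x in zip(v, bx)) / sum(v)
    my = sum(vi * y for vi, y in zip(v, by)) / sum(v)
    sxy = sum(vi * (x - mx) * (y - my) for vi, x, y in zip(v, bx, by))
    sxx = sum(vi * (x - mx) ** 2 for vi, x in zip(v, bx))
    slope = sxy / sxx
    return my - slope * mx, slope


def rucker_q(bx, by, se_y):
    """Q' around the MR-Egger fit (approx. chi-square, J - 2 df)."""
    b0, b1 = egger_fit(bx, by, se_y)
    return sum((y - b0 - b1 * x) ** 2 / (s * s) for x, y, s in zip(bx, by, se_y))


def loo_rss(bx, by, se_y):
    """MR-PRESSO-style observed RSS: each SNP is compared with the IVW
    estimate computed without that SNP."""
    total = 0.0
    for j in range(len(bx)):
        rest = [i for i in range(len(bx)) if i != j]
        est = ivw([bx[i] for i in rest], [by[i] for i in rest],
                  [se_y[i] for i in rest])
        total += (by[j] - est * bx[j]) ** 2 / se_y[j] ** 2
    return total


# Hypothetical toy data: four SNPs with a true ratio of 0.5, except the last,
# whose outcome association is inflated (a pleiotropic outlier).
bx = [0.1, 0.2, 0.3, 0.4]
by = [0.05, 0.10, 0.15, 0.30]
se_y = [0.01] * 4
print(round(cochran_q(bx, by, se_y), 2), round(rucker_q(bx, by, se_y), 2))  # 46.67 30.0
```

Q′ can never exceed Q on the same data, because the MR‐Egger fit is the IVW model plus one free intercept. The leave‐one‐out RSS is larger than Q here, since the outlying SNP no longer pulls the reference estimate towards itself.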
Comparison of MR estimates for LDL‐cholesterol on coronary artery disease between GCTA‐GSMR and MR‐PRESSO
| Method | Settings | Raw causal estimate | Raw SE | Raw P | Outlier‐adjusted causal estimate | Outlier‐adjusted SE | Outlier‐adjusted P | SNPs filtered | Runtime, sec | Additional comments |
|---|---|---|---|---|---|---|---|---|---|---|
| GCTA‐GSMR | LD‐matrix precomputed | NA | NA | NA | 0.432 | 0.022 | 4.15E‐85 | 9 | 2.5 | Runtime covers the analysis step only; the LD matrix must be computed separately via GCTA. |
| MR‐PRESSO | Nb = 1000, outlier | 0.402 | 0.071 | 5.02E‐08 | 0.462 | 0.027 | 5.70E‐36 | 48 | 39.1 | The outlier test is unstable with only 1000 simulations to compute the null distribution (i.e. the outlier p value cannot fall below 0.188). |
| MR‐PRESSO | Nb = 10,000, outlier | 0.402 | 0.071 | 5.02E‐08 | 0.408 | 0.028 | 2.56E‐30 | 45 | 375.2 | |
| MR‐PRESSO | Nb = 50,000, outlier | 0.402 | 0.071 | 5.02E‐08 | 0.408 | 0.028 | 2.56E‐30 | 45 | 1871.2 | |
Abbreviations: GCTA‐GSMR, genome‐wide complex trait analysis‐generalized summary mendelian randomization; MR‐PRESSO, mendelian randomization pleiotropy residual sum and outlier; SNP, single nucleotide polymorphism.
Causal estimate refers to the estimated effect size (log(OR) of coronary artery disease (CAD) risk) per standard deviation increase in genetically predicted LDL‐cholesterol (LDL‐c). SE refers to the standard error of the respective causal estimate. Nb denotes the number of simulation replicates used to generate the null distribution for the MR‐PRESSO outlier tests. The data for these traits were extracted from publicly available GWAS summary statistics (LDL‐c from http://csg.sph.umich.edu/willer/public/lipids2013/; CAD from http://www.cardiogramplusc4d.org/data‐downloads/).
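The Nb settings and runtimes in this comparison reflect how MR‐PRESSO obtains its p values empirically. The Python sketch below is a simplified, hypothetical re‐creation of an MR‐PRESSO‐style global test, not the MR‐PRESSO R package: it assumes independent SNPs, Gaussian errors, a fixed‐effect IVW estimate weighted by 1/σ²(Y,j), and a null‐simulation scheme modeled on the one described in the comments, and it omits the outlier and distortion tests. It illustrates two properties visible in the table: the cost grows linearly with Nb, and an empirical p value moves in steps of 1/Nb, so before any multiple‐testing correction it cannot resolve values smaller than 1/Nb.

```python
import random


def ivw(bx, by, se_y):
    """Fixed-effect IVW estimate: regression through the origin, weights 1/se_y^2."""
    num = sum(x * y / s ** 2 for x, y, s in zip(bx, by, se_y))
    den = sum(x * x / s ** 2 for x, s in zip(bx, se_y))
    return num / den


def loo_estimates(bx, by, se_y):
    """IVW estimate recomputed with each SNP left out in turn."""
    n = len(bx)
    return [ivw([bx[i] for i in range(n) if i != j],
                [by[i] for i in range(n) if i != j],
                [se_y[i] for i in range(n) if i != j]) for j in range(n)]


def loo_rss(bx, by, se_y):
    """RSS of each SNP-outcome effect around its leave-one-out IVW prediction."""
    est = loo_estimates(bx, by, se_y)
    return sum((y - e * x) ** 2 / s ** 2 for x, y, s, e in zip(bx, by, se_y, est))


def global_test(bx, se_x, by, se_y, nb=1000, seed=2019):
    """Empirical global-test p value from nb simulated null datasets."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    rss_obs = loo_rss(bx, by, se_y)
    pred = loo_estimates(bx, by, se_y)
    exceed = 0
    for _ in range(nb):  # cost grows linearly with nb
        # Null data: exposure effects redrawn around the observed values,
        # outcome effects around their leave-one-out IVW predictions.
        sim_bx = [rng.gauss(x, s) for x, s in zip(bx, se_x)]
        sim_by = [rng.gauss(e * x, s) for e, x, s in zip(pred, bx, se_y)]
        if loo_rss(sim_bx, sim_by, se_y) > rss_obs:
            exceed += 1
    # Resolution is 1/nb: a result of 0 only means p < 1/nb.
    return rss_obs, exceed / nb


# Hypothetical toy data: eight SNPs with a true ratio of 0.5; SNP 3 is pleiotropic.
bx = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
by = [0.5 * x for x in bx]
by[3] = 0.3
se = [0.01] * 8
rss, p = global_test(bx, se, by, se, nb=1000)
print(f"RSS_obs = {rss:.1f}, empirical p = {p}")
```

Increasing `nb` tenfold makes the p value ten times finer but also makes the loop ten times slower, which mirrors the roughly linear growth in MR‐PRESSO runtime across the three Nb settings above.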
Figure 1. Illustrative scenarios for the genetic architecture of modifiable risk factors. The Manhattan plots (left panel) illustrate the different types of genetic architecture for modifiable risk factors used in MR studies. The red line (at y = −log10(5e‐8)) indicates the genome‐wide significance (GW) threshold; variants with a −log10(p value) above the line are deemed genome‐wide significant. Because genome‐wide significant SNPs have F‐statistics of about 30 or more, they can be used as viable instruments provided the other MR assumptions hold. The GWAS for trait A was modeled after coffee consumption, trait B after alcohol intake, and trait C after BMI. Note that the plots are illustrative and do not represent the current state of knowledge for these traits. GWASs, genome‐wide association studies; SNP, single nucleotide polymorphism
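The F‐statistic threshold quoted in the Figure 1 caption follows from the genome‐wide threshold itself. Assuming the usual single‐SNP approximation F ≈ (β/se)², i.e. the 1-df chi‐square statistic of a two‐sided test, a SNP that only just reaches p = 5e‐8 has F ≈ 29.7, and the significance line on a −log10(p) axis sits at about 7.3. A minimal check with the Python standard library (function names are illustrative):

```python
from math import log10
from statistics import NormalDist


def gw_line_height(p_threshold=5e-8):
    """Height of the significance line on a Manhattan plot (y = -log10 p)."""
    return -log10(p_threshold)


def min_single_snp_f(p_threshold=5e-8):
    """Approximate F-statistic of a SNP that just reaches p_threshold.

    Assumes F is approximately the squared Wald z statistic (beta / se)^2,
    i.e. a 1-df chi-square statistic from a two-sided test.
    """
    z = NormalDist().inv_cdf(1 - p_threshold / 2)
    return z * z


print(f"{gw_line_height():.2f} {min_single_snp_f():.1f}")  # 7.30 29.7
```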
Figure 2. Distribution of cumulative SNP variance explained under different forms of polygenicity in genetic architecture. The x‐axis represents the cumulative variance explained by SNPs (commonly denoted as r^2) for the underlying trait of interest, an important indicator of power for MR analyses, while the y‐axis gives the number of instruments, counted starting from the SNP with the largest r^2 for the underlying trait. The mixed form is analogous to Scenario 2a in the main text. The change in cumulative variance explained as instruments are added can be used to evaluate whether including more SNP instruments brings any marginal benefit in power. SNP, single nucleotide polymorphism
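The cumulative curve described for Figure 2 can be built from per‐SNP r^2 values by sorting them from largest to smallest and taking running totals; the marginal gain of each extra instrument is then the step between successive totals. The sketch assumes independent instruments (so r^2 values simply add), and the two architectures are invented numbers for illustration, not estimates for any real trait.

```python
from itertools import accumulate


def cumulative_r2(r2_per_snp):
    """Running total of variance explained, adding SNPs from largest r^2 down.

    Assumes the SNPs are independent (no LD), so their r^2 values add up.
    """
    return list(accumulate(sorted(r2_per_snp, reverse=True)))


def marginal_gain(cum):
    """Extra variance explained by each successive instrument."""
    return [cum[0]] + [b - a for a, b in zip(cum, cum[1:])]


def n_instruments_for(cum, fraction):
    """Smallest number of instruments capturing `fraction` of the total r^2."""
    target = fraction * cum[-1]
    return next(i + 1 for i, c in enumerate(cum) if c >= target)


# Hypothetical architectures with similar total r^2 (values invented for illustration).
oligogenic = [0.02, 0.004] + [0.0002] * 20  # a couple of large-effect loci
polygenic = [0.0003] * 100                  # many small-effect loci
for name, r2 in [("oligogenic", oligogenic), ("polygenic", polygenic)]:
    cum = cumulative_r2(r2)
    print(name, round(cum[-1], 4), n_instruments_for(cum, 0.8))
```

In the oligogenic case two instruments already capture most of the variance and further SNPs add little, whereas in the polygenic case the curve keeps rising almost linearly with each added instrument.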
Figure 3. Flowchart outlining approaches for performing two‐sample Mendelian randomization studies. The flowchart outlines some recommended steps for performing MR sensitivity analyses on the basis of the genetic architecture of the modifiable risk factor (Scenarios 1, 2a and 2b). The path highlighted in purple refers to techniques commonly applied to traits in Scenario 1, whereas the path highlighted in blue is for traits with a more polygenic architecture. In Scenario 1, SNPs that are (or lie in genes that are) potentially associated with other confounding risk factors should be removed before the main analysis. In Scenario 2, evidence of SNP pleiotropy can be identified via outliers on the MR funnel plot. The main difference between the two paths is that methods in Scenario 1 rely on biological knowledge of the instruments to evaluate pleiotropy, whereas Scenario 2 relies on more statistical approaches. The MR‐TRYX software can be found here: https://github.com/explodecomputer/tryx. SNP, single nucleotide polymorphism
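For the Scenario 2 path of the Figure 3 flowchart, an MR funnel plot places each SNP's Wald ratio against its precision, 1/se(ratio). As a rough screening rule for points far outside the funnel, the sketch flags SNPs whose ratio lies more than `z_cut` standard errors from the IVW estimate. This rule, the cut‐off of 3, the first‐order standard error and the toy data are all assumptions for illustration; it is not the procedure in the flowchart or in MR‐TRYX.

```python
def funnel_points(bx, by, se_y):
    """(Wald ratio, precision) pairs for an MR funnel plot.

    Precision is 1 / se(ratio), using the first-order approximation
    se(ratio) = se(beta_Y) / |beta_X|.
    """
    return [(y / x, abs(x) / s) for x, y, s in zip(bx, by, se_y)]


def flag_outliers(bx, by, se_y, z_cut=3.0):
    """Indices of SNPs whose ratio lies more than z_cut standard errors
    from the IVW estimate (hypothetical screening rule)."""
    pts = funnel_points(bx, by, se_y)
    w = [prec ** 2 for _, prec in pts]  # inverse-variance weights
    est = sum(wi * r for wi, (r, _) in zip(w, pts)) / sum(w)
    return [j for j, (r, prec) in enumerate(pts) if abs(r - est) * prec > z_cut]


# Hypothetical toy data: true ratio 0.5, with SNP 3 shifted by pleiotropy.
bx = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
by = [0.5 * x for x in bx]
by[3] = 0.25
print(flag_outliers(bx, by, [0.01] * 8))  # [3]
```

Because the reference estimate here still includes the suspect SNP, a strong outlier can pull it far enough to flag its neighbours as well; leave‐one‐out references, as in MR‐PRESSO, reduce that effect.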
Selection of commonly used modifiable risk factors in published MR studies
| Modifiable risk factor | Number of instruments | Approximate instrument r^2 | PubMed ID |
|---|---|---|---|
| Alcohol intake (European) | 1 | 1% | 28645180; 29212772; 25503943 |
| Alcohol intake (Asian) | 1 | 3% | 27575649 |
| Age at menarche | 375 | 7% | 28436984 |
| Bitter taste liking | 1 | 43% | 23900446 |
| Body mass index | 73–97 | 1.4–2.7% | 29232439; 27401727; 27427428 |
| Coffee consumption | 5 | 0.60% | 29760501 |
| C‐reactive protein | 4 | 2% | 20056955 |
| Calcium | 1 | 1% | 28742912 |
| Dairy intake | 1 | 1% | 28302601; 29071490 |
| Educational attainment | 162 | 1.80% | 28855160 |
| Fasting glucose | 37 | 5% | 28954281 |
| Fasting insulin | 17 | 1% | 28954281 |
| H. pylori susceptibility | 2 | 1% | 29089580 |
| Height | >2000 | 13% | 29581483 |
| High‐density lipoprotein (HDL) | 63 | 14% | 28594918 |
| Hydroxyvitamin‐D | 4 | 3% | 27594614; 29089348; 26305103 |
| Low‐density lipoprotein (LDL) | 50 | 15% | 28594918 |
| Plasma vitamin C | 1 | 1% | 29939348 |
| Plasma urate | 1 | 2% | 28428355 |
| Polyunsaturated fatty acids (multiple) | 2–5 | 8–30% | 29473154; 27490808 |
| Serum iron level | 5 | 4% | 28186534 |
| Smoking heaviness | 1 | 1% | 29509885 |
| Triglyceride | 45 | 12% | 28594918 |
| Tobacco consumption | 1 | 1% | 29688528 |
| Total cholesterol | 65 | 15% | 28594918 |
| Vitamin B12 | 3–11 | 3–6% | 22199995; 29249824 |
| Waist‐to‐hip ratio (both sexes) | 47 | 1.40% | 27550749 |
The table above represents a selection of the risk factors considered in MR studies to date. This list is not a complete representation of all the modifiable traits in the MR literature; it merely shows that MR studies of traits with few instruments remain relevant in the field. Studies were selected on the criteria that (a) the variance explained by the instruments (r^2) and (b) the total sample size in the outcome set were reported. Where r^2 was not available from the cited GWASs or the original article itself, it was approximated on the basis of the sample size and the reported F‐statistics.
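The table notes that r^2 was sometimes approximated from the sample size and the reported F‐statistic. One common approximation, assumed here because the exact calculation used for the table is not stated, links the two for K instruments and sample size N through F = ((N − K − 1)/K) · r^2/(1 − r^2); inverting it gives r^2 = FK/(N − K − 1 + FK). A minimal sketch with hypothetical inputs:

```python
def f_from_r2(r2, n, k):
    """F-statistic for k instruments explaining a fraction r2 of exposure variance in n people."""
    return (n - k - 1) / k * r2 / (1 - r2)


def r2_from_f(f_stat, n, k):
    """Invert the relation above to recover r2 from a reported F-statistic."""
    return f_stat * k / (n - k - 1 + f_stat * k)


# Hypothetical example: one SNP with a reported F of 1,000 in 100,000 people.
print(f"{r2_from_f(1000, 100_000, 1):.2%}")  # 0.99%
```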