Mendelian randomization analysis using mixture models for robust and efficient estimation of causal effects.

Abstract

Mendelian randomization (MR) has emerged as a major tool for investigating causal relationships among traits, utilizing results from large-scale genome-wide association studies. Bias due to horizontal pleiotropy, however, remains a major concern. We propose a novel approach for robust and efficient MR analysis using a large number of genetic instruments, based on a novel spike-detection algorithm under a normal-mixture model for underlying effect-size distributions. Simulations show that the new method, MRMix, provides nearly unbiased or less biased estimates of causal effects compared to alternative methods and can achieve higher efficiency than comparably robust estimators. Application of MRMix to publicly available datasets leads to notable observations, including identification of causal effects of BMI and age-at-menarche on the risk of breast cancer; no causal effect of HDL and triglycerides on the risk of coronary artery disease; and a strong detrimental effect of BMI on the risk of major depressive disorder.

Year: 2019 PMID: 31028273 PMCID: PMC6486646 DOI: 10.1038/s41467-019-09432-2

Source DB: PubMed Journal: Nat Commun ISSN: 2041-1723 Impact factor: 14.919

Introduction

Discoveries of genetic susceptibility variants underlying complex traits continue to increase rapidly with the ever-growing size of genome-wide association studies (GWAS)[1-5]. Mendelian randomization (MR), a form of instrumental variable analysis for assessing the causal effect of one trait on another, provides a major opportunity to translate the increasing knowledge of genetics into improvements in human health[6,7]. MR analysis has already been widely used for obtaining evidence for drug targets, causal bases for epidemiologic associations, and cascading effects among complex molecular traits[8,9]. While MR was originally designed to be used with individual genetic instruments with known biologic functions, the recent trend has been to exploit the multitude of genetic variants emerging from large GWAS. The use of multiple genetic variants can increase the power of MR analysis, correct for weak instrument bias, and provide evidence of causality across a broader set of underlying mechanisms for intervening on the traits[7,8]. When all of the selected variants satisfy the key assumption of MR analysis, i.e., they only have direct effects on one trait, the causal effect of that trait on the other can be efficiently estimated by meta-analysis of the well-known ratio estimates across the different variants[10]. It is, however, recognized that while the availability of many variants provides an opportunity to strengthen MR analysis, there is potential for bias because the key assumption, i.e., that the variants have no direct effect on the second trait, can be violated due to pleiotropy[7,8]. Indeed, recent empirical studies[3,11-16] have unequivocally shown that common variants have widespread pleiotropic effects, and consequently, polygenic MR analysis can be susceptible to bias.

Originally, some methods were developed that allow genetic variants to have pleiotropic effects, but they require the strong InSIDE assumption, i.e., that the direct and indirect effects are uncorrelated[16-19]. Under this assumption, any genetic correlation between the two traits, as measured by the selected SNPs, can arise solely due to the underlying causal relationship between the traits. However, the effects of genetic variants on multiple traits can be correlated when they are mediated through common causal factors. Thus, most recently there has been an effort to develop methods that allow for the presence of invalid instruments which may have complex, possibly correlated pleiotropic effects. In particular, median- and mode-based ratio estimators have been proposed for removing effects of invalid instruments under different assumptions[20,21]. Further, a recent study proposed the use of outlier-detection methods for conducting sensitivity analysis to the presence of invalid instruments[22]. While these and other new methods present important progress, there remain important gaps, as they can be susceptible to substantial residual bias in the presence of a large number of invalid instruments and/or can produce estimates of causal effects with large uncertainty.

In this article, we propose a novel method for estimation of causal effects in multi-marker MR analysis by taking advantage of a working parametric model for the underlying bivariate effect-size distribution of the SNPs across pairs of traits. The model allows genetic correlation to arise both from causal and non-causal relationships. This model implies the zero modal pleiotropy (ZEMPA) assumption, which is also required by the mode-based estimator. For robust estimation of causal effects, we propose an estimating equation approach that essentially requires maximization of the probability concentration of the residuals, i.e., the estimated SNP effects on the outcome minus θ times the estimated SNP effects on the exposure, at the null component of a normal-mixture model (method named MRMix, see Fig. 1 for an overview).
We use extensive simulation studies to show that the proposed method can provide a much better trade-off between bias and variance than existing estimators in a wide set of scenarios. We apply the proposed and existing methods to conduct MR analysis across a variety of exposures and health outcomes using publicly available summary statistics from very large GWAS. The analysis reveals important differences across methods and new insights into the causal relationships underlying some of these traits.
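
The Introduction notes that, when every selected variant is a valid instrument, the causal effect can be estimated by meta-analysis of per-variant ratio estimates. A minimal sketch of that inverse-variance weighted (IVW) combination is given here for orientation. The function name, the three-SNP input values, and the first-order approximation se(ratio) = se_y / |beta_x| are illustrative assumptions, not taken from the article.

```python
from math import sqrt

def ivw_estimate(beta_x, beta_y, se_y):
    """Inverse-variance weighted combination of per-SNP ratio estimates.

    Assumes every SNP is a valid instrument and uses the first-order
    approximation se(ratio_j) = se_y[j] / |beta_x[j]| (an assumption of
    this sketch, which ignores uncertainty in beta_x).
    """
    num = 0.0
    den = 0.0
    for bx, by, sy in zip(beta_x, beta_y, se_y):
        ratio = by / bx            # per-SNP ratio estimate of the causal effect
        weight = (bx / sy) ** 2    # 1 / se(ratio)^2
        num += weight * ratio
        den += weight
    return num / den, sqrt(1.0 / den)

# Hypothetical summary statistics for three SNPs, constructed so that
# beta_y = 0.2 * beta_x exactly.
beta_x = [0.10, 0.05, 0.08]
beta_y = [0.020, 0.010, 0.016]
se_y = [0.01, 0.01, 0.01]
theta_hat, se_theta = ivw_estimate(beta_x, beta_y, se_y)
```

Because every ratio in the toy input equals 0.2, the weighted average recovers 0.2; with invalid instruments in the mix, the same average would be pulled away from the causal effect, which is the bias the article sets out to address.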

Fig. 1

Overview of MRMix approach. a Four components of the Mendelian randomization mixture (MRMix) model: (1) direct effect on X and an indirect effect on Y only through X; (2) direct effects on both X and Y; (3) direct effect on Y but no relationship with X; (4) related to neither X nor Y. Direct effects are denoted by u_X and u_Y and the causal effect is denoted by θ. Parameters π1, π2, π3, π4 are the mixing probabilities associated with components (1), (2), (3), and (4), respectively; within each component, u_X and u_Y are assumed either to be 0 or to follow normal distributions with the corresponding variance-covariance parameters. b Schematic of the MRMix estimation algorithm. The line corresponding to the true causal effect (0.2) maximizes the number of points “close” to the line. The regions covered by the 95% probability band under the null distribution (shaded area) are highlighted. Lines are plotted for three values, θ = −0.3, 0.2, 0.48, each with its estimated proportion of valid IVs. Source data are provided as a Source Data file
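
The schematic in Fig. 1b can be illustrated with a deliberately simplified counting proxy: for each candidate θ on a grid, count the SNPs whose residual falls inside the 95% band expected for a valid instrument, and keep the θ with the largest count. This is not the MRMix estimator itself, which fits a normal-mixture model to the residuals; the toy data generator, the common standard error, the grid, and all function names below are assumptions made for illustration only.

```python
import random

def simulate_summary_stats(m=400, theta=0.2, valid_frac=0.5, se=0.005, seed=1):
    """Hypothetical toy data: a fraction valid_frac of the SNPs are valid
    instruments; the rest have a direct effect on Y correlated with u_x."""
    rng = random.Random(seed)
    beta_x, beta_y = [], []
    for j in range(m):
        u_x = rng.choice([-1, 1]) * rng.uniform(0.03, 0.10)  # direct effect on X
        if j < valid_frac * m:
            u_y = 0.0                                # valid: no direct effect on Y
        else:
            u_y = 0.5 * u_x + rng.gauss(0, 0.03)     # invalid: correlated pleiotropy
        beta_x.append(u_x + rng.gauss(0, se))                 # estimated effect on X
        beta_y.append(theta * u_x + u_y + rng.gauss(0, se))   # estimated effect on Y
    return beta_x, beta_y

def count_close(theta, beta_x, beta_y, se, z=1.96):
    """Number of SNPs whose residual beta_y - theta * beta_x lies inside the
    z-band implied by sampling error alone (both estimates share the same se)."""
    band = z * se * (1 + theta ** 2) ** 0.5
    return sum(abs(by - theta * bx) <= band for bx, by in zip(beta_x, beta_y))

def grid_search(beta_x, beta_y, se, grid):
    """Return the grid value of theta with the most SNPs close to the line."""
    return max(grid, key=lambda t: count_close(t, beta_x, beta_y, se))

se = 0.005
beta_x, beta_y = simulate_summary_stats(se=se)
grid = [round(-0.5 + 0.01 * i, 2) for i in range(151)]  # -0.50, -0.49, ..., 1.00
theta_hat = grid_search(beta_x, beta_y, se, grid)
```

In this toy setting the invalid SNPs cluster loosely around a different slope, so the count peaks near the simulated causal effect of 0.2 even though half of the instruments are invalid.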

Results

Simulations

Simulation studies show that MRMix can be far more robust than existing alternatives in a wide range of scenarios (Figs 2 and 3, Supplementary Figure 1). For example, when genetic correlations due to the causal relationship and pleiotropic effects are in the same direction (Fig. 2), MRMix generally produced nearly unbiased estimates of causal effects as long as the sample size of the GWAS for the exposure (X) and the corresponding number of instruments reached a minimum threshold (e.g., n_X > 100 K, K > 100). The bias was minimal or moderate even when only 25% of the instruments were valid. Among the alternatives, the inverse-variance weighted (IVW) and the Egger regression methods had the largest bias in directions away from the null; the weighted median method was less sensitive, but had considerable bias in many scenarios; and the weighted mode method had the least bias. The bias of the weighted mode method was comparable to that of MRMix in most scenarios when the proportion of valid instruments was 50%, but was substantially larger when the proportion of valid instruments dropped to 25% and the sample size for the GWAS of the outcome (Y) was relatively small (e.g., n_X = 100 K, n_Y = 33.3 K).

Fig. 2

Simulation results when genetic correlation due to causal and pleiotropic effects are in the same direction. Mean and standard deviation of causal estimates are reported over 100 simulations. The true causal effect θ = 0.2. Estimates of association coefficients for SNPs across two traits are simulated assuming an underlying four-component model for effect-size distribution (Scenario A, see Methods), where SNPs could have direct effects on neither trait, only on X, only on Y, or on both with the effects being correlated. The proportion of valid instruments, i.e., the SNPs which have only direct effects on X as a proportion of the total number of SNPs which are associated with X, is fixed at 50% or 25%. n_X = N: sample size of the study associated with X; n_Y: sample size of the study associated with Y. Standard error bars higher than 60 are truncated and marked with *true-value. The average number of IVs, defined as the SNPs which reach genome-wide significance (z-test p < 5 × 10−8) in the study associated with X, is 14, 105, 399, 1135, and 1780 for N = 50 k, 100 k, 200 k, 500 k, and 1000 k, respectively. Source data are provided as a Source Data file

Fig. 3

Simulation results when genetic correlation due to causal and pleiotropic effects are in opposite directions. Mean and standard deviation of causal estimates are reported over 100 simulations. The true causal effect θ = −0.2. Estimates of association coefficients for SNPs across two traits are simulated assuming an underlying four-component model for effect-size distribution (Scenario A, see Methods), where SNPs could have direct effects on neither trait, only on X, only on Y, or on both with the effects being correlated. The proportion of valid instruments, i.e., the SNPs which have only direct effects on X as a proportion of the total number of SNPs which are associated with X, is fixed at 50% or 25%. n_X = N: sample size of the study associated with X; n_Y: sample size of the study associated with Y. Standard error bars higher than 60 are truncated and marked with *true-value. The average number of IVs, defined as the SNPs which reach genome-wide significance (z-test p < 5 × 10−8) in the study associated with X, is 14, 105, 399, 1135, and 1780 for N = 50 k, 100 k, 200 k, 500 k, and 1000 k, respectively. Source data are provided as a Source Data file

When the genetic correlations due to the causal relationship and pleiotropic effects were in opposite directions (Fig. 3), MRMix showed more notable bias in estimation of the causal effect; the direction of bias was generally towards the null and did not lead to estimates in the opposite direction of the true effect. The degree of bias was greater when the proportion of valid instruments was smaller, but the bias steadily disappeared with increasing sample size irrespective of the proportion of valid instruments. In this scenario, the bias of all the other methods was much more severe and sometimes led to average estimates of causal effects in the opposite direction to the true effects. As before, the weighted mode method was the most robust among the alternatives considered, and yet it produced substantially more biased estimates of the causal effect than MRMix in a number of scenarios.

Simulation studies also reveal that MRMix estimates have much higher precision, i.e., smaller standard errors, than comparably robust estimators. In particular, the relative efficiency of MRMix compared to the weighted mode estimator, evaluated as the inverse of the ratio of the respective variances, reached 3–4 fold in some of the settings. As expected, the IVW method generally had the smallest standard errors across all the scenarios, but because it produced severe bias its efficiency is not comparable to that of MRMix.
There were also several scenarios where the weighted median estimator had smaller standard errors than MRMix, but in all of these cases the former method produced substantially more biased estimates. Finally, across all scenarios, the Egger regression method produced estimates with much larger standard errors than the alternatives.

Reverse directional MR analysis shows that MRMix is highly sensitive to the causal direction (Supplementary Figures 2, 3). In contrast, the IVW, Egger regression and weighted median methods often produced estimates of substantial magnitude for the causal effect of the outcome (Y) on the exposure (X). The weighted mode method, similar to MRMix, was found to be robust in this regard. In an alternative simulation scenario, where we allow SNPs with larger effects to be more likely to be valid IVs, the methods rank similarly as described above, although all the methods tend to be less biased (Supplementary Figures 4, 5). When the effect-sizes are generated from non-normal distributions, MRMix shows similar robustness and efficiency gain compared to MR-mode (Supplementary Figures 6–9). Simulation studies also showed that when the number of selected instruments was large, the analytical formula for the standard error obtained through asymptotic theory is generally quite accurate (Supplementary Table 1). When n_X is between 100 K and 200 K, the estimator is conservative in the sense that it leads to some degree of overestimation of the true standard error.
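
The relative-efficiency comparison reported in this section, evaluated as the inverse of the ratio of the respective variances, can be written out as follows. The symbols RE and the subscripted estimators are notation assumed here for illustration; the article states the definition only in words.

```latex
\mathrm{RE}
  = \left( \frac{\operatorname{Var}(\hat\theta_{\mathrm{MRMix}})}
                {\operatorname{Var}(\hat\theta_{\mathrm{mode}})} \right)^{-1}
  = \frac{\operatorname{Var}(\hat\theta_{\mathrm{mode}})}
         {\operatorname{Var}(\hat\theta_{\mathrm{MRMix}})},
\qquad
\mathrm{RE} \in [3,\, 4]
\;\Longleftrightarrow\;
\frac{\mathrm{SE}(\hat\theta_{\mathrm{mode}})}
     {\mathrm{SE}(\hat\theta_{\mathrm{MRMix}})}
  \in [\sqrt{3},\, 2] \approx [1.73,\, 2].
```

Read this way, a 3–4 fold relative efficiency corresponds to weighted mode standard errors that are roughly 1.7 to 2 times those of MRMix.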

Data analysis

The datasets used are summarized in Supplementary Table 2. The MRMix analysis detected significant causal effects of genetically determined LDL-C, BMI, and blood pressure, but not of HDL-C and triglycerides (TG), on the risk of coronary artery disease (CAD) (Table 1). There were important differences across the methods in estimates of the causal effect for some of these factors. In particular, both the IVW and the weighted median methods detected significant causal effects of HDL-C and triglycerides, in directions consistent with known epidemiologic associations. The weighted mode method detected some effect for triglycerides, but the estimate had a large standard error and was not statistically significant. The MRMix method estimated the causal effect for both of these lipid factors to be virtually zero. All methods detected a causal effect of LDL-C in the expected direction and produced effect-size estimates in a similar range (OR for CAD per SD unit increase in LDL ranged between 1.28 and 1.51), but notably lower than those reported by a previous MR analysis based on a smaller number of genetic instruments[23]. Almost all methods detected causal effects of blood pressure and BMI in directions consistent with epidemiologic studies and produced effect-size estimates in a similar range. Egger regression and MR-mode yielded substantially wider confidence intervals, leading to statistically non-significant or borderline significant results.

Table 1

Estimates and 95% confidence intervals for causal effects (log-OR of disease per SD-unit increase in risk-factor) of various putative risk-factors on three disease outcomes

Disease^a	Risk-factors^b	# of IVs^c	MRMix	IVW	Weighted median	Weighted mode	Egger	LDSC^d
CAD	BMI	972	0.39 [0.32, 0.46]	0.34 [0.3, 0.39]	0.36 [0.3, 0.42]	0.23 [−0.02, 0.48]	0.76 [0.44, 1.08]	0.44
	LDL	155	0.33 [0.23, 0.43]	0.28 [0.21, 0.35]	0.28 [0.22, 0.35]	0.25 [0.04, 0.45]	0.41 [0.05, 0.76]	0.14
	HDL	200	−0.01 [−0.12, 0.1]	−0.17 [−0.24, −0.1]	−0.08 [−0.15, −0.02]	−0.01 [−0.2, 0.17]	0.23 [−0.09, 0.54]	−0.28
	TG	128	−0.04 [−0.25, 0.17]	0.24 [0.16, 0.31]	0.19 [0.11, 0.28]	0.14 [−0.14, 0.42]	0.23 [−0.12, 0.58]	0.26
	SBP	215	0.49 [0.33, 0.65]	0.44 [0.34, 0.54]	0.44 [0.35, 0.54]	0.43 [0.17, 0.69]	0.5 [−0.18, 1.18]	0.54
	DBP	237	0.4 [0.24, 0.56]	0.4 [0.31, 0.5]	0.36 [0.28, 0.45]	0.33 [0.08, 0.59]	0.3 [−0.37, 0.97]	0.58
BC	BMI	839	−0.18 [−0.28, −0.08]	−0.1 [−0.16, −0.05]	−0.14 [−0.2, −0.08]	−0.15 [−0.45, 0.14]	−0.14 [−0.5, 0.21]	−0.19
	Height	3794	0.03 [−0.02, 0.08]	0.02 [0, 0.04]	0.02 [0, 0.05]	0.01 [−0.15, 0.17]	0.04 [−0.11, 0.18]	0.08
	LDL	125	0.11 [−0.24, 0.46]	0.06 [−0.01, 0.12]	0.08 [0.01, 0.15]	0.09 [−0.11, 0.3]	0.12 [−0.2, 0.45]	0.02
	HDL	152	0.24 [0, 0.48]	0.09 [0.04, 0.15]	0.11 [0.05, 0.17]	0.13 [0, 0.27]	0.03 [−0.22, 0.27]	0.1
	TG	104	−0.07 [−0.43, 0.29]	−0.06 [−0.13, 0.01]	−0.07 [−0.15, 0]	−0.11 [−0.25, 0.03]	−0.2 [−0.52, 0.11]	−0.04
	Age at menarche	262	−0.13 [−0.28, 0.02]	−0.02 [−0.08, 0.04]	−0.05 [−0.11, 0.02]	−0.08 [−0.25, 0.1]	0.18 [−0.24, 0.59]	0.21
MDD	BMI	971	0.34 [0.09, 0.59]	0.14 [0.09, 0.2]	0.19 [0.12, 0.25]	0.39 [0.01, 0.76]	0.11 [−0.24, 0.45]	0.14
	Years of education	510	−0.18 [−2.18, 1.82]	−0.31 [−0.39, −0.23]	−0.3 [−0.39, −0.2]	−0.23 [−0.57, 0.11]	−0.4 [−0.97, 0.18]	−0.52

^a CAD: coronary artery disease; BC: breast cancer; MDD: major depressive disorder

^b LDL: low-density lipoprotein cholesterol. HDL: high-density lipoprotein cholesterol. TG: triglycerides. DBP: diastolic blood pressure. SBP: systolic blood pressure

^c IVs are defined as SNPs which reach genome-wide significance (z-test p < 5 × 10−8) in the study associated with X

^d LDSC: the LD score regression estimate of the causal effect is defined as the ratio between the estimated genetic covariance and the estimated heritability of the exposure (see Supplementary Notes for details)
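
Table 1 reports effects on the log-OR scale, while the text quotes odds ratios. The conversion is a plain exponentiation, sketched below for three MRMix rows copied from the table. The function and variable names are illustrative assumptions.

```python
from math import exp

def log_or_to_or(log_or, ci_low, ci_high):
    """Convert a log-odds-ratio estimate and its confidence limits to the OR scale."""
    return exp(log_or), exp(ci_low), exp(ci_high)

# MRMix estimates copied from Table 1: (estimate, lower limit, upper limit)
table1_mrmix = {
    ("BC", "BMI"): (-0.18, -0.28, -0.08),
    ("BC", "HDL"): (0.24, 0.00, 0.48),
    ("CAD", "LDL"): (0.33, 0.23, 0.43),
}

odds_ratios = {key: log_or_to_or(*row) for key, row in table1_mrmix.items()}

for (disease, factor), (or_, low, high) in odds_ratios.items():
    print(f"{factor} -> {disease}: OR = {or_:.2f} [{low:.2f}, {high:.2f}]")
```

The BMI and HDL rows for breast cancer reproduce the odds ratios of 0.84 and 1.27 quoted in the text.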

The MRMix analysis detected a significant causal effect of genetically determined BMI on the risk of breast cancer (BC). The method also detected suggestive evidence of causal effects for HDL-C and age-at-menarche (AAM), but not for height, LDL-C, and TG. There were, again, important differences across methods. MRMix inferred a negative causal relationship between increased BMI and the risk of BC, inconsistent with the positive association that is typically seen in epidemiologic studies. A previous MR analysis[24] that used fewer genetic instruments also detected the negative direction of the causal effect, but reported somewhat stronger effect-sizes (OR per SD unit increase in BMI reported to be in the range 0.56–0.75, compared to 0.84 by MRMix in the current study). The estimates from all the other methods were also in the negative direction, but those obtained from the weighted mode and Egger regression did not reach statistical significance due to wide confidence intervals. MRMix indicated that an increased level of HDL-C could be causally related to higher risk of BC (OR = 1.27 per SD increase in HDL-C level), but the result was only borderline statistically significant.
The IVW and weighted median methods also detected this effect in the same direction, but the estimated effect-sizes were notably (by 50%) smaller. The weighted mode and Egger regression estimates were not statistically significant due to wide confidence intervals. MRMix was the only method that detected suggestive evidence of a causal effect of AAM on the risk of BC, and the direction of effect was consistent with the known epidemiologic association. Intriguingly, one previous study also noted that the standard IVW analysis does not detect any causal effect of AAM on the risk of breast cancer[25]. However, a significant causal effect, in the same direction as the MRMix estimate, has been reported in previous MR analyses that adjusted for the genetic relationship between AAM and BMI[25,26]. Finally, none of the methods detected a significant causal relationship between height and risk of BC, although epidemiologic studies have consistently reported a positive association.

MRMix detected that genetically determined BMI increases the risk of major depressive disorder (MDD). All the other methods detected an effect in the same direction, but the magnitudes of the effect-sizes were notably smaller for the IVW, weighted median and Egger regression methods than for the weighted mode and MRMix methods. The MRMix estimate for the effect of genetically determined years of education (EDY) on MDD had a very wide confidence interval and provided no evidence of a statistically significant causal effect. In contrast, the IVW and the weighted median methods detected a statistically significant inverse causal relationship between EDY and MDD. The weighted mode method also estimated the effect to be in the same direction, but the magnitude of the effect was attenuated and did not reach statistical significance.

Discussion

In this article, we develop a novel and powerful method for conducting MR analysis using a large number of genetic instruments, based on normal-mixture models for the effect-size distribution in which distinct mixture components are incorporated to allow genetic correlations to arise both from causal and non-causal relationships. To gain robustness against possible model misspecification, we do not directly rely on the likelihood for model-based inference. Instead, we develop an estimating equation approach that, in essence, estimates the causal effect by maximizing the probability concentration of residuals (defined as the total effect of SNPs on one trait after subtracting off indirect effects through the other trait) at the null component of a two-component normal-mixture model. Both simulation studies and extensive data analyses show the method is not only robust, i.e., immune to bias in the presence of a large number of invalid instruments, but also can be highly efficient, i.e., it produces substantially more precise estimates of putative causal effects than alternative robust methods. The investigations also show the method is sensitive to the direction of causality and hence suitable for bi-directional MR analysis.

Simulation studies clearly demonstrate the superior performance of MRMix compared to a number of popular existing methods for MR analysis (Figs 2 and 3, Supplementary Figures 1–9). Stability of the method does require an adequately large sample size for the GWAS of the putative exposure of interest, so that the number of instruments available for the analysis is reasonably large (e.g., >50). Once this threshold is exceeded, the method appears to be highly adaptive in dealing with invalid instruments and can maintain an excellent trade-off between bias and efficiency compared to other methods.
Even in the presence of a large number of invalid instruments, the method often produces unbiased estimates of the causal effect and, in settings where there was notable bias, the bias was generally towards the null and disappeared with increasing sample size. In the same settings, the alternative methods generally produced much larger bias, sometimes in directions away from the null, and the bias does not always diminish with sample size. Among the alternatives considered, the mode-based ratio estimator shows a similar level of robustness to MRMix for large sample sizes, which is intuitive given that both MRMix and the mode-based estimator rely on the ZEMPA assumption. In spite of this similarity, for smaller sample sizes, MRMix produces distinctly smaller bias in several settings. Further, MRMix clearly produces estimates with much smaller standard errors, and this gain in efficiency is more pronounced when the number of valid instruments is larger, demonstrating the ability of the method to use the valid instruments more effectively than the weighted mode estimator. Although tuning the bandwidth of the mode-based estimator could improve its stability and efficiency, it is an additional difficulty for users to deal with.

The MR analysis of the causal relationship between age-at-menarche and the risk of breast cancer provides an important empirical illustration of the strength of MRMix. It has been previously reported that the genetic correlations between AAM and BC (one due to the underlying direct causal effect and one due to the confounding/mediating effect of BMI) act in opposite directions. Thus, polygenic MR analysis using the standard IVW method could fail to identify the causal relationship[25]. However, when the IVW estimator is adjusted for the relationship of AAM-associated SNPs with BMI, evidence of an inverse causal relationship between AAM and BC risk, consistent with epidemiologic observation, has been reported[26].
Consistent with previous studies, in our analysis the standard IVW method estimated the causal effect of AAM to be virtually null. Although MRMix did not explicitly account for BMI, it produced an estimate of the causal effect similar to those reported from BMI-adjusted IVW in previous studies[25,26]. The weighted mode method pulled the estimate further in the right direction than IVW, but the estimate was attenuated and had a very wide confidence interval. Further, bi-directional MR analysis between BMI and AAM using MRMix suggested that genetically predicted BMI has an inverse causal effect on AAM, and that the reverse directional effect is much weaker (Supplementary Tables 3, 4). It thus appears that the SNPs associated with AAM through BMI, which on its own influences the risk of BC, are the underlying invalid instruments. The example demonstrates that MRMix has the ability to produce robust and efficient estimates of causal effects in the presence of potentially unobservable confounding factors.

Additional data analyses also illustrated the distinct properties of MRMix compared to the alternatives. In particular, the MRMix method estimated the causal effects of HDL and triglycerides on the risk of CAD to be virtually null, while the standard IVW and weighted median methods detected these effects to be significantly away from the null in directions consistent with known epidemiologic associations. A recent study reported a significant putative causal effect of years of education on reducing the risk of major depressive disorder based on standard IVW analysis[27]. MRMix analysis produced a large degree of uncertainty in the underlying estimate of the causal effect and did not provide any evidence of statistical significance. MRMix found the causal effect of genetically determined BMI on increasing the risk of MDD to be notably stronger than that indicated by the traditional IVW method.
Thus, it is possible that there are common genetic pathways underlying these traits which lead to genetic correlation in the direction opposite to that arising from the direct causal effect of BMI on MDD.

Recently a number of alternative methods have been proposed for conducting robust MR analysis in the presence of invalid instruments. One such method, termed MR-PRESSO[16], applies an outlier-detection test to each individual genetic variant and removes potentially invalid instruments. While the method was shown to be highly useful for detecting bias in reported estimates of causal effects in existing MR analyses, it can only partially correct for bias and relies on the InSIDE assumption. Further, because the method requires conducting a series of tests and evaluating their significance based on simulations, its implementation can be time-consuming and estimation of the uncertainty associated with the final estimator can be challenging. A related method for instrument selection, termed two-stage hard thresholding (TSHT) with voting, constructs many estimates of the set of valid IVs and uses majority or plurality voting to make final decisions[28]. This method was proved to be consistent in instrument selection and effect estimation, but it requires individual-level data and is not as widely applicable as methods based on summary-level data. Another method proposes to obtain IVW estimators for all possible subsets of genetic instruments and then combine them by model averaging, with lower weight given to more heterogeneous subsets[29]. While the method is shown to be highly robust as well as powerful, it is currently not scalable to the analysis of a large number of instruments, which is the focus of the current study.

Another study proposed analysis of the causal relationship between traits based on genetic relationships, but using a different framework from that of standard MR analysis.
The study defined one trait to be partially or fully causal for another if there is an underlying genetically determined latent variable which influences both traits, but has a stronger relationship with the first than the second[30]. The study defined moment equations for the estimation of parameters quantifying the degree of partial causality using GWAS summary statistics. We believe this novel framework and the more traditional MR hypothesis can complement each other to provide an improved understanding of the nature of genetic correlation across traits. The use of the latent variable framework, for example, detected evidence of partial causality of several cholesterol traits on blood pressure level. Neither MRMix nor any of the other methods detected any evidence of direct causal effects underlying these traits (see Supplementary Table 3). Thus the evidence of partial causality is likely to have been primarily driven by the existence of underlying common genetic pathways which are more strongly related to cholesterol level than to blood pressure.

The MRMix method has limitations as well. First, the method relies on a certain model for the underlying effect-size distribution. As we have noted before, mis-specification of the effect-size distribution is not as critical, because we do not use the model directly to perform maximum-likelihood estimation, but instead use the model as an efficient way of identifying a certain “mode”-based estimator based on underlying estimating equations. Simulation studies show that even when the underlying effect-size distribution has more complexity than the assumed model, the estimation of causal effect parameters can remain relatively robust. Nevertheless, more extensive simulation and theoretical studies are needed to further understand the properties of the method under complex but realistic models for effect-size distributions, as evidenced by a recent study[5].
Second, the method does require pre-selection of SNPs as genetic instruments based on the p-value of the z-test for the significance of their association. We have observed that as long as the significance threshold is stringent (e.g., p-value <5 × 10−8), there is no substantial winner’s curse bias due to the selection of SNPs and estimation of their coefficients from the same study (Supplementary Table 5). While the method, in principle, can be extended to include SNPs selected with a more liberal threshold, it can suffer from winner’s curse bias unless the SNP selection and coefficient estimation are performed in independent studies (Supplementary Table 5). Third, in our current study, we have focused on the analysis of independent SNPs selected from GWAS through stringent LD-pruning after prioritizing by p-values. As we perform MR analysis based on the marginal effects of the individual SNPs, some of which may tag multiple underlying causal SNPs, the underlying pattern of LD may cause some bias in MRMix as well as in the other methods. Further studies are needed to investigate the effect of LD in MR analysis, especially when a large number of genetic instruments are used. Fourth, though our simulation settings are flexible and realistic, they assume zero modal pleiotropy. Further studies are needed to investigate the performance in scenarios where ZEMPA is violated or scenarios that favor other methods (e.g., median or Egger regression). Further studies are also merited to explore the properties of MRMix under more complex causal structures in the data, such as partial causality[30] and multiple components of causality[29].

In conclusion, MRMix provides a novel tool for conducting robust and powerful MR analysis using the large number of genetic instruments that are now rapidly becoming available from the recent expansion of GWAS.
We demonstrate through simulation studies, as well as a variety of real data analyses, that the method has a notable ability to trade off bias and efficiency in the estimation of causal effects in the presence of invalid instruments. Application of MRMix in future MR studies will lead to an improved understanding of the causal basis of genetic correlation across traits.

Methods

Model setup

We propose a method for two-sample MR analysis that requires only summary-level GWAS association statistics for a putative exposure (X) and the outcome (Y) from separate studies. We describe the proposed method in the context of independent SNPs. Let $(\beta_{xj}, \beta_{yj})$, $j = 1, 2, \ldots, M$, denote the underlying true association coefficients, on a standardized scale, of the M SNPs with the exposure (X) and the outcome (Y). Standard MR analysis assumes that all the SNPs are valid instruments, i.e., they are associated with X but have no direct effect on Y. If this assumption is satisfied, the two sets of regression coefficients satisfy a proportional relationship of the form $\beta_{yj} = \theta\beta_{xj}$, where θ is the causal effect of X on Y. The proportional relationship holds if X and Y follow linear models of the form $X = \beta_x G + \epsilon_x$ and $Y = \theta X + \epsilon_y$, with the assumption that $E(\epsilon_x \mid G) = 0$ and $E(\epsilon_y \mid G) = 0$. Further, the relationship holds for binary Y if X and Y follow the log-linear model $\log P(Y = 1 \mid X) = \alpha + \theta X$ and the regression model for X is $X = \beta_x G + \epsilon_x$, where $\epsilon_x$ is distributed independently of G. Then we can derive $\log P(Y = 1 \mid G) = \alpha^{*} + \theta\beta_x G$, where the constant $\alpha^{*}$ does not depend on G. The proportionality assumption also holds approximately under a logistic regression model when the outcome is relatively rare in the population, in which case the log-linear and logistic models become similar. In our data analysis for disease outcomes, we assume that the proportional relationship holds on the log-odds-ratio scale[31].

Instead of assuming that the proportional relationship holds across all instruments, we propose modeling the bivariate effect-size distribution using a flexible normal-mixture model in which the proportional relationship needs to be satisfied only for a fraction of the genetic variants. We assume a SNP can have four different types of effects: (1) a direct effect on X and an indirect effect on Y only through X, (2) direct effects on both X and Y, (3) a direct effect on Y but no relationship with X, (4) no relationship with either X or Y (Fig. 1a). 
If we let $u_{xj}$ and $u_{yj}$ be the direct effects of SNP j on X and Y, respectively, then we can write $\beta_{xj} = u_{xj}$ and $\beta_{yj} = u_{yj} + \theta u_{xj}$. We also make distributional assumptions for the four types of effects, which occur with mixing proportions $\pi_1, \pi_2, \pi_3, \pi_4$: (1) $u_{xj} \sim N(0, \sigma_x^2)$, $u_{yj} = 0$; (2) $(u_{xj}, u_{yj})^T \sim N(0, \Sigma_u)$, where $\Sigma_u$ has diagonal entries $\sigma_x^2$, $\sigma_y^2$ and off-diagonal entry $\sigma_{xy}$; (3) $u_{xj} = 0$, $u_{yj} \sim N(0, \sigma_y^2)$; (4) $u_{xj} = u_{yj} = 0$. The first component includes the valid IVs. The second component includes SNPs with horizontal pleiotropy, i.e., SNPs with potentially correlated effects on X and Y. This component allows violation of the InSIDE assumption, as we allow $\sigma_{xy} \neq 0$. The SNPs in the third and fourth components are not associated with X, but can be included when we apply a liberal instrument selection threshold. Note that our model implies the zero modal pleiotropy assumption (ZEMPA), which is also required by the mode-based estimator[21]. To see this, note that the pleiotropic effect $u_{yj} = 0$ in the first and fourth components, and follows a continuous distribution in the second and third components. Hence the most common value of $u_{yj}$ is 0, which is equivalent to the ZEMPA assumption.

MR analysis using mixture-model (MRMix)

In GWAS, we obtain noisy estimates $\hat\beta_{xj}$ and $\hat\beta_{yj}$, for which one can assume $\hat\beta_{xj} \sim N(\beta_{xj}, s_{xj}^2)$ and $\hat\beta_{yj} \sim N(\beta_{yj}, s_{yj}^2)$ with known standard errors $s_{xj}$ and $s_{yj}$. In principle, a likelihood for the observed data can be written by integrating over the “prior” model for the bivariate effect-size distribution. However, maximum-likelihood estimation of the target parameter θ jointly with all of the nuisance parameters may face computational challenges due to identifiability issues associated with mixture likelihoods. Further, the inference can be sensitive to violation of the underlying modeling assumptions. In the following, we propose an alternative estimation procedure that is computationally simple and relies less on the underlying model assumptions. This procedure effectively solves a spike-detection problem.

Intuitively, we observe that under this model, the true causal effect θ maximizes the number of points “close” to the line $\beta_y = \theta\beta_x$, i.e., points for which the vertical distance $\hat\beta_{yj} - \theta\hat\beta_{xj}$ can be covered by the null distribution $N(0, s_{yj}^2 + \theta^2 s_{xj}^2)$. For a different value $\tilde\theta \neq \theta$, the null distribution covers a smaller number of points (Fig. 1b). We characterize this observation statistically as follows. The distribution of the residuals $\beta_{yj} - \tilde\theta\beta_{xj}$ can be written, at the true (θ) and alternative ($\tilde\theta$) values under the proposed model, in the form

$$\beta_{yj} - \tilde\theta\beta_{xj} \sim \pi_1 N\big(0, (\theta - \tilde\theta)^2\sigma_x^2\big) + \pi_2 N\big(0, (\theta - \tilde\theta)^2\sigma_x^2 + 2(\theta - \tilde\theta)\sigma_{xy} + \sigma_y^2\big) + \pi_3 N(0, \sigma_y^2) + \pi_4\delta_0,$$

where $\delta_0$ denotes a point mass at 0 and the variance parameters are those of the effect-size model. See Supplementary Notes for details. Note that when $\tilde\theta = \theta$, the first and fourth terms collapse, leading to an enrichment of the point mass at 0. Only at the true value θ does $\beta_{yj} - \tilde\theta\beta_{xj}$ have an enriched point mass $\pi_1 + \pi_4$ at 0, while for other values this point mass is $\pi_4$. The enrichment $\pi_1$ is contributed by the SNPs that have no direct effects on Y, the key assumption underlying the instrumental variable (IV) method. Our approach uses this property to identify the causal effect. 
Based on the above observations, we propose the following estimation procedure:

(i) For a fixed $\tilde\theta$, perform maximum-likelihood estimation to fit the two-component normal mixture model $\hat\beta_{yj} - \tilde\theta\hat\beta_{xj} \sim \pi_0 N(0, s_{yj}^2 + \tilde\theta^2 s_{xj}^2) + (1 - \pi_0) N(0, s_{yj}^2 + \tilde\theta^2 s_{xj}^2 + \sigma^2)$ and obtain estimates of the unknown parameters, $\hat\pi_0(\tilde\theta)$ and $\hat\sigma^2(\tilde\theta)$;
(ii) Search over a grid of values of $\tilde\theta$ and choose the one that maximizes $\hat\pi_0(\tilde\theta)$ as the estimate, i.e., $\hat\theta = \arg\max_{\tilde\theta} \hat\pi_0(\tilde\theta)$.

Under the working model, $\pi_0$ is the proportion of valid IVs and $\sigma^2$ is the unknown variance parameter associated with the invalid IVs. We note that in step (i), for computational simplification, we fit only a two-component normal mixture model, which is correct when $\tilde\theta = \theta$ (the true value). When $\tilde\theta \neq \theta$, the two-component model is not correct and can only approximate the underlying multi-component normal mixture model. We observe in simulation studies that although the model is wrong under the alternative, the proposed estimate has no asymptotic bias. In contrast, a maximum-likelihood estimator, which maximizes the likelihood of the residuals under the two-component normal-mixture model, produces a substantially biased estimate of the causal effect due to mis-specification of the model under the alternative. If the studies for X and Y have overlapping subjects, the null component in step (i) can be easily modified to account for correlation in the estimated effects. For example, one could use bivariate LD score regression[13,32] to estimate $\mathrm{Cov}(\hat\beta_{xj}, \hat\beta_{yj})$, the covariance of the GWAS estimates given the effect-sizes. The null component could then be modified as $N\big(0, s_{yj}^2 + \tilde\theta^2 s_{xj}^2 - 2\tilde\theta\,\mathrm{Cov}(\hat\beta_{xj}, \hat\beta_{yj})\big)$ to account for sample overlap across the studies of X and Y.
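The two-step procedure can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes a common standard error for all SNPs (so that step (i) has a closed-form EM update), uses a toy simulation with uncorrelated pleiotropy, and all function names, the grid, and the parameter values are hypothetical choices made for illustration.

```python
import math
import random


def fit_null_proportion(residuals, null_var, n_iter=60):
    """EM fit of r ~ pi0*N(0, null_var) + (1 - pi0)*N(0, null_var + sigma2).

    Simplifying assumption: one null variance shared by all SNPs.
    Returns (pi0_hat, sigma2_hat).
    """
    pi0 = 0.5
    mean_sq = sum(r * r for r in residuals) / len(residuals)
    alt_var = max(mean_sq, 2.0 * null_var)  # variance of the wide component
    for _ in range(n_iter):
        # E-step: posterior probability that each residual comes from the null
        weights = []
        for r in residuals:
            d0 = pi0 * math.exp(-0.5 * r * r / null_var) / math.sqrt(null_var)
            d1 = (1.0 - pi0) * math.exp(-0.5 * r * r / alt_var) / math.sqrt(alt_var)
            weights.append(d0 / (d0 + d1 + 1e-300))
        # M-step: update mixing proportion and wide-component variance,
        # keeping the wide component at least as wide as the null
        w_sum = sum(weights)
        pi0 = min(max(w_sum / len(residuals), 1e-6), 1.0 - 1e-6)
        alt_ss = sum((1.0 - w) * r * r for w, r in zip(weights, residuals))
        alt_var = max(alt_ss / max(len(residuals) - w_sum, 1e-12),
                      null_var * (1.0 + 1e-6))
    return pi0, alt_var - null_var


def mrmix_sketch(beta_x_hat, beta_y_hat, s_x, s_y, grid):
    """Step (ii): pick the grid value of theta with the largest fitted pi0."""
    best_theta, best_pi0 = None, -1.0
    for theta in grid:
        resid = [by - theta * bx for bx, by in zip(beta_x_hat, beta_y_hat)]
        null_var = s_y ** 2 + theta ** 2 * s_x ** 2
        pi0, _ = fit_null_proportion(resid, null_var)
        if pi0 > best_pi0:
            best_theta, best_pi0 = theta, pi0
    return best_theta, best_pi0


def simulate_toy(n_snps=400, theta=0.3, valid_frac=0.6, s=0.001, seed=1):
    """Toy instruments: 60% valid, 40% with a direct effect on Y (assumed values)."""
    rng = random.Random(seed)
    bx, by = [], []
    for _ in range(n_snps):
        u_x = rng.gauss(0.0, 0.02)
        u_y = 0.0 if rng.random() < valid_frac else rng.gauss(0.0, 0.01)
        bx.append(u_x + rng.gauss(0.0, s))
        by.append(u_y + theta * u_x + rng.gauss(0.0, s))
    return bx, by


grid = [round(-0.5 + 0.02 * k, 2) for k in range(51)]
bx, by = simulate_toy()
theta_hat, pi0_hat = mrmix_sketch(bx, by, 0.001, 0.001, grid)
print(theta_hat, pi0_hat)
```

With SNP-specific standard errors the M-step for the variance parameter has no closed form, so a numerical optimizer would be needed in step (i); the grid search in step (ii) is unchanged.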

Variance estimation

In this section, we use asymptotic theory to derive the standard error of the MRMix estimator. Although inference for spike detection is non-standard, we can derive an underlying estimating equation by exploiting the fact that $\hat\theta$ maximizes $\hat\pi_0(\theta)$, which itself is obtained by maximization of a parametric likelihood. Note that the value of θ that maximizes $\hat\pi_0(\theta)$ can be found by solving the equation $\partial\hat\pi_0(\theta)/\partial\theta = 0$. We can express $\partial\hat\pi_0(\theta)/\partial\theta$ in terms of the parametric likelihood using the implicit function theorem. For each θ, we fit a two-component normal mixture model with log-likelihood $l(\theta, \pi_0, \sigma^2)$ to estimate $\pi_0$ and $\sigma^2$. The score equations are $\partial l/\partial\pi_0 = 0$, $\partial l/\partial\sigma^2 = 0$. Using the implicit function theorem, we can show that, under suitable regularity conditions, solving $\partial\hat\pi_0(\theta)/\partial\theta = 0$ is equivalent to solving an estimating equation in the second-order partial derivatives of $l$. By taking the Taylor expansion of the estimating function with respect to first θ and then the β’s, we obtain an influence function representation of the final estimator (see Supplementary Notes for details). The standard error of $\hat\theta$ can be calculated from the influence function representation.
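The implicit-function step can be sketched as follows. This is a reconstruction from the score equations stated above, not a copy of the derivation in the Supplementary Notes; the subscript notation for partial derivatives of $l$ (e.g., $l_{\pi_0\theta} = \partial^2 l / \partial\pi_0\,\partial\theta$) is introduced here for illustration, and the nonsingular-Hessian condition is the standard requirement of the theorem.

```latex
% Score equations define (\hat\pi_0(\theta), \hat\sigma^2(\theta)) implicitly:
%   l_{\pi_0}(\theta, \hat\pi_0, \hat\sigma^2) = 0, \quad
%   l_{\sigma^2}(\theta, \hat\pi_0, \hat\sigma^2) = 0.
\begin{aligned}
\frac{d}{d\theta}
\begin{pmatrix} \hat\pi_0 \\ \hat\sigma^2 \end{pmatrix}
&= -
\begin{pmatrix}
l_{\pi_0\pi_0} & l_{\pi_0\sigma^2} \\
l_{\pi_0\sigma^2} & l_{\sigma^2\sigma^2}
\end{pmatrix}^{-1}
\begin{pmatrix} l_{\pi_0\theta} \\ l_{\sigma^2\theta} \end{pmatrix}, \\[4pt]
\frac{\partial\hat\pi_0}{\partial\theta}
&= -\,\frac{l_{\sigma^2\sigma^2}\, l_{\pi_0\theta} - l_{\pi_0\sigma^2}\, l_{\sigma^2\theta}}
          {l_{\pi_0\pi_0}\, l_{\sigma^2\sigma^2} - l_{\pi_0\sigma^2}^{2}} .
\end{aligned}
% Hence, when the Hessian determinant in the denominator is nonzero,
%   \partial\hat\pi_0/\partial\theta = 0
%   \iff l_{\sigma^2\sigma^2}\, l_{\pi_0\theta} - l_{\pi_0\sigma^2}\, l_{\sigma^2\theta} = 0 .
```

Under this sketch, the numerator serves as the estimating function whose Taylor expansion yields the influence function representation.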

Simulation setup

We conduct extensive simulation studies to evaluate the proposed method under different scenarios. We simulate genome-wide summary statistics of 200,000 independent SNPs. We first simulate the direct effect-sizes $(u_{xj}, u_{yj})$ and compute the total effect-sizes as $\beta_{xj} = u_{xj}$ and $\beta_{yj} = u_{yj} + \theta u_{xj}$. We then generate the summary statistics by simulating independently from $\hat\beta_{xj} \sim N(\beta_{xj}, 1/n_x)$ and $\hat\beta_{yj} \sim N(\beta_{yj}, 1/n_y)$, where $n_x$ and $n_y$ are the sample sizes of the studies associated with X and Y, respectively. This mimics the two-sample MR setup, where the exposure and the outcome are measured on independent samples. In all our simulations, we set $n_y = n_x / C$ and vary C = 1, 3; the choice C = 3 reflects scenarios in which the effective sample size for Y can be expected to be lower than that for the exposure X. We simulated true effect-sizes for the SNPs under the hypothesized four-component model as well as under more complex models that include additional mixture components.

Under Scenario A, we simulated effect-sizes $u_{xj}$ and $u_{yj}$ from the four-component mixture model (Fig. 1a), where we vary the proportion of valid IVs by changing the ratio of $\pi_1$ and $\pi_2$ according to the following specifications: (a) 50% of causal SNPs for X are valid IVs: $\pi_1 = \pi_2 = 0.01$; (b) 25% of causal SNPs for X are valid IVs: $\pi_1 = 0.005$, $\pi_2 = 0.015$. For both cases, we set $\sigma_x^2 = \sigma_y^2 = 5 \times 10^{-5}$ and $\sigma_{xy} = 0.5\sigma_x\sigma_y$. According to the model, a total of 2% of the SNPs have a direct effect on X and either 2% or 2.5% of the SNPs have a direct effect on Y; the overlapping SNPs have strongly correlated effects (correlation = 0.5). The total heritability of X is 20% and that of Y ranges from 18.8% to 28.8%. We allow the causal effect (θ) of X on Y to be null, or non-null in the positive or negative direction, so that the genetic correlations due to pleiotropic and causal effects can act in opposite directions.

In Scenario B, we simulated effect-sizes using a more complex normal mixture model that allows for clusters of non-null SNPs with distinctly larger effects than others[5]. 
In particular, we allow a fraction of causal SNPs for X to have distinctly larger effects, and we assume that the SNPs with larger effects are more likely to be valid IVs. Similarly, among SNPs that have direct effects on Y, we allow for SNPs with distinctly larger effects, and these SNPs are less likely to have a direct effect on X. We generate SNPs with distinct clusters of effect-sizes by incorporating additional normal-mixture components with varying variance-component parameters (see Supplementary Notes for more details). Finally, in a third scenario (Scenario C), we conduct simulations to study the robustness of the proposed method when the effect-size distribution does not follow normal or mixture-normal forms. In particular, we simulated effect-sizes for X and Y using the same mixture model as depicted in Fig. 1a, but generated effect-sizes under each component using non-normal, but symmetric and unimodal, distributions, such as Laplace and t distributions (Supplementary Notes).
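A minimal Python sketch of the Scenario A simulation is given below. Several values are assumptions rather than quantities stated in the text: the variance 5 × 10⁻⁵ is chosen so that 2% of 200,000 SNPs give 20% heritability for X, $\pi_3 = 0.01$ is inferred from the stated proportions of SNPs with direct effects on Y, and the causal effect `theta=0.2`, the sample size `n_x`, and the function name are hypothetical.

```python
import math
import random


def simulate_scenario_a(m=200_000, pi1=0.01, pi2=0.01, pi3=0.01, theta=0.2,
                        var_x=5e-5, var_y=5e-5, corr=0.5,
                        n_x=100_000, c=1, seed=0):
    """Draw true and estimated effects from the four-component model."""
    rng = random.Random(seed)
    sd_x, sd_y = math.sqrt(var_x), math.sqrt(var_y)
    n_y = n_x / c                       # outcome study is C times smaller
    se_x, se_y = math.sqrt(1.0 / n_x), math.sqrt(1.0 / n_y)
    out = {"beta_x": [], "beta_y": [], "beta_x_hat": [], "beta_y_hat": []}
    for _ in range(m):
        v = rng.random()
        u_x = u_y = 0.0                 # component 4: null SNP
        if v < pi1:                     # component 1: valid instrument
            u_x = rng.gauss(0.0, sd_x)
        elif v < pi1 + pi2:             # component 2: correlated pleiotropy
            z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
            u_x = sd_x * z1
            u_y = sd_y * (corr * z1 + math.sqrt(1.0 - corr ** 2) * z2)
        elif v < pi1 + pi2 + pi3:       # component 3: effect on Y only
            u_y = rng.gauss(0.0, sd_y)
        b_x, b_y = u_x, u_y + theta * u_x
        out["beta_x"].append(b_x)
        out["beta_y"].append(b_y)
        # summary statistics on the standardized scale: variance 1/n
        out["beta_x_hat"].append(b_x + rng.gauss(0.0, se_x))
        out["beta_y_hat"].append(b_y + rng.gauss(0.0, se_y))
    return out


sim = simulate_scenario_a()
# On the standardized scale, heritability is the sum of squared true effects.
h2_x = sum(b * b for b in sim["beta_x"])
h2_y = sum(b * b for b in sim["beta_y"])
print(round(h2_x, 3), round(h2_y, 3))
```

Setting `pi1=0.005, pi2=0.015` gives the second specification, and `c=3` gives the smaller outcome study.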

Inclusion of SNPs with liberal threshold

In our main analysis, we focused on SNPs as instruments that have achieved genome-wide significance in the study associated with X. We further explored the ability of the method to handle additional SNPs below genome-wide significance. When SNPs are included using a more liberal threshold, one would expect a fraction of these SNPs to be null. In the presence of null SNPs, the probability concentration of the two-component mixture model for the residuals at the null component is (π1 + π4), where π1 is the proportion of valid instruments and π4 is the proportion of null SNPs. Thus, while the inclusion of more SNPs as potential instruments could increase efficiency by increasing the number of underlying valid instruments, a very liberal threshold leads to a large value of π4 that can obscure estimation of π1. One would therefore expect there to be an optimal threshold for SNP selection, as is typically observed when building polygenic risk scores for risk prediction. We varied the p-value threshold of the z-test for instrument selection over 0.005, 5 × 10⁻⁴ and 5 × 10⁻⁶, and studied the bias and standard errors of the resulting MRMix estimates. Further, as the winner’s curse problem can create bias when selection of SNPs and estimation of their effects are based on the same study, we also studied the performance of MRMix when the effect-sizes of the SNPs associated with X are estimated from a dataset independent of the one used to select the SNPs.
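Threshold-based instrument selection, with effects taken from an independent dataset to limit winner’s curse, can be sketched in Python as follows. The function names and the toy numbers are hypothetical; the p-value is the usual two-sided z-test.

```python
import math


def z_test_pvalue(beta_hat, se):
    """Two-sided p-value of the z-test for beta_hat / se."""
    z = beta_hat / se
    return math.erfc(abs(z) / math.sqrt(2.0))


def select_instruments(selection_stats, threshold):
    """Indices of SNPs whose selection-study p-value is below the threshold.

    selection_stats is a list of (beta_hat, se) pairs for the exposure.
    """
    return [j for j, (b, s) in enumerate(selection_stats)
            if z_test_pvalue(b, s) < threshold]


# Toy exposure statistics from a selection study (z = 6, 5, 3, 1) and
# from an independent estimation study for the same four SNPs.
selection = [(0.030, 0.005), (0.025, 0.005), (0.015, 0.005), (0.005, 0.005)]
estimation = [(0.028, 0.005), (0.020, 0.005), (0.010, 0.005), (0.001, 0.005)]

strict = select_instruments(selection, 5e-8)
moderate = select_instruments(selection, 5e-6)
liberal = select_instruments(selection, 0.005)

# Effects passed on to the MR analysis come from the independent study.
instrument_effects = [estimation[j] for j in liberal]
print(strict, moderate, liberal)
```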

Summary level data

We applied MRMix to publicly available GWAS summary-level data to explore causal relationships underlying a variety of exposure-outcome pairs of interest. We selected these pairs based on available sample sizes and the number of underlying instruments, existing evidence of epidemiologic associations in the literature, and/or evidence of causality from recent MR studies. On the exposure side, we accessed data for height and body mass index[4], blood lipids[33], educational attainment[34], blood pressure[35] and age at menarche[26]. For data analysis, we selected SNPs as potential instruments only if they reached genome-wide significance (z-test p-value < 5 × 10⁻⁸) in the respective studies. Further, we used LD-clumping with an r² threshold of 0.1 to select a set of independent instruments for each trait. The number of instruments across the different exposures varied between 104 and 3794, with the largest numbers being available for height (K = 3794), BMI (K = 972) and years of education (K = 510) due to the availability of results from the large UK Biobank study. On the outcome side, we accessed data for coronary artery disease (CAD)[36], analyzed in relation to the known major risk factors BMI, blood lipids and blood pressure[37-43]; breast cancer[44], in relation to several known epidemiologic risk factors, including height, age at menarche, BMI and cholesterol level[45-48]; and major depressive disorder (MDD)[27], in relation to BMI and years of education[49-51]. In addition, we explored potential causal interrelationships among some of the exposures themselves, such as between BMI and blood pressure, and between BMI and age at menarche[52,53]. See Supplementary Table 2 for more information. For all datasets, we included only SNPs among a set of ~1.07 million HapMap3 SNPs that have MAF > 0.05 and have matching alleles in the 1000 Genomes European sample. 
We set the first allele in the 1000G data as the effect allele, flipping the sign of the coefficient when necessary. We also removed SNPs whose reported sample sizes were less than 2/3 of the 90th percentile of the sample size distribution across SNPs in the respective studies. Finally, we removed SNPs in the major histocompatibility complex (MHC) region (26–34 Mb on chromosome 6) and SNPs with very large z-scores (z² > 80) to prevent outliers from unduly influencing the results[54].
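The three SNP-level filters described above can be sketched in Python as follows. The record layout, the function names, and the nearest-rank percentile convention are assumptions made for illustration; the text does not specify the genome build used for the MHC coordinates or how the percentile was computed.

```python
import math

# MHC region as described in the text: 26-34 Mb on chromosome 6
MHC_CHROM, MHC_START, MHC_END = 6, 26_000_000, 34_000_000


def percentile_nearest_rank(values, q):
    """Nearest-rank percentile (one of several common conventions)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]


def qc_filter(snps):
    """Keep SNPs with adequate sample size, outside the MHC, and z^2 <= 80."""
    n_cutoff = (2.0 / 3.0) * percentile_nearest_rank([s["n"] for s in snps], 0.9)
    kept = []
    for s in snps:
        in_mhc = (s["chrom"] == MHC_CHROM
                  and MHC_START <= s["pos"] <= MHC_END)
        if s["n"] >= n_cutoff and not in_mhc and s["z"] ** 2 <= 80:
            kept.append(s)
    return kept


# Toy records (hypothetical identifiers and values)
snps = [
    {"id": "rs_a", "chrom": 1, "pos": 1_000_000, "z": 6.0, "n": 100_000},
    {"id": "rs_b", "chrom": 6, "pos": 30_000_000, "z": 7.0, "n": 100_000},  # in MHC
    {"id": "rs_c", "chrom": 2, "pos": 5_000_000, "z": 10.0, "n": 100_000},  # z^2 = 100
    {"id": "rs_d", "chrom": 3, "pos": 8_000_000, "z": -5.8, "n": 50_000},   # low n
    {"id": "rs_e", "chrom": 6, "pos": 50_000_000, "z": -6.5, "n": 95_000},
]
kept_ids = [s["id"] for s in qc_filter(snps)]
print(kept_ids)
```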

Alternative methods

For both simulations and real data applications, we compare MRMix with existing, widely used MR methods that allow the estimation of causal effects. In particular, we included the inverse-variance weighted (IVW) method[10], weighted median[20], weighted mode[21] and Egger regression[17]. Further, we observe that if the InSIDE assumption holds across all SNPs, then the LD score regression methodology[13] can be used to estimate the causal effect without any pre-selection of SNPs. In this case, the estimate is simply given by $\hat\theta = \hat\rho_g / \hat h_x^2$, where $\hat\rho_g$ is the estimated genetic covariance of the pair of traits and $\hat h_x^2$ is the estimated heritability of X. This estimator is nearly equivalent to Egger regression using the same set of SNPs (see Supplementary Notes for details). In fact, any method for estimating heritability and genetic correlation can be used in this way to estimate causal effects. Thus, as a benchmark for comparison, in real data analysis, we also report estimates of the causal effect based on LD score regression.
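Two of these comparison estimators are simple enough to sketch in Python. The IVW function below uses the standard fixed-effect form with first-order weights, which is an assumption about the variant used here; the ratio function takes a genetic covariance and a heritability as given inputs and does not run LD score regression itself. Function names and toy values are hypothetical.

```python
import math


def ivw_estimate(beta_x_hat, beta_y_hat, se_y):
    """Fixed-effect inverse-variance weighted estimate and its standard error."""
    num = sum(bx * by / s ** 2 for bx, by, s in zip(beta_x_hat, beta_y_hat, se_y))
    den = sum(bx * bx / s ** 2 for bx, s in zip(beta_x_hat, se_y))
    return num / den, math.sqrt(1.0 / den)


def ratio_from_genetic_covariance(rho_g, h2_x):
    """Causal effect estimate as genetic covariance over heritability of X."""
    return rho_g / h2_x


# Toy instruments that satisfy the proportional relationship exactly
beta_x_hat = [0.02, 0.03, -0.01]
beta_y_hat = [0.5 * b for b in beta_x_hat]
se_y = [0.001, 0.001, 0.001]

theta_ivw, se_ivw = ivw_estimate(beta_x_hat, beta_y_hat, se_y)
theta_ratio = ratio_from_genetic_covariance(rho_g=0.04, h2_x=0.2)
print(theta_ivw, se_ivw, theta_ratio)
```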

Reporting Summary

Further information on experimental design is available in the Nature Research Reporting Summary linked to this article.

50 in total

1. Pooled analysis of prospective cohort studies on height, weight, and breast cancer risk.

Authors: P A van den Brandt; D Spiegelman; S S Yaun; H O Adami; L Beeson; A R Folsom; G Fraser; R A Goldbohm; S Graham; L Kushi; J R Marshall; A B Miller; T Rohan; S A Smith-Warner; F E Speizer; W C Willett; A Wolk; D J Hunter
Journal: Am J Epidemiol Date: 2000-09-15 Impact factor: 4.897

2. Change in body mass index and its impact on blood pressure: a prospective population study.

Authors: W B Drøyvold; K Midthjell; T I L Nilsen; J Holmen
Journal: Int J Obes (Lond) Date: 2005-06 Impact factor: 5.095

Review 3. 10 Years of GWAS Discovery: Biology, Function, and Translation.

Authors: Peter M Visscher; Naomi R Wray; Qian Zhang; Pamela Sklar; Mark I McCarthy; Matthew A Brown; Jian Yang
Journal: Am J Hum Genet Date: 2017-07-06 Impact factor: 11.025

4. BMI as a Modifiable Risk Factor for Type 2 Diabetes: Refining and Understanding Causal Estimates Using Mendelian Randomization.

Authors: Laura J Corbin; Rebecca C Richmond; Kaitlin H Wade; Stephen Burgess; Jack Bowden; George Davey Smith; Nicholas J Timpson
Journal: Diabetes Date: 2016-07-08 Impact factor: 9.461

5. Mendelian randomization analysis with multiple genetic variants using summarized data.

Authors: Stephen Burgess; Adam Butterworth; Simon G Thompson
Journal: Genet Epidemiol Date: 2013-09-20 Impact factor: 2.135

Review 6. Recent Developments in Mendelian Randomization Studies.

Authors: Jie Zheng; Denis Baird; Maria-Carolina Borges; Jack Bowden; Gibran Hemani; Philip Haycock; David M Evans; George Davey Smith
Journal: Curr Epidemiol Rep Date: 2017-11-22

7. Genome-wide association analyses identify 44 risk variants and refine the genetic architecture of major depression.

Authors: Naomi R Wray; Stephan Ripke; Manuel Mattheisen; Maciej Trzaskowski; Enda M Byrne; Abdel Abdellaoui; Mark J Adams; Esben Agerbo; Tracy M Air; Till M F Andlauer; Silviu-Alin Bacanu; Marie Bækvad-Hansen; Aartjan F T Beekman; Tim B Bigdeli; Elisabeth B Binder; Douglas R H Blackwood; Julien Bryois; Henriette N Buttenschøn; Jonas Bybjerg-Grauholm; Na Cai; Enrique Castelao; Jane Hvarregaard Christensen; Toni-Kim Clarke; Jonathan I R Coleman; Lucía Colodro-Conde; Baptiste Couvy-Duchesne; Nick Craddock; Gregory E Crawford; Cheynna A Crowley; Hassan S Dashti; Gail Davies; Ian J Deary; Franziska Degenhardt; Eske M Derks; Nese Direk; Conor V Dolan; Erin C Dunn; Thalia C Eley; Nicholas Eriksson; Valentina Escott-Price; Farnush Hassan Farhadi Kiadeh; Hilary K Finucane; Andreas J Forstner; Josef Frank; Héléna A Gaspar; Michael Gill; Paola Giusti-Rodríguez; Fernando S Goes; Scott D Gordon; Jakob Grove; Lynsey S Hall; Eilis Hannon; Christine Søholm Hansen; Thomas F Hansen; Stefan Herms; Ian B Hickie; Per Hoffmann; Georg Homuth; Carsten Horn; Jouke-Jan Hottenga; David M Hougaard; Ming Hu; Craig L Hyde; Marcus Ising; Rick Jansen; Fulai Jin; Eric Jorgenson; James A Knowles; Isaac S Kohane; Julia Kraft; Warren W Kretzschmar; Jesper Krogh; Zoltán Kutalik; Jacqueline M Lane; Yihan Li; Yun Li; Penelope A Lind; Xiaoxiao Liu; Leina Lu; Donald J MacIntyre; Dean F MacKinnon; Robert M Maier; Wolfgang Maier; Jonathan Marchini; Hamdi Mbarek; Patrick McGrath; Peter McGuffin; Sarah E Medland; Divya Mehta; Christel M Middeldorp; Evelin Mihailov; Yuri Milaneschi; Lili Milani; Jonathan Mill; Francis M Mondimore; Grant W Montgomery; Sara Mostafavi; Niamh Mullins; Matthias Nauck; Bernard Ng; Michel G Nivard; Dale R Nyholt; Paul F O'Reilly; Hogni Oskarsson; Michael J Owen; Jodie N Painter; Carsten Bøcker Pedersen; Marianne Giørtz Pedersen; Roseann E Peterson; Erik Pettersson; Wouter J Peyrot; Giorgio Pistis; Danielle Posthuma; Shaun M Purcell; Jorge A Quiroz; Per Qvist; John P Rice; Brien P 
Riley; Margarita Rivera; Saira Saeed Mirza; Richa Saxena; Robert Schoevers; Eva C Schulte; Ling Shen; Jianxin Shi; Stanley I Shyn; Engilbert Sigurdsson; Grant B C Sinnamon; Johannes H Smit; Daniel J Smith; Hreinn Stefansson; Stacy Steinberg; Craig A Stockmeier; Fabian Streit; Jana Strohmaier; Katherine E Tansey; Henning Teismann; Alexander Teumer; Wesley Thompson; Pippa A Thomson; Thorgeir E Thorgeirsson; Chao Tian; Matthew Traylor; Jens Treutlein; Vassily Trubetskoy; André G Uitterlinden; Daniel Umbricht; Sandra Van der Auwera; Albert M van Hemert; Alexander Viktorin; Peter M Visscher; Yunpeng Wang; Bradley T Webb; Shantel Marie Weinsheimer; Jürgen Wellmann; Gonneke Willemsen; Stephanie H Witt; Yang Wu; Hualin S Xi; Jian Yang; Futao Zhang; Volker Arolt; Bernhard T Baune; Klaus Berger; Dorret I Boomsma; Sven Cichon; Udo Dannlowski; E C J de Geus; J Raymond DePaulo; Enrico Domenici; Katharina Domschke; Tõnu Esko; Hans J Grabe; Steven P Hamilton; Caroline Hayward; Andrew C Heath; David A Hinds; Kenneth S Kendler; Stefan Kloiber; Glyn Lewis; Qingqin S Li; Susanne Lucae; Pamela F A Madden; Patrik K Magnusson; Nicholas G Martin; Andrew M McIntosh; Andres Metspalu; Ole Mors; Preben Bo Mortensen; Bertram Müller-Myhsok; Merete Nordentoft; Markus M Nöthen; Michael C O'Donovan; Sara A Paciga; Nancy L Pedersen; Brenda W J H Penninx; Roy H Perlis; David J Porteous; James B Potash; Martin Preisig; Marcella Rietschel; Catherine Schaefer; Thomas G Schulze; Jordan W Smoller; Kari Stefansson; Henning Tiemeier; Rudolf Uher; Henry Völzke; Myrna M Weissman; Thomas Werge; Ashley R Winslow; Cathryn M Lewis; Douglas F Levinson; Gerome Breen; Anders D Børglum; Patrick F Sullivan
Journal: Nat Genet Date: 2018-04-26 Impact factor: 38.330

8. A comprehensive 1,000 Genomes-based genome-wide association meta-analysis of coronary artery disease.

Authors: Majid Nikpay; Anuj Goel; Hong-Hee Won; Leanne M Hall; Christina Willenborg; Stavroula Kanoni; Danish Saleheen; Theodosios Kyriakou; Christopher P Nelson; Jemma C Hopewell; Thomas R Webb; Lingyao Zeng; Abbas Dehghan; Maris Alver; Sebastian M Armasu; Kirsi Auro; Andrew Bjonnes; Daniel I Chasman; Shufeng Chen; Ian Ford; Nora Franceschini; Christian Gieger; Christopher Grace; Stefan Gustafsson; Jie Huang; Shih-Jen Hwang; Yun Kyoung Kim; Marcus E Kleber; King Wai Lau; Xiangfeng Lu; Yingchang Lu; Leo-Pekka Lyytikäinen; Evelin Mihailov; Alanna C Morrison; Natalia Pervjakova; Liming Qu; Lynda M Rose; Elias Salfati; Richa Saxena; Markus Scholz; Albert V Smith; Emmi Tikkanen; Andre Uitterlinden; Xueli Yang; Weihua Zhang; Wei Zhao; Mariza de Andrade; Paul S de Vries; Natalie R van Zuydam; Sonia S Anand; Lars Bertram; Frank Beutner; George Dedoussis; Philippe Frossard; Dominique Gauguier; Alison H Goodall; Omri Gottesman; Marc Haber; Bok-Ghee Han; Jianfeng Huang; Shapour Jalilzadeh; Thorsten Kessler; Inke R König; Lars Lannfelt; Wolfgang Lieb; Lars Lind; Cecilia M Lindgren; Marja-Liisa Lokki; Patrik K Magnusson; Nadeem H Mallick; Narinder Mehra; Thomas Meitinger; Fazal-Ur-Rehman Memon; Andrew P Morris; Markku S Nieminen; Nancy L Pedersen; Annette Peters; Loukianos S Rallidis; Asif Rasheed; Maria Samuel; Svati H Shah; Juha Sinisalo; Kathleen E Stirrups; Stella Trompet; Laiyuan Wang; Khan S Zaman; Diego Ardissino; Eric Boerwinkle; Ingrid B Borecki; Erwin P Bottinger; Julie E Buring; John C Chambers; Rory Collins; L Adrienne Cupples; John Danesh; Ilja Demuth; Roberto Elosua; Stephen E Epstein; Tõnu Esko; Mary F Feitosa; Oscar H Franco; Maria Grazia Franzosi; Christopher B Granger; Dongfeng Gu; Vilmundur Gudnason; Alistair S Hall; Anders Hamsten; Tamara B Harris; Stanley L Hazen; Christian Hengstenberg; Albert Hofman; Erik Ingelsson; Carlos Iribarren; J Wouter Jukema; Pekka J Karhunen; Bong-Jo Kim; Jaspal S Kooner; Iftikhar J Kullo; Terho Lehtimäki; Ruth J F Loos; Olle 
Melander; Andres Metspalu; Winfried März; Colin N Palmer; Markus Perola; Thomas Quertermous; Daniel J Rader; Paul M Ridker; Samuli Ripatti; Robert Roberts; Veikko Salomaa; Dharambir K Sanghera; Stephen M Schwartz; Udo Seedorf; Alexandre F Stewart; David J Stott; Joachim Thiery; Pierre A Zalloua; Christopher J O'Donnell; Muredach P Reilly; Themistocles L Assimes; John R Thompson; Jeanette Erdmann; Robert Clarke; Hugh Watkins; Sekar Kathiresan; Ruth McPherson; Panos Deloukas; Heribert Schunkert; Nilesh J Samani; Martin Farrall
Journal: Nat Genet Date: 2015-09-07 Impact factor: 38.330

9. Consistent Estimation in Mendelian Randomization with Some Invalid Instruments Using a Weighted Median Estimator.

Authors: Jack Bowden; George Davey Smith; Philip C Haycock; Stephen Burgess
Journal: Genet Epidemiol Date: 2016-04-07 Impact factor: 2.135

10. Association analysis identifies 65 new breast cancer risk loci.

Authors: Kyriaki Michailidou; Sara Lindström; Joe Dennis; Jonathan Beesley; Shirley Hui; Siddhartha Kar; Audrey Lemaçon; Penny Soucy; Dylan Glubb; Asha Rostamianfar; Manjeet K Bolla; Qin Wang; Jonathan Tyrer; Ed Dicks; Andrew Lee; Zhaoming Wang; Jamie Allen; Renske Keeman; Ursula Eilber; Juliet D French; Xiao Qing Chen; Laura Fachal; Karen McCue; Amy E McCart Reed; Maya Ghoussaini; Jason S Carroll; Xia Jiang; Hilary Finucane; Marcia Adams; Muriel A Adank; Habibul Ahsan; Kristiina Aittomäki; Hoda Anton-Culver; Natalia N Antonenkova; Volker Arndt; Kristan J Aronson; Banu Arun; Paul L Auer; François Bacot; Myrto Barrdahl; Caroline Baynes; Matthias W Beckmann; Sabine Behrens; Javier Benitez; Marina Bermisheva; Leslie Bernstein; Carl Blomqvist; Natalia V Bogdanova; Stig E Bojesen; Bernardo Bonanni; Anne-Lise Børresen-Dale; Judith S Brand; Hiltrud Brauch; Paul Brennan; Hermann Brenner; Louise Brinton; Per Broberg; Ian W Brock; Annegien Broeks; Angela Brooks-Wilson; Sara Y Brucker; Thomas Brüning; Barbara Burwinkel; Katja Butterbach; Qiuyin Cai; Hui Cai; Trinidad Caldés; Federico Canzian; Angel Carracedo; Brian D Carter; Jose E Castelao; Tsun L Chan; Ting-Yuan David Cheng; Kee Seng Chia; Ji-Yeob Choi; Hans Christiansen; Christine L Clarke; Margriet Collée; Don M Conroy; Emilie Cordina-Duverger; Sten Cornelissen; David G Cox; Angela Cox; Simon S Cross; Julie M Cunningham; Kamila Czene; Mary B Daly; Peter Devilee; Kimberly F Doheny; Thilo Dörk; Isabel Dos-Santos-Silva; Martine Dumont; Lorraine Durcan; Miriam Dwek; Diana M Eccles; Arif B Ekici; A Heather Eliassen; Carolina Ellberg; Mingajeva Elvira; Christoph Engel; Mikael Eriksson; Peter A Fasching; Jonine Figueroa; Dieter Flesch-Janys; Olivia Fletcher; Henrik Flyger; Lin Fritschi; Valerie Gaborieau; Marike Gabrielson; Manuela Gago-Dominguez; Yu-Tang Gao; Susan M Gapstur; José A García-Sáenz; Mia M Gaudet; Vassilios Georgoulias; Graham G Giles; Gord Glendon; Mark S Goldberg; David E Goldgar; Anna González-Neira; Grethe I 
Grenaker Alnæs; Mervi Grip; Jacek Gronwald; Anne Grundy; Pascal Guénel; Lothar Haeberle; Eric Hahnen; Christopher A Haiman; Niclas Håkansson; Ute Hamann; Nathalie Hamel; Susan Hankinson; Patricia Harrington; Steven N Hart; Jaana M Hartikainen; Mikael Hartman; Alexander Hein; Jane Heyworth; Belynda Hicks; Peter Hillemanns; Dona N Ho; Antoinette Hollestelle; Maartje J Hooning; Robert N Hoover; John L Hopper; Ming-Feng Hou; Chia-Ni Hsiung; Guanmengqian Huang; Keith Humphreys; Junko Ishiguro; Hidemi Ito; Motoki Iwasaki; Hiroji Iwata; Anna Jakubowska; Wolfgang Janni; Esther M John; Nichola Johnson; Kristine Jones; Michael Jones; Arja Jukkola-Vuorinen; Rudolf Kaaks; Maria Kabisch; Katarzyna Kaczmarek; Daehee Kang; Yoshio Kasuga; Michael J Kerin; Sofia Khan; Elza Khusnutdinova; Johanna I Kiiski; Sung-Won Kim; Julia A Knight; Veli-Matti Kosma; Vessela N Kristensen; Ute Krüger; Ava Kwong; Diether Lambrechts; Loic Le Marchand; Eunjung Lee; Min Hyuk Lee; Jong Won Lee; Chuen Neng Lee; Flavio Lejbkowicz; Jingmei Li; Jenna Lilyquist; Annika Lindblom; Jolanta Lissowska; Wing-Yee Lo; Sibylle Loibl; Jirong Long; Artitaya Lophatananon; Jan Lubinski; Craig Luccarini; Michael P Lux; Edmond S K Ma; Robert J MacInnis; Tom Maishman; Enes Makalic; Kathleen E Malone; Ivana Maleva Kostovska; Arto Mannermaa; Siranoush Manoukian; JoAnn E Manson; Sara Margolin; Shivaani Mariapun; Maria Elena Martinez; Keitaro Matsuo; Dimitrios Mavroudis; James McKay; Catriona McLean; Hanne Meijers-Heijboer; Alfons Meindl; Primitiva Menéndez; Usha Menon; Jeffery Meyer; Hui Miao; Nicola Miller; Nur Aishah Mohd Taib; Kenneth Muir; Anna Marie Mulligan; Claire Mulot; Susan L Neuhausen; Heli Nevanlinna; Patrick Neven; Sune F Nielsen; Dong-Young Noh; Børge G Nordestgaard; Aaron Norman; Olufunmilayo I Olopade; Janet E Olson; Håkan Olsson; Curtis Olswold; Nick Orr; V Shane Pankratz; Sue K Park; Tjoung-Won Park-Simon; Rachel Lloyd; Jose I A Perez; Paolo Peterlongo; Julian Peto; Kelly-Anne Phillips; Mila Pinchev; Dijana 
Plaseska-Karanfilska; Ross Prentice; Nadege Presneau; Darya Prokofyeva; Elizabeth Pugh; Katri Pylkäs; Brigitte Rack; Paolo Radice; Nazneen Rahman; Gadi Rennert; Hedy S Rennert; Valerie Rhenius; Atocha Romero; Jane Romm; Kathryn J Ruddy; Thomas Rüdiger; Anja Rudolph; Matthias Ruebner; Emiel J T Rutgers; Emmanouil Saloustros; Dale P Sandler; Suleeporn Sangrajrang; Elinor J Sawyer; Daniel F Schmidt; Rita K Schmutzler; Andreas Schneeweiss; Minouk J Schoemaker; Fredrick Schumacher; Peter Schürmann; Rodney J Scott; Christopher Scott; Sheila Seal; Caroline Seynaeve; Mitul Shah; Priyanka Sharma; Chen-Yang Shen; Grace Sheng; Mark E Sherman; Martha J Shrubsole; Xiao-Ou Shu; Ann Smeets; Christof Sohn; Melissa C Southey; John J Spinelli; Christa Stegmaier; Sarah Stewart-Brown; Jennifer Stone; Daniel O Stram; Harald Surowy; Anthony Swerdlow; Rulla Tamimi; Jack A Taylor; Maria Tengström; Soo H Teo; Mary Beth Terry; Daniel C Tessier; Somchai Thanasitthichai; Kathrin Thöne; Rob A E M Tollenaar; Ian Tomlinson; Ling Tong; Diana Torres; Thérèse Truong; Chiu-Chen Tseng; Shoichiro Tsugane; Hans-Ulrich Ulmer; Giske Ursin; Michael Untch; Celine Vachon; Christi J van Asperen; David Van Den Berg; Ans M W van den Ouweland; Lizet van der Kolk; Rob B van der Luijt; Daniel Vincent; Jason Vollenweider; Quinten Waisfisz; Shan Wang-Gohrke; Clarice R Weinberg; Camilla Wendt; Alice S Whittemore; Hans Wildiers; Walter Willett; Robert Winqvist; Alicja Wolk; Anna H Wu; Lucy Xia; Taiki Yamaji; Xiaohong R Yang; Cheng Har Yip; Keun-Young Yoo; Jyh-Cherng Yu; Wei Zheng; Ying Zheng; Bin Zhu; Argyrios Ziogas; Elad Ziv; Sunil R Lakhani; Antonis C Antoniou; Arnaud Droit; Irene L Andrulis; Christopher I Amos; Fergus J Couch; Paul D P Pharoah; Jenny Chang-Claude; Per Hall; David J Hunter; Roger L Milne; Montserrat García-Closas; Marjanka K Schmidt; Stephen J Chanock; Alison M Dunning; Stacey L Edwards; Gary D Bader; Georgia Chenevix-Trench; Jacques Simard; Peter Kraft; Douglas F Easton
Journal: Nature Date: 2017-10-23 Impact factor: 49.962
