Kai Wang1, Shizhong Han2,3. 1. Department of Biostatistics, The University of Iowa, Iowa City, 52242, USA. kai-wang@uiowa.edu. 2. Lieber Institute for Brain Development, Johns Hopkins School of Medicine, Baltimore, 21205, USA. 3. Department of Psychiatry and Behavioral Sciences, Johns Hopkins School of Medicine, Baltimore, 21205, USA.
Abstract
Mendelian randomization (MR) is becoming more and more popular for inferring causal relationship between an exposure and a trait. Typically, instrument SNPs are selected from an exposure GWAS based on their summary statistics and the same summary statistics on the selected SNPs are used for subsequent analyses. However, this practice suffers from selection bias and can invalidate MR methods, as showcased via two popular methods: the summary data-based MR (SMR) method and the two-sample MR Steiger method. The SMR method is conservative while the MR Steiger method can be either conservative or liberal. A simple and yet more powerful alternative to SMR is proposed.
Mendelian randomization (MR) is becoming more and more popular for inferring causal relationship between an exposure and a trait. Typically, instrument SNPs are selected from an exposure GWAS based on their summary statistics and the same summary statistics on the selected SNPs are used for subsequent analyses. However, this practice suffers from selection bias and can invalidate MR methods, as showcased via two popular methods: the summary data-based MR (SMR) method and the two-sample MR Steiger method. The SMR method is conservative while the MR Steiger method can be either conservative or liberal. A simple and yet more powerful alternative to SMR is proposed.
As a feasible alternative to expensive and sometimes impossible randomized clinical trials, Mendelian randomization (MR) is becoming more and more popular for inferring causal relationship between an exposure and a trait[1-3]. Summary data-based two-sample MR methods often take the following two steps: One appealing feature of these methods is that they only rely on summary statistics on the exposure GWAS and the trait GWAS. Individual-level data are not needed.Obtain instruments (typically SNPs) from exposure GWAS (Genome-Wide Association Study) that are significant at genome-wide level (typically );Investigate the causal relationship between the exposure and the trait, using the summary exposure GWAS statistics at the selected SNPs and a trait GWAS. The summary exposure GWAS statistics are those used in Step 1 for SNP selection.The inference validity of this two-step approach is affected by selection bias. When conducting causal inference in Step 2 with respect to the SNPs selected in Step 1, the summary statistics from the exposure GWAS can not be regarded as random samples for the true population association strength[4-6]. Treating them as random samples leads to over-estimation of the effect size of these SNPs on the exposure. Association strength in a random sample is often much weaker, a phenomenon commonly seen in studies aimed at replicating previous findings. This selection bias has been noted in the literature[6,7]. But its effect on hypothesis testing related to two sample summary data based Mendelian randomization is largely unknown.Two popular MR methods, the summary data-based MR method[2] and the two-sample MR Steiger method[1], are considered. For the summary data-based MR method, the most significant SNP (instead of several SNPs) from a gene is selected as the instrument from the exposure GWAS. For the two-sample MR Steiger method, a SNP significantly associated with both the exposure GWAS and the trait GWAS is selected. The genotype score (0, 1, or 2) at this SNP is denoted by g. The exposure level is denoted by x and the trait value is denoted by y. The Wald statistic on chi-square scale for testing the association between the SNP and the exposure is denoted by . Its value is supposed to be large because it satisfies the selection criterion used in Step 1. For instance, when the selection criterion is , there must be . The Wald statistic for testing the association between the SNP and the trait is denoted by , which is independent of .
Results
Summary data-based MR
Summary data-based MR[2] (SMR) is a popular MR method for inferring causality between x and y. Its null is , where is the true regression coefficient for x with y the response. The two-stage least square (2SLS) estimate of iswhere is the least square estimate of , the regression coefficient for g with x the response, and is the least square estimate of , the regression coefficient for g with y the response. is also known as the Wald ratio[5]. Causal relationship between exposure x and y exists if the following test statistic is significant[2]:where and are Wald statistics on chi-square scale. The null distribution of is approximated by 1-df chi-square using the Delta method[2].There are several issues with statistic . The derivation of its null distribution assumes that is a consistent estimator of and (asymptotically) follows a normal distribution ([2], Online Methods). However, these two conditions do not hold. If the significance level used in Step 1 is , there must be which implies . As a result, the distribution of is not (asymptotically) normal and is not a consistent estimator of . To numerically demonstrate this point, 10,000 random samples of are generated from a 1-df chi-square with a large non-centrality 13 (to make sure there are reasonable number of ). Among them, 322 are significant at genome-wide significant level . The quantile-quantile plot of these selected against 322 random samples from 1-df chi-square with non-centrality 13 is shown in Fig. 1. The distribution of is clearly different from the distribution of .
Figure 1
Quantile-quantile plot for selected (322 out of 10,000) and 322 random . The distribution of selected is different from the distribution of random as shown by the deviation of the points from the 45° line. The vertical line indicates the selection threshold which corresponds to genome-wide significance level .
Quantile-quantile plot for selected (322 out of 10,000) and 322 random . The distribution of selected is different from the distribution of random as shown by the deviation of the points from the 45° line. The vertical line indicates the selection threshold which corresponds to genome-wide significance level .The applicability of the Delta method to approximating the distribution of is in doubt even in the absence of the selection process imposed on . Approximating the null distribution of by a 1-df chi-square is equivalent to approximating the null distribution of by a normal distribution. However, according to Eq. (1), is a ratio of two normals. In general, the distribution of the ratio of two normal variables can not be approximated by a normal as it can take a variety of shapes such as bimodal, unimodal, or asymmetric[8]. It is known that if and are both equal to 0, the distribution of would be a Cauchy, a fat-tailed distribution whose mean and variance do not exist. For the case and , the distribution of can be approximated by a normal only in certain intervals[8].For the case , to our best knowledge, there are no known theoretical results regarding whether the distribution of can be approximated by a normal. The only thing we are sure about is that the distribution of is symmetric because the distribution of is the same as the distribution of . A numerical example is used to examine the distribution of . Ten thousand random ’s are generated from and 10,000 random are generated from N(0, 1). A normal quantile plot of is generated using the qqnorm and qqline functions in R with their default settings and is shown in Fig. 2 (left panel). Similar to a Cauchy distribution, the distribution of is apparently fat-tailed compared to a normal: the lower end is more negative while the upper end is more positive.
Figure 2
Normal quantile plot for 10,000 generated under and . The distribution of appears to be fat-tailed compared to a normal (left panel). The distribution of (436 out of 10,000) seems to be a normal (right panel) due to selection imposed on . See the text for explanation.
A normal quantile plot is also generated for and is shown in the right panel of Fig. 2. It may be surprising that the distribution of appears to be a normal. The reason of this phenomenon is that the range of is greatly reduced under the selection criterion. According to Eq. (1), is roughly proportional to with high probability.Normal quantile plot for 10,000 generated under and . The distribution of appears to be fat-tailed compared to a normal (left panel). The distribution of (436 out of 10,000) seems to be a normal (right panel) due to selection imposed on . See the text for explanation.A more general argument that the approximating distribution of is not 1-df chi-square is the following. Sincethere is regardless of the distribution of . That is, is always dominated by . Similarly, is always dominated by . Therefore, . Since and approximately follow independent 1-df chi-square distributions, the approximating distribution of can not be 1-df chi-square. Neither the approximate distribution of . Using a 1-df chi-square distribution for results in a conservative test.We performed extensive simulations to investigate the null distribution of the SMR statistic in a more realistic setting by using imputed GWAS genotype data from the Atherosclerosis Risk in Communities (ARIC) study of European-ancestry samples[9]. Specifically, we simulated gene expression levels for each Ensemble gene on autosomes at varying numbers of causal eQTLs (n = 1, 5, and 10), (narrow sense) heritability levels ( = 0.1, 0.2, 0.4, 0.8), and sample sizes (N = 250, 500, 1000, and 2000). We tested association between all SNPs within each gene and expression levels of the gene, and only genes whose top associated SNP met the selection criteria () were subjected to SMR test. GWAS association signals were randomly assigned from a standard normal distribution. Figure 3 shows the QQ plot for the SMR statistics when instrumental eQTLs were selected from genes with 5 causal eQTLs and a level of heritability = 0.4 at all four sample sizes. Clearly, the SMR statistics were lower than expected null values at the tail of distribution, though the distribution became closer to the null at larger sample size, which may be explained by the stronger eQTL signals as shown in our numerical example above. The complete set of QQ plots for the SMR test statistic are shown in Supplementary Figs. S1–S12 online. Overall, our simulations showed that the SMR statistics were conservative and did not strictly follow the 1-df chi-squire distribution, especially when the effect size of each individual eQTL was small on average. These results are consistent with our theoretical insights.
Figure 3
Quantile-quantile plot for simulated SMR statistics against statistics of 1-df chi-squire distribution. Instrumental eQTLs for SMR test were top associated eQTL () selected from genes whose expression levels were simulated under a genetic model of 5 causal eQTLs and heritability of 0.4 at four different sample sizes (N = 250, 500, 1000, and 2000). The grey areas represent the 95% confidence band around 1-df chi-square statistics.
Quantile-quantile plot for simulated SMR statistics against statistics of 1-df chi-squire distribution. Instrumental eQTLs for SMR test were top associated eQTL () selected from genes whose expression levels were simulated under a genetic model of 5 causal eQTLs and heritability of 0.4 at four different sample sizes (N = 250, 500, 1000, and 2000). The grey areas represent the 95% confidence band around 1-df chi-square statistics.
More on SMR and a conditional test
One may want to use an estimate of that takes into account the selection. However, such an estimate is not expected to be simple given the complexity of the selection (e.g., the SNP is the most significant one among a number of SNPs). Another alternative is to use another exposure GWAS independent of the exposure GWAS used in Step 1 to estimate and then compute . However, this is not recommended because is inherently conservative. is equal to the half of the harmonic mean of and . Fixing one of and , say , and change , reaches its smallest value when and converges to when . The conservativeness of is also observed in simulation studies by Veturi and Ritchie[10].The null hypothesis for was not specifically defined in Zhu et al.[2]. It is unlikely to be the intended null . Actually, similar to the Sobel’s statistic popular in mediation analysis, the null corresponding to is . For this null, a statistic more powerful than is . The statistic rejects the null if and only if both and are significant. Therefore, whenever rejects the null, will but not vice versa. This is because .A test more powerful than (hence also more powerful than ) in the current situation is a conditional test. Because the SNP is selected for its significant association with the exposure, the situation can be excluded. Given this information, a meaningful null would be for which a test statistic is . The null is rejected when is significant. This test, conditional on a significant statistic, assumes that there is no pleiotropy. That is, the selected SNP affects the trait only through the exposure and there are no other paths. In other words, the selected SNP is a valid instrument. In light of Eq. (1), if and only if when the possibility of is excluded. Hence the null for this conditional test is equivalent to . This test is asymptotically valid because asymptotically follows a 1-df chi-square distribution. The threshold for significance for this test is not at the genome level. Rather, it is at the gene level and only needs to be corrected for the number of genes for which SNPs are selected for instruments. This results in a more powerful testing procedure than using a genome-wide threshold.
An empirical study
We compared the performance of conditional test we proposed and the SMR test on an empirical study of schizophrenia. We used to-date the largest GWAS summary statistics for schizophrenia[11] and the eQTL results from analysis of 1387 brain samples (prefrontal cortex) by the PsychENCODE[12] (downloaded from the SMR data resource website). In total, 9639 genes were tested for SMR at a top associated cis-eQTL () and 65 genes were significant after Bonferroni correction. In contrast, the conditional test, whose test statistic is and considers only those instrumental eQTLs, discovered 127 Bonferroni-significant genes, including 62 genes not detected by SMR (. Supplementary Table S1 online). Among those genes missed by SMR, there were several strong candidates for schizophrenia, such as AKT3[13-15], RGS6[16,17], and KCNN3. It may not be surprising that AKT3 and RGS6 were identified as these two harbored genome-wide significant variants () in original GWAS[11], but the discovery of KCNN3 was novel and the strongest SNP-level association evidence for this gene was only at (rs10796933). Of note, our previous study also showed evidence for the association of KCNN3 with schizophrenia through integrated analysis of GWAS with methylation QTL[18].
Two-sample MR Steiger method
The two-sample MR Steiger method[1,19] assumes that there is a causal relationship between the exposure and the trait and that the selected SNP is a valid instrument for one of them (but it is unknown for which one)[1]. A SNP is selected not only for its association with the exposure but also for its association with the trait[1,19]. The null for the two-sample MR Steiger test is where and are the (population) Pearson correlation coefficients. Let and be the sample correlation coefficients corresponding to and , respectively. Using Fisher’s Z transformation, there arewhere and are sample sizes. The null is equivalent to saying that the mean of is equal to the mean of . The two-sample MR Steiger method uses the following statistic[1,19]:If is significant and positive, the causal direction is from x to y. If is significant and negative, the causal direction is from y to x.However, the statistic does not approximately follow a standard normal distribution because the SNP is selected for its significant p-values. Using a selection criterion , or and greater than 29.71679 on 1-df chi-square scale, the sample correlation coefficients and would be at least 0.4823663, 0.1700451, or 0.05443772 for 100, 1000, or 10,000 given the relationship . Although this selection procedure is useful for selecting the instrument SNP, it imposes a lower limit on and . over-estimates and is not consistent. The mean of the statistic is not around 0 even when holds if . The distribution of is truncated and is not normal. So is the distribution of . The variance of is smaller than due to selection. Similarly, the variance of is smaller than . When , the numerator of is around 0 and is conservative. When , the numerator of is no longer around 0 and is liberal. Overall, the distributions of and are truncated normal instead of normal. The argument that the statistic follows asymptotically a standard normal does not hold. The two-sample MR Steiger method can be either liberal or conservative.Numerical examples are constructed. First we consider the case and demonstrate the effect of selection severity. Ten thousand random samples of and are independently generated from the normal distributions shown in Eqs. (2) and (3). These and form a matrix. The first column contains values for and the second for . Only the rows satisfying and are kept. This selection criterion corresponds to on the p-value scale. When , there are 2557 selected on which the statistic is computed. The sample mean of selected is 0.1903508 while the sample mean of selected () is lower, as expected. A normal quantile-quantile plot of is shown in the left panel of Fig. 4. Clearly the distribution of is different from normal. Type I error rates are inflated. At significance level 0.05 and 0.01, the type I error rates (i.e., the proportion of significant statistics) are 0.08916699 and 0.01486117, respectively. If , the selection is less severe. Almost 75% (7434 out of 10,000) s are selected. Even so, the distribution of shows apparent departure from normal as shown in the right panel of Fig. 4. At significance level 0.05 and 0.01, the type I error rates are 0.0306699 and 0.005111649, respectively. In this case, appears to be conservative.
Figure 4
Normal Q-Q plot of simulated with . are selected from 10,000 replicates at genome-wide significance level .
Normal Q-Q plot of simulated with . are selected from 10,000 replicates at genome-wide significance level .We also considered larger sample sizes. When , and , 2,375 s are selected. When and , 6,410 s are selected. As shown in Fig. 5, there is apparent departure of the distribution of from a normal. At significance level 0.05, the type I error rate is 0.09515789 for the case and is 0.03728549 for . At significance level 0.01, the type I error rates are 0.01515789 and 0.006084243, respectively. The type I error rates can be either inflated or deflated.
Figure 5
Normal Q-Q plot of simulated with . are selected from 10,000 replicates at genome-wide significance level .
Normal Q-Q plot of simulated with . are selected from 10,000 replicates at genome-wide significance level .One remedy would be to estimate and by maximizing the conditional likelihood given the SNP selection criteria. Let and denote the density function and the distribution function of the standard normal, respectively. The likelihood ratio statistic for testing is wherewith and selection thresholds corresponding to and , respectively. However, due to selection, computation of and can be challenging. One alternative method is to use an exposure GWAS and a trait GWAS that are independent of those used to select the SNP. However, such studies may be impractical to obtain[6].
Discussion
Summary statistics MR is subject to selection bias, resulting in excessive false positives (for instance, the MR Steiger method) or missed discoveries (for instance, the SMR method). This bias is a form of winner’s curse. Selection bias has been discussed in the literature in the context of the choice of the instrument SNPs[7], colocalisation test[20], and estimation of exposure effect[5,6].Our work complements previous studies on selection bias due to selection of SNPs. While previous work focused on the effect of this bias on the Wald ratio[5,6] (i.e., estimation), ours focuses on testing whether the exposure causally affects the outcome (i.e., inference). Selection bias leads to underestimation of the Wald ratio[5] but its effect on type I error rate can be either liberal or conservative depending on the MR method used. Most importantly, the SMR method is conservative even in the absence of selection bias where is approximately normal.Correcting for selection bias is a challenging task. Zhao et al.[6] get around this issue by using an independent exposure GWAS. On the other hand, our conditional test, an alternative to the SMR method, uses only the trait GWAS. It may be expanded to accommodate multiple instrumental SNPs and the presence of pleiotropy.Supplementary Information.
Authors: Zhihong Zhu; Futao Zhang; Han Hu; Andrew Bakshi; Matthew R Robinson; Joseph E Powell; Grant W Montgomery; Michael E Goddard; Naomi R Wray; Peter M Visscher; Jian Yang Journal: Nat Genet Date: 2016-03-28 Impact factor: 38.330
Authors: Eugenia Radulescu; Andrew E Jaffe; Richard E Straub; Qiang Chen; Joo Heon Shin; Thomas M Hyde; Joel E Kleinman; Daniel R Weinberger Journal: Mol Psychiatry Date: 2018-11-26 Impact factor: 15.992
Authors: Daifeng Wang; Shuang Liu; Jonathan Warrell; Hyejung Won; Xu Shi; Fabio C P Navarro; Declan Clarke; Mengting Gu; Prashant Emani; Yucheng T Yang; Min Xu; Michael J Gandal; Shaoke Lou; Jing Zhang; Jonathan J Park; Chengfei Yan; Suhn Kyong Rhie; Kasidet Manakongtreecheep; Holly Zhou; Aparna Nathan; Mette Peters; Eugenio Mattei; Dominic Fitzgerald; Tonya Brunetti; Jill Moore; Yan Jiang; Kiran Girdhar; Gabriel E Hoffman; Selim Kalayci; Zeynep H Gümüş; Gregory E Crawford; Panos Roussos; Schahram Akbarian; Andrew E Jaffe; Kevin P White; Zhiping Weng; Nenad Sestan; Daniel H Geschwind; James A Knowles; Mark B Gerstein Journal: Science Date: 2018-12-14 Impact factor: 47.728
Authors: Gibran Hemani; Jie Zheng; Benjamin Elsworth; Tom R Gaunt; Philip C Haycock; Kaitlin H Wade; Valeriia Haberland; Denis Baird; Charles Laurin; Stephen Burgess; Jack Bowden; Ryan Langdon; Vanessa Y Tan; James Yarmolinsky; Hashem A Shihab; Nicholas J Timpson; David M Evans; Caroline Relton; Richard M Martin; George Davey Smith Journal: Elife Date: 2018-05-30 Impact factor: 8.140
Authors: Stephan Ripke; Colm O'Dushlaine; Kimberly Chambert; Jennifer L Moran; Anna K Kähler; Susanne Akterin; Sarah E Bergen; Ann L Collins; James J Crowley; Menachem Fromer; Yunjung Kim; Sang Hong Lee; Patrik K E Magnusson; Nick Sanchez; Eli A Stahl; Stephanie Williams; Naomi R Wray; Kai Xia; Francesco Bettella; Anders D Borglum; Brendan K Bulik-Sullivan; Paul Cormican; Nick Craddock; Christiaan de Leeuw; Naser Durmishi; Michael Gill; Vera Golimbet; Marian L Hamshere; Peter Holmans; David M Hougaard; Kenneth S Kendler; Kuang Lin; Derek W Morris; Ole Mors; Preben B Mortensen; Benjamin M Neale; Francis A O'Neill; Michael J Owen; Milica Pejovic Milovancevic; Danielle Posthuma; John Powell; Alexander L Richards; Brien P Riley; Douglas Ruderfer; Dan Rujescu; Engilbert Sigurdsson; Teimuraz Silagadze; August B Smit; Hreinn Stefansson; Stacy Steinberg; Jaana Suvisaari; Sarah Tosato; Matthijs Verhage; James T Walters; Douglas F Levinson; Pablo V Gejman; Kenneth S Kendler; Claudine Laurent; Bryan J Mowry; Michael C O'Donovan; Michael J Owen; Ann E Pulver; Brien P Riley; Sibylle G Schwab; Dieter B Wildenauer; Frank Dudbridge; Peter Holmans; Jianxin Shi; Margot Albus; Madeline Alexander; Dominique Campion; David Cohen; Dimitris Dikeos; Jubao Duan; Peter Eichhammer; Stephanie Godard; Mark Hansen; F Bernard Lerer; Kung-Yee Liang; Wolfgang Maier; Jacques Mallet; Deborah A Nertney; Gerald Nestadt; Nadine Norton; Francis A O'Neill; George N Papadimitriou; Robert Ribble; Alan R Sanders; Jeremy M Silverman; Dermot Walsh; Nigel M Williams; Brandon Wormley; Maria J Arranz; Steven Bakker; Stephan Bender; Elvira Bramon; David Collier; Benedicto Crespo-Facorro; Jeremy Hall; Conrad Iyegbe; Assen Jablensky; Rene S Kahn; Luba Kalaydjieva; Stephen Lawrie; Cathryn M Lewis; Kuang Lin; Don H Linszen; Ignacio Mata; Andrew McIntosh; Robin M Murray; Roel A Ophoff; John Powell; Dan Rujescu; Jim Van Os; Muriel Walshe; Matthias Weisbrod; Durk Wiersma; Peter Donnelly; Ines Barroso; Jenefer M Blackwell; Elvira Bramon; Matthew A Brown; Juan P Casas; Aiden P Corvin; Panos Deloukas; Audrey Duncanson; Janusz Jankowski; Hugh S Markus; Christopher G Mathew; Colin N A Palmer; Robert Plomin; Anna Rautanen; Stephen J Sawcer; Richard C Trembath; Ananth C Viswanathan; Nicholas W Wood; Chris C A Spencer; Gavin Band; Céline Bellenguez; Colin Freeman; Garrett Hellenthal; Eleni Giannoulatou; Matti Pirinen; Richard D Pearson; Amy Strange; Zhan Su; Damjan Vukcevic; Peter Donnelly; Cordelia Langford; Sarah E Hunt; Sarah Edkins; Rhian Gwilliam; Hannah Blackburn; Suzannah J Bumpstead; Serge Dronov; Matthew Gillman; Emma Gray; Naomi Hammond; Alagurevathi Jayakumar; Owen T McCann; Jennifer Liddle; Simon C Potter; Radhi Ravindrarajah; Michelle Ricketts; Avazeh Tashakkori-Ghanbaria; Matthew J Waller; Paul Weston; Sara Widaa; Pamela Whittaker; Ines Barroso; Panos Deloukas; Christopher G Mathew; Jenefer M Blackwell; Matthew A Brown; Aiden P Corvin; Mark I McCarthy; Chris C A Spencer; Elvira Bramon; Aiden P Corvin; Michael C O'Donovan; Kari Stefansson; Edward Scolnick; Shaun Purcell; Steven A McCarroll; Pamela Sklar; Christina M Hultman; Patrick F Sullivan Journal: Nat Genet Date: 2013-08-25 Impact factor: 38.330