
Increasing the power of meta-analysis of genome-wide association studies to detect heterogeneous effects.

C H Lee, E Eskin, B Han.

Abstract

MOTIVATION: Meta-analysis is essential to combine the results of genome-wide association studies (GWASs). Recent large-scale meta-analyses have combined studies of different ethnicities, environments and even studies of different related phenotypes. These differences between studies can manifest as effect size heterogeneity. We previously developed a modified random effects model (RE2) that can achieve higher power to detect heterogeneous effects than the commonly used fixed effects model (FE). However, RE2 cannot perform meta-analysis of correlated statistics, which are found in recent research designs, and the identified variants often overlap with those found by FE.
RESULTS: Here, we propose RE2C, which increases the power of RE2 in two ways. First, we generalized the likelihood model to account for correlations of statistics to achieve optimal power, using an optimization technique based on spectral decomposition for efficient parameter estimation. Second, we designed a novel statistic to focus on the heterogeneous effects that FE cannot detect, thereby increasing the power to identify new associations. We developed an efficient and accurate P-value approximation procedure using analytical decomposition of the statistic. In simulations, RE2C achieved a dramatic increase in power compared with the decoupling approach (71% vs. 21%) when the statistics were correlated. Even when the statistics were uncorrelated, RE2C achieved a modest increase in power. Applications to real genetic data supported the utility of RE2C. RE2C is highly efficient and can meta-analyze one hundred GWASs in one day.
AVAILABILITY AND IMPLEMENTATION: The software is freely available at http://software.buhmhan.com/RE2C . CONTACT: buhm.han@amc.seoul.kr. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
© The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com

Year:  2017        PMID: 28881976      PMCID: PMC5870848          DOI: 10.1093/bioinformatics/btx242

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


1 Introduction

Genome-wide association studies (GWASs) have identified numerous single-nucleotide polymorphisms (SNPs) that are associated with human traits (Manolio, 2010; Welter ). For many diseases, however, the identified variants explain only part of the known heritability, which indicates the existence of undetected variants with small effects (Evangelou and Ioannidis, 2013; Manolio, 2013). To scale up genetic discovery, meta-analysis of GWASs has become a popular tool to augment the sample size (Evangelou and Ioannidis, 2013; Fleiss, 1993; Zeggini and Ioannidis, 2009). Recently, the use of meta-analysis in GWASs has expanded to new research designs, such as combining different related diseases (Kiryluk ; Lee ; Perry ), populations (Liu ), environments (Kang ), tissues (Sul ) and cancer types (Bhattacharjee ; Petersen ). These differences between studies can manifest as heterogeneity, which refers to effect-size differences. When heterogeneity exists, the commonly used fixed effects model (FE) is not optimal. The traditional random effects model (RE) (DerSimonian and Laird, 1986) is also conservative and is not powerful (Han and Eskin, 2011). To overcome this challenge, we recently developed a modified RE (RE2) that has higher power under condition of heterogeneity (Han and Eskin, 2011). RE2 has been used widely in cross-population human disease analyses (Chimusa ; Keller ; Sapkota ), cross-environment mouse trait analyses (Kang ), cross-condition expression quantitative trait loci (eQTL) analyses (Sul ; Ye ), and cross-feature neuroimaging analyses (Hibar ; Stein ). However, RE2 has some limitations. First, RE2 cannot perform meta-analysis of correlated statistics. Although the traditional assumption of independence of statistics has been valid in conventional study designs, it can be invalidated in new research designs. 
For example, in cross-disease meta-analyses, it is common that some controls are used in more than one study, which can cause correlations of statistics (Dichgans ; Kar ; Moskvina ). Thus, in cross-disease analyses, both heterogeneity and correlations can occur. In a cross-tissue eQTL analysis (Sul ), the intra-individual similarity of gene expression levels between different tissues can cause the correlations of statistics. To account for these correlations, Lin and Sullivan extended FE (Lin and Sullivan, 2009). However, for RE methods, no solutions have been suggested. Recently, Han et al. developed a decoupling approach that makes the statistics independent (Han ). The transformed data can be used for RE2. However, the optimality of this approach has not been evaluated yet. The second limitation of RE2 is that the identified variants by RE2 and FE overlap substantially. This is because RE2 is designed as a stand-alone method that captures variants with and without heterogeneity. However, in most of the meta-analyses of GWASs, it is essential to apply FE before applying RE2, because detecting variants with homogeneous effects is of primary interest. To the best of our knowledge, all investigators who employed RE2 for meta-analyses of GWASs used RE2 coupled with FE. Considering this practical situation, the current implementation of RE2 could be suboptimal. In the present study, we propose a new method, called RE2C, which increases the power of RE2 in two ways. First, we generalized the likelihood model of RE2 to account for correlations of statistics and to achieve optimal power. To estimate the maximum likelihood estimators of parameters efficiently, we developed an optimization procedure based on spectral decomposition of the variance-covariance matrix. Second, we modified the statistic to focus on the heterogeneous effects that cannot be detected by FE. This modification increased the power to identify new associations after the application of FE. 
The statistic does not follow a known asymptotic distribution; therefore, we developed an efficient and accurate P-value approximation procedure using analytical decomposition of the statistic. In our simulations, RE2C achieved a dramatic increase in power compared with competing approaches, such as the decoupling approach (71% vs. 21%) when the statistics were correlated. Even when the statistics were uncorrelated, RE2C achieved a modest increase in power. Applications to real genetic data demonstrated that RE2C improved the significances of the associated variants. RE2C is efficient and can meta-analyze one hundred GWASs within one day. The software is available at http://software.buhmhan.com/RE2C.

2 Materials and methods

2.1 Existing meta-analysis methods for independent statistics

2.1.1 Fixed effects model

The FE method assumes that the magnitude of the true effect is common, or fixed, across all studies in the meta-analysis. The inverse-variance-weighted effect-size method (Cochran, 1954; de Bakker ; Fleiss, 1993; Mantel and Haenszel, 1959) and the weighted sum-of-z-scores method (de Bakker ; Han and Eskin, 2011; Zaykin, 2011) are used widely. We only describe the former, because the two methods are approximately equivalent (Lee ). Let $X_1, \ldots, X_N$ be the effect-size estimates, such as log odds ratios or regression coefficients, in $N$ independent studies. Under the FE model, the observed effect of study $i$ is the sum of the true common effect $\mu$ and the within-study error $e_i$:

$$X_i = \mu + e_i.$$

If the sample sizes of the studies are sufficiently large, $e_i$ is normally distributed. Let $\mathrm{SE}(X_i)$ be the standard error of $X_i$ and let $V_i = \mathrm{SE}(X_i)^2$. It is common practice to use the estimated sample variance for $V_i$. Let $W_i = V_i^{-1}$ be the inverse variance. The inverse-variance-weighted effect-size estimator is the sum of $X_i$ weighted with weights $W_i$:

$$\hat{\mu}_{FE} = \frac{\sum_{i} W_i X_i}{\sum_{i} W_i}. \tag{1}$$

The variance of $\hat{\mu}_{FE}$ is

$$\mathrm{Var}(\hat{\mu}_{FE}) = \frac{1}{\sum_{i} W_i}.$$

It follows that the standard error of $\hat{\mu}_{FE}$ is $\mathrm{SE}(\hat{\mu}_{FE}) = (\sum_{i} W_i)^{-1/2}$. Note that $\mathrm{Var}(\hat{\mu}_{FE})$ is minimized only if the weights are inverse variances, which explains the method’s name (Cochran, 1954; Greene, 2012; Lee ). We can then build a summary z-score,

$$Z_{FE} = \frac{\hat{\mu}_{FE}}{\mathrm{SE}(\hat{\mu}_{FE})} = \frac{\sum_{i} W_i X_i}{\sqrt{\sum_{i} W_i}},$$

which follows $N(0, 1)$ under the null hypothesis of no association ($\mu = 0$). The P-value can be calculated as

$$p_{FE} = 2\,\Phi(-|Z_{FE}|),$$

where $\Phi$ is the cumulative distribution function of the standard normal distribution.
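As a rough illustration of the inverse-variance-weighted estimator described above, the following minimal Python sketch computes the pooled effect, its standard error, the z-score and the two-sided P-value. The function name `fixed_effects` and the input lists are hypothetical and are not taken from the RE2C software.

```python
import math

def fixed_effects(betas, ses):
    """Inverse-variance-weighted fixed effects meta-analysis (illustrative sketch).

    betas: per-study effect-size estimates (e.g. log odds ratios)
    ses:   per-study standard errors
    """
    weights = [1.0 / se ** 2 for se in ses]            # W_i = 1 / V_i
    sum_w = sum(weights)
    mu = sum(w * b for w, b in zip(weights, betas)) / sum_w
    se_mu = 1.0 / math.sqrt(sum_w)                     # SE of the pooled estimate
    z = mu / se_mu
    p = math.erfc(abs(z) / math.sqrt(2.0))             # equals 2 * Phi(-|z|)
    return mu, se_mu, z, p

# Hypothetical input: two studies with equal standard errors.
mu, se_mu, z, p = fixed_effects([0.1, 0.2], [0.1, 0.1])
```

With equal standard errors the pooled estimate is the plain average of the two effects, which is a convenient sanity check.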

2.1.2 Random effects model (traditional)

In contrast to FE, the RE method models heterogeneity explicitly and assumes that the true value of the effect size of each study is sampled from an underlying distribution. Suppose that the distribution has mean $\mu$ and variance $\tau^2$. The observed effect $X_i$ is then the sum of the common effect $\mu$ and the deviation of the $i$th study’s observed effect from $\mu$, say $\delta_i$ (Cochran, 1954), such that

$$X_i = \mu + \delta_i = \theta_i + e_i,$$

where the within-study error $e_i$ is uncorrelated with the true effect sizes $\theta_i$. The variance in $X_i$ is the sum of the between-study variance and the within-study variance (Western and Bloome, 2009),

$$\mathrm{Var}(X_i) = \tau^2 + V_i.$$

The most popular approach to estimate $\tau^2$ is the method of moments proposed by DerSimonian and Laird (DerSimonian and Laird, 1986, 2015). Given the estimated between-study variance $\hat{\tau}^2$, the RE effect size is calculated similarly to Equation (1):

$$\hat{\mu}_{RE} = \frac{\sum_{i} W_i^{*} X_i}{\sum_{i} W_i^{*}},$$

where the weights are now $W_i^{*} = (V_i + \hat{\tau}^2)^{-1}$ instead of $W_i$. Note that $\mathrm{SE}(\hat{\mu}_{RE}) = (\sum_{i} W_i^{*})^{-1/2}$. Similarly to FE, we can construct a z-score statistic

$$Z_{RE} = \frac{\hat{\mu}_{RE}}{\mathrm{SE}(\hat{\mu}_{RE})},$$

and the P-value is

$$p_{RE} = 2\,\Phi(-|Z_{RE}|).$$

The traditional RE approach is equivalent to a likelihood ratio test that assumes the same heterogeneity under both the null and the alternative hypotheses (Han and Eskin, 2011). This assumption can be conservative in GWASs; therefore, RE has limited power in GWASs (Han and Eskin, 2011).
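The traditional RE procedure can be sketched in a few lines of Python. The sketch below uses the standard DerSimonian-Laird method-of-moments estimate of the between-study variance; the function names and example inputs are hypothetical.

```python
import math

def dersimonian_laird_tau2(betas, ses):
    """Method-of-moments estimate of the between-study variance tau^2."""
    w = [1.0 / se ** 2 for se in ses]
    sum_w = sum(w)
    mu_fe = sum(wi * b for wi, b in zip(w, betas)) / sum_w
    q = sum(wi * (b - mu_fe) ** 2 for wi, b in zip(w, betas))   # Cochran's Q
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    return max(0.0, (q - (len(betas) - 1)) / c)                  # truncated at zero

def random_effects(betas, ses):
    """Traditional random effects meta-analysis (illustrative sketch)."""
    tau2 = dersimonian_laird_tau2(betas, ses)
    w_star = [1.0 / (se ** 2 + tau2) for se in ses]              # weights include tau^2
    sum_w = sum(w_star)
    mu = sum(wi * b for wi, b in zip(w_star, betas)) / sum_w
    se_mu = 1.0 / math.sqrt(sum_w)
    z = mu / se_mu
    p = math.erfc(abs(z) / math.sqrt(2.0))                       # two-sided P-value
    return mu, se_mu, z, p, tau2

# Hypothetical input: two studies with clearly different effects.
mu, se_mu, z, p, tau2 = random_effects([0.0, 0.5], [0.1, 0.1])
```

In this example the estimated heterogeneity inflates the standard error of the pooled effect, which illustrates why the traditional RE test tends to be conservative when effects differ between studies.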

2.1.3 RE2 (Han and Eskin)

Han and Eskin proposed a modified RE method (RE2) that has better power than RE or FE under conditions of effect size heterogeneity (Han and Eskin, 2011). The key difference between RE and RE2 is that the latter assumes no heterogeneity under the null hypothesis. This assumption is appropriate in many situations of GWASs where we expect that the effect sizes are all zero under the null hypothesis. The method is a likelihood ratio test that has the fixed parameters $\mu = 0$ and $\tau^2 = 0$ under the null hypothesis, as follows:

$$L_0 = \prod_{i} \frac{1}{\sqrt{2\pi V_i}} \exp\left(-\frac{X_i^2}{2V_i}\right), \tag{2}$$

$$L_1 = \prod_{i} \frac{1}{\sqrt{2\pi (V_i + \tau^2)}} \exp\left(-\frac{(X_i - \mu)^2}{2(V_i + \tau^2)}\right). \tag{3}$$

The roots of the partial derivatives of Equation (3) are not in a closed form; therefore, the maximum likelihood (ML) estimates $\hat{\mu}$ and $\hat{\tau}^2$ must be determined by using an iterative procedure. Hardy and Thompson suggested a simple and efficient procedure based on the Newton–Raphson method (Han and Eskin, 2011; Hardy and Thompson, 1996). Given $\hat{\mu}$ and $\hat{\tau}^2$, the likelihood ratio statistic can be constructed as follows:

$$S_{RE2} = -2 \log\frac{L_0}{L_1} = \sum_{i} \log\frac{V_i}{V_i + \hat{\tau}^2} + \sum_{i} \frac{X_i^2}{V_i} - \sum_{i} \frac{(X_i - \hat{\mu})^2}{V_i + \hat{\tau}^2}.$$

The value of $\tau^2$ is restricted to be non-negative; therefore, as shown by Self and Liang (Self and Liang, 1987), the statistic follows a 50:50 mixture of $\chi^2_1$ and $\chi^2_2$ asymptotically. Thus, the asymptotic P-value is

$$p_{asymptotic} = \frac{1}{2} P(\chi^2_1 \ge S_{RE2}) + \frac{1}{2} P(\chi^2_2 \ge S_{RE2}).$$

In practice, because of the small number of studies ($N$), a tabulated correction is necessary for an accurate P-value. We pre-calculated the P-value table, and the reported P-value is the asymptotic P-value adjusted by the tabulated small-sample correction factor for the given $N$.
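The RE2 likelihood ratio test for independent statistics can be illustrated with a short Python sketch. Two simplifications are assumptions of this sketch and not the published procedure: the maximization over the between-study variance uses a grid search followed by a ternary-search refinement instead of the Newton-Raphson iteration, and only the asymptotic mixture P-value is returned, without the tabulated small-sample correction. The function name `re2` and the search bound are hypothetical.

```python
import math

def re2(betas, ses, grid=2000):
    """Return (statistic, asymptotic P-value, mu_hat, tau2_hat); illustrative sketch."""
    v = [se * se for se in ses]

    def profile(tau2):
        # For a fixed tau2, the maximizing mu is the weighted mean with weights 1/(V_i + tau2).
        w = [1.0 / (vi + tau2) for vi in v]
        mu = sum(wi * b for wi, b in zip(w, betas)) / sum(w)
        ll = -0.5 * sum(math.log(2.0 * math.pi * (vi + tau2)) + (b - mu) ** 2 / (vi + tau2)
                        for vi, b in zip(v, betas))
        return ll, mu

    # Null log-likelihood: mu = 0 and tau2 = 0.
    ll0 = -0.5 * sum(math.log(2.0 * math.pi * vi) + b * b / vi for vi, b in zip(v, betas))

    upper = (max(betas) - min(betas)) ** 2        # heuristic search bound (assumption)
    step = upper / grid
    best = max((k * step for k in range(grid + 1)), key=lambda t: profile(t)[0])
    lo, hi = max(0.0, best - step), min(upper, best + step)
    for _ in range(60):                           # ternary search around the best grid point
        m1, m2 = lo + (hi - lo) / 3.0, hi - (hi - lo) / 3.0
        if profile(m1)[0] < profile(m2)[0]:
            lo = m1
        else:
            hi = m2
    tau2 = 0.5 * (lo + hi)
    ll1, mu = profile(tau2)
    stat = max(0.0, 2.0 * (ll1 - ll0))
    # 50:50 mixture of chi-square(1) and chi-square(2) upper tails.
    p = 0.5 * math.erfc(math.sqrt(stat / 2.0)) + 0.5 * math.exp(-stat / 2.0)
    return stat, p, mu, tau2

# Hypothetical input: effects of opposite sign, which cancel under FE.
stat, p, mu, tau2 = re2([0.3, -0.3, 0.3, -0.3], [0.05, 0.05, 0.05, 0.05])
```

In this example the FE z-score is zero because the effects cancel, whereas the RE2 statistic is large because the heterogeneity term contributes to the likelihood ratio.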

2.2 Existing meta-analysis methods for correlated statistics

2.2.1 The Lin-Sullivan method

Historically, meta-analysis methods focused mainly on summarizing independent estimates. However, in recent research designs, the statistics are often correlated, for example, because of overlapping subjects, which is common in cross-disease meta-analysis. Lin and Sullivan (Lin and Sullivan, 2009) developed a meta-analysis solution to account for these correlations. First, they showed that the correlations of statistics could be calculated analytically. For example, in a case/control design, the correlation between statistics of studies $i$ and $j$ is approximated as

$$r_{ij} \approx \frac{n_{ij0}\sqrt{\dfrac{n_{i1} n_{j1}}{n_{i0} n_{j0}}} + n_{ij1}\sqrt{\dfrac{n_{i0} n_{j0}}{n_{i1} n_{j1}}}}{\sqrt{n_i n_j}},$$

where $n_i$, $n_j$ and $n_{ij}$ are the total numbers of subjects in the $i$th and $j$th studies and the number of overlapping subjects between the two ($i$th and $j$th), respectively. Subscripts 1 and 0 denote the case and control status. Let $C$ be the correlation matrix of $X = (X_1, \ldots, X_N)^T$. Given $C$, one can easily calculate the variance-covariance matrix $\Omega$, whose $(i, j)$ element is $r_{ij}\,\mathrm{SE}(X_i)\,\mathrm{SE}(X_j)$. Lin and Sullivan suggested a statistic:

$$\hat{\mu}_{LS} = \frac{\mathbf{1}^T \Omega^{-1} X}{\mathbf{1}^T \Omega^{-1} \mathbf{1}},$$

where $\mathbf{1}$ is an $N \times 1$ vector of ones. The variance is

$$\mathrm{Var}(\hat{\mu}_{LS}) = \frac{1}{\mathbf{1}^T \Omega^{-1} \mathbf{1}}.$$

Therefore, one can obtain a z-score as well as a P-value (Lin and Sullivan, 2009). This method does not assume heterogeneity; therefore, it can be considered as an extension of FE to account for correlations.
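As a minimal sketch of the analytical correlation for overlapping case/control samples, the Python snippet below assumes the commonly cited Lin-Sullivan approximation, in which shared controls and shared cases each contribute a term scaled by the case/control ratios of the two studies. The function and argument names are hypothetical. The example reproduces the setting used later in the simulations (2000 cases per study and 3000 shared controls), which gives a correlation of 0.4.

```python
import math

def overlap_correlation(n_i1, n_i0, n_j1, n_j0, n_ij1, n_ij0):
    """Approximate correlation between the statistics of two case/control studies.

    n_i1, n_i0:   cases and controls in study i
    n_j1, n_j0:   cases and controls in study j
    n_ij1, n_ij0: cases and controls shared by both studies
    """
    n_i = n_i1 + n_i0
    n_j = n_j1 + n_j0
    shared_controls = n_ij0 * math.sqrt((n_i1 * n_j1) / (n_i0 * n_j0))
    shared_cases = n_ij1 * math.sqrt((n_i0 * n_j0) / (n_i1 * n_j1))
    return (shared_controls + shared_cases) / math.sqrt(n_i * n_j)

def covariance_matrix(ses, corr):
    """Variance-covariance matrix with entries r_ij * SE_i * SE_j."""
    n = len(ses)
    return [[corr[i][j] * ses[i] * ses[j] for j in range(n)] for i in range(n)]

# Two studies with 2000 cases each that share the same 3000 controls.
rho = overlap_correlation(2000, 3000, 2000, 3000, 0, 3000)
omega = covariance_matrix([0.1, 0.1], [[1.0, rho], [rho, 1.0]])
```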

2.2.2 The decoupling method

Recently, Han et al. (Han ) proposed a method called "decoupling" that can transform correlated data into independent data. As Lin and Sullivan showed, in many situations, the correlation matrix $C$ can be approximated analytically before the meta-analysis. Han et al. calculate a transformed covariance structure:

$$\Omega_{Decoupled} = \mathrm{diag}\left(\mathbf{1}^T \Omega^{-1}\right)^{-1}, \qquad \Omega = \mathrm{diag}(s)\, C\, \mathrm{diag}(s),$$

where $\mathbf{1}$ is a vector of ones, $s$ is the vector of standard errors, and $\mathrm{diag}(s)$ is a diagonal matrix whose diagonals are $s$. The updated standard errors then become

$$\mathrm{SE}(X_i)_{Decoupled} = \sqrt{\Omega_{Decoupled}(i, i)},$$

where $\Omega_{Decoupled}(i, i)$ denotes the $i$th diagonal element of $\Omega_{Decoupled}$. The data become independent, and thus can be used for RE2 as well as FE. Han et al. showed that when the decoupled data are used for FE, the method is analytically equivalent to the Lin-Sullivan method. Han et al. also showed that under conditions of heterogeneity, RE2 with decoupling (Decoupling-RE2) shows a higher power than FE with decoupling. However, the optimality of Decoupling-RE2 has not been evaluated.
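The stated equivalence between FE on decoupled data and the Lin-Sullivan estimator can be checked numerically. The sketch below assumes that the decoupled variance of study i is the reciprocal of the i-th row sum of the inverse covariance matrix, and it uses a small hand-written linear solver to stay dependency-free. All function names are hypothetical.

```python
import math

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def inverse_row_sums(ses, corr):
    """Row sums of the inverse covariance matrix, i.e. Omega^{-1} 1."""
    n = len(ses)
    omega = [[corr[i][j] * ses[i] * ses[j] for j in range(n)] for i in range(n)]
    return solve(omega, [1.0] * n)

def lin_sullivan(betas, ses, corr):
    """Generalized least squares estimate of the common effect and its SE."""
    w = inverse_row_sums(ses, corr)
    sum_w = sum(w)                                   # 1^T Omega^{-1} 1
    mu = sum(wi * b for wi, b in zip(w, betas)) / sum_w
    return mu, 1.0 / math.sqrt(sum_w)

def decouple(ses, corr):
    """Decoupled standard errors; fails if a row sum is not positive."""
    w = inverse_row_sums(ses, corr)
    if min(w) <= 0.0:
        raise ValueError("non-positive decoupled weight")
    return [1.0 / math.sqrt(wi) for wi in w]

# Hypothetical input: two studies with correlation 0.4.
betas, ses, corr = [0.1, 0.2], [0.1, 0.1], [[1.0, 0.4], [0.4, 1.0]]
mu_ls, se_ls = lin_sullivan(betas, ses, corr)
se_dec = decouple(ses, corr)
w_dec = [1.0 / s ** 2 for s in se_dec]
mu_fe = sum(w * b for w, b in zip(w_dec, betas)) / sum(w_dec)   # FE on decoupled data
se_fe = 1.0 / math.sqrt(sum(w_dec))
```

The decoupled standard errors are larger than the original ones, which is how the transformation carries the information lost through the overlap into an independent-data analysis.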

2.3 RE2C

In the present study, we propose RE2C, a powerful random effects method for meta-analysis of GWASs. RE2C is built upon RE2, but with two modifications that improve its power: (1) accounting optimally for correlations, and (2) focusing on heterogeneous effects conditioned on the application of FE. C in RE2C refers to both correlations and conditioning.

2.3.1 Optimizing for the meta-analysis of correlated datasets

We extended the RE2 model to include correlations between statistics. Let $X$ be the length $n$ vector denoting the observed effect sizes. Then, we could build a model

$$X = \mu \mathbf{1} + \eta + e,$$

where $\mathbf{1}$ is a vector of ones of size $n$, $\eta$ are random effects reflecting between-study heterogeneity and $e$ are random errors. Given the correlation of statistics $C$, which can be approximated analytically using the Lin-Sullivan approach (Lin and Sullivan, 2009), we have

$$\mathrm{Var}(e) = \Omega = \mathrm{diag}(s)\, C\, \mathrm{diag}(s),$$

where $s$ is the vector of the standard errors. Then, the variance-covariance matrix of $X$ is

$$\Sigma = \tau^2 I + \Omega.$$

The likelihood functions under the null and alternative hypotheses become

$$L_0 = (2\pi)^{-n/2}\, |\Omega|^{-1/2} \exp\left(-\tfrac{1}{2} X^T \Omega^{-1} X\right),$$

$$L_1 = (2\pi)^{-n/2}\, |\Sigma|^{-1/2} \exp\left(-\tfrac{1}{2} (X - \mu\mathbf{1})^T \Sigma^{-1} (X - \mu\mathbf{1})\right).$$

To build a likelihood ratio test, we must find the maximum likelihood estimates (MLEs) of the parameters $\mu$ and $\tau^2$. Previously, for independent statistics, RE2 utilized an iterative procedure suggested by Hardy and Thompson (Hardy and Thompson, 1996). However, their method only considers independent statistics. Therefore, we developed an optimization procedure that can be applied efficiently for both independent and correlated statistics. We chose to use the technique developed for the restricted maximum likelihood (REML) framework. The key idea of our optimization procedure is to transform the two-dimensional search into a one-dimensional search using the technique that was developed by Patterson and Thompson (Patterson and Thompson, 1971). A similar technique has been used previously to correct for population stratification (Kang ). We decomposed the observations using a direct sum, where one of the decomposed observations is the observation for the REML function after integrating out the fixed effects. That is, we decomposed $X$ into two matrix-vector products, each formed with a transformation matrix: one of rank $n - 1$ and one of rank 1. The specific forms of the two matrices for our purpose are described below.
The properties of the direct sum mean that the log-likelihood function of the mixed model can be decomposed into two log-likelihood functions of independent observations as follows: The projection matrix is an idempotent and symmetrical matrix that integrates out the fixed effects (mean) of the observation . In our problem, matrix is: Here, is a vector of ones of size n. The matrix satisfies , i.e. . Then, matrix becomes: Matrix satisfies the conditions and . Next, we considered the full log-likelihood with the parameters of interest and as follows is an orthogonal projection matrix; therefore, is in the form of , where is an matrix with orthonormal columns, such that . To reduce the complexity of the restricted likelihood function for , Harville (1974) suggested the use of the restricted likelihood function for , where the MLE for the two likelihood functions are the same. As Harville showed, the restricted likelihood can be shown as: where . Let the orthogonal matrix, , be the eigenvectors of the matrix such that is diagonal. Let . The matrix then has the following properties: (i) , (ii) , (iii) and (iv) . Using the spectral decomposition framework, the symmetric matrix can be shown as: where is the eigenvalues of the matrix , where at least one value is zero, and the ) matrix has the eigenvectors associated with as the columns. We use to refer to a vector of ones of size . Note that is equal to . Using the properties of the matrix and , we have Here, we considered the full (not restricted) likelihood function whose is substituted with . For our problem of finding the MLE, this modified function is sufficient, because it satisfies that at the MLE. Note that although we focused on the full likelihood function to build a likelihood ratio test, the same optimization procedure below can be applied to the restricted likelihood function. 
Following Equation (4), we could define the generalized inverse of the matrix, and with it transform the likelihood into a simpler expression in which each term depends only on one eigenvalue of the matrix and the corresponding component of the transformed observation vector. This transformation reduces the number of parameters to be searched to one ($\tau^2$). Thus, we can use a simple Newton-Raphson procedure, with the first and the second derivatives of the transformed log-likelihood function, to estimate the unknown parameter $\tau^2$. In summary, using this optimization procedure, the parameter estimation needs only the application of the Newton-Raphson method to a single parameter, which is very efficient. Because the search is one-dimensional, we have a high chance of obtaining the global optimum by using a grid search to choose the starting point for the Newton-Raphson procedure. After we find the MLE, we can build a likelihood ratio test statistic, which follows a 50:50 mixture of $\chi^2_1$ and $\chi^2_2$ asymptotically.

2.3.2 Focusing on heterogeneous effects

We then modified the test procedure of RE2 to focus on heterogeneous effects. In most meta-analyses of GWASs, detecting variants with homogeneous effects is of primary interest. For this reason, it is often essential to apply FE before applying RE2, while accounting for the increased multiple testing burden. We surveyed the literature that cited and used RE2; at least in all the papers that we examined, the studies used RE2 coupled with FE. Thus, considering this situation specific to meta-analysis of GWASs, where the prior application of FE is effectively mandatory, we can improve the power of RE2 by focusing on the heterogeneous effects that would not be identified by FE. Specifically, we designed a statistic as follows,

$$S_{RE2C} = \begin{cases} S_{RE2} & \text{if } p_{RE2} < p_{FE}, \\ 0 & \text{otherwise.} \end{cases}$$

In short, this statistic can become significant only if the RE2 P-value is more significant than the FE P-value. Although the statistic looks simple, calculating the P-value of this statistic is non-trivial. Unlike RE2, this statistic does not follow a known asymptotic distribution. One possible way is to use a resampling approach that samples null z-scores repeatedly. However, P-values typically observed in GWASs are extremely small, at or below the genome-wide significance threshold. To estimate such a small P-value using resampling, a large number of samplings are required. Thus, in GWASs where millions of markers are analyzed, resampling can be very slow. To approximate the P-value of the new method efficiently, we used the following strategy. Recall that the RE2 statistic is a likelihood ratio statistic that measures the difference between the two likelihoods: $L_0$ in Equation (2) and $L_1$ in Equation (3). We introduced an intermediate likelihood function, $L_{int}$, which is similar to $L_1$, but with a restriction of $\tau^2 = 0$. Then, the RE2 statistic can be decomposed into the sum of the difference between $L_0$ and $L_{int}$ and the difference between $L_{int}$ and $L_1$, as follows (Han and Eskin, 2011):

$$S_{RE2} = -2 \log\frac{L_0}{L_1} = \underbrace{-2 \log\frac{L_0}{L_{int}}}_{S_{FE}} + \underbrace{\left(-2 \log\frac{L_{int}}{L_1}\right)}_{S_{het}},$$

where each likelihood is evaluated at its own maximum. The first statistic, $S_{FE}$, is equal to the square of the FE statistic ($Z_{FE}^2$). The second statistic, $S_{het}$, tests for nonzero between-study variance, similar to Cochran’s Q test. The two statistics are independent under the null hypothesis (Self and Liang, 1987). Asymptotically, $S_{FE}$ follows $\chi^2_1$, and $S_{het}$ follows a 50:50 mixture of 0 and $\chi^2_1$. However, the conditions for them to follow their asymptotic distributions are different. Under the assumption that the effect size ($X_i$) follows a normal distribution due to a large sample in each study, which is the case in GWASs, $S_{FE}$ follows $\chi^2_1$ regardless of the number of studies ($N$). However, even under the normality assumption, $S_{het}$ follows a 50:50 mixture of 0 and $\chi^2_1$ only if $N$ is large. $N$ is small in a typical meta-analysis of GWASs; therefore, the true distribution of $S_{het}$ can deviate greatly from the asymptotic distribution. For our method, we approximated and tabulated the distribution of $S_{het}$ empirically for every possible $N$.

In the previous section, we extended the RE2 model to account for correlations between statistics. Equation (5) can also be decomposed into two parts, $S_{RE2} = S_{FE} + S_{het}$, where the intermediate likelihood $L_{int}$ is maximized at $\hat{\mu}_{LS}$, the Lin-Sullivan estimator of $\mu$, which is $\mathbf{1}^T \Omega^{-1} X / (\mathbf{1}^T \Omega^{-1} \mathbf{1})$ with $\Omega$ denoting the variance-covariance matrix of the errors and $\mathbf{1}$ a vector of ones. $S_{FE}$ is equivalent to the square of the z-score of the Lin-Sullivan method in this situation.

Now that the RE2 statistic can be decomposed into $S_{FE}$ and $S_{het}$, whose null distributions are known, given an observed RE2 statistic, its P-value can be interpreted as an integral over a region in the two-dimensional space. Specifically, in Figure 1, the RE2 P-value is the volume of the region excluding the bottom left triangle (i.e. region $A \cup B$). However, in RE2C, we only consider the region where $p_{RE2} < p_{FE}$. Thus, for each $S_{FE} = x$, we can search for $S_{het}$ that would satisfy $p_{RE2} < p_{FE}$, or

$$p_{RE2}(x + S_{het}) < p_{FE}(x).$$

Let this lower boundary of $S_{het}$ that satisfies $p_{RE2} < p_{FE}$ be $S_{het}^{low}(x)$. This boundary is plotted as a dashed line in Figure 1. Then, given an observed RE2C statistic $s$, we calculated the P-value as follows. We divided the range of $S_{FE}$ into $K$ small bins (e.g. 1000 bins in [0, 50]), denoted as $x_k$ ($k = 1, \ldots, K$). The approximated P-value is

$$p_{RE2C}(s) \approx \sum_{k=1}^{K} f_{FE}(x_k)\, \Delta \cdot P\left(S_{het} \ge \max\left(s - x_k,\; S_{het}^{low}(x_k)\right)\right),$$

where $f_{FE}$ is the density of $S_{FE}$ (that of $\chi^2_1$) and $\Delta$ is the width of the bins. That is, we calculated the probability that $S_{het}$ would be large enough to satisfy $S_{RE2} \ge s$ for every bin of $S_{FE}$, and integrated them together. We took the maximum function because if $S_{het}$ is smaller than $S_{het}^{low}(x_k)$, then $S_{RE2C} = 0$ by definition. Thus, we calculated the volume of region A in Figure 1. As a result, it always satisfies the inequality

$$p_{RE2C} \le p_{RE2}$$

as long as $p_{RE2} < p_{FE}$, because we have removed region B in Figure 1. This shows that the RE2C P-value can never be less significant than the RE2 P-value when those methods are used coupled with FE, for the variants with $p_{RE2} < p_{FE}$. Note that the calculation of the P-value is efficient because we have pre-calculated $S_{het}^{low}(x)$ for every $x$ and $N$ and the cumulative distribution function of $S_{het}$ for every $N$. Thus, the computational complexity is only $O(K)$. Moreover, the complexity is not dependent on how small the P-value is, unlike in the resampling approaches.
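The binned integration can be sketched in Python as follows. This is an illustration under explicit assumptions and not the released implementation: the asymptotic null distributions (chi-square with 1 degree of freedom for the FE part, a 50:50 mixture of 0 and chi-square with 1 degree of freedom for the heterogeneity part) stand in for the empirically tabulated distributions, the lower boundary is found by bisection instead of a pre-computed table, and the exact probability mass of each bin replaces the density times the bin width. All function names are hypothetical.

```python
import math

def sf_chi2_1(x):
    """Upper-tail probability of a chi-square with 1 degree of freedom."""
    return math.erfc(math.sqrt(max(x, 0.0) / 2.0))

def sf_chi2_2(x):
    """Upper-tail probability of a chi-square with 2 degrees of freedom."""
    return math.exp(-max(x, 0.0) / 2.0)

def p_fe(x):
    """P-value of the FE part, given the squared FE z-score x."""
    return sf_chi2_1(x)

def p_re2(s):
    """Asymptotic RE2 P-value: 50:50 mixture of chi-square(1) and chi-square(2)."""
    return 0.5 * sf_chi2_1(s) + 0.5 * sf_chi2_2(s)

def s_het_low(x, hi=100.0):
    """Smallest heterogeneity statistic with p_RE2(x + s_het) <= p_FE(x), by bisection."""
    target = p_fe(x)
    lo = 0.0
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        if p_re2(x + mid) > target:
            lo = mid
        else:
            hi = mid
    return hi

def p_re2c(s, k=1000, x_max=50.0):
    """Approximate P-value of a nonzero RE2C statistic s (a zero statistic has P-value 1)."""
    delta = x_max / k
    total = 0.0
    for i in range(k):
        left = i * delta
        x = left + 0.5 * delta                              # bin midpoint
        mass = sf_chi2_1(left) - sf_chi2_1(left + delta)    # P(S_FE falls in this bin)
        t = max(s - x, s_het_low(x))                        # the boundary is always > 0
        total += mass * 0.5 * sf_chi2_1(t)                  # P(S_het >= t) for t > 0
    return total

p_conditional = p_re2c(30.0)
p_unconditional = p_re2(30.0)
```

Because the region below the boundary is excluded, the conditional P-value is smaller than the RE2 P-value for the same statistic, and the cost of the loop does not depend on how small the P-value is.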
Fig. 1

Two-dimensional representation of the FE statistic ($S_{FE}$) and the heterogeneity statistic ($S_{het}$). Given the observed statistic, the RE2C P-value is the probability in area A, while the RE2 P-value is the probability in areas A and B

3 Results

3.1 Simulations

We evaluated the performance of RE2C using simulations. We assumed seven studies, each of which comprised 2000 individuals, half of whom were controls and half of whom were cases. We assumed a SNP with a minor allele frequency (MAF) of 0.1, following the Hardy-Weinberg equilibrium.

3.1.1 False positive rate

We assumed the null hypothesis of no association and evaluated the false positive rate of RE2C. We repeated the null simulations $10^9$ times and estimated the false positive rate as the proportion of the repeats whose P-value was $\le \alpha$, where $\alpha \in \{5.0 \times 10^{-2}, 5.0 \times 10^{-4}, 5.0 \times 10^{-6}, 5.0 \times 10^{-8}\}$. Table 1 shows that the false positive rates of RE2C were well calibrated. We then assumed that the statistics were correlated, with a correlation coefficient ρ = 0.4. The false positive rates for the correlated statistics were also controlled (Table 1). There was a slight conservative tendency, which was possibly caused by the errors in our approximation of P-values using bins. However, the discrepancies were very small.
Table 1

False positive rates of RE2C

α                            5.0 × 10^-2   5.0 × 10^-4   5.0 × 10^-6   5.0 × 10^-8
Independent input            4.8 × 10^-2   4.8 × 10^-4   4.7 × 10^-6   5.5 × 10^-8
Correlated input (ρ = 0.4)   4.7 × 10^-2   4.6 × 10^-4   4.5 × 10^-6   4.0 × 10^-8

3.1.2 Power for independent statistics

We compared the powers of FE, RE2 and RE2C. We generated 10 000 sets for meta-analysis, where we again assumed seven studies with sample size equal to 2000 and a MAF of 0.1. In our simulations, we considered the practical situation in which FE was already applied before the application of RE2 or RE2C. Thus, we considered the combined use of RE2 (or RE2C) with FE, where multiple testing was accounted for. Specifically, the power of FE was the proportion of the sets whose P-value was more significant than the genome-wide threshold ($5 \times 10^{-8}$). The power of RE2 (or RE2C) was the proportion of the sets whose FE or RE2 (RE2C) P-value was more significant than half of that threshold ($2.5 \times 10^{-8}$). To model the effect size heterogeneity in our simulations, we assumed four different effect size distributions. Let μ be a specific, assumed target log odds ratio. The four distributions were as follows, in order of increasing amount of heterogeneity. First, we assumed a unimodal distribution that was a normal distribution with mean μ and standard deviation μ, truncated to [0, 2μ]. Second, we assumed a uniform distribution spanning [0, 2μ]. Third, we assumed a bimodal distribution that followed $N(0, \mu^2)$ truncated to [0, μ] with one-half probability, and $N(2\mu, \mu^2)$ truncated to [μ, 2μ] with the other half probability. These three distributions all had mean μ. Finally, we assumed a distribution representing opposite effects, which followed $N(-1.2\mu, \mu^2)$ with one-half probability and $N(1.2\mu, \mu^2)$ with the other half probability. Although opposite effects between studies can be rare in genetic studies of the same disease, they can occur in cross-disease meta-analyses or cross-tissue eQTL analyses. Once we assumed one of the distributions above, we randomly sampled $\beta_i$, the log odds ratio in study $i$, from the distribution. We then sampled the minor allele counts in control and case samples assuming the control and case MAF, respectively. The control MAF was assumed to be the same as the population MAF ($p = 0.1$), assuming a very small prevalence, and the case MAF was $p e^{\beta_i} / (1 - p + p e^{\beta_i})$.
For effective comparisons of power, we adjusted μ for each distribution such that the power of the most powerful method was approximately 70%. Figure 2 shows the power comparison results. The powers of RE2 and RE2C are shown as stacked bars. We assumed a prior application of FE to random effect methods; therefore, we applied a different color scheme to the proportion of datasets determined as significant by FE (light grey) and the proportion of datasets that the random effect methods newly identified as significant (dark grey). Note that the height of the light grey bar is slightly smaller in RE2/RE2C than in FE, because the significance level was adjusted to one-half. As the heterogeneity increased, the combined use of the random effect methods with FE gave increasingly higher power than using FE alone, as expected. Under all tested scenarios of effect size distributions, RE2C was the most powerful. RE2C increased the power of RE2 by 1.55, 1.85, 2.07 and 2.98% for unimodal, uniform, bimodal and opposite effects, respectively. Although the increase in the absolute amount of power was modest, the increase in relative power gain compared with FE was non-negligible. For example, in the unimodal distribution, the power gain of RE2C from FE was 1.71%, which was more than 10 times greater than that of RE2 (0.16%).
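The sampling scheme for the effect sizes and allele counts can be sketched as follows. Several details are assumptions of this sketch: the uniform distribution is taken to span 0 to 2μ (so that its mean is μ), truncation is done by rejection sampling, the case MAF is derived from the allelic odds ratio, and allele counts are drawn as 2n Bernoulli trials under Hardy-Weinberg equilibrium. Function names are hypothetical.

```python
import math
import random

def truncated_normal(rng, mean, sd, lo, hi):
    """Rejection sampling from a normal distribution truncated to [lo, hi]."""
    while True:
        x = rng.gauss(mean, sd)
        if lo <= x <= hi:
            return x

def sample_log_or(rng, mu, kind):
    """Draw one study-specific log odds ratio from the named distribution."""
    if kind == "unimodal":
        return truncated_normal(rng, mu, mu, 0.0, 2.0 * mu)
    if kind == "uniform":
        return rng.uniform(0.0, 2.0 * mu)
    if kind == "bimodal":
        if rng.random() < 0.5:
            return truncated_normal(rng, 0.0, mu, 0.0, mu)
        return truncated_normal(rng, 2.0 * mu, mu, mu, 2.0 * mu)
    if kind == "opposite":
        center = -1.2 * mu if rng.random() < 0.5 else 1.2 * mu
        return rng.gauss(center, mu)
    raise ValueError("unknown distribution: " + kind)

def case_maf(control_maf, log_or):
    """Case allele frequency implied by the control frequency and the allelic odds ratio."""
    g = math.exp(log_or)
    return g * control_maf / (1.0 - control_maf + g * control_maf)

def minor_allele_count(rng, n_individuals, maf):
    """Minor allele count among 2n chromosomes (binomial draw via Bernoulli trials)."""
    return sum(rng.random() < maf for _ in range(2 * n_individuals))

rng = random.Random(0)
beta = sample_log_or(rng, 0.2, "bimodal")
count_cases = minor_allele_count(rng, 1000, case_maf(0.1, beta))
count_controls = minor_allele_count(rng, 1000, 0.1)
```

From the two allele counts one can form a 2 x 2 table and estimate the observed log odds ratio and its standard error for each study, which then serve as input to the meta-analysis methods.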
Fig. 2

Power of FE, RE2 and our new RE2C method for the meta-analysis of independent statistics. Assuming the statistics are independent, we simulated various effect size distributions with differing amounts of heterogeneity. We considered the scenario in which RE2 or RE2C is applied in addition to FE while accounting for multiple testing. The powers of RE2 and RE2C are shown as two-color stacked bars, where we colored the proportion identified by FE as significant in light grey and the proportion that RE2/RE2C additionally identified as significant in dark grey

3.1.3 Power for correlated statistics

Using a similar simulation scheme, we evaluated the power of RE2C when the statistics were correlated. After we sampled the effect sizes of the studies, we generated the observed effect sizes assuming that they were correlated with correlation coefficient ρ. We assumed ρ = 0.1 and ρ = 0.4, and calculated the power for each setting. The value ρ = 0.4 was derived by assuming a cross-disease analysis with 2000 cases and 3000 shared controls (Wellcome Trust Case Control Consortium, 2007). The competing approaches in this simulation were the Lin-Sullivan (LS) and the Decoupling-RE2 (DR2) methods. As described in the Methods, the Lin-Sullivan method is an extended FE method that accounts for correlations. Decoupling-RE2 refers to applying RE2 to the data transformed by the decoupling approach, which are then independent. Figure 3 shows that RE2C greatly outperformed the other methods in all scenarios of effect size distributions and correlations. For example, for the uniform distribution where ρ = 0.4, RE2C achieved 71% power, while the powers of the Lin-Sullivan method and Decoupling-RE2 were only 23.8% and 21.4%, respectively. Surprisingly, Decoupling-RE2 performed poorly even for large heterogeneity when the correlations were high (ρ = 0.4). This demonstrates that although the application of the decoupled data to RE2 is possible, it may not provide optimal power.
Fig. 3

Power of Lin-Sullivan (LS), Decoupling-RE2 (DR2) and our new RE2C method for meta-analyzing correlated statistics. Assuming statistics are correlated with correlation coefficient ρ, we simulated various effect size distributions with differing amounts of heterogeneity. We considered the scenario in which DR2 or RE2C is applied in addition to LS while accounting for multiple testing. DR2 and RE2C power is shown as two-color stacked bars, where we colored the proportion that LS identified as significant in light grey and the proportion that DR2/RE2C additionally identified as significant in dark grey

3.2 Applications to real data

We wanted to evaluate the utility of RE2C for real data. To this end, we used the cross-disease analysis data of Moskvina et al. (Moskvina ), who performed a meta-analysis of association results for Alzheimer's disease (AD) and Parkinson’s disease (PD). Moskvina et al. examined the meta-analysis P-values of 10 loci known to be associated with AD and 18 loci known to be associated with PD. The two diseases shared some controls; therefore, there were correlations between the statistics of the two diseases. To account for these correlations, Moskvina et al. used the Lin-Sullivan method. However, the same variant may have differing effects on the two diseases. Therefore, random effect methods might help in association tests. We obtained the reported effect sizes (OR) and P-values for these 28 loci from the table shown in their manuscript. We then calculated the standard errors from the OR and P-values, and used them for meta-analysis. We removed three loci whose OR was 1.00 (because the paper reported only two decimal places, the log OR was zero and the standard error could not be recovered), and applied RE2C to the remaining 25 loci. Table 2 gives the details of the collected data and the meta-analysis results. Out of 25 loci, LS was the most significant in 13 loci. In all the remaining 12 loci, RE2C was the most significant. Note that for the 13 loci where LS was the most significant, RE2C P-values were completely non-significant ($P = 1$). This is because RE2C was designed to be used with FE (LS), but focusing only on loci with heterogeneity. We also show the results of an RE2C implementation with optimization for correlated statistics but without the technique for focusing on heterogeneous effects (denoted as RE2C*), which shows that focusing on heterogeneous effects improved P-values at these 12 loci. Overall, these results showed that if RE2C is used in combination with LS, a high association test power to detect both loci with and without heterogeneity is obtained. 
Interestingly, RE2C found two loci (rs4698413 and rs2263418) as genome-wide significant that were not identified by LS alone.
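For orientation, a fixed effects combination that accounts for correlation between two statistics can be written as an inverse-covariance (generalized least squares) weighted average. The sketch below assumes that form for an LS-style baseline; it is not the authors' implementation, the function name `gls_combine` is hypothetical, and the correlation `rho` is a made-up value because the source does not report it.

```python
from math import sqrt
from statistics import NormalDist


def gls_combine(beta1, se1, beta2, se2, rho):
    """Combine two correlated effect estimates with inverse-covariance weights.

    The covariance matrix is assumed to be
        [[se1^2,          rho * se1 * se2],
         [rho * se1 * se2, se2^2         ]].
    With rho = 0 this reduces to the usual inverse-variance fixed effects
    estimate.
    """
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    cov = rho * se1 * se2
    denom = se1 ** 2 + se2 ** 2 - 2.0 * cov
    beta = ((se2 ** 2 - cov) * beta1 + (se1 ** 2 - cov) * beta2) / denom
    var = (se1 ** 2 * se2 ** 2 - cov ** 2) / denom
    se = sqrt(var)
    p_value = 2.0 * NormalDist().cdf(-abs(beta) / se)  # two-sided P-value
    return beta, se, p_value


# Hypothetical inputs: opposite-direction log odds ratios in two diseases,
# with an assumed correlation of 0.2 induced by shared controls.
beta, se, p_value = gls_combine(0.14, 0.03, -0.05, 0.05, rho=0.2)
print(f"beta = {beta:.4f}, se = {se:.4f}, P = {p_value:.3g}")
```

When the two effects point in opposite directions, the pooled estimate shrinks toward zero, which is the situation in which a method that models heterogeneity can be more powerful than a fixed effects combination.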
Table 2

Cross-disease meta-analysis results of the Alzheimer’s disease and Parkinson’s disease based on the reported data from Moskvina et al.

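| Chr | Base Pairs | SNP | Parkinson Disease OR | Parkinson Disease P | Alzheimer Disease OR | Alzheimer Disease P | LS P | DR2 P | RE2C* P | RE2C P |
|---|---|---|---|---|---|---|---|---|---|---|
| Alzheimer Disease loci | | | | | | | | | | |
| 1 | 207 819 492 | 1-207819492 | 0.61 | 0.062 | 0.50 | 0.058 | **0.016** | 0.019 | 0.02389 | 1 |
| 2 | 127 892 810 | rs6733839 | 1.07 | 0.0098 | 1.23 | 5.2E-5 | 0.00029 | 0.00033 | 0.00017 | **3.0E-5** |
| 6 | 47 327 031 | rs9367271 | 1.11 | 0.0014 | 1.06 | 0.339 | **0.0017** | 0.0020 | 0.00279 | 1 |
| 7 | 143 106 884 | rs7806047 | 0.87 | 0.001 | 0.89 | 0.151 | **0.0007** | 0.0008 | 0.00118 | 1 |
| 8 | 27 466 181 | rs1532277 | 0.99 | 0.709 | 0.81 | 1.8E-6 | 0.024 | 0.0011 | 8.11E-05 | **1.5E-5** |
| 11 | 60 045 900 | rs7949816 | 0.95 | 0.073 | 0.82 | 0.00075 | 0.0084 | 0.010 | 0.00589 | **0.0012** |
| 11 | 85 677 094 | 11-85677094 | 1.20 | 0.0055 | 1.26 | 0.057 | **0.0019** | 0.002 | 0.00314 | 1 |
| 19 | 01 032 228 | rs56059558 | 0.86 | 0.0023 | 0.84 | 0.05 | **0.0008** | 0.0009 | 0.001302 | 1 |
| 19 | 45 392 254 | rs6857 | 0.95 | 0.154 | 5.55 | 4.4E-92 | 0.0002 | 2.9E-53 | 3.24E-94 | **1.6E-95** |
| 19 | 51 724 326 | rs200656 | 1.06 | 0.089 | 1.06 | 0.23713 | **0.055** | 0.06 | 0.07461 | 1 |
| Parkinson Disease loci | | | | | | | | | | |
| 1 | 155 135 036 | rs35749011 | 1.43 | 6.1E-5 | 1.02 | 0.938 | **0.00012** | 0.00014 | 0.00022 | 1 |
| 2 | 135 592 245 | rs6758044 | 1.12 | 1.2E-5 | 0.96 | 0.383 | 0.0005 | 0.0003 | 7.99E-05 | **1.5E-5** |
| 2 | 169 119 178 | rs13392079 | 1.14 | 1.1E-6 | 0.95 | 0.296 | 0.0001 | 3.2E-5 | 6.07E-06 | **1.0E-6** |
| 3 | 161 114 968 | rs336549 | 0.90 | 9.4E-6 | 1.05 | 0.275 | 0.0004 | 0.0002 | 4.68E-05 | **8.5E-6** |
| 3 | 182 760 073 | rs10513789 | 1.11 | 0.0007 | 1.01 | 0.921 | **0.001** | 0.0012 | 0.00164 | 1 |
| 4 | 15 737 882 | rs4698413 | 1.15 | 4.4E-9 | 0.98 | 0.651 | 5.6E-7 | 2.2E-7 | 5.38E-08 | **8.2E-9** |
| 4 | 77 146 751 | rs56275416 | 1.15 | 2.0E-6 | 1.01 | 0.84 | 3.8E-5 | 4.1E-5 | 2.80E-05 | **4.9E-6** |
| 4 | 90 646 886 | rs356165 | 0.76 | 1.2E-28 | 1.04 | 0.38 | 9.4E-21 | 8.0E-25 | 3.81E-28 | **3.2E-29** |
| 6 | 32 440 158 | rs7453703 | 1.10 | 0.0006 | 1.20 | 0.00021 | **1.4E-5** | 1.7E-5 | 2.68E-05 | 1 |
| 8 | 16 718 969 | rs587738 | 1.10 | 0.00015 | 1.02 | 0.616 | **0.0008** | 0.0009 | 0.00117 | 1 |
| 8 | 89 647 688 | 8-89647688 | 1.63 | 1.9E-5 | 1.50 | 0.078 | **1.2E-5** | 1.4E-5 | 2.26E-05 | 1 |
| 12 | 40 582 993 | rs2263418 | 1.24 | 1.5E-8 | 0.93 | 0.354 | 1.4E-6 | 5.6E-7 | 9.45E-08 | **1.5E-8** |
| 12 | 123 110 365 | rs6489158 | 0.91 | 0.00018 | 0.93 | 0.119 | **0.0001** | 0.0002 | 0.00023 | 1 |
| 16 | 31 103 796 | rs2359612 | 1.12 | 3.3E-6 | 1.08 | 0.073 | **2.8E-6** | 3.4E-6 | 5.55E-06 | 1 |
| 17 | 43 804 317 | rs9897399 | 0.75 | 1.5E-19 | 0.92 | 0.107 | 1.4E-16 | 4.6E-17 | 8.19E-18 | **8.3E-19** |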

We compared the results of the Lin-Sullivan method (LS), Decoupling-RE2 (DR2) and RE2C. RE2C* refers to an RE2C implementation with optimization for correlated statistics but without the technique for focusing on heterogeneous effects. The most significant P-value among all methods is in bold-face.

We also performed additional real data analyses where statistics were uncorrelated, to demonstrate the performance of RE2C for combining independent datasets. The results are shown in Supplementary Materials (Supplementary Table S1).

3.3 Efficiency

We evaluated the efficiency of the methods (Table 3). To this end, we measured the running time of methods for the meta-analysis of differing numbers of studies (from 2 to 100). We timed how long it took to analyze 1 000 000 SNPs. We used the software R to run FE and RE2C, and Java to run RE2. RE2C was highly efficient. The estimated time to analyze a million SNPs in a meta-analysis combining 100 studies was 0.07 hours for RE2 and 0.44 hours for RE2C. Our results imply that RE2C is suitable for future large-scale meta-analyses, where the number of datasets to be combined is expected to grow.
Table 3

Efficiency of RE2C

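| Method | 2 studies | 10 studies | 25 studies | 100 studies |
|---|---|---|---|---|
| FE (R) | 25 s | 52 s | 93 s | 297 s |
| RE2 (Java) | 36 s | 51 s | 85 s | 260 s |
| RE2C (R) | 23 s | 51 s | 118 s | 1615 s (0.44 h) |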

4 Conclusion

We proposed RE2C, a new random effects meta-analysis method with improved power to detect effects that are heterogeneous between studies. We optimized the statistic for meta-analyzing correlated statistics and modified it to focus only on heterogeneous effects. We expect that our method will be applied to a wide range of study designs in the future, such as cross-disease or cross-population studies, to help identify new associations.

Funding

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIP) [grant number 2016R1C1B2013126]. Conflict of Interest: none declared.