
Compare and Contrast Meta Analysis (CCMA): A Method for Identification of Pleiotropic Loci in Genome-Wide Association Studies.

Hansjörg Baurecht¹, Melanie Hotze¹, Elke Rodríguez¹, Judith Manz², Stephan Weidinger¹, Heather J Cordell³, Thomas Augustin⁴, Konstantin Strauch⁵,⁶.

Abstract

In recent years, genome-wide association studies (GWAS) have identified many loci that are shared among common disorders, and this has raised interest in pleiotropy. Several methods have been proposed for an appropriate analysis, e.g. conducting a look-up in external sources or exploiting GWAS results with meta-analysis-based methods. We recently proposed the Compare & Contrast Meta-Analysis (CCMA) approach, for which significance thresholds were obtained by simulation. Here we present analytical formulae for the density and cumulative distribution function of the CCMA test statistic under the null hypothesis of no pleiotropy and no association; conveniently for practical purposes, the (squared) statistic turns out to be exponentially distributed. This allows researchers to apply the CCMA method without having to rely on simulations. Finally, we show that CCMA has power comparable to that of the frequently used Subset-Based Meta-Analysis approach for detecting disease-specific, agonistic and antagonistic loci, while better controlling the type I error rate.


Year: 2016 PMID: 27149374 PMCID: PMC4858294 DOI: 10.1371/journal.pone.0154872

Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240

Introduction

Genome-wide association studies (GWAS) have identified many loci that are shared among common disorders. [1] Interest in pleiotropy, “the multi-functionality of a gene in phenotype presentation”, [2] has increased in recent years. Consortia of related diseases have designed customized arrays (e.g. the Immunochip array for immune-mediated disorders) to fine-map established GWAS loci at high resolution and to identify single nucleotide variants (SNVs) shared among different traits. For an appropriate analysis, several methods [1, 2] have been proposed that use external sources such as the GWAS catalog. [3] Others exploit GWAS results using meta-analysis-based methods. [4, 5] We recently proposed the Compare & Contrast Meta-Analysis (CCMA) approach [6] and determined by simulation suitable thresholds for the CCMA test statistic that correspond to the standard suggestive (P < 10⁻⁵) and genome-wide significance (P < 10⁻⁸) levels. In this work, we present an analytical cumulative distribution function for the CCMA test statistic, which agrees well with the thresholds derived by simulation.

Materials and Methods

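As we previously described [6], the CCMA uses z-scores $Z_1$ and $Z_2$ from GWAS of two different traits, T1 and T2, which are asymptotically normally distributed and signed according to the direction of effect of a given reference allele. Furthermore, two meta-analysis z-scores are defined, assuming an agonistic or an antagonistic action of the variant on the two traits [6]:

$$Z_{12,\mathrm{agonistic}} = \frac{Z_1 + Z_2}{\sqrt{2}}, \qquad Z_{12,\mathrm{antagonistic}} = \frac{Z_1 - Z_2}{\sqrt{2}}.$$

The CCMA test statistic is then constructed as

$$Z_{\max} = \max\left(|Z_1|,\ |Z_2|,\ |Z_{12,\mathrm{agonistic}}|,\ |Z_{12,\mathrm{antagonistic}}|\right).$$

To derive a P-value for an observed realization $t$, the null distribution was determined empirically by simulating R = 1,000,000,000 replicates of two independent standard normal random variables $Z_1$ and $Z_2$. Then $Z_{12,\mathrm{agonistic}}$, $Z_{12,\mathrm{antagonistic}}$ and $Z_{\max}$ were calculated for each replicate. The empirical P-values can be derived as

$$\hat{P}(t) = \frac{1}{R} \sum_{r=1}^{R} \mathbf{1}\left\{ Z_{\max,r} \ge t \right\}.$$

To find an analytic formulation of the P-value distribution, we consider the squared values of the test statistics under the null hypothesis (H0) of no pleiotropy and no association between the SNV and any trait. By design, each of the four squared variables $Z_1^2$, $Z_2^2$, $Z_{12,\mathrm{agonistic}}^2$ and $Z_{12,\mathrm{antagonistic}}^2$ follows a $\chi^2$ distribution with one degree of freedom under H0 (see S1 Appendix). Thus, the transformed CCMA test statistic can be expressed as

$$Z_{\max}^2 = \max\left(Z_1^2,\ Z_2^2,\ Z_{12,\mathrm{agonistic}}^2,\ Z_{12,\mathrm{antagonistic}}^2\right),$$

and empirical P-values can be calculated for an observed realization $t^2$ by

$$\hat{P}(t^2) = \frac{1}{R} \sum_{r=1}^{R} \mathbf{1}\left\{ Z_{\max,r}^2 \ge t^2 \right\}.$$

Plotting $-\log_{10}(P)$ against $Z_{\max}^2$ suggests that the relationship can be expressed by a straight line (Fig 1).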
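
The simulation just described can be sketched in a few lines of Python. This is a minimal, scaled-down illustration (far fewer than the 10⁹ replicates used here), assuming the unweighted definitions of the agonistic and antagonistic z-scores given above; the function names `ccma_statistic`, `simulate_null` and `empirical_p` are hypothetical and not part of any published CCMA software.

```python
import math
import random


def ccma_statistic(z1: float, z2: float) -> float:
    """CCMA statistic Zmax for one SNV, given signed z-scores of traits T1 and T2."""
    z_ag = (z1 + z2) / math.sqrt(2.0)     # agonistic meta-analysis z-score
    z_antag = (z1 - z2) / math.sqrt(2.0)  # antagonistic meta-analysis z-score
    return max(abs(z1), abs(z2), abs(z_ag), abs(z_antag))


def simulate_null(r: int, seed: int = 1) -> list:
    """Draw r replicates of Zmax under H0, where Z1 and Z2 are iid N(0, 1)."""
    rng = random.Random(seed)
    return [ccma_statistic(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))
            for _ in range(r)]


def empirical_p(t: float, null_draws: list) -> float:
    """Empirical P-value: fraction of null replicates with Zmax >= t."""
    return sum(1 for z in null_draws if z >= t) / len(null_draws)


if __name__ == "__main__":
    draws = simulate_null(100_000)
    for t in (2.0, 3.0, 4.0):
        print(t, empirical_p(t, draws))
```

With only 10⁵ replicates the empirical P-values become unreliable well before the 10⁻⁵ level, which is why very large R (or the analytic result derived below) is needed in practice.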

Fig 1

Five empirical evaluations of the −log10(P)-distribution of the $Z_{\max}^2$ statistic, each obtained by simulating 2 × 10⁹ replicates.

The theoretical distribution was obtained by fitting a straight line. The grey shaded area reflects the 95% Clopper-Pearson confidence interval [7].

A general formula for the distribution and density function of the maximum of independent, identically distributed (iid) variables is given in Chapter 2.11 of Ewens & Grant [8]. Let $X_1, X_2, \ldots, X_n$ be continuous iid variables with common distribution function $F_X$, and let $X_{\max} = \max(X_1, X_2, \ldots, X_n)$ be their maximum. Then the cumulative distribution function of $X_{\max}$ can be written as

$$F_{X_{\max}}(x) = P(X_1 \le x, \ldots, X_n \le x) = F_X(x)^n. \tag{5}$$

Formula (5) cannot be applied directly to our situation, since the four squared variables are not mutually independent. However, we can divide them into two blocks, $\{Z_1^2, Z_2^2\}$ and $\{Z_{12,\mathrm{agonistic}}^2, Z_{12,\mathrm{antagonistic}}^2\}$, each consisting of two iid $\chi^2_1$-distributed variables. Let $F_{\chi^2_1}$ be the distribution function of each variable and let $F_{\mathrm{block}}$ denote the distribution function of $\max(Z_1^2, Z_2^2)$ or of $\max(Z_{12,\mathrm{agonistic}}^2, Z_{12,\mathrm{antagonistic}}^2)$; then

$$F_{\mathrm{block}}(z) = F_{\chi^2_1}(z)^2.$$

Furthermore, it is well known that the sum of two iid $\chi^2_1$-distributed variables is $\chi^2_2$-distributed, with cumulative distribution function $F_{\chi^2_2}(z) = 1 - e^{-z/2}$. Since only two independent random variables, $Z_1$ and $Z_2$, underlie all four statistics, we may postulate the following bounds for the distribution function $F_{Z_{\max}^2}$ of $Z_{\max}^2$:

$$F_{\chi^2_2}(z) \le F_{Z_{\max}^2}(z) \le F_{\chi^2_1}(z)^2. \tag{7}$$

To prove that $F_A(z) \ge F_B(z)$ for two test statistics $Z_A$ and $Z_B$, it suffices to show that $Z_A \le Z_B$ in every scenario, i.e., for every pair of values of $Z_1$ and $Z_2$. It can be seen that $\max(Z_1^2, Z_2^2) \le Z_{\max}^2$, and thus $F_{Z_{\max}^2}(z) \le F_{\chi^2_1}(z)^2$; the same argument applies to the second block, since $\max(Z_{12,\mathrm{agonistic}}^2, Z_{12,\mathrm{antagonistic}}^2) \le Z_{\max}^2$. Finally, we prove that $F_{\chi^2_2}(z) \le F_{Z_{\max}^2}(z)$ by showing that $Z_{\max}^2 \le Z_1^2 + Z_2^2$. Since obviously $Z_1^2 \le Z_1^2 + Z_2^2$ and $Z_2^2 \le Z_1^2 + Z_2^2$, it remains to be shown that $Z_{12,\mathrm{agonistic}}^2 \le Z_1^2 + Z_2^2$ and $Z_{12,\mathrm{antagonistic}}^2 \le Z_1^2 + Z_2^2$ (see S2 Appendix). This concludes the proof of Eq (7). Therefore, Formula (7) establishes explicit bounds for $F_{Z_{\max}^2}$, which are visualized in Fig 2.

Fig 2

Comparison of $F_{\chi^2_2}$, $F_{Z_{\max}^2}$ and $F_{\chi^2_1}^2$.

Importantly, $Z_{\max}^2$ is exponentially distributed. To derive this, note that an exponential distribution with scale parameter $\beta$ has cumulative distribution function $F(z) = 1 - e^{-z/\beta}$, so that $F(z)$ is connected to $z$ by the log-linear relation $-\ln(1 - F(z)) = z/\beta$. Given that the relationship between $-\log_{10}(P)$ and $Z_{\max}^2$ under H0 is a straight line with slope $b$ (Fig 1), i.e. $-\log_{10}(P) = b\,z$, the cumulative distribution function of $Z_{\max}^2$ is

$$F_{Z_{\max}^2}(z) = 1 - 10^{-bz}. \tag{10}$$

Using the relationship $10^{x} = e^{\ln(10) \cdot x}$, we can write $F_{Z_{\max}^2}$ as an exponential distribution

$$F_{Z_{\max}^2}(z) = 1 - e^{-\ln(10)\, b\, z}, \tag{11}$$

with scale parameter $\beta = 1/(b \ln 10)$. In conclusion, the empirically derived linear relation between the log10-transformed P-value and the test statistic implies that $Z_{\max}^2$ is exponentially distributed. To determine the theoretical distribution, we searched for the optimal slope parameter $b$. To this end, we ran two simulation settings: 100 empirical distributions, each with R = 1,000,000,000 replicates, and 5 empirical distributions, each with R = 2,000,000,000 replicates. We estimated the slope parameter by linear regression and obtained a consistent estimate of b ≈ 0.228 (Table 1).
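
The two steps above, estimating $b$ from the linear relation and re-expressing Eq (10) in the exponential form of Eq (11), can be sketched as follows. The regression is forced through the origin because $P(Z_{\max}^2 \ge 0) = 1$; this is an assumption about how such a fit could be done, not necessarily the exact procedure behind Table 1. With b ≈ 0.228, the implied rate is $b \ln 10 \approx 0.525$, i.e. a scale parameter of about 1.90. Function names are hypothetical.

```python
import math


def fit_slope(z_sq: list, p_values: list) -> float:
    """Least-squares slope b of -log10(P) = b * z (regression through the origin).

    z_sq are values of the squared statistic; p_values are the matching
    (empirical) upper-tail P-values, which must be strictly positive.
    """
    y = [-math.log10(p) for p in p_values]
    sxy = sum(z * yi for z, yi in zip(z_sq, y))
    sxx = sum(z * z for z in z_sq)
    return sxy / sxx


def cdf_eq10(z: float, b: float) -> float:
    """Eq (10): F(z) = 1 - 10**(-b * z) for z >= 0."""
    return 1.0 - 10.0 ** (-b * z)


def cdf_eq11(z: float, b: float) -> float:
    """Eq (11): the same CDF written as an exponential with rate b * ln(10)."""
    rate = b * math.log(10.0)
    return 1.0 - math.exp(-rate * z)
```

In practice, `z_sq` would be a grid of thresholds and `p_values` the empirical P-values from a null simulation; grid points whose empirical P-value is zero must be dropped before taking logarithms.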

Table 1

Distribution of the slope parameter b of simulated distributions by different simulation settings.

sim. = simulations, repl. = replicates.

Setting	Min	Q1	Median	Q3	Max	Mean	Std Dev
100 sim. with 1 × 10⁹ repl.	0.22786	0.22795	0.22797	0.22800	0.22809	0.22797	3.88 ⋅ 10⁻⁵
5 sim. with 2 × 10⁹ repl.	0.22796	0.22797	0.22798	0.22798	0.22799	0.22798	1.08 ⋅ 10⁻⁵

With Eqs (10) and (11), we can give a formula for the cumulative distribution function of the original (not squared) $Z_{\max}$ statistic: for $z \ge 0$,

$$F_{Z_{\max}}(z) = P(Z_{\max}^2 \le z^2) = 1 - 10^{-b z^2} = 1 - e^{-\ln(10)\, b\, z^2}. \tag{12}$$

We compared Formula (12) with the simulated values from our previous study. With b = 0.228, the theoretical thresholds for suggestive (10⁻⁵) and genome-wide (10⁻⁸) significance are Zmax = 4.68 and Zmax = 5.92, respectively (S1 Fig). These thresholds agree well with the values of 4.7 and 6 derived in our previous simulation study (see Methods section in Baurecht et al. [6]).
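
Eq (12) makes P-values and significance thresholds available in closed form. A minimal sketch, assuming b = 0.228 from Table 1 (the constant and function names are hypothetical): inverting $10^{-b z^2} = P$ gives $z = \sqrt{-\log_{10}(P)/b}$, which reproduces the thresholds quoted above.

```python
import math

B_SLOPE = 0.228  # slope estimate b from Table 1


def zmax_pvalue(z: float, b: float = B_SLOPE) -> float:
    """Upper-tail P-value of an observed Zmax under H0: 1 - F(z) = 10**(-b z^2)."""
    return 10.0 ** (-b * z * z)


def zmax_cdf(z: float, b: float = B_SLOPE) -> float:
    """Eq (12): F(z) = 1 - 10**(-b z^2) for z >= 0 (0 for negative z)."""
    return 1.0 - zmax_pvalue(z, b) if z >= 0 else 0.0


def zmax_threshold(p: float, b: float = B_SLOPE) -> float:
    """Smallest Zmax whose P-value is <= p: sqrt(-log10(p) / b)."""
    return math.sqrt(-math.log10(p) / b)


if __name__ == "__main__":
    for level in (1e-5, 1e-8):
        print(level, round(zmax_threshold(level), 2))
```

Run as a script, this prints thresholds of 4.68 and 5.92 for the two levels, matching S1 Fig.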

Results

We compared the power and type I error rate (see S3 Appendix) of the CCMA method with those of the Subset-Based Meta-Analysis [5], implemented in the R package ASSET [9], by simulation. To this end, we generated a fixed population of n = 20,000 individuals with genotypes for a single SNV in exact Hardy-Weinberg equilibrium according to the specified minor allele frequency (MAF). We then drew n = 8,000 individuals and simulated their phenotypes with a multinomial model using baseline risks of 0.1 and 0.05 for the two diseases (e.g. atopic dermatitis (AD) and psoriasis), mimicking their respective prevalences, following a previously described algorithm [10]. For simplicity, the controls were distributed equally between both case sets. We varied the MAF ∈ {0.1, 0.2, 0.3} and the odds ratio (OR) ∈ {1.15, 1.2, 1.3}. Power was estimated at significance levels α = 0.001 and α = 10⁻⁵ with R = 1,000 replicates to detect (a) disease-specific, (b) agonistic and (c) antagonistic effects. In this simulation-based power analysis, the CCMA method was only marginally less powerful than the ASSET method for detecting disease-specific, agonistic and antagonistic effects (S2, S3 and S4 Figs, Table 2). However, CCMA provides better control of the type I error rate (see S1 Table and S5 Fig). These results illustrate the trade-off between power and type I error control. If, for example, we use the inflated ASSET threshold of 0.01205 for CCMA (S1 Table: OR = 1.3, MAF = 0.2, α = 0.01), ASSET and CCMA exhibit almost identical power (disease-specific: PowerASSET = 0.830, PowerCCMA = 0.839; agonistic: PowerASSET = 0.976, PowerCCMA = 0.974; antagonistic: PowerASSET = 0.952, PowerCCMA = 0.955). We obtained comparable results when setting equal baseline risks for both diseases (data not shown).
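
The data-generating part of this design can be sketched as below. The exact-HWE population follows the description above; the phenotype model is a hypothetical multinomial-logit parametrization in which the baseline risks (0.1 and 0.05) apply to individuals carrying no minor allele and the ORs act per allele. It is an assumption for illustration, not the algorithm of [10], and it omits splitting the controls between case sets as well as the association tests. All names are hypothetical.

```python
import math
import random


def hwe_population(n: int, maf: float) -> list:
    """Genotypes (minor-allele counts 0/1/2) in exact Hardy-Weinberg proportions.

    Counts are rounded; the major-homozygote class absorbs any remainder.
    """
    n2 = round(n * maf * maf)
    n1 = round(n * 2.0 * maf * (1.0 - maf))
    n0 = n - n1 - n2
    return [0] * n0 + [1] * n1 + [2] * n2


def outcome_probs(g: int, log_or: tuple, base: tuple = (0.1, 0.05)) -> list:
    """Hypothetical multinomial-logit model: [P(control), P(disease 1), P(disease 2)].

    base are the disease risks at g = 0; log_or are per-allele log odds ratios
    of each disease versus controls.
    """
    p_ctrl0 = 1.0 - sum(base)
    eta = [math.log(bk / p_ctrl0) + lk * g for bk, lk in zip(base, log_or)]
    denom = 1.0 + sum(math.exp(e) for e in eta)
    return [1.0 / denom] + [math.exp(e) / denom for e in eta]


def simulate_sample(pop: list, n: int, log_or: tuple, seed: int = 1):
    """Draw n individuals without replacement and assign outcome 0/1/2."""
    rng = random.Random(seed)
    genos = rng.sample(pop, n)
    status = [rng.choices((0, 1, 2), weights=outcome_probs(g, log_or))[0]
              for g in genos]
    return genos, status


if __name__ == "__main__":
    pop = hwe_population(20_000, 0.2)
    lor = math.log(1.3)
    # Assumed effect patterns: disease-specific, agonistic, antagonistic.
    for label, effect in (("specific", (lor, 0.0)),
                          ("agonistic", (lor, lor)),
                          ("antagonistic", (lor, -lor))):
        genos, status = simulate_sample(pop, 8_000, effect)
        print(label, [status.count(k) for k in (0, 1, 2)])
```

Encoding the antagonistic pattern as opposite-signed log ORs is likewise an assumption made for this sketch.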

Table 2

Power comparison of the CCMA and Subset-Based Meta-Analysis (ASSET) for detection of true associations at significance levels of α = 0.001 and α = 10⁻⁵.

For each power estimate, we ran R = 1,000 simulations with n = 8,000 individuals for various MAF and OR values and assigned the disease status by a multinomial model.

MAF	OR	Disease-specific: ASSET	Disease-specific: CCMA	Agonistic: ASSET	Agonistic: CCMA	Antagonistic: ASSET	Antagonistic: CCMA
α = 0.001
0.1	1.15	0.0320	0.0270	0.0600	0.0520	0.0430	0.0360
	1.2	0.0900	0.0860	0.1620	0.1400	0.1140	0.1060
	1.3	0.2760	0.2660	0.5780	0.5420	0.4470	0.4330
0.2	1.15	0.0780	0.0690	0.1820	0.1700	0.1340	0.1300
	1.2	0.1760	0.1730	0.4430	0.4160	0.3450	0.3270
	1.3	0.6200	0.6070	0.9050	0.8920	0.8320	0.8200
0.3	1.15	0.1100	0.1090	0.2460	0.2240	0.2130	0.2000
	1.2	0.2950	0.2830	0.6130	0.5830	0.5330	0.5060
	1.3	0.8170	0.8150	0.9760	0.9670	0.9430	0.9360
α = 10⁻⁵
0.1	1.15	0.0010	0.0010	0.0030	0.0020	0.0010	0.0020
	1.2	0.0080	0.0100	0.0220	0.0220	0.0140	0.0110
	1.3	0.0540	0.0540	0.1980	0.1880	0.0940	0.0910
0.2	1.15	0.0080	0.0090	0.0190	0.0190	0.0070	0.0070
	1.2	0.0240	0.0260	0.1010	0.0900	0.0630	0.0580
	1.3	0.2320	0.2280	0.5800	0.5540	0.4490	0.4210
0.3	1.15	0.0130	0.0100	0.0300	0.0260	0.0230	0.0240
	1.2	0.0560	0.0540	0.2090	0.1940	0.1380	0.1290
	1.3	0.4160	0.4190	0.8000	0.7830	0.6960	0.6790
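
Each entry in Table 2 is a proportion of R = 1,000 replicates and therefore carries binomial Monte Carlo error. The sketch below uses the normal-approximation (Wald) standard error, a simpler choice than the Clopper-Pearson interval used for Fig 1; function names are hypothetical. For a power near 0.5 with R = 1,000 the standard error is about 0.016, and for a type I error rate near 0.01 with R = 100,000 (the setting of S1 Table) it is about 0.0003.

```python
import math


def rejection_rate(p_values: list, alpha: float) -> float:
    """Fraction of replicates whose P-value falls below alpha (power or type I error)."""
    return sum(1 for p in p_values if p < alpha) / len(p_values)


def mc_standard_error(rate: float, r: int) -> float:
    """Normal-approximation Monte Carlo standard error of a proportion from r replicates."""
    return math.sqrt(rate * (1.0 - rate) / r)


def wald_interval(rate: float, r: int, z: float = 1.96) -> tuple:
    """Approximate 95% interval rate +/- z * SE, clipped to [0, 1]."""
    se = mc_standard_error(rate, r)
    return max(0.0, rate - z * se), min(1.0, rate + z * se)
```

For instance, `wald_interval(0.830, 1000)` gives roughly (0.807, 0.853).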

A minor modification of the CCMA test statistic, using weights w1 and w2, allows study size to be taken into account (see S4 Appendix); depending on the specification of the transformation matrix, this improves power for detecting either agonistic or antagonistic effects (S2 Table). If the controls are distributed in proportion to the case sets, which is a reasonable scenario in practice, the power of both methods increases in most settings. Of note, for disease-specific and antagonistic effects at α = 10⁻⁵, the power of CCMA and of its weighted version is in most cases higher than that of ASSET (S3 Table).
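
The weighted statistic itself is defined in S4 Appendix, which is not reproduced here. As a hedged illustration only, the sketch below assumes the common sample-size weighting of z-scores, $(w_1 Z_1 \pm w_2 Z_2)/\sqrt{w_1^2 + w_2^2}$ with, e.g., $w_i = \sqrt{n_i}$; this may differ from the wCCMA transformation matrix. It also shows why weighting needs care: under H0 the weighted agonistic and antagonistic scores have correlation $(w_1^2 - w_2^2)/(w_1^2 + w_2^2)$, so the independence used for the second block in Eq (7) holds only for equal weights. All names are hypothetical.

```python
import math


def weighted_meta_z(z1: float, z2: float, w1: float, w2: float,
                    antagonistic: bool = False) -> float:
    """Weighted combination (w1*Z1 +/- w2*Z2) / sqrt(w1^2 + w2^2); unit variance under H0."""
    sign = -1.0 if antagonistic else 1.0
    return (w1 * z1 + sign * w2 * z2) / math.sqrt(w1 ** 2 + w2 ** 2)


def weighted_ccma(z1: float, z2: float, w1: float, w2: float) -> float:
    """Hypothetical weighted analogue of Zmax using the combination above."""
    return max(abs(z1), abs(z2),
               abs(weighted_meta_z(z1, z2, w1, w2)),
               abs(weighted_meta_z(z1, z2, w1, w2, antagonistic=True)))


def null_correlation(w1: float, w2: float) -> float:
    """Correlation of the weighted agonistic and antagonistic scores under H0."""
    return (w1 ** 2 - w2 ** 2) / (w1 ** 2 + w2 ** 2)
```

With equal weights, `weighted_ccma` reduces to the unweighted Zmax; with, e.g., 6,000 versus 2,000 individuals and $w_i = \sqrt{n_i}$, `null_correlation` gives 0.5.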

Discussion

We have previously shown that the CCMA method is an appealing approach to screen for shared and disease-specific loci and to leverage additional cross-phenotype association information from available GWAS data [6]. We have now determined the null distribution of the CCMA test statistic, which corresponds to an exponential distribution, and we show that CCMA has power comparable to that of the frequently used Subset-Based Meta-Analysis [5] (ASSET) approach for detecting disease-specific, agonistic and antagonistic loci, while better controlling the type I error rate. The CCMA statistic is straightforward to calculate and allows the mode of pleiotropy to be inferred directly by identifying which of the four constituent statistics Z1, Z2, Z12,agonistic or Z12,antagonistic yields the maximum. Finally, the CCMA method can also be applied to other genome-wide molecular data (e.g. gene expression, epigenomics, metabolomics) and to other research questions, such as those encountered in environmental epidemiology. There, the influence of environmental exposures or lifestyle factors on two different traits of interest can be analyzed with regard to concordant or contrasting effects. In subgroup meta-analysis, similar questions are addressed by, e.g., comparing group A vs. group B with a Z-test [11]. This Z-test can only contrast two effects; it can neither consider disease-specific, agonistic and antagonistic effects simultaneously nor distinguish between them. A canonical method for such questions would be a multinomial regression model followed by Wald tests of effect contrasts [12]. Although the multinomial regression model can incorporate covariates, it is not applicable if only summary statistics are available, and it requires far more computing time when applied genome-wide.
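
Reading off the mode of pleiotropy amounts to an argmax over the four constituent statistics. A minimal sketch, assuming the unweighted definitions from Materials and Methods; the labels and the function name are hypothetical. In practice the label would presumably only be interpreted for SNVs whose Zmax passes the chosen significance threshold.

```python
import math


def pleiotropy_mode(z1: float, z2: float) -> tuple:
    """Return (label, value) of the constituent statistic that attains Zmax."""
    candidates = {
        "disease-specific (trait 1)": abs(z1),
        "disease-specific (trait 2)": abs(z2),
        "agonistic": abs(z1 + z2) / math.sqrt(2.0),
        "antagonistic": abs(z1 - z2) / math.sqrt(2.0),
    }
    # Ties are broken by the insertion order above.
    label = max(candidates, key=candidates.get)
    return label, candidates[label]


if __name__ == "__main__":
    for z1, z2 in ((5.0, 0.2), (3.0, 3.2), (3.0, -3.0)):
        print(z1, z2, pleiotropy_mode(z1, z2))
```

For (3.0, 3.2) the agonistic score (about 4.38) exceeds both single-trait scores, whereas for (3.0, −3.0) the antagonistic score (about 4.24) attains the maximum.
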
In conclusion, the proposed CCMA method has some attractive properties for investigating the effect of exposure variables on two different traits. The test statistic is simple to construct, and its square follows an exponential distribution under the null hypothesis, which allows a fast and easy implementation; in addition, the mode of pleiotropy can be read off directly. The method can be conveniently applied to similar questions in other domains and can also exploit summary statistics from two individual studies.

Supporting information

S1 Fig. Empirical and theoretical −log10(P)-distribution of Zmax with parameter b = 0.228.

Dotted and solid grey lines indicate the thresholds of suggestive (Zmax = 4.68) and genome-wide significance (Zmax = 5.92). (TIF)

S2 Fig. Simulation-based power comparison of CCMA and Subset-Based Meta-Analysis (ASSET) for detecting a disease-specific effect.

For each power estimate, we ran R = 1,000 simulations with n = 8,000 individuals for various MAF and OR values and assigned the disease status with a multinomial model. Significance thresholds of α = 0.001 and α = 10⁻⁵ were applied. (PDF)

S3 Fig. Simulation-based power comparison of CCMA and Subset-Based Meta-Analysis (ASSET) for detecting an agonistic effect.

S4 Fig. Simulation-based power comparison of CCMA and Subset-Based Meta-Analysis (ASSET) for detecting an antagonistic effect.

S5 Fig. Simulation-based type I error comparison of CCMA, wCCMA and the Subset-Based Meta-Analysis (ASSET) under H0.

We ran R = 100,000 simulations with n = 8,000 individuals for various MAF values under H0. Several significance thresholds were considered for comparison: α ∈ {0.001, 0.005, 0.01, 0.05}. (PDF)

S1 Table. Type I error comparison of CCMA, wCCMA and the Subset-Based Meta-Analysis (ASSET) under H0.

We ran R = 100,000 simulations with n = 8,000 individuals for various MAF values under H0. Several significance thresholds were considered for comparison: α ∈ {0.001, 0.005, 0.01, 0.05}. (PDF)

S2 Table. Power comparison of the CCMA, wCCMA and Subset-Based Meta-Analysis (ASSET) for detection of true associations at significance levels of α = 0.001 and α = 10⁻⁵.

For each power estimate, we ran R = 1,000 simulations with n = 8,000 individuals for various MAF and OR values, assigned the disease status with a multinomial model and distributed the controls equally between both case sets. (PDF)

S3 Table.

For each power estimate, we ran R = 1,000 simulations with n = 8,000 individuals for various MAF and OR values, assigned the disease status with a multinomial model and distributed the controls in proportion to the case sets. (PDF)

S1 Appendix. Proof of independence between Z12,agonistic and Z12,antagonistic.

(PDF)

S2 Appendix. Proof that $Z_{12,\mathrm{agonistic}}^2 \le Z_1^2 + Z_2^2$ and $Z_{12,\mathrm{antagonistic}}^2 \le Z_1^2 + Z_2^2$.

(PDF)

S3 Appendix. Comparison of the type I error.

(PDF)

S4 Appendix. Weighted CCMA test statistic (wCCMA).

(PDF)

References

A subset-based approach improves power and interpretation for the combined analysis of genetic association studies of heterogeneous traits.

Authors: Samsiddhi Bhattacharjee; Preetha Rajaraman; Kevin B Jacobs; William A Wheeler; Beatrice S Melin; Patricia Hartge; Meredith Yeager; Charles C Chung; Stephen J Chanock; Nilanjan Chatterjee
Journal: Am J Hum Genet Date: 2012-05-04

Combined analysis of genome-wide association studies for Crohn disease and psoriasis identifies seven shared susceptibility loci.

Authors: David Ellinghaus; Eva Ellinghaus; Rajan P Nair; Philip E Stuart; Tõnu Esko; Andres Metspalu; Sophie Debrus; John V Raelson; Trilokraj Tejasvi; Majid Belouchi; Sarah L West; Jonathan N Barker; Sulev Kõks; Külli Kingo; Tobias Balschun; Orazio Palmieri; Vito Annese; Christian Gieger; H Erich Wichmann; Michael Kabesch; Richard C Trembath; Christopher G Mathew; Gonçalo R Abecasis; Stephan Weidinger; Susanna Nikolaus; Stefan Schreiber; James T Elder; Michael Weichenthal; Michael Nothnagel; Andre Franke
Journal: Am J Hum Genet Date: 2012-04-06

Abundant pleiotropy in human complex diseases and traits.

Authors: Shanya Sivakumaran; Felix Agakov; Evropi Theodoratou; James G Prendergast; Lina Zgaga; Teri Manolio; Igor Rudan; Paul McKeigue; James F Wilson; Harry Campbell
Journal: Am J Hum Genet Date: 2011-11-11

The NHGRI GWAS Catalog, a curated resource of SNP-trait associations.

Authors: Danielle Welter; Jacqueline MacArthur; Joannella Morales; Tony Burdett; Peggy Hall; Heather Junkins; Alan Klemm; Paul Flicek; Teri Manolio; Lucia Hindorff; Helen Parkinson
Journal: Nucleic Acids Res Date: 2013-12-06

Genome-wide comparative analysis of atopic dermatitis and psoriasis gives insight into opposing genetic mechanisms.

Authors: Hansjörg Baurecht; Melanie Hotze; Stephan Brand; Carsten Büning; Paul Cormican; Aiden Corvin; David Ellinghaus; Eva Ellinghaus; Jorge Esparza-Gordillo; Regina Fölster-Holst; Andre Franke; Christian Gieger; Norbert Hubner; Thomas Illig; Alan D Irvine; Michael Kabesch; Young A E Lee; Wolfgang Lieb; Ingo Marenholz; W H Irwin McLean; Derek W Morris; Ulrich Mrowietz; Rajan Nair; Markus M Nöthen; Natalija Novak; Grainne M O'Regan; Stefan Schreiber; Catherine Smith; Konstantin Strauch; Philip E Stuart; Richard Trembath; Lam C Tsoi; Michael Weichenthal; Jonathan Barker; James T Elder; Stephan Weidinger; Heather J Cordell; Sara J Brown
Journal: Am J Hum Genet Date: 2015-01-08

Network-based SNP meta-analysis identifies joint and disjoint genetic features across common human diseases.

Authors: Matthias Arnold; Mara L Hartsperger; Hansjörg Baurecht; Elke Rodríguez; Benedikt Wachinger; Andre Franke; Michael Kabesch; Juliane Winkelmann; Arne Pfeufer; Marcel Romanos; Thomas Illig; Hans-Werner Mewes; Volker Stümpflen; Stephan Weidinger
Journal: BMC Genomics Date: 2012-09-18
