JEPEGMIX2: improved gene-level joint analysis of eQTLs in cosmopolitan cohorts.

Chris Chatzinakos¹, Donghyung Lee², Bradley T Webb¹, Vladimir I Vladimirov¹, Kenneth S Kendler¹, Silviu-Alin Bacanu¹.

Abstract

Motivation: To increase detection power, researchers use gene-level analysis methods to aggregate weak marker signals. Because gene expression controls biological processes, researchers have proposed aggregating signals for expression quantitative trait loci (eQTLs). Most gene-level eQTL methods make statistical inferences based on (i) summary statistics from genome-wide association studies (GWAS) and (ii) linkage disequilibrium patterns from a relevant reference panel. While most such tools assume homogeneous cohorts, our Gene-level Joint Analysis of functional SNPs in Cosmopolitan Cohorts (JEPEGMIX) method accommodates cosmopolitan cohorts by using heterogeneous panels. However, JEPEGMIX relies on brain eQTLs from older gene expression studies and does not adjust for background enrichment in GWAS signals.
Results: We propose JEPEGMIX2, an extension of JEPEGMIX. Compared with JEPEGMIX, it uses (i) cis-eQTL SNPs from the latest expression studies and (ii) brain-specific (sub)tissues as well as tissues other than brain. JEPEGMIX2 also (i) avoids accumulating averagely enriched polygenic information by adjusting for background enrichment and (ii) outputs gene q-values based on Holm adjustment of P-values, which avoids an increase in false positive rates for studies with numerous highly enriched (above the background) genes.
Availability and implementation: https://github.com/Chatzinakos/JEPEGMIX2.
Supplementary information: Supplementary data are available at Bioinformatics online.

Year: 2018 PMID: 28968763 PMCID: PMC5860197 DOI: 10.1093/bioinformatics/btx509

Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937

1 Introduction

Gene expression is believed to have influenced human evolution and to play a key role in disease (Emilsson et al.). Understanding it is therefore critical for understanding diseases and developing treatments. Its importance was further underlined by the enrichment of association signals in SNPs that tag gene expression (Nica and Dermitzakis, 2008; Nicolae et al.), which are denoted as expression quantitative trait loci (eQTLs). Currently, complex disease susceptibility loci are identified via genome-wide association studies (GWAS), which scan single nucleotide polymorphisms (SNPs) across the entire genome for genetic variants associated with a trait. Univariate analysis of GWAS is still the de facto tool for identifying trait-associated SNPs (Wellcome Trust Case Control Consortium, 2007). However, for complex traits influenced by SNPs with weak or moderate effect sizes, the significant findings account for only a small fraction of the total trait variation (Manolio et al.). Because of their small effect sizes, these SNPs are rarely detected in GWAS (Yang et al.). To increase detection power, researchers proposed analyzing genetic variants multivariately (Wang et al.). One type of multivariate analysis is the transcriptome-wide association study (TWAS), which identifies significant expression-trait associations. Such methods, e.g. joint effect on phenotype of eQTL/functional SNPs associated with a gene (JEPEG) (Lee et al.), PrediXcan (Gamazon et al.), JEPEGMIX (Lee et al.) and TWAS (Gusev et al.), use eQTLs to predict gene expression and/or infer which genes are associated with traits. However, unlike competing non-eQTL paradigms, e.g. LDscore/LDpred (Bulik-Sullivan et al.), current TWAS methods (i) lack competitive adjustment for background enrichment (‘average signal’) and (ii) do not output q-values that control false positive rates when a substantial number of genes are enriched (above background) in signals.
To address these shortcomings, we propose JEPEGMIX2, an extension of JEPEGMIX. In addition to the existing advantages of imputing eQTL statistics and inferring gene-trait association in cosmopolitan cohorts, it (i) adjusts for background enrichment, (ii) offers the option to upweight rarer eQTLs and (iii) outputs Holm q-values, which avoids an increase in the false positive rate under high signal enrichment.

2 Materials and methods

To avoid a mere accumulation of just averagely enriched polygenic information, we competitively adjust statistics for background enrichment. This is achieved by adjusting the statistic for average noncentrality. We denote such a ‘centralized’ JEPEGMIX statistic as competitive (C) and the original statistic as non-competitive (NC). Let $Z$ be the vector of Z-scores for measured SNPs in the genome scan. Due to polygenicity, the genome scan $\chi^2$ statistics $Z_i^2$, each with 1 degree of freedom (df), have a non-zero background noncentrality parameter $\lambda$, i.e. $E(Z_i^2) = 1 + \lambda$. Thus, by the method of moments, we can estimate $\hat{\lambda} = \overline{Z^2} - 1$, where $\overline{Z^2}$ is the mean of $Z_i^2$ computed using all measured SNPs in the genome scan. However, given that $\lambda \ge 0$, a better estimator is $\hat{\lambda} = \max(\overline{Z^2} - 1, 0)$. To develop a competitive test, before computing gene-level statistics, Z-scores must be shrunk towards zero by adjusting for the average background enrichment. This can be achieved via a 3-step process:
(1) Recompute, under ‘average’ noncentrality, the P-value associated with the $\chi^2$ statistics: $p = 1 - F(Z^2 \mid 1, \hat{\lambda})$, where $F(\cdot \mid 1, \hat{\lambda})$ is the cumulative distribution function (cdf) of the non-central $\chi^2$ distribution with 1 df and noncentrality parameter $\hat{\lambda}$.
(2) Transform $p$ into its quantile vector from a central $\chi^2$ distribution with 1 df, i.e. $Q = F^{-1}(1 - p \mid 1, 0)$.
(3) Transform $Q$ to a ‘central’ Z-score: $Z_c = \mathrm{sign}(Z)\sqrt{Q}$.
By the Delta method (a first-order Taylor approximation), $Z_c$, as a linear transformation (deflation) of $Z$, has the same correlation structure. Thus, $Z_c$ can be used to build the competitive gene statistics (Supplementary Text S1), which have the same variance as their non-competitive versions.

To facilitate user-specific input along with future extensions, the new annotation file now includes an R-like formula for the expression of each gene as a function of its eQTL genotypes. The annotation file includes cis-eQTLs for all tissues available in PREDICTDB (http://predictdb.hakyimlab.org/).
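The competitive (background-enrichment) adjustment described above can be sketched in a few lines of Python. This is a minimal illustration, not the JEPEGMIX2 implementation (which is written in C++): the function names are hypothetical, and it relies on the fact that a $\chi^2$ variable with 1 df and noncentrality $\lambda$ is the square of a normal variable with mean $\sqrt{\lambda}$ and unit variance, so both the non-central P-value and the central quantile reduce to normal cdf and quantile calls.

```python
from math import copysign, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, used for cdf and quantile calls


def estimate_background_noncentrality(z_scores):
    """Method-of-moments estimate: mean(Z^2) - 1, truncated at zero."""
    mean_sq = sum(z * z for z in z_scores) / len(z_scores)
    return max(mean_sq - 1.0, 0.0)


def centralize(z_scores, lam):
    """Deflate Z-scores by mapping non-central quantiles to central ones (1 df)."""
    if lam <= 0.0:
        return list(z_scores)  # no background enrichment, nothing to adjust
    mu = sqrt(lam)
    adjusted = []
    for z in z_scores:
        a = abs(z)
        # Step 1: upper-tail P-value of Z^2 under the non-central chi-square
        # with 1 df, i.e. P(|N(mu, 1)| > |z|).
        p = _STD_NORMAL.cdf(mu - a) + _STD_NORMAL.cdf(-mu - a)
        p = min(max(p, 1e-300), 1.0)  # keep p inside the quantile function's domain
        # Steps 2-3: the central chi-square (1 df) quantile is a squared normal
        # quantile, so |z_c| = -Phi^{-1}(p / 2); the sign of z is restored.
        adjusted.append(copysign(-_STD_NORMAL.inv_cdf(p / 2.0), z))
    return adjusted


if __name__ == "__main__":
    z = [2.0, -2.0, 1.0, -1.0]  # toy values, not real GWAS statistics
    lam_hat = estimate_background_noncentrality(z)
    print(lam_hat, centralize(z, lam_hat))
```

Because the non-central tail probability is never smaller than the central one for the same statistic, each adjusted Z-score has an absolute value no larger than the original, which is the deflation the text refers to.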
To avoid making inferences about genes poorly predicted by SNPs, for the available tissues we retain only genes whose expression is predicted from their eQTLs with a q-value below the significance threshold. Additionally, given the increased deleteriousness of rarer mutations, we offer the possibility to upweight the coefficients of rarer variants (see Supplementary Text S1 for the statistic computation) using a Madsen and Browning-type approach (Madsen and Browning, 2009). For linkage disequilibrium (LD) estimates in cosmopolitan cohorts (needed for both imputation and statistical inference), we allow the user to input the study cohort proportions of the ethnicities in the reference panel. LD patterns of the study cohort are estimated as a weighted mixture (with the above proportions as weights) of the LD matrices for all ethnic groups in the reference panel (Supplementary Text S2). LD patterns are subsequently used to (i) accurately impute summary statistics of unmeasured eQTLs (Supplementary Text S3) and (ii) compute the variance of the SNP linear combinations used for gene-level tests in each tissue (Supplementary Text S2). The current version uses the 1000 Genomes (1KG) Phase I release version 3 as the reference panel (Durbin et al.). It consists of Europeans, Asians, Africans and Native Americans.
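As a rough illustration of how the mixed LD matrix feeds into a gene-level test, the sketch below averages per-group LD (correlation) matrices with the cohort proportions as weights and then standardizes a linear combination of SNP Z-scores by its variance under that LD. The exact estimators are given in Supplementary Texts S1 and S2 and are not reproduced here; all function names, group labels and numbers are hypothetical, and the rare-variant weight $1/\sqrt{q(1-q)}$ is only one plausible Madsen and Browning-type choice, applied here by multiplying it into the eQTL coefficients (an assumption).

```python
from math import sqrt


def mix_ld(ld_by_group, proportions):
    """Weighted mixture of per-group LD (correlation) matrices."""
    total = sum(proportions.values())  # normalize in case weights do not sum to 1
    n = len(next(iter(ld_by_group.values())))
    mixed = [[0.0] * n for _ in range(n)]
    for group, weight in proportions.items():
        ld = ld_by_group[group]
        for i in range(n):
            for j in range(n):
                mixed[i][j] += (weight / total) * ld[i][j]
    return mixed


def rare_variant_weights(mafs):
    """Assumed Madsen-Browning-type weights: rarer alleles get larger weights."""
    return [1.0 / sqrt(q * (1.0 - q)) for q in mafs]


def gene_z(z_scores, coefs, ld):
    """Z-score of the combination sum_i c_i * Z_i, with variance c' * LD * c."""
    n = len(z_scores)
    numerator = sum(c * z for c, z in zip(coefs, z_scores))
    variance = sum(coefs[i] * ld[i][j] * coefs[j]
                   for i in range(n) for j in range(n))
    return numerator / sqrt(variance)


if __name__ == "__main__":
    # Toy two-SNP example with made-up LD values and cohort proportions.
    ld_by_group = {"EUR": [[1.0, 0.8], [0.8, 1.0]],
                   "AFR": [[1.0, 0.2], [0.2, 1.0]]}
    mixed = mix_ld(ld_by_group, {"EUR": 0.75, "AFR": 0.25})
    coefs = [1.0, 1.0]  # hypothetical eQTL coefficients for one gene
    weights = rare_variant_weights([0.30, 0.05])
    weighted_coefs = [c * w for c, w in zip(coefs, weights)]
    print(gene_z([2.0, 2.0], coefs, mixed))
    print(gene_z([2.0, 2.0], weighted_coefs, mixed))
```

Positive LD between the two SNPs inflates the variance of their sum, so the gene-level Z-score is smaller than it would be if the SNPs were treated as independent.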

3 Simulations

To estimate the false positive rates of JEPEGMIX2 under five different cosmopolitan study scenarios (Supplementary Text S4), we simulated (under the null hypothesis) 100 cosmopolitan cohorts of 10,000 subjects for Illumina 1M autosomal SNPs using 1KG haplotype patterns (Supplementary Text S4, Supplementary Table S1). Subject phenotypes were simulated independently of genotypes as a random Gaussian sample. SNP phenotype-genotype association summary statistics were computed from a correlation test. We obtained JEPEGMIX2 statistics for (i) competitive (C) and non-competitive (NC) tests and (ii) tests with rare (Madsen and Browning-like) (R) and non-rare (NR) eQTL weights. To test the ability of the methods to maintain false positive rates under background enrichment, we provide an enriched scenario. Under this scenario, we quantile-transform the simulated ‘central’ Z-scores (CZ) to ‘non-central’ Z-scores (NCZ) by following the three steps from the previous section, with the first step using a central distribution (zero noncentrality) and the second a non-central one whose noncentrality is an extrapolation of the PGC3 schizophrenia noncentrality from PGC2 (Ripke et al.). We also applied JEPEGMIX2 to 16 real summary datasets (Supplementary Text S5, Supplementary Table S2). To limit the increase in Type I error rates of JEPEGMIX2, we deem as significantly associated only genes with a Holm-adjusted P-value (q-value) below the significance threshold. Because C4 explains most of the Major Histocompatibility Complex (MHC; chr6: 25–33 Mb) signals for schizophrenia (SCZ) (McCarthy et al.), we omit non-C4 genes in this region for this trait.
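The Holm adjustment used to declare genes significant can be sketched as follows. This is the standard step-down Holm procedure written in plain Python, not code taken from JEPEGMIX2; the function name and the example P-values are hypothetical, and the cutoff `alpha` is a placeholder argument whose value here is purely illustrative.

```python
def holm_adjust(p_values):
    """Holm step-down adjusted P-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending P
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest P-value (k = rank + 1) is multiplied by m - k + 1.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity: an adjusted value never drops below an earlier one.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


def significant_genes(gene_p_values, alpha):
    """Names of genes whose Holm-adjusted P-value is below `alpha`."""
    names = list(gene_p_values)
    adjusted = holm_adjust([gene_p_values[name] for name in names])
    return [name for name, q in zip(names, adjusted) if q < alpha]


if __name__ == "__main__":
    # Made-up gene-level P-values; alpha = 0.05 is an illustrative choice only.
    toy = {"GENE_A": 0.01, "GENE_B": 0.04, "GENE_C": 0.03, "GENE_D": 0.005}
    print(holm_adjust(list(toy.values())))
    print(significant_genes(toy, alpha=0.05))
```

Holm controls the family-wise error rate, so it is more conservative than an FDR-based adjustment; that conservativeness is the property the paper relies on when many genes are enriched above background.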

Table 1. Signals for real datasets

Trait	No. of unique genes
SCZ	68
ALZ	34
AMD	17
BIP	11
HDL	79
LDL	78
T2D	5
TG	48
Smoking	5

4 Results

JEPEGMIX2 with competitive (C) statistics controls the false positive rates at or below nominal thresholds for both the central (CZ) and non-central (NCZ) scenarios, while the non-competitive (NC) version behaves similarly only in the central case, i.e. when the GWAS statistics are not enriched (Supplementary Text S5, Supplementary Figs S1–S5). Under the enriched scenario (NCZ), the non-competitive version of the test has much increased false positive rates. Using the Holm P-value adjustment and both rare (R) and non-rare (NR) eQTL weights, significant gene signals were found for 9 of the real-dataset traits, for which we present heatmaps (Supplementary Text S5, Supplementary Figs S6–S23). The number of genes with significant q-values is presented in Table 1 (for the abbreviations see Supplementary Table S2). Each analysis ran in less than 3 h on a cluster node with 4× Intel Xeon 6-core 2.67 GHz processors.

5 Conclusions

We propose JEPEGMIX2, an updated software/method for testing the association between (cis-eQTL mediated) gene expression and trait. Unlike existing methods, the competitive version of JEPEGMIX2 controls the false positive rates at or below nominal levels even for highly enriched GWAS. To the applicability of JEPEGMIX to cosmopolitan cohorts, we add a competitive version and extend the number of included (i) eQTLs and (ii) tissues. Unlike existing methods, it also accommodates upweighting of rare variants and, by using a Holm adjustment, avoids the increased rate of false positives incurred by FDR adjustment under enrichment. While gene expression measurements in different tissues are often correlated, and incomplete due to the rather small sample sizes of existing gene expression experiments, the capacity to discriminate causal tissues will be enhanced by further increases in the sample size of such studies. Being written in C++, JEPEGMIX2 is very fast. Future versions of the software will use larger reference panels.

Conflict of Interest: none declared.

References

1. Pathway-based approaches for analysis of genomewide association studies.

Authors: Kai Wang; Mingyao Li; Maja Bucan
Journal: Am J Hum Genet Date: 2007-12 Impact factor: 11.025

2. LD Score regression distinguishes confounding from polygenicity in genome-wide association studies.

Authors: Brendan K Bulik-Sullivan; Po-Ru Loh; Hilary K Finucane; Stephan Ripke; Jian Yang; Nick Patterson; Mark J Daly; Alkes L Price; Benjamin M Neale
Journal: Nat Genet Date: 2015-02-02 Impact factor: 38.330

3. Common SNPs explain a large proportion of the heritability for human height.

Authors: Jian Yang; Beben Benyamin; Brian P McEvoy; Scott Gordon; Anjali K Henders; Dale R Nyholt; Pamela A Madden; Andrew C Heath; Nicholas G Martin; Grant W Montgomery; Michael E Goddard; Peter M Visscher
Journal: Nat Genet Date: 2010-06-20 Impact factor: 38.330

4. Using gene expression to investigate the genetic basis of complex disorders.

Authors: Alexandra C Nica; Emmanouil T Dermitzakis
Journal: Hum Mol Genet Date: 2008-10-15 Impact factor: 6.150

5. Genetics of gene expression and its effect on disease.

Authors: Valur Emilsson; Gudmar Thorleifsson; Bin Zhang; Amy S Leonardson; Florian Zink; Jun Zhu; Sonia Carlson; Agnar Helgason; G Bragi Walters; Steinunn Gunnarsdottir; Magali Mouy; Valgerdur Steinthorsdottir; Gudrun H Eiriksdottir; Gyda Bjornsdottir; Inga Reynisdottir; Daniel Gudbjartsson; Anna Helgadottir; Aslaug Jonasdottir; Adalbjorg Jonasdottir; Unnur Styrkarsdottir; Solveig Gretarsdottir; Kristinn P Magnusson; Hreinn Stefansson; Ragnheidur Fossdal; Kristleifur Kristjansson; Hjortur G Gislason; Tryggvi Stefansson; Bjorn G Leifsson; Unnur Thorsteinsdottir; John R Lamb; Jeffrey R Gulcher; Marc L Reitman; Augustine Kong; Eric E Schadt; Kari Stefansson
Journal: Nature Date: 2008-03-16 Impact factor: 49.962

6. Trait-associated SNPs are more likely to be eQTLs: annotation to enhance discovery from GWAS.

Authors: Dan L Nicolae; Eric Gamazon; Wei Zhang; Shiwei Duan; M Eileen Dolan; Nancy J Cox
Journal: PLoS Genet Date: 2010-04-01 Impact factor: 5.917

7. JEPEGMIX: gene-level joint analysis of functional SNPs in cosmopolitan cohorts.

Authors: Donghyung Lee; Vernell S Williamson; T Bernard Bigdeli; Brien P Riley; Bradley T Webb; Ayman H Fanous; Kenneth S Kendler; Vladimir I Vladimirov; Silviu-Alin Bacanu
Journal: Bioinformatics Date: 2015-10-01 Impact factor: 6.937

8. A groupwise association test for rare mutations using a weighted sum statistic.

Authors: Bo Eskerod Madsen; Sharon R Browning
Journal: PLoS Genet Date: 2009-02-13 Impact factor: 5.917

9. A reference panel of 64,976 haplotypes for genotype imputation.

Authors: Shane McCarthy; Sayantan Das; Warren Kretzschmar; Olivier Delaneau; Andrew R Wood; Alexander Teumer; Hyun Min Kang; Christian Fuchsberger; Petr Danecek; Kevin Sharp; Yang Luo; Carlo Sidore; Alan Kwong; Nicholas Timpson; Seppo Koskinen; Scott Vrieze; Laura J Scott; He Zhang; Anubha Mahajan; Jan Veldink; Ulrike Peters; Carlos Pato; Cornelia M van Duijn; Christopher E Gillies; Ilaria Gandin; Massimo Mezzavilla; Arthur Gilly; Massimiliano Cocca; Michela Traglia; Andrea Angius; Jeffrey C Barrett; Dorrett Boomsma; Kari Branham; Gerome Breen; Chad M Brummett; Fabio Busonero; Harry Campbell; Andrew Chan; Sai Chen; Emily Chew; Francis S Collins; Laura J Corbin; George Davey Smith; George Dedoussis; Marcus Dorr; Aliki-Eleni Farmaki; Luigi Ferrucci; Lukas Forer; Ross M Fraser; Stacey Gabriel; Shawn Levy; Leif Groop; Tabitha Harrison; Andrew Hattersley; Oddgeir L Holmen; Kristian Hveem; Matthias Kretzler; James C Lee; Matt McGue; Thomas Meitinger; David Melzer; Josine L Min; Karen L Mohlke; John B Vincent; Matthias Nauck; Deborah Nickerson; Aarno Palotie; Michele Pato; Nicola Pirastu; Melvin McInnis; J Brent Richards; Cinzia Sala; Veikko Salomaa; David Schlessinger; Sebastian Schoenherr; P Eline Slagboom; Kerrin Small; Timothy Spector; Dwight Stambolian; Marcus Tuke; Jaakko Tuomilehto; Leonard H Van den Berg; Wouter Van Rheenen; Uwe Volker; Cisca Wijmenga; Daniela Toniolo; Eleftheria Zeggini; Paolo Gasparini; Matthew G Sampson; James F Wilson; Timothy Frayling; Paul I W de Bakker; Morris A Swertz; Steven McCarroll; Charles Kooperberg; Annelot Dekker; David Altshuler; Cristen Willer; William Iacono; Samuli Ripatti; Nicole Soranzo; Klaudia Walter; Anand Swaroop; Francesco Cucca; Carl A Anderson; Richard M Myers; Michael Boehnke; Mark I McCarthy; Richard Durbin
Journal: Nat Genet Date: 2016-08-22 Impact factor: 38.330

10. Genome-wide association analysis identifies 13 new risk loci for schizophrenia.

Authors: Stephan Ripke; Colm O'Dushlaine; Kimberly Chambert; Jennifer L Moran; Anna K Kähler; Susanne Akterin; Sarah E Bergen; Ann L Collins; James J Crowley; Menachem Fromer; Yunjung Kim; Sang Hong Lee; Patrik K E Magnusson; Nick Sanchez; Eli A Stahl; Stephanie Williams; Naomi R Wray; Kai Xia; Francesco Bettella; Anders D Borglum; Brendan K Bulik-Sullivan; Paul Cormican; Nick Craddock; Christiaan de Leeuw; Naser Durmishi; Michael Gill; Vera Golimbet; Marian L Hamshere; Peter Holmans; David M Hougaard; Kenneth S Kendler; Kuang Lin; Derek W Morris; Ole Mors; Preben B Mortensen; Benjamin M Neale; Francis A O'Neill; Michael J Owen; Milica Pejovic Milovancevic; Danielle Posthuma; John Powell; Alexander L Richards; Brien P Riley; Douglas Ruderfer; Dan Rujescu; Engilbert Sigurdsson; Teimuraz Silagadze; August B Smit; Hreinn Stefansson; Stacy Steinberg; Jaana Suvisaari; Sarah Tosato; Matthijs Verhage; James T Walters; Douglas F Levinson; Pablo V Gejman; Kenneth S Kendler; Claudine Laurent; Bryan J Mowry; Michael C O'Donovan; Michael J Owen; Ann E Pulver; Brien P Riley; Sibylle G Schwab; Dieter B Wildenauer; Frank Dudbridge; Peter Holmans; Jianxin Shi; Margot Albus; Madeline Alexander; Dominique Campion; David Cohen; Dimitris Dikeos; Jubao Duan; Peter Eichhammer; Stephanie Godard; Mark Hansen; F Bernard Lerer; Kung-Yee Liang; Wolfgang Maier; Jacques Mallet; Deborah A Nertney; Gerald Nestadt; Nadine Norton; Francis A O'Neill; George N Papadimitriou; Robert Ribble; Alan R Sanders; Jeremy M Silverman; Dermot Walsh; Nigel M Williams; Brandon Wormley; Maria J Arranz; Steven Bakker; Stephan Bender; Elvira Bramon; David Collier; Benedicto Crespo-Facorro; Jeremy Hall; Conrad Iyegbe; Assen Jablensky; Rene S Kahn; Luba Kalaydjieva; Stephen Lawrie; Cathryn M Lewis; Kuang Lin; Don H Linszen; Ignacio Mata; Andrew McIntosh; Robin M Murray; Roel A Ophoff; John Powell; Dan Rujescu; Jim Van Os; Muriel Walshe; Matthias Weisbrod; Durk Wiersma; Peter Donnelly; Ines Barroso; Jenefer M Blackwell; Elvira Bramon; Matthew A Brown; Juan P Casas; Aiden P Corvin; Panos Deloukas; Audrey Duncanson; Janusz Jankowski; Hugh S Markus; Christopher G Mathew; Colin N A Palmer; Robert Plomin; Anna Rautanen; Stephen J Sawcer; Richard C Trembath; Ananth C Viswanathan; Nicholas W Wood; Chris C A Spencer; Gavin Band; Céline Bellenguez; Colin Freeman; Garrett Hellenthal; Eleni Giannoulatou; Matti Pirinen; Richard D Pearson; Amy Strange; Zhan Su; Damjan Vukcevic; Peter Donnelly; Cordelia Langford; Sarah E Hunt; Sarah Edkins; Rhian Gwilliam; Hannah Blackburn; Suzannah J Bumpstead; Serge Dronov; Matthew Gillman; Emma Gray; Naomi Hammond; Alagurevathi Jayakumar; Owen T McCann; Jennifer Liddle; Simon C Potter; Radhi Ravindrarajah; Michelle Ricketts; Avazeh Tashakkori-Ghanbaria; Matthew J Waller; Paul Weston; Sara Widaa; Pamela Whittaker; Ines Barroso; Panos Deloukas; Christopher G Mathew; Jenefer M Blackwell; Matthew A Brown; Aiden P Corvin; Mark I McCarthy; Chris C A Spencer; Elvira Bramon; Aiden P Corvin; Michael C O'Donovan; Kari Stefansson; Edward Scolnick; Shaun Purcell; Steven A McCarroll; Pamela Sklar; Christina M Hultman; Patrick F Sullivan
Journal: Nat Genet Date: 2013-08-25 Impact factor: 38.330
