FastEpistasis: a high performance computing solution for quantitative trait epistasis.

Thierry Schüpbach, Ioannis Xenarios, Sven Bergmann, Karen Kapur.

Abstract

MOTIVATION: Genome-wide association studies have become widely used tools to study effects of genetic variants on complex diseases. While it is of great interest to extend existing analysis methods by considering interaction effects between pairs of loci, the large number of possible tests presents a significant computational challenge. The number of computations is further multiplied in the study of gene expression quantitative trait mapping, in which tests are performed for thousands of gene phenotypes simultaneously.
RESULTS: We present FastEpistasis, an efficient parallel solution extending the PLINK epistasis module, designed to test for epistasis effects when analyzing continuous phenotypes. Our results show that the algorithm scales with the number of processors and offers a reduction in computation time when several phenotypes are analyzed simultaneously. FastEpistasis is capable of testing the association of a continuous trait with all single nucleotide polymorphism (SNP) pairs from 500 000 SNPs, totaling 125 billion tests, in a population of 5000 individuals in 29, 4 or 0.5 days using 8, 64 or 512 processors, respectively.
AVAILABILITY: FastEpistasis is open source and available free of charge, for non-commercial users only, from http://www.vital-it.ch/software/FastEpistasis.

Year:  2010        PMID: 20375113      PMCID: PMC2872003          DOI: 10.1093/bioinformatics/btq147

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


1 INTRODUCTION

Genome-wide association studies (GWASs) have been instrumental in identifying genetic variants associated with complex traits such as human disease or gene expression phenotypes (Hirschhorn et al., 2005). While many GWAS results have been reported from analyses of single nucleotide polymorphisms (SNPs) one at a time, only recently have studies begun to extend analysis methods to consider interaction effects between pairs of loci (Cordell, 2009; Curtis, 2007; Emily et al., 2009; Gayan et al., 2008; Herold et al., 2009). Although interactions may yield new insight into the effect of genetics on complex traits (Manolio et al., 2009), a major challenge in studying them is the large number of possible tests that must be considered. Examining all pairwise interactions between SNP loci on a 500 000-SNP chip equates to performing about 125 billion tests. Carrying out permutation tests, or studying epistasis in the context of quantitative trait mapping of gene expression, in which genetic variants are tested for association with each of thousands of phenotypes simultaneously (Franke et al., 2009), further increases the number of epistasis tests. Efficient software is therefore needed to carry out the large number of interaction tests on quantitative responses. Although several software programs have been proposed to search for interactions in case–control data (Greene et al., 2010; Zhang et al., 2009), few have been optimized to handle continuous responses. In this article, we describe FastEpistasis, an optimized software suite designed for quantitative responses, which extends the epistasis functionality of PLINK (Purcell et al., 2007). FastEpistasis uses a parallel algorithm that is capable of computing tests for all pairs of genome-wide SNPs and efficiently handles tests given multiple phenotypes.
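The size of the search space quoted above follows from simple counting. The sketch below is only an illustration of that arithmetic; the function name `pairwise_tests` and its parameters are hypothetical, and it assumes that each phenotype and each permutation repeats the full set of unordered SNP-pair tests.

```python
import math


def pairwise_tests(n_snps: int, n_phenotypes: int = 1, n_permutations: int = 0) -> int:
    """Count epistasis tests over all unordered SNP pairs.

    Assumption: every phenotype, and every permutation of it, requires
    one test per SNP pair.
    """
    pairs = math.comb(n_snps, 2)  # n * (n - 1) / 2 unordered pairs
    return pairs * n_phenotypes * (1 + n_permutations)


# A 500 000-SNP chip and a single phenotype, no permutations:
print(pairwise_tests(500_000))  # 124999750000, i.e. about 125 billion
```

With thousands of expression phenotypes, the same count is multiplied by the number of phenotypes, which is why throughput per test matters so much.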

2 METHODS

FastEpistasis, a software tool capable of computing tests of epistasis for a large number of SNP pairs, is an efficient parallel extension to the PLINK epistasis module. It tests epistatic effects in the normal linear regression of a quantitative response on the marginal effect of each SNP and an interaction effect of the SNP pair, where SNPs are coded as additive effects taking the values 0, 1 or 2. The test for epistasis reduces to testing whether the interaction term is significantly different from zero. The FastEpistasis methods are briefly outlined here, with further details provided in the Supplementary Material. The computations are optimized by splitting the analysis tasks into three separate applications: pre-, core- and post-computation. The pre-computation phase loads PLINK binary format data files, reformats the data for faster computations and reduces the number of conditions to check in the core phase. The core computation phase is designed to be embarrassingly parallel, iterating through SNP pairs and efficiently carrying out the tests for epistasis. The computations are based on applying the QR decomposition to derive least squares estimates of the interaction coefficient and its standard error. The core computation software comes in several versions to take advantage of different high-performance architectures: a Symmetric Multiprocessing (SMP) version and a clustered Message Passing Interface (MPI) version. An optional post-computation phase is provided to aggregate results from each processor or core, include detailed SNP information, compute P-values for each test and convert the results to text files. We assessed the performance of our software using International HapMap Project genotypes (Frazer et al., 2007) and random phenotypes (see Supplementary Material for details). Unless stated otherwise, results from all SNP pair epistasis tests are output.
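The regression test described above can be illustrated with a minimal sketch. This is not the FastEpistasis code: the function name `interaction_test` is hypothetical, and modified Gram-Schmidt is used here as one simple way to obtain a QR factorization (the text does not say which QR algorithm the software uses). For the design matrix X = [1, g1, g2, g1*g2] = QR, the interaction estimate is (q4 . y) / R44 and its standard error is sigma / R44, where sigma^2 = RSS / (n - 4).

```python
import math


def interaction_test(g1, g2, y):
    """Test the interaction term of y ~ 1 + g1 + g2 + g1*g2.

    g1, g2: additive genotype codes (0, 1 or 2), one per individual.
    y: quantitative phenotype, one value per individual.
    Returns (beta, se, t) for the interaction coefficient, or None when the
    design is rank deficient or no residual variance is left.
    """
    n = len(y)
    if n <= 4:
        return None  # no residual degrees of freedom
    columns = [
        [1.0] * n,                              # intercept
        [float(a) for a in g1],                 # marginal effect of SNP 1
        [float(b) for b in g2],                 # marginal effect of SNP 2
        [float(a * b) for a, b in zip(g1, g2)], # interaction term
    ]
    q_cols, r_diag = [], []
    for col in columns:
        v = col[:]
        for q in q_cols:  # modified Gram-Schmidt: remove earlier directions
            proj = sum(qi * vi for qi, vi in zip(q, v))
            v = [vi - proj * qi for vi, qi in zip(v, q)]
        norm = math.sqrt(sum(vi * vi for vi in v))
        if norm < 1e-10:
            return None  # e.g. a monomorphic SNP makes the column redundant
        q_cols.append([vi / norm for vi in v])
        r_diag.append(norm)  # diagonal entry of R
    qty = [sum(qi * yi for qi, yi in zip(q, y)) for q in q_cols]
    rss = sum(yi * yi for yi in y) - sum(c * c for c in qty)
    if rss <= 0.0:
        return None
    sigma = math.sqrt(rss / (n - 4))
    beta = qty[3] / r_diag[3]
    se = sigma / r_diag[3]
    return beta, se, beta / se
```

Returning None for degenerate SNP pairs is a design choice of this sketch; it corresponds to the cases in which a regression cannot produce a finite test statistic.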

3 RESULTS

We compared the performance of FastEpistasis and PLINK epistasis tests for several sets of SNP pairs, using a single core to enable a fair comparison. FastEpistasis ran almost 15 times faster than PLINK, completing 81 376 epistasis tests per second compared to 5696 tests per second for PLINK (see Supplementary Table 1). When only SNP pair results below a P-value threshold are needed, so that post-computation time is negligible, FastEpistasis computes about 120 000 epistasis tests per second, ∼20 times faster than PLINK (see also below for the effect of output size in multiple phenotype analysis). However, the gain in performance depends on the number of individuals in the population, as shown in Table 1 and Supplementary Figure 1. With the exception of tests for which PLINK outputs Not a Number (NaN), all FastEpistasis results agree perfectly with PLINK.
Table 1. Epistasis tests per second completed by the FastEpistasis core computation phase for several population sizes, using eight cores

Individuals    10^3 tests/s (SD)
60             1393.14 (82.7)
100            1214.15 (38.4)
500            538.59 (3.9)
1000           289.44 (3.7)
3000           81.00 (0.7)
5000           45.56 (0.4)

Averages are taken over 10 runs with SDs in parentheses. SNP pairs are derived from disjoint sets A, B containing 19 999 and 2596 SNPs.

The speed of FastEpistasis scales linearly with the number of processors at 93% asymptotic efficiency, using either the SMP or the MPI architecture (see Supplementary Fig. 2). At this rate, the computational time required to test all pairs of 500 000 SNPs, totaling 125 billion tests, in a population of 5000 individuals is about 29, 4 or 0.5 days using 8, 64 or 512 MPI-bound processors, respectively.
FastEpistasis is capable of analyzing several different phenotypes simultaneously, using the same genotypes. By performing the QR decomposition of the covariate matrix once and applying the result to several phenotypes, the total number of computations is reduced compared to carrying out the computations separately for each phenotype. Although we observe a significant speed-up with multiple phenotypes, performance peaks and then collapses, eventually turning into a penalty as the number of phenotypes grows (Supplementary Fig. 3). The problem occurs during the core computation phase and is due to the size of the results: the processors compute the test statistics faster than the results can be buffered and written to the hard drive. Omitting the output entirely removes the performance collapse. The reduction in computational time when analyzing several phenotypes simultaneously depends on several factors, including the speed of the epistasis tests (which in turn depends on the number of individuals) and the number of results to be output.
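The saving from factorizing once per SNP pair can be sketched as follows. This is an illustrative sketch under stated assumptions, not the FastEpistasis implementation: the names `orthonormal_design` and `interaction_t_stats` are hypothetical, and Gram-Schmidt stands in for whichever QR routine the software uses. The orthonormal basis depends only on the genotypes, so each additional phenotype costs four dot products and a residual sum of squares.

```python
import math


def orthonormal_design(g1, g2):
    """Orthonormal basis (Q factor) of [1, g1, g2, g1*g2], or None if rank deficient."""
    n = len(g1)
    columns = [
        [1.0] * n,
        [float(a) for a in g1],
        [float(b) for b in g2],
        [float(a * b) for a, b in zip(g1, g2)],
    ]
    q_cols = []
    for col in columns:
        v = col[:]
        for q in q_cols:  # modified Gram-Schmidt
            proj = sum(qi * vi for qi, vi in zip(q, v))
            v = [vi - proj * qi for vi, qi in zip(v, q)]
        norm = math.sqrt(sum(vi * vi for vi in v))
        if norm < 1e-10:
            return None
        q_cols.append([vi / norm for vi in v])
    return q_cols


def interaction_t_stats(g1, g2, phenotypes):
    """t statistic of the interaction term for each phenotype, reusing one Q."""
    n = len(g1)
    q_cols = orthonormal_design(g1, g2)  # genotype-only work, done once
    if q_cols is None or n <= 4:
        return [None] * len(phenotypes)
    stats = []
    for y in phenotypes:  # per-phenotype work: dot products only
        qty = [sum(qi * yi for qi, yi in zip(q, y)) for q in q_cols]
        rss = sum(yi * yi for yi in y) - sum(c * c for c in qty)
        # t = beta / se simplifies to (q4 . y) / sigma
        stats.append(qty[3] / math.sqrt(rss / (n - 4)) if rss > 0.0 else None)
    return stats
```

The result list grows with the number of phenotypes, which is the output volume that the text identifies as the eventual bottleneck.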
For example, using 8 processors, 171 individuals from the HapMap MKK population, 10 phenotypes and output of all epistasis results, the computations are 1.06 times faster than analyzing each phenotype separately, whereas when only results with P < 0.01 (∼1% of tests) are output, the computations are 4.77 times faster. Therefore, restricting the output to P-values below a relatively small threshold, or increasing storage throughput, for example with a striped RAID disk array, can decrease computational demands when analyzing multiple phenotypes.
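Restricting output to a P-value threshold amounts to filtering results before they are written. The sketch below is a hypothetical illustration: the function names and the SNP identifiers in the usage example are made up, and the two-sided P-value uses a normal approximation to the t distribution, which is an assumption of this sketch (reasonable for large samples) rather than a statement about how FastEpistasis computes P-values.

```python
import math


def two_sided_p_normal(t):
    """Two-sided P-value of a test statistic under a standard normal approximation."""
    return math.erfc(abs(t) / math.sqrt(2.0))


def keep_significant(results, p_threshold):
    """Yield (snp_a, snp_b, t, p) only for tests with p below the threshold.

    results: iterable of (snp_a, snp_b, t) tuples, consumed lazily so that
    discarded tests are never buffered or written.
    """
    for snp_a, snp_b, t in results:
        p = two_sided_p_normal(t)
        if p < p_threshold:
            yield snp_a, snp_b, t, p


# Hypothetical results for three SNP pairs
example = [("rs1", "rs2", 0.5), ("rs1", "rs3", -3.2), ("rs2", "rs3", 2.0)]
for record in keep_significant(example, p_threshold=0.01):
    print(record)
```

Because the filter is a generator, only the retained records reach the output stage, which is the effect the threshold has on storage throughput.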

4 DISCUSSION

Epistasis is fundamental to understanding the structure and function of genetic pathways (Phillips, 2008). Recent studies have reported epistatic effects that confer susceptibility to common diseases (Emily et al., 2009; Wu et al., 2010). Genetic interactions may also be able to explain a larger proportion of phenotypic variance for common diseases or related traits (Manolio et al., 2009) or reveal information about gene function (Franke et al., 2009). FastEpistasis is capable of computing fast tests of epistasis for quantitative phenotypes, enabling researchers to study interaction effects of pairs of genetic loci.
REFERENCES (first 10 of 14 listed)

Review 1.  Genome-wide association studies for common diseases and complex traits.

Authors:  Joel N Hirschhorn; Mark J Daly
Journal:  Nat Rev Genet       Date:  2005-02       Impact factor: 53.242

2.  PLINK: a tool set for whole-genome association and population-based linkage analyses.

Authors:  Shaun Purcell; Benjamin Neale; Kathe Todd-Brown; Lori Thomas; Manuel A R Ferreira; David Bender; Julian Maller; Pamela Sklar; Paul I W de Bakker; Mark J Daly; Pak C Sham
Journal:  Am J Hum Genet       Date:  2007-07-25       Impact factor: 11.025

3.  Using biological networks to search for interacting loci in genome-wide association studies.

Authors:  Mathieu Emily; Thomas Mailund; Jotun Hein; Leif Schauser; Mikkel Heide Schierup
Journal:  Eur J Hum Genet       Date:  2009-03-11       Impact factor: 4.246

4.  Screen and clean: a tool for identifying interactions in genome-wide association studies.

Authors:  Jing Wu; Bernie Devlin; Steven Ringquist; Massimo Trucco; Kathryn Roeder
Journal:  Genet Epidemiol       Date:  2010-04       Impact factor: 2.135

Review 5.  Epistasis--the essential role of gene interactions in the structure and evolution of genetic systems.

Authors:  Patrick C Phillips
Journal:  Nat Rev Genet       Date:  2008-11       Impact factor: 53.242

Review 6.  eQTL analysis in humans.

Authors:  Lude Franke; Ritsert C Jansen
Journal:  Methods Mol Biol       Date:  2009

Review 7.  Finding the missing heritability of complex diseases.

Authors:  Teri A Manolio; Francis S Collins; Nancy J Cox; David B Goldstein; Lucia A Hindorff; David J Hunter; Mark I McCarthy; Erin M Ramos; Lon R Cardon; Aravinda Chakravarti; Judy H Cho; Alan E Guttmacher; Augustine Kong; Leonid Kruglyak; Elaine Mardis; Charles N Rotimi; Montgomery Slatkin; David Valle; Alice S Whittemore; Michael Boehnke; Andrew G Clark; Evan E Eichler; Greg Gibson; Jonathan L Haines; Trudy F C Mackay; Steven A McCarroll; Peter M Visscher
Journal:  Nature       Date:  2009-10-08       Impact factor: 49.962

8.  A second generation human haplotype map of over 3.1 million SNPs.

Authors:  Kelly A Frazer; Dennis G Ballinger; David R Cox; David A Hinds; Laura L Stuve; Richard A Gibbs; John W Belmont; Andrew Boudreau; Paul Hardenbol; Suzanne M Leal; Shiran Pasternak; David A Wheeler; Thomas D Willis; Fuli Yu; Huanming Yang; Changqing Zeng; Yang Gao; Haoran Hu; Weitao Hu; Chaohua Li; Wei Lin; Siqi Liu; Hao Pan; Xiaoli Tang; Jian Wang; Wei Wang; Jun Yu; Bo Zhang; Qingrun Zhang; Hongbin Zhao; Hui Zhao; Jun Zhou; Stacey B Gabriel; Rachel Barry; Brendan Blumenstiel; Amy Camargo; Matthew Defelice; Maura Faggart; Mary Goyette; Supriya Gupta; Jamie Moore; Huy Nguyen; Robert C Onofrio; Melissa Parkin; Jessica Roy; Erich Stahl; Ellen Winchester; Liuda Ziaugra; David Altshuler; Yan Shen; Zhijian Yao; Wei Huang; Xun Chu; Yungang He; Li Jin; Yangfan Liu; Yayun Shen; Weiwei Sun; Haifeng Wang; Yi Wang; Ying Wang; Xiaoyan Xiong; Liang Xu; Mary M Y Waye; Stephen K W Tsui; Hong Xue; J Tze-Fei Wong; Luana M Galver; Jian-Bing Fan; Kevin Gunderson; Sarah S Murray; Arnold R Oliphant; Mark S Chee; Alexandre Montpetit; Fanny Chagnon; Vincent Ferretti; Martin Leboeuf; Jean-François Olivier; Michael S Phillips; Stéphanie Roumy; Clémentine Sallée; Andrei Verner; Thomas J Hudson; Pui-Yan Kwok; Dongmei Cai; Daniel C Koboldt; Raymond D Miller; Ludmila Pawlikowska; Patricia Taillon-Miller; Ming Xiao; Lap-Chee Tsui; William Mak; You Qiang Song; Paul K H Tam; Yusuke Nakamura; Takahisa Kawaguchi; Takuya Kitamoto; Takashi Morizono; Atsushi Nagashima; Yozo Ohnishi; Akihiro Sekine; Toshihiro Tanaka; Tatsuhiko Tsunoda; Panos Deloukas; Christine P Bird; Marcos Delgado; Emmanouil T Dermitzakis; Rhian Gwilliam; Sarah Hunt; Jonathan Morrison; Don Powell; Barbara E Stranger; Pamela Whittaker; David R Bentley; Mark J Daly; Paul I W de Bakker; Jeff Barrett; Yves R Chretien; Julian Maller; Steve McCarroll; Nick Patterson; Itsik Pe'er; Alkes Price; Shaun Purcell; Daniel J Richter; Pardis Sabeti; Richa Saxena; Stephen F Schaffner; Pak C Sham; Patrick Varilly; David Altshuler; Lincoln D Stein; Lalitha Krishnan; Albert Vernon Smith; Marcela K Tello-Ruiz; Gudmundur A Thorisson; Aravinda Chakravarti; Peter E Chen; David J Cutler; Carl S Kashuk; Shin Lin; Gonçalo R Abecasis; Weihua Guan; Yun Li; Heather M Munro; Zhaohui Steve Qin; Daryl J Thomas; Gilean McVean; Adam Auton; Leonardo Bottolo; Niall Cardin; Susana Eyheramendy; Colin Freeman; Jonathan Marchini; Simon Myers; Chris Spencer; Matthew Stephens; Peter Donnelly; Lon R Cardon; Geraldine Clarke; David M Evans; Andrew P Morris; Bruce S Weir; Tatsuhiko Tsunoda; James C Mullikin; Stephen T Sherry; Michael Feolo; Andrew Skol; Houcan Zhang; Changqing Zeng; Hui Zhao; Ichiro Matsuda; Yoshimitsu Fukushima; Darryl R Macer; Eiko Suda; Charles N Rotimi; Clement A Adebamowo; Ike Ajayi; Toyin Aniagwu; Patricia A Marshall; Chibuzor Nkwodimmah; Charmaine D M Royal; Mark F Leppert; Missy Dixon; Andy Peiffer; Renzong Qiu; Alastair Kent; Kazuto Kato; Norio Niikawa; Isaac F Adewole; Bartha M Knoppers; Morris W Foster; Ellen Wright Clayton; Jessica Watkin; Richard A Gibbs; John W Belmont; Donna Muzny; Lynne Nazareth; Erica Sodergren; George M Weinstock; David A Wheeler; Imtaz Yakub; Stacey B Gabriel; Robert C Onofrio; Daniel J Richter; Liuda Ziaugra; Bruce W Birren; Mark J Daly; David Altshuler; Richard K Wilson; Lucinda L Fulton; Jane Rogers; John Burton; Nigel P Carter; Christopher M Clee; Mark Griffiths; Matthew C Jones; Kirsten McLay; Robert W Plumb; Mark T Ross; Sarah K Sims; David L Willey; Zhu Chen; Hua Han; Le Kang; Martin Godbout; John C Wallenburg; Paul L'Archevêque; Guy Bellemare; Koji Saeki; Hongguang Wang; Daochang An; Hongbo Fu; Qing Li; Zhen Wang; Renwu Wang; Arthur L Holden; Lisa D Brooks; Jean E McEwen; Mark S Guyer; Vivian Ota Wang; Jane L Peterson; Michael Shi; Jack Spiegel; Lawrence M Sung; Lynn F Zacharia; Francis S Collins; Karen Kennedy; Ruth Jamieson; John Stewart
Journal:  Nature       Date:  2007-10-18       Impact factor: 49.962

9.  Allelic association studies of genome wide association data can reveal errors in marker position assignments.

Authors:  David Curtis
Journal:  BMC Genet       Date:  2007-06-08       Impact factor: 2.797

10.  A method for detecting epistasis in genome-wide studies using case-control multi-locus association analysis.

Authors:  Javier Gayán; Antonio González-Pérez; Fernando Bermudo; María Eugenia Sáez; Jose Luis Royo; Antonio Quintas; Jose Jorge Galan; Francisco Jesús Morón; Reposo Ramirez-Lorca; Luis Miguel Real; Agustín Ruiz
Journal:  BMC Genomics       Date:  2008-07-31       Impact factor: 3.969
