
LAMPLINK: detection of statistically significant SNP combinations from GWAS data.

Aika Terada, Ryo Yamada, Koji Tsuda, Jun Sese.

Abstract

One of the major issues in genome-wide association studies is the missing heritability problem. While considering epistatic interactions among multiple SNPs may contribute to solving this problem, existing software cannot detect statistically significant high-order interactions. We propose software named LAMPLINK, which employs a cutting-edge method to enumerate statistically significant SNP combinations from genome-wide case-control data. LAMPLINK is implemented as a set of additional functions to PLINK, and hence existing PLINK procedures remain applicable. Applied to the 1000 Genomes Project data, LAMPLINK detected a combination of five SNPs that are statistically significantly accumulated in the Japanese population.
AVAILABILITY AND IMPLEMENTATION: LAMPLINK is available at http://a-terada.github.io/lamplink/
CONTACT: terada@cbms.k.u-tokyo.ac.jp or sese.jun@aist.go.jp
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
© The Author 2016. Published by Oxford University Press.

Year:  2016        PMID: 27412093      PMCID: PMC5181558          DOI: 10.1093/bioinformatics/btw418

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


1 Background

Genome-wide association studies (GWASs) have identified hundreds of loci associated with various complex human traits (Welter ). These studies screen individual single nucleotide polymorphisms (SNPs), using statistical tests to assess the association of each SNP with a phenotype. However, this single-SNP procedure leaves a large proportion of heritability unexplained by the identified loci, a problem known as ‘missing heritability’ (Maher, 2008), so it is increasingly important to evaluate the combinatorial effects of SNPs (Wei ). Several software packages have been developed to detect interactions among SNPs related to a phenotype (Purcell ; Zhang and Liu, 2007; Calle ; Wan ; Kam-Thong ; Van Lishout ). However, few methods overcome two major problems simultaneously. The first is that statistical validity is not assessed: most methods that can enumerate higher-order interactions do not evaluate the statistical significance of their results. The second is that the combination size is limited in practice. Existing statistical techniques such as logistic regression and multifactor dimensionality reduction can be used to find combinatorial effects, but investigating all combinatorial effects requires applying them to every possible combination, which is too computationally intensive. Both problems must be overcome for high-order interaction analysis to succeed. A recently proposed statistical method called the Limitless Arity Multiple-testing Procedure (LAMP) (Terada ) makes it possible to detect statistically significant higher-order interactions. LAMP is a multiple testing procedure that lists statistically significant combinatorial effects by introducing a theoretical upper bound on the family-wise error rate that is tighter than the Bonferroni correction. Its application to GWAS analysis may uncover synergistic effects of SNPs associated with diseases.
We therefore developed LAMPLINK, software that integrates LAMP into the widely used GWAS analysis tool PLINK (Purcell ). Applied to the 1000 Genomes Project data, it detected a combination of five SNPs accumulated in the Japanese population with statistical significance.

2 Methods and implementation

LAMPLINK is implemented by adding options for detecting statistically significant high-order SNP interactions to PLINK (version 1.07), so all PLINK options and file formats remain available in LAMPLINK. Figure 1 shows a typical analytical procedure for detecting SNP combinations using LAMPLINK. LAMPLINK performs a case–control analysis of GWAS data using Fisher’s exact test or the chi-squared test, and enumerates statistically significant combinations associated with a given phenotype. The additional options are shown in Supplementary Table S1, and the details of LAMPLINK are described in Supplementary Text. LAMPLINK runs on Linux and requires C and Python 2.7.
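To make the case–control test concrete, the following is a minimal sketch (in Python 3, not the LAMPLINK source) of a one-sided Fisher's exact test applied to a single SNP combination. Several points are assumptions made for illustration only: genotypes are coded as minor-allele counts (0, 1 or 2), an individual "carries" a combination when every SNP in it meets the model's minimum copy number (1 for a dominant reading, 2 for a recessive reading), and the test is one-sided. The exact model definitions used by LAMPLINK are given in Supplementary Text and may differ. The function names and the SNP identifiers are hypothetical.

```python
from math import comb


def fisher_exact_one_sided(a, b, c, d):
    """Upper-tail Fisher's exact P-value for the 2x2 table

                      cases  controls
        carriers        a       b
        non-carriers    c       d

    i.e. the hypergeometric probability of seeing >= a carriers among cases.
    """
    n_cases = a + c
    n_carriers = a + b
    n_total = a + b + c + d
    upper = min(n_carriers, n_cases)
    numerator = sum(
        comb(n_carriers, k) * comb(n_total - n_carriers, n_cases - k)
        for k in range(a, upper + 1)
    )
    return numerator / comb(n_total, n_cases)


def carries_combination(genotypes, snp_ids, model="dom"):
    """Assumed carrier rule: every SNP in the combination must be 'present'."""
    min_copies = 1 if model == "dom" else 2
    return all(genotypes[snp] >= min_copies for snp in snp_ids)


def combination_p_value(individuals, snp_ids, model="dom"):
    """individuals: list of (genotype dict, is_case) pairs."""
    a = b = c = d = 0
    for genotypes, is_case in individuals:
        carrier = carries_combination(genotypes, snp_ids, model)
        if carrier and is_case:
            a += 1
        elif carrier:
            b += 1
        elif is_case:
            c += 1
        else:
            d += 1
    return fisher_exact_one_sided(a, b, c, d)


# Toy data (hypothetical): three cases and three controls.
toy = [({"rs1": 1, "rs2": 2}, True)] * 3 + [({"rs1": 0, "rs2": 1}, False)] * 3
print(combination_p_value(toy, ["rs1", "rs2"], model="dom"))
```

This only tests one given combination. The contribution of LAMP lies in deciding which combinations need to be tested at all and how to correct for multiple testing, which this sketch does not attempt.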
Fig. 1.

Overview of LAMPLINK. (a) Workflow to detect statistically significant SNP combinations. (b) Two significant combinations including three and five SNPs (IDs 3 and 10 in Supplementary Table S3) detected in Procedure 1. Each petal corresponds to the P-value of a single SNP, and the central circle represents that of the SNP combination. Color shows the adjusted P-values. The P-value of the combination was smaller than the P-values of any single SNP, suggesting the existence of an epistatic effect among the three SNPs. (c) Detected combination after Procedure 2. ID 10 has been eliminated because it includes pairs of SNPs whose . (d) Manhattan plot of P-values from the test of the association between the Japanese population and other populations. Crosses represent significant SNP combinations in (b). The horizontal line indicates the adjusted significance level (5.49e-07)

2.1 Detection of statistically significant SNP combinations

The --lamp option with --model-dom (or --model-rec) can be used for enumerating statistically significant SNP combinations (Procedure 1 in Fig. 1a). The input and output filenames are specified with the --file (or --bfile for binary format) and --out options, respectively. With --model-dom, LAMPLINK detects statistically significant combinations of SNPs according to a dominant exclusive model, whereas --model-rec uses a recessive exclusive model. These two genetic models are defined in Supplementary Text. LAMPLINK writes its results to two files, with the extensions ‘.lamp’ and ‘.lamplink’. The former reports all SNP combinations that are statistically significantly associated with the phenotype. The latter reports detailed information about each SNP in a format similar to the output PLINK generates for association analysis. All columns of the result files are listed in Supplementary Table S2.
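The options described above can be assembled into a command line as in the following minimal sketch. It uses only the flags named in this section. The executable name `lamplink`, the helper name `build_lamplink_command` and the file prefixes are assumptions for illustration, and further options listed in Supplementary Table S1 (not covered here) may be needed for an actual run. The helper itself is Python 3 and only builds the argument list; it does not invoke the program.

```python
import shlex


def build_lamplink_command(input_prefix, output_prefix, model="dom",
                           binary_input=False, executable="lamplink"):
    """Build an argument list for Procedure 1 (enumeration of combinations).

    model: "dom" for --model-dom, "rec" for --model-rec.
    binary_input: use --bfile (binary format) instead of --file.
    executable: assumed name of the LAMPLINK binary (hypothetical).
    """
    if model not in ("dom", "rec"):
        raise ValueError("model must be 'dom' or 'rec'")
    input_flag = "--bfile" if binary_input else "--file"
    return [
        executable,
        input_flag, input_prefix,
        "--lamp",
        "--model-" + model,
        "--out", output_prefix,
    ]


# Hypothetical prefixes; prints the command as it would be typed in a shell.
command = build_lamplink_command("mydata", "mydata_result", model="dom")
print(shlex.join(command))
```

The resulting list can be passed to `subprocess.run` once the binary's actual name and location are known.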

2.2 Elimination of redundant SNP combinations

Procedure 1 may list combinations of SNPs that lie in the same linkage disequilibrium (LD) region, which can obscure SNP–phenotype associations. The --lamp-ld-remove option filters out such uninformative combinations (Procedure 2 in Fig. 1a). This option eliminates SNP combinations whose members have r² higher than the user-specified threshold, on the assumption that they are located in the same LD region. If all r² scores computed for SNP pairs in each chromosome are higher than the threshold, the combination is removed.
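The removal rule can be sketched as follows. This is one possible reading of the description above, not the LAMPLINK implementation: r² is taken here as the squared Pearson correlation of minor-allele counts, pairs are formed only between SNPs on the same chromosome, and a combination with no same-chromosome pair is kept. All three choices, as well as the function names and toy data, are assumptions.

```python
from itertools import combinations


def r_squared(x, y):
    """Squared Pearson correlation of two genotype vectors (assumed r2 measure)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a monomorphic SNP carries no LD information
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy * sxy / (sxx * syy)


def is_redundant(combination, chromosome, genotypes, threshold):
    """True if every same-chromosome SNP pair in the combination exceeds threshold."""
    by_chromosome = {}
    for snp in combination:
        by_chromosome.setdefault(chromosome[snp], []).append(snp)
    scores = [
        r_squared(genotypes[s1], genotypes[s2])
        for snps in by_chromosome.values()
        for s1, s2 in combinations(snps, 2)
    ]
    # Assumption: with no same-chromosome pair there is nothing to compare,
    # so the combination is kept.
    return bool(scores) and all(score > threshold for score in scores)


# Toy data (hypothetical): A and B are perfectly correlated, A and C are not.
genotypes = {
    "A": [0, 1, 2, 0, 1, 2],
    "B": [0, 1, 2, 0, 1, 2],
    "C": [2, 0, 1, 1, 0, 2],
    "D": [0, 0, 1, 1, 2, 2],
}
chromosome = {"A": "5", "B": "5", "C": "5", "D": "11"}
print(is_redundant(("A", "B", "D"), chromosome, genotypes, threshold=0.8))
print(is_redundant(("A", "C", "D"), chromosome, genotypes, threshold=0.8))
```

Under this reading, a combination whose SNPs all lie on different chromosomes is never removed, because no pairwise r² is computed for it.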

3 Analysis of exome data

We applied LAMPLINK to human exome data provided by the 1000 Genomes Project (The 1000 Genomes Project Consortium, 2010), comprising 12 758 SNPs and 697 individuals from seven populations. We set up a case–control study by treating the 105 Japanese individuals as cases and the remaining individuals as controls, and searched for combinations of SNPs accumulated in the Japanese population with statistical significance. All settings and commands for this demonstration are described in Supplementary Text. The experiments were run on a machine with an Intel Xeon E5-2680v2 processor at 2.6 GHz running Red Hat Enterprise Linux 6.4. We compared the running time of LAMPLINK with that of the --epistasis option in PLINK, which exhaustively analyzes the relationship of pairs of SNPs to a phenotype. LAMPLINK took 21.281 s, whereas PLINK required over 150 min to investigate all pairs of SNPs, showing that LAMPLINK can identify combinatorial effects of SNPs within a short time even though it considers all possible combinations of SNPs. A detailed time performance analysis of LAMPLINK is provided in Supplementary Text.
Procedure 1 detected 106 statistically significant SNP combinations, including 10 combinations that consisted of three or more SNPs (Supplementary Table S3). These combinations could not be detected by PLINK. Figure 1(b) illustrates two statistically significant combinations (IDs 3 and 10 in Supplementary Table S3). ID 3 consisted of three SNPs located within the genes PCDHGA1, VPS13C and VNN3, which lie on different chromosomes (Fig. 1d). ID 10 consisted of five SNPs, four of which are located within the same gene, MTRR; this combination is therefore eliminated in Procedure 2 (Fig. 1c). We discuss these results in detail in Supplementary Text. These two results show that LAMPLINK can detect statistically significant SNP combinations from genome-wide case–control data.
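To give a sense of the search space involved, the short calculation below counts the SNP combinations of a given order among the 12 758 SNPs in this dataset, and computes the Bonferroni-corrected threshold that would apply if only the pairs were tested. The nominal significance level of 0.05 is an assumption (the main text does not state it), and the value 5.49e-07 is the adjusted significance level quoted in the caption of Fig. 1(d). This is a back-of-the-envelope illustration in Python 3, not part of LAMPLINK.

```python
from math import comb

N_SNPS = 12758          # number of SNPs in the exome dataset (from the text)
ALPHA = 0.05            # nominal significance level: an assumption
REPORTED_LEVEL = 5.49e-07  # adjusted level quoted in the caption of Fig. 1(d)


def count_up_to_order(n_snps, max_order):
    """Number of SNP combinations of size 1..max_order."""
    return sum(comb(n_snps, k) for k in range(1, max_order + 1))


# Number of combinations of exactly 2, 3 and 5 SNPs.
for order in (2, 3, 5):
    print(order, comb(N_SNPS, order))

# Bonferroni threshold if all pairs (and nothing else) were tested.
bonferroni_pairs = ALPHA / comb(N_SNPS, 2)
print("Bonferroni threshold over all pairs:", bonferroni_pairs)
print("Adjusted level in Fig. 1(d):", REPORTED_LEVEL)
```

Even restricted to pairs, a Bonferroni correction would demand a threshold several orders of magnitude smaller than the adjusted level shown in Fig. 1(d), and the gap widens rapidly once higher-order combinations are counted.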
By replacing the phenotype with a disease, it might be possible to identify causal mutations of complex diseases. LAMPLINK is the first implementation that can detect statistically sound high-order interactions from tens of thousands of markers. Hence, LAMPLINK may contribute to the identification of combinatorial effects from multiple markers by re-analysis of existing GWAS datasets.

4 Future work

LAMPLINK currently supports two genetic models (dominant and recessive exclusive models), but it cannot handle the combination of recessive and dominant models (known as the jointly recessive-dominant model for two loci; Li and Reich, 2000) because of a theoretical limitation in LAMP. Future work includes supporting the jointly recessive-dominant model as well as the threshold (Greenberg, 1981) and additive models (Neuman and Rice, 1992), which may help solve the problem of missing heritability. We also plan to incorporate other statistical models into LAMPLINK for analyzing various types of data. For example, statistical assessment using the Mann–Whitney U-test or a regression model is useful for analyzing numerical trait data. Additionally, LAMP has been extended to avoid spurious results caused by a confounding variable (e.g. age or gender of patients) (Terada ). Incorporating these methods will greatly improve the versatility of our software.
References (14 in total; 10 listed)

1.  A complete enumeration and classification of two-locus disease models.

Authors:  W Li; J Reich
Journal:  Hum Hered       Date:  2000 Nov-Dec       Impact factor: 0.444

2.  Two-locus models of disease.

Authors:  R J Neuman; J P Rice
Journal:  Genet Epidemiol       Date:  1992       Impact factor: 2.135

3.  BOOST: A fast approach to detecting gene-gene interactions in genome-wide case-control studies.

Authors:  Xiang Wan; Can Yang; Qiang Yang; Hong Xue; Xiaodan Fan; Nelson L S Tang; Weichuan Yu
Journal:  Am J Hum Genet       Date:  2010-09-10       Impact factor: 11.025

4.  Bayesian inference of epistatic interactions in case-control studies.

Authors:  Yu Zhang; Jun S Liu
Journal:  Nat Genet       Date:  2007-08-26       Impact factor: 38.330

5.  Personal genomes: The case of the missing heritability.

Authors:  Brendan Maher
Journal:  Nature       Date:  2008-11-06       Impact factor: 49.962

6.  GLIDE: GPU-based linear regression for detection of epistasis.

Authors:  Tony Kam-Thong; Chloé-Agathe Azencott; Lawrence Cayton; Benno Pütz; André Altmann; Nazanin Karbalai; Philipp G Sämann; Bernhard Schölkopf; Bertram Müller-Myhsok; Karsten M Borgwardt
Journal:  Hum Hered       Date:  2012-09-04       Impact factor: 0.444

7.  A simple method for testing two-locus models of inheritance.

Authors:  D A Greenberg
Journal:  Am J Hum Genet       Date:  1981-07       Impact factor: 11.025

8.  The NHGRI GWAS Catalog, a curated resource of SNP-trait associations.

Authors:  Danielle Welter; Jacqueline MacArthur; Joannella Morales; Tony Burdett; Peggy Hall; Heather Junkins; Alan Klemm; Paul Flicek; Teri Manolio; Lucia Hindorff; Helen Parkinson
Journal:  Nucleic Acids Res       Date:  2013-12-06       Impact factor: 16.971

9.  Detecting epistasis in human complex traits (review).

Authors:  Wen-Hua Wei; Gibran Hemani; Chris S Haley
Journal:  Nat Rev Genet       Date:  2014-09-09       Impact factor: 53.242

10.  An efficient algorithm to perform multiple testing in epistasis screening.

Authors:  François Van Lishout; Jestinah M Mahachie John; Elena S Gusareva; Victor Urrea; Isabelle Cleynen; Emilie Théâtre; Benoît Charloteaux; Malu Luz Calle; Louis Wehenkel; Kristel Van Steen
Journal:  BMC Bioinformatics       Date:  2013-04-24       Impact factor: 3.169
