Literature DB >> 25655172

Filter-free exhaustive odds ratio-based genome-wide interaction approach pinpoints evidence for interaction in the HLA region in psoriasis.

Laura Grange1,2,3, Jean-François Bureau4,5, Iryna Nikolayeva6,7,8, Richard Paul9,10, Kristel Van Steen11,12, Benno Schwikowski13, Anavaj Sakuntabhai14,15.   

Abstract

BACKGROUND: Deciphering the genetic architecture of complex traits is still a major challenge for human genetics. In most cases, genome-wide association studies have only partially explained the heritability of traits and diseases. Epistasis, one potentially important cause of this missing heritability, is difficult to explore at the genome-wide level. Here, we develop and assess a tool based on interactive odds ratios (IOR), Fast Odds Ratio-based sCan for Epistasis (FORCE), as a novel approach for exhaustive genome-wide epistasis search. IOR is the ratio between the multiplicative term of the odds ratio (OR) of having each variant over the OR of having both of them. By definition, an IOR that significantly deviates from 1 suggests the occurrence of an interaction (epistasis). As the IOR is fast to calculate, we used the IOR to rank and select pairs of interacting polymorphisms for P value estimation, which is more time consuming.
RESULTS: FORCE displayed power and accuracy similar to existing parametric and non-parametric methods, and is fast enough to complete a filter-free genome-wide epistasis search in a few days on a standard computer. Analysis of psoriasis data uncovered novel epistatic interactions in the HLA region, corroborating the known major and complex role of the HLA region in psoriasis susceptibility.
CONCLUSIONS: Our systematic study revealed the ability of FORCE to uncover novel interactions, highlighted the importance of exhaustiveness, as well as its specificity for certain types of interactions that were not detected by existing approaches. We therefore believe that FORCE is a valuable new tool for decoding the genetic basis of complex diseases.

Entities:  

Mesh:

Substances:

Year:  2015        PMID: 25655172      PMCID: PMC4341885          DOI: 10.1186/s12863-015-0174-3

Source DB:  PubMed          Journal:  BMC Genet        ISSN: 1471-2156            Impact factor:   2.797


Background

During the past decade, many genome-wide association studies (GWAS) have aimed to identify new genetic factors determining susceptibility to a variety of diseases [1,2]. Although promising and sometimes successful, these large-scale studies have only led to modest advances [3]. One explanation is that the underlying model that single SNPs contribute independently to the complex trait may frequently be too simple. Rather, complex traits are likely to result from a complex interplay between genes, notably epistatic gene-environment and gene-gene interactions [4]. The principal obstacles in a genome-wide search for epistasis are statistical power to overcome the limitations of multiple testing and the computational time of the search itself. Over the past decades, many tools have been developed for epistasis detection using various statistical methods [5,6], including those based on regression [7-11], linkage disequilibrium and haplotypes [12,13], and Bayesian approaches [14,15]. Alternative approaches are based on data-filtering, machine-learning and data mining [16-19]. Here, we present an approach that detects pairwise epistasis on a genome-wide scale based on the classical interaction odds ratio (IOR). Introduced by Piegorsch et al. in 1994 [20], this approach has mainly been used for the detection of gene-environment interactions in case-only designs [21]. VanderWeele et al. [22] showed how the use of IOR can help reveal mechanistic interactions in case-only datasets. Firstly, we report on the first efficient implementation of an approach for genome-wide epistasis detection, which we call FORCE (Fast Odds Ratio sCan for Epistasis). Due to its mathematical simplicity, the approach is suitable for exhaustive unfiltered epistasis analysis; i.e., the exact value of the IOR statistic can be evaluated for all pairs of genotyped SNPs in reasonable time on a standard computer. We introduce the mathematics to compute exact P-values for the most extreme values of IOR. Secondly, we describe the application of FORCE to the Welcome Trust Case Control Consortium (WTCCC) data on psoriasis, and analyze the previously unknown statistical interactions we found in the light of already-known results. Lastly we ask whether the statistical interactions detected by FORCE were found due to its exhaustiveness and/or its underlying genetic model, and we present evidence for both. We show that the restriction of FORCE to analyzing only certain SNPs selected according to their marginal effect on psoriasis (as previously described by Knight et al. [23]) strongly limits the statistical significance of the results. We then benchmark the performance of FORCE and other popular methods to detect simulated epistatic interactions, always using exhaustive search. Under different common models for interaction and noise, FORCE consistently detects certain types of interactions better than other approaches.

Methods

Definition of interaction odds ratio (IOR)

For any given pair of SNPs, the interaction odds-ratio statistic IOR is calculated from a pair of 2×2 contingency tables. These tables are derived from 3×3 tables of all allele combinations, by collapsing them according to a dominant or recessive model (see Table 1). Following preliminary evidence that the dominant model allowed more efficient detection of epistasis (Table 2), all analyses were performed using this dominant genetic model.
Table 1

Contingency table under a dominant model

SNP1 SNP2 Cases Controls
AABB α β
AABb or bb γ δ
Aa or aaBB ε ζ
Aa or aaBb or bb η θ

Major alleles are respectively A and B for each SNP and minor alleles a and b. The risk allele is assumed to be the minor allele.

Table 2

Power and Family-wise error rate (FWER) for detection of the functional pair using a dominant or recessive transmission assumption in 6 different epistasis models

Genetic model Test Model 1 Model 2 Model 3 Model 4 Model 5 Model 6
Dominant Power10.970.9610.930.99
FWER0.050.020.020.060.050.04
Recessive Power0.930.960.010.0100
FWER0.040.070.020.020.030.01
Contingency table under a dominant model Major alleles are respectively A and B for each SNP and minor alleles a and b. The risk allele is assumed to be the minor allele. Power and Family-wise error rate (FWER) for detection of the functional pair using a dominant or recessive transmission assumption in 6 different epistasis models We define the following odds ratios: Note that IOR is undefined when the denominator of this expression becomes zero. For formal consistency, we therefore added a pseudocount of 1 to each cell of the two contingency tables.

Statistical significance: Empirical and exact P-values

Note that an IOR of x equals an IOR of 1/x after exchanging counts between cases and controls. We define universal IOR, u(IOR): This definition allows us to express significant deviations of u(IOR) from the expectation of 1 using a one-tailed P-value. Pairs with high u(IOR) were identified by the straightforward algorithm that computes u(IOR) for each pair of given SNPs. Our C implementation encodes, in a preprocessing step, all data related to any given SNP into a bit string, and then uses fast logical and bit-counting functions to compute u(IOR) for all pairs. Marginal empirical P-values for any given pair of SNPs were calculated as the proportion of u(IOR) values from randomly generated permutations of case–control labels that were larger than or equal to the value of u(IOR) obtained for the same pair in real data. The number of permutations performed (1000 for simulated data, 100,000 for real data) was adapted to the number of tests performed in these two scenarios. Exact P-values were calculated usingand computed by the straightforward algorithm with four nested loops to cover all required parameter tuples (α’,γ’,ε’,η’). Each inner loop only visits those parameter values that correspond to possible tuples with α’ + γ’ + ε’ + η’ = α + γ + ε + η, given the parameter values in the outer loop. Summed are those terms with u(IOR) ≥ x.

Application of FORCE to psoriasis data

To evaluate FORCE, we assessed its performance on the WTCCC psoriasis dataset. Initial GWAS and further analyses performed on these data are described in [24]. Following general practice for pre-processing, we excluded potentially low-quality SNP data from further analysis. Specifically, we discarded i) any individual whose total missing rate was above 0.05, ii) any SNP with a frequency of missing data above 0.05, and iii) any SNP with minor allele frequency below 0.05. After pre-processing, our dataset consisted of 2,618 cases, 2,737 controls and 491,191 SNPs, corresponding to approximately 1.2 × 1011 SNP pairs. We excluded pairs with a genomic distance of less than 100 kb to avoid pairs in linkage disequilibrium. In addition, we found that low row and cell counts in the contingency table (Table 1) can lead to extreme but frequently not significant values of u(IOR). For the purposes of this study, we excluded 3,521,114 SNP pairs with a total count of less than 50 in any row, or less than 5 in any cell of the contingency table. In addition to FORCE, we performed PLINK (FastEpistasis mode) on the top-ranked 500 pairs to compare the results obtained with both methods.

Comparison of exhaustive FORCE with semi-exhaustive and conditional search

To assess the utility of exhaustive search, we constructed a reference dataset of SNPs previously implicated in psoriasis. We started with a set of 34 SNPs from two previous reviews on psoriasis genetics [25,26] that were part of our psoriasis dataset. After applying quality control thresholds (described above), 18 SNPs remained. Following general practice for genome-wide approaches, for exhaustive and semi-exhaustive searches, we used a genome-wide significance threshold of , which is based on a model of the human genome with 106 independent SNPs [27].

Comparison of FORCE with other approaches on simulated datasets

We simulated datasets of 10 biallelic SNPs over 200 cases and 200 controls following the Hardy-Weinberg equilibrium model. Interactions were simulated according to six different previously described models without main effect [28] (Table 3). These models represent pure epistasis effects, and not confounding main effects. Model 1 is an interaction effect in which high risk of disease occurs when inheriting heterozygous genotypes at either locus (Aa or Bb) but not both. Model 2 represents high risk of disease when inheriting two high-risk alleles that could be A and/or B. Models 3–6 correspond to the epistasis model discovery method described by Moore et al. [29]. Each of these models represents an interaction effect without any main effects. Allele frequencies are p = 0.25 and q = 0.75 for model 3 and 4, p = 0.1 and q = 0.9 for model 5 and 6.
Table 3

Penetrances and allele frequencies (p,q) used to simulate the interaction models – from Ritchie [28]

Model 1 Model 2 Model 3
BB Bb bb BB Bb bb BB Bb bb
AA00.100AA000.10AA0.080.070.05
Aa0.1000.10Aa00.050Aa0.1000.10
aa00.100aa0.1000aa0.030.100.04
p = 0.5, q = 0.5p = 0.5, q = 0.5p = 0.25, q = 0.75
Model 4 Model 5 Model 6
BB Bb bb BB Bb bb BB Bb bb
AA00.010.09AA0.070.050.02AA0.090.0010.02
Aa0.040.010.08Aa0.050.090.01Aa0.080.070.005
aa0.070.090.03aa0.020.010.03aa0.0030.0070.02
p = 0.25, q = 0.75p = 0.10, q = 0.9p = 0.10, q = 0.9

Marginal penetrances for each genotype are identical as we simulate pure epistasis effects.

Penetrances and allele frequencies (p,q) used to simulate the interaction models – from Ritchie [28] Marginal penetrances for each genotype are identical as we simulate pure epistasis effects. For each of the six models, we generated 100 datasets in each of the 16 conditions of the presence or absence of four of the most commonly encountered sources of noise: missing data (MS), genotyping errors (GE), genetic heterogeneity (GH), and phenocopy (PC). For GH, two independent interactions were simulated instead of one, each interaction being risk-associated in half of the affected cases. When PC was simulated, interaction affected the trait for half of the cases, emulating an unknown environmental effect. GE and MS were simulated at 5%, as previously described [28]. An epistatic pair of SNPs was considered as detected if the empirical P-value was below 0.001, i.e., below 0.05 after Bonferroni correction. Power was estimated as n/100, where n is the number of datasets with detection(s). When two pairs (P1, P2) of SNPs were simulated, detection was counted under one of three different conditions: D1) when P1 and P2 were detected, D2) when P1 was detected, or D3) when P1 or P2 was detected. Family-wise error rate (FWER) was calculated as m/100, where m is the number of datasets for which at least one pair other than the simulated pair was detected.

Results

FORCE enables exhaustive unfiltered epistasis analysis

The FORCE method for epistasis detection is based on the choice of a dominant or recessive model that collapses combinations of allele counts into two 2×2 incidence tables (see Methods). Interactions are then detected as extreme values of the IOR statistic. We implemented the FORCE method for epistasis in C language [30]. Due to its mathematical simplicity and efficient implementation, the computation of IOR could be performed rapidly, compared to other approaches (4.3 days on a single core of a standard computer). Table 4 shows running times of different methods selected for this study.
Table 4

Average time needed to exhaustively test one/all 1.25×10 pairs among 500,000 SNPs using a single-core CPU computer

Software Time for one/all SNPs (single core)
MB-MDR [16]5×10-3 s/20 years [31]
PLINK Epistasis [7]2×10-4 s/289 days [5]
PLINK FastEpistasis [8]2×10-5 s/29 days [32]
FORCE3×10-6 s/4.3 days
GWIS - 3 filters [33]1.6×10-6 s/2.2 days [33]
GWIS - 1 filter [33]3.8×10-7 s/0.5 days [33]

We included the recent GWIS approach that is described as ‘exhaustive’, but uses filtering to avoid computing test statistics for all pairs of SNPs.

Average time needed to exhaustively test one/all 1.25×10 pairs among 500,000 SNPs using a single-core CPU computer We included the recent GWIS approach that is described as ‘exhaustive’, but uses filtering to avoid computing test statistics for all pairs of SNPs.

Identification of statistically strong interactions requires exhaustive search

To assess the value of exhaustive search, we first evaluated the performance of a conventional, non-exhaustive approach of constraining the analysis to pairs of SNPs that were previously shown to have main effects associated with the phenotype. We therefore performed a constrained analysis on all pairs of 18 high-quality SNPs that had main effects on psoriasis in previous GWA studies (see Methods). Table 5 gives the best 25 hits obtained through this approach when evaluated on the WTCCC dataset on psoriasis [24] (the results of all pairs are shown in Additional file 1: Table S1). None of the 153 pairs reached a significant interaction P-value below a genome-wide significance threshold of 10−13.
Table 5

Results from conditional search, restricted to pairs of previously implicated SNPs

First GWAS SNP Second GWAS SNP FORCE PLINK FastEpistasis
rs number Chromosomal location rs number Chromosomal location I OR Empirical p-value p-value
rs104845546p21.33rs275245q156.8460.0088820.003095
rs104845546p21.33rs31347926p21.331.0680.30140.007746
rs22018411p31.3rs32130945q33.34.7370.029520.012373
rs31347926p21.33rs479506717q113.1880.074190.012783
rs205415q31rs177169422q246.9870.0082120.013389
rs7028732p16rs479506717q113.4140.064660.014129
rs104845546p21.33rs479506717q112.5970.10710.018096
rs6106046q23rs177169422q246.5910.010250.023261
rs32130945q33.3rs1258010012q13.25.1320.023490.028993
rs46492031p36rs2409936q212.2700.13190.037791
rs46492031p36rs7028732p161.2370.2660.041136
rs31347926p21.33rs275245q1511.8400.0005810.041483
rs7028732p16rs25468905q33.30.8040.370.041729
rs275245q15rs177169422q245.2800.021580.045812
rs6106046q23rs67012161q214.2890.038370.057701
rs22018411p31.3rs25468905q33.32.5870.10770.059206
rs275245q15rs799321413q14.113.5960.057930.059609
rs31347926p21.33rs32130945q33.32.6100.10620.072999
rs7028732p16rs22018411p31.31.6690.19640.083717
rs104845546p21.33rs1258010012q13.22.5350.11130.086631
rs46492031p36rs67012161q213.5180.060720.086671
rs22018411p31.3rs275245q151.6660.19680.088785
rs41127881q21.3rs799321413q14.111.5460.21370.090419
rs2409936q21rs799321413q14.111.8960.16850.096038
rs67012161q21rs801694714q131.0870.29710.100508
Results from conditional search, restricted to pairs of previously implicated SNPs A more comprehensive approach, to which we will here refer to as semi-exhaustive, constrains only one of the SNPs in a pair to a set of previously identified SNPs [8]. Table 6 shows, for each of the 18 previously identified “main effects” SNPs, the highest-scoring interactors, according to the FORCE and PLINK FastEpistasis statistics. Note that FORCE and PLINK identified a few genome-wide significant interactions with P-values as low as 10−20.
Table 6

Semi-exhaustive search among SNP pairs containing a GWAS-identified SNP

GWAS-identified SNP Highest-scoring interactor with GWAS-identified SNP
FORCE u(I OR ) PLINK FastEpistasis Z-score
rs number Chromosomal location Risk allele OR Single association p-value rs number Chromosomal location u(I OR ) Empirical p-value Exact p-value a rs number Chromosomal location Exact p-value
rs104845546p21.334.664.0E-214rs41516646p21.332.97<10E-067.86E-10rs286159506p21.3 2.12E-14
rs25468905q33.31.541.0E-20rs75253451p31.12.537.10E-042.17E-06rs479609317q121.24E-06
rs67012161q211.456.2E-05rs215689222q11.222.5<10E-061.30E-13rs1085358018q21.14.99E-07
rs41127881q21.31.416.5E-09No pair meeting all inclusion criteriars44599834q21.13.35E-08
rs799321413q14.111.412.0E-06No pair meeting all inclusion criteriars108005591q23.34.04E-08
rs32130945q33.31.395.0E-11No pair meeting all inclusion criteriars105126865p13.18.12E-06
rs177169422q241.291.1E-13rs1692872210q22.12.694.10E-062.78E-06rs25536808q13.25.67E-07
rs205415q31.11.275.0E-15No pair meeting all inclusion criteriars171718185q31.25.63E-07
rs2409936q211.255.3E-20rs47271577q21.122.78<10E-06 1.88E-20 rs287732722q12.11.26E-07
rs479506717q111.194.0E-11No pair meeting all inclusion criteriars38198473q27.34.59E-07
rs801694714q131.191.5E-11No pair meeting all inclusion criteriars1107174615q22.311.58E-08
rs6106046q231.197.0E-07rs175855373p26.22.69<10E-06 6.47E-19 rs479488817q11.11.11E-06
rs1258010012q13.21.171.0E-06rs75657422q31.23.39<10E-06 6.80E-20 rs299215413q21.312.07E-06
rs46492031p361.136.8E-08No pair meeting all inclusion criteriars76616844q28.11.70E-06
rs22018411p31.31.133.0E-08No pair meeting all inclusion criteriars1278325210q26.113.79E-06
rs275245q151.132.6E-11No pair meeting all inclusion criteriars78497199q21.311.37E-08
rs7028732p161.123.6E-09No pair meeting all inclusion criteriars1089789711q13.41.79E-06
rs31347926p21.33NR1.0E-09rs10620706p21.322.88<10E-062.85E-10rs10620706p21.32 5.25E-14

aBold data are genome-wide significant interactions.

Semi-exhaustive search among SNP pairs containing a GWAS-identified SNP aBold data are genome-wide significant interactions. Finally, the relatively low computational complexity required for the FORCE statistic allowed us to perform exhaustive analysis of all SNP pairs in the psoriasis dataset. The results are shown in Table 7 (100 best hits shown in Additional file 1: Table S2). Strikingly, the best resulting P-values are another 20 orders of magnitude lower than the P-values identified by semi-exhaustive search. This shows that a large number of the most significant interactions are missed by the semi-exhaustive approach, and hence that the possibility of discovering the statistically best-supported interactions requires an exhaustive approach. Interestingly, FORCE and PLINK identify distinct interactions.
Table 7

FORCE Exhaustive search top hits, and PLINK FastEpistasis results in WTCCC psoriasis data

SNP pair description Epistasis search results
First SNP Second SNP FORCE PLINK FastEpistasis
rs number Chromosomal location (position) rs number Chromosomal location (position) u(I OR ) p-value a p-value a
rs41516646p21.33 (31,920,873)rs92675326p21.33 (31,639,979)10.588 3.32E-33 4.65E-33
rs41516646p21.33 (31,920,873)rs22279566p21.33 (31,778,272)9.662 2.02E-26 7.72E-06
rs31324686p21.33 (31,475,486)rs41516646p21.33 (31,920,873)9.571 3.14E-25 1.82E-07
rs92675466p21.33 (31,673,436)rs41516646p21.33 (31,920,873)8.340 1.08E-31 2.88E-31
rs41516646p21.33 (31,920,873)rs22600006p21.33 (31,593,476)7.749 3.74E-18 1.81E-06
rs25236086p21.33 (31,322,559)rs41516646p21.33 (31,920,873)7.695 1.08E-18 4.93E-09
rs41516646p21.33 (31,920,873)rs28558076p21.33 (31,469,323)7.444 3.88E-17 3.35E-05
rs25964646p21.33 (31,416,156)rs41516646p21.33 (31,920,873)7.379 2.67E-15 5.40E-10
rs31299396p21.32 (31,412,961)rs31312966p21.32 (32,172,993)7.376 6.43E-41 4.45E-30
rs25164646p21.33 (31,416,156)rs126631036p21.32 (32,161,324)7.2294.25E-133.74E-07
rs69066626p21.32 (32,266,506)rs92676496p21.33 (31,824,828)7.187 1.59E-25 2.86E-06
rs121538556p21.33 (32,074,804)rs25236086p21.33 (31,322,559)7.181 1.59E-23 6.91E-09
rs414901312p12.2 (21,282,410)rs93562066q27 (164,818,834)6.4859.82E-091.11E-05
rs5355866p21.33 (31,860,337)rs25235896p21.33 (31,327,334)6.299 1.84E-44 5.45E-43
rs25235896p21.33 (31,327,334)rs6594456p21.33 (31,864,304)6.268 4.08E-45 9.51E-44
rs4083596p21.32 (32,141,883)rs41516646p21.33 (31,920,873)6.038 4.30E-21 1.64E-21
rs2164182chr11q21 (95,981,029)rs168642961q24.3 (171,236,326)5.9458.34E-089.86E-06
rs22279566p21.32 (31,778,272)rs25235896p21.33 (31,327,334)5.851 1.39E-42 2.91E-42
rs1205039514q31.3 (86,210,504)rs23010925q14.3 (83,363,112)5.8311.67E-082.33E-06
rs126631036p21.32 (32,161,324)rs92676496p21.33 (31,824,828)5.827 4.99E-15 1.21E-02
rs5355866p21.33 (31,860,337)rs126631036p21.32 (32,161,324)5.8106.66E-112.62E-05
rs92675326p21.33 (31,639,979)rs92674876p21.33 (31,511,350)5.806 6.48E-19 3.25E-20
rs92674876p21.33 (31,511,350)rs95015876p21.33 (31,346,937)5.804 1.75E-24 1.96E-05
rs126631036p21.32 (32,161,324)rs31306376p21.33 (31,488,145)5.800 4.52E-16 2.79E-05
rs29483698p22 (12,736,387)rs40779208q22.1 (98,893,864)5.8006.05E-091.90E-07

aBold data are genome-wide significant interactions.

FORCE Exhaustive search top hits, and PLINK FastEpistasis results in WTCCC psoriasis data aBold data are genome-wide significant interactions.

FORCE pinpoints interactions beyond main effects in the HLA region

We also analyzed the exhaustive FORCE results with regard to previous studies, which have detected numerous main effects [24-26], but only few weak statistical interactions [24,34,35]. We assessed the performance of FORCE using the WTCCC psoriasis dataset, which contains 2,618 cases, 2,737 controls and 491,191 SNPs. Table 7 shows the 25 best FORCE hits. Twenty-one out of 25 SNP pairs involve SNPs located in the HLA region on chromosome 6, which is consistent with the known strong involvement of the HLA region in psoriasis. Interestingly, certain SNP pairs found to be statistically significant by FORCE did not reach genome-wide significance when using PLINK FastEpistasis. It is well known that SNPs with main effects may falsely appear to be interacting [36]. To avoid such artifacts in our analysis, we removed those SNPs that displayed a univariate statistical association P-value of 10−5 or less [24]. The results show three highly significant interactions involving SNPs from the HLA region that display no main effect (Table 8). In the absence of correlation between the SNPs we claim that these findings provide evidence of interactive effects involved in psoriasis susceptibility. This confirms that FORCE is able to uncover novel statistical interactions in the HLA region that have not been detected before using conventional approaches.
Table 8

Most significant interactions detected through exhaustive search after main effect SNPs removal

rs number Chromosome Position Marginal effect p-value a I OR R 2b
p-value
SNP1 SNP2 SNP1 SNP2 SNP1 SNP2 SNP1 SNP2
rs2254556rs92675326631,374,85431,672,2020.0080.076 1.22E-22 5.230.002
rs9267532rs25235186631,672,20231,373,3510.0760.006 3.15E-22 5.150.002
rs2596437rs92675326631,371,30931,672,2020.0060.076 7.56E-22 5.10.002

aBold data are genome-wide significant interactions. bR2 were calculated using controls only.

Most significant interactions detected through exhaustive search after main effect SNPs removal aBold data are genome-wide significant interactions. bR2 were calculated using controls only.

FORCE systematically detects interactions missed by other approaches

Besides its exhaustiveness, the other characteristic feature of the FORCE approach is the use of the IOR statistic for genome-wide epistasis analysis. To study the extent to which the choice of this statistic contributed to the identification of novel statistical interactions, we used datasets that contained different simulated epistatic interactions between SNPs without main effects, according one of six models of Ritchie [28], and none or one of the four sources of noise: Genotyping Error (GE), Missing Data (MS), Genetic Heterogeneity (GH), Phenocopy (PC) (see Methods for details). We then evaluated the power of FORCE and three other popular epistasis detection methods (PLINK Epistasis [7] and PLINK FastEpistasis [8] using default parameters, and MB-MDR [16], using recommended parameters [37]) to detect the simulated interactions. We used a significance threshold of 0.001. Figure 1 shows the results for all epistatic models for the case of no noise.
Figure 1

Power of different approaches to detect simulated epistatic interactions across the six epistasis models by Ritchie [ 28 ]. Purple: FORCE – Green: MB-MDR – Blue: PLINK Epistasis – Red: PLINK FastEpistasis. Refer to Table 3 for the definitions of the 6 interaction models.

Power of different approaches to detect simulated epistatic interactions across the six epistasis models by Ritchie [ 28 ]. Purple: FORCE – Green: MB-MDR – Blue: PLINK Epistasis – Red: PLINK FastEpistasis. Refer to Table 3 for the definitions of the 6 interaction models. Under all six models, FORCE and MB-MDR consistently showed power close to 1. The situation became more interesting in the presence of noise. Figure 2 shows the power of the tested methods for all six models in the presence of one type of noise (numerical values for are given in Tables 9, 10, 11 and 12). While the results for Genotyping Errors (GE) and Missing Data (MS) were very similar to the no-noise scenario, the presence of Genetic Heterogeneity (GH, independent of the definition of “detection”) or Phenocopy (PC) revealed larger differences among the different approaches. Firstly, we noted that, with GH and PC, all approaches lose power. Secondly, we observed that different approaches worked consistently better than others, depending on the interaction model. For interaction models 1 and 2, MB-MDR dominated all other approaches; FORCE dominated the other approaches for interaction models 3–6.
Figure 2

Power of different approaches to detect simulated epistatic interactions across the six epistasis models by Ritchie [ 28 ], in the presence of noise. Comparison of the power of four methods to detect interaction in the presence of one source of noise. GH: Genetic heterogeneity – GE: Genotyping errors – MS: Missing data – PC: Phenocopy. When GH is simulated, three different ways of calculating power are employed: the power of detecting both pairs in the same dataset, the power of detecting the first (fixed) pair and the power to detect either of the two epistatic pairs. Purple: FORCE – Green: MB-MDR – Blue: PLINK Epistasis – Red: PLINK FastEpistasis.

Table 9

Power and family-wise error rate (FWER) of FORCE, MBMDR, Plink Epistasis and Plink FastEpistasis on 6 epistasis models with or without noise

Model 1 Model 2 Model 3 Model 4 Model 5 Model 6
No noise FORCE Powera 1 0.97 0.96 1 0.93 0.99
FWERb 0.05 0.02 0.02 0.06 0.05 0.04
MBMDR Power 1 1 1 0.98 0.87 0.97
FWER 0.02 0 0 0.01 0.01 0.01
Plink Epistasis Power0 1 0.32 0.98 0.9 0.97
FWER 0.04 0.05 0.02 0.02 0 0.03
FastEpistasis Power0 1 0.38 0.98 0.81 0.84
FWER0.070.07 0.02 0.02 0.01 0.05
GE FORCE Power 0.99 0.99 0.97 1 0.95 0.99
FWER 0.03 0.03 0.08 0.03 0.03 0.04
MBMDR Power 1 1 1 0.99 0.85 1
FWER 0 0 0.01 0.02 0 0
Plink Epistasis Power0 1 0.28 0.99 0.87 0.99
FWER 0 0 0.07 0.02 0 0
FastEpistasis Power0.01 1 0.31 1 0.74 0.92
FWER 0.04 0 0.09 0.05 0 0
MS FORCE Power 0.99 0.96 0.95 1 0.93 0.99
FWER0.07 0.02 0.02 0.03 0.03 0.06
MBMDR Power 1 1 1 0.99 0.8 0.94
FWER 0 0 0 0 0 0
Plink Epistasis Power0 1 0.26 1 0.8 0.97
FWER 0.03 0.02 0.08 0.02 0 0
FastEpistasis Power0 1 0.29 1 0.67 0.91
FWER0.06 0.01 0.1 0.04 0 0
PC FORCE Power0.08 0.55 0.130.330.230.37
FWER 0.03 0.05 0.05 0.06 0.04 0.07
MBMDR Power 0.72 0.98 0.120.090.110.16
FWER 0 0 0 0 0 0
Plink Epistasis Power0 0.95 0.010.210.190.28
FWER 0.04 0.03 0.05 0.06 0.01 0
FastEpistasis Power0 0.99 0.010.230.070.21
FWER0.07 0.03 0.04 0.06 0.01 0.01

Genotype errors (GE), missing data (MS) or phenocopy (PC). aIn bold, power higher than 50%. bIn bold, FWER lower than 5%.

Table 10

Power and family-wise error rate (FWER) of FORCE, MBMDR, Plink Epistasis and Plink FastEpistasis on 6 epistasis models without noise or with simulated genetic heterogeneity (GH)

Model 1 Model 2 Model 3 Model 4 Model 5 Model 6
Both First Either Both First Either Both First Either Both First Either Both First Either Both First Either
FORCE Powera 0.010.080.140.380.6 0.82 0.030.190.340.160.39 0.62 0.040.210.380.10.34 0.57
FWERb 0.02 0.070.07 0.04 0.02 0.02
MBMDR Power 0.75 0.86 0.97 0.96 0.98 1 00.090.170.010.070.1300.070.130.030.160.28
FWER 0 0 0 0 0 0
Plink Epistasis Power000 0.91 0.96 1 00.020.030.050.29 0.52 00.130.260.020.250.47
FWER 0.01 0.02 0.01 0.05 0 0
FastEpistasis Power000 0.96 0.98 1 00.020.040.080.31 0.54 00.070.140.020.180.34
FWER 0.02 0.03 0.01 0.07 0.01 0

aIn bold, power higher than 50%. bIn bold, FWER lower than 5%.

Table 11

Power of FORCE detection method, impact of various sources of noise and combinations of them for the 6 epistatic models

Type of noise Model 1 a Model 2 Model 3 Model 4 Model 5 Model 6
No noise 1 0.97 0.96 1 0.93 0.99
Genotype errors (GE) 0.99 0.99 0.97 1 0.95 0.99
Phenocopy (PC)0.08 0.55 0.130.330.230.37
Misssing data (MS) 0.99 0.96 0.95 1 0.93 0.99
GE + PC0.05 0.62 0.180.30.310.35
GE + MS 0.95 0.98 0.96 1 0.91 0.99
PC + MS0.06 0.52 0.210.310.210.26
GE + PC + MS0.09 0.55 0.210.460.130.35
both first either both first either both first either both first either both first either both first either
Genetic heterogeneity (GH)0.010.080.140.380.6 0.82 00.190.340.20.39 0.62 00.210.380.10.34 0.57
GH + GE0.010.090.160.340.6 0.85 0.030.180.320.170.4 0.62 0.040.230.410.140.31 0.50
GH + PC00.0150.030.020.090.1600.010.0200.020.040.010.0350.060.010.040.07
GH + MS0.010.040.070.370.57 0.77 0.020.1450.270.180.385 0.59 0.030.190.350.070.28 0.50
GH + GE + PC00.010.020.030.1050.1800.020.0400.0250.0500.0250.0500.030.06
GH + GE + MS00.050.10.330.665 0.80 0.020.1550.290.130.385 0.64 0.030.230.430.130.3050.48
GH + PC + MS00.0050.010.010.0950.1800.0250.0500.040.0800.0050.0100.0350.07
GH + GE + PC + MS00.0150.030.010.080.1500.0150.0300.0350.0700.0150.0300.030.06

GE: Genotyping errors – GH: Genetic heterogeneity – MS: Missing data – PC: Phenocopy. In case of GH, power is calculated in 3 different ways as the proportion of datasets in which both, the first or either of the interacting pairs are detected. aIn bold, power higher than 50%.

Table 12

Family-wise error rate (FWER) of FORCE for the 6 epistatic models and 16 noise conditions tested

Family-wise error rate Model 1 a Model 2 Model 3 Model 4 Model 5 Model 6
No noise0.050.020.02 0.06 0.050.04
Genotype errors (GE)0.030.03 0.08 0.030.030.04
Genetic heterogeneity (GH)0.02 0.07 0.07 0.040.020.02
Phenocopy (PC)0.030.050.05 0.06 0.04 0.07
Misssing data (MS) 0.07 0.020.020.030.03 0.06
GE + GH0.05 0.07 0.040.030.01 0.07
GE + PC0.050.050.010.020.030.02
GE + MS0.020.01 0.06 0.040.04 0.07
GH + PC0.050.050.030.020.030.03
GH + MS0.04 0.07 0.050.030.030.01
PC + MS0.030.020.03 0.06 0.040.03
GE + GH + PC 0.07 0.03 0.06 0.030.020.04
GE + GH + MS 0.07 0.050.040.010.050.05
GH + PC + MS0.040.02 0.06 0.020.030.03
GE + PC + MS0.05 0.08 0.06 0.06 0.050.05
GE + GH + PC + MS0.02 0.07 0.06 0.040.05 0.06

GE: Genotyping errors – GH: Genetic heterogeneity – MS: Missing data – PC: Phenocopy. aIn bold, FWER > 0.05.

Power of different approaches to detect simulated epistatic interactions across the six epistasis models by Ritchie [ 28 ], in the presence of noise. Comparison of the power of four methods to detect interaction in the presence of one source of noise. GH: Genetic heterogeneity – GE: Genotyping errors – MS: Missing data – PC: Phenocopy. When GH is simulated, three different ways of calculating power are employed: the power of detecting both pairs in the same dataset, the power of detecting the first (fixed) pair and the power to detect either of the two epistatic pairs. Purple: FORCE – Green: MB-MDR – Blue: PLINK Epistasis – Red: PLINK FastEpistasis. Power and family-wise error rate (FWER) of FORCE, MBMDR, Plink Epistasis and Plink FastEpistasis on 6 epistasis models with or without noise Genotype errors (GE), missing data (MS) or phenocopy (PC). aIn bold, power higher than 50%. bIn bold, FWER lower than 5%. Power and family-wise error rate (FWER) of FORCE, MBMDR, Plink Epistasis and Plink FastEpistasis on 6 epistasis models without noise or with simulated genetic heterogeneity (GH) aIn bold, power higher than 50%. bIn bold, FWER lower than 5%. Power of FORCE detection method, impact of various sources of noise and combinations of them for the 6 epistatic models GE: Genotyping errors – GH: Genetic heterogeneity – MS: Missing data – PC: Phenocopy. In case of GH, power is calculated in 3 different ways as the proportion of datasets in which both, the first or either of the interacting pairs are detected. aIn bold, power higher than 50%. Family-wise error rate (FWER) of FORCE for the 6 epistatic models and 16 noise conditions tested GE: Genotyping errors – GH: Genetic heterogeneity – MS: Missing data – PC: Phenocopy. aIn bold, FWER > 0.05.

Discussion

This study introduces the FORCE approach for genome-wide epistasis analysis. On the basis of the Interaction Odds Ratio (IOR) statistic, it performs a genome-wide search for epistatic interactions between pairs of SNPs in a reasonable time on a standard laptop computer. The search is exhaustive and filter-free; i.e., the result is guaranteed to reflect the most extreme IOR values over all possible interactions. Exhaustive search using FORCE is possible because of the computational simplicity of the IOR statistic. Wu et al. [38] introduced a haplotype-based measure based on the following term:where is the odds ratio for both risk haplotypes when carried together, compared to the baseline haplotypes; and are the odds ratios for each risk haplotype, respectively, compared to the baseline haplotype. Although both methods are based on odds ratios, the methods differ in several respects. First, and most significantly, Wu’s method uses haplotypes, which typically require the statistical inference of haplotypes. Even though this design was shown to be better powered than classical genotype-based statistics, the additional calculations are computationally costly. As a result, FORCE can perform an exhaustive genome-wide epistasis search in a few days on a single compute core while, in practice, Wu’s method only allows a limited number of SNP pairs to be tested. In addition to the different statistics themselves, the approaches to calculating significance differ. FORCE relies on an exact P-value that requires too much time to be calculated exhaustively for all SNP pairs. Instead, P-values are calculated only for pairs with the highest IOR. Conversely, Wu et al. used an approximate, chi-square distribution-based, P-value which can be applied to each investigated pair of the search. Our study on WTCCC psoriasis data suggests that the computational effort for exhaustive testing is currently not just a luxury. The popular class of conditional analyses focuses only on possible interactions of previously implicated SNPs – often the only option to perform large-scale analysis in reasonable time. When comparing conditional and exhaustive FORCE analyses, we found that the conditional approach only detects interactions of vastly weaker statistical significance. Our systematic study on small simulated datasets indicates that FORCE not only “goes farther” than existing approaches because of its exhaustive search, but also detects fundamentally different types of interactions, in particular in the biologically more relevant models 3–6. In two out of six models of epistatic interaction described by Ritchie [28], and across the different sources of noise in the data, FORCE consistently displayed a good power of detection compared to other approaches. Interestingly, each of the four approaches is always less efficient than another for at least one model associated with one type of noise. Finally, by applying FORCE to WTCCC psoriasis data, we were able to detect statistical interactions between SNPs in the HLA region, even after the exclusion of all SNPs with main effects. To our knowledge this constitutes the first demonstration that the genetic structure of the HLA region cannot be understood by the analysis of main effects alone and that more than one interacting locus exists in that region.

Conclusions

Together, the different elements of our study suggest that FORCE represents a valuable new addition to the arsenal of genome-wide epistasis detection approaches for case–control studies. As with other approaches, the additionally detected interactions are a priori of a statistical nature, and require detailed analysis and follow-up. Beyond this, our study has provided an example for the need for exhaustive epistasis analysis. In the future, exhaustive analysis will be facilitated by the ever-increasing computational power available to biological research. On one hand, this may enable the exhaustive calculation of FORCE P-values, which can be expected to lead to a potentially much enlarged set of statistically significant interactions. On the other hand, more computational power, as well as algorithmic improvements, may also render exhaustive analysis under those models of interactions feasible for which running times are prohibitive today. Finally, we believe that these improvements are necessary for the integration of different types of interactions and other types of large-scale data, which may ultimately be key to understanding the genetic basis of complex diseases.
  36 in total

1.  On safari to Random Jungle: a fast implementation of Random Forests for high-dimensional data.

Authors:  Daniel F Schwarz; Inke R König; Andreas Ziegler
Journal:  Bioinformatics       Date:  2010-05-26       Impact factor: 6.937

Review 2.  Biostatistical aspects of genome-wide association studies.

Authors:  Andreas Ziegler; Inke R König; John R Thompson
Journal:  Biom J       Date:  2008-02       Impact factor: 2.207

3.  Model-Based Multifactor Dimensionality Reduction to detect epistasis for quantitative traits in the presence of error-free and noisy data.

Authors:  Jestinah M Mahachie John; François Van Lishout; Kristel Van Steen
Journal:  Eur J Hum Genet       Date:  2011-03-16       Impact factor: 4.246

4.  Non-hierarchical logistic models and case-only designs for assessing susceptibility in population-based case-control studies.

Authors:  W W Piegorsch; C R Weinberg; J A Taylor
Journal:  Stat Med       Date:  1994-01-30       Impact factor: 2.373

5.  Application of Genetic Algorithms to the Discovery of Complex Models for Simulation Studies in Human Genetics.

Authors:  Jason H Moore; Lance W Hahn; Marylyn D Ritchie; Tricia A Thornton; Bill C White
Journal:  Proc Genet Evol Comput Conf       Date:  2002-07-01

6.  Meta-analysis confirms the LCE3C_LCE3B deletion as a risk factor for psoriasis in several ethnic groups and finds interaction with HLA-Cw6.

Authors:  Eva Riveira-Munoz; Su-Min He; Georgia Escaramís; Philip E Stuart; Ulrike Hüffmeier; Catherine Lee; Brian Kirby; Akira Oka; Emiliano Giardina; Wilson Liao; Judith Bergboer; Kati Kainu; Rafael de Cid; Batmunkh Munkhbat; Patrick L J M Zeeuwen; John A L Armour; Annie Poon; Tomotaka Mabuchi; Akira Ozawa; Agnieszka Zawirska; A David Burden; Jonathan N Barker; Francesca Capon; Heiko Traupe; Liang-Dan Sun; Yong Cui; Xian-Yong Yin; Gang Chen; Henry W Lim; Rajan P Nair; John J Voorhees; Trilokraj Tejasvi; Ramón Pujol; Namid Munkhtuvshin; Judith Fischer; Juha Kere; Joost Schalkwijk; Anne Bowcock; Pui-Yan Kwok; Giuseppe Novelli; Hidetoshi Inoko; Anthony W Ryan; Richard C Trembath; André Reis; Xue-Jun Zhang; James T Elder; Xavier Estivill
Journal:  J Invest Dermatol       Date:  2010-11-25       Impact factor: 8.551

Review 7.  Current understanding of human genetics and genetic analysis of psoriasis.

Authors:  Akira Oka; Tomotaka Mabuchi; Akira Ozawa; Hidetoshi Inoko
Journal:  J Dermatol       Date:  2012-03       Impact factor: 4.005

8.  Case-only gene-environment interaction studies: when does association imply mechanistic interaction?

Authors:  Tyler J VanderWeele; Sonia Hernández-Díaz; Miguel A Hernán
Journal:  Genet Epidemiol       Date:  2010-05       Impact factor: 2.135

9.  The NHGRI GWAS Catalog, a curated resource of SNP-trait associations.

Authors:  Danielle Welter; Jacqueline MacArthur; Joannella Morales; Tony Burdett; Peggy Hall; Heather Junkins; Alan Klemm; Paul Flicek; Teri Manolio; Lucia Hindorff; Helen Parkinson
Journal:  Nucleic Acids Res       Date:  2013-12-06       Impact factor: 16.971

10.  GWIS--model-free, fast and exhaustive search for epistatic interactions in case-control GWAS.

Authors:  Benjamin Goudey; David Rawlinson; Qiao Wang; Fan Shi; Herman Ferra; Richard M Campbell; Linda Stern; Michael T Inouye; Cheng Soon Ong; Adam Kowalczyk
Journal:  BMC Genomics       Date:  2013-05-28       Impact factor: 3.969

View more
  1 in total

1.  DNA methylation as a mediator of HLA-DRB1*15:01 and a protective variant in multiple sclerosis.

Authors:  Lara Kular; Yun Liu; Sabrina Ruhrmann; Galina Zheleznyakova; Francesco Marabita; David Gomez-Cabrero; Tojo James; Ewoud Ewing; Magdalena Lindén; Bartosz Górnikiewicz; Shahin Aeinehband; Pernilla Stridh; Jenny Link; Till F M Andlauer; Christiane Gasperi; Heinz Wiendl; Frauke Zipp; Ralf Gold; Björn Tackenberg; Frank Weber; Bernhard Hemmer; Konstantin Strauch; Stefanie Heilmann-Heimbach; Rajesh Rawal; Ulf Schminke; Carsten O Schmidt; Tim Kacprowski; Andre Franke; Matthias Laudes; Alexander T Dilthey; Elisabeth G Celius; Helle B Søndergaard; Jesper Tegnér; Hanne F Harbo; Annette B Oturai; Sigurgeir Olafsson; Hannes P Eggertsson; Bjarni V Halldorsson; Haukur Hjaltason; Elias Olafsson; Ingileif Jonsdottir; Kari Stefansson; Tomas Olsson; Fredrik Piehl; Tomas J Ekström; Ingrid Kockum; Andrew P Feinberg; Maja Jagodic
Journal:  Nat Commun       Date:  2018-06-19       Impact factor: 14.919

  1 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.