Literature DB >> 23396514

An information-gain approach to detecting three-way epistatic interactions in genetic association studies.

Ting Hu1, Yuanzhu Chen, Jeff W Kiralis, Ryan L Collins, Christian Wejse, Giorgio Sirugo, Scott M Williams, Jason H Moore.   

Abstract

BACKGROUND: Epistasis has been historically used to describe the phenomenon that the effect of a given gene on a phenotype can be dependent on one or more other genes, and is an essential element for understanding the association between genetic and phenotypic variations. Quantifying epistasis of orders higher than two is very challenging due to both the computational complexity of enumerating all possible combinations in genome-wide data and the lack of efficient and effective methodologies.
OBJECTIVES: In this study, we propose a fast, non-parametric, and model-free measure for three-way epistasis.
METHODS: Such a measure is based on information gain, and is able to separate all lower order effects from pure three-way epistasis.
RESULTS: Our method was verified on synthetic data and applied to real data from a candidate-gene study of tuberculosis in a West African population. In the tuberculosis data, we found a statistically significant pure three-way epistatic interaction effect that was stronger than any lower-order associations.
CONCLUSION: Our study provides a methodological basis for detecting and characterizing high-order gene-gene interactions in genetic association studies.

Entities:  

Keywords:  epistasis; gene-gene interaction; genetic association studies; high-order interaction; information gain

Mesh:

Year:  2013        PMID: 23396514      PMCID: PMC3721169          DOI: 10.1136/amiajnl-2012-001525

Source DB:  PubMed          Journal:  J Am Med Inform Assoc        ISSN: 1067-5027            Impact factor:   4.497


Introduction

Understanding the mapping from genetic variation to phenotypic variation has a great potential helping us understand, predict, diagnose, and treat common human diseases. However, existing main-effect-centered methodologies and techniques that depend on fundamental assumptions about a simple genetic architecture can only find very limited individual associations with disease risks. Genome-wide association studies1–3 and next generation sequencing4 make millions of single nucleotide polymorphisms (SNPs) in the human genome available for testing associations with phenotypic traits. These developments call for new methodologies that embrace the complex genetic architecture of diseases.5 6 The non-additive effects of gene–gene interactions, that is, epistasis, are believed to be an important contributor to the complex relationship between genetic and phenotypic variations.7–11 The focus of recent disease association research is shifting from identifying single locus susceptibility to quantifying interaction effects between multiple candidate loci throughout the human genome.8 9 12 Detecting and quantifying epistasis is a very challenging task. First, the epistatic interactions could involve multiple genes from a pair to a large set, and this undetermined order of interactions imposes enormous computational complexity of enumerating all possible combinations of genetic attributes for varying orders.6 13 Second, as the order of interacting genetic attributes goes beyond two, it becomes mathematically difficult to separate the additive lower-order effects and the pure higher-order synergy, that is, the extreme case in which the association can only be observed when all attributes are considered together. Those are also the major reasons why most existing epistasis studies are limited to pairwise interactions on genetics data of moderate sizes. Information-theoretic measures have emerged as a very useful tool to quantify synergistic interactions among multiple genetic attributes.14–21 These measures are based on the Shannon entropy, which quantifies the amount of information, or uncertainty, of a random variable.22 By considering genetic attributes and phenotypic traits as random variables, entropy-based information-theoretic measures can be used to quantify the shared information between one gene and a trait, i.e., the main effect, as well as the gained extra information about a trait obtained from combining multiple genes, i.e., the synergistic effect or epistasis. However, as discussed previously, due to the mathematical complexity, the application of information-theoretic measures in disease association studies is mainly limited to pairwise epistasis between two genetic attributes. In this study, we propose a new measure to quantify the synergistic effects among three genetic attributes that contribute to disease susceptibility by extending the information-theoretic measure for pairwise synergy. In particular, we first measure the total amount of information that three attributes together can provide about the phenotypic status, and then subtract all lower-order effects including the main effects of the three attributes and all pairwise synergies between them. This yields a very strict measure of pure three-way epistasis. There have been previous attempts at extending information-theoretic measures on three-way and higher-order synergies.14 17 23 24 However, most existing measures are not able to decouple lower-order interaction effects from the higher-order effects and the formalization of higher-order synergy is still debatable. We compared our new measure to those existing ones and were able to show that our measure performed the best at separating all lower-order effects from the pure three-way epistasis by applying to both synthetic data containing artifact epistasis models and real human disease data from a candidate-gene association study of tuberculosis in a population from West Africa.25 Of particular interest, we identified a statistically significant three-way epistasis model in the tuberculosis data, in which the three-way synergy is stronger than all the main effects and pairwise-interaction effects combined. With further biological verification and interpretation, this model could be very valuable in advancing tuberculosis research.

Methods

Information-theoretic measures

In information theory,22 entropy is a measure of the uncertainty of a random variable. It can be explained as the amount of information required on average to describe a random variable. For a discrete variable with alphabet and probability mass function , its entropy is defined as When there are more than one random variable, the definition of entropy can be extended as follows. The joint entropy of two discrete random variables and with a joint distribution is defined asand the conditional entropy of given the knowledge of can be obtained by the chain rule as The dependency between two random variables can be described using mutual information.22 This is a measure of the amount of information that one random variable contains about the other, or can be thought of as the reduction of uncertainty of one random variable given the knowledge of the other. In the context of genetic association studies, mutual information can be very useful to quantify how much of a phenotypic status is explained by genotypic variations. We consider a genetic attribute and the phenotypic class , for example, case or control, are both discrete random variables. The mutual information measures the reduction in the uncertainty of the class C due to the knowledge about the genotype of (figure 1A), defined asIntuitively, can be used as a measure of the main effect of the genetic attribute on the class .
Figure 1

Venn diagrams showing the entropy and mutual information of genetic attributes and the phenotypic class C. (A) For one attribute G1 and the phenotypic class C, their entropies and are indicated as the two colored sets. The mutual information is defined as the intersection of the two sets, and the joint entropy H(G1,C) is the reunion of the two sets. (B) For two genetic attributes , and the phenotypic class , the mutual information is the intersection of entropy and joint entropy .

Venn diagrams showing the entropy and mutual information of genetic attributes and the phenotypic class C. (A) For one attribute G1 and the phenotypic class C, their entropies and are indicated as the two colored sets. The mutual information is defined as the intersection of the two sets, and the joint entropy H(G1,C) is the reunion of the two sets. (B) For two genetic attributes , and the phenotypic class , the mutual information is the intersection of entropy and joint entropy . Mutual information can also be extended to measure the epistatic interaction effect between two attributes. Given two genetic attributes and , the mutual informationmeasures how much of the phenotypic class joining and together can explain (figure 1B). By subtracting the individual main effects of and from their joint effect , that is,the information gain is the gain of mutual information of knowing both and with respect to the class . A positive value of indicates the synergy between and , while a negative value indicates the redundancy or correlation between them. The synergy can be well explained using the epistatic interactions between two genetic attributes. As discussed previously, this pairwise information-gain measure has been successfully applied in many epistasis studies thanks to its model-free, non-parametric, and fast implementation. Further extension of the information-gain measure on more than two genetic attributes is non-trivial for epistasis studies because many complex human diseases could very likely involve genetic interactions of orders higher than two way. There is no widely accepted formal definition of information gain including genetic attributes higher than two. Here, we make an effort measuring three-way synergistic interactions using information gain. In a previous attempt, Anastassiou14 and Varadan et al24 proposed to define the three-way information gain by comparing the integrated joint mutual information to the best-achieved subsets mutual information after breaking the whole into partitions, mathematically written as The partition of the set chosen in this formula is the one that maximizes the sum of the amounts of mutual information connecting the subsets with the phenotypic class. This is referred to as ‘maximum-information partition’ of the set with respect to the class . This three-way quantifies the information that can be gained by combining and together comparing to its maximum-information partition. Although technically sound, this formula might include false-positive errors of pure three-way epistasis. For instance, assuming and is the maximum-information partition, after combining with , the gained information could be the result of either the pure three-way epistasis, or the pairwise epistasis between and one (or both) of , or the mixture of all above. A more strict alternative measure23 was proposed as followswhere all the lower-order effects are subtracted. However, as reviewed and pointed out in Anastassiou,14 this formula fails in the extreme redundancy case where all , , and provide the same full amount of information on , that is, . In this case, , where are different values taken from . Therefore , which indicates the contradictory extreme synergy. In our study, we propose a new strict measure by modifying as We only subtract pairwise synergies, that is, positive information gain, because the failure of is due to the fact that it adds back information by subtracting negative information gain. By subtracting all lower-order effects and synergies, measures the pure three-way synergy that is observable only by considering three attributes together. Also note that, when applying to genetics data, the above mutual-information and information-gain measures can be normalized by dividing the class entropy . The normalized measures range from , and provide the percentage of explaining the phenotypic class by giving the knowledge of one or multiple genetic attributes.

Datasets

We applied both three-way synergy measures and to synthetic datasets and a real dataset on pulmonary tuberculosis from a West African population. The synthetic datasets were generated using genetic architecture model emulator for testing and evaluating software (GAMETES),26 27 a direct approach to simulating bi-allelic n-locus epistatic models. In GAMETES, an n-locus epistasis model is generated deterministically with specified genetic constraints such as heritability and minor allele frequencies, and then a population of samples can be generated for that model. Using GAMETES, we can have synthetic data of models with epistasis at desired orders, which are ideal for testing and comparing the two three-way synergy measures. We first generated a pure three-way epistasis model , where the association to phenotypic status was only observable when all three SNPs were considered together, that is, no main effects and no pairwise interactions. The total heritability of combining three SNPs was set to 0.27, and the minor allele frequencies of all three SNPs were set to 0.2. A corresponding dataset was generated by including 100 SNPs in total with 97 randomized SNPs, , to provide a null distribution. This synthetic dataset had 400 cases and 400 controls. Then we used GAMETES to generate a collective pairwise epistasis model , in which any two of the three attributes had strong pairwise interactions but there was no epistasis at the three-way level. Again there was no main effect for each attribute. The heritability of combining three SNPs was set to 0.32, and the heritability of combining any two SNPs, that is, , and , was about 0.1. The minor allele frequencies for all three SNPs were again set to 0.2. The second synthetic dataset also included 100 SNPs in total with 97 randomized SNPs and had 400 cases and 400 controls. The pulmonary tuberculosis data are from a case–control study25 conducted at the Bandim Health Project in Bissau, the capital city of Guinea-Bissau. This area has a high prevalence of pulmonary tuberculosis and tuberculosis symptoms.28 Tuberculosis is one of the highest mortality diseases due to the infection of Mycobacterium tuberculosis. However, the majority of infected individuals keep the bacterium under control and never develop a clinical disease. Genetic variation among individuals is a promising direction to look into the factors that could influence the susceptibility to develop tuberculosis. Tuberculosis patients included in this study were residents or long-term guests of Bissau aged 15 years or older. From November 2003 to November 2005, 438 tuberculosis patients were screened at local health centers. A total of 344 subjects met the inclusion criteria and accepted participation, and DNA samples were successfully collected from 321 of them. Healthy controls were recruited from the study area with certain exclusion criteria, such as history of tuberculosis and household tuberculosis records within the past 2 years. Three hundred and forty-seven DNA samples were obtained from the healthy control group. All DNA samples were extracted using a standard salting-out procedure. DNA purities were estimated spectrophotometrically, and final concentrations were determined by PicoGreen. Samples (4 ng of DNA) were genotyped by TaqMan SNP assays (ABI; Applera International Inc, Foster City, California, USA) in 10 μl reaction volume, using the Rotor-Gene 3000 (Corbett Robotics Pty Ltd, Brisbane, Queensland, Australia) and the ABI 7500 real-time PCR systems. Fluorescence curves were analyzed with the Rotor-Gene Software V.6 and the 7500 Sequence Detection Software V.1.2.1 for allelic discrimination. The data include 19 SNPs from innate immunity genes, DC-SIGN (CD209), long pentraxin 3 (PTX3), toll-like receptors (TLRs), and vitamin D receptor (VDR), which are relevant to the defense against M tuberculosis. The missing genotypes (<5%) were imputed using a frequency-based method. The missing value of a sample was filled using the most common genotype of the corresponding SNP in the population.

Results

Information-gain measurements on synthetic data

Figure 2A shows the results of applying information-theoretic measures to the first synthetic dataset that had a pure three-way epistasis model. The points in red represent the observed values of the model , and the box plots in black show the measures for the randomized SNPs . Neither main effect nor pairwise synergy was found using the information-theoretic measures as the observed data points do not distinguish from the null distributions. In addition, both three-way synergy measurements and successfully captured the pure three-way epistasis in the model.
Figure 2

Information-theoretic measures (normalized information gain and mutual information representing the genetic associations to the case–control status) of the synthetic datasets generated by GAMETES. (A) The first synthetic dataset that has a pure three-way epistasis model with no main effects and no pairwise epistasis. (B) The second synthetic dataset with a collective pairwise epistasis model , in which there are no main effects and no three-way epistasis but all the pairs , , and have interaction effects. Points in red represent the observed values of the model in (A) and in (B). Box plots in black show the null distributions of randomized single nucleotide polymorphisms.

Information-theoretic measures (normalized information gain and mutual information representing the genetic associations to the case–control status) of the synthetic datasets generated by GAMETES. (A) The first synthetic dataset that has a pure three-way epistasis model with no main effects and no pairwise epistasis. (B) The second synthetic dataset with a collective pairwise epistasis model , in which there are no main effects and no three-way epistasis but all the pairs , , and have interaction effects. Points in red represent the observed values of the model in (A) and in (B). Box plots in black show the null distributions of randomized single nucleotide polymorphisms. The results of the second synthetic dataset with the collective pairwise epistasis model are shown in figure 2B. Again, the points in red represent the observed values of the model , and the box plots in black show the null distributions of 97 randomized SNPs. As we can see, all three pairwise synergies were detected. No three-way epistasis was detected by , but reported a strong three-way epistasis among , , and by including some portions of their pairwise synergies.

Information-gain measurements on real data

We used information-theoretic measures to quantify the main effects, two-way and three-way synergies on all possible combinations of attributes in the tuberculosis data. In addition, to show the collective and neighborhood structures of strong synergistic pairs, we built a pairwise epistasis network18 (figure 3, rendered by software Cytoscape29). The network was constructed through an incrementing process as follows. An edge and its two end vertices were added to the network only if their pairwise epistasis strength was greater than a given threshold. When we gradually decreased the threshold from its maximal observed value to its minimal value, the network started from a single edge with two vertices and eventually became a complete graph, that is, every single vertex is directly connected to every other vertex. We chose the highest pairwise synergy threshold, that is, 0.71%, when all 19 attributes were included in the network. Therefore, we can have a map of the strongest pairwise interactions showing the neighborhood structure of every attribute.
Figure 3

The pairwise statistical epistasis network of the tuberculosis data. The network has 32 edges (pairwise interactions) and 19 vertices (single nucleotide polymorphisms; SNPs). The size of a vertex indicates its main effect, and edge width indicates the pairwise synergy. The color of a vertex denotes the gene that a SNP belongs to, with blue representing CD209, green representing VDR, pink representing PTX3, and yellow representing TLRs. The network was built by incrementally adding pairwise interactions, ranked by their strength (pairwise information gain), and their two end SNPs. This network construction process completed when all 19 SNPs were included. It thus shows the strongest pairwise interactions and the neighborhood structures for all 19 SNPs.

The pairwise statistical epistasis network of the tuberculosis data. The network has 32 edges (pairwise interactions) and 19 vertices (single nucleotide polymorphisms; SNPs). The size of a vertex indicates its main effect, and edge width indicates the pairwise synergy. The color of a vertex denotes the gene that a SNP belongs to, with blue representing CD209, green representing VDR, pink representing PTX3, and yellow representing TLRs. The network was built by incrementally adding pairwise interactions, ranked by their strength (pairwise information gain), and their two end SNPs. This network construction process completed when all 19 SNPs were included. It thus shows the strongest pairwise interactions and the neighborhood structures for all 19 SNPs. Figure 3 has 19 vertices and 32 edges that represent the top 32 strongest pairwise interactions out of all pairs. Using this network, we can easily identify three-way models that have strong collective pairwise synergies. In graph theory,30 the distance of a pair of vertices and is defined as the minimal number of edges for one vertex to reach the other. Given three vertices , , and , we define their trio distance as the sum of all pairwise distances, that is, . Therefore, for trios with , any two of them are directly joined by an edge, which indicates three strong collective pairwise interactions in a three-way model. If a trio has , one vertex is directly connected to the other two but the other two are not joined by an edge. A trio with does not have strong collective pairwise epistasis. Figure 4 shows the comparison of the results of both three-way synergy measures. In general, and are positively correlated (Spearman's rank correlation ρ=0.8448, ). All the data points, that is, three-way models, are positioned on one side of the line, which indicates that the measure is always less than or equal to the measure. This is intuitive because subtracts more terms than does from . Colors indicate whether a trio has strong collective pairwise epistasis. As seen in the figure, the discrepancies (away from the line) between and are the most distinguishing for red data points, that is, trios that have either two or three strong collective pairwise interactions. These results also verify our previous discussion on these two three-way synergy measures using synthetic data. That is, probably includes pairwise synergies into its three-way epistasis measure when there are more than one pairwise synergies in a three-way model, and only captures the pure three-way epistasis that are beyond any lower-order synergies or effects.
Figure 4

Comparison of two three-way synergy measurements in the tuberculosis data. Each data point represents a three-way model and its color indicates whether the three-way model has strong collective pairwise epistasis. If a trio's distance is less than or equal to 4 in the pairwise epistasis network (figure 3), this three-way model has strong collective pairwise interactions among its three attributes (in red). Otherwise a trio does not possess strong collective pairwise epistasis (in black). The dashed curve represents the line.

Comparison of two three-way synergy measurements in the tuberculosis data. Each data point represents a three-way model and its color indicates whether the three-way model has strong collective pairwise epistasis. If a trio's distance is less than or equal to 4 in the pairwise epistasis network (figure 3), this three-way model has strong collective pairwise interactions among its three attributes (in red). Otherwise a trio does not possess strong collective pairwise epistasis (in black). The dashed curve represents the line. We also looked into the correlations between the three-way synergies and lower-order synergies or effects (table 1). Both three-way synergy measures do not correlate with individual main effects. However, the three-way shows a correlation with pairwise synergy but does not. This further confirms our previous discussion on the differences between these two three-way synergy measures.
Table 1

Spearman's rank correlation of association synergies or effects at different model orders in the tuberculosis dataset

Three-way IGpartitionThree-way IGstrict
Main effect
Pairwise synergy
Spearman's rank correlation of association synergies or effects at different model orders in the tuberculosis dataset The best three-way models using those two synergy measures are reported in figure 5. The figure shows all the individual main effects, pairwise synergies, three-way synergies, and the total mutual information for a three-way model. The best model (figure 5A) includes SNPs rs11465421 from CD209, rs1544410 from VDR, and rs7975232 from VDR. It has the total mutual information 6.695% and three-way 3.668%. This model is clearly a mixed epistasis model by possessing both strong collective pairwise interactions and a pure three-way interaction. The best model (figure 5B) involves SNPs from three different genes, rs5743836 from TLR9, rs2305619 from PTX3, and rs4804803 from CD209. All three SNPs have very limited main effects and pairwise synergies. However, the strict three-way information gain is stronger than all other lower-order effects combined together, and contributes to the total association. An explicit test of epistasis31 was used to assess the statistical significance of those observations. Instead of shuffling the case–control class in a standard permutation test, the explicit test randomly shuffled genotypes of samples within each class. Therefore, the genotype frequencies within each class remained fixed. This preserved the independent main effects while randomizing any non-linear interactions, and provided the null hypothesis that the only genetic effects in the data were linear and additive. We performed the explicit test 1000 times for each observation, and both best models were statistically significant ( for the best and for the best ).
Figure 5

The models with the highest three-way information gain in the tuberculosis data. (A) The best model of with permutation testing significance , and (B) the best model of with permutation testing significance . Each circle is a single nucleotide polymorphism (SNP) with its name and main effect strength. An edge represents a pairwise epistasis with its strength labeled. is the total mutual information between combining three SNPs and the case–control status, and is the three-way information gain. All measures are normalized by dividing the entropy of the case–control status , and thus give the percentage of predicting information on the phenotype status.

The models with the highest three-way information gain in the tuberculosis data. (A) The best model of with permutation testing significance , and (B) the best model of with permutation testing significance . Each circle is a single nucleotide polymorphism (SNP) with its name and main effect strength. An edge represents a pairwise epistasis with its strength labeled. is the total mutual information between combining three SNPs and the case–control status, and is the three-way information gain. All measures are normalized by dividing the entropy of the case–control status , and thus give the percentage of predicting information on the phenotype status.

Discussion

Epistasis is recognized as playing an important role in the genetic architecture of complex traits such as common human diseases. Quantifying interaction effects among multiple loci throughout the human genome has become the major focus of current research for understanding the complex relationship between genetic variations and phenotypic traits.8 9 12 However, this task is challenging due to the fact that first enumerating all possible combinations of genetic variants in a dataset of moderate size is computationally infeasible, and second it is difficult to separate the additive effects and the synergistic effects among multiple genetic factors. The majority of epistasis studies focus on pairwise interactions and rarely look beyond interactions of orders higher than two. It is computationally intensive to enumerate three-way combinations in human genome data and, furthermore, not many good methods of quantifying three-way epistasis are proposed in the literature.14 In the present study, we proposed a fast, model-free, and non-parametric measure to detect and characterize three-way epistatic interactions. It is a natural extension of the pairwise synergy measure using information gain by measuring the total three-way mutual information and then subtracting three main effects and three pairwise synergies. Our approach was shown to be able to detect pure three-way epistasis, that is, no observable effect at either the two-way or one-way level, in both synthetic and real population-based genetics data. Our study is not the first attempt to quantify three-way epistasis using information-theoretic measures. We compared our measure to a previously published one .32 Both measures exclude main effects. However, is biased towards three-way models that possess strong pairwise interactions. Our excludes all one-way and two-way effects and is able to detect pure three-way synergy that is only observable when all three attributes are considered together. Both three-way synergy measures were applied to a pulmonary tuberculosis dataset from a West African population. Several potentially relevant innate immunity genes, CD209, PTX3, TLRs, and VDR, were included in these data to investigate their role in pulmonary tuberculosis susceptibility. The dendritic cell-specific intercellular adhesion molecule-3-grabbing non-integrin (DC-SIGN or CD209) is a crucial M tuberculosis receptor expressed on the surface of dendritic cells, and is involved in the initiation of innate immune response through identification of potential infectious agents. CD209 has previously been found to be associated with tuberculosis.33 34 Long pentraxin 3 (PTX3) is produced by innate immunity cells and vascular cells in response to proinflammatory signals and TLR engagement.35 PTX3 levels have been shown to be correlated to the degree of infection in lungs, and PTX3 plasma levels can be monitored to measure treatment efficacy because PTX3 concentration decreases as an infection is mitigated.36 TLRs are a family of receptors that are a key component in the innate immune system. TLRs recognize pathogenic molecules and control host immune response against them, and have been extensively proved to impact susceptibility to infectious and inflammatory diseases.37 38 The VDR has been shown to be linked to TLRs. It was found that TLR activation of human macrophages upregulated expression of the VDR and the vitamin D1-hydroxylase genes.39 This link suggests that the difference among human populations’ ability in producing vitamin D may contribute to susceptibility to microbial infection. Furthermore, a case–control and family study reported the association between VDR polymorphisms and susceptibility to tuberculosis.40 The strongest three-way epistasis models we found (figure 5) could further extend and strengthen the understanding of how those relevant genes might work in a synergistic way. Although the synergistic effects were captured in a statistical manner, they may indicate either direct or indirect biological relationships among those genetic factors. In particular, the strongest model shows a ‘nested’ epistasis hierarchy with both strong pairwise interactions and a strong pure three-way interaction. These three SNPs are from VDR and CD209, which indicates a potential correlation between these two genes. More interestingly, the strongest model shows a three-way synergy among TLR9, PTX3, and CD209. There have been studies reporting correlations and molecular interactions between TLRs and CD209.41 However, no published research has indicated the three-way biological synergy among all three genes. Our findings may suggest that multiple pathways interweave in the innate immune system to defend the human body against M tuberculosis, and the deficiencies in all those three genetic factors greatly increase the risk of developing tuberculosis. We believe that with further biological validations, our findings could be very helpful in predicting high-risk M tuberculosis-infected individuals and preventing their tuberculosis clinical developments. Investigating high-order genetic interactions is an arduous task, and yet essential for understanding the complex genetic architecture of human diseases. The effectiveness of our information-gain approach in detecting three-way interactions was verified using both synthetic and real genetics data. Note that our measurement is scalable regardless of the size of genetics data. However, when large-scale genome-wide data are considered and exhaustive enumeration of all possible three-way genetic attribute combinations is infeasible, data pre-screening using intelligent data-mining techniques or biology knowledge will be required. There are some interesting venues to extend our approach in future studies. First, it is important to study the genetic interactions on continuous traits. In that case, the probability density functions for continuous random variables will be used to replace the probability mass function in current discrete information-theoretic measures. Second, it will be interesting to extend the measures on synergies higher than three way. However, as more attributes are involved, the interaction hierarchy gets more complicated. More carefully designed mathematical measures are required. We anticipate that designing powerful and efficient methods to quantify high-order epistasis has the great potential in improving disease treatment and healthcare by revealing the genetic complexity of common human diseases.
  38 in total

Review 1.  Determinants of the success of whole-genome association testing.

Authors:  Andrew G Clark; Eric Boerwinkle; James Hixson; Charles F Sing
Journal:  Genome Res       Date:  2005-11       Impact factor: 9.043

Review 2.  Genome-wide association studies for common diseases and complex traits.

Authors:  Joel N Hirschhorn; Mark J Daly
Journal:  Nat Rev Genet       Date:  2005-02       Impact factor: 53.242

3.  Computational inference of the molecular logic for synaptic connectivity in C. elegans.

Authors:  Vinay Varadan; David M Miller; Dimitris Anastassiou
Journal:  Bioinformatics       Date:  2006-07-15       Impact factor: 6.937

4.  Toll-like receptor triggering of a vitamin D-mediated human antimicrobial response.

Authors:  Philip T Liu; Steffen Stenger; Huiying Li; Linda Wenzel; Belinda H Tan; Stephan R Krutzik; Maria Teresa Ochoa; Jürgen Schauber; Kent Wu; Christoph Meinken; Diane L Kamen; Manfred Wagner; Robert Bals; Andreas Steinmeyer; Ulrich Zügel; Richard L Gallo; David Eisenberg; Martin Hewison; Bruce W Hollis; John S Adams; Barry R Bloom; Robert L Modlin
Journal:  Science       Date:  2006-02-23       Impact factor: 47.728

5.  Symbolic modeling of epistasis.

Authors:  Jason H Moore; Nate Barney; Chia-Ti Tsai; Fu-Tien Chiang; Jiang Gui; Bill C White
Journal:  Hum Hered       Date:  2007-02-02       Impact factor: 0.444

Review 6.  Epistasis and its implications for personal genetics.

Authors:  Jason H Moore; Scott M Williams
Journal:  Am J Hum Genet       Date:  2009-09       Impact factor: 11.025

7.  Predicting the difficulty of pure, strict, epistatic models: metrics for simulated model selection.

Authors:  Ryan J Urbanowicz; Jeff Kiralis; Jonathan M Fisher; Jason H Moore
Journal:  BioData Min       Date:  2012-09-26       Impact factor: 2.522

8.  Characterizing genetic interactions in human disease association studies using statistical epistasis networks.

Authors:  Ting Hu; Nicholas A Sinnott-Armstrong; Jeff W Kiralis; Angeline S Andrew; Margaret R Karagas; Jason H Moore
Journal:  BMC Bioinformatics       Date:  2011-09-12       Impact factor: 3.169

9.  Computational analysis of the synergy among multiple interacting genes.

Authors:  Dimitris Anastassiou
Journal:  Mol Syst Biol       Date:  2007-02-13       Impact factor: 11.429

10.  Genomewide association studies and human disease.

Authors:  John Hardy; Andrew Singleton
Journal:  N Engl J Med       Date:  2009-04-15       Impact factor: 91.245

View more
  29 in total

1.  ViSEN: methodology and software for visualization of statistical epistasis networks.

Authors:  Ting Hu; Yuanzhu Chen; Jeff W Kiralis; Jason H Moore
Journal:  Genet Epidemiol       Date:  2013-03-06       Impact factor: 2.135

Review 2.  Transdisciplinary approaches enhance the production of translational knowledge.

Authors:  Timothy H Ciesielski; Melinda C Aldrich; Carmen J Marsit; Robert A Hiatt; Scott M Williams
Journal:  Transl Res       Date:  2016-11-10       Impact factor: 7.012

3.  A Framework for Efficient N-Way Interaction Testing in Case/Control Studies With Categorical Data.

Authors:  Aristos Aristodimou; Athos Antoniades; Efthimios Dardiotis; Eleni Loizidou; George Spyrou; Christina Votsi; Christodoulou Kyproula; Marios Pantzaris; Nikolaos Grigoriadis; Georgios Hadjigeorgiou; Theodoros Kyriakides; Constantinos Pattichi
Journal:  IEEE Open J Eng Med Biol       Date:  2021-07-27

4.  Genomics of human pulmonary tuberculosis: from genes to pathways.

Authors:  Catherine M Stein; Lindsay Sausville; Christian Wejse; Rafal S Sobota; Nicola M Zetola; Philip C Hill; W Henry Boom; William K Scott; Giorgio Sirugo; Scott M Williams
Journal:  Curr Genet Med Rep       Date:  2017-10-12

5.  Multifactor dimensionality reduction reveals a three-locus epistatic interaction associated with susceptibility to pulmonary tuberculosis.

Authors:  Ryan L Collins; Ting Hu; Christian Wejse; Giorgio Sirugo; Scott M Williams; Jason H Moore
Journal:  BioData Min       Date:  2013-02-18       Impact factor: 2.522

6.  Identification of additive, dominant, and epistatic variation conferred by key genes in cellulose biosynthesis pathway in Populus tomentosa†.

Authors:  Qingzhang Du; Jiaxing Tian; Xiaohui Yang; Wei Pan; Baohua Xu; Bailian Li; Pär K Ingvarsson; Deqiang Zhang
Journal:  DNA Res       Date:  2014-11-26       Impact factor: 4.458

7.  Heuristic identification of biological architectures for simulating complex hierarchical genetic interactions.

Authors:  Jason H Moore; Ryan Amos; Jeff Kiralis; Peter C Andrews
Journal:  Genet Epidemiol       Date:  2014-11-13       Impact factor: 2.135

Review 8.  Detecting epistasis in human complex traits.

Authors:  Wen-Hua Wei; Gibran Hemani; Chris S Haley
Journal:  Nat Rev Genet       Date:  2014-09-09       Impact factor: 53.242

9.  Functional genomics annotation of a statistical epistasis network associated with bladder cancer susceptibility.

Authors:  Ting Hu; Qinxin Pan; Angeline S Andrew; Jillian M Langer; Michael D Cole; Craig R Tomlinson; Margaret R Karagas; Jason H Moore
Journal:  BioData Min       Date:  2014-04-11       Impact factor: 2.522

10.  Computational genetics analysis of grey matter density in Alzheimer's disease.

Authors:  Amanda L Zieselman; Jonathan M Fisher; Ting Hu; Peter C Andrews; Casey S Greene; Li Shen; Andrew J Saykin; Jason H Moore
Journal:  BioData Min       Date:  2014-08-22       Impact factor: 2.522

View more

北京卡尤迪生物科技股份有限公司 © 2022-2023.