
Epistasis and Entropy.

Kristina Crona

Abstract

Epistasis is a key concept in the theory of adaptation. Indicators of epistasis are of interest for large systems where systematic fitness measurements may not be possible. Some recent approaches rely on information theory. We show that considering shared entropy for pairs of loci can be misleading, because shared entropy does not imply epistasis for the pair. This observation holds even in the absence of higher-order epistasis. We discuss a method for reducing the number of false positives. However, our main conclusion is that entropy-based approaches have serious limitations in this context.

Year:  2016        PMID: 28005911      PMCID: PMC5178987          DOI: 10.1371/journal.pgen.1006322

Source DB:  PubMed          Journal:  PLoS Genet        ISSN: 1553-7390            Impact factor:   5.917


On Recent Approaches to Entropy and Epistasis

Methods for inferring gene epistasis, or gene interactions, without fitness measurements are valuable for many reasons. It may be difficult or costly, if possible at all, to measure fitness accurately for populations in nature. One approach relies on information theory. Briefly, entropy can be used to find nonrandom associations of mutations across individuals in a population [1]. For instance, if two mutations tend to co-occur, then they will have nonzero shared entropy (mutual information). Gupta and Adami [2] consider entropy for HIV drug resistance mutations and interpret shared entropy for a pair of mutations as evidence for pairwise epistasis. However, the authors’ conclusion is not valid under realistic assumptions. One cannot deduce pairwise epistasis from shared entropy, because shared entropy may have other causes. For instance, it is well established that a particular mutation (or substitution) may serve as a door opener for new mutations [3, 4]: the new mutations are selected for only if the first mutation has occurred. Such constraints are known from antimicrobial drug resistance, and they may explain many cases of parallel evolution. In concrete terms, suppose that A and B are drug resistance mutations, but that B is selected for only if A has occurred. This would be the case if the beneficial effect of B depends on the presence of A. In a list of clinically found drug resistance mutants, B would not appear unless A is present (such patterns are far from rare for resistance mutations). The connection to entropy should be clear. Suppose that another mutation, C, depends on A as well. In that case it is quite plausible that B and C tend to co-occur in the population, and an analysis would reveal nonzero shared entropy for B and C. However, the fitness effects of B and C may be completely independent; i.e., there is no epistasis for B and C. It is easy to see how misleading the shared entropy condition for pairwise epistasis can be.
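The A, B, C scenario can be checked numerically. The following is a minimal sketch under stated assumptions: the fitness values (arbitrary units) and genotype frequencies are hypothetical and chosen only for illustration; they are not taken from Gupta and Adami [2] or from S1 Text. Shared entropy is computed as the mutual information I(B;C) = Σ p(b,c) log2[p(b,c) / (p(b)p(c))], and pairwise epistasis on an additive scale as e = w11 - w10 - w01 + w00 with the third locus held fixed, which is one common convention among several.

```python
from math import log2

A, B, C = 0, 1, 2  # locus indices; genotypes are tuples (a, b, c), 1 = mutation present


def fitness(a, b, c):
    # Hypothetical fitness (arbitrary units): B and C are beneficial only on the
    # A background, and their effects simply add up (no B-C interaction).
    return 100 + 10 * a + 20 * a * b + 20 * a * c


def pairwise_epistasis(w, background, i, j):
    """Additive-scale epistasis e = w11 - w10 - w01 + w00 for loci i and j,
    with the remaining locus fixed at its value in `background`."""
    def w_at(x, y):
        g = list(background)
        g[i], g[j] = x, y
        return w(*g)
    return w_at(1, 1) - w_at(1, 0) - w_at(0, 1) + w_at(0, 0)


def mutual_information(pop, i, j):
    """Shared entropy I(X_i; X_j) in bits, from a dict genotype -> frequency."""
    joint, p_i, p_j = {}, {}, {}
    for g, f in pop.items():
        joint[(g[i], g[j])] = joint.get((g[i], g[j]), 0.0) + f
        p_i[g[i]] = p_i.get(g[i], 0.0) + f
        p_j[g[j]] = p_j.get(g[j], 0.0) + f
    return sum(f * log2(f / (p_i[x] * p_j[y]))
               for (x, y), f in joint.items() if f > 0)


# Hypothetical population: half the individuals carry none of the mutations;
# in the other half A is present and B, C each occur independently with prob. 1/2.
population = {(0, 0, 0): 0.5}
for b in (0, 1):
    for c in (0, 1):
        population[(1, b, c)] = 0.5 * 0.25

e_BC = pairwise_epistasis(fitness, (1, 0, 0), B, C)  # B-C epistasis, A present
e_AB = pairwise_epistasis(fitness, (0, 0, 0), A, B)  # A-B epistasis, C absent
mi_BC = mutual_information(population, B, C)

print(f"e(B,C) = {e_BC}, e(A,B) = {e_AB}, I(B;C) = {mi_BC:.3f} bits")
```

With these assumed values, B and C show no epistasis (e = 0) on the A background while A and B do (e = 20), yet I(B;C) is about 0.074 bits, simply because both mutations are found only together with A.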
Suppose that ten mutations are mutually independent (the presence of the others is neither an advantage nor a disadvantage) but that they all depend on A. Then one would identify shared entropy for 55 pairs (the 45 pairs among the ten mutations plus the 10 pairs that include A), although there is epistasis for only 10 pairs. The shared entropy condition is misleading even for the simplest systems. Example 1 in S1 Text concerns a system with no higher-order epistasis, i.e., no epistasis beyond pairwise interactions. Two mutations in the system have shared entropy, although their fitness effects are independent. Indeed, 2-way epistasis was excluded both according to the geometric classification of gene interactions [5] and according to an approach that depends on Walsh coefficients [6]. Moreover, for a slightly more involved case where the starting point is a heterogeneous population (see Example 3 in S1 Text and the subsequent discussion), the shared entropy is as high as possible for two mutations, although there is no epistasis for the pair. However, the starting point in Gupta and Adami [2] is sound. If there is no epistasis at all in a system, then one would measure little or no shared entropy for mutations under ideal circumstances. The question is whether one can learn anything more specific about epistasis from shared entropy. Our analysis of the simple rule that shared entropy for two loci implies epistasis for the pair revealed problems: the rule gives false positives. A method that filters out some false positives is discussed in S1 Text. The method considers the entire network of pairs with shared entropy, so as to distinguish between direct and indirect causes of shared entropy. However, the method will not provide a complete solution to the problem of relating epistasis and entropy. Gene interactions are difficult to analyze from frequency data. Nevertheless, it is possible that some approach of the type proposed in Gupta and Adami [2] could work as a rule of thumb. However, that would require a statistical argument.
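The counting argument for the ten mutations, together with one simple way of separating direct from indirect associations, can be sketched as follows. Everything here is hypothetical: A is assumed present in half of the population, each of the ten dependent mutations occurs independently with probability 1/2 on the A background, and the epistatic pairs are defined by construction as the pairs involving A. The filter keeps a pair only if no single third locus removes its shared entropy, i.e., the conditional mutual information I(X;Y|Z) stays above zero for every Z. This is a generic conditional-independence test of the kind used in network inference, shown for illustration; it is not necessarily the method described in S1 Text.

```python
from itertools import combinations, product
from math import log2

N_DEP = 10        # hypothetical mutations X1..X10, each selected only on the A background
LOCI = N_DEP + 1  # locus 0 is A, loci 1..10 are X1..X10
TOL = 1e-9        # numerical tolerance for "zero" shared entropy


def mutual_information(pop, i, j):
    """Shared entropy I(X_i; X_j) in bits, from a dict genotype -> frequency."""
    joint, p_i, p_j = {}, {}, {}
    for g, f in pop.items():
        joint[(g[i], g[j])] = joint.get((g[i], g[j]), 0.0) + f
        p_i[g[i]] = p_i.get(g[i], 0.0) + f
        p_j[g[j]] = p_j.get(g[j], 0.0) + f
    return sum(f * log2(f / (p_i[x] * p_j[y]))
               for (x, y), f in joint.items() if f > 0)


def conditional_mi(pop, i, j, k):
    """I(X_i; X_j | X_k): mutual information within each state of locus k,
    weighted by the frequency of that state."""
    total = 0.0
    for v in (0, 1):
        stratum = {g: f for g, f in pop.items() if g[k] == v}
        mass = sum(stratum.values())
        if mass > 0:
            normalized = {g: f / mass for g, f in stratum.items()}
            total += mass * mutual_information(normalized, i, j)
    return total


# Hypothetical population: A is present in half of the individuals; given A,
# each X occurs independently with probability 1/2; without A, no X occurs.
population = {(0,) * LOCI: 0.5}
for xs in product((0, 1), repeat=N_DEP):
    population[(1,) + xs] = 0.5 * 0.5 ** N_DEP

pairs = list(combinations(range(LOCI), 2))
flagged = [p for p in pairs if mutual_information(population, *p) > TOL]
epistatic = [p for p in pairs if 0 in p]  # by construction: only A-X interactions

# Keep a flagged pair only if no single third locus explains its shared entropy.
filtered = [(i, j) for (i, j) in flagged
            if all(conditional_mi(population, i, j, k) > TOL
                   for k in range(LOCI) if k not in (i, j))]

print(len(flagged), len(epistatic), len(filtered))
```

Run as is, it prints `55 10 10`: all 55 pairs have shared entropy, only the 10 pairs involving A are epistatic, and in this idealized case the filter keeps exactly those 10. With finite samples, plug-in estimates of mutual information are biased upward and rarely exactly zero, so a real application would need a statistical test in place of the fixed tolerance `TOL`.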
As it currently stands, there is no foundation for the shared entropy condition for identifying pairwise epistasis. It should be remarked that the criticism expressed here does not apply to entropy-based methods in human genetics [e.g., 7–9]. Applications of information theory to human genetics depend on the ability to compare genetic information to health conditions. No similar information that directly relates genotype and phenotype is available for the HIV data analyzed in Gupta and Adami [2]. In that sense, the authors considered a more difficult problem. From a more general perspective, detecting and quantifying epistasis for multilocus systems is a challenging problem, and various new methods have been proposed in recent years. For instance, one line of research provides tools for detecting gene interactions from qualitative data, such as rank orders of genotypes according to fitness [e.g., 10–12]. There is no question that entropy-based approaches, as well as many other recent methods, have potential. However, while conducting research in the field we noticed that not all of the methods have been justified by solid theoretical arguments. Moreover, experimentalists have reported seemingly contradictory results from different methods applied to the same protein data (personal communication). Some caution is recommended, and it is probably fair to say that the field is “heroic” rather than “mature” at this point in time.
References (9 in total)

1.  Inferring tree models for oncogenesis from comparative genome hybridization data.

Authors:  R Desper; F Jiang; O P Kallioniemi; H Moch; C H Papadimitriou; A A Schäffer
Journal:  J Comput Biol       Date:  1999       Impact factor: 1.479

2.  Reciprocal sign epistasis is a necessary condition for multi-peaked fitness landscapes.

Authors:  Frank J Poelwijk; Sorin Tănase-Nicola; Daniel J Kiviet; Sander J Tans
Journal:  J Theor Biol       Date:  2010-12-16       Impact factor: 2.691

3.  Perspective: Sign epistasis and genetic constraint on evolutionary trajectories.

Authors:  Daniel M Weinreich; Richard A Watson; Lin Chao
Journal:  Evolution       Date:  2005-06       Impact factor: 3.694

4.  An entropy-based approach for testing genetic epistasis underlying complex diseases.

Authors:  Guolian Kang; Weihua Yue; Jifeng Zhang; Yuehua Cui; Yijun Zuo; Dai Zhang
Journal:  J Theor Biol       Date:  2007-10-06       Impact factor: 2.691

5.  Exploration of gene-gene interaction effects using entropy-based methods.

Authors:  Changzheng Dong; Xun Chu; Ying Wang; Yi Wang; Li Jin; Tieliu Shi; Wei Huang; Yixue Li
Journal:  Eur J Hum Genet       Date:  2007-10-31       Impact factor: 4.246

6.  The peaks and geometry of fitness landscapes.

Authors:  Kristina Crona; Devin Greene; Miriam Barlow
Journal:  J Theor Biol       Date:  2012-10-02       Impact factor: 2.691

7.  Should evolutionary geneticists worry about higher-order epistasis?

Authors:  Daniel M Weinreich; Yinghong Lan; C Scott Wylie; Robert B Heckendorn
Journal:  Curr Opin Genet Dev       Date:  2013-11-27       Impact factor: 5.578

8.  An information-gain approach to detecting three-way epistatic interactions in genetic association studies.

Authors:  Ting Hu; Yuanzhu Chen; Jeff W Kiralis; Ryan L Collins; Christian Wejse; Giorgio Sirugo; Scott M Williams; Jason H Moore
Journal:  J Am Med Inform Assoc       Date:  2013-02-08       Impact factor: 4.497

9.  Strong Selection Significantly Increases Epistatic Interactions in the Long-Term Evolution of a Protein.

Authors:  Aditi Gupta; Christoph Adami
Journal:  PLoS Genet       Date:  2016-03-30       Impact factor: 5.917
