Mutual information is critically dependent on prior assumptions: would the correct estimate of mutual information please identify itself?

Andrew D Fernandes, Gregory B Gloor.

Abstract

MOTIVATION: Mutual information (MI) is a quantity that measures the dependence between two arbitrary random variables and has been repeatedly used to solve a wide variety of bioinformatic problems. Recently, when attempting to quantify the effects of sampling variance on computed values of MI in proteins, we encountered striking differences among various novel estimates of MI. These differences revealed that estimating the 'true' value of MI is not a straightforward procedure, and minor variations of assumptions yielded remarkably different estimates.
RESULTS: We describe four formally equivalent estimates of MI, three of which explicitly account for sampling variance, that yield non-equal values of MI given exact frequencies. These MI estimates are essentially non-predictive of each other, converging only in the limit of implausibly large datasets. Lastly, we show that all four estimates are biologically reasonable estimates of MI, despite their disparity, since each is actually the Kullback-Leibler divergence between random variables conditioned on equally plausible hypotheses.
CONCLUSIONS: For sparse contingency tables of the type universally observed in protein coevolution studies, our results show that estimates of MI, and hence inferences about physical phenomena such as coevolution, are critically dependent on at least three prior assumptions. These assumptions are: (i) how observation counts relate to expected frequencies; (ii) the relationship between joint and marginal frequencies; and (iii) how non-observed categories are interpreted. In any biologically relevant data, these assumptions will affect the MI estimate as much as, or more than, the observed data, and are independent of uncertainty in frequency parameters.
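As a rough illustration of assumptions (i) and (iii), the sketch below computes the standard plug-in MI of a small, sparse contingency table and then repeats the calculation after adding a uniform pseudocount to every cell. It is not one of the four estimators from the paper: the function name `mutual_information`, the `pseudocount` parameter, and the example table are all hypothetical and chosen only for illustration.

```python
import math


def mutual_information(counts, pseudocount=0.0):
    """Plug-in MI (in bits) of a 2-D contingency table of counts.

    `pseudocount` is added to every cell before normalising. It stands in
    for a prior assumption about unobserved categories (an illustrative
    choice, not the estimators described in the abstract).
    """
    cells = [[c + pseudocount for c in row] for row in counts]
    total = sum(sum(row) for row in cells)
    joint = [[c / total for c in row] for row in cells]
    row_marg = [sum(row) for row in joint]
    col_marg = [sum(col) for col in zip(*joint)]

    mi = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            if p > 0.0:  # cells with zero probability contribute nothing
                mi += p * math.log2(p / (row_marg[i] * col_marg[j]))
    return mi


# Hypothetical sparse table: 10 sequences, two alignment columns,
# three residue types observed in each column.
table = [
    [4, 0, 0],
    [0, 3, 0],
    [0, 0, 3],
]

for alpha in (0.0, 0.5, 1.0):
    print(f"pseudocount={alpha}: MI = {mutual_information(table, alpha):.3f} bits")
```

On this toy table the estimate falls sharply as the pseudocount grows, even though the observed counts never change, which is the kind of sensitivity to prior assumptions the abstract describes.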

Year:  2010        PMID: 20236946     DOI: 10.1093/bioinformatics/btq111

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  6 in total

1.  EEG-based investigation of brain connectivity changes in psychotic patients undergoing the primitive expression form of dance therapy: a methodological pilot study.

Authors:  Errikos-Chaim Ventouras; Alexia Margariti; Paraskevi Chondraki; Ioannis Kalatzis; Nicholas-Tiberio Economou; Hara Tsekou; Thomas Paparrigopoulos; Periklis Ktonas
Journal:  Cogn Neurodyn       Date:  2014-11-14       Impact factor: 5.082

2.  Bioinformatic analysis of HIV-1 entry and pathogenesis. (Review)

Authors:  Benjamas Aiamkitsumrit; Will Dampier; Gregory Antell; Nina Rivera; Julio Martin-Garcia; Vanessa Pirrone; Michael R Nonnemacher; Brian Wigdahl
Journal:  Curr HIV Res       Date:  2014       Impact factor: 1.581

3.  The contribution of coevolving residues to the stability of KDO8P synthase.

Authors:  Sharon H Ackerman; Domenico L Gatti
Journal:  PLoS One       Date:  2011-03-09       Impact factor: 3.240

4.  New methods to measure residues coevolution in proteins.

Authors:  Hongyun Gao; Yongchao Dou; Jialiang Yang; Jun Wang
Journal:  BMC Bioinformatics       Date:  2011-05-26       Impact factor: 3.169

5.  Information processing by simple molecular motifs and susceptibility to noise.

Authors:  Siobhan S Mc Mahon; Oleg Lenive; Sarah Filippi; Michael P H Stumpf
Journal:  J R Soc Interface       Date:  2015-09-06       Impact factor: 4.118

6.  Accurate simulation and detection of coevolution signals in multiple sequence alignments.

Authors:  Sharon H Ackerman; Elisabeth R Tillier; Domenico L Gatti
Journal:  PLoS One       Date:  2012-10-16       Impact factor: 3.240
