Multivariate Analysis of Data Sets with Missing Values: An Information Theory-Based Reliability Function.

Lisa Uechi, David J Galas, Nikita A Sakhanenko.

Abstract

Missing values in complex biological data sets significantly impair our ability to detect and quantify interactions in biological systems and to infer relationships accurately. In this article, we propose a metaphor showing that information theory measures, such as mutual information and interaction information, can be used directly to evaluate multivariable dependencies even when the data contain missing values. The metaphor treats variable dependencies as information channels between and among variables. In this view, missing data act as noise that reduces the channel capacity in predictable ways. We extract the information available in the data despite the missing values and use the notion of channel capacity to assess the reliability of the result. This avoids the common practice (in the absence of prior knowledge of random imputation) of eliminating samples entirely, thus losing the information they can provide. We show how this reliability function can be implemented for pairs of variables and generalize it to an arbitrary number of variables. The reliability functions are illustrated for several cases using simulated data.
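As a rough illustration of the pairwise case, the sketch below estimates mutual information from only those samples in which both variables are observed, and reports the fraction of samples that contributed. It is not the reliability function defined in the article, whose exact form is not given in this abstract. The function name `pairwise_mutual_information`, the use of `None` as the missing-value marker, and the plug-in (frequency-count) estimator are all assumptions made for illustration.

```python
import math
from collections import Counter


def pairwise_mutual_information(xs, ys, missing=None):
    """Plug-in estimate of I(X;Y) in bits from pairwise-complete samples.

    Hypothetical helper for illustration only, not the article's method.
    Returns (mi_bits, observed_fraction), where observed_fraction is the
    share of samples in which both x and y are present.
    """
    # Keep only samples where neither value is missing, instead of
    # discarding a whole multi-variable sample for one missing entry.
    pairs = [(x, y) for x, y in zip(xs, ys)
             if x is not missing and y is not missing]
    n = len(pairs)
    if n == 0:
        return 0.0, 0.0

    joint = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)

    # I(X;Y) = sum_{x,y} p(x,y) * log2( p(x,y) / (p(x) p(y)) )
    mi = 0.0
    for (x, y), c in joint.items():
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi, n / len(xs)


if __name__ == "__main__":
    xs = [0, 1, 0, 1, None, 1, 0, None]
    ys = [0, 1, 0, 1, 1, None, 0, 0]
    mi, frac = pairwise_mutual_information(xs, ys)
    print(f"MI = {mi:.4f} bits from {frac:.1%} of samples")
```

The observed fraction here is only a crude stand-in for reliability. For comparison, the textbook erasure channel with erasure probability p has capacity 1 - p per input bit, which is the kind of predictable capacity reduction the channel metaphor appeals to.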

Keywords:  channel capacity; information theory; missing data; multivariate data analysis; reliability function

Year: 2018; PMID: 30495984; PMCID: PMC6383577; DOI: 10.1089/cmb.2018.0179

Source DB: PubMed; Journal: J Comput Biol; ISSN: 1066-5277; Impact factor: 1.479

