Jyotsna Kasturi¹, Raj Acharya, Murali Ramanathan. ¹Department of Computer Science and Engineering, Pennsylvania State University, University Park, PA 16802, USA. jkasturi@cse.psu.edu
Abstract
MOTIVATION: Arrays allow measurements of the expression levels of thousands of mRNAs to be made simultaneously. The resulting data sets are information rich but require extensive mining to enhance their usefulness. Information-theoretic methods are capable of assessing similarities and dissimilarities between data distributions and may be suited to the analysis of gene expression experiments. The purpose of this study was to investigate information-theoretic data mining approaches to discover temporal patterns of gene expression from array-derived gene expression data.
RESULTS: The Kullback-Leibler (KL) divergence, an information-theoretic distance that measures the relative dissimilarity between two data distribution profiles, was used in conjunction with an unsupervised self-organizing map algorithm. Two published, array-derived gene expression data sets were analyzed. The patterns obtained with the KL clustering method were found to be superior to those obtained with the hierarchical clustering algorithm using the Pearson correlation distance measure. The biological significance of the results was also examined.
AVAILABILITY: Software code is available by request from the authors. All programs were written in ANSI C and Matlab (MathWorks Inc., Natick, MA).
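As a minimal illustration (not the authors' ANSI C/Matlab code, which is available on request), the KL divergence used as the clustering dissimilarity can be sketched as follows; the function names and the symmetrization choice are assumptions for this example, since the abstract does not specify them:

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL divergence D(P || Q) between two discrete distributions
    given as sequences of non-negative values (e.g. an expression
    profile normalized across time points)."""
    # Normalize each profile to a probability distribution;
    # eps guards against log(0) for zero-expression entries.
    sp, sq = sum(p), sum(q)
    p = [max(x / sp, eps) for x in p]
    q = [max(x / sq, eps) for x in q]
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def symmetric_kl(p, q):
    """Symmetrized KL divergence, suitable as a dissimilarity
    measure in clustering since D(P||Q) != D(Q||P) in general."""
    return 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))
```

In a self-organizing map, a distance such as `symmetric_kl` would replace the usual Euclidean metric when matching each gene's profile to the map nodes; identical profiles yield a divergence of zero, and more dissimilar profiles yield larger values.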