Yong Lu (1), Roni Rosenfeld, Ziv Bar-Joseph. (1) School of Computer Science, Carnegie Mellon University, 5000 Forbes Ave, Pittsburgh, PA, 15213, USA.
Abstract
MOTIVATION: Gene expression during cell division has now been studied in many different species. An important goal of these studies is to identify the set of cycling genes. To date, this has been done independently for each species studied. Because of noise and other data analysis problems, accurately deriving a set of cycling genes from expression data is hard, especially for some multicellular organisms, including humans.

RESULTS: Here we present the first algorithm that combines microarray expression data from multiple species to identify cycling genes. Our algorithm represents genes from multiple species as nodes in a graph, with edges between genes representing sequence similarity. Starting with the measured expression values for each species, we use Belief Propagation to determine a posterior score for each gene. This posterior is used to determine a new set of cycling genes for each species. We applied our algorithm to improve the identification of cell cycle genes in budding yeast and humans. As we show, by incorporating sequence similarity information, we obtained a more accurate set of genes than methods that rely on expression data alone. Our method was especially successful for the human dataset, indicating that it can use a high-quality dataset from one species to overcome noise problems in another.

AVAILABILITY: C implementation is available from the supporting website: http://www.cs.cmu.edu/~lyongu/pub/cellcycle/.
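The graph-based scoring described in the abstract can be illustrated with a small sketch. The abstract does not give the potentials, the edge weighting, or the message schedule, so everything specific here is an assumption: each gene has a binary label (cycling or not), its expression-only score is treated as a prior probability strictly between 0 and 1, every similarity edge uses the same hypothetical `coupling` value (the chance that two similar genes share a label), and messages are updated synchronously. The function name `loopy_bp` and the gene labels are illustrative and are not taken from the authors' C implementation.

```python
def loopy_bp(prior, edges, coupling=0.8, iters=50):
    """Sum-product belief propagation on a pairwise binary graph.

    prior    : dict gene -> P(cycling) from expression data alone (assumed in (0, 1))
    edges    : iterable of (gene_a, gene_b) sequence-similarity pairs
    coupling : assumed probability that two linked genes share a label, in (0, 1)
    Returns a dict gene -> posterior P(cycling).
    """
    # Build an undirected neighbour map; sets ignore duplicate edges.
    neighbors = {g: set() for g in prior}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    # Node potentials as (not cycling, cycling).
    phi = {g: (1.0 - p, p) for g, p in prior.items()}
    # Pairwise potential psi[x][y]: favours equal labels when coupling > 0.5.
    psi = ((coupling, 1.0 - coupling), (1.0 - coupling, coupling))

    # One message per directed edge, initialised to uniform.
    msg = {(a, b): (0.5, 0.5) for a in neighbors for b in neighbors[a]}

    for _ in range(iters):
        new_msg = {}
        for src, dst in msg:
            # Combine the sender's potential with messages from its other neighbours.
            b0, b1 = phi[src]
            for n in neighbors[src]:
                if n != dst:
                    b0 *= msg[(n, src)][0]
                    b1 *= msg[(n, src)][1]
            # Marginalise over the sender's label.
            out0 = b0 * psi[0][0] + b1 * psi[1][0]
            out1 = b0 * psi[0][1] + b1 * psi[1][1]
            z = out0 + out1
            new_msg[(src, dst)] = (out0 / z, out1 / z)
        msg = new_msg

    # Posterior belief: node potential times all incoming messages, normalised.
    posterior = {}
    for g in prior:
        b0, b1 = phi[g]
        for n in neighbors[g]:
            b0 *= msg[(n, g)][0]
            b1 *= msg[(n, g)][1]
        posterior[g] = b1 / (b0 + b1)
    return posterior


# Hypothetical example: a confident yeast score and an uninformative human score.
example_prior = {"yeast_gene": 0.9, "human_gene": 0.5}
example_edges = [("yeast_gene", "human_gene")]
example_posterior = loopy_bp(example_prior, example_edges)
```

In this two-gene example the uninformative human prior is pulled toward the confident yeast score, while the yeast posterior is unchanged, which mirrors the abstract's point that a high-quality dataset in one species can compensate for noise in another. On graphs with cycles the result is an approximation, and a real implementation would likely weight each edge by its similarity score rather than use one fixed coupling.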