Marcus A Badgeley, Stuart C Sealfon, Maria D Chikina. Department of Neurology, Mount Sinai School of Medicine, New York, NY 10029 and Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, PA 15260, USA.
Abstract
MOTIVATION: Modern molecular technologies allow the collection of large amounts of high-throughput data on the functional attributes of genes. Often, multiple technologies and study designs are used to address the same biological question, such as which genes are overexpressed in a specific disease state. Consequently, there is considerable interest in methods that can integrate across datasets to present a unified set of predictions.

RESULTS: An important aspect of data integration is accounting for the fact that datasets may differ in how accurately they capture the biological signal of interest. While many methods exist to address this problem, they always rely either on dataset-internal statistics, which reflect data structure and not necessarily biological relevance, or on external gold standards, which may not always be available. We present a new rank aggregation method for data integration that requires neither external standards nor internal statistics but relies on Bayesian reasoning to assess dataset relevance. We demonstrate that our method outperforms established techniques and significantly improves the predictive power of rank-based aggregations. We show that our method, which does not require an external gold standard, provides reliable estimates of dataset relevance and allows the same set of data to be integrated differently depending on the specific signal of interest.

AVAILABILITY: The method is implemented in R and is freely available at http://www.pitt.edu/~mchikina/BIRRA/.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
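
The abstract does not spell out the algorithm, so the following is only an illustrative sketch of the general idea of rank aggregation with Bayesian weighting of datasets. It is not the authors' R implementation. The function names `ranks` and `aggregate`, the assumed prior fraction of positives, the number of rank bins, the add-one smoothing and the fixed iteration count are all assumptions made for illustration. The sketch starts from a mean-rank aggregate, treats its top items as provisional positives, estimates for each dataset how strongly each rank bin is enriched for those positives, and re-ranks items by their summed log Bayes factors.

```python
import math


def ranks(scores):
    """Return ranks (0 = best) for a list of scores, highest score first.
    Ties are broken by item index because Python's sort is stable."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    result = [0] * len(scores)
    for position, item in enumerate(order):
        result[item] = position
    return result


def aggregate(rank_lists, prior=0.05, n_bins=10, n_iter=10):
    """Aggregate several rankings of the same items (hypothetical sketch).

    rank_lists: one list per dataset; rank_lists[d][i] is the rank
                (0 = best) of item i in dataset d.
    prior:      assumed fraction of items that are true positives.
    """
    n_items = len(rank_lists[0])
    n_pos = max(1, int(prior * n_items))
    bin_size = math.ceil(n_items / n_bins)

    # Initial guess: mean rank across datasets (negated, so higher is better).
    score = [-sum(rl[i] for rl in rank_lists) / len(rank_lists)
             for i in range(n_items)]

    for _ in range(n_iter):
        current = ranks(score)
        # Provisional positives: the top n_pos items of the current aggregate.
        positive = [current[i] < n_pos for i in range(n_items)]
        new_score = [0.0] * n_items
        for rl in rank_lists:
            pos_count = [0] * n_bins
            tot_count = [0] * n_bins
            for i in range(n_items):
                b = rl[i] // bin_size
                tot_count[b] += 1
                pos_count[b] += positive[i]
            for i in range(n_items):
                b = rl[i] // bin_size
                # Smoothed P(bin | positive) and P(bin | negative).
                p_pos = (pos_count[b] + 1) / (n_pos + n_bins)
                p_neg = (tot_count[b] - pos_count[b] + 1) / (n_items - n_pos + n_bins)
                # A dataset unrelated to the signal has a ratio near 1 in
                # every bin and therefore contributes little to the score.
                new_score[i] += math.log(p_pos / p_neg)
        score = new_score

    return ranks(score)


if __name__ == "__main__":
    n = 200
    informative = list(range(n))                    # item i has rank i
    noise = [(i * 37 + 11) % n for i in range(n)]   # unrelated permutation
    combined = aggregate([informative, informative, noise])
    print(sorted(range(n), key=lambda i: combined[i])[:10])
```

In this sketch no external gold standard enters the computation: the provisional positives come from the aggregate itself, which is the property the abstract emphasizes. How the published method handles binning, convergence and ties may differ.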