Amit G Deshwar¹, Quaid Morris. ¹Edward S. Rogers Sr. Department of Electrical and Computer Engineering, Department of Molecular Genetics, Department of Computer Science and Terrence Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario M5S 1A1, Canada.
Abstract
MOTIVATION: Gene expression data are currently collected on a wide range of platforms. Differences between platforms make it challenging to combine and compare data collected on different platforms. We propose a new method of cross-platform normalization that uses topic models to summarize the expression patterns in each dataset before normalizing the topics learned from each dataset using per-gene multiplicative weights.
RESULTS: This method allows for cross-platform normalization even when samples profiled on different platforms have systematic differences, allows the simultaneous normalization of data from an arbitrary number of platforms and, after suitable training, allows for online normalization of expression data collected individually or in small batches. In addition, our method outperforms existing state-of-the-art platform normalization tools.
AVAILABILITY AND IMPLEMENTATION: MATLAB code is available at http://morrislab.med.utoronto.ca/plida/.
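The idea of per-gene multiplicative weights can be illustrated with a small numerical sketch. This is not the authors' MATLAB implementation: the function names `platform_topics` and `convert_profile`, the toy weights, and the assumption that each platform's topics are a shared set of topics rescaled gene by gene and renormalized are all hypothetical, chosen only to show how a multiplicative per-gene correction could map a profile from one platform to another.

```python
import numpy as np

def platform_topics(shared_topics, gene_weights):
    """Rescale each gene in every topic by a platform-specific weight,
    then renormalize each topic so it sums to 1 (assumed model)."""
    scaled = shared_topics * gene_weights  # broadcasts (K, G) * (G,)
    return scaled / scaled.sum(axis=1, keepdims=True)

def convert_profile(profile, weights_from, weights_to):
    """Map a relative-expression profile between platforms using the
    ratio of per-gene weights, then renormalize to sum to 1."""
    rescaled = profile * (weights_to / weights_from)
    return rescaled / rescaled.sum()

rng = np.random.default_rng(0)
shared = rng.dirichlet(np.ones(5), size=3)   # 3 toy topics over 5 genes
w_a = np.array([1.0, 2.0, 0.5, 1.0, 4.0])    # hypothetical weights, platform A
w_b = np.array([2.0, 1.0, 1.0, 0.5, 1.0])    # hypothetical weights, platform B

topics_a = platform_topics(shared, w_a)
topics_b = platform_topics(shared, w_b)

# A pure-topic profile seen on platform A, converted to platform B.
converted = convert_profile(topics_a[0], w_a, w_b)
```

Under these assumptions, a profile made of a single topic on platform A converts exactly to the corresponding topic on platform B. How the weights and topics are actually learned is not specified in the abstract and is not attempted here.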