Salih Tuna1, Mahesan Niranjan. 1. School of Electronics and Computer Science, University of Southampton, Southampton, UK.
Abstract
MOTIVATION: High-throughput measurements of mRNA abundances from microarrays involve several stages of preprocessing. At each stage, a user has access to a large number of algorithms with no universally agreed guidance on which of these to use. We show that binary representations of gene expressions, retaining only information on whether a gene is expressed or not, reduces the variability in results caused by algorithmic choice, while also improving the quality of inference drawn from microarray studies. RESULTS: Binary representation of transcriptome data has the desirable property of reducing the variability introduced at the preprocessing stages due to algorithmic choice. We compare the effect of the choice of algorithms on different problems and suggest that using binary representation of microarray data with Tanimoto kernel for support vector machine reduces the effect of the choice of algorithm and simultaneously improves the performance of classification of phenotypes.
MOTIVATION: High-throughput measurements of mRNA abundances from microarrays involve several stages of preprocessing. At each stage, a user has access to a large number of algorithms with no universally agreed guidance on which of these to use. We show that binary representations of gene expressions, retaining only information on whether a gene is expressed or not, reduces the variability in results caused by algorithmic choice, while also improving the quality of inference drawn from microarray studies. RESULTS: Binary representation of transcriptome data has the desirable property of reducing the variability introduced at the preprocessing stages due to algorithmic choice. We compare the effect of the choice of algorithms on different problems and suggest that using binary representation of microarray data with Tanimoto kernel for support vector machine reduces the effect of the choice of algorithm and simultaneously improves the performance of classification of phenotypes.
Authors: Tanya Barrett; Dennis B Troup; Stephen E Wilhite; Pierre Ledoux; Carlos Evangelista; Irene F Kim; Maxim Tomashevsky; Kimberly A Marshall; Katherine H Phillippy; Patti M Sherman; Rolf N Muertter; Michelle Holko; Oluwabukunmi Ayanbule; Andrey Yefanov; Alexandra Soboleva Journal: Nucleic Acids Res Date: 2010-11-21 Impact factor: 16.971
Authors: Patrick S Stumpf; Rosanna C G Smith; Michael Lenz; Andreas Schuppert; Franz-Josef Müller; Ann Babtie; Thalia E Chan; Michael P H Stumpf; Colin P Please; Sam D Howison; Fumio Arai; Ben D MacArthur Journal: Cell Syst Date: 2017-09-27 Impact factor: 10.304