MOTIVATION: Many standard statistical techniques are effective on data that are normally distributed with constant variance. Microarray data typically violate these assumptions since they come from non-Gaussian distributions with a non-trivial mean-variance relationship. Several methods have been proposed that transform microarray data to stabilize their variance and bring their distribution closer to the Gaussian. Some methods, such as the log or generalized log transform, rely on an underlying model for the data. Others, such as the spread-versus-level plot, do not. We propose an alternative data-driven multiscale approach for microarrays with replicates, called the Data-Driven Haar-Fisz transform for microarrays (DDHFm). DDHFm has the advantage of being 'distribution-free' in the sense that no parametric model for the underlying microarray data needs to be specified or estimated; hence, DDHFm can be applied very generally, not just to microarray data.

RESULTS: DDHFm achieves very good variance stabilization of microarray data with replicates and produces transformed intensities that are approximately normally distributed. Simulation studies show that it performs better than other existing methods. Application of DDHFm to real one-color cDNA data validates these results.

AVAILABILITY: The R package implementing the Data-Driven Haar-Fisz transform for microarrays (DDHFm) is available from Bioconductor and CRAN.
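To make the mean-variance problem and the model-based remedy concrete, the following is a minimal Python sketch of the generalized log transform mentioned above. It is not DDHFm and not the authors' code. It assumes a two-component error model (multiplicative plus additive Gaussian noise) that the abstract does not specify, and the function names, noise levels (`s_eta`, `s_eps`), and intensity levels are all hypothetical choices made for illustration.

```python
import math
import random
import statistics


def glog(y, c):
    """Generalized log: close to log(2y) for large y, roughly linear near zero."""
    return math.log(y + math.sqrt(y * y + c))


def simulate_replicates(mu, n, s_eta, s_eps, rng):
    """Draw n replicate intensities at true level mu.

    Assumed model (not from the source): y = mu * exp(eta) + eps,
    with eta ~ N(0, s_eta^2) and eps ~ N(0, s_eps^2).
    """
    return [mu * math.exp(rng.gauss(0.0, s_eta)) + rng.gauss(0.0, s_eps)
            for _ in range(n)]


def sd_by_level(levels, transform, n=2000, s_eta=0.2, s_eps=20.0, seed=0):
    """Standard deviation of transformed replicates at each intensity level."""
    rng = random.Random(seed)
    sds = []
    for mu in levels:
        reps = simulate_replicates(mu, n, s_eta, s_eps, rng)
        sds.append(statistics.stdev([transform(y) for y in reps]))
    return sds


def glog_constant(s_eta, s_eps):
    """Constant c that balances the additive and multiplicative noise terms
    under the assumed model (delta-method argument)."""
    s2 = math.exp(s_eta ** 2) * (math.exp(s_eta ** 2) - 1.0)
    return s_eps ** 2 / s2


if __name__ == "__main__":
    levels = [0.0, 50.0, 200.0, 1000.0, 5000.0, 20000.0]
    c = glog_constant(0.2, 20.0)
    raw = sd_by_level(levels, lambda y: y)
    stabilized = sd_by_level(levels, lambda y: glog(y, c))
    for mu, r, s in zip(levels, raw, stabilized):
        print(f"mu={mu:8.0f}  sd(raw)={r:10.2f}  sd(glog)={s:6.3f}")
```

Under these assumptions the raw standard deviation grows with the intensity level, while the glog-transformed standard deviation stays roughly constant. The catch is that `c` must be derived from a parametric noise model, which is the kind of model specification a distribution-free approach is meant to avoid.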