Kenneth Lo (1), Raphael Gottardo. (1) Department of Statistics, University of British Columbia, 333-6356 Agricultural Road, Vancouver, BC, Canada V6T 1Z2. Contact: c.lo@stat.ubc.ca
Abstract
MOTIVATION: Inference about differential expression is a typical objective when analyzing gene expression data. Recently, Bayesian hierarchical models have become increasingly popular for this type of problem. The two most common hierarchical models are the hierarchical Gamma-Gamma (GG) and Lognormal-Normal (LNN) models. However, to facilitate inference, these models rely on some unrealistic assumptions. One such assumption is a common coefficient of variation across genes, which can adversely affect the resulting inference. RESULTS: In this paper, we extend both the GG and LNN modeling frameworks to allow for gene-specific variances and propose EM-based algorithms for parameter estimation. The proposed methodology is evaluated on three experimental datasets: one cDNA microarray experiment and two Affymetrix spike-in experiments. Compared with the original models, the two extended models significantly reduce the false positive rate while maintaining high sensitivity. Finally, using a simulation study, we show that the new frameworks are also more robust to model misspecification. AVAILABILITY: The R code for implementing the proposed methodology can be downloaded at http://www.stat.ubc.ca/~c.lo/FEBarrays. SUPPLEMENTARY INFORMATION: The supplementary material is available at http://www.stat.ubc.ca/~c.lo/FEBarrays/supp.pdf.
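The common-CV assumption can be illustrated with a minimal Python sketch. This is our own illustration, not the authors' R code, and it does not implement the extended models or their EM algorithms. It relies on two standard facts. If log X ~ N(mu, sigma^2), then the coefficient of variation of X is sqrt(exp(sigma^2) - 1). If X ~ Gamma(a, rate), then its CV is 1/sqrt(a). Neither quantity depends on the mean, so a single sigma^2 shared by all genes (LNN) or a single shape a shared by all genes (GG) forces every gene to have the same CV. The gene means, variances, shape, and rates below are arbitrary illustrative values.

```python
import numpy as np

# Illustrative sketch only (not the authors' FEBarrays code): shows why a
# shared log-scale variance (LNN) or shared Gamma shape (GG) implies one CV.


def lognormal_cv(sigma2):
    """CV of X = exp(Y) with Y ~ N(mu, sigma2): sqrt(exp(sigma2) - 1), free of mu."""
    return np.sqrt(np.expm1(sigma2))


def gamma_cv(shape):
    """CV of X ~ Gamma(shape, rate): 1 / sqrt(shape), free of the rate."""
    return 1.0 / np.sqrt(shape)


def simulate_lnn(mu, sigma2, n_reps, rng):
    """Draw n_reps raw-scale intensities per gene with log X_gj ~ N(mu_g, sigma2_g)."""
    mu = np.asarray(mu, dtype=float)
    sd = np.sqrt(np.asarray(sigma2, dtype=float))
    log_x = rng.normal(mu[:, None], sd[:, None], size=(mu.size, n_reps))
    return np.exp(log_x)


def empirical_cv(x):
    """Per-gene sample CV; rows are genes, columns are replicates."""
    return x.std(axis=1, ddof=1) / x.mean(axis=1)


rng = np.random.default_rng(0)
mu = np.array([6.0, 8.0, 10.0, 12.0])  # arbitrary gene means on the log scale

# Original-LNN-style setting: one sigma2 shared by all genes.
common = np.full(mu.size, 0.1)
# Gene-specific sigma2_g (arbitrary values chosen for illustration).
specific = np.array([0.02, 0.1, 0.3, 0.6])

cv_common = empirical_cv(simulate_lnn(mu, common, 20000, rng))
cv_specific = empirical_cv(simulate_lnn(mu, specific, 20000, rng))

# GG-style setting: common shape a = 4, very different rates (hence means).
rates = np.array([0.01, 1.0, 100.0])
x_gg = rng.gamma(4.0, 1.0 / rates[:, None], size=(rates.size, 20000))
cv_gg = empirical_cv(x_gg)

print("LNN theory, common sigma2:", np.round(lognormal_cv(common), 3))
print("LNN sample, common sigma2:", np.round(cv_common, 3))
print("LNN theory, gene-specific:", np.round(lognormal_cv(specific), 3))
print("LNN sample, gene-specific:", np.round(cv_specific, 3))
print("GG theory, shape a = 4:   ", gamma_cv(4.0))
print("GG sample, varying rates: ", np.round(cv_gg, 3))
```

With a shared variance or shape, the sample CVs cluster around one value regardless of the gene means. With gene-specific variances, the CVs differ across genes, which is the kind of heterogeneity the extended models are described as accommodating.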