MOTIVATION: In a typical gene expression profiling study, the prime objective is to identify the genes that are differentially expressed between samples from two different tissue types. Commonly, standard analysis of variance (ANOVA) or regression is used to estimate the relative effects of these genes over the two types of samples from their respective arrays of expression levels. However, this technique becomes fundamentally flawed when there are unaccounted sources of variability in these arrays (latent variables attributable to biological, environmental or other factors relevant in the context). These factors distort the true picture of differential gene expression between the two tissue types and introduce spurious signals of expression heterogeneity. As a result, many genes that are actually differentially expressed are not detected, whereas many others are falsely identified as positives. Moreover, these distortions can differ from gene to gene, so they cannot be removed by simple array normalization. This two-sided error can lead to a serious loss in sensitivity and specificity, thereby causing severe inefficiency in the underlying multiple testing problem. In this work, we attempt to identify the hidden effects of the underlying latent factors in a gene expression profiling study by partial least squares (PLS), and we apply analysis of covariance (ANCOVA) with the PLS-identified signatures of these hidden effects as covariates, in order to identify the genes that are truly differentially expressed between the two tissue types.
RESULTS: We compare the performance of our method, SVA-PLS, with standard ANOVA and with the relatively recent technique of surrogate variable analysis (SVA) over a wide variety of simulation settings (incorporating different effects of the hidden variable, under varying signal intensities and gene groupings). In all settings, our method yields the highest sensitivity while maintaining reasonable values for the specificity, false discovery rate and false non-discovery rate. Application of our method to gene expression profiling for acute megakaryoblastic leukemia shows that it detects six additional genes that are missed by both standard ANOVA and SVA but may be relevant to this disease, as suggested by mining the existing literature.
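The general idea described above (estimate a hidden-effect signature by PLS, then include it as an ANCOVA covariate) can be illustrated with a small simulation. This is only a minimal sketch under stated assumptions, not the SVA-PLS algorithm itself: the abstract does not specify how the PLS step is set up, so the choice here (one PLS component, with the group-residualized expression matrix as predictors and the original expression matrix as responses) is an assumption, as are all function names, effect sizes and simulation settings.

```python
import numpy as np

def residualize(Y, group):
    """Remove the intercept and group effect from each column of Y by OLS."""
    D = np.column_stack([np.ones(len(group)), group])
    coef, *_ = np.linalg.lstsq(D, Y, rcond=None)
    return Y - D @ coef

def pls_first_score(X, Y):
    """First PLS score of X against Y: X @ w, where w is the leading
    left singular vector of the cross-product matrix X^T Y."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    U, _, _ = np.linalg.svd(Xc.T @ Yc, full_matrices=False)
    t = Xc @ U[:, 0]
    return t / np.linalg.norm(t)

def group_t_stats(Y, group, covariates=None):
    """Per-gene t-statistic for the group coefficient in an OLS fit.
    Without covariates this is two-group ANOVA; with them, ANCOVA."""
    cols = [np.ones(len(group)), group]
    if covariates is not None:
        cols.append(covariates)
    D = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(D, Y, rcond=None)
    resid = Y - D @ coef
    dof = D.shape[0] - D.shape[1]
    sigma2 = (resid ** 2).sum(axis=0) / dof
    var_group = np.linalg.inv(D.T @ D)[1, 1]
    return coef[1] / np.sqrt(sigma2 * var_group)

# Hypothetical simulation: 40 arrays (rows), 200 genes (columns).
rng = np.random.default_rng(0)
n_per_group, n_genes, n_de = 20, 200, 30
group = np.repeat([0.0, 1.0], n_per_group)      # two tissue types
hidden = rng.normal(size=2 * n_per_group)       # unobserved latent factor
beta = np.zeros(n_genes)
beta[:n_de] = 1.5                               # truly differentially expressed genes
gamma = rng.normal(size=n_genes)                # gene-specific hidden effects
Y = (np.outer(group, beta) + np.outer(hidden, gamma)
     + rng.normal(size=(2 * n_per_group, n_genes)))

R = residualize(Y, group)                       # expression with group effect removed
surrogate = pls_first_score(R, Y)               # estimated hidden-effect signature
t_anova = group_t_stats(Y, group)               # ignores the hidden factor
t_ancova = group_t_stats(Y, group, covariates=surrogate)

print("mean |t| over DE genes, ANOVA :", np.abs(t_anova[:n_de]).mean())
print("mean |t| over DE genes, ANCOVA:", np.abs(t_ancova[:n_de]).mean())
```

Because the surrogate in this sketch is built from group-residualized data, it is orthogonal to the group indicator. Adjusting for it therefore shrinks the residual variance but leaves the estimated group effect unchanged, so it does not by itself correct bias from a hidden factor that is correlated with tissue type.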