Correcting gene expression data when neither the unwanted variation nor the factor of interest are observed.

Laurent Jacob, Johann A Gagnon-Bartsch, Terence P Speed.

Abstract

In large-scale gene expression studies, observations are commonly contaminated by sources of unwanted variation such as platforms or batches. Ignoring this unwanted variation when analyzing the data can lead to spurious associations and to missed signals. When the analysis is unsupervised, e.g., when the goal is to cluster the samples or to build a corrected version of the dataset (as opposed to studying an observed factor of interest), taking unwanted variation into account can become difficult. The factors driving unwanted variation may be correlated with the unobserved factor of interest, so that correcting for the former can remove the latter if not done carefully. We show how negative control genes and replicate samples can be used to estimate unwanted variation in gene expression, and discuss how this information can be used to correct the expression data. The proposed methods are then evaluated on synthetic data and three gene expression datasets. They generally manage to remove unwanted variation without losing the signal of interest and compare favorably to state-of-the-art corrections. All proposed methods are implemented in the Bioconductor package RUVnormalize.
© The Author 2015. Published by Oxford University Press.
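
The general idea of using negative control genes can be illustrated with a minimal sketch. It is not the RUVnormalize implementation or the estimators proposed in the paper: it assumes a samples-by-genes matrix, takes the top `k` left singular vectors of the control-gene columns as the estimated unwanted factors, and projects them out of every gene. The function name `remove_unwanted_variation`, its parameters, and the toy data are all hypothetical.

```python
import numpy as np


def remove_unwanted_variation(Y, control_idx, k):
    """Naive control-gene correction (illustrative sketch only).

    Y           : (n_samples, n_genes) expression matrix
    control_idx : column indices of negative control genes, assumed to be
                  unaffected by the factor of interest
    k           : assumed number of unwanted factors
    Returns the corrected matrix and the estimated factors W (n_samples, k).
    """
    Y = np.asarray(Y, dtype=float)
    # Variation in control genes is assumed to be purely unwanted, so their
    # leading left singular vectors estimate the sample-level unwanted factors.
    U, _, _ = np.linalg.svd(Y[:, control_idx], full_matrices=False)
    W = U[:, :k]
    # W has orthonormal columns, so the least-squares fit of Y on W is W @ (W.T @ Y).
    corrected = Y - W @ (W.T @ Y)
    return corrected, W


# Toy data (assumed, noise-free): one batch factor affecting all genes,
# one signal of interest affecting only the non-control genes.
batch = np.array([1.0, 1.0, 1.0, -1.0, -1.0, -1.0])
signal = np.array([1.0, -1.0, 0.0, 1.0, -1.0, 0.0])  # orthogonal to batch here
batch_loadings = np.array([2.0, 1.0, 3.0, 1.5, 0.5])
signal_loadings = np.array([0.0, 0.0, 1.0, 2.0, -1.0])  # genes 0 and 1 are controls
Y_toy = np.outer(batch, batch_loadings) + np.outer(signal, signal_loadings)

Y_corrected, W_hat = remove_unwanted_variation(Y_toy, control_idx=[0, 1], k=1)
```

In this toy case the signal is orthogonal to the batch factor, so the projection leaves it intact. When the two are correlated, a plain projection like this one also removes part of the signal, which is the failure mode the abstract warns about.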

Keywords:  Batch effect; Control genes; Gene expression; Normalization; Replicate samples

Year:  2015        PMID: 26286812      PMCID: PMC4679071          DOI: 10.1093/biostatistics/kxv026

Source DB:  PubMed          Journal:  Biostatistics        ISSN: 1465-4644            Impact factor:   5.899


