svaseq: removing batch effects and other unwanted noise from sequencing data.

Jeffrey T Leek

Abstract

It is now known that unwanted noise and unmodeled artifacts such as batch effects can dramatically reduce the accuracy of statistical inference in genomic experiments. These sources of noise must be modeled and removed to accurately measure biological variability and to obtain correct statistical inference when performing high-throughput genomic analysis. We introduced surrogate variable analysis (sva) for estimating these artifacts by (i) identifying the part of the genomic data only affected by artifacts and (ii) estimating the artifacts with principal components or singular vectors of that subset of the data matrix. The resulting estimates of artifacts can be used as adjustment factors in subsequent analyses. Here I describe a version of the sva approach, based on an appropriate data transformation, that is specifically designed for count data or FPKMs from sequencing experiments. I also describe the addition of supervised sva (ssva), which uses control probes to identify the part of the genomic data only affected by artifacts. I present a comparison between these versions of sva and other methods for batch effect estimation on simulated data, real count-based data and FPKM-based data. These updates are available through the sva Bioconductor package, and I have made fully reproducible analyses using these methods available from: https://github.com/jtleek/svaseq.
© The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
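The two-step idea in the abstract, combined with the supervised (control-gene) variant and a transformation for counts, can be sketched roughly as follows. This is a simplified NumPy illustration under stated assumptions, not the sva Bioconductor package's implementation (which is written in R and does more than this): the helper names `estimate_surrogates` and `fit_adjusted`, the `log(count + 1)` transform, the assumption that the control genes are known and affected only by artifacts, the fixed number of surrogate variables `n_sv`, and the simulated data are all hypothetical.

```python
import numpy as np


def estimate_surrogates(counts, control_idx, n_sv, offset=1.0):
    """Hypothetical helper: supervised surrogate-variable sketch.

    counts      : genes x samples array of non-negative counts
    control_idx : rows assumed to be affected only by unwanted variation
    n_sv        : number of surrogate variables to keep (assumed known here)
    """
    # Transform counts to a log scale: log(count + offset)
    logged = np.log(counts + offset)
    controls = logged[control_idx, :]
    # Center each control gene across samples before the SVD
    centered = controls - controls.mean(axis=1, keepdims=True)
    # Rows of vt are right singular vectors, one entry per sample
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:n_sv, :].T  # samples x n_sv


def fit_adjusted(counts, group, surrogates, offset=1.0):
    """Per-gene least squares of log counts on group plus surrogates.

    Returns the estimated group coefficient for every gene.
    """
    logged = np.log(counts + offset)
    n = counts.shape[1]
    design = np.column_stack([np.ones(n), group, surrogates])
    # Solve design @ beta = logged.T for all genes at once
    beta, *_ = np.linalg.lstsq(design, logged.T, rcond=None)
    return beta[1, :]


# Simulated example (entirely hypothetical data)
rng = np.random.default_rng(0)
n_genes, n_samples = 200, 20
group = np.repeat([0.0, 1.0], n_samples // 2)  # biology of interest
batch = np.tile([0.0, 1.0], n_samples // 2)    # unmodeled artifact
load = rng.uniform(0.5, 1.5, size=n_genes)     # per-gene batch effect
de = np.zeros(n_genes)
de[50:100] = 1.0                                # genes 50-99 truly differ
log_mean = 2.0 + np.outer(de, group) + np.outer(load, batch)
counts = rng.poisson(np.exp(log_mean))

# Genes 0-49 play the role of controls: batch-affected, no group effect
sv = estimate_surrogates(counts, np.arange(50), n_sv=1)
coef = fit_adjusted(counts, group, sv)
print(abs(np.corrcoef(sv[:, 0], batch)[0, 1]))  # close to 1
```

In this toy setting the first singular vector of the control genes tracks the hidden batch indicator (up to sign), and including it as a covariate absorbs the batch effect in the per-gene fits. The design choice of estimating surrogates only from control rows mirrors step (i) of the abstract; real data would also require choosing the number of surrogate variables rather than fixing it.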

Year:  2014        PMID: 25294822      PMCID: PMC4245966          DOI: 10.1093/nar/gku864

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971
