The practical effect of batch on genomic prediction.

Hilary S Parker, Jeffrey T Leek.

Abstract

Measurements from microarrays and other high-throughput technologies are susceptible to non-biological artifacts like batch effects. It is known that batch effects can alter or obscure the set of significant results and biological conclusions in high-throughput studies. Here we examine the impact of batch effects on predictors built from genomic technologies. To investigate batch effects, we collected publicly available gene expression measurements with known outcomes, and estimated batches using date. Using these data we show (1) the impact of batch effects on prediction depends on the correlation between outcome and batch in the training data, and (2) removing expression measurements most affected by batch before building predictors may improve the accuracy of those predictors. These results suggest that (1) training sets should be designed to minimize correlation between batches and outcome, and (2) methods for identifying batch-affected probes should be developed to improve prediction results for studies with high correlation between batches and outcome.
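The two quantities the abstract relies on, the correlation between outcome and batch and the degree to which a probe is affected by batch, can be illustrated with a small sketch. This is not the authors' method: it assumes a binary outcome and exactly two batches coded 0/1, uses the phi coefficient as one possible measure of batch-outcome correlation, and ranks probes by a standardized mean difference between batches. The function names (`phi_coefficient`, `batch_score`, `drop_batch_affected`) and the toy data are hypothetical.

```python
import math
from statistics import mean, pvariance


def phi_coefficient(outcome, batch):
    """Phi coefficient between two 0/1 label lists of equal length."""
    pairs = list(zip(outcome, batch))
    n11 = sum(1 for o, b in pairs if o == 1 and b == 1)
    n10 = sum(1 for o, b in pairs if o == 1 and b == 0)
    n01 = sum(1 for o, b in pairs if o == 0 and b == 1)
    n00 = sum(1 for o, b in pairs if o == 0 and b == 0)
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    return (n11 * n00 - n10 * n01) / denom if denom else 0.0


def batch_score(values, batch):
    """Absolute mean difference between batches, scaled by a pooled SD.

    Assumes both batches (0 and 1) contain at least one sample.
    """
    g0 = [v for v, b in zip(values, batch) if b == 0]
    g1 = [v for v, b in zip(values, batch) if b == 1]
    pooled_sd = math.sqrt((pvariance(g0) + pvariance(g1)) / 2)
    diff = abs(mean(g1) - mean(g0))
    if pooled_sd == 0:
        return math.inf if diff else 0.0
    return diff / pooled_sd


def drop_batch_affected(expr, batch, n_drop):
    """Return indices of probes kept after dropping the n_drop probes
    with the highest batch_score. expr is a list of probes, each a list
    of per-sample expression values."""
    scores = [batch_score(probe, batch) for probe in expr]
    ranked = sorted(range(len(expr)), key=lambda i: scores[i], reverse=True)
    dropped = set(ranked[:n_drop])
    return [i for i in range(len(expr)) if i not in dropped]


# Toy data (illustrative only): 8 samples, 3 probes.
outcome = [0, 0, 0, 0, 1, 1, 1, 1]
batch_confounded = [0, 0, 0, 0, 1, 1, 1, 1]  # batch identical to outcome
batch_balanced = [0, 1, 0, 1, 0, 1, 0, 1]    # batch independent of outcome

expr = [
    [1.0, 3.0, 1.1, 3.1, 0.9, 2.9, 1.0, 3.0],  # probe 0: shifts with batch
    [1.0, 1.2, 0.8, 1.0, 2.0, 2.2, 1.8, 2.0],  # probe 1: shifts with outcome
    [1.0, 1.1, 0.9, 1.0, 1.1, 0.9, 1.0, 1.0],  # probe 2: noise
]

print(phi_coefficient(outcome, batch_confounded))   # fully confounded design
print(phi_coefficient(outcome, batch_balanced))     # balanced design
print(drop_batch_affected(expr, batch_balanced, 1)) # probes kept for training
```

In the balanced toy design the phi coefficient is 0 and the filter removes only the batch-shifted probe, leaving the outcome-associated probe available to a predictor. In the fully confounded design the phi coefficient is 1, and a batch-based filter of this kind can no longer distinguish batch-affected probes from outcome-associated ones, which is why the abstract recommends designing training sets that keep this correlation low.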

Year: 2012
PMID: 22611599
PMCID: PMC3760371
DOI: 10.1515/1544-6115.1766
Source DB: PubMed
Journal: Stat Appl Genet Mol Biol
ISSN: 1544-6115


