
A new statistic for identifying batch effects in high-throughput genomic data that uses guided principal component analysis.

Sarah E Reese, Kellie J Archer, Terry M Therneau, Elizabeth J Atkinson, Celine M Vachon, Mariza de Andrade, Jean-Pierre A Kocher, Jeanette E Eckel-Passow.

Abstract

MOTIVATION: Batch effects are probe-specific systematic variation between groups of samples (batches) that arises from experimental features of no biological interest. Principal component analysis (PCA) is commonly used as a visual tool to determine whether batch effects remain after a global normalization method has been applied. However, because PCA yields the linear combinations of the variables that account for the most variance, it will not necessarily detect batch effects that are not the largest source of variability in the data.
RESULTS: We present guided PCA (gPCA), an extension of PCA that quantifies the existence of batch effects, and describe a test statistic based on gPCA for testing whether a batch effect exists. We apply this statistic to simulated data and to two copy number variation case studies: the first consisted of 614 samples from a breast cancer family study assayed with Illumina Human 660 bead-chip arrays, and the second consisted of 703 samples from a family blood pressure study assayed with the Affymetrix SNP Array 6.0. We demonstrate that the statistic has good statistical properties and identifies significant batch effects in both case studies.
CONCLUSION: We developed a new statistic that uses gPCA to identify whether batch effects exist in high-throughput genomic data. Although our examples pertain to copy number data, gPCA is general and can be used on other data types as well.
AVAILABILITY AND IMPLEMENTATION: The gPCA R package (available via CRAN) provides functions and data for performing the methods in this article. CONTACT: reesese@vcu.edu
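
The abstract names the gPCA statistic without defining it. Below is a minimal NumPy sketch, assuming the formulation delta = Var(X v_g) / Var(X v_u). Here X is the column-centered samples-by-features matrix, v_u is the first loading vector from ordinary PCA (the first right singular vector of X), and v_g is the first loading vector from guided PCA (the first right singular vector of Y^T X, where Y is the samples-by-batches indicator matrix). Significance is assessed by permuting batch labels. This is illustrative code, not the gPCA R package's implementation. The function names `gpca_delta` and `gpca_permutation_test`, the feature centering, and the add-one p-value correction are assumptions made for this sketch.

```python
import numpy as np


def gpca_delta(x, batch):
    """Variance along the first guided PC divided by variance along the first unguided PC.

    Sketch under an assumed formulation: x is an (n_samples, n_features) matrix,
    batch is a length-n sequence of batch labels.
    """
    x = np.asarray(x, dtype=float)
    x = x - x.mean(axis=0)  # center each feature (assumption of this sketch)
    batch = np.asarray(batch)
    labels = np.unique(batch)
    # n x b indicator matrix Y: Y[i, k] = 1 if sample i belongs to batch k
    y = (batch[:, None] == labels[None, :]).astype(float)

    # Unguided PCA: first right singular vector of X (unit length)
    v_u = np.linalg.svd(x, full_matrices=False)[2][0]
    # Guided PCA: first right singular vector of Y^T X (unit length)
    v_g = np.linalg.svd(y.T @ x, full_matrices=False)[2][0]

    # v_u maximizes Var(X v) over unit vectors, so the ratio lies in [0, 1]
    return np.var(x @ v_g, ddof=1) / np.var(x @ v_u, ddof=1)


def gpca_permutation_test(x, batch, n_perm=1000, seed=None):
    """Permutation p-value for delta: shuffle batch labels and recompute delta."""
    rng = np.random.default_rng(seed)
    batch = np.asarray(batch)
    observed = gpca_delta(x, batch)
    # For clarity the unguided SVD is recomputed each time, although it
    # does not depend on the batch labels.
    null = np.array([gpca_delta(x, rng.permutation(batch)) for _ in range(n_perm)])
    # Add-one correction keeps the p-value strictly positive (a choice made here)
    p_value = (1 + np.sum(null >= observed)) / (1 + n_perm)
    return observed, p_value


if __name__ == "__main__":
    # Hypothetical simulated data: 3 batches of 20 samples, 200 features,
    # with batch "B" shifted upward on the first 50 features.
    rng = np.random.default_rng(1)
    batch = np.repeat(["A", "B", "C"], 20)
    x = rng.normal(size=(60, 200))
    x[:, :50] += np.where(batch == "B", 1.0, 0.0)[:, None]
    delta, p_value = gpca_permutation_test(x, batch, n_perm=200, seed=2)
    print(f"delta = {delta:.3f}, permutation p-value = {p_value:.3f}")
```

Because the unguided first component maximizes projected variance, delta cannot exceed 1. A value near 1 means that the direction guided by batch nearly coincides with the dominant direction of variance in the data. The permutation step asks whether the observed delta is unusually large compared with deltas from randomly reassigned batch labels.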

Year:  2013        PMID: 23958724      PMCID: PMC3810845          DOI: 10.1093/bioinformatics/btt480

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937

  51 in total

1.  Detection of copy number variants and loss of heterozygosity from impure tumor samples using whole exome sequencing data.

Authors:  Xiaocheng Liu; Ao Li; Jianing Xi; Huanqing Feng; Minghui Wang
Journal:  Oncol Lett       Date:  2018-07-16       Impact factor: 2.967

2.  Detecting hidden batch factors through data-adaptive adjustment for biological effects.

Authors:  Haidong Yi; Ayush T Raman; Han Zhang; Genevera I Allen; Zhandong Liu
Journal:  Bioinformatics       Date:  2018-04-01       Impact factor: 6.937

3.  Feature specific quantile normalization enables cross-platform classification of molecular subtypes using gene expression data.

Authors:  Jennifer M Franks; Guoshuai Cai; Michael L Whitfield
Journal:  Bioinformatics       Date:  2018-06-01       Impact factor: 6.937

4.  Discovery of Blood Transcriptional Endotypes in Women with Pelvic Inflammatory Disease.

Authors:  Xiaojing Zheng; Catherine M O'Connell; Wujuan Zhong; Uma M Nagarajan; Manoj Tripathy; De'Ashia Lee; Ali N Russell; Harold Wiesenfeld; Sharon Hillier; Toni Darville
Journal:  J Immunol       Date:  2018-03-12       Impact factor: 5.422

5.  Machine learning model for predicting Major Depressive Disorder using RNA-Seq data: optimization of classification approach.

Authors:  Pragya Verma; Madhvi Shakya
Journal:  Cogn Neurodyn       Date:  2021-09-22       Impact factor: 5.082

6.  MODEL-BASED FEATURE SELECTION AND CLUSTERING OF RNA-SEQ DATA FOR UNSUPERVISED SUBTYPE DISCOVERY.

Authors:  David K Lim; Naim U Rashid; Joseph G Ibrahim
Journal:  Ann Appl Stat       Date:  2021-03-18       Impact factor: 2.083

7.  Identification and validation of core genes for serous ovarian adenocarcinoma via bioinformatics analysis.

Authors:  Ruru Zhu; Jisen Xue; Huijun Chen; Qian Zhang
Journal:  Oncol Lett       Date:  2020-08-21       Impact factor: 2.967

8.  Evaluation of inter-batch differences in stem-cell derived neurons.

Authors:  Gladys Morrison; Cong Liu; Claudia Wing; Shannon M Delaney; Wei Zhang; M Eileen Dolan
Journal:  Stem Cell Res       Date:  2015-12-31       Impact factor: 2.020

9.  A transdisciplinary approach to understand the epigenetic basis of race/ethnicity health disparities.

Authors:  Lucas A Salas; Lauren C Peres; Zaneta M Thayer; Rick Wa Smith; Yichen Guo; Wonil Chung; Jiahui Si; Liming Liang
Journal:  Epigenomics       Date:  2021-03-10       Impact factor: 4.778

10.  COPD-associated miR-145-5p is downregulated in early-decline FEV1 trajectories in childhood asthma.

Authors:  Anshul Tiwari; Jiang Li; Alvin T Kho; Maoyun Sun; Quan Lu; Scott T Weiss; Kelan G Tantisira; Michael J McGeachie
Journal:  J Allergy Clin Immunol       Date:  2020-12-29       Impact factor: 10.793
