Teng Fei (1), Tengjiao Zhang (2), Weiyang Shi (3), Tianwei Yu (1)
(1) Department of Biostatistics and Bioinformatics, Emory University, Atlanta, GA, USA.
(2) School of Life Sciences and Technology, Tongji University, Shanghai, China.
(3) Ministry of Education Key Laboratory of Marine Genetics and Breeding, College of Marine Life Sciences, Ocean University of China, Qingdao, China.
Abstract
Motivation: Batch effects are well known to exist in RNA-seq and other profiling data. Although some methods adjust for batch effects effectively by modifying the data matrices, removing them entirely remains difficult. Residual batch effects can cause artifacts when detecting patterns in the data.

Results: In this study, we address batch effects in pattern detection among samples, such as clustering, dimension reduction and the construction of networks between subjects. Instead of adjusting the original data matrices, we design an adaptive method that directly adjusts the dissimilarity matrix between samples. In simulation studies, the method recovered the true underlying clusters better than the leading batch effect adjustment method, ComBat. In real data analysis, the method effectively corrected distance matrices and improved the performance of clustering algorithms.

Availability and implementation: The R package is available at https://github.com/tengfei-emory/QuantNorm.

Supplementary information: Supplementary data are available at Bioinformatics online.
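To make the idea of adjusting a dissimilarity matrix rather than the data matrix more concrete, the following Python sketch rank-matches the between-batch block of a distance matrix to the averaged within-batch distance quantiles. It is an illustration under our own assumptions and should not be read as the algorithm implemented in the QuantNorm R package: the function names, the toy coordinates, the equal-batch-size restriction and the choice of reference distribution are all hypothetical.

```python
import math


def pairwise_distances(points):
    """Euclidean distance matrix as a list of lists."""
    return [[math.dist(p, q) for q in points] for p in points]


def rank_match(values, reference_sorted):
    """Replace each value by the reference value that holds the same rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    matched = [0.0] * len(values)
    for rank, idx in enumerate(order):
        matched[idx] = reference_sorted[rank]
    return matched


def adjust_cross_batch(dist, batch_a, batch_b):
    """Rank-match the A-by-B block to the averaged within-batch quantiles.

    Simplifying assumptions (ours, not from the abstract): both batches
    have the same size and a similar mix of underlying groups.
    """
    if len(batch_a) != len(batch_b):
        raise ValueError("this sketch assumes equal batch sizes")
    # Reference distribution: element-wise mean of the sorted within-batch blocks.
    within_a = sorted(dist[i][j] for i in batch_a for j in batch_a)
    within_b = sorted(dist[i][j] for i in batch_b for j in batch_b)
    reference = [(x + y) / 2 for x, y in zip(within_a, within_b)]

    pairs = [(i, j) for i in batch_a for j in batch_b]
    matched = rank_match([dist[i][j] for i, j in pairs], reference)

    adjusted = [row[:] for row in dist]  # leave the input matrix untouched
    for (i, j), value in zip(pairs, matched):
        adjusted[i][j] = adjusted[j][i] = value  # keep the matrix symmetric
    return adjusted


# Toy data: two groups per batch; batch B is batch A shifted by 8 units.
batch_a_points = [(0.0, 0.0), (0.1, 0.0), (5.0, 0.0), (5.1, 0.0)]
batch_b_points = [(x, y + 8.0) for x, y in batch_a_points]
points = batch_a_points + batch_b_points
groups = [0, 0, 1, 1, 0, 0, 1, 1]
batch_a, batch_b = [0, 1, 2, 3], [4, 5, 6, 7]

raw = pairwise_distances(points)
adjusted = adjust_cross_batch(raw, batch_a, batch_b)
```

In this toy setting the raw distance between same-group samples from different batches exceeds the distance between different groups within a batch, so a clustering of `raw` would tend to split by batch. In `adjusted`, same-group pairs across batches end up closer than different-group pairs, while within-batch distances are unchanged.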