Weiguang Mao (1,2), Javad Rahimikollu (1,2), Ryan Hausler (3), Maria Chikina (1,2)
1. Joint Carnegie Mellon-University of Pittsburgh Ph.D. Program in Computational Biology, Pittsburgh, PA 15260, USA.
2. Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15260, USA.
3. Department of Medicine, Division of Hematology/Oncology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA.
Abstract
MOTIVATION: RNA-seq technology provides unprecedented power for assessing transcript abundance and can be used for a variety of downstream tasks, such as inferring gene-correlation networks and discovering eQTLs. However, raw gene expression values must be normalized for nuisance biological variation and technical covariates, and different normalization strategies can lead to dramatically different downstream results. RESULTS: We describe a generalization of singular value decomposition (SVD)-based reconstruction for which the common techniques of whitening, rank-k approximation and removal of the top k principal components are special cases. Our simple three-parameter transformation, DataRemix, can be tuned to reweight the contribution of hidden factors and reveal otherwise hidden biological signals. In particular, we demonstrate that the method can effectively prioritize biological signals over noise without leveraging external dataset-specific knowledge, and that it can outperform normalization methods that make explicit use of known technical factors. We also show that DataRemix can be efficiently optimized via a Thompson sampling approach, which makes it feasible for computationally expensive objectives such as eQTL analysis. Finally, we apply our method to the Religious Orders Study and Memory and Aging Project (ROSMAP) dataset, and we report what is, to our knowledge, the first replicable trans-eQTL effect in human brain. AVAILABILITY AND IMPLEMENTATION: DataRemix is an R package freely available on GitHub (https://github.com/wgmao/DataRemix). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
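The abstract frames whitening, rank-k approximation and removal of the top k principal components as special cases of SVD-based reconstruction. The minimal NumPy sketch below illustrates only that shared structure: each technique rebuilds the expression matrix from its SVD after reweighting the singular values. It is not the DataRemix three-parameter formula, and the helper names (`reweighted_reconstruction`, `rank_k`, `whiten`, `remove_top_k`) are hypothetical; restricting whitening to the top k directions is also an assumption made here for illustration. The actual package is written in R.

```python
import numpy as np

def reweighted_reconstruction(X, weight_fn):
    """Rebuild X from its thin SVD, replacing each singular value d_i
    with weight_fn(i, d_i). Hypothetical helper, not the DataRemix API."""
    U, d, Vt = np.linalg.svd(X, full_matrices=False)
    w = np.array([weight_fn(i, di) for i, di in enumerate(d)])
    # Scale column i of U by w[i], then recombine with the right singular vectors.
    return (U * w) @ Vt

def rank_k(k):
    # Rank-k approximation: keep the top k singular values, zero the rest.
    return lambda i, d: d if i < k else 0.0

def whiten(k):
    # Whitening (assumed here to be restricted to the top k directions):
    # keep those directions but give each one unit weight.
    return lambda i, d: 1.0 if i < k else 0.0

def remove_top_k(k):
    # Remove the top k principal components, keep the remainder unchanged.
    return lambda i, d: 0.0 if i < k else d
```

For example, `reweighted_reconstruction(X, remove_top_k(3))` returns the matrix with its three leading components removed, and the rank-k and top-k-removal reconstructions for the same k sum back to the original matrix. A tunable transformation such as DataRemix can be viewed as interpolating between weightings like these rather than committing to one of them.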