
DataRemix: a universal data transformation for optimal inference from gene expression datasets.

Weiguang Mao, Javad Rahimikollu, Ryan Hausler, Maria Chikina.

Abstract

MOTIVATION: RNA-seq technology provides unprecedented power in the assessment of transcript abundance and can be used to perform a variety of downstream tasks, such as gene-correlation network inference and eQTL discovery. However, raw gene expression values have to be normalized for nuisance biological variation and technical covariates, and different normalization strategies can lead to dramatically different results in the downstream study.
RESULTS: We describe a generalization of singular value decomposition-based reconstruction for which the common techniques of whitening, rank-k approximation and removing the top k principal components are special cases. Our simple three-parameter transformation, DataRemix, can be tuned to reweigh the contribution of hidden factors and reveal otherwise hidden biological signals. In particular, we demonstrate that the method can effectively prioritize biological signals over noise without leveraging external dataset-specific knowledge, and can outperform normalization methods that make explicit use of known technical factors. We also show that DataRemix can be efficiently optimized via a Thompson sampling approach, which makes it feasible for computationally expensive objectives such as eQTL analysis. Finally, we apply our method to the Religious Orders Study and Memory and Aging Project dataset, and we report what is, to our knowledge, the first replicable trans-eQTL effect in human brain.
AVAILABILITY AND IMPLEMENTATION: DataRemix is an R package which is freely available on GitHub (https://github.com/wgmao/DataRemix).
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
© The Author(s) 2020. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
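
The idea of an SVD-based reconstruction with reweighted components can be illustrated with a minimal sketch. The abstract does not state the formula, so the parameterization below is an assumption made for illustration only: the function name `remix`, the parameter names `k`, `p` and `mu`, and the form U_k S_k^p V_k^T + mu (X - U_k S_k V_k^T) are hypothetical and are not taken from the DataRemix R package. The sketch is written in Python with NumPy rather than R.

```python
import numpy as np

def remix(X, k, p, mu):
    """Hypothetical three-parameter SVD reweighting (an assumed form,
    not the DataRemix package API):
        U_k S_k^p V_k^T + mu * (X - U_k S_k V_k^T)
    k  : number of leading singular components to reweight
    p  : exponent applied to the top-k singular values
    mu : weight on the residual left after removing the top-k components
    """
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    top_reweighted = (U[:, :k] * s[:k] ** p) @ Vt[:k]   # U_k S_k^p V_k^T
    rank_k = (U[:, :k] * s[:k]) @ Vt[:k]                # U_k S_k V_k^T
    return top_reweighted + mu * (X - rank_k)

# Toy matrix standing in for an expression matrix (rows x columns).
rng = np.random.default_rng(0)
X = rng.standard_normal((6, 4))
r = min(X.shape)

unchanged = remix(X, k=2, p=1.0, mu=1.0)  # p = 1, mu = 1 returns X itself
rank2 = remix(X, k=2, p=1.0, mu=0.0)      # p = 1, mu = 0 is the rank-k approximation
white = remix(X, k=r, p=0.0, mu=0.0)      # k = full rank, p = 0 gives U V^T (whitening)
```

Under this assumed form, the term weighted by `mu` is exactly the residual obtained by removing the top k principal components, so intermediate values of `p` and `mu` interpolate between the standard techniques named in the abstract.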


Year:  2021        PMID: 32821903      PMCID: PMC8128479          DOI: 10.1093/bioinformatics/btaa745

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


