DataRemix: a universal data transformation for optimal inference from gene expression datasets.

Weiguang Mao^1,2, Javad Rahimikollu^1,2, Ryan Hausler^3, Maria Chikina^1,2.

Abstract

MOTIVATION: RNA-seq technology provides unprecedented power in the assessment of transcript abundance and can be used to perform a variety of downstream tasks, such as inference of gene-correlation networks and eQTL discovery. However, raw gene expression values have to be normalized for nuisance biological variation and technical covariates, and different normalization strategies can lead to dramatically different results in the downstream study.
RESULTS: We describe a generalization of singular value decomposition-based reconstruction for which the common techniques of whitening, rank-k approximation and removing the top k principal components are special cases. Our simple three-parameter transformation, DataRemix, can be tuned to reweigh the contribution of hidden factors and reveal otherwise hidden biological signals. In particular, we demonstrate that the method can effectively prioritize biological signals over noise without leveraging external dataset-specific knowledge, and can outperform normalization methods that make explicit use of known technical factors. We also show that DataRemix can be efficiently optimized via a Thompson sampling approach, which makes it feasible for computationally expensive objectives such as eQTL analysis. Finally, we apply our method to the Religious Orders Study and Memory and Aging Project dataset, and we report what is, to our knowledge, the first replicable trans-eQTL effect in human brain.
AVAILABILITY AND IMPLEMENTATION: DataRemix is an R package which is freely available at GitHub (https://github.com/wgmao/DataRemix).
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
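
The idea of a singular value decomposition-based reconstruction with whitening and rank-k approximation as special cases can be illustrated with a minimal NumPy sketch. The abstract does not state the formula, so the form used here is an assumption made for illustration: the top k singular values are raised to a power p, and the remaining components are scaled by a weight mu. The function name `remix` and its parameter names are hypothetical and are not the API of the DataRemix R package. The sketch checks only two of the special cases named above, plus the identity setting.

```python
import numpy as np

def remix(X, k, p, mu):
    """Hypothetical three-parameter SVD reweighting (assumed form, for illustration only).

    k  : number of leading singular components to reweigh
    p  : exponent applied to the top-k singular values
    mu : weight applied to the remaining (residual) components
    """
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    top = (U[:, :k] * s[:k] ** p) @ Vt[:k]   # reweighed leading components
    rest = (U[:, k:] * s[k:]) @ Vt[k:]       # residual components, unchanged
    return top + mu * rest

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 8))                 # toy stand-in for an expression matrix

same = remix(X, k=3, p=1.0, mu=1.0)          # p = 1, mu = 1: the data are unchanged
rank3 = remix(X, k=3, p=1.0, mu=0.0)         # p = 1, mu = 0: rank-k approximation
white = remix(X, k=8, p=0.0, mu=0.0)         # k = full rank, p = 0: whitening

print(np.allclose(same, X))
print(np.linalg.matrix_rank(rank3))
print(np.allclose(white.T @ white, np.eye(8)))
```

Under this assumed form, intermediate values of p and mu interpolate between these settings, which is one way a small number of parameters could reweigh hidden factors rather than keep or discard them outright.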

Year: 2021 PMID: 32821903 PMCID: PMC8128479 DOI: 10.1093/bioinformatics/btaa745

Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
