Removing unwanted variation from large-scale RNA sequencing data with PRPS.

Anthony T Papenfuss, Terence P Speed, Ramyar Molania, Momeneh Foroutan, Johann A Gagnon-Bartsch, Luke C Gandolfo, Aryan Jain, Abhishek Sinha, Gavriel Olshansky, Alexander Dobrovic.

Abstract

Accurate identification and effective removal of unwanted variation are essential for deriving meaningful biological results from RNA sequencing (RNA-seq) data, especially when the data come from large and complex studies. Using RNA-seq data from The Cancer Genome Atlas (TCGA), we examine several sources of unwanted variation and demonstrate how these can significantly compromise various downstream analyses, including cancer subtype identification, association between gene expression and survival outcomes, and gene co-expression analysis. We propose a strategy, called pseudo-replicates of pseudo-samples (PRPS), for deploying our recently developed normalization method, removing unwanted variation III (RUV-III), to remove the variation caused by library size, tumor purity and batch effects in TCGA RNA-seq data. We illustrate the value of our approach by comparing it to the standard TCGA normalizations on several TCGA RNA-seq datasets. RUV-III with PRPS can also be used to integrate and normalize other large transcriptomic datasets coming from multiple laboratories or platforms.
© 2022. The Author(s).
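
The abstract names the PRPS strategy without spelling out its mechanics. The sketch below illustrates one plausible reading, and it is not the authors' implementation. It assumes that a pseudo-sample is the average of samples sharing both a biological group and a batch, and that pseudo-samples with the same biology in different batches form a pseudo-replicate set. The function names, label names and toy values are hypothetical, and the published method may differ in detail (for example, in the scale on which averaging is done).

```python
from collections import defaultdict


def make_pseudo_samples(expr, biology, batch):
    """Average samples that share a (biology, batch) pair into one pseudo-sample.

    expr    : list of per-sample expression vectors (log scale is an assumption)
    biology : per-sample biological label (hypothetical, e.g. cancer subtype)
    batch   : per-sample unwanted-variation label (hypothetical, e.g. plate)
    """
    groups = defaultdict(list)
    for vec, bio, b in zip(expr, biology, batch):
        groups[(bio, b)].append(vec)

    pseudo = {}
    for key in sorted(groups):
        vecs = groups[key]
        # Gene-wise mean across all samples in this (biology, batch) group.
        pseudo[key] = [sum(col) / len(vecs) for col in zip(*vecs)]
    return pseudo


def pseudo_replicate_sets(pseudo):
    """Group pseudo-samples with the same biology across different batches."""
    sets = defaultdict(list)
    for bio, b in pseudo:
        sets[bio].append((bio, b))
    # Keep only sets that span at least two batches; a single pseudo-sample
    # carries no information about between-batch differences.
    return {bio: keys for bio, keys in sets.items() if len(keys) >= 2}


# Toy data: 5 samples x 2 genes (values are made up for illustration).
expr = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [9.0, 9.0]]
biology = ["A", "A", "A", "A", "B"]
batch = ["p1", "p1", "p2", "p2", "p1"]

ps = make_pseudo_samples(expr, biology, batch)
sets = pseudo_replicate_sets(ps)
print(ps)
print(sets)
```

Under these assumptions, differences between pseudo-samples within a set reflect mainly unwanted variation rather than biology. A replicate-based method such as RUV-III (available, for instance, as `RUVIII` in the `ruv` R package) could then use such sets where true technical replicates are missing.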

Year:  2022        PMID: 36109686     DOI: 10.1038/s41587-022-01440-w

Source DB:  PubMed          Journal:  Nat Biotechnol        ISSN: 1087-0156            Impact factor:   68.164


