
Length bias correction for RNA-seq data in gene set analyses.

Liyan Gao, Zhide Fang, Kui Zhang, Degui Zhi, Xiangqin Cui.

Abstract

MOTIVATION: Next-generation sequencing technologies are being rapidly applied to quantifying transcripts (RNA-seq). However, because of the properties of RNA-seq data, the differential expression of longer transcripts is more likely to be identified than that of shorter transcripts with the same effect size. This bias complicates downstream gene set analysis (GSA), because the GSA methods previously developed for microarray data assume that genes with the same effect size have equal probability (power) of being identified as significantly differentially expressed. Since transcript length is not related to gene expression, adjusting for this length dependency in GSA becomes necessary.
RESULTS: In this article, we proposed two approaches to transcript-length adjustment for analyses based on Poisson models: (i) At the individual gene level, we adjusted each gene's test statistic using the square root of transcript length and then tested each gene set using the Wilcoxon rank-sum test. (ii) At the gene set level, we adjusted the null distribution for Fisher's exact test by weighting the identification probability of each gene using the square root of its transcript length. We evaluated these two approaches using simulations and a real dataset, and showed that both can effectively reduce transcript-length bias. The top-ranked GO terms obtained from the proposed adjustments overlap more with the microarray results.
AVAILABILITY: R scripts are at http://www.soph.uab.edu/Statgenetics/People/XCui/r-codes/.
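
The two adjustments described under RESULTS can be illustrated with a minimal Python sketch. This is not the authors' R code, and it rests on assumptions that the abstract does not confirm: "adjusting using the square root of transcript length" is read here as dividing each gene's statistic by the square root of its length, and the length-weighted null of approach (ii) is approximated by Monte Carlo weighted sampling without replacement rather than by whatever computation the authors used. All function names and the toy data are hypothetical.

```python
import math
import random

def length_adjusted_stats(stats, lengths):
    """Approach (i), assumed form: divide each statistic by sqrt(length)."""
    return [s / math.sqrt(length) for s, length in zip(stats, lengths)]

def rank_sum_z(in_set, out_set):
    """Wilcoxon rank-sum z-score (normal approximation, average ranks for
    ties, no tie correction in the variance)."""
    pooled = sorted([(v, 0) for v in in_set] + [(v, 1) for v in out_set])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied block
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n1, n2 = len(in_set), len(out_set)
    w = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    mean = n1 * (n1 + n2 + 1) / 2
    var = n1 * n2 * (n1 + n2 + 1) / 12
    return (w - mean) / math.sqrt(var)

def weighted_null_pvalue(weights, in_set, n_de, observed, n_draws=2000, seed=0):
    """Approach (ii), assumed form: Monte Carlo null in which n_de genes are
    drawn without replacement with probability proportional to `weights`;
    returns the estimated P(overlap with the gene set >= observed)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_draws):
        # Efraimidis-Spirakis keys: the n_de largest keys form a weighted
        # sample without replacement.
        keys = [(rng.random() ** (1.0 / w), i) for i, w in enumerate(weights)]
        chosen = sorted(keys, reverse=True)[:n_de]
        overlap = sum(1 for _, i in chosen if in_set[i])
        if overlap >= observed:
            hits += 1
    return (hits + 1) / (n_draws + 1)

# Toy data (hypothetical): the per-gene effect is unrelated to set membership,
# but the raw statistic grows with sqrt(length) and the set holds long genes.
lengths = [400, 900, 1600, 2500, 3600, 4900, 6400, 8100]
effects = [3, 1, 4, 2, 3, 1, 4, 2]
stats = [e * math.sqrt(length) for e, length in zip(effects, lengths)]
in_set = [False] * 4 + [True] * 4

def split(values):
    inside = [v for v, m in zip(values, in_set) if m]
    outside = [v for v, m in zip(values, in_set) if not m]
    return inside, outside

z_raw = rank_sum_z(*split(stats))                                  # about 1.73
z_adj = rank_sum_z(*split(length_adjusted_stats(stats, lengths)))  # 0.0

sqrt_weights = [math.sqrt(length) for length in lengths]
p_weighted = weighted_null_pvalue(sqrt_weights, in_set, n_de=4, observed=3)
p_uniform = weighted_null_pvalue([1.0] * len(lengths), in_set, n_de=4, observed=3)
print(z_raw, z_adj, p_weighted, p_uniform)
```

In this toy example the unadjusted rank-sum score favours the long-gene set purely because of length, and the adjustment removes that signal. Likewise, the length-weighted null makes an overlap of three long genes less surprising than the equal-weight null that an unadjusted Fisher's exact test assumes.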

Year:  2011        PMID: 21252076      PMCID: PMC3042188          DOI: 10.1093/bioinformatics/btr005

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


