MOTIVATION: Next-generation sequencing technologies are being rapidly applied to quantifying transcripts (RNA-seq). However, because the number of reads obtained from a transcript increases with its length, differential expression of longer transcripts is more likely to be detected than that of shorter transcripts with the same effect size. This bias complicates downstream gene set analysis (GSA), because the GSA methods developed for microarray data assume that genes with the same effect size have an equal probability (power) of being identified as significantly differentially expressed. Since transcript length is not related to gene expression, it is necessary to adjust for this length dependency in GSA.
RESULTS: In this article, we propose two approaches to transcript-length adjustment for analyses based on Poisson models: (i) at the individual gene level, we adjust each gene's test statistic by the square root of its transcript length and then test each gene set with the Wilcoxon rank-sum test; (ii) at the gene set level, we adjust the null distribution of Fisher's exact test by weighting each gene's probability of being identified by the square root of its transcript length. We evaluated both approaches using simulations and a real dataset and showed that they effectively reduce transcript-length bias. The top-ranked GO terms obtained with the proposed adjustments overlap more with the microarray results.
AVAILABILITY: R scripts are at http://www.soph.uab.edu/Statgenetics/People/XCui/r-codes/.
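Under a Poisson model, a transcript's expected read count scales with its length. For a fixed effect size, the test statistic therefore grows roughly like the square root of the length, which is why both adjustments use that factor.

The Python sketch below illustrates the two adjustments. It is a minimal illustration, not the authors' R scripts, and it rests on the following assumptions:

* For (i), the adjustment divides the absolute gene-level statistic by sqrt(length). The rank-sum test uses the tie-corrected normal approximation.
* For (ii), the length-weighted null is approximated by Monte Carlo sampling without replacement using Efraimidis-Spirakis keys, rather than computed exactly.
* All function names are hypothetical.

```python
import heapq
import math
import random
from collections import Counter


def average_ranks(values):
    """Return 1-based ranks of values; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_sum_test(in_set, out_set):
    """One-sided Wilcoxon rank-sum test (tie-corrected normal approximation):
    are values inside the gene set larger than values outside it?"""
    values = list(in_set) + list(out_set)
    n1, n2 = len(in_set), len(out_set)
    n = n1 + n2
    w = sum(average_ranks(values)[:n1])  # rank sum of the in-set genes
    mean_w = n1 * (n + 1) / 2
    ties = sum(t ** 3 - t for t in Counter(values).values())
    var_w = n1 * n2 / 12 * ((n + 1) - ties / (n * (n - 1)))
    z = (w - mean_w) / math.sqrt(var_w)
    return 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail p-value


def length_adjusted_rank_sum(stats, lengths, gene_set):
    """Approach (i), sketched: divide |statistic| by sqrt(length), then run the
    rank-sum test of in-set versus out-of-set genes."""
    adjusted = {g: abs(t) / math.sqrt(lengths[g]) for g, t in stats.items()}
    in_set = [v for g, v in adjusted.items() if g in gene_set]
    out_set = [v for g, v in adjusted.items() if g not in gene_set]
    return rank_sum_test(in_set, out_set)


def weighted_fisher_null_pvalue(de_genes, lengths, gene_set, n_draws=10000, seed=0):
    """Approach (ii), sketched as a Monte Carlo null. Each draw selects
    len(de_genes) genes without replacement, with probability proportional to
    sqrt(length). The p-value is the fraction of draws in which at least as
    many genes fall in gene_set as observed (gene_set must be a set)."""
    rng = random.Random(seed)
    genes = list(lengths)
    weights = [math.sqrt(lengths[g]) for g in genes]
    k = len(de_genes)
    observed = len(set(de_genes) & gene_set)
    hits = 0
    for _ in range(n_draws):
        # The k largest keys u ** (1 / w) form a weighted sample without replacement.
        keys = ((rng.random() ** (1.0 / w), g) for w, g in zip(weights, genes))
        drawn = [g for _, g in heapq.nlargest(k, keys)]
        if sum(g in gene_set for g in drawn) >= observed:
            hits += 1
    return (hits + 1) / (n_draws + 1)
```

As a usage example, consider a gene set made of long genes whose raw statistics are inflated only by length. The set looks enriched under `rank_sum_test` on the raw statistics, but not under `length_adjusted_rank_sum`.

In `weighted_fisher_null_pvalue`, passing equal lengths recovers an ordinary (unweighted) hypergeometric null. Because the Monte Carlo null is only approximate, very small p-values call for a larger `n_draws`.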