Wei Vivian Li (1), Jingyi Jessica Li (1, 2). (1) Department of Statistics, University of California, Los Angeles, Los Angeles, CA 90095-1554, USA. (2) Department of Human Genetics, University of California, Los Angeles, Los Angeles, CA 90095-088, USA.
Abstract
BACKGROUND: Since their invention, next-generation RNA sequencing (RNA-seq) technologies have become a powerful tool for studying the presence and quantity of RNA molecules in biological samples and have revolutionized transcriptomic studies. The analysis of RNA-seq data at four levels (samples, genes, transcripts, and exons) involves multiple statistical and computational questions, some of which remain challenging to date. RESULTS: We review RNA-seq analysis tools at the sample, gene, transcript, and exon levels from a statistical perspective. We also highlight the biological and statistical questions of greatest practical importance. CONCLUSIONS: The development of statistical and computational methods for analyzing RNA-seq data has advanced significantly over the past decade. However, methods developed to answer the same biological question often rely on different statistical models and perform differently across scenarios. This review discusses and compares several commonly used statistical models with respect to their assumptions, in the hope of helping users select appropriate methods and assisting developers in future method development.