Alyssa C Frazee (1), Andrew E Jaffe (2), Ben Langmead (3), Jeffrey T Leek (1)
1. Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Center for Computational Biology and.
2. Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Lieber Institute for Brain Development, Johns Hopkins Medical Campus, Center for Computational Biology and.
3. Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Center for Computational Biology and Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA.
Abstract
MOTIVATION: Statistical methods development for differential expression analysis of RNA sequencing (RNA-seq) requires software tools to assess accuracy and error rate control. Since true differential expression status is often unknown in experimental datasets, artificially constructed datasets must be used, either by generating costly spike-in experiments or by simulating RNA-seq data.

RESULTS: Polyester is an R package designed to simulate RNA-seq data, beginning with an experimental design and ending with collections of RNA-seq reads. Its main advantage is the ability to simulate reads indicating isoform-level differential expression across biological replicates for a variety of experimental designs. Data generated by Polyester are a reasonable approximation to real RNA-seq data, and standard differential expression workflows can recover the differential expression set in the simulation by the user.

AVAILABILITY AND IMPLEMENTATION: Polyester is freely available from Bioconductor (http://bioconductor.org/).

CONTACT: jtleek@gmail.com

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
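The idea of going from an experimental design to per-replicate read counts with known differential expression can be illustrated with a small sketch. This is not Polyester's R interface or its implementation; it is a Python illustration written for this summary. The function names (`poisson`, `negative_binomial`, `simulate_counts`), the negative binomial count model, and the dispersion choice `size = mean / 3` are all assumptions made for the example, and the sketch stops at transcript-level counts rather than producing sequencing reads.

```python
import math
import random


def poisson(lam, rng):
    """Draw a Poisson(lam) variate with Knuth's method.

    Large rates are split into chunks of at most 500 so exp(-step)
    never underflows; a sum of independent Poissons is Poisson.
    """
    total = 0
    while lam > 0:
        step = min(lam, 500.0)
        limit = math.exp(-step)
        k, p = 0, rng.random()
        while p > limit:
            k += 1
            p *= rng.random()
        total += k
        lam -= step
    return total


def negative_binomial(mean, size, rng):
    """Draw a negative binomial count as a gamma-Poisson mixture."""
    lam = rng.gammavariate(size, mean / size)  # shape=size, scale=mean/size
    return poisson(lam, rng)


def simulate_counts(baseline_reads, fold_changes, num_reps, seed=0):
    """Return counts[transcript][sample] for a multi-group design.

    baseline_reads: mean reads per transcript before any fold change
    fold_changes:   one row per transcript, one multiplier per group
    num_reps:       number of biological replicates in each group
    """
    rng = random.Random(seed)
    counts = []
    for base, group_fcs in zip(baseline_reads, fold_changes):
        row = []
        for fc, reps in zip(group_fcs, num_reps):
            mean = base * fc
            size = mean / 3.0  # assumed dispersion, chosen for illustration
            row.extend(negative_binomial(mean, size, rng) for _ in range(reps))
        counts.append(row)
    return counts


# Hypothetical design: 3 transcripts, 2 groups, 10 replicates per group.
baseline_reads = [50, 50, 50]
fold_changes = [[1, 4], [4, 1], [1, 1]]  # transcript 3 is not differentially expressed
num_reps = [10, 10]

counts = simulate_counts(baseline_reads, fold_changes, num_reps, seed=42)
for row in counts:
    group1, group2 = row[:10], row[10:]
    print(sum(group1) / 10, sum(group2) / 10)
```

Because the fold changes are set by the user, the true differential expression status of every transcript is known, which is what makes simulated data useful for checking whether a downstream workflow recovers it.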