Hao Wu, Chi Wang, Zhijin Wu. Department of Biostatistics and Bioinformatics, Emory University, Atlanta, GA 30322; Department of Biostatistics and Markey Cancer Center, University of Kentucky, Lexington, KY 40536; and Department of Biostatistics, Brown University, Providence, RI 02806, USA.
Abstract
MOTIVATION: RNA-seq has become a routine technique in differential expression (DE) identification. Scientists face a number of experimental design decisions, including the sample size. The power for detecting differential expression is affected by several factors, including the fraction of DE genes, distribution of the magnitude of DE, distribution of gene expression level, sequencing coverage and the choice of type I error control. The complexity and flexibility of RNA-seq experiments, the high-throughput nature of transcriptome-wide expression measurements and the unique characteristics of RNA-seq data make the power assessment particularly challenging.

RESULTS: We propose prospective power assessment instead of a direct sample size calculation by making assumptions on all of these factors. Our power assessment tool includes two components: (i) a semi-parametric simulation that generates data based on actual RNA-seq experiments with flexible choices on baseline expressions, biological variations and patterns of DE; and (ii) a power assessment component that provides a comprehensive view of power. We introduce the concepts of stratified power and false discovery cost, and demonstrate the usefulness of our method in experimental design (such as sample size and sequencing depth), as well as analysis plan (gene filtering).

AVAILABILITY: The proposed method is implemented in a freely available R software package PROPER.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
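The two components described in the abstract (simulate data, then summarize power) can be illustrated with a minimal sketch. This is not the PROPER implementation and does not use its API. It rests on several assumptions that are not in the abstract: counts are drawn from a fully parametric negative binomial model rather than a semi-parametric one built from real data; the three expression strata, the dispersion, the fold change and the fraction of DE genes are hypothetical values; a fixed cutoff on a Welch t statistic stands in for proper type I error control; and false discovery cost is taken here to mean the number of false discoveries per true discovery.

```python
import math
import random
import statistics

def nb_count(rng, mean, dispersion):
    """Draw one negative binomial count as a gamma-Poisson mixture."""
    # Gamma rate with expectation `mean` and variance dispersion * mean**2.
    rate = rng.gammavariate(1.0 / dispersion, mean * dispersion)
    # Poisson draw by Knuth's method; adequate for the modest rates used here.
    limit, k, prod = math.exp(-rate), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def welch_t(x, y):
    """Welch t statistic; returns 0.0 when both groups have zero variance."""
    se = math.sqrt(statistics.variance(x) / len(x) + statistics.variance(y) / len(y))
    return 0.0 if se == 0 else (statistics.mean(x) - statistics.mean(y)) / se

def simulate_once(rng, n_per_group, n_genes=300, p_de=0.1, dispersion=0.1, cutoff=3.0):
    """Simulate one two-group experiment; return (stratum, is_de, called) per gene."""
    baseline = {"low": 4.0, "mid": 16.0, "high": 64.0}  # hypothetical mean counts
    results = []
    for _ in range(n_genes):
        stratum = rng.choice(("low", "mid", "high"))
        base = baseline[stratum]
        is_de = rng.random() < p_de
        lfc = rng.choice((-1.5, 1.5)) if is_de else 0.0  # hypothetical log2 fold change
        a = [math.log2(nb_count(rng, base, dispersion) + 1) for _ in range(n_per_group)]
        b = [math.log2(nb_count(rng, base * 2 ** lfc, dispersion) + 1)
             for _ in range(n_per_group)]
        # Fixed cutoff is a stand-in for FDR-based calling (assumption).
        results.append((stratum, is_de, abs(welch_t(a, b)) > cutoff))
    return results

def assess(n_per_group, n_sims=5, seed=1):
    """Pool several simulations; return (power by expression stratum, false discovery cost)."""
    rng = random.Random(seed)
    genes = [g for _ in range(n_sims) for g in simulate_once(rng, n_per_group)]
    power = {}
    for s in ("low", "mid", "high"):
        calls = [called for stratum, is_de, called in genes if stratum == s and is_de]
        power[s] = sum(calls) / len(calls) if calls else float("nan")
    tp = sum(1 for _, is_de, called in genes if is_de and called)
    fp = sum(1 for _, is_de, called in genes if not is_de and called)
    fdc = fp / tp if tp else float("nan")  # false discoveries per true discovery
    return power, fdc

if __name__ == "__main__":
    for n in (3, 10):
        power, fdc = assess(n)
        print(n, {k: round(v, 2) for k, v in power.items()}, round(fdc, 2))
```

Running the sketch for several values of `n_per_group` gives a rough picture of how power in each expression stratum and the cost of false discoveries change with sample size, which is the kind of comparison the abstract describes for experimental design.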