Yuwen Liu¹, Jie Zhou, Kevin P White.
¹ Institute of Genomics and Systems Biology, Committee on Development, Regeneration, and Stem Cell Biology and Department of Human Genetics, University of Chicago, Chicago, IL 60637, USA.
Abstract
MOTIVATION: RNA-seq is replacing microarrays as the primary tool for gene expression studies. Many RNA-seq studies have used too few biological replicates, resulting in low statistical power and inefficient use of sequencing resources.
RESULTS: We show the explicit trade-off between adding biological replicates and sequencing more deeply when the goal is to increase power to detect differentially expressed (DE) genes. In the human cell line MCF7, increasing sequencing depth beyond 10 M reads gives diminishing returns on power to detect DE genes, whereas adding biological replicates improves power significantly regardless of sequencing depth. We also propose a cost-effectiveness metric for guiding the design of large-scale RNA-seq DE studies. Our analysis shows that sequencing fewer reads and performing more biological replication is an effective strategy to increase power and accuracy in large-scale differential expression RNA-seq studies, and it provides new insights into the efficient experimental design of RNA-seq studies.
AVAILABILITY AND IMPLEMENTATION: The code used in this paper is available at http://home.uchicago.edu/~jiezhou/replication/. The expression data are deposited in the Gene Expression Omnibus under the accession ID GSE51403.
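The replicate-versus-depth trade-off described in the abstract can be illustrated with a back-of-the-envelope power calculation. The sketch below is not the authors' analysis or their cost-effectiveness metric. It assumes a negative-binomial count model in which the per-sample variance of a gene's log expression is roughly `1/mean_count + dispersion` (a delta-method approximation), and it uses a simple two-sided Wald test on the log fold change. The function name `approx_power` and every default value (`dispersion`, `gene_fraction`, `log_fold_change`, `alpha`) are illustrative assumptions, not values taken from the paper.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def approx_power(n_reps, depth_millions, log_fold_change=0.5,
                 dispersion=0.05, gene_fraction=1e-5, alpha=0.05):
    """Approximate power to detect one DE gene with n_reps replicates per group.

    All defaults are illustrative assumptions:
      log_fold_change  true effect size in natural-log units
      dispersion       biological (negative-binomial) dispersion of the gene
      gene_fraction    share of all reads that map to this gene
    """
    # Expected read count for the gene at this sequencing depth.
    mean_count = depth_millions * 1e6 * gene_fraction
    # Per-sample variance of log expression: a sampling term that shrinks
    # with depth plus a biological term that depth cannot reduce.
    var_per_sample = 1.0 / mean_count + dispersion
    # Standard error of the difference between two group means.
    se = sqrt(2.0 * var_per_sample / n_reps)
    z_crit = _STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    shift = abs(log_fold_change) / se
    # Two-sided Wald test: probability of rejecting in either tail.
    return _STD_NORMAL.cdf(shift - z_crit) + _STD_NORMAL.cdf(-shift - z_crit)


if __name__ == "__main__":
    # Compare a grid of designs: rows are replicate counts, columns are depths.
    depths = [2.5, 10, 30]
    print("reps  " + "  ".join(f"{d:>5}M" for d in depths))
    for n in (2, 3, 6):
        row = "  ".join(f"{approx_power(n, d):6.3f}" for d in depths)
        print(f"{n:>4}  {row}")
```

Under these assumptions the sampling term `1/mean_count` is already small relative to the biological dispersion once a gene collects on the order of a hundred reads, so extra depth changes the standard error only slightly, while each added replicate divides the whole variance by a larger `n_reps`. Lowly expressed genes (smaller `gene_fraction`) benefit more from depth, so the crossover point depends on the expression level of the genes of interest.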