Yuanyuan Bian (1), Chong He (1), Jie Hou (2), Jianlin Cheng (2), Jing Qiu (3).
(1) Department of Statistics, University of Missouri, Columbia, MO, USA.
(2) Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, USA.
(3) Department of Applied Economics and Statistics, University of Delaware, Newark, DE, USA.
Abstract
MOTIVATION: Several methods have been proposed for paired RNA-seq analysis. However, many of them do not account for the heterogeneity in treatment effects among pairs that can arise naturally in real data. In addition, the literature reports that false discovery rate (FDR) control is problematic for some popular methods. In this paper, we present a full hierarchical Bayesian model (PairedFB) for paired RNA-seq count data that accounts for variation in treatment effects among pairs and controls the FDR through the posterior expected FDR.

RESULTS: Our simulation studies show that most competing methods can have highly inflated FDR for small to moderate sample sizes, whereas PairedFB controls the FDR close to the nominal levels. Furthermore, PairedFB performs better overall than the other methods at ranking true differentially expressed genes (DEGs) at the top, especially when the sample size is larger or when the heterogeneity of treatment effects is high. In addition, PairedFB can be applied to identify biologically significant DEGs with controlled FDR. The real data analysis also indicates that PairedFB tends to find more biologically relevant genes even when the sample size is small. PairedFB is also shown to be robust to model misspecification in terms of its performance relative to the other methods.

AVAILABILITY AND IMPLEMENTATION: Software implementing this method (PairedFB) can be downloaded at: https://sites.google.com/a/udel.edu/qiujing/publication.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
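The idea of controlling the FDR through the posterior expected FDR can be illustrated with a generic sketch. This is not the PairedFB implementation; it only shows the standard Bayesian decision rule under stated assumptions. It assumes that each gene already has a posterior probability of being null (not differentially expressed), for example estimated from the posterior draws of a fitted model. The function name, argument names, and example values are hypothetical. The rule declares the largest set of genes whose average posterior null probability stays at or below the nominal level.

```python
def select_by_posterior_expected_fdr(post_null_prob, alpha=0.05):
    """Flag genes as DE so that the posterior expected FDR is <= alpha.

    post_null_prob: per-gene posterior probabilities of being null
                    (assumed to come from some fitted Bayesian model).
    alpha:          nominal FDR level.
    Returns a list of booleans in the original gene order.
    """
    # Rank genes from most to least likely to be differentially expressed.
    order = sorted(range(len(post_null_prob)), key=lambda i: post_null_prob[i])

    running_sum = 0.0
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        running_sum += post_null_prob[i]
        # The mean posterior null probability of the top `rank` genes is
        # the posterior expected FDR of declaring exactly those genes DE.
        if running_sum / rank <= alpha:
            cutoff = rank
        else:
            # Probabilities are sorted ascending, so the running mean
            # cannot decrease again; stop at the first violation.
            break

    selected = [False] * len(post_null_prob)
    for i in order[:cutoff]:
        selected[i] = True
    return selected


# Hypothetical posterior null probabilities for five genes.
example_probs = [0.01, 0.90, 0.02, 0.30, 0.05]
example_calls = select_by_posterior_expected_fdr(example_probs, alpha=0.05)
```

In this example the three genes with the smallest posterior null probabilities are declared, because adding a fourth would raise the average above 0.05.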