Shiquan Sun (1,2), Jiaqiang Zhu (2), Sahar Mozaffari (3), Carole Ober (3), Mengjie Chen (3,4), Xiang Zhou (2,5)
1. Department of Computer Science, Northwestern Polytechnical University, Xi'an, Shaanxi, China.
2. Department of Biostatistics, University of Michigan, Ann Arbor, MI, USA.
3. Department of Human Genetics, University of Chicago, Chicago, IL, USA.
4. Department of Medicine, University of Chicago, Chicago, IL, USA.
5. Center for Statistical Genetics, University of Michigan, Ann Arbor, MI, USA.
Abstract
Motivation: Genomic sequencing studies, including RNA sequencing and bisulfite sequencing studies, are becoming increasingly common and increasingly large. Large genomic sequencing studies open the door to accurate molecular trait heritability estimation and powerful differential analysis. Heritability estimation and differential analysis in sequencing studies require statistical methods that properly account for the count nature of sequencing data and that are computationally efficient for large datasets.
Results: Here, we develop such a method, PQLseq (Penalized Quasi-Likelihood for sequencing count data), to enable effective and efficient heritability estimation and differential analysis using the generalized linear mixed model framework. With extensive simulations and comparisons to previous methods, we show that PQLseq is the only method currently available that can produce unbiased heritability estimates for sequencing count data. In addition, we show that PQLseq is well suited for differential analysis in large sequencing studies, providing calibrated type I error control and greater power than standard linear mixed model methods. Finally, we apply PQLseq to perform gene expression heritability estimation and differential expression analysis in a large RNA sequencing study in the Hutterites.
Availability and implementation: PQLseq is implemented as an R package with source code freely available at www.xzlab.org/software.html and https://cran.r-project.org/web/packages/PQLseq/index.html.
Supplementary information: Supplementary data are available at Bioinformatics online.