Abstract
We consider negative binomial (NB) regression models for RNA-Seq read counts and investigate an approach in which such models are fitted to each gene separately. In particular, the NB dispersion parameter is estimated from each gene separately, without assuming commonalities between genes. This single-gene approach contrasts with the more widely used dispersion-modeling approach, in which the NB dispersion is modeled as a simple function of the mean or other measures of read abundance and then estimated from a large number of genes combined. We show that, through the use of higher-order asymptotic techniques, inferences with correct type I errors can be made about the regression coefficients in a single-gene NB regression model even when the dispersion is unknown and the sample size is small. The motivations for studying single-gene models include: 1) they provide a basis of reference for understanding and quantifying the power-robustness trade-offs of the dispersion-modeling approach; 2) they can also be potentially useful in practice if moderate sample sizes become available and diagnostic tools indicate potential problems with simple models of dispersion.
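To make the single-gene idea concrete, the following is a minimal sketch, not the authors' method. It assumes the common NB parameterization with mean mu and dispersion phi (variance mu + phi * mu^2), a two-group design with equal library sizes, and plain maximum likelihood on a grid; the higher-order asymptotic adjustment described in the abstract is not implemented. The function names and the example counts are hypothetical.

```python
import math

def nb_logpmf(y, mu, phi):
    """Log NB probability of count y, assuming mean mu > 0 and
    dispersion phi > 0, so that the variance is mu + phi * mu**2."""
    r = 1.0 / phi  # NB "size" parameter
    return (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
            + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))

def profile_loglik(counts_by_group, phi):
    """Log-likelihood of phi for ONE gene, with each group mean replaced
    by its sample mean (the ML estimate when phi is held fixed and all
    samples in a group share the same mean)."""
    total = 0.0
    for counts in counts_by_group:
        mu_hat = sum(counts) / len(counts)
        total += sum(nb_logpmf(y, mu_hat, phi) for y in counts)
    return total

def fit_single_gene(counts_by_group):
    """Estimate the dispersion from this gene alone by a grid search over
    log(phi) in [-8, 2]; no information is borrowed from other genes."""
    grid = [-8.0 + 0.01 * k for k in range(1001)]
    best = max(grid, key=lambda lp: profile_loglik(counts_by_group, math.exp(lp)))
    means = [sum(c) / len(c) for c in counts_by_group]
    return {
        "phi_hat": math.exp(best),
        "group_means": means,
        # Regression coefficient for the group effect under a log link.
        "log_fold_change": math.log(means[1] / means[0]),
    }

# Hypothetical read counts for one gene: 4 samples in each of two groups.
example_counts = [[10, 25, 4, 16], [40, 90, 22, 61]]
fit = fit_single_gene(example_counts)
```

With only a handful of samples per gene, a plain ML estimate of the dispersion such as `phi_hat` is typically noisy, which is the small-sample setting the abstract addresses with higher-order asymptotics.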
Keywords: Extra-Poisson variation; Higher-order asymptotics; Negative binomial; Overdispersion; Power-robustness; Regression; RNA-Seq
Subject classification: Primary 62P10; 92D20
Year: 2015 PMID: 28042360 PMCID: PMC5193394 DOI: 10.4310/SII.2015.v8.n4.a1
Source DB: PubMed Journal: Stat Interface ISSN: 1938-7989 Impact factor: 0.582