Yiwen Zhang, Hua Zhou, Jin Zhou, Wei Sun.
Abstract
Data with multivariate count responses frequently occur in modern applications. The commonly used multinomial-logit model is limited by its restrictive mean-variance structure. For instance, analyzing count data from the recent RNA-seq technology with the multinomial-logit model leads to serious errors in hypothesis testing. The ubiquity of over-dispersion and complicated correlation structures among multivariate counts calls for more flexible regression models. In this article, we study some generalized linear models that incorporate various correlation structures among the counts. Current literature lacks a treatment of these models, partly because they do not belong to the natural exponential family. We study the estimation, testing, and variable selection for these models in a unifying framework. The regression models are compared on both synthetic and real RNA-seq data.
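To make the "restrictive mean-variance structure" concrete, the sketch below compares the textbook first and second moments of three count distributions named in the keywords. It is a minimal illustration under stated assumptions, not the authors' regression code: it ignores covariates entirely, the function names are hypothetical, and the parameter values in the usage lines are arbitrary. The multinomial forces every pair of counts to be negatively correlated, with variances fixed by the mean. The Dirichlet-multinomial keeps the same mean but inflates the whole covariance by the factor $(n+\alpha_0)/(1+\alpha_0)$, where $\alpha_0 = \sum_j \alpha_j$ (over-dispersion). The negative multinomial instead yields positive pairwise covariances.

```python
import numpy as np

def multinomial_moments(n, p):
    """Mean vector and covariance matrix of Multinomial(n, p)."""
    p = np.asarray(p, dtype=float)
    mean = n * p
    # Var(Y_j) = n p_j (1 - p_j); Cov(Y_i, Y_j) = -n p_i p_j (always negative)
    cov = n * (np.diag(p) - np.outer(p, p))
    return mean, cov

def dirichlet_multinomial_moments(n, alpha):
    """Mean and covariance of Dirichlet-multinomial(n, alpha).

    Same mean as Multinomial(n, alpha / alpha_0), but the covariance is
    inflated by (n + alpha_0) / (1 + alpha_0), where alpha_0 = sum(alpha).
    """
    alpha = np.asarray(alpha, dtype=float)
    a0 = alpha.sum()
    mean, cov = multinomial_moments(n, alpha / a0)
    return mean, cov * (n + a0) / (1.0 + a0)

def negative_multinomial_moments(beta, p):
    """Mean and covariance of a negative multinomial with shape beta and
    probabilities p, where p0 = 1 - sum(p) must be positive.

    Off-diagonal covariances beta * p_i * p_j / p0**2 are positive,
    unlike the two models above.
    """
    p = np.asarray(p, dtype=float)
    p0 = 1.0 - p.sum()
    if p0 <= 0:
        raise ValueError("sum(p) must be < 1")
    mean = beta * p / p0
    cov = beta * np.outer(p, p) / p0**2 + np.diag(mean)
    return mean, cov

if __name__ == "__main__":
    # Arbitrary illustrative parameters: same mean for MN and DM.
    alpha = np.array([2.0, 3.0, 5.0])
    _, cov_mn = multinomial_moments(100, alpha / alpha.sum())
    _, cov_dm = dirichlet_multinomial_moments(100, alpha)
    print("MN variances:", np.diag(cov_mn))
    print("DM variances:", np.diag(cov_dm))
    _, cov_nm = negative_multinomial_moments(5.0, [0.2, 0.3])
    print("NM covariance:\n", cov_nm)
```

With these values the Dirichlet-multinomial variances are ten times the multinomial ones at the same mean. No choice of multinomial parameters can reproduce that, which is the limitation the abstract describes.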
Keywords: Dirichlet-multinomial; analysis of deviance; categorical data analysis; generalized Dirichlet-multinomial; iteratively reweighted Poisson regression (IRPR); negative multinomial; reduced rank GLM; regularization
Year: 2017 PMID: 28348500 PMCID: PMC5365157 DOI: 10.1080/10618600.2016.1154063
Source DB: PubMed Journal: J Comput Graph Stat ISSN: 1061-8600 Impact factor: 2.302