Ivonne Martin, Hae-Won Uh, Taniawati Supali, Makedonka Mitreva, Jeanine J Houwing-Duistermaat.
Abstract
Clustered overdispersed multivariate count data are challenging to model due to the presence of correlation within and between samples. Typically, the first source of correlation needs to be addressed, but its quantification is of less interest. Here, we focus on the correlation between time points. In addition, the effects of covariates on the multivariate count distribution need to be assessed. To fulfill these requirements, a regression model based on the Dirichlet-multinomial distribution for the association between covariates and the categorical counts is extended with random effects to deal with the additional clustering. This model is the Dirichlet-multinomial mixed regression model. Alternatively, a negative binomial mixed regression model can be deployed in which the corresponding likelihood is conditioned on the total count. It appears that these two approaches are equivalent when the total count is fixed and independent of the random effects. We consider both subject-specific and categorical-specific random effects. However, the latter carries a larger computational burden as the number of categories increases. Our work is motivated by microbiome data sets obtained by sequencing of the amplicon of the bacterial 16S rRNA gene. These data have a compositional structure and are typically overdispersed. The microbiome data set is from an epidemiological study carried out in a helminth-endemic area in Indonesia. The conclusions are as follows: time has no statistically significant effect on microbiome composition, the correlation between subjects is statistically significant, and treatment has a significant effect on the microbiome composition only in infected subjects who remained infected.
Keywords: Dirichlet-multinomial; conditional model; count; generalized linear mixed model; microbiome; multivariate; overdispersion
Year: 2019 PMID: 30761571 PMCID: PMC6594162 DOI: 10.1002/sim.8101
Source DB: PubMed Journal: Stat Med ISSN: 0277-6715 Impact factor: 2.373
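The overdispersion that the abstract attributes to the Dirichlet-multinomial distribution can be illustrated with a minimal sketch. It assumes the standard parameterization by a concentration vector `alpha`, which may differ from the parameterization used in the paper; the values of `alpha` and `n` are hypothetical and chosen only for illustration. The code enumerates all count vectors for a small total, checks that the probabilities sum to one, and compares the variance of the first category with the multinomial variance at the same mean.

```python
import math
from itertools import product


def dm_logpmf(counts, alpha):
    """Log-probability of a count vector under a Dirichlet-multinomial."""
    n = sum(counts)
    a0 = sum(alpha)
    # Multinomial coefficient: n! / prod(x_j!)
    log_coef = math.lgamma(n + 1) - sum(math.lgamma(x + 1) for x in counts)
    # Ratio of Dirichlet normalising constants after integrating out the proportions
    log_ratio = math.lgamma(a0) - math.lgamma(n + a0)
    log_ratio += sum(math.lgamma(x + a) - math.lgamma(a) for x, a in zip(counts, alpha))
    return log_coef + log_ratio


def compositions(n, k):
    """All length-k vectors of non-negative integers that sum to n."""
    return [c for c in product(range(n + 1), repeat=k) if sum(c) == n]


alpha = [2.0, 1.0, 0.5]  # hypothetical concentration parameters (assumption)
n = 5                    # hypothetical total count (assumption)

support = compositions(n, len(alpha))
probs = [math.exp(dm_logpmf(c, alpha)) for c in support]

total = sum(probs)
mean_first = sum(p * c[0] for p, c in zip(probs, support))
var_first = sum(p * (c[0] - mean_first) ** 2 for p, c in zip(probs, support))

# Variance of the same category under a plain multinomial with equal mean
p1 = alpha[0] / sum(alpha)
multinomial_var = n * p1 * (1 - p1)

print(f"sum of probabilities: {total:.6f}")
print(f"Dirichlet-multinomial variance: {var_first:.4f}")
print(f"multinomial variance:           {multinomial_var:.4f}")
```

Under this parameterization the variance is inflated by the factor (n + sum(alpha)) / (1 + sum(alpha)) relative to the multinomial, so smaller values of `sum(alpha)` correspond to stronger overdispersion.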
The J × K contingency table (cell entries not recoverable; the final row holds the marginal totals)
Figure 1. Bias and mean squared error (MSE) for data sets generated from the Dirichlet‐multinomial mixed model with subject‐specific random effect. Parameters shown: the vector of parameters in the loglinear model; σ, the standard deviation of the between‐individual variation; θ, the overdispersion
The mean estimates (standard deviation) over 1000 replicates when data sets were generated from the Dirichlet‐multinomial mixed model with categorical‐specific random effect with common variance
| Fitted model | | | | | | | | Loglik |
|---|---|---|---|---|---|---|---|---|
| Sub‐sp | 0.418(0.130) | −0.959(0.059) | 0.096(0.050) | 0.765(0.071) | −1.900(0.075) | −1.680(0.754) | −1.886(0.094) | −3918.581(18.333) |
| Cat‐sp | 0.438(0.172) | −1.007(0.116) | 0.108(0.109) | 0.809(0.084) | −2.017(0.086) | −0.704(0.005) | −2.366(0.125) | −3961.971(14.865) |
| | | | | | | | | |
| Sub‐sp | 0.359(0.130) | −0.882(0.080) | 0.096(0.077) | 0.695(0.088) | −1.739(0.095) | −0.973(0.334) | −1.320(0.099) | −4064.461(18.503) |
| Cat‐sp | 0.462(0.170) | −1.000(0.122) | 0.110(0.117) | 0.794(0.085) | −1.997(0.083) | −0.607(0.028) | −2.253(0.129) | −3988.959(16.392) |
| | | | | | | | | |
| Sub‐sp | 0.300(0.132) | −0.766(0.099) | 0.092(0.105) | 0.602(0.11) | −1.509(0.121) | −0.766(0.196) | −0.698(0.112) | −4171.966(17.482) |
| Cat‐sp | 0.455(0.194) | −1.004(0.173) | 0.099(0.17) | 0.795(0.089) | −1.985(0.096) | −0.237(0.049) | −2.235(0.165) | −4011.262(19.136) |
| | | | | | | | | |
| Sub‐sp | 0.270(0.129) | −0.691(0.112) | 0.092(0.117) | 0.541(0.122) | −1.376(0.133) | −0.699(0.187) | −0.367(0.117) | −4193.591(19.342) |
| Cat‐sp | 0.449(0.200) | −1.003(0.213) | 0.100(0.197) | 0.795(0.088) | −1.985(0.101) | −0.020(0.046) | −2.225(0.177) | −3993.517(23.146) |
Rows starting with Sub‐sp give the estimates (standard deviation) when the data sets were fitted with the DMM model with subject‐specific random effect; rows starting with Cat‐sp give the estimates (standard deviation) when the data sets were fitted with the DMM model with categorical‐specific random effect having common variance.
The loglinear parameter vector, σ, and θ are as explained in Figure 1.
Loglik is the loglikelihood value obtained using the corresponding model.
Rows in gray show the estimates when the standard deviation of the normally distributed random effect is small.
Figure 2. Estimates of the fixed effect and overdispersion parameters obtained from three different models (UNBM, CNBM, and DMM) when data sets were generated using the UNBM model. CNBM, conditional negative‐binomial mixed model; DMM, Dirichlet‐multinomial mixed; UNBM, unconstrained negative binomial mixed
Figure 3. Estimates for the variance components obtained from three different models (UNBM, CNBM, and DMM) when data sets were generated using the UNBM model. CNBM, conditional negative‐binomial mixed model; DMM, Dirichlet‐multinomial mixed; UNBM, unconstrained negative binomial mixed
The observed marginal correlation of the motivating data set
| | | | | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | 1 | −0.46 | −0.43 | −0.48 | −0.12 | −0.23 | | | | | | |
| | · | 1 | −0.29 | 0.13 | 0.02 | 0 | | | | | | |
| | · | · | 1 | −0.27 | −0.19 | 0 | | | | | | |
| | · | · | · | 1 | 0.1 | 0.06 | | | | | | |
| | · | · | · | · | 1 | 0.01 | | | | | | |
| | · | · | · | · | · | 1 | | | | | | |
| | | −0.11 | −0.05 | −0.01 | 0 | −0.13 | 1 | −0.27 | −0.53 | −0.57 | 0.04 | −0.14 |
| | −0.14 | | 0.04 | 0.03 | −0.01 | −0.05 | · | 1 | −0.27 | −0.15 | −0.05 | 0.01 |
| | 0.04 | 0.05 | | −0.07 | −0.08 | −0.1 | · | · | 1 | −0.07 | −0.22 | −0.11 |
| | −0.11 | −0.02 | 0.01 | | 0.05 | 0.3 | · | · | · | 1 | 0.02 | 0.09 |
| | 0.06 | −0.25 | 0.09 | 0.01 | | 0.01 | · | · | · | · | 1 | −0.05 |
| | −0.07 | 0.08 | −0.06 | −0.01 | 0.23 | | · | · | · | · | · | 1 |
Each row and column represents bacterial phylum j, j = 1,…,6, at time point t. The order of j is Firmicutes, Actinobacteria, Bacteroidetes, Proteobacteria, Unclassified, and the pooled category.
Characteristics at baseline for study participants
| Characteristics | Albendazole (N = 69) | Placebo (N = 81) |
|---|---|---|
| Age (in years), mean (sd) | 28.38 (16.52) | 27.85 (16.91) |
| Sex, female, n (%) | 39 (56.52) | 45 (55.56) |
| Helminth infections, n (%) | | |
| | 17 (24.64) | 18 (22.22) |
| Hookworm | 26 (37.68) | 23 (28.40) |
| | 25 (36.23) | 23 (28.40) |
| | 2 (2.90) | 2 (2.47) |
| | 20 (28.99) | 22 (27.16) |
| Any helminths | 47 (68.12) | 47 (58.03) |
| Proportion (in %) of the 6 most abundant bacterial phyla, mean (sd) | | |
| | 12.49 (8.95) | 10.96 (7.98) |
| | 7.41 (11.35) | 6.42 (11.04) |
| | 66.83 (13.46) | 70.05 (13.66) |
| | 9.78 (7.86) | 9.16 (8.37) |
| Unclassified | 1.95 (2.22) | 2.68 (3.22) |
| Pooled | 1.54 (3.67) | 0.73 (1.22) |
Unclassified represents sequences that cannot be assigned to a phylum.
The pooled category consists of the remaining 13 phyla, each with an average relative abundance across samples of less than 1%.
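The pooling rule described in the note above can be sketched as follows. This is a minimal illustration, not the authors' preprocessing code: the function name `pool_rare`, the category names, and the toy counts are all hypothetical, and the sketch assumes every sample has a positive total count.

```python
def pool_rare(counts_by_sample, names, threshold=0.01):
    """Merge categories whose mean relative abundance is below `threshold`.

    counts_by_sample: list of per-sample count lists, one entry per category.
    names: category names, in the same order as the counts.
    """
    n_samples = len(counts_by_sample)
    # Relative abundance of each category within each sample
    rel = [[x / sum(row) for x in row] for row in counts_by_sample]
    # Average relative abundance of each category across samples
    mean_rel = [sum(r[j] for r in rel) / n_samples for j in range(len(names))]

    keep = [j for j, m in enumerate(mean_rel) if m >= threshold]
    rare = [j for j, m in enumerate(mean_rel) if m < threshold]

    pooled_names = [names[j] for j in keep] + ["Pooled"]
    pooled_counts = [
        [row[j] for j in keep] + [sum(row[j] for j in rare)]
        for row in counts_by_sample
    ]
    return pooled_names, pooled_counts


# Toy data (hypothetical): four categories, two samples
names = ["A", "B", "C", "D"]
counts = [[900, 95, 3, 2], [800, 190, 6, 4]]

pooled_names, pooled_counts = pool_rare(counts, names)
print(pooled_names)
print(pooled_counts)
```

Pooling leaves each sample's total count unchanged, which matters for models that condition on the total.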
Figure 4. The profile of the microbiome study. The chart shows the number of subjects infected with at least one of the prevalent soil‐transmitted helminths (Helminth (+)) or free of helminth infections (Helminth (‐)) that belonged to either the placebo or albendazole treatment group, at pre‐treatment and 21 months after the first treatment round. The circled number represents the condition explained in Section 4
The log odds ratio (95% CI) when the data set was fitted with the Dirichlet‐multinomial mixed model with categorical‐specific random effect having common variance*
| Categories | INF | | TRT× | Bhelm × INF × TRT× |
|---|---|---|---|---|
| | −0.006 (−0.218, 0.207) | 0.050 (−0.155, 0.256) | 0.046 (−0.235, 0.326) | 0.326 (−0.042, 0.694) |
| | 0.220 (−0.056, 0.496) | −0.119 (−0.395, 0.157) | −0.012 (−0.381, 0.356) | −0.916 (−1.573, −0.259) |
| | 0.171 (−0.054, 0.396) | 0.056 (−0.161, 0.273) | 0.035 (−0.256, 0.326) | 0.026 (−0.376, 0.427) |
| Unclassified | −0.024 (−0.304, 0.257) | 0.129 (−0.149, 0.407) | −0.099 (−0.476, 0.277) | −0.159 (−0.727, 0.410) |
| Pooled | 0.166 (−0.158, 0.490) | 0.195 (−0.124, 0.515) | −0.030 (−0.449, 0.388) | −0.180 (−0.814, 0.454) |
| Loglik | −8285.5 | | | |
| | 0.08 (0.01) | | | |
| | 0.22 (0.03) | | | |

*Fitted with SAS procedure NLMIXED with 3 quadrature points of adaptive Gauss‐Hermite approximation.
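To give a sense of what a 3‐point Gauss‐Hermite rule does when a normally distributed random effect is integrated out, here is a minimal sketch. It uses the plain (non‐adaptive) rule, not the adaptive version applied by NLMIXED, and it is not the authors' code; the function name `normal_expectation` and the value of `sigma` are hypothetical.

```python
import math

# Nodes and weights of the 3-point Gauss-Hermite rule for the weight exp(-x^2)
GH3_NODES = (-math.sqrt(1.5), 0.0, math.sqrt(1.5))
GH3_WEIGHTS = (
    math.sqrt(math.pi) / 6,
    2 * math.sqrt(math.pi) / 3,
    math.sqrt(math.pi) / 6,
)


def normal_expectation(f, sigma):
    """Approximate E[f(b)] for b ~ N(0, sigma^2) using 3 quadrature points."""
    # The substitution b = sqrt(2) * sigma * x turns the normal density
    # into exp(-x^2) / sqrt(pi), which matches the Gauss-Hermite weight.
    total = sum(
        w * f(math.sqrt(2.0) * sigma * x)
        for x, w in zip(GH3_NODES, GH3_WEIGHTS)
    )
    return total / math.sqrt(math.pi)


sigma = 0.5  # hypothetical random-effect standard deviation (assumption)

# E[exp(b)] has the closed form exp(sigma^2 / 2), so the error can be checked
approx = normal_expectation(math.exp, sigma)
exact = math.exp(sigma ** 2 / 2)
print(f"3-point approximation: {approx:.5f}")
print(f"closed form:           {exact:.5f}")
```

The 3‐point rule is exact for polynomials up to degree 5, so it works well when the integrand is smooth and the random‐effect variance is modest; the adaptive variant additionally recentres and rescales the nodes around the mode of each cluster's integrand.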
The estimated marginal correlation of the data set obtained by the Dirichlet‐multinomial mixed model with categorical‐specific random effect having common variance across categories
| | | | | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | 1 | −0.55 | −0.35 | −0.51 | −0.3 | −0.22 | | | | | | |
| | · | 1 | −0.06 | −0.09 | −0.05 | −0.03 | | | | | | |
| | · | · | 1 | −0.04 | −0.03 | −0.02 | | | | | | |
| | · | · | · | 1 | −0.05 | −0.03 | | | | | | |
| | · | · | · | · | 1 | −0.02 | | | | | | |
| | · | · | · | · | · | 1 | | | | | | |
| | | −0.12 | −0.06 | −0.1 | −0.05 | −0.04 | 1 | −0.57 | −0.3 | −0.51 | −0.3 | −0.23 |
| | −0.12 | | 0.03 | 0.03 | 0.02 | 0.02 | · | 1 | −0.07 | −0.08 | −0.06 | −0.04 |
| | −0.05 | 0.02 | | 0.02 | 0.01 | 0.01 | · | · | 1 | −0.05 | −0.02 | −0.01 |
| | −0.1 | 0.02 | 0.02 | | 0.02 | 0.01 | · | · | · | 1 | −0.05 | −0.03 |
| | −0.05 | 0.02 | 0.01 | 0.02 | | 0.01 | · | · | · | · | 1 | −0.02 |
| | −0.04 | 0.02 | 0.01 | 0.01 | 0.01 | | · | · | · | · | · | 1 |