Habtamu K Benecha, Brian Neelon, Kimon Divaris, John S Preisser.
Abstract
Mixture distributions provide flexibility in modeling data collected from populations having unexplained heterogeneity. While interpretations of regression parameters from traditional finite mixture models are specific to unobserved subpopulations or latent classes, investigators are often interested in making inferences about the marginal mean of a count variable in the overall population. Recently, marginal mean regression modeling procedures for zero-inflated count outcomes have been introduced within the framework of maximum likelihood estimation of zero-inflated Poisson and negative binomial regression models. In this article, we propose marginalized mixture regression models based on two-component mixtures of non-degenerate count data distributions that provide directly interpretable estimates of exposure effects on the overall population mean of a count outcome. The models are examined using simulations and applied to two datasets, one from a double-blind dental caries incidence trial, and the other from a horticultural experiment. The finite sample performance of the proposed models is compared with each other and with marginalized zero-inflated count models, as well as ordinary Poisson and negative binomial regression.
Keywords: Dental caries; Excess zeros; Marginal inference; Mixture model; Over-dispersion; Zero-inflation
Year: 2017 PMID: 28446995 PMCID: PMC5384970 DOI: 10.1186/s40488-017-0057-4
Source DB: PubMed Journal: J Stat Distrib Appl ISSN: 2195-5832
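The abstract's idea of modeling the overall population mean of a two-component mixture can be illustrated with a small sketch. It assumes a log link for both the marginal mean and the first latent class mean, and recovers the second class mean from the identity ν = π·μ1 + (1 − π)·μ2. The coefficient values and the function name `second_component_mean` are hypothetical and are not taken from the article.

```python
import math


def second_component_mean(marginal_mean: float, pi: float, first_mean: float) -> float:
    """Solve nu = pi * mu1 + (1 - pi) * mu2 for mu2 (the second class mean)."""
    if not 0.0 <= pi < 1.0:
        raise ValueError("pi must lie in [0, 1)")
    mu2 = (marginal_mean - pi * first_mean) / (1.0 - pi)
    if mu2 <= 0.0:
        raise ValueError("implied second-component mean is not positive")
    return mu2


# Hypothetical parameter values, chosen only for illustration.
beta0, beta1 = 1.0, -0.2   # marginal mean model: log(nu) = beta0 + beta1 * x
alpha0 = 1.8               # first latent class mean: log(mu1) = alpha0
pi = 0.3                   # mixing probability of the first latent class

marginal = {}
for x in (0, 1):
    nu = math.exp(beta0 + beta1 * x)
    mu1 = math.exp(alpha0)
    mu2 = second_component_mean(nu, pi, mu1)
    marginal[x] = pi * mu1 + (1.0 - pi) * mu2  # recomposes to nu

# The ratio of overall means between x = 1 and x = 0 is exp(beta1),
# whatever the latent class means are.
ratio = marginal[1] / marginal[0]
```

Under these assumptions, `ratio` equals `exp(beta1)`, which is why a coefficient in the marginal mean model can be read directly as an effect on the overall population mean.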
Percent relative median biases of estimates of β1, β2 and β3 from marginalized mixture models fitted to data generated from the MPois-Pois model with 10,000 replications
| Sample size | Parameter | Poisson | MZIP | MPois-Pois | NB | MZINB | MNB-Pois |
|---|---|---|---|---|---|---|---|
| 100 | β1 | 2.03 | 1.40 | −2.04 | 2.18 | 0.97 | 0.56 |
| 100 | β2 | 0.77 | −3.11 | 0.08 | 1.00 | −3.45 | 1.54 |
| 100 | β3 | −0.30 | −0.61 | −0.70 | −0.26 | −0.74 | −0.33 |
| 200 | β1 | 0.97 | 1.70 | −0.68 | 1.38 | 1.89 | 1.34 |
| 200 | β2 | −0.02 | −2.64 | −0.69 | 0.08 | −2.65 | 0.62 |
| 200 | β3 | 0.15 | −0.43 | −0.29 | −0.09 | −0.41 | 0.06 |
| 500 | β1 | −0.68 | −0.36 | −0.87 | −0.79 | −1.18 | 0.07 |
| 500 | β2 | 0.04 | −1.51 | 0.11 | 0.09 | −1.44 | 0.78 |
| 500 | β3 | 0.08 | −0.16 | −0.14 | 0.05 | −0.11 | 0.19 |
| 1000 | β1 | −0.14 | −0.37 | −0.40 | −0.07 | −0.64 | 0.43 |
| 1000 | β2 | 0.48 | −1.43 | 0.27 | 0.50 | −0.91 | 0.88 |
| 1000 | β3 | 0.09 | −0.08 | 0.06 | 0.07 | −0.07 | 0.22 |
Type I error rates for the estimate of β1 from marginalized models fitted to data generated from the MPois-Pois model with 10,000 replications
| Sample size | Poisson | MZIP | MPois-Pois | NB | MZINB | MNB-Pois |
|---|---|---|---|---|---|---|
| 100 | 0.127 | 0.102 | 0.068 | 0.077 | 0.070 | 0.073 |
| 200 | 0.131 | 0.106 | 0.067 | 0.077 | 0.072 | 0.069 |
| 500 | 0.135 | 0.112 | 0.060 | 0.079 | 0.073 | 0.065 |
| 1000 | 0.134 | 0.112 | 0.054 | 0.072 | 0.066 | 0.061 |
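When reading the type I error rates above, it can help to know how much simulation noise to expect from 10,000 replications. The sketch below uses the usual binomial Monte Carlo standard error for a nominal 0.05 level. The function names and the 1.96 multiplier are illustrative choices, not part of the article's procedure.

```python
import math


def monte_carlo_se(p: float, replications: int) -> float:
    """Binomial standard error of a rejection rate estimated by simulation."""
    return math.sqrt(p * (1.0 - p) / replications)


def within_mc_band(observed: float, nominal: float = 0.05,
                   replications: int = 10_000, z: float = 1.96) -> bool:
    """True if the observed rate lies within z standard errors of the nominal level."""
    half_width = z * monte_carlo_se(nominal, replications)
    return abs(observed - nominal) <= half_width


# Rates taken from the n = 1000 row of the table above.
poisson_ok = within_mc_band(0.134)      # Poisson
mpois_pois_ok = within_mc_band(0.054)   # MPois-Pois
```

With these settings the standard error is about 0.0022, so rates within roughly 0.0043 of 0.05 are compatible with simulation noise alone.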
Coverages of 95% confidence intervals for estimates of β1, β2 and β3 from marginalized models fitted to data generated from the MPois-Pois model with 10,000 replications
| Sample size | Parameter | Poisson | MZIP | MPois-Pois | NB | MZINB | MNB-Pois |
|---|---|---|---|---|---|---|---|
| 100 | β1 | 89.9 | 91.3 | 93.7 | 93.3 | 93.4 | 93.8 |
| 100 | β2 | 89.2 | 90.9 | 93.2 | 92.4 | 92.8 | 92.9 |
| 100 | β3 | 91.8 | 92.9 | 95.2 | 94.9 | 94.7 | 95.1 |
| 200 | β1 | 89.4 | 91.2 | 94.1 | 93.6 | 93.8 | 94.1 |
| 200 | β2 | 88.9 | 90.9 | 93.3 | 92.2 | 92.9 | 93.2 |
| 200 | β3 | 91.4 | 92.6 | 95.1 | 95.1 | 94.9 | 95.2 |
| 500 | β1 | 88.9 | 90.7 | 94.1 | 92.9 | 93.5 | 93.9 |
| 500 | β2 | 88.6 | 90.5 | 94.4 | 92.0 | 93.1 | 93.9 |
| 500 | β3 | 91.0 | 92.0 | 94.9 | 95.1 | 94.8 | 94.9 |
| 1000 | β1 | 89.3 | 90.9 | 94.7 | 93.5 | 93.9 | 94.4 |
| 1000 | β2 | 88.5 | 90.8 | 94.7 | 92.1 | 93.1 | 93.8 |
| 1000 | β3 | 91.1 | 92.1 | 95.0 | 95.0 | 95.0 | 94.9 |
Percent relative median biases of estimates of β1, β2 and β3 from marginalized mixture models fitted to data generated from the MNB-Pois model with 10,000 replications
| Sample size | Parameter | Poisson | MZIP | MPois-Pois | NB | MZINB | MNB-Pois |
|---|---|---|---|---|---|---|---|
| 100 | β1 | 6.94 | 23.51 | 6.80 | 10.48 | 13.72 | 11.95 |
| 100 | β2 | 2.98 | 7.95 | 4.00 | 5.26 | 1.89 | 4.44 |
| 100 | β3 | 0.01 | 1.40 | −4.35 | 0.80 | 0.88 | −0.25 |
| 200 | β1 | 5.27 | 20.12 | −14.85 | 5.45 | 7.41 | 4.57 |
| 200 | β2 | 1.45 | 5.11 | −1.12 | 2.49 | 0.07 | 2.02 |
| 200 | β3 | 0.18 | 1.49 | −5.44 | 0.38 | 0.36 | 0.33 |
| 500 | β1 | 0.57 | 11.79 | −29.97 | 1.31 | 0.73 | −0.75 |
| 500 | β2 | 0.66 | 2.81 | −3.90 | 1.18 | 0.14 | 0.62 |
| 500 | β3 | 0.39 | 1.52 | −7.66 | 0.59 | 0.61 | 0.46 |
| 1000 | β1 | 1.19 | 10.34 | −34.68 | 1.92 | 2.39 | 0.00 |
| 1000 | β2 | 0.79 | 2.63 | −4.75 | 1.00 | 0.39 | 0.87 |
| 1000 | β3 | −0.01 | 0.97 | −10.13 | 0.03 | −0.01 | −0.19 |
Type I error rates for the estimate of β1 from marginalized models fitted to data generated from the MNB-Pois model with 10,000 replications
| Sample size | Poisson | MZIP | MPois-Pois | NB | MZINB | MNB-Pois |
|---|---|---|---|---|---|---|
| 100 | 0.325 | 0.271 | 0.262 | 0.084 | 0.079 | 0.103 |
| 200 | 0.334 | 0.272 | 0.255 | 0.079 | 0.073 | 0.064 |
| 500 | 0.341 | 0.273 | 0.232 | 0.081 | 0.074 | 0.053 |
| 1000 | 0.340 | 0.273 | 0.240 | 0.076 | 0.072 | 0.049 |
Coverages of 95% confidence intervals for estimates of β1, β2 and β3 from marginalized models fitted to data generated from the MNB-Pois model with 10,000 replications
| Sample size | Parameter | Poisson | MZIP | MPois-Pois | NB | MZINB | MNB-Pois |
|---|---|---|---|---|---|---|---|
| 100 | β1 | 72.3 | 77.4 | 76.9 | 92.5 | 92.4 | 89.7 |
| 100 | β2 | 74.0 | 79.6 | 77.8 | 90.8 | 91.8 | 89.6 |
| 100 | β3 | 74.4 | 79.4 | 83.0 | 94.0 | 93.7 | 92.0 |
| 200 | β1 | 71.6 | 77.6 | 78.1 | 92.2 | 92.3 | 93.0 |
| 200 | β2 | 72.8 | 79.1 | 78.9 | 91.0 | 91.8 | 92.7 |
| 200 | β3 | 74.1 | 80.0 | 83.9 | 94.4 | 94.0 | 93.5 |
| 500 | β1 | 71.2 | 77.0 | 78.1 | 92.1 | 92.2 | 94.2 |
| 500 | β2 | 72.3 | 78.6 | 80.8 | 90.5 | 91.3 | 94.5 |
| 500 | β3 | 73.6 | 79.7 | 80.2 | 94.3 | 93.9 | 94.5 |
| 1000 | β1 | 71.7 | 77.5 | 76.2 | 92.7 | 93.1 | 95.0 |
| 1000 | β2 | 73.0 | 78.9 | 81.5 | 90.2 | 91.6 | 95.0 |
| 1000 | β3 | 74.1 | 80.7 | 71.6 | 94.6 | 94.6 | 95.3 |
Fig. 1 Histogram of two-year DMFS increment for 3412 children ages 11–12 from a dental caries incidence trial conducted in Lanarkshire, Scotland between 1988 and 1992
Fig. 2 Observed (circles) and predicted relative frequencies of two-year DMFS increment for children in the Lanarkshire caries trial
Marginalized count regression model estimates (est) and standard errors (se) for the Lanarkshire caries trial
| Variable | Poisson est | Poisson se | NB est | NB se | MZIP est | MZIP se | MZINB est | MZINB se | MPois-Pois est | MPois-Pois se | MNB-Pois est | MNB-Pois se |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Marginal mean model | | | | | | | | | | | | |
| Intercept | 1.216 | 0.017 | 1.201 | 0.035 | 1.234 | 0.022 | 1.206 | 0.034 | 1.309 | 0.029 | 1.207 | 0.035 |
| bc | 0.758 | 0.017 | 0.757 | 0.036 | 0.765 | 0.023 | 0.757 | 0.035 | 0.663 | 0.026 | 0.757 | 0.035 |
| calc | −0.204 | 0.020 | −0.189 | 0.040 | −0.211 | 0.025 | −0.195 | 0.039 | −0.263 | 0.031 | −0.199 | 0.039 |
| NaF | −0.072 | 0.018 | −0.056 | 0.039 | −0.098 | 0.023 | −0.060 | 0.038 | −0.144 | 0.028 | −0.060 | 0.038 |
| NaFTMP | −0.052 | 0.022 | −0.022 | 0.048 | −0.104 | 0.029 | −0.034 | 0.047 | −0.062 | 0.035 | −0.033 | 0.047 |
| Zero-inflation model (MZIP, MZINB); latent class mean model (MPois-Pois, MNB-Pois) | | | | | | | | | | | | |
| Intercept | | | | | −1.280 | 0.079 | −2.257 | 0.290 | 2.131 | 0.031 | −1.425 | 0.513 |
| bc | | | | | −1.309 | 0.113 | −2.911 | 1.571 | 0.512 | 0.027 | 3.539 | 0.500 |
| calc | | | | | 0.129 | 0.095 | −0.149 | 0.273 | −0.249 | 0.031 | 0.022 | 0.173 |
| NaF | | | | | 0.208 | 0.096 | 0.419 | 0.307 | −0.152 | 0.029 | −0.013 | 0.188 |
| NaFTMP | | | | | 0.289 | 0.119 | 0.545 | 0.363 | −0.081 | 0.037 | −0.307 | 0.201 |
| Mixing probability model estimates ᵃ | | | | | | | | | | | | |
| logit(π) | | | | | | | | | −0.753 | 0.055 | −1.818 | 0.170 |
| π | | | | | | | | | 0.320 | | 0.140 | |
| Dispersion estimates ᵇ | | | | | | | | | | | | |
| τ | | | | | | | | | | | −0.328 | 0.042 |
| ϕ | | | 0.806 | 0.028 | | | 0.639 | 0.038 | | | 0.720 | |
| Model fit statistics | | | | | | | | | | | | |
| −2loglik | 22706 | | 17249 | | 20410 | | 17189 | | 18048 | | 17169 | |
| AIC | 22716 | | 17261 | | 20430 | | 17211 | | 18070 | | 17193 | |

ᵃ Note that π = 1/(1 + e^(−x)), where x is the logit-scale estimate in the logit(π) row.
ᵇ Also, τ = log ϕ.
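The two footnote relations can be checked against the table's own numbers. The sketch below assumes that the values 0.320 and 0.140 are the mixing probabilities π obtained by back-transforming the logit-scale estimates −0.753 and −1.818, and that 0.720 is ϕ = exp(τ) for τ = −0.328. That reading is inferred from the arithmetic, not stated explicitly in the extracted text.

```python
import math


def inv_logit(x: float) -> float:
    """Back-transform a logit-scale estimate to a probability: 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


# Logit-scale mixing estimates from the table (MPois-Pois, MNB-Pois).
pi_mpois_pois = inv_logit(-0.753)
pi_mnb_pois = inv_logit(-1.818)

# Dispersion on the original scale, using tau = log(phi).
phi_mnb_pois = math.exp(-0.328)

print(round(pi_mpois_pois, 3), round(pi_mnb_pois, 3), round(phi_mnb_pois, 3))
```

The same back-transformation applied to the apple data estimate 0.178 gives approximately 0.544, matching the value reported in that table.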
Estimated log-likelihood, AIC and incidence density ratios (95% CI) comparing NaF and NaFTMP with SMFP in the Lanarkshire trial, based on four marginalized models
| Model | NaF IDR (95% CI) | NaFTMP IDR (95% CI) |
|---|---|---|
| Poisson | 0.931 (0.898, 0.965) | 0.949 (0.909, 0.992) |
| NB | 0.946 (0.875, 1.022) | 0.979 (0.890, 1.076) |
| MZIP | 0.907 (0.867, 0.948) | 0.902 (0.852, 0.953) |
| MZINB | 0.942 (0.874, 1.015) | 0.967 (0.881, 1.061) |
| MPois-Pois | 0.866 (0.820, 0.915) | 0.940 (0.879, 1.006) |
| MNB-Pois | 0.942 (0.874, 1.015) | 0.968 (0.882, 1.062) |
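The incidence density ratios above appear to be exponentiated marginal mean coefficients. A minimal sketch, assuming a Wald interval of the form exp(est ± 1.96·se), reproduces the Poisson and NB entries to within rounding of the published three-decimal estimates. The function name `idr_with_ci` is hypothetical.

```python
import math


def idr_with_ci(est: float, se: float, z: float = 1.96) -> tuple[float, float, float]:
    """Return (IDR, lower, upper) from a log-scale coefficient and its standard error."""
    return (math.exp(est), math.exp(est - z * se), math.exp(est + z * se))


# Coefficients (est, se) taken from the Lanarkshire marginal mean model table.
poisson_naf = idr_with_ci(-0.072, 0.018)   # reported: 0.931 (0.898, 0.965)
nb_naftmp = idr_with_ci(-0.022, 0.048)     # reported: 0.979 (0.890, 1.076)
```

Small discrepancies in the third decimal place are expected, because the reported ratios were presumably computed from unrounded estimates.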
Marginalized count regression model estimates (est) and standard errors (se) for the number of roots produced by 270 shoots of the apple cultivar Trajan
| Variable | Poisson est | Poisson se | NBᵃ est | NBᵃ se | MZIP est | MZIP se | MZINBᵇ est | MZINBᵇ se | MPois-Pois est | MPois-Pois se |
|---|---|---|---|---|---|---|---|---|---|---|
| Marginal mean model | | | | | | | | | | |
| Intercept | 1.880 | 0.058 | 1.876 | 0.126 | 1.854 | 0.060 | 1.855 | 0.072 | 1.863 | 0.056 |
| Photoperiod, 16h | −0.711 | 0.104 | −0.706 | 0.188 | −0.620 | 0.134 | −0.618 | 0.152 | −0.687 | 0.144 |
| log(BAP/2.2) | 0.069 | 0.042 | 0.073 | 0.092 | 0.092 | 0.042 | 0.091 | 0.052 | 0.080 | 0.034 |
| Interaction | −0.176 | 0.077 | −0.182 | 0.138 | −0.258 | 0.078 | −0.259 | 0.094 | −0.218 | 0.075 |
| Zero-inflation model (MZIP, MZINB); latent class mean model (MPois-Pois) | | | | | | | | | | |
| Intercept | | | | | −4.262 | 0.732 | −4.381 | 0.827 | 2.142 | 0.051 |
| Photoperiod, 16h | | | | | 4.159 | 0.753 | 4.264 | 0.846 | −4.238 | 0.552 |
| Mixing probability model ᶜ | | | | | | | | | | |
| logit(π) | | | | | | | | | 0.178 | 0.184 |
| π | | | | | | | | | 0.544 | |
| Model fit statistics | | | | | | | | | | |
| −2loglik | 1566.4 | | 1402.1 | | 1250.2 | | 1236.5 | | 1236.4 | |
| AIC | 1574.4 | | 1412.1 | | 1262.2 | | 1250.5 | | 1250.4 | |

ᵃ In the NB model, (s.e. = 0.083)
ᵇ In the MZINB model, (s.e. 0.351) corresponding to
ᶜ In the MPois-Pois model, π = 1/(1 + e^(−x)), where x is the logit-scale estimate in the logit(π) row.
Model-predicted mean number of roots of the apple cultivar Trajan produced by the eight treatments
| Treatment | No. of shoots | Observed mean | Poisson | NB | MZIP | MZINB | MPPᵃ |
|---|---|---|---|---|---|---|---|
| 8h + BAP 2.2 | 30 | 5.83 | 6.55 | 6.53 | 6.39 | 6.39 | 6.44 |
| 8h + BAP 4.4 | 30 | 7.77 | 6.87 | 6.87 | 6.81 | 6.81 | 6.81 |
| 8h + BAP 8.8 | 40 | 7.50 | 7.21 | 7.22 | 7.25 | 7.25 | 7.20 |
| 8h + BAP 17.6 | 40 | 7.15 | 7.56 | 7.60 | 7.73 | 7.72 | 7.60 |
| 16h + BAP 2.2 | 30 | 3.27 | 3.22 | 3.22 | 3.43 | 3.45 | 3.24 |
| 16h + BAP 4.4 | 30 | 2.73 | 2.99 | 2.99 | 3.06 | 3.07 | 2.95 |
| 16h + BAP 8.8 | 30 | 3.13 | 2.78 | 2.77 | 2.73 | 2.73 | 2.68 |
| 16h + BAP 17.6 | 40 | 2.45 | 2.58 | 2.57 | 2.43 | 2.43 | 2.43 |

Columns Poisson through MPP give the means predicted by each count regression model.
ᵃ MPP = MPois-Pois model
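The predicted means in this table can be reproduced from the marginal mean coefficients reported for the apple data. The sketch below assumes a log link, a natural logarithm in log(BAP/2.2), an indicator for the 16h photoperiod, and an interaction equal to the product of the two covariates. It uses the Poisson column's estimates, and the function and dictionary names are hypothetical.

```python
import math

# Poisson marginal mean estimates from the apple cultivar Trajan table.
POISSON_COEFS = {
    "intercept": 1.880,
    "photoperiod_16h": -0.711,
    "log_bap": 0.069,
    "interaction": -0.176,
}


def predicted_mean(photoperiod_16h: bool, bap: float, coefs: dict = POISSON_COEFS) -> float:
    """Marginal mean exp(x'beta) for one treatment combination."""
    x_photo = 1.0 if photoperiod_16h else 0.0
    x_bap = math.log(bap / 2.2)  # natural log; zero at the reference BAP level 2.2
    eta = (coefs["intercept"]
           + coefs["photoperiod_16h"] * x_photo
           + coefs["log_bap"] * x_bap
           + coefs["interaction"] * x_photo * x_bap)
    return math.exp(eta)


# Print the eight Poisson predictions in the table's row order.
for photo in (False, True):
    for bap in (2.2, 4.4, 8.8, 17.6):
        label = "16h" if photo else "8h"
        print(f"{label} + BAP {bap}: {predicted_mean(photo, bap):.2f}")
```

Swapping in another model's marginal mean coefficients, for example the MPois-Pois estimates, gives that model's column in the same way, since only the marginal mean model enters the overall mean.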