Wenqi Wu, James Stamey, David Kahle.
Abstract
Count data are subject to considerable sources of what is often referred to as non-sampling error. Errors such as misclassification, measurement error, and unmeasured confounding can lead to substantially biased estimators. It is strongly recommended that epidemiologists not only acknowledge these sorts of errors in data but also incorporate sensitivity analyses as part of the overall data analysis. We extend previous work on Poisson regression models that allow for misclassification by thoroughly discussing the basis for the models and by allowing for extra-Poisson variability in the form of random effects. Via simulation, we show the improvements in inference that are brought about by accounting for both the misclassification and the overdispersion.
Keywords: count data; misclassification; overdispersion
Year: 2015 PMID: 26343704 PMCID: PMC4586634 DOI: 10.3390/ijerph120910648
Source DB: PubMed Journal: Int J Environ Res Public Health ISSN: 1660-4601 Impact factor: 3.390
Figure 1. The naive baseline model: the number of deaths due to cancer () and non-cancer () follow a Poisson distribution with constant parameters.
Figure 2. The no-misclassification Poisson regression model.
Figure 3. The Poisson regression model with misclassification. (a) The graphical model representation of the model. denotes the sensitivity of the classifier, and denotes its specificity. (b) The contingency table representation of the data. () is the true number of deaths due to lung cancer (non-lung cancer). () is the number of true lung cancer (non-lung cancer) deaths that were misclassified. () is the observed number of deaths due to lung cancer (non-lung cancer). Note that C and denote correctly classified and misclassified, respectively.
Figure 4. The random-intercept Poisson regression model with misclassification.
Figure 5. Posterior means and 95% credible sets for sensitivity analysis of (with true value 0.5).
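The data-generating process behind the random-intercept model with misclassification can be illustrated with a small simulation. The following is only a minimal sketch under stated assumptions, not the authors' code: it assumes a log-linear rate for each cause of death, a single normal random intercept shared by both causes within a group, and misclassification by independent binomial thinning with a given sensitivity and specificity. The function names, the coefficient names `coef_lung` and `coef_other`, and all numeric values are hypothetical.

```python
import math
import random


def poisson_draw(rate, rng):
    """Draw one Poisson variate with Knuth's multiplication method (fine for small rates)."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def binomial_draw(n, p, rng):
    """Draw one Binomial(n, p) variate as a sum of n Bernoulli trials."""
    return sum(rng.random() < p for _ in range(n))


def simulate_group(x, coef_lung, coef_other, sigma, sens, spec, rng):
    """Simulate true and observed death counts for one group.

    Assumptions (not taken from the paper): log-linear rates in a single
    covariate x, and one random intercept u ~ N(0, sigma^2) shared by both causes.
    """
    u = rng.gauss(0.0, sigma)  # random intercept: source of extra-Poisson variability
    true_lung = poisson_draw(math.exp(coef_lung[0] + coef_lung[1] * x + u), rng)
    true_other = poisson_draw(math.exp(coef_other[0] + coef_other[1] * x + u), rng)

    # Sensitivity: chance a true lung cancer death is recorded as lung cancer.
    lung_kept = binomial_draw(true_lung, sens, rng)
    # Specificity: chance a true non-lung cancer death is recorded as non-lung cancer.
    other_kept = binomial_draw(true_other, spec, rng)

    return {
        "true_lung": true_lung,
        "true_other": true_other,
        "obs_lung": lung_kept + (true_other - other_kept),
        "obs_other": other_kept + (true_lung - lung_kept),
    }


if __name__ == "__main__":
    rng = random.Random(2015)
    for x in (0, 1):
        print(x, simulate_group(x, (1.0, 0.5), (2.0, -0.3), 0.5, 0.9, 0.95, rng))
```

Misclassification only moves deaths between the two observed categories, so the total count in each group is preserved. With sensitivity and specificity both equal to 1, the observed counts coincide with the true counts.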
Average posterior means across 1000 simulations (truth: , ).
| 0.10 | –0.11 | –0.30 | 0.43 | 0.51 |
| 0.25 | –0.11 | –0.32 | 0.42 | 0.50 |
| 0.50 | –0.10 | –0.33 | 0.40 | 0.50 |
| 0.75 | –0.08 | –0.34 | 0.39 | 0.50 |
| 0.10 | –0.07 | –0.31 | 0.36 | 0.50 |
| 0.25 | –0.06 | –0.32 | 0.35 | 0.50 |
| 0.50 | –0.05 | –0.33 | 0.34 | 0.49 |
| 0.75 | –0.05 | –0.33 | 0.32 | 0.49 |
Average width of 95% intervals across 1000 simulations.
| 0.10 | 0.52 | 0.66 | 0.46 | 0.53 |
| 0.25 | 0.70 | 0.88 | 0.66 | 0.80 |
| 0.50 | 0.93 | 1.15 | 0.89 | 1.06 |
| 0.75 | 1.09 | 1.37 | 1.02 | 1.28 |
| | | | | |
| 0.10 | 0.53 | 0.76 | 0.44 | 0.56 |
| 0.25 | 0.70 | 0.95 | 0.63 | 0.78 |
| 0.50 | 0.91 | 1.21 | 0.86 | 1.08 |
| 0.75 | 1.07 | 1.44 | 1.02 | 1.31 |
Average coverage of the 95% intervals across 1000 simulations.
| 0.10 | 0.72 | 0.95 | 0.90 | 0.96 |
| 0.25 | 0.81 | 0.95 | 0.91 | 0.94 |
| 0.50 | 0.88 | 0.96 | 0.91 | 0.95 |
| 0.75 | 0.90 | 0.96 | 0.93 | 0.95 |
| 0.10 | 0.59 | 0.96 | 0.74 | 0.95 |
| 0.25 | 0.74 | 0.94 | 0.86 | 0.95 |
| 0.50 | 0.81 | 0.96 | 0.86 | 0.94 |
| 0.75 | 0.85 | 0.96 | 0.87 | 0.94 |
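The three tables summarize the simulation study by the average posterior mean, the average width of the 95% intervals, and the coverage of those intervals. A minimal sketch of how such summaries are commonly computed from per-replication results is given here. The function name `summarize_replications` and the toy numbers are hypothetical and are not taken from the paper.

```python
def summarize_replications(results, truth):
    """Summarize simulation replications for one parameter.

    results: list of (posterior_mean, lower, upper) tuples, one per simulated data set.
    truth:   the parameter value used to generate the data.
    """
    n = len(results)
    avg_mean = sum(mean for mean, _, _ in results) / n
    avg_width = sum(upper - lower for _, lower, upper in results) / n
    # Coverage: fraction of intervals that contain the true value.
    coverage = sum(lower <= truth <= upper for _, lower, upper in results) / n
    return {"avg_mean": avg_mean, "avg_width": avg_width, "coverage": coverage}


if __name__ == "__main__":
    # Four made-up replications for a parameter whose true value is 0.5.
    toy = [(0.4, 0.1, 0.7), (0.6, 0.55, 0.9), (0.5, 0.2, 0.8), (0.5, 0.3, 0.6)]
    print(summarize_replications(toy, truth=0.5))
```

In this toy input, three of the four intervals contain 0.5, so the coverage is 0.75. A well-calibrated 95% interval should give coverage close to 0.95 over many replications.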
Figure 6. Posterior means (a) and coverage rates (b) for γ.
Figure 7. Posterior means (a) and coverage rates (b) for β.