Lisa Combelles, Fabien Corbiere, Didier Calavas, Anne Bronner, Viviane Hénaux, Timothée Vergne.
Abstract
Risk factors are key epidemiological concepts that are used to explain disease distributions. Identifying disease risk factors is generally done by comparing the characteristics of diseased and non-diseased populations. However, imperfect disease detectability generates disease observations that do not necessarily represent the true disease situation accurately. In this study, we conducted an extensive simulation exercise to emphasize the impact of imperfect disease detection on the outcomes of logistic models when case reports are aggregated at a larger scale (e.g., diseased animals aggregated at farm level). We used a probabilistic framework to simulate both the disease distribution in herds and the imperfect detectability of the infected animals in these herds. These simulations show that, under logistic models, true herd-level risk factors are generally correctly identified, but their associated odds ratios are heavily underestimated as soon as the sensitivity of the detection is less than one. If the detectability of infected animals is not only imperfect but also heterogeneous between herds, the variables associated with the detection heterogeneity are likely to be incorrectly identified as risk factors. The probability of this type I error increases with increasing heterogeneity of the detectability and with decreasing sensitivity. Finally, the simulations highlighted that, when count data are available (e.g., number of infected animals in herds), they should not be reduced to a presence/absence dataset at the herd level (e.g., presence or not of at least one infected animal) but rather modeled directly using zero-inflated count models, which are shown to be much less sensitive to imperfect detectability issues. In light of these simulations, we revisited the analysis of the French bovine abortion surveillance data, which have already been shown to be characterized by imperfect and heterogeneous abortion detectability.
As expected, we found substantial differences between the quantitative outputs of the logistic model and those of the zero-inflated Poisson model. We conclude by strongly recommending that efforts be made to account for, or at the very least discuss, imperfect disease detectability when assessing associations between putative risk factors and observed disease distributions, and we advocate the use of zero-inflated count models if count data are available.
Keywords: bias; bovine abortion; logistic regression; risk factors; sensitivity; surveillance; zero-inflated Poisson model
Year: 2019 PMID: 30895182 PMCID: PMC6415588 DOI: 10.3389/fvets.2019.00066
Source DB: PubMed Journal: Front Vet Sci ISSN: 2297-1769
Parameter values used in the simulations.
| Parameter | Definition | Set 1 | Set 2 | Set 3 | Set 4 |
|---|---|---|---|---|---|
| N | Number of epidemiological units | 10,000 | 10,000 | 10,000 | 10,000 |
| m | Mean number of cases in diseased epidemiological units | 4 | 1 to 17 | 4 | 1 to 17 |
| prev.2 | Probability of disease presence in epidemiological units that do not have the factor X1 | 0.2; 0.4; 0.6; 0.8 | 0.1; 0.2; 0.5 | 0.1; 0.2; 0.5 | 0.1; 0.2; 0.5 |
| OR(X1) | Odds ratio of the probability of disease presence based on X1 | 1 to 10 | 2; 5; 10 | 2; 5; 10 | 2; 5; 10 |
| Se.1 | Sensitivity of detection of diseased elementary units in epidemiological units that have the factor X2 | 0.01, 0.02, …,1 | 0.01, 0.02, …,1 | 0.1, 0.2, …,1 | 0.3; 0.6; 0.9 |
| Se.2 | Sensitivity of detection of diseased elementary units in epidemiological units that do not have the factor X2 | Se.2 = Se.1 | Se.2 = Se.1 | 0.1, 0.2, …,1 | 0.3; 0.6; 0.9 |
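The attenuation of the odds ratio described in the abstract can be illustrated without any random simulation. The sketch below is not the authors' simulation code: it assumes that the number of cases in a diseased unit follows a zero-truncated Poisson distribution with rate `lam` (a hypothetical stand-in for the parameter m), that every case is detected independently with the same sensitivity `se`, and that a unit counts as "apparently diseased" when at least one case is detected. The function name `apparent_odds_ratio` is introduced here for illustration only.

```python
import math

def apparent_odds_ratio(prev_unexposed, true_or, lam, se):
    """Large-sample odds ratio of X1 for *reported* presence (>= 1 detected case).

    Assumptions (not taken from the source): cases in a diseased unit follow a
    zero-truncated Poisson with rate lam; each case is detected independently
    with probability se; detection does not depend on X1.
    """
    # True prevalence among units with X1, derived from the true odds ratio.
    odds_unexposed = prev_unexposed / (1 - prev_unexposed)
    odds_exposed = true_or * odds_unexposed
    prev_exposed = odds_exposed / (1 + odds_exposed)

    # Probability that a diseased unit yields at least one detected case.
    p_seen = (1 - math.exp(-lam * se)) / (1 - math.exp(-lam))

    # Apparent prevalences and the resulting apparent odds ratio.
    app_unexposed = prev_unexposed * p_seen
    app_exposed = prev_exposed * p_seen
    return (app_exposed / (1 - app_exposed)) / (app_unexposed / (1 - app_unexposed))

if __name__ == "__main__":
    for se in (1.0, 0.5, 0.1):
        print(se, round(apparent_odds_ratio(0.2, 5, 4, se), 2))
```

Under these assumptions the apparent odds ratio equals the true one only when `se` is 1. As `se` falls, it shrinks toward the ratio of the two true prevalences.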
Figure 1. Probability that the factor X2 is identified as statistically significantly associated with the presence of at least one reported case (i.e., as a risk factor for “apparent” presence) by a logistic regression, as a function of the detection sensitivity in epidemiological units with X2 = 0 or X2 = 1. The figure is based on 500 simulated datasets generated with prev.2 = 0.2, OR(X1) = 5 and m = 4.
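A small Monte Carlo experiment gives a feel for the quantity plotted in this figure. The sketch below is a simplified stand-in, not the authors' procedure: X1 is left out, each diseased unit is assumed to hold 1 + Poisson(m − 1) cases, and the logistic regression is replaced by a Wald test on the log odds ratio of the 2×2 table (reported presence by X2). For a single binary covariate this tests the same odds ratio that a logistic model would estimate. The helper names, the default sample sizes and the seed are all illustrative choices.

```python
import math
import random

def poisson(rng, lam):
    """Draw from Poisson(lam) with Knuth's multiplication method (fine for small lam)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def x2_detection_rate(se_x2_1, se_x2_0, n_units=2000, n_sim=200,
                      prev=0.2, m=4, seed=1):
    """Share of simulated datasets in which X2 looks significant at the 5% level.

    X2 has no effect on true disease presence here; it only changes sensitivity.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sim):
        # counts[x2] = [units without a reported case, units with >= 1 reported case]
        counts = {0: [0, 0], 1: [0, 0]}
        for _ in range(n_units):
            x2 = rng.random() < 0.5
            reported = False
            if rng.random() < prev:
                n_cases = 1 + poisson(rng, m - 1)  # assumed case-count model
                se = se_x2_1 if x2 else se_x2_0
                # At least one of n_cases is detected with prob 1 - (1 - se)^n_cases.
                reported = rng.random() < 1 - (1 - se) ** n_cases
            counts[int(x2)][int(reported)] += 1
        (b, a), (d, c) = counts[1], counts[0]
        if min(a, b, c, d) == 0:
            continue  # Wald test undefined; counted as not significant
        z = math.log(a * d / (b * c)) / math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
        hits += abs(z) > 1.96
    return hits / n_sim

if __name__ == "__main__":
    print("equal sensitivity:", x2_detection_rate(0.9, 0.9))
    print("unequal sensitivity:", x2_detection_rate(0.9, 0.1))
```

With equal sensitivities the rejection rate should stay near the nominal 5%. With strongly unequal sensitivities X2 is flagged in almost every dataset, even though it plays no role in the true disease process.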
Figure 2. Probability that the factor X2 is identified as statistically significantly associated with the average number of reported cases in epidemiological units with at least one case (i.e., as a risk factor for the number of case reports given the presence of at least one case) by a zero-inflated Poisson regression, as a function of the detection sensitivity in epidemiological units with X2 = 0 or X2 = 1. The figure is based on 500 simulated datasets generated with prev.2 = 0.2, OR(X1) = 5 and m = 4.
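One way to see why a zero-inflated Poisson model reacts differently to imperfect detection is to write out its probability mass function and apply binomial thinning to it. The sketch below uses the textbook zero-inflated Poisson form, which may differ from the authors' exact parameterisation. The names `zip_pmf`, `thinned_zip_pmf` and `N_MAX` are illustrative. It checks numerically that if each true case is detected independently with probability `se`, the reported counts are again zero-inflated Poisson, with the same zero-inflation probability and rate `lam * se`.

```python
import math

N_MAX = 100  # truncation point for the infinite sum; ample for the small rates used here

def zip_pmf(k, pi_zero, lam):
    """P(Y = k) for a zero-inflated Poisson: structural zero with prob pi_zero, else Poisson(lam)."""
    poisson_k = math.exp(-lam) * lam ** k / math.factorial(k)
    return pi_zero * (k == 0) + (1 - pi_zero) * poisson_k

def thinned_zip_pmf(k, pi_zero, lam, se):
    """P(k reported cases) when each true case is detected independently with probability se."""
    total = 0.0
    for n in range(k, N_MAX):
        binom = math.comb(n, k) * se ** k * (1 - se) ** (n - k)
        total += zip_pmf(n, pi_zero, lam) * binom
    return total

if __name__ == "__main__":
    pi_zero, lam, se = 0.3, 4.0, 0.6
    for k in range(4):
        print(k, thinned_zip_pmf(k, pi_zero, lam, se), zip_pmf(k, pi_zero, lam * se))
```

Under this thinning assumption, a sensitivity shared by all units cancels out of any ratio of rates, since `(lam_a * se) / (lam_b * se)` equals `lam_a / lam_b`. If sensitivity differs between X2 groups, the ratio of reported rates picks up the factor `se_1 / se_0`, which is consistent with X2 being flagged in the count component.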
Figure 3. Relative bias of the odds ratio estimated by a logistic regression, as a function of the detection sensitivity in epidemiological units with X2 = 0 or X2 = 1. The figure is based on 500 simulated datasets generated with prev.2 = 0.2, OR(X1) = 5 and m = 4.
Distribution of farm characteristics (production type and herd size) according to whether or not at least one abortion was reported during the period of interest.
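| Herd size | No abortion reported: beef | No abortion reported: dairy | No abortion reported: mixed | ≥1 abortion reported: beef | ≥1 abortion reported: dairy | ≥1 abortion reported: mixed |
|---|---|---|---|---|---|---|
| ≤7,686 bovine-day | 28,129 | 2,728 | 1,028 | 876 | 254 | 43 |
| >7,686 and ≤18,586 bovine-day | 13,680 | 9,753 | 2,236 | 1,678 | 4,868 | 835 |
| >18,586 bovine-day | 11,895 | 6,605 | 4,742 | 2,721 | 5,067 | 2,858 |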
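The counts in this table are enough to compute crude odds ratios by hand, which is a useful check before reading the regression tables. The sketch below rests on one assumption that the source does not state: the six count columns are taken to be, in order, beef, dairy and mixed herds with no reported abortion, followed by beef, dairy and mixed herds with at least one reported abortion. Under that reading, the crude odds ratios relative to small beef herds come out close to the odds ratios in the logistic regression table (for example 3.91 and 7.40 for the two larger beef herd-size classes). The names `ROWS`, `PRODUCTION` and `crude_odds_ratios` are illustrative.

```python
# Counts copied from the table. Assumed column order (not stated in the source):
# no abortion reported (beef, dairy, mixed), then >= 1 abortion reported (beef, dairy, mixed).
ROWS = {
    "small": (28129, 2728, 1028, 876, 254, 43),        # <= 7,686 bovine-day
    "medium": (13680, 9753, 2236, 1678, 4868, 835),    # > 7,686 and <= 18,586 bovine-day
    "large": (11895, 6605, 4742, 2721, 5067, 2858),    # > 18,586 bovine-day
}
PRODUCTION = ("beef", "dairy", "mixed")

def crude_odds_ratios(rows, reference=("beef", "small")):
    """Odds of >= 1 reported abortion in each cell, divided by the odds in the reference cell."""
    odds = {}
    for size, cells in rows.items():
        for i, production in enumerate(PRODUCTION):
            without_report, with_report = cells[i], cells[3 + i]
            odds[(production, size)] = with_report / without_report
    ref = odds[reference]
    return {cell: value / ref for cell, value in odds.items()}

if __name__ == "__main__":
    for cell, value in crude_odds_ratios(ROWS).items():
        print(cell, round(value, 2))
```

Because every production-type and herd-size combination is its own cell, these ratios are what a logistic model with a full interaction would be expected to return, up to small differences in the underlying data.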
Figure 4. Distribution of the number of reported abortions per farm.
Results from the bovine abortion zero-inflated Poisson regression.
| Variable | Category | OR or IRR | 95%CI | LRT p-value |
|---|---|---|---|---|
| Production type | Beef | Reference | Reference | < 0.01 |
| | Mixed | 0.90 | 0.63–1.29 | |
| | Dairy | 1.97 | 1.65–2.37 | |
| Herd size (bovine-day) | ≤7,686 | Reference | Reference | < 0.01 |
| | >7,686 and ≤18,586 | 2.35 | 2.00–2.78 | |
| | >18,586 | 3.30 | 2.81–3.87 | |
| Production type & Herd size (bovine-day) | Mixed, >7,686 and ≤18,586 | 5.78 | 2.36–14.16 | < 0.01 |
| | Dairy, >7,686 and ≤18,586 | 7.86 | 4.58–13.48 | |
| | Mixed, >18,586 | 6.94 | 2.89–16.69 | |
| | Dairy, >18,586 | 8.79 | 5.17–14.93 | |
| Production type | Beef | reference | reference | < 0.01 |
| | Mixed | 1.60 | 1.52–1.69 | |
| | Dairy | 1.75 | 1.67–1.83 | |
| Herd size (bovine-day) | ≤7,686 | reference | reference | < 0.01 |
| | >7,686 and ≤18,586 | 2.05 | 1.78–2.35 | |
| | >18,586 | 3.21 | 2.80–3.69 | |
OR, Odds ratio; IRR, incidence rate ratio; 95%CI, 95% confidence interval; LRT, likelihood ratio test.
Results of the bovine abortion logistic regression.
| Variable | Category | OR | 95%CI | LRT p-value |
|---|---|---|---|---|
| Production type | Beef | Reference | Reference | < 0.01 |
| | Mixed | 1.35 | 0.95–1.85 | |
| | Dairy | 2.96 | 2.53–3.44 | |
| Herd size (bovine-day) | ≤7,686 | Reference | Reference | < 0.01 |
| | >7,686 and ≤18,586 | 3.91 | 3.58–4.28 | |
| | >18,586 | 7.40 | 6.81–8.04 | |
| Production type & Herd size (bovine-day) | Mixed, >7,686 and ≤18,586 | 11.93 | 5.56–25.54 | < 0.01 |
| | Dairy, >7,686 and ≤18,586 | 16.22 | 10.77–24.45 | |
| | Mixed, >18,586 | 19.28 | 9.13–40.75 | |
| | Dairy, >18,586 | 24.91 | 16.66–37.25 | |
OR, Odds ratio; 95%CI, 95% confidence interval; LRT, likelihood ratio test.
Figure 5. Receiver operating characteristic curve for the logistic model fitted to the bovine abortion dataset. The dotted line represents the diagonal (Sensitivity = 1 − Specificity).