Adriaan M Dokter¹,²,³, E Emiel van Loon³, Wimke Fokkema⁴, Thomas K Lameris², Bart A Nolet²,³, Henk P van der Jeugd¹,².
Abstract
A common problem with observational datasets is that not all events of interest may be detected. For example, observing animals in the wild can be difficult when animals move, hide, or cannot be closely approached. We consider time series of events recorded in conditions where events are occasionally missed by observers or observational devices. These time series are not restricted to behavioral protocols, but can be any cyclic or recurring process where discrete outcomes are observed. Undetected events bias inferences on the process of interest, so statistical analyses are needed that can identify and correct for compromised detection. Missed observations in time series produce observed intervals between events at multiples of the true inter-event time, and the frequency of these multiples conveys information on the detection probability. We derive the theoretical probability density function for observed intervals between events that includes a probability of missed detection. Methodology and software tools are provided for analyzing event data with potential observation bias and for removing that bias. The methodology was applied to simulated data and to a case study of defecation rate estimation in geese; defecation rates are commonly used to estimate digestive throughput and energetic uptake, or to calculate goose usage of a feeding site from dropping density. Simulations indicate that at a moderate chance to miss arrival events (p = 0.3), uncorrected arrival intervals were biased upward by up to a factor of 3, while parameter values corrected for missed observations were within 1% of their true simulated values. A field case study shows that not accounting for missed observations leads to substantial underestimates of the true defecation rate in geese, and to spurious rate differences between sites that are introduced by differences in observational conditions.
These results show that the derived methodology can be used to effectively remove observational biases in time-ordered event data.
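The core mechanism, in which a missed event merges two or more true intervals into one observed interval, can be illustrated with a short simulation. This is a minimal sketch, not the authors' software: it assumes normally distributed true intervals and independent misses with probability p, and the function name `simulate_observed_intervals` and all parameter values are illustrative. Under these assumptions each observed interval spans a geometric number of true intervals with mean 1/(1 − p), so the naive mean is inflated to about μ/(1 − p). The sketch does not attempt to reproduce the paper's own simulation settings.

```python
import numpy as np

def simulate_observed_intervals(n_events, mu, sigma, p_miss, rng):
    """Return intervals between *detected* events of a simulated recurring process.

    Illustrative assumptions (hypothetical, not the paper's protocol):
    - true inter-event intervals are i.i.d. Normal(mu, sigma), clipped to stay positive
    - each event is missed independently with probability p_miss
    """
    true_intervals = np.clip(rng.normal(mu, sigma, size=n_events), 1e-9, None)
    event_times = np.cumsum(true_intervals)
    detected = rng.random(n_events) >= p_miss  # True where the event was observed
    # A missed event merges its two neighbouring intervals into one observed interval
    return np.diff(event_times[detected])

rng = np.random.default_rng(1)
obs = simulate_observed_intervals(100_000, mu=250.0, sigma=50.0, p_miss=0.3, rng=rng)
print(f"naive mean interval: {obs.mean():.0f}  (true mu = 250)")
print(f"naive SD: {obs.std():.0f}  (true sigma = 50)")
```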
Keywords: fecal output; interval time series; missing data; mixture model; observation protocol; probability of detection
Year: 2017 PMID: 28944022 PMCID: PMC5606873 DOI: 10.1002/ece3.3281
Source DB: PubMed Journal: Ecol Evol ISSN: 2045-7758 Impact factor: 2.912
Figure 1. Example probability density function for an observed interval distribution and its components. The area under each curve (shaded in gray for the fundamental component) equals (1 − f)·π_i, with π_i given by equation (2) and f the fraction of intervals from the (optional) exponential Poisson process component. The bars at the top of the figure indicate the true and observed interval lengths, where the gray number in each bar indicates the number of consecutively missed arrivals i.
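The mixture described in the Figure 1 caption can be written down directly. This is a minimal sketch under assumptions the caption does not spell out: geometric weights π_i = (1 − p)p^i for i consecutive missed arrivals, normal components whose mean and variance scale with the i + 1 true intervals they span, and an exponential Poisson component with mean μ. The names `observed_interval_pdf` and `i_max` are hypothetical. Under these assumptions the area under component i is approximately (1 − f)·π_i, matching the caption, and the full density integrates to one.

```python
import numpy as np
from scipy.stats import expon, norm

def observed_interval_pdf(x, mu, sigma, p, f=0.0, i_max=20):
    """Density of observed intervals when each event is missed with probability p.

    Illustrative assumptions, not the paper's exact equations:
    - weight of component i (i consecutive missed events): pi_i = (1 - p) * p**i
    - an interval spanning i + 1 true intervals ~ Normal((i + 1) * mu, sqrt(i + 1) * sigma)
    - a fraction f of intervals is exponential with mean mu (Poisson process component)
    """
    x = np.asarray(x, dtype=float)
    density = np.zeros_like(x)
    for i in range(i_max + 1):
        pi_i = (1.0 - p) * p**i
        component = norm.pdf(x, loc=(i + 1) * mu, scale=np.sqrt(i + 1) * sigma)
        density += (1.0 - f) * pi_i * component  # this term has area ~ (1 - f) * pi_i
    density += f * expon.pdf(x, scale=mu)
    return density

# Riemann-sum check that the mixture integrates to ~1 (truncation at i_max is negligible here)
grid = np.linspace(0.0, 20000.0, 200001)
area = observed_interval_pdf(grid, mu=250.0, sigma=50.0, p=0.3, f=0.1).sum() * (grid[1] - grid[0])
print(f"total area: {area:.4f}")
```

With p = 0 and f = 0 the function reduces to a single normal density, which is the model that ignores missed detections.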
Parameter retrieval for simulated interval data. In each simulation run, 100 interval observations were generated and parameters were retrieved using equations (3) and (4).
| Scenario | Estimate | μ | σ | σwithin | p |
|---|---|---|---|---|---|
| 1 | Retrieved | 250 (6) | 49 (4) | – | 0.30 (0.05) |
| 1 | Uncorrected | 336 (16) | 158 (14) | – | – |
| 2 | Retrieved | 256 (11) | 56 (7) | 13 (4) | 0.23 (0.04) |
| 2 | Uncorrected | 333 (17) | 156 (15) | 140 (16) | – |
| 3 | Retrieved | 252 (8) | 52 (7) | 36 (4) | 0.25 (0.04) |
| 3 | Uncorrected | 336 (16) | 159 (14) | 148 (15) | – |
Values are means of the retrieved parameters over 1000 runs, with standard deviations in parentheses. Significance for including a probability of missed detection (parameter p) and for separating within‐group variance (σwithin) was tested with a likelihood ratio test against a null model without these terms (p values denoted by stars: *<.05, **<.01, ***<.001).
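The retrieval step described above can be sketched as maximum-likelihood fitting of the mixture, followed by a likelihood ratio test against a single normal model without p. This is an illustration with SciPy, not the authors' software: it assumes geometric weights (1 − p)p^i and normal components whose mean and variance scale with the number of spanned intervals, it omits f and σwithin, and the names `mixture_pdf`, `fit_corrected`, `fit_uncorrected` and `I_MAX` are hypothetical. Parameters are optimized on log and logit scales so that μ, σ > 0 and 0 < p < 1 hold without explicit bounds, and the optimizer starts at a low quantile of the data because most observed intervals are not inflated.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import chi2, norm

I_MAX = 20  # number of missed-event components kept (hypothetical truncation)

def mixture_pdf(x, mu, sigma, p):
    """Assumed model: weight (1 - p) * p**i for i missed events,
    component i ~ Normal((i + 1) * mu, sqrt(i + 1) * sigma)."""
    i = np.arange(I_MAX + 1)[:, None]  # column vector of component indices
    weights = (1.0 - p) * p**i
    comps = norm.pdf(x[None, :], loc=(i + 1) * mu, scale=np.sqrt(i + 1) * sigma)
    return (weights * comps).sum(axis=0)

def neg_loglik(theta, x):
    # Unconstrained parameterisation: theta = (log mu, log sigma, logit p)
    mu, sigma = np.exp(theta[0]), np.exp(theta[1])
    p = 1.0 / (1.0 + np.exp(-theta[2]))
    return -np.sum(np.log(mixture_pdf(x, mu, sigma, p) + 1e-300))

def fit_corrected(x):
    mu0 = np.percentile(x, 25)  # start low: most observed intervals are un-inflated
    start = np.array([np.log(mu0), np.log(0.2 * mu0), np.log(0.2 / 0.8)])
    res = minimize(neg_loglik, start, args=(x,), method="Nelder-Mead")
    mu, sigma = np.exp(res.x[:2])
    p = 1.0 / (1.0 + np.exp(-res.x[2]))
    return mu, sigma, p, -res.fun

def fit_uncorrected(x):
    # Null model without p: a single normal, whose MLE is the sample mean and SD
    mu, sigma = x.mean(), x.std()
    return mu, sigma, norm.logpdf(x, mu, sigma).sum()

# Simulated data under the assumed model: mu = 250, sigma = 50, p = 0.3
rng = np.random.default_rng(42)
n_spanned = rng.geometric(1 - 0.3, size=500)  # true intervals per observed interval
x = rng.normal(250.0 * n_spanned, 50.0 * np.sqrt(n_spanned))

mu, sigma, p, ll_full = fit_corrected(x)
mu_u, sigma_u, ll_null = fit_uncorrected(x)
deviance = 2 * (ll_full - ll_null)
p_value = chi2.sf(deviance, df=1)  # one extra parameter (p) in the full model
print(f"corrected:   mu={mu:.0f} sigma={sigma:.0f} p={p:.2f}")
print(f"uncorrected: mu={mu_u:.0f} sigma={sigma_u:.0f}")
print(f"LRT deviance={deviance:.1f}, p-value={p_value:.2g}")
```

Because p = 0 lies on the boundary of the parameter space, the χ² reference distribution with one degree of freedom is conservative for this particular test.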
Figure 2. Observed defecation intervals for Brent geese at two sites, collected over a 2‐week period in May: Schiermonnikoog (top, saltmarsh site) and Terschelling (bottom, pasture site). The solid curve is a fit of the model in eq. (4) to the interval data. The probability to observe a defecation (1 − p) is higher at the pasture site.
Comparison of interval models within sites
| Site | Model | Loglik | No. of parameters | ΔAIC | Sign. |
|---|---|---|---|---|---|
| 1 | | −430 | 4 | 0 | |
| 1 | | −437 | 2 | 9 | |
| 1 | | −436 | 3 | 10 | |
| 1 | | −436 | 3 | 11 | |
| 2 | | −574 | 4 | 0 | |
| 2 | | −576 | 3 | 1 | NS |
| 2 | | −583 | 3 | 16 | |
| 2 | | −586 | 2 | 20 | |
Models including both a missed-event probability p and a random Poisson fraction f give the best fit (although inclusion of f was only significant at site 1). The best model for each site was used for subsequent between-site comparisons. Deviance tests are against the best model for each site (p values denoted by stars: *<.05, **<.01, ***<.001).
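Because AIC = 2k − 2 log L, the deviance for a likelihood ratio test against the best model can be recovered from the ΔAIC column as D = ΔAIC + 2(k_best − k), tested with k_best − k degrees of freedom. The sketch below applies this to the printed ΔAIC values, assuming (as the parameter counts in the table indicate) that the best model at each site has four parameters; `deviance_from_delta_aic` is a hypothetical helper. Results are approximate because the printed values are rounded. For example, the three-parameter model at site 2 (ΔAIC = 1) gives D = 3 and p ≈ 0.08, consistent with its NS label.

```python
from scipy.stats import chi2

K_BEST = 4  # parameters in the best model at both sites (from the table)

# (site, number of parameters k, delta AIC) for the non-best models, as printed
rows = [(1, 2, 9), (1, 3, 10), (1, 3, 11), (2, 3, 1), (2, 3, 16), (2, 2, 20)]

def deviance_from_delta_aic(delta_aic, k, k_best=K_BEST):
    # AIC = 2k - 2 log L, so delta_aic = 2(k - k_best) + 2(log L_best - log L),
    # and the deviance D = 2(log L_best - log L) = delta_aic + 2(k_best - k)
    return delta_aic + 2 * (k_best - k)

for site, k, d_aic in rows:
    dev = deviance_from_delta_aic(d_aic, k)
    p_val = chi2.sf(dev, df=K_BEST - k)  # LRT p value with k_best - k degrees of freedom
    print(f"site {site}: k={k}, dAIC={d_aic:2d}, deviance={dev:2d}, p={p_val:.2g}")
```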
Comparison of interval mean and standard deviation between sites (saltmarsh n = 67, pasture n = 97 intervals)
| Param | Retrieved: Saltmarsh | Retrieved: Pasture | Retrieved: Diff | Uncorrected: Saltmarsh | Uncorrected: Pasture | Uncorrected: Diff |
|---|---|---|---|---|---|---|
| μ | 245 | 233 | NS | 341 | 269 | |
| σ | 53 | 54 | NS | 186 | 123 | |
| σwithin | 24 | 42 | NS | 176 | 113 | |
| p | 0.27 | 0.14 | – | – | – | – |
| f | 0.20 | 0.05 | – | – | – | – |
Not accounting for a nonzero missed-event probability p leads to overestimates of the mean interval (and hence underestimates of the defecation rate), overestimates of the variance, and spurious significant differences in means and variances between sites (p values for site comparison denoted by stars: *<.05, **<.01, ***<.001, NS > .05).
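The direction and size of the uncorrected bias can be checked against the between-site table. Under the illustrative assumption of geometric missing (π_i = (1 − p)p^i), an observed interval spans on average 1/(1 − p) true intervals, so the uncorrected mean should be close to μ/(1 − p). This sketch takes the retrieved means together with the values 0.27 and 0.14, read here as the retrieved p for the saltmarsh and pasture sites, and ignores the Poisson fraction f; `expected_observed_mean` is a hypothetical helper name.

```python
def expected_observed_mean(mu, p):
    """Expected mean observed interval if each event is missed independently with
    probability p (geometric number of spanned true intervals; Poisson fraction ignored)."""
    return mu / (1.0 - p)

# site: (retrieved mu, retrieved p, uncorrected mean), values from the between-site table
sites = {"saltmarsh": (245, 0.27, 341), "pasture": (233, 0.14, 269)}

for name, (mu, p, uncorrected) in sites.items():
    predicted = expected_observed_mean(mu, p)
    # The uncorrected rate 1/uncorrected understates the true rate 1/mu by roughly a factor (1 - p)
    print(f"{name}: mu/(1-p) = {predicted:.0f}, uncorrected mean = {uncorrected}")
```

This gives about 336 for the saltmarsh and 271 for the pasture, close to the uncorrected means of 341 and 269, which is consistent with the note above that the uncorrected between-site difference in mean interval is largely an artifact of different detection probabilities.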