Tam Tran, W Tanner Porter, Daniel J Salkeld, Melissa A Prusinski, Shane T Jensen, Dustin Brisson.
Abstract
Citizen science projects have the potential to address hypotheses requiring extremely large datasets that cannot be collected within the financial and labour constraints of most scientific projects. Data collection by the general public could expand the scope of scientific enquiry if these data accurately capture the system under study. However, inconsistencies in data collection by the untrained public may result in biased datasets that do not accurately represent the natural world. In this paper, we harness the availability of scientific and public datasets of the Lyme disease tick vector to identify and account for biases in citizen science tick collections. Estimates of tick abundance from the citizen science dataset correspond moderately with estimates from direct surveillance but exhibit consistent biases. These biases can be mitigated by including in statistical models factors that may affect collector participation or effort; doing so yields more accurate estimates of tick population sizes. Accounting for collection biases within large-scale, public participation datasets could update species abundance maps and make it possible to use the wealth of citizen science data to answer scientific questions at scales that are not feasible with traditional datasets.
Keywords: Lyme disease vector; citizen science; community science; population trends
Year: 2021 PMID: 34814732 PMCID: PMC8611339 DOI: 10.1098/rsif.2021.0610
Source DB: PubMed Journal: J R Soc Interface ISSN: 1742-5662 Impact factor: 4.118
Figure 1. Active tick surveillance in 2016–2017: each point represents a collection site at which a minimum of 1000 m was surveyed.
Figure 2. The tick population size in each county correlates with the number of ticks collected by the public in both 2016 (a) and 2017 (b). Each county's rank by number of ticks submitted by the public (x-axis) was similar to its rank by tick population size estimated from active surveillance (y-axis); the diagonal line shows perfect correspondence. The discrepancies between the datasets were consistent across years, as illustrated by Cayuga and Nassau counties (red points) and Warren County (blue points). Counties such as Cayuga and Nassau have large tick populations but few ticks submitted by the public, whereas counties such as Warren have smaller tick populations but many citizen science submissions. The datasets corresponded more strongly in 2017 (Spearman ρ = 0.71, p = 4.1 × 10⁻⁹) than in 2016 (ρ = 0.53, p = 2.7 × 10⁻⁴). The public submitted ticks from fewer counties in 2016 (43) than in 2017 (56), so the axis lengths differ between panels.
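The correspondence in Figure 2 is measured with Spearman's rank correlation, which is the Pearson correlation computed on ranks rather than raw counts. A minimal standard-library sketch follows, assuming tied values receive the average of their ranks (the usual convention); the function names and the five-county numbers are hypothetical illustrations, not data from the study, and no p-value is computed.

```python
def average_ranks(values):
    """Rank values 1..n; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j correspond to ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical counties: public tick submissions vs. surveillance-based population estimates.
submissions = [12, 3, 40, 7, 25]
surveillance = [5000.0, 800.0, 9000.0, 2500.0, 4000.0]
print(round(spearman_rho(submissions, surveillance), 2))  # → 0.9
```

Because only ranks enter the calculation, a county that submits many ticks relative to its true population (like Warren County in the caption) lowers ρ regardless of the absolute scale of either dataset.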
Figure 3. Collector-associated factors can correct consistent errors in citizen science datasets. Models that use citizen science tick submissions as predictors (x-axis) can predict actual tick population sizes, as estimated by active surveillance (y-axis). A model using only citizen science data (a) shows moderate accuracy, with errors evenly distributed because tick population sizes are both underpredicted and overpredicted. Adding nine collector-associated factors (excluding Lyme disease incidence) corrects biases in the citizen science data and yields a model (b) that predicts tick abundance accurately. Collector-associated factors reduced both underpredictions and overpredictions. For example, a randomly selected set of counties that are overpredicted by the citizen science data (red points in (a)) are predicted much more accurately by the full model (the red points representing the same counties lie much closer to the diagonal line in (b)). Similarly, a randomly selected set of counties that are underpredicted by the citizen science data (blue points in (a)) are predicted much more accurately by the full model (blue points in (b)). Both axes show total ticks per county on the natural log scale (e ≈ 2.718) for the 2016 and 2017 estimates.
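The caption does not state the exact model form, so the following is only a sketch of the idea behind panels (a) and (b): regress log tick counts on citizen science submissions alone, then add a collector-associated covariate and compare errors. It assumes ordinary least squares, uses log(1 + submissions) as the predictor, and uses a single hypothetical standardized covariate `income_z` standing in for a factor such as median household income; all data are synthetic.

```python
import math


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_ols(rows, y):
    """Least-squares coefficients [intercept, b1, b2, ...] via the normal equations."""
    X = [[1.0] + list(r) for r in rows]
    p = len(X[0])
    xtx = [[sum(xi[i] * xi[j] for xi in X) for j in range(p)] for i in range(p)]
    xty = [sum(xi[i] * yi for xi, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


def predict(coefs, row):
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], row))


def rmse(coefs, rows, y):
    sq = [(yi - predict(coefs, r)) ** 2 for r, yi in zip(rows, y)]
    return math.sqrt(sum(sq) / len(sq))


# Synthetic counties (NOT data from the study).
submissions = [2, 15, 40, 5, 30, 8, 60, 1]
income_z = [1.2, -0.3, 0.5, -1.0, 1.5, 0.0, -0.8, 0.4]  # hypothetical collector factor
# Simulated log tick counts that depend on both submissions and the collector factor.
log_ticks = [2.0 + 0.8 * math.log1p(s) - 0.5 * z for s, z in zip(submissions, income_z)]

cs_rows = [[math.log1p(s)] for s in submissions]                          # like panel (a)
full_rows = [[math.log1p(s), z] for s, z in zip(submissions, income_z)]   # like panel (b)

cs_model = fit_ols(cs_rows, log_ticks)
full_model = fit_ols(full_rows, log_ticks)

print("citizen-science-only RMSE:", round(rmse(cs_model, cs_rows, log_ticks), 3))
print("with collector factor RMSE:", round(rmse(full_model, full_rows, log_ticks), 3))
```

Because the simulated data contain no noise, the model with the collector factor recovers the generating coefficients almost exactly, while the submissions-only model leaves systematic residuals; real data would show a smaller, noisier improvement.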
Regression models predicting the estimated number of ticks per New York State (NYS) county.
| description of model | RMSE | R² | AIC |
|---|---|---|---|
| citizen science model: | 0.58 | 0.37 | 172 |
| median household income | 0.51 | 0.52 | 148 |
| mean temperature | 0.51 | 0.51 | 149 |
| population | 0.53 | 0.48 | 155 |
| % below poverty | 0.55 | 0.44 | 163 |
| Google trends | 0.55 | 0.43 | 164 |
| % white population | 0.56 | 0.43 | 164 |
| % bachelor's degree or higher | 0.57 | 0.39 | 170 |
| % children (0–14 years old) | 0.57 | 0.39 | 170 |
| county's median age | 0.58 | 0.38 | 172 |
| Lyme disease incidence rate | 0.58 | 0.38 | 172 |
| full model with Lyme disease: | 0.44 | 0.64 | 138 |
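The table compares models by RMSE and AIC. For reference, these statistics, together with the coefficient of determination R², can be computed from a model's observed and predicted values as sketched below. The AIC here uses the maximized Gaussian log-likelihood with the residual variance counted as one extra parameter; this is one common convention and is an assumption, not necessarily the one used to produce the table. The function name and example values are hypothetical.

```python
import math


def fit_statistics(observed, predicted, n_coefficients):
    """Return (RMSE, R², AIC) for a Gaussian linear model.

    n_coefficients counts regression coefficients including the intercept;
    one more parameter is added to AIC for the residual variance.
    """
    n = len(observed)
    residuals = [o - p for o, p in zip(observed, predicted)]
    rss = sum(r * r for r in residuals)
    mean_obs = sum(observed) / n
    tss = sum((o - mean_obs) ** 2 for o in observed)
    rmse = math.sqrt(rss / n)
    r_squared = 1.0 - rss / tss
    # Maximized Gaussian log-likelihood, with sigma^2 estimated as RSS / n.
    log_lik = -0.5 * n * (math.log(2 * math.pi) + math.log(rss / n) + 1.0)
    k = n_coefficients + 1
    aic = 2 * k - 2 * log_lik
    return rmse, r_squared, aic


# Hypothetical log-scale observations and predictions from a two-coefficient model.
print(fit_statistics([1.0, 2.0, 3.0, 4.0], [1.5, 1.5, 3.5, 3.5], n_coefficients=2))
```

Lower AIC indicates a better balance of fit and complexity, but only differences in AIC between models fitted to the same response data are meaningful; the absolute values carry no interpretation on their own.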
Figure 4. Models built on citizen science tick submissions and collector-associated factors can be extrapolated across the northeastern USA ((a) 2016 and (b) 2017). The high predictive accuracy of the models in NYS suggests that they could be used to estimate I. scapularis population sizes in counties of nearby states. Predictions from the full model without Lyme disease capture variation in tick population size both among counties and between years within the same counties across northeastern states. Tick population sizes are shown as a heat map, with darker colours representing larger populations. Grey indicates counties with no citizen science tick submissions. Predictions were made using the full model without Lyme disease, excluding the Google Trends predictor because these data were not available at the appropriate resolution.
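The extrapolation described for Figure 4 amounts to applying a fitted log-scale model to counties outside NYS, leaving counties with no submissions unpredicted (grey). The sketch below assumes a model already refit without the Google Trends predictor; the coefficient names, their values, and the county records are all hypothetical and only illustrate the procedure.

```python
import math
from typing import Dict, Optional

# Hypothetical coefficients (natural-log scale) from a model refit WITHOUT Google Trends.
COEFFICIENTS: Dict[str, float] = {
    "intercept": 2.0,
    "log1p_submissions": 0.8,
    "income_z": -0.5,  # stand-in for a collector-associated factor
}


def predict_ticks(county: Dict[str, float]) -> Optional[float]:
    """Predicted total ticks for one county, or None if it had no public submissions.

    None corresponds to the grey (no data) counties on the map.
    """
    submissions = county.get("submissions", 0)
    if submissions <= 0:
        return None
    log_pred = COEFFICIENTS["intercept"]
    log_pred += COEFFICIENTS["log1p_submissions"] * math.log1p(submissions)
    log_pred += COEFFICIENTS["income_z"] * county["income_z"]
    return math.exp(log_pred)  # back-transform from the natural-log scale


counties = {
    "County A": {"submissions": 10, "income_z": 0.2},
    "County B": {"submissions": 0, "income_z": -0.4},
}
predictions = {name: predict_ticks(c) for name, c in counties.items()}
print(predictions)
```

Exponentiating a log-scale prediction gives a value closer to a median than a mean on the original scale; a smearing correction is one common adjustment if mean abundance is the target.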