Shawn D Taylor1, Joan M Meiners1, Kristina Riemer2, Michael C Orr3, Ethan P White2,4.
Abstract
Large-scale observational data from citizen science efforts are becoming increasingly common in ecology, and researchers often choose between these and data from intensive local-scale studies for their analyses. This choice has potential trade-offs related to spatial scale, observer variance, and interannual variability. Here we explored this issue with phenology by comparing models built using data from the large-scale, citizen science USA National Phenology Network (USA-NPN) effort with models built using data from more intensive studies at Long Term Ecological Research (LTER) sites. We built statistical and process-based phenology models for species common to each data set. From these models, we compared parameter estimates, estimates of phenological events, and out-of-sample errors between models derived from both USA-NPN and LTER data. We found that model parameter estimates for the same species were most similar between the two data sets when using simple models, but parameter estimates varied widely as model complexity increased. Despite this, estimates for the date of phenological events and out-of-sample errors were similar, regardless of the model chosen. Predictions for USA-NPN data had the lowest error when using models built from the USA-NPN data, while LTER predictions were best made using LTER-derived models, confirming that models perform best when applied at the same scale at which they were built. This difference in the cross-scale model comparison is likely due to variation in phenological requirements within species. Models using the USA-NPN data set can integrate parameters over a large spatial scale while those using an LTER data set can only estimate parameters for a single location. Accordingly, the choice of data set depends on the research question. Inferences about species-specific phenological requirements are best made with LTER data, and if USA-NPN or similar data are all that is available, then analyses should be limited to simple models.
Large-scale predictive modeling is best done with the larger-scale USA-NPN data, which has high spatial representation and a large regional species pool. LTER data sets, on the other hand, have high site fidelity and thus characterize inter-annual variability extremely well. Future research aimed at forecasting phenology events for particular species over larger scales should develop models that integrate the strengths of both data sets.
Keywords: Long Term Ecological Research; USA National Phenology Network; budburst; data integration; flowering; forecasting; scale
Year: 2018 PMID: 30499218 PMCID: PMC7378950 DOI: 10.1002/ecy.2568
Source DB: PubMed Journal: Ecology ISSN: 0012-9658 Impact factor: 5.499
Figure 1. Locations of USA National Phenology Network sites used (black points) and Long Term Ecological Research sites (LTER; labeled circles), with gray scale showing elevation.
Table 1. LTER data sets used in the analysis
| Data set name | Habitat | Phenological event (no. species) | Source |
|---|---|---|---|
| Harvard Forest | Northeast deciduous forest | Budburst (17) Flowering (7) | O'Keefe ( |
| Jornada Experimental Range | Chihuahuan Desert | Flowering (2) | |
| H.J. Andrews Experimental Forest | Northwest wet coniferous forest | Budburst (5) Flowering (4) | Schulze ( |
| Hubbard Brook | Northeast deciduous forest | Budburst (3) | Bailey ( |
Table 2. Phenology models used in the analysis
| Name | DOY estimator | Forcing equations | Total parameters | Source |
|---|---|---|---|---|
| Naive | | – | 1 | – |
| Linear | | – | 2 | – |
| GDD | | | 3 | Réaumur ( |
| Fixed GDD | | | 1 | Réaumur ( |
| Alternating | | | 3 | Cannell and Smith ( |
| Uniforc | | | 4 | Chuine ( |
| M1 | | | 4 | Blümel and Chmielewski ( |
| MSB | | | 4 | Jeong et al. ( |
For all models except the Naive and Linear models, the daily mean temperature is first transformed via the specified forcing equation. The cumulative sum of forcing is then calculated from a specific start date (either a fixed date or a fitted start-date parameter). The phenological event is estimated as the day of year (DOY) on which cumulative forcing is greater than or equal to the specified total required forcing (either a fitted parameter or the specified equation). Parameters for each model are as follows: for the Naive model, the single parameter is the mean day of year of a phenological event; for the Linear model, the two parameters are the intercept and slope, respectively, and the predictor is the average daily temperature between 1 January and 31 March; for the GDD model, the three parameters are the total accumulated forcing required, the start date of forcing accumulation, and the threshold daily mean temperature above which forcing accumulates; for the Fixed GDD model, the single parameter is the total accumulated forcing required; for the Alternating model, the chill-day count is the number of days with daily mean temperature below 0°C from the start date to the DOY of the phenological event, and there are three fitted model coefficients; for the Uniforc model, the parameters are the total accumulated forcing required, the start date of forcing accumulation, and two additional fitted parameters that define the sigmoid function; the M1 model is the same as the GDD model, but with an additional fitted parameter that adjusts the total forcing accumulation according to day length; the MSB model is the same as the Alternating model, but with an additional fitted parameter to correct the model according to mean spring temperature.
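The accumulation procedure described in this note can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes the conventional growing degree day forcing, max(T − threshold, 0), and the names `gdd_forcing`, `predict_doy`, `start_doy`, `threshold`, and `required_forcing` are hypothetical, as is the temperature series.

```python
from typing import Optional, Sequence


def gdd_forcing(temp: float, threshold: float) -> float:
    """Daily forcing: degrees above the threshold, never negative (assumed form)."""
    return max(temp - threshold, 0.0)


def predict_doy(daily_mean_temp: Sequence[float], start_doy: int,
                threshold: float, required_forcing: float) -> Optional[int]:
    """Return the first day of year on which cumulative forcing reaches
    required_forcing, or None if it is never reached.

    daily_mean_temp[0] is the temperature on day of year 1.
    """
    total = 0.0
    for doy in range(start_doy, len(daily_mean_temp) + 1):
        total += gdd_forcing(daily_mean_temp[doy - 1], threshold)
        if total >= required_forcing:
            return doy
    return None


# Hypothetical series: 10 days at 0 °C, then 20 days at 10 °C.
temps = [0.0] * 10 + [10.0] * 20
print(predict_doy(temps, start_doy=1, threshold=5.0, required_forcing=25.0))  # 15
```

With a 5 °C threshold, each warm day contributes 5 units of forcing, so the required 25 units are reached on the fifth warm day, day 15. Fixing `start_doy` and `threshold` and fitting only `required_forcing` corresponds to the one-parameter Fixed GDD variant described above.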
Figure 2. Comparisons of parameter estimates between USA National Phenology Network (USA‐NPN) and LTER‐derived models. Each point represents a parameter value for a specific species and phenophase and is the mean value from 250 bootstrap iterations. The black line is the 1:1 line. The R² is the coefficient of determination, which can be negative if the relationship between the two parameter sets is worse than no relationship but with the same mean values. Models and parameters are defined in Table 2.
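A coefficient of determination measured against the 1:1 line, rather than against a fitted regression line, can fall below zero. The sketch below shows one common way to compute it, 1 − SS_res/SS_tot with residuals taken from the line y = x. The function name `r2_one_to_one` and the choice of which data set plays the role of x or y are assumptions for illustration only.

```python
from typing import Sequence


def r2_one_to_one(x: Sequence[float], y: Sequence[float]) -> float:
    """Coefficient of determination of y against the 1:1 line (y = x).

    Residuals are y - x; the baseline is the mean of y.
    Undefined (division by zero) when all y values are identical.
    """
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - xi) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


print(r2_one_to_one([1, 2, 3], [1, 2, 3]))  # 1.0, points lie on the 1:1 line
print(r2_one_to_one([1, 2, 3], [3, 2, 1]))  # -3.0, worse than predicting the mean
```

In the second call both sets have the same mean, yet the 1:1 line fits worse than a flat line at that mean, so the value is negative.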
Figure 3. Comparison of predicted day of year (DOY) of all phenological events between USA‐NPN and LTER‐derived models. Top panels show comparisons at LTER sites and bottom panels show comparisons at USA‐NPN sites. Each point is an estimate for a single held‐out observation. Colors indicate observations for a single species and phenophase combination.
Figure 4. Differences in prediction error between USA‐NPN and LTER‐derived models. Density plots for comparisons of predictions on LTER data (top row) and USA‐NPN data (bottom row). Each plot represents the difference between the RMSE for the LTER‐derived model and the RMSE for the USA‐NPN‐derived model, meaning that values less than zero indicate more accurate prediction by LTER‐derived models and values greater than zero indicate more accurate prediction by USA‐NPN‐derived models. P < 0.001 for all t tests. Differences are calculated pairwise for the 38 species/phenophase comparisons.
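The sign convention in this caption is easy to get backwards, so a small sketch may help. It computes the RMSE of each model's predictions against the same held-out observations and subtracts the USA-NPN value from the LTER value. The function names and all numbers are hypothetical, chosen only to illustrate the calculation.

```python
import math
from typing import Sequence


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error between observed and predicted DOY."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def rmse_difference(observed: Sequence[float], pred_lter: Sequence[float],
                    pred_npn: Sequence[float]) -> float:
    """RMSE of the LTER-derived model minus RMSE of the USA-NPN-derived model.

    Negative: the LTER-derived model is more accurate.
    Positive: the USA-NPN-derived model is more accurate.
    """
    return rmse(observed, pred_lter) - rmse(observed, pred_npn)


# Hypothetical held-out observations for one species/phenophase.
observed = [100.0, 110.0, 120.0]
pred_lter = [101.0, 109.0, 121.0]  # every prediction off by 1 day
pred_npn = [103.0, 107.0, 123.0]   # every prediction off by 3 days
print(rmse_difference(observed, pred_lter, pred_npn))  # -2.0
```

Repeating this for each species and phenophase pair yields the set of paired differences whose distribution the density plots summarize.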
Table 3. Attributes of the two data sets used in this study
| Parameter | LTER | USA‐NPN |
|---|---|---|
| Time‐series length | | |
| Spatial extent | Low | |
| Local species representation | High | Low |
| Regional/Continental species representation | Low | |
| Number of observers | | |
| Site fidelity | High | Low |
Bold text indicates an attribute is expected to increase over time.