| Literature DB >> 35343013 |
Mary E Lofton1, Jennifer A Brentrup2, Whitney S Beck3, Jacob A Zwart4, Ruchi Bhattacharya5, Ludmila S Brighenti6, Sarah H Burnet7, Ian M McCullough8, Bethel G Steele9, Cayelan C Carey1, Kathryn L Cottingham2, Michael C Dietze10, Holly A Ewing11, Kathleen C Weathers9, Shannon L LaDeau9.
Abstract
Near-term ecological forecasts provide resource managers advance notice of changes in ecosystem services, such as fisheries stocks, timber yields, or water quality. Importantly, ecological forecasts can identify where there is uncertainty in the forecasting system, which is necessary to improve forecast skill and guide interpretation of forecast results. Uncertainty partitioning identifies the relative contributions to total forecast variance introduced by different sources, including specification of the model structure, errors in driver data, and estimation of current states (initial conditions). Uncertainty partitioning could be particularly useful in improving forecasts of highly variable cyanobacterial densities, which are difficult to predict and present a persistent challenge for lake managers. As cyanobacteria can produce toxic and unsightly surface scums, advance warning when cyanobacterial densities are increasing could help managers mitigate water quality issues. Here, we fit 13 Bayesian state-space models to evaluate different hypotheses about cyanobacterial densities in a low nutrient lake that experiences sporadic surface scums of the toxin-producing cyanobacterium, Gloeotrichia echinulata. We used data from several summers of weekly cyanobacteria samples to identify dominant sources of uncertainty for near-term (1- to 4-week) forecasts of G. echinulata densities. Water temperature was an important predictor of cyanobacterial densities during model fitting and at the 4-week forecast horizon. However, no physical covariates improved model performance over a simple model including the previous week's densities in 1-week-ahead forecasts. Even the best fit models exhibited large variance in forecasted cyanobacterial densities and did not capture rare peak occurrences, indicating that significant explanatory variables when fitting models to historical data are not always effective for forecasting. Uncertainty partitioning revealed that model process specification and initial conditions dominated forecast uncertainty. These findings indicate that long-term studies of different cyanobacterial life stages and movement in the water column as well as measurements of drivers relevant to different life stages could improve model process representation of cyanobacteria abundance. In addition, improved observation protocols could better define initial conditions and reduce spatial misalignment of environmental data and cyanobacteria observations. Our results emphasize the importance of ecological forecasting principles and uncertainty partitioning to refine and understand predictive capacity across ecosystems.Entities:
Keywords: Bayesian model; algae; blooms; ecological forecasting; hindcast; lake; oligotrophic; phytoplankton; scums; state-space model; uncertainty partitioning; variance partitioning
Mesh:
Year: 2022 PMID: 35343013 PMCID: PMC9287081 DOI: 10.1002/eap.2590
Source DB: PubMed Journal: Ecol Appl ISSN: 1051-0761 Impact factor: 6.105
Terms associated with partitioning uncertainty in ecological models and forecasts
| Term | Definition | Example |
|---|---|---|
| Credible interval | Range of values within which a parameter or model prediction falls with a specified probability; does not include observation uncertainty | Range of chlorophyll |
| Driver data uncertainty | Uncertainty in the estimate or measurement of driver data (environmental predictors of the forecasted state) | Uncertainty in observations of soil temperature needed to drive a soil respiration model; uncertainty in weather forecasts |
| Hindcast | Predictions with specified uncertainty of a past time period using data (withheld from model calibration) that are iteratively assimilated into the model (Jolliffe & Stephenson, | Making model predictions for tick abundances observed 2 years ago using a model calibrated to observations from 10 years prior |
| Initial conditions uncertainty | Uncertainty associated with the starting conditions of a forecasting model run | Uncertainty in initial focal states, such as fish abundance, chlorophyll |
| Observation uncertainty | Difference between the observed data and the true (latent) state that the model is designed to predict; does not propagate forward, so it does not affect the credible interval | Calibration uncertainty in a temperature sensor; sampling uncertainty when estimating species abundance |
| Parameter uncertainty | Variance around the model parameter estimates | Uncertainty in the growth rate parameter in a timber yield model |
| Predictive interval | Range of values within which predicted observations are expected to fall with a specified probability; includes observation uncertainty in addition to other uncertainties and so is larger than the credible interval; should be used when comparing models to observed data | Range of chlorophyll |
| Process uncertainty | Uncertainty due to model specification (ecological processes that are simplified, absent, or incorrectly represented by the model) or inherent stochasticity in the system | Uncertainty arising from not including an important life history stage in a population growth model; uncertainty arising from demographic stochasticity in plankton communities |
| Random effects uncertainty | Uncertainty associated with estimation of random effects, which are used to describe shared variance across groups in space and time | Uncertainty in the value of a random site effect representing the degree to which beetle communities vary together across different sampling sites in a metacommunity model |
Note: Definitions are adapted from Dietze (2017a) unless otherwise specified.
FIGURE 1Directed acyclic graph (DAG) of a Bayesian state‐space model, where y is the observed cyanobacterial density at time t, x are driver data (physical covariates) at time t, m is the estimated true, or latent, cyanobacterial density at time t, β is a vector of parameters in the process model (slope, intercept, etc.), and τproc and τobs are the precisions of normal distributions representing process error and observation error, respectively. Parameters are modeled as distributions in the parameter model. Parameters, along with driver data, determine the predicted latent states in the process model, which are fitted to observations using the data model
FIGURE 2Map of Lake Sunapee, New Hampshire, USA with locator map (inset). Data from Site 1 were used for Bayesian state‐space models, data from Site 2 were used to inform priors for Site 1 models, and data from Site 3 provided lake‐level covariates for Site 1 models. Land use data was obtained from the New Hampshire Geographically Referenced ANalysis and Information transfer system (GRANIT) land cover assessment in 2001
List of Bayesian state‐space models and covariates used in hindcast analysis
| Model name | Model description | Process model | Covariates |
|---|---|---|---|
| RW | Random walk |
| |
| OffsetRW | Random walk with an offset constant |
| |
| AC | State‐space model with autocorrelation term |
| |
| BaseLM | State‐space model with both offset and autocorrelation |
| |
| MinWaterTemp | BaseLM with a single linear covariate |
| Minimum water temperature on sampling day |
| MinWaterTempLag | BaseLM with a single linear covariate |
| Minimum water temperature 1 week prior to the sampling day |
| WaterTempMA | BaseLM with a single linear covariate |
| 7‐day Moving average of water temperature including the sampling day |
| SchmidtLag | BaseLM with a single linear covariate |
| Maximum Schmidt stability 1 week prior to the sampling day |
| WindDir | BaseLM with a single linear covariate |
| Proportion of daily wind measurements blowing towards Site 1 with a 2‐day lag |
| GDD | BaseLM with a single quadratic covariate |
| Growing degree days |
| Schmidt+Wind | BaseLM with two linear covariates |
| Maximum Schmidt stability 1 week prior to the sampling day and proportion of daily wind measurements blowing towards Site 1 with a 2‐day lag |
| Temp+Wind | BaseLM with two linear covariates |
| Minimum water temperature on sampling day and proportion of daily wind measurements blowing towards Site 1 with a 2‐day lag |
| Wind+GDD | BaseLM with one linear and one quadratic covariate |
| Proportion of daily wind measurements blowing towards Site 1 with a 2‐day lag and growing degree days |
Notes: m is the latent state of Gloeotrichia echinulata density at time t, N() represents a normal distribution with mean and precision (τproc; precision is the inverse of the variance). X, x1, and x2 are environmental covariates in single‐covariate and two‐covariate models. represents parameters for the process model equations. Models that were fit to 2009–2014 data but subsequently excluded from the hindcast analysis due to coefficient estimates close to 0 can be found in Appendix S1: Table S4.
FIGURE 3Hindcasting workflow diagram. The dotted line indicates the transition between the data assimilation (updating model fits) and hindcasting (including uncertainty partitioning) stages of the workflow. Step 1: Data is assimilated by iteratively adding 1 week of Gloeotrichia echinulata and physical driver data to the model input dataset and re‐running SSMs in JAGS to update parameter estimates and initial state conditions (“today's” G. echinulata density) for hindcasts. Step 2: Draws from the posterior distribution for each week's updated initial condition (2a.1, 2a.2) and parameters (2b), as well as from the historical driver data time series in 2009–2014 (2c), are used to generate hindcast input data. During the first week of the sampling season initial conditions are drawn from an informed prior using data from Maine lakes (2a.1); during all other weeks of the season the initial conditions are drawn from the most recent estimated latent state of the re‐calibrated JAGS model output (2a.2). Step 3: Hindcast input data is fed into an uncertainty partitioning algorithm, outside of JAGS, which runs the process model 4 weeks into the future
Hindcasting results across models for the 2015–2016 hindcasting period
| RMSE, ln (colonies/L) | Predictive SD, ln (colonies/L) | Predictive loss, ln (colonies/L) | Δ predictive loss (ΔPL), ln (colonies/L) | Coverage (%) | Peak timing (timesteps) | Bias (colonies/L) | ||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Model name | 1 week | 4 weeks | 1 week | 4 weeks | 1 week | 4 weeks | 1 week | 4 weeks | 1 week | 4 weeks | 1 week | 4 weeks | 1 week | 4 weeks |
| RW | 1.89 | 2.23 | 1.63 | 3 | 2.5 | 3.74 | 0.25 | 1.36 | 97.2 | 100 | 1 | 14 | −0.43 | −0.97 |
| OffsetRW | 1.88 | 2.1 | 1.62 | 2.99 | 2.49 | 3.65 | 0.24 | 1.27 | 97.2 | 100 | 1 | 14 | −0.04 | 0.26 |
| AC | 1.68 | 1.5 | 1.5 | 2.21 | 2.26 | 2.67 | 0.01 | 0.29 | 100 | 100 | 1 | 14 | −0.65 | −1.18 |
| BaseLM | 1.67 | 1.61 | 1.51 | 2.09 | 2.25 | 2.64 | 0 | 0.26 | 94.4 | 100 | 1 | 14 | −0.92 | −1.51 |
| MinWaterTemp | 1.81 | 1.6 | 1.43 | 1.84 | 2.31 | 2.43 | 0.06 | 0.05 | 94.4 | 96.8 | 14 | 12 | −0.93 | −1.41 |
| MinWaterTempLag | 1.79 | 1.63 | 1.45 | 1.8 | 2.31 | 2.43 | 0.06 | 0.05 | 97.2 | 87.1 | 14 | 12 | −1 | −1.45 |
| WaterTempMA | 1.78 | 1.6 | 1.44 | 1.83 | 2.3 | 2.43 | 0.05 | 0.05 | 94.4 | 90.3 | 14 | 12 | −0.95 | −1.43 |
| SchmidtLag | 1.74 | 1.59 | 1.46 | 2.04 | 2.27 | 2.59 | 0.02 | 0.21 | 97.2 | 100 | 14 | 12 | −0.9 | −1.42 |
| WindDir | 1.77 | 1.55 | 1.5 | 1.99 | 2.32 | 2.52 | 0.07 | 0.14 | 94.4 | 100 | 1 | 14 | −0.96 | −1.51 |
| GDD | 1.83 | 1.59 | 1.43 | 1.84 | 2.32 | 2.43 | 0.07 | 0.05 | 94.4 | 96.8 | 14 | 14 | −1.08 | −1.4 |
| Schmidt+Wind | 1.82 | 1.55 | 1.46 | 1.98 | 2.33 | 2.51 | 0.08 | 0.13 | 94.4 | 100 | 14 | 13 | −0.93 | −1.43 |
| Temp+Wind | 1.88 | 1.56 | 1.43 | 1.8 | 2.37 | 2.38 | 0.12 | 0 | 91.7 | 93.5 | 14 | 14 | −0.95 | −1.42 |
| Wind+GDD | 1.88 | 1.55 | 1.43 | 1.82 | 2.36 | 2.39 | 0.11 | 0.01 | 91.7 | 93.5 | 14 | 13 | −0.92 | −1.35 |
| Ensemble | 1.80 | 1.56 | 1.46 | 1.89 | 2.32 | 2.45 | 0.07 | 0.07 | 97.2 | 96.8 | 14 | 14 | −0.97 | −1.43 |
Notes: RMSE, root mean square error; predictive variance, mean variance of the predictive interval; predictive loss, ; Δ predictive loss, the difference between predictive loss for each model and the best‐performing model for that forecast horizon; coverage, the percentage of observations falling within the 95% predictive interval; peak timing, the number of weeks between peak G. echinulata density during the hindcasting period and when the model predicted peak density; bias, mean difference between median predicted and observed values. Note that all assessment metrics are conducted on log‐transformed data except for mean bias.
FIGURE 4Time series of G. echinulata density at site 1 in Lake Sunapee from 2009 to 2016 (a, c); panels (b) and (d) show a reduced scale to better illustrate variability at low density
FIGURE 5Time series of median predicted and observed G. echinulata density (colonies/L) for 1‐week‐ahead hindcasts in 2015 for null models (a–d) and best‐performing models (d–e, as the null BaseLM model was also a best‐performing model; Table 3). Similar figures for 2016 hindcasts and models not shown here may be in found in the supplemental material (Appendix S1: Figures S11, S12 and S15)
FIGURE 6Time series of median predicted and observed G. echinulata density (colonies/L) for 4‐week‐ahead hindcasts in 2015 for null models (a–d) and best‐performing models (d–e, as the null BaseLM model was also a best‐performing model; Table 3). Similar figures for 2016 hindcasts and models not shown here may be in found in the supplemental material (Appendix S1: Figures S13, S14 and S15). Note the y‐axis change between Figures 5 and 6 to accommodate larger credible and predictive intervals at the 4‐week forecast horizon
FIGURE 7Uncertainty partitioning of the 1‐week‐ahead to 4‐week‐ahead hindcasts averaged across 2015–2016 for null models (a–d) and best‐performing models (d–e; Table 3). Similar figures for other models may be found in the supplemental material (Appendix S1: Figure S16)
FIGURE 8Uncertainty partitioning for (a) 1‐week‐ahead and (b) 4‐week‐ahead hindcasts averaged across the 2015–2016 hindcasting period across models. White triangles indicate a best‐performing model at the respective forecast horizon as assessed by change in predictive loss (ΔPL; Table 3)