Cole Heasley, J Johanna Sanchez, Jordan Tustin, Ian Young.
Abstract
Monitoring of fecal indicator bacteria at recreational waters is an important public health measure to minimize water-borne disease; however, traditional culture methods for quantifying bacteria can take 18-24 hours to obtain a result. To support real-time notifications of water quality, models using environmental variables have been created to predict indicator bacteria levels on the day of sampling. We conducted a systematic review of predictive models of fecal indicator bacteria at freshwater recreational sites in temperate climates to identify and describe the existing approaches, trends, and their performance to inform beach water management policies. We conducted a comprehensive search strategy, including five databases and grey literature, screened abstracts for relevance, and extracted data using structured forms. Data were descriptively summarized. A total of 53 relevant studies were identified. Most studies (n = 44, 83%) were conducted in the United States and evaluated water quality using E. coli as fecal indicator bacteria (n = 46, 87%). Studies were primarily conducted in lakes (n = 40, 75%) compared to rivers (n = 13, 25%). The most commonly reported predictive model-building method was multiple linear regression (n = 37, 70%). Frequently used predictors in best-fitting models included rainfall (n = 39, 74%), turbidity (n = 31, 58%), wave height (n = 24, 45%), and wind speed and direction (n = 25, 47%, and n = 23, 43%, respectively). Of the 19 (36%) studies that measured accuracy, predictive models averaged an 81.0% accuracy, and all but one were more accurate than traditional methods. Limitations identified by risk-of-bias assessment included not validating models (n = 21, 40%), limited reporting of whether modelling assumptions were met (n = 40, 75%), and lack of reporting on handling of missing data (n = 37, 70%). 
Additional research is warranted on the utility and accuracy of more advanced predictive modelling methods, such as Bayesian networks and artificial neural networks, which were investigated in comparatively fewer studies, and on creating risk-of-bias tools for non-medical predictive modelling.
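The accuracy and sensitivity figures summarized in this record are exceedance-classification metrics, and many of the reviewed studies benchmark them against a persistence model. The following is a minimal sketch of that comparison, not the method of any reviewed study. It assumes a persistence model means using the most recent culture result as the current day's prediction, and it assumes an E. coli threshold of 235 CFU/100 mL. The function names and all concentration values are hypothetical and for illustration only.

```python
from typing import Dict, List, Sequence


def exceedance_metrics(
    observed: Sequence[float],
    predicted: Sequence[float],
    threshold: float = 235.0,  # assumed E. coli threshold, CFU/100 mL
) -> Dict[str, float]:
    """Accuracy, sensitivity, and specificity of exceedance calls."""
    tp = fp = tn = fn = 0
    for obs, pred in zip(observed, predicted):
        obs_exceeds = obs > threshold
        pred_exceeds = pred > threshold
        if obs_exceeds and pred_exceeds:
            tp += 1  # exceedance correctly predicted
        elif obs_exceeds:
            fn += 1  # exceedance missed
        elif pred_exceeds:
            fp += 1  # false alarm
        else:
            tn += 1  # non-exceedance correctly predicted
    total = tp + fp + tn + fn
    nan = float("nan")
    return {
        "accuracy": (tp + tn) / total if total else nan,
        "sensitivity": tp / (tp + fn) if (tp + fn) else nan,
        "specificity": tn / (tn + fp) if (tn + fp) else nan,
    }


def persistence_predictions(observed: Sequence[float]) -> List[float]:
    """Persistence model: the prediction for day t is the observation on day t-1."""
    return list(observed[:-1])


# Hypothetical daily E. coli concentrations (CFU/100 mL), not from any study.
observed = [120, 310, 95, 80, 450, 520, 70, 60]
# Hypothetical same-day predictions from some fitted model.
model_pred = [150, 280, 110, 240, 390, 300, 90, 75]

# Score both approaches on days 2..n, where a persistence prediction exists.
model_scores = exceedance_metrics(observed[1:], model_pred[1:])
persistence_scores = exceedance_metrics(observed[1:], persistence_predictions(observed))

print("model:", model_scores)
print("persistence:", persistence_scores)
```

Scoring both approaches on the same days keeps the comparison fair, since the persistence model cannot produce a prediction for the first sampling day.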
Year: 2021 PMID: 34437625 PMCID: PMC8389397 DOI: 10.1371/journal.pone.0256785
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. PRISMA flow diagram of article selection.
Summary characteristics of models extracted from 53 relevant articles that created predictive models of FIB using environmental variables.
| No. | Authors and year of publication | Location of recreational waters | Number of beaches and swimming seasons | Predictors explored in study | Predictors in at least one final model in study | Type of model | Model validation | Performance metrics | Recommendations/conclusions of study |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Anderson, Kendall W (2019) [ | Lake Michigan, Chicago, Illinois, USA | 19 Beaches 4 Seasons | Rainfall <24 hr, Turbidity | Rainfall <24 hr, Turbidity | Decision tree | Original dataset | Dry and wet conditions | Decision tree can reduce the need for qPCR testing in some conditions and issue early advisories. |
| 2 | Avila, Rodelyn, Horn, Beverley, Moriarty, Elaine, Hodson, Roger, Moltchanova, Elena (2018) [ | Oreti River, Wallacetown, New Zealand | 1 Sampling site | Rainfall 24hr, Rainfall 48 hr, Previous day [FIB], Discharge/ flow | Rainfall 24hr, Rainfall 48 hr, Previous day [FIB], Discharge/ flow | Multiple Linear Regression (MLR), Bayesian modelling, Tree regression/ random forest, Markov chain, Log-linear regression, Logistic regression, Discriminant analysis, Classification tree | Bootstrapping/ cross-validation (leave-one-out and k-fold cross-validation) | Sensitivity: Dynamic regression = 62%, MLR = 62.5%, regression tree = 68%, random forest = 80%, Bayesian network = 95%, classification tree = 68%, linear discriminant analysis = 74%, Markov chain = 0%, logistic regression = 74%, quadratic discriminant analysis = 86%, random forest classification = 71% | Bayesian networks were shown to be the most useful tool for prediction. |
| 3 | Bachmann-Machnik, Anna, Dittmer, Ulrich, Schoenfeld, Annika (2019) [ | Lake Baldeney/ Ruhr River, North Rhine-Westphalia, Germany | 1 Beach | Rainfall 24hr, Sewer outflow [FIB], Discharge/ flow, Combined sewer overflow duration | No final model presented | Univariate regression | Original dataset | All R2 < 0.3, none presented | Overflow events at one combined sewer outflow do not necessarily result in exceedances. |
| 4 | Brady, Amie M G, Bushon, Rebecca N, Plona, Meg B (2009) [ | Cuyahoga River, Cuyahoga Valley National Park, Ohio, USA | 4 Sampling sites | Rainfall 24hr, Turbidity, Discharge/ flow | Rainfall 24hr, Turbidity, Discharge/ flow | Multiple Linear Regression, Univariate regression | Temporal validation (new season) | Sensitivity: Site1 = 94%, Site2 = 100%, Sites3&4 = 73%-91% | One predictive model outperformed persistence model, the other did not. Two models generalized from the first model at other locations did not outperform persistence models. |
| 5 | Brady, Amie M G, Plona, Meg B (2009) [ | Cuyahoga River, Cuyahoga Valley National Park, Ohio, USA | 4 Sampling sites | Rainfall 24hr, Turbidity, Water level | Turbidity | Univariate regression | Temporal validation (new season) | Accuracy = 77% | Turbidity model predicted |
| 6 | Brady, Amie M.G., Plona, Meg B. (2015) [ | Cuyahoga River, Ohio, USA | 2 Sampling sites | Rainfall 24hr, Rainfall 48 hr, Water temperature, Turbidity, Discharge/ flow | Rainfall 48 hr, Turbidity | Multiple Linear regression | Temporal validation (new season) | Models for most recent year. | Automatic predictions implemented, recommend continuing nowcast system and further studies along river. |
| 7 | Brady, Amie MG, Plona, Meg B (2012) [ | Cuyahoga River, Cuyahoga Valley National Park, Ohio, USA | 3 Sampling sites | Rainfall 24hr, Rainfall 48 hr, Turbidity, Discharge/ flow, Water level | Rainfall 48 hr, Turbidity, Discharge/ flow, Water level | Multiple Linear Regression | Temporal validation (new season) | Sensitivity = 88% and 100% | Predictive models performed better than persistence models in first two years, but not last year, possibly due to excess precipitation. |
| 8 | Brooks, Wesley R., Fienen, Michael N., Corsi, Steven R. (2013) [ | Lake Erie and Lake Michigan, Cleveland and Toledo, Ohio, Port Washington, Wisconsin, USA | 4 Beaches | Rainfall 24hr, Rainfall 48 hr, Air Temperature, Water temperature, Wave height, Solar radiation, Barometric pressure, Turbidity, Wind speed, Wind direction, Relative humidity, Discharge/ flow, Algae index, Bird count, Lake level, Month, Day of year, Sub-season | Rainfall 24hr, Rainfall 48 hr, Air temperature, Water temperature, Wave height, Solar radiation, Barometric pressure, Turbidity, Wind speed, Wind direction, Relative humidity, Discharge/ flow, Algae index, Bird count, Lake level, Month, Day of year | Multiple Linear Regression (partial least squares) | Temporal validation (new season) | Sensitivity and specificity used but values not listed. | Partial least squares automates the model building process and compares favorably to the other regression models. |
| 9 | Brooks, Wesley, Corsi, Steven, Fienen, Michael, Carvin, Rebecca (2016) [ | Chequamegon Bay, Lake Superior and Lake Michigan, Manitowoc County, Wisconsin, USA | 7 Beaches | Rainfall <24 hr, Rainfall 24hr, Rainfall 48 hr, Rainfall 72+ hr, Air Temperature, Water temperature, Wave height, Turbidity, Wind speed, Wind direction, Discharge/ flow, Conductivity, Current speed, Current direction, Wave direction, Cloud cover, Bird count, Bather count, Algae presence, Day of year | Rainfall <24 hr, Rainfall 24hr, Rainfall 48 hr, Rainfall 72+ hr, Air Temperature, Water temperature, Wave height, Turbidity, Wind speed, Wind direction, Discharge/ flow, Conductivity, Current speed, Current direction, Wave direction, Cloud cover, Bird count, Bather count, Algae accumulation, Day of year | Multiple Linear Regression (partial least squares and sparse partial least squares), Tree regression/ random forest, Logistic regression, Adaptive LASSO, Gradient boosting | Bootstrapping/ cross validation | AUC (AUROC): Gradient boosting cross-validation tree estimate = 0.76, Gradient boosting out-of-bag tree estimate = 0.75, MLR Adaptive LASSO = 0.73 and 0.72, Logistic regression adaptive LASSO = 0.68, 0.65, 0.63, and 0.62, Sparse partial least squares = 0.70 and 0.70, Partial Least Squares = 0.66, MLR genetic algorithm = 0.65, Logistic regression genetic algorithm = 0.60 and 0.58 | Of 14 regression methods, a random forest model was the most accurate. |
| 10 | Corsi, Steven R, Borchardt, Mark A, Carvin, Rebecca B, Burch, Tucker R, Spencer, Susan K, Lutz, Michelle A, McDermott, Colleen M, Busse, Kimberly M, Kleinheinz, Gregory T, Feng, Xiaoping, Zhu, Jun (2016) [ | Lake Michigan, Wisconsin, USA | 3 Beaches | Rainfall >24 hr, Rainfall 24hr, Rainfall 48 hr, Rainfall 72+ hr, Air Temperature, Water temperature, Wave height, Turbidity, Discharge/ flow, Conductivity, Water level, Current speed, Current direction, Cloud cover, Algae abundance | NA | NA | NA | Fecal indicator bacteria model not presented | NA |
| 11 | Cyterski, M, Zhang, S, White, E, Molina, M, Wolfe, K, Parmar, R, Zepp, R (2012) [ | Lake Michigan, Milwaukee, Wisconsin, USA | 1 Beach | Air Temperature, Water temperature, Turbidity, Wind speed, Wind direction, Relative humidity, Conductivity, pH, Water level, Chloride, [NH4+], [NO3-] | Air Temperature, Water temperature, Turbidity, Wind speed, Wind direction, Relative humidity, Conductivity, pH, [NH4+], [NO3-], Water level | Multiple Linear Regression | Bootstrapping/ cross validation | Mean square error of prediction = 1.85, 3.67, and 2.71 | Temporal synchronization analysis of environmental predictors improved the predictive regression models. |
| 12 | Dada, Ayokunle Christopher, Hamilton, David P (2016) [ | Lake Rotorua, North Island, New Zealand | 3 Beaches | Rainfall 72+ hr, Barometric pressure, Wind speed, Wind direction, Discharge/ flow, Total nitrogen, Total phosphorus, Distance from lake exit, Suspended solids, Particulate inorganic phosphorus | Rainfall 72+ hr, Wind speed, Distance from lake exit, Particulate inorganic phosphorus | Multiple Linear Regression | Temporal validation (2 new seasons) | Sensitivity = 0%-50% | Models worked well, could be used for guiding swimming advisories. |
| 13 | Francy, Donna S., Gifford, Amie M., and Darner, Robert A. (2003) [ | Lake Erie and Mosquito Lake, Cleveland, Huntington Reservation, Lake County, and Mosquito Lake State Park, Ohio, USA | 6 Beaches | Rainfall 24hr, Rainfall 48 hr, Rainfall 72+ hr, Water temperature, Wave height, Previous day [FIB], Solar radiation, Turbidity, Wind speed, Wind direction, Discharge/ flow, Conductivity, Bird count, Day of year, Water level, Current direction, Days since last rainfall | Rainfall 24hr, Rainfall 72+ hr, Wave height, Previous day [FIB], Turbidity, Wind direction, Discharge/ flow, Bird count, Current direction, Day of year, Days since last rainfall | Multiple Linear Regression | Original Dataset | R2 = 0.17–0.58 | Models were beach specific, future research could test created models in future years, and test whether adding subsequent years’ data improves the models. |
| 14 | Francy, D.S., Brady, A.M.G., Carvin, R.B., Corsi, S.R., Fuller, L.M., Harrison, J.H., Hayhurst, B.A., Lant, J., Nevers, M.B., Terrio, P.J., Zimmerman, T.M. (2013) [ | Lake Michigan, Lake Erie, Lake Ontario, and Lake Superior, Illinois, Indiana, Michigan, New York, Ohio, Pennsylvania, and Wisconsin, USA | 49 Beaches | Rainfall 24hr, Rainfall 48 hr, Rainfall 72+ hr, Air Temperature, Water temperature, Wave height, Solar radiation, Barometric pressure, Turbidity, Wind speed, Wind direction, Relative humidity, Discharge/ flow, Conductivity, pH, Chlorophyll a, Day of year, Bird count, Debris assessment, Dissolved O2, Wave period, Sub-season, Current direction, Current speed, Cloud cover, Water level, Algae category, Bather count, Weather category | Rainfall 24hr, Rainfall 48 hr, Rainfall 72+ hr, Air temperature, Water temperature, Wave height, Solar radiation, Barometric pressure, Turbidity, Wind speed, Wind direction, Relative humidity, Discharge/ flow, Conductivity, Chlorophyll a, Day of year, Bird count, Debris assessment, Current direction, Cloud cover, Water level, Sub-season, Algae category, Bather count, Weather category | Multiple Linear Regression | Temporal validation (new season) | Sensitivity = 0%-100% | 24 of the 42 models performed at least 5% more accurately than persistence models. |
| 15 | Francy, Donna S, Darner, Robert A (2007) [ | Lake Erie, Cleveland and Huntington Reservation, Ohio, USA | 3 Beaches | Rainfall 24hr, Rainfall 48 hr, Water temperature, Wave height, Water level, Number of wet and dry days, Day of year | Rainfall 24hr, Rainfall 48 hr, Water temperature, Wave height, Day of year | Multiple Linear Regression | Temporal validation (new season) | Sensitivity = 53.3%, 50%, 32.6% | Additional data was added to refine models. An online nowcast system implemented. |
| 16 | Francy, Donna S, Stelzer, Erin A, Duris, Joseph W, Brady, Amie M G, Harrison, John H, Johnson, Heather E, Ware, Michael W (2013) [ | Inland lakes, Ohio, USA | 13 Beaches | Rainfall 24hr, Rainfall 48 hr, Rainfall 72+ hr, Water temperature, Wave height, Solar radiation, Turbidity, Wind speed, Wind direction, Discharge/ flow, Conductivity, Bird count, Bather count, Water level, Day of year | Rainfall 48 hr, Rainfall 72+ hr, Water temperature, Wave height, Turbidity, Wind speed, Wind direction, Discharge/ flow, Bather count, Water level, Bird count, Day of year | Multiple Linear Regression | Temporal validation (new season) | Sensitivity = 0%- 62.5% | Three of nine site models had better accuracy, sensitivity, and specificity than persistence models, notably at two lakes with higher swimmer densities. |
| 17 | Francy, Donna S., Bertke, Erin E., Darner, Robert A. (2009) [ | Lake Erie, Huntington Reservation and Edgewater State Park, Ohio, USA | 2 Beaches | Rainfall 24hr, Rainfall 48 hr, Water temperature, Wave height, Solar radiation, Turbidity, Wave period, Water level, Days since last rain, Bather count, Day of year | Rainfall 24hr, Rainfall 48 hr, Wave height, Turbidity, Water level, Day of year | Multiple Linear Regression | Original dataset | Sensitivity = 57.1% and 31.7% | The predictive model at one beach outperformed persistence model, but the predictive model at the other beach did not. |
| 18 | Francy, Donna S., Darner, Robert A., Bertke, Erin E. (2006) [ | Lake Erie, Lorain, Huntington Reserve, Cleveland, Ohio, USA | 5 Beaches | Rainfall 24hr, Rainfall 48 hr, Rainfall 72+ hr, Water temperature, Wave height, Turbidity, Wind direction, Bird count, Water level, Day of year | Rainfall 24hr, Rainfall 48 hr, Rainfall 72+ hr, Water temperature, Wave height, Turbidity, Wind direction, Water level, Day of year | Multiple Linear Regression | Original dataset | Sensitivity: Threshold method = 59.1%-92.9%, Predicted [ | The best model made better predictions than persistence models and predictions were made available online. |
| 19 | Francy, Donna S., Darner, Robert A. (2003) [ | Lake Erie, and Mosquito Lake, Cleveland, Huntington Reservation, Ohio, USA | 4 Beaches | Rainfall 24hr, Rainfall 72+ hr, Water temperature, Wave height, Solar radiation, Turbidity, Wind speed, Wind direction, Discharge/ flow, Bird count, Day of year, Current direction | Rainfall 24hr, Rainfall 72+ hr, Wave height, Turbidity, Wind speed, Wind direction, Discharge/ flow, Bird count, Day of year, Current direction | Multiple Linear Regression | Original dataset | R2 = 0.32–0.41 | Prediction error too high to accurately predict |
| 20 | Frick, W.E (2006) | Lake Erie, Huntington Beach, Cleveland, Ohio, USA | 1 Beach | Unknown | Unknown | Multiple Linear Regression | Unknown | R2 and Mallows’ Cp used but values not reported | Tested the Virtual Beach program at a beach, showing the program can be helpful for creating predictive models. |
| 21 | Frick, Walter E., Ge, Zhongfu, Zepp, Richard G. (2008) [ | Lake Erie, Huntington Beach, Cleveland, Ohio, USA | 1 Beach | Rainfall 24hr, Rainfall 48 hr, Air Temperature, Water temperature, Wave height, Solar radiation, Turbidity, Wind speed, Wind direction, Cloud cover, Dew point, Precipitation potential, Rainfall intensity | Rainfall 24hr, Wave height, Turbidity, Wind speed, Wind direction, Cloud cover, Dew point, Rainfall intensity | Multiple Linear Regression | Original dataset | Adjusted R2 = 0.457–0.610 | Dynamic models built from small amounts of data are comparable to static models built with more data. |
| 22 | Hatfield, Nancy Lee Clark (2000) [ | Lake Erie and an artificial lake, Maumee Bay State Park, Ohio, USA | 2 Beaches | Air Temperature, Water temperature, Wave height, Turbidity, Wind speed, Wind direction, Bather count, Bird count, Current direction, Wave direction, Weather category, Days since last rain | Air temperature, Water temperature, Turbidity, Wind speed, Wind direction, Bird count, Bather count, Days since last rain | Multiple Linear Regression | Original dataset | No predictive model for inland lake found. | A reliable model was created for the Lake Erie beach but not for the artificial lake. |
| 23 | He, Cheng, Post, Yvonne, Dony, John, Edge, Tom, Patel, Mahesh, Rochfort, Quintin (2016) [ | Lake Ontario, Toronto, Ontario, Canada | 1 Beach | Rainfall <24 hr, Air Temperature, Water temperature, Wave height, Turbidity, Wind speed, Wind direction, Discharge/ flow, Bird count, Water level, Current speed, Current direction | Rainfall <24 hr, Turbidity, Wind speed, Wind direction, Discharge/ flow | Decision tree | Temporal validation (2 new seasons) | Accuracy = 76% and 78% | Model performed better than previously developed linear regression model and persistence model. |
| 24 | Heberger, Matthew G, Durant, John L, Oriel, Kimberly A, Kirshen, Paul H, Minardi, Lee (2008) [ | Mystic River watershed, Boston, Massachusetts, USA | 1 Sampling site | Rainfall <24 hr, Discharge/ flow, Time since last rainfall | Rainfall <24 hr, Discharge/ flow, Time since last rainfall | Multiple Linear Regression | Temporal validation (new season) | R2 (calibration) = 0.42 | Predictive models showed good agreement with models developed for other systems, and showed that models perform well in rivers. |
| 25 | Herrig, Ilona, Seis, Wolfgang, Fischer, Helmut, Regnery, Julia, Manz, Werner, Reifferscheid, Georg, Boeer, Simone (2019) [ | Rhine and Moselle rivers, Rhineland-Palatinate, Germany | 2 Beaches | Rainfall 24hr, Rainfall 72+ hr, Water temperature, Solar radiation, Turbidity, Discharge/ flow, Conductivity, pH, Chlorophyll a, Dissolved O2 | Rainfall 24hr, Solar radiation, Discharge/ flow | Bayesian modelling | Temporal validation (new season) | R2: Site1 = 0.73, Site2 = 0.55 | Whether microbial interactions in the river are driven by hydro-meteorological factors or trophic/biotic level factors plays an important role in modelling and outcomes variables. |
| 26 | Hong, Yi, Soulignac, Frederic, Roguet, Adelaide, Li, Chenlu, Lemaire, Bruno J, Martins, Rodolfo Scarati, Lucas, Francoise, Vincon-Leite, Brigitte (2021) [ | Lake Créteil, Créteil, Val-de-Marne, France | 3 Sampling sites | Rainfall >24 hr, Air Temperature, Water temperature, Sewer outflow [FIB], Solar radiation, Barometric pressure, Wind speed, Wind direction, Relative humidity, Discharge/ flow, Water level, Cloud cover | Rainfall >24 hr, Air temperature, Wind speed, Wind direction, Relative humidity, Cloud cover | Hydrodynamic modelling | Temporal validation (new season) | R2 = 0.89, NSE coefficient ≥ 0.7 for water flow simulations | Accurate predictions show promise for hydrodynamic modelling in stormwater systems and lakes. |
| 27 | Jones, Rachael M, Liu, Li, Dorevitch, Samuel (2013) [ | Lake Michigan, Chicago, Illinois, USA | 3 Beaches | Rainfall <24 hr, Rainfall 24hr, Rainfall 48 hr, Rainfall 72+ hr, Solar radiation, Water level, Time since last rain, Rain intensity | Rainfall 48 hr, Rainfall 72+ hr, Solar radiation, Time since last rain, Intensity of rainfall | Linear mixed effects model | Division of original dataset | E. coli: Sensitivity = 23% and 42% | Predictive models performed with good accuracy but low sensitivity. |
| 28 | Madani, M, Seth, R (2020) [ | Lake St. Clair, Windsor, Ontario, Canada | 1 Beach | Rainfall <24 hr, Rainfall 24hr, Rainfall 48 hr, Rainfall 72+ hr, Air Temperature, Water temperature, Wave height, Turbidity, Wind speed, Wind direction, Cloud cover, Weather category, Bird count | Rainfall <24 hr, Rainfall 24hr, Rainfall 48 hr, Rainfall 72+ hr, Air temperature, Water temperature, Wave height, Turbidity, Wind speed, Wind direction, Weather Category, Bird count | Multiple Linear Regression | Temporal validation (new season) | Sensitivity = 30%-78% | The predictive model outperformed persistence models. Models built using two, three, and four years of data, with the model built using two years being marginally better. |
| 29 | Maimone, Mark, Crockett, Christopher S, Cesanek, William E (2007) [ | Schuylkill River, Philadelphia, Pennsylvania, USA | 12 Sampling sites | Turbidity, Discharge/ flow, Time since last rainfall | Turbidity, Discharge/ flow, Time since last rainfall | Decision tree | Temporal validation (new season) | Accuracy = 66% | Early testing shows model can be accurate, and when it is inaccurate, it is overly cautious, more data will be added to algorithm when available. |
| 30 | Mälzer, H.-J., aus der Beek, T, Müller, S, Gebhardt, J (2016) [ | Lake Baldeney/ Ruhr River, North Rhine-Westphalia, Germany | 1 Beach | Rainfall <24 hr, Rainfall 24hr, Air Temperature, Water temperature, Solar radiation, Turbidity, Discharge/ flow, Conductivity, pH, Total and dissolved organic carbon, Spectral absorption coefficients at 254 and 436 nm, [NH4+], [NO2], [NO3-], ortho- and total-phosphate, dissolved O2, Days since last rainfall | Air temperature, Water temperature, Turbidity, pH, Spectral absorption at 254 and 436 nm, [NO3-], [NH4+], Dissolved O2, Days since last rainfall, some ANN variables not listed | Multiple Linear Regression, Artificial neural networks, Deterministic (hydrodynamic) models, Logistic models | Unknown | Sensitivity: Single regression = 53%-100%, MLR = 67%-100%, ANN = 89%-100%, Logistic = 80%-100% | ANN was the most accurate model, but accuracy varied across stretches of the river. |
| 31 | Marion, Jason W (2011) [ | Inland lakes, Ohio, USA | 7 Beaches | Water temperature, Turbidity, Carlson’s Trophic Index (calculated by Chlorophyll a, Total phosphorus, Secchi depth), Phycocyanin, Dissolved O2 | Total phosphorus, Carlson’s Trophic Index (calculated by Total phosphorus or mean of 3 index measures from Phosphorus, Secchi depth, and Chlorophyll a), Phycocyanin | Logistic regression | Original dataset | AUROC: TP = 0.7050, Phycocyanin = 0.6398, TSI-TP = 0.6875, TSI-mean = 0.7203 | Improved sensitivity is desired to reduce false negatives, but model can be useful for real-time estimates of fecal indicators. |
| 32 | Molina, M., Cyterski, Mike, Whelan, G., Zepp, R. (2014) | A Great Lake beach, USA | 1 Beach | Unknown | Unknown | Multiple Linear regression | Unknown | Sensitivity, specificity, R2 used but values not listed | Onsite data provided better predictive accuracy than publicly available data. |
| 33 | Motamarri, Srinivas, Boccelli, Dominic L (2012) [ | Charles River Basin, Massachusetts, USA | 1 Sampling site | Rainfall 24hr, Rainfall 48 hr, Previous day [FIB], Solar radiation, Discharge/ flow, Rainfall intensity, Time since last rainfall | Rainfall 48 hr, Rainfall 72+ hr, Discharge/ flow, Rainfall intensity, Time since last rainfall (rainfall of >0.25 inches and >0.5 inches) | Multiple Linear Regression, Artificial neural networks, Learning vector quantization (LVQ) | Backward elimination, LVQ used variance gained method and determinant gain method for 2 models. Top 5 variables chosen for all models. | Sensitivity: | ANN and LVQ performed similarly, with LVQ performing better with fewer variables included in the model. |
| 34 | Nevers, Meredith B, Shively, Dawn A, Kleinheinz, Gregory T, McDermott, Colleen M, Schuster, William, Chomeau, Vinni, Whitman, Richard L (2009) [ | Lake Michigan, Green Bay, Sturgeon Bay, Door County, Wisconsin, USA | 24 Beaches | Rainfall 48 hr, Air Temperature, Water temperature, Previous day [FIB], Barometric pressure, Wind speed, Wind direction, Bird count, Water level, Wave period, Algae accumulation | Rainfall 48 hr, Water temperature, Wave height, Previous day [FIB], Barometric pressure, Wind speed, Wind direction, Bird count, Water level, Algae accumulation | Tree regression/ random forest | Original dataset | R2 (adjusted R2) = 0.318 (0.315), 0.251 (0.247), 0.195 (0.184) | Models affected by generally low |
| 35 | Nevers, Meredith B, Whitman, Richard L (2005) [ | Lake Michigan, Indiana Dunes National Park, Indiana, USA | 5 Beaches | Rainfall <24 hr, Rainfall 24hr, Air Temperature, Water temperature, Solar radiation, Barometric pressure, Turbidity, Wind speed, Wind direction, Relative humidity, Discharge/ flow, Conductivity, pH, Chlorophyll a, Dew point, Wind gust, Dissolved O2, Water level, Wave period, Wave direction, Colour, Cloud cover, Current speed, Current direction | Rainfall <24 hr, Wave height, Turbidity, Chlorophyll a, Wave period | Multiple Linear Regression | Original dataset | R2: North wind = 0.6335, South wind = 0.320, North and south = 0.465 | Predictive models more accurate than persistence models. Variation better explained at beach level models. |
| 36 | Nevers, Meredith B, Whitman, Richard L, Frick, Walter E, Ge, Zhongfu (2007) [ | Lake Michigan, Indiana Dunes National Lakeshore, Indiana, USA | 2 Beaches | Rainfall 24hr, Air Temperature, Water temperature, Wave height, Turbidity, Wind speed, Discharge/ flow, Conductivity, pH, Chlorophyll a, Dissolved O2, Water level, Dew point, Cloud cover, Current direction, Current speed, Spectral absorption coefficients at 254 and 436 nm, Wave period | Wave height, Barometric pressure, Turbidity, Wind speed, Wind direction, Conductivity, Wave period | Multiple Linear Regression | Original dataset | R2 = 0.722 and 0.504 | Predictive models had less error than persistence models. Able to model beaches with multiple outfall sources. |
| 37 | Nevers, Meredith B., Whitman, Richard L. (2008) [ | Lake Michigan, Indiana, USA | 12 Beaches | Rainfall <24 hr, Air Temperature, Water temperature, Wave height, Previous day [FIB], Solar radiation, Barometric pressure, Turbidity, Wind speed, Wind direction, Discharge/ flow, Conductivity, pH, Chlorophyll a, Colour, Dissolved O2, Dew point, Current direction, Cloud cover | Rainfall <24 hr, Wave height, Turbidity, Wind direction, Discharge/ flow | Multiple Linear Regression | Whole lake model used a division of the original dataset, while beach specific models used whole original dataset | R2: All beaches = 0.48, Beach specific = 0.34–0.57 | Many beaches along the same coastline may be able to be modelled as if they were one beach. |
| 38 | Olyphant, G A (2005) [ | Lake Michigan, Indiana Dunes State Park, Indiana, and Lake County, Illinois, USA | 4 Beaches | Rainfall <24 hr, Rainfall 24hr, Air Temperature, Water temperature, Wave height, Solar radiation, Wind speed, Wind direction, Water level, Streamflow [FIB], Time sample collected | Rainfall 24hr, Air temperature, Water temperature, Wave height, Solar radiation, Wind speed, Wind direction, Water level, Streamflow [FIB], Time sample collected | Multiple Linear Regression (Ordinary and generalized-least squares) | Original dataset | R2 = 0.65–0.76 | Predictive models outperformed persistence models. Model was still 90% accurate even in extreme high or low cases. |
| 39 | Olyphant, Greg A, Whitman, Richard L (2004) [ | Lake Michigan, Chicago, Illinois, USA | 1 Beach | Rainfall <24 hr, Rainfall 24hr, Rainfall 48 hr, Air Temperature, Water temperature, Wave height, Solar radiation, Turbidity, Wind speed, Wind direction, Conductivity, pH, Water level, Wave frequency, Dissolved O2 | Rainfall 24hr, Water temperature, Solar radiation, Turbidity, Wind speed, Wind direction, Water level | Multiple Linear Regression | Original dataset | Sensitivity = 93% | Model was accurate at predicting exceedances. Large array of instrumentation tested directly on the beach. |
| 40 | Parkhurst, David F, Brenner, Kristen P, Dufour, Alfred P, Wymer, Larry J (2005) [ | Lake Michigan, Indiana, and Detroit River, Michigan, USA | 5 Beaches | Air Temperature, Water temperature, Wave height, Previous day [FIB], Wind speed, Wind direction, Sunny (Y/N), Bather count, Cloud cover, Current direction, Water level, Day of week, Time sample collected | Air Temperature, Water temperature, Wave height, Previous day [FIB], Wind speed, Wind direction, Sunny (Y/N), Bather count, Cloud cover, Current direction, Water level, Day of week, Time sample taken | Tree regression/ random forest | Temporal validation (new season) | [ | Tree regression a useful tool for exploratory analysis. Predictive model worked poorly at predicting the raw values of |
| 41 | Rossi, Alessandra, Wolde, Bernabas T., Lee, Lee H., Wu, Meiyin (2020) [ | Passaic and Pompton rivers, New Jersey, USA | 1 Sampling site | Rainfall <24 hr, Rainfall 24hr, Rainfall 48 hr, Rainfall 72+ hr, Air Temperature, Water temperature, Turbidity, Conductivity, pH, Chlorophyll a, Dissolved O2, [NO3], Dissolved organic carbon | Rainfall 72+ hr, Conductivity, pH | Logistic regression | Bootstrapping/ cross validation | R2 = 0.23 and 0.41 | Model shows a probabilistic measure of exceedance likelihood. Bagging technique improves reliability of model. |
| 42 | Safaie, Ammar, Wendzel, Aaron, Ge, Zhongfu, Nevers, Meredith B, Whitman, Richard L, Corsi, Steven R, Phanikumar, Mantha S (2016) [ | Lake Michigan, Indiana Dunes National Park, Indiana, USA | 3 Beaches | Water temperature, Solar radiation, Turbidity, Discharge/ flow, Conductivity, Current speed, Current direction | Water temperature, Solar radiation, Turbidity, Current speed, Current direction | Multiple Linear Regression and Hydrodynamic modelling | Original dataset | R2: Statistical model = 0.749 and 0.710, Mechanistic = 0.603 and 0.722 | The cooperative modeling approach, in which statistical and hydrodynamic models are each used to improve the building of the other, led to models with good predictive power that can generate real-time forecasts. |
| 43 | Seis, W, Zamzow, M, Caradot, N, Rouault, P (2018) [ | River Havel, Berlin, Germany | 1 Sampling site | Rainfall 24hr, Rainfall 48 hr, Rainfall 72+ hr, Discharge/ flow | Rainfall 24hr, Rainfall 48 hr, Rainfall 72+ hr, Discharge/ flow | Bayesian modelling | Temporal validation (2 new seasons) | Leave-one-out cross-validation information criterion = 177, 195, 191 and assessed graphically | A methodology for an early warning system, including probabilistic alert levels, was developed. The model provides solutions to the current alert system. |
| 44 | Shively, Dawn A, Nevers, Meredith B, Breitenbach, Cathy, Phanikumar, Mantha S, Przybyla-Kelly, Kasia, Spoljaric, Ashley M, Whitman, Richard L (2016) [ | Lake Michigan, Chicago, Illinois, USA | 9 Beaches | Rainfall <24 hr, Rainfall 24hr, Rainfall 48 hr, Air Temperature, Water temperature, Wave height, Solar radiation, Barometric pressure, Turbidity, Wind speed, Wind direction, Relative humidity, Wave period, Water level, Day of year | Air temperature, Wave height, Solar radiation, Barometric pressure, Turbidity, Wind direction, Wind speed, Day of year | Multiple Linear Regression | Temporal validation (new season) | Sensitivity = 0%-36% | Fully automated water quality system used for input into predictive model that outperformed the persistence model. Interannual model refinement improved performance. |
| 45 | Simmer, Reid A (2016) [ | F.W. Kent Park Lake, Oxford, Iowa, USA | 1 Beach | Rainfall <24 hr, Rainfall 24hr, Rainfall 48 hr, Rainfall 72+ hr, Air Temperature, Water temperature, Wave height, Solar radiation, Turbidity, Wind speed, Wind direction, Relative humidity, pH, Dissolved O2, Wave direction, Bird count, Bather count, Concentration of goose droppings, Algae presence, Day of year | Rainfall 72+ hr, Water temperature, Wind speed, Wind direction, pH, Dissolved O2, Wave direction, Bather count, Goose dropping concentration, Day of year | Multiple Linear Regression | Bootstrapping/ cross-validation | Sensitivity: 4yr = 60.00%, 2015 = 66.67% | Both predictive models created were more accurate than persistence models. |
| 46 | Telech, Justin W, Brenner, Kristen P, Haugland, Rich, Sams, Elizabeth, Dufour, Alfred P, Wymer, Larry, Wade, Timothy J (2009) [ | Lake Erie and Lake Michigan, Bay Village, Ohio, Indiana Dunes National Lakeshore and Michigan City, Indiana, and St. Joseph, Michigan | 4 Beaches | Rainfall 24hr, Rainfall 48 hr, Rainfall 72+ hr, Air Temperature, Water temperature, Wave height, Turbidity, Wind speed, Wind direction, pH, Cloud cover, Bather count, Bird count, Boat count, Time sample collected | Rainfall 24hr, Rainfall 48 hr, Water temperature, Wave height, Turbidity, Wind speed, Wind direction, pH, Cloud cover, Bather count, Bird count, Boat count, Time sample collected | Multiple Linear Regression | Original dataset | R2 with different | Both models did not perform well at predicting |
| 47 | Uejio, Christopher K, Peters, Theodore W, Patz, Jonathan A (2012) [ | Geneva Lake, Wisconsin, United States | 5 Beaches | Rainfall 24hr, Rainfall 72+ hr, Air Temperature, Wind speed, Wind direction, Discharge/ flow, Cloud cover, Days since last rainfall, Month, Sampling time | Rainfall 24hr, Rainfall 72+ hr, Wind speed, Wind direction, Discharge/ flow, Month, Sampling time, Cloud cover | Bayesian modelling | Original dataset | Sensitivity = 0%-54% | Predictive models at some of the beaches had good accuracy and could support decisions. |
| 48 | Wang, Leizhi, Zhu, Zhenduo, Sassoubre, Lauren, Yu, Guan, Liao, Chen, Hu, Qingfang, Wang, Yintang (2020) [ | Lake Erie, Erie County, New York, USA | 3 Beaches | Rainfall 24hr, Rainfall 48 hr, Rainfall 72+ hr, Air Temperature, Water temperature, Wave height, Barometric pressure, Turbidity, Wind speed, Wind direction, Discharge/ flow, Water level, Bird count, Algae category, Debris category, Fecal matter category, Odor (Y/N), Combined sewer overflow (Y/N), Day of year, Wave direction, Cloud cover, Current speed, Current direction | Rainfall 24hr, Rainfall 48 hr, Rainfall 72+ hr, Air Temperature, Water temperature, Wave height, Barometric pressure, Turbidity, Wind speed, Wind direction, Discharge/ flow, Water level, Bird count, Algae category, Debris category, Fecal matter category, Odor (Y/N), Combined sewer overflow (Y/N), Day of year, Wave direction, Cloud cover, Current speed, Current direction | Model stacking of these outputs: Multiple Linear Regression (including Partial least squares, sparse partial least squares), Bayesian modelling, Tree regression/ random forest | Bootstrapping/ cross-validation (leave one year out cross-validation) | Accuracy = 78%, 81%, and 82.3% | A model stacking approach improved robustness of prediction power, with random forest contributing the most weight in the model. |
| 49 | Wendzel, Aaron (2014) [ | Lake Michigan, Indiana Dunes National Lakeshore, Indiana, USA | 3 Beaches | Solar radiation, Conductivity, Current speed, Current direction, decay rate | Solar radiation, Conductivity, Current speed, Current direction, decay rate | Hydrodynamic modelling | Original dataset | RMSE = 0.600, 0.647, and 0.809 | Model could accurately simulate FIB concentrations at beaches using unstructured grids. |
| 50 | Whitman, R L, Nevers, M B (2008) [ | Lake Michigan, Chicago, Illinois, USA | 23 Beaches | Air Temperature, Water temperature, Wave height, Solar radiation, Barometric pressure, Wind speed, Wind direction, Day of year | Wave height, Barometric pressure, Day of year | Multiple Linear Regression | Original dataset | Adjusted R2 = 0.20–0.41 | Beaches geographically close to each other had correlated |
| 51 | Zhang, Juan, Qiu, Han, Li, Xiaoyu, Niu, Jie, Nevers, Meredith B, Hu, Xiaonong, Phanikumar, Mantha S (2018) [ | Lake Michigan, Indiana Dunes National Park, Indiana, USA | 3 Beaches | Rainfall 24hr, Water temperature, Wave height, Previous day [FIB], Solar radiation, Turbidity, Wind speed, Wind direction, Discharge/ flow, Conductivity, Past [FIB] beyond one day | Rainfall 24hr, Water temperature, Wave height, Turbidity, Wind speed, Discharge/ flow, Past [FIB] beyond one day | Artificial neural networks: Nonlinear input-output (NIO), nonlinear autoregressive neural network (NAR), nonlinear autoregressive network with exogenous inputs (NARX), NAR + discrete wavelet transform (WA-NAR) | Division of original dataset | Sensitivity: NIO = 0, 1, 1 | NARX performed the best; WA-NAR ranked second but required no explanatory variables. All models were comparable to or outperformed other predictive models previously built for these beaches. |
| 52 | Zimmerman, Tammy M (2008) [ | Presque Isle Beach 2, City of Erie, Pennsylvania, USA | 1 Beach | Rainfall 24hr, Rainfall 48 hr, Rainfall 72+ hr, Water temperature, Wave height, Discharge/ flow, Turbidity, Wind speed, Wind direction, Conductivity, pH, Dissolved O2, Bird count, Current speed and direction | Wave height, Turbidity, Bird count | Multiple Linear Regression | Temporal validation (new season) | Sensitivity = 50.0% | Predictive models outperformed persistence models, notably in the models using the previous two seasons only. |
| 53 | Zimmerman, Tammy M (2006) [ | Lake Erie, City of Erie, Pennsylvania, USA | 1 Beach | Rainfall 72+ hr, Water temperature, Wave height, Turbidity, Discharge/ flow, Conductivity, pH, Bird count, Debris category, Boat count, Dissolved O2, Current direction, Current speed | Rainfall 72+ hr, Wave height, Turbidity, Wind direction | Multiple Linear Regression | Original dataset | R2: 2004 = 0.54, 2005 = 0.71, Both = 0.64 | Predictive models were able to predict non-exceedances well, but performed worse at predicting exceedances. |
a If different lengths of time were used at different locations, the highest number of seasons is presented. Only seasons used in model building were included; entire seasons used for model validation are not included in this count.
b Where available, statistics from model validation are reported in preference to statistics from calibration data.
* Conference proceeding, only an abstract was available.
Definitions: Area under the curve (AUC), Area under the receiver operator curve (AUROC), Root mean squared error (RMSE), Percent bias (PBIAS), Least absolute shrinkage and selection operator (LASSO), Nash-Sutcliffe efficiency (NSE), Fourier transform (Fn).
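Three of the continuous performance measures defined above (RMSE, PBIAS, and NSE) can be written out in a few lines. The following is a minimal sketch in Python, not the implementation used by any of the reviewed studies. The function names are illustrative, and the sign convention for PBIAS (observed minus predicted, so that positive values indicate underestimation) is an assumption, since conventions differ between authors.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def pbias(observed, predicted):
    """Percent bias.

    Assumed sign convention: 100 * sum(observed - predicted) / sum(observed),
    so positive values mean the model underestimates on average.
    """
    return 100.0 * sum(o - p for o, p in zip(observed, predicted)) / sum(observed)


def nse(observed, predicted):
    """Nash-Sutcliffe efficiency.

    1.0 is a perfect fit; 0.0 means the model is no better than predicting
    the mean of the observations; negative values are worse than the mean.
    """
    mean_obs = sum(observed) / len(observed)
    residual = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    variance = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - residual / variance
```

In the reviewed studies these statistics are typically computed on log-transformed FIB concentrations, so values are comparable across studies only when the same transformation and units were used.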
Fig 2. Frequency of the number of swimming seasons used in building models.
Fig 3. Frequency of publication types.
Fig 4. Frequency of the location of beaches.
Fig 5. Frequency of the number of beaches, or sampling sites if beaches not provided.
Modelling techniques used to create the predictive models in the 53 relevant studies.
| Model characteristics | Number of studies | % of total studies |
|---|---|---|
| Modelling technique | ||
| Multiple linear regression | 37 | 70% |
| Tree regression and/or random forests | 6 | 11% |
| Logistic regression | 5 | 9% |
| Bayesian networks | 5 | 9% |
| Deterministic/ hydrodynamic modelling | 4 | 8% |
| Artificial neural networks | 3 | 6% |
| Univariate regression | 3 | 6% |
| Decision tree | 3 | 6% |
Fig 6. Frequency of environmental variables explored in studies and frequency of variables included in final models.
Risk of bias checklist summary for 53 relevant studies.
| Study design and criteria | Number of studies | % of studies |
|---|---|---|
| Source of predictor data | ||
| Government data | 37 | 70% |
| Collected by researchers | 28 | 53% |
| Collected by other researchers | 8 | 15% |
| Conservation Authorities | 3 | 6% |
| Other | 1 | 2% |
| Not clear | 6 | 11% |
| Method for selecting predictors for multivariate modeling | ||
| All included | 27 | 51% |
| Virtual Beach | 7 | 13% |
| Preselected based on significant association with FIB | 6 | 11% |
| Only univariate modeling performed | 3 | 6% |
| Preselected based on | 2 | 4% |
| Other | 4 | 8% |
| Not clear | 5 | 9% |
| Predictor selection method for inclusion in final model | ||
| All possible variable combinations created, and final model chosen by model fit characteristics (e.g., R2, RMSE) | 13 | 25% |
| Full model approach | 10 | 19% |
| Akaike’s Information Criterion | 7 | 13% |
| Virtual Beach | 7 | 13% |
| Backward selection | 4 | 8% |
| Forward selection | 3 | 6% |
| Univariate model | 3 | 6% |
| Bayesian Information Criterion | 1 | 2% |
| Other | 7 | 13% |
| Not clear | 5 | 9% |
| Model performance measure | ||
| R2 or adjusted R2 | 32 | 60% |
| Sensitivity | 26 | 49% |
| Specificity | 25 | 47% |
| Accuracy | 19 | 36% |
| Root mean squared error | 8 | 15% |
| Area under the curve or area under the receiver operator curve | 4 | 8% |
| False negative or positive rate | 3 | 6% |
| Fourier transform | 2 | 4% |
| Percent bias | 2 | 4% |
| Nash-Sutcliffe efficiency | 2 | 4% |
| Mallows' Cp | 2 | 4% |
| Other | 15 | 28% |
| Model validation method | ||
| Fitting to original dataset | 21 | 40% |
| Temporal validation (new seasons) | 20 | 38% |
| Bootstrapping/ cross-validation | 6 | 11% |
| Division of original dataset | 3 | 6% |
| Geographical validation (new beaches/ sites) | 1 | 2% |
| Not clear | 2 | 4% |
| Were predictor weights or regression coefficients shrunk at all? | ||
| Yes | 1 | 2% |
| No | 52 | 98% |
| Are modeling assumptions satisfied? | ||
| Yes | 12 | 23% |
| No | 1 | 2% |
| Not clear | 40 | 75% |
| Handling of predictors in modelling | ||
| Categorized | 20 | 38% |
| Log-transformed | 18 | 34% |
| Weighted days | 11 | 21% |
| Square roots | 4 | 8% |
| Other transformations | 5 | 9% |
| Handling of missing data | ||
| Left as missing | 9 | 17% |
| Remove predictors with missing data | 3 | 6% |
| Data replaced with data from nearby sensor or sample collection | 3 | 6% |
| Remove days with missing predictor data | 1 | 2% |
| Autocorrelation and partial autocorrelation | 1 | 2% |
| Not clear | 37 | 70% |
| Were predictor distributions compared between calibration and validation datasets? | ||
| Yes | 3 | 6% |
| No | 50 | 94% |
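
Sensitivity, specificity, and accuracy, the classification measures counted in the table above, all derive from comparing predicted and observed exceedances of a regulatory threshold. The sketch below shows one way to compute them in Python. The function names are hypothetical, and the threshold is left as a parameter because the reviewed studies apply different regulatory limits.

```python
def classify_exceedance(concentrations, threshold):
    """Return True for each sample whose FIB concentration exceeds the threshold."""
    return [c > threshold for c in concentrations]


def classification_metrics(observed_exceed, predicted_exceed):
    """Sensitivity, specificity, and accuracy from paired boolean outcomes."""
    pairs = list(zip(observed_exceed, predicted_exceed))
    tp = sum(1 for o, p in pairs if o and p)          # exceedance correctly predicted
    tn = sum(1 for o, p in pairs if not o and not p)  # non-exceedance correctly predicted
    fp = sum(1 for o, p in pairs if not o and p)      # false alarm
    fn = sum(1 for o, p in pairs if o and not p)      # missed exceedance

    # Guard against division by zero when a class is absent from the data.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    accuracy = (tp + tn) / len(pairs) if pairs else float("nan")
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
    }
```

Because exceedances are usually rare at recreational beaches, a model can reach high accuracy while missing most exceedances, which is why sensitivity is worth reporting alongside accuracy.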