Literature DB >> 34437625

Systematic review of predictive models of microbial water quality at freshwater recreational beaches.

Cole Heasley1, J Johanna Sanchez1, Jordan Tustin1, Ian Young1.   

Abstract

Monitoring of fecal indicator bacteria at recreational waters is an important public health measure to minimize water-borne disease, however traditional culture methods for quantifying bacteria can take 18-24 hours to obtain a result. To support real-time notifications of water quality, models using environmental variables have been created to predict indicator bacteria levels on the day of sampling. We conducted a systematic review of predictive models of fecal indicator bacteria at freshwater recreational sites in temperate climates to identify and describe the existing approaches, trends, and their performance to inform beach water management policies. We conducted a comprehensive search strategy, including five databases and grey literature, screened abstracts for relevance, and extracted data using structured forms. Data were descriptively summarized. A total of 53 relevant studies were identified. Most studies (n = 44, 83%) were conducted in the United States and evaluated water quality using E. coli as fecal indicator bacteria (n = 46, 87%). Studies were primarily conducted in lakes (n = 40, 75%) compared to rivers (n = 13, 25%). The most commonly reported predictive model-building method was multiple linear regression (n = 37, 70%). Frequently used predictors in best-fitting models included rainfall (n = 39, 74%), turbidity (n = 31, 58%), wave height (n = 24, 45%), and wind speed and direction (n = 25, 47%, and n = 23, 43%, respectively). Of the 19 (36%) studies that measured accuracy, predictive models averaged an 81.0% accuracy, and all but one were more accurate than traditional methods. Limitations identifed by risk-of-bias assessment included not validating models (n = 21, 40%), limited reporting of whether modelling assumptions were met (n = 40, 75%), and lack of reporting on handling of missing data (n = 37, 70%). Additional research is warranted on the utility and accuracy of more advanced predictive modelling methods, such as Bayesian networks and artificial neural networks, which were investigated in comparatively fewer studies and creating risk of bias tools for non-medical predictive modelling.

Entities:  

Mesh:

Year:  2021        PMID: 34437625      PMCID: PMC8389397          DOI: 10.1371/journal.pone.0256785

Source DB:  PubMed          Journal:  PLoS One        ISSN: 1932-6203            Impact factor:   3.240


Introduction

Between 2000 and 2014, 140 outbreaks were reported in 35 states and a territory in the United States (U.S.) in untreated recreational water sources, leading to 4958 cases of waterborne disease, with 84% of the outbreaks associated with a lake, pond, or reservoir [1]. However, when accounting for non-outbreak linked cases, underreporting, and missing state data, the estimate for total water-borne illness from recreational surface waters in the U.S. is around 90 million cases annually, costing $2.2-$3.7 billion USD in healthcare services [2]. Routine monitoring for water-borne pathogens is infeasible at recreational beaches, therefore, fecal indicator bacteria (FIB) are sampled as a marker of potential pathogen concentrations and risk of infection to bathers. There are many pathogens that are spread via recreational water use that can cause recreational water illness, including enteric viruses (e.g. norovirus, adenovirus) and bacterial and protozoal pathogens (e.g. Campylobacter, Salmonella, Cryptosporidium) [3, 4]. E. coli is often used as the indicator for the presence of these pathogens in freshwater beaches [5]. Enterococcus is occasionally used as an indicator in addition to or in place of E. coli, most commonly in marine waters [6-8]. E. coli is often a preferred indicator in freshwater sources due to its strong association with the risk of gastrointestinal illness in bathers [5, 9]. Decisions on whether to close or post beaches as potentially unsafe for swimming due to water quality concerns are conducted by public health officials or other beach managers. Traditionally, these decisions are based on evaluating whether FIB levels in beach waters exceed health-action threshold values. This approach has been termed the “persistence model” of beach management, because it typically relies on culture-based laboratory assessments of FIB counts which require 18–24 hours to obtain a result, leading beach managers to make water quality decisions using the previous day’s measurements. More modern genetic techniques, such as qPCR, can achieve results in 3–4 hours, but are costly for beach management and laboratories to run daily [10]. Some beach managers have moved to forecasting FIB levels using predictive models. These models typically use environmental inputs such as temperature, precipitation, and turbidity to predict FIB levels at beaches on a given day, which can then be validated and assessed with the subsequent FIB lab results [11, 12]. A wide variety of predictive modelling methods have been used at recreational beaches; including multiple linear regression [13, 14], artificial neural networks [15], and Bayesian networks [16]. These models use local weather and environmental data, collected from various sources, that are associated with FIB concentrations in the water [6, 17]. Given the variety of predictive modelling approaches and applications published to-date, there is a need to identify and describe existing approaches, trends, and their accuracy to inform beach water management policies. The purpose of this systematic review was to identify and summarize modelling methods used, where they have been applied, and their performance in correctly predicting beach water quality to support management decisions (e.g., posting a beach as unsuitable for swimming due to poor water quality). The review was conducted as part of a larger study to examine environmental influences on freshwater beach quality in Canada. Therefore, we have focused the scope on models developed for freshwater, recreational sites located in a temperate climate. To our knowledge, no systematic review exists on predictive models of fecal indicator bacteria at freshwater recreational sites in temperate climates.

Methods

Review question and eligibility criteria

The protocol for this review was created in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) Protocol 2015 checklist [18]. The remainder of this review was written using the PRISMA 2020 statement [19]; a PRISMA checklist is located in S1 Table. A review protocol was developed a priori following Cochrane Collaboration review guidelines (see S1 Protocol) [20]. However, the protocol was not registered with any databases. The research questions were: 1) what types of predictive models were created for predicting FIB concentrations based on environmental variables for freshwater beach management decisions? 2) which predictors were included in these models? 3) how accurate are the models in determining if recreational water quality exceeds guideline recommendations? Our eligibility criteria followed the PECO approach: Population, Exposure, Comparison, and Outcome. Our population of interest included freshwater beaches in temperate climates that are used for recreational purposes. Therefore, we excluded models focusing on coastal and estuarial waters, and waters not used for recreation (drinking water sources). Our exposure of interest included environmental data that can be collected in real time to support beach water monitoring, such as weather parameters and water conditions. We included models that compared accuracy to their original dataset, to persistent models, and that used other validation methods (e.g., bootstrapping). Our outcome of interest was FIB levels. Models predicting algal blooms were excluded. We included publications reporting on the development and/or evaluation of predictive models, reported in journal articles, conference proceedings, thesis and dissertations, and government reports. Reviews and commentary articles were excluded.

Search strategy

We designed a comprehensive search strategy in collaboration with a research librarian. The following databases were used to search for relevant articles: Medline via OVID, SciTech Premium, Scopus, Web of Science, and ProQuest Dissertations and Thesis Global. The search terms used in each database are provided in S2 Table. As an example of the search terms used, the search in Scopus was: (Escherichia coli OR enterococc* OR fecal indicator bacteria) AND (regression analysis OR predict* OR nowcast* OR forecast* OR model*) AND ("fresh water" OR recreational water OR beach* OR lake OR river) AND (weather OR monitor* OR rain* OR environmental). All articles published until the search date, December 15, 2020, were included with no publication date restrictions. A grey literature search was also conducted and involved searching nine targeted government websites from December 10–14, 2020. A list of websites searched is available in the S3 Table. To ensure all relevant publications were captured, reference lists of relevant articles were hand-searched for additional potentially relevant articles.

Relevance screening

Citations identified by the searches were stored in a Mendeley database (Elsevier, Amsterdam, Netherlands), deduplicated, and then uploaded into DistillerSR (Evidence Partners, Ottawa, Canada). All articles were independently screened twice by CH and JS in two levels of screening: title and abstract screening (Level 1) and full article screening (Level 2). Level 1 screening involved the question: Is this reference potentially relevant to our review? (Yes/No). Level 2 involved three screening questions: Is this article about microbial water quality? (e.g., measuring E. coli, Enterococcus). Is this article about freshwater, recreational beaches in a temperate region? Does this article report on a predictive model for beach water quality using environmental data? (Yes/No for all). Beaches were defined as any site intended for primary water contact activities (e.g., swimming, wading, water sports) to capture all recreational water sites. All screening forms were created prior to any screening and pre-tested by two reviewers screening 50 articles and discussing discrepancies. Pre-testing of Level 1 screening resulted in a kappa score of 0.76, after which the reviewers discussed their conflicts and agreed to proceed with independent reviewing after improving clarity on how to apply the eligibility criteria. Questions for level 2 were discussed prior to screening and tested on five articles by both reviewers to ensure consistent interpretation and clarity of the questions.

Data characterization and extraction

Articles passing the screening process were obtained as full-texts and data were extracted using a pre-specified and pre-tested form. Data were extracted by CH into a form in DistillerSR, which can be found in S5 Table. The form included 20 questions that collected information such as location details of beaches, length of study, type of predictive model, variables explored in making the model, performance metrics of the model, and risk-of-bias. Data extraction results were independently validated by JS.

Risk-of-bias assessment and data analysis

Risk of bias of each relevant article was assessed using the CHecklist for critical Appraisal and data extraction for systematic Reviews of prediction Modelling Studies (CHARMS) [21]. We adapted the checklist from human health predictive models to environmental modelling. We considered “participants” to be beach days, and questions relating to human health were removed (e.g., details of treatments, blinding outcomes). Of 21 CHARMS questions, 10 were included in the data extraction form. Questions included sources of data, blinding predictors from outcomes, number and handling of missing data, predictor selection method, predictor transformations, and model validation methods and performance measures. Due to a priori knowledge that many studies collect data from government sources, predictor measurement methods were not included. CHARMS does not score studies based on bias, therefore, we did not determine an overall risk-of-bias score or rating for each study. Data from DistillerSR were downloaded in Excel (Microsoft, Redmond, United States) for analysis, which consisted of descriptive summary tabulations. Data visualizations were also created in Excel. While we report on performance metrics, we do not draw conclusions on validity nor compare models to each other. Meta-analysis was not deemed appropriate for this review given that predictive modelling approaches and performance metrics varied widely across studies.

Results

Of 1710 unique citations identified in the search, 53 relevant studies were identified and included in the review (Fig 1). A descriptive summary of the model types, variables, and performances from each relevant study is presented in Table 1.
Fig 1

PRISMA flow diagram of article selection.

Table 1

Summary characteristics of models extracted from 53 relevant articles that created predictive models of FIB using environmental variables.

Authors and year of publicationLocation of recreational watersNumber of beaches and swimming seasons aPredictors explored in studyPredictors in at least one final model in studyType of modelModel validationPerformance metrics bRecommendations/conclusions of study
1Anderson, Kendall W (2019) [22]Lake Michigan, Chicago, Illinois, USA19 Beaches 4 SeasonsRainfall <24 hr, TurbidityRainfall <24 hr, TurbidityDecision treeOriginal datasetDry and wet conditionsSensitivity = 73.54% and 75.83%Specificity = 56.29% and 55.29%AUC = 69.10%, and 69.30%Decision tree can reduce the need for qPCR testing in some conditions and issue early advisories.
2Avila, Rodelyn, Horn, Beverley, Moriarty, Elaine, Hodson, Roger, Moltchanova, Elena (2018) [23]Oreti river, Wallacetown, New Zealand1 Sampling site9 SeasonsRainfall 24hr, Rainfall 48 hr, Previous day [FIB], Discharge/ flowRainfall 24hr, Rainfall 48 hr, Previous day [FIB], Discharge/ flowMultiple Linear Regression (MLR), Bayesian modelling, Tree regression/ random forest, Markov chain, Log-linear regression, Logistic regression, Discriminant analysis, Classification treeBootstrapping/ cross- validation (Leave one out and k-fold cross-validation)Sensitivity: Dynamic regression = 62%, MLR = 62.5%, regression tree = 68%, random forest = 80%, Bayesian network = 95%, classification tree = 68%, linear discriminant analysis = 74%, Markov chain = 0%, logistic regression = 74%, quadratic discriminant analysis = 86%, random forest classification = 71%Specificity: All above 85%Error rate: dynamic MLR = 0.18, MLR = 0.21, regression tree = 0.22, random forest = 0.17, Bayesian network = 0.21, classification tree = 0.22, linear discriminant analysis = 0.19, Markov chain = 0.24, logistic regression = 0.19, Quadratic discriminant analysis = 0.26, random forest classification = 0.22Bayesian Networks shown to be most useful tool for prediction.
3Bachmann-Machnik, Anna, Dittmer, Ulrich, Schoenfeld, Annika (2019) [24]Lake Baldeney/ Ruhr River, North-Rhine-Westphalia, Germany1 Beach3 SeasonsRainfall 24hr, Sewer outflow [FIB], Discharge/ flow, Combined sewer overflow durationNo final model presentedUnivariate regressionOriginal datasetAll R2 < 0.3, none presentedOverflow events at one combined sewer outflow do not necessarily result in exceedances.
4Brady, Amie M G, Bushon, Rebecca N, Plona, Meg B (2009) [25]Cuyahoga River, Cuyahoga Valley National Park, Ohio, USA4 Sampling sites4 SeasonsRainfall 24hr, Turbidity, Discharge/ flowRainfall 24hr, Turbidity, Discharge/ flowMultiple Linear Regression, Univariate regressionTemporal validation (new season)Geographical validation (sites3&4 modelled with site1 model)Sensitivity: Site1 = 94%, Site2 = 100%, Sites3&4 = 73%-91%Specificity: Site1 = 64%, Site2 = 0%, Sites3&4 = 22%-96%Accuracy: Site1 = 81%, Site2 = 90%, Sites3&4 = 68%-91%One predictive model outperformed persistence model, the other did not. Two models generalized from the first model at other locations did not outperform persistence models.
5Brady, Amie M G, Plona, Meg B (2009) [26]Cuyahoga River, Cuyahoga Valley National Park, Ohio, USA4 Sampling sites4 SeasonsRainfall 24hr, Turbidity, Water levelTurbidityUnivariate regressionTemporal validation (new season)Accuracy = 77%False negative rate = 21%Turbidity model predicted E. coli at all sites, worth posting the predicted concentrations online.
6Brady, Amie M.G., Plona, Meg B. (2015) [27]Cuyahoga River, Ohio, USA2 Sampling sites6 SeasonsRainfall 24hr, Rainfall 48 hr, Water temperature, Turbidity, Discharge/ flowRainfall 48 hr, TurbidityMultiple Linear regressionTemporal validation (new season)Models for most recent year.Sensitivity = 100.0%Specificity = 55.6%Accuracy = 82.2%R2 = 0.77 and 0.75RMSE = 0.3107 and 0.3197Automatic predictions implemented, recommend continuing nowcast system and further studies along river.
7Brady, Amie MG, Plona, Meg B (2012) [28]Cuyahoga River, Cuyahoga Valley National Park, Ohio, USA3 Sampling sites7 SeasonsRainfall 24hr, Rainfall 48 hr, Turbidity, Discharge/ flow, Water levelRainfall 48 hr, Turbidity, Discharge/ flow, Water levelMultiple Linear RegressionTemporal validation (new season)Sensitivity = 88% and 100%Specificity = 33% and 29%Adjusted R2 = 0.69 and 0.76RMSE = 0.34405 and 0.27396Predictive models performed better than persistence models in first two years, but not last year, possibly due to excess precipitation.
8Brooks, Wesley R., Fienen, Michael N., Corsi, Steven R. (2013) [29]Lake Erie and Lake Michigan, Cleveland and Toledo, Ohio, Port Washington, Wisconsin, USA4 Beaches4 SeasonsRainfall 24hr, Rainfall 48 hr, Air Temperature, Water temperature, Wave height, Solar radiation, Barometric pressure, Turbidity, Wind speed, Wind direction, Relative humidity, Discharge/ flow, Algae index, Bird count, Lake level, Month, Day of year, Sub-seasonRainfall 24hr, Rainfall 48 hr, Air temperature, Water temperature, Wave height, Solar radiation, Barometric pressure, Turbidity, Wind speed, Wind direction, Relative humidity, Discharge/ flow, Algae index, Bird count, Lake level, Month, Day of yearMultiple Linear Regression (partial least squares)Temporal validation (new season)Sensitivity and specificity used but values not listed.Partial least squares automates the model building process and compares favorably to the other regression models.
9Brooks, Wesley, Corsi, Steven, Fienen, Michael, Carvin, Rebecca (2016) [30]Chequamegon Bay, Lake Superior and Lake Michigan, Manitowoc County, Wisconsin, USA7 Beaches4 SeasonsRainfall <24 hr, Rainfall 24hr, Rainfall 48 hr, Rainfall 72+ hr, Air Temperature, Water temperature, Wave height, Turbidity, Wind speed, Wind direction, Discharge/ flow, Conductivity, Current speed, Current direction, Wave direction, Cloud cover, Bird count, Bather count, Algae presence, Day of yearRainfall <24 hr, Rainfall 24hr, Rainfall 48 hr, Rainfall 72+ hr, Air Temperature, Water temperature, Wave height, Turbidity, Wind speed, Wind direction, Discharge/ flow, Conductivity, Current speed, Current direction, Wave direction, Cloud cover, Bird count, Bather count, Algae accumulation, Day of yearMultiple Linear Regression (partial least squares and sparse partial least squares), Tree regression/ random forest, Logistic regression, Adaptive LASSO, Gradient boostingBootstrapping/ cross validationAUC (AUROC): Gradient boosting cross-validation tree estimate = 0.76, Gradient boosting out-of-bag tree estimate = 0.75, MLR Adaptive LASSO = 0.73 and 0.72, Logistic regression adaptive LASSO = 0.68, 0.65, 0.63, and 0.62, Sparse partial least squares = 0.70 and 0.70, Partial Least Squares = 0.66, MLR genetic algorithm = 0.65, Logistic regression genetic algorithm = 0.60 and 0.58Of 14 regression methods, a random forest model was the most accurate.
10Corsi, Steven R, Borchardt, Mark A, Carvin, Rebecca B, Burch, Tucker R, Spencer, Susan K, Lutz, Michelle A, McDermott, Colleen M, Busse, Kimberly M, Kleinheinz, Gregory T, Feng, Xiaoping, Zhu, Jun (2016) [31]Lake Michigan, Wisconsin, USA3 Beaches1 SeasonRainfall >24 hr, Rainfall 24hr, Rainfall 48 hr, Rainfall 72+ hr, Air Temperature, Water temperature, Wave height, Turbidity, Discharge/ flow, Conductivity, Water level, Current speed, Current direction, Cloud cover, Algae abundanceNANANAFecal indicator bacteria model not presentedNA
11Cyterski, M, Zhang, S, White, E, Molina, M, Wolfe, K, Parmar, R, Zepp, R (2012) [32]Lake Michigan, Milwaukee, Wisconsin, USA1 Beach1 SeasonAir Temperature, Water temperature, Turbidity, Wind speed, Wind direction, Relative humidity, Conductivity, pH, Water level, Chloride, [NH4+], [NO3-]Air Temperature, Water temperature, Turbidity, Wind speed, Wind direction, Relative humidity, Conductivity, pH, [NH4+], [NO3-], Water levelMultiple Linear RegressionBootstrapping/ cross validationMean square error of prediction = 1.85, 3.67, and 2.71Temporal synchronization analysis of environmental predictors improved the predictive regression models.
12Dada, Ayokunle Christopher, Hamilton, David P (2016) [33]Lake Rotorua, North Island, New Zealand3 Beaches12 SeasonsRainfall 72+ hr, Barometric pressure, Wind speed, Wind direction, Discharge/ flow, Total nitrogen, Total phosphorus, Distance from lake exit, Suspended solids, Particulate inorganic phosphorusRainfall 72+ hr, Wind speed, Distance from Lake exit, Particulate inorganic phosphorusMultiple Linear RegressionTemporal validation (2 new seasons)Sensitivity = 0%-50%Specificity = 96.67%-100%Adjusted R2 = 0.73Accuracy = 96.67%-100%RMSE = 0.23–0.64Models worked well, could be used for guiding swimming advisories.
13Francy, Donna S., Gifford, Amie M., and Darner, Robert A. (2003) [34]Lake Erie and Mosquito Lake, Cleveland, Huntington Reservation, Lake County, and Mosquito Lake State Park, Ohio, USA6 Beaches3 SeasonsRainfall 24hr, Rainfall 48 hr, Rainfall 72+ hr, Water temperature, Wave height, Previous day [FIB], Solar radiation, Turbidity, Wind speed, Wind direction, Discharge/ flow, Conductivity, Bird count, Day of year, Water level, Current direction, Days since last rainfallRainfall 24hr, Rainfall 72+ hr, Wave height, Previous day [FIB], Turbidity, Wind direction, Discharge/ flow, Bird count, Current direction, Day of year, Days since last rainfallMultiple Linear RegressionOriginal DatasetR2 = 0.17–0.58Accuracy = 71.2%-90.9%False positive rate = 4.0%-15.1%False negative rate = 3.9%-14.6%Models were beach specific, future research could test created models in future years, and test whether adding subsequent years’ data improves the models.
14Francy, D.S., Brady, A.M.G., Carvin, R.B., Corsi, S.R., Fuller, L.M., Harrison, J.H., Hayhurst, B.A., Lant, J., Nevers, M.B., Terrio, P.J., Zimmerman, T.M. (2013) [35]Lake Michigan, Lake Erie, Lake Ontario, and Lake Superior, Illinois, Indiana, Michigan, New York, Ohio, Pennsylvania, and Wisconsin, USA49 Beaches2 SeasonsRainfall 24hr, Rainfall 48 hr, Rainfall 72+ hr, Air Temperature, Water temperature, Wave height, Solar radiation, Barometric pressure, Turbidity, Wind speed, Wind direction, Relative humidity, Discharge/ flow, Conductivity, pH, Chlorophyll a, Day of year, Bird count, Debris assessment, Dissolved O2, Wave period, Sub-season, Current direction, Current speed, Cloud cover, Water level, Algae category, Bather count, Weather categoryRainfall 24hr, Rainfall 48 hr, Rainfall 72+ hr, Air temperature, Water temperature, Wave height, Solar radiation, Barometric pressure, Turbidity, Wind speed, Wind direction, Relative humidity, Discharge/ flow, Conductivity, Chlorophyll a, Day of year, Bird count, Debris assessment, Current direction, Cloud cover, Water level, Sub-season, Algae category, Bather count, Weather categoryMultiple Linear RegressionTemporal validation (new season)Sensitivity = 0%-100%Specificity = 40.4%-100%Accuracy = 55.8%-98.4%24 of the 42 models performed at least 5% more accurately than persistence models.
15Francy, Donna S, Darner, Robert A (2007) [36]Lake Erie, Cleveland and Huntington Reservation, Ohio, USA3 Beaches7 SeasonsRainfall 24hr, Rainfall 48 hr, Water temperature, Wave height, Water level, Number of wet and dry days, Day of yearRainfall 24hr, Rainfall 48 hr, Water temperature, Wave height, Day of yearMultiple Linear RegressionTemporal validation (new season)Sensitivity = 53.3%, 50%, 32.6%Specificity = 87.6%, 94.6%, 94.6%R2 = 0.43, 0.42, 0.32Additional data was added to refine models. An online nowcast system implemented.
16Francy, Donna S, Stelzer, Erin A, Duris, Joseph W, Brady, Amie M G, Harrison, John H, Johnson, Heather E, Ware, Michael W (2013) [37]Inland lakes, Ohio, USA13 Beaches2 SeasonsRainfall 24hr, Rainfall 48 hr, Rainfall 72+ hr, Water temperature, Wave height, Solar radiation, Turbidity, Wind speed, Wind direction, Discharge/ flow, Conductivity, Bird count, Bather count, Water level, Day of yearRainfall 48 hr, Rainfall 72+ hr, Water temperature, Wave height, Turbidity, Wind speed, Wind direction, Discharge/ flow, Bather count, Water level, Bird count, Day of yearMultiple Linear RegressionTemporal validation (new season)Sensitivity = 0%- 62.5%Specificity = 65.2%-97.8%Accuracy = 65.4%-91.8%Three of nine site models had better accuracy, sensitivity, and specificity than persistence models, notably at two lakes with higher swimmer densities.
17Francy, Donna S., Bertke, Erin E., Darner, Robert A. (2009) [38]Lake Erie, Huntington Reservation and Edgewater State Park, Ohio, USA2 Beaches4 SeasonsRainfall 24hr, Rainfall 48 hr, Water temperature, Wave height, Solar radiation, Turbidity, Wave period, Water level, Days since last rain, Bather count, Day of yearRainfall 24hr, Rainfall 48 hr, Wave height, Turbidity, Water level, Day of yearMultiple Linear RegressionOriginal datasetSensitivity = 57.1% and 31.7%Specificity = 89.1% and 76.6%Accuracy = 84.9% and 61.0%The predictive model at one beach outperformed persistence model, but the predictive model at the other beach did not.
18Francy, Donna S., Darner, Robert A., Bertke, Erin E. (2006) [39]Lake Erie, Lorain, Huntington Reserve, Cleveland, Ohio, USA5 Beaches6 SeasonsRainfall 24hr, Rainfall 48 hr, Rainfall 72+ hr, Water temperature, Wave height, Turbidity, Wind direction, Bird count, Water level Day of year,Rainfall 24hr, Rainfall 48 hr, Rainfall 72+ hr, Water temperature, Wave height, Turbidity, Wind direction, Water level, Day of yearMultiple Linear RegressionOriginal datasetSensitivity: Threshold method = 59.1%-92.9%, Predicted [E. coli] = 25.9%-82.1%Specificity: Threshold method = 52.6%-94.9%, Predicted [E. coli] = 63.2%-98.9% R2 = 0.35–0.44Accuracy: Threshold = 76.6%-89.7%, Predicted [E. coli] = 74.5%-88.1%The best model made better predictions than persistence models and predictions were made available online.
19Francy, Donna S., Darner, Robert A. (2003) [40]Lake Erie, and Mosquito Lake, Cleveland, Huntington Reservation, Ohio, USA4 Beaches3 SeasonsRainfall 24hr, Rainfall 72+ hr, Water temperature, Wave height, Solar radiation, Turbidity, Wind speed, Wind direction, Discharge/ flow, Bird count, Day of year, Current directionRainfall 24hr, Rainfall 72+ hr, Wave height, Turbidity, Wind speed, Wind direction, Discharge/ flow, Bird count, Day of year, Current directionMultiple Linear RegressionOriginal datasetR2 = 0.32–0.41Accuracy = 71.2%-90.9%False positive rate = 4.0%-14.4%False negative rate = 4.0–14.6%Predition error too high to accurately predict E. coli concentrations, but can be used to predict the probability of exceedances.
20Frick, W.E (2006)* [41]Lake Erie, Huntington Beach, Cleveland, Ohio, USA1 Beach1 SeasonUnknownUnknownMultiple Linear RegressionUnknownR2 and Mallow’s Cp used but not values not reportedTested the Virtual Beach program at a beach, showing the program can be helpful for creating predictive models.
21Frick, Walter E., Ge, Zhongfu, Zepp, Richard G. (2008) [42]Lake Erie, Huntington Beach, Cleveland, Ohio, USA1 Beach1 SeasonRainfall 24hr, Rainfall 48 hr, Air Temperature, Water temperature, Wave height, Solar radiation, Turbidity, Wind speed, Wind direction, Cloud cover, Dew point, Precipitation potential, Rainfall intensityRainfall 24hr, Wave height, Turbidity, Wind speed, Wind direction, Cloud cover, Dew point, Rainfall intensityMultiple Linear RegressionOriginal datasetAdjusted R2 = 0.457–0.610Dynamic model built off of small amounts of data compare to static models built with more data.
22Hatfield, Nancy Lee Clark (2000) [43]Lake Erie and an artificial lake, Maumee Bay State Park, Ohio, USA2 Beaches1 SeasonAir Temperature, Water temperature, Wave height, Turbidity, Wind speed, Wind direction, Bather count, Bird count, Current direction, Wave direction, weather category, Days since last rainAir temperature, Water temperature, Turbidity, Wind speed, Wind direction, Bird count, Bather count, Days Since last rainMultiple Linear RegressionOriginal datasetNo predictive model for inland lake found.Sensitivity = 50%Specificity = 82%R2 = 0.253A reliable model was created for the Lake Eire beach but not for the artificial lake.
23He, Cheng, Post, Yvonne, Dony, John, Edge, Tom, Patel, Mahesh, Rochfort, Quintin (2016) [44]Lake Ontario, Toronto, Ontario, Canada1 Beach3 SeasonsRainfall <24 hr, Air Temperature, Water temperature, Wave height, Turbidity, Wind speed, Wind direction, Discharge/ flow, Bird count, Water level, Current speed, Current directionRainfall <24 hr, Turbidity, Wind speed, Wind direction, Discharge/ flowDecision treeTemporal validation (2 new seasons)Accuracy = 76% and 78%Model performed better than previously developed linear regression model and persistence model.
24Heberger, Matthew G, Durant, John L, Oriel, Kimberly A, Kirshen, Paul H, Minardi, Lee (2008) [45]Mystic River watershed, Boston, Massachusetts, USA1 Sampling site1 SeasonRainfall <24 hr, Discharge/ flow, Time since last rainfallRainfall <24 hr, Discharge/ flow, Time since last rainfallMultiple Linear RegressionTemporal validation (new season)R2 (calibration) = 0.42PRESS (calibration) = 16.9Accuracy: correctly predicted 4/5 exceedances and 15/16 non-exceedancesPredictive models showed good agreement with models developed for other systems, and showed model perform well with rivers.
25Herrig, Ilona, Seis, Wolfgang, Fischer, Helmut, Regnery, Julia, Manz, Werner, Reifferscheid, Georg, Boeer, Simone (2019) [46]Rhine and Moselle rivers, Rhineland-Palatinate, Germany2 Beaches2 SeasonsRainfall 24hr, Rainfall 72+ hr, Water temperature, Solar radiation, Turbidity, Discharge/ flow, Conductivity, pH, Chlorophyll a, Dissolved O2Rainfall 24hr, Solar radiation, Discharge/ flowBayesian modellingTemporal validation (new season)R2: Site1 = 0.73, Site2 = 0.55Whether microbial interactions in the river are driven by hydro-meteorological factors or trophic/biotic level factors plays an important role in modelling and outcomes variables.
26Hong, Yi, Soulignac, Frederic, Roguet, Adelaide, Li, Chenlu, Lemaire, Bruno J, Martins, Rodolfo Scarati, Lucas, Francoise, Vincon-Leite, Brigitte (2021) [47]Lake Créteil, Créteil, Valde-Marne, France3 Sampling sites1 SeasonRainfall >24 hr, Air Temperature, Water temperature, Sewer outflow [FIB], Solar radiation, Barometric pressure, Wind speed, Wind direction, Relative humidity, Discharge/ flow, Water level, Cloud coverRainfall >24 hr, Air temperature, Wind speed, Wind direction, Relative humidity, Air temperature, Cloud coverHydrodynamic modellingTemporal validation (new season)R2 = 0.89 NSE coefficient ≥ 0.7 for water flow simulationsAccurate predictions show promise for hydrodynamic modelling in stormwater systems and lakes.
27Jones, Rachael M, Liu, Li, Dorevitch, Samuel (2013) [48]Lake Michigan, Chicago, Illinois, USA3 Beaches2 SeasonsRainfall <24 hr, Rainfall 24hr, Rainfall 48 hr, Rainfall 72+ hr, Solar radiation, Water level, Time since last rain, Rain intensityRainfall 48 hr, Rainfall 72+ hr, Solar radiation, Time since last rain, Intensity of rainfallLinear mixed effects modelDivision of original datasetEcoli: Sensitivity = 23% and 42%Specificity = 77% and 89%Accuracy = 77% and 77%EnterococcusSensitivity = 53% and 62%Specificity = 80% and 84%Accuracy = 72% and 76%Predictive models performed with good accuracy but low sensitivity.
28Madani, M, Seth, R (2020) [14]Lake St. Clair, Windsor, Ontario, Canada1 Beach3 SeasonsRainfall <24 hr, Rainfall 24hr, Rainfall 48 hr, Rainfall 72+ hr, Air Temperature, Water temperature, Wave height, Turbidity, Wind speed, Wind direction, Cloud cover, Weather category, Bird countRainfall <24 hr, Rainfall 24hr, Rainfall 48 hr, Rainfall 72+ hr, Air temperature, Water temperature, Wave height, Turbidity, Wind speed, Wind direction, Weather Category, Bird countMultiple Linear RegressionTemporal validation (new season)Sensitivity = 30%-78%Specificity = 73%-90%R2 = 0.065–0.225AUROC = 0.70–0.79RSME = 0.251–0.449Accuracy = 64%-78%The predictive model outperformed persistence models. Models built using two, three, and four years of data, with the model built using two years being marginally better.
29Maimone, Mark, Crockett, Christopher S, Cesanek, William E (2007) [49]Schuylkill River, Philadelphia, Pennsylvania, USA12 Sampling sites1 SeasonTurbidity, Discharge/ flow, Time since last rainfallTurbidity, Discharge/ flow, Time since last rainfallDecision treeTemporal validation (new season)Accuracy = 66%Early testing shows model can be accurate, and when it is inaccurate, it is overly cautious, more data will be added to algorithm when available.
30Mälzer, H.-J., aus der Beek, T, Müller, S, Gebhardt, J (2016) [50]Lake Baldeney/ Ruhr River, Northrhine-Westphalia, Germany1 Beach6 SeasonsRainfall <24 hr, Rainfall 24hr, Air Temperature, Water temperature, Solar radiation, Turbidity, Discharge/ flow, Conductivity, pH, Total and dissolved organic carbon, Spectral adsorption coefficients at 254 and 436nm, [NH4+], [NO2], [NO3-], ortho- and total-phosphate, dissolved O2, Days since last rainfallAir temperature, Water temperature, Turbidity, pH, Spectral adsorption at 254 and 436nm, [NO3-], [NH4+], Dissolved O2, Days since last rainfall, some ANN variables not listedMultiple Linear Regression, Artificial neural networks, Deterministic (hydrodynamic) models, Logistic modelsUnknownSensitivity: Single regression = 53%-100%, MLR = 67%-100%, ANN = 89%-100%, Logistic = 80%-100%Specificity: Single regression = 27%-68%, MLR = 0–51%, ANN = 56–83%, Logistic = 40–51%ANN was the most accurate model, but accuracy varied across stretches of the river.
31Marion, Jason W (2011) [51]Inland lakes, Ohio, USA7 Beaches1 SeasonWater temperature, Turbidity, Carlson’s Trophic Index (calculated by Chlorophyll a, Total phosphorus, Secchi depth), Phycocyanin, Dissolved O2Total phosphorus, Carlson’s Trophic Index (calculated by Total phosphorus or mean of 3 index measures from Phosphorus, Secchi depth, and Chlorophyll a), PhycocyaninLogistic regressionOriginal datasetAUROC: TP = 0.7050, Phycocyanin = 0.6398, TSI-TP = 0.6875, TSI-mean = 0.7203Goodness of fit test: TP = 0.1269, Phycocyanin = 0.5050, TSI-TP = 0.3588, TSI-mean = 0.3241Improved sensitivity is desired to reduce false negatives, but model can be useful for real-time estimates of fecal indicators.
32Molina, M., Cyterski, Mike, Whelan, G., Zepp, R. (2014)* [52]A Great Lake beach, USA1 BeachUnknown seasonUnknownUnknownMultiple Linear regressionUnknownSensitivity, specificity, R2 used but values not listedOnsite data provided better predictive accuracy than publicly available data.
33Motamarri, Srinivas, Boccelli, Dominic L (2012) [53]Charles River Basin, Massachusetts, USA1 Sampling site2 SeasonsRainfall 24hr, Rainfall 48 hr, Previous day [FIB], Solar radiation, Discharge/ flow, Rainfall intensity, Rime since last rainfallRainfall 48 hr, Rainfall 72+ hr, Discharge/ flow, Rainfall intensity, Time since last rainfall (rainfall of >0.25 inches and >0.5inches)Multiple Linear Regression, Artificial neural networks, Learning vector quantization (LVQ)Backward elimination, LVQ used variance gained method and determinant gain method for 2 models. Top 5 variables chosen for all models.Sensitivity:MLR = 48%, ANN = 68%, LVQ = 86%Specificity: MLR = 95%, ANN = 92%, LVQ = 98%R2 and MSE: Values not reportedANN and LVQ performed similarly, with LQV performing better with less variables included in the model.
34Nevers, Meredith B, Shively, Dawn A, Kleinheinz, Gregory T, McDermott, Colleen M, Schuster, William, Chomeau, Vinni, Whitman, Richard L (2009) [54]Lake Michigan, Green Bay, Sturgeon bay, Door County, Wisconsin, USA24 Beaches3 SeasonsRainfall 48 hr, Air Temperature, Water temperature, Previous day [FIB], Barometric pressure, Wind speed, Wind direction, Bird count, Water level, Wave period, Algae accumulationRainfall 48 hr, Water temperature, Wave height, Previous day [FIB], Barometric pressure, Wind speed, Wind direction, Bird count, Water level, Algae accumulationTree regression/ random forestOriginal datasetR2 (adjusted R2) = 0.318 (0.315), 0.251 (0.247), 0.195 (0.184)Mallow’s Cp: 6.00, 5.4, 5.68Models affected by generally low E. coli concentrations at beaches, resulting in low signal-to-noise in model building.
35Nevers, Meredith B, Whitman, Richard L (2005) [55]Lake Michigan, Indiana Dunes National Park, Indiana, USA5 Beaches1 SeasonRainfall <24 hr, Rainfall 24hr, Air Temperature, Water temperature, Solar radiation, Barometric pressure, Turbidity, Wind speed, Wind direction, Relative humidity, Discharge/ flow, Conductivity, pH, Chlorophyll a, Dew point, Wind gust, Dissolved O2, Water level, Wave period, Wave direction, Colour, Cloud cover, Current speed, Current directionRainfall <24 hr, Wave height, Turbidity, Chlorophyll a, Wave periodMultiple Linear RegressionOriginal datasetR2: North wind = 0.6335, South wind = 0.320, North and south = 0.465Incorrect open/closed predictions: North = 3%, south = 2%Predictive models more accurate than persistence models. Variation better explained at beach level models.
36Nevers, Meredith B, Whitman, Richard L, Frick, Walter E, Ge, Zhongfu (2007) [56]Lake Michigan, Indiana Dunes National Lakeshore, Indiana, USA2 Beaches1 SeasonRainfall 24hr, Air Temperature, Water temperature, Wave height, Turbidity, Wind speed, Discharge/ flow, Conductivity, pH, Chlorophyll a, Dissolved O2, Water level, Dew point, Cloud cover, Current direction, Current speed, Spectral adsorption coefficients at 254 and 436nm, Wave periodWave height, Barometric pressure, Turbidity, Wind speed, Wind direction, Conductivity, Wave periodMultiple Linear RegressionOriginal datasetR2 = 0.722 and 0.504Adjusted R2 = 0.694 and 0.477RMSE = 0.371 and 0.419one and zero type I errors, three and two type II errorsPredictive models had less error than persistence models. Able to model beaches with multiple outfall sources.
37Nevers, Meredith B., Whitman, Richard L. (2008) [57]Lake Michigan, Indiana, USA12 Beaches1 SeasonRainfall <24 hr, Air Temperature, Water temperature, Wave height, Previous day [FIB], Solar radiation, Barometric pressure, Turbidity, Wind speed, Wind direction, Discharge/ flow, Conductivity, pH, Chlorophyll a, Colour, Dissolved O2, Wave height, Dew point, Current direction, Cloud coverRainfall <24 hr, Wave height, Turbidity, Wind direction, Discharge/ flowMultiple Linear RegressionWhole lake model used a division of the original dataset, while beach specific models used whole original datasetR2: All beaches = 0.48, Beach specific = 0.34–0.57RMSE: All beaches = 0.42, Beach specific = 0.334–0.545Many beaches along the same coastline may be able to be modelled as if they were one beach.
38Olyphant, G A (2005) [58]Lake Michigan, Indiana Dunes state park, Indiana, and Lake County, Illinois, USA4 Beaches1 SeasonRainfall <24 hr, Rainfall 24hr, Air Temperature, Water temperature, Wave height, Solar radiation, Wind speed, Wind direction, Water level, Streamflow [FIB], Time sample collectedRainfall 24hr, Air temperature, Water temperature, Wave height, Solar radiation, Wind speed, Wind direction, Water level, Streamflow [FIB], Time sample collectedMultiple Linear Regression (Ordinary and generalized-least squares)Original datasetR2 = 0.65–0.76Accuracy = 85% - 88%Predictive models outperformed persistence models. Model was still 90% accurate even in extreme high or low cases.
39Olyphant, Greg A, Whitman, Richard L (2004) [59]Lake Michigan, Chicago, Illinois, USA1 Beach1 SeasonRainfall <24 hr, Rainfall 24hr, Rainfall 48 hr, Air Temperature, Water temperature, Wave height, Solar radiation, Turbidity, Wind speed, Wind direction, Conductivity, pH, Water level, Wave frequency, Dissolved O2Rainfall 24hr, Water temperature, Solar radiation, Turbidity, Wind speed, Wind direction, Water levelMultiple Linear RegressionOriginal datasetSensitivity = 93%Specificity = 86%Multiple correlation coefficient, R = 0.84Model was accurate at predicting exceedances. Large array of instrumentation tested directly on the beach.
40Parkhurst, David F, Brenner, Kristen P, Dufour, Alfred P, Wymer, Larry J (2005) [60]Lake Michigan, Indiana, and Detroit river, Michigan, USA5 Beaches1 SeasonAir Temperature, Water temperature, Wave height, Previous day [FIB], Wind speed, Wind direction, Sunny (Y/N), Bather count, Cloud cover, Current direction, Water level, Day of week, Time sample collectedAir Temperature, Water temperature, Wave height, Previous day [FIB], Wind speed, Wind direction, Sunny (Y/N), Bather count, Cloud cover, Current direction, Water level, Day of week, Time sample takenTree regression/ random forestTemporal validation (new season)[E. coli]R2 = 0.138 and 0.800log[E. coli]R2 = 0.824 and 0.125Tree regression a useful tool for exploratory analysis. Predictive model worked poorly at predicting the raw values of E. coli but worked well at predicting magnitude or log (E.coli).
41Rossi, Alessandra, Wolde, Bernabas T., Lee, Lee H., Wu, Meiyin (2020) [61]Passaic and Pompton rivers, New Jersey, USA1 Sampling site1 SeasonRainfall <24 hr, Rainfall 24hr, Rainfall 48 hr, Rainfall 72+ hr, Air Temperature, Water temperature, Turbidity, Conductivity, pH, Chlorophyll a, Dissolved O2, [NO3], Dissolved organic carbonRainfall 72+ hr, Conductivity, pHLogistic regressionBootstrapping/ cross validationR2 = 0.23 and 0.41Misclassification rate = 0.2 and 0.09.Chi-Squared statistic = 18.66 (p = 0.0001) and 23.27 (p = 0.0007)Model shows a probabilistic measure of exceedance likelihood. Bagging technique improves reliability of model.
42Safaie, Ammar, Wendzel, Aaron, Ge, Zhongfu, Nevers, Meredith B, Whitman, Richard L, Corsi, Steven R, Phanikumar, Mantha S (2016) [62]Lake Michigan, Indiana Dunes National Park, Indiana, USA3 Beaches1 SeasonWater temperature, Solar radiation, Turbidity, Discharge/ flow, Conductivity, Current speed, Current directionWater temperature, Solar radiation, Turbidity, Current speed, Current directionMultiple Linear Regression and Hydrodynamic modellingOriginal datasetR2: Statistical model = 0.749 and 0.710, Mechanistic = 0.603 and 0.722RMSE: Statistical = 0.431 and 0.464, Mechanistic = 0.601 and 0.521PBIAS: Statistical = -3.792 and -11.553, Mechanistic: 8.094 and 7.250NSE: Statistical = 0.554 and 0.444, Mechanistic = 0.133, 0.299RSR: Statistical = 0.677 and 0.745, Mechanistic = 0.931 and 0.837, Fn: Statistical = 0.316 and 0.333, Mechanistic = 0.440 and 0.373The cooperative modeling approach of using statisitical models and hydrodynamic models to improve model building of the other lead to models with good predictive power that can generate real-time forecasts.
43Seis, W, Zamzow, M, Caradot, N, Rouault, P (2018) [63]River Havel, Berlin, Germany1 Sampling site6 SeasonsRainfall 24hr, Rainfall 48 hr, Rainfall 72+ hr, Discharge/ flowRainfall 24hr, Rainfall 48 hr, Rainfall 72+ hr, Discharge/ flowBayesian modellingTemporal validation (2 new seasons)Leave-one-out cross-validation information criterion = 177, 195, 191 and assessed graphicallyA methodology for an early warning system, including probabilistic alert levels were developed. The model provides solutions to the current alert system.
44Shively, Dawn A, Nevers, Meredith B, Breitenbach, Cathy, Phanikumar, Mantha S, Przybyla-Kelly, Kasia, Spoljaric, Ashley M, Whitman, Richard L (2016) [13]Lake Michigan, Chicago, Illinois, USA9 Beaches3 SeasonsRainfall <24 hr, Rainfall 24hr, Rainfall 48 hr, Air Temperature, Water temperature, Wave height, Solar radiation, Barometric pressure, Turbidity, Wind speed, Wind direction, Relative humidity, Wave period, Water level, Day of yearAir temperature, Wave height, Solar radiation, Barometric pressure, Turbidity, Wind direction, Wind speed, Day of yearMultiple Linear RegressionTemporal validation (new season)Sensitivity = 0%-36%Specificity = 73%-100%Adjusted R2 = 0.046–0.349Accuracy = 68%-97%Fully automated water quality system used for input into predictive model that outperformed the persistence model. Interannual model refinement improved performance.
45Simmer, Reid A (2016) [64]F.W. Kent Park Lake, Oxford, Iowa, USA1 Beach4 SeasonsRainfall <24 hr, Rainfall 24hr, Rainfall 48 hr, Rainfall 72+ hr, Air Temperature, Water temperature, Wave height, Solar radiation, Turbidity, Wind speed, Wind direction, Relative humidity, pH, Dissolved O2, Wave direction, Bird count, Bather count, Concentration of goose droppings, Algae presence, Day of yearRainfall 72+ hr, Water temperature, Wind speed, Wind direction, pH, Dissolved O2, Wave direction, Bather count, Goose dropping concentration, Day of yearMultiple Linear RegressionBootstrapping/ cross-validationSensitivity: 4yr = 60.00%, 2015 = 66.67%Specificity: 4yr = 90.00%, 2015 = 96.97%Adjusted R2: 4yr = 0.53, 2015 = 0.47Accuracy: 4yr = 84%, 2015 = 90.48% (2015 being the last year modelled in the study)Both predictive models created were more accurate than persistence models.
46Telech, Justin W, Brenner, Kristen P, Haugland, Rich, Sams, Elizabeth, Dufour, Alfred P, Wymer, Larry, Wade, Timothy J (2009) [65]Lake Erie and Lake Michigan, Bay Village, Ohio, Indiana Dunes National Lakeshore and Michigan City, Indiana, and St. Joseph, Michigan4 Beaches2 SeasonsRainfall 24hr, Rainfall 48 hr, Rainfall 72+ hr, Air Temperature, Water temperature, Wave height, Turbidity, Wind speed, Wind direction, pH, Cloud cover, Bather count, Bird count, Boat count, Time sample collectedRainfall 24hr, Rainfall 48 hr, Water temperature, Wave height, Turbidity, Wind speed, Wind direction, pH, Cloud cover, Bather count, Bird count, Boat count, Time sample collectedMultiple Linear RegressionOriginal datasetR2 with different E. coli enumeration methods: qPCR = 0.22, 0.57, 0.39, 0.81Membrane filtration = 0.45, 0.86, 0.50, 0.94Both models did not perform well at predicting Enterococcus exceedances.
47Uejio, Christopher K, Peters, Theodore W, Patz, Jonathan A (2012) [66]Geneva Lake, Wisconsin, United States5 Beaches8 SeasonsRainfall 24hr, Rainfall 72+ hr, Air Temperature, Wind speed, Wind direction, Discharge/ flow, Cloud cover, Days since last rainfall, Month, Sampling timeRainfall 24hr, Rainfall 72+ hr, Wind speed, Wind direction, Discharge/ flow, Month, Sampling time, Cloud coverBayesian modellingOriginal datasetSensitivity = 0%-54%Specificity = 0%-80%Accuracy = 54%-100%Predictive models at some of the beaches had good accuracy and could support decisions.
48Wang, Leizhi, Zhu, Zhenduo, Sassoubre, Lauren, Yu, Guan, Liao, Chen, Hu, Qingfang, Wang, Yintang (2020) [67]Lake Erie, Erie county, New York, USA3 Beaches7 SeasonsRainfall 24hr, Rainfall 48 hr, Rainfall 72+ hr, Air Temperature, Water temperature, Wave height, Barometric pressure, Turbidity, Wind speed, Wind direction, Discharge/ flow, Water level, Bird count, Algae category, Debris category, Fecal matter category, Odor (Y/N), Combined sewer overflow (Y/N), Day of year, Wave direction, Cloud cover, Current speed, current directionRainfall 24hr, Rainfall 48 hr, Rainfall 72+ hr, Air Temperature, Water temperature, Wave height, Barometric pressure, Turbidity, Wind speed, Wind direction, Discharge/ flow, Water level, Bird count, Algae category, Debris category, Fecal matter category, Odor (Y/N), Combined sewer overflow (Y/N), Day of year, Wave direction, Cloud cover, Current speed, Current directionModel stacking of these outputs: Multiple Linear Regression (including Partial least squares, sparse partial least squares), Bayesian modelling, Tree regression/ random forestBootstrapping/ cross-validation (leave one year out cross-validation)Accuracy = 78%, 81%, and 82.3%A model stacking approach improved robustness of prediction power, with random forest contributing the most weight in the model.
49Wendzel, Aaron (2014) [68]Lake Michigan, Indiana Dunes National Lakeshore, Indiana, USA3 Beaches1 SeasonSolar radiation, Conductivity, Current speed, Current direction, decay rateSolar radiation, Conductivity, Current speed, Current direction, decay rateHydrodynamic modellingOriginal datasetRMSE = 0.600, 0.647, and 0.809PBIAS = 6.511, 10.489, 24.283Fourier Norm = 0.408, 0.408, 0.660Model could accurately simulate FIB concentrations at beaches using unstructured grids.
50Whitman, R L, Nevers, M B (2008) [69]Lake Michigan, Chicago, Illinois, USA23 Beaches5 SeasonsAir Temperature, Water temperature, Wave height, Solar radiation, Barometric pressure, Wind speed, Wind direction, Day of yearWave height, Barometric pressure, Day of yearMultiple Linear RegressionOriginal datasetAdjusted R2 = 0.20–0.41Beaches geographically close to each other had correlated E. coli fluctuations.
51Zhang, Juan, Qiu, Han, Li, Xiaoyu, Niu, Jie, Neyers, Meredith B, Hu, Xiaonong, Phanikumar, Mantha S (2018) [15]Lake Michigan, Indiana Dunes National Park, Indiana, USA3 Beaches1 SeasonRainfall 24hr, Water temperature, Wave height, Previous day [FIB], Solar radiation, Turbidity, Wind speed, Wind direction, Discharge/ flow, Conductivity, Past [FIB] beyond one dayRainfall 24hr, Water temperature, Wave height, Turbidity, Wind speed, Discharge/ flow, Past [FIB] beyond one dayArtificial neural networks: Nonlinear input-output (NIO), nonlinear autoregressive neural network (NAR), nonlinear autoregressive network with exogenous inputs (NARX), NAR + discrete wavelet transform (WA-NAR)Division of original datasetSensitivity: NIO = 0, 1, 1NAR = 0, 0.5 0.5NARX1 = 0, 0.5, 1NARX2 = 0, 0.5, 1WA-NAR = 0.5, 0.33, 0.4Specificity:NIO = 1, 1, 1 NAR = 1, 1, 1 NARX1 = 1, 0.99, 1 NARX2 = 1, 1, 0.99WA-NAR = 1, 1, 1R2: NIO = 0.53, 0.43, 0.46NAR = 0.38, 0.34, 0.59NARX1 = 0.80, 0.82, 0.80NARX2 = 0.77, 0.83, 0.82WA-NAR = 0.62, 0.57, 0.62RMSE: NIO = 0.33, 0.53, 0.32 NAR = 0.24, 0.41, 0.45NARX1 = 0.15, 0.31, 0.23 NARX2 = 0.26, 0.23, 0.29WA-NAR = 0.07, 0.10, 0.11NARX performed the best, with WA-NAR in second but requiring no explanatory variables. All models were comparable to or outperformed other predictive models previously built the these beaches.
52Zimmerman, Tammy M (2008) [70]Presque Isle Beach 2, City of Erie, Pennsylvania, USA1 Beach3 SeasonsRainfall 24hr, Rainfall 48 hr, Rainfall 72+ hr, Water temperature, Wave height, Discharge/ flow, Turbidity, Wind speed, Wind direction, Conductivity, pH, Dissolved O2, Bird count, Current speed and directionWave height, Turbidity, Bird countMultiple Linear regressionTemporal validation (new season)Sensitivity = 50.0%Specificity = 97.4%R2 = 0.66Predictive models outperformed persistence models, notably in the models using the previous two seasons only.
53Zimmerman, Tammy M (2006) [71]Lake Erie, City of Erie, Pennsylvania, USA1 Beach2 SeasonsRainfall 72+ hr, Water temperature, Wave height, Turbidity, Discharge/ flow, Conductivity, pH, Bird count, Debris category, Boat count, Dissolved O2, Current direction, Current speedRainfall 72+ hr, Wave height, Turbidity, Wind directionMultiple Linear regressionOriginal datasetR2: 2004 = 0.54, 2005 = 0.71, Both = 0.64Log-likelihood: 2004 = -73.33, 2005 = -143.30, Both = -224.31Predictive models were able to predict non-exceedances well, but performed worse at predicting exceedances.

a If different lengths of time were used at different locations, the highest number of seasons is presented. Only seasons used in model building were included, entire seasons used for model validation are not included in this count.

b Statistics for validation of models used over calibration data when available.

* Conference proceeding, only an abstract was available.

Definitions: Area under the curve (AUC), Area under the receiver operator curve (AUROC), Root mean squared error (RMSE), Percent bias (PBIAS), Least absolute shrinkage and selection operator (LASSO), Nash-Sutcliffe efficiency (NSE), Fourier transform (Fn).

a If different lengths of time were used at different locations, the highest number of seasons is presented. Only seasons used in model building were included, entire seasons used for model validation are not included in this count. b Statistics for validation of models used over calibration data when available. * Conference proceeding, only an abstract was available. Definitions: Area under the curve (AUC), Area under the receiver operator curve (AUROC), Root mean squared error (RMSE), Percent bias (PBIAS), Least absolute shrinkage and selection operator (LASSO), Nash-Sutcliffe efficiency (NSE), Fourier transform (Fn). Studies were published from 2000 to 2021 (median of 2013). S6 Table summarizes study characteristics, including number of years of model building and publication type (Figs 2 and 3). While the maximum number of swimming seasons included in model building was 12 seasons [33], 19 (36%) of the studies used only one swimming season of data for model creation. Around half (26 studies, 49%) used two seasons or less. However, the number of seasons used in model building do not include seasons that were used solely for model validation in the 21 (40%) of studies that used temporal validation.
Fig 2

Frequency of the number of swimming seasons used in building models.

Fig 3

Frequency of publication types.

Five countries were represented in this study: U.S. (44 publications), Germany (4), Canada (2), New Zealand (2), and France (1). Additionally, the studies mostly focused on the Great Lakes, in particular Lake Michigan (20 studies) and Lake Erie (14) (Fig 4). Lake Ontario and Lake Superior were investigated in two studies each. No studies included Lake Huron. Overall, 40 studies (75%) modelled lakes and 13 studies (25%) modelled rivers. Fig 5 shows the frequency of the number of beaches in each study.
Fig 4

Frequency of the location of beaches.

Fig 5

Frequency of the number of beaches, or sampling sites if beaches not provided.

Table 2 summarizes modelling methods employed in the studies. The most commonly used model building method was multiple linear regression, which was used in 37 studies (70%), while univariate linear regression used in three (6%). Logistic regression, using a dichotomous outcome variable representing whether recreational waters met thresholds for safe use by bathers, were explored in five studies (9%). Additionally, tree regression or random forests were utilized in six studies (11%). Decision trees were created in three studies (6%). Beginning in 2012, more computationally advanced models were introduced including Bayesian networks, artificial neural networks, and deterministic or hydrodynamic models, of which there were five (9%), three (6%), and four (8%) of these model types, respectively. Various studies involved multiple modelling methods to compare their efficacy, comparing multivariate linear regression, artificial neural networks, hydrodynamic models, Bayesian networks, and stacking of multiple models together.
Table 2

Modelling techniques for creating the predictive models present in 53 relevant studies.

Model characteristicsNumber of studies% of total studies
Modelling technique
    Multiple linear regression3770%
    Tree regression and/or random forests611%
    Logistic regression59%
    Bayesian networks59%
    Deterministic/ hydrodynamic modelling48%
    Artificial neural networks36%
    Univariate regression36%
    Decision tree36%
E. coli was the most commonly investigated FIB (n = 46 studies, 87%), while 11 (21%) modelled Enterococcus, one (2%) modelled total fecal coliforms, and one (2%) included models for Salmonella and Campylobacter. Of these, 34 (64%) studies log-transformed the concentration of the FIB of interest. The predictor variables examined and included in final models are presented in S7 Table (and Fig 6). The variables used in most studies’ final models were turbidity, wind direction, wave height, and wind speed. Time variables were important in creating models, as seen with the regular inclusion of day of year, sampling time, and month/ sub-season variables in final models. Forty-five (85%) studies assessed rainfall variables, including amount of rainfall in the previous <24, 24, 48, or 72 or more hours, the length of time since the last rainfall, or intensity of the last rainfall. Three commonly transformed variables were log10(turbidity), log10(discharge), and weighted rainfall. Most studies obtained these environmental variables from government sources such as US Geological Survey river gauges and National Weather Service airport weather stations.
Fig 6

Frequency of environmental variables explored in studies and frequency of variables included final models.

Accuracy of predictive models was measured in 19 studies. The overall accuracy of these studies was 81% (S8 Table). Of these studies, 13 compared their accuracy to pre-existing persistence models at those locations, and with the exception of one study, all or most of their models were more accurate than persistence models. Risk-of-bias characteristics of each individual study are presented as S9 Table, while summary data are presented in Table 3. We found that one study adjusted predictor weights to address overfitting (regularization of data) and only three studies (6%) compared predictors’ calibration distributions to validation distributions. Additionally, little information was provided on the handling of missing data, with only 17 (32%) studies reporting any method of dealing with missing FIB concentrations or predictor values. Modelling assumptions, such as normality, were rarely fully addressed, with only 12 (23%) studies affirming they met all model assumptions.
Table 3

Risk of bias checklist summary for 53 relevant studies.

Study design and criteriaNumber of studies% of studies
Source of predictor data
    Government data3770%
    Collected by researchers2853%
    Collected by other researchers815%
    Conservation Authorities36%
    Other12%
    Not clear611%
Method for selecting predictors for multivariate modeling
    All included2751%
    Virtual Beach713%
    Preselected based on significant association with FIB611%
    Only univariate modeling performed36%
    Preselected based on a priori criteria or literature24%
    Other48%
    Not clear59%
Predictor selection method for inclusion in final model
    All possible variable combinations created, and final model chosen by model fit characteristics (e.g., R2, RMSE)1325%
    Full model approach1019%
    Akaike’s Information Criterion713%
    Virtual Beach713%
    Backward selection48%
    Forward selection36%
    Univariate model36%
    Bayesian Information Criterion12%
    Other713%
    Not clear59%
Model performance measure
    R2 or adjusted R23260%
    Sensitivity2649%
    Specificity2547%
    Accuracy1932%
    Root mean squared error815%
    Area under the curve or area under the receiver operator curve48%
    False negative or positive rate36%
    Fourier transform24%
    Percent bias24%
    Nash-Sutcliffe efficiency24%
    Mallow’s Cp24%
    Other1528%
Model validation method
    Fitting to original dataset2140%
    Temporal validation (new seasons)2038%
    Bootstrapping/ cross-validation611%
    Division of original dataset36%
    Geographical validation (new beaches/ sites)12%
    Not clear24%
Were predictor weights or regression coefficients shrunk at all?
    Yes12%
    No5298%
Are modeling assumptions satisfied?
    Yes1223%
    No12%
    Not clear4075%
Handling of predictors in modelling
    Categorized2038%
    Log-transformed1834%
    Weighted days1121%
    Square roots48%
    Other transformations59%
Handling of missing data
    Left as missing917%
    Remove predictors with missing data36%
    Data replaced with data from nearby sensor or sample collection36%
    Remove days with missing predictor data12%
    Autocorrelation and partial autocorrelation12%
    Not clear3770%
Were predictor distributions compared between calibration and validation datasets?
    Yes36%
    No5094%
Predictor measurements were mostly collected from governmental sources (37 studies, 70%) or directly by the authors (28 studies, 53%) deploying their own instruments or water sampling. Most predictor transformations were categorizations (20 studies, 38%), weighting rainfall over several days (11 studies, 21%), or logarithmic (18 studies, 34%), however some studies utilized other transformations such as polynomial [64] or trigonometric transformations [34]. Twenty-seven studies (51%) reported they used no pre-screening criteria for selecting variables that were evaluated in multivariable modelling. To select predictors in final models, 13 studies (25%) used model fit characteristics of predicted values compared to actual values of FIB concentrations in many or all possible models. A full model approach using all variables was used in 10 studies (19%). Other techniques included backwards elimination, Akaike’s Information Criterion, and forward selection. Seven (13%) studies created models using the Virtual Beach software tool.

Discussion

This review compiles results of the literature reporting on predictive models of FIB at fresh, recreational waters using environmental predictors. It provides novel insight on key variables of interest, modeling techniques, and considerations of modeling for those looking to create predictive models at other waters. Our review is the first to provide a systematic approach to reviewing the literature in this area. It focuses exclusively on fresh, recreational waters, and further explores the role of various environmental predictors, which is novel to the literature of this type of modelling. de Brauwere et al. reviewed regression and hydrodynamic models predicting FIB in all surface waters in 2014, and provided an in-depth summary of important processes for hydrodynamic models [72]. We similarly found that most relevant studies in this area were conducted in the U.S., despite wider search parameters. Additionally, this review reports on the validation techniques and amount of data used during model building and validation of reviewed studies. As the geology, pollution sources, and climate of beaches differs geographically, building beach-specific models is important for accuracy [13, 65, 72]. Even in the same region, different bodies of water behave differently. For example, Hatfield [43] created an effective model for FIB in Lake Erie, but a similar model for a nearby artificial lake was not successful due to poor efficacy. However, geographically similar beaches within a specific region may be able to be modelled similarly to help reduce resources required to build models [54]. Different beach models may require different modeling approaches and environmental variables, so it is important to explore these elements in new contexts before generalizing models to other beaches. Predictive modelling has the ability to overcome several issues in recreational water monitoring. Firstly, it addresses the reliance on persistence models, where the accuracy of posting beaches as suitable or unsuitable for swimming and other water activities depends on FIB concentrations remaining consistent across the 24-hour lab-response time. It also does not require the large resource and capacity investment of upgrading to qPCR for rapid testing, as most beach managers collect FIB data and government weather and water stations are already set up at or near many recreational waterways, resulting in less investment to collect data to develop and implement models. However, these techniques can still be integrated together. The city of Chicago has adopted a hybrid model for determining beach water quality [73]. The five beaches (out of 20) that produce 56% of poor water quality days are tested with qPCR everyday, with the others placed into clusters, with one beach per cluster tested with qPCR and the rest predicted with models. This hybrid approach identifies poor water quality days three times more accurately than the previous predictive models alone. The rapid testing ensures accuracy, while the predictive models reduce costs and may provide a solution to the shortcomings of both methods. The efficacy of predictive models depends on the quality and accuracy of information put into them. Thirty-seven studies collected at least some of their environmental data from governmental sources, which are likely to be reliable in quality. While they might reflect slightly different weather conditions from beaches, due to being located elsewhere, such small changes are not likely to be a limitation in modelling. Rainfall is an important environmental factor as it washes microbial contamination from urban surfaces and agricultural sources into larger bodies of water, and increases sewer and river discharge [35, 47]. As a result, elevated E. coli levels are often associated with extreme rainfall events [69]. A wide range of timeframes for antecedent rainfall were explored, from a few hours prior to sampling to several days before. For easier interpretation, this review categorized these times as <24 hours, 24 hours, 48 hours, and 72 or more hours. Of the studies that explored times across this range, the most commonly used time in final models was 72+ hours [48, 61, 64]. Some studies also evaluated weighted rainfall variables that emphasized more recent rainfall across a 3-day period. Regardless, when explored in a study, every rainfall variable was included in at least one final model more than 50% of the time, indicating the value of examining and comparing a variety of ways of expressing rainfall. After rainfall, turbidity was the most frequently included variable in at least one final model. It’s importance relates to the association of bacteria with sediments and particulate suspended solids [74]. As UV radiation can kill E. coli, higher turbidity can protect the bacteria by absorbing or scattering solar radiation [75]. The importance of sand-associated FIB was shown at a beach in Lake Huron, where erosion of sand was the main source of E. coil from the foreshore to surface water, mediated by wave height [76]. Larger waves may also be responsible for washing bird fecal matter from the beach into the water [54]. Wind direction and speed are important explanatory variables as they are associated with driving FIB from sediments or point sources towards the beach [77, 78]. Winds, waves, and turbidity are often correlated parameters, as winds and waves churn sediments which increases turbidity [43, 78]. While explored less often, temporal variables were consistently included in final models, 100% of the time for day of year, day of week, and time of sampling, and 75% of the time for sub-season/month. FIB may accumulate in water bodies over the summer and, on average, increase over time during the bathing season [34]. Depending on characteristics, FIB concentrations may increase as the day progresses [66] or decrease [65] due to solar inactivation. This result is also dependent on enumeration method, as Telech et al. found that time of day was an important predictor of Enterococcus cell counts, but not qPCR results [65]. Pollution sources, such as waterfowl, other bathers, and discharge into the body of water were similarly explored less often but were nonetheless important considerations. Numerous modelling techniques and predictor selection methods were utilized in this review. Multiple linear regression methods were the most popular and were shown to produce accurate predictions. However, other methods may produce more accurate predictions. Comparing models built at different locations with different variables and rates of FIB exceedances would not yield accurate comparisons; however, four studies included in this review compared modelling techniques using the same data and were thus able to compare techniques. The best performing models in these four studies were artificial neural networks [50], Bayesian networks [23], gradient boosting machine (a type of random forest) [30], and a model stacking algorithm that combines two or more models into one prediction [67]. All outperformed regression methods such as ordinary, partial, and sparse partial least squares methods for multiple linear regression, and were more consistent across years and locations. Further research is warranted on these approaches and their utility for implementation in routine beach water quality monitoring. Predictor selection was also varied, but no comparisons of methods were conducted. However, seven studies (13%) used the Virtual Beach tool, created by the U.S. Environmental Protection Agency, which is intended to aid researchers and beach managers in creating predictive models [79]. The tool allows users to upload data, explore relationships among variables, transform variables, use different regression-based modelling techniques (including a recent addition of a gradient boosting machine), and evaluate models based on several model fit characteristics. The tool is free and designed to be user-friendly to support implementation of modelling at more beaches. While a gradient boosting machine was added, it still relies on regression techniques. Models created by the tool outperformed persistence models in some studies [27] but not others [37]. A few key limitations in the literature were found in the risk-of-bias. For instance, 22 studies validated their models by refitting the model through the original dataset that built the model without internal validation (bootstrapping or cross-validation), which increases the risk of overfitting [21]. Furthermore, only 13 studies (25%) specified whether or not modelling assumptions were met, which could impact model accuracy and reliability. Lastly, 37 studies (70%) did not provide any information about how missing data were dealt with, which raises additional concerns about reliability of the models. The risk of bias checklist, CHARM, required several modifications for this review compared to it’s intended context of human health outcomes. A checklist intended for systematic reviews of non-health related predictive models would benefit future reviews and improve reporting of risk of bias information when creating predictive models in this research area. The goal of predictive models is to produce more accurate results than persistence models, using the previous day’s FIB measurement for current day decisions. Most models included in this review outperformed persistence models to varying degrees, in terms of sensitivity, specificity, and/or accuracy, supporting the use of predictive models in management decisions [27, 35, 64, 70, 80]. However even if models are used for management decisions, routine water sampling for FIB should still be conducted to ensure models remain valid, and are updated and refined as appropriate, across seasons. To ensure models are up to date, the U.S. Geological Survey suggests that beach managers update their predictive models before every new bathing season [27, 70], which may not always occur in practice [81]. Once an accurate model is created, their use by beach management or the public to make decisions regarding recreational activities requires a user-friendly interface. The U.S. Geological Survey Great Lakes NowCast [81] provides real-time estimates of beach water quality along Lake Erie and Lake Ontario to the public [81]. Built from the Ohio NowCast system, several studies in this review were used in developing this tool [35, 36, 38]. The predictive models created for the Cuyahoga river were also added into the Ohio NowCast [27, 28]. The website allows users to examine current and past conditions, and also explains factors in the model. The Philly Rivercast [82] provides nowcasts for the Skullykill River and it’s development was outlined by Maimone et al. [49]. These platforms are used by beach managers and the public, which allows authorities to make real-time water quality decisions easily, and the pubic to learn about beach postings prior to arrival and make decisions about whether or not to swim or engage in other recreational activities at the beach. Additionally, as seen with the Great Lakes NowCast, these platforms can be modified and scaled to include new beaches as appropriate. There were several limitations to this study. Firstly, while grey literature was included, only selected government websites were searched. Therefore, we could have missed some relevant studies. However, our search verification strategy helped to mitigate this potential bias. Lastly, our review was geographically limited to fresh, recreational waters in temperate regions, excluding models created for marine, tropical and subtropical waters. Predictive models in those settings may have different environmental predictors and performance.

Conclusions

This review is the first to systematically examine literature on predictive models for FIB levels in fresh, recreational waters. The review reports on 53 relevant articles extracted from five databases. We have highlighted commonly explored and frequently used environmental variables and modelling techniques that can inform future predictive modelling projects and options for beach managers. Rainfall, turbidity, wind, and wave height were most commonly incorporated into final models, and most models used linear regression. Evidence supports use of real-time models of FIB levels as an indicator of water quality rather than or in addition to using persistence models. At locations with consistent monitoring of FIB, predictive models can improve the effectiveness and response times of risk communication with beachgoers about recreational water quality risks, which can help to potentially reduce water-borne illness. A risk of bias checklist was adapted for this review and identified common limitations in the literature. Future research may benefit from a risk of bias checklist intended for non-medical predictive models. This review provides insight for researchers and beach managers interested in creating their own predictive models in terms of key variables, modelling approaches, and bias-reduction techniques to consider. More research should be conducted to evaluate the effectiveness and utility of more advanced predictive modelling approaches such as artificial neural networks, Bayesian approaches, and other machine learning methods.

PRISMA checklist for systematic reviews components and location they can be found in the review.

(PDF) Click here for additional data file.

Search terms used in each database.

(PDF) Click here for additional data file.

Grey literature search of government websites and their URLs.

Searched December 10–14, 2020. (PDF) Click here for additional data file.

Eligibility criteria to define microbes of interest, geography, predictors of interest, and types of publications.

(PDF) Click here for additional data file.

Data extraction form, including primary outcomes and risk of bias questions.

(PDF) Click here for additional data file.

Descriptive summary of number of swimming seasons used for model building, number of beaches investigated, FIB of interest, geography of beaches, and type of publication of the 53 relevant studies.

(PDF) Click here for additional data file.

Frequency of variables explored in studies and used in a final model for predicting microbial water quality.

(PDF) Click here for additional data file.

Average accuracy of models that assessed accuracy and whether or not they performed better than persistence models.

(PDF) Click here for additional data file.

Risk-of-bias characteristics of 53 articles reporting on predictive models of fecal indicator bacteria using environmental predictors, excluding characteristics found in Table 1 of main text.

(PDF) Click here for additional data file.

Protocol for systematic literature review.

(PDF) Click here for additional data file. 22 Jul 2021 PONE-D-21-14797 Systematic Review of Predictive Models of Microbial Water Quality at Freshwater Recreational Beaches PLOS ONE Dear Dr. Heasley, Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process. Please submit your revised manuscript by Sep 05 2021 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file. Please include the following items when submitting your revised manuscript: A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'. A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'. An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'. If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter. If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols. We look forward to receiving your revised manuscript. Kind regards, Zaher Mundher Yaseen Academic Editor PLOS ONE Journal Requirements: When submitting your revision, we need you to address these additional requirements. 1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf 2. Please include captions for your Supporting Information files at the end of your manuscript, and update any in-text citations to match accordingly. Please see our Supporting Information guidelines for more information: http://journals.plos.org/plosone/s/supporting-information. [Note: HTML markup is below. Please do not edit.] Reviewers' comments: Reviewer's Responses to Questions Comments to the Author 1. Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented. Reviewer #1: Yes Reviewer #2: Yes Reviewer #3: Yes ********** 2. Has the statistical analysis been performed appropriately and rigorously? Reviewer #1: Yes Reviewer #2: Yes Reviewer #3: Yes ********** 3. Have the authors made all data underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified. Reviewer #1: Yes Reviewer #2: Yes Reviewer #3: Yes ********** 4. Is the manuscript presented in an intelligible fashion and written in standard English? PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here. Reviewer #1: Yes Reviewer #2: Yes Reviewer #3: Yes ********** 5. Review Comments to the Author Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters) Reviewer #1: Interactive comment on “Systematic Review of Predictive Models of Microbial Water Quality at Freshwater Recreational Beaches” by Heasley et al. Pr. Salim Heddam heddamsalim@yahoo.fr. https://orcid.org/0000-0002-8055-8463 The paper is very interesting, well written and well documented and easy to read. Few similar studies are available in the literature. In depth literature review was conducted and a useful taxonomy of predictive models of microbial water quality at freshwater (i.e. lakes and river) was provided, where standards regression and machines learning models are categorized into several groups based upon their: structure, input variables (i.e., predictor), location, number of beaches and swimming seasons, performances metrics and how the proposed models were validated against measured data, and especially, highlighting their practical benefit and environmental implications. Also, the present study provides a taxonomy that can be used to distinguish between simple, complicated, and complex models developed at different time scale. I have seen some important conclusion of the present study, which I really appreciate: (i) multiple linear regressions is used most often at nearly 70%, (ii) rainfall is reported as the most important weather variable used for model development, (iii) water turbidity (68%) and temperature (70%) are the most significant water quality variables selected as relevant predictors, (iv) in overall the proposed models were validated using R2 and the RMSE, while what surprised me mostly is that, the Nash-Sutcliffe efficiency was rarely adopted as performance metric for model validation, and (v) the most fecal indicator of interest is the E.coli. The reviewed papers were deeply analyzed and the reported results were scientifically discussed which made the manuscript very sound. I have read the paper several times, and I see that the authors have hardly worked for providing an excellent paper and to my scientific opinion it can be accepted without revision. Reviewer #2: The topic of this review (Systematic Review of Predictive Models of Microbial Water Quality at Freshwater Recreational Beaches) is interesting and informative for the readers. This manuscript has been written reasonably and satisfactorily, but further modifications are required before publication. My comments are: - The abstract needs improvements and add more mathematical findings to be more informative. - A list of abbreviations should be added. - A list of contents should be included. -The novelty of this work was not clearly presented. Please follow the literature review and show the knowledge gaps identified and link them to your research objectives. - The conclusion part needs improvement to make it more informative to the readership. Reviewer #3: Comments on the following manuscript: Systematic Review of Predictive Models of Microbial Water Quality at Freshwater Recreational Beaches The manuscript should be improved to visualize the results of the review conducted by the reviewers: Here my comments on the manuscript: Abstract: Comment No.1: Don’t use He, we in all the manuscript, revise the following sentences. And other sentences in all the manuscript. And so on. We conducted a systematic review of predictive models of fecal indicator bacteria at freshwater recreational sites in temperate climates to identify and describe the existing approaches, trends, and their performance to inform beach water management policies. We conducted a comprehensive search strategy, including five databases and grey literature, screened abstracts for relevance, and extracted data using structured forms. Data were descriptively summarized. Comment No.2: Add recommendation in the nd of the abstract for future research. Introduction: Comment No.3: Check that all references cited on the manuscript. Results: Comment No.4: Add column in the left of Table 1 indicates the number of study from 1 to 53 Comment No.5: Add column in the right in Table 1 shows the recommendations if found of each study Comment No.6: Add column in the center of Table 1 displays the limitations of using the proposed model of each study if found Comment No.7: Use pie chart and column charts beside with figures to visualize your results in Table 2, 3, and 4 Especially for Table No.4, for each section, new figure can be added Weather, hydrodynamic, contamination sources and others Comment No.8: Use pie chart to indicate results for each contour. Comment No.9: In your question: Were predictor weights or regression coefficients shrunk at all? Indicate why the results 100% for No answer Comment No.10: Add new section and talk about the accuracy and limitations of predictive models. Comment No.11: For results in table number 5, indicate the limitation with researchers comparing with governmental for the availability of data and its effect on results. Comment No.12: Add new table ion it compare between the used models and each limitations in prediction. Discussion: Provide reasons for the following conclusions: Comment No.13: Multiple linear regression methods were the most popular and were shown to produce accurate predictions. However, other methods may produce more accurate predictions. Comparing models built at different locations with different variables and rates of FIB exceedances would not yield accurate comparisons; however, four studies included in this review compared modelling techniques. Comment No.14: Add sentences indicate the less significant of other parameters when using the modeling rather than the parameters in the following sentences: Larger waves may also be responsible for washing bird fecal matter from the beach into the water [54]. Wind direction and speed are important explanatory variables as they are associated with driving FIB from sediments or point sources towards the beach [77,78]. Winds, waves, and turbidity are often correlated 277 parameters, as winds and waves churn sediments which increases turbidity [43,78]. Comment No.15: Give more information about the following sentence: Different geographical contexts require different approaches and variables, so it is important to explore these elements in new contexts. Conclusion: Comment No.16: Add one more recommendations for future studies in the end of conclusion section. Comment No.17: Add symbol section definition if available to you. ********** 6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy. Reviewer #1: Yes: Pr. Salim Heddam Reviewer #2: No Reviewer #3: No [NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.] While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step. Submitted filename: REVIEW NOTE-PONE-D-21-14797.docx Click here for additional data file. 12 Aug 2021 Please see attached response to reviewers with feedback to each specific point made by the editor and reviewers. Submitted filename: Response to Reviewers.docx Click here for additional data file. 16 Aug 2021 Systematic Review of Predictive Models of Microbial Water Quality at Freshwater Recreational Beaches PONE-D-21-14797R1 Dear Dr. Heasley, We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements. Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication. An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org. If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org. Kind regards, Zaher Mundher Yaseen Academic Editor PLOS ONE Additional Editor Comments (optional): Reviewers' comments: 18 Aug 2021 PONE-D-21-14797R1 Systematic review of predictive models of microbial water quality at freshwater recreational beaches Dear Dr. Heasley: I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department. If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org. If we can help with anything else, please email us at plosone@plos.org. Thank you for submitting your work to PLOS ONE and supporting open access. Kind regards, PLOS ONE Editorial Office Staff on behalf of Dr. Zaher Mundher Yaseen Academic Editor PLOS ONE
  39 in total

1.  Indicator bacteria at five swimming beaches-analysis using random forests.

Authors:  David F Parkhurst; Kristen P Brenner; Alfred P Dufour; Larry J Wymer
Journal:  Water Res       Date:  2005-02-08       Impact factor: 11.236

2.  Improving the robustness of beach water quality modeling using an ensemble machine learning approach.

Authors:  Leizhi Wang; Zhenduo Zhu; Lauren Sassoubre; Guan Yu; Chen Liao; Qingfang Hu; Yintang Wang
Journal:  Sci Total Environ       Date:  2020-10-17       Impact factor: 7.963

3.  Slow adoption of rapid testing: Beach monitoring and notification using qPCR.

Authors:  Abhilasha Shrestha; Samuel Dorevitch
Journal:  J Microbiol Methods       Date:  2020-05-19       Impact factor: 2.363

Review 4.  Nowcasting methods for determining microbiological water quality at recreational beaches and drinking-water source waters.

Authors:  Donna S Francy; Amie M G Brady; Jessica R Cicale; Harrison D Dalby; Erin A Stelzer
Journal:  J Microbiol Methods       Date:  2020-06-06       Impact factor: 2.363

5.  Comparison of different model approaches for a hygiene early warning system at the lower Ruhr River, Germany.

Authors:  Hans-Joachim Mälzer; Tim Aus der Beek; Silke Müller; Jörg Gebhardt
Journal:  Int J Hyg Environ Health       Date:  2015-06-23       Impact factor: 5.840

6.  Impact of Escherichia coli from stormwater drainage on recreational water quality: an integrated monitoring and modelling of urban catchment, pipes and lake.

Authors:  Yi Hong; Frédéric Soulignac; Adélaïde Roguet; Chenlu Li; Bruno J Lemaire; Rodolfo Scarati Martins; Françoise Lucas; Brigitte Vinçon-Leite
Journal:  Environ Sci Pollut Res Int       Date:  2020-09-02       Impact factor: 4.223

Review 7.  Sunlight-mediated inactivation of health-relevant microorganisms in water: a review of mechanisms and modeling approaches.

Authors:  Kara L Nelson; Alexandria B Boehm; Robert J Davies-Colley; Michael C Dodd; Tamar Kohn; Karl G Linden; Yuanyuan Liu; Peter A Maraccini; Kristopher McNeill; William A Mitch; Thanh H Nguyen; Kimberly M Parker; Roberto A Rodriguez; Lauren M Sassoubre; Andrea I Silverman; Krista R Wigginton; Richard G Zepp
Journal:  Environ Sci Process Impacts       Date:  2018-08-16       Impact factor: 4.238

8.  Preferred reporting items for systematic review and meta-analysis protocols (PRISMA-P) 2015: elaboration and explanation.

Authors:  Larissa Shamseer; David Moher; Mike Clarke; Davina Ghersi; Alessandro Liberati; Mark Petticrew; Paul Shekelle; Lesley A Stewart
Journal:  BMJ       Date:  2015-01-02

9.  Estimate of incidence and cost of recreational waterborne illness on United States surface waters.

Authors:  Stephanie DeFlorio-Barker; Coady Wing; Rachael M Jones; Samuel Dorevitch
Journal:  Environ Health       Date:  2018-01-09       Impact factor: 5.984

View more
  2 in total

1.  Region-Specific Associations between Environmental Factors and Escherichia coli in Freshwater Beaches in Toronto and Niagara Region, Canada.

Authors:  Johanna Sanchez; Jordan Tustin; Cole Heasley; Mahesh Patel; Jeremy Kelly; Anthony Habjan; Ryan Waterhouse; Ian Young
Journal:  Int J Environ Res Public Health       Date:  2021-12-06       Impact factor: 3.390

2.  Assessment of a Multifunctional River Using Fuzzy Comprehensive Evaluation Model in Xiaoqing River, Eastern China.

Authors:  Yongfei Fu; Yuyu Liu; Shiguo Xu; Zhenghe Xu
Journal:  Int J Environ Res Public Health       Date:  2022-09-27       Impact factor: 4.614

  2 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.