Leizhi Wang, Zhenduo Zhu, Lauren Sassoubre, Guan Yu, Chen Liao, Qingfang Hu, Yintang Wang.
Abstract
Microbial pollution of beach water can expose swimmers to harmful pathogens. Predictive modeling offers an alternative approach to beach management that addresses several limitations of traditional culture-based methods for assessing water quality. However, the performance of widely used machine learning methods often varies considerably from one year or beach to another, so the best method differs between beaches and years, which makes method selection difficult. This study proposes an ensemble machine learning approach, referred to as model stacking, that has a two-layer learning structure: the outputs of five widely used individual machine learning models (multiple linear regression, partial least squares, sparse partial least squares, random forest, and Bayesian network) are taken as input features for another model that produces the final prediction. Applying this approach to three beaches along eastern Lake Erie, New York, USA, we show that the model stacking approach generally produced reliably good predictions compared with all five base models. The stacking model consistently ranked first or second in accuracy every year, with yearly average accuracies of 78%, 81%, and 82.3% at the three studied beaches, respectively. This study highlights the value of the model stacking approach for predicting beach water quality and for addressing other pressing environmental problems.
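The two-layer stacking structure described in the abstract can be sketched in pure Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the base learners here (a mean predictor, single-feature least-squares lines, and k-nearest-neighbour regression) are simple stand-ins for the study's five models, the second-layer model is assumed to be a ridge-stabilized linear regression (the abstract only says "another model"), and the 5-fold out-of-fold scheme and all function names are hypothetical. Training the meta-model on out-of-fold predictions is a common stacking practice that keeps it from simply rewarding the base model with the best in-sample fit.

```python
import math
import random
from statistics import fmean


def mean_learner(X, y):
    """Baseline learner: always predicts the training-set mean."""
    m = fmean(y)
    return lambda x: m


def linear_1d_learner(j):
    """Least-squares line fitted on feature j only (a hypothetical stand-in for MLR)."""
    def fit(X, y):
        xs = [row[j] for row in X]
        mx, my = fmean(xs), fmean(y)
        sxx = sum((a - mx) ** 2 for a in xs)
        sxy = sum((a - mx) * (b - my) for a, b in zip(xs, y))
        slope = sxy / sxx if sxx else 0.0
        return lambda x: my + slope * (x[j] - mx)
    return fit


def knn_learner(k):
    """k-nearest-neighbour regression using Euclidean distance."""
    def fit(X, y):
        pts = list(zip(X, y))
        def predict(x):
            nearest = sorted(pts, key=lambda p: math.dist(p[0], x))[:k]
            return fmean(t for _, t in nearest)
        return predict
    return fit


def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (M[r][n] - sum(M[r][k] * w[k] for k in range(r + 1, n))) / M[r][r]
    return w


def linear_meta_learner(Z, y, lam=1e-3):
    """Assumed second-layer model: ridge-stabilized linear regression on base outputs.

    The intercept (column 0) is not penalized."""
    rows = [[1.0] + list(z) for z in Z]
    d = len(rows[0])
    A = [[sum(r[i] * r[j] for r in rows) + (lam if i == j and i > 0 else 0.0)
          for j in range(d)] for i in range(d)]
    b = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(d)]
    w = solve(A, b)
    return lambda z: w[0] + sum(wi * zi for wi, zi in zip(w[1:], z))


def fit_stacking(X, y, base_learners, n_folds=5):
    """Two-layer stacking: base-model outputs become the meta-model's input features."""
    n = len(y)
    # Layer 1: out-of-fold predictions, so the meta-model never sees a base
    # model's predictions on the same rows that base model was trained on.
    Z = [[0.0] * len(base_learners) for _ in range(n)]
    for f in range(n_folds):
        train = [i for i in range(n) if i % n_folds != f]
        held_out = [i for i in range(n) if i % n_folds == f]
        for m, learn in enumerate(base_learners):
            model = learn([X[i] for i in train], [y[i] for i in train])
            for i in held_out:
                Z[i][m] = model(X[i])
    # Layer 2: fit the meta-model on the out-of-fold predictions.
    meta = linear_meta_learner(Z, y)
    # Refit every base model on all data for use at prediction time.
    bases = [learn(X, y) for learn in base_learners]
    return lambda x: meta([b(x) for b in bases])


if __name__ == "__main__":
    # Hypothetical synthetic data with two made-up predictors; not data from the study.
    rng = random.Random(0)
    X = [[rng.uniform(0, 10), rng.uniform(0, 5)] for _ in range(60)]
    y = [1.5 * a - 0.8 * b + rng.gauss(0, 0.5) for a, b in X]
    learners = [mean_learner, linear_1d_learner(0), linear_1d_learner(1), knn_learner(3)]
    model = fit_stacking(X, y, learners)
    print(round(model([5.0, 2.5]), 2))
```

Because the assumed meta-model is linear, the stacked prediction here is a learned weighted combination of the base predictions; a nonlinear second-layer model could be dropped in with the same out-of-fold pattern.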
Keywords: E. coli; Fecal indicator bacteria; Machine learning model; Model stacking; Water quality
Year: 2020 PMID: 33131841 DOI: 10.1016/j.scitotenv.2020.142760
Source DB: PubMed Journal: Sci Total Environ ISSN: 0048-9697 Impact factor: 7.963