Sina Keller, Philipp M Maier, Felix M Riese, Stefan Norra, Andreas Holbach, Nicolas Börsig, Andre Wilhelms, Christian Moldaenke, André Zaake, Stefan Hinz.
Abstract
Inland waters are of great importance to scientists and authorities alike, since they are essential ecosystems and well known for their biodiversity. In situ measurements of water quality parameters are spatially limited, costly and time-consuming. In this paper, we propose combining hyperspectral data with machine learning methods to estimate, and thereby monitor, different water quality parameters. In contrast to commonly applied techniques such as band ratios, this approach is data-driven and does not rely on any domain knowledge. We focus on CDOM, chlorophyll a and turbidity as well as the concentrations of two algae types, diatoms and green algae. To investigate the potential of our proposal, we rely on measured data, which we sampled with three different sensors on the river Elbe in Germany from 24 June to 12 July 2017. The measurement setup with two probe sensors and a hyperspectral sensor is described in detail. To estimate the five variables, we present a regression framework involving ten machine learning models and two preprocessing methods. This allows the regression performance of each model and variable to be evaluated. The best-performing model for each variable achieves a coefficient of determination R² in the range of 89.9% to 94.6%, which reveals the potential of machine learning approaches with hyperspectral data. In further investigations, we focus on the generalization of the regression framework to prepare its application to different types of inland waters.
Keywords: algae; chlorophyll a; field campaign; fluorometer; hyperspectral data; machine learning; multi-sensor system; regression; spectral features; water quality parameters
Year: 2018 PMID: 30200256 PMCID: PMC6164519 DOI: 10.3390/ijerph15091881
Source DB: PubMed Journal: Int J Environ Res Public Health ISSN: 1660-4601 Impact factor: 3.390
Figure 1. Map of the study area with the probe measurements.
Figure 2. Application of the Biofish sensor system (white floating cylinder) from the research vessel Elbegrund in the river Elbe.
Number of datapoints per water quality parameter as the target variable and dataset.
| Water Quality Parameter | Full Dataset | Training Subset | Test Subset |
|---|---|---|---|
| CDOM | 802 | 240 | 562 |
| Chlorophyll | 1035 | 339 | 696 |
| Green algae | 1028 | 336 | 692 |
| Diatoms | 1012 | 332 | 680 |
| Turbidity | 802 | 240 | 562 |
Figure 3. Distributions of the water quality parameter values. Each full dataset (blue bars) is split randomly into a training (orange) and a test (red) subset. The number of datapoints is symbolized as N.
Figure 4. Schematic representation of the regression framework (adapted from [23]). The water quality parameter is either CDOM, chlorophyll a, green algae, diatoms or turbidity. * PCA is applied solely on the hyperspectral vector data.
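A random split into training and test subsets, as described in the caption above, can be sketched as follows. This is a minimal illustration and not the authors' code: the function name `random_split` and the seed are assumptions, and only the subset sizes (802 datapoints, 240 for training, 562 for testing) are taken from the CDOM row of the table above.

```python
import random


def random_split(n_samples, n_train, seed=0):
    """Return disjoint, randomly chosen training and test index lists."""
    if not 0 <= n_train <= n_samples:
        raise ValueError("n_train must lie between 0 and n_samples")
    indices = list(range(n_samples))
    # A seeded generator keeps the split reproducible (the seed is an assumption).
    random.Random(seed).shuffle(indices)
    return sorted(indices[:n_train]), sorted(indices[n_train:])


# Subset sizes for CDOM as listed in the table above.
train_idx, test_idx = random_split(n_samples=802, n_train=240)
print(len(train_idx), len(test_idx))
```

Every datapoint lands in exactly one subset, so a model fitted on the training indices is never evaluated on data it has already seen.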
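The idea of evaluating every combination of preprocessing method and model on a held-out test subset can be sketched in a few lines. The sketch below is a toy stand-in under stated assumptions: the data are synthetic, a mean predictor and a small k-NN regressor stand in for the ten models, PCA is omitted, and all function names are hypothetical rather than taken from the paper.

```python
import math
import random


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def identity(x_train, x_test):
    """Baseline: pass the features through unchanged."""
    return x_train, x_test


def min_max(x_train, x_test):
    """Scale each feature to [0, 1] with bounds fitted on the training subset only."""
    lo = [min(col) for col in zip(*x_train)]
    hi = [max(col) for col in zip(*x_train)]

    def scale(rows):
        return [[(v - l) / (h - l) if h > l else 0.0
                 for v, l, h in zip(row, lo, hi)] for row in rows]

    return scale(x_train), scale(x_test)


def mean_model(x_train, y_train, x_test):
    """Trivial reference model: always predict the training mean."""
    mean = sum(y_train) / len(y_train)
    return [mean] * len(x_test)


def knn_model(x_train, y_train, x_test, k=3):
    """k-nearest-neighbour regression with Euclidean distance."""
    preds = []
    for x in x_test:
        order = sorted(range(len(x_train)),
                       key=lambda i: math.dist(x, x_train[i]))
        preds.append(sum(y_train[i] for i in order[:k]) / k)
    return preds


# Synthetic stand-in data: two features on very different scales.
rng = random.Random(0)
x = [[rng.uniform(0, 1), rng.uniform(0, 1000)] for _ in range(200)]
y = [3.0 * a + 0.002 * b + rng.gauss(0, 0.05) for a, b in x]
x_train, y_train, x_test, y_test = x[:60], y[:60], x[60:], y[60:]

preprocessors = {"baseline": identity, "scaling": min_max}
models = {"mean": mean_model, "k-NN": knn_model}

# Evaluate each (preprocessing, model) pair on the same test subset.
results = {}
for prep_name, prep in preprocessors.items():
    xt, xs = prep(x_train, x_test)
    for model_name, model in models.items():
        results[(prep_name, model_name)] = r2_score(y_test, model(xt, y_train, xs))

for key, score in sorted(results.items()):
    print(key, round(100 * score, 1))
```

Fitting the scaling bounds on the training subset alone is a design choice of this sketch; it keeps information from the test subset out of the preprocessing step.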
Regression results for the estimation of CDOM, chlorophyll a, green algae, diatoms and turbidity. The bold values represent the best regression results, respectively. The RMSE is given in the respective units of the target variables. After applying min-max scaling, the RMSE is unitless.
| Variable | Model | Baseline R² (%) | Baseline RMSE | PCA R² (%) | PCA RMSE | Scaling R² (%) | Scaling RMSE |
|---|---|---|---|---|---|---|---|
| CDOM | Linear | 74.3 | 1.05 | 83.2 | 0.85 | 74.3 | 0.12 |
| | PLS | 84.9 | 0.81 | 83.2 | 0.85 | 84.9 | 0.09 |
| | RF | 82.4 | 0.87 | 91.4 | 0.61 | 82.4 | 0.10 |
| | ET | 86.2 | 0.77 | | | 86.3 | 0.09 |
| | AdaBoost | 80.0 | 0.93 | 91.9 | 0.59 | 79.9 | 0.11 |
| | GB | 80.1 | 0.93 | 91.2 | 0.61 | 80.0 | 0.11 |
| | k-NN | 85.5 | 0.79 | 85.3 | 0.80 | 83.0 | 0.10 |
| | SVM | | | 91.2 | 0.61 | 85.6 | 0.09 |
| | ANN | 87.2 | 0.74 | 50.8 | 1.44 | | |
| | SOM | 85.8 | 0.78 | 83.5 | 0.84 | 83.0 | 0.10 |
| Chlorophyll | Linear | 70.2 | 18.45 | 75.5 | 16.70 | 70.2 | 0.14 |
| | PLS | 73.5 | 17.38 | 75.5 | 16.70 | 73.5 | 0.13 |
| | RF | 76.5 | 16.36 | 88.7 | 11.35 | 76.6 | 0.12 |
| | ET | 80.0 | 15.10 | | | 80.0 | 0.11 |
| | AdaBoost | 68.7 | 18.89 | 80.0 | 15.11 | 66.7 | 0.14 |
| | GB | 76.5 | 16.36 | 89.4 | 10.98 | 75.5 | 0.12 |
| | k-NN | 76.1 | 16.51 | 76.6 | 16.34 | 75.4 | 0.12 |
| | SVM | | | 90.0 | 10.71 | 87.6 | 0.09 |
| | ANN | 67.3 | 19.12 | 90.5 | 10.40 | | |
| | SOM | 74.3 | 17.12 | 74.7 | 16.99 | 71.5 | 0.13 |
| Green algae | Linear | 49.7 | 14.42 | 62.3 | 12.49 | 49.7 | 0.18 |
| | PLS | 62.6 | 12.44 | 62.3 | 12.49 | 62.6 | 0.15 |
| | RF | 69.6 | 11.21 | 81.6 | 8.73 | 69.6 | 0.14 |
| | ET | 73.1 | 10.55 | | | 73.2 | 0.13 |
| | AdaBoost | 60.5 | 12.78 | 75.6 | 10.05 | 61.7 | 0.15 |
| | GB | 67.0 | 11.68 | 80.6 | 8.95 | 67.1 | 0.14 |
| | k-NN | 68.8 | 11.35 | 68.6 | 11.40 | 68.0 | 0.14 |
| | SVM | | | 79.7 | 9.18 | | |
| | ANN | 56.8 | 13.34 | 81.3 | 8.79 | 75.9 | 0.12 |
| | SOM | 64.8 | 12.06 | 64.3 | 12.15 | 66.2 | 0.15 |
| Diatoms | Linear | 55.0 | 10.51 | 62.4 | 9.60 | 55.0 | 0.15 |
| | PLS | 58.8 | 10.06 | 62.4 | 9.60 | 58.8 | 0.15 |
| | RF | 68.2 | 8.84 | 81.8 | 6.68 | 68.2 | 0.13 |
| | ET | 72.7 | 8.19 | 86.4 | 5.78 | 72.7 | 0.12 |
| | AdaBoost | 56.7 | 10.31 | 76.3 | 7.62 | 56.8 | 0.15 |
| | GB | 68.0 | 8.87 | 81.5 | 6.73 | 67.2 | 0.13 |
| | k-NN | 68.6 | 8.78 | 68.6 | 8.78 | 67.7 | 0.13 |
| | SVM | | | 78.2 | 7.32 | | |
| | ANN | 62.4 | 9.60 | | | 79.8 | 0.10 |
| | SOM | 64.0 | 9.40 | 64.2 | 9.38 | 63.8 | 0.14 |
| Turbidity | Linear | 45.0 | 0.44 | 70.9 | 0.32 | 44.0 | 0.19 |
| | PLS | 73.3 | 0.30 | 70.9 | 0.32 | 73.3 | 0.13 |
| | RF | 68.0 | 0.33 | 84.1 | 0.24 | 67.7 | 0.15 |
| | ET | 73.6 | 0.30 | | | 73.0 | 0.14 |
| | AdaBoost | 67.6 | 0.34 | 85.2 | 0.23 | 66.4 | 0.15 |
| | GB | 69.2 | 0.33 | 85.5 | 0.22 | 69.3 | 0.14 |
| | k-NN | 72.5 | 0.31 | 72.8 | 0.31 | 70.7 | 0.14 |
| | SVM | | | 74.3 | 0.30 | 73.3 | 0.13 |
| | ANN | 79.9 | 0.26 | 88.4 | 0.20 | | |
| | SOM | 76.0 | 0.29 | 71.6 | 0.31 | 73.0 | 0.14 |
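The two metrics in the table, R² and RMSE, react differently to min-max scaling, which the following sketch illustrates. It assumes that measured and estimated values are scaled with the same bounds, taken from the measured values. The numbers are hypothetical and not from the study, and the helper names are illustrative.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error, in the units of the target variable."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def min_max_scale(values, lo, hi):
    """Map values linearly so that lo -> 0 and hi -> 1."""
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical measured and estimated values (not data from the study).
y_true = [10.0, 20.0, 30.0, 40.0, 50.0]
y_pred = [12.0, 18.0, 33.0, 39.0, 47.0]

# Scale both series with the bounds of the measured values (an assumption).
lo, hi = min(y_true), max(y_true)
y_true_scaled = min_max_scale(y_true, lo, hi)
y_pred_scaled = min_max_scale(y_pred, lo, hi)

print("R2   unscaled:", r2(y_true, y_pred), "scaled:", r2(y_true_scaled, y_pred_scaled))
print("RMSE unscaled:", rmse(y_true, y_pred), "scaled:", rmse(y_true_scaled, y_pred_scaled))
```

Under this assumption R² is unchanged by the scaling, while the RMSE is divided by the value range (hi - lo) and loses its unit. Scaled and unscaled RMSE values are therefore not directly comparable.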
Figure 5. Visualization of the regression results generated by the ET model (central columns) compared to the real probe measurements (left columns) matched with their respective recorded GPS data along the river Elbe. The min-max scaled deviations between the measured (meas.) and the estimated (est.) values of the water quality parameters are illustrated in the right columns. We refer to the chlorophyll a concentration in this plot as Chl-a.
Figure 6. Feature importance of the ET regressor without preprocessing (baseline). The upper plot represents the mean spectrum of the hyperspectral data.