Lucas Silveira Kupssinskü, Tainá Thomassim Guimarães, Eniuce Menezes de Souza, Daniel C Zanotta, Mauricio Roberto Veronez, Luiz Gonzaga, Frederico Fábio Mauad.
Abstract
Total Suspended Solids (TSS) and chlorophyll-a concentration are two critical parameters for monitoring water quality. Because collecting samples directly for laboratory analysis can be expensive, this paper presents a methodology to estimate these parameters through remote sensing and Machine Learning (ML) techniques. TSS and chlorophyll-a are optically active components, which makes them measurable by remote sensing. Two case studies are carried out in distinct water bodies, using data of different spatial resolution from Sentinel-2 spectral images and unmanned aerial vehicles together with laboratory analysis data. Following the methodology, supervised ML algorithms are trained to predict the concentrations of TSS and chlorophyll-a. The predictions are evaluated separately in both study areas, where both the TSS and chlorophyll-a models achieved R-squared values above 0.8.
Keywords: K nearest neighbors; artificial neural networks; chlorophyll-a; machine learning; random forest; remote sensing; total suspended solids; water quality
Year: 2020 PMID: 32283787 PMCID: PMC7181123 DOI: 10.3390/s20072125
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Figure 1. Flowchart of the proposed method.
Figure 2. Train-test and cross-validation split methodology used in this paper. The first operation is a train-test split that randomly assigns 90% of the data to training and 10% to testing. The second operation is a validation split that uses only the training portion from the previous step and produces ten different folds of training and validation splits.
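The two-stage split described in the caption can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function name `train_test_cv_split`, the fixed seed, and the strided way folds are formed are hypothetical choices, not details taken from the paper.

```python
import random


def train_test_cv_split(n_samples, test_fraction=0.1, n_folds=10, seed=0):
    """Return (test_idx, folds); folds is a list of (fit_idx, val_idx) pairs.

    Hypothetical helper: the paper states a 90/10 split followed by ten
    folds, but not how the folds were drawn.
    """
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)

    # Stage 1: hold out a random 10% of the samples as the test set.
    n_test = round(n_samples * test_fraction)
    test_idx = indices[:n_test]
    train_idx = indices[n_test:]

    # Stage 2: build ten training/validation folds from the training part only.
    folds = []
    for k in range(n_folds):
        val_idx = train_idx[k::n_folds]  # every n_folds-th sample, offset k
        val_set = set(val_idx)
        fit_idx = [i for i in train_idx if i not in val_set]
        folds.append((fit_idx, val_idx))
    return test_idx, folds


test_idx, folds = train_test_cv_split(n_samples=1000)
```

The test indices never appear in any fold, so the held-out 10% is untouched until the final evaluation.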
Figure 3. Study areas.
Descriptive statistics of the field data and of the data after IDW interpolation.
| Parameter | Statistic | Field Data (Broa) | Field Data (Unisinos) | IDW (Broa) | IDW (Unisinos) |
|---|---|---|---|---|---|
| | Sample Size | 90 | 42 | 3,933 | 35,028 |
| Chlorophyll-a | Mean | 7.35 | 94.30 | 7.18 | 96.88 |
| Chlorophyll-a | Median | 7.20 | 67.52 | 6.71 | 91.65 |
| Chlorophyll-a | Mode | 6–10 | 50–150 | 5.5–10 | 44–137 |
| Chlorophyll-a | Standard Deviation | 2.66 | 59.04 | 2.46 | 62.12 |
| Total Suspended Solids | Mean | 3.05 | 14.93 | 2.92 | 15.56 |
| Total Suspended Solids | Median | 2.92 | 14.92 | 2.91 | 15.34 |
| Total Suspended Solids | Mode | 3 | 15 | 3 | 16 |
| Total Suspended Solids | Standard Deviation | 1.26 | 3.29 | 1.12 | 3.19 |
| pH | Mean | 5.70 | 8.67 | 5.69 | 8.89 |
| pH | Median | 5.84 | 8.85 | 5.81 | 9.15 |
| pH | Standard Deviation | 0.55 | 0.82 | 0.53 | 0.68 |
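The IDW columns refer to inverse distance weighting, which estimates a value at an unsampled location as a distance-weighted average of the field samples. A minimal sketch follows, assuming planar coordinates and a power parameter of 2; the record does not state the power or any neighbourhood limits the authors used, and the function name `idw_interpolate` is hypothetical.

```python
import math


def idw_interpolate(samples, query, power=2.0):
    """Estimate the value at `query` from `samples` by inverse distance weighting.

    samples: list of ((x, y), value) pairs; query: (x, y).
    power=2.0 is an assumed default, not a value taken from the paper.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for (x, y), value in samples:
        distance = math.hypot(query[0] - x, query[1] - y)
        if distance == 0.0:
            # The query coincides with a sample point: return it unchanged.
            return value
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


# Illustrative data only: two made-up sample points on a line.
example_samples = [((0.0, 0.0), 2.0), ((2.0, 0.0), 4.0)]
midpoint_estimate = idw_interpolate(example_samples, (1.0, 0.0))
```

Because every estimate is a weighted average, interpolated values stay within the range of the field samples, which is consistent with the slightly smaller standard deviations reported for most IDW columns.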
Figure 4. Distribution of the dependent variables in both study areas, based on the data after IDW interpolation.
Mean values for the evaluated metrics in the cross-validation step. The highest values for [metric symbol missing] are presented in boldface.
| Algorithm | Metric | Unisinos Chl-a | Unisinos TSS | Broa Chl-a | Broa TSS |
|---|---|---|---|---|---|
| Linear regression | first metric | 0.31521 | 0.22380 | 0.35389 | 0.39648 |
| Linear regression | second metric | 0.03642 | 0.04681 | 0.01642 | 0.01345 |
| LASSO | first metric | 0.31517 | 0.22379 | 0.35126 | 0.39457 |
| LASSO | second metric | 0.03643 | 0.04681 | 0.01649 | 0.01349 |
| KNN | first metric | 0.89644 | 0.85111 | 0.76172 | 0.72900 |
| KNN | second metric | 0.00549 | 0.00895 | 0.00607 | 0.00602 |
| SVR | first metric | 0.63457 | 0.56024 | 0.41329 | 0.42219 |
| SVR | second metric | 0.01942 | 0.02650 | 0.01491 | 0.01290 |
| RF | first metric | 0.90012 | [missing] | [missing] | [missing] |
| RF | second metric | 0.00531 | 0.00867 | 0.00450 | 0.00415 |
| ANN | first metric | [missing] | 0.85371 | 0.72573 | 0.67819 |
| ANN | second metric | 0.00524 | 0.00882 | 0.00700 | 0.00719 |
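The abstract reports R-squared as an evaluation measure. As a hedged illustration of how a per-fold R-squared could be computed and then averaged across cross-validation folds, the sketch below uses the standard definition of the coefficient of determination; the helper names `r_squared` and `mean_over_folds` and the sample numbers are hypothetical, not the authors' code or data.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mean_over_folds(fold_results):
    """Average R-squared over (y_true, y_pred) pairs, one pair per fold."""
    scores = [r_squared(y_true, y_pred) for y_true, y_pred in fold_results]
    return sum(scores) / len(scores)


# Made-up values for illustration: one perfect fold and one fold that
# always predicts the mean of the observations.
perfect_fold = ([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
mean_only_fold = ([1.0, 2.0, 3.0], [2.0, 2.0, 2.0])
average_score = mean_over_folds([perfect_fold, mean_only_fold])
```

A model that always predicts the mean scores 0 and a perfect model scores 1, which gives a sense of scale for the values in the table.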
Figure 5. Boxplot of the metric obtained in cross-validation; the metric on the test set is marked with a star.
Figure 6. Comparison between the maps generated by the random forest algorithm and through IDW interpolation.
Figure 7. Comparison of field measurements with the random forest results.