Zheng Xing, Junying Chen, Xiao Zhao, Yu Li, Xianwen Li, Zhitao Zhang, Congcong Lao, Haifeng Wang.
Abstract
Water pollution hinders sustainable development worldwide. Accurate inversion of water quality parameters in sewage with visible-near infrared spectroscopy can improve the effective and rational use and management of water resources. However, the accuracy of spectral models of water quality parameters is often degraded by noise and by the high dimensionality of spectral data. This study aimed to improve model accuracy by optimizing the spectral models based on the sensitive spectral intervals of different water quality parameters. To this end, six kinds of sewage water taken from a biological sewage treatment plant underwent laboratory physical and chemical tests. In total, 87 samples of sewage water were obtained by adding different amounts of pure water to them. The raw reflectance (Rraw) of the samples was collected with analytical spectral devices. Rraw-SNV was obtained by processing Rraw with the standard normal variable transformation. Then, the sensitive spectral intervals of each of the six water quality parameters, namely chemical oxygen demand (COD), biological oxygen demand (BOD), NH3-N, total dissolved substances (TDS), total hardness (TH) and total alkalinity (TA), were selected using three methods: gray correlation (GC), variable importance in projection (VIP) and set pair analysis (SPA). Finally, the performance of extreme learning machine (ELM) and partial least squares regression (PLSR) models built on the sensitive spectral intervals was investigated. The results showed that model accuracy differed with the method used to screen the sensitive spectral ranges. The GC method performed better at reducing redundancy, and the VIP method was better at preserving information. The SPA method struck the best trade-off between information preservation and redundancy reduction, and it retained the largest number of spectral band intervals that respond well to the inversion parameters. Ranked by the accuracy of the resulting models, GC was highest, SPA came next and VIP was lowest. Overall, both PLSR and ELM achieved satisfactory model accuracy, but the prediction accuracy of ELM was higher than that of PLSR. The optimal inversion accuracy differed greatly among the water quality parameters: it was very high for COD, BOD and NH3-N, relatively high for TA, and relatively low for TDS and TH. These findings offer a new way to optimize spectral models of wastewater biochemical parameters and thus improve their prediction precision.
Keywords: Band screen; Extreme learning machine; Gray correlation method; Hyperspectral remote sensing; Set pair analysis; Variable importance projection analysis
Year: 2019 PMID: 31844597 PMCID: PMC6911691 DOI: 10.7717/peerj.8255
Source DB: PubMed Journal: PeerJ ISSN: 2167-8359 Impact factor: 2.984
Main water quality parameters.
| Parameter | Water inlet | Anoxic tank | Aerobic tank | Sedimentation tank | Water outlet | Test method |
|---|---|---|---|---|---|---|
| NH3-N (mg/L) | 34.853 | 1.723 | 1.499 | 0 | 0 | Nessler’s reagent spectrophotometry, using a 722N visible-light spectrophotometer |
| Total alkalinity (mg/L) | 251.70 | 147.02 | 148.20 | 101.15 | 103.50 | Acid-base indicator titration |
| Total hardness (mmol/L) | 1.09 | 1.13 | 1.13 | 1.17 | 1.11 | EDTA titration (GB11914-1989) |
| Total dissolved substances (mg/L) | 351 | 323 | 317 | 344 | 343 | Gravimetric method (GB/T 5750.5-2006) |
| COD (mg/L) | 425 | 140 | 134 | 23 | 20 | Dichromate method (GB11914-1989), using a standard COD digestion apparatus (K-100) |
| BOD (mg/L) | 86 | 8.5 | 13 | 3.1 | 6.2 | Dilution and inoculation method (HJ505-2009) for BOD5, using a constant-temperature incubator (HWS-150 type) |
Figure 1. Spectral data curves.
(A) Reflectance spectral curves. (B) Standard normal variable reflectance curves.
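The preprocessing behind panel (B) can be illustrated with a minimal sketch. It assumes the usual per-spectrum form of the transformation: each spectrum is centred on its own mean and divided by its own sample standard deviation. The function name `snv` and the reflectance values are hypothetical and are not taken from the study.

```python
from statistics import mean, stdev

def snv(spectrum):
    """Centre one spectrum on its own mean and scale it by its own
    sample standard deviation (assumed form of the transformation)."""
    m = mean(spectrum)
    s = stdev(spectrum)  # sample standard deviation (n - 1 denominator)
    if s == 0:
        raise ValueError("flat spectrum: standard deviation is zero")
    return [(r - m) / s for r in spectrum]

# Hypothetical reflectance values for two samples (not data from the study).
raw = [
    [0.10, 0.12, 0.15, 0.11],
    [0.20, 0.24, 0.30, 0.22],  # same shape as the first, doubled intensity
]
corrected = [snv(row) for row in raw]
```

Because the second hypothetical spectrum is a scaled copy of the first, both rows of `corrected` coincide, which is the kind of multiplicative intensity difference this preprocessing is meant to remove.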
Figure 2. Correlation coefficients of water quality parameters with standard normal variable reflectance.
Figure 3. Gray correlation degree (GCD) for water quality parameters with standard normal variable reflectance.
Maximum gray correlation degree (GCD) and band intervals of water quality parameters with standard normal variable reflectance.
| Water quality parameters | Sensitive band numbers | Maximum GCD | Maximum GCD band intervals (nm) |
|---|---|---|---|
| COD | 49 | 0.7949 | 820–830 |
| BOD | 50 | 0.7974 | 820–830 |
| NH3-N | 46 | 0.7973 | 820–830 |
| TA | 93 | 0.7878 | 820–830 |
| TDS | 381 | 0.802 | 815–825/766 |
| TH | 601 | 0.8504 | 815–825 |
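A gray correlation degree of the kind tabulated above can be sketched as follows. This is only an illustration under stated assumptions: it uses the classical gray relational formulation with mean-value normalisation and a resolution coefficient of 0.5, neither of which is specified in this record. The function names and all numbers are hypothetical.

```python
def mean_normalise(seq):
    """Divide a sequence by its mean so sequences on different scales are
    comparable (an assumed normalisation; it needs a non-zero mean)."""
    m = sum(seq) / len(seq)
    return [v / m for v in seq]

def gray_relational_degrees(reference, comparisons, rho=0.5):
    """Return one gray relational degree per comparison sequence.

    reference   -- parameter values across samples
    comparisons -- one sequence per band, each across the same samples
    rho         -- resolution coefficient (0.5 is the customary choice)
    """
    ref = mean_normalise(reference)
    deltas = [
        [abs(r - c) for r, c in zip(ref, mean_normalise(comp))]
        for comp in comparisons
    ]
    d_min = min(min(row) for row in deltas)
    d_max = max(max(row) for row in deltas)
    if d_max == 0:  # every comparison sequence matches the reference
        return [1.0 for _ in deltas]
    return [
        sum((d_min + rho * d_max) / (d + rho * d_max) for d in row) / len(row)
        for row in deltas
    ]

# Hypothetical values: one parameter across four samples and three bands.
reference = [20.0, 134.0, 140.0, 425.0]
bands = [
    [2.0, 13.4, 14.0, 42.5],   # proportional to the reference
    [0.30, 0.28, 0.31, 0.29],  # nearly flat
    [0.9, 0.5, 0.4, 0.1],      # opposite trend
]
gcd = gray_relational_degrees(reference, bands)
```

Bands whose degree is highest would be the candidates for a sensitive interval. Mean-value normalisation is not suitable when a sequence has a mean close to zero, so a different normalisation would be needed in that case.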
Figure 4. The VIP scores of water quality parameters with standard normal variable reflectance.
The maximum VIP scores of water quality parameters with standard normal variable reflectance.
| Water quality parameters | Sensitive band numbers | Maximum VIP scores | Maximum VIP band interval (nm) | Maximum VIP band (nm) |
|---|---|---|---|---|
| COD | 753 | 1.634 | 465–475 | 466 |
| BOD | 770 | 1.439 | 464–474 | 762 |
| NH3-N | 768 | 1.397 | 990–999 | 999 |
| TA | 709 | 2.466 | 460–470 | 464 |
| TDS | 497 | 4.275 | 460–470 | 463 |
| TH | 543 | 4.893 | 460–470 | 463 |
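For readers unfamiliar with VIP scores, the sketch below shows one common way to compute them from a single-response PLS model fitted with NIPALS. It is an assumption-laden illustration, not the authors' implementation: the function name `pls1_vip`, the synthetic data and the VIP > 1 cut-off (a widespread convention that this record does not state) are all hypothetical.

```python
import numpy as np

def pls1_vip(X, y, n_components):
    """Fit PLS1 by NIPALS and return the VIP score of every column of X."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xr = X - X.mean(axis=0)          # centred predictors, deflated below
    yr = y - y.mean()                # centred response, deflated below
    n_vars = X.shape[1]
    weights, explained = [], []
    for _ in range(n_components):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)       # unit-length weight vector
        t = Xr @ w                   # scores
        tt = t @ t
        p = Xr.T @ t / tt            # X loadings
        q = (yr @ t) / tt            # y loading
        Xr = Xr - np.outer(t, p)     # deflate X
        yr = yr - q * t              # deflate y
        weights.append(w)
        explained.append(q * q * tt) # y sum of squares explained
    W = np.array(weights)            # shape (n_components, n_vars)
    ss = np.array(explained)
    return np.sqrt(n_vars * (ss @ W**2) / ss.sum())

# Synthetic data (hypothetical): only column 2 drives the response.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 8))
y = 3.0 * X[:, 2] + 0.1 * rng.normal(size=60)
vip = pls1_vip(X, y, n_components=2)
sensitive = np.flatnonzero(vip > 1.0)  # assumed cut-off, not from the source
```

With unit-length weight vectors the squared VIP scores average to 1, which is why 1 is often used as the threshold for an above-average contribution.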
Figure 5. The SPA scores of water quality parameters with standard normal variable reflectance.
Set pair analysis (SPA) scores of water quality parameters with standard normal variable reflectance.
| Water quality parameters | Sensitive band numbers | Maximum SPA | Maximum SPA band interval (nm) | Maximum SPA band (nm) |
|---|---|---|---|---|
| COD | 750 | 1.598 | 465–475 | 466 |
| BOD | 767 | 1.409 | 762–763 | 762 |
| NH3-N | 765 | 1.364 | 990–999 | 999 |
| TA | 696 | 2.409 | 460–470 | 464 |
| TDS | 280 | 1.928 | 460–470 | 463 |
| TH | 223 | 1.944 | 460–470 | 463 |
PLSR model results based on the full band and on the bands screened by GC, SPA and VIP.
| Water quality parameter | Wavelength selection method | Number of main factors | Modeling set R²c | Validation set R²p | Relative percent deviation (RPD) | Robustness |
|---|---|---|---|---|---|---|
| COD | GC | 6 | 0.970 | 0.954 | 4.690 | 0.984 |
| | VIP | 4 | 0.938 | 0.917 | 3.238 | 0.978 |
| | SPA | 4 | 0.938 | 0.917 | 3.237 | 0.978 |
| | All | 9 | 0.992 | 0.956 | 4.532 | 0.964 |
| BOD | GC | 5 | 0.976 | 0.965 | 5.335 | 0.988 |
| | VIP | 10 | 0.992 | 0.951 | 4.424 | 0.958 |
| | SPA | 4 | 0.958 | 0.930 | 3.348 | 0.971 |
| | All | 7 | 0.990 | 0.960 | 4.998 | 0.970 |
| NH3-N | GC | 5 | 0.972 | 0.970 | 5.894 | 0.998 |
| | VIP | 7 | 0.970 | 0.935 | 3.977 | 0.965 |
| | SPA | 7 | 0.969 | 0.935 | 3.974 | 0.965 |
| | All | 8 | 0.985 | 0.962 | 5.228 | 0.977 |
| TDS | GC | 8 | 0.876 | 0.772 | 2.049 | 0.880 |
| | VIP | 3 | 0.767 | 0.758 | 1.892 | 0.988 |
| | SPA | 5 | 0.820 | 0.793 | 2.150 | 0.964 |
| | All | 5 | 0.862 | 0.791 | 2.126 | 0.918 |
| TA | GC | 5 | 0.917 | 0.890 | 2.963 | 0.971 |
| | VIP | 4 | 0.883 | 0.927 | 3.578 | 1.049 |
| | SPA | 4 | 0.883 | 0.927 | 3.588 | 1.050 |
| | All | 7 | 0.956 | 0.921 | 3.206 | 0.963 |
| TH | GC | 12 | 0.978 | 0.900 | 2.871 | 0.920 |
| | VIP | 4 | 0.779 | 0.733 | 1.764 | 0.941 |
| | SPA | 3 | 0.794 | 0.708 | 1.701 | 0.892 |
| | All | 5 | 0.820 | 0.817 | 2.228 | 0.996 |
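The evaluation columns in this table can be related to one another with a short sketch. Two points are assumptions rather than statements from the source: the Robustness values appear, to within rounding, to equal the validation-set value divided by the modeling-set value, and RPD is computed here in its common form as the standard deviation of the observed values divided by the prediction RMSE. The function names and the toy observed/predicted values are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def r_squared(observed, predicted):
    """Coefficient of determination of predictions against observations."""
    m = mean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - m) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

def rmse(observed, predicted):
    """Root mean squared error of the predictions."""
    return sqrt(mean([(o - p) ** 2 for o, p in zip(observed, predicted)]))

def rpd(observed, predicted):
    """Assumed definition: SD of observed values over prediction RMSE."""
    return stdev(observed) / rmse(observed, predicted)

def robustness(r2_modeling, r2_validation):
    """Assumed definition: ratio of validation to modeling accuracy."""
    return r2_validation / r2_modeling

# Toy values (hypothetical) to exercise the metric functions.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
toy_r2 = r_squared(observed, predicted)
toy_rpd = rpd(observed, predicted)

# Rows copied from the PLSR table:
# (modeling set, validation set, reported robustness).
rows = {
    "COD / GC": (0.970, 0.954, 0.984),
    "COD / All": (0.992, 0.956, 0.964),
    "TH / All": (0.820, 0.817, 0.996),
}
ratios = {label: robustness(r2c, r2p) for label, (r2c, r2p, _) in rows.items()}
```

Under this reading, a robustness value close to 1 means the model performs about as well on the validation set as on the modeling set, and a value above 1 means the validation-set accuracy exceeded the modeling-set accuracy.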
ELM model results based on the full band and on the bands screened by GC, SPA and VIP.
| Water quality parameter | Wavelength selection method | Number of neurons in the hidden layer | Modeling set R²c | Validation set R²p | Relative percent deviation (RPD) | Robustness |
|---|---|---|---|---|---|---|
| COD | GC | 23 | 0.964 | 0.956 | 4.667 | 0.991 |
| | VIP | 51 | 0.884 | 0.885 | 2.892 | 1.001 |
| | SPA | 55 | 0.959 | 0.928 | 3.567 | 0.967 |
| | All | 48 | 0.936 | 0.920 | 3.513 | 0.983 |
| BOD | GC | 230 | 0.986 | 0.976 | 6.192 | 0.989 |
| | VIP | 36 | 0.949 | 0.933 | 3.872 | 0.983 |
| | SPA | 37 | 0.939 | 0.930 | 3.783 | 0.990 |
| | All | 30 | 0.914 | 0.876 | 2.764 | 0.959 |
| NH3-N | GC | 200 | 0.979 | 0.976 | 6.596 | 0.997 |
| | VIP | 144 | 0.982 | 0.965 | 5.391 | 0.983 |
| | SPA | 37 | 0.960 | 0.959 | 4.901 | 1.019 |
| | All | 48 | 0.949 | 0.940 | 4.155 | 1.093 |
| TDS | GC | 23 | 0.595 | 0.613 | 1.431 | 1.029 |
| | VIP | 19 | 0.714 | 0.715 | 1.822 | 1.001 |
| | SPA | 50 | 0.820 | 0.790 | 2.198 | 0.964 |
| | All | 48 | 0.735 | 0.706 | 1.627 | 0.961 |
| TA | GC | 103 | 0.907 | 0.895 | 3.059 | 0.986 |
| | VIP | 22 | 0.827 | 0.828 | 2.150 | 1.001 |
| | SPA | 68 | 0.952 | 0.924 | 3.651 | 0.970 |
| | All | 30 | 0.778 | 0.784 | 2.089 | 1.008 |
| TH | GC | 6 | 0.570 | 0.560 | 1.250 | 0.982 |
| | VIP | 45 | 0.673 | 0.688 | 1.606 | 1.022 |
| | SPA | 37 | 0.937 | 0.910 | 3.358 | 0.971 |
| | All | 32 | 0.535 | 0.497 | 1.132 | 0.930 |
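The role of the hidden-layer size in this table is easier to see with a minimal ELM sketch. It assumes the basic single-hidden-layer form: random, fixed input weights with a sigmoid activation, and output weights solved in one step with a pseudo-inverse. The class name `ELMRegressor`, the synthetic data and the choice of 20 hidden neurons are hypothetical, and the authors' exact configuration is not given in this record.

```python
import numpy as np

class ELMRegressor:
    """Basic extreme learning machine for regression (illustrative only)."""

    def __init__(self, n_hidden, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Sigmoid activation of the random hidden layer.
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        # Input weights and biases are drawn once and never trained.
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = self._hidden(X)
        # Output weights: least-squares solution via the pseudo-inverse.
        self.beta = np.linalg.pinv(H) @ np.asarray(y, dtype=float)
        return self

    def predict(self, X):
        return self._hidden(np.asarray(X, dtype=float)) @ self.beta

# Synthetic data (hypothetical), standing in for screened band reflectance.
rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, size=(80, 3))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1]
model = ELMRegressor(n_hidden=20).fit(X, y)
pred = model.predict(X)
r2_train = 1.0 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2)
```

Only the output weights are fitted, so the number of hidden neurons is the main setting to tune, which is consistent with the table reporting it for each model.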
Figure 6. Validation of water quality parameters based on the best model.
(A) COD with GC-PLSR model. (B) BOD with GC-ELM model. (C) NH3-N with GC-ELM model. (D) TDS with SPA-ELM model. (E) TA with SPA-ELM model. (F) TH with SPA-ELM model.