Abstract
The present study evaluates the water quality status of a 6-km-long stretch of the Kali River that passes through the Aligarh district in Uttar Pradesh, India, using high-resolution IRS P6 LISS IV imagery. In situ river water samples collected at 40 random locations were analyzed for seven physicochemical parameters and four heavy metal concentrations, and the water quality index (WQI) was computed for each sampling location. A set of 11 spectral reflectance band combinations was formulated to identify the band combination most significantly related to the observed WQI at each sampling location. Three approaches, namely multiple linear regression (MLR), backpropagation neural network (BPNN) and gene expression programming (GEP), were employed to model WQI as a function of the most significant band combination. The three approaches were compared using the quantitative indicators R², RMSE and MAE. The WQI estimates ranged between 203.7 and 262.33, corresponding to a "very poor" status. GEP performed better than the BPNN and MLR approaches, predicting WQI with high R² values (0.94 for calibration and 0.91 for validation data) and low RMSE and MAE values (2.49 and 2.16 for calibration, and 4.45 and 3.53 for validation data). Moreover, both GEP and BPNN outperformed the MLR approach, which yielded WQI with R² ~ 0.81 and 0.67 for calibration and validation data, respectively. WQI maps generated from the three approaches corroborate the existing pollution levels along the river stretch. To test for significant differences among the WQI estimates from the three approaches, a one-way ANOVA was performed; the F-statistic (F = 0.01) and p-value (p = 0.994 > 0.05) indicated that the differences were "not significant," which was attributed to the small water sample size (N = 40). The study therefore recommends GEP as a more rational and better alternative for precise water quality monitoring of surface water bodies, as it produces simplified mathematical expressions.
Keywords: ANN; GEP; Kali River; MLR; Spectral reflectance; WQI
Year: 2021 PMID: 33897276 PMCID: PMC8058146 DOI: 10.1007/s10668-021-01437-6
Source DB: PubMed Journal: Environ Dev Sustain ISSN: 1387-585X Impact factor: 3.219
Fig. 1 Location map of the study area (map not to scale)
WQI and corresponding water quality rating as per the BIS (1986) specifications
| S no | WQI | Status | Possible usages |
|---|---|---|---|
| 1 | 0–50 | Excellent | Drinking, irrigation and industrial |
| 2 | 50–100 | Good | Domestic, irrigation and industrial |
| 3 | 100–200 | Poor | Irrigation |
| 4 | 200–300 | Very poor | Restricted use for all purposes |
| 5 | > 300 | Severe | Proper treatment required before use |
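The rating table above amounts to a simple range lookup. The following is a minimal sketch of that lookup, not code from the study; the function name `wqi_status` is hypothetical, and treating each upper bound as inclusive is an assumption, since the table does not say which class a boundary value such as exactly 50 belongs to.

```python
def wqi_status(wqi):
    """Return the rating class for a WQI value, using the ranges in the table."""
    if wqi < 0:
        raise ValueError("WQI must be non-negative")
    # Assumption: upper bounds are inclusive (the table leaves boundaries ambiguous).
    bands = [(50, "Excellent"), (100, "Good"), (200, "Poor"), (300, "Very poor")]
    for upper, status in bands:
        if wqi <= upper:
            return status
    return "Severe"


# The abstract reports WQI estimates between 203.7 and 262.33.
for value in (203.7, 262.33):
    print(value, wqi_status(value))
```

Both ends of the reported range fall in the 200–300 class, which matches the "very poor" rating stated in the abstract.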
Fig. 2 Subset image of study area with sampling locations along the river stretch
Fig. 3 Neural network architecture with input variables as bands/band combinations and WQI as target variable
Fig. 4 An example of a gene expression tree (ET)
Fig. 5 Flowchart illustrating the process of GEP model building
Formulated band combinations with details of BPNN architectures
| Case no. | No. of inputs | Input/independent variables | Target variable | Network architecture I–H–Oᵃ | Learning rate |
|---|---|---|---|---|---|
| 1 | 2 | G, R | WQI | 2–2–1 | 0.058 |
| 2 | 2 | G, NIR | WQI | 2–2–1 | 0.072 |
| 3 | 2 | R, NIR | WQI | 2–3–1 | 0.069 |
| 4 | 3 | G, R, NIR | WQI | 3–4–1 | 0.087 |
| 5 | 4 | G, R, NIR, G/R | WQI | 4–5–1 | 0.055 |
| 6 | 4 | G, R, NIR, G/NIR | WQI | 4–3–1 | 0.047 |
| 7 | 4 | G, R, NIR, R/NIR | WQI | 4–6–1 | 0.025 |
| 8 | 5 | G, R, NIR, G/R, G/NIR | WQI | 5–4–1 | 0.065 |
| 9 | 5 | G, R, NIR, G/R, R/NIR | WQI | 5–6–1 | 0.046 |
| 10 | 5 | G, R, NIR, G/NIR, R/NIR | WQI | 5–7–1 | 0.095 |
| 11 | 6 | G, R, NIR, G/R, G/NIR, R/NIR | WQI | 6–8–1 | 0.075 |
ᵃ I–H–O: input–hidden layer neurons–output
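The 11 band combination cases can be expressed as a lookup from case number to a list of band and band-ratio features. The sketch below is illustrative only: `BAND_CASES` and `build_features` are hypothetical names, and the reflectance values in the usage line are made up, not taken from the study.

```python
# Band combination cases 1-11, transcribed from the table.
BAND_CASES = {
    1: ["G", "R"],
    2: ["G", "NIR"],
    3: ["R", "NIR"],
    4: ["G", "R", "NIR"],
    5: ["G", "R", "NIR", "G/R"],
    6: ["G", "R", "NIR", "G/NIR"],
    7: ["G", "R", "NIR", "R/NIR"],
    8: ["G", "R", "NIR", "G/R", "G/NIR"],
    9: ["G", "R", "NIR", "G/R", "R/NIR"],
    10: ["G", "R", "NIR", "G/NIR", "R/NIR"],
    11: ["G", "R", "NIR", "G/R", "G/NIR", "R/NIR"],
}


def build_features(case, g, r, nir):
    """Return the input vector for one pixel under the given band combination case.

    Assumes non-zero R and NIR reflectance, since both appear as denominators.
    """
    values = {
        "G": g,
        "R": r,
        "NIR": nir,
        "G/R": g / r,
        "G/NIR": g / nir,
        "R/NIR": r / nir,
    }
    return [values[name] for name in BAND_CASES[case]]


# Hypothetical reflectance values, for illustration only.
print(build_features(5, 0.08, 0.04, 0.02))
```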
Parameters adopted for the optimal GEP model
| Parameter | Value |
|---|---|
| Population size | 50 |
| Genes per chromosome | 8 |
| Gene head length | 14 |
| Maximum generations | 5000 |
| Fitness function | |
| Precision (hit tolerance) | 0.01 |
| Mutation rate | 0.054 |
| Inversion rate | 0.1 |
| Computational functions | Addition (+), subtraction (−), multiplication (×), division (/), inverse (1/x), square, negation |
| Linking function | Addition |
| IS transposition rate | 0.1 |
| RIS transposition rate | 0.1 |
| Gene transposition rate | 0.1 |
| Recombination one-point rate | 0.3 |
| Recombination two-point rate | 0.3 |
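To make the roles of the function set and the linking function concrete, the sketch below decodes GEP genes written in Karva notation (symbols read breadth-first into an expression tree) and links the gene outputs by addition, as listed in the table. It is a minimal illustration, not the study's model: the single-letter codes `I`, `S` and `N` for inverse, square and negation, the two example genes, and the input values are all hypothetical.

```python
# Function set from the table; "I", "S", "N" are hypothetical codes for
# inverse, square and negation.
FUNCS = {
    "+": (2, lambda a, b: a + b),
    "-": (2, lambda a, b: a - b),
    "*": (2, lambda a, b: a * b),
    "/": (2, lambda a, b: a / b),
    "I": (1, lambda a: 1.0 / a),
    "S": (1, lambda a: a * a),
    "N": (1, lambda a: -a),
}


def decode(gene):
    """Decode a gene (list of symbols) into a tree, filling children level by level."""
    root = [gene[0], []]
    queue = [root]
    pos = 1
    while queue:
        node = queue.pop(0)
        arity = FUNCS[node[0]][0] if node[0] in FUNCS else 0
        for _ in range(arity):
            child = [gene[pos], []]
            pos += 1
            node[1].append(child)
            queue.append(child)
    return root  # symbols after `pos` are the non-coding part of the gene


def evaluate(node, env):
    """Evaluate a decoded tree; terminals are looked up in `env`."""
    symbol, children = node
    if symbol in FUNCS:
        return FUNCS[symbol][1](*(evaluate(c, env) for c in children))
    return env[symbol]


def chromosome_value(genes, env):
    """Link the outputs of all genes by addition."""
    return sum(evaluate(decode(g), env) for g in genes)


# Two hypothetical genes: (G + R) * NIR^2 and 1/G (trailing symbols unused).
genes = [["*", "+", "S", "G", "R", "NIR"], ["I", "G", "R", "R"]]
env = {"G": 0.5, "R": 0.25, "NIR": 2.0}
print(chromosome_value(genes, env))  # (0.75 * 4.0) + 2.0
```

In a full GEP run each gene has a fixed head (length 14 in the table) and a tail of terminals, which guarantees that decoding never runs past the end of the gene.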
Descriptive statistics of the measured WQPs
| S. no | Water quality parameter (mg/l) | Range (min–max) | Mean | Standard deviation (σ) | Population variance | Sample variance (σ²) | SEMᵃ | BIS standard |
|---|---|---|---|---|---|---|---|---|
| 1 | pH | 6.93–7.56 | 7.33 | 0.12 | 0.0147 | 0.0141 | 0.0242 | 6.5–8.5 |
| 2 | EC (μS/cm) | 1083–1852 | 1644 | 190.61 | 36,334 | 34,881 | 38.12 | 300 |
| 3 | TDS | 754–851 | 800.32 | 20.06 | 402.56 | 386.53 | 4.013 | 500 |
| 4 | Alkalinity | 540–680 | 616.68 | 40.39 | 1631.56 | 1566.3 | 8.078 | 200 |
| 5 | COD | 72.15–91.20 | 81.55 | 6.10 | 57.38 | 49.32 | 1.515 | 250 |
| 6 | DO | 2.08–6.74 | 5.85 | 1.04 | 1.089 | 0.882 | 0.211 | 5 |
| 7 | BOD | 23.20–38.50 | 29.61 | 4.02 | 16.124 | 15.48 | 0.803 | 5 |
| 8 | Cr | 0 | 0 | 0 | 0 | 0 | 0 | 0.05 |
| 9 | Pb | 0.19–0.24 | 0.21 | 2.21 | 1.4 × 10⁻⁴ | 1.1 × 10⁻⁴ | 0.002 | 0.1 |
| 10 | Fe | 0.01–0.03 | 0.02 | 0.01 | 8.7 × 10⁻⁵ | 8.4 × 10⁻⁵ | 0.0018 | 0.3 |
| 11 | Mn | 0 | 0 | 0 | 0 | 0 | 0 | 0.1 |
ᵃ SEM: standard error of the mean
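The columns of the descriptive statistics table follow standard definitions: population variance divides by N, sample variance by N − 1, and the standard error of the mean is the standard deviation divided by √N. The sketch below shows these with Python's `statistics` module; the pH readings are hypothetical and are not the study's measurements, and using the sample standard deviation for the SEM is an assumption.

```python
import math
import statistics

# Hypothetical pH readings, for illustration only (not the study's data).
ph = [7.21, 7.33, 7.40, 7.28, 7.56, 6.93, 7.35, 7.41]

n = len(ph)
mean = statistics.fmean(ph)
pop_var = statistics.pvariance(ph)    # sum of squared deviations / N
sample_var = statistics.variance(ph)  # sum of squared deviations / (N - 1)
sample_sd = statistics.stdev(ph)      # square root of the sample variance
sem = sample_sd / math.sqrt(n)        # standard error of the mean

print(f"range {min(ph)}-{max(ph)}, mean {mean:.2f}")
print(f"population variance {pop_var:.4f}, sample variance {sample_var:.4f}")
print(f"standard deviation {sample_sd:.3f}, SEM {sem:.4f}")
```

For any data set with more than one distinct value, the sample variance exceeds the population variance by the factor N / (N − 1).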
Coefficient of determination (R²), RMSE and MAE between the observed and estimated WQIs from three approaches
| Approach | Data | Metric | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Inputs/variables (Xᵃ) | | | 2 | 2 | 2 | 3 | 4 | 4 | 4 | 5 | 5 | 5 | 6 |
| Iterations (Iᵇ) | | | 35,234 | 28,731 | 32,186 | | 36,871 | 44,528 | 47,351 | 51,277 | 49,545 | 55,285 | 57,368 |
| MLR | Cal (80%) | R² | 0.46 | 0.66 | 0.65 | 0.72 | | 0.77 | 0.56 | 0.59 | 0.51 | 0.32 | 0.27 |
| MLR | Cal (80%) | RMSE | 18.51 | 15.67 | 9.45 | 6.85 | | 7.31 | 10.52 | 12.35 | 11.68 | 17.77 | 21.45 |
| MLR | Cal (80%) | MAE | 16.77 | 15.21 | 8.63 | 5.37 | | 6.82 | 8.86 | 10.73 | 11.24 | 15.52 | 18.96 |
| MLR | Val (20%) | R² | 0.34 | 0.52 | 0.47 | 0.51 | | 0.55 | 0.57 | 0.49 | 0.43 | 0.15 | 0.09 |
| MLR | Val (20%) | RMSE | 16.52 | 8.96 | 13.76 | 9.63 | | 7.55 | 7.37 | 11.52 | 13.93 | 28.79 | 36.21 |
| MLR | Val (20%) | MAE | 15.23 | 8.44 | 10.73 | 8.68 | | 7.15 | 6.97 | 9.38 | 12.51 | 23.48 | 31.92 |
| BPNN | Cal (Tr + Te)ᶜ | R² | 0.84 | 0.88 | 0.85 | | 0.92 | 0.89 | 0.92 | 0.87 | 0.79 | 0.87 | 0.93 |
| BPNN | Cal (Tr + Te)ᶜ | RMSE | 3.78 | 3.64 | 3.32 | | 2.58 | 3.38 | 3.76 | 4.81 | 5.24 | 4.95 | 3.17 |
| BPNN | Cal (Tr + Te)ᶜ | MAE | 3.34 | 3.46 | 2.98 | | 2.42 | 3.15 | 3.28 | 4.47 | 4.89 | 4.33 | 2.87 |
| BPNN | Val (20%) | R² | 0.76 | 0.79 | 0.84 | | 0.85 | 0.81 | 0.83 | 0.69 | 0.73 | 0.79 | 0.83 |
| BPNN | Val (20%) | RMSE | 5.21 | 6.87 | 5.91 | | 5.85 | 6.76 | 6.62 | 7.23 | 7.17 | 6.95 | 6.34 |
| BPNN | Val (20%) | MAE | 4.85 | 5.54 | 5.73 | | 4.97 | 6.16 | 5.65 | 7.12 | 6.78 | 6.23 | 5.92 |
| GEP | Cal (80%) | R² | 0.78 | 0.84 | 0.91 | 0.88 | | 0.92 | 0.89 | 0.83 | 0.91 | 0.79 | 0.82 |
| GEP | Cal (80%) | RMSE | 6.89 | 6.34 | 5.21 | 5.96 | | 3.16 | 3.67 | 7.78 | 3.41 | 7.39 | 6.57 |
| GEP | Cal (80%) | MAE | 5.74 | 5.22 | 4.78 | 5.17 | | 2.87 | 2.83 | 6.41 | 2.75 | 5.53 | 5.27 |
| GEP | Val (20%) | R² | 0.63 | 0.79 | 0.82 | 0.86 | | 0.87 | 0.81 | 0.76 | 0.82 | 0.71 | 0.79 |
| GEP | Val (20%) | RMSE | 8.87 | 7.21 | 6.87 | 6.34 | | 5.92 | 6.54 | 7.11 | 7.08 | 7.87 | 8.29 |
| GEP | Val (20%) | MAE | 7.34 | 6.88 | 6.12 | 5.43 | | 4.58 | 5.71 | 6.51 | 5.24 | 6.19 | 7.33 |
ᵃ X: no. of input/independent variables; ᵇ I: total no. of iterations for BPNN; ᶜ Tr: training data (60%), Te: testing data (20%); Cal: calibration; Val: validation
‘Bold’ indicates the optimal WQI model achieved
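The three indicators in the table can be computed from paired observed and estimated WQI values. The sketch below is an illustration, assuming R² is the coefficient of determination 1 − SS_res/SS_tot; the source does not state which definition was used, and the squared Pearson correlation is another common choice. The function name `regression_metrics` and the sample values are hypothetical.

```python
import math


def regression_metrics(observed, estimated):
    """Return (R2, RMSE, MAE) for paired observed and estimated values."""
    n = len(observed)
    mean_obs = sum(observed) / n
    ss_res = sum((o - e) ** 2 for o, e in zip(observed, estimated))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    r2 = 1.0 - ss_res / ss_tot  # assumed definition of R2
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(o - e) for o, e in zip(observed, estimated)) / n
    return r2, rmse, mae


# Hypothetical WQI values, for illustration only (not the study's data).
observed = [210.0, 220.0, 230.0, 240.0]
estimated = [212.0, 218.0, 233.0, 239.0]
r2, rmse, mae = regression_metrics(observed, estimated)
print(f"R2 = {r2:.3f}, RMSE = {rmse:.2f}, MAE = {mae:.2f}")
```

RMSE penalizes large errors more heavily than MAE, so RMSE is never smaller than MAE for the same set of residuals, which is consistent with every RMSE/MAE pair in the table.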
MLR coefficients for calibration data for the most appropriate band combination
| BCᵃ case no. | Xᵇ: parameter distribution | Regression coefficients | Calᶜ R² | Cal RMSE | Cal MAE | Valᵈ R² | Val RMSE | Val MAE |
|---|---|---|---|---|---|---|---|---|
| 5 | 4: G, R, NIR, G/R | | 0.81 | 4.36 | 3.0 | 0.60 | 6.3 | 4.64 |
| Multiple regression equation | | | | | | | | |
ᵃ BC: band combination; ᵇ X: no. of input/independent variables; ᶜ Cal: calibration; ᵈ Val: validation
Fig. 7 Scatter plots between observed and estimated WQIs from (a) the MLR approach for band combination 5 (4 inputs), (b) the BPNN approach for band combination 4 (3 inputs) and (c) the GEP approach for band combination 5 (4 inputs)
Final weight matrix of the trained BPNN model with 3 input variables
| Layer / predictor variable | N1 | N2 | N3 | N4 |
|---|---|---|---|---|
| Input layer: G | −0.234 | 0.046 | 0.824 | 0.147 |
| Input layer: R | −0.463 | −0.042 | 0.568 | −0.221 |
| Input layer: NIR | 0.173 | −0.386 | −0.034 | −0.579 |
| Bias | 0.251 | −0.366 | −0.721 | 0.632 |
| Output layer: target variable (WQI) | 0.437 | −0.254 | −0.022 | 0.771 |
Values are the connecting weights of the 4 hidden-layer neurons of the BPNN prediction model. N1: neuron 1, N2: neuron 2, N3: neuron 3, N4: neuron 4
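The weight matrix can be read as a 3–4–1 feed-forward pass: each hidden neuron sums its weighted inputs plus a bias, and the output combines the four hidden activations. The sketch below only illustrates that structure and should not be taken as the trained model's exact behavior. It assumes a logistic sigmoid in the hidden layer, a linear output with no output bias (none is listed), and inputs already scaled the same way as during training; none of these details is given in the source, so the result is in the network's internal units rather than a WQI value.

```python
import math

# Input-to-hidden weights (one list per input, ordered N1..N4), from the table.
W_HIDDEN = {
    "G": [-0.234, 0.046, 0.824, 0.147],
    "R": [-0.463, -0.042, 0.568, -0.221],
    "NIR": [0.173, -0.386, -0.034, -0.579],
}
B_HIDDEN = [0.251, -0.366, -0.721, 0.632]  # "Bias" row
W_OUT = [0.437, -0.254, -0.022, 0.771]     # output-layer row


def sigmoid(x):
    # Assumed hidden activation; the source does not name the activation used.
    return 1.0 / (1.0 + math.exp(-x))


def forward(g, r, nir):
    """Forward pass through the 3-4-1 network under the stated assumptions."""
    inputs = {"G": g, "R": r, "NIR": nir}
    hidden = []
    for j in range(4):
        z = B_HIDDEN[j] + sum(W_HIDDEN[name][j] * inputs[name] for name in inputs)
        hidden.append(sigmoid(z))
    # Assumed linear output without an output bias.
    return sum(w * h for w, h in zip(W_OUT, hidden))


# Hypothetical scaled reflectance values, for illustration only.
print(forward(0.3, 0.2, 0.1))
```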
Fig. 6 Expression trees for the optimal GEP model with 4 spectral bands
Fig. 8 Comparative line plot of observed WQI and estimated WQI from the three employed approaches
Fig. 9 WQI maps of the river stretch generated from (a) MLR, (b) BPNN and (c) GEP analysis
One-way ANOVA test for WQI estimates from the three employed approaches
| Source of variation | Sum of squares | Degrees of freedom | Mean square | F | p-value | F critical |
|---|---|---|---|---|---|---|
| Between groups | 0.9329 | 2 | 0.4665 | 0.01 | 0.994 | 3.08 |
| Within groups | 9830.53 | 117 | 84.021 | | | |
| Total | 9831.47 | 119 | | | | |
As the p-value is greater than 0.05, the differences are not significant at the 5% level
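The mean squares, F-statistic and p-value in the ANOVA table follow directly from the sums of squares and degrees of freedom. The sketch below reproduces that arithmetic from the tabulated values. The p-value line uses the closed form of the F-distribution survival function that holds only when the numerator has 2 degrees of freedom, which is the case here; for other designs a statistics library would be needed.

```python
# Values from the ANOVA table.
ss_between, df_between = 0.9329, 2
ss_within, df_within = 9830.53, 117

ms_between = ss_between / df_between  # mean square between groups
ms_within = ss_within / df_within     # mean square within groups
f_stat = ms_between / ms_within

# For an F(2, d) distribution, P(F > x) = (1 + 2x/d) ** (-d/2).
p_value = (1.0 + 2.0 * f_stat / df_within) ** (-df_within / 2.0)

print(f"MS between = {ms_between:.4f}, MS within = {ms_within:.3f}")
print(f"F = {f_stat:.2f}, p = {p_value:.3f}")
```

With three groups of 40 estimates each, the degrees of freedom are 3 − 1 = 2 between groups and 120 − 3 = 117 within groups, as in the table.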
Fig. 10 Boxplot of WQI estimates from the three employed approaches