Literature DB >> 33385083

Development of a robust ensemble meta-model for prediction of salinity time series under uncertainty (case study: Talar aquifer).

Abstract

The aim of this study is to develop an accurate and reliable numerical model of the coastal Talar aquifer threatened by seawater intrusion by developing an ensemble meta-model (MM). In comparison with previous methodologies, the developed model has the following superiority: (1) Its performance is enhanced by developing ensemble MMs using four different meta-modelling frameworks, i.e., artificial neural network, support vector regression, radial basis function, genetic programing and evolutionary polynomial regression; (2) The accuracy of different MMs based on 16 integration of four meta-modeling frameworks is compared; and (3) the effect of aquifer heterogeneity on the MM. The performance of the proposed MM was assessed using an illustrative case aquifer subject to seawater intrusion. The obtained results indicate that the ensemble MM that combines all four meta-modeling frameworks outperformed the GP and ANN models, with a correlation coefficient of 0.98. Moreover, the proposed MM using nonlinear-learning ensemble of SVR-EPR achieves a better and robust forecasting performance. Therefore, it can be considered as an accurate and robust simulator to predict salinity levels under different abstraction patterns in variable density flow. The result of uncertainty analyses reveals that robustness value and pumping rate are inversely proportional and scenarios with a robustness measure of about 12% are more reliable.

Entities: Chemical Gene

Keywords: Earth sciences; Ensemble meta-model; Environmental science; Hydrology; Info-gap theory; Nonlinear-learning ensemble; Robust prediction; Seawater intrusion; Variable density flow

Year: 2020 PMID： 33385083 PMCID： PMC7772554 DOI： 10.1016/j.heliyon.2020.e05758

Source DB: PubMed Journal: Heliyon ISSN： 2405-8440

Introduction

Coastal aquifers are subject to seawater intrusion (SWI), generally caused by over pumping. Thus, it is necessary to design appropriate management strategies for optimal water extraction from coastal aquifers while controlling the SWI into the wells [1]. Different numerical models based on variable density flow have been developed and combined with optimization algorithms to find the optimal pumping strategies [2]. Nevertheless, to converge the optimal pumping patterns, the repetitive process of simulation-optimization (SO) model for computing state variables is time consuming [3]. One of the techniques to reduce the computational time on the coupled SO model is to construct efficient replacement models, generally named MMs of the original numerical model [4]. MMs have been applied in recent studies to approximate complex numerical density dependent flow and contaminant transport [5,6]. In SO model for SWI management, two types of meta-modeling approaches have been generally implemented: offline and online. In the offline approach, the numerical model is completely substituted by MM and sampling algorithm is independent of the optimization model whereas, in the online approach, the MM is trained adaptively using the numerical model during optimization process [7]. At present, there are relatively limited on-line MMs for SO process and major off-line methods implemented in SWI managements. For example, Rao et al, [8] used Artificial Neural Network (ANN) as the approximation of finite difference SEAWAT [9] model and coupled with simulated annealing optimization algorithm to identify the optimal locations for a group of production wells in costal deltas with paleo channels. Kourakos and Mantoglou [10] developed a novel meta-modeling framework using modular neural network (MNN) where many sub-network cooperate to approximate numerical SEAWAT model. The efficiency of their proposed model for pumping management of 34 public wells in the Greek island of Santorini indicated that the time required by the MM scheme is only 5% of the time of standard SO model. Dhar and Datta [11] compared the burden time of the direct link of Non-dominated Sorting Genetic Algorithm-II (NSGA-II) with FEMWATER [12] and ANN-based MM as an approximation of the numerical model. The performance evaluations of the results between the two techniques of SO indicated that the developed ANN model is computationally effective in generating the optimal Pareto front. Despite the popularity of ANN as MM, there are some disadvantages of this model including high computational time for training its parameters and over-fitting as far as model complexity [13]. On the other hand, GP model has rather simple formulation and hence it needs less model parameters compared to weights of ANN [14]. Sreekanth and Datta [14] compared the performance and computational time of GP and MNN as a MM for FEMWATER model, into the optimization problem. They substituted FEMWATER with MNN and MM and linked to a multi-objective genetic algorithm to find the optimal pumping rate of an illustrative study area with 11 pumping wells and compared it with the genetic programming (GP). The result showed that GP parameters using online technique are less than MNN model and hence the Pareto solution generated by linked GP and optimization model has less uncertainty. In terms of simplicity and robustness, radial basis functions (RBF) was reported as the best meta-modelling formwork [15,16]. Several works have replaced simulation models by RBF for computationally intensive optimization of SWI problems. Papadopoulou et al [15] applied a RBF-ANN as a replacement of density dependent model and coupled it with a differential evolution algorithm to reduce the burden time of optimization runs. Christelis and Mantoglou [16] assessed the efficiency of cubic RBF in single-objective pumping optimization of real coastal aquifer at Vathi in Kalumnos Island, Greece. They found that using the adaptive RBF MM the run time of the SWI optimization was reduced by 96 %. Recently, there has been the interest in implementation of novel data driven model such as Evolutionary Polynomial Regression (EPR) and Kernel Extreme Learning to decrease the run time of SO processes. Hussain et al [17] developed EPR as a replacement for SUTRA [18] code and combined it with a multi objective genetic algorithm to assess the effect of different desalination techniques for controlling SWI in a hypothetical aquifer. The Pareto solution generated by EPR - NSGAII algorithm was compared with those obtained using SUTRA - NSGAII. They found that the Pareto front generated by two schemes of SO model have a similar trend, while retaining the advantage of MM in 98% reduction in run time compared to original SO framework. Song et al [19] proposed an online meta-modeling framework based on Kernel Extreme Learning machine where the MM is adaptively trained using SEAWAT model during optimization to reduce error accumulation of prediction and escape from the local optima trap. It is evident from the literature review that the commonly used MM for the management of coastal aquifers include ANN, RBF, GP, and EPR. In most previous studies, the common approach is to build an accurate MM using a single algorithm. However, few studies have attempted to compare the performances and uncertainties of different MMs in a single framework analysis. Moreover, the uncertainties generated by single MM parameters may result in low accuracy in forecasting [20]. Thus, a combination of the ensemble MMs with an uncertainty analysis model can improve the accuracy and robustness of the prediction [20]. The aim of this study is to construct a fast and robust MM for the management of Talar aquifer with the minimum SWI subject to a set of physical and technical constraints. In Iran, the change in management authority over the time has resulted in different management scemarios. The Iranian government first started enacting water laws affecting coastal area in 1932; the Ministry of Agriculture has been responsible for river and watershed protected zones until 2008, at which time the Ministry of Energy assumed the responsibility. The government considers water that discharges into lakes and seas as surplus, making it easier for Ministry of Energy to issue withdrawal certificates. Thus, the development of a robust management model for coastal aquifers, considering water laws in Iran is necessary. In this study, we designed robust and fast MMs to find the optimal pumping rate from Talar aquifer under seawater intrusion. This paper is organized in four sections: 1) development of a finite difference numerical model to simulate density-dependent groundwater flow and solute transport for Talar aquifer; 2) designing 16 MMs using different combination of four algorithms. i.e., ANN, RBF, GP, SVR and EPR, including four single models, 15 linear ensembles and one non-linear ensemble; 3) assessing the performance of all 16 MM to identify the best; 4) selecting the most robust management scenarios from the optimal solutions generated by the best MM.

Methodology

In this section, the governing equations of numerical model and the formulation of nonlinear learning ensemble model using four machine learning models (i. e. Support vector machine, Evolutionary polynomial regression, Genetic programming and Radial basis function) are proposed. Then, the uncertainty analysis of MM using Info-gap theory is presented. Finally, the boundary condition and location of case study for discussion and comparison of MM are described.

Governing equation

The finite difference variable density ground water flow model SEAWAT was implemented to simulate flow and solute transport in a real anisotropic aquifer subject to SWI. SEAWAT solves the integrated flow and solute transport equations simultaneously by coupling MODFLOW [21] and MT3DMS [22]. The SEAWAT model implemented in this paper is based on MODFLOW-2000 and MT3DMS, in the framework of GMS 10.2. The variable density flow equation in an anisotropic aquifer can be written as [23]:Where is the hydraulic conductivity tensor (), is hydraulic head (), is a constant for density coupling, is specific storage (), is dissolved concentration (), is reference head level, represents time (), the porosity of the medium, is volumetric inflow rate, is the density of freshwater (). is the bulk density of seawater (). The governing equation for solute transport in aquifers without explicit consideration of sorption can be written as [24]:Where is the key component of dispersion (), is the velocity vector (), is the volumetric source per unit volume of aquifer (), denotes the chemical reaction term ().

Development of single MM

Support vector machine

Given a set of instance in which is the vector of input, n represents input dimensions and is the output of support vector machine (SVR) model. The goal of SVR algorithm is to reach a linear relation in the high-dimensional feature space using a projection of input vector by the nonlinear function which minimizes the risk . This process can be written as [25]:Where , are weight and bias, respectively. is a penalty coefficient and represents the loss function. The optimal value of the above mentioned parameters can be calculated by the following optimization equation:Where and denote a threshold precision, the regression equation can be expressed as below:Where and are Lagrange multipliers and is kernel function based on the radial basis function with radius of as expressed below [25]:

Evolutionary polynomial regression

Evolutionary polynomial regression (EPR) combines the features of regression techniques with the genetic programming. The outputs of EPR are mathematical expressions of input parameters [26]. A typical structure of EPR algorithm can be written as:Where is the model predicted output, is the input vector, is a function determined using engineering judgment, is a function generated within the optimization process, is the number of expression of , is coefficient of term and is the bias. The optimal value of is identified using genetic algorithm (GA) to reach the best efficiency of EPR model with high correlation between simulated (by SEAWAT) and predicted. The decision variables of GA model are the number of expression, range of exponent and type of expression to build the EPR model. In the EPR model different mathematical expressions are generated for different ranges of input parameters.

Genetic programming

Genetic programming is an evolutionary algorithm approach to solve a complex problem by breaking it into smaller sub-problems. It is a new form of the genetic algorithm (GA) and uses the basic concept of GA to make a regression model over a set of populations. GP has a tree structure composed of input data and different mathematical expressions such as subtraction and multiplication. The main object of GP is to generate many mathematical programs which are highly correlated to input and output data. An error criterion is comparing the output of each program with the target value. GP identifies the optimal solution over randomly generated programs by defining cross-over and mutation with the aim of error minimization. GP structure is like a tree in which the mathematical operators are defined as splitting criteria. Hundreds of mathematical equations are examined by the natural selection to reach the goodness of fit criteria [27].

Radial basis function

RBF forecasts the output at the input as follows:Where n is the size of vector, is the weights of interpolation, is a radial function, and |||| is the Euclidean distance between two pointsQj and Qi. The most commonly used types of radial basis functions for predicting groundwater quality is the Gaussian function [28], which can be expressed as:in which is a constant parameter that improves the prediction accuracy.

Proposed linear ensembles

In previous studies the ensemble data driven models [28,29] have been reported as efficient tools for modeling non-linear problems. A weighted average MM using the optimal weight of single MM can enhance the performance of MM [7]. This technique is based on minimizing the error index ():Where is the weight of each MM, and are the actual value and predicted value of input respectively. is the covariance matrix which is expressed as [7]:Where is the number of training samples, and represent different MMs and is the vector of errors. The ensemble model which indicates the smallest error between the predicted and observed value of training samples is selected as the best MM.

Proposed nonlinear ensembles of SVR and EPR

A new meta-modelling technique, EnsemSVR-EPR is developed as an approximation for SEAWT code. The ensemble models which are explained below can predict the variation of output (SWI) due to different input vectors using the result of SEAWAT model. Nevertheless, the generalization performance of SVR model is restricted by the number of regression coefficient vectors. Regarding the high efficiency of the ensemble method, nonlinear ensembles of SVR, EPR and GA (EnsemSVR-EPR) are constructed in this study (see Figure 1). The weighted contribution of SVR models is implemented for the training of a global EPR tree in which the maximum depth of tree is optimized and adapted using GA during training process (see Figure 1). The principal object of EPR tree is to learn the nonlinear behavior of five SVR models and identify the optimal weights for ensemble learning. Thus the output of SVR models and the actual value of output on training samples were set respectively as the input and output for construction of the EPR tree. The depth of the EPR tree is optimized with GA algorithm to reach the performance of EnsemSVR-EPR. Since the proposed EnsemSVR-EPR is trained with small training examples, the model does not have high accuracy for unseen samples. To overcome this problem, within the optimization process, new samples are generated by SEAWAT model and added to the training data. The performance of MMs is assessed using statistical criteria such as the root mean squared error (RMSE), the mean of the absolute error () and correlation coefficient ():where is number of training samples, and denote the output value of numerical and MM, respectively.

Figure 1

Flow diagram of the proposed EnsemSVR-EPR.

Uncertainty analysis using the info-gap theory

The Info-gap theory (IGT) presented by Ben-Haim [30] is a non-stochastic approach to minimize management decisions and to maximize robustness. For the implementation of the Info-gap method it is necessary to determine first the main parameters of the simulation model significantly affecting the objective function. Then, engineering judgement is used for the determination of the values of the uncertain parameters upper bound b and the lower bound b around the calibrated value. The simulation model is run for different set of values of uncertain parameters within the predefined range. Thus, the model should be simulated for cases and this process is carried out for all uncertain parameters simultaneously. The robustness bound is expressed as: Robustness threshold () is set equal to the maximum value of the output minus a constant percent (w). and are the maximum and minimum deviations from the desired value, respectively. The most sensitive parameter in SWI problems is usually the hydraulic conductivity which is strongly affected by the heterogeneity of the geological structure of the aquifer. This heterogeneity can result in uncertainty in model predictions. In order to calculate the robustness of a non-dominated scenario, in this paper first the value for the ith region () has been varied between the center and upper value (u) with constant intervals. The value of output (i.e. SWI) for scenario () is calculated for If for the value of , the objective function value is less than a chosen threshold , the robustness of scenario for is considered equal to . This means that the scenario with is the one with the highest robustness for the parameter . Therefore, the robustness for the ith scenario can be expressed as: In order to provide the upper bound for , its value in each part of the aquifer is projected using different realizations of the field in log of wells. Different permutations of are calculated and the minimum and maximum projected k values are taken, respectively, as the center and upper bound for the Info-gap method [31]. Regarding different ranges of robustness value for optimal scenarios, these values are normalized relating to maximum probable robustness value [32]:Where represents the normalized robustness.

Case study

Talar aquifer (52°22′- 53°42′N and 36°43′- 37°30′E) is located in the coastal area of Mazandaran city, Iran (Figure 2 a). The total area of aquifer near the Caspian sea is 776 , 30% of which is plain with agricultural and industry use. The major part of the unconfined Talar aquifer consists of fine grained sand and conglomerate deposits and the permeability of the aquifer near the study area (Figure 2 b) varies between 0.15 and. The average annual precipitation ranges from 665 mm in the south to 761 mm in the north and recharge by irrigation is about 32% of total extraction. In the Talar aquifer groundwater is extracted by 1320 shallow wells, with pumping rate of 195 million for agricultural use (89%) and drinking purposes (11%). The boundary conditions of the study area are the Caspian Sea at the north side and the mountain front in the south. The western and eastern boundaries of the study area are Babolrood and Siahrood, respectively (Figure 2 b).

Figure 2

a) Layout of Talar aquifer b) aquifer boundary conditions c) source and sink locations of study area.

Model development

The remainder of this section is arranged as follows. In section 1, the proposed SEAWAT model is applied to pumping optimization in the coastal aquifer primarily affected by SI in the Talar aquifer of Iran. In section 2, the proposed MM model is trained with pumping patterns generated by SEAWAT model.

Development of numerical model

A variable density flow equation is implemented using SEAWAT in framework of GMS. 10. 2 to simulate groundwater level and salinity distribution in the Talar region. The length and width of study area are 16 km and 11 km respectively, which are divided into 500 × 500 m cells in X and Y directions and vertically divided into 3 layers. The simulation time in the transient condition was set equal to 5 years with 60 monthly time steps. The initial total dissolved solids (TDS) value of southern edge and shoreline was considered 0 and 12.6, respectively [30]. For the other active cells near the rivers and Siahrood river the concentration was set 0.1 . To develop the numerical model, Caspian Sea boundary was defined using constant head () package with constant salt concentration. Also, the southern boundary was set as constant flow boundary with conductance and groundwater level of 18500 and 18 respectively. Lateral boundaries were Siahrood and Babolrood rivers with the conductance of and simulated using River package. There are 1320 shallow wells with low pumping rates (each 0–25 ) in the study area with an overall pumping rate of about 14820 (see Figure 2c). The range of flow and transport parameters used by the simulation model are summarized in Table 1.

Table 1

Flow and transport model parameters.

Parameter	Range (units)	Number of parameters
Thickness of aquifer (d)	10–49(b) (m)	-
Shoreline slope (-)	0.06(b) (-)	1
Density of seawater (ρs)	1015(b) (kg/m3)	1
Density of fresh water (ρ0)	1000(b) (kg/m3)	1
Longitudinal dispersivity (αL)	30–50(b) (m)	1
Transverse dispersivity (αT)	4–10(b) (m)	1
Molecular diffusion coefficient (Dm)	6 × 10⁻⁷(b) (-)	1
Fluid viscosity (μ)	86(b) (kg/(m/d))	1
Horizontal hydraulic conductivity (kh)	0.15–0.4(a) (m/d)	3
Effective porosity (n)	0.35(b) (-)	1
Specific storage (ss)	6 × 10⁻⁴(b) (-)	1
Vertical anisotropy (λ)	10–50(b) (-)	2

Parameters for calibration step.

Parameters values from Mahab Ghodss Consulting [33].

Flow and transport model parameters. Parameters for calibration step. Parameters values from Mahab Ghodss Consulting [33]. Model calibration was performed using PEST 13.0 with data from two observation wells (O1 and O2) located near the shoreline (Figure 2c). The observation data from 2012 to 2015 were set for calibration and validation of groundwater level and salinity concentration. The aquifer parameters chosen for calibration of groundwater level and transport were the three hydraulic conductivities (HK) in XY plan () and the recharge in the southern boundary () (see Figure 2c). The initial value of southern boundary was defined using general head boundary () with the conductance (8430 ) and stage (18 ). The initial groundwater level of the study area was calculated by means of kriging based on 4 observation data (see Figure 2c). MODFLOW model was executed for a 100 years period, to reach a transient condition for groundwater head. To enhance model performance, a sensitivity analysis was performed to assess the effect of different values of HK on groundwater levels (Colombani et al., 2016).

Calibration of numerical model

To obtain a good fit between simulated and observed groundwater head, the water elevation of general head boundary in the southern edge was adjusted, since it was the main source of inflow for the aquifer. Regarding the result of sensitivity analysis the groundwater level of Obs point O1 is mostly influenced to whereas has little impact on groundwater head. Thus, to increase model accuracy, the small step of () was set in the calibration stage. Figure 3 shows the simulated versus observed values of groundwater head and TDS concentrations. The good match between the simulated and observed TDS after adjustment of parameter values indicates that calibration of dispersion and diffusion coefficients was not necessary.

Figure 3

The chart of predicted and observed groundwater level (left) and TDS concentration (right) of observation point O2.

The chart of predicted and observed groundwater level (left) and TDS concentration (right) of observation point O2. Groundwater level of the calibrated model (Figure 3) confirms that there is a high correlation between simulated and observed data for 0.85 of the points. The and RMSE for TDS were 0.0046 and 0.38, respectively. Figure 4 and Figure 5 show the simulated TDS concentration and groundwater head in the study area at the end of September 2020, respectively. As illustrated in Figure 4, SWI length is mostly impacted by pumping rate near the coastline. Figure 5 reveals that extraction rates increases toward the north and west and hence TDS line 12.6 in this zone is advanced to a distance of 2.2 from the sea boundary. Whereas the impact of recharge rate is higher in the east edge, which has a small depth (<12 ) and hence this area has a small SWI value (1.35 ).

Figure 4

Variation of salinity throughout the study area at the end of the September 2020.

Figure 5

Variation of head throughout the study area at the end of September 2020.

Variation of salinity throughout the study area at the end of the September 2020. Variation of head throughout the study area at the end of September 2020.

Development of MM

In the developed prediction model, demonstrates the maximum displacement of seawater wedge associated with the pumping rate vector thus, the MM estimates the, given the total pattern of pumping rates. In order to reduce the number of inputs, 14 circular areas including wells are defined. In this method, the temporal variation of seawater wedge toe is forecasted discretely by some learning models with different weights and bias terms (see Figure 6). The boundary of each control area is determined on the basis on the distances between wells (see Figure 6) and the pumping rate of wells inside the control area is considered equal. Therefore, the MM has 14 inputs controlling .The total length of seawater intrusion over total time step can be calculated using the principle of superposition as expressed below:

Figure 6

The structure of developed EnsemSVR-EPR.

The structure of developed EnsemSVR-EPR. In this relation is the location of iso-contour TDS-50% (6.3 ). For RBF model, the value of C and type of basis function were chosen by trial and error 3 and Gaussian basis function, respectively. A 2-layers back-forward ANN algorithm was developed with Levenberg–Marquardt training scheme. The optimal number of neurons was chosen as 14 with sigmoid output function and the learning rate and momentum coefficients were assigned as 0.2. The GP algorithm was generated using a population size of 300, cross over and mutation rates were chosen as 45 and 85 percent, respectively. For the generalization of GP model, the number of parameters using mathematical operators such as subtraction and division was set as 36. In this study, from different models generated by EPR, the simplest model with the smallest number of parameters and with the highest correlation was chosen. The generated EPR equations for SWI () predictions are written as:Where and are the center and pumping rates of ith cluster (Figure 6). To enhance the predictive and generalization abilities of the Ensem SVR-EPR method, five different SVR models are determined by trial and error. The structure of SVR models are respectively SVR1 with and , SVR2 with and , SVR3 with and , SVR4 with and , SVR5 with and , SVR6 with and . The population size of GA is set as 50 and the values of genetic operators, e. g. mutation and crossover probability are assigned as 0.05 and 0.8, respectively.

Results and discussion

The remainder of this section is arranged as follows. In section 1, the Performance of MMs is discussed base on statistical indexes. In section 2, the robustness of optimal scenarios proposed are measured by an efficient MM.

Performance of MMs

To assess the efficiency of the proposed MMs, the training samples were divided by means of the hold-out technique and 30% of instances were used in the validation step. Table 2 displays the statistical criteria obtained for all 16 MMs and MMs with lower correlation and higher error values placed at the end of the table. Moreover, the optimal weights for linear combination of 4 MMs are presented in Table 2. For all MMs, between simulated and predicted SWI varies from 0.90 to 0.99. EnsemSVR-EPR shows better model performance with as 0.99 and RMSE value of 7.75 which displays the advantage of nonlinear-learning ensemble.

Table 2

Evaluation results for the MMs.

MM	R2	MAE(m)	RMSE(m)
ANNa	0.97	8.53	12.21
RBFb	0.93	22.45	36.23
EPRc	0.94	19.81	31.28
GPd	0.96	11.58	15.52
0.8ANN+ 0.2RBF	0.95	13.45	17.23
0.7ANN+0.3EPR	0.97	9.84	14.21
0.6ANN+0.5GP	0.98	7.78	11.26
0.2RBF+0.8EPR	0.92	23.16	37.46
0.1RBF+0.9GP	0.92	22.16	36.51
0.3EPR+0.7GP	0.95	14.68	17.69
0.6ANN+0.1RBF+0.3EPR	0.97	10.14	15.42
0.5ANN+0.1RBF+0.4GP	0.96	12.78	16.68
0.4ANN+0.2EPR+0.4GP	0.98	6.54	9.88
0.1RBF+0.4EPR+0.5GP	0.96	12.26	14.28
0.4ANN+0.1RBF+0.2EPR+0.3GP	0.97	9.81	11.56
EnsemSVRd-EPR	0.99	5.52	7.75

Artificial neural network.

Radial basis function.

Evolutionary polynomial regression.

Support vector regression.

Evaluation results for the MMs. Artificial neural network. Radial basis function. Evolutionary polynomial regression. Support vector regression. These results support the conclusions of Roy and Datta (2017) who used Ensemble model as an alternative metamodelling method for coastal aquifer management and compared it with the most widely used ANN. As indicated in Table 2, between four single MMs, ANN model has the best correlation coefficient (CC = 0.98) with minimum RMSE (12.21) and MAE (8.53) for testing samples. Table 2 also suggests the same trend of correlation coefficient for the prediction of SWI and by GP and ANN model. The observations of the present study over single MMs contradict with Christelis and Mantoglou (2017) conclusion which that emphasized the minor differences between GP and ANN in the simulation-optimization system. The high correlation coefficient of GA among single MMs confirms the previous results of Kourakos and Mantoglou [10] especially in multi-objective optimization formulations. However, Table 2 reveals that ensemble MM does not always achieve good performance compared to the single MM. In general, the linear combination of ANN and GP with other algorithms improves the efficiency of MMs. Among the single and ensemble MMs, RBF and RBF + GP show apparently the lowest correlation with highest error indexes. Therefore the EnsemSVR-EPR MM can be implemented as the best approximation of numerical model for prediction of SWI in variable density flow. The scatter plot of EnsemSVR-EPR for testing samples is shown in Figure 7 and reveals that the predicted value of length is in good agreement with the results of SEAWAT model.

Figure 7

Comparison of the results of SEAWAT model with EnsemSVR-EPR for different locations: a) P1, b) observation P4, c) observation P5, d) observation P7.

Comparison of the results of SEAWAT model with EnsemSVR-EPR for different locations: a) P1, b) observation P4, c) observation P5, d) observation P7. A single-parameter sensitivity analysis using the exponential terms of EPR tree was performed to investigate the effect of each input parameters (pumping rates) on the length. The results indicated that length is most influenced by and and hence, the coefficients of this inputs are greater than the other 14 inputs. Thus, in the adaptive sampling by SEAWAT model new 100 training samples are generated near this point. To evaluate the capability of adaptive EnsemSVR-EPR model to predict the peak value of TDS, the predicted and simulated values of TDS for point 7 are graphically presented in Figure 8. As shown the maximum value of error between EnsemSVR-EPR and the numerical model is about 95 and the error frequency around mean value (15.85) follows a Gaussian distribution.

Figure 8

The comparison of testing error among SEAWAT simulator and EnsemSVR-EPR at point.

The comparison of testing error among SEAWAT simulator and EnsemSVR-EPR at point. The results of this study is in agreement with a highlighted finding of Jovanović et al [34] who consistently indicated that the Ensemble of MMs can improve robustness of the predictions by extracting data while supporting among single MM by reducing the effect of poor predictions.

Robustness of solutions

The start points of info-gap criteria are presented in Table 3. This considers different user-defined values for w. The values of have been rescaled by a min-max normalization with the minimum = 100 m and the maximum value = 6200 m to define the lower and upper bounds of . As shown in Table 3, the robustness thresholds for and pumping rates vary around the mean (0.41) and they maximum (0.117) value, respectively.

Table 3

Robustness thresholds for input-outputs.

Robustness threshold	SWI	Pumping rate
Minimum value	0.074	0.117
Maximum value	0.635	0.875
wa (%)	(rb)b	rb
5	0.607	0.839
10	0.579	0.804
15	0.550	0.768
20	0.522	0.733
25	0.494	0.698
30	0.466	0.662
35	0.438	0.627
40	0.410	0.591
45	0.382	0.556
50	0.354	0.521

Specific percent (w) of the difference of the maximum and minimum amount of the utility function.

Robustness thresholds.

Robustness thresholds for input-outputs. Specific percent (w) of the difference of the maximum and minimum amount of the utility function. Robustness thresholds. For the robustness assessment of each scenario, it is necessary to calculate the range of variation of HK for each region. As illustrated in Figure 1, the of three geological parts is varied between 0.15 to 0.4 . The objective function value has been assumed not to be influenced by the dispersion coefficient and has been assumed as the most sensitive parameter. Figure 9 shows the robustness measure due to the variation of . As illustrated in this figure, the robustness measure for scenarios varies from 2 to 18. However, the highest-robustness scenario has the lowest pumping rate.

Figure 9

Robustness measures of different solutions considering the objective (w = 40).

Robustness measures of different solutions considering the objective (w = 40). HK is less sensitive to SWI when the discharge rate from all 14 regions is small and the distribution of salinity proves to be mostly dependent on discharge from coastal wells. Therefore, different SWI levels can be caused by a constant discharge rate. Figure 9 illustrates the two scenarios 21 and 36 (in red) having the same discharge rates and “15” and “12” robustness measures, respectively. The highest-robustness scenario is the one in which most of the discharges are from regions 2 and 3. The scenarios which have a robustness measure higher than 12% can be considered as reliable. The robustness measures of different scenarios for objective function of pumping rates are 18 while the profit depends on the discharge rate and is not affected by the aquifer . Figure 10 illustrates that a few scenarios having similar values of robustness measures have different pumping rates and .

Figure 10

The value of objective functions versus robustness measure for different scenarios.

The value of objective functions versus robustness measure for different scenarios. Figure 10 shows the normalized and pumping rate values related to the robustness of each scenario, in which the relation between robustness measure and pumping rate value is inversely proportional. With the additional consideration on Figure 10, it can be concluded that the scenarios 17 to 21 require about 50% of maximum pumping rate. The location of seawater wedge toe corresponding to the lower (0.1, 0.35, 0.3) and upper bounds of HK (0.4, 0.34, 0.4) is measured. To find the most robust outputs among MM results, the SWI length of 22 samples for 9 realizations of is measured. It is obvious from Figure 11 that the highest and lowest variation of SWI corresponds to scenarios 6 and 2, respectively. It means that scenario 2 is the least impacted by the variation of and it has the highest robustness.

Figure 11

Upper and lower bound around the calibrated value.

Upper and lower bound around the calibrated value. The spatial location of the interface for many conditional fields corresponding to the scenario 2 is illustrated in Figure 12. The results indicate that the interface location for all four realizations is closer to the calibrated HK, thus scenario 2 is reliable. This scenario corresponds to discharge from point 5. Although the effect of HK on SWI is increased when the total abstraction rate is large, the result of Figure 12 highlights that a constant pumping rate can cause different shapes of the saline zone. This is because of the distribution in the aquifer is heterogeneous which causes non-uniform effects on transition zone.

Figure 12

Temporal and spatial location of transition zone (scenario 2).

Conclusions

In this study, five machine learning algorithms including ANN, GP, EPR, SVR and RBF, were developed to build 15 linear and one non-linear ensemble MMs as approximation of SEAWAT model for density dependent flow modelling of Talar aquifer. The ensemble MMs are adaptively assessed for 100 pumping patterns by SEAWAT model. To increase the accuracy of the predicted salinity time series, the predicted value by MMs at four observation points was evaluated. The assessment of statistical criteria reveals that EnsemSVR-EPR MM with external optimization system has a great performance to predict the variation of TDS due to different pumping patterns in the aquifer with 1320 pumping wells. The combination of RBF with other MMs results in significant reduction in the correlation coefficient compared to GP model. Moreover, ensemble MM did not always achieve good efficacy relative to the single MM. Finally, the robustness of accurate EnsemSVR-EPR model under variation of calibrated aquifer parameters is investigated. The result indicates that SWI length is greatly influenced by the variation of hydraulic conductivity field, which results in uncertainty of MM prediction. However, SWI level in the middle part of aquifer is less influenced by the variation of hydraulic conductivity and EnsemSVR-EPR MM presents a reliable prediction for this area.

Declarations

Author contribution statement

A. Ranjbar: Conceived and designed the experiments; Performed the experiments; Analyzed and interpreted the data; Contributed reagents, materials, analysis tools or data; Wrote the paper. C. Cherubini: Conceived and designed the experiments; Analyzed and interpreted the data; Contributed reagents, materials, analysis tools or data; Wrote the paper.

Funding statement

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Data availability statement

Data will be made available on request.

Declaration of interests statement

The authors declare no conflict of interest.

Additional information

No additional information is available for this paper.

3 in total

1. Computational benefits using artificial intelligent methodologies for the solution of an environmental design problem: saltwater intrusion.

Authors: Maria P Papadopoulou; Ioannis K Nikolos; George P Karatzas
Journal: Water Sci Technol Date: 2010 Impact factor: 1.915

2. MODFLOW/MT3DMS-based simulation of variable-density ground water flow and transport.

Authors: Christian D Langevin; Weixing Guo
Journal: Ground Water Date: 2006 May-Jun Impact factor: 2.671

3. Application of ensemble surrogates and adaptive sequential sampling to optimal groundwater remediation design at DNAPLs-contaminated sites.

Authors: Qi Ouyang; Wenxi Lu; Tiansheng Miao; Wenbing Deng; Changlong Jiang; Jiannan Luo
Journal: J Contam Hydrol Date: 2017-11-06 Impact factor: 3.188

3 in total