Literature DB >> 35202281

Neuro-Particle Swarm Optimization Based In-Situ Prediction Model for Heavy Metals Concentration in Groundwater and Surface Water.

Kevin Lawrence M De Jesus1,2,3, Delia B Senoro1,2,3,4, Jennifer C Dela Cruz1,5, Eduardo B Chan6.   

Abstract

Limited monitoring activities to assess data on heavy metal (HM) concentration contribute to worldwide concern for the environmental quality and the degree of toxicants in areas where there are elevated metals concentrations. Hence, this study used in-situ physicochemical parameters to the limited data on HM concentration in SW and GW. The site of the study was Marinduque Island Province in the Philippines, which experienced two mining disasters. Prediction model results showed that the SW models during the dry and wet seasons recorded a mean squared error (MSE) ranging from 6 × 10-7 to 0.070276. The GW models recorded a range from 5 × 10-8 to 0.045373, all of which were approaching the ideal MSE value of 0. Kling-Gupta efficiency values of developed models were all greater than 0.95. The developed neural network-particle swarm optimization (NN-PSO) models for SW and GW were compared to linear and support vector machine (SVM) models and previously published deterministic and artificial intelligence (AI) models. The findings indicated that the developed NN-PSO models are superior to the developed linear and SVM models, up to 1.60 and 1.40 times greater than the best model observed created by linear and SVM models for SW and GW, respectively. The developed models were also on par with previously published deterministic and AI-based models considering their prediction capability. Sensitivity analysis using Olden's connection weights approach showed that pH influenced the concentration of HM significantly. Established on the research findings, it can be stated that the NN-PSO is an effective and practical approach in the prediction of HM concentration in water resources that contributes a solution to the limited HM concentration monitored data.

Entities:  

Keywords:  groundwater; heavy metals; neural network; particle swarm optimization; surface water

Year:  2022        PMID: 35202281      PMCID: PMC8879014          DOI: 10.3390/toxics10020095

Source DB:  PubMed          Journal:  Toxics        ISSN: 2305-6304


1. Introduction

The Philippines is an archipelagic country consisting of more than 7600 islands in the Southeast Asian region. The Philippines is among the countries in the world with vast mineral deposits, and it was assessed that thirty percent of the land area of the country has the resources to commence mining activities [1]. Although the mining industry provides immense livelihood opportunities, mining activities in the Philippines have a mixed footprint of economic progress and impact on humans and the environment. Consequently, regular environmental quality monitoring, especially heavy metal (HM) concentration, is lacking due to access to equipment, laboratory facilities, and water resource locations to regularly monitor the degree of HM concentration. These resources could lead to different risks, as much evidence is available that proper education and information regarding HM concentrations in water resources are essential. Lack of information on the quality of water could result in several detrimental effects not only on the environment but also on the health of the people in the community [2]. The list of abbreviations and symbols in this study is presented in Table 1.
Table 1

List of abbreviations and symbols used in this study.

Abbreviation/SymbolDescriptionAbbreviation/SymbolDescription
AASAtomic Absorption SpectroscopyIoTInternet of Things
AIArtificial IntelligenceKGEKling-Gupta Efficiency
AICAkaike Information CriterionKNNK-Nearest Neighbor
AMDAcid Mine DrainageLMLevenberg-Marquardt
ANNArtificial Neural NetworkLSTMLong Short-Term Memory
BBOBiogeography-Based OptimizationLSWLake Surface Water
BPBack PropagationM5PModel Tree
BRBayesian RegularizationMANFISMulti-output Adaptive Neuro-Fuzzy Inference System
CACluster AnalysisMHMIModified Heavy Metal Index
CIContamination IndexMLMachine Learning
CSWCoastal Surface WaterMLGIMachine Learning Geostatistical Interpolation
EBKEmpirical Bayesian KrigingMLRMultiple Linear Regression
EHCIEntropy Weight-based HM Conc. IndexNARXNon-linear AutoRegressive eXogeneous
FCMFuzzy c-means Clustering MethodPLIPollution Load Index
GAGenetic AlgorithmPMIPrincipal Component Analysis-based Metal Index
GEPGene Expression ProgrammingPSOParticle Swarm Optimization
GFFGeneralized Feed ForwardRBFRadial Basis Function
GPGrid PartitioningRIRelative Importance
GRNNGeneralized Regression Neural NetworkSCMSubtractive Clustering Method
HEIHeavy Metal Evaluation IndexSPISynthetic Pollution Index
hN-PSOHybrid Neuro-Particle Swarm OptimizationSVMSupport Vector Machine
HPIHeavy Metal Pollution IndexSVM-PolySVM with Polynomial
ICAImperialist Competitive AlgorithmWQGWater Quality Guidelines
ICP-OESInductively Coupled Plasma-Optical Emission SpectrometryWQIWater Quality Index
Numerous areas affected by mine tailings have been identified in the Philippines. These mine tailings, normally called acid mine drainage (AMD), are primarily in remote areas and regions where access is a challenge. Due to vast deposits of minerals in the Philippines, mining activities commenced in several provinces of the archipelago. Some of these activities resulted in detrimental and extreme mining disasters that released toxic substances into the environment, affecting the ecosystem and the community adjacent to the mining site [3]. The summary of the mining disasters in the Philippines is presented in Table A1 in Appendix A.
Table A1

Mining disasters in the Philippines.

Location/DateCause of ReleaseDescription of ReleaseImpactReference
Siocon, Zamboanga Del Norte/6 April and 11 July 2007Heavy rains eroded the clay soil and destroyed the concrete wall of the zinc extraction sulphide dam.Contaminated water with detectable levels of cyanide and mercury ran down the Canatuan River and into the Siocon River, eventually reaching the sea.Reports of siltation had reached up to 3 m thick, which caused frequent flash floods and obstructed irrigation flow from the river. The river mouth became brown and fish catches decreased.[118]
Rapu–Rapu Island, Albay/ 11 and 31 October 2005The tailings pump failed, spilling tailings from the mill’s emergency pond into the gold processing facility and neighboring Alma and Pagcolbon creeks.At least 20 cubic meters of slurry material (containing cyanide beyond the standard of 0.05 mg/L and other toxic heavy metals and chemicals)Two kilograms of dead small fish and crustaceans at the shoreline collected on the same day at the location where the affected creeks exit into the sea.[119]
San Marcelino, Zambales27 August 27 and 11 September 2002The spillway of Bayarong tailings dam collapsed during heavy rain.High concentrations of heavy metals and sulfide materials.Low-lying settlements were inundated with mining waste, 250 residents were evacuated, and some tailings leaked into Mapanuepe Lake and later into the Sto. Tomas River.[120]
Sipalay, Negros Occidental8 December 1995At the Bulawan gold mine, the pressure of impounded tailings created a leakage in the decant tower of tailings pond no. 1.Mine tailings caused the siltation of the Sipalay River.Excessive quantities of dust covered a 5-square-kilometer region, affecting the air quality, and local inhabitants reported a rise in respiratory diseases.[121]
Toledo City, Cebu9 August 1999The outlet of an open pit’s drainage tunnel (from a closed copper mine) was obstructed, resulting in the loosening of accumulated silt and discharge into the Sapangdaku River toward the sea.5.7 million m3 of acidic waterIncreased acidity in afflicted water bodies, resulting in fish mortality.[122]
Placer, Surigao del Norte/ 26 April 1999Tailings pond No. 7 tailings discharged due to a broken concrete pipe.700,000 cubic meters of cyanide tailingsSeventeen homes buried, 40 heactares affected, including 20 hectares of agricultural land[123]
Sibutad, Zamboanga del Norte6 November 1997Two strong rain events resulted in mudflows and rockslides into a silt dam.Sibutad gold project’s silt dam overflowedCaused flash floods damaging the nearby houses and rice fields and fish kills.[124]
Mankayan, Benguet17 October 1986Tailings pond 3 collapsed as a result of a compromised dam embankment caused by excessive loading.Mine tailings overflowed and huge amounts of Cu-contaminated mine wastes carried by the Comillas RiverCaused siltation of the Abra River, affecting nine towns, and toxic contamination of the river, depriving the region of about 7.33 million kg of rice every year.[125]
Marinduque Island6 December 1993Maguilaguila siltation dam collapsed because of the siltation pressure at the dam wall.Toxic mine tailings in silt and waterFlooding of the Mogpog River resulted in the death of two children, cattle, contamination of agricultural land, and flooding of downstream communities and Mogpog town.[126,127]
Marinduque Island24 March 1996According to the official explanation, the rock around the plug in the Tapian Pit drainage tunnel was cracked, resulting in the plug’s failure. However, in August 1995, the tunnel began to leak. Marcopper/Placer Dome began drilling 160 m down to the tube in September 1995. The drill struck the tube on 24 March 1996, releasing an air pocket that had been holding back tailings and initiating the leak.The estimate based on the United Nations is between 2–3 million cubic meters over the first 4–5 days of discharge alone.Approximately 1200 persons were evacuated, 26 km of the Makulapnit and Boac river systems were rendered impassable by tailings, flash floods cut off five communities, and 67,000 cubic meters of bagged tailings were gathered and placed on the banks of the Boac river since the cleaning started in 2000.[126,127]
Due to the inadequacies in environmental quality assessment, individuals lack awareness and understanding of the hazards presented by heavy metals to people, especially in low-income countries such as the Philippines [4]. Nearly three decades of mining tailings discharge on Marinduque Island have resulted in higher levels of HM in the environment and residents. The typical exposure pathways of Marinduque villagers residing near the Boac and Mogpog river, where mine tailings were hosted in 1993 and 1996, were bathing, laundering, and passing through the river without protective gear or equipment [5]. Heavy metal contamination in water resources is associated with high toxicity, permanence, and resistance to degradation, disrupting the regular function of water resource systems and constituting a hazard to people by accessing various routes and developing adverse significant health problems [6]. Exposure to these heavy metals can lead to many diseases, including kidney problems such as proteinuria from cadmium exposure [7]; lung carcinoma from chromium exposure [8]; severe damage to the nervous and reproductive system, kidneys, and liver from lead exposure [9]; manganism, which is described as a Parkinson’s-like syndrome that manifests weakness, myalgia, anorexia nervosa, apathy, slow speech, emotionless facial expression, monotonous tone of voice, and clumsy limb movement from manganese exposure [10]; contact dermatitis and eczema and respiratory problems from oral exposure to nickel [11]; damage to the liver, brain, and kidney from long-term copper exposure [12]; tissue damage, organ failure, and elevated cancer risks from exposure to high concentrations of iron [13]; risk of developing anemia and damage to the pancreas from chronic toxicity of zinc [14]. Testing of groundwater (GW) and surface water (SW) samples is a tedious activity due to the different phases involved such as collection, transport, treatment, storage, and analysis of samples. Another issue that might be encountered is the high cost involved in the elemental analysis of these water samples. An additional challenge encountered by the researchers in Marinduque Province is the difficulty in the procurement period for reagents, chemical, and calibration standard solutions that could take 3–4 months to receive, as there are imported from other countries. Another factor affecting the deficiency in monitoring water resources affected by AMD is insufficient skilled personnel to perform the periodic collection and testing of water samples. This condition has been experienced by various local government units (LGUs). Moreover, although inductively coupled plasma (ICP) and atomic absorption spectroscopy (AAS) instruments are being used effectively for heavy metal detection in aquatic environments, several disadvantages, such as expensive instrumentation, limited sample throughput, slow turnaround time, difficulty in performing required measurements for in-situ samples, and consumption of chemicals, have restricted their frequent use in affected mine tailing sites [15,16]. This situation prompted researchers to look for equally viable approaches to serve the need for convenient and regular water quality assessment and monitoring. The use of innovative tools and methods is essential to address the current issues on environmental quality monitoring, especially in areas where there are AMDs. The onset of the industry 4.0 scenario gives birth to artificial intelligence (AI) tools and techniques that include the Internet of Things (IoT), smart sensors, advanced robotics, big data analytics, 3D printing, augmented reality, location detection, cloud computing, and machine learning (ML). Due to the input and output correlation capability of ML, it can be used as a possible approach for addressing the deficiency of instruments used for elemental analysis and tools to determine the concentration of the metals in surface and groundwater [17]. Machine learning is a data analytics technology that enables computers to learn from their experiences. It makes use of computational tools to derive knowledge completely from information, rather than depending on a predefined equation as a model. ML techniques discover natural patterns in data, providing insight and assisting in making more accurate judgments and predictions [18]. The transition to industry 4.0, including the rise in big data, enables the use of ML to become an essential technique in problem-solving of different areas and industries, including environmental quality monitoring [19]. The ML methods are classified into two categories: supervised and unsupervised learning. Whereas supervised learning is used to train a model on identified input and output datasets to forecast future outputs, unsupervised learning is used to discover unknown patterns or inherent structures in the input dataset. Supervised and unsupervised machine learning techniques are used concurrently. Classification and regression are part of supervised learning, whereas clustering is part of unsupervised learning. Support vector machines (SVM), discriminant analysis, Naive Bayes, and nearest neighbor are all examples of classification techniques, whereas multiple linear regression (MLR), generalized linear models, support vector regression (SVR), Gaussian process regression, ensemble methods, decision trees, and neural networks are all examples of regression techniques. The clustering approaches under unsupervised learning include hierarchical clustering, Markov models, and neural networks. Neural networks are classified as both supervised and unsupervised machine learning under the regression and clustering category. The utilization of artificial neural networks (ANN) can capture the dominant characteristics of complex and non-linear systems. Neural networks apply a modeling method that is based on empirical data rather than on simplified ideal assumptions. ANN is broadly applied in various areas of environmental engineering such as air quality monitoring [20], soil quality monitoring [21], and water quality prediction [22]. The use of ANN was found to be superior in performance as it was implemented in various fields including medicine [23,24,25], geotechnical engineering [26], petroleum engineering [27], construction technology [28], building energy [29], and waste management [30], as compared to other models, such as linear and SVM models. Moreover, it was also found out that integrating an optimization technique to ANN further improved its performance as compared to linear and SVM models as observed in the studies of Sun et al. in 2019 [31] and Zhang et al. in 2020 [32]. Neural networks are a heuristic modeling technique that is established on the behavior of genetic neural configurations. ANN could eliminate the tedious and invasive process of digestion of water samples to obtain heavy metal concentrations [33,34]. By combining the input and output data, it is possible to construct a network of interconnected neurons capable of predicting variables given a set of inputs. The simplest and most broadly employed type of neural network is the feed-forward multilayered backpropagation algorithm (BP-NN). However, traditional algorithms such as BP-NN have limitations such as being unable to traverse error function peaks. To address the BP-NN difficulties, evolutionary algorithms optimization methods such as particle swarm optimization (PSO) are used. PSO has a global search function that could be utilized to optimize the performance of the BP-NN [35]. The PSO method is a well-established optimization technique that draws inspiration from the group interactions and motion patterns of insects, birds, and fish. It is a search method based on optimization and stochastic population search that seeks to solve problems in a continuous solution space [36]. Recent studies involved the use of different ML tools in predicting heavy metals in AMD-affected surface and groundwater sites. Table 2 presents the machine learning models from published papers predicting heavy metal concentrations in surface and groundwater.
Table 2

Published heavy metal prediction models in groundwater and surface water.

Prediction MethodSample TypeTarget Output(s) of the ModelReference
ANN–PSO, ANN- Bayesian Regularization (BR)GWAs, Cu, Pb, Zn[37]
ANN–Imperialist Competitive Algorithm (ICA), ANN–Levenberg–Marquardt (LM)GWAs, Cu, Pb, Zn[38]
ANN, ANN–Biogeography–Based Optimization (BBO) Algorithm, Multi-output Adaptive Neuro-Fuzzy Inference System (MANFIS)–Subtractive Clustering Method (SCM)GWFe, Mn, Pb, Zn[39]
SVM based Regression–Radial Basis Function (RBF)GWPb, Zn, Cu[40]
ANNGWSi, Al, Fe, K, Ca, Na, Mg, Cl, Mn, Sr, Br(Groundwater)[41]
BP-NN, Nonlinear AutoRegressive eXogenous (NARX)GWAs[42]
ANNGWWater Quality Index[43]
ANN, MLRGWpH, EC, TDS, TH, MHMI, PLI, SPI[44]
ANN, Deep LearningGWHPI, HEI, CI, EHCI, HMI, PMI[45]
MLP-NN, Elman-NN, GFF-NNGWPb, Zn, As[46]
BP-NNGWTurbidity, Fe, Cl, SO4, TDS, TH, Mn, Zn, KMnO4 Index, NO3-N, NO2-N, NH3-N, F[47]
MLR, BP-NN, GEPSWWQI[48]
NARX, BP-NNCSWCr, Ni, Cu, Pb[49]
K-Means CA, BP-NNLSWFe, Cu[50]
ANN, SVMSWTi, Cu, Mn, Ni, As, Cd, Sb, Pb[51]
MANFIS–Grid Partitioning (GP), MANFIS-SCM, MANFIS–Fuzzy c-means Clustering Method (FCM)SWCu, Fe, Mn, Zn[52]
Adaptive Neuro–Fuzzy Inference System (ANFIS)SWCd[53]
ANN–LM, ANN–ICASWAs, Cu, Pb, Zn[38]
ANNSWMn[42]
ANN, SVM with Polynomial (SVM-Poly), SVM–RBF, Model Tree (M5P), K–Nearest Neighbor (K-NN)SWCu[54]
ANNSWCu[55]
SVM, Generalized Regression Neural Network (GRNN)SWCu, Fe, Mn, Zn[56]
SVM, ANNSWNi, Fe[57]
BP-LMSWCd, Cr, Cu,[58]
These ML models for heavy metal detection were developed using water parameters that can only be obtained through laboratory testing such as nitrate, sulphate, biological oxygen demand (BOD), chemical oxygen demand (COD), total nitrogen (TN), total phosphorus (TP), and phosphate, to name a few. The in-situ implementation has never been carried out due to due to the challenges described above. Therefore, the purpose of this study was to develop a heavy metal prediction model that utilized in-situ measurements to create tools, such as prediction models, to address the challenges in conducting regular water quality assessment and monitoring.

2. Materials and Methods

The study implemented a hybrid neuro–particle swarm optimization (hN-PSO) approach in predicting heavy metal concentrations in SW and GW. This hN-PSO method utilized simple physicochemical water properties such as temperature, water pH, electrical conductivity (EC), and total dissolved solids (TDS) as input datasets. The number of water samples with elemental analysis was also included. These SW and GW samples were utilized to generate spatial concentration maps of physicochemical parameters and heavy metal concentrations using the machine learning geostatistical interpolation (MLGI) approach through the use of MATLAB and GIS ArcMap. The following sections detail the focus and analysis implemented in this study.

2.1. Location of the Study and Data Gathering

One of the largest mining activities in the country was in the province of Marinduque, an island in the southern portion of Luzon approximately 160 km south of Metro Manila. The mining operations in the province started in 1969 and the generated mine tailings were discharged to the major water bodies on the island [59]. An earth dam was constructed at Maguilaguila to prevent silt from a waste pit from being discharged to the Mogpog River. A dam break happened in 1993 and impacted the Mogpog River, causing detrimental effects such as flooding, significant adverse effects on the infrastructure, agriculture, and public health in the surrounding community. In 1996, the Tapian Pit collapsed, dumping approximately 180,000 to 260,000 m3 of mine tailings into the 27-km-long Boac River. This was considered one of the world’s worst mining disasters [60]. The rivers and tributaries in the province of Marinduque were shown in Figure 1.
Figure 1

Major rivers and its tributaries in the province of Marinduque.

The GW samples were collected from various wells from six municipalities of Marinduque island province. Surface water samples were also taken from the island province’s different rivers and tributaries. One (1) liter polyethylene bottle was used for samples collection. The GW and SW data were gathered in accordance with EPA regulations SESDPROC-301-R3 [61] and SESDPROC-201-R3 [62]. The SW bodies in the province of Marinduque are classified as Type C, which is defined as fishery water for the reproduction as well as the growth of aquatic species, for boating, fishing, and water for agriculture, according to the Department of Environment and Natural Resources (DENR) Administrative Order (DAO) 2016–08 [63]. The island province has a climatic classification of Type III, with a Dry Season (DS) running from November to April and a Wet Season (WS) spanning the remainder of the year [64]. Both GW and SW samples were collected covering the DS and WS.

2.2. Analysis of Physicochemical Parameters and HM Concentrations

In–situ measurements of the physicochemical parameters from 62 SW and 34 GW sampling sites were collected during the DS, whereas 59 SW and 49 GW were collected during the WS. These water properties were temperature (°C), pH, EC (µS/cm), and TDS (mg/L) using a Hanna HI 9811-5 handheld multi-parameter sampler. Total metals were also quantified for each surface water and groundwater sample. The EPA Method 3005A and 200.7 were used for the acid digestion and elemental analysis, respectively, to determine total recoverable or dissolved metals [65,66] and HM metal concentrations for SW and GW at limited sampling locations. The elemental analysis was carried out by ICP-OES Optima 8000.

2.3. Neuro-Particle Swarm Optimization Modelling (NN-PSO)

The prediction models for the HM concentration in SW and GW were developed using a particle swarm optimization informed backpropagation neural network. For this purpose, the MATLAB R2021a Neural Network Toolbox was utilized throughout the study.

2.3.1. Machine Learning Geostatistical Interpolation (MLGI) Mapping

The collected surface water and groundwater samples during the dry season (DS) and wet season (WS) were mapped using an MLGI method [67]. This included physicochemical parameters and heavy metal concentrations of SW and GW. The hybrid technique integrated with the empirical Bayesian kriging (EBK) method generated the spatial concentration maps of the target study area. The generated maps were used as part of the datasets utilized in the development of heavy metal prediction models.

2.3.2. Data Pre-Processing

The ANN is capable of processing purely numerical data input. As a result, all data must be transformed into numerical data input format. Data normalization was accomplished by scaling the network’s input and output data nodes between −1 and +1. To prevent the inverse effect of input variables with differing scales, the data from the input variables and heavy metal concentrations were normalized to the same scale. Data normalization was used to ensure rapid convergence and to acquire the lowest mean square error (MSE) values possible [45]. The normalized values of each input and output were achieved using Equation (1). where y is the normalized value, y = +1, y = −1, x is the real value, and x and x are the upper and lower limit quantities of the parameter being normalized.

2.3.3. Backpropagation Neural Network

ANN is a basic and appropriate approach for modeling non-linear connections. To describe the effects of each aspect, it is based on a mathematical model that incorporates managing components called neurons and the relationships between them. This approach’s extensive flexibility to generate outcomes from complex or partial data makes it ideal for forecasting unique scenarios [68]. ANN is a data management strategy that mimics how a natural neural system like the human brain integrates data. It exports experimental data to the network configuration. The network understands the total system by calculating mathematical facts. The model’s key feature is its unique data management structure arrangement. An ANN is designed to categorize data or identify arrays using knowledge-based techniques. The synaptic interactions between neurons include learning knowledge [69]. The learning methodology is used to change and develop the neural network. The model is upgraded to provide a better result from a given input. An ANN is composed of three layers, each of which is derived from a biological neuron. Input neurons (IN) of the biological neuron transmit data to the hidden layer (HL), which in turn delivers data to the output neurons (ON). During activation, each neuron performs a biased sum of the inputs from the neurons to which it is linked. If the entire database exceeds the established limitation amount, the neuron links to additional neurons called transfer [70,71]. The internal characteristics of the surface water and groundwater heavy metal prediction model include the training algorithm and the transfer function. The LM approach (training algorithm) employs a second-order derivative of the performance index but approximates the Hessian matrix using the Jacobian gradient. This approach is the fastest way to train feedforward ANNs with a few hundredweights [72]. For the transfer function, the hyperbolic tangent sigmoid (tansig) is the most often utilized neural activation function for multilayer networks. The tansig functions return outputs scaled between −1 and 1 [73]. The architecture of the heavy metal prediction models is presented in Figure 2.
Figure 2

The architecture of the heavy metal prediction models.

The complete data array used during the training and testing stages was split into two groups. One set contains 70% of the dataset utilized for network training, whereas the remaining 30% of the data were used for network validation (15%) and testing (15%). The training procedure involved exhibiting entire example pattern pairs in the training dataset to the network and adjusting the weights of the connections until desired values are obtained by the MATLAB Neural Network Toolbox using an iteration-based method. The trained network is exposed to the testing data array to verify the efficiency of the training process after the training procedure is complete [74]. The purpose of the testing step of an ANN model is to ensure that the constructed model was properly trained, and that adequate generalization was achieved. As a result, testing and training the network are basically similar. The testing set is crucial for ensuring that the network has not only remembered a particular dataset but has also learned the application-specific patterns. The testing dataset is a separate dataset that is unknown to the network. After the training phase was completed, the testing dataset is employed to validate and generalize the trained network. When the network can exactly generalize the output for this testing data, the neural network is able to properly forecast the output for new data, and the network is verified [75]. The number of hidden layers was reduced to one, and the number of hidden neurons was limited to 1–30 to prevent an overly complex model. Additionally, the early stopping approach was used to avoid overfitting. The training was halted at the moment of the lowest error [76].

2.3.4. Particle Swarm Optimization

PSO is a well-known evolutionary process that is motivated by the individual interaction and mobility kinetics of insects, birds, and fish. It is a search method based on optimization and population stochastic search that seeks to solve problems in a continuous search space [11]. Particle swarm optimization’s primary benefits over other optimization methods such as ICA and genetic algorithm (GA) are its faster learning rate and lower memory requirements. The particle motions determine the optimal global and optimal personal locations using the PSO. Equations (2) and (3) are the equations for the location and velocity of the particles. where V, V, X, and X refer to the particle’s current and new velocity and location, respectively. Additionally, C and C indicate two positive and steady acceleration quantities designated from operators. The variables r and r refer to arbitrary variables in the form of (0,1), whereas ω denotes the inertia weight (IW) [77].

2.3.5. Hybrid NN-PSO (hN-PSO)

The ANN’s training process generates a minimization issue that may be addressed using either traditional or metaheuristic techniques. In an hH-PSO model, the PSO is used to reduce the ANN’s errors by establishing the model’s optimal weights and biases [78]. The parameters in this study are the weights and biases, and the model’s viable area is based on the interval over which these parameters fluctuate. Equation (4) may be utilized to determine the fitness function of the ith particle. where E is the fitness variable, T is the target variable, P is the forecasted variable output based on weights and biases, S is the population of training samples, and O is the number of neurons. The following steps are required to implement the hN-PSO model: (a) developing an ANN model with initial weights and biases using a specified number of hidden neurons (HN) in the HL, (b) revising the weights and biases to represent the location of a particle in the model’s “x”-dimensional space, where “x” is the total number of weights and biases, (c) using each particle in each iteration, output values can be predicted and the fitness function value can be calculated using Equation (3), and (d) updating the location of the particles by the PSO algorithm for a specified number of populations and iterations until the target is achieved (fitness function is minimized) [35]. The number of HN for HM models assessed in this research ranges from 1 to 30, as suggested by Tufaner and Demicri in 2020 [79]. Moreover, the number of iterations has been set to 2000 as suggested in the study of Rukhaiyar et al. in 2018 [80]. The hN-PSO system for predicting the heavy metal concentration is shown in Figure 3 [32].
Figure 3

The hN-PSO system.

PSO was used to train and optimize the initial ANN model. The neural network’s weights and biases were introduced and optimized using the following steps: (a) data collection, (b) network development, (c) network design, (d) initial weights and biases, (e) ANN training using PSO, (f) network validation, and (g) network acceptance as the governing model. Following the preparation of the obtained data, the PSO initialized the particles randomly, including the number of hidden neurons and transfer function and the location and velocity of each particle, which are updated with each iteration. The process starts by traversing the hyperspace of possible solutions in search of the ideal solution. At each iteration, the particles respond adequately depending on their knowledge of other particles. The ANN optimization block diagram using PSO is presented in Figure 4 [81].
Figure 4

Block diagram of ANN weights optimization using PSO.

2.3.6. Performance Validation and Measurement

Internal and external validation will be used to assess the network’s performance. Internal validation is a component of the model’s development phase and will be carried out in the manner proposed by Thio et al. [82]. An external validation, which includes an external dataset, will be performed to assess the generalizability of the governing neural network architecture when applied to an external set of data [83,84]. The performance measurement criteria used is the Pearson’s correlation coefficient (PCC) and MSE wherein the ideal value is 1 and 0 for the Pearson’s correlation coefficient and MSE, respectively [77,85]. These values are statistical measurement factors employed to compute the connection and the variance concerning computed and forecasted quantities, respectively. Equations for computing the MSE and Pearson’s correlation coefficient are presented in Equations (5) and (6). where n is the overall quantity of the dataset, and P and O are the HM concentrations forecasted by the ANN techniques and observed quantities, respectively, whereas the and were the average observed values and average forecasted values of the HM concentrations. Moreover, to extend the evaluation of the models, the Akaike information criterion (AIC) and the Kling–Gupta efficiency (KGE) were also used as performance metrics in this study. The AIC is a measure of goodness and a tool for model selection wherein the least value is the criterion for selecting the best model. At the same time, the KGE is based on the correlation, variability bias, and mean inclination of the model [86,87]. The equations for obtaining the AIC and KGE are presented in Equations (7) and (8). where N is the number of datasets, k is the quantity of HN, R is the linear correlation between the actual and predicted value, VE is the variability error, which is the ratio between the standard deviation of the expected and the observed value, and BT is the bias term, which is the ratio of the mean predicted and mean actual value. The ideal value for KGE is equal to 1. The optimal architecture of the models, which includes the number of hidden neurons in the hidden layer, was obtained using the MSE and AIC values. The MSE values were utilized to calculate the AIC values for each model topology (hidden neurons from 1 to 30) for SW and GW during the DS and WS. The hidden neuron that corresponds to the minimum AIC values was the governing model structure adopted in each HM model.

2.4. Comparison to Other Models

The performance of the developed hN-PSO model for predicting heavy metals in SW and GW were compared to the performance of the other prediction models such as linear and support vector machine models. Linear models include robust linear regression, linear regression, and stepwise linear regression models. Support vector machine models include linear SVM, fine Gaussian SVM, cubic SVM, medium Gaussian SVM, and quadratic SVM. Robust regression is an effective method for regression analysis and a viable alternative to least squares regression approaches for datasets polluted by outliers or influential observations [88]. MLR determines the correlation between two or more input variables by applying a linear equation to observed data. MLR involves both data summarization and investigation of the connection between variables. The general form of MLR models is presented in Equation (9) wherein Y is the independent variable, X is the dependent variable, and ai are the regression coefficients [89], and a stepwise regression is a technique for testing the statistical significance of each independent variable in a linear regression model sequentially [90]. Linear SVM is utilized for linearly separable data, which implies that if a dataset can be categorized into two classes using a single straight line, the data are linearly separable, and a linear SVM classifier is employed [91]. There are three Gaussian SVMs employed in the model comparison, including the fine Gaussian SVM, medium Gaussian SVM, and coarse Gaussian SVM. Fine Gaussian SVM uses a moderate amount of memory for binary classification and a large amount of memory for multiclass classification during the training phase, whereas medium Gaussian SVM uses a moderate amount of memory for binary classification and a large amount of memory for multiclass classification during the training phase. Additionally, the memory consumption of a medium Gaussian SVM is high for multiclass classification and low for binary classification [92]. Coarse Gaussian SVM is a non-linear SVM learning approach that falls under the category of data mining. Furthermore, the coarse Gaussian SVM is difficult to comprehend and has limited flexibility [93]. Additionally, cubic and quadratic SVMs were also compared to the developed NN-PSO models. A cubic SVM is an effective SVM approach when dealing with a memory space constraint wherein SVM locates a hyperplane in a multidimensional space that best separates the classes [94], whereas in quadratic SVM, memory utilization is low for binary classification and high for multiclass classification during its training phase. The prediction speed is likewise rapid for binary classification and slow for multiclass classification [95].

2.5. Sensitivity Analysis

Due to the efficiency of the simulation results, the connection weight (CW) segmentation approach was able to calculate the contribution of each input variable to the HM concentration using a sensitivity analysis. Sensitivity analysis was utilized to determine the rate of change in model output as a function of model parameter changes and, as a result, to identify the most influential factors in HM concentrations in surface and groundwater [96,97]. The connections between the neurons are represented by the CW, which means the connections between the issue and the resolution. Sensitivity analysis was used to establish the RI of individual input parameters. Olden’s CW technique was used in this work to estimate the variable significance of individual input variables to the number of heavy metals in SW and GW [98]. The CW approach was used to compute the product of the IN–HN and HN–ON, CWs for each IN and ON, and then add the products across the HN. The higher the sum, the more significant the IN is. Equation (10) is used to compute the RI of an input variable “i”. where R.I. is the variable significance of the variable “i” in the input layer (IL) to the HM concentration, x is the index quantity of the HN, W is the CW between the input parameter “i” and the HN noted as x, and W is the CW between the HN noted as x and the ON noted as y [99].

3. Results

This section summarizes the principal results and includes all models developed using neural networks coupled with particle swarm optimization technique. This section also provides the result of the sensitivity analysis of the variables and their influence on the HM concentrations in SW and GW as well as uncertainty analysis of the HM models.

3.1. Heavy Metal Concentrations

The dataset for the physicochemical characteristics and HM concentrations used in the study are depicted in Appendix B. The physicochemical characteristics observed include temperature (Figure A1), pH (Figure A2), EC (Figure A3), and TDS (Figure A4). The heavy metal concentrations observed include Cr (Figure A5), Cd (Figure A6), Fe (Figure A7), Mn (Figure A8), Zn (Figure A9), Ni (Figure A10), Pb (Figure A11), and Cu (Figure A12).
Figure A1

Spatial maps of temperature for (a) SW during the DS; (b) SW during the WS; (c) GW during the DS; (d) GW during the WS.

Figure A2

Spatial maps of pH for (a) SW during the DS; (b) SW during the WS; (c) GW during the DS; (d) GW during the WS.

Figure A3

Spatial maps of EC for (a) SW during the DS; (b) SW during the WS; (c) GW during the DS; (d) GW during the WS.

Figure A4

Spatial maps of TDS for (a) SW during the DS; (b) SW during the WS; (c) GW during the DS; (d) GW during the WS.

Figure A5

Spatial maps of Cr for (a) SW during the DS; (b) SW during the WS; (c) GW during the DS; (d) GW during the WS.

Figure A6

Spatial maps of Cd for (a) SW during the DS; (b) SW during the WS; (c) GW during the DS; (d) GW during the WS.

Figure A7

Spatial maps of Fe for (a) SW during the DS; (b) SW during the WS; (c) GW during the DS; (d) GW during the WS.

Figure A8

Spatial maps of Mn for (a) SW during the DS; (b) SW during the WS; (c) GW during the DS; (d) GW during the WS.

Figure A9

Spatial maps of Zn for (a) SW during the DS; (b) SW during the WS; (c) GW during the DS; (d) GW during the WS.

Figure A10

Spatial maps of Ni for (a) SW during the DS; (b) SW during the WS; (c) GW during the DS; (d) GW during the WS.

Figure A11

Spatial maps of Pb for (a) SW during the DS; (b) SW during the WS; (c) GW during the DS; (d) GW during the WS.

Figure A12

Spatial maps of Cu for (a) SW during the DS; (b) SW during the WS; (c) GW during the DS; (d) GW during the WS.

Table 3 and Table 4 enumerate descriptive information on the HM concentrations in SW samples from Marinduque Province. The mean surface water temperature for both the dry and wet seasons was within the Philippine Water Quality Guidelines (WQG). The mean surface water pH for both the dry and wet seasons was less than the minimum pH value set by the Philippine WQG and the WHO standards. The mean EC and TDS of surface water were greater than the guidelines set by the WHO, which are 1500 µS/cm and 1200 mg/L, respectively. The mean concentration of total Cd, Fe, and Cu was above the Philippine WQG and the World Health Organization guidelines (WHO). The mean concentration of Cr in SW was above the Philippine WQG. Ni concentration also exceeded the guidelines set by the WHO for both dry and wet seasons.
Table 3

Descriptive statistics of the physicochemical parameters and total concentration of HM (in mg/L) used in the DS–SW model.

ParameterNMinMaxMeanGuidelines
Philippine WQG [100]WHO
Temp (°C)8026.036.430.5825–31 -
pH802.99.46.286.5–9.06.5–9.2
EC (µS/cm)80130.06000.02617.21-1500
TDS (mg/L)8060.03000.01377.66-1200
Cr (mg/L)800.000290.037660.018200.0100.050
Cd (mg/L)800.007060.061220.043150.0050.003
Fe (mg/L)800.452372.761952.323901.5000.300
Mn (mg/L)800.0004911.097832.072690.2000.400
Zn (mg/L)800.000479.580501.690572.0003.000
Ni (mg/L)800.004130.126890.101560.2000.070
Pb (mg/L)800.003390.056080.038510.0500.010
Cu (mg/L)800.0276317.165677.67426-2.000
Table 4

Descriptive statistics of the physicochemical parameters and total concentration of HM (in mg/L) used in the WS–SW model.

ParameterNMinMaxMeanGuidelines
Philippine WQG [100]WHO
Temp (°C)8026.733.730.2625–31 -
pH803.18.45.946.5–9.06.5–9.2
EC (µS/cm)80905380.02211.00-1500.00
TDS (mg/L)80402670.01142.35-1200.00
Cr (mg/L)800.000230.037660.029370.0100.05
Cd (mg/L)800.000400.061220.044590.0050.003
Fe (mg/L)800.0691553.0162421.748081.50.3
Mn (mg/L)800.003610.017690.010270.20.4
Zn (mg/L)800.024800.074300.039222.003.00
Ni (mg/L)800.004150.126890.088200.200.07
Pb (mg/L)800.006800.056070.034580.050.01
Cu (mg/L)800.006900.207300.09144-2.00
The GW physicochemical characteristics recorded were below the Philippine National Standards for Drinking Water (PNSDW) 2017 and WHO requirements for both dry and rainy seasons. Likewise, the EC and TDS were within the World Health Organization limits. The total mean concentration of Cr, Cd, Fe, Zn, and Ni exceeded the PNSDW 2017 and the WHO guidelines. Table 5 and Table 6 include descriptive data for the physicochemical parameters and total concentrations of HM utilized in the DS and WS-GW models.
Table 5

Descriptive statistics of the physicochemical parameters and total concentration of HM (mg/L) used in the DS–GW model.

ParameterNMinMaxMeanGuidelines
PNSDW 2017 [67]WHO
Temp (°C)8026.349.637.72--
pH806.17.97.016.5–8.56.5–9.2
EC (µS/cm)8080.02350.01140.45-1500.000
TDS (mg/L)8030.01150.0499.12600.0001200.000
Cr (mg/L)800.017330.171820.075270.0500.050
Cd (mg/L)800.000550.103890.068790.0030.003
Fe (mg/L)800.0003854.6856711.501161.0000.300
Mn (mg/L)800.000098.718572.441370.4000.400
Zn (mg/L)800.0009856.9613313.952115.0003.000
Ni (mg/L)800.000130.125300.089550.0700.070
Pb (mg/L)800.015600.121780.106760.0100.010
Cu (mg/L)800.037110.260500.215421.0002.000
Table 6

Descriptive statistics of the physicochemical parameters and total concentration of HM (in mg/L) used in the WS–GW model.

ParameterNMinMaxMeanGuidelines
PNSDW 2017 [67]WHO
Temp (°C)8026.236.730.25--
pH805.67.96.856.5–8.56.5–9.2
EC (µS/cm)8020.02840.01185.05-1500.000
TDS (mg/L)8010.01400.0601.20600.001200.000
Cr (mg/L)800.016380.171790.147670.0500.050
Cd (mg/L)800.000550.103890.044580.0030.003
Fe (mg/L)800.1639013.586109.824321.0000.300
Mn (mg/L)800.004050.145790.040890.4000.400
Zn (mg/L)800.024800.519920.265635.0003.000
Ni (mg/L)800.001010.124900.100050.0700.070
Pb (mg/L)800.054960.121780.118310.0100.010
Cu (mg/L)800.006900.027590.022571.0002.000

3.2. Correlation Analysis

The PCC was applied to examine the degree of association between physicochemical properties and measured HM concentrations in SW throughout the dry and rainy seasons. Pearson’s correlation matrix (PCM) plots of these parameters across the DS and WS are shown in Figure 5. It was discovered that during both the DS and WS, the EC and TDS, as well as Cr and Cd, were positively associated. This recorded data confirmed the study of Tiwari et al. [101] and Huang et al. [102]. Cu and Zn also exhibited a positive association throughout the dry and wet seasons, agreeing with the findings of Bhuyan et al. [103]. During the wet season, a positive relation between Cd:Cu and Cd:Pb, as well as Ni and Pb, was recorded. These data agreed with the findings of Wang et al. [104]. These positive correlations among the heavy metals could indicate that the water body in the research area has similar hydro-chemical features [105].
Figure 5

Pearson’s correlation matrix plots for the surface water physicochemical parameters and HM concentrations during (a) DS; and (b) WS.

Correlations between physicochemical characteristics and heavy metals in groundwater were also studied. It was shown that EC and TDS had a positive relationship for both dry and wet seasons as illustrated by Figure 6. The positive association between EC and TDS is due to conductivity being a function of total dissolved solids, the ionic composition of water, and the concentration of dissolved species [106]. Likewise, a positive correlation was observed for Cd and Cr for both the dry and wet seasons, which suggests a common origin of these heavy metals [107]. Additionally, it was established that during the dry season, Cr has a positive link with Ni and Pb, which was confirmed by the results of Ukah et al. and Magesh et al. [105,108]. Moreover, similar to the findings of Rashid et al. in 2021 [109], there was a high positive correlation between Cd:Ni and Cd:Pb during the WS. Furthermore, a positive correlation was recorded between Fe and Zn. This is consistent with Senoro et al. results in 2022 [110]. Positive correlations between these metals showed that they originated from a common source. The loadings from Cd, Pb, and Ni were associated to anthropogenic sources as observed by Wagh et al. in 2018 [111]. The Fe and Zn correlation was geogenic in nature as elaborated in the findings of Bhutiani et al. in 2016 [112].
Figure 6

Pearson’s correlation matrix plots for the groundwater physicochemical parameters and HM concentrations during (a) DS; and (b) WS.

3.3. NN-PSO Modelling Results

Table 7 and Table 8 illustrate the NN-PSO simulations for the HM models in SW during the dry and rainy seasons, respectively. This section contains the R and MSE findings, as well as the topology of the created HM models. The topology of the heavy metal models was described as 4-HN-1 (input neurons-hidden neurons-output neurons), wherein the number of hidden neurons for each HM model is likewise shown in Table 7 and Table 8.
Table 7

NN-PSO simulation results for the heavy metal models in SW during the DS.

Hidden NeuronsNo. of ParticlesNo. of IterationsElapsed Time (sec)MSER
ValidationTesting
Cr3072000121.28840.0000090.968780.98999
Cd2832000153.18870.0000210.955660.96404
Fe2992000150.50060.0048710.995850.97976
Mn2732000151.06930.0043640.999330.98620
Zn3062000116.52870.0317730.999040.96388
Ni2962000115.78150.0000470.983160.97981
Pb3042000152.84670.0000200.978320.96557
Cu3022000153.22600.0390100.999720.98390
Table 8

NN-PSO simulation results for the heavy metal models in SW during the WS.

Hidden NeuronsNo. of ParticlesNo. of IterationsElapsed Time (sec)MSER
ValidationTesting
Cr2922000157.22460.00000090.990500.98830
Cd3082000152.50880.00001500.983070.96868
Fe28102000151.05970.07027600.980990.96368
Mn29102000138.31260.00000060.956860.96337
Zn3032000114.71990.0000050.985590.98614
Ni2712000120.57530.0000580.987790.96227
Pb2912000119.64430.0000080.983770.98897
Cu2892000153.60680.0002690.967070.95589
The R validation of HM models in SW varied between 0.95566 and 0.99972 during the DS and between 0.95686 to 0.98779 during the WS. The R values varied from 0.95710 to 0.98620 and 0.94923 to 0.98897, respectively, during the DS and WS throughout the testing phase. The highest MSE was observed in the Cu model, whereas the lowest MSE was observed in the Cr model for SW during DS. Moreover, the highest MSE was observed in the Fe model, whereas the lowest MSE was observed in the Mn model in the SW for the period of the WS. The R validation of HM models in GW varied between 0.97786 and 0.99965 during the dry season and between 0.96040 to 0.99925 during the wet season. The R values varied from 0.95538 to 0.99835 and 0.97544 to 0.99815, respectively, during the DS and WS throughout the testing phase. The Mn model had the greatest MSE, whereas the Pb model had the lowest for GW during DS. Additionally, the Fe model had the greatest MSE, whereas the Cu model had the lowest MSE in GW during the WS. The NN-PSO simulation findings for the HM models in GW during the DS and WS are shown in Table 9 and Table 10.
Table 9

NN-PSO simulation results for the HM models in GW during the DS.

Hidden NeuronsNo. of ParticlesNo. of IterationsElapsed Time (sec)MSER
ValidationTesting
Cr3082000154.56530.0000140.988510.98640
Cd2812000156.78190.0000780.989100.98683
Fe2992000150.36150.0318660.984140.96054
Mn3062000158.47570.0403150.984140.96002
Zn2942000122.39000.0087800.999650.99577
Ni27102000155.50620.0000730.977860.95538
Pb2892000124.10620.0000030.996410.99788
Cu2932000122.37250.0000300.996630.99835
Table 10

NN-PSO simulation results for the HM models in GW during the WS.

Hidden NeuronsNo. of ParticlesNo. of IterationsElapsed Time (sec)MSER
ValidationTesting
Cr2922000162.47540.000148000.968130.98011
Cd2742000157.03240.000038000.994260.98938
Fe2982000145.39540.045373000.960400.97544
Mn3082000164.00520.000078000.982310.97926
Zn2852000161.12270.000123000.998610.99775
Ni2972000160.06930.000022000.974630.98991
Pb28102000122.71190.000006000.987880.99495
Cu3052000157.58300.000000050.999250.99815
The validation and testing plots of the SW and GW during the DS and WS are exhibited in Appendix C. These include the plots for Cr (Figure A13), Cd (Figure A14), Fe (Figure A15), Mn (Figure A16), Zn (Figure A17), Ni (Figure A18), Pb (Figure A19), and Cu (Figure A20).
Figure A13

Correlation plots for NN-PSO simulations of Cr for (a) SW during the DS; (b) SW during the WS; (c) GW during the DS; (d) GW during the WS.

Figure A14

Correlation plots for NN-PSO simulations of Cd for (a) SW during the DS; (b) SW during the WS; (c) GW during the DS; (d) GW during the WS.

Figure A15

Correlation plots for NN-PSO simulations of Fe for (a) SW during the DS; (b) SW during the WS; (c) GW during the DS; (d) GW during the WS.

Figure A16

Correlation plots for NN-PSO simulations of Mn for (a) SW during the DS; (b) SW during the WS; (c) GW during the DS; (d) GW during the WS.

Figure A17

Correlation plots for NN-PSO simulations of Zn for (a) SW during the DS; (b) SW during the WS; (c) GW during the DS; (d) GW during the WS.

Figure A18

Correlation plots for NN-PSO simulations of Ni for (a) SW during the DS; (b) SW during the WS; (c) GW during the DS; (d) GW during the WS.

Figure A19

Correlation plots for NN-PSO simulations of Pb for (a) SW during the DS; (b) SW during the WS; (c) GW during the DS; (d) GW during the WS.

Figure A20

Correlation plots for NN-PSO simulations of Cu for (a) SW during the DS; (b) SW during the WS; (c) GW during the DS; (d) GW during the WS.

The optimal network parameters of the developed HM models (IN-HN-ON) for SW and GW are presented in Table 11. These include the training algorithm used (Levenberg–Marquardt algorithm), transfer function (tansig), and the number of hidden neurons for each model of SW and GW during the dry and wet seasons.
Table 11

Optimal parameters of the developed HM models in SW and GW during the DS and WS.

ModelGoverning Network StructureModelGoverning Network Structure
SW Dry Cr4-30-1GW Dry Cr4-30-1
SW Dry Cd4-28-1GW Dry Cd4-28-1
SW Dry Fe4-29-1GW Dry Fe4-29-1
SW Dry Mn4-27-1GW Dry Mn4-30-1
SW Dry Zn4-30-1GW Dry Zn4-29-1
SW Dry Ni4-29-1GW Dry Ni4-27-1
SW Dry Pb4-30-1GW Dry Pb4-28-1
SW Dry Cu4-30-1GW Dry Cu4-29-1
SW Wet Cr4-29-1GW Wet Cr4-29-1
SW Wet Cd4-30-1GW Wet Cd4-27-1
SW Wet Fe4-28-1GW Wet Fe4-29-1
SW Wet Mn4-29-1GW Wet Mn4-30-1
SW Wet Zn4-30-1GW Wet Zn4-28-1
SW Wet Ni4-27-1GW Wet Ni4-29-1
SW Wet Pb4-29-1GW Wet Pb4-28-1
SW Wet Cu4-28-1GW Wet Cu4-30-1
Figure 7 illustrates the link between the quantity of HN varying from 1 to 30 and the corresponding AIC values. It was observed that the lowest AIC values were seen in 27 HN for Mn (SW dry), Ni (SW wet and GW dry), and Cd (GW wet); 28HN for Cd (SW dry and GW dry), Fe (SW wet), Cu (SW wet), Pb (GW dry and GW wet), and Zn (GW wet); 29 HN for Fe (SW dry, GW dry, and GW wet), Ni (SW dry and GW wet), Cr (SW wet and GW wet), Mn (SW wet), Pb (SW wet), Zn (GW dry), and Cu (GW dry); and 30 HN for Cr (SW dry and GW dry), Zn (SW dry and SW wet), Pb (SW dry), Cu (SW dry and GW wet), Cd (SW wet), and Mn (GW dry and GW wet). These lowest AIC values observed in the respective hidden neurons were included in the metrics used to identify the governing model for each target HM.
Figure 7

Effect of the hidden neurons on the heavy metal model performance measured using AIC: (a) surface water—dry season; (b) surface water—wet season; (c) groundwater—dry season; (d) groundwater—wet season.

The results of the additional evaluation metric using the KGE are presented in Figure 8. Results show that the developed models for the SW and GW during the dry and wet seasons have good KGE values, wherein all values were greater than 0.95.
Figure 8

KGE values for SW and GW Models during the dry and wet season.

3.4. Comparison to Other Models

Linear and support vector machine models were likewise developed to see how well the developed hybrid NN-PSO models performed compared to other models. The results indicated that for surface water models generated using NN-PSO, the R-values were up to 1.60 times more than the R-values obtained using the SVM models and 1.2 to 2.0 times greater than the R-values obtained using the linear models during the dry season. For the wet season surface water model, the R values for HM models utilizing NN-PSO were up to 1.1 times and 1.6 times greater than the highest R-value observed from the SVM models and linear models, respectively. Similarly, the R values obtained from the heavy metal dry season groundwater models were up to 1.4 times greater than the highest R-value obtained from the SVM models and 1.2 to 2.5 times greater than the R-values obtained from the linear models. Moreover, wet season ground-water models were shown to have R-values up to 2.1 times and 1.9 times greater than the highest recorded R-value for models created using SVM and linear models, respectively. The radar graphs of the observed values for the generated NN-PSO models and the linear and SVM models are shown in Figure 9, Figure 10, Figure 11 and Figure 12.
Figure 9

Radar graph showing the performance of the (a) Cr; (b) Cd; (c) Fe; (d) Mn; (e) Zn; (f) Ni; (g) Pb; (h) Cu surface water models during the dry season.

Figure 10

Radar graph showing the performance of the (a) Cr; (b) Cd; (c) Fe; (d) Mn; (e) Zn; (f) Ni; (g) Pb; (h) Cu surface water models during the wet season.

Figure 11

Radar graph showing the performance of the (a) Cr; (b) Cd; (c) Fe; (d) Mn; (e) Zn; (f) Ni; (g) Pb; (h) Cu groundwater models during the dry season.

Figure 12

Radar graph showing the performance of the (a) Cr; (b) Cd; (c) Fe; (d) Mn; (e) Zn; (f) Ni; (g) Pb; (h) Cu groundwater models during the wet season.

3.5. Sensitivity Analysis Using Olden’s Connection Weight Approach

The CW of the models for DS and WS were utilized to calculate the RI of the physicochemical parameters to the HM concentrations. Using Olden’s CWs approach, the results of the calculation of R.I. in each model are shown in Figure 13.
Figure 13

The relative importance of the physicochemical parameters to the heavy metal concentration in (a) SW during the DS; (b) SW during the WS; (c) GW during the DS; (d) GW during the DS.

It was observed in the DS SW models that the surface water pH is the parameter with the highest RI to the Cr, Mn, Zn, Ni, and Cu concentration, whereas that of Cd, Pb, and Fe were seen to be temperature, EC, and TDS, respectively. The SW models during WS displayed temperature to have the highest R.I. to the Zn, Pb, and Cu concentration, whereas that of Cr and Mn was determined to be the TDS. It was also observed that pH is the most influential parameter for the Cd, Fe, and Ni concentrations. For the GW models during DS, the temperature was observed to have the highest R.I. to the Cr, Cd, Zn, Ni, and Pb concentration, whereas TDS recorded the highest RI for Mn and Cu. It was likewise seen that groundwater pH is the most influential factor for the Fe concentration in the DS. For the WS groundwater models, the temperature was observed to have the highest R.I. for Cr and Cd, whereas it was EC for Fe and TDS for Ni and Cu. It was also observed that the groundwater pH during the wet season is the most significant parameter for the Ni, Zn, and Pb concentrations.

4. Discussion

The HM contaminants in SW and GW endanger human life and contribute to the deterioration of the environment and health risks. Detected HM concentrations in SW and GW were higher than the WHO and Philippine standards. As water is a vital part of everyday activity of a community such as Marinduque Province, guaranteeing universal access and sustainable management of water should be ensured as mandated by the United Nations Sustainable Development Goals (SGDs). Accurately predicting HM concentration in water resources is significant to ensure that proper monitoring of its progression in the environment is implemented. It also enables the proper dissemination of information of the potential risks that a community might be exposed to and creates mitigating measures and remediation strategies that the authorities could implement to address the heavy metal contamination in water resources. The ML models are fundamentally data-driven, with several studies implementing different ML techniques with different variables used as the input parameters. The ANN-PSO technique was implemented in this study to develop the HM intensity models. The developed NN-PSO models were compared to models created using linear and SVM methods. It was observed that the developed NN-PSO models for heavy metals in surface water during the dry and wet season performed better than the models created using linear and SVM models. The observed R-values were up to 1.6 times and 1.1 times greater than the highest R-values observed for linear and SVM models during the dry and wet seasons. Moreover, the GW models were observed to have R-values that were greater than up to 1.4 times and 1.6 times the R-values observed in the highest linear and SVM models during the DS and WS. The findings were in agreement with the results of the study by Zhang et al. in 2020 wherein the hybrid ANN-PSO was observed to perform better as compared to the other MLR and machine learning models including SVM [32]. Furthermore, comparisons in the performance of the developed models to the previously published models are shown in Figure 14. These models include different ML techniques such as NN-PSO, NN-BR, NN-ICA, NN-LM, NN-BBO, MANFIS-SCM, SVM, MANFIS-GP, MANFIS-SCM, ANFIS, K-NN, and GRNN. In this study, a hybrid ANN-PSO model, i.e., hN-PSO, was used with input parameters that were easier to obtain. The findings reveal that the present study’s performance measures are on par with the performance of previously published heavy metal models, with the additional advantage of using parameters that can be collected in-situ.
Figure 14

Comparison of the performance of the published models and the developed models for (a) DS surface water; (b) WS surface water; (c) DS groundwater; (d) WS groundwater.

The simulation of different internal characteristics and topologies of models were implemented to obtain the governing models of HMs total concentration in SW and GW. It was observed that the governing models were based on the AIC performance metric values, which are the minimum among the observed HN. It was observed that as the AIC value reached its minimum value, further increasing the number of HN will return an increased AIC value. This implies that the network has already become generalized [113]. In addition, KGE values for all models in surface and groundwater during the dry and wet seasons were greater than 0.95, which suggests that the models are accurate. Additionally, it was discovered that the quantity of HNs had no influence on the model’s performance, which is consistent with Çolak’s findings in 2021 [114]. Olden’s CW technique is an excellent strategy for illuminating the neural network’s black box design by offering more explanatory insight into the input parameters’ contributions [115]. The Olden’s CWs approach was implemented in this study because it was the best method as suggested in the findings of Olden et al. [116]. In this study, it was observed that the temperature and pH are the most influential input parameters to the HM concentrations; these findings were also observed in Morin and Mutt [117]. The use of ML tools such as NN-PSO models is in line with the transition of different disciplines to Industry 4.0. Employing ML techniques provides a new avenue to create models that can be applied in real-world conditions that can be complex and non-linear in nature. The PSO method was used with the ANN model in this research to create the optimal model with the least error.

5. Conclusions

Limited HM concentrations data in SW and GW inspired the researchers to develop models as tools for environmental quality monitoring specifically on the total concentration of HMs such as Cr, Cd, Fe, Mn, Zn, Ni, Pb, and Cu. Using the physicochemical properties of SW and GW, which are the most commonly monitored data, the NN-PSO models were developed and exhibited good performance that was on par with linear, SVM, and existing published deterministic and AI models that used input parameters that cannot be obtained in field conditions. The performance of the models was evaluated using their AIC values, and the number of HN that returned the lowest AIC value was chosen as the governing topology for the HM models. It was further observed that as the AIC value reached its minimum value, further increasing the number of HN returned an increased AIC value due to network generalization. Moreover, the R and KGE values of the governing model were almost equal to 1, whereas MSE values were approaching 0, which are the ideal values for these performance metrics. The performance of the developed NN-PSO models was compared to created models using linear and SVM approaches, and the findings suggest that the NN-PSO was the superior modeling tool in this study based on its model performance. The R.I. of the input parameters was assessed using Olden’s CWs approach. The sensitivity analysis showed that temperature and pH were the parameters with the most influence on the HM concentration. Based on the elaborated findings, it can be concluded that the use of the hN-PSO for forecasting HM concentrations in water resources is both an effective and a practical approach. Additionally, the recorded findings indicate the efficacy of using common water parameters as inputs to prediction models for HM concentrations that may be simply adopted for in-situ settings. These findings could initiate and contribute to regular monitoring of HM concentrations in SW and GW. The findings of the study illustrate that the NN-PSO is an effective method and practical approach in predicting HM concentration in water resources. Moreover, this study provides a model with parameters suitable for in-situ purposes and on par with the other HM prediction models in water resources. Future studies on the inclusion of HM speciation and its prediction models with more data dimensions, or the use of feature reduction approach in model development are recommended. Multi-layered networks using deep learning and other ML may also be studied. Integration of time components is another facet for future study that will be investigated using time series-based methods such as NARX and LSTM, which can be used in dynamic analysis and transport studies.
  34 in total

1.  Effectiveness of groundwater heavy metal pollution indices studies by deep-learning.

Authors:  Sudhakar Singha; Srinivas Pasupuleti; Soumya S Singha; Suresh Kumar
Journal:  J Contam Hydrol       Date:  2020-09-23       Impact factor: 3.188

2.  Metals geochemistry and mass export from the Mississippi-Atchafalaya River system to the Northern Gulf of Mexico.

Authors:  Jeremy H Reiman; Y Jun Xu; Songjie He; Emily M DelDuco
Journal:  Chemosphere       Date:  2018-04-17       Impact factor: 7.086

3.  Application of radial basis function artificial neural network to quantify interfacial energies related to membrane fouling in a membrane bioreactor.

Authors:  Yifeng Chen; Genying Yu; Ying Long; Jiaheng Teng; Xiujia You; Bao-Qiang Liao; Hongjun Lin
Journal:  Bioresour Technol       Date:  2019-09-04       Impact factor: 9.642

4.  Comparison of Models for Predicting Quality of Life After Surgical Resection of Hepatocellular Carcinoma: a Prospective Study.

Authors:  Chong-Chi Chiu; King-Teh Lee; Hao-Hsien Lee; Jhi-Joung Wang; Ding-Ping Sun; Chien-Cheng Huang; Hon-Yi Shi
Journal:  J Gastrointest Surg       Date:  2018-06-18       Impact factor: 3.452

5.  Data-driven soft computing modeling of groundwater quality parameters in southeast Nigeria: comparing the performances of different algorithms.

Authors:  Johnbosco C Egbueri; Johnson C Agbasi
Journal:  Environ Sci Pollut Res Int       Date:  2022-01-25       Impact factor: 4.223

6.  An artificial neural network to predict resting energy expenditure in obesity.

Authors:  Emmanuel Disse; Séverine Ledoux; Cécile Bétry; Cyrielle Caussy; Christine Maitrepierre; Muriel Coupaye; Martine Laville; Chantal Simon
Journal:  Clin Nutr       Date:  2017-09-01       Impact factor: 7.324

7.  Cardiologist-level arrhythmia detection and classification in ambulatory electrocardiograms using a deep neural network.

Authors:  Awni Y Hannun; Pranav Rajpurkar; Masoumeh Haghpanahi; Geoffrey H Tison; Codie Bourn; Mintu P Turakhia; Andrew Y Ng
Journal:  Nat Med       Date:  2019-01-07       Impact factor: 53.440

8.  Remote sensing-based water quality index estimation using data-driven approaches: a case study of the Kali River in Uttar Pradesh, India.

Authors:  Saif Said; Shadab Ali Khan
Journal:  Environ Dev Sustain       Date:  2021-04-21       Impact factor: 3.219

9.  Development and Internal Validation of Machine Learning Algorithms for Preoperative Survival Prediction of Extremity Metastatic Disease.

Authors:  Quirina C B S Thio; Aditya V Karhade; Bas JJ Bindels; Paul T Ogink; Jos A M Bramer; Marco L Ferrone; Santiago Lozano Calderón; Kevin A Raskin; Joseph H Schwab
Journal:  Clin Orthop Relat Res       Date:  2020-02       Impact factor: 4.755

View more
  2 in total

1.  Statistical Assessment, Modeling, and Mitigation of Water and Soil Pollution.

Authors:  Lucica Barbeş; Alina Bărbulescu
Journal:  Toxics       Date:  2022-05-18

2.  In Situ Measurements of Domestic Water Quality and Health Risks by Elevated Concentration of Heavy Metals and Metalloids Using Monte Carlo and MLGI Methods.

Authors:  Delia B Senoro; Kevin Lawrence M de Jesus; Ronnel C Nolos; Ma Rowela L Lamac; Khainah M Deseo; Carlito B Tabelin
Journal:  Toxics       Date:  2022-06-21
  2 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.