
Long Short-Term Memory-based simulation study of river happiness evaluation - A case study of Jiangsu section of Huaihe River Basin in China.

Abstract

Real-time prediction of a river's own state and of the degree to which it benefits the people is a key way to achieve human-water harmony. Using the indicator scoring method as the evaluation method, we took river evaluation data and results with time series characteristics as features and labels, and applied the concept of transfer learning to Long Short-Term Memory (LSTM). We established six subsystems (water safety, water quality, economic contribution, water ecology, water management and water culture) to conduct a real-time rolling evaluation simulation of the degree of river happiness in the Jiangsu section of the Huaihe River Basin in China. The empirical results show that the maximum Root Mean Square Error (RMSE) across the training and test sets of the subsystems is 0.0226 and the lowest coefficient of determination R² is 0.9011, which indicates that the model fits well. The relevant data of the watershed for June 2022 were then fed into the model, giving an evaluation result of 89.77 points. The overall trend is good, but a certain tendency to fall back at the level of economic contribution can be found, and the reasons are analyzed objectively.


Keywords: Long short-term memory; River happiness evaluation simulation; Transfer learning

Year: 2022 PMID: 36119861 PMCID: PMC9479020 DOI: 10.1016/j.heliyon.2022.e10550

Source DB: PubMed Journal: Heliyon ISSN: 2405-8440

Introduction

On September 18, 2019, General Secretary Xi Jinping held a symposium on ecological protection and high-quality development of the Yellow River Basin in Zhengzhou, China. He issued a call to “make the Yellow River a happy river for the benefit of the people” (Liu and Cheng, 2020). Since then, river evaluation work built around the “happiness scale” as its core idea has come into being. A happy river maintains its own health and supports the economic and social development of the watershed. In addition, it embodies the idea of “harmony between human and water” and provides a high level of security and satisfaction for the people in the watershed (Happy River Research Group, 2020). From the meaning of Happy River, we can extract its vital influencing factors: safe operation, continuous supply, ecological health, and human-water harmony (Zuo et al., 2020a). Safe operation means the sound structure and function of the river itself. Happy rivers have unobstructed water and sediment channels and are protected against flood and drought (Dupuits et al., 2019; Hubble, 2010). Continuous supply means the river can provide sufficient and high-quality water resources for residential life, industry, agriculture, etc. (Gumbo and Kapangaziwiri, 2021). Ecological health encompasses water ecology and water environmental health. First, the river should have a good-quality water body and sediment. Second, the river should have high biodiversity (Wolfram et al., 2021). Human-water harmony means that people’s development and protection of rivers can be synergistic (Zuo et al., 2020b). These factors influence the evaluation of Happy River. Happy River Evaluation is the process by which water practitioners score a particular river through a series of procedures. In general, the procedure includes the identification of critical influencing factors, the selection of evaluation indicators and the selection of evaluation methods (Chen et al., 2022a).
Scholars have conducted preliminary studies on the evaluation indicators of happy rivers. Over the course of this research, the evaluation target layer has changed as follows. Initially, the happiness of rivers was evaluated in terms of their natural attributes, human and social attributes, and the degree of human-water harmony (Han and Xia, 2020). As the research progressed, academics proposed a more comprehensive set of goals: “flood prevention and security, quality water resources, healthy water ecology, livable water environment, and advanced water culture” (Jin et al., 2022; Sally, 2021; Chen et al., 2022b; Xia et al., 2022). This target system, however, lacks any expectation regarding water management. In addition, scholars have conducted preliminary studies on the evaluation methods of happy rivers. At present, these methods can be divided into two types according to their intrinsic nature: subjective methods and combined subjective-objective methods. The main idea of subjective evaluation is to analyze the influencing factors, form a complete evaluation system, and use different weighting and evaluation methods to produce a score. The models used in the existing studies include the entropy-weighted matter-element model, the cloud model, the fuzzy evaluation method, and the improved grey TOPSIS model (Wang et al., 2021a, Wang et al., 2021b; Huang et al., 2021; Han and Xia, 2020; Chen, 2021). These models have apparent shortcomings: they are rather subjective and their evaluation process is cumbersome. As a result, scholars have begun to explore evaluation methods that combine subjectivity and objectivity. The idea is to train a neural network on existing evaluation data, following black box principles, so that it produces a prediction that approximates the actual value (Qiaozhen et al., 2019).
Currently, the mainstream neural network models include Radial Basis Function (RBF), Back Propagation (BP), Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM). BP, RBF, and LSTM have been repeatedly applied to evaluate water ecological health and quality (Tong et al., 2022; Cui, 2012; Shi et al., 2021). To overcome the shortcomings of the studies above, we added the evaluation index of “water management” and applied LSTM to the evaluation of Happy River. “Efficient water management” means that the management and services related to river management are in place and efficient. Key influencing factors include the professional quality of water staff, the degree of information management of water work, etc. Adding the water management expectation enriches the Happy River concept. We chose LSTM as the evaluation method because it can handle time series data better than BP and RBF (Smagulova and James, 2019), and this evaluation is itself a time series data processing task. As mentioned previously, LSTM has been used several times to evaluate water quality with good fitting results (Zhou et al., 2021). Compared with water quality evaluation, the Happy River evaluation differs only in its evaluation indices. Therefore, we transferred and retrained the model. In this process, the hyperparameters of the LSTM need to be adjusted so that the model’s fit remains good. In summary, the innovations of this paper are as follows: we have enriched the evaluation index system of Happy River, and we have used LSTM to make the evaluation of Happy River more objective and efficient.

Research methodology and data sources

Research methodology

Weighting methods

This research combines the expert scoring method, the entropy method and CRITIC for comprehensive weighting. Expert scoring calculates weights from experts' assessment of the importance of indicators. It is a subjective weighting method (Chen et al., 2018). The entropy method calculates weights from the entropy of the data, i.e., the amount of information they carry. It is applicable when the data fluctuate and treats those fluctuations as a kind of information (Dash and Kalamdhad, 2021). CRITIC calculates weights from the correlations between data (Zhu and Chang, 2020; Yjc and Dza, 2021). First, we believe that river evaluation is highly specialized and that information asymmetry arises easily in the field of water resources, so professional advice from the water sector is helpful in the evaluation. Therefore, the index assignment table provided by experts from the China Institute of Water Resources and Hydropower Research was selected for this study. Second, we combined the entropy method and CRITIC for objective weighting to reduce the subjective arbitrariness of the evaluation. The CRITIC method does not measure the degree of dispersion between indicators, while the entropy method tends to ignore the correlations that may exist among the indicators. The two therefore complement each other. We believe that such a combination takes into account both the variability of the data of each indicator and the correlation between the data (Fu and Chu, 2020).
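
The objective part of this weighting can be sketched in a few lines of Python. The sketch below computes entropy weights and CRITIC weights from a matrix of indicator scores and then averages them. It is only an illustration under stated assumptions: the text does not say how the two weight vectors are combined, so the simple arithmetic mean, the min-max scaling step, and all function names (`min_max_columns`, `entropy_weights`, `critic_weights`, `combined_weights`) are hypothetical choices, and the sample numbers are made up.

```python
import math
from statistics import mean, pstdev


def min_max_columns(rows):
    """Scale every indicator (column) to [0, 1]; returns a list of columns."""
    scaled = []
    for col in zip(*rows):
        lo, hi = min(col), max(col)
        span = hi - lo
        scaled.append([(v - lo) / span if span else 0.0 for v in col])
    return scaled


def entropy_weights(cols):
    """Entropy method: indicators whose values are more dispersed get more weight."""
    n = len(cols[0])
    divergences = []
    for col in cols:
        total = sum(col)
        if total == 0:  # constant indicator carries no information
            divergences.append(0.0)
            continue
        probs = [v / total for v in col]
        entropy = -sum(p * math.log(p) for p in probs if p > 0) / math.log(n)
        divergences.append(1.0 - entropy)
    s = sum(divergences)  # assumes at least one indicator varies
    return [d / s for d in divergences]


def pearson(a, b):
    """Pearson correlation; returns 0.0 if either series is constant."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def critic_weights(cols):
    """CRITIC: standard deviation times conflict (1 - correlation) with other indicators."""
    info = []
    for j, col in enumerate(cols):
        conflict = sum(1.0 - pearson(col, other)
                       for k, other in enumerate(cols) if k != j)
        info.append(pstdev(col) * conflict)
    s = sum(info)
    return [c / s for c in info]


def combined_weights(rows):
    """ASSUMPTION: combine the two objective weight vectors by a simple average."""
    cols = min_max_columns(rows)
    w_entropy = entropy_weights(cols)
    w_critic = critic_weights(cols)
    return [(a + b) / 2 for a, b in zip(w_entropy, w_critic)]


# Made-up scores: 4 months (rows) x 3 indicators (columns).
sample = [[80.0, 0.90, 12.0], [85.0, 0.70, 15.0], [78.0, 0.80, 11.0], [90.0, 0.95, 14.0]]
weights = combined_weights(sample)
```

Because both methods return weights that sum to one, their average also sums to one, so the combined vector can be used directly in a weighted score.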

Transfer learning

Transfer learning is the process of taking a model learned in an old domain and applying it to a new domain based on similarities between data, tasks, or models (Panigrahi et al., 2021). Transfer learning can be classified according to four criteria: the presence or absence of labels in the target domain, learning methods, features, and offline versus online forms (Chen, 2019), as shown in Figure 1. The transfer learning used in this study is model-based transfer. Model-based transfer finds the parameter information shared between the source and target domains, and it requires the assumption that the data in the source domain and the data in the target domain can share some parameters of the model (Fernandes and Cardoso, 2017; Kaya et al., 2019; Bayoudh et al., 2020). Fine-tuning is the earliest method developed for deep network transfer and is the one used in this paper. Existing studies have shown that (1) deep transfer networks are more effective than random initialization of weights; (2) deep transfer networks have advantages in suppressing data variability; (3) transferring network layers can accelerate network learning and optimization; and (4) the first three layers of a neural network learn general features, so transferring them is more effective (Yosinski et al., 2014).

Figure 1

Schematic diagram of transfer learning classification.

Long Short-Term Memory

Recurrent neural networks (RNNs) are a powerful class of neural network models for processing and predicting sequential data (Yang et al., 2018). Long Short-Term Memory (LSTM) operates on a similar principle to RNNs. However, because the structure of the LSTM black box (the internal unit of the LSTM) is richer and more detailed, it has more powerful information storage and prediction capabilities. It overcomes two inherent flaws of the RNN, vanishing gradients and exploding gradients (Bengio, 2002). Due to these properties, many scholars use LSTM for research related to sequence data, such as behaviour simulation, image recognition, medical diagnosis, etc. (Yin et al., 2016; Zhang et al., 2020a, Zhang et al., 2020b; Xia et al., 2018). The LSTM architecture used in this study is from Graves and Schmidhuber (2005). Its basic structure is a chain. As shown in Figure 2, there are several black boxes A in the chain. Let us take the second black box in the figure as an example. The black box A forms a nonlinear mapping between the input value x_t and the output value h_t at time t. The black box does not give us a mathematical expression relating the input and output values, but if we give it a large number of input and output values, it can be trained into a neural network with high accuracy. When we then enter a new value, the black box returns an output value that is very close to the actual value. This is how the black box, or LSTM, works. In LSTM, the black box A also keeps the information at moment t (C_t and h_t) and transmits it to moment t + 1, where it modifies how the input values at moment t + 1 are processed. Therefore, the most remarkable feature of LSTM is its powerful time series data processing capability.

Figure 2

Diagram of LSTM structure.

In the following, we expand the black box A, as shown in Figure 3. The boxes labelled “σ” are the gates of the LSTM cell: f_t is the forget gate, i_t is the update gate, and o_t is the output gate. Each gate can be regarded as a fully connected layer, and the gates help the LSTM store and update information (Houdt et al., 2020). Specifically, gating is implemented by Sigmoid functions and element-wise multiplication, and gating does not provide additional information. In addition, Figure 3 shows that the LSTM black box has three input values and three output values. The three input values are the cell state at moment t − 1 (C_{t−1}), the hidden state at moment t − 1 (h_{t−1}) and the sample vector at moment t (x_t). The three output values are two copies of the hidden state at moment t (h_t) and one cell state at moment t (C_t). The output values at moment t − 1 are the input values at moment t. Therefore, we only need to explore how the output values are formed.

Figure 3

Diagram of LSTM internal structure.

The general form of the gating control is shown in Eq. (1), where σ(x) = 1/(1 + exp(−x)) is the Sigmoid function. The Sigmoid function is often used as an activation function for LSTM because of two main properties. First, it is easy to differentiate, which facilitates the subsequent use of gradient-based parameter optimization algorithms. Second, it controls the passage rate of information: when the output value is 0, the gate passes no information, and when the output value is 1, the gate passes all the information. In addition, x and h are the input and output values of the black box, respectively, and w and b are the weight matrix and bias term, respectively. As shown in Figure 3, the value of a gate at time t depends on x and h. Hochreiter defines the general expression for gating: multiplying x and h by the corresponding weight matrices, adding the bias term, and applying the activation function generates the gate unit (Yu et al., 2019). The initial weights and bias terms are random, but this does not affect the final training accuracy of the neural network, because the LSTM is a supervised learning neural network whose weights are corrected during training. We can then keep adjusting the hyperparameters by trial and error. Eventually, through repeated forward and backward propagation of information, we obtain an LSTM with high accuracy. After obtaining the general expression for gating, we can obtain the specific expressions for the three gates. The expression of the forget gate is shown in Eq. (2), that of the update gate in Eq. (3), and that of the output gate in Eq. (4). As shown in Figure 3, C_t, the memory cell at moment t, is the sum of information from two sources. On the one hand, there is the element-wise product of f_t and C_{t−1}. On the other hand, there is the element-wise product of i_t and the candidate memory C′_t. Thus, C_t is essentially the sum of the information at moment t − 1 (after forgetting) and the information at moment t (after updating). The expression of C_t is shown in Eq. (5), where ⊙ denotes element-wise multiplication and “tanh” is the hyperbolic tangent function. Like the Sigmoid function, tanh is widely used in LSTM. Finally, we give the expression for the last output value at moment t (the hidden state at moment t, h_t). As shown in Figure 3, h_t results from passing C_t through the tanh activation function and multiplying the result element-wise by o_t. The formula for h_t is shown in Eq. (6).
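
The update described above can be sketched as a single LSTM time step in plain Python. This is a minimal illustration of the standard LSTM cell, not the authors' implementation: the equations are the textbook form, in which each gate is a Sigmoid of a weighted sum of h_{t−1} and x_t plus a bias, and the candidate memory uses tanh. The function names, the parameter dictionary keys (`W_f`, `b_f`, and so on) and the choice to concatenate h_{t−1} and x_t into one vector are assumptions made for the example.

```python
import math


def sigmoid(z):
    """Sigmoid activation: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def affine(weights, bias, vec):
    """Return W @ vec + b, with W given as a list of rows."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weights, bias)]


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step; returns (h_t, C_t)."""
    z = h_prev + x_t  # concatenate h_{t-1} and x_t (assumed layout)

    # Gates: Sigmoid of a weighted sum plus bias (general gate form).
    f_t = [sigmoid(a) for a in affine(params["W_f"], params["b_f"], z)]  # forget gate
    i_t = [sigmoid(a) for a in affine(params["W_i"], params["b_i"], z)]  # update gate
    o_t = [sigmoid(a) for a in affine(params["W_o"], params["b_o"], z)]  # output gate

    # Candidate memory C'_t.
    c_cand = [math.tanh(a) for a in affine(params["W_c"], params["b_c"], z)]

    # C_t = f_t (*) C_{t-1} + i_t (*) C'_t, element-wise.
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_cand)]

    # h_t = o_t (*) tanh(C_t), element-wise.
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t


def run_sequence(xs, params, hidden_size):
    """Feed a sequence through the cell, carrying (h, C) from step to step."""
    h = [0.0] * hidden_size
    c = [0.0] * hidden_size
    for x_t in xs:
        h, c = lstm_step(x_t, h, c, params)
    return h, c
```

With all weights and biases set to zero, every gate equals 0.5 and the candidate memory is 0, so one step simply halves the cell state. This makes a convenient sanity check before any training.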

Data sources

The data required to construct the LSTM evaluation simulation model are the samples and the labels: the samples are the original data of the river happiness evaluation index system, and the labels are the happiness scores. The degree of river happiness is a collective name for the degree of health of the river itself, the degree to which it supports high-quality economic and social development of the basin, the degree to which it carries cultural soft power, the degree of human-water harmony, etc. Based on the relevant literature, the professional recommendations of the “China River and Lake Happiness Index Report 2020” were taken as the main body and combined with the speech of General Secretary Xi Jinping (Xi, 2019) to develop three levels: the target level, the guideline level and the indicator level. The resulting evaluation system has six subsystems (“excellent water security”, “quality water resources”, “positive water economy”, “harmonious water ecology”, “efficient water management” and “advanced water culture”) and a total of 34 indicators. An excellent water security subsystem indicates that the river can withstand flood and drought hazards. Key influencing factors include the extent of flooding in the basin, flood recovery efficiency, and flood prevention efficiency. A quality water resources subsystem means the river has excellent and stable water quality. Key influencing factors include surface water and groundwater quality conditions, etc. A positive water economy subsystem indicates that the river can satisfy agricultural, industrial, and domestic water use with high water use efficiency. Key influencing factors include the degree of water resource development and utilization, water supply security, etc. A harmonious water ecology subsystem indicates that the river ecosystem is stable and rich in biodiversity. Key influencing factors include the degree of natural habitat retention, the degree of soil and water conservation, and the degree of biodiversity. An efficient water management subsystem indicates that the management and services associated with river management are in place and efficient. Key influencing factors include the professional quality of water staff, the degree of information management of water work, etc. An advanced water culture subsystem indicates that the transmission and innovation of river-related culture are in place and effective. Key influencing factors include the impact of the water landscape, public awareness of water conservation, etc. As shown in Table 1, the above data constitute the sample set. Considering professionalism, data differentiation and data relevance, three weighting methods (expert scoring, the entropy method and CRITIC) were chosen, and the comprehensive score obtained with the indicator scoring method forms the label set.

Table 1

Evaluation indicators of happy river.

Target Level	Guideline Level	Primary Indicator Level	Code	Secondary Indicator Level	Code
Assessment Of Watershed Happiness A	Excellent Water Security B₁	Flooding Human Mortality Rate	C₁
		Flooding Economic Loss Rate	C₂
		Rate Of Flood Prevention Projects Meeting Standards	C₃	Dike Flood Control Project Standard Attainment Rate	D₁
				Reservoir Flood Control Project Standard Attainment Rate	D₂
				Sluice Gates Flood Control Project Standard Attainment Rate	D₃
		Flood Resilience	C₄
	Quality Water Resources B₂	River And Lake Water Quality Index	C₅
		Qualified Rate Of Centralized Drinking Water Sources For Surface Water	C₆
		Groundwater Resources Protection Index	C₇
	Positive Water Economy B₃	Water Resources Development Utilization Rate	C₈
		Water Security Rate	C₉	Urban And Rural Water Supply Penetration Rate	D₄
				Proportion Of Actual Irrigated Area	D₅
				Water Withdrawal Per 10,000 RMB Of Industrial Added Value	D₆
		The Ability Of Water Resources To Support High-Quality Development	C₁₀	GDP Output Per Unit Of Water	D₇
				Water Use Elasticity Coefficient	D₈
		Resident Well-Being Index	C₁₁	GDP Per Capita	D₉
				Engel Coefficient	D₁₀
				Average Life Expectancy	D₁₁
	Harmonious Water Ecology B₄	Retention Rate Of Natural Habitats In Rivers And Lakes	C₁₂	Water Area Retention Rate	D₁₂
				Percentage Of River Vertical Connectivity Above Medium Level	D₁₃
		Rate Of Ecological Flow Of Important Rivers And Lakes Meeting Standards	C₁₃
		Soil And Water Conservation Rate	C₁₄
		Aquatic Biodiversity Index	C₁₅
		Urban And Rural Residents Pro-Water Index	C₁₆
	Efficient Water Management B₅	Percentage Of Middle And Senior Workers In The Water Sector	C₁₇
		Water Education Base Opening Rate	C₁₈
		Water Resources Management Information System Construction	C₁₉
		Water Institutional Reform	C₂₀
	Advanced Water Culture B₆	Historical Water Culture Protection And Inheritance Index	C₂₁	Historic Water Cultural Heritage Preservation Index	D₁₄
				Historical Water Culture Dissemination Power	D₁₅
		Modern Water Culture Creation Innovation Index	C₂₂
		Water Landscape Impact Index	C₂₃
		Public Water Governance Awareness Participation	C₂₄	Public Water Awareness Penetration Rate	D₁₆
				Public Participation In Water Governance	D₁₇

The index calculation method and assignment method are shown in Table 2.


Table 2

Schematic table of the calculation and scoring methods of the main evaluation indicators.

Index	Calculation method	Assignment method
C₁	C₁′ = the average of the monthly flooding mortality rate in the last twelve months within the basin, where the monthly flooding mortality rate = the number of people dead or missing due to flooding in that month (unit: person)/the total population in the basin in that month (unit: million people) ∗ 100%	C₁′ = 0, C₁ = 100. C₁′ ≥ 0.42 persons per million, C₁ = 0. Other cases are assigned points by linear interpolation
C₂	C₂′ = the average monthly flood economic loss rate in the last twelve months within the basin, where the monthly flood economic loss rate = direct economic loss due to flood in that month (unit: million yuan)/GDP in that month within the basin (unit: million yuan) ∗ 100%	C₂′ = 0%, C₂ = 100. C₂′ ≥ 1.5%, C₂ = 0. Other cases are assigned points by linear interpolation
D₁	D₁′ = the length of dikes that meet the standard (unit: km)/the total length of planned dikes (unit: km) ∗ 100%	D₁ = D₁′∗100
D₂	D₂′ = the number of reservoirs that can perform their designed flood control function/the total number of reservoirs with a flood control function ∗ 100%, where large and medium-sized reservoirs and small reservoirs are calculated separately with weights of 0.6 and 0.4 respectively	D₂ = D₂′∗100
D₃	D₃′ = the number of sluice gates that can perform their designed flood control function/the total number of planned sluice gates with a flood control function ∗ 100%	D₃ = D₃′∗100
C₄	The expert experience scoring method is used to evaluate four parameters: economic strength of the basin, development level, rescue and relief capacity, and post-disaster recovery capability	The total score is 100 points. The four parameters are scored by the expert experience scoring method, and the post-flood recovery capacity score is calculated as their weighted average, with weights of 0.3, 0.2, 0.25 and 0.25 respectively
C₅	The paper conducts this evaluation based on the proportion of I-III river lengths and the proportion of poor V river lengths. The proportion of I-III river lengths is the proportion of the length of rivers with water quality categories better than and equal to III to the length of the evaluated rivers. The proportion of poor V river length is the proportion of the length of rivers with the water quality category of poor V to the length of the evaluated rivers.	The table of river water quality indicators uses the relevant provisions of the Technical Regulations for Surface Water Resources Quality Evaluation (SL395-2007)
C₆	C₆′ = the number of qualified surface water centralized drinking water sources/total number of surface water centralized drinking water sources ∗ 100%	C₆ = C₆′∗100
C₇	C₇′ = total regional shallow groundwater extraction/regional groundwater extractable volume	C₇′ ≤ 0.3, C₇ = 100. C₇ is reduced by 10 points for each 0.1 increase in C₇′. C₇′ ≥ 1.3, C₇ = 0.
C₈	C₈′ = water supply volume/total water resources ∗ 100%, where the water supply volume does not include the net transfer of water (transfer in - transfer out) or the water supply volume of other water resources	C₈′ ≤ 40%, C₈ = 100. C₈′ ≤ 50%, C₈ = 80. C₈′ ≤ 67%, C₈ = 60. C₈′ ≤ 75%, C₈ = 40. C₈′ ≤ 90%, C₈ = 20. C₈′ > 90%, C₈ = 0. The assignment criteria table is based on the “Technical Guidelines for River and Lake Health Assessment” (SL/T793-2020)
D₄	D₄′=(urban water supply penetration rate∗urban population + county water supply penetration rate∗county population + formed town water supply penetration rate∗formed town population + rural tap water penetration rate∗rural population)/total basin population∗100%	D₄ = D₄′∗100
D₅	D₅′ = actual irrigated area of arable land/irrigated area∗100%	D₅ = D₅′∗100
D₆	D₆′ = industrial water consumption (unit: billion cubic meters)/industrial added value (unit: million yuan)∗100%	D₆ = D₆′∗100
D₇	D₇′ = 10,000/water consumption per 10,000 yuan of GDP	D₇ = D₇′/baseline value∗100; if D₇ ≥ 100, count 100. Where the baseline value is taken as the median water consumption level of high-income countries, US$130m³, which translates into a GDP output of 531 yuan per cubic meter of water (in RMB)
D₈	D₈′ = average monthly growth rate of water consumption/average monthly growth rate of GDP (less than 1, 100 points, 1–2, 80 points, etc.)	D₈′ ≤ 1, D₈ = 100. D₈ is reduced by 10 points for every 1 increase in D₈′
D₉	D₉′ = basin GDP/basin population. The arithmetic mean of annual data was used for monthly data.	D₉ = D₉′/benchmark value ∗ 100; if D₉ ≥ 100, count 100. Where the benchmark value is taken from the lower level of high-income countries, i.e. US$20,000, with an exchange rate of 689.85 RMB/US$100
D₁₀	D₁₀′ = Σ(ENC_i ∗ CAP_i)/Σ CAP_i, where ENC_i is the Engel coefficient of municipality i in the basin and CAP_i is the population of municipality i in the basin	D₁₀ = benchmark value/D₁₀′∗100; if D₁₀ ≥ 100, count 100. Where the benchmark value is taken as the middle level of the UN's affluence standard (20%–30%), i.e. 25%
D₁₁	D₁₁′ = Σ(ALE_i ∗ CAP_i)/Σ CAP_i, where ALE_i is the average life expectancy of municipality i in the basin and CAP_i is the population of municipality i in the basin	D₁₁ = D₁₁′/baseline value∗100; if D₁₁ ≥ 100, count 100. Where the baseline value is taken as the median of 81 years in high-income countries
D₁₂	D₁₂′ = area of watershed space (rivers, lakes, reservoirs, beaches, mudflats, swamps) (unit: km²)/area of watershed space in 1980s (unit: km²)	D₁₂ = D₁₂′∗100
D₁₃	D₁₃′ = (Σ_{i=1}^{n} a_i ∗ b_i)/L_j ∗ 100; b_i = (b_Li + b_Qi)/2; b_Li = {[(L_ai/L_j) ∗ (L_bi/L_j)]/[((L_ai/L_j) + (L_bi/L_j))/2]}^α; b_Qi = (Q_i/Q_j)^β. Where: D₁₃′ is the longitudinal connectivity index of the river segment; a_i is the barrier coefficient corresponding to the barrage of the ith type; b_i is the location correction factor of the barrier of the ith type; b_Li is the location correction factor characterizing the influence of the location of the barrier on the longitudinal connectivity of the river at this level; b_Qi characterizes the influence of the location of the barrier on the connectivity between the river segment and the confluent main stream (estuary); L_ai is the distance of the barrier from the source of the river; L_bi is the distance of the barrier from the estuary (or confluence into the main stream); Q_i is the multi-year average natural runoff at the barrier; Q_j is the multi-year average natural runoff at the estuary (or confluence into the main stream); and α, β are the standardization coefficients, taking the values of 0.78 and 0.5 respectively. D₁₃″ = Σ_{j=1}^{n} (D₁₃′ ∗ L_j)/Σ_{j=1}^{n} L_j. Where: D₁₃″ is the vertical connectivity index of the major rivers in the primary zone; n is the number of rivers in the region with an area greater than 10,000 km²; L_j is the length of the jth river	According to the existing results of the national water ecology protection and restoration plan for major rivers and lakes, the national water resources protection plan, etc., combined with the actual basin, the standardization method of the vertical connectivity index of major rivers is determined: D₁₃ = (1 − D₁₃″/2.5)∗100; when D₁₃″ > 2.5, D₁₃ = 0
C₁₃	C₁₃′ = Number of control sections (points) that meet the ecological flow target/number of evaluation sections (points)∗100%	C₁₃ = C₁₃′∗100
C₁₄	C₁₄′ = Area with soil erosion intensity below mild/Area of evaluation area∗100%	C₁₄ = C₁₄′/soil and water conservation rate threshold∗100
C₁₅	C₁₅′ = the diversity indices of aquatic organisms (benthos, algae, phytoplankton, zooplankton) in the basin for the month
C₁₆	C₁₆′ = Number of National Scenic Water Conservancy Areas in the basin (unit: one)/basin area (unit: 100,000 km²)	C₁₆′ = 0, C₁₆ = 0; C₁₆′ ∈ (0,1], C₁₆ = 20; C₁₆′ ∈ (1,5], C₁₆ = 40; C₁₆′ ∈ (5,10], C₁₆ = 60; C₁₆′ ∈ (10,20], C₁₆ = 80; C₁₆′ ∈ (20,+∞), C₁₆ = 100
C₁₇	C₁₇′ = Number of senior workers in local water conservancy sector (unit: person)/Total number of workers in water conservancy sector (unit: person)	C₁₇ = C₁₇′∗100
C₁₈	C₁₈′ = Number of national water education bases within the basin (unit: one)/Total number of national water education bases (unit: one)	C₁₈ = C₁₈′∗100
C₁₉		Water resources management information systems established at all levels down to the county level: 100 points; established down to the municipal level: 80 points; other cases: 60 points
C₂₀		Water system reform completed at the municipal, district and county levels: 100 points; completed at the district and county levels: 80 points; other cases: 60 points
D₁₄	D₁₄′ = (number of world-class heritage ∗5 + number of national heritage ∗2 + number of provincial heritage) (unit: one)/basin area (unit: 100,000 km²)	D₁₄′ = 0, D₁₄ = 0. D₁₄′ ≥ 10, D₁₄ = 100. Other cases are assigned points by linear interpolation
D₁₅	D₁₅′ = (number of national museums or bases∗2 + number of provincial museums or bases) (unit: one)/watershed area (unit: 100,000 km²)	D₁₅′ = 0, D₁₅ = 0. D₁₅′ ≥ 6, D₁₅ = 100. Other cases are assigned points by linear interpolation
C₂₂	C₂₂′ = [Number of national-level current year (scientific research projects with acceptance conditions + scientific research papers + awards + authorized patents) ∗2 + number of provincial-level current year (scientific research projects with acceptance conditions + scientific research papers + awards + invention patents)]/basin area (unit: 100,000 km²)	C₂₂′ = 0, C₂₂ = 0. C₂₂′ ≥ 6, C₂₂ = 100. Other cases are assigned points by linear interpolation
C₂₃	C₂₃′ = [Number of world-class natural heritage water landscapes∗5 + number of national-level (natural heritage water landscapes + wetland parks + national parks)∗2 + number of provincial-level (natural heritage water landscapes + wetland parks + national parks)]/total resident population in the watershed (unit: million people)	C₂₃′ ≤ 1, C₂₃ = 50. C₂₃′ ≥ 10, C₂₃ = 100. Other cases are assigned points by linear interpolation
D₁₆	Questionnaire survey	Using questionnaires to analyze the popularity of public awareness of water, respect for water, care for water and water conservation, each questionnaire has a total score of 100, and the average score is calculated according to all questionnaires.
D₁₇	Questionnaire survey	Using questionnaires, statistical analysis of public participation in activities related to water science, water construction, water supervision, etc., with a total score of 100 points for each questionnaire and an average score calculated based on all questionnaires
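
Most assignment rules in Table 2 follow one of two patterns: linear interpolation between two anchor points, or a step function over bands. The sketch below shows both patterns in Python, using the C₁, C₂₃ and C₈ rules from the table as examples. The function names (`linear_score`, `step_score`) and the constant `C8_BANDS` are hypothetical, and the raw input values are made up for illustration.

```python
def linear_score(raw, raw_a, score_a, raw_b, score_b):
    """Interpolate linearly between (raw_a, score_a) and (raw_b, score_b).

    Raw values outside the two anchors are clamped to the nearest anchor.
    """
    lo, hi = min(raw_a, raw_b), max(raw_a, raw_b)
    clamped = min(hi, max(lo, raw))
    frac = (clamped - raw_a) / (raw_b - raw_a)
    return score_a + frac * (score_b - score_a)


def step_score(raw, bands, default=0):
    """Return the score of the first band whose upper bound is >= raw."""
    for upper, score in bands:
        if raw <= upper:
            return score
    return default


# C1: 0 deaths per million -> 100 points; >= 0.42 per million -> 0 points.
c1 = linear_score(0.21, 0.0, 100.0, 0.42, 0.0)

# C23: raw <= 1 -> 50 points; raw >= 10 -> 100 points.
c23 = linear_score(5.5, 1.0, 50.0, 10.0, 100.0)

# C8: development utilization rate bands in percent (upper bound, score).
C8_BANDS = [(40.0, 100), (50.0, 80), (67.0, 60), (75.0, 40), (90.0, 20)]
c8 = step_score(45.0, C8_BANDS)
```

Keeping the anchors and bands as data rather than hard-coded branches means a change to a threshold in the table only requires editing one tuple.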

Empirical analysis

Study area

The Jiangsu section of the Huaihe River Basin in China mainly flows through the north-central region of Jiangsu Province. It involves eight prefecture-level cities, namely Xuzhou, Nantong, Lianyungang, Huaian, Yancheng, Yangzhou, Taizhou and Suqian, and is located at 116°22′–121°00′E and 32°23′–35°07′N. As shown in Figure 4, it is connected to the Tong Yang Canal and the Yangtze River basin in the south, reaching the Yiliu hilly mountains, and to the Yellow River basin in the north, reaching the Yimeng Mountains. It is the easternmost section of the Huaihe River Basin. With the abandoned course of the Yellow River as the boundary, it is divided into the Yishushi region and the lower reaches of the Huai River, with the cities of Xuzhou and Lianyungang belonging to the Yishushi region and the remaining six cities belonging to the lower reaches of the Huai River. There are many lakes and rivers in the Jiangsu section of the basin, including Hongze Lake, the Beijing-Hangzhou Grand Canal and the Huai-Shu New River. Among them, the Hongze Lake Wetland is an important freshwater wetland reserve in China, with a good environment and a rich variety of biological resources.

Figure 4

Jiangsu section of Huaihe River Basin, China.

Data processing

Data stationarity analysis

The Augmented Dickey-Fuller (ADF) test fits an autoregressive model, choosing the number of lagged terms by optimizing an information criterion, and indicates how strongly a time series is driven by a trend. The null hypothesis of the test is that the series has a unit root, i.e., that it is non-stationary; the alternative hypothesis is that the series is stationary. The p-value is the probability of obtaining a test statistic at least as extreme as the observed one if the null hypothesis were true. If it is less than or equal to the threshold value (0.05), the null hypothesis is rejected (the data are stationary); if it is higher than the threshold value (0.05), the null hypothesis cannot be rejected (the data are treated as non-stationary). The ADF statistic is also compared with a critical value, generally the one at the 1% significance level. In this study, Econometrics Views software (EViews) was used to conduct the test. The results showed that the p-values of all 34 feature series were less than 0.05 and that the ADF statistic of each feature was negative and below the 1% critical value, so the null hypothesis was rejected, i.e., the time series are stationary. In summary, the original data are stationary, and the next stage is data cleaning.
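To make the decision rule concrete, the sketch below computes a plain Dickey-Fuller statistic and compares it with a 1% critical value. It is not the EViews procedure used in the study: it omits the lagged difference terms of the full ADF test, uses a synthetic series rather than the study data, and takes −3.43 (the asymptotic 1% critical value for a regression with a constant) as an assumed threshold. The function name `dickey_fuller_stat` is hypothetical.

```python
import math
import random


def dickey_fuller_stat(y):
    """t-statistic of b in the regression  dy_t = a + b * y_{t-1} + e_t.

    A strongly negative value is evidence against a unit root.
    """
    x = y[:-1]                                        # lagged levels y_{t-1}
    d = [y[t] - y[t - 1] for t in range(1, len(y))]   # first differences
    n = len(d)
    mean_x = sum(x) / n
    mean_d = sum(d) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxd = sum((xi - mean_x) * (di - mean_d) for xi, di in zip(x, d))
    b = sxd / sxx                                     # OLS slope
    a = mean_d - b * mean_x                           # OLS intercept
    ss_res = sum((di - a - b * xi) ** 2 for xi, di in zip(x, d))
    se_b = math.sqrt(ss_res / (n - 2) / sxx)          # standard error of b
    return b / se_b


CRITICAL_1PCT = -3.43  # assumed asymptotic 1% critical value (constant, no trend)

# Synthetic stationary AR(1) series; NOT data from the study
rng = random.Random(0)
series = [0.0]
for _ in range(499):
    series.append(0.2 * series[-1] + rng.gauss(0.0, 1.0))

stat = dickey_fuller_stat(series)
is_stationary = stat < CRITICAL_1PCT  # reject the unit-root null at 1%
```

In practice a library implementation of the full ADF test, with lag selection and finite-sample critical values, would be preferable to this hand-rolled regression.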

Data cleaning

Data cleaning removes erroneous values and scattered null values from the data and fills in individual missing values. Given the data set and the research objective of this paper, a 5-point moving average is used to fill in the missing data. The formula for the missing value $M_t$ at time $t$ is shown in Eq. (7):

$$M_t = \frac{X_{t-1} + X_{t-2} + X_{t-3} + X_{t-4} + X_{t-5}}{5} \tag{7}$$

where $M_t$ is the missing value at moment $t$, and $X_{t-1}, X_{t-2}, X_{t-3}, X_{t-4}, X_{t-5}$ denote the five data values preceding moment $t$. In summary, the data cleaning in this study is completed by removing the wrong values and filling in the missing values.
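The moving-average fill can be sketched in a few lines. This is an illustration under assumptions, not the authors' code: missing values are represented as `None`, previously filled values are reused when later gaps are filled, and when fewer than five preceding values exist the average is taken over those available (the paper does not say how such edge cases were handled). The function name `fill_missing` is hypothetical.

```python
def fill_missing(series, window=5):
    """Fill None entries with the mean of up to `window` preceding values."""
    filled = list(series)
    for t, value in enumerate(filled):
        if value is None:
            # Preceding values, including any that were filled earlier
            previous = [v for v in filled[max(0, t - window):t] if v is not None]
            if not previous:
                raise ValueError("no preceding values available at index %d" % t)
            filled[t] = sum(previous) / len(previous)
    return filled


# Example: the gap at index 5 is filled with the mean of the five values before it
cleaned = fill_missing([1, 2, 3, 4, 5, None, 7])
```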

Data pre-processing

First, the data are normalized to remove differences in magnitude between indicators. Second, the processed data are divided into a training set and a test set; in this experiment, 90% of the data are used as the training set and 10% as the test set (Yang et al., 2018).
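A minimal sketch of these two steps follows. The paper does not name the normalization method, so min-max scaling to [0, 1] is an assumption here, as is keeping the samples in time order so that the most recent 10% form the test set. Both function names are hypothetical.

```python
def min_max_normalize(values):
    """Scale values to [0, 1]; a constant series maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def chronological_split(samples, test_fraction=0.10):
    """Keep time order: the last `test_fraction` of samples is the test set."""
    n_test = round(len(samples) * test_fraction)
    cut = len(samples) - n_test
    return samples[:cut], samples[cut:]


# Example: 120 monthly samples give 108 training and 12 test samples
monthly_index = list(range(120))
train, test = chronological_split(monthly_index)
scaled = min_max_normalize([2, 4, 6])
```

In a real pipeline the minimum and maximum would normally be computed on the training set only and then applied to the test set, to avoid leaking test information into training.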

Analysis of results

First, we assign weights comprehensively using the expert scoring method, the entropy method and the CRITIC method. The results of the weight assignment are shown in Table 3. Table 3 shows that the weights derived from the entropy method and the CRITIC method differ somewhat but follow the same general trend, since both are calculated from the information carried by the data. The results of the expert scoring method deviate slightly from these trends because they are based on the work experience of water experts. Taking the arithmetic mean of the three results gives the full (maximum attainable) score of each subsystem. The full scores of the water security, water resources, water economy, water ecology, water management, and water culture subsystems are 23.13, 11.37, 20.07, 20.03, 12.70 and 12.70, respectively.

Table 3

Schematic table of the results of the three methods of assigning weights.

Expert Scoring Method	C₁	C₂	D₁	D₂	D₃	C₄	C₅	C₆	C₇	C₈	D₄	D₅	D₆	D₇	D₈	D₉	D₁₀
	0.075	0.075	0.030	0.030	0.015	0.025	0.060	0.045	0.045	0.050	0.030	0.023	0.023	0.031	0.031	0.020	0.023
	D₁₁	D₁₂	D₁₃	C₁₃	C₁₄	C₁₅	C₁₆	C₁₇	C₁₈	C₁₉	C₂₀	D₁₄	D₁₅	C₂₂	C₂₃	D₁₆	D₁₇
	0.020	0.015	0.015	0.045	0.030	0.030	0.015	0.010	0.030	0.030	0.030	0.015	0.010	0.025	0.025	0.015	0.010
Entropy Method	C₁	C₂	D₁	D₂	D₃	C₄	C₅	C₆	C₇	C₈	D₄	D₅	D₆	D₇	D₈	D₉	D₁₀
	0.023	0.012	0.022	0.016	0.011	0.161	0.040	0.035	0.025	0.052	0.006	0.011	0.009	0.011	0.003	0.008	0.010
	D₁₁	D₁₂	D₁₃	C₁₃	C₁₄	C₁₅	C₁₆	C₁₇	C₁₈	C₁₉	C₂₀	D₁₄	D₁₅	C₂₂	C₂₃	D₁₆	D₁₇
	0.009	0.010	0.146	0.026	0.030	0.032	0.024	0.013	0.091	0.039	0.017	0.011	0.008	0.017	0.009	0.036	0.026
CRITIC	C₁	C₂	D₁	D₂	D₃	C₄	C₅	C₆	C₇	C₈	D₄	D₅	D₆	D₇	D₈	D₉	D₁₀
	0.047	0.018	0.025	0.045	0.022	0.042	0.040	0.025	0.026	0.031	0.018	0.046	0.019	0.020	0.028	0.024	0.021
	D₁₁	D₁₂	D₁₃	C₁₃	C₁₄	C₁₅	C₁₆	C₁₇	C₁₈	C₁₉	C₂₀	D₁₄	D₁₅	C₂₂	C₂₃	D₁₆	D₁₇
	0.025	0.038	0.030	0.025	0.052	0.020	0.018	0.019	0.029	0.031	0.042	0.019	0.036	0.037	0.034	0.023	0.025
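The arithmetic averaging of the three weighting results can be illustrated with two indicators. The weights for C₁ and C₂ below are as read from Table 3; the indicator scores (90 and 80) are hypothetical and chosen only for the example. The sketch also shows that averaging the three weighted scores is equivalent to scoring once with the averaged weights, because the weighted sum is linear in the weights.

```python
# Weights for two indicators under each method, as read from Table 3
WEIGHTS = {
    "expert":  {"C1": 0.075, "C2": 0.075},
    "entropy": {"C1": 0.023, "C2": 0.012},
    "critic":  {"C1": 0.047, "C2": 0.018},
}

# Hypothetical indicator scores on a 0-100 scale (not from the paper)
indicator_scores = {"C1": 90.0, "C2": 80.0}


def weighted_score(weights, scores):
    """Sum of weight * score over all indicators."""
    return sum(weights[name] * scores[name] for name in scores)


# Route 1: score under each method, then average the three scores
per_method = [weighted_score(w, indicator_scores) for w in WEIGHTS.values()]
mean_of_scores = sum(per_method) / len(per_method)

# Route 2: average the weights first, then score once
mean_weights = {
    name: sum(w[name] for w in WEIGHTS.values()) / len(WEIGHTS)
    for name in indicator_scores
}
score_with_mean_weights = weighted_score(mean_weights, indicator_scores)
```

Both routes give the same contribution from these two indicators, so the choice between them is a matter of convenience.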

Second, we take the arithmetic mean of the scores obtained with the three sets of weights and use it as the evaluation result for the happiness level of the river over the ten years. The scores are shown in Table 4. Table 4 shows that river happiness in the Jiangsu section of the Huaihe River Basin in China was good from 2012 to 2021: the total score rose from 76.40 points in 2012 to 87.34 points in 2021, an increase of 14.32%. Over the same period, the scores of the water security, water resources, water economy, water ecology, water management, and water culture subsystems increased by 7.68%, 1.41%, 26.21%, 1.42%, 62.33%, and 14.60%, respectively. In 2021, the subsystems reached 92.78%, 82.23%, 84.70%, 89.07%, 83.46%, and 87.17% of their full scores, respectively. The water safety, water ecology, and water culture subsystems currently score high, which indicates that they are in good condition. The water economy and water management subsystems show the largest increases, which indicates that they have made the greatest progress.

Table 4

Happiness score of Jiangsu section of Huaihe river basin, 2012–2022.

Year	Water Safety	Water Quality	Economic Contribution	Water Ecology	Water Management	Water Culture	Total Score
2012	19.93	9.22	13.47	17.59	6.53	9.66	76.40
2013	20.83	9.44	13.69	17.83	6.62	9.81	78.22
2014	21.79	9.11	13.96	17.86	6.69	9.88	79.29
2015	21.07	8.55	14.57	17.55	7.28	10.06	79.08
2016	21.00	9.10	15.14	17.77	9.89	10.36	83.26
2017	21.00	8.60	16.27	17.81	10.07	10.40	84.14
2018	21.00	9.27	17.34	17.74	9.99	10.74	86.08
2019	20.99	9.48	16.83	17.70	10.43	10.86	86.29
2020	21.55	9.32	16.67	17.96	10.57	10.95	87.03
2021	21.46	9.35	17.00	17.84	10.60	11.07	87.34
2022	21.97	9.94	16.58	18.59	10.98	11.71	89.77
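The growth and scoring rates quoted for Table 4 can be reproduced with a short calculation. The sketch below uses the 2012 and 2021 rows of Table 4 and the subsystem full scores stated in the text; matching the text's subsystem names (water security, water resources, water economy) to the table's column names (Water Safety, Water Quality, Economic Contribution) is an assumption based on their order.

```python
FULL_SCORES = {
    "Water Safety": 23.13, "Water Quality": 11.37, "Economic Contribution": 20.07,
    "Water Ecology": 20.03, "Water Management": 12.70, "Water Culture": 12.70,
}
SCORES_2012 = {
    "Water Safety": 19.93, "Water Quality": 9.22, "Economic Contribution": 13.47,
    "Water Ecology": 17.59, "Water Management": 6.53, "Water Culture": 9.66,
}
SCORES_2021 = {
    "Water Safety": 21.46, "Water Quality": 9.35, "Economic Contribution": 17.00,
    "Water Ecology": 17.84, "Water Management": 10.60, "Water Culture": 11.07,
}


def pct_change(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


def scoring_rate(score, full_score):
    """Share of the full score that was attained, in percent."""
    return score / full_score * 100.0


growth = {k: round(pct_change(SCORES_2012[k], SCORES_2021[k]), 2) for k in FULL_SCORES}
rates = {k: round(scoring_rate(SCORES_2021[k], FULL_SCORES[k]), 2) for k in FULL_SCORES}
total_growth = round(pct_change(76.40, 87.34), 2)  # Total Score column, 2012 vs 2021
```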

Finally, we use the raw data as features and the scores as labels. We selected 108 samples (monthly data from 2012 to 2020) as the training set and 12 samples (monthly data from 2021) as the test set, and used them to build an LSTM simulation model of the degree of river happiness. In constructing the model, we borrowed parameters from other well-trained models as initial parameters and then tuned them repeatedly to arrive at the best model parameters, which are shown in Table 5. As mentioned earlier, we divided the system into six subsystems (water security, water resources, water economy, water ecology, water management, and water culture) for LSTM modelling. The results of the model fitting are shown in Table 6. Across the subsystems, the training set had a maximum RMSE of 0.0226 and a minimum coefficient of determination R² of 0.9699; the test set had a maximum RMSE of 0.0193 and a minimum R² of 0.9011. These low errors and high R² values indicate that the model fits well. We use the first subsystem, water security, to demonstrate the fitting effect; the other subsystems can be interpreted in the same way, so we do not describe them in detail. The fit of the "water security" subsystem is shown in Figures 5 and 6, which show that the model fits well on both the training and test sets. The fits of the other subsystems are shown in Annexes 4–13.

Table 5

Selection of LSTM parameters.

numHiddenUnits	miniBatchSize	LearnRateDropPeriod
128	64	250
LearnRateDropFactor	MaxEpochs	InitialLearnRate
0.2	500	0.001
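The parameter names in Table 5 suggest a piecewise learning-rate schedule, in which the learning rate is multiplied by the drop factor each time the drop period elapses. The paper does not state the schedule explicitly, so the sketch below is an assumption about how the three learning-rate parameters interact; the function name `learning_rate_at` and the zero-based epoch index are illustrative choices.

```python
INITIAL_LEARN_RATE = 0.001
LEARN_RATE_DROP_FACTOR = 0.2
LEARN_RATE_DROP_PERIOD = 250
MAX_EPOCHS = 500


def learning_rate_at(epoch):
    """Learning rate for a zero-based epoch under a piecewise schedule (assumed)."""
    drops = epoch // LEARN_RATE_DROP_PERIOD   # completed drop periods so far
    return INITIAL_LEARN_RATE * LEARN_RATE_DROP_FACTOR ** drops


# Learning rate for every epoch of a 500-epoch run
schedule = [learning_rate_at(e) for e in range(MAX_EPOCHS)]
```

Under this reading, the first 250 epochs train at 0.001 and the remaining 250 at 0.0002.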

Table 6

Schematic table of the fitting effects of the training set and test set for each subsystem.

	Train Set			Test set
	RMSE	R²	Total RMSE	RMSE	R²	Total RMSE
Water Safety	0.0051	0.9885	0.0145	0.0193	0.9884	0.0113
Water Quality	0.0163	0.9780		0.0045	0.9011
Economic Contribution	0.0073	0.9868		0.0174	0.9041
Water Ecology	0.0107	0.9874		0.0148	0.9723
Water Management	0.0226	0.9898		0.0091	0.9458
Water Culture	0.0013	0.9699		0.0089	0.9012
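For reference, the two fit metrics in Table 6 can be computed as below. These are the standard definitions of RMSE and the coefficient of determination; the paper does not print its formulas in this section, so treat the sketch as an assumption about what was computed. The sample values are made up for illustration.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Illustrative values only (not from the study)
y_true = [1.0, 2.0, 3.0]
y_pred = [1.0, 2.0, 4.0]
example_rmse = rmse(y_true, y_pred)
example_r2 = r_squared(y_true, y_pred)
```

Because the data were normalized before training, RMSE values of the size reported in Table 6 are on the normalized scale rather than in score points.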

Figure 5

Schematic diagram of the fitting effect of the training set of the water safety subsystem.

Figure 6

Schematic diagram of the fitting effect of the test set of the water safety subsystem.

After demonstrating the feasibility of the model, we fed in the river data for June 2022 and obtained the river happiness scores for the Jiangsu section of the Huaihe River Basin. The scores are shown in Table 4. The overall score was 89.77, an increase of 2.78% compared to 2021. The scores of the water safety, water quality, water ecology, water management, and water culture subsystems increased by 2.38%, 6.31%, 4.20%, 3.58%, and 5.78%, respectively, compared to 2021, while the score of the water economy subsystem decreased by 2.47%. In summary, the general trend of happy river development is positive, but the economic contribution level shows a tendency to fall back. The system of water's contribution to the economy has entered a period of stability, which is reflected mainly in stable demand, stable costs and stable channels. Shipping, hydropower generation, drinking water supply, river and lake biological supply and other channels have formed relatively mature pricing systems and trading markets. This is the main reason why the water economy subsystem is stable and advances only in small steps at the macro level. However, to contain the COVID-19 epidemic, the state took a series of control measures that affected shipping, river and lake biological supply, etc. These measures caused problems such as increased transportation costs, lower income levels of the population and reduced demand, which account for a large part of the decline in the water economy subsystem score. The relevant literature also confirms this finding (Du et al., 2021; Sun et al., 2021; Zhang et al., 2020b).

Conclusion

We identified two shortcomings after reviewing the existing studies on the evaluation of Happy River. First, the target layer of the existing indicator system lacks criteria for water management. To address this, we added indicators such as "the opening rate of water education bases, the degree of construction of water resources management information systems, and the proportion of senior staff in the water sector" and used them as the basis for water management evaluation. Second, the existing evaluation methods are rather subjective, and the evaluation process is cumbersome. To address this, we focused on an evaluation method that combines subjectivity and objectivity: neural networks. We chose LSTM as the method for evaluating the Happy River because it has proven feasible for water quality evaluation. The empirical results show that the maximum RMSE across the training and test sets of all subsystems is 0.0226 and that the lowest coefficient of determination R² is 0.9011. This indicates that the model fits well. Compared with the existing research results, we have enriched the evaluation index system of Happy River and, by using LSTM, made the evaluation of Happy River more objective and efficient. This study also has limitations. Due to the small sample size, we divided the whole system into six subsystems for modelling. Therefore, in future research we will improve the model by adding optimization algorithms such as genetic algorithms and particle swarm algorithms. We expect the optimization algorithms to significantly improve the accuracy of the model fit so that the whole system can be modelled at once.

Declaration

Author contribution statement

Tingting Zhu: Conceived and designed the experiments; Performed the experiments; Wrote the paper. Juqin Shen: Contributed reagents, materials, analysis tools or data. Fuhua Sun: Analyzed and interpreted the data.

Funding statement

This work was supported by the Social Science Foundation of Jiangsu Province [19GLD002], the Fundamental Research Funds for the Central Universities [2018B58814], the Central University Basic Scientific Research Business Expenses Special Funds [2019B69214], and the Water Conservancy Science and Technology Project of Jiangsu Province [2019013].

Data availability statement

The authors do not have permission to share data.

Declaration of interests statement

The authors declare no conflict of interest.

Additional information

No additional information is available for this paper.
