Longlong Liu¹, Di Ma¹, Ahmad Taher Azar²,³, Quanmin Zhu⁴
1. School of Mathematical Sciences, Ocean University of China, Qingdao 266000, China.
2. Robotics and Internet-of-Things Lab (RIOTU), Prince Sultan University, Riyadh 11586, Saudi Arabia.
3. Faculty of Computers and Artificial Intelligence, Benha University, 13511 Benha, Egypt.
4. Department of Engineering Design and Mathematics, University of the West of England, Frenchay Campus, Coldharbour Lane, Bristol BS16 1QY, UK.
Abstract
In this paper, a gradient descent algorithm is proposed for the parameter estimation of multi-input and multi-output (MIMO) total non-linear dynamic models. First, the MIMO total non-linear model is mapped to a non-completely connected feedforward neural network; that is, the parameters of the total non-linear model are mapped to the connection weights of the neural network. Then, based on the minimization of the network error, a weight-updating algorithm (that is, an estimation algorithm for the model parameters) is proposed, together with convergence conditions for the non-completely connected feedforward network. To determine the variables of the model set, a model structure detection method is proposed for selecting a group of important items from the whole candidate variable set. To verify the usefulness of the parameter identification procedure, a virtual bench test example is provided for numerical analysis, together with user-friendly instructions for potential applications.
Keywords:
gradient descent algorithm; neural networks; neuro-computing; parameter estimation; total non-linear model
1. Introduction
Because a total non-linear model can provide a very concise representation of complex non-linear systems and has good extrapolation properties, it has attracted attention in both academic research and applications. The total non-linear model is an extension of the polynomial non-linear auto-regressive moving average with exogenous input (NARMAX) model and is defined as the ratio of two polynomial expressions [1,2,3]. The denominator polynomial makes the model non-linear in both the parameters and the regression terms. Therefore, compared with the polynomial model, model identification and controller design for the total non-linear model are much more challenging [4,5]. Given the difficulty of estimating the parameters of a total non-linear model, simple and effective algorithms, including machine learning methods, should be considered for extracting information from measurement data.
1.1. Literature Survey
At present, a variety of model structure detection techniques and parameter estimation algorithms have been developed for non-linear models. These include the orthogonal model structure detection and parameter estimation procedure [6], the generalized least squares estimator [7,8], the prediction error estimator [9,10], the Kalman filter estimator [11,12], the genetic algorithm estimator [12,13], and the artificial neural network estimator [14,15,16,17]. However, most of these algorithms are parameter estimators for polynomial non-linear models. Zhu and Billings have studied the parameter identification of total non-linear models extensively [7,8]. In 2003 they put forward a parameter estimation method for the total non-linear model based on the back-propagation (BP) algorithm. They discussed the advantages of BP computation in identifying classical models, combining classical and neural network methods, and provided a powerful tool for analyzing a wide range of systems.

In [18], a back-propagation estimation formula based on neuro-computing was presented for estimating the parameters of the total non-linear model. A set of solutions was derived for parameter initialization, learning rate selection, stopping criteria, model structure detection, and the convergence of the back-propagation estimator (BPE). However, Reference [18] only proposed a parameter estimation method for single-input and single-output (SISO) systems, and its case studies were correspondingly limited to SISO systems. Extending [18], this paper presents solutions for the parameter estimation of a total non-linear multi-input and multi-output (MIMO) model. MIMO models are more general in academic research and applications, but their complexity makes their parameters more difficult to estimate.
For example, a MIMO system has many more parameters than a SISO system, so the number of parameters to be estimated at each step is multiplied, which makes estimation harder. Moreover, because the subsystems are coupled, the parameter values of each subsystem affect one another. The algorithms that estimate these parameters are therefore not independent but interact in complex ways. Because the components of different MIMO systems differ, the fully connected neural network structure adopted in [18] is not suitable for estimating the parameters of MIMO systems. When a MIMO system is mapped to a neural network, the network structure is often asymmetric or non-completely connected (the neurons in the hidden layer are not connected to all the neurons in the input layer). In other words, the network is not a standard fully connected feedforward neural network, and the general BP algorithm cannot be applied directly to estimate the parameters. The learning algorithm for the parameters must therefore be derived specifically for this structure. The asymmetry of the network also makes convergence harder to guarantee, so the convergence must be analyzed and explicit convergence conditions given. Identifying a MIMO system involves carrying out SISO-type parameter identification several times, and a MIMO system can have multiple inputs. The simulation experiments should therefore estimate the system parameters under different combinations of inputs and verify the performance of the network estimator for each. For these reasons, the parameter identification of a MIMO system is much more challenging.
1.2. Motivation and Contributions
The authors of [19] presented a thorough survey covering two kernel components: the SISO rational model and its parameter estimation algorithms, such as orthogonal, prediction error, and back-propagation computations. Since then, rational model identification has developed in diverse directions. These include further theoretical treatment of non-linear least squares algorithms [4], maximum likelihood estimation [3], and bias-compensation recursive least squares algorithms [2]. MIMO rational model identification, however, has attracted little research, probably because of the complexity of the algorithm formulation and the coupling effects. Given recent applications and the growth in available computing power, MIMO rational model identification should now be on the research agenda.

The term "total non-linear model", which is relatively new, is an alternative name for the NARMAX rational model defined in a survey paper on rational model identification [19]. The total non-linear model emphasizes non-linearity in both the parameters and the control inputs. It has been regarded as a challenging structure for designing non-linear dynamic control systems [1]. In system identification, the rational model has mostly been treated as an extension of polynomial models in terms of mathematics, structure detection, and parameter estimation [2,3]. The main contribution of this study is therefore the use of neural computing algorithms for MIMO model parameter estimation, which complements the classical NARMAX approaches.

The rest of the paper is organized as follows. The total non-linear model is described in Section 2. Section 3 presents the gradient descent calculation for parameter estimation. Model structure detection is discussed in Section 4. A convergence analysis of the algorithm is presented in Section 5. Simulation results and discussions are given in Section 6. Finally, Section 7 concludes the paper and outlines future work.
2. Total Non-Linear Model
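In mathematics, the dynamic total non-linear model of a MIMO system with error can be defined, for the i-th output, as

$$ y_i(t) = \hat{y}_i(t) + e_i(t) = \frac{a_i(t)}{b_i(t)} + e_i(t) = \frac{\sum_{j=1}^{num_i} p_{i,nj}(t)\,\theta_{i,nj}}{\sum_{j=1}^{den_i} p_{i,dj}(t)\,\theta_{i,dj}} + e_i(t), \qquad i = 1, \dots, m \qquad (1) $$

Here $y_i(t)$ and $\hat{y}_i(t)$ are the measured output and the model output, respectively. The inputs are $u_1(t), \dots, u_r(t)$, $e_i(t)$ is the model error, and $t$ is the sampling time index. The numerator $a_i(t)$ and the denominator $b_i(t)$ are polynomials. Their regression terms $p_{i,nj}(t)$ and $p_{i,dj}(t)$ are products of past inputs, outputs, and errors, for example a lagged input on its own or a lagged input multiplied by a lagged output. The vectors $\theta_{i,n}$ and $\theta_{i,d}$ are the parameter sets of $a_i(t)$ and $b_i(t)$, respectively.

The task of parameter estimation is to extract the relevant parameter values from the measured input and output data for a given model structure. To form a regression expression for parameter estimation, both sides of Formula (1) are multiplied by $b_i(t)$, giving

$$ y_i(t)\,b_i(t) = a_i(t) + b_i(t)\,e_i(t) \qquad (2) $$

To apply the neuro-computing approach to parameter estimation, the total non-linear model is expressed as a non-completely connected feedforward neural network, as shown in Figure 1.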
Figure 1
Structure of a neural network corresponding to a total non-linear model.
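The sketch below evaluates one output channel of the rational model in Formula (1) as a ratio of two linear-in-parameter sums. It then checks numerically that multiplying through by the denominator removes the division, which is the property a division-free network relies on. It is a minimal illustration only: the regressor values, parameters, and error are hypothetical.

```python
import numpy as np

def rational_output(p_n, theta_n, p_d, theta_d):
    """Return (a/b, a, b) for one output channel of a rational model."""
    a = float(np.dot(p_n, theta_n))   # numerator polynomial a(t)
    b = float(np.dot(p_d, theta_d))   # denominator polynomial b(t)
    return a / b, a, b

# Hypothetical lagged values of two inputs and one output at time t.
u1_lag, u2_lag, y1_lag = 0.4, -1.2, 0.7
p_n = np.array([u1_lag, u1_lag * u2_lag])   # numerator regression terms
p_d = np.array([1.0, y1_lag ** 2])          # denominator regression terms
theta_n = np.array([0.5, 0.8])              # hypothetical numerator parameters
theta_d = np.array([1.0, 0.2])              # hypothetical denominator parameters

y_hat, a, b = rational_output(p_n, theta_n, p_d, theta_d)
e = 0.01                  # hypothetical model error
y = y_hat + e             # measured output = model output + error
residual = y * b - a      # multiplied-through form: should equal b * e
print(y_hat, residual, b * e)
```

The multiplied-through residual is linear in every parameter, whereas the ratio itself is not.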
We define the network as having an input layer, a hidden layer, and an output layer, where:
(i) The input layer consists of the numerator and denominator regression terms. A neuron in the hidden layer is not connected to all the neurons in the input layer; that is, the network is a non-completely connected feedforward neural network.
(ii) The activation function of the hidden-layer neurons is linear, and the output of each hidden-layer neuron is either the numerator polynomial or the denominator polynomial of the corresponding output channel.
(iii) The activation function of the output-layer neurons is linear, so the output of the i-th output-layer neuron is a linear combination of the hidden-layer outputs.
(iv) The connection weights between the input-layer neurons and the hidden-layer neurons are the numerator and denominator parameters of the model.
(v) The connection weights between the hidden-layer neurons and the i-th output-layer neuron are fixed values, one of which is the observed output of the i-th channel.

Leung and Haykin proposed a rational function neural network [20] but did not define a generalized total non-linear model structure or consider the associated errors. Consequently, their parameter estimation algorithm could not provide unbiased estimates from noise-corrupted data. It was essentially a special implementation of Zhu and Billings's methods [7,8] for noise-free data. The method proposed in this paper extends that of Zhu [18]. The total non-linear model (1) has the following characteristics:
(i) With a suitable parameter setting, Zhu's model [18] is a special case of the model in Formula (1).
(ii) The model is non-linear in both the parameters and the regression terms, owing to the denominator polynomial.
(iii) When the denominator of the model approaches 0, the output deviation can become large. For this reason, division is avoided in the neuron activation functions when the neural network model is built.
(iv) The neural network corresponding to the total non-linear model is a non-completely connected (partially connected) feedforward neural network. Guaranteeing the convergence of such a network is therefore the key difficulty addressed in this paper.
(v) The model has broad application prospects and has been increasingly adopted in non-linear system modeling and control. Some non-linear models cannot be used directly, such as the exponential model describing how the kinetic rate constant changes with temperature. Such a model can first be transformed into a rational model, after which system identification can be carried out [19,21,22].
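One way to picture the non-complete connectivity in code is to multiply a weight matrix elementwise by a fixed 0/1 mask. Each hidden neuron (a numerator or denominator neuron of one output channel) then receives only its own regression terms, and absent connections never receive gradient updates. This is a hypothetical sketch: the term values and the mask layout (two channels, four hidden neurons) are assumptions for illustration, not the exact structure of Figure 1.

```python
import numpy as np

# Hypothetical values of five candidate regression terms at one time step.
x = np.array([0.4, -1.2, 0.7, 0.49, 1.0])

# Rows = hidden neurons [a1, b1, a2, b2]; columns = input-layer terms.
# 1 = connection exists, 0 = connection absent (non-complete connectivity).
mask = np.array([
    [1, 1, 0, 0, 0],   # a1 uses terms 0 and 1
    [0, 0, 0, 1, 1],   # b1 uses terms 3 and 4
    [0, 1, 1, 0, 0],   # a2 uses terms 1 and 2
    [0, 0, 1, 0, 1],   # b2 uses terms 2 and 4
], dtype=float)

W = np.ones_like(mask)   # trainable weights (model parameters), initial guess

def hidden_outputs(W, mask, x):
    """Linear hidden layer: only masked-in connections contribute."""
    return (W * mask) @ x

def masked_grad(dL_dh, mask, x):
    """Gradient of a loss w.r.t. W; absent connections get exactly zero."""
    return mask * np.outer(dL_dh, x)

h = hidden_outputs(W, mask, x)   # values of [a1, b1, a2, b2]
g = masked_grad(np.ones(4), mask, x)
```

Keeping the mask fixed means the zero entries of `W * mask` stay zero during training, so the learned weights always respect the model structure.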
3. Gradient Descent Calculation of Parameter Estimation
For convenience in the following derivations, the output of the i-th neuron in the output layer of the neural network is denoted as in Formula (4), and the error measure function of one network iteration is defined by Formula (5). The Lyapunov method is often used to analyze the stability of neural networks [23]; similarly, here the network parameters are estimated by minimizing the network error based on the Lyapunov method. Note that when the total non-linear model is represented by the neural network structure of Figure 1, parameter estimation for the model can be described as training the network weights by minimizing the error in Formula (5). To train the network weights, the gradient-descent learning algorithm is given by Formulas (6) and (7), in which the two step-size coefficients are learning rates.

Differentiating both sides of Formula (4) with respect to the numerator parameters gives Formula (8). Substituting Formula (8) into Formula (6) gives Formula (9), from which Formula (10) follows. Differentiating both sides of Formula (4) with respect to the denominator parameters gives Formula (11).
Substituting Formula (11) into Formula (7) gives Formula (12), from which Formula (13) follows. The gradient descent algorithm for parameter estimation of a total non-linear model is summarized in Algorithm 1.
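The sketch below illustrates gradient descent updates of the kind described in Formulas (6) and (7) by training each output channel with per-sample steps. Formulas (4) and (5) are not reproduced in this text, so it makes several assumptions:
- a division-free network output Y(t) = a(t) - y(t)b(t), that is, the model multiplied through by its denominator;
- an error measure J = Y(t)²/2;
- a first denominator term that is constant, with its parameter fixed at 1 to exclude the trivial all-zero solution.

The two-output coupled system in `simulate`, its parameters, and the function names are hypothetical.

```python
import numpy as np

def train_channel(P_n, P_d, y, eta_n=0.5, eta_d=0.5, epochs=20):
    """Per-sample gradient descent for one output channel.

    Assumed output:  Y(t) = a(t) - y(t) * b(t), a = P_n[t] @ theta_n, b = P_d[t] @ theta_d.
    Assumed error:   J(t) = Y(t)**2 / 2, driven towards 0.
    The first column of P_d is a constant whose parameter stays fixed at 1.
    """
    theta_n = np.zeros(P_n.shape[1])
    theta_d = np.zeros(P_d.shape[1])
    theta_d[0] = 1.0
    for _ in range(epochs):
        for t in range(len(y)):
            a = P_n[t] @ theta_n
            b = P_d[t] @ theta_d
            Y = a - y[t] * b                  # no division by b(t)
            theta_n -= eta_n * Y * P_n[t]     # dJ/dtheta_n = Y * p_n(t)
            grad_d = -Y * y[t] * P_d[t]       # dJ/dtheta_d = -Y * y(t) * p_d(t)
            grad_d[0] = 0.0                   # keep the normalized parameter fixed
            theta_d -= eta_d * grad_d
    return theta_n, theta_d

def simulate(T, seed=0):
    """Hypothetical noise-free two-output coupled rational system."""
    rng = np.random.default_rng(seed)
    u1 = rng.uniform(-1.0, 1.0, T)
    u2 = rng.uniform(-1.0, 1.0, T)
    y1 = np.zeros(T)
    y2 = np.zeros(T)
    for t in range(1, T):
        y1[t] = (0.5 * u1[t - 1] + 0.3 * y2[t - 1]) / (1.0 + 1.0 * u2[t - 1] ** 2)
        y2[t] = (0.8 * u2[t - 1] + 0.2 * y1[t - 1]) / (1.0 + 0.5 * u1[t - 1] ** 2)
    return u1, u2, y1, y2

T = 2000
u1, u2, y1, y2 = simulate(T)
ones = np.ones(T - 1)
# Channel 1: numerator terms [u1(t-1), y2(t-1)], denominator terms [1, u2(t-1)^2].
th_n1, th_d1 = train_channel(np.column_stack([u1[:-1], y2[:-1]]),
                             np.column_stack([ones, u2[:-1] ** 2]), y1[1:])
# Channel 2: numerator terms [u2(t-1), y1(t-1)], denominator terms [1, u1(t-1)^2].
th_n2, th_d2 = train_channel(np.column_stack([u2[:-1], y1[:-1]]),
                             np.column_stack([ones, u1[:-1] ** 2]), y2[1:])
print(th_n1, th_d1, th_n2, th_d2)
```

Coupling enters through the regressors: each channel's terms include the other channel's lagged input or output. Given the data, however, each channel's update involves only its own parameters. The learning rate of 0.5 keeps η times the squared regressor norm below 2 for these bounded signals, in the spirit of the conditions in Section 5.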
4. Model Structure Detection
Model structure detection selects the important items from a rather large model set (usually called the whole item set) and determines the sub-model consisting of those items [18]. Because of their self-learning and associative memory capabilities [24], artificial neural networks are a natural tool for identifying the model structure. When identifying systems of unknown structure, it is important not to lose any important items from the final model. For structure detection of a total non-linear model, the connection weights estimated in the neural network (that is, the estimated parameters of the total non-linear model) can be used to select the significant terms.

To separate important from unimportant items in the whole model item set, the knock-out algorithm is adopted. First, remove the items that increase the network error; then knock out the items with smaller weights according to the significance requirements. Finally, test the error of the non-linear model composed of the remaining items. The specific algorithm is summarized in Algorithm 2. In this process, the neural network is used not only to estimate the model parameters but also to detect the model structure and analyze the significance of the regression terms.
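A minimal sketch of knock-out style backward elimination follows. For brevity it replaces the network training with an ordinary least-squares fit (`numpy.linalg.lstsq`) of a linear-in-parameters regression. It implements one reading of the procedure: repeatedly knock out the item with the smallest weight magnitude, as long as doing so raises the fitting error by less than a tolerance. The function name `knock_out`, the tolerance, and the synthetic data are hypothetical; Algorithm 2 itself is not reproduced here.

```python
import numpy as np

def fit(X, y):
    """Least-squares weights and mean squared fitting error."""
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    mse = float(np.mean((y - X @ theta) ** 2))
    return theta, mse

def knock_out(X, y, names, tol=1e-6):
    """Drop the lightest-weight item while the MSE rises by less than tol."""
    keep = list(range(X.shape[1]))
    theta, mse = fit(X[:, keep], y)
    while len(keep) > 1:
        k = int(np.argmin(np.abs(theta)))        # item with the lightest weight
        trial = keep[:k] + keep[k + 1:]
        theta_t, mse_t = fit(X[:, trial], y)
        if mse_t - mse > tol:                    # removing it hurts: item is significant
            break
        keep, theta, mse = trial, theta_t, mse_t
    return [names[j] for j in keep], theta, mse

# Synthetic whole item set: 6 candidate items, only x0, x2 and x4 are in the true model.
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 6))
y = 2.0 * X[:, 0] - 1.5 * X[:, 2] + 0.8 * X[:, 4]
names = [f"x{j}" for j in range(6)]
selected, theta, mse = knock_out(X, y, names)
print(selected, theta, mse)
```

In a rational-model setting, the columns of `X` would hold candidate numerator terms and, in the multiplied-through form, denominator terms multiplied by the measured output. Weights are comparable across items only if the columns have similar scales.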
5. Convergence Analysis of the Algorithm
Convergence proof: Assume that one connection weight of the neural network shown in Figure 1 is changed; this weight can take any value. First consider a weight that is a numerator regression-term parameter of the total non-linear model. The resulting change in network error is as follows, with the subscripts in Formula (2) dropped for convenience. Substituting Formula (2) into Formula (1) gives Formula (18). When the weight is updated, Formula (18) becomes Formula (19), which gives the new error of the neural network after the update. Subtracting Formula (18) from Formula (19) gives Formulas (20) and (21). To ensure that the network error decreases, Formula (22) must hold, and solving Formula (22) gives the first learning-rate condition.

Next consider a weight that is a denominator regression parameter of the total non-linear model. The resulting change in network error is given by Formulas (24) and (25). Subtracting Formula (24) from Formula (25) expresses the error change in terms of the new denominator of the neural network after the weight update. Requiring this change to be negative, so that the network error decreases, yields a bound on the learning rate. Because an overly large learning coefficient makes training ineffective, a smaller value is taken within this bound, which gives the second condition.

In summary, the network is convergent when these two conditions are met. Under them, the algorithm provides a convergent estimate of the parameters of the total non-linear model.
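The single-weight argument can be checked numerically under a simplifying assumption: take a division-free network output Y = a - y b with error J = Y²/2. Y is then linear in each weight, with sensitivity g = p for a numerator weight and g = -y p for a denominator weight. One gradient step on that weight therefore multiplies Y by (1 - η g²), and J decreases exactly when 0 < η < 2/g². This derivation holds only under that assumption; it does not reproduce the paper's Formulas (18) to (25), and the numbers below are hypothetical.

```python
def error_after_step(Y, g, eta):
    """Output after one gradient step on a single weight with dY/dw = g.

    The step is dw = -eta * dJ/dw = -eta * Y * g, so Y_new = Y + g * dw.
    """
    return Y * (1.0 - eta * g ** 2)

def rate_bound_numerator(p):
    """Upper learning-rate bound for a numerator weight (g = p)."""
    return 2.0 / p ** 2

def rate_bound_denominator(p, y):
    """Upper learning-rate bound for a denominator weight (g = -y * p)."""
    return 2.0 / (y * p) ** 2

Y, p, y_obs = 0.3, 0.8, 1.5          # hypothetical output, regressor, observed output
cases = [(p, rate_bound_numerator(p)),
         (-y_obs * p, rate_bound_denominator(p, y_obs))]
for g, bound in cases:
    print(bound,
          abs(error_after_step(Y, g, 0.99 * bound)) < abs(Y),   # just inside: decreases
          abs(error_after_step(Y, g, 1.01 * bound)) < abs(Y))   # just outside: grows
```

Choosing η = 1/g², half the upper bound, drives Y to zero for that sample in one step. Values near 2/g² make the error flip sign with almost unchanged magnitude, which is one way to see why a smaller rate than the bound is preferred.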
6. Simulation Results and Discussions
Consider a representative example of a total non-linear model, given as Formula (6.1). Because disturbances in the input data interfere with parameter estimation [25], parameters were estimated in this section for several different inputs. First, for the noise-free simulation system, 2000 input/output data pairs, uniformly sampled over 20 cycles, were used as the data set. The learning rate was designed as a linearly decreasing sequence: over 50 iterations it decreases from η0 = 0.5 to ηend = 0.02. The proposed algorithm was used to estimate all 10 parameters simultaneously. Table 1 shows the estimated parameter values and their mean square deviation after 50 iterations.
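The learning-rate schedule and sampling described above can be written as follows. This assumes that "linear attenuation" means equal decrements that include both end values; the variable names are illustrative.

```python
import numpy as np

eta0, eta_end, n_iter = 0.5, 0.02, 50
etas = np.linspace(eta0, eta_end, n_iter)   # eta for iterations 1..50, both ends included
step = (eta0 - eta_end) / (n_iter - 1)      # constant decrement per iteration

n_samples, n_cycles = 2000, 20
samples_per_cycle = n_samples // n_cycles   # samples in one input period

print(etas[:3], etas[-1], step, samples_per_cycle)
```

Under this reading the rate drops by 0.48/49 ≈ 0.0098 per iteration, and each input period spans 100 samples.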
Table 1
Parameter estimation of a noiseless system.
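| u1(t) | u2(t) | θ1 | θ2 | θ3 | θ4 | θ5 | θ6 | θ7 | θ8 | θ9 | θ10 | MSE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| sine | sine | 0.5002 | 0.8025 | 1.0003 | 1.0034 | 1.0000 | 0.2006 | 0.5010 | 1.0004 | 1.0018 | 0.9991 | 2.351E-06 |
| sine | square | 0.5000 | 0.8000 | 1.0000 | 1.0000 | 1.0000 | 0.1996 | 0.4982 | 1.0182 | 0.9677 | 1.0473 | 0.0003 |
| square | square | 0.4973 | 0.8760 | 1.0110 | 1.0031 | 1.0153 | 0.2013 | 0.5072 | 1.0354 | 0.9744 | 1.0840 | 0.0015 |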
The inputs u1(t) and u2(t) of the system are either sine waves or square waves with an amplitude of 2. When u1(t) and u2(t) are both sine waves, Figures 2 and 3 show, for the two outputs, the difference between the measured value (the real output of Formula (6.1)) and the output of the parameter estimator. Figure 4 shows the corresponding difference for one output when u1(t) is a sine wave and u2(t) is a square wave.
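Test inputs of this kind can be generated as shown below. The sketch assumes that the 20 cycles span the 2000 samples (100 samples per period) and that the square wave is high for the first half of each period. These phase conventions are assumptions.

```python
import numpy as np

N, cycles, amp = 2000, 20, 2.0
period = N // cycles                       # 100 samples per cycle
t = np.arange(N)

u_sine = amp * np.sin(2 * np.pi * t / period)
# Square wave: +amp for the first half of each period, -amp for the second half.
u_square = np.where((t % period) < period // 2, amp, -amp)

print(u_sine[:3], u_square[48:53])
```

Building the square wave from the sample index rather than from the sign of the sine avoids ambiguous values where the sine is numerically zero.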
Figure 2
Error of with sine–sine input.
Figure 3
Error of with sine–sine input.
Figure 4
Error of with sine–square input.
Figure 5 shows the difference between the measured and estimated values of the other output when u1(t) is a sine wave and u2(t) is a square wave. Figures 6 and 7 show the corresponding differences for the two outputs when u1(t) and u2(t) are both square waves. The parameter estimation accuracy is highest when both inputs are sine waves and relatively lower when the inputs are square waves. With square-wave inputs, the system output overshoots, so the observed values of the system themselves contain errors. Training the network with such data inevitably leads to errors in the parameter estimates. The estimation error is largest for the parameter of the regression term that is the third power of the output, because cubing the output further amplifies the error. Substituting the resulting erroneous value into the parameter update inevitably leads to an estimation error.
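A short worked expansion shows why an error in the measured output is amplified in a regression term that is the third power of the output. Assume an additive error δ on an output sample y:

$$ (y+\delta)^3 - y^3 = 3y^2\delta + 3y\delta^2 + \delta^3 \approx 3y^2\delta, \qquad \frac{(y+\delta)^3 - y^3}{y^3} \approx 3\,\frac{\delta}{y} $$

For example, take hypothetical values y = 2 and δ = 0.05, a 2.5% error in the output. The cubed term then changes by 2.05³ - 2³ = 8.615125 - 8 = 0.615125, which is about 7.7% of 2³ = 8.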
Figure 5
Error of with sine–square input.
Figure 6
Error of with square–square input.
Figure 7
Error of with square–square input.
For a system with noise interference, parameter estimation is more difficult because the measured values themselves contain errors [26,27]. To further verify the performance of the estimator, a noisy system was obtained by adding a noise signal to the original system. The two noise sequences were uniformly distributed random sequences, each with a mean of zero, a variance of 0.1, and a signal-to-noise ratio of 10. The previous experiment was repeated for the noisy system, and the results are shown in Table 2. The differences between the measured and estimated values of the system outputs are shown in Figures 8 to 13.
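Uniform noise with a prescribed variance can be generated by choosing the half-width a of the interval [-a, a] so that a²/3 equals the variance, giving a = √0.3 ≈ 0.548 for a variance of 0.1. The `snr` helper below assumes the power-ratio definition var(signal)/var(noise); the exact SNR definition is not given in this text. The function names and seed are illustrative.

```python
import numpy as np

def uniform_noise(n, variance, rng):
    """Zero-mean uniform noise on [-a, a] with the requested variance."""
    half_width = np.sqrt(3.0 * variance)   # Var(U[-a, a]) = a**2 / 3
    return rng.uniform(-half_width, half_width, n)

def snr(signal, noise):
    """Signal-to-noise ratio as a variance (power) ratio."""
    return float(np.var(signal) / np.var(noise))

rng = np.random.default_rng(1)
e1 = uniform_noise(200_000, 0.1, rng)
print(np.mean(e1), np.var(e1), np.max(np.abs(e1)))
```

Under this definition, an SNR of 10 with a noise variance of 0.1 would correspond to an output variance of about 1.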
Table 2
Parameter estimation of a noisy system.
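| u1(t) | u2(t) | θ1 | θ2 | θ3 | θ4 | θ5 | θ6 | θ7 | θ8 | θ9 | θ10 | MSE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| sine | sine | 0.5003 | 0.8041 | 1.0005 | 1.0054 | 1.0001 | 0.2008 | 0.5014 | 1.0005 | 1.0016 | 0.9987 | 5.342E-06 |
| sine | square | 0.5000 | 0.8001 | 1.0000 | 1.0001 | 1.0000 | 0.2045 | 0.5019 | 1.073 | 1.1364 | 1.0898 | 0.0032 |
| square | square | 0.4953 | 0.8765 | 1.0085 | 1.0327 | 1.0095 | 0.2969 | 0.7030 | 0.9971 | 1.0007 | 0.9953 | 0.0058 |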
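The MSE column of Tables 1 and 2 appears consistent with a mean squared deviation of the 10 estimated parameters from their true values. The sketch below recomputes it for selected rows. It assumes the true parameter vector is [0.5, 0.8, 1, 1, 1, 0.2, 0.5, 1, 1, 1], inferred from the noiseless sine-sine estimates; Formula (6.1) is not shown in this text, so this vector is an assumption.

```python
import numpy as np

# Assumption: true parameters inferred from the estimates, not stated in this text.
theta_true = np.array([0.5, 0.8, 1.0, 1.0, 1.0, 0.2, 0.5, 1.0, 1.0, 1.0])

# Rows copied from Tables 1 and 2 (estimates of theta1..theta10).
rows = {
    "table1_sine_sine":     [0.5002, 0.8025, 1.0003, 1.0034, 1.0000,
                             0.2006, 0.5010, 1.0004, 1.0018, 0.9991],
    "table1_square_square": [0.4973, 0.8760, 1.0110, 1.0031, 1.0153,
                             0.2013, 0.5072, 1.0354, 0.9744, 1.0840],
    "table2_sine_sine":     [0.5003, 0.8041, 1.0005, 1.0054, 1.0001,
                             0.2008, 0.5014, 1.0005, 1.0016, 0.9987],
}

def param_mse(estimate, truth):
    """Mean squared deviation of an estimated parameter vector from the truth."""
    d = np.asarray(estimate, dtype=float) - truth
    return float(np.mean(d ** 2))

for name, est in rows.items():
    print(name, f"{param_mse(est, theta_true):.4g}")
```

For these three rows the recomputed values match the reported 2.351E-06 and 5.342E-06, and round to the reported 0.0015. This supports reading the column as a parameter-error measure rather than an output-error measure.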
Figure 8
Error of with noise and sine–sine input.
Figure 9
Error of with noise and sine–sine input.
Figure 10
Error of with noise and sine–square input.
Figure 11
Error of with noise and sine–square input.
Figure 12
Error of with noise and square–square input.
Figure 13
Error of with noise and square–square input.
To test model structure detection, a whole item set of 20 items, including the 10 items of Formula (6.1), was used. The newly added numerator terms are of order 1 and the denominator terms of order 2, and the input and output lags are both of order 1. With the knock-out algorithm of Section 4, the 10 items finally retained in the model agree well with those in Formula (6.1).

These experimental results show that the estimation accuracy of the proposed algorithm is acceptable. The mean square deviations are all below 0.002 for the noiseless system (Table 1) and below 0.006 for the noisy system (Table 2).
7. Conclusions
In this paper, the parameter estimation of a SISO rational model was extended to that of a MIMO total non-linear model. A parameter estimation method for MIMO non-linear rational models based on a gradient descent algorithm was proposed, and convergence conditions were derived for the asymmetric network. Mathematical derivation and simulation showed that the estimator is effective. The method has a strong generalization property and could be widely used in fields such as non-linear system modeling and control. Some systems cannot use the method directly, such as the exponential model describing the change of the kinetic rate constant with temperature. These could first be converted into a rational model and then identified with the developed method. Future work could include (1) estimating the parameters of state space models based on artificial neural networks, (2) estimating the parameters of MIMO state space models, (3) estimating the parameters of non-linear state space models, and (4) estimating the parameters of total non-linear state space models.