
LSSVR Model of G-L Mixed Noise-Characteristic with Its Applications.

Shiguang Zhang1,2,3, Ting Zhou4, Lin Sun1,3, Wei Wang1, Baofang Chang1.   

Abstract

Due to the complexity of wind speed, it has been reported that mixed-noise models, constituted by multiple noise distributions, perform better than single-noise models. However, most existing regression models assume that the noise distribution is single. Therefore, we study the least squares SVR of Gaussian-Laplacian mixed homoscedastic noise (GLM-LSSVR) and heteroscedastic noise (GLMH-LSSVR) for complicated or unknown noise distributions. The augmented Lagrange multiplier (ALM) technique is used to solve the GLM-LSSVR model, which is then applied to predict short-term wind speed with historical data. The prediction results indicate that the presented model is superior to the single-noise model and performs well.

Keywords:  Gaussian–Laplacian mixed noise-characteristic; Least square SVR; empirical risk loss; equality constraint; wind-speed forecasting

Year:  2020        PMID: 33286401      PMCID: PMC7517163          DOI: 10.3390/e22060629

Source DB:  PubMed          Journal:  Entropy (Basel)        ISSN: 1099-4300            Impact factor:   2.524


1. Introduction

In practical applications, if the data are collected in a multi-source environment, the noise distribution is complex and unknown. Therefore, it is almost impossible for a single-noise distribution to clearly describe the real noise [1]. Least squares support vector regression (LSSVR) is a method that implements a sum-of-squares error function together with regularization, thus controlling the bias–variance trade-off [2,3]. It is intended to find the concealed linear structures in the original data [4,5]. For the transition from linear to nonlinear functions, the following generalization can be made [6]: by mapping input vectors into a high-dimensional feature space H (H is a Hilbert space) through some nonlinear mapping, seek the solution of the optimization problem in the space H. Using a suitable kernel function, the nonlinear mapping never needs to be computed explicitly; the linear method is thereby extended with kernel techniques. In recent years, SVR has been increasingly welcomed as a data-rich nonlinear forecasting tool [7], applicable in many different contexts [8,9,10], such as machine learning, optical character recognition, and especially wind speed/power forecasting. Generally, the existing techniques used for wind-speed forecasting include: (i) physical; (ii) statistical (also called data-driven); and (iii) artificial intelligence (AI)-based methods. The physical models attempt to estimate wind flow around and inside the wind farm using physical laws governing atmospheric behavior [11,12]. The statistical models seek relationships between a set of explanatory variables and the on-line measured generation data, and the historical wind-speed data recorded at the site are only used to establish the statistical model. This can be done in a variety of ways, including the persistence method and auto-regressive models [13,14]. AI methods include artificial neural networks (ANNs) [15], deep learning [16], SVR machines [17,18], and hybrid methods [19,20]. Suykens et al.
[21,22,23] proposed the least squares support vector regression model with Gaussian noise (LSSVR, also known as kernel ridge regression (KRR)). A mixed model based on multi-objective optimization [24,25] and a mixed method based on singular spectrum analysis, the firefly algorithm, and a BP neural network [26] predict wind speed with complicated noise, indicating that mixed prediction methods have powerful prediction ability. A mixed learning machine [27] is applied to forecast wind speed under noise, which improves the performance of wind-speed prediction. In [28], models fitted by Gaussian–Laplacian (G-L) mixed noise are developed, and good performance is obtained compared with existing regression algorithms. To address the above problems, we study the LSSVR model of G-L mixed noise characteristic for complicated or unknown noise distributions. In this case, we construct a technique to search for the optimal solution of the corresponding regression task. Although many algorithms have been implemented in past years, we exploit the augmented Lagrange multiplier (ALM) method, as shown in Section 4. If the task is not differentiable or is discontinuous, the subgradient descent method can be employed, or SMO [29] can be used if the sample size is very large. The structure of this paper is as follows. Section 2 derives the optimal empirical risk loss by the Bayesian principle. Section 3 constructs the LSSVR model of G-L mixed noise. Section 4 gives the solution and algorithm design via ALM. In Section 5, a numerical experiment on short-term wind-speed prediction is presented. Finally, we conclude the work.

2. Bayesian Principle to Mixed Noise Empirical Risk Loss

Given the dataset D = {(x_1, y_1), ..., (x_N, y_N)}, where x_i ∈ R^n and y_i ∈ R are the training data. R represents the real number set, R^n is the n-dimensional Euclidean space, and N is the sample size. Superscript T denotes the transpose of a matrix. Assuming that each sample of the dataset is generated by an additive-noise function, the relationship between the measured value and the predicted value is

y_i = f(x_i) + ξ_i,   i = 1, ..., N,   (2)

where ξ_i is random, i.i.d. (independent, identically distributed) noise with mean zero and standard deviation σ. Generally, the noise pdf (probability density function) p(ξ) is unknown. It is necessary to predict the unknown target f from the training set D. Following the authors of [30,31], the optimal empirical risk loss in the sense of Maximum Likelihood Estimation (MLE) is

c(ξ) = −log p(ξ),   (3)

i.e., the empirical risk loss is the negative log-likelihood of the noise characteristic. Assume that the noise in Equation (2) is Laplacian, with p(ξ) = (1/(2σ))·exp(−|ξ|/σ). By Equation (3), in the MLE sense the optimal empirical risk loss is c(ξ) = |ξ|/σ (up to an additive constant). Suppose the noise in Equation (2) is Gaussian with zero mean and homoscedastic standard deviation σ. By Equation (3), the empirical risk loss of Gaussian noise with homoscedasticity is c(ξ) = ξ²/(2σ²). If the noise in Equation (2) is Gaussian with zero mean and heteroscedastic standard deviation σ_i, then by Equation (3) the empirical risk loss for Gaussian noise with heteroscedasticity is c(ξ_i) = ξ_i²/(2σ_i²), i = 1, ..., N. Assume the noise in Equation (2) is a mixture of two kinds of noise with convex empirical risk losses c_1(ξ) and c_2(ξ), respectively. By Equation (3), the corresponding empirical risk loss of the mixed noise is the convex combination

c(ξ) = λ·c_1(ξ) + (1 − λ)·c_2(ξ),   0 ≤ λ ≤ 1,   (4)

where the weight factors are λ and 1 − λ. Figure 1 displays the Gaussian–Laplacian (G-L) empirical risk loss for different values of the parameter λ [29].
Figure 1

G-L empirical risk loss of different parameters.
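To make Equation (4) concrete, the following Python sketch evaluates the G-L mixed empirical risk loss as a convex combination of the Gaussian (squared) and Laplacian (absolute) losses. The parameter names lam and sigma are our labels for the weight factor λ and scale σ, and additive normalization constants are dropped, as is usual for empirical risk losses.

```python
import numpy as np

def gl_mixed_loss(xi, lam=0.5, sigma=1.0):
    """Empirical risk loss of G-L mixed noise, Equation (4):
    a convex combination of the Gaussian (squared) and Laplacian
    (absolute) losses with weight factor lam in [0, 1]."""
    gaussian = xi ** 2 / (2.0 * sigma ** 2)   # loss of Gaussian noise
    laplacian = np.abs(xi) / sigma            # loss of Laplacian noise
    return lam * gaussian + (1.0 - lam) * laplacian

# The mixed loss interpolates between the two single-noise losses.
xi = np.linspace(-3, 3, 7)
print(gl_mixed_loss(xi, lam=1.0))   # pure Gaussian: xi^2 / 2
print(gl_mixed_loss(xi, lam=0.0))   # pure Laplacian: |xi|
```

Sweeping lam between 0 and 1 reproduces the family of curves shown in Figure 1.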

3. Model of G-L Mixed Noise-Characteristic

Given the training samples D, construct the linear regressor f(x) = w^T·x + b. To deal with nonlinear problems, the approach can be summarized as follows: map the input vectors into a high-dimensional feature space H through a nonlinear mapping Φ induced by a kernel function K, where K is any positive definite Mercer kernel [6,28]. Positive definite Mercer kernel: assume that X is a subset of R^n; K: X × X → R is called a positive definite Mercer kernel if there is a mapping Φ: X → H such that K(x, x') = ⟨Φ(x), Φ(x')⟩ for all x, x' ∈ X. The optimization problem is then solved in the space H. In effect, inner products of input vectors are replaced by kernel evaluations in the feature space H; through the use of the kernel K, the linear model is extended to a nonlinear one. In general, a mixed distribution has fine approximation ability to any continuous distribution. When there is no prior knowledge of the real noise, it can adapt well to unknown or complicated noise. Thus, a uniform LSSVR model with mixed noise characteristic is presented. The primal problem of the model is formalized as

min_{w,b,ξ}  (1/2)·w^T·w + C·Σ_{i=1}^{N} c(ξ_i)
s.t.  y_i = w^T·Φ(x_i) + b + ξ_i,   i = 1, ..., N,

where w is the weight vector, b is the bias term, C > 0 is the penalty parameter, and the weight factors in c(·) are λ and 1 − λ. Φ is a nonlinear mapping which transfers the input dataset to the higher-dimensional feature space H, ξ_i is the random noise variable at time i, and c(ξ_i) is the convex loss function for the noise characteristic at sample point i (i = 1, ..., N). In the application domain, most noise distributions obey neither a Gaussian nor a Laplacian distribution; the noise distribution is complicated, and it is almost impossible to describe real noise with a single distribution. It has been reported that mixed-noise models, constituted by multiple noise distributions, perform better than single-noise models [1]. As a function-fitting machine, the goal is to estimate an unknown function from the dataset D. In this section, G-L mixed homoscedastic and heteroscedastic noise distributions are used to fit complicated noise characteristics.
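The Mercer-kernel property above can be checked numerically: for any finite sample, a positive definite Mercer kernel must produce a positive semidefinite Gram matrix K_ij = K(x_i, x_j). A minimal sketch with the Gaussian (RBF) kernel, whose exact form is our assumption here since this section does not spell it out:

```python
import numpy as np

def rbf_kernel(X, Z, sigma=1.0):
    """Gaussian (RBF) Mercer kernel K(x, z) = exp(-||x - z||^2 / (2 sigma^2))."""
    sq = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq / (2.0 * sigma ** 2))

# A positive definite Mercer kernel yields a positive semidefinite
# Gram matrix for any finite sample: all eigenvalues are >= 0
# (up to numerical round-off).
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
K = rbf_kernel(X, X)
print(np.linalg.eigvalsh(K).min() >= -1e-10)  # True
```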

3.1. Model of G-L Mixed Homoscedastic Noise-Characteristic

Suppose the noise in Equation (2) is Gaussian with zero mean and homoscedastic standard deviation σ. By Equation (3), the empirical risk loss of the homoscedastic-Gaussian-noise characteristic is ξ²/(2σ²), and that of Laplacian noise is |ξ|/σ. Adopting the G-L mixed homoscedastic noise distribution to fit the complicated noise characteristic, by Equation (4) the empirical risk loss of G-L mixed homoscedastic noise is c(ξ) = λ·ξ²/(2σ²) + (1 − λ)·|ξ|/σ. Putting forward the LSSVR model of G-L mixed homoscedastic noise characteristic (GLM-LSSVR), the primal problem is depicted as

min_{w,b,ξ}  (1/2)·w^T·w + C·Σ_{i=1}^{N} [λ·ξ_i²/(2σ²) + (1 − λ)·|ξ_i|/σ]
s.t.  y_i = w^T·Φ(x_i) + b + ξ_i,   i = 1, ..., N,    (7)

where w is the weight vector, σ is homoscedastic, C is a penalty parameter, and the weight factors are λ and 1 − λ. Theorem 1. The solution of the primal problem in Equation (7) can be obtained from its dual problem in Equation (8). Proof. We introduce the Lagrange functional of Equation (7) with multipliers α_i for the equality constraints. Minimizing it and setting the partial derivatives with respect to w, b, and ξ_i to zero, on the basis of the KKT conditions, we obtain w = Σ_{i=1}^{N} α_i·Φ(x_i) and Σ_{i=1}^{N} α_i = 0. Substituting these extreme conditions back into the Lagrangian and maximizing over the multipliers, the dual problem in Equation (8) of the primal problem in Equation (7) is derived. □ Therefore, the decision function for GLM-LSSVR may be represented as

f(x) = Σ_{i=1}^{N} α_i·K(x_i, x) + b,

where K(x_i, x) = ⟨Φ(x_i), Φ(x)⟩ is the inner product in H and K is the kernel function. Suppose the noise in Equation (2) is Gaussian homoscedastic noise, i.e., Gaussian noise with zero mean and homoscedastic variance σ² (the special case λ = 1). Then the dual problem of classical LSSVR can be derived from Theorem 1.
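For the pure-Gaussian special case mentioned at the end of this subsection (λ = 1), the classical LSSVR solution reduces to a linear KKT system, and the decision function takes the kernel-expansion form above. The following Python sketch implements only that standard special case (the mixed-noise dual in Equation (8) is not reproduced in this text, so it is not coded here):

```python
import numpy as np

def rbf(X, Z, sigma=1.0):
    """Gaussian kernel matrix between row-sample matrices X and Z."""
    sq = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq / (2.0 * sigma ** 2))

def lssvr_fit(X, y, C=10.0, sigma=1.0):
    """Classical LSSVR (pure Gaussian noise, lambda = 1): solve the KKT system
        [ 0   1^T     ] [b    ]   [0]
        [ 1   K + I/C ] [alpha] = [y]
    and return the decision function f(x) = sum_i alpha_i K(x_i, x) + b."""
    N = len(y)
    K = rbf(X, X, sigma)
    A = np.zeros((N + 1, N + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = K + np.eye(N) / C
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    b, alpha = sol[0], sol[1:]
    return lambda Xnew: rbf(Xnew, X, sigma) @ alpha + b

# Fit a noisy sine wave and evaluate the in-sample fit.
rng = np.random.default_rng(1)
X = np.linspace(0, 2 * np.pi, 40)[:, None]
y = np.sin(X[:, 0]) + 0.05 * rng.normal(size=40)
f = lssvr_fit(X, y)
print(np.abs(f(X) - np.sin(X[:, 0])).mean())  # small fitting error
```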

3.2. Model of G-L Mixed Heteroscedastic Noise-Characteristic

It is assumed that the noise in Equation (2) is Gaussian with zero mean and heteroscedastic standard deviation σ_i, i = 1, ..., N. From Equation (3), the empirical risk loss of the heteroscedastic-Gaussian-noise characteristic is ξ_i²/(2σ_i²), and the loss function of Laplacian noise is |ξ_i|/σ_i, i = 1, ..., N. Utilizing the G-L mixed heteroscedastic noise distribution to fit the complicated noise characteristic, from Equation (4) the loss function corresponding to G-L mixed heteroscedastic noise is c(ξ_i) = λ·ξ_i²/(2σ_i²) + (1 − λ)·|ξ_i|/σ_i. A new LSSVR model with G-L mixed heteroscedastic noise characteristic (GLMH-LSSVR) is proposed. The primal problem of GLMH-LSSVR is depicted as

min_{w,b,ξ}  (1/2)·w^T·w + C·Σ_{i=1}^{N} [λ·ξ_i²/(2σ_i²) + (1 − λ)·|ξ_i|/σ_i]
s.t.  y_i = w^T·Φ(x_i) + b + ξ_i,   i = 1, ..., N,

where w is the weight vector, the σ_i are heteroscedastic, and C is the penalty parameter. The weight factors are λ and 1 − λ. Theorem 2. The solution of this primal problem can be obtained from the dual problem of the GLMH-LSSVR model. The proof of Theorem 2 follows easily by analogy with Theorem 1. □ The decision function for GLMH-LSSVR may be expressed as

f(x) = Σ_{i=1}^{N} α_i·K(x_i, x) + b,

where K is the kernel function. Suppose the noise in Equation (2) is G-L mixed homoscedastic noise, in which the Gaussian noise has zero mean and homoscedastic variance σ²; then Theorem 1 can be deduced from Theorem 2.

4. Solution from ALM

In this section, we use the augmented Lagrange multiplier method (ALM) [32] to solve the dual problem in Equation (8) by applying gradient descent or Newton's method to a sequence of equality-constrained subproblems. By eliminating equality constraints, problems with arbitrary equality constraints can be reduced to equivalent unconstrained problems [33,34]. If there are large-scale training samples, some rapid optimization techniques can be combined with the proposed model, for example the sequential minimal optimization (SMO) algorithm [29] and the stochastic gradient descent (SGD) algorithm [35]. Theorems 1 and 2 provide effective recognition techniques for GLM-LSSVR and GLMH-LSSVR, respectively. Here, we derive the solution from ALM and the algorithm for the GLM-LSSVR model of G-L mixed homoscedastic noise characteristic; analogously, the solution of the GLMH-LSSVR model can be obtained by the ALM method. The algorithm proceeds as follows:
(1) Let the dataset be D = {(x_i, y_i)}, i = 1, ..., N, with x_i ∈ R^n and y_i ∈ R.
(2) Search for the optimal parameters using the 10-fold cross-validation strategy, and select an appropriate kernel function.
(3) Solve the GLM-LSSVR model of the problem in Equation (8), and obtain the optimal solution (α, b).
(4) Build the decision function f(x) = Σ_{i=1}^{N} α_i·K(x_i, x) + b, where K(x_i, x) = ⟨Φ(x_i), Φ(x)⟩ is the inner product in H and K is a kernel function.
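As an illustration of the ALM idea, the sketch below applies the method of multipliers to an equality-constrained quadratic dual of the classical LSSVR form. This is a stand-in for Equation (8), whose exact form is not reproduced in this text, so the objective and parameter choices are our assumptions; the outer loop alternates an inner minimization of the augmented Lagrangian with a multiplier update.

```python
import numpy as np

def alm_lssvr_dual(K, y, C=10.0, rho=100.0, iters=100):
    """ALM (method of multipliers) sketch for a dual of the form
        min_a  0.5 a^T (K + I/C) a - y^T a   s.t.  sum(a) = 0,
    the classical LSSVR dual, used as a stand-in for Equation (8).
    Each outer step exactly minimizes the augmented Lagrangian
        g(a) + mu * sum(a) + (rho / 2) * sum(a)^2
    (a quadratic in a), then updates the multiplier mu."""
    N = len(y)
    H = K + np.eye(N) / C
    one = np.ones(N)
    mu = 0.0
    for _ in range(iters):
        # inner subproblem is an unconstrained quadratic -> solve exactly
        a = np.linalg.solve(H + rho * np.outer(one, one), y - mu * one)
        mu += rho * one @ a  # multiplier (dual ascent) update
    # At optimality the multiplier of the constraint sum(a) = 0
    # coincides with the bias b of the decision function.
    return a, mu

# Small demonstration on synthetic data.
rng = np.random.default_rng(2)
Xd = rng.normal(size=(20, 2))
K = np.exp(-((Xd[:, None, :] - Xd[None, :, :]) ** 2).sum(-1))
y = rng.normal(size=20)
a, b = alm_lssvr_dual(K, y)
print(abs(a.sum()))  # equality-constraint residual, near zero
```

For this quadratic stand-in a direct linear solve is of course cheaper; the value of ALM is that the same outer loop still applies when the loss contains the non-smooth Laplacian term.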

5. Case Study

This section tests and verifies the validity of the constructed model by comparing it with other techniques on a dataset from Heilongjiang, China. This case study consists of the following subsections: the G-L mixed-noise characteristic of wind speed, prediction performance evaluation criteria, and short-term wind-speed forecasting based on an actual dataset.

5.1. G-L Mixed-Noise-Characteristic of Wind-Speed

To demonstrate the effectiveness of the proposed model, we collected wind-speed data from Heilongjiang. The dataset consists of more than one year of wind-speed data, with wind-speed values recorded every 10 min. We first examined the G-L mixed noise and conducted experiments on it. We found that turbulence is the main reason for the high uncertainty of random wind-speed fluctuations; from the perspective of wind energy, the most significant feature of wind-energy resources is their variability. We now examine the distribution of wind speed: taking a wind-speed value every 5 s, we calculate the histogram of wind speed within 1–2 h. Two typical distributions are given: one calculated when the wind speed is high and the other when the wind speed is low (see Figure 2 and Figure 3, respectively).
Figure 2

High wind speed distribution.

Figure 3

Low wind speed distribution.

We analyzed the one-month time-series dataset and used the persistence method to investigate the error distribution [32]. The results show that the wind-speed error obtained from the persistence prediction does not follow a single distribution, but approximately follows a G-L mixed distribution, as shown in Figure 4.
Figure 4

G-L mixed distribution of wind-speed forecasting-error with the persistence method.
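The persistence method referenced above simply predicts that the next value equals the last observed value. A small sketch of how its forecast errors (whose histogram Figure 4 fits with a G-L mixture) can be computed, run here on synthetic random-walk data rather than the actual Heilongjiang series:

```python
import numpy as np

def persistence_errors(series, horizon=1):
    """Persistence forecast: the prediction for time t + h is the last
    observed value at time t; returns the resulting forecast errors."""
    series = np.asarray(series, dtype=float)
    return series[horizon:] - series[:-horizon]

# Illustrative synthetic series only; the histogram of err is the kind
# of error distribution fitted in Figure 4.
rng = np.random.default_rng(3)
wind = np.cumsum(rng.normal(0.0, 0.3, size=1000)) + 8.0
err = persistence_errors(wind)
print(err.mean(), err.std())
```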

As can be seen from the above charts and figures, the wind-speed error approximately satisfies a G-L mixed distribution. This is therefore a mixed-noise regression task.

5.2. Prediction Performance Evaluation Criteria

It is generally known that no prediction model forecasts perfectly. The predictive performance of νSVR, GNSVR, LSSVR, and GLM-LSSVR is assessed by standard evaluation criteria: MAE (mean absolute error), RMSE (root mean square error), MAPE (mean absolute percentage error), and SEP (the standard error of prediction). The four criteria are defined as follows:

MAE = (1/N)·Σ_{i=1}^{N} |y_i − ŷ_i|,
RMSE = sqrt( (1/N)·Σ_{i=1}^{N} (y_i − ŷ_i)² ),
MAPE = (100/N)·Σ_{i=1}^{N} |y_i − ŷ_i| / y_i,
SEP = 100·RMSE / ȳ,

where N is the size of the dataset, y_i is the ith actual observed value, ŷ_i is the ith forecasted result, and ȳ is the mean value of the observations [36,37,38,39,40]. MAE shows how close the predicted values are to the observed values, while RMSE measures the overall deviation between predicted and observed values. MAPE is the ratio between the error and the observed value, and SEP is the ratio of RMSE to the average observation; they are dimensionless measurements of the accuracy of a wind-speed system and are sensitive to small changes.
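The four criteria can be computed directly from the observed and forecasted series; a short sketch (the function name is ours):

```python
import numpy as np

def evaluation_criteria(y_true, y_pred):
    """Return MAE, RMSE, MAPE (%), and SEP (%) as defined above."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mae = np.abs(y_true - y_pred).mean()
    rmse = np.sqrt(((y_true - y_pred) ** 2).mean())
    mape = 100.0 * (np.abs(y_true - y_pred) / y_true).mean()
    sep = 100.0 * rmse / y_true.mean()
    return mae, rmse, mape, sep

mae, rmse, mape, sep = evaluation_criteria([10.0, 8.0], [9.0, 9.0])
print(mae, rmse, mape, sep)  # 1.0 1.0 11.25 11.11...
```

Note that MAPE and SEP assume strictly positive observations, which holds for wind speed.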

5.3. Short-Term Wind-Speed Forecasting with Real dataset

In this section, 2160 consecutive data points (1–2160, a time span of 15 days) are extracted as the training set and 720 consecutive data points (2161–2880, a time span of 5 days) are extracted as the testing set. The input vector consists of historical observed wind-speed values, and the forecast output is the wind speed one, three, or six time steps ahead. That is, the above models are used to forecast the wind speed at each point after 10, 30, and 60 min, respectively. Figures 5–13 describe the forecasting results given by the models νSVR, GNSVR, LSSVR, and GLM-LSSVR.
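The sliding-window construction described above can be sketched as follows. The number of lags here is a hypothetical choice, since the text does not state the input dimension; with 10-min data, step_ahead = 1, 3, 6 correspond to the 10-, 30-, and 60-min horizons.

```python
import numpy as np

def make_lagged_dataset(series, n_lags, step_ahead):
    """Build (X, y) pairs: each input vector holds n_lags consecutive
    past wind-speed values, and the target is the value step_ahead
    points later (10-min sampling: step_ahead = 1, 3, 6 gives the
    10-, 30-, and 60-min-ahead forecasting tasks)."""
    series = np.asarray(series, dtype=float)
    X, y = [], []
    for t in range(n_lags, len(series) - step_ahead + 1):
        X.append(series[t - n_lags:t])
        y.append(series[t + step_ahead - 1])
    return np.array(X), np.array(y)

# Toy demonstration on the sequence 0..9.
X, y = make_lagged_dataset(np.arange(10.0), n_lags=4, step_ahead=3)
print(X.shape)       # (4, 4)
print(y.tolist())    # [6.0, 7.0, 8.0, 9.0]
```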
Figure 5

Result of four wind-speed forecasting models after 10 min.

Figure 6

Error of four wind-speed forecasting models after 10 min.

Figure 7

Residual box plot of four wind-speed forecasting models after 10 min.

Figure 8

Result of four wind-speed forecasting models after 30 min.

Figure 9

Error of four wind-speed forecasting models after 30 min.

Figure 10

Residual box plot of four wind-speed forecasting models after 30 min.

Figure 11

Result of four wind-speed forecasting models after 60 min.

Figure 12

Error of four wind-speed forecasting models after 60 min.

Figure 13

Residual box plot of four wind-speed forecasting models after 60 min.

The models νSVR, GNSVR, LSSVR, and GLM-LSSVR were implemented in Matlab 7.8. The optimal parameters were searched using the 10-fold cross-validation technique; the technology of parameter selection is studied in detail in [41,42]. Practical application demonstrates that both the polynomial kernel and the Gaussian kernel perform well under the smoothness assumption. Under these circumstances, the models νSVR, GNSVR, LSSVR, and GLM-LSSVR employ the polynomial and Gaussian kernel functions [43]

K(x, x') = (x^T·x' + 1)^d,    K(x, x') = exp(−‖x − x'‖² / (2σ²)),

where d is a positive integer and σ is a positive number. The comparison models are as follows. νSVR: the authors of [41,44] define the dual problem of νSVR. GNSVR: the authors of [45,46] studied SVR with equality constraints and inequality constraints; the loss function of Gaussian noise is ξ_i²/(2σ²), i = 1, ..., N. LSSVR: [22] studied LSSVR for the Gaussian-noise model; its dual problem involves slack variables and constant parameters. For νSVR, the size of ε is not fixed in advance, but is a variable whose value is traded off against the model complexity and slack variables through the parameter ν [35]. In Figure 5, Figure 8, and Figure 11, the wind-speed forecasting results of νSVR, GNSVR, LSSVR, and GLM-LSSVR at each time point are presented for horizons of 10, 30, and 60 min, respectively. Figure 6, Figure 9, and Figure 12 show the error statistics of wind-speed prediction using the above four models. The box plots (Figure 7, Figure 10, and Figure 13) further demonstrate intuitively the comparative error statistics of the four wind-speed forecasting models. The statistical criteria MAE, RMSE, MAPE, and SEP are displayed in Table 1, Table 2, and Table 3.
Table 1

Error statistic of four wind-speed forecasting models after 10 min.

Model      | MAE (m/s) | RMSE (m/s) | MAPE (%) | SEP (%)
νSVR       | 0.4280    | 0.5833     | 8.02     | 7.12
GNSVR      | 0.4256    | 0.5789     | 7.92     | 7.07
LSSVR      | 0.4219    | 0.5768     | 7.94     | 7.06
GLM-LSSVR  | 0.4190    | 0.5711     | 7.91     | 7.05
Table 2

Error statistic of four wind-speed forecasting models after 30 min.

Model      | MAE (m/s) | RMSE (m/s) | MAPE (%) | SEP (%)
νSVR       | 0.7979    | 1.0116     | 23.36    | 12.53
GNSVR      | 0.7368    | 0.9886     | 19.93    | 11.89
LSSVR      | 0.7109    | 0.9226     | 17.17    | 11.43
GLM-LSSVR  | 0.6185    | 0.8241     | 10.71    | 10.19
Table 3

Error statistic of four wind-speed forecasting models after 60 min.

Model      | MAE (m/s) | RMSE (m/s) | MAPE (%) | SEP (%)
νSVR       | 0.9994    | 1.2580     | 33.93    | 15.66
GNSVR      | 0.9728    | 1.2355     | 31.78    | 15.37
LSSVR      | 0.9646    | 1.2177     | 29.01    | 15.16
GLM-LSSVR  | 0.8835    | 1.1180     | 25.72    | 13.97
From the box-whisker plots in Figure 7, Figure 10, and Figure 13, as well as Table 1, Table 2, and Table 3, it can be concluded that, in most cases, the forecasting error of GLM-LSSVR is smaller than that of νSVR, GNSVR, and LSSVR. As the prediction horizon increases to 30 and 60 min, the forecasting error of all models increases. Table 1, Table 2, and Table 3 show that, under all the criteria MAE, RMSE, MAPE, and SEP, the Gaussian–Laplacian mixed-noise model is slightly better than the classical models.

6. Conclusions

Most existing regression techniques suppose that the noise model is single. Wind-speed forecasting is complicated due to volatility and uncertainty, so it is difficult to model with a single-noise distribution. Our main work is summarized as follows: (1) the optimal empirical risk loss of G-L mixed noise is deduced by the Bayesian principle; (2) the LSSVR of G-L mixed homoscedastic noise (GLM-LSSVR) and G-L mixed heteroscedastic noise (GLMH-LSSVR) for complicated noise is developed; (3) the dual problems of GLM-LSSVR and GLMH-LSSVR are obtained using the Lagrange functional and the KKT conditions; (4) the stability and effectiveness of the algorithm are ensured by solving GLM-LSSVR with the ALM method; and (5) the proposed technology is used to predict short-term wind speed from historical data, forecasting the wind speed 10, 30, and 60 min ahead, respectively. The comparison results show that the proposed model is better than classical technologies under the statistical criteria. In the same way, Gaussian–Laplacian or Gaussian–Weibull mixed-noise classification models can also be studied; such new hybrid-noise models would effectively solve complicated noise classification problems.
References:  8 in total

1.  New support vector algorithms

Authors: 
Journal:  Neural Comput       Date:  2000-05       Impact factor: 2.026

2.  Practical selection of SVM parameters and noise estimation for SVM regression.

Authors:  Vladimir Cherkassky; Yunqian Ma
Journal:  Neural Netw       Date:  2004-01

3.  Experimentally optimal nu in support vector regression for different noise models and parameter settings.

Authors:  Athanassia Chalimourda; Bernhard Schölkopf; Alex J Smola
Journal:  Neural Netw       Date:  2004-01

4.  Bayesian support vector regression using a unified loss function.

Authors:  Wei Chu; S Sathiya Keerthi; Chong Jin Ong
Journal:  IEEE Trans Neural Netw       Date:  2004-01

5.  Fast sparse approximation for least squares support vector machine.

Authors:  Licheng Jiao; Liefeng Bo; Ling Wang
Journal:  IEEE Trans Neural Netw       Date:  2007-05

6.  Improvements to the SMO algorithm for SVM regression.

Authors:  S K Shevade; S S Keerthi; C Bhattacharyya; K K Murthy
Journal:  IEEE Trans Neural Netw       Date:  2000

7.  An introduction to kernel-based learning algorithms.

Authors:  K R Müller; S Mika; G Rätsch; K Tsuda; B Schölkopf
Journal:  IEEE Trans Neural Netw       Date:  2001

8.  Linear dependency between ε and the input noise in ε-support vector regression.

Authors:  J T Kwok; I W Tsang
Journal:  IEEE Trans Neural Netw       Date:  2003
Cited by:  1 in total

1.  Artificial Intelligence and Computational Methods in the Modeling of Complex Systems.

Authors:  Marcin Sosnowski; Jaroslaw Krzywanski; Radomír Ščurek
Journal:  Entropy (Basel)       Date:  2021-05-10       Impact factor: 2.524

