Dense Residual LSTM-Attention Network for Boiler Steam Temperature Prediction with Uncertainty Analysis.

Zheming Tong¹, Xin Chen¹, Shuiguang Tong¹, Qi Yang¹.

Abstract

Flexible operation of large-scale boilers for electricity generation is essential in modern power systems. Accurate prediction of boiler steam temperature is important for the operational efficiency of boiler units and for preventing overtemperature. In this study, a dense residual long short-term memory (LSTM)-attention model is proposed for steam temperature prediction. In particular, the residual elements in the proposed model improve accuracy by adding short skip connections between layers. To complement the point predictions, an uncertainty analysis based on the proposed model is performed to quantify the uncertainties in steam temperature variations. Our results demonstrate that the proposed method predicts the steam temperature with a mean absolute error (MAE) of less than 0.6 °C. Compared to support-vector regression (SVR), ridge regression (RIDGE), the recurrent neural network (RNN), the gated recurrent unit (GRU), and LSTM, the proposed model reduces the MAE by 32, 16, 12, 10, and 11%, respectively. According to our analysis, the dense residual LSTM-attention model provides an accurate early warning of overtemperature, enabling the development of real-time steam temperature control.

Year: 2022 PMID: 35415332 PMCID: PMC8992261 DOI: 10.1021/acsomega.2c00615

Source DB: PubMed Journal: ACS Omega ISSN: 2470-1343

Introduction

The operational flexibility of large-scale boilers plays a critical role in today’s power systems with a large share of renewable energy sources.[1,2] However, these boilers may face various issues such as overtemperature, slagging, and corrosion of the walls, especially under load variations. Recently, many studies have used data-driven models to monitor boiler operating parameters, including the least-squares support vector (LSSV) model,[3] support vector regression (SVR) model,[4] autoregressive integrated moving average (ARIMA) model,[5] and convolutional neural network (CNN) model.[6] Romeo et al.[7] established a model for biomass boiler monitoring with an artificial neural network (ANN). Combustion flue gas composition, staged heat transfer, and the slagging evolution index were predicted using this model. Sujatha et al.[8] used a discriminant radial basis network combined with boiler flame images collected by a charge coupled device (CCD) camera to monitor the combustion status of coal-fired boilers. Zeng et al.[9] established a dynamic adaptive four-parameter discrete grey system model for coalbed methane production, and the mean relative percentage error was 0.48%. Yu et al.[10] developed a gray-predictor-based algorithm (GPBA) to monitor the uncertainty of steam flow and noise. Tong et al.[11] proposed an online prediction program for the heating surface dust scale based on wavelet analysis and SVR. The prediction accuracy on the test data was 98.5%. Grochowalski et al.[12] developed a CNN program with 48 input parameters to predict 12 temperature distributions in the boiler combustion chamber and compared and analyzed the differences between the 12 target predictions. Keeping the boiler steam temperature stable helps avoid overtemperature and tube burst accidents. However, the steam temperature has a significant delay characteristic that is extremely unstable as the working conditions change.
It is very difficult to accurately predict the temperature change. Mazalan et al.[13] predicted the steam temperature of a boiler using a Levenberg–Marquardt learning algorithm combined with a neural network. The study found that the main factors of the steam temperature change in coal-fired power plants are generator output power, steam flow, steam pressure, and desuperheating water flow. However, even the best machine learning model still suffers from poor stability and overfitting. The influence of time on the prediction accuracy of the steam temperature cannot be ignored, because the steam temperature responds with a significant delay. The long short-term memory network (LSTM) performs well in time-series forecasting and mitigates the problems of vanishing gradients, exploding gradients, and long-sequence dependence when training on long sequences.[14,15] Gupta et al.[16] used a single layer of LSTM with 32 nodes to predict fouling in air preheaters, which can be predicted 3 months in advance. Tan et al.[17] analyzed the effect of different delay time sequences on the model. Li et al.[18] compared different granularity windows based on LSTM and found that the LSTM network exhibited high performance in both short-term and long-term prediction, but the difference became larger for long-term prediction. Cheng et al.[19] performed a sensitivity analysis based on LSTM to study the importance of multiple variables in the model. Chen et al.[20] proposed a dynamic threshold estimation method to identify anomalous data and constructed performance metrics for key parameters on LSTM. Attention structures such as transformers have achieved great success in text translation and speech recognition.
A hierarchical attention structure has been proposed to improve the performance of CNN models.[21] Li et al.[22] introduced an attention mechanism to improve building energy consumption prediction, and their attention mechanism showed that key useful information was given greater weight. Inapakurthi et al.[23] proposed a multiobjective evolutionary algorithm to realize the optimal network structural design of a recurrent neural network (RNN) and LSTM in univariate and multivariate environments and to improve the modeling accuracy using Monte Carlo global sensitivity analysis. This shows that intelligent algorithms achieve excellent results in network hyperparameter optimization.[24−26] Uncertainty analysis is more informative than fixed-value analysis because of uncertainties in data measurement, model fitting, and operating conditions. Studies that optimize uncertain parameters while considering multiple influencing factors have demonstrated the validity of uncertainty analysis.[27−29] Although machine learning models have achieved some success in many fields,[30,31] research is still needed on boilers. Boiler combustion is complex and easily produces ash, resulting in large temperature changes and a large lag. Therefore, an accurate prediction of steam temperature is challenging. The prediction of boiler operating parameters has been an active topic in recent years. However, models based on time-series information are less frequently studied for boilers. To overcome the above shortcomings, a boiler steam temperature prediction model based on the dense residual LSTM-attention network is proposed to improve the prediction performance. In addition, a quantile-based dense residual LSTM-attention network interval prediction modeling method is proposed, which gives a prediction interval for the boiler steam temperature instead of the traditional deterministic prediction result. The rest of this study is organized as follows.
The Methodology section describes the coal-fired boiler model, experimental data, and proposed methodology in detail. The Results and Discussion section provides the results and a discussion of the proposed method for coal-fired boiler operating data. Conclusions and future work are given in the Conclusions section.

Methodology

Coal-Fired Boiler

A real-scale coal-fired boiler is selected as the case study model. The furnace has a cross section of 7.09 m × 7.09 m, and the average flow velocity of flue gas is 8 m/s. Boiler combustion involves complex heat transfer and flow, and it easily generates a large amount of ash, which leads to large changes in steam temperature and long delay times. As shown in Figure 1, the boiler in this case has a complete heat exchange process and multiple heating surface combinations. The pulverized coal enters the furnace through the burner, mixes with air, and burns to produce flue gas. Then, the flue gas passes through the panel superheater, a high-temperature superheater, a low-temperature superheater, and an economizer.

Figure 1

Model of the 170 t/h tangential pulverized coal-fired boiler.

Generating the final steam requires multiple flue gas–water heat exchanges. As shown in Figure 2, the water enters the economizer, the steam drum, and the water wall through the feed pump and circulates back to the steam drum. Then, the steam flows through the low-temperature superheater, high-temperature superheater, and plate superheater to produce high-temperature steam. Changes in fuel and operating conditions can cause fluctuations in steam temperature. However, these changes take effect only after a long delay. Therefore, it is challenging to establish a time-series model to accurately predict changes in steam temperature.

Figure 2

Schematics of the operation of a coal-fired boiler.

Experimental Data

The steam temperature from the boiler can be influenced by many factors. The studied data were collected by the distributed control system (DCS) and include 69 variables such as steam flow, desuperheating water flow, boiler oxygen flow, blower air flow, and furnace outlet flue gas temperature. These 69 variables influence each other. For example, increasing the air volume of the blower will increase the temperature of the flue gas at the outlet of the furnace. High flue gas temperature leads to an increase in steam temperature. To maintain a stable temperature, the flow of desuperheating water is usually increased. These variables also affect the main steam temperature. There are 3360 data samples in total. The sampling interval is 3 min, so the samples cover 7 days of historical data for the studied boiler (3360 × 3 min = 168 h). Figure 3 and Table 1 show the temperature, flow, and pressure data of the main steam collected in this study. Figure 3 also indicates that the parameter change of the main steam is nonlinear and irregular. We chose the main steam temperature as the prediction target, and the temperature range is between 500.08 and 533.25 °C. In particular, the purpose of our model is to predict future changes in the main steam temperature. The historical data of the main steam temperature can also be used as input variables in the model.

Figure 3

(a) Steam temperature data. (b) Steam flow data. (c) Steam pressure data.

Table 1

Steam Temperature Data Statistics

temperature	count	mean	max	min
data	3360	519.08 °C	533.25 °C	500.08 °C

We choose about 15% of the total data as the test set and about 85% of the total data as the training set. The data collection times of the test set and the training set do not overlap, which avoids data leakage. Specifically, we used a total of 2880 samples from 2019/5/24 0:00 to 2019/5/29 23:57 as the training set and a total of 480 samples from 2019/5/30 0:00 to 2019/5/30 23:57 as the test set. Because of the fluctuation and working characteristics of the sensor, missing values appear in the data. We fill in each missing value with the value from the previous moment and then apply min–max normalization.
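The preprocessing steps described above (forward filling, min–max normalization, and a chronological split) can be sketched as follows. This is a minimal illustration on toy numbers, not the authors' code; the function names are hypothetical, and fitting the normalization bounds on the training portion only is an assumption that the text does not state.

```python
def forward_fill(values):
    """Replace missing readings (None) with the previous observed value."""
    filled, last = [], None
    for v in values:
        if v is None:
            v = last  # remains None only if the series starts with a gap
        filled.append(v)
        last = v
    return filled


def fit_min_max(train):
    """Return the (min, max) of the training portion."""
    return min(train), max(train)


def min_max_scale(values, lo, hi):
    """Map values linearly so that lo -> 0 and hi -> 1."""
    return [(v - lo) / (hi - lo) for v in values]


# Toy temperature readings in °C; None marks a missing sensor value.
readings = [500.0, None, 510.0, 520.0, None, 530.0, 525.0, 515.0]
filled = forward_fill(readings)

# Chronological split: the test block follows the training block in time.
n_train = 6
train, test = filled[:n_train], filled[n_train:]

lo, hi = fit_min_max(train)  # assumption: bounds come from training data only
train_scaled = min_max_scale(train, lo, hi)
test_scaled = min_max_scale(test, lo, hi)
```

Splitting by time rather than at random keeps later measurements out of the training block, which is what prevents leakage in a time-series setting.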

Dense Residual LSTM-Attention Network

Time-series forecasting has always been a challenging task. The recurrent neural network (RNN)[32] is a time-series model that can effectively process sequence data. Many theories and experiments on the topic of RNN have been reported. In practice, however, RNNs suffer from the “vanishing gradients” and “exploding gradients” problems.[33,34] Because of the sequentially connected structure of the RNN, only recent states are considered. Training an RNN becomes difficult when samples have long sequences of dependencies. We propose a novel dense residual LSTM-attention network. As shown in Figure 4, it consists of two parts: an LSTM part and an attention part. The input $[X_t, X_{t-1}, \ldots, X_{t-n}]$ has dimension [n + 1, c], where n + 1 is the number of time steps and c is the number of variables at each time step. The input is dimensionally transformed by a 1 × 1 convolution to obtain [n + 1, c′], where c′ is the transformed dimension. The obtained results are connected to the LSTM network part to extract features, and the output of each LSTM layer is [n + 1, c′]. The result output by the LSTM part is connected to the attention part to extract important feature components, the feature vector from the last time step is selected, and the output dimension is [1, c′]. Finally, the [1, c′] dimensional vector is connected to the fully connected layer, which outputs a k-dimensional vector $[Y_{t+1}, Y_{t+2}, \ldots, Y_{t+k}]$.
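To make the input and output shapes concrete, the sketch below builds windowed samples of shape [n + 1, c] with k-step targets from a multivariate series. It is a hedged illustration only: `make_windows` and the toy data are hypothetical names and values, and the paper does not state how its samples were assembled.

```python
def make_windows(rows, target_col, n, k):
    """Build (input, target) pairs from a multivariate time series.

    rows       : list of time steps, each a list of c variable values
    input  t   : rows[t-n .. t]                   -> shape [n + 1, c]
    target t   : rows[t+1 .. t+k][target_col]     -> length k
    """
    inputs, targets = [], []
    for t in range(n, len(rows) - k):
        inputs.append([list(r) for r in rows[t - n : t + 1]])
        targets.append([rows[t + j][target_col] for j in range(1, k + 1)])
    return inputs, targets


# Toy series: 10 time steps, c = 2 variables per step.
rows = [[float(t), float(10 * t)] for t in range(10)]

# n + 1 = 5 historical steps and k = 1 output step, matching the
# "historical time dimension 5, output dimension 1" setting in the text.
X, Y = make_windows(rows, target_col=0, n=4, k=1)
```

With 69 variables per time step, each input window would have shape [5, 69] under the same settings.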

Figure 4

Framework of the dense residual LSTM-attention network.

The residual connection exhibits good performance in the field of image recognition. Our work adds a dense residual connection structure across multiple LSTM layers; this design reduces information loss and captures more comprehensive information. Figure 5a is a structural diagram of the LSTM cell. It controls the inflow and outflow of historical information through a gating design so as to avoid losing long-distance historical information. An LSTM cell is composed of a memory cell and three gates. The memory cell stores historical information; the three gates are the forget gate, input gate, and output gate, which control the inflow and outflow of information through activation functions. Timing-related features are extracted iteratively through the gates, which are calculated using the following equations (the standard LSTM form, consistent with the symbols defined below):

$$f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f)$$
$$i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i)$$
$$o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o)$$
$$g_t = \tanh(W_g x_t + U_g h_{t-1} + b_g)$$
$$c_t = f_t \odot c_{t-1} + i_t \odot g_t$$
$$h_t = o_t \odot \tanh(c_t)$$

where $x_t$ is the cell input at time t; σ represents the sigmoid activation function; tanh represents the hyperbolic tangent activation function; W and U represent the weight matrices; b represents the bias; f, i, o, g, c, and h represent the forget gate, input gate, output gate, nonlinear transformation of the input, cell state, and hidden state, respectively; and ⊙ represents elementwise multiplication. Assuming the dimension of the LSTM cell input and output is m, the dimension of each weight matrix is [m, m].
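A single time step of the gating mechanism described above can be written out in a few lines. The sketch assumes the standard LSTM formulation (sigmoid gates, tanh candidate, elementwise products); the helper names and the all-zero toy parameters are illustrative and are not taken from the paper.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, x, U, h, b):
    """Return W x + U h + b using plain Python lists."""
    return [
        sum(w * xv for w, xv in zip(w_row, x))
        + sum(u * hv for u, hv in zip(u_row, h))
        + bias
        for w_row, u_row, bias in zip(W, U, b)
    ]


def lstm_step(x, h_prev, c_prev, params):
    """One time step of a standard LSTM cell.

    params maps each of "f", "i", "o", "g" to a (W, U, b) triple.
    """
    pre = {name: affine(W, x, U, h_prev, b) for name, (W, U, b) in params.items()}
    f = [sigmoid(z) for z in pre["f"]]    # forget gate
    i = [sigmoid(z) for z in pre["i"]]    # input gate
    o = [sigmoid(z) for z in pre["o"]]    # output gate
    g = [math.tanh(z) for z in pre["g"]]  # nonlinear transformation of the input
    c = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c_prev, i, g)]
    h = [ov * math.tanh(cv) for ov, cv in zip(o, c)]
    return h, c


# Toy check with m = 2 and all-zero parameters: every gate equals 0.5 and
# g = 0, so the previous cell state is simply halved.
m = 2
zeros = [[0.0] * m for _ in range(m)]
params = {name: (zeros, zeros, [0.0] * m) for name in "fiog"}
h, c = lstm_step([1.0, -1.0], [0.0, 0.0], [1.0, -2.0], params)
```

The forget gate scales the previous cell state, which is how the cell decides how much long-distance history to keep.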

Figure 5

(a) Architecture of an LSTM cell. (b) Architecture of attention.

The attention part is a multihead attention structure. This design performs well in the fields of translation and text recognition. As shown in Figure 5b, it uses the query-key-value (Q–K–V) mode to apply self-attention to the input sequence, which extracts the features of important regions. The attention parts can be connected to each other. Attention can be expressed in the following (standard scaled dot-product) form:

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d}}\right)V$$

where Q is the query matrix, K is the key matrix, V is the value matrix, and d is the hidden dimension of K. The dimension of input X is n, and Q, K, and V are obtained by multiplying input X by the corresponding parameter matrix W. The dimensions of Q, K, and V are [n, d].

To evaluate the developed steam temperature prediction model, four evaluation indicators were introduced, namely, the mean absolute error (MAE), mean square error (MSE), root-mean-square error (RMSE), and r-square (R2). The four evaluation indicators are defined as

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$$
$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$
$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}$$

where $y_i$ is the actual value, $\hat{y}_i$ is the predicted value, $\bar{y}$ is the mean of the actual values, and n is the number of samples.
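The query-key-value computation can be illustrated with a single attention head. The sketch assumes the standard scaled dot-product form, softmax(QKᵀ/√d)V; the paper's multihead structure would run several such heads in parallel, and the matrices below are toy values rather than learned parameters.

```python
import math


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def attention(Q, K, V):
    """Single-head scaled dot-product attention; returns (output, weights)."""
    d = len(K[0])  # hidden dimension of K
    scores = matmul(Q, transpose(K))
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, V), weights


# Toy example with n = 2 time steps and d = 2.
Q = [[1.0, 0.0], [0.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0]]
out, weights = attention(Q, K, V)
```

Each row of `weights` sums to 1, so every output row is a weighted average of the rows of V; time steps whose keys match the query receive the larger weight.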

Method for Uncertainty Analysis

There are many uncertain factors in the boiler steam–water system, so the final steam temperature fluctuates unpredictably and the reliability of the results must be considered. Traditional deterministic prediction methods cannot indicate the quality of steam temperature prediction results. The uncertainty analysis method can give the confidence interval of the prediction result. This interval describes the possible range of future forecast results. More accurate forecasts are accompanied by lower uncertainty. Therefore, interval prediction of the steam temperature helps boiler staff to understand the uncertainty and risk levels of future steam temperature changes. We propose an interval prediction modeling method based on the quantile and dense residual LSTM-attention network. It addresses uncertainty prediction modeling of the boiler main steam temperature and estimates the quality of the steam temperature prediction. Figure 6 shows the calculation process of the uncertainty analysis. Quantile regression is used to study the relationship between the input X and the conditional quantile of the response variable Y, namely,

$$Q_Y(\tau \mid X) = \beta_0(\tau) + \beta_1(\tau) X_1 + \beta_2(\tau) X_2 + \cdots + \beta_n(\tau) X_n$$

where τ is the quantile level, with values between 0 and 1; $\beta(\tau) = [\beta_0(\tau), \beta_1(\tau), \beta_2(\tau), \ldots, \beta_n(\tau)]$ represents the quantile regression coefficients, which change as the quantile changes; and n is the number of input variables.

Figure 6

Uncertainty analysis and calculation process.

The quantile regression coefficients β(τ) can be used to obtain the response of the input variable to the output variable in different quantiles. When τ takes continuous values in (0, 1), the conditional distribution of the response variable can be obtained, and then the conditional density can be obtained. Model training can be transformed into solving the following optimization problem:[35]

$$\min_{\beta(\tau)} \left[ \sum_{i:\, y_i \ge \hat{y}_i} \tau \left| y_i - \hat{y}_i \right| + \sum_{i:\, y_i < \hat{y}_i} (1 - \tau) \left| y_i - \hat{y}_i \right| \right], \qquad \hat{y}_i = Q_Y(\tau \mid X_i)$$

where y is the actual value, X is the input, ŷ is the predicted value, τ is the quantile point, and β(τ) represents the quantile regression coefficient. We divide the training data into a training set and a validation set. Model training stops when the mean square error on the validation set falls below the tolerance set for the algorithm or when the maximum number of rounds is reached. To perform the uncertainty analysis accurately and quantitatively, the quantile score (QS) and the prediction-interval-normalized root-mean-square width (PINRW) are used to evaluate the quality of probabilistic prediction results. QS can be expressed in the following form:

$$\mathrm{QS} = \frac{1}{rN} \sum_{t=1}^{N} \sum_{i=1}^{r} \rho_{\tau_i}\left(y_t - \hat{y}_t(\tau_i)\right), \qquad \rho_{\tau}(u) = \begin{cases} \tau u, & u \ge 0 \\ (\tau - 1)u, & u < 0 \end{cases}$$

where τ is the quantile point, $\tau_i$ is the ith value of τ, r represents the number of τ values, N is the number of time points, $y_t$ is the actual value at time t, and $\hat{y}_t(\tau_i)$ is the predicted value at time t for quantile $\tau_i$. PINRW is an indicator of the width of the interval. It can be expressed in the following form:

$$\mathrm{PINRW} = \frac{1}{R} \sqrt{\frac{1}{N} \sum_{t=1}^{N} \left(U_t^{\alpha} - L_t^{\alpha}\right)^2}$$

where α is the confidence level and R is the range of the samples. $U_t^{\alpha}$ is the upper bound of the interval at time t, and $L_t^{\alpha}$ is the lower bound of the interval at time t.
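The quantile loss and the two interval metrics can be sketched in plain Python. This is an illustration under stated assumptions: the pinball (check) loss is the usual quantile-regression loss, QS is taken as its average over all time points and quantile levels, and PINRW is taken as the root-mean-square interval width divided by the sample range. The function names and numbers are hypothetical.

```python
import math


def pinball(y, y_hat, tau):
    """Quantile (pinball) loss for one observation at quantile level tau."""
    diff = y - y_hat
    return tau * diff if diff >= 0 else (tau - 1.0) * diff


def quantile_score(y_true, preds_by_tau):
    """Average pinball loss over all time points and quantile levels.

    preds_by_tau maps each tau to a list of predictions, one per time point.
    """
    total = 0.0
    for tau, preds in preds_by_tau.items():
        total += sum(pinball(y, p, tau) for y, p in zip(y_true, preds))
    return total / (len(y_true) * len(preds_by_tau))


def pinrw(lower, upper, value_range):
    """Root-mean-square interval width, normalized by the sample range."""
    squared = [(u - l) ** 2 for l, u in zip(lower, upper)]
    return math.sqrt(sum(squared) / len(squared)) / value_range


# Toy temperatures in °C with predictions at three quantile levels.
y_true = [520.0, 522.0]
preds = {
    0.05: [518.0, 521.0],
    0.50: [520.0, 523.0],
    0.95: [523.0, 524.0],
}
qs = quantile_score(y_true, preds)

# The 0.05 and 0.95 quantiles bound a 90% interval; the range used here is
# the 500.08-533.25 °C span reported for the data set.
width_90 = pinrw(preds[0.05], preds[0.95], value_range=533.25 - 500.08)
```

A lower QS means the predicted quantiles sit closer to the observations, and a lower PINRW means narrower intervals; a good interval forecast keeps both small.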

Results and Discussion

Performance of Different Models

In our experiments, we comprehensively compare the proposed model with SVR, ridge regression (RIDGE), RNN, the gated recurrent unit (GRU), and LSTM. For all models, we divide the training data into a training set and a validation set for model training. We use 10-fold cross-validation to reduce the risk of overfitting. The weights that perform best on the validation set are then evaluated on the test set. All of our experiments are repeated 10 times, and the mean values of the metrics are compared to remove the randomness of model training. For all models, the input historical time dimension is 5, the feature dimension for each historical time is constructed from data points of 69 variables, and the output dimension is 1. We applied a grid search to optimize the parameters of the SVR and RIDGE models. Parameters such as the learning rate and the number of nodes in each layer are the same for the RNN, GRU, and LSTM models and our proposed model; they are set empirically by trial and error. All models are built on the PyTorch platform. Additional details and control experiments are provided in Table 2.

Table 2

Parameter Settings of Comparative Experiments

model	parameters	quantity
SVR	c	0–1
	gamma	0–1
RIDGE	alpha	0–1
(RNN, GRU, LSTM)	number of layers	3
	number of neurons in each layer	64
	learning rate	0.001
	number of iterations	30
LSTM-attention	number of LSTM layers	3
	number of neurons in each LSTM layer	64
	number of attention layers	1
	learning rate	0.001
	number of iterations	30

Figure 7 compares the predicted values of the dense residual LSTM-attention network with the true values. Our method predicts well, and the curves of the predicted and actual values are very close. In the prediction of the next 3 min, the errors in the test set are almost all within ±5%.

Figure 7

Prediction and experimental results for the next 3 min.

The predictive effects of different models are quantitatively analyzed. As shown in Table 3, compared with SVR, RIDGE, RNN, GRU, and LSTM, our method achieves competitive advantages on the four evaluation metrics for predicting the steam temperature in the next 3 min. Specifically, the mean absolute error (MAE) of our method is 0.596, which indicates that the average error between the predicted value and the experimental value is less than 0.6 °C. In terms of MAE, the proposed model outperforms SVR by 32% and RIDGE by 16%. SVR and RIDGE cannot extract the temporal features of variables, so they have poor predictive performance. The LSTM model has an MAE of 0.667, whereas our model achieves relative improvements of 12, 10, and 11% over the RNN, GRU, and LSTM models, respectively. This is because the residual elements and attention elements can obtain richer information. The closer the value of the r-square (R2) indicator is to 1, the better. In addition, our model achieves the best mean square error (MSE), root-mean-square error (RMSE), and R2 values among the compared models.

Table 3

Comparison of Our Proposed Model with Different Models

model	MAE	MSE	RMSE	R2
SVR	0.875	1.300	1.140	0.947
RIDGE	0.710	0.951	0.975	0.961
RNN	0.681	0.876	0.936	0.964
GRU	0.662	0.825	0.908	0.966
LSTM	0.667	0.828	0.910	0.966
LSTM-attention	0.596	0.714	0.845	0.971
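The four metrics and the percentages quoted from Table 3 can be checked with a short script. The metric functions follow the usual definitions of MAE, MSE, RMSE, and R2, the toy vectors are illustrative, and the table values are copied from Table 3; the variable names are hypothetical.

```python
import math


def mae(y, y_hat):
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)


def mse(y, y_hat):
    return sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y)


def rmse(y, y_hat):
    return math.sqrt(mse(y, y_hat))


def r2(y, y_hat):
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


# Toy values (not from the paper) to exercise the four functions.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.0, 2.5, 4.0]
toy = (mae(y_true, y_pred), mse(y_true, y_pred),
       rmse(y_true, y_pred), r2(y_true, y_pred))

# Values copied from Table 3: model -> (MAE, MSE, RMSE, R2).
table3 = {
    "SVR": (0.875, 1.300, 1.140, 0.947),
    "RIDGE": (0.710, 0.951, 0.975, 0.961),
    "RNN": (0.681, 0.876, 0.936, 0.964),
    "GRU": (0.662, 0.825, 0.908, 0.966),
    "LSTM": (0.667, 0.828, 0.910, 0.966),
    "LSTM-attention": (0.596, 0.714, 0.845, 0.971),
}

# Relative MAE reduction of the proposed model over each baseline, in percent.
proposed = table3["LSTM-attention"]
mae_reduction = {
    name: round(100.0 * (row[0] - proposed[0]) / row[0])
    for name, row in table3.items()
    if name != "LSTM-attention"
}

# Consistency check: each tabulated RMSE should equal sqrt(MSE) up to rounding.
rmse_gap = {name: abs(math.sqrt(row[1]) - row[2]) for name, row in table3.items()}
```

The rounded reductions reproduce the 32, 16, 12, 10, and 11% figures quoted for SVR, RIDGE, RNN, GRU, and LSTM, and every tabulated RMSE agrees with the square root of its MSE to three decimals.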

Parametric Analysis

To analyze the prediction performance of the model in more detail, we examined its behavior over longer prediction horizons and the importance of the input features.

Long-Term Horizon

As shown in Figure 8, we compare prediction horizons from 3 to 30 min ahead. The prediction performance of our model is the best for both short-term and long-term predictions. Compared to SVR, RIDGE, RNN, GRU, and LSTM, our model improves by 29, 26, 19, 15, and 17%, respectively, in terms of average MAE from 3 to 30 min. The prediction accuracy decreases as the prediction horizon increases. For example, forecasting 3 min ahead provides better performance than forecasting 6 min ahead. This is because the steam temperature of the boiler can fluctuate more over a longer horizon, which increases the difficulty of prediction. Compared to the LSTM model, our model is 11% lower in MAE, 14% lower in MSE, and 7% lower in RMSE in predicting temperature 3 min ahead. Meanwhile, our model is 13% lower in MAE, 30% lower in MSE, and 17% lower in RMSE in predicting temperature 15 min ahead. This shows that the advantage of our model grows for longer horizons, but when forecasting 15–30 min in advance, the errors of all models are large. To obtain accurate results, it is recommended that the prediction horizon be less than 15 min.

Figure 8

Error of prediction 3–30 min into the future.

Feature Importance Analysis

We conduct an experimental analysis of the importance of the model input variables for the prediction. The Shapley additive explanations (SHAP) method was used to calculate each variable’s contribution to the predicted target value,[36] which is important for model interpretability. SHAP identifies the key parameters that affect the target and whether the contribution of each input feature is positive or negative. As shown in Figure 9, a positive SHAP value indicates a positive correlation, and a negative value indicates a negative correlation. The darker the red, the greater the importance. It can be seen that the historical main steam temperature and header temperature are positively correlated with the predicted target, and the desuperheating water flow rate is negatively correlated with the predicted target. The desuperheating water flow, historical main steam temperature, and header temperature are the main factors that affect the predicted steam temperature target. Changes in these important variables should be noted when monitoring the boiler steam temperature. This helps to regulate the steam temperature and quickly locate the cause of overtemperature faults.
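The sign convention of Shapley-based attributions can be seen on a toy model. The sketch below computes exact Shapley values by enumerating all feature coalitions, which is feasible only for a handful of features; SHAP implementations typically rely on approximations instead. The linear model, its coefficients, and the feature values are hypothetical and chosen only so that two features push the prediction up and one pushes it down.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one sample by enumerating all coalitions.

    Features outside a coalition are replaced by their baseline value.
    """
    n = len(x)

    def value(coalition):
        mixed = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return predict(mixed)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


# Hypothetical linear surrogate. Feature order: historical main steam
# temperature, header temperature, desuperheating water flow.
def toy_model(v):
    return 520.0 + 0.8 * (v[0] - 520.0) + 0.3 * (v[1] - 525.0) - 0.5 * (v[2] - 10.0)


baseline = [520.0, 525.0, 10.0]
sample = [522.0, 527.0, 12.0]
phi = shapley_values(toy_model, sample, baseline)
```

For this toy model the two temperature features receive positive attributions and the water flow a negative one, and the attributions sum to the difference between the prediction for `sample` and the prediction for `baseline`.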

Figure 9

Description of the importance of different features for a 3 min step ahead prediction.

Uncertainty Analysis Results

On the basis of the quantile and dense residual LSTM-attention model, we predicted the boiler steam temperature interval. As shown in Figure 10, lower uncertainty corresponds to a narrower prediction interval, which indicates a higher-quality forecast. For a forecast 3 min ahead, the range of uncertainty is small. However, as the prediction horizon increases, the prediction interval becomes wider. For example, the uncertainty interval of the forecast 18 min ahead is large, which means that the quality of the forecast results decreases.

Figure 10

(a–f) Prediction intervals at different confidence levels for horizons of 3–18 min, with the median value in red and the experimental value in black. The colors of different intervals represent the prediction results under different confidence levels.

As shown in Table 4, the influence of the prediction step size on the interval prediction performance is quantitatively analyzed in terms of QS and PINRW. QS quantifies the interval prediction error. PINRW represents the average width of the prediction intervals. A good interval prediction should have a small average width and lie close to the experimental value. As the prediction horizon increases from 3 to 18 min, QS rises from 0.3017 to 1.6279 °C; that is, the 3 min QS is about 81% lower than the 18 min QS. Furthermore, PINRW generally decreases as the confidence level decreases from 90 to 60%; the only exception in Table 4 is the 6 min horizon, where the 60% width (0.1395) exceeds the 70% width (0.1000). This shows that the model can obtain effective lower and upper bounds of the prediction results. Therefore, uncertainty analysis provides more comprehensive information for the prediction results and helps to judge the reliability of the model prediction results.

Table 4

Comparison of Uncertainty Quantification Indicators for the Range of 3–18 min

future time (min)	QS/°C	PINRW (90%)	PINRW (80%)	PINRW (70%)	PINRW (60%)
3	0.3017	0.1261	0.0837	0.0555	0.0163
6	0.7694	0.3355	0.1171	0.1000	0.1395
9	1.0127	0.5368	0.2360	0.1461	0.1453
12	1.2665	0.4624	0.3063	0.1831	0.0708
15	1.4107	0.4584	0.2568	0.2257	0.0989
18	1.6279	0.4846	0.2925	0.2146	0.1434
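A quick arithmetic check of Table 4 helps when reading the trends. The sketch copies the tabulated values and computes how QS changes between the 3 and 18 min horizons and whether PINRW shrinks monotonically as the confidence level drops; the variable names are illustrative.

```python
# Values copied from Table 4.
qs = {3: 0.3017, 6: 0.7694, 9: 1.0127, 12: 1.2665, 15: 1.4107, 18: 1.6279}
pinrw = {  # horizon (min) -> widths at 90, 80, 70, and 60% confidence
    3: (0.1261, 0.0837, 0.0555, 0.0163),
    6: (0.3355, 0.1171, 0.1000, 0.1395),
    9: (0.5368, 0.2360, 0.1461, 0.1453),
    12: (0.4624, 0.3063, 0.1831, 0.0708),
    15: (0.4584, 0.2568, 0.2257, 0.0989),
    18: (0.4846, 0.2925, 0.2146, 0.1434),
}

# QS at 18 min relative to QS at 3 min.
ratio = qs[18] / qs[3]            # how many times larger
share = 1.0 - qs[3] / qs[18]      # how much lower the 3 min value is

# QS should grow with the horizon.
horizons = sorted(qs)
qs_increasing = all(qs[a] < qs[b] for a, b in zip(horizons, horizons[1:]))

# Horizons where PINRW does not shrink at every step from 90% down to 60%.
non_monotone = [
    h for h, widths in pinrw.items()
    if not all(a > b for a, b in zip(widths, widths[1:]))
]
```

With these numbers, the 18 min QS is roughly 5.4 times the 3 min QS (equivalently, the 3 min QS is about 81% lower), QS rises at every step, and the 6 min horizon is the only row where the interval width does not shrink monotonically with the confidence level.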

Conclusions

In this study, we propose a dense residual LSTM-attention model to predict boiler steam temperature. Our model establishes a relationship between variables such as desuperheating water flow and the amount of boiler oxygen and the target change in steam temperature. The model is evaluated by using the operating data of a real boiler, and the results show that the proposed model is effective. According to our analysis, our model has the following advantages:
(1) Compared to SVR, RIDGE, RNN, GRU, and LSTM methods, our method achieves better forecasting performance. In the target prediction of the next 3 min, the mean absolute error of our method with respect to the test data is less than 0.6 °C.
(2) Comparing the prediction performance of the model at multiple future horizons shows that our model can predict long-term time series well. Meanwhile, a prediction horizon of less than 15 min is suggested for steam temperature prediction.
(3) Different variable features have different degrees of importance in the dense residual LSTM-attention model. Parameters such as the desuperheating water flow, historical main steam temperature, and header temperature are the main factors that affect the prediction results of the steam temperature.
(4) We have proposed an interval uncertainty analysis method, which gives the fluctuation interval of the result and provides comprehensive information for boiler operation.
The research results are of great significance for fault early warning and energy efficiency improvement and can also be applied to multivariate time series problems such as solar power generation steam temperature predictions and wind power uncertainty analysis. Future work will focus on combining the proposed prediction model with an operational control strategy for effective boiler operation.
