Zheming Tong¹, Xin Chen¹, Shuiguang Tong¹, Qi Yang¹
¹ State Key Laboratory of Fluid Power and Mechatronic Systems and School of Mechanical Engineering, Zhejiang University, Hangzhou 310027, China.
Abstract
Flexible operation of large-scale boilers for electricity generation is essential in modern power systems. Accurate prediction of the boiler steam temperature is important for the efficient operation of boiler units because it helps prevent overtemperature. In this study, a dense residual long short-term memory network (LSTM)-attention model is proposed for steam temperature prediction. In particular, the residual elements of the proposed model improve accuracy by adding short skip connections between layers. To provide more complete information for steam temperature prediction, an uncertainty analysis based on the proposed model is performed to quantify the uncertainty of steam temperature variations. Our results show that the proposed method predicts the steam temperature well, with a mean absolute error (MAE) of less than 0.6 °C. Compared with support-vector regression (SVR), ridge regression (RIDGE), the recurrent neural network (RNN), the gated recurrent unit (GRU), and LSTM, the proposed model reduces the MAE by 32, 16, 12, 10, and 11%, respectively. According to our analysis, the dense residual LSTM-attention model can provide accurate early warning of overtemperature, supporting the development of real-time steam temperature control.
The operational flexibility of large-scale boilers plays a critical role in today's power systems, which have a large share of renewable energy sources.[1,2] However, these boilers may face various issues, such as overtemperature, slagging, and wall corrosion, especially under load variations. Many recent studies have used data-driven models to monitor boiler operating parameters, including the least-squares support vector (LSSV) model,[3] the support vector regression (SVR) model,[4] the autoregressive integrated moving average (ARIMA) model,[5] and the convolutional neural network (CNN) model.[6] Romeo et al.[7] established an artificial neural network (ANN) model for biomass boiler monitoring and used it to predict the combustion flue gas composition, staged heat transfer, and slagging evolution index. Sujatha et al.[8] combined a discriminant radial basis network with boiler flame images collected by a charge-coupled device (CCD) camera to monitor the combustion status of coal-fired boilers. Zeng et al.[9] established a dynamic adaptive four-parameter discrete grey system model for coalbed methane production, with a mean relative percentage error of 0.48%. Yu et al.[10] developed a gray-predictor-based algorithm (GPBA) to monitor the uncertainty of steam flow and noise. Tong et al.[11] proposed an online prediction program for heating-surface dust deposition based on wavelet analysis and SVR, which achieved a prediction accuracy of 98.5% on the test data. Grochowalski et al.[12] developed a CNN program with 48 input parameters to predict 12 temperature distributions in the boiler combustion chamber and compared and analyzed the differences among the 12 target predictions.
Keeping the boiler steam temperature stable helps avoid overtemperature and tube-burst accidents. However, the steam temperature responds to changes with a significant delay and becomes highly unstable as operating conditions change, which makes its changes very difficult to predict accurately. Mazalan et al.[13] predicted the steam temperature of a boiler using a neural network trained with the Levenberg–Marquardt learning algorithm. They found that the main factors affecting the steam temperature in coal-fired power plants are generator output power, steam flow, steam pressure, and desuperheating water flow. However, such machine learning models still suffer from poor stability and overfitting.
The influence of time on the prediction accuracy of the steam temperature cannot be ignored, because the steam temperature responds with a significant delay. The long short-term memory network (LSTM) performs well in time-series forecasting because it alleviates the vanishing and exploding gradient problems and captures long-range dependencies when training on long sequences.[14,15] Gupta et al.[16] used a single LSTM layer with 32 nodes to predict fouling in air preheaters up to 3 months in advance. Tan et al.[17] analyzed how input sequences with different delay times affect the model. Li et al.[18] compared LSTM models with granularity windows of different sizes and found that the LSTM network performed well for both short-term and long-term prediction, although the difference became larger for long-term prediction. Cheng et al.[19] performed an LSTM-based sensitivity analysis to study the importance of multiple variables in the model. Chen et al.[20] proposed a dynamic threshold estimation method to identify anomalous data and constructed LSTM-based performance metrics for key parameters. Attention-based architectures such as the Transformer have achieved great success in machine translation and speech recognition, and a hierarchical attention structure has been proposed to improve the performance of CNN models.[21] Li et al.[22] introduced an attention mechanism to improve building energy consumption prediction, and their attention mechanism assigned greater weight to key information. Inapakurthi et al.[23] proposed a multiobjective evolutionary algorithm to optimize the network structure of a recurrent neural network (RNN) and an LSTM in univariate and multivariate settings and improved the modeling accuracy using Monte Carlo global sensitivity analysis. Such intelligent algorithms have also achieved excellent results in network hyperparameter optimization.[24−26] Compared with fixed-value (deterministic) analysis, uncertainty analysis is important because of uncertainties in data measurement, model fitting, and operating conditions. Studies that optimize uncertain parameters while considering multiple influencing factors have demonstrated the validity of uncertainty analysis.[27−29]
Although machine learning models have achieved success in many fields,[30,31] further research is needed on boilers. Boiler combustion is complex and readily produces ash, resulting in large temperature changes and long lags, so accurate prediction of the steam temperature is challenging. The prediction of boiler operating parameters has been an active research topic in recent years; however, models based on time-series information have rarely been studied for boilers. To address these shortcomings, a boiler steam temperature prediction model based on a dense residual LSTM-attention network is proposed to improve prediction performance. In addition, an interval prediction modeling method based on quantiles and the dense residual LSTM-attention network is proposed, which outputs a prediction interval for the boiler steam temperature instead of a traditional deterministic prediction.
The rest of this study is organized as follows. Section describes the coal-fired boiler model, experimental data, and proposed methodology in detail. Section provides the results and a discussion of the proposed method for coal-fired boiler operating data. Conclusions and future work are given in Section .
Methodology
Coal-Fired Boiler
A real-scale coal-fired boiler is selected as the case study. The furnace has a cross section of 7.09 m × 7.09 m, and the average flue gas velocity is 8 m/s. Boiler combustion involves complex heat transfer and flow and readily generates a large amount of ash, which leads to large changes in steam temperature and long delay times.
As shown in Figure , the boiler in this case has a complete heat exchange process with multiple combined heating surfaces. Pulverized coal enters the furnace through the burner, mixes with air, and burns to produce flue gas. The flue gas then passes through the panel superheater, the high-temperature superheater, the low-temperature superheater, and the economizer.
Figure 1
Model of the 170 t/h tangential pulverized coal-fired boiler.
Producing the final steam requires multiple flue gas–water heat exchanges. As shown in Figure , the feed pump delivers water through the economizer to the steam drum and the water wall, from which it circulates back to the steam drum. The steam then flows through the low-temperature superheater, the high-temperature superheater, and the panel superheater to produce high-temperature steam. Changes in fuel and operating conditions can cause fluctuations in steam temperature, but the steam temperature responds to these changes only after a long delay. Therefore, it is challenging to establish a time-series model that accurately predicts changes in steam temperature.
Figure 2
Schematics
of the operation of a coal-fired boiler.
Experimental Data
The steam temperature of the boiler can be influenced by many factors. The data studied here were collected by the distributed control system (DCS) and include 69 variables, such as steam flow, desuperheating water flow, boiler oxygen flow, blower air flow, and furnace outlet flue gas temperature. These 69 variables influence each other. For example, increasing the air volume of the blower raises the flue gas temperature at the furnace outlet, and a higher flue gas temperature increases the steam temperature. To keep the temperature stable, the desuperheating water flow is usually increased. These variables also affect the main steam temperature.
There are 3360 data samples in total. The sampling interval is 3 min, so the data cover 7 days (3360 × 3 min = 10,080 min) of historical operation of the studied boiler. Figure and Table show the main steam temperature, flow, and pressure data collected in this study. Figure also indicates that the main steam parameters vary nonlinearly and irregularly. We chose the main steam temperature, which ranges from 500.08 to 533.25 °C, as the prediction target. The goal is to predict future changes in the main steam temperature, and the historical values of the main steam temperature are also used as model inputs.
(a) Steam temperature data. (b) Steam flow data. (c) Steam pressure data.
We use approximately 15% of the data as the test set and 85% as the training set (480/3360 ≈ 14.3% and 2880/3360 ≈ 85.7%). The collection periods of the test set and the training set do not overlap, so no test information leaks into training. Specifically, we used 2880 samples from 2019/5/24 0:00 to 2019/5/29 23:57 as the training set and 480 samples from 2019/5/30 0:00 to 2019/5/30 23:57 as the test set. Because of sensor fluctuations and operating characteristics, the data contain missing values. We fill each missing value with the value from the previous time step and apply min–max normalization.
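The preprocessing steps (filling missing values from the previous time step, min–max normalization, and a chronological train/test split) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names are hypothetical, missing readings are assumed to be stored as `None`, and the normalization bounds are assumed to come from the training portion only, which the text does not specify.

```python
from typing import List, Optional, Sequence, Tuple


def forward_fill(column: Sequence[Optional[float]]) -> List[Optional[float]]:
    """Replace each missing value (None) with the most recent observed value.

    A gap at the very start has no earlier value, so it stays None.
    """
    filled: List[Optional[float]] = []
    last: Optional[float] = None
    for value in column:
        if value is not None:
            last = value
        filled.append(last)
    return filled


def min_max_scale(column: Sequence[float], lo: float, hi: float) -> List[float]:
    """Scale with fixed bounds: lo -> 0, hi -> 1 (values outside [lo, hi] fall outside [0, 1])."""
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in column]


def chronological_split(samples: Sequence, n_train: int) -> Tuple[list, list]:
    """Split time-ordered samples without shuffling, so the test period follows training."""
    return list(samples[:n_train]), list(samples[n_train:])


if __name__ == "__main__":
    raw = [512.0, None, 515.5, None, None, 509.0]
    filled = forward_fill(raw)
    print(filled)  # [512.0, 512.0, 515.5, 515.5, 515.5, 509.0]

    train, test = chronological_split(filled, n_train=4)
    lo, hi = min(train), max(train)      # bounds from the training part only (assumption)
    print(min_max_scale(train, lo, hi))  # [0.0, 0.0, 1.0, 1.0]
    print(min_max_scale(test, lo, hi))   # test values may fall outside [0, 1]
```

With the 3360 samples described here, `chronological_split(samples, 2880)` would yield 2880 training and 480 test samples. Fitting the bounds on the training data alone keeps test-period information out of the scaling.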
Dense Residual LSTM-Attention Network
Time-series forecasting has always been a challenging task. The recurrent neural network (RNN)[32] is a time-series model that can effectively process sequence data, and many theoretical and experimental studies of RNNs have been reported. In practice, however, RNNs suffer from the so-called "vanishing gradient" and "exploding gradient" problems.[33,34] Because of its sequentially connected structure, an RNN effectively considers only recent states, so training becomes difficult when samples have long-range dependencies.
We propose a novel dense residual LSTM-attention network. As shown in Figure , it consists of two parts: an LSTM part and an attention part. The input $[X_t, X_{t-1}, \ldots, X_{t-n}]$ is an [n + 1, c] matrix, where n + 1 is the time dimension and c is the number of input variables at each time step. The input is first transformed by a 1 × 1 convolution into an [n + 1, c′] matrix, where c′ is the transformed dimension. The result is fed into the LSTM part to extract features, and the output of each LSTM layer has dimension [n + 1, c′]. The LSTM output is then passed to the attention part to extract important feature components; the feature vector at the last time step is selected, giving an output of dimension [1, c′]. Finally, this [1, c′] vector is fed into a fully connected layer, which outputs a k-dimensional vector $[Y_{t+1}, Y_{t+2}, \ldots, Y_{t+k}]$.
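To make the input and output shapes concrete, the following sketch builds (input, target) pairs with a sliding window over the 3-min samples. It assumes the rows are already preprocessed and time-ordered; `make_windows`, `target_col`, `history`, and `horizon` are hypothetical names. Each input window holds n + 1 = `history` rows of c variables, stored oldest first (the same time steps as [X_t, X_{t−1}, ..., X_{t−n}] in reverse order), and each target holds the next k = `horizon` values of the target column.

```python
from typing import List, Sequence, Tuple

Row = Sequence[float]  # one time step holding c variables


def make_windows(rows: Sequence[Row], target_col: int, history: int, horizon: int
                 ) -> List[Tuple[List[List[float]], List[float]]]:
    """Build (input, target) pairs from time-ordered rows.

    input:  `history` consecutive rows, oldest first -> shape [n + 1, c]
    target: the next `horizon` values of column `target_col` -> shape [k]
    """
    if history < 1 or horizon < 1:
        raise ValueError("history and horizon must be positive")
    pairs = []
    # t is the index of the most recent input row (X_t).
    for t in range(history - 1, len(rows) - horizon):
        x = [list(r) for r in rows[t - history + 1 : t + 1]]
        y = [rows[t + j][target_col] for j in range(1, horizon + 1)]
        pairs.append((x, y))
    return pairs


if __name__ == "__main__":
    # Toy data (hypothetical): c = 2 variables, column 1 plays the role of the target.
    rows = [[float(i), 500.0 + i] for i in range(8)]
    for x, y in make_windows(rows, target_col=1, history=5, horizon=2):
        print(len(x), len(x[0]), y)  # window length, variables per step, targets
```

For the experimental settings reported later (5 historical time steps, 69 variables, output dimension 1), one would call `make_windows(rows, target_col, history=5, horizon=1)`, with `target_col` pointing at the main steam temperature. Because the sampling interval is 3 min, a horizon of h steps corresponds to predicting 3h min ahead.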
Figure 4
Framework of the dense residual LSTM-attention
network.
The residual connection performs well in image recognition. Our work adds a dense residual connection structure across multiple LSTM layers; this design can reduce information loss and capture more comprehensive information. Figure a shows the structure of an LSTM cell. The LSTM controls the inflow and outflow of historical information through gates, which prevents long-distance historical information from vanishing. An LSTM is composed of memory cells and three gates: the memory cells store historical information, and the forget gate, input gate, and output gate control the inflow and outflow of information through activation functions. The gates are updated iteratively to extract time-dependent features, using the following equations:
$$
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f)\\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i)\\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o)\\
g_t &= \tanh(W_g x_t + U_g h_{t-1} + b_g)\\
c_t &= f_t \odot c_{t-1} + i_t \odot g_t\\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
$$
where $x_t$ is the input at time t; σ is the sigmoid activation function; tanh is the hyperbolic tangent activation function; W and U are weight matrices (assuming that the dimension of the LSTM cell input and output is m, each weight matrix has dimension [m, m]); b is the bias; f, i, o, g, c, and h are the forget gate, input gate, output gate, nonlinear transformation of the input, cell state, and hidden state, respectively; and ⊙ denotes elementwise multiplication.
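The gate mechanism can be traced step by step with the plain-Python sketch below. `lstm_step` implements the standard LSTM update (forget, input, and output gates, candidate g, cell state c, hidden state h) with weight matrices W and U of size [m, m] and a bias b. The text does not specify exactly how the dense residual connections are wired, so `dense_residual_stack` assumes, for illustration only, that each LSTM layer receives the elementwise sum of the block input and the outputs of all earlier layers. All function names are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]
GateParams = Dict[str, Tuple[Matrix, Matrix, Vector]]  # gate name -> (W, U, b)


def matvec(M: Matrix, v: Vector) -> Vector:
    """Matrix-vector product for list-of-lists matrices."""
    return [sum(a * b for a, b in zip(row, v)) for row in M]


def sigmoid(z: float) -> float:
    """Numerically stable logistic sigmoid."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def lstm_step(x: Vector, h_prev: Vector, c_prev: Vector, p: GateParams) -> Tuple[Vector, Vector]:
    """One standard LSTM time step; returns (h_t, c_t)."""
    def affine(gate: str) -> Vector:  # W x_t + U h_{t-1} + b for one gate
        W, U, b = p[gate]
        return [wx + uh + bb for wx, uh, bb in zip(matvec(W, x), matvec(U, h_prev), b)]

    f = [sigmoid(z) for z in affine("f")]    # forget gate
    i = [sigmoid(z) for z in affine("i")]    # input gate
    o = [sigmoid(z) for z in affine("o")]    # output gate
    g = [math.tanh(z) for z in affine("g")]  # nonlinear transformation of the input
    c = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c_prev, i, g)]  # c_t = f ⊙ c_{t-1} + i ⊙ g
    h = [ov * math.tanh(cv) for ov, cv in zip(o, c)]                     # h_t = o ⊙ tanh(c_t)
    return h, c


def lstm_layer(xs: List[Vector], p: GateParams, m: int) -> List[Vector]:
    """Run one LSTM layer over a sequence and return the hidden state at every step."""
    h, c = [0.0] * m, [0.0] * m
    outputs = []
    for x in xs:
        h, c = lstm_step(x, h, c, p)
        outputs.append(h)
    return outputs


def dense_residual_stack(xs: List[Vector], layers: List[GateParams], m: int) -> List[Vector]:
    """ASSUMED dense residual wiring (illustration only): each layer reads the
    elementwise sum of the block input and the outputs of all earlier layers."""
    history = [xs]
    for p in layers:
        # zip(*history) walks the time steps; zip(*step) walks the m features.
        summed = [[sum(vals) for vals in zip(*step)] for step in zip(*history)]
        history.append(lstm_layer(summed, p, m))
    return history[-1]


if __name__ == "__main__":
    m = 2
    zero = [[0.0] * m for _ in range(m)]
    params = {gate: (zero, zero, [0.1] * m) for gate in "fiog"}  # hypothetical toy weights
    seq = [[0.5, -0.5], [1.0, 0.0], [0.0, 1.0]]                  # 3 time steps, c' = m = 2
    out = dense_residual_stack(seq, [params, params, params], m)  # 3 LSTM layers
    print(len(out), len(out[0]))  # 3 2
```

Because every layer keeps the [n + 1, m] shape, the summed skip inputs line up step by step; a real implementation would use a tensor library instead of nested lists.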
Figure 5
(a) Architecture of an
LSTM cell. (b) Architecture of attention.
The attention part is a multihead attention structure. This design
performs well in machine translation and text recognition. As shown in Figure b, it applies self-attention to the input sequence in query–key–value (Q–K–V) form, which allows it to extract features from the important regions of the sequence. Several attention blocks can be connected in series. Attention can be expressed as
$$
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d}}\right)V
$$
where Q is the query matrix, K is the key matrix, V is the value matrix, and d is the hidden dimension of K. The dimension of the input X is n, and Q, K, and V are obtained by multiplying the input X by the corresponding parameter matrices W. The dimensions of Q, K, and V are [n, d].
To evaluate the developed steam temperature prediction model, four evaluation indicators were used, namely, the mean absolute error (MAE), mean squared error (MSE), root-mean-square error (RMSE), and coefficient of determination (R²), defined as
$$
\begin{aligned}
\mathrm{MAE} &= \frac{1}{n}\sum_{i=1}^{n}\lvert y_i - \hat{y}_i\rvert\\
\mathrm{MSE} &= \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2\\
\mathrm{RMSE} &= \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}\\
R^2 &= 1 - \frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2}
\end{aligned}
$$
where y is the actual value, ŷ is the predicted value, ȳ is the mean of the actual values, and n is the number of samples.
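The attention part computes scaled dot-product attention, Attention(Q, K, V) = softmax(QKᵀ/√d)V, where Q, K, and V are projections of the input X. The sketch below implements single-head self-attention on lists of lists and a simple multihead wrapper that concatenates the head outputs and applies an output projection; that wrapper follows the usual Transformer arrangement and is an assumption rather than a detail given in the text. Function and parameter names (`self_attention`, `multihead_attention`, `Wq`, `Wk`, `Wv`, `Wo`) are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def matmul(A: Matrix, B: Matrix) -> Matrix:
    """Plain matrix product: [p, q] x [q, r] -> [p, r]."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def transpose(A: Matrix) -> Matrix:
    return [list(col) for col in zip(*A)]


def softmax(row: Sequence[float]) -> List[float]:
    peak = max(row)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(X: Matrix, Wq: Matrix, Wk: Matrix, Wv: Matrix) -> Matrix:
    """Single-head scaled dot-product self-attention.

    X is [n, c'] (n time steps); Wq, Wk, Wv are [c', d], so Q, K, V are [n, d].
    """
    Q, K, V = matmul(X, Wq), matmul(X, Wk), matmul(X, Wv)
    d = len(K[0])
    scores = matmul(Q, transpose(K))  # [n, n]
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]  # rows sum to 1
    return matmul(weights, V)  # [n, d]


def multihead_attention(X: Matrix, heads: Sequence[Tuple[Matrix, Matrix, Matrix]],
                        Wo: Matrix) -> Matrix:
    """Run each (Wq, Wk, Wv) head, concatenate head outputs per time step, project with Wo."""
    per_head = [self_attention(X, Wq, Wk, Wv) for Wq, Wk, Wv in heads]
    concat = [sum((head[t] for head in per_head), []) for t in range(len(X))]
    return matmul(concat, Wo)


if __name__ == "__main__":
    X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # n = 3 time steps, c' = 2 features
    eye = [[1.0, 0.0], [0.0, 1.0]]            # toy projection matrices (hypothetical)
    Wo = [[0.5, 0.0], [0.0, 0.5], [0.5, 0.0], [0.0, 0.5]]
    out = multihead_attention(X, [(eye, eye, eye), (eye, eye, eye)], Wo)
    print(out[-1])  # feature vector at the last time step, shape [1, c']
```

Taking `out[-1]` mirrors the step in which the feature vector at the last time step is selected to give the [1, c′] output.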
Method for Uncertainty Analysis
There are many sources of uncertainty in the boiler steam–water system, so the final steam temperature fluctuates unpredictably and the reliability of the results must be considered. Traditional deterministic prediction methods cannot indicate the quality of their steam temperature predictions. In contrast, uncertainty analysis provides a confidence interval for the prediction result, which describes the possible range of future values; more accurate forecasts are accompanied by lower uncertainty. Interval prediction of the steam temperature therefore helps boiler staff understand the uncertainty and risk levels of future steam temperature changes.
We propose an interval prediction modeling method based on quantiles and the dense residual LSTM-attention network. It addresses the uncertainty prediction modeling of the boiler main steam temperature and estimates the quality of the steam temperature prediction. Figure shows the calculation process of the uncertainty analysis. Quantile regression is used to study the relationship between the input X and the conditional quantile of the response variable Y, namely,
$$
Q_Y(\tau \mid X) = X\beta(\tau)
$$
where τ ∈ (0, 1) is the quantile; $\beta(\tau) = [\beta_0(\tau), \beta_1(\tau), \beta_2(\tau), \ldots, \beta_n(\tau)]$ is the quantile regression coefficient vector, which changes with the quantile; and n is the number of samples.
Figure 6
Uncertainty analysis and calculation process.
The quantile regression coefficients β(τ) describe how the input variables affect the output variable at different quantiles. When τ takes continuous values in (0, 1), the conditional distribution of the response variable, and hence its conditional density, can be obtained. Model training can be formulated as the following optimization problem:[35]
$$
\min_{\beta(\tau)} \sum_{i=1}^{n} \rho_\tau\big(y_i - \hat{y}_i\big), \qquad \hat{y}_i = X_i\beta(\tau), \qquad \rho_\tau(u) = u\big(\tau - \mathbb{I}(u < 0)\big)
$$
where y is the actual value, X is the input, ŷ is the predicted value, τ is the quantile point, β(τ) is the quantile regression coefficient, and 𝕀(·) is the indicator function.
We split the training data into a training set and a validation set. Training stops when the mean squared error on the validation set falls below the tolerance set for the algorithm or when the maximum number of training rounds is exceeded.
To evaluate the uncertainty analysis accurately and quantitatively, the quantile score (QS) and the prediction-interval-normalized root-mean-square width (PINRW) are used to assess the quality of the probabilistic prediction results. QS can be expressed as
$$
\mathrm{QS} = \frac{1}{rN}\sum_{i=1}^{r}\sum_{t=1}^{N}\rho_{\tau_i}\big(y_t - \hat{y}_t(\tau_i)\big)
$$
where τ is the quantile point, $\tau_i$ is the ith τ value, r is the number of τ values, N is the number of time points, $y_t$ is the actual value at time t, and $\hat{y}_t(\tau_i)$ is the predicted value at time t for quantile $\tau_i$. PINRW is an indicator of the interval width and can be expressed as
$$
\mathrm{PINRW} = \frac{1}{R}\sqrt{\frac{1}{N}\sum_{t=1}^{N}\big(U_t^{\alpha} - L_t^{\alpha}\big)^2}
$$
where α is the confidence level, R is the range of the sample values, $U_t^{\alpha}$ is the upper bound of the confidence interval at time t, and $L_t^{\alpha}$ is the lower bound of the confidence interval at time t.
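The quantile loss, QS, and PINRW can be computed directly from their definitions. The sketch below assumes that QS averages the pinball loss ρ_τ(u) = u(τ − I(u < 0)) over all r quantiles and N time points; the exact normalization used by the authors may differ. The function names are hypothetical.

```python
import math
from typing import Sequence


def pinball(u: float, tau: float) -> float:
    """Quantile (pinball) loss rho_tau(u) = u * (tau - I(u < 0)) for residual u = y - y_hat."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def quantile_score(y: Sequence[float], preds: Sequence[Sequence[float]],
                   taus: Sequence[float]) -> float:
    """Mean pinball loss over r quantiles and N time points (normalization assumed).

    preds[i][t] is the prediction for quantile taus[i] at time t.
    """
    r, n_times = len(taus), len(y)
    total = sum(pinball(y[t] - preds[i][t], taus[i]) for i in range(r) for t in range(n_times))
    return total / (r * n_times)


def pinrw(lower: Sequence[float], upper: Sequence[float], y_range: float) -> float:
    """Root-mean-square interval width, normalized by the range R of the sample values."""
    mean_sq_width = sum((hi - lo) ** 2 for lo, hi in zip(lower, upper)) / len(lower)
    return math.sqrt(mean_sq_width) / y_range


if __name__ == "__main__":
    y = [510.0, 512.0, 511.0]
    lower = [509.0, 510.5, 510.0]  # toy 5% quantile predictions
    upper = [511.5, 513.0, 512.5]  # toy 95% quantile predictions -> 90% interval
    print(quantile_score(y, [lower, upper], [0.05, 0.95]))
    print(pinrw(lower, upper, y_range=533.25 - 500.08))  # range of the target in this study
```

The asymmetric weights in `pinball` are what make a model trained on it estimate a quantile: under-prediction costs τ per unit and over-prediction costs 1 − τ per unit.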
Results and Discussion
Performance of Different Models
In our experiments, we comprehensively compare the proposed model with SVR, ridge regression (RIDGE), RNN, the gated recurrent unit (GRU), and LSTM. For all models, we split the training data into a training set and a validation set for model training, and we use 10-fold cross-validation to reduce the risk of overfitting. The model weights that perform best on the validation set are then evaluated on the test set. All experiments are repeated 10 times, and the mean values of the metrics are compared to reduce the effect of randomness in model training. For all models, the input covers 5 historical time steps, the features at each historical time step consist of the data points of the 69 variables, and the output dimension is 1. We applied a grid search to optimize the parameters of the SVR and RIDGE models. Parameters such as the learning rate and the number of nodes in each layer are the same for the RNN, GRU, and LSTM models and our proposed model; they were set empirically by trial and error. All models are built on the PyTorch platform. Additional details and control experiments are provided in Table .
Table 2
Parameter Settings of Comparative Experiments
model | parameter | value or search range
SVR | C | 0–1
SVR | gamma | 0–1
RIDGE | alpha | 0–1
RNN, GRU, LSTM | number of layers | 3
RNN, GRU, LSTM | number of neurons in each layer | 64
RNN, GRU, LSTM | learning rate | 0.001
RNN, GRU, LSTM | number of iterations | 30
LSTM-attention | number of LSTM layers | 3
LSTM-attention | number of neurons in each LSTM layer | 64
LSTM-attention | number of attention layers | 1
LSTM-attention | learning rate | 0.001
LSTM-attention | number of iterations | 30
Figure compares the predicted and true values of the dense residual LSTM-attention network. Our method predicts well: the predicted curve closely follows the actual values. For prediction 3 min ahead, almost all errors on the test set are within ±5%.
Figure 7
Prediction and experimental
results for the next 3 min.
We now analyze the predictive performance of the different models quantitatively. As shown in Table , compared with SVR, RIDGE, RNN, GRU, and LSTM, our method achieves advantages on all four evaluation metrics for predicting the steam temperature 3 min ahead. Specifically, the MAE of our method is 0.596, which indicates that the average absolute error between the predicted and experimental values is below 0.6 °C. Compared with SVR and RIDGE, the proposed model reduces the MAE by 32% and 16%, respectively. SVR and RIDGE cannot extract the temporal features of the variables, so their predictive performance is poor. The MAE of the LSTM model is 0.667, and our model reduces the MAE by 12, 10, and 11% relative to the RNN, GRU, and LSTM models, respectively. We attribute this improvement to the residual and attention elements, which allow the model to obtain richer information. Our model also achieves the best mean squared error (MSE), root-mean-square error (RMSE), and R² values (for R², values closer to 1 are better), indicating better prediction performance.
Table 3
Comparison of Our Proposed Model with Different Models
model | MAE (°C) | MSE (°C²) | RMSE (°C) | R²
SVR | 0.875 | 1.300 | 1.140 | 0.947
RIDGE | 0.710 | 0.951 | 0.975 | 0.961
RNN | 0.681 | 0.876 | 0.936 | 0.964
GRU | 0.662 | 0.825 | 0.908 | 0.966
LSTM | 0.667 | 0.828 | 0.910 | 0.966
LSTM-attention | 0.596 | 0.714 | 0.845 | 0.971
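The four metrics and the relative MAE reductions quoted in the text can be checked against Table 3 with the short script below. The metric functions follow the MAE, MSE, RMSE, and R² definitions; the dictionary copies the MAE column of Table 3, and `relative_reduction` is a hypothetical helper that computes (baseline − proposed)/baseline.

```python
import math
from typing import Sequence


def mae(y: Sequence[float], y_hat: Sequence[float]) -> float:
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)


def mse(y: Sequence[float], y_hat: Sequence[float]) -> float:
    return sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y)


def rmse(y: Sequence[float], y_hat: Sequence[float]) -> float:
    return math.sqrt(mse(y, y_hat))


def r2(y: Sequence[float], y_hat: Sequence[float]) -> float:
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


def relative_reduction(baseline: float, proposed: float) -> float:
    """Hypothetical helper: fractional error reduction of `proposed` relative to `baseline`."""
    return (baseline - proposed) / baseline


# MAE column of Table 3 (°C); the proposed model's MAE is 0.596.
TABLE3_MAE = {"SVR": 0.875, "RIDGE": 0.710, "RNN": 0.681, "GRU": 0.662, "LSTM": 0.667}
PROPOSED_MAE = 0.596

if __name__ == "__main__":
    for name, baseline in TABLE3_MAE.items():
        print(f"{name}: {100 * relative_reduction(baseline, PROPOSED_MAE):.0f}% lower MAE")
    # SVR: 32%, RIDGE: 16%, RNN: 12%, GRU: 10%, LSTM: 11%
    print(round(math.sqrt(0.714), 3))  # 0.845: RMSE is the square root of MSE, as in Table 3
```

Applied to the MSE and RMSE columns of the LSTM row, the same helper gives the 14% and 7% reductions reported for 3-min-ahead prediction in the long-term horizon analysis.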
Parametric Analysis
To analyze the prediction performance of the model in more detail, we examine its performance over longer prediction horizons and analyze feature importance.
Long-Term Horizon
As shown in Figure , we compare prediction horizons from 3 to 30 min ahead. Our model achieves the best prediction performance for both short-term and long-term predictions. Compared with SVR, RIDGE, RNN, GRU, and LSTM, our model reduces the MAE averaged over the 3–30 min horizons by 29, 26, 19, 15, and 17%, respectively. The prediction accuracy decreases as the prediction horizon increases; for example, forecasting 3 min ahead performs better than forecasting 6 min ahead. This is because the boiler steam temperature fluctuates more over longer horizons, which makes prediction more difficult. Compared with the LSTM model, our model is 11% lower in MAE, 14% lower in MSE, and 7% lower in RMSE for prediction 3 min ahead, and 13% lower in MAE, 30% lower in MSE, and 17% lower in RMSE for prediction 15 min ahead. This shows that the advantage of our model grows for longer-term prediction; however, the errors of all models are large when forecasting 15–30 min ahead. To obtain accurate results, we recommend a prediction horizon of less than 15 min.
Figure 8
Error of prediction 3–30
min into the future.
Feature Importance Analysis
We analyze the predictive importance of the model's input variables. The SHapley Additive exPlanations (SHAP) method was used to calculate each variable's contribution to the predicted target value,[36] which is important for model interpretability. SHAP identifies the key parameters that affect the target and indicates whether the contribution of each input feature is positive or negative. As shown in Figure , a positive SHAP value indicates a positive correlation, and a negative value indicates a negative correlation; the darker the red, the greater the importance. The historical main steam temperature and header temperature are positively correlated with the predicted target, and the desuperheating water flow rate is negatively correlated with it. The desuperheating water flow, historical main steam temperature, and header temperature are the main factors affecting the predicted steam temperature, so changes in these variables should be watched closely when monitoring the boiler steam temperature. This helps to regulate the steam temperature and to quickly locate the cause of overtemperature faults.
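The idea behind SHAP values can be illustrated with an exact Shapley computation for a small model. The sketch below enumerates every feature subset and replaces features outside a subset with a single baseline value. This is a simplification: SHAP implementations typically average over background data and use approximations for neural networks. The function name and the toy linear model are hypothetical and are not the trained steam temperature model.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence, Set


def exact_shapley(f: Callable[[List[float]], float], x: Sequence[float],
                  baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values of model f at input x.

    Features outside a coalition are set to their baseline value. The cost grows
    as 2**n, so this is only practical for a handful of features.
    """
    n = len(x)

    def value(coalition: Set[int]) -> float:
        z = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return f(z)

    phi = [0.0] * n
    for j in range(n):
        others = [k for k in range(n) if k != j]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[j] += weight * (value(s | {j}) - value(s))
    return phi


if __name__ == "__main__":
    def toy_model(z: List[float]) -> float:  # hypothetical linear surrogate, not the trained network
        return 2.0 * z[0] - 3.0 * z[1] + 0.5 * z[2]

    phi = exact_shapley(toy_model, x=[1.0, 2.0, 4.0], baseline=[0.0, 0.0, 0.0])
    print([round(v, 6) for v in phi])  # [2.0, -6.0, 2.0]
```

For a linear model each value reduces to weight × (feature − baseline), so the sign of a contribution follows the sign of the weight, which is the kind of positive or negative influence a SHAP plot reports. The values also sum to f(x) − f(baseline).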
Figure 9
Description of the importance of different features
for a 3 min
step ahead prediction.
Uncertainty Analysis Results
Using the quantile-based dense residual LSTM-attention model, we predicted intervals for the boiler steam temperature. As shown in Figure , lower uncertainty corresponds to a narrower forecast range, which indicates more reliable forecasts. For a forecast 3 min ahead, the uncertainty range is small. However, comparing the prediction intervals at different future time steps shows that the prediction interval widens as the prediction horizon increases. For example, the uncertainty interval of the forecast 18 min ahead is large, indicating that the quality of the forecast decreases.
Figure 10
(a–f) Prediction intervals at different confidence levels for horizons of 3–18 min, with the median value in red and the experimental value in black. The colors of the different intervals represent the prediction results at different confidence levels.
As shown in Table , the effect of the prediction horizon on interval prediction performance is analyzed quantitatively in terms of QS and PINRW. QS quantifies the error of the interval predictions, and PINRW measures the normalized width of the prediction intervals. A good interval prediction should have a small average interval width while remaining close to the experimental values. As the prediction horizon increases from 3 to 18 min, QS increases from 0.3017 to 1.6279 °C; that is, the QS at 3 min is about 81% lower than the QS at 18 min. Furthermore, PINRW generally decreases as the confidence level decreases from 90 to 60%, the exception being the 6 min horizon, where the 60% interval (0.1395) is wider than the 70% interval (0.1000). This indicates that the model can obtain effective lower and upper bounds for the prediction results. Therefore, uncertainty analysis provides more comprehensive information about the prediction results and helps to judge the reliability of the model predictions.
Table 4
Comparison of Quantitative Uncertainty Indicators for the Range of 3–18 min
future time (min) | QS (°C) | PINRW (90%) | PINRW (80%) | PINRW (70%) | PINRW (60%)
3 | 0.3017 | 0.1261 | 0.0837 | 0.0555 | 0.0163
6 | 0.7694 | 0.3355 | 0.1171 | 0.1000 | 0.1395
9 | 1.0127 | 0.5368 | 0.2360 | 0.1461 | 0.1453
12 | 1.2665 | 0.4624 | 0.3063 | 0.1831 | 0.0708
15 | 1.4107 | 0.4584 | 0.2568 | 0.2257 | 0.0989
18 | 1.6279 | 0.4846 | 0.2925 | 0.2146 | 0.1434
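The trends discussed for Table 4 can be reproduced from the tabulated values. The short script below copies Table 4 into a dictionary, computes how much lower the 3-min QS is than the 18-min QS, and checks for each horizon whether PINRW decreases monotonically from the 90% to the 60% confidence level; `is_nonincreasing` is a hypothetical helper.

```python
# Table 4 values: horizon (min) -> (QS in °C, PINRW at 90%, 80%, 70%, 60% confidence)
TABLE4 = {
    3: (0.3017, 0.1261, 0.0837, 0.0555, 0.0163),
    6: (0.7694, 0.3355, 0.1171, 0.1000, 0.1395),
    9: (1.0127, 0.5368, 0.2360, 0.1461, 0.1453),
    12: (1.2665, 0.4624, 0.3063, 0.1831, 0.0708),
    15: (1.4107, 0.4584, 0.2568, 0.2257, 0.0989),
    18: (1.6279, 0.4846, 0.2925, 0.2146, 0.1434),
}


def is_nonincreasing(values) -> bool:
    """Hypothetical helper: True if each value is >= the next one."""
    return all(a >= b for a, b in zip(values, values[1:]))


if __name__ == "__main__":
    qs3, qs18 = TABLE4[3][0], TABLE4[18][0]
    print(f"3-min QS is {100 * (qs18 - qs3) / qs18:.0f}% lower than 18-min QS")  # 81%
    print(f"18-min QS is {qs18 / qs3:.1f} times the 3-min QS")                    # 5.4
    for horizon, row in TABLE4.items():
        # row[1:] holds PINRW from the 90% to the 60% confidence level
        print(horizon, is_nonincreasing(row[1:]))  # False only for the 6-min horizon
```

Expressed as a ratio, the QS at 18 min is roughly five times the QS at 3 min; the 81% figure is the reduction measured relative to the 18-min value.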
Conclusions
In this study, we propose a dense residual LSTM-attention model to predict the boiler steam temperature. The model relates variables such as desuperheating water flow and boiler oxygen to changes in the target steam temperature. The model is evaluated using operating data from a real boiler, and the results show that the proposed model is effective. According to our analysis, our model has the following advantages: (1) Compared with the SVR, RIDGE, RNN, GRU, and LSTM methods, our method achieves better forecasting performance. For prediction 3 min ahead, the MAE of our method on the test data is below 0.6 °C. (2) Comparisons at multiple prediction horizons show that our model retains its advantage for longer-term prediction. Nevertheless, a prediction horizon of less than 15 min is recommended for steam temperature prediction. (3) Different variables have different degrees of importance in the dense residual LSTM-attention model. Parameters such as the desuperheating water flow, historical main steam temperature, and header temperature are the main factors that affect the predicted steam temperature. (4) We have proposed an interval uncertainty analysis method that gives the fluctuation interval of the result and provides more comprehensive information for boiler operation.
The research results are of great significance for fault early warning and energy efficiency improvement and can also be applied to other multivariate time-series problems, such as steam temperature prediction in solar thermal power generation and wind power uncertainty analysis. Future work will focus on combining the proposed prediction model with an operational control strategy for effective boiler operation.