Jiaqi Zheng1, Lianwei Ma2, Yi Wu2, Lingjian Ye3, Feifan Shen2. 1. College of Science & Technology, Ningbo University, Ningbo 315300, People's Republic of China. 2. School of Information Science and Engineering, NingboTech University, Ningbo 315100, People's Republic of China. 3. School of Engineering, Huzhou University, Huzhou 313000, People's Republic of China.
Abstract
A soft sensor is a key component when a real-time measurement is unavailable for industrial processes. Recently, soft sensor models based on deep-learning techniques have been successfully applied to complex industrial processes with nonlinear and dynamic characteristics. However, the conventional deep-learning-based methods cannot guarantee that the quality-relevant features are included in the hidden states when the modeling samples are limited. To address this issue, a supervised hybrid network based on a dynamic convolutional neural network (CNN) and a long short-term memory (LSTM) network is designed by constructing a multilayer dynamic CNN-LSTM with improved structures. At each time instant, data augmentation is implemented by dynamic expansion of the original samples. Moreover, multiple supervised hidden units are trained by adding quality variables as part of the layer input to acquire a better quality-related feature learning performance. The effectiveness of the proposed soft sensor development is validated through two industrial applications, including a penicillin fermentation process and a debutanizer column.
In practical industry,
it is necessary and important to explore
the real-time indices of product quality to ensure the safety and
efficiency of industrial processes. However, key variables indicating
the quality are often difficult to measure, while other ordinary process
variables can be routinely collected.[1] These
quality variables need to be inspected by laboratory analyzers, which
are costly and unable to meet the demand of real-time measurement.
In addition, the first-principles model is becoming extremely difficult
to obtain due to the complexity of modern industries.[2−4] To alleviate the aforementioned problems, soft sensors have been
developed to implement the online measurement of quality variables
by constructing inherent data-driven models based on those easy-to-measure
process variables.

In the past decades, virtual sensing techniques
have been successfully
applied to many industrial processes, bringing about accurate real-time
prediction results of quality variables.[5,6] Principal component
analysis (PCA) and partial least squares (PLS) are typical data-driven
methods to establish soft sensor models.[7−11] In order to cope with the nonlinear feature of industrial processes,
several developments have been made on the basis of PCA and PLS for
soft sensor modeling. A typical strategy is constructing a kernel
function to establish the nonlinear mapping between the original data
space and a high-dimensional space.[12,13] Therefore,
a linear regression model can be constructed in the high-dimensional
space, where nonlinear optimization can be avoided and the data characteristic
can be captured. Other methods beyond the framework of PCA and PLS
are also designed to meet the demand of quality prediction for nonlinear
processes such as support vector regression (SVR),[14,15] Gaussian process regression (GPR),[16−18] and an artificial neural
network (ANN).[19,20] ANN is one of the most widely used virtual sensing modeling methods and can give a promising performance
of nonlinear approximation and adaptive learning. Owing to these advantages, the quality prediction results obtained by ANN for nonlinear processes with a large data scale are fairly good. Unfortunately, ANN-based soft sensor models often confront the gradient vanishing and exploding problem during the training procedure when the network structure is complicated.

To resolve this limitation and improve the robustness
of quality
prediction for nonlinear processes, soft sensors based on deep-learning
techniques have been developed in recent years.[21] A deep belief network (DBN),[22−24] a stacked autoencoder
(SAE),[25−27] and a convolutional neural network (CNN)[28−31] are practical tools to construct soft sensor models for nonlinear
processes. In comparison to conventional soft sensor models, the deep-learning-based
models can provide a better quality prediction performance for nonlinear
processes. Further improvements have been made by scholars on the
basis of the original deep-network structures to handle other process
characteristics. For example, Sun and Ge developed a novel soft sensor
model based on a gated stacked target-related autoencoder (GSTAE)
by adding prediction errors of target values into the loss function
when the pretraining procedure was executed, which improved the prediction
performance in comparison to the conventional SAE-based soft sensor
models.[32] Yuan et al. proposed a soft sensor
framework with a multichannel CNN model to capture local correlations
of distant process variables.[33] In addition,
the dynamic nature of a process is another common and important issue
for soft sensing modeling. Generally, variable trajectories of practical
processes present correlations along a time index, which results in
the coexistence of process nonlinearity and dynamics. To address this
problem, dynamic soft sensor models have been designed to obtain an
accurate prediction performance for complex processes. For example,
He et al. developed a soft sensor model using a dynamic extreme learning
machine (ELM) by adding a special linear hidden layer node based on
the traditional ELM.[34] Lately, a novel
soft sensor development using an echo state network (ESN) integrated
with a singular value decomposition was proposed and applied to complex
chemical processes.[35] In addition, a recurrent
neural network (RNN) has also been introduced to construct nonlinear
dynamic soft sensors for quality prediction.[36] Although RNN is a mainstream deep-learning model, it still suffers
from the problem of gradient vanishing and exploding due to the “tanh”
activation function. For an improvement of the network structure,
a long short-term memory (LSTM) neural network has been developed
to overcome the deficiency of RNN.[37] The
long-term memory is taken into consideration for LSTM, which is able
to describe the time-series model more accurately with more parameters
in comparison to RNN. So far, LSTM-based soft sensors have been successfully
designed and applied to different industrial processes with both nonlinear
and dynamic properties.[38,39] However, soft sensor
models based on the conventional RNN and LSTM structures are unsupervised,
which means that the quality information may not be exploited in
the hidden units. To make full use of the quality data, a soft sensor
model based on a dynamic neural network named nonlinear autoregression
with exogenous input (NARX) was designed.[40] Under the supervised framework, the correlations between ordinary
process variables and quality variables can be extracted properly
by hidden layers. When a multilayer perceptron is implemented with
NARX, the quality variables are utilized as part of the model input.
However, the quality variables are not directly used for the intermediate
hidden layers that are not connected to the input layer. Further development
can be conducted under the supervised framework. By stacking multiple
layers in a hierarchical way and adding quality variables to each
hidden layer through hierarchical learning, the stacked network is
able to enhance the extraction of deep quality-relevant characteristics
that are beneficial for quality prediction.[41] To make full use of the quality-relevant information, improved supervised
soft sensor models based on deep learning have been developed.[42] Quality variables are employed as part of the
layer input, where the model parameters of each hidden layer are determined
by both the quality variables and process variables. The supervised
soft sensor framework has been proved to be effective to deal with
the quality prediction problem for nonlinear dynamic processes. Although
the deep-learning-based supervised soft sensors can provide acceptable
prediction performance for nonlinear dynamic processes, there are
still some limitations when they face complicated practical processes.
First, most dynamic soft sensor models focus on the temporal correlations
of process data, where the feature of local correlations is not extracted
adequately. As a consequence, effective information and potential
relationships of discontinuous data may be ignored and the accuracy
of soft sensor models will be influenced. Meanwhile, the sampling
interval varies between different processes and the scale of modeling
samples can be small. Thus, data augmentation is a necessary and important
strategy to describe process characteristics thoroughly.

In
light of the aforementioned problems, a supervised dynamic CNN-LSTM
(SDCNN-LSTM) network has been designed to construct the soft sensor
model for complex industrial processes with nonlinear and dynamic
features. The major procedure and contributions of the proposed method
are demonstrated as follows. First, quality variables are prepared
for the original unsupervised layers, where the quality-relevant features
can be better captured from each hidden layer. Second, a data augmentation
strategy is designed after the input layer by expanding the original
one-dimensional (1D) samples into two-dimensional (2D) feature maps.
Hence, the scale of modeling data is enlarged and the temporal correlations
remain, which is adopted to solve the problem of data deficiency.
Finally, the hybrid dynamic CNN-LSTM network is constructed on the
basis of the supervised framework with data augmentation. In summary,
the advantages of both CNN and LSTM networks can be used for nonlinear
dynamic processes, where the data augmentation strategy and the full
utilization of quality information will help to improve the accuracy
of the soft sensor model.

The rest of the paper is organized as follows. The Background section illustrates some basics of CNN and LSTM networks. Then, the detailed framework of the SDCNN-LSTM soft sensor is demonstrated in the section on soft sensor development based on the supervised DCNN-LSTM network. Two applications, including a penicillin fermentation process and a debutanizer column, are introduced in the Results and Discussion section to evaluate the performance of the proposed soft sensor development. Finally, conclusions are drawn in the last section.
Background
Convolutional Neural Network
CNN
is a typical feed-forward neural network, as well as a multilayer
representative deep-learning algorithm. The core idea of CNN is the
scheme of local connection, weight sharing, and pooling. By modeling
strategies based on CNN, significant features of the original data
can be extracted spontaneously to implement target identification,
classification, and recognition. For different CNN frameworks such as LeNet-5 and its later successor AlexNet, the particular network structures vary from one network to another. A common feature of these frameworks is that they consist of four major layers: the input layer, the convolutional layer, the subsampling layer, and the output layer.

For the input layer, usually a 2D data
matrix is collected from the raw image or sequential data set. For
the convolutional layer, the layer input is the output of the previous
layer. Then, the layer input is operated by convolution kernels to
form several feature maps, where the number of feature maps is equal to the number of convolution kernels. The size of a convolutional kernel for 2D input can be 1D or 2D with fixed kernel weights. A simple and specific case of the convolutional operation is illustrated in Figure 1. It can be inferred from Figure 1 that the dimension of the original input matrix is reduced and the convolved feature is extracted after the operation through a convolution kernel. For the convolutional operation, the height $h_o$ and width $w_o$ of the output can be denoted as

$$h_o = \frac{h_i - h_k + 2p}{s} + 1, \qquad w_o = \frac{w_i - w_k + 2p}{s} + 1$$

where $h_i$ and $w_i$ are the height and width of the input data, respectively, $h_k$ and $w_k$ are the height and width of the convolutional kernel, respectively, $p$ is the padding size, and $s$ is the stride size.
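For illustration, a minimal Python sketch of this output-size calculation (not part of the original paper; the function name is ours):

```python
def conv_output_size(h_in, w_in, h_k, w_k, padding=0, stride=1):
    """Output height and width of a 2D convolution, following the formula above."""
    h_out = (h_in - h_k + 2 * padding) // stride + 1
    w_out = (w_in - w_k + 2 * padding) // stride + 1
    return h_out, w_out

# Setting of Figure 1: 6 x 6 input map, 3 x 3 kernel, stride 1, no padding
print(conv_output_size(6, 6, 3, 3))   # -> (4, 4)
```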
Figure 1
Convolutional operation with a 6 ×
6 input map, 3 × 3
convolution kernel, 1 stride size, and no padding.
After the convolutional operation, the rectified linear unit (ReLU) function, defined as $f(x) = \max(0, x)$, is adopted as the activation function for the feature maps, which is able to remarkably improve the learning efficiency and nonlinear representation.

For the subsampling layer, the pooling strategy is implemented
and often works after the convolutional layer. Similarly to the convolutional
layer, the feature of the local connection is extracted in the subsampling
layer. Differently, the pooling rule is predefined and no extra parameters
are required in the model training procedure. Figure 2 shows two types of pooling approaches, namely
the max pooling strategy and the average pooling strategy, which are
widely used to construct the subsampling layer. Hence, the scale of
the feature maps is reduced while the representative data features
can be preserved.
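As a concrete illustration of the two pooling rules (a minimal NumPy sketch, not taken from the original work):

```python
import numpy as np

def pool2d(x, size=2, stride=2, mode="max"):
    """Non-overlapping max or average pooling on a 2D feature map."""
    h, w = x.shape
    out = np.empty((h // stride, w // stride))
    for i in range(0, h - size + 1, stride):
        for j in range(0, w - size + 1, stride):
            patch = x[i:i + size, j:j + size]
            out[i // stride, j // stride] = patch.max() if mode == "max" else patch.mean()
    return out

x = np.arange(16, dtype=float).reshape(4, 4)   # 4 x 4 input map, as in Figure 2
print(pool2d(x, mode="max"))                   # 2 x 2 max-pooled map
print(pool2d(x, mode="average"))               # 2 x 2 average-pooled map
```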
Figure 2
Two pooling strategies with a 4 × 4 input map, 2
× 2
pooling size, and 2 stride size.
For the output layer, the feature extraction results from different channels should be concatenated into a single vector. By this means, the output layer is also called the connection layer, which establishes
connections among different feature maps as well as the final model
output. As a result, an activation function is also required to achieve
a specific purpose such as classification and regression.
Long Short-Term Memory
A long short-term
memory (LSTM) network is developed on the basis of a recurrent neural
network (RNN). Although RNN has an advantage in handling dynamic processes,
the gradient vanishing problem of RNN often influences the accuracy
of modeling. In comparison with RNN, LSTM is able to avoid the aforementioned
issue by designing the cell and gate structure. The structure of the
single-layer LSTM network is demonstrated in Figure 3.
Figure 3
Network structure of the single-layer LSTM.
Three gate structures, including the input gate $i_t$, the forget gate $f_t$, and the output gate $o_t$, are defined on the basis of the LSTM cell, which are described as

$$i_t = \sigma(W_{xi} x_t + W_{hi} h_{t-1} + b_i)$$
$$f_t = \sigma(W_{xf} x_t + W_{hf} h_{t-1} + b_f)$$
$$o_t = \sigma(W_{xo} x_t + W_{ho} h_{t-1} + b_o)$$

where $\sigma(\cdot)$ denotes the sigmoid activation function, $W_{**}$ are the weighting parameters and $b_{*}$ are the bias parameters of the different gate structures, $x_t$ is the model input at time index $t$, and $h_{t-1}$ is the LSTM hidden state at time instant $t - 1$.

Then, $\tilde{c}_t$ defines what features of the cell input should be kept using the tanh function as

$$\tilde{c}_t = \tanh(W_{xc} x_t + W_{hc} h_{t-1} + b_c)$$

Hence, the cell state $c_t$ can be determined with the aforementioned network structures as

$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t$$

where $\odot$ denotes the pointwise multiplication, which indicates that the current cell state is the weighted combination of the previous cell state and the current cell input.

On the basis of the current cell state, the hidden state at time instant $t$ can be calculated as

$$h_t = o_t \odot \tanh(c_t)$$

When multiple LSTM layers are constructed to form a deep neural network, the hidden state will become the cell input of the next layer. To implement the regression task, the estimated model output $\hat{y}_t$ is usually connected with the hidden state $h_t$ by the sigmoid activation function as

$$\hat{y}_t = \sigma(W_{hy} h_t + b_y)$$

On consideration of the particular characteristic of the sequential
data, the back-propagation through time (BPTT) algorithm is usually
used to train the LSTM-based network, as presented in the Appendix.
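A minimal NumPy sketch of one forward step of the LSTM cell described by the equations above (toy dimensions and random weights; purely illustrative):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def lstm_cell(x_t, h_prev, c_prev, p):
    """One forward step of the LSTM cell defined by the gate equations above."""
    i_t = sigmoid(p["Wxi"] @ x_t + p["Whi"] @ h_prev + p["bi"])      # input gate
    f_t = sigmoid(p["Wxf"] @ x_t + p["Whf"] @ h_prev + p["bf"])      # forget gate
    o_t = sigmoid(p["Wxo"] @ x_t + p["Who"] @ h_prev + p["bo"])      # output gate
    c_tilde = np.tanh(p["Wxc"] @ x_t + p["Whc"] @ h_prev + p["bc"])  # candidate cell input
    c_t = f_t * c_prev + i_t * c_tilde                               # new cell state
    h_t = o_t * np.tanh(c_t)                                         # new hidden state
    return h_t, c_t

rng = np.random.default_rng(0)
n_in, n_h = 3, 4                                                     # toy dimensions
p = {}
for g in ("i", "f", "o", "c"):
    p[f"Wx{g}"] = rng.normal(size=(n_h, n_in))
    p[f"Wh{g}"] = rng.normal(size=(n_h, n_h))
    p[f"b{g}"] = np.zeros(n_h)
h, c = np.zeros(n_h), np.zeros(n_h)
for t in range(5):                                                   # a short input sequence
    h, c = lstm_cell(rng.normal(size=n_in), h, c, p)
print(h)
```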
Soft Sensor Development Based on Supervised DCNN-LSTM Network
Hybrid Dynamic CNN-LSTM Network
Although
CNN is a useful technique to extract the latent features of the original
data and reduce the data complexity, it is usually applied to image
processing such as image classification and recognition. As a widely
used deep-learning algorithm, CNN is able to deal with process nonlinearity
effectively. Due to this advantage, the nonlinear property of industrial processes can be handled well by the CNN modeling strategy. However, a single CNN soft sensor model may neglect the
time-series correlations of sequential data while the prediction task
is conducted. To address the problem, the LSTM network is concatenated
to the CNN layer. Different from the existing literature, which takes
the 1D sequential samples as the model input of the hybrid CNN-LSTM
network, a deep DCNN-LSTM structure is developed in this work.

The proposed DCNN-LSTM network structure is illustrated in Figure 4. It can be inferred
that there are two parts involved in the hybrid network. The DCNN
layers are designed to extract the features of sequential data, while
the LSTM layers are developed to predict the process quality. The
original 1D samples are expanded to 2D feature maps through a data
augmentation step. To improve the reliability of the downsampling
stage, both the max pooling and average pooling strategies are adopted
with a concatenating operation. After the pooling layer, a flatten
layer is connected to the network to unfold the data into 1D form.
Then, two LSTM layers are added after the DCNN network. In addition,
both LSTM layers are followed by a dropout layer to avoid the overfitting
problem during the training procedure. After the LSTM structures,
a fully connected layer is designed as the weighted sum of the previous
network output. Finally, a regression layer is generated as the model
output.
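To make the layer arrangement concrete, a minimal PyTorch sketch of such a hybrid network is given below. The class name, layer sizes, and dropout rate are illustrative assumptions; the original work may use different dimensions and implementation details.

```python
import torch
import torch.nn as nn

class DCNNLSTM(nn.Module):
    """Sketch of the hybrid DCNN-LSTM: per-time-step CNN with max/average
    pooling concatenated, flatten, two LSTM layers with dropout, and a
    fully connected regression output (assumed sizes)."""
    def __init__(self, n_vars, expand_len, conv_channels=15,
                 lstm_hidden=(50, 20), dropout=0.2):
        super().__init__()
        self.conv = nn.Conv2d(1, conv_channels, kernel_size=(2, 1))
        self.maxpool = nn.MaxPool2d(kernel_size=(2, 1))
        self.avgpool = nn.AvgPool2d(kernel_size=(2, 1))
        pooled_h = (expand_len - 1) // 2               # height after conv and pooling
        feat = 2 * conv_channels * pooled_h * n_vars   # max and average maps concatenated
        self.lstm1 = nn.LSTM(feat, lstm_hidden[0], batch_first=True)
        self.drop1 = nn.Dropout(dropout)
        self.lstm2 = nn.LSTM(lstm_hidden[0], lstm_hidden[1], batch_first=True)
        self.drop2 = nn.Dropout(dropout)
        self.fc = nn.Linear(lstm_hidden[1], 1)         # regression layer

    def forward(self, x):
        # x: (batch, k, l, n) -- k time steps of l-by-n dynamic feature maps
        b, k, l, n = x.shape
        z = torch.relu(self.conv(x.reshape(b * k, 1, l, n)))
        z = torch.cat([self.maxpool(z), self.avgpool(z)], dim=1)
        z = z.reshape(b, k, -1)                        # flatten each time step
        z = self.drop1(self.lstm1(z)[0])
        z = self.drop2(self.lstm2(z)[0])
        return self.fc(z[:, -1, :])                    # quality prediction at the last step

# quick shape check with random data: 8 sequences, k = 10, l = 4, n = 12
print(DCNNLSTM(n_vars=12, expand_len=4)(torch.randn(8, 10, 4, 12)).shape)  # torch.Size([8, 1])
```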
Figure 4
Design procedure of the hybrid DCNN-LSTM network.
The “dynamic” characteristic of the proposed
network
is reflected in two aspects. First of all, a deep neural network with
the LSTM structure has the capacity to extract the dynamic feature
of processes, which means that a CNN-LSTM-based network is a dynamic
model in essence. Meanwhile, it is noted that the model input of this
network in each time instant is a 2D feature map instead of the conventional
1D sample vector for sequential data modeling. The moving window strategy
is used to expand the original 1D vector $x(t) = [x_1(t), x_2(t), ..., x_n(t)]$ to the 2D dynamic matrix as

$$X(t) = \begin{bmatrix} x(t) \\ x(t-1) \\ \vdots \\ x(t-l+1) \end{bmatrix} \in \mathbb{R}^{l \times n}$$

where $n$ is the number of process variables and $l$ is the expanding length of the original vectors. Thus, the model input of the DCNN-LSTM network can be denoted as $\{X(t-k+1), X(t-k+2), ..., X(t)\}$, where $k$ is the modeling length of the sequential data.
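A minimal NumPy sketch of this moving-window expansion (the function names and the ordering of rows in X(t) are our own assumptions):

```python
import numpy as np

def dynamic_expand(x_seq, l):
    """Expand each 1D sample x(t) into the 2D dynamic matrix
    X(t) = [x(t); x(t-1); ...; x(t-l+1)] with a moving window."""
    T, n = x_seq.shape
    return np.stack([x_seq[t - l + 1:t + 1][::-1] for t in range(l - 1, T)])

def build_model_input(x_seq, l, k):
    """Stack k consecutive dynamic matrices {X(t-k+1), ..., X(t)} per instant."""
    X = dynamic_expand(x_seq, l)
    return np.stack([X[t - k + 1:t + 1] for t in range(k - 1, len(X))])

x_seq = np.random.randn(400, 12)          # e.g. 400 samples of 12 process variables
inputs = build_model_input(x_seq, l=4, k=10)
print(inputs.shape)                       # (388, 10, 4, 12)
```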
Soft Sensor with Supervised DCNN-LSTM Model
The LSTM
network has been proven to be an excellent framework for
the soft sensor modeling of nonlinear dynamic processes. To implement
the quality prediction scheme, the values of process variables are
usually collected as the model input, while the key variable that
is difficult to directly measure is regarded as the model output.
However, the feature of the quality variable is often ignored during
the prediction process, since most of the soft sensor models are unsupervised.
To overcome the aforementioned deficiency and make full use of the
obtained quality data, it is necessary to construct a supervised soft
sensor model, where the state of the quality variable should be exploited
as part of the model input.A case of the overall network structure
of the proposed SDCNN-LSTM model with one CNN layer and three LSTM
layers is presented in Figure 5. To implement the supervised framework, the quality variables
are first introduced to form the 2D feature maps with other input
variables during the data augmentation procedure. Thus, the quality
information is preserved in the CNN layer when local correlations
of variables are extracted. Moreover, in the LSTM structure, the quality
variables are utilized as part of the cell input in each LSTM unit.
The LSTM network structure of this model is modified accordingly, so that the quality variable enters the cell computations together with the process variables.
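For illustration, one plausible form of the supervised gate computations, in which the estimated quality value $\tilde{y}_t$ enters each gate alongside the process input (a sketch consistent with the description above; the exact parameterization of the original model may differ), is

$$i_t = \sigma(W_{xi} x_t + W_{yi} \tilde{y}_t + W_{hi} h_{t-1} + b_i)$$
$$f_t = \sigma(W_{xf} x_t + W_{yf} \tilde{y}_t + W_{hf} h_{t-1} + b_f)$$
$$o_t = \sigma(W_{xo} x_t + W_{yo} \tilde{y}_t + W_{ho} h_{t-1} + b_o)$$
$$\tilde{c}_t = \tanh(W_{xc} x_t + W_{yc} \tilde{y}_t + W_{hc} h_{t-1} + b_c)$$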
Figure 5
Network structure of the SDCNN-LSTM soft
sensor.
Therefore, the quality information
is involved in the training
procedure of LSTM layers, where the feature extraction performance
of quality-relevant features for nonlinear dynamic processes can be
improved significantly.

It is worth mentioning that the current state of the quality variable cannot be obtained during the online prediction procedure, since it is exactly the model output at that time. To conduct the
supervised modeling and real-time prediction process, an initialization
step has been carried out to estimate the current output as ỹ(t) = y(t – 1). Therefore, the complete procedure of the
proposed supervised DCNN-LSTM-based soft sensor framework can be summarized
in Figure 6.
Figure 6
Flow diagram
of the SDCNN-LSTM soft sensor.
According to Figure 6, the training stage of the proposed method can be summarized as follows:

(1) collect the training data set {x(t), y(t)} and conduct variable-wise normalization;
(2) augment the original 1D training samples to the 2D dynamic matrices;
(3) determine the network structures and model hyperparameters;
(4) train the SDCNN-LSTM soft sensor model with the predefined hyperparameters and the training data set.

Then, the prediction stage can be implemented on the basis of the trained soft sensor model as follows (see the sketch after this list):

(1) collect the testing data set {x(t), ỹ(t)}, where ỹ(T) = y(T − 1) is the estimated output at time instant T;
(2) conduct the data normalization step on the basis of the result of the training samples;
(3) expand the original 1D testing samples to the 2D dynamic matrices;
(4) predict the current quality variable ŷ(T) on the basis of the trained SDCNN-LSTM soft sensor model;
(5) move to the next online prediction stage with T = T + 1.
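The prediction stage listed above can be sketched schematically as follows (a simplified Python illustration with a stand-in model object; here the previous prediction is fed back as ỹ(T) = y(T − 1), and the window construction is simplified relative to the 2D dynamic matrices described earlier):

```python
import numpy as np

class TrainedSoftSensor:
    """Stand-in for the trained SDCNN-LSTM model (hypothetical interface)."""
    def predict(self, window):
        return float(window.mean())              # placeholder prediction

def predict_online(model, x_test, y_last, k=10):
    """Online prediction loop: initialize the unknown current quality value
    with the previous one, append it to the process variables, and feed the
    most recent k augmented samples to the trained model."""
    y_pred, history, y_tilde = [], [], y_last
    for x_t in x_test:
        history.append(np.append(x_t, y_tilde))  # supervised input [x(T), y_tilde(T)]
        window = np.array(history[-k:])          # most recent k augmented samples
        y_hat = model.predict(window)            # predict the current quality value
        y_pred.append(y_hat)
        y_tilde = y_hat                          # move to the next instant, T = T + 1
    return np.array(y_pred)

y_hat = predict_online(TrainedSoftSensor(), np.random.randn(100, 12), y_last=0.0)
print(y_hat.shape)                               # (100,)
```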
To train the proposed soft sensor model, the mean squared error (MSE) is utilized as the cost function

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n} \left( y(i) - \hat{y}(i) \right)^2$$

where $n$ is the number of training samples, $i$ is the index of the training samples, $y(i)$ is the actual value of the key variable, and $\hat{y}(i)$ is the prediction result of the key variable. In this work, the Adam algorithm is used to minimize the cost function during the training stage.

To evaluate the performance of the soft sensor model,
the root-mean-squared
error (RMSE) is calculated for the testing data set

$$\mathrm{RMSE} = \sqrt{\frac{1}{k}\sum_{j=1}^{k} \left( y(j) - \hat{y}(j) \right)^2}$$

where $k$ is the number of testing samples, $j$ is the time index, and $y(j)$ and $\hat{y}(j)$ are the actual and predicted values of the testing quality variables, respectively. A smaller RMSE indicates a smaller overall prediction error.

In addition,
the coefficient of determination R2 is
also calculated for the testing data set

$$R^2 = 1 - \frac{\sum_{j=1}^{k} \left( y(j) - \hat{y}(j) \right)^2}{\sum_{j=1}^{k} \left( y(j) - \bar{y} \right)^2}$$

where $\bar{y}$ is the mean value of the testing quality variable. The statistical analysis of the residual space is thereby carried out, and a larger value of $R^2$ indicates a more accurate prediction performance, since $R^2$ reveals how much of the total variance of the testing output is explained by the model rather than left in the residual space.

Building on the advantages of the original
CNN-LSTM network and other
dynamic soft sensor models, the proposed supervised network provides
two main developments. The first improvement is the data augmentation
strategy expanding the original 1D samples to the 2D feature maps,
by which the problem of data deficiency can be resolved.[31] By construction of the 2D feature maps, two
types of correlations are involved. One is the variablewise correlations
between variables. The other is the temporal autocorrelations of variables
along the time index. Thus, both local nonlinear spatial and dynamic
feature hierarchies can be learned from the massive unlabeled data
using local patches with convolution and pooling operators layer by
layer. Therefore, the scale of the modeling data is enlarged, where
both the variablewise and temporal correlations that are difficult
to learn for the 1D-data-based model can now be extracted properly.
Another contribution is the design of the supervised network, where
the quality variables are fully used as the input of each hidden layer.
In comparison to traditional supervised dynamic networks such as NARX, the proposed model spreads the quality information throughout the entire network structure, which allows more abundant quality-related information to be extracted within the hidden units. The determination
of the hyperparameters is conducted by trial and error. The limitation
of the current work is that the selected hyperparameters may not reach
the optimal values.
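For reference, the two evaluation metrics defined above can be computed as in the following minimal sketch (standard formulas, not the authors' code):

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root-mean-squared error over the testing samples."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def r_squared(y_true, y_pred):
    """Coefficient of determination R^2 of the testing predictions."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)

y_true = np.array([1.0, 1.2, 1.5, 1.7])
y_pred = np.array([1.1, 1.2, 1.4, 1.8])
print(rmse(y_true, y_pred), r_squared(y_true, y_pred))
```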
Results and Discussion
Penicillin Fermentation Process
The
fed-batch penicillin fermentation process is a typical biochemical
process with both nonlinear and dynamic characteristics, which is
widely used as a benchmark platform for research on soft sensor modeling,
fault diagnosis, real-time control, and production optimization of
industrial processes. The flowchart of the penicillin fermentation
process is presented in Figure 7. The penicillin fermentation process consists of three operating
phases. During the preculture phase, the biomass reactants are growing
for the preparation of the reaction up to the critical concentration.
Then, the penicillin concentration begins to increase rapidly at the
second phase, where the penicillin production rate reaches its peak.
At the final stage, the production rate of penicillin decreases due
to the consumption of the biomass reactants until the end of a batch.
Figure 7
Flowchart
of the penicillin fermentation process.
The PenSim v2.0 simulator developed by the research group of the
Illinois Institute of Technology is widely used in many studies for
performance evaluation. On the basis of the PenSim benchmark, our
research group redeveloped the simulator in MATLAB/Simulink with the
same kinetic model. The improved simulator allows users to customize
the trajectories of manipulated variables freely, which brings about
adequate flexibility. In this simulator, the penicillin concentration
is regarded as the process quality and the key variable. Twelve other
process variables, as given in Table 1, including manipulated variables and state variables,
are collected as the input of soft sensor models. In practical industry,
it is important to predict the penicillin concentration according
to these easy-to-measure variables to ensure the production safety
and quality.
Table 1
Process Variables of the Penicillin Process

variable   description
x1         aeration rate
x2         agitator power
x3         substrate feed rate
x4         substrate temperature
x5         substrate concentration
x6         dissolved oxygen concentration
x7         biomass concentration
x8         culture volume
x9         CO2 concentration
x10        pH
x11        generated heat
x12        cooling water flow rate
The total operation time of the penicillin fermentation
process
is 400 h, where the sampling interval of process variables is 1 h.
Thus, 400 samples can be collected for one batch. The first 300 samples
are collected as the training samples, while the remaining 100 samples
are regarded as the testing samples. The trajectories of process variables
are presented in Figure 8, where strong dynamic characteristics are involved in the process.
Figure 8
Variable
trajectories of the penicillin process.
According to the process variables in Table 1, the dimensions of x(t) and y(t) of the SDCNN-LSTM
network are 12 and 1, where 1 convolutional layer and 2 LSTM layers
are constructed. In the convolutional layer, the filter size is set
as [2 1] and the filter number is 15. The pool sizes of the max pooling
layer and the average pooling layer are both set as [3 1]. The numbers of hidden units in the LSTM layers are set as [50 20 100]. The sequence length for training and prediction is set as 10. The prediction performance
of the soft sensor can vary due to different training algorithms.
In this work, the Adam algorithm is adopted to train the proposed
network. During the training procedure with the Adam algorithm, the
value of the gradient threshold is 6 and the minimum batch size is
24. In addition, the number of maximum epochs has a great influence
on the prediction result as well. The number of maximum epochs is
selected as [10 20 30 40 50 60 70 80 90 100] for both the training
data and testing data. The detailed prediction results of the proposed
soft sensor under each epoch number are presented in Figure 9. It can be inferred that the
best performance of quality prediction occurs under the circumstance
of 50 maximum epochs, since the RMSEs of both the training data and
testing data reach a low level. Hence, the number of the maximum epochs
is determined to be 50 in this case.
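A minimal PyTorch sketch of this training configuration is given below; the gradient threshold is interpreted here as gradient-norm clipping, and a small LSTM regressor with random tensors stands in for the SDCNN-LSTM model and the augmented penicillin data (illustrative assumptions only).

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

class TinyLSTMRegressor(torch.nn.Module):
    """Small stand-in regressor (not the SDCNN-LSTM architecture itself)."""
    def __init__(self, n_in=13, n_hidden=50):
        super().__init__()
        self.lstm = torch.nn.LSTM(n_in, n_hidden, batch_first=True)
        self.fc = torch.nn.Linear(n_hidden, 1)
    def forward(self, x):
        h, _ = self.lstm(x)
        return self.fc(h[:, -1, :])

model = TinyLSTMRegressor()
data = TensorDataset(torch.randn(300, 10, 13), torch.randn(300, 1))   # dummy sequences
loader = DataLoader(data, batch_size=24, shuffle=True)                # minimum batch size 24
optimizer = torch.optim.Adam(model.parameters())                      # Adam algorithm
loss_fn = torch.nn.MSELoss()

for epoch in range(50):                                               # 50 maximum epochs
    for xb, yb in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(xb), yb)
        loss.backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=6.0)  # gradient threshold 6
        optimizer.step()
```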
Figure 9
RMSEs of the penicillin process with different
maximum epochs.
Although the proposed soft sensor
framework provides a promising
result, it is still insufficient to prove its effectiveness. Therefore,
prediction results of the penicillin concentration based on the LSTM,
DCNN-LSTM, NARX, and SLSTM soft sensors were carried out as comparisons,
where the hyperparameters were determined by the trial-and-error technique
as given in Table 2.
Table 2
Hyperparameters of the Penicillin Process by Different Methods

parameter            LSTM          DCNN-LSTM     NARX      SLSTM         SDCNN-LSTM
maximum epochs       50            70            100       60            50
gradient threshold   6             6             -         6             6
sequence length      30            30            -         30            30
input delays         -             -             10        -             -
minimum batch size   24            24            -         24            24
hidden layers        [50 20 100]   [50 20 100]   [60 30]   [50 20 100]   [50 20 100]
Table 3 shows the
RMSEs of these methods with the same training data and testing data.
Comparatively, the SDCNN-LSTM soft sensor provides the smallest prediction errors and the largest R2 values among these
methods, which demonstrates the advantage of the proposed method over
the existing methods. As complements, the detailed penicillin concentration
prediction results of these methods are presented in Figure 10, and a boxplot of the testing prediction error distributions by different methods is shown in Figure 11. It can be inferred from Figure 10 that
the prediction trajectory of the SDCNN-LSTM soft sensor is able to
track the real trajectory more precisely in comparison to the rest of the methods. Furthermore, Figure 11 illustrates that the prediction errors of the proposed
soft sensor (method 5) are much smaller since the median value is
closer to zero. In addition, no outlier that exceeds the maximum or minimum threshold is found in the boxplot of the proposed method.
Table 3
RMSEs of the Penicillin Process by
Different Methods
Figure 10
Penicillin concentration prediction results: (a) LSTM; (b) DCNN-LSTM; (c) NARX; (d) SLSTM; (e) SDCNN-LSTM.
Figure 11
Boxplot of the testing penicillin prediction errors by different methods: (1) LSTM; (2) DCNN-LSTM; (3) NARX; (4) SLSTM; (5) SDCNN-LSTM.
Debutanizer Column
The debutanizer
column is an important part of the desulfuring and naphtha splitter
plant, as shown in Figure 12. Propane and butane are removed in the debutanizer column
from the top, while stabilized gasoline is separated in the bottom
as well as the remaining part of butane. To obtain a good separation
effect, the butane concentration is required to be minimized in the
bottom of the debutanizer column. Several sensors are installed around
the debutanizer column, as shown by gray circles in Figure 12. Hence, it is necessary and
feasible to predict the butane concentration with those easy-to-measure
process variables for a further process optimization and control scheme.
Figure 12
Block
scheme of the debutanizer column.
The process variables collected by the physical sensors are given in Table 4, which are
utilized as x(t) values of the virtual
sensor. As was already mentioned, the butane concentration is regarded
as the quality variable y(t). Therefore,
the dimensions of x(t) and y(t) are 7 and 1, respectively.
Table 4
Process Variables of the Debutanizer Column

variable   description
u1         top temperature
u2         top pressure
u3         reflux flow
u4         flow to next process
u5         sixth tray temperature
u6         bottom temperature A
u7         bottom temperature B
In total, 2394 samples are collected during
the entire process,
where the numbers of training samples and testing samples are 1556
and 838, respectively. One convolutional layer and 2 LSTM layers are
constructed in this case. In the convolutional layer, the filter size
is set as [2 2] and the filter number is 30. The pool sizes of the
max pooling layer and the average pooling layer are both set as [3
3]. The numbers of hidden units in the LSTM layers are set as [80 50].
The sequence length for modeling is set as 30. The value of the gradient
threshold is 6, and the minimum batch size is 32 for the Adam algorithm.
Similar to the first case, the number of maximum epochs is selected
between 10 and 100 with an interval of 10 for both the training data
and testing data. The prediction results of the SDCNN-LSTM soft sensor
with diverse epoch numbers are provided in Figure 13. With reference to the curve, 50 maximum
epochs are adopted in the debutanizer column case due to the smallest
predicted RMSE of the butane concentration.
Figure 13
RMSEs of the debutanizer
column with different maximum epochs.
For comparison, quality prediction was carried out on the basis
of LSTM, DCNN-LSTM, NARX, SLSTM, and the proposed SDCNN-LSTM soft sensors
for the debutanizer column. The hyperparameters were also determined
by the trial-and-error technique, as given in Table 5.
Table 5
Hyperparameters of the Debutanizer Column by Different Methods

parameter            LSTM      DCNN-LSTM   NARX      SLSTM     SDCNN-LSTM
maximum epochs       50        60          100       50        50
gradient threshold   2         2           -         2         2
sequence length      30        30          -         30        30
input delays         -         -           15        -         -
minimum batch size   32        32          -         32        32
hidden layers        [80 50]   [80 50]     [40 30]   [80 50]   [80 50]
Table 6 displays
the prediction results of each method. With respect to the prediction
RMSEs of the training and testing data, the proposed method shows
its merit with the smallest prediction error among all of the methods.
Meanwhile, the detailed butane concentration prediction results are
presented in Figure 14 with curves of the prediction trajectories and the real values. Intuitively,
the prediction curve of the proposed method is more accurate from
the perspective of the tight trajectory tracking. The results of the R2 calculation also indicate that the soft sensing
modeling with the proposed method can describe better correlations
in the residual space. Furthermore, a boxplot of the error distributions
of the testing data by different methods is presented in Figure 15, which illustrates that the proposed SDCNN-LSTM soft sensor (method 5) yields an accurate prediction of the key variable with fewer large errors that exceed the boundary of the boxplot. In conclusion, the additional DCNN layer
is able to extract the dynamic feature of the process more effectively
in comparison to the original LSTM network. In addition, the supervised
modeling framework significantly improves the prediction accuracy
of the key variable.
Table 6
RMSEs of the Debutanizer
Column by
Different Methods
Figure 14
Butane concentration prediction results: (a) LSTM; (b) DCNN-LSTM; (c) NARX; (d) SLSTM; (e) SDCNN-LSTM.
Figure 15
Boxplot of the testing butane prediction errors by different methods: (1) LSTM; (2) DCNN-LSTM; (3) NARX; (4) SLSTM; (5) SDCNN-LSTM.
Conclusion
In this
paper, a hybrid supervised dynamic CNN-LSTM network is
proposed to construct a soft sensor model for complex industrial processes
with nonlinear and dynamic characteristics. In comparison to the traditional
stacked LSTM network, the hybrid dynamic CNN-LSTM network is designed
to implement data augmentation by expanding the original 1D samples
into 2D feature maps, which makes the virtual sensor better able to cope with strong process dynamics. Furthermore, the quality variable
is utilized as the labeled data to meet the demand of supervised modeling
and prediction. The well-established supervised dynamic CNN-LSTM network
is able to provide an accurate and reliable prediction result for
nonlinear dynamic processes. Two applications, including a penicillin
fermentation process and a debutanizer column case, were tested to
evaluate the performance of the SDCNN-LSTM-based soft sensor. The
experimental results in comparison with other soft sensor methods
provide solid evidence of the effectiveness of the SDCNN-LSTM model.
It is also noted that the determination of model parameters is crucial
to the prediction performance of deep-learning-based soft sensor models.
Therefore, future work will focus on the development of a general
parameter optimization approach with the proposed soft sensor model
to further improve the prediction performance.