Jiaqi Zheng1, Lianwei Ma2, Yi Wu2, Lingjian Ye3, Feifan Shen2. 1. College of Science & Technology, Ningbo University, Ningbo 315300, People's Republic of China. 2. School of Information Science and Engineering, NingboTech University, Ningbo 315100, People's Republic of China. 3. School of Engineering, Huzhou University, Huzhou 313000, People's Republic of China.
Abstract
A soft sensor is a key component when a real-time measurement is unavailable for industrial processes. Recently, soft sensor models based on deep-learning techniques have been successfully applied to complex industrial processes with nonlinear and dynamic characteristics. However, the conventional deep-learning-based methods cannot guarantee that the quality-relevant features are included in the hidden states when the modeling samples are limited. To address this issue, a supervised hybrid network based on a dynamic convolutional neural network (CNN) and a long short-term memory (LSTM) network is designed by constructing a multilayer dynamic CNN-LSTM with improved structures. At each time instant, data augmentation is implemented by dynamic expansion of the original samples. Moreover, multiple supervised hidden units are trained by adding quality variables as part of the layer input to acquire a better quality-related feature learning performance. The effectiveness of the proposed soft sensor development is validated through two industrial applications, including a penicillin fermentation process and a debutanizer column.
In practical industry,
it is necessary and important to explore
the real-time indices of product quality to ensure the safety and
efficiency of industrial processes. However, key variables indicating
the quality are often difficult to measure, while other ordinary process
variables can be routinely collected.[1] These
quality variables need to be inspected by laboratory analyzers, which
are costly and unable to meet the demand of real-time measurement.
In addition, the first-principles model is becoming extremely difficult
to obtain due to the complexity of modern industries.[2−4] To alleviate the aforementioned problems, soft sensors have been
developed to implement the online measurement of quality variables
by constructing inherent data-driven models based on those easy-to-measure
process variables.

In the past decades, virtual sensing techniques
have been successfully
applied to many industrial processes, bringing about accurate real-time
prediction results of quality variables.[5,6] Principal component
analysis (PCA) and partial least squares (PLS) are typical data-driven
methods to establish soft sensor models.[7−11] In order to cope with the nonlinear feature of industrial processes,
several developments have been made on the basis of PCA and PLS for
soft sensor modeling. A typical strategy is constructing a kernel
function to establish the nonlinear mapping between the original data
space and a high-dimensional space.[12,13] Therefore,
a linear regression model can be constructed in the high-dimensional
space, where nonlinear optimization can be avoided and the data characteristic
can be captured. Other methods beyond the framework of PCA and PLS
are also designed to meet the demand of quality prediction for nonlinear
processes such as support vector regression (SVR),[14,15] Gaussian process regression (GPR),[16−18] and an artificial neural
network (ANN).[19,20] ANN is one of the most widely used virtual sensing modeling methods and can give a promising performance
of nonlinear approximation and adaptive learning. Owing to these advantages, the quality prediction results obtained by ANN for nonlinear processes with a large data scale are fairly good. Unfortunately, ANN-based soft sensor models often confront the gradient vanishing and exploding problem during the training procedure when the network structure is complicated.

To resolve this limitation and improve the robustness
of quality
prediction for nonlinear processes, soft sensors based on deep-learning
techniques have been developed in recent years.[21] A deep belief network (DBN),[22−24] a stacked autoencoder
(SAE),[25−27] and a convolutional neural network (CNN)[28−31] are practical tools to construct soft sensor models for nonlinear
processes. In comparison to conventional soft sensor models, the deep-learning-based
models can provide a better quality prediction performance for nonlinear
processes. Further improvements have been made by scholars on the
basis of the original deep-network structures to handle other process
characteristics. For example, Sun and Ge developed a novel soft sensor
model based on a gated stacked target-related autoencoder (GSTAE)
by adding prediction errors of target values into the loss function
when the pretraining procedure was executed, which improved the prediction
performance in comparison to the conventional SAE-based soft sensor
models.[32] Yuan et al. proposed a soft sensor
framework with a multichannel CNN model to capture local correlations
of distant process variables.[33] In addition,
the dynamic nature of a process is another common and important issue
for soft sensing modeling. Generally, variable trajectories of practical
processes present correlations along a time index, which results in
the coexistence of process nonlinearity and dynamics. To address this
problem, dynamic soft sensor models have been designed to obtain an
accurate prediction performance for complex processes. For example,
He et al. developed a soft sensor model using a dynamic extreme learning
machine (ELM) by adding a special linear hidden layer node based on
the traditional ELM.[34] Lately, a novel
soft sensor development using an echo state network (ESN) integrated
with a singular value decomposition was proposed and applied to complex
chemical processes.[35] In addition, a recurrent
neural network (RNN) has also been introduced to construct nonlinear
dynamic soft sensors for quality prediction.[36] Although RNN is a mainstream deep-learning model, it still suffers
from the problem of gradient vanishing and exploding due to the “tanh”
activation function. For an improvement of the network structure,
a long short-term memory (LSTM) neural network has been developed
to overcome the deficiency of RNN.[37] The
long-term memory is taken into consideration for LSTM, which is able
to describe the time-series model more accurately with more parameters
in comparison to RNN. So far, LSTM-based soft sensors have been successfully
designed and applied to different industrial processes with both nonlinear
and dynamic properties.[38,39] However, soft sensor
models based on the conventional RNN and LSTM structures are unsupervised,
which means that the quality information may not be exploited in
the hidden units. To make full use of the quality data, a soft sensor
model based on a dynamic neural network named nonlinear autoregression
with exogenous input (NARX) was designed.[40] Under the supervised framework, the correlations between ordinary
process variables and quality variables can be extracted properly
by hidden layers. When a multilayer perceptron is implemented with
NARX, the quality variables are utilized as part of the model input.
However, the quality variables are not directly used for the intermediate
hidden layers that are not connected to the input layer. Further development
can be conducted under the supervised framework. By stacking multiple
layers in a hierarchical way and adding quality variables to each
hidden layer through hierarchical learning, the stacked network is
able to enhance the extraction of deep quality-relevant characteristics
that are beneficial for quality prediction.[41] To make full use of the quality-relevant information, improved supervised
soft sensor models based on deep learning have been developed.[42] Quality variables are employed as part of the
layer input, where the model parameters of each hidden layer are determined
by both the quality variables and process variables. The supervised
soft sensor framework has been proved to be effective to deal with
the quality prediction problem for nonlinear dynamic processes. Although
the deep-learning-based supervised soft sensors can provide acceptable
prediction performance for nonlinear dynamic processes, there are
still some limitations when they face complicated practical processes.
First, most dynamic soft sensor models focus on the temporal correlations
of process data, where the feature of local correlations is not extracted
adequately. As a consequence, effective information and potential
relationships of discontinuous data may be ignored and the accuracy
of soft sensor models will be influenced. Meanwhile, the sampling
interval varies between different processes and the scale of modeling
samples can be small. Thus, data augmentation is a necessary and important
strategy to describe process characteristics thoroughly.

In
light of the aforementioned problems, a supervised dynamic CNN-LSTM
(SDCNN-LSTM) network has been designed to construct the soft sensor
model for complex industrial processes with nonlinear and dynamic
features. The major procedure and contributions of the proposed method
are demonstrated as follows. First, quality variables are prepared
for the original unsupervised layers, where the quality-relevant features
can be better captured from each hidden layer. Second, a data augmentation
strategy is designed after the input layer by expanding the original
one-dimensional (1D) samples into two-dimensional (2D) feature maps.
Hence, the scale of modeling data is enlarged and the temporal correlations
remain, which is adopted to solve the problem of data deficiency.
Finally, the hybrid dynamic CNN-LSTM network is constructed on the
basis of the supervised framework with data augmentation. In summary,
the advantages of both CNN and LSTM networks can be used for nonlinear
dynamic processes, where the data augmentation strategy and the full
utilization of quality information will help to improve the accuracy
of the soft sensor model.

The rest of the paper is organized as follows. The Background section illustrates some basics of CNN and LSTM networks. Then, the detailed framework of the SDCNN-LSTM soft sensor is demonstrated in the section on soft sensor development based on the supervised DCNN-LSTM network. Two applications, including a penicillin fermentation process and a debutanizer column, are introduced in the Results and Discussion section to evaluate the performance of the proposed soft sensor development. Finally, conclusions are drawn in the last section.
Background
Convolutional Neural Network
CNN
is a typical feed-forward neural network, as well as a multilayer
representative deep-learning algorithm. The core idea of CNN is the
scheme of local connection, weight sharing, and pooling. By modeling
strategies based on CNN, significant features of the original data
can be extracted spontaneously to implement target identification,
classification, and recognition. For different CNN frameworks such as LeNet-5 and its later successor AlexNet, the particular network structures vary from one network to another. A common feature of these frameworks is that they consist of four major layers: the input layer, the convolutional layer, the subsampling layer, and the output layer.

For the input layer, usually a 2D data
matrix is collected from the raw image or sequential data set. For
the convolutional layer, the layer input is the output of the previous
layer. Then, the layer input is operated by convolution kernels to
form several feature maps, where the number of feature maps is equal to the number of convolution kernels. The size of a convolutional kernel for 2D input can be 1D or 2D with fixed kernel weights. A simple and specific case of the convolutional operation is illustrated in Figure 1. It can be inferred from Figure 1 that the dimension of the original input matrix is reduced and the convolved feature is extracted after the operation through a convolution kernel. For the convolutional operation, the height $h_o$ and width $w_o$ of the output can be denoted as

$$h_o = \frac{h_i - h_k + 2p}{s} + 1, \qquad w_o = \frac{w_i - w_k + 2p}{s} + 1$$

where $h_i$ and $w_i$ are the height and width of the input data, respectively, $h_k$ and $w_k$ are the height and width of the convolutional kernel, respectively, $p$ is the padding size, and $s$ is the stride size.
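For illustration, a minimal Python sketch of this output-size calculation (not part of the original paper; the function name is ours):

```python
def conv_output_size(h_in, w_in, h_k, w_k, padding=0, stride=1):
    """Output height and width of a 2D convolution, following the formula above."""
    h_out = (h_in - h_k + 2 * padding) // stride + 1
    w_out = (w_in - w_k + 2 * padding) // stride + 1
    return h_out, w_out

# Setting of Figure 1: 6 x 6 input map, 3 x 3 kernel, stride 1, no padding
print(conv_output_size(6, 6, 3, 3))   # -> (4, 4)
```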
Figure 1
Convolutional operation with a 6 ×
6 input map, 3 × 3
convolution kernel, 1 stride size, and no padding.
After the convolutional operation, the rectified linear unit (ReLU) function, defined as $f(x) = \max(0, x)$, is adopted as the activation function for the feature maps, which is able to remarkably improve the learning efficiency and nonlinear representation.

For the subsampling layer, the pooling strategy is implemented
and often works after the convolutional layer. Similarly to the convolutional
layer, the feature of the local connection is extracted in the subsampling
layer. Differently, the pooling rule is predefined and no extra parameters
are required in the model training procedure. Figure 2 shows two types of pooling approaches, namely
the max pooling strategy and the average pooling strategy, which are
widely used to construct the subsampling layer. Hence, the scale of
the feature maps is reduced while the representative data features
can be preserved.
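As a concrete illustration of the two pooling rules (a minimal NumPy sketch, not taken from the original work):

```python
import numpy as np

def pool2d(x, size=2, stride=2, mode="max"):
    """Non-overlapping max or average pooling on a 2D feature map."""
    h, w = x.shape
    out = np.empty((h // stride, w // stride))
    for i in range(0, h - size + 1, stride):
        for j in range(0, w - size + 1, stride):
            patch = x[i:i + size, j:j + size]
            out[i // stride, j // stride] = patch.max() if mode == "max" else patch.mean()
    return out

x = np.arange(16, dtype=float).reshape(4, 4)   # 4 x 4 input map, as in Figure 2
print(pool2d(x, mode="max"))                   # 2 x 2 max-pooled map
print(pool2d(x, mode="average"))               # 2 x 2 average-pooled map
```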
Figure 2
Two pooling strategies with a 4 × 4 input map, 2
× 2
pooling size, and 2 stride size.
For the output layer, the feature extraction results from different channels should be concatenated into a single vector. By this means, the output layer is also called the connection layer, which establishes
connections among different feature maps as well as the final model
output. As a result, an activation function is also required to achieve
a specific purpose such as classification and regression.
Long Short-Term Memory
A long short-term
memory (LSTM) network is developed on the basis of a recurrent neural
network (RNN). Although RNN has an advantage in handling dynamic processes,
the gradient vanishing problem of RNN often influences the accuracy
of modeling. In comparison with RNN, LSTM is able to avoid the aforementioned
issue by designing the cell and gate structure. The structure of the
single-layer LSTM network is demonstrated in Figure 3.
Figure 3
Network structure of the single-layer LSTM.
Three gate structures, including the input gate $i_t$, the forget gate $f_t$, and the output gate $o_t$, are defined on the basis of the LSTM cell, which are described as

$$i_t = \sigma(W_{xi} x_t + W_{hi} h_{t-1} + b_i)$$
$$f_t = \sigma(W_{xf} x_t + W_{hf} h_{t-1} + b_f)$$
$$o_t = \sigma(W_{xo} x_t + W_{ho} h_{t-1} + b_o)$$

where $\sigma(\cdot)$ denotes the sigmoid activation function, $W_{**}$ are the weighting parameters and $b_{*}$ are the bias parameters of the different gate structures, $x_t$ is the model input at time index $t$, and $h_{t-1}$ is the LSTM hidden state at time instant $t - 1$.

Then, $\tilde{c}_t$ defines what features of the cell input should be kept using the tanh function as

$$\tilde{c}_t = \tanh(W_{xc} x_t + W_{hc} h_{t-1} + b_c)$$

Hence, the cell state $c_t$ can be determined with the aforementioned network structures as

$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t$$

where $\odot$ denotes the pointwise multiplication, which indicates that the current cell state is the weighted combination of the previous cell state and the current cell input.

On the basis of the current cell state, the hidden state at time instant $t$ can be calculated as

$$h_t = o_t \odot \tanh(c_t)$$

When multiple LSTM layers are constructed to form a deep neural network, the hidden state will become the cell input of the next layer. To implement the regression task, the estimated model output $\hat{y}_t$ is usually connected with the hidden state $h_t$ by the sigmoid activation function as

$$\hat{y}_t = \sigma(W_{hy} h_t + b_y)$$

On consideration of the particular characteristic of the sequential
data, the back-propagation through time (BPTT) algorithm is usually
used to train the LSTM-based network, as presented in the Appendix.
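A minimal NumPy sketch of one forward step of the LSTM cell described by the equations above (toy dimensions and random weights; purely illustrative):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def lstm_cell(x_t, h_prev, c_prev, p):
    """One forward step of the LSTM cell defined by the gate equations above."""
    i_t = sigmoid(p["Wxi"] @ x_t + p["Whi"] @ h_prev + p["bi"])      # input gate
    f_t = sigmoid(p["Wxf"] @ x_t + p["Whf"] @ h_prev + p["bf"])      # forget gate
    o_t = sigmoid(p["Wxo"] @ x_t + p["Who"] @ h_prev + p["bo"])      # output gate
    c_tilde = np.tanh(p["Wxc"] @ x_t + p["Whc"] @ h_prev + p["bc"])  # candidate cell input
    c_t = f_t * c_prev + i_t * c_tilde                               # new cell state
    h_t = o_t * np.tanh(c_t)                                         # new hidden state
    return h_t, c_t

rng = np.random.default_rng(0)
n_in, n_h = 3, 4                                                     # toy dimensions
p = {}
for g in ("i", "f", "o", "c"):
    p[f"Wx{g}"] = rng.normal(size=(n_h, n_in))
    p[f"Wh{g}"] = rng.normal(size=(n_h, n_h))
    p[f"b{g}"] = np.zeros(n_h)
h, c = np.zeros(n_h), np.zeros(n_h)
for t in range(5):                                                   # a short input sequence
    h, c = lstm_cell(rng.normal(size=n_in), h, c, p)
print(h)
```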
Soft Sensor Development Based on Supervised DCNN-LSTM Network
Hybrid Dynamic CNN-LSTM Network
Although
CNN is a useful technique to extract the latent features of the original
data and reduce the data complexity, it is usually applied to image
processing such as image classification and recognition. As a widely
used deep-learning algorithm, CNN is able to deal with process nonlinearity
effectively. Due to this advantage, the nonlinear property of industrial processes can be handled well by the CNN modeling strategy. However, a single CNN soft sensor model may neglect the
time-series correlations of sequential data while the prediction task
is conducted. To address the problem, the LSTM network is concatenated
to the CNN layer. Different from the existing literature, which takes
the 1D sequential samples as the model input of the hybrid CNN-LSTM
network, a deep DCNN-LSTM structure is developed in this work.

The proposed DCNN-LSTM network structure is illustrated in Figure 4. It can be inferred
that there are two parts involved in the hybrid network. The DCNN
layers are designed to extract the features of sequential data, while
the LSTM layers are developed to predict the process quality. The
original 1D samples are expanded to 2D feature maps through a data
augmentation step. To improve the reliability of the downsampling
stage, both the max pooling and average pooling strategies are adopted
with a concatenating operation. After the pooling layer, a flatten
layer is connected to the network to unfold the data into 1D form.
Then, two LSTM layers are added after the DCNN network. In addition,
both LSTM layers are followed by a dropout layer to avoid the overfitting
problem during the training procedure. After the LSTM structures,
a fully connected layer is designed as the weighted sum of the previous
network output. Finally, a regression layer is generated as the model
output.
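To make the layer arrangement concrete, a minimal PyTorch sketch of such a hybrid network is given below. The class name, layer sizes, and dropout rate are illustrative assumptions; the original work may use different dimensions and implementation details.

```python
import torch
import torch.nn as nn

class DCNNLSTM(nn.Module):
    """Sketch of the hybrid DCNN-LSTM: per-time-step CNN with max/average
    pooling concatenated, flatten, two LSTM layers with dropout, and a
    fully connected regression output (assumed sizes)."""
    def __init__(self, n_vars, expand_len, conv_channels=15,
                 lstm_hidden=(50, 20), dropout=0.2):
        super().__init__()
        self.conv = nn.Conv2d(1, conv_channels, kernel_size=(2, 1))
        self.maxpool = nn.MaxPool2d(kernel_size=(2, 1))
        self.avgpool = nn.AvgPool2d(kernel_size=(2, 1))
        pooled_h = (expand_len - 1) // 2               # height after conv and pooling
        feat = 2 * conv_channels * pooled_h * n_vars   # max and average maps concatenated
        self.lstm1 = nn.LSTM(feat, lstm_hidden[0], batch_first=True)
        self.drop1 = nn.Dropout(dropout)
        self.lstm2 = nn.LSTM(lstm_hidden[0], lstm_hidden[1], batch_first=True)
        self.drop2 = nn.Dropout(dropout)
        self.fc = nn.Linear(lstm_hidden[1], 1)         # regression layer

    def forward(self, x):
        # x: (batch, k, l, n) -- k time steps of l-by-n dynamic feature maps
        b, k, l, n = x.shape
        z = torch.relu(self.conv(x.reshape(b * k, 1, l, n)))
        z = torch.cat([self.maxpool(z), self.avgpool(z)], dim=1)
        z = z.reshape(b, k, -1)                        # flatten each time step
        z = self.drop1(self.lstm1(z)[0])
        z = self.drop2(self.lstm2(z)[0])
        return self.fc(z[:, -1, :])                    # quality prediction at the last step

# quick shape check with random data: 8 sequences, k = 10, l = 4, n = 12
print(DCNNLSTM(n_vars=12, expand_len=4)(torch.randn(8, 10, 4, 12)).shape)  # torch.Size([8, 1])
```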
Figure 4
Design procedure of the hybrid DCNN-LSTM network.
The “dynamic” characteristic of the proposed
network
is reflected in two aspects. First of all, a deep neural network with
the LSTM structure has the capacity to extract the dynamic feature
of processes, which means that a CNN-LSTM-based network is a dynamic
model in essence. Meanwhile, it is noted that the model input of this
network in each time instant is a 2D feature map instead of the conventional
1D sample vector for sequential data modeling. The moving window strategy
is used to expand the original 1D vector $x(t) = [x_1(t), x_2(t), ..., x_n(t)]$ to the 2D dynamic matrix as

$$X(t) = \begin{bmatrix} x(t) \\ x(t-1) \\ \vdots \\ x(t-l+1) \end{bmatrix} \in \mathbb{R}^{l \times n}$$

where $n$ is the number of process variables and $l$ is the expanding length of the original vectors. Thus, the model input of the DCNN-LSTM network can be denoted as $\{X(t-k+1), X(t-k+2), ..., X(t)\}$, where $k$ is the modeling length of the sequential data.
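A minimal NumPy sketch of this moving-window expansion (the function names and the ordering of rows in X(t) are our own assumptions):

```python
import numpy as np

def dynamic_expand(x_seq, l):
    """Expand each 1D sample x(t) into the 2D dynamic matrix
    X(t) = [x(t); x(t-1); ...; x(t-l+1)] with a moving window."""
    T, n = x_seq.shape
    return np.stack([x_seq[t - l + 1:t + 1][::-1] for t in range(l - 1, T)])

def build_model_input(x_seq, l, k):
    """Stack k consecutive dynamic matrices {X(t-k+1), ..., X(t)} per instant."""
    X = dynamic_expand(x_seq, l)
    return np.stack([X[t - k + 1:t + 1] for t in range(k - 1, len(X))])

x_seq = np.random.randn(400, 12)          # e.g. 400 samples of 12 process variables
inputs = build_model_input(x_seq, l=4, k=10)
print(inputs.shape)                       # (388, 10, 4, 12)
```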
Soft Sensor with Supervised DCNN-LSTM Model
The LSTM
network has been proven to be an excellent framework for
the soft sensor modeling of nonlinear dynamic processes. To implement
the quality prediction scheme, the values of process variables are
usually collected as the model input, while the key variable that
is difficult to directly measure is regarded as the model output.
However, the feature of the quality variable is often ignored during
the prediction process, since most of the soft sensor models are unsupervised.
To overcome the aforementioned deficiency and make full use of the
obtained quality data, it is necessary to construct a supervised soft
sensor model, where the state of the quality variable should be exploited
as part of the model input.A case of the overall network structure
of the proposed SDCNN-LSTM model with one CNN layer and three LSTM
layers is presented in Figure 5. To implement the supervised framework, the quality variables
are first introduced to form the 2D feature maps with other input
variables during the data augmentation procedure. Thus, the quality
information is preserved in the CNN layer when local correlations
of variables are extracted. Moreover, in the LSTM structure, the quality
variables are utilized as part of the cell input in each LSTM unit.
The LSTM network structure of this model is modified accordingly, so that the quality variable enters the cell computations together with the process variables.
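For illustration, one plausible form of the supervised gate computations, in which the estimated quality value $\tilde{y}_t$ enters each gate alongside the process input (a sketch consistent with the description above; the exact parameterization of the original model may differ), is

$$i_t = \sigma(W_{xi} x_t + W_{yi} \tilde{y}_t + W_{hi} h_{t-1} + b_i)$$
$$f_t = \sigma(W_{xf} x_t + W_{yf} \tilde{y}_t + W_{hf} h_{t-1} + b_f)$$
$$o_t = \sigma(W_{xo} x_t + W_{yo} \tilde{y}_t + W_{ho} h_{t-1} + b_o)$$
$$\tilde{c}_t = \tanh(W_{xc} x_t + W_{yc} \tilde{y}_t + W_{hc} h_{t-1} + b_c)$$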
Figure 5
Network structure of the SDCNN-LSTM soft
sensor.
Therefore, the quality information
is involved in the training
procedure of LSTM layers, where the feature extraction performance
of quality-relevant features for nonlinear dynamic processes can be
improved significantly.

It is worth mentioning that the current state of the quality variable cannot be obtained during the online prediction procedure, since it is exactly the model output at that time. To conduct the
supervised modeling and real-time prediction process, an initialization
step has been carried out to estimate the current output as ỹ(t) = y(t – 1). Therefore, the complete procedure of the
proposed supervised DCNN-LSTM-based soft sensor framework can be summarized
in Figure 6.
Figure 6
Flow diagram
of the SDCNN-LSTM soft sensor.
According to Figure 6, the training stage of the proposed method can be summarized as follows:

(1) collect the training data set {x(t), y(t)} and conduct variable-wise normalization;
(2) augment the original 1D training samples to the 2D dynamic matrices;
(3) determine the network structures and model hyperparameters;
(4) train the SDCNN-LSTM soft sensor model with the predefined hyperparameters and the training data set.

Then, the prediction stage can be implemented on the basis of the trained soft sensor model as follows (see the sketch after this list):

(1) collect the testing data set {x(t), ỹ(t)}, where ỹ(T) = y(T − 1) is the estimated output at time instant T;
(2) conduct the data normalization step on the basis of the result of the training samples;
(3) expand the original 1D testing samples to the 2D dynamic matrices;
(4) predict the current quality variable ŷ(T) on the basis of the trained SDCNN-LSTM soft sensor model;
(5) move to the next online prediction stage with T = T + 1.
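The prediction stage listed above can be sketched schematically as follows (a simplified Python illustration with a stand-in model object; here the previous prediction is fed back as ỹ(T) = y(T − 1), and the window construction is simplified relative to the 2D dynamic matrices described earlier):

```python
import numpy as np

class TrainedSoftSensor:
    """Stand-in for the trained SDCNN-LSTM model (hypothetical interface)."""
    def predict(self, window):
        return float(window.mean())              # placeholder prediction

def predict_online(model, x_test, y_last, k=10):
    """Online prediction loop: initialize the unknown current quality value
    with the previous one, append it to the process variables, and feed the
    most recent k augmented samples to the trained model."""
    y_pred, history, y_tilde = [], [], y_last
    for x_t in x_test:
        history.append(np.append(x_t, y_tilde))  # supervised input [x(T), y_tilde(T)]
        window = np.array(history[-k:])          # most recent k augmented samples
        y_hat = model.predict(window)            # predict the current quality value
        y_pred.append(y_hat)
        y_tilde = y_hat                          # move to the next instant, T = T + 1
    return np.array(y_pred)

y_hat = predict_online(TrainedSoftSensor(), np.random.randn(100, 12), y_last=0.0)
print(y_hat.shape)                               # (100,)
```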
To train the proposed soft sensor model, the mean squared error (MSE) is utilized as the cost function

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n} \left( y(i) - \hat{y}(i) \right)^2$$

where $n$ is the number of training samples, $i$ is the index of the training samples, $y(i)$ is the actual value of the key variable, and $\hat{y}(i)$ is the prediction result of the key variable. In this work, the Adam algorithm is used to minimize the cost function during the training stage.

To evaluate the performance of the soft sensor model,
the root-mean-squared
error (RMSE) is calculated for the testing data set

$$\mathrm{RMSE} = \sqrt{\frac{1}{k}\sum_{j=1}^{k} \left( y(j) - \hat{y}(j) \right)^2}$$

where $k$ is the number of testing samples, $j$ is the time index, and $y(j)$ and $\hat{y}(j)$ are the actual and predicted values of the testing quality variables, respectively. A smaller RMSE indicates a smaller overall prediction error.

In addition,
the coefficient of determination R2 is
also calculated for the testing data set

$$R^2 = 1 - \frac{\sum_{j=1}^{k} \left( y(j) - \hat{y}(j) \right)^2}{\sum_{j=1}^{k} \left( y(j) - \bar{y} \right)^2}$$

where $\bar{y}$ is the mean value of the testing quality variable. The statistical analysis of the residual space is thereby carried out, and a larger value of $R^2$ indicates a more accurate prediction performance, since $R^2$ reveals how much of the total variance of the testing output is explained by the model rather than left in the residual space.

Building on the advantages of the original
CNN-LSTM network and other
dynamic soft sensor models, the proposed supervised network provides
two main developments. The first improvement is the data augmentation
strategy expanding the original 1D samples to the 2D feature maps,
by which the problem of data deficiency can be resolved.[31] By construction of the 2D feature maps, two
types of correlations are involved. One is the variablewise correlations
between variables. The other is the temporal autocorrelations of variables
along the time index. Thus, both local nonlinear spatial and dynamic
feature hierarchies can be learned from the massive unlabeled data
using local patches with convolution and pooling operators layer by
layer. Therefore, the scale of the modeling data is enlarged, where
both the variablewise and temporal correlations that are difficult
to learn for the 1D-data-based model can now be extracted properly.
Another contribution is the design of the supervised network, where
the quality variables are fully used as the input of each hidden layer.
In comparison to traditional supervised dynamic networks such as NARX, the proposed model spreads the quality information throughout the entire network structure, which allows more abundant quality-related information to be extracted within the hidden units. The determination
of the hyperparameters is conducted by trial and error. The limitation
of the current work is that the selected hyperparameters may not reach
the optimal values.
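For reference, the two evaluation metrics defined above can be computed as in the following minimal sketch (standard formulas, not the authors' code):

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root-mean-squared error over the testing samples."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def r_squared(y_true, y_pred):
    """Coefficient of determination R^2 of the testing predictions."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)

y_true = np.array([1.0, 1.2, 1.5, 1.7])
y_pred = np.array([1.1, 1.2, 1.4, 1.8])
print(rmse(y_true, y_pred), r_squared(y_true, y_pred))
```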
Results and Discussion
Penicillin Fermentation Process
The
fed-batch penicillin fermentation process is a typical biochemical
process with both nonlinear and dynamic characteristics, which is
widely used as a benchmark platform for research on soft sensor modeling,
fault diagnosis, real-time control, and production optimization of
industrial processes. The flowchart of the penicillin fermentation
process is presented in Figure 7. The penicillin fermentation process consists of three operating
phases. During the preculture phase, the biomass reactants are growing
for the preparation of the reaction up to the critical concentration.
Then, the penicillin concentration begins to increase rapidly at the
second phase, where the penicillin production rate reaches its peak.
At the final stage, the production rate of penicillin decreases due
to the consumption of the biomass reactants until the end of a batch.
Figure 7
Flowchart
of the penicillin fermentation process.
The PenSim v2.0 simulator developed by the research group of the
Illinois Institute of Technology is widely used in many studies for
performance evaluation. On the basis of the PenSim benchmark, our
research group redeveloped the simulator in MATLAB/Simulink with the
same kinetic model. The improved simulator allows users to customize
the trajectories of manipulated variables freely, which brings about
adequate flexibility. In this simulator, the penicillin concentration
is regarded as the process quality and the key variable. Twelve other
process variables, as given in Table 1, including manipulated variables and state variables,
are collected as the input of soft sensor models. In practical industry,
it is important to predict the penicillin concentration according
to these easy-to-measure variables to ensure the production safety
and quality.
Table 1
Process Variables of the Penicillin Process

variable   description
x1         aeration rate
x2         agitator power
x3         substrate feed rate
x4         substrate temperature
x5         substrate concentration
x6         dissolved oxygen concentration
x7         biomass concentration
x8         culture volume
x9         CO2 concentration
x10        pH
x11        generated heat
x12        cooling water flow rate
The total operation time of the penicillin fermentation
process
is 400 h, where the sampling interval of process variables is 1 h.
Thus, 400 samples can be collected for one batch. The first 300 samples
are collected as the training samples, while the remaining 100 samples
are regarded as the testing samples. The trajectories of process variables
are presented in Figure 8, where strong dynamic characteristics are involved in the process.
Figure 8
Variable
trajectories of the penicillin process.
According to the process variables in Table 1, the dimensions of x(t) and y(t) of the SDCNN-LSTM
network are 12 and 1, where 1 convolutional layer and 2 LSTM layers
are constructed. In the convolutional layer, the filter size is set
as [2 1] and the filter number is 15. The pool sizes of the max pooling
layer and the average pooling layer are both set as [3 1]. The numbers of hidden units in the LSTM layers are set as [50 20 100]. The sequence length for training and prediction is set as 10. The prediction performance
of the soft sensor can vary due to different training algorithms.
In this work, the Adam algorithm is adopted to train the proposed
network. During the training procedure with the Adam algorithm, the
value of the gradient threshold is 6 and the minimum batch size is
24. In addition, the number of maximum epochs has a great influence
on the prediction result as well. The number of maximum epochs is
selected as [10 20 30 40 50 60 70 80 90 100] for both the training
data and testing data. The detailed prediction results of the proposed
soft sensor under each epoch number are presented in Figure 9. It can be inferred that the
best performance of quality prediction occurs under the circumstance
of 50 maximum epochs, since the RMSEs of both the training data and
testing data reach a low level. Hence, the number of the maximum epochs
is determined to be 50 in this case.
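A minimal PyTorch sketch of this training configuration is given below; the gradient threshold is interpreted here as gradient-norm clipping, and a small LSTM regressor with random tensors stands in for the SDCNN-LSTM model and the augmented penicillin data (illustrative assumptions only).

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

class TinyLSTMRegressor(torch.nn.Module):
    """Small stand-in regressor (not the SDCNN-LSTM architecture itself)."""
    def __init__(self, n_in=13, n_hidden=50):
        super().__init__()
        self.lstm = torch.nn.LSTM(n_in, n_hidden, batch_first=True)
        self.fc = torch.nn.Linear(n_hidden, 1)
    def forward(self, x):
        h, _ = self.lstm(x)
        return self.fc(h[:, -1, :])

model = TinyLSTMRegressor()
data = TensorDataset(torch.randn(300, 10, 13), torch.randn(300, 1))   # dummy sequences
loader = DataLoader(data, batch_size=24, shuffle=True)                # minimum batch size 24
optimizer = torch.optim.Adam(model.parameters())                      # Adam algorithm
loss_fn = torch.nn.MSELoss()

for epoch in range(50):                                               # 50 maximum epochs
    for xb, yb in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(xb), yb)
        loss.backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=6.0)  # gradient threshold 6
        optimizer.step()
```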
Figure 9
RMSEs of the penicillin process with different
maximum epochs.
Although the proposed soft sensor
framework provides a promising
result, it is still insufficient to prove its effectiveness. Therefore,
prediction results of the penicillin concentration based on the LSTM,
DCNN-LSTM, NARX, and SLSTM soft sensors were carried out as comparisons,
where the hyperparameters were determined by the trial-and-error technique
as given in Table 2.
Table 2
Hyperparameters of the Penicillin Process by Different Methods

parameter            LSTM          DCNN-LSTM     NARX      SLSTM         SDCNN-LSTM
maximum epochs       50            70            100       60            50
gradient threshold   6             6             -         6             6
sequence length      30            30            -         30            30
input delays         -             -             10        -             -
minimum batch size   24            24            -         24            24
hidden layers        [50 20 100]   [50 20 100]   [60 30]   [50 20 100]   [50 20 100]
Table 3 shows the
RMSEs of these methods with the same training data and testing data.
Comparatively, the SDCNN-LSTM soft sensor provides the smallest prediction errors and the largest R2 values among these
methods, which demonstrates the advantage of the proposed method over
the existing methods. As complements, the detailed penicillin concentration
prediction results of these methods are presented in Figure 10, and a boxplot of the testing prediction error distributions by different methods is shown in Figure 11. It can be inferred from Figure 10 that
the prediction trajectory of the SDCNN-LSTM soft sensor is able to
track the real trajectory more precisely in comparison to the rest of the methods. Furthermore, Figure 11 illustrates that the prediction errors of the proposed
soft sensor (method 5) are much smaller since the median value is
closer to zero. In addition, no outlier that exceeds the maximum or minimum threshold is found in the boxplot of the proposed method.
Table 3
RMSEs of the Penicillin Process by
Different Methods
Figure 10
Penicillin concentration prediction results: (a) LSTM; (b) DCNN-LSTM; (c) NARX; (d) SLSTM; (e) SDCNN-LSTM.
Figure 11
Boxplot of the testing penicillin prediction errors by different methods: (1) LSTM; (2) DCNN-LSTM; (3) NARX; (4) SLSTM; (5) SDCNN-LSTM.
Debutanizer Column
The debutanizer
column is an important part of the desulfuring and naphtha splitter
plant, as shown in Figure 12. Propane and butane are removed in the debutanizer column
from the top, while stabilized gasoline is separated in the bottom
as well as the remaining part of butane. To obtain a good separation
effect, the butane concentration is required to be minimized in the
bottom of the debutanizer column. Several sensors are installed around
the debutanizer column, as shown by gray circles in Figure 12. Hence, it is necessary and
feasible to predict the butane concentration with those easy-to-measure
process variables for a further process optimization and control scheme.
Figure 12
Block
scheme of the debutanizer column.
The process variables collected by the physical sensors are given in Table 4, which are
utilized as x(t) values of the virtual
sensor. As was already mentioned, the butane concentration is regarded
as the quality variable y(t). Therefore,
the dimensions of x(t) and y(t) are 7 and 1, respectively.
Table 4
Process Variables of the Debutanizer Column

variable   description
u1         top temperature
u2         top pressure
u3         reflux flow
u4         flow to next process
u5         sixth tray temperature
u6         bottom temperature A
u7         bottom temperature B
In total, 2394 samples are collected during
the entire process,
where the numbers of training samples and testing samples are 1556
and 838, respectively. One convolutional layer and 2 LSTM layers are
constructed in this case. In the convolutional layer, the filter size
is set as [2 2] and the filter number is 30. The pool sizes of the
max pooling layer and the average pooling layer are both set as [3
3]. The numbers of hidden units in the LSTM layers are set as [80 50].
The sequence length for modeling is set as 30. The value of the gradient
threshold is 6, and the minimum batch size is 32 for the Adam algorithm.
Similar to the first case, the number of maximum epochs is selected
between 10 and 100 with an interval of 10 for both the training data
and testing data. The prediction results of the SDCNN-LSTM soft sensor
with diverse epoch numbers are provided in Figure 13. With reference to the curve, 50 maximum
epochs are adopted in the debutanizer column case due to the smallest
predicted RMSE of the butane concentration.
Figure 13
RMSEs of the debutanizer
column with different maximum epochs.
For comparison, quality prediction was carried out on the basis
of LSTM, DCNN-LSTM, NARX, SLSTM, and the proposed SDCNN-LSTM soft sensors
for the debutanizer column. The hyperparameters were also determined
by the trial-and-error technique, as given in Table 5.
Table 5
Hyperparameters of the Debutanizer Column by Different Methods

parameter            LSTM      DCNN-LSTM   NARX      SLSTM     SDCNN-LSTM
maximum epochs       50        60          100       50        50
gradient threshold   2         2           -         2         2
sequence length      30        30          -         30        30
input delays         -         -           15        -         -
minimum batch size   32        32          -         32        32
hidden layers        [80 50]   [80 50]     [40 30]   [80 50]   [80 50]
Table 6 displays
the prediction results of each method. With respect to the prediction
RMSEs of the training and testing data, the proposed method shows
its merit with the smallest prediction error among all of the methods.
Meanwhile, the detailed butane concentration prediction results are
presented in Figure 14 with curves of the prediction trajectories and the real values. Intuitively,
the prediction curve of the proposed method is more accurate from
the perspective of the tight trajectory tracking. The results of the R2 calculation also indicate that the soft sensing
modeling with the proposed method can describe better correlations
in the residual space. Furthermore, a boxplot of the error distributions
of the testing data by different methods is presented in Figure 15, which illustrates that the proposed SDCNN-LSTM soft sensor (method 5) yields an accurate prediction of the key variable with fewer large errors that exceed the boundary of the boxplot. In conclusion, the additional DCNN layer
is able to extract the dynamic feature of the process more effectively
in comparison to the original LSTM network. In addition, the supervised
modeling framework significantly improves the prediction accuracy
of the key variable.
Table 6
RMSEs of the Debutanizer
Column by
Different Methods
Figure 14
Butane concentration prediction results: (a) LSTM; (b) DCNN-LSTM; (c) NARX; (d) SLSTM; (e) SDCNN-LSTM.
Figure 15
Boxplot of the testing butane prediction errors by different methods: (1) LSTM; (2) DCNN-LSTM; (3) NARX; (4) SLSTM; (5) SDCNN-LSTM.
Conclusion
In this
paper, a hybrid supervised dynamic CNN-LSTM network is
proposed to construct a soft sensor model for complex industrial processes
with nonlinear and dynamic characteristics. In comparison to the traditional
stacked LSTM network, the hybrid dynamic CNN-LSTM network is designed
to implement data augmentation by expanding the original 1D samples
into 2D feature maps, which makes the virtual sensor better able to cope with strong process dynamics. Furthermore, the quality variable
is utilized as the labeled data to meet the demand of supervised modeling
and prediction. The well-established supervised dynamic CNN-LSTM network
is able to provide an accurate and reliable prediction result for
nonlinear dynamic processes. Two applications, including a penicillin
fermentation process and a debutanizer column case, were tested to
evaluate the performance of the SDCNN-LSTM-based soft sensor. The
experimental results in comparison with other soft sensor methods
provide solid evidence of the effectiveness of the SDCNN-LSTM model.
It is also noted that the determination of model parameters is crucial
to the prediction performance of deep-learning-based soft sensor models.
Therefore, future work will focus on the development of a general
parameter optimization approach with the proposed soft sensor model
to further improve the prediction performance.