
Horizontal Data Augmentation Strategy for Industrial Quality Prediction.

Shiwei Gao¹, Qingsong Zhang¹, Ran Tian¹, Zhongyu Ma¹, Xiaochao Dang¹.

Abstract

In recent years, neural network-based soft sensor technology has been widely used in industrial production processes and has excellent optimization, monitoring, and quality prediction performance. This paper proposes a horizontal data augmentation strategy to provide highly available data for subsequent prediction models, called the combined autoencoder data augmentation (CADA) strategy. This paper has developed a CADA-based convolutional neural network (CADA-CNN) soft sensor model and applied it to the process of industrial debutanizer and industrial steam volume. In terms of method validation, this paper compares the output data of the proposed CADA by the Spearman correlation coefficient to verify the strategy's feasibility. Then, the output data of the CADA strategy is fed into the artificial neural network (NN), support vector regression (SVR), and convolutional neural network (CNN) for comparison experiments. The final experimental results show that our proposed CADA-CNN model has lower prediction error and better prediction error distribution.

Entities: Chemical

Year: 2022 PMID: 36092620 PMCID: PMC9453794 DOI: 10.1021/acsomega.2c01747

Source DB: PubMed Journal: ACS Omega ISSN: 2470-1343

Introduction

In industrial production, various advanced control, optimization, and monitoring technologies are widely used to better control product quality.[1,2] Applying these technologies often requires many instruments and relies on real-time feedback on product quality.[3] However, some analytical instruments have long sampling periods and high latency. Therefore, in complex industrial production processes, product quality data are often costly to measure in terms of time, labor, and capital.[4] Soft sensor technology is considered a substitute for traditional analytical instruments due to its rapid response, low maintenance cost, and simple operation.[5] It provides predictive estimates of key variables by building mathematical models from easily measurable auxiliary variables such as pressure, temperature, and flow. Soft sensors can generally be classified into two main categories: mechanism-based modeling and data-driven modeling. Mechanism-based modeling relies on a deep understanding of the process mechanism and uses macroscopic or microscopic equilibrium equations to determine the mathematical relationships between key variables and easily measurable auxiliary variables.[6,7] This modeling method places high demands on modelers, and the modeling process is time-consuming and difficult to maintain. With the rapid development of computer technology, data-driven modeling methods are being used more and more extensively in industrial production processes.[8,9] These approaches use process data exclusively, without considering their physical meaning, and the resulting models are simple to build and easy to maintain. Typical data-driven modeling approaches include multivariate analysis, statistical theory, and neural network modeling. Artificial neural network algorithms have been an active research topic in recent years, and this approach is also widely used in soft sensor modeling.
Examples include artificial neural networks (NN) and support vector regression (SVR), which are used extensively as baseline methods;[10,11] deep belief networks (DBN), which build a joint probability distribution between data and labels;[11,12] autoencoder networks (AE), which use input data for supervision to guide the network in learning mapping relationships;[2−4,6,13] long short-term memory networks (LSTM), which can “remember” past inputs and can be applied to time series;[1,14−17] and convolutional neural networks (CNN), which are based on visual principles and pay more attention to local features.[18−23] For soft sensor modeling, neural networks extract useful features from many easily accessible auxiliary variables and then build a model between the key variables and the extracted features for prediction. After many years of development, the improvements to and applications of artificial neural networks in various research directions demonstrate their excellent feature representation capabilities.[10] Typically, the abundant data collected in the process plant are high-dimensional with strong correlations and high redundancy, a situation also known as data-rich but information-poor.[2] The feature representation ability of a neural network comes from the data. Therefore, a large amount of representative data is essential to capture the hidden characteristics of the data and the characteristics of the data distribution. Although the process auxiliary variables are easily accessible, acquiring key variables is still costly.[1] Data augmentation is an effective strategy that can not only create data samples for model training but also help to improve the generalization ability of the model.[3] This paper proposes a combined autoencoder data augmentation (CADA) strategy and applies it to soft sensor modeling. On the one hand, this paper uses the proven nonlinear autoregressive moving average model to expand the dataset with historical data.
On the other hand, this paper uses the autoencoder network to perform initial feature extraction on the data. It regards the extracted features as the dataset’s coarse screening features and uses them to enhance the data features. Then, the data obtained by the two methods are combined and used as sample input data for the subsequent regression prediction model. Instead of generating new virtual samples, this paper expands the data features through the adaptive combination of the two methods based on the original data, which helps express more valuable data features in subsequent regression predictions. In the regression analysis phase, the CNN model extracts high-value features from the input data and adds key variables to the top layer of the network to fine-tune the entire CNN. This paper adopts a structural adaptive approach to discuss the feasibility of the CADA strategy. The complete CADA-CNN soft sensor model is built and compared experimentally with artificial neural network (NN) and support vector regression (SVR) regression models. The results demonstrate that the proposed CADA-CNN model has a lower prediction error and a better prediction error distribution than the comparison models. The main contributions of this paper are summarized as follows. (1) A combined autoencoder data augmentation (CADA) strategy is proposed as a generic framework, and a preliminary exploration of it is carried out. (2) Three modes of the CADA strategy are built, and the feasibility of the strategy is explored by conducting a correlation analysis. (3) A CADA-CNN soft sensor model is designed based on the proposed CADA strategy, and the hyperparameters of the model are experimentally analyzed. The rest of this article is structured as follows. The Related Work section reviews the studies related to the proposed method and explains how the combined methods work.
The Soft Sensor Modeling section provides a detailed description of the combined autoencoder data augmentation strategy and the overall process of the soft sensor model under this strategy. Then, the Results and Discussion section presents results and a discussion on the debutanizer process unit and the steam volume process to show the effectiveness of the proposed strategy. The final section summarizes the main work of the paper and provides an outlook for future research.

Related Work

Neural network models require large amounts of data, which are expensive and time-consuming to obtain in many applications. Therefore, this paper focuses on finding data augmentation strategies that combine with current research hotspots. Our goal is to find more efficient data augmentation strategies that provide high-quality data for subsequent regression prediction models; in other words, the strategy is a data preprocessing step. Guided by extensive expert experience, this paper proposes a combined autoencoder data augmentation strategy for soft sensor modeling. The proposed strategy is related to two aspects of the research literature. First, this paper investigates widely used data augmentation methods and their application to soft sensor modeling. Second, this paper investigates methods for updating and improving the autoencoder neural network and how to use it for modeling.

Data Augmentation

Data augmentation is a simple and effective strategy that not only provides a large representative sample of data for effective model learning but also helps to improve the generalization ability of the model.[3] In general, data augmentation methods can be divided into horizontal and vertical augmentation in terms of the distribution of the augmented data. If the auxiliary variables together with the corresponding labeled variables are considered one complete data entry, vertical augmentation can be seen as increasing the number of entries in the dataset. For example, graphic image processing[24−27] may generate new data by flipping, cropping, and adding noise. These methods are considered to help improve the generalization of the model. In regression analysis, such as weather prediction, industrial product quality forecasting, and soft sensor modeling, data augmentation is typically performed using generative adversarial networks (GAN)[28−31] and linear interpolation methods.[3,32] Horizontal augmentation can be seen as expanding the number of attributes of each data entry while maintaining the current data size. For example, the horizontal dimension of the data can be raised considerably using an autoregressive moving average model.[33] The methods mentioned above are often dataset-dependent and are realized by trial and error under the guidance of much expert knowledge.[3] The horizontal and vertical data augmentation methods described above can be seen as mutations or redistributions of local data, which reduce the model’s sensitivity to small changes and thereby improve its generalization ability. However, mutations can introduce foreign features not inherent in the dataset, and redistribution may change the original distribution of features in the data.
Furthermore, this corruption is persistent and can misguide feature extraction when passed between layers of the model, which may eventually weaken the feature representation of the model. Thus, we propose a combined autoencoder data augmentation (CADA) strategy, hoping to use global features extracted by neural networks to alleviate the above problem. To validate the proposed CADA strategy, we need a base data augmentation method that satisfies the following conditions. First, it must be a data augmentation method within the soft sensor field that can serve as a baseline; second, it must be rigorously proven; third, it must have been validated over a long period by multiple studies in the literature. In the course of this research, we found that the nonlinear autoregressive moving average model meets the above requirements. The specific rationales are as follows. First, the method was rigorously proven. L. Fortuna et al.[33] used a nonlinear autoregressive moving average model for data augmentation on the debutanizer column dataset in 2005 and provided rigorous proof of their proposed nonlinear fourth-order model. Second, the method has a long history of extended research. In 2018, Yuan et al.[2] proposed a novel variable-wise weighted stacked autoencoder (VWSAE) model based on this method and experimentally verified the superior performance of the model. In 2019, Zhou et al.[4] proposed a stacked quality-driven autoencoder approach based on this method to construct a high-performance soft sensor model and experimentally verified that the model has better prediction results. In 2020, Ren et al.[17] proposed a supervised long short-term memory network based on this method to capture hidden features in dynamic data and experimentally verified the effectiveness of the network. The generative adversarial network is a promising approach to data augmentation that uses a game between a generator and a discriminator to generate highly credible data.
It can be seen as a vertical augmentation method that raises the number of data entries. The CADA strategy proposed in this paper is a horizontal augmentation method, which increases the attribute columns of the data while maintaining the original amount of data. Hence, the vertical expansion methods in refs (3) and (31) are not discussed in this paper. The nonlinear autoregressive moving average model is used to fit the real input/output data,[33] and the model output can be expressed as

$$\hat{y}(k) = F\big(y(k-1), \ldots, y(k-n),\; u_1(k), \ldots, u_1(k-n_1),\; \ldots,\; u_m(k), \ldots, u_m(k-n_m)\big) \tag{1}$$

where ŷ(k) is the current system output estimation, y(k − i) is a generic lagged sample of the system output, and u_i(k − j) is a lagged sample of the i-th system input. The maximum output delay of the model is assumed to be n, n_i represents the maximum delay of the i-th input, and m is the number of inputs. The unknown function F(·) is the regression analysis function; only the proven fourth-order model is adopted as the baseline method in this paper, so the regression function is not discussed. The specific use of this fourth-order model is described in the case studies in the Results and Discussion section.
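As a minimal sketch of how the lagged regressors of such a model can be assembled, the following NumPy function stacks delayed outputs and inputs column by column. The function name `build_lagged_regressors` and its arguments are hypothetical and not taken from the paper or from ref (33); the sketch only builds the argument vector of F(·), not F(·) itself.

```python
import numpy as np

def build_lagged_regressors(u, y, n_out, n_in):
    """Stack lagged outputs and inputs into one regressor row per time step.

    u     : array of shape (T, m), one column per input variable
    y     : array of shape (T,), the measured output
    n_out : maximum output delay n (uses y(k-1) ... y(k-n))
    n_in  : list of length m with the maximum delay n_i of each input
            (uses u_i(k) ... u_i(k-n_i))
    Returns (X, target); row r of X belongs to time step k = start + r.
    """
    u = np.asarray(u, dtype=float)
    y = np.asarray(y, dtype=float)
    T, m = u.shape
    start = max([n_out] + list(n_in))      # first k for which every lag exists
    cols = []
    for i in range(1, n_out + 1):
        cols.append(y[start - i:T - i])    # y(k - i)
    for col, n_i in enumerate(n_in):
        for j in range(0, n_i + 1):
            cols.append(u[start - j:T - j, col])   # u_col(k - j)
    X = np.column_stack(cols)
    target = y[start:]                     # y(k), the value to be estimated
    return X, target
```

Each added lag contributes one extra column while the number of rows only shrinks by the largest delay, which is the sense in which this augmentation is horizontal.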

Autoencoder Neural Network (AE)

The autoencoder is an unsupervised learning model trained with the backpropagation algorithm and an optimization method.[2,3] A single autoencoder is a three-layer network, as in Figure 1, with an input layer on the left, a hidden layer in the middle, and an output layer on the right. The whole network model can be divided into two parts: the encoding part and the decoding part. The encoding and decoding parts are symmetrical, i.e., the number of nodes in the input layer is equal to that in the output layer. The middle hidden layer can be a single layer or multiple layers. When there are several hidden layers, the network can be regarded as several AEs stacked to form a stacked autoencoder. The autoencoder uses the input data as supervision to guide the neural network to learn a mapping relationship that reconstructs the output R.

Figure 1

Autoencoder (AE) neural network model diagram.

The AE model has some sparsity and can automatically select data features and reduce dimensionality, thus forcing the neural network to learn high-value features. As shown in Figure 1, the encoding process of the AE runs from the input layer to the hidden layer, where the high-dimensional input data x is encoded into the low-dimensional hidden variable h through the nonlinear mapping function f(·)

$$h = f(Wx + b) \tag{2}$$

where W is the weight matrix and b is the bias vector. The decoding process of the AE runs from the hidden layer to the output layer: the hidden layer data is mapped back through the inverse mapping function g(·), and the input data is reconstructed as x̃ in the output layer

$$\tilde{x} = g(\tilde{W}h + \tilde{b}) \tag{3}$$

where W̃ and b̃ are the corresponding weight matrix and bias vector in the decoding process. The objective of the model is to minimize the reconstruction error, i.e., the error between the input data x and the output data x̃, so that more high-value features are retained in the parameter set θ = {W, W̃, b, b̃}. Denote the raw observed input dataset as x ∈ {x_1, x_2, ..., x_N}. To obtain the parameter set θ, the reconstruction error can be minimized through a loss function of the mean squared error form

$$J(\theta) = \frac{1}{N}\sum_{i=1}^{N} \lVert \tilde{x}_i - x_i \rVert^2 \tag{4}$$

The AE network forces the hidden layer to extract high-value features through extraction and reconstruction operations. Subsequent regression prediction models can directly use these extracted features.[2,4,6] Hence, the extracted high-value features can be considered globally relevant and do not destroy the feature distribution of the original data.
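The encoding, decoding, and reconstruction-error minimization described above can be sketched in plain NumPy. This is an illustration under stated assumptions rather than the authors' implementation: the sigmoid encoder, the linear decoder, full-batch gradient descent, and the name `train_autoencoder` are all choices made here for brevity. Samples are stored as rows, so the encoder is written as `x @ W + b` instead of Wx + b.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_autoencoder(x, hidden=3, lr=0.1, iters=500, seed=0):
    """Single-hidden-layer AE trained by full-batch gradient descent.

    Encoder: h = sigmoid(x W + b); decoder: x_rec = h W_dec + b_dec.
    Returns the trained encoder and the loss recorded at each iteration.
    """
    rng = np.random.default_rng(seed)
    n, d = x.shape
    W = rng.normal(0.0, 0.1, (d, hidden))
    b = np.zeros(hidden)
    W_dec = rng.normal(0.0, 0.1, (hidden, d))
    b_dec = np.zeros(d)
    losses = []
    for _ in range(iters):
        h = sigmoid(x @ W + b)                 # encoding
        x_rec = h @ W_dec + b_dec              # decoding
        err = x_rec - x
        losses.append(float(np.mean(np.sum(err ** 2, axis=1))))
        g_rec = 2.0 * err / n                  # d loss / d x_rec
        g_W_dec = h.T @ g_rec
        g_b_dec = g_rec.sum(axis=0)
        g_z = (g_rec @ W_dec.T) * h * (1.0 - h)   # back through the sigmoid
        g_W = x.T @ g_z
        g_b = g_z.sum(axis=0)
        W -= lr * g_W
        b -= lr * g_b
        W_dec -= lr * g_W_dec
        b_dec -= lr * g_b_dec

    def encode(data):
        return sigmoid(np.asarray(data, dtype=float) @ W + b)

    return encode, losses

# Usage on synthetic data: 200 samples with 13 variables, 3 hidden features.
x_demo = np.random.default_rng(1).random((200, 13))
encode, losses = train_autoencoder(x_demo)
features = encode(x_demo)   # hidden-layer output, one row of 3 features per sample
```

Only the hidden-layer output `features` would be passed on to a downstream regressor; the decoder exists solely to define the training signal.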

Soft Sensor Modeling

This section will detail the proposed combined autoencoder data augmentation strategy and the complete soft sensor modeling steps. Our introduction will be divided into the following two aspects: first, we introduce the combined autoencoder data augmentation strategy and its internal modes of structural adaptation and present a validation method for the strategy. Second, we introduce the modeling process of the soft sensor modeling and the evaluation metrics of the model.

Combined Autoencoder Data Augmentation (CADA) Strategy

The main idea of this paper is derived from ref (3): the data augmentation approach aims to provide highly representative training data for subsequent regression models. From refs (2) and (4), we learn that autoencoder networks automatically compress the data and force the extraction of high-value features. Therefore, this paper attempts to combine a traditional nonlinear autoregressive moving average model with an autoencoder to find a data augmentation method with higher performance gains. The proposed CADA strategy is a preliminary exploration of an adaptive combination of the two methods. We explore three modes: one original mode (the baseline mode) and two research modes (the structural adaptive comparison modes). The flow diagram of the modes is shown in Figure 2. Mode 1 uses the fourth-order nonlinear autoregressive moving average model demonstrated in ref (33). We have embedded this method in the CADA strategy and use it as our baseline for comparison purposes. In mode 2, we use an AE network to perform coarse feature extraction from the raw data and combine the output of the hidden layer with the expanded data from the fourth-order nonlinear autoregressive moving average model of mode 1. In mode 3, we first use mode 1 to expand the raw data and then use the AE network to perform coarse feature extraction on the expanded data. After the calculation is completed, the hidden layer output of the AE network is extracted and combined with the expanded data of mode 1. As the CADA strategy in this paper is horizontal, the “connection” in modes 2 and 3 expands the data to a higher number of columns. The two methods are reusable in the CADA strategy, and both exist in a single model. Only the input and output interfaces of the data need to be adjusted between the different modes. The details of the data flow are shown in Figure 2.
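The difference between the three modes comes down to what the encoder sees and what is concatenated column-wise. The sketch below makes that data flow explicit. The function names are hypothetical, and `make_stand_in_encoder` is only a placeholder (a fixed random projection followed by tanh) standing in for the hidden layer of a trained AE; rows of `raw` and `expanded` are assumed to be already aligned in time.

```python
import numpy as np

def make_stand_in_encoder(n_inputs, n_hidden=3, seed=0):
    """Hypothetical placeholder for a trained AE encoder."""
    rng = np.random.default_rng(seed)
    P = rng.normal(size=(n_inputs, n_hidden))
    return lambda x: np.tanh(np.asarray(x, dtype=float) @ P)

def cada_mode1(expanded):
    # Baseline: only the lag-expanded data.
    return np.asarray(expanded, dtype=float)

def cada_mode2(raw, expanded, encode_raw):
    # AE features come from the raw data, then join the expanded data.
    return np.hstack([expanded, encode_raw(raw)])

def cada_mode3(expanded, encode_expanded):
    # AE features come from the expanded data, then join the expanded data.
    return np.hstack([expanded, encode_expanded(expanded)])

# Shapes only: 100 aligned samples, 7 raw and 13 expanded variables.
rng = np.random.default_rng(0)
raw = rng.random((100, 7))
expanded = rng.random((100, 13))
out1 = cada_mode1(expanded)
out2 = cada_mode2(raw, expanded, make_stand_in_encoder(7))
out3 = cada_mode3(expanded, make_stand_in_encoder(13))
print(out1.shape, out2.shape, out3.shape)   # (100, 13) (100, 16) (100, 16)
```

In every mode the number of rows is unchanged; only the column count grows, which is what makes the strategy horizontal.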

Figure 2

Flowchart of the three different modes in the CADA strategy (mode 1 is the baseline mode, and modes 2 and 3 are the research modes with different structural assignments for the two methods).

In this paper, Spearman’s rank correlation coefficient is used to verify the feasibility of the CADA strategy. The correlation coefficient indicates the direction of correlation between the auxiliary variable X and the key variable Y. When Y tends to increase as X increases, the Spearman correlation coefficient is positive; when Y tends to decrease as X increases, the Spearman correlation coefficient is negative. A Spearman correlation coefficient of zero indicates that Y shows no monotonic tendency as X increases. The coefficient increases in absolute value as X and Y get closer to a perfectly monotonic relationship. The Spearman correlation coefficient is defined as the Pearson correlation coefficient between rank variables. For a sample with n rows and m columns, the correlation coefficient for the j-th data column is

$$\rho_j = \frac{\sum_{i=1}^{n}(x_{ij}-\bar{x}_j)(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_{ij}-\bar{x}_j)^2 \, \sum_{i=1}^{n}(y_i-\bar{y})^2}}, \quad j = 1, \ldots, m \tag{5}$$

where x_{ij} and y_i denote the ranks of the i-th sample in the j-th data column and in the key variable, respectively, and x̄_j and ȳ are the corresponding mean ranks.
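The definition of the Spearman coefficient as the Pearson correlation of rank variables translates directly into a few lines of NumPy. The helper names below are hypothetical, ties are given average ranks (one common convention, assumed here), and a constant column would lead to a division by zero that this sketch does not guard against.

```python
import numpy as np

def average_ranks(v):
    """Ranks 1..n, with tied values sharing their average rank."""
    v = np.asarray(v, dtype=float)
    order = np.argsort(v, kind="mergesort")
    ranks = np.empty(len(v), dtype=float)
    ranks[order] = np.arange(1, len(v) + 1)
    for value in np.unique(v):
        mask = v == value
        ranks[mask] = ranks[mask].mean()
    return ranks

def spearman(x, y):
    """Pearson correlation computed on the ranks of x and y."""
    rx = average_ranks(x)
    ry = average_ranks(y)
    rx = rx - rx.mean()
    ry = ry - ry.mean()
    return float((rx @ ry) / np.sqrt((rx @ rx) * (ry @ ry)))

def spearman_per_column(X, y):
    """Coefficient between each column of X and the key variable y."""
    X = np.asarray(X, dtype=float)
    return np.array([spearman(X[:, j], y) for j in range(X.shape[1])])
```

Because only ranks enter the calculation, any strictly increasing transformation of a column leaves its coefficient unchanged, which is why the measure suits nonlinear but monotonic process relationships.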

CADA-CNN Soft Sensor Model

The soft sensor model in this paper is divided into two stages. The first stage is the data augmentation stage, where we augment the data using the CADA strategy. The second stage is the regression prediction stage. Because the features are augmented for local data in the first stage, we use a convolutional neural network (CNN), which focuses more on local features, to perform the regression prediction. Therefore, the complete soft sensor model is called the CADA-CNN model. Figure 3 shows the CADA-CNN soft sensor model diagram.

Figure 3

Diagram of CADA-CNN soft sensor model.

The specific CNN structure used in the regression analysis stage is shown in Figure 3. In this stage, we set up three convolutional layers, interspersed with a pooling layer in the second and third convolutional layers, and finally use a fully connected neural network for the predictive representation of the features and to obtain the predicted output in the output layer. The specific algorithmic flow of the CNN is shown in Table 1.

Table 1

Convolutional Neural Network Algorithm Flow

algorithm: convolution regression
input: Output of CADA stage X(DA), key variables Y
output: key variables for prediction Y_pred
1:	parameter setting: batch size, epochs, learning rate.
2:	loss function: mean absolute error (MAE).
3:	optimizers: Adam.
4:	conv parameter setting: kernel size, padding, activation function.
5:	initial weight.
6:	repeat:
7:	loss (MAE) ← mean(|Y − Y_pred|)
8:	weight ← updated parameters by gradient descent
9:	until: convergence of weight
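The loop in Table 1 (mini-batches, MAE loss, Adam updates, repeated epochs) can be illustrated without a deep learning framework. To keep the sketch short, a linear model stands in for the CNN, and a fixed number of epochs replaces the convergence check; both are simplifications made here, and the function name `train_mae_adam` is hypothetical. The Adam constants are the commonly used defaults.

```python
import numpy as np

def train_mae_adam(X, y, lr=0.001, epochs=20, batch_size=50, seed=0,
                   beta1=0.9, beta2=0.999, eps=1e-8):
    """Minimize MAE with Adam on a linear stand-in model (not a CNN)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    Xb = np.hstack([X, np.ones((n, 1))])     # append a bias column
    w = np.zeros(d + 1)                      # step 5: initial weights
    m = np.zeros_like(w)                     # first-moment estimate
    v = np.zeros_like(w)                     # second-moment estimate
    t = 0
    history = []
    for _ in range(epochs):                  # step 6: repeat
        order = rng.permutation(n)
        for s in range(0, n, batch_size):
            idx = order[s:s + batch_size]
            resid = Xb[idx] @ w - y[idx]
            # Subgradient of the batch MAE with respect to w.
            grad = Xb[idx].T @ np.sign(resid) / len(idx)
            t += 1
            m = beta1 * m + (1.0 - beta1) * grad
            v = beta2 * v + (1.0 - beta2) * grad ** 2
            m_hat = m / (1.0 - beta1 ** t)   # bias-corrected moments
            v_hat = v / (1.0 - beta2 ** t)
            w -= lr * m_hat / (np.sqrt(v_hat) + eps)   # step 8: update
        history.append(float(np.mean(np.abs(Xb @ w - y))))  # step 7: MAE
    return w, history
```

The defaults mirror the batch size of 50, 20 epochs, and learning rate of 0.001 reported later for the debutanizer experiments; in a real CNN the gradient would come from backpropagation through the convolutional and fully connected layers instead of the closed-form subgradient used here.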

In this paper, the three modes of the CADA strategy are modeled separately. The modeling process is shown in Figure 4, with the following modeling steps.

Figure 4

Flowchart of CADA-CNN soft sensor modeling.

Step 1: Select, collect, and preprocess the auxiliary variables.
Step 2: Determine the training and test datasets.
Step 3: Pre-train the autoencoder network in the CADA strategy, determine the number of iterations and the learning rate of this network, and verify the feasibility of the CADA strategy by correlation analysis.
Step 4: Pre-train the CADA-CNN model and determine the learning rate of the CNN in this model.
Step 5: Train the CADA-CNN soft sensor model with the hyperparameters determined in steps 3 and 4.
Step 6: Fine-tune the overall network and modify the network parameters slightly.
Step 7: Evaluate the performance of the soft sensor model on the test set.

This paper uses the three model indicators used in refs (2−4) to evaluate the model. With y_i the measured value, ŷ_i the predicted value, ȳ the mean of the measured values, and N the number of test samples, the mean absolute error (MAE) is defined as

$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N} \lvert y_i - \hat{y}_i \rvert \tag{6}$$

the root mean square error (RMSE) is defined as

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} (y_i - \hat{y}_i)^2} \tag{7}$$

and R-square (R2) is defined as

$$R^2 = 1 - \frac{\sum_{i=1}^{N} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{N} (y_i - \bar{y})^2} \tag{8}$$
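The three evaluation indicators named above have standard definitions, which the following NumPy sketch implements; the function names are chosen here for illustration.

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs(y_true - y_pred)))

def rmse(y_true, y_pred):
    """Root mean square error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def r_square(y_true, y_pred):
    """Coefficient of determination: 1 - residual SS / total SS."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)
```

Lower MAE and RMSE indicate smaller errors, while R2 closer to 1 indicates that the predictions explain more of the variance in the measured values. RMSE penalizes large individual errors more strongly than MAE, so the two can rank models differently.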

Results and Discussion

This section performs a comparative ablation study of CADA strategies using a debutanizer column and an industrial steam volume dataset. We will describe and analyze the following four aspects. First, we introduce the dataset used for this case study and its associated variables. Second, we present the usage of the baseline method identified in this paper and the model structure parameters of the neural network used. Third, we experimentally set the hyperparameters of the AE network in the CADA strategy and performed a correlation analysis on the output data. Fourth, we experimentally determine the hyperparameters of the CADA-CNN soft sensor model and analyze the model’s index scores and prediction results on the test set.

Debutanizer Column

Separating crude oil is a very complex and important refining process in the petroleum industry. The debutanizer column is an important industrial refinery unit for separating liquefied petroleum gas and stabilized light hydrocarbons, mainly for desulphurization and naphtha splitting. The flowchart of the debutanizer column is shown in Figure 5. To ensure product quality, the butane content at the bottom of the debutanizer column must be minimized. As a result, the real-time measurement of the butane content in the column is the key point for the accurate control of the refinery process. However, the concentration of C4, which reflects the butane content, cannot be measured directly but requires continuous measurement and analysis of the subsequent overheads of the deisopentane tower with the aid of a gas chromatograph.

Figure 5

Debutanizer column flowchart.

In summary, the gas chromatograph has a serious delay in measuring the butane content, and the equipment is expensive to maintain, so the real-time control of the refinery process cannot be guaranteed. To alleviate these problems, soft sensor technology, which is easy to operate and requires little maintenance, is used to predict the C4 content. The seven points in Figure 5 are the data collection points for the auxiliary variables, and Table 2 describes the auxiliary and key variables.

Table 2

Variable Description for the Debutanizer Column

input variables	variable description
u₁	top temperature
u₂	top pressure
u₃	reflux flow
u₄	flow to next process
u₅	6th tray temperature
u₆	bottom temperature A
u₇	bottom temperature B
y	butane content

Baseline Method and Model Structural Parameters

In this subsection, we present the following two aspects. First, we present the specific operation of the chosen baseline method, i.e., the fourth-order nonlinear autoregressive moving average model, on the debutanizer column dataset. Second, we present the model structure parameters of the two neural networks in the proposed CADA-CNN model, namely, the autoencoder and the convolutional neural network. There are seven auxiliary variables and one key variable in the debutanizer column dataset. The dataset is expanded according to the proven fourth-order nonlinear autoregressive moving average model using historical data for the u5 attribute and the key variable y. The specific data expansion is shown in the augmentation matrix (9).[33] A total of 2390 data samples are collected in this process, of which 1000 samples are used as the training dataset and the remaining samples as the test dataset. Three modes are set up in the CADA strategy: mode 1 uses the data augmentation of the augmentation matrix (9), and modes 2 and 3 additionally use the autoencoder network (AE). Therefore, we need to configure the AE network structure, which follows ref (2) and is set to [13 8 3]. Since there are 13 variables in the augmented variable vector after the fourth-order nonlinear autoregressive moving average model, the number of neurons in the input layer of the AE is 13. The high-value features extracted from the hidden layer of the AE network are appended to the data vector, and each sample is passed into the CNN as a k × k matrix. The middle hidden layer of the AE network is therefore set to three neurons, so that the 13 augmented variables plus the 3 extracted features form 16 values that can be arranged as 4 × 4. In the regression analysis stage, the structure of the CNN is shown in Figure 3. The Adam optimizer is used for optimization, the loss function is set to MAE, the convolutional kernel size is 2 × 2, the padding is set to “same”, and the ReLU function is used as the activation function.
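The augmentation matrix (9) itself is not reproduced in this text. The sketch below assumes the layout commonly reported for this benchmark, which is consistent with the 13-variable count and with the column numbering used for the mode 1 output: u1(k) to u5(k), three lags of u5, the average of u6(k) and u7(k), and four lags of y. That layout, the function names, and the simple first-1000 split are assumptions made for illustration.

```python
import numpy as np

def augment_debutanizer(u, y):
    """Assumed fourth-order expansion of the debutanizer data.

    u : array (T, 7) holding u1..u7 as columns; y : array (T,).
    Returns X of shape (T - 4, 13) and the aligned target y(k).
    """
    u = np.asarray(u, dtype=float)
    y = np.asarray(y, dtype=float)
    k = np.arange(4, len(y))                 # y(k-4) must exist
    X = np.column_stack([
        u[k, 0], u[k, 1], u[k, 2], u[k, 3], u[k, 4],   # u1(k) .. u5(k)
        u[k - 1, 4], u[k - 2, 4], u[k - 3, 4],         # u5(k-1) .. u5(k-3)
        (u[k, 5] + u[k, 6]) / 2.0,                     # mean of u6(k), u7(k)
        y[k - 1], y[k - 2], y[k - 3], y[k - 4],        # y(k-1) .. y(k-4)
    ])
    return X, y[k]

def to_cnn_input(features16):
    """Arrange 16 values per sample (13 augmented + 3 AE) as 4 x 4 x 1."""
    features16 = np.asarray(features16, dtype=float)
    return features16.reshape(len(features16), 4, 4, 1)

def split_train_test(X, target, n_train=1000):
    """First n_train samples for training, the remainder for testing."""
    return (X[:n_train], target[:n_train]), (X[n_train:], target[n_train:])
```

Keeping the split chronological rather than shuffled avoids leaking lagged values of y from test samples into the training set, which matters for any augmentation built on past outputs.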

CADA Parameter Determination and Correlation Analysis

In this subsection, we present the following two aspects. First, we experimentally determine the hyperparameters for the CADA stage. Second, we perform a correlation analysis of the output data from the CADA stage. In the CADA strategy, both mode 2 and mode 3 use the autoencoder network, so we need to determine the number of iterations and the learning rate of the autoencoder network experimentally. In exploring the number of iterations, we refer to the setting in ref (2) and tentatively set the learning rate to 0.01 (this learning rate is validated experimentally below), running 2000 iterations for mode 2 and mode 3, respectively. The variation of the network loss with the number of iterations is shown in Figure 6. As can be seen in Figure 6, the pattern of loss change is the same for both modes. The loss of the autoencoder network stops decreasing after nearly 1000 iterations, so we set the number of iterations for each mode in the CADA stage to 1000.

Figure 6

CADA stage, mode 2, and mode 3 loss variation diagram.

When exploring how the loss varies with the number of iterations, we tentatively set the learning rate to 0.01 and experimentally determined the number of iterations to be 1000. Seven sets of experiments are therefore conducted to set the learning rate, with the learning rate (lr) set to {0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1}, respectively. The relationship between the learning rate, loss, and iterations is shown in Figure 7. As seen in Figure 7, the loss of both mode 2 and mode 3 decreases smoothly as the number of iterations increases when the learning rate is 0.001 or 0.005. As the learning rate continues to rise, the loss of mode 2 and mode 3 fluctuates as the number of iterations rises. Hence, we can determine that the change in loss is close to a critical state at a learning rate of around 0.005. To reduce the fluctuations across multiple independent experiments, we select 0.001 as the learning rate of the CADA stage.

Figure 7

CADA stage, loss, iterations, and learning rate variation diagram.

To compare the correlation of the data constructed by the three modes in the CADA strategy more intuitively, we numbered the data columns in the three modes. The numbering is described in Table 3. The data columns numbered 1–7 are the raw data columns; those numbered 1–5 and 8–15 are the data columns output by mode 1; those numbered 1–5 and 8–18 are the data columns output by mode 2; and those numbered 1–5, 8–15, and 19–21 are the data columns output by mode 3. The Spearman correlation coefficient is calculated between the key variable y and each of the original seven attribute columns in the dataset and the output data columns of the three modes. The results are shown in Table 4 and Figure 8. The correlation coefficients calculated for the data columns numbered 1–15 are constant. In contrast, the data columns numbered 16–21 are calculated from the high-value features extracted by the AE network and change with each run. Therefore, to ensure the stability of the AE network, we use the network settings determined in the above experiments, set the number of iterations to 1000 and the learning rate to 0.001, repeat the experiment 20 times, and report the mean value.

Table 3

Description of Data Column Numbers

data description	number
raw data	1–7
mode 1 output	1–5,8–15
mode 2 output	1–5,8–18
mode 3 output	1–5,8–15,19–21

Table 4

ρ for the Output Data of the Three Modes in the CADA Strategy

number	ρ	number	ρ
1	0.068678652	12	0.996632657
2	0.21090934	13	0.987065435
3	0.248085121	14	0.971603143
4	0.149349171	15	0.950779782
5	0.21023562	16	0.096137048
6	0.064929623	17	0.128862912
7	0.043576496	18	0.074797045
8	0.24673177	19	0.471020202
9	0.286921231	20	0.820837668
10	0.330141462	21	0.790777659
11	0.053040193

Figure 8

Histogram of ρ for each data column (the maximum ρ values for each part of the legend are marked in the figure).

As shown in Table 4 and Figure 8, the correlation coefficients of the raw data are low. However, after the fourth-order nonlinear autoregressive moving average method of mode 1, several expanded data columns have high correlation coefficients, as shown in the data columns numbered 9, 10, 12, 13, 14, and 15. The high-value features extracted by the AE network also have similarly high correlation coefficients, as shown in the data columns numbered 20 and 21. The data columns numbered 16–18 are the high-value features extracted by the AE network in mode 2. The results show that mode 2 has lower correlation coefficients than mode 3, and mode 3 has higher correlation coefficients than some of the data in mode 1. The proposed CADA strategy can thus expand the data with additional columns that are more highly correlated than those of the base method, which demonstrates the strategy’s feasibility.

In this subsection, we present the following three aspects. First, the hyperparameters of the CADA-CNN soft sensor model are determined. Second, the experimental results of the model and the scores of the model evaluation indicators are analyzed. Third, the prediction error of the model is analyzed. We use 1000 samples as the training set and the remaining samples as the test set, and, referring to the setting in ref (4), we set the batch size to 50 and the number of epochs to 20. To determine the learning rate of the CNN regression network, we conducted five groups of experiments with learning rates of {0.001, 0.003, 0.005, 0.008, 0.01}. The variation of the MAE loss with increasing epochs for each learning rate is shown in Figure 9.

Figure 9

Plot of CADA-CNN model learning rate and MAE loss with epoch.
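
The learning-rate search described above (five candidate rates, batch size 50, 20 epochs, MAE loss) can be sketched as a simple sweep. The example below is only an illustration under strong assumptions: a one-variable linear model trained by mini-batch subgradient descent stands in for the CNN regression network, the data are synthetic, and the names `train` and `mae` are hypothetical. It shows the selection procedure, not the paper's network or its results.

```python
import random


def mae(w, b, xs, ys):
    """Mean absolute error of the linear model y = w * x + b."""
    return sum(abs(w * x + b - y) for x, y in zip(xs, ys)) / len(xs)


def train(lr, xs, ys, batch_size=50, epochs=20):
    """Mini-batch subgradient descent on the MAE loss; returns the loss after each epoch."""
    w, b = 0.0, 0.0
    history = []
    for _ in range(epochs):
        for start in range(0, len(xs), batch_size):
            batch_x = xs[start:start + batch_size]
            batch_y = ys[start:start + batch_size]
            grad_w = grad_b = 0.0
            for x, y in zip(batch_x, batch_y):
                residual = w * x + b - y
                sign = (residual > 0) - (residual < 0)  # subgradient of |residual|
                grad_w += sign * x
                grad_b += sign
            w -= lr * grad_w / len(batch_x)
            b -= lr * grad_b / len(batch_x)
        history.append(mae(w, b, xs, ys))
    return history


# Synthetic stand-in for the 1000 training samples (hypothetical data).
rng = random.Random(0)
xs = [rng.uniform(0.0, 1.0) for _ in range(1000)]
ys = [0.6 * x + 0.2 + rng.gauss(0.0, 0.02) for x in xs]

learning_rates = [0.001, 0.003, 0.005, 0.008, 0.01]
histories = {lr: train(lr, xs, ys) for lr in learning_rates}

# Pick the rate with the lowest MAE after the final epoch.
best_lr = min(histories, key=lambda lr: histories[lr][-1])
```

Which rate wins depends on the model and data. On this toy problem the ranking need not match the one reported for the CNN, so only the selection rule (lowest final MAE) carries over.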

As shown in Figure 9, the loss decreases substantially during the first three epochs and only slightly during subsequent training. The first-epoch loss is lower for smaller learning rates. In addition, the smaller the learning rate, the smaller the MAE loss when training is completed after 20 epochs. Therefore, to minimize the training error, we set the learning rate of the CNN regression network in the CADA-CNN model to 0.001.

This paper uses the parameters described above to build the CADA-CNN soft sensor model and conduct experiments. In the regression analysis stage, we use the same base regressors as ref (2) for comparison: a multilayer artificial neural network (NN) with the structure [13 10 7 4 1] and support vector regression (SVR). Two models from the literature, VWSAE-NN (ref 2) and SQAE-NN (ref 4), are also included for comparison. The complete experimental indicator scores are shown in Table 5.

Table 5

Results of CADA-CNN Model Metrics

CADA	model	MAE	RMSE	R²
AE-only	NN	0.0705	0.0910	0.7781
	SVR	0.0741	0.1053	0.7022
	CNN	0.0562	0.0791	0.8323
mode 1 (baseline mode)	NN	0.0259	0.0491	0.9321
	SVR	0.0519	0.0656	0.8846
	CNN	0.0284	0.0421	0.9478
	VWSAE-NN²	0.0277	0.0379	0.9444
	SQAE-NN⁴	0.0220	0.0303	0.9646
mode 2	NN	0.0350	0.0646	0.8764
	SVR	0.0468	0.0649	0.8869
	CNN	0.0318	0.0471	0.9404
mode 3	NN	0.0267	0.0449	0.9434
	SVR	0.0433	0.0599	0.9035
	CNN	0.0273	0.0361	0.9651
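
The three indicators reported in Table 5 (MAE, RMSE, and R²) follow their standard definitions. A minimal pure-Python sketch is given below. The function names and the toy values are illustrative assumptions and are unrelated to the debutanizer data.

```python
import math


def mean_absolute_error(y_true, y_pred):
    """MAE: average magnitude of the prediction error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def root_mean_squared_error(y_true, y_pred):
    """RMSE: square root of the average squared prediction error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2_score(y_true, y_pred):
    """R^2: 1 minus the ratio of residual to total sum of squares."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy values (hypothetical) to show the calls.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]
mae = mean_absolute_error(y_true, y_pred)
rmse = root_mean_squared_error(y_true, y_pred)
r2 = r2_score(y_true, y_pred)
```

Lower MAE and RMSE and higher R² indicate a better fit. RMSE penalizes large individual errors more strongly than MAE, which is why the two can rank models differently.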

As the evaluation metrics in Table 5 show, in mode 3 the CADA-CNN model outperforms the comparison model VWSAE-NN on MAE, RMSE, and R². Compared with SQAE-NN, the MAE and RMSE of the CADA-CNN model are slightly higher (0.0273 vs 0.0220 and 0.0361 vs 0.0303), which we attribute to the different data selection, but its R² is marginally better (0.9651 vs 0.9646). Overall, the CADA-CNN model performs better under mode 3 than under mode 2 or mode 1 (the baseline mode). This result is consistent with the correlation table and Figure 8, where the high-value features extracted by the AE neural network have lower correlation coefficients in mode 2 and higher ones in mode 3. The ablation experiments that involve only the autoencoder (AE-only in Table 5) give clearly worse metrics and are not competitive with the better models.

To provide a more intuitive understanding of the prediction results of the soft sensor model, we plot the predictions obtained with the CNN regressor for each of the three modes in Figure 10.

Figure 10

Graph of prediction results versus true values for the CADA-CNN model in three modes: (a) comparison of prediction results for all test data and (b) comparison of prediction results for test data number 70 to 120.

From Figure 10a, we can see that the prediction results under the three modes of the CADA strategy differ noticeably. The prediction curve for mode 1, the data augmentation mode using the fourth-order nonlinear autoregressive moving average model, lies in the middle of the three modes. The prediction curves for mode 2 and mode 1 are essentially the same, but in some regions, such as test samples 70 to 120 in Figure 10b, the curve for mode 2 lies below that for mode 1. Possible reasons include fluctuations in the model when predicting particular data points and inadequate support from the feature data. As Figure 10a,b shows, mode 3, which expands the data as in mode 1, applies the AE network for coarse feature extraction, and combines the outputs of the two methods, fits the real values closely. This agrees with the high scores of mode 3 across the evaluation indicators in Table 5.

Figure 10 only shows whether the predictions fit the real values, so we also calculated the prediction error for each mode of the CADA-CNN model as the difference between the predicted and real values. The detailed prediction errors for each mode are shown in Figure 11.

Figure 11

Error area chart of the predicted and true value of CADA-CNN model in three modes.

Figure 11 presents the difference between the predicted and true values as an area plot. The chart is bounded by the prediction error curve, and the area between the curve and the zero axis represents the magnitude of the error and the fluctuations in the prediction. In mode 1, which serves as our baseline, the prediction error lies within [−0.1, 0.2]. In mode 2, the prediction error fluctuates significantly and its range widens to [−0.2, 0.35]. In this mode, the prediction error decreases for most data points in the test set but increases significantly around some particular points, such as test samples 70 to 120. In mode 3, the range of the prediction error narrows to [−0.1, 0.15], and the prediction error over the entire test set is significantly lower than the baseline.

To reflect the prediction error distribution of the CADA-CNN model under the three modes, the prediction errors are also summarized statistically. The complete histogram of the prediction error distribution is shown in Figure 12. The error distribution curve of the baseline method, i.e., the CADA-CNN model under mode 1, is shifted to the right of zero, indicating an uneven error distribution. The prediction error distributions for modes 2 and 3 are not skewed and are centered on zero. The sharper the error distribution curve, the more concentrated the distribution. The curve for mode 2 is flatter than that for mode 3, which means that mode 2 has larger prediction errors than mode 3. The CADA-CNN model therefore has a better prediction error distribution under mode 3.

Figure 12

Histogram of error distribution statistics for the CADA-CNN model in the three modes (the bars show the number of errors falling in each range, and the curves indicate the error distribution).
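
The error analysis above rests on three simple quantities: the per-sample prediction error, its range and mean (a mean away from zero indicates a biased distribution), and the bin counts behind a histogram. The sketch below computes them in pure Python. The sign convention (predicted minus true), the bin edges, the function names, and the toy values are assumptions for illustration only.

```python
def prediction_errors(y_pred, y_true):
    """Per-sample error, taken here as predicted minus true (assumed sign convention)."""
    return [p - t for p, t in zip(y_pred, y_true)]


def error_summary(errors):
    """Range, mean, and standard deviation of the prediction errors."""
    n = len(errors)
    mean = sum(errors) / n
    std = (sum((e - mean) ** 2 for e in errors) / n) ** 0.5
    return {"min": min(errors), "max": max(errors), "mean": mean, "std": std}


def error_histogram(errors, low, high, n_bins):
    """Count errors in n_bins equal-width bins over [low, high]; values outside are ignored."""
    width = (high - low) / n_bins
    counts = [0] * n_bins
    for e in errors:
        if low <= e <= high:
            # An error equal to `high` falls into the last bin.
            index = min(int((e - low) / width), n_bins - 1)
            counts[index] += 1
    return counts


# Toy errors (hypothetical), not values from the experiments.
errors = [-0.15, -0.02, 0.03, 0.04, 0.17]
summary = error_summary(errors)
counts = error_histogram(errors, -0.2, 0.2, 4)
```

A smaller standard deviation corresponds to the sharper, more concentrated distribution curve described in the text, and a mean close to zero corresponds to a distribution centered on zero.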

Industrial Steam Volume

Thermal power generation uses the heat released when fuel is burned to heat the water in the boiler and produce steam. The steam is accumulated in a special pressure tank and is used to drive the turbine, which in turn rotates the generator to produce electricity. The flowchart of thermal power generation is shown in Figure 13. In this process, the energy conversion efficiency of the boiler is the key to the efficiency of electricity generation. In other words, what matters is how efficiently the energy of the burned fuel is transferred to the water in the boiler to produce high-temperature, high-pressure steam. The factors affecting this energy transfer are complex. They include the boiler’s adjustable parameters, such as fuel charge, ventilation air volume, and boiler water volume, and the boiler operating conditions, such as boiler bed temperature, bed pressure, furnace chamber temperature, and pressure.

Figure 13

Thermal power flowchart.

There are 38 auxiliary variables and 1 key variable in the data. A total of 2884 data samples are collected, of which 2500 samples are used as training data and the remaining 384 as test data. In this experiment, we focus on testing the effectiveness of the CADA strategy and the performance of each model. Therefore, we use the same parameter configuration as in the previous experiments. The specific data expansion for the baseline model is shown in the augmentation matrix (10).[33] In addition, we set the number of output neurons of the AE network to 5 to facilitate the integration of data from modes 2 and 3. The complete experimental indicator scores are shown in Table 6.

Table 6

Results of CADA-CNN Model Metrics

CADA	model	MAE	RMSE	R²
AE-only	NN	0.0560	0.0835	0.8110
	SVR	0.0670	0.0906	0.7775
	CNN	0.0614	0.0854	0.8021
mode 1 (baseline mode)	NN	0.0618	0.0872	0.7953
	SVR	0.0597	0.0845	0.8078
	CNN	0.0634	0.0878	0.7915
mode 2	NN	0.0552	0.0802	0.8271
	SVR	0.0560	0.0795	0.8299
	CNN	0.0562	0.0792	0.8312
mode 3	NN	0.0528	0.0766	0.8421
	SVR	0.0569	0.0794	0.8302
	CNN	0.0528	0.0739	0.8530

The evaluation metrics in Table 6 show the same overall trends as the previous debutanizer column experiments. The results for mode 3 are generally the best among the ablation experiments (the only exception is the MAE of SVR, which is marginally lower in mode 2, 0.0560, than in mode 3, 0.0569). The results for AE-only and mode 1 are essentially the same, and the results for mode 2 are slightly better than both of them. Since the experimental results on both datasets follow the same trend, we do not plot and analyze the industrial steam volume results in further detail.

Conclusions

This paper discusses the feasibility of a data augmentation strategy that combines the autoencoder network with the nonlinear autoregressive moving average model. A CADA-CNN soft sensor model is also designed, and the effectiveness of the strategy and the model is validated by experiments on an industrial debutanizer column and by ablation testing of the CADA strategy on an industrial steam volume dataset. The experimental results show that the proposed CADA strategy substantially improves the prediction performance of the subsequent regression model, and that the proposed CADA-CNN model has a smaller prediction error and a better error distribution in mode 3.

Subject to the requirements mentioned in this paper, the proposed CADA strategy is combined only with the proven fourth-order nonlinear autoregressive moving average model, and it may combine more effectively with other methods. Moreover, we use only the autoencoder network, and more efficient networks may be able to take its place. The strategy validated in this paper also offers the possibility of further exploration in other areas. For example, the CADA strategy could be useful in classification problems, where autoencoder networks have many research applications.
