
Output-Related and -Unrelated Fault Monitoring with an Improvement Prototype Knockoff Filter and Feature Selection Based on Laplacian Eigen Maps and Sparse Regression.

Cuiping Xue1, Tie Zhang1, Dong Xiao2.   

Abstract

In the process industry, monitoring output-related faults is an important step toward ensuring product quality and improving economic benefits. To distinguish the influence of the input variables on the output more accurately, this paper introduces a fault-unrelated block partition subalgorithm into the prototype knockoff filter (PKF) algorithm to improve it. The improved PKF algorithm divides the input data into three blocks: a fault-unrelated block, an output-related block, and an output-unrelated block. Removing the data of the fault-unrelated block greatly reduces the difficulty of fault monitoring. For the output-unrelated block, this paper proposes a feature selection algorithm based on Laplacian Eigenmaps and sparse regression. The algorithm can detect faults caused by variables with a small contribution to the variance, and its descent property is proved theoretically. The output-related block is monitored by the Broyden-Fletcher-Goldfarb-Shanno (BFGS) method. Finally, the effectiveness of the proposed fault detection method is verified on the widely recognized Tennessee Eastman process data.
© 2021 The Authors. Published by American Chemical Society.


Year:  2021        PMID: 34056237      PMCID: PMC8153765          DOI: 10.1021/acsomega.1c00506

Source DB:  PubMed          Journal:  ACS Omega        ISSN: 2470-1343


Introduction

Fault monitoring is the key to ensuring the long-term stable operation of industrial production processes. Among the various fault detection methods, data-driven fault detection has attracted much attention. Partial least squares (PLS)[1] and principal component analysis (PCA) are widely used, and many algorithms are derived from these basic algorithms.[2−9] Based on their impact on the product output, faults can be divided into output-related and output-unrelated faults. An output-related fault means that when some components of the input variables deviate from their normal value range for some reason, the values of the corresponding output variables are affected and also deviate from their normal range. Conversely, an output-unrelated fault means that when some components of the input variables deviate from their normal range, the output variables are not affected and remain within their normal range. There are many output-related fault-monitoring methods. The basic approach is to extract from the input data matrix a low-dimensional loading matrix that represents the data information and to project the loading matrix orthogonally onto the direction of the output variables to obtain the output-related information. The problem is that each loading vector is a linear combination of all input variables: even though the occurrence of a fault can be monitored, it is very difficult to locate the specific variables responsible for it. Therefore, this paper puts forward the idea of dividing the input data into blocks, based not on the layout of the industrial process but on the influence of the input data matrix on faults and on the product output. The prototype knockoff filter (PKF) algorithm[10] divides the input variables into output-related and -unrelated blocks only, which is not enough, because when a fault occurs it generally does not affect the values of all input variables.
The values of some variables do not fluctuate before and after the fault; they cannot provide any useful information for fault monitoring, and they complicate the numerical solution. After the occurrence of a fault is confirmed, the subsequent fault diagnosis must determine the location of the fault, and the fewer the variables selected, the better. In this paper, an improved PKF (IPKF) algorithm is proposed that divides the input data matrix into three blocks: a fault-unrelated block, an output-related block, and an output-unrelated block. The flowchart of the IPKF-BFGS-FLMSR method proposed in this paper is shown in Figure 1.
Figure 1

Flowchart of the proposed IPKF-BFGS-FLMSR approach.

The PLS method establishes a monitoring model by maximizing the covariance between the input and output variables. By projecting the input and output variables into corresponding low-dimensional spaces, orthogonal eigenvectors of the input and output variables can be obtained. The feature vectors emphasize the explanation of the output variables by the input variables, which not only reduces the dimension of the variables but also eliminates useless noise. From this property, it is easy to see that PLS has a weak ability to detect faults that are unrelated to the output. When the input data contain information orthogonal to the output, it has a negative impact on the predicted output. In addition, the PLS algorithm is easily disturbed by singular points, so it is not well suited to nonlinear regression. Here, we want to keep the advantages of the PLS method while remedying its shortcomings. The PLS algorithm belongs to the family of conjugate direction methods in optimization. The BFGS method is a quasi-Newton algorithm independently proposed by Broyden, Fletcher, Goldfarb, and Shanno in 1970 for solving unconstrained optimization problems; it is also a conjugate direction method. The BFGS algorithm not only satisfies the conjugate direction property but is also the quasi-Newton algorithm with the best numerical stability so far. It combines the advantages of conjugate direction methods with those of quasi-Newton methods: fast convergence and few iterations. The BFGS algorithm has global convergence and a superlinear convergence rate.
Therefore, the BFGS algorithm is widely used in various research fields, for example, to generate high-fidelity harmonics,[11] to classify files,[12] and to train the weights of neural networks.[13] Because the BFGS algorithm has more advantages than the PLS algorithm, this paper uses it for the fault detection of output-related blocks to improve detection efficiency. The PCA method selects principal components for dimension reduction in order of decreasing variance, so variables with a small variance contribution have almost no influence on the principal components. Therefore, when faults occur in variables with a small variance contribution, PCA can hardly detect them, and the detection efficiency is very low. Motivated by this phenomenon, an important dimension reduction method called feature selection has appeared. Feature selection can remove redundant features from the original data and process large amounts of high-dimensional data in a short time.[14] Its advantages are that it identifies representative features of the original data set, which makes further calculations easier; it allows subspaces to be classified, which enhances the robustness of an algorithm to noise;[15] and it prevents overfitting.[16] These advantages have aroused great interest among many scholars, and many feature selection methods have been created.[15−19] In addition, feature selection methods are widely used in fields such as biomedicine,[20] commodity recommendation and safety monitoring,[21] speech recognition,[22] text mining,[23] and so on.
Feature selection methods fall into three categories: supervised,[24] semisupervised,[16,25,26] and unsupervised.[27−29] Among them, supervised feature selection offers high accuracy,[30] and semisupervised feature selection can be used when the information is incomplete and only part of the label information is available. Unsupervised feature selection has no prior information available, so it requires large-scale computation. Inspired by the above methods, this paper proposes a new supervised feature selection method for data dimension reduction (FLMSR). In this method, the influence on the principal components of variables with a small variance contribution is increased by adding weights, so that faults caused by these variables can be monitored in time. A joint feature selection framework is introduced into the objective function. The framework integrates the local geometric features of the data preserved by Laplacian Eigenmaps; the idea of the linear discriminant analysis (LDA) algorithm, which increases the distance between data groups, reduces the distance within data groups, and improves the data recognition ability; and L2,1-norm regularization. Furthermore, the update rules of the algorithm and the proof of its descent property are given. The FLMSR model is used to monitor the faults of output-unrelated blocks; simulation experiments show that the proposed algorithm overcomes the high false alarm rate of the PCA algorithm for faults caused by variables with a small variance contribution and has a stronger fault detection capability. The main contributions of this paper are as follows. The PKF algorithm is improved, and the idea of dividing the variables into three blocks is put forward to reduce the difficulty of fault location. FLMSR is proposed to reduce the dimension of the data.
The method integrates Laplacian Eigenmaps, the LDA algorithm, and L2,1-norm regularization into a brand-new feature selection framework, which not only preserves the geometric structure of the data but also improves its recognition ability. The update rules and the descent proof of the algorithm are given. The quasi-Newton BFGS algorithm, with the best numerical stability, is applied directly to fault detection for the first time. The rest of this paper is organized as follows. The next section briefly reviews the PKF algorithm and explains the problem description, the algorithm, and the optimization process. The convergence of the algorithm is then studied. The proposed IPKF-BFGS-FLMSR method is then applied to the operation evaluation of the TEP to verify its effectiveness. The final part gives the conclusion and outlines future research work.

Proposed Methodology

Review

This paper uses the knockoff filter algorithm to divide the input data into disjoint blocks and to detect faults in each of them. The knockoff filter is a variable selection method that controls the false discovery rate.[31] After its proposal, the authors extended it to high-dimensional data.[32] In the same year, sparse regression and marginal testing were used to create cluster prototypes.[33] Then, the knockoff filter for group selection was proposed.[10] The basic idea of the knockoff filter is to construct, according to certain rules, a knockoff matrix with the same dimensions and covariance structure as the original data matrix, and then to form a large augmented matrix together with the original matrix. The output variables are regressed sparsely on the augmented matrix; the variables of the original data matrix with nonzero regression coefficients are the output-related variables defined in this paper, and the remaining variables are output-unrelated. The reason for group selection is that the columns of the original data are strongly correlated and cannot satisfy the nonsingularity requirement. Therefore, the columns are grouped to ensure strong correlation within groups and weak correlation between groups. A representative is then selected for each group, and the knockoff filter is run on the data matrix composed of the group representatives. If a group representative is output-related, the whole group consists of output-related variables; the other variables are output-unrelated.

Output-Related and -Unrelated Fault Monitoring Based on IPKF-BFGS-FLMSR

Some variables do not change before and after a fault; they provide no useful information for fault detection and instead create numerical obstacles. Therefore, the following algorithm is proposed to select this part of the variables (see Table 1). Here, the Pauta Criterion (3σ rule) from statistics is used to judge the fault data.
Table 1

Create Fault-Unrelated Block Subalgorithm

algorithm: create fault-unrelated block
1. Input normal data X0 ∈ R^(N×m) and one fault data set X1 ∈ R^(n×m), where N, n, and m represent the number of normal samples, fault samples, and variables, respectively. Compute the column mean vector μ and the column standard deviation vector σ of the normal data X0.
2. Let s(i) denote the number of entries in the ith column of the fault data X1 whose values lie outside the interval (μ(i) – 3σ(i), μ(i) + 3σ(i)) (i = 1, 2, ..., m); all s(i) constitute a vector s.
3. Variables whose component of s is smaller than k (k ≪ n) are defined as fault-unrelated variables, and the subscripts of the fault-unrelated variables are put into the fault-unrelated block C3.
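This subalgorithm can be sketched as follows (a minimal illustration; the function and variable names are not from the paper):

```python
import numpy as np

def fault_unrelated_block(X0, X1, k):
    """Pauta (3-sigma) screening: return indices of variables that stay
    inside the normal-data 3-sigma band in the fault data."""
    mu = X0.mean(axis=0)        # column means of normal data
    sigma = X0.std(axis=0)      # column standard deviations
    # s(i): number of fault samples outside (mu_i - 3 sigma_i, mu_i + 3 sigma_i)
    outside = np.abs(X1 - mu) > 3 * sigma
    s = outside.sum(axis=0)
    C3 = np.where(s < k)[0]     # fault-unrelated variable subscripts
    C = np.where(s >= k)[0]     # remaining variables
    return C3, C
```

A variable whose fault samples almost never leave the 3σ band of the normal data carries no fault information, so it is screened out before the PKF step.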
For the data with the fault-unrelated variables deleted, we are concerned only with whether the remaining variables have an impact on the output. If the remaining variables are divided into output-related and output-unrelated parts, subsequent fault location becomes easier. Combining the create fault-unrelated block subalgorithm with the knockoff filter algorithm, the following IPKF algorithm is given (see Table 2), which divides the input variables of the original data into three disjoint blocks: the fault-unrelated block, the output-related block, and the output-unrelated block.
Table 2

IPKF Algorithm

algorithm: improvement prototype knockoff filters
1. Input normal data X0 and fault data X1.
2. Run the create fault-unrelated block subalgorithm to get the invariant variable subscript block C3 and the remaining variable subscript block C.
3. Run the PKF algorithm on the fault data corresponding to block C.
4. Blocks C1 and C2 are determined by the regression coefficients obtained in step 3: C1 is the subscript block of the output-related variable components, and C2 is the subscript block of the output-unrelated variable components.
Because the types of faults and the ways of dividing the blocks differ, the following fault detection model is given for the ith fault (see Table 3).
Table 3

Offline Submodel

model: offline submodel of ith fault
1. Input the normal data X0 and the fault data Xi ∈ R^(n×m) of the type-i fault together as training data, where n and m represent the number of fault samples and variables, respectively.
2. Use the prototype knockoff filter on the data Xi after removing the invariant variables. The variables are further divided into an output-related part Ci1 and an output-unrelated part Ci2. For convenience, we write the three blocks as Ci1, Ci2, and Ci3. The corresponding data blocks of X0 and Xi are denoted as X01, X02, X03, Xi1, Xi2, and Xi3.
3. (1) Use the BFGS algorithm on the data block composed of X01 and Xi1 to get PBi, the loading matrix of X01, and the thresholds Jth,T2,Bi and Jth,SPE,Bi of the corresponding statistics TBi2 and SPEBi.
(2) Use the FLMSR algorithm on the data block composed of X02 and Xi2 to get PFi, the loading matrix of X02, and the thresholds Jth,T2,Fi and Jth,SPE,Fi of the corresponding statistics TFi2 and SPEFi.
Here, the statistics T2 and SPE used in fault detection are calculated for a sample (row vector) x as

T2 = x P Λ^(−1) Pᵀ xᵀ,  SPE = ‖x(I – PPᵀ)‖²

where P is the loading matrix and Λ is the diagonal matrix of the variances of the retained components. When a significance level α is given, the threshold of the T2 statistic is

Jth,T2 = [l(n² – 1)/(n(n – l))] F_α(l, n – l)

where n is the number of sampled samples and l is the number of retained principal components, and the threshold of the SPE statistic is

Jth,SPE = g χ²_(h,α),  g = ρ²/2μ,  h = 2μ²/ρ²

where μ and ρ² are the mean and variance of the SPE statistic of the samples, respectively. Next, for each existing fault data block, the offline submodel is run, and the way of block partition, the loading matrices, and the fault control limits are collected to establish a general fault database (see Table 4).
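The statistics and their control limits can be sketched as follows, assuming PCA-style loadings obtained from the training data by a singular value decomposition (the function name and decomposition choice are illustrative, not the paper's exact procedure):

```python
import numpy as np
from scipy import stats

def monitoring_statistics(X_train, x_new, n_components, alpha=0.01):
    """Sketch of the T2 / SPE statistics and their control limits for
    one new sample, given loadings P fitted on the training data."""
    n = X_train.shape[0]
    mean = X_train.mean(axis=0)
    Xc = X_train - mean
    # loading matrix P: leading right singular vectors of the training data
    _, svals, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_components].T
    lam = (svals[:n_components] ** 2) / (n - 1)   # retained component variances

    x = x_new - mean
    t2 = x @ P @ np.diag(1.0 / lam) @ P.T @ x     # Hotelling T2
    spe = np.sum((x - x @ P @ P.T) ** 2)          # squared residual norm

    # T2 control limit from the F distribution
    l = n_components
    t2_lim = l * (n**2 - 1) / (n * (n - l)) * stats.f.ppf(1 - alpha, l, n - l)
    # SPE control limit from the moment-matched chi-square approximation
    spe_train = np.sum((Xc - Xc @ P @ P.T) ** 2, axis=1)
    mu, rho2 = spe_train.mean(), spe_train.var()
    g, h = rho2 / (2 * mu), 2 * mu**2 / rho2
    spe_lim = g * stats.chi2.ppf(1 - alpha, h)
    return t2, spe, t2_lim, spe_lim
```

A sample is flagged when its T2 or SPE value exceeds the corresponding limit.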
Table 4

Total Offline Model

model: total offline model
Assuming that there are k fault data blocks, establish a fault database:
For i = 1:k
Execute the offline submodel on the fault data Xi to obtain Ci1, Ci2, Ci3, PBi, PFi, Jth,T2,Bi, Jth,SPE,Bi, Jth,T2,Fi, and Jth,SPE,Fi, which are recorded in the fault database.
End
During online monitoring, create a fault-unrelated block for the newly generated sample points and compare it with the fault-unrelated blocks in the fault database to determine the fault type to which it belongs. Then determine whether the values of its statistics exceed the corresponding control limits and judge whether a fault has occurred. If the obtained block differs from every known fault-unrelated block, a new fault has occurred, and it is necessary to collect the fault data and update the fault database. The following steps are given for online monitoring (see Table 5).
Table 5

Online Monitoring Process

model: online monitoring process
1. For each new observation xnew, collect the components xinew that fall within the interval (μ(i) – 3σ(i), μ(i) + 3σ(i)) (i = 1, 2, ..., m) to form the subscript block Cnew. Compare Cnew with Ci3 of each fault data block to judge the fault type i to which xnew belongs.
2. Using the load matrix PBi and PFi of the corresponding ith fault, calculate four statistics and then compare them with the corresponding threshold Jth,T2,Bi, Jth,SPE,Bi, Jth,T2,Fi, Jth,SPE,Fi.
(1) If T2B,new > Jth,T2,Bi, then an output-related fault occurs, and the fault location appears in the variables contained in Ci1. If speB,new > Jth,SPE,Bi, an output-unrelated fault occurs, and the fault location appears in the variables contained in Ci1.
(2) If one of the two inequalities T2F,new > Jth,T2,Fi and speF,new > Jth,SPE,Fi holds, then an output-unrelated fault occurs, and the fault location appears in the variables contained in Ci2.
3. If it is judged in step 1 that some variables of xnew are out of the value range, but they are not consistent with the existing fault types, the fault data can be further collected to form a new fault data matrix, and the offline submodel can be executed to update the fault database.
To illustrate the effectiveness of the algorithm, the following three evaluation indices are introduced:

FDR = (number of alarmed fault samples)/(total number of fault samples) × 100%
FAR = (number of alarmed normal samples)/(total number of normal samples) × 100%
SDR = (number of correctly judged samples)/(total number of samples) × 100%

Here, FDR represents the monitoring result on the fault samples, FAR represents the monitoring result on the normal samples, and SDR represents the monitoring result on all samples. These three indices are used to evaluate the performance of the algorithm. In the above procedure, the BFGS algorithm should be applied to the fault detection of the output-related variables, but it cannot be applied to fault detection directly by itself. Therefore, the BFGS algorithm is adapted and introduced for fault diagnosis in this paper. The objective function constructed from the input matrix X and the output matrix Y is

f(b) = (1/2)‖Xb – Y‖²

and its gradient is ∇f(b) = XᵀXb – XᵀY; we write g = ∇f(b). The detailed steps of the BFGS algorithm in fault detection are given in Table 6.
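The three evaluation indices introduced above can be sketched as follows (the alarm and label arrays are hypothetical placeholders for a monitoring run):

```python
import numpy as np

def evaluation_indices(alarms, labels):
    """FDR, FAR, SDR from boolean alarm decisions and true labels
    (True = fault sample). Names and array layout are illustrative."""
    alarms = np.asarray(alarms, dtype=bool)
    labels = np.asarray(labels, dtype=bool)
    fdr = alarms[labels].mean()          # alarmed share of fault samples
    far = alarms[~labels].mean()         # alarmed share of normal samples
    correct = np.where(labels, alarms, ~alarms)
    sdr = correct.mean()                 # correctly judged share of all samples
    return fdr, far, sdr
```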
Table 6

BFGS Algorithm for Fault Diagnosis

algorithm: BFGS
1. Give the initial point b0 ∈ R^l, the initial quasi-Newton matrix H0 ∈ R^(l×l), and the termination tolerance ε > 0; set k ≔ 0.
2. If ‖gk‖ ≤ ε, stop and output the optimal solution bk.
3. Calculate dk = −Hkgk.
4. Calculate wk = dk/‖dk‖, the score vector tk = Xwk, and the loading vector pk = Xᵀtk/(tkᵀtk), where tk is the kth column of TB and pk is the kth column of PB.
5. Find the step factor αk by line search and set bk+1 = bk + αkdk.
6. Update Hk to produce Hk+1 by the BFGS formula

Hk+1 = (I − sk ykᵀ/(ykᵀsk)) Hk (I − yk skᵀ/(ykᵀsk)) + sk skᵀ/(ykᵀsk)

which makes the quasi-Newton condition Hk+1yk = sk hold, where yk = gk+1 − gk and sk = bk+1 − bk.
7. Set k ≔ k + 1 and return to step 2.
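A minimal sketch of the steps above on the least-squares objective f(b) = ½‖Xb − Y‖², with an illustrative backtracking line search standing in for the paper's step-factor search:

```python
import numpy as np

def bfgs_fault_model(X, Y, n_iter, eps=1e-8):
    """Minimal BFGS sketch on f(b) = 0.5*||Xb - Y||^2, collecting the
    score vectors t_k = X w_k and loading vectors p_k of Table 6."""
    n, m = X.shape
    b = np.zeros(m)
    H = np.eye(m)                       # initial quasi-Newton matrix
    G, c = X.T @ X, X.T @ Y
    g = G @ b - c                       # gradient of f at b
    T, P = [], []
    f = lambda b_: 0.5 * np.sum((X @ b_ - Y) ** 2)
    for _ in range(n_iter):
        if np.linalg.norm(g) <= eps:
            break
        d = -H @ g                      # quasi-Newton search direction
        w = d / np.linalg.norm(d)
        t = X @ w                       # score vector
        T.append(t)
        P.append(X.T @ t / (t @ t))     # loading vector
        # backtracking (Armijo) line search: an illustrative choice
        alpha = 1.0
        while f(b + alpha * d) > f(b) + 1e-4 * alpha * (g @ d):
            alpha *= 0.5
        b_new = b + alpha * d
        g_new = G @ b_new - c
        s, y = b_new - b, g_new - g
        if y @ s > 1e-12:               # BFGS update keeps H positive definite
            rho = 1.0 / (y @ s)
            I = np.eye(m)
            H = (I - rho * np.outer(s, y)) @ H @ (I - rho * np.outer(y, s)) \
                + rho * np.outer(s, s)
        b, g = b_new, g_new
    return b, np.array(T).T, np.array(P).T
```

On this quadratic objective the iterates converge quickly, and the collected score/loading vectors play the role of TB and PB.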
For a newly measured sample xnew, calculate the statistics

T2B,new = xnew PB Λ_B^(−1) PBᵀ xnewᵀ,  speB,new = ‖xnew(I – PBPBᵀ)‖

and compare their values with the corresponding control limits to judge whether there is a fault. For the output-unrelated part, a new FLMSR (feature selection based on Laplacian Eigenmaps and sparse regression) method is proposed in this paper. A joint feature selection framework is introduced into the objective function: it combines the local geometric features of the data preserved by Laplacian Eigenmaps with the idea of the LDA algorithm, which increases the distance between data groups, reduces the distance within groups, and improves the data recognition ability, and it adds an L2,1-norm regularization term to increase the sparsity of the variables. The FLMSR algorithm is introduced as follows. Given that X is the normal data and Y is the data synthesized from normal data and fault data in a ratio of 1:1 and standardized by the mean and variance of the normal data, the constrained minimization problem is defined with the centering matrix C = I – (1/n)11ᵀ, 1 = (1, 1, ..., 1)ᵀ. The regularization term tr(WᵀYᵀ(Lw – Lb)YW) in the objective function preserves the geometric structure of the data and enhances the discriminative capability; α is a positive regularization parameter, and the L2,1-norm regularization term β‖W‖2,1 is introduced to enforce sparsity among the rows of the transformation matrix W. The Gaussian similarity matrix of the sample Y is defined as Uij = exp(−‖Yi – Yj‖²/σ²), where Yi and Yj are the ith and jth rows of the matrix Y; from it, the intraclass similarity matrix and the interclass similarity matrix (and hence the Laplacians Lw and Lb) are obtained. Using the external penalty function method, the augmented objective function is written with a penalty factor μ (large enough that the algorithm converges) and a Lagrange multiplier ϕ constraining W to be nonnegative.
The optimization proceeds alternately. Considering W to be fixed, the augmented objective can be rewritten as a function L1(P) of P, and its first and second derivatives with respect to P are computed; from the second derivative a diagonal matrix D is defined (with Pk the kth column of P), an auxiliary function of F1(P) is created with D, and minimizing the auxiliary function yields the multiplicative update rule for P. Here, wi is the ith row of the matrix W. Considering P (and U) to be fixed, the first and second derivatives of F2(W) with respect to W are calculated; from the second derivative a diagonal matrix D1 is defined (with wk the kth column of W), an auxiliary function of F2(W) is created with D1, and minimizing it yields the update rule for W. The FLMSR algorithm obtained from the above analysis is given in Table 7.
Table 7

FLMSR Algorithm

algorithm: FLMSR
1. Input the data X ∈ R^(n×t) and Y ∈ R^(2n1×t); assign initial values to the parameters α, β, σ, and μ; set the maximum iteration number K and the iteration counter k = 1.
2. Assign random initial matrices with entries in (0, 1) to P and W, and calculate Lw, Lb, and U.
3. Update the matrix P.
4. Update the matrix W.
5. If the maximum iteration number K is reached, stop the iteration; otherwise, set k = k + 1 and return to step 3.
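The similarity and Laplacian matrices of step 2 can be sketched as follows, assuming the standard Gaussian kernel and graph Laplacian forms (the labeling convention, marking each row of Y as normal or fault, is illustrative):

```python
import numpy as np

def similarity_laplacians(Y, labels, sigma):
    """Gaussian similarity matrix U and the intra-/inter-class graph
    Laplacians Lw, Lb. `labels` marks each row of Y by its class."""
    # pairwise squared distances between rows of Y
    sq = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=2)
    U = np.exp(-sq / sigma**2)
    same = labels[:, None] == labels[None, :]
    Uw = U * same                        # intraclass similarity
    Ub = U * ~same                       # interclass similarity
    Lw = np.diag(Uw.sum(axis=1)) - Uw    # Laplacian L = D - U
    Lb = np.diag(Ub.sum(axis=1)) - Ub
    return U, Lw, Lb
```

Both Laplacians are symmetric with zero row sums, which is what makes the trace term tr(WᵀYᵀ(Lw − Lb)YW) act as a within-class pull and between-class push.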
The loading matrix obtained by the FLMSR algorithm above is denoted as PF. For a newly measured sample xnew, calculate the statistics T2F,new and speF,new = ‖xnew(I – PFPFᵀ)‖ and compare their values with the corresponding control limits to judge whether there is a fault.

Convergence of FLMSR

The convergence of the FLMSR algorithm is discussed as follows.

Theorem 1

The minimization problem in eq 12 is nonincreasing under the update rule for P.

Proof

According to the definition and the construction method of the auxiliary function in ref (34), the function G(P, P′) defined above is an auxiliary function of the function F1(P). By Lemma 1 of ref (34), minimizing the auxiliary function G(P, P′) yields the update rule, which satisfies L1(P^(k+1)) ≤ G(P^(k+1), P^(k)) ≤ G(P^(k), P^(k)) = L1(P^(k)). Since L(P, W) = L1(P) for fixed W, it follows that L(P^(k+1), W) ≤ L(P^(k), W).

Theorem 2

The minimization problem in eq 16 is nonincreasing under the update rule for W. Similar to the proof of Theorem 1, the function G1(W, W′) is an auxiliary function of the function F2(W). Minimizing the auxiliary function G1(W, W′) yields the update rule, which satisfies F2(W^(k+1)) ≤ G1(W^(k+1), W^(k)) ≤ G1(W^(k), W^(k)) = F2(W^(k)), and hence L(P, W^(k+1)) ≤ L(P, W^(k)) for fixed P. Combining Theorems 1 and 2 gives L(P^(k+1), W^(k+1)) ≤ L(P^(k+1), W^(k)) ≤ L(P^(k), W^(k)), which shows that the algorithm proposed in this paper is a descent iterative algorithm.

Results and Discussion

The TEP was first developed by Downs and Vogel (J. J. Downs, 1993) and has become a benchmark platform for validating process monitoring and diagnosis techniques. It contains five major units (a reactor, a vapor–liquid separator, a product condenser, a recycle compressor, and a product stripper). Furthermore, there are 41 measured variables, XMEAS (1–41), and 12 manipulated variables, XMV (1–12), in the TE process. The simulation data blocks used in this paper are widely recognized in process monitoring. Using the preprogrammed faults (faults 1–21), 21 testing blocks were generated. Fault 0 (with no faults) was generated under normal operating conditions (NOC). The testing data block for each fault contains 960 observations. Each data block starts without faults, and the fault occurs after the 160th sample. For process monitoring, the 41 measurable variables XMEAS (1–41) and 11 manipulated variables XMV (1–11) are selected to construct the data vector. Among them, the process measurements XMEAS (1–34, 36–41) and the manipulated variables XMV (1–11) together constitute the input matrix X, and XMEAS (35), which indicates the product quality of component G, represents the output matrix Y. The simulation parameters are selected as follows: FLMSR, PLS, and PCA take the number of principal components according to a cumulative contribution rate of 80%; the number of iterations of BFGS is 80% of the number of input data columns; and KICA retains 12 independent components. The false alarm rates are maintained at the same level (confidence 99%) when comparing FAR and FDR in the different circumstances throughout the study. First, the fault detection performances of KICA, IPKF-PLS-PCA, and IPKF-BFGS-FMLSR are judged by distinguishing whether the fault has an influence on the output or not. The simulation results for the faults IDV (5), IDV (10), and IDV (14) by KICA, IPKF-PLS-PCA, and IPKF-BFGS-FMLSR are shown in Figures 2–4, respectively.
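The data layout described above can be sketched as follows, assuming the 53 columns of a TEP data block are ordered XMEAS (1–41) followed by XMV (1–12) (the column ordering and function name are illustrative):

```python
import numpy as np

def split_tep_data(data):
    """Split a TEP data block into the input matrix X and output Y.
    Assumes columns 0-40 are XMEAS (1-41) and columns 41-52 are
    XMV (1-12); XMEAS (35) (column index 34) is the quality output."""
    xmeas = data[:, :41]
    xmv = data[:, 41:52]
    # input: XMEAS (1-34, 36-41) plus XMV (1-11) -> 34 + 6 + 11 = 51 columns
    X = np.hstack([xmeas[:, :34], xmeas[:, 35:41], xmv[:, :11]])
    Y = xmeas[:, 34:35]          # XMEAS (35): product quality of component G
    return X, Y
```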
Figure 2

Fault detection results of IDV (5) by three methods. (a) Fault detection on Y, (b) KICA, (c) IPKF-PLS-PCA, and (d) IPKF-BFGS-FMLSR.

Figure 4

Fault detection results of IPKF-BFGS-FMLSR for IDV (14). (a) Fault detection on Y and (b) IPKF-BFGS-FMLSR.

IDV (5) is a step change in the condenser cooling water inlet temperature. Figure 2a shows the influence on the output caused by fault IDV (5), which takes effect from the 161st sample; the output returns to a steady state after the 480th sample. It can be seen from Figure 2b,c that IDV (5) has only a slight influence on the output and that the detected fault disappears after a period of time. The result of the BFGS block in Figure 2d indicates that IDV (5) influences the output only for a very limited time. The result of the FLMSR block indicates that the TE process is still in the abnormal operation state defined by IDV (5), even though the output is no longer affected. The IPKF-BFGS-FLMSR method proposed in this paper is the only method that provides correct monitoring for IDV (5). IDV (10) is a random disturbance in the C feed temperature of stream 2. It can be seen from Figure 3a that IDV (10) is a fault only slightly related to the output. Figure 3b–d describes the detection results of the three methods for IDV (10). It can be seen from the figure that only the FLMSR algorithm detects the fault near the 161st sample point in time. Compared with the other methods, the fault diagnosis rate is obviously improved, which further proves that IPKF-BFGS-FMLSR has good fault detection capability.
Figure 3

Fault detection results of IDV (10) by three methods. (a) Fault detection on Y, (b) KICA, (c) IPKF-PLS-PCA, and (d) IPKF-BFGS-FMLSR.

IDV (14) is the sticking of the reactor cooling water valve. Figure 4a shows that IDV (14) is an output-unrelated fault. Fault detection for IDV (14) was carried out using IPKF-BFGS-FMLSR, and the output-related block was found to be empty: after the fault-unrelated block is removed, all variables become output-unrelated variables. The monitoring results are shown in Figure 4b. It can be seen that the IPKF-BFGS-FMLSR method produces no false alarms, and the fault diagnosis rate reaches 100%. Therefore, the IPKF-BFGS-FMLSR method tracks output-unrelated faults much better, which further proves that it can identify output-related and output-unrelated faults correctly. IDV (19) is an unknown fault. Figure 5a shows that IDV (19) is also an output-unrelated fault. To verify the monitoring effect of the FLMSR algorithm on faults of different sizes, we take the deviation of the fault data from the mean of the normal data as the fault size and construct monitoring data of different fault sizes by scaling it. Figure 5b–d shows the monitoring results for 0.5 times, 1.0 times, and 1.5 times the original fault, respectively. It can be seen from the results that FLMSR is highly sensitive to the size of the fault, and all fault data can be monitored if the fault is slightly enlarged; the fault monitoring effect is very good. Figure 5e,f shows the monitoring results for the original fault data and 5 times the fault data when the PCA algorithm acts on the same data block. It can be seen from the figure that although amplifying the fault has a certain effect on fault detection, PCA is far less sensitive to the fault size than the FLMSR algorithm.
Figure 5

Fault detection results of IPKF-BFGS-FMLSR and PCA for IDV (19). (a) Fault detection on Y, (b) 0.5 times the fault, (c) 1.0 times the fault, (d) 1.5 times the fault, (e) 1.0 times the fault, and (f) 5 times the fault.

Table 8 lists the monitoring results of the different methods. The first 21 rows give the fault detection rates, and row 22 gives the average fault detection rate. Here, whether a fault is output-related is defined strictly according to whether the output exceeds the Pauta Criterion control limit. The first block of the table contains the output-related faults. From the corresponding columns of the BFGS and PLS algorithms in this block, it can be seen that the detection rate of the BFGS algorithm is higher than that of the PLS algorithm except for IDV (10). In the whole first block, except for IDV (7), IDV (8), and IDV (21), the highest detection rate is given by the FLMSR algorithm proposed in this paper. The second block contains the output-unrelated faults. From the corresponding columns, it can be seen that the fault detection rate of the proposed FLMSR algorithm is higher than that of the PCA algorithm at a cumulative contribution rate of 80%. Except for IDV (3) and IDV (9), the highest detection rate is given by the proposed FLMSR algorithm. KICA is a very good fault detection algorithm, yet the fault detection rate of the proposed IPKF-BFGS-FLMSR is better than that of KICA in most cases. In addition, the lowest false alarm rates of IPKF-BFGS-FLMSR can be observed in the last row of the table. Generally speaking, comparing the fault detection rates and false alarm rates shows that IPKF-BFGS-FLMSR is superior to the other two methods.
Table 8

FDRs of the 21 Faults in the TE Benchmark (%)

Columns T²_B/SPE_B and T²_F/SPE_F belong to IPKF-BFGS-FLMSR (BFGS on the output-related block, FLMSR on the output-unrelated block); columns T²_pls/SPE_pls and T²_pca/SPE_pca belong to IPKF-PLS-PCA; the last two columns belong to KICA.

| Fault   | T²_B  | SPE_B | T²_F  | SPE_F | T²_pls | SPE_pls | T²_pca | SPE_pca | T²    | SPE   |
|---------|-------|-------|-------|-------|--------|---------|--------|---------|-------|-------|
| 1       | 36.25 | 8.25  | 100   | 99.50 | 36.50  | 36.25   | 99.25  | 99.75   | 99.63 | 99.86 |
| 2       | 98.25 | 95.25 | 98.75 | 98.75 | 69.63  | 81.75   | 98.75  | 97.00   | 98.63 | 98.75 |
| 5       | 23.25 | 12.13 | 100   | 34.50 | 19.00  | 19.00   | 23.63  | 21.50   | 22.13 | 26.88 |
| 6       | 99.00 | 98.75 | 100   | 100   | 94.75  | 94.75   | 99.13  | 100     | 100   | 100   |
| 7       | 100   | 97.63 | 43.63 | 46.00 | 32.50  | 32.50   | 44.75  | 24.00   | 100   | 100   |
| 8       | 97.75 | 89.88 | 97.75 | 96.88 | 80.75  | 80.75   | 96.25  | 75.38   | 99.25 | 98.38 |
| 10      | 23.86 | 1.38  | 90.25 | 63.13 | 30.13  | 30.13   | 31.63  | 34.38   | 63.00 | 86.88 |
| 12      | 70.75 | 83.75 | 96.88 | 94.50 | 67.88  | 67.88   | 93.25  | 64.75   | 94.88 | 95.88 |
| 13      | 95.25 | 90.63 | 96.25 | 95.88 | 83.00  | 83.00   | 94.16  | 88.63   | 95.50 | 95.38 |
| 16      | 10.13 | 2.63  | 90.25 | 15.88 | 6.13   | 6.13    | 6.88   | 19.75   | 24.88 | 19.38 |
| 17      | 15.63 | 0.50  | 97.38 | 92.00 | 13.50  | 13.00   | 80.88  | 94.75   | 82.75 | 96.00 |
| 18      | 89.25 | 23.75 | 90.88 | 90.25 | 87.88  | 88.25   | 89.16  | 90.25   | 24.00 | 25.25 |
| 20      | 17.88 | 2.00  | 91.25 | 67.25 | 1.25   | 1.25    | 45.63  | 47.63   | 44.00 | 57.88 |
| 21      | 52.50 | 32.50 | 4.16  | 1.00  | 23.63  | 23.63   | 16.75  | 1.75    | 45.75 | 39.38 |
| 3       | 1.63  | 1.38  | 5.63  | 3.00  | 1.88   | 1.88    | 1.16   | 1.63    | 13.50 | 0.38  |
| 4       | 3.36  | 2.50  | 100   | 99.88 | 9.13   | 9.38    | 100    | 10.38   | 69.25 | 100   |
| 9       | 0.63  | 0.88  | 6.25  | 0.75  | 0.875  | 0.88    | 0.75   | 1.25    | 19.13 | 20.50 |
| 11      | 5.00  | 0.50  | 82.25 | 69.38 | 19.38  | 17.25   | 53.88  | 66.25   | 46.63 | 77.13 |
| 14      | 0     | 0     | 100   | 100   | 0      | 0       | 100    | 99.88   | 99.38 | 100   |
| 15      | 25.63 | 1.00  | 15.00 | 8.75  | 24.25  | 24.25   | 10.25  | 1.50    | 2.75  | 3.50  |
| 19      | 0     | 0     | 89.38 | 19.63 | 0      | 0       | 21.63  | 8.63    | 18.13 | 51.88 |
| AVG     | 41.24 | 30.73 | 76.00 | 61.76 | 33.43  | 33.90   | 57.51  | 49.95   | 60.15 | 66.35 |
| AVG-FAR | 0.744 | 0.804 | 0.595 | 0.655 | 0.863  | 0.833   | 0.714  | 0.863   | 5.149 | 2.708 |
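The FDR and FAR figures in Table 8 follow the usual definitions: the fraction of samples that raise an alarm after (resp. before) the fault is introduced, with the control limit set by the Pauta (3σ) criterion mentioned above. A minimal sketch of this computation, on a synthetic monitoring statistic (the data and the 160-sample fault onset are illustrative, not taken from the TE benchmark):

```python
import numpy as np

def control_limit(stat_normal):
    # Pauta criterion: 3-sigma control limit estimated from fault-free data
    return stat_normal.mean() + 3.0 * stat_normal.std()

def fdr_far(stat, limit, fault_start):
    # FDR: percentage of alarms after the fault is introduced;
    # FAR: percentage of alarms before it
    alarms = stat > limit
    fdr = 100.0 * alarms[fault_start:].mean()
    far = 100.0 * alarms[:fault_start].mean()
    return fdr, far

rng = np.random.default_rng(0)
normal = rng.normal(0.0, 1.0, 500)                  # fault-free data for the limit
stat = np.concatenate([rng.normal(0.0, 1.0, 160),   # in-control samples
                       rng.normal(4.0, 1.0, 800)])  # mean shift after sample 160
limit = control_limit(normal)
fdr, far = fdr_far(stat, limit, fault_start=160)    # high FDR, near-zero FAR
```

Averaging such per-fault FDRs over the 21 faults yields the AVG row, and the FAR column over fault-free runs yields the AVG-FAR row.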
We further compare the SDR of the IPKF-BFGS-FLMSR method with that of the KICA method in Figure 6. The SDR of the IPKF-BFGS-FLMSR method is obtained by taking the maximum of the SDRs of BFGS and FLMSR. Over the 21 faults, the IPKF-BFGS-FLMSR method outperforms KICA on every fault except fault 9 and fault 12, and even for those two faults the detection performance is very close. Taking fault 5 as an example, the SDR of the IPKF-BFGS-FLMSR method is 0.998 while that of KICA is 0.406, i.e., about 2.46 times that of KICA. For fault 18, the SDR of IPKF-BFGS-FLMSR is 0.926 while that of KICA is 0.383, about 2.4 times that of KICA. The average SDR of the IPKF-BFGS-FLMSR method is 0.845 against 0.689 for KICA, about 1.2 times that of KICA. This shows that the FLMSR method has stronger data recognition ability. The SDR results further illustrate the superior performance of the proposed IPKF-BFGS-FLMSR method in detecting process faults.
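The combination rule stated above (take the better of the two sub-monitors' SDRs) and the fault-5 ratio can be sketched as follows; the BFGS-side value 0.742 is a made-up placeholder, since the text only quotes the combined SDR:

```python
def combined_sdr(sdr_bfgs, sdr_flmsr):
    # the overall SDR is the maximum of the two sub-monitors' SDRs
    return max(sdr_bfgs, sdr_flmsr)

ours = combined_sdr(0.742, 0.998)  # 0.998 is the fault-5 figure quoted in the text
kica = 0.406                       # KICA's SDR for fault 5
ratio = ours / kica                # about 2.46, as reported above
```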
Figure 6

SDR histograms of the detection results of IPKF-BFGS-FLMSR and KICA for all 21 faults.

Conclusions

In this paper, a new output-related and -unrelated monitoring scheme based on the modeling strategies of the IPKF, BFGS, and FLMSR algorithms is proposed. The proposed scheme first uses IPKF to divide the input variables into fault-unrelated blocks, output-related blocks, and output-unrelated blocks, thereby separating output-related from output-unrelated changes. The output-related blocks are monitored by the BFGS algorithm, which offers better numerical stability, and the output-unrelated blocks are monitored by the FLMSR algorithm proposed in this paper. On TEP data, the IPKF-BFGS-FLMSR method is compared with the IPKF-PLS-PCA method and the state-of-the-art KICA method. The results show that the proposed IPKF-BFGS-FLMSR method achieves better results in both output-related and output-unrelated fault monitoring. When a fault occurs in the industrial process, the algorithm can greatly narrow the fault search range: instead of considering all variables, only the output-related or output-unrelated blocks need to be examined to find the fault root. However, locating the specific faulty variables needs further study.
  8 in total

1.  Discriminative semi-supervised feature selection via manifold regularization.

Authors:  Zenglin Xu; Irwin King; Michael Rung-Tsong Lyu; Rong Jin
Journal:  IEEE Trans Neural Netw       Date:  2010-06-21

2.  Sparse regression and marginal testing using cluster prototypes.

Authors:  Stephen Reid; Robert Tibshirani
Journal:  Biostatistics       Date:  2015-11-27       Impact factor: 5.899

3.  Minimum redundancy feature selection from microarray gene expression data.

Authors:  Chris Ding; Hanchuan Peng
Journal:  J Bioinform Comput Biol       Date:  2005-04       Impact factor: 1.122

4.  A feature selection method for multivariate performance measures.

Authors:  Qi Mao; Ivor Wai-Hung Tsang
Journal:  IEEE Trans Pattern Anal Mach Intell       Date:  2013-09       Impact factor: 6.226

5.  Incremental learning for ν-Support Vector Regression.

Authors:  Bin Gu; Victor S Sheng; Zhijie Wang; Derek Ho; Said Osman; Shuo Li
Journal:  Neural Netw       Date:  2015-04-06

6.  Nonnegative Matrix Factorization with Rank Regularization and Hard Constraint.

Authors:  Ronghua Shang; Chiyang Liu; Yang Meng; Licheng Jiao; Rustam Stolkin
Journal:  Neural Comput       Date:  2017-08-04       Impact factor: 2.026

7.  A Robust Regularization Path Algorithm for ν-Support Vector Classification.

Authors:  Bin Gu; Victor S Sheng
Journal:  IEEE Trans Neural Netw Learn Syst       Date:  2016-02-24       Impact factor: 10.451

