Chang Li1, Zhe Zhou1, Chenglin Wen2, Zuxin Li3. 1. School of Engineering, Huzhou University, Huzhou 313000, China. 2. School of Automation, Guangdong University of Petrochemical Technology, Maoming 525000, China. 3. School of Science and Engineering, Huzhou College, Huzhou 313000, China.
Abstract
Independent component analysis (ICA) is an excellent latent variables (LVs) extraction method that can maximize the non-Gaussianity between LVs to extract statistically independent latent variables and which has been widely used in multivariate statistical process monitoring (MSPM). The underlying assumption of ICA is that the observation data are composed of linear combinations of LVs that are statistically independent. However, the assumption is invalid because the observation data are always derived from the nonlinear mixture of LVs due to the nonlinear characteristic in industrial processes. Under this circumstance, the ICA-based fault detection is unable to provide accurate detection for specific faults of industrial processes. Since the observation data come from the nonlinear mixing of LVs, this makes the observation data change faster than the intrinsic LVs on the time scale. The temporal slowness can be regarded as an additional criterion in the extraction of LVs. The slow feature analysis (SFA) derived from the temporal slowness has received extensive attention and application in MSPM in recent years. Simultaneously, the temporal slowness is expected to make up for the problem that the LVs extracted by ICA have difficulty accurately describing the characteristics of the process. To solve the above problems, this work proposes to monitor non-Gaussian and nonlinear processes using the independent slow feature analysis (ISFA) that combines statistical independence and temporal slowness in extracting the LVs. When the observation data are composed of a nonlinear mixture of LVs, the extracted LVs of ISFA can describe the characteristics of the processes better than ICA, thereby improving the accuracy of fault detection for the non-Gaussian and nonlinear processes. The superiority of the proposed method is verified by a numerical example design and the Tennessee-Eastman process.
Independent component analysis (ICA) is an excellent latent variables (LVs) extraction method that can maximize the non-Gaussianity between LVs to extract statistically independent latent variables and which has been widely used in multivariate statistical process monitoring (MSPM). The underlying assumption of ICA is that the observation data are composed of linear combinations of LVs that are statistically independent. However, the assumption is invalid because the observation data are always derived from the nonlinear mixture of LVs due to the nonlinear characteristic in industrial processes. Under this circumstance, the ICA-based fault detection is unable to provide accurate detection for specific faults of industrial processes. Since the observation data come from the nonlinear mixing of LVs, this makes the observation data change faster than the intrinsic LVs on the time scale. The temporal slowness can be regarded as an additional criterion in the extraction of LVs. The slow feature analysis (SFA) derived from the temporal slowness has received extensive attention and application in MSPM in recent years. Simultaneously, the temporal slowness is expected to make up for the problem that the LVs extracted by ICA have difficulty accurately describing the characteristics of the process. To solve the above problems, this work proposes to monitor non-Gaussian and nonlinear processes using the independent slow feature analysis (ISFA) that combines statistical independence and temporal slowness in extracting the LVs. When the observation data are composed of a nonlinear mixture of LVs, the extracted LVs of ISFA can describe the characteristics of the processes better than ICA, thereby improving the accuracy of fault detection for the non-Gaussian and nonlinear processes. The superiority of the proposed method is verified by a numerical example design and the Tennessee-Eastman process.
Process failures are usually inevitable in modern industrial processes;
the occurrence of faults may affect the quality of products, the working
efficiency, and service lives of industrial equipment and even endanger
the life safety of staff in severe cases. Therefore, it is crucial
to monitor whether the process is abnormal and detect and locate the
faults in time.The data-driven multivariate statistical process
monitoring (MSPM)
approach only uses the data collected under normal operation condition
and does not need much mechanism knowledge of the process. This approach
has a strong adaptability and has attracted more attention from the
academic community, making it still retain high activity in recent
years.[1,2] With the development of sensor technology,
the measurable process variables become more diversified, which makes
the observation data have characteristics of high dimensionality,
strong correlation, nonlinearity, and non-Gaussian. Because of the
significant correlation of process variables and high redundancy of
information, the main variations in the processes are usually dominated
by a few latent variables (LVs), and the dimensions of the dominating
LVs are often far lower than the actual dimensions of the process
variables. Therefore, it is only necessary to monitor the dominating
LVs to determine whether the processes are abnormal.[3] For this reason, some typical data-driven LV models, such
as principal component analysis (PCA), have received extensive attention
and significant progress in MSPM.[4,5]Traditional
PCA assumes that the LVs are statistically uncorrelated
and follow the Gaussian distribution.[6] However,
these assumptions are usually invalid in industrial processes, such
as chemical and biochemical processes. The LVs often follow a non-Gaussian
distribution in chemical processes, and the traditional PCA-based
process monitoring method has a low fault detection rate (FDR) in
this situation. To solve this problem, independent component analysis
(ICA) was proposed to be used in MSPM. ICA can extract the LVs that
are statistically independent as far as possible from the observation
data and through monitoring the variations of LVs to determine whether
the process is faulty. Traditional PCA only uses second-order information
(mean and variance), which can only ensure that the extracted LVs
(i.e., loading vectors) are uncorrelated but not independent. Uncorrelation
is a necessary and insufficient condition for independence. ICA further
uses high-order information (i.e., skewness and kurtosis) on the basis
of PCA. According to the Central Limit Theorem, the higher the non-Gaussian
degree of the variable, the more statistically independent.[7] The same number of independent variables contain
more information than dependent variables. The assumption that ICA
requires data to follow the non-Gaussian distribution is more in line
with the actual situation of the industrial processes. Therefore,
ICA can provide more accurate monitoring results than PCA in this
situation.Another implicit assumption of ICA is that the observation
data
are linearly combined by the statistically independent LVs. In actual
industry, however, such assumptions are difficult to satisfy because
of the strong nonlinearity of the process variables. Suppose x1, x2 are two statistically
independent components and their any nonlinear functions ε1(x1) and ε2(x2) are also statistically independent. Furthermore,
a nonlinear mixture of x1 and x2, such as x1
sin(x2) or x1 cos(x2), is still statistically
independent.[8] This indicates that when
the observation data are composed of a nonlinear mixture of LVs, the
ICA may only extract the nonlinear function of LVs or the nonlinear
mixed form of multiple LVs. For example, the LVs extracted by ICA
may be ε1(x1) and ε2(x2) but not x1 and x2. To solve this problem,
it is necessary to add additional constraints besides the independence
to extract more suitable LVs for process monitoring.Since the
observation data come from the nonlinear mixing of LVs,
this makes the observation data change faster than the intrinsic LVs
on the time scale. The temporal slowness can be regarded as an additional
criterion in the extraction of LVs. Slow feature analysis (SFA) is
a novel unsupervised LVs extraction method that can extract slowly
varying LVs from temporal data[9] and has
been used in blind source separation, pattern recognition, remote
sensing, and image processing.[10−12] SFA also has been concerned and
favored by scholars in MSPM.[13−22] Shang et al. proposed to apply SFA in process monitoring, through
the analysis of experimental data, the results show that SFA can both
describe the steady state and the dynamic state of the process and
has improved the interpretation ability in terms of temporal coherence
compared to that with classical data-driven methods.[23] Shang et al. proposed combining the fault diagnosis method
based on SFA with contribution plots, which can accurately locate
the fault location and find out the variations of other LVs caused
by the fault.[24] Shang et al. provided a
recursive SFA to adaptively monitor industrial processes.[25] Zhang and Zhao applied SFA to monitoring batch
processes.[26] Zhao et al. proposed a condition-driven
data analytics and monitoring method for wide-range nonstationary
and transient continuous processes, which made full use of SFA to
extract static and dynamic temporal characteristics under different
operation conditions.[27] The temporal slowness
can be regarded as a suitable criterion for LVs extraction.Blaschke and Wiskott proposed that statistical independence can
be combined with the temporal slowness to obtain a new method of LVs
extraction, i.e., independent slow feature analysis (ISFA).[28] Sprekeler and Wiskott showed in ref (29) that the eigenvalue equation
of the optimal function of SFA can be decomposed into a set of harmonic
eigenvalue separation problems, each of which is only related to one
of the statistically independent signal sources. They studied the
structure of harmonics, and the study showed that the slowest nonconstant
harmonics are a certain monotonic function of statistically independent
signal sources. They proved that the nonlinear transformation of statistically
independent signal sources can be regarded as some kind of coordinate
transformation. There is no difference between using SFA to extract
features from data after nonlinear mixing and using SFA to extract
features from signal sources. We will give an example to help readers
understand this claim better. Consider a cosine signal x1 = cos(t) and its quadratic
polynomial extension x2 = x12 = 0.5(1
+ cos(2t)). x2 vary more
quickly than x1 because of the frequency
magnification caused by squaring. In general, among any time-dependent
one-dimensional signal x(t) and
its nonlinear function forms ε(x(t)), the slowest varying with time is the signal x(t) itself or its invertible transformation.[29,30] Inspired by this, we propose to use ISFA in process monitoring in
this work. The ISFA can extract LVs that are not only statistically
independent but also vary slowly over time when the observation data
are composed of a nonlinear mixture of LVs. Compared with the LVs
extracted by ICA, those extracted by ISFA can better characterize
the characteristics of the observation data and thus obtain better
monitoring results.The main contributions are as follows:Both the
statistical independence and
temporal slowness are considered in extracting of the LVs from the
observation data. When the observation data are composed of a nonlinear
mixture of LVs, the ISFA can extract the LVs used for a nonlinear
mixture instead of their nonlinear form. The influence of nonlinearity
of observation data can be reduced, and more accurate monitoring results
can be obtained;To
the best of our knowledge, ISFA-based
process monitoring is proposed for the first time, and a complete
process monitoring model based on ISFA is established.This work is organized as follows: the basic principles
of ICA
and SFA are introduced in Section and Section . The introduction of ISFA and the mathematical principle
of the combining ICA and SFA are given in Section . The establishment of monitoring statistics
and control limits are introduced in Section . The conclusion will be drawn in Section . Finally, the superiority
of the proposed method is verified through a simulated multivariate
process and the Tennessee–Eastman process in Section .
Overview
of ISFA
In this section, ICA and SFA are first introduced,
respectively.
Then, the ISFA is introduced in detail and the feasibility of the
combination of ICA and SFA is explained. The objective function and
optimization procedure are given at the end of this section.
Independent Component Analysis
It
is assumed that the observation data are a vector set of N-dimensional data with zero mean. x(t) = [x1(t), ..., x(t)]T can be expressed as linear combinations of N unknown LVs, i.e., l(t) = [l1(t), ..., l(t)]T, and
is defined byThe purpose
of ICA is to estimate LVs
only from the observation data with unknown mixing matrix A and the unknown LVs l(t) = [l1(t), ..., l(t)]. The only assumption of ICA is that the observation data follow
a non-Gaussian distribution and the LVs are statistically independent
of each other. Then the optimization objective of ICA is to find a
matrix P that satisfies the components ofand are mutually statistically independent
or as independent as possible. u(t)
= Wx(t) is the whitening (or sometimes
called sphering) transformation commonly used in signal processing
and forms the whitened signal components u(t) with zero mean and unit variance.
The covariance matrix C = E(x(t)xT(t)) of the observation data x(t) is decomposed by singular value decomposition C = UΛUT and
the whitening matrix W = Λ–1/2U. After whitening, the
optimization problem of ICA is transformed into finding an orthogonal
matrix to make the extracted LVs as independent as possible. Matrix Q is an orthogonal matrix, as verified by the following relation:There are several
methods to obtain orthogonal matrix Q; here we use the
technique introduced by ref (31), which used second-order
statistics and verified its validity. The objective function can be
written aswhere C((τ) is an entry of the correlation matrix
related to time delay
and is defined as follows:where τ
is the time delay, ⟨...⟩
denotes averaging over time,[31,32] and C((τ) can be defined correspondingly.The objective function ΓICAτ can be intuitively understood as minimizing
the square sum of the off-diagonal terms of the correlation matrix
related to time delay; that is, the correlation matrix related to
time delay is diagonalized as far as possible. Setting several time
delays can have better robustness.
Slow
Feature Analysis
SFA is a novel
method that can extract slow varying LVs from the observed time series
signals. It was initially used in object recognition and pattern recognition.
In recent five years, it has been applied to process monitoring and
has achieved good results. It is assumed that the observation data
are a vector set of N-dimensional data with zero
mean x(t) = [x1(t), ..., x(t)]T. The objective
of SFA is to find a set of nonlinear input–output functions h(x) = [h1(x), ..., h(x)], such that the components of s(t) = h(x(t)) are varying
as slow as possible.[30] We use the square
of the first derivative with respect to time to measure the variance
of time. The smaller the value, the slower it varies, and vice versa.
The objective function can be written asunder the constraintsConstraints and 9 help avoid the trivial
solution u(t) = constant. Constraint ensures that different components of s(t) carry different information, instead of simply copying
each other.[9] To solve the nonlinear problem,
the optimization procedure is divided into two steps: expand the input
signal x(t) nonlinearly and treat the
problem linearly in the expanded high-dimensional space. This is a
common technique to solve a nonlinear problem. x(t) is an N-dimensional input signal, g(t) = ε(x(t)) is an M-dimensional signal
after nonlinear expansion, and ε is a nonlinear
expansion function. The commonly used nonlinear expansion form is
a polynomial expansion, such as binomial expansion:where ε0T is a constant vector. The dimension
of binomial expansion is M = N + N(N + 1)/2; the value corresponds to the
mean of each dimension so that the mean of each dimension is zero.After obtaining the nonlinear expanded signal g(t), we can handle the optimization problem of SFA linearly
in high-dimensional space. The input–output functions h(x) can be written aswhere P is an M × M matrix
to be calculated.In order to simplify the optimization procedures,
the nonlinear
expanded signal g(t) is whitened (sphered)
to obtain the whitened signal u(t) = Wg(t), so the mean value of each component u(t) is zero,
and they are uncorrelated to each other. Matrix W is
a whitening matrix as in normal ICA. Then we get y(t) = Qu(t) = QWg(t) = Pg(t) = h(x(t)); the optimization problem
of SFA is transformed into finding an orthogonal matrix Q so that the components in y(t) vary
as slowly as possible over time. From constraint , the proof that matrix Q is
an orthogonal matrix is as follows:The objective function of SFA can be further
transformed subject to maximization:The objective function
ΓSFAτ can be intuitively understood as maximizing
the sum of squares of the diagonal terms of the correlation matrix
related to time delay. Since SFA makes u̇(t) ≈ u(t+1) – u(t) approximately, the value of the time
delay τ can only take 1. The reasoning steps are described in
detail in ref (33).
Independent Slow Feature Analysis
The final
objective functions of ICA and SFA described above have
a high degree of similarity. Therefore, they were combined to become
a new method called independent slow feature analysis.[28,30] It is assumed that the observation data are a vector set of N-dimensional data with zero mean x(t) = [x1(t),
..., x(t)]T. First, the input signal x(t) is nonlinearly expanded and one obtains g(t) = ε(x(t)), ε being the nonlinear expansion function and g(t) being an M-dimensional
signal with zero mean in each component. Then, whitening of g(t) is done to obtain the whitened signal u(t) = Wg(t). Finally, the ISFA is applied on the whitened signal u(t), and an M × M matrix Q is obtained. The output signal y(t) can be written asThe first R components y1(t), ..., y(t) are statistically
independent and vary slowly over time; these R components
are called independent slow features (ISFs). The last M – R components vary faster than the previous R components and may not be independent of each other. Although
the last L – R components
are irrelevant to the final result, they are still essential in the
subsequent optimization procedure. In general, the dimension of the
statistically independent components is much smaller than the number
of the remaining components. The optimization objective of ISFA can
be written as a minimization objective function:The objective function of ISFA is the linear
combination of the objective function of ICA in eq and the objective function of SFA in eq . ωICA and ωSFA represent the weights of the ICA part
and the SFA part respectively, which determine that either statistical
independence or temporal slowness plays a leading role in ISFA. In
this work, we believe these two are equally important and set the
value of these two parameters to 1. TICA is the set of time delay and κICAτ is the set of weighting factors
for different time delays in the objective function of ICA. The value
of κICAτ determines the importance of different time delays in the ICA objective
function. The objective function of ICA is connected with the objective
function of ISFA by minus sign because the optimization objective
of ICA is to minimize the objective function and the optimization
objective of SFA is to maximize the objective function. Therefore,
the optimization objective of ISFA is to minimize eq . The objective function of ISFA
guarantees the statistical independence and the temporal slowness
of the extraction results.In general, we have an N-dimensional input signal x(t) and
obtain an M-dimensional
signal u(t) after nonlinear expanding
and whitening of x(t). Then an orthogonal
matrix Q is obtained by minimizing the objective function
of ISFA. The objective function of ISFA can be intuitively understood
as maximizing the sum of the first R diagonal items
of the correlation matrix related to time delay while diagonalizing
the correlation matrix related to time delay as much as possible.
Successive Givens rotation is a good choice for optimization procedure
because of its intuitive understanding and low computational cost.[34] Givens rotation is a rotation transformation
that selects two parameters μ and ν and rotates the vector
counterclockwise in radians around the origin in the (μ,ν)
plane. The rotation matrix has the following form:The objective function of ISFA based on Givens rotation can
be
written aswhere y′ are the instantaneous
signals in the step of rotation procedures. The method of calculating
rotation matrix Q was described in detail in ref (30). After obtaining matrix Q, ISFs y1(t),
..., y(t) that are independent of each other and vary slowly can be obtained,
yielding C((τ) = QC((τ)QT.
Process mMonitoring with ISFA
When
using ISFA for process monitoring, it is necessary to use
the extracted ISFs to establish the monitoring statistics and confidence
limits, respectively. In this work, two new process monitoring statistics
are established on the basis of the characteristics of the extracted
ISFs, and confidence limits are estimated on the basis of the probability
distribution of the monitoring statistics. To the best of our knowledge,
this is the first time that a complete process monitoring model is
given on the basis of ISFA.
Process Monitoring Statistics
with ISFA
As mentioned above, the first R components y1(t), ..., y(t) of the
ISFs y(t) are considered as ISFs and
are used
to monitor the dominating part of the process. The last M – R components are usually ignored as noise.
The demixing matrix is defined as B = QW; then it can be obtained from eq :The Euclidean
norm of each row of demixing
matrix B, also known as the 2-norm, is calculated and
it is sorted in descending order. d rows are selected
with the largest Euclidean norm in descending order to form a new
matrix B (dominant part
of B). The remaining rows form the matrix B (excluded part of B). y(t) = Bε(x(t)) calculated by matrix B are the dominant independent slow features, which
is similar to the principal component subspace part of PCA. y(t) = Bε(x(t)) calculated by matrix B is the residual part, which is similar to the residual
subspace of PCA.There are three process monitoring statistics
that have been proposed
in ref (35) when using
ICA for process monitoring: I2, I2, and SPE, where I2 and I2 are used as the process monitoring
statistics in this work. I2 is used to
monitor the subspace composed of dominant ISFs y(t). When the process varies,
this subspace is likely to vary accordingly. The mathematical meaning
of I2 is the dot product of the ISFs y(t) at time t and is defined as follows:Another process monitoring statistic I2 is also very important and is used to monitor
the residual subspace
composed of y(t). The additional statistic can not only improve the accuracy of
fault detection and provide more convincing results but also compensate
for the lack of information loss due to the incorrect number selection
of ISFs and then resulting in a low FDR. The mathematical meaning
of I2 is the dot product of the y(t) that forms the
residual subspace at time t and is defined as follows:In summary, through performing ISFA on the observation data x(t), we can obtain two process monitoring
statistics I2 and I2.
Control Limits for Monitoring Statistics
With the establishment of process monitoring statistics under normal
operations, it is necessary to establish control limits to determine
whether the process deviates from the normal operations. In PCA- and
SFA-based monitoring methods, such as Hotelling’s T2, SPE and S2 are all effective
tools with good results. However, the assumption of these process
monitoring statistics is that the LVs extracted follow a Gaussian
distribution. Therefore, probability density functions of the process
monitoring statistics follow a certain distribution, such as χ2 distribution, F distribution, etc.[23,36,37] This is also one of the reasons for possible
low FDRs and high false alarm rates (FARs) of these methods. The same
as ICA, the precondition for data processing of ISFA is that the data
must follow a non-Gaussian distribution, which is more in line with
the characteristics of complex industrial process data structure.
Since the probability distribution of the process monitoring statistics
obtained by calculation of such data hardly meets the known distribution
form, the probability density function cannot be obtained directly.
In this work, we propose to use kernel density estimation (KDE) to
estimate the probability density functions of I2 and I2 under normal operations.KDE, also called Parzen window, is used to estimate the unknown probability
density function and is one of the nonparametric estimation methods.
Since KDE does not use any prior knowledge and assumptions about the
data distribution, it is a method to study the data distribution only
from the data sample itself, so it has been widely used in statistics.
A univariate KDE with kernel K is defined as follows:[38,39]where x is the set
of parameter
points to be estimated, x is the ith observation value in the observation
data, h is the window width, also known as the smoothing
parameter, and K is the selected kernel function.
The value of the window width h directly affects
the final result of the KDE. If the value of h is
too large, the curve of the probability density function will be too
smooth, which will result in missing details in the data. If the value
of h is too small, the curve of the probability density
function will become sharp and sensitive to outliers. It not only
cannot estimate the correct probability density function but also
brings difficulties to fitting the probability density function. The
kernel function K must satisfy the following conditions:There are several kernel functions to choose from; the most
commonly
used Gaussian kernel function is selected in this work:More details of KDE are described in ref (40).Since most of the
observation data we obtain are discrete data
in the industrial processes and the probability density estimated
by KDE are also discrete values, it is impossible to divide the confidence
bounds directly. Therefore, we do a curve fitting to the estimated
density values and get the expression of the probability density function F(x). The control limit xlim with a confidence level of α can be obtained
by solving a variable upper limit definite integral of the following
formula:The advantage of
using kernel density estimation to calculate the
control limits is that it does not need to make any assumptions about
the distribution of the observation data. Because of fitting industrial
process data more realistically, the probability density function
of process monitoring statistics can be estimated more accurately,
thereby obtaining more accurate process monitoring results than traditional
process monitoring statistics such as Hotelling’s T2.
Process Monitoring with
ISFA
Process
monitoring with ISFA is divided into two parts: the off-line modeling
and the online detection. The flowchart of the process monitoring
is shown in Figure , and the details are as follows.
Figure 1
Process monitoring flowchart based on
ISFA.
Process monitoring flowchart based on
ISFA.Off-line modeling:Compute the nonlinearly
expanded signal g(t) = ε(x(t)) using x(t) under
normal operations. Whiten the data after nonlinear expansion and obtain
the whitening matrix W and the whitened data u(t).Apply ISFA to u(t), and obtain the
rotation matrix Q and the
demixing matrix B = QW.According to the Euclidean norm of
each row of matrix B, select the rows of those with the
first d largest Euclidean norm to form the matrix B; the remaining rows form the
matrix B.Calculate the process monitoring statistics I2 and I2 and calculate
the control limits separately according to the confidence level.Online monitoring:For an online observation
data xnew, do the same nonlinear expansion gnew = ε(xnew) and subtract the mean of the training data g(t).Calculate the ISFs separately, y=Bgnew and .Calculate the process
monitoring statistics andIf Inew2 ≤ Ilim2 and , the process is normal. If either Inew2 or exceeds its corresponding control limit,
the process is considered to have a fault.
Conclusion
In this work, we proposed to apply
ISFA to monitor the nonlinear
and non-Gaussian industrial process. By extracting ISFs as LVs, we
solve the problem that when the observed variable is a nonlinear mixture
of LVs, statistical independence is not a sufficient criterion to
extract the LVs that can describe the characteristics of the process.
We established corresponding process monitoring statistics and used
kernel density estimation to estimate the control limits without any
restriction on the probability distribution of process monitoring
statistics. The experimental results of the proposed method in a designed
numerical example and the TE process strongly support the theoretical
analysis. Our future research direction is how to use the LVs extracted
by ISFA to establish dynamic monitoring statistics and realize the
use of ISFA to monitor dynamic and multimode processes.[27]
Case Study
In this
part, four process monitoring methods, including PCA-based,
ICA-based, dynamic ICA-based, and the proposed ISFA-based, are compared
by using a designed numerical example and the TE process to verify
the superiority of the proposed method.
Numerical
Example
Consider the following
numerical model of a multivariate dynamic process. This model is a
modified version based on a numerical model proposed by ref (6), which is widely used in
process monitoring.where u is the correlated input:where w is a nonlinear mixture
of latent variables sig1 and sig2,This is quite an extreme nonlinearity;[41]sig1 and sig2 are
functions related to t,The output y is equal
to z plus a Gaussian
noise vector v. Each element of v has zero
mean and a variance of 0.1. The input signal u and output
signal y can be measured, but the remaining z and w are not. u and y that
can be measured from the input variables x(t) = [y(t)Tu(t)T]T. The training set consists
of 200 samples of x(t) under normal
operations, and the mean of each variable of x(t) is scaled to zero after data preprocessing. In this multivariate
dynamic process, we mix the LVs in an extreme nonlinear form, and
artificially add faults to the process by changing the LVs. We use
PCA, ICA, and ISFA to monitor the process, respectively.The
traditional PCA not only assumes that the sequence is independent
between the current moment and the historical moment for the same
LV but also assumes that different LVs are independent of each other.
However, such assumptions are difficult to satisfy in the actual industrial
processes, because the data in the industrial process have not only
cross-correlation but also autocorrelation. The assumption of traditional
ICA for extracting LVs from observation data is that the observation
data are the linear combination of LVs. Such assumptions are also
often difficult to satisfy in real industrial processes. In this multivariate
dynamic process, the assumptions of neither traditional PCA nor the
traditional ICA are satisfied.The PCA used here is the well-known
traditional PCA, the process
monitoring statistics used by PCA are Hotelling’s T2 and SPE, and the number of selected
principal components (PCs) is 4. There are several methods for extracting
statistically independent LVs of the ICA. In this case, we use the
most mature and widely used FastICA proposed by ref (7), and the number of selected
independent components (ICs) is 2. We set T = 2 and R = 5 in ISFA optimization; the reason for R = 5 is to keep the same dimension as the observation data; and the
number of selected ISFs is 2. The confidence level for all three methods
is 95%.The number of samples in the test set is 500; the faults
setting
are as follows:Fault 1: A step change for the
latent variable sig1 by 0.4 is introduced at sample 50.
This is a relatively
incipient fault, and the fault information will be hidden in the noise.Fault 2: Add 0.05(t – 50)
to the latent variable sig1, where t equals 50 to 149. The latent variable linearly increases from the
50th to the 149th moment.The process monitoring results of
PCA and ICA for fault 1 are shown
in Figure . The green
curve in the figure represents the process monitoring statistics of
the samples under normal operations, the blue curve represents the
process monitoring statistics of the fault samples, and the red dotted
lines represent the control limits. It can be clearly seen from Figure that whether it
is PCA or ICA, there are many blue fault samples below and they fluctuate
up and down repeatedly near the control limits; fault 1 cannot be
detected accurately. This is because the artificially added fault
1 is a step change of sig1; compared with the sig1 itself, it only changes by 20%, the magnitude is small, and the
fault information is easily covered by noise after a series of changes.
The FDR of T2 in PCA is 52.55%, and the
FDR of SPE is 20.84%. The FDR of I2 in
ICA is 23.50%, and the FDR of I2 in ICA is
52.33%. This shows that the main fault information is contained in
the residual subspace, and ICA cannot effectively extract representative
LVs in this multivariate nonlinear process. The process monitoring
results of PCA and ICA for fault 2 are shown in Figure . The T2 can
only detect the fault stably after the LV linearly increases to a
significant change. The SPE fluctuates up and down near control limits
during the fault occurrence. Compared with PCA, ICA can provide better
detection results for fault 2, but there are still some fault samples
that blow the line of control limits. After complex nonlinear mixing,
the fault information is easily hidden by noise. Meanwhile, the observation
data that come from the nonlinear mixing of LVs do not satisfy the
preconditions of ICA. Therefore, when sig1 changes little,
it is difficult for both ICA and PCA to detect faults in time. Only
when the value of sig1 increases linearly to a larger
value can PCA and ICA identify the fault more stably. Because ICA
uses statistical independence to extract LVs, PCA uses correlation
and the non-Gaussian data are more suitable for ICA. Therefore, ICA
can provide more accurate monitoring results than PCA. The FDR of T2 and SPE in PCA are 19.73% and 13.08%, respectively.
The FDR of I2 and I2 in ICA are 67% and 78%, respectively.
Figure 2
Monitoring results of
the numerical example designed for fault
1 using (a) PCA and (b) ICA.
Figure 3
Monitoring
results of the numerical example designed for fault
2 using (a) PCA and (b) ICA.
Monitoring results of
the numerical example designed for fault
1 using (a) PCA and (b) ICA.Monitoring
results of the numerical example designed for fault
2 using (a) PCA and (b) ICA.The monitoring results of the ISFA for fault 1 are shown in Figure . The nonlinear expansion
part in the experiment chooses a cubic polynomial because nonlinear
expansion will increase the dimension of the data and cause a lot
of information redundancy; the singular value less than 0.001 is regarded
as noise and redundant information and will be discarded. Compared
with PCA and ICA, the FDRs of ISFA are significantly improved. The
FDRs of I2 and I2 in ISFA are 52.99% and 82.71%, respectively. Compared with PCA and
ICA, the accuracy rate given by ISFA is improved by about 30%. Fault
1 is a step change that has a minor change range and is easily concealed
by noise. ISFA can amplify the small change several times and differentiate
it from the data under normal operations. The monitoring results of
ISFA for fault 2 are shown in Figure . The FDRs of I2 and I2 in ISFA are 87% and 98%, respectively. Compared
with PCA and ICA, the accuracy of fault detection has been significantly
improved. Since the number of PCs selected is 4, the number of ICs
selected is 2 and the number of ISFs selected is 2. When fewer LVs
are selected, ISFA can still get higher accuracy in two cases than
the other two methods. This shows that, in this multivariable nonlinear
process, ISFA has a stronger ability to extract representative LVs,
and the statistics constructed using these extracted LVs can more
accurately monitor the variations of the entire process. Note that
the magnitudes of the process monitoring statistics are too large
due to nonlinear expansion. To facilitate observation, the ordinates
of Figure and Figure have taken their
corresponding logarithms.
Figure 4
Monitoring results of the numerical example
designed for fault
1 using ISFA.
Figure 5
Monitoring results of the numerical example
designed for fault
2 using ISFA.
Monitoring results of the numerical example
designed for fault
1 using ISFA.Monitoring results of the numerical example
designed for fault
2 using ISFA.The FARs of the three methods
for fault 1 and fault 2 in this numerical
example are shown in Table . It can be seen from Table that the FARs of our proposed method for fault 1 are
slightly higher than PCA by 2% and slightly lower than ICA. However,
the FDRs of our proposed method is much higher than PCA and ICA. Compared
to the 30% increase in FDRs, the 2% increase in FAR can be ignored.
The FARs of our proposed method for fault 2 are higher than PCA and
ICA. Although the FARs of our proposed method are about 10% higher
than that of PCA and ICA in fault 2, FDRs are about 20% higher than
that of PCA and ICA, which still reflect the superiority of our proposed
method.
Table 1
False Alarm Rates (%) of Each Method
for Numerical Example
PCA
ICA
ISFA
fault
T2 (%)
SPE (%)
I2 (%)
Ie2 (%)
I2 (%)
Ie2 (%)
1
2.04
0.00
4.08
6.12
0.00
4.08
2
2.04
4.08
2.75
5.00
8.00
16.50
In summary, the observation
data in this case are composed of a
nonlinear mixture of LVs and ISFA is significantly better than PCA
and ICA in the FDRs of two different types of faults. The experimental
results support the theoretical analysis.
TE Process
The Tennessee–Eastman
process is an open and challenging chemical model simulation platform
developed by Eastman Chemical Co. in the United States based on the
actual chemical reaction process.[42] TE
process data have the characteristics of strong correlation, nonlinearity,
non-Gaussian, and so on that modern industrial data have and are widely
used in process monitoring and fault diagnosis for testing complex
industrial processes.[43−45] In this part, the TE process data are used to compare
the monitoring performance of the proposed method with ICA and dynamic
ICA.We use approximations of negentropy[46] to quantify the variables of the TE process. The negentropy
of a variable with a Gaussian distribution is zero.[46] A non-Gaussian variable has its negentropy larger than
0.01. Hence, the variables of TE process are indeed non-Gaussian according
to Figure .
Figure 6
Approximations
of negentropy for 52 variables of TE process.
Approximations
of negentropy for 52 variables of TE process.The plant-wide control structure of the TE process is shown in Figure . TE process data
consist of 53 observed variables, including 12 manipulated variables
XMV(1–12), 22 process measurements XMEAS(1–22), and
19 composition measurements. The sampling interval of the TE process
is 3 min, the training set samples are obtained under 25 h of running
simulation, and the training set consists of 500 samples under normal
operations. The test set samples are obtained under 48 h of running
simulation, and the faults are introduced in the eighth hour. A total
of 960 observation samples are collected in the test set, and the
first 160 observations are nonfault data. There are 21 faults in the
TE process. The detailed introduction and resource download of the
TE process can be found on the Web site http://depts.washington.edu/control/LARRY/TE/download.html.
Since the 19 composition measurements are challenging to detect in
real time in the actual chemical process, the manipulated variable
XMV(12) stirring speed is not actually manipulated. This experiment
uses 11 manipulated variables XMV(1–11) and 22 process variables
XMEAS(23–41), a total of 33 variables as input variables.
Figure 7
Plant-wide
control structure of the TE process.
Plant-wide
control structure of the TE process.The training data are all preprocessed by zero mean. In this experiment,
the number of ICs selected by ICA is 9, the number of ICs selected
by dynamic ICA is 22, and the time delay is T = 2.
To provide a more reasonable comparison result, the number of ISFs
selected by the proposed ISFA is 22, the time delay T = 2 is the same as the dynamic ICA, and R = 33
maintains the same dimension as the original input variables.It is worth noting that it is easy to have data dimension explosion
and highly redundant information after nonlinear expanding. While
increasing the computational cost, highly redundant information will
also adversely affect the final result. The purpose of the nonlinear
expansion is to linearly solve nonlinear problems in a higher dimension
space. For industrial data such as the TE process, ISFA omits the
nonlinear expansion at the data preprocessing because the data have
the characteristics of nonlinearity and strong correlation. The confidence
level of each statistic of the three methods is selected as 99%.The FDRs for 21 types of faults are shown in Table . The highest FDRs for the same type of faults
are marked in bold. It can be seen from Table , except for faults 10 and 16, the FDRs of
the proposed method are higher than the other two comparison methods.
The FDR of ISFA for fault 10 is higher than that of dynamic ICA, only
2.63% lower than ICA; the FDR of ISFA for fault 16 is higher than
dynamic ICA, only 0.13% lower than ICA. In general, the performance
of ISFA in this case is better than traditional ICA and dynamic ICA.
When the same number of LVs are selected to construct process monitoring
statistics, the FDRs of ISFA are higher than that of dynamic ICA.
It further supports the conclusion that ISFA has a stronger ability
to extract appropriate LVs than the other two methods and that the
statistics established on the extracted LVs can more accurately monitor
the variations in the entire process.
Table 2
Fault Detection
Rates (%) of Each
Method for TE Process
ICA
dICA
ISFA
fault
I2 (%)
Ie2 (%)
I2 (%)
Ie2 (%)
I2 (%)
Ie2 (%)
1
99.50
99.75
99.50
99.12
99.88
99.88
2
98.00
98.25
97.38
93.25
99.25
95.50
3
0.00
6.25
0.00
0.00
16.75
3.38
4
48.75
100.00
99.62
0.25
100.00
99.38
5
100.00
100.00
100.00
100.00
100.00
100.00
6
100.00
100.00
100.00
100.00
100.00
100.00
7
97.62
100.00
100.00
82.88
100.00
100.00
8
94.25
98.12
91.88
64.25
98.62
95.00
9
0.00
3.75
0.00
0.00
13.62
1.63
10
61.38
89.75
78.62
61.38
87.12
85.38
11
45.12
72.12
55.00
16.88
81.88
68.12
12
98.12
99.88
99.88
97.25
99.75
99.88
13
94.50
95.25
95.00
88.62
95.62
94.25
14
99.88
99.88
99.75
29.38
100.00
98.38
15
0.00
13.62
1.38
0.00
24.38
4.13
16
57.38
92.38
77.75
73.75
92.25
87.88
17
86.25
96.38
94.50
80.88
94.50
96.50
18
89.38
90.00
90.00
89.38
91.50
89.62
19
41.75
90.38
66.38
17.88
92.75
79.75
20
69.50
90.25
67.38
58.50
90.38
88.62
21
37.50
61.00
22.38
11.62
63.88
51.88
av
67.57
80.81
73.16
55.49
82.96
78.05
Next, two representative faults of
different types will be selected,
and the superiority of ISFA for process monitoring will be analyzed
in detail. Fault 11 is random variation in reactor cooling water inlet
temperature. The variations in the temperature of the reactor cooling
water inlet will cause the reactor temperature XMEAS(9) to vary. At
this time, it is necessary to increase the reactor cooling water flow
rate XMV(10) to make the reactor temperature return to normal operations.
Compared with fault 4 (i.e., reactor cooling water inlet temperature)
having a step disturbance, random variations are more challenging
to be accurately detected.The monitoring results of ICA and
dynamic ICA on fault 11 are shown
in Figure , and the
monitoring results of ISFA on fault 11 are shown in Figure . The FDRs of I2 and I2 in ICA are 45.12%
and 72.12%, respectively. The FDR of I2 is lower than that of I2, and the process monitoring
statistics of the fault samples fluctuate rapidly and repeatedly near
the control limit. This shows that the information of fault 11 is
mainly contained in the residual subspace of the ICA. The FDRs of I2 and I2 in dICA are
55% and 16.88%, respectively. The FDRs of the two process monitoring
statistics are lower than that of ICA. This is because after dICA
expands the dimensions of the data, it causes redundancy of information,
which makes the effective information contained in the data less and
ultimately leads to low detection results. The FDR of I2 in ISFA is 81.88%, which is significantly better than
those of ICA and dICA. This shows that the fault information of fault
11 is mainly contained in the dominating subspace of ISFA, and ISFA
has a stronger ability to extract appropriate LVs for the TE process.
Figure 8
Monitoring
results of the TE process for fault 11 using (a) ICA
and (b) dICA.
Figure 9
Monitoring results of the TE process for fault
11 using ISFA.
Monitoring
results of the TE process for fault 11 using (a) ICA
and (b) dICA.Monitoring results of the TE process for fault
11 using ISFA.Fault 19 is an unknown fault;
the monitoring results of ICA and
dynamic ICA on fault 19 are shown in Figure , and the monitoring results of ISFA on
fault 19 are shown in Figure . The process monitoring statistic I2 in ICA and I2 and I2 in ISFA can distinguish between normal data samples and fault
data samples well. The FDRs of other process monitoring statistics
are very low, and the values of process monitoring statistics fluctuate
repeatedly around the control limit. The FDRs of I2 and I2 in ISFA are is 92.75%
and 79.75%, respectively. The comprehensive FDRs of ISFA are higher
than that of I2 in ICA, which is 90.38%.
The FDRs of I2 and I2 in ISFA show that most of the fault information is contained in
the dominating subspace of ISFA. It is further proved that when the
number of LVs extracted is the same, ISFA can extract LVs that contain
more process information and the extracted LVs can better describe
the characteristics of the process.
Figure 10
Monitoring results of the TE process
for fault 19 using (a) ICA
and (b) dICA.
Figure 11
Monitoring results of the TE process
for fault 19 using ISFA.
Monitoring results of the TE process
for fault 19 using (a) ICA
and (b) dICA.Monitoring results of the TE process
for fault 19 using ISFA.The average FARs of
the three methods for 21 types of faults in
the TE process are shown in Table . Compared with ICA, the FAR of the monitoring statistic I2 of our proposed method is increased by 5.98%,
and the FDR is increased by 15.39%. The FAR of the monitoring statistic I2 of our proposed method is increased by 0.30%,
and the FDR is decreased by 2.76%. On the monitoring statistic I2, the method we proposed failed to achieve
better monitoring results than ICA. According to our analysis, there
may be two reasons, which follow. On the one hand, the TE process
data may not have the complex nonlinearities considered by our proposed
method. This can be proved from the results of the numerical example.
The performance of our proposed method has been greatly improved in
numerical example. On the other hand, the feature extraction ability
of our proposed method is stronger than that of ICA. Therefore, most
of the useful process information is extracted into the dominating
subspace, and the residual subspace contains less process information.
Therefore, for the monitoring statistic I2, the performance is slightly worse than that of ICA. Compared with
dynamic ICA, the FAR of the monitoring statistic I2 of our proposed method is increased by 5.98%, and the
FDR is increased by 9.80%. The FAR of the monitoring statistic I2 of our proposed method is increased by 1.82%,
and the FDR is increased by 22.50%. Although FARs have increased compared
with dICA, considering the two indicators of FARs and FDRs, our proposed
method outperforms dICA in the TE process.
Table 3
False Alarm
Rates (%) of Each Method
for TE Process
method
monitoring
statistics
false alarm
rate (%)
ICA
I2
0
Ie2
1.52
dICA
I2
0
Ie2
0
ISFA
I2
5.98
Ie2
1.82
The performance of ISFA for process monitoring is
significantly
better than those of ICA and dICA. In addition, even for faults 3,
9, and 15, the FDRs of ISFA has increased by 10% compared to those
of ICA and dICA.