
Kernel principal component analysis (PCA) control chart for monitoring mixed non-linear variable and attribute quality characteristics.

Muhammad Ahsan, Muhammad Mashuri, Hidayatul Khusna.

Abstract

Products are commonly measured by two types of quality characteristics: variable characteristics, which are measured on a numerical scale, and attribute characteristics, which are categorical. Furthermore, in monitoring processes, the multivariate variable quality characteristics may have a nonlinear relationship. In this paper, the Kernel PCA control chart is applied to monitor mixed (attribute and variable) characteristics with a nonlinear relationship. First, the Average Run Length (ARL) is used to evaluate the performance of the proposed chart. The simulation studies show that the proposed chart can detect a shift in the process, and that the Radial Basis Function (RBF) kernel performs consistently across the cases studied. Second, the proposed chart is compared with the conventional PCA Mix chart. The results show that the proposed chart performs better in detecting small shifts in the process. Finally, the proposed chart is applied to monitor the well-known NSL KDD dataset. The proposed chart shows good accuracy in detecting network intrusions. However, its False Negative (FN) rate remains relatively high.
© 2022 The Author(s).


Keywords:  Hotelling's T2 chart; Kernel Density Estimation; Kernel PCA; Mixed quality characteristics; Nonlinearity

Year:  2022        PMID: 35706944      PMCID: PMC9189028          DOI: 10.1016/j.heliyon.2022.e09590

Source DB:  PubMed          Journal:  Heliyon        ISSN: 2405-8440


Introduction

Two types of control charts have been developed based on the monitored quality characteristics: attribute charts and variable charts. The variable control chart monitors variable quality characteristics (on an interval or ratio scale) such as length, temperature, or height (Montgomery, 2009). The attribute chart monitors attribute quality characteristics (on a categorical scale) (Ahsan et al., 2018). When the quality characteristics are correlated or cannot be monitored separately, multivariate control charts are used. There are three main types of multivariate variable control charts, namely Shewhart, multivariate exponentially weighted moving average (MEWMA), and multivariate cumulative sum (MCUSUM).
Product quality characteristics are not only gauged individually by attribute or variable characteristics but can also be monitored using a mixed scheme. To support such mixed monitoring, several works have studied mixed characteristics charts. A mixed scheme that combines the np chart with a chart for variable data has been proposed and performs well in monitoring mixed characteristics (Aslam et al., 2015). The mixed chart proposed by Aslam et al. (2015) was later compared with the Hybrid Exponentially Weighted Moving Average (HEWMA) chart (Aslam et al., 2016). A spatial-sign covariance matrix-based control chart has been proposed that integrates standardized ranks and spatial signs in calculating the mixed statistics (Wang et al., 2018). Furthermore, principal component analysis for mixed data has been applied to inspect the process (Ahsan et al., 2018) and to detect outliers (Ahsan et al., 2019). To overcome the drawbacks of the PCA Mix chart, Ahsan et al. (2020) proposed the Kernel PCA (KPCA) Mix chart for monitoring mixed variable and attribute quality characteristics.
A problem arises when the PCA Mix chart (Ahsan et al., 2018) is applied to nonlinear multivariate processes, because the multivariate quality characteristics in a monitored process may have a nonlinear relationship. Several studies have used control charts to detect a shift in nonlinear data. A multivariate chart based on KPCA and the Exponentially Weighted Moving Average (EWMA) was proposed to monitor nonlinear biological processes (Yoo and Lee, 2006). Khediri et al. (2010) suggested Support Vector Regression (SVR) control charts for multivariate nonlinear processes with dependent samples. Fan et al. (2014) proposed a control chart based on filtering kernel independent component analysis–principal component analysis (FKICA–PCA) to monitor multivariate industrial processes. The nonparametric Revised Spatial Rank Exponentially Weighted Moving Average (RSREWMA) control chart was developed to assess multivariate nonlinear profile data (Pan et al., 2019). Kernel PCA can be applied to monitor such cases through a control chart approach. Based on the previous study, the KPCA Mix chart (Ahsan et al., 2020) can be extended to monitor multivariate nonlinear data. Therefore, this research suggests a mixed multivariate control chart based on the KPCA algorithm that can accommodate mixed quality characteristics with a nonlinear relationship. The estimated PCs Mix from KPCA are then transformed into Hotelling's T2 statistics. The control limit of the T2 statistics is calculated using kernel density estimation (KDE), the same method used in Ahsan et al. (2020). Moreover, to show the benefits and drawbacks of the proposed chart, its performance is compared with the conventional PCA Mix chart.
The rest of this article is arranged as follows. Related studies are reviewed in Section 2. Section 3 describes the Kernel PCA method. The charting procedures of the proposed KPCA Mix control chart are given in Section 4. Section 5 presents the performance assessment of the proposed chart in detecting a shift in the process, along with the comparison with the PCA Mix chart. The application of the proposed chart to simulated and real data is shown in Section 6. Conclusions and possible future research are presented in Section 7.

Related research

This section reviews recent studies of control charts in three main categories: multivariate variable charts, attribute charts, and mixed charts. Recent developments in multivariate variable charts are listed in Table 1. Table 2 lists recent developments in multivariate attribute charts, and Table 3 lists recent developments in mixed characteristics charts.
Table 1

The recent development of multivariate variable control charts.

Sources | Proposed scheme | Findings
Chiang et al. (2021) | New scheme of multivariate auxiliary-information-based (AIB) chart | The performance of the proposed chart is evaluated using Monte-Carlo simulation and applied to cement data
Ahmad and Ahmed (2021) | T2 control chart to inspect the high dimensional data | The proposed method is usable without preprocessing or dimension reduction with high accuracy detection
Haddad (2021) | T2 control charts using modified Mahalanobis distance | The proposed method has better performance in detecting more outliers compared to the traditional chart
Cabana and Lillo (2021) | Robust multivariate chart for individual observations using reweighted shrinkage estimators | The proposed chart has a better performance for high dimensional and high contaminated data
Maleki et al. (2020) | Median estimators of the T2 control chart | The proposed method outperforms the conventional chart
Haddad et al. (2019) | Bivariate Hotelling's T2 charts with bootstrap data | The proposed method shows a better performance compared to the conventional method
Tiengket et al. (2020) | Bivariate Copulas on the Hotelling's T2 Control Chart | The bivariate copulas method can be used in the Hotelling's T2 chart
Mashuri et al. (2019) | Tr (R2) control charts with Kernel Density Estimation (KDE) control limit | The proposed control chart method presents better performance to detect the shift for the large characteristics and sample size
Mehmood et al. (2019) | Hotelling T2 control chart based on bivariate ranked set schemes | Proposed control chart schemes demonstrate an outstanding performance compared to the classical Hotelling T2
Haq and Khoo (2019) | Adaptive MEWMA chart | The proposed chart surpasses the performances of the existing adaptive multivariate charts
Flury and Quaglino (2018) | MEWMA chart for asymmetric gamma distributions | The proposed MEWMA chart outperforms the conventional T2 chart in all the cases
Haq et al. (2020) | Dual MCUSUM charts with auxiliary information for the process mean | The proposed chart has a better performance compared to the DMCUSUM and MDMCUSUM charts when detecting different sizes of a shift in the process mean vector
Table 2

The recent development of attribute control charts.

Sources | Proposed scheme | Findings
Yeganeh et al. (2021) | Combined novel run rules and MEWMA control chart | The proposed method has better performance for small and moderate shifts in monitoring linear profiles
Xie et al. (2021) | MCUSUM control chart for monitoring Gumbel's bivariate exponential data | The proposed chart outperforms the other charts for most shift domains
Mashuri et al. (2020) | Fuzzy bivariate chart | The proposed chart is more sensitive than the conventional bivariate Poisson chart
Zhou et al. (2020) | Synthetic control chart for attribute inspection | The proposed chart demonstrates a higher detection performance for small and large mean shifts
Quinino et al. (2020) | Attribute chart for the joint monitoring of mean and variance | The proposed method is easier to be implemented compared to the conventional approach
Aldosari et al. (2019) | Attribute control chart for multivariate Poisson distribution using multiple dependent state repetitive sampling (MDSRS) | The proposed method has a better performance than the conventional one based on repetitive sampling
Aslam et al. (2019) | Shewhart attribute control with the neutrosophic statistical interval | The proposed attribute control chart has a good ability to detect a shift in the process
Chong et al. (2019) | Multi-attribute CUSUM-np chart | The proposed procedure has a better or equal performance compared to the conventional chart
Aslam (2019) | Attribute control chart using the repetitive sampling under the fuzzy neutrosophic system | The proposed chart with repetitive sampling under the fuzzy neutrosophic system is more sensitive in detecting a shift in the process as compared with the existing chart
Lee et al. (2017) | Multinomial generalized likelihood ratio (MGLR) chart | The proposed chart has better performance than the set of 2-sided Bernoulli CUSUM charts
Table 3

The recent development in the mixed variable and attribute control charts.

Sources | Proposed scheme | Findings
Ahsan et al. (2020) | Kernel PCA Mix Chart | The proposed chart has a better performance compared to the PCA Mix chart
Ahsan et al. (2019) | PCA Mix chart for detecting outlier in mixed characteristics scheme | The proposed chart has a great performance to detect more outliers with a higher percentage of outliers added compared to the conventional and other robust charts
Ahsan et al. (2018) | PCA Mix control chart | The proposed chart presents good performance for an appropriate number of principal components used
Wang et al. (2018) | Multivariate sign chart | Simulations show the superiority of the proposed control chart in monitoring mixed-type data
Aslam et al. (2015) | The mixed chart to monitor the process | The mixed chart shows excellent performance in the monitoring process
The review of recent mixed control charts shows that only a few works have studied the joint monitoring of variable and attribute characteristics. Therefore, more development in this area is needed, especially for nonlinear data. This work proposes a mixed control chart based on the Kernel PCA Mix algorithm. The control limit of the statistics from the PCs Mix is estimated using the KDE method, which performs better for non-normal data. The proposed chart is expected to perform better in monitoring nonlinear mixed data. To show this, the performance of the proposed chart is compared with the conventional PCA Mix chart, and the chart is also applied to real data.

Kernel PCA

PCA is a basis transformation that diagonalizes the estimated covariance matrix C of the input data. PCA was originally proposed for linear data, so it is not powerful for nonlinear data. To overcome this nonlinearity problem, Schölkopf et al. (1997) proposed the Kernel PCA scheme. The basic idea of Kernel PCA is to calculate the Principal Component Scores in a higher dimensional space reached through a nonlinear mapping, as displayed in Fig. 1. This mapping can be carried out with the kernel functions known from the Support Vector Machine (SVM) (Boser et al., 1992).
Figure 1

Illustration of KPCA.

Assume that the centered data are mapped by a nonlinear function into a feature space F. The covariance matrix in the feature space can be written as in Equation (3.1). The next step is to find the eigenvalues and eigenvectors that satisfy Equation (3.2). In general, the mapping cannot always be computed explicitly. To solve this problem, only the dot products between mapped vectors in the feature space are computed. Let K be the matrix whose entries are these dot products. The principal component score (PCs) t is computed by projecting a mapped observation onto an eigenvector, as expressed in Equation (3.3). Solving the eigenvalue problem and computing the principal components therefore does not require the nonlinear mapping itself; a kernel function can be used in its place.
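For readers who want the symbols behind this description, the standard Kernel PCA formulation of Schölkopf et al. can be sketched as follows. The notation is assumed here for illustration (Φ for the mapping, x_1, ..., x_n for the observations, α for the expansion coefficients, k for the kernel function) and may not match the notation of Equations (3.1) to (3.3).

```latex
\begin{align*}
\Phi &: \mathbb{R}^{p} \to F, \qquad \mathbf{x} \mapsto \Phi(\mathbf{x}),
  \qquad \sum_{j=1}^{n} \Phi(\mathbf{x}_j) = \mathbf{0} \\
\mathbf{C}^{F} &= \frac{1}{n} \sum_{j=1}^{n} \Phi(\mathbf{x}_j)\,\Phi(\mathbf{x}_j)^{\top} \\
\lambda \mathbf{v} &= \mathbf{C}^{F}\mathbf{v},
  \qquad \mathbf{v} = \sum_{i=1}^{n} \alpha_i\,\Phi(\mathbf{x}_i) \\
K_{ij} &= \langle \Phi(\mathbf{x}_i), \Phi(\mathbf{x}_j) \rangle = k(\mathbf{x}_i, \mathbf{x}_j),
  \qquad n\lambda\,\boldsymbol{\alpha} = \mathbf{K}\boldsymbol{\alpha} \\
t &= \langle \mathbf{v}, \Phi(\mathbf{x}) \rangle = \sum_{i=1}^{n} \alpha_i\, k(\mathbf{x}_i, \mathbf{x})
\end{align*}
```

The last line shows why the mapping itself is never needed: the score depends on the data only through kernel evaluations.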

Kernel PCA Mix chart

Statistics calculation

The main concept of the Kernel PCA Mix chart is to form the matrix Z as a representation of the mixed variables. There are two main steps in the KPCA Mix chart procedure. First, the statistics are computed from matrix Z. Second, the control limit is calculated by applying the KDE. These procedures are illustrated by the flowchart in Fig. 1. The detailed procedure for the statistics calculation is as follows:
1. Create the matrix Z from two blocks: the centered version of the matrix that contains the variable characteristics (numeric data), and the centered version of the matrix B that contains the dummy columns for each category of the attribute characteristics (categorical data).
2. Define the first weighting matrix from the identity matrix.
3. Define the second weighting matrix, in which the first p columns are weighted by 1 and each of the last m columns receives its own weight.
4. Calculate the weighted data matrix from Z and the weighting matrices.
5. Calculate the kernel matrix K.
6. Calculate the Principal Component Scores (PCs) t using the formula shown in Equation (4.4).
7. From the first l principal component scores t, calculate the T2 statistics using Equation (4.5), which involves the eigenvalues that correspond to the v-th PCs.
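The statistics calculation can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the input `Z` is assumed to be the already weighted mixed-data matrix, the RBF kernel is assumed to have the form exp(-||x - y||^2 / (2σ^2)), and the statistic is assumed to be the usual PCA-based Hotelling's form, the sum of squared scores divided by their eigenvalues. The function names `rbf_kernel` and `kpca_t2` are hypothetical.

```python
import numpy as np


def rbf_kernel(X, Y, sigma):
    """RBF kernel matrix; the exp(-d^2 / (2 sigma^2)) form is an assumption."""
    sq = (X**2).sum(axis=1)[:, None] + (Y**2).sum(axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))


def kpca_t2(Z, n_components, sigma=1.0):
    """Return the kernel PC scores and a Hotelling-type T2 value per row of Z."""
    n = Z.shape[0]
    K = rbf_kernel(Z, Z, sigma)

    # Center the kernel matrix, which centers the data in feature space.
    J = np.full((n, n), 1.0 / n)
    Kc = K - J @ K - K @ J + J @ K @ J

    # Eigen-decomposition of the centered kernel matrix, largest eigenvalues first.
    eigvals, eigvecs = np.linalg.eigh(Kc)
    order = np.argsort(eigvals)[::-1][:n_components]
    mu = eigvals[order]

    # Scale coefficients so each feature-space eigenvector has unit norm.
    alpha = eigvecs[:, order] / np.sqrt(mu)
    scores = Kc @ alpha            # n x l matrix of PC scores
    lam = mu / n                   # variance of each score column

    # Sum of squared scores, each divided by its eigenvalue.
    t2 = (scores**2 / lam).sum(axis=1)
    return scores, t2
```

With this scaling, each score column has variance equal to its eigenvalue, so the in-sample mean of the statistic equals the number of retained components.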

Control limit calculation

The control limit is estimated using the KDE approach because it can follow the unknown distribution of the input data. The procedure for the control limit calculation is as follows:
1. Estimate the empirical density of the T2 statistics using Equation (4.6).
2. Calculate the cumulative distribution of the estimated density by numerical integration with the trapezoid rule, as in Equation (4.7), over the range between the minimum and maximum values of the T2 statistics.
3. Calculate the control limit using the expression shown in Equation (4.8).
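A minimal sketch of these three steps is given below. Several choices are assumptions made for illustration and are not taken from the paper: a Gaussian kernel for the density estimate, Silverman's rule of thumb for the bandwidth, a false-alarm probability `alpha` of 0.0027, and a renormalization of the integrated density so that it reaches 1 at the maximum. The function name `kde_control_limit` is hypothetical.

```python
import numpy as np


def kde_control_limit(t2, alpha=0.0027, grid_size=2000, bandwidth=None):
    """Estimate an upper control limit as the (1 - alpha) quantile of a KDE."""
    t2 = np.asarray(t2, dtype=float)
    n = t2.size
    if bandwidth is None:
        # Silverman's rule of thumb (an assumed bandwidth choice).
        bandwidth = 1.06 * t2.std(ddof=1) * n ** (-1.0 / 5.0)

    # Step 1: Gaussian kernel density estimate on a grid from min to max.
    grid = np.linspace(t2.min(), t2.max(), grid_size)
    u = (grid[:, None] - t2[None, :]) / bandwidth
    density = np.exp(-0.5 * u**2).sum(axis=1) / (n * bandwidth * np.sqrt(2.0 * np.pi))

    # Step 2: cumulative integral of the density with the trapezoid rule.
    areas = 0.5 * (density[1:] + density[:-1]) * np.diff(grid)
    cdf = np.concatenate([[0.0], np.cumsum(areas)])
    cdf /= cdf[-1]  # renormalize because the tails outside [min, max] are cut off

    # Step 3: the control limit is the first grid point where the CDF reaches 1 - alpha.
    idx = np.searchsorted(cdf, 1.0 - alpha)
    return grid[min(idx, grid_size - 1)]
```

A smaller `alpha` moves the limit upward, which lengthens the in-control run length at the cost of slower detection.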

Performance evaluation

Simulation set-up

The performance of the proposed control chart is assessed for variable characteristics (numeric data) that have a nonlinear relationship. The nonlinear data are generated using the following procedure:
1. Generate the two base vectors.
2. Define five nonlinear variable characteristics from them.
The five generated characteristics are visualized in Fig. 2.
Figure 2

3D scatter plots of the generated nonlinear data, panels a) to f), each showing three of the generated characteristics.

Performance evaluation

Five variable quality characteristics (generated from the Multivariate Normal distribution) are involved, and the number of principal components l evaluated is 2, 3, and 4. The performance is evaluated for three cases of attribute characteristics (generated from the Multinomial distribution), with balanced, imbalanced, and extreme imbalanced proportions, defined as follows:
1. Balanced case, with parameters θ1 = θ2 = 0.3 and θ3 = 0.4.
2. Imbalanced case, with parameters θ1 = θ2 = 0.1 and θ3 = 0.8.
3. Extreme imbalanced case, with parameters θ1 = θ2 = 0.05 and θ3 = 0.9.
Furthermore, three kernel functions are used in this research: Linear, Polynomial, and Radial Basis Function (RBF).
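The three kernel families can be sketched with their common textbook definitions. The exact parameterization used in the study is not recoverable from the text, so the polynomial `degree` and `coef0` and the RBF width `sigma` below are assumptions, as are the function names.

```python
import numpy as np


def linear_kernel(X, Y):
    """Linear kernel: plain dot products between rows of X and Y."""
    return X @ Y.T


def polynomial_kernel(X, Y, degree=2, coef0=1.0):
    """Polynomial kernel: (x . y + coef0) ** degree (assumed parameterization)."""
    return (X @ Y.T + coef0) ** degree


def rbf_kernel(X, Y, sigma=1.0):
    """RBF kernel: exp(-||x - y||^2 / (2 sigma^2)) (assumed parameterization)."""
    sq = (X**2).sum(axis=1)[:, None] + (Y**2).sum(axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))
```

Note that the polynomial kernel reduces to the linear kernel when `degree=1` and `coef0=0`, whereas the RBF kernel depends only on the distance between observations.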

Extreme imbalanced case

The performance of the Kernel PCA Mix chart in handling nonlinear data with an extreme imbalanced proportion of attribute characteristics is tabulated in Tables 4, 5, and 6. When a small number of principal components is used, the RBF kernel performs worse than the other kernels. With a larger number of principal components, the RBF kernel gives better results than the other functions. Also, for this case, the KDE control limit produces a stable ARL0 of about 370.
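As a reminder of how the ARL values are read, the sketch below computes a single run length and the theoretical ARL for independent statistics. The independence assumption and the false-alarm probability of 0.0027 are illustrative assumptions, chosen because they give an in-control ARL close to the value of about 370 reported here. Both function names are hypothetical.

```python
import numpy as np


def run_length(stats, control_limit):
    """1-based index of the first statistic above the limit, or None if no signal."""
    above = np.flatnonzero(np.asarray(stats) > control_limit)
    return int(above[0]) + 1 if above.size else None


def arl_from_signal_probability(p_signal):
    """ARL of a chart whose statistics signal independently with probability p_signal."""
    return 1.0 / p_signal


# With an assumed false-alarm probability of 0.0027, the in-control ARL is about 370.37.
arl0 = arl_from_signal_probability(0.0027)
```

In a simulation study, the ARL for a given shift is the average of `run_length` over many simulated sequences; a smaller out-of-control ARL means faster detection.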
Table 4

ARLs of an extreme imbalanced case for l = 2.

Shift δS | Shift δμ | RBF kernel | Polynomial kernel | Linear kernel
0 | 0 | 376.820 | 374.850 | 379.000
0.1 | 0.0025 | 367.375 | 377.570 | 375.855
0.2 | 0.0050 | 357.063 | 354.560 | 368.283
0.3 | 0.0075 | 313.003 | 345.998 | 365.330
0.4 | 0.0100 | 284.322 | 330.686 | 346.508
0.5 | 0.0125 | 264.272 | 317.742 | 327.998
0.6 | 0.0150 | 250.244 | 302.643 | 310.600
0.7 | 0.0175 | 236.421 | 286.088 | 293.735
0.8 | 0.0200 | 226.051 | 268.144 | 274.916
0.9 | 0.0225 | 220.402 | 252.661 | 261.438
1.0 | 0.0250 | 219.707 | 238.942 | 246.952
1.1 | 0.0275 | 224.183 | 225.516 | 233.486
1.2 | 0.0300 | 239.949 | 213.429 | 221.341
1.3 | 0.0325 | 272.919 | 202.299 | 209.421
1.4 | 0.0350 | 310.705 | 191.916 | 199.267
1.5 | 0.0375 | 352.232 | 182.158 | 189.546
Table 5

ARLs of an extreme imbalanced case for l = 3.

Shift δS | Shift δμ | RBF kernel | Polynomial kernel | Linear kernel
0.1 | 0.0025 | 370.920 | 380.330 | 391.740
0.2 | 0.0050 | 361.410 | 362.590 | 380.240
0.3 | 0.0075 | 356.220 | 363.143 | 387.913
0.4 | 0.0100 | 323.920 | 340.355 | 382.750
0.5 | 0.0125 | 303.690 | 336.754 | 365.910
0.6 | 0.0150 | 281.498 | 319.845 | 341.658
0.7 | 0.0175 | 267.280 | 308.166 | 323.410
0.8 | 0.0200 | 252.489 | 294.769 | 306.358
0.9 | 0.0225 | 235.111 | 282.124 | 287.041
1.0 | 0.0250 | 220.115 | 268.235 | 269.949
1.1 | 0.0275 | 207.927 | 252.926 | 256.348
1.2 | 0.0300 | 196.856 | 240.774 | 242.078
1.3 | 0.0325 | 186.622 | 227.922 | 228.676
1.4 | 0.0350 | 177.566 | 214.832 | 216.686
1.5 | 0.0375 | 169.523 | 204.300 | 205.981
Table 6

ARLs of an extreme imbalanced case for l = 4.

Shift δS | Shift δμ | RBF kernel | Polynomial kernel | Linear kernel
0 | 0 | 362.860 | 376.820 | 388.540
0.1 | 0.0025 | 365.750 | 405.895 | 444.445
0.2 | 0.0050 | 359.500 | 410.370 | 427.713
0.3 | 0.0075 | 350.493 | 406.170 | 421.805
0.4 | 0.0100 | 338.492 | 397.478 | 404.936
0.5 | 0.0125 | 321.630 | 381.345 | 381.178
0.6 | 0.0150 | 311.320 | 358.761 | 362.487
0.7 | 0.0175 | 297.746 | 341.734 | 342.331
0.8 | 0.0200 | 285.544 | 320.178 | 327.656
0.9 | 0.0225 | 274.721 | 305.700 | 314.535
1.0 | 0.0250 | 260.177 | 293.063 | 299.053
1.1 | 0.0275 | 248.052 | 280.185 | 284.094
1.2 | 0.0300 | 236.200 | 266.963 | 270.278
1.3 | 0.0325 | 224.626 | 254.882 | 258.745
1.4 | 0.0350 | 214.449 | 243.619 | 247.118
1.5 | 0.0375 | 205.602 | 233.453 | 235.817

Imbalanced case

Tables 7, 8, and 9 show the Kernel PCA Mix chart performance in inspecting nonlinear data with an imbalanced proportion of attribute characteristics. Similar to the previous results, the control limit produces a stable ARL0 of about 370. For every number of principal components used, the RBF kernel performs better than the other functions. The linear kernel gives the poorest results in this case.
Table 7

ARLs of the imbalanced case for l = 2.

Shift δS | Shift δμ | RBF kernel | Polynomial kernel | Linear kernel
0 | 0 | 386.060 | 367.300 | 380.950
0.1 | 0.0025 | 346.665 | 349.770 | 384.000
0.2 | 0.0050 | 306.600 | 328.840 | 379.383
0.3 | 0.0075 | 268.633 | 327.043 | 366.278
0.4 | 0.0100 | 242.388 | 317.712 | 348.862
0.5 | 0.0125 | 222.198 | 302.458 | 333.512
0.6 | 0.0150 | 208.613 | 284.601 | 314.729
0.7 | 0.0175 | 193.365 | 266.913 | 295.940
0.8 | 0.0200 | 182.924 | 250.563 | 277.669
0.9 | 0.0225 | 175.184 | 235.500 | 262.847
1.0 | 0.0250 | 172.916 | 222.804 | 246.770
1.1 | 0.0275 | 172.819 | 209.485 | 233.871
1.2 | 0.0300 | 176.240 | 198.273 | 220.032
1.3 | 0.0325 | 175.111 | 187.549 | 207.769
1.4 | 0.0350 | 167.725 | 178.290 | 197.263
1.5 | 0.0375 | 159.685 | 169.472 | 187.162
Table 8

ARLs of an imbalanced case for l = 3.

Shift δS | Shift δμ | RBF kernel | Polynomial kernel | Linear kernel
0 | 0 | 371.020 | 359.550 | 396.730
0.1 | 0.0025 | 369.610 | 376.185 | 425.675
0.2 | 0.0050 | 355.200 | 374.097 | 423.697
0.3 | 0.0075 | 353.843 | 369.205 | 422.503
0.4 | 0.0100 | 331.568 | 358.198 | 400.838
0.5 | 0.0125 | 306.777 | 351.158 | 377.570
0.6 | 0.0150 | 284.471 | 335.774 | 355.724
0.7 | 0.0175 | 264.586 | 319.088 | 336.219
0.8 | 0.0200 | 248.086 | 301.681 | 317.538
0.9 | 0.0225 | 233.595 | 284.584 | 299.487
1.0 | 0.0250 | 220.216 | 269.831 | 281.449
1.1 | 0.0275 | 207.939 | 256.110 | 265.438
1.2 | 0.0300 | 197.140 | 242.057 | 250.743
1.3 | 0.0325 | 187.698 | 228.902 | 238.136
1.4 | 0.0350 | 178.887 | 217.107 | 226.352
1.5 | 0.0375 | 161.626 | 206.726 | 214.664
Table 9

ARLs of an imbalanced case for l = 4.

Shift δS | Shift δμ | RBF kernel | Polynomial kernel | Linear kernel
0 | 0 | 371.100 | 394.810 | 377.530
0.1 | 0.0025 | 351.615 | 382.655 | 396.125
0.2 | 0.0050 | 337.440 | 360.083 | 401.523
0.3 | 0.0075 | 335.985 | 345.143 | 395.915
0.4 | 0.0100 | 322.286 | 329.336 | 381.536
0.5 | 0.0125 | 308.940 | 309.580 | 363.160
0.6 | 0.0150 | 296.383 | 295.949 | 344.946
0.7 | 0.0175 | 279.708 | 278.995 | 325.604
0.8 | 0.0200 | 264.274 | 265.733 | 306.423
0.9 | 0.0225 | 251.411 | 252.864 | 287.762
1.0 | 0.0250 | 238.127 | 239.604 | 273.223
1.1 | 0.0275 | 226.427 | 228.050 | 260.837
1.2 | 0.0300 | 217.344 | 218.267 | 248.189
1.3 | 0.0325 | 207.195 | 207.876 | 236.569
1.4 | 0.0350 | 197.691 | 198.643 | 225.320
1.5 | 0.0375 | 188.935 | 189.732 | 215.198

Balanced case

The performance of the Kernel PCA Mix chart in assessing nonlinear data with a balanced proportion of attribute characteristics is displayed in Tables 10, 11, and 12. Similar to the previous results, the control limit produces a consistent ARL0 of about 370. The RBF kernel performs better than the others for every number of principal components used. Also, the RBF kernel reaches its peak performance when the proportion of attribute characteristics is balanced. For this case, the Polynomial and Linear kernel functions perform similarly.
Table 10

ARLs of a balanced case for l = 2.

Shift δS | Shift δμ | RBF kernel | Polynomial kernel | Linear kernel
0 | 0 | 380.770 | 398.270 | 351.040
0.1 | 0.0025 | 364.740 | 426.900 | 363.020
0.2 | 0.0050 | 317.727 | 404.150 | 370.863
0.3 | 0.0075 | 281.193 | 388.378 | 358.250
0.4 | 0.0100 | 257.002 | 375.390 | 346.804
0.5 | 0.0125 | 239.968 | 353.718 | 335.335
0.6 | 0.0150 | 224.706 | 333.024 | 312.767
0.7 | 0.0175 | 210.456 | 310.153 | 293.535
0.8 | 0.0200 | 204.304 | 290.936 | 276.356
0.9 | 0.0225 | 197.367 | 272.970 | 259.842
1.0 | 0.0250 | 198.296 | 256.436 | 245.284
1.1 | 0.0275 | 187.847 | 242.783 | 231.725
1.2 | 0.0300 | 184.638 | 229.729 | 218.880
1.3 | 0.0325 | 173.244 | 217.334 | 206.827
1.4 | 0.0350 | 171.971 | 205.771 | 196.301
1.5 | 0.0375 | 160.653 | 195.590 | 186.618
Table 11

ARLs of a balanced case for l = 3.

Shift δS | Shift δμ | RBF kernel | Polynomial kernel | Linear kernel
0 | 0 | 374.770 | 365.750 | 385.610
0.1 | 0.0025 | 368.130 | 412.890 | 389.070
0.2 | 0.0050 | 349.987 | 402.257 | 384.577
0.3 | 0.0075 | 318.833 | 389.743 | 379.578
0.4 | 0.0100 | 294.130 | 375.072 | 359.124
0.5 | 0.0125 | 274.145 | 352.745 | 340.432
0.6 | 0.0150 | 256.280 | 338.009 | 320.169
0.7 | 0.0175 | 245.261 | 314.724 | 301.196
0.8 | 0.0200 | 230.261 | 293.350 | 284.602
0.9 | 0.0225 | 218.263 | 276.424 | 270.494
1.0 | 0.0250 | 207.781 | 259.397 | 257.344
1.1 | 0.0275 | 196.715 | 243.654 | 241.558
1.2 | 0.0300 | 187.601 | 229.277 | 227.823
1.3 | 0.0325 | 178.948 | 216.039 | 215.006
1.4 | 0.0350 | 170.626 | 204.779 | 204.089
1.5 | 0.0375 | 162.887 | 194.774 | 193.501
Table 12

ARLs of a balanced case for l = 4.

Shift δS | Shift δμ | RBF kernel | Polynomial kernel | Linear kernel
0 | 0 | 373.580 | 380.340 | 372.780
0.1 | 0.0025 | 355.515 | 439.030 | 414.945
0.2 | 0.0050 | 345.457 | 432.473 | 404.050
0.3 | 0.0075 | 322.988 | 421.480 | 398.750
0.4 | 0.0100 | 317.214 | 410.244 | 389.146
0.5 | 0.0125 | 306.588 | 387.608 | 373.783
0.6 | 0.0150 | 287.846 | 366.947 | 351.531
0.7 | 0.0175 | 276.688 | 349.161 | 332.004
0.8 | 0.0200 | 260.716 | 329.830 | 311.562
0.9 | 0.0225 | 249.611 | 311.397 | 294.925
1.0 | 0.0250 | 239.357 | 296.359 | 278.536
1.1 | 0.0275 | 228.161 | 281.018 | 262.665
1.2 | 0.0300 | 218.475 | 265.932 | 248.548
1.3 | 0.0325 | 208.885 | 252.933 | 235.671
1.4 | 0.0350 | 200.093 | 241.224 | 224.245
1.5 | 0.0375 | 191.384 | 230.123 | 212.913

Comparison with PCA Mix chart

The performance of the Kernel PCA Mix chart is compared with that of the PCA Mix chart in inspecting the nonlinear data. The performance comparisons for the extreme imbalanced, imbalanced, and balanced cases are tabulated in Tables 13, 14, and 15, respectively. These comparisons are visualized in Figures 3, 4, and 5.
Table 13

Performance comparison between KPCA Mix and PCA Mix charts for extreme imbalanced case.

Shift δS | Shift δμ | KPCA Mix (p=5, l=2) | PCA Mix (p=5, l=2) | KPCA Mix (p=5, l=3) | PCA Mix (p=5, l=3) | KPCA Mix (p=5, l=4) | PCA Mix (p=5, l=4)
0 | 0 | 376.820 | 383.490 | 370.920 | 376.110 | 362.860 | 385.690
0.1 | 0.0025 | 367.375 | 358.360 | 361.410 | 465.410 | 365.750 | 438.810
0.2 | 0.0050 | 357.063 | 340.610 | 356.220 | 408.150 | 359.500 | 430.130
0.3 | 0.0075 | 313.003 | 361.040 | 323.920 | 493.960 | 350.493 | 469.200
0.4 | 0.0100 | 284.322 | 397.270 | 303.690 | 424.150 | 338.492 | 436.240
0.5 | 0.0125 | 264.272 | 352.370 | 281.498 | 430.750 | 321.630 | 499.830
0.6 | 0.0150 | 250.244 | 335.160 | 267.280 | 413.010 | 311.320 | 461.580
0.7 | 0.0175 | 236.421 | 276.230 | 252.489 | 364.630 | 297.746 | 411.360
0.8 | 0.0200 | 226.051 | 253.160 | 235.111 | 303.430 | 285.544 | 332.780
0.9 | 0.0225 | 220.402 | 217.230 | 220.115 | 315.980 | 274.721 | 328.360
1.0 | 0.0250 | 219.707 | 154.640 | 207.927 | 213.670 | 260.177 | 263.660
1.1 | 0.0275 | 224.183 | 134.610 | 196.856 | 169.880 | 248.052 | 212.700
1.2 | 0.0300 | 239.949 | 120.240 | 186.622 | 166.900 | 236.200 | 177.520
1.3 | 0.0325 | 272.919 | 89.690 | 177.566 | 136.860 | 224.626 | 166.600
1.4 | 0.0350 | 210.705 | 70.400 | 169.523 | 107.190 | 214.449 | 140.340
1.5 | 0.0375 | 152.232 | 67.120 | 162.292 | 87.070 | 205.602 | 95.630
Table 14

Performance comparison between KPCA Mix and PCA Mix charts for imbalanced case.

Shift δS | Shift δμ | KPCA Mix (p=5, l=2) | PCA Mix (p=5, l=2) | KPCA Mix (p=5, l=3) | PCA Mix (p=5, l=3) | KPCA Mix (p=5, l=4) | PCA Mix (p=5, l=4)
0 | 0 | 386.060 | 360.580 | 374.770 | 372.990 | 371.100 | 381.750
0.1 | 0.0025 | 346.665 | 358.310 | 368.130 | 487.140 | 351.615 | 490.200
0.2 | 0.0050 | 306.600 | 359.580 | 349.987 | 435.500 | 337.440 | 518.210
0.3 | 0.0075 | 268.633 | 359.080 | 318.833 | 470.580 | 335.985 | 557.740
0.4 | 0.0100 | 242.388 | 346.050 | 294.130 | 427.430 | 322.286 | 569.470
0.5 | 0.0125 | 222.198 | 345.080 | 274.145 | 452.800 | 308.940 | 500.090
0.6 | 0.0150 | 208.613 | 302.500 | 256.280 | 412.790 | 296.383 | 487.080
0.7 | 0.0175 | 193.365 | 279.090 | 245.261 | 346.090 | 279.708 | 398.220
0.8 | 0.0200 | 182.924 | 231.490 | 230.261 | 340.540 | 264.274 | 379.700
0.9 | 0.0225 | 175.184 | 166.520 | 218.263 | 306.790 | 251.411 | 339.520
1.0 | 0.0250 | 172.916 | 178.650 | 207.781 | 250.840 | 238.127 | 292.030
1.1 | 0.0275 | 172.819 | 143.750 | 196.715 | 186.980 | 226.427 | 268.970
1.2 | 0.0300 | 176.240 | 119.500 | 187.601 | 162.270 | 217.344 | 216.290
1.3 | 0.0325 | 175.111 | 81.310 | 178.948 | 145.640 | 207.195 | 174.670
1.4 | 0.0350 | 167.725 | 73.920 | 170.626 | 112.920 | 197.691 | 143.190
1.5 | 0.0375 | 159.685 | 58.780 | 162.887 | 91.410 | 188.935 | 112.000
Table 15

Performance comparison between KPCA Mix and PCA Mix charts for balanced case.

Shift δS | Shift δμ | KPCA Mix (p=5, l=2) | PCA Mix (p=5, l=2) | KPCA Mix (p=5, l=3) | PCA Mix (p=5, l=3) | KPCA Mix (p=5, l=4) | PCA Mix (p=5, l=4)
0 | 0 | 380.770 | 378.110 | 374.770 | 370.220 | 373.580 | 383.910
0.1 | 0.0025 | 364.740 | 373.140 | 368.130 | 365.360 | 355.515 | 488.570
0.2 | 0.0050 | 317.727 | 366.600 | 349.987 | 466.790 | 345.457 | 572.220
0.3 | 0.0075 | 281.193 | 366.600 | 318.833 | 447.910 | 322.988 | 565.340
0.4 | 0.0100 | 257.002 | 374.300 | 294.130 | 425.940 | 317.214 | 570.590
0.5 | 0.0125 | 239.968 | 367.060 | 274.145 | 456.440 | 306.588 | 509.660
0.6 | 0.0150 | 224.706 | 366.260 | 256.280 | 434.600 | 287.846 | 451.400
0.7 | 0.0175 | 210.456 | 298.540 | 245.261 | 334.870 | 276.688 | 419.120
0.8 | 0.0200 | 204.304 | 223.350 | 230.261 | 310.620 | 260.716 | 362.910
0.9 | 0.0225 | 197.367 | 189.670 | 218.263 | 276.670 | 249.611 | 307.540
1.0 | 0.0250 | 198.296 | 164.760 | 207.781 | 236.940 | 239.357 | 255.030
1.1 | 0.0275 | 177.847 | 143.490 | 196.715 | 212.420 | 228.161 | 235.770
1.2 | 0.0300 | 174.638 | 113.170 | 187.601 | 145.390 | 218.475 | 187.540
1.3 | 0.0325 | 163.244 | 94.000 | 178.948 | 121.600 | 208.885 | 147.950
1.4 | 0.0350 | 161.971 | 69.930 | 170.626 | 110.920 | 200.093 | 123.910
1.5 | 0.0375 | 150.653 | 51.270 | 162.887 | 90.500 | 191.384 | 95.890
Figure 3

ARLs comparison for extreme imbalanced case for: a) p = 5, l = 2, b) p = 5, l = 3, and c) p = 5, l = 4.

Figure 4

ARLs comparison for imbalanced case: a) p = 5, l = 2, b) p = 5, l = 3, and c) p = 5, l = 4.

Figure 5

ARLs comparison for balanced case: a) p = 5, l = 2, b) p = 5, l = 3, and c) p = 5, l = 4.


Discussion

This subsection discusses the performance of the proposed chart. First, the RBF kernel performs best among the kernels used. This is expected because the other kernels are built on the linear kernel, whereas the process is generated to follow a nonlinear relationship. The RBF kernel is known to perform better in inspecting nonlinear processes and under general smoothness assumptions (Zhicheng et al., 2012). Table 16 summarizes the performance comparison between the Kernel PCA Mix chart and the PCA Mix chart. In general, both charts can detect the process shift. However, the Kernel PCA Mix chart performs better for small process shifts, while the PCA Mix chart performs better for large shifts. This result indicates that the proposed method is better suited to nonlinear data with a small shift. A likely reason is that the PCA Mix chart was developed only for linear processes, whereas the proposed Kernel PCA Mix chart was developed to overcome the nonlinearity problem.
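The small-shift versus large-shift pattern can be checked directly from the tabulated ARLs. The sketch below uses the l = 3 columns of Table 13 (extreme imbalanced case) and finds the first shift at which the PCA Mix chart has the lower ARL. The helper name `first_crossover` is hypothetical and only illustrates the comparison.

```python
# Shift sizes (delta S) and ARLs copied from the l = 3 columns of Table 13.
shifts = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7,
          0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4, 1.5]
kpca_arl = [370.920, 361.410, 356.220, 323.920, 303.690, 281.498, 267.280, 252.489,
            235.111, 220.115, 207.927, 196.856, 186.622, 177.566, 169.523, 162.292]
pca_arl = [376.110, 465.410, 408.150, 493.960, 424.150, 430.750, 413.010, 364.630,
           303.430, 315.980, 213.670, 169.880, 166.900, 136.860, 107.190, 87.070]


def first_crossover(shifts, arl_a, arl_b):
    """Return the first shift at which chart B has a lower ARL than chart A."""
    for shift, a, b in zip(shifts, arl_a, arl_b):
        if b < a:
            return shift
    return None


crossover = first_crossover(shifts, kpca_arl, pca_arl)  # 1.1 for these columns
```

For these columns the Kernel PCA Mix chart has the lower ARL up to δS = 1.0, and the PCA Mix chart takes over from δS = 1.1 onward.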
Table 16

Summary of performance comparison.

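Parameter data non-metric | l | Kernel PCA Mix | PCA Mix
θ1, θ2 = 0.3 and θ3 = 0.4 | 2 | small shift | large shift
θ1, θ2 = 0.3 and θ3 = 0.4 | 3 | small shift | large shift
θ1, θ2 = 0.3 and θ3 = 0.4 | 4 | small shift | large shift
θ1, θ2 = 0.1 and θ3 = 0.8 | 2 | small shift | large shift
θ1, θ2 = 0.1 and θ3 = 0.8 | 3 | small shift | large shift
θ1, θ2 = 0.1 and θ3 = 0.8 | 4 | small shift | large shift
θ1, θ2 = 0.05 and θ3 = 0.9 | 2 | small shift | large shift
θ1, θ2 = 0.05 and θ3 = 0.9 | 3 | small shift | large shift
θ1, θ2 = 0.05 and θ3 = 0.9 | 4 | small shift | large shift

"small shift" represents better performance for a small shift; "large shift" represents better performance for a large shift.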

Application to the real data

In this section, the Kernel PCA Mix chart is applied to monitor intrusion in a real dataset, the well-known NSL KDD. This research analyzes only the 20% subset of the NSL KDD dataset, which can be found at https://www.unb.ca/cic/datasets/nsl.html. The summary of this dataset is displayed in Table 17. Fig. 6 shows that the normal connections of the NSL KDD dataset are not normally distributed. The RBF kernel is used in this analysis because of its consistent performance in the simulation studies.
Table 17

Summary of NSLKDD 20% dataset.

Attack types | Number of observations | Percentage (%)
Normal | 13,449 | 53.39
DOS | 9,234 | 36.65
Probe | 2,289 | 9.09
U2R | 11 | 0.04
R2L | 209 | 0.83
Total | 25,192 | 100.00
Figure 6

NSL-KDD 20% QQ Plot for normal connection.

Table 18 shows the accuracy rate of the Kernel PCA Mix chart in detecting intrusion in the NSL KDD dataset for several numbers of principal components. The results show that the optimal number of principal components is 4. With this number fixed, the analysis continues by searching for the optimal value of σ. Based on the results in Table 19, the optimal value of σ is 0.001. With these settings, the proposed method has a detection accuracy of about 0.85769. Most of the misdetections come from the relatively large FN rate, which indicates that a number of attacks are not detected as attacks.
Table 18

Performance of Kernel PCA Mix Control Chart in monitoring the NSL-KDD dataset for different numbers of principal components.

l | Accuracy | FP rate | FN rate
2 | 0.82744 | 0.06751 | 0.29285
3 | 0.84741 | 0.06714 | 0.25044
4 | 0.85769 | 0.08305 | 0.21016
5 | 0.84653 | 0.07361 | 0.24491
7 | 0.82347 | 0.13183 | 0.22771
10 | 0.84741 | 0.06714 | 0.25044
20 | 0.68986 | 0.42724 | 0.17601
Table 19

Performance of the Kernel PCA Mix control chart in monitoring the NSL-KDD dataset for l = 4 and several values of σ.

| σ | Accuracy | FP rate | FN rate |
| --- | --- | --- | --- |
| 0.10000 | 0.58772 | 0.02632 | 0.85429 |
| 0.01000 | 0.84522 | 0.06825 | 0.25385 |
| 0.00100 | 0.85769 | 0.08305 | 0.21016 |
| 0.00500 | 0.84590 | 0.06022 | 0.26160 |
| 0.00010 | 0.63492 | 0.52643 | 0.18027 |
| 0.00001 | 0.53385 | 0.00000 | 1.00000 |
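The three columns of Tables 18 and 19 are linked through the class balance in Table 17. The following Python sketch illustrates that link. It assumes, since this section does not state the definitions, that the FP rate is the fraction of normal connections flagged as attacks, that the FN rate is the fraction of attack connections that are missed, and that all 25,192 observations are used for evaluation. The names `implied_accuracy`, `TABLE_19`, `max_gap`, and `best_sigma` are hypothetical helpers introduced only for this illustration.

```python
# Class counts copied from Table 17 (NSL-KDD 20% subset).
N_NORMAL = 13_449
N_ATTACK = 9_234 + 2_289 + 11 + 209  # DOS + Probe + U2R + R2L = 11,743


def implied_accuracy(fp_rate, fn_rate, n_normal=N_NORMAL, n_attack=N_ATTACK):
    """Accuracy implied by the two error rates.

    Assumed definitions (not stated in this section):
      FP rate = false alarms / normal connections
      FN rate = missed attacks / attack connections
    """
    misclassified = fp_rate * n_normal + fn_rate * n_attack
    return 1.0 - misclassified / (n_normal + n_attack)


# Rows copied from Table 19: (sigma, reported accuracy, FP rate, FN rate).
TABLE_19 = [
    (0.1, 0.58772, 0.02632, 0.85429),
    (0.01, 0.84522, 0.06825, 0.25385),
    (0.001, 0.85769, 0.08305, 0.21016),
    (0.005, 0.84590, 0.06022, 0.26160),
    (0.0001, 0.63492, 0.52643, 0.18027),
    (0.00001, 0.53385, 0.00000, 1.00000),
]

# Largest gap between reported and implied accuracy over all rows.
max_gap = max(abs(acc - implied_accuracy(fp, fn)) for _, acc, fp, fn in TABLE_19)

# Row with the highest reported accuracy, i.e. the sigma selected in the text.
best_sigma = max(TABLE_19, key=lambda row: row[1])[0]

if __name__ == "__main__":
    for sigma, acc, fp, fn in TABLE_19:
        print(f"sigma={sigma:g}  reported={acc:.5f}  implied={implied_accuracy(fp, fn):.5f}")
    print(f"max gap = {max_gap:.1e}, best sigma = {best_sigma:g}")
```

Under these assumptions, the implied and reported accuracies agree to about four decimal places in every row. The last row is the degenerate case: with an FP rate of 0 and an FN rate of 1 the chart flags nothing, so the accuracy equals the share of normal connections.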
Table 20 compares the performance of the proposed method with that of other methods: several machine learning algorithms (Decision Tree, Naïve Bayes, Logistic Regression, and Support Vector Machine) and two control chart methods (Hotelling's T² chart and the PCA Mix chart). According to the table, the proposed method achieves higher accuracy than the other machine learning and control chart methods for the same number of monitored quality characteristics. It also yields the lowest FP rate, which indicates that it produces fewer false alarms.
Table 20

Performance comparison with the other methods.

| Method | Accuracy | FP rate |
| --- | --- | --- |
| Hybrid Decision Tree (Farid et al., 2014) | 0.8192 | 0.1740 |
| Hybrid Naïve Bayes (Farid et al., 2014) | 0.8239 | 0.1640 |
| Logistic Regression (Belavagi and Muniyal, 2016) | 0.8400 | 0.1700 |
| Support Vector Machine (Belavagi and Muniyal, 2016) | 0.7500 | 0.2400 |
| Hotelling's T² chart | 0.7023 | 0.1433 |
| PCA Mix | 0.8041 | 0.3171 |
| Proposed method | 0.8577 | 0.0831 |
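The comparison in Table 20 can be checked mechanically. The short Python sketch below copies the table values and tests whether the proposed method is better than every other listed method on both criteria, then reports the margins to the closest competitor. The variable names are hypothetical and serve only this illustration; the figures are those reported in the table and are not recomputed here.

```python
# Rows copied from Table 20: (method, accuracy, FP rate).
TABLE_20 = [
    ("Hybrid Decision Tree", 0.8192, 0.1740),
    ("Hybrid Naive Bayes", 0.8239, 0.1640),
    ("Logistic Regression", 0.8400, 0.1700),
    ("Support Vector Machine", 0.7500, 0.2400),
    ("Hotelling's T2 chart", 0.7023, 0.1433),
    ("PCA Mix", 0.8041, 0.3171),
    ("Proposed method", 0.8577, 0.0831),
]

proposed = TABLE_20[-1]
others = TABLE_20[:-1]

# True only if the proposed method has higher accuracy AND lower FP rate
# than every other method in the table.
dominates_all = all(proposed[1] > acc and proposed[2] < fp for _, acc, fp in others)

# Closest competitors on each criterion and the corresponding margins.
best_other_acc = max(others, key=lambda row: row[1])
lowest_other_fp = min(others, key=lambda row: row[2])
accuracy_margin = proposed[1] - best_other_acc[1]
fp_margin = lowest_other_fp[2] - proposed[2]

if __name__ == "__main__":
    print("dominates all listed methods:", dominates_all)
    print(f"accuracy margin over {best_other_acc[0]}: {accuracy_margin:.4f}")
    print(f"FP-rate margin over {lowest_other_fp[0]}: {fp_margin:.4f}")
```

The closest competitor differs by criterion: Logistic Regression is nearest in accuracy, while Hotelling's T² chart is nearest in FP rate.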

Conclusion and future research

In this research, a control chart capable of monitoring mixed variable and attribute characteristics with nonlinear relationships is proposed. The performance of the proposed chart is evaluated for several types of attribute characteristics and several kernel functions. The simulation studies show that the Kernel PCA Mix chart can detect shifts in the process, and that the RBF kernel is the preferable kernel function because it detects shifts consistently. The comparison with the PCA Mix chart shows that the proposed chart performs better for small shifts in the process, whereas the PCA Mix chart performs better for large shifts. The method can be applied to monitor processes with nonlinear relationships, such as manufacturing and industrial processes, chemical processes, biological processes, and network anomaly detection. The proposed chart is also applied to a real dataset, with the well-known NSL-KDD dataset used as the benchmark. The monitoring results show that the proposed chart attains a detection accuracy of about 0.85769. Compared with the other methods, the proposed chart performs better, producing higher accuracy and fewer false alarms. For future research, Generative Principal Component Analysis (K. Liu et al., 2020, 2021) could be used to improve the performance of the proposed method, and the Bayesian-based PCA method (Y. Liu et al., 2018) could be applied to imbalanced cases.

Declarations

Author contribution statement

Muhammad Ahsan: Conceived and designed the experiments; Performed the experiments; Analyzed and interpreted the data; Wrote the paper. Muhammad Mashuri: Conceived and designed the experiments; Wrote the paper. Hidayatul Khusna: Analyzed and interpreted the data; Wrote the paper. Wibawati: Analyzed and interpreted the data; Contributed reagents, materials, analysis tools or data.

Funding statement

This work was supported by (3/81/KP.PTNBH/2021).

Data availability statement

Data associated with this study is available at https://www.unb.ca/cic/datasets/nsl.html.

Declaration of interests statement

The authors declare no conflict of interest.

Additional information

There is no additional information.