Literature DB >> 31767888

Multivariate nonparametric chart for influenza epidemic monitoring.

Liu Liu1, Jin Yue2, Xin Lai3, Jianping Huang4, Jian Zhang5.   

Abstract

Control chart methods have been received much attentions in biosurvillance studies. The correlation between charting statistics or regions could be considerably important in monitoring the states of multiple outcomes or regions. In addition, the process variable distribution is unknown in most situations. In this paper, we propose a new nonparametric strategy for multivariate process monitoring when the distribution of a process variable is unknown. We discuss the EWMA control chart based on rank methods for a multivariate process, and the approach is completely nonparametric. A simulation study demonstrates that the proposed method is efficient in detecting shifts for multivariate processes. A real Japanese influenza data example is given to illustrate the performance of the proposed method.

Entities:  

Mesh:

Year:  2019        PMID: 31767888      PMCID: PMC6877522          DOI: 10.1038/s41598-019-53908-6

Source DB:  PubMed          Journal:  Sci Rep        ISSN: 2045-2322            Impact factor:   4.379


Introduction

Control charts are useful tools for fault detection[1]. Shewhart chart, CUSUM chart and EWMA chart are most popular tools in statistical process control. These control charts are efficient and fruitful for fault diagnosis in practical applications. Most control charts that need observations are univariate and usually assume that the observation follows a known gaussian distribution. In real life, we usually process multivariate or high-dimensional variables rather than univariate variables. The monitoring of high–dimensional data in a timely manner has become increasingly important in quality control. Hotelling[2] proposed a T-squared control chart for multivariate process, which assumes that the dataset distributions are multivariate normal distribution. Both the parameters of mean vector and variance matrix are known. Based on T2 statistics, Lowry et al.[3] proposed a multivariate CUSUM chart. Furthermore, Sullivan and Woodall[4] provided a change–point chart for detecting a shift of the location parameter, the scale parameter. However, statistical process control is a challenge when the underlying distribution and the magnitude of changes are both totally unknown. For the situation of a multivariate process with an unknown distribution, Yue and Liu[5], from the point of Mahalanobis data depth, introduced a chart for monitoring processes for multivariate process. Data depth is efficient and totally nonparametric. However, the computational complexity is high as the number of variables grows and may influence the performance of detection of a chart. In addition, the covariance matrix of the data depth method is constant[5]. Therefore, the method may be unsuitable when the covariance matrix in a process is not stable. Zou and Tsung[6] proposed a new multivariate EWMA chart to detect location parameters. The chart is affine-invariant, and its controlled run length distribution is the same for the class of distributions with elliptical directions. Some strictly distribution–free rank–based methods have been developed to increase the efficiency in detecting a nonparametric process[7-9]. The computation speed of these rank–based methods is fast, and the methods are easy to implement. However, all of these methods focus on a univariate process. In our article, we introduce a new nonparametric multivariate EWMA chart based on rank method, which is combined with the Hotelling T2 statistic for a multivariate process. This method is completely distribution–free, and it is easy to implement in applications. Moreover, the covariance matrix of observations keeps being updated as new observations arrive. Additionally, the computation load is very light. For multivariate or high-dimensional statistical process control, location parameter shifts sometimes occur in only one or a few characteristics in a process. We want to detect these shifts quickly, accurately and to identify the shifted location parameter components. Consider this issue, fruitful nonparametric control charts have been introduced in the literature. Qiu and Hawkins[10] and Hawkins[11] constructed a new multivariate statistical process control chart and indicated that proposed chart was more efficient than the T2 control chart when a shift occurred in only one characteristic. However, the shift of a process is usually unknown and may occur in several highly correlated variables. To address this issue, in the context of a process where the location parameter often changes in a few number of variables, Zou and Qiu[12] proposed a useful multivariate statistical process control chart by using the LASSO tool. In addition, inspired by Zou and Tsung, Liang et al.[13] came up with a new multivariate EWMA chart to monitor sparse mean changes. In our paper, the proposed method is designed to detect sparse mean changes, and the results shows that this method performs relatively better in applications. Previous studies showed that the multivariate control chart could be useful for biosurveillance. Rogerson and Yamada[14] proposed a multivariate cumulative sum approach to detect the change in spatial patterns and applied it to a county-level breast cancer datasets. Their results suggested that the proposed chart for multivariate process performed relatively better compared with the univariate method when shifts occurred in many regions. Abdollahian and Hayati Rezvan[15] applied a multivariate EWMA control chart to monitor patient’s progress after cardiac surgery, in which the proposed multivariate EWMA chart can detect an out-of-control signal that was missed by the univariate EWMA charts. This is because that the correlations between charting statistics are ignored in univariate chart. Then the univariate chart may give a misleading indication when such correlation is considerably high. The structure of this paper is organized as follows: in Section 2, the rank–based method is given, and a nonparametric chart for online monitoring is provided. A simulation of this control chart is presented in Section 3. Real data are studied to illustrate the performance of the proposed control chart in Section 4. Finally, some conclusions are presented in Section 5.

Model

EWMA control chart

The EWMA control chart has good properties for control applications. Lucas and Saccucci[16] studied the performance of EWMA and CUSUM charts. In their paper, the EWMA chart has relatively better performance for small shifts with an appropriate smoothing parameter. The EWMA control chart is first introduced for univariate variables. The EWMA control chart is easy to construct and implement, and it is based on the following statistic: Z is the EWMA statistic, where the starting value is Z0 = 0, and λ is a smoothing parameter. X represents the observations in a process. The EWMA chart corresponds to a Shewhart control chart when λ = 1. The weight of the historical data is decided by the magnitude of the smoothing parameter. A process is considered out-of-control (OC) whenever Z falls outside the range of the control limits.

Rank–based methods

A rank–based method is first given for a one–dimensional process. Liu et al.[9] introduced the rank–based method and assumed that independent observations, X, follow the model below:where μ0 is the in-control (IC) location parameter, and μ1 is the OC location parameter. τ is the unknown change point. F is an unknown continuous distribution function. Let R denote the i th sequential rank; Liu et al.[9] presented the formula for the rank of X among X1, X2, …, X, …, X as follws: The standardized sequential rank was defined aswhere . Therefore, Then, Therefore, the distribution of R* is defined in the interval The asymptotic distribution of R* is U(, ) as i → ∞. In the context of a multivariate process, it is supposed that there are m independent observations from an unknown multivariate continuous distribution with dimensionality p. That is, Y = (Y1,,Y2,, …, Y)′, i = 1, 2, …, m. There are p characteristics to be examined that we are interested in. For a set of variables, Y, Y, …, Y, j = 1, 2, …, p, which represents the j th characteristic with m observations, the rank–based method can be used to construct statistics. When the observations are p-dimensional, the i th observations are Y = (Y1,, Y2,, …, Y)′. For the j th component, Y, R* denote the i th standardized sequential rank with the arrival of the j th component Y. Therefore, the vectors Q = (R1,*, R2,*, …, R*)′ can be obtained. In addition, each component R* follows the same uniform distribution as R*. Then, the EWMA statistics can be constructed, which are based on T2 statistics. The EWMA statistics are given bywhere R = diag(λ1, λ2, …, λ, …, λ), <λ ≤ 1 represents the smoothing parameter. I represents the p-dimensional identity matrix. If there is no a priori information given, different smoothing parameters are needed for different components; then, λ1 = λ2 = ··· = λ = ··· = λ are used, and the starting value is Z0 = (0, 0, …, 0)′. The process is considered to be OC if a manufacturing or business process is in a state of uncontrollable (i.e. ZΤΣ−1Z > L), where L is the upper control limit. And the covariance matrix of Z is as follows: In particular, Σ = (1−(1−λ)2)λ/(2−λ)Σ when λ1 = λ2 = ··· = λ = ··· = λ = λ. λ is a fixed value. Usually, we take the limit form, Σ = λ/(2−λ)Σ. Σ, the covariance matrix of Q, is estimated from samples in practice.

Simulation

In the art of research, fruitful distribution–free control charts have been introduced. If a chart IC run–length distributions are the same to every continuous distribution[17], we call this chart is nonparametric or distribution-free. We discuss the choice of parameter by using the multivariate normal distribution. This indicates that the determine of parameters are still valid when a series of observations obey other distributions. Therefore, we consider the i th observation, X, is collected as time goes by using the following relational model:where And α is the probability of a type I error and β is the probability of a type II error. For a fair comparison, we usually fix α and compare β. A small β is considered better. The average run length (ARL) is the number of points that, on average will be plotted on a control chart before an OC signal. If a manufacturing or business process is IC: If the process is considered OC: Therefore, we fix IC ARL, ARL0 and compare OC ARL, ARL1. A small ARL1 is considered better. Meanwhile, inspired by Han and Tsung[18], we consider the relative mean index (RMI) values to evaluate the average performance of these charts for detecting a range of parameter changes, which are given as following:where m is the number of shifts that we considered. When detecting a certain shift δ, ARL denotes as the OC ARL of these given charts. And MARL is the smallest OC ARL among all the OC ARL values of these charts when detecting a certain shift δ. The RMI calculates the average of all the detection efficiency values[18]. A control chart with a relatively smaller RMI value is regarded as relatively better detection efficiency. We suppose that there are 1000 independent and identically distributed historical (reference) observations. X1, X2, …, X1000 are 1000 random observations from N(μ0,Σ0). To make a fair comparison, all of these control charts have the same IC zero–state ARL, which is equal to 500. It should be note that zero-state run lengths refer to the run lengths of control charts initialized at the target value[16]. When the process goes OC, a chart is considered as a better detection efficiency with a small ARL. The ARLs of these EWMA methods with λ = 0.03 for a range of shifts are presented in Table 1. EWMA1 represents the rank-based EWMA scheme, and EWMA2 represents an EWMA control chart based on the Mahalanobis depth method[5]. We also provide simulation studies with the non-diagonal covariance matrix
Table 1

ARL comparisons for the EWMA control chart under N(μ0,Σ0) with a zero–state ARL = 500.

δXRMI
0.250.511.522.534
τ= 100
EWMA1210.991.425.815.911.610.4109.60.04
EWMA2325.7154.939.819.612.910.18.980.27
τ= 200
EWMA1108.763.233.32012.31110.79.50.16
EWMA2314147.734.917.211.28.77.66.10.41
τ= 400
EWMA1137.376.137.420.215.212.711.810.30.24
EWMA2347.7145.438.118.111.397.76.90.31
ARL comparisons for the EWMA control chart under N(μ0,Σ0) with a zero–state ARL = 500. The ARLs of the EWMA scheme with λ = 0.03 for a range of shifts under N(μ0,Σ1) are presented in Table 2. In addition, the detection performance of these charts under a bivariate Weibull distribution, LBVW(θ1, θ2,α,ρ) are shown in Table 3. θ1 and θ2 are the scale parameter. α is the shape parameters. ρ is the correlation coefficient. When a process is IC, . when the process is OC. Tables 1–3 provide the ARL of the EWMA1 and EWMA2 control charts for a range of shifts δ. Tables 1–3 show that the EWMA1 control chart has a relatively better performance for detecting small shifts. EWMA2 has a better performance for detecting large shifts. On the whole, EWMA1 has a relatively small RMI.
Table 2

ARL comparisons for the EWMA control chart under N(μ0,Σ1) with a zero–state ARL = 500.

δXRMI
0.250.511.522.534
τ= 100
EWMA1239.1177.528.116.512.911.510.39.10.02
EWMA2345.2281.54724.315.711.39.68.40.3
τ= 200
EWMA1211.6127.427.61612.810.59.67.90.04
EWMA2260.5163.545.823.715.1108.370.23
τ= 400
EWMA11909426.315.512.99.88.97.50.04
EWMA2236.9149.641.321.614.79.17.56.90.24
Table 3

ARL comparisons for the EWMA control chart under LBVW(1, 1, 1, 0.5) with a zero–state ARL = 500.

δXRMI
0.250.511.522.534
τ= 100
EWMA1132.762.124.216.713.912.911.810.60.1
EWMA215694.9401912.310.19.990.19
τ= 200
EWMA1116.242.923.716.213.912.811.810.30.11
EWMA21346231.92111.2109.68.90.16
τ= 400
EWMA193.339.321.716.213.111.911.510.10.1
EWMA2107.755.42618.11110.29.78.10.11
ARL comparisons for the EWMA control chart under N(μ0,Σ1) with a zero–state ARL = 500. ARL comparisons for the EWMA control chart under LBVW(1, 1, 1, 0.5) with a zero–state ARL = 500. Table 4 presents the simulation results under N(μ2,Σ2), where μ2 = (0, 0, 0, 0, 0, 0) and Σ2 is 6 × 6 indentity matrix. Table 4 shows that EWMA1 still performs better. Sometimes, we encounter the case that observations follow block-diagonal correlation structures. Therefore, we provided ARL comparisons for observations follow a block-diagonal correlation structures, which presented in Table 5. Where μ3 = (0, 0, 0, 0) and
Table 4

ARL comparisons for the EWMA control chart under N(μ2,Σ2) with a zero–state ARL = 500.

δXRMI
0.250.511.522.534
τ= 100
EWMA1135.437.415.312.89.48.987.60.04
EWMA2175.261.52413.38.78.27.670.19
τ = 200
EWMA18528.41511.298.17.57.10.03
EWMA2106.543.519.912.78.17.57.370.16
τ= 400
EWMA170.921.713.510.98.67.57.370.03
EWMA289.938.616.311.687.176.80.12
Table 5

ARL comparisons for the EWMA control chart designed to detect a shift under N(μ3,Σ3) with a zero–state ARL = 500.

δXRMI
0.250.511.522.534
τ= 100
EWMA1114.931.113.711.39.98.98.67.90.01
EWMA234112231.614.511.78.88.47.50.83
τ= 200
EWMA178.929.313.410.49.38.57.17.10.02
EWMA2198.798.328.713.810.58.57.16.10.68
τ= 400
EWMA167.626.312.99.38.78.17.17.10.02
EWMA2110.66821.911.68.57.96.86.80.4
ARL comparisons for the EWMA control chart under N(μ2,Σ2) with a zero–state ARL = 500. ARL comparisons for the EWMA control chart designed to detect a shift under N(μ3,Σ3) with a zero–state ARL = 500. Table 5 shows the proposed methods performs relatively better. In addition, the proposed control chart based on ranks of data is a nonparametric method without assuming normal or Poisson distribution for the data. To investigate the performance of the proposed method for Poisson data, we conducted an additional simulation study under multivariate Poisson distribution. Results in Table 6 showed that the proposed methods (EWMA1) still had a better performance in terms of the OC ARL and RMI.
Table 6

ARL comparisons for the EWMA control chart designed to detect a shift under multivariate Poisson(θ1 + δ, θ2, θ0) with a zero–state ARL = 500, where (θ1, θ2, θ0) = (0.5, 0.6, 0.2).

δXRMI
0.250.511.522.534
τ= 100
EWMA1105.229.713.612.210.99.38.98.10.02
EWMA2111.731.21612.710.68.58.57.90.04
τ= 200
EWMA166.42513.310.398.68.17.90.03
EWMA271.727.914.410.69.18.17.37.20.04
τ= 400
EWMA152.624.913.110.78.68.38.17.70.02
EWMA261.825.514.110.98.38.37.67.30.04
ARL comparisons for the EWMA control chart designed to detect a shift under multivariate Poisson(θ1 + δ, θ2, θ0) with a zero–state ARL = 500, where (θ1, θ2, θ0) = (0.5, 0.6, 0.2). In addition, we also provide the computing time of the EWMA1 and EWMA2 control charts. From Fig. 1, EWMA1 has relatively shorter computing time compared to that of EWMA2. Therefore, the proposed EWMA control chart is chosen, which is based on rank methods, for monitoring in this paper.
Figure 1

Computing time of the EWMA1 and EWMA2 charts for a range of shifts.

Computing time of the EWMA1 and EWMA2 charts for a range of shifts.

Analysis of Japanese Data

Data source

That is the case, with the Japanese influenza data[19], which cover 6 regions in Japan. These regions include Gunma, Chiba, Tokyo, Ishikawa, Nagano, and Osaka. Influenza data analysis is a very important issue today[20,21]. Simultaneous monitoring of flu break–outs in multiple regions is an important topic in epidemiology. Influenza is an acute contagious disease caused by a virus[19]. The Japanese influenza data are used to illustrate the proposed control chart. Time–series data of the weekly incidence of influenza in Japan are used from January 2000 through December 2011. To evaluate the incidence data (see “Influenza Dataset” in Supplementary Information), we conduct spectral analysis, which is useful for investigating the periodicities of shorter time series, such as that of the incidence data used in the present study. The Japanese influenza data are presented in Fig. 2. A quantile–quantile (Q–Q) plot of each region that includes 782 historical observations is presented in Fig. 3. Figure 3 suggests that the normality assumption for the influenza data is invalid.
Figure 2

The Japanese influenza data.

Figure 3

The corresponding normal Q-Q plot.

The Japanese influenza data. The corresponding normal Q-Q plot. The correlation of six regions as shown in Fig. 4, for a total of C62 = 15 lines. Figure 4 shows that the cross-correlation is not stable. Therefore, we update the covariance matrix with the arrival of new observations. It should be noted that the covariance matrix Σ is updated, as presented in section 2.2.
Figure 4

Correlation of six regions.

Correlation of six regions.

Data analysis

In this section, a multivariate control chart is used to monitor the incidence of influenza in six regions which may have a certain correlation. Ignoring the correlation and using several univariate charts could lead to biased conclusions. For example, the univariate chart statistic may result in unnecessarily frequent out-of-control signals when the process is actually in control and may not detect the change when the process becomes out of control[3]. In the past few decades, many researchers have studied spectral analysis[22]. In addition to the obvious annual cycle of influenza epidemics, the longer–term incidence patterns are important for interpreting the mechanism of influenza epidemics. The method proposed by Sawada et al.[23] is a combination of spectral analysis and non–linear least squares fitting (LSF) for fitting analysis. Spectral analysis is a useful tool to investigate the periodicities of a short time series, and the formulations of the LSF curve are related to the research of Sawada et al. Spectral analysis is used identify the interepidemic period of influenza epidemics in Japan (see “Computing Code” in Supplementary Information). Based on spectral analysis, the trend of the incidence data is determined. The procedure comprises the following 3 steps. In step I, the influenza data (standardized datasets) are preprocessed. In step II, the temporal behavior of the interepidemic period is investigated. Then, LSF is used for the fitting analysis. This trend is then removed by subtracting the LSF curve from the data, thereby yielding the residual time–series data. In step III, the obtained residual time–series datasets are analysed. The vertical coordinates of Fig. 5 represents the power spectral density (PSD). Figure 5 indicates that the numbers of the maximum entropy method (MEM) spectral periods. From Fig. 5 and the processed data, we find that the power has a large magnitude at a frequency of 0.035 (1/week), and there is a second peak at a frequency of 0.019 (1/week). A large magnitude indicates that a large portion of the amplitude of the incidence data is expressed as a wave that repeats itself every year. Spectral analysis has enabled us to identify multiple periodicities for the interepidemic period of influenza epidemics (1- and 0.5-year periods). The residual time–series data are relevant.
Figure 5

Spectral analysis of the influenza data series.

Spectral analysis of the influenza data series. For residuals data, Table 7 presents the results of Shapiro-Wilk test and Kolmogorov-Smirnov test for normality. The p-values are smaller than 0.05, indicating that the data are non-normally distributed. Therefore, a nonparametric control chart could be more appropriate than those based on normality assumption. Moreover, a first order autoregressive model (AR(1)) is used to analyze the sequence correlation. Table 8 shows that sequences are highly correlated. Thus, the first order difference is employed to reduce the sequence correlation (see results in Table 9). Then the differential data can be used to illustrate the proposed method.
Table 7

Shapiro-Wilk test and Kolmogorov-Smirnov test for normality.

GunmaChibaTokyoIshikawaNaganoOsaka
Shapiro-Wilk test
W0.957380.9620.981650.939150.955040.94605
pvalue2.752e-142.271e-132.464e-08<2.2e-161.002e-142.809e-16
Kolmogorov-Smirnov test
D0.0752240.121620.178720.107590.0716470.10472
pvalue0.00028681.796e-10<2.2e-162.747e-080.0006527.112e-08
Table 8

The coefficients of AR(1) for residuals data.

GunmaChibaTokyoIshikawaNaganoOsaka
Coefficients0.90860.91050.93640.88540.90390.9111
Table 9

The coefficients of AR(1) for residuals data after the first order difference.

GunmaChibaTokyoIshikawaNaganoOsaka
Coefficients−0.1249−0.1813−0.1563−0.2178−0.0699−0.2118
Shapiro-Wilk test and Kolmogorov-Smirnov test for normality. The coefficients of AR(1) for residuals data. The coefficients of AR(1) for residuals data after the first order difference. The EWMA1 control chart of the residual data series is presented in Fig. 6. Figure 6 shows that EWMA statistics fall outside the range of the control limits in 2003, 2006, 2009. SARS jumped simultaneously from a village in China to two cities on opposite sides of the world, Singapore and Toronto, in 2003. H5N1 outbreaks in poultry peaked in 2006, and the highly pathogenic H5N1 avian influenza virus spread to affect wild or domestic birds in 17 new countries in Africa, Asia, Europe, and the Middle East. The H1N1 influenza pandemic continued to spread in 2009. From Fig. 7, the four peaks occurred at approximately the 160th case (2003-1-19), 366th case (2006-12-31), 509th case (2009-9-27), and 596th case (2011-5-29), respectively. The signal of alarm appeared for the 159th case (2003-1-12), 363th case (2006-12-10), 502th case (2009-8-9), suggesting that the proposed method can provide early detection of influenza epidemics.
Figure 6

EWMA1 control chart.

Figure 7

EWMA control chart based on data depth.

EWMA1 control chart. EWMA control chart based on data depth. We provide the performance of EWMA2 by using Japanese influenza data (Fig. 7). It can be observed that the EWMA2 chart shows an inconsistent trend with the result in practice (the charting statistics indicate that the six regions are almost at the epidemic level after 32 cases). This may be caused by the constant covariance setting in EWMA2. Hence, updating the covariance between the six regions could be important in correctly detecting an epidemic of influenza. We also presented six single univariate control charts for Japanese influenza data in Fig. 8. The univariate chart statistic gave unnecessarily frequent out-of-control signals when the process is actually in control. Specifically, the first out-of-control signal of six regions occurred approximately at the 30th case, the 61th case, the 42th case, the 24th case, the 27th case, and the 17th case, respectively. However the multivariate chart may suggest a in-control state, indicating that ignoring the correlation between regions in biosurveillance may give an unexpected high rate of false alarm.
Figure 8

Six single univariate control charts for Japanese influenza data.

Six single univariate control charts for Japanese influenza data.

Conclusions

This paper provides a new EWMA control chart based on rank methods for a multivariate process. The performance of an EWMA control chart based on rank methods and Mahalanobis depth are compared. The EWMA control chart based on rank methods has a relatively better performance for detecting small shifts. Finally, Japanese influenza data are also provided to illustrate the proposed control chart. Spectral analysis is first conducted to investigate the periodicities of shorter time series, and then non–linear least squares fitting is used for fitting analysis. The residual data series are obtained, and the residual data series are monitored. The Japanese influenza data example shows that the proposed control chart has relatively better performance for detecting process changes. Computing Code Influenza Dataset
  2 in total

1.  Monitoring non-parametric profiles using adaptive EWMA control chart.

Authors:  Saddam Akber Abbasi; Ali Yeganeh; Sandile C Shongwe
Journal:  Sci Rep       Date:  2022-08-22       Impact factor: 4.996

2.  Early Detection of SARS-CoV-2 Epidemic Waves: Lessons from the Syndromic Surveillance in Lombardy, Italy.

Authors:  Giorgio Bagarella; Mauro Maistrello; Maddalena Minoja; Olivia Leoni; Francesco Bortolan; Danilo Cereda; Giovanni Corrao
Journal:  Int J Environ Res Public Health       Date:  2022-09-28       Impact factor: 4.614

  2 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.