
Change-point analysis through integer-valued autoregressive process with application to some COVID-19 data.

Subhankar Chattopadhyay¹, Raju Maiti², Samarjit Das², Atanu Biswas¹.

Abstract

In this article, we consider the problem of change-point analysis for count time series data through an integer-valued autoregressive process of order 1 (INAR(1)) with time-varying covariates. Such change-points are observed in many real-life scenarios, especially in COVID-19 data sets, where the number of active cases over time starts falling and then increases again. To capture these features, we use a Poisson INAR(1) process with a time-varying smoothing covariate. Such a model allows us to represent both components of the active cases at time-point t, namely (i) the number of nonrecovered cases from the previous time-point and (ii) the number of new cases at time-point t. We study some theoretical properties of the proposed model along with forecasting, and we perform simulation studies to assess the effectiveness of the proposed method. Finally, we analyze two COVID-19 data sets and compare our proposed model with another PINAR(1) process that has a time-varying covariate but no change-point, to demonstrate the overall performance of our proposed model.


Keywords: COVID‐19; INAR(1) process; Poisson distribution; active cases; change‐point; smoothing function; time‐varying covariates

Year: 2021 PMID: 34226773 PMCID: PMC8242783 DOI: 10.1111/stan.12251

Source DB: PubMed Journal: Stat Neerl ISSN: 0039-0402 Impact factor: 1.239

INTRODUCTION

Time series of count data have been widely studied during the last three decades or so owing to their increasing relevance in various fields of science. There are several ways to model count time series data. For example, McKenzie (1985, 1986) and Al‐Osh and Alzaid (1987) introduced a class of stationary integer‐valued autoregressive (INAR) time series processes based on the binomial thinning operator. This process was further studied and generalized by Alzaid and Al‐Osh (1990), Jin‐Guan and Yuan (1991), Freeland and McCabe (2004), Ristić, Bakouch, and Nastić (2009), Jazi, Jones, and Lai (2012), Schweer and Weiß (2014), Maiti, Biswas, and Das (2015), and many others. In particular, McKenzie (1986) introduced integer‐valued AR(1), or INAR(1), models with geometric and negative binomial marginals for overdispersed data. McKenzie (1985) and Al‐Osh and Alzaid (1987) developed an INAR(1) process with Poisson marginals, well known as the PINAR(1) process, which is very popular due to its simple form. The INAR(1) process was further extended to the more general INAR(p) process by Alzaid and Al‐Osh (1990) and Jin‐Guan and Yuan (1991). Ristić et al. (2009) and Schweer and Weiß (2014) proposed new INAR(1) processes based on the negative binomial thinning operator, which can also handle overdispersion. Jazi et al. (2012) and Maiti et al. (2015) studied zero‐inflated PINAR(1) (ZIPINAR(1)) processes for zero‐inflated count data. Apart from these thinning‐based INAR processes, Cameron and Trivedi (1986) and Fokianos (2011) studied regression‐based models for count time series data. In this article, we employ the INAR process to model the data of COVID‐19 active cases, which is an example of count time series data. In an INAR process there are two components at time‐point t, namely (i) the nonrecovered cases from the previous time‐point (survival part) and (ii) the new cases entering the process at time‐point t (innovation terms).
These INAR processes are mainly stationary because the innovation terms involve no time‐varying covariate, that is, the distribution of new cases entering the process does not change over time. In real‐life scenarios such as the COVID‐19 data sets, however, the rapid change in the number of infected cases makes the innovation terms time‐dependent. Besides this time‐varying nature of the innovation terms, we also notice change‐points in these data sets. In the current COVID‐19 pandemic, we mainly see two types of curves for daily new cases reported in different parts of the world: (i) curves that at first increased exponentially but started decreasing after the respective authorities in different countries took major steps such as “nationwide lockdowns,” “social distancing” measures, and mass testing; and (ii) curves that came down but started to rise again as the respective authorities began to ease those measures. The curves of daily active cases change in the same way in those parts of the world. Hence we can spot one change‐point (upward to downward) for the curve described in Case (i) and two change‐points (upward to downward and then downward to upward) for the curve in Case (ii). In this article, we develop a PINAR process based on the binomial thinning operator for count time series data such as the COVID‐19 data, where we model the innovation terms through some time‐varying covariates and a smoothing change‐point function without changing the survival part. The PINAR process, introduced by McKenzie (1985) and Al‐Osh and Alzaid (1987), is very popular due to its simple form and is widely applied in modeling count time series data. However, this PINAR process based on the binomial thinning operator cannot handle count time series data that have both change‐points and time‐varying innovation terms.
Hence we introduce a new PINAR model that can tackle both of these features of the COVID‐19 data sets. To incorporate the change‐points in our proposed PINAR model, the innovation terms are modeled with a smoothed version (see Smooth maximum, n.d.) of a time‐varying covariate that contains the change‐points. The idea of capturing the change‐points in the innovation terms through a time‐varying smoothing covariate is inspired by Chan and Tong (1986), Hansen (2000), and Fong, Huang, Gilbert, and Permar (2017), whose works mainly concern continuous data. We use this smoothed time‐varying component in our proposed model to capture the changing curvature in the data of daily active cases. The effectiveness of the proposed model in both the one change‐point and the two change‐point settings is assessed later through simulation studies and the analysis of two COVID‐19 data sets. We compare our proposed model with another PINAR model that has a time‐varying covariate but no change‐point, to illustrate the overall performance of the proposed model. The rest of the article is organized as follows. Section 2 discusses two real COVID‐19 data sets. Section 3 describes our proposed model along with a brief illustration of the INAR(1) process. We provide the distributional properties of our proposed model and the h‐step ahead forecasting distribution in Sections 4 and 5, respectively. Section 6 discusses the estimation method for our proposed model. Extensive simulation studies for the proposed model are provided in Section 7. In Section 8, we analyze the COVID‐19 data sets. Finally, some conclusions are drawn in Section 9. All proofs of the theoretical results are provided in the Appendix.

MOTIVATING DATA EXAMPLES: COVID‐19 DATA

The world is now facing, in the form of the COVID‐19 pandemic, the biggest global health crisis in recent times. The outbreak was first identified in Wuhan, China, in early December 2019. The World Health Organization declared the outbreak a Public Health Emergency of International Concern on January 30, 2020, and a pandemic on March 11, 2020. To restrict the spread of the virus in its early stages, the respective authorities in different parts of the world implemented strict measures such as “nationwide lockdowns,” “rapid testing,” strict “social distancing,” and the use of masks and sanitizers in public places. As a result, the COVID‐19 situation improved in certain parts of the world, and lockdowns were eased there. During that period, evacuations of people from the Gulf countries also took place in several countries, especially India. “Community transmission” then started in some of these regions owing to the highly infectious nature of the virus, and the number of infected cases began to pile up again. For further discussion in this regard, we explore two real COVID‐19 data sets in Sections 2.1 and 2.2.

COVID‐19 data of Italy

This data set is an example of Case (i) described in Section 1. We see only one change‐point in the data of active cases in Italy, and hence the study is based on one change‐point analysis. The data (see Worldometer, n.d.) were collected from February 15 to June 6, 2020 (113 days in total). Although the first case in Italy was detected in January 2020, the cases started to increase rapidly from the beginning of March. After sustained measures taken by the authorities, the curve of active cases started to come down. As of June 6, 2020, the total number of confirmed cases was more than 234k, the number of deaths was more than 33.8k, and the number of active cases was more than 35 000. Figure 1 displays the data of daily new cases, the data of daily active cases, and the autocorrelation function (ACF) and partial ACF (PACF) plots of daily active cases. The ACF and PACF plots suggest that an AR(1) structure fits the data well.

FIGURE 1

COVID‐19 data of Italy

COVID‐19 data of Kerala

This data set is an example of Case (ii). Here we observe two change‐points in the data of daily active cases, and hence the study is based on two change‐point analysis. In Kerala, the first case was also detected in January 2020, but the cases started to pile up from mid‐March. Owing to strict measures taken by the state government of Kerala, the curve of active cases came down, but from mid‐May the cases started to rise again when evacuees from the Gulf countries began to arrive in the state. The data for Kerala (see GoK Dashboard, n.d.) were collected from March 9 to June 6, 2020 (90 days in total). More than 1800 cases and a total of 15 deaths had been reported in Kerala as of June 6, 2020, and the number of active cases was more than 1000. Figure 2 displays the data of daily new cases, the data of daily active cases, and the ACF and PACF plots of daily active cases. Again, an AR(1) structure appears to fit the data well.

FIGURE 2

COVID‐19 data of Kerala

MODEL

In this section, we develop a new model based on the integer‐valued AR(1) process to capture the change‐points in count time series data sets such as the COVID‐19 data sets of Italy and Kerala. We use the INAR(1) process (proposed by McKenzie, 1985 and Al‐Osh & Alzaid, 1987) with the binomial thinning operator (introduced by Steutel & van Harn, 1979) to develop our proposed model for change‐point analysis, which is given by

$$X_t = \alpha \circ X_{t-1} + \varepsilon_t, \qquad (1)$$

where $X_t$ denotes the number of daily active cases at time‐point t, $\varepsilon_t$ represents the daily new cases reported at time‐point t, and $\alpha \circ X_{t-1} = \sum_{i=1}^{X_{t-1}} B_i$, with $B_i$ i.i.d. Bernoulli($\alpha$), $\alpha \in (0, 1)$, is the number of nonrecovered cases carried over from time‐point $t-1$. We assume that $\varepsilon_t$ follows Poisson($\lambda_t$), where $\lambda_t$ has the form

$$\log \lambda_t = \beta_0 + \beta_1 t + \beta_2\, S_{\delta_n}(t-\tau), \qquad (2)$$

where $S_{\delta_n}(\cdot)$ is the smooth, differentiable approximation of $\max(\cdot, 0)$ described in Section 3.1, and the tuning parameter $\delta_n$ helps to capture the changing curvature of the data. Here $\tau$ denotes the change‐point in the data, which can be estimated from the data (Section 3.4). The above model is defined for only one change‐point; $\boldsymbol\beta = (\beta_0, \beta_1, \beta_2)'$ are the regression parameters, and $\beta_2$ is the regression coefficient associated with the time‐varying covariate containing the change‐point. The model can easily be extended to more than one change‐point. For two change‐points, only the form of $\lambda_t$ changes:

$$\log \lambda_t = \beta_0 + \beta_1 t + \beta_2\, S_{\delta_n}(t-\tau_1) + \beta_3\, S_{\delta_n}(t-\tau_2), \qquad (3)$$

where $\tau_1$ and $\tau_2$ are the two change‐points in the data. Here $\boldsymbol\beta = (\beta_0, \beta_1, \beta_2, \beta_3)'$ are the regression parameters, and $\beta_2$ and $\beta_3$ are the regression coefficients associated with the time‐varying covariates containing the change‐points. In the subsequent section, we explain the idea behind the proposed model. The single tuning parameter used for data with one change‐point could be generalized to two different tuning parameters, $\delta_{n1}$ and $\delta_{n2}$, for data with two change‐points. However, in our proposed process for two change‐points we use only $\delta_n$ instead of $\delta_{n1}$ and $\delta_{n2}$, mainly because this reduces the computational burden and simplifies the form of the proposed model.
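Model (1) can be made concrete with a short simulation. The Python sketch below (standard library only; all function names are hypothetical) draws one PINAR(1) path: binomial thinning keeps each of the $X_{t-1}$ active cases independently with probability $\alpha$, and the new cases $\varepsilon_t$ are Poisson($\lambda_t$). The innovation means $\lambda_1, \ldots, \lambda_n$ are passed in directly, so the sketch does not depend on the exact form of Equation (2), and the path starts from $X_1 \sim$ Poisson($\lambda_1$), the initial condition assumed for the marginal distribution in Section 4.

```python
import math
import random


def poisson_draw(rng, lam):
    """Exact Poisson(lam) draw by Knuth's multiplication method.

    The mean is processed in chunks of at most 30 so that exp(-chunk) never
    underflows; a sum of independent Poisson variables is again Poisson.
    """
    total = 0
    while lam > 0:
        chunk = min(lam, 30.0)
        lam -= chunk
        limit = math.exp(-chunk)
        k, prod = 0, rng.random()
        while prod > limit:
            k += 1
            prod *= rng.random()
        total += k
    return total


def binomial_thin(rng, x, alpha):
    """alpha o x: number of survivors among x independent Bernoulli(alpha) trials."""
    return sum(1 for _ in range(x) if rng.random() < alpha)


def simulate_pinar1(alpha, lambdas, seed=None):
    """Simulate X_t = alpha o X_{t-1} + eps_t with eps_t ~ Poisson(lambda_t).

    lambdas holds lambda_1, ..., lambda_n; X_1 ~ Poisson(lambda_1).
    """
    rng = random.Random(seed)
    x = [poisson_draw(rng, lambdas[0])]
    for lam in lambdas[1:]:
        x.append(binomial_thin(rng, x[-1], alpha) + poisson_draw(rng, lam))
    return x
```

For example, `simulate_pinar1(0.8, lams, seed=1)`, with `lams` computed from any intensity model, returns one simulated path of daily active cases.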

Idea behind the model

The idea behind the form of $\lambda_t$ in Equations (2) and (3) comes from the threshold regression setup (see Chan & Tong, 1986; Fong et al., 2017; Hansen, 2000). Following the segmented model in threshold regression, we can write the form of $\lambda_t$ for one change‐point as $\log\lambda_t = \beta_0 + \beta_1 t + \beta_2 (t-\tau)_+$, where $(x)_+ = \max(x, 0)$ and $x = t - \tau$. In this segmented form, the curve of daily new cases, and hence the curve of daily active cases, changes sharply (upward to downward or downward to upward) at $\tau$. But in real‐life scenarios such as the COVID‐19 data, we rarely see such sharp changes; most of the time we notice changing curvature(s) in these data sets. So we capture the changing curvature(s) in the data of daily active cases by modeling the daily new cases (innovation terms) through time‐varying covariates and smoothing change‐point functions. Moreover, the function $\max(x, 0)$ is not differentiable at $x = 0$. So we replace $\max(x, 0)$ (for $x = t - \tau$) by a smooth, differentiable maximum function $S_{\delta_n}(x)$ (see Smooth maximum, n.d.), which approaches $\max(x, 0)$ as $\delta_n$ increases. Hence the functional form of $\lambda_t$ for one change‐point is $\log\lambda_t = \beta_0 + \beta_1 t + \beta_2 S_{\delta_n}(t-\tau)$, that is, $\lambda_t = \exp\{\beta_0 + \beta_1 t + \beta_2 S_{\delta_n}(t-\tau)\}$. Similarly, the functional form of $\lambda_t$ for two change‐points is $\lambda_t = \exp\{\beta_0 + \beta_1 t + \beta_2 S_{\delta_n}(t-\tau_1) + \beta_3 S_{\delta_n}(t-\tau_2)\}$.
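The text does not spell out the smooth maximum itself. As an assumption, the sketch below uses the LogSumExp (softplus) form $S_\delta(x) = \delta^{-1}\log(1 + e^{\delta x})$, a standard smooth approximation of $\max(x, 0)$ that becomes sharper as $\delta$ grows, which matches the behavior described for $\delta_n$. It also assumes a log‐linear innovation mean with a linear time trend as the simple covariate, $\log\lambda_t = \beta_0 + \beta_1 t + \sum_k \beta_{k+1} S_\delta(t - \tau_k)$. Both choices and the function names are illustrative, not necessarily the exact forms used by the authors.

```python
import math


def smooth_max0(x, delta):
    """Assumed smooth maximum of (x, 0): (1/delta) * log(1 + exp(delta * x)).

    Rewritten as max(x, 0) + log1p(exp(-delta*|x|)) / delta to avoid overflow.
    Larger delta gives a sharper bend; the limit as delta grows is max(x, 0).
    """
    return max(x, 0.0) + math.log1p(math.exp(-delta * abs(x))) / delta


def intensity(t, betas, taus, delta):
    """Hypothetical innovation mean lambda_t with the log-linear form
    log lambda_t = beta_0 + beta_1 * t + sum_k beta_{k+1} * S_delta(t - tau_k).

    One change-point:  betas = (b0, b1, b2),     taus = (tau,)
    Two change-points: betas = (b0, b1, b2, b3), taus = (tau1, tau2)
    """
    b0, b1, *rest = betas
    eta = b0 + b1 * t + sum(b * smooth_max0(t - tau, delta) for b, tau in zip(rest, taus))
    return math.exp(eta)
```

At $x = 0$ the gap $S_\delta(0) - \max(0, 0) = \log(2)/\delta$ shrinks as $\delta$ increases, which is the sharpening of the changing curvature described for the tuning parameter.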

Conditions on the β's

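The changing behavior of these data sets depends on some conditions on the β's. We derive these conditions from the segmented model of the threshold regression setup for both sets of β's ($(\beta_0, \beta_1, \beta_2)$ in Equation (2) and $(\beta_0, \beta_1, \beta_2, \beta_3)$ in Equation (3)); they enable our proposed model to capture the change‐point(s). The required conditions for the one change‐point and two change‐point settings are given below.

(i) In the segmented form for one change‐point, we model $\log\lambda_t$ as $\beta_0 + \beta_1 t$ for $t \le \tau$ and $\beta_0 + \beta_1 t + \beta_2 (t - \tau)$ for $t > \tau$. The derivatives of $\log\lambda_t$ with respect to t are therefore $\beta_1$ for $t \le \tau$ and $\beta_1 + \beta_2$ for $t > \tau$. So for $\beta_1 > 0$ and $\beta_1 + \beta_2 < 0$, $\log\lambda_t$ increases when $t \le \tau$ and decreases when $t > \tau$; that is, $\lambda_t$ increases when $t \le \tau$ and decreases when $t > \tau$. Hence the change‐point is the $\tau$th time‐point in the data of daily new cases. So for count time series data with one change‐point, the condition $\beta_1 > 0$ and $\beta_1 + \beta_2 < 0$ must hold.

(ii) Similarly, for two change‐points, we model $\log\lambda_t$ as $\beta_0 + \beta_1 t$ for $t \le \tau_1$, $\beta_0 + \beta_1 t + \beta_2 (t - \tau_1)$ for $\tau_1 < t \le \tau_2$, and $\beta_0 + \beta_1 t + \beta_2 (t - \tau_1) + \beta_3 (t - \tau_2)$ for $t > \tau_2$. Hence the derivatives of $\log\lambda_t$ are $\beta_1$, $\beta_1 + \beta_2$, and $\beta_1 + \beta_2 + \beta_3$ on these three segments. So for $\beta_1 > 0$, $\beta_1 + \beta_2 < 0$, and $\beta_1 + \beta_2 + \beta_3 > 0$, $\lambda_t$ increases when $t \le \tau_1$, decreases when $\tau_1 < t \le \tau_2$, and increases again when $t > \tau_2$. Here the two change‐points are the $\tau_1$th and $\tau_2$th time‐points in the data of daily new cases. So the condition $\beta_1 > 0$, $\beta_1 + \beta_2 < 0$, and $\beta_1 + \beta_2 + \beta_3 > 0$ must hold for count time series data containing two change‐points.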
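The sign conditions amount to checking the cumulative sums of the slope coefficients of the segmented log‐intensity: the slopes must alternate in sign, starting positive. A small Python helper for the rise‐then‐fall (one change‐point) and rise‐fall‐rise (two change‐point) patterns; the function names are hypothetical.

```python
def segment_slopes(betas):
    """Slopes of the segmented log-intensity: beta_1, beta_1 + beta_2, ...

    betas = (beta_0, beta_1, beta_2[, beta_3]); beta_0 (the intercept) is skipped.
    """
    slopes, running = [], 0.0
    for b in betas[1:]:
        running += b
        slopes.append(running)
    return slopes


def satisfies_conditions(betas):
    """True if the slopes alternate +, -, (+), i.e.
    one change-point:  beta_1 > 0 and beta_1 + beta_2 < 0;
    two change-points: additionally beta_1 + beta_2 + beta_3 > 0."""
    return all((s > 0) if k % 2 == 0 else (s < 0)
               for k, s in enumerate(segment_slopes(betas)))
```

For instance, $(\beta_0, \beta_1, \beta_2) = (1, 0.05, -0.1)$ gives slopes $0.05$ and $-0.05$ and passes, whereas $(1, 0.05, -0.02)$ never turns downward and fails.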

Choices of the tuning parameter

The tuning parameter $\delta_n$ of the proposed model helps to capture the changing curvature(s) in the data. Here $\delta_n > 0$. To choose the optimal value of the tuning parameter from the data, we use a grid search method (see Chakraborty, Laber, & Zhao, 2013; James, Witten, Hastie, & Tibshirani, 2013), in which the optimal value of $\delta_n$ is the grid value that gives the best value of a goodness‐of‐fit measure. The idea of $\delta_n$ comes from the concept of the smooth maximum (Smooth maximum, n.d.), so as the value of $\delta_n$ increases, the changing curvature becomes sharper. Figures 3 and 4 show this property: as $\delta_n$ increases from 0.05 to 1, the changing curvatures become sharper in both the one change‐point and the two change‐point settings. The figures also include the nonsmoothed version of the generated data (no use of $\delta_n$), that is, the segmented data.
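The grid search for $\delta_n$ can be sketched as follows. The goodness‐of‐fit measure is not specified in the text, so this Python sketch assumes the CLS criterion $\sum_{t=2}^{n}(x_t - \alpha x_{t-1} - \lambda_t)^2$ and, for brevity, holds $\alpha$, $\boldsymbol\beta$, and $\tau$ fixed; in a full analysis these would be re‐estimated at each grid value. The softplus smooth maximum and the log‐linear one change‐point intensity are illustrative assumptions, and all names are hypothetical.

```python
import math


def smooth_max0(x, delta):
    """Assumed softplus smooth maximum (1/delta) * log(1 + exp(delta * x)), overflow-safe."""
    return max(x, 0.0) + math.log1p(math.exp(-delta * abs(x))) / delta


def lambdas_one_cp(n, betas, tau, delta):
    """Hypothetical intensity lambda_t = exp(b0 + b1*t + b2*S_delta(t - tau)), t = 1..n."""
    b0, b1, b2 = betas
    return [math.exp(b0 + b1 * t + b2 * smooth_max0(t - tau, delta))
            for t in range(1, n + 1)]


def cls_criterion(x, alpha, lam):
    """Sum over t = 2..n of (x_t - alpha * x_{t-1} - lambda_t)^2 (0-based lists)."""
    return sum((x[t] - alpha * x[t - 1] - lam[t]) ** 2 for t in range(1, len(x)))


def grid_search_delta(x, alpha, betas, tau, grid):
    """Return the grid value of delta_n with the smallest criterion, plus all scores."""
    scores = {d: cls_criterion(x, alpha, lambdas_one_cp(len(x), betas, tau, d))
              for d in grid}
    best = min(scores, key=scores.get)
    return best, scores
```

With noiseless data generated at $\delta = 0.5$ (that is, $x_t = \alpha x_{t-1} + \lambda_t$ exactly), the criterion is essentially zero at 0.5 and strictly larger at the other grid values, so the search returns 0.5.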

FIGURE 3

The changing curvatures for the one change‐point study for values of $\delta_n$ from 0.05 to 1, along with the segmented data (no use of $\delta_n$)

FIGURE 4

The changing curvatures for the two change‐point study for values of $\delta_n$ from 0.05 to 1, along with the segmented data (no use of $\delta_n$)


Estimation of change‐point(s)

To estimate the change‐point(s), we take the difference between every two consecutive observations (i.e., $d_t = x_t - x_{t-1}$) and consider the signs of those differences, denoted by $s_t$, where $s_t = +$ if $d_t > 0$ and $s_t = -$ otherwise. For a data set with one change‐point, the sequence $\{s_t\}$ should give two runs: (1) a run of +, and (2) a run of − (see Wald & Wolfowitz, 1940). Depending on whether the curve first increases or first decreases, the order of the run of + and the run of − is reversed. For example, if the time series plot of $x_t$ is bell‐shaped (i.e., the observations initially increase and then, after a certain time‐point $\tau$, decrease), we have a run of + first and then, after the time‐point $\tau$, a run of −. The time‐point at which the first run of + ends gives an estimate of the true change‐point $\tau$. However, in real scenarios, time series data with one change‐point may not be smooth and often contain random fluctuations. As a result, there may be many short runs of + and −, which make it difficult for the above procedure to locate the true change‐point. Hence we apply a presmoothing step before the run‐based point estimation. That is, instead of working with the raw time series, we smooth the data using standard statistical approaches such as an m‐point moving average or a pth degree polynomial fit. For time series data with two change‐points (say, $\tau_1$ and $\tau_2$), the sequence $\{s_t\}$ should produce three runs: (1) a run of +, (2) a run of −, and (3) another run of +. Here the sign switches twice: a run of + for the increasing curvature, then a run of − for the decreasing curvature, and another run of + when cases again begin to rise (another increasing curvature). The time‐point at which the first run ends gives an estimate of the first change‐point $\tau_1$, and the time‐point at which the second run ends gives an estimate of the second change‐point $\tau_2$.
However, as in the one change‐point case, such time series are usually nonsmooth, and hence presmoothing by an m‐point moving average or a pth degree polynomial fit is required. Later, in Section 7.2, we perform a simulation study in which we estimate the true change‐point(s), provide confidence interval(s) (CI(s)) based on a normal approximation, and study the large sample properties by varying the sample size.
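The run‐based estimator with moving‐average presmoothing can be sketched in a few lines of Python. This is an illustrative implementation under stated assumptions: an unpadded m‐point moving average with odd m, each smoothed value assigned to the centre of its window, and zero differences counted as "−", as in the text. It returns the 1‐based time‐points at which each run ends (every run except the last), so data with one change‐point should yield one value and data with two change‐points two values. The function names are hypothetical.

```python
def moving_average(x, m):
    """Unpadded m-point moving average: len(x) - m + 1 values, the k-th being mean(x[k:k+m])."""
    return [sum(x[k:k + m]) / m for k in range(len(x) - m + 1)]


def run_change_points(x, m=1):
    """Run-based change-point estimates (1-based time-points).

    1. Presmooth with an m-point moving average (m odd; m = 1 means no smoothing).
    2. Sign of each consecutive difference: '+' if positive, '-' otherwise.
    3. Each place where one run of signs ends and the next begins marks a
       turning point; its smoothed index is mapped back to the window centre.
    """
    if m < 1 or m % 2 == 0:
        raise ValueError("m must be a positive odd integer")
    y = moving_average(x, m)
    offset = (m - 1) // 2  # y[k] is centred on x[k + offset]
    signs = ['+' if y[k + 1] - y[k] > 0 else '-' for k in range(len(y) - 1)]
    turning = [k + 1 for k in range(len(signs) - 1) if signs[k] != signs[k + 1]]
    return [k + offset + 1 for k in turning]  # convert to 1-based time-points
```

On the short noisy series `[1, 3, 2, 4, 5, 6, 5, 4, 3, 2]`, `run_change_points(x, 1)` reports spurious turning points at 2 and 3 in addition to the peak at 6, whereas `run_change_points(x, 3)` returns only `[6]`; removing such short runs is the motivation for presmoothing.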

DISTRIBUTIONAL PROPERTIES

In this section, we study the conditional and the marginal distributions of the proposed model.

Conditional distribution

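Under our proposed setup, the conditional distribution of $X_t$ given $X_{t-1}$ and $Z_t$ (the set of all covariates up to time‐point t, including the smooth time‐varying and the simple time‐varying covariates) can be derived as

$$P(X_t = j \mid X_{t-1} = i, Z_t) = \sum_{k=0}^{i} \binom{i}{k}\alpha^k(1-\alpha)^{i-k}\,\frac{e^{-\lambda_t}\lambda_t^{\,j-k}}{(j-k)!}\, I(j \ge k),$$

where $I(\cdot)$ is the indicator function. This is the probability of going from state i to state j in a single step. The conditional mean and variance are $E(X_t \mid X_{t-1}, Z_t) = \alpha X_{t-1} + \lambda_t$ and $\mathrm{Var}(X_t \mid X_{t-1}, Z_t) = \alpha(1-\alpha)X_{t-1} + \lambda_t$, respectively.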
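The one‐step transition probability is a finite sum and can be evaluated directly. The Python sketch below (hypothetical function name; assumes $0 < \alpha < 1$ and $\lambda_t > 0$) adds the Binomial($i$, $\alpha$) survivor probability and the Poisson($\lambda_t$) arrival probability over the number k of survivors, working with log‐factorials via `math.lgamma` so that the binomial coefficients and factorials do not overflow when counts are large.

```python
import math


def transition_prob(j, i, alpha, lam):
    """One-step PINAR(1) transition P(X_t = j | X_{t-1} = i, Z_t).

    Sum over k = number of the i active cases that survive thinning
    (Binomial(i, alpha)) with j - k new cases (Poisson(lam)); the upper
    limit min(i, j) plays the role of the indicator I(j >= k).
    Assumes 0 < alpha < 1 and lam > 0.
    """
    total = 0.0
    for k in range(min(i, j) + 1):
        log_bin = (math.lgamma(i + 1) - math.lgamma(k + 1) - math.lgamma(i - k + 1)
                   + k * math.log(alpha) + (i - k) * math.log1p(-alpha))
        log_poi = -lam + (j - k) * math.log(lam) - math.lgamma(j - k + 1)
        total += math.exp(log_bin + log_poi)
    return total
```

For $i = 5$, $\alpha = 0.3$, and $\lambda_t = 2$, the probabilities over j sum to 1 with mean $\alpha i + \lambda_t = 3.5$ and variance $\alpha(1-\alpha)i + \lambda_t = 3.05$, matching the conditional moments above. Evaluating terms on the log scale is one standard way to keep them finite for counts in the hundreds or thousands.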

Marginal distribution

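Since the unconditional marginal distribution of $X_t$ is difficult to obtain, we find the partial marginal distribution of $X_t$ given $Z_t$ for $t \ge 2$, henceforth called the marginal distribution. Here we derive the probability generating function (PGF) of $X_t$ given $Z_t$. The derivation is valid for $t \ge 2$, and hence we assume that, given $Z_1$, the marginal distribution of $X_1$ is Poisson($\lambda_1$). The reason behind this assumption is as follows. The elements that enter the system in the interval $(t-1, t]$ are the innovation terms at time‐point t ($\varepsilon_t$). For $t = 1$, the interval is (0, 1], and there is no previously existing interval in the system. So the elements that enter the system in the interval (0, 1] can be seen as the first count $X_1$, and hence we can assume $X_1 \sim$ Poisson($\lambda_1$). Under the assumptions that $X_1 \sim$ Poisson($\lambda_1$) and $\varepsilon_t \sim$ Poisson($\lambda_t$), we can show that the PGF of $X_t$ given $Z_t$ is

$$E\big(s^{X_t} \mid Z_t\big) = \exp\Big\{(s-1)\sum_{k=0}^{t-1}\alpha^k\lambda_{t-k}\Big\},$$

that is, given $Z_t$, $X_t$ follows a Poisson distribution with mean $\mu_t = \sum_{k=0}^{t-1}\alpha^k\lambda_{t-k}$. The derivation of this result is presented in Appendix A. Alternatively, the marginal distribution can be obtained through the recursive formula

$$P(X_t = j \mid Z_t) = \sum_{i=0}^{\infty} P(X_t = j \mid X_{t-1} = i, Z_t)\, P(X_{t-1} = i \mid Z_{t-1}),$$

where the one‐step transition probability given in the previous subsection involves the indicator function $I(j \ge k)$. The marginal mean and the marginal variance are therefore equal: $E(X_t \mid Z_t) = \mathrm{Var}(X_t \mid Z_t) = \mu_t$. Under the above setup, using the equation $X_{t+h} = \alpha^h \circ X_t + \sum_{l=0}^{h-1}\alpha^l \circ \varepsilon_{t+h-l}$, the autocovariance function (ACVF) of $X_t$ given $Z_{t+h}$ can be derived as $\gamma_t(h) = \mathrm{Cov}(X_t, X_{t+h} \mid Z_{t+h}) = \alpha^h\mu_t$. The derivation of this result is presented in Appendix B. Hence for $h > 0$, the ACF is

$$\rho_t(h) = \frac{\alpha^h\mu_t}{\sqrt{\mu_t\,\mu_{t+h}}} = \alpha^h\sqrt{\frac{\mu_t}{\mu_{t+h}}}.$$

This expression decays exponentially to 0 as $h \to \infty$ for $0 < \alpha < 1$ and the restricted β's discussed in Section 3.2.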
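Under the assumptions stated above ($X_1 \sim$ Poisson($\lambda_1$) and Poisson innovations), the marginal mean $\mu_t = \sum_{k=0}^{t-1}\alpha^k\lambda_{t-k}$ satisfies the recursion $\mu_t = \alpha\mu_{t-1} + \lambda_t$ with $\mu_1 = \lambda_1$, and the ACF follows from $\mathrm{Cov}(X_t, X_{t+h}) = \alpha^h\mu_t$ together with $\mathrm{Var}(X_s) = \mu_s$. A Python sketch of both, with $\lambda_t$ supplied as a list (hypothetical function names):

```python
import math


def marginal_means(alpha, lambdas):
    """mu_t for t = 1..n: mu_1 = lambda_1 and mu_t = alpha * mu_{t-1} + lambda_t,
    equivalently mu_t = sum_{k=0}^{t-1} alpha**k * lambda_{t-k}."""
    mu = []
    for lam in lambdas:
        mu.append(lam if not mu else alpha * mu[-1] + lam)
    return mu


def acvf(alpha, mu, t, h):
    """Cov(X_t, X_{t+h} | Z) = alpha**h * mu_t (t is 1-based)."""
    return alpha ** h * mu[t - 1]


def acf(alpha, mu, t, h):
    """Corr(X_t, X_{t+h} | Z) = alpha**h * sqrt(mu_t / mu_{t+h}), using Var(X_s | Z) = mu_s."""
    return acvf(alpha, mu, t, h) / math.sqrt(mu[t - 1] * mu[t + h - 1])
```

With a constant $\lambda_t = \lambda$ the means settle at $\lambda/(1-\alpha)$ and the ACF reduces to $\alpha^h$, the familiar stationary PINAR(1) result.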

FORECASTING

h‐Step ahead forecasting distribution

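To find the h‐step ahead forecasting distribution, we use the recursion

$$X_{t+h} = \alpha^h \circ X_t + \sum_{l=0}^{h-1}\alpha^l \circ \varepsilon_{t+h-l}.$$

Thus the h‐step ahead conditional mean and conditional variance are

$$E(X_{t+h} \mid X_t, Z_{t+h}) = \alpha^h X_t + \sum_{l=0}^{h-1}\alpha^l\lambda_{t+h-l}$$

and

$$\mathrm{Var}(X_{t+h} \mid X_t, Z_{t+h}) = \alpha^h(1-\alpha^h)X_t + \sum_{l=0}^{h-1}\alpha^l\lambda_{t+h-l}.$$

The h‐step ahead forecasting distribution of the PINAR(1) process was derived by Freeland and McCabe (2004) using the binomial thinning operator discussed by Al‐Osh and Alzaid (1987), and it turned out to be a convolution of binomial and Poisson distributions. Here we calculate the conditional PGF of $X_{t+h}$ given $X_t$ and $Z_{t+h}$ and then derive the forecasting distribution from it. The conditional PGF can be shown to be

$$E\big(s^{X_{t+h}} \mid X_t, Z_{t+h}\big) = \big(1 - \alpha^h + \alpha^h s\big)^{X_t}\exp\Big\{(s-1)\sum_{l=0}^{h-1}\alpha^l\lambda_{t+h-l}\Big\}.$$

The derivation of this result is presented in Appendix C. From this result, the h‐step ahead prediction distribution of $X_{t+h}$ given $X_t$ and $Z_{t+h}$ is a convolution of Bin($X_t$, $\alpha^h$) and a random variable whose PGF has the form $\exp\{(s-1)\lambda^*_{t,h}\}$, with $\lambda^*_{t,h} = \sum_{l=0}^{h-1}\alpha^l\lambda_{t+h-l}$; therefore this random variable follows a Poisson distribution with mean $\lambda^*_{t,h}$. Thus the prediction distribution can be written as

$$X_{t+h} \mid X_t, Z_{t+h} \sim \mathrm{Bin}(X_t, \alpha^h) * \mathrm{Poisson}(\lambda^*_{t,h}),$$

where “$*$” denotes the convolution of two distributions. Using Corollary 1, the h‐step ahead forecasting distribution of $X_{t+h}$ given $X_t$ and $Z_{t+h}$ can be derived as

$$P(X_{t+h} = j \mid X_t = i, Z_{t+h}) = \sum_{k=0}^{i}\binom{i}{k}p_h^k(1-p_h)^{i-k}\,\frac{e^{-\lambda^*_{t,h}}(\lambda^*_{t,h})^{j-k}}{(j-k)!}\, I(j \ge k),$$

where $I(\cdot)$ is the indicator function, $p_h = \alpha^h$, and $\lambda^*_{t,h} = \sum_{l=0}^{h-1}\alpha^l\lambda_{t+h-l}$. The derivation of this result is presented in Appendix D.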
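The forecasting distribution is a binomial‐Poisson convolution, so it can be evaluated with the same kind of finite sum as the one‐step transition. The Python sketch below (hypothetical names) computes $p_h = \alpha^h$ and $\lambda^*_{t,h}$ from the future innovation means $\lambda_{t+1}, \ldots, \lambda_{t+h}$ by Horner‐style accumulation, and then evaluates the forecast probabilities and the conditional mean and variance. In practice $\alpha$ and $\lambda_{t+l}$ would be replaced by their estimates.

```python
import math


def forecast_params(alpha, future_lambdas):
    """Return (p_h, lam_star) for X_{t+h} | X_t ~ Bin(X_t, p_h) * Poisson(lam_star),
    where p_h = alpha**h and lam_star = sum_{l=0}^{h-1} alpha**l * lambda_{t+h-l}.
    future_lambdas lists lambda_{t+1}, ..., lambda_{t+h} in time order."""
    lam_star = 0.0
    for lam in future_lambdas:  # Horner-style accumulation
        lam_star = alpha * lam_star + lam
    return alpha ** len(future_lambdas), lam_star


def forecast_pmf(j, x_t, alpha, future_lambdas):
    """P(X_{t+h} = j | X_t = x_t): binomial survivors convolved with Poisson arrivals."""
    p, lam_star = forecast_params(alpha, future_lambdas)
    total = 0.0
    for k in range(min(x_t, j) + 1):  # k survivors, j - k new arrivals
        log_bin = (math.lgamma(x_t + 1) - math.lgamma(k + 1) - math.lgamma(x_t - k + 1)
                   + k * math.log(p) + (x_t - k) * math.log1p(-p))
        log_poi = -lam_star + (j - k) * math.log(lam_star) - math.lgamma(j - k + 1)
        total += math.exp(log_bin + log_poi)
    return total


def forecast_mean_var(x_t, alpha, future_lambdas):
    """h-step conditional mean p_h*x_t + lam_star and variance p_h*(1-p_h)*x_t + lam_star."""
    p, lam_star = forecast_params(alpha, future_lambdas)
    return p * x_t + lam_star, p * (1.0 - p) * x_t + lam_star
```

For example, with $X_t = 10$, $\alpha = 0.6$, and $(\lambda_{t+1}, \lambda_{t+2}) = (3, 4)$, the two‐step forecast has $p_2 = 0.36$ and $\lambda^*_{t,2} = 0.6 \cdot 3 + 4 = 5.8$, giving mean 9.4 and variance 8.104.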

Descriptive measure of forecasting accuracy

Given an observed data set $\{x_1, \ldots, x_{n+m}\}$ of size $n + m$, we partition the data into two sets. The training set, containing the first n observations, is used to estimate the parameters of the model, and the remaining m observations form the test set, on which we define the following descriptive measure of forecasting accuracy. The h‐step ahead predicted root mean squared error, denoted by PRMSE(h), is the square root of the average squared difference between each test observation $x_{t+h}$ and its h‐step ahead prediction $\hat{x}_{t+h}$, where $\hat{x}_{t+h}$ is the mean of the estimated h‐step ahead forecasting distribution of $X_{t+h}$ given $X_t$ and $Z_{t+h}$ given in Theorem 4. Intuitively, PRMSE(h) should increase with h.
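The exact normalization of PRMSE(h) is an assumption here. This Python sketch uses the forecast origins $t = n, \ldots, n + m - h$ (so $m - h + 1$ forecasts) and takes as point prediction the forecast mean $\alpha^h x_t + \sum_{l=0}^{h-1}\alpha^l\lambda_{t+h-l}$, with $\alpha$ and $\lambda_t$ supplied by the user, for example from a CLS fit on the training set. The function name is hypothetical.

```python
import math


def prmse(x, n, h, alpha, lambdas):
    """Assumed PRMSE(h): root mean squared h-step forecast error over the test set.

    x       -- full observed series x_1, ..., x_{n+m} (0-based list)
    n       -- size of the training set
    h       -- forecast horizon, 1 <= h <= m
    alpha   -- (estimated) thinning probability
    lambdas -- (estimated) innovation means lambda_1, ..., lambda_{n+m}
    """
    m = len(x) - n
    if not 1 <= h <= m:
        raise ValueError("need 1 <= h <= m")
    errors = []
    for t in range(n, n + m - h + 1):  # 1-based forecast origins t = n, ..., n+m-h
        lam_star = 0.0
        for lam in lambdas[t:t + h]:  # lambda_{t+1}, ..., lambda_{t+h}
            lam_star = alpha * lam_star + lam
        pred = alpha ** h * x[t - 1] + lam_star  # forecast mean of X_{t+h} given X_t
        errors.append((x[t + h - 1] - pred) ** 2)
    return math.sqrt(sum(errors) / len(errors))
```

Comparing `prmse` for h = 1, 2, 3, … on the same split gives the kind of horizon‐by‐horizon comparison between competing models described above.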

ESTIMATION METHOD FOR THE MODEL PARAMETERS

Conditional least squares estimation

Conditional least squares (CLS) estimation is commonly used to estimate the parameters of time series models. Freeland and McCabe (2004, 2005) used this approach for the PINAR(1) process. To implement the CLS method, we minimize the sum of squared deviations about the conditional expectation,

$$Q_n(\boldsymbol\theta) = \sum_{t=2}^{n}\big\{X_t - E(X_t \mid X_{t-1}, Z_t)\big\}^2 = \sum_{t=2}^{n}\big(X_t - \alpha X_{t-1} - \lambda_t\big)^2,$$

with respect to $\boldsymbol\theta = (\alpha, \boldsymbol\beta')'$, where $\boldsymbol\beta$ is the vector of regression parameters. Since the CLS estimators have no closed form, numerical methods are employed to obtain the CLS estimates. In the subsequent section, we carry out an extensive simulation study for both the one change‐point and the two change‐point settings, and the simulation results provide empirical evidence for the consistency of the CLS estimators. For maximum likelihood estimation, given a data set of size n, the (conditional) likelihood function of the process is $L(\boldsymbol\theta) = \prod_{t=2}^{n} P(X_t = x_t \mid X_{t-1} = x_{t-1}, Z_t)$, where the transition probabilities are those given in Section 4. To obtain the maximum likelihood estimators (MLEs), we maximize the log‐likelihood function $\ell(\boldsymbol\theta) = \log L(\boldsymbol\theta)$ with respect to $\boldsymbol\theta$. In real‐life scenarios such as the COVID‐19 data, the number of daily active cases at time‐point t ($X_t$) and the number of daily new cases at time‐point t ($\varepsilon_t$) are often large, and hence we face difficulties executing the MLE method in the R programming language because of terms such as $\binom{i}{k}$ and $(j-k)!$ in the likelihood function (where j is the number of daily active cases at time‐point t and i is the number of daily active cases at time‐point $t-1$). So the estimation method we employ for data analysis is the CLS method.
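The CLS criterion can be minimized with any derivative‐free or gradient‐based optimizer. To keep the example self‐contained, this Python sketch uses a simple compass (coordinate) search rather than a library routine, and it assumes the one change‐point intensity $\log\lambda_t = \beta_0 + \beta_1 t + \beta_2 S_\delta(t - \tau)$ with a softplus smooth maximum, holding $\tau$ and $\delta_n$ fixed at values chosen beforehand. These are illustrative choices with hypothetical names, not the authors' implementation.

```python
import math


def smooth_max0(x, delta):
    """Assumed softplus smooth maximum (1/delta) * log(1 + exp(delta * x)), overflow-safe."""
    return max(x, 0.0) + math.log1p(math.exp(-delta * abs(x))) / delta


def cls_objective(theta, x, tau, delta):
    """Q_n(theta) = sum_{t=2}^n (x_t - alpha x_{t-1} - lambda_t)^2 with
    theta = (alpha, beta_0, beta_1, beta_2) and a hypothetical log-linear lambda_t."""
    alpha, b0, b1, b2 = theta
    if not 0.0 < alpha < 1.0:
        return math.inf  # keep the thinning probability inside (0, 1)
    q = 0.0
    for t in range(2, len(x) + 1):  # 1-based time index
        eta = b0 + b1 * t + b2 * smooth_max0(t - tau, delta)
        if eta > 700.0:
            return math.inf  # guard against overflow in exp
        q += (x[t - 1] - alpha * x[t - 2] - math.exp(eta)) ** 2
    return q


def compass_search(f, theta0, step=0.05, tol=1e-6, max_iter=5000):
    """Derivative-free compass search: accept the first improving move of size
    `step` along any coordinate; if no move improves, halve the step."""
    theta, best = list(theta0), f(theta0)
    for _ in range(max_iter):
        if step < tol:
            break
        improved = False
        for i in range(len(theta)):
            for sign in (1.0, -1.0):
                cand = theta.copy()
                cand[i] += sign * step
                val = f(cand)
                if val < best:
                    theta, best, improved = cand, val, True
                    break
        if not improved:
            step /= 2.0
    return theta, best
```

A call such as `compass_search(lambda th: cls_objective(th, x, tau, delta), start)` never increases the criterion relative to `start`; a production fit would more likely use a quasi‐Newton optimizer with several starting values.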

SIMULATION STUDY

General setup

In this section, we perform extensive simulation studies for (a) the estimation of change‐point(s), (b) the estimation of model parameters, and (c) the forecasting performance of the proposed model. To perform the studies, we simulate data from (1) the one change‐point model and (2) the two change‐point model. The simulation studies are performed for varying sample sizes along with different choices of model parameters, tuning parameter, and change‐points. For the simulation studies with one change‐point ($\tau$), $Z_n$ (the set of all covariates up to time‐point n) consists of the smooth time‐varying component (the smoothed version of $\max(t - \tau, 0)$) and the simple time‐varying component t, for $t = 1, \ldots, n$. For the simulation studies with two change‐points ($\tau_1$ and $\tau_2$), $Z_n$ consists of the smooth time‐varying components (the smoothed versions of $\max(t - \tau_1, 0)$ and $\max(t - \tau_2, 0)$) and the simple time‐varying component t, for $t = 1, \ldots, n$. We use these components to generate data sets of varying sample sizes from the data‐generating processes in Equation (2) for one change‐point and Equation (3) for two change‐points. In the simulation study of forecasting performance, we compare our proposed model with the model $X_t = \alpha \circ X_{t-1} + \varepsilon_t$, where $X_t$ denotes the daily number of active cases at time‐point t and $\varepsilon_t$ represents the daily number of new cases at time‐point t, with $\varepsilon_t \sim$ Poisson($\lambda_t$) and $\lambda_t$ a function of time‐varying covariates only. This model involves no change‐point, but its innovation terms still depend on time‐varying covariates.

Results on change‐point(s) estimation

Here we perform a simulation study to construct 95% CIs for the true change‐points from the simulated data sets and to examine how the widths of these intervals change with increasing sample size. The estimation method of change‐point(s) is discussed in Section 3.4. We simulate data from the proposed model with (1) one change‐point (Equation (2)) and (2) two change‐points (Equation (3)). Two sets of regression parameters are considered for each of these two data‐generating cases, and three sample sizes (n = 400, 450, and 500) are explored. Throughout this simulation study, we consider two values of $\delta_n$, namely 0.1 and 0.2. All simulation results are based on 1000 Monte Carlo replications.
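The interval formula is not stated explicitly. One plausible reading of a normal‐approximation CI from Monte Carlo replicates, assumed in this Python sketch (hypothetical helper name), is the mean of the replicated change‐point estimates plus or minus 1.96 replicate standard deviations.

```python
import statistics


def normal_ci(estimates, z=1.96):
    """Assumed normal-approximation interval from Monte Carlo replicates:
    mean +/- z * (sample standard deviation of the replicated estimates)."""
    centre = statistics.fmean(estimates)
    spread = z * statistics.stdev(estimates)
    return centre - spread, centre + spread
```

For example, replicated estimates 199, 200, and 201 have mean 200 and standard deviation 1, giving the interval (198.04, 201.96) of width 3.92.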

Case 1: Analysis of one change‐point

For the one change‐point simulation study, we take the true change‐point to be $\tau = n/2$, where n is the sample size of the data. Two sets of regression parameters are used in the data‐generating process, one for Table 1 and one for Table 2. For each set of regression parameters and each value of the tuning parameter $\delta_n$, we simulate the data using model (1) with $\lambda_t$ given in Equation (2). Here, for the one change‐point data‐generating process, $Z_n$, the set of all covariates up to time‐point n, consists of both the smooth and the simple time‐varying components up to time‐point n, as described in Section 7.1, where n is the sample size of the simulated data set. The process is repeated 1000 times, and we report the 95% CIs in Tables 1 and 2, which show that the widths of the CIs decrease as the sample size increases.

TABLE 1

95% confidence intervals (CIs) for the true change‐point for different sample sizes and different values of $\delta_n$, where the true change‐point is at the (n/2)th time‐point, for the first set of true regression parameters

δn=0.1
n	95% CI	Width
400	(196.6439, 203.3801)	6.7362
450	(222.5360, 227.3340)	4.7980
500	(248.1794, 251.7426)	3.5632
δn=0.2
n	95% CI	Width
400	(197.8427, 202.1033)	4.2606
450	(223.4354, 226.6266)	3.1912
500	(248.9330, 251.0090)	2.0760

TABLE 2

95% confidence intervals (CIs) for the true change‐point for different sample sizes and different values of $\delta_n$, where the true change‐point is at the (n/2)th time‐point, for the second set of true regression parameters

δn=0.1
n	95% CI	Width
400	(197.9393, 205.0067)	7.0674
450	(224.1986, 228.8794)	4.6808
500	(249.8242, 252.8998)	3.0756
δn=0.2
n	95% CI	Width
400	(198.4818, 203.1122)	4.6304
450	(224.1968, 227.2532)	3.0564
500	(249.5226, 251.7134)	2.1908


Case 2: Analysis of two change‐points

For the simulation study with two change‐points, the true change‐points $\tau_1$ and $\tau_2$ are taken to be $0.4n$ and $0.6n$, respectively. Two sets of values of the regression parameters are used in the data‐generating process, one for Table 3 and one for Table 4. For each set of regression parameters and each value of the tuning parameter $\delta_n$, we simulate the data using model (1) with $\lambda_t$ given in Equation (3), and the change‐points are estimated as described in Section 3.4. Here, for the two change‐point data‐generating process, $Z_n$, the set of all covariates up to time‐point n, consists of both the smooth and the simple time‐varying components up to time‐point n, as described in Section 7.1, where n is the sample size of the simulated data set. The process is repeated 1000 times, and the 95% CIs are reported in Tables 3 and 4. The tables show that the widths of the CIs decrease as the sample size increases.

TABLE 3

95% confidence intervals (CIs) for the true change‐points for different sample sizes and different values of $\delta_n$, where the true change‐points are at the (0.4n)th and (0.6n)th time‐points, for the first set of true regression parameters

δn=0.1
n	95% CI for first change‐point	Width	95% CI for second change‐point	Width
400	(158.7577, 161.6283)	2.8706	(235.0461, 245.1539)	10.1078
450	(178.9962, 181.0778)	2.0816	(265.5051, 274.5129)	9.0078
500	(199.3109, 200.7131)	1.4022	(296.0952, 303.7808)	7.6856
δn=0.2
n	95% CI for first change‐point	Width	95% CI for second change‐point	Width
400	(159.1228, 160.9412)	1.8184	(236.4025, 243.1755)	6.7730
450	(179.4911, 180.5349)	1.0438	(266.7304, 272.4936)	5.7632
500	(199.9124, 200.0876)	0.1752	(297.2184, 301.8216)	4.6032

TABLE 4

95% confidence intervals (CIs) for the true change‐points for different sample sizes and different values of $\delta_n$, where the true change‐points are at the (0.4n)th and (0.6n)th time‐points, for the second set of true regression parameters

δn=0.1
n	95% CI for first change‐point	Width	95% CI for second change‐point	Width
400	(158.5181, 162.2079)	3.6898	(233.8636, 244.1944)	10.3308
450	(178.6895, 181.8585)	3.1690	(264.5409, 273.4491)	8.9082
500	(198.8991, 201.3409)	2.4418	(294.9574, 302.7986)	7.8412
δn=0.2
n	95% CI for first change‐point	Width	95% CI for second change‐point	Width
400	(158.8499, 161.4501)	2.6002	(235.8058, 242.6082)	6.8024
450	(179.0436, 181.0664)	2.0228	(266.2779, 272.0281)	5.7502
500	(199.3532, 200.7068)	1.3536	(296.5623, 301.5857)	5.0234


Results on estimation of model parameters

Here we perform a simulation study to investigate the consistency of the estimation method used for the proposed model. We simulate data from the proposed model with (1) one change‐point (Equation (2)) and (2) two change‐points (Equation (3)), considering three sets of regression parameters for each of these two data‐generating cases. Three sample sizes (n = 100, 200, and 500) are explored, and throughout this simulation study we consider three values of $\delta_n$, namely 0.1, 0.5, and 1. All simulation results are based on 1000 Monte Carlo replications. For the one change‐point simulation study, the change‐point is specified as a function of the sample size n. For each set of regression parameters and each value of the tuning parameter $\delta_n$, we simulate the data using model (1) with $\lambda_t$ given in Equation (2) and then estimate the regression parameters by the CLS method. Here, for the one change‐point data‐generating process, $Z_n$, the set of all covariates up to time‐point n, consists of both the smooth and the simple time‐varying components up to time‐point n, as described in Section 7.1, where n is the sample size of the simulated data set. The process is repeated 1000 times, and we report the mean estimates and mean squared errors (MSEs) of the regression parameters in Tables 5, 6, and 7. These tables show that the MSEs of the estimated regression parameters decrease as the sample size increases, which provides empirical support for the consistency of the CLS estimators.
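The two summaries reported for each parameter, the Monte Carlo mean estimate and the MSE, are computed over replications as sketched below (hypothetical helper name), with $\mathrm{MSE} = R^{-1}\sum_{r=1}^{R}(\hat\theta_r - \theta)^2$ for R replications, here R = 1000.

```python
def mc_summary(estimates, true_value):
    """Monte Carlo mean estimate and MSE = (1/R) * sum_r (estimate_r - true_value)**2."""
    r = len(estimates)
    mean = sum(estimates) / r
    mse = sum((e - true_value) ** 2 for e in estimates) / r
    return mean, mse
```

A decreasing MSE across n = 100, 200, and 500 for a fixed true value is the empirical pattern used above as evidence of consistency.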

TABLE 5