Andrew T. Jebb, Louis Tay, Wei Wang, Qiming Huang.
Abstract
Psychological research has increasingly recognized the importance of integrating temporal dynamics into its theories, and innovations in longitudinal designs and analyses have allowed such theories to be formalized and tested. However, psychological researchers may be relatively unequipped to analyze such data, given their many characteristics and the general complexities involved in longitudinal modeling. The current paper introduces time series analysis to psychological research, an analytic domain that has been essential for understanding and predicting the behavior of variables across many diverse fields. First, the characteristics of time series data are discussed. Second, different time series modeling techniques are surveyed that can address various topics of interest to psychological researchers, including describing the pattern of change in a variable, modeling seasonal effects, assessing the immediate and long-term impact of a salient event, and forecasting future values. To illustrate these methods, an example based on online job search behavior is used throughout the paper, and a software tutorial in R for these analyses is provided in the Supplementary Materials.
Keywords: ARIMA; forecasting; longitudinal data analysis; regression analysis; time series analysis
Year: 2015 PMID: 26106341 PMCID: PMC4460302 DOI: 10.3389/fpsyg.2015.00727
Source DB: PubMed Journal: Front Psychol ISSN: 1664-1078
Figure 1. A plot of the original Google job search time series and the series after seasonal adjustment.
Figure 2. The original time series decomposed into its trend, seasonal, and irregular (i.e., random) components. Cyclical effects are not present within this series.
Figure 3. Two example time series displaying exaggerated positive (top panel) and negative (center panel) autocorrelation. The bottom panel depicts the ACF of the Google job search time series after seasonal adjustment.
Figure 4. An example of a stationary time series (specifically, a series of uncorrelated white noise terms). The mean, variance, and autocorrelation are all constant over time, and the series displays no systematic patterns, such as trends or cycles.
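The sketch below is a minimal illustration of the components these figures depict, using a simulated monthly series in place of the Google job search data (which are not reproduced here); it decomposes the series, plots its ACF, and generates a stationary white-noise series for comparison.

```r
# A minimal sketch of the components behind Figures 1-4, using a simulated
# monthly series in place of the Google job search data.
set.seed(42)
n <- 120                                        # 10 years of monthly observations
trend    <- 0.05 * seq_len(n)                   # slow upward trend
seasonal <- 2 * sin(2 * pi * seq_len(n) / 12)   # repeating 12-month pattern
noise    <- rnorm(n)                            # irregular (random) component
y <- ts(trend + seasonal + noise, frequency = 12, start = c(2004, 1))

plot(decompose(y))   # trend, seasonal, and irregular components (cf. Figure 2)
acf(y)               # autocorrelation at successive lags (cf. Figure 3)

wn <- ts(rnorm(n))   # stationary white noise (cf. Figure 4)
plot(wn, main = "White noise: constant mean, variance, and autocorrelation")
```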
Common tests in time series analysis.
| Test | Null hypothesis | Role in the analysis |
| Augmented Dickey–Fuller (ADF) | The series is non-stationary; rejection implies a stationary series. | A series must be stationary before any AR or MA terms are added to account for its autocorrelation. The ADF test identifies if a series needs to be made stationary through differencing, or, after an order of differencing has been applied, if the series has indeed become stationary. |
| Durbin–Watson | The residuals from a regression model do not have a lag-1 autocorrelation; rejection implies lag-1 autocorrelated errors. | A Durbin–Watson test can assess if the residuals of a regression model are autocorrelated. When this is the case, including ARIMA terms or using generalized least squares estimation can account for this autocorrelation. |
| Ljung–Box | The errors are uncorrelated; rejection implies correlated errors. | After fitting an ARIMA or dynamic regression model to a series, the Ljung–Box test identifies if the model has been successful in extracting all the autocorrelation. |
There are other tests for stationarity, such as the Phillips–Perron and Kwiatkowski–Phillips–Schmidt–Shin (KPSS) tests, which can sometimes yield contradictory results. The ADF test was chosen as the focus of this paper due to its popularity and reliability. For information regarding the others, see Cowpertwait and Metcalfe (2009). A brief R sketch applying all three tests from the table follows.
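This sketch applies the three tests above to a simulated trending series; it assumes the tseries and lmtest packages are installed, and the series itself is invented for the example.

```r
# Rough sketch of the three tests above on a simulated trending series;
# assumes the 'tseries' and 'lmtest' packages are installed.
library(tseries)   # provides adf.test()
library(lmtest)    # provides dwtest()

set.seed(1)
t <- 1:120
y <- ts(0.05 * t + arima.sim(list(ar = 0.6), n = 120))  # trend + AR(1) noise

adf.test(y)        # H0: non-stationary; rejection implies stationarity
adf.test(diff(y))  # re-test after one order of differencing

fit <- lm(y ~ t)   # regression with time as the predictor
dwtest(fit)        # H0: no lag-1 autocorrelation in the residuals

m <- arima(y, order = c(1, 1, 0))
Box.test(residuals(m), lag = 20, type = "Ljung-Box")  # H0: uncorrelated errors
```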
Figure 5. A flowchart depicting various time series modeling approaches and how they are suited to address various goals in psychological research.
Figure 6. Three different regression models with time as the regressor and their associated residual error series.
Figure 7. A segmented regression model used to assess the effect of the 2008 economic crisis on the time series and its associated residual error series.
Figure 8. A plot of the partial autocorrelation function (PACF) of the seasonally adjusted time series of Google job searches.
Figure 9. ACF and PACF of the cubic model residuals used to determine the number of AR and MA terms in an ARIMA model.
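A minimal sketch of the regression approaches behind Figures 6 and 7, fit to simulated data; the breakpoint at t = 60 is hypothetical and stands in for the onset of the 2008 economic crisis in the original example.

```r
# Minimal sketch of trend regression (Figure 6) and segmented regression
# (Figure 7) on simulated data; the breakpoint at t = 60 is hypothetical.
set.seed(7)
t <- 1:120
y <- 50 - 0.02 * t + 0.0015 * t^2 + rnorm(120, sd = 2)

linear <- lm(y ~ t)                       # linear trend in time
cubic  <- lm(y ~ poly(t, 3, raw = TRUE))  # cubic trend in time

# Segmented regression: permit a level shift and slope change after an event
event <- as.numeric(t >= 60)                       # indicator for post-event periods
seg   <- lm(y ~ t + event + I(event * (t - 60)))   # level + slope change at t = 60

plot(residuals(cubic), type = "l",
     main = "Residual error series of the cubic trend model")
```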
Comparison of different ARIMA models.
| Model | Ljung–Box test | AIC |
| ARIMA(1, 1, 0): one AR term | | 419.80 |
| ARIMA(0, 1, 1): one MA term | | 423.84 |
| ARIMA(1, 1, 1): a mixed model | | 425.84 |
| ARIMA(2, 1, 0): two AR terms | | 448.79 |
| ARIMA(0, 1, 2): two MA terms | | 425.84 |
ACF plots for all models showed that <5% of autocorrelations reached statistical significance.
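The comparison above can be reproduced in outline with the sketch below, which fits the same five candidate models to a simulated series and prints each model's AIC and Ljung–Box result; the numbers will naturally differ from those in the table.

```r
# Sketch reproducing the shape of the comparison above on a simulated series;
# the AIC values and Ljung-Box results will differ from those reported.
set.seed(123)
y <- arima.sim(list(order = c(1, 1, 0), ar = 0.5), n = 200)

candidates <- list(c(1, 1, 0),   # one AR term
                   c(0, 1, 1),   # one MA term
                   c(1, 1, 1),   # mixed model
                   c(2, 1, 0),   # two AR terms
                   c(0, 1, 2))   # two MA terms

for (ord in candidates) {
  fit <- arima(y, order = ord)
  lb  <- Box.test(residuals(fit), lag = 20, type = "Ljung-Box")
  cat(sprintf("ARIMA(%d,%d,%d): AIC = %.2f, Ljung-Box p = %.3f\n",
              ord[1], ord[2], ord[3], AIC(fit), lb$p.value))
}
```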
Figure 10. Forecasts from the dynamic regression model compared to the observed values. The blue line represents the forecasts, and the red dotted line indicates the observed values. The darker gray region denotes the 80% confidence region and the lighter gray the 90% region.
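A hedged sketch of a dynamic regression (ARIMAX) forecast in the spirit of Figure 10, assuming the forecast package; the linear trend regressor and the simulated series are illustrative stand-ins for the regressors used in the paper.

```r
# Hedged sketch of a dynamic regression (ARIMAX) forecast in the spirit of
# Figure 10; assumes the 'forecast' package. The trend regressor and the
# simulated series are stand-ins for the paper's actual data.
library(forecast)

set.seed(99)
t <- 1:120
y <- ts(30 + 0.1 * t + arima.sim(list(ar = 0.7), n = 120), frequency = 12)

fit <- Arima(y, order = c(1, 0, 0), xreg = cbind(trend = t))  # regression + AR(1) errors
fc  <- forecast(fit, xreg = cbind(trend = 121:132),           # 12 periods ahead
                level = c(80, 90))                            # 80% and 90% intervals
plot(fc)  # point forecasts with shaded 80%/90% regions, as in Figure 10
```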
Steps for specifying an ARIMA forecasting model.
| Step | Rationale | Procedure |
| Step 1. Confirm the presence of autocorrelation. | If there is autocorrelation in the data, then an ARIMA model can be used for forecasting, or ARIMA terms can be included within an existing regression model to improve its forecast accuracy (i.e., a dynamic regression/ARIMAX model). | • Examine a plot of the ACF for any large autocorrelations across different lags. In a white noise series, 5% of autocorrelations are expected to reach statistical significance, so one must look at the strength of the autocorrelations in addition to their statistical significance for the best diagnosis. |
| Step 2. Determine if the series is stationary. | Before AR or MA terms can be included in the model to account for the autocorrelation, the series must be stationary (i.e., a constant mean, variance, and autocorrelation). | • Examine a plot of the series for systematic changes in its mean level (i.e., trend or seasonal effects) and variance. |
| Step 3. Transform the series to stationarity. | AR and MA terms assume a stationary series, and this assumption must be met before modeling the autocorrelation. | • If the variance is not constant over time, taking the natural logarithm of the series can stabilize it. |
| Step 4. Partition the data into estimation and validation periods. | Before a forecasting model is used, its accuracy should be assessed. This entails conserving some data in the latter portion of the series to compare to the predictions generated by the model (the validation period). However, the majority of the data should still be used for parameter estimation. | • As a general rule, the first 80% of the series can be used to estimate the parameters and the remaining 20% to assess the accuracy of the model predictions (see the R sketch following this table). |
| Step 5. Examine the ACF and PACF, and fit a parsimonious ARIMA model. | Examining the ACF and PACF of a series can indicate how many AR and MA terms will be required to explain the series' autocorrelation. | • A pattern of autocorrelation that is best explained by AR terms has a steadily decaying ACF and a PACF that drops off after a small number of lags; the number of significant PACF spikes suggests the number of AR terms. The reverse pattern (an ACF that cuts off and a slowly decaying PACF) suggests MA terms. |
| Step 6. Examine model sufficiency. | A successful model will have extracted all of the autocorrelation from the data after being fit. Noticeable remaining autocorrelation indicates that the model can be improved. | • Examine a plot of the model residuals, which should appear as random white noise. |
| Step 7. Re-specify the model if necessary and use the AIC to compare models. | An initial model may not successfully explain all the autocorrelation present in the data. Alternatively, a model may successfully account for the autocorrelation but be needlessly complex (i.e., more AR or MA terms than necessary). Thus, ARIMA modeling is an iterative, exploratory process where multiple models are specified and then compared. | • Sometimes a mixed model can explain the autocorrelation using fewer parameters. Alternatively, a simpler model may also fit the data well. These models can be specified and checked for adequacy (Step 6). |
| Step 8. Generate predictions and compare to observations in the validation period. | Once a model has been chosen, comparing the model predictions to the observed values within the validation period allows the analyst to determine if the model produces accurate forecasts during time periods that have already been observed. This provides evidence that it will produce accurate forecasts for future, unobserved periods. | • After estimating model parameters from the first portion of the data, use the remaining observations to compare to the predicted values given by the model. |
| Step 9. Generate forecasts into the future. | After a good-fitting model has been selected and checked for forecasting accuracy, it can be used to generate forecasts into the future. | • Determine how many periods ahead into the future to forecast. |
ACF, Autocorrelation function; PACF, Partial autocorrelation function; ADF, augmented Dickey–Fuller; AIC, Akaike information criterion.
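The sketch below walks through the partitioning, validation, and forecasting steps of this table (Steps 4, 8, and 9, with auto.arima standing in for the iterative specification of Steps 5-7) on a simulated series; it assumes the forecast package.

```r
# Sketch of Steps 4, 8, and 9 on a simulated series, with auto.arima()
# standing in for the iterative specification of Steps 5-7; assumes the
# 'forecast' package.
library(forecast)

set.seed(2015)
y <- ts(arima.sim(list(order = c(1, 1, 0), ar = 0.5), n = 150))

n_train <- floor(0.8 * length(y))           # Step 4: ~80/20 split
train <- window(y, end = n_train)           # estimation period
test  <- window(y, start = n_train + 1)     # validation period

fit <- auto.arima(train)                    # Steps 5-7: specify and compare models
fc  <- forecast(fit, h = length(test))      # Step 8: predict the validation period
accuracy(fc, test)                          # compare predictions to observations

plot(forecast(auto.arima(y), h = 12))       # Step 9: forecast 12 periods ahead
```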
Glossary of time series terms.
| Term | Definition | Relevance |
| Trend | The overarching long-term change in the mean level of a time series. | Trends often represent time series effects that are theoretically interesting, such as the result of a critical event or the effect of other variables. Importantly, trends may be either deterministic or stochastic. Deterministic trends are those due to the constant effects of a few causal forces. As a result, they are generally stable across time and are suitable to be modeled through regression. In contrast, stochastic trends arise simply by chance and are consequently not suitably modeled through regression methods. |
| Seasonality | A pattern of rises and falls in the mean level of a series that consistently occurs across time periods. | Seasonal effects may be substantively interesting (in which case they should be estimated) or they may obscure other more important components, such as a trend (in which case they should be removed). |
| Cycles | Any repeating pattern in the mean level of a series whose duration is not fixed or known and generally occurs over a period of 2 or more years. | Cycles may also represent patterns of interest. However, cycles are more difficult to identify and generally require longer series to be adequately captured. |
| Autocorrelation | When current observations exhibit a dependence upon prior states, manifesting statistically as a correlation between lagged observations. | The presence of autocorrelation means that there is signal in the data that can be modeled by AR or MA terms to generate more accurate forecasts. |
| Stationarity | When the mean, variance, and autocorrelation of a series are constant across time. | Descriptive statistics of a time series are only meaningful when it is stationary. Furthermore, before a time series can be modeled by AR or MA terms it must be made stationary. |
| Seasonal adjustment | A process of estimating the seasonal effects and removing them from the series. | Seasonal adjustment can remove a source of variation that is not interesting from a theoretical perspective so that the elements of a time series that are of interest can be more clearly analyzed (e.g., a trend). |
| Differencing | The process of transforming the values of a series into a series of the differences between observations adjacent in time. | Differencing removes the trend from a time series and thus helps to make the mean of a time series stationary. |
| Autocorrelation function (ACF) | A measure of linear association (correlation) between the current values of a time series and its past values. | The ACF allows the analyst to see if there is any autocorrelation in the data and at what lags it manifests. It is essential in identifying the appropriate number of AR and MA terms to explain the pattern of the residuals. It is also valuable for determining if there is any remaining autocorrelation after an ARIMA model has been fit (i.e., model diagnostics). |
| Partial autocorrelation function (PACF) | A measure of linear association (correlation) between the current values of a time series and its past values after controlling for the intervening observations. | The PACF is useful for identifying the number of AR or MA terms that will explain the autocorrelation in the data. |
| Integrated (I) | In an ARIMA model, the number of times the series has been differenced in order to make it stationary. | Stationarity is an assumption that must be met before any AR or MA terms can be included in a model. In an ARIMA model, the Integrated component allows the inclusion of series that are non-stationary in the mean. |
| Autoregressive (AR) | When a variable is regressed on its prior values in order to account for autocorrelation. | AR terms are able to account for autocorrelation in the data to improve forecasts. |
| Moving average (MA) | When a variable is regressed on past random shocks (error terms) in order to account for autocorrelation. | MA terms are able to account for autocorrelation in the data to improve forecasts. |
| Dynamic regression (ARIMAX) | A time series model that includes both regression and ARIMA terms. | A model that includes both explanatory variables and AR or MA terms can be used to simultaneously model an underlying trend and generate accurate forecasts. |
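To make the seasonal adjustment and differencing entries of this glossary concrete, the sketch below applies both to a simulated monthly series; the decomposition-based adjustment is one simple approach among several.

```r
# Sketch of seasonal adjustment and differencing on a simulated monthly
# series; the decomposition-based adjustment is one simple approach.
set.seed(11)
m <- 1:120
y <- ts(10 + 0.05 * m + 3 * sin(2 * pi * m / 12) + rnorm(120),
        frequency = 12, start = c(2004, 1))

dec    <- decompose(y)
y_adj  <- y - dec$seasonal   # seasonal adjustment: subtract the seasonal component
y_diff <- diff(y_adj)        # first differencing: removes the remaining trend

par(mfrow = c(3, 1))
plot(y,      main = "Original series")
plot(y_adj,  main = "Seasonally adjusted")
plot(y_diff, main = "Differenced (trend removed)")
```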