Universal Cointegration and Its Applications.

Abstract

Cointegration asks whether the long-term linear relationship between two or more time series is stationary, even if that relationship is absent or weak in the short term. Identifying potential cointegration is important in economics, ecology, meteorology, neuroscience, and many other fields. Classic methods are restricted to cointegration in which every time series has order of integration 1. We introduce a method that searches for the vector minimizing the absolute correlation of convergent cross-mapping (CCM), which can explore universal cointegration and its extent. The proposed method can be applied to time series whose order of integration is not 1, cases that classic cointegration does not cover. The method is first illustrated and validated on time series generated by mathematical models in which the underlying relationships are known and is then applied to three real-world examples.

Keywords: Computational Mathematics; Global Change; Interdisciplinary Physics

Year: 2019 PMID: 31522121 PMCID: PMC6744394 DOI: 10.1016/j.isci.2019.08.048

Source DB: PubMed Journal: iScience ISSN: 2589-0042

Introduction

Identifying the potential cointegration among time series is a challenging and open problem (Johansen, 1988, Engle and Granger, 1991, Hamilton, 1994). Cointegration asks whether the long-term linear relationship between two or more time series is stationary, even if that relationship is absent or weak in the short term. For example, many models can be constructed from well-established physical principles (modeling by mechanism), yet the time series simulated by these models may vary substantially. If a given model captures the empirical observations, its simulated time series is expected to have a cointegrating relationship with the observed time series. Therefore, among many candidate models, the cointegrating relationship between the observed and simulated time series can identify the most appropriate model, i.e., the one with the strongest cointegration. Problems of this type are ubiquitous in the natural and social sciences (Kammerdiner and Pardalos, 2010, Kristoufek, 2013, Klee et al., 1987, Pedroni, 2001, Chiu and Wong, 2011, Ma and Zhu, 2019) and are difficult to treat in complex nonlinear systems (Robinson and Hualde, 2003, Tu, 2014, Yang et al., 2014). Cointegration is common even in the simplest nonlinear systems, such as those shown in Figure 1, where time series are generated by the following set of state and observation equations: where are independent normal distributions. If the matrix rank of A is 1, a constant α will exist such that , or . Therefore, the above equation reads as

Figure 1

Unidirectional Coupling Time Series

(A) Period oscillation where .

(B) Extreme value where is drawn from a Pareto distribution with a minimum value parameter of 0.1 and a shape parameter of 1.25.

(C) Chaotic map where . The initial value is drawn from a uniform distribution between 0 and 1. Steps from 101 to 150 are adopted. Although a short-term deviation between the two time series exists, the long-term relationship is stationary according to the equations.

The relationship between and is . At each step, the observation of the true state is where are independent normal distributions. Therefore, the following equation holds: where is a constant and is a normal distribution. Although deviates from owing to the noise over short time windows, their difference is stationary over sufficiently long time intervals.

The traditional statistical method for identifying a linear relationship between two time series is regression. Its best-known weakness is spurious regression (Granger and Newbold, 1974): misleading statistical evidence of a linear relationship between independent nonstationary variables (see Transparent Methods, section Spurious regression). Applying a regression model to identify long-term stationary relationships is therefore neither reliable nor valid (see Figure S1 and Table S1), especially because empirical time series are contaminated by diverse factors. Engle and Granger recognized this problem early and proposed the concept of cointegration (Engle and Granger, 1987).
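The spurious-regression problem can be illustrated with a small simulation. The sketch below is not taken from the paper: the function names, series length, number of trials, and the 0.5 threshold are all illustrative assumptions. It compares how often two independent series show a large sample correlation when they are stationary white noise versus nonstationary random walks.

```python
import random


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if one is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    if var_a == 0.0 or var_b == 0.0:
        return 0.0
    return cov / (var_a * var_b) ** 0.5


def white_noise(n, rng):
    """Independent standard normal draws: a stationary series."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def random_walk(n, rng):
    """Cumulative sum of white noise: a nonstationary series."""
    walk, level = [], 0.0
    for step in white_noise(n, rng):
        level += step
        walk.append(level)
    return walk


def share_large_corr(make_series, n=200, trials=200, threshold=0.5, seed=0):
    """Fraction of independent pairs whose |correlation| exceeds the threshold."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        a, b = make_series(n, rng), make_series(n, rng)
        if abs(pearson(a, b)) > threshold:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    print("white noise :", share_large_corr(white_noise))
    print("random walks:", share_large_corr(random_walk))
```

In a simple regression the coefficient of determination is the squared correlation, so a large share of high correlations between independent random walks corresponds directly to regressions that look significant but are not.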
Classic cointegration, such as the Engle-Granger cointegration test (Engle and Granger, 1987) and the Johansen cointegration test (Johansen, 1995), is restricted to cointegration among I(1) series (Hamilton, 1994), i.e., (1) all time series are nonstationary, and the minimum number of differences required to obtain a stationary series is 1, and (2) a vector exists such that the corresponding linear combination of the series is stationary (see Transparent Methods, section Background of cointegration). The Engle-Granger cointegration test, based on ordinary least squares, seeks the linear combination with the minimum variance (see Table S2). The Johansen cointegration test, based on the maximum likelihood estimator of the so-called reduced rank model, seeks the linear combination that is the most stationary (see Table S3). The prerequisite of both approaches is that each given time series is I(1), namely, it is nonstationary, and the minimum number of differences required to obtain a stationary series is 1. Otherwise, preprocessing such as differencing or a log transform is adopted, for example, asset returns in financial time series (Tsay, 2005) and phase estimation of neurophysiological signals (Dahlhaus et al., 2017). However, such preprocessing cannot guarantee that all processed time series are I(1) together. Even if they are, the subsequent cointegration test is applied to the processed time series rather than to the original ones, so the result is neither reliable nor clear. Moreover, even if the prerequisite that all time series are I(1) is not satisfied, these methods can still be run mechanically (see Transparent Methods, section Application of classic cointegration and its misuse). For example, classic cointegration methods cannot be validly applied to the time series in Figure 1 (see Tables S4 and S5). In the three cases, long-term linear relationships exist, but the time series are not I(1), so the prerequisite of the classic methods is violated completely.
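For reference, the prerequisite of classic cointegration described above can be written compactly for two series. The symbols x_t, y_t, Δ, and β are conventional textbook notation chosen here for illustration; they are assumptions and are not recovered from the source equations.

```latex
\begin{aligned}
x_t \sim I(1) \;&\Longleftrightarrow\; x_t \text{ is nonstationary and } \Delta x_t = x_t - x_{t-1} \text{ is stationary},\\
x_t,\ y_t \text{ cointegrated} \;&\Longleftrightarrow\; x_t \sim I(1),\ y_t \sim I(1), \text{ and } \exists\, \beta:\ y_t - \beta x_t \sim I(0).
\end{aligned}
```

Under this reading, a series that is already stationary, or one that needs more than one difference to become stationary, falls outside the classic setting.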
Unfortunately, the time series in most empirical datasets, in fields such as economics, ecology, meteorology, and neuroscience, often cannot satisfy this prerequisite. Classic cointegration analysis also has two other drawbacks (Hamilton, 1994, Dahlhaus et al., 2018). First, it is difficult or even impossible to handle time series for which no preprocessing yields series that are I(1) together. Second, it is invalid for (complete) synchronization coupling (Pikovsky et al., 2003) composed of time series from a chaotic system (see Transparent Methods, section Background of cointegration). Synchronization guarantees that all time series are the same or at least similar, so their linear combination is stationary (Pikovsky et al., 2003). Therefore, synchronization and cointegration describe the same problem: dynamic fluctuation around the equilibrium (Dahlhaus et al., 2018). In this work, we examine a method that returns to the essence of the cointegration definition, considering only whether a long-term linear relationship between two or more time series exists and how strong it is. We demonstrate the principles of our framework using controlled mathematical model examples, showing that the method can identify cointegration that is not covered by classic cointegration but is important in practice. The method is particularly suitable for identifying synchronization. Finally, we apply the method to (1) check the relationship between models and observations of global warming, (2) identify the possible synchronization in electroencephalographic signals, and (3) determine the possible leadership of Bitcoin in the cryptocurrency market.
Our method is not in competition with the classic cointegration methods, such as the Engle-Granger cointegration test and the Johansen cointegration test; rather, it is aimed at a class of systems not covered by classic cointegration (see Transparent Methods, section Classic cointegration analysis). Classic cointegration cannot be validly applied to the cases in this work because its prerequisite is violated, even though it occasionally detects the existing cointegrating relationship (see Transparent Methods, section Application of classic cointegration and its misuse and Tables S6 and S7).

Results

Mathematical Model Examples

When the system size is two, the possible cointegrating vector is . Three cases to construct the cointegrating relationship are discussed: unidirectional coupling, bidirectional coupling, and synchronization coupling (see Transparent Methods, section Mathematical model to generate the time series data).

Unidirectional Coupling

Considering the case where (see Figure 1); if we assume that time series is dependent and is independent, the minimum of the absolute correlation of CCM , and the argument (see the first row of Figure 2). To illustrate the details, Video S1 shows the CCM projection from to with different for the case in Figure 1A. When , approximates random noise. Therefore, the universal cointegration from to exists, and its extent is large. If we assume that is dependent and is independent, then the universal cointegration from to also exists, and the argument (see the second row of Figure 2). We display the exact minimum and its argument value in Table S8 and the summary of 50 different realizations in Figure 3A.
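The idea of searching for the coefficient that minimizes the absolute CCM correlation can be sketched in a few lines. This is a simplified illustration under several assumptions that are not confirmed by the source: the residual is formed as y - alpha * x, the cross-map runs from the residual's shadow manifold to x, a plain grid search stands in for Bayesian minimization, a fixed library size is used instead of a convergence check, and the toy data (a sine wave with true coefficient 2 plus Gaussian noise) are hypothetical.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    if var_a == 0.0 or var_b == 0.0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def embed(series, dim, lag=1):
    """Delay vectors (s[t], s[t-lag], ...) that form the shadow manifold."""
    start = (dim - 1) * lag
    return [tuple(series[t - j * lag] for j in range(dim))
            for t in range(start, len(series))]


def cross_map_corr(source, target, dim=2, lag=1):
    """Correlation between target and its cross-map estimate from source."""
    manifold = embed(source, dim, lag)
    aligned = target[(dim - 1) * lag:]
    estimates = []
    for i, point in enumerate(manifold):
        # dim + 1 nearest neighbours on the manifold, excluding the point itself
        nearest = sorted((math.dist(point, other), j)
                         for j, other in enumerate(manifold) if j != i)[:dim + 1]
        scale = max(nearest[0][0], 1e-12)
        weights = [math.exp(-d / scale) for d, _ in nearest]
        estimate = sum(w * aligned[j] for w, (_, j) in zip(weights, nearest))
        estimates.append(estimate / sum(weights))
    return pearson(aligned, estimates)


def search_alpha(x, y, alphas, dim=2):
    """Grid search (stand-in for Bayesian minimization) over alpha."""
    best = None
    for a in alphas:
        residual = [yi - a * xi for xi, yi in zip(x, y)]
        value = abs(cross_map_corr(residual, x, dim))
        if best is None or value < best[0]:
            best = (value, a)
    return best


# Hypothetical toy data: y follows 2 * x plus observation noise.
rng = random.Random(1)
x = [math.sin(0.3 * t) for t in range(200)]
y = [2.0 * xi + rng.gauss(0.0, 0.1) for xi in x]
alphas = [0.5 * i for i in range(9)]  # 0.0, 0.5, ..., 4.0
score, alpha = search_alpha(x, y, alphas)
print("alpha with smallest |correlation|:", alpha, "value:", round(score, 3))
```

When alpha matches the true coefficient, the residual is essentially noise and carries no information about x, so the cross-map correlation is close to zero; for other values the residual still contains a scaled copy of x and the correlation stays high.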

Figure 2

The Acquisition Functions of Bayesian Minimization for the Three Cases of Unidirectional Coupling Time Series

(A) Period oscillation.

(B) Extreme value.

(C) Chaotic map. The first row checks the universal cointegration from time series 1 to 2, and the second row checks the opposite direction.

Figure 3

The Minimum and Its Argument for Different Cases

(A) Unidirectional coupling including three subcases.

(B) Bidirectional coupling including three subcases.

(C) Synchronization coupling including five coupling strengths. For each subcase, we run 50 realizations, and each marker point represents the minimum and its argument of one realization.

Bidirectional Coupling

Considering the case where (see Figure S3), if we assume that time series is dependent and is independent, the minimum of the absolute correlation of CCM and the argument (see the first row of Figure S4). Therefore, the universal cointegration from to exists, and its extent is large. If we assume that is dependent and is independent, the universal cointegration from to also exists, and the argument (see the second row of Figure S4). We display the exact minimum and its argument value in Table S9 and the summary of 50 different realizations in Figure 3B.

Synchronization Coupling

Considering the case and the coupling dynamics , if ϵ equals or is close to 0.5, then the two time series are always similar irrespective of the specific function, even a chaotic one (see Figure S5). Even if noise exists in the synchronized processes, their difference should remain relatively small, i.e., the differences follow a stationary process. Therefore, the minimum of the absolute correlation of CCM , and the argument (see Figure S6). We display the exact minimum and its argument value in Table S10 and the summary of 50 different realizations in Figure 3C.
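The effect of a coupling strength of 0.5 can be checked with a small simulation. The coupling form below (each series is a weighted mix of the map applied to itself and to the other series) is a common symmetric coupling and is an assumption, since the coupling equation is not recoverable from the text; the logistic map, its parameter 3.9, and the initial values are likewise illustrative.

```python
def logistic(u, r=3.9):
    """Chaotic logistic map on [0, 1]."""
    return r * u * (1.0 - u)


def coupled_maps(x0, y0, eps, steps, f=logistic):
    """Symmetric coupling: each series mixes its own update with the other's."""
    xs, ys = [x0], [y0]
    for _ in range(steps):
        fx, fy = f(xs[-1]), f(ys[-1])
        xs.append((1.0 - eps) * fx + eps * fy)
        ys.append((1.0 - eps) * fy + eps * fx)
    return xs, ys


# With eps = 0.5 both updates are the same average, so the series coincide
# from the first step onward, whatever the map f is.
xs, ys = coupled_maps(0.2, 0.7, eps=0.5, steps=50)
max_gap = max(abs(a - b) for a, b in zip(xs[1:], ys[1:]))
print("largest gap after the first step:", max_gap)
```

Under this assumed coupling, the difference between the two series is identically zero for ϵ = 0.5, which is the simplest instance of a stationary linear combination.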

System Size Is Larger than Two

When the system size is larger than two, if we still restrict ourselves to the possible universal cointegration between each pair (see Figures S7A–S7C), i.e., the size of the cointegrating vector is two, then the process is the same as before (see Figures S8A–S8C). If we extend the discussion to mixing coupling, where the possible universal cointegration is among three or more time series, our method remains valid. Consider a simple example of mixing coupling where the adjacency matrix (see Transparent Methods, section Mathematical model to generate the time series data), , i.e., is dependent and and are both independent (see Figure S7D). We therefore search for the vector that minimizes the absolute correlation of CCM between cause and result by Bayesian minimization. If the minimum is close to 0, i.e., , the universal cointegration exists, and is the desired cointegrating vector (see Figure S8D). The deeper the color, the smaller the absolute correlation of CCM. The single deep-colored area indicates that there is only one cointegrating relationship among these time series.

Real-World Examples

Real-world data are always contaminated by diverse factors that can be reduced to observation noise and process noise. These noises introduce many short-term deviations but cannot eliminate the long-term stationary relationship. Therefore, noises are a benefit rather than a curse for universal cointegration. Here, we apply the method to three simple examples from active research areas.

Checking the Relationship between Models and Observations of Global Warming

Climate is the response to linkages and couplings between the atmosphere, the hydrosphere, the biosphere, the cryosphere, and the geosphere (Houghton et al., 2001, Beniston et al., 1997). Climate models based on well-established physical principles have been shown not only to reproduce observed features of climate change but also to predict future changes (Allen and Tett, 1999, Zhang et al., 2007). However, the simulation results, particularly the patterns of processes and phenomena, may vary substantially between models (Turasie, 2012, Kaufmann et al., 2011). If a climate model captures the real system, then its outputs are expected to have a cointegrating relationship with the observed climate (Turasie, 2012). Do the outputs of climate models cointegrate with the observed climate change? If so, what is the best proxy for comparing the performance of different models? Our first real-world example applies universal cointegration to compare the global near-surface temperature of the historical observations with simulations from 48 models in the CMIP5 archive (see Figure 4A). Some studies used the regression method or classic cointegration analysis (Turasie, 2012). As discussed above, these methods have serious drawbacks, whereas our method does not need to determine the order of integration. If the model time series cointegrates with the observation time series , the minimum of the absolute correlation of CCM and holds. Furthermore, if the model time series is a good proxy of the observation time series, i.e., the observation time series is just the model time series with noise, then the cointegrating parameter and . Therefore, we define the Euclidean distance in the three-dimensional space from to , i.e., , as the criterion to check whether the model is a good proxy.

Figure 4

The Relationship between Models and Observations of Global Warming

(A) The observation time series and 48 model time series of temperature anomalies from 1906 to 2005. The black lines are observation time series, and the colored lines are model time series.

(B) The three-dimensional space . Each point represents a model time series. The darker the point is, the smaller is the distance to the critical point .

Most model time series have a cointegrating relationship with the observation time series due to (see Figure 4B), but the cointegrating parameters . Therefore, most models successfully capture the real climate system, but their values are consistently higher than the observations, i.e., these models overestimate global warming. In particular, the overestimate increases over time. We display the exact values for each model in Table S11. Here, we raise some cautions about the climate models rather than negate their significance (Oreskes, 2004).

Identifying the Possible Synchronization in Electroencephalographic Signals

Synchronization phenomena in neuroscience have been increasingly regarded as essential for the functional coupling of different brain regions (Varela et al., 2001, Fries et al., 2001, Izhikevich, 2007), and pathological synchronization has been regarded as a main mechanism responsible for epileptic seizures (Traub and Wong, 1982, Iasemidis, 2003, Mormann et al., 2003). Many synchronization measures have been proposed, including nonlinear interdependence (Le Van Quyen et al., 1998), mutual information (Jeong et al., 2001), cross-correlation (Chandaka et al., 2009), and the coherence function (Shaw, 1981). Because both synchronization and universal cointegration describe dynamic fluctuations around the equilibrium, universal cointegration can determine synchronization and its extent. The potential synchronization between the left and right hemisphere rat electroencephalographic (EEG) channels is hard to guess beforehand from the raw data. We therefore ask whether universal cointegration can make a relevant contribution to the study of synchronization in EEG and whether it can disclose information that is difficult to obtain by visual inspection. If the synchronization between the left and right time series is strong, the cointegrating relationship will hold, where is left/right time series. We analyze the potential synchronization between the left and right EEG channels in three different cases. Although the possible synchronization in each case is difficult to determine by visual inspection (see Figures 5A, S9A, S9C, and S9E), checking the universal cointegration and comparing the results is simple. We find that, in general, the extent of synchronization is ordered as case 1 > case 3 > case 2. In case 1, the absolute correlations of universal cointegration from right to left and from left to right are both small; thus, its synchronization is consistently strong (see Figure 5B). The regions of dark color in case 2 display the disappearance of order.
For different cell sizes, such as 50, 250, and 500, the results are similar (see Figures S9B, S9D, and S9F). Although we have no objective means of claiming that the differences in synchronization between the time series are significant, the quantification of synchronization between different time series can complement conventional visual analysis and can even be of clinical value (Quiroga et al., 2002). The method is simple, straightforward, and fast enough to be adopted easily in an online implementation. Additionally, the method is not restricted to EEG data and can be valuable for studying the synchronization of other time series.

Figure 5

The Possible Synchronization in Electroencephalographic Signals

(A) Three cases of rat EEG signals/time series from right and left cortical intracranial electrodes. For better visualization, all time series are plotted with an offset step of 3.

(B) The minimum of the absolute correlation of CCM of each case. Each row represents a possible cointegration, and each cell represents its value calculated by 125 steps.

Determining the Possible Leadership of Bitcoin in the Cryptocurrency Market

In recent years, a new type of financial asset, cryptocurrency, has been introduced, and it is emerging as a new topic in empirical economic studies (Donier and Bouchaud, 2015, Gatfaoui et al., 2017, Tu et al., 2018). Notably, its market capitalization is approximately 300 billion dollars, and it is traded against many of the main national currencies, with daily trading of more than 10 billion USD. The seminal and most popular cryptocurrency is Bitcoin (BTC), which accounts for half of the market capitalization and trade volume of the whole cryptocurrency market (ElBahrawy et al., 2017). The last half year has witnessed an unusual rise and fall in the price of cryptocurrency, emerging as an asset bubble (Bariviera, 2017, Blau, 2017); for example, the price of BTC increased from approximately 4,000 dollars in early October 2017 to approximately 20,000 dollars at its peak in December 2017. Because there is no compelling way to assess the fundamentals of cryptocurrency prices, the tremendous price fluctuations are elusive and abrupt (Urquhart, 2017, Ametrano, 2016). Despite the theoretical and economic interest in the cryptocurrency market, a comprehensive analysis of whether BTC is the de facto leader is still lacking. Classic cointegration analysis is not a good choice for answering this question because it is difficult or unrealistic to guarantee the prerequisite, i.e., that all time series are I(1) together, even after econometric preprocessing. As discussed above, universal cointegration is well suited to this question. Our last real-world example focuses on whether, and to what extent, BTC leads other cryptocurrencies. If BTC is the leader, the cointegrating relationship from BTC to another cryptocurrency will hold, where is the price time series of BTC and is another cryptocurrency. The smaller the absolute correlation of CCM is, the stronger the leadership of BTC.
Here, we consider the evolution of universal cointegration from BTC to ETH, XRP, LTC, EOS, ADA, and XLM between April 1, 2017 and March 31, 2018. In general, the universal cointegration of all cryptocurrencies behaves similarly (see Figure 6B). Before February 2018, the absolute correlation of each cointegration is approximately 0.1 with small but abrupt fluctuations, i.e., the leadership of BTC exists but is weak. Then, a sharp peak about half a month wide emerges in the middle of February 2018, with a peak value larger than 0.6, i.e., the leadership of BTC disappears suddenly. After the peak, the value falls steeply and is close to 0 at the beginning of March, i.e., the leadership of BTC is strong. Finally, the value increases progressively, and the leadership of BTC weakens accordingly. Because the length of the time window is 90, we shift left by 45 to relate the leadership to the price of BTC (see Figure 6A). In the region corresponding to the first region, from the beginning of October 2017 to the middle of December 2017, the price of BTC increases step by step. In the region from the middle to the end of December 2017, corresponding to the second region, the price of BTC fluctuates at a high level with relatively small shocks. Then, from the beginning to the end of January 2018, the price of BTC decreases steeply. Finally, from the beginning to the middle of February 2018, the price of BTC increases gradually. Because BTC occupies half of the whole cryptocurrency market, its steep fall leads to market jitters and then market collapse. The behaviors of other cryptocurrencies are therefore similar to those of BTC, and the leadership of BTC is strong. If the price of BTC fluctuates at a high level with relatively small shocks, hot money in the cryptocurrency market has no effective investment opportunity in BTC and moves to other cryptocurrencies, so the leadership of BTC disappears. In other cases, different cryptocurrencies show not only the behavior of BTC but also individual behavior, so the leadership of BTC may exist but is weak. Other time windows yield similar results (see Figure S10).
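The windowing and shifting described above can be sketched as follows. The function name, the use of a simple mean as the per-window statistic, and the synthetic input are illustrative assumptions; in the study the statistic would be the absolute correlation of CCM, and the time unit of the window is not stated in the text.

```python
def windowed_statistic(values, stat, window=90, shift=45):
    """Apply stat to each full window and label it with a shifted index.

    A window covering positions [end - window, end) would naturally be
    labeled by its last position, end - 1. Shifting left by `shift` moves
    that label toward the middle of the window.
    """
    results = []
    for end in range(window, len(values) + 1):
        label = end - 1 - shift
        results.append((label, stat(values[end - window:end])))
    return results


def mean(chunk):
    """Arithmetic mean, used here only as a placeholder statistic."""
    return sum(chunk) / len(chunk)


# Synthetic input: positions 0..199 stand in for a price series.
series = list(range(200))
shifted = windowed_statistic(series, mean)
print(len(shifted), shifted[0], shifted[-1])
```

With a window of 90 and a shift of 45, each value is attributed roughly to the midpoint of the window it summarizes, which is what makes it comparable to the price at that time.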

Figure 6

The Possible Leadership of Bitcoin in the Cryptocurrency Market

(A) The price time series of seven typical cryptocurrencies, including Bitcoin (BTC), Ethereum (ETH), Ripple (XRP), Litecoin (LTC), EOS (EOS), Cardano (ADA), and Stellar (XLM), from April 1, 2017, to March 31, 2018.

(B) The absolute correlation of CCM from BTC to other cryptocurrencies. The time window is 90, approximately half of the length of time series.

Discussion

Classic cointegration analysis, such as Engle-Granger cointegration and Johansen cointegration, is restricted to I(1) time series. These methods assume that all time series are I(1) and that a vector exists such that the linear combination is stationary; otherwise, preprocessing is adopted. However, even if the prerequisite that all time series are I(1) is not satisfied, these methods can still be run mechanically. Additionally, such preprocessing cannot guarantee that all processed time series are I(1) together. Even if they are, the cointegration exists in the processed time series rather than in the original time series. To address this, we introduce a method that searches for the vector minimizing the absolute correlation of CCM, which can explore universal cointegration and its extent. Our method is not in competition with classic cointegration analysis; rather, it is aimed at a class of systems not covered by classic cointegration analysis. Thus, it is not surprising that applying classic cointegration analysis to examples that fall outside its range but are covered by universal cointegration gives uncertain and confusing results (see Transparent Methods, section Application of classic cointegration and its misuse). Many methods for constructing complex networks from empirical data have been proposed, and we suggest that universal cointegration inferred from the time series of each pair of nodes complements these conventional methods. The ability to resolve a cointegration network from dynamical behavior sheds new light on the underlying mechanisms and driving forces, particularly when nodes interact as a group and need to be considered together. For example, in financial markets, accurate knowledge of the cointegration network not only offers potential opportunities for pairs trading but also deepens our understanding of financial collapse (Tu, 2014, Iori and Mantegna, 2018).

Limitations of the Study

Because CCM is restricted to discrete-time settings, universal cointegration cannot be applied to continuous-time settings directly. The main problem is that a shadow manifold cannot be constructed from a continuous-time signal. In practice, however, a continuous-time signal can be transformed into a discrete-time signal by sampling. In the manuscript, the time series of period oscillation (see Figures 1A, S3A, and S7) is sampled from a continuous-time signal at a sampling interval of . Classic cointegration obtains the parameters directly, such as . Although it requires little calculation time, it does not test other parameters that may yield a stronger cointegration, i.e., a vector exists such that is closer to noise. The calculation time required for universal cointegration is often larger than that of classic cointegration and depends mainly on the maximum number of iterations of Bayesian minimization. When this value increases, Bayesian minimization tries more candidates and can obtain a smaller , so the cointegration found will be stronger.

Methods

All methods can be found in the accompanying Transparent Methods supplemental file.

References

1. Modulation of oscillatory neuronal synchronization by selective visual attention.

2. Performance of different synchronization measures in real data: a case study on electroencephalographic signals.

3. Beyond the ivory tower. The scientific consensus on climate change.

4. Reconciling anthropogenic climate change with observed temperature 1998-2008.

5. Nonlinear interdependencies of EEG signals in human intracranially recorded temporal lobe seizures.

6. An introduction to the coherence function and its use in EEG signal analysis.

7. Mutual information analysis of the EEG in patients with Alzheimer's disease.

8. Epileptic seizures are preceded by a decrease in synchronization.

9. Epileptic seizure prediction and control.

10. Why Do Markets Crash? Bitcoin Data Offers Unprecedented Insights.