
Composite Multiscale Partial Cross-Sample Entropy Analysis for Quantifying Intrinsic Similarity of Two Time Series Affected by Common External Factors.

Baogen Li1, Guosheng Han1, Shan Jiang1, Zuguo Yu1,2.   

Abstract

In this paper, we propose a new cross-sample entropy, namely the composite multiscale partial cross-sample entropy (CMPCSE), for quantifying the intrinsic similarity of two time series affected by common external factors. First, in order to test the validity of CMPCSE, we apply it to three sets of artificial data. Experimental results show that CMPCSE can accurately measure the intrinsic cross-sample entropy of two simultaneously recorded time series by removing the effects from the third time series. Then CMPCSE is employed to investigate the partial cross-sample entropy of Shanghai securities composite index (SSEC) and Shenzhen Stock Exchange Component Index (SZSE) by eliminating the effect of Hang Seng Index (HSI). Compared with the composite multiscale cross-sample entropy, the results obtained by CMPCSE show that SSEC and SZSE have stronger similarity. We believe that CMPCSE is an effective tool to study intrinsic similarity of two time series.

Keywords:  composite multiscale partial cross-sample entropy (CMPCSE); multiscale cross-sample entropy (MCSE); stock indices; time series

Year:  2020        PMID: 33286772      PMCID: PMC7597075          DOI: 10.3390/e22091003

Source DB:  PubMed          Journal:  Entropy (Basel)        ISSN: 1099-4300            Impact factor:   2.524


1. Introduction

Complex systems with interacting constituents exist in all aspects of nature and society, such as geophysics [1], solid-state physics, climate systems, ecosystems, and financial systems [2,3]. These complex systems constantly generate a large number of time signals. Fortunately, in recent decades, numerous creative methods have been proposed to explore the operating mechanisms of these complex systems. Among them, entropy-based methods are powerful modern analysis tools. The concept of 'entropy' was first proposed by Clausius to deal with thermodynamic problems; Boltzmann then gave it a microscopic explanation from the perspective of statistical mechanics and proposed Boltzmann entropy, and Gibbs proposed Gibbs entropy when characterizing uncertain systems. In 1948, Shannon introduced the concept of entropy into information theory and put forward Shannon entropy (information entropy) [4]. Shortly after that, Renyi extended it and proposed Renyi entropy [5]. In 1988, Tsallis gave a generalization of Boltzmann-Gibbs statistics and proposed Tsallis entropy [6]. Although Gibbs entropy and Shannon entropy have the same mathematical expression, Shannon entropy has a broader meaning than thermodynamic entropy, as all the basic laws of thermodynamics can be derived from information entropy [7]. Since then, many entropy-based methods have been proposed to explore system complexity by studying the time series that these systems generate [8,9]. In order to quantify the complexity of real, finite time series, Pincus proposed approximate entropy (ApEn) [10,11,12], which has been used to study biological time series [13,14]. In 2002, Richman et al. analyzed the deficiencies of ApEn and proposed the concept of sample entropy (SampEn).
Compared with ApEn, SampEn agrees with theoretical results much more closely over a broad range of conditions, and it has been successfully applied to clinical cardiovascular studies [15,16]. Cross-sample entropy (cross-SampEn) was also proposed for comparing two different time series to assess their degree of similarity [15]. In 2010, when Liu et al. studied the correlation of foreign-exchange time series, they found that cross-SampEn is superior to the correlation coefficient in describing the correlation between such series [17]. In 2003, Costa et al. found that an increase in the entropy of a system is usually, but not always, associated with an increase of complexity, so traditional entropy-based algorithms may lead to misleading results [18]. To avoid this situation, they introduced multiscale sample entropy (MSE), which has been successfully used to study various dynamical systems [19,20,21,22,23]. Not long after that, MSE was extended to multiscale cross-sample entropy (MCSE) to measure cross-sample entropy over different time scales. Unfortunately, the coarse-grained procedure of multiscale analysis places a higher requirement on the length of the time series: when the sequence is not long enough, the results become inaccurate. In addition, in some cases an insufficient sequence length leaves no template vector matched to another, so the cross-sample entropy cannot be defined. To overcome this shortcoming, Wu et al. successively proposed composite multiscale sample entropy (CMSE) [24] and refined composite multiscale entropy (RCMSE) [25]. Inspired by CMSE and RCMSE, Yin et al. introduced composite multiscale cross-sample entropy (CMCSE) and refined composite multiscale cross-sample entropy (RCMCSE) [26], which reduce the probability of undefined entropy and have been successfully used to study structural health monitoring systems [27].
In 2018, in order to better study time series from the stock market, Wu and his coworkers introduced a modified multiscale sample entropy measure based on symbolic representation and similarity (MSEBSS) [28]. Recently, Wang et al. proposed multiscale cross-trend sample entropy (MCTSE) to study the similarity of two time series with potential trends [29]. In addition, multivariate multiscale sample entropy algorithms have been proposed to deal with multivariate data [30,31,32], and Jamin and Humeau-Heurtier recently offered a state of the art on cross-entropy measures and their multiscale approaches in [33]. On the other hand, when some scholars studied the long-range correlation between time series, they found that if two non-stationary time series are driven by a common third-party force or by common external factors, results that ignore this common force may not reflect the intrinsic relationship between the series [34,35,36]. Fortunately, Baba et al. [37] found that if the external influences on the two time series are additive, the level of intrinsic cross-correlation between them can be measured by the partial cross-correlation coefficient. In 2015, Yuan et al. [38] and Qian et al. [39] introduced partial cross-correlation analysis to deal with this kind of situation from different starting points. Inspired by the above works, in this paper we propose the composite multiscale partial cross-sample entropy (CMPCSE) to measure the intrinsic similarity of two time series simultaneously affected by a common third external factor. We first test CMPCSE on three sets of artificial data, find that it can reveal the intrinsic similarity of the time series produced by the models, and then apply it to a set of stock market indices.

2. Composite Multiscale Partial Cross-Sample Entropy

In this section, based on CMCSE [26], we propose a new method, composite multiscale partial cross-sample entropy (CMPCSE), which can be used to quantify the intrinsic similarity of two time series linearly affected by a common external factor. Consider two time series x = {x(i)} and y = {y(i)} (i = 1, 2, ..., N) recorded simultaneously and linearly affected by a third series z = {z(i)}. The main steps of CMPCSE are as follows.

Step 1: First we eliminate the effect of z on x and y, respectively. The additive models for x and y can be written as

x(i) = x'(i) + a z(i),    y(i) = y'(i) + b z(i),    i = 1, 2, ..., N,

where x' and y' denote the intrinsic parts of x and y. When using regression analysis to estimate the regression values, in a window of length s we follow the idea of MF-TWXDFA [40] to remove the effect of the sequence z on x and y point by point, as follows. For a given integer s (s < N), the points j contained in the sliding window corresponding to point i satisfy |i - j| <= s. When the length of the time series is different, we take a different value for s; usually the value of s is determined by experience. Accordingly, the weight function of the geographically weighted regression model is

w(i, j) = [1 - (|i - j| / s)^2]^2,    |i - j| <= s.

In the window of point i, we perform weighted linear regression of x on z (or of y on z) and obtain the regression values x̂(i) and ŷ(i) of x(i) and y(i), respectively. The corresponding estimates of the intrinsic parts are then

x'(i) = x(i) - x̂(i),    y'(i) = y(i) - ŷ(i).

The normalized data of x' and y' are defined as u = (x' - μ_x')/σ_x' and v = (y' - μ_y')/σ_y', respectively, where μ and σ denote the corresponding mean and standard deviation. Next, we calculate the CMCSE of u and v.

Step 2: Construct coarse-grained time series from the series u and v with scale factor τ. The j-th point of the k-th coarse-grained time series at a scale factor of τ (k = 1, 2, ..., τ) is defined as

u_j^(τ,k) = (1/τ) Σ_{i=(j-1)τ+k}^{jτ+k-1} u(i),    1 <= j <= N/τ,

and analogously for v_j^(τ,k). For scale one (τ = 1), the coarse-grained time series are simply the original series u and v. Figure 1 and Figure 2 show two more intuitive examples of the coarse-grained procedure.
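Steps 1 and 2 above can be sketched in Python. This is a minimal illustration, not the authors' reference implementation: the function names are ours, and the bi-square kernel together with a plain weighted least-squares fit is our reading of the MF-TWXDFA-style regression.

```python
import numpy as np

def remove_external_factor(x, z, s):
    """Step 1 (sketch): remove the linear influence of z from x point by point.

    For each point i, the window contains the points j with |i - j| <= s;
    a bi-square-weighted linear regression of x on z over that window gives
    the regression value x_hat(i), and the residual x(i) - x_hat(i) is kept
    as the intrinsic part. The result is normalized to zero mean, unit std.
    """
    x, z = np.asarray(x, float), np.asarray(z, float)
    n = len(x)
    resid = np.empty(n)
    for i in range(n):
        lo, hi = max(0, i - s), min(n, i + s + 1)
        d = np.abs(np.arange(lo, hi) - i) / s
        sw = 1.0 - d**2                 # sqrt of the bi-square weight (1 - d^2)^2
        A = np.column_stack([np.ones(hi - lo), z[lo:hi]])
        beta, *_ = np.linalg.lstsq(A * sw[:, None], x[lo:hi] * sw, rcond=None)
        resid[i] = x[i] - (beta[0] + beta[1] * z[i])
    return (resid - resid.mean()) / resid.std()

def coarse_grain(u, tau, k):
    """Step 2 (sketch): k-th coarse-grained series (k = 0..tau-1) at scale tau."""
    u = np.asarray(u, float)
    n = (len(u) - k) // tau
    return u[k:k + n * tau].reshape(n, tau).mean(axis=1)
```

Running `remove_external_factor` on both x and y with the same z yields the normalized intrinsic pair (u, v) on which the coarse-graining and entropy steps operate.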
Figure 1

Schematic illustration of the coarse-grained procedure of composite multiscale partial cross-sample entropy (CMPCSE). Modified from Reference [24].

Figure 2

Schematic illustration of the coarse-grained procedure of CMPCSE at a larger scale factor. Modified from Reference [24].

Step 3: Construct template vector sequences of length m from u^(τ,k) and v^(τ,k):

U_i^m = (u_i, u_{i+1}, ..., u_{i+m-1}),    V_i^m = (v_i, v_{i+1}, ..., v_{i+m-1}).

Let n_k^m denote the total number of m-dimensional matched vector pairs, that is, pairs (U_i^m, V_j^m) whose Chebyshev distance is within the tolerance r, obtained from the two k-th coarse-grained time series at scale factor τ. Similarly, n_k^{m+1} is the total number of matches of length m + 1. Finally, the CMPCSE is calculated with the equation

CMPCSE(x, y : z, τ, m, r) = (1/τ*) Σ_k [ -ln( n_k^{m+1} / n_k^m ) ],

where the sum runs over those k for which neither n_k^m nor n_k^{m+1} is zero, so that the logarithm makes sense, and τ* is the number of such k at scale factor τ. A more intuitive illustration of the CMPCSE procedure is shown in Figure 3.
Figure 3

Flow chart of the CMPCSE algorithm.

In this paper, the entropies are calculated from scale 1 to 20, that is, τ = 1, 2, ..., 20. The cross-sample entropy of each pair of coarse-grained series is calculated with embedding dimension m and tolerance r, where r is selected from a candidate set according to the criterion proposed by Lake et al. [16].
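Step 3 and the composite average over the τ coarse-grained pairs can be sketched as follows. This is a standalone illustration: the series u and v are assumed to be the already normalized intrinsic series from Step 1, the coarse-graining is inlined, and the default m and r here are hypothetical placeholders rather than the values selected by the Lake criterion.

```python
import numpy as np

def cross_sampen_counts(u, v, m, r):
    """Numbers of template matches of length m and m+1 between u and v.

    A pair of templates matches when their Chebyshev (maximum) distance
    is within the tolerance r; cross-SampEn is then -ln(n_m1 / n_m).
    """
    n = len(u)
    Um = np.array([u[i:i + m] for i in range(n - m)])
    Vm = np.array([v[i:i + m] for i in range(n - m)])
    U1 = np.array([u[i:i + m + 1] for i in range(n - m)])
    V1 = np.array([v[i:i + m + 1] for i in range(n - m)])
    n_m = (np.abs(Um[:, None, :] - Vm[None, :, :]).max(axis=2) <= r).sum()
    n_m1 = (np.abs(U1[:, None, :] - V1[None, :, :]).max(axis=2) <= r).sum()
    return n_m, n_m1

def cmpcse_at_scale(u, v, tau, m=2, r=0.25):
    """Average cross-SampEn over the tau coarse-grained pairs at scale tau,
    keeping only the k for which the entropy is defined (n_m, n_m1 > 0)."""
    vals = []
    for k in range(tau):
        nk = (len(u) - k) // tau
        uk = u[k:k + nk * tau].reshape(nk, tau).mean(axis=1)
        vk = v[k:k + nk * tau].reshape(nk, tau).mean(axis=1)
        n_m, n_m1 = cross_sampen_counts(uk, vk, m, r)
        if n_m > 0 and n_m1 > 0:
            vals.append(-np.log(n_m1 / n_m))
    return float(np.mean(vals)) if vals else float("nan")
```

Averaging only over the defined k is exactly what makes the composite variant more robust than plain MCSE for short series: a single undefined coarse-grained pair no longer invalidates the whole scale.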

3. Numerical Experiments for Artificial Time Series

In this section, we use an additive model of x and y, as in Equation (9), to perform numerical simulations and verify the effectiveness of CMPCSE. In the following simulations, the series are generated from bivariate fractional Brownian motions (BFBMs), the two-component ARFIMA process, and multifractal binomial measures, respectively, and all the third-party interference series are pink (1/f) noise generated by the DSP System Toolbox in MATLAB 2016. In the experiments, all results for sequences with random terms are averages over 100 repeated realizations of the same series length.
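As a stand-in for the MATLAB generator used in the paper, pink (1/f) noise can be sketched in Python by spectral shaping. The FFT-based construction, the coupling strengths a and b, and the seeds below are all our own illustrative assumptions.

```python
import numpy as np

def pink_noise(n, seed=None):
    """Approximate pink (1/f) noise: scale white Gaussian noise by 1/sqrt(f)
    in the frequency domain, then normalize to zero mean and unit std."""
    rng = np.random.default_rng(seed)
    spec = np.fft.rfft(rng.standard_normal(n))
    f = np.fft.rfftfreq(n)
    f[0] = f[1]                          # avoid dividing by zero at DC
    pink = np.fft.irfft(spec / np.sqrt(f), n)
    return (pink - pink.mean()) / pink.std()

# Additive model in the spirit of Equation (9): the observed pair (x, y)
# is an intrinsic pair plus a common pink-noise term z (couplings a, b
# are hypothetical values chosen only for illustration).
a, b = 0.5, 0.5
z = pink_noise(5000, seed=42)
x = np.random.default_rng(0).standard_normal(5000) + a * z
y = np.random.default_rng(1).standard_normal(5000) + b * z
```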

3.1. Bivariate Fractional Brownian Motion (BFBMs)

In this subsection, in order to test the performance of CMPCSE, we first use it to calculate the partial cross-sample entropy of BFBMs in two instances of the above additive model (Equation (9)). The intrinsic series x' and y' are the incremental series of the two components of BFBMs with Hurst indices H1 and H2. Extensive research on BFBMs has been carried out; it is known that a BFBM is a monofractal process and that the cross-correlation exponent satisfies H_xy = (H1 + H2)/2 [41,42,43]. Wei et al. studied the long-range power-law cross-correlations between the two components in 2017 [40]. In the simulations, we consider two parameter settings, shown in the left and right panels of Figure 4, where ρ denotes the cross-correlation coefficient between the two components. We apply the CMPCSE method to the series simulated from BFBMs and pink noise. From Figure 4 we can see that the entropy values of the intrinsic pair (x', y') and of the partial pair (x, y : z) are very close at all time scales, whereas there is an obvious discrepancy between the uncorrected pair (x, y) and the intrinsic values except when the time scale equals 1. This indicates that, when x and y are simultaneously affected by the third-party factor z, the CMPCSE method can capture the intrinsic cross-sample entropy by eliminating the influence of z.
Figure 4

The CMPCSE results between the series simulated by pink noise and bivariate fractional Brownian motions (BFBMs), for two parameter settings (left and right panels).

3.2. Two-Component ARFIMA Process

The ARFIMA process is a monofractal process [40] often used to model power-law auto-correlations in stochastic variables [44]. It is defined as

X(t) = Σ_{n=1}^{∞} a_n(d) X(t-n) + ε(t),

where d ∈ (0, 0.5) is a memory parameter, ε(t) is an independent and identically distributed Gaussian variable, and a_n(d) = d Γ(n-d) / [Γ(1-d) Γ(n+1)] are the weights. The Hurst index is related to the memory parameter by H = 0.5 + d [45,46]. The two-component ARFIMA process is defined as follows [47]:

X(t) = Σ_{n=1}^{∞} a_n(d1) [W X(t-n) + (1-W) Y(t-n)] + ε_X(t),
Y(t) = Σ_{n=1}^{∞} a_n(d2) [(1-W) X(t-n) + W Y(t-n)] + ε_Y(t),

where W ∈ [0.5, 1] quantifies the coupling strength between the two processes X and Y. When W = 1, X and Y are fully decoupled and become two separate ARFIMA processes as defined in Equation (11); the cross-correlation between X and Y increases as W decreases from 1 to 0.5 [47]. In our calculations, we choose a fixed coupling strength W and memory parameters d1 and d2 for the two components, and the two error terms ε_X and ε_Y share one independent and identically distributed Gaussian variable with zero mean and unit variance. The CMPCSE method was applied to the series simulated by the two-component ARFIMA process and pink noise. Figure 5 also shows that the entropy values of the intrinsic pair and of the partial pair are very close at all time scales, whereas there is an obvious discrepancy between the uncorrected and intrinsic values except when the time scale equals 1. This again means that, when x and y are simultaneously affected by the third-party factor z, one can use CMPCSE to recover the intrinsic cross-sample entropy.
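The two-component ARFIMA recursion can be sketched as follows. This is an illustrative sketch: the infinite sum is truncated at nmax lags, and the parameter values in the test below are our own choices, not the paper's.

```python
import numpy as np
from math import lgamma, exp

def arfima_weights(d, nmax):
    """a_n(d) = d * Gamma(n - d) / (Gamma(1 - d) * Gamma(n + 1)), n = 1..nmax."""
    return np.array([d * exp(lgamma(n - d) - lgamma(1 - d) - lgamma(n + 1))
                     for n in range(1, nmax + 1)])

def two_component_arfima(N, d1, d2, W, nmax=100, seed=None):
    """Coupled ARFIMA pair; W = 1 gives two fully decoupled processes.
    Both equations share one i.i.d. Gaussian error term, as in the text."""
    rng = np.random.default_rng(seed)
    a1, a2 = arfima_weights(d1, nmax), arfima_weights(d2, nmax)
    eps = rng.standard_normal(N)
    x, y = np.zeros(N), np.zeros(N)
    for t in range(N):
        k = min(t, nmax)
        xp, yp = x[t - k:t][::-1], y[t - k:t][::-1]   # lags 1..k
        x[t] = a1[:k] @ (W * xp + (1 - W) * yp) + eps[t]
        y[t] = a2[:k] @ ((1 - W) * xp + W * yp) + eps[t]
    return x, y
```

The log-gamma form of the weights avoids overflow of Γ(n + 1) at large lags.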
Figure 5

The CMPCSE results between the series simulated by pink noise and the two-component ARFIMA process, for two parameter settings (left and right panels).

3.3. Multifractal Binomial Measures

In this subsection, the series to be tested come from binomial measures generated by a multiplicative cascade model with known analytic multifractal properties [40]. We combine them with pink noise to test the performance of CMPCSE. Each binomial measure (multifractal signal) can be generated by iteration. We start with iteration k = 0, where the data set consists of one value, μ^(0) = {1}. In the k-th iteration, the data set μ^(k) of length 2^k is obtained by splitting each value of μ^(k-1) into two: μ^(k)(2i-1) = p μ^(k-1)(i) and μ^(k)(2i) = (1-p) μ^(k-1)(i), with 0 < p < 1. As k → ∞, μ^(k) approaches a binomial measure, and the scaling exponent function is

τ(q) = -log2[ p^q + (1-p)^q ].

In our simulation, we iterated 12 times with three different values of p and obtained 3 binomial measures. In the actual calculation, the tested series are the first-order differences (diff) of the measures. We present the CMCSE results of the series and the CMPCSE results in Figure 6 for two of the parameter settings. From the two panels of Figure 6, we can easily see that the entropy values of the intrinsic pair and of the partial pair are very close at all time scales, whereas there is an obvious discrepancy with the uncorrected values. This again indicates that, when x and y are simultaneously affected by the third-party factor z, one can use the CMPCSE method to obtain the intrinsic cross-sample entropy by eliminating the influence of z.
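The iterative construction can be sketched as follows; the value of p in the example is illustrative, since the paper's choices are elided in this extract.

```python
import numpy as np

def binomial_measure(p, k):
    """Binomial multifractal measure after k iterations (length 2**k).

    Start from [1.0]; each iteration splits every value u into the
    adjacent pair (p*u, (1-p)*u), so the total mass stays equal to 1.
    """
    mu = np.array([1.0])
    for _ in range(k):
        nxt = np.empty(2 * len(mu))
        nxt[0::2] = p * mu           # left child gets fraction p
        nxt[1::2] = (1 - p) * mu     # right child gets fraction 1 - p
        mu = nxt
    return mu

def tau_q(q, p):
    """Scaling exponent function of the limiting binomial measure."""
    return -np.log2(p**q + (1 - p)**q)

# e.g. 12 iterations, then the first-order difference as in the text
series = np.diff(binomial_measure(0.3, 12))
```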
Figure 6

The CMPCSE results between the series simulated by pink noise and the first-order difference series of the binomial measures, for two parameter settings (left and right panels).

4. Application to Stock Market Index

In order to validate the applicability of the CMPCSE method to empirical time series, we apply it to stock market indices. The analyzed data sets consist of three Chinese stock indices: the Shanghai securities composite index (SSEC), the Shenzhen Stock Exchange Component Index (SZSE) and the Hang Seng Index (HSI). All the raw data were downloaded from https://finance.yahoo.com/, and the daily closing data for the indices from 26 December 1999 to 17 July 2020 were used. Due to the different opening dates in the mainland and Hong Kong, we exclude the data recorded on different dates and reconnect the remaining parts of the original series to obtain time series of the same length; the final daily closing series have length 5000. In practice, we usually apply normalized time series. Denoting the closing index on the t-th day as P(t), the daily index return is defined by g(t) = ln P(t) - ln P(t-1), and the normalized daily return is defined as G(t) = (g(t) - μ_g)/σ_g, where μ_g and σ_g are the mean value and standard deviation of the series g(t), respectively. In 2015, Shi and Shang studied the multiscale cross-correlation coefficient and multiscale cross-sample entropy between SSEC, SZSE and HSI [48]. Their results show that there is a strong correlation between the return data of SSEC and SZSE, and that both have only a weak correlation with HSI. The results of our estimation and comparison of the cross-sample entropy of the two return series SSEC and SZSE, for the two cases of including and excluding the influence of the HSI index, are shown in Figure 7. From the entropy results for the return data in Figure 7, one can easily see that the entropy values of SSEC-SZSE are always larger than those of SSEC-SZSE:HSI at all scales. This means that if the CMCSE values of SSEC-SZSE are used to estimate the degree of similarity between SSEC and SZSE, the similarity between them will be underestimated.
That is to say, the partial cross-sample entropy SSEC-SZSE:HSI delivers a more reasonable and realistic measure of synchronization between the two return series of SSEC and SZSE. We believe this result is reasonable: SSEC and SZSE are the two most important stock indices in mainland China, so their daily return data should show strong synchronicity, especially at large time scales.
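The return preprocessing described above is a short helper in Python (assuming, as is conventional and as the blanked formula suggests, logarithmic daily returns):

```python
import numpy as np

def normalized_returns(close):
    """g(t) = ln P(t) - ln P(t-1), normalized to zero mean and unit std."""
    close = np.asarray(close, dtype=float)
    g = np.diff(np.log(close))          # daily log-return series
    return (g - g.mean()) / g.std()
```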
Figure 7

Estimation and comparison of the cross-sample entropy between the two return time series of the Shanghai securities composite index (SSEC) and the Shenzhen Stock Exchange Component Index (SZSE), when including and excluding the influence of the Hang Seng Index (HSI).

5. Discussion and Conclusions

In this paper, we proposed CMPCSE for quantifying the intrinsic similarity of two time series affected by common external factors. First, we described the calculation process of CMPCSE in detail. Then, in order to test its validity, we applied it to three sets of artificial data, constructed by linear superposition of BFBMs, the two-component ARFIMA process, and multifractal binomial measures with pink (1/f) noise, respectively. The results for each set of artificial data show that CMPCSE can accurately measure the intrinsic cross-sample entropy of two simultaneously recorded time series by removing the effects that come from the pink noise. Finally, CMPCSE was employed to investigate the partial cross-sample entropy of SSEC and SZSE by eliminating the effect of HSI. Compared with the conclusion from CMCSE, the results from CMPCSE show that SSEC and SZSE have stronger similarity. Because SSEC and SZSE are the two most important stock indices in mainland China, they should have strong consistency, especially at large time scales, so we think this result is reasonable, and it is necessary to consider partial cross-sample entropy when one wants to measure the similarity of SZSE and SSEC. On the other hand, we must also note that the first step of the calculation is crucial to the result of CMPCSE. There may be other ways to eliminate the influence of the third party on the two time series under study; in our work, we adopted the idea from Reference [40] and obtained satisfactory results in our artificial-data examples. At the same time, we noticed during our research that when CMPCSE is used to study linear combinations, constructed in the way described above, of the NBVP time series mentioned in Reference [26] and pink noise, we cannot obtain satisfactory results.
Therefore, we think that the way of eliminating third-party influence used in this paper cannot achieve good results for sequences with violent oscillations, and we expect better methods for dealing with such time series. All in all, we think partial cross-sample entropy analysis is necessary when one wants to measure the similarity of two time series affected by common external factors, and, at present, CMPCSE is a good choice.
References (18 in total)

1.  Approximate entropy as a measure of system complexity.

Authors:  S M Pincus
Journal:  Proc Natl Acad Sci U S A       Date:  1991-03-15       Impact factor: 11.205

2.  Sample entropy analysis of neonatal heart rate variability.

Authors:  Douglas E Lake; Joshua S Richman; M Pamela Griffin; J Randall Moorman
Journal:  Am J Physiol Regul Integr Comp Physiol       Date:  2002-09       Impact factor: 3.619

3.  Approximate entropy (ApEn) as a complexity measure.

Authors:  Steve Pincus
Journal:  Chaos       Date:  1995-03       Impact factor: 3.642

4.  Multivariate multiscale entropy for brain consciousness analysis.

Authors:  Mosabber Uddin Ahmed; Ling Li; Jianting Cao; Danilo P Mandic
Journal:  Conf Proc IEEE Eng Med Biol Soc       Date:  2011

5.  Fractionally integrated process with power-law correlations in variables and magnitudes.

Authors:  Boris Podobnik; Plamen Ch Ivanov; Katica Biljakovic; Davor Horvatic; H Eugene Stanley; Ivo Grosse
Journal:  Phys Rev E Stat Nonlin Soft Matter Phys       Date:  2005-08-18

6.  Detrended cross-correlation analysis: a new method for analyzing two nonstationary time series.

Authors:  Boris Podobnik; H Eugene Stanley
Journal:  Phys Rev Lett       Date:  2008-02-27       Impact factor: 9.161

7.  Approximate entropy: a regularity measure for fetal heart rate analysis.

Authors:  S M Pincus; R R Viscarello
Journal:  Obstet Gynecol       Date:  1992-02       Impact factor: 7.661

8.  Multifractal temporally weighted detrended cross-correlation analysis to quantify power-law cross-correlation and its application to stock markets.

Authors:  Yun-Lan Wei; Zu-Guo Yu; Hai-Long Zou; Vo Anh
Journal:  Chaos       Date:  2017-06       Impact factor: 3.642

9.  Dominating clasp of the financial sector revealed by partial correlation analysis of the stock market.

Authors:  Dror Y Kenett; Michele Tumminello; Asaf Madi; Gitit Gur-Gershgoren; Rosario N Mantegna; Eshel Ben-Jacob
Journal:  PLoS One       Date:  2010-12-20       Impact factor: 3.240

10.  Detrended partial-cross-correlation analysis: a new method for analyzing correlations in complex system.

Authors:  Naiming Yuan; Zuntao Fu; Huan Zhang; Lin Piao; Elena Xoplaki; Juerg Luterbacher
Journal:  Sci Rep       Date:  2015-01-30       Impact factor: 4.379
