Literature DB >> 33265527

Complexity Analysis of Global Temperature Time Series.

António M Lopes1, J A Tenreiro Machado2.   

Abstract

Climate has complex dynamics due to the plethora of phenomena underlying its evolution. These characteristics pose challenges to conducting solid quantitative analysis and reaching assertive conclusions. In this paper, the global temperature time series (TTS) is viewed as a manifestation of the climate evolution, and its complexity is calculated by means of four different indices, namely the Lempel-Ziv complexity, sample entropy, signal harmonics power ratio, and fractal dimension. In the first phase, the monthly mean TTS is pre-processed by means of empirical mode decomposition, and the TTS trend is calculated. In the second phase, the complexity of the detrended signals is estimated. The four indices capture distinct features of the TTS dynamics in a 4-dim space. Hierarchical clustering is adopted for dimensional reduction and visualization in the 2-dim space. The results show that TTS complexity exhibits space-time variability, suggesting the presence of distinct climate forcing processes in both dimensions. Numerical examples with real-world data demonstrate the effectiveness of the approach.

Entities:  

Keywords:  Lempel–Ziv complexity; complexity; sample entropy; temperature time series

Year:  2018        PMID: 33265527      PMCID: PMC7512956          DOI: 10.3390/e20060437

Source DB:  PubMed          Journal:  Entropy (Basel)        ISSN: 1099-4300            Impact factor:   2.524


1. Introduction

Understanding climatic variability is extremely important nowadays. The Earth’s climate is changing and we are facing an increasing number of extreme weather events, like floods, droughts and anomalous temperatures. Since the beginning of the 20th century the Earth’s surface average temperature has increased by nearly 0.8 °C [1,2,3], and several projections based on climate models estimate that during this century, the average temperature may increase by between 1.1 °C and 6.4 °C [1]. It has been verified that the global surface temperature has risen at an average rate of about 0.17 °C per decade since 1970 [1]. Moreover, the annual mean Arctic sea-ice extent decreased at a rate of about 3.5% to 4.1% per decade over the period from 1979 to 2012 [4]. The Earth is warming and it seems that human activity is one of the main causes for this phenomenon [5,6]. Estimates from the Intergovernmental Panel on Climate Change (IPCC) revealed a considerable increase of anthropogenic greenhouse gas emissions since the pre-industrial era [1]. In the period from 1970–2010, the annual growth rate of emissions was above 1.3% [7]. Some impacts of global warming, such as record high temperatures, melting glaciers and severe flooding are becoming increasingly common around the world [8,9,10,11,12]. Besides the direct effect on temperature, warming leads to the modification of wind patterns, the development of humidity, and significant changes in the rates of precipitation. These phenomena are under intensive research due to their major impacts on the social, economic and health aspects of human life [13,14,15,16]. The analysis of temperature time series (TTS) is an important topic that can help us understanding better climate dynamics. Several indicators of climate change can be used, but the time evolution of the Earth’s surface temperature is probably the more straightforward measure to follow [17]. Furthermore, reliable temperature data records comprising information from meteorological stations around the world are available to support data processing [18,19]. We can find in the literature several strategies for analyzing the TTS [20]. Grieser et al. [21] studied the monthly mean temperatures of European meteorological stations. They applied statistical tools to decompose the TTS into significant components and found that the phase of the annual cycle shifts within the year, backwards and forwards in Western and Eastern Europe, respectively. The authors observed that the occurrence of extreme events increased significantly and corresponded mostly to cold peaks in winter. They also found large harmonic components with a period of 92.3 months in data from Northern and Western Europe. Additionally, this period, corresponding to about 7.7 years, was also observed in the North Atlantic Oscillation. Hughes et al. [22] pointed out that the Antarctic Peninsula region is warming faster than the rest of the world. They analyzed the minimum and maximum TTS at the Faraday/Vernadsky (Latitude 65.25 S, Longitude 64.25 W) station using a multiple regression model with non-Gaussian correlated errors. It was calculated that there was an increase in the minimum monthly temperature of 6.78 °C during the years 1951–2003. Viola et al. [23] adopted nonlinear processing tools to analyze TTS. The method of delay coordinates was applied for state space reconstruction and the delay parameters were determined by means of two methods, namely the average mutual information and the false nearest neighbors. Temperatures were extrapolated until 2028 by means of simple nonlinear prediction, which, after the state space reconstruction, uses a limited number of neighboring samples to perform predictions. Founda et al. [24] employed linear regression fitting to analyze the air TTS in Athens from 1897 until 2001 and observed a tendency towards warmer years. Ge et al. [25] studied the temperature variation in China over 20 centuries, revealing not only the cold periods from 1620–1710 and 1800–1860, but also the warming registered during the 20th century. Deser et al. [26] examined the trends of the sea surface and marine air temperatures and observed warming in all cases, with the exception of the Northwestern Atlantic Ocean. The largest warming trends were found in the middle latitudes of both the Northern and Southern hemispheres. Oñate and Pou [27] processed TTS from eleven meteorological stations on the Iberian Peninsula. Trends were identified by means of the Mann–Kendall test, and multidimensional scaling provided automatic clustering. More recently, Stephenson and Doblas-Reyes [28] used multidimensional scaling as an exploratory tool to describe ensembles of forecasts. Multidimensional scaling is a computational technique for exploring similarities between groups of data and easing the visualization of hidden patterns [29]. Lopes and Machado [5] adopted multidimensional scaling to study complex correlations between global TTS. Earth surface temperatures exhibit slow space-time system dynamics that embed long memory effects and complex relationships. Climate data can be analyzed in a straightforward manner by calculating the means and variances for different levels of spacial and temporal aggregation, but these methods neglect long range dynamic phenomena. Furthermore, while many studies have addressed the problem of global warming, the aspect of the increasing complexity of the data has been somewhat overlooked. In fact, the complexity of the TTS is embedded into the measurements, but needs to be tackled independently of the problem of increasing temperatures. The complexity point of view may lead to a deeper insight into the predictability of future TTS evolution. In this paper, global TTS are viewed as manifestations of a dynamic system. For each TTS, the trend is calculated by means of empirical mode decomposition (EMD), and the complexity of the detrended series is assessed by means of four indices, namely the Lempel–Ziv complexity (), the sample entropy (), the signal harmonics power ratio (), and the fractal dimension (). The four indices capture distinct features of the TTS in a multidimensional space that is projected in 2D through hierarchical clustering. The space-time variability of the complexity measures suggests that the inputs of climate vary in both dimensions. In particular, the increasing complexity of the TTS in recent years shows that temperatures are becoming more difficult to predict. Numerical examples using real-world data for one century as well as the four complexity indices, illustrate the effectiveness of this new point of view for tackling TTS. With these ideas in mind, the paper is organized as follows. Section 2 presents the main fundamental issues that influence global warming. Section 3 overviews the mathematical tools used for processing data. Section 4 and Section 5 analyze the global TTS complexity by means of different indices. Finally, Section 6 outlines the conclusions.

2. Mathematical Fundamentals

This section introduces the main mathematical tools adopted for data processing, namely the , , Fourier series (FS), EMD, and . These tools are well suited to the time series generated by many natural phenomena, such as climatic, geophysical and biological processes.

2.1. Lempel–Ziv Complexity

The is a method of symbolic sequence analysis that measures the complexity of finite time series [30,31,32]. The is based on computing the number of distinct subsequences and their recurrence rates along the series of data [33,34,35,36]. Let us consider the finite time series, , where represents the discrete sampling instants. In the first phase, is converted into a symbolic binary sequence, , such that where, denotes a pre-defined threshold value. Usually the median of is adopted for , since it performs robustly against outliers [37]. In a second phase, we parse the original sequence, S, from left to right, to obtain a new (parsed) sequence, , formed by distinct subsequences or words. To obtain , different parsing methods are available [38,39,40]. Herein we summarize the original scheme proposed by Lempel and Ziv [30,41]. Let denote a subsequence of S composed of the symbols between positions p and q (in our case, time instants). Thus, we have , for , and (i.e., the empty set), for . The parsed sequence, , is obtained as follows: Initialize: The parsed sequence, ; An auxiliary sequence, ; A pointer, , such that, points to the symbol, , of S; . At the iteration p: Advance from to and extend the auxiliary sequence, Q, by appending the symbol ; Check whether the current auxiliary sequence Q matches any subsequence of . If no match is found, then append, as a new word, the auxiliary sequence, Q, to the parsed sequence, , and reset to ; Increment p; If , then go to step 2; If , then append Q to to yield the final parsed sequence; Count the number of different words, , in . To obtain a complexity measure independent of the length of S, the count value is normalized, yielding [30] which reflects the rate of occurrence of new patterns in S.

2.2. Sample Entropy

The is a modified version of the “approximate entropy” [42], used to measure complexity. It is defined as the symmetrical version of the logarithm of the conditional probability of two sequences that are similar for m points, remain identical for points. The value of has little variation across the length of a time series, yielding a characteristic measure of its complexity [43]. The can be calculated by the five steps presented below [44,45,46,47,48]: Generate a set of m-dimensional vectors, , representing m consecutive values of , starting at point p; Determine the similarity, , between and (i.e., the vector representing m consecutive values of , starting at point q), by calculating where is a distance, and r is a tolerance value; Compute the correlation sum, , under the constraint , to exclude self-matches, Find the probability, , of template matching for all vectors by Calculate the as The values of the parameters m and r are important in determining the , but no exact guidelines exist for optimizing their values. We know that the accuracy and confidence of the estimates increase when a small m (short template) and large r (wide tolerance) [42,46] are chosen. Moreover, when r is too small, poor conditional probability estimates are obtained, while when r is too large, detailed information is lost and tends to be 0. To avoid problems caused by noise r must be chosen carefully [42,46]. The generally-adopted parameter values are or , and , where denotes the standard deviation of the original time series, , and [46].

2.3. Fourier Analysis

The classical FS was proposed by Joseph Fourier (1768–1830) when studying problems in the area of heat conduction. The FS expresses a given signal by means of an orthonormal basis of trigonometric functions. Let us assume that , , is a periodic function with period T, , and , where f and denote the frequency and angular frequency, respectively. The signal can be expressed by means of FS expansion [49,50]: where , and the Fourier coefficients, , are given by According to Parseval’s theorem, the average power of is given by

2.4. Empirical Mode Decomposition

The EMD is a powerful signal processing tool that is able to deal with non-stationary and non-linear time series [51,52,53]. The basic idea is that the original signal is decomposed into a finite set of simpler signals called intrinsic mode functions (IMF). The algorithm is as follows [51]: Identify all extrema of ; Interpolate between minima and maxima, and find the lower and upper envelopes, and , respectively; Calculate the mean using ; Extract the details using ; Iterate on the residual, . For real-world data, this procedure iterates steps 1–4 until the detail signal, , is an intrinsic mode function (IMF), meaning that it has a zero-mean according to some criterion. It should be noted that the complete process guarantees that the number of extrema decreases between two successive residuals and that the decomposition is achieved with a finite number of modes. The original signal can be represented by where represent the modes and denotes the residual.

2.5. Fractal Dimension

A fractal is a geometrical object that displays identical patterns at different scales. The is a measure of how much the fractal fills the space as it is magnified from larger to smaller scales [54,55,56]. The box-counting method is often adopted for estimating the , due to its straightforward numerical implementation. For a fractal represented by a binary image, with non-zero pixels belonging to the fractal, and zero pixels forming the image background, the box counting algorithm is as follows [56]: Pad the image with background pixels so that its dimensions are at a power of 2; Cover the fractal object with a grid of squares with size (in the first iteration, there is just one square of equal size to the size of the image); Count the number of boxes (i.e., squares), , needed to cover the object; If , then make and repeat step 2. Estimate the as the slope of the log-log plot, versus , calculated by means of the least squares method. The estimate of obtained by the algorithm is a good approximation to

3. Dataset

The University of Delaware “Terrestrial Air Temperature: 1900–2014 Gridded Monthly Time Series (V 4.01)” was used. The dataset is publicly available at the website http://climate.geog.udel.edu/~climate/html_pages/download.html#T2014, and was retrieved on 1 March 2018. The archive consists of monthly mean air TTS for the period from January 1900 up to December 2014. The spatial coverage corresponds to land areas, with one TTS per cell, on a latitude × longitude grid of resolution. The dataset was compiled by means of interpolation techniques on data gathered from several sources, including the Global Historical Climatology Network Monthly Version 3 (GHCN3) dataset [18]; the Daily Global Historical Climatology Network (GHCN-Daily) archive [57]; the Atmospheric Environment Service/Environment Canada archive; records from the State Hydrometeorological Institute of St. Petersburg, Russia; data for Greenland provided by the GC-Net [58]; records from the Automatic Weather Station Project (Charles R. Stearns, University of Wisconsin—Madison); the Global Synoptic Climatology Network archive (Dataset 9290c, of the National Climatic Data Center); and observations contained within the Global Surface Summary of Day (GSOD) [59]. Herein, we illustrate the usability of the dataset, while highlighting some aspects of the global warming revealed by the temperature anomalies [13]. Given a reference period, , from year up to year , the temperature anomaly for the period is where represents the average temperature of year n. Temperature anomalies represent deviations from a reference value, or long-term average [13]. Positive (negative) anomalies mean that the observed temperatures are higher (lower) than the reference value. Figure 1 depicts the annual land temperature anomalies for the years , , and , relative to the period . The results show the Earth’s warming clearly, with emphasis on certain regions, namely in the Northern Hemisphere.
Figure 1

Worldwide land temperature anomalies relative to for the 5-year periods: (a) ; (b) ; (c) ; (d) .

4. On the Complexity of TTS

The TTS, , , exhibits a trend. Here, k represents time with a 1-month resolution, and and correspond to the initial and final instants—January 1900 and December 2014, respectively. The EMD proved to be a good method for determining the trends of data from nonstationary and nonlinear processes, without the need to know the trends’ functional forms [53]. Within this approach, the TTS trend was estimated by the residuals of the EMD. Figure 2 depicts the original data, , the IMF modes, , , and the residuals, , obtained for a TTS of a station with latitude/longitude coordinates , following the procedure reported in [60]. Figure 3 illustrates the original data, , and the trend, .
Figure 2

The intrinsic mode functions (IMF), , , and the residuals, , obtained for a temperature time series (TTS) of a station with latitude/longitude coordinates .

Figure 3

Monthly mean TTS, , and trend approximation, , for the TTS at the latitude/longitude coordinates .

4.1. Lempel–Ziv Complexity of the TTS

Before determining the , we first detrended the original TTS by calculating , where was determined by EMD. Figure 4 exemplifies the detrending pre-processing ofn the TTS with latitude/longitude coordinates .
Figure 4

The detrending pre-processing of the TTS with latitude/longitude coordinates by means of EMD.

The algorithm shown in Section 2.1 was then applied to each on the regular grid covering land areas, totaling TTS. Figure 5 represents the on a contour map. Using to the index, we verified that the monthly mean TTS values in the Northern Hemisphere and in certain regions of South America and Australia are more complex than those in other regions of the Earth. This spatial variability of complexity suggests that climate dynamics are driven by distinct regional forcing processes.
Figure 5

Contour map depicting for the total TTS of the grid on land.

4.2. Sample Entropy of the TTS

To calculate the , we applied the algorithm from Section 2.2 to the detrended TTS, , with and , as discussed in Section 2.2. Figure 6 represents the for each TTS of the grid on land.
Figure 6

Contour map depicting for the total TTS of the grid on land.

To compare the results obtained with and we calculated the two-dimensional correlation, R, as where denotes the mean of all values in a matrix. The correlation value obtained, , quantified the similarities observed between Figure 5 and Figure 6, showing that both indices yield similar spacial patterns. The results thus suggest the existence of distinct regional climate forcing processes.

4.3. Harmonic Content of the TTS

In this sub-section, we propose an alternative index for measuring the complexity of TTS. The index, , is given by the harmonics power ratio of the signals: where when months. Following Parseval’s theorem (10), and denote the signal power captured by the DC and first harmonic components, and the total power of , respectively. Therefore, measures the signal power contained on the higher harmonics of . Figure 7 represents the for the TTS of the grid on land. We verified that tropical and equatorial regions are characterized by larger values of .
Figure 7

Contour map depicting for the total TTS of the grid on land.

4.4. Fractal Dimension of the TTS

Here we calculate the for the phase portrait of the TTS. A n-dimensional dynamical system can be represented by a set of first-order differential equations governing n state variables. By knowing the state variables at time and the system inputs for , the system behavior for can be determined. The state space consists of the set of all possible states, each one corresponding to a unique point. As k evolves, sequences of points describing trajectories in the state space are obtained. The set of trajectories is known as the state space portrait. The phase portrait (PP) corresponds to the two-dimensional state space representation [61,62]. We consider the detrended TTS, , and its numerical time derivative, , to be the state variables. Figure 8 represents the PP of the detrended TTS with latitude/longitude coordinates .
Figure 8

The phase portrait (PP) of the detrended TTS with latitude/longitude coordinates .

Figure 9 represents the for all TTS of the grid on land.
Figure 9

Contour map depicting for the total TTS of the grid on land.

5. Temporal Dynamics of Global Warming

In this section we describe the study of the temporal complexity of global warming. In the first phase, TTS measured on land (within a total of ) was divided into time intervals, , , by means of a 30-year length sliding window, with a 28-year overlap. For each time interval, , we calculated the L-dimensional vectors, , , and , representing the complexity of the detrended TTS at the time intervals , captured in the perspective of each of the four indices. Experiments demonstrated that the window adopted established a good compromise between time discrimination and statistical significance of the data within each time interval. In a second phase, for each indice and we determined the relative frequency histograms. Figure 10 depicts the histograms for the four indices versus on contour maps, where the years correspond to the center of the time windows. The and maps exhibited two modes that increased with almost monotonically. Regarding the index, higher modes were mainly due to the TTS in the regions of Antarctica, South Africa, East Australia, North Africa, the Middle East and some setentrional areas of America and Asia. On the other hand, lower modes were obtained from TTS located in the tropical regions of Earth. In the , the higher and lower modes were due to TTS from equatorial and tropical regions, and to the TTS from North Australia, North Africa, East Asia and Northwest America. The and indices had single mode. For the , most TTS were from North Africa, South Asia, South America, Australia and Antarctica. For the , the main contribution was due to TTS from tropical regions.
Figure 10

Contour maps of the histograms of the complexity indices versus , : (a) ; (b) ; (c) ; (d) . Each year corresponds to the center of a time interval.

In the follow-up, to unveil temporal patterns, we adopted a hierarchical clustering (HC) analysis and visualization trees. The main objective of HC is to group together objects that are similar to one another in some sense [63,64]. The input of HC is a symmetric matrix , , where represents the Canberra distance between the time periods, [29,65]: In the HC algorithm, the successive (agglomerative) clustering and average-linkage methods were adopted. Figure 11 depicts the corresponding graph generated by the software, PHYLIP (http://evolution.genetics.washington.edu/phylip.html). We identified three clusters of TTS complexity patterns, closely coincident to time periods characterized by different temperature trends [66], namely the warming period, , the cooling trend in the middle of the 20th century, , and the new warming period, . Moreover, in , the HC unveiled two groups, namely and . In the clusters , and , we noted the emergence of some smaller clusters. In , we found subclusters composed of the years and . In , we verified the appearance of the groups and . Finally, in , there were and . These subclusters include between 4 and 6 time windows, suggesting some kind of periodicity. However, to the authors’ best knowledge, there have been no studies addressing these fluctuations with the standard perspective of the rise in temperature. Therefore, we question whether there is some hidden correlation between the increasing temperatures and their complexity that has somehow been overlooked in previous approaches. This open question will require additional studies with more detailed data analyses in time and space.
Figure 11

The hierarchical tree generated by the hierarchical clustering (HC) algorithm based on (17).

In summary, we verified that the complexity of the TTS is increasing, that a solid assessment is accomplished when a combination of several distinct measures is used, and that complexity is an important issue to consider when analyzing global warming.

6. Conclusions

The TTS embeds rich information contributed by a multitude of distinct factors at different scales. The complexity of the TTS is related to climate forcing factors. In this paper, information from worldwide land meteorological stations was processed and its complexity was calculated in space and time by means of different indices. The increasing complexity of the TTS in recent years demonstrates that there has been a change in the dominating factors that drive the climate. On one hand, we have natural forces, such as the orbital effect, solar radiation, volcanic eruptions, changes in land cover, greenhouse gas, and aerosols. On the other hand, we find anthropogenic factors, including the rise in greenhouse gas and aerosols caused by human activities, as well as sulfate air pollutants, reactive nitrogen, dust, urban heat islands, ozone change, and land-surfaces changes also due to human activities. Four complexity measures were adopted. However, no single measure captured all effects and a multi-measurement perspective led to a superior assessment of the system state. The resulting multidimensional set of measurements was synthesized by means of the HC visualization technique. The obtained clusters revealed clear patterns in the increasing complexity embedded in the TTS. More detailed quantification of these patterns by means of spatial statistics, namely spatial Fourier power spectrum, will be addressed in further research.
  15 in total

1.  Physiological time-series analysis using approximate entropy and sample entropy.

Authors:  J S Richman; J R Moorman
Journal:  Am J Physiol Heart Circ Physiol       Date:  2000-06       Impact factor: 4.733

2.  Approximate entropy as a measure of system complexity.

Authors:  S M Pincus
Journal:  Proc Natl Acad Sci U S A       Date:  1991-03-15       Impact factor: 11.205

3.  Approximate entropy (ApEn) as a complexity measure.

Authors:  Steve Pincus
Journal:  Chaos       Date:  1995-03       Impact factor: 3.642

4.  Fingerprints of global warming on wild animals and plants.

Authors:  Terry L Root; Jeff T Price; Kimberly R Hall; Stephen H Schneider; Cynthia Rosenzweig; J Alan Pounds
Journal:  Nature       Date:  2003-01-02       Impact factor: 49.962

5.  Interpretation of the Lempel-Ziv complexity measure in the context of biomedical signal analysis.

Authors:  Mateo Aboy; Roberto Hornero; Daniel Abásolo; Daniel Alvarez
Journal:  IEEE Trans Biomed Eng       Date:  2006-11       Impact factor: 4.538

6.  On the trend, detrending, and variability of nonlinear and nonstationary time series.

Authors:  Zhaohua Wu; Norden E Huang; Steven R Long; Chung-Kang Peng
Journal:  Proc Natl Acad Sci U S A       Date:  2007-09-10       Impact factor: 11.205

Review 7.  Estimating regularity in epileptic seizure time-series data. A complexity-measure approach.

Authors:  N Radhakrishnan; B N Gangadhar
Journal:  IEEE Eng Med Biol Mag       Date:  1998 May-Jun

8.  Evaluation of physiologic complexity in time series using generalized sample entropy and surrogate data analysis.

Authors:  Luiz Eduardo Virgilio Silva; Luiz Otavio Murta
Journal:  Chaos       Date:  2012-12       Impact factor: 3.642

9.  Detecting ventricular tachycardia and fibrillation by complexity measure.

Authors:  X S Zhang; Y S Zhu; N V Thakor; Z Z Wang
Journal:  IEEE Trans Biomed Eng       Date:  1999-05       Impact factor: 4.538

10.  Lempel-Ziv complexity of cortical activity during sleep and waking in rats.

Authors:  Daniel Abásolo; Samantha Simons; Rita Morgado da Silva; Giulio Tononi; Vladyslav V Vyazovskiy
Journal:  J Neurophysiol       Date:  2015-02-25       Impact factor: 2.714

View more

北京卡尤迪生物科技股份有限公司 © 2022-2023.