Literature DB >> 33267488

Comparing Information Metrics for a Coupled Ornstein-Uhlenbeck Process.

James Heseltine1, Eun-Jin Kim1.   

Abstract

It is often the case when studying complex dynamical systems that a statistical formulation can provide the greatest insight into the underlying dynamics. When discussing the behavior of such a system which is evolving in time, it is useful to have the notion of a metric between two given states. A popular measure of information change in a system under perturbation has been the relative entropy of the states, as this notion allows us to quantify the difference between states of a system at different times. In this paper, we investigate the relaxation problem given by a single and coupled Ornstein-Uhlenbeck (O-U) process and compare the information length with entropy-based metrics (relative entropy, Jensen divergence) as well as others. By measuring the total information length in the long time limit, we show that it is only the information length that preserves the linear geometry of the O-U process. In the coupled O-U process, the information length is shown to be capable of detecting changes in both components of the system even when other metrics would detect almost nothing in one of the components. We show in detail that the information length is sensitive to the evolution of subsystems.

Keywords:  Fisher information; Fokker–Planck equation; Langevin equation; O-U process; information length; metrics; probability density function; stochastic processes

Year:  2019        PMID: 33267488      PMCID: PMC7515303          DOI: 10.3390/e21080775

Source DB:  PubMed          Journal:  Entropy (Basel)        ISSN: 1099-4300            Impact factor:   2.524


1. Introduction

Describing many natural systems statistically can give great insight into the system’s dynamics, when uncertainty or degrees of freedom are too high to do otherwise. Measures of information change can be particularly useful in understanding the evolution of a system under perturbation, or comparing data (e.g., see [1]). Here, by information, we specifically refer to a measurable, statistical difference between the states of a system, defined by probability density functions (PDFs), and avoid any of the more diaphanous definitions of the term. The statistical difference can be quantified by assigning a metric to probability, which then endows a stochastic system with a geometric structure. Previously, different metrics (e.g., Refs. [2,3,4,5,6,7,8,9,10]) have been considered depending on the question of interest. A popular measure of the information change in a system is entropy, which measures the uncertainty or ‘disorder’ of the system. More specifically, it is a measure of the number of states that are accessible from the current state. Comparing entropy at different times gives a measure of the difference in information for the system, called the relative entropy. We can use this relative entropy as a metric. Another example is the Wasserstein metric, which was used to optimize transport cost in the optimal transport problem [4,6]; for Gaussian PDFs, the Wasserstein metric is defined in the product space consisting of Euclidean and positive symmetric matrices for the mean and variance, respectively (e.g., see [2]). The link between the Fisher information [8] and the Wasserstein distance was made in [6], where the integral of the Fisher information along the Ornstein–Uhlenbeck semigroup was shown to be the same as the Wasserstein distance. Furthermore, [1] stated that relative entropy was the integral of Fisher information along the same path.
However, the way in which the relative entropy has mostly been used in the past lacks a sense of locality as a metric of the system, as it focuses on quantifying the difference between two given PDFs, for instance, the PDFs at time 0 and time t. As a result, it is independent of the intermediate PDFs between time 0 and t (the history/path of a system), and thus can only inform us about changes which affect the overall structure of the system. The work of [11] was, in part, a search for a disequilibrium component for a statistical complexity measure (SCM). In short, an SCM is a measure of both the ‘order’ and ‘disorder’ of a system, which can help to reveal hidden structures of a disordered system. They proposed several metrics for the ‘disorder’, or disequilibrium, component of the SCM. In this paper we compare several of the proposed metrics of [11] with the information length L [12,13,14,15,16,17,18,19,20]. The information length, proportional to the time integral of the square root of the infinitesimal relative entropy, depends on the intermediate states between time 0 and t and is thus a Lagrangian measure. Also, the formulation of the information length allows us to measure local change for the system in time. The total information length over the entire evolution, L∞ = L(t → ∞), was shown to be useful to quantify the proximity of any initial PDF to a final attractor of a dynamical system. For instance, for the Ornstein–Uhlenbeck process (O-U) [16,18], L∞ was shown to take its minimum value at the stable equilibrium point and to increase linearly with the distance of the mean position of an initial PDF from the stable equilibrium point. This linear dependence manifests that the information length preserves the linear geometry of the underlying Gaussian process. In this paper, we will show that this linear relation is lost for other metrics (e.g., relative entropy, Jensen divergence).
Note that for a chaotic attractor, L∞ varies sensitively with the mean position of a narrow initial PDF, taking its minimum value at the most unstable point [21]. This sensitive dependence of L∞ on the initial PDF is similar to a Lyapunov exponent. We note that the O-U is a prototypical relaxation problem and is particularly useful to study, as its attractor provides a natural equilibrium state. It can model many stochastic systems which relax to a stable equilibrium. The solution to this process is Gaussian, and so has ‘nice’ properties of analytical tractability, permitting us to perform a detailed investigation under the change of parameters. We first compare different metrics for a single O-U process and then move to a coupled O-U process. The O-U process is a well-studied model, though less so for the coupled system. Our focus is to compare different metrics and to see if the information length may be more revealing of the behavior of the components of the coupled system, as well as the overall system. The remainder of this paper is organized as follows. Section 2 provides the definition of different metrics. Section 3 is devoted to the discussion of a single O-U process. Section 4 provides analytical solutions to the coupled O-U process and Section 5 compares different metrics for the coupled O-U process. Conclusions are found in Section 6. In Appendix A, we present how to solve the Fokker–Planck equation(s) numerically by using a method with second-order accuracy in time and compare analytical results with numerical results. Appendix B comments on the Langevin equation for our coupled O-U process.

2. Information Length and Other Metrics

We consider a PDF p(x,t) for a stochastic variable x in the following.

2.1. Information Length

The information length between time 0 and t is given by

L(t) = ∫₀ᵗ √E(t₁) dt₁ = ∫₀ᵗ dt₁/τ(t₁),  (1)

where E is the second moment given by

E(t) ≡ 1/τ(t)² = ∫ dx [∂p(x,t)/∂t]² / p(x,t).  (2)

Here, τ has the unit of time while L has no dimension. The parameter τ(t) is the characteristic timescale of the system, and quantifies the correlation time for the system [15]. Hence, 1/τ = √E is the rate of change of the information in time. Integrating 1/τ over time gives the total number of statistically different states that a system passes through in time. We note that E quantifies the information change in time through the root-mean-squared fluctuating energy, using the second moment of the partial derivative of the PDF with respect to time. When the parameters governing a PDF are known, the information length can be written in terms of the Fisher information metric [8,12,15]. As noted in Section 1, L is a Lagrangian quantity and has the property of being a local measure, being sensitive to how p(x,t) evolves at different x in time. In comparison with entropy, which is independent of the spatial (x) gradient of a PDF, it is this property that may elevate L above entropy in revealing micro-scale interactions within a system. The discrete versions of Equations (1) and (2) are as follows:

L ≈ Σᵢ dt √E(i),  (3)

E(i) ≈ Σⱼ s [p(i+1, j) − p(i, j)]² / [dt² p(i, j)].  (4)

Here, i and j represent the discrete time and spatial point, respectively; p(i, j) is the discrete version of p(x,t); dt is the time step while s is the spatial step.
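As a concrete illustration, the discrete formulas above can be evaluated in a few lines. The sketch below is our own minimal implementation (the function name is ours, not the paper's) of the discrete sums for a sequence of PDF snapshots on a uniform grid:

```python
import numpy as np

def information_length(pdfs, dt, s):
    """Discrete information length following Equations (3) and (4).

    pdfs : 2D array, pdfs[i, j] = PDF at time step i, spatial point j
    dt   : time step
    s    : spatial step
    """
    p = np.asarray(pdfs, dtype=float)
    # E(i) ~ sum_j s * [p(i+1,j) - p(i,j)]^2 / (dt^2 * p(i,j))
    dpdt = (p[1:] - p[:-1]) / dt
    E = np.sum(s * dpdt**2 / p[:-1], axis=1)
    # L ~ sum_i dt * sqrt(E(i))
    return np.sum(dt * np.sqrt(E))
```

A quick sanity check: for a Gaussian whose mean drifts at fixed width σ, L reduces to the distance travelled by the mean divided by σ.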

2.2. Other Metrics

Here, we list the metrics taken from [11]. We will calculate each metric relative to the initial state PDF at t = 0 in order to compare their time-evolution with that of the information length. That is, the metrics below are based on comparing two PDFs p(x,t) and q(x) = p(x,0). For convenience, the metrics are given both for the continuous process and for the discrete approximation (used for numerical calculation) by using i and j as indices representing time and space, respectively, and s as the spatial step, as above. The reference probability q(x) [q(j)] will be the initial PDF while p(x,t) [p(i, j)] is the PDF at time t [i].

2.2.1. Euclidean norm

Applying our standard notion of distance in Euclidean space seems like a natural extension. However, it quickly becomes apparent that the statistical space of stochastic systems is rarely well described by Euclidean metrics. While this formulation seems appealing, it does not yield illuminating details about the disequilibrium of the system, and it is included here mostly as a base case. We will use it as an example of a poor measure of information change.

2.2.2. Wootters’ Distance

This metric, like the notion of statistical distance itself, originates in quantum information theory [9]. However, as quantum information theory is purely statistical in formulation, it can be applied to any system defined by a PDF. Fundamentally, this metric is based on the principle that any finite number of measurements on a stochastic system will yield results that may not be exactly the same as the underlying probability distributions. It would be impossible to distinguish two states whose true underlying probabilities differ by less than a typical fluctuation of the measurement error. This intrinsically defines a resolution for the system. The Wootters’ distance was shown to be a monotone transformation of the Hellinger distance [22].

2.2.3. Kullback-Leibler Relative Entropy

Kullback–Leibler relative entropy was first introduced by Solomon Kullback and Richard Leibler [10], and is sometimes referred to as the Kullback–Leibler divergence. It represents a measure of the difference between a probability distribution and some other reference probability distribution. Whilst a useful tool, it is not strictly a metric, as it does not satisfy the triangle inequality. It is, however, used in the definition of some other quantities, such as the mutual information of two co-varying random variables, and the Jensen divergence.

2.2.4. Jensen Divergence

The Jensen divergence is simply the symmetric version of the Kullback–Leibler divergence. Often it is referred to as the Jensen distance, and the square root of this quantity can be shown to be a metric [11], which allows us to examine the statistical geometry of the system. The Jensen divergence is the mutual information between a random variable x, drawn from a mixture distribution of p and q, and the binary indicator variable used to build the mixture. In other words, it is a measure of the mutual dependence of x on the way the mixture is constructed, and thus quantifies the amount of information difference between the two distributions.
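To make the comparison concrete, the four metrics of this section can be sketched numerically as follows. This is a minimal illustration using the standard textbook forms (Wootters’ distance as the arccosine of the overlap of √(pq), and the Jensen divergence as the symmetrized K-L divergence against the mixture (p + q)/2); the exact normalizations used in [11] may differ:

```python
import numpy as np

def euclidean(p, q, s):
    # L2 distance between the two discretized PDFs
    return np.sqrt(np.sum(s * (p - q)**2))

def wootters(p, q, s):
    # statistical distance: arccos of the overlap integral of sqrt(p q)
    overlap = np.clip(np.sum(s * np.sqrt(p * q)), -1.0, 1.0)
    return np.arccos(overlap)

def kl_divergence(p, q, s):
    # Kullback-Leibler relative entropy D(p || q)
    mask = (p > 0) & (q > 0)
    return np.sum(s * p[mask] * np.log(p[mask] / q[mask]))

def jensen(p, q, s):
    # symmetrized divergence via the mixture m = (p + q)/2
    m = 0.5 * (p + q)
    return 0.5 * kl_divergence(p, m, s) + 0.5 * kl_divergence(q, m, s)
```

Note that the Jensen divergence so defined is symmetric in p and q and bounded above by ln 2, consistent with its interpretation as a mutual information with a binary indicator.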

3. The O-U Process

The one-dimensional O-U process is based on the Langevin equation

dx/dt = −γ(x − μ) + ξ(t),  (9)

where x is the stochastic variable (e.g., position, velocity, etc.), γ is the damping constant, μ is the position of the attractor for the system, and ξ is a δ-correlated, Gaussian-distributed stochastic forcing, i.e.,

⟨ξ(t)ξ(t′)⟩ = 2D δ(t − t′),  (10)

where D is the strength of the stochastic forcing. The corresponding Fokker–Planck equation [23,24] is given by

∂p/∂t = γ ∂/∂x [(x − μ) p] + D ∂²p/∂x²,  (11)

where the solution p(x,t) is the time-dependent PDF which describes the evolution of the system. It can be shown that the solution to Equation (11) is given by [13]

p(x,t) = √(β/π) exp[−β(x − ⟨x⟩)²],  (12)

where ⟨x⟩ = μ + (x₀ − μ) e^{−γt} and 1/(2β) = e^{−2γt}/(2β₀) + D(1 − e^{−2γt})/γ,  (13)

given the initial condition p(x,0) = √(β₀/π) exp[−β₀(x − x₀)²], where β and ⟨x⟩ in Equation (12) represent the inverse temperature and the mean value of x, respectively, and β₀ and x₀ in Equation (13) are the values of β and ⟨x⟩ at t = 0, respectively.
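The Gaussian solution of Equations (12) and (13) is straightforward to tabulate numerically. Below is a minimal sketch (our own helper, with the attractor taken at the origin, μ = 0, for simplicity):

```python
import numpy as np

def ou_solution(x, t, x0, beta0, gamma, D):
    """Equations (12)-(13) with the attractor at the origin:
    <x>(t) = x0 exp(-gamma t),
    1/(2 beta) = exp(-2 gamma t)/(2 beta0) + D (1 - exp(-2 gamma t))/gamma."""
    mean = x0 * np.exp(-gamma * t)
    inv_2beta = (np.exp(-2 * gamma * t) / (2 * beta0)
                 + D * (1 - np.exp(-2 * gamma * t)) / gamma)
    beta = 1.0 / (2.0 * inv_2beta)
    return np.sqrt(beta / np.pi) * np.exp(-beta * (x - mean)**2)
```

As t → ∞, the PDF relaxes to zero mean and inverse temperature γ/(2D), the equilibrium state used throughout this section.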

3.1. Information Length

In [13], we showed that the information length for the O-U process follows in closed form from Equations (1), (12) and (13): for a Gaussian PDF, E in Equation (2) reduces to E = (1/2)(d ln β/dt)² + 2β (d⟨x⟩/dt)², and substituting Equation (13) gives an explicit expression, Equation (14), in terms of γ, D, the inverse temperature evaluated at the initial time (t = 0 in our case) and at the final time, and the initial mean position x₀. Letting the integral in the last term of Equation (14) be H, Equation (14) can be evaluated in closed form, with separate expressions in the two parameter regimes (β₀ above or below its equilibrium value) that match continuously at the boundary between them.
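The linear dependence of L∞ on x₀ can be checked numerically without the closed form. The sketch below (our own helper, assuming the attractor at the origin) uses the Gaussian identity E = (1/2)(d ln β/dt)² + 2β(d⟨x⟩/dt)² and integrates Equation (1) along the solution of Equation (13):

```python
import numpy as np

def information_length_ou(x0, gamma, D, beta0, t_max=20.0, n=40001):
    """Numerically integrate L(t) for the Gaussian O-U solution using
    E = (d ln(beta)/dt)^2 / 2 + 2 * beta * (d<x>/dt)^2."""
    t = np.linspace(0.0, t_max, n)
    mean = x0 * np.exp(-gamma * t)
    inv_2beta = (np.exp(-2 * gamma * t) / (2 * beta0)
                 + D * (1 - np.exp(-2 * gamma * t)) / gamma)
    beta = 1.0 / (2.0 * inv_2beta)
    dmean = -gamma * mean                       # d<x>/dt
    dlnbeta = np.gradient(np.log(beta), t)      # d(ln beta)/dt
    sqrtE = np.sqrt(0.5 * dlnbeta**2 + 2.0 * beta * dmean**2)
    # trapezoidal rule for L = integral of sqrt(E) dt
    return np.sum(0.5 * (sqrtE[1:] + sqrtE[:-1]) * np.diff(t))
```

When β₀ is chosen at its equilibrium value γ/(2D), β stays constant and the integral evaluates to |x₀|√(2β₀), i.e., exactly linear in x₀, reproducing the linear geometry discussed above.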

3.2. Wootters’ Distance

By using Equations (12) and (13) in (6), the overlap integral ∫ dx √(p(x,t) p(x,0)) can be carried out as a Gaussian integral, giving

d_W = cos⁻¹[ √(2√(ββ₀)/(β + β₀)) exp(−ββ₀(⟨x⟩ − x₀)²/(2(β + β₀))) ],

where β = β(t) and ⟨x⟩ = ⟨x⟩(t) are given by Equation (13).

3.3. Kullback–Leibler Relative Entropy

By using Equations (12) and (13) in (7), taking the initial PDF as the reference distribution, we can show

D_KL = (1/2) ln(β/β₀) + β₀/(2β) + β₀(⟨x⟩ − x₀)² − 1/2,

where β = β(t) and ⟨x⟩ = ⟨x⟩(t) are given by Equation (13).

3.4. Jensen Divergence

By using Equations (12) and (13) in (8), we obtain the Jensen divergence between p(x,t) and p(x,0); since it involves the logarithm of the Gaussian mixture [p(x,t) + p(x,0)]/2, the remaining integral is evaluated numerically.

3.5. Comparison

Figure 1 shows the final value of each metric as we vary the initial position of the system. The total information length L∞, Wootters’ distance, K-L relative entropy and Jensen divergence against x₀ are shown in blue, orange, green and red, respectively, in the long time limit as t → ∞. Note that the green and red lines overlap. It is notable in Figure 1 that the linear relation between the metric and the initial mean value x₀ is obtained only by the information length. That is, it is only the information length that preserves the linear geometry underlying a linear stochastic process. For all other metrics, this linear relation is lost. We show in Appendix A that our analytical metrics in Figure 1 are in good agreement with those calculated directly from the numerical solutions to the Fokker–Planck Equation (11) by time-stepping (see Figure A1).
Figure 1

The metrics against x₀ in the long time limit for a single Ornstein–Uhlenbeck (O-U) process.

Figure A1

The metrics calculated by the numerical solutions to the Fokker–Planck equation by time-stepping.
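The time-stepping referred to above can be sketched with a simple finite-difference update. The paper's Appendix A uses a scheme with second-order accuracy in time; the version below is a deliberately minimal first-order explicit sketch (our own, with the attractor at the origin) just to show the structure of Equation (11) on a grid:

```python
import numpy as np

def fokker_planck_step(p, x, dx, dt, gamma, D):
    """One explicit Euler step of dp/dt = gamma * d/dx (x p) + D * d2p/dx2
    (O-U Fokker-Planck, attractor at the origin), central differences.
    Stability requires roughly dt < dx^2 / (2 D)."""
    drift = x * p
    ddrift = (np.roll(drift, -1) - np.roll(drift, 1)) / (2 * dx)
    d2p = (np.roll(p, -1) - 2 * p + np.roll(p, 1)) / dx**2
    p_new = p + dt * (gamma * ddrift + D * d2p)
    p_new[0] = p_new[-1] = 0.0   # far-field boundary condition
    return p_new
```

Iterating this step from a broad initial Gaussian shows the variance relaxing toward the equilibrium value D/γ while the normalization is (approximately) conserved, which is a convenient check against the analytical solution in Equations (12) and (13).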

4. The Coupled O-U Process

We now consider the coupled system of Fokker–Planck equations for p₁(x,t) and p₂(x,t),

∂p₁/∂t = γ ∂/∂x (x p₁) + D ∂²p₁/∂x² − μ₁ p₁ + μ₂ p₂,  (22)

∂p₂/∂t = γ ∂/∂x (x p₂) + D ∂²p₂/∂x² + μ₁ p₁ − μ₂ p₂,  (23)

where D is the strength of a short-correlated Gaussian noise given by Equation (10). These equations are a pair of O-U processes, linked by coupling constants μ₁ and μ₂. The couplings μ₁ and μ₂ are due to the dichotomous noise [25] (see Appendix B for the Langevin equation corresponding to Equations (22) and (23)). We choose a coupled system like this one to examine the localized dynamics of these interacting sub-processes. This system could model any process for which there are two competing components relaxing to an equilibrium, like evaporation in a closed system, or a reversible chemical reaction. Since we are mainly interested in the relaxation process from non-equilibrium initial states, we choose different initial conditions for p₁ and p₂ while, for simplicity, considering the case where μ₁ = μ₂ ≡ μ, where the Fokker–Planck Equations (22) and (23) are reduced to

∂p₁/∂t = γ ∂/∂x (x p₁) + D ∂²p₁/∂x² − μ (p₁ − p₂),  (24)

∂p₂/∂t = γ ∂/∂x (x p₂) + D ∂²p₂/∂x² + μ (p₁ − p₂).  (25)

Specifically, as initial conditions, we use the following different Gaussian PDFs

p₁(x, 0) = √(β₁₀/π) exp(−β₁₀ x²),  (26)

p₂(x, 0) = √(β₂₀/π) exp[−β₂₀ (x − x₀)²].  (27)

Note that β₁₀ and β₂₀ are the initial inverse temperatures for p₁ and p₂, respectively. On the other hand, we fix the initial mean of p₁ to be zero while the initial p₂ is taken to have an arbitrary mean value x₀. We also note that as t → ∞, p₁ and p₂ approach the same PDF. To solve Equations (24) and (25), we take the Fourier transform [p̃ᵢ(k,t) = ∫ dx pᵢ(x,t) e^{−ikx} for i = 1, 2] and use the characteristic equation to recast Equations (24) and (25) as Equations (28) and (29). Here the total derivative is taken along the characteristic defined by dk/dt = γk, which has the solution k = k₀ e^{γt}, where k₀ is the initial wavenumber. We solve Equations (28) and (29) in terms of new variables defined along the characteristics.
The coupled Equations (28) and (29) are then simplified as Equations (33) and (34). We write down the solutions to Equations (33) and (34) using two constants a and b. To determine a and b, we take the Fourier transform of the initial conditions, Equations (26) and (27), to obtain Equations (37) and (38). Thus, evaluating Equation (32) at t = 0 and equating it to Equations (37) and (38), we find a and b. On the other hand, using Equation (31) in Equation (32), we write p̃₁ and p̃₂ in terms of k₀. Finally, taking the inverse Fourier transform [pᵢ(x,t) = (1/2π) ∫ dk p̃ᵢ(k,t) e^{ikx} for i = 1, 2] and performing several Gaussian integrals, we obtain the closed-form solutions p₁(x,t) and p₂(x,t) in Equations (42) and (43), with the associated inverse temperatures and mean values given in Equations (44) and (45). We can check that at t = 0, Equations (42) and (43) recover Equations (26) and (27). On the other hand, in the limit of t → ∞, Equations (42) and (43) give us the stationary solution to a single O-U process with β = γ/(2D). We note that the total PDF P = p₁ + p₂ is the solution of this single O-U process with the initial condition given by the sum of Equations (26) and (27). Using these analytical solutions in Equations (42)–(44), we present the different metrics in Equations (1) and (5)–(8) in Section 5.
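Although the intermediate Equations (28)–(41) are omitted here, the structure of the final solutions can be illustrated for the symmetric case μ₁ = μ₂ = μ: the sum f = p₁ + p₂ obeys the single O-U Fokker–Planck equation, while the difference g = p₁ − p₂ obeys the same equation with an extra decay rate 2μ. The sketch below is our own construction under these assumptions (attractor at the origin, each component initially normalized to one), assembling p₁ and p₂ from two single-O-U Gaussian evolutions:

```python
import numpy as np

def ou_gaussian(x, t, x0, beta0, gamma, D):
    # single O-U Gaussian, Equations (12)-(13), attractor at the origin
    mean = x0 * np.exp(-gamma * t)
    inv_2beta = (np.exp(-2 * gamma * t) / (2 * beta0)
                 + D * (1 - np.exp(-2 * gamma * t)) / gamma)
    beta = 1.0 / (2.0 * inv_2beta)
    return np.sqrt(beta / np.pi) * np.exp(-beta * (x - mean)**2)

def coupled_ou(x, t, x0, beta10, beta20, gamma, D, mu):
    """p1, p2 for symmetric coupling mu1 = mu2 = mu via the normal modes
    f = p1 + p2 (pure O-U) and g = p1 - p2 (O-U times exp(-2 mu t))."""
    e1 = ou_gaussian(x, t, 0.0, beta10, gamma, D)   # evolution of p1(x, 0)
    e2 = ou_gaussian(x, t, x0, beta20, gamma, D)    # evolution of p2(x, 0)
    f = e1 + e2
    g = np.exp(-2.0 * mu * t) * (e1 - e2)
    return 0.5 * (f + g), 0.5 * (f - g)
```

By construction, this recovers the initial conditions at t = 0, relaxes both components to the same equilibrium Gaussian as t → ∞, and evolves the total P = p₁ + p₂ exactly as the uncoupled sum, in line with the properties of Equations (42)–(44) noted above.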

5. Results for the Coupled O-U Process

For the illustration in this section, we show results for fixed parameter values of γ, D, and μ. We recall from Section 4 that for these parameter values, β = γ/(2D) ≡ β* in equilibrium (see Equation (44)), while the mean values in Equation (45) are zero for both p₁ and p₂. For comparing metrics, we consider the case where p₁ is initially in the final equilibrium with the zero mean value and inverse temperature β₁₀ = β*. On the other hand, p₂ at t = 0 is taken to have either different mean values x₀ or different inverse temperatures β₂₀. Here, we present results obtained by using the analytical solutions in Section 4 only. (See Appendix A for the numerical solutions and comparison with the analytical solutions.)

5.1. Varying β₂₀

We first examine how the system changes when varying the initial inverse temperature β₂₀ of p₂ for the fixed zero mean position (x₀ = 0, the equilibrium value). We investigate what changes are detected by each metric. Figure 2 shows the total information length (in blue), Euclidean norm (in orange), Wootters’ distance (in green), K-L relative entropy (in red), and Jensen divergence (in purple) against β₂₀ in the long time limit as t → ∞. Panels (a), (b) and (c) are for the overall system P, and the components p₁ and p₂, respectively. Since p₁ is initially chosen to be in its equilibrium state, the behavior for the overall system P in panel (a) is more similar to p₂ in panel (c) than p₁ in panel (b). What is prominent in both panels (a) and (c) is the presence of the distinct minimum in the information length around β₂₀ = β*. This is because the final equilibrium has the inverse temperature β* = γ/(2D), demonstrating that the total information length also maps out the underlying attractor structure when varying β₂₀, taking its minimum value at the equilibrium state (reminiscent of the results for the single O-U process above and previous works [13,16,18]). The minimum around β₂₀ = β* is also observed for the other metrics (apart from the Euclidean norm), although it is less pronounced than for the total information length. In fact, for β₂₀ = β*, there is no temporal change in either p₁ or p₂, so all the metrics apart from the Wootters’ one take the value of zero. Also of interest is an almost linear increase in the total information length in panel (c) as β₂₀ increases beyond β*.
Figure 2

Behavior of the metrics for varying β₂₀ for the overall system P in panel (a) and components p₁ in panel (b) and p₂ in panel (c).

Now, what is happening to p₁, which starts with the equilibrium state (β₁₀ = β* and zero mean value)? While Equation (44) shows that the final equilibrium state is unchanged, the actual PDF p₁ in Equation (42) changes with time due to its coupling to p₂ via μ. That is, p₁ changes over time, initially deviating from the equilibrium state due to the interaction with p₂ and then finally relaxing back to the final equilibrium state. Associated with this time evolution of p₁ is the information change, which can be measured by different metrics. However, Figure 2b shows that for β₂₀ ≠ β*, it is only the information length that detects a noticeable difference in the information change as β₂₀ changes. Furthermore, the information length for p₁ takes its minimum value at the equilibrium value of β₂₀, as was the case for p₂. This result demonstrates that the information length is sensitive to the evolution of the p₁ component (a subsystem). To investigate the evolution of the metrics for the p₁ component around β₂₀ = β*, we show in Figure 3 how the metrics evolve over time for two different values of β₂₀ near the equilibrium value β*, one in panel (a) and the other in panel (b). The different metrics are denoted by using the same colors as in Figure 2. This small deviation of β₂₀ from the equilibrium value induces a small change in p₁ over time due to the coupling to p₂, as noted above. However, in panels (a) and (b), we see a significant change in the Euclidean norm, K-L relative entropy, and Jensen divergence in time, with a large increase before settling down to a lower value. In comparison, the information length shows no such spike. These spikes are caused by the deviation of p₁ from its initial equilibrium state due to the coupling to p₂, before p₁ settles into the equilibrium state as p₂ approaches the equilibrium.
What is interesting is that the information length does not show such a spike, since it measures the local information change; this is thus the more sensible view in this instance, since the change in the p₁ component is small compared to its width (uncertainty). Furthermore, such spikes appear for neither p₂ nor P (results not shown), since p₂, starting from a non-equilibrium state, monotonically approaches its equilibrium.
Figure 3

The metrics against time for two values of β₂₀ around β*, one in panel (a) and the other in panel (b). Note the different Y-axis scalings of the panels.

5.2. Varying x₀

We now fix the initial inverse temperatures and vary the initial mean position x₀ of p₂ to examine how the metrics depend on x₀, as we have done for the single O-U process in Section 3. Figure 4 shows the total information change for each metric for different values of x₀, for P, p₁, and p₂ in panels (a), (b), and (c), respectively. Specifically, Figure 4a shows that for P, the total information length against x₀ is linear, capturing the linearity of the system as expected from the single O-U process. None of the other metrics are capable of showing the linear relationship in the same way. On the other hand, Figure 4b,c shows an interesting non-monotonic dependence of the information length on x₀.
Figure 4

Behavior of the metrics for varying x₀ for the overall system P in panel (a), p₁ in (b), and p₂ in (c).

To understand this, we show in Figure 5 the time evolution of P, p₁, and p₂ in panels (a), (b), and (c), respectively, by using x₀ = 20. Of interest is that the evolution of p₁ and p₂ in Figure 5b,c involves the formation of two peaks from the initial single peak, followed by the merging of these two peaks into one peak as the system settles into the equilibrium. This formation of the two peaks is due to the interaction between p₁ and p₂ when they are initially widely separated, for a sufficiently large x₀. The formation of the two PDF peaks leads to the maximum in the total information length around x₀ = 20 in Figure 4b,c. Specifically, the formation of the two peaks in p₁ and p₂ shown in Figure 5b,c takes place when the two peaks are a full PDF width apart, and it facilitates the broadening of both PDFs in the relaxation process. As x₀ is further increased from 20, p₁ and p₂ form two peaks which are more widely separated, leading to p₁ and p₂ becoming effectively broader. This in turn reduces the information length (as x₀ increases further from 20), since the large fluctuations (uncertainty) associated with a broad PDF reduce the information length. Again, the information length is the only measure which detects the difference in the overall information change for varying x₀, due to its sensitive dependence on the local evolution of a system.
Figure 5

Time-dependent probability density functions (PDFs) for P in panel (a), p₁ in panel (b) and p₂ in panel (c); x₀ = 20.

6. Conclusions

When searching for a way to quantify the information change in a given dynamical system, our choices are many and varied. Our aim here was to show the power of the information length when compared with some of the more popular methods of measuring information change. Utilizing the O-U process, we compared several relative-entropy formulations with our information length to investigate what each could reveal about the dynamics of a system. Specifically, we investigated the relaxation problem given by a single and coupled O-U process and compared the information length with the K-L relative entropy, Jensen divergence, Wootters’ distance, and Euclidean norm. By measuring the total information length L∞ in the long time limit, we showed that L∞ was unique in detecting the linear spatial relationship between the total information change and the initial position of a PDF. In the coupled O-U process, the information length was shown to be the most effective in detecting changes in the components of the system, even when the other metrics would detect almost nothing in one of the components. In particular, when p₁ started with an equilibrium state with the zero mean value, the formation of the two peaks of p₁ (or p₂) from an initial single peak and the merging of the two peaks into one peak as the system settled into equilibrium was detected only by the information length, with its intriguing non-monotonic dependence on x₀ (the initial mean value of p₂). This underscores the sensitivity of the information length to the evolution of subsystems. Future work will include the study of a system with multiple attractor positions, or of how the system behaves when changing the position of the attractor. It would also be interesting to examine the case where the coupling parameters μ₁ and μ₂ are not constant, but are functions of time. This could result in a periodic equilibrium where the PDF varies between two or more unstable states.
This could represent physical systems like reversible chemical reactions, or even fluctuating financial markets. It would also be of interest to investigate the implications of the information length for deep neural networks [26], in particular, to elucidate the role of the geodesic along which the information length is minimized [27].
  4 in total

1.  Signature of nonlinear damping in geometric structure of a nonequilibrium process.

Authors:  Eun-Jin Kim; Rainer Hollerbach
Journal:  Phys Rev E       Date:  2017-02-27       Impact factor: 2.529

2.  Geometric structure and geodesic in a solvable model of nonequilibrium process.

Authors:  Eun-Jin Kim; UnJin Lee; James Heseltine; Rainer Hollerbach
Journal:  Phys Rev E       Date:  2016-06-20       Impact factor: 2.529

3.  Time-dependent probability density function in cubic stochastic processes.

Authors:  Eun-Jin Kim; Rainer Hollerbach
Journal:  Phys Rev E       Date:  2016-11-10       Impact factor: 2.529

4.  Geometric structure and information change in phase transitions.

Authors:  Eun-Jin Kim; Rainer Hollerbach
Journal:  Phys Rev E       Date:  2017-06-06       Impact factor: 2.529

  1 in total

1.  Monte Carlo Simulation of Stochastic Differential Equation to Study Information Geometry.

Authors:  Abhiram Anand Thiruthummal; Eun-Jin Kim
Journal:  Entropy (Basel)       Date:  2022-08-12       Impact factor: 2.738

