
The Causal Interaction between Complex Subsystems.

X. San Liang

Abstract

Information flow provides a natural measure of the causal interaction between dynamical events. This study extends our previous rigorous formalism of componentwise information flow to the bulk information flow between two complex subsystems of a large-dimensional parental system. Analytical formulas are obtained in closed form, and, under a Gaussian assumption, so are their maximum likelihood estimators. These formulas have been validated using subsystems with preset relations, and they yield the expected causalities. In contrast, the proxies commonly used to characterize subsystems, such as averages and principal components, generally do not work correctly. This study can help diagnose the emergence of patterns in complex systems and is expected to find applications in real-world problems in disciplines such as climate science, fluid dynamics, neuroscience, and financial economics.


Keywords:  bulk information flow; causality; complex system; networks of networks; subspace

Year:  2021        PMID: 35052029      PMCID: PMC8774361          DOI: 10.3390/e24010003

Source DB:  PubMed          Journal:  Entropy (Basel)        ISSN: 1099-4300            Impact factor:   2.524


1. Introduction

When investigating the properties of a complex system, it is often necessary to study the interaction between one subsystem and another, where the subsystems themselves are also complex systems, usually with a large number of components. In climate science, for example, there is much interest in understanding how one sector of the system collaborates with another to cause climate change (see [1] and the references therein); in neuroscience, it is important to investigate the effective connectivity from one brain region to another, each with millions of neurons (e.g., [2,3]), and the interaction between structures (e.g., [4,5,6]; see more references in a recent review [7]). This naturally raises a question: How can we study the interaction between two subsystems of a large parental system? An immediate answer might be to study the componentwise interactions by assessing the causalities between the respective components using, for instance, classical causal inference approaches (e.g., [8,9,10]). This, however, is generally infeasible when the dimensionality is large: two subsystems with, say, 1000 components each give rise to 1 million causal relations, which are impossible to analyze despite all the details they carry. In this case, the details are not a benefit; they would have to be re-analyzed to yield a big, interpretable picture of the phenomena. On the other hand, in many situations this level of detail is not necessary; one needs only a "bulk" description of the subsystems and their interactions. Such examples are seen in the Reynolds equations for turbulence (e.g., [11]) and in the thermodynamic description of molecular motions (e.g., [12]). In some fields (e.g., climate science, neuroscience, geography), a common practice is simply to take the respective averages to form mean properties, and to study the interactions between these proxies.
A more sophisticated approach is to extract the respective principal components (PCs) (e.g., [13,14,15]) and to analyze the interactions based on them. As we will examine in this study, however, these approaches may not work satisfactorily; their validity needs to be carefully checked before they are put into application. During the past 16 years, it has gradually been realized that causality in terms of information flow (IF) is a real physical notion that can be rigorously derived from first principles (see [16]). When two processes interact, IF provides not only the direction but also the strength of the interaction. Thus far, the formalism of the IF between two components has been well established (see [16,17,18,19,20], among others). Extending the formalism to subspaces with many components has been shown to be promising. A pioneering effort is [21], where the authors show that the heuristic argument in [17] applies equally to subsystems in the case with only one-way causality. A recent study on the role of individual nodes in a complex network [22] may be viewed as another such effort. (Causality analyses between subspaces with the classical approaches are rare; a few examples are [23,24].) However, a rigorous formalism for more generic problems (e.g., with mutual causality involved) is yet to be established. This motivates the objective of this study, namely, to study the interaction between two complex subsystems within a large parental system by investigating the "bulk" information flow between them. The rest of the paper is organized as follows. In Section 2, we first present the setting of the problem and then derive the IF formulas. Maximum likelihood estimators of these formulas are given in Section 3, followed by a validation in Section 4. Finally, Section 5 summarizes the study.

2. Information Flow between Two Subspaces of a Complex System

Consider an n-dimensional stochastic dynamical system

dX = F(X, t) dt + B(X, t) dW,

where X = (x1, x2, …, xn)ᵀ denotes the vector of state variables, F = (F1, …, Fn)ᵀ are differentiable functions of X and time t, W is a vector of m independent standard Wiener processes, and B = (b_ij) is an n × m matrix of stochastic perturbation amplitudes. Here we follow the convention in physics not to distinguish a random variable from its deterministic counterpart. From the components x1, …, xn, we separate out two sets, (x1, …, xr) and (x_{r+1}, …, x_{r+s}), and denote them as X_A and X_B, respectively. The remaining components are denoted X_C. The subsystems formed by them are henceforth referred to as A and B, and the following is a derivation of the information flow between them. Note that, for convenience, A and B are here put adjacent to each other; if not, the equations can always be rearranged to make them so. Written out for the three groups of components, the system forms Equations (1)–(3). Associated with Equations (1)–(3) there is a Fokker–Planck equation governing the evolution of the joint probability density function (pdf) ρ of X:

∂ρ/∂t + Σ_i ∂(F_i ρ)/∂x_i = (1/2) Σ_i Σ_j ∂²(g_ij ρ)/(∂x_i ∂x_j),     (4)

where g_ij = Σ_k b_ik b_jk. Without much loss of generality, ρ is assumed to be compactly supported on ℝⁿ. The joint pdfs ρ_A of X_A and ρ_B of X_B are obtained by integrating out the remaining components; with respect to them, the joint entropies are H_A = −E[ln ρ_A] and H_B = −E[ln ρ_B]. To derive the evolution of ρ_B, integrate out X_A and X_C in Equation (4); by the compactness assumption for ρ, this yields Equation (7), and similarly for ρ_A. Multiplication of Equation (7) by −(1 + ln ρ_B), followed by an integration with respect to X_B, yields the evolution of H_B. Note that, in the second term on the left-hand side, the part within the summation can be reduced by integration by parts, where the compactness assumption has been used (variables vanish at the boundaries); the right-hand side is treated by the same approach. Hence dH_B/dt is obtained, and likewise dH_A/dt. Now consider the impact of the subsystem A on its peer B, written T_{A→B}. Following Liang (2016) [16], this is associated with the evolution of the joint entropy of the latter, dH_B/dt, of which the part dH_{B∖A}/dt signifies the entropy evolution with the influence of A excluded, found by instantaneously freezing X_A as parameters.
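In symbols, the decomposition just described can be written compactly as follows (a restatement in our notation; the original uses its own symbols and equation numbers):

```latex
% Entropy decomposition defining the information flow from subsystem A to B.
% H_B: joint entropy of subsystem B; the subtracted term is the entropy
% evolution of B with the state of A instantaneously frozen as parameters.
\frac{dH_B}{dt} = \frac{dH_{B\setminus A}}{dt} + T_{A\to B},
\qquad\text{i.e.,}\qquad
T_{A\to B} \equiv \frac{dH_B}{dt} - \frac{dH_{B\setminus A}}{dt}.
```

The flow T_{B→A} is defined symmetrically, with the roles of A and B exchanged.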
To do this, examine, on an infinitesimal interval [t, t+Δt], a system modified from the original Equations (1)–(3) by removing the r equations for x1, x2, …, xr from the equation set. Notice that the F's and b's still have a dependence on X_A, which, however, appears in the modified system only as parameters. By [16], we can construct a mapping for the modified system, with X_A appearing as parameters, and study its Frobenius–Perron operator (see, for example, [25]). An alternative approach is given by Liang in [18], which we henceforth follow. Observe that on the interval [t, t+Δt], corresponding to the modified dynamical system, there is also a Fokker–Planck equation, governing ρ_∖A, the joint pdf of the remaining components with X_A frozen as parameters. Note the difference between ρ_∖A and the original ρ: the former has X_A as parameters, while the latter depends on X_A as variables; however, they are equal at time t. Integration of this Fokker–Planck equation with respect to X_C gives the evolution of the pdf of subsystem B with A frozen as parameters, written ρ_{B∖A}. 
Divide Equation (16) by ρ_{B∖A} and simplify the notation to obtain the evolution of ln ρ_{B∖A}. Discretizing in time, and noticing that ρ_{B∖A} equals ρ_B at time t, we obtain (in the following, unless otherwise indicated, variables without arguments explicitly specified are assumed to be at time step t) an expression for ln ρ_{B∖A} at time t+Δt. To arrive at dH_{B∖A}/dt, we need its mathematical expectation. Using the Euler–Bernstein approximation for X_B(t+Δt), taking expectation on both sides, and applying Corollary III.1 of [16] together with the fact that the Wiener increments are independent of the state at time t, we obtain dH_{B∖A}/dt, and hence the information flow from A to B, T_{A→B} = dH_B/dt − dH_{B∖A}/dt. Likewise, we can obtain the information flow from subsystem B to subsystem A. These results are summarized in the following theorem, which gives T_{A→B} and T_{B→A} in closed form as Equations (19) and (20). When r = s = 1 and n = 2, Equation (20) reduces to precisely the same expression as Equation (15) in [18]; the same holds for Equation (19). These equations are hence verified. 
The following theorem forms the basis for causal inference: if the evolution of subsystem A (resp. B) does not depend on X_B (resp. X_A), then T_{B→A} = 0 (resp. T_{A→B} = 0). We only check the formula for T_{B→A}. In Equation (20), consider first the deterministic part. Now, F_A is independent of X_B, and so is ρ_A. Thus, we may integrate the expression within the parentheses directly with respect to X_B; by the compactness of ρ, the whole deterministic part hence vanishes. Likewise, it can be proved that the stochastic part also vanishes. This theorem allows us to identify causality with information flow. Ideally, if T_{B→A} = 0, then B is not causal to A, and vice versa; the same holds for T_{A→B}. □

3. Information Flow between Linear Subsystems and Its Estimation

Linear systems provide the simplest framework, usually taken as the first step toward a more generic setting. Simple as it may be, it has been demonstrated in practice that linear results often provide a good approximation to an otherwise much more complicated problem. It is hence of interest to examine this special case. Let F(X, t) = f + AX, where the vector f and the matrix A are constants. Additionally, suppose that the b_ij are constants—that is to say, the noises are additive. Then the g_ij are also constants, so the stochastic parts in both Equations (19) and (20) vanish. Since a linear system initialized with a Gaussian process will always be Gaussian, we may write the joint pdf of X in the Gaussian form, with population covariance matrix Σ. By the properties of the Gaussian distribution, the marginal pdfs of the subsystems are also Gaussian, with mean vectors and covariance matrices formed from the corresponding entries of the mean of X and of Σ. For easy correspondence, we augment these vectors and matrices so that their entries carry the same indices as their counterparts for the full state. Separate the resulting expression into two parts, corresponding to the respective parts in the two square brackets; by Theorem 2, one of the parts does not contribute to the causality from A to B, so we only need to consider the other in evaluating T_{A→B}. The second term in the bracket is straightforward. The first term involves entries of the inverse of the covariance matrix; since only the relevant components are in question, it equals the corresponding entry of the appropriate submatrix inverse, and, as Σ is symmetric, so is its inverse. Substituting back, we obtain a very simplified result for T_{A→B}. Likewise, T_{B→A} can also be obtained, as shown in the following theorem. Given a system such as Equations (1)–(3), we can thus evaluate, in a precise sense, the information flows among the components. Now, suppose that instead of the dynamical system, what we have are just n time series with K steps each. We can estimate the system from the series and then apply the information flow formulas to fulfill the task.
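For reference, the Gaussian closed forms invoked in the preceding derivation are the standard ones (textbook material, not specific to the present paper), with μ and Σ denoting the mean vector and covariance matrix:

```latex
\rho(\mathbf{X}) = \frac{1}{(2\pi)^{n/2}\,(\det\Sigma)^{1/2}}
  \exp\!\Big[-\tfrac{1}{2}\,(\mathbf{X}-\boldsymbol{\mu})^{\mathsf T}
  \Sigma^{-1}(\mathbf{X}-\boldsymbol{\mu})\Big],
\qquad
H = \tfrac{1}{2}\ln\!\big[(2\pi e)^{n}\det\Sigma\big].
```

The second identity is what reduces all the entropy tendencies of the linear case to expressions in the covariance entries.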
Assume a linear model as shown above. Following Liang (2014) [19], the maximum likelihood estimator (MLE) of A is equal to the least-squares solution of an over-determined problem formed from the time series, in which the time tendency of each series is approximated by the Euler forward difference, ẋ_{i,k} = (x_{i,k+1} − x_{i,k})/Δt (Δt being the time stepsize). Use an overbar to denote the time mean over the K steps. Writing the problem in matrix form and solving the normal equations, the solution is expressed through the sample covariance matrix C = (C_ij) of the series and the sample covariances C_{i,dj} between the series x_i and the difference series ẋ_j. Thus, finally, the MLE of T_{A→B} is given by Equation (28), and likewise the MLE of T_{B→A} by Equation (30), with their entries expressed in terms of the C_ij and C_{i,dj}. When r = s = 1 and n = 2, Equation (30) reduces to

T_{2→1} = (C11 C12 C_{2,d1} − C12² C_{1,d1}) / (C11² C22 − C11 C12²),

recovering the well-known Equation (10) in [19].
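The n = 2 special case recovered above is easy to implement. Below is a minimal Python sketch of the Liang (2014) estimator; the function name, the Euler forward differencing with unit lag, and the variable names are ours, not the paper's:

```python
import numpy as np

def info_flow_2d(x1, x2, dt=1.0):
    """MLE of the information flow T_{2->1} (from series x2 to series x1),
    in nats per unit time, following the n = 2 formula of Liang (2014).

    x1, x2 : equal-length 1-D arrays, evenly sampled with step dt.
    """
    dx1 = (x1[1:] - x1[:-1]) / dt          # Euler forward estimate of dx1/dt
    a, b = x1[:-1], x2[:-1]                # align states with the tendencies
    a0 = a - a.mean()
    b0 = b - b.mean()
    d0 = dx1 - dx1.mean()
    C11, C22 = np.mean(a0 * a0), np.mean(b0 * b0)
    C12 = np.mean(a0 * b0)                 # cov(x1, x2)
    C1d1 = np.mean(a0 * d0)                # cov(x1, dx1/dt)
    C2d1 = np.mean(b0 * d0)                # cov(x2, dx1/dt)
    num = C11 * C12 * C2d1 - C12**2 * C1d1
    den = C11**2 * C22 - C11 * C12**2
    return num / den
```

For a pair of series in which x2 linearly drives x1, |T_{2→1}| should come out well above the sampling noise level, while |T_{1→2}| stays near zero.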

4. Validation

4.1. One-Way Causal Relation

To see if the above formalism works, consider a six-dimensional vector autoregressive (VAR) process driven by independent noises. As schematized in Figure 1, the first three components and the last three components form two subsystems, written as X and Y, respectively. They are coupled only through the first and third components; more specifically, X drives Y through the one pair, and Y feeds back to X through the coupling at the other. The strengths of the couplings are determined by two parameters. In this subsection, the feedback parameter is set to zero, so the causality is one-way, i.e., from X to Y without feedback.
Figure 1

The preset coupling between the subsystems X and Y.

Initialized with random numbers, we iterate the process for 20,000 steps and discard the first 10,000 to form six time series of length 10,000 steps. Using the algorithm by Liang (e.g., [16,18,19,20]), the componentwise information flows between the coupled components can be obtained rather accurately. As shown in Figure 2a, the information flow/causality from X to Y increases with the coupling strength, and there is no causality the other way around, just as expected. Since no other coupling exists, one expects the bulk information flows to bear a similar trend. Using Equations (28) and (30), the estimates are indeed similar, as shown in Figure 2b. This demonstrates the success of the above formalism.
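The validation experiment can be mimicked along the following lines. The sketch below simulates a six-dimensional VAR with one-way coupling from the X side to the Y side and estimates all componentwise flows with a multivariate linear estimator of the Liang type, T_{j→i} = â_ij C_ij / C_ii, with â the least-squares drift estimate. The coupling coefficient, seed, and index placement are illustrative assumptions, not the paper's exact values:

```python
import numpy as np

def pairwise_info_flows(X, dt=1.0):
    """Matrix T with T[i, j] ~ information flow from component j to i,
    estimated as a_ij * C_ij / C_ii from the (K, n) data matrix X.
    (A linear, maximum-likelihood-style estimator; sketch only.)
    """
    dX = (X[1:] - X[:-1]) / dt             # Euler forward tendencies
    S = X[:-1] - X[:-1].mean(axis=0)       # centered states
    D = dX - dX.mean(axis=0)               # centered tendencies
    K = S.shape[0]
    C = S.T @ S / (K - 1)                  # sample covariance of the state
    Cd = S.T @ D / (K - 1)                 # Cd[k, i] = cov(x_k, dx_i/dt)
    A_hat = np.linalg.solve(C, Cd).T       # row i: regression coeffs a_i.
    T = A_hat * C / np.diag(C)[:, None]    # T[i, j] = a_ij * C_ij / C_ii
    np.fill_diagonal(T, 0.0)               # self-flows are not in question
    return T

# Illustrative 6-D VAR: components 0-2 form X, components 3-5 form Y;
# the only coupling is component 0 (in X) driving component 3 (in Y).
rng = np.random.default_rng(1)
A = 0.5 * np.eye(6)
A[3, 0] = 0.5                              # one-way X -> Y coupling
steps = 20000
Z = np.zeros((steps, 6))
for t in range(steps - 1):
    Z[t + 1] = A @ Z[t] + rng.normal(size=6)
T = pairwise_info_flows(Z[10000:])         # discard the spin-up half
```

The estimated |T[3, 0]| then dominates its reverse counterpart |T[0, 3]|, reproducing the one-way pattern of Figure 2a in spirit.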
Figure 2

The absolute information flows between the subspaces X and Y as functions of the coupling coefficient. (a) The componentwise information flows between the coupled components; (b) the bulk information flows between subsystems X and Y computed with Equations (28) and (30); (c) the information flows between the mean series of X and Y; (d) the information flows between the first principal components of X and Y, respectively (units: nats per time step).

Since, in practice, averages and principal components (PCs) have been widely used to characterize the variations of complex subsystems, we also compute the information flows between the mean series of X and Y, and those between the first PCs of X and Y. The results are plotted in Figure 2c,d, respectively. As can be seen, the principal component analysis (PCA) method works just fine in this case. By comparison, the averaging method yields an incorrect result. The incorrect inference based on averaging is to be expected. In a network with complex causal relations, say with a causality from one component to a second within a subsystem, averaging the second with the first is equivalent to mixing the first with a quantity related to its own future state; the mixed series is thereby related to contemporaneous states elsewhere in the network and hence yields a spurious causality. The PCA here functions satisfactorily, perhaps because, in selecting the most coherent structure, it discards most of the influences from other (implicit) time steps. However, the relative success of PCA may not be robust, as evidenced in the following mutually causal case.
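The two proxies examined here are simple to construct; a sketch follows (our own helper, with PC1 extracted from an eigendecomposition of the sample covariance). Either proxy series can then be fed to any pairwise information flow estimator for a comparison in the spirit of Figure 2c,d:

```python
import numpy as np

def subsystem_proxies(X):
    """Return the two common 1-D proxies of a (K, d) subsystem record:
    the componentwise mean series and the leading principal component."""
    mean_series = X.mean(axis=1)              # simple average proxy
    Xc = X - X.mean(axis=0)                   # center in time
    C = Xc.T @ Xc / (len(X) - 1)              # sample covariance (d x d)
    eigvals, eigvecs = np.linalg.eigh(C)      # eigenvalues in ascending order
    pc1 = Xc @ eigvecs[:, -1]                 # projection onto the leading EOF
    return mean_series, pc1
```

By construction, PC1 carries at least as much variance as the plain average (it maximizes variance over all unit-norm weightings, while the average uses fixed weights 1/d); yet, as the validation shows, carrying more variance does not guarantee preserving the causal structure.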

4.2. Mutually Causal Relation

If both of the coupling parameters are turned on, the resulting causal relations have a distribution on the parameter plane. Figure 3 shows the componentwise information flows in the two coupling directions (bottom and top) on this plane. The other two flows, i.e., their reverse counterparts, are, by computation, essentially zero. As argued in the preceding subsection, the bulk information flows should follow the general pattern, albeit perhaps in a coarser and/or milder fashion, since they are properties of the whole. This is indeed true. Shown in Figure 4 are the bulk information flows between X and Y computed using Equations (28) and (30).
Figure 3

The absolute componentwise information flows in the two coupling directions as functions of the two coupling coefficients. The units are in nats per time step.

Figure 4

The absolute bulk information flow from subsystem Y to subsystem X, and that from X to Y. The abscissa and ordinate are the two coupling coefficients, respectively.

Again, as usual, we try the averages and the first PCs as proxies for estimating the causal interaction between X and Y. Figure 5 shows the distributions of the information flows between the mean series of X and Y. The resulting patterns are totally different from what Figure 3 displays; obviously, they are incorrect.
Figure 5

As Figure 4, but for the information flows between the mean series of X and Y.

One might expect the PCA method to yield more reasonable causal patterns. We have computed the first PCs of X and Y, respectively, and estimated the information flows using the algorithm by Liang [20]. The resulting distributions, however, are no better than those obtained with the averaged series (Figure 6). That is to say, this seemingly more sophisticated approach does not recover the right interaction between the complex subsystems, either.
Figure 6

As Figure 4, but for the information flows between the first principal component of X and that of Y.

5. Summary

Information flow provides a natural measure of the causal interaction between dynamical events. In this study, the information flows between two complex subsystems of a large-dimensional system are studied, and analytical formulas have been obtained in closed form. For easy reference, the major results are summarized hereafter. For an n-dimensional stochastic dynamical system, if the probability density function (pdf) of the state is compactly supported, then the information flow from subsystem A, made of the first group of r components, to subsystem B, made of the following s components, and that from B to A, are given in closed form (in nats per unit time), where E signifies mathematical expectation. Given n stationary time series, T_{A→B} and T_{B→A} can be estimated; the maximum likelihood estimators under a Gaussian assumption are given in Equations (28) and (30). We have constructed a VAR process to validate the formalism. The system has a dimension of 6, with two subsystems, respectively denoted X and Y, each of dimension 3. X drives Y via the coupling at one component, and Y feeds back to X via another. The detailed, componentwise causal relations can easily be found using our previous algorithms, such as that in [20]. It is expected that the bulk information flow should in general follow a similar trend, though the structure could be coarser and milder, as what is displayed is now an overall property. The above formalism does yield such a result. In contrast, the commonly used proxies for subsystems, such as averages and principal components (PCs), generally do not work. In particular, the averaged series yield wrong results in both cases considered in this study; the PC series do not work for the mutually causal case either, though they result in a satisfactory characterization in the case with one-way causality. The result of this study is applicable to many real-world problems.
As explained in the Introduction, it will be of particular use in the related fields of climate science, neuroscience, financial economics, fluid mechanics, etc. For example, it helps clarify the role of greenhouse gas emissions in bridging the climate system and the socioeconomic system (see a review in [26]). Likewise, the interaction between the earth system and public health [27] can also be studied. In short, it is expected to play a role in the frontier field of complexity, namely, multiplex networks or networks of networks (see the references in [28,29,30]). We are therefore working on these applications.

1.  Nonlinear PCA: characterizing interactions between modes of brain activity.

Authors:  K Friston; J Phillips; D Chawla; C Büchel
Journal:  Philos Trans R Soc Lond B Biol Sci       Date:  2000-01-29       Impact factor: 6.237

2.  Information transfer between dynamical system components.

Authors:  X San Liang; Richard Kleeman
Journal:  Phys Rev Lett       Date:  2005-12-08       Impact factor: 9.161

3.  Information flow within stochastic dynamical systems.

Authors:  X San Liang
Journal:  Phys Rev E Stat Nonlin Soft Matter Phys       Date:  2008-09-10

4.  Dynamic causal modelling.

Authors:  K J Friston; L Harrison; W Penny
Journal:  Neuroimage       Date:  2003-08       Impact factor: 6.556

5.  Unraveling the cause-effect relation between time series.

Authors:  X San Liang
Journal:  Phys Rev E Stat Nonlin Soft Matter Phys       Date:  2014-11-24

6.  Functional connectivity: the principal-component analysis of large (PET) data sets.

Authors:  K J Friston; C D Frith; P F Liddle; R S Frackowiak
Journal:  J Cereb Blood Flow Metab       Date:  1993-01       Impact factor: 6.200

7.  Generalised filtering and stochastic DCM for fMRI.

Authors:  Baojuan Li; Jean Daunizeau; Klaas E Stephan; Will Penny; Dewen Hu; Karl Friston
Journal:  Neuroimage       Date:  2011-02-17       Impact factor: 6.556

8.  On nodes and modes in resting state fMRI.

Authors:  Karl J Friston; Joshua Kahan; Adeel Razi; Klaas Enno Stephan; Olaf Sporns
Journal:  Neuroimage       Date:  2014-05-24       Impact factor: 6.556

