
Modes of information flow in collective cohesion.

Sulimon Sattari1, Udoy S Basak1,2, Ryan G James3,4, Louis W Perrin1,5, James P Crutchfield4, Tamiki Komatsuzaki1,6,7.   

Abstract

Pairwise interactions are fundamental drivers of collective behavior and are responsible for group cohesion. The abiding question is how each individual influences the collective. However, time-delayed mutual information and transfer entropy, commonly used to quantify mutual influence among aggregated individuals, can yield misleading interpretations. Here, we show that these information measures have substantial pitfalls in measuring information flow between agents from their trajectories. We decompose the information measures into three distinct modes of information flow to expose the role of individual and group memory in collective behavior. We find that the decomposed information modes between a single pair of agents reveal the nature of mutual influence involving many-body nonadditive interactions without conditioning on additional agents. The pairwise decomposed modes of information flow facilitate an improved diagnosis of mutual influence in collectives.

Entities:  

Year:  2022        PMID: 35138896      PMCID: PMC8827646          DOI: 10.1126/sciadv.abj1720

Source DB:  PubMed          Journal:  Sci Adv        ISSN: 2375-2548            Impact factor:   14.136


INTRODUCTION

Coherent collective behavior fascinates us when a global pattern emerges from individuals who share information with others only in their local vicinity. Decisions made by one individual apparently cascade throughout the entire group. That said, not all members have the same influence. The challenge in explaining these emergent behaviors is to infer the underlying relationships among individuals from observations. This challenge has, expectedly, attracted many researchers over the decades to the diagnosis of collective behaviors in a variety of systems (–). In epithelial Madin-Darby canine kidney monolayers, for example, collective cell migration is triggered by multicellular protrusions, which form fingerlike structures (–). Photoablating a group of cells from the fingertip makes the remaining cell group lose its sense of direction. The interpretation is that the former and latter cells have acted as if they were “leaders” and “followers.” This functional assignment can be made because of the cells’ spatial location along the protrusion and because the two cell classes are genetically distinct (). Understanding the relationships between leaders and followers—even defining what those roles mean ()—is very difficult, especially when probing the mechanisms that cause the dynamical behaviors of aggregated agents. Beyond cells, these basic questions also apply to bird flocking (), fish schooling (, ), caribou migration (), and baboon foraging (). A leader agent is often defined as an individual that influences others more than others influence it. That is, the role is fundamentally asymmetric. Previous studies (, , –) proposed that, under this definition, pairwise analysis of trajectories can assign leaders and followers under the working hypothesis that a change in motion of the leader forecasts a change in motion of the follower. From this, one interprets the change in the leader’s motion as a candidate cause that triggers the motion of followers.
Various statistical quantities are used to infer causal relationships (). In pigeon flocks, for example, time-delayed correlation between the orientation of individuals at one time instance and the orientation of others at previous times reveals a hierarchical leadership structure and also provides a method to quantify the time scale of influence (). In such a case, the motion of one pigeon is correlated with the past motion of another. Granger causality () is seen as an improvement over time-delayed correlation, as it quantifies the predictability of the current state of a variable based on knowledge of a variable at a previous time. Time-delayed correlation and Granger causality, however, both assume linear relationships between variables, which generally do not hold. More recent studies argued that information-theoretic quantities—transfer entropy (TE), time-delayed mutual information (TDMI), and causation entropy—are superior when quantifying influence since they naturally accommodate the highly nonlinear nature of multiagent systems (, –, –). In practice, one must consider the potential for misclassifying influence when using information-theoretic methods. Carefully considering the definitions of information-theoretic quantities—such as TE or TDMI—further illuminates the types of influence a particular individual has. As pointed out by Schreiber () in introducing TE, TDMI reports a nonzero value between the present of a stochastic variable X and the future of a stochastic variable Y even when X has no direct influence on Y. This implies that it cannot be directly used to infer the underlying mutual influence among individuals. It also includes additional information not intrinsically coming from X. TE from X to Y, in contrast, computes the reduction of uncertainty about Y’s future while knowing X’s present, conditioned on Y’s present. Recently though, the study of James et al. () showed that, paralleling TDMI, TE incorporates additional, unwarranted information, namely, the reduction of uncertainty about Y that occurs by knowing the present state of X and Y simultaneously. This information is extraneous to determining “flow” and, misleadingly, adds to the desired information: intrinsic flow from X to Y. In this view, TE decomposes into two distinct modes of information flow—intrinsic and synergistic (). In addition to the above drawbacks, any measure of causality or information flow suffers from the problem of hidden variables, for example, when an outsider influences two agents concurrently. By observing the two agents alone, one may spuriously infer a causal relationship between them. One can condition on other agents by computing causation entropy, which is an extension of TE that accounts for the effects of additional agents (). The difficulty with this approach is that the dimension of the probability distribution required for computing the measure grows exponentially with the number of additional variables, which may require infeasibly large amounts of data to properly sample. Pairwise interactions are fundamental to information theory’s development of input-output (“two-port”) communication channels (). Hence, they provide a primary statistical tool that, as we show, makes it possible to infer the underlying influences among individuals. To obtain maximum insight into the mechanisms underlying multiagent systems, the following focuses on decomposing TE and TDMI into James et al.’s () three fundamental modes of information flow—termed intrinsic, shared, and synergistic information flows. The results demonstrate how the decomposed elemental information flows shed light on the influences that drive leader-follower relationships.
This work augments the previous work in () by exhibiting previously unidentified features that emerge from the mode decomposition of the most commonly used information measures, such as extracting the nature of multiagent interactions from only pairwise trajectories, and by providing new interpretations of the modes of information flow that support a more solid basis for causal inference than TE. As an illustrative vehicle, we use a generalized Vicsek model () with two additional features: (i) tunable influence weight of one particle over another (i.e., leaders have larger influence) and (ii) particle memory. We show that, by analyzing the effects of (i) and (ii) on the three modes of information flow, intrinsic information flow exists whenever the motion of an agent depends on another with nonzero weight, as do TE and TDMI. Shared and synergistic information, however, may or may not occur depending on the settings of (i) and (ii), as we show below. These results extend previous studies on modes of information flow in finite-state hidden Markov models by introducing Vicsek models that are fundamental to understanding collective motion. In addition, the results demonstrate that decomposing pairwise information flow between two agents (e.g., leader and follower) enables us to differentiate the underlying interaction pattern driven by additional agents (e.g., followers) when more than two agents interact, and also allows us to probe the effect of agent memory on the different modes of information flow and their role in collective behavior.

Background: Measuring causal influence

We now review several information-theoretic measures of statistical interdependence—measures that have been offered up as ways to detect causal influence. With these in hand, we turn to explore how useful (or not) they are in analyzing leader-follower relationships.

Detecting causal influence via TDMI and TE

Our definition says that leaders are, on average, more influential than followers. As a consequence of this asymmetry, a follower’s behavior is affected by the leaders, but with a time delay. To quantify the degree of causal influence between random variables, measures from information theory, such as TDMI and transfer (conditional) entropy, have been introduced. As they make no assumption about the functional relationship between variables, they improve on the more commonly used measures of time-delayed correlation () and Granger causality (), which can capture only linear functional relationships. Consider two stationary stochastic processes X = (…, x(t − 1), x(t), x(t + 1), …) and Y = (…, y(t − 1), y(t), y(t + 1), …) with probability mass functions p(x) = Pr {X = x} and p(y) = Pr {Y = y}, respectively. TDMI from X to Y with time delay τ is given by ()

M(τ) = H(Y(t + τ)) − H(Y(t + τ)∣X(t))      (1)

where H(Y(t + τ)) and H(Y(t + τ)∣X(t)) are the Shannon entropy and conditional entropy, respectively. They measure, in turn, the uncertainty in Y’s future and the uncertainty in Y’s future remaining given X’s present. In other words, being their difference, the mutual information M(τ) monitors the reduction in uncertainty in Y’s future from knowing X at a time t. Since mutual information is symmetric, this is also the reduction of uncertainty in X’s present from knowing Y at the future time t + τ. The symmetry, though, means that it cannot be used to infer causal influence since, by assumption, the future cannot influence the present. In addition, TDMI has another, perhaps more subtle, drawback: When predicting influence, it can be nonzero when two variables have shared history (). That is, M(τ) > 0 may hold when variable Y is not directly influenced by variable X but when either the X or Y dynamics contains memory of its past configurations. TE was introduced to overcome these shortcomings (): If X influences Y, then predicting Y’s future becomes easier after knowing the present of both X and Y, compared to only knowing Y’s present.
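As a concrete illustration, Eq. 1 can be estimated from two trajectories with a simple plug-in (histogram) estimator. The sketch below is ours, not the authors’ code; the discretization via equal-width bins and all function names are illustrative choices.

```python
import numpy as np

def entropy(counts):
    """Shannon entropy (bits) of an empirical distribution given raw counts."""
    p = np.asarray(counts, dtype=float)
    p = p[p > 0] / p.sum()
    return float(-(p * np.log2(p)).sum())

def tdmi(x, y, tau=1, bins=8):
    """Plug-in estimate of time-delayed mutual information
    M(tau) = H(Y(t+tau)) - H(Y(t+tau) | X(t)) from two scalar time series."""
    x, y = np.asarray(x), np.asarray(y)
    # discretize X's present and Y's future into equal-width bins
    xs = np.digitize(x[:-tau], np.histogram_bin_edges(x, bins)[1:-1])
    ys = np.digitize(y[tau:], np.histogram_bin_edges(y, bins)[1:-1])
    joint = np.zeros((bins, bins))
    for a, b in zip(xs, ys):
        joint[a, b] += 1
    h_x = entropy(joint.sum(axis=1))   # H(X(t))
    h_y = entropy(joint.sum(axis=0))   # H(Y(t+tau))
    h_xy = entropy(joint.ravel())      # H(X(t), Y(t+tau))
    return h_x + h_y - h_xy            # M = H(X) + H(Y') - H(X, Y')
```

Applied to a pair in which y simply copies x with a one-step lag, the estimate approaches 1 bit; for independent series it stays near zero (up to finite-sample bias).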
The TE from X to Y takes the form of conditional mutual information

T(τ) = H(Y(t + τ)∣Y(t)) − H(Y(t + τ)∣X(t), Y(t))      (2)

That is, T(τ) is the TDMI between Y at time t + τ and X at time t, conditioned on Y at time t. It is the same as subtracting the uncertainty remaining in Y at time t + τ given both X and Y at the present time t from that remaining in Y given Y alone. The difference corresponds to the uncertainty in Y reduced by knowing X in addition to knowing Y. T has become one of the standard methods for measuring statistical influence in classifying leaders and followers (, , –, , ). More broadly, since it improves upon TDMI for quantifying asymmetric relationships, it has become a standard for inferring causal relationships in many areas of science, including neuroscience (, –), chemistry (), human behavior (, ), and Earth systems (–). We note, however, that, like correlation, information-theoretic quantities such as TDMI and TE are not sufficient in themselves to identify causality. The latter also requires accounting for the influence of latent or hidden variables. TDMI does not condition on the present or past time steps, and the history conditioned on in computing TE is finite in time length (Y(t) in Eq. 2). Therefore, each variable’s history may act like a hidden variable that influences outcomes. This, in turn, can lead to spurious effects when estimating information flow, as we will elucidate shortly. Recently, it was demonstrated for a simple binary system that T(τ) > 0 can occur although knowledge of X alone cannot reduce the uncertainty in Y (). It was pointed out that, in addition to information intrinsic to reducing uncertainty in Y, which comes from knowing X independent of Y, TE from X to Y includes information that reduces the uncertainty in Y by knowing X and Y simultaneously ().
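For symbolic (already discretized) series, Eq. 2 can be estimated with the same plug-in approach via the entropy identity T = H(Y′, Y) + H(Y, X) − H(Y′, Y, X) − H(Y). This is a minimal sketch with our own naming; it assumes integer-valued inputs.

```python
import numpy as np
from collections import Counter

def transfer_entropy(x, y, tau=1):
    """Plug-in estimate of T(tau) = H(Y(t+tau)|Y(t)) - H(Y(t+tau)|X(t),Y(t))
    for symbolic (integer-valued) time series x, y."""
    x, y = np.asarray(x), np.asarray(y)
    trip = Counter(zip(y[tau:], y[:-tau], x[:-tau]))   # (y_future, y_now, x_now)
    n = sum(trip.values())
    pair_yy, pair_yx, marg_y = Counter(), Counter(), Counter()
    for (yf, yn, xn), c in trip.items():
        pair_yy[(yf, yn)] += c   # (y_future, y_now)
        pair_yx[(yn, xn)] += c   # (y_now, x_now)
        marg_y[yn] += c          # y_now
    def H(counter):
        return -sum((c / n) * np.log2(c / n) for c in counter.values())
    # T = H(Y',Y) + H(Y,X) - H(Y',Y,X) - H(Y)
    return H(pair_yy) + H(pair_yx) - H(trip) - H(marg_y)
```

For a follower that deterministically copies an i.i.d. leader with a one-step lag, the estimate approaches 1 bit in the leader-to-follower direction and nearly zero in the reverse direction.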
Here, intrinsic information flow is the additional reduction in uncertainty in Y’s future gained from knowing X’s present compared to the reduction from knowing Y’s present alone or knowing the present of X and Y simultaneously. Intrinsic mutual information (IMI) was proposed (, ) as a measure that avoids including influence that comes from both the present of X and Y when predicting Y’s future. Note that, while IMI is a specific quantity and not synonymous with intrinsic information flow, it can be seen as an attempt to compute intrinsic information flow between two variables.

Diagnosing causal influence via intrinsic, shared, and synergistic information

Intrinsic information flow from X to Y is regarded as the information flowing from the present of X (X(t)) to the future of Y (Y(t + τ)) that cannot be attributed in any way to the present of Y (Y(t)), in contrast to TE T, which is attributed in part to Y(t). Independently, the secret key agreement rate S(A; B∣C) is a key concept of cryptography, which quantifies the rate of secret (information) shared between two stochastic variables A (say, Alice) and B (Bob) free from a third variable C (Eve), regarded as an “eavesdropper” (see section S3 and fig. S15) (, ). By replacing A, B, and C with X(t), Y(t + τ), and Y(t) in the secret key agreement rate, the amount of information flowing intrinsically from X to Y is equal to the secret key agreement rate S(X(t); Y(t + τ)∣Y(t)) (defined in section S3) (). Since S(X(t); Y(t + τ)∣Y(t)) is not computable in practice, IMI I is used as a (workable) upper bound on it to monitor information that flows intrinsically from X to Y. The amount of IMI communicated from process X to another process Y is the infimum taken over all possible conditional distributions p(ȳ∣y) ()

I(τ) = inf over p(ȳ∣y) of I(X(t); Y(t + τ)∣Ȳ(t))      (3)

where I(·; ·∣·) on the right-hand side denotes conditional mutual information. Here, Ȳ is an auxiliary variable used to realize the upper bound of S(X(t); Y(t + τ)∣Y(t)). It satisfies the Markov property (X(t), Y(t + τ)) → Y(t) → Ȳ(t); that is, Ȳ(t) depends on X(t) and Y(t + τ) only through Y(t). Here, A → B signifies that B depends only on A, and the infimum is taken over all possible conditional distributions p(ȳ∣y). I represents the uncertainty reduction in Y’s future that comes from knowing only X’s present as much as possible under the assumption of the Markov property with respect to Y and Ȳ, and it has been found to be a convenient and accurate bound on the secret key agreement rate S(X(t); Y(t + τ)∣Y(t)) (≤ I) (, ). The intuitive explanation as to why the IMI forms an upper bound on the secret key agreement rate is as follows: Alice (A) and Bob (B) cannot be said to hold any information secretly if it is all accessible to Eve (C), including their secret key to decrypt their public communication (see also section S3).
How does Eve infer what information Alice and Bob exchange solely from the knowledge of communication between Alice and Bob? Suppose that Eve is not restricted to using solely the value c of the stochastic variable C she observes at each step but rather any possible statistical transformation of the variable C represented by C̄ ∼ p(C̄∣C)p(C) with an auxiliary variable C̄ [note that the choice p(C̄∣C) = δ (δ: Kronecker delta) corresponds to using the original C in the inference]. Said another way, Eve has access to knowledge regarding A and B via C plus any manipulation or transformation of C, and so the secret key agreement rate between Alice and Bob is certainly bounded from above by the conditional mutual information I(A; B∣C̄), where C̄ is the transformation of C that minimizes the conditional mutual information between A and B. How does this intuition translate to the information flow setting? Intrinsic information flow from X to Y is a very restricted concept, that is, it is the information flowing from the present of X (X(t)) to the future of Y (Y(t + τ)) that is free from the present of Y (Y(t)). So, in a very similar sense as above, if there is some manipulation of Y(t) (i.e., Ȳ(t)) that reduces the information between X(t) and Y(t + τ), then that reduced portion is not expected to be attributed to the present of Y (since it was inferred via the present of Y). It is in this sense that using the secret key agreement rate helps us quantify intrinsic information flow. The following relations delineate the importance of I(τ) and its relationship to S(X(t); Y(t + τ)∣Y(t)), T(τ), and M(τ)

S(X(t); Y(t + τ)∣Y(t)) ≤ I(τ)      (4)

I(τ) ≤ T(τ)      (5)

I(τ) ≤ M(τ)      (6)

In effect, Eqs. 4 to 6 demonstrate that IMI can be used to compute bounds on the deviations of M(τ) and T(τ) from S(X(t); Y(t + τ)∣Y(t)). That is, whenever equality does not hold in Eq. 5 (Eq. 6), there must be a portion of T(τ) (M(τ)) that is not intrinsically coming from X. From here on, we set τ = 1 and omit τ from equations, as it has been shown that τ = 1 best captures the information flow between two particles in the Vicsek model ().
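For small alphabets, the infimum over channels p(ȳ∣y) can be approximated directly by a grid search. The sketch below is our own illustration, not the authors’ estimator: it assumes binary variables, restricts the auxiliary Ȳ to two symbols, and takes the joint distribution as an array p[x, y′, y].

```python
import numpy as np
from itertools import product

def cond_mi(pxyz):
    """I(X; Y | Z) in bits from a joint probability array p[x, y, z]."""
    pz = pxyz.sum(axis=(0, 1))
    out = 0.0
    for z in range(pxyz.shape[2]):
        if pz[z] <= 0:
            continue
        pxy = pxyz[:, :, z] / pz[z]               # p(x, y | z)
        px = pxy.sum(axis=1, keepdims=True)
        py = pxy.sum(axis=0, keepdims=True)
        mask = pxy > 0
        out += pz[z] * (pxy[mask] * np.log2(pxy[mask] / (px @ py)[mask])).sum()
    return out

def intrinsic_mi(pxyz, grid=41):
    """Grid-search approximation of IMI: minimize I(X; Y' | Ybar) over
    binary channels p(ybar | y).  Assumes pxyz is indexed [x, y_future,
    y_present] with two symbols per axis."""
    best = np.inf
    ts = np.linspace(0, 1, grid)
    for a, b in product(ts, ts):
        K = np.array([[a, 1 - a], [b, 1 - b]])    # rows: y, cols: ybar
        p_bar = np.einsum('xyz,zw->xyw', pxyz, K) # pass Y(t) through channel
        best = min(best, cond_mi(p_bar))
    return best
```

Running this on the purely synergistic joint Y(t + 1) = X(t) XOR Y(t) (all inputs uniform) gives a conditional mutual information, and hence a TE, of 1 bit, while the minimization drives the IMI to zero: the textbook case in which TE reports flow that is not intrinsic.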
Once I has been determined, the shared information σ and the synergistic information S follow immediately by subtracting it from TDMI (M) and TE (T), respectively

σ = M − I      (7)

S = T − I      (8)

From Eqs. 4 to 6, one sees that

σ ≥ 0      (9)

S ≥ 0      (10)

Note that I monitors information coming (mostly) from X alone to Y, since IMI provides an upper bound on S(X(t); Y(t + τ)∣Y(t)). Because of this, S > 0 (σ > 0) implies that T (M) contains information that comes from Y’s present and that T (M) should not be interpreted as information flowing only from X to Y in those cases. In the following, on the basis of I being (mostly) information coming only from X to Y, σ is the part of M that comes from knowing both variables. On postulating that it is the information redundant in both X and Y, we refer to it as shared information. Similarly, S is the part of T that comes from knowing both variables; since it arises from simultaneously knowing X and Y at present, we call it synergistic information (). Together, I, σ, and S reveal a much more detailed decomposition of the relationship between X and Y than can be inferred from T or M alone. To ground this information-theoretic setting, the following shows, using a modified Vicsek model of collective behavior, that T and M can result in misleading interpretations concerning the actual underlying relationships among individuals. We go on to propose that I, σ, and S provide a firmer interpretation of the relationship without requiring additional experiments.
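Once M, T, and I are in hand, Eqs. 7 and 8 reduce to two subtractions; the helper below (names are ours) just makes the bookkeeping explicit.

```python
def decompose(M, T, I):
    """Split TDMI (M) and TE (T) into shared and synergistic parts using the
    intrinsic flow I:  sigma = M - I  (Eq. 7),  S = T - I  (Eq. 8)."""
    sigma = M - I     # shared: part of M not intrinsic to X alone
    synergy = T - I   # synergistic: part of T not intrinsic to X alone
    return sigma, synergy
```

By Eqs. 9 and 10, both returned values should be nonnegative whenever M, T, and I are consistently estimated; a negative value signals estimation error.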

RESULTS

With this background, we now turn to diagnose interactions between individuals in a collective system and between individuals and the collective.

Modified Vicsek model

To demonstrate the interpretability of the informational modes I, σ, and S, we introduce a series of augmented Vicsek models. These extend the original () with asymmetric interactions and the ability to turn on and off the dependence on the present dynamics of interacting particles. Consider N particles lying within a square box of side length L with periodic boundary conditions. Particle i’s position ri(t) at time t is updated over time increment Δt according to

ri(t + Δt) = ri(t) + vi(t)Δt      (11)

where vi(t) denotes particle i’s velocity at time t and i = 1,2, …, N. For simplicity, particles have uniform constant speed v0, and only their orientations θ change. Particle orientation is updated at each time step by taking the weighted average of the velocities of neighboring particles within a given radius R

θi(t + Δt) = arg[∑′j wji vj(t)] + Δθi      (12)

where ∑′ runs over all particles j (including i itself) within the circle of radius R centered at ri(t), and w is a nonnegative asymmetric matrix whose wij element determines the interaction strength that particle i exerts on particle j. In our setting, wij > wji whenever particle i is a leader and particle j is a follower. To model thermal noise, Δθi is a random number uniformly distributed in the range [−η0/2, η0/2] and is chosen independently for each particle i at each time step. In the original model (), the right-hand side of Eq. 12 ensured that θi(t + 1) resulted from the configurations of all the particles j (including that of the same particle i) within the circle of radius R centered at ri(t). Now, consider modified dynamics that modulate the dependence on θi(t) associated with follower-leader interactions that determine θi(t + 1): The leader influences the follower, but the follower does not influence the leader; i.e., wLF > 0, while wFL = 0. To graphically understand our models, we show Fig. 1 (Aa to Da), where each graph depicts one possible interaction protocol, in a simple two-particle system, that determines θ(t + 1); L and F denote leader and follower, respectively. There, for A and B each standing for L or F, A → B signifies that A’s present state influences B’s future state. We vary wLF ∈ [1,10].
We set wLL = 1 and wFF = 1 for the models in which the present state of L (F) influences the future state of L (F). For models in which the present does not influence the future for the same particle (F or L)—see Fig. 1Aa (i.e., L’s and F’s dynamics), Ba (L’s), and Ca (F’s)—we replace θi(t), as it appears in computing θi(t + 1) (see the ∑′ term in Eq. 20), by a random number in the interval [0,2π] to erase any influence of θi(t)’s present. Note that the value of wLL (wFF) is inconsequential when the dynamics of the leader (follower) do not depend on its present state, since the self term then depends solely on a random number in the interval [0,2π]. In type A, neither the future dynamics of L nor that of F depends on its present (Fig. 1, Aa). In type B, only the future dynamics of F depends on its present (Fig. 1, Ba). In type C, only the future dynamics of L depends on its present (Fig. 1, Ca). In type D, both L’s and F’s future dynamics depend on their present (Fig. 1, Da). To further examine the effects of the history of L in types C and D, we also introduced interaction types C′ and D′, which are the same as interaction types C and D, respectively, except that the future state of L depends on its present only when time step t is even, and L “forgets” its present in its future dynamics, as in types A and B, whenever t is odd. In types C′ and D′, the dependence of the future of F on its present is not changed; that is, the future of F does not depend on its present in type C′, and the future of F always depends on its present in type D′, regardless of the value of t.
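One update of this class of models can be sketched as follows. This is our own reconstruction under stated assumptions, not the authors’ implementation: w[i, j] is the weight of particle i’s influence on particle j, the boolean remember[i] implements the memory-erasing rule by replacing particle i’s own present angle with a uniform random angle in its own update only, and the parameter defaults (v0, L, R, eta0) are illustrative values.

```python
import numpy as np

def step(theta, pos, w, remember, v0=0.03, L=5.0, R=1.0, eta0=0.1, rng=None):
    """One orientation/position update of a weighted Vicsek-type model.
    theta: (N,) orientations; pos: (N, 2) positions; w[i, j]: influence of i
    on j; remember[i]: if False, i's own present angle is replaced by a
    uniform random angle in its own update (memory erased)."""
    rng = rng or np.random.default_rng()
    N = len(theta)
    eff = theta.copy()
    eff[~remember] = rng.uniform(0, 2 * np.pi, int((~remember).sum()))
    new = np.empty(N)
    for j in range(N):
        d = pos - pos[j]
        d -= L * np.round(d / L)                  # periodic boundary conditions
        near = (d ** 2).sum(axis=1) <= R ** 2     # neighbours within radius R
        ang = theta.copy()
        ang[j] = eff[j]                           # possibly erased self term
        vx = (w[near, j] * np.cos(ang[near])).sum()
        vy = (w[near, j] * np.sin(ang[near])).sum()
        new[j] = np.arctan2(vy, vx) + rng.uniform(-eta0 / 2, eta0 / 2)
    pos = (pos + v0 * np.c_[np.cos(new), np.sin(new)]) % L
    return new, pos
```

With wLF large and wFL = 0, a follower's heading is pulled toward the leader's in a single step, while the leader is unaffected, which is the asymmetry the informational analysis is meant to detect.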
Fig. 1.

Graph representation of interaction types A, B, C, and D and the corresponding Venn diagrams representing information flow.

Areas of the circles are computed by integrating M and T over η0 ranging from 0 to 2π and wLF ranging from 1.0 to 10.0. The areas of the red circles and the white striped circles in the Venn diagrams are equal to the (integrated) TE ∫T(η0, wLF) dη0 dwLF and TDMI ∫M(η0, wLF) dη0 dwLF, respectively. The centers of the two circles are determined as follows: First, the two centers are placed on a horizontal line with the circles not overlapping [the TE (TDMI) circle is located at the left (right)], and then a binary search algorithm is used to find, by decreasing the distance between the centers, the placement of the circles whose overlapping area is equal to the IMI ∫I(η0, wLF) dη0 dwLF. The part of the red (white striped) circle not overlapping with the white striped (red) circle has area equal to the synergistic information ∫S(η0, wLF) dη0 dwLF (the shared information ∫σ(η0, wLF) dη0 dwLF) (see legend). (Aa) Type A. (Ba) Type B. (Ca) Type C. (Da) Type D. (Ab to Db) Venn diagrams from leader to follower for interaction types A to D. The information flows from follower to leader for types A and B are negligible and therefore are not shown. (Cc to Dc) Those from follower to leader for interaction types C and D. (Cd to Dd) Those from leader to follower for interaction types C′ and D′. (Ce to De) Those from follower to leader for interaction types C′ and D′.
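The caption's circle-placement construction (fix the two areas, then binary-search the center distance until the lens-shaped overlap equals the IMI) can be sketched with the standard circle-circle intersection formula. Function names are ours, and this is an illustration of the construction rather than the authors' plotting code.

```python
import math

def lens_area(r1, r2, d):
    """Overlap area of two circles with radii r1, r2 and centres d apart."""
    if d >= r1 + r2:
        return 0.0                                 # disjoint
    if d <= abs(r1 - r2):
        return math.pi * min(r1, r2) ** 2          # full containment
    a1 = r1**2 * math.acos((d*d + r1*r1 - r2*r2) / (2*d*r1))
    a2 = r2**2 * math.acos((d*d + r2*r2 - r1*r1) / (2*d*r2))
    tri = 0.5 * math.sqrt((-d+r1+r2)*(d+r1-r2)*(d-r1+r2)*(d+r1+r2))
    return a1 + a2 - tri

def centre_distance(area1, area2, overlap, iters=60):
    """Binary-search the centre distance at which circles of the given areas
    overlap by exactly `overlap` (the caption's construction)."""
    r1, r2 = math.sqrt(area1/math.pi), math.sqrt(area2/math.pi)
    lo, hi = abs(r1 - r2), r1 + r2                 # full overlap .. tangent
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if lens_area(r1, r2, mid) > overlap:       # too much overlap: separate
            lo = mid
        else:                                      # too little: bring closer
            hi = mid
    return 0.5 * (lo + hi)
```

Because the lens area decreases monotonically in the center distance, the bisection converges to the unique placement whose overlap equals the target IMI.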

Informational modes between leaders and followers

Misinterpreting causal influence

Let us first examine the amounts of TDMI (M) and TE (T) shown in Fig. 1 for the different interaction types, integrated over ranges of both wLF and η0 (the landscapes of M and T as functions of wLF and η0 are shown in figs. S1 and S2). Here, the white circle with diagonal shading represents M, and the red circle represents T. The Venn diagrams graphically capture the relationship of the decomposed modes of information with TDMI and TE. The overlapping region between M and T, the part of M not overlapping with T, and the part of T not overlapping with M represent I, σ, and S, respectively, and are discussed in the following section. As expected, ML→F and TL→F in Fig. 1 (Ab to Db and Cd to Dd) exhibit substantial, nonzero values in all interaction types, since L influences F in all cases. Naively, one expects MF→L and TF→L to be zero for all cases since F does not influence L at all. However, there are spurious values of both MF→L and TF→L in Fig. 1 (Cc to Dc), as well as spurious values of TF→L alone in Fig. 1 (Ce to De). Such spurious amounts of TX→Y and MX→Y, arising even in cases where X does not influence Y, have been ignored by a large and growing body of research quantifying causal relationships. The following section elaborates on how decomposing M and T into I, σ, and S improves the interpretation of information flow, using the different interaction types as examples.

Modes of information flow

We now interpret, by Venn diagrams, the different modes of information flow integrated over ranges of wLF and η0 for each interaction type (Venn diagrams for analogous binary systems are shown in fig. S13 and described in section S3). The landscapes of I, σ, and S as functions of wLF and η0 are shown in figs. S3 to S5 for types A, B, C, and D and in figs. S6 and S7, respectively, for types C′ and D′. Since each interaction type corresponds to a special case of information flow, let us briefly examine each one. In type A, where neither L nor F depends on its present (Fig. 1, Aa), only IL→F and no other mode of information flow is observed, as seen in Fig. 1 (Ab). In this case, since M and T overlap completely, there is no nonoverlapping region of either M or T, signifying that I = M = T and σ = S = 0. Furthermore, all F to L information flow quantities are equal to zero; thus, the Venn diagram for type A from F to L is not shown. Although this may not be a realistic case in systems of interacting agents, since agents are likely to depend not only on each other but also on their own history, it is the only case demonstrated here in which either T or M accurately conveys causal relationships between agents. Figure 1 (Ba) represents type B, where only F depends on its present in the θ(t + 1) dynamics. IL→F and SL→F (and thus ML→F and TL→F) (Fig. 1, Bb) all increase compared to the case where F’s dynamics do not depend on its present (Fig. 1, Ab). This further emphasizes our point that dependence on the present state plays a key role in the calculation of information flow even when the interactions between individuals are not intrinsically changing. Notably, the red sliver of nonoverlapping region between M and T in Fig. 1 (Bb) shows the appearance of synergistic information SL→F, which denotes that part of TL→F is not intrinsically coming from L.
Since F depends on its present, simultaneous knowledge of the present states of F and L provides more predictive power than knowing the present state of either F or L alone. Also to be noted is that σL→F is equal to zero, which is due to L not imparting any of its history onto F. The F to L information flows in this case are again equal to zero, since the leader has no memory of its past history to share with the follower. Figure 1 (Ca) represents type C, where only L depends on its present. In this case, a substantial amount of σL→F appears because of the dependence of the future state of both L and F on the present state of L, as shown in Fig. 1 (Cb). L imparts information from its present onto the future dynamics of F, and meanwhile, this information is already contained in the future state of F because of the dependence of the future state of F on that same history (i.e., L’s present). In contrast to types A and B, there is a substantial amount of shared history between F and L, as shown in Fig. 1 (Cb). As in the binary system proposed by Kaiser and Schreiber (), which has the same graph representation as Fig. 1 (Ca), there exists a substantial amount of MF→L although there is no direct interaction in that direction. By decomposing MF→L into IF→L and σF→L, we quantitatively show that the spurious amount of MF→L comes solely from σL→F and is thus not intrinsically coming from F. TE T was introduced to reconcile this issue (). TF→L does in fact reduce the amount of apparent information flow in that direction in our model, given that TF→L is notably less than MF→L in Fig. 1 (Cc). Why, then, are TF→L and, more importantly, IF→L not equal to zero in type C (Fig. 1, Cc)? Surely, information is not intrinsically flowing from F to L in this case, since there is no direct link from F to L in Fig. 1 (Ca). The reason is that the history of L acts as a hidden variable, imparting information onto both L and F.
To verify this, we have introduced interaction type C′, which is the same as interaction type C (Fig. 1, Ca), except that the future state of L depends on its present only when time step t is even, and L forgets its present in its future dynamics when time step t is odd. Although the present state of L can still act as a hidden variable imparting information on both L and F, the computation of IF→L conditions on the present state of L and also minimizes the uncertainty coming from the present state of L as much as possible. Therefore, it is not possible for the history of L to have any influence on the value of IF→L in type C′ since it remembers at most one time step of its past history, which is being minimized in the computation of I. The Venn diagram from follower to leader in this case is shown in Fig. 1 (Ce). Note that TF→L still exists in this case but only in the form of synergistic information SF→L as IF→L = 0. Recall that the existence of SF→L, which is a part of TE, implies that simultaneous knowledge of the present states of L and F allows for an improvement on the prediction of the future of L compared to individual knowledge of the present states of L or F alone. Why does this happen under IF→L = 0? Here, we explain the origin of the existence of synergistic information intuitively. For interaction type C′, θF(t + 1) always results from θL(t) irrespective of the time step. In turn, the present configuration of θL(t) affects its future at every even time step t. However, θL(t + 1) is taken from [0,2π) randomly to reset its history every odd time step t, i.e., θF(t + 1) ≠ θL(t + 1). That is, simultaneous knowledge of the states of L and F reduces uncertainty about whether time step is even or odd, and, therefore, simultaneous knowledge of the present states of L and F improves the prediction power of the future of L compared to solely individual knowledge of the present state of L and F (see more in detail in section S2). Last, Fig. 
1 (Da) represents type D, which is perhaps the most intuitive case in typical systems, where both entities depend on their present state in the θ(t + 1) dynamics. M, T, and σ are greater than zero for similar reasons as in types B and C, for both leader to follower and follower to leader, and SL→F, but not SF→L, is greater than zero for similar reasons as well. Type D′ is the same as C′, except that F depends on its present dynamics, as in type D. As in type C′, the only information flow from F to L in type D′ is SF→L, for similar reasons. Thus, type D, representing the most typical multiagent systems, contains a rich profile of information flows, which we have explained by analyzing types A, B, C, C′, and D′.
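The synergy mechanism of type C′ can be checked numerically. Below is a minimal binary sketch (my own toy reduction, not the paper's angular model): F copies L's present state, while L keeps its own state only on even time steps and is reset randomly on odd ones. The transfer entropy TF→L, estimated by counting, is positive even though there is no F→L link, because knowing F(t) and L(t) together reveals the parity of the time step.

```python
# Toy binary version of interaction type C' (an illustrative sketch).
import random
from collections import Counter
from math import log2

def cond_mi(triples):
    """I(A; C | B) estimated by counting over (a, b, c) triples."""
    n = len(triples)
    pabc = Counter(triples)
    pb = Counter(b for _, b, _ in triples)
    pab = Counter((a, b) for a, b, _ in triples)
    pbc = Counter((b, c) for _, b, c in triples)
    return sum(k / n * log2(k * pb[b] / (pab[(a, b)] * pbc[(b, c)]))
               for (a, b, c), k in pabc.items())

random.seed(0)
T_len = 200_000
L = [random.randint(0, 1)]
F = [random.randint(0, 1)]
for t in range(T_len):
    F.append(L[-1])                                          # F always copies L
    L.append(L[-1] if t % 2 == 0 else random.randint(0, 1))  # L has memory only on even t

# Transfer entropy from F to L: I(F(t); L(t+1) | L(t)).
triples = [(F[t], L[t], L[t + 1]) for t in range(T_len)]
te_F_to_L = cond_mi(triples)
print(f"T(F->L) = {te_F_to_L:.3f} bits")  # positive despite no F->L link
```

For this toy process the exact value works out to about 0.12 bits, entirely synergistic in origin.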

Systems of more than two agents

The analysis up to now addressed only pairwise interactions. This is in accord with the theoretical basis of the information measures used; for example, T(τ) in Eq. 2. The measures generalize straightforwardly to account for additional time series, say, of a third particle (or agent); see, for example, the causation entropy (). Suppose that X, Y, and a third variable Z are each symbolized by m discrete values. Then, for example, the number of independent entries in the probability distribution p(Y, X, Z) is m³ − 1 (the −1 is due to probability normalization). The dimension of the probability distribution required for computing the measures thus grows exponentially with the number of additional variables conditioned on, which requires increasingly large amounts of data to sample properly. Therefore, in multiagent systems, it is usually not feasible to condition on all, or even a few, other agents that interact with a given agent. In addition, even if additional variables that indirectly affect the interactions between X and Y exist, it is nontrivial to look for this indirect "cause." These hidden variables may be another agent, some past memory of the process of X and/or Y longer than is taken into account in the computation of TE, or something else. Nonetheless, estimating two-agent information measures has proven useful for monitoring influence in systems having more than two agents (, , , , ). We will now show how measuring I, σ, and S gives marked improvements even in these admittedly approximate settings. Consider a collective in which L and F mutually interact with one another; under model A, followers also directly interact with each other, and under model B, they do not. See, for example, Fig. 2 for the case of three agents. In the following discussion, there is one leader agent, and the number of follower agents NF is varied.
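The exponential growth described above is easy to make concrete. A small sketch (function and variable names are mine), counting the m^k − 1 independent entries of a joint distribution over k variables with m symbols each:

```python
# Number of free parameters in a joint distribution over k variables,
# each taking m discrete values (minus 1 for normalization).
def joint_dim(m, k):
    return m ** k - 1

m = 6  # six orientation bins, as used in this paper's computations
for extra in range(4):   # extra agents conditioned on, beyond the pair
    k = 3 + extra        # (x(t), y(t), y(t+1)) plus the extra variables
    print(extra, joint_dim(m, k))
```

Each additional conditioned agent multiplies the number of entries to be sampled by m = 6, so the required data grows by roughly an order of magnitude per agent.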
L refers to the leader, and F refers to a particular follower.
Fig. 2.

Three-agent interaction diagrams.

(A) In model A, a leader influences both followers, both followers influence L, and followers influence each other. (B) Model B is similar to model A, but followers cannot influence each other. Weights are asymmetric: wLF (leader to followers) is greater than wFL and wFF. We set wLF = 4 and wFL = wFF = wLL = 1.

Figure 3 (A and B) displays ML→F for models A and B, respectively. The plots of σ (see fig. S9) are almost indistinguishable from those of M, indicating that a majority of M is actually coming from σ, which is due to shared history between L and F. As has been established for the Vicsek model (), cohesive behavior increases as a function of density. Here, ML→F and σL→F increase as a function of NF in model A. Model B, however, is not the same as the original Vicsek model in that followers do not interact with each other; therefore, ML→F and σL→F decrease as a function of NF, since the inclusion of additional agents that are not interacting decreases the overall cohesion between the present state of L and the future state of F. The plots of MF→L for models A and B are not shown, as they are not distinguishable by eye from those of ML→F (see fig. S8).
Fig. 3.

ML→F as a function of noise level η0 (in units of π radians) for models A and B with one leader.

Here, NF = 1 (blue), 3 (red), 7 (yellow), and 15 (purple), where the number of leaders is always one. (A) ML→F for model A. (B) ML→F for model B.

Figure 4 (A to D) shows TL→F as a function of η0 for model A, TF→L for model A, TL→F for model B, and TF→L for model B, respectively. At η0 = 0, agent movements quickly reach a regular parallel flow independent of initial coordinates and velocities; thus, any information about their present orientations is negligible (on average) in predicting the others' orientational motions (see movie S1). In practice, all agents are subject to finite noise from their environment (represented here by thermal fluctuation). The gradual decrease of T as η0 increases simply arises from this natural stochasticity. In both models A and B, there are small bumps in TL→F and TF→L at η0 ≃ 0.7π, but the notable difference is that the bumps clearly decrease as a function of NF from L to F and from F to L in model A (Fig. 4, A and B) and from F to L in model B (Fig. 4D), but not from L to F in model B (Fig. 4C).
Fig. 4.

T as a function of noise level η0 (in units of π radians) for models A and B with one leader.

Here, NF = 1 (blue), 3 (red), 7 (yellow), and 15 (purple), where the number of leaders is always one. (A) TL→F for model A. (B) TF→L for model A. (C) TL→F for model B. (D) TF→L for model B.

The existence of bumps in T at η0 ≃ 0.7π and the difference in their behavior between models A and B can be explained by decomposing T into I and S. Figure 5 (A to D) shows IL→F as a function of η0 for model A, IF→L for model A, IL→F for model B, and IF→L for model B, respectively. While the overall trend is very similar to that of T in Fig. 4, I does not contain bumps at η0 ≃ 0.7π. Thus, the bumps in T are explained solely by S. This suggests that when such a bump exists in the values of T as a function of noise, it may be attributed to a difference in the location of the peak of I and that of S. Furthermore, by looking at how this structure changes as a function of the number of following agents, one can deduce whether follower agents mutually interact (i.e., model A) or not (i.e., model B) solely by observing the pairwise trajectories between the leader and one follower. Similar results are also obtained with more simplified binary models (see section S2 and fig. S14). Figure 6 (A to C) shows SL→F as a function of η0 for model A, SF→L for model A, and SL→F for model B, respectively (SF→L for model B is indistinguishable by eye from SF→L for model A and is therefore not shown; see fig. S10). As an overall trend, S is negligibly small when η0 < 0.4π: since the configuration of F (L) does not vary much from time to time, the simultaneous knowledge of L and F does not decrease the uncertainty in F (L) more than knowing the current configuration of L or F alone. At intermediate values of η0, simultaneous knowledge of L and F becomes relatively more important, and at high values of η0, the simultaneous knowledge of L and F has no predictive power, as the dynamics are dominated by thermal noise.
As NF increases in model A, SL→F and SF→L both decrease, as the future configuration of F (L) depends on more other agents and relies less on the simultaneous knowledge of L and F. Therefore, increasing NF decreases the likelihood that simultaneously knowing the configurations of F and L has any additional predictive power on L or F. In model B, however, F is not affected by other followers, and therefore, SL→F remains largely unchanged as a function of NF.
Fig. 5.

I as a function of noise level η0 (in units of π radians) for models A and B with one leader.

Here, NF = 1 (blue), 3 (red), 7 (yellow), and 15 (purple), where the number of leaders is always one. (A) IL→F for model A. (B) IF→L for model A. (C) IL→F for model B. (D) IF→L for model B.

Fig. 6.

S as a function of noise level η0 (in units of π radians) for models A and B with one leader.

Here, NF = 1 (blue), 3 (red), 7 (yellow), and 15 (purple), where the number of leaders is always one. (A) SL→F for model A. (B) SF→L for model A. (C) SL→F for model B.

Now let us consider the case of multiple leaders, in which the motility of follower agents is subject to more than one leader. Figure 7G exemplifies the case of four agents including one leader, where the three followers can interact with one another (model A). Figure 7H exemplifies the same interaction type but with two leaders and two followers, where the leaders cannot interact with one another but the followers can. Graph representations of the cases where the leaders and followers can all mutually interact, where leaders can interact with one another but followers cannot, and where neither leaders nor followers can interact with one another are shown in fig. S11. The values of S in these cases are shown in fig. S12, and the results are discussed in section S1F. Here, we study the effect of increasing the number of leaders in model A, where leaders cannot interact with one another but followers can (see Fig. 7, G and H). Figure 7 (A and B) shows SL→F and SF→L, respectively, for the case of four agents, including one leader and three followers (blue) and two leaders and two followers (red). As one may expect, SL→F decreases as the number of leaders increases, since the dynamics of each follower results from the two leader agents, reducing the synergistic effect between a leader-follower pair in the prediction of the follower's motility. In the case of one leader and three followers, a follower is also subject to the interaction of an additional follower instead of the leader; however, since the weight of the follower is less, this does not reduce the synergistic effect as much as in the case of two leaders and two followers.
Counterintuitively, SF→L increases as the number of leaders increases, as shown in Fig. 7 (B and D). Note that, keeping the total number of agents fixed, there are fewer followers interacting with a given leader as we increase the number of leaders. This suggests that the synergistic effect SX→Y decreases as the weighted indegree of agent Y increases, or, in other words, as more agents "participate" in determining the future of the target agent Y. In Fig. 7 (C and D), where there is a total of eight agents, the same respective trends are seen as the number of leaders is increased; however, the overall values of SL→F and SF→L are lower than in Fig. 7 (A and B) because the higher number of agents reduces the synergistic effects. In Fig. 7 (E and F), we keep the number of followers fixed at three and increase the number of leaders. Here, there is no change in SF→L as the number of leaders increases, although the total number of agents increases, because the added agents do not increase the indegree of L.
Fig. 7.

S as a function of noise level η0 (in units of π radians) for model A with different numbers of leaders and followers.

(A and B) SL→F (A) and SF→L (B) as a function of η0 (in units of π radians) for four agents with one leader and three followers (blue) and two leaders and two followers (red). (C and D) SL→F (C) and SF→L (D) for eight agents with one leader and seven followers (blue), two leaders and six followers (red), three leaders and five followers (yellow), and four leaders and four followers (purple). (E and F) SL→F (E) and SF→L (F) with three followers and one leader (blue), two leaders (red), and three leaders (yellow). (G) Graph representation of model A, where there is one leader and three followers. (H) Graph representation of model A, where there are two leaders and two followers.


DISCUSSION

We investigated a series of model systems based on the Vicsek model of collective motion to explore the effect of interaction protocols on the distinct modes of information flow. In theory, one would condition on all variables, as well as their histories, to fully interpret the mutual relationships among agents in a collective. At present, this is not practical. Instead, our task was to acquire detailed and correct interpretations under the constraint of limited measurements, specifically pairwise interactions among agents. We observed that the intrinsic information between X and Y dominates whenever there is only a link from X to Y and no direct link from Y to X or from Y to itself. However, a small amount of intrinsic information can still be observed when there is no direct link from X to Y, as in the case where X is a follower with memory and Y is a leader. We noted that this is due to the effect of memory. We also found that shared information from X to Y dominates when X and Y are both influenced by a shared history. Synergistic information dominates when present knowledge of X or Y alone cannot predict the future state of Y, but knowing the present of X and Y simultaneously can. One of the most notable consequences of our analysis of this multiagent system is that decomposing TE into intrinsic and synergistic information flows enabled us to infer whether followers interact with one another in the collective and, more generally, to distinguish cases where an agent interacts mutually with only one or a few other agents from cases where many interacting agents are influential. Notably, from that, one can also correctly interpret the "bump" observed in TE as a function of noise level. Although the concepts of intrinsic, shared, and synergistic information flows apply generally to any system of interacting variables, some of the main limitations of information theory still apply to the measures used in this study.
Computing information flow typically requires a discretization of the dataset, which coarse-grains the data (still, these coarse-grained methods have proven more effective than methods based on continuous quantities, such as time-delayed correlation ()). Furthermore, although intrinsic information is an improvement upon TE in ruling out the effects of history, it cannot erase all memory effects without conditioning on longer and longer pasts, which, in practice, is not feasible. Notwithstanding these limitations, the decomposition of TE and TDMI into intrinsic, shared, and synergistic modes of information flow provides a marked improvement in diagnosing the nature of interaction without requiring any additional data. On the basis of the model systems and their corresponding information flows, one can deduce which information measure is more appropriate for the physical problem being addressed. In leader-follower classification, for example, TE is often used. However, when one does not expect to find substantial synergistic or shared flows, it equals TDMI. The latter is then a better choice, since it does not require additional conditioning that increases the dimension of the probability distribution that must be well sampled. In cases where synergistic flow is dominant, one may consider separating intrinsic and synergistic flows instead of computing just TE. This results in a much richer feature space for classification. In general, computing intrinsic, shared, and synergistic flows should perform better than, or at least as well as, TE and TDMI in classification. Future work will verify these claims and elucidate exactly in which scenarios we expect each mode of information flow to be effective in classifying leaders and followers.

MATERIALS AND METHODS

Defining information flow

In this section, we construct our measure of intrinsic information flow. We start with a broader understanding of information flow and then narrow it until we arrive at our goal. To begin, information flow from a time series X to a time series Y must exist in both the behavior of X at time t and the later behavior of Y at time t + τ; this shared information is the TDMI, MX→Y(τ) = I[X(t); Y(t + τ)]. As pointed out by Schreiber (), there are many reasons why X and Y might share information. First, X and Y may be synchronized, so that X predicts Y in the same fashion that Y would, and it would be disingenuous to attribute that shared information to information flow. Similarly, X and Y may be jointly influenced by a third system Z, in which case there is no direct, or even indirect, information flow from X to Y. Schreiber referred to these two situations as the two time series being correlated via common history and via common input signals, and proposed discounting these influences from the TDMI via conditioning, yielding the TE, TX→Y(τ) = I[X(t); Y(t + τ) ∣ Y(t)]. This overcomes the stated weakness of the TDMI by conditioning on the present of Y, thus excluding the information shared by X and Y that also exists in Y. The TE can also be modified to discount the information in a simultaneous third variable. Conditioning on variables, however, is not a purely subtractive operation. That is, the relation I[X; Y ∣ Z] ≤ I[X; Y] does not necessarily hold; rather, conditioning can increase the information shared by two variables.
These phenomena are known as conditional dependence () and are perhaps best exemplified by the following distribution, where X, Y, and Z are binary random variables and each of the four events in which an even number of them take the value 1, including X = Y = Z = 0, has probability 1/4 (i.e., the XOR distribution over the outcomes (0,0,0), (0,1,1), (1,0,1), and (1,1,0)). In this distribution, any pair of variables is independent, yet each pair, conditioned on the third variable, is highly correlated. This is because knowing the value of, for example, X does not allow us to infer the value of Y, but if we condition on (again, for example) Z = 0, suddenly we know that whatever value X takes, Y must take as well. This is conditional dependence in its purest form: X and Y are independent, but given Z, they are perfectly correlated. This brings us back to the TE. As it is based on a particular conditional mutual information, conditioning on Y can induce an apparent correlation between X and Y that does not exist without it. To overcome this weakness in the TE, it was proposed to take a step back and consider the problem from a slightly different perspective (). We seek an operational understanding of the information shared by X and Y, removing the influences of common history and common input signals without introducing other forms of correlation. To do this, we appeal to the cryptographic flow ansatz (), which states that intrinsic information flow exists when X and Y can agree upon a secret while the present of Y acts as an eavesdropper, and, furthermore, that the intrinsic information flow is quantified as the rate of secret sharing between the two (see section S3). In essence, this means that information shared by X and Y can only be definitively attributed to flow from X to Y if there is no way that information can be reconstructed or derived by Y.
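The pairwise independence and conditional dependence of this XOR distribution can be verified directly. A minimal sketch (the helper functions are mine, computed from the distribution's definition):

```python
# Pairwise independence vs conditional dependence in the XOR distribution:
# X, Y, Z binary; the four even-parity outcomes each have probability 1/4.
from itertools import product
from math import log2

p = {(x, y, x ^ y): 0.25 for x, y in product((0, 1), repeat=2)}

def marginal(p, idx):
    """Marginal distribution over the variables listed in idx."""
    out = {}
    for outcome, prob in p.items():
        key = tuple(outcome[i] for i in idx)
        out[key] = out.get(key, 0.0) + prob
    return out

def mi(p, i, j):
    """Mutual information I(V_i; V_j) in bits."""
    pij, pi, pj = marginal(p, (i, j)), marginal(p, (i,)), marginal(p, (j,))
    return sum(v * log2(v / (pi[(a,)] * pj[(b,)])) for (a, b), v in pij.items())

def cmi(p, i, j, k):
    """Conditional mutual information I(V_i; V_j | V_k) in bits."""
    pijk, pik, pjk, pk = (marginal(p, t) for t in ((i, j, k), (i, k), (j, k), (k,)))
    return sum(v * log2(v * pk[(c,)] / (pik[(a, c)] * pjk[(b, c)]))
               for (a, b, c), v in pijk.items())

print(mi(p, 0, 1))      # I(X;Y)   = 0.0: pairwise independent
print(cmi(p, 0, 1, 2))  # I(X;Y|Z) = 1.0 bit: perfectly correlated given Z
```

Conditioning on Z raises the mutual information from zero to a full bit, which is exactly the effect that can inflate the TE.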
To practically apply the cryptographic flow ansatz, we use a relatively easily computable upper bound, termed the intrinsic mutual information (IMI). Effectively, this bound states that the information shared by X and Y that is inaccessible to Y is bounded from above by the conditional mutual information between X and Y given all possible variables Ȳ that can be constructed from the conditioning variable Y. The calculation of the IMI, while not trivial, is not particularly difficult. Although the optimization over Ȳ is not convex, the optimization space is finite because the cardinality of Ȳ can be bounded by the cardinality of Y, ∣Y∣ (). The object of optimization is then a ∣Y∣ × ∣Y∣ row-stochastic matrix whose (i, j)th entry is p(ȳj ∣ yi). Global optimization techniques, such as basin hopping, can then be used to find the global minimum. In basin hopping, an initial condition is proposed, and a local minimum is found through standard gradient-based techniques; then a step in the optimization space is taken, and a local minimum is found again. This is repeated some number of times, and the least of the local minima found is presumed to be the global minimum. This is the technique used in the dit information theory package (), which was used to perform the calculations in this manuscript. Briefly, this measure builds upon the TE, producing a previously unidentified metric that comes substantially closer to the TE's stated goal of removing the effects of common history and common input signals, but without introducing the possibility of conditional dependence. This is accomplished by appealing to the field of information-theoretic cryptography and drawing parallels between secret key agreement and the scientific issue of attributing information present in Y to X and X alone. IMI is a sharp bound on the secret key agreement rate. Note that only in particular cases have these rates been derived; in those cases, IMI and the secret key agreement rate coincide (, ).
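The idea behind the IMI optimization can be illustrated on the XOR distribution. The sketch below brute-forces only deterministic maps Z̄ = f(Z), a simplification of the stochastic channels the IMI optimizes over (and of the basin hopping used by dit); for this toy case, a constant map already achieves the minimum:

```python
# Sketch of the IMI idea on the XOR distribution: minimize I(X;Y|Zbar)
# over maps Zbar = f(Z). The true IMI optimizes over stochastic channels;
# deterministic maps suffice for this toy case.
from itertools import product
from math import log2

p = {(x, y, x ^ y): 0.25 for x, y in product((0, 1), repeat=2)}  # XOR

def cond_mi(q):
    """I(X;Y|Z) in bits for a distribution {(x, y, z): prob}."""
    pz, pxz, pyz = {}, {}, {}
    for (x, y, z), v in q.items():
        pz[z] = pz.get(z, 0.0) + v
        pxz[(x, z)] = pxz.get((x, z), 0.0) + v
        pyz[(y, z)] = pyz.get((y, z), 0.0) + v
    return sum(v * log2(v * pz[z] / (pxz[(x, z)] * pyz[(y, z)]))
               for (x, y, z), v in q.items())

results = {}
for f in product((0, 1), repeat=2):       # all maps {0,1} -> {0,1}
    q = {}
    for (x, y, z), v in p.items():        # push Z through the map f
        q[(x, y, f[z])] = q.get((x, y, f[z]), 0.0) + v
    results[f] = cond_mi(q)

print(results[(0, 1)])        # identity map keeps I(X;Y|Zbar) = 1.0 bit
print(min(results.values()))  # a constant map collapses it to 0.0
```

The minimum over channels is zero: no information flow between X and Y is intrinsic here, matching the secret key agreement rate of the XOR distribution.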
That said, probability distributions with IMI greater than the secret key agreement rate can be constructed. Again, only a few constructions are known (, ). In these situations, tighter bounds on secret key agreement rate exist. However, they are more challenging to calculate (, ).

Computing information flow measures

The computation of T, M, S, and σ was performed as follows. First, the orientations θL and θF are computed as described by Eq. 12 up to time T = 2 × 10⁶ for 20 sets of initial conditions. The values of θL and θF are then discretized by binning them into six bins on the interval [0,2π] [the use of six symbols was found to be sufficient to differentiate the behaviors of L and F while maintaining a computationally feasible number of sequences to be sampled to compute the information measures; see (, )]. For each set of initial conditions, the joint probability distribution p(x(t), y(t), y(t + 1)) is estimated from the time series, where x(t) and y(t) are the discretized forms of θ and x and y can be either L or F. p(x(t), y(t), y(t + 1)) is computed by counting the occurrences of each of the 6³ = 216 possible combinations of (x(t), y(t), y(t + 1)) and dividing by the length of the time series minus 1, 2 × 10⁶ − 1. Once the probability distributions are computed, T is obtained by plugging them into TX→Y(τ) = I[x(t); y(t + τ) ∣ y(t)] (), where τ = 1. Likewise, M is computed using MX→Y(τ) = I[x(t); y(t + τ)]. The calculation of I is described in the "Defining information flow" section. Last, σ and S are computed directly using Eqs. 7 and 8 (σ = M − I and S = T − I). In practice, one does not need longer time series to compute the modes of information flow than are required for the computation of TE or TDMI. To further confirm the statistical significance of our results on the decomposed modes of information flow, we performed a surrogate test (, ). This uses surrogate time series obtained by swapping the time series of two interacting agents across realizations, which preserves all statistical properties of the nonlinear dynamics of the individual agents but spoils the causal relationship if one exists. We confirmed that our conclusions are statistically significant (see the "Surrogate test" section and Fig. 8).
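The counting pipeline described above can be sketched as follows (a toy illustration with synthetic angle series of my own construction, not the paper's simulation output): discretize two series into six bins, estimate p(x(t), y(t), y(t + 1)) by counting, and evaluate M and T.

```python
# Discretize two angle series into six bins, count the joint distribution
# p(x(t), y(t), y(t+1)), and compute the TDMI M and the TE T in bits.
import math
import random
from collections import Counter

random.seed(1)
BINS, n = 6, 100_000

x = [random.uniform(0, 2 * math.pi) for _ in range(n + 1)]
y = [random.uniform(0, 2 * math.pi)]
for t in range(n):
    y.append((x[t] + random.gauss(0, 0.3)) % (2 * math.pi))  # y follows x

sym = lambda th: min(int(th / (2 * math.pi) * BINS), BINS - 1)
X, Y = [sym(v) for v in x], [sym(v) for v in y]

c3 = Counter((X[t], Y[t], Y[t + 1]) for t in range(n))
def prob(counter):
    return {k: v / n for k, v in counter.items()}

p3 = prob(c3)
pxz = prob(Counter((a, c) for (a, b, c) in c3.elements()))
pyz = prob(Counter((b, c) for (a, b, c) in c3.elements()))
pxy = prob(Counter((a, b) for (a, b, c) in c3.elements()))
px = prob(Counter(a for (a, b, c) in c3.elements()))
py = prob(Counter(b for (a, b, c) in c3.elements()))
pz = prob(Counter(c for (a, b, c) in c3.elements()))

# TDMI M = I[x(t); y(t+1)] and TE T = I[x(t); y(t+1) | y(t)].
M = sum(v * math.log2(v / (px[a] * pz[c])) for (a, c), v in pxz.items())
T = sum(v * math.log2(v * py[b] / (pxy[(a, b)] * pyz[(b, c)]))
        for (a, b, c), v in p3.items())
print(f"M = {M:.2f} bits, T = {T:.2f} bits")
```

Because y here copies x's previous orientation with modest noise, both M and T come out well above zero; σ = M − I and S = T − I then follow once I has been obtained by the optimization described above.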
Fig. 8.

Surrogate and actual distributions used for surrogate testing.

(Blue) Distribution of surrogate values. (Red) The actual value of SF→L(NL = 2, NF = 6) − SF→L(NL = 1, NF = 7) averaged over 20 realizations for η0 = 1.1π. The values of SF→L(NL = 2, NF = 6) and SF→L(NL = 1, NF = 7) for η0 ranging from 0 to 2π are shown in Fig. 7D.


Details of the modified Vicsek models

Here, we describe the details of the modified Vicsek model. To iterate the trajectories, the orientation update of Eq. 12 is used (), in which the sum runs over all agents j within the interaction radius of agent i. In this update, w is a nonnegative asymmetric matrix whose wij element determines the interaction strength that particle i exerts on particle j; wij > wji whenever particle i is a leader and particle j is a follower in our setting. Positions at the initial time t = 1 are chosen randomly from a uniform distribution within a box of side length L = 10, and orientations are chosen randomly from a uniform distribution on the interval [0,2π). The interaction radius is set to R = 3. Positions are updated using Eq. 11, and the orientations θF and θL are updated using Eq. 12. All simulations are coded in MATLAB.
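A minimal sketch of a weighted Vicsek-type orientation update consistent with this description (written in Python rather than MATLAB; parameter names are mine, and the interaction radius is enlarged relative to the paper's R = 3 so that this small toy group stays fully connected):

```python
# Weighted Vicsek-type update: each agent adopts the weighted average
# orientation of neighbors within radius R, plus uniform noise in
# [-eta0/2, eta0/2]; positions advance at constant speed v0.
import cmath
import math
import random

random.seed(2)
N, L_box, R, eta0, v0 = 8, 10.0, 8.0, 0.0, 0.3  # noise off to show alignment
w = [[1.0] * N for _ in range(N)]  # w[i][j]: strength of i's influence on j

pos = [(random.uniform(0, L_box), random.uniform(0, L_box)) for _ in range(N)]
th = [random.uniform(0, 2 * math.pi) for _ in range(N)]

def dist(a, b):  # distance under periodic boundary conditions
    dx = min(abs(a[0] - b[0]), L_box - abs(a[0] - b[0]))
    dy = min(abs(a[1] - b[1]), L_box - abs(a[1] - b[1]))
    return math.hypot(dx, dy)

for _ in range(20):
    s = [sum(w[i][j] * cmath.exp(1j * th[i])
             for i in range(N) if dist(pos[i], pos[j]) < R) for j in range(N)]
    th = [(cmath.phase(sj) + random.uniform(-eta0 / 2, eta0 / 2)) % (2 * math.pi)
          for sj in s]
    pos = [((qx + v0 * math.cos(t)) % L_box, (qy + v0 * math.sin(t)) % L_box)
           for (qx, qy), t in zip(pos, th)]

# polar order parameter: 1 when all headings are aligned
order = abs(sum(cmath.exp(1j * t) for t in th)) / N
print(f"polar order = {order:.3f}")
```

With zero noise and a fully connected group, the headings collapse to a common direction within a few steps; raising eta0 or shrinking R reproduces the noise- and density-dependent cohesion discussed in the main text.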

Surrogate test

To validate the statistical significance of the values of M, T, I, σ, and S reported here and in the Supplementary Materials, we perform a permutation test using surrogate data following (). When performing the statistical test on SF→L, for example, we first propose the null hypothesis that there is no significant amount of SF→L. Then, we create a distribution using surrogate data that preserves the dynamical and statistical properties of F(t) and L(t) while spoiling the causality between them if it exists. In random-shuffle surrogate analysis, the time series F(t) is randomly shuffled, meaning that F(t) at each time t is replaced by a randomly chosen F(t′) from another time t′; however, this method does not preserve dynamical properties of F(t), such as the dependency of F(t + 1) on F(t). Fourier transform-based surrogates, such as the iterative amplitude-adjusted Fourier transform (), are often used to preserve both the linear correlation in time (i.e., the power spectrum) and the frequency distribution of the original time series. Here, the desired surrogate data for assessing the statistical significance of the observed information modes in TE and TDMI are those that preserve the nonlinear dynamics and frequency distribution intrinsic to each stochastic variable of leaders and followers, θL and θF, while spoiling any possible causal relation in the computation of the information mode decomposition. We first performed a set of simulations for n realizations, in each of which the initial positions of agents are taken randomly from a uniform distribution within a 10 by 10 box with a periodic boundary condition (the interaction radius R is equal to 3) and their orientations from a uniform distribution on the interval [0,2π). The kth realization of the ith leader's and the jth follower's dynamics are denoted as Li(k) and Fj(k) (1 ≤ i ≤ NL, 1 ≤ j ≤ NF), where NL and NF are the numbers of leaders and followers in the system, respectively.
For example, to carry out the surrogate analysis on the synergistic information SF→L, we simply swapped the (same length) time series of the kth realization, Li(k) or Fj(k), with those of another k′th realization, Li′(k′) or Fj′(k′), which preserves the nonlinear dynamical and statistical properties of leaders and followers while erasing any causal relation, simply because the realizations differ. Namely, the surrogate "synergistic information" is defined by computing SF→L between time series taken from different realizations. Here, i, i′, j, and j′ were chosen arbitrarily from the set of NL leaders, {L}, and the set of NF followers, {F}, in the kth and k′th realizations (e.g., i and i′ are not necessarily identical across realizations and are randomly taken from {L} in different realizations). It was confirmed that the decomposed information modes, including SF→L, were statistically indistinguishable among different choices of i and j across the (n = 20) realizations here, which implies that the finite length of the trajectories (T = 2 × 10⁶) and the 20 realizations were enough to characterize the general properties of the decomposed information modes between leader(s) and follower(s) in the models. Thus, here we simply omit the subscripts i and j from L and F, as in SF→L. If the null hypothesis, i.e., that no synergistic information exists, is true, then the value of SF→L should be statistically indistinguishable from the surrogate distribution. One can quantify this by setting a false-positive error rate α using the distribution of the null hypothesis, so that under the null hypothesis the value of S falls within the 1 − α domain. If the null hypothesis is false, then the value of SF→L appears as an outlier outside this domain. Here, for example, Fig. 8 demonstrates the surrogate test with n = 10,000 (= 100 × 100) surrogate trials to verify the statistical significance of the difference between SF→L, where NL = 2 and NF = 6, and SF→L, where NL = 1 and NF = 7, for the multiagent Vicsek model A [see SF→L(NL = 2, NF = 6) − SF→L(NL = 1, NF = 7) in Fig. 7D].
The figure shows that SF→L(NL = 2, NF = 6) − SF→L(NL = 1, NF = 7) is statistically significant at significance level α < 0.01 (i.e., no false positives over 10,000 surrogate trials), validating our interpretation in Fig. 7D that the larger NL, the larger SF→L when NL + NF is held fixed. Likewise, we confirmed the statistical significance of the information mode decomposition, especially where the magnitudes of the quantities are marginally small. It may be noted that, for a limited number of trajectories, such as in experimental data, the nearest-neighbor permutation test () can be an alternative method of surrogate analysis, in which the time series F(t) at each time t is replaced by a randomly chosen F(t′) at any time t′ (≠t) where ∣L(t) − L(t′)∣ is the smallest (nearest). This preserves the marginal distributions of F(t) and L(t), and the joint probability distribution p(F(t), L(t)) between F(t) and L(t) if the nearest neighbors chosen are close enough to guarantee L(t) ≈ L(t′), while spoiling p(F(t), L(t), L(t + 1)) between F(t) and L(t + 1).
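The realization-swapping surrogate can be sketched as follows. For brevity, the statistic here is the TDMI rather than S, and the toy dynamics (F copying L with a one-step delay) are my own; the swap logic mirrors the procedure described above:

```python
# Surrogate test by swapping realizations: the statistic is computed on
# matched (L, F) pairs and on pairs drawn from different realizations.
import random
from collections import Counter
from math import log2

def tdmi(a, b, tau=1):
    """I(a(t); b(t+tau)) in bits, estimated by counting."""
    pairs = list(zip(a[:-tau], b[tau:]))
    n = len(pairs)
    pj = Counter(pairs)
    pa = Counter(u for u, _ in pairs)
    pb = Counter(v for _, v in pairs)
    return sum(k / n * log2(k * n / (pa[u] * pb[v])) for (u, v), k in pj.items())

random.seed(3)
n_real, T_len = 10, 5000
runs = []
for _ in range(n_real):  # each realization: F copies L with one-step delay
    L = [random.randint(0, 1) for _ in range(T_len)]
    F = [random.randint(0, 1)] + L[:-1]
    runs.append((L, F))

actual = sum(tdmi(L, F) for L, F in runs) / n_real

# surrogate: pair L from realization k with F from a different realization k'
surrogates = [tdmi(runs[k][0], runs[k2][1])
              for k in range(n_real) for k2 in range(n_real) if k != k2]
print(f"actual = {actual:.3f}, max surrogate = {max(surrogates):.3f}")
```

The actual statistic (about 1 bit here) lies far outside the surrogate distribution, which collapses to the small positive bias expected from finite sampling, so the null hypothesis of no causal relation is rejected.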
The XOR distribution referred to in the "Defining information flow" section:

X Y Z  Pr
0 0 0  1/4
0 1 1  1/4
1 0 1  1/4
1 1 0  1/4

References (37 in total)

1.  Measuring information transfer

Authors:  Thomas Schreiber
Journal:  Phys Rev Lett       Date:  2000-07-10       Impact factor: 9.161

2.  Group formation and cohesion of active particles with visual perception-dependent motility.

Authors:  François A Lavergne; Hugo Wendehenne; Tobias Bäuerle; Clemens Bechinger
Journal:  Science       Date:  2019-04-05       Impact factor: 47.728

3.  Model-free information-theoretic approach to infer leadership in pairs of zebrafish.

Authors:  Sachit Butail; Violet Mwaffo; Maurizio Porfiri
Journal:  Phys Rev E       Date:  2016-04-18       Impact factor: 2.529

4.  Coupling of hippocampal theta and ripples with pontogeniculooccipital waves.

Authors:  Juan F Ramirez-Villegas; Michel Besserve; Yusuke Murayama; Henry C Evrard; Axel Oeltermann; Nikos K Logothetis
Journal:  Nature       Date:  2020-11-18       Impact factor: 49.962

5.  Inferring domain of interactions among particles from ensemble of trajectories.

Authors:  Udoy S Basak; Sulimon Sattari; Kazuki Horikawa; Tamiki Komatsuzaki
Journal:  Phys Rev E       Date:  2020-07       Impact factor: 2.529

6.  Transfer entropy--a model-free measure of effective connectivity for the neurosciences.

Authors:  Raul Vicente; Michael Wibral; Michael Lindner; Gordon Pipa
Journal:  J Comput Neurosci       Date:  2010-08-13       Impact factor: 1.621

7.  Media coverage and firearm acquisition in the aftermath of a mass shooting.

Authors:  Maurizio Porfiri; Raghu Ram Sattanapalle; Shinnosuke Nakayama; James Macinko; Rifat Sipahi
Journal:  Nat Hum Behav       Date:  2019-06-24

8.  Self-organized sorting limits behavioral variability in swarms.

Authors:  Katherine Copenhagen; David A Quint; Ajay Gopinathan
Journal:  Sci Rep       Date:  2016-08-23       Impact factor: 4.379

9.  Structural transition in the collective behavior of cognitive agents.

Authors:  Hannes Hornischer; Stephan Herminghaus; Marco G Mazza
Journal:  Sci Rep       Date:  2019-08-28       Impact factor: 4.379

10.  Plasticity in leader-follower roles in human teams.

Authors:  Shinnosuke Nakayama; Manuel Ruiz Marín; Maximo Camacho; Maurizio Porfiri
Journal:  Sci Rep       Date:  2017-11-06       Impact factor: 4.379

