Delsin Menolascino1, ShiNung Ching2,3. 1. Department of Electrical and Systems Engineering, Washington University in St. Louis, St. Louis, MO, 63130, USA. delsin@wustl.edu. 2. Department of Electrical and Systems Engineering, Washington University in St. Louis, St. Louis, MO, 63130, USA. 3. Department of Biology and Biomedical Sciences, Washington University in St. Louis, St. Louis, MO, 63130, USA.
Abstract
We consider the notion of stimulus representation over dynamic networks, wherein the network states encode information about the identity of an afferent input (i.e. stimulus). Our goal is to understand how the structure and temporal dynamics of networks support information processing. In particular, we conduct a theoretical study to reveal how the background or 'default' state of a network with linear dynamics allows it to best promote discrimination over a continuum of stimuli. Our principal contribution is the derivation of a matrix whose spectrum (eigenvalues) quantifies the extent to which the state of a network encodes its inputs. This measure, based on the notion of a Fisher linear discriminant, is relativistic in the sense that it provides an information value quantifying the 'knowability' of an input based on its projection onto the background state. We subsequently optimize the background state and highlight its relationship to the underlying state noise covariance. This result demonstrates how the best idle state of a network may be informed by its structure and dynamics. Further, we relate the proposed information spectrum to the controllability gramian matrix, establishing a link between fundamental control-theoretic network analysis and information processing.
In network science, considerable effort has been directed at structural analysis that reveals the interconnection architecture of engineered and biological networks[1-6]. While such analysis can illuminate intriguing and common architectural principles of complex systems, it alone cannot tell us the functionality of such architecture. In other words, to what end is the revealed structure useful? Our goal in this work is to analyze the relationship between the structure, dynamics and function of (networked) systems. The specific notion of function that we consider is information coding, which has to do with how networks represent a stimulus or extrinsic input in a way that is useful for downstream processing (i.e., so that an agent can decode the identity of input stimuli based on a ‘read out’ of the state of the network). This sort of coding has been a topic of much interest in theoretical neuroscience, where understanding how networks of neurons represent stimuli is a foundational question[7-9]. Of course, many general principles of information coding are known from communication theory[10]. However, it is not clear how principles of information transmission, coding/decoding and capacity are impacted when enacted over a networked system, especially one with continuous-time dynamics. That is, what structural and dynamical aspects of a network make it a good information encoder? To this end, we principally address two questions: 1) What sorts of dynamics shape the input/output relationship of a network in a way that is effective in the Shannon sense (i.e. some, but not too much, redundancy to enable robust, efficient communication in the presence of noise)? It is especially unclear whether dynamical networks that do a good job encoding and/or processing information are also those that are most responsive to their inputs in a control-theoretic sense.
Hence the second question: 2) Is a network that is easily controlled by its inputs necessarily one that also effectively encodes information about those inputs? These two related questions constitute the primary focus of the paper. We consider information processing defined in terms of the extent to which network states/outputs encode their respective inputs. Our particular focus is on the background state of a network and its ability to facilitate information extraction regarding other afferent inputs. Non-zero background states are frequently observed in natural dynamical systems. For example, in the study of brain networks the existence of a ‘resting state’ is well-established experimentally[11-14]. Our goal is to provide a theoretical framework with which we can better understand how non-zero resting states confer informational utility. Specifically, we will derive a background state that is optimal according to a novel information measure (also herein derived). In mathematical terms, suppose that a stimulus u induces a network state x. We will quantify the ‘knowability’ of u by comparing x against a reference background state xref. The optimal xref can be interpreted as a ‘state of readiness’ at which the network may be sustained in preparation for activity to follow. The information measure we employ is based upon the inner product 〈xref, x〉, and is rooted in the method of Fisher linear discriminants[15-17]. This inner product, as is well known, prescribes the projection of x onto xref. For vectors of known magnitude, this projection gauges angular separation. Thus, essential characteristics of x (and by extension, of u) can be gleaned in an easily codified and quantified manner. The potential informational value to be derived from a projection of x onto xref depends critically on the effective choice of the background state xref and on uncertainty/noise.
The choice of xref, in turn, depends largely on how a network responds to its inputs as a function of time (see Fig. 1). Noise and uncertainty, likewise, are impacted by the network dynamics.
Figure 1
The optimal background state xref amounts to a Fisher linear discriminant, onto which state distributions (induced by inputs) are projected. In the case of Gaussian noise, uncertainty can be visualized in terms of ellipsoids (with principal axis vmax) about the mean. Since the networks are dynamic, the optimal xref will vary with time as the dynamics carry the states forward.
The formulation of a continuous-time dynamical (networked) system with afferent input is fundamentally aligned with analysis from control theory. A key aspect of our results will be the derivation of a Fisher information matrix associated with the above inner product. As we will see, the spectrum of this matrix quantifies the extent to which different afferent inputs produce different state representations. It turns out that this information spectrum has a particular statistical relationship with a traditional element from control theory, the controllability gramian matrix[18]. This is perhaps intuitive, since the control gramian is mathematically equivalent to the covariance of a network in response to white noise, a key source of uncertainty (and, thus, information loss). We will formalize this relationship in our results. Assessment of information propagation through noisy networks has been a topic of increasing interest, and while there are many contexts for which such analyses are relevant, quantifying the information-carrying capacity of (real and/or artificial) neural networks has been an especially active research area[8,9,19,20]. For example, in Zylberberg et al.[9], linear Fisher information is evaluated for a two-layer feedforward network in which a scalar signal is distributed to first-layer nodes and then propagated to the second layer via a weighted matrix, with noise corrupting the output of both layers.
The amount of information available about the stimulus, they observe, depends on the noise covariance structure at each layer, and on how these covariances relate to one another and to the direction of signal propagation (i.e. the tuning curve). However, the network considered in that work is static, so that input-output relationships are fully determined by network structure alone (i.e. there is no recurrent modulation of the signal, though noise does play a role). In contrast, Ganguli et al.[8] quantify the stimulus-encoding capacity of a linear dynamical network, again employing linear Fisher information theory. Here, the stimulus is presented as a pulse at a specific time, the ‘memory trace’ of which is preserved by the network over time to an extent depending on the network’s topology and the statistical behavior of state noise. Our work employs the dynamical framework of Ganguli et al.[8], while considering multivariate stimuli, akin to the ‘tuning curves’ of Zylberberg et al.[9]. In fact, our framework allows for stimuli of arbitrary dimension, although here we do constrain the inputs (stimuli) to be constant (for reasons addressed in the Discussion). Our notion of how stimulus information is encoded also differs. Specifically, we employ the inner-product based readout, facilitating a comparison between an output vector x and a reference xref, as mentioned above.
Results
Problem Formulation and Preliminaries
Linear Dynamical Networks
Linear dynamics have been used to describe complex networks in several contexts[21-23], with the caveat that such dynamics provide only local approximations of more complex, nonlinear regimes. Proceeding with this limitation in mind, we consider a linear dynamical system (network) with noise, of the form

ẋ(t) = Ax(t) + Bu + w(t),    (1)

where the recurrent dynamics of the n-dimensional state vector x are described by the adjacency matrix A, the input matrix B mediates the m-dimensional input u, taken here to be constant (see Discussion), and w(t) is zero-mean Gaussian white noise with intensity (covariance) Σw. We point out that the term dynamical network is used here to imply time-evolution in the network states, as opposed to a time-varying vector field; that is, A is constant. We wish to consider the linear Fisher information regarding u given the inner product of the state x(t) (which varies in time) and a reference background xref. By basic linear system theory,

x(t) ∼ 𝒩(x̄(t), Σ(t)),    (2)

where Σ(t) is a covariance matrix determined by the system dynamics.
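The quantities in (1) and (2) are straightforward to compute numerically. The following is a minimal sketch, assuming x(0) = 0, a stable (hence invertible) A, and constant u; the function names are ours, for illustration only:

```python
import numpy as np
from scipy.linalg import expm

def gamma(A, t):
    # Gamma(t) = integral_0^t e^{A(t - tau)} d tau = A^{-1} (e^{At} - I),
    # valid whenever A is invertible (e.g., stable)
    n = A.shape[0]
    return np.linalg.solve(A, expm(A * t) - np.eye(n))

def state_mean(A, B, u, t):
    # E[x(t)] = Gamma(t) B u for dx/dt = A x + B u + w(t), x(0) = 0, constant u
    return gamma(A, t) @ (B @ u)

# scalar sanity check: dx/dt = -x + 1 has mean response 1 - e^{-t}
A = np.array([[-1.0]]); B = np.array([[1.0]]); u = np.array([1.0])
```

For the scalar system above, the closed-form Γ(t) reproduces the familiar first-order step response approaching 1 with unit time constant.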
Inner Product and Fisher Information
As we seek to quantify the extent to which the inner product of x(t) and xref encodes information about the input u giving rise to x(t), we employ the Fisher information matrix, denoting it Fu(t), which is given by

Fu(t) = (∂μ/∂u)(∂μ/∂u)⊤ / σ²,    (3)

where μ = E[〈x, xref〉], 〈x, xref〉 = x⊤xref, and σ² denotes the variance of this inner product. The inner product can be interpreted in several ways, including as the correlation or contrast between two competing states. The inverse of the Fisher information lower bounds the covariance of any unbiased estimate of u based on measurement of the inner product (the Cramér–Rao bound). From (3), and taking into account the independence of x and xref, we have

μ = x̄⊤xref,  σ² = xref⊤Σxref

(where we have dropped dependence on t for notational convenience). Defining

Γ(t) = ∫₀ᵗ e^{A(t−τ)} dτ,

we obtain x̄(t) = Γ(t)Bu. Using the derivation given explicitly in the Methods section, we obtain the Fisher information matrix

Fu(t) = (B⊤Γ⊤xref)(xref⊤ΓB) / (xref⊤Σxref),

where Σ(t) is the state covariance matrix as introduced in (2). In seeking a holistic assessment of the matrix Fu(t), we employ the trace, which is the summed component-wise variance in our estimation of u. Since Fu(t) is an outer product of two vectors (scaled by the denominator), we may express its trace as their scaled inner product:

tr Fu(t) = xref⊤Γ(t)BB⊤Γ⊤(t)xref / xref⊤Σ(t)xref,    (10)

where dependence on t has again been made explicit.
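The trace in (10) can be evaluated directly from A, B, Σ and xref. A sketch, again assuming x(0) = 0 and invertible stable A (the function name is ours):

```python
import numpy as np
from scipy.linalg import expm

def fisher_trace(A, B, Sigma, x_ref, t):
    # tr F_u(t) = x_ref' Gamma B B' Gamma' x_ref / (x_ref' Sigma x_ref), eq. (10)
    n = A.shape[0]
    Gam = np.linalg.solve(A, expm(A * t) - np.eye(n))   # Gamma(t)
    v = B.T @ Gam.T @ x_ref                             # sensitivity of E<x, x_ref> to u
    return float(v @ v) / float(x_ref @ Sigma @ x_ref)
```

For a scalar system with A = −1, B = 1, unit state variance and unit xref, Γ(t) → 1 as t grows, so the trace approaches 1.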
Linear Dynamics and Noise Ellipsoids
Figure 1 provides a schematic of the problem formulation. Because our dynamics are linear, at any given time t the state of the network is a Gaussian random vector. The covariance of the state can be used to parameterize a quadratic form whose level sets constitute ellipsoids centered on the mean. We denote the principal eigenvector of the covariance matrix as vmax. These ellipsoids capture the noise-driven uncertainty in the state. As we will soon see, the optimal xref amounts to a Fisher linear discriminant that best separates two competing state distributions (ellipsoids), each associated with a different stimulus. As the network dynamics carry these trajectories forward in time, the optimal xref will in general change.
Network Parameterization, Actuated Nodes and Steady-State Assumption
We will focus our attention on networks that have a Barabási-Albert (scale-free) topology[24]. The off-diagonal elements of A are binary, while the diagonal elements are assigned large enough negative values to ensure stability (see Methods). The dynamics of such networks are asymptotically stable, so that in the absence of stimuli and noise, all states return to the origin. In our analysis we will vary the structure of how inputs impinge on network nodes. In particular, for an n node network, only nd ≤ n nodes will receive input. These actuated nodes are sometimes referred to as ‘driver’ nodes[25-27]. We will mostly consider the case when each actuated node receives an independent input, so that

B = [ I_{nd} ; 0 ],    (11)

where I_{nd} is the identity matrix of dimension nd (the number of driven nodes). We make the assumption that the noise covariance is always at steady-state. The concept here is that the dynamics of the network are persistently excited by ongoing noise, while receiving stimuli in a temporally punctate manner. To be mathematically precise, under this assumption the covariance in (2) becomes the constant matrix Σ solving the Lyapunov equation

AΣ + ΣA⊤ + Σw = 0.

Critically, we assume the pair (A, B) is controllable, so that the controllability gramian (precisely defined later) is full-rank. A final important assumption pertains to the specification of t. In cases when the system is assumed to be at steady state, we set t = 10 (which we find is five times longer than the time-constant of our considered networks). In other cases, we will vary t to assess the role of dynamics.
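These modeling choices are simple to operationalize. The sketch below builds a diagonally stabilized A, the input matrix of (11), and the steady-state covariance from the Lyapunov equation; the stability margin and function names are illustrative choices, not the paper's exact parameters:

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

def stabilize(A0, margin=0.1):
    # shift the diagonal so every eigenvalue has negative real part
    shift = max(np.real(np.linalg.eigvals(A0))) + margin
    return A0 - shift * np.eye(A0.shape[0])

def driver_matrix(n, nd):
    # B = [I_nd; 0]: each of the first nd nodes receives an independent input
    B = np.zeros((n, nd))
    B[:nd, :nd] = np.eye(nd)
    return B

def steady_state_cov(A, Sigma_w):
    # stationary state covariance: solve A S + S A' + Sigma_w = 0
    return solve_continuous_lyapunov(A, -Sigma_w)
```

Note the sign convention: `solve_continuous_lyapunov(a, q)` solves aX + Xaᵀ = q, so the noise intensity is passed negated.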
An optimal reference state xref exists, maximizing information about u
We are interested, for the moment, in which choice of xref will maximize (10). That is, we seek to answer the question: Of all possible background states xref, which one will provide the most information about a stimulus u (with its resultant output x), given a readout of the inner product 〈xref, x〉? In order to find this ‘ideal’ reference state, we transform (10) as follows. Let Σ = LL⊤ (L lower-triangular) be the Cholesky decomposition of Σ, and let x* = L⊤xref. Continuing, we have

tr Fu = x*⊤L⁻¹ΓBB⊤Γ⊤L⁻⊤x* / x*⊤x*.

Therefore, for simplicity of notation letting S = L⁻¹ΓBB⊤Γ⊤L⁻⊤, we have the familiar Rayleigh quotient

tr Fu = x*⊤Sx* / x*⊤x*,

whose values lie in the range [λmin, λmax] and which achieves its extrema when x* is the eigenvector of S associated with λmin or λmax, respectively. We then make the reverse transformation

xref = L⁻⊤x*

to obtain our ideally contrasting reference state. Mathematically (and as depicted in Fig. 1) xref is in fact the Fisher linear discriminant that best separates the induced state distributions associated with any two randomly chosen inputs. Previous results[9] have shown that an optimally informative ‘signal direction’ in a non-dynamical feedforward network is one that aligns with the principal axis of the noise covariance ellipsoid. Similarly, within our dynamical setup, we explored the optimal xref qualitatively by examining the extent to which it aligns with the principal axis of the noise covariance ellipsoid (vmax of Σ in (10)). The results are shown in Fig. 2. We notice in Fig. 2 that the ideal xref changes its orientation relative to vmax as a function of nd. This orientation is virtually uncorrelated with network size and is very predictable: we ran 30 network realizations for each (n, nd) pair and found little variability. We hypothesized that this was due to prioritization of the fidelity of the portion of xref corresponding to actuated nodes, which would explain why relatively under-actuated networks showed greater overall angular divergence between xref and vmax.
This is indeed the case, as shown in Fig. 3. We first examined actuated nodes, then non-actuated nodes, by segmenting xref and vmax into the first nd elements (Fig. 3(a)), then the last n − nd elements (Fig. 3(b)). Clearly, the actuated part of xref is required to be much more similar to the corresponding part of vmax than is true for the non-actuated part.
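After the whitening transformation, the optimization described above reduces to a symmetric eigenproblem. A sketch (function name ours; Σ assumed positive-definite):

```python
import numpy as np
from scipy.linalg import expm, cholesky

def optimal_xref(A, B, Sigma, t):
    # whiten with Sigma = L L', take the top eigenvector of
    # S = L^{-1} Gamma B B' Gamma' L^{-T}, map back via x_ref = L^{-T} x*
    n = A.shape[0]
    Gam = np.linalg.solve(A, expm(A * t) - np.eye(n))
    L = cholesky(Sigma, lower=True)
    M = np.linalg.solve(L, Gam @ B)          # L^{-1} Gamma B
    _, V = np.linalg.eigh(M @ M.T)           # eigenvalues in ascending order
    x_ref = np.linalg.solve(L.T, V[:, -1])   # reverse transformation
    return x_ref / np.linalg.norm(x_ref)
```

For a diagonal example A = diag(−1, −2), B = Σ = I and large t, the quotient is maximized along the slow (first) state direction, as expected.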
Figure 2
Fidelity of the optimally contrasting reference state xref to the system noise covariance decreases monotonically with nd. Shown is how xref aligns with the principal eigenvector (denoted vmax) of the noise covariance matrix Σ. μcos(θ) is the mean, over 30 network realizations, of the cosine of the angular difference θ between xref and vmax. Error bars are standard deviations. 30 realizations were evaluated for (a) identity and (b) random B matrices.
Figure 3
Actuated nodes of the ideal background (xref) are ‘required’ to be aligned with noise; non-actuated nodes are not. Shown is the alignment of xref with the principal noise covariance direction vmax (as in Fig. 2); here xref and vmax are partitioned so that (a) reflects only actuated and (b) only non-actuated nodes. μcos(θ) is as in Fig. 2, again for 30 network realizations.
Aside from the dependence of the optimal xref on input structure (particularly nd), we also analyzed how the orientation of xref, relative to vmax, changes with time. Since, as mentioned above, we are working in a dynamical regime, a time-dependent analysis is straightforward. To this end, we evaluated the orientation of xref, relative to vmax, at several time points, using the same methodology employed above, with the results shown in Fig. 4. We see that the orientation of xref relative to vmax does indeed change with time, apparently smoothly, and that xref becomes more similar to vmax as time advances. This is especially true for fully- or nearly fully-actuated networks, but is generally true for all input scenarios.
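The time sweep behind this analysis can be reproduced in a few lines. In the sketch below, the random network, horizons, and unit noise intensity are illustrative choices, not the paper's exact setup:

```python
import numpy as np
from scipy.linalg import expm, cholesky, solve_continuous_lyapunov

def xref_vmax_alignment(A, B, ts):
    # cos(theta) between the optimal x_ref at each horizon t and the
    # principal eigenvector v_max of the steady-state covariance
    n = A.shape[0]
    Sigma = solve_continuous_lyapunov(A, -np.eye(n))
    vmax = np.linalg.eigh(Sigma)[1][:, -1]
    L = cholesky(Sigma, lower=True)
    cosines = []
    for t in ts:
        Gam = np.linalg.solve(A, expm(A * t) - np.eye(n))
        M = np.linalg.solve(L, Gam @ B)
        xref = np.linalg.solve(L.T, np.linalg.eigh(M @ M.T)[1][:, -1])
        cosines.append(abs(xref @ vmax) / np.linalg.norm(xref))
    return cosines

# small illustrative network: undirected binary graph, diagonally stabilized
rng = np.random.default_rng(1)
A0 = np.triu((rng.random((8, 8)) < 0.4).astype(float), 1)
A0 = A0 + A0.T
A = A0 - (max(np.real(np.linalg.eigvals(A0))) + 0.5) * np.eye(8)
B = np.zeros((8, 3)); B[:3, :3] = np.eye(3)
cos_t = xref_vmax_alignment(A, B, ts=[0.5, 1.0, 2.0, 5.0, 10.0])
```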
Figure 4
The ideal contrast becomes more aligned with noise covariance as time progresses. Shown is the time-evolution of the relative orientation between the optimally contrasting state xref and the principal noise eigenvector vmax. At lower values of T, xref is nearly orthogonal to vmax, while as T gets larger, xref becomes much more aligned with vmax, although this alignment approaches a limit, which also varies monotonically with nd.
Thus, the optimally contrasting background/reference state is fundamentally dependent on the input structure of the network and the time evolution of network dynamics.
An optimal reference input uref exists, maximizing information about u
We expanded our inquiry to analyze admissible reference inputs uref which could give rise to an optimally contrasting state xref. More generally, we asked: Of all possible stimuli, does there exist a best one, uref, resulting in an output that provides information about all others? In this formulation, xref is no longer unconstrained, but rather is determined by

xref = Γ(t)Buref.    (18)

Using (18), we can find the optimal reference stimulus via a similar sequence of steps as in the previous subsection, defining

B⊤Γ⊤ΣΓB = LL⊤,

where L (lower-triangular) is the Cholesky factor of B⊤Γ⊤ΣΓB. This matrix is positive-definite (a requirement for the decomposition): the covariance matrix Σ is inherently positive-definite and thus can be Cholesky decomposed into LΣLΣ⊤, so that B⊤Γ⊤ΣΓB can be written QQ⊤ for Q = B⊤Γ⊤LΣ and is thus positive-semidefinite, while the full-rank condition on Q ensures positive-definiteness. Defining

S ≡ L⁻¹(B⊤Γ⊤ΓB)(B⊤Γ⊤ΓB)L⁻⊤    (20)

and u* = L⊤uref, we arrive at

tr Fu = u*⊤Su* / u*⊤u*,    (21)

whose values lie in the range [λmin, λmax] and which achieves its extrema when u* is the eigenvector of S associated with λmin or λmax, respectively. We then make the reverse transformation uref = L⁻⊤u* to obtain our ideally contrasting reference input. We pause for a moment to consider the significance of this ‘optimal’ uref (i.e. the eigenvector of S which optimizes (21)). The existence of such an optimum means that for a given network, there is one input whose induced state best contrasts those of all other inputs.
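As with xref, the computation reduces to a whitened eigenproblem. A sketch implementing the steps around (18)-(21), under the same assumptions as before (function name ours):

```python
import numpy as np
from scipy.linalg import expm, cholesky

def optimal_uref(A, B, Sigma, t):
    # constrain x_ref = Gamma B u_ref; with B'Gamma' Sigma Gamma B = L L',
    # the quotient is maximized by the top eigenvector of
    # S = L^{-1}(B'Gamma'Gamma B)(B'Gamma'Gamma B)L^{-T}; map back, u_ref = L^{-T} u*
    n = A.shape[0]
    GB = np.linalg.solve(A, expm(A * t) - np.eye(n)) @ B   # Gamma(t) B
    L = cholesky(GB.T @ Sigma @ GB, lower=True)
    W = np.linalg.solve(L, GB.T @ GB)                      # L^{-1} (B'Gamma'Gamma B)
    _, V = np.linalg.eigh(W @ W.T)
    u_ref = np.linalg.solve(L.T, V[:, -1])                 # reverse transformation
    return u_ref / np.linalg.norm(u_ref)
```

In the same diagonal example as before (A = diag(−1, −2), B = Σ = I, large t), the optimal reference input concentrates on the slow state, mirroring the optimal xref.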
The optimally contrasting input targets specific nodes in a concentrated manner, but not necessarily nodes of highest degree
We sought to characterize the ‘optimally informative’ uref by examining its entries (recall that we are in the domain of constant inputs) as they relate to the connectivity degree of actuated nodes. Clearly, uref has nd entries (see (11)). Since, then, there is a one-to-one relationship between the nd entries of uref and the driven nodes, we are able to learn which nodes may be specially ‘targeted’ by an optimally contrasting uref. Intuition would suggest that the targeted nodes would simply be the hubs, that is, that the higher the degree of a node, the higher the value of the corresponding entry of uref. This is borne out in simulation, but to an extent which varies consistently with network size (n) and nd. Examining Fig. 5, we see that for larger networks wherein all nodes are controlled, nearly all of the large entries of uref are concentrated toward nodes in the top 5% by degree ranking (i.e. the hubs); as we control fewer nodes, a majority of the large entries are still directed toward the hubs, but this majority shrinks as nd decreases. Also, looking at the different network sizes, we see that, in general, larger networks show a more pronounced ‘targeting’ of the hubs, while in smaller networks the hubs are still targeted but to a lesser extent. It should be pointed out that uref has unit norm, meaning there is an essential trade-off between how much energy can be focused on hubs and how much can be focused elsewhere (as is easily seen in Fig. 5), so that in very hub-oriented scenarios (i.e. large networks with a high fraction of controlled nodes), uref is nearly a standard basis vector, while in smaller networks wherein fewer nodes are controlled, uref is more homogeneous.
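One can probe this hub-targeting tendency directly by tabulating |uref| against node degree. The sketch below uses a small hand-built star network rather than a Barabási-Albert realization, purely to keep the example self-contained; all parameter choices are illustrative:

```python
import numpy as np
from scipy.linalg import expm, cholesky, solve_continuous_lyapunov

def uref_vs_degree(A0, nd, t=10.0, margin=0.5):
    # returns (degrees of actuated nodes, |u_ref| entries) for inspection
    n = A0.shape[0]
    deg = A0.sum(axis=1)
    A = A0 - (max(np.real(np.linalg.eigvals(A0))) + margin) * np.eye(n)
    B = np.zeros((n, nd)); B[:nd, :nd] = np.eye(nd)
    Sigma = solve_continuous_lyapunov(A, -np.eye(n))       # steady-state covariance
    GB = np.linalg.solve(A, expm(A * t) - np.eye(n)) @ B
    L = cholesky(GB.T @ Sigma @ GB, lower=True)
    W = np.linalg.solve(L, GB.T @ GB)
    u = np.linalg.solve(L.T, np.linalg.eigh(W @ W.T)[1][:, -1])
    return deg[:nd], np.abs(u) / np.linalg.norm(u)

# star graph: node 0 is the hub, nodes 1..5 are leaves; actuate every node
A0 = np.zeros((6, 6)); A0[0, 1:] = 1.0; A0[1:, 0] = 1.0
deg, u = uref_vs_degree(A0, nd=6)
```

Sorting the returned |uref| entries by `deg` gives a per-network version of the percentile plots in Figs 5 and 6.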
Figure 5
The optimally contrasting input uref targets network ‘hubs’, but to a degree which varies with nd. 30 networks were realized for each size (n) and driver node (nd) combination. For a given network size, the graph shows the mean (μ) of the squared entries of the (normalized) optimal uref. The entries of uref are sorted according to the degree of the targeted nodes (the abscissa is a percentile, binned in increments of 5%, so that each bar represents 5% of the nodes). Note that for n = 100 there are twice as many bins as controlled nodes, hence the duplicated values.
For comparison’s sake, we ran simulations with randomly connected (Erdős–Rényi (ER), with 0.5 edge probability) instead of scale-free networks. These networks were also undirected and rendered stable by the same method (described in Methods). We did use small (<0.1) positive edge weights, rather than unit weights, for these networks to render their analysis more numerically tractable. We see in Fig. 6 that the optimal uref also tends to target nodes of higher degree in random ER networks, but to a much lesser extent than for scale-free networks. We hypothesize that this is because the degree distribution of scale-free networks follows a power law, which means there are many nodes of very low degree and a few of very high degree. ER random networks have a binomial degree distribution, with more nodes of average degree and none of very high degree. Thus, it may be less crucial for the input to target the higher-degree nodes in ER random networks, simply because those nodes are not of much higher degree than the rest. In the ER random networks, we see a skewing of the values of uref which is inversely correlated with nd. That is, for less-actuated networks, the hubs tend to be more targeted, while for more fully-actuated networks, this targeting becomes less pronounced until at the limiting case (nd = n), the entries of uref are all nearly identical.
Figure 6
Same setup as in Fig. 5, but simulations are run for randomly connected (Erdős–Rényi) networks (edge probability p = 0.5). Note the much smaller range of values on the vertical axis when compared with Fig. 5. Nodes of high degree (‘hubs’) are targeted, but to a lesser extent than for the scale-free networks. Skewness of the graphs is inversely related to nd; that is, it is less necessary to target hubs for more fully actuated networks, until at nd = n, uref is essentially uniform.
Information Spectra (of Su) are Sensitive to Network Parameterization
We now turn our attention to the problem of comparing different networks according to their information capacity, as quantified by tr Fu. For this we examine the information capacity by varying uref in (21), where the intuitive strategy is to let u* range over the eigenvectors of S. Thus, a holistic characterization is provided simply by the eigenvalue spectrum of S (recall that (21) takes on the value λi, the ith eigenvalue, when u* is the ith eigenvector), hereafter termed the information spectrum of a network. We obtained a distribution of information spectra for several network parametrizations. We here restricted our attention to steady-state characterizations. Each distribution amounts to an empirical probability distribution of the eigenvalues of S over (random) network realizations. We used zero-mean, unit-variance, uncorrelated noise (i.e. Σw = I), though similar results were obtained for correlated noise. Figure 7(a) depicts the information spectra for several fractions of actuated (driver) nodes (aggregated over several values of n). A first observation is the presence of a small, secondary mode to the right of the principal mode. This secondary mode reflects the presence of a few particularly salient inputs that most informatively correlate with all others. It is notable that this mode, which represents the largest eigenvalue of S, systematically decreases with smaller values of nd. Certain intuition about these observations can be deduced from the rich body of work on spectra of random matrices. One such spectral characterization[28] shows that the principal eigenvalue of the adjacency matrix (here denoted A) for undirected, binary scale-free networks (such as those used for our simulations, with the exception that the diagonal of our A is adjusted, as described in Methods, to ensure stability) scales approximately as n^{1/4}, where n is the number of network nodes.
Further, recent work[29] has shown that this maximum eigenvalue, for weighted scale-free networks with expected degree distributions, varies monotonically with the maximum node degree. Maximum degree, in turn, increases dramatically as n increases, because of the preferential-attachment-based network creation algorithm[30]. Thus we would expect the spectrum of S, and in particular its principal eigenvalue, to depend on effective network size, which itself depends on nd (see (11), and note the effect of B on (20)). This makes sense intuitively, as well: We would expect higher-dimensional input spaces to admit a richer set of encoded representations.
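The empirical spectra of Fig. 7(a) can be approximated with a small ensemble. The sketch below uses small Erdős–Rényi-style random graphs rather than the paper's scale-free networks, purely to keep the example self-contained and fast; sizes, probabilities, and the stability margin are illustrative:

```python
import numpy as np
from scipy.linalg import expm, cholesky, solve_continuous_lyapunov

def info_spectrum(A, B, t=10.0):
    # eigenvalues of S (the 'information spectrum'), with unit noise intensity
    n = A.shape[0]
    Sigma = solve_continuous_lyapunov(A, -np.eye(n))
    GB = np.linalg.solve(A, expm(A * t) - np.eye(n)) @ B
    L = cholesky(GB.T @ Sigma @ GB, lower=True)
    W = np.linalg.solve(L, GB.T @ GB)
    return np.sort(np.linalg.eigvalsh(W @ W.T))

rng = np.random.default_rng(0)
n, nd, spectra = 20, 10, []
for _ in range(10):                                   # small ensemble for illustration
    A0 = np.triu((rng.random((n, n)) < 0.2).astype(float), 1)
    A0 = A0 + A0.T                                    # undirected, binary
    A = A0 - (max(np.real(np.linalg.eigvals(A0))) + 0.5) * np.eye(n)
    B = np.zeros((n, nd)); B[:nd, :nd] = np.eye(nd)
    spectra.append(info_spectrum(A, B))
spectra = np.concatenate(spectra)                     # empirical eigenvalue distribution
```

Histogramming `spectra` over many realizations yields the kind of empirical eigenvalue distribution shown in Fig. 7(a).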
Figure 7
(a) Information spectra as a function of the number of actuated nodes (distributions aggregated over n = 100, 200, 300, 400). Spectra consist of a primary mode and a smaller secondary mode. (b) Spectra of the controllability gramian for different fractions of actuated nodes. As noted in previous work, these spectra display an increasing number of modes as nd decreases. The principal mode is inset. Comparing to the information spectra in (a), we see that the information spectra show marked similarity to the first mode of the control spectra, and both reveal small outlying modes corresponding to the easiest (control) and most informative (information) directions.
Further, as nd decreases, the distribution of the main mode becomes broader and more entropic. No additional modes or ‘humps’ appear as nd varies, a point we will return to shortly.
Information Spectra are Related to the Controllability Gramian
As noted previously, the information spectrum is fundamentally time-varying (governed by the network dynamics, driven by the input in question). We were particularly interested in the relationship between the information spectrum and that of the controllability gramian matrix, which also fundamentally characterizes the input-output relationship of a linear (networked) system. Indeed, it is well known that in the limit as t → ∞, the gramian is exactly equivalent to Σ, i.e., the denominator of the information measure. Thus, we sought to compare the information spectrum to that of W(∞).

The gramian matrix has been a pivotal entity in the analysis of linear systems and similarly modeled networks[31-33], including certain types of brain networks[12,34]. Recent theoretical work[24] has characterized the nature of the infinite-time gramian spectrum as a function of the number of driven nodes (nd). It is shown there that for small fractions of driven nodes the spectrum manifests a series of modes or ‘humps,’ over which eigenvalues are randomly distributed (over network realizations). As is well known in linear systems theory, the magnitude of a gramian eigenvalue determines the minimum input energy needed to reach the unit hypersphere in the direction of its associated eigenvector. Thus, the principal mode of the gramian spectrum describes those directions that are ‘easiest’ to induce.

Figure 7(b) depicts the gramian spectrum for the same networks as in Fig. 7(a) (i.e., with varying fraction of actuated nodes). The aforementioned modes are readily evident. What is notable from this figure is the correspondence between the information spectra and the two rightmost modes of the gramian spectrum (that is, the principal mode and the much smaller mode at far right). In interpreting this result, it is important to note that the information and gramian spectra are of different dimensions (m and n, respectively).
This is because the information spectrum captures only constant inputs; thus, for a fixed time, the state is restricted to an m-dimensional subspace. In this sense, we postulate that the principal mode of the gramian spectrum corresponds not simply to the ‘easiest’ to reach directions, but also those associated with constant (m-dimensional) inputs.

Let us now seek to understand this numerical correspondence between control and information, shown in Fig. 7, at a conceptual level. What does it mean that the easiest directions of control (quantified by the smallest eigenvalues of W−1, equivalently the largest eigenvalues of W) and network information (quantified by the information spectrum) show such similarity? We hypothesize that this correspondence may be indicative of an underlying link between controllability metrics and information-based analyses, generally. Indeed, this is not a novel idea; the mathematical basis for this link has been explored[35,36] in contexts different from, but related to, ours. We can summarize the essence of these discussions, as it relates to our formulation, simply by noting that the information measure depends fundamentally on a derivative of the state (to be more precise, of an inner product of two states) with respect to u. Thus, when system dynamics are such that incremental changes made to u result in large changes to the state, informational value is increased. This information is, to some extent, a measure of network sensitivity to its inputs, and sensitivity to inputs is, of course, exactly what controllability analysis quantifies.
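The energy interpretation underlying this comparison can be sketched numerically. The following illustrative Python snippet (the system matrices are arbitrary stand-ins, not taken from the paper) computes the infinite-horizon controllability gramian from the Lyapunov equation AW + WAᵀ + BBᵀ = 0 and confirms that the eigendirection of W with the largest eigenvalue is the cheapest to reach:

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

# Small stable example system (illustrative values, not from the paper).
A = np.array([[-1.0, 0.5],
              [ 0.0, -2.0]])
B = np.array([[1.0],
              [0.2]])

# Infinite-horizon controllability gramian: A W + W A^T + B B^T = 0.
W = solve_continuous_lyapunov(A, -B @ B.T)

# Minimum input energy to reach a unit-norm target x_f is x_f^T W^{-1} x_f,
# so the eigenvector of W with the LARGEST eigenvalue is the cheapest direction.
vals, vecs = np.linalg.eigh(W)          # eigenvalues sorted ascending
easy, hard = vecs[:, -1], vecs[:, 0]
Winv = np.linalg.inv(W)
energy_easy = easy @ Winv @ easy
energy_hard = hard @ Winv @ hard
assert energy_easy < energy_hard
```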
Discussion
We developed an analysis to quantify the amount of information about an input u that can be gleaned from the contrast/correlation between its induced state x and a reference or background state xref. Our analysis shows that there exists an optimally informative xref in this context. This theoretical result reinforces intuition about how proper choice of a contrasting background might enable more rapid decoding and subsequent processing of input stimuli. We showed that the angle between xref and the principal axis of the noise covariance decreased monotonically with increasing fraction of nodes actuated, and that this separation also decreased over time, but to an extent limited by nd. This dynamical relationship between the informational optimum and the noise covariance is complementary to results based on static models[9].

We expanded our inquiry to examine the uref which would give rise to xref. We found that the optimal uref tends to target network hubs, but in a way which varies consistently with the number of driven nodes nd (see Fig. 5). We then derived an information spectrum that characterizes the full encoding capacity (in terms of inner product readout) of inputs. We showed that this spectrum has a nuanced dependency on network size and fraction of driven nodes, with the presence of a low-dimensional set of inputs to which networks appear particularly well-tuned. Further, we reconciled the information encoding of a network with its control-theoretic properties, which characterize how the ‘energy’ of an input allows the state space to be traversed. Our results suggest that inputs that produce ‘easy’ state excursions–recall that these inputs are postulated to be constant or near-constant (see the preceding section)–are also those that are well-encoded.

It may reasonably be asked why we have chosen inputs to be constant in the overall paradigm. At a conceptual level, our information analysis is fundamentally predicated on the derivative of the projection of x onto xref with respect to u.
That is, we seek to quantify the extent to which changes in the projection of system state x onto background xref reflect incremental changes in u. In the case of a constant u, this is readily interpreted – it quantifies the ability to deduce changes in the input composition. However, interpretability is more problematic for a time-varying u(t). What does it mean to make an incremental change in the function
u(t)? Is the relevant change spatial (composition) or temporal? In this sense, because we are dealing with a variational problem in infinite-dimensional function space, intuition is difficult.

This argument can be seen mathematically. Examining (3), we see that taking a derivative with respect to u(t) presents us with the task of taking the derivative of one function of t (the state x(t)) with respect to another function (u(t)). Thus, the resulting derivative would become dependent on u′(t). But we conduct our analysis with respect to the objective of learning about u from a ‘readout’ of only the projection of x(t) onto xref. To assume knowledge of the time derivative of u(t) changes the setup completely. One way around this dilemma would be to project u(t) onto a set of orthogonal basis functions (a Fourier basis, for example). If we denote a vector of basis functions (truncated so as not to be infinite) as h(t), we can approximate (almost) any u(t) by Uh(t), where U is a constant projection, or coefficient, matrix. Then, the state becomes linear in U and the basic formulation is preserved, with the change that instead of seeking to infer a constant input u via the state projection, we seek to infer the coefficient matrix U. A thorough treatment of this idea will be given in future work.

Having highlighted the results from the exploration of the information measure, let us take a slightly higher-level look at the information processing which it quantifies. Considering the inner product as the ‘readout’ (which forms the basis of the information measure) is intuitive, since it measures correlation/contrast between two competing representations of a stimulus. In this sense, it is a highly condensed representation of potentially high-dimensional stimuli. However, it is far from clear whether a network itself could accomplish this readout, and whether this is in fact a reasonable strategy for actual information processing tasks such as input classification.
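The basis-projection workaround described above can be made concrete. Assuming a truncated Fourier basis for h(t) (one possible choice; the specific basis and input below are illustrative assumptions), the constant coefficient matrix U satisfying u(t) ≈ Uh(t) is recovered by least squares:

```python
import numpy as np

t = np.linspace(0.0, 1.0, 400)

# Truncated basis h(t): a constant plus K Fourier harmonics.
K = 4
h = np.vstack([np.ones_like(t)]
              + [np.cos(2 * np.pi * k * t) for k in range(1, K + 1)]
              + [np.sin(2 * np.pi * k * t) for k in range(1, K + 1)])  # (2K+1, T)

# A hypothetical 2-dimensional time-varying input u(t).
u = np.vstack([np.sin(2 * np.pi * t) + 0.5,
               np.cos(4 * np.pi * t) - 0.25])                          # (2, T)

# Constant coefficient matrix U with u(t) ≈ U h(t), via least squares.
U, *_ = np.linalg.lstsq(h.T, u.T, rcond=None)
U = U.T
u_hat = U @ h

# This u lies exactly in the basis span, so recovery is essentially exact.
assert np.max(np.abs(u - u_hat)) < 1e-8
```

Inference about the input then reduces to inference about the finite matrix U, preserving the linear structure the analysis relies on.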
The linearity of the model considered is certainly a limiting factor in this regard. Nonetheless, we believe our results highlight an interesting direction toward analyzing not simply the structural aspects of networks, but also their dynamics and ultimately their functionality. It is straightforward to envision generalizing our framework to examine other network topologies, dynamical nonlinearities and wider time-scales, as well as alternative information metrics. These types of analyses can shed light on the functional advantages of biological networks (e.g., those in the brain) and/or principles for guiding the design of engineered systems.
Methods
Derivation of u
Proceeding from (7), we make use of the fact that the variance term is a scalar (being the variance of a scalar inner product). In seeking a holistic assessment of the resulting matrix, we employ the trace, which is the summed component-wise variance in our estimation of u. Since this matrix is an outer product of two vectors, we may express its trace as their inner product. We then examine the inner product variance, which is straightforward to obtain by noting that E[xxT] is the correlation matrix of x. Combining (29) with (26), and plugging (31) into (22), we obtain the Fisher information matrix as given in the main body of the text.
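The intermediate display equations are not reproduced here, but the key variance identity in this derivation, namely that the variance of the scalar readout ⟨x, xref⟩ under state covariance Σ equals xrefᵀ Σ xref, can be checked by Monte Carlo simulation (all numerical values below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4

# A random positive-definite state covariance Σ (illustrative values).
M = rng.standard_normal((n, n))
Sigma = M @ M.T + n * np.eye(n)
x_ref = rng.standard_normal(n)
mu = rng.standard_normal(n)

# Sample states x ~ N(mu, Sigma); the readout is the scalar <x, x_ref>.
x = rng.multivariate_normal(mu, Sigma, size=200_000)
readout = x @ x_ref

# Var(<x, x_ref>) = x_ref^T Sigma x_ref
analytic = x_ref @ Sigma @ x_ref
empirical = readout.var()
assert abs(empirical - analytic) / analytic < 0.02
```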
Network parameterization and simulations
To ensure stability, it is sufficient[24] that aii, the ith diagonal element of the binary adjacency matrix A, be at least as negative as the sum of the non-diagonal elements in row i; that is, aii ≤ −∑j≠i aij. Accordingly, in constructing networks, we first created a scale-free degree distribution and then formed a corresponding random graph, thus prescribing the adjacency matrix A. Next we simply assigned aii = −(∑j≠i aij + δi), where each δi was picked at random from (0, 1).

Creation of these adjacency matrices and the B matrices, as well as the calculations of the optimally contrasting background state xref and reference stimulus uref, with the associated statistical analyses, were performed using Mathematica. The exception was the calculation of the controllability gramians, which was done by exporting the relevant matrices to MATLAB and using the lyap() command.
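The diagonal assignment described above guarantees stability via the Gershgorin circle theorem: every eigenvalue lies in a disc centered at aii with radius ∑j≠i aij, which is strictly in the left half-plane. A minimal sketch (using a generic random graph in place of the paper's scale-free construction) verifies this:

```python
import numpy as np

def stabilize_diagonal(A_adj, seed=0):
    """Given a symmetric binary adjacency matrix with zero diagonal, set
    each diagonal entry to -(row sum + delta_i), delta_i ~ Uniform(0, 1).
    By the Gershgorin circle theorem, every eigenvalue of the result has
    a strictly negative real part."""
    rng = np.random.default_rng(seed)
    A = A_adj.astype(float).copy()
    row_sums = A.sum(axis=1)
    np.fill_diagonal(A, -(row_sums + rng.uniform(0.0, 1.0, size=A.shape[0])))
    return A

# Illustrative random graph (Erdos-Renyi-style, not the paper's scale-free one).
rng = np.random.default_rng(1)
n = 30
adj = (rng.random((n, n)) < 0.1).astype(float)
adj = np.triu(adj, 1)
adj = adj + adj.T                     # symmetric, zero diagonal
A = stabilize_diagonal(adj)
assert np.linalg.eigvals(A).real.max() < 0   # stable, as guaranteed
```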