Max Tegmark.
Abstract
Although there is growing interest in measuring integrated information in computational and cognitive systems, current methods for doing so in practice are computationally infeasible. Existing and novel integration measures are investigated and classified by various desirable properties. A simple taxonomy of Φ-measures is presented where they are each characterized by their choice of factorization method (5 options), choice of probability distributions to compare (3 × 4 options) and choice of measure for comparing probability distributions (7 options). When requiring the Φ-measures to satisfy a minimum of attractive properties, these hundreds of options reduce to a mere handful, some of which turn out to be identical. Useful exact and approximate formulas are derived that can be applied to real-world data from laboratory experiments without posing unreasonable computational demands.
Year: 2016 PMID: 27870846 PMCID: PMC5117999 DOI: 10.1371/journal.pcbi.1005123
Source DB: PubMed Journal: PLoS Comput Biol ISSN: 1553-734X Impact factor: 4.475
Properties of different integration measures.
All but the third are desirable properties; capitalized N/Y (no/yes) indicate when an integration measure lacks a desirable property or has an undesirable one. The first four properties are generally agreed to be important, while the second set of four have been argued to be important by some authors. Interpretability refers to the extent to which the measure can be given an information-theoretic interpretation satisfying desirable properties of integration (see text). Computability refers to the feasibility of evaluating the measure in practice (see text).
|
| ||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Major | Always | y | y | y | y | y | y | y | y | y | y | y | y | y | y | |
| Always | y | y | y | y | y | y | y | y | y | y | y | y | ||||
| Vanishes for | n | n | n | n | n | n | n | n | n | n | n | n | n | n | ||
| Vanishes for | y | y | y | y | y | y | y | y | y | y | y | y | y | |||
| Minor | Vanishes for | y | y | y | y | y | y | y | ||||||||
| Vanishes for | y | y | y | y | y | |||||||||||
| y | y | y | y | y | y | y | y | |||||||||
| Based on | y | |||||||||||||||
| Intuitively | 2 | 2 | 2 | 2 | 2 | 0 | 2 | 2 | 2 | 2 | 0 | 1 | 0 | 0 | 0 | |
| Computationally | 1 | 2 | 2 | 0 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 1 | 2 | 2 |
Integration ϕ for different measures.
$A \equiv B C^{-1}$, […], $\Sigma \equiv C - B C^{-1} B^{T} = C - A C A^{T}$ and […]. $C$ is the data covariance matrix and $B$ is the cross-covariance between different times as defined by eq (46).
[Table body not recoverable. Columns: Name, Definition, Formula for Gaussian variables.]
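The definitions of A and Σ in the caption above can be illustrated numerically. The following is a minimal sketch, not the paper's procedure: it assumes that B is the lag-one cross-covariance ⟨x1 x0ᵀ⟩ (eq (46) is not reproduced here, so this orientation and lag are assumptions), and it uses a hypothetical linear process `A_true` with unit-variance Gaussian noise as test data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stable linear dynamics x[t+1] = A_true @ x[t] + noise (illustration only).
A_true = np.array([[0.5, 0.2, 0.0],
                   [0.0, 0.4, 0.1],
                   [0.1, 0.0, 0.3]])
n, T = 3, 50_000
x = np.zeros((T, n))
for t in range(T - 1):
    x[t + 1] = A_true @ x[t] + rng.standard_normal(n)

# Center the data and pair each sample with its successor.
x = x - x.mean(axis=0)
x0, x1 = x[:-1], x[1:]
N = len(x0)

C = x0.T @ x0 / N                 # data covariance matrix
B = x1.T @ x0 / N                 # assumed form of the cross-covariance between times
A = np.linalg.solve(C, B.T).T     # A = B C^{-1} (C is symmetric)
Sigma = C - A @ C @ A.T           # equals C - B C^{-1} B^T

print(np.round(A, 2))
print(np.round(Sigma, 2))
```

Under these assumptions, `A` approximates the regression matrix that predicts x1 from x0, and `Sigma` approximates the covariance of the prediction residual, provided the process is stationary.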
Fig 1. We model the time-evolution of the system state as a Markov process defined by a transition matrix M: when the (possibly unknown) system state evolves from x0 to x1, the corresponding probability distribution evolves from p0 to p1 ≡ Mp0.
All competing definitions of Φ quantify the inability to tensor factorize M, which corresponds to approximating the system as two disconnected parts A and B that do not affect one another.
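The idea of a tensor-factorizable transition matrix can be made concrete with a small numerical sketch. The code below assumes column-stochastic matrices and a joint state index of the form `a * n_b + b`. The helper `kron_residual` is a hypothetical name, and the rank test it performs is only an illustrative check for exact factorizability, not one of the Φ-measures discussed here.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_stochastic(n, rng):
    """Return an n x n column-stochastic matrix (each column sums to 1)."""
    m = rng.random((n, n))
    return m / m.sum(axis=0, keepdims=True)

def kron_residual(M, n_a, n_b):
    """Ratio of the second to the first singular value of the rearranged M.

    M equals a Kronecker product M_A (x) M_B exactly when the rearranged
    matrix R[(a', a), (b', b)] has rank one, i.e. when this ratio is zero.
    """
    R = M.reshape(n_a, n_b, n_a, n_b).transpose(0, 2, 1, 3)
    s = np.linalg.svd(R.reshape(n_a * n_a, n_b * n_b), compute_uv=False)
    return s[1] / s[0]

# Two disconnected parts: A with 2 states, B with 3 states.
M_A = random_stochastic(2, rng)
M_B = random_stochastic(3, rng)
M_sep = np.kron(M_A, M_B)          # factorizable joint transition matrix
M_gen = random_stochastic(6, rng)  # generic joint matrix, not factorizable

# Evolve a separable initial distribution: p1 = M p0.
p_A = np.array([0.3, 0.7])
p_B = np.array([0.2, 0.5, 0.3])
p0 = np.kron(p_A, p_B)
p1 = M_sep @ p0

# With a factorizable M, the two parts evolve independently.
print(np.allclose(p1, np.kron(M_A @ p_A, M_B @ p_B)))
print(kron_residual(M_sep, 2, 3), kron_residual(M_gen, 2, 3))
```

A residual near zero indicates that the system behaves as two parts that do not affect one another; a clearly positive residual indicates that no exact factorization exists.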
Different options for approximate factorizations $M \approx M^{A} \otimes M^{B}$ and […].
These options correspond to the first superscript in ϕ-measures such as ϕ. The optimal factorizations maximize the accuracy of the approximate probability distribution that they predict, while the “noising” factorizations are instead defined by treating the input from the other subsystem as random noise, either with uniform distribution (option “n”) or with the observed marginal distribution (option “m”).
| Code | Factorization method | State-dependent? |
|---|---|---|
| n | Noising | N |
| m | Mild noising | N |
| o | Optimal not knowing state | N |
| x | Optimal given […] | Y |
| a | Optimal given […] | N |
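The two "noising" options described in the caption can be sketched as follows. This is an illustration under stated assumptions rather than the paper's exact formulas, which are not reproduced here: the joint matrix is column-stochastic with index `a * n_b + b`, the helper name `noised_submatrix_A` is hypothetical, and "observed marginal distribution" is taken to mean the marginal of the initial distribution over subsystem B.

```python
import numpy as np

def noised_submatrix_A(M, n_a, n_b, noise_b):
    """Transition matrix for subsystem A when B's input is replaced by noise.

    noise_b is the probability distribution assigned to B's state.
    """
    T = M.reshape(n_a, n_b, n_a, n_b)  # T[a', b', a, b]
    T_a = T.sum(axis=1)                # marginalize over b': [a', a, b]
    return T_a @ noise_b               # average over b with the noise weights: [a', a]

rng = np.random.default_rng(0)
n_a, n_b = 2, 3
M = rng.random((n_a * n_b, n_a * n_b))
M /= M.sum(axis=0, keepdims=True)      # column-stochastic joint matrix
p0 = rng.random(n_a * n_b)
p0 /= p0.sum()

uniform_b = np.full(n_b, 1.0 / n_b)             # option "n": uniform noise
marginal_b = p0.reshape(n_a, n_b).sum(axis=0)   # option "m": observed marginal of B

M_A_n = noised_submatrix_A(M, n_a, n_b, uniform_b)
M_A_m = noised_submatrix_A(M, n_a, n_b, marginal_b)
print(M_A_n.sum(axis=0), M_A_m.sum(axis=0))     # columns of both still sum to 1
```

The matrix for subsystem B follows by exchanging the roles of the two index pairs. When M is already a Kronecker product, both options return the exact factor regardless of the noise distribution.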
Different options for which probability distributions p and q to compare, corresponding to the second and third superscripts in ϕ-measures such as ϕ.
The last three columns specify the formula for q for the three conditioning options we consider: when the state x0 is unknown (u), has a separable probability distribution (s) and is known (k), respectively.
| Code | Comparison option | u | s | k |
|---|---|---|---|---|
| t | Two-time state | | | |
| f | Future state | | | |
| a | Future state of subsystem A | | | |
| p | Past state of subsystem A | | | |
Different options for measuring the difference d between two probability distributions p and q: Kullback-Leibler divergence d, L1-norm d1, L2-norm d2, Hilbert-space distance d, Shannon-Jensen distance d, Earth-Movers distance d and Mismatched Decoding distance d.
These options correspond to the fourth superscript in ϕ-measures such as ϕ. In the text, we considered options where p and q had one, two or four indices, but in this table, we have for simplicity combined all indices into a single Greek index α.
| Code | Metric | Definition | Positivity | Monotonicity | Interpretability | Tractability | Symmetry |
|---|---|---|---|---|---|---|---|
| k | Kullback-Leibler divergence | | Y | Y | Y | Y | N |
| 1 | L1-norm | | Y | Y | (Y) | N | Y |
| 2 | L2-norm | | Y | Y | (N) | (Y) | Y |
| h | Hilbert-space distance | | Y | Y | Y | N | Y |
| s | Shannon-Jensen distance | | Y | Y | Y | N | Y |
| e | Earth-Movers distance | | Y | Y | Y | N | Y |
| m | Mismatched Decoding distance | See eqs ( | Y | Y | Y | N | N |
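Several of the distance options in this table have standard textbook forms, sketched below for two discrete distributions. The definitions used here (base-2 logarithms, an unsquared L2 norm, and the Jensen-Shannon divergence without a square root) are assumptions, since the table's own definitions are not reproduced; the function names are hypothetical.

```python
import numpy as np

def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; assumes q > 0 wherever p > 0."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    mask = p > 0                       # terms with p = 0 contribute nothing
    return float(np.sum(p[mask] * np.log2(p[mask] / q[mask])))

def l1_distance(p, q):
    """Sum of absolute differences."""
    return float(np.sum(np.abs(np.asarray(p, float) - np.asarray(q, float))))

def l2_distance(p, q):
    """Euclidean distance (unsquared)."""
    return float(np.sqrt(np.sum((np.asarray(p, float) - np.asarray(q, float)) ** 2)))

def js_divergence(p, q):
    """Jensen-Shannon divergence: average KL divergence to the midpoint."""
    m = 0.5 * (np.asarray(p, float) + np.asarray(q, float))
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

p = np.array([0.5, 0.5])
q = np.array([0.9, 0.1])
print(kl_divergence(p, q), kl_divergence(q, p))   # differ: KL is not symmetric
print(l1_distance(p, q), l1_distance(q, p))       # equal: L1 is symmetric
print(js_divergence(p, q), js_divergence(q, p))   # equal: JS is symmetric
```

The asymmetry of the Kullback-Leibler divergence and the symmetry of the other three are consistent with the Symmetry column of the table.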
Fig 2. Numerical comparison of different integration measures, averaged over 3,000 random trials.
In the bottom panel, all elements of p are independently drawn from a uniform distribution and normalized to sum to unity. In the top panel, only p(0) is randomly generated, and M is defined so as to swap the two subsystems, i.e., $M_{a'b'ab} = \delta_{a'b}\,\delta_{b'a}$, where primed indices label the later time.
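The two ingredients named in this caption, a random normalized distribution and a transition matrix that swaps the two subsystems, can be sketched as follows. The sketch assumes two subsystems with the same number of states `n` and a joint index `a * n + b`; the helper `swap_matrix` is a hypothetical name, and the code does not attempt to reproduce the figure itself.

```python
import numpy as np

def swap_matrix(n):
    """Joint transition matrix that exchanges the states of two n-state subsystems."""
    M = np.zeros((n, n, n, n))         # M[a', b', a, b]
    for a in range(n):
        for b in range(n):
            M[b, a, a, b] = 1.0        # state (a, b) moves to (b, a) with certainty
    return M.reshape(n * n, n * n)

rng = np.random.default_rng(1)
n = 2

# Random distribution: independent uniform draws, normalized to sum to unity.
p0 = rng.random(n * n)
p0 /= p0.sum()

M = swap_matrix(n)
p1 = M @ p0

# The evolved joint distribution is the transpose of the initial one.
print(np.allclose(p1.reshape(n, n), p0.reshape(n, n).T))
```

Every state of one subsystem is fully determined by the earlier state of the other, which makes the swap a natural test case for comparing integration measures.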
Fig 3. Numerical comparison of the two measures ϕ and ϕ for 3,000 random trials, generated the same way as in Fig 2.
The two measures are seen to be rather similar for these examples, and to satisfy the inequality ϕ ≤ ϕ.
Fig 4. Illustration of our fast Φ-approximation for an n = 16 example.
The structure of the A-matrix can be visualized either as a grid (top four examples), where each pixel color shows the value of the corresponding element $A_{ij}$, ranging from the smallest (black) to the largest (white), or as a graph (bottom examples) showing all non-zero matrix elements. Both of the matrices on the left correspond to the same graph below them, and both of the matrices on the right correspond to the same (disconnected) graph below them. Our method zeros all matrix elements with $|A_{ij}| < \epsilon$, where the threshold ϵ is chosen so that the largest connected graph component involves merely half of the elements. In the matrix picture, this means that there is a permutation of the elements (rows and columns) rendering the matrix block-diagonal (middle right). Whereas trying all matrix permutations would take exponentially long, graph connectivity can be determined in polynomial time, enabling us to rapidly find a good approximation for the “cruelest cut” bipartition.
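The thresholding idea in this caption can be sketched in a few lines. The code below is a minimal illustration under stated assumptions, not the paper's implementation: the graph is treated as undirected (an edge exists if the coupling in either direction survives), elements with magnitude at or below the threshold are zeroed, and the function names `component_labels` and `threshold_cut` are hypothetical. How the resulting components are grouped into a final bipartition is left open here.

```python
import numpy as np

def component_labels(adj):
    """Label the connected components of an undirected graph (boolean adjacency)."""
    n = adj.shape[0]
    labels = np.full(n, -1)
    current = 0
    for start in range(n):
        if labels[start] >= 0:
            continue
        labels[start] = current
        stack = [start]
        while stack:                   # depth-first search from `start`
            i = stack.pop()
            for j in np.flatnonzero(adj[i]):
                if labels[j] < 0:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels

def threshold_cut(A):
    """Smallest threshold whose largest component has at most n // 2 nodes."""
    n = A.shape[0]
    W = np.maximum(np.abs(A), np.abs(A).T)  # coupling strength in either direction
    np.fill_diagonal(W, 0.0)                # self-couplings do not connect nodes
    for eps in np.unique(W):                # candidate thresholds, ascending
        labels = component_labels(W > eps)  # zero every element with |A_ij| <= eps
        if np.bincount(labels).max() <= n // 2:
            return eps, labels

# Example: two strongly coupled blocks with weak cross-coupling, then shuffled.
rng = np.random.default_rng(0)
n = 8
A = rng.random((n, n)) * 0.01
A[:4, :4] = 1 + rng.random((4, 4))
A[4:, 4:] = 1 + rng.random((4, 4))
perm = rng.permutation(n)
A_perm = A[np.ix_(perm, perm)]

eps, labels = threshold_cut(A_perm)
print(eps, labels)
```

Each connectivity check costs at most O(n²) operations and there are at most O(n²) candidate thresholds, so the search is polynomial in n, in contrast to enumerating all bipartitions.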
Fig 5. How well our fast Φ-approximation works for 7,000 simulations of the n = 16 Φ-example described in the text.
Whereas it is seen to be excellent at finding the best bipartition when not all are comparably good (i.e., when Φmax/Φmin ≫ 1), the approximation is seen to overestimate Φ by up to 15% (the median) when there is no clear winner (left side). From top to bottom, the three curves show the 95th, 50th and 5th percentiles of the overestimation factor. The shaded region delimits the largest overestimation possible, when Φapprox = Φmax.