| Literature DB >> 33286721 |
Abstract
Based on the conceptual basis of information theory, we propose a novel mutual information measure: 'path-based mutual information'. This information measure results from the representation of a set of random variables as a probabilistic graphical model. The edges in this graph are modeled as discrete memoryless communication channels; that is, the underlying data is ergodic and stationary, and the Markov condition is assumed to apply. The associated multilinear stochastic maps, tensors, transform source probability mass functions into destination probability mass functions. This allows for an exact expression of the tensor of a cascade of discrete memoryless communication channels in terms of the tensors of the constituent channels in the path. The resulting path-based information measure gives rise to intuitive, non-negative, and additive path-based information components (redundant, unique, and synergistic information) as proposed by Williams and Beer. The path-based redundancy satisfies the axioms postulated by Williams and Beer, the identity axiom postulated by Harder, and the left monotonicity axiom postulated by Bertschinger. The ordering relations between redundancies of different joint collections of sources, as captured in the redundancy lattices of Williams and Beer, follow from the data processing inequality. Although negative information components can arise, we speculate that these result either from unobserved variables or from adding sources that are statistically independent of all other sources to a system containing only non-negative information components. This path-based approach illustrates that information theory provides the concepts and measures for a partial information decomposition.
Keywords: causal inference; data processing inequality; diagnostics; information theory; mutual information; partial information decomposition; paths; tensors; transfer entropy
Year: 2020 PMID: 33286721 PMCID: PMC7597237 DOI: 10.3390/e22090952
Source DB: PubMed Journal: Entropy (Basel) ISSN: 1099-4300 Impact factor: 2.524
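The core construction in the abstract, channels as stochastic tensors that map source pmfs to destination pmfs, with the tensor of a cascade given exactly by the product of the constituent tensors, is straightforward to reproduce numerically. A minimal sketch in Python (the channel matrices and the `mutual_information` helper are illustrative choices, not taken from the paper):

```python
import numpy as np

def mutual_information(p_xy):
    """Mutual information in bits of a joint pmf given as a 2-D array."""
    p_x = p_xy.sum(axis=1, keepdims=True)   # source marginal, column vector
    p_y = p_xy.sum(axis=0, keepdims=True)   # destination marginal, row vector
    mask = p_xy > 0
    return float((p_xy[mask] * np.log2(p_xy[mask] / (p_x * p_y)[mask])).sum())

# A discrete memoryless channel is a row-stochastic matrix A with
# A[i, j] = p(destination j | source i); the tensor of a cascade
# X -> T -> Y is the matrix product of the constituent channel tensors.
A_xt = np.array([[0.9, 0.1],
                 [0.2, 0.8]])              # p(t | x), illustrative values
A_ty = np.array([[0.7, 0.3],
                 [0.1, 0.9]])              # p(y | t), illustrative values
A_xy = A_xt @ A_ty                         # exact tensor of the cascade

p_x = np.array([[0.5], [0.5]])             # source pmf as a column
p_xt = p_x * A_xt                          # joint pmf p(x, t)
p_xy = p_x * A_xy                          # joint pmf p(x, y)

# Data processing inequality: the destination of the cascade can never
# carry more information about the source than the intermediate node.
assert mutual_information(p_xy) <= mutual_information(p_xt) + 1e-12
print(mutual_information(p_xt), mutual_information(p_xy))
```

The final assertion is the data processing inequality: appending a second channel to the path can never increase the information the destination carries about the source.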
Figure 1. The redundancy lattice [6] for (a) two sources and (b) three sources. The labels next to the vertices indicate the collection of (joint) sources. When two lattice vertices are connected, the redundancy of the higher vertex is greater than or equal to the redundancy of the lower vertex. The ordering relation for vertices of the same color follows directly from the data processing inequality.
Distribution for the "two-bit copy problem".

| $x_1$ | $x_2$ | $t$ | $p(x_1, x_2, t)$ |
|---|---|---|---|
| 0 | 0 | (0, 0) | 1/4 |
| 0 | 1 | (0, 1) | 1/4 |
| 1 | 0 | (1, 0) | 1/4 |
| 1 | 1 | (1, 1) | 1/4 |
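As a quick consistency check on this table (a sketch; the `mi` helper is ours, not from the paper): each source shares exactly one bit with the target while the pair shares two bits, so any decomposition must place one bit of unique information on each source. This is why the two-bit copy problem is the standard test case for the identity axiom.

```python
import numpy as np
from collections import defaultdict
from itertools import product

# Joint pmf of the two-bit copy problem: probability 1/4 per table row.
p = {(x1, x2, (x1, x2)): 0.25 for x1, x2 in product((0, 1), repeat=2)}

def mi(p, a, b):
    """I(A; B) in bits; a and b are coordinate indices into the outcomes."""
    pa, pb, pab = defaultdict(float), defaultdict(float), defaultdict(float)
    for r, pr in p.items():
        va, vb = tuple(r[i] for i in a), tuple(r[i] for i in b)
        pa[va] += pr
        pb[vb] += pr
        pab[va, vb] += pr
    return sum(pr * np.log2(pr / (pa[va] * pb[vb]))
               for (va, vb), pr in pab.items() if pr > 0)

print(mi(p, (0,), (2,)))    # I(X1; T) = 1 bit
print(mi(p, (1,), (2,)))    # I(X2; T) = 1 bit
print(mi(p, (0, 1), (2,)))  # I(X1, X2; T) = 2 bits
```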
Marginal, conditional, and joint distributions for the cascade $X_1 \rightarrow T \rightarrow X_2$, that is, the path from source $X_1$ via $T$ to destination $X_2$, for the two-bit copy problem. The last column is evaluated at $x_2$ equal to the second component of $t$, for which $p(x_2 \mid t) = 1$. For the pmf $p(x_2 \mid t)$ the index is not used because this pmf is independent of the chain.

| Source $x_1$ | $t$ | $p(x_1)$ | $p(t \mid x_1)$ | $p(x_1, t)$ | $p(x_2 \mid t)$ |
|---|---|---|---|---|---|
| 0 | (0, 0) | 1/2 | 1/2 | 1/4 | 1 |
| 0 | (0, 1) | 1/2 | 1/2 | 1/4 | 1 |
| 0 | (1, 0) | 1/2 | 0 | 0 | 1 |
| 0 | (1, 1) | 1/2 | 0 | 0 | 1 |
| 1 | (0, 0) | 1/2 | 0 | 0 | 1 |
| 1 | (0, 1) | 1/2 | 0 | 0 | 1 |
| 1 | (1, 0) | 1/2 | 1/2 | 1/4 | 1 |
| 1 | (1, 1) | 1/2 | 1/2 | 1/4 | 1 |
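Reading the cascade as $X_1 \rightarrow T \rightarrow X_2$, the tensor of the cascade is the product of the two channel tensors, which can be checked directly; a minimal sketch (the index order for $t$ is our choice):

```python
import numpy as np

# Channels for the cascade X1 -> T -> X2 of the two-bit copy problem.
# T ranges over {(0,0), (0,1), (1,0), (1,1)}, indexed 0..3.
A_x1_t = np.array([[0.5, 0.5, 0.0, 0.0],    # p(t | x1 = 0)
                   [0.0, 0.0, 0.5, 0.5]])   # p(t | x1 = 1)
A_t_x2 = np.array([[1.0, 0.0],              # t = (0,0) -> x2 = 0
                   [0.0, 1.0],              # t = (0,1) -> x2 = 1
                   [1.0, 0.0],              # t = (1,0) -> x2 = 0
                   [0.0, 1.0]])             # t = (1,1) -> x2 = 1

# The tensor of the cascade is the product of the constituent tensors.
A_x1_x2 = A_x1_t @ A_t_x2
print(A_x1_x2)   # [[0.5, 0.5], [0.5, 0.5]]
```

The product is the uniform channel, recovering the independence of $X_1$ and $X_2$ in the two-bit copy distribution.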
Example of a system with an unobserved common cause. (a) Data set comprising three parameters. (b) Unobserved common cause $W = (W_1, W_2)$, with $X_1 = W_1$, $X_2 = W_1 \vee W_2$, and $T = W_1 \wedge W_2$.

| $x_1$ | $x_2$ | $t$ | $w$ | $p$ |
|---|---|---|---|---|
| 0 | 0 | 0 | (0, 0) | 1/4 |
| 0 | 1 | 0 | (0, 1) | 1/4 |
| 1 | 1 | 0 | (1, 0) | 1/4 |
| 1 | 1 | 1 | (1, 1) | 1/4 |
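A short check that the hidden-cause relations read off from this table reproduce the observed triples; the gate definitions below are our reading of the extracted rows:

```python
from itertools import product

# Observed triples generated from the hidden cause W = (w1, w2), with
# x1 = w1, x2 = w1 OR w2, t = w1 AND w2, each w occurring with
# probability 1/4.
for w1, w2 in product((0, 1), repeat=2):
    x1, x2, t = w1, w1 | w2, w1 & w2
    print((x1, x2, t), "<-", (w1, w2))
```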
Figure 2. (a) The graph when all variables are observed. (b) The graph when W is not observed.
Marginal, conditional, and joint distributions for the path.

| Source | | | | | | |
|---|---|---|---|---|---|---|
| 0 | 0 | | | | | |
| 0 | 1 | | | | | |
| 1 | 0 | | | | | |
| 1 | 1 | | | | | |
Negative synergistic information in a system comprising three variables implies that there must be a system comprising four variables that results in the same probability distribution when a common cause is unobserved. The four-variable system consists of the unobserved common cause $W = (W_1, W_2)$, $X_1 = W_1 \wedge W_2$, $X_2 = (\lnot W_2) \wedge W_1$, and $T = W_2$. The left three columns give the three-variable system; the right four columns give the four-variable system.

| $x_1$ | $x_2$ | $t$ | $w$ | $x_1$ | $x_2$ | $t$ |
|---|---|---|---|---|---|---|
| 0 | 0 | 0 | (0, 0) | 0 | 0 | 0 |
| 0 | 0 | 1 | (0, 1) | 0 | 0 | 1 |
| 0 | 1 | 0 | (1, 0) | 0 | 1 | 0 |
| 1 | 0 | 1 | (1, 1) | 1 | 0 | 1 |
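The same kind of check for the four-variable construction; the gate definitions are again our reading of the recovered rows:

```python
from itertools import product

# Check that the four-variable system reproduces the three-variable rows:
# x1 = w1 AND w2, x2 = (NOT w2) AND w1, t = w2.
for w1, w2 in product((0, 1), repeat=2):
    x1, x2, t = w1 & w2, (1 - w2) & w1, w2
    print((x1, x2, t), "<-", (w1, w2))
```

Each value of $w$ occurs with probability 1/4, so the induced distribution over $(x_1, x_2, t)$ matches the three-variable system row for row.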
Figure 3. The graph for the system used to demonstrate that the left monotonicity property and the identity property are incompatible with the local positivity property.
Two systems, both comprising three random variables, with identical joint probabilities per combination of the random variables. The underlying structures are very different, which can be seen when each variable is represented by its two-bit binary expansion. (a) For the dyadic (pair-wise) set, $X = (a, b)$, $Y = (c, a)$, and $Z = (b, c)$, with $a$, $b$, and $c$ independent uniform random bits. (b) For the triadic (three-way) set, the first bits of $X$, $Y$, and $Z$ sum to 0 mod 2, and the second bits are identical.

| Dyadic: $x$ | $y$ | $z$ | $p$ | Triadic: $x$ | $y$ | $z$ | $p$ |
|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 1/8 | 0 | 0 | 0 | 1/8 |
| 0 | 2 | 1 | 1/8 | 1 | 1 | 1 | 1/8 |
| 1 | 0 | 2 | 1/8 | 0 | 2 | 2 | 1/8 |
| 1 | 2 | 3 | 1/8 | 1 | 3 | 3 | 1/8 |
| 2 | 1 | 0 | 1/8 | 2 | 0 | 2 | 1/8 |
| 2 | 3 | 1 | 1/8 | 3 | 1 | 3 | 1/8 |
| 3 | 1 | 2 | 1/8 | 2 | 2 | 0 | 1/8 |
| 3 | 3 | 3 | 1/8 | 3 | 3 | 1 | 1/8 |
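That the two sets are indistinguishable by Shannon quantities can be verified by enumerating all marginal entropies; a sketch, using the bit decompositions as read off above:

```python
import numpy as np
from itertools import product, combinations

# (a) Dyadic: X = (a, b), Y = (c, a), Z = (b, c) with a, b, c i.i.d. bits.
# (b) Triadic: first bits XOR to 0, second bit shared. Values are encoded
# as 2 * first bit + second bit, following our reading of the table.
dyadic  = [(2*a + b, 2*c + a, 2*b + c) for a, b, c in product((0, 1), repeat=3)]
triadic = [(2*h1 + l, 2*h2 + l, 2*((h1 + h2) % 2) + l)
           for h1, h2, l in product((0, 1), repeat=3)]

def entropy(samples, idx):
    """Entropy in bits of the marginal on coordinates idx, with the
    joint distribution uniform over `samples`."""
    vals = [tuple(s[i] for i in idx) for s in samples]
    probs = np.array([vals.count(v) for v in set(vals)]) / len(vals)
    return float(-(probs * np.log2(probs)).sum())

for name, dist in (("dyadic", dyadic), ("triadic", triadic)):
    print(name, [round(entropy(dist, idx), 3)
                 for r in (1, 2, 3) for idx in combinations(range(3), r)])
```

Both systems print the same profile (2 bits per variable, 3 bits per pair, and 3 bits for the triple), even though the dependencies are pairwise in one case and three-way in the other.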
The probability distribution for PwUnq.

| $x_1$ | $x_2$ | $t$ | $p$ |
|---|---|---|---|
| 0 | 1 | 1 | 1/4 |
| 1 | 0 | 1 | 1/4 |
| 0 | 2 | 2 | 1/4 |
| 2 | 0 | 2 | 1/4 |
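The PID below is consistent with the plain mutual informations of PwUnq, which a short sketch confirms (the `mi` helper is ours): each source carries half a bit about the target and the pair carries one bit, consistent with one bit of purely unique information.

```python
import numpy as np

# PwUnq, probability 1/4 per row: (x1, x2, t) as in the table above.
rows = [(0, 1, 1), (1, 0, 1), (0, 2, 2), (2, 0, 2)]

def mi(pairs):
    """I(A; B) in bits for uniformly weighted (a, b) pairs."""
    n = len(pairs)
    pa = {a: sum(1 for x, _ in pairs if x == a) / n for a, _ in pairs}
    pb = {b: sum(1 for _, y in pairs if y == b) / n for _, b in pairs}
    pab = {p: pairs.count(p) / n for p in pairs}
    return sum(pr * np.log2(pr / (pa[a] * pb[b])) for (a, b), pr in pab.items())

print(mi([(x1, t) for x1, x2, t in rows]))        # I(X1; T) = 0.5 bit
print(mi([(x2, t) for x1, x2, t in rows]))        # I(X2; T) = 0.5 bit
print(mi([((x1, x2), t) for x1, x2, t in rows]))  # I(X1, X2; T) = 1 bit
```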
Partial information decomposition (PID) for PwUnq: the redundant ({1}{2}), unique ({1} and {2}), and synergistic ({12}) information atoms. The first value column gives the atoms computed using the average partial information; the second gives the atoms based on path-based mutual information.

| Lattice Node | Average | Path-Based |
|---|---|---|
| {12} | 0 | 0 |
| {1} | 0.5 | 0.5 |
| {2} | 0.5 | 0.5 |
| {1}{2} | 0 | 0 |
The probability distributions from Reference [12]. We left out the distributions for And and Or because these are considered to be well known.
| $x_1$ | $x_2$ | 5A | 5B | 5C | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | | | | | |
| 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | | | | | |
| 1 | 0 | 2 | 2 | 2 | 1 | 1 | 1 | | | | | |
| 1 | 1 | − | − | 1 | 0 | − | − | 0 | 2 | | | |
| 0 | 1 | − | − | − | − | 1 | − | − | − | − | 0 | − |
| 1 | 1 | − | − | − | − | 1 | − | − | − | − | 0 | − |
PID for 5a.

| Lattice Node | | | Average | Path-Based |
|---|---|---|---|---|
| {12} | 0.3333 | 0 | 0.1383 | 0.1383 |
| {1} | 0.3333 | 0.6666 | 0.5283 | 0.5283 |
| {2} | 0.3333 | 0.6666 | 0.5283 | 0.5283 |
| {1}{2} | 0.5850 | 0.2516 | 0.3900 | 0.3900 |
PID for 5b.

| Lattice Node | | | Average | Path-Based |
|---|---|---|---|---|
| {12} | 0.5 | 0 | 0 | 0 |
| {1} | 0.5 | 1 | 1 | 1 |
| {2} | 0 | 0.5 | 0.5 | 0.5 |
| {1}{2} | 0.5 | 0 | 0 | 0 |
PID for 5c.

| Lattice Node | | | Average | Path-Based |
|---|---|---|---|---|
| {12} | 0.67 | 0.67 | 0.67 | 0.67 |
| {1} | 0.25 | 0.25 | 0.25 | 0.25 |
| {2} | 0 | 0 | 0 | 0 |
| {1}{2} | 0 | 0 | 0 | 0 |
PID for ReducedOr.

| Lattice Node | | | Average | Path-Based |
|---|---|---|---|---|
| {12} | 0.69 | 0.69 | 0.38 | 0.40 |
| {1} | 0 | 0 | 0.31 | 0.29 |
| {2} | 0 | 0 | 0.31 | 0.29 |
| {1}{2} | 0.31 | 0.31 | 0 | 0.02 |
PID for Xor.

| Lattice Node | | | Average | Path-Based |
|---|---|---|---|---|
| {12} | 1 | 1 | 1 | 1 |
| {1} | 0 | 0 | 0 | 0 |
| {2} | 0 | 0 | 0 | 0 |
| {1}{2} | 0 | 0 | 0 | 0 |
PID for And/Or.

| Lattice Node | | | Average | Path-Based |
|---|---|---|---|---|
| {12} | 0.5 | 0.5 | 0.29 | 0.19 |
| {1} | 0 | 0 | 0.21 | 0.31 |
| {2} | 0 | 0 | 0.21 | 0.31 |
| {1}{2} | 0.31 | 0.31 | 0.10 | 0 |
PID for Sum.

| Lattice Node | | | Average | Path-Based |
|---|---|---|---|---|
| {12} | 1 | 1 | 0.5 | 0.5 |
| {1} | 0 | 0 | 0.5 | 0.5 |
| {2} | 0 | 0 | 0.5 | 0.5 |
| {1}{2} | 0.5 | 0.5 | 0 | 0 |
All possible combinations for the redundancies from Equation (A9).