Kamaludin Dingle, Chico Q Camargo, Ard A Louis.
Abstract
Many systems in nature can be described using discrete input-output maps. Without knowing details about a map, there may seem to be no a priori reason to expect that a randomly chosen input would be more likely to generate one output over another. Here, by extending fundamental results from algorithmic information theory, we show instead that for many real-world maps, the a priori probability P(x) that randomly sampled inputs generate a particular output x decays exponentially with the approximate Kolmogorov complexity K̃(x) of that output. These input-output maps are biased towards simplicity. We derive an upper bound P(x) ≲ 2^(−aK̃(x) − b), which is tight for most inputs. The constants a and b, as well as many properties of P(x), can be predicted with minimal knowledge of the map. We explore this strong bias towards simple outputs in systems ranging from the folding of RNA secondary structures to systems of coupled ordinary differential equations to a stochastic financial trading model.
Year: 2018 PMID: 29472533 PMCID: PMC5823903 DOI: 10.1038/s41467-018-03101-6
Source DB: PubMed Journal: Nat Commun ISSN: 2041-1723 Impact factor: 14.919
Fig. 1 Simplicity bias. The probability P(x) that an output x is generated by random inputs versus the approximate complexity for (a) the discrete n = 55 RNA sequence to SS map (<0.1% of outputs take up 50% of the inputs [24]), (b) the coarse-grained circadian rhythm ODE map (2% of the outputs take up 50% of the inputs), (c) the Ornstein–Uhlenbeck financial model (0.6% of the outputs take up 50% of the inputs), (d) L-systems for plant morphology (3% of the outputs take up 50% of the inputs), (e) a random 32 × 32 matrix map, and (f) a limited complexity 32 × 32 matrix map (both with <0.1% of the outputs taking over 50% of the inputs). Schematic examples of low and high-complexity outputs are also shown for each map. Blue dots are probabilities that take the top 50% of the probability weight for each complexity value, while yellow dots denote the bottom 50% of the probability weight (only green was used for panel (a), the RNA map, because the output probabilities were calculated using the probability estimator described in ref. [35]). The bold black lines denote the upper bound described in Eq. (3), while the dashed red lines represent the same upper bound, but with the default b = 0. For (f), the upper bound line (orange) was fit to the distribution. All limited complexity maps exhibit simplicity bias, while the random matrix map does not.
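As a rough illustration of how the quantities plotted in Fig. 1 relate, the sketch below estimates P(x) by counting the outputs produced by sampled inputs and evaluates an upper bound of the assumed form 2^(−a·K̃(x) − b). The function names, the sample outputs, and the default a = 1 are hypothetical choices made for illustration only; they are not taken from the paper.

```python
from collections import Counter


def empirical_probabilities(outputs):
    """Estimate P(x) as the fraction of sampled inputs that produced output x."""
    counts = Counter(outputs)
    total = len(outputs)
    return {x: c / total for x, c in counts.items()}


def simplicity_bound(complexity, a=1.0, b=0.0):
    """Assumed bound 2^(-a*K - b); b = 0 corresponds to the 'default' dashed line."""
    return 2.0 ** (-a * complexity - b)


# Hypothetical outputs from eight sampled inputs (illustrative data only).
sampled = ["0000", "0000", "0000", "0000", "0101", "0101", "0110", "1011"]
probs = empirical_probabilities(sampled)
```

Plotting `probs[x]` against an approximate complexity of each x on a logarithmic probability axis, together with `simplicity_bound`, would give a plot in the style of the panels described above.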
Fig. 2 Variable complexity matrix map. The complexity of the 20 × 20 circulant matrix map can be varied by changing the complexity of the first row that defines the map. The plotted ratio is the mean complexity of all individual outputs of a given map, divided by the mean complexity of outputs generated by random sampling over all inputs. In this plot, made with 2.5 × 10⁴ matrices, the distribution of ratios is shown in a standard violin plot format. The horizontal dark blue lines denote the mean for each value of K̃(row). Only relatively simple matrix maps (with smaller K̃(row)) exhibit simplicity bias, as indicated by the ratio being significantly >1.
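The ratio described in the Fig. 2 caption can be sketched for a small circulant matrix map as follows. Several details here are assumptions rather than the authors' method: the map is taken to threshold the matrix-vector product at zero, complexity is approximated by a simple LZ78-style phrase count (a stand-in for whatever estimator the paper uses), and all inputs are enumerated exhaustively, which is practical only for small n rather than the 20 × 20 case. All function names are hypothetical.

```python
import itertools


def lz_phrase_count(bits):
    """Count phrases in an LZ78-style parse; a crude stand-in for approximate complexity."""
    seen, phrase, count = set(), "", 0
    for ch in bits:
        phrase += ch
        if phrase not in seen:
            seen.add(phrase)
            count += 1
            phrase = ""
    # A trailing, already-seen phrase still counts as one more phrase.
    return count + (1 if phrase else 0)


def circulant(first_row):
    """Build the circulant matrix whose k-th row is the first row shifted right by k."""
    n = len(first_row)
    return [[first_row[(j - k) % n] for j in range(n)] for k in range(n)]


def matrix_map(matrix, inp):
    """Assumed map: binary input vector -> binary string, thresholding M @ inp at zero."""
    return "".join(
        "1" if sum(m * p for m, p in zip(row, inp)) > 0 else "0" for row in matrix
    )


def complexity_ratio(first_row):
    """Mean complexity over distinct outputs divided by mean complexity over all inputs."""
    n = len(first_row)
    matrix = circulant(first_row)
    # Enumerating every input is equivalent to uniform sampling in expectation.
    outputs = [matrix_map(matrix, inp) for inp in itertools.product((0, 1), repeat=n)]
    distinct = set(outputs)
    mean_distinct = sum(map(lz_phrase_count, distinct)) / len(distinct)
    mean_sampled = sum(map(lz_phrase_count, outputs)) / len(outputs)
    return mean_distinct / mean_sampled
```

For example, `complexity_ratio([1, 0, 0])` uses the identity matrix, so every input yields a distinct output and the ratio is exactly 1; a ratio above 1 would indicate that the sampled outputs are, on average, simpler than the set of possible outputs.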