Takuya Isomura, Hideaki Shimazaki, Karl J Friston.
Abstract
This work considers a class of canonical neural networks comprising rate coding models, wherein neural activity and plasticity minimise a common cost function, and plasticity is modulated with a certain delay. We show that such neural networks implicitly perform active inference and learning to minimise the risk associated with future outcomes. Mathematical analyses demonstrate that this biological optimisation can be cast as maximisation of model evidence, or equivalently minimisation of variational free energy, under the well-known form of a partially observed Markov decision process (POMDP) model. This equivalence indicates that the delayed modulation of Hebbian plasticity, accompanied by adaptation of firing thresholds, is a sufficient neuronal substrate to attain Bayes optimal inference and control. We corroborated this proposition using numerical analyses of maze tasks. This theory offers a universal characterisation of canonical neural networks in terms of Bayesian belief updating and provides insight into the neuronal mechanisms underlying planning and adaptive behavioural control.
Year: 2022 PMID: 35031656 PMCID: PMC8760273 DOI: 10.1038/s42003-021-02994-2
Source DB: PubMed Journal: Commun Biol ISSN: 2399-3642
Glossary of expressions.
| Expression | Description |
|---|---|
| Canonical neural network | In this work, a canonical neural network is defined by differential equations of neural activity, derived by reducing realistic neuron models under certain approximations, which yield a network of rate coding neurons with a sigmoid activation function. In particular, we consider networks comprising a middle layer with recurrent connections and an output layer that provides feedback responses to the environment. |
| | Observations |
| | Risk function |
| | Parameter matrices |
| | Categorical distribution. In this expression, the probability that |
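The glossary defines a canonical neural network as rate coding neurons with a sigmoid activation, arranged as a recurrent middle layer and an output layer that feeds responses back to the environment. The sketch below illustrates only that architecture in discrete time. It is not the paper's differential equations: the weight names `W`, `K`, `V`, the thresholds `h_mid`, `h_out` and the random inputs are hypothetical choices made for illustration.

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid activation used by the rate coding units."""
    return 1.0 / (1.0 + np.exp(-z))

def step(o, x_prev, W, K, V, h_mid, h_out):
    """One discrete-time update of a hypothetical two-layer rate coding network.

    o            : sensory input vector
    x_prev       : previous middle-layer firing rates, fed back via recurrent weights K
    W, K         : feedforward and recurrent weights of the middle layer
    V            : middle-to-output weights
    h_mid, h_out : firing thresholds of the two layers
    """
    x = sigmoid(W @ o + K @ x_prev - h_mid)  # middle layer with recurrence
    y = sigmoid(V @ x - h_out)               # output layer (feedback responses)
    return x, y

rng = np.random.default_rng(0)
n_in, n_mid, n_out = 8, 4, 2
W = rng.normal(size=(n_mid, n_in))
K = rng.normal(scale=0.1, size=(n_mid, n_mid))
V = rng.normal(size=(n_out, n_mid))
h_mid, h_out = np.zeros(n_mid), np.zeros(n_out)

x = np.full(n_mid, 0.5)  # initial middle-layer rates
for t in range(5):
    o = rng.integers(0, 2, size=n_in).astype(float)  # hypothetical binary inputs
    x, y = step(o, x, W, K, V, h_mid, h_out)
```

Every firing rate stays in (0, 1) because of the sigmoid, which is what allows such rates to be read as probabilities in the Bayesian correspondence.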
Fig. 1 Schematic of an external milieu and neural network, and the corresponding Bayesian formation.
a Interaction between the external milieu and an autonomous system comprising a two-layer neural network. On receiving sensory inputs (observations) generated from hidden states, the network activity generates outputs. Gradient descent on a neural network cost function L determines the dynamics of neural activity and plasticity; thus, L is sufficient to characterise the neural network. According to the proposed theory, the ensuing neural dynamics self-organise to encode the posterior beliefs about hidden states and decisions. b Corresponding variational Bayesian formation. The interaction depicted in a is formulated in terms of a POMDP model, which is parameterised by and . Variational free energy minimisation allows an agent to self-organise to encode the hidden states of the external milieu and to make decisions that minimise future risk. Here, variational free energy F is sufficient to characterise the inferences and behaviours of the agent.
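Why minimising variational free energy amounts to maximising model evidence can be checked on a toy problem. The sketch assumes a single categorical hidden state with a made-up prior and likelihood (not the paper's POMDP) and computes F = E_q[ln q(s) − ln p(o, s)] = KL(q ‖ p(s | o)) − ln p(o). F is never below −ln p(o) and equals it exactly when q is the true posterior.

```python
import numpy as np

def free_energy(q, prior, lik_o):
    """Variational free energy F = E_q[ln q(s) - ln p(o, s)] for one observation.

    q     : approximate posterior over hidden states (sums to 1, no zeros)
    prior : prior p(s)
    lik_o : likelihood p(o | s) of the observed outcome, one entry per state
    """
    joint = lik_o * prior  # p(o, s)
    return float(np.sum(q * (np.log(q) - np.log(joint))))

prior = np.array([0.5, 0.3, 0.2])   # hypothetical prior over three hidden states
lik_o = np.array([0.1, 0.7, 0.4])   # hypothetical p(o | s) for the observed o
evidence = np.sum(lik_o * prior)    # model evidence p(o)
posterior = lik_o * prior / evidence  # exact Bayesian posterior p(s | o)

F_exact = free_energy(posterior, prior, lik_o)           # equals -ln p(o)
F_uniform = free_energy(np.full(3, 1 / 3), prior, lik_o)  # any other q gives larger F
```

Since −ln p(o) does not depend on q, driving F down both pulls q towards the posterior and tightens a bound on the (negative log) evidence.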
Fig. 2 Factor graph depicting a fictive causality of factors that the generative model hypothesises.
The POMDP model is expressed as a Forney factor graph[69,70] based upon the formulation in ref. [71]. The arrows from the present risk (sampled from ) to past decisions optimise the policy in a post hoc manner to minimise future risk. In reality, the current error is determined by past decisions (top). In contrast, decision making to minimise the future risk implies a fictive causality from to (bottom). Inference and learning correspond to the inversion of this generative model. Postdiction of past decisions is formulated as the learning of the policy mapping, conditioned by . Here, A, B and C indicate conditional probability matrices, and bold-case variables are the corresponding posterior beliefs. Moreover, and indicate the true prior beliefs about hidden states and decisions, while D and E indicate the priors that the network operates under. If and only if and , inferences and behaviours are optimal for a given task or set of environmental contingencies; otherwise, they are biased.
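To make the roles of A, B, D and E concrete, the sketch below samples forward from a generic discrete POMDP: a likelihood matrix A giving p(o_t | s_t), decision-conditioned transitions B giving p(s_t | s_{t−1}, d), a state prior D and a decision prior E. All sizes and numbers are hypothetical, decisions are simply drawn i.i.d. from E, and C and the risk pathway are omitted, so this is a simplified illustration rather than the paper's exact factorisation.

```python
import numpy as np

rng = np.random.default_rng(1)
n_states, n_obs, n_decisions, T = 3, 2, 2, 10

# Likelihood p(o_t | s_t): each column is a distribution over outcomes.
A = np.array([[0.9, 0.2, 0.5],
              [0.1, 0.8, 0.5]])

# Transitions p(s_t | s_{t-1}, d): decision k mostly shifts the state by k.
B = np.stack([np.roll(np.eye(n_states), k, axis=0) * 0.8 + 0.2 / n_states
              for k in range(n_decisions)], axis=2)  # shape (S, S, D)

D = np.full(n_states, 1 / n_states)      # state prior
E = np.full(n_decisions, 1 / n_decisions)  # decision prior

states, outcomes, decisions = [], [], []
s = rng.choice(n_states, p=D)
for t in range(T):
    o = rng.choice(n_obs, p=A[:, s])     # generate an observation
    d = rng.choice(n_decisions, p=E)     # draw a decision from the prior
    states.append(s)
    outcomes.append(o)
    decisions.append(d)
    s = rng.choice(n_states, p=B[:, s, d])  # move to the next hidden state
```

Inverting this model means going the other way: inferring s and d (and learning A, B) from the observed sequence.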
Fig. 3 Mathematical equivalence between variational free energy and neural network cost functions, depicted by one-to-one correspondence of their components.
Top: variational free energy transformed from Eq. (5) using Bayes' theorem. Here, and indicate the inverse mappings, and D and E are the state and decision priors. Bottom: neural network cost function that is a counterpart to the aforementioned variational free energy. In this equation, , , and (for ) indicate the sigmoid functions of synaptic strengths. Moreover, and are perturbation terms that characterise the bias in firing thresholds. Here, is a function of and , while is a function of . When is the sigmoid function of , holds for an arbitrary . Using this relationship, Eq. (7) is transformed into the form presented at the bottom of this figure. This form of the cost function formally corresponds to the variational free energy shown at the top of this figure. Blue lines show the one-to-one correspondence of their components.
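The caption relies on a relationship that holds whenever a quantity is the sigmoid of a synaptic strength, but the exact expression was lost in extraction. Assuming it is of the standard logit-inversion kind, the sketch checks two identities numerically for arbitrary strengths w: w = ln ŵ − ln(1 − ŵ) with ŵ = sigmoid(w), and 1 − sigmoid(w) = sigmoid(−w). Identities like these let a sigmoid-parameterised cost be rewritten in terms of log-probabilities.

```python
import numpy as np

def sigmoid(w):
    """Logistic sigmoid."""
    return 1.0 / (1.0 + np.exp(-w))

w = np.linspace(-5.0, 5.0, 11)  # arbitrary synaptic strengths
w_hat = sigmoid(w)              # sigmoid-transformed strengths in (0, 1)

# Logit inversion: recover w from its sigmoid, for any real w.
w_recovered = np.log(w_hat) - np.log1p(-w_hat)

# Complement identity: ln(1 - sigmoid(w)) = ln(sigmoid(-w)).
log_complement = np.log1p(-w_hat)
log_sig_neg = np.log(sigmoid(-w))
```

`np.log1p(-w_hat)` is used instead of `np.log(1 - w_hat)` for slightly better accuracy when ŵ is close to 0.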
Correspondence of variables and functions.
| Neural network formation | Variational Bayes formation |
|---|---|
| Sensory inputs | Observations |
| Middle-layer neural activity | State posterior |
| Output-layer neural activity | Decision posterior |
| Feedback response | Decision |
| Neuromodulator | Risk function |
| Synaptic strengths | Parameter posterior |
| Perturbation terms | State prior |
| | Decision prior |
| Firing thresholds | |
| Initial synaptic strengths | Parameter prior |
Bold-case variables denote the posterior expectations of the corresponding italic-case random variables. Note that are initial values of (for ) and are inverse learning-rate factors that express the insensitivity of synaptic strengths to plasticity. Please refer to the previous paper[22] for details.
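One way to see why initial synaptic strengths can play the role of a parameter prior, and why an inverse learning-rate factor makes a synapse less plastic, is a Bayesian pseudo-count update. This is an assumption for illustration: the function `posterior_strength`, the factor `lam` and the running-average form are hypothetical, and the paper's actual rule is given in ref. [22]. The strength starts at `w_init` and moves towards the average Hebbian (pre × post) product, with `lam` acting as the weight of the prior.

```python
import numpy as np

def posterior_strength(w_init, lam, hebbian_products):
    """Hypothetical pseudo-count form of a synaptic strength.

    w_init           : initial synaptic strength (acts like the parameter prior)
    lam              : inverse learning-rate factor (prior pseudo-count);
                       larger values make the synapse less sensitive to plasticity
    hebbian_products : pre * post activity products, one per time step
    """
    n = len(hebbian_products)
    return (lam * w_init + np.sum(hebbian_products)) / (lam + n)

rng = np.random.default_rng(2)
products = rng.uniform(0.6, 1.0, size=50)  # hypothetical co-activity, mean about 0.8

w_plastic = posterior_strength(0.5, lam=1.0, hebbian_products=products)
w_rigid = posterior_strength(0.5, lam=100.0, hebbian_products=products)
```

With the same experience, the synapse with the large factor stays much closer to its initial value, mirroring a strong prior that needs more data to be overturned.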
Fig. 4 Simulations of neural networks solving maze tasks.
a Neural network architecture. The agent receives the states (pathway or wall) of the neighbouring 11 × 11 cells as sensory inputs. A decision here represents a four-step sequence of actions (each selected from up, down, left or right), resulting in 256 options in total. The panels on the right depict observations and posterior beliefs about hidden states and decisions. b General view of the maze. The maze comprises a discrete state space, wherein white and black cells indicate pathways and walls, respectively. A thick blue cell indicates the current position of the agent, while the thin blue line is its trajectory. Starting from the left, the agent needs to reach the right edge of the maze within time steps. c Trajectories of the agent’s x-axis position in sessions before (black, session 1) and after (blue, session 100) training. d Duration to reach the goal when the neural network operates under uniform decision priors (where indicates the prior probability of selecting a decision involving rightward motion in the next step). Blue and red circles indicate successful and failed sessions, respectively. e Failure probability (left) and duration to reach the goal (right) when the neural network operates under three different prior conditions (black, blue and cyan, respectively), where and hold. Lines indicate the average of ten successive sessions. Although the neural network with exhibits better performance in the early stage, it later overestimates the preference for rightward motion, even when it approaches a wall. Results in e were obtained with 20 distinct, randomly generated mazes. Shaded areas indicate the standard error. Refer to Methods section ‘Simulations’ for further details.
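The decision space in panel a can be enumerated directly: four-step sequences over four actions give 4^4 = 256 decisions. Reading "a decision involving rightward motion in the next step" as "the first action is right" (an assumption) gives 4^3 = 64 such decisions, consistent with the 64 and 192 element split in Fig. 5a. The helper `decision_prior` is hypothetical: it spreads a chosen total probability evenly over the rightward-first decisions and the remainder over the other 192, so a total of 0.25 reproduces the uniform prior.

```python
from itertools import product

ACTIONS = ("up", "down", "left", "right")

# Every decision is a four-step sequence of actions: 4**4 = 256 options.
decisions = list(product(ACTIONS, repeat=4))

# Decisions whose next (first) step is a rightward move: 4**3 = 64 of them.
rightward = [d for d in decisions if d[0] == "right"]

def decision_prior(p_right_total):
    """Hypothetical decision prior: p_right_total is shared evenly among the
    rightward-first decisions, the remainder evenly among all the others."""
    n_right = len(rightward)
    n_other = len(decisions) - n_right
    return [p_right_total / n_right if d[0] == "right"
            else (1 - p_right_total) / n_other
            for d in decisions]

uniform = decision_prior(0.25)  # 1/256 for every decision
biased = decision_prior(0.5)    # favours decisions that start with a rightward move
```

A prior like `biased` helps early on in a left-to-right maze, but, as panel e notes, an overly strong rightward preference can hurt later when the agent faces a wall.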
Fig. 5 Estimation of implicit priors enables the prediction of subsequent learning.
a Estimation of the implicit prior (encoded by threshold factor ) under three different prior conditions (black, blue and cyan; cf. Fig. 4). Here, was estimated through Bayesian inference based on sequences of neural activity obtained with ten distinct mazes. Then, was computed by for each of 64 elements. The other 192 elements of E1 were also estimated. The sum of all the elements of E1 was normalised to 1. b Prediction of the learning process in previously unseen, randomly generated mazes. Using the estimated , we reconstructed the computational architecture (i.e. neural network) of the agent. Then, we simulated the adaptation of the agent’s behaviour using the reconstructed neural network and computed the trajectory of the probability of failing to reach the goal within time steps. The resulting learning trajectories (solid lines) predict those of the original agent (dashed lines) under three different prior conditions, without access to observed neural responses or behaviours. Lines and shaded areas indicate the mean and standard error, respectively. Inset panels compare the failure probability of the original and reconstructed agents after learning (averaged over sessions 51–100) in ten previously unseen mazes. Refer to Methods section ‘Data analysis’ for further details.
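The learning-curve summaries in Figs. 4e and 5b (ten-session averages, mean and standard error across mazes, and a post-learning average over sessions 51–100) can be sketched with synthetic data. The failure indicators below are fabricated placeholders drawn from a decaying failure rate, not simulation results, and computing the standard error across mazes after smoothing is an assumption about how the shaded areas were obtained.

```python
import numpy as np

rng = np.random.default_rng(3)
n_mazes, n_sessions = 20, 100

# Placeholder failure indicators (1 = failed to reach the goal) drawn from a
# failure rate that decays over sessions; stands in for real simulation output.
p_fail = 0.8 * np.exp(-np.arange(n_sessions) / 20.0) + 0.05
failures = (rng.random((n_mazes, n_sessions)) < p_fail).astype(float)

def moving_average(x, window=10):
    """Average over `window` successive sessions along the last axis."""
    kernel = np.ones(window) / window
    return np.apply_along_axis(lambda r: np.convolve(r, kernel, mode="valid"), -1, x)

smoothed = moving_average(failures)                          # (n_mazes, 91)
mean_curve = smoothed.mean(axis=0)                           # plotted line
sem_curve = smoothed.std(axis=0, ddof=1) / np.sqrt(n_mazes)  # shaded area

# Post-learning failure probability per maze: sessions 51-100 (indices 50..99).
late_failure = failures[:, 50:].mean(axis=1)
```

Comparing `late_failure` between an original and a reconstructed agent, maze by maze, would give the kind of scatter shown in the inset panels.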