Thomas Parr, Karl J Friston.
Abstract
Active inference is an approach to understanding behaviour that rests upon the idea that the brain uses an internal generative model to predict incoming sensory data. The fit between this model and data may be improved in two ways. The brain could optimise probabilistic beliefs about the variables in the generative model (i.e. perceptual inference). Alternatively, by acting on the world, it could change the sensory data, such that they are more consistent with the model. This implies a common objective function (variational free energy) for action and perception that scores the fit between an internal model and the world. We compare two free energy functionals for active inference in the framework of Markov decision processes. One of these is a functional of beliefs (i.e. probability distributions) about states and policies, but a function of observations, while the second is a functional of beliefs about all three. In the former (expected free energy), prior beliefs about outcomes are not part of the generative model (because they are absorbed into the prior over policies). Conversely, in the second (generalised free energy), priors over outcomes become an explicit component of the generative model. When using the variational free energy functional, which is blind to future observations, we equip the generative model with a prior over policies that ensures preferred outcomes (i.e. priors over outcomes) are realised. In other words, if we expect to encounter a particular kind of outcome, this lends plausibility to those policies for which this outcome is a consequence. In addition, this formulation ensures that selected policies minimise uncertainty about future outcomes by minimising the free energy expected in the future. When using the generalised free energy functional, which effectively treats future observations as hidden states, we show that policies are inferred or selected that realise prior preferences by minimising the free energy of future expectations.
Interestingly, the form of posterior beliefs about policies (and associated belief updating) turns out to be identical under both formulations, but the quantities used to compute them are not.
Keywords: Active inference; Bayesian; Data selection; Epistemic value; Free energy; Intrinsic motivation
Year: 2019 PMID: 31562544 PMCID: PMC6848054 DOI: 10.1007/s00422-019-00805-w
Source DB: PubMed Journal: Biol Cybern ISSN: 0340-1200 Impact factor: 2.086
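The abstract above turns on computing a variational free energy that scores the fit between beliefs and an observation. As a minimal numerical sketch (our own illustration, not code from the paper), the free energy for a single categorical observation under a discrete generative model can be written as follows; all function and variable names here are assumptions for illustration:

```python
import numpy as np

def softmax(x):
    """Normalised exponential over a vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def variational_free_energy(q_s, A, d, o):
    """F = E_q[ln q(s) - ln P(o|s) - ln P(s)] for a single time step.

    q_s : approximate posterior over hidden states (vector)
    A   : likelihood matrix, A[o, s] = P(o | s)
    d   : prior over hidden states (vector)
    o   : index of the observed outcome
    """
    eps = 1e-16  # avoid log(0)
    return float(q_s @ (np.log(q_s + eps) - np.log(A[o] + eps) - np.log(d + eps)))

# Example: two states, two outcomes, an informative likelihood
A = np.array([[0.9, 0.1],
              [0.1, 0.9]])
d = np.array([0.5, 0.5])

# The exact posterior minimises F; a flat belief scores worse
posterior = softmax(np.log(A[0] + 1e-16) + np.log(d))
flat = np.array([0.5, 0.5])
assert variational_free_energy(posterior, A, d, 0) < variational_free_energy(flat, A, d, 0)
```

At the exact posterior, F equals the negative log evidence for the observation, which is the sense in which minimising free energy improves the fit between model and data.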
Applications of active inference for Markov decision processes
| Application | Comment | References |
|---|---|---|
| Decision making under uncertainty | Initial formulation of active inference for | Friston et al. ( |
| Optimal control (the mountain car problem) | Illustration of | Friston et al. ( |
| Evidence accumulation: Urns task | Demonstration of how belief states are absorbed into a generative model | FitzGerald et al. ( |
| Addiction | Application to psychopathology | Schwartenbeck et al. ( |
| Dopaminergic responses | Associating dopamine with the encoding of (expected) precision provides a plausible account of dopaminergic discharges | FitzGerald et al. ( |
| Computational fMRI | Using Bayes optimal precision to predict activity in dopaminergic areas | Schwartenbeck et al. ( |
| Choice preferences and epistemics | Empirical testing of the hypothesis that people prefer to keep options open | Schwartenbeck et al. ( |
| Behavioural economics and trust games | Examining the effects of prior beliefs about self and others | Moutoussis et al. ( |
| Foraging and two-step mazes; navigation in deep mazes | Formulation of epistemic and pragmatic value in terms of | Friston et al. ( |
| Habit learning, reversal learning and devaluation | Learning as minimising variational free energy with respect to model parameters—and action selection as | FitzGerald et al. ( |
| Saccadic searches and scene construction | | Friston and Buzsaki ( |
| Electrophysiological responses: | Simulating neuronal processing with a gradient descent on variational free energy, c.f., dynamic | Friston et al. ( |
| Structure learning, sleep and insight | Inclusion of parameters into expected free energy to enable structure learning via | Friston et al. ( |
| Narrative construction and reading | Hierarchical generalisation of generative model with | Friston et al. ( |
| Computational neuropsychology | Simulation of visual neglect, hallucinations and prefrontal syndromes under alternative pathological priors | Benrimoh et al. ( |
| Neuromodulation | Use of precision parameters to manipulate exploration during saccadic searches; associating uncertainty with cholinergic and noradrenergic systems | Parr and Friston ( |
| Decisions to movements | Hybrid continuous and discrete generative models to implement decisions through movement | Friston et al. ( |
| Planning, navigation and niche construction | Agent-induced changes in environment (generative process); decomposition of goals into subgoals | Bruineberg et al. ( |
Fig. 1 Markov decision process. This shows the basic structure of the discrete state-space generative model used in this paper, assuming the current time is t = τ. The factor graph on the left is the generative model we have used in previous work. Importantly, the prior belief about observations only enters this graph through the expected free energy (see main text), which enters the prior over policies. Policies index alternative trajectories, or sequences, of actions. In this sense, they are not time dependent, as each policy determines a sequence of actions for all time-points. Conversely, the actions (u) are time dependent. U is an array that specifies an action for each time step (rows) and each policy (columns). The selected action therefore depends upon the most likely policy and the action that policy implies for that time step. Action selection is technically not part of the generative model, as it relies upon the posterior distribution Q (please see main text for details), obtained by inverting the model. This is an important aspect of active inference, as it underwrites the way in which the system performing inference may change the process generating its observed data. The grey region of this graph indicates that the observation at the next time step is not yet available, so cannot yet be incorporated into the graph. The right factor graph is the new version of the generative model considered in this paper. This generative model does not require an expected free energy, and the prior over outcomes enters the model directly as a constraint on outcomes. This also shows a time dependence, as future outcomes are treated as unobserved latent variables (indicated by an unfilled circle). Observed variables are shown as filled circles in both graphs and unobserved variables as unfilled circles. Factors of the generative model (i.e. conditional probability distributions and prior probabilities) are shown as squares.
These squares are connected to those circles containing variables that participate in the same factor. Please refer to the main text and Table 2 for a description of the variables. In the panel on the right, the definitions are given for each of the factors in blue squares. Here, Cat refers to the categorical distribution (color figure online)
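The generative process sketched in this factor graph can be rolled out numerically: states evolve under policy-specified actions via a transition matrix, and each state generates an outcome via a likelihood matrix. The sketch below assumes categorical distributions throughout; `sample_trajectory` and its argument names are illustrative, not from the paper:

```python
import numpy as np

def sample_trajectory(A, B, d, policy, rng):
    """Roll out the discrete generative process for one policy.

    A      : likelihood, A[o, s] = P(o | s)
    B      : list of transition matrices, B[u][s', s] = P(s' | s, action u)
    d      : prior over the initial state
    policy : sequence of action indices (one action per time step)
    """
    s = rng.choice(len(d), p=d)
    states = [s]
    outcomes = [rng.choice(A.shape[0], p=A[:, s])]
    for u in policy:
        s = rng.choice(B[u].shape[0], p=B[u][:, s])  # state transition under action u
        states.append(s)
        outcomes.append(rng.choice(A.shape[0], p=A[:, s]))  # outcome from new state
    return states, outcomes

# Deterministic example: identity likelihood, actions "stay" (0) and "switch" (1)
A = np.eye(2)
B = [np.eye(2), np.array([[0., 1.], [1., 0.]])]
d = np.array([1.0, 0.0])
states, outcomes = sample_trajectory(A, B, d, policy=[1, 0], rng=np.random.default_rng(0))
```

Because a policy fixes the whole action sequence up front, the rollout is indexed by policy rather than by time, matching the caption's point that policies are not time dependent while actions are.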
Variables in update equations
| Variable | Definition |
|---|---|
| Variational free energy | |
| Expected free energy | |
| Generalised free energy | |
| Policy prior and posterior | |
| State belief (for a given policy and time) | |
| Outcome belief (for a given policy and time) | |
| Outcome | |
| Likelihood matrix (mapping states to outcomes) | |
| Transition matrix (mapping states to states) | |
| Outcome prior | |
| Fixed form policy prior | |
| Entropy of the likelihood mapping |
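Two of the quantities listed above, the expected free energy and the entropy of the likelihood mapping, combine in the risk-plus-ambiguity decomposition that is standard in the active inference literature. A hedged sketch for a single future time step under one policy (the function name and epsilon smoothing are our assumptions):

```python
import numpy as np

def expected_free_energy(q_s, A, log_c):
    """Expected free energy as risk plus ambiguity for one future time step.

    q_s   : predicted state distribution Q(s | policy)
    A     : likelihood matrix, A[o, s] = P(o | s)
    log_c : log prior preferences over outcomes, ln P(o)
    """
    eps = 1e-16
    q_o = A @ q_s                                            # predicted outcomes Q(o | policy)
    risk = float(q_o @ (np.log(q_o + eps) - log_c))          # KL[Q(o) || P(o)]
    ambiguity = float(q_s @ (-(A * np.log(A + eps)).sum(axis=0)))  # E_Q[H[P(o | s)]]
    return risk + ambiguity

# With an unambiguous (identity) likelihood, only risk contributes
A = np.eye(2)
G = expected_free_energy(np.array([1.0, 0.0]), A, np.log(np.array([0.5, 0.5])))
```

Policies predicted to reach preferred outcomes have low risk, and policies that visit states with precise likelihood mappings have low ambiguity, which is how epistemic and pragmatic value both enter the same score.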
Fig. 2 Temporal progression of Markov decision process. The upper graphs show the structure of the generative model implied when using the variational free energy, equipped with a prior that the expected free energy will be minimised by policy selection. Observations are added to the model as they occur. The lower graphs show the structure of the generative model that explicitly represents future outcomes and minimises a generalised free energy through policy selection. As observations are made, the outcome variables collapse to delta functions. These graphics are intended to highlight two alternative conceptions of a generative model employed in an online setting. The key problem here is how to deal with missing (future) outcomes. These could be omitted until such a time as they become available. Alternatively, they could be treated as hidden variables about which we can hold beliefs. Please note that this graphic illustrates different ways of formulating the generative model used to calculate belief updates. It does not show belief updates, behaviour or any other free energy minimising process. These will be detailed in subsequent sections and figures. However, the reason for making this distinction is important for how we formulate the free energy. The key distinction between the free energies compared in this paper is which of the two perspectives on future outcomes we choose to adopt
Fig. 3 Belief update equations. The blue panels show the update equations using the standard variational approach. The pink panels show the update equations when the generalised free energy is used. The equations in this figure show the fixed points for the sufficient statistics of each variational distribution. These are calculated as in the main text by finding the minima of each of the free energy functionals. As such, updating the variational distributions (left-hand side of each equation) to their fixed points (right-hand side of each equation) following each new observation minimises the corresponding free energy. The dotted outline indicates the correspondence between the generalised free energy and the sum of the variational and expected free energies, and therefore the equivalence of the form of the posteriors over policies. However, it should be remembered that the variables within these equations are not identical, as the update equations demonstrate. See Table 2 for the definitions of the variables as they appear here. The equations used here are discrete updates. A more biologically plausible (gradient ascent) scheme is used in the simulations. These simply replace the updates with differential equations that have stationary points corresponding to the variational solutions above. Because the belief updates specified in Fig. 3 take each belief distribution to its free energy minimum, the belief updates and corresponding policy choices necessarily minimise free energy. In the update equations shown here, o is treated as a binary vector with one in the element corresponding to the observed data, and zero for all other elements. This ensures consistency with the linear algebraic expression of the update equations (color figure online)
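The caption notes that the posterior over policies takes the same softmax form under both schemes, even though the quantities entering it differ. Schematically, the fixed point is a softmax of the log policy prior minus the relevant free energies; this sketch illustrates only that common form, with names of our own choosing, and does not reproduce the paper's full update equations:

```python
import numpy as np

def softmax(x):
    """Normalised exponential over a vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def policy_posterior(log_E, F, G):
    """Fixed point for the posterior over policies, Q(pi).

    log_E : log of the fixed-form policy prior, one entry per policy
    F     : variational free energy accumulated under each policy
    G     : expected free energy of each policy's future
    """
    return softmax(log_E - F - G)

# Two policies, uniform prior: the policy with lower expected free energy
# receives more posterior mass
q_pi = policy_posterior(np.zeros(2), np.zeros(2), np.array([0.0, np.log(3.0)]))
```

A gradient-ascent variant, as mentioned in the caption, would move beliefs continuously toward this fixed point rather than jumping to it in one discrete step.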
Fig. 4 T-maze simulation. The left part of this figure shows the structure of the generative model used to illustrate the behavioural consequences of each set of update equations. We have previously used this generative model to address exploration and exploitation in two-step tasks, further details of which can be found in Friston et al. (2015). In brief, an agent can find itself in one of four different locations and can move among these locations. Locations 2 and 3 are absorbing states, so the agent is not able to leave these locations once they have been visited. The initial location is always 1. Policies define the possible sequences of movements the agent can take throughout the trial. For all ten available policies, after the second action, the agent stays where it is. There are two possible contexts: the unconditioned stimulus (US) may be in the left or right arm of the maze. The context and location together give rise to observable outcomes. The first of these is the location, which is obtained through an identity mapping from the hidden state representing location. The second outcome is the cue that is observed. In location 1, a conditioned stimulus (CS) is observed, but there is a 50% chance of observing blue or green, regardless of the context, so this is uninformative (and ambiguous). Location 4 deterministically generates a CS based on the context, so visiting this location resolves uncertainty about the location of the US. The US observation is probabilistically dependent on the context: it is observed with a 90% chance in the left arm in context 1 and a 90% chance in the right arm in context 2. The right part of this figure compares an agent that minimises its variational free energy (under the prior belief that it will select policies with a low expected free energy) with an agent that minimises its generalised free energy. The upper plots show the posterior beliefs about policies, where darker shades indicate more probable policies.
Below these, the posterior beliefs about states (location and context) are shown, with blue dots superimposed to show the true states used to generate the data. The lower plots show the prior beliefs about outcomes (i.e. preferences), and the true outcomes (blue dots) the agent encountered. Note that a US is preferred to either CS, both of which are preferable to no stimulus (NS). Outcomes are observed at each time step, depending upon actions selected at the previous step. The time steps shown here align with the sequence of events during a trial, such that a new outcome is available at each step. Actions induce transitions from one time step to the next (color figure online)
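The likelihood mapping described in this caption can be written down explicitly. The sketch below fills in one detail the caption leaves open: we assume the US appears in the "wrong" arm with the complementary 10% probability and that "no stimulus" (NS) fills the remaining mass; outcome labels and array names are ours:

```python
import numpy as np

n_loc, n_ctx = 4, 2   # four locations; context 1 = US left, context 2 = US right

# Stimulus modality: 0 = NS, 1 = CS-blue, 2 = CS-green, 3 = US
A_cue = np.zeros((4, n_loc, n_ctx))
A_cue[1, 0, :] = A_cue[2, 0, :] = 0.5        # location 1: ambiguous CS, 50/50 colour
A_cue[3, 1, 0], A_cue[0, 1, 0] = 0.9, 0.1    # left arm, context 1: US with 90% chance
A_cue[3, 1, 1], A_cue[0, 1, 1] = 0.1, 0.9    # left arm, context 2 (assumed complement)
A_cue[3, 2, 0], A_cue[0, 2, 0] = 0.1, 0.9    # right arm, context 1 (assumed complement)
A_cue[3, 2, 1], A_cue[0, 2, 1] = 0.9, 0.1    # right arm, context 2: US with 90% chance
A_cue[1, 3, 0] = 1.0                         # location 4: CS deterministically
A_cue[2, 3, 1] = 1.0                         #   reveals the context

# Location modality: identity mapping from the location state
A_loc = np.zeros((n_loc, n_loc, n_ctx))
for ctx in range(n_ctx):
    A_loc[:, :, ctx] = np.eye(n_loc)

# Every column of each likelihood array is a proper categorical distribution
assert np.allclose(A_cue.sum(axis=0), 1)
assert np.allclose(A_loc.sum(axis=0), 1)
```

Written this way, the epistemic appeal of location 4 is visible in the arrays: its columns of `A_cue` are deterministic, so visiting it resolves all uncertainty about the context, whereas location 1's columns are maximally ambiguous.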
Fig. 5 Optimistic distortions of future beliefs. These raster plots represent the (Bayesian model average of the) approximate posterior beliefs about states (specifically, those pertaining to location). At each time step t, there is a set of units encoding beliefs about every other time step τ in the past and future. The evolution of these beliefs reflects the evidence accumulation, or belief updating, of approximate posterior expectations, with lighter shades indicating more probable states
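The Bayesian model average referred to in this caption is a policy-weighted mixture of the policy-conditioned state beliefs. A one-line sketch, with names of our own choosing:

```python
import numpy as np

def bayesian_model_average(q_pi, q_s_pi):
    """Average policy-conditioned state beliefs under the policy posterior:
    Q(s_tau) = sum_pi Q(pi) Q(s_tau | pi).

    q_pi   : posterior over policies, shape (n_policies,)
    q_s_pi : state beliefs per policy, shape (n_policies, n_states)
    """
    return q_pi @ q_s_pi

# Two policies predicting different locations: the average leans toward the
# location predicted by the more probable policy
q = bayesian_model_average(np.array([0.75, 0.25]),
                           np.array([[1.0, 0.0],
                                     [0.0, 1.0]]))
```

This mixture is what each raster plot displays: as evidence accumulates and the policy posterior sharpens, the average collapses toward the beliefs of the winning policy.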