Martin Biehl¹, Christian Guckelsberger², Christoph Salge³˒⁴, Simón C Smith⁴˒⁵, Daniel Polani⁴.
Abstract
Active inference is an ambitious theory that treats perception, inference, and action selection of autonomous agents under the heading of a single principle. It suggests biologically plausible explanations for many cognitive phenomena, including consciousness. In active inference, action selection is driven by an objective function that evaluates possible future actions with respect to current, inferred beliefs about the world. At its core, active inference is independent of extrinsic rewards, resulting in a high level of robustness across, e.g., different environments or agent morphologies. In the literature, paradigms that share this independence have been summarized under the notion of intrinsic motivations. In general, and in contrast to active inference, these models of motivation come without a commitment to particular inference and action selection mechanisms. In this article, we study whether the inference and action selection machinery of active inference can also be used by alternatives to the originally included intrinsic motivation. The perception-action loop explicitly relates inference and action selection to the environment and agent memory, and is consequently used as the foundation for our analysis. We reconstruct the active inference approach, locate the original formulation within it, and show how alternative intrinsic motivations can be used while keeping many of the original features intact. Furthermore, we illustrate the connection to universal reinforcement learning by means of our formalism. Active inference research may profit from comparisons of the dynamics induced by alternative intrinsic motivations. Research on intrinsic motivations may profit from an additional way to implement intrinsically motivated agents that also share the biological plausibility of active inference.
Keywords: active inference; empowerment; free energy principle; intrinsic motivation; perception-action loop; predictive information; universal reinforcement learning; variational inference
Year: 2018 PMID: 30214404 PMCID: PMC6125413 DOI: 10.3389/fnbot.2018.00045
Source DB: PubMed Journal: Front Neurorobot ISSN: 1662-5218 Impact factor: 2.650
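The abstract's central move is to keep the inference and action-selection machinery fixed while swapping the intrinsic motivation that scores future actions. The following is a minimal sketch of that idea, not the article's formulation. It assumes a discrete model whose current belief over environment states, environment dynamics q(ê′|â′, ê), and sensor dynamics q(ŝ|ê) are given as plain Python dictionaries. All names (`predict`, `risk`, `info_gain`, `select_action`) and numbers are hypothetical. The two plugged-in objectives are illustrative stand-ins: a risk-style term (negative KL divergence from the predicted outcome distribution to a prior over outcomes) and an information-gain-style term (mutual information between the predicted environment state and the predicted sensor value).

```python
import math

# Hypothetical model: ENV_DYN[a][e][e_next] = q(ê' = e_next | â' = a, ê = e)
ENV_DYN = {
    0: {0: {0: 0.9, 1: 0.1}, 1: {0: 0.1, 1: 0.9}},  # action 0: mostly stay
    1: {0: {0: 0.1, 1: 0.9}, 1: {0: 0.9, 1: 0.1}},  # action 1: mostly switch
}
# SENS_DYN[e][s] = q(ŝ = s | ê = e); state 1 is observed more reliably than state 0
SENS_DYN = {0: {0: 0.6, 1: 0.4}, 1: {0: 0.05, 1: 0.95}}

def predict(belief, action):
    """Joint predictive q(ê', ŝ' | â') obtained from the current belief over ê."""
    joint = {}
    for e, p_e in belief.items():
        for e_next, p_tr in ENV_DYN[action][e].items():
            for s, p_s in SENS_DYN[e_next].items():
                key = (e_next, s)
                joint[key] = joint.get(key, 0.0) + p_e * p_tr * p_s
    return joint

def marginal(joint, idx):
    """Marginal of the joint over component idx (0 = environment, 1 = sensor)."""
    out = {}
    for key, p in joint.items():
        out[key[idx]] = out.get(key[idx], 0.0) + p
    return out

def risk(joint, outcome_prior):
    """Risk-style score: -KL(q(ŝ') || prior over outcomes); higher is better."""
    q_s = marginal(joint, 1)
    return -sum(p * math.log(p / outcome_prior[s]) for s, p in q_s.items() if p > 0)

def info_gain(joint, _outcome_prior=None):
    """Information-gain-style score: I(Ê'; Ŝ') under the predictive, in nats."""
    q_e, q_s = marginal(joint, 0), marginal(joint, 1)
    return sum(p * math.log(p / (q_e[e] * q_s[s]))
               for (e, s), p in joint.items() if p > 0)

def select_action(belief, objective, outcome_prior):
    """Identical selection machinery for any objective: take the best-scoring action."""
    scores = {a: objective(predict(belief, a), outcome_prior) for a in ENV_DYN}
    return max(scores, key=scores.get), scores
```

Under these numbers, `select_action({0: 0.9, 1: 0.1}, risk, {0: 0.9, 1: 0.1})` picks action 0, while the same call with `info_gain` picks action 1: the selection routine is unchanged and only the plugged-in objective differs.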
Figure 1. First two time steps of the Bayesian network representing the perception-action loop (PA-loop). All subsequent time steps are identical to the one from time t = 1 to t = 2.
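Because every time step after the first repeats the same structure, the loop can be simulated by ancestral sampling, drawing each node after its parents. This is a minimal sketch under assumptions not stated in the caption: the environment state at time t depends on the previous environment state and the current action, the sensor value depends on the current environment state, and the agent memory simply stores the sensor-action history, which a hypothetical `policy` maps to the next action. All distributions and function names are illustrative.

```python
import random

def sample(dist, rng):
    """Draw a key from a {value: probability} dictionary."""
    r, acc = rng.random(), 0.0
    for value, p in dist.items():
        acc += p
        if r < acc:
            return value
    return value  # guard against floating-point round-off

# Hypothetical environment: two states, two actions, noisy binary sensor.
P_E0 = {0: 0.5, 1: 0.5}

def p_env(e_prev, a):
    """Assumed p(e_t | e_{t-1}, a_t): action 0 tends to keep the state, action 1 to flip it."""
    stay = 0.9 if a == 0 else 0.1
    return {e_prev: stay, 1 - e_prev: 1 - stay}

def p_sensor(e):
    """Assumed p(s_t | e_t): the sensor reports the state correctly 80% of the time."""
    return {e: 0.8, 1 - e: 0.2}

def policy(memory):
    """Hypothetical policy: the action depends only on the last stored sensor value."""
    last_s = memory[-1][0]
    return {0: 0.9, 1: 0.1} if last_s == 1 else {0: 0.1, 1: 0.9}

def run_loop(T, seed=0):
    """Sample e_0, s_0, then T further steps of (a_t, e_t, s_t)."""
    rng = random.Random(seed)
    e = sample(P_E0, rng)
    s = sample(p_sensor(e), rng)
    memory = [(s, None)]                 # memory after the first observation
    trajectory = [(e, s, None)]
    for _ in range(T):
        a = sample(policy(memory), rng)  # action drawn from the memory only
        e = sample(p_env(e, a), rng)     # environment: previous state and action
        s = sample(p_sensor(e), rng)     # sensor: current environment state
        memory.append((s, a))            # memory update
        trajectory.append((e, s, a))
    return trajectory
```

Fixing `seed` makes a run reproducible, which is convenient when comparing agents that differ only in their policy.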
Figure 2. Bayesian network of the generative model with parameters Θ = (Θ1, Θ2, Θ3) and hyperparameters Ξ = (Ξ1, Ξ2, Ξ3). Hatted variables are models/estimates of their non-hatted counterparts in the perception-action loop in Figure 1. Under the usual Bayesian network convention, an edge that splits to connect one node to n nodes (e.g., Θ2 to Ê1, Ê2, …) corresponds to n edges from that node to each of the targets. Note that, in contrast to the perception-action loop in Figure 1, the imagined actions have no parents. Past imagined actions are set to the values actually taken; for future ones, a probability distribution over them must be assumed.
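In Figure 2 the same parameter draw (e.g., Θ2) feeds every time step, while imagined actions enter from outside the model. This is a minimal sketch of sampling from such a model. It assumes Dirichlet hyperparameters over categorical sensor, environment, and initial-state distributions. The Dirichlet-categorical choice, the sizes, and the names `xi1`, `xi2`, `xi3`, `sample_parameters`, and `sample_model` are assumptions made for illustration.

```python
import random

N_E, N_A = 2, 2  # hypothetical numbers of environment states and actions

def dirichlet(alpha, rng):
    """Sample a probability vector from Dir(alpha) by normalising Gamma draws."""
    g = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(g)
    return [x / total for x in g]

def categorical(probs, rng):
    """Draw an index with the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs)[0]

def sample_parameters(xi1, xi2, xi3, rng):
    """One draw θ = (θ1, θ2, θ3) from the hyperparameters, shared by all time steps."""
    theta1 = [dirichlet(xi1[e], rng) for e in range(N_E)]        # q(ŝ | ê)
    theta2 = [[dirichlet(xi2[a][e], rng) for e in range(N_E)]    # q(ê' | â' = a, ê = e)
              for a in range(N_A)]
    theta3 = dirichlet(xi3, rng)                                 # q(ê0)
    return theta1, theta2, theta3

def sample_model(actions, xi1, xi2, xi3, seed=0):
    """Generate Ê_0..Ê_T and Ŝ_0..Ŝ_T given externally supplied actions Â_1..Â_T."""
    rng = random.Random(seed)
    theta1, theta2, theta3 = sample_parameters(xi1, xi2, xi3, rng)
    e = categorical(theta3, rng)
    envs, sensors = [e], [categorical(theta1[e], rng)]
    for a in actions:  # actions have no parents: they are inputs, not sampled
        e = categorical(theta2[a][e], rng)
        envs.append(e)
        sensors.append(categorical(theta1[e], rng))
    return envs, sensors
```

Passing past actions reproduces the "set to past values" case; passing a sequence drawn from an assumed distribution covers imagined future actions.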
Figure 3. Internal generative model with data plugged in up to t = 2 (Ŝ0 = s0, Ŝ1 = s1, and Â1 = a1) and with hyperparameters that are fixed from now on, ξ = (ξ1, ξ2, ξ3). Conditioning on the plugged-in data yields the posterior distribution. Predictions of future sensor values can be obtained by marginalising out the other random variables; e.g., to predict Ŝ2 we want q(ŝ2|s0, s1, a1, ξ). Note, however, that this requires an assumption about the probability distribution over Â2.
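Obtaining q(ŝ2|s0, s1, a1, ξ) involves three steps. Parameter values are weighted by how well they explain the data, the environment state is filtered through the data, and the result is propagated one step under an assumed distribution over Â2. This is a minimal Monte Carlo sketch. It assumes Dirichlet priors over categorical parameters (an assumption, not taken from the caption) and uses self-normalised importance sampling with the prior as proposal. The function name `predict_s2` and the argument `q_a2` (the assumed distribution over Â2) are hypothetical.

```python
import random

def dirichlet(alpha, rng):
    """Sample from Dir(alpha) by normalising independent Gamma(alpha_i, 1) draws."""
    g = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(g)
    return [x / total for x in g]

def predict_s2(s0, s1, a1, xi1, xi2, xi3, q_a2, n_samples=5000, seed=0):
    """Estimate q(ŝ2 | s0, s1, a1, ξ); xi2[a][e] parameterises q(ê' | â' = a, ê = e)."""
    rng = random.Random(seed)
    n_e, n_s, n_a = len(xi3), len(xi1[0]), len(xi2)
    weighted, total_w = [0.0] * n_s, 0.0
    for _ in range(n_samples):
        # θ drawn from the prior q(θ | ξ)
        th1 = [dirichlet(xi1[e], rng) for e in range(n_e)]
        th2 = [[dirichlet(xi2[a][e], rng) for e in range(n_e)] for a in range(n_a)]
        th3 = dirichlet(xi3, rng)
        # Forward filtering: alpha[e] = q(ê_t = e, s_0..s_t | a_1, θ)
        alpha = [th3[e] * th1[e][s0] for e in range(n_e)]
        alpha = [sum(alpha[e] * th2[a1][e][e1] for e in range(n_e)) * th1[e1][s1]
                 for e1 in range(n_e)]
        w = sum(alpha)  # likelihood q(s0, s1 | a1, θ) = importance weight
        if w == 0.0:
            continue
        belief = [x / w for x in alpha]  # q(ê1 | s0, s1, a1, θ)
        for s2 in range(n_s):
            # Marginalise the assumed Â2 distribution, Ê1, and Ê2
            p = sum(q_a2[a2] * belief[e1] * th2[a2][e1][e2] * th1[e2][s2]
                    for a2 in range(n_a) for e1 in range(n_e) for e2 in range(n_e))
            weighted[s2] += w * p
        total_w += w
    return [x / total_w for x in weighted]
```

Changing `q_a2` changes the prediction, which is exactly the dependence on the assumed distribution over Â2 that the caption points out.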
Figure 4. Bayesian network of the approximate posterior factor at t = 2. The variational parameters Φ1, Φ2, Φ3, and the environment-state variational parameters are positioned so as to indicate which dependencies and nodes they replace in the generative model in Figure 2.
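Variational parameters take over the role of exact conditioning. They are tuned to minimise a variational free energy, which upper-bounds the negative log evidence and equals it at the exact posterior. This is a minimal sketch of that principle for a single hidden environment state with a hypothetical prior and sensor likelihood. The softmax parameterisation and plain gradient descent are assumptions, and the sketch does not reproduce the article's factorisation over Φ1, Φ2, Φ3.

```python
import math

# Hypothetical one-step model: prior p(ê) and sensor likelihood p(ŝ | ê).
PRIOR = [0.7, 0.3]
LIKELIHOOD = [[0.9, 0.1],   # p(ŝ | ê = 0)
              [0.2, 0.8]]   # p(ŝ | ê = 1)

def softmax(phi):
    m = max(phi)  # shift for numerical stability
    z = [math.exp(p - m) for p in phi]
    total = sum(z)
    return [x / total for x in z]

def free_energy(q, s):
    """F(q) = Σ_ê q(ê) [log q(ê) - log p(s, ê)] in nats; always F ≥ -log p(s)."""
    return sum(qe * (math.log(qe) - math.log(PRIOR[e] * LIKELIHOOD[e][s]))
               for e, qe in enumerate(q) if qe > 0)

def exact_posterior(s):
    """Return p(ê | s) and the evidence p(s)."""
    joint = [PRIOR[e] * LIKELIHOOD[e][s] for e in range(len(PRIOR))]
    evidence = sum(joint)
    return [j / evidence for j in joint], evidence

def fit_variational(s, steps=2000, lr=0.5):
    """Minimise F over the variational parameter phi, where q(ê) = softmax(phi)."""
    phi = [0.0] * len(PRIOR)
    for _ in range(steps):
        q = softmax(phi)
        # dF/dq_e = log q_e + 1 - log p(s, e); through the softmax this becomes
        # dF/dphi_e = q_e (g_e - Σ_k q_k g_k)
        g = [math.log(q[e]) + 1 - math.log(PRIOR[e] * LIKELIHOOD[e][s])
             for e in range(len(q))]
        mean_g = sum(qe * ge for qe, ge in zip(q, g))
        phi = [p - lr * qe * (ge - mean_g) for p, qe, ge in zip(phi, q, g)]
    return softmax(phi)
```

For ŝ = 0 the fitted `q` matches `exact_posterior(0)[0]` to high precision, and any other `q`, such as `[0.5, 0.5]`, gives a strictly larger free energy.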
Figure 5. Bayesian network of the approximate complete posterior of Equation (40) at t = 2 for the future actions. Only and the future action appear in the predictive factor and influence future variables. In general, there is one approximate complete posterior for each possible sequence of future actions.
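Because only the predictive factor depends on the future actions, the belief about the present can be computed once and then propagated separately for every candidate action sequence. This is a minimal sketch that assumes point-estimate (known) environment and sensor dynamics and a belief over the current environment state. The dynamics and the names `propagate`, `sensor_predictions`, and `all_sequence_predictions` are hypothetical. Enumerating with `itertools.product` grows as |actions|^horizon, so this approach only suits small problems.

```python
from itertools import product

ACTIONS = (0, 1)
# Hypothetical dynamics: ENV[a][e][e2] = q(ê' = e2 | â = a, ê = e), SENS[e][s] = q(ŝ | ê)
ENV = {0: [[0.9, 0.1], [0.1, 0.9]], 1: [[0.1, 0.9], [0.9, 0.1]]}
SENS = [[0.8, 0.2], [0.2, 0.8]]

def propagate(belief, action):
    """One prediction step: q(ê') = Σ_ê q(ê) q(ê' | â, ê)."""
    n = len(belief)
    return [sum(belief[e] * ENV[action][e][e2] for e in range(n)) for e2 in range(n)]

def sensor_predictions(belief, actions):
    """Predicted sensor distribution for each future step of one action sequence."""
    preds = []
    for a in actions:
        belief = propagate(belief, a)
        preds.append([sum(belief[e] * SENS[e][s] for e in range(len(belief)))
                      for s in range(len(SENS[0]))])
    return preds

def all_sequence_predictions(belief, horizon):
    """One predictive distribution per possible future action sequence."""
    return {seq: sensor_predictions(belief, seq)
            for seq in product(ACTIONS, repeat=horizon)}
```

For example, `all_sequence_predictions([1.0, 0.0], horizon=2)` returns four entries, one per sequence in {0, 1}², each holding two per-step sensor distributions.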
Figure 6. Generative model including at t = 2, with ŜÂ≺2 influencing future actions. Note that only future actions depend on past sensor values and actions; e.g., action Â1 has no incoming edges. The increased gap between time steps t = 1 and t = 2 indicates that this time step is special in the model. For each time step t there is a corresponding model, with the particular relation between past ŜÂ≺ and shifted accordingly.
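In this model, future actions depend on the past only through the agent's beliefs. Candidate action sequences are scored using the posterior given ŜÂ≺t, and a distribution over future actions is formed from these scores. This is a minimal sketch that assumes a softmax with a precision parameter `gamma` over hypothetical precomputed sequence values. It also assumes the agent executes only the first action of a sampled sequence. The exact form of the action distribution used in the article is not reproduced here, and the names `action_distribution` and `sample_next_action` are hypothetical.

```python
import math
import random

def action_distribution(values, gamma):
    """Assumed form: q(sequence | past) ∝ exp(gamma * value(sequence))."""
    m = max(values.values())  # shift by the max for numerical stability
    w = {seq: math.exp(gamma * (v - m)) for seq, v in values.items()}
    z = sum(w.values())
    return {seq: x / z for seq, x in w.items()}

def sample_next_action(values, gamma, rng):
    """Sample a whole future action sequence, then return only its first action."""
    dist = action_distribution(values, gamma)
    seqs = list(dist)
    seq = rng.choices(seqs, weights=[dist[s] for s in seqs])[0]
    return seq[0]
```

With `gamma = 0` every sequence is equally likely. As `gamma` grows, the distribution concentrates on the highest-valued sequence, so the precision controls how deterministic action selection is.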
| Actual environment states | ||
| Estimated/modeled environment states | ||
| Actual/observed sensor or outcome values | ||
| Estimated/modeled (usually future) sensor or outcome values. Note that the index τ instead of | ||
| Actions | ||
| Contemplated (usually future) actions | ||
| Agent memory state | ||
| π, ũ | π and ũ both uniquely specify future action sequences | |
| θ | θ | Generative model parameters |
| q(ŝ|ê, θ1) = q(ŝ|ê) | Model sensor dynamics, not parameterised in Friston et al. | |
| q(ê′|â′, ê, θ2) = q(ê′|â′, ê) | Model environment dynamics, not parameterised in Friston et al. | |
| Modeled initial environment state, not parameterised in Friston et al. | ||
| ξ = (ξ1, ξ2, ξ3) | Generative model hyperparam. or model parameter that subsumes all hyperparameters | |
| ξ1 | Sensor dynamics hyperparam. | |
| ξ2 | Environment dynamics hyperparam. | |
| ξ3 | Initial environment state hyperparam. | |
| ξΓ | (α, β) | Precision hyperparam. |
| (ϕ, ϕΓ) | μ | Variational param. |
| Environment states variational param., for each timestep τ | ||
| ϕ1 | Sensor dynamics variational param. | |
| ϕ2 | Environment dynamics variational param. | |
| ϕ3 | Initial environment state variational param. | |
| π | Future action sequence variational param. | |
| ϕΓ | Precision variational param. | |
| Variational action-value function. The dependence of | ||
| p( | Our physical environment corresponds to the generative process | |
| The generative model for active inference including γ (which we mostly omit) | ||
| Approximate complete posterior for active inference | ||
| Prior over future outcomes. |
| Actual environment states | ||
| Estimated/modeled environment states | ||
| Actual/observed sensor or outcome values | ||
| Estimated/modeled (usually future) sensor or outcome values. Note that the index τ instead of | ||
| Actions | ||
| Contemplated (usually future) actions | ||
| Agent memory state | ||
| π, | action sequences | |
| θ | θ | Generative model parameters |
| θ1 | Sensor dynamics param. | |
| θ2 | Environment dynamics param. | |
| θ3 | Initial environment state param. | |
| ξ | η | Generative model hyperparam. or model parameter that subsumes all hyperparameters |
| ξ1 | Sensor dynamics hyperparam. | |
| ξ2 | Environment dynamics hyperparam. | |
| ξ3 | Initial environment state hyperparam. | |
| ξΓ | β | Precision hyperparam. |
| (ϕ, ϕΓ) | η | Variational param. |
| Environment states variational param. For each sequence of actions and for each timestep there is a parameter | ||
| ϕ1 | Sensor dynamics variational param. | |
| ϕ2 | Environment dynamics variational param. | |
| ϕ3 | Initial environment state variational param. | |
| π | π | Future action sequence variational param. |
| ϕΓ | β | Precision variational param. |
| − | Variational action-value function. The dependence of | |
| p( | Our physical environment corresponds to the generative process | |
| The generative model for active inference | ||
| Approximate complete posterior for active inference | ||
| Prior over future outcomes. |