Thijs van de Laar, Magnus Koudahl, Bart van Erp, Bert de Vries.
Abstract
The Free Energy Principle (FEP) postulates that biological agents perceive and interact with their environment in order to minimize a Variational Free Energy (VFE) with respect to a generative model of their environment. The inference of a policy (future control sequence) according to the FEP is known as Active Inference (AIF). The AIF literature describes multiple VFE objectives for policy planning that lead to epistemic (information-seeking) behavior. However, most objectives have limited modeling flexibility. This paper approaches epistemic behavior from a constrained Bethe Free Energy (CBFE) perspective. Crucially, variational optimization of the CBFE can be expressed in terms of message passing on free-form generative models. The key intuition behind the CBFE is that we impose a point-mass constraint on predicted outcomes, which explicitly encodes the assumption that the agent will make observations in the future. We interpret the CBFE objective in terms of its constituent behavioral drives. We then illustrate the resulting behavior of the CBFE by planning and interacting with a simulated T-maze environment. Simulations for the T-maze task illustrate how the CBFE agent exhibits an epistemic drive and actively plans ahead to account for the impact of predicted outcomes. Compared to an Expected Free Energy (EFE) agent, the CBFE agent incurs expected reward in significantly more environmental scenarios. We conclude that CBFE optimization by message passing suggests a general mechanism for epistemic-aware AIF in free-form generative models.
Keywords: active inference; constrained Bethe free energy; free energy principle; message passing; variational optimization
Year: 2022 PMID: 35462780 PMCID: PMC9019474 DOI: 10.3389/frobt.2022.794464
Source DB: PubMed Journal: Front Robot AI ISSN: 2296-9144
Summary of notational conventions throughout the paper.
| Notation | Def | Explanation |
|---|---|---|
| | | Collection of (arbitrary) model variables |
| | | Individual model variable with index |
| | | Factorized model of variables |
| | | Factor (conditional or prior probability distribution) with argument variables |
| | | Variational distribution of (latent) variables |
| | | Average energy |
| | | Entropy |
| F[ | | Variational Free Energy |
| | | Observation, state and control variable (at time |
| | | Specific realization for observation or unobserved future (predicted) outcome |
| | | Specific control realization |
| | | Generative Model engine |
| | | Observation model |
| | | Transition model |
| | | State prior |
| | | Sequence of future observation variables |
| | | Policy (sequence of specific future controls) |
| | | Optimized Variational Free Energy for policy |
| | | Optimal policy |
| | | Expected Free Energy (EFE) |
| | | Aggregate observation model |
| | | Aggregate state transition model, including state prior |
| | | Goal prior for sequence of future observation variables |
| | | Goal prior for observation variable at time |
| | | Factorized model of future variables (at time |
| B[ | | Bethe Free Energy (BFE) |
| | | Bethe entropy |
| | | Bethe Free Energy of future model under policy |
| | | Constrained Bethe Free Energy (CBFE) of future model under policy |
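The notation above lists an average energy, an entropy and a Variational Free Energy. As a minimal sketch, assuming the standard relation F[q] = U[q] - H[q] and a toy discrete model with one latent variable, the following Python snippet evaluates the VFE in bits. The function name, variable names and all numbers are illustrative assumptions and are not taken from the paper.

```python
import numpy as np


def variational_free_energy(q, joint):
    """Return F[q] = U[q] - H[q] in bits for one discrete latent variable.

    q     : variational distribution over the latent states, shape (S,)
    joint : model evaluated at the observed outcome, f(y_obs, s), shape (S,)
    """
    q = np.asarray(q, dtype=float)
    joint = np.asarray(joint, dtype=float)
    mask = q > 0  # terms with q(s) = 0 contribute nothing
    average_energy = -np.sum(q[mask] * np.log2(joint[mask]))
    entropy = -np.sum(q[mask] * np.log2(q[mask]))
    return average_energy - entropy


# Toy model (assumed values): uniform state prior and a likelihood p(y_obs | s).
prior = np.array([0.5, 0.5])
likelihood = np.array([0.9, 0.2])
joint = prior * likelihood
evidence = joint.sum()
posterior = joint / evidence

f_posterior = variational_free_energy(posterior, joint)
f_prior = variational_free_energy(prior, joint)
```

In this sketch the VFE evaluated at the exact posterior equals the negative log-evidence, and any other choice of q (here the prior) yields a larger value, which is why minimizing the VFE over q performs approximate inference.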
FIGURE 1. Forney-style factor graph representation for the example model of Eq. 29 (A) and message passing schedule for the Bethe Free Energy minimization of Eq. 30 (B). Shaded messages indicate variational message updates, and the solid square node indicates given (clamped) values. The round node indicates a point-mass constraint for which the value is optimized.
Optima for the individual terms of the CBFE decompositions (Eq. 32, Eq. 34). Each row varies one quantity (variable or function) in its respective term while the other quantities remain fixed. Shaded cells indicate that the term (row) is not a function(al) of that specific optimization quantity (column).
[Table body not recoverable: the row terms and column symbols were lost in extraction. Surviving cell entries, in order: max ex. val; min complexity; max confidence; max intrinsic value; min posterior divergence.]
FIGURE 2. Message passing schedule for the example model of Eq. 37 for the BFE (A) and CBFE (B). The dashed box summarizes the observation model.
Free energies (in bits) per policy for the example application.
| Policy | BFE | CBFE |
|---|---|---|
| Ignorant | 0 | 1 |
| Informative | 0 | 0 |
FIGURE 3. Layout of the T-maze. The agent starts at position 1. The reward is located at either position 2 or 3. Position 4 contains a cue which indicates the reward position.
FIGURE 4. Forney-style factor graph of the generative model for the T-maze. The MUX nodes select the transition matrix as determined by the control variable. Dashed boxes summarize the indicated distributions.
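The role of a MUX node, selecting a transition matrix according to the control variable, can be sketched as follows. This is a simplified illustration under stated assumptions: the state tracks only the agent position (1 to 4), control u is read as "move to position u", and the reward location and any absorbing behavior of the maze arms are ignored. The names `move_to_matrix`, `TRANSITIONS` and `mux` are hypothetical and do not come from the paper.

```python
import numpy as np

N_POSITIONS = 4  # positions 1..4 of the T-maze layout


def move_to_matrix(target):
    """Column-stochastic matrix that sends every position to `target` (1-based)."""
    matrix = np.zeros((N_POSITIONS, N_POSITIONS))
    matrix[target - 1, :] = 1.0
    return matrix


# One transition matrix per control value.
# Assumption: control u means "move to position u".
TRANSITIONS = {u: move_to_matrix(u) for u in range(1, N_POSITIONS + 1)}


def mux(control, state_belief):
    """Select the transition matrix for `control` and propagate the belief one step."""
    return TRANSITIONS[control] @ state_belief


start = np.array([1.0, 0.0, 0.0, 0.0])  # the agent starts at position 1
after_cue = mux(4, start)               # first move: visit the cue at position 4
after_arm = mux(3, after_cue)           # second move: go to position 3
```

Keeping one matrix per control value in a lookup table mirrors the selection step: the control picks the matrix, and the same propagation rule applies regardless of which matrix was chosen.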
FIGURE 5. Message passing schedule for planning in the T-maze with the BFE.
FIGURE 6. Message passing schedule for planning in the T-maze with the CBFE.
FIGURE 7. (Constrained) Bethe Free Energies ((C)BFE) and Expected Free Energies (EFE) (in bits) for the T-maze policies under varying parameter settings. Each diagram plots the minimized free energy values for all possible policies (lookahead T = 2), with the first move on the vertical axis and the second move on the horizontal axis. For example, the cell in row 4, column 3 represents the policy that first moves to position 4 and then to position 3. The values for the optimal policies are annotated in red with an asterisk.
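The policy grid described in this caption (first move on the rows, second move on the columns) can be enumerated and searched with a short Python sketch. The grid values below are placeholders chosen only to exercise the code; they are not results from the paper, and the helper name `best_policies` is hypothetical.

```python
from itertools import product

import numpy as np

POSITIONS = (1, 2, 3, 4)
LOOKAHEAD = 2

# All policies as (first move, second move) pairs: 4 x 4 = 16 for T = 2.
policies = list(product(POSITIONS, repeat=LOOKAHEAD))


def best_policies(free_energy_grid, tol=1e-9):
    """Return every policy whose free energy lies within `tol` of the minimum.

    Rows index the first move and columns the second move (both 1-based
    in the returned tuples).
    """
    grid = np.asarray(free_energy_grid, dtype=float)
    minimum = grid.min()
    rows, cols = np.where(grid <= minimum + tol)
    return [(int(r) + 1, int(c) + 1) for r, c in zip(rows, cols)]


# Placeholder values only, NOT results from the paper.
grid = np.full((4, 4), 5.0)
grid[3, 1] = 1.0  # row 4, column 2: first move to 4, then to 2
grid[3, 2] = 1.0  # row 4, column 3: first move to 4, then to 3

optimal = best_policies(grid)
```

Returning all minimizers rather than a single argmin keeps ties visible, which matters when several policies share the lowest free energy.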
FIGURE 8. Confidence, complexity and extrinsic value contributions (in bits) to the Constrained Bethe Free Energy (Eq. 32) for the T-maze policies (lookahead T = 2) under varying parameter settings. Optimal values are indicated in red with an asterisk.
FIGURE 9. Message passing schedule for the slide step.
FIGURE 10. Average reward landscapes for the Constrained Bethe Free Energy (CBFE) agent and the Expected Free Energy (EFE) agent.