| Literature DB >> 30988668 |
Axel Constant, Maxwell J. D. Ramstead, Samuel P. L. Veissière, Karl Friston.
Abstract
How do humans come to acquire shared expectations about how they ought to behave in distinct normalized social settings? This paper offers a normative framework to answer this question. We introduce the computational construct of 'deontic value' - based on active inference and Markov decision processes - to formalize conceptions of social conformity and human decision-making. Deontic value is an attribute of choices, behaviors, or action sequences that inherits directly from deontic cues in our econiche (e.g., red traffic lights); namely, cues that denote an obligatory social rule. Crucially, the prosocial aspect of deontic value rests upon a particular form of circular causality: deontic cues exist in the environment in virtue of the environment being modified by repeated actions, while action itself is contingent upon the deontic value of environmental cues. We argue that this construction of deontic cues enables the epistemic (i.e., information-seeking) and pragmatic (i.e., goal-seeking) values of any behavior to be 'cached' or 'outsourced' to the environment, where the environment effectively 'learns' about the behavior of its denizens. We describe the process whereby this particular aspect of value enables the learning of habitual behavior over neurodevelopmental and transgenerational timescales.
Keywords: Markov decision process; active inference; decision-making; deonticity; niche construction theory; social conformity
Year: 2019 PMID: 30988668 PMCID: PMC6452780 DOI: 10.3389/fpsyg.2019.00679
Source DB: PubMed Journal: Front Psychol ISSN: 1664-1078
FIGURE 1. Generative model for discrete state-space navigation. The above graphical model shows the relations among the different quantities involved in action policy selection. We refer the reader to Friston et al. (2015) for a detailed discussion of these quantities, the update rules, and the variational Bayesian method used to update the approximate posterior. The generative model specifies the agent-environment relation in terms of a joint probability P(o,s,π,A). Blue circles contain the quantities known by the agent; green circles contain the quantities that must be inferred, namely, the action policy, the states of the world that generate observations, and future states upon which observations depend. The orange circle represents an observation. The generative model comprises a likelihood and priors over states, policies, and parameters. The observation matrix A specifies the likelihood of outcomes under each state of the world. Here, hidden states correspond to locations in an 8x8 world in which the Markov decision process takes place (shown, for illustrative purposes, as a gray dashed grid). The agent acts by changing its state (i.e., moving from one location to another), which depends on the selected policy π. The most likely or valuable policies are those that minimize expected free energy G, which depends on C – a cost function that attributes a prior cost to surprising states (say, red locations). C implements the pragmatic value discussed in the main text. D encodes the prior expectations about the starting location, and Cat and Dir refer to the form of the distributions (categorical or Dirichlet). The transition matrix B – that is, prior beliefs about transitions – encodes the possible transitions the agent can engage, given its allowable policies. A policy corresponds to a sequence of actions; for example, going up, down, left, or right, or staying put. In the figure, the agent occupies an initial location and has to infer the policy that will take it, say, to a goal location.
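To make the moving parts of this generative model concrete, here is a minimal numerical sketch in Python (our illustration, not the authors' implementation; the grid size, preference values, and helper names such as `expected_free_energy` are assumptions). Because the likelihood A is taken to be the identity here, the epistemic term vanishes and expected free energy reduces to expected cost under C:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

n = 4                      # 4x4 grid for brevity (the figure uses an 8x8 world)
S = n * n                  # number of hidden states (locations)

# A: likelihood mapping from states to outcomes. Identity => the location is
# fully observed, so the epistemic term of G is zero in this toy example.
A = np.eye(S)

# B: one transition matrix per action, with B[u][s_next, s] = P(s_next | s, u)
moves = {'up': (-1, 0), 'down': (1, 0), 'left': (0, -1), 'right': (0, 1), 'stay': (0, 0)}
B = {}
for u, (dr, dc) in moves.items():
    T = np.zeros((S, S))
    for s in range(S):
        r, c = divmod(s, n)
        r2, c2 = min(max(r + dr, 0), n - 1), min(max(c + dc, 0), n - 1)
        T[r2 * n + c2, s] = 1.0
    B[u] = T

# C: prior preference (log-probability, up to a constant) over outcomes.
C = np.full(S, -1.0)       # mildly costly everywhere ...
goal = n + 1               # ... except a preferred location at grid cell (1, 1)
C[goal] = 3.0

# D: prior over the initial state - the agent starts in the top-left corner.
D = np.zeros(S); D[0] = 1.0

def expected_free_energy(policy):
    """G for one policy. With an identity A the epistemic (information-gain)
    term is zero, so G reduces to the expected cost of predicted outcomes."""
    q, G = D.copy(), 0.0
    for u in policy:
        q = B[u] @ q       # predicted state distribution at the next step
        G -= (A @ q) @ C   # minus pragmatic value E_Q[ln P(o)]
    return G

policies = [('right', 'right'), ('down', 'down'), ('right', 'down'),
            ('down', 'right'), ('stay', 'stay')]
G = np.array([expected_free_energy(p) for p in policies])
for p, g, pr in zip(policies, G, softmax(-G)):   # P(pi) ∝ exp(-G)
    print(p, f'G = {g:+.2f}', f'P(pi) = {pr:.2f}')
```

Running this, the two policies that reach the preferred location receive the lowest G and hence nearly all of the policy probability mass.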
FIGURE 2. Deontic cues. In contrast to epistemic and pragmatic value, deontic value is specified directly by the observations currently at hand. In other words, it enables the agent to infer the best course of action quickly and efficiently, based upon the information afforded by deontic cues. In sum, the deontic value of a policy depends on deontic cues, which are generated by the environment and, once learnt, constrain policy selection. However, the propensity of the environment to generate deontic cues itself depends upon the agents' behavior. This circular causality brings something important to the table; namely, the 'caching' in the environment of beliefs about action – in particular, actions that were originally selected, after observations were made and policies evaluated, for their epistemic and pragmatic value. Furthermore, because this 'cache' is shared by all agents that navigate the econiche, it enables a vicarious communication among agents, as the environment 'learns' about the creatures that shape it.
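The circular causality described above can be caricatured in a few lines: an environment that accumulates traces of action ('deontic cues') and agents whose policy selection is biased by those traces. This is a toy sketch under our own assumptions (the names `niche_counts` and `deontic_prior`, the count-based cue model, and all numbers are illustrative, not the paper's simulation):

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 16, 5

# The 'econiche': counts over (state, action) pairs, accumulated by the
# behavior of every agent that has traversed the environment.
niche_counts = np.ones((n_states, n_actions))   # flat start: no cues yet

def deontic_prior(state):
    """Log-probability over actions read directly off the deontic cues
    present at the current location (cf. ln P(pi | o) in the text)."""
    p = niche_counts[state] / niche_counts[state].sum()
    return np.log(p)

def act(state, private_value):
    """Combine an agent's own (epistemic + pragmatic) evaluation with the
    deontic value cached in the niche, then sample an action."""
    logits = private_value + deontic_prior(state)
    p = np.exp(logits - logits.max()); p /= p.sum()
    a = rng.choice(n_actions, p=p)
    niche_counts[state, a] += 1.0   # acting re-inscribes the cue: circular causality
    return a

# Successive agents with only a weak private preference for action 2 at
# state 0 gradually turn that preference into a strong deontic cue that
# later agents simply read off the environment.
for _ in range(200):
    private = np.zeros(n_actions); private[2] = 0.5
    act(0, private)

print(np.round(niche_counts[0] / niche_counts[0].sum(), 2))
```

The printed distribution concentrates on the habitually chosen action: the environment has 'learnt' the behavior of its denizens, and later agents inherit that knowledge without having to evaluate epistemic or pragmatic value themselves.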
DEEP model of regimes of expectations (ROEs).
| Axiology | Description | Formal expression |
|---|---|---|
| Deontic (external) | The (shared) value of a policy endowed by a direct policy-outcome mapping, indicating 'what one should do if.' | Deontic value: $\ln P(\pi \mid o_\tau) = \ln P(o_\tau \mid \pi) + \ln P(\pi) - \ln P(o_\tau)$ |
| Epistemic (states) (internal) | The salience, or information gain about hidden states, under a given policy, with respect to 'where I should be if.' | Salience: $\mathbb{E}_{Q(o_\tau \mid \pi)}\, D_{\mathrm{KL}}[\, Q(s_\tau \mid o_\tau, \pi) \,\|\, Q(s_\tau \mid \pi) \,]$ |
| Epistemic (parameters) (internal) | The novelty, or information gain about model parameters (e.g., the likelihood mapping A), under a given policy. | Novelty: $\mathbb{E}_{Q(o_\tau, s_\tau \mid \pi)}\, D_{\mathrm{KL}}[\, Q(A \mid o_\tau, s_\tau, \pi) \,\|\, Q(A) \,]$ |
| Pragmatic (internal) | The expected value of an outcome with respect to 'what I should perceive if.' | Pragmatic value: $\mathbb{E}_{Q(o_\tau \mid \pi)}[\ln P(o_\tau)]$ |
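As a worked instance of the epistemic (salience) and pragmatic terms in the table above, the following sketch evaluates both quantities for a one-step, two-state, two-outcome example (a hand-rolled illustration; the toy numbers and function names are ours):

```python
import numpy as np

def kl(p, q):
    """KL divergence D_KL[p || q] in nats."""
    return float(np.sum(p * np.log(p / q)))

# One-step example with two hidden states and two outcomes.
Q_s = np.array([0.5, 0.5])        # Q(s_tau | pi): predicted hidden states
A = np.array([[0.9, 0.2],         # P(o | s): likelihood (columns are states)
              [0.1, 0.8]])
Q_o = A @ Q_s                     # Q(o_tau | pi): predicted outcomes

# Epistemic value (salience): E_{Q(o|pi)} KL[Q(s | o, pi) || Q(s | pi)]
epistemic = 0.0
for o in range(2):
    posterior = A[o] * Q_s / Q_o[o]          # Q(s | o, pi), by Bayes' rule
    epistemic += Q_o[o] * kl(posterior, Q_s)

# Pragmatic value: E_{Q(o|pi)}[ln P(o)], with C = ln P(o) the prior preference
C = np.log(np.array([0.7, 0.3]))
pragmatic = float(Q_o @ C)

print(f'epistemic value = {epistemic:.3f} nats')   # > 0: observing is informative
print(f'pragmatic value = {pragmatic:.3f} nats')
```

Note that the epistemic value is strictly positive whenever observations carry information about hidden states, and shrinks to zero as the likelihood A approaches a uniform (uninformative) mapping.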
| Concept | Definition |
|---|---|
| Regime of expectations | Set of expectations that structure action with respect to deontic, epistemic, and pragmatic values. |
| Deontic cue | Cues that induce behavior that remains unchanged over time, while themselves being induced and maintained by agents' repeated actions in the environment. |
| Social conformity under active inference | The state of affairs wherein the deontic value comes to dominate policy selection and preferences in action and decision-making in a given cultural group. It obtains via the deontic constraints over the learning of expectations that, in turn, structure the available possibilities for action (i.e., epistemic and pragmatic value, or affordance). |
| Human decision-making | The distinctively human propensity to rely on cooperative biases and to outsource policy selection to third parties (e.g., material or human cues). |
| Active inference | Process whereby agents garner Bayesian model evidence for their prior expectations about the world they inhabit. Technically, active inference entails the selection of action sequences that minimize expected free energy; i.e., minimize expected surprise (uncertainty) or, equivalently, maximize expected model evidence. |
| Prior expectations | Probability distribution over hidden states prior to observations. |
| Generative model | A joint probability distribution over environmental states and subsequent observations – usually specified in terms of prior expectations (about states) and the likelihood of observations given states. The generative model is a probabilistic model of the generative process. |
| Generative process | The actual process generating (observable) outcomes from (unobservable) states of the world. |
| Bayesian model evidence | Probability of observations given the generative model (i.e., independent of environmental states), known as the integrated or marginal likelihood (because one integrates or marginalizes over states of the model). The negative logarithm of model evidence is also known as surprisal. |
| Expected free energy | Functional of counterfactual expectations about states that comprise the epistemic, pragmatic, and deontic value of a policy. It quantifies the propensity of a policy to minimize expected surprise (a.k.a. surprisal or self-information) – or maximize expected Bayesian model evidence – in the future. |
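For reference, one standard way of writing this functional, consistent with the table above (our notation, following the usual active-inference decomposition; the paper's own equations may differ in detail), is:

```latex
\begin{aligned}
G(\pi) &= \sum_{\tau} G(\pi,\tau), \\[4pt]
G(\pi,\tau) &=
  -\underbrace{\mathbb{E}_{Q(o_\tau \mid \pi)}\,
     D_{\mathrm{KL}}\!\left[ Q(s_\tau \mid o_\tau, \pi) \,\middle\|\, Q(s_\tau \mid \pi) \right]}_{\text{epistemic value (salience)}}
  \;-\;
  \underbrace{\mathbb{E}_{Q(o_\tau \mid \pi)}\!\left[ \ln P(o_\tau) \right]}_{\text{pragmatic value}}, \\[4pt]
Q(\pi) &= \sigma\!\left( \ln P(\pi \mid o_\tau) - G(\pi) \right).
\end{aligned}
```

Here σ denotes a softmax over policies; the last line is one way of showing how the deontic term ln P(π | o_τ) biases policy selection alongside the (negative) expected free energy.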