Adam Safron, Ozan Çatal, Tim Verbelen.
Abstract
Simultaneous localization and mapping (SLAM) represents a fundamental problem for autonomous embodied systems, for which the hippocampal/entorhinal system (H/E-S) has been optimized over the course of evolution. We have developed a biologically inspired SLAM architecture based on latent variable generative modeling within the Free Energy Principle and Active Inference (FEP-AI) framework, which affords flexible navigation and planning in mobile robots. We have primarily focused on attempting to reverse engineer H/E-S "design" properties, but here we consider ways in which SLAM principles from robotics may help us better understand nervous systems and emergent minds. After reviewing LatentSLAM and notable features of this control architecture, we consider how the H/E-S may realize these functional properties not only for physical navigation, but also with respect to high-level cognition understood as generalized simultaneous localization and mapping (G-SLAM). We focus on loop-closure, graph-relaxation, and node duplication as particularly impactful architectural features, suggesting these computational phenomena may contribute to understanding cognitive insight (as proto-causal-inference), assimilation (as integration into existing schemas), and accommodation (as category formation). All these operations can similarly be described in terms of structure/category learning on multiple levels of abstraction. However, here we adopt an ecological rationality perspective, framing H/E-S functions as orchestrating SLAM processes within both concrete and abstract hypothesis spaces. In this navigation/search process, adaptive cognitive equilibration between assimilation and accommodation involves balancing tradeoffs between exploration and exploitation; this dynamic equilibrium may be near-optimally realized in FEP-AI, wherein control systems governed by expected free energy objective functions naturally balance model simplicity and accuracy.
With respect to structure learning, such a balance would involve constructing models and categories that are neither too inclusive nor exclusive. We propose these (generalized) SLAM phenomena may represent some of the most impactful sources of variation in cognition both within and between individuals, suggesting that modulators of H/E-S functioning may potentially illuminate their adaptive significances as fundamental cybernetic control parameters. Finally, we discuss how understanding H/E-S contributions to G-SLAM may provide a unifying framework for high-level cognition and its potential realization in artificial intelligences.
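The balance between simplicity and accuracy, and between exploration and exploitation, can be made concrete with the standard free energy decompositions from the active inference literature. The notation below is an assumption, since the abstract fixes no symbols: q is an approximate posterior, p the generative model, s hidden states, o observations, and π a policy.

```latex
% Variational free energy: complexity minus accuracy
F = \underbrace{D_{\mathrm{KL}}\!\left[q(s)\,\|\,p(s)\right]}_{\text{complexity}}
  - \underbrace{\mathbb{E}_{q(s)}\!\left[\ln p(o \mid s)\right]}_{\text{accuracy}}

% Expected free energy of a policy: negative epistemic and pragmatic value
G(\pi) \approx
  - \underbrace{\mathbb{E}_{q(o \mid \pi)}\!\left[
      D_{\mathrm{KL}}\!\left[q(s \mid o, \pi)\,\|\,q(s \mid \pi)\right]
    \right]}_{\text{epistemic value (exploration)}}
  - \underbrace{\mathbb{E}_{q(o \mid \pi)}\!\left[\ln p(o)\right]}_{\text{pragmatic value (exploitation)}}
```

Minimizing F penalizes models that depart from prior beliefs more than the data warrant, while minimizing G favors policies expected to be both informative and goal-fulfilling.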
Keywords: SLAM; active inference; artificial intelligence; free energy principle; hierarchical generative models; hippocampal and entorhinal systems; robotics
Year: 2022 PMID: 36246500 PMCID: PMC9563348 DOI: 10.3389/fnsys.2022.787659
Source DB: PubMed Journal: Front Syst Neurosci ISSN: 1662-5137
Potential correspondences between LatentSLAM, cognitive psychological, and bio-computational phenomena.
| LatentSLAM | Cognitive psychological phenomena | Bio-computational phenomena |
|---|---|---|
| Mapping/graphing: | Inferring dimensions of feature spaces and relative locations of phenomena based on observations | Relations between hippocampal place cells for particular locations combined with entorhinal grid cells for multi-scale metric-affordance information |
| Localization: | Positioning specific phenomena (including the mapping and localizing system itself) within inferred feature spaces | Conjunction of hippocampal/entorhinal place/grid cells for positioning specific events within maps/graphs |
| Sensor and actuator uncertainty: | Perceptual (including mnemonic and imaginative) ambiguity | Body and world states are indirectly inferred based on partial information from noisy signaling systems |
| Views: | Visuospatial perception (as a function of actions) | Information from ventral and dorsal visual streams (and other modalities) organized according to egocentric perspectival reference frames |
| Proprioceptive poses: | Somatospatial perception (as a function of actions) | Frontal-parietal hierarchies over the somatomotor strip, with modeling/control potentially enhanced |
| Experience-map: | Structuring of episodic memory and imagination both informed by and informing visuospatial and somatospatial modalities | Transitions between hippocampal place fields entailing spatiotemporal trajectories for organisms (potentially including trajectories for important effector/sensor systems such as eyes and hands), both entrained by and entraining largescale cortical attracting states |
| Spatial landmark graphs: | Consciously-accessible representations of (salience-biased) spatial relations, potentially constituting our sense of space; semantic content of graph is based on actions and corresponding sensations as paths are traversed across/through these nodes | Hippocampal place fields as chained attractors, mutually entrained with cortex to orchestrate attracting states for population activity along reduced-dimensionality manifolds for both overt and covert action-perception cycles at and between these locations |
| Hierarchical generative model: | The processes by which a coherent stream of experience is generated and remembered with respect to both action and perception | A functional and algorithmic understanding of the brain as a hybrid machine learning architecture for predictive control of an embodied-environmentally-embedded agent |
| Fisher information metric: | The amount of information gained when traveling along a trajectory given a probabilistic generative model, wherein autonomous functioning is realized by minimizing discrepancies between predicted goal and present estimated states | The amount of neural activity that must be expended to achieve adaptive cybernetic functioning in a given context, including with respect to constructing and refining world models entailed by patterns of effective connectivity |
| Accumulation of map uncertainty: | Deviations between models and that which is represented due to uncertainty with respect to cognition and latent world states | Deviations between likely patterns of neuronal attractor dynamics and their ability to orchestrate either overt or covert action-perception cycles (i.e., behaving or imagining) for autonomous functioning; cybernetic (and potentially thermodynamic) entropy for nervous systems |
| Loop-closures: | Events in which a familiar location in feature space is encountered with high confidence | High degrees of converging mutually consistent activity from the H/E-S and non-H/E systems |
| Graph-relaxation: | Assimilation of novel information into existing schemas | Updating connectivity patterns to influence relative positioning of hippocampal place fields, potentially accompanied by largescale reductions in Hopfield energy |
| Node creation: | Accommodation of novel information | Creation of new place fields, involving various forms of (potentially neuromodulator-dependent) hippocampal plasticity, and/or establishment of new prefrontal attractors (i.e., patterns of canalized striatal-cortical loops) |
| Navigation: | Setting destinations in generalized space, which function as sources of prediction-error to be minimized through active inference; this may apply to the organism as a whole moving through (generalized) space, or to trajectories for parts of a system for which specific intentional control is warranted (e.g., directed ocular foveations or grasping/pointing movements), including with respect to spaces of a conceptual variety (e.g., spatialized time) | Predictive sweeps of activity across place fields from hippocampal maps (cf. successor representations), which can orchestrate largescale cortical attracting states (cf. equilibrium points) and thereby drive both system-internal self-organization (i.e., perceptual inference, imagination, and learning) and overt enaction, which in turn creates new sources of information to shape subsequent H/E-S dynamics |
Please note, these cross-domain mappings are neither meant to be exhaustive nor definitive, but are instead intended to point in the direction of what a G-SLAM perspective might look like if more fully developed.
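To illustrate the graph-relaxation row of the table, the following is a minimal sketch of a generic spring-like relaxation over a pose graph. It is not LatentSLAM's actual procedure: the function name `relax_graph`, the anchoring of node 0, and the `rate` parameter are all assumptions made for illustration. Each edge stores the offset at which one node should sit relative to another, and each pass spreads the residual error across the two endpoints.

```python
def relax_graph(positions, edges, anchor=0, iterations=200, rate=0.5):
    """Iteratively reduce inconsistency between node positions and edge offsets.

    positions: dict mapping node id -> (x, y) estimate (e.g., from drifting odometry)
    edges: list of (a, b, (dx, dy)), meaning node b should sit at node a + (dx, dy)
    anchor: node held fixed so the map does not float freely (illustrative choice)
    """
    pos = {node: list(xy) for node, xy in positions.items()}
    for _ in range(iterations):
        for a, b, offset in edges:
            for axis in (0, 1):
                # Residual between where b is and where the edge says it should be.
                error = pos[a][axis] + offset[axis] - pos[b][axis]
                step = 0.5 * rate * error
                # Split the correction between both endpoints, skipping the anchor.
                if b != anchor:
                    pos[b][axis] += step
                if a != anchor:
                    pos[a][axis] -= step
    return pos


# A square loop whose odometry drifted; the last edge is the loop-closure.
drifted = {0: (0.0, 0.0), 1: (1.2, 0.1), 2: (1.3, 1.2), 3: (0.2, 1.4)}
edges = [
    (0, 1, (1.0, 0.0)),
    (1, 2, (0.0, 1.0)),
    (2, 3, (-1.0, 0.0)),
    (3, 0, (0.0, -1.0)),  # loop-closure back to the start
]
relaxed = relax_graph(drifted, edges)
```

In this toy case the edge offsets are mutually consistent, so the relaxed positions settle onto the unit square; with noisy, inconsistent offsets the same loop would instead distribute the remaining error across the graph.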
Figure 1. An overview of different map types, showcasing our robotics lab. Panel (A) gives an exact metric view of the room as drawn by an architect. Panel (B) shows the same map as a 2D grid map; to create this map from panel (A), the map was rasterized and untraversable terrain was filled in at the granularity of a single raster cell. Panel (C) shows the same room as an x, y, z mapping of red/green/blue values extracted from an RGBD camera. This 3D grid map was generated by moving the camera through the physical lab. Finally, panel (D) shows the lab as a sparse graph.
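The step from the metric map of panel (A) to the grid map of panel (B) can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used to produce the figure: obstacles are assumed to be axis-aligned rectangles in metres, and the function name `rasterize` and its parameters are hypothetical. Any cell that an obstacle touches is marked untraversable in full, which is what filling in at the granularity of a single raster cell amounts to.

```python
import math


def rasterize(width_m, height_m, cell_m, obstacles):
    """Convert rectangular obstacles in metric coordinates into an occupancy grid.

    obstacles: list of (x0, y0, x1, y1) rectangles in metres (assumed axis-aligned).
    Returns a list of rows; 1 marks an untraversable cell, 0 a free cell.
    """
    cols = math.ceil(width_m / cell_m)
    rows = math.ceil(height_m / cell_m)
    grid = [[0] * cols for _ in range(rows)]
    for x0, y0, x1, y1 in obstacles:
        # Expand each obstacle outward to whole-cell boundaries, clipped to the map.
        c0 = max(0, math.floor(x0 / cell_m))
        c1 = min(cols, math.ceil(x1 / cell_m))
        r0 = max(0, math.floor(y0 / cell_m))
        r1 = min(rows, math.ceil(y1 / cell_m))
        for r in range(r0, r1):
            for c in range(c0, c1):
                grid[r][c] = 1
    return grid


# A 2 m x 1 m room with 0.5 m cells and one small obstacle.
grid = rasterize(2.0, 1.0, 0.5, [(0.6, 0.6, 0.9, 0.9)])
```

Here the 0.3 m obstacle occupies only part of one cell, yet the whole cell is marked as blocked, so a coarser raster yields a more conservative map.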
Figure 2. The formation of an experience-map out of views and proprioceptive poses. Sensory observations first need to be integrated into views to be compared to existing experiences from the graph. The shown graph is embedded in a Cartesian reference frame extracted from the proprioceptive information.
Figure 3. Overview of the hierarchical generative model. Highlighted in blue is the bottom-up sensory stream, and in pink the top-down prediction stream. As the agent moves about, it alternates between these two modes. On the one hand it will infer state information from the observations, and on the other hand it will predict future observations from inferred states.
Figure 4. Different cases for illustrating the map updating procedure. For each case we show the map (top), pose (bottom right), and views (bottom left) in their own respective spaces. The current active map node is always indicated in red and the current pose or view value is the final one in the sequence. In case (A), the agent encounters a new experience that falls outside the threshold boundaries of both the poses and the views, so a new node is inserted into the map. Case (B) demonstrates a loop-closure event, where both the pose and view are within their respective thresholds; blue indicates the area of pose space demarcated by its threshold θ, and pink indicates the area covered by the view threshold. If both view and pose are within the threshold boundary (blue and pink) of the next node (case C), the estimate is shifted to the next node, skipping the current node in the graph. Finally, case (D) shows a matching pose without a matching view, requiring a new node insertion in the map.
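The four cases in the caption above can be sketched as a single update rule. This is a simplified illustration rather than the LatentSLAM implementation: the class and field names are hypothetical, Euclidean distance is assumed for both pose and view comparisons, and the first matching node wins. A stored node counts as a match only when both pose and view fall within their thresholds; otherwise a new node is inserted.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    pose: tuple  # e.g., (x, y) from proprioception
    view: tuple  # latent view vector (illustrative, low-dimensional here)


@dataclass
class ExperienceMap:
    pose_threshold: float
    view_threshold: float
    nodes: list = field(default_factory=list)
    edges: set = field(default_factory=set)
    current: int = -1  # index of the active node, -1 while the map is empty

    def _matches(self, node, pose, view):
        # A match requires BOTH pose and view to lie within their thresholds.
        return (math.dist(node.pose, pose) <= self.pose_threshold
                and math.dist(node.view, view) <= self.view_threshold)

    def update(self, pose, view):
        for idx, node in enumerate(self.nodes):
            if self._matches(node, pose, view):
                if idx == self.current:
                    return "stay"
                # Cases (B) and (C): move the estimate to a matching node.
                if self.current >= 0:
                    self.edges.add((self.current, idx))
                self.current = idx
                return "match"
        # Cases (A) and (D): no node matches on both pose and view.
        self.nodes.append(Node(pose, view))
        new_idx = len(self.nodes) - 1
        if self.current >= 0:
            self.edges.add((self.current, new_idx))
        self.current = new_idx
        return "insert"


m = ExperienceMap(pose_threshold=1.0, view_threshold=0.5)
events = [
    m.update((0.0, 0.0), (0.0, 0.0)),  # first experience: insert
    m.update((5.0, 0.0), (1.0, 0.0)),  # case (A): new pose and new view
    m.update((0.2, 0.0), (0.1, 0.0)),  # case (B): loop-closure onto node 0
    m.update((0.1, 0.0), (2.0, 0.0)),  # case (D): pose matches, view does not
]
```

Requiring agreement on both signals is what lets the map keep visually similar places apart when their poses differ, and keep nearby poses apart when the views differ.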
Figure 5. (A) Metric map of our lab environment, with some example camera views at the marked locations. The views at different locations (i.e., 3 and 5 or 1 and 4) appear very similar, making this a hard environment for visual SLAM. Panels (B–D) show three possible mappings of the trajectory shown in red in panel (A). (B) With a well-tuned threshold θ⋆, our LatentSLAM algorithm recovers a topological map of the environment, clearly separating the four different aisles. (C) If the threshold is too stringent (θ ≪ θ⋆), loop-closure events are not detected, as every view is seen as unique, and the map becomes incorrect as proprioception errors (the main source of mapping errors) add up. (D) When the threshold is too relaxed (θ ≫ θ⋆), similar-looking aisles are mapped onto each other due to false positive loop-closures.
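The sensitivity to θ described in panels (B) to (D) can be reproduced in a toy setting. The sketch below is an assumption-laden illustration, not the experiment behind the figure: views are two-dimensional vectors, matching is greedy against previously stored views, and the function name `assign_views` is hypothetical.

```python
import math


def assign_views(views, theta):
    """Label each view with the index of the first stored view within theta.

    Views that match nothing are stored as new entries (analogous to new nodes).
    """
    stored, labels = [], []
    for v in views:
        for i, s in enumerate(stored):
            if math.dist(v, s) <= theta:
                labels.append(i)  # treated as a revisit (loop-closure)
                break
        else:
            stored.append(v)
            labels.append(len(stored) - 1)
    return labels


# Two distinct places, each visited twice with slight sensor noise.
views = [(0.0, 0.0), (1.0, 0.0), (0.1, 0.0), (1.1, 0.0)]

well_tuned = assign_views(views, theta=0.3)   # revisits recognized, places kept apart
too_strict = assign_views(views, theta=0.01)  # every view treated as unique
too_loose = assign_views(views, theta=2.0)    # distinct places merged
```

With the well-tuned threshold the labels alternate between the two places, the strict threshold never closes a loop, and the loose threshold collapses both places into one, mirroring the false positive loop-closures of panel (D).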
Figure 6. A model of hippocampally-orchestrated imaginative planning and action selection via generalized navigation. Goal-oriented action sequences are depicted with respect to relevant neural processes. The hippocampal system provides (a) organization of cortical attracting states into value-canalized spatiotemporal trajectories, (b) stabilization of ensembles via theta-mediated cross-frequency phase coupling, and (c) goal-oriented cognition and behavior via contrasting (not depicted) sensed and imagined states. Hippocampal trajectories are shaped according to whichever paths are expected to result in more positively valenced outcomes (cf. reward prediction errors). The expected value associated with navigating to different portions of (potentially abstract) space is informed via coupling with similarly spatiotemporally-organized value representations (red shaded hexagons) in vmPFC and associated systems. As chained patterns of activity progress across hippocampal place fields (red hexagons with variable degrees of shading), theta-synchronized frontal ensembles (yellow shading spreading towards the front of the brain) help to generate (via cross-frequency phase coupling) ensembles for directing attention, working memory, and overt enaction. Sensory updating of posterior cortices occurs at alpha frequencies (blue shading), thereby providing a basis for conscious perception and imagination. With respect to these integrated estimates of sensory states, hippocampal coupling at theta frequencies (yellow shading spreading towards the back of the brain) provides a basis for (a) episodic memory and replay, (b) novel imaginings, and (c) adjustment of neuronal activity selection via orchestrated contrasting between cortical ensembles.
Abbreviations: nAC, nucleus accumbens; vmPFC, ventromedial prefrontal cortex; dmPFC, dorsomedial prefrontal cortex; SMA, supplementary motor area; Pre-SMA, presupplementary motor area; SEF, supplementary eye fields; PCC, posterior cingulate cortex; PMCs, posterior medial cortices; IPL, inferior parietal lobule. Reprinted with permission from Safron (2021b).