| Literature DB >> 32236692 |
Cognitive swarming in complex environments with attractor dynamics and oscillatory computing
Joseph D. Monaco, Grace M. Hwang, Kevin M. Schultz, Kechen Zhang.
Abstract
Neurobiological theories of spatial cognition developed with respect to recording data from relatively small and/or simplistic environments compared to animals' natural habitats. It has been unclear how to extend theoretical models to large or complex spaces. Complementarily, in autonomous systems technology, applications have been growing for distributed control methods that scale to large numbers of low-footprint mobile platforms. Animals and many-robot groups must solve common problems of navigating complex and uncertain environments. Here, we introduce the NeuroSwarms control framework to investigate whether adaptive, autonomous swarm control of minimal artificial agents can be achieved by direct analogy to neural circuits of rodent spatial cognition. NeuroSwarms analogizes agents to neurons and swarming groups to recurrent networks. We implemented neuron-like agent interactions in which mutually visible agents operate as if they were reciprocally connected place cells in an attractor network. We attributed a phase state to agents to enable patterns of oscillatory synchronization similar to hippocampal models of theta-rhythmic (5-12 Hz) sequence generation. We demonstrate that multi-agent swarming and reward-approach dynamics can be expressed as a mobile form of Hebbian learning and that NeuroSwarms supports a single-entity paradigm that directly informs theoretical models of animal cognition. We present emergent behaviors including phase-organized rings and trajectory sequences that interact with environmental cues and geometry in large, fragmented mazes. Thus, NeuroSwarms is a model artificial spatial system that integrates autonomous control and theoretical neuroscience to potentially uncover common principles to advance both domains.
Keywords: Emergence; Multi-robot groups; Oscillations; Place cells; Spatial navigation; Swarming
Year: 2020 PMID: 32236692 PMCID: PMC7183509 DOI: 10.1007/s00422-020-00823-z
Source DB: PubMed Journal: Biol Cybern ISSN: 0340-1200 Impact factor: 2.086
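The abstract's analogy of agents to neurons (mutual visibility as reciprocal place-cell connectivity, reward approach as Hebbian learning, and a per-agent oscillatory phase state) can be illustrated with a minimal sketch. This is not the authors' implementation: the agent count, distance-cutoff visibility, Gaussian activation, and tanh frequency modulation below are illustrative assumptions only; the paper's actual update rules are given in the full text.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10                  # number of agents (illustrative)
dt = 0.01               # s, integration time step
d_max = 0.5             # visibility range (illustrative units)
eta = 1.0               # Hebbian learning rate
f0, f_act = 0.0, 1.0    # baseline and activation-driven frequency (cycles/s)

pos = rng.uniform(0.0, 1.0, (n, 2))    # agent positions
phase = rng.uniform(0.0, 2 * np.pi, n) # internal oscillatory phase
W = np.zeros((n, n))                   # "synaptic" interaction weights

# Mutual visibility: here simply a symmetric distance cutoff (occlusion ignored)
dist = np.linalg.norm(pos[:, None] - pos[None, :], axis=-1)
visible = (dist < d_max) & ~np.eye(n, dtype=bool)

# Place-field-like activation falls off with inter-agent distance
act = np.exp(-dist**2 / (2 * 0.2**2))

# Hebbian update restricted to mutually visible ("reciprocally connected") pairs
W += eta * dt * np.where(visible, act, 0.0)

# Phase advances at a rate modulated by each agent's net synaptic drive
drive = W.sum(axis=1)
phase = (phase + 2 * np.pi * (f0 + f_act * np.tanh(drive)) * dt) % (2 * np.pi)
```

Because visibility and activation are both symmetric here, the learned weights stay symmetric, consistent with the reciprocal-connectivity analogy in the abstract.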
Fig. 1 Conceptual schematic and theoretical neuroscientific inspiration for the NeuroSwarms controller. a An artificial spatial system of mobile virtual or robotic agents communicates over sparse recurrent channels (bottom), just as spatial neurons in biological neural circuits produce reverberating activity patterns that reflect energy minima in the dynamical state-space of the system (e.g., fixed-point attractors; top; adapted from Knierim and Zhang 2012). b Example simulation of the spatial self-organization of an activity bump on an attractor map. In an attractor map network, the environment is represented by a continuum of locations with overlapping place fields, leading to network connectivity that produces self-reinforcing spatial activity patterns. Adapted from Zhang (1996). c Schematic of a minimal model of temporal-phase coding in which an excitatory external input (green) is rhythmically modulated by a continuous inhibitory oscillation (blue), such as the hippocampal theta rhythm. Adapted from Monaco et al. (2019a) as permitted by the CC-BY 4.0 International License (creativecommons.org/licenses/by/4.0/) (color figure online)
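The minimal temporal-phase coding model in panel (c) can be sketched as a ramping excitatory input competing with an oscillatory inhibitory threshold: as excitation grows, the firing onset crosses threshold at progressively earlier phases of each theta cycle. The linear ramp, 8 Hz frequency, and cosine inhibition below are assumptions for illustration, not the published model.

```python
import numpy as np

dt = 0.001
t = np.arange(0.0, 1.0, dt)
theta = 8.0  # Hz, within the 5-12 Hz theta band
inhib = 0.5 * (1.0 + np.cos(2 * np.pi * theta * t))  # oscillatory inhibition
excit = t                                            # slowly ramping excitatory input
firing = excit > inhib         # active whenever excitation exceeds inhibition

# Theta phase of the first threshold crossing in each cycle
cycle = np.floor(theta * t).astype(int)
first_phases = []
for c in range(cycle.max() + 1):
    idx = np.flatnonzero(firing & (cycle == c))
    if idx.size:
        first_phases.append((2 * np.pi * theta * t[idx[0]]) % (2 * np.pi))
```

In this sketch `first_phases` decreases monotonically across cycles, the hallmark of theta-phase precession.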
Parameters, default values, and descriptions (with units) for the NeuroSwarms controller implementation

| Parameter | Default | Description (units) |
| --- | --- | --- |
| — | 0.01 | s, Integration time step of simulation |
| duration | 180.0 | s, Total simulation time |
| — | 300 | No. of physical agents (multi-agent) |
| — | 1 | No. of physical agents (single-entity) |
| — | 300 | No. of internal fields (multi-agent) or virtual particles (single-entity) |
| — | 1.0 | Max. inter-agent visibility range |
| — | 3e3 | — |
| — | 0.9 | Momentum coefficient of agent motion |
| — | 0.3 | kg, Mean agent mass (multi-agent) |
| — | 3.0 | kg, Agent mass (single-entity) |
| — | 1.0 | Spatial scale of swarm interaction |
| — | 1.0 | Spatial scale of reward interaction |
| — | 1.0 | Learning rate for swarm connections |
| — | 1.0 | Learning rate for reward connections |
| — | 0.0 | cycles/s, Baseline oscillatory frequency |
| — | 1.0 | cycles/s, Max. increase in oscillatory frequency due to neural activation |
| — | 0.4 | Gain of sensory cue inputs |
| — | 0.2 | Gain of reward inputs |
| — | 0.4 | Gain of swarming inputs |
| — | 0.5 | s, Time constant of sensory cue inputs |
| — | 0.5 | s, Time constant of reward inputs |
| — | 0.1 | s, Time constant of swarming inputs |
| — | 0.0 | points, Reward contact radius |
These parameter values are multiplicatively scaled to the notional environment size, defined in points as the radius of a disk with the same area as the set of allowable locations in the environment’s interior
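The notional-size scaling described in the table footnote can be sketched as follows; only the definition (the radius of a disk with the same area as the allowable interior, in points) comes from the text, and the function names are hypothetical.

```python
import math

def notional_radius(allowed_area: float) -> float:
    """Radius (points) of a disk whose area equals the allowable interior area."""
    return math.sqrt(allowed_area / math.pi)

def scale_param(value: float, allowed_area: float) -> float:
    """Multiplicatively scale a spatial parameter to the notional environment size."""
    return value * notional_radius(allowed_area)

# e.g., an interior whose allowable area equals that of a 100-point-radius disk
r = notional_radius(math.pi * 100.0**2)  # -> 100.0
```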
Fig. 2 Example post-initialization ( s) swarm states for NeuroSwarms simulations. (Left) A single-agent simulation in the ‘multi-reward’ arena, which contains 3 rewards (gold stars; northwest, southwest, southeast), 7 cues (purple shapes), and 3 circular regions, referred to as ‘spawn disks,’ in which random initial locations are chosen for the agents. White enclosed areas constitute the set of allowable locations for swarm agents; black regions constitute barriers and disallowed locations. Initial particle positions are sampled from the spawn disks and initial phases are random. Green circle in southwest: the single agent; dots: 300 virtual swarm particles with internal phase state indicated by color. (Right) A multi-agent simulation in the ‘hairpin’ maze, which contains 5 connected hallways, 3 rewards, 7 cues, and 4 spawn disks. Circles: 300 swarm agents with internal phase state indicated by color; reward (gold star) size is for visual differentiation only and has no effect in the model
Fig. 3 Temporal evolution of swarming and single-entity approaches to rewards. a Three agent clusters were initially populated in the multi-reward arena (Supplementary Video 1). The internal place-field location of each agent is indicated by a small black dot (e.g., s, black arrow). Phase sorting is indicated by sequentially ordered colors of the circle markers representing agent positions. A reward-centered phase ring was created ( s) with a decreasing diameter over time ( s and s; magenta arrows). A phase-sorted line segment formed and moved around a corner ( s and s; blue arrows). NeuroSwarms parameters: , , , ; Table 1. b A single-entity agent (larger green circle with green arrow) was guided by virtual particles (phase-colored dots; Supplementary Video 2). Swarm particles formed phase sequences leading the agent from the southwest corner to the reward location in the southeast corner of the arena by s. NeuroSwarms parameters: , , , , ; Table 1. c Steplike patterns of particles (orange arrows) appeared near rewards that were occluded from the perspective of the single agent (green arrows) by corners in the environmental geometry (Supplementary Video 3). While the agent became ‘indecisive’ around s because it was pulled simultaneously in both directions, the agent ultimately found its way to the southeast reward by s. NeuroSwarms parameters: , , , , ; Table 1 (color figure online)
Fig. 4 Single-entity reward-approach behavior with fixed or capturable rewards. The agent was initialized to the spawn disk in the southwest corner of the multi-reward arena. a A rare example in which the single agent (green circle) captured all three rewards when rewards were fixed (i.e., they remained attractive despite previous contact with the agent): southwest reward at 8.9 s, southeast reward at 33 s, and northwest reward at 160 s (Supplementary Video 4). Movie frames show the initial contacts with each reward (gold stars). NeuroSwarms parameters: , , , , ; Table 1. b With the same parameters as (a) but initialized with a different random seed, this final frame of a simulation shows the converged state after the agent was attracted to the southwest corner and remained there for the duration (Supplementary Video 5). The red ellipse highlights that the agent became stuck between two fixed-point attractors that formed through mutual phase-desynchronization. c With the identical parameters and random seed as (b), rewards were made to be ‘capturable’ at a minimum contact radius of points (Sect. 2.5; Supplementary Video 6). Thus, rewards ceased to be attractive locations once the agent made initial contact. The agent captured the southwest reward at 5 s, the southeast reward at 27 s, and the northwest reward at 60 s. White stars indicate captured rewards. NeuroSwarms parameters: , , , , , ; Table 1 (color figure online)
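The capturable-reward rule in panel (c), where a reward stops being attractive after first contact within the contact radius, can be sketched as a simple masking step over the reward set. The function names and example coordinates below are hypothetical, not the authors' code.

```python
import numpy as np

def update_captures(agent_pos, rewards, captured, contact_radius):
    """Mark any reward within contact_radius of the agent as captured."""
    d = np.linalg.norm(rewards - agent_pos, axis=1)
    return captured | (d <= contact_radius)

def attractive_rewards(rewards, captured):
    """Only uncaptured rewards continue to attract the agent."""
    return rewards[~captured]

# Hypothetical example: three rewards, agent near the first one
rewards = np.array([[0.0, 0.0], [10.0, 10.0], [0.0, 10.0]])
captured = np.zeros(3, dtype=bool)
captured = update_captures(np.array([0.5, 0.0]), rewards, captured,
                           contact_radius=1.0)
# first reward captured; the other two remain attractive
```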
Fig. 5 Dispersion of exploratory trajectories with capturable rewards. a Superimposed agent trajectories are shown from 40 single-entity simulations of 180 s duration in which the agent was initialized to the southwest corner (Sect. 2.6.1). With fixed (non-capturable) rewards, only 1 simulation (bottom, red trace) contacted all three rewards in the arena (see Fig. 4a) and there was minimal variance in the exploratory paths taken by the agent in the other simulations (black traces). The dense sampling of the northwest and southeast reward locations indicates that these were strong attractors for the agent. With increasing contact radii of 1, 4, 10, or 15 points (top), exploratory variance increased, the reward attractors became relatively weaker, and higher proportions of agent trajectories successfully visited all three rewards (red traces). NeuroSwarms parameters: , , , , . Gold stars: reward locations. b For 700 single-entity simulations with random initial agent locations and , histograms for each of the agent spawn locations (central, southeast, or southwest) display the time-to-capture profile of each of the three rewards. NeuroSwarms parameters same as the top right panel of (a) (color figure online)
Fig. 6 Dynamics of a multi-agent swarm in a large hairpin maze. Example frames are shown for simulations with agents in a rectangular environment ( points including borders) partitioned into 5 hallways in a hairpin pattern. Three hallways contain rewards which are substantially occluded from the other maze sections. Emergent formations are indicated by arrows. a Frames from a pure swarming simulation, without reward or sensory cue influence (Supplementary Video 7). NeuroSwarms parameters: , , , , , , , ; Table 1. b Frames from a simulation with 1:1 swarm/cue input gains but no reward influence (Supplementary Video 8). NeuroSwarms parameters: , , , , , , , ; Table 1. c Frames from a simulation with equalized swarm, reward, and cue input gains (Supplementary Video 9). NeuroSwarms parameters: , , , , , ; Table 1. d Multi-agent trajectories are shown from two 80 s simulations: fixed rewards (top) and capturable rewards with points (bottom). Compare with multi-reward arena simulations in Fig. 5a. NeuroSwarms parameters: , , , , , ; Table 1
Fig. 7 Behavioral adaptability of multi-agent swarming in the hairpin maze. Across the first 60 s of simulation (frames are shown from Supplementary Video 10), a local cluster that initialized in a corridor without rewards transitioned from random swarming behaviors to directed navigation (magenta arrows). The transition occurred when agents passed a corner into line-of-sight of the reward in the adjacent corridor (between and s). Exploratory ring formations were driven by cue heterogeneity and swarming, whereas directed trajectory sequences were oriented by reward approach. NeuroSwarms parameters: , , , , , , ; Table 1 (color figure online)