Jacques Kaiser1, Michael Hoff1,2, Andreas Konle1, J Camilo Vasquez Tieck1, David Kappel2,3,4, Daniel Reichard1, Anand Subramoney2, Robert Legenstein2, Arne Roennau1, Wolfgang Maass2, Rüdiger Dillmann1.
Abstract
The endeavor to understand the brain involves multiple collaborating research fields. Classically, synaptic plasticity rules derived by theoretical neuroscientists are evaluated in isolation on pattern classification tasks. This contrasts with the biological brain, whose purpose is to control a body in closed loop. This paper contributes to bringing the fields of computational neuroscience and robotics closer together by integrating open-source software components from these two fields. The resulting framework makes it possible to evaluate the validity of biologically plausible plasticity models in closed-loop robotics environments. We use this framework to evaluate Synaptic Plasticity with Online REinforcement learning (SPORE), a reward-learning rule based on synaptic sampling, on two visuomotor tasks: reaching and lane following. We show that SPORE is capable of learning policies for both tasks within the course of simulated hours. Provisional parameter explorations indicate that the learning rate and the temperature driving the stochastic processes that govern synaptic learning dynamics need to be regulated for performance improvements to be retained. We conclude by discussing recent deep reinforcement learning techniques that could increase the functionality of SPORE on visuomotor tasks.
Keywords: neuromorphic vision; neurorobotics; reinforcement learning; spiking neural networks; synaptic plasticity
Year: 2019 PMID: 31632262 PMCID: PMC6786305 DOI: 10.3389/fnbot.2019.00081
Source DB: PubMed Journal: Front Neurorobot ISSN: 1662-5218 Impact factor: 2.650
NEST parameters.
| Time-step/resolution | 1 ms |
| Synapse update interval | 100 ms |
| (reaching) exploration noise | 35 Hz |
| (reaching) noise to exploration exc. | 750.0 |
| (reaching) visual to exploration inh. | |
| (reaching) exploration to motor exc. | 10.0 |
ROS-MUSIC parameters.
| MUSIC time-step | 1 ms…3 ms |
| DVS adapter time-step | 1 ms |
| Decoder time constant | 100 ms |
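The decoder time constant can be illustrated with a minimal sketch. It assumes that motor spikes are low-pass filtered with an exponential kernel before being turned into commands; the record does not describe the actual ROS-MUSIC decoder, and the function name `decode_spikes` is hypothetical. The defaults use the 1 ms time-step and the 100 ms time constant from the table.

```python
import math


def decode_spikes(spike_trains, dt=0.001, tau=0.1):
    """Filter each spike train with an exponential kernel (assumed decoder).

    spike_trains: list of per-step 0/1 lists, one list per motor neuron.
    dt: step length in seconds (1 ms in the table).
    tau: decoder time constant in seconds (100 ms in the table).
    Returns one filtered trace per neuron, same length as its input.
    """
    decay = math.exp(-dt / tau)  # per-step decay factor of the kernel
    traces = []
    for train in spike_trains:
        value = 0.0
        trace = []
        for spike in train:
            # Decay the previous activity, then add the new spike (if any).
            value = value * decay + spike
            trace.append(value)
        traces.append(trace)
    return traces


if __name__ == "__main__":
    # One neuron spiking once, then silent: the trace decays smoothly.
    print(decode_spikes([[1, 0, 0, 0]]))
```

With a 100 ms time constant and 1 ms steps, a single spike loses about 1% of its contribution per step, so the decoded command changes smoothly rather than jumping with every spike.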
Figure 1. Implementation of the embodied closed-loop evaluation of the reward-based learning rule SPORE. (Left) Our asynchronous framework based on open-source software components. The spiking network is implemented with the NEST neural simulator (Gewaltig and Diesmann, 2007), which communicates spikes with MUSIC (Ekeberg and Djurfeldt, 2008; Djurfeldt et al., 2010). The reward is streamed to all synapses in the spiking network learning with SPORE (Kappel et al., 2017). Spikes are encoded from address events and decoded to motor commands with ROS-MUSIC tool-chain adapters (Weidel et al., 2016). Address events are emitted by the DVS plugin (Kaiser et al., 2016) within the simulated robotic environment Gazebo (Koenig and Howard, 2004), which communicates with ROS (Quigley et al., 2009). (Right) Encoding visual information to spikes for the lane following experiment, see section 4.1.2 for more information. Address events (red and blue pixels on the rendered image) are downscaled and fed to visual neurons as spikes.
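The downscaling of address events onto visual neurons can be sketched as follows. The sensor resolution (128 × 128), the neuron grid (16 × 16), and the function name `events_to_neuron_ids` are illustrative assumptions, not values taken from the paper; event polarity is ignored for simplicity.

```python
def events_to_neuron_ids(events, sensor_size=(128, 128), grid_size=(16, 16)):
    """Map (x, y) address events to indices of a coarser visual-neuron grid.

    events: iterable of (x, y) pixel coordinates, 0-based.
    sensor_size: (width, height) of the event sensor (assumed).
    grid_size: (columns, rows) of the visual-neuron grid (assumed).
    Returns one neuron index per event; each index means "this neuron spikes".
    """
    sensor_w, sensor_h = sensor_size
    grid_w, grid_h = grid_size
    neuron_ids = []
    for x, y in events:
        col = x * grid_w // sensor_w  # integer downscaling along x
        row = y * grid_h // sensor_h  # integer downscaling along y
        neuron_ids.append(row * grid_w + col)  # row-major neuron index
    return neuron_ids


if __name__ == "__main__":
    print(events_to_neuron_ids([(0, 0), (8, 0), (127, 127)]))
```

Under these assumptions every block of 8 × 8 pixels drives one visual neuron, so several events from the same image region produce spikes on the same neuron.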
Figure 2. Visualization of the setup for the two experiments. (Left) Reaching experiment. The goal of the task is to steer the ball to the center of the plane. Visual input is provided by a DVS simulation above the plane looking downward. The ball is controlled with Cartesian velocity vectors. (Right) Lane following experiment. The goal of the task is to keep the vehicle in the right lane of the road. Visual input is provided by a DVS simulation attached to the vehicle and looking forward at the road. The vehicle is controlled with steering angles.
SPORE parameters.
| Visual to motor exc. | |
| Visual to motor mul. | 10 |
| Temperature (T) | 0.1 |
| Initial learning rate (β) | 1 × 10⁻⁷ |
| Learning rate decay (λ) | 8.5 × 10⁻⁵ |
| Integration time | 50 s |
| Max synaptic parameter (θmax) | 5.0 |
| Min synaptic parameter (θmin) | −2.0 |
| (reaching) episode length | 1 s |
| (lane following) episode length | 2 s |
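How the learning rate, its decay, the temperature, and the parameter bounds from the table might interact can be sketched as below. This is not the SPORE implementation. It assumes an exponential decay of the learning rate with time measured in seconds, a generic Langevin-style update of the kind used in synaptic sampling, and hard clipping at the bounds. The `drift` argument is a hypothetical placeholder for the prior and reward gradient terms, which are not specified in this record.

```python
import math
import random

BETA_0 = 1e-7        # initial learning rate (table)
LAMBDA = 8.5e-5      # learning rate decay (table)
TEMPERATURE = 0.1    # temperature (table)
THETA_MIN = -2.0     # min synaptic parameter (table)
THETA_MAX = 5.0      # max synaptic parameter (table)


def learning_rate(t_seconds):
    """Annealed learning rate. ASSUMPTION: exponential decay in seconds."""
    return BETA_0 * math.exp(-LAMBDA * t_seconds)


def sampling_step(theta, drift, t_seconds, dt=0.1, rng=random):
    """One Langevin-style update of a synaptic parameter (assumed form).

    drift: hypothetical stand-in for the prior + reward gradient.
    dt: update interval in seconds (100 ms synapse update interval).
    """
    beta = learning_rate(t_seconds)
    # Noise scales with sqrt(learning rate * temperature), so annealing
    # the learning rate also shrinks the random exploration.
    noise = math.sqrt(2.0 * beta * TEMPERATURE * dt) * rng.gauss(0.0, 1.0)
    theta = theta + beta * drift * dt + noise
    # ASSUMPTION: parameters are clipped to the configured bounds.
    return min(THETA_MAX, max(THETA_MIN, theta))


if __name__ == "__main__":
    for hours in (0, 1, 4):
        print(hours, "h:", learning_rate(hours * 3600.0))
```

In this sketch both the drift and the noise shrink as the learning rate decays, which is one way to read the observation that performance improvements are only retained when the learning rate is annealed.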
Figure 3. Results for the reaching task. (Left) Comparing the effect of different prior configurations on the overall learning performance. The results were averaged over eight trials. The performance is measured with the rate at which the target is reached (the ball moves to the center and is reset at a random position). (Right) Development of the synaptic weights over the course of learning for two trials: no prior (c = 0, top) and strong prior (c = 1, bottom). In both cases, the number of weak synaptic weights (below 0.07) increases significantly over time.
Figure 4. Policy development for selected points in time in a single trial. (A) Performance over time for a single, well-performing trial. The red lines mark the points in time for which the policies are shown in (B–G). Each policy plot consists of a 2D grid representing the DVS pixels. Every pixel contains a vector that indicates the motion contributed by an event emitted by this pixel. The magnitude of the contribution (vector strength) is indicated by the outer pixel area. The inner circle color represents the assessment of the vector direction (angular correctness).
Figure 5. Results for the lane following task with a medium prior (c = 0.25). (Left) Comparing the effect of annealing on the overall learning performance. The results were averaged over six trials. Without annealing, performance improvements are not retained and the network does not learn to perform the task. With annealing, the learning rate β decreases over time and performance improvements are retained. (Right) Development of the synaptic weights over the course of learning for a medium prior of c = 0.25 with annealing. The number of weak synaptic weights (below 0.07) increases from 41 at the start to 231 after 1 h of learning and to 342 after 4 h of learning (out of 512 synapses in total).
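The counts in the Figure 5 caption can be turned into fractions of the 512 synapses with a few lines of arithmetic. The helper names `count_weak` and `weak_fraction` are hypothetical; the 0.07 threshold and the counts come from the caption.

```python
TOTAL_SYNAPSES = 512
WEAK_THRESHOLD = 0.07

# Weak-synapse counts reported in the caption.
weak_counts = {"start": 41, "1 h": 231, "4 h": 342}


def count_weak(weights, threshold=WEAK_THRESHOLD):
    """Count weights strictly below the weak-synapse threshold."""
    return sum(1 for w in weights if w < threshold)


def weak_fraction(count, total=TOTAL_SYNAPSES):
    """Fraction of all synapses that are weak."""
    return count / total


if __name__ == "__main__":
    for label, count in weak_counts.items():
        print(f"{label}: {count}/{TOTAL_SYNAPSES} = {weak_fraction(count):.1%}")
    # start: 41/512 = 8.0%
    # 1 h: 231/512 = 45.1%
    # 4 h: 342/512 = 66.8%
```

The share of weak synapses thus grows from about 8% to about two thirds over four hours of learning.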