Guanlin Wu, Dongchen Liang, Shaotong Luan, Ji Wang.
Abstract
Recent years have witnessed an increasing demand for implementing artificial intelligence systems with spiking neural networks (SNNs), and with it a need for effective methods of training SNNs within reinforcement learning (RL) architectures. Recently, temporal coding has been proposed as a way to train spiking neural networks while preserving the asynchronous nature of spiking neurons. We propose a training method that enables temporal coding in RL tasks. To tackle the high sparsity of spikes, we introduce a self-incremental variable that pushes each spiking neuron to fire, which makes SNNs fully differentiable. In addition, we propose an encoding method that solves the problem of information loss in temporal-coded inputs. The experimental results show that SNNs trained by our proposed method achieve performance comparable to state-of-the-art artificial neural networks on benchmark reinforcement learning tasks.
Keywords: asynchronous processing; fully differentiable; reinforcement learning; spiking neural networks; temporal coding
Year: 2022 PMID: 36061595 PMCID: PMC9428400 DOI: 10.3389/fnins.2022.877701
Source DB: PubMed Journal: Front Neurosci ISSN: 1662-453X Impact factor: 5.152
Back-propagation cases in RL tasks.
|  |  |  |  |
|---|---|---|---|
| Legal | Legal | Exist | Normal |
| INF | Legal | Equal to 0 | Stop |
| INF | INF | Equal to 0 | Stop |
| Legal | INF | Exist | Error |
Pseudocode of the forward pass in a feed-forward network with L layers.
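As a rough illustration of such a forward pass, here is a minimal sketch in the exponential time domain z = exp(t) used by Mostafa-style temporal coding, which this line of work builds on. This is an assumption for illustration, not the paper's exact algorithm; in particular, the proposed self-incremental variable that pushes silent neurons to fire is omitted, and all names are illustrative:

```python
import numpy as np

def get_causal_set(z, w):
    """Inputs that arrive before the neuron's own spike
    (z = exp(t) domain; assumes finite input spike times)."""
    order = np.argsort(z)                      # process inputs in arrival order
    sum_w = np.cumsum(w[order])
    sum_wz = np.cumsum(w[order] * z[order])
    for k in range(len(z)):
        if sum_w[k] > 1.0:                     # enough drive to cross threshold
            z_out = sum_wz[k] / (sum_w[k] - 1.0)
            if z_out < (z[order][k + 1] if k + 1 < len(z) else np.inf):
                return order[:k + 1]           # fires before the next input
    return np.array([], dtype=int)             # neuron never fires

def forward(z_in, weights):
    """Forward pass through a feed-forward SNN with L layers: each layer
    maps z-domain spike times to z-domain spike times; a neuron whose
    causal set is empty never fires (spike time = inf)."""
    z = np.asarray(z_in, dtype=float)
    for W in weights:                          # W has shape (n_out, n_in)
        z_next = np.full(W.shape[0], np.inf)   # default: no spike
        for j in range(W.shape[0]):
            c = get_causal_set(z, W[j])
            if c.size:
                z_next[j] = W[j, c] @ z[c] / (W[j, c].sum() - 1.0)
        z = z_next
    return z
```

Because the output of each layer is again a vector of spike times, the loop over `weights` covers all L layers uniformly.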
Pseudocode of the get_causal_set function.
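A plausible sketch of `get_causal_set` under the same exponential temporal-coding assumption (z = exp(t)): the causal set of a neuron is the set of input spikes that arrive before the neuron's own output spike. Sorting inputs by arrival time, the set is grown until the implied output spike time falls before the next input. This is a sketch of the standard construction, not necessarily the paper's exact code:

```python
import numpy as np

def get_causal_set(z, w):
    """Indices of the input spikes that causally determine a neuron's
    output spike, in the exponential time domain z = exp(t).

    Over a candidate set C, the (z-domain) spike time is
        z_out = sum_{i in C} w_i z_i / (sum_{i in C} w_i - 1),
    which is valid only when sum_{i in C} w_i > 1 (enough drive to
    cross threshold) and z_out precedes the next input outside C.
    """
    order = np.argsort(z)                    # process inputs in arrival order
    sum_w = np.cumsum(w[order])
    sum_wz = np.cumsum(w[order] * z[order])
    n = len(z)
    for k in range(n):
        if sum_w[k] <= 1.0:
            continue                         # threshold not reachable yet
        z_out = sum_wz[k] / (sum_w[k] - 1.0)
        next_z = z[order][k + 1] if k + 1 < n else np.inf
        if z_out < next_z:                   # spike happens before next input
            return order[:k + 1]
    return np.array([], dtype=int)           # neuron never fires
```

For example, with z = [1.0, 5.0] and w = [3.0, 1.0], the first input alone drives the neuron above threshold and it fires at z_out = 1.5, before the second input arrives, so the causal set is {0}.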
Figure 1. Spike time variation of the output layer of CVTC. (A) Example spike times in one episode, showing each neuron's spike time for moving left and right at each step. (B) State of the car at step s1. The car is at the center of the field, and the pendulum is turning left. The spike time for moving right is high. (C) State of the car at step s2. The car is at the center of the field, and the pendulum is upright. Both spike times are low. (D) State of the car at step s3. The car is on the left of the field, and the pendulum is turning left. Both spike times are high.
Figure 2. Example of the unbalanced-input problem in SNNs. (A) Network structure. (B) Input-output spike times when i3 = 1.0. (C) Input-output spike times when i3 = 1.3. Since we take the first activated output neuron as the network's output, it cannot distinguish the two different input patterns in this example. When we delayed i3 from 1.0 (B) to 1.3 (C), it only affected o2.
Pseudocode for encoding the input signals.
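One way to illustrate such an encoding is a complementary scheme: each scalar input is represented by two spike times, one for the value and one for its complement, so that delaying any single input changes the overall input pattern seen by every output neuron, addressing the unbalanced-input problem shown in Figure 2. This is an assumption for illustration; the paper's exact encoding may differ, and the names and the [0, 1] range are illustrative:

```python
import numpy as np

def encode(x, t_min=0.0, t_max=1.0):
    """Map each real-valued input x_i in [0, 1] to a pair of spike times:
    one channel for x_i and one for its complement 1 - x_i, so the total
    input drive is balanced regardless of the individual magnitudes.
    Early spikes encode large values."""
    x = np.clip(np.asarray(x, dtype=float), 0.0, 1.0)
    t_value = t_min + (1.0 - x) * (t_max - t_min)   # large x -> early spike
    t_complement = t_min + x * (t_max - t_min)      # complement channel
    return np.concatenate([t_value, t_complement])
```

For instance, the inputs 1.0 and 0.0 yield the spike-time vector [0.0, 1.0, 1.0, 0.0]: both inputs contribute one early and one late spike in total, whichever value they carry.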
Hyperparameters for algorithms in experiments.
|  |  |  |  |
|---|---|---|---|
| Optimization algorithm | SGD (Amari, 1993) | Adam (Kingma and Ba, 2014) | Adam |
| Learning rate | 0.01–0.0001 | 0.001251 | 0.001 |
| Training batch size | 10 | 32 | 32 |
| Target network update frequency | 100 steps | 1 episode | |
| Replay memory capacity | 1,000 | 200,000 | |
|  | 23.37 | −200 | −106.4 |
| γ | 0.99 | 0.99 | |
| ϵ | 1–0.1 | 1–0.00001 | |
|  | 20 | 15 |  |
|  | 20 | 15 |  |
| σ | 1.4 | 1.2 | |
| β | 0.1, 0.01, 0.001 | 0.001 | 0.001 |
Figure 3. Results on the MNIST task. (A) Training error. (B) Evaluation error. (C) Spike time distribution of TC. (D) Spike time distribution of CVTC.
Comparison of errors on MNIST task.
|  |  |  |  |
|---|---|---|---|
| Evaluation error (%) | 2.65 | 2.72 | 2.85 |
| Test error (%) | 2.67 | 2.72 | 2.85 |
Figure 4. Training curves on the CartPole task.
Comparison of performance on Gym basic tasks.
| Method | CartPole | MountainCar |
|---|---|---|
| DDQN | 195.95 ± 0.59 | −106.4 ± 1.05 |
| PPO | 198.57 ± 0.42 | −96.20 ± 0.46 |
| DDQN-SNN-CVTC (ours) | 180.19 ± 2.73 | −108.15 ± 2.1 |
| DDQN-SNN-TC | 17.89 ± 0.3 | −199 ± 0 |