Yulong Yan, Haoming Chu, Yi Jin, Yuxiang Huan, Zhuo Zou, Lirong Zheng.
Abstract
The spiking neural network (SNN) is a promising pathway toward low-power, energy-efficient processing and computing that exploits the spike-driven and sparse characteristics of biological systems. This article proposes a sparsity-driven SNN learning algorithm, namely backpropagation with sparsity regularization (BPSR), aiming to achieve improved spiking and synaptic sparsity. Backpropagation incorporating spiking regularization is utilized to minimize the spiking firing rate with guaranteed accuracy. The backpropagation also captures temporal information and extends to the spiking recurrent layer to support brain-like structure learning. A rewiring mechanism with synaptic regularization is suggested to further mitigate the redundancy of the network structure; rewiring based on weight and gradient regulates the pruning and growth of synapses. Experimental results demonstrate that the network learned by BPSR exhibits synaptic sparsity and is highly similar to the biological system. The algorithm not only balances accuracy and firing rate but also facilitates SNN learning by suppressing information redundancy. We evaluate the proposed BPSR on the visual datasets MNIST, N-MNIST, and CIFAR10, and further test it on the sensor datasets MIT-BIH and gas sensor. Results show that our algorithm achieves comparable or superior accuracy relative to related works, with sparse spikes and synapses.
Keywords: backpropagation; sparsity regularization; spiking neural network; spiking sparsity; synaptic sparsity
Year: 2022 PMID: 35495028 PMCID: PMC9047717 DOI: 10.3389/fnins.2022.760298
Source DB: PubMed Journal: Front Neurosci ISSN: 1662-453X Impact factor: 5.152
Figure 1. Spiking sparsity and synaptic sparsity improve the efficiency of SNNs by reducing the number of synaptic operations.
Figure 2. The structure of the SNN layer: (A) the flat layer with only inter-layer synapses (w = 0), and (C) the recurrent layer with intra-layer synapses (w ≠ 0). The corresponding computational graphs are shown in (B,D), respectively. The legends for arithmetic operations, neuron state variables, and gradients are given in the lower right corner.
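As a concrete illustration of Figure 2, the following is a minimal sketch of one simulation step of such a layer, assuming a standard leaky integrate-and-fire neuron with soft reset; the names `lif_layer_step`, `w_in`, and `w_rec` and the reset rule are illustrative choices, not taken from the paper. Passing `w_rec=None` gives the flat layer (A); passing an intra-layer weight matrix gives the recurrent layer (C).

```python
import numpy as np

def lif_layer_step(x, v, s_prev, w_in, w_rec=None, tau=0.5, v_th=1.0):
    """One time step of a spiking layer.

    x      : input spikes from the previous layer, shape (n_in,)
    v      : membrane potentials, shape (n_out,)
    s_prev : this layer's spikes at the previous step, shape (n_out,)
    w_in   : inter-layer synapses, shape (n_out, n_in)
    w_rec  : intra-layer synapses; None for the flat layer (Figure 2A)
    """
    i = w_in @ x
    if w_rec is not None:                 # recurrent layer, Figure 2C
        i += w_rec @ s_prev
    v = tau * v + i                       # leaky integration
    s = (v >= v_th).astype(x.dtype)       # fire when the threshold is crossed
    v = v - s * v_th                      # soft reset by subtraction
    return v, s
```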
Comparison between spiking and synaptic regularization.

| Regularization | Purpose |
|---|---|
| Spiking | Reduce FR while ensuring accuracy |
| Synaptic | Combine with rewiring for pruning |
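The two regularizers in the table can be read as two extra loss terms. Below is a minimal sketch of such a combined objective, assuming the spiking term penalizes the mean firing rate with coefficient λ and the synaptic term is an L1 weight penalty that the rewiring step later exploits; the exact penalty forms and the names `bpsr_style_loss`, `lam`, and `mu` are assumptions, not the paper's equations.

```python
import numpy as np

def bpsr_style_loss(task_loss, spikes, weights, lam=1e-3, mu=1e-4):
    # Spiking regularization: push the mean firing rate (FR) down
    # while the task loss preserves accuracy (first table row).
    fr = np.mean([s.mean() for s in spikes])
    spiking_reg = lam * fr
    # Synaptic regularization: shrink weights toward zero so that
    # the rewiring step can prune them (second table row).
    synaptic_reg = mu * sum(np.abs(w).sum() for w in weights)
    return task_loss + spiking_reg + synaptic_reg
```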
Figure 3. Iterative calculation with linear algorithmic complexity. At time T, the potential error comes from the direct error ε_T. At time T − 1, the potential error includes the direct error ε_{T−1} and the backpropagation of ε_T. At time T − 2, the influences of ε_T, ε_{T−1}, and ε_{T−2} are taken into account through the iterative calculation, which requires only one addition and one multiplication per step.
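The recursion in Figure 3 can be sketched as follows, assuming the accumulated potential error at step t equals the direct error plus a decayed copy of the accumulated error from step t + 1; the scalar `decay` factor stands in for the paper's leakage term, so this illustrates the linear-complexity idea rather than the paper's exact backward pass.

```python
def backward_errors(direct_errors, decay):
    """Accumulate potential errors from time T down to 1.

    direct_errors : [eps_1, ..., eps_T]. One addition and one
    multiplication per step, so the total cost is linear in T.
    """
    acc = 0.0
    out = [None] * len(direct_errors)
    for t in reversed(range(len(direct_errors))):
        acc = direct_errors[t] + decay * acc  # eps_t + backpropagated part
        out[t] = acc
    return out
```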
Figure 4. (A) The synaptic weight w controls synaptic pruning. (B) The momentum m of the synaptic gradient controls synaptic growth.
The BPSR implementation of layer 𝕃n (algorithm outline): the gradient of potential is calculated by Equations (7) and (12); rewiring computes M ← CalM(…), Prun ← CalPrun(W), and Grow ← CalGrow(M, …); the connection mask is applied as W := Mask · W; the parameters are then updated.
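A minimal sketch of the rewiring step in the algorithm above (and in Figure 4), assuming pruning disconnects active synapses whose weight magnitude falls below a threshold and growth reconnects the dormant synapses with the largest gradient momentum, one grown per pruned synapse; the internals of CalPrun/CalGrow, the threshold, and the one-for-one budget are assumptions.

```python
import numpy as np

def rewire(w, m, mask, prune_th=1e-3):
    """One rewiring step, applying W := Mask * W.

    w    : synaptic weights
    m    : momentum of the synapse gradients (Figure 4B)
    mask : 1 = connected synapse, 0 = dormant
    """
    # Pruning: small-magnitude active synapses are disconnected (Figure 4A).
    prune = (np.abs(w) < prune_th) & (mask == 1)
    mask[prune] = 0
    # Growth: dormant synapses with the largest gradient momentum
    # are reconnected, one grown per pruned synapse here.
    n_grow = int(prune.sum())
    if n_grow > 0:
        dormant = np.flatnonzero(mask == 0)
        top = dormant[np.argsort(-np.abs(m).ravel()[dormant])[:n_grow]]
        mask.flat[top] = 1
    return w * mask, mask
```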
Figure 5. The principle of (A) rate coding and (B) rank order coding, and the spike sequence of an MNIST image after coding.
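A sketch of the two input codings of Figure 5, assuming Bernoulli rate coding (spike probability proportional to pixel intensity) and a rank order code in which each pixel fires once, earlier for higher intensity; both are common conventions and may differ in detail from the paper's coders.

```python
import numpy as np

def rate_code(img, T, seed=0):
    """img in [0, 1]; returns spikes of shape (T, n_pixels)."""
    rng = np.random.default_rng(seed)
    p = img.ravel()
    return (rng.random((T, p.size)) < p).astype(np.uint8)

def rank_order_code(img, T):
    """Each pixel spikes once; brighter pixels spike earlier."""
    p = img.ravel()
    order = np.argsort(-p)                              # brightest first
    times = np.minimum((np.arange(p.size) * T) // p.size, T - 1)
    spikes = np.zeros((T, p.size), dtype=np.uint8)
    spikes[times, order] = 1
    return spikes
```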
Figure 6. Visualization of LC sampling and each layer of the SNN. (A) The structure of the SNN model. (B) The ECG signal and the input spikes after LC sampling. (C) Output spikes and corresponding FR response curves of 18 neurons in the recurrent layer. The coordinates of the spikes represent the occurrence time and the neuron index; the color indicates the impact on the prediction result, where green is positive and red is negative. Response curves are plotted channel by channel, with the x-axis and y-axis giving the input and output FR, respectively. The 18 neurons are classified according to their filtering effect, with the corresponding neuron indices marked in gray. The output spikes of the hidden layer are shown in (D), and the predicted probabilities for the 5 ECG classes are shown in (E).
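The LC sampling in Figure 6B can be sketched as follows, assuming a spike is emitted with positive or negative polarity whenever the signal moves a fixed step `delta` away from the last crossed level; the step size and the two-polarity layout are assumptions.

```python
import numpy as np

def lc_sample(signal, delta):
    """Level-crossing sampling: returns (spike times, polarity ±1)."""
    level = signal[0]
    times, pols = [], []
    for t, x in enumerate(signal):
        while x - level >= delta:      # UP crossings
            level += delta
            times.append(t); pols.append(+1)
        while level - x >= delta:      # DOWN crossings
            level -= delta
            times.append(t); pols.append(-1)
    return np.array(times), np.array(pols)
```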
Figure 7. (A) Runtime, (B) graphics memory overhead, (C) accuracy, and (D) convergence epoch of four learning algorithms, measured for different numbers of hidden-layer neurons. Panels (E–H) show the corresponding indicators for different lengths of the time window T.
Figure 8. (A) Test accuracy and FR under different spiking regularization coefficients λ. (B) The evolution of accuracy and FR during learning. (C) Learning curves under different λ (after smoothing filtering). (D) The number of synapses and the change of network structure in the recurrent layer. (E) Learning curves with and without the rewiring mechanism (after smoothing filtering). (F) Significance profiles of the C. elegans nervous system and the gas sensor network.
SNN structures and hyper-parameters setup.

| Dataset |  |
|---|---|
| MNIST | 8 |
| N-MNIST | 4 |
| CIFAR10 |  |
| MIT-BIH |  |
| Gas sensor |  |

| Hyper-parameter | Value |
|---|---|
| Potential threshold |  |
| Leakage coefficient | τ = 0.5 (initial) |
| Coefficient of … | α = 0.7 |
| Learning rate | CIFAR10: … |
| Sparsity coefficient | CIFAR10: … |
| Rewiring parameter |  |
Comparison of different spiking models on MNIST dataset.

| Work | Coding | Rewiring | Structure |  |  |  |  | Accuracy (%) |
|---|---|---|---|---|---|---|---|---|
| Diehl et al. | Rate | × | MLP | 2.4M | 10.0K | 24.0+ | 6.3M | 98.6 |
| | | | CNN | 1.4M | 14.7K | 7.5+ | 2.0M | 99.1 |
| Diehl and Cook | Rate | × | rMLP | 46.0M | 2.3K | 74.7M | 15.0M | 95.0 |
| Wu et al. | Rate | × | MLP | 0.6M | 6.7K | 78.9M | 2.6M | 98.89 |
| | | | CNN | 1.4M | 41.4K | 162.3M | 5.1M | 99.42 |
| Yan et al. | Rank | × | rMLP | 0.3M | 392 | 17.3M | 84.5K | 97.3 |
| Tang et al. | Rank | × | CNN | 0.6M | N/A | N/A | N/A | 90.2 |
| Comşa et al. | Rank | × | MLP | 0.3M | N/A | N/A | N/A | 97.96 |
| Shi et al. | Rate | √ | MLP | 0.2M | N/A | N/A | N/A | 94.05 |
| Guo et al. | Rate | √ | rMLP | 0.5M | N/A | N/A | N/A | 88.71 |
| Liang et al. | Rank | √ | MLP | 0.4M | N/A | N/A | N/A | 96 |
| This work (BPSR) | Rank | × | CNN | | | | | |
| | Rank | × | rCNN | | | | | |
| | Rank | √ | rCNN | | | | | |
The result is estimated based on the open-source code. Data are not available (N/A) due to the lack of experimental results and source code. Bold values mark the metrics of this work.
Figure 9. Accuracy, number of operations, normalized energy consumption, and parameter size of different networks. The area of each circle represents the storage overhead of the parameters, and the y-coordinate of its center represents the network accuracy. The x-coordinate represents (A) the number of operations and (B) the energy consumption per inference. The proportions of different operation types are marked in (A). The x-axis of (A) is folded, and the x-axis of (B) is logarithmic.
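The energy axis of Figure 9B is typically derived from operation counts. The sketch below assumes each spike-driven accumulate (AC) and each multiply-accumulate (MAC) costs a fixed energy; the 0.9 pJ/AC and 4.6 pJ/MAC figures are widely cited 45 nm estimates used here only as placeholders, not values reported in the paper.

```python
def estimate_energy(n_ac, n_mac, e_ac=0.9e-12, e_mac=4.6e-12):
    """Energy per inference from operation counts.

    n_ac  : spike-driven accumulate operations (SNN synaptic ops)
    n_mac : multiply-accumulate operations (ANN-style layers)
    e_ac, e_mac : assumed per-operation energies in joules.
    """
    return n_ac * e_ac + n_mac * e_mac

# Example: 2.0M accumulates and no MACs -> ~1.8e-06 J per inference.
print(estimate_energy(2.0e6, 0))
```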
Comparison of different spiking models on N-MNIST dataset.

| Work | Structure | Time window |  | Accuracy (%) |
|---|---|---|---|---|
| Wu et al. | MLP | 30 | 1.9M | 98.78 |
| Jin et al. | MLP | N/A | 1.9M | 98.93 |
| Wu et al. | CNN | 30 | 202.4M | 99.53 |
| Vaila et al. | Mixed CNN + SVM | N/A | 0.98M | 98.32 |
| Kaiser et al. | CNN | 300 | 315.5M | 99.04 |
| This work (BPSR) | CNN | 20 | | |
| | rCNN | | | |
Data are not available (N/A) because results were not reported. Bold values mark the metrics of this work.
Comparison of different spiking models on CIFAR10 dataset.

| Work | Structure | Time window |  |  | Accuracy (%) |
|---|---|---|---|---|---|
| Cao et al. | 5-layer CNN | 400 | 5.7M | N/A | 77.43 |
| Wu et al. | 4-layer CNN | N/A | 2.9M | N/A | 50.7 |
| Wu et al. | 8-layer CNN | 12 | 519.8M | N/A | 90.53 |
| Sengupta et al. | VGG16 | 2500 | 315.5M | N/A | 91.55 |
| Allred et al. | LeNet5 | N/A | 0.66M | 89.9K | 66.45 |
| This work (BPSR) | 11-layer ResNet | 12 | | | |
| | | 8 | | | |
Data are not available (N/A) because results were not reported. Bold values mark the metrics of this work.
Comparison of different spiking models on MIT-BIH dataset.

| Work | Structure | Time window |  | Accuracy (%) |
|---|---|---|---|---|
| Kolağasioğlu | wavelet + rMLP | N/A | N/A | 95.5 (17 classes) |
| Corradi et al. | rMLP + SVM | 250 | 25.6K | 95.6 (18 classes) |
| Amirshahi and Hashemi | rMLP | 300 | 968.0K | 97.9 (4 classes) |
| Bauer et al. | rMLP | N/A | 34.8K | 97.3 (2 classes) |
| Wu et al. | GRU + MLP | N/A | 20.8K | 97.8 (5 classes) |
| Yan et al. | CNN | 180 | 184.3K | 90 (4 classes) |
| This work (BPSR) | rMLP | 40 | | |
Data are not available (N/A) because results were not reported.
Comparison of different models on gas sensor dataset.

| Work | Model | Time window |  | Accuracy (%) |
|---|---|---|---|---|
| Vergara et al. | SVM | – | – | 87.14–96.55 |
| Imam and Cleland | EPL | 16 | 55.4K | 92 |
| This work (BPSR) | rMLP | 16 | | |
–: the indicator is not applicable.