Dengyu Wu, Xinping Yi, Xiaowei Huang.
Abstract
This article conforms to a recent trend of developing an energy-efficient Spiking Neural Network (SNN), which takes advantage of the sophisticated training regime of a Convolutional Neural Network (CNN) and converts a well-trained CNN to an SNN. We observe that the existing CNN-to-SNN conversion algorithms may keep a certain amount of residual current in the spiking neurons of the SNN, and this residual current may cause significant accuracy loss when inference time is short. To deal with this, we propose a unified framework to equalize the output of the convolutional or dense layer in the CNN with the accumulated current in the SNN, and maximally align the spiking rate of a neuron with its corresponding charge. This framework enables us to design a novel explicit current control (ECC) method for CNN-to-SNN conversion that considers multiple objectives at the same time, including accuracy, latency, and energy efficiency. We conduct an extensive set of experiments on different neural network architectures, e.g., VGG, ResNet, and DenseNet, to evaluate the resulting SNNs. The benchmark datasets include not only image datasets such as CIFAR-10/100 and ImageNet but also Dynamic Vision Sensor (DVS) datasets such as CIFAR-10-DVS. The experimental results show the superior performance of our ECC method over the state-of-the-art.
Keywords: deep learning; deep neural networks (DNNs); event-driven neural network; spiking network conversion; spiking neural network (SNN)
Year: 2022 PMID: 35692427 PMCID: PMC9179229 DOI: 10.3389/fnins.2022.759900
Source DB: PubMed Journal: Front Neurosci ISSN: 1662-453X Impact factor: 5.152
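The residual-current observation at the heart of the abstract can be reproduced with a toy integrate-and-fire neuron. The sketch below (plain Python, illustrative only, not the authors' implementation) drives a soft-reset IF neuron with a constant input current and reports both its spike rate and the charge stranded in the membrane when inference stops early:

```python
def if_neuron_soft_reset(input_current, threshold=1.0, timesteps=8):
    """Integrate-and-fire neuron with reset-by-subtraction (soft reset).

    The neuron adds `input_current` to its membrane potential every
    timestep and emits a spike (subtracting the threshold) whenever the
    potential reaches it. Whatever charge remains in the membrane after
    the last timestep is the residual current: it was received but never
    expressed as a spike, so the spike rate under-reports the input.
    """
    v, spikes = 0.0, 0
    for _ in range(timesteps):
        v += input_current
        if v >= threshold:
            spikes += 1
            v -= threshold  # soft reset keeps the surplus charge
    return spikes / timesteps, v  # (spike rate, residual charge)

# With input 0.3 over only 8 timesteps the neuron fires twice:
# rate 0.25 instead of the ideal 0.3, with ~0.4 of charge stranded.
rate, residual = if_neuron_soft_reset(0.3, threshold=1.0, timesteps=8)
```

At longer inference times the stranded charge is amortized over more timesteps, which is why the accuracy loss described in the abstract concentrates at short latencies.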
Figure 1. An illustrative diagram showing how SNNs process two different types of inputs and their performance comparison with CNNs. A regular image (left column)—taken from the camera—is preprocessed into a spike train (A), which then runs through the SNN in several timesteps (e.g., 128 timesteps as in the figure). A DVS input—taken from the event camera—can be represented directly as a spike train (B), and processed naturally by the SNN in several frames (e.g., 48 frames as in the figure). (C,D) show the SNN's performance with respect to the three objectives (accuracy, energy efficiency, and latency), compared to CNNs.
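The preprocessing of a regular image into a spike train in panel (A) is typically done with rate coding: each pixel fires at every timestep with probability equal to its intensity. The sketch below is a generic illustration of that scheme (the function name and shapes are ours, not the authors' pipeline):

```python
import numpy as np

def rate_encode(image, timesteps=128, rng=None):
    """Encode a [0, 1]-normalized image as a Bernoulli spike train.

    At each timestep every pixel fires with probability equal to its
    intensity, so the average spike rate over many timesteps
    approximates the pixel value (rate coding).
    """
    rng = np.random.default_rng(rng)
    image = np.clip(image, 0.0, 1.0)
    # One binary frame per timestep: shape (timesteps, *image.shape).
    return (rng.random((timesteps,) + image.shape) < image).astype(np.uint8)

# A bright pixel (0.9) fires far more often than a dim one (0.1).
img = np.array([[0.9, 0.1]])
train = rate_encode(img, timesteps=10_000, rng=0)
rates = train.mean(axis=0)  # empirical firing rates per pixel
```

DVS inputs, as in panel (B), skip this step entirely: the event camera already produces a spike train.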
Comparison of key technical ingredients (HR, SR, WN, TB, TS, ECC) and workable layers (BN, MP, AP) with the state-of-the-art methods.
| Method | HR | SR | WN | TB | TS | ECC | BN | MP | AP |
|---|---|---|---|---|---|---|---|---|---|
| Cao et al. | √ | | | | | | | | √ |
| Diehl et al. | √ | | √ | √ | | | | | |
| Rueckauer et al. | | √ | √ | | | | √ | | √ |
| Sengupta et al. | | √ | | √ | | | | | √ |
| Han et al. | | √ | | √ | √ | | | | √ |
| [This paper] | | √ | | | √ | √ | √ | √ | √ |
HR, hard reset; SR, reset by subtraction, or soft reset; WN, weight normalization; TB, threshold balancing; TS, threshold scaling; ECC, explicit current control; BN, batch normalization; MP, max pooling; AP, average pooling.
As a contribution to this article, in Section 3.2, we show that both WN and TB are special cases of our ECC framework.
Among all methods, only those that can handle BN have bias terms in their pre-trained CNNs.
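As a rough illustration of the WN ingredient (data-based weight normalization in the style of Rueckauer et al., not the paper's ECC formulation; the function name and toy numbers below are ours), a layer's weights and bias are rescaled by the maximum activations observed on calibration data, so that a unit firing threshold can cover the layer's whole activation range:

```python
import numpy as np

def normalize_layer_weights(weights, bias, max_act_in, max_act_out):
    """Data-based weight normalization (the WN ingredient).

    Rescales one layer's weights and bias so that its maximum activation
    over the calibration data becomes 1. With a unit firing threshold,
    a spiking neuron's rate can then span the layer's activation range.
    `max_act_in` / `max_act_out` are the maximum observed activations of
    the previous and current layer, respectively.
    """
    w = weights * (max_act_in / max_act_out)
    b = bias / max_act_out
    return w, b

# Toy layer: inputs peaking at 4.0 feed a layer whose own peak is 8.0.
w, b = normalize_layer_weights(np.array([[2.0]]), np.array([0.0]), 4.0, 8.0)
# A maximal rescaled input of 1.0 now maps to exactly 1.0.
out = (w @ np.array([1.0]) + b)[0]
```

Per Section 3.2 of the article, this rescaling (and likewise threshold balancing) falls out as a special case of the ECC framework.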
Figure 2. Left: Our proposed CNN-to-SNN conversion for the n-th layer with a current normalization component and a thresholding mechanism. The activation in the CNN (Top) is used for current normalization in the SNN (Bottom). Right: The proposed Thresholding for Residual Elimination (TRE) and the illustration of error reduction by TRE.
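This record does not spell out TRE's exact rule, but a well-known related trick (cf. the ReLU-SC line of work cited in Figure 6) shows why controlling the residual matters: initializing the membrane at half the threshold centers the rate-quantization error instead of letting it accumulate one-sidedly. A toy comparison, with the half-threshold shift as our illustrative stand-in:

```python
def spike_rate(current, timesteps, threshold=1.0, v0=0.0):
    """Spike rate of a soft-reset IF neuron under a constant input current."""
    v, spikes = v0, 0
    for _ in range(timesteps):
        v += current
        if v >= threshold:
            spikes += 1
            v -= threshold  # reset by subtraction
    return spikes / timesteps

# A constant current of 0.35 over 8 timesteps should ideally give a
# spike rate of 0.35. Plain soft reset always rounds the rate down;
# starting the membrane at threshold/2 centers the rounding error.
plain = spike_rate(0.35, 8)            # 0.25  -> error -0.100
shifted = spike_rate(0.35, 8, v0=0.5)  # 0.375 -> error +0.025
```

The shifted neuron's rate error is four times smaller here, which mirrors the error-reduction illustration on the right of Figure 2.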
Figure 3. (A) Accuracy and energy consumption (MOps) with respect to timesteps, for CIFAR-10. (B) Accuracy and energy consumption (MOps) with respect to timesteps, for ImageNet (Top-1 Acc). (C) Accuracy loss and latency with respect to energy consumption (MOps), for CIFAR-10 and CIFAR-100. (D) Accuracy and quantization error with respect to timesteps, for CIFAR-10.
Figure 4. Accuracy and energy consumption (MOps) with respect to frames, between 2017-SNN and ECC-SNN, for CIFAR-10-DVS and VGG-7.
Comparison of SNN accuracy, latency, and energy consumption (MOps), between direct training, 2017-SNN, and ECC-SNN, for CIFAR-10-DVS.
| Model | Work | Accuracy (%) | Latency (frames) | Energy (MOps) |
|---|---|---|---|---|
| Direct training (VGG-7) | Wu et al. | 62.50 | - | - |
| 2017-SNN (DenseNet) | Kugele et al. | 65.61 | 60 | 1,551 |
| ECC-SNN (VGG-16) | This paper | 71.20 | | |
| ECC-SNN (VGG-7) | This paper | 66.79 | | |
Figure 5. Contribution of CN, CMB, and TRE to the reduction of mean accuracy loss, for CIFAR-10 and VGG-16.
Figure 6. Comparison of SNNs using different clipping methods: ReLU6 (Jacob et al., 2018; Lin et al., 2019), ReLU-CM (Yu et al., 2020), and ReLU-SC* (Deng and Gu, 2021), for CIFAR-10 on VGG-16. (A) Normalized activation distribution of the first layer. (B) Accuracy with respect to timesteps. (C) Energy consumption (MOps) with respect to timesteps. *We use ReLU-SC to train an SNN with a fixed timestep (16T), as it does not need extra training.
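Of the clipping baselines compared in Figure 6, ReLU6 is the simplest: activations are clipped to [0, 6] during CNN training, so the converted SNN faces a bounded activation range. A one-line sketch of the activation itself (the standard definition from Jacob et al., 2018, not anything specific to this paper):

```python
import numpy as np

def relu6(x):
    """Clipped ReLU: min(max(x, 0), 6)."""
    return np.minimum(np.maximum(x, 0.0), 6.0)

out = relu6(np.array([-2.0, 3.0, 10.0]))  # -> [0.0, 3.0, 6.0]
```

Bounding the activations caps the maximum spike rate a converted neuron must represent, which is what makes clipping methods natural baselines for the current-control comparison in panels (B) and (C).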