Backpropagation with biologically plausible spatiotemporal adjustment for training deep spiking neural networks.

Guobin Shen^1,2, Dongcheng Zhao¹, Yi Zeng^1,3,4,2,5.

Abstract

The spiking neural network (SNN) mimics the information-processing operation of the human brain. SNNs trained directly with backpropagation still show a performance gap compared with traditional deep neural networks. To address the problem, we first propose a biologically plausible spatial adjustment that rethinks the relationship between membrane potential and spikes and realizes a reasonable adjustment of gradients to different time steps. It precisely controls the backpropagation of the error along the spatial dimension. Second, we propose a biologically plausible temporal adjustment that lets the error propagate across spikes in the temporal dimension, which overcomes the problem of the temporal dependency being confined to a single spike period in traditional spiking neurons. We have verified our algorithm on several datasets, and the experimental results show that it greatly reduces network latency and energy consumption while also improving network performance.

Keywords: SNN; backpropagation; biologically plausible spatial adjustment; biologically plausible temporal adjustment; low energy consumption; low latency; spiking neural network; surrogate gradient

Year: 2022 PMID: 35755868 PMCID: PMC9214320 DOI: 10.1016/j.patter.2022.100522

Source DB: PubMed Journal: Patterns (N Y) ISSN: 2666-3899

Introduction

Deep neural networks (DNNs) have achieved success in various research areas, such as object detection, visual tracking, and face recognition. However, they are still far from the information-processing mechanisms of the human brain. Spiking neural networks (SNNs) are known as the third generation of artificial neural networks. They have been widely used in many fields, such as semantic segmentation, visual explanations, privacy protection, and object detection. The discrete spikes used to transmit information are more energy efficient and more in line with the information-processing mechanism of the brain. Combined with neuromorphic computing, SNNs promise to realize real intelligence. However, because of the complex neural dynamics and non-differentiable characteristics of SNNs, training them efficiently is still a challenge. Existing SNN training methods can be roughly divided into three categories: biologically plausible methods, conversion methods, and backpropagation-based methods. Biologically plausible methods, such as Hebbian learning rules and spike-timing-dependent plasticity (STDP), are mainly inspired by the synaptic learning rules of the human brain. Hebbian theory holds that the connection between pre- and post-synaptic neurons strengthens under continuous and repetitive stimulation by the pre-synaptic neuron. STDP is an extended Hebbian learning rule based on the temporal difference between pre- and post-synaptic spikes. Diehl et al. used the STDP learning rule and lateral inhibition in a two-layer SNN and achieved 95% accuracy on the MNIST dataset. Saeed et al. introduced a weight-sharing strategy and designed a spiking convolutional neural network whose weights were learned layer-wise by STDP. Kheradpisheh et al. used hand-crafted difference of Gaussian (DoG) features as the input of the SNN and trained the subsequent convolutional layers through STDP.
These methods rely on the local activities of neighboring neurons to update network weights and lack the supervision of global signals. Although Zhao et al. designed a multi-layer SNN based on global feedback connections and local optimization learning rules (GLSNN), it still performs poorly when transplanted to deep networks for complex tasks. The conversion method is an alternative way to obtain high-performance SNNs. It first trains a well-performing DNN and then converts it into an SNN with some additional adjustments.¹⁷⁻²¹ The analog values of the DNN are converted into the firing rates of the SNN. Although the conversion method lets SNNs achieve performance close to that of traditional DNNs, the required simulation time is long, which gives the network poor real-time performance and high energy consumption. Conversion methods also rely heavily on well-trained DNNs and do not take advantage of the temporal information of SNNs.
The success of deep learning depends heavily on the backpropagation algorithm. Several studies provide evidence for backpropagation in the brain. Feedback connections may predict the activities of low-level brain areas,²²⁻²⁵ and biological neurons backpropagate action potentials to provide crucial signals for synaptic plasticity.²⁶⁻²⁹ Lillicrap et al. argued that the differences between feedforward and feedback neural activities may locally approximate the error signals in backpropagation. Researchers in the SNN domain have also introduced the backpropagation algorithm into the optimization of SNNs with the surrogate-gradient method.³¹⁻³⁴ The surrogate gradient lets SNNs perform backpropagation through time (BPTT), so that SNNs can be applied to larger-scale network structures, such as VGG and ResNet, and perform better on more complex datasets. However, directly applying the surrogate gradient to the training of SNNs may lead to some problems.
First, the surrogate gradient obtains the gradient by smoothing the spike firing function, so neurons with membrane potential around the threshold participate in backpropagation. As a result, neurons that do not emit spikes may participate in weight updating, significantly increasing the network’s energy consumption. Second, the spiking neuron resets to the resting potential after a spike is emitted. The reset operation cuts off the error along the temporal dimension during backpropagation so that errors cannot propagate across spikes, which significantly weakens the temporal dependence of the SNN. To address these problems, we introduce a biologically plausible spatiotemporal adjustment to improve the backpropagation training of SNNs. Our contributions can be summarized as follows: (1) We study the influence of the surrogate gradient on the spatial dimension of SNNs, rethink the relationship between the neuron membrane potential and the spikes, and propose a more biologically plausible spatial adjustment (BPSA) to help regulate spike activities. (2) We study the limitations of the surrogate gradient in the temporal dimension and introduce a more biologically plausible temporal adjustment (BPTA), which enables SNNs to propagate errors across spikes, enhancing their temporal dependence. (3) We conduct experiments on several commonly used datasets. For the static datasets MNIST, CIFAR10, and CIFAR100, we obtain remarkable performance compared with other state-of-the-art SNNs. To the best of our knowledge, we reach state-of-the-art performance on the neuromorphic datasets N-MNIST, DVS-CIFAR10, and DVS-Gesture. For the Google Speech Commands dataset, we reach performance comparable with artificial neural networks designed for speech recognition. Moreover, our analysis shows that our method dramatically reduces energy consumption and latency compared with other state-of-the-art SNNs.

Results

In this section, we conduct experiments using the PyTorch framework on an NVIDIA A100 graphics processing unit (GPU). The network weights are initialized with the default method of PyTorch. We use the AdamW algorithm as the optimizer with a learning rate of 1 × 10⁻³ and the same learning-rate control strategy as in SGDR. The same method as in temporal spike sequence-learning backpropagation (TSSL-BP) is used to warm up the model. The membrane potential threshold of the neuron is set to 0.5, the membrane potential decay constant , and the default simulation duration T is set to 16. The number of training epochs is set to 300. The α in Equation 10 is set to 0.2. First, we conduct experiments on the static MNIST, CIFAR10, and CIFAR100 datasets. To further illustrate the superiority of our algorithm, we also conduct experiments on the neuromorphic datasets N-MNIST, DVS-Gesture, and DVS-CIFAR10. To demonstrate the adaptability of our algorithm to other domains, we conduct experiments on the speech-recognition dataset Google Speech Commands. For the static datasets, we use the direct input encoding used in Wu et al. as well as the voting strategy. For the neuromorphic datasets, we use the same data preprocessing strategy as in SpikingJelly. We designed three network structures to suit datasets of different sizes and complexities. The small network is 128C3-MP2-128C3-256C3-MP2-2048FC-DP-10Voting, the middle is 128C3-MP2-128C3-MP2-256C3-MP2-512C3-AP4-512FC-10Voting, and the large is 128C3-128C3-MP2-128C3-MP2-256C3-MP2-512C3-MP2-1024C3-AP4-DP-1024FC-10Voting. AP denotes the average-pooling operation, MP denotes the max-pooling operation, DP denotes neuron dropout, and C denotes the Conv-BN-ReLU-LIF operation.
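The structure strings above follow a compact layer notation. As a minimal sketch, the parser below turns such a string into a list of layer descriptions. The function name `parse_structure`, the dictionary keys, and the reading of `128C3` as 128 output channels with a 3 × 3 kernel (and of `MP2` as a pooling window of 2) are assumptions made for illustration; the text itself only names the layer types.

```python
import re

# Layer codes named in the text: C = Conv-BN-ReLU-LIF, MP = max pooling,
# AP = average pooling, DP = neuron dropout. FC and Voting are read here as
# fully connected and voting layers (an assumption about the notation).
TOKEN = re.compile(r"^(?:(\d+)(C)(\d+)|(MP|AP)(\d+)|(\d+)(FC|Voting)|(DP))$")


def parse_structure(spec):
    """Turn a string such as '128C3-MP2-...' into a list of (kind, params)."""
    layers = []
    for token in spec.split("-"):
        m = TOKEN.match(token)
        if m is None:
            raise ValueError(f"unrecognized layer token: {token!r}")
        if m.group(2):      # e.g. 128C3: assumed 128 channels, 3x3 kernel
            layers.append(("conv", {"channels": int(m.group(1)),
                                    "kernel": int(m.group(3))}))
        elif m.group(4):    # e.g. MP2 or AP4: pooling with the given window
            kind = "maxpool" if m.group(4) == "MP" else "avgpool"
            layers.append((kind, {"window": int(m.group(5))}))
        elif m.group(7):    # e.g. 2048FC or 10Voting
            kind = "fc" if m.group(7) == "FC" else "voting"
            layers.append((kind, {"units": int(m.group(6))}))
        else:               # DP carries no numeric parameter
            layers.append(("dropout", {}))
    return layers


SMALL = "128C3-MP2-128C3-256C3-MP2-2048FC-DP-10Voting"
MIDDLE = "128C3-MP2-128C3-MP2-256C3-MP2-512C3-AP4-512FC-10Voting"
LARGE = ("128C3-128C3-MP2-128C3-MP2-256C3-MP2-512C3-MP2-"
         "1024C3-AP4-DP-1024FC-10Voting")
```

Calling `parse_structure(SMALL)` yields one entry per dash-separated token, which makes it easy to count, for example, how many spiking convolution blocks each structure contains.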

Static datasets

MNIST is one of the most common classification datasets in the deep-learning domain, with 60,000 training samples and 10,000 test samples. The samples are 28 × 28 gray-scale images of handwritten digits from 0 to 9. We use the small structure for the evaluation. The CIFAR10 dataset is more challenging for most existing SNNs. The training set has 50,000 samples, and the test set has 10,000 samples, all 32 × 32 color images. Because a deeper network achieves better performance on this dataset, we adopt the middle structure for the experiment. CIFAR100 is a more challenging version of CIFAR10; it has 100 categories, and each category has only 600 samples: 500 for training and 100 for testing. The network structure is the same as for CIFAR10. Experimental results are compared with several deep SNN models, both conversion based and BP based, as shown in Table 1.

Table 1

Classification accuracy on MNIST, CIFAR10, and CIFAR100 datasets

Models	Training method	MNIST	CIFAR10	CIFAR100
Spiking CNN⁴⁴	conversion	–	82.95	–
BackRes⁴⁵	BP	–	84.98	–
ContinueSNN⁴⁶	conversion	99.44	90.85	–
Spike-Norm¹⁹	conversion	–	91.55	–
STBP³¹	BP	99.42	50.7	–
HM2BP³³	BP	99.49	–	–
LISNN⁴⁷	BP	99.5	–	–
BNTT⁴⁸	BP	–	90.5	66.6
STBP NeuNorm³²	BP	–	90.53	–
BackEISNN⁴⁹	BP	99.67	90.93	–
SBPSNN⁴³	BP	99.59	90.95	–
TSSL-BP³⁴	BP	99.53	91.41	–
ST-RSBP⁵⁰	BP	99.62	–	–
RNL⁵¹	conversion	99.51	93.45	75.1
SNASNet-Fw ⁵²	NAS + BP	–	93.64	70.06
SNASNet-Bw ⁵²	NAS + BP	–	94.12	73.04
Our method	BP	99.67	92.15	68.28
Our method ResNet34	BP	–	94.51	69.32

The spatiotemporal BP (STBP) NeuNorm is the STBP method with the neuron norm. For the normal network structures we set, our network achieves performance comparable to other SNN algorithms. To illustrate the adaptability of our algorithm to deeper networks, we also tested it with the ResNet34 network structure. As can be seen in Table 1, for the CIFAR10 dataset, our network reaches state-of-the-art performance compared with other well-known SNNs, whether based on BP or conversion. For the CIFAR100 dataset, our network still lags slightly behind RNL and SNASNet; however, the RNL algorithm directly converts well-trained DNNs to SNNs, while SNASNet searches for a better network structure with neural architecture search (NAS).

Neuromorphic datasets

To better illustrate our spatiotemporal adjustment, we conduct experiments on the neuromorphic datasets N-MNIST, DVS-Gesture, and DVS-CIFAR10. N-MNIST is the neuromorphic version of MNIST. A dynamic vision sensor (DVS) is placed in front of the static images on a computer screen. The DVS moves along the three sides of an isosceles triangle in turn, which shifts the image on the sensor, and the two-channel spike events (on and off) are collected. DVS-Gesture is a real-time gesture-recognition dataset recorded with a DVS. The dataset has 11 hand gestures, such as hand claps and arm rolls, collected from 29 individuals under three illumination conditions. DVS-CIFAR10 is a neuromorphic version converted from the CIFAR10 dataset: 10,000 frame-based images are converted into 10,000 event streams with a DVS. For N-MNIST, we use the middle structure, and for DVS-Gesture and DVS-CIFAR10, which are more complex, we use the large structure. As can be seen in Table 2, for the N-MNIST dataset, our method surpasses STBP by about 0.3%; even compared with STBP with NeuNorm, our method still performs better. For the more complex gesture dataset, our model surpasses the latest STBP-tdBN by about 2% and LMCSNN by about 1.4%, reaching state-of-the-art performance compared with other well-known SNNs. For the DVS-CIFAR10 dataset, we surpass the latest STBP-tdBN by about 11%. We also surpass LMCSNN, which makes many parameters of the leaky integrate-and-fire (LIF) spiking neurons learnable, by about 4%. Our method achieves state-of-the-art performance on the DVS-CIFAR10 dataset.

Table 2

Classification accuracy on N-MNIST, DVS-Gesture, and DVS-CIFAR10 datasets

Models	Method	N-MNIST	DVS-Gesture	DVS-CIFAR10
HM2-BP ³³	BP	98.88	–	–
SLAYER ⁵³	BP	99.2	93.64	–
TSSL-BP 30 ³⁴	BP	99.28	–	–
IIRSNN ⁵⁴	BP	99.28	–	–
TSSL-BP 100 ³⁴	BP	99.4	–	–
STBP ³¹	BP	99.44	–	–
LISNN ⁴⁷	BP	99.45	–	–
STBP NeuNorm ³²	BP	99.53	–	60.5
BNTT ⁴⁸	BP	–	–	63.2
SALT ⁵⁵	BP	–	–	67.1
STBP-tdBN ⁵⁶	BP	–	96.87	67.8
LMCSNN ⁵⁷	BP	99.61	97.57	74.8
BackEISNN ⁴⁹	BP	99.57	–	–
Our method	BP	99.71	98.96	78.95

Speech-recognition dataset

To verify the performance of our algorithm in other domains, we validate the proposed method on the Google Speech Commands dataset. There are two versions of this dataset, and the second version is used for testing. There are 105,000 utterances in 35 categories, and each utterance is 1 s long. The two training datasets are rebalanced by repeating random samples to make the number of samples the same in each class. As can be seen in Table 3, even compared with the artificial neural networks designed for speech recognition, our algorithm still shows comparable performance.

Table 3

Classification accuracy on Google Speech Commands dataset

Models	Method	Accuracy
Sample-level ⁵⁸	DNN	92.53
Attention RNN ⁵⁹	DNN	93.9
Sample-level + SE ⁶⁰	DNN	93.95
Harmonic filters ⁶¹	DNN	96.39
Our method	SNN	94.2

Conclusion

In this paper, we first analyze the existing problems of SNNs trained with BP. We find that the current setting causes earlier-spiking neurons to participate repeatedly in the gradient calculation of the network, giving them a disproportionate influence on the network weights. In addition, the BPTT algorithm on SNNs only propagates errors backward within a single-spike period, so the temporal dependence between spikes is truncated. The biologically plausible spatial adjustment takes into account that spikes generated by membrane potentials of different strengths have different effects on the parameter update during backpropagation. The biologically plausible temporal adjustment enables backpropagation across spikes. We have achieved remarkable performance on the MNIST, CIFAR10, CIFAR100, and Google Speech Commands datasets and the current best performance on the N-MNIST, DVS-Gesture, and DVS-CIFAR10 datasets. By analyzing the energy consumption and latency of the SNNs, we find that the BPSA and BPTA significantly reduce energy consumption and latency while improving performance.

Discussion

In this section, we first conduct an ablation study on the BPSA and BPTA described above and analyze the contribution of each module. Second, we explore the effect of these adjustments on the energy consumption of the SNNs. Third, we discuss how these adjustments affect the latency of the SNNs. Finally, we give the limitations of our algorithm and future work. The analysis shows that the two adjustments make the behavior of the spiking neurons more stable and yield better performance while reducing network latency and energy consumption.

Ablation study

We conduct the ablation study on the neuromorphic datasets DVS-Gesture and DVS-CIFAR10 because their more complex spatial structure and stronger temporal information fully illustrate the importance of our adjustments. We use Lillicrap et al. as our baseline and then successively add the BPSA and BPTA. As can be seen in Table 4, the performance of the network improves step by step as the two adjustments are introduced, with the spatial adjustment bringing the larger improvement.

Table 4

The ablation study of the two adjustments on DVS-Gesture and DVS-CIFAR10 datasets

	Baseline	BPSA	BPSA + BPTA
DVS-Gesture	93.92	97.56	98.96
DVS-CIFAR10	71.40	75.30	78.95

We also give the test curves on the DVS-Gesture dataset. As shown in Figure 1, as the number of epochs increases, the accuracy of the model with the biologically plausible spatiotemporal adjustment fluctuates less. This is because, with the two adjustments, the firing pattern of the neurons is more stable, making the model more robust to small parameter changes. Meanwhile, a reasonable gradient allocation strategy in BP improves the model’s generalization performance and avoids overfitting to a certain extent.

Figure 1

The test accuracy curve on DVS-Gesture of our method and the baseline

Energy-efficiency study

To illustrate the energy efficiency of our algorithm, we visualize the firing frequency of different layers in the MNIST experiment. As can be seen from Figure 2, owing to the biologically plausible spatiotemporal adjustment, our method exhibits an extremely low firing rate, especially in the initial convolutional layers.

Figure 2

The firing frequency of different convolutional layers on MNIST of our method and the baseline

We compare the accuracy and energy efficiency of the SNNs trained by the method used in Wu et al., the model we propose, and artificial neural networks (ANNs) with the same network structure and network parameters. Most operations in ANNs are multiply-accumulate (MAC) operations, while in SNNs, the spikes transmitted in the network are sparse and are integrated into the membrane potential. As a result, most operations in SNNs are accumulate (AC) operations. We calculate the energy consumption of a network by multiplying its number of floating-point operations (FLOPs) by the energy consumption of a MAC or AC operation. We use the same energy-efficiency calculations as in Chakraborty et al., and the computation details can be seen in Equation 1. As can be seen in Table 5, our method has a lower firing rate and higher energy efficiency. The training method proposed in this paper distributes the gradient more reasonably along the spatial and temporal dimensions, avoiding the problem that earlier-spiking neurons have a disproportionate influence on the network parameters. The cross-spike propagation also enhances the temporal dependence of the SNNs. Therefore, the method proposed in this paper achieves lower network power consumption while maintaining higher accuracy.
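The MAC-versus-AC argument above can be illustrated with a simplified ratio. This is a sketch under stated assumptions, not the calculation of Equation 1: it assumes every ANN operation is one MAC, every SNN operation is one AC that is executed only when a spike arrives, and it uses placeholder per-operation energies `e_mac` and `e_ac` (in pJ) that are not taken from this paper. The function name `energy_efficiency` is hypothetical.

```python
def energy_efficiency(firing_rate, timesteps, e_mac=4.6, e_ac=0.9):
    """Return E_ANN / E_SNN for two networks with the same operation count.

    Assumptions (not from the paper): one MAC per ANN operation, one AC per
    SNN operation per spike, and placeholder energies e_mac / e_ac in pJ.
    """
    if firing_rate <= 0 or timesteps <= 0:
        raise ValueError("firing_rate and timesteps must be positive")
    ops = 1.0  # the shared operation count cancels in the ratio
    e_ann = ops * e_mac
    # An SNN repeats the layer for every time step, but only spiking inputs
    # trigger an accumulate, hence the firing_rate factor.
    e_snn = ops * firing_rate * timesteps * e_ac
    return e_ann / e_snn
```

Under these assumptions the ratio is inversely proportional to the firing rate, which is the qualitative relationship the comparison relies on. With the placeholder constants the absolute values will generally not match Table 5, which follows the calculation in Chakraborty et al.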

Table 5

The energy-efficiency study of our model with baseline on different datasets

Dataset	Accuracy (%)	Firing rate	EE = E_ANN/E_SNN (×)
MNIST	99.58/99.42	0.082/0.183	35.1/15.7
N-MNIST	99.61/99.32	0.097/0.176	29.6/16.3
CIFAR10	92.33/89.49	0.108/0.214	26.6/13.4
DVS-Gesture	98.26/93.92	0.083/0.165	34.6/17.4
DVS-CIFAR10	77.76/71.40	0.097/0.177	29.5/16.2

Represented as our method/baseline.

Latency study

Latency is one of the main problems restricting the development of SNNs. Spiking neurons need to accumulate membrane potential, and only once they reach the threshold do they fire spikes and transmit information. Therefore, SNNs often require a long simulation time to achieve high performance. Here, we study the influence of different simulation lengths on network performance. As shown in Figure 3, without our adjustments, the test curve of the network becomes less smooth as the simulation time is reduced; that is, the network needs a long simulation time to converge. As can be seen in Table 6, with the two adjustments, our training method still achieves high accuracy at reduced simulation time. The low latency of our approach further lays the foundation for the practical application of SNNs.

Figure 3

The test accuracy of different simulation lengths on DVS-Gesture dataset with our method and the baseline

Table 6

The test accuracy on DVS-Gesture dataset of different simulation lengths of our method and the baseline

	T = 32	T = 16	T = 8	T = 4
BPSA + BPTA	98.27	98.26	96.18	92.01
BPSA	96.53	97.56	94.44	89.58
Baseline	95.49	93.92	84.03	73.96
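The trend in Table 6 can be summarized by how much accuracy each variant loses when the simulation length is cut from T = 32 to T = 4. The short sketch below only re-uses the numbers from the table; the names `TABLE6` and `accuracy_drop` are introduced here for illustration.

```python
# Test accuracy (%) on DVS-Gesture, copied from Table 6, keyed by T.
TABLE6 = {
    "BPSA + BPTA": {32: 98.27, 16: 98.26, 8: 96.18, 4: 92.01},
    "BPSA": {32: 96.53, 16: 97.56, 8: 94.44, 4: 89.58},
    "Baseline": {32: 95.49, 16: 93.92, 8: 84.03, 4: 73.96},
}


def accuracy_drop(row, long_t=32, short_t=4):
    """Accuracy lost (in percentage points) when shortening the simulation."""
    return round(row[long_t] - row[short_t], 2)


drops = {name: accuracy_drop(row) for name, row in TABLE6.items()}
```

The baseline loses roughly 21.5 points over this range, while the two adjusted variants lose between 6 and 7 points.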

Limitations of the study

In this paper, through the analysis of the training of the BP-based SNN, we find that neurons that do not generate spikes will still participate in the update of network weights. Also, the error signals along the temporal dimension cannot propagate across the spikes due to the reset operation. By introducing the BPSA and BPTA mechanisms, our network is more consistent with the brain in terms of weight update, and the energy consumption and latency of the SNN are greatly reduced. However, there is no independent module in the brain specially designed for the BP pathway. In future work, we will explore more biologically plausible learning methods to train SNNs with high performance and robustness.

Experimental procedures

Resource availability

Lead contact

Further information and requests for resources should be directed to and will be fulfilled by the lead contact, Yi Zeng (yi.zeng@ia.ac.cn).

Materials availability

This study did not generate new unique materials.

Data and code availability

All original code has been deposited at https://github.com/Brain-Inspired-Cognitive-Engine/BP-STA under https://doi.org/10.5281/zenodo.6489856 and is publicly available as of the date of publication.

Spiking-neuron model

Many spiking-neuron models with biological neural characteristics have been proposed in recent years, and the LIF model is the most commonly adopted neuron model in deep SNNs. LIF neurons continuously accumulate membrane potential and emit spikes once it reaches the threshold. We give a detailed description of the LIF neuron model. As shown in Equation 2, the membrane potential of the neuron changes dynamically with the input current:

$$\tau_m \frac{dV(t)}{dt} = -V(t) + R\,I(t) \qquad (2)$$

$I(t)$ denotes the input current, which is composed of input spikes. $R$ is the membrane resistance, and $\tau_m$ is the synaptic time constant. When the membrane potential is greater than the threshold $V_{th}$, the neuron will spike and be reset to $V_{reset}$. Without loss of generality, we set the reset potential $V_{reset} = 0$ and $R = 1$. To facilitate the calculation and simulation, we convert Equation 2 into a discrete form with the Euler method with $dt = 1$ so that we can get

$$V^{t} = \left(1 - \frac{1}{\tau_m}\right) V^{t-1} + \frac{1}{\tau_m} I^{t}$$

The input can be obtained from the pre-synaptic spikes, $I_i^{t,l} = \sum_{j=1}^{n(l-1)} w_{ij}^{l}\, o_j^{t,l-1}$, where $n(l-1)$ is the number of neurons in layer $l-1$; then we can get

$$V_i^{t,l} = \left(1 - \frac{1}{\tau_m}\right) V_i^{t-1,l}\left(1 - o_i^{t-1,l}\right) + \frac{1}{\tau_m}\sum_{j=1}^{n(l-1)} w_{ij}^{l}\, o_j^{t,l-1}, \qquad o_i^{t,l} = g\left(V_i^{t,l}\right)$$

and the function $g$ is the threshold function. $w_{ij}^{l}$ is the synaptic weight from neuron $j$ in layer $l-1$ to neuron $i$ in layer $l$. $o_j^{t,l-1}$ denotes the spike of neuron $j$ in layer $l-1$ at time $t$.
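A minimal sketch of a discrete-time LIF update in plain Python, assuming a reset potential of 0, R = 1, and a unit Euler step as described above. The function name `simulate_lif` and the value `tau=2.0` are assumptions made for illustration (the decay constant is not given in this text); the threshold 0.5 is the value reported in the Results.

```python
def simulate_lif(inputs, tau=2.0, v_th=0.5, v_reset=0.0):
    """Simulate one discrete-time LIF neuron.

    Returns (potentials, spikes), where potentials holds the membrane
    potential at each step before any reset is applied.
    """
    v = v_reset
    potentials, spikes = [], []
    for current in inputs:
        # Euler step with dt = 1 and R = 1: leak, then integrate the input.
        v = (1.0 - 1.0 / tau) * v + current / tau
        fired = v >= v_th
        potentials.append(v)
        spikes.append(1 if fired else 0)
        if fired:
            v = v_reset  # hard reset after a spike
    return potentials, spikes


# A constant input of 0.6 charges the neuron over three steps, it fires,
# and the potential starts again from the reset value.
potentials, spikes = simulate_lif([0.6, 0.6, 0.6, 0.6])
```

The reset line is what later cuts the temporal dependence: the potential at the step after a spike no longer depends on anything accumulated before it.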

Spatiotemporal characteristics of SNNs

The discontinuity of the spike firing function makes it challenging to apply BP directly to the training of SNNs. In recent years, the surrogate gradient has been proposed to replace the discontinuous gradient with a smooth gradient function, enabling SNNs to conduct BP in the spatial and temporal domains. Here, we use the average firing rate of the last layer to approximate the classification label and train the network with the mean squared error (MSE):

$$L = \mathrm{MSE}\left(\frac{1}{T}\sum_{t=1}^{T} o^{t},\; y\right)$$

$T$ denotes the simulation length, $y$ denotes the real labels, and $o^{t}$ denotes the output at time $t$. By applying the chain rule, we can obtain the gradient with respect to the weight:

$$\frac{\partial L}{\partial w^{l}} = \sum_{t=1}^{T} \frac{\partial L}{\partial o^{t,l}}\,\frac{\partial o^{t,l}}{\partial V^{t,l}}\,\frac{\partial V^{t,l}}{\partial w^{l}}$$

where $V^{t,l}$ is the membrane potential in layer $l$ at time step $t$. $\partial L/\partial o^{t,l}$ denotes the derivative with respect to $o$ in layer $l$ at time step $t$ and can be derived from layer $l+1$ (spatial) and time step $t+1$ (temporal). As can be seen in Equation 6 and Figure 4, the traditional surrogate-gradient method calculates the gradient around the threshold even if the spiking neuron does not emit a spike in the forward process. This causes a large number of neurons that do not emit spikes to participate in the parameter update, increasing the network’s energy consumption. Also, as can be seen in Figure 4, a spiking neuron participates in the weight update repeatedly according to the chain rule, and an earlier spiking moment has a larger influence on the weight update than a later one. In neurophysiology, by contrast, the farther away the spiking activity is from the current moment, the smaller its effect.
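To make the first problem concrete, the sketch below uses a rectangular surrogate, one common choice in the surrogate-gradient literature and not necessarily the one used in this work. The function names and the window `width=1.0` are assumptions; the threshold 0.5 is the value reported in the Results.

```python
def heaviside_spike(v, v_th=0.5):
    """Forward pass: the neuron fires only at or above the threshold."""
    return 1 if v >= v_th else 0


def rectangular_surrogate(v, v_th=0.5, width=1.0):
    """Surrogate for d(spike)/dV: constant inside a window around v_th."""
    return 1.0 / width if abs(v - v_th) < width / 2.0 else 0.0


potentials = [0.1, 0.4, 0.6, 1.2]

# Neurons that stay silent in the forward pass but still receive a
# non-zero gradient in the backward pass.
silent_with_gradient = [
    v for v in potentials
    if heaviside_spike(v) == 0 and rectangular_surrogate(v) > 0.0
]
```

Here the neurons at 0.1 and 0.4 never fire, yet both fall inside the surrogate window and would contribute to the weight update.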

Figure 4

The forward and backward process of spiking neural networks

The dotted lines of different colors indicate the impact on the network at different time steps. The earlier spiking node will have more influence on the parameter update.

For an SNN trained with BP, the temporal dependence mainly comes from the accumulation of membrane potential over time, and the backward process along the temporal dimension follows this accumulation (Equation 8). Since a spiking neuron resets to the resting potential after reaching the threshold, the membrane potential after a spike has no relationship with the membrane potential before it, and the temporal dependence no longer exists, as shown in Figure 5.

Figure 5

The temporal backpropagation of LIF neurons

The information can only propagate within a single-spike period and cannot propagate cross spikes.

To tackle the problems mentioned above, we propose the BPSA, in which only the neurons that emit spikes, across the hierarchical layers, participate in the weight update. We also propose the BPTA to help the errors reach the initial time step without being clipped.

BPSA

The change of the membrane potential of a spiking neuron is a process of information accumulation. After the neuron has accumulated enough information, it sends the information to the post-synaptic neurons in the form of spikes. As a result, the binary spikes can be regarded as a normalization of the information contained in the membrane potential. For the BP process, it is more reasonable to calculate the gradient of the spike with respect to the membrane potential only at the moment of spiking. We propose a BPSA to improve the BP-based training of SNNs. When the membrane potential does not reach the threshold, we clip the gradient of the spikes with respect to the membrane potential to avoid the problem of repeated updates at earlier times shown in Figure 4. When the membrane potential reaches the threshold, we normalize the membrane potential and spread the information in spikes. The derivative of the spikes with respect to the membrane potential is then given by Equation 9. This method considers the influence of spikes generated by membrane potentials of different strengths on the parameter update. For a spike excited by a larger membrane potential, the optimization step for the model parameters in the BP process is small, which keeps the spike stable. Spikes excited by a membrane potential near the spike threshold have a more significant impact on the model parameters, allowing the model to quickly push such membrane potentials away from the threshold and obtain more stable spikes.
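The qualitative behavior described above can be sketched as a gradient that is zero below the threshold and shrinks as the membrane potential grows above it. The `1 / v` form used below is an assumption chosen only to match that description; it is not presented as the paper's formula, and the name `bpsa_like_gradient` is hypothetical.

```python
def bpsa_like_gradient(v, v_th=0.5):
    """Illustrative d(spike)/dV for a spatial adjustment.

    Zero below the threshold (the gradient is clipped), and decreasing in v
    above it. The 1 / v form is an assumption, not the paper's stated rule.
    """
    if v < v_th:
        return 0.0
    return 1.0 / v


# Silent neuron, neuron just at threshold, strongly driven neuron.
gradients = [bpsa_like_gradient(v) for v in (0.4, 0.5, 2.0)]
```

Compared with a window-shaped surrogate, the silent neuron at 0.4 receives no gradient at all, and the neuron just at the threshold receives a larger step than the strongly driven one.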

BPTA

In biological neurons, a spike fired by the neuron affects its subsequent spikes. When the BP algorithm is used directly to optimize the parameters of SNNs, the gradient of the loss function with respect to the neuron output is only propagated from the time the neuron was last excited to the present and does not cross spikes, as shown in Equation 8 and Figure 5. The influence between spikes is therefore not considered in the temporal dimension. We thus propose a BPTA that crosses the spikes. Because the temporal dependence disappears during the BP process, we add a residual connection between spikes in the backward pathway, as shown in Figure 6. The strength of the error transfer from a later time step to time step t is controlled by the residual factor α. The temporal feedback process is given by Equation 10.

Figure 6

The temporal residual pathway helps the error transfer from a later time step to time step t

As can be seen in Equation 10, when the neuron does not emit a spike at time t, the temporal backward pass is the same as in the traditional BP algorithm. However, when the neuron fires a spike at time t, the temporal dependence is no longer cut off by the reset but is carried through the residual pathway, weighted by the residual factor α. With the introduction of the BPSA and BPTA, the influence of different spikes becomes more reasonable, and the temporal residual backward pathway enables errors to propagate across spikes.
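The effect of the residual pathway can be sketched by multiplying per-step temporal factors along a spike train. This is an illustration under assumptions: the factor for a silent step is taken to be a constant `decay`, the factor for a spiking step is 0 without the residual path and `alpha` with it, and the names `temporal_factors` and `reach_of_error` are hypothetical. Only the value α = 0.2 comes from the Results.

```python
def temporal_factors(spikes, decay=0.5, alpha=None):
    """Per-step factor linking the error at step t+1 back to step t.

    Without a residual path (alpha=None) a spike at step t resets the
    neuron, so the factor is zero. With alpha set, a spike leaves a
    residual factor instead. Both the decay value and the form of the
    residual term are assumptions made for illustration.
    """
    factors = []
    for s in spikes:
        if s == 0:
            factors.append(decay)
        else:
            factors.append(0.0 if alpha is None else alpha)
    return factors


def reach_of_error(spikes, **kwargs):
    """Product of the factors: how much error reaches the first step."""
    total = 1.0
    for f in temporal_factors(spikes, **kwargs):
        total *= f
    return total


spike_train = [0, 1, 0]
without_residual = reach_of_error(spike_train)
with_residual = reach_of_error(spike_train, alpha=0.2)
```

In this toy setting the spike in the middle of the train blocks the error entirely without the residual path, while with α = 0.2 a reduced but non-zero share of the error still reaches the first time step.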
