Feifei Zhao, Yi Zeng.
Abstract
Most neural networks require the architecture to be predefined empirically, which may cause over-fitting or under-fitting. In addition, the large number of parameters in a fully connected network leads to prohibitively expensive computational cost and storage overhead, which makes the model hard to deploy on mobile devices. Dynamically optimizing the network architecture by pruning unused synapses is a promising technique for solving this problem. Most existing pruning methods focus on reducing the redundancy of deep convolutional neural networks by pruning unimportant filters or weights, at the cost of a drop in accuracy. In this paper, we propose an effective brain-inspired synaptic pruning method to dynamically modulate the network architecture and simultaneously improve network performance. The proposed model is biologically inspired in that it dynamically eliminates redundant connections based on the synaptic pruning rules that operate during the brain's development. Connections are pruned if they are inactive or only weakly activated several times in a row. Extensive experiments demonstrate the effectiveness of our method on classification tasks of different complexity with the MNIST, Fashion MNIST, and CIFAR-10 datasets. Experimental results reveal that, even for a compact network, the proposed method can remove up to 59-90% of the connections, with a relative improvement in learning speed and accuracy.
Keywords: accelerating learning; compressing network; developmental neural network; optimizing network structure; synaptic pruning
Year: 2021 PMID: 34177473 PMCID: PMC8220807 DOI: 10.3389/fnsys.2021.620558
Source DB: PubMed Journal: Front Syst Neurosci ISSN: 1662-5137
Figure 1. The synaptic pruning strategy of the BSP algorithm.
Figure 2. The detailed pruning process of the BSP algorithm. The red connections represent the candidate pruned synapses.
Dropout rates for different numbers of training samples and network sizes for the MNIST dataset.
| 60,000 | 0 | 0.3 | 0.4 |
| 1,200 | 0 | 0.4 | 0.6 |
Comparison of test accuracy, improvement in learning speed, and network compression for 60,000 MNIST training samples.
| 91.76 | 95.61 | 96.15 | |
| – | 95.90 | 96.77 | |
| 92.32 | 95.94 | 96.84 | |
| 1.2188 | 2.76 | 1.67 | |
| – | 1.92 | 2.71 | |
Bold values indicate an improvement in accuracy over the initial network.
Figure 3. Test error as a function of the number of iterations when the number of neurons in the hidden layer was 100 (A) or 500 (B) for 60,000 MNIST training samples.
Comparison of test accuracy, improvement in learning speed, and network compression for 1,200 MNIST training samples.
| 82.55 | 85.05 | 86.53 | |
| – | 87.43 | 88.92 | |
| 82.74 | 87.5 | 87.69 | |
| 1.0357 | 3.83 | 5.47 | |
| – | 1.6 | 1.33 | |
Bold values indicate an improvement in accuracy over the initial network.
Comparison of test accuracy, improvement in learning speed, and network compression for 60,000 Fashion MNIST training samples.
| 83.73 | 86.56 | 87.78 | |
| – | 88.34 | 89.08 | |
| 84.33 | 86.91 | 88.4 | |
| 1.0526 | 1.14 | 2.12 | |
| – | 2.19 | 2.65 | |
Bold values indicate an improvement in accuracy over the initial network.
Comparison of test accuracy, improvement in learning speed, and network compression for 1,200 Fashion MNIST training samples.
| 76.15 | 77.97 | 79.13 | |
| – | 79.87 | 80.8 | |
| 75.5 | 79.25 | 79.75 | |
| 1.4174 | 2.56 | 1.95 | |
| – | 1.5 | 1.67 | |
Bold values indicate an improvement in accuracy over the initial network.
Comparison of test accuracy, improvement in learning speed, and network compression for CIFAR-10 training samples.
| 37.6 | 46.81 | 51.62 | |
| 38.23 | 51.63 | 56.52 | |
| 39.08 | 48.68 | 53.55 | |
| 1.84 | 1.78 | 1.46 | |
| 4.5 | 1.33 | 1.57 | |
Bold values indicate an improvement in accuracy over the initial network.
Figure 4. Histograms showing the distribution of weights for the initial network (A), the dropout network (B), and our method (C).
The BSP algorithm.
| |
| 1: Initialize |
| 2: Calculate |
| 3: |
| 4: Forward computation from Equation (3); |
| 5: Backpropagation computation from Equations (4) and (5); |
| 6: Choosing the candidate pruned synapses |
| 7: |
| 8: Counting the number of consecutive times |
| 9: |
| 10: |
| 11: |
| 12: |
| 13: Pruning the least important synapses |
| 14: |
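
The three named steps of the listing (choosing candidate synapses, counting consecutive selections, pruning) can be illustrated with a minimal sketch. It is not the authors' implementation: the importance scores, the `fraction` of synapses taken as candidates, the `patience` threshold, and all function names are hypothetical stand-ins, since the corresponding definitions are not recoverable from the listing. It only shows the bookkeeping implied by "pruned if inactive or weakly activated several times in a row".

```python
from typing import Dict, List, Set, Tuple

Synapse = Tuple[int, int]  # (pre-synaptic index, post-synaptic index)


def select_candidates(importance: Dict[Synapse, float],
                      alive: Set[Synapse],
                      fraction: float) -> Set[Synapse]:
    """Pick the least important share of the surviving synapses.

    Both the importance score and `fraction` are assumptions made for
    illustration; the paper's own criterion is not reproduced here.
    """
    # The synapse itself is the tie-breaker, so the ranking is deterministic.
    ranked = sorted(alive, key=lambda s: (importance[s], s))
    return set(ranked[:int(len(ranked) * fraction)])


def update_streaks(streaks: Dict[Synapse, int],
                   candidates: Set[Synapse],
                   alive: Set[Synapse]) -> None:
    """Extend the streak of each candidate and reset every other synapse."""
    for s in alive:
        if s in candidates:
            streaks[s] = streaks.get(s, 0) + 1
        else:
            streaks[s] = 0


def prune(alive: Set[Synapse],
          streaks: Dict[Synapse, int],
          patience: int) -> Set[Synapse]:
    """Return synapses that were candidates `patience` times in a row."""
    return {s for s in alive if streaks.get(s, 0) >= patience}


def run_demo() -> Tuple[Set[Synapse], List[Synapse]]:
    """Run three iterations on a toy 2x2 layer with made-up importance scores."""
    alive: Set[Synapse] = {(0, 0), (0, 1), (1, 0), (1, 1)}
    streaks: Dict[Synapse, int] = {}
    pruned_order: List[Synapse] = []
    importance_per_iteration = [
        {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.9, (1, 1): 0.8},
        {(0, 0): 0.1, (0, 1): 0.7, (1, 0): 0.9, (1, 1): 0.3},
        {(0, 0): 0.1, (0, 1): 0.7, (1, 0): 0.9, (1, 1): 0.2},
    ]
    for importance in importance_per_iteration:
        candidates = select_candidates(importance, alive, fraction=0.5)
        update_streaks(streaks, candidates, alive)
        removed = prune(alive, streaks, patience=2)
        pruned_order.extend(sorted(removed))
        alive -= removed
    return alive, pruned_order


if __name__ == "__main__":
    print(run_demo())
```

In this toy run, synapse `(0, 1)` is a candidate in the first iteration but recovers in the second, so its streak resets and it survives. Only synapses that stay among the least important for two consecutive iterations are removed.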