| Literature DB >> 32148472 |
Xin Long, XiangRong Zeng, Zongcheng Ben, Dianle Zhou, Maojun Zhang.
Abstract
The increase in sophistication of neural network models in recent years has exponentially expanded memory consumption and computational cost, thereby hindering their deployment on ASICs, FPGAs, and other mobile devices. Compressing and accelerating neural networks is therefore necessary. In this study, we introduce a novel strategy for training low-bit networks whose weights and activations are quantized to a few bits, and we address two corresponding fundamental issues. The first is approximating activations through low-bit discretization to reduce computational cost and dot-product memory. The second is specifying a weight quantization and update mechanism for discrete weights to avoid gradient mismatch. With quantized low-bit weights and activations, costly full-precision operations are replaced by shift operations. We evaluate the proposed method on common datasets, and the results show that it can dramatically compress neural networks with only a slight loss of accuracy.
Year: 2020 PMID: 32148472 PMCID: PMC7049432 DOI: 10.1155/2020/7839064
Source DB: PubMed Journal: Comput Intell Neurosci
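The abstract's claim that full-precision multiplications can be replaced by shift operations holds when weights (and activations) take values in {0, ±2^e}. Below is a minimal NumPy sketch of such a power-of-two quantizer; the function `quantize_pow2`, its level set, and its clipping rule are illustrative assumptions, not the paper's exact scheme.

```python
import numpy as np

def quantize_pow2(x, bits=3, max_exp=0):
    """Illustrative power-of-two quantizer (not the paper's exact scheme).

    Each value is mapped to sign(x) * 2**e with the exponent e clipped to a
    range representable in `bits` bits; values too small to represent are
    snapped to zero, so every surviving multiplication becomes a shift.
    """
    n_exp = 2 ** (bits - 1) - 1                    # exponents available for magnitudes
    min_exp = max_exp - n_exp + 1
    sign = np.sign(x)
    mag = np.abs(x)
    e = np.clip(np.round(np.log2(np.maximum(mag, 1e-12))), min_exp, max_exp)
    q = sign * np.exp2(e)
    q[mag < np.exp2(min_exp) / 2] = 0.0            # dead zone around zero
    return q

w = np.random.randn(4, 4)
print(quantize_pow2(w, bits=3))
```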
Figure 1. Convolution operation pipeline. (a) General convolution operation without quantization of weights and activations. (b) The proposed method with weights and activations quantized to low bit-width.
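To illustrate the idea behind Figure 1(b), the sketch below computes a dot product in which each weight is stored as a (sign, exponent) pair, so every product with an integer-quantized activation reduces to an arithmetic shift. The encoding and the name `shift_dot` are assumptions for illustration, not the paper's implementation.

```python
def shift_dot(acts, signs, exps):
    """Dot product where each weight is sign * 2**exp, so no multiplies occur.

    acts  : integer (already-quantized) activations
    signs : entries in {-1, 0, +1}
    exps  : integer exponents; x * w becomes x << exp (exp >= 0)
            or x >> -exp (exp < 0)
    """
    acc = 0
    for x, s, e in zip(acts, signs, exps):
        if s == 0:
            continue                               # zero weight contributes nothing
        term = (x << e) if e >= 0 else (x >> -e)
        acc += term if s > 0 else -term
    return acc

# 3-bit activations and weights {+2, -1, 0, +0.5} encoded as (sign, exponent)
acts, signs, exps = [5, 3, 7, 1], [+1, -1, 0, +1], [1, 0, 2, -1]
print(shift_dot(acts, signs, exps))  # 10 - 3 + 0 + 0 = 7 (1 >> 1 truncates to 0)
```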
Expected error for different q2 values.
| Scheme |  |  |  |  |  |
|---|---|---|---|---|---|
|  | 0.4078 | 0.3298 | 0.2106 | 0.0825 | 0.0458 |
|  | 0.3298 | 0.2103 | 0.0795 | 0.0239 | 0.0443 |
|  | 0.2102 | 0.0791 | 0.0209 | 0.0223 | 0.0443 |
|  | 0.0790 | 0.0205 | 0.0193 | 0.0223 | 0.0443 |
|  | 0.0204 |  | 0.0193 | 0.0223 | 0.0443 |
|  |  |  | 0.0193 | 0.0223 | 0.0443 |
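A rough way to produce an expected-error table like the one above is to estimate E[(w − Q(w))²] by Monte Carlo. The sketch below assumes standard-normal weights and a uniform quantization grid with step q2; both assumptions are illustrative choices and will not reproduce the paper's exact numbers.

```python
import numpy as np

rng = np.random.default_rng(0)

def expected_error(q2, n_levels=3, n_samples=200_000):
    """Monte Carlo estimate of E[(w - Q(w))^2] for w ~ N(0, 1).

    Q rounds to the uniform grid {0, ±q2, ..., ±n_levels*q2}; both the weight
    distribution and the grid are illustrative assumptions.
    """
    w = rng.standard_normal(n_samples)
    q = np.clip(np.round(w / q2), -n_levels, n_levels) * q2
    return float(np.mean((w - q) ** 2))

for q2 in (2.0 ** -k for k in range(1, 6)):
    print(f"q2 = {q2:.5f}  expected error = {expected_error(q2):.4f}")
```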
Figure 2. Binomial choice of undefined states for w_state = ±3.
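One common reading of a "binomial choice" between two neighbouring quantization states is stochastic rounding: a value falling between two defined levels is assigned to the upper level with probability proportional to its proximity. The sketch below shows that rule; the exact mechanism in Figure 2 is not reproduced here, and `stochastic_round` is an illustrative helper.

```python
import numpy as np

rng = np.random.default_rng(0)

def stochastic_round(w, lo, hi):
    """Bernoulli ('binomial with n = 1') choice between two neighbouring levels.

    The probability of picking the upper level `hi` grows linearly with how
    close w already is to it; values exactly on a level are kept as-is.
    """
    p_hi = (np.asarray(w) - lo) / (hi - lo)
    return np.where(rng.random(np.shape(w)) < p_hi, hi, lo)

# Values whose magnitude falls between the defined states 2 and 3
w = np.array([2.2, 2.5, 2.9, -2.4])
print(stochastic_round(np.abs(w), 2.0, 3.0) * np.sign(w))
```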
Figure 3. Comparison of accuracy with different combinations of quantized weights and activations. The horizontal axis shows the activation approximation bits, and the vertical axis shows the quantization bits of the network weights.
Test error (%) comparison on multiple datasets.
| Method | Weight (bit) | Activation (bit) | MNIST | SVHN | CIFAR10 |
|---|---|---|---|---|---|
| BNN | 1 | 1 | 1.27 | 2.53 | 8.46 |
| BWN | 1 | 32 | 0.54 | — | 7.25 |
| TWN | 2 | 32 | 0.65 | — | 7.44 |
| DoReFa | 8 | 8 | — | 2.30 | — |
| Ours | 3 | 3 | 0.96 | 2.14 | 7.48 |
Accuracy (%) comparison on CIFAR100.
| Network | BNN | XNOR | Ours |
|---|---|---|---|
| ResNet-34 | 48.81/78.32 | 53.28/81.29 | 61.33/87.22 |
| ResNet-50 | 52.07/81.60 | 59.20/85.32 | 62.92/88.65 |
Figure 4. Comparison of accuracy with different combinations of quantized weights and activations. The horizontal axis shows the activation approximation bits, and the vertical axis shows the quantization bits of the network weights.
Accuracy comparison with quantization of the first and/or last convolutional layer.
| Configuration (CIFAR10/MNIST accuracy, %) | BWN | BNN | Ours |
|---|---|---|---|
| + First − last | 92.37/99.37 | 91.40/98.66 | 92.08/98.86 |
| + First + last | 92.21/99.41 | 91.30/98.52 | 91.96/98.55 |
| − First + last | 92.52/99.38 | 91.47/98.71 | 92.52/98.75 |
| − First − last | 92.75/99.46 | 91.54/98.73 | 92.12/99.04 |
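Reading "+" in the table above as "quantize this layer", the four configurations differ only in whether the first and last convolutional layers are included in quantization. A small sketch of that selection logic follows; `layers_to_quantize` and the layer names are hypothetical.

```python
def layers_to_quantize(layer_names, quantize_first=True, quantize_last=True):
    """Select which weight tensors to quantize.

    '+ First - last' in the table corresponds to quantize_first=True,
    quantize_last=False, and so on for the other three configurations.
    """
    selected = list(layer_names)
    if not quantize_first:
        selected = selected[1:]      # keep the first conv layer in full precision
    if not quantize_last:
        selected = selected[:-1]     # keep the last conv layer in full precision
    return selected

convs = [f"conv{i}" for i in range(1, 21)]   # e.g. the 20 conv layers of ResNet-18
print(layers_to_quantize(convs, quantize_first=True, quantize_last=False)[-1])  # 'conv19'
```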
Sparsity of ResNet-18 on CIFAR10.
| Layers (weight tensors) | Full precision (1 − sparsity) (%) | Our method (1 − sparsity) (%) |
|---|---|---|
| Conv1 (64, 3, 3, 3) | 100 | 100 |
| Conv2 (64, 64, 3, 3) | 100 | 85.32 |
| Conv3 (64, 64, 3, 3) | 100 | 86.71 |
| Conv4 (64, 64, 3, 3) | 100 | 85.84 |
| Conv5 (64, 64, 3, 3) | 100 | 85.10 |
| Conv6 (128, 64, 3, 3) | 100 | 86.04 |
| Conv7 (128, 128, 3, 3) | 100 | 83.46 |
| Conv8 (128, 64, 1, 1) | 100 | 86.52 |
| Conv9 (128, 128, 3, 3) | 100 | 82.88 |
| Conv10 (128, 128, 3, 3) | 100 | 80.75 |
| Conv11 (256, 128, 3, 3) | 100 | 77.45 |
| Conv12 (256, 256, 3, 3) | 100 | 70.23 |
| Conv13 (256, 128, 1, 1) | 100 | 77.74 |
| Conv14 (256, 256, 3, 3) | 100 | 59.51 |
| Conv15 (256, 256, 3, 3) | 100 | 42.64 |
| Conv16 (512, 256, 3, 3) | 100 | 22.16 |
| Conv17 (512, 512, 3, 3) | 100 | 10.72 |
| Conv18 (512, 256, 1, 1) | 100 | 41.56 |
| Conv19 (512, 512, 3, 3) | 100 | 5.02 |
| Conv20 (512, 512, 3, 3) | 100 | 3.46 |
| Overall (1 − sparsity) | 100 | 23.32 |
| Accuracy (%) | 93.74 | 92.52 |
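The sparsity table reports, per weight tensor, the percentage of weights that remain nonzero after quantization (1 − sparsity). Assuming quantized-to-zero weights are stored as exact zeros, that statistic can be measured as in the sketch below; `nonzero_fraction` is an illustrative helper, not from the paper.

```python
import numpy as np

def nonzero_fraction(weight_tensor):
    """Percentage of nonzero weights, i.e. 1 - sparsity, for one layer."""
    w = np.asarray(weight_tensor)
    return 100.0 * np.count_nonzero(w) / w.size

# Illustrative conv weight of shape (out_ch, in_ch, k, k); the quantizer is
# assumed to have snapped small weights to exactly zero.
w = np.random.randn(64, 64, 3, 3)
w[np.abs(w) < 0.2] = 0.0
print(f"1 - sparsity = {nonzero_fraction(w):.2f}%")
```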