Li-Na Wang, Wenxue Liu, Xiang Liu, Guoqiang Zhong, Partha Pratim Roy, Junyu Dong, Kaizhu Huang.
Abstract
In recent years, deep learning models have achieved remarkable success in applications such as pattern recognition, computer vision, and signal processing. However, high-performance deep architectures often require large storage space and long computation times, which makes it difficult to fully exploit many deep neural networks (DNNs), especially where computing resources are limited. To tackle this problem, we introduce a method for compressing the structure and parameters of DNNs based on neuron agglomerative clustering (NAC). Specifically, we use the agglomerative clustering algorithm to find similar neurons, and these similar neurons, together with the connections linked to them, are then agglomerated. Using NAC, the number of parameters and the storage space of DNNs are greatly reduced without the support of an extra library or hardware. Extensive experiments demonstrate that NAC is very effective for neuron agglomeration in both fully connected and convolutional layers, which are common building blocks of DNNs, while delivering similar or even higher network accuracy. In particular, on the benchmark CIFAR-10 and CIFAR-100 datasets, when NAC compresses the parameters of the original VGGNet by 92.96% and 81.10%, respectively, the resulting compact networks still outperform the original networks.
Keywords: agglomerative clustering; deep learning; feature maps; network compression; neurons
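As a rough illustration of the agglomeration step described in the abstract, the NumPy sketch below merges similar hidden neurons of one fully connected layer. It is not the authors' implementation: the function name `agglomerate_neurons`, the use of incoming weights plus bias as the neuron descriptor, the centroid-based merging, and the rule of averaging incoming weights while summing outgoing weights are all assumptions chosen for this sketch.

```python
import numpy as np


def agglomerate_neurons(W_in, b, W_out, n_clusters):
    """Greedily merge the closest hidden neurons until n_clusters remain.

    W_in  : (hidden, inputs)  incoming weights, one row per neuron
    b     : (hidden,)         biases
    W_out : (outputs, hidden) outgoing weights, one column per neuron
    """
    # Assumption: a neuron is described by its incoming weights and bias.
    feats = np.hstack([W_in, b[:, None]])
    clusters = [[i] for i in range(feats.shape[0])]

    while len(clusters) > n_clusters:
        # Centroid of every current cluster, then all pairwise distances.
        cents = np.array([feats[c].mean(axis=0) for c in clusters])
        dist = np.linalg.norm(cents[:, None, :] - cents[None, :, :], axis=-1)
        np.fill_diagonal(dist, np.inf)
        i, j = sorted(int(k) for k in np.unravel_index(np.argmin(dist), dist.shape))
        merged = clusters.pop(j)      # j > i, so index i stays valid
        clusters[i].extend(merged)    # merge the closest pair

    # Assumed merge rule: mean incoming weights/bias, summed outgoing weights.
    W_in_new = np.array([W_in[c].mean(axis=0) for c in clusters])
    b_new = np.array([b[c].mean() for c in clusters])
    W_out_new = np.stack([W_out[:, c].sum(axis=1) for c in clusters], axis=1)
    return W_in_new, b_new, W_out_new


def forward(W_in, b, W_out, x):
    """One hidden ReLU layer followed by a linear output layer."""
    return W_out @ np.maximum(W_in @ x + b, 0.0)


# Demo: 3 distinct neurons, each duplicated once, so 6 neurons in total.
rng = np.random.default_rng(0)
W_in = np.repeat(rng.normal(size=(3, 4)), 2, axis=0)
b = np.repeat(rng.normal(size=3), 2)
W_out = rng.normal(size=(2, 6))
x = rng.normal(size=4)

W_in_c, b_c, W_out_c = agglomerate_neurons(W_in, b, W_out, n_clusters=3)
print(W_in.shape, "->", W_in_c.shape)
print(np.allclose(forward(W_in, b, W_out, x), forward(W_in_c, b_c, W_out_c, x)))
```

Under this merge rule, agglomerating neurons that are exact duplicates leaves the layer output unchanged up to floating-point rounding, which is what the demo checks. For neurons that are only similar, the output shifts slightly, which is consistent with the fine-tuning step reported in the tables.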
Year: 2020; PMID: 33114078; PMCID: PMC7660330; DOI: 10.3390/s20216033
Source DB: PubMed; Journal: Sensors (Basel); ISSN: 1424-8220; Impact factor: 3.576
Figure 1. Agglomerating neurons in the fully connected layers.
Figure 2. Agglomerating neurons in the convolutional layers.
Figure 3. The process of network compression based on neuron agglomerative clustering.
Results of the original and compressed deep belief network (DBN) on the MNIST dataset.
| | Original Network | Compressed Network |
|---|---|---|
| 1st layer | 500 | 300 |
| 2nd layer | 500 | 300 |
| 3rd layer | 2000 | 1000 |
| Parameters | 1.67 M | 0.64 M |
| Error rate (%) | 1.03 | 1.17 |
| Error rate with fine-tuning (%) | - | 0.98 |
Figure 4. Visualization of neurons in the first hidden layer of the original DBN.
Architecture and performance of the compressed network with a high compression ratio.
| | Original Network | Compressed Network |
|---|---|---|
| 1st layer | 500 | 200 |
| 2nd layer | 500 | 100 |
| 3rd layer | 2000 | 100 |
| Parameters | 1.67 M | 0.19 M |
| Error rate (%) | 1.03 | 15.8 |
| Error rate with fine-tuning (%) | - | 1.01 |
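The parameter counts reported for the DBN can be reproduced with simple arithmetic. The sketch below assumes 784 inputs (28×28 MNIST images), a 10-way output layer, and one bias per unit; these sizes are not stated in the tables, and the helper name `dense_params` is hypothetical.

```python
def dense_params(layer_sizes):
    """Count weights and biases of a stack of fully connected layers."""
    return sum(
        n_in * n_out + n_out  # weight matrix plus one bias per output unit
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    )


# Assumption: 784 input pixels and 10 output classes around the hidden layers.
architectures = {
    "original": [784, 500, 500, 2000, 10],
    "compressed": [784, 300, 300, 1000, 10],
    "high compression": [784, 200, 100, 100, 10],
}

for name, sizes in architectures.items():
    n = dense_params(sizes)
    print(f"{name}: {n} parameters ({n / 1e6:.2f} M)")
```

Under these assumptions the counts round to 1.67 M, 0.64 M, and 0.19 M, matching the reported values.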
The classification results of the original and compressed convolutional neural networks tested on the Modified National Institute of Standards and Technology (MNIST) database.
| | Original Network | Compressed Network |
|---|---|---|
| conv_1 | 6 | 6 |
| conv_2 | 6 | 5 |
| conv_3 | 16 | 12 |
| conv_4 | 16 | 12 |
| conv_5 | 120 | 80 |
| fc_1 | 120 | 60 |
| Parameters | 0.15 M | 0.05 M |
| Error rate (%) | 0.69 | 5.14 |
| Error rate with fine-tuning (%) | - | 0.63 |
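When filters in one convolutional layer are agglomerated, the input-channel dimension of the following layer shrinks as well. The NumPy sketch below shows only this shape bookkeeping for the change from 6 to 5 filters in conv_2. The 3×3 kernel size, the assumption that conv_3 reads conv_2's outputs directly, the grouping in `groups`, and the mean/sum merge rule are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def merge_conv_filters(K, K_next, groups):
    """Merge filters of one conv layer according to `groups`.

    K      : (out_ch, in_ch, kh, kw)     kernels of the layer being compressed
    K_next : (next_out, out_ch, kh, kw)  kernels of the following layer
    groups : list of lists of filter indices; each inner list becomes one filter
    """
    # Assumption: a merged filter is the mean of the filters in its group.
    K_new = np.stack([K[g].mean(axis=0) for g in groups])
    # The next layer sums the kernel slices that read the merged feature maps.
    K_next_new = np.stack([K_next[:, g].sum(axis=1) for g in groups], axis=1)
    return K_new, K_next_new


rng = np.random.default_rng(0)
conv_2 = rng.normal(size=(6, 6, 3, 3))    # 6 filters over 6 input channels
conv_3 = rng.normal(size=(16, 6, 3, 3))   # 16 filters over conv_2's 6 outputs

# Hypothetical clustering result: filters 2 and 3 were found to be similar.
groups = [[0], [1], [2, 3], [4], [5]]
conv_2_c, conv_3_c = merge_conv_filters(conv_2, conv_3, groups)
print(conv_2.shape, "->", conv_2_c.shape)
print(conv_3.shape, "->", conv_3_c.shape)
```

Only conv_2 is merged here; compressing conv_3 from 16 to 12 filters would repeat the same step one layer later.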
The results obtained on the CIFAR-10 and CIFAR-100 datasets. P-Pruned refers to the pruned ratio of parameters, and F-Pruned refers to the pruned ratio of floating-point operations (FLOPs). The best results are highlighted in boldface.
| Datasets | Model | Test Error (%) | Parameters | P-Pruned | FLOPs | F-Pruned | Error Reduction (%) |
|---|---|---|---|---|---|---|---|
| CIFAR-10 | VGGNet (Baseline) | 6.38 | 33.65 M | - | 6.65 | - | - |
| CIFAR-10 | VGGNet (Model-A) | 6.19 | 2.37 M | 92.96% | 3.72 | 44.06% | +0.19 |
| CIFAR-10 | VGGNet (Model-B) | | 2.37 M | | 3.72 | 44.06% | |
| CIFAR-100 | VGGNet (Baseline) | 26.38 | 34.02 M | - | 6.65 | - | - |
| CIFAR-100 | VGGNet (Model-A) | 26.30 | 6.43 M | 81.10% | 4.93 | 25.86% | +0.08 |
| CIFAR-100 | VGGNet (Model-B) | | 6.43 M | | 4.93 | 25.86% | |
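The pruned ratios in this table follow from the baseline and compressed values in the same rows. The short check below recomputes them; the helper name `pruned_ratio` is hypothetical, and the FLOPs figures are used in whatever unit the table reports, since only their ratio matters.

```python
def pruned_ratio(original, compressed):
    """Percentage of the original quantity removed by compression."""
    return 100.0 * (1.0 - compressed / original)


# (baseline, Model-A) values copied from the table above.
rows = {
    "CIFAR-10": {"params": (33.65, 2.37), "flops": (6.65, 3.72)},
    "CIFAR-100": {"params": (34.02, 6.43), "flops": (6.65, 4.93)},
}

for name, r in rows.items():
    p = pruned_ratio(*r["params"])
    f = pruned_ratio(*r["flops"])
    print(f"{name}: P-Pruned = {p:.2f}%, F-Pruned = {f:.2f}%")
```

The recomputed values agree with the reported 92.96% and 44.06% for CIFAR-10 and 81.10% and 25.86% for CIFAR-100.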
Comparison results between neuron agglomerative clustering (NAC) and network compression by randomly merging neurons and using k-means clustering on the CIFAR-10 and CIFAR-100 datasets.
| Datasets | Method | Test Error (%) |
|---|---|---|
| CIFAR-10 | Randomly merging neurons | 72.38 |
| CIFAR-10 | Using k-means clustering | 6.35 |
| CIFAR-10 | Using agglomerative clustering | |
| CIFAR-100 | Randomly merging neurons | 87.53 |
| CIFAR-100 | Using k-means clustering | 29.62 |
| CIFAR-100 | Using agglomerative clustering | |
Comparison between NAC and three related approaches on the CIFAR-10 and CIFAR-100 datasets.
| Datasets | Method | Parameters Pruned | |
|---|---|---|---|
| CIFAR-10 | Network Slimming | 88.5% | +0.14 |
| CIFAR-10 | Pruning Filters | 88.5% | -0.54 |
| CIFAR-10 | Global Sparse Momentum | 88.5% | +0.20 |
| CIFAR-10 | Our Method | 88.6% | |
| CIFAR-100 | Network Slimming | 76.0% | +0.22 |
| CIFAR-100 | Pruning Filters | 75.1% | -1.62 |
| CIFAR-100 | Global Sparse Momentum | 76.5% | +0.08 |
| CIFAR-100 | Our Method | 76.6% | |