| Literature DB >> 34502792 |
Tao Wu, Jiao Shi, Deyun Zhou, Xiaolong Zheng, Na Li.
Abstract
Deep neural networks have achieved significant development and wide applications for their amazing performance. However, their complex structure, high computation and storage resource limit their applications in mobile or embedding devices such as sensor platforms. Neural network pruning is an efficient way to design a lightweight model from a well-trained complex deep neural network. In this paper, we propose an evolutionary multi-objective one-shot filter pruning method for designing a lightweight convolutional neural network. Firstly, unlike some famous iterative pruning methods, a one-shot pruning framework only needs to perform filter pruning and model fine-tuning once. Moreover, we built a constraint multi-objective filter pruning problem in which two objectives represent the filter pruning ratio and the accuracy of the pruned convolutional neural network, respectively. A non-dominated sorting-based evolutionary multi-objective algorithm was used to solve the filter pruning problem, and it provides a set of Pareto solutions which consists of a series of different trade-off pruned models. Finally, some models are uniformly selected from the set of Pareto solutions to be fine-tuned as the output of our method. The effectiveness of our method was demonstrated in experimental studies on four designed models, LeNet and AlexNet. Our method can prune over 85%, 82%, 75%, 65%, 91% and 68% filters with little accuracy loss on four designed models, LeNet and AlexNet, respectively.Entities:
Keywords: convolutional neural network; evolutionary multi-objective algorithm; filter pruning; lightweight model
Year: 2021 PMID: 34502792 PMCID: PMC8434480 DOI: 10.3390/s21175901
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
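The constrained two-objective formulation summarized in the abstract (minimize the fraction of remaining filters and the classification error, with the error constrained between a lower and an upper bound) can be sketched as follows. This is an illustrative reconstruction, not the authors' code; `error_fn` and the default bounds are assumptions.

```python
# Sketch of the two-objective filter-pruning problem. Each candidate is a
# binary mask over all filters; both objectives are minimized, subject to
# the error staying inside the constraint bounds [e_lo, e_hi].

def objectives(mask, error_fn):
    """Return (remained_filter_ratio, error) for a 0/1 filter mask."""
    remained = sum(mask) / len(mask)
    return remained, error_fn(mask)

def dominates(a, b):
    """Pareto dominance for minimization: a is no worse in all objectives
    and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(candidates, error_fn, e_lo=0.01, e_hi=0.7):
    """Keep feasible candidates (error inside the bounds) that are not
    dominated by any other feasible candidate."""
    scored = [(m, objectives(m, error_fn)) for m in candidates]
    feasible = [(m, f) for m, f in scored if e_lo <= f[1] <= e_hi]
    return [m for m, f in feasible
            if not any(dominates(g, f) for _, g in feasible if g != f)]
```

The returned front is the set of trade-off pruned models from which the method uniformly selects candidates for fine-tuning.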
Figure 1. The framework of the proposed evolutionary multi-objective one-shot filter pruning (EMOFP).
Figure 2. Illustration of filter pruning and evolutionary individual encoding.
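Following the encoding illustrated in Figure 2, each individual can be stored as one binary gene per filter across all convolutional layers. A minimal decoding sketch (the function name and layout are assumptions for illustration):

```python
# Illustrative decoding of a flat binary chromosome into per-layer filter
# masks and kept-filter counts (e.g. Conv2 with layer sizes 32 and 64).

def decode(chromosome, layer_sizes):
    """Split a flat 0/1 chromosome into per-layer masks and kept counts."""
    masks, offset = [], 0
    for size in layer_sizes:
        masks.append(chromosome[offset:offset + size])
        offset += size
    counts = tuple(sum(m) for m in masks)
    return masks, counts
```

For Conv2 (layers of 32 and 64 filters), a 96-bit chromosome decodes to a config such as (20, 27), matching the Config column in the result tables below.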
Detailed neural network variants: configuration, accuracy, and the dataset used.
| Model | Conv1 | Conv2 | Conv3 | Conv4 | LeNet | AlexNet |
|---|---|---|---|---|---|---|
| Dataset | MNIST | MNIST | MNIST | MNIST | MNIST | CIFAR10 |
| Accuracy | 0.9878 | 0.9917 | 0.9929 | 0.9935 | 0.9905 | 0.9004 |
| Layer 1 | 64@(3,3) | 32@(3,3) | 16@(3,3) | 16@(3,3) | 8@(5,5) | 24@(3,3) |
| Layer 2 | (,128) | 64@(3,3) | 32@(3,3) | 32@(3,3) | 16@(5,5) | 64@(5,5) |
| Layer 3 | (128,10) | (,128) | 64@(3,3) | 64@(3,3) | (,120) | 96@(3,3) |
| Layer 4 | | (128,10) | (,128) | 64@(3,3) | (120,84) | 96@(3,3) |
| Layer 5 | | | (128,10) | (,128) | (84,10) | 64@(5,5) |
| Layer 6 | | | | (128,10) | | (,1024) |
| Layer 7 | | | | | | (1024,1024) |
| Layer 8 | | | | | | (1024,10) |
Detailed parameters of the evolutionary multi-objective one-shot filter pruning.
| Parameter | Value |
|---|---|
| Number of individuals in the population | 50 |
| Maximum number of generations | 200 |
| Probability of crossover | 0.9 |
| Probability of mutation | 0.2 |
| Lower bound of error in Equation ( | {0.01, 0.5} |
| Upper bound of error in Equation ( | {0.7, 0.9} |
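Using the parameter values from the table (population 50, 200 generations, crossover probability 0.9, mutation probability 0.2), a generational loop could look like the sketch below. Uniform crossover, bit-flip mutation and a scalar elitist selection are assumed stand-ins for brevity; the paper itself uses non-dominated sorting over the two objectives.

```python
import random

def crossover(p1, p2, pc, rng):
    """Uniform crossover applied with probability pc."""
    if rng.random() >= pc:
        return p1[:], p2[:]
    picks = [rng.random() < 0.5 for _ in p1]
    c1 = [a if k else b for a, b, k in zip(p1, p2, picks)]
    c2 = [b if k else a for a, b, k in zip(p1, p2, picks)]
    return c1, c2

def mutate(ind, pm, rng):
    """Bit-flip mutation: each filter gene flips with probability pm."""
    return [1 - g if rng.random() < pm else g for g in ind]

def evolve(n_filters, fitness, pop_size=50, generations=200,
           pc=0.9, pm=0.2, seed=0):
    """Evolve binary filter masks; keep the best pop_size individuals
    by fitness (lower is better) each generation (elitist selection)."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_filters)]
           for _ in range(pop_size)]
    for _ in range(generations):
        offspring = []
        for _ in range(pop_size // 2):
            p1, p2 = rng.sample(pop, 2)
            c1, c2 = crossover(p1, p2, pc, rng)
            offspring += [mutate(c1, pm, rng), mutate(c2, pm, rng)]
        pop = sorted(pop + offspring, key=fitness)[:pop_size]
    return pop[0]
```

With a fitness that simply counts kept filters, the loop quickly drives the mask toward aggressive pruning, which is the pruning-ratio objective in isolation.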
Pruning results of Conv1 on MNIST. Error p and Error f mean the error of the pruned model before fine-tuning and the fine-tuned model, respectively. CR means the filter compression ratio of the pruned model.
| Model | Method | Config | Remained Filter % | Error p | Error f | Relative Error % | CR | FLOPs |
|---|---|---|---|---|---|---|---|---|
| Conv1 | original | (64) | - | - | 0.0122 | - | - | 3.21 M |
| | | (18) | 28.13 | 0.1846 | 0.0123 | 0.82 | 3.55 | 0.91 M |
| | | (13) | 20.31 | 0.2984 | 0.0121 | −0.82 | 4.92 | 0.66 M |
| | | (9) | 14.06 | 0.4188 | 0.0137 | 12.30 | 7.11 | 0.45 M |
| | | (18) | 28.13 | 0.1823 | 0.0120 | −1.64 | 3.55 | 0.91 M |
| | | (13) | 20.31 | 0.3355 | 0.0122 | 0.00 | 4.92 | 0.66 M |
| | | (9) | 14.06 | 0.4716 | 0.0136 | 11.48 | 7.11 | 0.45 M |
| | EMOFP | (18) | 28.13 | 0.0500 | 0.0118 | −32.79 | 3.55 | 0.91 M |
| | | (13) | 20.31 | 0.0523 | 0.0120 | −1.64 | 4.92 | 0.66 M |
| | | (9) | 14.06 | 0.1560 | 0.0136 | 11.48 | 7.11 | 0.45 M |
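The derived columns in this and the following tables can be reproduced from the raw numbers. The formulas below are inferred from the reported values (e.g. CR = total original filters / total remaining filters), so treat them as an assumption:

```python
def compression_ratio(original_filters, remained_filters):
    """CR = total original filters / total remaining filters."""
    return sum(original_filters) / sum(remained_filters)

def relative_error_pct(error_finetuned, error_original):
    """Relative Error % = (Error_f - Error_original) / Error_original * 100."""
    return (error_finetuned - error_original) / error_original * 100
```

For example, Conv1 pruned to (13) filters gives CR ≈ 64/13 ≈ 4.92, and a fine-tuned error of 0.0137 against the original 0.0122 gives a relative error of about +12.3%, matching the table.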
Pruning results of Conv2 on MNIST.
| Model | Method | Config | Remained Filter % | Error p | Error f | Relative Error % | CR | FLOPs |
|---|---|---|---|---|---|---|---|---|
| Conv2 | original | (32, 64) | - | - | 0.0083 | - | - | 842.82 K |
| | | (20, 27) | 48.96 | 0.0634 | 0.0081 | −0.85 | 2.04 | 351.33 K |
| | | (15, 14) | 30.21 | 0.4339 | 0.0095 | 14.46 | 3.31 | 182.23 K |
| | | (9, 8) | 17.71 | 0.5497 | 0.0097 | 16.87 | 5.65 | 104.38 K |
| | | (16, 31) | 48.96 | 0.1842 | 0.0074 | −10.84 | 2.04 | 400.65 K |
| | | (10, 19) | 30.21 | 0.5242 | 0.0080 | −3.61 | 3.31 | 244.50 K |
| | | (6, 12) | 17.71 | 0.5692 | 0.0099 | 19.27 | 5.65 | 154.50 K |
| | | (20, 27) | 48.96 | 0.0933 | 0.0080 | −3.61 | 2.04 | 351.33 K |
| | | (15, 14) | 30.21 | 0.2454 | 0.0086 | 3.61 | 3.31 | 182.23 K |
| | | (8, 9) | 17.71 | 0.6320 | 0.0091 | 9.64 | 5.65 | 116.90 K |
| | | (16, 31) | 48.96 | 0.1841 | 0.0075 | −9.64 | 2.04 | 400.65 K |
| | | (10, 19) | 30.21 | 0.2291 | 0.0074 | −10.84 | 3.31 | 244.50 K |
| | | (6, 12) | 17.71 | 0.6058 | 0.0101 | 21.69 | 5.65 | 154.50 K |
| | EMOFP | (12, 35) | 48.96 | 0.1001 | 0.0073 | −12.05 | 2.04 | 449.38 K |
| | | (10, 19) | 30.21 | 0.1005 | 0.0077 | −7.23 | 3.31 | 244.50 K |
| | | (8, 9) | 17.71 | 0.2491 | 0.0098 | 18.07 | 5.65 | 116.90 K |
Pruning results of Conv3 on MNIST.
| Model | Method | Config | Remained Filter % | Error p | Error f | Relative Error % | CR | FLOPs |
|---|---|---|---|---|---|---|---|---|
| Conv3 | original | (16, 32, 64) | - | - | 0.0071 | - | - | 196.39 K |
| | | (16, 17, 12) | 40.18 | 0.3622 | 0.0086 | 21.13 | 2.49 | 39.07 K |
| | | (16, 15, 3) | 30.36 | 0.6770 | 0.0168 | 136.62 | 3.29 | 14.90 K |
| | | (15, 11, 1) | 24.11 | 0.8716 | 0.0407 | 473.24 | 4.15 | 8.31 K |
| | | (7, 13, 26) | 41.07 | 0.4395 | 0.0086 | 21.13 | 2.43 | 70.32 K |
| | | (5, 10, 19) | 30.36 | 0.5054 | 0.0105 | 47.89 | 3.29 | 50.75 K |
| | | (4, 8, 16) | 25.00 | 0.7272 | 0.0109 | 53.52 | 4.00 | 42.38 K |
| | | (16, 17, 12) | 40.18 | 0.2142 | 0.0087 | 22.54 | 2.49 | 39.07 K |
| | | (16, 12, 6) | 30.36 | 0.4856 | 0.0137 | 92.96 | 3.29 | 21.43 K |
| | | (15, 10, 2) | 24.11 | 0.7025 | 0.0205 | 188.73 | 4.15 | 10.51 K |
| | | (7, 13, 26) | 41.07 | 0.5695 | 0.0084 | 18.31 | 2.43 | 70.32 K |
| | | (5, 10, 19) | 30.36 | 0.6200 | 0.0111 | 56.34 | 3.29 | 50.75 K |
| | | (4, 8, 16) | 25.00 | 0.6150 | 0.0113 | 59.15 | 4.00 | 42.38 K |
| | EMOFP | (8, 13, 24) | 40.18 | 0.1014 | 0.0080 | 12.68 | 2.49 | 65.75 K |
| | | (7, 10, 17) | 30.36 | 0.1867 | 0.0084 | 18.31 | 3.29 | 46.18 K |
| | | (3, 9, 15) | 24.11 | 0.5989 | 0.0104 | 46.48 | 4.15 | 40.10 K |
Pruning results of Conv4 on MNIST.
| Model | Method | Config | Remained Filter % | Error p | Error f | Relative Error % | CR | FLOPs |
|---|---|---|---|---|---|---|---|---|
| Conv4 | original | (16, 32, 64, 64) | - | - | 0.0065 | - | - | 139.05 K |
| | | (16, 30, 37, 14) | 55.11 | 0.5102 | 0.0079 | 21.54 | 1.81 | 44.38 K |
| | | (16, 30, 27, 7) | 45.45 | 0.7952 | 0.0109 | 67.69 | 2.20 | 31.27 K |
| | | (16, 26, 15, 3) | 34.09 | 0.9152 | 0.0276 | 324.62 | 2.93 | 18.94 K |
| | | (9, 18, 35, 35) | 55.11 | 0.4502 | 0.0080 | 23.08 | 1.81 | 48.00 K |
| | | (7, 15, 29, 29) | 45.45 | 0.5451 | 0.0090 | 38.46 | 2.20 | 34.98 K |
| | | (6, 11, 22, 22) | 34.66 | 0.7669 | 0.0252 | 287.69 | 2.89 | 22.56 K |
| | | (16, 32, 38, 11) | 55.11 | 0.6379 | 0.0099 | 52.31 | 1.81 | 44.30 K |
| | | (16, 29, 30, 5) | 45.45 | 0.6235 | 0.0130 | 100.00 | 2.20 | 30.85 K |
| | | (16, 24, 19, 1) | 34.09 | 0.9066 | 0.2965 | 4461.54 | 2.93 | 18.57 K |
| | | (9, 18, 35, 35) | 55.11 | 0.4294 | 0.0081 | 24.62 | 1.81 | 48.00 K |
| | | (7, 15, 29, 29) | 45.45 | 0.5770 | 0.0079 | 21.54 | 2.20 | 34.98 K |
| | | (6, 11, 22, 22) | 34.66 | 0.7596 | 0.0108 | 66.15 | 2.89 | 22.56 K |
| | EMOFP | (8, 19, 34, 36) | 55.11 | 0.1086 | 0.0082 | 26.15 | 1.81 | 48.32 K |
| | | (9, 13, 28, 30) | 45.45 | 0.1793 | 0.0086 | 32.31 | 2.20 | 34.19 K |
| | | (5, 9, 24, 22) | 34.09 | 0.6199 | 0.0093 | 43.08 | 2.93 | 22.49 K |
Figure 3. Pareto fronts of EMOFP and scatter plots of fine-tuned solutions on the designed models. Subfigures (a–d) show the scatter plots on Conv1, Conv2, Conv3 and Conv4, respectively.
Figure 4. Pareto front of EMOFP and scatter plot of fine-tuned solutions on LeNet.
Pruning results of LeNet on MNIST.
| Model | Method | Config | Remained Filter % | Error p | Error f | Relative Error % | CR | FLOPs |
|---|---|---|---|---|---|---|---|---|
| LeNet | original | (8, 16) | - | - | 0.0095 | - | - | 90.09 K |
| | | (7, 9) | 66.67 | 0.0585 | 0.0091 | −4.21 | 1.50 | 59.91 K |
| | | (7, 2) | 37.50 | 0.4898 | 0.0171 | 80.00 | 2.67 | 30.58 K |
| | | (2, 0) | 8.33 | - | - | - | - | - |
| | | (5, 11) | 66.67 | 0.0464 | 0.0091 | −4.21 | 1.50 | 67.09 K |
| | | (3, 6) | 37.50 | 0.5573 | 0.0141 | 48.42 | 2.67 | 45.94 K |
| | | (1, 1) | 8.33 | 0.8472 | 0.0585 | 515.79 | 12.00 | 25.79 K |
| | | (7, 9) | 66.67 | 0.0325 | 0.0086 | −9.47 | 1.50 | 59.91 K |
| | | (6, 3) | 37.50 | 0.4698 | 0.0200 | 110.53 | 2.67 | 34.57 K |
| | | (2, 0) | 8.33 | - | - | - | - | - |
| | | (5, 11) | 66.67 | 0.0486 | 0.0089 | −6.32 | 1.50 | 67.09 K |
| | | (3, 6) | 37.50 | 0.4066 | 0.0121 | 27.37 | 2.67 | 45.94 K |
| | | (1, 1) | 8.33 | 0.8565 | 0.0722 | 660.00 | 12.00 | 25.79 K |
| | EMOFP | (6, 10) | 66.67 | 0.0200 | 0.0085 | −10.53 | 1.50 | 63.55 K |
| | | (4, 5) | 37.50 | 0.1054 | 0.0106 | 11.58 | 2.67 | 42.25 K |
| | | (1, 1) | 8.33 | 0.6915 | 0.0541 | 469.47 | 12.00 | 25.79 K |
Figure 5. Pareto front of EMOFP and scatter plot of fine-tuned solutions on AlexNet.
Comparison of the pruning results of AlexNet on CIFAR10.
| Model | Method | Config | Remained Filter % | Error | Relative Error % | CR | FLOPs |
|---|---|---|---|---|---|---|---|
| AlexNet | original | (24, 64, 96, 96, 64) | - | 0.0996 | - | - | 11.67 M |
| | l1-global [ ] | (24, 53, 40, 14, 4) | 39.24 | 0.2520 | 153.01 | 2.55 | 2.93 M |
| | l1-layer [ ] | (10, 25, 38, 38, 25) | 39.53 | 0.1860 | 86.75 | 2.53 | 5.58 M |
| | l2-global [ ] | (24, 51, 42, 14, 4) | 39.24 | 0.2232 | 124.10 | 2.55 | 2.92 M |
| | l2-layer [ ] | (10, 25, 38, 38, 25) | 39.53 | 0.1846 | 85.34 | 2.53 | 5.58 M |
| | APoZ [ ] | (10, 25, 38, 38, 25) | 39.53 | 0.1801 | 80.82 | 2.53 | 5.58 M |
| | SFP [ ] | (10, 25, 38, 38, 25) | 39.53 | 0.1735 | 73.90 | 2.53 | 5.58 M |
| | ThiNet [ ] | (10, 25, 38, 38, 25) | 39.53 | 0.1612 | 61.85 | 2.53 | 5.58 M |
| | EMOFP | (9, 20, 39, 43, 24) | 39.24 | 0.1794 | 80.12 | 2.55 | 5.44 M |
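Several of the baselines in the comparison (l1-layer, l2-layer) rank filters by the norm of their weights and keep the top-k per layer, in the spirit of Li et al.'s pruning criterion. A minimal sketch of that ranking (the function name and layout are illustrative):

```python
def norm_rank_prune(layer_weights, keep, p=1):
    """Keep the `keep` filters with the largest p-norm in one layer.
    layer_weights: list of flat weight lists, one per filter.
    Returns the sorted indices of the surviving filters."""
    norms = [sum(abs(w) ** p for w in f) ** (1.0 / p) for f in layer_weights]
    order = sorted(range(len(layer_weights)),
                   key=lambda i: norms[i], reverse=True)
    return sorted(order[:keep])
```

The "global" variants instead rank all filters across layers in a single pool, which is consistent with the much more uneven per-layer configs they produce in the table.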
Pruning results with different fine-tuning strategies on Conv2 and Conv3. Error p means the error of the pruned model before fine-tuning; Error rf and Error sf mean the error of the fine-tuned model with randomly initialized weights and with shared original weights, respectively.
| Model | Solution No. | Remained Filter Ratio | Error p | Error rf | Error sf |
|---|---|---|---|---|---|
| Conv2 | 1 | 0.4896 | 0.1001 | 0.0101 | 0.0073 |
| 2 | 0.3854 | 0.1002 | 0.0095 | 0.0079 | |
| 3 | 0.3125 | 0.1004 | 0.009 | 0.0086 | |
| 4 | 0.3021 | 0.1005 | 0.0103 | 0.0077 | |
| 5 | 0.2917 | 0.1007 | 0.0103 | 0.0081 | |
| 6 | 0.2708 | 0.101 | 0.0101 | 0.0089 | |
| 7 | 0.2396 | 0.1062 | 0.0097 | 0.0088 | |
| 8 | 0.2188 | 0.1408 | 0.0106 | 0.0106 | |
| 9 | 0.2083 | 0.1909 | 0.0106 | 0.0095 | |
| 10 | 0.1771 | 0.2491 | 0.0118 | 0.0098 | |
| Conv3 | 1 | 0.4107 | 0.1001 | 0.0104 | 0.01 |
| 2 | 0.4018 | 0.1014 | 0.0102 | 0.008 | |
| 3 | 0.3929 | 0.1047 | 0.012 | 0.0097 | |
| 4 | 0.3839 | 0.1507 | 0.0108 | 0.0093 | |
| 5 | 0.3125 | 0.1634 | 0.0115 | 0.0086 | |
| 6 | 0.3036 | 0.1867 | 0.0096 | 0.0084 | |
| 7 | 0.2946 | 0.2847 | 0.0108 | 0.0088 | |
| 8 | 0.2857 | 0.3038 | 0.0119 | 0.0095 | |
| 9 | 0.2589 | 0.4949 | 0.0128 | 0.011 | |
| 10 | 0.2411 | 0.5989 | 0.0147 | 0.0104 |
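The two fine-tuning strategies compared above differ only in how the pruned network is initialized before training. A pure-Python sketch of both initializations (the function and strategy names are assumptions, not the paper's API):

```python
import random

def finetune_init(original_weights, mask, strategy="shared", seed=0):
    """Build the pruned layer's initial weights under the two strategies
    compared in the table: 'shared' copies the surviving filters' trained
    weights; 'random' re-initializes the surviving filters from scratch."""
    rng = random.Random(seed)
    survivors = [i for i, keep in enumerate(mask) if keep]
    if strategy == "shared":
        return [original_weights[i][:] for i in survivors]
    return [[rng.gauss(0.0, 0.05) for _ in original_weights[i]]
            for i in survivors]
```

The table shows Error sf generally below Error rf, i.e. inheriting the trained weights of the surviving filters gives fine-tuning a better starting point than random re-initialization.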
Pruning results of EMOFP for cat-and-dog classification. Solution No. 0 gives the configuration and accuracy of the original CNN classifier.
| Solution No. | Configuration of Filters | Accuracy |
|---|---|---|
| 0 | (32, 64, 128, 128) | 0.8150 |
| 1 | (18, 27, 65, 59) | 0.8356 |
| 2 | (18, 26, 54, 54) | 0.8312 |
| 3 | (14, 25, 41, 37) | 0.8003 |
| 4 | (19, 28, 45, 43) | 0.8254 |
| 5 | (18, 32, 70, 63) | 0.8434 |
| 6 | (20, 33, 55, 51) | 0.8454 |
| 7 | (16, 27, 32, 43) | 0.8157 |
| 8 | (13, 26, 35, 40) | 0.8125 |
| 9 | (15, 29, 45, 39) | 0.8293 |