Hyungkeuk Lee, NamKyung Lee, Sungjin Lee.
Abstract
Due to the recent increase in the use of deep learning models on edge devices, industry demand for Deep Learning Model Optimization (DLMO) is also growing. This paper derives usage strategies for DLMO based on performance evaluations of lightweight convolution, quantization, pruning, and knowledge distillation, techniques known to reduce memory size and operation delay with minimal accuracy loss. Through image-classification experiments, we derive feasible and optimal strategies for applying deep learning to Internet of Things (IoT) or tiny embedded devices. In particular, the DLMO strategies best suited to each on-device Artificial Intelligence (AI) service are proposed in terms of performance factors. In this paper, we suggest the most rational algorithmic choices under highly resource-constrained environments by leveraging mature deep learning methodologies.
Keywords: convolutional neural network; image classification; knowledge distillation; lightweight network; network compression; pruning; quantization
Year: 2022 PMID: 36236445 PMCID: PMC9571348 DOI: 10.3390/s22197344
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.847
Summary of the related work.
| Reference | Proposed |
|---|---|
| Lin et al., 2020 | A framework that jointly designs the efficient neural architecture and the lightweight inference engine, enabling ImageNet-scale inference on microcontrollers (MCUNet v1). |
| Lin et al., 2021 | A generic patch-by-patch inference scheduling that operates on only a small spatial region of the feature map and significantly cuts down peak memory (MCUNet v2). |
| Tan et al., 2019 | A scaling method that uniformly scales all dimensions of depth/width/resolution using a simple, effective compound coefficient (EfficientNet). |
| Bello et al., 2021 | Training and scaling strategies: (1) scale model depth; (2) increase image resolution depending on the training regime. |
| David et al., 2021 | A model-architecture framework that enables hardware vendors to provide platform-specific optimizations and is open to a wide machine-learning ecosystem (TensorFlow Lite Micro). |
| Lai et al., 2018 | Efficient kernels developed to maximize performance and minimize the memory footprint of neural-network applications on Arm Cortex-M processors targeted at intelligent IoT edge devices (CMSIS-NN). |
| Gural et al., 2019 | Memory-optimal direct convolutions that push classification accuracy as high as possible under strict hardware memory constraints, at the expense of extra compute. |
| Sakr et al., 2021 | An in-place computation strategy to reduce the memory requirements of neural-network inference. |
| Müksch et al., 2020 | A comparison among several CNN variants (such as ProtoNN, Bonsai, and FastGRNN) for 3-channel image classification on CIFAR10. |
Figure 1. Quantization from FP32 to INT8.
Figure 2. Quantization from FP32 to FP16.
Figure 3. Baseline Quantization.
Figure 4. Full Integer Quantization.
Figure 5. Float 16 Quantization.
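Figures 1 and 2 illustrate the FP32-to-INT8 and FP32-to-FP16 mappings. The INT8 case can be sketched as affine (asymmetric) quantization with a scale and a zero point; the FP16 case is simply a cast and needs no calibration. The following is a minimal sketch, not the paper's implementation, and the function names are illustrative:

```python
def calibrate(rmin, rmax):
    # Derive scale and zero point from the observed float range [rmin, rmax].
    # The range is stretched to include 0 so that 0.0 maps exactly to an int8 value.
    rmin, rmax = min(rmin, 0.0), max(rmax, 0.0)
    scale = (rmax - rmin) / 255.0           # 256 int8 levels -> 255 steps
    zero_point = round(-128 - rmin / scale)
    return scale, zero_point

def quantize_int8(x, scale, zero_point):
    # Affine quantization: q = clamp(round(x / scale) + zero_point, -128, 127)
    q = round(x / scale) + zero_point
    return max(-128, min(127, q))

def dequantize(q, scale, zero_point):
    # Map an int8 value back to its approximate float: x ~= (q - zero_point) * scale
    return (q - zero_point) * scale
```

For a float range of [-1, 1], for example, the scale is 2/255, so each int8 step covers about 0.008 of the float range; the rounding error bounded by one step is the source of the small accuracy changes seen in the quantized rows of the tables below.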
KD Classification according to Knowledge Type.
| Category | Meaning |
|---|---|
| Response | Logit outputs of the teacher–student architecture (TSA) |
| Feature | Intermediate representations of the TSA |
| Relation | Relations between the feature maps |
KD Classification according to Distillation Type.
| Category | Meaning |
|---|---|
| Offline | KD from a pre-trained teacher model |
| Online | Teacher and student models updated simultaneously |
| Self-Distillation | Online method using the same network as teacher and student |
KD Classification according to Teacher–Student Architecture.
| Category | Meaning |
|---|---|
| Same as Teacher | Same architecture as the teacher |
| Reduced Teacher | Reduced architecture derived from the teacher |
| Light Network | Designed with lightweight convolution, quantization, and pruning |
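The response-based category above corresponds to distilling the teacher's softened logits into the student. A minimal sketch of the standard soft-target loss (Hinton-style; the paper does not give its exact loss here, and the function names are illustrative):

```python
import math

def softmax(logits, T=1.0):
    # Temperature-scaled softmax; a higher T produces a softer distribution.
    exps = [math.exp(z / T) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(teacher_logits, student_logits, T=4.0):
    # Response-based KD: KL divergence between the softened teacher and
    # student distributions, scaled by T^2 to keep gradient magnitudes
    # comparable across temperatures.
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return T * T * sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
```

The loss is zero when the student reproduces the teacher's logits exactly and positive otherwise; in training it is typically mixed with the ordinary cross-entropy on the hard labels.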
VGGNet.
Setup abbreviations used in the tables below: NQ = non-quantized FP32 baseline; BLQ = baseline post-training quantization; FIQ = full-integer quantization; F16 = float16 quantization; QAT = quantization-aware training; PRN = pruning; PRQ = pruning followed by quantization.

| Setup | CIFAR10 Acc (%) | CIFAR10 Size (KB) | CIFAR10 Time (ms) | CIFAR100 Acc (%) | CIFAR100 Size (KB) | CIFAR100 Time (ms) |
|---|---|---|---|---|---|---|
| NQ | 71.9 | 56 | 4 | 43.5 | 425 | 18 |
| BLQ | 71.9 | 19 | 20 | 43.7 | 112 | 35 |
| FIQ | 72.5 | 18 | 44 | 43.5 | 110 | 53 |
| F16 | 72.1 | 30 | 6 | 43.8 | 214 | 19 |
| QAT | 72.5 | 18 | 42 | 42.5 | 106 | 45 |
| PRN | 68.7 | 16 | 2 | 39.7 | 123 | 2 |
| PRQ | 68.6 | 7 | 19 | 39.7 | 37 | 19 |
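The PRN and PRQ rows above involve weight pruning. A minimal sketch of the common magnitude-based criterion (an assumption; this record does not state the paper's exact pruning schedule):

```python
def prune_by_magnitude(weights, sparsity):
    # Zero out the smallest-magnitude fraction `sparsity` of the weights;
    # the surviving weights are kept unchanged. Ties at the threshold may
    # zero slightly more than the requested fraction.
    k = int(len(weights) * sparsity)
    if k == 0:
        return list(weights)
    threshold = sorted(abs(w) for w in weights)[k - 1]
    return [0.0 if abs(w) <= threshold else w for w in weights]
```

The zeroed weights compress well once the model is serialized, which is why PRQ (pruning plus quantization) yields the smallest model sizes in the tables while accuracy often stays close to the baseline.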
ResNet50.
| Setup | CIFAR10 Acc (%) | CIFAR10 Size (KB) | CIFAR10 Time (ms) | CIFAR100 Acc (%) | CIFAR100 Size (KB) | CIFAR100 Time (ms) |
|---|---|---|---|---|---|---|
| NQ | 80.7 | 94,052 | 177 | 36.5 | 94,790 | 129 |
| BLQ | 80.3 | 24,161 | 2348 | 36.5 | 24,346 | 2642 |
| FIQ | 80.4 | 24,269 | 2078 | 37.2 | 24,454 | 1997 |
| F16 | 80.6 | 47,072 | 152 | 39.4 | 47,441 | 162 |
| QAT | 80.5 | 24,281 | 2069 | 36.9 | 24,431 | 2001 |
| PRN | 81.2 | 27,503 | 155 | 43.2 | 27,857 | 148 |
| PRQ | 81.2 | 8023 | 2399 | 43.1 | 8059 | 2379 |
ResNet101.
| Setup | CIFAR10 Acc (%) | CIFAR10 Size (KB) | CIFAR10 Time (ms) | CIFAR100 Acc (%) | CIFAR100 Size (KB) | CIFAR100 Time (ms) |
|---|---|---|---|---|---|---|
| NQ | 79.6 | 169,961 | 336 | 41.3 | 170,698 | 365 |
| BLQ | 79.5 | 43,776 | 4576 | 41.5 | 43,960 | 4646 |
| FIQ | 80.8 | 43,990 | 4555 | 37.4 | 44,174 | 4726 |
| F16 | 80.2 | 85,072 | 329 | 41.4 | 85,441 | 352 |
| QAT | 80.1 | 43,799 | 4529 | 37.0 | 44,111 | 4731 |
| PRN | 79.8 | 49,704 | 347 | 38.7 | 49,697 | 337 |
| PRQ | 79.8 | 14,812 | 4603 | 38.6 | 14,670 | 4587 |
MobileNet v1.
| Setup | CIFAR10 Acc (%) | CIFAR10 Size (KB) | CIFAR10 Time (ms) | CIFAR100 Acc (%) | CIFAR100 Size (KB) | CIFAR100 Time (ms) |
|---|---|---|---|---|---|---|
| NQ | 81.8 | 12,840 | 18 | 42.2 | 13,208 | 34.6 |
| BLQ | 81.8 | 3477 | 346 | 42.2 | 3560 | 379.7 |
| FIQ | 82.4 | 3517 | 332 | 43.1 | 3612 | 330 |
| F16 | 80.9 | 6435 | 22 | 45.1 | 6620 | 30.4 |
| PRQ | 81.0 | 1319 | 350 | 46.1 | 1349 | 350 |
| PRN | 81.0 | 3934 | 18 | 45.9 | 4063 | 19 |
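A quick way to read these tables is to summarize each optimized setup against the FP32 baseline as an accuracy delta, a size compression ratio, and a latency ratio. A small sketch using the MobileNet v1 CIFAR10 numbers above (the function name is illustrative):

```python
def compare(baseline, optimized):
    # Each argument is a (accuracy %, model size KB, latency ms) tuple.
    acc_delta = optimized[0] - baseline[0]       # negative = accuracy drop
    size_ratio = baseline[1] / optimized[1]      # >1 = smaller model
    latency_ratio = baseline[2] / optimized[2]   # >1 = faster inference
    return acc_delta, size_ratio, latency_ratio

# MobileNet v1 on CIFAR10 (rows above): NQ baseline vs. pruned+quantized (PRQ)
delta, size_x, speed_x = compare((81.8, 12840, 18), (81.0, 1319, 350))
# roughly -0.8 points of accuracy for a ~9.7x smaller model, at ~19x higher latency
```

This captures the central trade-off in the record: quantized and pruned setups shrink the model dramatically, but integer inference can be slower on targets without dedicated integer acceleration.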
MobileNet v2.
| Setup | CIFAR10 Acc (%) | CIFAR10 Size (KB) | CIFAR10 Time (ms) | CIFAR100 Acc (%) | CIFAR100 Size (KB) | CIFAR100 Time (ms) |
|---|---|---|---|---|---|---|
| NQ | 81.6 | 8924 | 11 | 45.7 | 9386 | 21.7 |
| BLQ | 81.5 | 2663 | 224 | 45.5 | 2782 | 213.6 |
| FIQ | 81.2 | 2730 | 173 | 45.9 | 2848 | 161 |
| F16 | 82.0 | 4509 | 10.6 | 41.5 | 4740 | 21.6 |
| PRQ | 80.2 | 1082 | 213 | 47.2 | 1118 | 221 |
| PRN | 80.5 | 2876 | 8 | 47.1 | 2997 | 7 |
MobileNet v3 small.
| Setup | CIFAR10 Acc (%) | CIFAR10 Size (KB) | CIFAR10 Time (ms) | CIFAR100 Acc (%) | CIFAR100 Size (KB) | CIFAR100 Time (ms) |
|---|---|---|---|---|---|---|
| NQ | 68.8 | 12,161 | 10 | 34.8 | 12,624 | 24 |
| BLQ | 68.7 | 3269 | 73 | 34.9 | 3389 | 97 |
| FIQ | 69.7 | 3301 | 41 | 35.4 | 3419 | 43 |
| F16 | 72.1 | 6128 | 10 | 35.2 | 6359 | 20 |
| PRQ | 69.6 | 1172 | 64 | 36.0 | 1187 | 69 |
| PRN | 69.7 | 3740 | 6 | 36.2 | 3862 | 7 |
MobileNet v3 Large.
| Setup | CIFAR10 Acc (%) | CIFAR10 Size (KB) | CIFAR10 Time (ms) | CIFAR100 Acc (%) | CIFAR100 Size (KB) | CIFAR100 Time (ms) |
|---|---|---|---|---|---|---|
| NQ | 77.9 | 34,939 | 26 | 37.1 | 35,401 | 37 |
| BLQ | 78.0 | 9122 | 195 | 37.2 | 9239 | 218 |
| FIQ | 77.2 | 9178 | 149 | 36.7 | 9295 | 150 |
| F16 | 77.9 | 17,525 | 26 | 38.1 | 17,756 | 28 |
| PRQ | 75.7 | 3062 | 206 | 39.4 | 3122 | 211 |
| PRN | 75.6 | 10,510 | 19 | 39.3 | 10,653 | 29 |
MobileNet v2 and v3 on ImageNet.
| Network | Top-1 Acc (%) | MAdds (M) | Params |
|---|---|---|---|
| V3-Large 1.0 | 75.2 | 219 | 5.4M |
| V2 1.0 | 72.0 | 300 | 3.4M |
| V3-Small 1.0 | 67.4 | 56 | 2.5M |
Wide Residual Networks.
| Setup | CIFAR10 Acc (%) | CIFAR10 Size (KB) | CIFAR10 Time (ms) | CIFAR100 Acc (%) | CIFAR100 Size (KB) | CIFAR100 Time (ms) |
|---|---|---|---|---|---|---|
| NQ | 88.8 | 1899 | 39 | 60.5 | 192 | 50 |
| BLQ | 88.8 | 537 | 1820 | 59.4 | 544 | 1860 |
| FIQ | 88.7 | 538 | 1913 | 59.3 | 544 | 2009 |
| F16 | 86.7 | 982 | 39 | 60.0 | 994 | 51 |
| QAT | 88.6 | 540 | 1916 | 58.9 | 549 | 2002 |
| PRN | 87.6 | 547 | 42 | 60.2 | 572 | 37 |
| PRQ | 87.6 | 195 | 1902 | 60.3 | 200 | 1860 |
Performance Evaluation of KD in the CIFAR10 Dataset.
| Setup | Original Acc (%) | Original Size (KB) | Original Time (ms) | KD Acc (%) | KD Size (KB) | KD Time (ms) |
|---|---|---|---|---|---|---|
| M1 | 81.8 | 12,840 | 18 | 86.3 | 12,840 | 18 |
| M2 | 81.6 | 8924 | 11 | 85.4 | 8924 | 11 |
| M3L | 77.9 | 34,939 | 26 | 60.5 | 34,939 | 26 |
| M3s | 68.8 | 12,161 | 10 | 60.5 | 12,161 | 10 |
| M1-F16 | 80.9 | 6435 | 22 | 86.3 | 192 | 50 |
| M2-F16 | 82.0 | 4509 | 10.6 | 85.4 | 192 | 50 |
| M1-PR | 81.0 | 3934 | 18 | 86.3 | 192 | 50 |
| M2-PR | 80.5 | 2876 | 8 | 85.4 | 192 | 50 |