Bin Lin, Houcheng Su, Danyang Li, Ao Feng, Hongxiang Li, Jiao Li, Kailin Jiang, Hongbo Jiang, Xinyao Gong, Tao Liu.
Abstract
Due to limited memory and computing resources, deploying convolutional neural networks on embedded and mobile devices is challenging. Moreover, the redundant use of the 1 × 1 convolution in traditional lightweight networks, such as MobileNetV1, increases computing time. By using the 1 × 1 convolution, which plays a vital role in extracting local features, more effectively, we introduce a new lightweight network named PlaneNet. PlaneNet improves accuracy while reducing the number of parameters and multiply-accumulate operations (Madds). Our model is evaluated on classification and semantic segmentation tasks. For classification, the CIFAR-10, Caltech-101, and ImageNet2012 datasets are used; for semantic segmentation, PlaneNet is tested on the VOC2012 dataset. The experimental results demonstrate that PlaneNet (74.48%) obtains higher accuracy than MobileNetV3-Large (73.99%) and GhostNet (72.87%) and achieves state-of-the-art performance with fewer network parameters in both tasks. In addition, compared with existing models, it reaches a practical application level on mobile devices. The code of PlaneNet is available on GitHub: https://github.com/LinB203/planenet. © 2021 Lin et al.
Keywords: Feature extraction; Local feature fusion; Redundancy reduction; Strong operability; Efficiency
Year: 2021 PMID: 34977350 PMCID: PMC8670390 DOI: 10.7717/peerj-cs.783
Source DB: PubMed Journal: PeerJ Comput Sci ISSN: 2376-5992
Figure 1MobileNetV2 bottleneck vs. Plane bottleneck.
Figure 2Structure diagram of plane bottlenecks.
The dotted line represents a shortcut path. The shortcut path is adopted for stride = 1. For stride = 2, the shortcut path is removed.
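The stride rule above can be sketched in code. A minimal illustration in pure Python, where `transform` stands in for the bottleneck's convolutional path (the function name and list-based tensors are this sketch's assumptions):

```python
def plane_bottleneck(x, transform, stride):
    """Run the bottleneck body; add the identity shortcut only when stride == 1."""
    y = transform(x)
    if stride == 1:
        # stride = 1: the output shape matches the input, so the shortcut path is added
        return [xi + yi for xi, yi in zip(x, y)]
    # stride = 2: the spatial size changes, so the shortcut path is removed
    return y
```

The same shape-compatibility rule governs residual connections in MobileNetV2 and related architectures.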
Figure 3Characteristic diagram of different expansion coefficients (A–B).
Computational cost of a Plane bottleneck.

| Layers | Input_shape | Output_shape | Parameters | Madds |
|---|---|---|---|---|
| DwConv k1 × k1 | | | | |
| Conv 1 × 1 | | | | |
| DwConv k2 × k2 | | | | |
Computational cost of a MobileNetV2 bottleneck, with input h × w × c, expansion coefficient ɛ, stride s, and c′ output channels.

| Layers | Input_shape | Output_shape | Parameters | Madds |
|---|---|---|---|---|
| Conv 1 × 1 | h × w × c | h × w × (ɛc) | ɛc² | hwɛc² |
| DwConv k × k | h × w × (ɛc) | (h/s) × (w/s) × (ɛc) | k²ɛc | (hw/s²)k²ɛc |
| Conv 1 × 1 | (h/s) × (w/s) × (ɛc) | (h/s) × (w/s) × c′ | ɛcc′ | (hw/s²)ɛcc′ |
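The per-layer accounting for a MobileNetV2-style inverted-residual bottleneck can be tallied with a small helper. A minimal sketch of the standard formulas (bias and batch-norm terms omitted; the function name and default arguments are this sketch's assumptions):

```python
def bottleneck_cost(h, w, c_in, c_out, k=3, s=1, e=6):
    """Parameters and Madds of: 1x1 expansion -> depthwise kxk (stride s) -> 1x1 projection."""
    exp = e * c_in                                            # expanded channel count
    p1, m1 = c_in * exp, h * w * c_in * exp                   # 1x1 expansion conv
    p2, m2 = k * k * exp, (h // s) * (w // s) * k * k * exp   # depthwise kxk conv
    p3, m3 = exp * c_out, (h // s) * (w // s) * exp * c_out   # 1x1 projection conv
    return p1 + p2 + p3, m1 + m2 + m3
```

For example, `bottleneck_cost(14, 14, 16, 16)` counts 3,936 parameters, dominated by the two 1 × 1 convolutions — the redundancy PlaneNet targets.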
Comparison of parameters and Madds.
Comparison of the total parameters and Madds of MobileNetV2 bottlenecks with those of Plane bottlenecks.

| | Parameters | | Madds | |
|---|---|---|---|---|
| Model | Ours | MobileNetV2 | Ours | MobileNetV2 |
| Total | | | | |
| Ratio | | | | |
The overall architecture of PlaneNet.
In this case, P-bneck represents a Plane bottleneck. Out indicates the number of output channels. Stride represents the step size of the first separable convolutional layer in the Plane bottleneck. Classes represents the number of classes in the dataset.
| Input | Operator | ɛ | Out | Stride |
|---|---|---|---|---|
| 224² × 3 | P-bneck | 6 | 16 | 2 |
| 112² × 16 | P-bneck | 6 | 16 | 1 |
| 112² × 16 | P-bneck | 5 | 24 | 2 |
| 56² × 24 | P-bneck | 5 | 24 | 1 |
| 56² × 24 | P-bneck | 5 | 40 | 2 |
| 28² × 40 | P-bneck | 5 | 40 | 1 |
| 28² × 40 | P-bneck | 5 | 40 | 1 |
| 28² × 40 | P-bneck | 3 | 48 | 2 |
| 14² × 48 | P-bneck | 3 | 48 | 1 |
| 14² × 48 | P-bneck | 3 | 96 | 1 |
| 14² × 96 | P-bneck | 3 | 96 | 1 |
| 14² × 96 | P-bneck | 3 | 160 | 2 |
| 7² × 160 | P-bneck | 3 | 160 | 1 |
| 7² × 160 | Conv2d 1 × 1 | – | 1280 | 1 |
| 7² × 1280 | AvgPool 7 × 7 | – | – | – |
| 1² × 1280 | FC | – | Classes | – |
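As a sanity check on the stride column, the five stride-2 stages reduce the 224 × 224 input to the 7 × 7 map fed to the final layers. A small sketch (the function name is this sketch's assumption):

```python
def final_spatial_size(size, strides):
    """Apply each stage's stride to the spatial resolution in turn."""
    for s in strides:
        size //= s
    return size

# Strides of the thirteen P-bneck stages listed above:
# 224 -> 112 -> 56 -> 28 -> 14 -> 7
strides = [2, 1, 2, 1, 2, 1, 1, 2, 1, 1, 1, 2, 1]
```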
The results of combining different convolution kernels k1 and k2 for the two separable convolutional layers of the Plane bottleneck, tested on CIFAR-10 with the remaining parameters s = 2, α = 1, and ɛ = 6. Acc. represents accuracy.

| k1/k2 | Params (M) | Madds (M) | Acc. (%) |
|---|---|---|---|
| VGG16-BN | 15.0 | 313.5 | 92.42 |
| 1/1 | 9.8 | 129.3 | 22.29 |
| 1/3 | 9.9 | 130.8 | 89.10 |
| 1/5 | 9.9 | 132.7 | 90.17 |
| 3/1 | 10.0 | 133.7 | 90.51 |
| 3/3 | 10.1 | 135.2 | |
| 3/5 | 10.1 | 137.0 | 91.54 |
| 5/1 | 10.4 | 142.6 | 91.31 |
| 5/3 | 10.4 | 143.6 | |
| 5/5 | 10.5 | 145.5 | 91.40 |
The results of the experiment on different stride values s of the first separable convolution layer of the Plane bottleneck, conducted on CIFAR-10 with the remaining parameters k1 = k2 = 3, α = 1, and ɛ = 6. Acc. represents accuracy.
| s | Params (M) | Madds (M) | Acc. (%) |
|---|---|---|---|
| VGG16-BN | 15.0 | 313.5 | 92.42 |
| 2 | 10.1 | 135.2 | |
| 3 | 10.1 | 41.1 | 89.56 |
| 4 | 10.1 | 19.3 | 86.19 |
| 5 | 10.1 | 18.1 | 84.73 |
| 6 | 10.1 | 14.9 | 83.25 |
The results of the experiment on combinations of the Plane bottleneck width coefficient α and expansion coefficient ɛ on CIFAR-10, with the remaining parameters k1 = k2 = 3 and s = 2.

| α/ɛ | Params (M) | Madds (M) | Acc. (%) |
|---|---|---|---|
| VGG16-BN | 15.01 | 313.47 | 92.42 |
| 0.5/1 | 0.44 | 6.66 | 85.73 |
| 0.5/2 | 0.87 | 12.49 | 88.39 |
| 0.5/3 | 1.30 | 18.32 | 90.02 |
| 0.5/4 | 1.72 | 24.15 | 89.98 |
| 0.5/5 | 2.15 | 29.99 | 90.18 |
| 0.5/6 | 2.57 | 35.82 | |
| 1.0/1 | 1.71 | 23.90 | 88.54 |
| 1.0/2 | 3.37 | 46.16 | 90.64 |
| 1.0/3 | 5.04 | 68.41 | 91.88 |
| 1.0/4 | 6.71 | 90.67 | 91.90 |
| 1.0/5 | 8.38 | 112.92 | 92.18 |
| 1.0/6 | 10.05 | 135.18 | |
| 1.5/1 | 3.78 | 51.76 | 89.23 |
| 1.5/2 | 7.51 | 101.76 | 91.75 |
| 1.5/3 | 11.24 | 150.35 | 91.72 |
| 1.5/4 | 14.97 | 199.64 | |
| 1.5/5 | 18.70 | 248.94 | |
| 1.5/6 | 22.42 | 298.23 | |
Figure 4Feature extraction comparison.
Conv stage i represents the Grad-CAM of the convolution layer before convolution layer i + 1 (stride=2). When i = 5, it is the Grad-CAM mapping of the convolution layer before the classification layer.
Figure 5. The performance of the model with and without data augmentation, tested at input sizes 192, 224, and 256.
The first row shows the experiments without data augmentation; the second row shows the experiments with data augmentation.
Performance of various mainstream models in classification.
*Acc. represents the accuracy of the model on the ImageNetTE dataset. #Acc. represents the accuracy of the model on the ImageNetWoof dataset.

| Model | *Acc. (%) | #Acc. (%) | Params (M) | Madds (M) |
|---|---|---|---|---|
| PlaneNet 0.4× | | 77.55 | | |
| MobileNetV1 0.25× | 82.83 | 72.97 | 0.5 | 41 |
| MobileNetV3-small 0.75× | 85.86 | 78.21 | 2.4 | 44 |
| MobileNetV2 0.35× | 84.94 | 73.41 | 1.7 | 59 |
| MobileNetV3-small 1.0× | 86.09 | | 2.9 | 66 |
| PlaneNet 1.0× | | | | |
| MobileNetV2 0.6× | 86.52 | 79.00 | 2.2 | 141 |
| GhostNet 1.0× | 88.84 | 82.26 | 5.2 | 141 |
| ShuffleNetV2 1.0× | 87.52 | 80.84 | 2.3 | 146 |
| MobileNetV1 0.5× | 85.30 | 73.05 | 1.3 | 149 |
| PlaneNet 1.4× | | | | |
| MobileNetV3-large 1.0× | 88.56 | 81.58 | 5.4 | 219 |
| MobileNetV2 1.0× | 88.23 | 80.40 | 3.4 | 300 |
| MnasNet 1.0× | 89.45 | 82.16 | 4.2 | 317 |
| MobileNetV1 0.75× | 85.65 | 76.53 | 2.6 | 325 |
Performance of various mainstream models in semantic segmentation.
Acc. represents the accuracy of the model on the ILSVRC2012 dataset.

| Model | Acc. (%) | Params (M) | Madds (M) |
|---|---|---|---|
| PlaneNet 0.4× | 67.23 | | |
| MobileNetV1 0.25× | 50.91 | 0.5 | 41 |
| MobileNetV3-small 0.75× | 64.92 | 2.4 | 44 |
| MobileNetV2 0.35× | 61.07 | 1.7 | 59 |
| MobileNetV3-small 1.0× | | 2.9 | 66 |
| PlaneNet 1.0× | | | |
| MobileNetV2 0.6× | 65.31 | 2.2 | 141 |
| GhostNet 1.0× | 72.97 | 5.2 | 141 |
| ShuffleNetV2 1.0× | 68.94 | 2.3 | 146 |
| MobileNetV1 0.5× | 64.21 | 1.3 | 149 |
| PlaneNet 1.4× | | | |
| MobileNetV3-large 1.0× | 74.84 | 5.4 | 219 |
| MobileNetV2 1.0× | 72.12 | 3.4 | 300 |
| MnasNet 1.0× | 75.05 | 4.2 | 317 |
| MobileNetV1 0.75× | 66.93 | 2.6 | 325 |
FPS represents frames per second, computed as the reciprocal of the average per-image inference time over 100,000 iterations. RTX1060, RTX2060s, RTX5000, and TitanV give the FPS of the model tested on an RTX 1060, RTX 2060 Super, RTX 5000, and Titan V, respectively. AMD represents the R5-3600 CPU test on the AMD platform. Intel represents the i7-8750H CPU test on the Intel platform.
| Model | RTX1060 | RTX2060s | RTX5000 | TitanV | AMD | Intel |
|---|---|---|---|---|---|---|
| PlaneNet 0.4× | | | | | | |
| MobileNetV1 0.25× | 217 | 198 | 253 | 231 | 75 | 59 |
| MobileNetV3-small 0.75× | 146 | 107 | 157 | 154 | 70 | 57 |
| MobileNetV2 0.35× | 128 | 98 | 158 | 145 | 48 | 36 |
| MobileNetV3-small 1.0× | 140 | 108 | 159 | 146 | 63 | 50 |
| PlaneNet 1.0× | | | | | 41 | 37 |
| MobileNetV2 0.6× | 111 | 95 | 157 | 147 | 26 | 20 |
| GhostNet 1.0× | 85 | 84 | 118 | 118 | 36 | 27 |
| ShuffleNetV2 1.0× | 134 | 95 | 145 | 132 | | |
| MobileNetV1 0.5× | 201 | 177 | 221 | 211 | 55 | 40 |
| PlaneNet 1.4× | | | | | | |
| MobileNetV3-large 1.0× | 86 | 86 | 135 | 117 | 31 | 23 |
| MobileNetV2 1.0× | 101 | 99 | 153 | 145 | 21 | 16 |
| MnasNet 1.0× | 98 | 101 | 152 | 140 | 28 | 21 |
| MobileNetV1 0.75× | 178 | 156 | 199 | 192 | 35 | 27 |
FPS represents frames per second, computed as the reciprocal of the average per-image inference time over 100,000 iterations.

| Model | Inference time (s) | FPS* | FPS# |
|---|---|---|---|
| PlaneNet 0.4× | | | |
| MobileNetV3-small 0.75× | 0.033 | 30.73 | 8.67 |
| MobileNetV2 0.35× | 0.035 | 28.53 | 6.98 |
| MobileNetV3-small 1.0× | 0.036 | 28.02 | 8.64 |
| PlaneNet 1.0× | 0.041 | 24.37 | 5.74 |
| MobileNetV2 0.6× | 0.047 | 21.24 | 4.68 |
| GhostNet 1.0× | 0.052 | 19.22 | 4.00 |
| ShuffleNetV2 1.0× | | | |
| PlaneNet 1.4× | | | |
| MobileNetV3-large 1.0× | 0.067 | 14.82 | 3.16 |
| MobileNetV2 1.0× | 0.067 | 14.93 | 4.28 |
Notes.
* FPS tested on a Raspberry Pi 2B.
# FPS tested on a Raspberry Pi 4B.
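FPS as defined above (the reciprocal of mean per-image latency) can be measured with a small timing loop. A minimal sketch, where `predict` stands in for any single-image inference call (an assumption of this sketch):

```python
import time

def measure_fps(predict, image, iters=1000):
    """Return iterations / elapsed seconds, i.e. the reciprocal of mean latency."""
    start = time.perf_counter()
    for _ in range(iters):
        predict(image)
    elapsed = time.perf_counter() - start
    return iters / elapsed
```

Using a monotonic clock such as `time.perf_counter` avoids distortions from system clock adjustments during long runs.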
The results using PASCAL VOC 2012 for the semantic segmentation task.
Model mIOU with numbers of parameters and Madds.
| Backbone | mIOU (%) | Params (M) | Madds (B) |
|---|---|---|---|
| ResNet50 | | 46.66 | 50.85 |
| MobileNetV1 1.0× | 69.76 | 8.96 | 7.93 |
| MnasNet 1.0× | 72.97 | 3.22 | 3.20 |
| MobileNetV2 1.0× | 71.04 | 2.35 | 2.50 |
| MobileNetV3-Large 1.0× | 73.99 | 2.94 | 1.57 |
| GhostNet 1.0× | 72.87 | 2.65 | 1.17 |
| MobileNetV3-Small 1.0× | 69.41 | 0.91 | |
| PlaneNet 1.0× | 74.48 | | 0.48 |
Inference speed using PASCAL VOC 2012 for the semantic segmentation task.
FPS represents frames per second, computed as the reciprocal of the average per-image inference time over 100,000 iterations. RTX1060, RTX2060s, RTX5000, and TitanV give the FPS of the model tested on an RTX 1060, RTX 2060 Super, RTX 5000, and Titan V, respectively. AMD represents the R5-3600 CPU test on the AMD platform. Intel represents the i7-8750H CPU test on the Intel platform.
| Backbone | RTX1060 | RTX2060s | RTX5000 | TitanV | AMD | Intel |
|---|---|---|---|---|---|---|
| ResNet50 | 11 | 19 | 28 | 39 | 6 | 4 |
| MobileNetV1 1.0× | 23 | 38 | 37 | 46 | 20 | 16 |
| MnasNet 1.0× | 21 | 33 | 35 | 44 | 20 | 15 |
| MobileNetV2 1.0× | 23 | 33 | 35 | 44 | 19 | 14 |
| MobileNetV3-Large 1.0× | 24 | 35 | 36 | 45 | 25 | 19 |
| GhostNet 1.0× | 25 | 31 | 37 | 44 | 27 | 21 |
| MobileNetV3-Small 1.0× | 29 | 39 | | 46 | | |
| PlaneNet 1.0× | | | 38 | | 33 | 27 |
Comparing the numbers of parameters and Madds of MobileNetV2 and PlaneNet.
The corresponding size of * is 224 × 224, the corresponding size of ** is 192 × 192 and the corresponding size of *** is 256 × 256.
| | Parameters (M)* | | Madds (M)* | | Madds (M)** | | Madds (M)*** | |
|---|---|---|---|---|---|---|---|---|
| | Mobilev2 | Ours | Mobilev2 | Ours | Mobilev2 | Ours | Mobilev2 | Ours |
| 0.5/1 | 0.30 | 0.27 | 34.91 | 27.52 | 25.65 | 20.22 | 45.59 | 35.95 |
| 0.5/2 | 0.38 | 0.32 | 47.74 | 33.60 | 35.07 | 24.69 | 62.35 | 43.89 |
| 0.5/3 | 0.46 | 0.37 | 60.57 | 39.68 | 44.50 | 29.15 | 79.11 | 51.83 |
| 0.5/4 | 0.54 | 0.42 | 73.40 | 45.76 | 53.93 | 33.62 | 95.87 | 59.77 |
| 0.5/5 | 0.62 | 0.46 | 86.23 | 51.84 | 63.35 | 38.08 | 112.63 | 67.71 |
| 0.5/6 | 0.71 | 0.51 | 99.06 | 57.92 | 72.78 | 42.55 | 129.39 | 75.65 |
| 1.0/1 | 0.73 | 0.62 | 96.88 | 67.56 | 71.18 | 49.64 | 126.54 | 88.24 |
| 1.0/2 | 1.03 | 0.80 | 139.97 | 89.03 | 102.84 | 65.41 | 182.82 | 116.28 |
| 1.0/3 | 1.34 | 0.97 | 183.06 | 110.49 | 134.50 | 81.18 | 239.10 | 144.32 |
| 1.0/4 | 1.65 | 1.15 | 226.16 | 131.96 | 166.16 | 96.95 | 295.39 | 172.35 |
| 1.0/5 | 1.95 | 1.32 | 269.25 | 153.42 | 197.81 | 112.72 | 351.67 | 200.39 |
| 1.0/6 | 2.26 | 1.50 | 313.34 | 174.89 | 229.47 | 128.49 | 407.95 | 228.43 |
| 1.5/1 | 1.62 | 1.36 | 208.46 | 135.17 | 153.15 | 99.31 | 272.27 | 176.55 |
| 1.5/2 | 2.29 | 1.74 | 306.73 | 181.33 | 225.35 | 133.22 | 400.63 | 236.84 |
| 1.5/3 | 2.97 | 2.13 | 405.00 | 227.49 | 297.55 | 167.14 | 528.98 | 297.13 |
| 1.5/4 | 3.64 | 2.51 | 503.27 | 273.64 | 369.75 | 201.05 | 657.33 | 357.42 |
| 1.5/5 | 4.31 | 2.90 | 601.54 | 319.81 | 441.95 | 234.96 | 785.69 | 417.72 |
| 1.5/6 | 4.99 | 3.28 | 699.81 | 365.97 | 514.15 | 268.88 | 914.04 | 478.01 |
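Because every layer's Madds scale with the spatial area h × w, the 192- and 256-pixel columns follow from the 224-pixel column by a (size/224)² factor. A quick check against the 0.5/1 row (function name is this sketch's assumption):

```python
def scale_madds(madds_224, size):
    """Madds grow with spatial area, i.e. quadratically in the input side length."""
    return madds_224 * (size / 224) ** 2

# 0.5/1 row, Mobilev2 column: 34.91 M Madds at 224 x 224
print(round(scale_madds(34.91, 192), 2))  # ~25.65 (table: 25.65)
print(round(scale_madds(34.91, 256), 2))  # ~45.6  (table: 45.59)
```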
Under different input sizes, the accuracy comparison of MobileNetV2 and PlaneNet after data augmentation.
Red denotes the better value of each pair. "Our acc." refers to PlaneNet, and M.v2 refers to MobileNetV2.
| | Input_size = (192 × 192) | | Input_size = (224 × 224) | | Input_size = (256 × 256) | |
|---|---|---|---|---|---|---|
| | Our acc. (%) | M.v2 acc. (%) | Our acc. (%) | M.v2 acc. (%) | Our acc. (%) | M.v2 acc. (%) |
| 0.5/1 | | 72.48 | | 73.78 | | 75.48 |
| 0.5/2 | | 73.27 | | 76.04 | | 76.89 |
| 0.5/3 | | 74.35 | | 76.61 | 78.92 | |
| 0.5/4 | | 74.80 | | 77.74 | | 79.83 |
| 0.5/5 | | 77.06 | | 78.30 | 80.67 | |
| 0.5/6 | 78.07 | | | 80.84 | | 78.92 |
| 1.0/1 | | 75.31 | | 77.17 | | 78.64 |
| 1.0/2 | | 77.79 | 79.37 | | | 80.33 |
| 1.0/3 | | 78.70 | 80.50 | | | 80.16 |
| 1.0/4 | 80.68 | | | 81.07 | | 81.58 |
| 1.0/5 | | 80.00 | 82.42 | | | 83.44 |
| 1.0/6 | | 80.39 | 82.15 | | 83.72 | |
| 1.5/1 | | 74.35 | | 78.53 | | 77.57 |
| 1.5/2 | | 77.45 | 80.90 | | | 82.09 |
| 1.5/3 | 79.60 | | | 81.97 | 82.54 | |
| 1.5/4 | 79.21 | | | 81.12 | | 82.54 |
| 1.5/5 | | 81.29 | | 82.65 | 83.61 | |
| 1.5/6 | 80.90 | | | 82.71 | | 84.12 |
For different input sizes, the accuracy comparison between MobileNetV2 and PlaneNet without data augmentation.
Red denotes the better value of each pair. "Our acc." refers to PlaneNet, and M.v2 refers to MobileNetV2.
| | Input_size = (192 × 192) | | Input_size = (224 × 224) | | Input_size = (256 × 256) | |
|---|---|---|---|---|---|---|
| | Our acc. (%) | M.v2 acc. (%) | Our acc. (%) | M.v2 acc. (%) | Our acc. (%) | M.v2 acc. (%) |
| 0.5/1 | | 62.48 | | 67.62 | | 70.67 |
| 0.5/2 | | 69.32 | | 70.73 | | 71.80 |
| 0.5/3 | | 71.52 | | 71.12 | | 75.76 |
| 0.5/4 | 72.54 | | | 74.23 | 75.87 | |
| 0.5/5 | | 72.31 | 74.80 | | 77.45 | |
| 0.5/6 | | 73.61 | 73.95 | | 76.15 | |
| 1.0/1 | | 70.73 | | 74.18 | | 74.74 |
| 1.0/2 | | 72.93 | | 74.23 | | 77.51 |
| 1.0/3 | | 75.02 | | 77.06 | | 77.17 |
| 1.0/4 | | 76.49 | | 75.76 | | 79.15 |
| 1.0/5 | | 74.85 | 76.66 | | 77.62 | |
| 1.0/6 | 76.55 | | | 76.66 | | 78.70 |
| 1.5/1 | | 71.01 | | 75.14 | | 75.98 |
| 1.5/2 | | 69.77 | | 76.38 | | 77.40 |
| 1.5/3 | | 72.93 | | 78.87 | | 77.85 |
| 1.5/4 | | 68.41 | | 77.68 | 77.74 | |
| 1.5/5 | | 75.93 | 77.23 | | 79.37 | |
| 1.5/6 | | 76.89 | | 74.57 | 79.20 | |