Xu Zhao, Xiaoqing Liang, Chaoyang Zhao, Ming Tang, Jinqiao Wang.
Abstract
Face detection is the basic step in video face analysis and has been studied for many years. However, achieving real-time performance on embedded devices with limited computational resources remains an open challenge. To address this problem, we propose EagleEye, a face detector that offers a good trade-off between accuracy and speed on popular low-power embedded devices (e.g., the Raspberry Pi 3B+). EagleEye is designed to require few floating-point operations (FLOPs) while retaining enough capacity, and its accuracy is further improved without adding many FLOPs. Specifically, we design five strategies for building efficient face detectors that balance accuracy and running speed. The first two strategies yield a detector with low computational complexity and sufficient capacity: we use convolution factorization to replace traditional convolutions with sparser depth-wise convolutions, which saves computation, and we use successive downsampling convolutions at the beginning of the network. The latter three strategies significantly improve the accuracy of the lightweight detector without adding much computational cost: we design an efficient context module that exploits context information for face detection, we adopt an information-preserving activation function to increase network capacity, and we use focal loss to handle the class imbalance problem better. Experiments show that EagleEye outperforms other face detectors with the same order of computational cost in both runtime efficiency and accuracy.
Keywords: ARM-based devices; computer vision; face detection; model acceleration
Year: 2019 PMID: 31075955 PMCID: PMC6539187 DOI: 10.3390/s19092158
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Figure 1. Overview of the network architecture of the EagleEye face detector. The detection network uses the information-preserving activation function and convolution factorization in almost all backbone and prediction layers.
Architecture of the backbone of the baseline face detector.
| Type/Stride | Filter Shape | Anchor Size |
|---|---|---|
| Conv/s2 | | — |
| Conv/s2 | | — |
| Conv/s2 | | — |
| Conv/s1 | | — |
| Conv/s2 | | — |
| | | 32, 32 |
| Conv/s2 | | — |
| Conv/s1 | | 64, 64 |
| Conv/s1 | | — |
| Conv/s2 | | 128, 128 |
| Conv/s1 | | — |
| Conv/s2 | | 256, 256 |
Figure 2. Illustration of the depth-wise convolution.
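Convolution factorization, as described in the abstract, replaces a standard k × k convolution with a depth-wise k × k convolution (one filter per input channel) followed by a 1 × 1 point-wise convolution. The minimal sketch below counts multiply-accumulates (MACs) for both forms. It assumes stride 1 and "same" padding, and the 40 × 30, 64 → 128 channel layer is a hypothetical example, not a layer taken from the paper. The helper names are illustrative. This excerpt does not say whether its FLOPs figures count MACs or separate multiplications and additions, so only the ratio is meaningful here.

```python
def standard_conv_macs(h: int, w: int, k: int, c_in: int, c_out: int) -> int:
    """MACs of a k x k standard convolution producing an h x w x c_out output.
    Assumes stride 1 and 'same' padding, so the output is also h x w."""
    return h * w * k * k * c_in * c_out


def factorized_conv_macs(h: int, w: int, k: int, c_in: int, c_out: int) -> int:
    """MACs of the factorized form: a k x k depth-wise convolution
    (one filter per input channel) followed by a 1 x 1 point-wise convolution."""
    depthwise = h * w * k * k * c_in   # each channel filtered independently
    pointwise = h * w * c_in * c_out   # 1 x 1 conv mixes channels
    return depthwise + pointwise


if __name__ == "__main__":
    # Hypothetical layer for illustration only: 3x3 kernel, 40x30 map, 64 -> 128 channels.
    std = standard_conv_macs(40, 30, 3, 64, 128)
    fac = factorized_conv_macs(40, 30, 3, 64, 128)
    print(std, fac, round(std / fac, 2))
```

The saving factor works out to 1 / (1/c_out + 1/k²), about 8.4× for this example. With 3 × 3 kernels it approaches 9× as the number of output channels grows.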
Figure 3. Illustration of using the head–shoulder region as context information for face detection.
Figure 4. Illustration of the context module.
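The EagleEye backbone table lists the context module as a Slice, two parallel depth-wise convolutions marked d1 and d2 (read here as dilation rates 1 and 2), and a Concat. The pure-Python sketch below is one possible reading of that layout, not the authors' code. It assumes that the slice splits the channels into two halves and that each kernel is 3 × 3, because the filter shapes are not recoverable from this excerpt. It also assumes zero padding equal to the dilation, so that both branches keep the input resolution and can be concatenated. Function names are hypothetical. Feeding in a single impulse shows that the dilation-2 branch covers a 5 × 5 window instead of a 3 × 3 one while using the same number of weights. This wider window is how a dilated branch can reach surrounding context such as the head–shoulder region.

```python
from typing import List

Channel = List[List[float]]  # one H x W feature map (also used for 3x3 kernels)


def depthwise_conv3x3(channels: List[Channel], kernels: List[Channel], dilation: int) -> List[Channel]:
    """3x3 depth-wise convolution, stride 1, zero padding = dilation (keeps H x W).
    Each channel is filtered by its own kernel; channels are never mixed.
    Implemented as cross-correlation, as deep-learning frameworks do."""
    if len(kernels) != len(channels):
        raise ValueError("depth-wise convolution needs one kernel per channel")
    out = []
    for x, k in zip(channels, kernels):
        h, w = len(x), len(x[0])
        y = [[0.0] * w for _ in range(h)]
        for i in range(h):
            for j in range(w):
                acc = 0.0
                for di in range(3):
                    for dj in range(3):
                        r = i + (di - 1) * dilation
                        c = j + (dj - 1) * dilation
                        if 0 <= r < h and 0 <= c < w:  # zero padding outside the map
                            acc += k[di][dj] * x[r][c]
                y[i][j] = acc
        out.append(y)
    return out


def context_module(channels: List[Channel], kernels_d1: List[Channel], kernels_d2: List[Channel]) -> List[Channel]:
    """Slice -> [dw 3x3, dilation 1 | dw 3x3, dilation 2] -> Concat along channels.
    Assumption: the slice splits the channels into two halves."""
    half = len(channels) // 2
    first = depthwise_conv3x3(channels[:half], kernels_d1, dilation=1)
    second = depthwise_conv3x3(channels[half:], kernels_d2, dilation=2)
    return first + second  # list concatenation == channel concatenation


if __name__ == "__main__":
    impulse = [[0.0] * 5 for _ in range(5)]
    impulse[2][2] = 1.0
    ones = [[1.0] * 3 for _ in range(3)]
    y = context_module([impulse, impulse], [ones], [ones])
    for name, fmap in zip(("dilation 1", "dilation 2"), y):
        print(name, [(i, j) for i in range(5) for j in range(5) if fmap[i][j] != 0])
```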
Architecture of the backbone of EagleEye.
| Type/Stride | Filter Shape | Anchor Size |
|---|---|---|
| Conv/s2 | | — |
| Conv dw/s2 | | — |
| Conv/s1 | | — |
| Conv dw/s2 | | — |
| Conv/s1 | | — |
| Conv dw/s1 | | — |
| Conv/s1 | | — |
| Conv dw/s2 | | — |
| Conv/s1 | | — |
| Conv dw/s1 | | — |
| Conv/s1 | | — |
| Slice | — | — |
| Conv dw/s1/d1 | | — |
| Conv dw/s1/d2 | | — |
| Concat | — | — |
| Conv/s1 | | 32, 32 |
| Conv dw/s2 | | — |
| Conv/s1 | | — |
| Conv dw/s1 | | — |
| Conv/s1 | | 64, 64 |
| Conv/s1 | | — |
| Conv dw/s2 | | — |
| Conv/s1 | | 128, 128 |
| Conv/s1 | | — |
| Conv dw/s2 | | — |
| Conv/s1 | | 256, 256 |
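The Type/Stride column of the EagleEye backbone table determines the resolution of each layer that carries anchors. The sketch below transcribes the table's strides and computes the cumulative stride and feature-map size for a VGA (640 × 480) input. It assumes 3 × 3 kernels with padding 1, so a stride-2 layer maps n pixels to ceil(n / 2), and 1 × 1 or stride-1 layers keep the size. The variable and function names are illustrative.

```python
import math
from typing import List, Optional, Tuple

# (Type/Stride, anchor size) transcribed from the EagleEye backbone table.
# The Slice / dilated dw / Concat rows of the context module all have stride 1
# and are omitted because they do not change the resolution.
LAYERS: List[Tuple[str, Optional[int]]] = [
    ("Conv/s2", None),
    ("Conv dw/s2", None), ("Conv/s1", None),
    ("Conv dw/s2", None), ("Conv/s1", None),
    ("Conv dw/s1", None), ("Conv/s1", None),
    ("Conv dw/s2", None), ("Conv/s1", None),
    ("Conv dw/s1", None), ("Conv/s1", None),
    ("Conv/s1", 32),
    ("Conv dw/s2", None), ("Conv/s1", None),
    ("Conv dw/s1", None), ("Conv/s1", 64),
    ("Conv/s1", None), ("Conv dw/s2", None), ("Conv/s1", 128),
    ("Conv/s1", None), ("Conv dw/s2", None), ("Conv/s1", 256),
]


def anchor_feature_maps(width: int, height: int) -> List[Tuple[int, int, int, int]]:
    """Return (anchor size, cumulative stride, map width, map height) for every
    layer that carries anchors. Assumes 3x3 kernels with padding 1, so a stride-s
    layer maps n pixels to ceil(n / s)."""
    stride, w, h = 1, width, height
    result = []
    for name, anchor in LAYERS:
        s = int(name.rsplit("/s", 1)[1])  # "Conv dw/s2" -> 2
        stride *= s
        w, h = math.ceil(w / s), math.ceil(h / s)
        if anchor is not None:
            result.append((anchor, stride, w, h))
    return result


if __name__ == "__main__":
    for row in anchor_feature_maps(640, 480):
        print(row)
```

For a VGA input, this yields 40 × 30, 20 × 15, 10 × 8, and 5 × 4 maps at strides 16, 32, 64, and 128. Under this reading of the table, each anchor size is twice the stride of the layer it is attached to.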
Ablation study on the WIDER FACE validation set.
| Contributions | Baseline | | | | | EagleEye512 |
|---|---|---|---|---|---|---|
| Convolution Factorization | | √ | √ | √ | √ | √ |
| Successive Downsampling Convolutions | | | √ | √ | √ | √ |
| Context Module | | | | √ | √ | √ |
| Information Preserving Activation Function | | | | | √ | √ |
| Focal Loss | | | | | | √ |
| Accuracy (mAP [Easy]) | 87.9 | 83.7 | 82.2 | 82.9 | 83.3 | 84.1 |
| FLOPs | 440.3 M | 87.5 M | 72.6 M | 78.7 M | 75.3 M | 75.3 M |
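Focal loss, the last strategy in the ablation, down-weights well-classified anchors so that the many easy background anchors do not dominate training. The sketch below uses the standard binary form from Lin et al. (RetinaNet), FL(p_t) = -α_t (1 - p_t)^γ log(p_t). The defaults α = 0.25 and γ = 2 come from that work and are assumptions here, because this excerpt does not give EagleEye's settings. Function names are illustrative.

```python
import math


def cross_entropy(p: float, y: int, eps: float = 1e-7) -> float:
    """Binary cross-entropy for one anchor; p = predicted face probability, y in {0, 1}."""
    p_t = p if y == 1 else 1.0 - p
    p_t = min(max(p_t, eps), 1.0 - eps)  # avoid log(0)
    return -math.log(p_t)


def focal_loss(p: float, y: int, alpha: float = 0.25, gamma: float = 2.0, eps: float = 1e-7) -> float:
    """Binary focal loss FL(p_t) = -alpha_t * (1 - p_t)**gamma * log(p_t).
    alpha = 0.25 and gamma = 2 are the RetinaNet defaults, assumed here."""
    p_t = p if y == 1 else 1.0 - p
    p_t = min(max(p_t, eps), 1.0 - eps)  # avoid log(0)
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


if __name__ == "__main__":
    # An easy background anchor versus a hard, misclassified face anchor.
    for p, y in [(0.01, 0), (0.1, 1)]:
        ce, fl = cross_entropy(p, y), focal_loss(p, y)
        print(f"p={p}, y={y}: CE={ce:.4f}, FL={fl:.6f}, FL/CE={fl / ce:.6f}")
```

For the easy background anchor (p = 0.01), the focal term scales the cross-entropy by 0.75 × 0.01² = 7.5 × 10⁻⁵. The hard face anchor (p = 0.1) keeps 0.25 × 0.9² ≈ 20% of its cross-entropy.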
Comparison of different activation functions.
| Method | mAP [Easy] | mAP [Medium] | mAP [Hard] | FLOPs |
|---|---|---|---|---|
| ReLU | 82.9 | 76.5 | 46.5 | 72.6 M |
| PReLU | 83.3 | 77.1 | 49.5 | 75.3 M |
| Leaky ReLU | 83.4 | 76.8 | 48.2 | 75.3 M |
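The activation-function table compares ReLU with two variants that keep a scaled copy of negative inputs instead of zeroing them. The short sketch below gives the element-wise definitions. It is an illustration, not code from the paper. It shows one sense in which such functions preserve information: ReLU maps -2.0 and -0.5 to the same output, while Leaky ReLU and PReLU keep them distinct. The Leaky ReLU slope of 0.01 and the PReLU slope of 0.25 are assumed example values. In PReLU, the slope is learned during training.

```python
from typing import List


def relu(xs: List[float]) -> List[float]:
    """ReLU: every negative input collapses to 0."""
    return [max(0.0, x) for x in xs]


def leaky_relu(xs: List[float], negative_slope: float = 0.01) -> List[float]:
    """Leaky ReLU: negative inputs are scaled by a fixed slope (0.01 assumed here)."""
    return [x if x >= 0 else negative_slope * x for x in xs]


def prelu(xs: List[float], a: float) -> List[float]:
    """PReLU: like Leaky ReLU, but the slope `a` is a learned parameter
    (typically one per channel; a single channel is shown here)."""
    return [x if x >= 0 else a * x for x in xs]


if __name__ == "__main__":
    xs = [-2.0, -0.5, 0.0, 1.5]
    print("ReLU      ", relu(xs))
    print("Leaky ReLU", leaky_relu(xs))
    print("PReLU     ", prelu(xs, a=0.25))  # a = 0.25 is an arbitrary example value
```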
Comparison of EagleEye with directly pruning the baseline, on the WIDER FACE validation set.
| Method | mAP [Easy] | mAP [Medium] | mAP [Hard] | FLOPs |
|---|---|---|---|---|
| Baseline | 87.9 | 84.0 | 61.4 | 440.3 M |
| | 74.7 | 65.5 | 34.7 | 80.7 M |
| EagleEye512 | 84.1 | 79.1 | 46.2 | 75.3 M |
Speed comparison with other face detection methods on FDDB with VGA input (640 × 480).
| Method | mAP on FDDB | FPS (Desktop) | CPU (Desktop) | FPS (ARM Embedded) | CPU (ARM Embedded) |
|---|---|---|---|---|---|
| ACF [ | 85.2 | 20 | i7-3770@3.40 | N/A | ARM Cortex-A53@1.4GHz |
| MTCNN [ | 94.4 | 16 | N/A@2.60 | 5.4 | ARM Cortex-A53@1.4GHz |
| Faceboxes [ | 96.0 | 20 | E5-2660v3@2.60 | 3.4 | ARM Cortex-A53@1.4GHz |
| | 96.0 | 20 | E5-2660v3@2.60 | 10 | ARM Cortex-A53@1.4GHz |
| EagleEye | 96.1 | 21 | E5-2660v3@2.60 | 20 | ARM Cortex-A53@1.4GHz |
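Frame rates are easier to compare as per-frame latency. The sketch below converts the ARM Cortex-A53 FPS values from the speed table into milliseconds per frame and into EagleEye's relative speed-up. ACF (no ARM figure) and the row whose method name is missing from this excerpt are left out. The dictionary and function names are illustrative.

```python
# ARM Cortex-A53 frame rates transcribed from the speed-comparison table (VGA input).
ARM_FPS = {"MTCNN": 5.4, "Faceboxes": 3.4, "EagleEye": 20.0}


def latency_ms(fps: float) -> float:
    """Average time per frame in milliseconds at a given frame rate."""
    return 1000.0 / fps


def speedup_over(baseline: str, target: str = "EagleEye") -> float:
    """How many times more frames per second `target` processes than `baseline`."""
    return ARM_FPS[target] / ARM_FPS[baseline]


if __name__ == "__main__":
    for name, fps in ARM_FPS.items():
        print(f"{name:10s} {latency_ms(fps):6.1f} ms/frame  EagleEye speed-up: {speedup_over(name):.2f}x")
```

At 20 FPS, EagleEye spends 50 ms per frame on the ARM device, roughly 3.7× faster than MTCNN and 5.9× faster than Faceboxes by these figures.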
Figure 5. Speed (frames per second, FPS) versus accuracy (mAP) on the FDDB dataset. Speed is measured on the ARM Cortex-A53-based embedded device.
Memory complexity comparisons between different methods with VGA input.
| Method | Parameters | Model Size | Memory Footprint |
|---|---|---|---|
| MTCNN [ | 0.50 M | 1.90 MB | 33.4 MB |
| Faceboxes [ | 0.91 M | 3.87 MB | 24.0 MB |
| | 0.93 M | 3.59 MB | 32.5 MB |
| EagleEye | 0.23 M | 0.952 MB | 13.9 MB |
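The Parameters and Model Size columns can be cross-checked with simple arithmetic. The sketch below estimates raw weight storage as parameters × 4 bytes. It assumes 32-bit float weights and MB = 10⁶ bytes, neither of which is stated in this excerpt. The names are illustrative.

```python
# Parameter counts and reported model sizes transcribed from the memory table.
PARAMS = {"MTCNN": 0.50e6, "Faceboxes": 0.91e6, "EagleEye": 0.23e6}
REPORTED_MB = {"MTCNN": 1.90, "Faceboxes": 3.87, "EagleEye": 0.952}


def float32_size_mb(n_params: float, bytes_per_param: int = 4) -> float:
    """Raw weight storage in MB (10**6 bytes), assuming 32-bit float weights."""
    return n_params * bytes_per_param / 1e6


if __name__ == "__main__":
    for name, n in PARAMS.items():
        print(f"{name:10s} estimated {float32_size_mb(n):.2f} MB, reported {REPORTED_MB[name]} MB")
```

The estimates (2.00, 3.64, and 0.92 MB) fall within about 6% of the reported sizes. The remaining gaps may come from the rounded parameter counts and from file-format overhead, which this excerpt does not detail.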
Figure 6. Precision-recall curve on the WIDER FACE validation set (Easy).
Figure 7. Precision-recall curve on the WIDER FACE validation set (Medium).
Figure 8. Precision-recall curve on the WIDER FACE validation set (Hard).
Figure 9. Visualization of EagleEye detection results on the WIDER FACE dataset.
Figure 10. Discontinuous receiver operating characteristic (ROC) curves on the FDDB dataset.
Figure 11. Visualization of EagleEye detection results on the FDDB dataset.
Figure 12. Precision-recall curves on the PASCAL face dataset.
Figure 13. Visualization of EagleEye detection results on the PASCAL face dataset.