Deisy Chaves1,2, Eduardo Fidalgo1,2, Enrique Alegre1,2, Rocío Alaiz-Rodríguez1,2, Francisco Jáñez-Martino1,2, George Azzopardi3.
Abstract
Face recognition is a valuable forensic tool for criminal investigators because it helps identify individuals in scenarios of criminal activity, such as cases involving fugitives or child sexual abuse. It is, however, a very challenging task, as it must handle low-quality images from real-world settings and fulfill real-time requirements. Deep learning approaches for face detection have proven to be very successful, but they require large computation power and processing time. In this work, we evaluate the speed-accuracy tradeoff of three popular deep-learning-based face detectors on the WIDER Face and UFDD data sets on several CPUs and GPUs. We also develop a regression model capable of estimating the performance, in terms of both processing time and accuracy. We expect this to become a very useful tool for end users in forensic laboratories who need to estimate the performance of different face detection options. Experimental results showed that the best speed-accuracy tradeoff is achieved with images resized to 50% of the original size on GPUs and images resized to 25% of the original size on CPUs. Moreover, performance can be estimated using multiple linear regression models with a Mean Absolute Error (MAE) of 0.113, which is very promising for the forensic field.
Keywords: Benchmark; CPU; CSEM; GPU; deep learning; face detection; regression
Year: 2020 PMID: 32796644 PMCID: PMC7472057 DOI: 10.3390/s20164491
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Face detection performance on the WIDER Face data set [36].
| Method | Year | mAP Easy (%) | mAP Medium (%) | mAP Hard (%) | Speed (FPS) | GPU | Code |
|---|---|---|---|---|---|---|---|
| MTCNN | 2016 | 85.1 | 82.0 | 60.7 | 99.0 | TITAN Black | Yes |
| S3FD | 2017 | 93.7 | 92.4 | 85.2 | 36.0 | TITAN X | Yes |
| FANet | 2017 | 95.6 | 94.7 | 89.5 | 35.6 | GTX 1080 Ti | Yes |
| PyramidBox | 2018 | 96.1 | 95.0 | 89.5 | — | — | Yes |
| DSFD | 2018 | 96.6 | 95.7 | 90.4 | — | — | Yes |
| AInnoFace | 2019 | 96.5 | 95.7 | 91.2 | — | — | No |
Figure 1. Strategy to predict the face detection performance for an input image.
Events in the WIDER Face data set grouped by level of face detection difficulty [36].
| Difficulty | Real World Events |
|---|---|
| Easy | Gymnastics, Handshaking, Waiter Waitress, Press Conference, Worker Laborer, Parachutist Paratrooper, Sports, Coach Trainer, Meeting, Aerobics, Row Boat, Dancing, Swimming, Family Group, Balloonist, Dresses, Couple, Jockey, Tennis, Spa, Surgeons. |
| Medium | Stock Market, Hockey, Students Schoolkids, Ice Skating, Greeting, Football, Running, People Driving Car, Soldier Drilling, Photographers, Sports Fan, Group, Celebration/Party, Soccer, Interview, Raid, Baseball, Soldier Patrol, Angler, Rescue. |
| Hard | Traffic, Festival, Parade, Demonstration, Ceremony, People Marching, Basketball, Shoppers, Matador Bullfighter, Car Accident, Election Campaign, Concerts, Award Ceremony, Picnic, Riot, Funeral, Cheering, Soldier Firing, Car Racing, Voter. |
Figure 2. Pipeline for detecting faces after resizing.
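The resizing step in this pipeline can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names `resized_shape` and `boxes_to_original` are hypothetical, and mapping the detected boxes back to the original image coordinates is an assumption that the caption does not confirm.

```python
from typing import List, Tuple

# A bounding box as (x_min, y_min, x_max, y_max); this layout is an assumption.
Box = Tuple[float, float, float, float]


def resized_shape(width: int, height: int, percent: float) -> Tuple[int, int]:
    """Return the (width, height) of an image scaled to `percent` of its size."""
    scale = percent / 100.0
    # Keep at least one pixel per side so very small images stay valid.
    return max(1, round(width * scale)), max(1, round(height * scale))


def boxes_to_original(boxes: List[Box], percent: float) -> List[Box]:
    """Map boxes detected on the resized image back to original coordinates."""
    scale = percent / 100.0
    return [tuple(v / scale for v in box) for box in boxes]


# A 1024x768 image resized to 50% and 25%, as in the evaluated resolutions.
print(resized_shape(1024, 768, 50))
print(resized_shape(1024, 768, 25))
# A box found on the 50% image, expressed in original pixel coordinates.
print(boxes_to_original([(10, 20, 30, 40)], 50))
```

Any face detector can be run between the two helpers; the sketch leaves that call out because it depends on the library in use.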
Specification details of the evaluated CPUs. The CPUs were installed in a desktop computer, a laptop or a Surface Pro tablet.
| CPU | Base Frequency | Cores | Cache | Bus Speed | Memory |
|---|---|---|---|---|---|
| Intel i5-3450 | 3.10 GHz | 4 | 6 MB | 5 GT/s | 8 GB |
| Intel i7-8650U | 1.90 GHz | 4 | 8 MB | 4 GT/s | 16 GB |
| Intel i7-4790K | 4.00 GHz | 4 | 8 MB | 5 GT/s | 32 GB |
| Intel i9-8950HK | 2.90 GHz | 6 | 12 MB | 8 GT/s | 32 GB |
| Intel Xeon E5-2630 | 2.40 GHz | 6 | 15 MB | 7.2 GT/s | 128 GB |
Evaluated GPUs specification details. GPUs were installed in desktop or laptop computers.
| GPU | Arch. | Cores | Video Memory | Memory Bandwidth | Clock Frequency | Memory |
|---|---|---|---|---|---|---|
| Tesla K40c | Kepler | 2880 | 12 GB | 288 GB/s | 745 MHz | 128 GB |
| TITAN Xp | Pascal | 3840 | 12 GB | 547.7 GB/s | 1404 MHz | 128 GB |
| GTX 1050 Ti | Pascal | 768 | 4 GB | 112 GB/s | 1290 MHz | 32 GB |
| GTX 1060 | Pascal | 1280 | 6 GB | 162 GB/s | 1506 MHz | 32 GB |
| GTX 1070 | Pascal | 1920 | 8 GB | 256 GB/s | 1506 MHz | 32 GB |
| RTX 2060 | Turing | 1920 | 6 GB | 336 GB/s | 1365 MHz | 16 GB |
| RTX 2070 | Turing | 2304 | 6 GB | 448 GB/s | 1410 MHz | 16 GB |
Speed and accuracy (mAP and F1 score) tradeoff results on the WIDER Face data set for the MTCNN, PyramidBox and DSFD face detection methods using four image resolutions and different CPU/GPU configurations. The best mAP, F1 score and speed values per image size and face detector are highlighted in bold. Higher mAP and F1 score values and lower speed values (processing time in seconds) mean better performance.
| Metric | MTCNN (100%) | PyramidBox (100%) | DSFD (100%) | MTCNN (75%) | PyramidBox (75%) | DSFD (75%) | MTCNN (50%) | PyramidBox (50%) | DSFD (50%) | MTCNN (25%) | PyramidBox (25%) | DSFD (25%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Avg. mAP (%) | 56.10 | 92.47 | | 56.40 | 92.57 | | 54.07 | 89.57 | | 45.67 | 72.40 | |
| Avg. F1 score | 0.505 | 0.881 | | 0.505 | 0.863 | | 0.478 | 0.791 | | 0.392 | 0.566 | |
| CPU i5-3450 (s) | 0.333 | 6.749 | 18.211 | 0.203 | 3.766 | 10.017 | 0.113 | 1.689 | 4.429 | 0.051 | 0.448 | 1.053 |
| CPU i7-8650U (s) | 0.442 | 8.182 | 17.943 | 0.268 | 4.648 | 10.152 | 0.142 | 2.110 | 4.767 | 0.061 | 0.571 | 1.291 |
| CPU i7-4790K (s) | 0.225 | 5.784 | 11.872 | 0.135 | 3.231 | 6.170 | | 1.418 | 2.699 | 0.030 | 0.375 | 0.581 |
| CPU i9-8950HK (s) | | | 10.036 | | | 5.396 | 0.078 | | 2.335 | | | |
| CPU Xeon E5 (s) | 0.573 | 4.658 | | 0.367 | 2.810 | | 0.219 | 1.414 | | 0.101 | 0.500 | 0.884 |
| Tesla K40c (s) | 0.200 | 0.718 | 0.829 | 0.126 | 0.443 | 0.500 | 0.069 | 0.244 | 0.268 | 0.035 | 0.120 | 0.132 |
| TITAN Xp (s) | 0.141 | | 0.278 | 0.091 | | 0.181 | 0.054 | | 0.110 | 0.031 | 0.055 | 0.066 |
| GTX 1050 Ti (s) | 0.116 | 0.649 | 0.648 | 0.073 | 0.355 | 0.371 | 0.042 | 0.180 | 0.171 | | 0.057 | 0.078 |
| GTX 1060 (s) | 0.114 | 0.359 | 0.363 | 0.076 | 0.225 | 0.218 | 0.041 | 0.117 | 0.108 | 0.021 | 0.050 | 0.050 |
| GTX 1070 (s) | 0.123 | 0.268 | 0.320 | 0.079 | 0.169 | 0.201 | 0.046 | 0.095 | 0.111 | 0.023 | 0.046 | 0.063 |
| RTX 2060 (s) | | 0.492 | | | 0.279 | | | 0.129 | | | | |
| RTX 2070 (s) | 0.119 | 0.493 | 0.276 | 0.075 | 0.271 | 0.183 | 0.043 | 0.129 | 0.077 | | 0.046 | 0.051 |
| | 62.70 | 92.32 | 96.86 | 61.95 | 92.01 | 96.35 | 61.54 | 90.83 | 96.01 | 55.51 | 86.19 | 92.14 |
Relative improvement (%) in terms of accuracy (mAP and F1 score) and speed obtained with different image resolutions (75%, 50% and 25%) with respect to the baseline values computed for full size images, using MTCNN, PyramidBox and DSFD and different CPU/GPU configurations on the WIDER Face data set. The best value per image size and face detector is highlighted in bold. Higher values indicate better performance.
| Metric | MTCNN (75% vs. 100%) | PyramidBox (75% vs. 100%) | DSFD (75% vs. 100%) | MTCNN (50% vs. 100%) | PyramidBox (50% vs. 100%) | DSFD (50% vs. 100%) | MTCNN (25% vs. 100%) | PyramidBox (25% vs. 100%) | DSFD (25% vs. 100%) |
|---|---|---|---|---|---|---|---|---|---|
| Avg. mAP | 0.53 | 0.11 | | −3.62 | −3.14 | | −18.60 | −21.70 | |
| Avg. F1 score | | −2.05 | −0.30 | −5.36 | −10.22 | | −22.35 | −35.72 | |
| Avg. Speed CPUs | 37.42 | 43.06 | | 65.03 | 73.88 | | 85.24 | 92.53 | |
| Avg. Speed GPUs | 36.22 | | 37.70 | 63.64 | 67.51 | | 81.81 | | 82.92 |
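The accuracy entries in this table are consistent with a percentage change relative to the full size baseline. The sketch below reproduces two of them from the average mAP values reported for MTCNN on WIDER Face (56.10 at 100%, 56.40 at 75%, 54.07 at 50%). The function names are hypothetical, and the speed formula is an assumption: it treats a reduction in processing time as a positive improvement, which matches the sign of the table values but is not stated explicitly in the source.

```python
def accuracy_improvement(baseline: float, resized: float) -> float:
    """Percentage change of an accuracy metric with respect to the baseline."""
    return (resized - baseline) / baseline * 100.0


def speed_improvement(baseline_time: float, resized_time: float) -> float:
    """Percentage reduction in processing time (assumed definition)."""
    return (baseline_time - resized_time) / baseline_time * 100.0


# MTCNN average mAP on WIDER Face: 100% -> 75% and 100% -> 50%.
print(round(accuracy_improvement(56.10, 56.40), 2))  # 0.53
print(round(accuracy_improvement(56.10, 54.07), 2))  # -3.62

# Single-device example (Tesla K40c, MTCNN, 100% -> 75%); the table itself
# reports averages over all CPUs or all GPUs, not per-device values.
print(round(speed_improvement(0.200, 0.126), 1))  # 37.0
```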
Figure 3. Precision-Recall curves on the WIDER Face data set for the MTCNN, PyramidBox and DSFD face detection methods using four different image resolutions.
Figure 4. Average CPU and GPU computation time (s) on the WIDER Face data set for the MTCNN, PyramidBox and DSFD face detection methods using four different image resolutions.
Figure 5. Detected faces using the MTCNN, PyramidBox and DSFD methods with four image resolutions.
Speed and accuracy (mAP and F1 score) tradeoff results on the UFDD data set for the MTCNN, PyramidBox and DSFD face detection methods using four different image resolutions and CPU/GPU configurations. The best mAP, F1 score and speed values per image size and face detector are highlighted in bold. Higher mAP and F1 score values and lower speed values (processing time in seconds) mean better performance.
| Metric | MTCNN (100%) | PyramidBox (100%) | DSFD (100%) | MTCNN (75%) | PyramidBox (75%) | DSFD (75%) | MTCNN (50%) | PyramidBox (50%) | DSFD (50%) | MTCNN (25%) | PyramidBox (25%) | DSFD (25%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Avg. mAP (%) | 19.90 | 53.40 | | 19.0 | 50.0 | | 17.10 | 42.20 | | 10.30 | 23.60 | |
| Avg. F1 score | 0.236 | 0.576 | | 0.224 | 0.546 | | 0.199 | 0.451 | | 0.123 | 0.248 | |
| CPU i5-3450 (s) | 0.311 | 6.525 | 17.585 | 0.191 | 3.655 | 9.727 | 0.103 | 1.629 | 4.287 | 0.045 | 0.438 | 1.016 |
| CPU i7-8650U (s) | 0.407 | 7.991 | 17.077 | 0.248 | 4.467 | 11.002 | 0.128 | 2.035 | 4.773 | 0.053 | 0.551 | 1.263 |
| CPU i7-4790K (s) | 0.206 | 5.584 | 11.253 | | 3.099 | 6.417 | 0.064 | 1.390 | 2.557 | | 0.365 | 0.550 |
| CPU i9-8950HK (s) | | 4.598 | | 0.119 | | | | | 2.253 | 0.028 | | |
| CPU Xeon E5 (s) | 0.510 | | 7.795 | 0.347 | 2.711 | 4.462 | 0.199 | 1.351 | | 0.091 | 0.459 | 0.835 |
| Tesla K40c (s) | 0.179 | 0.676 | 0.820 | 0.116 | 0.424 | 0.477 | 0.061 | 0.237 | 0.259 | 0.034 | 0.117 | 0.127 |
| TITAN Xp (s) | 0.130 | | 0.276 | 0.086 | | 0.179 | 0.050 | | 0.110 | 0.030 | 0.056 | 0.068 |
| GTX 1050 Ti (s) | 0.105 | 0.608 | 0.636 | 0.066 | 0.347 | 0.359 | | 0.184 | 0.170 | | 0.063 | 0.067 |
| GTX 1060 (s) | 0.103 | 0.337 | 0.350 | 0.066 | 0.215 | 0.210 | 0.038 | 0.113 | 0.107 | 0.019 | 0.049 | 0.050 |
| GTX 1070 (s) | 0.111 | 0.249 | 0.311 | 0.073 | 0.162 | 0.196 | 0.042 | | 0.110 | 0.022 | 0.045 | 0.063 |
| RTX 2060 (s) | | 0.428 | | | 0.259 | 0.162 | | 0.122 | | 0.018 | | |
| RTX 2070 (s) | 0.111 | 0.426 | 0.287 | 0.072 | 0.253 | | 0.042 | 0.124 | 0.079 | 0.021 | 0.048 | 0.045 |
| | 62.87 | 92.86 | 96.68 | 62.19 | 92.12 | 96.14 | 60.86 | 90.74 | 95.94 | 52.68 | 85.49 | 92.16 |
Figure 6. Precision-Recall curves on the UFDD data set for the MTCNN, PyramidBox and DSFD face detection methods using four different image resolutions.
Figure 7. Average CPU and GPU computation time (s) on the UFDD data set for the MTCNN, PyramidBox and DSFD face detection methods using four different image resolutions.
Relative improvement (%) in terms of accuracy (mAP and F1 score) and speed obtained with different image resolutions (75%, 50% and 25%) with respect to the baseline values computed for full size images, using MTCNN, PyramidBox and DSFD and different CPU/GPU configurations on the UFDD data set. The best value per image size and face detector is highlighted in bold. Higher values indicate better performance.
| Metric | MTCNN (75% vs. 100%) | PyramidBox (75% vs. 100%) | DSFD (75% vs. 100%) | MTCNN (50% vs. 100%) | PyramidBox (50% vs. 100%) | DSFD (50% vs. 100%) | MTCNN (25% vs. 100%) | PyramidBox (25% vs. 100%) | DSFD (25% vs. 100%) |
|---|---|---|---|---|---|---|---|---|---|
| Avg. mAP | −4.52 | −6.37 | | −14.07 | −20.97 | | −48.24 | −55.81 | |
| Avg. F1 score | −5.29 | −5.19 | | −15.77 | −21.81 | | −48.12 | −56.97 | |
| Avg. Speed CPUs | 36.97 | 44.25 | | 65.91 | 74.49 | | 85.28 | 92.77 | |
| Avg. Speed GPUs | 35.32 | 37.45 | | 63.36 | 65.70 | | 80.93 | | 83.31 |
Mean Absolute Error (MAE), Root Mean Squared Error (RMSE) and Mean Squared Error (MSE) of the regression models built to predict the computational time (speed) based on the area of images (area), face detectors (method), image resized percentage (resized), and hardware used to process images (machine). Lower values of MAE, RMSE and MSE mean better performance.
| # | GLM Regression | Explanatory variables | Prediction | MAE | RMSE | MSE |
|---|---|---|---|---|---|---|
| 1 | Gaussian distribution (normal) | area, method, machine, resized | speed | 1.438 | 2.164 | 4.682 |
| 2 | Normal + logarithmic (log) speed | area, method, machine, resized | log (speed) | 0.624 | 1.851 | 3.425 |
| 3 | Negative binomial distribution | area, method, machine, resized | speed | 0.287 | 0.840 | 0.705 |
| 4 | Normal + variables concatenation (concat) | area, concat (method, machine, resized) | speed | 0.246 | 0.636 | 0.405 |
| 5 | | area, concat (method, machine, resized) | log (speed) | | | |
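The two modelling ideas that appear in this table, concatenating the categorical variables into one factor and predicting log (speed), can be illustrated with a deliberately reduced sketch. It is not the authors' model: the image area is left out because per-image areas are not listed here, and all function names are hypothetical. With a single concatenated factor and no other covariate, a least-squares fit reduces to the mean of log (speed) within each factor level, which is what the code computes. The timings are taken from the WIDER Face and UFDD tables, so each configuration has two observations.

```python
import math
from collections import defaultdict

# (method, machine, resized %, seconds), copied from the two timing tables.
OBSERVATIONS = [
    ("MTCNN", "Tesla K40c", 100, 0.200),
    ("MTCNN", "Tesla K40c", 100, 0.179),
    ("MTCNN", "Tesla K40c", 50, 0.069),
    ("MTCNN", "Tesla K40c", 50, 0.061),
    ("DSFD", "CPU i5-3450", 100, 18.211),
    ("DSFD", "CPU i5-3450", 100, 17.585),
]


def concat_key(method: str, machine: str, resized: int) -> str:
    """Merge the three categorical variables into a single factor level."""
    return f"{method}|{machine}|{resized}"


def fit_log_speed(observations):
    """Fit log(speed) on the concatenated factor alone (area is omitted).

    With one indicator per level and no other covariate, the least-squares
    coefficient of each level is the mean of log(speed) within that level.
    """
    logs = defaultdict(list)
    for method, machine, resized, seconds in observations:
        logs[concat_key(method, machine, resized)].append(math.log(seconds))
    return {level: sum(vals) / len(vals) for level, vals in logs.items()}


def predict_speed(model, method: str, machine: str, resized: int) -> float:
    """Back-transform the predicted log(speed) to seconds."""
    return math.exp(model[concat_key(method, machine, resized)])


model = fit_log_speed(OBSERVATIONS)
print(round(predict_speed(model, "MTCNN", "Tesla K40c", 100), 3))
```

Working in log space keeps every predicted time positive after the back-transform, which a plain Gaussian fit on speed does not guarantee.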
MAE, RMSE and MSE of the regression models built to predict the F1 score (F1Score) based on the area of images (area), face detectors (method), and image resized percentage (resized). Lower values of MAE, RMSE and MSE mean better performance.
| # | GLM Regression | Explanatory variables | Prediction | MAE | RMSE | MSE |
|---|---|---|---|---|---|---|
| 1 | Gaussian distribution (normal) | area, method, resized | F1Score | 0.371 | 0.418 | 0.175 |
| 3 | Negative binomial distribution | area, method, resized | F1Score | 0.370 | 0.446 | 0.199 |
| 2 | | area, concat (method, resized) | F1Score | | | |
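For reference, the three error measures used in the regression tables can be computed as in the following minimal sketch. The function names and the toy values are illustrative only and are not taken from the paper's data.

```python
import math
from typing import Sequence


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean Absolute Error: average magnitude of the prediction errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean Squared Error: average of the squared prediction errors."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root Mean Squared Error: square root of the MSE, in the target's units."""
    return math.sqrt(mse(y_true, y_pred))


# Toy example: errors are 0, -1 and 2.
observed = [1.0, 2.0, 4.0]
predicted = [1.0, 3.0, 2.0]
print(mae(observed, predicted), mse(observed, predicted), rmse(observed, predicted))
```

Because RMSE is the square root of MSE, the two columns in each table row can be checked against each other; for example, an RMSE of 0.840 corresponds to an MSE of about 0.706.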