Loris Nanni, Sheryl Brahnam, Alessandra Lumini.
Abstract
A fundamental problem in computer vision is face detection. In this paper, an experimentally derived ensemble of six face detectors is presented that maximizes the number of true positives while reducing the number of false positives produced by the ensemble. False positives are removed by a sequence of filtering steps based primarily on the characteristics of the depth map in the subwindows of the image that contain candidate faces. A new filtering approach based on processing the image with different wavelets is also proposed. The experimental results show that the filtering steps used in our best ensemble reduce the number of false positives without decreasing the detection rate. This finding is validated on a dataset that merges four others, for a total of 549 images containing 614 upright frontal faces acquired in unconstrained environments. The dataset provides both 2D and depth data. For further validation, the proposed ensemble is tested on the well-known BioID benchmark dataset, where it obtains a 100% detection rate with an acceptable number of false positives.
Keywords: depth map ensemble; face detection; filtering
Year: 2019 PMID: 31795280 PMCID: PMC6929141 DOI: 10.3390/s19235242
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Figure 1. Schematic of the proposed face detection system.
Figure 2. Color image (left), depth map (middle), and segmentation map (right).
Figure 3. Examples of images rejected by the different filtering methods.
Figure 4. Examples of partitioning a neighborhood of the candidate face region into sectors (gray area). The lower sectors, which should contain the body, are depicted in dark gray [9].
Characteristics of the six datasets. MHG: Microsoft Hand Gesture, PHG: Padua Hand Gesture, PFD: Padua FaceDec, and PFD2: Padua FaceDec2.
| Dataset | Number of Images | Color Resolution | Depth Resolution | Number of Faces | Difficulty Level |
|---|---|---|---|---|---|
| MHG | 42 | 640 × 480 | 640 × 480 | 42 | Low |
| PHG | 59 | 1280 × 1024 | 640 × 480 | 59 | Low |
| PFD | 132 | 1280 × 1024 | 640 × 480 | 150 | High |
| PFD2 | 316 | 1920 × 1080 | 512 × 424 | 363 | High |
| MERGED | 549 | --- | --- | 614 | High |
| BioID | 1521 | 384 × 286 | --- | 1521 | High |
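As a quick consistency check, the MERGED row can be recomputed from the four depth datasets it combines. The snippet below is only an arithmetic sketch: the counts are copied from the table, and the variable names are illustrative.

```python
# Per-dataset counts copied from the table above: (images, faces).
datasets = {
    "MHG": (42, 42),
    "PHG": (59, 59),
    "PFD": (132, 150),
    "PFD2": (316, 363),
}

# MERGED is the union of the four datasets, so its totals are plain sums.
merged_images = sum(images for images, _ in datasets.values())
merged_faces = sum(faces for _, faces in datasets.values())

print(f"MERGED: {merged_images} images, {merged_faces} faces")
```

The sums agree with the 549 images and 614 faces reported for MERGED.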
Performance of the six face detectors and the best-performing ensembles (see the last seven rows) on the MERGED dataset (* denotes the addition of the 20°/−20° rotated images/poses to the dataset). As in [9], a face is considered detected in an image if its eye distance (ED) is below the threshold adopted there. DR: detection rate, FL: fast localization, FP: false positives, NPD: normalized pixel difference, SFD: Single Scale-invariant Face Detector, SN: Split up sparse Network of Winnows, VJ: Viola–Jones.
| Face Detector(s)/Ensemble | +Poses | DR | FP |
|---|---|---|---|
| VJ(2) | No | 55.37 | 2528 |
| RF(−1) | No | 47.39 | 4682 |
| RF(−0.8) | No | 47.07 | 3249 |
| RF(−0.65) | No | 46.42 | 1146 |
| SN(1) | No | 66.61 | 508 |
| SN(10) | No | 46.74 | 31 |
| FL | No | 78.18 | 344 |
| NPD | No | 55.70 | 1439 |
| SFD | No | 81.27 | 186 |
| VJ(2) * | Yes | 65.31 | 6287 |
| RF(−1) * | Yes | 49.67 | 19,475 |
| RF(−0.8) * | Yes | 49.67 | 14,121 |
| RF(−0.65) * | Yes | 49.02 | 5895 |
| SN(1) * | Yes | 74.59 | 1635 |
| SN(10) * | Yes | 50.16 | 48 |
| FL * | Yes | 83.39 | 891 |
| NPD * | Yes | 64.17 | 10,431 |
| FL + RF(−0.65) | No | 83.06 | 1490 |
| FL + RF(−0.65) + SN(1) | No | 86.16 | 1998 |
| FL + RF(−0.65) + SN(1) * | Mixed | 88.44 | 3125 |
| FL * + SN(1) * | Yes | 87.79 | 2526 |
| FL * + RF(−0.65) + SN(1) * | Mixed | 90.39 | 3672 |
| FL * + RF(−0.65) + SN(1) * + SFD | Mixed | 91.21 | 3858 |
| FL * + RF(−0.65) + SN(1) * + NPD * + SFD | Mixed | | 16,325 |
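For the ensembles whose rows are complete, the reported FP counts equal the sum of the FP counts of their members. This is consistent with the ensemble pooling the detections of its members, although that reading is an inference from the numbers rather than something stated here. The check below is a minimal sketch using only values from the table; the dictionary layout is illustrative, and ASCII hyphens replace the minus signs in the detector names.

```python
# FP counts of the individual detectors, copied from the table above.
single_fp = {
    "FL": 344,
    "FL *": 891,
    "RF(-0.65)": 1146,
    "SN(1)": 508,
    "SN(1) *": 1635,
    "SFD": 186,
}

# Reported FP counts of the ensembles with complete rows.
ensemble_fp = {
    ("FL", "RF(-0.65)"): 1490,
    ("FL", "RF(-0.65)", "SN(1)"): 1998,
    ("FL", "RF(-0.65)", "SN(1) *"): 3125,
    ("FL *", "SN(1) *"): 2526,
    ("FL *", "RF(-0.65)", "SN(1) *"): 3672,
    ("FL *", "RF(-0.65)", "SN(1) *", "SFD"): 3858,
}

# Compare the pooled (summed) FP count with the reported one for each ensemble.
results = {}
for members, reported in ensemble_fp.items():
    pooled = sum(single_fp[name] for name in members)
    results[members] = (pooled, reported)
    print(" + ".join(members), "pooled:", pooled, "reported:", reported)

all_match = all(pooled == reported for pooled, reported in results.values())
```

The last row of the table is left out because its DR cell is missing.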
Performance of the six face detectors and ensembles reported above on the BioID dataset (note: some values are taken from [9]).
| Face Detector(s)/Ensemble | +Poses | DR (ED < 0.15) | DR (ED < 0.25) | DR (ED < 0.35) | FP |
|---|---|---|---|---|---|
| VJ(2) | No | 13.08 | 86.46 | 99.15 | 517 |
| RF(−1) | No | 87.84 | 98.82 | 99.08 | 80 |
| RF(−0.8) | No | 87.84 | 98.82 | 99.08 | 32 |
| RF(−0.65) | No | 87.84 | 98.82 | 99.08 | 21 |
| SN(1) | No | 71.27 | 96.38 | 97.76 | 12 |
| SN(10) | No | 72.06 | 98.16 | 99.74 | 172 |
| FL | No | 92.57 | 94.61 | 94.67 | 67 |
| SFD | No | 99.21 | 99.34 | 99.34 | 1 |
| VJ(2) * | Yes | 13.08 | 86.46 | 99.15 | 1745 |
| RF(−1) * | Yes | 90.53 | 99.15 | 99.41 | 1316 |
| RF(−0.8) * | Yes | 90.53 | 99.15 | 99.41 | 589 |
| RF(−0.65) * | Yes | 90.53 | 99.15 | 99.41 | 331 |
| SN(1) * | Yes | 71.33 | 96.52 | 97.90 | 193 |
| SN(10) * | Yes | 72.12 | 98.36 | 99.87 | 1361 |
| FL * | Yes | 92.57 | 94.61 | 94.67 | 1210 |
| FL + RF(−0.65) | No | 98.42 | 99.74 | 99.74 | 88 |
| FL + RF(−0.65) + SN(10) | No | 99.15 | 99.93 | 99.93 | 100 |
| FL + RF(−0.65) + SN(1) * | Mixed | 99.15 | 100 | 100 | 281 |
| FL * + SN(1) * | Yes | 98.03 | 99.87 | 99.93 | 260 |
| FL * + RF(−0.65) + SN(1) * | Mixed | 99.15 | 100 | 100 | 1424 |
| FL * + RF(−0.65) + SN(1) * + SFD | Mixed | | | 100 | 1425 |
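The ED thresholds in the table can be read as bounds on a normalized eye-localization error. The sketch below assumes the relative error measure commonly used with BioID: the larger of the two eye-center offsets divided by the true inter-ocular distance. This text does not define ED itself, so the formula, the function name, and the sample coordinates are all assumptions made for illustration.

```python
import math


def eye_distance_error(true_left, true_right, pred_left, pred_right):
    """Assumed ED: worst eye-center offset divided by the true inter-ocular distance."""
    d_left = math.dist(true_left, pred_left)
    d_right = math.dist(true_right, pred_right)
    inter_ocular = math.dist(true_left, true_right)
    return max(d_left, d_right) / inter_ocular


# Hypothetical annotation and prediction, in pixel coordinates (x, y).
ed = eye_distance_error(
    true_left=(100, 100),
    true_right=(160, 100),
    pred_left=(103, 104),
    pred_right=(160, 100),
)

# A face counts as detected at a given threshold when ED falls below it.
hits = {threshold: ed < threshold for threshold in (0.15, 0.25, 0.35)}
print(f"ED = {ed:.4f}", hits)
```

A stricter threshold (0.15) demands tighter eye localization, which is why a detector's DR can differ sharply between columns.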
Performance of FL + RF(−0.65) + SN(1)* + SFD obtained by combining different filtering steps on MERGED.
| Filter Combination | DR | FP |
|---|---|---|
| SIZE | 91.21 | 1547 |
| SIZE + STD | 91.21 | 1514 |
| SIZE + STD + SEG | 91.21 | 1485 |
| SIZE + STD + SEG + ELL | 91.04 | 1440 |
| SIZE + STD + SEG + ELL + EYE | 90.55 | 1163 |
| SIZE + STD + SEG + ELL + SEC + EYE | 90.39 | 1132 |
| SIZE + STD + SEG + ELL + SEC + EYE + WAV | 90.07 | 1018 |
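To see what each added filter contributes, the rows above can be differenced step by step. The snippet is a small arithmetic sketch over the table values only. It does not implement the filters themselves, and the variable names are illustrative.

```python
# (filter added at this step, DR in %, FP) for each row of the table above.
steps = [
    ("SIZE", 91.21, 1547),
    ("STD", 91.21, 1514),
    ("SEG", 91.21, 1485),
    ("ELL", 91.04, 1440),
    ("EYE", 90.55, 1163),
    ("SEC", 90.39, 1132),
    ("WAV", 90.07, 1018),
]

# For every filter after SIZE: false positives removed and DR points lost.
contributions = [
    (name, prev_fp - fp, round(prev_dr - dr, 2))
    for (_, prev_dr, prev_fp), (name, dr, fp) in zip(steps, steps[1:])
]

for name, fp_removed, dr_lost in contributions:
    print(f"{name}: -{fp_removed} FP, -{dr_lost:.2f} DR")

# Overall effect of going from SIZE alone to the full filter chain.
total_dr_lost = round(steps[0][1] - steps[-1][1], 2)
relative_fp_cut = (steps[0][2] - steps[-1][2]) / steps[0][2]
print(f"total: -{total_dr_lost:.2f} DR, {relative_fp_cut:.1%} fewer FP than SIZE alone")
```

In this table, STD and SEG remove false positives at no cost in DR, while EYE removes the most false positives in a single step.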
Average processing time per image in ms.
| Detection Method/Filter | ms |
|---|---|
| RF | 12,571 |
| SN | 1371 |
| FL | 170 |
| SFD | 175 |
| SIZE | 0.33 |
| STD | 10.86 |
| SEG | 8.808 |
| ELL | 10.24 |
| EYE | 19,143 |
| WAV | 179.4 |