Seyha Chim, Jin-Gu Lee, Ho-Hyun Park.
Abstract
Facial landmark detection has gained enormous interest for face-related applications owing to its success in facial analysis tasks such as facial recognition, cartoon generation, face tracking and facial expression analysis. Many methods have been proposed to localize facial landmarks in given images under challenging conditions, including large appearance variations and partial occlusion. These methods differ in how they use the facial appearance and shape information of the input images. In our work, we consider facial information within both global and local contexts. In the first stage, we aim to obtain pixel-level accuracy from local-context information; in the second stage, we integrate this with global-context information, namely the spatial relationships among the key points across the whole image. The pipeline of our architecture therefore consists of two main components: (1) a local-context subnet, a deep network that generates detection heatmaps via fully convolutional DenseNets with additional kernel convolution filters, and (2) a dilated skip convolution subnet, a combination of dilated convolutions and skip-connection networks, that robustly refines the local appearance heatmaps. With this architecture, we demonstrate that our approach achieves state-of-the-art performance on challenging datasets, including LFPW, HELEN, 300W and AFLW2000-3D, by leveraging fully convolutional DenseNets, skip-connections and dilated convolutions without further post-processing.
Keywords: dilated convolutions; face landmark detection; fully convolutional DenseNets; skip-connections
Year: 2019 PMID: 31817213 PMCID: PMC6960628 DOI: 10.3390/s19245350
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Figure 1. FC-DenseNet architecture.
Figure 2. Conventional convolution and dilated convolution.
Figure 3. Overview of the proposed approach for facial landmark detection.
Figure 4. Local appearance initialization network.
Architecture of FC-DenseNet56 used in the LAI network.
| Layer | Number of Feature Maps |
|---|---|
| Input | 3 |
| 3 × 3 convolution | 36 |
| DB (4 layers) + TD | 84 |
| DB (4 layers) + TD | 144 |
| DB (4 layers) + TD | 228 |
| DB (4 layers) + TD | 348 |
| DB (4 layers) + TD | 492 |
| DB (4 layers) | 672 |
| DB (4 layers) + TU | 816 |
| DB (4 layers) + TU | 612 |
| DB (4 layers) + TU | 434 |
| DB (4 layers) + TU | 288 |
| DB (4 layers) + TU | 192 |
| 1 × 1 convolution | 68 (keypoints) |
Figure 5. Best viewed in color. Left: Output of FC-DenseNets. Middle: Visualization of the kernel convolution filter. Right: Feature map after applying the filter.
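The filtering step that Figure 5 illustrates can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the 3 × 3 box kernel, the toy 5 × 5 heatmap and the helper names `filter_heatmap` and `argmax_2d` are all assumptions, since the actual kernel is not recoverable from the caption. The sketch only shows how filtering a heatmap can suppress an isolated spurious peak before the landmark position is read off as the maximum.

```python
def filter_heatmap(heatmap, kernel):
    """Same-size 2D cross-correlation with zero padding (hypothetical helper)."""
    h, w = len(heatmap), len(heatmap[0])
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    yy, xx = y + i - ph, x + j - pw
                    # Pixels outside the heatmap count as zero.
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += kernel[i][j] * heatmap[yy][xx]
            out[y][x] = acc
    return out


def argmax_2d(heatmap):
    """Return (row, col) of the largest value in a 2D list."""
    best = (0, 0)
    for y, row in enumerate(heatmap):
        for x, value in enumerate(row):
            if value > heatmap[best[0]][best[1]]:
                best = (y, x)
    return best


# Toy heatmap (made-up values): a broad response around (2, 2)
# and an isolated, slightly stronger spike at (0, 4).
heatmap = [
    [0.0, 0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 0.6, 0.0, 0.0],
    [0.0, 0.6, 0.9, 0.6, 0.0],
    [0.0, 0.0, 0.6, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0],
]

# Assumed kernel: a 3 x 3 box filter, chosen only for illustration.
box_kernel = [[1.0 / 9.0] * 3 for _ in range(3)]

raw_peak = argmax_2d(heatmap)
filtered_peak = argmax_2d(filter_heatmap(heatmap, box_kernel))
```

In this toy case the raw maximum sits on the isolated spike, while the filtered maximum moves to the centre of the broad response.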
Structure of dilated convolutions.
| Filter Size | Dilation Factor | Activation Function |
|---|---|---|
| 3 × 3 | | ReLU |
| 3 × 3 | | ReLU |
| 3 × 3 | | ReLU |
| 3 × 3 | | ReLU |
| 3 × 3 | | ReLU |
| 3 × 3 | | ReLU |
| 3 × 3 | | ReLU |
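To make the difference between conventional and dilated convolution concrete, the following is a minimal one-dimensional sketch. It also shows how the receptive field of a stack of stride-1 layers grows with the dilation factor. The function names, the toy signal and, in particular, the list `hypothetical_dilations` are assumptions for illustration; the dilation factors used here are placeholders and are not taken from the table above.

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """Valid-mode 1D cross-correlation whose taps are `dilation` samples apart."""
    span = (len(kernel) - 1) * dilation  # distance between first and last tap
    out = []
    for start in range(len(signal) - span):
        acc = 0.0
        for k, weight in enumerate(kernel):
            acc += weight * signal[start + k * dilation]
        out.append(acc)
    return out


def receptive_field(kernel_size, dilations):
    """Per-axis receptive field of stacked stride-1 convolutions."""
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf


signal = [float(i) for i in range(10)]
kernel = [1.0, 0.0, -1.0]

# dilation=1 is an ordinary convolution; dilation=2 skips every other sample.
conventional = dilated_conv1d(signal, kernel, dilation=1)
dilated = dilated_conv1d(signal, kernel, dilation=2)

# Seven 3-tap layers with placeholder dilation factors (assumed, not from the source).
hypothetical_dilations = [1, 1, 2, 4, 8, 16, 1]
rf_dilated = receptive_field(3, hypothetical_dilations)
rf_plain = receptive_field(3, [1] * 7)
```

With the same number of 3-tap layers and the same number of weights, the dilated stack in this sketch covers a much wider context than the undilated one, which is the property the refinement subnet relies on.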
Figure 6. Dilated skip convolution network for shape refinement.
The list of face datasets used for training and testing.
| Dataset | Landmarks | Pose range | Images |
|---|---|---|---|
| Training | | | |
| HELEN | 68 | ±45° | 2000 |
| LFPW | 68 | ±45° | 811 |
| 300W | 68 | ±45° | 3148 |
| 300W-LP | 68 | ±90° | 61,225 |
| Testing | | | |
| HELEN | 68 | ±45° | 330 |
| LFPW | 68 | ±45° | 224 |
| 300W | 68 | ±45° | 689 |
| AFLW2000-3D | 68 | ±90° | 2000 |
Figure 7. Cumulative error distribution (CED) curve and area under the curve (AUC).
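For readers unfamiliar with the two quantities named in Figure 7, a minimal sketch of how a CED curve and its AUC are commonly computed is given below. The per-image errors and thresholds are made-up numbers, and the function names `ced_curve` and `auc` are illustrative; the exact thresholds and normalization used for the figure are not stated in this text.

```python
def ced_curve(errors, thresholds):
    """Fraction of images whose error is at or below each threshold."""
    n = len(errors)
    return [sum(1 for e in errors if e <= t) / n for t in thresholds]


def auc(thresholds, fractions):
    """Trapezoidal area under the CED curve, divided by the threshold range."""
    area = 0.0
    for i in range(1, len(thresholds)):
        width = thresholds[i] - thresholds[i - 1]
        area += width * (fractions[i] + fractions[i - 1]) / 2.0
    return area / (thresholds[-1] - thresholds[0])


# Made-up per-image errors and evaluation thresholds, for illustration only.
errors = [2.0, 3.0, 4.0, 9.0]
thresholds = [0.0, 2.5, 5.0, 7.5, 10.0]

fractions = ced_curve(errors, thresholds)
score = auc(thresholds, fractions)
```

A curve that rises faster (more images under a small error) yields a larger AUC, so a higher value indicates better overall accuracy in this convention.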
Mean error on the LFPW dataset.
| Method | 68 pts |
|---|---|
| Zhu et al. | 8.29 |
| DRMF | 6.57 |
| RCPR | 5.67 |
| SDM | 5.67 |
| GN-DPM | 5.92 |
| CFAN | 5.44 |
| CFSS | 4.87 |
| CFSS Practical | 4.90 |
| Ours | 3.52 |
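The tables report a mean error per method. A minimal sketch of one common way to compute such a figure is shown below, assuming the usual normalized mean error: the average point-to-point distance between predicted and ground-truth landmarks, divided by a reference distance and expressed as a percentage. The choice of reference distance (for example an inter-ocular distance) and the function name `normalized_mean_error` are assumptions; the normalization behind the reported numbers is not stated in this text.

```python
import math


def normalized_mean_error(pred, truth, norm_distance):
    """Mean Euclidean landmark error divided by norm_distance, in percent."""
    if len(pred) != len(truth):
        raise ValueError("pred and truth must contain the same number of points")
    total = 0.0
    for (px, py), (tx, ty) in zip(pred, truth):
        total += math.hypot(px - tx, py - ty)
    return 100.0 * total / (len(pred) * norm_distance)


# Two made-up landmarks; the first is off by a 3-4-5 triangle, the second is exact.
truth = [(0.0, 0.0), (10.0, 0.0)]
pred = [(3.0, 4.0), (10.0, 0.0)]

# Assumed reference distance of 50 pixels, for illustration only.
nme = normalized_mean_error(pred, truth, norm_distance=50.0)
```

Here the mean pixel error is 2.5, which becomes 5% after dividing by the assumed reference distance.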
Mean error on HELEN dataset.
| Method | 68 pts |
|---|---|
| Zhu et al. | 8.16 |
| DRMF | 6.70 |
| ESR | 5.70 |
| RCPR | 5.93 |
| SDM | 5.50 |
| GN-DPM | 5.69 |
| CFAN | 5.53 |
| CFSS | 4.63 |
| CFSS Practical | 4.72 |
| TCDCN | 4.60 |
| Ours | 3.11 |
Mean error on 300W dataset.
| Method | Common | Challenging | Fullset |
|---|---|---|---|
| RCPR | 6.18 | 17.26 | 7.58 |
| SDM | 5.57 | 15.40 | 7.50 |
| LBF | 4.95 | 11.98 | 6.32 |
| CFSS | 4.73 | 9.98 | 5.76 |
| CFSS Practical | 4.79 | 10.92 | 5.99 |
| RAR | 4.12 | 8.35 | 4.94 |
| 3DDFA | 6.15 | 10.59 | 7.01 |
| DeFA | 5.37 | 9.38 | 6.10 |
| CPM | 3.39 | 8.14 | 4.36 |
| Ours | 3.60 | 8.69 | 3.90 |
Figure 8. Landmark detection examples from the 300W dataset.
Mean error on the AFLW2000-3D dataset.
| Method | 68 pts |
|---|---|
| ESR | 7.99 |
| RCPR | 7.80 |
| MDM | 6.41 |
| SDM | 6.12 |
| 3DDFA | 5.42 |
| 3DSTN | 4.49 |
| DeFA | 4.50 |
| Ours | 4.04 |
Figure 9. Landmark detection examples from the AFLW2000-3D dataset.