Epameinondas Antonakos1, Patrick Snape1, Grigorios G Chrysos1, Akshay Asthana2, Stefanos Zafeiriou1,3.
Abstract
Recently, technologies such as face detection, facial landmark localisation and face recognition and verification have matured enough to provide effective and efficient solutions for imagery captured under arbitrary conditions (referred to as "in-the-wild"). This is partially attributed to the fact that comprehensive "in-the-wild" benchmarks have been developed for face detection, landmark localisation and recognition/verification. A very important technology that has not yet been thoroughly evaluated is deformable face tracking "in-the-wild". Until now, performance has mainly been assessed qualitatively, by visually inspecting the result of a deformable face tracking technology on short videos. In this paper, we perform the first, to the best of our knowledge, thorough evaluation of state-of-the-art deformable face tracking pipelines using the recently introduced 300 VW benchmark. We evaluate many different architectures, focusing mainly on the task of on-line deformable face tracking. In particular, we compare the following general strategies: (a) generic face detection plus generic facial landmark localisation; (b) generic model free tracking plus generic facial landmark localisation; and (c) hybrid approaches combining state-of-the-art face detection, model free tracking and facial landmark localisation technologies. Our evaluation reveals future avenues for further research on the topic.
Keywords: Deformable face tracking; Face detection; Facial landmark localisation; Long-term tracking; Model free tracking
Year: 2017 PMID: 31983805 PMCID: PMC6953975 DOI: 10.1007/s11263-017-0999-5
Source DB: PubMed Journal: Int J Comput Vis ISSN: 0920-5691 Impact factor: 7.410
Fig. 1 Overview of the standard approaches for deformable face tracking. (Top) Face detection is applied independently at each frame of the video, followed by facial landmark localisation. (Bottom) Model free tracking is employed, initialised with the bounding box of the face at the first frame, followed by facial landmark localisation
Fig. 4 This figure gives a diagram of the reinitialisation scheme proposed in Sect. 4.4. Specifically, in case the face detector does not return a bounding box for a frame, the bounding box of the previous frame is used as a successful detection for the missing frame
Fig. 8 This figure gives a diagram of the reinitialisation scheme proposed in Sect. 4.6 for tracking with failure detection. For all frames after the first, the result of the current landmark localisation is used to decide whether or not a face is still being tracked. If the classification fails, a re-detection is performed and the tracker is reinitialised with the bounding box returned by the detector
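The two pipelines of Fig. 1 can be sketched as the following loops. Everything here is a hypothetical placeholder, not the paper's implementation: `detect_face`, `localise_landmarks` and `make_tracker` stand in for any of the detectors, landmark localisers and model free trackers evaluated below.

```python
def track_by_detection(frames, detect_face, localise_landmarks):
    """Top pipeline: detect the face independently in every frame,
    then fit landmarks inside each returned bounding box."""
    results = []
    for frame in frames:
        box = detect_face(frame)  # may return None on a missed detection
        results.append(None if box is None else localise_landmarks(frame, box))
    return results


def track_model_free(frames, init_box, make_tracker, localise_landmarks):
    """Bottom pipeline: initialise a model free tracker with the
    first-frame bounding box, then fit landmarks in each tracked box."""
    tracker = make_tracker(frames[0], init_box)
    results = [localise_landmarks(frames[0], init_box)]
    for frame in frames[1:]:
        box = tracker.update(frame)  # tracker proposes the next box
        results.append(localise_landmarks(frame, box))
    return results
```

The key difference is that the top pipeline has no temporal state (a miss in one frame does not affect the next), while the bottom one depends entirely on the first-frame initialisation and can drift.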
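The carry-over rule of Fig. 4 amounts to a one-line fallback in the detection loop; a minimal sketch, where `detect_face` is again a hypothetical placeholder callable:

```python
def detect_with_carry_over(frames, detect_face):
    """Sketch of the Sect. 4.4 scheme: when the detector returns no
    bounding box for a frame, the box of the previous frame is reused
    as if it were a successful detection."""
    boxes, previous = [], None
    for frame in frames:
        box = detect_face(frame)
        if box is None:
            box = previous  # carry the last successful box forward
        boxes.append(box)
        previous = box
    return boxes
```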
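The failure-checking loop of Fig. 8 can be sketched as follows. All callables (`make_tracker`, `localise`, `looks_like_face`, `detect_face`) are hypothetical placeholders; in particular `looks_like_face` stands in for whatever classifier judges the current landmark fit.

```python
def track_with_failure_check(frames, init_box, make_tracker,
                             localise, looks_like_face, detect_face):
    """Sketch of the Sect. 4.6 scheme: after every localisation, a
    classifier decides whether a face is still being tracked; on
    failure, the face is re-detected and the tracker reinitialised
    with the detector's bounding box."""
    tracker = make_tracker(frames[0], init_box)
    shapes = [localise(frames[0], init_box)]
    for frame in frames[1:]:
        box = tracker.update(frame)
        shape = localise(frame, box)
        if not looks_like_face(frame, shape):       # failure detected
            box = detect_face(frame)                # re-detect the face
            if box is not None:
                tracker = make_tracker(frame, box)  # reinitialise tracker
                shape = localise(frame, box)
        shapes.append(shape)
    return shapes
```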
The set of detectors used in this paper
| Method | Citation(s) | Rigid template | DPM | Implementation |
|---|---|---|---|---|
| DPM | Felzenszwalb et al.; Mathias et al. |  | ✓ | Alabort-i-Medina et al. |
| HR-TF | Hu and Ramanan |  |  |  |
| MTCNN | Zhang et al. |  |  |  |
| NPD | Liao et al. | ✓ |  |  |
| SS-DPM | Zhu and Ramanan |  | ✓ |  |
| SVM+HOG | King | ✓ |  | King |
| VJ | Viola and Jones | ✓ |  | Bradski |
| VPHR | Kumar et al. |  |  |  |
The table reports the short name of the method, the relevant citation(s), as well as the implementation used
The set of trackers that are used in this paper
| Method | Citation(s) | D | G | P | K | NN | Implementation |
|---|---|---|---|---|---|---|---|
| CAMSHIFT | Bradski |  |  |  |  |  |  |
| CCOT | Danelljan et al. |  |  |  |  |  |  |
| CMT | Nebehay and Pflugfelder |  |  |  |  |  |  |
| DF | Sevilla-Lara and Learned-Miller |  |  |  |  |  |  |
| DLSSVM | Ning et al. |  |  |  |  |  |  |
| DSST | Danelljan et al. |  |  |  |  |  | King |
| FCT | Zhang et al. |  |  |  |  |  |  |
| HDT | Qi et al. |  |  |  |  |  |  |
| IVT | Ross et al. |  |  |  |  |  |  |
| KCF | Henriques et al. |  |  |  |  |  |  |
| LCT | Ma et al. |  |  |  |  |  |  |
| LRST | Zhang et al. |  |  |  |  |  |  |
| MDNET | Nam and Han |  |  |  |  |  |  |
| MEEM | Zhang et al. |  |  |  |  |  |  |
| MIL | Babenko et al. |  |  |  |  |  | Bradski |
| ORIA | Wu et al. |  |  |  |  |  |  |
| PF | Isard and Blake |  |  |  |  |  |  |
| RPT | Li et al. |  |  |  |  |  |  |
| SIAM-OXF | Bertinetto et al. |  |  |  |  |  |  |
| SPOT | Zhang and van der Maaten |  |  |  |  |  |  |
| SPT | Yang et al. |  |  |  |  |  |  |
| SRDCF | Danelljan et al. |  |  |  |  |  |  |
| STAPLE | Bertinetto et al. |  |  |  |  |  |  |
| STCL | Zhang et al. |  |  |  |  |  |  |
| STRUCK | Hare et al. |  |  |  |  |  |  |
| TGPR | Gao et al. |  |  |  |  |  |  |
| TLD | Kalal et al. |  |  |  |  |  |  |
The table reports the short name of the method, the relevant citation(s), as well as the implementation used. The initials stand for: (D)iscriminative, (G)enerative, (P)art-based, (K)eypoint trackers, and NN for trackers that employ neural networks
The landmark localisation methods employed in this paper
| Method | Citation(s) | Discriminative | Generative | Implementation |
|---|---|---|---|---|
| AAM | Tzimiropoulos |  | ✓ | Alabort-i-Medina et al. |
| ERT | Kazemi and Sullivan | ✓ |  | King |
| CFSS | Zhu et al. | ✓ |  |  |
| SDM | Xiong and De la Torre | ✓ |  | Alabort-i-Medina et al. |
The table reports the short name of the method, the relevant citation(s), as well as the implementation used
The set of experiments conducted in this paper
| Experiment | Section | Tracking | Detection | Landmark localisation | Failure checking | Re-initialisation | Kalman Smoothing |
|---|---|---|---|---|---|---|---|
| 1 | 4.3 |  | ✓ | ✓ |  |  |  |
| 2 | 4.4 |  | ✓ | ✓ |  | ✓ |  |
| 3 | 4.5 | ✓ |  | ✓ |  |  |  |
| 4 | 4.6 | ✓ |  | ✓ | ✓ | ✓ |  |
| 5 | 4.7 | ✓ |  | ✓ | ✓ | ✓ | ✓ |
| 6 |  | Comparison against the state of the art of the 300 VW competition (Shen et al. 2015) |  |  |  |  |  |
This table provides an overview of the battery of experiments that were conducted, as well as a reference to the relevant sections
Fig. 2 Example frames from the 300 VW dataset by Shen et al. (2015). Each row contains 10 exemplar images from one category, indicative of the challenges that characterise the videos of that category. a Category 1. b Category 2. c Category 3
Exemplar deformable tracking results indicative of the fitting quality corresponding to each error value, for all video categories
The area under the curve (AUC) and failure rate for all the experiments are computed based on the cumulative error distribution (CED) curves, limited at a maximum error of 0.08
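The two summary statistics above can be computed from a set of per-frame errors as follows. This is a sketch of the metric as described in the text (CED truncated at 0.08, area normalised to [0, 1], failure rate as the fraction of frames above the cut-off), not the authors' exact evaluation code.

```python
import numpy as np

def auc_and_failure_rate(errors, max_error=0.08, steps=1000):
    """AUC of the cumulative error distribution (CED) truncated at
    `max_error`, normalised to [0, 1], and the failure rate, i.e. the
    fraction of frames whose error exceeds `max_error`."""
    errors = np.asarray(errors, dtype=float)
    thresholds = np.linspace(0.0, max_error, steps)
    # CED value at each threshold: fraction of frames with error <= t.
    ced = np.array([(errors <= t).mean() for t in thresholds])
    auc = float(ced.mean())  # rectangle approximation of the normalised area
    failure_rate = float((errors > max_error).mean())
    return auc, failure_rate
```

A perfect tracker (all errors at zero) yields an AUC of 1.0 and a failure rate of 0.0; frames whose error exceeds the cut-off drag both numbers down.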
Results for experiment 1 of Sect. 4.3 (detection + landmark localisation) (Color table online)
The area under the curve (AUC) and failure rate are reported. The top four performing curves are highlighted for each video category. The current implementation of HR-TF cannot be executed in CPU mode, so it would be unfair to the rest of the timing comparisons to include its GPU performance
Fig. 3 Results for experiment 1 of Sect. 4.3 (detection + landmark localisation). The top five performing curves are highlighted in each legend. Please see Table 6 for a full summary
Results for experiment 2 of Sect. 4.4 (detection + landmark localisation + initialisation from previous frame) (Color table online)
The area under the curve (AUC) and failure rate are reported. The top four performing curves are highlighted for each video category
Fig. 5 Results for experiment 2 of Sect. 4.4 (detection + landmark localisation + initialisation from previous frame). The top five performing curves are highlighted in each legend. Please see Table 7 for a full summary
Fig. 6 Results for experiment 2 of Sect. 4.4 (detection + landmark localisation + initialisation from previous frame). These results show the effect of initialisation from the previous frame, in comparison to missing detections. The top three performing results are given in red, green and blue, respectively, and the top three most improved are given in cyan, yellow and brown, respectively. The dashed lines represent the results before the reinitialisation strategy is applied, solid lines are after (Color figure online)
Results for experiment 3 of Sect. 4.5 (model free tracking + landmark localisation) (Color table online)
Fig. 7 Results for experiment 3 of Sect. 4.5 (model free tracking + landmark localisation). The top five performing curves are highlighted in each legend. Please see Table 8 for a full summary
Results for experiment 4 of Sect. 4.6 (model free tracking + landmark localisation + failure checking) (Color table online)
The area under the curve (AUC) and failure rate are reported. The top three performing curves are highlighted for each video category
Fig. 9 Results for experiment 4 of Sect. 4.6 (model free tracking + landmark localisation + failure checking). The top five performing curves are highlighted in each legend. Please see Table 9 for a full summary
Fig. 10 Results for experiment 4 of Sect. 4.6 (model free tracking + landmark localisation + failure checking). These results show the effect of the failure checking, in comparison to tracking alone. The results are coloured by their performance: red, green, blue and orange, respectively. The dashed lines represent the results before the reinitialisation strategy is applied, solid lines are after (Color figure online)
Results for experiment 5 of Sect. 4.7 (Kalman Smoothing) (Color table online)
The area under the curve (AUC) and failure rate are reported. The top four performing curves are highlighted for each video category
Fig. 11 Results for experiment 5 of Sect. 4.7 (Kalman Smoothing). The top five performing curves are highlighted in each legend. Please see Table 10 for a full summary
Fig. 12 Results for experiment 5 of Sect. 4.7 (Kalman Smoothing). These results show the effect of Kalman smoothing on the final landmark localisation results. The top three performing results are given in red, green and blue, respectively, and the top three most improved are given in cyan, yellow and brown, respectively. The dashed lines represent the results before the smoothing is applied, solid lines are after (Color figure online)
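Temporal smoothing of a landmark trajectory, as in experiment 5, can be illustrated with a per-coordinate constant-position Kalman filter. This is only a minimal sketch of the idea: the paper's exact state model and noise parameters are not given here, and `q` and `r` are arbitrary illustrative values.

```python
import numpy as np

def kalman_smooth_1d(z, q=1e-3, r=1e-2):
    """Filter one landmark coordinate over time with a constant-position
    Kalman filter. z: sequence of raw measurements; q: process noise
    variance; r: measurement noise variance."""
    x, p = float(z[0]), 1.0  # state estimate and its variance
    out = [x]
    for zk in z[1:]:
        p = p + q              # predict: variance grows by process noise
        k = p / (p + r)        # Kalman gain
        x = x + k * (zk - x)   # update towards the new measurement
        p = (1.0 - k) * p
        out.append(x)
    return np.array(out)
```

In a pipeline, each of the 2 × 68 landmark coordinates would be smoothed independently over the video, damping frame-to-frame jitter at the cost of a slight lag behind fast motion.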
Comparison between the best methods of Sects. 4.3–4.7 and the participants of the 300 VW challenge by Shen et al. (2015) (Color table online)
The area under the curve (AUC) and failure rate are reported. The top five performing curves are highlighted for each video category
Fig. 13 Comparison between the best methods of Sects. 4.3–4.7 and the participants of the 300 VW challenge by Shen et al. (2015). The top five methods are shown and are coloured red, blue, green, orange and purple, respectively. Please see Table 11 for a full summary (Color figure online)