Abstract
With the advent of deep learning and readily available graphics processing units, 3D face reconstruction is the most captivating topic in biometrics. This paper explores the various aspects of 3D face reconstruction techniques. Five techniques are discussed: deep learning, epipolar geometry, one-shot learning, 3D morphable models, and shape from shading. The paper provides an in-depth analysis of 3D face reconstruction using deep learning techniques. The performance of the different face reconstruction techniques is analysed in terms of software, hardware, pros, and cons. The challenges and future scope of 3D face reconstruction techniques are also discussed.
Year: 2022 PMID: 35035213 PMCID: PMC8744573 DOI: 10.1007/s11831-021-09705-4
Source DB: PubMed Journal: Arch Comput Methods Eng ISSN: 1134-3060 Impact factor: 8.171
Fig. 1 Number of research papers published in 3D face reconstruction from 2016–2021
Fig. 2 3D face images: a RGB Image, b Depth Image, c Mesh Image, d Point Cloud Image, e Voxel Image
Fig. 3 Taxonomy of 3D face reconstruction
Fig. 4 General Framework of 3D Face Reconstruction Problem [9]
Fig. 5 Word cloud of 3D face reconstruction literature
Fig. 6 3D Face Reconstruction Techniques
Fig. 7 Progressive improvement in 3DMM over last twenty years [17]
Fig. 8 Phases of 3D face recognition using restoration [9]
Fig. 9 a Epipolar Plane Images corresponding to 3D face curves, b horizontal EPI, and c vertical EPI [48]
Fig. 10 General framework of one-shot learning-based 3D face reconstruction
Fig. 11 3D face shape recovery a 2D image, b 3D depth image, c Texture projection, and d Albedo histogram [59]
Comparative analysis of 3D facial reconstruction techniques
| Reference | Year | Approaches/Models used | Is Face Alignment Done? | Convergence factor | Is Deep Learning Done? | Synthetic Data used? |
|---|---|---|---|---|---|---|
| [ | 2016 | SFS with landmarks and deep learning | No | MSE | Yes | Yes |
| [ | 2017 | 3DMM and SFS, learning cascaded regression | Yes | Mean Absolute Error (MAE) | No | Yes |
| [ | 2017 | CNN and Coarse-to-fine details | No | MSE | Yes | Yes |
| [ | 2017 | Volumetric regression networks (Multitask and Guided) | Yes | Normalized Mean Error (NME) | Yes | No |
| [ | 2017 | Auto encoder-based CNN | Yes | Geometric, Photometric, and Landmark error | Yes | No |
| [ | 2017 | DNN architecture | No | Root Mean Square Error (RMSE) | Yes | No |
| [ | 2017 | CNN based deep regression network | No | Mean error | Yes | Yes |
| [ | 2017 | Automatic reconstruction of a human face and 3D epipolar geometry | Yes | Mean and Standard Deviation | No | No |
| [ | 2018 | 3D deep feature vector and 3D augmentation of faces | Yes | Cumulative Matching Characteristic (CMC) and Receiver Operating Characteristic (ROC) curve | Yes | Yes |
| [ | 2018 | Coarse face modeling, Medium face modeling, and Fine face modeling | Yes | RMSE | No | No |
| [ | 2018 | Clustering and interpolation-based reconstruction | No | Error distribution | No | No |
| [ | 2018 | Faster Region-based CNN (RCNN) and reduced tree structure model | No | Moving Least Squares (MLS) | Yes | Yes |
| [ | 2018 | FR3DNet, CNN, Data augmentation. Main objective of the paper is to close the gap between size of 2D and 3D datasets for effective 3DFR | Yes | ROC curve | Yes | Yes |
| [ | 2018 | FaceLFNet. 3DMM with facial geometry. EPIs and CNN | No | RMSE | Yes | No |
| [ | 2018 | Leverages sparse photometric stereo (PS) | No | Average geometric error for reconstruction | No | Yes |
| [ | 2018 | Deep convolution encoder-decoder | No | ROC curve | No | No |
| [ | 2018 | 3D dense face reconstruction algorithm. 3DMM-CNN | No | RMSE | Yes | No |
| [ | 2018 | UV position maps | Yes | NME | Yes | No |
| [ | 2018 | Encoder-decoder network | No | RMSE | Yes | No |
| [ | 2018 | PIFR based 3DMM | Yes | Mean Euclidean metric (MEM) | No | No |
| [ | 2019 | 3DMM. Cascaded regression | No | RMSE and MAE | No | No |
| [ | 2019 | Blended model | Yes | Structural Similarity Index Metric (SSIM) | No | No |
| [ | 2019 | MobileNet CNN | No | Area Under the Curve (AUC) | Yes | No |
| [ | 2019 | Deep learning model | Yes | NME | Yes | Yes |
| [ | 2019 | 3DMM fitting based on GANs and a differentiable renderer | No | Mean and Standard Deviation | Yes | No |
| [ | 2019 | CNNs | Yes | RMSE | Yes | No |
| [ | 2019 | Inverse 3DMM GAN model | Yes | Peak Signal to Noise Ratio (PSNR) and SSIM | Yes | Yes |
| [ | 2019 | Siamese network-based CNN model | Yes | ROC curve | Yes | No |
| [ | 2019 | GAN | No | Wasserstein GAN (W-GAN) adversarial loss | Yes | Yes |
| [ | 2019 | Self-supervised 3DMM encoder | Yes | RMSE | No | No |
| [ | 2019 | Encoder-decoder framework | Yes | PSNR and SSIM | Yes | Yes |
| [ | 2019 | Neural rendering | No | RMSE | Yes | Yes |
| [ | 2020 | Blended model | No | Pearson correlation and MSE | Yes | Yes |
| [ | 2020 | SymmFCNet | Yes | PSNR, SSIM, Identity Distance, and Perceptual Similarity | Yes | No |
| [ | 2020 | Deep learning-based technique | Yes | MSE | Yes | Yes |
| [ | 2020 | 2D-Assisted Self Supervised Learning (2DASL) | Yes | NME | Yes | Yes |
| [ | 2020 | VAE-GAN | Yes | Normalized dense vector error | Yes | No |
| [ | 2020 | Summation model and cascaded regression | Yes | MAE and NME | No | No |
| [ | 2020 | Graph convolutional networks | No | W-GAN adversarial loss | Yes | No |
| [ | 2020 | PCA model | No | Adversarial loss, Bidirectional cycle-consistency loss, Cross-domain character loss, and User control loss | Yes | Yes |
| [ | 2020 | Generation of per-pixel diffuse and specular components | No | PSNR | Yes | Yes |
| [ | 2020 | Parametric model based on vertex deformation space | Yes | Cumulative error distribution | Yes | Yes |
| [ | 2020 | DiscoFaceGAN | Yes | Adversarial loss, Imitative loss, and Contrastive loss | Yes | Yes |
| [ | 2020 | Joint 2D and 3D metaheuristic method | Yes | 3D Root Mean Square Error (3DRMSE) | No | No |
| [ | 2020 | End-to-end deep learning framework | Yes | NME and AUC | Yes | No |
| [ | 2020 | MGCNet | Yes | RMSE | No | No |
| [ | 2020 | 3D morphable model-based Pixel-3DM | Yes | NME and RMSE | No | Yes |
| [ | 2020 | Attention Guided Generative Adversarial Networks (AGGAN) | Yes | Intersection-over-Union (IoU) and Cross-Entropy Loss (CE) | Yes | No |
| [ | 2020 | GAN | Yes | Adversarial loss | Yes | Yes |
| [ | 2020 | Variational autoencoder, bi-LSTM, and triplet loss training | No | MAE | Yes | No |
| [ | 2020 | Deep learning process, game-theory based generator and discriminator | No | MAE | Yes | No |
| [ | 2021 | Variational autoencoders and triplet loss training | No | MAE | Yes | No |
| [ | 2021 | Single shot learning based weakly supervised multi-face reconstruction technique | Yes | NME | Yes | No |
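
Most of the convergence factors listed above are distance-based error measures. As an illustration only, the following Python sketch shows one common way to compute MAE, MSE, RMSE, and NME between a reconstructed and a ground-truth set of corresponding 3D points. The function names, the use of per-point Euclidean distance, and the normalising length passed to `nme` are assumptions made for this example; the surveyed papers may define these measures differently (for instance per coordinate, or normalised by bounding-box size or by interocular distance).

```python
import math


def per_point_errors(pred, truth):
    """Euclidean distance between each pair of corresponding 3D points."""
    if len(pred) != len(truth) or not pred:
        raise ValueError("point sets must be non-empty and of equal length")
    return [math.dist(p, t) for p, t in zip(pred, truth)]


def mae(pred, truth):
    """Mean absolute error: average per-point distance (assumed definition)."""
    errs = per_point_errors(pred, truth)
    return sum(errs) / len(errs)


def mse(pred, truth):
    """Mean squared error: average squared per-point distance."""
    errs = per_point_errors(pred, truth)
    return sum(e * e for e in errs) / len(errs)


def rmse(pred, truth):
    """Root mean square error: square root of the MSE."""
    return math.sqrt(mse(pred, truth))


def nme(pred, truth, norm_length):
    """Normalised mean error: mean per-point distance divided by a
    normalising length chosen by the caller (hypothetical parameter)."""
    if norm_length <= 0:
        raise ValueError("norm_length must be positive")
    return mae(pred, truth) / norm_length


if __name__ == "__main__":
    # Two toy vertices; the second one is displaced by 2 units along z.
    predicted = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    ground_truth = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]
    print(mae(predicted, ground_truth), rmse(predicted, ground_truth))
```

Because RMSE squares the distances before averaging, it penalises a few large vertex errors more heavily than MAE does, which is one reason the two measures can rank methods differently.
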
Comparison of 3D face reconstruction techniques in terms of pros and cons
| References | Year | Pros | Cons |
|---|---|---|---|
| [ | 2016 | Efficient under various illumination conditions and expressions; generalises well on the synthetic data | Fails to reconstruct new faces in the test set; fails on facial attributes not present in synthetic data |
| [ | 2017 | No annotation needed for invisible landmarks | The facial texture quality is low; unable to apply in real-world scenarios |
| [ | 2017 | Fine details such as wrinkles can be reconstructed | Fails to generalise on facial features not in training data; dependency on synthetic data |
| [ | 2017 | Reconstruction is done using a single 2D image | Lack of facial alignment leads to the generation of almost identical faces after reconstruction |
| [ | 2017 | End-to-end deep learning | Fails under occlusion such as a beard or external object; shrunken reconstruction |
| [ | 2017 | Simplified framework with end-to-end model instead of iterative model | Dependency on synthetic data |
| [ | 2017 | Free-hand sketch of the face is converted to a 3D caricature model | Generates unnatural results when exaggerations in expression and shape are inconsistent |
| [ | 2017 | Model training can be done on mobile; a general variational segmentation model is proposed to generalise the glasses | Occlusion-invariant only on glasses |
| [ | 2018 | Transfer Learning-based model is faster to train | Loses 3D information while converting 3D point cloud image to 2.5D |
| [ | 2018 | 3D face reconstructed from single 2D image | SFS technique depends on pre-defined knowledge about facial geometry, such as facial symmetry |
| [ | 2018 | 100 K point clouds are generated with a single shot of RGB-D sensor; low-cost method | External hardware is required |
| [ | 2018 | 3D component-based approach requires only a few 3D models; CNN based models can easily handle in-the-wild characteristics | 3D component-based approach does not generalise well; CNN based approach requires a large dataset for training |
| [ | 2018 | Deep CNN model trained on 3.1 M 3D faces of 100 K subjects | Training a CNN from scratch is time-consuming |
| [ | 2018 | Method generalises well; this model-free approach is a superior choice in medical applications | A huge number of epipolar plane image curves is required |
| [ | 2018 | High-quality 3D faces are generated with fine details | Dependent on light falling on the face |
| [ | 2018 | Bump map regression takes 0.03 s/image; face segmentation requires 0.02 s/image | Unoptimised soft symmetry implementation takes 50 s/image |
| [ | 2018 | 3D face reconstruction from single 2D image | Texture based reconstruction technique takes high computational time |
| [ | 2018 | Size of the model is 160 MB in contrast to 1.5 GB of Volumetric Regression Network; the UV position map can generalise well | Unable to apply in real-world scenarios |
| [ | 2018 | It is a lightweight network | The joint loss function affects the quality of face shapes |
| [ | 2018 | Good reconstruction invariant of poses | The reconstruction needs improvement for large poses |
| [ | 2019 | The reconstruction technique is robust to light variation | Fewer landmarks are available; hence landmark displacement features can be improved |
| [ | 2019 | Good generalisation through UV map based on feature points | Dependent on a database with good head shapes; due to this, the overall head shape lacks good quality |
| [ | 2019 | Fast training on mobile devices; real-time application | Annotation of 3D faces using morphable model is costly at a pre-processing stage of proposed MobileNet |
| [ | 2019 | Rendering of input image to multiple view angles; 3D shape is reconstructed from a 2D image | Rigid-body transformation is used for pre-processing |
| [ | 2019 | High texture 3D images are generated using UV maps with GANs | GANs are hard to train; cannot be applied in real-time solutions |
| [ | 2019 | Large pose and occlusion invariant; weakly supervised learning | Confidence of model is low on occlusion during prediction |
| [ | 2019 | Synthetic faces under occlusion images generated with semantic mapping to facial landmarks | Multiple discriminators add to the model complexity; GANs are hard to train |
| [ | 2019 | Siamese CNN based reconstruction has achieved at-par normalised mean error when compared to 3D dense face alignment method | Face recognition has not been tested in-the-wild; the number of images in the training set is low |
| [ | 2019 | High-quality 3D faces are generated with fine details | GANs are hard to train; cannot be applied in real-time solutions |
| [ | 2019 | Faces are generated with good quality under normal occlusion; details of face are captured using UV space | Model fails on extreme occlusion, expression, and large pose |
| [ | 2019 | Face deblurring is done over video handling challenge of pose variation | High computational cost |
| [ | 2019 | Technique is fast to train as it depends on transfer learning and not training from scratch | Video quality needs to be improved for real-world applications |
| [ | 2020 | Works for in-the-wild face datasets | 4DFAB dataset is not publicly available |
| [ | 2020 | Missing pixels are regenerated using a generative model | Dependency on multiple networks |
| [ | 2020 | 2D image as input can be converted to caricature of 3D face model | Caricature quality is affected when occlusions such as eyeglasses exist; not invariant to lighting conditions |
| [ | 2020 | Works for in-the-wild 2D faces along with the noisy landmarks; self-supervised learning | Dependency on 2D-to-3D landmarks annotation |
| [ | 2020 | 3D GAN method for generation and translation of 3D face; high-frequency details are preserved | GANs are hard to train; cannot be applied in real-time 3D face solutions |
| [ | 2020 | 3D face reconstruction from single 2D image; 3D face recognition invariant to pose and expression | Does not include occlusion invariance |
| [ | 2020 | No large-scale dataset is required; detailed and coloured 3D mesh image | Does not work for occluded face regions |
| [ | 2020 | End-to-end deep learning method; 6.1 K 3D meshes of caricature are synthesised; generates high-quality caricatures | Does not work well if the input image is occluded |
| [ | 2020 | Generates high-resolution avatars using GANs | Fails to generate avatars of dark skin subjects; minor alignment errors lead to blur of pore details |
| [ | 2020 | 3D caricature shape is directly built from 2D caricature image | Does not work well if the input image is occluded |
| [ | 2020 | Face generation is precise over expressions, poses, and illumination | Low quality of model is generated under low lighting and extreme poses |
| [ | 2020 | Robust to partial occlusions and extreme poses | Fails when 2D and 3D landmarks are wrongly estimated for occlusion |
| [ | 2020 | Trained through in-the-wild videos; generates high-quality 3D face and facial motion transfer from one person to another | Does not work well under external occlusion |
| [ | 2020 | Reconstruction is done through an occlusion-aware method | Does not work well under external occlusion such as glasses, hands, etc |
| [ | 2020 | Proposed technique can do 3D face analysis and generation effectively | External occlusions are not considered |
| [ | 2020 | 2.5D to 3D face mapping is done with attention-based GAN; handles a wide range of head poses and expressions | Unable to fully reconstruct the facial expression in case of big open mouth |
| [ | 2020 | The CNN-based model learns head-geometries even without ground-truth data | Unable to handle large pose variations |
| [ | 2020 | The mirroring technique is faster for reconstruction as compared to deep learning-based methods; the pre-processing time, reconstruction time, and verification time are faster in computation as compared to the state-of-the-art methods | Preprocessing does not include facial alignment; the technique has not been tested on a big dataset of 3D faces |
| [ | 2020 | End-to-end deep learning technique; occlusion invariant reconstruction is done | Not tested for being scalable |
| [ | 2021 | One-shot learning-based 3D face restoration technique; landmarks based reconstruction is faster as compared to mesh-based reconstruction using the mirroring technique | Facial alignment is missing; the maximum size of the dataset is 4666 3D images; deep learning model can be trained better if the dataset is huge |
| [ | 2021 | A single network model for multi-face reconstruction; faster pre-processing for feature extraction | Low facial texture; multiple GPUs required |
Evaluation of 3D face restoration techniques in terms of performance measures
| References | Year | Adversarial Loss | AUC | CE | IoU | MAE | MEM | MSE | NME | PSNR | RMSE | ROC | SSIM | Other |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| [ | 2016 | × | × | × | × | × | × | ✓ | × | × | × | × | × | × |
| [ | 2017 | × | × | × | × | ✓ | × | × | × | × | × | × | × | × |
| [ | 2017 | × | × | × | × | × | × | ✓ | × | × | × | × | × | × |
| [ | 2017 | × | × | × | × | × | × | × | ✓ | × | × | × | × | × |
| [ | 2017 | × | × | × | × | × | × | × | × | × | × | × | × | ✓ |
| [ | 2017 | × | × | × | × | × | × | × | × | × | ✓ | × | × | × |
| [ | 2017 | × | × | × | × | × | × | × | × | × | × | × | × | ✓ |
| [ | 2017 | × | × | × | × | × | × | × | × | × | × | × | × | ✓ |
| [ | 2018 | × | × | × | × | × | × | × | × | × | × | ✓ | × | ✓ |
| [ | 2018 | × | × | × | × | × | × | × | × | × | ✓ | × | × | × |
| [ | 2018 | × | × | × | × | × | × | × | × | × | × | × | × | ✓ |
| [ | 2018 | × | × | × | × | × | × | × | × | × | × | × | × | ✓ |
| [ | 2018 | × | × | × | × | × | × | × | × | × | × | ✓ | × | × |
| [ | 2018 | × | × | × | × | × | × | × | × | × | ✓ | × | × | × |
| [ | 2018 | × | × | × | × | × | × | × | × | × | × | × | × | ✓ |
| [ | 2018 | × | × | × | × | × | × | × | × | × | × | ✓ | × | × |
| [ | 2018 | × | × | × | × | × | × | × | × | × | ✓ | × | × | × |
| [ | 2018 | × | × | × | × | × | × | × | ✓ | × | × | × | × | × |
| [ | 2018 | × | × | × | × | × | × | × | × | × | ✓ | × | × | × |
| [ | 2018 | × | × | × | × | × | ✓ | × | × | × | × | × | × | × |
| [ | 2019 | × | × | × | × | ✓ | × | × | × | × | ✓ | × | × | × |
| [ | 2019 | × | × | × | × | × | × | × | × | × | × | × | ✓ | × |
| [ | 2019 | × | ✓ | × | × | × | × | × | × | × | × | × | × | × |
| [ | 2019 | × | × | × | × | × | × | × | ✓ | × | × | × | × | × |
| [ | 2019 | × | × | × | × | × | × | × | × | × | × | × | × | ✓ |
| [ | 2019 | × | × | × | × | × | × | × | × | × | ✓ | × | × | × |
| [ | 2019 | × | × | × | × | × | × | × | × | ✓ | × | × | ✓ | × |
| [ | 2019 | × | × | × | × | × | × | × | × | × | × | ✓ | × | × |
| [ | 2019 | ✓ | × | × | × | × | × | × | × | × | × | × | × | × |
| [ | 2019 | × | × | × | × | × | × | × | × | × | ✓ | × | × | × |
| [ | 2019 | × | × | × | × | × | × | × | × | ✓ | × | × | ✓ | × |
| [ | 2019 | × | × | × | × | ✓ | × | × | × | × | × | × | × | ✓ |
| [ | 2020 | × | × | × | × | × | × | ✓ | × | × | × | × | × | × |
| [ | 2020 | × | × | × | × | × | × | × | × | ✓ | × | × | ✓ | × |
| [ | 2020 | × | × | × | × | × | × | ✓ | × | × | × | × | × | × |
| [ | 2020 | × | × | × | × | × | × | × | ✓ | × | × | × | × | × |
| [ | 2020 | × | ✓ | × | × | × | × | × | × | × | × | × | × | ✓ |
| [ | 2020 | × | × | × | × | ✓ | × | × | ✓ | × | × | × | × | × |
| [ | 2020 | ✓ | × | × | × | × | × | × | × | × | × | × | × | × |
| [ | 2020 | ✓ | × | × | × | × | × | × | × | × | × | × | × | × |
| [ | 2020 | × | × | × | × | × | × | × | × | ✓ | × | × | × | × |
| [ | 2020 | × | × | × | × | × | × | × | ✓ | × | × | × | × | × |
| [ | 2020 | ✓ | × | × | × | × | × | × | × | × | × | × | × | × |
| [ | 2020 | × | × | × | × | × | × | × | × | × | ✓ | × | × | × |
| [ | 2020 | × | ✓ | × | × | × | × | × | ✓ | × | × | × | × | × |
| [ | 2020 | × | × | × | × | × | × | × | × | × | ✓ | × | × | × |
| [ | 2020 | × | × | × | × | × | × | × | ✓ | × | ✓ | × | × | × |
| [ | 2020 | × | × | ✓ | ✓ | × | × | × | × | × | × | × | × | × |
| [ | 2020 | ✓ | × | × | × | × | × | × | × | × | × | × | × | × |
| [ | 2020 | × | × | × | × | ✓ | × | × | × | × | × | × | × | ✓ |
| [ | 2020 | ✓ | × | × | × | ✓ | × | × | × | × | × | × | × | × |
| [ | 2021 | × | × | × | × | ✓ | × | × | × | × | × | × | × | ✓ |
| [ | 2021 | × | × | × | × | × | × | × | ✓ | × | × | × | × | × |
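
Two of the measures in this table, PSNR and IoU, are not point-distance errors. The sketch below illustrates their standard textbook definitions in plain Python: PSNR for two equally sized grey-level images, and IoU for two sets of occupied voxel coordinates. The function names, the nested-list image layout, and the default peak value of 255 are assumptions for this example and are not taken from any of the surveyed papers.

```python
import math


def psnr(image_a, image_b, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two images given as
    nested lists of pixel intensities (assumed layout)."""
    flat_a = [v for row in image_a for v in row]
    flat_b = [v for row in image_b for v in row]
    if len(flat_a) != len(flat_b) or not flat_a:
        raise ValueError("images must be non-empty and of equal size")
    mean_sq = sum((a - b) ** 2 for a, b in zip(flat_a, flat_b)) / len(flat_a)
    if mean_sq == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mean_sq)


def iou(voxels_a, voxels_b):
    """Intersection-over-Union of two collections of occupied voxel
    coordinates, e.g. (x, y, z) tuples."""
    set_a, set_b = set(voxels_a), set(voxels_b)
    union = set_a | set_b
    if not union:
        return 1.0  # both empty: treated as a perfect match by convention here
    return len(set_a & set_b) / len(union)


if __name__ == "__main__":
    reference = [[0, 0], [0, 0]]
    rendered = [[0, 0], [0, 255]]
    print(psnr(reference, rendered))
    print(iou([(0, 0, 0), (1, 0, 0)], [(1, 0, 0), (2, 0, 0)]))
```

Higher PSNR and higher IoU both indicate closer agreement, whereas for MAE, MSE, RMSE, and NME lower values are better, so the columns of the table are not directly comparable.
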
Detailed description of datasets used
| Dataset Name | Modalities | Total Images | Total Subjects | Emotion Label Availability | Occlusion Label Availability | Publicly Available | Techniques using the dataset |
|---|---|---|---|---|---|---|---|
| Annotated faces-in-the-wild (AFW) [ | 2D + Landmarks | 468 | – | No | No | Yes | [ |
| Annotated facial landmarks in the wild (AFLW) [ | 2D + Landmarks | 25993 | – | No | No | Yes | [ |
| AFLW2000-3D [ | 2D + Landmarks | 2000 | – | No | No | Yes | [ |
| AFLW-LFPA [ | 2D + Landmarks | 1299 | – | No | No | Yes | [ |
| Age Database (AgeDB) [ | 2D + Age | 16488 | 568 | No | No | Yes | [ |
| Basel Face Model (BFM2009) [ | 3D | 200 | – | No | No | Yes | [ |
| Bosphorus [ | 2D, 3D | 4666 | 105 | Yes | Yes | Yes | [ |
| Binghamton University 3D Facial Expression (BU3DFE) [ | 3D | 2500 | 100 | Yes | No | Yes | [ |
| Binghamton University 4D Facial Expression (BU4DFE) [ | 3D + Time | 606 | 101 | Yes | No | Yes | [ |
| Chinese Academy of Sciences Institute of Automation (CASIA–3D) [ | 3D | 4624 | 123 | Yes | No | Yes | [ |
| CASIA-WebFace [ | 2D | 494414 | 10575 | No | No | Yes | [ |
| CelebFaces Attributes Dataset (CelebA) [ | 2D | 202599 | 10177 | Yes | Yes | Yes | [ |
| Celebrities in Frontal-Profile in the Wild (CFP) [ | 2D | 7000 | 500 | No | No | Yes | [ |
| FaceScape [ | 3D | 18760 | 938 | Yes | No | Yes | – |
| Facewarehouse [ | 3D | – | 150 | Yes | No | Yes | [ |
| Face Recognition Grand Challenge (FRGC-v2.0) [ | 3D | 4950 | 466 | Yes | No | Yes | [ |
| GavabDB [ | 3D | 549 | 61 | Yes | No | Yes | [ |
| Helen Facial Feature Dataset (Helen) [ | 2D + Landmarks | 2330 | – | No | No | Yes | [ |
| Hi-Lo [ | 3D | 6000 | – | No | No | Yes | [ |
| IARPA Janus Benchmark A (IJB–A) [ | 2D + Landmarks | 5712 | 500 | No | No | Yes | [ |
| KinectFaceDB [ | 2D, 2.5D, 3D | 936 | 52 | Yes | Yes | Yes | [ |
| Labeled Faces in the Wild (LFW) [ | 2D + Landmarks | 13233 | 5749 | No | No | Yes | [ |
| Labeled Face Parts in the Wild (LFPW) [ | 2D + Landmarks | 3000 | – | No | No | Yes | [ |
| Large Scale 3D Faces in the Wild (LS3D-W) [ | 2D + Landmarks | 230000 | – | No | No | Yes | [ |
| Media Integration and Communication Center Florence (MICC-Florence) [ | 2D, 3D | 53 | 53 | No | No | Yes | [ |
| Notre Dame (ND-2006) [ | 3D | 13450 | 888 | Yes | No | Yes | [ |
| Texas 3D Face Recognition Database (TexasFRD) [ | 2D + Landmarks, 2.5D | 1149 | 105 | No | No | Yes | [ |
| University of Houston Database (UHDB31) [ | 3D | 25872 | 77 | No | No | Yes | [ |
| University of Milano Bicocca 3D Face Database (UMBDB) [ | 2D, 3D | 1473 | 143 | Yes | Yes | Yes | [ |
| Visual Geometric Group Face Dataset (VGG-Face) [ | 2D + Time | 2.6M | 2622 | No | No | Yes | [ |
| VGGFace2 [66] | 2D | 3.31M | 9131 | No | No | Yes | [ |
| VidTIMIT [ | Video + Audio | – | 43 | No | No | Yes | [ |
| VoxCeleb2 [ | 2D + Audio | 1.12M Audios | 6112 | No | No | Yes | [ |
| WebCaricature [ | 2D | 6042 Caricatures + 5974 Images | 252 | No | No | Yes | [ |
| YouTube Faces Database (YTF) [ | 2D + Time | 3425 Videos | 1595 | No | No | Yes | [ |
| 300 Videos in the Wild (300 VW) [ | 2D + Time | 218595 | 300 | No | No | Yes | [ |
| 300 Faces in-the-wild Challenge (300W-3D) [ | 2D + Landmarks | 600 | 300 | No | No | Yes | [ |
| 300 Faces in-the-wild with Large Poses (300W-LP) [ | 2D, 3D | 61225 | – | No | No | Yes | [ |
| 3D Twins Expression Challenge (3D-TEC) [ | 3D | 428 | 214 | Yes | No | Yes | [ |
| 4D Facial Behaviour Analysis for Security (4DFAB) [ | 3D + Time | 1.8M | 180 | Yes | No | No | [ |
Comparative analysis of 3D face reconstruction in terms of technique, hardware, and applications
| References | Year | Technique | Hardware | RAM (GB) | Applications |
|---|---|---|---|---|---|
| [ | 2016 | CNN on synthetic data | GPU | 8 | Real Images |
| [ | 2017 | Cascaded regression | Intel Core i7 | 8 | Real Images |
| [ | 2017 | CNN on synthetic data | Intel Core i7 | 8 | Facial Reenactment |
| [ | 2017 | Direct volumetric CNN regression | GPU | 8 | Real Images |
| [ | 2017 | Unsupervised deep convolutional autoencoder | GPU | 8 | Real Images |
| [ | 2017 | End-to-end Deep Neural Network | GPU | 8 | Real Images |
| [ | 2017 | Deep learning-based sketching | GPU | 8 | Avatar, Cartoon characters |
| [ | 2017 | Glass-based explicit modelling | Mobile Phone | 4 | Real Images |
| [ | 2018 | Deep CNN and 3D augmentation | GPU | 8 | Captured 3D face reconstruction |
| [ | 2018 | Coarse to fine optimization strategy | Intel Core i7 | 8 | Real Images |
| [ | 2018 | RBF interpolation | Intel Core i7 | 8 | 3D films and virtual reality |
| [ | 2018 | Landmark localization and deep CNN | Intel Core i7 | 8 | Real Images |
| [ | 2018 | Deep CNN | GPU | 8 | Real Images |
| [ | 2018 | Epipolar plane images and CNN | GPU | 8 | Real Images |
| [ | 2018 | Sparse photometric stereo | Intel Core i7 | 8 | Semantic labeling of face into fine region |
| [ | 2018 | Bump map estimation with deep convolution autoencoder | GPU | 12 | Real Images |
| [ | 2018 | Morphable model, Basel face model, and Cascaded regression | Intel Core i7 | 8 | Real Images |
| [ | 2018 | Position map regression network | GPU | 8 | Real Images |
| [ | 2018 | Encoder decoder based | GPU | 32 | Real Images |
| [ | 2018 | 3DMM | Intel Core i7 | 8 | Real Images |
| [ | 2019 | Cascaded regression | Intel Core i7 | 8 | Estimation of high-quality 3D face shape from single 2D image |
| [ | 2019 | Best fit blending | Intel Core i7 | 16 | Virtual reality |
| [ | 2019 | CNN regression | GPU | 8 | Real time application |
| [ | 2019 | Unguided volumetric regression network | Intel Core i7 | 8 | Real Images |
| [ | 2019 | GANs and Deep CNNs | GPU | 11 | Image augmentation |
| [ | 2019 | Weakly supervised CNN | GPU | 8 | Real Images |
| [ | 2019 | GANs | GPU | 11 | De-occluded face generation |
| [ | 2019 | Siamese CNN | Intel Core i7 | 8 | Real Images |
| [ | 2019 | GANs | GPU | 4 | Image augmentation |
| [ | 2019 | 3DMM | Intel Core i7 | 8 | Easy combination of multiple face views |
| [ | 2019 | Encoder decoder based | GPU | 8 | Video quality enhancement |
| [ | 2019 | RNN and autoencoder | GPU | 11 | Video avatars, facial reenactment, video dubbing |
| [ | 2020 | Deep neural networks | Intel Core i7 | 8 | Facial affect synthesis of basic expressions |
| [ | 2020 | Symmetry consistent CNN | GPU | 11 | Natural Images |
| [ | 2020 | Deep CNN | GPU | 11 | Expression modeling on caricature |
| [ | 2020 | 2D assisted self-supervised learning | Intel Core i7 | 8 | Real Images |
| [ | 2020 | GANs | GPU | 32 | 3D face augmentation |
| [ | 2020 | Cascaded coupled regression | GPU | 8 | Real Images |
| [ | 2020 | Graph convolutional networks | GPU | 11 | Generate high fidelity 3D face texture |
| [ | 2020 | GANs | GPU | 11 | Animation, 3D printing, virtual reality |
| [ | 2020 | 3DMM and GAN algorithm | Intel Core i7 | 16 | Generation of 4K × 6K 3D face from single 2D face image |
| [ | 2020 | ANN | GPU | 16 | Expression modeling on caricature |
| [ | 2020 | GANs and VAEs | GPU | 8 | Vision and graphics |
| [ | 2020 | Adaptive reweighting based optimization | Intel Core i7 | 8 | Real Images |
| [ | 2020 | 3DMM and blendshapes | GPU | 8 | Personalized reconstruction |
| [ | 2020 | Multiview geometry consistency | Intel Core i7 | 8 | Real Images |
| [ | 2020 | 3DMM | Intel Core i7 | 8 | Expression modeling |
| [ | 2020 | Attention guided GAN | Intel Core i7 | 8 | 2.5D to 3D face generation |
| [ | 2020 | GANs | GPU | 12 | Face animation and reenactment |
| [ | 2020 | Clustering, VAE, BiLSTM, SVM | GPU | 32 | Real Images |
| [ | 2020 | End-to-end deep learning | GPU | 32 | Real Images |
| [ | 2021 | VAE and Triplet Loss | GPU | 32 | Real Images |
| [ | 2021 | Encoder decoder | GPU | 32 | Multiface reconstruction |
Fig. 12 Face puppetry in real-time [129]
Fig. 13 Neural Voice Puppetry [44]
Fig. 14 DeepFake example in 6.S191 [133]
Fig. 15 a Synthesised virtual tattoos [134] and b Augmented reality-based pixel-unit makeup on lips [136]
Fig. 16 FaceForge based live projection mapping [137]
Fig. 17 Projection mapping of a 2D face combined with 3DMM model [24]
Fig. 18 Expression invariant face replacement system [138]
Fig. 19 Transformation of the face using ACGAN [139]
Applications of 3D face reconstruction
| Broad Area | Target Problems | Techniques / Tools | References |
|---|---|---|---|
| Animation | Facial Puppetry | Displaced dynamic expression (DDE) model and dynamic expression model (DEM) | [ |
| Animation | Speech-driven Animation | RNN and Autoencoders | [ |
| Animation | Face Enactment | RNN, GAN, Attention-based CNN | [ |
| Video | Video Dubbing | DeepFake, GANs | [ |
| Video | Face Replacement | CNN-based transfer learning, GANs, Adobe Premiere Elements, Apple Final Cut Pro, Filmora | [ |
| 3D Face | Face Aging | GANs | [ |
| 3D Face | Virtual Makeup | GANs, Autoencoders, Augmented Reality | [ |
| 3D Face | Projection Mapping | CNN | [ |
Fig. 20 3D Face reconstruction based on facial landmarks [9]
Fig. 21 MakeupBag based output for applying makeup from reference to target face [154]
Fig. 22 GAN based makeup transfer and removal [156]
Fig. 23 Expression transfer using ReenactGAN [157]
Fig. 24 Results of progressive face aging GAN [142]
Fig. 25 High-quality lip shapes for reconstruction [1]
Fig. 26 Teeth reconstruction with its applications [168]
Fig. 27 Eyelid tracking based on semantic edges [169]
Fig. 28 3D Hair Synthesis using volumetric VAE [172]
Fig. 29 Full head reconstruction [174]
Challenges and future research directions for 3D face reconstruction
| Challenges | Target Problem | Technique Used | References |
|---|---|---|---|
| Occlusion Removal | Forensics and surveillance | GAN, VAE, BiLSTM, and Triplet Loss | [ |
| Makeup Removal | Online meetings, forensics, and cosmetics | Controllable GAN | [ |
| Expression Transfer | Animation and dubbing | Encoder-Decoder based GAN | [ |
| Age Prediction | Photography, fashion, and robotics | Conditional GAN | [ |
| Lips Reconstruction | Surgery and AI in medicine | Surgery-based | [ |
| Teeth and Tongue Capturing | 3D modeling | GAN | [ |
| Eyes and Eyelids Capturing | Proctored examinations | BiLSTM | [ |
| Hair Style | Cosmetics and hair style industry | CNN, Autoencoder | [ |
| Complete Head | Augmented reality and Virtual reality | CNN | [ |