Lei Liao, Yu Zhu, Bingbing Zheng, Xiaoben Jiang, Jiajun Lin.
Abstract
Due to occlusion, pose variation, illumination change, and image blur in in-the-wild facial expression datasets, recognizing facial expressions in complex environments is a challenging computer vision problem. To address it, this paper proposes a deep neural network called facial expression recognition based on graph convolutional network (FERGCN), which can effectively extract expression information from faces in complex environments. The proposed FERGCN includes three essential parts. First, a feature extraction module obtains global feature vectors from a convolutional neural network branch with triplet attention, and local feature vectors from a key-point-guided attention branch. Then, based on a topology graph of the key points, the proposed graph convolutional network uses the correlation between global and local features to enhance the expression information of the non-occluded parts. Furthermore, a graph-matching module uses the similarity between images to strengthen the network's ability to distinguish different expressions. Results on public datasets show that FERGCN effectively recognizes facial expressions in real environments, achieving 88.23% on RAF-DB, 56.15% on SFEW, and 62.03% on AffectNet.
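The abstract describes three cooperating parts: a two-branch feature extractor, a GCN over the key-point topology, and a graph-matching module. Below is a minimal structural sketch in PyTorch; the module names, feature sizes, element-wise fusion, and contrastive-style matching term are illustrative assumptions, not the authors' released implementation.

```python
# Hypothetical skeleton of the FERGCN pipeline described in the abstract.
# Module names, feature sizes, the element-wise fusion, and the matching
# loss are illustrative assumptions, not the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FERGCNSketch(nn.Module):
    def __init__(self, feat_dim=512, num_classes=7):
        super().__init__()
        self.global_branch = nn.Identity()  # stand-in: CNN branch with triplet attention
        self.local_branch = nn.Identity()   # stand-in: key-point-guided attention branch
        self.gcn = nn.Linear(feat_dim, feat_dim)  # stand-in: graph convolution
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, global_feat, local_feats, adj):
        # global_feat: (B, D); local_feats: (B, K, D), one vector per key point
        # adj: (K, K) normalized topology graph over the key points
        node = torch.relu(self.gcn(adj @ local_feats))  # propagate along the graph
        node = node * global_feat.unsqueeze(1)          # element-wise enhancement
        return self.classifier(node.mean(dim=1))        # pool nodes, then classify

def graph_match_loss(nodes_a, nodes_b, same_class):
    """Toy graph-matching term: pull node features of same-expression image
    pairs together and push different-expression pairs apart (one plausible
    reading of 'uses the similarity between images')."""
    sim = F.cosine_similarity(nodes_a.flatten(1), nodes_b.flatten(1))
    return torch.where(same_class, 1.0 - sim, F.relu(sim)).mean()
```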
Keywords: Deep learning; Expression recognition; Graph convolutional network; In-the-wild data
Year: 2022 PMID: 35342228 PMCID: PMC8939244 DOI: 10.1007/s00138-022-01288-9
Source DB: PubMed Journal: Mach Vis Appl ISSN: 0932-8092 Impact factor: 2.983
Fig. 1 Interference factors in in-the-wild facial expression datasets. From left to right: side face (pose), grayscale, low resolution, and occlusion
Fig. 2 The FERGCN neural network framework. FERGCN includes the feature extraction module, the GCN, and the graph-matching module; ⊙ denotes element-wise multiplication
Fig. 3 Triplet attention network structure
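Triplet attention (Misra et al., 2020) builds attention from three branches, each compressing one pair of tensor dimensions with a "Z-pool" (concatenated max and mean pooling) followed by a 7×7 convolution. A sketch of that standard module follows; the paper's branch is presumably close to this, but its exact variant may differ.

```python
import torch
import torch.nn as nn

class ZPool(nn.Module):
    """Concatenate max- and mean-pooling over the channel dimension."""
    def forward(self, x):
        return torch.cat([x.max(dim=1, keepdim=True).values,
                          x.mean(dim=1, keepdim=True)], dim=1)

class AttentionGate(nn.Module):
    def __init__(self, kernel_size=7):
        super().__init__()
        self.pool = ZPool()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        return x * torch.sigmoid(self.conv(self.pool(x)))

class TripletAttention(nn.Module):
    """Three branches attend over (C,H), (C,W), and (H,W) by rotating the
    tensor so a different pair of dimensions faces the attention gate."""
    def __init__(self):
        super().__init__()
        self.cw = AttentionGate()  # channel-width interaction
        self.ch = AttentionGate()  # channel-height interaction
        self.hw = AttentionGate()  # plain spatial attention

    def forward(self, x):                        # x: (B, C, H, W)
        x_cw = self.cw(x.permute(0, 2, 1, 3)).permute(0, 2, 1, 3)
        x_ch = self.ch(x.permute(0, 3, 2, 1)).permute(0, 3, 2, 1)
        return (x_cw + x_ch + self.hw(x)) / 3.0  # average the three branches
```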
Fig. 4 Face key point acquisition process. a the original face image, b the 68 detected landmarks, c the selected facial key points, where the blue points are obtained by calculation
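Fig. 4 implies the key points are a selected subset of the standard 68 landmarks plus a few derived ("blue") points. A sketch using dlib; the kept indices and the midpoint rule are hypothetical, since the paper's selection rule is not given in this entry.

```python
# Hypothetical key-point selection from 68 facial landmarks (dlib).
import dlib
import numpy as np

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

# Illustrative subset: eyebrow ends, eye corners, nose tip, mouth corners.
KEEP = [17, 21, 22, 26, 36, 39, 42, 45, 30, 48, 54]

def key_points(image):
    face = detector(image)[0]            # assumes one detected face
    shape = predictor(image, face)
    pts = np.array([(p.x, p.y) for p in shape.parts()])  # (68, 2)
    selected = pts[KEEP]
    # Example derived ("blue") point: midpoint between the inner eye corners.
    derived = (pts[39] + pts[42]) / 2.0
    return np.vstack([selected, derived])
```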
Fig. 5 The proposed classification unit. C is the number of expression classes
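A minimal stand-in for such a unit, assuming a common fully connected head; the exact layer stack is not specified in this entry.

```python
import torch.nn as nn

# Assumed classification unit: a generic FC head over C expression classes.
def classification_unit(in_dim, num_classes):  # num_classes = C
    return nn.Sequential(
        nn.Linear(in_dim, in_dim // 2),
        nn.ReLU(inplace=True),
        nn.Dropout(0.5),
        nn.Linear(in_dim // 2, num_classes),   # logits over C expressions
    )
```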
Fig. 6 The architecture of the graph module. ⊗ denotes matrix multiplication, ⊙ denotes element-wise multiplication, and A is the designed adjacency matrix
Fig. 7 The topology graph of the key points
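Figures 6 and 7 together suggest graph convolution over a designed adjacency matrix A on the key-point topology. A sketch assuming the standard Kipf-Welling normalization D^{-1/2}(A + I)D^{-1/2}; the authors' A and propagation rule may differ.

```python
import torch
import torch.nn as nn

def normalize_adjacency(A):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2 (Kipf & Welling)."""
    A_hat = A + torch.eye(A.size(0))
    d = A_hat.sum(dim=1)
    D_inv_sqrt = torch.diag(d.pow(-0.5))
    return D_inv_sqrt @ A_hat @ D_inv_sqrt

class GraphConv(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.weight = nn.Linear(in_dim, out_dim, bias=False)

    def forward(self, x, A_norm):        # x: (B, K, D) node features
        return torch.relu(self.weight(A_norm @ x))

# Illustrative 4-node topology (the paper's graph over facial key points
# is larger; these edges are assumptions for demonstration).
A = torch.tensor([[0., 1., 0., 1.],
                  [1., 0., 1., 0.],
                  [0., 1., 0., 1.],
                  [1., 0., 1., 0.]])
x = torch.randn(2, 4, 16)                # batch of 2, 4 nodes, 16-d features
out = GraphConv(16, 16)(x, normalize_adjacency(A))
```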
Comparison with state-of-the-art results on RAF-DB and AffectNet (%)

| Method | RAF-DB | AffectNet |
|---|---|---|
| gACNN | 85.07 | 58.78 |
| RAN | 86.90 | – |
| SCN | 87.03 | – |
| LDL-ALSG | 85.53 | 59.35 |
| OADN | 87.16 | 61.89 |
| Ours | **88.23** | **62.03** |

The best recognition results are shown in bold
Fig. 8 Confusion matrix on RAF-DB
Comparison with state-of-the-art results on SFEW (%)

| Method | Accuracy |
|---|---|
|  | 26.58 |
| ICID | 51.2 |
| LBF-NN | 49.31 |
| RAN | 54.19 |
| Ours (ResNet18) | **56.15** |

The best recognition results are shown in bold
Accuracy on the Occlusion-RAF-DB and Pose-RAF-DB datasets (%)

| Model | Occlusion | Pose > 30° | Pose > 45° |
|---|---|---|---|
| RAN | 82.72 | 86.74 | 85.20 |
| SCN | 82.18 | 86.45 |  |
| Ours | **86.74** |  |  |

The best recognition results are shown in bold
Ablation test results on RAF-DB (%)

| GCN | Triplet attention | Graph match | Accuracy |
|---|---|---|---|
|  |  |  | 87.26 |
|  |  |  | 87.08 |
|  |  |  | 87.31 |
| ✓ | ✓ | ✓ | 88.23 |
Fig. 9 Comparison of the SCN method and our method on RAF-DB