| Literature DB >> 32380751 |
Abstract
Facial expression recognition (FER) is a challenging problem in the fields of pattern recognition and computer vision. The recent success of convolutional neural networks (CNNs) in object detection and object segmentation tasks has shown promise in building an automatic deep CNN-based FER model. However, in real-world scenarios, performance degrades dramatically owing to the great diversity of factors unrelated to facial expressions, a lack of training data, and an intrinsic imbalance in the existing facial emotion datasets. To tackle these problems, this paper not only applies deep transfer learning techniques, but also proposes a novel loss function called weighted-cluster loss, which is used during the fine-tuning phase. Specifically, the weighted-cluster loss function simultaneously improves the intra-class compactness and the inter-class separability by learning a class center for each emotion class. It also takes the imbalance in a facial expression dataset into account by giving each emotion class a weight based on its proportion of the total number of images. In addition, a recent, successful deep CNN architecture, pre-trained on the task of face identification with the VGGFace2 database from the Visual Geometry Group at Oxford University, is employed and fine-tuned using the proposed loss function to recognize eight basic facial emotions from the AffectNet database of facial expression, valence, and arousal computing in the wild. Experiments on the AffectNet real-world facial dataset demonstrate that our method outperforms the baseline CNN models that use either weighted-softmax loss or center loss.
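The weighted-cluster loss described in the abstract can be sketched as follows. This is a minimal NumPy illustration, not the paper's exact formulation: it assumes an inverse-frequency class weight, a center-loss-style pull term for intra-class compactness, and normalization by the distance to the other centers for inter-class separability. The function name and the specific combination of terms are assumptions for illustration.

```python
import numpy as np

def weighted_cluster_loss(features, labels, centers, class_counts):
    """Sketch of a weighted-cluster auxiliary loss.

    features:     (batch, dim) embedding vectors from the last hidden layer
    labels:       (batch,) integer emotion labels
    centers:      (n_classes, dim) learned class centers
    class_counts: (n_classes,) number of training images per class
    """
    # Assumed inverse-frequency weighting: rare classes get larger weights.
    weights = class_counts.sum() / (len(class_counts) * class_counts)

    loss = 0.0
    for x, y in zip(features, labels):
        # Pull toward own class center (intra-class compactness).
        pull = np.sum((x - centers[y]) ** 2)
        # Push relative to the other centers (inter-class separability).
        push = sum(np.sum((x - centers[k]) ** 2)
                   for k in range(len(centers)) if k != y)
        loss += weights[y] * pull / (push + 1e-8)
    return loss / len(features)
```

With this form, samples sitting exactly on their class center contribute zero loss, while samples drifting toward another class's center are penalized more heavily, and the penalty is amplified for under-represented classes.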
Keywords: auxiliary loss; class center; deep convolutional neural network; facial expression recognition; transfer learning; weighted loss
Year: 2020 PMID: 32380751 PMCID: PMC7249188 DOI: 10.3390/s20092639
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Comparison between FER approaches.
| Approach | Method | Human Resource | Computing Resource | Computational Complexity | Accuracy |
|---|---|---|---|---|---|
| Conventional | Gabor wavelet coefficients [ | High | Low | Low | Medium |
| | Haar features [ | High | Low | Low | Medium |
| | Local binary pattern (LBP) [ | High | Low | Low | Medium |
| | LBP on three orthogonal planes (LBP-TOP) [ | High | Low | Low | Medium |
| | Scale-invariant feature transform (SIFT) [ | High | Low | Low | Medium |
| | Histogram of oriented gradients (HOG) [ | High | Low | Low | Medium |
| Deep learning-based | Convolutional neural network [ | Low | High | High | High |
| | Transfer learning-based CNN (our approach) | Low | Medium | Medium | High |
Figure 1. Framework of the proposed method. A SE-ResNet-50 model [21], which was pre-trained on VGGFace2 data [22] for face identification, is fine-tuned on AffectNet data [5] for facial expression recognition using the weighted-cluster loss. Before the fine-tuning phase, we add one more fully connected layer to the model while freezing the first three stages of the pre-trained model to save computing power. The weighted-cluster loss is applied at the output layer to update the model parameters. Best viewed in color.
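The partial freezing described in the caption can be sketched in PyTorch. The stand-in model below uses placeholder convolutions rather than the real SE-ResNet blocks; only the freezing mechanics (disabling gradients for the first three stages) reflect the described procedure.

```python
import torch.nn as nn

# Stand-in for SE-ResNet-50's five stages (contents are placeholders,
# not the actual SE-ResNet blocks).
model = nn.Sequential(
    nn.Sequential(nn.Conv2d(3, 8, 3)),   # stage 1
    nn.Sequential(nn.Conv2d(8, 8, 3)),   # stage 2
    nn.Sequential(nn.Conv2d(8, 8, 3)),   # stage 3
    nn.Sequential(nn.Conv2d(8, 8, 3)),   # stage 4
    nn.Sequential(nn.Conv2d(8, 8, 3)),   # stage 5
)

# Freeze the first three stages: their parameters keep the pre-trained
# face-identification weights and receive no gradient updates.
for stage in list(model.children())[:3]:
    for p in stage.parameters():
        p.requires_grad = False
```

Only stages 4 and 5 (plus the added fully connected layers) are then updated during fine-tuning, which reduces both memory and computation per step.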
Detailed architectures of SE-ResNet-50 in the fine-tuning phase. Convolution blocks (SE-ResNet-50 uses the SE-ResNet block [21]) are shown in brackets, with the numbers of blocks stacked.
| Stage | Layer | Repeat |
|---|---|---|
| Stage 1 (freeze) | convolution | 1 |
| Stage 2 (freeze) | convolution block | 3 |
| Stage 3 (freeze) | convolution block | 4 |
| Stage 4 | convolution block | 6 |
| Stage 5 | convolution block | 3 |
| | global average pooling | |
| | fully connected | |
| | fully connected | |
| Output layer | 8-d softmax | |
Figure 2. Sample images from the AffectNet dataset (0: neutral; 1: happy; 2: sad; 3: surprise; 4: fear; 5: disgust; 6: anger; 7: contempt).
Numbers of samples in training, validation, and test sets.
| Emotion | Training | Validation | Test |
|---|---|---|---|
| Neutral | 74,374 | 500 | 500 |
| Happy | 133,915 | 500 | 500 |
| Sad | 24,959 | 500 | 500 |
| Surprise | 13,590 | 500 | 500 |
| Fear | 5878 | 500 | 500 |
| Disgust | 3303 | 500 | 500 |
| Anger | 24,382 | 500 | 500 |
| Contempt | 3250 | 500 | 500 |
| Total | 283,651 | 4000 | 4000 |
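The per-class weights used by the weighted losses can be derived from these training counts. A minimal sketch, assuming a standard inverse-frequency scheme (total count divided by the number of classes times each class count); the paper's exact normalization is an assumption here:

```python
import numpy as np

# Training-set counts from the table above (Neutral ... Contempt).
counts = np.array([74374, 133915, 24959, 13590, 5878, 3303, 24382, 3250])

# Inverse-frequency weights: over-represented classes (e.g., Happy)
# get weights below 1, rare classes (e.g., Contempt) well above 1.
weights = counts.sum() / (len(counts) * counts)
```

Under this scheme, the rarest class (Contempt, 3250 images) receives the largest weight and the most frequent class (Happy, 133,915 images) the smallest, counteracting the imbalance during training.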
Figure 3. Distribution of the eight classes in the training set.
Figure 4. Sample images from the VGGFace2 dataset.
List of acronyms.
| Abbreviation | Meaning |
|---|---|
| ResNet | Residual neural network |
| SE-ResNet | ResNet-based squeeze and excitation neural network |
| Alpha | Krippendorff’s alpha score |
| Kappa | Cohen’s kappa score |
| AUC | Area under the receiver operating characteristic curve |
| AUC-PR | Area under the precision-recall curve |
Recognition performance of different FER models on the test set.
| Base Model | Model No. | Pre-Trained | Loss Function | Accuracy | F1-Score | Kappa | Alpha | AUC-PR | AUC |
|---|---|---|---|---|---|---|---|---|---|
| SE-ResNet-50 | 1 | No | Softmax | 50.65 | 46.87 | 43.60 | 42.60 | 62.87 | 90.67 |
| | 2 | No | Center with softmax | 46.07 | 39.24 | 38.37 | 36.89 | 55.96 | 87.92 |
| | 3 | No | Weighted-softmax | 56.37 | 56.41 | 50.14 | 50.09 | 62.27 | 90.49 |
| | 4 | No | Center with weighted-softmax | 56.90 | 57.06 | 50.74 | 50.71 | 62.24 | 90.33 |
| | 5 | No | Weighted-cluster with weighted-softmax | 56.27 | 56.42 | 50.03 | 49.97 | 61.57 | 90.10 |
| | 6 | Yes | Softmax | 52.22 | 49.51 | 45.4 | 44.54 | 63.27 | 90.75 |
| | 7 | Yes | Center with softmax | 47.08 | 40.02 | 39.51 | 38.27 | 54.91 | 86.75 |
| | 8 | Yes | Weighted-softmax | 59.72 | 59.72 | 53.97 | 53.93 | 66.47 | 91.85 |
| | 9 | Yes | Center with weighted-softmax | 59.60 | 59.50 | 53.83 | 53.82 | 65.35 | 91.21 |
| | 10 | Yes | Weighted-cluster with weighted-softmax (proposed) | 60.70 | 60.49 | 55.09 | 55.06 | 66.55 | 91.82 |
Figure 5. Learning curves of different models over the number of training epochs. (a) Training loss. (b) Validation loss. (c) Validation accuracy. Best viewed in color.
Recognition performance of ResNet-50-based models on the test set.
| Base Model | Model No. | Pre-Trained | Loss Function | Accuracy | F1-Score | Kappa | Alpha | AUC-PR | AUC |
|---|---|---|---|---|---|---|---|---|---|
| ResNet-50 | 11 | No | Softmax | 49.85 | 46.67 | 42.69 | 41.46 | 62.80 | 90.63 |
| | 12 | No | Center with softmax | 46.62 | 39.96 | 39.00 | 37.67 | 54.08 | 86.26 |
| | 13 | No | Weighted-softmax | 57.40 | 57.33 | 51.31 | 51.23 | 63.05 | 90.82 |
| | 14 | No | Center with weighted-softmax | 57.20 | 57.08 | 51.09 | 51.06 | 62.61 | 90.50 |
| | 15 | No | Weighted-cluster with weighted-softmax | 57.37 | 57.40 | 51.29 | 51.24 | 62.79 | 90.52 |
| | 16 | Yes | Softmax | 51.88 | 48.89 | 45.00 | 44.10 | 61.6 | 90.22 |
| | 17 | Yes | Center with softmax | 48.33 | 44.00 | 40.94 | 39.8 | 56.96 | 87.89 |
| | 18 | Yes | Weighted-softmax | 58.65 | 58.59 | 52.74 | 52.71 | 64.58 | 91.17 |
| | 19 | Yes | Center with weighted-softmax | 58.27 | 58.07 | 52.31 | 52.24 | 63.59 | 90.54 |
| | 20 | Yes | Weighted-cluster with weighted-softmax | 59.45 | 59.42 | 53.66 | 53.66 | 65.26 | 91.51 |
Figure 6. Confusion matrices of the transfer learning-based models on the test set.