Olufisayo Ekundayo, Serestina Viriri
Abstract
Facial Expression Recognition (FER) has gained considerable attention in affective computing due to its vast areas of application. Diverse approaches and methods have been considered for robust FER, but only a few works have considered the intensity of the emotion embedded in the expression. Even the available studies on expression intensity estimation only assign a nominal/regression value or classify emotion into a range of intervals. Most of the available works on facial expression intensity estimation present only the intensity estimation, while others propose methods that predict emotion and its intensity through separate channels. These multiclass approaches and their extensions do not conform to the human heuristic manner of recognising emotion and estimating its intensity. This work presents a Multilabel Convolution Neural Network (ML-CNN)-based model that can simultaneously recognise emotion and provide ordinal metrics as the intensity estimation of the emotion. The proposed ML-CNN is enhanced with an aggregation of the Binary Cross-Entropy (BCE) loss and Island Loss (IL) functions to minimise intraclass and interclass variations. The ML-CNN model is also pre-trained with the Visual Geometry Group network (VGG-16) to control overfitting. In experiments conducted on the Binghamton University 3D Facial Expression (BU-3DFE) and extended Cohn-Kanade (CK+) datasets, we evaluate ML-CNN's performance in terms of accuracy and loss, and carry out a comparative study of our model against some popularly used multilabel algorithms using standard multilabel metrics. The ML-CNN model simultaneously predicts emotion and its intensity estimation using ordinal metrics, and shows appreciable, superior performance over four standard multilabel algorithms: Chain Classifier (CC), distinct Random k-Labelsets (RAKEL), Multilabel K-Nearest Neighbour (MLKNN) and Multilabel ARAM (MLARAM).
Keywords: Binary cross-entropy; Facial expression recognition; Island loss; Multilabel; Ordinal intensity estimation
Year: 2021 PMID: 34909462 PMCID: PMC8641570 DOI: 10.7717/peerj-cs.736
Source DB: PubMed Journal: PeerJ Comput Sci ISSN: 2376-5992
Figure 1. Multilabel problem formulation of FER.
The nodes under Emotion represent the six basic emotion classes (Anger, Disgust, Fear, Happy, Sad, Surprise); the nodes under Degree represent the ordinal estimation of emotion intensity (Low, Normal, High, Very High); and the output is the possible result of the multilabel CNN classification.
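The label formulation in Figure 1 can be made concrete with a small sketch: each sample is assigned one multi-hot target over the six emotions plus the four ordinal degrees, with exactly one bit active in each group. This is an illustrative encoding, not the paper's code; all names are hypothetical.

```python
import numpy as np

# Label vocabulary following Figure 1: six basic emotions plus four
# ordinal intensity degrees, encoded as one 10-dimensional multi-hot
# target with one emotion bit and one intensity bit set.
EMOTIONS = ["Anger", "Disgust", "Fear", "Happy", "Sad", "Surprise"]
DEGREES = ["Low", "Normal", "High", "Very High"]
LABELS = EMOTIONS + DEGREES

def encode(emotion: str, degree: str) -> np.ndarray:
    """Build the multi-hot target vector for one sample."""
    y = np.zeros(len(LABELS), dtype=np.float32)
    y[LABELS.index(emotion)] = 1.0
    y[len(EMOTIONS) + DEGREES.index(degree)] = 1.0
    return y

y = encode("Happy", "Very High")
print(y)  # two active bits: index 3 (Happy) and index 9 (Very High)
```

A multilabel classifier trained against such targets can emit both predictions in a single forward pass, which is what allows the model to recognise emotion and estimate its intensity simultaneously.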
Summary of various models for emotion and intensity recognition.
| Method | Model | DB & performance | Limitation |
|---|---|---|---|
| | Distance based | Primary source: NA | Only a few emotions are considered; the method does not generalise; emotion intensity is predicted before emotion recognition; computationally expensive. |
| | Optical flow tracking algorithm (Distance) | Real-time data | Each subject needs to be trained differently; does not generalise; predicts intensity before emotion. |
| | HCORF (Prob) | CMU | The intrinsic topology of FER data is linearly modelled. |
| | K-Means (Cluster) | CK+ | Predicts intensity before emotion; intensity estimation based on graphical difference is not logical. |
| | Scattering transform + SVM (Cluster) | CK+ | The emotion recognition task is omitted. |
| | SVOR (Regression) | Pain | Correlations between emotion classes are not modelled. |
| | LSM-CORF (Prob) | BU-4DFE, CK+ | Latent states are not considered in the modelling of sequences across and within the classes. |
| | VSL-CRF (Prob) | CK+, AFEW | The result of emotion intensity is not accounted for. |
| | Weighted vote | CK+ | Emotion and emotion intensity are not concurrently predicted. |
| Proposed model | ML-CNN (Multi-Label) | BU-3DFE | Assumes temporal information among sequence data as ordinal metrics. |
Note:
NA: Not Applicable; MAE: Mean Absolute Error; PCC: Pearson Correlation Coefficient; ICC: Intraclass Correlation; MAL: Mean Absolute Loss; HL: Hamming Loss; RL: Ranking Loss; AP: Average Precision; CE: Coverage Error.
Figure 2. Description of the multilabel CNN model for facial expression recognition and intensity estimation.
Figure 3. (A) Description of the VGG-16 model; (B) the proposed ML-CNN model; and (C) the VGGML-CNN model, which is the optimised version of ML-CNN.
ML-CNN algorithm.
| Step | Operation |
|---|---|
| 1 | Given: minibatch n, learning rate |
| 2 | Initialisation: {t, W, …} |
| 3 | t = 1 |
| 4 | while (t != T): compute the aggregate loss L_agg = L_BCE + L_IL |
| 5 | update L_BCE |
| 6 | update L_IL |
| 7 | update the backpropagation error |
| 8 | update the network-layer parameters |
| 9 | t = t + 1; end while |
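The aggregate loss in the listing above can be sketched numerically. This is a minimal NumPy sketch, assuming the standard Island Loss formulation (centre loss plus a pairwise cosine-similarity penalty between class centres, as introduced by Cai et al.) and a trade-off weight `lam`; the weights and function names are illustrative, not the paper's implementation.

```python
import numpy as np

def bce_loss(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy over all labels."""
    p = np.clip(y_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

def island_loss(features, labels, centers, lam1=10.0):
    """Centre loss (intraclass compactness) plus a pairwise
    cosine-similarity penalty that pushes class centres apart
    (interclass separation) -- the standard Island Loss form."""
    # intraclass part: distance of each feature to its class centre
    lc = 0.5 * np.sum((features - centers[labels]) ** 2) / len(features)
    # interclass part: cosine similarity (+1) between distinct centres
    pair = 0.0
    for j in range(len(centers)):
        for k in range(len(centers)):
            if j != k:
                cj, ck = centers[j], centers[k]
                pair += ck @ cj / (np.linalg.norm(ck) * np.linalg.norm(cj)) + 1.0
    return lc + lam1 * pair

def aggregate_loss(y_true, y_pred, features, labels, centers, lam=0.5):
    """L_agg = L_BCE + lam * L_IL, as in step 4 of the listing;
    lam is an assumed trade-off weight."""
    return bce_loss(y_true, y_pred) + lam * island_loss(features, labels, centers)
```

The BCE term drives the multilabel predictions, while the Island Loss term acts on the learned feature embedding, which is how the combination minimises intraclass and interclass variations simultaneously.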
Tabular presentation of the ML-CNN and VGGML-CNN performance evaluation using accuracy and aggregate loss on the BU-3DFE and CK+ datasets, and their comparison with some existing methods.
In the table, a metric with ↑ indicates that the higher the metric value, the better the model performance; a metric with ↓ indicates that the lower the value, the better the model performance.
| ML-Models | Database | Accuracy ↑ | Aggregate loss ↓ |
|---|---|---|---|
| ML-CNN | BU-3DFE | 88.56 | 0.3534 |
| | AUG_BU-3DFE | 92.84 | 0.1841 |
| | CK+ | 93.24 | 0.2513 |
| VGGML-CNN | BU-3DFE | 94.18 | 0.1723 |
| | AUG_BU-3DFE | 98.01 | 0.1411 |
| | CK+ | 97.16 | 0.1842 |
| | CK+ | 82.4 | NA |
| | CK+ | 94.5 | NA |
| | CK+ | 88.3 | NA |
The results of the comparative study of multilabel models' performance on the BU-3DFE dataset are presented below.
A metric with ↑ indicates that the higher the metric value, the better the model performance; a metric with ↓ indicates that the lower the value, the better the model's performance.
| ML-Models | Hamming loss ↓ | Ranking loss ↓ | Average precision ↑ | Coverage ↓ |
|---|---|---|---|---|
| RAKELD | 0.4126 | 0.6859 | 0.2274 | 4.8137 |
| CC | 0.1807 | 0.8393 | 0.3107 | 4.8094 |
| MLkNN | 0.1931 | 0.8917 | 0.2634 | 4.9486 |
| MLARAM | 0.3045 | 0.6552 | 0.3180 | 3.1970 |
| ML-CNN | 0.1273 | 0.2867 | 0.5803 | 2.5620 |
| VGGML-CNN | 0.0890 | 0.1647 | 0.7093 | 1.9091 |
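The four metrics reported in these comparative tables are available in scikit-learn; a minimal sketch, with illustrative label matrices (the true tables above come from the paper's experiments, not from this snippet):

```python
import numpy as np
from sklearn.metrics import (hamming_loss, label_ranking_loss,
                             label_ranking_average_precision_score,
                             coverage_error)

# Illustrative ground truth for 3 samples over 10 labels
# (6 emotions + 4 intensity degrees, as in the multilabel formulation).
y_true = np.array([[1, 0, 0, 0, 0, 0, 0, 0, 1, 0],
                   [0, 0, 0, 1, 0, 0, 1, 0, 0, 0],
                   [0, 0, 0, 0, 1, 0, 0, 1, 0, 0]])
scores = np.random.RandomState(0).rand(3, 10)  # classifier scores in [0, 1]
y_pred = (scores > 0.5).astype(int)            # thresholded hard labels

print("Hamming loss (lower is better):", hamming_loss(y_true, y_pred))
print("Ranking loss (lower is better):", label_ranking_loss(y_true, scores))
print("Average precision (higher is better):",
      label_ranking_average_precision_score(y_true, scores))
print("Coverage (lower is better):", coverage_error(y_true, scores))
```

Note that Hamming loss operates on thresholded label predictions, while the ranking-based metrics (ranking loss, average precision, coverage) operate on the raw scores.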
The results of the comparative study of multilabel models' performance on the CK+ dataset are presented below.
A metric with ↑ indicates that the higher the metric value, the better the model performance; a metric with ↓ indicates that the lower the value, the better the model's performance.
| ML-Model | Hamming loss ↓ | Ranking loss ↓ | Average precision ↑ | Coverage ↓ |
|---|---|---|---|---|
| RAKELD | 0.3904 | 0.6637 | 0.2370 | 4.4435 |
| CC | 0.1489 | 0.6842 | 0.4234 | 4.7339 |
| MLkNN | 0.1839 | 0.8345 | 0.2965 | 4.7930 |
| MLARAM | 0.1951 | 0.4636 | 0.4144 | 3.0748 |
| ML-CNN | 0.1487 | 0.4161 | 0.5926 | 2.8120 |
| VGGML-CNN | 0.1393 | 0.3897 | 0.6002 | 1.4359 |
The results of the comparative study of multilabel models' performance on the augmented BU-3DFE dataset are presented below.
A metric with ↑ indicates that the higher the metric value, the better the model performance; a metric with ↓ indicates that the lower the value, the better the model's performance.
| ML-Model | Hamming loss ↓ | Ranking loss ↓ | Average precision ↑ | Coverage ↓ |
|---|---|---|---|---|
| RAKELD | 0.3858 | 0.7223 | 0.2241 | 4.0453 |
| CC | 0.1825 | 0.8948 | 0.2812 | 4.7270 |
| MLkNN | 0.1929 | 0.9025 | 0.2573 | 4.9623 |
| MLARAM | 0.3169 | 0.6963 | 0.3280 | 2.9315 |
| ML-CNN | 0.1124 | 0.2278 | 0.7216 | 2.2397 |
| VGGML-CNN | 0.0628 | 0.1561 | 0.8637 | 1.3140 |
Emotion and intensity degree predictions on BU-3DFE test samples.
| Emotion and ordinal intensity | Accuracy % |
|---|---|
| Anger | 97.0 |
| Disgust | 98.3 |
| Fear | 97.0 |
| Happy | 100 |
| Sadness | 98.7 |
| Surprise | 98.7 |
| Low | 98.7 |
| Normal | 97.5 |
| High | 97.5 |
| Very High | 97.0 |
Emotion and intensity degree prediction on CK+ test samples.
| Emotion and ordinal intensity | Accuracy % |
|---|---|
| Anger | 98.1 |
| Disgust | 98.1 |
| Fear | 100 |
| Happy | 98.1 |
| Sadness | 100 |
| Surprise | 100 |
| Low | 96.2 |
| Normal | 83.3 |
| High | 87.0 |
| Very High | 96.3 |
Figure 4. Multilabel confusion matrix of the VGGML-CNN on BU-3DFE.
Figure 5. Multilabel confusion matrix of the VGGML-CNN on CK+.
Comparison results of VGGML-CNN with some recent models on CK+.
| Model | Accuracy % | No. of classes | Target |
|---|---|---|---|
| | 94.35 | 7 | Expression only |
| | 95.78 | 7 | Expression only |
| | 98 | 7 | Expression only |
| | 91.50 | 6 | Expression and intensity |
| | 93.08 | 7 | Expression distribution |
| ML-CNN | 93.24 | 6 | Expression and intensity |
| VGGML-CNN | 97.16 | 6 | Expression and intensity |
Comparison results of VGGML-CNN with some recent models on BU-3DFE.
| Model | Accuracy % | No. of classes | Target |
|---|---|---|---|
| | 85.15 | 7 | Expression only |
| | 88.70 | 6 | Expression only |
| | 80.63 | 6 | Expression only |
| VGGML-CNN | 98.01 | 6 | Expression and intensity |