| Literature DB >> 35746417 |
Marzuraikah Mohd Stofa, Mohd Asyraf Zulkifley, Muhammad Ammirrul Atiqi Mohd Zainuri.
Abstract
Understanding a person's attitude or sentiment from their facial expressions has long been a straightforward task for humans. Numerous methods and techniques have been used to classify and interpret human emotions that are commonly communicated through facial expressions, either as macro- or micro-expressions. However, performing this task with computer-based techniques or algorithms has proven to be extremely difficult, and annotating the data manually is time-consuming. Compared to macro-expressions, micro-expressions manifest the real emotional cues of a human, which the person tries to suppress and hide. Different methods and algorithms for recognizing emotions using micro-expressions are examined in this research, and the results are presented in a comparative manner. The proposed technique is based on a multi-scale deep learning approach that aims to extract facial cues from various subjects under various conditions. Two popular multi-scale approaches are then explored, Spatial Pyramid Pooling (SPP) and Atrous Spatial Pyramid Pooling (ASPP), which are optimized to suit the purpose of emotion recognition using micro-expression cues. Four new architectures are introduced in this paper based on multi-layer multi-scale convolutional networks using both direct and waterfall network flows. The experimental results show that the ASPP module with waterfall network flow, which we coined WASPP-Net, outperforms the state-of-the-art benchmark techniques with an accuracy of 80.5%. For future work, a high-resolution variant of the multi-scale approaches can be explored to further improve recognition performance.
Keywords: convolutional neural networks; deep learning; emotion classification; micro-expression analysis
Year: 2022 PMID: 35746417 PMCID: PMC9227116 DOI: 10.3390/s22124634
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.847
Figure 1. Differences in facial muscle movement for the happy emotion among the test subjects: (a) subject 1; (b) subject 2; (c) subject 3.
The number of samples for the tested datasets.
| Types of Emotion | Combined | CASME II | SAMM | SMIC |
|---|---|---|---|---|
| Positive | 109 | 32 | 26 | 51 |
| Negative | 250 | 88 | 92 | 70 |
| Surprise | 83 | 25 | 14 | 43 |
| Total | 441 | 145 | 132 | 164 |
Network architecture of the proposed base CNN model.
| Layer | Size of Kernel | Stride | Padding | Size of Output | Activation Function |
|---|---|---|---|---|---|
| Conv1 | 7 × 7 | 1 | 1 | 96 × 69 × 69 | ReLU |
| Conv2 | 5 × 5 | 1 | 1 | 256 × 65 × 65 | ReLU |
| Conv3 | 3 × 3 | 1 | 0 | 512 × 65 × 65 | ReLU |
| Pool3 | 3 × 3 | 2 | 1 | 512 × 32 × 32 | - |
| Conv4 | 3 × 3 | 1 | 0 | 512 × 32 × 32 | ReLU |
| Pool4 | 3 × 3 | 2 | 1 | 512 × 16 × 16 | - |
| Conv5 | 3 × 3 | 1 | 0 | 512 × 16 × 16 | ReLU |
| Pool5 | 3 × 3 | 2 | 1 | 512 × 8 × 8 | - |
| FC1 | - | - | - | 128 | ReLU |
| FC2 | - | - | - | 128 | ReLU |
| FC3 | - | - | - | 3 | Softmax |
Figure 2. Basic SPP module architecture.
List of the SPP module architecture variants.
| SPP Model | Number of Parallel Paths | Maximum Kernel Size | Position |
|---|---|---|---|
| I | 2 SPP | 4 × 4 | After Conv1 |
| II | 3 SPP | 6 × 6 | After Conv1 |
| III | 4 SPP | 8 × 8 | After Conv1 |
| IV | 5 SPP | 10 × 10 | After Conv1 |
| V | 2 SPP | 4 × 4 | After Conv2 |
| VI | 3 SPP | 6 × 6 | After Conv2 |
| VII | 4 SPP | 8 × 8 | After Conv2 |
| VIII | 5 SPP | 10 × 10 | After Conv2 |
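The SPP variants above differ only in the number of parallel pooling paths and the module's placement. The core SPP operation itself can be sketched in a few lines of NumPy: a feature map of any spatial size is max-pooled over a set of fixed n × n grids, and the per-cell results are concatenated into one fixed-length vector. This is a minimal single-map sketch; the pyramid levels (1, 2, 4) are illustrative choices, not the paper's exact kernel sizes.

```python
import numpy as np

def spatial_pyramid_pool(feat, levels=(1, 2, 4)):
    """Max-pool a C x H x W feature map into fixed n x n grids and
    concatenate the per-cell channel maxima into one vector."""
    c, h, w = feat.shape
    pooled = []
    for n in levels:
        # Bin edges that partition H and W into n (nearly) equal slices.
        h_edges = np.linspace(0, h, n + 1, dtype=int)
        w_edges = np.linspace(0, w, n + 1, dtype=int)
        for i in range(n):
            for j in range(n):
                cell = feat[:, h_edges[i]:h_edges[i + 1],
                               w_edges[j]:w_edges[j + 1]]
                pooled.append(cell.max(axis=(1, 2)))  # per-channel max
    return np.concatenate(pooled)

# The output length depends only on C and the pyramid levels,
# not on the spatial size of the input feature map.
a = spatial_pyramid_pool(np.random.rand(8, 69, 69))
b = spatial_pyramid_pool(np.random.rand(8, 65, 65))
assert a.shape == b.shape == (8 * (1 + 4 + 16),)
```

This size-invariance is what lets an SPP module sit after Conv1 or Conv2 without changing the fully connected layers downstream.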
Figure 3. Two placement strategies of the SPP module in the base CNN model.
Figure 4. Basic ASPP module architecture.
Figure 5. Two placement strategies of the ASPP module in the base CNN model.
List of the ASPP module architecture variants.
| ASPP Model | Number of Parallel Paths | Maximum Dilation Rate | Position |
|---|---|---|---|
| I | 2 ASPP | 2 | After Conv1 |
| II | 3 ASPP | 3 | After Conv1 |
| III | 4 ASPP | 4 | After Conv1 |
| IV | 5 ASPP | 5 | After Conv1 |
| V | 2 ASPP | 2 | After Conv2 |
| VI | 3 ASPP | 3 | After Conv2 |
| VII | 4 ASPP | 4 | After Conv2 |
| VIII | 5 ASPP | 5 | After Conv2 |
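The ASPP variants rely on atrous (dilated) convolution, where a small kernel is applied with gaps between its taps, so a 3 × 3 kernel with dilation rate d covers an effective area of 3 + 2(d − 1) per side without adding parameters. A minimal single-channel NumPy sketch, with toy input and kernel and dilation rates 1–5 mirroring the table above:

```python
import numpy as np

def dilated_conv2d(x, kernel, rate=1):
    """Valid-mode 2D correlation with dilation `rate` (single channel)."""
    k = kernel.shape[0]
    k_eff = k + (k - 1) * (rate - 1)        # effective receptive field
    h, w = x.shape
    out = np.zeros((h - k_eff + 1, w - k_eff + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Sample the input with a gap of `rate` between kernel taps.
            patch = x[i:i + k_eff:rate, j:j + k_eff:rate]
            out[i, j] = (patch * kernel).sum()
    return out

x = np.random.rand(16, 16)
k = np.ones((3, 3))
# Same 3x3 kernel, growing receptive field: rate 1 -> 3x3, rate 2 -> 5x5, ...
for rate, k_eff in [(1, 3), (2, 5), (3, 7), (4, 9), (5, 11)]:
    out = dilated_conv2d(x, k, rate)
    assert out.shape == (16 - k_eff + 1, 16 - k_eff + 1)
```

Raising the maximum dilation rate from 2 to 5 therefore widens the largest branch's receptive field from 5 × 5 to 11 × 11 at constant parameter cost.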
Figure 6. Direct network flow of the SPP and ASPP modules: (a) DSPP-Net architecture; (b) DASPP-Net architecture.
Figure 7. Waterfall network flow of the SPP and ASPP modules: (a) WSPP-Net architecture; (b) WASPP-Net architecture.
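The two network flows differ in how the multi-scale branches are wired: in the direct flow every branch reads the same input in parallel, while in the waterfall flow each branch feeds the next in a cascade and all intermediate outputs still contribute to the fused result. A toy sketch of the wiring only; the `branch` function here is a hypothetical stand-in smoothing op, not the paper's actual pooling or atrous branch:

```python
import numpy as np

def branch(x, rate):
    """Hypothetical stand-in for one branch: shift-and-average at `rate`."""
    return 0.5 * (x + np.roll(x, rate, axis=-1))

def direct_flow(x, rates=(1, 2, 3, 4)):
    # Every branch reads the SAME input; outputs are fused (summed here).
    return sum(branch(x, r) for r in rates)

def waterfall_flow(x, rates=(1, 2, 3, 4)):
    # Each branch reads the PREVIOUS branch's output (a cascade), while
    # all intermediate outputs still contribute to the fusion.
    outs, cur = [], x
    for r in rates:
        cur = branch(cur, r)
        outs.append(cur)
    return sum(outs)

x = np.random.rand(4, 8)
assert direct_flow(x).shape == waterfall_flow(x).shape == x.shape
```

Because the waterfall cascade composes the branches, later branches see an already-enlarged receptive field, which is one plausible reading of why the waterfall variants outperform the direct ones here.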
List of experimental hyperparameters.
| Hyperparameter | Type/Value | Function |
|---|---|---|
| Optimizer | Adam | Updates parameters such as weights, with adaptive per-parameter learning rates, to reduce the loss |
| Learning rate | 0.0001 | Step size for the weight updates |
| Batch size | 32 | Number of samples taken per model parameter update |
| Training/testing split | Leave-One-Subject-Out (LOSO) | Determines how the combined samples are divided into training and testing sets |
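The LOSO protocol referenced in the table can be sketched directly: each subject is held out in turn as the test set, so the number of folds equals the number of subjects and no subject's samples ever appear in both splits. The sample tuples below are hypothetical placeholders for the micro-expression clips:

```python
def loso_splits(samples):
    """Leave-One-Subject-Out: one fold per subject, with that subject's
    samples as the test set and everyone else's as the training set."""
    subjects = sorted({subj for subj, _ in samples})
    for held_out in subjects:
        train = [s for s in samples if s[0] != held_out]
        test = [s for s in samples if s[0] == held_out]
        yield held_out, train, test

# Toy (subject, clip) pairs standing in for micro-expression samples.
data = [("s01", "clip_a"), ("s01", "clip_b"),
        ("s02", "clip_c"), ("s03", "clip_d"), ("s03", "clip_e")]
folds = list(loso_splits(data))
assert len(folds) == 3                                   # one fold per subject
assert all(len(tr) + len(te) == len(data) for _, tr, te in folds)
```

Splitting by subject rather than by clip prevents the network from memorizing a subject's facial identity, which is the standard protocol for the CASME II, SAMM, and SMIC benchmarks.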
Emotion classification accuracy (%) evaluated based on the number of parallel paths and placement position of the SPP module; columns I–VIII denote the SPP model variants.
| Types of Datasets | Original | I | II | III | IV | V | VI | VII | VIII |
|---|---|---|---|---|---|---|---|---|---|
| Combined | 77.48 | 77.63 | 77.32 | 77.63 | 77.48 | 77.93 | 78.23 | 77.48 | 79.59 |
| CASME II | 88.51 | 91.26 | 87.59 | 88.51 | 88.51 | 89.43 | 87.13 | 91.26 | 89.89 |
| SAMM | 67.68 | 67.17 | 69.7 | 71.21 | 70.2 | 69.7 | 72.73 | 67.68 | 73.23 |
| SMIC | 75.61 | 73.98 | 74.39 | 73.17 | 73.58 | 74.39 | 74.8 | 73.17 | 75.61 |
Overall emotion classification F1 score results evaluated based on the number of parallel paths and placement position of the SPP module; columns I–VIII denote the SPP model variants.
| Types of Datasets | Original | I | II | III | IV | V | VI | VII | VIII |
|---|---|---|---|---|---|---|---|---|---|
| Combined | 0.6621 | 0.6644 | 0.6599 | 0.6644 | 0.6621 | 0.6689 | 0.6735 | 0.6621 | 0.6939 |
| CASME II | 0.8276 | 0.869 | 0.8138 | 0.8276 | 0.8276 | 0.8414 | 0.8069 | 0.869 | 0.8483 |
| SAMM | 0.5152 | 0.5076 | 0.5455 | 0.5682 | 0.553 | 0.5455 | 0.5909 | 0.5152 | 0.5985 |
| SMIC | 0.6441 | 0.6098 | 0.6159 | 0.5976 | 0.6037 | 0.6159 | 0.622 | 0.5976 | 0.6341 |
Emotion classification accuracy (%) evaluated based on the number of parallel paths and placement position of the ASPP module; columns I–VIII denote the ASPP model variants.
| Types of Datasets | Original | I | II | III | IV | V | VI | VII | VIII |
|---|---|---|---|---|---|---|---|---|---|
| Combined | 77.48 | 76.11 | 77.02 | 76.11 | 78.53 | 79.14 | 78.08 | 77.48 | 77.63 |
| CASME II | 88.51 | 89.89 | 89.97 | 86.67 | 89.42 | 88.97 | 90.8 | 90.8 | 87.59 |
| SAMM | 67.68 | 66.16 | 70.71 | 70.71 | 71.21 | 73.74 | 69.19 | 70.2 | 70.2 |
| SMIC | 75.61 | 71.95 | 71.54 | 71.14 | 74.8 | 74.8 | 73.98 | 71.54 | 74.8 |
Overall emotion classification F1 score results evaluated based on the number of parallel paths and placement position of the ASPP module; columns I–VIII denote the ASPP model variants.
| Types of Datasets | Original | I | II | III | IV | V | VI | VII | VIII |
|---|---|---|---|---|---|---|---|---|---|
| Combined | 0.6621 | 0.6417 | 0.6553 | 0.6417 | 0.678 | 0.6871 | 0.6712 | 0.6621 | 0.6644 |
| CASME II | 0.8276 | 0.8483 | 0.8345 | 0.80 | 0.8414 | 0.8345 | 0.8621 | 0.8621 | 0.8138 |
| SAMM | 0.5152 | 0.4924 | 0.5606 | 0.5606 | 0.5682 | 0.6061 | 0.5379 | 0.553 | 0.553 |
| SMIC | 0.6441 | 0.5793 | 0.5732 | 0.5671 | 0.622 | 0.622 | 0.6098 | 0.5732 | 0.622 |
Comparison of emotion classification accuracy (%) of the DSPP-Net and WSPP-Net architectures against the original base model.
| Types of Datasets | Original | DSPP-Net | WSPP-Net |
|---|---|---|---|
| Combined | 77.48 | 77.93 | 80.20 |
| CASME II | 88.51 | 89.43 | 92.18 |
| SAMM | 67.68 | 69.7 | 72.73 |
| SMIC | 75.61 | 74.39 | 75.61 |
Figure 8. Training graph performance: (a) DSPP-Net architecture; (b) WSPP-Net architecture.
Comparison of emotion classification accuracy (%) of the DASPP-Net and WASPP-Net architectures against the original base model.
| Types of Datasets | Original | DASPP-Net | WASPP-Net |
|---|---|---|---|
| Combined | 77.48 | 78.08 | 80.50 |
| CASME II | 88.51 | 90.8 | 92.18 |
| SAMM | 67.68 | 69.19 | 71.21 |
| SMIC | 75.61 | 73.98 | 77.64 |
Figure 9. Training graph performance: (a) DASPP-Net architecture; (b) WASPP-Net architecture.
Execution time comparison between DSPP-Net, WSPP-Net, DASPP-Net, and WASPP-Net architectures.
| Type of Architecture | Training Time Per Subject (s) | Execution Time (Frames Per Second) |
|---|---|---|
| Original (Without SPP/ASPP Module) | 520 | 418 |
| DSPP-Net | 431 | 510 |
| WSPP-Net | 370 | 591 |
| DASPP-Net | 548 | 400 |
| WASPP-Net | 447 | 460 |
Performance comparison with state-of-the-art CNN models.
| Method | Accuracy (%) | F1-Score |
|---|---|---|
| VGG-M | 72.34 | 0.5850 |
| DualInception | 73.09 | 0.5964 |
| AlexNet | 75.51 | 0.6327 |
| STSTNet | 77.48 | 0.6621 |
| OffApexNet | 78.38 | 0.6757 |
| WASPP-Net | 80.50 | 0.7075 |
Number of network parameters for each architecture model.
| Types of Models | Number of Parameters |
|---|---|
| DSPP-Net | 8,378,659 |
| WSPP-Net | 8,231,203 |
| DASPP-Net | 8,378,659 |
| WASPP-Net | 8,117,794 |