Shiqi Wang, Suen Guan, Hui Lin, Jianming Huang, Fei Long, Junfeng Yao.
Abstract
Micro-expressions are rapid and subtle facial movements. Unlike ordinary facial expressions in daily life, micro-expressions are very difficult to detect and recognize. In recent years, owing to a wide range of potential applications in many domains, micro-expression recognition has attracted extensive attention in the computer vision community. Because the available micro-expression datasets are very small, deep neural network models with a huge number of parameters are prone to over-fitting. In this article, we propose OF-PCANet+, a method for micro-expression recognition in which we design a spatiotemporal feature learning strategy based on a shallow PCANet+ model and combine optical flow sequence stacking with the PCANet+ network to learn discriminative spatiotemporal features. We conduct comprehensive experiments on the publicly available SMIC and CASME2 datasets. The results show that our lightweight model clearly outperforms popular hand-crafted methods and achieves performance comparable to deep learning based methods such as 3D-FCNN and ELRCN.
Keywords: PCANet+; deep learning; micro-expression recognition; optical flow
Year: 2022 PMID: 35684917 PMCID: PMC9185295 DOI: 10.3390/s22114296
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.847
Convention of variable representation.
| Variable Symbol | Description |
|---|---|
| | A |
| | |
| | A 2-dimensional real matrix with |
| | A 3-dimensional real matrix with size of |
| | A clipped matrix of |
| | A set. |
| | Size of the set |
Figure 1. The framework of the proposed ME recognition method.
Figure 2. Example of optical flow motion estimation, where we set the first frame of ME image sequence as the reference frame and then compute the optical flow field between the reference frame and the rest of the frames with a subspace trajectory model.
Figure 3. Illustration of stacking optical flow sequences into multi-channel images.
Figure 4. The frames of a sample video clip (happiness) in CASME2 dataset.
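The stacking step illustrated in Figure 3 can be sketched in a few lines of NumPy. This is a minimal sketch rather than the authors' implementation: it assumes the optical flow fields are already computed and stored as an array of shape (N, H, W, 2), that windows of T consecutive fields slide with a step of one, and that channels are ordered as (horizontal, vertical) pairs in time order. The function name `stack_optical_flow` is hypothetical.

```python
import numpy as np


def stack_optical_flow(flows: np.ndarray, T: int) -> np.ndarray:
    """Stack T consecutive optical flow fields into multi-channel images.

    flows: array of shape (N, H, W, 2) holding the horizontal and vertical
           flow components of N fields (assumed layout, not from the source).
    Returns an array of shape (N - T + 1, H, W, 2 * T).
    """
    if flows.ndim != 4 or flows.shape[-1] != 2:
        raise ValueError("flows must have shape (N, H, W, 2)")
    n = flows.shape[0]
    if not 1 <= T <= n:
        raise ValueError("T must satisfy 1 <= T <= N")

    # Each window concatenates T fields along the channel axis:
    # (u_i, v_i, u_{i+1}, v_{i+1}, ..., u_{i+T-1}, v_{i+T-1}).
    windows = [
        np.concatenate([flows[i + t] for t in range(T)], axis=-1)
        for i in range(n - T + 1)
    ]
    return np.stack(windows, axis=0)
```

Under these assumptions, a frame stacking number of T = 5 turns each window of five two-channel flow fields into one 10-channel image, which is then passed to the feature learning network.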
Detailed information on the SMIC and CASME2 datasets.
| Dataset | SMIC-HS | CASME2 |
|---|---|---|
| Subjects | 16 | 26 |
| Samples | 164 | 247 |
| Year | 2013 | 2014 |
| Frame Rate (fps) | 100 | 200 |
| Image Resolution | 640 × 480 | 640 × 480 |
| Emotion classes | 3 categories: positive (51), negative (70), surprise (43) | 5 categories: happiness (32), surprise (25), disgust (64), repression (27), others (99) |
ME recognition results of OF-PCANet+ with respect to different frame stacking number, T.
| Frame Stacking Number T | SMIC Accuracy | SMIC Macro-F1 | SMIC Macro-Recall | CASME2 Accuracy | CASME2 Macro-F1 | CASME2 Macro-Recall |
|---|---|---|---|---|---|---|
| 1 | 0.4268 | 0.3924 | 0.3890 | 0.2301 | 0.2437 | 0.2316 |
| 3 | 0.6159 | 0.6184 | 0.6214 | 0.4959 | 0.4960 | 0.4786 |
| 5 | 0.6280 | 0.6309 | 0.6369 | 0.5203 | 0.5266 | 0.5148 |
| 7 | 0.6098 | 0.6109 | 0.6131 | 0.4512 | 0.4412 | 0.4270 |
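The three scores reported in these tables can be computed from a confusion matrix. The sketch below assumes that Macro-F1 is the unweighted mean of the per-class F1 scores and that Macro-Recall is the unweighted mean of the per-class recalls; the formulas are not spelled out in this section, so these definitions and the function name `classification_metrics` are assumptions.

```python
import numpy as np


def classification_metrics(y_true, y_pred, num_classes):
    """Return accuracy, macro-F1 and macro-recall for integer class labels."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)

    # Confusion matrix: rows are true classes, columns are predicted classes.
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    np.add.at(cm, (y_true, y_pred), 1)

    tp = np.diag(cm).astype(float)
    support = cm.sum(axis=1)    # number of true samples per class
    predicted = cm.sum(axis=0)  # number of predictions per class

    # Classes with no samples or no predictions get a score of 0.
    recall = np.divide(tp, support, out=np.zeros_like(tp), where=support > 0)
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom,
                   out=np.zeros_like(tp), where=denom > 0)

    return {
        "accuracy": tp.sum() / cm.sum(),
        "macro_f1": f1.mean(),
        "macro_recall": recall.mean(),
    }
```

Macro averaging gives every emotion class the same weight regardless of its sample count, which matters for imbalanced datasets such as SMIC and CASME2.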
ME recognition results of OF-PCANet+ with respect to different number and size of filters.
| | SMIC Accuracy | SMIC Macro-F1 | SMIC Macro-Recall | CASME2 Accuracy | CASME2 Macro-F1 | CASME2 Macro-Recall |
|---|---|---|---|---|---|---|
| | 0.5854 | 0.5893 | 0.5941 | 0.5000 | 0.5122 | 0.4962 |
| | 0.5854 | 0.5880 | 0.5941 | 0.5041 | 0.5047 | 0.4950 |
| | 0.5915 | 0.5954 | 0.6036 | 0.5081 | 0.5114 | 0.5020 |
| | 0.5793 | 0.5834 | 0.5905 | 0.5163 | 0.5198 | 0.5055 |
| | 0.6098 | 0.6127 | 0.6173 | 0.5285 | 0.5272 | 0.5031 |
| | 0.5976 | 0.6010 | 0.6084 | 0.5122 | 0.5128 | 0.4950 |
| | 0.6098 | 0.6137 | 0.6209 | 0.5041 | 0.5081 | 0.4867 |
| | 0.5203 | 0.5266 | 0.5148 |
| | 0.6037 | 0.6046 | 0.6096 |
| | 0.6037 | 0.6053 | 0.6126 | 0.5285 | 0.5280 | 0.5067 |
| | 0.5976 | 0.6007 | 0.6048 | 0.4919 | 0.4931 | 0.4724 |
| | 0.6220 | 0.6247 | 0.6310 | 0.5081 | 0.5152 | 0.4931 |
| | 0.5915 | 0.5943 | 0.6001 | 0.4268 | 0.4096 | 0.4096 |
| | 0.6098 | 0.6131 | 0.6167 | 0.4350 | 0.4250 | 0.4250 |
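The number and size of filters varied in the table above are the main hyperparameters of a PCA-based convolutional layer. As a rough illustration, the sketch below learns such filters in the general PCANet style: it collects all k × k patches, removes each patch's mean, and keeps the leading eigenvectors of the patch covariance matrix as filters. It assumes that filters span all input channels and does not reproduce the stride, padding, or pooling settings of the PCANet+ model used here; the function name `learn_pca_filters` is hypothetical.

```python
import numpy as np


def learn_pca_filters(images: np.ndarray, k: int, num_filters: int) -> np.ndarray:
    """Learn PCA filters from a batch of multi-channel images.

    images: array of shape (N, H, W, C).
    Returns filters of shape (num_filters, k, k, C).
    """
    n, h, w, c = images.shape
    if num_filters > k * k * c:
        raise ValueError("num_filters cannot exceed the patch dimension k*k*C")

    # Collect every k x k x C patch (stride 1, no padding) as a flat vector.
    patches = [
        img[i:i + k, j:j + k, :].ravel()
        for img in images
        for i in range(h - k + 1)
        for j in range(w - k + 1)
    ]
    X = np.asarray(patches, dtype=np.float64)

    # Remove the mean of each patch, as in PCANet-style filter learning.
    X -= X.mean(axis=1, keepdims=True)

    # Eigen-decomposition of the patch covariance; eigh sorts eigenvalues
    # in ascending order, so reverse the columns to get the leading ones.
    cov = X.T @ X / X.shape[0]
    _, eigvecs = np.linalg.eigh(cov)
    leading = eigvecs[:, ::-1][:, :num_filters]

    return leading.T.reshape(num_filters, k, k, c)
```

Because the filters are eigenvectors rather than weights trained by backpropagation, the only quantities to tune are the filter count and the filter size, which is consistent with the grid of configurations evaluated in the table.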
Summary of the configuration of PCANet+ network.
| | Best Configuration For SMIC | Best Configuration For CASME2 |
|---|---|---|
| | Stride 1, Padding 3 | Stride 1, Padding 3 |
| Pool-1 | | |
| | Stride 1, Padding 4 | Stride 1, Padding 3 |
Figure 5. The visualization of feature maps produced in each layer for an input video clip from CASME2 dataset.
Comparisons of different methods.
| Method | SMIC Accuracy | SMIC Macro-F1 | SMIC Macro-Recall | CASME2 Accuracy | CASME2 Macro-F1 | CASME2 Macro-Recall |
|---|---|---|---|---|---|---|
| LBP-TOP | 0.4207 | 0.4266 | 0.4429 | 0.4390 | 0.4297 | 0.4259 |
| STLBP-IP | 0.4329 | 0.4270 | 0.4241 | 0.4173 | 0.4026 | 0.4282 |
| KGSL | 0.5244 | 0.4937 | 0.5162 | 0.4575 | 0.4325 | 0.4437 |
| ELRCN | N/A | N/A | N/A | 0.5244 | 0.5000 | 0.4396 |
| 3D-FCNN | 0.5549 | N/A | N/A | 0.5911 | N/A | N/A |
| OF-PCANet+ | 0.6280 | 0.6309 | 0.6369 | 0.5325 | 0.5493 | 0.5241 |