| Literature DB >> 36267236 |
Peiji Chen1, Bochao Zou2, Abdelkader Nasreddine Belkacem3, Xiangwen Lyu4, Xixi Zhao5, Weibo Yi6, Zhaoyang Huang7, Jun Liang8, Chao Chen1,9.
Abstract
Current decoding algorithms based on one-dimensional (1D) convolutional neural networks (CNNs) have shown effectiveness in the automatic recognition of emotional tasks using physiological signals. However, these recognition models usually take a single modality of physiological signal as input, and the inter-correlations between different modalities of physiological signals are ignored entirely, even though they could be an important source of information for emotion recognition. Therefore, a complete end-to-end multi-input deep convolutional neural network (MI-DCNN) structure was designed in this study. The newly designed 1D-CNN structure takes full advantage of multi-modal physiological signals and automatically completes the process from feature extraction to emotion classification. To evaluate the effectiveness of the proposed model, we designed an emotion elicitation experiment and collected physiological signals from a total of 52 participants, including electrocardiography (ECG), electrodermal activity (EDA), and respiratory activity (RSP), while they watched emotion elicitation videos. Subsequently, traditional machine learning methods were applied as baseline comparisons: for arousal, the baseline accuracy and F1-score on our dataset were 62.9 ± 0.9% and 0.628 ± 0.01, respectively; for valence, they were 60.3 ± 0.8% and 0.600 ± 0.01, respectively. Differences between the MI-DCNN and single-input DCNNs were also compared, and the proposed method was verified on two public datasets (DEAP and DREAMER) as well as our own. The results on our dataset showed a significant improvement in both tasks compared to traditional machine learning methods (t-test; arousal: p = 9.7E-03 < 0.01, valence: p = 6.5E-03 < 0.01), which demonstrates the strength of introducing a multi-input convolutional neural network for emotion recognition based on multi-modal physiological signals.
Keywords: biological signals; convolutional neural network; emotion recognition; machine learning; multi-modality
Year: 2022 PMID: 36267236 PMCID: PMC9577494 DOI: 10.3389/fnins.2022.965871
Source DB: PubMed Journal: Front Neurosci ISSN: 1662-453X Impact factor: 5.152
Summary of multi-modal physiological research using video stimulation.
| Dataset | Study | Modalities | Features | Emotion labels | Classifier | Results |
|---|---|---|---|---|---|---|
| ASCERTAIN | Subramanian et al. | ECG, GSR, EEG, EMO | Multiple physiological features and facial action units | Valence/arousal | Non-linear statistics and RBF SVM | 64%/62% for valence and arousal with peripheral signals (ECG + GSR) (2-classes) |
| DECAF | Abadi et al. | EEG, MEG, NIR, hEOG, ECG, tEMG | Multiple physiological features and audio-video features | Valence/arousal | Linear SVM | 56%/60% for valence and arousal with peripheral physiological signals (2-classes) |
| DREAMER | Katsigiannis and Ramzan | EEG, ECG | HRV, PSD | Valence/arousal | SVM | 61.84%/62.32% for valence/arousal using all modalities (2-classes) |
| | Siddharth et al. | EEG, ECG | PSD, HRV | Valence/arousal | LSTM | 79.95% for valence/arousal using all-modality fusion (2-classes) |
| DEAP | Wang and Shang | EEG, EOG, EMG | Raw data | Valence/arousal | DBN | 51.2%/60.9% for valence and arousal (2-classes) |
| | Tripathi et al. | EEG | Image features extracted by convolutional neural networks | Valence/arousal | DNN/CNN | 81.41%/73.36% for arousal and valence using EEG (2-classes) |
| | Siddharth et al. | EEG, ECG, GSR | Physiological features and features extracted by a pre-trained VGG-16 model | Valence/arousal | LSTM | 71.87%/73.05% for valence and arousal (2-classes) |
| AMIGOS | Santamaria-Granados et al. | ECG, GSR | Time-domain and non-linear features: mean, min, max, standard deviation, etc. | Valence/arousal | DCNN | 75%/76% for valence and arousal using all modalities (2-classes) |
| | Miranda-Correa et al. | EEG, ECG, GSR | Multiple physiological features | Valence/arousal | Gaussian naive Bayes/SVM | 56%/56.4% for valence and arousal using all modalities (2-classes) |
| MAHNOB-HCI | Siddharth et al. | EEG, ECG, GSR | PSD, HRV, statistical features in the time-frequency domain | Valence/arousal | LSTM | 80.36%/80.61% for valence and arousal using peripheral physiological features (2-classes) |
| | Subramanian et al. | ECG | Statistical distributions of dominant frequencies (DFs) from IMFs and their differences | Valence/arousal | KNN | 59.2%/58.7% for valence and arousal (3-classes) |
| MPED | Song et al. | EEG, GSR, RSP, ECG | PSD, STFT, HHS, HOC, Hjorth parameters | Joy, funny, anger, fear, disgust, and neutrality | KNN/SVM/LSTM/A-LSTM | Several evaluation protocols (see Song et al.) |
The modalities include electrocardiography (ECG), electroencephalography (EEG), magnetoencephalography (MEG), galvanic skin response (GSR), horizontal electrooculography (hEOG), trapezius electromyography (tEMG), and respiratory activity (RSP). The feature extraction methods include heart rate variability (HRV), power spectral density (PSD), short-time Fourier transform (STFT), Hilbert–Huang spectrum (HHS), higher order crossing (HOC), and intrinsic mode functions (IMFs). VGG-16: Visual Geometry Group network (13 convolutional layers and three fully connected layers). The classifiers include support vector machine (SVM), K-nearest neighbors (KNN), deep neural network (DNN), deep convolutional neural network (DCNN), long short-term memory (LSTM), attention-based long short-term memory (A-LSTM), and deep belief network (DBN).
Details of the emotion elicitation material in our dataset.
| Emotion | Video 1 | Description | Duration | Video 2 | Description | Duration |
|---|---|---|---|---|---|---|
| Disgust | FOOD FOR LOUIS | A man eats mealworms | 34 s | FOOD FOR LOUIS | A man eats cockroaches | 42 s |
| Fear | Myopia surgery | A doctor performs eye surgery on a patient | 116 s | Ring | A woman crawls out of the television | 201 s |
| | Final Destination | A serious car accident | 103 s | The Eye | A woman meets a ghost in the elevator | 77 s |
| Sadness | Redmond's Olympic Story | An athlete finishes the race despite an injury | 91 s | Wenzhou train collision | Families of the accident victims seek the truth | 117 s |
| Anger | Beating gravida | A man beats a woman in a restaurant | 144 s | Dog abuse | A man incites a big dog to bite a little dog | 96 s |
| Happiness | Larva farting episode | Yellow tries to prevent a fart from coming out | 92 s | Fault funny collection | Fragments of mistakes and funny accidents | 77 s |
Figure 1. ECG analysis of subject 1 in the selected interval (10 s) of video b. (A) Filtering and R-peak detection of the ECG signal. (B) Heart rate changes during the selected interval. (C) Individual heartbeats.
ECG features used in this research.
| Domain | Features |
|---|---|
| Time domain | RMSSD, MeanNN, SDNN, SDSD, CVNN, CVSD, MedianNN, MadNN, MCVNN, IQRNN, pNN50, pNN20, TINN, HTI |
| Frequency domain | HF, VHF, HFn, LnHF |
| Non-linear domain | SD1, SD2, SD1SD2, S, CSI, CVI, CSI Modified, PIP, IALS, PSS, PAS, GI, SI, AI, PI, C1d, C1a, SD1d, SD1a, C2d, C2a, SD2d, SD2a, Cd, Ca, SDNNd, SDNNa, ApEn, SampEn |
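The feature names above match the HRV indices produced by the NeuroKit2 toolbox (e.g., MeanNN, SD1, SampEn). Assuming that library (an assumption; the toolbox is not named in this record), a minimal sketch of the pipeline in Figure 1, from filtering and R-peak detection to the time-, frequency-, and non-linear-domain indices, could look like this:

```python
# Hedged sketch: HRV feature extraction with NeuroKit2 (assumed toolbox;
# the feature names in the table match its HRV index naming).
import neurokit2 as nk

sampling_rate = 1000  # Hz; assumed, replace with the recorder's actual rate

# Stand-in ECG segment; in the study this would be a recorded signal.
ecg = nk.ecg_simulate(duration=120, sampling_rate=sampling_rate)

# Filter the signal and detect R-peaks (cf. Figure 1A).
cleaned = nk.ecg_clean(ecg, sampling_rate=sampling_rate)
peaks, info = nk.ecg_peaks(cleaned, sampling_rate=sampling_rate)

# Time-, frequency-, and non-linear-domain HRV indices in one call; columns
# are prefixed with "HRV_", e.g., HRV_RMSSD, HRV_HF, HRV_SD1, HRV_SampEn.
hrv = nk.hrv(peaks, sampling_rate=sampling_rate)
print(hrv.filter(regex="RMSSD|SDNN$|HF$|SD1$|SampEn"))
```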
Figure 2. Multi-input deep convolutional neural network.
Detailed feature extraction structure of the multi-input deep convolutional neural network.
| Layer | ECG branch | EDA branch | RSP branch | Parameters |
|---|---|---|---|---|
| Conv Block1 | | | | Activation = ReLU, Strides = 2/2/3, Padding = "same" |
| Conv Block2 | | | | Activation = ReLU, Strides = 2/3/2, Padding = "same" |
| Conv Block3 | | | | Activation = ReLU, Strides = 1/1/1, Padding = "same" |
| GAP layer | [(?, 128)] | [(?, 32)] | [(?, 64)] | - |
| Concatenate layer | [(?, 224)] | | | - |
GAP, global average pooling. The "?" denotes the batch size.
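Given the table above and Figure 3, a minimal Keras sketch of the three-branch structure follows. Only the GAP widths (128/32/64, concatenating to 224) come from the table; the kernel sizes, intermediate filter counts, input lengths, and the reading of "Strides = 2/2/3" as one stride per branch are assumptions.

```python
# Hedged sketch of the MI-DCNN in Keras; see the assumptions listed above.
from tensorflow import keras
from tensorflow.keras import layers

def branch(input_len, filters, strides, name):
    """One signal branch: three 1D conv blocks, then global average pooling."""
    inp = keras.Input(shape=(input_len, 1), name=name)
    x = inp
    for f, s in zip(filters, strides):
        x = layers.Conv1D(f, kernel_size=7, strides=s, padding="same",
                          activation="relu")(x)  # kernel_size=7 is a placeholder
    return inp, layers.GlobalAveragePooling1D()(x)

# Strides read column-wise from the table (Blocks 1-3):
# ECG 2/2/1, EDA 2/3/1, RSP 3/2/1 — one possible reading.
ecg_in, ecg_out = branch(2500, filters=(32, 64, 128), strides=(2, 2, 1), name="ECG")
eda_in, eda_out = branch(2500, filters=(8, 16, 32), strides=(2, 3, 1), name="EDA")
rsp_in, rsp_out = branch(2500, filters=(16, 32, 64), strides=(3, 2, 1), name="RSP")

merged = layers.Concatenate()([ecg_out, eda_out, rsp_out])  # shape (?, 224)
output = layers.Dense(2, activation="softmax")(merged)      # high/low arousal or valence

model = keras.Model([ecg_in, eda_in, rsp_in], output)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```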
Figure 3. Architecture of the convolutional blocks.
Figure 4. Framework of emotion recognition in our research.
Baseline results on our dataset.
| Classifier | Modality | Arousal accuracy | Arousal F1-score | Valence accuracy | Valence F1-score |
|---|---|---|---|---|---|
| SVM | ECG | 60.0 ± 1.7% | 0.599 ± 0.02 | 58.0 ± 1.1% | 0.573 ± 0.02 |
| | EDA | 58.5 ± 1.3% | 0.584 ± 0.01 | 55.3 ± 1.7% | 0.512 ± 0.02 |
| | RSP | 59.2 ± 2.3% | 0.579 ± 0.04 | 55.5 ± 1.4% | 0.548 ± 0.01 |
| | Fusion | 62.9 ± 0.9% | 0.628 ± 0.01 | 60.3 ± 0.8% | 0.600 ± 0.01 |
| RFC | ECG | 61.4 ± 2.4% | 0.612 ± 0.02 | 56.8 ± 1.1% | 0.567 ± 0.01 |
| | EDA | 56.3 ± 1.2% | 0.561 ± 0.01 | 54.6 ± 1.5% | 0.535 ± 0.02 |
| | RSP | 59.9 ± 1.4% | 0.595 ± 0.01 | 57.6 ± 1.1% | 0.569 ± 0.01 |
| | Fusion | 62.9 ± 2.1% | 0.627 ± 0.02 | 59.7 ± 1.3% | 0.594 ± 0.02 |
| KNN | ECG | 59.5 ± 3.1% | 0.592 ± 0.03 | 55.9 ± 2.1% | 0.554 ± 0.02 |
| | EDA | 54.3 ± 2.1% | 0.523 ± 0.02 | 53.6 ± 1.6% | 0.520 ± 0.02 |
| | RSP | 58.9 ± 1.9% | 0.588 ± 0.02 | 55.7 ± 2.1% | 0.552 ± 0.02 |
| | Fusion | 62.7 ± 1.2% | 0.625 ± 0.01 | 59.2 ± 0.7% | 0.584 ± 0.01 |
| Random | - | 50.1 ± 0.2% | 0.489 ± 0.01 | 49.9 ± 0.1% | 0.489 ± 0.01 |
| Majority | - | 50.0% | 0.333 | 50.0% | 0.333 |
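A hedged sketch of how such classical baselines are typically obtained with scikit-learn follows; `X` and `y` are placeholders, not the study's data. Note that the majority-class F1 of 0.333 is what macro-averaged F1 yields for a constant predictor on balanced binary labels.

```python
# Hedged sketch of the classical baselines; X and y are stand-ins for the
# extracted multi-modal feature matrix and binary (high/low) labels.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_validate
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(520, 46))    # placeholder: fused ECG/EDA/RSP features
y = rng.integers(0, 2, size=520)  # placeholder: binary arousal (or valence) labels

models = {
    "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "RFC": RandomForestClassifier(n_estimators=100, random_state=0),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
}
for name, clf in models.items():
    cv = cross_validate(clf, X, y, cv=5, scoring=("accuracy", "f1_macro"))
    print(f"{name}: acc {cv['test_accuracy'].mean():.3f} "
          f"± {cv['test_accuracy'].std():.3f}, "
          f"f1 {cv['test_f1_macro'].mean():.3f}")
```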
Comparison with other public datasets.
| Dataset | Modality | Arousal accuracy | Arousal F1-score | Valence accuracy | Valence F1-score |
|---|---|---|---|---|---|
| DEAP | EEG | 62.0% | 0.583 | 57.6% | 0.563 |
| | Peripheral | 57.0% | 0.533 | 62.7% | 0.608 |
| | Majority class | 64.4% | 0.389 | 58.6% | 0.368 |
| DREAMER | EEG | 62.2% | 0.577 | 62.5% | 0.518 |
| | ECG | 62.4% | 0.580 | 62.4% | 0.531 |
| | Fusion | 62.3% | 0.575 | 61.8% | 0.521 |
| AMIGOS | EEG | N/A | 0.577 | N/A | 0.564 |
| | GSR | N/A | 0.541 | N/A | 0.528 |
| | ECG | N/A | 0.551 | N/A | 0.545 |
| | Fusion | N/A | 0.564 | N/A | 0.560 |
| Ours | ECG | 61.4% | 0.612 | 58.0% | 0.573 |
| | EDA | 58.5% | 0.584 | 55.3% | 0.512 |
| | RSP | 59.9% | 0.595 | 57.6% | 0.569 |
| | Fusion | 62.9% | 0.628 | 60.3% | 0.600 |
| | Majority class | 50.0% | 0.333 | 50.0% | 0.333 |
Results of the proposed method on our dataset.
| Method | Arousal accuracy | Arousal F1-score | Valence accuracy | Valence F1-score |
|---|---|---|---|---|
| Single-input DCNN (ECG) | 75.6 ± 0.5% | 0.751 ± 0.00 | 64.5 ± 1.0% | 0.642 ± 0.01 |
| Single-input DCNN (EDA) | 58.3 ± 0.5% | 0.569 ± 0.01 | 55.6 ± 0.5% | 0.544 ± 0.02 |
| Single-input DCNN (RSP) | 66.2 ± 0.8% | 0.656 ± 0.01 | 58.5 ± 0.4% | 0.579 ± 0.01 |
| Single-input DCNN (ECG&EDA&RSP) | 63.9 ± 1.7% | 0.635 ± 0.02 | 56.6 ± 1.3% | 0.554 ± 0.02 |
| Multi-input DCNN (ECG&EDA&RSP) | 78.3 ± 1.6% | 0.780 ± 0.01 | 67.1 ± 1.2% | 0.666 ± 0.01 |
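The abstract reports the significance of this improvement via t-tests (arousal: p = 9.7E-03, valence: p = 6.5E-03). A minimal sketch of such a test, with hypothetical per-run accuracies standing in for the paper's raw values:

```python
# Hedged sketch of the reported t-test; the per-run accuracies below are
# hypothetical placeholders, not the paper's raw values.
from scipy import stats

mi_dcnn_acc  = [0.79, 0.77, 0.78, 0.80, 0.78]  # MI-DCNN repeated-run accuracies
baseline_acc = [0.63, 0.62, 0.64, 0.62, 0.63]  # best classical-baseline accuracies

t, p = stats.ttest_ind(mi_dcnn_acc, baseline_acc)
print(f"t = {t:.2f}, p = {p:.1e}")  # the paper reports p < 0.01 for both tasks
```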
Figure 5. Comparison of emotion classification based on ECG data versus fused ECG&EDA&RSP data.
Figure 6. Confusion matrices using fusion data (ECG&EDA&RSP) with our proposed method: (A) arousal; (B) valence.
Comparison with other research.
| Dataset | Study | Modality | End-to-end | Valence accuracy | Valence F1-score | Arousal accuracy | Arousal F1-score |
|---|---|---|---|---|---|---|---|
| DEAP | Koelstra et al. | Peripheral | No | 62.7% | 0.608 | 57.0% | 0.533 |
| | Wang and Shang | EEG, EOG, EMG | Yes | 51.2% | - | 60.9% | - |
| | Ours | Peripheral | Yes | 67.9% | 0.67 | 68.4% | 0.69 |
| DREAMER | Katsigiannis and Ramzan | ECG | No | 62.37% | 0.5305 | 62.37% | 0.5798 |
| | Ours | ECG | Yes | 78.6% | 0.77 | 74.7% | 0.74 |