| Literature DB >> 35957167 |
Eman M G Younis1, Someya Mohsen Zaki2, Eiman Kanjo3, Essam H Houssein1.
Abstract
Automatic recognition of human emotions is not a trivial process. Many internal and external factors affect emotions, and emotions can be expressed in many ways, such as text, speech, body gestures, or physiological body responses. Emotion detection enables many applications, such as adaptive user interfaces, interactive games, human-robot interaction, and many more. The availability of advanced technologies such as mobile devices, sensors, and data analytics tools makes it possible to collect data from various sources, enabling researchers to predict human emotions accurately. Most current research collects such data in laboratory experiments. In this work, we use direct, real-time sensor data to construct a subject-independent (generic) multi-modal emotion prediction model. This research integrates on-body physiological markers, surrounding sensory data, and emotion measurements to achieve the following goals: (1) collecting a multi-modal data set including environmental variables, body responses, and emotions; (2) creating subject-independent predictive models of emotional states based on fusing environmental and physiological variables; (3) assessing ensemble learning methods, comparing their performance in creating a generic subject-independent model for emotion recognition with high accuracy, and comparing the results with previous similar research. To achieve this, we conducted a real-world study "in the wild" with physiological and mobile sensors, collecting the data set from participants walking around Minia University campus to create accurate predictive models. Various ensemble learning models (Bagging, Boosting, and Stacking) were used, combining K-Nearest Neighbor (KNN), Decision Tree (DT), Random Forest (RF), and Support Vector Machine (SVM) as base learners and DT as a meta-classifier. The results showed that the stacking ensemble technique gave the best accuracy, 98.2%, compared with the other ensemble learning variants, while the bagging and boosting methods gave 96.4% and 96.6% accuracy, respectively.
Entities:
Keywords: emotion recognition; ensemble learning; multi-modal emotion recognition; physiological and environmental; subject independent predictive models for emotion
Mesh:
Year: 2022 PMID: 35957167 PMCID: PMC9371233 DOI: 10.3390/s22155611
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.847
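The abstract describes a subject-independent stacking ensemble with KNN, DT, RF, and SVM as base learners and a decision tree as the meta-classifier. The following is a minimal sketch of that architecture with scikit-learn; the preprocessing, data split, and hyper-parameters are assumptions for illustration, not the paper's exact configuration.

```python
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Base learners named in the abstract: KNN, DT, RF, and SVM.
base_learners = [
    ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier())),
    ("dt", DecisionTreeClassifier(random_state=0)),
    ("rf", RandomForestClassifier(random_state=0)),
    ("svm", make_pipeline(StandardScaler(), SVC())),
]

# Decision tree as the meta-classifier, as stated in the abstract.
stacking = StackingClassifier(
    estimators=base_learners,
    final_estimator=DecisionTreeClassifier(random_state=0),
    cv=5,
)

# X: fused physiological + environmental features, y: emotion labels (1-5).
# X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, stratify=y)
# stacking.fit(X_train, y_train)
# print("test accuracy:", stacking.score(X_test, y_test))
```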
List of some on-body sensors that have been used for emotion detection.
| Sensor | Signals and Features |
|---|---|
| Motion | Modern accelerometers incorporate tri-axial micro-electro-mechanical systems (MEMS) that record three-dimensional acceleration; the overall motion magnitude is computed from the three axes, typically as Motion = √(x² + y² + z²) (a worked example follows the table). |
| Body Temperature | Despite its simplicity, we can use body temperature to gauge a person’s emotions and mood shifts [ |
| Heart Rate | The RR interval is the period between two successive pulse peaks, and the signal produced by this sensor consists of heartbeats. According to many studies, HR is sometimes used to measure happiness and other emotions [ |
| EDA | It is sometimes called Galvanic Skin Resistance (GSR) and is associated with emotional and stress sensitivity [ |
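As a worked example of the two quantities referenced in the table, the snippet below computes the tri-axial motion magnitude and converts RR intervals to heart rate in beats per minute. This is an illustrative sketch with made-up sample values, not the paper's processing code.

```python
import numpy as np

# Tri-axial accelerometer samples (in g); values are placeholders for illustration.
ax = np.array([0.02, 0.01, 0.03])
ay = np.array([0.15, 0.20, 0.18])
az = np.array([0.98, 1.01, 0.99])

# Overall motion magnitude per sample: sqrt(x^2 + y^2 + z^2).
motion = np.sqrt(ax**2 + ay**2 + az**2)

# RR intervals in seconds (time between successive pulse peaks); HR in bpm = 60 / RR.
rr = np.array([0.85, 0.82, 0.88])
hr = 60.0 / rr

print("motion magnitude:", motion)
print("mean heart rate (bpm):", hr.mean())
```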
Previous Research on Recognizing Emotion from Physiological Signals and Facial Expressions.
| Emotions | Measurement Methods | Data Analysis Methods | Accuracy | Ref. |
|---|---|---|---|---|
| Sadness, anger, stress, surprise | ECG, SKT, GSR | SVM | For recognizing three and four categories, the correct classification rates were 78.4% and 61.8%, respectively. | [ |
| Sadness, anger, fear, surprise, frustration, and amusement | GSR, HRV, SKT | KNN, DFA, MBP | KNN, DFA, and MBP could classify emotions with 72.3%, 75.0%, and 84.1% accuracy, respectively | [ |
| Three levels of driver stress | ECG, EOG, GSR and respiration | Fisher projection matrix and a linear discriminant | Three levels of driver stress with an accuracy of over 97% | [ |
| Fear, neutral, joy | ECG, SKT, GSR, respiration | Canonical correlation analysis | The rate of correct categorization is 85.3%. Fear, neutral, and happy categorization percentages were 76%, 94%, and 84%, respectively | [ |
| The emotional classes identified are high stress, low stress, disappointment, and euphoria | Facial EOG, ECG, GSR, respiration | SVM and adaptive neuro-fuzzy inference system (ANFIS) | The total classification rates for the SVM and the ANFIS using ten-fold cross-validation are 79.3% and 76.7%, respectively. | [ |
| Fatigue caused by driving for extended hours | HRV | Neural network | The accuracy of the neural network is 90% | [ |
| Boredom, pain, surprise | GSR, ECG, HRV, SKT | Machine learning algorithms such as linear discriminant analysis (LDA), classification and regression tree (CART), self-organizing map (SOM), and SVM | SVM produced an accuracy rate of 100.0% | [ |
| The arousal classes were calm, medium aroused, and activated and the valence classes were unpleasant, neutral, and pleasant | ECG, pupillary response, gaze distance | Support vector machine | The optimal classification accuracies of 68.5% for three labels of valence and 76.4% for three labels of arousal | [ |
| Sadness, fear, pleasure | ECG, GSR, blood volume, pulse | Support vector regression | Recognition rate up to 89.2% | [ |
| Terrible, love, hate, sentimental, lovely, happy, fun, shock, cheerful, depressing, exciting, melancholy, mellow | EEG, GSR, blood volume pressure, respiration pattern, SKT, EMG, EOG | Support Vector Machine, Multilayer Perceptron (MLP), K-Nearest Neighbor (KNN), and Meta-multiclass (MMC) | The average accuracies are 81.45%, 74.37%, 57.74% and 75.94% for the SVM, MLP, KNN and MMC classifiers, respectively. The best result is for ‘Depressing’ with 85.46% using SVM. | [ |
| Happiness, sadness, surprise, stress | SKT, EDA, and HR | SVM, RSVM, SVM+GA, NN, DFA | The average accuracies are 66.95% (SVM), 75.9% (RSVM), 90% (SVM+GA), 80.2% (NN), and 84.7% (DFA); this study used an Empatica E4 smartwatch to collect data from participants | [ |
| Theoretical emotions | EEG signal | KNN, NB, SVM, RF, feature extraction (e.g., wavelet transform and non-linear dynamics), feature reduction (e.g., PCA, LDA) | This study achieved an average classification accuracy of over 80%, using a wearable sensor to collect EEG signals | [ |
Recent Previous Research on Recognizing Emotion from Physiological Signals and Facial Expressions (2021 and 2022).
| Emotions | Measurement Methods | Data Analysis Methods | Accuracy | Ref. |
|---|---|---|---|---|
| Arousal and valence emotions. Arousal represents inactive and active emotions (Annoying, Angry, Nervous, Excited, Happy, Pleased). Valence represents negative and positive emotions (Sad, Bored, Sleepy, Relaxed, Calm, Peaceful) | EEG, Facial expressions | ANN, SVM, RF, K-NN, DT, RNN, CNN, DNN, DBN, LSTM | ML classification accuracy ranges from 61.17% to 93% (SVM: 41%, ANN: 18%, RF: 14%, KNN: 9%, DT: 9%) and deep learning classification accuracy ranges from 61.25% to 97.56% (LSTM: 50%, DNN: 7%, DBN: 7%, CNN: 36%) | [ |
| Arousal and valence (low and high) emotion levels. | EEG Signal | ML classifiers (KNN, SVM, LDA) and deep learning and MG3P (NN, MLP, ELM) and Gaussian process, k-means | This study reported an overall recognition rate of 82.9% [NN: 85.80%, SVM: 77.80%, KNN: 88.94%, MLP: 78.16%, 87.10%, 78.06%, 71.30%, 71.30%] | [ |
| Negative and positive emotions | EEG signal and facial expressions | ML classifiers: RF, KNN, SVM, DT, LDA and deep learning classifiers: CNN+LSTM | This study achieves the following accuracy levels: 63.33% RF, 63.33% SVM, 61.7% KNN, 55% DT, 51.7% LDA, 71.67% CNN+LSTM. | [ |
| Negative emotions (annoyed, stressed, angry) | EEG physiological signals | ML classifiers: LR, SVM | It achieves accuracy levels of 75.00% (LR) and 72.62% (SVM). | [ |
Figure 1 The proposed system architecture using data fusion.
The collected data.
| Microsoft Wrist-Band 2 | Android Phone 7 |
|---|---|
| Heart Rate (HR) | Self-Report of Emotion (1–5) |
| Body-Temperature (Body-Temp) | Environmental Noise (Env-Noise) |
| Electro Dermal Activities (EDA) | GPS Location (lat, lon) |
| Hand Acceleration (Motion as three-axis accelerometer) | |
| Air Pressure | |
| Light (UV) | |
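The two devices in the table record at different rates, so their streams have to be time-aligned before modelling (the data-fusion step in Figure 1). Below is a hedged sketch of one way to fuse them with pandas; the file names, column names, and nearest-timestamp strategy are assumptions, since the paper only states that data fusion was used.

```python
import pandas as pd

# Hypothetical exports from the wrist band and the phone, each with a timestamp column.
band = pd.read_csv("band.csv", parse_dates=["timestamp"])    # HR, EDA, bTemp, X, Y, Z, UV, ...
phone = pd.read_csv("phone.csv", parse_dates=["timestamp"])  # Label, EnvNoise, lat, lon, ...

# Align each phone record (including the self-reported label) with the nearest band sample.
band = band.sort_values("timestamp")
phone = phone.sort_values("timestamp")
fused = pd.merge_asof(phone, band, on="timestamp",
                      direction="nearest", tolerance=pd.Timedelta("5s"))

fused = fused.dropna()  # drop records with no band sample close enough in time
print(fused.head())
```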
Extracted and removed features.
| Features Extracted | Meaning | Removed Features |
|---|---|---|
| EDA | It’s called Electro-Dermal Activity, a skin-conductance measure. | FLightofStairsAscended, |
| HR | Heart Rate (Also called pulse) | |
| Air-Pressure | The pressure of the air. | |
| bTemp | It’s called Body temperature. | |
| Env-Noise | Represents Environmental Noise. | |
| UV | UV means Ultra-violet radiation. | |
| Motion | An accelerometer with three axes. | |
| X | Participant’s Motion in X-axis. | |
| Y | Participant’s Motion in Y-axis | |
| Z | Participant’s Motion in Z-axis | |
| Total-Gain | The overall gain achieved by the participant. | |
| Total-Loss | The amount of calories lost. | |
| Stepping-Gain | Steps achieved or gained during travel. | |
| Stepping-Loss | The steps in which a loss of calories occurred. | |
| Steps-Ascended | Number of steps ascended (climbed up). | |
| Steps-Descended | Number of steps descended (climbed down). | |
| Rate | The rate of movement in X, Y, and Z directions. | |
| Label | The target emotion label, on a scale of 1–5. | |
Figure 2 Line Plot of Variance Threshold (X) Versus Number of Selected Features to be Removed (Y).
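Figure 2 plots how many features would be dropped at each variance cut-off. A minimal sketch of that kind of sweep with scikit-learn's VarianceThreshold follows; the thresholds and the toy feature matrix are placeholders, not the study's values.

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold

def removed_per_threshold(X, thresholds):
    """Count how many features fall at or below each variance threshold."""
    counts = []
    for t in thresholds:
        selector = VarianceThreshold(threshold=t).fit(X)
        counts.append(int((~selector.get_support()).sum()))
    return counts

# Toy feature matrix with columns of increasing variance; in the study, X would be
# the fused sensor features.
rng = np.random.RandomState(0)
X = rng.rand(200, 10) * np.arange(1, 11)
print(removed_per_threshold(X, thresholds=[0.0, 0.2, 0.4, 0.6, 0.8, 1.0]))
```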
Figure 3 Types of ensemble methods.
Figure 4 Explanation of the Bootstrap Aggregating Method (Bagging).
Figure 5 Illustration of the Boosting Algorithm Architecture. Adapted from Li, Y. et al. (2021) [50].
Figure 6 Illustration of AdaBoost Architecture Steps. Adapted from Sodhi, A. (1997) [51].
Figure 7 Flow Chart of the Gradient Boosting Algorithm.
Figure 8 Flow Chart of the Stacking Classification Ensemble.
Figure 9 A comparison between ensemble learning methods: (a) Bagging Method. (b) Boosting Method. (c) Stacking Method. Adapted from Kiyak, E.O. (2020) [52].
A comparison of the differences and characteristics of the three ensemble methods (Bagging, Boosting, and Stacking).
| | Bagging | Boosting | Stacking |
|---|---|---|---|
| Differences | Bagging often considers homogeneous weak learners, learns them independently from each other in parallel, and combines them following some kind of deterministic averaging process. | Boosting frequently takes into account homogeneous weak learners, trains them sequentially in a highly adaptive way (a base model depends on the preceding ones), and combines them by a deterministic method. | Stacking frequently takes into account diverse weak learners, trains them concurrently, and then combines them by training a meta-model to produce a prediction based on the output of the many weak models. |
| Characteristics | Bagging enables a group of weak learners to work together to outperform a single strong learner. Additionally, it aids in variance reduction, hence preventing the over-fitting of models during the process. | Boosting models can be improved with the help of several hyper-parameters. Boosting algorithms iteratively combine several weak learners, re-weighting the observations that earlier learners misclassified. Boosting can lessen the high bias that frequently appears in models like decision trees and logistic regression. With boosting algorithms, only characteristics that have a large impact on the target are chosen, potentially reducing dimensionality and improving computational efficiency. | Stacking can harness the capabilities of a range of well-performing models on a classification or regression task and make predictions that have better performance than any single model in the ensemble. |
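A minimal sketch of the three ensemble variants compared in the table, using scikit-learn with mostly default settings; the exact base estimators and hyper-parameters used in the paper are not reproduced here.

```python
from sklearn.ensemble import (AdaBoostClassifier, BaggingClassifier,
                              RandomForestClassifier, StackingClassifier)
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Bagging: homogeneous learners (decision trees by default) trained in parallel
# on bootstrap samples and combined by voting.
bagging = BaggingClassifier(n_estimators=100, random_state=0)

# Boosting: learners trained sequentially, each re-weighting the observations
# the previous ones misclassified (AdaBoost shown here).
boosting = AdaBoostClassifier(n_estimators=100, random_state=0)

# Stacking: heterogeneous base learners whose predictions feed a meta-model.
stacking = StackingClassifier(
    estimators=[("knn", KNeighborsClassifier()),
                ("svm", SVC()),
                ("rf", RandomForestClassifier(random_state=0))],
    final_estimator=DecisionTreeClassifier(random_state=0),
)

# With a fused feature matrix X and emotion labels y:
# for name, model in [("bagging", bagging), ("boosting", boosting), ("stacking", stacking)]:
#     print(name, cross_val_score(model, X, y, cv=5).mean())
```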
Figure 10 (a) Importance of Physiological and Environmental Features. (b) Cumulative Importance of These Variables.
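Importances like those in Figure 10 can be obtained from a fitted random forest, with the cumulative curve taken over the sorted importances. A sketch, assuming the fused features sit in a DataFrame with the column names used elsewhere in this record:

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

def feature_importance_table(X: pd.DataFrame, y) -> pd.DataFrame:
    """Per-feature importance and cumulative importance from a random forest."""
    rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
    imp = pd.Series(rf.feature_importances_, index=X.columns).sort_values(ascending=False)
    return pd.DataFrame({"importance": imp, "cumulative": imp.cumsum()})

# Example with assumed column names:
# cols = ["EDA", "HR", "UV", "Motion", "EnvNoise", "AirPressure", "bTemp"]
# print(feature_importance_table(fused[cols], fused["Label"]))
```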
Descriptive statistics for the collected signals.
| | Min | Q1 | Median | Q2 | Mean | Q3 | Max | Skew | Kurtosis | Std |
|---|---|---|---|---|---|---|---|---|---|---|
| EDA | 0.0 | 0.0 | 340,330 | 340,330 | 221,954 | 340,330 | 340,330 | −0.6 | −1.7 | 165,736 |
| HR | 0.0 | 0.0 | 70.0 | 70.0 | 45.7 | 70.0 | 70.0 | −0.6 | −1.7 | 34.1 |
| UV | 78.0 | 80.0 | 82.0 | 82.0 | 82.5 | 85.0 | 89.0 | 0.6 | −1.0 | 3.7 |
| X | −0.1 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | −0.6 | −0.5 | 0.0 |
| Y | 0.1 | 0.1 | 0.1 | 0.1 | 0.2 | 0.2 | 0.3 | 0.2 | −1.7 | 0.1 |
| Z | 0.9 | 0.9 | 1.0 | 1.0 | 1.0 | 1.0 | 1.1 | −0.1 | 1.5 | 0.0 |
| EnvNoise | 49.0 | 52.0 | 52.0 | 52.0 | 52.6 | 53.0 | 56.0 | 0.2 | −0.2 | 1.6 |
| AirPressure | 0.0 | 0.0 | 1010.6 | 1010.6 | 703.1 | 1010.7 | 1010.7 | −0.8 | −1.4 | 475.5 |
| bTemp | 0.0 | 0.0 | 22.8 | 22.8 | 15.8 | 22.8 | 22.8 | −0.8 | −1.4 | 10.7 |
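The statistics in the table above (minimum, quartiles, mean, maximum, skewness, kurtosis, and standard deviation) can be reproduced per signal with pandas. A minimal sketch, assuming the fused signals sit in a DataFrame named `fused`:

```python
import pandas as pd

def describe_signals(df: pd.DataFrame) -> pd.DataFrame:
    """Per-column summary matching the table: min, Q1, median, mean, Q3, max, skew, kurtosis, std."""
    return pd.DataFrame({
        "Min": df.min(),
        "Q1": df.quantile(0.25),
        "Median": df.median(),
        "Mean": df.mean(),
        "Q3": df.quantile(0.75),
        "Max": df.max(),
        "Skew": df.skew(),
        "Kurtosis": df.kurtosis(),
        "Std": df.std(),
    })

# describe_signals(fused[["EDA", "HR", "UV", "X", "Y", "Z", "EnvNoise", "AirPressure", "bTemp"]])
```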
Figure 11 Correlation Matrix of all independent features with respect to the dependent variable (Label).
Covariance Matrix of all on-body and environmental variables.
| | EDA | HR | UV | Motion | EnvNoise | AirPressure | bTemp |
|---|---|---|---|---|---|---|---|
| EDA | 26,655,597,441 | −1,012,733 | −202,362,467 | 3430.42 | 95,587 | 33,144.6 | −648,616 |
| HR | −1,012,733 | 293.4 | 32,554.2 | 0.67 | −7.4 | 2.104 | 51.51 |
| UV | −202,362,467 | 32,554.2 | 11,783,951.32 | 40.46 | −1174.20 | 516.51 | 9288.6 |
| Motion | 3430.4 | 0.57 | 40.46 | 0.13 | 0.016 | 0.032 | −0.046 |
| EnvNoise | 95,587.03 | −7.40 | −1174.20 | 0.016 | 4.82 | 0.15 | −3.40 |
| AirPressure | 33,144.58 | 2.10 | 516.51 | 0.032 | 0.15 | 0.31 | −0.58 |
| bTemp | −648,616 | 51.51 | 9288.58 | −0.046 | −3.39 | −0.58 | 22.79 |
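Both the correlation matrix behind Figure 11 and the covariance matrix above come directly from the fused feature columns. A brief sketch with pandas (column and label names are assumed from this record):

```python
import pandas as pd

def cov_and_corr(df: pd.DataFrame, cols, label="Label"):
    """Covariance of the raw signals and their correlation with the emotion label."""
    cov = df[cols].cov()  # raw-signal units differ widely, hence the very large EDA entries
    corr_with_label = df[cols + [label]].corr()[label].drop(label)
    return cov, corr_with_label

# cov, corr = cov_and_corr(fused, ["EDA", "HR", "UV", "Motion", "EnvNoise", "AirPressure", "bTemp"])
# print(cov.round(2)); print(corr.sort_values())
```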
Figure 12 PCA plot of Body features with environmental variables.
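A sketch of the PCA projection behind Figure 12. The signals are standardised first because EDA and air pressure are on very different scales; this preprocessing is an assumption, as the paper's exact pipeline is not reproduced here.

```python
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

pca_pipeline = make_pipeline(StandardScaler(), PCA(n_components=2))

# With the fused feature columns (assumed names):
# components = pca_pipeline.fit_transform(fused[["EDA", "HR", "UV", "Motion", "EnvNoise", "AirPressure", "bTemp"]])
# explained = pca_pipeline.named_steps["pca"].explained_variance_ratio_
# print(explained)  # share of variance captured by the two plotted components
```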
Figure 13 (a) Noisy HR patterns shown using a Poincaré plot. (b) Normal HR of the user after data transformation.
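A Poincaré plot like the one in Figure 13 is simply each RR interval plotted against the next one. A short matplotlib sketch; the `rr` array is placeholder data, not the study's recordings.

```python
import matplotlib.pyplot as plt
import numpy as np

rr = np.array([0.84, 0.86, 0.83, 0.87, 0.85, 0.88, 0.84])  # RR intervals in seconds (placeholder)

plt.scatter(rr[:-1], rr[1:], s=10)  # RR(n) on x, RR(n+1) on y
plt.xlabel("RR(n) [s]")
plt.ylabel("RR(n+1) [s]")
plt.title("Poincare plot of heart-rate intervals")
plt.show()
```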
Figure 14 Accuracy Levels Comparison of Bagging, Boosting and Stacking Ensemble Methods.
Figure 15 Accuracy comparison of base classifiers trained on environmental factors only, on physiological factors only, and on the entire data set, alongside the accuracy of the Stacking Ensemble Method.
Figure 16 Accuracy Levels Comparison between Base Classifiers of the Environmental Modality and the Stacking Learner.
Figure 17 Accuracy Levels Comparison between Base Classifiers of the Physiological Modality and the Stacking Learner.
Classification Report of the Stacking Model.
| | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| 1 | 0.94 | 0.98 | 0.96 | 666 |
| 2 | 0.98 | 0.97 | 0.98 | 1661 |
| 3 | 0.99 | 0.99 | 0.99 | 2532 |
| 4 | 0.99 | 0.98 | 0.99 | 1813 |
| 5 | 0.98 | 0.99 | 0.99 | 1415 |
| Accuracy | | | 0.98 | 8087 |
| macro avg | 0.98 | 0.98 | 0.98 | 8087 |
| weighted avg | 0.98 | 0.98 | 0.98 | 8087 |
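The table above is a standard per-class report. With predictions from the fitted stacking model it can be generated with scikit-learn's classification_report; a sketch, with the variable names assumed from the earlier snippets and a tiny self-contained illustration of the output format.

```python
from sklearn.metrics import classification_report

# With the fitted stacking model and a held-out split:
# y_pred = stacking.predict(X_test)
# print(classification_report(y_test, y_pred, digits=2))  # precision/recall/F1 per label 1-5

# Tiny self-contained illustration of the report format:
y_true = [1, 2, 3, 4, 5, 1, 2, 3]
y_hat = [1, 2, 3, 4, 5, 1, 2, 2]
print(classification_report(y_true, y_hat, digits=2))
```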
A comparison between the results of the classifiers for each modality and for the entire data set, and the stacking learner built on them.
| Classifier | Body + Environmental | Body | Environmental |
|---|---|---|---|
| KNN | 93% | 89% | 87% |
| SVM | 94% | 91% | 88% |
| DT | 97% | 91% | 89.50% |
| RF | 97.50% | 91.40% | 88% |
| Stacking | 98.20% | 93% | 91% |
Parameters of the classifiers used in the Stacking Model after applying hyper-parameter tuning.
| Classifier | Parameter | Parameter Explanation |
|---|---|---|
| KNN | KNN parameters | Explanation:- |
| SVM | SVM Parameters | |
| RF | RF Parameters | |
| DT | DT Parameters |
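The tuned parameter values themselves are not reproduced in this record. Below is a hedged sketch of the kind of hyper-parameter search the table refers to, using GridSearchCV with illustrative grids; the search spaces are assumptions, not the paper's values.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Illustrative search spaces only; the paper's tuned values are not listed in this record.
searches = {
    "KNN": (KNeighborsClassifier(),
            {"n_neighbors": [3, 5, 7, 11], "weights": ["uniform", "distance"]}),
    "SVM": (SVC(),
            {"C": [0.1, 1, 10], "kernel": ["rbf", "linear"]}),
    "RF": (RandomForestClassifier(random_state=0),
           {"n_estimators": [100, 300], "max_depth": [None, 10, 20]}),
    "DT": (DecisionTreeClassifier(random_state=0),
           {"max_depth": [None, 5, 10], "criterion": ["gini", "entropy"]}),
}

# With the training split from earlier sketches:
# best = {}
# for name, (model, grid) in searches.items():
#     gs = GridSearchCV(model, grid, cv=5, scoring="accuracy").fit(X_train, y_train)
#     best[name] = gs.best_params_
# print(best)
```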