Edgar P Torres P1, Edgar A Torres2, Myriam Hernández-Álvarez1, Sang Guun Yoo1.
Abstract
Affective computing is an area of artificial intelligence that recognizes, interprets, processes, and simulates human affects. A user's emotional states can be sensed through electroencephalography (EEG)-based Brain-Computer Interface (BCI) devices. Emotion recognition using these tools is a rapidly growing research field with multiple interdisciplinary applications. This article surveys the pertinent scientific literature from 2015 to 2020. It presents trends and a comparative analysis of algorithm applications in new implementations from a computer science perspective. Our survey gives an overview of datasets, emotion elicitation methods, feature extraction and selection, classification algorithms, and performance evaluation. Lastly, we provide insights for future developments.
Keywords: BCI; classification; emotion; extraction; feature; preprocessing; recognition; selection; survey; trends
Year: 2020 PMID: 32906731 PMCID: PMC7570756 DOI: 10.3390/s20185083
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Figure 1. Emotional states in the Valence-Arousal space [11].
Figure 2. Emotional states in the Valence-Arousal-Dominance space [12].
Frequency band associations [16,17].
| Band | State Association | Potential Localization | Stimuli |
|---|---|---|---|
| Gamma rhythm (above 30 Hz) | Positive valence. These waves are correlated with positive spiritual feelings. Arousal increases with high-intensity visual stimuli. | Different sensory and non-sensory cortical networks. | These waves are stimulated by attention, multi-sensory information, memory, and consciousness. |
| Beta (13 to 30 Hz) | They are related to visual self-induced positive and negative emotions. These waves are associated with alertness and problem-solving. | Motor cortex. | They are stimulated by motor activity, motor imagination, or tactile stimulation. Beta power increases during the tension of scalp muscles, which are also involved in frowning and smiling. |
| Alpha (8 to 13 Hz) | They are linked to relaxed and wakeful states, feelings of conscious awareness, and learning. | Parietal and occipital regions. | These waves are believed to appear during relaxation periods with eyes shut while remaining still awake. They represent the visual cortex in a repose state. These waves slow down when falling asleep and accelerate when opening the eyes, moving, or even when thinking about the intention to move. |
| Theta (4 to 7 Hz) | They appear in relaxation states, and in those cases, they allow better concentration. These waves also correlate with anxious feelings. | The front central head region is associated with the hippocampal theta waves. | Theta oscillations are involved in memory encoding and retrieval. Additionally, individuals who experience higher emotional arousal in a reward situation show an increase of theta waves in their EEG. |
| Delta (0 to 4 Hz) | They are present in the deep NREM 3 sleep stage. | Frontal, temporal, and occipital regions. | Deep sleep. These waves have also been found in continuous attention tasks. |
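To illustrate how the band definitions above are typically applied, the following minimal Python sketch estimates per-band power of a single EEG channel from a plain FFT periodogram. The band edges approximately follow the table, and all names (`BANDS`, `band_powers`) are our own illustrative choices, not taken from any surveyed system.

```python
import numpy as np

# Approximate band edges in Hz, following the table above.
BANDS = {"delta": (0.5, 4), "theta": (4, 8), "alpha": (8, 13),
         "beta": (13, 30), "gamma": (30, 45)}

def band_powers(signal, fs):
    """Absolute power per frequency band for a 1-D signal sampled at fs Hz."""
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    psd = np.abs(np.fft.rfft(signal)) ** 2 / len(signal)  # periodogram
    return {band: psd[(freqs >= lo) & (freqs < hi)].sum()
            for band, (lo, hi) in BANDS.items()}

# A synthetic 10 Hz oscillation should register as alpha activity.
fs = 256
t = np.arange(0, 2, 1 / fs)
powers = band_powers(np.sin(2 * np.pi * 10 * t), fs)
```

In practice, surveyed systems use windowed estimators (e.g., Welch's method) rather than a single raw periodogram, but the band-masking step is the same.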
Figure 3. Components of an EEG-based BCI for emotion recognition.
Publicly available datasets.
| Source | Dataset | Number of Channels | Emotion Elicitation | Number of Participants | Target Emotions |
|---|---|---|---|---|---|
| [ | DEAP | 32 EEG channels | Music videos | 32 | Valence, arousal, dominance, liking |
| [ | eNTERFACE’06 | 54 EEG channels | Selected images from IAPS. | 5 | Calm, positive exciting, negative exciting |
| [ | headIT | - | Recall past emotions | 31 | Positive valence (joy, happiness) or negative valence (sadness, anger) |
| [ | SEED | 62 channels | Film clips | 15 | Positive, negative, neutral |
| [ | SEED-IV | 62 channels | 72 film clips | 15 | Happy, sad, neutral, fear |
| [ | Mahnob-HCI-tagging | 32 channels | Fragments of movies and pictures. | 30 | Valence and arousal rated with the self-assessment manikin |
| [ | EEG Alpha Waves dataset | 16 channels | Resting-state eyes open/closed experimental protocol | 20 | Relaxation |
| [ | DREAMER | 14 channels | Film clips | 23 | Rating 1 to 5 to valence, arousal, and dominance |
| [ | RCLS | 64 channels | Native Chinese Affective Video System | 14 | Happy, sad, and neutral |
Frequently used preprocessing methods of EEG signals.
| Preprocessing Method | Main Characteristics | Advantages | Limitations | Literature’s Usage Statistics % (2015–2020) |
|---|---|---|---|---|
| Independent component analysis (ICA) [ | ICA separates artifacts from EEG signals into independent components based on the data’s characteristics without relying on reference channels. It decomposes the multi-channel EEG data into temporal separate and spatial-fixed components. It has been applied for ocular artifact extraction. | ICA efficiently separates artifacts from noise components. | ICA is successful only under specific conditions where one of the signals is of greater magnitude than the others. | 26.8 |
| Common Average Reference (CAR) [ | CAR is used to generate a reference for each channel. The algorithm obtains an average of all the recordings on every electrode and then uses it as a reference. The result is an improvement in the Signal-to-Noise Ratio. | CAR outperforms standard types of electrical referencing, reducing noise by >30%. | The average calculation may present problems for finite sample density and incomplete head coverage. | 5.0 |
| Surface Laplacian (SL) [ | SL is a way of viewing the EEG data with high spatial resolution. It is an estimate of current density entering or leaving the scalp through the skull, considering the volume conductor’s outer shape and does not require details of volume conduction. | SL estimates are reference-free, meaning that any EEG recording reference scheme will render the same SL estimates. | It is sensitive to artifacts and spline patterns. | 0.4 |
| Principal Component Analysis (PCA) [ | PCA finds patterns in data. It can be pictured as a rotation of the coordinate axes so that the axes lie not along single time points but along linear combinations of sets of time points that collectively represent a pattern within the signal. PCA rotates the axes to maximize the variance within the data along the first axis, maintaining their orthogonality. | PCA helps in the reduction of feature dimensions. | PCA does not eliminate noise, but it can reduce it. Compared to ICA, PCA compresses the data rather than separating it into independent sources. | 50.1 |
| Common Spatial Patterns (CSP) [ | CSP applies spatial filters that are used to discriminate different classes of EEG signals. For instance, those corresponding to different motor activity types. CSP also estimates covariance matrices. | CSP does not require a priori selection of sub-specific bands and knowledge of these bands. | CSP requires many electrodes. | 17.7 |
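Of the methods above, CAR is the simplest to state concretely: subtract the across-electrode mean from every channel at each time point. The sketch below is a minimal illustration under the assumption of a `(n_channels, n_samples)` array layout; the function name is our own.

```python
import numpy as np

def common_average_reference(eeg):
    """Re-reference EEG to the common average.

    eeg: (n_channels, n_samples) array -> re-referenced copy.
    """
    # Subtract the instantaneous mean over all electrodes from each channel.
    return eeg - eeg.mean(axis=0, keepdims=True)

# Noise shared by all electrodes (here, a constant offset) is removed,
# while channel-specific activity is preserved up to that mean.
rng = np.random.default_rng(0)
eeg = rng.normal(size=(32, 1000)) + 5.0
rereferenced = common_average_reference(eeg)
```

After CAR, the instantaneous average across channels is zero by construction, which is the property the Signal-to-Noise improvement in the table relies on.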
Feature extraction algorithms.
| Feature Extraction Method | Main Characteristics | Domain | Advantages | Limitations | Literature’s usage statistics % (2015–2020) |
|---|---|---|---|---|---|
| ERP [ | It is the brain response to a sensory, cognitive, or motor event. Two sub-classifications are (1) evoked potentials and (2) induced potentials. | Time | It has an excellent temporal resolution. | ERP has a poor spatial resolution, so it is not useful for research questions related to the activity location. | 2.9 |
| Hjorth Features [ | These are statistical indicators whose parameters are normalized slope descriptors. | Time | Low computational cost appropriate for real-time analysis. | Possible statistical bias in signal parameter calculations | 17.0 |
| Statistical Measures [ | Signal statistics: power, mean, standard deviation, variance, kurtosis, relative band energy. | Time | Low computational cost. | - | 8.6 |
| DE [ | Entropy evidences scattering in data. Differential Entropy can reflect spatial signal variations. | Time–spatial | Entropy and derived indexes reflect the intra-cortical information flow. | - | 4.9 |
| HOC [ | Oscillation in time series can be represented by counts of axis crossings and their differences. HOC displays a monotone property whose rate of increase discriminates between processes. | Time | HOC reveals the oscillatory pattern of the EEG signal, providing a feature set that conveys enough emotion information to the classification space. | The training process is time-consuming due to the dependence of the HOC order on different channels and channel combinations. | 2.0 |
| ICA [ | ICA is a signal enhancing method and a feature extraction algorithm. ICA separates components that are independent of each other based on the statistical independence principle. | Time | ICA efficiently separates artifacts from noise components. ICA decomposes signals into temporal independent and spatially fixed components. | ICA is only useful under specific conditions (one of the signals is of greater magnitude than the others). | 11.3 |
| PCA [ | The PCA algorithm is mostly used for feature extraction but can also be used for feature selection. It reduces the dimensionality of the signals, creating new uncorrelated variables. | Time | PCA reduces data dimensionality with minimal information loss. | PCA assumes that the data is linear and continuous. | 19.7 |
| WT [ | The WT method represents the original EEG signal with secured and straightforward building blocks known as wavelets, which can be discrete or continuous. | Time-frequency | WT describes the features of the signal within a specified frequency domain and localized time domain properties. It is used to analyze irregular data patterns. | High computational and memory requirements. | 26.0 |
| AR [ | AR is used for feature extraction in the frequency domain. AR estimates the power spectrum density (PSD) of the EEG using a parametric approach. The estimation of PSD is achieved by calculating the coefficients or parameters of the linear system under consideration. | Frequency domain | AR is used for feature extraction in the frequency domain. | The order of the model in the spectral estimation is challenging to select. | 1.6 |
| WPD [ | WPD generates a sub-band tree structuring since a full binary tree can characterize the decomposition process. WPD decomposes the original signals orthogonally and independently from each other and satisfies the law of conservation of energy. The energy distribution is extracted as the feature. | Time-frequency | WPD can analyze non-stationary signals such as EEG. | WPD uses a high computational time to analyze the signals. | 1.6 |
| FFT [ | FFT is an analysis method in the frequency domain. EEG signal characteristics are reviewed and computed by power spectral density (PSD) estimation to represent the EEG samples signal selectively. | Frequency | FFT has a higher speed than all the available methods so that it can be used for real-time applications. | FFT has low-frequency resolution and high spectral loss of information, which makes it hard to find the actual frequency of the signal. | 2.2 |
| Functional EEG connectivity indices [ | EEG-based functional connectivity is estimated in the frequency bands for all pairs of electrodes using correlation, coherence, and phase synchronization index. Repeated measures of variance for each frequency band were used to determine different connectivity indices among all pairs. | Frequency | Connectivity indices at each frequency band can be used as features to recognize emotional states. | Difficult to generalize and distinguish individual differences in functional brain activity. | 1.3 |
| Rhythm [ | Detection of repeating patterns in the frequency band or “rhythm”. | Frequency | Specific band rhythms contribute to emotion recognition. | - | 0.1 |
| Graph Regularized Sparse Linear Regression (GRSLR) [ | This method applies a graph regularization and a sparse regularization on the transform matrix of linear regression. | Frequency | It can simultaneously cope with sparse transform matrix learning while preserving the intrinsic manifold of the data samples. | - | 0.2 |
| Granger causality [ | This feature is a statistical concept of causation that is based on prediction. | Frequency | The authors can analyze the brain’s underlying structural connectivity. | These features only give information about the linear characteristics of signals. | 0.6 |
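Among the time-domain features above, the Hjorth parameters have a particularly compact definition and are cheap enough for real-time use, as the table notes. The following sketch computes them with discrete differences as the derivative estimate; this simplification and the function name are ours.

```python
import numpy as np

def hjorth_parameters(x):
    """Return (activity, mobility, complexity) for a 1-D signal x."""
    dx = np.diff(x)    # first-derivative estimate
    ddx = np.diff(dx)  # second-derivative estimate
    activity = np.var(x)                        # signal power
    mobility = np.sqrt(np.var(dx) / activity)   # mean-frequency proxy
    complexity = np.sqrt(np.var(ddx) / np.var(dx)) / mobility
    return activity, mobility, complexity

# A pure sinusoid resembles itself under differentiation, so its
# complexity is close to 1; broadband noise scores higher.
t = np.arange(0, 2, 1 / 256)
act, mob, comp = hjorth_parameters(np.sin(2 * np.pi * 10 * t))
```

Per-channel triples like these are then concatenated across electrodes to form the feature vector handed to a classifier.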
Figure 4. Frequency domain, time domain, and spatial information [63].
Feature selection methods used in the literature (2015–2020) in percentages (%).
| Feature Selection Method | Literature’s Usage Statistics % (2015–2020) |
|---|---|
| Minimum-Redundancy Maximum-Relevance (mRMR) | 11.5% |
| Univariate | 6.3% |
| Multivariate | 6.3% |
| Genetic Algorithms | 32.3% |
| Stepwise Discriminant Analysis (SDA) | 17.7% |
| Fisher score | 7.3% |
| Wrapper methods | 15.6% |
| Built-in methods | 3.1% |
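The Fisher score in the table has a simple closed form: the ratio of between-class variance to within-class variance, computed per feature. The sketch below is an illustrative implementation under our own naming; features are then ranked by score and the top ones kept.

```python
import numpy as np

def fisher_scores(X, y):
    """Fisher score per feature.

    X: (n_samples, n_features) feature matrix; y: integer class labels.
    Higher score = better class separation for that feature.
    """
    classes = np.unique(y)
    mean_all = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for c in classes:
        Xc = X[y == c]
        between += len(Xc) * (Xc.mean(axis=0) - mean_all) ** 2
        within += len(Xc) * Xc.var(axis=0)
    return between / within

# Feature 0 is shifted by class, so it should rank first.
rng = np.random.default_rng(1)
y = np.repeat([0, 1], 50)
X = rng.normal(size=(100, 3))
X[:, 0] += 3 * y
scores = fisher_scores(X, y)
```

Being a filter method, this ranking is classifier-independent, unlike the wrapper methods listed above, which score feature subsets by the accuracy of a specific classifier.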
Categories of general classifiers.
| Category of Classifier | Description | Examples of Algorithms in the Category | Advantages | Limitations | Literature’s Usage Statistics % (2015–2020) |
|---|---|---|---|---|---|
| Linear | Discriminant algorithms that use linear functions (hyperplanes) to separate classes. | Linear Discriminant Analysis LDA [ | These algorithms have reasonable classification accuracy and generalization properties. | Linear algorithms tend to have poor outcomes in processing complex nonlinear EEG data. | 5.50 |
| Neural networks (NN) | NN are discriminant algorithms that recognize underlying relationships in a set of data resembling the human brain operation. | Multilayer Perceptron MLP [ | NN generally yields good classification accuracy | Sensitive to overfitting with noisy and non-stationary data as EEGs. | 1.60 |
| Nonlinear Bayesian classifier | Generative classifiers produce nonlinear decision boundaries. | Bayes quadratic [ | Generative classifiers reject uncertain samples efficiently. | For Bayes quadratic, the covariance matrix cannot be estimated accurately if the dimensionality is vast, and there are not enough training sample patterns. | 0.10 |
| Nearest neighbor classifiers | Discriminative algorithms that classify cases based on their similarity to other samples. | k-Nearest Neighbors kNN [ | kNN has excellent performance with low-dimensional feature vectors. | kNN has reduced performance for classifying high-dimension feature vectors or noise-distorted features. | 4.5 |
| Combination of classifiers | Combined classifiers using boosting, voting, or stacking. Boosting consists of several cascading classifiers. In voting, classifiers have scores, which yield a combined score per class and a final class label. Stacking uses classifiers as meta-classifier inputs. | Ensemble methods can combine almost any type of classifier [ | Variance reduction that leads to increased classification accuracy. | Quality measures are application dependent. | 2.1 |
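As a concrete instance of the nearest-neighbor category above, the following is a minimal kNN sketch (Euclidean distance, majority vote). It is an illustrative implementation with our own names, not the code used by any surveyed system, which typically rely on library implementations.

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k=3):
    """Classify each row of X_test by majority vote of its k nearest
    training samples (Euclidean distance)."""
    preds = []
    for x in X_test:
        dist = np.linalg.norm(X_train - x, axis=1)  # distance to all points
        nearest = y_train[np.argsort(dist)[:k]]     # labels of k closest
        labels, counts = np.unique(nearest, return_counts=True)
        preds.append(labels[np.argmax(counts)])     # majority vote
    return np.array(preds)

# Two well-separated clusters: points near each cluster get its label.
X_train = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
y_train = np.array([0, 0, 1, 1])
pred = knn_predict(X_train, y_train, np.array([[0.0, 0.5], [5.0, 5.5]]))
```

The brute-force distance scan here is what makes kNN degrade on high-dimensional or noisy feature vectors, as the limitations column notes.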
Conventional performance evaluation methods for BCI.
| Performance Evaluation | Main characteristics | Advantages | Limitations |
|---|---|---|---|
| Confusion matrix | The confusion matrix presents the number of correct and erroneous classifications, specifying the erroneously categorized class. | The confusion matrix gives insights into the classifier’s error types (correct and incorrect predictions for each class). | Results are difficult to compare and discuss. Instead, some authors use parameters extracted from the confusion matrix. |
| Accuracy and error rate | The accuracy p is the probability of correct classification in a certain number of repeated measures. | It works well if the classes are balanced, i.e., there are an equal number of samples belonging to each class. | Accuracy and error rate do not take into account whether the dataset is balanced or not. If one class occurs more than another, the evaluation may appear with a high value for accuracy even though the classification is not performing well. |
| Cohen’s kappa (k) | k evaluates agreement between nominal scales. This index measures the agreement between the true class and the classifier output: 1 is perfect agreement, and 0 is pure chance agreement. | Cohen’s kappa returns the theoretical chance level of a classifier. | This coefficient has to be interpreted appropriately. It is necessary to report the bias and prevalence of the k value and test the significance for a minimum acceptable level of agreement. |
| Sensitivity or Recall | Sensitivity, also called Recall, is the true positive rate, describing the accuracy of classification results. It evaluates the proportion of correctly identified true positives relative to the sum of true positives plus false negatives. | Sensitivity measures how often a classifier correctly categorizes a positive result. | Recall should not be used when the positive class is larger (imbalanced dataset) and correct detection of positive samples is less critical to the problem. |
| Specificity | Specificity is the ability to identify a true negative rate. It measures the proportion of correctly identified true negatives over the sum of the true negatives plus false positives. | Specificity measures how often a classifier correctly categorizes a negative result. | Specificity focuses on one class only, and the majority class biases it. |
| Precision | Precision, also referred to as Positive Predictive Value, is calculated as 1 − the False Discovery Rate. | Precision measures the fraction of positive classifications that are correct. | Precision should not be used when the positive class is larger (imbalanced dataset) and correct detection of positive samples is less critical to the problem. |
| ROC | The ROC curve is a Sensitivity plot as a function of the False Positive Rate. The area under the ROC curve is a measure of how well a parameter can distinguish between a true positive and a true negative. | ROC curve provides a measure of the classifier performance across different significance levels. | ROC is not recommended when the negative class is smaller but more important. The Precision and Recall will mostly reflect the ability to predict the positive class if it is larger in an imbalanced dataset. |
| F-Measure | F-Measure is the harmonic mean of Precision and Recall. It is useful because as the Precision increases, Recall decreases, and vice versa. | F-measure can handle imbalanced data. F-measure (like ROC and kappa) provides a measure of the classifier performance across different significance levels. | F-measure does not generally take into account true negatives. |
| Pearson correlation coefficient | Pearson’s correlation coefficient (r), quantifies the degree of a ratio between the true and predicted values by a value ranking from −1 to +1. | Pearson’s correlation is a valid way to measure the performance of a regression algorithm. | Pearson’s correlation ignores any bias which might exist between the true and the predicted values. |
| Information transfer rate (ITR) | As BCI is a channel from the brain to a device, it is possible to estimate the bits transmitted from the brain. ITR is a standard metric for measuring the information sent within a given time, in bits per second. | ITR is a metric that contributes criteria to evaluate a BCI system. | ITR is often misreported due to inadequate understanding of considerations such as the delays needed to process data, present feedback, and clear the screen. |
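Several of the metrics above follow directly from the binary confusion matrix counts (TP, TN, FP, FN). The sketch below computes them from 0/1 label arrays using the standard definitions; the function name and dictionary layout are our own.

```python
import numpy as np

def evaluate(y_true, y_pred):
    """Standard binary-classification metrics from 0/1 label arrays."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    n = tp + tn + fp + fn
    accuracy = (tp + tn) / n
    recall = tp / (tp + fn)           # sensitivity / true positive rate
    specificity = tn / (tn + fp)      # true negative rate
    precision = tp / (tp + fp)        # positive predictive value
    f1 = 2 * precision * recall / (precision + recall)
    # Cohen's kappa: observed accuracy corrected for chance agreement.
    p_chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n**2
    kappa = (accuracy - p_chance) / (1 - p_chance)
    return {"accuracy": accuracy, "recall": recall,
            "specificity": specificity, "precision": precision,
            "f1": f1, "kappa": kappa}
```

Reporting kappa or F-measure alongside raw accuracy matters for the imbalanced-class caveats noted in the table, since accuracy alone rewards always predicting the majority class.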
Summary of emotion recognition systems using BCI 1.
| Reference/Year | Stimuli | EEG Data | Feature Extraction | Feature Selection | Features | Classification | Emotions | Accuracy |
|---|---|---|---|---|---|---|---|---|
| [ | - | DEAP | Computation in the time domain, Hjorth, Higuchi, | mRMR | Statistical features, BP, | RBF NN | 3 class/Arousal | Arousal/60.7% Valence/62.33% |
| [ | 15 movie clips | Own dataset/15 participants | DBN | - | DE, DASM, RASM, DCAU, from | kNN | Positive Neutral Negative. | SVM/83.99% |
| [ | Self-induced emotions | Own dataset/10 participants | WT | PCA | Eigenvalues vector | SVM | Disgust | Avg. 90.2% |
| [ | Video clips | Own dataset/10 participants | Higuchi | - | FD | RBF | Happy | Avg. 60% |
| [ | Video clips | Own dataset/30 participants | STFT, ERD, ERS | LDA | PSD | LIBSVM | Joy, Amusement, Tenderness, Anger | Neutrality 81.26% |
| [ | - | DEAP | DFT, DWT | - | PSD, Logarithmic compression of Power Bands, LFCC, PSD, DW | NB | Dislike | Avg. |
| [ | - | DEAP and SEED-IV | Computations in time domain, FFT, DWT | - | PSD, Energy, | SVM | HAHV | Avg DEAP/79% |
| [ | Music tracks | Own dataset/30 participants | STFT, WT | - | PSD, BP | SVM | Happy | Avg. |
| [ | - | SEED | FFT, and electrode location | Max Pooling | DE, DASM, RASM, DCAU | SVM | Positive | Avg. |
| [ | Video clips | Own dataset/16 participants | STFT, WT, Hjorth, AR | - | PSD, BP, Quadratic mean, AR Parameters, Hjorth | SVM | Happy | Avg. 90.41% |
| [ | - | DEAP | WT | - | Wavelets | LSTM RNN | Valence | Avg. 59.03% |
| [ | - | SEED | LSTM to learn context information for each hemispheric data | - | DE | BiDANN | Positive | Avg. 92.38% |
| [ | - | DEAP | Signal computation in the time domain, and FFT | - | Statistical characteristics, PSD | BT | Valence | Avg. for combination features AUC BT/0.9254 |
| [ | - | DEAP | Computation in the time domain, and FFT | GA | Statistical characteristics, PSD, | AdaBoost | Joy | 95.84% |
| [ | - | DEAP | STFT, NMI | - | Inter-channel connection matrix based on NMI | SVM | HAHV | Arousal/73.64% Valence/74.41% |
| [ | - | SEED | FFT | SDA | Delta, Theta, Alpha, Beta, and, Gamma | LDA | Positive | Avg. 93.21% |
| [ | - | SEED | FFT | - | Electrodes-frequency Distribution Maps (EFDMs) | CNN | Positive | Avg. 82.16% |
| [ | - | SEED/ | Computation in the time domain, and FFT | Fisher-score, classifier-dependent structure (wrapper), | EEG based network patterns (ENP) | SVM | Positive | Best feature F1 |
| [ | - | DEAP | Tensorflow framework | Sparse group lasso | Granger causality feature | CapsNet Neural Network | Valence-arousal | Arousal/87.37% Valence/88.09% |
| [ | Video clips | Own dataset RCLS/14 participants. | Computation in the time domain, WT | - | HOC, FD, Statistics, Hjorth, Wavelets | GRSLR | Happy | 81.13% |
| [ | - | DEAP | Computation in the time domain, FFT, WT | Correlation matrix, | Statistical measures, Hjorth, Autoregressive parameters, frequency bands, the ratio between frequency bands, wavelet domain features | XGBoost | Valence, arousal, dominance, and liking | Valence/75.97% Arousal/74.20% |
| [ | - | DEAP | Frequency phase information | Sequential feature elimination | Derived features of bispectrum | SVM | Low/high valence, low/high arousal | Low-high arousal/64.84% |
| [ | - | DEAP | Higuchi, FFT | - | FD, PSD | SVM | Valence, arousal | Valence/86.91% |
| [ | - | DEAP | DWT | - | Discrete wavelets | kNN | Valence, arousal | Valence/84.05% |
| [ | - | DEAP | RBM | - | Raw signal-6 channels | Deep-Learning | Happy, calm, sad, scared | Avg. 75% |
| [ | - | DEAP | DWT | Best classification performance for channel selection | Discrete wavelets | MLP | Positive, negative | MLP/77.14% |
| [ | - | DEAP | - | - | - | LSTM NN | Low/high valence, | Low-high valence/85.45% |
| [ | - | DEAP | - | - | - | 3D-CNN | Valence, arousal | Valence/87.44% |
| [ | - | DEAP | FFT, phase computations, Pearson correlation | - | PSD, phase, phase synchronization, Pearson correlation | CNN | Valence | Valence/96.41% |
| [ | Flight simulator | Own dataset/8 participants | Computation in time domain, and WT | - | Statistical measures, | ANN | Happy, Sad, | Avg. 53.18% |
1 Autoregressive Parameter (AR). Bagging Tree (BT). Band Power (BP). Bayesian linear discriminant analysis (BLDA). Bi-hemispheres Domain Adversarial Neural Network (BiDANN). Convolutional Neural Network (CNN). Complex-Valued Convolutional Neural Network (CVCNN). Gated-Shape Convolutional Neural Network (GSCNN). Global Space Local Time Filter Convolutional Neural Network (GSLTFCNN). Deep Belief Networks (DBNs). Differential entropy (DE). DE feature Differential Asymmetry (DASM). DE feature Rational Asymmetry (RASM). DE feature Differential Caudality (DCAU). Electrooculography (EOG). Electromyogram (EMG). Event-Related Desynchronization (ERD) and Synchronization (ERS). Feature selection and weighting method (SFEW). Fractal dimensions (FD). Genetic Algorithm (GA). Graph regularized Extreme Learning Machine (GELM) NN. Graph Regularized Sparse Linear Regression (GRSLR). High Order Crossing (HOC). Linear Discriminant Analysis (LDA). Logistic Regression (LR). Long short-term memory Recurrent Neural Network (LSTM RNN). Minimum-Redundancy-Maximum-Relevance (mRMR). Normalized Mutual Information (NMI). Principal Component Analysis (PCA). Radial Basis Function (RBF). Short-Time Fourier Transform (STFT). Stepwise Discriminant Analysis (SDA). Support Vector Machine (SVM). Wavelet Transform (WT).
Figure 5. Emotion elicitation methods.
Figure 6. Number of participants in EEG datasets.
Figure 7. EEG datasets for emotion recognition.
Figure 8. Domain of used features.
Figure 9. Percentage of the use of algorithms for feature extraction from Table 8.
Systems in Table 8 using feature selection algorithms.
| Feature Selection Algorithm | Reference |
|---|---|
| mRMR | [ |
| PCA | [ |
| LDA | [ |
| Max Pooling | [ |
| Genetic Algorithm | [ |
| SDA | [ |
| Fisher-score | [ |
| SFEW | [ |
| Sparse group lasso | [ |
| Correlation matrix | [ |
| Information gain | [ |
| Recursive feature elimination | [ |
| Best classification performance for channel selection | [ |
Figure 10. Classifiers’ usage.
Figure 11. Percentage of systems with different numbers of classified emotions.
Figure 12. Accuracy vs. types and number of classified emotions.