Guangcheng Bao, Bin Yan, Li Tong, Jun Shu, Linyuan Wang, Kai Yang, Ying Zeng.
Abstract
One of the greatest limitations in the field of EEG-based emotion recognition is the lack of training samples, which makes it difficult to establish effective emotion recognition models. Inspired by the achievements of generative models in image processing, we propose a data augmentation model named VAE-D2GAN for EEG-based emotion recognition, built on a generative adversarial network. EEG features representing different emotions are extracted as topological maps of differential entropy (DE) under five classical frequency bands. The proposed model learns the distributions of these features from real EEG signals and generates artificial samples for training. The variational auto-encoder (VAE) architecture, which can learn the spatial distribution of the actual data through a latent vector, is introduced into the dual-discriminator GAN (D2GAN) to improve the diversity of the generated artificial samples. To evaluate the performance of this model, we conducted systematic tests on two public emotion EEG datasets, SEED and SEED-IV. With data augmentation, the method reaches recognition accuracies of 92.5% and 82.3% on SEED and SEED-IV, respectively, which is 1.5% and 3.5% higher than without data augmentation. The experimental results show that the artificial samples generated by our model can effectively enhance the performance of EEG-based emotion recognition.
Keywords: data augmentation; electroencephalography (EEG); emotion recognition; generative adversarial network (GAN); variational auto-encoder (VAE)
Year: 2021 PMID: 34955797 PMCID: PMC8700963 DOI: 10.3389/fncom.2021.723843
Source DB: PubMed Journal: Front Comput Neurosci ISSN: 1662-5188 Impact factor: 2.380
FIGURE 1 The framework of VAE-D2GAN. The model consists of an encoder, a decoder/generator, and two discriminators.
FIGURE 2 TP-DE images of the five frequency bands for one participant. Each image is 32 pixels in height and width, with 5 channels (one per frequency band).
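The TP-DE images above are built from differential entropy values. For a band-filtered EEG segment that is approximately Gaussian, DE reduces to 0.5·ln(2πeσ²), where σ² is the segment variance. A minimal sketch of that computation follows; the band edges and the Gaussian assumption follow common practice in the DE literature, while the electrode-to-pixel mapping (`coords`) is a placeholder, since the authors' interpolation scheme is not reproduced in this record.

```python
import numpy as np

# Five classical EEG frequency bands (Hz); segments are assumed to be
# already band-pass filtered into each of these ranges.
BANDS = {"delta": (1, 4), "theta": (4, 8), "alpha": (8, 14),
         "beta": (14, 31), "gamma": (31, 50)}

def differential_entropy(segment):
    """DE of a 1-D band-filtered segment under a Gaussian assumption:
    DE = 0.5 * ln(2 * pi * e * sigma^2)."""
    return 0.5 * np.log(2 * np.pi * np.e * np.var(segment))

def tp_de_image(band_segments, coords, size=32):
    """Place per-channel DE values onto a size x size topological grid.

    band_segments: dict band -> array of shape (n_channels, n_samples)
    coords: (row, col) grid position of each electrode (an assumption;
            the actual electrode layout and interpolation are not shown).
    Returns an array of shape (size, size, n_bands).
    """
    image = np.zeros((size, size, len(BANDS)))
    for b, band in enumerate(BANDS):
        for ch, (r, c) in enumerate(coords):
            image[r, c, b] = differential_entropy(band_segments[band][ch])
    return image
```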
The training process of VAE-D2GAN.
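Training alternates VAE updates with adversarial updates against the two discriminators. As a rough sketch of the loss terms involved — assuming the standard D2GAN objective (Nguyen et al., 2017), with positive-valued discriminators D1 and D2 and weights α, β, plus the usual Gaussian-prior VAE KL term; the exact weighting used by the authors is not reproduced here:

```python
import numpy as np

def d2gan_discriminator_objective(d1_real, d1_fake, d2_real, d2_fake,
                                  alpha=1.0, beta=1.0):
    """Value the two discriminators jointly maximize in D2GAN:
    D1 gives a log reward to real samples and a linear penalty to fakes;
    D2 does the reverse. Inputs are positive discriminator outputs."""
    return (alpha * np.mean(np.log(d1_real)) - np.mean(d1_fake)
            - np.mean(d2_real) + beta * np.mean(np.log(d2_fake)))

def d2gan_generator_loss(d1_fake, d2_fake, beta=1.0):
    """The generator minimizes the fake-sample terms of the same objective."""
    return -np.mean(d1_fake) + beta * np.mean(np.log(d2_fake))

def vae_kl(mu, log_var):
    """KL divergence between N(mu, exp(log_var)) and the N(0, I) prior,
    the regularizer that shapes the 64-dimensional latent vector."""
    return -0.5 * np.sum(1.0 + log_var - mu**2 - np.exp(log_var))
```

Together with the VAE reconstruction error, these terms cover both KL and reverse-KL divergences, which is what D2GAN uses to trade off sample quality against diversity.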
FIGURE 3 Structural diagram of the deep neural network (DNN).
FIGURE 4 Flowchart of data augmentation for emotion recognition.
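The flow in Figure 4 amounts to: train the generative model on real TP-DE images, sample artificial images for each emotion class, merge them with the real training set, and train the classifier on the union. A toy, self-contained sketch of that flow — the `generate` callable stands in for a trained VAE-D2GAN decoder, and all names here are illustrative, not the authors' API:

```python
import numpy as np

def augment_training_set(x_train, y_train, generate, n_generated, n_classes):
    """Merge real samples with generator output, balanced across classes.

    generate(label, n) -> array of n artificial samples for one emotion
    class; here it stands in for sampling a trained VAE-D2GAN decoder.
    """
    per_class = n_generated // n_classes
    xs, ys = [x_train], [y_train]
    for label in range(n_classes):
        xs.append(generate(label, per_class))
        ys.append(np.full(per_class, label))
    return np.concatenate(xs), np.concatenate(ys)

# Toy stand-in generator: jitter the class mean of the real data with noise.
def make_toy_generator(x_train, y_train, rng):
    def generate(label, n):
        mean = x_train[y_train == label].mean(axis=0)
        return mean + 0.1 * rng.normal(size=(n,) + mean.shape)
    return generate
```

The augmented set is then fed to the DNN classifier exactly as the real-only set would be; the tables below vary `n_generated` to probe how much artificial data helps.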
VAE-D2GAN architecture.
| Module | Layer | Kernel size | Stride | Input | Output | Activation |
| Encoder | Input | – | – | – | 32*32*5 | – |
| | Conv1 | (5,5) | 2 | 32*32*5 | – | ReLU |
| | Conv2 | (5,5) | 2 | – | – | ReLU |
| | Conv3 | (5,5) | 2 | – | – | ReLU |
| | FC1 | – | – | – | – | – |
| Generator | Input | – | – | 64 | – | – |
| | FC1 | – | – | – | – | ReLU |
| | Deconv1 | (5,5) | 2 | – | – | ReLU |
| | Deconv2 | (5,5) | 2 | – | – | ReLU |
| | Deconv3 | (5,5) | 2 | – | 32*32*5 | ReLU |
| Discriminator | Input | – | – | – | 32*32*5 | – |
| | Conv1 | (5,5) | 2 | 32*32*5 | – | ReLU |
| | Conv2 | (5,5) | 2 | – | – | ReLU |
| | Conv3 | (5,5) | 2 | – | – | ReLU |
| | FC1 | – | – | – | – | – |
The * represents the multiplication symbol. Intermediate layer shapes were not preserved in this record.
On the SEED dataset, TP-DE images are generated by each model, and varying numbers of generated samples (column headers) are added to the training set.
| Model | 0 | 1000 | 2000 | 5000 | 8000 | 10000 | 15000 | 20000 | 24000 |
| VAE | 91.0/7.2 | 88.7/8.5 | 89.4/7.8 | 90.7/7.5 | 89.6/6.6 | 88.5/7.6 | 89.0/7.9 | 90.4/6.8 | 88.9/7.9 |
| WGAN | 91.0/7.2 | 89.2/7.3 | 89.1/7.0 | 88.6/7.2 | 87.4/8.6 | 88.3/6.9 | 89.7/7.2 | 87.0/7.9 | 88.6/6.9 |
| DCGAN | 91.0/7.2 | 90.1/7.7 | 90.0/8.4 | 88.8/7.4 | 88.4/7.2 | 89.9/7.1 | 91.6/7.7 | 91.0/7.4 | 90.3/7.1 |
| D2GAN | 91.0/7.2 | 91.4/7.3 | 90.9/6.4 | 90.1/7.0 | 90.4/6.8 | 89.4/8.3 | 89.3/6.6 | 91.6/6.3 | 90.3/7.2 |
| VAE-GAN | 91.0/7.2 | 89.7/7.3 | 90.9/7.1 | 90.9/7.5 | 91.6/7.2 | 91.1/8.0 | 89.4/8.0 | 90.8/7.4 | 89.5/7.5 |
| VAE-D2GAN | 91.0/7.2 | 90.9/7.0 | 92.5/7.1 | 91.0/7.4 | 90.4/7.4 | 91.2/7.3 | 90.0/7.8 | 91.7/6.9 | 91.7/6.1 |
The average accuracy and standard deviation of classification (%) were obtained using a DNN classifier. 0 means no generated samples were added to the training set.
On the SEED-IV dataset, TP-DE images are generated by each model, and varying numbers of generated samples (column headers) are added to the training set.
| Model | 0 | 1000 | 2000 | 5000 | 8000 | 10000 | 15000 | 20000 | 25000 | 30000 | 32000 |
| VAE | 78.8/14.2 | 78.6/13.5 | 77.2/12.7 | 76.9/12.6 | 74.5/14.1 | 72.0/15.6 | 69.2/17.8 | 64.9/12.7 | 59.9/15.6 | 59.9/15.6 | 61.3/17.3 |
| WGAN | 78.8/14.2 | 76.1/13.5 | 78.0/10.3 | 71.5/11.3 | 73.7/13.7 | 71.2/14.1 | 68.6/13.9 | 69.4/14.2 | 65.6/14.4 | 67.7/13.6 | 70.4/14.3 |
| DCGAN | 78.8/14.2 | 76.8/12.2 | 72.2/12.4 | 78.1/11.6 | 76.9/13.8 | 75.6/10.8 | 76.5/10.1 | 77.4/11.8 | 77.5/11.6 | 79.1/13.8 | 79.1/11.7 |
| D2GAN | 78.8/14.2 | 78.3/11.9 | 80.0/12.6 | 73.8/14.0 | 76.2/13.5 | 75.0/12.4 | 75.4/14.3 | 76.3/13.5 | 75.1/11.3 | 74.3/12.6 | 75.1/12.7 |
| VAE-GAN | 78.8/14.2 | 78.8/11.2 | 80.8/10.3 | 77.7/89.2 | 78.9/9.6 | 81.1/11.5 | 80.7/11.0 | 80.8/11.8 | 81.5/12.8 | 81.1/12.9 | 80.5/12.5 |
| VAE-D2GAN | 78.8/14.2 | 78.8/11.4 | 80.4/10.6 | 80.8/12.3 | 81.4/11.4 | 82.3/11.0 | 79.9/13.0 | 79.0/10.5 | 80.2/11.6 | 80.5/11.8 | 80.8/10.1 |
The average accuracy and standard deviation of classification (%) were obtained using a DNN classifier. 0 means no generated samples were added to the training set.
FIGURE 5 Significance tests between the different data augmentation models.
FIGURE 6 The influence of data augmentation on the recognition accuracy of different classifiers. (A) Results for the SEED dataset; (B) results for the SEED-IV dataset.
Several metrics are used to evaluate the quality of the samples produced by the data augmentation models.
| Model | IS (SEED) | FID (SEED) | MMD (SEED) | IS (SEED-IV) | FID (SEED-IV) | MMD (SEED-IV) |
| VAE | 1.371 | 29.257 | 0.628 | 1.390 | 409.52 | 0.907 |
| WGAN | – | 17.511 | 0.175 | – | 67.906 | 0.347 |
| DCGAN | 1.874 | 30.108 | 0.171 | 1.566 | 40.122 | 0.508 |
| D2GAN | 1.951 | 20.745 | 0.111 | 1.845 | 13.762 | 0.241 |
| VAE-GAN | 2.256 | 17.557 | 0.241 | 1.995 | 30.542 | 0.276 |
| VAE-D2GAN | 2.041 | – | – | 1.865 | – | – |
In the original article, bold marks the best performance under each evaluation metric.
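Of the three metrics above, MMD (maximum mean discrepancy) is the most compact to state: it compares the real and generated feature distributions through kernel mean embeddings. A minimal numpy sketch of the biased squared-MMD estimator with an RBF kernel; the kernel choice and the `gamma` bandwidth are placeholders, since the authors' exact settings are not given in this record:

```python
import numpy as np

def mmd_rbf(x, y, gamma=1.0):
    """Biased squared-MMD estimate between sample sets x and y
    (rows are samples) with RBF kernel k(a,b) = exp(-gamma * ||a-b||^2):
    MMD^2 = mean(Kxx) + mean(Kyy) - 2 * mean(Kxy)."""
    def k(a, b):
        sq_dists = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * sq_dists)
    return k(x, x).mean() + k(y, y).mean() - 2.0 * k(x, y).mean()
```

Lower is better: identical sample sets give 0, and the score grows as the generated distribution drifts from the real one, which is why VAE's 409.52 FID / 0.907 MMD on SEED-IV signals poor fidelity.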
FIGURE 7 Two-dimensional visualizations of real and generated TP-DE images from different models at different iterations on the SEED dataset. Data points in red, blue, and green represent real samples of positive, neutral, and negative emotions, respectively; lighter shades represent generated samples.
Three groups of experiments were set up to explore the performance of the data augmentation model, with the number of real training samples varied across experiments.
| Experiment | Without augmentation | With augmentation |
| Experiment 1 | 68.17/11.89 | 79.46/12.24 |
| Experiment 2 | 75.46/14.04 | 83.76/10.64 |
| Experiment 3 | 90.97/7.20 | 92.46/7.05 |
The average accuracy and standard deviation of classification (%) were obtained using a DNN classifier.