Ruoyu Du, Shujin Zhu, Huangjing Ni, Tianyi Mao, Jiajia Li, Ran Wei.
Abstract
During the COVID-19 pandemic, young people have increasingly used multimedia content to communicate with each other on Internet platforms. Among these media, music, as psychological support for a lonely life in this special period, is a powerful tool for emotional self-regulation and for relieving loneliness. Emotion-based music recommender systems have therefore attracted growing attention. In recent years, Chinese music has come to be regarded as an independent genre. Chinese ancient-style music, one of the new folk styles within Chinese music, is becoming increasingly popular among young people. The complexity of Chinese-style music poses significant challenges for the quantitative analysis of music. To address the problem of emotion classification in music information retrieval, emotion is often characterized by valence and arousal. This paper focuses on valence and arousal classification of emotion evoked by Chinese ancient-style music. It proposes a hybrid model combining a one-dimensional convolutional neural network with a bidirectional long short-term memory network (1D-CNN-BiLSTM), and designs a self-acquired EEG dataset of Chinese college students for classifying music-induced emotion by valence and arousal based on EEG. The proposed 1D-CNN-BiLSTM model was evaluated on the public datasets DEAP and DREAMER as well as on the self-acquired dataset DESC. The experimental results show that, compared with the traditional LSTM and 1D-CNN-LSTM models, the proposed method achieves the highest accuracy on the valence classification task of music-induced emotion, reaching 94.85%, 98.41%, and 99.27% on DEAP, DREAMER, and DESC, respectively. The accuracy on the arousal classification task reached 93.40%, 98.23%, and 99.20%, respectively. In addition, compared with its positive-valence classification results, the method shows a clear advantage in negative-valence classification.
This study provides a computational classification model for an emotion-aware music recommender system. It also offers theoretical support for brain-computer interface (BCI) applications of Chinese ancient-style music, which is popular among young people.
Keywords: BiLSTM; Chinese music; EEG; Emotion classification
Year: 2022 PMID: 36213341 PMCID: PMC9530425 DOI: 10.1007/s11042-022-14011-7
Source DB: PubMed Journal: Multimed Tools Appl ISSN: 1380-7501 Impact factor: 2.577
Fig. 1 Chinese Ancient-style Music Information Selected for the DESC Dataset
Fig. 2 Experimental Design Process
Main characteristics of the DEAP, DREAMER, and DESC datasets
| | DEAP | DREAMER | DESC |
|---|---|---|---|
| Stimuli | 40 | 18 | 18 |
| Type | Music videos | Film clips | Chinese ancient-style music videos |
| Duration | 60 s | 65–393 s | 34 s |
| Physiological signals | EEG, GSR, BVP, RESP, SKT, EOG, EMG | EEG, ECG | EEG |
| Participants | 32 (19 males, 13 females) | 23 (14 males, 9 females) | 20 (10 males, 10 females) |
Fig. 3 LSTM Model Structure
Fig. 4 BiLSTM Model Structure
Fig. 5 Proposed Model Structure
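To make the proposed architecture concrete, the following is a minimal NumPy sketch of a single forward pass through a 1D-CNN-BiLSTM classifier of the kind described: a temporal 1-D convolution over the EEG epoch, a forward and a backward LSTM over the convolutional features, and a softmax over binary valence. The layer sizes (kernel 5, 8 filters, hidden size 16, 128-sample window) are illustrative assumptions, not the paper's configuration; only the 14-channel input matches the Emotiv EPOC headset used for DESC.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def conv1d_relu(x, w, b):
    """Valid 1-D convolution (cross-correlation) + ReLU.
    x: (T, C_in), w: (K, C_in, C_out), b: (C_out,) -> (T-K+1, C_out)."""
    K = w.shape[0]
    out = np.stack([np.einsum("kc,kco->o", x[t:t + K], w) + b
                    for t in range(x.shape[0] - K + 1)])
    return np.maximum(out, 0.0)

def lstm_last(x, Wx, Wh, b):
    """Run an LSTM over x: (T, D); return the final hidden state (H,)."""
    H = Wh.shape[0]
    h, c = np.zeros(H), np.zeros(H)
    for xt in x:
        i, f, g, o = np.split(xt @ Wx + h @ Wh + b, 4)  # gate pre-activations
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
    return h

# Illustrative sizes: 14 EEG channels (Emotiv EPOC), 128-sample window,
# kernel 5, 8 conv filters, hidden size 16, binary valence output.
T, C_IN, C_OUT, K, H = 128, 14, 8, 5, 16
x = rng.standard_normal((T, C_IN))            # one EEG epoch (time x channels)

Wc = 0.1 * rng.standard_normal((K, C_IN, C_OUT)); bc = np.zeros(C_OUT)
def lstm_params():
    return (0.1 * rng.standard_normal((C_OUT, 4 * H)),
            0.1 * rng.standard_normal((H, 4 * H)),
            np.zeros(4 * H))
fwd, bwd = lstm_params(), lstm_params()
Wo = 0.1 * rng.standard_normal((2 * H, 2)); bo = np.zeros(2)

feat = conv1d_relu(x, Wc, bc)                        # local temporal features
h_bi = np.concatenate([lstm_last(feat, *fwd),        # forward direction
                       lstm_last(feat[::-1], *bwd)]) # backward direction
logits = h_bi @ Wo + bo
probs = np.exp(logits - logits.max()); probs /= probs.sum()  # softmax over {low, high} valence
```

Running the two LSTM directions over `feat` and `feat[::-1]` and concatenating their final states is what distinguishes the BiLSTM stage from the plain 1D-CNN-LSTM baseline compared in the tables below.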
Fig. 6 Model Accuracy Results for Valence-Arousal Classification of Emotion on Three EEG Datasets: (a) Valence; (b) Arousal
The performance of LSTM, 1D-CNN-LSTM, and the proposed 1D-CNN-BiLSTM model on the valence classification task with the DEAP, DREAMER, and DESC datasets
| Dataset | Method | Accuracy | Precision | Recall | F1 score | Cohen's kappa |
|---|---|---|---|---|---|---|
| DEAP | LSTM | 91.71% | 91.71% | 91.59% | 0.9159 | 0.8874 |
| | 1D-CNN-LSTM | 92.67% | 92.52% | 92.67% | 0.9259 | 0.9005 |
| | 1D-CNN-BiLSTM | 94.85% | 94.65% | 94.58% | 0.9461 | 0.9300 |
| DREAMER | LSTM | 96.86% | 96.84% | 96.87% | 0.9685 | 0.9528 |
| | 1D-CNN-LSTM | 97.69% | 97.69% | 97.68% | 0.9769 | 0.9653 |
| | 1D-CNN-BiLSTM | 98.41% | 98.42% | 98.40% | 0.9841 | 0.9760 |
| DESC | LSTM | 97.77% | 98.26% | 98.74% | 0.9850 | 0.9615 |
| | 1D-CNN-LSTM | 99.19% | 98.87% | 99.57% | 0.9922 | 0.9861 |
| | 1D-CNN-BiLSTM | 99.27% | 99.15% | 99.60% | 0.9937 | 0.9874 |
The performance of LSTM, 1D-CNN-LSTM, and the proposed 1D-CNN-BiLSTM model on the arousal classification task with the DEAP, DREAMER, and DESC datasets
| Dataset | Method | Accuracy | Precision | Recall | F1 score | Cohen's kappa |
|---|---|---|---|---|---|---|
| DEAP | LSTM | 86.61% | 85.33% | 85.70% | 0.8548 | 0.8457 |
| | 1D-CNN-LSTM | 90.38% | 89.49% | 89.64% | 0.8956 | 0.8892 |
| | 1D-CNN-BiLSTM | 93.40% | 92.70% | 92.96% | 0.9283 | 0.9240 |
| DREAMER | LSTM | 96.16% | 96.04% | 96.15% | 0.9609 | 0.9493 |
| | 1D-CNN-LSTM | 97.38% | 97.48% | 97.12% | 0.9730 | 0.9654 |
| | 1D-CNN-BiLSTM | 98.23% | 98.29% | 98.05% | 0.9817 | 0.9765 |
| DESC | LSTM | 98.08% | 98.08% | 97.99% | 0.9803 | 0.9775 |
| | 1D-CNN-LSTM | 98.70% | 98.86% | 98.57% | 0.9871 | 0.9848 |
| | 1D-CNN-BiLSTM | 99.20% | 99.27% | 99.17% | 0.9922 | 0.9907 |
Fig. 7 Positive and Negative Valence Classification Accuracy of the Proposed Method on the DEAP, DREAMER, and DESC Datasets
Details of several published studies on the DEAP, DREAMER, and self-acquisition datasets
| Reference | Dataset | Device | Stimuli | Inputs | Classifier | Valence accuracy | Arousal accuracy |
|---|---|---|---|---|---|---|---|
| Salma et al. [ ] | DEAP | Biosemi | Audio-visual (music and video clips) | Raw EEG signals | LSTM | 85.45% | 85.65% |
| Zhan et al. [ ] | DEAP | Biosemi | Audio-visual (music and video clips) | PSD | CNN | 82.95% | 84.07% |
| Sharma et al. [ ] | DEAP | Biosemi | Audio-visual (music and video clips) | Higher-order statistics | Bi-LSTM | 84.16% | 85.21% |
| Gao et al. [ ] | DEAP | Biosemi | Audio-visual (music and video clips) | DE | Dense CNN | 92.24% | 92.92% |
| The proposed method | DEAP | Biosemi | Audio-visual (music and video clips) | PSD | 1D-CNN-BiLSTM | 94.85% | 93.40% |
| Katsigiannis and Ramzan [ ] | DREAMER | Emotiv EPOC | Audio-visual (film clips) | PSD | SVM-RBF | 62.49% | 62.17% |
| Y. Liu et al. [ ] | DREAMER | Emotiv EPOC | Audio-visual (film clips) | Raw EEG signals | MLF-CapsNet | 94.59% | 95.26% |
| Pandey et al. [ ] | DREAMER | Emotiv EPOC | Audio-visual (film clips) | Raw EEG signals | 1D-CNN | 75.93% | 81.48% |
| The proposed method | DREAMER | Emotiv EPOC | Audio-visual (film clips) | PSD | 1D-CNN-BiLSTM | 98.41% | 98.23% |
| Liu et al. [ ] | Self-acquisition dataset | Emotiv EPOC | Audio-visual (film clips) | PSD, ASM | SVM-RBF | Positive: 86.43%; Negative: 65.09% | None |
| Zhou et al. [ ] | Self-acquisition dataset | Biosemi | Audio-visual (film and music clips) | PSD, DE | Random Forest | 78.75% | 73.98% |
| The proposed method | Self-acquisition dataset (DESC) | Emotiv EPOC | Audio-visual (music and video clips) | PSD | 1D-CNN-BiLSTM | 99.27% | 99.20% |
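Several of the systems compared above, including the proposed method, feed the classifier power spectral density (PSD) features rather than raw EEG. As a rough illustration of what such features look like, the following NumPy sketch computes periodogram-based band powers for one channel of one epoch; the 128 Hz sampling rate and the theta/alpha/beta band edges are common EEG conventions assumed here, not details taken from the paper.

```python
import numpy as np

FS = 128  # Hz, assumed sampling rate
BANDS = {"theta": (4.0, 8.0), "alpha": (8.0, 13.0), "beta": (13.0, 30.0)}

def band_powers(epoch, fs=FS, bands=BANDS):
    """Hann-windowed periodogram PSD, integrated over frequency bands.
    epoch: (n_samples,) single-channel EEG; returns {band: power}."""
    epoch = epoch - epoch.mean()               # drop the DC offset
    win = np.hanning(len(epoch))
    spec = np.fft.rfft(epoch * win)
    psd = np.abs(spec) ** 2 / (fs * np.sum(win ** 2))  # one-sided PSD
    psd[1:-1] *= 2                             # fold in negative frequencies
    freqs = np.fft.rfftfreq(len(epoch), d=1.0 / fs)
    df = freqs[1] - freqs[0]
    return {name: float(psd[(freqs >= lo) & (freqs < hi)].sum() * df)
            for name, (lo, hi) in bands.items()}

rng = np.random.default_rng(1)
feats = band_powers(rng.standard_normal(2 * FS))  # a 2-second epoch
```

Per-channel, per-band powers like these would then be stacked across electrodes to form the input vector handed to the classifier.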