Yuan-Pin Lin, Yi-Hsuan Yang, Tzyy-Ping Jung.
Abstract
Electroencephalography (EEG)-based emotion classification during music listening has gained increasing attention owing to its promise for applications such as musical affective brain-computer interfaces (ABCI), neuromarketing, music therapy, and implicit multimedia tagging and triggering. However, music is an ecologically valid and complex stimulus that conveys emotions to listeners through compositions of musical elements, and distinguishing emotions using EEG signals alone remains challenging. This study aimed to assess the applicability of a multimodal approach that leverages EEG dynamics and the acoustic characteristics of musical content for the classification of emotional valence and arousal. To this end, the study adopted machine-learning methods to systematically elucidate the roles of the EEG and music modalities in emotion modeling. The empirical results suggested that when whole-head EEG signals were available, the inclusion of musical content did not improve classification performance; the 74–76% obtained using the EEG modality alone was statistically comparable to the performance of the multimodal approach. However, when EEG dynamics were available from only a small set of electrodes (likely the case in real-life applications), the music modality played a complementary role, augmenting the EEG results from around 61% to 67% in valence classification and from around 58% to 67% in arousal classification. Musical timbre appeared to replace less-discriminative EEG features and led to improvements in both valence and arousal classification, whereas musical loudness contributed specifically to arousal classification. The present study not only provides principles for constructing an EEG-based multimodal approach, but also reveals fundamental insights into the interplay of brain activity and musical content in emotion modeling.
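The abstract describes feature-level fusion of EEG dynamics and musical content followed by machine-learning classification of valence and arousal. Below is a minimal sketch of such a pipeline, assuming precomputed per-trial feature matrices and an SVM classifier with cross-validation; the feature dimensions mirror the tables below, but the classifier choice and settings are illustrative assumptions rather than the authors' exact configuration.

```python
# A minimal sketch of feature-level (early) fusion of EEG and music features
# for binary valence/arousal classification. Names and settings here are
# illustrative assumptions, not the authors' exact pipeline.
import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_trials = 120
eeg_feats = rng.normal(size=(n_trials, 270))    # e.g., MESH: 270 EEG features per trial
music_feats = rng.normal(size=(n_trials, 25))   # e.g., MUSIC: 25 acoustic features per trial
labels = rng.integers(0, 2, size=n_trials)      # binary valence (or arousal) labels

# Feature-level fusion: concatenate the two modalities per trial.
fused = np.hstack([eeg_feats, music_feats])

clf = make_pipeline(StandardScaler(), SVC(kernel="linear"))
scores = cross_val_score(clf, fused, labels, cv=10)
print(f"10-fold CV accuracy: {scores.mean():.2%}")
```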
Keywords: EEG; affective brain-computer interface; emotion classification; music listening; music signal processing
Year: 2014 PMID: 24822035 PMCID: PMC4013455 DOI: 10.3389/fnins.2014.00094
Source DB: PubMed Journal: Front Neurosci ISSN: 1662-453X Impact factor: 4.677
Figure 1. Electrode placements of 32 channels according to the international 10–20 system.
A summary of EEG feature types.
| Feature type | No. of electrodes | No. of features | Description |
| DLAT | 24 | 60 | Five differential spectral band power (δ, θ, α, β, and γ) for 12 left-right electrode pairs: Fp1-Fp2, F7-F8, F3-F4, FT7-FT8, FC3-FC4, T7-T8, P7-P8, C3-C4, TP7-TP8, CP3-CP4, P3-P4, and O1-O2. |
| DCAU | 24 | 60 | Five differential spectral band power (δ, θ, α, β, and γ) for 12 fronto-posterior electrode pairs: Fp1-O1, Fp2-O2, F7-P7, F3-P3, Fz-Pz, F4-P4, F8-P8, FT7-TP7, FC3-CP3, FCz-CPz, FC4-CP4, and FT8-TP8. |
| PSD | 30 | 150 | Five spectral band power (δ, θ, α, β, and γ) for 30 electrodes: Fp1, Fp2, F7, F3, Fz, F4, F8, FT7, FC3, FCz, FC4, FT8, T7, C3, Cz, C4, T8, TP7, CP3, CPz, CP4, TP8, P7, P3, Pz, P4, P8, O1, Oz, and O2. |
| MESH | 30 | 270 | A combination of DLAT, DCAU, and PSD. |
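As an illustration of the PSD and DLAT rows above, here is a minimal sketch of deriving spectral band power and a left-right differential feature from one EEG trial, assuming SciPy's Welch estimator; the sampling rate, band boundaries, and channel indices are assumptions for the sketch, not values stated in this entry.

```python
# Illustrative computation of band power and a left-right differential feature
# from a single EEG trial (channels x samples). Band limits and channel indices
# are conventional assumptions; the paper's exact definitions are not reproduced here.
import numpy as np
from scipy.signal import welch

FS = 256  # assumed sampling rate (Hz)
BANDS = {"delta": (1, 4), "theta": (4, 8), "alpha": (8, 13),
         "beta": (13, 30), "gamma": (30, 45)}

def band_powers(trial, fs=FS):
    """Return an array (n_channels, n_bands) of mean band power."""
    freqs, psd = welch(trial, fs=fs, nperseg=fs * 2, axis=-1)
    out = []
    for lo, hi in BANDS.values():
        mask = (freqs >= lo) & (freqs < hi)
        out.append(psd[:, mask].mean(axis=-1))
    return np.stack(out, axis=-1)

# Example: 30 channels, 30 s of data; DLAT-style feature for one pair (F3-F4).
trial = np.random.default_rng(1).normal(size=(30, FS * 30))
bp = band_powers(trial)                 # PSD features: 30 x 5
f3, f4 = 3, 5                           # hypothetical channel indices for F3 and F4
dlat_f3_f4 = bp[f3] - bp[f4]            # 5 differential band-power values
print(dlat_f3_f4)
```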
A summary of music feature types.
| Feature type | No. of features | Features |
| Pitch | 3 | Key clarity, Mode, Harmonic flux |
| Dissonance | 4 | Tonal dissonance (HK, S), Spectral dissonance (HK, S) |
| Loudness | 5 | Loudness, Sharpness (Z, A), Timbral width, Volume |
| MFCC | 13 | MFCC coefficients (13 features) |
| MUSIC | 25 | A combination of Pitch, Dissonance, Loudness, and MFCC |
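As an illustration of the MFCC row above, here is a minimal sketch of extracting 13 MFCCs from a music excerpt and summarizing them over time; librosa, its default settings, and the file path are assumptions for this sketch, not the toolbox or parameters used in the paper.

```python
# Illustrative MFCC extraction for a music excerpt; librosa and its defaults
# are assumptions for this sketch, not the authors' tooling.
import librosa

# Load ~30 s of audio (the path is hypothetical).
y, sr = librosa.load("excerpt.wav", sr=22050, duration=30.0)

# 13 MFCC coefficients per frame, then summarized over time (mean per coefficient),
# giving the 13-dimensional MFCC feature vector listed in the table above.
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
mfcc_features = mfcc.mean(axis=1)
print(mfcc_features.shape)  # (13,)
```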
Figure 2. The valence and arousal classification results using the subject-dependent EEG feature sets with/without the F-score based feature selection. The numbers above the bars represent the mean values of the results, whereas the numbers in bold indicate the accuracies significantly better (p < 0.01) than the majority voting accuracy (valence: ~63%, arousal: ~61%). †Indicates that the accuracy with feature selection significantly outperformed that without feature selection (p < 0.01).
Figure 3. The valence and arousal classification results using the subject-dependent multimodal approach with/without feature selection. The results of the subject-dependent EEG modality (feature type: MESH) and the music modality (feature type: MUSIC) are also provided for comparison. The numbers above the bars represent the mean values of the results, whereas the numbers in bold indicate the accuracies significantly better (p < 0.01) than the majority voting accuracy (valence: ~63%, arousal: ~61%). †Indicates that the accuracy with feature selection significantly outperformed that without feature selection (p < 0.01).
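Figures 2 and 3 refer to F-score based feature selection. Below is a minimal sketch of ranking features with a two-class F-score of the Fisher-criterion form commonly used for this purpose; the exact ranking rule and the number of retained features in the paper are not given in this entry, so the cutoff here is an arbitrary assumption.

```python
# Illustrative F-score (two-class Fisher criterion) feature ranking.
# The selection threshold below is an arbitrary assumption for the sketch.
import numpy as np

def f_scores(X, y):
    """F-score per feature for binary labels y in {0, 1}."""
    Xp, Xn = X[y == 1], X[y == 0]
    mean_all, mean_p, mean_n = X.mean(0), Xp.mean(0), Xn.mean(0)
    numer = (mean_p - mean_all) ** 2 + (mean_n - mean_all) ** 2
    denom = Xp.var(0, ddof=1) + Xn.var(0, ddof=1)
    return numer / (denom + 1e-12)

rng = np.random.default_rng(2)
X = rng.normal(size=(120, 295))          # e.g., fused MESH (270) + MUSIC (25) features
y = rng.integers(0, 2, size=120)
scores = f_scores(X, y)
top = np.argsort(scores)[::-1][:50]      # keep the 50 highest-ranked features (arbitrary)
X_selected = X[:, top]
print(X_selected.shape)
```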
Figure 4. The percentage composition of contributions of EEG (DLAT, DCAU, and PSD) and musical (Pitch, Dissonance, Loudness, and MFCC) features to the subject-dependent multimodal approach. The composition of the subject-dependent EEG modality is also provided for comparison.
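Figure 4 (and Figure 8 below) reports the percentage composition of selected features by type. A minimal sketch of tallying such a composition from a set of selected feature indices, assuming a hypothetical index layout for the fused 295-dimensional feature vector (270 MESH + 25 MUSIC):

```python
# Illustrative tally of feature-type composition among selected features.
# The index layout and the selected set are hypothetical assumptions.
import numpy as np

groups = {"DLAT": range(0, 60), "DCAU": range(60, 120), "PSD": range(120, 270),
          "Pitch": range(270, 273), "Dissonance": range(273, 277),
          "Loudness": range(277, 282), "MFCC": range(282, 295)}

selected = np.random.default_rng(4).choice(295, size=50, replace=False)
for name, idx in groups.items():
    share = np.isin(selected, list(idx)).sum() / len(selected)
    print(f"{name}: {share:.1%}")
```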
Figure 5. The valence and arousal classification results of the subject-independent EEG features (type: MESH), in terms of the average number of features, electrodes, and accuracies with/without feature selection under the LFI criteria (0.1–0.6). The numbers near the nodes represent the mean values of the results. †Indicates that the accuracy with feature selection significantly outperformed that without feature selection (p < 0.01), yet both were comparable (p > 0.1) to the majority voting accuracies (valence: ~63%, arousal: ~61%).
Figure 6. The topographic mapping of informative EEG features that consistently appeared across multiple subjects. The rightmost topography color-codes the importance of the electrodes according to how frequently they were used to derive the corresponding features.
The informative EEG features that consistently appeared across multiple subjects.
| 1 | DLAT: FT7-FT8 (Theta) | DCAU: FC3-CP3 (Delta) |
| 2 | DLAT: FC3-FC4 (Alpha) | DLAT: C3-C4 (Alpha) |
| 3 | DLAT: F3-F4 (Delta) | DLAT: F7-F8 (Theta) |
| 4 | DLAT: FT7-FT8 (Delta) | DLAT: FC3-FC4 (Theta) |
| 5 | DLAT: TP7-TP8 (Delta) | |
| 6 | DCAU: F3-P3 (Beta) | |
| 7 | PSD: T7 (Gamma) | |
Figure 7. The valence and arousal classification results using the subject-independent multimodal approach (LFI = 0.6) with/without feature selection. The results of the subject-independent EEG modality (feature type: MESH) and the music modality (feature type: MUSIC) are also provided for comparison. The numbers above the bars represent the mean values of the results, whereas the numbers in bold indicate the accuracies significantly better (p < 0.02) than the majority voting accuracy (valence: ~63%, arousal: ~61%). †Indicates that the accuracy with feature selection significantly outperformed that without feature selection (p < 0.01).
Figure 8. The percentage composition of contributions of EEG (DLAT, DCAU, and PSD) and musical (Pitch, Dissonance, Loudness, and MFCC) features to the subject-independent multimodal approach. The composition of the subject-independent EEG modality is also provided for comparison.
The informative musical features in the subject-independent multimodal approach.
| 1 | Dissonance: Spectral dissonance (S) | Loudness: Sharpness (Z) |
| 2 | Pitch: Mode | MFCC: 8th |
| 3 | Loudness: Sharpness (A) | |
| 4 | Pitch: Harmonic flux | |