Naresh N Vempala, Frank A Russo.
Abstract
Emotion judgments and five channels of physiological data were obtained from 60 participants listening to 60 music excerpts. Various machine learning (ML) methods were used to model the emotion judgments, including neural networks, linear regression, and random forests. Input for models of perceived emotion consisted of audio features extracted from the music recordings. Input for models of felt emotion consisted of physiological features extracted from the physiological recordings. Models were trained and interpreted with consideration of the classic debate in music emotion between cognitivists and emotivists. Our models supported a hybrid position wherein emotion judgments were influenced by a combination of perceived and felt emotions. In comparing the different ML approaches that were used for modeling, we conclude that neural networks were optimal, yielding models that were flexible as well as interpretable. Inspection of a committee machine, encompassing an ensemble of networks, revealed that arousal judgments were predominantly influenced by felt emotion, whereas valence judgments were predominantly influenced by perceived emotion.
Keywords: computational modeling; machine learning; music cognition; music emotion; neural networks; physiological responses; random forests
Year: 2018 PMID: 29354080 PMCID: PMC5760560 DOI: 10.3389/fpsyg.2017.02239
Source DB: PubMed Journal: Front Psychol ISSN: 1664-1078
RMSE values of the five perception networks.
| Fold | Valence RMSE | Arousal RMSE |
|---|---|---|
| 1 | 0.27 | 0.18 |
| 2 | 0.21 | 0.34 |
| 3 | 0.16 | 0.33 |
| 4 | 0.26 | 0.14 |
| 5 | 0.16 | 0.15 |
| Mean | 0.21 | 0.23 |
| SE | 0.03 | 0.05 |
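The Mean and SE rows summarize the five per-fold RMSE values. As a minimal sketch of that summary, the snippet below recomputes both from the perception-network values listed above. It assumes the standard error is the sample standard deviation divided by the square root of the number of folds; the source does not state which convention was used, and the helper name `summarize` is hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

# Per-fold RMSE values copied from the perception-network table above.
valence_rmse = [0.27, 0.21, 0.16, 0.26, 0.16]
arousal_rmse = [0.18, 0.34, 0.33, 0.14, 0.15]


def summarize(fold_rmse):
    """Return (mean, standard error) across folds.

    Assumption: SE = sample standard deviation / sqrt(number of folds).
    The source does not state which SE convention it used.
    """
    n = len(fold_rmse)
    return mean(fold_rmse), stdev(fold_rmse) / sqrt(n)


valence_mean, valence_se = summarize(valence_rmse)
arousal_mean, arousal_se = summarize(arousal_rmse)
print(f"Valence: mean={valence_mean:.2f}, SE={valence_se:.3f}")
print(f"Arousal: mean={arousal_mean:.2f}, SE={arousal_se:.3f}")
```

The means round to the tabulated 0.21 and 0.23. The SE values computed this way may differ from the tabulated ones in the last digit, since the fold values are already rounded to two decimals and the SE convention is assumed.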
RMSE values of the five feeling networks.
| Fold | Valence RMSE | Arousal RMSE |
|---|---|---|
| 1 | 0.26 | 0.25 |
| 2 | 0.24 | 0.33 |
| 3 | 0.19 | 0.29 |
| 4 | 0.23 | 0.24 |
| 5 | 0.24 | 0.35 |
| Mean | 0.23 | 0.29 |
| SE | 0.01 | 0.02 |
Summary of all ML model results.
| Features and metric | Neural networks: Valence | Neural networks: Arousal | Multiple linear regression: Valence | Multiple linear regression: Arousal | Random forests: Valence | Random forests: Arousal |
|---|---|---|---|---|---|---|
| Audio features (perception models) | Five trained models from fivefold cross-validation (44 excerpts) | Five trained models from fivefold cross-validation (44 excerpts) | Four trained models from fivefold cross-validation (44 excerpts) | Five trained models from fivefold cross-validation (44 excerpts) | | |
| Ensemble performance (16 excerpts) RMSE | 0.27 | 0.24 | 0.25 | 0.23 | 0.25 | 0.20 |
| Physiology features (feeling models) | Five trained models from fivefold cross-validation (44 excerpts) | Five trained models from fivefold cross-validation (44 excerpts) | Three trained models from fivefold cross-validation (44 excerpts) | No model | | |
| Ensemble performance (16 excerpts) RMSE | 0.34 | 0.23 | 0.66 | No model | 0.28 | 0.26 |
| Committee Machine – CMLR (16 excerpts) RMSE | 0.26 | 0.20 | | | | |
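The "Ensemble performance" rows report one RMSE for a set of fold models evaluated on held-out excerpts. The sketch below illustrates one way such a figure could be computed. It assumes the ensemble output is the unweighted mean of the individual models' predictions, which the source does not specify. The function names and all numbers are hypothetical and are not data from the study.

```python
from math import sqrt


def rmse(predicted, observed):
    """Root-mean-square error between two equal-length sequences."""
    if len(predicted) != len(observed):
        raise ValueError("sequences must have the same length")
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return sqrt(sum(squared) / len(squared))


def ensemble_predict(per_model_predictions):
    """Average predictions across models, excerpt by excerpt.

    Assumption: the ensemble output is the unweighted mean of the
    fold models' outputs; the source does not specify the rule.
    """
    n_models = len(per_model_predictions)
    return [sum(values) / n_models for values in zip(*per_model_predictions)]


# Hypothetical predictions from three fold models for four held-out
# excerpts (made-up numbers, not data from the study).
fold_predictions = [
    [0.50, 0.20, 0.80, 0.40],
    [0.60, 0.30, 0.70, 0.50],
    [0.40, 0.10, 0.90, 0.30],
]
observed_ratings = [0.50, 0.30, 0.80, 0.40]

combined = ensemble_predict(fold_predictions)
print(f"Ensemble RMSE: {rmse(combined, observed_ratings):.3f}")
```

Averaging is only the simplest combination rule. A committee machine of the kind named in the last table row would instead learn how to weight the perception and feeling outputs.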