Liang Xu, Zaoyi Sun, Xin Wen, Zhengxi Huang, Chi-Ju Chao, Liuchang Xu.
Abstract
Melody and lyrics, reflecting two unique human cognitive abilities, are usually combined in music to convey emotions. Although psychologists and computer scientists have made considerable progress in revealing the association between musical structure and the perceived emotions of music, the features of lyrics are relatively less discussed. Using Linguistic Inquiry and Word Count (LIWC) technology to extract lyric features in 2,372 Chinese songs, this study investigated the effects of LIWC-based lyric features on the perceived arousal and valence of music. First, correlation analysis shows that, for example, the perceived arousal of music was positively correlated with the total number of lyric words and the mean number of words per sentence and was negatively correlated with the proportion of words related to the past and insight. The perceived valence of music was negatively correlated with the proportion of negative emotion words. Second, we used audio and lyric features as inputs to construct music emotion recognition (MER) models. The performance of random forest regressions reveals that, for the recognition models of perceived valence, adding lyric features can significantly improve the prediction effect of the model using audio features only; for the recognition models of perceived arousal, lyric features are almost useless. Finally, by calculating the feature importance to interpret the MER models, we observed that the audio features played a decisive role in the recognition models of both perceived arousal and perceived valence. Unlike the uselessness of the lyric features in the arousal recognition model, several lyric features, such as the usage frequency of words related to sadness, positive emotions, and tentativeness, played important roles in the valence recognition model. © 2021 Xu et al.
Keywords: Audio signal processing; Chinese pop song; LIWC; Lyric feature extraction; Music emotion recognition
Year: 2021; PMID: 34901433; PMCID: PMC8627224; DOI: 10.7717/peerj-cs.785
Source DB: PubMed; Journal: PeerJ Comput Sci; ISSN: 2376-5992
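The abstract describes LIWC-based lyric features as word counts and category proportions (for example, the share of negative emotion words). The following is a minimal sketch of that idea, not the study's pipeline. The lexicon `TOY_LEXICON`, the function `lyric_features`, and the rule that each lyric line counts as one sentence are all assumptions made for illustration. The real LIWC dictionary is a separate resource, and Chinese lyrics would first need word segmentation, which is not shown here.

```python
import re

# Hypothetical mini-lexicon for illustration only; it is NOT the LIWC
# dictionary, which has many more categories and entries.
TOY_LEXICON = {
    "NegEmo": {"sad", "cry", "hurt", "lonely"},
    "tPast": {"yesterday", "once", "ago"},
}


def lyric_features(lyrics, lexicon=TOY_LEXICON):
    """Return WordCount, WordPerSentence, and per-category word proportions."""
    # Assumption: each non-empty lyric line is treated as one sentence.
    sentences = [line for line in lyrics.splitlines() if line.strip()]
    # Simple English tokenizer; Chinese text would need a segmenter instead.
    words = re.findall(r"[a-z']+", lyrics.lower())
    total = len(words)
    features = {
        "WordCount": total,
        "WordPerSentence": total / len(sentences) if sentences else 0.0,
    }
    for category, vocab in lexicon.items():
        hits = sum(1 for word in words if word in vocab)
        # Proportion of all words that belong to this category.
        features[category] = hits / total if total else 0.0
    return features


example = "I cry alone\nYesterday you were here"
features = lyric_features(example)
```

Each song then becomes one fixed-length feature vector that can be correlated with the annotated arousal and valence scores or passed to a regression model.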
Figure 1. The proposed modeling method.
Figure 2. Data distribution of the 2,372 Chinese songs in this study.
(A) Distribution of annotated emotions in the valence-arousal emotion space. (B) Word clouds of the top words used in each quadrant of the valence-arousal emotion space. Font size increases with the usage frequency of the word.
Correlation between lyric features and perceived arousal in music.
| | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|
| 1. Arousal | 1 | | | | | | | | |
| 2. PastM | -.121 | 1 | | | | | | | |
| 3. Insight | -.122 | .168 | 1 | | | | | | |
| 4. Time | -.115 | .299 | .139 | 1 | | | | | |
| 5. Achieve | .111 | 0.023 | .146 | −0.006 | 1 | | | | |
| 6. tPast | -.124 | .517 | .153 | .322 | 0.014 | 1 | | | |
| 7. WordCount | .206 | .056 | .080 | 0.004 | .073 | 0.014 | 1 | | |
| 8. WordPerSentence | .179 | .062 | .105 | 0.021 | .088 | 0.025 | .873 | 1 | |
| 9. RateLatinWord | .183 | −0.029 | 0.02 | −0.032 | .103 | −0.031 | .174 | .114 | 1 |
Notes.
Correlation is significant at the 0.01 level (2-tailed).
Abbreviations: PastM, proportion of past tense markers; Insight, proportion of words related to insight; Time, proportion of words related to time; Achieve, proportion of words related to achievement; tPast, proportion of words related to the past; WordCount, the total number of words; WordPerSentence, average number of words per sentence; RateLatinWord, the ratio of Latin words.
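A lower-triangular table like the one above can be produced by computing Pearson's r for every pair of variables. The sketch below shows the computation on made-up numbers. The helper names `pearson` and `lower_triangle` and all values in `toy` are assumptions for illustration and are not data from the study. The two-tailed significance test mentioned in the notes is not included.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    # Note: a constant column gives a zero spread and would raise here.
    return cov / (spread_x * spread_y)


def lower_triangle(columns):
    """Return variable names and the lower-triangular correlation rows."""
    names = list(columns)
    rows = []
    for i, name_i in enumerate(names):
        # Row i holds correlations with variables 0..i (diagonal included).
        rows.append([pearson(columns[name_i], columns[name_j])
                     for name_j in names[: i + 1]])
    return names, rows


# Made-up values for five hypothetical songs, for illustration only.
toy = {
    "Arousal": [0.2, 0.5, 0.4, 0.9, 0.7],
    "WordCount": [120, 200, 180, 320, 260],
    "tPast": [0.05, 0.03, 0.04, 0.01, 0.02],
}
names, rows = lower_triangle(toy)
```

In this toy data, `rows[1][0]` (Arousal versus WordCount) is positive and `rows[2][0]` (Arousal versus tPast) is negative, mirroring the direction of the associations reported in the abstract.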
Correlation between lyric features and perceived valence in music.
| | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|
| 1. Arousal | 1 | | | | | | | | |
| 2. | -.121 | 1 | | | | | | | |
| 3. | -.122 | .168 | 1 | | | | | | |
| 4. | -.115 | .299 | .139 | 1 | | | | | |
| 5. | .111 | 0.023 | .146 | −0.006 | 1 | | | | |
| 6. | -.124 | .517 | .153 | .322 | 0.014 | 1 | | | |
| 7. | .206 | .056 | .080 | 0.004 | .073 | 0.014 | 1 | | |
| 8. | .179 | .062 | .105 | 0.021 | .088 | 0.025 | .873 | 1 | |
| 9. | .183 | −0.029 | 0.02 | −0.032 | .103 | −0.031 | .174 | .114 | 1 |
Notes.
Correlation is significant at the 0.01 level (2-tailed).
Abbreviations: Adverb, proportion of adverbs; TenseM, proportion of tense markers; PastM, proportion of past tense markers; NegEmo, proportion of negative emotion words; Sad, proportion of words related to sadness; CogMech, proportion of words related to cognition; Tentat, proportion of words related to tentativeness; tPast, proportion of words related to the past.
The best performing parameters for each random forest regression.
| Ground truth | Inputs | Parameters | | | | |
|---|---|---|---|---|---|---|
| Arousal | AF | 156 | 10 | 8 | 18 | 0.2 |
| Arousal | LF | 196 | 50 | 5 | 8 | 0.8 |
| Arousal | CF | 136 | 27 | 3 | 12 | 0.4 |
| Valence | AF | 179 | 15 | 13 | 22 | 0.5 |
| Valence | LF | 193 | 38 | 3 | 8 | 0.8 |
| Valence | CF | 191 | 43 | 5 | 25 | 0.6 |
Notes.
Abbreviations: AF indicates audio features; LF indicates lyric features; and CF indicates combined features.
Figure 3. Performance of constructed MER models with different inputs and algorithms.
(A) Prediction results of perceived arousal recognition models, measured by R² statistics. (B) Prediction results of perceived arousal recognition models, measured by RMSE. (C) Prediction results of perceived valence recognition models, measured by R² statistics. (D) Prediction results of perceived valence recognition models, measured by RMSE. Error bars indicate the standard deviations.
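The two metrics named in this caption have standard definitions: R² = 1 − SS_res / SS_tot, and RMSE is the square root of the mean squared error. The sketch below computes both in plain Python and summarizes scores over repeated evaluations as a mean and standard deviation, which is how error bars of this kind are typically obtained. The function names and the example scores are assumptions for illustration. The resampling scheme behind the error bars is not stated in this record.

```python
from math import sqrt
from statistics import mean, stdev


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    centre = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - centre) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return sqrt(mean([(t - p) ** 2 for t, p in zip(y_true, y_pred)]))


def summarize(scores):
    """Mean and sample standard deviation across repeated evaluations."""
    return mean(scores), stdev(scores)


# Hypothetical per-run R² scores, for illustration only.
example_scores = [0.50, 0.54, 0.52]
score_mean, score_sd = summarize(example_scores)
```

A model that always predicts the mean of the targets scores R² = 0, so positive values indicate that the inputs carry usable information about the ratings.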
Figure 4. Distribution of feature importance for RFR-based recognition models of perceived arousal and perceived valence.
Features are arranged in descending order of mean importance. Only the top 30 features are shown for readability; the remaining features follow approximately the same trend. Error bars indicate the standard deviations.
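The ranking described in this caption (mean importance in descending order, top 30 kept, standard deviation as the error bar) can be sketched as follows. The function `rank_features`, the feature name `audio_feat_a`, and the importance values are hypothetical and are not taken from the study. The importances are assumed to come from repeated model fits.

```python
from statistics import mean, stdev


def rank_features(importances_by_run, top_k=30):
    """Rank features by mean importance across runs.

    importances_by_run: list of dicts mapping feature name -> importance,
    one dict per fitted model. Returns (name, mean, std) tuples.
    """
    summary = []
    for name in importances_by_run[0]:
        values = [run[name] for run in importances_by_run]
        # The sample standard deviation needs at least two runs.
        spread = stdev(values) if len(values) > 1 else 0.0
        summary.append((name, mean(values), spread))
    # Descending order of mean importance, then keep the top_k entries.
    summary.sort(key=lambda item: item[1], reverse=True)
    return summary[:top_k]


# Hypothetical importances from two runs, for illustration only.
runs = [
    {"audio_feat_a": 0.50, "Sad": 0.30, "Tentat": 0.20},
    {"audio_feat_a": 0.60, "Sad": 0.25, "Tentat": 0.15},
]
top = rank_features(runs, top_k=2)
```

With the toy values, `top` lists `audio_feat_a` first and `Sad` second, each with its mean and standard deviation ready for plotting.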