Abstract
Existing research on music emotion classification focuses on single-modal analysis of either audio or lyrics and neglects the correlation among models, which leads to partial information loss. Therefore, a music emotion classification method based on deep learning and an improved attention mechanism is proposed. First, lyric features are extracted with Term Frequency-Inverse Document Frequency (TF-IDF) and Word2vec, yielding a term-frequency weight vector and a word vector. Then, an emotion analysis model based on CNN-LSTM is constructed by combining the feature extraction ability of the Convolutional Neural Network (CNN) with the ability of the Long Short-Term Memory (LSTM) network to process sequential data, and by integrating a matching attention mechanism. Finally, the outputs of the deep neural network and the CNN-LSTM model are fused, and the emotion type is obtained with a Softmax classifier. Experiments on the selected data sets show that the proposed method achieves an average classification accuracy of 0.848, higher than the comparison methods, and that classification efficiency is greatly improved.
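The TF-IDF step named in the abstract can be illustrated with a minimal sketch. It assumes the plain formulation tf(t, d) * log(N / df(t)); the exact weighting variant used in the paper is not stated in this record, and the function name `tf_idf` and the toy lyrics are hypothetical.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumes tf(t, d) * log(N / df(t)); the paper's exact variant
    is not given in this record.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

# Hypothetical toy lyrics, tokenized by whitespace.
lyrics = [
    "sunny day happy heart".split(),
    "rainy day lonely heart".split(),
    "quiet night calm sea".split(),
]
vectors = tf_idf(lyrics)
```

In this toy corpus, a word that appears in a single song ("sunny") receives a larger weight than a word shared by two songs ("day"), which is the property that makes the weights useful as lyric features.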
Year: 2022 PMID: 35769273 PMCID: PMC9236837 DOI: 10.1155/2022/5181899
Source DB: PubMed Journal: Comput Intell Neurosci
Figure 1. Word2vec training word vector model.
Figure 2. Emotion classification model of lyrics based on CNN-LSTM.
Figure 3. BiLSTM structure.
Figure 4. CNN model.
Figure 5. LSTM model.
The size of the data set and the number of songs contained in each type of emotion.
| Emotion | Training set | Test set | Total |
|---|---|---|---|
| Happy | 1078 | 194 | 1272 |
| Sad | 1267 | 317 | 1584 |
| Calm | 882 | 221 | 1103 |
| Healing | 1061 | 266 | 1327 |
| Total | 4288 | 998 | 5286 |
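As a quick check on class balance and the train/test split, the counts in the table above can be turned into proportions. This is only an illustrative sketch; the variable names are hypothetical and the numbers are copied from the table.

```python
# Counts copied from the table above: emotion -> (training, test).
counts = {
    "Happy": (1078, 194),
    "Sad": (1267, 317),
    "Calm": (882, 221),
    "Healing": (1061, 266),
}

total = sum(train + test for train, test in counts.values())

for emotion, (train, test) in counts.items():
    class_total = train + test
    share = class_total / total          # class share of the whole data set
    test_fraction = test / class_total   # held-out fraction within the class
    print(f"{emotion:8s} share={share:.3f} test_fraction={test_fraction:.3f}")
```

With these counts, Sad, Calm, and Healing each hold out roughly 20% of their songs for testing, while Happy holds out about 15%.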
Parameter setting of LSTM network model.
| Parameter | Value |
|---|---|
| Loss function | Softmax |
| Optimizer | Adam |
| Learning rate | 0.01 |
| Activation function | Tanh |
| Dropout | 0.03 |
| Batch size | 50 |
| Epoch | 30 |
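The settings above can be collected in a small configuration object, sketched here with the standard library only. The class name `LstmConfig` and the helper `total_updates` are hypothetical. The table lists Softmax as the loss function, which most likely denotes a softmax output layer trained with cross-entropy; the sketch keeps the table's label unchanged. It also assumes the 4288-song training split from the data set table is the one used.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class LstmConfig:
    # Values copied from the parameter table above.
    loss: str = "Softmax"
    optimizer: str = "Adam"
    learning_rate: float = 0.01
    activation: str = "Tanh"
    dropout: float = 0.03
    batch_size: int = 50
    epochs: int = 30

def total_updates(cfg, n_train):
    """Number of optimizer steps if every epoch visits each sample once."""
    steps_per_epoch = math.ceil(n_train / cfg.batch_size)
    return steps_per_epoch * cfg.epochs

cfg = LstmConfig()
# Assumption: the 4288-song training split is used for this model.
updates = total_updates(cfg, 4288)
```

Under that assumption, a batch size of 50 gives 86 steps per epoch, so 30 epochs correspond to 2580 parameter updates.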
Classification accuracy results of different text features.
| Emotion | TF-IDF (SVM) | Word2vec (SVM) |
|---|---|---|
| Happy | 0.762 | 0.801 |
| Sad | 0.735 | 0.779 |
| Calm | 0.647 | 0.698 |
| Healing | 0.661 | 0.703 |
| Average | 0.701 | 0.746 |
Experimental results of three attention mechanisms.
| Model | Classification accuracy |
|---|---|
| Traditional attention mechanism | 0.753 |
| Matching attention mechanism | 0.826 |
Figure 6. Classification accuracy of different methods.
Classification results of different methods.
| Method | Reference [ | Reference [ | Reference [ | Proposed method |
|---|---|---|---|---|
| Happy | 0.853 | 0.861 | 0.887 | 0.903 |
| Sad | 0.828 | 0.837 | 0.849 | 0.864 |
| Calm | 0.742 | 0.739 | 0.753 | 0.809 |
| Healing | 0.751 | 0.766 | 0.771 | 0.816 |
| Average | 0.794 | 0.801 | 0.815 | 0.848 |
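The Average row can be recomputed from the per-class values as an unweighted mean over the four emotions. The sketch below assumes that this is how the averages were obtained; the baseline keys are placeholders because the reference numbers are truncated in this record.

```python
# Per-class accuracies copied from the table above, in the order
# Happy, Sad, Calm, Healing. Baseline keys are placeholders.
results = {
    "baseline_1": [0.853, 0.828, 0.742, 0.751],
    "baseline_2": [0.861, 0.837, 0.739, 0.766],
    "baseline_3": [0.887, 0.849, 0.753, 0.771],
    "proposed": [0.903, 0.864, 0.809, 0.816],
}

def macro_average(scores):
    """Unweighted mean over the emotion classes."""
    return sum(scores) / len(scores)

averages = {name: macro_average(s) for name, s in results.items()}
best_baseline = max(v for k, v in averages.items() if k != "proposed")
margin = averages["proposed"] - best_baseline

for name, value in averages.items():
    print(f"{name:11s} {value:.3f}")
print(f"margin over best baseline: {margin:.3f}")
```

The unweighted mean for the proposed method reproduces the reported 0.848, and the gap to the strongest comparison method is about 0.033.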