Abstract
To improve the accuracy of music emotion recognition and classification, this study combines an explicit sparse attention network with deep learning and proposes an effective emotion recognition and classification method for complex music data sets. First, the method preprocesses the sample data set with fine-grained segmentation and related techniques, providing the classification model with a high-quality input sample set. An explicit sparse attention network is then introduced into the deep learning network to reduce the influence of irrelevant information on the recognition results and to improve emotion classification and recognition on the music sample data set. Simulation experiments are based on an actual network data set. The experimental results show that the recognition accuracy of the proposed method is 0.71 for happy emotions and 0.688 for sad emotions, demonstrating good music emotion recognition and classification ability.
Year: 2022 PMID: 35774442 PMCID: PMC9239758 DOI: 10.1155/2022/3920663
Source DB: PubMed Journal: Comput Intell Neurosci
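The abstract names fine-grained segmentation as the first preprocessing step. As a minimal sketch, assuming "fine-grained segmentation" means slicing each track into short overlapping windows; the window length, overlap, and this particular implementation are assumptions, not details from the paper:

```python
def fine_grained_segments(y, sr, seg_seconds=3.0, hop_seconds=1.5):
    """Slice a waveform array into short, overlapping segments.

    seg_seconds and hop_seconds are illustrative assumptions; the paper
    only names fine-grained segmentation as a preprocessing step.
    """
    seg = int(seg_seconds * sr)   # samples per segment
    hop = int(hop_seconds * sr)   # samples between segment starts
    return [y[s:s + seg] for s in range(0, len(y) - seg + 1, hop)]
```

Overlapping windows multiply the number of training samples per track, which matters for the smaller emotion classes (e.g., the 216 "anger" clips in the sample table below).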
Figure 1: MFCC extraction process.
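Figure 1 summarizes the standard MFCC chain (pre-emphasis, framing and windowing, FFT, mel filter bank, log, DCT). A minimal sketch of such a feature extractor, assuming librosa; the sampling rate, coefficient count, and per-coefficient normalization are illustrative choices, not the paper's settings:

```python
import librosa
import numpy as np

def extract_mfcc(path, sr=22050, n_mfcc=13):
    """Load an audio clip and compute a normalized MFCC matrix."""
    y, _ = librosa.load(path, sr=sr)        # waveform, resampled
    y = librosa.effects.preemphasis(y)      # pre-emphasis filter
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc,
                                n_fft=2048, hop_length=512)
    # Normalize each coefficient so clips of different loudness compare.
    return (mfcc - mfcc.mean(axis=1, keepdims=True)) / (
        mfcc.std(axis=1, keepdims=True) + 1e-8)
```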
Figure 2: Audio emotion classification model.
Figure 3: CNN layer.
Figure 4: LSTM layer.
Figure 5: DNN layer.
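Figures 2-5 describe a classifier built from CNN, LSTM, and DNN layers. A minimal tf.keras sketch of one plausible CNN → LSTM → DNN stack over MFCC-like frames; every layer count and width and the four-class softmax head are assumptions beyond the layer ordering the figures name:

```python
import tensorflow as tf

def build_model(n_frames=128, n_feats=13, n_classes=4):
    """Illustrative CNN -> LSTM -> DNN classifier over MFCC frames."""
    inp = tf.keras.Input(shape=(n_frames, n_feats, 1))
    x = tf.keras.layers.Conv2D(32, (3, 3), padding="same",
                               activation="relu")(inp)   # CNN layer
    x = tf.keras.layers.MaxPooling2D((2, 2))(x)
    # Collapse the frequency axis so each time step becomes one vector.
    x = tf.keras.layers.Reshape((n_frames // 2, -1))(x)
    x = tf.keras.layers.LSTM(64)(x)                      # LSTM layer
    x = tf.keras.layers.Dense(64, activation="relu")(x)  # DNN layer
    out = tf.keras.layers.Dense(n_classes, activation="softmax")(x)
    model = tf.keras.Model(inp, out)
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

The four output classes correspond to the happy/sad/relax/anger labels in the sample data set below.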
Figure 6: Implementation process of the explicit sparse attention mechanism.
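Figure 6 covers the explicit sparse attention mechanism. The published explicit sparse attention idea keeps only the top-k attention scores per query and masks the rest before the softmax, which matches the abstract's goal of suppressing irrelevant information. A NumPy sketch under that assumption; the value of k and the single-head layout are illustrative:

```python
import numpy as np

def explicit_sparse_attention(Q, K, V, k=8):
    """Scaled dot-product attention keeping only the top-k scores per
    query; the rest are masked to -inf so low-relevance positions
    contribute nothing after the softmax."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                 # (n_q, n_k) similarities
    k = min(k, scores.shape[-1])
    # Per-row threshold: the k-th largest score in each row.
    kth = np.sort(scores, axis=-1)[:, -k][:, None]
    masked = np.where(scores >= kth, scores, -np.inf)
    # Numerically stable softmax over the surviving scores.
    weights = np.exp(masked - masked.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V
```

With k equal to the number of keys this reduces to ordinary softmax attention, so the sparsity level is the only added hyperparameter.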
Parameter settings of the experimental analysis platform.
| Item | Parameter |
|---|---|
| Operating system | Ubuntu 16.04 |
| CPU | Intel(R) Core(TM) i5 |
| GPU | GeForce RTX 2060 Ti |
| CUDA | 8.0 |
| Python | 3.6 |
| TensorFlow | 1.4.2 |
Sample data set.
| Split | Happy | Sad | Relax | Anger | Total |
|---|---|---|---|---|---|
| Training set | 687 | 515 | 343 | 173 | 1718 |
| Test set | 172 | 129 | 85 | 43 | 429 |
| Total | 859 | 644 | 428 | 216 | 2147 |
Classification accuracy under different data preprocessing methods.
| Preprocessing method | Happy | Sad | Relax | Anger | Average |
|---|---|---|---|---|---|
| Fine-grained segmentation | 0.687 | 0.659 | 0.636 | 0.618 | 0.650 |
| Vocal separation | 0.678 | 0.660 | 0.643 | 0.632 | 0.653 |
| Fine-grained segmentation + vocal separation | 0.712 | 0.689 | 0.661 | 0.654 | 0.679 |
| Proposed preprocessing method | 0.737 | 0.723 | 0.698 | 0.688 | 0.712 |
Model classification accuracy under different data features.
| Audio features | Happy | Sad | Relax | Anger | Average |
|---|---|---|---|---|---|
| LLDs | 0.668 | 0.651 | 0.624 | 0.604 | 0.637 |
| Spectrogram | 0.657 | 0.643 | 0.665 | 0.623 | 0.647 |
| LLDs + spectrogram | 0.712 | 0.689 | 0.661 | 0.654 | 0.679 |
Model classification accuracy under different attention mechanisms.
| Attention mechanism | Accuracy | Cross entropy |
|---|---|---|
| Traditional attention mechanism | 0.682 | 0.654 |
| Explicit sparse attention mechanism | 0.712 | 0.631 |
Figure 7: Performance of music emotion analysis under different classification methods.