Abstract
Music style is an important label for music classification. Current style classification methods extract features such as rhythm and timbre and pass them to a classifier. Their accuracy depends not only on the classifier but also on the quality of the feature extraction, which leads to poor classification accuracy and stability. To address these shortcomings, this paper studies a deep-learning-based music style classification method. The music signal is filtered and framed with Hamming windows, and Mel-frequency cepstral coefficient (MFCC) features are extracted using the discrete Fourier transform. A convolutional recurrent neural network (CRNN) combining a CNN and an RNN is designed and trained to determine its parameters and perform style classification. Simulation experiments show that the method reaches a classification accuracy of at least 93.3%, with significantly reduced classification time and stable, reliable results.
Year: 2022 PMID: 35087600 PMCID: PMC8789415 DOI: 10.1155/2022/3699885
Source DB: PubMed Journal: Comput Math Methods Med ISSN: 1748-670X Impact factor: 2.238
Figure 1: MFCC feature extraction process.
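The extraction steps named in the abstract (filtering, Hamming-window framing, discrete Fourier transform, MFCC) can be sketched in plain Python as follows. This is a minimal illustration of a textbook MFCC pipeline, not the authors' implementation: the pre-emphasis coefficient, frame length, hop size, filter count, and all function names are assumptions chosen for the example, and the input is a synthetic tone.

```python
import cmath
import math


def pre_emphasis(signal, coeff=0.97):
    """High-pass filter y[n] = x[n] - coeff * x[n-1]; coeff is an assumed value."""
    return [signal[0]] + [signal[n] - coeff * signal[n - 1] for n in range(1, len(signal))]


def frame_and_window(signal, frame_len, hop):
    """Split the signal into overlapping frames and apply a Hamming window to each."""
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1)) for n in range(frame_len)]
    return [[signal[start + n] * window[n] for n in range(frame_len)]
            for start in range(0, len(signal) - frame_len + 1, hop)]


def power_spectrum(frame):
    """Naive O(N^2) DFT; returns the power in bins 0..N/2."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))) ** 2 / n
            for k in range(n // 2 + 1)]


def hz_to_mel(hz):
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose centres are evenly spaced on the mel scale."""
    mel_max = hz_to_mel(sample_rate / 2)
    mel_points = [mel_max * i / (n_filters + 1) for i in range(n_filters + 2)]
    # Fractional DFT-bin positions of the filter edges and centres.
    edges = [mel_to_hz(m) * n_fft / sample_rate for m in mel_points]
    bank = []
    for i in range(1, n_filters + 1):
        left, mid, right = edges[i - 1], edges[i], edges[i + 1]
        row = []
        for b in range(n_fft // 2 + 1):
            if left < b <= mid:
                row.append((b - left) / (mid - left))    # rising slope
            elif mid < b < right:
                row.append((right - b) / (right - mid))  # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def dct2(values, n_coeffs):
    """Unnormalised DCT-II, keeping the first n_coeffs coefficients."""
    n = len(values)
    return [sum(values[m] * math.cos(math.pi * k * (m + 0.5) / n) for m in range(n))
            for k in range(n_coeffs)]


def mfcc(signal, sample_rate, frame_len=64, hop=32, n_filters=10, n_coeffs=5):
    """Pre-emphasis -> framing + Hamming -> DFT power -> mel energies -> log -> DCT."""
    bank = mel_filterbank(n_filters, frame_len, sample_rate)
    features = []
    for frame in frame_and_window(pre_emphasis(signal), frame_len, hop):
        power = power_spectrum(frame)
        energies = [sum(w * p for w, p in zip(row, power)) for row in bank]
        # Small offset avoids log(0) for empty bands.
        features.append(dct2([math.log(e + 1e-10) for e in energies], n_coeffs))
    return features


# Usage: a 440 Hz test tone sampled at 8 kHz (synthetic input, not data from the paper).
sr = 8000
tone = [math.sin(2 * math.pi * 440 * n / sr) for n in range(256)]
coeffs = mfcc(tone, sr)
print(len(coeffs), len(coeffs[0]))  # prints "7 5": 7 frames, 5 coefficients each
```

In practice a library routine with a fast Fourier transform would replace the naive DFT; the loop form is kept here only so each stage of the pipeline stays visible.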
CRNN network structure.
| Layer number | Layer name | Input size | Layer number | Layer name | Input size |
|---|---|---|---|---|---|
| 1 | Input layer | 1 × 43 × 128 | 10 | 3 × 3 max pooling layer 3 | 128 × 3 × 6 |
| 2 | Convolutional layer | 513 × 4 × 15 | 11 | Dropout layer (rate 0.25) | 128 × 3 × 6 |
| 3 | Residual unit | 1 × 4 × 100 | 12 | Filters | 256 × 7 × 10 |
| 4 | 3 × 3 max pooling layer 1 | 32 × 15 × 44 | 13 | Global max pooling layer | 256 × 1 × 1 |
| 5 | Dropout layer (rate 0.25) | 32 × 15 × 44 | 14 | LSTM | 128 × 6 × 16 |
| 6 | Filters | 64 × 19 × 48 | 15 | Fully connected layer 1 | 300 |
| 7 | 3 × 3 max pooling layer 2 | 64 × 6 × 16 | 16 | Fully connected layer 2 | 150 |
| 8 | Dropout layer (rate 0.25) | 64 × 6 × 16 | 17 | Fully connected layer 3 | 10/6 |
| 9 | Filters | 128 × 10 × 20 | 18 | Output layer | |
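From row 4 onward, the spatial sizes listed in the table follow a regular pattern: each "Filters" row is 4 larger in both dimensions than the row before it, and each subsequent 3 × 3 pooling row equals the previous size divided by 3 and rounded down. The sketch below replays that arithmetic. Pooling with a stride of 3 and the fixed growth of 4 are inferences from the listed numbers, not details stated in the source, and the helper names are hypothetical.

```python
def pooled(size, window=3):
    """Non-overlapping pooling with floor division (stride == window is assumed)."""
    return tuple(dim // window for dim in size)


def grown(size, margin=4):
    """Spatial growth observed across each 'Filters' row (margin inferred from the table)."""
    return tuple(dim + margin for dim in size)


size = (15, 44)  # spatial size listed for pooling layer 1
trace = [size]
for _ in range(2):
    size = grown(size)   # 'Filters' row
    trace.append(size)
    size = pooled(size)  # next 3 x 3 pooling row
    trace.append(size)
size = grown(size)       # last 'Filters' row before global pooling
trace.append(size)

print(trace)
# [(15, 44), (19, 48), (6, 16), (10, 20), (3, 6), (7, 10)]
```

Every pair in the trace matches the height and width listed for rows 4 to 12, and the final 7 × 10 map is what the global max pooling layer reduces to 1 × 1.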
Figure 2: Convolutional recurrent neural network training flow chart.
Music training and testing database.
| | Dance music | Lyricism | Jazz | Chinese folk music | Rock 'n' roll |
|---|---|---|---|---|---|
| Number of training tracks | 500 | 500 | 500 | 500 | 500 |
| Number of test tracks | 240 | 360 | 214 | 351 | 168 |
List of convolutional neural network parameters corresponding to different acoustic spectrograms.
| Input type | STFT | Mel | CQT |
|---|---|---|---|
| Convolutional layer 1 | 513 × 4 × 128 | 128 × 4 × 128 | 128 × 4 × 128 |
| Maximum value pooling 1 | 1 × 2 | 1 × 2 | 1 × 2 |
| Convolutional layer 2 | 1 × 4 × 128 | 1 × 4 × 128 | 1 × 4 × 128 |
| Maximum value pooling 2 | 1 × 2 | 1 × 2 | 1 × 2 |
| Convolutional layer 3 | 1 × 4 × 128 | 1 × 4 × 128 | 1 × 4 × 128 |
| Maximum value pooling 3 | 1 × 26 | 1 × 26 | 1 × 26 |
| Fully connected layer 1 | 300 | 300 | 300 |
| Fully connected layer 2 | 150 | 150 | 150 |
| Fully connected layer 3 | 10/6 | 10/6 | 10/6 |
Comparison of classification performance of different spectrograms.
| Spectrogram | GTZAN | ISMIR2004 |
|---|---|---|
| STFT | 85.46% | 86.41% |
| Mel | 80.25% | 82.19% |
| CQT | 85.35% | 86.57% |
Figure 3: Experimental results for a sample size of 100.
Figure 4: Experimental results for a sample size of 50.
Comparison of classification accuracy and time overhead of classification methods.
| Experiment number | This article: accuracy (%) | This article: standard deviation (%) | This article: time overhead (s) | SVM: accuracy (%) | SVM: standard deviation (%) | SVM: time overhead (s) | CNN: accuracy (%) | CNN: standard deviation (%) | CNN: time overhead (s) |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 95.5 | 4.8865 | 0.031 | 86.2 | 8.0637 | 0.268 | 90.3 | 6.3851 | 0.165 |
| 2 | 96.7 | 4.5457 | 0.042 | 84.3 | 8.5172 | 0.295 | 89.9 | 6.7534 | 0.164 |
| 3 | 93.3 | 4.6319 | 0.056 | 85.1 | 8.6944 | 0.179 | 91.2 | 6.6892 | 0.178 |
| 4 | 94.8 | 6.7436 | 0.049 | 86.4 | 8.3786 | 0.306 | 90.8 | 6.8465 | 0.122 |
| 5 | 96.1 | 4.8063 | 0.050 | 83.6 | 8.9015 | 0.293 | 91.7 | 6.7643 | 0.163 |
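The five runs in the table can be condensed into a per-method minimum accuracy, mean accuracy, and mean time overhead. The short script below does this using only the accuracy and time values copied from the table; the summary figures are derived here for illustration and are not reported in the source, and the dictionary keys are labels chosen for this example.

```python
from statistics import mean

# (accuracy %, time overhead s) for experiments 1-5, copied from the comparison table.
results = {
    "this article": [(95.5, 0.031), (96.7, 0.042), (93.3, 0.056), (94.8, 0.049), (96.1, 0.050)],
    "SVM": [(86.2, 0.268), (84.3, 0.295), (85.1, 0.179), (86.4, 0.306), (83.6, 0.293)],
    "CNN": [(90.3, 0.165), (89.9, 0.164), (91.2, 0.178), (90.8, 0.122), (91.7, 0.163)],
}

summary = {}
for method, rows in results.items():
    accuracies = [acc for acc, _ in rows]
    times = [t for _, t in rows]
    summary[method] = {
        "min_acc": min(accuracies),
        "mean_acc": round(mean(accuracies), 2),
        "mean_time": round(mean(times), 4),
    }
    print(f"{method}: min {summary[method]['min_acc']:.1f}%, "
          f"mean {summary[method]['mean_acc']:.2f}%, "
          f"mean time {summary[method]['mean_time']:.4f} s")
```

The minimum of 93.3% for the method of this article is the figure quoted in the abstract. Its mean accuracy over the five runs is 95.28%, against 85.12% for SVM and 90.78% for CNN, with mean time overheads of 0.0456 s, 0.2682 s, and 0.1584 s respectively.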