Sergio Giraldo, George Waddell, Ignasi Nou, Ariadna Ortega, Oscar Mayor, Alfonso Perez, Aaron Williamon, Rafael Ramirez.
Abstract
The automatic assessment of music performance has become an area of increasing interest due to the growing number of technology-enhanced music learning systems. In most of these systems, the assessment of musical performance is based on pitch and onset accuracy, and very few pay attention to other important aspects of performance, such as sound quality or timbre. This is particularly true in violin education, where the quality of timbre plays a significant role in the assessment of musical performances. However, obtaining quantifiable criteria for the assessment of timbre quality is challenging, as it relies on consensus among the subjective interpretations of experts. We present an approach to assess the quality of timbre in violin performances using machine learning techniques. We collected audio recordings of several tone qualities and performed perceptual tests to find correlations among different timbre dimensions. We processed the audio recordings to extract acoustic features for training tone-quality models. We analyzed the correlations among the extracted features and investigated which features carry information for discriminating different timbre qualities. We also implemented a real-time feedback system designed for pedagogical use, in which users can train their own timbre models to assess and receive feedback on their performances.
Keywords: automatic assessment of music; machine learning; music performance; tone quality; violin performance
Year: 2019 PMID: 30930804 PMCID: PMC6427949 DOI: 10.3389/fpsyg.2019.00334
Source DB: PubMed Journal: Front Psychol ISSN: 1664-1078
Figure 1. Overall framework for automatic tone assessment using machine learning.
Tonal semantic dimensions defined by music experts.
| Dark | Bright |
| Cold | Warm |
| Harsh | Sweet |
| Dry | Resonant |
| Light | Heavy |
| Grainy | Pure |
| Coarse | Smooth |
| Closed | Open |
| Restricted | Free |
| Narrow | Broad |
List of audio features.
| Pitch | Fundamental frequency in Hz |
| Energy | Root mean square (RMS) over a 600 ms window |
| Tristimulus1 | Ratio of the first harmonic (the fundamental) to the sum of all harmonic peaks |
| Tristimulus2 | Ratio of the second plus the third harmonic peak to the sum of all harmonic peaks |
| Tristimulus3 | Ratio of the remaining harmonic peaks (above the third) to the sum of all harmonic peaks |
| specCent | The spectral center of gravity of the spectrum |
| specSpread | The spectral standard deviation |
| specSkew | Measure of the asymmetry of the spectrum around its mean value |
| specKurt | Measure of the flatness of the spectrum around its mean value |
| specSlope | Computed from the slope of the linear regression over the spectral amplitude values |
| specDecr | Averages the set of slopes of the lowest frequencies |
| specRolloff | Defined as the frequency below which 95% of the signal energy is contained |
| specFlat | Ratio between the geometric and the arithmetic mean of the spectrum |
| specCrest | Ratio between the maximum value and the arithmetic mean of the spectrum |
| MFCC | Mel frequency cepstral coefficients |
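Several of the descriptors listed above have compact closed-form definitions. The following is a minimal sketch, not the authors' extraction code, of how the centroid, spread, roll-off, flatness, crest, and tristimulus values could be computed from a single magnitude spectrum. The function names, the input layout (parallel lists of bin frequencies and magnitudes, plus a list of harmonic peak amplitudes), and the small constant added before taking logarithms are assumptions made for illustration.

```python
import math


def spectral_descriptors(freqs, mags, rolloff_fraction=0.95):
    """Compute a few spectral descriptors for one analysis frame.

    freqs: bin centre frequencies in Hz (ascending).
    mags:  non-negative magnitudes, one per bin.
    """
    total = sum(mags)
    if total <= 0:
        raise ValueError("spectrum must contain some energy")

    # Centroid: magnitude-weighted mean frequency (centre of gravity).
    centroid = sum(f * m for f, m in zip(freqs, mags)) / total

    # Spread: magnitude-weighted standard deviation around the centroid.
    spread = math.sqrt(
        sum(((f - centroid) ** 2) * m for f, m in zip(freqs, mags)) / total
    )

    # Roll-off: lowest frequency below which the given fraction of energy lies.
    energy = [m * m for m in mags]
    threshold = rolloff_fraction * sum(energy)
    cumulative = 0.0
    rolloff = freqs[-1]
    for f, e in zip(freqs, energy):
        cumulative += e
        if cumulative >= threshold:
            rolloff = f
            break

    # Flatness: geometric mean over arithmetic mean (epsilon avoids log(0)).
    arithmetic_mean = total / len(mags)
    geometric_mean = math.exp(sum(math.log(m + 1e-12) for m in mags) / len(mags))

    return {
        "centroid": centroid,
        "spread": spread,
        "rolloff": rolloff,
        "flatness": geometric_mean / arithmetic_mean,
        # Crest: maximum value over arithmetic mean.
        "crest": max(mags) / arithmetic_mean,
    }


def tristimulus(harmonic_amps):
    """Return (T1, T2, T3) from harmonic peak amplitudes, fundamental first."""
    total = sum(harmonic_amps)
    t1 = harmonic_amps[0] / total
    t2 = sum(harmonic_amps[1:3]) / total
    t3 = sum(harmonic_amps[3:]) / total
    return t1, t2, t3
```

By construction the three tristimulus values sum to one, so they describe how harmonic energy is split between the fundamental, the second and third harmonics, and everything above.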
Inter-subject correlations and Cronbach's alpha for the tone quality dimensions perceptual study.
| Dimension | Inter-subject correlation | Cronbach's alpha |
| Bright | 0.25 | 0.7 |
| Dark | 0.29 | 0.7 |
| Cold | 0.30 | 0.7 |
| Warm | 0.27 | 0.8 |
| Harsh | 0.28 | 0.6 |
| Sweet | 0.25 | 0.8 |
| Dry | 0.28 | 0.8 |
| Resonant | 0.28 | 0.8 |
| Light | 0.29 | 0.7 |
| Heavy | 0.31 | 0.8 |
| Grainy | 0.27 | 0.9 |
| Pure | 0.29 | 0.8 |
| Coarse | 0.34 | 0.8 |
| Smooth | 0.27 | 0.6 |
| Open | 0.24 | 0.9 |
| Closed | 0.25 | 0.9 |
| Restricted | 0.24 | 0.9 |
| Free | 0.27 | 0.9 |
| Narrow | 0.35 | 0.9 |
| Broad | 0.27 | 0.8 |
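The two statistics in this table can be illustrated with a short sketch. It assumes that each listener is treated as an "item" and each recording as a "case", which is one common way to apply Cronbach's alpha to rating data; the source does not state the exact procedure, so the data layout and function names below are hypothetical.

```python
import math
from itertools import combinations
from statistics import mean, variance


def pearson(x, y):
    """Pearson correlation between two equally long rating lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    norm = math.sqrt(
        sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)
    )
    return cov / norm


def mean_inter_subject_correlation(ratings):
    """Average Pearson correlation over all pairs of listeners.

    ratings[r][s] is the score listener r gave to stimulus s (assumed layout).
    """
    return mean(pearson(a, b) for a, b in combinations(ratings, 2))


def cronbach_alpha(ratings):
    """Cronbach's alpha with listeners as items and stimuli as cases."""
    k = len(ratings)
    item_variances = [variance(r) for r in ratings]
    # Sum of all listeners' scores for each stimulus.
    totals = [sum(column) for column in zip(*ratings)]
    return k / (k - 1) * (1 - sum(item_variances) / variance(totals))
```

Alpha grows with the number of listeners, which is consistent with a table in which pairwise correlations are modest while alpha is comparatively high.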
Inter-dimension correlations for the tone quality dimensions perceptual study.
| | Dark/bright | Cold/warm | Harsh/sweet | Dry/resonant | Light/heavy | Grainy/pure | Coarse/smooth | Closed/open | Restricted/free | Narrow/broad |
| Dark/bright | 1.00 | | | | | | | | | |
| Cold/warm | −0.12 | 1.00 | | | | | | | | |
| Harsh/sweet | 0.28 | 0.50 | 1.00 | | | | | | | |
| Dry/resonant | 0.14 | 0.41 | 0.70 | 1.00 | | | | | | |
| Light/heavy | −0.48 | 0.08 | −0.35 | −0.26 | 1.00 | | | | | |
| Grainy/pure | 0.34 | 0.30 | 0.79 | 0.55 | −0.38 | 1.00 | | | | |
| Coarse/smooth | 0.26 | 0.21 | 0.68 | 0.47 | −0.36 | 0.87 | 1.00 | | | |
| Closed/open | 0.37 | 0.29 | 0.62 | 0.59 | −0.25 | 0.62 | 0.51 | 1.00 | | |
| Restricted/free | 0.39 | 0.26 | 0.63 | 0.55 | −0.23 | 0.69 | 0.67 | 0.67 | 1.00 | |
| Narrow/broad | 0.29 | 0.32 | 0.68 | 0.60 | −0.13 | 0.66 | 0.63 | 0.68 | 0.81 | 1.00 |
Inter-dimension correlations for expert-defined tone quality dimensions vs. Good Sounds scales.
| Dark/bright | 0.16 | 0.13 | 0.17 | 0.10 | 0.17 | 0.19 |
| Cold/warm | 0.32 | 0.22 | 0.19 | 0.18 | 0.33 | 0.37 |
| Harsh/sweet | 0.71 | 0.49 | 0.64 | 0.39 | 0.60 | 0.70 |
| Dry/resonant | 0.51 | 0.39 | 0.43 | 0.21 | 0.44 | 0.49 |
| Light/heavy | −0.24 | −0.22 | −0.28 | −0.14 | −0.13 | −0.18 |
| Grainy/pure | 0.75 | 0.52 | 0.72 | 0.49 | 0.57 | 0.71 |
| Coarse/smooth | 0.75 | 0.53 | 0.71 | 0.52 | 0.47 | 0.74 |
| Closed/open | 0.52 | 0.44 | 0.49 | 0.27 | 0.56 | 0.53 |
| Restricted/free | 0.58 | 0.47 | 0.55 | 0.41 | 0.43 | 0.66 |
| Narrow/broad | 0.58 | 0.44 | 0.56 | 0.38 | 0.52 | 0.70 |
Figure 2. Confusion matrix of the listener ratings over the recorded tonal qualities.
Figure 3. Feature selection: rankings based on information gain.
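Ranking features by information gain can be sketched as follows. This is an illustrative version only: it discretizes each feature into equal-width bins, which is an assumption (the source does not say how continuous features were discretized), and the function names and the `bins` parameter are hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels, bins=4):
    """H(labels) minus H(labels | binned feature), using equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # constant feature: everything in one bin
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        index = min(int((value - lo) / width), bins - 1)
        groups[index].append(label)
    n = len(labels)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_features(feature_table, labels, bins=4):
    """Return (feature name, gain) pairs sorted from most to least informative.

    feature_table maps a feature name to its list of values (assumed layout).
    """
    gains = {
        name: information_gain(column, labels, bins)
        for name, column in feature_table.items()
    }
    return sorted(gains.items(), key=lambda item: item[1], reverse=True)
```

A feature that separates the classes perfectly reaches a gain equal to the label entropy, while a feature unrelated to the labels scores close to zero.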
Figure 4. Learning curves over an increasing number of features, ordered by information gain.
Multi-class classification accuracies measured as the percentage of correctly classified instances (CCI%) for Train (T) and 10-fold cross validation (CV).
| By register | Low | 6.41 | 47.26/52.81 | 84.02/85.66 | 88.34/91.6 |
| | Mid | 6.59 | 48.10/53.57 | 88.54/85.89 | 87.09/89.57 |
| | High | 6.78 | 47.84/52.08 | 86.66/86.39 | 85.03/91.31 |
| By position | Pos. I | 6.66 | 53.27/51.87 | 89.28/86.34 | 79.15/77.67 |
| | Pos. V | 6.13 | 49.56/48.51 | 89.57/86.12 | 77.88/74.64 |
| By finger | 1st. | 6.45 | 48.78/52.37 | 87.7/85.08 | 88.67/91.14 |
| | 2nd. | 6.44 | 49.76/52.65 | 86.94/87.50 | 88.16/90.12 |
| | 3rd. | 6.76 | 48.81/51.82 | 88.9/87.56 | 86.38/90.65 |
| | 4th. | 6.64 | 49.77/52.87 | 88.81/85.53 | 87.58/92.91 |
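The cross-validated CCI% figures reported in these tables follow a standard procedure that can be sketched in a few lines. The nearest-centroid classifier below is only a stand-in chosen to keep the example self-contained; it is not one of the models evaluated in the source, and all function names, the fold assignment, and the `seed` parameter are assumptions.

```python
import random


def fit_centroids(X, y):
    """Return the mean feature vector of each class (stand-in model)."""
    grouped = {}
    for features, label in zip(X, y):
        grouped.setdefault(label, []).append(features)
    return {
        label: [sum(column) / len(rows) for column in zip(*rows)]
        for label, rows in grouped.items()
    }


def predict(centroids, features):
    """Pick the class whose centroid is closest in squared Euclidean distance."""
    return min(
        centroids,
        key=lambda label: sum(
            (a - b) ** 2 for a, b in zip(features, centroids[label])
        ),
    )


def cross_validated_cci(X, y, k=10, seed=0):
    """Percentage of correctly classified instances under k-fold CV."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k roughly equal folds.
    folds = [indices[i::k] for i in range(k)]
    correct = 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in indices if i not in held_out]
        model = fit_centroids([X[i] for i in train], [y[i] for i in train])
        correct += sum(predict(model, X[i]) == y[i] for i in fold)
    return 100.0 * correct / len(X)
```

Every instance is held out exactly once, so the returned value is the share of all instances that were classified correctly by a model that never saw them during training.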
Binary classification accuracies measured as CCI% for Train (T) and 10-fold cross validation (CV) for Pitch subgroup.
| High | Dark/bright | 54.95 | 89.45 | 92.55 | 98.18 |
| | Cold/warm | 53.31 | 90.65 | 92.61 | 98.95 |
| | Harsh/sweet | 54.59 | 88.75 | 93.05 | 97.83 |
| | Dry/resonant | 53.76 | 87.81 | 92.95 | 98.1 |
| | Light/heavy | 51.44 | 89.04 | 92.82 | 97.5 |
| | Grainy/pure | 50.96 | 85.24 | 92.79 | 97.3 |
| | Coarse/smooth | 51.04 | 87.42 | 94.84 | 98.82 |
| | Closed/open | 53.48 | 89.24 | 94.24 | 97.56 |
| | Restricted/free | 50.96 | 89.90 | 94.34 | 98.40 |
| | Narrow/broad | 54.38 | 86.05 | 94.6 | 97.27 |
| Medium | Dark/bright | 54.25 | 89.47 | 92.75 | 97.88 |
| | Cold/warm | 54.25 | 90.97 | 94.10 | 97.31 |
| | Harsh/sweet | 52.62 | 86.71 | 94.88 | 97.86 |
| | Dry/resonant | 52.29 | 88.44 | 92.39 | 97.77 |
| | Light/heavy | 51.66 | 87.17 | 93.92 | 97.11 |
| | Grainy/pure | 54.64 | 89.94 | 93.48 | 97.71 |
| | Coarse/smooth | 52.41 | 85.41 | 93.16 | 97.68 |
| | Closed/open | 54.77 | 88.00 | 92.04 | 97.79 |
| | Restricted/free | 50.41 | 85.49 | 94.62 | 97.83 |
| | Narrow/broad | 52.96 | 87.75 | 94.15 | 97.73 |
| Low | Dark/bright | 54.62 | 88.81 | 94.57 | 97.73 |
| | Cold/warm | 52.14 | 87.96 | 93.48 | 97.15 |
| | Harsh/sweet | 51.92 | 86.25 | 94.12 | 97.85 |
| | Dry/resonant | 53.31 | 90.10 | 93.66 | 98.15 |
| | Light/heavy | 54.95 | 85.85 | 92.02 | 98.11 |
| | Grainy/pure | 57.20 | 87.09 | 93.62 | 98.21 |
| | Coarse/smooth | 51.52 | 89.65 | 92.2 | 98.44 |
| | Closed/open | 53.37 | 89.95 | 93.15 | 97.18 |
| | Restricted/free | 53.52 | 88.82 | 93.72 | 98.78 |
| | Narrow/broad | 54.52 | 85.96 | 92.29 | 98.48 |
Statistically significant improvement.
Binary classification accuracies measured as CCI% for Train (T) and 10-fold cross validation (CV) for Position subgroup.
| First | Closed/open | 57.75 | 88.56 | 86.33 | 95.25 |
| | Coarse/smooth | 52.62 | 90.82 | 89.33 | 97.47 |
| | Cold/warm | 56.30 | 88.26 | 87.17 | 96.56 |
| | Dark/bright | 54.10 | 87.80 | 81.77 | 97.61 |
| | Dry/resonant | 52.19 | 82.07 | 80.38 | 92.90 |
| | Grainy/pure | 54.02 | 82.81 | 81.06 | 94.69 |
| | Harsh/sweet | 57.44 | 86.47 | 82.32 | 97.62 |
| | Light/heavy | 51.32 | 88.57 | 85.38 | 93.55 |
| | Narrow/broad | 54.17 | 89.21 | 86.49 | 98.25 |
| | Restricted/free | 54.51 | 86.14 | 83.82 | 96.41 |
| Fifth | Closed/open | 58.18 | 84.79 | 92.97 | 95.02 |
| | Coarse/smooth | 53.32 | 86.02 | 94.90 | 97.73 |
| | Cold/warm | 55.25 | 80.07 | 91.98 | 95.15 |
| | Dark/bright | 59.31 | 83.04 | 91.70 | 97.03 |
| | Dry/resonant | 52.16 | 80.68 | 92.96 | 95.13 |
| | Grainy/pure | 51.51 | 82.63 | 92.68 | 94.69 |
| | Harsh/sweet | 53.61 | 81.46 | 93.37 | 96.80 |
| | Light/heavy | 55.32 | 88.99 | 94.20 | 96.18 |
| | Narrow/broad | 56.44 | 86.61 | 94.37 | 96.76 |
| | Restricted/free | 58.71 | 86.01 | 95.14 | 96.62 |
Statistically significant improvement.
Figure 5. Confusion matrix of the ANN model over the training set.
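For readers unfamiliar with how a confusion matrix relates to the CCI% values in the tables, here is a small sketch. It assumes parallel lists of true and predicted labels; the function names and the row/column convention (rows are true classes, columns are predicted classes) are choices made for this illustration, not taken from the source.

```python
from collections import Counter


def confusion_matrix(true_labels, predicted_labels):
    """Return (classes, matrix) with rows = true class, columns = predicted."""
    classes = sorted(set(true_labels) | set(predicted_labels))
    counts = Counter(zip(true_labels, predicted_labels))
    matrix = [[counts[(t, p)] for p in classes] for t in classes]
    return classes, matrix


def cci_percent(matrix):
    """Correctly classified instances: diagonal sum over total, in percent."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return 100.0 * correct / total
```

Off-diagonal cells show which tone qualities are mistaken for one another, information that a single accuracy figure hides.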
Binary classification accuracies measured as CCI% for the real-time framework tests.
| Subject 1 | Rich/poor | 52.62 | 98.32 |
| Subject 2 | Rich/poor | 54.63 | 98.41 |
| Subject 3 | Rich/poor | 51.45 | 98.23 |
| Subject 4 | Rich/poor | 54.62 | 97.38 |
| Subject 1 | Light/heavy | 51.30 | 98.40 |
| Subject 2 | Bad/good | 53.96 | 97.30 |
| Subject 3 | Thin/full | 51.02 | 97.84 |
| Subject 4 | Light/heavy | 53.14 | 98.73 |
Statistically significant improvement.
Binary classification accuracies measured as CCI% for cross validation (CV) among performers and models.
| | | Model by Subject 1 | Model by Subject 2 | Model by Subject 3 | Model by Subject 4 |
| Test note by | Subject 1 | 92.08 | 58.33 | 63.29 | 55.26 |
| | Subject 2 | 88.97 | 92.44 | 64.79 | 62.83 |
| | Subject 3 | 55.96 | 57.61 | 92.67 | 57.62 |
| | Subject 4 | 56.99 | 56.03 | 56.23 | 92.34 |
Accuracy obtained by subject 2 on model by subject 1 after several trials.