Carlos Eduardo Cancino-Chacón, Thassilo Gadermaier, Gerhard Widmer, Maarten Grachten.
Abstract
Expressive interpretation forms an important but complex aspect of music, particularly in Western classical music. Modeling the relation between musical expression and structural aspects of the score being performed is an ongoing line of research. Prior work has shown that some simple numerical descriptors of the score (capturing dynamics annotations and pitch) are effective for predicting expressive dynamics in classical piano performances. Nevertheless, the features have only been tested in a very simple linear regression model. In this work, we explore the potential of non-linear and temporal modeling of expressive dynamics. Using a set of descriptors that capture different types of structure in the musical score, we compare linear and different non-linear models in a large-scale evaluation on three different corpora, involving both piano and orchestral music. To the best of our knowledge, this is the first study where models of musical expression are evaluated on both types of music. We show that, in addition to being more accurate, non-linear models describe interactions between numerical descriptors that linear models do not.
Keywords: Artificial neural networks; Computational models of music performance; Musical expression; Non-linear basis models
Year: 2017 PMID: 32063665 PMCID: PMC6994224 DOI: 10.1007/s10994-017-5631-y
Source DB: PubMed Journal: Mach Learn ISSN: 0885-6125 Impact factor: 2.940
Fig. 1 Schematic view of expressive dynamics as a function of basis functions, representing dynamic annotations
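The idea in Fig. 1 can be sketched as a weighted sum of basis functions evaluated at each note. The sketch below is illustrative only: the step shape for a constant marking, the ramp shape for a crescendo, the helper names, and the weights are all assumptions made for this example, not the encoding or parameters used in the study.

```python
def step_basis(n_notes, start):
    """1.0 from note index `start` onwards, 0.0 before (assumed shape for a constant marking)."""
    return [1.0 if i >= start else 0.0 for i in range(n_notes)]


def ramp_basis(n_notes, start, end):
    """Rises linearly from 0 at `start` to 1 at `end`, then stays at 1 (assumed shape for a crescendo)."""
    values = []
    for i in range(n_notes):
        if i < start:
            values.append(0.0)
        elif i >= end:
            values.append(1.0)
        else:
            values.append((i - start) / (end - start))
    return values


def linear_dynamics(basis_rows, weights):
    """Weighted sum of the basis-function values of every note."""
    return [sum(w * phi for w, phi in zip(weights, row)) for row in basis_rows]


n_notes = 6
forte = step_basis(n_notes, start=2)
cresc = ramp_basis(n_notes, start=1, end=5)
rows = list(zip(forte, cresc))   # one row of basis values per note
weights = [0.5, 0.25]            # hypothetical weights, not fitted to data
dynamics = linear_dynamics(rows, weights)
```

In a linear model of this kind each annotation contributes independently of the others; the non-linear models compared in the abstract replace the weighted sum with a learned function of the same inputs.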
Fig. 2 Illustration of merging and fusion of score information of two different parts belonging to the same instrument class “Oboe”. The matrix on the left shows two example basis functions for the first notes of each of the two score parts. The matrix on the top right is the result of merging basis functions of different “Oboe” instantiations into a single set. The matrix on the bottom right is the result of fusion, applied per basis function to each set of values occurring at the same time point
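The two steps described in the caption of Fig. 2 can be sketched roughly as follows. The data layout (a note as an onset plus a dictionary of basis values), the function names, and the example values are hypothetical. The caption does not name the fusion operator, so the arithmetic mean used here is an assumption that can be swapped through the `combine` argument.

```python
from collections import defaultdict


def merge_parts(parts):
    """Pool the notes of several parts of one instrument class into one list,
    ordered by onset time. Each note is (onset, {basis_name: value})."""
    merged = [note for part in parts for note in part]
    return sorted(merged, key=lambda note: note[0])


def fuse(merged, combine=lambda values: sum(values) / len(values)):
    """Collapse notes that share an onset into one row per time point,
    applying `combine` separately to each basis function (mean is an assumption)."""
    by_onset = defaultdict(lambda: defaultdict(list))
    for onset, basis in merged:
        for name, value in basis.items():
            by_onset[onset][name].append(value)
    return [
        (onset, {name: combine(vals) for name, vals in columns.items()})
        for onset, columns in sorted(by_onset.items())
    ]


# Two hypothetical "Oboe" parts; values are made up for illustration.
oboe_1 = [(0.0, {"pitch": 0.5, "duration": 0.5}), (1.0, {"pitch": 0.25, "duration": 0.5})]
oboe_2 = [(0.0, {"pitch": 0.75, "duration": 1.0})]

merged = merge_parts([oboe_1, oboe_2])   # three notes under shared basis names
fused = fuse(merged)                     # one row per distinct onset
```

Merging keeps every note but drops the distinction between the two oboe parts; fusion then reduces simultaneous notes to a single value per basis function and time point.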
Fig. 3 The architecture of the NBM used for modeling expressive dynamics. From bottom to top, the circles represent the input layer, two successive hidden layers and the output layer, respectively. From left to right, advancing time steps are shown
Fig. 4 Bidirectional RNBM for modeling expressive dynamics. The single hidden layer is made up of forward (fw) and backward (bw) recurrent hidden units. Left to right shows advancing time steps
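To make the structure in Fig. 4 concrete, here is a minimal forward pass of a generic bidirectional recurrent layer in plain Python. It is a sketch under stated assumptions, not the network from the study: the tanh units, the zero initial state, the linear output, the function names, and all weights are hypothetical choices for illustration.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def rnn_states(inputs, w_in, w_rec):
    """Hidden state of a plain tanh RNN at every time step, starting from zeros."""
    hidden = [0.0] * len(w_rec)
    states = []
    for x in inputs:
        pre = [a + b for a, b in zip(matvec(w_in, x), matvec(w_rec, hidden))]
        hidden = [math.tanh(p) for p in pre]
        states.append(hidden)
    return states


def bidirectional_rnn(inputs, fw, bw, out_fw, out_bw, bias=0.0):
    """One output per time step, computed from forward and backward hidden states."""
    fw_states = rnn_states(inputs, *fw)
    # The backward units read the sequence from the end; flip their states back
    # so that index t refers to the same time step in both directions.
    bw_states = rnn_states(inputs[::-1], *bw)[::-1]
    return [
        sum(w * h for w, h in zip(out_fw, h_fw))
        + sum(w * h for w, h in zip(out_bw, h_bw))
        + bias
        for h_fw, h_bw in zip(fw_states, bw_states)
    ]


# Hypothetical weights: 2 basis functions in, 2 hidden units per direction.
fw = ([[0.4, 0.1], [-0.2, 0.3]], [[0.1, 0.0], [0.0, 0.1]])   # (input, recurrent)
bw = ([[0.5, -0.3], [0.2, 0.4]], [[0.1, 0.2], [-0.2, 0.1]])  # (input, recurrent)
out_fw, out_bw = [0.6, -0.5], [0.7, -0.4]

inputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # basis values per time step
outputs = bidirectional_rnn(inputs, fw, bw, out_fw, out_bw)
```

Because of the backward units, the output at an early time step also depends on basis values that occur later in the score; setting `out_bw` to zeros removes that dependence and leaves a purely forward model.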
Musical material contained in the RCO/Symphonic corpus
| Composer | Piece | Movements | Conductor |
|---|---|---|---|
| Beethoven | Symphony no. 5 in C-Min. (op. 67) | 1, 2, 3, 4 | Fischer |
| Beethoven | Symphony no. 6 in F-Maj. (op. 68) | 1, 2, 3, 4, 5 | Fischer |
| Beethoven | Symphony no. 9 in D-Min. (op. 125) | 1, 2, 3, 4 | Fischer |
| Mahler | Symphony no. 4 in G-Maj. | 1, 2, 3, 4 | Jansons |
| Bruckner | Symphony no. 9 in D-Min. (WAB 109) | 1, 2, 3 | Jansons |
Predictive accuracy for expressive dynamics in terms of explained variance ($R^2$) and Pearson’s correlation coefficient (r), averaged over a fivefold cross-validation on each of the three corpora
| Model | Magaloff/Chopin $R^2$ | Magaloff/Chopin r | Zeilinger/Beethoven $R^2$ | Zeilinger/Beethoven r | RCO/Symphonic $R^2$ | RCO/Symphonic r |
|---|---|---|---|---|---|---|
| LBM | 0.171 | 0.470 | 0.197 | 0.562 | | 0.312 |
| NBM (100, 20) | 0.195 | 0.478 | 0.266 | 0.568 | 0.242 | 0.528 |
| RNBM (20rec) | | | | | 0.205 | 0.518 |
| RNBM (20lstm) | | | | | 0.271 | 0.590 |
| RNBM (100, 20rec) | | | | | 0.282 | 0.609 |
Due to the structure of the data, the RNBM models cannot be directly used to model expressive dynamics in the Magaloff/Chopin and Zeilinger/Beethoven corpora (see text)
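For readers who want to reproduce the two accuracy measures in the table, the following sketch computes them for one set of predictions and averages per-fold scores. It assumes that explained variance means the usual coefficient of determination (one minus the ratio of residual to total sum of squares); the function names, the example data, and the fold scores are hypothetical.

```python
import math


def r_squared(targets, predictions):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    mean_t = sum(targets) / len(targets)
    ss_res = sum((t - p) ** 2 for t, p in zip(targets, predictions))
    ss_tot = sum((t - mean_t) ** 2 for t in targets)
    return 1.0 - ss_res / ss_tot


def pearson_r(targets, predictions):
    """Pearson's correlation coefficient between targets and predictions."""
    mean_t = sum(targets) / len(targets)
    mean_p = sum(predictions) / len(predictions)
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(targets, predictions))
    var_t = sum((t - mean_t) ** 2 for t in targets)
    var_p = sum((p - mean_p) ** 2 for p in predictions)
    return cov / math.sqrt(var_t * var_p)


def average_over_folds(fold_scores):
    """Mean of the per-fold scores, as in a k-fold cross-validation summary."""
    return sum(fold_scores) / len(fold_scores)


# Made-up example values, not data from the study.
targets = [1.0, 2.0, 3.0, 4.0]
predictions = [1.5, 1.5, 3.5, 3.5]

r2 = r_squared(targets, predictions)
r = pearson_r(targets, predictions)
mean_r2 = average_over_folds([0.25, 0.5, 0.75])  # hypothetical fold scores
```

The two measures answer different questions: r is insensitive to a constant offset or rescaling of the predictions, while $R^2$ penalizes both, which is why the two columns of the table need not rank models identically.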
Basis functions with the largest sensitivity coefficients for the LBM models
Corpora: Magaloff/Chopin and Zeilinger/Beethoven. Rows appear in source order; empty cells mark names lost in extraction.

| Basis function | Active (%) | Sensitivity coefficient |
|---|---|---|
| pitch | 100.00 | 0.168 |
| slur decr | 63.06 | 0.067 |
| | 42.71 | 0.067 |
| | 39.12 | 0.059 |
| | 26.76 | 0.042 |
| duration | 100.00 | 0.035 |
| | 37.48 | 0.036 |
| | 26.07 | 0.036 |
| slur incr | 62.13 | 0.033 |
| | 35.49 | 0.031 |
| pitch | 100.00 | 0.024 |
| | 12.86 | 0.023 |
| slur incr | 35.92 | 0.018 |
| duration | 100.00 | 0.017 |
| slur decr | 37.66 | 0.017 |
| | 18.08 | 0.017 |
Averages are reported over the five folds of the cross-validation. Dynamics markings are in bold italic. Basis functions that are non-zero for less than 5% of the instances have been grayed out
Basis functions with the largest sensitivity coefficients for the NBM models
Corpora: Magaloff/Chopin and Zeilinger/Beethoven. Rows appear in source order; empty cells mark names lost in extraction. The two unlabeled numeric columns hold the sensitivity coefficients, whose column symbols were also lost. The corpus is given only where the source row shows both corpora side by side.

| Corpus | Basis function | Active (%) | | |
|---|---|---|---|---|
| | pitch | 100.00 | 0.200 | 0.207 |
| | | 42.71 | 0.037 | 0.124 |
| | | 18.08 | 0.071 | 0.135 |
| Magaloff/Chopin | | 41.56 | 0.036 | 0.100 |
| Zeilinger/Beethoven | duration | 100.00 | 0.056 | 0.126 |
| | slur decr | 63.06 | 0.044 | 0.084 |
| | | 26.76 | 0.046 | 0.096 |
| | | 39.12 | 0.059 | 0.083 |
| | | 26.07 | 0.083 | 0.095 |
| Magaloff/Chopin | | 35.49 | 0.051 | 0.073 |
| Zeilinger/Beethoven | slur incr | 35.92 | 0.046 | 0.080 |
| | slur incr | 62.13 | 0.062 | 0.073 |
| | | 37.48 | 0.052 | 0.068 |
| | duration | 100.00 | 0.040 | 0.072 |
| | | 64.41 | 0.025 | 0.034 |
| Magaloff/Chopin | | 12.86 | 0.016 | 0.028 |
| Zeilinger/Beethoven | slur decr | 37.66 | 0.020 | 0.033 |
| | | 23.18 | 0.006 | 0.020 |
| | | 40.07 | 0.029 | 0.032 |
| | pitch | 100.00 | 0.032 | 0.032 |
| Magaloff/Chopin | | 5.46 | 0.007 | 0.012 |
| Zeilinger/Beethoven | ritardando | 31.09 | 0.005 | 0.010 |
| | | 41.94 | 0.007 | 0.012 |
| | staccato | 8.41 | 0.005 | 0.005 |
| | | 5.79 | 0.001 | 0.011 |
Averages are reported over the five folds of the cross-validation. Dynamics markings are in bold italic. Basis functions that are non-zero for less than 5% of the instances have been grayed out
Fig. 5 Example of the effect of the interaction of a crescendo after a diminuendo for both the LBM and NBM models