Kirk N Olsen, Roger T Dean, Yvonne Leung.
Abstract
Phrasing facilitates the organization of auditory information and is central to speech and music. Not surprisingly, aspects of changing intensity, rhythm, and pitch are key determinants of musical phrases and their boundaries in instrumental note-based music. Different kinds of speech (such as tone- vs. stress-languages) share these features in different proportions and form an instructive comparison. However, little is known about whether or how musical phrasing is perceived in sound-based music, where the basic musical unit from which a piece is created is commonly non-instrumental continuous sounds, rather than instrumental discontinuous notes. This issue forms the target of the present paper. Twenty participants (17 untrained in music) were presented with six stimuli derived from sound-based music, note-based music, and environmental sound. Their task was to indicate each occurrence of a perceived phrase and qualitatively describe key characteristics of the stimulus associated with each phrase response. It was hypothesized that sound-based music does elicit phrase perception, and that this is primarily associated with temporal changes in intensity and timbre, rather than rhythm and pitch. Results supported this hypothesis. Qualitative analysis of participant descriptions showed that for sound-based music, the majority of perceived phrases were associated with intensity or timbral change. For the note-based piano piece, rhythm was the main theme associated with perceived musical phrasing. We modeled the occurrence in time of perceived musical phrases with recurrent event 'hazard' analyses using time-series data representing acoustic predictors associated with intensity, spectral flatness, and rhythmic density. Acoustic intensity and timbre (represented here by spectral flatness) were strong predictors of perceived musical phrasing in sound-based music, and rhythm was only predictive for the piano piece. 
A further analysis including five additional spectral measures linked to timbre strengthened the models. Overall, results show that even when little of the pitch and rhythm information important for phrasing in note-based music is available, phrasing is still perceived, primarily in response to changes of intensity and timbre. Implications for electroacoustic music composition and music recommender systems are discussed.
Year: 2016 PMID: 27997625 PMCID: PMC5172564 DOI: 10.1371/journal.pone.0167643
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Time-stamped phrase responses and acoustic time-series data.
Displays all time-stamped responses assigned to timbre, intensity, or rhythm categories across the entire time-course of all six stimuli. Acoustic categories were based on thematic analyses of participants’ qualitative descriptions of each perceived phrase. The diamond-shaped markers on the three horizontal scales in each panel show these responses, specifically the point at which each phrase was perceived to have ended. The ‘×’ symbol marks the moment in each stimulus where a 1 kHz pure tone was presented. The pure tone indicated to participants that they were to begin responding and was used to mark the beginning of the first perceived phrase. Time-series data of acoustic intensity (solid line; dB SPL on left y-axis) and spectral flatness (dashed line; Wiener entropy on right y-axis) are also plotted for each stimulus. All perceptual and acoustic data are presented at a sampling rate of 2 Hz.
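Spectral flatness (Wiener entropy), the timbre measure plotted in this figure, is conventionally defined as the ratio of the geometric mean to the arithmetic mean of a power spectrum. The sketch below illustrates that standard definition only; it is not the authors' analysis pipeline. The function name, the `eps` floor, and the toy spectra are assumptions made for the example, and the study may report the measure on a different (for instance logarithmic) scale.

```python
import math

def spectral_flatness(power_spectrum, eps=1e-12):
    """Ratio of geometric to arithmetic mean of a power spectrum (0..1).

    `eps` is an assumed floor that keeps log() defined for empty bins.
    """
    if not power_spectrum:
        raise ValueError("power_spectrum must not be empty")
    floored = [max(p, eps) for p in power_spectrum]
    # Geometric mean computed in the log domain to avoid overflow/underflow.
    log_mean = sum(math.log(p) for p in floored) / len(floored)
    geometric_mean = math.exp(log_mean)
    arithmetic_mean = sum(floored) / len(floored)
    return geometric_mean / arithmetic_mean

# Toy spectra (hypothetical values): noise-like energy is spread evenly,
# tone-like energy is concentrated in a single bin.
noise_like = [1.0, 1.0, 1.0, 1.0]
tone_like = [1000.0, 0.001, 0.001, 0.001]

flat_noise = spectral_flatness(noise_like)
flat_tone = spectral_flatness(tone_like)
print(f"noise-like: {flat_noise:.3f}, tone-like: {flat_tone:.6f}")
```

Values near 1 indicate a noise-like spectrum and values near 0 a tonal one, which is why the measure can stand in for timbral change when pitch and rhythm cues are sparse.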
Number of perceived phrases assigned to intensity, timbre, or rhythm categories.
| Stimulus | Stimulus Description | Timbre | Intensity | Rhythm | Total |
|---|---|---|---|---|---|
| BBC SoundFX: | Environmental sound | 24 | 68 | 2 | 94 |
| Ng & Dean: | Noise sound-based piece | 68 | 25 | 7 | 100 |
| Eno: | Ambient sound-based piece | 95 | 8 | 2 | 105 |
| Wishart: | Hybrid sound-based piece | 123 | 22 | 1 | 146 |
| Beethoven: | Instrumental note-based piece | 32 | 31 | 64 | 127 |
| Xenakis: | Instrumental sound-based piece | 45 | 49 | 6 | 100 |
Note. Categories assigned to each perceived phrase across participants were based on thematic analyses conducted by the authors of participants’ qualitative descriptions of each perceived phrase (see S1 Appendix).
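The counts in this table can be turned into per-stimulus proportions to check which category dominates each stimulus. The snippet below is a small sketch of that arithmetic; the counts are copied from the table, while the helper names (`category_shares`, `dominant_category`) are assumptions introduced for illustration.

```python
# Counts copied from the table: (timbre, intensity, rhythm) per stimulus.
counts = {
    "BBC SoundFX": (24, 68, 2),
    "Ng & Dean": (68, 25, 7),
    "Eno": (95, 8, 2),
    "Wishart": (123, 22, 1),
    "Beethoven": (32, 31, 64),
    "Xenakis": (45, 49, 6),
}
CATEGORIES = ("timbre", "intensity", "rhythm")

def category_shares(row):
    """Return each category's share of the stimulus total."""
    total = sum(row)
    return {name: n / total for name, n in zip(CATEGORIES, row)}

def dominant_category(row):
    """Return the category with the largest share for one stimulus."""
    shares = category_shares(row)
    return max(shares, key=shares.get)

for stimulus, row in counts.items():
    shares = category_shares(row)
    summary = ", ".join(f"{name} {share:.0%}" for name, share in shares.items())
    print(f"{stimulus}: {summary} -> {dominant_category(row)}")
```

On these counts, rhythm is the largest category only for the Beethoven piano piece, while timbre or intensity leads for the other five stimuli.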
Summary of selected Cox hazard models of perceived phrases using the whole-phrase ‘global’ approach.
| Stimulus | R² | Likelihood Ratio | Model p-value | Robust Model p-value |
|---|---|---|---|---|
| BBC SoundFX: | .51 | 67.07 | < .001 | .002 |
| Ng & Dean: | .42 | 54.40 | < .001 | .019 |
| Eno: | .31 | 37.94 | < .001 | .003 |
| Wishart: | .38 | 69.70 | < .001 | .004 |
| Beethoven: | .30 | 45.32 | < .001 | .023 |
| Xenakis: | .56 | 82.30 | < .001 | .060 |
Note. ‘Model p-value’ refers to the p-value based on the Likelihood Ratio and assumes independence of observations within a cluster (i.e., a group of successive phrase perceptions by an individual listening to a particular piece). The more conservative ‘Robust Model p-values’ do not. The time transform used in all these models was x*log(t).
Description of predictors used in Cox hazard models of perceived phrases.
| Parent Predictor | Description and Potential Range of Values |
|---|---|
| Mean acoustic intensity (dB SPL) | |
| Mean spectral flatness (Wiener Entropy) | |
| Mean rhythmic density (number of onset events per 500 ms) | |
| Mean sparsity (reciprocal of the number of onset events per 500 ms) | |
| Mean spectral centroid | |
| Mean spectral spread | |
| Mean spectral flux | |
| Mean roughness | |
| Mean inharmonicity | |
| tt(x) | Time transform of the predictor (x) |
Note. Models included additional versions of each ‘parent’ predictor above: mean change (Mchg) and mean absolute change (Mabschg).
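As a rough illustration of how a ‘parent’ predictor and its two change versions might be derived from one acoustic time series over a phrase window, the sketch below computes a mean, a mean change, and a mean absolute change. The definitions used here (mean of successive differences and mean of their absolute values) are assumptions for the example; the source does not spell out the exact formulas, and the function name and sample values are hypothetical.

```python
def summarize_window(values):
    """Summarize one acoustic series over a phrase window.

    Assumed definitions (not confirmed by the source):
      mean    : average of the samples
      mchg    : average of successive differences
      mabschg : average of absolute successive differences
    """
    if len(values) < 2:
        raise ValueError("need at least two samples to compute change")
    diffs = [later - earlier for earlier, later in zip(values, values[1:])]
    return {
        "mean": sum(values) / len(values),
        "mchg": sum(diffs) / len(diffs),
        "mabschg": sum(abs(d) for d in diffs) / len(diffs),
    }

# Hypothetical intensity samples (dB SPL) at 2 Hz, i.e. one value per 500 ms.
window = [60.0, 62.0, 61.0, 65.0]
summary = summarize_window(window)
print({name: round(value, 3) for name, value in summary.items()})
```

Mean change keeps the direction of the trend, whereas mean absolute change captures how much the signal fluctuates regardless of direction, so the two can differ substantially for the same window.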
Predictors in the selected Cox hazard models of perceived phrases using the ‘whole-phrase’ global approach.
| Stimulus | Predictor | Coefficient | Robust SE | Coefficient p-value |
|---|---|---|---|---|
| BBC SoundFX | | 34.56 | 9.49 | < .001 |
| | | 16.32 | 4.82 | .001 |
| | | 2.93 | .89 | < .001 |
| | tt( | -15.74 | 4.82 | .001 |
| Ng & Dean | | -221.10 | 83.20 | < .01 |
| | | -1.36 | .26 | < .01 |
| | | 54.60 | 21.70 | .011 |
| | | .02 | .02 | > .05 |
| | tt( | 21.00 | 7.78 | < .01 |
| | tt( | -4.91 | 2.04 | < .05 |
| Wishart | | 16.61 | 4.82 | < .001 |
| | | 1.33 | .40 | < .001 |
| | | 8.83 | 2.45 | < .001 |
| | | -.03 | .01 | .001 |
| | tt( | -1.03 | .53 | .051 |
| | tt( | -.71 | .01 | .001 |
| Eno | | -140.30 | 25.19 | < .001 |
| | | -20.39 | 4.46 | < .001 |
| | tt( | 12.26 | 2.24 | < .001 |
| | tt( | 1.79 | .39 | < .001 |
| Xenakis | | -204.90 | 39.24 | < .001 |
| | | 6.10 | 1.25 | < .001 |
| | | -19.84 | 2.16 | < .001 |
| | | .51 | .10 | < .001 |
| | | -.93 | .16 | < .001 |
| | | -.12 | .02 | < .001 |
| | | -1.55 | .32 | < .001 |
| | tt( | 18.48 | 3.50 | < .001 |
| | tt( | 1.62 | .18 | < .001 |
| Beethoven | | 1.97 | .68 | < .01 |
| | | 44.10 | 8.60 | < .001 |
| | | 5.61 | 1.66 | < .001 |
| | | 11.43 | 2.61 | < .001 |
| | | -8.12 | 2.81 | < .01 |
| | | -.47 | .22 | < .05 |
| | tt( | -.06 | .02 | < .01 |
Note. Asterisked (*) predictors are individually non-significant but nevertheless required in the selected model. A colon placed between two predictors denotes an interaction between them; p-values are rounded to three decimal places. The time transform used in all these models was x*log(t).
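A time-transformed term tt(x) lets a predictor's effect on the hazard vary over time. Assuming the transform has the form x·log(t), as the note suggests, the combined contribution of a predictor and its transform is (β + β_tt·log(t))·x, so the effective coefficient drifts with log(t). The sketch below only illustrates that algebra; the function name and the coefficient values are hypothetical and are not taken from the table.

```python
import math

def effective_coefficient(beta, beta_tt, t):
    """Effective log-hazard coefficient of x at time t.

    Assumes the time transform is tt(x) = x * log(t), so that
    beta * x + beta_tt * x * log(t) = (beta + beta_tt * log(t)) * x.
    """
    if t <= 0:
        raise ValueError("t must be positive so that log(t) is defined")
    return beta + beta_tt * math.log(t)

# Hypothetical coefficients, NOT values from the table.
beta, beta_tt = 2.0, -0.5
for t in (1.0, 10.0, 100.0):
    coef = effective_coefficient(beta, beta_tt, t)
    print(f"t = {t:6.1f}: effective coefficient = {coef:.3f}")
```

With a negative β_tt, a predictor that raises the hazard early in a piece has a progressively weaker effect later on, which is one way to read opposite-signed parent and tt() coefficients.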
Comparison of R² values between whole-phrase ‘global’ models and ‘terminal portion’ models.
| Stimulus | Whole-Phrase ‘Global’ Models | ‘Terminal Portion’ Models |
|---|---|---|
| BBC SoundFX: | .51 | .52 |
| Ng & Dean: | .42 | .52 |
| Eno: | .31 | .28 |
| Wishart: | .38 | .42 |
| Beethoven: | .30 | .28 |
| Xenakis: | .56 | .47 |
Summary of selected Cox hazard models of perceived phrases using the whole-phrase global approach and additional spectral parameters.
| Stimulus | R² | Likelihood Ratio | Model p-value | Robust Model p-value |
|---|---|---|---|---|
| BBC SoundFX: | .67 | 103.20 | < .001 | .030 |
| Ng & Dean: | .57 | 84.15 | < .001 | .103 |
| Eno: | .50 | 72.12 | < .001 | .013 |
| Wishart: | .60 | 134.00 | < .001 | .032 |
| Beethoven: | .32 | 48.32 | < .001 | .021 |
| Xenakis: | .64 | 130.00 | < .001 | 1.000 |
Note. ‘Model p-value’ refers to the p-value based on the Likelihood Ratio and assumes independence of observations within a cluster (i.e., a group of successive phrase perceptions by an individual listening to a particular piece). The more conservative ‘Robust Model p-values’ do not. The time transform used in all these models was x*log(t).
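To see how much the additional spectral parameters strengthened each model, the first numeric column of this table can be compared with the same column of the earlier whole-phrase summary table. The sketch below assumes that column is R², since its values in the earlier table match the whole-phrase column of the R² comparison table; the variable and function names are introduced for illustration.

```python
# First numeric column of each summary table, taken to be R² (assumption).
r2_base = {
    "BBC SoundFX": 0.51, "Ng & Dean": 0.42, "Eno": 0.31,
    "Wishart": 0.38, "Beethoven": 0.30, "Xenakis": 0.56,
}
r2_extended = {
    "BBC SoundFX": 0.67, "Ng & Dean": 0.57, "Eno": 0.50,
    "Wishart": 0.60, "Beethoven": 0.32, "Xenakis": 0.64,
}

def r2_gains(base, extended):
    """Return the per-stimulus increase in R², rounded to two decimals."""
    return {name: round(extended[name] - base[name], 2) for name in base}

gains = r2_gains(r2_base, r2_extended)
for stimulus, gain in sorted(gains.items(), key=lambda item: item[1], reverse=True):
    print(f"{stimulus}: +{gain:.2f}")
```

Under that reading, every model improves, with the smallest gain for the note-based Beethoven piece and the largest gains for the sound-based pieces.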
Predictors in the selected Cox hazard models of perceived phrases using the global approach and additional spectral parameters.
| Stimulus | Predictor | Coefficient | Robust SE | Coefficient p-value |
|---|---|---|---|---|
| BBC SoundFX | | 4.32 | .62 | < .001 |
| | | -.06 | .01 | < .001 |
| | | -.02 | .01 | < .001 |
| | | 349.70 | 41.88 | < .001 |
| | | 2956.00 | 433.10 | < .001 |
| | | -47.47 | 17.73 | < .01 |
| | | 101.80 | 29.02 | < .001 |
| | | .90 | .11 | < .001 |
| | tt( | -.01 | .01 | < .001 |
| Ng & Dean | | -11.57 | 3.56 | < .01 |
| | | 216.00 | 72.95 | < .01 |
| | | 3.47 | .78 | < .001 |
| | | .01 | .01 | < .01 |
| | | -.01 | .01 | < .001 |
| | | -269.40 | 154.60 | > .05 |
| | | 44.74 | 13.50 | < .001 |
| | | -35.11 | 15.02 | < .05 |
| | | -2.99 | 1.03 | < .01 |
| | tt( | -.01 | .01 | < .01 |
| Wishart | | 72.37 | 18.12 | < .001 |
| | | -.03 | .01 | < .001 |
| | | .02 | .01 | < .001 |
| | | .02 | .01 | < .01 |
| | | 332.00 | 58.50 | < .001 |
| | | 63.69 | 18.58 | < .001 |
| | | -149.50 | 29.11 | < .001 |
| | | 78.96 | 25.11 | < .01 |
| | | -.07 | .02 | < .001 |
| | tt( | -5.52 | 1.40 | < .001 |
| Eno | | -25.05 | 7.62 | < .01 |
| | | -161.40 | 37.13 | < .001 |
| | | .02 | .01 | < .001 |
| | | -24.91 | 8.25 | < .01 |
| | | -1893.00 | 370.20 | < .001 |
| | | -1827.00 | 401.00 | < .001 |
| | tt( | 13.99 | 3.32 | < .001 |
| | tt( | 2.17 | .68 | < .01 |
| Xenakis | | -3.35 | .36 | < .001 |
| | | .21 | .07 | < .01 |
| | | -1.02 | .13 | < .001 |
| | | .02 | .01 | < .001 |
| | | -.03 | .01 | < .001 |
| | | -.16 | .02 | < .001 |
| | | -1.13 | .25 | < .001 |
| | tt(mspecsp) | .01 | .01 | < .001 |
| | tt(mintens) | .01 | .01 | < .001 |
| Beethoven | | 37.89 | 7.23 | < .001 |
| | | 5.28 | 1.23 | < .001 |
| | | 6.92 | 1.92 | < .001 |
| | | -7.33 | 1.57 | < .001 |
| | | -.01 | .01 | < .05 |
| | | -28.15 | 2.09 | < .001 |
| | | -.05 | .02 | < .05 |
| | tt( | -3.21 | .63 | < .001 |
Note. Asterisked (*) predictors are individually non-significant but nevertheless required in the selected model. A colon placed between two predictors denotes an interaction between them; p-values are rounded to three decimal places. The time transform used in all these models was x*log(t).