Lindsey Reymore, Emmanuelle Beauvais-Lacasse, Bennett K Smith, Stephen McAdams.
Abstract
Audio features such as inharmonicity, noisiness, and spectral roll-off have been identified as correlates of "noisy" sounds. However, such features are likely involved in the experience of multiple semantic timbre categories of varied meaning and valence. This paper examines the relationships of stimulus properties and audio features with the semantic timbre categories raspy/grainy/rough, harsh/noisy, and airy/breathy. Participants (n = 153) rated a random subset of 52 stimuli from a set of 156 approximately 2-s orchestral instrument sounds representing varied instrument families (woodwinds, brass, strings, percussion), registers (octaves 2 through 6, where middle C is in octave 4), and both traditional and extended playing techniques (e.g., flutter-tonguing, bowing at the bridge). Stimuli were rated on the three semantic categories of interest, as well as on perceived playing exertion and emotional valence. Correlational analyses demonstrated a strong negative relationship between positive valence and perceived physical exertion. Exploratory linear mixed models revealed significant effects of extended technique and pitch register on valence, the perception of physical exertion, raspy/grainy/rough, and harsh/noisy. Instrument family was significantly related to ratings of airy/breathy. With an updated version of the Timbre Toolbox (R-2021 A), we used 44 summary audio features, extracted from the stimuli using spectral and harmonic representations, as input for various models built to predict mean semantic ratings for each sound on the three semantic categories, on perceived exertion, and on valence. Random Forest models predicting semantic ratings from audio features outperformed Partial Least-Squares Regression models, consistent with previous results suggesting that non-linear methods are advantageous in timbre semantic predictions using audio features. 
Relative Variable Importance measures from the models among the three semantic categories demonstrate that although these related semantic categories are associated in part with overlapping features, they can be differentiated through individual patterns of audio feature relationships.
Keywords: audio features; extended technique; musical instrument timbre; register; timbre; timbre analysis; timbre and noise perception; timbre semantics
Year: 2022 PMID: 35432090 PMCID: PMC9010607 DOI: 10.3389/fpsyg.2022.796422
Source DB: PubMed Journal: Front Psychol ISSN: 1664-1078
Pearson’s correlation coefficients among ratings of perceived valence, playing exertion, airy/breathy, raspy/grainy/rough, and harsh/noisy.
| | Airy/breathy | Raspy/grainy/rough | Harsh/noisy | Valence |
| Raspy/grainy/rough | –0.09 | | | |
| Harsh/noisy | –0.54 | 0.53 | | |
| Valence | 0.31 | –0.90 | –0.61 | |
| Exertion | –0.04 | 0.50 | 0.46 | –0.48 |
df = 154, Holm-corrected, ***p < 0.001.
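The degrees of freedom reported here (df = n − 2 = 154) are consistent with correlations computed across the 156 stimuli. As a minimal sketch of the two ingredients named in the footnote, the code below computes a Pearson coefficient and applies the Holm step-down correction to a list of p-values. The ratings and raw p-values are invented for illustration, and `pearson_r` and `holm_adjust` are hypothetical helper names, not the authors' code.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def holm_adjust(p_values):
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity: an adjusted value never drops below an earlier one.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted

# Invented mean ratings for five hypothetical stimuli.
valence = [1.0, 2.0, 3.0, 4.0, 5.0]
harsh_noisy = [5.0, 4.0, 3.0, 2.0, 1.0]
r = pearson_r(valence, harsh_noisy)

# Degrees of freedom for a correlation over 156 stimuli.
df = 156 - 2

# Invented raw p-values for three hypothetical tests.
adjusted_p = holm_adjust([0.01, 0.04, 0.03])
```

With these toy inputs the two rating vectors are perfectly inversely related, and the Holm procedure raises the two larger p-values to a common adjusted value.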
Effects of the factors register (R), extended technique (ET), and instrument family (F) on each semantic category, as well as on valence and exertion.
| | Airy/breathy | | Raspy/grainy/rough | | Harsh/noisy | |
| | χ² | p | χ² | p | χ² | p |
| Intercept | 1257.45 | <0.001 | 1826.71 | <0.001 | 2214.71 | <0.001 |
| R | 3.49 | 0.48 | 48.08 | <0.001 | 14.81 | 0.005 |
| ET | 0.72 | 0.40 | 137.00 | <0.001 | 37.61 | <0.001 |
| F | 22.95 | <0.001 | 1.85 | 0.39 | 1.92 | 0.38 |
| | χ² | p | χ² | p | | |
| Intercept | 3421.90 | <0.001 | 3721.21 | <0.001 | | |
| R | 22.99 | <0.001 | 27.95 | <0.001 | | |
| ET | 78.32 | <0.001 | 13.27 | <0.001 | | |
| F | 4.34 | 0.11 | 3.58 | 0.17 | | |
**p < 0.01; ***p < 0.001.
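Several of the tabulated p-values can be approximately reproduced from the test statistics. The sketch below assumes the statistics are chi-square values with 4, 1, and 2 degrees of freedom for R, ET, and F respectively; these degrees of freedom are an assumption made for illustration and are not stated in this excerpt. `chi2_sf` is a hypothetical helper that uses closed-form upper-tail probabilities, which exist for these particular degrees of freedom.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable.

    Only df = 1, 2, and 4 are handled, using their closed forms.
    """
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    if df == 4:
        return math.exp(-x / 2.0) * (1.0 + x / 2.0)
    raise ValueError("this sketch only handles df = 1, 2, or 4")

# Statistics taken from the airy/breathy and harsh/noisy columns of the table;
# the degrees of freedom are ASSUMED, not taken from the source.
p_register_airy = chi2_sf(3.49, df=4)     # table reports 0.48
p_technique_airy = chi2_sf(0.72, df=1)    # table reports 0.40
p_family_harsh = chi2_sf(1.92, df=2)      # table reports 0.38
p_register_harsh = chi2_sf(14.81, df=4)   # table reports 0.005
```

Small discrepancies in the last digit are to be expected, because the tabulated statistics are themselves rounded.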
FIGURE 1. Estimated marginal means for the absence/presence of an extended technique in models of exertion, valence, raspy/grainy/rough, and harsh/noisy; vertical bars represent 95% confidence intervals.
FIGURE 2. Estimated marginal means for register (octave) in models of exertion, valence, raspy/grainy/rough, and harsh/noisy; vertical bars represent 95% confidence intervals.
FIGURE 3. Estimated marginal means for instrument family in the airy/breathy model; vertical bars represent 95% confidence intervals.
Audio features extracted from Timbre Toolbox.
| Representation | Feature | Description |
| STFT | Spectral centroid | Center of gravity of the spectrum |
| STFT | Spectral spread | Standard deviation of the spectrum around the mean |
| STFT | Spectral skewness | Asymmetry of the spectrum around the mean |
| STFT | Spectral kurtosis | Flatness of the spectrum around the mean |
| STFT | Spectral flatness | Ratio of the geometric and arithmetic means of the spectrum |
| STFT | Spectral crest | Ratio of the spectral maximum to the arithmetic spectral mean |
| STFT | Spectral slope | Linear regression over the spectral amplitude values |
| STFT | Spectral decrease | Average of slopes between F0 and 2nd to |
| STFT | Spectral roll-off | Frequency below which 95% of the signal energy is contained |
| STFT | Spectral variation | A measure of variability of the spectrum over time: correlation between spectra in successive time frames |
| STFT | Spectral flux | A measure of variability of the spectrum over time: Euclidean distance between spectra in successive time frames |
| HARM | F0 | Fundamental frequency of a periodic sound |
| HARM | Harmonic spectral deviation | Deviation of the amplitudes of the partials from a smoothed spectral envelope |
| HARM | Tristimulus 1 | Ratio of energy of the 1st harmonic to total energy |
| HARM | Tristimulus 2 | Ratio of energy of the 2nd, 3rd, and 4th harmonics to total energy |
| HARM | Tristimulus 3 | Ratio of energy of remaining harmonics (above 4th) to total energy |
| HARM | Harmonic odd-to-even ratio | Ratio of energy of odd harmonics to even harmonics |
| HARM | Inharmonicity | Degree to which frequencies of overtones depart from multiples of the fundamental frequency |
| HARM | Harmonic energy | Energy of the signal explained by stable partials |
| HARM | Noise energy | Energy of the signal not explained by stable partials |
| HARM | Noisiness | Ratio of noise energy to total energy |
| HARM | Harmonic-to-noise ratio | Ratio between periodic and non-periodic components of a signal |
| TEE | Attack time | Duration of the attack portion of the sound |
| TEE | Log attack time | Logarithm of the duration of the attack portion of the sound |
| TEE | Attack slope | Rate of change of energy over time in the attack portion |
| TEE | Decrease slope | Measure of the rate of decrease of the signal energy |
| TEE | Temporal centroid | Center of gravity of the energy envelope |
| TEE | Effective duration | Time during which energy envelope is above 40% (intended to reflect perceived duration) |
| TEE | Frequency of energy modulation | Frequency of the modulation of energy over the sustained portion of the sound as represented using a sinusoidal component |
| TEE | Amplitude of energy modulation | Amplitude of the modulation of energy over the sustained portion of the sound as represented using a sinusoidal component |
STFT, short-time Fourier transform; HARM, harmonic; TEE, temporal energy envelope. For further detail on how features are computed, see
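To make a few of the verbal descriptions in the table concrete, the following sketch computes spectral centroid, spread, flatness, crest, and roll-off for a single magnitude-spectrum frame, plus the three tristimulus values from a list of harmonic energies. It follows the table's wording only; the Timbre Toolbox's actual scaling, windowing, and normalization choices may differ, and the function names and toy spectra are hypothetical.

```python
import math

def spectral_descriptors(freqs, mags, rolloff_fraction=0.95):
    """Descriptors of one magnitude-spectrum frame (all magnitudes must be > 0)."""
    total = sum(mags)
    # Treat the normalized magnitudes as weights over frequency.
    weights = [m / total for m in mags]
    centroid = sum(f * w for f, w in zip(freqs, weights))
    spread = math.sqrt(sum(w * (f - centroid) ** 2 for f, w in zip(freqs, weights)))
    arithmetic_mean = total / len(mags)
    geometric_mean = math.exp(sum(math.log(m) for m in mags) / len(mags))
    # Roll-off: lowest bin frequency at which cumulative energy reaches the fraction.
    energies = [m * m for m in mags]
    threshold = rolloff_fraction * sum(energies)
    cumulative = 0.0
    rolloff = freqs[-1]
    for f, e in zip(freqs, energies):
        cumulative += e
        if cumulative >= threshold:
            rolloff = f
            break
    return {
        "centroid": centroid,
        "spread": spread,
        "flatness": geometric_mean / arithmetic_mean,
        "crest": max(mags) / arithmetic_mean,
        "rolloff": rolloff,
    }

def tristimulus(harmonic_energies):
    """Energy share of harmonic 1, harmonics 2 to 4, and harmonics 5 and above."""
    total = sum(harmonic_energies)
    t1 = harmonic_energies[0] / total
    t2 = sum(harmonic_energies[1:4]) / total
    t3 = sum(harmonic_energies[4:]) / total
    return t1, t2, t3

bins = [100.0, 200.0, 300.0, 400.0]                        # toy bin frequencies in Hz
flat = spectral_descriptors(bins, [1.0, 1.0, 1.0, 1.0])    # noise-like, flat spectrum
peaked = spectral_descriptors(bins, [8.0, 1.0, 1.0, 1.0])  # energy concentrated low
t1, t2, t3 = tristimulus([4.0, 2.0, 1.0, 1.0, 1.0, 1.0])
```

The flat toy spectrum has a flatness of 1 and a roll-off at the highest bin, whereas the peaked spectrum has lower flatness, a higher crest, a lower centroid, and a roll-off at the lowest bin.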
FIGURE 4. Dendrogram of hierarchical clustering of feature values across the stimulus set. Colors indicate groups of features organized by a five-cluster solution.
Summary of models predicting mean ratings from audio features.
| Rating | Model type | R² | Average R² | Average RMSE |
| | RF | 0.82 | 0.78 | 0.47 |
| | PLSR | 0.64 | 0.56 | 0.70 |
| | RF | 0.56 | 0.54 | 0.69 |
| | PLSR | 0.43 | 0.28 | 0.93 |
| | RF | 0.45 | 0.43 | 0.78 |
| | PLSR | 0.36 | 0.29 | 0.89 |
| Exertion | RF | 0.34 | 0.32 | 0.84 |
| | PLSR | 0.14 | 0.09 | 1.08 |
| Valence | RF | 0.68 | 0.67 | 0.59 |
| | PLSR | 0.56 | 0.52 | 0.72 |
R
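The "Average RMSE" column suggests that error was averaged over repeated model fits. As a hedged illustration, the sketch below computes the root-mean-square error and the coefficient of determination (R²) for individual sets of predictions and then averages them. Averaging over cross-validation folds is an assumption here, the observed and predicted ratings are invented, and the function names are hypothetical.

```python
import math

def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

def rmse(observed, predicted):
    """Root-mean-square error between observed and predicted values."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def average_over_folds(folds):
    """Mean R² and mean RMSE over a list of (observed, predicted) pairs."""
    r2_values = [r_squared(obs, pred) for obs, pred in folds]
    rmse_values = [rmse(obs, pred) for obs, pred in folds]
    return sum(r2_values) / len(folds), sum(rmse_values) / len(folds)

# Two invented held-out folds of mean ratings and model predictions.
folds = [
    ([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 3.0, 3.5]),
    ([2.0, 4.0, 6.0], [2.0, 5.0, 6.0]),
]
mean_r2, mean_rmse = average_over_folds(folds)
```

A model with higher R² and lower RMSE on held-out data fits better, which is the pattern the table shows for RF relative to PLSR on every rating.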
Top 10 important variables and their respective relative variable importance values for each semantic category using partial least-squares regression.
| | | | | | | Valence | | Exertion | |
| Feature | RVI | Feature | RVI | Feature | RVI | Feature | RVI | Feature | RVI |
| Harm Spec Dev IQR | 100 | Spectral Decrease Med | 100 | HNR Med | 100 | HNR Med | 100 | Tristimulus 3 Med | 100 |
| Harm Spec Dev Med | 72.27 | Spectral Centroid Med | 64.79 | Noisiness Med | 92.05 | Noisiness Med | 98.86 | Spectral Centroid IQR | 93.03 |
| Spectral Roll-Off Med | 68.76 | F0 Med | 54.92 | Spectral Variation IQR | 74.63 | Inharmonicity Med | 82.78 | Spectral Roll-Off IQR | 84.34 |
| Spectral Spread Med | 64.50 | Spectral Spread Med | 50.88 | Harmonic Energy Med | 73.24 | Tristimulus 3 Med | 82.68 | Tristimulus 1 Med | 77.77 |
| Spectral Centroid Med | 62.17 | Spectral Roll-Off Med | 50.03 | Inharmonicity Med | 73.17 | Spectral Crest Med | 77.59 | Inharmonicity Med | 74.58 |
| Spectral Flux IQR | 58.61 | Spectral Variation IQR | 46.63 | F0 Med | 72.26 | F0 Med | 75.73 | Harmonic Energy Med | 70.78 |
| Spectral Slope IQR | 58.24 | Harm Spec Dev IQR | 40.62 | Spectral Slope Med | 70.63 | Harm Spec Dev Med | 73.49 | Spectral Spread IQR | 69.89 |
| Tristimulus 1 Med | 56.17 | Harm Spec Dev Med | 39.97 | Spectral Crest Med | 66.84 | Harm Spec Dev IQR | 74.19 | Spectral Slope Med | 68.82 |
| Spectral Flux Med | 52.86 | HNR Med | 31.52 | Tristimulus 3 Med | 63.57 | Spectral Variation IQR | 73.00 | Harm Spec Dev Med | 61.42 |
| Spectral Skewness Med | 48.44 | Spectral Decrease IQR | 31.36 | Spectral Variation Med | 59.95 | Harmonic Energy Med | 68.10 | Spectral Crest Med | 58.98 |
Top 10 important variables and their respective relative variable importance values for each semantic category using random forest regression.
| | | | | | | Valence | | Exertion | |
| Feature | RVI | Feature | RVI | Feature | RVI | Feature | RVI | Feature | RVI |
| Odd:Even Ratio Med | 100 | Spectral Decrease Med | 100 | HNR Med | 100 | Inharmonicity IQR | 100 | Noisiness IQR | 100 |
| Odd:Even Ratio IQR | 72.34 | Spectral Spread Med | 87.68 | Inharmonicity IQR | 62.11 | HNR Med | 55.98 | F0 | 83.62 |
| Harm Spec Dev IQR | 71.73 | Spectral Roll-Off Med | 72.04 | Spectral Variation Med | 54.34 | Tristimulus 3 Med | 19.59 | HNR IQR | 77.18 |
| Spectral Roll-Off Med | 49.60 | Spectral Centroid Med | 71.21 | Noisiness Med | 40.84 | Odd:Even Ratio Med | 14.55 | Noise Energy Med | 72.15 |
| Spectral Flux IQR | 40.98 | Spectral Spread IQR | 62.49 | Spectral Variation IQR | 39.43 | Noisiness Med | 13.73 | Tristimulus 1 Med | 69.95 |
| Spectral Spread Med | 30.97 | Spectral Variation IQR | 37.58 | Tristimulus 3 Med | 21.00 | Spectral Variation Med | 8.77 | Tristimulus 3 Med | 64.35 |
| Spectral Centroid Med | 29.13 | Spectral Flatness IQR | 31.09 | Inharmonicity Med | 18.98 | Inharmonicity Med | 6.63 | Harmonic Energy Med | 62.72 |
| Spectral Variation IQR | 27.13 | Spectral Variation Med | 26.13 | F0 Med | 18.81 | Harm Spec Dev IQR | 5.97 | Inharmonicity Med | 60.57 |
| Spectral Skewness IQR | 19.02 | Spectral Flatness Med | 24.44 | Harmonic Energy Med | 5.11 | F0 Med | 5.30 | Spectral Variation Med | 55.64 |
| Tristimulus 1 IQR | 16.67 | Noisiness IQR | 24.24 | Tristimulus 1 Med | 3.13 | Spectral Variation IQR | 4.12 | Spectral Slope IQR | 41.25 |
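In both importance tables the top-ranked feature in each column has a value of 100, which is what results when raw importance scores are rescaled relative to the largest one. The sketch below shows one simple way to do this, dividing by the maximum and multiplying by 100. Whether the authors' software rescales in exactly this way (rather than, for example, min-max scaling) is an assumption, the raw scores are invented, and `relative_importance` is a hypothetical helper.

```python
def relative_importance(raw_scores, top_n=10):
    """Rescale raw importances so the largest equals 100 and return the top_n."""
    largest = max(raw_scores.values())
    scaled = {name: 100.0 * value / largest for name, value in raw_scores.items()}
    # Sort features from most to least important.
    ranked = sorted(scaled.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_n]

# Invented raw importance scores for three features.
raw = {
    "Noisiness Med": 0.125,
    "HNR Med": 0.5,
    "Inharmonicity IQR": 0.25,
}
ranking = relative_importance(raw)
```

Because the values are relative within a model, they can be compared across features in one column but not directly across columns.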
FIGURE 5. Radar plots of 14 most important features across the three semantic categories in the random forest models. Radius represents relative variable importance values.