
Time-frequency scattering accurately models auditory similarities between instrumental playing techniques.

Vincent Lostanlen1, Christian El-Hajj1, Mathias Rossignol2, Grégoire Lafay2, Joakim Andén3,4, Mathieu Lagrange1.   

Abstract

Instrumental playing techniques such as vibratos, glissandos, and trills often denote musical expressivity, both in classical and folk contexts. However, most existing approaches to music similarity retrieval fail to describe timbre beyond the so-called "ordinary" technique, use instrument identity as a proxy for timbre quality, and do not allow for customization to the perceptual idiosyncrasies of a new subject. In this article, we ask 31 human participants to organize 78 isolated notes into a set of timbre clusters. Analyzing their responses suggests that timbre perception operates within a more flexible taxonomy than those provided by instruments or playing techniques alone. In addition, we propose a machine listening model to recover the cluster graph of auditory similarities across instruments, mutes, and techniques. Our model relies on joint time-frequency scattering features to extract spectrotemporal modulations as acoustic features. Furthermore, it minimizes a triplet loss in the cluster graph by means of the large-margin nearest neighbor (LMNN) metric learning algorithm. Over a dataset of 9346 isolated notes, we report a state-of-the-art average precision at rank five (AP@5) of 99.0% ± 1. An ablation study demonstrates that removing either the joint time-frequency scattering transform or the metric learning algorithm noticeably degrades performance.
© The Author(s) 2021.
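The abstract's headline figure is average precision at rank five (AP@5): for each query note, retrieve its five nearest neighbors in the learned feature space and count how many share the query's cluster. The sketch below illustrates one common reading of this metric with plain Euclidean distances over toy features; the paper's exact definition, distance (post-LMNN), and feature pipeline may differ, and the function name is ours.

```python
import numpy as np

def average_precision_at_5(features, labels):
    """Mean precision among the 5 nearest neighbors of each query.

    A simplified stand-in for the paper's AP@5 metric: for every note,
    retrieve its 5 nearest neighbors (excluding itself) by Euclidean
    distance and report the fraction that share the query's cluster
    label, averaged over all queries.
    """
    X = np.asarray(features, dtype=float)
    y = np.asarray(labels)
    # Pairwise squared Euclidean distances between all feature vectors.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)  # never retrieve the query itself
    top5 = np.argsort(d2, axis=1)[:, :5]
    hits = (y[top5] == y[:, None]).mean(axis=1)  # per-query precision@5
    return hits.mean()

# Toy example: two tight, well-separated clusters of 6 points each,
# so every query's 5 nearest neighbors come from its own cluster.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.1, (6, 2)), rng.normal(5, 0.1, (6, 2))])
y = np.array([0] * 6 + [1] * 6)
print(average_precision_at_5(X, y))  # 1.0 for these separated clusters
```

In the paper's pipeline, the Euclidean metric here would be replaced by the Mahalanobis-style distance learned by LMNN over joint time-frequency scattering coefficients.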


Keywords:  Audio databases; Audio similarity; Continuous wavelet transform; Demodulation; Distance learning; Human–computer interaction; Music information retrieval

Year:  2021        PMID: 33488686      PMCID: PMC7801324          DOI: 10.1186/s13636-020-00187-z

Source DB:  PubMed          Journal:  EURASIP J Audio Speech Music Process        ISSN: 1687-4714


References (21 in total)

1.  Spectro-temporal response field characterization with dynamic ripples in ferret primary auditory cortex.

Authors:  D A Depireux; J Z Simon; D J Klein; S A Shamma
Journal:  J Neurophysiol       Date:  2001-03       Impact factor: 2.714

2.  Computer identification of musical instruments using pattern recognition with cepstral coefficients as features.

Authors:  J C Brown
Journal:  J Acoust Soc Am       Date:  1999-03       Impact factor: 1.840

3.  Robust spectrotemporal reverse correlation for the auditory system: optimizing stimulus design.

Authors:  D J Klein; D A Depireux; J Z Simon; S A Shamma
Journal:  J Comput Neurosci       Date:  2000 Jul-Aug       Impact factor: 1.621

4.  The dependency of timbre on fundamental frequency.

Authors:  Jeremy Marozeau; Alain de Cheveigné; Stephen McAdams; Suzanne Winsberg
Journal:  J Acoust Soc Am       Date:  2003-11       Impact factor: 1.840

5.  Multiresolution spectrotemporal analysis of complex sounds.

Authors:  Taishih Chi; Powen Ru; Shihab A Shamma
Journal:  J Acoust Soc Am       Date:  2005-08       Impact factor: 1.840

6.  Separable spectro-temporal Gabor filter bank features: Reducing the complexity of robust features for automatic speech recognition.

Authors:  Marc René Schädler; Birger Kollmeier
Journal:  J Acoust Soc Am       Date:  2015-04       Impact factor: 1.840

7.  Modeling the onset advantage in musical instrument recognition.

Authors:  Kai Siedenburg; Marc René Schädler; David Hülsmeier
Journal:  J Acoust Soc Am       Date:  2019-12       Impact factor: 1.840

8.  Perceptual scaling of synthesized musical timbres: common dimensions, specificities, and latent subject classes.

Authors:  S McAdams; S Winsberg; S Donnadieu; G De Soete; J Krimphoff
Journal:  Psychol Res       Date:  1995

9.  The spectro-temporal receptive field. A functional characteristic of auditory neurons.

Authors:  A M Aertsen; P I Johannesma
Journal:  Biol Cybern       Date:  1981       Impact factor: 2.086

10.  A multiresolution analysis for detection of abnormal lung sounds.

Authors:  Dimitra Emmanouilidou; Kailash Patil; James West; Mounya Elhilali
Journal:  Conf Proc IEEE Eng Med Biol Soc       Date:  2012
