Vincent Isnard, Véronique Chastres, Isabelle Viaud-Delmon, Clara Suied.
Abstract
Human listeners can accurately recognize an impressive range of complex sounds, such as musical instruments or voices. The underlying mechanisms are still poorly understood. Here, we aimed to characterize the processing time needed to recognize a natural sound. To do so, by analogy with the "rapid visual sequential presentation paradigm", we embedded short target sounds within rapid sequences of distractor sounds. The core hypothesis is that any correct report of the target implies that sufficient processing for recognition had been completed before the onset of the subsequent distractor sound. We conducted four behavioral experiments using short natural sounds (voices and instruments) as targets or distractors. We report the effects on performance, as measured by the fastest presentation rate for recognition, of sound duration, number of sounds in a sequence, the relative pitch between target and distractors, and target position in the sequence. Results showed very rapid auditory recognition of natural sounds in all cases. Targets could be recognized at rates up to 30 sounds per second. In addition, the best performance was observed for voices in sequences of instruments. These results give new insights into the remarkable efficiency of timbre processing in humans, using an original behavioral paradigm to provide strong constraints on future neural models of sound recognition.
Year: 2019 PMID: 31142750 PMCID: PMC6541711 DOI: 10.1038/s41598-019-43126-5
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Tested conditions in the main experiment with sequences of short sounds presented rapidly.
| Conditions (by RASP experiment) | ‘very short sounds’ | ‘number of sounds’ | ‘pitch’ | ‘RT’ |
|---|---|---|---|---|
| Sound duration | 16 ms | 32 ms | 32 ms | 32 ms |
| Sequence type | Fixed duration | Fixed duration vs. fixed number of sounds | Fixed number of sounds | Fixed number of sounds |
| Pitch | Randomized | Randomized | Randomized vs. fixed | Randomized |
| Presentation rate | 5.3, 7.5, 10.6, 15, 21.2, 30 and 60 Hz | 5.3, 7.5, 10.6, 15, 21.2 and 30 Hz | 5.3, 7.5, 10.6, 15, 21.2 and 30 Hz | 5.3, 7.5, 10.6, 15, 21.2 and 30 Hz |
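The presentation rates in the table set the time available between successive sound onsets. The sketch below converts each rate to an onset interval and compares it with the sound duration. It assumes the onset interval is simply the reciprocal of the rate, which the table does not state explicitly; the function names are hypothetical and chosen for illustration only.

```python
# Presentation rates listed in the table (Hz). The 60 Hz rate appears
# only for the 16-ms 'very short sounds' experiment.
RATES_HZ = [5.3, 7.5, 10.6, 15, 21.2, 30, 60]


def onset_interval_ms(rate_hz):
    """Time between successive sound onsets, assuming interval = 1 / rate."""
    return 1000.0 / rate_hz


def silent_gap_ms(rate_hz, sound_ms):
    """Time left between the end of one sound and the onset of the next.

    A negative value would mean the sounds overlap under this assumption.
    """
    return onset_interval_ms(rate_hz) - sound_ms


if __name__ == "__main__":
    for sound_ms in (16, 32):
        for rate in RATES_HZ:
            gap = silent_gap_ms(rate, sound_ms)
            print(f"{sound_ms} ms sound at {rate} Hz: "
                  f"interval {onset_interval_ms(rate):.1f} ms, gap {gap:.1f} ms")
```

Under this assumption, 32-ms sounds at 30 Hz and 16-ms sounds at 60 Hz leave almost no silence between sounds, whereas 32-ms sounds would not fit within a 60 Hz interval. This is consistent with the table listing 60 Hz only for the 16-ms condition.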
Figure 1. Recognition of individual short sounds (participant selection experiment). Error bars represent the standard errors of the means (too small to be visible in the graph). Performance, as measured by d-prime, increased with sound duration for both the selected group (black curve) and the excluded group of participants (grey curve).
Figure 2. RASP performance: recognition of a short target in a sequence of short distractors presented rapidly. Mean d-prime scores are plotted for each experimental condition as a function of presentation rate. The error bars represent the standard errors of the means. Results from the control experiment, in which short sounds were presented in isolation, are represented by a diamond on the left of the curves. In all panels, performance decreased linearly (on a log scale) as the presentation rate increased, and voices were better recognized within instruments than the reverse. Panel a: performance for sequences composed of 16-ms sounds. Panels b and c: performance for sequences of 32-ms sounds. In panel b, each line is the average of the ‘fixed duration’ and ‘fixed number of sounds’ conditions; in panel c, each line is the average of the ‘random pitch’ and ‘control pitch’ conditions. Within each pair, the two conditions did not differ at any presentation rate.
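The d-prime scores reported in Figures 1 and 2 are a standard signal-detection measure of sensitivity: the z-transformed hit rate minus the z-transformed false-alarm rate. The sketch below shows one common way to compute it from response counts. The log-linear correction for extreme rates is an assumption made here for illustration, since the record does not say which correction, if any, the authors applied; the function name and the counts in the usage example are hypothetical.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """Sensitivity index d' = z(hit rate) - z(false-alarm rate).

    Assumption: a log-linear correction (add 0.5 to each count, 1 to each
    total) keeps the rates away from 0 and 1, where z would be infinite.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)


if __name__ == "__main__":
    # Hypothetical counts, not data from the study.
    print(d_prime(hits=18, misses=2, false_alarms=4, correct_rejections=16))
```

A d-prime of zero means targets and non-targets were not discriminated at all; larger positive values mean better recognition.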
Figure 3. Recognition of a target presented in isolation or in a RASP sequence, evaluated with RTs: faster RTs for voices. Mean RTs (in ms) are shown for both instrument and voice targets. The results for isolated targets in the gating experiment are represented as single points on the left of the curves. The error bars represent the standard errors of the means.