| Literature DB >> 27895552 |
Beáta T. Szabó, Susan L. Denham, István Winkler.
Abstract
Auditory scene analysis (ASA) refers to the process(es) of parsing the complex acoustic input into auditory perceptual objects representing either physical sources or temporal sound patterns, such as melodies, which contributed to the sound waves reaching the ears. A number of new computational models accounting for some of the perceptual phenomena of ASA have been published recently. Here we provide a theoretically motivated review of these computational models, aiming to relate their guiding principles to the central issues of the theoretical framework of ASA. Specifically, we ask how they achieve the grouping and separation of sound elements and whether they implement some form of competition between alternative interpretations of the sound input. We consider the extent to which they include predictive processes, as important current theories suggest that perception is inherently predictive, and also how they have been evaluated. We conclude that current computational models of ASA are fragmentary in the sense that rather than providing general competing interpretations of ASA, they focus on assessing the utility of specific processes (or algorithms) for finding the causes of the complex acoustic signal. This leaves open the possibility for integrating complementary aspects of the models into a more comprehensive theory of ASA.
Keywords: auditory object representation; auditory scene analysis; auditory streaming; bi-/multi-stable perception; computational model; predictive processing
Year: 2016 | PMID: 27895552 | PMCID: PMC5108797 | DOI: 10.3389/fnins.2016.00524
Source DB: PubMed | Journal: Front Neurosci | ISSN: 1662-453X | Impact factor: 4.677
Figure 1. Schematic depiction of the auditory streaming paradigm (A) and its possible perceptual interpretations grouped into 3 categories (B–D). Rectangles depict the “A” and “B” tones. Sounds perceived as part of the same stream are connected by lines in the lower panels (B–D). Darker notes with gray background indicate the stream in the foreground (also described with symbols to the right of each of the lower panels; B–D). Reprinted with permission from Farkas et al. (2016b).
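The auditory streaming paradigm in Figure 1 typically presents repeating ABA- triplets (a silent slot follows the second A tone). A minimal sketch of how such a sequence can be generated is shown below; all parameter values (frequencies, durations, number of triplets) are illustrative assumptions, not taken from the reviewed studies.

```python
import numpy as np

def aba_sequence(f_a=440.0, df_semitones=7, tone_dur=0.075,
                 gap_dur=0.025, n_triplets=10, sr=44100):
    """Generate an ABA- triplet sequence as used in auditory streaming
    experiments (cf. Fig. 1A). Parameter values are illustrative only."""
    f_b = f_a * 2 ** (df_semitones / 12.0)   # B tone df semitones above A
    t = np.arange(int(tone_dur * sr)) / sr
    tone = lambda f: np.sin(2 * np.pi * f * t)
    gap = np.zeros(int(gap_dur * sr))
    silence = np.zeros_like(tone(f_a))       # the "-" slot of the triplet
    triplet = np.concatenate([tone(f_a), gap, tone(f_b), gap,
                              tone(f_a), gap, silence, gap])
    return np.tile(triplet, n_triplets)

seq = aba_sequence()
```

Varying the frequency separation (`df_semitones`) and the tone/gap durations shifts the balance between the integrated (one-stream) and segregated (two-stream) percepts sketched in panels B–D.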
Figure 2. Schematic illustration of a stimulus including a “figure” component. Black dots depict random tonal elements while red ones represent repeating ones. Chord onsets are represented as vertical lines. The x axis shows both time and the serial position within the stimulus. The y axis provides a qualitative representation of frequency. Figure duration (the number of repeated tone complexes), figure coherence (the number of tonal components comprising the repeated tone complex), and the range within which the figure can appear are marked.
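The stimulus in Figure 2 is a sequence of random-frequency chords in which a “figure” — a fixed set of components — repeats across several consecutive chords. A minimal sketch of this construction follows; the function name, parameter names, and default values (pool size, coherence, figure duration/onset) are hypothetical choices for illustration.

```python
import random

def sfg_stimulus(n_chords=20, n_random=8, coherence=4,
                 fig_onset=6, fig_dur=7, freq_pool=None):
    """Build a figure-ground stimulus (cf. Fig. 2): each chord holds
    random frequency components; during the figure, `coherence`
    components repeat across `fig_dur` consecutive chords.
    All parameter values are illustrative."""
    if freq_pool is None:
        # Hypothetical pool: 40 semitone-spaced frequencies from 179 Hz
        freq_pool = [round(179 * 2 ** (k / 12), 1) for k in range(40)]
    figure = random.sample(freq_pool, coherence)   # the repeated components
    chords = []
    for i in range(n_chords):
        chord = set(random.sample(freq_pool, n_random))  # random background
        if fig_onset <= i < fig_onset + fig_dur:
            chord |= set(figure)                   # add the figure components
        chords.append(sorted(chord))
    return chords, figure

chords, figure = sfg_stimulus()
```

Because only the figure components repeat from chord to chord, grouping them into a separate auditory object requires integrating across both frequency (coherence) and time (duration), which is what makes this paradigm a useful test case for the models reviewed.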
The models reviewed together with their categorizations with respect to the main issues discussed in the review.
| Model | Competition | Prediction | Model class | Number of streams |
|---|---|---|---|---|
| Barniv and Nelken | yes | yes | Bayesian | unlimited |
| Nix and Hohmann | yes | yes | Bayesian | 2 |
| Wang and Chang | yes | no | Neural | 2 |
| Pichevar and Rouat | yes | no | Neural | 2 |
| Mill et al. | yes | yes | Neural | unlimited |
| Rankin et al. | yes | no | Neural | 3 |
| Krishnan et al. | no | no | Temporal Coherence | 2 |
| Ma | no | no | Temporal Coherence | 2 |
| Elhilali and Shamma | no | yes | Temporal Coherence | 2 |