James W Lewis, William J Talkington, Katherine C Tallaksen, Chris A Frum.
Abstract
Whether viewed or heard, an object in action can be segmented as a distinct salient event based on a number of different sensory cues. In the visual system, several low-level attributes of an image are processed along parallel hierarchies, involving intermediate stages wherein gross-level object form and/or motion features are extracted prior to stages that show greater specificity for different object categories (e.g., people, buildings, or tools). In the auditory system, though relying on a rather different set of low-level signal attributes, meaningful real-world acoustic events and "auditory objects" can also be readily distinguished from background scenes. However, the nature of the acoustic signal attributes or gross-level perceptual features that may be explicitly processed along intermediate cortical processing stages remains poorly understood. Examining mechanical and environmental action sounds, representing two distinct non-biological categories of action sources, we had participants assess the degree to which each sound was perceived as object-like versus scene-like. We re-analyzed data from two of our earlier functional magnetic resonance imaging (fMRI) task paradigms (Engel et al., 2009) and found that scene-like action sounds preferentially led to activation along several midline cortical structures, but with strong dependence on listening task demands. In contrast, bilateral foci along the superior temporal gyri (STG) showed parametrically increasing activation to action sounds rated as more "object-like," independent of sound category or task demands. Moreover, these STG regions also showed parametric sensitivity to spectral structure variations (SSVs) of the action sounds (a quantitative measure of change in entropy of the acoustic signals over time), and the right STG additionally showed parametric sensitivity to measures of mean entropy and harmonic content of the environmental sounds.
Analogous to the visual system, intermediate stages of the auditory system appear to process or extract a number of quantifiable low-order signal attributes that are characteristic of action events perceived as being object-like, representing stages that may begin to dissociate different perceptual dimensions and categories of everyday, real-world action sounds.
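The abstract describes mean entropy and SSV only in words, so the following is a minimal sketch of how such measures could be computed, not the authors' pipeline. It assumes that entropy is the Shannon entropy of each short-time frame's normalized power spectrum and that SSV can be approximated as the variance of that entropy across frames; the function names, frame length, hop size, and test signals are all hypothetical choices made for illustration.

```python
import numpy as np

def spectral_entropy_per_frame(signal, frame_len=1024, hop=512):
    """Shannon entropy (in bits) of the normalized power spectrum of each frame."""
    signal = np.asarray(signal, dtype=float)
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one analysis frame")
    window = np.hanning(frame_len)
    entropies = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len] * window
        power = np.abs(np.fft.rfft(frame)) ** 2
        total = power.sum()
        if total == 0.0:
            continue  # silent frame: entropy is undefined, so skip it
        p = power / total          # treat the spectrum as a probability distribution
        p = p[p > 0]               # drop empty bins (0 * log 0 is taken as 0)
        entropies.append(-np.sum(p * np.log2(p)))
    return np.array(entropies)

def mean_entropy_and_ssv(signal, frame_len=1024, hop=512):
    """Return (mean entropy, SSV), with SSV assumed here to be the
    variance of the per-frame spectral entropy over time."""
    h = spectral_entropy_per_frame(signal, frame_len, hop)
    return float(h.mean()), float(h.var())

# Synthetic test signals (illustrative only, not stimuli from the study).
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440 * t)                    # spectrally sparse, steady
noise = np.random.default_rng(0).standard_normal(sr)  # spectrally flat, steady
mixed = np.concatenate([tone[:4096], noise[:4096],    # alternates over time
                        tone[4096:8192], noise[4096:8192]])

mean_tone, ssv_tone = mean_entropy_and_ssv(tone)
mean_noise, ssv_noise = mean_entropy_and_ssv(noise)
mean_mixed, ssv_mixed = mean_entropy_and_ssv(mixed)
```

Under these assumptions, a steady tone and steady noise differ mainly in mean entropy, whereas a signal whose spectral structure changes over time stands out on the SSV measure.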
Keywords: auditory perception; entropy; functional MRI; motion processing; natural sound categorization; signal feature extraction; spectral structure variation
Year: 2012 PMID: 22582038 PMCID: PMC3348722 DOI: 10.3389/fnsys.2012.00027
Source DB: PubMed Journal: Front Syst Neurosci ISSN: 1662-5137
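Table A1. List of sound stimuli, ordered from object-like to scene-like Likert ratings.
| Environmental sound | Likert rating | Mechanical sound | Likert rating |
| --- | --- | --- | --- |
| | | Clock ticking #3, grandfather (cavernous) | 1.1 |
| Water bubbling, hot tub #2 | 3.0 | | |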
| Glacier break | 3.1 | Antique clock chiming and ticking | 1.4 |
| Heavy rain | 3.1 | Paint can lid rolling on floor | 1.4 |
| Water dripping in cave #2 | 3.1 | Fax or copy machine adjusting | 1.5 |
| Lake water wave ashore | 3.2 | Church bell chimes | 1.6 |
| Waves, large lake | 3.2 | Money falling out of slot machine | 1.7 |
| Fire crackling, big forest | 3.3 | Church bells ringing | 1.8 |
| Fire in fire place | 3.3 | Industry generator, compressor | 1.8 |
| Wind blowing #1 | 3.3 | Printer, rotor movements | 1.8 |
| Large river, flowing | 3.4 | Film projector | 2.0 |
| Rain fall, medium hard | 3.4 | Clocks, several ticking | 2.1 |
| Small waterfall | 3.4 | Fax machine | 2.1 |
| Forest fire | 3.5 | Printer, dot matrix | 2.1 |
| Fire crackling #1 | 3.6 | Airplane, propeller | 2.3 |
| Fire crackling #2 | 3.7 | Helicopter passing | 2.3 |
| River flowing | 3.7 | Machinery, chugging sounds | 2.3 |
| Small brush fire | 3.8 | Printer, office | 2.3 |
| Water waves coming ashore | 3.8 | Helicopter #2 | 2.3 |
| Wind blowing, cold | 3.8 | Office machine, handling paper | 2.3 |
| Bubbling water in hot tub | 3.8 | Police car with siren passing by | 2.3 |
| Rocks falling/sliding | 3.8 | Fax machine, paper coming out | 2.4 |
| Wind gusting | 3.9 | Office printer, printing | 2.4 |
| River medium flow | 3.9 | Drying machine #2 | 2.4 |
| Wind blowing #2 | 3.9 | Fireworks going off | 2.4 |
| Water flow, stream | 4.0 | Printer, feeding paper | 2.4 |
| Rockslide | 4.1 | Airplane, propeller #2 | 2.6 |
| Rolling thunder | 4.1 | Air conditioner motor turning on | 2.6 |
| Water dripping | 4.1 | Drying machine #1 | 2.6 |
| Wind blowing, low pitch | 4.1 | Puncher metal | 2.7 |
| Wind blowing, quietly | 4.1 | Helicopter #3 | 2.7 |
| Wind, cold breeze #2 | 4.1 | Conveyor belt moving | 2.8 |
| Glacier break | 4.2 | Windshield wipers | 2.8 |
| Wind blowing, high pitch | 4.2 | Airline fly by | 2.9 |
| Wind blowing, whistling #2 | 4.2 | Large newspaper print press, chugging | 3.1 |
| Heavy wind through doorway | 4.3 | Mechanical conveyer moving | 3.3 |
| Glacier break | 4.3 | Garage door opening 2 | 3.4 |
| Rain falling, with thunder | 4.3 | Pressbook, chugging sound | 3.4 |
| Wind blowing #5 | 4.3 | Train, freight passing | 3.4 |
| Wind blowing #6 | 4.3 | Garage door opening | 3.5 |
| Wind blowing #7 | 4.4 | Train squeal, brakes to a stop | 3.5 |
| Avalanche | 4.4 | | |
| Glacier break | 4.4 | | |
| Ocean waves #1 | 4.5 | | |
Bold text indicates the extreme-rated sounds used in Figure 1B.
Figure 1. Cortical sensitivity to the perception of auditory “objects” versus acoustic scenes, using real-world non-biological action sounds. (A) Frequency of Likert ratings (1–5) of the Mechanical (M; blue, n = 54 sounds retained) and Environmental sound stimuli (E; green, n = 57). See Table A1 bolded entries for a list of these sounds. (B) Power spectra of the 28 action sounds with the most extreme object-vs-scene ratings in each conceptual category of action sound (refer to color key). (C) Volume-based group-averaged activation common to both Groups A and B (conjunction analyses; yellow with black outlines) that showed preferential activation to sounds judged to be object-like (MO7 and EO7) versus scene-like (MS7 and ES7). Cortical responses to the same sounds were used to define regions preferential for mechanical (blue) versus environmental (green) sounds. Transparent white patches in the left hemisphere depict an overlapping “heat map” of tonotopically organized regions (disregarding orientation of the tonotopic gradient) derived from eight individuals. STS = superior temporal sulcus. (D) Charts illustrating the BOLD percent signal change response profiles as a function of Likert scale rating for both Groups (refer to color key). Blue squares depict mechanical sounds and green circles depict environmental sounds. The group-averaged BOLD percent signal change responses to the human action sounds (red diamonds; left STG 0.62% BOLD signal differential, right 0.73%) and animal action sounds (yellow triangles; left 0.61%, right 0.72%) are also depicted for comparison. (E) Charts separately illustrating BOLD responses to environmental and mechanical action sounds as a function of Likert scale ratings. Refer to text for other details.
Figure 2. A double-dissociation of networks preferential for processing sounds perceived more as auditory objects (yellow) versus acoustic scenes (brown) during the sound offset detection task (Group A). Histograms show activation profiles (normalized relative to responses to silent events) for participants from both Group A (n = 12; left-most charts) and B (n = 19; right).
Figure 3. Correlations between acoustic signal attributes and perceptual ratings of object-vs-scene non-biological action sounds. (A) Mean entropy measures (z-normalized) of the sound stimuli showed no significant linear correlation with the Likert ratings. (B) Spectral structure variation (SSV) measures (ln(SSV), z-normalized) of the sounds as a function of Likert ratings revealed significant correlations for both the mechanical (blue) and environmental (green) sounds. (C) Chart derived from panel B showing only the set of 28 extreme-rated sounds from Figure 1B. See text for other details.
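The caption reports correlations between z-normalized ln(SSV) and Likert ratings without giving the computation. A minimal sketch of one way to do this is shown below, assuming a plain Pearson correlation and a sample standard deviation for the z-normalization; the helper names are hypothetical, and the input values are arbitrary placeholders rather than data from the study, so they imply nothing about the direction or strength of the reported effect.

```python
import numpy as np

def z_normalize(values):
    """Center to zero mean and scale to unit sample standard deviation."""
    values = np.asarray(values, dtype=float)
    std = values.std(ddof=1)
    if std == 0.0:
        raise ValueError("cannot z-normalize constant values")
    return (values - values.mean()) / std

def pearson_r(x, y):
    """Pearson linear correlation coefficient between two equal-length arrays."""
    return float(np.corrcoef(x, y)[0, 1])

# Arbitrary placeholder inputs (NOT measurements from the study).
likert_ratings = np.array([1.4, 2.1, 2.6, 3.3, 3.8, 4.2])
ssv_values = np.array([0.80, 0.35, 0.50, 0.20, 0.27, 0.12])

z_ln_ssv = z_normalize(np.log(ssv_values))  # ln(SSV), then z-normalize
r = pearson_r(likert_ratings, z_ln_ssv)
```

Because z-normalization is a linear rescaling, it leaves the Pearson coefficient unchanged; its role is to put ln(SSV) and other attributes such as mean entropy on a common scale for plotting and comparison.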
Figure 4. (A) Location of object-vs-scene sensitive cortices (yellow from Figure 1C) relative to regions showing parametric sensitivity to ln(SSV) at p < 0.00001 (red) and mean entropy at p < 0.0001 (purple). Charts show average BOLD signal responses from within the left and right STG foci (n = 31 subjects) relative to (B) ln(SSV) values, (C) mean entropy, and (D) global HNR values. ns = not significant. Refer to text for other details.