Adam Weisser, Jörg M Buchholz, Gitte Keidser.
Abstract
The concept of complex acoustic environments has appeared in several unrelated research areas within acoustics in different variations. Based on a review of the usage and evolution of this concept in the literature, a relevant framework was developed, which includes nine broad characteristics that are thought to drive the complexity of acoustic scenes. The framework was then used to study the most relevant characteristics for stimuli of realistic, everyday, acoustic scenes: multiple sources, source diversity, reverberation, and the listener's task. The effect of these characteristics on perceived scene complexity was then evaluated in an exploratory study that reproduced the same stimuli with a three-dimensional loudspeaker array inside an anechoic chamber. Sixty-five subjects listened to the scenes and for each one had to rate 29 attributes, including complexity, both with and without target speech in the scenes. The data were analyzed using three-way principal component analysis with a (2 3 2) Tucker3 model in the dimensions of scales (or ratings), scenes, and subjects, explaining 42% of variation in the data. "Comfort" and "variability" were the dominant scale components, which span the perceived complexity. Interaction effects were observed, including the additional task of attending to target speech that shifted the complexity rating closer to the comfort scale. Also, speech contained in the background scenes introduced a second subject component, which suggests that some subjects are more distracted than others by background speech when listening to target speech. The results are interpreted in light of the proposed framework.
Keywords: complex acoustic environments; complexity; hearing; perception; three-way principal component analysis
Year: 2019 PMID: 31808369 PMCID: PMC6900675 DOI: 10.1177/2331216519881346
Source DB: PubMed Journal: Trends Hear ISSN: 2331-2165 Impact factor: 3.293
Table 1. Different Wording Used in the Literature to Designate Complex Acoustic Environments.
| Complex | Acoustic | Environment |
|---|---|---|
| Realistic | Auditory | Scenario |
| Everyday | Listening | Scene |
| Demanding | Multitalker | Soundscape |
| Challenging | Multisource | Condition |
| Adverse | Sound | Situation |
| Real world | | Setting |
| Field | | |
Table 2. The Complex Acoustic Environment Framework.
| Type | No. | Characteristic | Description |
|---|---|---|---|
| Source | 1 | Multiple acoustic sources distributed in space | The more acoustic sources that are present, the more independent streams of sound event information are competing for the listener’s attention (e.g., |
| | 2 | Acoustic source diversity | Sounds vary in temporal and spectral characteristics, radiation patterns, and position. The more variable the source is, the more challenging it may be for the receiver to process these changes in real time (e.g., |
| | 3 | Source–source interaction | Human talkers and other biological sound sources form interactive communication networks, so the sound produced by several sources (e.g., talkers) is not the same as each of them alone ( |
| Environment | 4 | Reverberation, reflections, scattering, diffraction, and diffusion | The geometry and materials of the environment, and objects within it, impose the boundary conditions on the propagation of sound that deviate from free-field acoustics. This blurs individual sound sources (e.g., |
| | 5 | Nonuniform medium for sound propagation | The medium can be inhomogeneous, nonlinear, nonisotropic, absorptive, and dispersive. This is mostly applicable for underwater acoustics, although over very long distances, large temperature gradients, precipitation, or turbulent conditions, some of these phenomena can become relevant to airborne sound as well ( |
| Source–environment interactions | 6 | Loudspeaker amplification systems | Electronic amplification interacts with the room acoustics and affects the sound pressure levels and radiation patterns of sounds in space ( |
| | 7 | Source–environment adaptation | Biological sound sources may react and adapt to the general acoustical conditions in the environment including reverberation, and general noise level, as well as their distribution in space (e.g., the Lombard effect, Lombard, 1911). |
| | 8 | Cues of other modalities | Sensory signals of different modalities tend to co-occur in natural settings and they can be combined and integrated by the receiver ( |
| Receiver | 9 | Receiver’s task | The instantaneous task of the listener frames how much complexity matters ( |
Table 3. The Complex Scene Stimuli Used in the Experiment, Which Were Taken From the ARTE Database.
| No. | Scene name | Description | SPL (dB) | SPL (dBA) | T30 (s) | Speech (dBA) |
|---|---|---|---|---|---|---|
| 1 | Library | University study area in the main library, off-peak hours, quiet, distinctly audible acoustic objects, people whisper to avoid disturbing others. | 53.0 | 46.1 | 0.6 | 54.6 |
| 2 | Office | Open space office, people typing, chatting and talking on the phone | 56.7 | 51.4 | 0.2 | 63.9 |
| 3 | Diffuse Noise 1 | Low-level speech-weighted broadband diffuse sound field | 58.3 | 54.2 | N/A | 62.7 |
| 4 | Church 1 | Small church space, people entering and chatting quietly before service | 60.5 | 54.7 | 1.2 | 62.4 |
| 5 | Living Room | Living room with access to kitchen in the back, loud television and sounds from the kitchen | 63.3 | 58.7 | 0.2 | 65.0 |
| 6 | Church 2 | Same as 4, but busier and louder conversations (1.5 min) | 65.9 | 60.9 | 1.2 | 67.5 |
| 7 | Diffuse Noise 2 | Medium-level speech-weighted broadband diffuse sound field | 70 | 65.9 | N/A | 69.7 |
| 8 | Café 1 | Indoor café at medium occupancy | 71.0 | 67.3 | 1.1 | 72.1 |
| 9 | Café 2 | Indoor (company) café at medium occupancy before lunch, next to the wall | 71.7 | 66.2 | 1.1 | 70.8 |
| 10 | Dinner Party | Small room with eight people chatting over the table with background music | 72.8 | 68.7 | 0.4 | 71.8 |
| 11 | Street/Balcony | Apartment balcony over a busy arterial road; mainly traffic noise with some noise from within the apartment | 74.5 | 71.1 | N/A | 75.6 |
| 12 | Train Station | Sydney Central, main concourse—large space, open to the platforms with people walking at peak hour; loud amplified announcement and train sounds | 77.1 | 73.6 | 1.0 | 74.2 |
| 13 | Food Court 1 | Busy university food court | 78.2 | 74.9 | 0.9 | 74.9 |
| 14 | Food Court 2 | Very noisy food court in a shopping mall during lunch | 79.6 | 76.7 | 1.0 | 76.6 |
Note. The scene names and descriptions are provided along with the unweighted (dB SPL) and A-weighted (dBA) SPLs, and the reverberation time T30. Data in the table are reproduced from Weisser et al. (2019). The right-most column contains the target speech levels from 0.66 m in dBA. SPL = sound pressure level; N/A = not applicable.
Table 4. A Qualitative Breakdown of Relevant CAE Framework Characteristics for the ARTE Scenes Described in Table 3.
| No. | Scene name | 1. Multiple sources | 2. Source diversity | 3. Source–source interaction | 4. Reverberation | 5. Nonuniform medium | 6. Amplification | 7. Adaptation | 8. Other modalities | 9. Receiver’s task |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Library | + | + | + | − | − | − | + | − | + |
| 2 | Office | + | + | + | − | − | − | + | − | + |
| 3/7 | Diffuse Noise 1 and 2 | − | − | − | − | − | − | − | − | + |
| 4 | Church 1 | + | − | + | + | − | − | + | − | + |
| 5 | Living Room | + | + | − | − | − | + | − | − | + |
| 6 | Church 2 | + | − | + | + | − | − | + | − | + |
| 8/9 | Café 1 and 2 | + | + | + | + | − | + | + | − | + |
| 10 | Dinner Party | + | + | + | + | − | + | + | − | + |
| 11 | Street/Balcony | + | + | − | − | 0 | − | − | − | + |
| 12 | Train Station | + | + | + | + | 0 | + | + | − | + |
| 13/14 | Food Court 1 and 2 | + | + | + | + | − | − | + | − | + |
Note. This breakdown gathers the most salient dimensions according to which perceived complexity may vary. The CAE framework characteristics are shown in Table 2. A plus (+) indicates that the respective characteristic was judged to be highly relevant to the scene, whereas a minus (−) indicates the opposite. The zero (0) indicates that a nonuniform medium effect (5) may have been present but could not be captured with the recording technology used. CAE = complex acoustic environment.
Figure 1. Scree plot of different component combinations of Tucker3 models. The triplets on the plot designate the number of components for scales, scenes, and subjects, in that order. The amount of explained variation is measured as a percentage of the sum of squared differences between the original data set and the fitted one. The final model that was selected (2 3 2) is emphasized.
Table 5. The Core Array of the (2 3 2) Tucker3 Model, Simplified by Forcing the Smallest Terms to Zero.
| | Scene 1 | Scene 2 | Scene 3 |
|---|---|---|---|
| Subject Component 1 | |||
| Scale 1 | 99.65 | 0.00 | 0.00 |
| Scale 2 | 0.00 | 41.30 | 0.00 |
| Subject Component 2 | |||
| Scale 1 | 0.00 | 0.00 | −38.87 |
| Scale 2 | 0.00 | 0.00 | 0.00 |
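
To make the role of the core array concrete, the standard Tucker3 formulation can be written out as follows. The symbols $a$, $b$, $c$, and $g$ are notation assumed here for illustration (scale, scene, and subject loadings and core elements, respectively), and the index order (scale, scene, subject) is inferred from the table layout. Each preprocessed rating of scale $i$ for scene $j$ by subject $k$ is approximated as

$$
\hat{x}_{ijk} = \sum_{p=1}^{2}\sum_{q=1}^{3}\sum_{r=1}^{2} g_{pqr}\, a_{ip}\, b_{jq}\, c_{kr}
$$

With only the three nonzero elements of the simplified core array, and under the same assumptions, this would reduce to

$$
\hat{x}_{ijk} = 99.65\, a_{i1} b_{j1} c_{k1} + 41.30\, a_{i2} b_{j2} c_{k1} - 38.87\, a_{i1} b_{j3} c_{k2}
$$

Read this way, the second subject component enters only through the first scale component and the third scene component.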
Table 6. The Loading Matrix of the Two Scale Components, Grouped According to the Clusters They Form on the Joint Biplot in Figure 2.
| Scale | Scale Component 1 | Scale Component 2 |
|---|---|---|
| Pleasantness | 0.141 | −0.088 |
| Intelligibility | 0.177 | −0.126 |
| Loudness | −0.176 | 0.137 |
| Phone listening | −0.229 | 0.107 |
| Annoyance | −0.212 | 0.198 |
| Annoyance speech | −0.242 | 0.046 |
| Attention | −0.211 | 0.027 |
| Stress | −0.185 | 0.074 |
| Disturbance speech | −0.255 | 0.068 |
| Focus | −0.193 | 0.045 |
| Focusing speech | −0.224 | 0.033 |
| Fatigue | −0.218 | 0.148 |
| Fatigue speech | −0.250 | 0.066 |
| Effort speech | −0.254 | 0.041 |
| Difficulty speech | −0.200 | 0.096 |
| Difficulty | −0.070 | 0.051 |
| Variability | −0.133 | −0.386 |
| Familiarity | −0.097 | −0.416 |
| Realism | −0.059 | −0.299 |
| Event count | −0.051 | −0.321 |
| Event score | −0.010 | −0.249 |
| Realism speech | −0.070 | −0.263 |
| Complexity | −0.199 | −0.176 |
| Complexity speech | −0.234 | −0.150 |
| Effort | −0.173 | −0.073 |
| Busyness | −0.217 | −0.129 |
| Distraction | −0.175 | −0.099 |
| Distraction speech | −0.242 | −0.098 |
| Distinction | 0.072 | −0.245 |
| Presentation order | 0.002 | 0.129 |
| Time AB | −0.011 | −0.090 |
| Time C | −0.016 | −0.045 |
| Envelopment | −0.036 | 0.092 |
| Spaciousness | −0.035 | 0.108 |
| Reverberance | −0.077 | 0.054 |
Table 7. Loading Matrix of the Three Scene Components.
| Scene | Scene Component 1 | Scene Component 2 | Scene Component 3 |
|---|---|---|---|
| Library | 0.413 | −0.353 | 0.117 |
| Office | 0.135 | −0.207 | 0.377 |
| Diffuse Noise 1 | 0.411 | 0.456 | 0.147 |
| Church 1 | 0.224 | 0.000 | 0.231 |
| Living Room | 0.119 | −0.257 | 0.298 |
| Church 2 | 0.086 | 0.123 | 0.426 |
| Diffuse Noise 2 | 0.223 | 0.559 | 0.261 |
| Café 1 | −0.118 | −0.255 | 0.296 |
| Café 2 | −0.083 | −0.261 | 0.199 |
| Dinner Party | −0.057 | −0.006 | 0.341 |
| Street/Balcony | −0.318 | 0.155 | 0.221 |
| Train Station | −0.210 | −0.100 | 0.286 |
| Food Court 1 | −0.407 | 0.112 | 0.186 |
| Food Court 2 | −0.432 | 0.226 | 0.150 |
Note. The first component mainly drives the comfort ratings, whereas the second drives the variability. The third component is only required to reveal the second subject component (see later).
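
As a rough illustration of how the scale and scene loadings combine through the core array, the sketch below evaluates a Tucker3-style fitted value for two scales and two scenes, using loadings and core elements copied from the tables. It assumes the standard Tucker3 product form with index order (scale, scene, subject). The subject loading vector is purely hypothetical (a listener who loads only on the first subject component), since no subject loadings are tabulated, so the outputs are relative values in preprocessed units rather than actual ratings.

```python
# Nonzero entries of the simplified core array, keyed by 1-indexed
# (scale component, scene component, subject component).
CORE = {(1, 1, 1): 99.65, (2, 2, 1): 41.30, (1, 3, 2): -38.87}

# Loadings copied from the scale and scene loading tables.
SCALE_LOADINGS = {
    "Complexity": (-0.199, -0.176),
    "Pleasantness": (0.141, -0.088),
}
SCENE_LOADINGS = {
    "Library": (0.413, -0.353, 0.117),
    "Food Court 2": (-0.432, 0.226, 0.150),
}

# Hypothetical subject: unit loading on subject component 1, none on 2.
HYPOTHETICAL_SUBJECT = (1.0, 0.0)


def fitted_value(scale, scene, subject):
    """Sum g_pqr * a_p * b_q * c_r over the nonzero core elements."""
    total = 0.0
    for (p, q, r), g in CORE.items():
        total += g * scale[p - 1] * scene[q - 1] * subject[r - 1]
    return total


for scale_name, scale in SCALE_LOADINGS.items():
    for scene_name, scene in SCENE_LOADINGS.items():
        value = fitted_value(scale, scene, HYPOTHETICAL_SUBJECT)
        print(f"{scale_name:12s} | {scene_name:12s} | {value:+.2f}")
```

Under these assumptions, the complexity value comes out higher for Food Court 2 than for the Library, and the pleasantness value shows the opposite ordering. Only the ordering is meaningful here, because the scale depends on the hypothetical subject loading.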
Figure 2. Joint biplot of the scale and scene first and second components. The vectors represent the different scales, where names ending with (S) refer to target speech attributes in Part C of the questionnaire. The first two scene components are superimposed on the plot, with the large markers designating the scenes. If a marker is closer to a particular scale, then it is more closely related to that attribute. The biplot origin designates the average value of every scale over all scenes and subjects.
Table 8. The Relative Contribution of the Core Array Elements to the Total Variation Explained by the Simplified (2 3 2) Tucker3 Model.
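| Core element | Explained variation (%) | Value | Sum of squares |
|---|---|---|---|
| Scale 1, Scene 1, Subject Component 1 | 75.53 | 99.65 | 9,930 |
| Scale 2, Scene 2, Subject Component 1 | 12.97 | 41.30 | 1,706 |
| Scale 1, Scene 3, Subject Component 2 | 11.48 | −38.87 | 1,510 |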
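
The percentages in this table can be approximately recovered from the core values alone. The sketch below assumes that each element's share is its squared value divided by the sum of the squared nonzero core elements, which is the usual reading for a Tucker3 model with orthonormal component matrices. The function name is illustrative, and small differences in the last digit are to be expected because the tabulated core values are rounded.

```python
# Nonzero core elements of the simplified (2 3 2) model, as tabulated.
core_values = [99.65, 41.30, -38.87]


def explained_shares(values):
    """Return each element's percentage share of the total sum of squares."""
    squares = [v ** 2 for v in values]
    total = sum(squares)
    return [100.0 * s / total for s in squares]


shares = explained_shares(core_values)
for value, share in zip(core_values, shares):
    print(f"core value {value:8.2f} -> square {value ** 2:10.1f}, share {share:5.2f}%")
```

The first element dominates because the shares scale with the square of the core value, so a value roughly 2.4 times larger accounts for almost six times as much variation.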
Figure 3. The strongest correlation of any rating scale to the second subject component was with the listening effort rating during speech. Only the scenes with dominant background speech drove this differentiation. Hearing status does not appear to consistently predict the mean subjective ratings (four subjects with mild losses, >25 dB HL, were grouped separately from the slight losses for illustration purposes only).
Figure 4. The scene complexity ratings averaged over all 65 subjects, ordered by increasing sound pressure level (dB SPL) from left to right. The error bars are the Tukey–Kramer 95% confidence intervals of pairwise comparisons of all ratings.