| Literature DB >> 28451630 |
Santani Teng, Verena R. Sommer, Dimitrios Pantazis, Aude Oliva.
Abstract
Perceiving the geometry of surrounding space is a multisensory process, crucial to contextualizing object perception and guiding navigation behavior. Humans can make judgments about surrounding spaces from reverberation cues, caused by sounds reflecting off multiple interior surfaces. However, it remains unclear how the brain represents reverberant spaces separately from sound sources. Here, we report separable neural signatures of auditory space and source perception during magnetoencephalography (MEG) recording as subjects listened to brief sounds convolved with monaural room impulse responses (RIRs). The decoding signature of sound sources began at 57 ms after stimulus onset and peaked at 130 ms, while space decoding started at 138 ms and peaked at 386 ms. Importantly, these neuromagnetic responses were readily dissociable in form and time: while sound source decoding exhibited an early and transient response, the neural signature of space was sustained and independent of the original source that produced it. The reverberant space response was robust to variations in sound source, and vice versa, indicating a generalized response not tied to specific source-space combinations. These results provide the first neuromagnetic evidence for robust, dissociable auditory source and reverberant space representations in the human brain and reveal the temporal dynamics of how auditory scene analysis extracts percepts from complex naturalistic auditory signals.
Keywords: audition; auditory scene analysis; magnetoencephalography; multivariate pattern analysis; reverberation
Year: 2017 PMID: 28451630 PMCID: PMC5394928 DOI: 10.1523/ENEURO.0007-17.2017
Source DB: PubMed Journal: eNeuro ISSN: 2373-2822
Figure 1. Stimulus conditions, MEG classification scheme, and single-sound decoding time course. A, Stimulus design. Three brief sounds were convolved with three different RIRs to produce nine sound sources spatialized in reverberant environments. B, MEG pattern vectors were used to train an SVM classifier to discriminate every pair of stimulus conditions (three sound sources in three different space sizes each). Decoding accuracies across every pair of conditions were arranged in 9 × 9 decoding matrices, one per time point t. C, Averaging across all condition pairs (shaded matrix partition) for each time point t resulted in a single-sound decoding time course. Lines below the time course indicate significant time points (N = 14, cluster-definition threshold, p < 0.05, 1000 permutations). Decoding peaked at 156 ms; error bars represent 95% CI.
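The pairwise decoding scheme in Figure 1 (a classifier trained to discriminate every pair of the nine conditions, with accuracies collected into a 9 × 9 matrix per time point and averaged into a time course) can be sketched as below. This is an illustrative reconstruction on random data: a leave-one-trial-out nearest-class-mean rule stands in for the paper's SVM, and all array sizes are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 9 conditions (3 sources x 3 spaces), 20 trials each,
# 100 "sensor" channels, 50 time points -- all sizes are placeholders.
n_cond, n_trials, n_sens, n_time = 9, 20, 100, 50
X = rng.standard_normal((n_cond, n_trials, n_sens, n_time))

def pairwise_accuracy(a, b):
    """Leave-one-trial-out decoding of two condition blocks (trials x
    sensors) with a nearest-class-mean rule (a stand-in for the SVM)."""
    n = a.shape[0]
    correct = 0
    for i in range(n):
        mu_a = np.delete(a, i, axis=0).mean(axis=0)
        mu_b = np.delete(b, i, axis=0).mean(axis=0)
        correct += np.linalg.norm(a[i] - mu_a) < np.linalg.norm(a[i] - mu_b)
        correct += np.linalg.norm(b[i] - mu_b) < np.linalg.norm(b[i] - mu_a)
    return correct / (2 * n)

# One 9 x 9 decoding matrix per time point (upper triangle filled); the
# single-sound time course is the average over all condition pairs.
decoding = np.full((n_time, n_cond, n_cond), np.nan)
for t in range(n_time):
    for i in range(n_cond):
        for j in range(i + 1, n_cond):
            decoding[t, i, j] = pairwise_accuracy(X[i, :, :, t], X[j, :, :, t])

time_course = np.nanmean(decoding, axis=(1, 2))
```

On pure-noise data the time course hovers around chance (0.5); with real MEG patterns the average rises wherever conditions are discriminable.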
Figure 2. Separable space and source identity decoding. A, Individual conditions were pooled across source identity (left, top) or space size (left, bottom) in separate analyses. Classification analysis was then performed on the orthogonal stimulus dimension to establish the time course with which the brain discriminated between space (red) and source identity (blue). Sound-source classification peaked at 130 ms, while space classification peaked at 386 ms. Significance indicators and latency error bars on plots same as in Figure 1. B, Space was classified across sound sources and vice versa. Left panel, Cross-classification example in which a classifier was trained to discriminate between spaces on sound sources 1 and 2, then tested on space discrimination on source 3. Right panel, Sound-source cross-classification example in which a classifier was trained to discriminate between sound sources on space sizes 1 and 2, then tested on sound-source discrimination on space 3. C, Results from all nine such pairwise train-test combinations were averaged to produce a classification time course in which the train and test conditions contained different experimental factors. Sound-source cross-classification peaked at 132 ms, while space cross-classification peaked at 385 ms. Significance bars below time courses and latency error bars same as in Figure 1.
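The cross-classification logic of Figure 2 can be sketched as follows: train a space classifier on trials from two sound sources, then test it on the held-out third source. The snippet below is a toy reconstruction with a nearest-class-mean rule in place of the SVM; the simulated "space" signal (a per-size mean shift) is an assumption for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy trials for the 3-source x 3-space design: data[(source, space)] is a
# (trials x features) block. The per-space mean shift is an assumed,
# purely illustrative "reverberant space" signal.
n_trials, n_feat = 20, 50
data = {(so, sp): rng.standard_normal((n_trials, n_feat)) + sp
        for so in range(3) for sp in range(3)}

def class_means(blocks):
    return [b.mean(axis=0) for b in blocks]

def accuracy(blocks, means):
    correct = total = 0
    for label, trials in enumerate(blocks):
        for tr in trials:
            pred = int(np.argmin([np.linalg.norm(tr - m) for m in means]))
            correct += pred == label
            total += 1
    return correct / total

# Train a space classifier on sources 0 and 1, test it on held-out source 2
# (nearest-class-mean rule standing in for the paper's SVM).
train = [np.vstack([data[(0, sp)], data[(1, sp)]]) for sp in range(3)]
test = [data[(2, sp)] for sp in range(3)]
acc = accuracy(test, class_means(train))
```

Above-chance accuracy on the held-out source is what licenses the paper's claim that the space representation generalizes across sources.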
Figure 3. Sensorwise decoding of source identity and space size. MEG decoding time courses were computed separately for 102 sensor locations, yielding decoding sensor maps. A, Sensor map of sound source decoding at the peak of the effect (130 ms). B, Sensor map of space size decoding at the peak of the effect (386 ms). Significant decoding is indicated with a black circle over the sensor position (p < 0.01; corrected for false discovery rate (FDR) across sensors and time).
Figure 4. Temporal generalization matrix of auditory source and space decoding time courses. Left column shows the generalized decoding profiles of space (A) and source (B) decoding. Right column shows the statistically significant results (t test against 50%, p < 0.05, FDR corrected).
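A temporal generalization matrix, as in Figure 4, is built by training a classifier at each time point t and testing it at every time point u. The sketch below uses random two-class data with a signal injected in a fixed window, and a nearest-class-mean rule as a simple stand-in for the paper's SVM; all sizes are placeholders.

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy two-class data: class x trials x features x time. A class signal is
# injected during a fixed window so off-diagonal generalization is visible.
n_tr, n_feat, n_time = 30, 20, 40
X = rng.standard_normal((2, n_tr, n_feat, n_time))
X[1, :, :, 10:30] += 1.0  # class-1 signal between t = 10 and t = 30

def generalization_matrix(X):
    """Train a nearest-class-mean rule at each time t (first half of the
    trials), test at every time u (second half of the trials)."""
    half = n_tr // 2
    G = np.empty((n_time, n_time))
    for t in range(n_time):
        mu0 = X[0, :half, :, t].mean(axis=0)
        mu1 = X[1, :half, :, t].mean(axis=0)
        for u in range(n_time):
            d0 = np.linalg.norm(X[0, half:, :, u] - mu0, axis=1) < \
                 np.linalg.norm(X[0, half:, :, u] - mu1, axis=1)
            d1 = np.linalg.norm(X[1, half:, :, u] - mu1, axis=1) < \
                 np.linalg.norm(X[1, half:, :, u] - mu0, axis=1)
            G[t, u] = (d0.sum() + d1.sum()) / n_tr
    return G

G = generalization_matrix(X)
```

Accuracy is high only where both train and test times fall inside the signal window, producing the square off-diagonal block characteristic of a sustained representation.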
Figure 6. Stimulus dissimilarity analysis based on cochleogram data. A, Cochleograms were generated for each stimulus, discretized into 200 5-ms bins and 64 frequency subbands. Each cochleogram thus comprised 200 pattern vectors of size 64 × 1. For each pair of stimuli, pattern vectors across frequency subbands were correlated at corresponding time points and subtracted from 1. B, Overall cochleogram-based dissimilarity. The final dissimilarity value at time t is an average of all pairwise correlations at that time point. Peak overall cochleogram dissimilarity occurred at 500 ms; peak MEG dissimilarity (decoding accuracy) is shown for comparison. C, Pooled cochleogram-based dissimilarity across space size and source identity. Pairwise correlations were performed and averaged analogously to the pooled decoding analysis. MEG pooled decoding peaks for source identity and space size are shown for reference; corresponding stimulus dissimilarity peaks were significantly offset (p < 0.05 for both source identity and space).
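The cochleogram dissimilarity measure of Figure 6 (1 minus the correlation between the 64-band spectral vectors of two stimuli at each corresponding 5-ms time bin, averaged over stimulus pairs) can be sketched as below; random values stand in for real cochleogram magnitudes.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy cochleograms: 9 stimuli x 200 time bins (5 ms each) x 64 frequency
# subbands; random values are placeholders for real cochleogram data.
n_stim, n_bins, n_bands = 9, 200, 64
coch = rng.standard_normal((n_stim, n_bins, n_bands))

def dissimilarity(a, b):
    """1 - Pearson r between the 64-band spectral vectors of two
    cochleograms at each corresponding time bin."""
    d = np.empty(a.shape[0])
    for t in range(a.shape[0]):
        d[t] = 1.0 - np.corrcoef(a[t], b[t])[0, 1]
    return d

# Overall dissimilarity time course: average over all stimulus pairs.
pairs = [(i, j) for i in range(n_stim) for j in range(i + 1, n_stim)]
overall = np.mean([dissimilarity(coch[i], coch[j]) for i, j in pairs], axis=0)
```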
Figure 7. Comparison of MEG neural representations to a categorical versus an ordinal scene size model. A, Representational dissimilarity matrices (RDMs) of a categorical and an ordinal model were correlated with the MEG data from 138–801 ms (the temporal window of significant space size decoding) to assess the nature of MEG scene size representations. B, Results indicate that the MEG representations have a significantly higher correlation with the ordinal than with the categorical scene size model. Spearman correlation coefficients ρ were averaged across time points in the temporal window. Error bars represent ±SEM.
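The model comparison in Figure 7 can be illustrated as follows: build a categorical (same/different space) and an ordinal (graded size difference) model RDM, then Spearman-correlate each with a data RDM. This sketch assumes RDMs over all nine conditions (3 sources × 3 sizes); the paper's exact RDM construction may differ, and the "MEG" RDM here is simulated from the ordinal structure plus noise purely for illustration.

```python
import numpy as np

def rankdata(x):
    """Average ranks with ties (minimal replacement for scipy's rankdata)."""
    order = np.argsort(x, kind="stable")
    ranks = np.empty(len(x))
    sx = x[order]
    i = 0
    while i < len(x):
        j = i
        while j + 1 < len(x) and sx[j + 1] == sx[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(a, b):
    return np.corrcoef(rankdata(a), rankdata(b))[0, 1]

# Space size of each of the 9 conditions (3 sources x 3 sizes, assumed order).
sizes = np.array([k % 3 for k in range(9)])
iu = np.triu_indices(9, k=1)  # the 36 condition pairs

categorical = (sizes[iu[0]] != sizes[iu[1]]).astype(float)   # same vs different
ordinal = np.abs(sizes[iu[0]] - sizes[iu[1]]).astype(float)  # graded distance

# Toy "MEG" RDM simulated from the ordinal structure plus noise.
rng = np.random.default_rng(3)
meg = ordinal + 0.3 * rng.standard_normal(ordinal.shape)

rho_cat = spearman(meg, categorical)
rho_ord = spearman(meg, ordinal)
```

Because the simulated data carry graded size structure, the ordinal model correlates more strongly, mirroring the qualitative pattern the figure reports for the real MEG data.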
Summary of key statistical tests
| Line | Data structure | Type of test | 95% confidence intervals |
|---|---|---|---|
| a | None assumed: classification accuracy over time | Bootstrap | Onset CI: 12–64 ms |
| b | None assumed: classification accuracy over time | Bootstrap | Peak CI: 119–240 ms |
| c | None assumed: classification accuracy over time | Bootstrap | Onset CI: 37–60 ms |
| d | None assumed: classification accuracy over time | Bootstrap | Peak CI: 116–140 ms |
| e | None assumed: classification accuracy over time | Bootstrap | Onset CI: 71–150 ms |
| f | None assumed: classification accuracy over time | Bootstrap | Peak CI: 246–395 ms |
| g | None assumed: onsets of source and space decoding | Compare bootstrapped empirical distribution of space decoding onset with mean source decoding onset | Space onset CI: 71–150 ms |
| h | None assumed: peaks of source and space decoding | Compare bootstrapped empirical distribution of space decoding peak with mean source decoding peak | Space peak CI: 246–395 ms |
| i | None assumed: cross-classification accuracy over time | Bootstrap | Onset CI: 40–63 ms |
| j | None assumed: cross-classification accuracy over time | Bootstrap | Onset CI: 125–356 ms |
| k | None assumed: MEG-behavior correlations | Bootstrap | CI: 0.227–0.895 |
| l | None assumed: MEG-behavior correlations | Bootstrap | CI: 0.325–0.795 |
| m | None assumed: empirical distribution of source decoding peak | Compare bootstrapped empirical distribution of source decoding peak with source dissimilarity peak | Peak CI: 116–140 ms |
| n | None assumed: empirical distribution of space decoding peak | Compare bootstrapped empirical distribution of space decoding peak with mean space dissimilarity peak | Peak CI: 246–395 ms |
| o | Normal distribution: MEG-model correlations over time points | Paired t test | Mean difference CI: 0.0470–0.0507 |
| p | None assumed: classification accuracy over time | Bootstrap | Source peak CI: 96–312 ms |
| q | None assumed: classification accuracy over time | Bootstrap | Space peak CI: 71–790 ms |
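Most confidence intervals in the table above are bootstrap percentile intervals over the subject sample. A generic sketch of that procedure on simulated per-subject decoding time courses follows; the curve shape, noise level, and time axis are placeholders (only N = 14 matches the study).

```python
import numpy as np

rng = np.random.default_rng(5)

# Simulated per-subject decoding time courses (14 subjects x 100 time
# points) peaking near index 40; shape and noise level are placeholders.
times = np.arange(100)
subjects = np.array([np.exp(-(times - 40) ** 2 / 200)
                     + 0.05 * rng.standard_normal(100) for _ in range(14)])

# Bootstrap the subject sample: resample with replacement, recompute the
# grand-average peak latency, then take 2.5th/97.5th percentiles as the CI.
peaks = []
for _ in range(1000):
    idx = rng.integers(0, 14, size=14)
    peaks.append(int(np.argmax(subjects[idx].mean(axis=0))))
ci_low, ci_high = np.percentile(peaks, [2.5, 97.5])
```

Onset CIs are obtained the same way, with the onset statistic (e.g., first significant time point) substituted for the argmax.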
Figure 8. Space and sound source decoding with repetition-window stimuli. A, Representative waveforms of single and repeated stimuli. Repeated stimuli were produced by concatenating anechoic stimuli, followed by RIR convolution and linear amplitude ramping. B, Source (blue) and space (red) decoding. Sound-source classification peaked at 167 (96–312) ms, while space classification peaked at 237 (71–790) ms. Color-coded lines below the time courses indicate significant time points, as in experiment 1; latency error bars indicate bootstrapped confidence intervals as in experiment 1. Gray vertical lines indicate stimulus onset and approximate offset.
Figure 5. Behavior correlates with MEG decoding data. Assessment of linear relationships between response times and MEG peak decoding latencies (A), as well as between behavioral and decoding accuracies (B). Bootstrapping the participant sample (N = 14; significance threshold p < 0.05) 10,000 times revealed significant correlations between RT and peak latency (r = 0.66, p = 0.0060) and between behavioral and decoding accuracy (r = 0.59, p < 0.0001). Individual condition pairs are denoted by source (So; red) or space (Sp; blue) labels, with numerals indicating which conditions were compared. For space conditions: 1, small; 2, medium; 3, large. For source conditions: 1, hand pat; 2, pole tap; 3, ball bounce.
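The bootstrapped brain-behavior correlations of Figure 5 can be sketched generically: resample the paired observations with replacement, recompute the correlation each time, and take percentile bounds. The values below are toy data, not the study's RTs or latencies, and this variant resamples observation pairs rather than participants as the paper did.

```python
import numpy as np

rng = np.random.default_rng(6)

# Toy paired measures for 12 condition pairs: behavioral RTs (ms) and MEG
# peak decoding latencies; values are illustrative, not the study's data.
rt = np.linspace(400, 700, 12) + 20 * rng.standard_normal(12)
latency = 0.5 * rt + 30 * rng.standard_normal(12)

def pearson(a, b):
    return np.corrcoef(a, b)[0, 1]

r_obs = pearson(rt, latency)

# Resample the paired observations with replacement 10,000 times and take
# percentile bounds on the correlation coefficient.
boot = []
for _ in range(10_000):
    idx = rng.integers(0, 12, size=12)
    boot.append(pearson(rt[idx], latency[idx]))
ci = np.percentile(boot, [2.5, 97.5])
```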