David Meijer, Sebastijan Veselič, Carmelo Calafiore, Uta Noppeney.
Abstract
Multisensory perception is regarded as one of the most prominent examples where human behaviour conforms to the computational principles of maximum likelihood estimation (MLE). In particular, observers are thought to integrate auditory and visual spatial cues weighted in proportion to their relative sensory reliabilities into the most reliable and unbiased percept consistent with MLE. Yet, evidence to date has been inconsistent. The current pre-registered, large-scale (N = 36) replication study investigated the extent to which human behaviour for audiovisual localization is in line with maximum likelihood estimation. The acquired psychophysics data show that while observers were able to reduce their multisensory variance relative to the unisensory variances in accordance with MLE, they weighed the visual signals significantly stronger than predicted by MLE. Simulations show that this dissociation can be explained by a greater sensitivity of standard estimation procedures to detect deviations from MLE predictions for sensory weights than for audiovisual variances. Our results therefore suggest that observers did not integrate audiovisual spatial signals weighted exactly in proportion to their relative reliabilities for localization. These small deviations from the predictions of maximum likelihood estimation may be explained by observers' uncertainty about the world's causal structure as accounted for by Bayesian causal inference.
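The two MLE predictions contrasted in the abstract (reliability-proportional weights and reduced audiovisual variance) can be illustrated with a minimal sketch. It assumes the standard reliability-weighted form, in which each cue's weight is proportional to its inverse variance; the function and variable names are illustrative and are not taken from the paper.

```python
import math

def mle_predictions(sigma_a: float, sigma_v: float) -> tuple[float, float]:
    """Return (auditory weight, audiovisual sigma) under standard MLE cue integration.

    Assumption: reliability is defined as inverse variance (1 / sigma**2).
    """
    rel_a = 1.0 / sigma_a**2          # auditory reliability
    rel_v = 1.0 / sigma_v**2          # visual reliability
    w_a = rel_a / (rel_a + rel_v)     # auditory weight; visual weight is 1 - w_a
    sigma_av = math.sqrt(1.0 / (rel_a + rel_v))  # predicted audiovisual noise
    return w_a, sigma_av

# Hypothetical, matched unisensory noise levels (arbitrary units).
w_a, sigma_av = mle_predictions(sigma_a=2.0, sigma_v=2.0)
print(w_a, sigma_av)
```

With matched unisensory noise, as in this hypothetical call, the predicted auditory weight is 0.5 and the predicted audiovisual noise is the unisensory noise divided by the square root of 2.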
Keywords: Maximum likelihood estimation; Multisensory integration; Multisensory perception; Optimality; Spatial ventriloquism
Year: 2019 PMID: 31082680 PMCID: PMC6864592 DOI: 10.1016/j.cortex.2019.03.026
Source DB: PubMed Journal: Cortex ISSN: 0010-9452 Impact factor: 4.027
Fig. 1. Trial structure for the audiovisual localization task (Panel A) and full experimental procedure (Panel B). A. A jittered pre-stimulus time period, in which participants fixated a cross in the middle, was followed by two intervals, each of which consisted of a stimulus and a subsequent blank period. The stimuli in the two intervals were either both auditory, both visual or both audiovisual (the latter is shown here). The first stimulus, the ‘standard’, was always presented in the middle. The second stimulus, the ‘probe’, was presented at one of thirteen locations along the azimuth. An audiovisual probe could be spatially congruent or incongruent (with a small spatial conflict between the auditory and visual signals, as shown here). After the second interval, two rectangles appeared on the screen to prompt participants to indicate via a two-choice key press whether the location of the probe was left or right of the standard. B. The experiment included three sessions on three separate days (vertically numbered in the figure). In session 1 we individually (for each participant) adjusted the probe locations and AV spatial disparity (Section 2.6.1.2) and the spatial perceptual reliability of the visual signal to match the spatial perceptual reliability of the auditory signal. The visual reliability was adjusted by changing the size of the visual stimulus (see Section 2.6.1.3). At the end of session 1, we validated that auditory and visual reliabilities were approximately equal (Section 2.6.1.4). In sessions 2 and 3, the probe locations, AV spatial disparity and visual stimulus size were set to the levels defined in session 1 and they were not further adjusted during the main experiment (but see Section 2.6.2.2). All tasks throughout all three sessions used the trial structure described in Panel A.
Fig. 2. Psychometric functions, sensory noise parameters and weights – group level results. A-B. Psychometric functions were fit to responses for A, V and AV (congruent and incongruent) conditions of each participant. Panels A and B show the group-average of those fitted psychometric functions (group mean ±SEM) obtained by computing the mean P(“probe right”) across participants for every point on the x-axis (where the stimulus levels were expressed relative to the individual's ): Auditory (green), visual (red) and audiovisual congruent (blue), audiovisual conflict (magenta; i.e., ‘visual left, auditory right’) and (cyan; i.e., ‘auditory left, visual right’) as solid lines. Using individuals' MLE-predicted parameters (see Eqs. (1), (2)) we also constructed MLE-predicted psychometric functions and subsequently computed group-averages for (panel A, black), (panel B, magenta) and (panel B, cyan) in dashed lines. The empirical and MLE-predicted psychometric functions are nearly identical for the congruent condition. Furthermore, for the AV incongruent conditions the MLE-predicted psychometric functions are nearly identical to the empirical AV congruent psychometric function, because auditory and visual variances were approximately matched. By contrast, the empirical psychometric functions for the audiovisual incongruent conditions are shifted sideways, indicating more (less) “right” responses when the probe's visual signal was presented on the right (left) of the auditory signal. If participants completely ignored one of the two sensory modalities then their incongruent PSEs are expected to be near (vertical dashed black lines). C. Bar plots show the across participants' mean (±1.96 SEM) of the sensory noise parameter for the auditory (green), visual (red) and empirical (dark blue) and MLE-predicted (light blue, Eq. (2)) audiovisual conditions. The individual sensory noise parameters were normalized, i.e., divided by the participant-specific before averaging (hence ). D. Bar plots show the across participants' mean (±1.96 SEM) of empirical and MLE-predicted auditory weights (Eqs. (1), (6)).
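The relation between sensory weights and the sideways shift of the incongruent psychometric functions in Fig. 2 can be sketched as follows. This is an illustration under stated assumptions, not the paper's fitting procedure: the psychometric function is taken to be a cumulative Gaussian, the conflict is split symmetrically around the nominal probe location (visual at `x - conflict/2`, auditory at `x + conflict/2`), and `sigma` is a single slope parameter that absorbs all noise. All names are hypothetical.

```python
import math

def p_probe_right(x: float, w_a: float, sigma: float, conflict: float = 0.0) -> float:
    """Predicted P("probe right") for a probe at nominal location x.

    Assumed convention: a positive conflict means 'visual left, auditory right'.
    """
    w_v = 1.0 - w_a
    # Weighted average of the auditory and visual signal locations.
    percept = w_a * (x + conflict / 2.0) + w_v * (x - conflict / 2.0)
    # Cumulative Gaussian written with the error function.
    return 0.5 * (1.0 + math.erf(percept / (sigma * math.sqrt(2.0))))

# Equal weights: the conflict cancels and the PSE stays at the standard (x = 0).
print(p_probe_right(0.0, w_a=0.5, sigma=1.0, conflict=4.0))
# Audition ignored (w_a = 0): the PSE moves to half the conflict (x = 2 here).
print(p_probe_right(2.0, w_a=0.0, sigma=1.0, conflict=4.0))
```

Under these assumptions the point of subjective equality sits at `(1 - 2 * w_a) * conflict / 2`, so matched reliabilities predict no shift, while visual overweighting shifts the curve towards the location implied by the visual signal.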
Fig. 3. Scatter plots of individual sensory noise parameters and weights (with 95% bootstrapped confidence intervals for each parameter). Black dashed lines along the diagonal indicate equality of the two parameters. A. Unisensory visual (y-axis) versus auditory (x-axis) sensory noise parameters. Dark diamonds indicate twelve participants with a significant difference between auditory and visual sensory noise (two-sided test). B. Empirical audiovisual sensory noise parameters versus the minimum of the auditory or visual sensory noise parameters. The colour of the diamonds' outlines indicates the least variable unisensory modality for each participant (auditory = green, visual = red). Dark diamonds indicate twenty-four participants with a significant multisensory variance reduction. C. Empirical versus MLE-predicted AV sensory noise parameters. Dark diamonds indicate five participants for whom the empirical variance is significantly greater than predicted by MLE. D. Empirical versus MLE-predicted auditory weights. Dark diamonds indicate twenty participants with significant visual overweighting.
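The caption mentions 95% bootstrapped confidence intervals but does not say how they were computed. As a generic illustration only, the sketch below shows a percentile bootstrap for an arbitrary statistic; the resampling scheme, the function name and the example data are assumptions, and the study may have used a different procedure (for example, one that refits the psychometric function on each resample).

```python
import random
import statistics

def percentile_bootstrap_ci(data, statistic, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for statistic(data)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(data)
    # Resample the data with replacement and recompute the statistic each time.
    estimates = sorted(
        statistic([data[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_boot)
    )
    lower = estimates[int((alpha / 2) * n_boot)]
    upper = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Hypothetical per-trial values; not data from the study.
sample = [float(v) for v in range(1, 21)]
print(percentile_bootstrap_ci(sample, statistics.mean))
```

A parameter would then be flagged as significantly different from a reference value when that value falls outside the interval.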