| Literature DB >> 29696314 |
Martijn Baart1,2, Jean Vroomen3.
Abstract
Perception of vocal affect is influenced by the concurrent sight of an emotional face. We demonstrate that the sight of an emotional face also can induce recalibration of vocal affect. Participants were exposed to videos of a 'happy' or 'fearful' face in combination with a slightly incongruous sentence with ambiguous prosody. After this exposure, ambiguous test sentences were rated as more 'happy' when the exposure phase contained 'happy' instead of 'fearful' faces. This auditory shift likely reflects recalibration that is induced by error minimization of the inter-sensory discrepancy. In line with this view, when the prosody of the exposure sentence was non-ambiguous and congruent with the face (without audiovisual discrepancy), aftereffects went in the opposite direction, likely reflecting adaptation. Our results demonstrate, for the first time, that perception of vocal affect is flexible and can be recalibrated by slightly discrepant visual information.Entities:
Keywords: Adaptation; Audiovisual integration; Cross-modal learning; Emotion perception
Mesh:
Year: 2018 PMID: 29696314 PMCID: PMC6010487 DOI: 10.1007/s00221-018-5270-y
Source DB: PubMed Journal: Exp Brain Res ISSN: 0014-4819 Impact factor: 1.972
Fig. 1Overview of the audiovisual exposure—auditory test design. Recalibration (a): three repetitions of a dynamic video of a ‘happy’ or ‘fearful’ speaker pronouncing an auditory sentence with ambiguous emotional auditory prosody were followed by an auditory-only test in which one out of three ambiguous sentence was rated for emotional affect. Exposure stimuli with ambiguous prosody were expected to induce assimilative aftereffects (recalibration) because the video shifts the interpretation of the ambiguous sound so that the audiovisual conflict is reduced. Adaptation (b): the procedure was the same as in a, except that the exposure stimuli had auditory sentences with non-ambiguous happy or fearful prosody that were congruent with the video. These stimuli were expected to induce contrastive after effects because the non-ambiguous nature of the sentences induces adaptation
Fig. 2Stimulus overview. The pitch contour of the seven sentences are indicated by the blue line (on a 75–400 Hz scale), and are superimposed on the spectrograms (0–5000 Hz, 50 dB dynamic range). Relative timing of the auditory sentence is indicated by the text in the spectrograms of the ‘fearful’ and ‘happy’ continuum endpoints. The underlined letters correspond to the video frames that are provided above/below the spectrograms
Fig. 3Group-averaged valence ratings of the voice of a the auditory-only 7-step continuum, b the audio-only test tokens after exposure to audiovisual stimuli with ambiguous and slightly incongruent prosody, c the audio-only test tokens after exposure to audiovisual stimuli with non-ambiguous and congruent prosody, d the audiovisual exposure stimuli. Error bars represent 95% Confidence Intervals of the mean