
Vocal threat enhances visual perception as a function of attention and sex.

Annett Schirmer, Maria Wijaya, Esther Wu, Trevor B Penney.

Abstract

This pre-registered event-related potential study explored how vocal emotions shape visual perception as a function of attention and listener sex. Visual task displays occurred in silence or with a neutral or an angry voice. Voices were task-irrelevant in a single-task block, but had to be categorized by speaker sex in a dual-task block. In the single task, angry voices increased the occipital N2 component relative to neutral voices in women, but not men. In the dual task, angry voices relative to neutral voices increased occipital N1 and N2 components, as well as accuracy, in women and marginally decreased accuracy in men. Thus, in women, vocal anger produced a strong, multifaceted visual enhancement comprising attention-dependent and attention-independent processes, whereas in men, it produced a small, behavior-focused visual processing impairment that was strictly attention-dependent. In sum, these data indicate that attention and listener sex critically modulate whether and how vocal emotions shape visual perception.


Keywords: ERP; emotion; sex differences; visual attention; vocal affect

Year: 2019 PMID: 31216037 PMCID: PMC6778830 DOI: 10.1093/scan/nsz044

Source DB: PubMed Journal: Soc Cogn Affect Neurosci ISSN: 1749-5016 Impact factor: 3.436

Introduction

We often find ourselves influenced by the stimuli we intend to ignore—especially if these stimuli are affectively relevant and change the way we feel (Lui ; Min and Schirmer, 2011; for reviews, see Inzlicht ; Pessoa 2015). For example, driving a car in heavy traffic is harder when passengers talk as compared to when they are silent (Gaspar ), and this difference may be greater for verbal exchanges that are confrontational as compared to benign. Here, we addressed the role of vocal emotions for visual perception. Moreover, we explored the extent to which expected emotion effects require attentive voice processing, how exactly they shape visual perception and whether they differ between sexes. There is much evidence that the affective value of task-irrelevant stimuli modulates ongoing mental processes (Öhman and Soares, 1994; Globisch ; for a meta-analysis, see Schirmer, 2018). For example, this was shown by Vuilleumier and colleagues who presented visual arrays that paired two houses and two faces (Vuilleumier ). Color frames around the stimuli indicated which pair was task-relevant. On face trials, participants judged whether the faces were identical, and on house trials, they judged whether the houses were identical. Despite being task-irrelevant, facial expressions modulated brain activity during both face and house trials. Specifically, the amygdala and fusiform gyrus, two structures implicated in face perception, were more strongly activated by fearful expressions as compared with neutral expressions. These and similar findings prompted some to conclude that emotion processing is automatic (for a review, see Pourtois ). However, others have challenged this conclusion (Pessoa, 2008) and argued instead that an influence of task-irrelevant emotions depends on the availability of processing resources and an undemanding primary task. A study supporting this latter position presented an emotional or neutral face flanked by two bars (Pessoa ). 
In different experimental blocks, study participants indicated bar orientation (same/different) or face sex (male/female). Differences in brain activity between emotional and neutral faces were observed during the face task, but not during the bar task, suggesting that the processing of emotional information is ‘under top-down control’. The conflict ignited by this original work inspired much subsequent research. Some of this research leveraged the spatial sensitivity of functional magnetic resonance imaging (fMRI), which allows for a fairly detailed differentiation of emotion systems (Straube ; Schindler ). Additionally, research was accumulated using the electroencephalogram (EEG) and its event-related potential (ERP) technique. Compared to fMRI, EEG/ERP has a better temporal resolution and is thus better suited for dissociating potentially fast-changing effects (e.g. Schirmer ; Pourtois ; Schupp ; Kissler ; Pourtois ; Brosch and Wieser, 2011; Schindler ). Accumulating findings from both fMRI and EEG/ERP paint a complex picture suggesting that automaticity is a differentiated construct that is continuous rather than discrete and that depends on paradigm and methodological choices (for an example of the role of low-level visual features, see Schindler ). Moreover, extant work highlights the need to consider automaticity more specifically within a particular task context. The context that was of interest here was modeled on everyday experiences in which individuals perform a demanding visual task (e.g. driving a car) on an auditory backdrop (e.g. passenger conversations). Specifically, we asked how emotional aspects of the auditory backdrop become relevant for modulating visual processes and responses. To the best of our knowledge, this question has been tackled by only a couple of fMRI studies (Mothes-Lasch , 2012). Moreover, these studies relied on small samples and failed to consider important individual variables such as sex. 
In fact, there is now much evidence suggesting that, compared with men, women are more sensitive to emotional information if that information is task-irrelevant (Proverbio ; van den Brink ; Schirmer ; Proverbio and Galli, 2016; Schirmer and Gunter, 2017). Thus, sex is likely to be important for ongoing efforts at understanding emotion processing automaticity. With these points in mind, we designed an ERP study in which visual task displays occurred in silence or were accompanied by an angry or neutral voice. In the single-task block, voices were irrelevant, whereas in the dual-task block, speaker sex had to be categorized. Of interest were behavioral responses as well as two negative components in the visual ERP. The first component, the N1, typically peaks at ~160 ms following stimulus onset with larger amplitude to physically salient, emotional or cued stimuli (Mangun, 1995; Herrmann and Knight, 2001; Schneider ). As such, N1 is sensitive to bottom-up mechanisms that recruit attentional resources for stimulus perception. The second component, N2, peaks between 200 and 350 ms and comprises several subcomponents that have been linked to top-down mechanisms and stimulus expectation (Folstein and Van Petten, 2008; Schneider ). In their original form, N1 and N2 reflect responses to the visual field as a whole, but when one subtracts voltages recorded from electrode sites ipsi-lateral to the visual target from those recorded contralaterally, they reveal information about hemifield processing and spatial orienting (Mangun, 1995; Woodman and Luck, 1999). The behavioral measures examined here comprised reaction times and the sensitivity index d′ and were expected to help specify whether and how emotional sounds modulate visual task performance. We tested the following three predictions. 
First, if vocal affect modulates visual processing automatically, then differences in the N1, N2 and behavioral responses to angry voices as compared with neutral voices should be present in both the dual and the single task. If, however, vocal affect modulates visual processing as a function of voice-directed attention, then effects on N1, N2 and behavioral responses should be present in the dual task only. Second, vocal background should impair target processing when compared with silence, and this impairment should be largest for angry voices. This prediction was derived from evidence that (i) irrelevant context distracts unless it provides cues (e.g. spatial location) to target processing and that (ii) emotions augment context effects (Brosch ; Gaspar ). Third, based on established sex differences (reviewed in Schirmer, 2013), we expected effects of vocal affect to be more pronounced and more automatic in women than in men.

Methods

Participants

We determined our sample size a priori based on previous research reporting sex differences and counterbalancing needs associated with the paradigm (Schirmer , 2018; Schirmer and McGlone, 2018). Sample size, basic methodology and study hypotheses were pre-registered at the Open Science Framework and can be inspected here: https://osf.io/ur6fk/?view_only=bea9c744d0824b6aaea0ad08588a6b04. Please note that when visiting this link, you must click on the ‘View Registration Form’ button to see our registered document. Apart from the hypotheses described here, the registered document mentions stimulus difficulty as an additional variable. This variable was conceived with the intention that slightly and starkly asymmetrical crosses would be presented in separate blocks. However, due to a miscommunication with the programmer, they were presented within the same block. Therefore, the difficulty variable was non-orthogonal, and we excluded it from our analyses as we did not know how to conceptualize the difficulty associated with symmetrical crosses. We invited 53 participants to this study. The data from five participants had to be discarded due to a data recording issue (N = 1) or because of their failure to maintain fixation (N = 4). Twenty-four of the remaining participants were female with a mean age of 22.7 years (s.d. 4.4) and 24 were male with a mean age of 21.6 years (s.d. 2.7). Participants reported normal or corrected-to-normal vision and normal hearing. They received HKD 70 for 1 h of their time.

Stimuli

This study used visual and auditory stimuli. The visual stimuli were search displays comprising eight equally-spaced crosses presented along a circle (radius = 4.5° of visual angle) around a central fixation point. Each cross measured ~1° of visual angle in width and height. On a given trial, half the crosses were symmetrical (+) and half were asymmetrical (†). For the asymmetrical crosses, the difference in length between the top and bottom segments of the cross was either 30 or 60% of its total height. Crosses in the search display occurred on a gray background and could be blue or pink—if one cross was blue, the others were pink, and vice versa. Singleton color was balanced across trials and conditions, and identified the target for which participants decided whether it was symmetrical or asymmetrical. Targets occurred on the left and right half of the circle on 50% of the trials, respectively. To reduce neuronal adaptation over time (Luck and Hillyard, 1994), the colors of the search display alternated across trials, i.e. a pink singleton on one trial was followed by a blue singleton on the next trial. Further, we added a small, random jitter to the exact position of each cross (≤10° subtended angle from the screen center) such that cross positions varied slightly from trial to trial. Auditory stimuli were selected from the Montreal Affective Voices (MAV) (Belin ) and from stimuli previously recorded in our laboratory. They consisted of 24 angry and 24 neutral exclamations of the syllable ‘ah’, half of which were spoken by a female and half by a male speaker. An independent group of listeners normed the selected voices. Specifically, 22 listeners (11 female) rated the MAV stimuli, while 20 listeners (10 female) rated the in-house stimuli by indicating their emotion (What is the emotion expressed?) and scoring emotion intensity and arousal on scales ranging from 1 (very weak) to 4 (very strong). The rating results informed stimulus selection. 
The selected angry and neutral voices differed in intensity (t(46) = 6.39, P < 0.001, Mangry = 3.42, s.d.angry = 0.65, Mneutral = 2.49, s.d.neutral = 0.28) and arousal (t(46) = 8.73, P < 0.001, Mangry = 3.56, s.d.angry = 0.63, Mneutral = 2.33, s.d.neutral = 0.30), but not in identification accuracy (P > 0.25). Their sound intensity was normalized and sound durations ranged between 228 and 981 ms. A Bayesian t-test using a joint conjugate prior indicated that mean durations did not meaningfully differ between the angry and neutral conditions (t(49) = 0.521, P = 0.604, CI: −0.103 to 0.176).
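The rating comparison above can be roughly cross-checked from the reported summary statistics alone. The sketch below is an illustration under stated assumptions, not the authors' analysis: it assumes 24 voices per emotion (as stated for the stimulus set), an equal-variance independent-samples t-test (consistent with the reported 46 degrees of freedom), and it takes 0.28 as the spread of the neutral intensity ratings. The function name `pooled_t` is hypothetical. Because the inputs are rounded to two decimals, the recomputed values only approximate the reported ones.

```python
import math


def pooled_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Equal-variance independent-samples t statistic from summary statistics."""
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / df
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / standard_error, df


# Rounded means and standard deviations as reported for the rating study.
t_intensity, df_intensity = pooled_t(3.42, 0.65, 24, 2.49, 0.28, 24)
t_arousal, df_arousal = pooled_t(3.56, 0.63, 24, 2.33, 0.30, 24)

print(f"intensity: t({df_intensity}) = {t_intensity:.2f}")
print(f"arousal:   t({df_arousal}) = {t_arousal:.2f}")
```

Both recomputed statistics land close to the reported t(46) values; small deviations are expected from rounding in the published means and standard deviations.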

Paradigm

Task displays were presented on a 24-inch LCD monitor at a viewing distance of 90 cm. The monitor’s refresh rate was 60 Hz. During both the single- and the dual-task block, trials started with a fixation dot (radius 0.3°) that lasted for 0.9–1.1 s (randomly selected from a uniform distribution) and was followed by a 1 s search display. Participants indicated whether or not the target was symmetrical by pressing one of two buttons on the computer keyboard. The next trial started immediately after a response was made or after 5 s, whichever came first. For two thirds of the trials, search displays were accompanied by an angry or a neutral voice played over speakers positioned to the left and right of the screen. Voices were always presented binaurally. For the remaining trials, search displays were presented in silence (Figure 1).
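To give a sense of the physical display geometry, the visual angles reported for the stimuli can be converted to on-screen distances at the 90 cm viewing distance. This is a minimal sketch using standard trigonometry; the function names and the 22.5° starting angle of the first cross are assumptions made for illustration (the text states only that the eight crosses were equally spaced and that targets fell in the left or right half of the circle), and the small positional jitter is omitted.

```python
import math

VIEWING_DISTANCE_CM = 90.0  # viewing distance reported in the text


def extent_to_cm(angle_deg, distance_cm=VIEWING_DISTANCE_CM):
    """Size of an object centered on the line of sight that subtends angle_deg."""
    return 2 * distance_cm * math.tan(math.radians(angle_deg) / 2)


def eccentricity_to_cm(angle_deg, distance_cm=VIEWING_DISTANCE_CM):
    """Distance from fixation of a point at an eccentricity of angle_deg."""
    return distance_cm * math.tan(math.radians(angle_deg))


def cross_positions(n_items=8, radius_deg=4.5, start_deg=22.5):
    """(x, y) positions in cm of equally spaced items on a circle.

    start_deg is a hypothetical choice that keeps all items off the
    vertical midline, so four fall in each visual hemifield.
    """
    radius_cm = eccentricity_to_cm(radius_deg)
    step = 360.0 / n_items
    positions = []
    for i in range(n_items):
        theta = math.radians(start_deg + i * step)
        positions.append((radius_cm * math.cos(theta), radius_cm * math.sin(theta)))
    return positions


circle_radius_cm = eccentricity_to_cm(4.5)  # radius of the search circle
cross_size_cm = extent_to_cm(1.0)           # width and height of one cross
layout = cross_positions()

print(f"circle radius: {circle_radius_cm:.2f} cm, cross size: {cross_size_cm:.2f} cm")
```

Under these assumptions the search circle has a radius of roughly 7 cm on the screen and each cross measures about 1.6 cm.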

Fig. 1

Research paradigm.

During the single-task block, participants focused on the visual task only. During the dual-task block, they performed the visual task together with an auditory task that required them to indicate whether or not the voice, if present, was male. Participants responded to the visual task with the index and middle finger of one hand and to the auditory task with the index and middle finger of the other hand. For a given participant, hand and finger assignment for the visual task was constant across blocks. Across participants, block order and hand and finger assignment were counterbalanced. At the start of the experiment, participants first gave informed consent and were then prepared for the EEG recording. Subsequently, they were briefed about both tasks. Participants who performed the visual-only block first were briefed about the visual task followed by the auditory task, while participants who performed the visual–auditory task block first were briefed about the auditory task before the visual task. Participants were asked to fixate on the central fixation dot and to not shift their eyes to the crosses. Moreover, they were told to respond as quickly as possible, but without sacrificing response accuracy. Each task block comprised 576 trials distributed equally among the three sound conditions (i.e. angry, neutral and silent) and the two visual hemi-fields. We thus had 96 left and 96 right target trials for each cell in the design. An experimental session lasted about 1 h and 20 min.

EEG recording and analysis

The EEG was recorded using a 64-channel EEGO system from ANT. Electrodes were embedded in a cap according to the modified 10–20 system. Five additional electrodes were placed at the two outer canthi, above and below the left eye and the nose. The data were sampled at 500 Hz with a hardware defined non-linear anti-aliasing filter that attenuated frequencies below 183 Hz by -6dB and with CPz as an online reference. Data processing was done in MATLAB R2016B (The MathWorks, Inc., Natick, MA, USA) and EEGLAB 14.1.2.B (Delorme and Makeig, 2004). The data were re-referenced to the average of all electrodes, high-pass filtered at 0.1 Hz (0.1 Hz transition band; −6 dB/octave), low-pass filtered at 30 Hz (7.5 Hz transition band; −6 dB/octave) and epoched by centering a 2 s window around stimulus onset. The resulting epochs were visually scanned for non-typical artifacts caused by drifts or muscle movements, and epochs containing such artifacts were removed. The data were then subjected to an automatic rejection procedure that removed additional epochs in which the HEOG exceeded 100 μV or the VEOG exceeded 32 μV within the first 300 ms following stimulus onset. The HEOG cut-off translated to 2° of visual angle and thus less than half the radius of the circle (4.5°) on which visual targets appeared. Trials with early HEOG and VEOG movements were excluded in this manner because during these trials, the visual displays would not have been properly processed. Moreover, visual processing would have been suppressed by the eye movement (Bristow ), thus confounding early visual ERPs like N1 and N2. To prepare our data for an independent component analysis (ICA), we applied a 1 Hz high-pass filter that removed slow drifts and improved component decomposition. The component structure resulting from the ICA was then applied to the original epoched data set with the 0.1 to 30 Hz filter setting (Winkler ). 
Components reflecting the remaining horizontal and vertical eye movements were removed and the data back-projected from component space into EEG channel space. Another automatic rejection procedure was applied that removed epochs in which scalp channels exceeded 100 μV. Subsequently, the data were submitted to a current source density (CSD) transformation using the CSD tool box (Kayser and Tenke, 2003) with its default settings. This was followed by a trial number matching procedure whereby the condition with the lowest trial number was identified, and the same number of trials was randomly drawn from the other conditions. Final trial numbers ranged from 33 to 182 per condition and participant due to the fact that we lost many trials for some participants who had difficulties maintaining central fixation. Across participants, each condition averaged to 117 trials (s.d. 40.6). For four participants, one of the channels analyzed here required interpolation. Following this preprocessing protocol, we re-epoched the data using a −100 to 500 ms time window and applied baseline correction using the 100 ms period before stimulus onset. Early ERP components previously linked to visual attention, including N1 and N2, were of primary interest. We quantified these components in two ways: (1) by averaging over left hemisphere electrodes PO7, PO5 and O1 and right hemisphere electrodes PO8, PO6 and O2 and (2) by subtracting channels ipsi-lateral from those contralateral to the target (i.e. PO7–PO8, PO5–PO6 and O1–O2 for right targets; PO8–PO7, PO6–PO5 and O2–O1 for left targets). 
Based on visual inspection of component peaks and guided by previous work (Heinze ; Woodman and Luck, 1999; Zani ), we computed mean voltages from resulting traces in two time windows centered around the N1 (140–190 ms) and N2 (230–270 ms) peaks, respectively. Please note that the N2 window overlapped with the shortest sound offsets (~228 ms). However, given that the offset duration did not differ between conditions, we considered a possible influence of sound offsets on the present N2 modulations to be negligible.
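The last steps of the pipeline described above (trial number matching, baseline correction, window averaging and the contralateral-minus-ipsilateral subtraction) can be sketched in NumPy. This is an illustrative sketch only: the study used MATLAB and EEGLAB, and every function name, the array layout (trials × time samples of an already region-averaged signal) and the toy data below are assumptions rather than the authors' code.

```python
import numpy as np

SFREQ = 500.0               # Hz, sampling rate reported in the text
TMIN = -0.1                 # s, start of the re-epoched window
N1_WINDOW = (0.140, 0.190)  # s, N1 analysis window
N2_WINDOW = (0.230, 0.270)  # s, N2 analysis window


def match_trial_counts(trials_by_condition, rng):
    """Randomly subsample each condition to the size of the smallest one."""
    n_min = min(len(trials) for trials in trials_by_condition.values())
    matched = {}
    for name, trials in trials_by_condition.items():
        keep = np.sort(rng.choice(len(trials), size=n_min, replace=False))
        matched[name] = trials[keep]
    return matched


def baseline_correct(epochs, sfreq=SFREQ, tmin=TMIN):
    """Subtract the mean of the pre-stimulus samples from every epoch."""
    n_baseline = int(round(-tmin * sfreq))
    return epochs - epochs[..., :n_baseline].mean(axis=-1, keepdims=True)


def window_mean(epochs, window, sfreq=SFREQ, tmin=TMIN):
    """Mean voltage across the samples inside window (seconds, inclusive)."""
    start = int(round((window[0] - tmin) * sfreq))
    stop = int(round((window[1] - tmin) * sfreq)) + 1
    return epochs[..., start:stop].mean(axis=-1)


def contra_minus_ipsi(left_roi, right_roi, target_side):
    """Contralateral minus ipsilateral signal for each trial.

    For right targets the left-hemisphere region is contralateral,
    for left targets the right-hemisphere region is.
    """
    target_right = (np.asarray(target_side) == "right")[:, np.newaxis]
    contra = np.where(target_right, left_roi, right_roi)
    ipsi = np.where(target_right, right_roi, left_roi)
    return contra - ipsi


# Toy data: 4 trials, 301 samples (-100 to 500 ms at 500 Hz).
left_roi = np.zeros((4, 301))
right_roi = np.ones((4, 301))
sides = ["left", "right", "left", "right"]

lateralized = contra_minus_ipsi(left_roi, right_roi, sides)
n1pc = window_mean(lateralized, N1_WINDOW)

rng = np.random.default_rng(0)
matched = match_trial_counts(
    {"angry": np.arange(50), "neutral": np.arange(33), "silent": np.arange(40)},
    rng,
)
```

In the toy data the right-hemisphere signal is constantly one unit higher than the left, so the lateralized value is positive on left-target trials and negative on right-target trials, which makes the sign convention of the subtraction easy to verify.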

Results

Behavioral data

Behavioral results are illustrated in Figure 2. d′ scores were calculated for visual response accuracy by subtracting the normalized probability of falsely categorizing a cross as symmetrical from the normalized probability of correctly categorizing a cross as symmetrical. The resultant values served as the dependent variable in an ANOVA with task and sound as repeated measure factors and sex as a between-subjects factor. This ANOVA revealed a marginal interaction of sound and sex (F(2,92) = 2.61, P = 0.079, ηp2 = 0.054) and a significant interaction of task, sound and sex (F(2,92) = 3.44, P = 0.036, ηp2 = 0.069). All other effects were non-significant (Ps > 0.25). We pursued the three-way interaction by analyzing each task separately. In the dual task, the sound by sex interaction was significant (F(2,92) = 5.1, P = 0.008, ηp2 = 0.099); vocal expressions affected the sensitivity of visual categorizations significantly in women (F(2,46) = 3.51, P = 0.038, ηp2 = 0.132) and marginally in men (F(2,46) = 2.81, P = 0.07, ηp2 = 0.109). Women performed better on angry trials compared with neutral trials (F(1,23) = 4.91, P = 0.037, ηp2 = 0.176) without performance differences between angry and silent or neutral and silent trials (both Ps > 0.142). In contrast, men performed better on both neutral (F(1,23) = 4.91, P = 0.037, ηp2 = 0.218) and silent (F(1,23) = 4.24, P = 0.051, ηp2 = 0.156) trials compared with angry trials without performance differences between neutral and silent trials (P > 0.25). In the single task, both the sound effect and the sound by sex interaction were non-significant (Ps > 0.25).
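The sensitivity index described at the start of this section is the difference between the z-transformed hit rate (symmetrical targets correctly called symmetrical) and the z-transformed false-alarm rate (asymmetrical targets called symmetrical). A minimal sketch follows, using only the Python standard library. The log-linear correction that keeps rates away from 0 and 1 is an assumption added here for numerical safety; the text does not say whether or how extreme rates were handled, and the function name is hypothetical.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """Sensitivity index d' = z(hit rate) - z(false-alarm rate).

    Adds 0.5 to each count and 1 to each total (log-linear correction,
    an assumption of this sketch) so the inverse normal CDF stays finite.
    """
    z = NormalDist().inv_cdf
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    return z(hit_rate) - z(fa_rate)


# Hypothetical counts for one participant and condition.
chance_level = d_prime(50, 50, 50, 50)
good_observer = d_prime(80, 20, 20, 80)
print(f"chance: {chance_level:.2f}, good observer: {good_observer:.2f}")
```

An observer who responds "symmetrical" equally often to both cross types obtains a d′ of zero, whereas higher hit rates combined with lower false-alarm rates yield larger positive values.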

Fig. 2

Behavioral results. Mean d′ scores and reaction times are shown as a function of task, sound and sex. Error bars reflect the within-subject standard error.
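The caption does not specify how the within-subject standard error was computed. One common approach, shown here purely as an assumption, is the normalization usually attributed to Cousineau with Morey's correction: each participant's own mean is removed before the standard error is computed per condition, so that stable differences between participants do not inflate the error bars. The function name and the toy data are hypothetical, and in a mixed design such as this one the procedure would presumably be applied within each group.

```python
import numpy as np


def within_subject_se(data):
    """Per-condition standard error after removing between-subject variability.

    data: array of shape (n_subjects, n_conditions).
    """
    data = np.asarray(data, dtype=float)
    n_subjects, n_conditions = data.shape
    # Center each participant on the grand mean (Cousineau normalization).
    normalized = data - data.mean(axis=1, keepdims=True) + data.mean()
    se = normalized.std(axis=0, ddof=1) / np.sqrt(n_subjects)
    # Morey's correction for the bias introduced by the normalization.
    return se * np.sqrt(n_conditions / (n_conditions - 1))


# Toy data: three participants with very different overall levels
# but an identical pattern across three conditions.
scores = np.array([[1.0, 2.0, 3.0], [11.0, 12.0, 13.0], [21.0, 22.0, 23.0]])
se_within = within_subject_se(scores)
se_between = scores.std(axis=0, ddof=1) / np.sqrt(scores.shape[0])
```

In the toy data the conventional standard error is large because participants differ in overall level, while the within-subject standard error is zero because every participant shows the same condition effect.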

RTs for correctly categorized targets were submitted to an ANOVA with task and sound as repeated measure factors and sex as a between-subjects factor. This revealed the main effects of task (F(1,46) = 126.43, P < 0.0001, ηp2 = 0.733) and sound (F(2,92) = 8.51, P < 0.001, ηp2 = 0.156), as well as a task by sound interaction (F(2,92) = 13.75, P < 0.0001, ηp2 = 0.23). Follow-up analyses indicated that voices affected performance in the dual task (F(2,92) = 12.38, P < 0.0001, ηp2 = 0.212), but not the single task (P > 0.25). In the dual task, angry (F(1,46) = 13.48, P < 0.001, ηp2 = 0.227) and neutral (F(1,46) = 16.31, P < 0.001, ηp2 = 0.262) expressions slowed down RTs relative to silence. However, angry and neutral trials were performed with similar speeds (P > 0.25). A marginal effect of sex (F(1,46) = 3.08, P = 0.086, ηp2 = 0.063) suggested that women tended to respond more slowly than men. All other effects were non-significant (all Ps > 0.197).
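The effect sizes reported with each F test can be recovered from the F value and its degrees of freedom, because partial eta squared equals F × df_effect / (F × df_effect + df_error). The short sketch below applies this identity to the three RT effects reported above as a consistency check; the function name is hypothetical.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return f_value * df_effect / (f_value * df_effect + df_error)


# (label, F, df_effect, df_error, reported partial eta squared)
rt_effects = [
    ("task", 126.43, 1, 46, 0.733),
    ("sound", 8.51, 2, 92, 0.156),
    ("task x sound", 13.75, 2, 92, 0.230),
]

recomputed = {
    label: partial_eta_squared(f, df1, df2) for label, f, df1, df2, _ in rt_effects
}
for label, _, _, _, reported in rt_effects:
    print(f"{label}: recomputed {recomputed[label]:.3f}, reported {reported:.3f}")
```

The recomputed values agree with the reported ones to three decimals, and the same check can be applied to any other F test in the Results to confirm that a statistic and its degrees of freedom are consistent with one another.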

Event-related potentials

Electrophysiological results are illustrated in Figures 3 and 4. Visual ERPs were explored in two ways. First, we examined components of interest for the entire visual field to determine general effects of voices on visual attention. In a second step, we analyzed the ERP difference between target and non-target hemifields to determine whether voices modulate spatial orienting to targets.

Fig. 3

ERP traces and maps. Mean ERP voltages were derived by separately averaging signals for left occipital electrodes (PO7, PO5 and O1), right occipital electrodes (PO8, PO6 and O2) and the voltage difference between contra- and ipsi-lateral occipital electrodes. Time windows for statistical analysis are marked by the shaded areas. Maps illustrate the mean voltages and condition differences for the statistical analysis windows.

Fig. 4

ERP mean amplitudes. Mean voltages in the N1 and N2 analysis windows are shown as a function of task, sound, sex and region. Error bars reflect the within-subject standard error.

Our first set of analyses revealed effects for both N1 and N2. The N1 was modulated by main effects of task (F(1,46) = 22.5, P < 0.0001, ηp2 = 0.328) and sound (F(2,92) = 6.73, P = 0.002, ηp2 = 0.128) as well as interactions of task, sound and laterality (F(2,92) = 3.11, P = 0.049, ηp2 = 0.063) and task, sound, laterality and sex (F(2,92) = 6.12, P = 0.003, ηp2 = 0.117). We pursued the latter interaction by examining data from men and women separately. For men, we found that the interaction of task, sound and laterality was non-significant. Men showed a task effect only (F(1,23) = 11.87, P = 0.002, ηp2 = 0.34), indicating that N1 was larger in the dual task compared with the single task. No other effects reached the traditional significance threshold (all Ps > 0.135). For women, we observed task (F(1,23) = 10.69, P = 0.003, ηp2 = 0.317) and sound main effects (F(2,46) = 4.99, P = 0.011, ηp2 = 0.178) as well as an interaction of task, sound and laterality (F(2,46) = 7, P = 0.002, ηp2 = 0.233). Over the left occipital region, the sound effect differed between tasks (F(2,46) = 6.65, P = 0.003, ηp2 = 0.224). In the single task, it was non-significant (P > 0.25). 
However, in the dual task (F(2,46) = 9.05, P < 0.001, ηp2 = 0.282), N1 amplitudes were larger for silent (F(1,23) = 12.63, P = 0.002, ηp2 = 0.354) and angry (F(1,23) = 11.52, P = 0.002, ηp2 = 0.113) trials compared with neutral trials. Silent and angry trials did not differ (P = 0.101). Over the right occipital region, the sound effect and the sound by task interaction were non-significant (Ps > 0.128). There was only a task effect indicating that, similar to men, N1 amplitudes were larger in the dual task compared with the single task. Analysis of N2 revealed a sound main effect (F(2,92) = 8.15, P < 0.001, ηp2 = 0.15) and a sound by sex interaction (F(2,92) = 3.95, P = 0.022, ηp2 = 0.079; all other Ps > 0.109). The sound effect was significant in women (F(2,46) = 8.77, P < 0.001, ηp2 = 0.276), but not in men (P = 0.474). In women, N2 amplitudes were larger for angry trials compared with neutral trials (F(1,23) = 4.45, P = 0.046, ηp2 = 0.162) and for neutral trials compared with silent trials (F(1,23) = 5.87, P = 0.024, ηp2 = 0.203). We explored the target-directed attention effects for both N1 and N2 by computing their posterior-contralateral (pc) indices. For the N1pc, all effects were non-significant. For the N2pc, there was a marginal effect of sex (F(1,46) = 3.24, P = 0.078, ηp2 = 0.066) suggesting that the N2pc tended to be larger in women than in men. Additionally, a significant effect of task (F(1,46) = 7.68, P = 0.008, ηp2 = 0.143) indicated that spatial attention toward the target was greater or more effectively allocated in the single task compared with the dual task. All other effects were non-significant (Ps > 0.207).

Discussion

Here we explored the role of attention in enabling effects of vocal threat on visual perception. Additionally, we characterized the nature of these effects and how they unfold in women compared with men. In the following, we will discuss these three points in turn.

Are auditory emotion effects on visual processing automatic or controlled?

Much previous work has pursued the interaction between emotion and attention. Of particular interest here are studies examining the cross-modal effect of auditory emotions on visual processing using fMRI (Mothes-Lasch , 2012). Similar to the present paradigm, they presented a visual categorization task against the backdrop of angry and neutral voices (Mothes-Lasch , 2012). Differences in brain activity between these conditions depended on voices being task-relevant, suggesting that auditory affective processing or, more specifically, the influence of auditory affect on visual processing requires attention. Due to fMRI’s sluggish nature, however, these claims have been challenged and studies using a temporally more sensitive approach have been called for (Brosch and Wieser, 2011). Here we adopted such an approach and, to increase methodological convergence, used the paradigm implemented in earlier fMRI work (Mothes-Lasch , 2012). This, however, meant that volume conduction could compromise the interpretation of occipital scalp effects. Specifically, the concurrent presentation of visual and auditory stimuli could be expected to produce effects extending to auditory and visual recording sites, respectively. We addressed this problem by applying a CSD transformation, thus making recorded voltages reference-free and reducing global effects while enhancing local effects linked to underlying cortical tissue (Kayser and Tenke, 2015). Our CSD results revealed both task-dependent (N1, behavior) and task-independent (N2) effects of vocal affect. As such, they partially disagree with previous fMRI results (Mothes-Lasch , 2012). Moreover, they highlight that both more and less automatic processes may be observed concurrently from the same paradigm with a technique that better captures how these processes unfold in time. 
In light of our effects, we conclude that affective influences under automatic processing conditions occurred later than affective influences under controlled processing conditions. In other words, paying attention to affective voices temporally facilitated their integration with attended visual input. Possibly, in the absence of focused attention, the affective processing of auditory signals is too slow or not salient enough to impact on early bottom-up representations in other modalities. Notably, our conclusion contrasts with previous electrophysiological evidence on unimodal perception. Specifically, intracranial recordings from the amygdala (Pourtois ) using a paradigm similar to that of Vuilleumier and colleagues (Vuilleumier ) revealed a task-independent early emotion effect starting around 140 ms and a task-dependent later effect starting around 750 ms. Likewise, two studies adopting the paradigm of Pessoa and colleagues (Pessoa ) using magnetoencephalography (Luo ) or combining EEG and fMRI (Müller-Bardorff ) found task-independent effects after 40 ms and task-dependent effects after 280 ms. Thus, future discussions of the relation between emotion and attention must carefully consider the type of processes (e.g. uni- vs cross-modal representations) for which task-dependent and independent effects are being assessed.

How do vocal expressions shape visual perception?

How auditory background shapes visual perception has been of great interest to applied psychologists (Gaspar ). Moreover, their work revealed impairment effects which could be replicated here. Specifically, reaction times were longer for voice trials relative to silence when voices were task-relevant. Although part of the latter task effect may arise from the dual- vs single-hand motor demands, one may reasonably venture that bi- vs unimodal cognitive demands also played a role. In line with this, (neutral) task-relevant but not task-irrelevant voices reduced N1 to visual targets, indicating that the additional demand of attending to a speaker hampered bottom-up visual representations. Importantly, the present voice effects differed as a function of expression and, unexpectedly, were not generally debilitating. In the single task, N2 was larger for angry trials compared with neutral trials, suggesting that vocal threat benefited associated top-down mechanisms of visual attention. In the dual task, this N2 effect was complemented by a larger N1 and more accurate visual categorizations for angry trials compared with neutral trials. Notably, there were neither impairment effects nor enhancement effects on target-hemifield ERPs, suggesting that spatial orienting was unaffected. Taken together, we show that unrelated sounds impair aspects of visual performance, but that impairments may be compensated and accompanied by processing benefits as a function of sound affect. Moreover, vocal anger appears to boost a fairly automatic mechanism reflected by N2. Its bilateral topography further points to a modulation of left-lateralized local and right-lateralized global perceptual processes supporting item-specific and display-general representations, respectively (Fink ). Additionally, a left-lateralized attention-dependent mechanism reflected by N1 may promote local-over-global processing. 
Together, both mechanisms seem to enhance resource allocation across an individual’s visual field in a catch-all fashion, thus benefiting the integrity of both target and non-target representations.

Do the sexes differ?

As expected, we found effects of vocal affect to be more pronounced in women than in men. Specifically, the ERP and behavioral results described above were significant in women only. Men, in contrast, showed only marginal accuracy differences between the voice conditions, and angry voices tended to impair rather than enhance visual performance. These findings fit well with previous evidence that women are more likely than men to process social signals (Proverbio ; van den Brink ) and emotional expressions that are task-irrelevant. For example, in women, but not men, emotional faces prime lexical decisions (Schirmer ), and vocal emotional oddballs enhance the change detection response in the ERP (Schirmer ). Additionally, women are more likely than men to show enhanced orienting toward an angry voice compared with a neutral voice (Burra ). Because these sex differences typically disappear when social signals and their emotions have to be processed in order to perform an experimental task (Schirmer , 2006), they likely reflect differences in how automatically emotions are accessed. Compared with men, women may require less effort or mental resources for establishing affective or emotion relevance.

Conclusions

Many visual tasks, such as driving, occur against an auditory backdrop like the voices of other people. Exploring the influence of such voices on primary task performance, we found that although voices in general impaired visual performance, threat, compared with neutral affect, compensated for and partially reversed these effects as a function of attention and sex. When the task focused on visual information only, angry voices relative to neutral voices enhanced the neural correlates of visual attention, without consequences for behavior, in women but not in men. When both visual and auditory information were in focus, anger elicited more substantial neural benefits as well as more accurate visual categorization in women only. In sum, we found that automaticity in emotion is not an either/or issue. Instead, emotions emerge in multifaceted ways that may be more or less resource-dependent and that vary as a function of situational and individual factors.

Conflict of interest

None declared.

References (52 in total)

1. Optimizing PCA methodology for ERP component identification and measurement: theoretical rationale and empirical evaluation.

Authors: Jürgen Kayser; Craig E Tenke
Journal: Clin Neurophysiol Date: 2003-12 Impact factor: 3.708

2. The (non)automaticity of amygdala responses to threat: on the issue of fast signals and slow measures.

Authors: Tobias Brosch; Matthias J Wieser
Journal: J Neurosci Date: 2011-10-12 Impact factor: 6.167

3. On the influence of high-pass filtering on ICA-based artifact reduction in EEG-ERP.

Authors: Irene Winkler; Stefan Debener; Klaus-Robert Müller; Michael Tangermann
Journal: Conf Proc IEEE Eng Med Biol Soc Date: 2015

4. Emotion and attention: event-related brain potential studies. (Review)

Authors: Harald T Schupp; Tobias Flaisch; Jessica Stockburger; Markus Junghöfer
Journal: Prog Brain Res Date: 2006 Impact factor: 2.453

5. Perceiving verbal and vocal emotions in a second language.

Authors: Chua Shi Min; Annett Schirmer
Journal: Cogn Emot Date: 2011-05-24

6. Where in the brain does visual attention select the forest and the trees?

Authors: G R Fink; P W Halligan; J C Marshall; C D Frith; R S Frackowiak; R J Dolan
Journal: Nature Date: 1996-08-15 Impact factor: 49.962

7. Vocal emotions influence verbal memory: neural correlates and interindividual differences.

Authors: Annett Schirmer; Ce-Belle Chen; April Ching; Ling Tan; Ryan Y Hong
Journal: Cogn Affect Behav Neurosci Date: 2013-03 Impact factor: 3.282

8. Neural mechanisms of visual selective attention. (Review)

Authors: G R Mangun
Journal: Psychophysiology Date: 1995-01 Impact factor: 4.016

9. Temporal precedence of emotion over attention modulations in the lateral amygdala: Intracranial ERP evidence from a patient with temporal lobe epilepsy.

Authors: Gilles Pourtois; Laurent Spinelli; Margitta Seeck; Patrik Vuilleumier
Journal: Cogn Affect Behav Neurosci Date: 2010-03 Impact factor: 3.282

10. Cross-modal emotional attention: emotional voices modulate early stages of visual processing.

Authors: Tobias Brosch; Didier Grandjean; David Sander; Klaus R Scherer
Journal: J Cogn Neurosci Date: 2009-09 Impact factor: 3.225
