Moïra-Phoebé Huet¹,², Christophe Micheyl³, Etienne Parizet¹, Etienne Gaudrain²,⁴
Abstract
During the past decade, several studies have identified electroencephalographic (EEG) correlates of selective auditory attention to speech. In these studies, listeners are typically instructed to focus on one of two concurrent speech streams (the "target") while ignoring the other (the "masker"). EEG signals are recorded while participants perform this task and are subsequently analyzed to recover the attended stream. An assumption often made in these studies is that the participant's attention remains focused on the target throughout the test. To check this assumption, and to assess when a participant's attention in a concurrent speech listening task was directed toward the target, the masker, or neither, we designed a behavioral listen-then-recall task (the Long-SWoRD test). After listening to two simultaneous short stories, participants had to identify, on a computer screen, keywords from the target story, randomly interspersed among words from the masker story and words from neither story. To modulate task difficulty, and hence the likelihood of attentional switches, the masker stories were originally uttered by the same talker as the target stories. The masker voice parameters were then manipulated to parametrically control the similarity of the two streams, from clearly dissimilar to almost identical. While participants listened to the stories, EEG signals were recorded and subsequently analyzed using a temporal response function (TRF) model to reconstruct the speech stimuli. Responses in the behavioral recall task were used to infer, retrospectively, when attention was directed toward the target, the masker, or neither.
During the model-training phase, the results of these behavioral-data-driven inferences were used as inputs to the model, in addition to the EEG signals, to determine whether this additional information would improve stimulus-reconstruction accuracy relative to the performance of models trained under the assumption that the listener's attention was unwaveringly focused on the target. Results from 21 participants show that information about the actual (as opposed to assumed) attentional focus can be used advantageously during model training to enhance the subsequent (test-phase) accuracy of EEG-based auditory stimulus reconstruction. This is especially the case in challenging listening situations, where the participants' attention is less likely to remain entirely focused on the target talker. In situations where the two competing voices are clearly distinct and easily separated perceptually, the assumption that listeners are able to stay focused on the target is reasonable. The behavioral recall protocol introduced here provides experimenters with a means to behaviorally track fluctuations in auditory selective attention, including in combined behavioral/neurophysiological studies.
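The abstract describes reconstructing the speech envelope from EEG with a backward TRF model and scoring it by the correlation r between the reconstructed and actual envelopes. A minimal sketch of that idea follows, assuming a linear decoder over time-lagged EEG fitted by ridge regression (a common way to estimate TRF decoders, not necessarily the exact procedure or regularization used in this study). The function names, sampling rate, lag range, ridge value, and synthetic signals are all hypothetical.

```python
import numpy as np


def lag_matrix(eeg: np.ndarray, max_lag: int) -> np.ndarray:
    """Stack time-lagged copies of the EEG (samples x channels).

    In a backward model the stimulus at time t is reconstructed from the EEG at
    t, t+1, ..., t+max_lag, because the neural response follows the stimulus.
    Rows whose future samples fall past the end of the recording are zero-padded.
    """
    n_samples, n_channels = eeg.shape
    X = np.zeros((n_samples, n_channels * (max_lag + 1)))
    for lag in range(max_lag + 1):
        X[: n_samples - lag, lag * n_channels:(lag + 1) * n_channels] = eeg[lag:]
    return X


def train_decoder(eeg, envelope, max_lag, ridge):
    """Ridge-regression decoder weights: w = (X'X + ridge * I)^-1 X'y."""
    X = lag_matrix(eeg, max_lag)
    XtX = X.T @ X
    return np.linalg.solve(XtX + ridge * np.eye(XtX.shape[0]), X.T @ envelope)


def reconstruct(eeg, weights, max_lag):
    """Apply a trained decoder to (held-out) EEG to estimate the envelope."""
    return lag_matrix(eeg, max_lag) @ weights


def pearson_r(a, b):
    """Reconstruction accuracy r: Pearson correlation of two 1-D signals."""
    return float(np.corrcoef(a, b)[0, 1])


def envelope_like(rng, size):
    """Smoothed Gaussian noise used as a stand-in for a speech envelope."""
    return np.convolve(rng.standard_normal(size), np.ones(8) / 8, mode="same")


# --- Synthetic demo (hypothetical parameters, not the study's data) ---
rng = np.random.default_rng(0)
fs, n_channels, delay, max_lag = 64, 8, 6, 10   # Hz, channels, samples, samples
n = fs * 120                                     # 2 minutes of signal

attended = envelope_like(rng, n)
ignored = envelope_like(rng, n)                  # not present in the simulated EEG

gains = rng.uniform(0.5, 1.5, n_channels)
eeg = 0.5 * rng.standard_normal((n, n_channels))
eeg[delay:] += attended[:-delay, None] * gains   # EEG follows the attended envelope

half = n // 2                                    # train on first half, test on second
w_att = train_decoder(eeg[:half], attended[:half], max_lag, ridge=1.0)
w_ign = train_decoder(eeg[:half], ignored[:half], max_lag, ridge=1.0)

r_att = pearson_r(reconstruct(eeg[half:], w_att, max_lag), attended[half:])
r_ign = pearson_r(reconstruct(eeg[half:], w_ign, max_lag), ignored[half:])
print(f"r (attended) = {r_att:.2f}, r (ignored) = {r_ign:.2f}")
```

In the behavioral-data-driven variant described in the abstract, the training envelope passed to `train_decoder` would be an envelope inferred from the recall answers rather than the target envelope; the held-out evaluation with `pearson_r` is unchanged.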
Keywords: attentional switches; neural tracking; speech-on-speech; temporal response function (TRF); vocal cues
Year: 2021 PMID: 34966252 PMCID: PMC8710602 DOI: 10.3389/fnins.2021.674112
Source DB: PubMed Journal: Front Neurosci ISSN: 1662-453X Impact factor: 4.677
Distance between the target and the masker voices, in semitones.
| Difficulty | | | Distance (st) |
|---|---|---|---|
| Difficult | −1.6 | 0.4 | 1.14 |
| Intermediate | −3.2 | 1.21 | 3.42 |
| Easy | −4.8 | 1.82 | 5.13 |
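The values in this table are in semitones (st), a logarithmic unit: a shift of s st corresponds to multiplying the underlying quantity by 2^(s/12). The short sketch below converts the first numeric column to multiplicative ratios; it does not assume which voice parameter each column describes, and the helper names are hypothetical.

```python
import math


def semitones_to_ratio(st: float) -> float:
    """Multiplicative ratio corresponding to a shift of `st` semitones."""
    return 2.0 ** (st / 12.0)


def ratio_to_semitones(ratio: float) -> float:
    """Semitone distance corresponding to a multiplicative ratio (> 0)."""
    return 12.0 * math.log2(ratio)


# Values from the first numeric column of the table (Difficult, Intermediate, Easy).
for st in (-1.6, -3.2, -4.8):
    print(f"{st:+.1f} st -> ratio {semitones_to_ratio(st):.3f}")
```

This prints ratios of about 0.912, 0.831, and 0.758; by the same rule, a shift of +12 st would double the quantity.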
FIGURE 1 Steps for creating the inferred stimuli. (A) Example of a trial in which the target is shown in black and the masker in red. In this example, the participant selected the first target keyword (highlighted in purple), the second masker keyword, and the third extraneous keyword. Unselected keywords are shown with a hatched pattern. (B) The three segments are built according to the participant's answers, with an attention switch of 3 s, and the extraneous segments are filled with the mixture of the target and masker (i.e., method h+, in yellow). The attention scopes are interpolated using segment method g. The three segments are then added together. (C) The three attention scopes are built according to the participant's answers, with an attention switch of 1 s, and the extraneous sections are filled with noise (h). When no behavioral information is available, the attention scope is filled with the target (i.e., g). The three parts are then added together.
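The caption describes assembling a training envelope from the recall answers: target answers map to the target, masker answers to the masker, extraneous answers to either the mixture (h+) or noise (h), and regions without behavioral information to the target (g). The sketch below illustrates that mapping under loudly stated assumptions: each answered keyword is taken to define an attention scope of `scope_s` seconds centred on the keyword time, the mixture is approximated by the sum of the two envelopes, and the noise filler is Gaussian noise scaled to the target's standard deviation. None of these details, nor the function name, come from the study.

```python
import numpy as np


def inferred_envelope(target, masker, keyword_times_s, answers, fs, scope_s,
                      extraneous_fill="mixture", seed=0):
    """Build a training envelope from recall answers (hypothetical sketch).

    Assumption: each keyword defines a scope of `scope_s` seconds centred on
    its time; outside every scope the envelope defaults to the target (g).
    """
    target = np.asarray(target, dtype=float)
    masker = np.asarray(masker, dtype=float)
    out = target.copy()                                  # g: no behavioral info
    noise = np.random.default_rng(seed).standard_normal(len(target)) * target.std()
    half = int(round(scope_s * fs / 2))
    for t, answer in zip(keyword_times_s, answers):
        centre = int(round(t * fs))
        lo, hi = max(centre - half, 0), min(centre + half, len(target))
        if answer == "target":
            out[lo:hi] = target[lo:hi]
        elif answer == "masker":
            out[lo:hi] = masker[lo:hi]
        elif answer == "extraneous" and extraneous_fill == "mixture":
            out[lo:hi] = target[lo:hi] + masker[lo:hi]   # h+: both streams
        elif answer == "extraneous" and extraneous_fill == "noise":
            out[lo:hi] = noise[lo:hi]                    # h: noise filler
        else:
            raise ValueError(f"unsupported answer/fill: {answer!r}/{extraneous_fill!r}")
    return out


# Toy example with constant "envelopes" so the result is easy to read.
fs = 64
target = np.full(30 * fs, 1.0)
masker = np.full(30 * fs, 2.0)
env = inferred_envelope(target, masker, [5.0, 15.0, 25.0],
                        ["target", "masker", "extraneous"], fs, scope_s=3.0)
print(env[10 * fs], env[15 * fs], env[25 * fs])   # 1.0 2.0 3.0
```

Because the scopes here do not overlap, writing each scope into one array is equivalent to adding separately masked segments together, as the caption describes.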
FIGURE 2 Electroencephalographic channel positions.
FIGURE 3 Percentage of target answers (green), masker answers (light yellow), and extraneous answers (dark purple) at each level of difficulty. Each point represents the percentage of identified keywords for one participant in one condition. The hinges of the boxplot represent the first and third quartiles, and the middle line is the median. The whiskers extend from the hinges up to 1.5 times the interquartile range.
FIGURE 4 Average reconstruction accuracy r for the inferred envelopes in each condition. Each point represents the score of one participant in one voice condition, for each extraneous-keyword filler (in color) and each attention scope (top and bottom panels). The hinges of the boxplot represent the first and third quartiles, and the median is shown as a bar in each boxplot. The whiskers extend from the hinges up to 1.5 times the interquartile range.
Post hoc analyses for Equation 3.
| Factor | Comparison |
|---|---|
| Difference between voices | 1.14 st vs. 3.42 st |
| | 1.14 st vs. 5.13 st |
| | 3.42 st vs. 5.13 st |
| Extraneous keyword filler | Mixture vs. target |
| | Mixture vs. masker |
| | Target vs. masker |
| | Target vs. other speech stream |
| | Target vs. noise |
| | Other speech stream vs. noise |
FIGURE 5 Average reconstruction accuracy r per condition for the decoder trained on the original target stimuli (dark blue) and for the decoder trained on the best inferred stimuli (light yellow).
Post hoc analyses for the Equation 4 interaction.
| Comparison |
|---|
| 1.14 st: target vs. behavioral decoder |
| 3.42 st: target vs. behavioral decoder |
| 5.13 st: target vs. behavioral decoder |
| Target decoder: 1.14 st vs. 3.42 st |
| Target decoder: 1.14 st vs. 5.13 st |
| Target decoder: 3.42 st vs. 5.13 st |
| Behavioral decoder: 1.14 st vs. 3.42 st |
| Behavioral decoder: 1.14 st vs. 5.13 st |
| Behavioral decoder: 3.42 st vs. 5.13 st |