| Literature DB >> 34893546 |
Emily S Teoh, Farhin Ahmed, Edmund C Lalor.
Abstract
Humans have the remarkable ability to selectively focus on a single talker in the midst of other competing talkers. The neural mechanisms that underlie this phenomenon remain incompletely understood. In particular, there has been longstanding debate over whether attention operates at an early or late stage in the speech processing hierarchy. One way to better understand this is to examine how attention might differentially affect neurophysiological indices of hierarchical acoustic and linguistic speech representations. In this study, we do this by using encoding models to identify neural correlates of speech processing at various levels of representation. Specifically, we recorded EEG from fourteen human subjects (nine female and five male) during a "cocktail party" attention experiment. Model comparisons based on these data revealed phonetic feature processing for attended, but not unattended, speech. Furthermore, we show that attention specifically enhances isolated indices of phonetic feature processing, but that such attention effects are not apparent for isolated measures of acoustic processing. These results provide new insights into the effects of attention on different prelexical representations of speech, insights that complement recent anatomic accounts of the hierarchical encoding of attended speech. Furthermore, our findings support the notion that, for attended speech, phonetic features are processed as a distinct stage, separate from the processing of the speech acoustics.

Significance Statement: Humans are very good at paying attention to one speaker in an environment with multiple speakers. However, the details of how attended and unattended speech are processed differently by the brain are not completely clear. Here, we explore how attention affects the processing of the acoustic sounds of speech as well as the mapping of those sounds onto categorical phonetic features. We find evidence of categorical phonetic feature processing for attended, but not unattended, speech. Furthermore, we find evidence that categorical phonetic feature processing is enhanced by attention, but acoustic processing is not. These findings add an important new layer to our understanding of how the human brain solves the cocktail party problem.
Keywords: EEG; attention; cocktail party; speech
Year: 2021 PMID: 34893546 PMCID: PMC8805628 DOI: 10.1523/JNEUROSCI.1455-20.2021
Source DB: PubMed Journal: J Neurosci ISSN: 0270-6474 Impact factor: 6.167
Figure 1. A, Speech representations: the first row depicts the acoustic waveform of an excerpt taken from one of the stimuli. Subsequent rows show the computed acoustic and phonetic representations for that excerpt. B, Analysis framework: cross-validation is used to train forward models mapping the different attended and unattended speech representations to EEG. These models are then used to predict left-out EEG. Pearson's correlation is used to evaluate model accuracy, and the prediction accuracies of acoustic and phonetic models are compared.
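The forward-modeling pipeline in Figure 1B can be summarized in a few lines. Below is a minimal sketch, assuming a time-lagged ridge regression (as in standard temporal response function approaches) with k-fold cross-validation and per-channel Pearson correlation; the function names, lag count, and regularization value are illustrative assumptions, not the paper's exact settings.

```python
# Sketch of a forward (encoding) model: map time-lagged stimulus features to
# EEG via ridge regression, evaluate with cross-validated Pearson correlation.
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold
from scipy.stats import pearsonr

def lag_matrix(stim, n_lags):
    """Stack time-lagged copies of the stimulus features (samples x feats*lags)."""
    n_samples, n_feats = stim.shape
    padded = np.vstack([np.zeros((n_lags - 1, n_feats)), stim])
    return sliding_window_view(padded, (n_lags, n_feats)).reshape(n_samples, -1)

def encoding_accuracy(stim, eeg, n_lags=32, alpha=1e3, n_splits=5):
    """Mean cross-validated Pearson r between predicted and actual EEG, per channel."""
    X = lag_matrix(stim, n_lags)
    rs = []
    for train, test in KFold(n_splits=n_splits).split(X):
        model = Ridge(alpha=alpha).fit(X[train], eeg[train])
        pred = model.predict(X[test])
        rs.append([pearsonr(pred[:, ch], eeg[test, ch])[0]
                   for ch in range(eeg.shape[1])])
    return np.mean(rs, axis=0)
```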
Figure 2. A, Behavioral (comprehension questionnaire) results; dots indicate individual subject performance. Theoretical chance level is 25% (multiple-choice test with four options) and is indicated by the dashed line. B, Prediction accuracies of individual acoustic and phonetic feature spaces (as labeled on the horizontal axes) for the attended and unattended stimuli. Because these features are partially redundant, joint modeling was performed to gauge, in particular, whether phonetic features add unique predictive power beyond the other features. C, Prediction accuracies of joint feature spaces for attended and unattended stimuli (**p < 0.01, two-tailed Wilcoxon signed-rank test, FDR corrected). On each box, the central horizontal line indicates the median. The bottom and top edges of the boxes indicate the 25th and 75th percentiles of the data, respectively. The whiskers indicate variability outside the upper and lower quartiles. The '+' sign inside each box indicates the mean, and '+' signs outside the boxes indicate outliers. n.s., not significant (p > 0.05).
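The joint-model comparison in Figure 2C can be sketched as follows: concatenate a candidate feature set with a baseline feature set, fit both models, and test across subjects whether the joint model predicts left-out EEG better. This reuses the hypothetical encoding_accuracy() helper from the sketch above; the `subjects` iterable is an assumed placeholder for per-subject data, not an object from the paper.

```python
# Sketch: does adding phonetic features to an acoustic baseline improve
# EEG prediction? Tested across subjects with a two-sided Wilcoxon
# signed-rank test, as in the table below.
import numpy as np
from scipy.stats import wilcoxon

def joint_gain(acoustic, phonetic, eeg):
    """Per-subject gain in mean prediction accuracy from adding phonetic features."""
    base = encoding_accuracy(acoustic, eeg).mean()                    # baseline model
    joint = encoding_accuracy(np.hstack([acoustic, phonetic]), eeg).mean()
    return joint - base

# `subjects` is a hypothetical list of (acoustic, phonetic, eeg) arrays.
gains = np.array([joint_gain(a, p, e) for a, p, e in subjects])
stat, pval = wilcoxon(gains)  # significant positive gains => unique phonetic contribution
```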
Joint model comparison statistics
| | Attended | Unattended |
|---|---|---|
| [Table body not recoverable from this extraction: four sets of baseline comparisons with test statistics and Bayes factors; see note below.] | | |
Wilcoxon two-sided signed-rank test results for attended and unattended stimuli (columns), along with the Bayes factors (BF10). Each set of rows tests a different statistical question, as specified by its baseline. Terms being evaluated appear to the left of the > symbol. Bolded values indicate significant improvement over baseline.
Figure 3. A, Unique predictive power of the acoustic and phonetic feature spaces under the two attentional states. The features are as labeled on the horizontal axis. Statistical testing was conducted to identify attentional effects (two-tailed Wilcoxon signed-rank test; *p < 0.05, FDR corrected). B, Topographic distribution of partial correlations, averaged across all subjects. C, Unique predictive power of the phonetic features representation (f) in 50-ms windows from 0 to 300 ms, under the two attentional states (two-tailed Wilcoxon signed-rank test; *p < 0.05, **p < 0.005, FDR corrected). On each box, the central horizontal line indicates the median. The bottom and top edges of the boxes indicate the 25th and 75th percentiles of the data, respectively. The whiskers indicate variability outside the upper and lower quartiles. The '+' sign inside each box indicates the mean, and '+' signs outside the boxes indicate outliers. n.s., not significant (p > 0.05).
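The "unique predictive power" in Figure 3 is quantified with partial correlations. One common way to compute this is sketched below: a residual-based partial correlation between the recorded EEG and the joint model's prediction, controlling for a baseline model's prediction. This is a generic formulation under those assumptions, not necessarily the authors' exact implementation.

```python
# Sketch: partial correlation as a measure of a feature set's unique
# predictive power. Residualize both the (single-channel) EEG and the joint
# model's prediction against the baseline model's prediction, then correlate.
import numpy as np
from scipy.stats import pearsonr

def partial_corr(eeg, pred_joint, pred_baseline):
    """Pearson r between eeg and pred_joint, controlling for pred_baseline."""
    def residual(y, x):
        # Regress y on x (with an intercept) and return the residuals.
        X = np.column_stack([np.ones_like(x), x])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        return y - X @ beta
    return pearsonr(residual(eeg, pred_baseline),
                    residual(pred_joint, pred_baseline))[0]
```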
Figure 4. A significant reduction in unique predictive power was observed for the attended condition when the partial correlation analysis was repeated with phonemes categorized only as vowels or consonants (vc), as opposed to their underlying articulatory features (f; two-tailed Wilcoxon signed-rank test, *p = 0.011, z = 2.542).