Vincent Aubanel, Chris Davis, Jeesun Kim.
Abstract
A growing body of evidence shows that brain oscillations track speech. This mechanism is thought to maximize processing efficiency by allocating resources to important speech information, effectively parsing speech into units of appropriate granularity for further decoding. However, some aspects of this mechanism remain unclear. First, while periodicity is an intrinsic property of this physiological mechanism, speech is only quasi-periodic, so it is not clear whether periodicity would present an advantage in processing. Second, it is still a matter of debate which aspect of speech triggers or maintains cortical entrainment, from bottom-up cues such as fluctuations of the amplitude envelope of speech to higher-level linguistic cues such as syntactic structure. We present data from a behavioral experiment assessing the effect of isochronous retiming of speech on speech perception in noise. Two types of anchor points were defined for retiming speech, namely syllable onsets and amplitude envelope peaks. For each anchor point type, retiming was implemented at two hierarchical levels, a slow time scale around 2.5 Hz and a fast time scale around 4 Hz. Results show that while any temporal distortion resulted in reduced speech intelligibility, isochronous speech anchored to P-centers (approximated by stressed syllable vowel onsets) was significantly more intelligible than a matched anisochronous retiming, suggesting a facilitative role of periodicity defined on linguistically motivated units in processing speech in noise.
Keywords: brain oscillations; isochrony; speech intelligibility; syllable; temporal modification
Year: 2016 PMID: 27630552 PMCID: PMC5006149 DOI: 10.3389/fnhum.2016.00430
Source DB: PubMed Journal: Front Hum Neurosci ISSN: 1662-5161 Impact factor: 3.169
Figure 1. Anchor points used for the isochronous modification, with the associated time scale function for the stressed syllable level, for an example sentence. (A) Spectrogram of the naturally timed sentence, with time instants of stressed syllable onsets overlaid in red. (B) Annotation in words (scored keywords in capital letters) and phonemes. Stressed syllables are in red, unstressed syllables in orange, remaining phonemes and word onsets in black. (C) Amplitude envelope in normalized units with original peaks identified with empty circles. Peaks adjusted for sentence-level decay are shown in numbered circles with numbers indicating decreasing value order. Dark blue: four highest peaks; light blue: remaining peaks up to height. Note the relative agreement in timing between the stressed syllable anchor points and the low-number amplitude envelope peak anchor points. (D) Time scale function for the stressed syllable anchor points. Values < 1 indicate compression, values > 1 elongation. (E) Spectrogram of the resulting isochronous sentence at the stressed syllable level, with time instants of isochronous stressed syllable onsets overlaid in red. (F) Resulting isochronous annotation.
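As a rough illustration of what a time scale function of this kind might look like, the sketch below spaces a set of anchor points evenly and derives one stretch factor per inter-anchor interval, with values below 1 meaning compression and values above 1 meaning elongation. It assumes the first and last anchor points stay fixed and that the factor is constant within each interval; both are simplifying assumptions rather than details confirmed by the article. The function name `isochronous_time_scale` and the onset values are hypothetical.

```python
def isochronous_time_scale(anchors):
    """Return (target_times, factors) for strictly increasing anchor times in seconds.

    Assumption: the first and last anchors are kept in place and the anchors in
    between are moved so that all inter-anchor intervals have equal duration.
    """
    if len(anchors) < 3:
        raise ValueError("need at least three anchor points")
    if any(b <= a for a, b in zip(anchors, anchors[1:])):
        raise ValueError("anchor times must be strictly increasing")

    n_intervals = len(anchors) - 1
    # Common interval duration after retiming.
    period = (anchors[-1] - anchors[0]) / n_intervals
    # Evenly spaced target position for every anchor.
    targets = [anchors[0] + i * period for i in range(len(anchors))]
    # Stretch factor per interval: target duration / original duration.
    # < 1 compresses the interval, > 1 elongates it.
    factors = [period / (b - a) for a, b in zip(anchors, anchors[1:])]
    return targets, factors


# Hypothetical stressed-syllable onsets in seconds (not data from the study).
onsets = [0.2, 0.5, 1.1, 1.4]
targets, factors = isochronous_time_scale(onsets)
for (start, end), factor in zip(zip(onsets, onsets[1:]), factors):
    action = "elongate" if factor > 1 else "compress"
    print(f"{start:.2f}-{end:.2f} s: factor {factor:.2f} ({action})")
```

With these made-up onsets the common interval is about 0.4 s, so the two short intervals are elongated and the long middle interval is compressed. An actual retiming would then apply these factors to the audio with a duration-modification algorithm, which is outside the scope of this sketch.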
Figure 2. Mean inter-anchor point frequency and temporal distortion across four temporal modification conditions. Error bars, here and elsewhere, show 95% confidence intervals (N = 190).
Figure 3. Phoneme-level temporal distortion for the two anchor point types at the two time scales.
Figure 4. Mean proportion of correctly identified keywords per sentence over participants for the five conditions. Experiment I: syllable-based anchor points (N = 26). Experiment II: amplitude envelope-based anchor points (N = 29).
Output of generalized linear mixed models fitted separately for each experiment.
| Comparison | Exp. I estimate | z | p | Sig. | Exp. II estimate | z | p | Sig. |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | −0.545 | −12.16 | < 0.001 | *** | −0.847 | −19.87 | < 0.001 | *** |
| 2 | −0.722 | −16.06 | < 0.001 | *** | −0.821 | −19.18 | < 0.001 | *** |
| 3 | −1.017 | −22.40 | < 0.001 | *** | −0.713 | −16.78 | < 0.001 | *** |
| 4 | −1.127 | −24.68 | < 0.001 | *** | −0.666 | −15.67 | < 0.001 | *** |
| 5 | 0.177 | 4.00 | < 0.001 | *** | −0.027 | −0.63 | 0.965 | |
| 6 | 0.110 | 2.43 | 0.092 | . | −0.047 | −1.12 | 0.771 | |
| 7 | 0.287 | 4.54 | < 0.001 | *** | −0.074 | −1.24 | 0.700 | |
| 8 | −0.878 | −13.80 | < 0.001 | *** | 0.290 | 4.82 | < 0.001 | *** |
Each numbered row shows a comparison, its estimate, the z-value and associated p-value, and a visual indication of significance.
Correlation between intelligibility scores of transformed sentences and .
| Anchor points (Exp. I) | r | p | Sig. | Anchor points (Exp. II) | r | p | Sig. |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Stressed syllables ( | −0.19 | 0.000 | *** | Low number ( | −0.39 | 0.000 | *** |
| | −0.20 | 0.000 | *** | | −0.31 | 0.000 | *** |
| All syllables ( | −0.26 | 0.000 | *** | High number ( | −0.25 | 0.000 | *** |
| | −0.31 | 0.000 | *** | | −0.22 | 0.000 | *** |
| Stressed syllables ( | −0.03 | 0.382 | | Low number ( | 0.20 | 0.000 | *** |
| | 0.03 | 0.439 | | | 0.18 | 0.000 | *** |
| All syllables ( | 0.15 | 0.000 | *** | High number ( | 0.23 | 0.000 | *** |
| | 0.25 | 0.000 | *** | | 0.24 | 0.000 | *** |
Each cell displays the anchor point type, the polarity of the transformation, the Pearson product-moment correlation coefficient with its associated p-value, and a visual indication of significance.