Maya Inbar, Eitan Grossman, Ayelet N Landau.
Abstract
Studies of speech processing investigate the relationship between temporal structure in speech stimuli and neural activity. Despite clear evidence that the brain tracks speech at low frequencies (~1 Hz), it is not well understood what linguistic information gives rise to this rhythm. In this study, we harness linguistic theory to draw attention to Intonation Units (IUs), a fundamental prosodic unit of human language, and characterize their temporal structure as captured in the speech envelope, an acoustic representation relevant to the neural processing of speech. IUs are defined by a specific pattern of syllable delivery, together with resets in pitch and articulatory force. Linguistic studies of spontaneous speech indicate that this prosodic segmentation paces new information in language use across diverse languages. Therefore, IUs provide a universal structural cue for the cognitive dynamics of speech production and comprehension. We study the relation between IUs and periodicities in the speech envelope, applying methods from investigations of neural synchronization. Our sample includes recordings from everyday speech contexts of over 100 speakers and six languages. We find that sequences of IUs form a consistent low-frequency rhythm and constitute a significant periodic cue within the speech envelope. Our findings allow us to predict that IUs are utilized by the neural system when tracking speech. The methods we introduce here facilitate testing this prediction in the future (i.e., with physiological data).
Year: 2020 PMID: 32985572 PMCID: PMC7522717 DOI: 10.1038/s41598-020-72739-4
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Analysis pipeline. (a) An example Intonation Unit sequence from a conversation in Du Bois et al.[11]. (b) Illustration of one of the characteristics contributing to the delimitation of IUs: the fast-slow dynamic of syllables. A succession of short syllables is followed by comparatively longer ones; new units are cued by the resumed rush in syllable rate following the lengthening (syllable duration measured in ms). (c) Illustration of the phase-consistency analysis: 2-s windows of the speech envelope (green) were extracted around each IU onset (gray vertical line), decomposed and compared for consistency of phase angle within each frequency. (d) Illustration of the IU-onset permutation in time, which was used to compute the randomization distribution of phase consistency spectra (see “Materials and methods” section).
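The analysis in panels (c) and (d) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: it assumes the envelope is already available as a NumPy array, uses the resultant-vector length of FFT phases as the consistency measure, and builds the randomization distribution by placing the same number of onsets uniformly at random in time. The function names (`phase_consistency_spectrum`, `surrogate_spectra`, `permutation_p_values`), the Hann taper, and the uniform-placement scheme are all assumptions; the actual measure and permutation scheme are described in the "Materials and methods" section.

```python
import numpy as np

def phase_consistency_spectrum(envelope, fs, onsets_s, win_s=2.0):
    """Per-frequency phase consistency across windows centered on onsets.

    Assumption: consistency is the length of the mean unit phase vector
    (0 = random phases, 1 = identical phases across windows).
    """
    half = int(round(win_s * fs / 2))
    n = 2 * half
    onsets = np.round(np.asarray(onsets_s) * fs).astype(int)
    # Keep only onsets whose full window fits inside the recording.
    onsets = onsets[(onsets - half >= 0) & (onsets + half <= len(envelope))]
    windows = np.stack([envelope[o - half:o + half] for o in onsets])
    windows = windows - windows.mean(axis=1, keepdims=True)  # remove DC offset
    spectra = np.fft.rfft(windows * np.hanning(n), axis=1)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    # Normalize each Fourier coefficient to unit length, keeping only phase.
    unit = spectra / np.maximum(np.abs(spectra), 1e-12)
    return freqs, np.abs(unit.mean(axis=0))

def surrogate_spectra(envelope, fs, onsets_s, n_perm=200, win_s=2.0, seed=0):
    """Randomization distribution from uniformly re-placed onsets (assumed scheme)."""
    rng = np.random.default_rng(seed)
    duration = len(envelope) / fs
    null = []
    for _ in range(n_perm):
        fake = rng.uniform(win_s / 2, duration - win_s / 2, size=len(onsets_s))
        null.append(phase_consistency_spectrum(envelope, fs, fake, win_s)[1])
    return np.array(null)

def permutation_p_values(observed, null):
    """One-sided p-value per frequency bin, with the usual +1 correction."""
    return (1 + (null >= observed).sum(axis=0)) / (1 + len(null))
```

With a 2-s window the frequency resolution is 0.5 Hz, so a rhythm near 1 Hz falls on its own bin; a longer window would give finer resolution at the cost of fewer usable onsets near the recording edges.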
Summary information on the sample of speech segments used in the study.
| Source of recordings and transcriptions | Number of recordings | Audio duration (min:s) | Number of IUs | Number of speakers with > 5 IUs | IUs following an inter-IU interval < 1 s (%) |
|---|---|---|---|---|---|
| Santa Barbara Corpus of Spoken American English | 10 | 9:58 | 460 | 19 | 78.3 |
| Haifa Corpus of Spoken Hebrew | 10 | 6:30 | 507 | 24 | 80.9 |
| Russian multichannel discourse | 3 | 60:26 | 3078 | 9 | 77.2 |
| DoBeS Summits-PAGE Collection of Papuan Malay | 20 | 64:07 | 2995 | 33 | 89.1 |
| DoBeS Wooi documentation | 12 | 34:59 | 1033 | 18 | 64.6 |
| DoBeS Yali documentation | 5 | 13:25 | 561 | 10 | 80.2 |
Figure 2. Characterization of the temporal structure of Intonation Units. Phase-consistency analysis results include the average of phase consistency spectra across speakers for each language. Shaded regions denote bootstrapped 95% confidence intervals[43] of the averages. Significance is denoted by a horizontal line above the spectra, after correction for multiple comparisons across neighboring frequency bins using an FDR procedure. Inset: Probability distribution of IU durations within each language corpus, calculated for 50 ms bins and pooled across speakers. Overlaid are the medians (dashed line; dark gray) and the bootstrapped 95% confidence intervals of the medians (light gray).
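The two statistical steps named in this caption can be illustrated with a short sketch. It is hedged in two ways: the caption does not say which bootstrap variant or which FDR procedure was used, so the percentile bootstrap and the Benjamini-Hochberg step-up rule below are assumptions, and the function names `bootstrap_median_ci` and `benjamini_hochberg` are hypothetical.

```python
import numpy as np

def bootstrap_median_ci(values, n_boot=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for the median (assumed variant)."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    # Resample with replacement: one row of indices per bootstrap replicate.
    idx = rng.integers(0, len(values), size=(n_boot, len(values)))
    medians = np.median(values[idx], axis=1)
    lo, hi = np.quantile(medians, [alpha / 2, 1 - alpha / 2])
    return np.median(values), lo, hi

def benjamini_hochberg(pvals, q=0.05):
    """Boolean mask of rejected hypotheses under the Benjamini-Hochberg rule."""
    p = np.asarray(pvals, dtype=float)
    m = len(p)
    order = np.argsort(p)
    # Compare the i-th smallest p-value against q * i / m.
    passed = p[order] <= q * np.arange(1, m + 1) / m
    reject = np.zeros(m, dtype=bool)
    if passed.any():
        k = np.max(np.nonzero(passed)[0])  # largest rank that passes
        reject[order[:k + 1]] = True       # reject everything up to that rank
    return reject
```

In this setting, `values` would be the IU durations pooled within one language corpus, and `pvals` would hold one p-value per frequency bin of the phase-consistency spectrum.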