Vincent Aubanel, Jean-Luc Schwartz.
Abstract
The role of isochrony in speech (the hypothetical division of speech units into equal-duration intervals) has been the subject of a long-standing debate. Current approaches in neuroscience have brought new perspectives to that debate through the theoretical framework of predictive coding and cortical oscillations. Here we assess the comparative roles of naturalness and isochrony in the intelligibility of speech in noise for French and English, two languages representative of two well-established contrastive rhythm classes. We show that both top-down predictions associated with the natural timing of speech and, to a lesser extent, bottom-up predictions associated with isochrony at a syllabic timescale improve intelligibility. We found a similar pattern of results for both languages, suggesting that the temporal characterisation of speech from different rhythm classes could be unified around a single core speech unit, with neurophysiologically defined duration and linguistically anchored temporal location. Taken together, our results suggest that isochrony does not seem to be a main dimension of speech processing, but may be a consequence of neurobiological processing constraints, manifesting in behavioural performance and ultimately explaining why isochronous stimuli occupy a particular status in speech and human perception in general.
Year: 2020 PMID: 33177590 PMCID: PMC7658253 DOI: 10.1038/s41598-020-76594-1
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Top panel: proportion of words correctly recognised in each experimental condition for French and English. Error bars show 95% confidence intervals over 26 and 27 subjects in the two languages respectively. Bottom panel: average sentence temporal distortion (the function of Eq. (1), computed on speech units matching the temporal condition). By construction, temporal distortion is null for natural sentences (NAT condition) and identical for isochronously and anisochronously retimed sentences at a given rhythmic level, that is, in the ISO.acc and ANI.acc conditions on the one hand, and in the ISO.syl and ANI.syl conditions on the other. Error bars show 95% confidence intervals over 180 sentences for French and English. Data for English were previously reported in [19].
Simultaneous generalised hypothesis tests for the effect of condition on intelligibility, formulated on two independent models for French and English respectively. From left to right: comparison tested and, for each language, comparison estimate and associated z and p values, with the conventional visual indication of significance. Data for English were previously reported in [19].
| Row | Comparison | French | English | ||||
|---|---|---|---|---|---|---|---|
| Est. | Est. | ||||||
| 1 | ISO.acc, NAT | ||||||
| 2 | ANI.acc, NAT | ||||||
| 3 | ISO.syl, NAT | ||||||
| 4 | ANI.syl, NAT | ||||||
| 5 | ISO.acc, ANI.acc | 0.273 | 0.177 | 4.00 | |||
| 6 | ISO.syl, ANI.syl | 0.878 | 0.110 | 2.43 | |||
| 7 | ISO, ANI | 0.233 | 0.287 | 4.54 | |||
| 8 | syl, acc | ||||||
Analysis metrics to evaluate departure from naturally timed and isochronous forms at the accent and syllable levels (rows) in each of the 5 experimental conditions (columns). Each argument of the function (Eq. 1) is a time series of either accent (acc) or syllable (syl) onsets, as they occur in a given experimental condition. For example, one such argument is the series of syllable onsets of a sentence as they occur in the transformed ISO.acc experimental condition. Note that some of these distortions are equal to 0 by construction: dnat.acc and dnat.syl for NAT sentences, and diso.acc and diso.syl for ISO.acc and ISO.syl sentences respectively.
| NAT | ISO.acc | ISO.syl | |
|---|---|---|---|
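The idea that some distortions vanish by construction can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the onset values are invented, `isochronous_onsets` assumes that the first and last onsets are held fixed, and `rms_distortion` is a simple root-mean-square stand-in, not the function defined in Eq. (1).

```python
import math


def isochronous_onsets(onsets):
    """Return equally spaced onsets spanning the same first-to-last interval.

    Assumption: the first and last onsets are kept in place; the source does
    not specify how the isochronous targets are anchored.
    """
    n = len(onsets)
    if n < 2:
        return list(onsets)
    step = (onsets[-1] - onsets[0]) / (n - 1)
    return [onsets[0] + i * step for i in range(n)]


def rms_distortion(onsets_a, onsets_b):
    """Hypothetical distortion: RMS difference between paired onsets.

    This is a placeholder, NOT the distortion function of Eq. (1).
    """
    if len(onsets_a) != len(onsets_b):
        raise ValueError("onset series must have the same length")
    squared = [(a - b) ** 2 for a, b in zip(onsets_a, onsets_b)]
    return math.sqrt(sum(squared) / len(squared))


# Invented syllable onsets (seconds) for one naturally timed sentence.
natural_syl = [0.00, 0.18, 0.41, 0.55, 0.90, 1.20]
iso_syl = isochronous_onsets(natural_syl)

# Departure from natural timing is zero for the natural sentence itself.
d_nat_of_nat = rms_distortion(natural_syl, natural_syl)
# Departure from isochrony is zero for the isochronously retimed sentence.
d_iso_of_iso = rms_distortion(iso_syl, isochronous_onsets(iso_syl))
# The natural sentence generally departs from its isochronous counterpart.
d_iso_of_nat = rms_distortion(natural_syl, iso_syl)

print(d_nat_of_nat, d_iso_of_iso, d_iso_of_nat)
```

Whatever the exact form of the distortion function, any measure that compares a sentence with its own reference timing yields zero in the first two cases, which is the pattern described in the caption.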
Figure 2. Intelligibility as a function of temporal distortion, as measured by the four metrics (rows) defined in Table 5. Data are grouped according to experimental condition (colours), the type of modification of the experimental condition (column groupings) and language (columns). Three subsets of data are highlighted for subsequent analysis (see text): (A) departure from isochrony of naturally timed sentences; (B) departure from natural rhythm of isochronously retimed sentences; (C) departure from natural rhythm and isochrony of anisochronously retimed sentences. Regression lines show linear modelling of the data points, disregarding subject and sentence random variation.
(A) Initial (m1) and equivalent simpler (m2) model for the role of departure from isochrony in naturally timed sentences. The formulae of the fixed effects are given for the two models, and the result of a likelihood-ratio test between the two models is given on the right of the vertical separator. (B) m2 model coefficients, with associated p values. (C) Fixed-effect sizes with lower and upper confidence levels.
| Model summary | | | Likelihood-ratio test (m1, m2) | | |
|---|---|---|---|---|---|
| Model | Fixed effects | AIC | χ² | Df | p |
| m1 | | 6728.4 | 3.36 | 5 | 0.65 |
| m2 | | 6721.7 | | | |
(A) Initial (m3) and equivalent simpler (m4) models for the role of departure from natural rhythm in isochronously retimed sentences. The formulae of the fixed effects are given for the two models, and the result of a likelihood-ratio test between the two models is given on the right of the vertical separator. (B) m4 model coefficients, with associated p values. (C) Fixed-effect sizes with lower and upper confidence levels.
| Model summary | | | Likelihood-ratio test (m3, m4) | | |
|---|---|---|---|---|---|
| Model | Fixed effects | AIC | χ² | Df | p |
| m3 | | 13,507 | 6.12 | 6 | 0.41 |
| m4 | | 13,502 | | | |
(A) Initial (m5) and equivalent simpler (m6) model for the role of departure from isochrony and natural rhythm in anisochronously retimed sentences. The formulae of the fixed effects are given for the two models, and the result of a likelihood-ratio test between the two models is given on the right of the vertical separator. (B) m6 model coefficients, with associated p values. (C) Fixed-effect sizes with lower and upper confidence levels.
| Model summary | | | Likelihood-ratio test (m5, m6) | | |
|---|---|---|---|---|---|
| Model | Fixed effects | AIC | χ² | Df | p |
| m5 | | 13,610 | 39.35 | 29 | 0.095 |
| m6 | | 13,591 | | | |
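As a sanity check on the three likelihood-ratio tests reported above, the p values can be recomputed from the test statistic and its degrees of freedom. The Python sketch below assumes that the three numbers in each row are, in order, the test statistic, Df and p, and that the statistic follows a chi-square distribution as in a standard likelihood-ratio test between nested models. The helper `chi2_sf` is written here for illustration with the standard library only; it is not the authors' analysis code.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable, integer df >= 1."""
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    y = x / 2.0
    # Closed-form starting points of the regularised upper incomplete gamma Q(a, y).
    if df % 2 == 0:
        a = 1.0
        q = math.exp(-y)                 # Q(1, y)
    else:
        a = 0.5
        q = math.erfc(math.sqrt(y))      # Q(1/2, y)
    # Recurrence: Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1).
    while a < df / 2.0:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return q


# (label, statistic, Df, reported p) as read from the three tables.
reported = [
    ("m1 vs m2", 3.36, 5, 0.65),
    ("m3 vs m4", 6.12, 6, 0.41),
    ("m5 vs m6", 39.35, 29, 0.095),
]

for label, stat, df, p in reported:
    print(f"{label}: recomputed p = {chi2_sf(stat, df):.3f} (reported {p})")
```

In all three cases the recomputed value agrees with the reported one to within rounding, so none of the simplifications is rejected at the 0.05 level, which fits the captions' description of the simpler models as equivalent.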
Figure 3. Annotation of an example sentence (translation: The red neon lamp makes his/her hair iridescent), in its original unmodified natural timing (A) and transformed isochronous forms at the accent (B) and syllable (C) levels. For each panel, from top to bottom: spectrogram with target boundaries used for the transformation overlaid in dashed lines, accent group onsets (red), syllable onsets (orange), phonemes and words.