Mathias Barthel, Antje S Meyer, Stephen C Levinson.
Abstract
In conversation, turn-taking is usually fluid, with next speakers taking their turn right after the end of the previous turn. Most, but not all, previous studies show that next speakers start to plan their turn early, if possible already during the incoming turn. The present study makes use of the list-completion paradigm (Barthel et al., 2016), analyzing speech onset latencies and eye-movements of participants in a task-oriented dialogue with a confederate. The measures are used to disentangle the contributions to the timing of turn-taking of early planning of content on the one hand and initiation of articulation as a reaction to the upcoming turn-end on the other hand. Participants named objects visible on their computer screen in response to utterances that did, or did not, contain lexical and prosodic cues to the end of the incoming turn. In the presence of an early lexical cue, participants showed earlier gaze shifts toward the target objects and responded faster than in its absence, whereas the presence of a late intonational cue only led to faster response times and did not affect the timing of participants' eye movements. The results show that with a combination of eye-movement and turn-transition time measures it is possible to tease apart the effects of early planning and response initiation on turn timing. They are consistent with models of turn-taking that assume that next speakers (a) start planning their response as soon as the incoming turn's message can be understood and (b) monitor the incoming turn for cues to turn-completion so as to initiate their response when turn-transition becomes relevant.
Keywords: eye-movements; intonation; planning; production; task-oriented dialogue; turn-taking
Year: 2017 PMID: 28443035 PMCID: PMC5387091 DOI: 10.3389/fpsyg.2017.00393
Source DB: PubMed Journal: Front Psychol ISSN: 1664-1078
Figure 1. Two examples illustrating the intonation contours used in conditions 1 (baseline condition), 3 (no boundary tone condition), and 4 (downstepped condition). Condition 1 (and equally condition 2, not displayed here) contains no downsteps on non-final list items and a low boundary tone at the turn end. By contrast, condition 3 contains no final low boundary tone. Condition 4 contains downsteps and a final low boundary tone.
Figure 2. Example item displays. (A) Confederate display. (B) Participant display. Reproduced from Barthel et al. (2016).
Response latencies by condition.
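| Condition | Lexical cue | Boundary tone | Downsteps | Mean (SE) | |
| --- | --- | --- | --- | --- | --- |
| 1 | − | + | − | 1,010 (12) | 988 |
| 2 | + | + | − | 922 (12) | 990 |
| 3 | − | − | − | 1,077 (14) | 560 |
| 4 | − | + | + | 873 (18) | 402 |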
Means and standard errors (SE) in ms.
Response timing model.
| Predictor | Estimate | SE | t | Sig. |
| --- | --- | --- | --- | --- |
| (Intercept) | 953.482 | 53.4 | 17.830 | |
| lexical cue_no | 90.190 | 19.6 | 4.602 | |
| boundary tone cue_no | 60.344 | 22.0 | 2.741 | |
| downsteps_no | 8.663 | 34.4 | 0.252 | n.s. |
| sentence_duration | −48.974 | 7.1 | −6.836 | |
| recording_noticed_yes | 74.624 | 73.087 | 1.021 | n.s. |
Formula: RT ~ 1 + LEX + BT + DWNS + recording_noticed + sentence_duration_centered + (1 + LEX + BT + DWNS | subject) + (1 + LEX + BT + DWNS | item). The presence of each cue was used as the reference level, so the effects shown are effects of cue absence. Asterisks indicate significance levels of effects. *p < 0.05; **p < 0.01; ***p < 0.001.
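The t column of the response timing table is the ratio of each estimate to its standard error. The following minimal Python sketch recomputes those ratios from the tabulated values. The two-sided p-values use a normal approximation, which is an assumption made here for illustration and not necessarily the test used in the study; small deviations from the reported t values are expected because the tabulated standard errors are rounded.

```python
import math

# (estimate, standard error) pairs copied from the response timing table.
coefficients = {
    "(Intercept)": (953.482, 53.4),
    "lexical cue_no": (90.190, 19.6),
    "boundary tone cue_no": (60.344, 22.0),
    "downsteps_no": (8.663, 34.4),
    "sentence_duration": (-48.974, 7.1),
    "recording_noticed_yes": (74.624, 73.087),
}


def wald_t(estimate, se):
    """Return the t statistic as estimate divided by its standard error."""
    return estimate / se


def two_sided_p_normal(t):
    """Two-sided p-value under a standard normal reference distribution.

    Assumption: the normal approximation is used only for illustration;
    the source does not state how its p-values were obtained.
    """
    return math.erfc(abs(t) / math.sqrt(2))


for name, (estimate, se) in coefficients.items():
    t = wald_t(estimate, se)
    p = two_sided_p_normal(t)
    print(f"{name:24s} t = {t:7.3f}   approx. p = {p:.4f}")
```

Under this approximation, the two predictors marked n.s. in the table are the only ones whose p-values exceed 0.05.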
Figure 3. Proportions and standard errors of looks to the target object time-locked to the onset of the last object noun of the confederate turn (0 ms).
Eye-movement results of by-subject analysis.
| Comparison | Term | β | SE | Sig. |
| --- | --- | --- | --- | --- |
| Cond. 1 vs. cond. 2 (±LEX) | t1 × cond. | 3.30 | 0.29 | |
| | t2 × cond. | −2.98 | 0.21 | |
| | t3 × cond. | −0.51 | 0.19 | |
| Cond. 1 vs. cond. 3 (±BT) | t1 × cond. | 0.47 | 0.34 | n.s. |
| | t2 × cond. | −0.52 | 0.29 | n.s. |
| | t3 × cond. | 0.17 | 0.19 | n.s. |
| Cond. 1 vs. cond. 4 (±DWNS) | t1 × cond. | −0.25 | 0.35 | n.s. |
| | t2 × cond. | −0.24 | 0.34 | n.s. |
| | t3 × cond. | 0.23 | 0.26 | n.s. |
Formula: emplogit ~ (time1 + time2 + time3) * condition + (1 + (time1 + time2 + time3) * condition | subject/item). t2 = TIME2, t3 = TIME3. βs indicate effects of absence of cues. Asterisks indicate significance levels of effects. *p < 0.05; **p < 0.01; ***p < 0.001. The by-item analysis yielded a similar pattern of results.
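The dependent variable `emplogit` in the formula above presumably refers to the empirical logit of looks to the target, a transform commonly used in eye-movement analyses. The Python sketch below shows the usual form, log((y + 0.5) / (n − y + 0.5)), where y is the number of samples on the target and n the total number of samples in a time bin. Both the exact form and the bin counts are assumptions made for illustration; the source does not spell out how the transform was computed.

```python
import math


def empirical_logit(target_looks, total_samples):
    """Empirical logit of looks to the target within one time bin.

    The 0.5 added to numerator and denominator keeps the value finite
    when a bin contains no target looks or only target looks.
    """
    non_target = total_samples - target_looks
    return math.log((target_looks + 0.5) / (non_target + 0.5))


# Hypothetical bins: (samples on target, total samples in the bin).
bins = [(0, 20), (5, 20), (10, 20), (20, 20)]

for target_looks, total_samples in bins:
    value = empirical_logit(target_looks, total_samples)
    print(f"{target_looks:2d}/{total_samples} looks -> empirical logit {value:.3f}")
```

Unlike a raw proportion, the transformed value is unbounded and symmetric around zero (equal target and non-target looks), which is why it is often preferred as the outcome in linear growth curve models.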