Sebastian Loth, Katharina Jettka, Manuel Giuliani, Stefan Kopp, Jan P de Ruiter.
Abstract
Interactions with artificial agents often lack immediacy because agents respond more slowly than their users expect. Automatic speech recognisers introduce this delay by analysing a user's utterance only after it has been completed. Early, uncertain hypotheses of incremental speech recognisers can enable artificial agents to respond in a more timely manner. However, these hypotheses may change significantly with each update. Therefore, an already initiated action may turn out to be an error and incur an error cost. We investigated whether humans would use uncertain hypotheses for planning ahead and/or initiating their response. We designed a Ghost-in-the-Machine study in a bar scenario. A human participant controlled a bartending robot and perceived the scene only through its recognisers. The results showed that participants used uncertain hypotheses for selecting the best-matching action. This is comparable to computing the utility of dialogue moves. Participants evaluated the available evidence and the error cost of their actions prior to initiating them. If the error cost was low, the participants initiated their response with only suggestive evidence. Otherwise, they waited for additional, more confident hypotheses if they still had time to do so. If there was time pressure but only little evidence, participants grounded their understanding with echo questions. These findings contribute to a psychologically plausible policy for human-robot interaction that enables artificial agents to respond in a more timely and socially appropriate manner under uncertainty.
Year: 2018 PMID: 30067853 PMCID: PMC6070273 DOI: 10.1371/journal.pone.0201516
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Setting of the study.
The setting for the ghost participants including the information panel, eye tracker, control panel and eye tracker control screen. The participant wears passive noise insulating headphones.
Fig 2. Bartending robot.
The robot is shown at its bar about to grab a bottle of water for serving.
Fig 3. Information panel for the ghosts.
The panel covered the entire computer screen in front of the participants. Translations are provided in blue and were not part of the experimental design.
Fig 4. Control panel for the ghosts.
The panel was shown in the upper left corner of the screen. The remaining screen was light grey matching the panel’s background colour. Translations are provided in blue and were not part of the experimental design.
Number of scripted drink orders, drink orders detected by the sensors and number of correctly served drinks.
| Type of order | Condition | Number of scripted orders | Number of detected orders | Number of correctly served orders | Ratio of correctly served orders |
|---|---|---|---|---|---|
| Individual order | Certain | 96 | 124 | 115 | 93% |
| | Uncertain | 96 | 133 | 99 | 74% |
| Group orders | Certain | 32 | 19 | 18 | 95% |
| | Uncertain | 32 | 20 | 14 | 70% |
| Total number of drinks | Certain | 160 | 162 | 151 | 93% |
| | Uncertain | 160 | 173 | 127 | 73% |
A trial was scored as correct if the customers’ requests were served according to recogniser data. Only those drinks that were served in correct trials contributed to the number of correctly served drinks. Note that a group order comprised two drinks; thus, the 32 group orders contribute 64 drinks.
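The totals and ratios in this table can be rechecked with a few lines of arithmetic. The sketch below is illustrative only: the counts are copied from the table, the assignment of each pair of rows to the certain and uncertain condition is inferred from the column order of the later tables, and the names `ORDERS`, `drink_totals` and `served_ratio` are hypothetical. It also assumes that the ratio is the number of correctly served orders divided by the number of detected (not scripted) orders, which is the reading that reproduces the reported percentages.

```python
# Counts copied from the table. The (certain, uncertain) row order is an
# assumption inferred from the other tables of this record.
ORDERS = {
    # (order type, condition): (scripted, detected, correctly served)
    ("individual", "certain"): (96, 124, 115),
    ("individual", "uncertain"): (96, 133, 99),
    ("group", "certain"): (32, 19, 18),
    ("group", "uncertain"): (32, 20, 14),
}

# A group order comprised two drinks, an individual order one.
DRINKS_PER_ORDER = {"individual": 1, "group": 2}


def drink_totals(condition):
    """Return (scripted, detected, correct) drink counts for one condition."""
    totals = [0, 0, 0]
    for (order_type, cond), counts in ORDERS.items():
        if cond != condition:
            continue
        for i, count in enumerate(counts):
            totals[i] += count * DRINKS_PER_ORDER[order_type]
    return tuple(totals)


def served_ratio(detected, correct):
    """Correctly served share of detected orders, as a whole percentage."""
    return round(100 * correct / detected)


for condition in ("certain", "uncertain"):
    scripted, detected, correct = drink_totals(condition)
    print(condition, scripted, detected, correct,
          f"{served_ratio(detected, correct)}%")
```

Under these assumptions the script reproduces the "Total number of drinks" rows, including the 93% and 73% ratios.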
States of the indicators when the ghosts acknowledged new customers.
| Indicator | State | Certain | Certain | Uncertain | Uncertain |
|---|---|---|---|---|---|
| | | Number | Percent | Number | Percent |
| Customer visible | | 95 | 99% | 96 | 100% |
| | | 1 | 1% | 0 | 0% |
| Customer at bar | | 92 | 96% | 89 | 93% |
| | | 0 | 0% | 7 | 7% |
| | | 4 | 4% | 0 | 0% |
| Customer facing the bar | | 93 | 97% | 95 | 99% |
| | | 3 | 3% | 1 | 1% |
| Customer saying something | | 8 | 8% | 33 | 34% |
| | | 88 | 92% | 63 | 66% |
First action that the ghosts selected for acknowledging their new customers.
| Selected action | Certain | Certain | Uncertain | Uncertain |
|---|---|---|---|---|
| | Number | Percent | Number | Percent |
| Looking at customer | 76 | 79% | 70 | 73% |
| Verbal greeting | 18 | 19% | 22 | 23% |
| Other verbal utterance | 1 | 1% | 2 | 2% |
| No action | 1 | 1% | 2 | 2% |
In ‘No action’ trials, the ghosts waited for the customers to place an order before initiating any action.
Summary of the number of cases, response times (RT), the corresponding z-scores (z-RT), the number of hypotheses and their confidence levels as a function of the type of request and the condition.
| Measure | Individual orders | Individual orders | Group orders | Group orders | Questions about menu | Questions about menu |
|---|---|---|---|---|---|---|
| | Certain | Uncertain | Certain | Uncertain | Certain | Uncertain |
| Number of cases | 115 | 98 | 17 | 14 | 28 | 32 |
| | 8249 | 8501 | 10992 | 20970 | 4909 | 5573 |
| | 6561 | 6923 | 6677 | 20637 | 2821 | 6173 |
| | 0.00 | 0.06 | 0.41 | 1.32 | -0.56 | -0.49 |
| | 0.91 | 0.93 | 0.64 | 1.63 | 0.42 | 0.81 |
| | 6191 | 3041 | 9423 | 5473 | 4838 | 3433 |
| | 3493 | 3049 | 3369 | 5573 | 2835 | 2576 |
| | 0.36 | -0.52 | 1.38 | 0.03 | -0.03 | -0.41 |
| | 0.85 | 0.81 | 0.93 | 1.30 | 0.77 | 0.63 |
| | 1.17 | 3.38 | 1.12 | 5.36 | 1.04 | 1.84 |
| | 0.46 | 1.46 | 0.49 | 3.32 | 0.19 | 0.95 |
| | 100 | 73 | 100 | 74 | 100 | 61 |
| | | 20 | | 15 | | 27 |
Only correct trials contributed to the measures. One individual order, one group order and one question were excluded because of extremely fast or slow RTs.
Number of cases, serving time (RT), their participant-wise z-scores and the corresponding standard deviations from the first appearance of the customers until the first drink in the trial was served as a function of the type of request, preceding menu related questions and condition (certain/uncertain).
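| Measure | Individual orders | Individual orders | Individual orders | Individual orders | Group orders | Group orders |
|---|---|---|---|---|---|---|
| | Preceding question | Preceding question | No question | No question | No question | No question |
| | Certain | Uncertain | Certain | Uncertain | Certain | Uncertain |
| Number of cases | 31 | 17 | 43 | 44 | 16 | 13 |
| Serving time (RT) | 61169 | 43262 | 29911 | 29220 | 30504 | 46130 |
| SD of serving time | 56001 | 24214 | 24754 | 8185 | 11605 | 26848 |
| z-score | 0.72 | 0.14 | -0.39 | -0.27 | -0.19 | 0.52 |
| SD of z-score | 1.17 | 0.95 | 0.81 | 0.44 | 0.66 | 1.30 |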
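The caption refers to participant-wise z-scores, that is, times standardised within each participant before they are averaged. A minimal sketch of that standardisation follows. It makes several assumptions: the function name and the example values are hypothetical, and the choice of the sample standard deviation is a guess, since this record does not say which estimator was used.

```python
from statistics import mean, stdev


def zscore_by_participant(times):
    """Standardise each participant's times against their own mean and SD.

    times: dict mapping a participant id to a list of times (at least two
    differing values per participant, otherwise the SD is undefined or zero).
    Returns a dict of the same shape holding z-scores.
    """
    z_scores = {}
    for participant, values in times.items():
        centre = mean(values)
        spread = stdev(values)  # sample SD; an assumption, see lead-in
        if spread == 0:
            raise ValueError(f"no variation in times for {participant}")
        z_scores[participant] = [(v - centre) / spread for v in values]
    return z_scores


# Made-up values for two hypothetical participants.
example = {"p1": [1000, 2000, 3000], "p2": [10000, 30000, 50000]}
print(zscore_by_participant(example))
```

Because each participant is scaled by their own mean and spread, a generally slow and a generally fast participant contribute comparable values to the condition averages.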
Relative dwell times on indicators.
| Indicator | Relative dwell time |
|---|---|
| Customer visible | 0.11 |
| Customer torso orientation | 0.15 |
| Customer at bar | 0.12 |
| Speech hypotheses | 0.42 |
| Elsewhere | 0.21 |
The relative dwell time was computed by dividing the time span that a participant dwelled on each indicator by the summed dwell time on the information panel, and averaging across participants. This analysis is comparatively coarse because the tracking accuracy was reduced as a result of the large head turns.
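The computation described in this note can be sketched as follows. The function name, the input layout and the example numbers are all hypothetical. The sketch also assumes that every participant has an entry for the same set of indicators, and that the "Elsewhere" category counts towards the summed dwell time.

```python
from statistics import mean


def relative_dwell_times(dwell_by_participant):
    """Average each indicator's share of dwell time across participants.

    dwell_by_participant: list with one dict per participant, mapping an
    indicator name to the time dwelled on it (any consistent unit).
    """
    shares = []
    for dwell in dwell_by_participant:
        total = sum(dwell.values())
        if total == 0:
            raise ValueError("participant has no recorded dwell time")
        # Share of this participant's total dwell time per indicator.
        shares.append({name: t / total for name, t in dwell.items()})
    indicators = shares[0].keys()
    # Average the per-participant shares, not the raw times.
    return {name: mean(s[name] for s in shares) for name in indicators}


# Made-up dwell times for two hypothetical participants.
example = [
    {"Speech hypotheses": 40.0, "Customer visible": 10.0, "Elsewhere": 50.0},
    {"Speech hypotheses": 60.0, "Customer visible": 20.0, "Elsewhere": 20.0},
]
print(relative_dwell_times(example))
```

Normalising within each participant first gives every participant equal weight in the average, regardless of how long their session lasted.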
Fig 5. Flowchart of suggested human-robot interaction policy.
The user’s speech utterance triggers the ASR that issues hypotheses about it. Comparing the error cost of the response action to the evidence decides whether to wait, ask an echo question or perform the action.
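The decision described in this caption and in the abstract can be illustrated with a small sketch. It is not the authors' implementation. The 0-to-1 scales for evidence and error cost, the rule that the required evidence equals the error cost, and the names `Move` and `choose_move` are all assumptions made for illustration.

```python
from enum import Enum


class Move(Enum):
    ACT = "perform the action"
    WAIT = "wait for further hypotheses"
    ECHO = "ask an echo question"


def choose_move(evidence, error_cost, time_left):
    """Pick a dialogue move from the current speech hypothesis.

    evidence:   confidence in the current hypothesis, 0.0 to 1.0 (assumed scale)
    error_cost: cost of performing a wrong action, 0.0 to 1.0 (assumed scale)
    time_left:  True if the agent can still afford to wait for updates
    """
    # Low error cost: suggestive evidence is enough to start the action.
    if evidence >= error_cost:
        return Move.ACT
    # High error cost and time available: wait for a more confident hypothesis.
    if time_left:
        return Move.WAIT
    # Time pressure with little evidence: ground understanding with a question.
    return Move.ECHO


print(choose_move(0.4, 0.2, True))   # cheap action, moderate evidence
print(choose_move(0.4, 0.9, True))   # costly action, time to wait
print(choose_move(0.4, 0.9, False))  # costly action, no time left
```

A single threshold comparison keeps the sketch short. A fuller policy might instead compute an expected utility for each move, in line with the abstract's remark about the utility of dialogue moves.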