Frederick G Conrad, Michael F Schober, Matt Jans, Rachel A Orlowski, Daniel Nielsen, Rachel Levenstein.
Abstract
This study investigates how an onscreen virtual agent's dialog capability and facial animation affect survey respondents' comprehension and engagement in "face-to-face" interviews, using questions from US government surveys whose results have far-reaching impact on national policies. In the study, 73 laboratory participants were randomly assigned to respond in one of four interviewing conditions, in which the virtual agent had either high or low dialog capability (implemented through Wizard of Oz) and high or low facial animation, based on motion capture from a human interviewer. Respondents, whose faces were visible to the Wizard (and videorecorded) during the interviews, answered 12 questions about housing, employment, and purchases on the basis of fictional scenarios designed to allow measurement of comprehension accuracy, defined as the fit between responses and US government definitions. Respondents answered more accurately with the high-dialog-capability agents, requesting clarification more often particularly for ambiguous scenarios; and they generally treated the high-dialog-capability interviewers more socially, looking at the interviewer more and judging high-dialog-capability agents as more personal and less distant. Greater interviewer facial animation did not affect response accuracy, but it led to more displays of engagement, namely verbal and visual acknowledgments and smiles, and to the virtual interviewer's being rated as less natural. The pattern of results suggests that a virtual agent's dialog capability and facial animation differently affect survey respondents' experience of interviews, behavioral displays, and comprehension, and thus the accuracy of their responses. The pattern of results also suggests design considerations for building survey interviewing agents, which may differ depending on the kinds of survey questions (sensitive or not) that are asked.
Keywords: comprehension; dialog capability; facial animation; social signals; survey interviewing; virtual agent
Year: 2015 PMID: 26539138 PMCID: PMC4611966 DOI: 10.3389/fpsyg.2015.01578
Source DB: PubMed Journal: Front Psychol ISSN: 1664-1078
Figure 1. The Derek model that formed the basis of the four virtual interviewers.
Facial animation features of virtual interviewers.

| Feature | Low animation | High animation |
| --- | --- | --- |
| Head moves | No | Yes, even when “listening” |
| Face moves | Only mouth | Yes |
| Eyes move | No | Yes |
| Eyes blink | No | Yes |
| Mouth movement | Only opens and closes during speech, but does not change shape | Mouth forms appropriate shapes for sounds being produced |
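Read as a specification, the two animation profiles are simply feature flags on the same underlying model. Below is a minimal sketch of that configuration in Python; the names (`AnimationProfile`, `LOW_ANIMATION`, `HIGH_ANIMATION`) are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AnimationProfile:
    """Facial-animation feature flags for a virtual interviewer
    (hypothetical encoding of the table above)."""
    head_moves: bool   # head movement, even while "listening"
    face_moves: bool   # whole-face movement rather than mouth only
    eyes_move: bool    # gaze shifts
    eyes_blink: bool   # eye blinks
    visemes: bool      # mouth forms sound-appropriate shapes vs. simple open/close

# Low-animation condition: only the mouth opens and closes during speech.
LOW_ANIMATION = AnimationProfile(
    head_moves=False, face_moves=False, eyes_move=False,
    eyes_blink=False, visemes=False,
)

# High-animation condition: motion capture from a human interviewer
# drives head, face, eye, and mouth movement.
HIGH_ANIMATION = AnimationProfile(
    head_moves=True, face_moves=True, eyes_move=True,
    eyes_blink=True, visemes=True,
)
```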
Dialog capability features of virtual interviewers.

| Feature | Low dialog capability | High dialog capability |
| --- | --- | --- |
| Reads question as worded | Yes | Yes |
| Understands spoken answers | Yes | Yes |
| Repeats question if asked | Yes | Yes |
| Understands explicit requests for clarification | Yes | Yes |
| Provides clarification when explicitly requested | No: presents neutral probe (e.g., “Whatever it means to you”; “Let me repeat the question”) | Yes: reads definition |
| Offers clarification when it seems needed (based on respondent's verbal and visual behavior) | No | Yes |
Wizard's decision rules.

| Low dialog capability | High dialog capability |
| --- | --- |
| Give respondent 3 min to familiarize him/herself with the packet, and ignore respondent if he/she says he/she is ready | Give respondent 3 min to familiarize him/herself with the packet, but begin interview if respondent says he/she is ready |
| Wait 10 s between transition and question clip, regardless of respondent behavior | Wait for respondent to look at virtual interviewer before presenting next question clip |
| Do not modify presentation of clips based on respondent's gaze or attention | Stop presenting a clip if respondent stops looking at virtual interviewer |
| Send research assistant to help respondent if in trouble | Use virtual interviewer to assist respondent if in trouble; if not successful, send research assistant |
| If respondent seems hesitant or confused, do nothing | If respondent seems hesitant or confused, then offer help |
| If respondent asks for help, then present neutral probe | If respondent asks for help not related to the scenario, then present neutral probe |
| | If respondent asks for help pertaining to the scenario, then present entire definition |
| | If respondent asks for help with specific mention of a key concept, then present partial definition |
| If respondent interrupts virtual interviewer, then finish presenting clip and wait for respondent to repeat him/herself | If respondent interrupts virtual interviewer, then present waiting clip and address respondent's concern immediately |
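These decision rules amount to a simple dispatch on respondent behavior, branching on the interviewer's dialog capability. The sketch below is illustrative only: the event and action names (`Event`, `wizard_action`) are hypothetical and not from the paper, and the actual Wizard was a human operator applying these rules by hand.

```python
from enum import Enum, auto

class Event(Enum):
    """Respondent behaviors the Wizard reacted to (simplified from the table above)."""
    SEEMS_CONFUSED = auto()        # hesitant or confused demeanor
    ASKS_GENERIC_HELP = auto()     # help request not related to the scenario
    ASKS_SCENARIO_HELP = auto()    # help request pertaining to the scenario
    MENTIONS_KEY_CONCEPT = auto()  # help request naming a key concept
    INTERRUPTS = auto()            # respondent talks over the interviewer

def wizard_action(event: Event, high_dialog: bool) -> str:
    """Map a respondent behavior to the Wizard's scripted reaction."""
    if not high_dialog:
        # Low dialog capability: never volunteer help; any help request
        # gets a neutral probe, and interruptions are not accommodated.
        if event in (Event.ASKS_GENERIC_HELP, Event.ASKS_SCENARIO_HELP,
                     Event.MENTIONS_KEY_CONCEPT):
            return "present neutral probe (e.g., 'Whatever it means to you')"
        if event is Event.INTERRUPTS:
            return "finish presenting clip; wait for respondent to repeat"
        return "do nothing"

    # High dialog capability: offer and tailor clarification.
    return {
        Event.SEEMS_CONFUSED: "offer help ('Can I help you?')",
        Event.ASKS_GENERIC_HELP: "present neutral probe",
        Event.ASKS_SCENARIO_HELP: "read entire definition",
        Event.MENTIONS_KEY_CONCEPT: "read partial definition",
        Event.INTERRUPTS: "present waiting clip; address concern immediately",
    }[event]
```

For example, `wizard_action(Event.MENTIONS_KEY_CONCEPT, high_dialog=True)` yields the partial-definition action, while the same event under low dialog capability yields only a neutral probe.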
Figure 2. Response accuracy (percentage of survey responses that matched what the official definition would require) for straightforward and complicated scenarios (error bars represent SEs).
Percentage of question-answer sequences in which clarification and related speech occurred (SEs in parentheses).

| Speech event | Low dialog, low animation | Low dialog, high animation | High dialog, low animation | High dialog, high animation |
| --- | --- | --- | --- | --- |
| Respondent explicit requests for clarification (“What do you mean by ‘furniture’?”) | | | 29.2 (5.9) | 24.1 (5.8) |
| Respondent implicit requests for clarification (“I don't know whether to count that or not”) | 6.3 (1.8) | 4.4 (1.8) | 4.6 (2.1) | 6.2 (2.1) |
| Virtual interviewer comments on respondent's confusion (“It sounds like you're having some trouble.”) | | | | |
| Virtual interviewer offers clarification (“Can I help you?”) | | | 25.8 (3.4) | 25.2 (3.2) |
| Respondent rejects offer | 5.3 (1.6) | 3.2 (1.1) | 5.1 (3.4) | 3.4 (1.4) |
| Virtual interviewer presents definition | | | 29.6 (5.1) | 23.3 (5.0) |

Statistically reliable and marginal differences are in boldface.
Figure 3. Percentage of time that respondents looked at the virtual interviewer and the scenario packet, on average, across the four conditions, broken down by whether they were answering questions that mapped onto the scenario in a straightforward (lighter shades) or complicated (darker shades) way. Gaze elsewhere (not at the virtual interviewer or scenario packet) was so rare (less than 1% of the time in all conditions) that it is not plotted.
Figure 4. Respondents' rates of aggregated acknowledgment behaviors (verbal back channels, nods, head shakes, other head movements, and other body and facial movements) per speaking turn (error bars represent SEs).
Respondents' subjective ratings of the virtual interviewers, presented in the order the ratings were elicited (SEs in parentheses).

| Question | Low anchor (1 =) | Low dialog, low animation | Low dialog, high animation | High dialog, low animation | High dialog, high animation |
| --- | --- | --- | --- | --- | --- |
| How comfortable were you with Derek at the start of the session? | Not at all comfortable | 3.83 (0.27) | 3.22 (0.27) | 3.33 (0.27) | 3.00 (0.26) |
| As the interview progressed, did your comfort with Derek increase, decrease, or stay the same? | Decrease | 1.78 (0.15) | 1.78 (0.15) | 1.56 (0.15) | 1.21 (0.15) |
| How natural was the interaction with Derek? | Not at all natural | 3.22 (0.27) | 2.61 (0.27) | 3.17 (0.27) | 2.74 (0.27) |
| How often did Derek seem to act on his own? | Never | 2.56 (0.25) | 2.44 (0.25) | 3.83 (0.25) | 2.74 (0.25) |
| Would you say that Derek acted more like a computer or a person? | Just like a computer | 1.44 (0.15) | 1.61 (0.15) | 2.11 (0.15) | 1.68 (0.15) |
| How much did you enjoy interacting with Derek? | Did not enjoy at all | 3.06 (0.22) | 3.22 (0.22) | 3.83 (0.22) | 3.37 (0.22) |
| How frustrating was it to be interviewed by Derek? | Not at all frustrating | 1.88 (0.24) | 2.11 (0.23) | 1.50 (0.23) | 1.84 (0.22) |
| I felt that Derek was… | Impersonal | 1.94 (0.23) | 2.35 (0.24) | 3.28 (0.23) | 2.68 (0.23) |
| I felt that Derek was… | Distant | 2.17 (0.24) | 2.77 (0.25) | 3.19 (0.26) | 2.79 (0.24) |
| I felt that Derek was… | Inexpressive | 2.56 (0.29) | 2.78 (0.29) | 3.00 (0.29) | 2.84 (0.28) |
| I felt that Derek was… | Insensitive | 2.56 (0.22) | 2.94 (0.22) | 3.22 (0.22) | 3.32 (0.22) |

Statistically significant and marginal effects are in bold.