Syaheerah Lebai Lutfi, Fernando Fernández-Martínez, Jaime Lorenzo-Trueba, Roberto Barra-Chicote, Juan Manuel Montero.
Abstract
We describe work on infusing emotion into a limited-task autonomous spoken conversational agent situated in the domestic environment, using a need-inspired task-independent emotion model (NEMO). To demonstrate how the model generates affect, we describe its integration with a natural-language mixed-initiative HiFi-control spoken conversational agent (SCA). NEMO and the host system communicate externally, so the Dialog Manager does not have to be modified to become adaptive, as it does in most existing dialog systems. The first part of the paper concerns the integration between NEMO and the host agent. The second part summarizes the work on automatic prediction of affect, namely frustration and contentment, from dialog features, a non-conventional source, in an attempt to move towards a more user-centric approach. The final part reports the evaluation results obtained from a user study in which both versions of the agent (non-adaptive and emotionally adaptive) were compared. The results provide substantial evidence of the benefits of adding emotion to a spoken conversational agent, especially in mitigating users' frustration and, ultimately, improving their satisfaction.
Year: 2013 PMID: 23945740 PMCID: PMC3812615 DOI: 10.3390/s130810519
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Figure 1. The architecture of the need-inspired emotion model with the HiFi spoken conversational agent (NEMOHIFI).
Datasets re-clustered according to similarity of score into all possible combinations of classes.
| Number of classes | Merged scores | | | | |
|---|---|---|---|---|---|
| Five (5, original classes) | 1 | 2 | 3 | 4 | 5 |
| % distribution (U, A) | 2, 1 | 5, 16 | 22, 36 | 33, 29 | 38, 18 |
| Four (4) | - | 1, 2 | 3 | 4 | 5 |
| % distribution (U, A) | - | 7, 17 | 22, 36 | 33, 29 | 38, 18 |
| Three (version 1) (3V1) | - | 1, 2 | 3 | 4, 5 | - |
| % distribution (U, A) | - | 7, 17 | 22, 36 | 71, 49 | - |
| Three (version 2) (3V2) | - | 1, 2, 3 | - | 4 | 5 |
| % distribution (U, A) | - | 29, 53 | - | 33, 29 | 38, 18 |
| Two (version 1) (2V1) | - | 1, 2, 3 | - | 4, 5 | - |
| % distribution (U, A) | - | 29, 53 | - | 71, 47 | - |
| Two (version 2) (2V2) | - | 1, 2 | - | 3, 4, 5 | - |
| % distribution (U, A) | - | 7, 17 | - | 93, 83 | - |
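The re-clustering in this table amounts to summing the five-class percentages over each group of merged scores. The following is a minimal sketch of that arithmetic, not the authors' code: the names `FIVE_CLASS`, `SCHEMES`, `recluster` and `majority_share` are hypothetical, and the input percentages are copied from the five-class row. Note that summing the annotator percentages for scores 4 and 5 gives 47, whereas the 3V1 row prints 49.

```python
# Percentage of samples per original score (1..5), copied from the five-class row.
FIVE_CLASS = {
    "U": [2, 5, 22, 33, 38],   # user data
    "A": [1, 16, 36, 29, 18],  # annotator data
}

# Each scheme lists which original scores are merged into one class.
# The keys are illustrative labels, following the abbreviations in the table.
SCHEMES = {
    "5":   [[1], [2], [3], [4], [5]],
    "4":   [[1, 2], [3], [4], [5]],
    "3V1": [[1, 2], [3], [4, 5]],
    "3V2": [[1, 2, 3], [4], [5]],
    "2V1": [[1, 2, 3], [4, 5]],
    "2V2": [[1, 2], [3, 4, 5]],
}


def recluster(dist, groups):
    """Sum the per-score percentages over each group of merged scores."""
    return [sum(dist[score - 1] for score in group) for group in groups]


def majority_share(dist, groups):
    """Share of the largest merged class, i.e. the accuracy of always guessing it."""
    return max(recluster(dist, groups))


if __name__ == "__main__":
    for name, groups in SCHEMES.items():
        merged = {src: recluster(dist, groups) for src, dist in FIVE_CLASS.items()}
        print(name, merged)
```

Coarser schemes raise the share of the largest class, which is worth keeping in mind when comparing classification accuracies across schemes.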
Comparisons of significant improvements in classification accuracies in detecting the satisfaction score from conversational features (for both the user and annotator datasets).
| Number of classes | Baseline (U) | Baseline (A) | SiLog (U) | SiLog (A) | SMO (U) | SMO (A) | Ord (U) | Ord (A) |
|---|---|---|---|---|---|---|---|---|
| Five | 38.0 | 36.0 | - | 49.3 | - | 44.6 | - | 51.3 |
| Four | 38.0 | 36.0 | - | 53.1 | - | 43.4 | - | 52.0 |
| Three (version 1) | 71.0 | 47.0 | - | 64.0 | - | 61.1 | - | 62.5 |
| Three (version 2) | 38.0 | 53.0 | - | - | 50.7 | - | - | - |
| Two (version 1) | 71.0 | 53.0 | - | 75.0 | - | 74.4 | 69.4 | |
| Two (version 2) | 93.0 | 83.0 | - | - | - | - | - | - |
SiLog = Functions.SimpleLogistics, SMO = Functions.SMO, Ord = Meta.Ordinal. U = user data, A = annotator data. Results were truncated to display only the best statistically significant classification improvements (at p < 0.05).
Figure 2. Summarized interaction chart for the annotators' dataset.
Figure 3. Demonstration room.
Figure 4. The virtual questionnaires. (a) Ratings for the current agent; (b) ratings for the next agent, shown alongside the ratings of the previous agent for reference, which users were allowed to edit.
Figure 5. Comparisons of preference and satisfaction ratings between BASEHIFI and NEMOHIFI. (a) Preference percentage for both agents; (b) mean satisfaction ratings for both agents.
t-test results comparing the mean subjective ratings between BASEHIFI and NEMOHIFI.
| Measure | BASEHIFI (mean) | NEMOHIFI (mean) | Mean difference | t | p |
|---|---|---|---|---|---|
| PERFORMANCE | 1.13 | 1.41 | −0.28 | −1.43 | 0.16 |
| RESPONSE | 1.46 | 1.29 | 0.17 | 1.05 | 0.30 |
| VOICE | 1.67 | 2.14 | −0.47 | −3.31 | 0.001 |
| ATTITUDE | 0.90 | 2.11 | −1.21 | −7.76 | 0.000 |
| NATURALNESS | 0.42 | 1.54 | −1.32 | −7.41 | 0.000 |
| GSS | 1.00 | 1.69 | −1.21 | −4.89 | 0.000 |
Results at p < 0.001 are highly significant.
The objective metrics with significant differences between BASEHIFI and NEMOHIFI.
| Metric | BASEHIFI (mean) | NEMOHIFI (mean) | Mean difference | t | p |
|---|---|---|---|---|---|
| Ave_sentence_recog_conf | 0.75 | 0.73 | 0.02 | 2.13 | 0.04 |
| Exec_actions | 30.47 | 27.68 | 2.79 | 2.77 | 0.01 |
| TT | 21.79 | 19.97 | 1.82 | 3.52 | 0.001 |
Results at p < 0.05 are significant.