Abhisek Tiwari1, Tulika Saha1, Sriparna Saha1, Shubhashis Sengupta2, Anutosh Maitra2, Roshni Ramnani2, Pushpak Bhattacharyya1.
Abstract
PURPOSE: Existing virtual agents (VAs) present in dialogue systems are either information retrieval based or static goal-driven. However, in real-world situations, end-users might not have a known and fixed goal beforehand for the task, i.e., they may upgrade/downgrade/update their goal components in real-time to maximize their utility values. Existing VAs are unable to handle such dynamic goal-oriented situations.
Year: 2021 PMID: 33793633 PMCID: PMC8016253 DOI: 10.1371/journal.pone.0249030
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Performance of a traditional dialogue agent and our proposed agent in dynamic goal setting scenario.
Fig 2. Dynamic Goal Driven Virtual Agent (DGDVA).
Statistics of DevVA dataset.
| Entries | Values |
|---|---|
| Total No. of dialogues | 1000 |
| Total Utterances | 8335 |
| Average Dialogue length | 8 |
| No. of Slots | 18 |
Sample dialogue conversation 3.
| User | Agent |
|---|---|
| Hi, I want to buy a new phone | Sure, please let me know what kind of phone do you need? |
| It should be an Acer phone with high battery capacity. | Is it all your requirements? |
| I watch movie regularly, so display should be big. | Great, you should take Liquid Z6 that matches with your choice. It has a 4080 mah battery capacity and 401 inch display. |
| What are other features? | This phone contains 3 GB Ram, 13 MP camera …. 32 GB internal memory. |
| Nice, does it contain single slot for sim? | Yes, there is single slot for sim. |
| No, it is not sufficient for me. I need at least dual sim phone. | No issue, you can have Liquid Zade Primo which has two sim slots. |
| I have not heard its name, is it a latest phone? | This phone has been released in 2015 |
| it is an old phone, see some latest one of 2016 | Yes, you can take Iconic Talk S by Acer that will be perfect for you. It is released in 2016 only. |
| Yes please book it or me. | Sure, it has been booked for you. |
| Thanks buddy | You are most welcome. Hope you will enjoy this phone. |
Fig 3. Sentiment class distribution of utterances of DevVA dataset.
Sample dialogue conversation 1.
| User | Agent |
|---|---|
| Hi, I am in search of a good phone | hello, any specific phone do you need? |
| It should have big display and high memory | Is it all that you need? |
| Yes, and color should be black. | you can buy F3 by Oppo which has internal Ram of 64 GB and display size of 401 pixel. |
| What is its Primary Camera? | It is 13MP |
| Oh, but 13 MP is less, I take many pictures. show something with at least 15 MP. | Okay then V5 Plus by Vivo is perfect for you. |
| What is its weight | It is 184 gms. |
| Okay, I like this phone. Please Pack it for me. | Good choice, I am booking it for you. |
Sample dialogue conversation 2.
| User | Agent |
|---|---|
| hi,can you help me in choosing a good phone? | Yeah Sure, What kind of phone are you looking? |
| It should be 2016 or latest released phone with 6 GB or more ram | Do you need any other features? |
| No, this is all my minimum requirements | you can buy M2017 by Gionee which has 6 GB Ram and it is released in 2016. |
| What is its Cost? | Its cost is 1150 euro. |
| Oh it is very costly, Is there anything below 500 euro? | You can buy 3T by OnePlus which costs only 440 euro. |
| What is its color? | The color of the phone is gunmetal. |
| But, my favorite color is black only | Then, you should take M6s Plus by Gionee. |
| Nice, Please Pack it for me. | Yeah, I am booking it for you. |
Fig 4. System architecture of the proposed methodology.
Fig 5. Dialogue policy optimization through a Deep Q Network (DQN).
Fig 6. Actor Critic (AC) training architecture.
Fig 7. Training flow of the proposed RL based framework.
Fig 8. BERT for joint intent classification and slot filling.
Results for intent classification and slot tagging tasks.
| Model | Task | Accuracy (in %) | F1-Score |
|---|---|---|---|
| Joint BERT | Intent Classification | 93.11 | 0.874 |
| | Slot Tagging | 87.39 | 0.866 |
| Joint SFIDCN | Intent Classification | 82.90 | 0.808 |
| | Slot Tagging | 85.18 | 0.827 |
| SF-ID | Intent Classification | 92.55 | 0.865 |
| | Slot Tagging | 85.04 | 0.839 |
Results of different models for sentiment classification.
| Model | Accuracy | F1-Score |
|---|---|---|
| GRU | 94.25% | 0.941 |
| LSTM | 94.31% | 0.941 |
| Bi-LSTM | 94.61% | 0.943 |
| Bi-LSTM + Attention | 95.02% | 0.946 |
| Pre trained BERT | 91.37% | 0.910 |
| Pre trained XLNet |
Results obtained by different dialogue agents.
| Agent | Success rate | Avg. Reward | Avg. Dialogue Length |
|---|---|---|---|
| Random Agent | 0.011 | -232 | 15.25 |
| Rule Agent | 0.000 | -127 | 11.00 |
| AC | 0.6615 | 36.11 | 11.32 |
| AC | 0.6753 | 37.18 | 11.20 |
| Vanilla DQN | 0.8413 | 68.41 | 9.30 |
| Vanilla DQN | 0.8573 | 70.82 | 8.91 |
| DDQN | 0.8693 | 73.02 | 8.68 |
| | 0.8610 | 72.14 | 8.52 |
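The three columns in the table above are aggregates over evaluation episodes. A minimal sketch of how such aggregates could be computed is shown here; the `Episode` record, its field names, and the sample values are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Episode:
    """One evaluation dialogue (hypothetical record layout)."""
    success: bool        # True if the user's final goal was satisfied
    total_reward: float  # sum of per-turn rewards in the dialogue
    num_turns: int       # dialogue length in turns


def summarize(episodes):
    """Return success rate, average reward and average dialogue length."""
    if not episodes:
        raise ValueError("need at least one episode")
    return {
        "success_rate": mean(1.0 if e.success else 0.0 for e in episodes),
        "avg_reward": mean(e.total_reward for e in episodes),
        "avg_dialogue_length": mean(e.num_turns for e in episodes),
    }


# Illustrative values only; they do not reproduce any row of the table.
episodes = [
    Episode(True, 70.0, 8),
    Episode(False, -30.0, 12),
    Episode(True, 74.0, 10),
    Episode(True, 66.0, 10),
]
summary = summarize(episodes)
print(summary)
```

Keeping the per-episode records, rather than only the running averages, also makes it possible to run a significance test between two agents later.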
Fig 9. Learning curves: (a) learning graphs of different agents, (b) reward graph over episodes: Vanilla DQN and DDQN agents, (c) reward graph over episodes: Actor Critic, Rule agent and Random agent, (d) reward graph over episodes: DDQN agents, (e) success rate over episodes: DDQN agents, (f) dialogue length over episodes: DDQN agents.
Statistical significance test result: p value at 5% significance level.
| Model | Success rate | Dialogue Length |
|---|---|---|
| AC | 1.03e−08 | 1.51e−08 |
| AC | 8.77e−08 | 8.62e−09 |
| VDQN | 3.59 | 3.22 |
| VDQN | 3.79 | 4.85 |
| DQN | 4.09 | 6.77 |
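This excerpt does not say which test produced the p-values above. Purely as an illustration, the sketch below assumes a Welch's t-test (two samples, unequal variances) and computes its statistic and degrees of freedom with the standard library. The function name `welch_t` and the sample values are hypothetical. Turning the statistic into a p-value needs the Student t distribution, which the standard library does not provide.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each sample needs at least two observations")
    # Squared standard error of each sample mean (sample variance / n).
    se_a = variance(sample_a) / n_a
    se_b = variance(sample_b) / n_b
    t = (mean(sample_a) - mean(sample_b)) / sqrt(se_a + se_b)
    df = (se_a + se_b) ** 2 / (se_a ** 2 / (n_a - 1) + se_b ** 2 / (n_b - 1))
    return t, df


# Hypothetical success rates from five runs of two agents (not paper data).
agent_a = [0.86, 0.87, 0.88, 0.87, 0.87]
agent_b = [0.84, 0.85, 0.86, 0.85, 0.85]
t_stat, dof = welch_t(agent_a, agent_b)
print(f"t = {t_stat:.3f}, df = {dof:.1f}")
```

A result is called significant at the 5% level when the two-sided p-value for this statistic is below 0.05.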
Fig 10. Human score for different baselines and proposed model.
Fig 11. VA performance during testing: sample 1.
Fig 12. VA performance during testing: sample 2.
Fig 13. A sample that shows new goal formulation error.
Performance of the SimpleDS for dynamic goal setting.
| Evaluation Metric | Result |
|---|---|
| Avg Episodic Reward | -21.16 |
| Success rate | 0.003 |
| Avg. Dialogue length | 8.82 |
SentiVA’s performance for dynamic goal setting scenarios.
| Evaluation Metrics | Result |
|---|---|
| Avg. Reward | -0.73 |
| Success rate | 0.003 |
| Avg. Dialogue Length | 15.12 |
TSCP model performance for dynamic goal setting scenarios.
| Model | Success rate | Avg Dialogue Length | Entity match rate | Success F1 | BLEU |
|---|---|---|---|---|---|
| TSCP | 0.56 | 8.96 | 0.701 | 0.842 | 0.274 |
GO-Bot performance for dynamic goal setting scenarios.
| Evaluation Metrics | Result |
|---|---|
| Avg. Episodic Reward | -34.90 |
| Success rate | 0.001 |
| Avg. Dialogue Length | 14.96 |