Akira Taniguchi, Tadahiro Taniguchi, Angelo Cangelosi.
Abstract
In this paper, we propose a Bayesian generative model that can form multiple categories based on each sensory-channel and can associate words with any of the four sensory-channels (action, position, object, and color). This paper focuses on cross-situational learning using the co-occurrence between words and information of sensory-channels in complex situations rather than conventional situations of cross-situational learning. We conducted a learning scenario using a simulator and a real humanoid iCub robot. In the scenario, a human tutor provided a sentence that describes an object of visual attention and an accompanying action to the robot. The scenario was set as follows: the number of words per sensory-channel was three or four, and the number of trials for learning was 20 and 40 for the simulator and 25 and 40 for the real robot. The experimental results showed that the proposed method was able to estimate the multiple categorizations and to learn the relationships between multiple sensory-channels and words accurately. In addition, we conducted an action generation task and an action description task based on word meanings learned in the cross-situational learning scenario. The experimental results showed that the robot could successfully use the word meanings learned by using the proposed method.
Keywords: Bayesian model; cross-situational learning; lexical acquisition; multimodal categorization; symbol grounding; word meaning
Year: 2017 PMID: 29311888 PMCID: PMC5742219 DOI: 10.3389/fnbot.2017.00066
Source DB: PubMed Journal: Front Neurorobot ISSN: 1662-5218 Impact factor: 2.650
Figure 1 Overview of the cross-situational learning scenario as the focus of this study; the robot obtains multimodal information from multiple sensory-channels in a situation and estimates the relationships between words and sensory-channels.
Figure 2 Proposed graphical model for multichannel categorization and for learning word meaning; the action, position, color, and object categories are each represented by a component of a Gaussian mixture model (GMM). A word distribution is associated with each category of the GMMs. Gray nodes represent observed variables. Each variable is explained in the description of the generative model in Section 3.2.
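The generative story sketched in Figure 2 can be illustrated with a toy example. The sketch below is not the authors' implementation: it instantiates only one hypothetical sensory-channel (position) with two hand-picked Gaussian categories and word distributions, then draws a category, an observation, and a word.

```python
import random

# Toy version of the generative story in Figure 2: a sensory-channel is
# a mixture of Gaussians, and each mixture component (category) owns a
# word distribution. Only the "position" channel is instantiated here,
# and all numbers are invented for illustration.
random.seed(0)

model = {
    "position": {
        "categories": [(-1.0, 0.2), (1.0, 0.2)],   # (mean, std) per category
        "words": [{"left": 0.9, "right": 0.1},     # P(word | category 0)
                  {"left": 0.1, "right": 0.9}],    # P(word | category 1)
    },
}

def generate(channel):
    """Draw a category, then an observation and a word from it."""
    cats = model[channel]["categories"]
    k = random.randrange(len(cats))                # uniform category prior
    mu, sigma = cats[k]
    x = random.gauss(mu, sigma)                    # sensory observation
    words = model[channel]["words"][k]
    w = random.choices(list(words), weights=words.values())[0]
    return k, x, w

k, x, w = generate("position")
print(k, round(x, 2), w)
```

In the full model the same pattern is repeated over the action, object, and color channels, and learning inverts this process from co-occurring observations and words.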
Learning algorithm based on Gibbs sampling. (The full 35-step pseudocode listing did not survive extraction; the surviving steps are step 2, setting of hyperparameters, and step 3, initialization of parameters and latent variables, followed by iterative Gibbs sampling of the latent variables and parameters.)
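As a rough illustration of the kind of Gibbs sampler the algorithm describes, the following toy sketch alternates between resampling category assignments and component means for a one-dimensional, two-component Gaussian mixture with known variance. It omits the word observations and the nonparametric prior used in the paper; the data, prior, and hyperparameters here are invented.

```python
import math
import random

# Minimal Gibbs sampler for a 1-D, two-component Gaussian mixture with
# known variance: a toy stand-in for the per-channel categorization
# step. The paper's sampler additionally couples word observations and
# estimates the number of components nonparametrically.
random.seed(1)
data = [-1.2, -0.9, -1.1, 1.0, 1.3, 0.8]
K, sigma = 2, 0.5
mu = [random.gauss(0, 1) for _ in range(K)]        # component means
z = [random.randrange(K) for _ in data]            # category assignments

def normal_pdf(x, m, s):
    return math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))

for it in range(100):
    # 1) Resample each assignment given the current means.
    for i, x in enumerate(data):
        w = [normal_pdf(x, mu[k], sigma) for k in range(K)]
        z[i] = random.choices(range(K), weights=w)[0]
    # 2) Resample each mean given its assigned points
    #    (conjugate Gaussian prior N(0, variance 10)).
    for k in range(K):
        pts = [x for x, zi in zip(data, z) if zi == k]
        if pts:
            prec = 1 / 10 + len(pts) / sigma ** 2
            mean = (sum(pts) / sigma ** 2) / prec
            mu[k] = random.gauss(mean, math.sqrt(1 / prec))

print(sorted(round(m, 1) for m in mu))
```

After enough sweeps the two means settle near the two data clusters; the paper's algorithm applies the same resample-assignments/resample-parameters loop jointly across all four channels and the word distributions.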
Figure 3 Procedure for obtaining and processing data.
Figure 4 (A) Word probability distribution across the multiple categories; darker shades represent higher probability values. The letter–number pair on the left of the table is the index of the word distribution, indicating the sensory-channel related to the word distribution and the index of the category. Note that unused category indices are merged and not shown because the number of categories is estimated automatically by the nonparametric Bayesian method. (B) Learning result of the position category; for example, the index of position category p1 corresponds to the word "front." Each colored point group represents a Gaussian distribution of the position category, and the crosses in matching colors mark the object positions in the learning data. Each color represents a position category. (C) Example categorization results for the object category; (D) example categorization results for the color category.
Experimental results of the CSL task for 20 and 40 trials.
20 trials:

| Method | ROW | ARI_a | ARI_p | ARI_o | ARI_c | EAR_ |
|---|---|---|---|---|---|---|
| Proposed | 0 | 0.300 | 0.606 | 0.408 | 0.782 | |
| w/o MEC-II | 0 | 0.317 | 0.648 | 0.338 | 0.805 | |
| mMLDA | 0 | 0.316 | 0.428 | 0.277 | 0.756 | – |
| Proposed | 20 | 0.290 | 0.564 | 0.332 | 0.762 | 0.727 |
| w/o MEC-II | 20 | 0.342 | 0.486 | 0.436 | 0.755 | 0.598 |
| mMLDA | 20 | 0.267 | 0.494 | 0.369 | 0.776 | – |
| Proposed | 40 | 0.324 | 0.493 | 0.354 | 0.780 | 0.556 |
| w/o MEC-II | 40 | 0.318 | 0.486 | 0.347 | 0.812 | 0.529 |
| mMLDA | 40 | 0.356 | 0.479 | 0.312 | 0.771 | – |
| Proposed | 60 | 0.282 | 0.460 | 0.295 | 0.783 | 0.381 |
| w/o MEC-II | 60 | 0.311 | 0.454 | 0.326 | 0.750 | 0.406 |
| mMLDA | 60 | 0.294 | 0.487 | 0.403 | 0.724 | – |
| ALL | 100 (no word) | 0.325 | 0.431 | 0.346 | 0.751 | – |

40 trials:

| Method | ROW | ARI_a | ARI_p | ARI_o | ARI_c | EAR_ |
|---|---|---|---|---|---|---|
| Proposed | 0 | 0.375 | 0.540 | 0.366 | 0.870 | |
| w/o MEC-II | 0 | 0.383 | 0.524 | 0.333 | 0.805 | 0.834 |
| mMLDA | 0 | 0.388 | 0.594 | 0.377 | 0.822 | – |
| Proposed | 40 | 0.368 | 0.543 | 0.313 | 0.835 | |
| w/o MEC-II | 40 | 0.417 | 0.577 | 0.320 | 0.842 | 0.780 |
| mMLDA | 40 | 0.340 | 0.600 | 0.377 | 0.856 | – |
Bold with an underscore indicates the highest evaluation value; bold alone indicates the second highest.
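The ARI_a/ARI_p/ARI_o/ARI_c columns report the Adjusted Rand Index between estimated categories and ground truth on each channel. A self-contained sketch of the standard ARI computation (not taken from the paper's code) is:

```python
from math import comb
from collections import Counter

def adjusted_rand_index(truth, pred):
    """Chance-corrected agreement between two labelings of the same items."""
    n = len(truth)
    pairs = Counter(zip(truth, pred))                  # contingency table cells
    a = Counter(truth)                                 # row sums
    b = Counter(pred)                                  # column sums
    index = sum(comb(c, 2) for c in pairs.values())
    sum_a = sum(comb(c, 2) for c in a.values())
    sum_b = sum(comb(c, 2) for c in b.values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    return (index - expected) / (max_index - expected)

# Perfect clustering up to relabeling scores 1.0.
print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))   # -> 1.0
```

ARI is label-permutation invariant, which matters here because the category indices discovered by the sampler are arbitrary.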
Figure 5 Example results of the action generation task in the iCub simulator. (A) Reach front blue cup. (B) Grasp right green ball. (C) Touch left red box.
Evaluation values for the action generation task using the CSL results for 40 trials (ROW = 0%).
| Method | WAR | OAR |
|---|---|---|
| Proposed | ||
| w/o MEC-II | ||
| mMLDA | 0.260 | 0.667 |
Bold with an underscore indicates the highest evaluation value; bold alone indicates the second highest.
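One plausible reading of the action generation step is that each word in the instruction selects, on its sensory-channel, the category whose learned word distribution assigns that word the highest probability. The category names and probabilities below are invented for illustration; they are not the paper's learned values.

```python
# Hypothetical learned word distributions P(word | category) per channel.
word_dist = {
    "action":   {"a1": {"reach": 0.8, "grasp": 0.1},
                 "a2": {"reach": 0.1, "grasp": 0.8}},
    "position": {"p1": {"front": 0.9, "left": 0.05},
                 "p2": {"front": 0.05, "left": 0.9}},
}

def select_category(channel, word):
    """Pick the category k maximizing P(word | k) on this channel."""
    cats = word_dist[channel]
    return max(cats, key=lambda k: cats[k].get(word, 0.0))

# Interpreting the instruction "grasp ... left ...":
sentence = {"action": "grasp", "position": "left"}
plan = {ch: select_category(ch, w) for ch, w in sentence.items()}
print(plan)   # -> {'action': 'a2', 'position': 'p2'}
```

The selected categories would then drive the corresponding motor and attention behavior (e.g., the Gaussian mean of category p2 as the target position).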
Experimental results of the action description task for 20 and 40 trials.
| Method | Trials | ROW | F1 | ACC |
|---|---|---|---|---|
| Proposed | 20 | 0 | ||
| w/o MEC-II | 20 | 0 | ||
| mMLDA | 20 | 0 | 0.401 | 0.469 |
| Proposed | 20 | 40 | ||
| w/o MEC-II | 20 | 40 | ||
| mMLDA | 20 | 40 | 0.319 | 0.352 |
| Proposed | 40 | 0 | ||
| w/o MEC-II | 40 | 0 | ||
| mMLDA | 40 | 0 | 0.474 | 0.560 |
| Proposed | 40 | 40 | ||
| w/o MEC-II | 40 | 40 | ||
| mMLDA | 40 | 40 | 0.479 | 0.569 |
Bold with an underscore indicates the highest evaluation value; bold alone indicates the second highest.
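The F1 and ACC columns can be derived from word-level confusion counts between the described and correct sentences. The sketch below shows a generic F1/accuracy computation with invented counts; how the paper actually aggregates its counts may differ.

```python
def f1_and_accuracy(tp, fp, fn, tn):
    """F1 (harmonic mean of precision and recall) and accuracy
    from binary confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    acc = (tp + tn) / (tp + fp + fn + tn)
    return f1, acc

# Invented example counts, not the paper's data.
f1, acc = f1_and_accuracy(tp=40, fp=20, fn=10, tn=30)
print(round(f1, 3), round(acc, 3))   # -> 0.727 0.7
```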
Figure 6 Confusion matrices for the action description task using the learning results for 20 and 40 trials. (A) 20 trials; ROW values are (top) 0 and (bottom) 40. (B) 40 trials; ROW values are (top) 0 and (bottom) 40.
Figure 7 All of the objects used in the real experiments (14 objects including four types and four colors).
Figure 8 (A) Word probability distribution across the multiple categories. (B) Learning result of the position category; each colored point group represents a Gaussian distribution of the position category, the crosses in matching colors mark the object positions in the learning data, and the circle marks the area of the white circular table. Each color represents a position category. (C) Example categorization results for the object category; (D) example categorization results for the color category.
Experimental results of the CSL task for 25 and 40 trials.
25 trials:

| Method | ROW | ARI_a | ARI_p | ARI_o | ARI_c | EAR_ |
|---|---|---|---|---|---|---|
| Proposed | 0 | 0.239 | 0.932 | 0.201 | 0.720 | |
| w/o MEC-II | 0 | 0.299 | 0.971 | 0.207 | 0.717 | 0.723 |
| mMLDA | 0 | 0.255 | 0.959 | 0.226 | 0.703 | – |
| Proposed | 30 | 0.297 | 0.879 | 0.227 | 0.702 | |
| w/o MEC-II | 30 | 0.242 | 0.980 | 0.218 | 0.683 | 0.601 |
| mMLDA | 30 | 0.296 | 0.893 | 0.256 | 0.730 | – |
| Proposed | 50 | 0.240 | 0.905 | 0.257 | 0.681 | 0.604 |
| w/o MEC-II | 50 | 0.224 | 0.895 | 0.211 | 0.694 | 0.482 |
| mMLDA | 50 | 0.221 | 0.981 | 0.303 | 0.688 | – |

40 trials:

| Method | ROW | ARI_a | ARI_p | ARI_o | ARI_c | EAR_ |
|---|---|---|---|---|---|---|
| Proposed | 0 | 0.304 | 0.960 | 0.240 | 0.691 | |
| w/o MEC-II | 0 | 0.282 | 0.986 | 0.190 | 0.729 | 0.763 |
| mMLDA | 0 | 0.290 | 0.959 | 0.224 | 0.736 | – |
| Proposed | 30 | 0.303 | 0.978 | 0.193 | 0.698 | |
| w/o MEC-II | 30 | 0.349 | 0.917 | 0.219 | 0.726 | 0.718 |
| mMLDA | 30 | 0.307 | 0.956 | 0.159 | 0.717 | – |
| Proposed | 50 | 0.316 | 0.922 | 0.199 | 0.718 | 0.668 |
| w/o MEC-II | 50 | 0.258 | 0.937 | 0.210 | 0.726 | 0.639 |
| mMLDA | 50 | 0.297 | 0.989 | 0.123 | 0.687 | – |
Bold with an underscore indicates the highest evaluation value; bold alone indicates the second highest.
Figure 9 Examples of results of the action generation task with the real iCub. (A) Grasp front red ball. (B) Reach right red cup. (C) Look-at left yellow cup.
Experimental results for 25 and 40 trials.
| Method | Trials | ROW | F1 | ACC |
|---|---|---|---|---|
| Proposed | 25 | 0 | ||
| w/o MEC-II | 25 | 0 | ||
| mMLDA | 25 | 0 | 0.406 | 0.558 |
| Proposed | 40 | 0 | ||
| w/o MEC-II | 40 | 0 | ||
| mMLDA | 40 | 0 | 0.509 | 0.645 |
Bold with an underscore indicates the highest evaluation value; bold alone indicates the second highest.
Figure 10 Confusion matrices for the action description task using the learning results for (top) 25 and (bottom) 40 trials with ROW = 0%.