Caroline Lyon, Chrystopher L Nehaniv, Joe Saunders.
Abstract
The advent of humanoid robots has enabled a new approach to investigating the acquisition of language, and we report on the development of robots able to acquire rudimentary linguistic skills. Our work focuses on early stages analogous to some characteristics of a human child of about 6 to 14 months, the transition from babbling to first word forms. We investigate one mechanism among many that may contribute to this process, a key factor being the sensitivity of learners to the statistical distribution of linguistic elements. As well as being necessary for learning word meanings, the acquisition of anchor word forms facilitates the segmentation of an acoustic stream through other mechanisms. In our experiments some salient one-syllable word forms are learnt by a humanoid robot in real-time interactions with naive participants. Words emerge from random syllabic babble through a learning process based on a dialogue between the robot and the human participant, whose speech is perceived by the robot as a stream of phonemes. Numerous ways of representing the speech as syllabic segments are possible. Furthermore, the pronunciation of many words in spontaneous speech is variable. However, in line with research elsewhere, we observe that salient content words are more likely than function words to have consistent canonical representations; thus their relative frequency increases, as does their influence on the learner. Variable pronunciation may contribute to early word form acquisition. The importance of contingent interaction in real-time between teacher and learner is reflected by a reinforcement process, with variable success. The examination of individual cases may be more informative than group results. Nevertheless, word forms are usually produced by the robot after a few minutes of dialogue, employing a simple, real-time, frequency dependent mechanism. 
This work shows the potential of human-robot interaction systems in studies of the dynamics of early language acquisition.
Year: 2012 PMID: 22719871 PMCID: PMC3374830 DOI: 10.1371/journal.pone.0038236
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. The scenario for the human-robot dialogue.
List of salient words used by participants.
| One syllable words of form CVC | Other salient words |
| big | arrow |
| black | blue |
| box | circle |
| cross | crescent |
| cube | |
| green | |
| heart | |
| moon | |
| red | |
| ring | |
| round | |
| shape | |
| shapes | |
| small | |
| smile | |
| square | |
| squares | |
| star | |
| sun | |
| white | |
Salient content words which were spoken by participants in these experiments. Recall that in our notation C represents one or more instances of a consonant.
The CMU phoneme set.
| Phoneme | Example | Phoneme | Example |
| aa | odd | k | key |
| ae | at | l | lee |
| ah | hut | m | me |
| ao | ought | n | knee |
| aw | cow | ng | ping |
| ay | hide | ow | oat |
| b | be | oy | toy |
| ch | cheese | p | pee |
| d | dee | r | read |
| dh | thee | s | sea |
| eh | Ed | sh | she |
| er | hurt | t | tea |
| ey | ate | th | theta |
| f | fee | uh | hood |
| g | green | uw | two |
| hh | he | v | vee |
| ih | it | w | we |
| iy | eat | y | yield |
| jh | gee | z | zee |
| zh | vision | | |
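The CVC and CV notation used for the word lists can be made concrete with a small classifier over the phoneme table. This is a minimal sketch, not the authors' implementation: the function name `syllable_shape` is hypothetical, the vowel and consonant split is read off the table, and the input is assumed to be space-separated phonemes with each consonant written separately (for example `g r iy n`), so that a run of consonants collapses to a single C.

```python
# Vowel and consonant symbols taken from the CMU phoneme table above.
CMU_VOWELS = {
    "aa", "ae", "ah", "ao", "aw", "ay", "eh", "er",
    "ey", "ih", "iy", "ow", "oy", "uh", "uw",
}
CMU_CONSONANTS = {
    "b", "ch", "d", "dh", "f", "g", "hh", "jh", "k", "l", "m", "n",
    "ng", "p", "r", "s", "sh", "t", "th", "v", "w", "y", "z", "zh",
}


def syllable_shape(phonemes):
    """Return a shape string such as 'CVC' for space-separated phonemes.

    Hypothetical helper: adjacent consonants collapse into one 'C',
    matching the convention that C is one or more consonants.
    """
    shape = []
    for symbol in phonemes.split():
        if symbol in CMU_VOWELS:
            label = "V"
        elif symbol in CMU_CONSONANTS:
            label = "C"
        else:
            raise ValueError(f"not in the phoneme table: {symbol!r}")
        # Skip a consonant that directly follows another consonant.
        if label == "C" and shape and shape[-1] == "C":
            continue
        shape.append(label)
    return "".join(shape)


if __name__ == "__main__":
    for word, phones in [("red", "r eh d"), ("green", "g r iy n"), ("blue", "b l uw")]:
        print(word, syllable_shape(phones))
```

Under this reading, "red" and "green" come out as CVC while "blue" comes out as CV, which agrees with "blue" being listed among the other salient words rather than the CVC words.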
Figure 2. Overview of the system architecture.
Statistics on participants’ speech, with robot’s perceptions and productions.
| Col. 1 | Col. 2 | Col. 3 | Col. 4 | Col. 5 | Col. 6 | Col. 7 | Col. 8 |
| P’pant | Total no. of words from p’pant | No. of different words from p’pant | No. of different syllables perceived by robot | No. of different CVC syllables perceived by robot | No. of salient words in top 10 spoken CVC words | No. of salient sylls. in top 10 perceived CVC sylls. | No. of salient words uttered by robot |
| Set 1 | |||||||
| 1A | 282 | 38 | 267 | 96 | 6 | 5 | 5 |
| 1B | 387 | 76 | 358 | 126 | 4 | 0 | 2 |
| 1C | 447 | 53 | 284 | 99 | 5 | 3 | 6 |
| 1D | 481 | 78 | 317 | 116 | 7 | 5 | 4 |
| 1E | 825 | 84 | 458 | 199 | 5 | 4 | 4 |
| 1F | 559 | 121 | 453 | 189 | 5 | 3 | 1 |
| 1G | 398 | 79 | 306 | 117 | 4 | 0 | 0 |
| Set 2 | |||||||
| 2A | 627 | 53 | 149 | 38 | 5 | 2 | 1 |
| 2B | 475 | 61 | 98 | 21 | 5 | 2 | 2 |
| 2C | 876 | 113 | 170 | 45 | 3 | 1 | 1 |
| 2D | 832 | 113 | 176 | 44 | 4 | 2 | 0 |
| 2E | 229 | 20 | 86 | 21 | 8 | 2 | 1 |
| 2F | 729 | 109 | 151 | 28 | 1 | 0 | 1 |
| 2G | – | – | 139 | 35 | – | 1 | 2 |
| Set 3 | |||||||
| 3A | 454 | 57 | 316 | 114 | 4 | 3 | 4 |
| 3B | 165 | 20 | 181 | 61 | 7 | 3 | 2 |
| 3C | 330 | 56 | 297 | 112 | 8 | 4 | 2 |
| 3D | 110 | 27 | 171 | 57 | 2 | 2 | 2 |
| 3E | 297 | 48 | 304 | 118 | 7 | 3 | 3 |
| 3F | 656 | 135 | 549 | 240 | 3 | 2 | 1 |
| Set 4 | |||||||
| 4A | 611 | 92 | 426 | 180 | 4 | 4 | 3 |
| 4B | 368 | 44 | 359 | 154 | 4 | 2 | 1 |
| 4C | 654 | 84 | 515 | 225 | 6 | 1 | 2 |
| 4D | 692 | 128 | 488 | 212 | 3 | 2 | 3 |
| 4E | 704 | 100 | 476 | 203 | 4 | 2 | 2 |
| 4F | 234 | 65 | 298 | 110 | 8 | 4 | 4 |
| 4G | 180 | 24 | 227 | 80 | 5 | 3 | 3 |
| Set 5 | |||||||
| 5A | 558 | 118 | 492 | 213 | 2 | 3 | 2 |
| 5B | 681 | 145 | 539 | 235 | 5 | 3 | 1 |
| 5C | 491 | 46 | 301 | 117 | 6 | 3 | 3 |
| 5D | 221 | 53 | 243 | 85 | 4 | 3 | 2 |
| 5E | 715 | 120 | 474 | 211 | 4 | 2 | 2 |
| 5F | 189 | 20 | 139 | 46 | 6 | 5 | 5 |
| 5G | 83 | 7 | 52 | 13 | 2 | 2 | 2 |
| min | 83 | 7 | 52 | 13 | 1 | 0 | 0 |
| max | 876 | 145 | 549 | 240 | 8 | 5 | 6 |
| mean | 470.9 | 72.3 | 344.1 | 138.1 | 4.7 | 2.8 | 2.6 |
| SD | 226.8 | 38.8 | 131 | 64.0 | 1.8 | 1.3 | 1.4 |
See text, section “Experimental Programme” for description of Set 2. The min, max, mean and SD for columns 4, 5, 7 and 8 exclude Set 2, as the filtered figures are not comparable to those in other sets.
Pearson correlation between columns 6 and 7: [value missing]; between columns 7 and 8: [value missing]; both exclude Set 2.
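The two correlations named in this note can be recomputed from the table. The sketch below is illustrative only: the `ROWS` list is a hand transcription of columns 6, 7 and 8 for Sets 1, 3, 4 and 5 (27 participants), and `pearson` is a plain textbook implementation rather than whatever tool the authors used. No result is quoted here, since the reported coefficients are not available for comparison.

```python
from math import sqrt

# (col 6, col 7, col 8) for each participant, Sets 1, 3, 4 and 5 only.
ROWS = [
    (6, 5, 5), (4, 0, 2), (5, 3, 6), (7, 5, 4), (5, 4, 4), (5, 3, 1), (4, 0, 0),  # Set 1
    (4, 3, 4), (7, 3, 2), (8, 4, 2), (2, 2, 2), (7, 3, 3), (3, 2, 1),             # Set 3
    (4, 4, 3), (4, 2, 1), (6, 1, 2), (3, 2, 3), (4, 2, 2), (8, 4, 4), (5, 3, 3),  # Set 4
    (2, 3, 2), (5, 3, 1), (6, 3, 3), (4, 3, 2), (4, 2, 2), (6, 5, 5), (2, 2, 2),  # Set 5
]


def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


col6 = [row[0] for row in ROWS]
col7 = [row[1] for row in ROWS]
col8 = [row[2] for row in ROWS]

if __name__ == "__main__":
    print("columns 6 and 7:", round(pearson(col6, col7), 2))
    print("columns 7 and 8:", round(pearson(col7, col8), 2))
```

As a transcription check, the column 7 and column 8 values sum to 76 and 71, which over 27 participants gives the means of 2.8 and 2.6 shown in the table.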
Comparison of word counts with syllable counts as perceived by the robot: Set 1 as an example.
| Participant | Number of different words spoken | Number of different CVC words spoken | Number of different CVC syllables as perceived by robot |
| 1A | 38 | 16 | 96 |
| 1B | 76 | 50 | 126 |
| 1C | 53 | 26 | 99 |
| 1D | 78 | 36 | 116 |
| 1E | 84 | 42 | 199 |
| 1F | 121 | 55 | 189 |
| 1G | 79 | 35 | 117 |
Note the difference in number of CVC words, when the speech stream is segmented into words, in contrast with the much larger number of syllables as perceived by the robot with no knowledge of syllable boundaries. This is in spite of the fact that some small part of the participant’s speech is not perceived when he/she talks over DeeChee out of turn.
Words produced by the robot and those missed by the teacher for reinforcement.
| Set number | Number of participants | Salient words produced by the robot | Number of words missed by participants | Word reinforced but heuristic failed (included in col. 4) |
| 1 | 7 | 22 | 17 | 1 |
| 2 | 7 | 8 | 4 | 0 |
| 3 | 6 | 14 | 11 | 2 |
| 4 | 7 | 18 | 12 | 1 |
| 5 | 7 | 17 | 10 | 0 |
| total | 34 | 79 | 54 | 4 |
Aggregated figures.
Note the large number of words uttered by DeeChee but not noticed by the participant (54 of 79 in total). “Simulated reinforcement” would find these matches.
Words learnt.
| Set number | Number of participants | Salient words learnt | Non-words learnt | Other words learnt |
| 1 | 7 | 5 | 11 | 2 |
| 2 | 7 | 4 | 9 | 5 |
| 3 | 6 | 3 | 11 | 6 |
| 4 | 7 | 6 | 11 | 4 |
| 5 | 7 | 7 | 13 | 5 |
| total | 34 | 25 | 55 | 22 |
One-syllable words uttered by DeeChee, perceived by teacher, reinforced, and entered in lexicon as learnt. “Other words” are proper but non-salient words, such as “this” or “that”.
Figure 3. Zipfian relationship between frequency of CVC words and rank.
Zipfian relationship between frequency of one-syllable CVC words in phonemic form, as perceived by the robot, and rank of the word. Recall that ‘C’ represents a consonant or a cluster of consonants, V represents a vowel. Example taken from participant 4A.
Figure 4. Zipfian relationship between frequency of CV syllables and rank.
Zipfian relationship between frequency of CV syllables, in phonemic form, as perceived by the robot, and rank of the syllable. Example taken from participant 4A.
F-measures.
| Set number | F1-1 (real reinforcement) | F1-2 (“simulated reinforcement”) |
| 1 | 0.26 | 0.80 |
| 2 | 0.38 | 0.64 |
| 3 | 0.21 | 0.72 |
| 4 | 0.34 | 0.77 |
| 5 | 0.38 | 0.72 |
| Set number | F2-1 (real reinforcement) | F2-2 (“simulated reinforcement”) |
| 1 | 0.25 | 0.77 |
| 2 | 0.31 | 0.53 |
| 3 | 0.18 | 0.62 |
| 4 | 0.32 | 0.71 |
| 5 | 0.33 | 0.65 |
F-measure for each set under each of 2 conditions: real reinforcement and “simulated reinforcement”, F1-1 and F1-2. “Simulated reinforcement” is based on the number of salient content words spoken by the robot. Most of these were not noticed and so not reinforced by the participant. The analysis is repeated using a different definition of false positives, which includes the “other” words, giving scores F2-1 and F2-2. “Other” words are proper words such as “this” and “that” that are not salient content words (the names of shapes and colours). See text.
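The F-measures can be approximately reproduced from the two preceding tables. The sketch below rests on an inferred reading, not on a definition stated here. It assumes:

- Real reinforcement: true positives are the salient words learnt, and false negatives are the words missed by participants.
- “Simulated reinforcement”: every salient word the robot produced counts as a true positive, with no false negatives.
- False positives: the non-words learnt (F1), or the non-words plus the “other” words (F2).

The names `COUNTS`, `REPORTED`, `f_measure` and `set_scores` are hypothetical.

```python
# Per-set counts copied from the tables "Words produced by the robot and
# those missed by the teacher" and "Words learnt".
COUNTS = {
    1: {"produced": 22, "missed": 17, "salient": 5, "nonwords": 11, "other": 2},
    2: {"produced": 8, "missed": 4, "salient": 4, "nonwords": 9, "other": 5},
    3: {"produced": 14, "missed": 11, "salient": 3, "nonwords": 11, "other": 6},
    4: {"produced": 18, "missed": 12, "salient": 6, "nonwords": 11, "other": 4},
    5: {"produced": 17, "missed": 10, "salient": 7, "nonwords": 13, "other": 5},
}

# Reported scores per set, in the order (F1-1, F1-2, F2-1, F2-2).
REPORTED = {
    1: (0.26, 0.80, 0.25, 0.77),
    2: (0.38, 0.64, 0.31, 0.53),
    3: (0.21, 0.72, 0.18, 0.62),
    4: (0.34, 0.77, 0.32, 0.71),
    5: (0.38, 0.72, 0.33, 0.65),
}


def f_measure(tp, fp, fn):
    """Balanced F-measure: harmonic mean of precision and recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def set_scores(c):
    """Return (F1-1, F1-2, F2-1, F2-2) under the assumed reading above."""
    fp_nonwords = c["nonwords"]             # assumed false positives for F1
    fp_with_other = c["nonwords"] + c["other"]  # assumed false positives for F2
    return (
        f_measure(c["salient"], fp_nonwords, c["missed"]),
        f_measure(c["produced"], fp_nonwords, 0),
        f_measure(c["salient"], fp_with_other, c["missed"]),
        f_measure(c["produced"], fp_with_other, 0),
    )


if __name__ == "__main__":
    for number, counts in COUNTS.items():
        print(number, [round(s, 2) for s in set_scores(counts)], REPORTED[number])
```

Under these assumptions the recomputed scores agree with the reported ones after rounding in all cases but one: F2-1 for Set 4 comes out near 0.31 rather than 0.32. That difference may reflect intermediate rounding or a detail of the original definition that this reading does not capture.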
Word frequencies from orthographic transcriptions, participant 4A.
| Rank | Word | Frequency |
| 1 | a | 53 |
| 2 | red | 51 |
| 3 | thats | 33 |
| 4 | green | 32 |
| 5 | you | 26 |
| 6 | and | 24 |
| 7 | cross | 21 |
| 8 | blue | 21 |
| 9 | heart | 20 |
| 10 | circle | 17 |
| 11 | we | 15 |
| 12 | that | 14 |
Excerpt from ranked frequencies of words spoken by participant 4A. First 12 of 92 shown.
Syllable frequencies perceived by robot DeeChee.
| Rank | CVC phonemic form | CVC orthographic form | CVC frequency | CV phonemic form | CV orthographic form | CV frequency |
| 1 | r eh d | red | 27 | r eh | part of red | 51 |
| 2 | gr iy n | green | 25 | dh ae* | part of that(s) | 34 |
| 3 | dh ae ts | thats | 20 | gr iy | part of green | 26 |
| 4 | kr ao s | cross | 17 | kr ao | part of cross | 17 |
| 5 | k ah l | part of circle | 13 | k ah | part of circle | 16 |
| 6 | r eh ds | part of red circle | 10 | dh ah* | the/that(s) | 16 |
| 7 | hh ah t | heart | 10 | t ah | | 15 |
| 8 | s er k | part of circle | 9 | hh ah | part of heart | 13 |
| 9 | y eh s | yes | 8 | dh eh* | the/that(s) | 13 |
| 10 | dh ae t | that | 8 | bl uw | blue | 11 |
Example from participant 4A.
Excerpts from ranked frequencies of CVC syllables spoken by participant 4A, as perceived by DeeChee. First 10 of 180 shown. Note the starred entries showing variable phonemic form for some function words. See graphical representations in Figures 3 and 4.
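A Zipfian relationship appears as a roughly straight line with negative slope when log frequency is plotted against log rank. The sketch below fits that slope by ordinary least squares to the ten CVC and ten CV frequencies in the table above. It is only an illustration: the function name `zipf_slope` is hypothetical, and ten of 180 entries are far too few to estimate the exponent for participant 4A, so no fitted value is quoted.

```python
from math import log

# Frequencies at ranks 1 to 10, copied from the table above (participant 4A).
CVC_FREQS = [27, 25, 20, 17, 13, 10, 10, 9, 8, 8]
CV_FREQS = [51, 34, 26, 17, 16, 16, 15, 13, 13, 11]


def zipf_slope(freqs):
    """Least-squares slope of log(frequency) against log(rank).

    Ranks are taken as 1, 2, 3, ... in list order. A pure Zipf
    distribution (frequency proportional to 1/rank) gives a slope of -1.
    """
    xs = [log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [log(freq) for freq in freqs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


if __name__ == "__main__":
    print("CVC slope:", round(zipf_slope(CVC_FREQS), 2))
    print("CV slope:", round(zipf_slope(CV_FREQS), 2))
```

Fitting in log-log space is the simplest way to check for a power law, though it weights the sparse high ranks and the dense low ranks equally; with the full frequency list a more careful estimator might be preferable.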