Philip A Huebner, Jon A Willits.
Abstract
Previous research has suggested that distributional learning mechanisms may contribute to the acquisition of semantic knowledge. However, distributional learning mechanisms, statistical learning, and contemporary "deep learning" approaches have been criticized for being incapable of learning the kind of abstract and structured knowledge that many think is required for the acquisition of semantic knowledge. In this paper, we show that recurrent neural networks, trained on noisy naturalistic speech to children, do in fact learn what appears to be abstract and structured knowledge. We trained two types of recurrent neural networks, the Simple Recurrent Network (SRN) and the Long Short-Term Memory (LSTM), to predict word sequences in a 5-million-word corpus of speech directed to children ages 0-3 years old, and assessed what semantic knowledge they acquired. We found that the learned internal representations encode various abstract grammatical and semantic features that are useful for predicting word sequences. Assessing the organization of semantic knowledge in terms of the similarity structure, we found evidence of emergent categorical and hierarchical structure in both models. We found that the LSTM and SRN both learn very similar kinds of representations, but the LSTM achieved higher levels of performance on a quantitative evaluation. We also trained a non-recurrent neural network, Skip-gram, on the same input to compare our results to the state-of-the-art in machine learning. We found that Skip-gram achieves relatively similar performance to the LSTM, but represents words more in terms of thematic than taxonomic relations, and we provide reasons why this might be the case. Our findings show that a learning system that derives abstract, distributed representations for the purpose of predicting sequential dependencies in naturalistic language may provide insight into the emergence of many properties of the developing semantic system.
Keywords: language learning; neural networks; semantic development; statistical learning
Year: 2018 PMID: 29520243 PMCID: PMC5827184 DOI: 10.3389/fpsyg.2018.00133
Source DB: PubMed Journal: Front Psychol ISSN: 1664-1078
Theoretical debates regarding the nature of knowledge.
| Structure of knowledge | Form of knowledge: only representations of sensory-motor information | Form of knowledge: sensory-motor and abstract concepts |
|---|---|---|
| Unstructured | Knowledge consists of nothing but a very rich and complex set of unstructured sensory-motor associations. | In addition to sensory-motor information, knowledge consists of a set of abstract concepts. |
| Structured | Knowledge consists of a very rich and complex set of sensory/motor associations that are organized into hierarchically-structured representations. | Knowledge consists of sensory/motor information and abstract concepts, organized into hierarchically-structured representations. |
The set of categories, the number of word types in each category, and the number of occurrences of word types in each category in the training corpus.
| Category | Word types | Word tokens | Category | Word types | Word tokens |
|---|---|---|---|---|---|
| Bathroom | 22 | 5533 | Mammal | 72 | 35781 |
| Bird | 27 | 8384 | Meat | 18 | 2914 |
| Body | 62 | 42601 | Months | 13 | 1897 |
| Clothing | 48 | 16022 | Music | 14 | 1845 |
| Days | 14 | 8163 | Numbers | 27 | 41048 |
| Dessert | 20 | 9048 | Plants | 15 | 6006 |
| Drink | 14 | 9880 | Shape | 13 | 3355 |
| Electronics | 18 | 5347 | Space | 14 | 3042 |
| Family | 32 | 52539 | Times | 11 | 7731 |
| Fruit | 28 | 7719 | Tools | 28 | 7665 |
| Furniture | 28 | 11131 | Toys | 30 | 25339 |
| Games | 6 | 1222 | Vegetable | 21 | 3271 |
| Household | 32 | 10930 | Vehicles | 34 | 15559 |
| Insect | 18 | 4755 | Weather | 11 | 4082 |
| Kitchen | 29 | 7767 | | | |
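The per-category counts in the table above can be illustrated with a minimal sketch. It assumes a tokenized corpus and a hand-built mapping from category names to member words; the function name `category_counts` and the toy data are hypothetical and are not taken from the paper.

```python
from collections import Counter


def category_counts(tokens, category_members):
    """Return {category: (n_word_types, n_word_tokens)}.

    A word type is counted only if it occurs at least once in the corpus;
    the token count sums the occurrences of those types.
    """
    freq = Counter(tokens)
    result = {}
    for category, words in category_members.items():
        present = [w for w in set(words) if freq[w] > 0]
        result[category] = (len(present), sum(freq[w] for w in present))
    return result


# Toy corpus and category lists (illustrative only, not the study's data).
corpus = "the dog saw the cat and the dog ate a banana".split()
members = {
    "mammal": ["dog", "cat", "horse"],
    "fruit": ["banana", "pear"],
}
counts = category_counts(corpus, members)
print(counts)
```

In this toy example "horse" and "pear" never occur, so they contribute neither a type nor any tokens to their categories.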
Nearest semantic neighbors after training for 1 of the 10 models for selected words, in terms of the average hidden activation state of the network (for SRNs and LSTMs) and in terms of the weight matrix (for Skip-gram).
| Model | Dog | Bed | Shoe | Banana | Five |
|---|---|---|---|---|---|
| SRN | Squirrel 0.95 | Crib 0.93 | Sock 0.97 | Carrot 0.97 | Six 0.95 |
| | Fox 0.95 | Room 0.92 | Sneaker 0.95 | Pretzel 0.96 | Four 0.95 |
| | Horse 0.95 | Desk 0.92 | Boot 0.95 | Cracker 0.96 | Three 0.94 |
| | Tiger 0.95 | Pouch 0.92 | Sandal 0.95 | Cheerio 0.96 | Ten 0.93 |
| | Wolf 0.95 | House 0.92 | Jacket 0.94 | Lemon 0.96 | Seven 0.93 |
| LSTM | Wolf 0.95 | Desk 0.94 | Sock 0.98 | Cheerio 0.96 | Four 0.97 |
| | Fox 0.95 | Crib 0.94 | Sneaker 0.96 | Carrot 0.96 | Six 0.96 |
| | Horse 0.95 | Shade 0.93 | Sandal 0.95 | Pretzel 0.96 | Eight 0.94 |
| | Mouse 0.95 | Bedroom 0.93 | Boot 0.94 | Hamburger 0.96 | Seven 0.94 |
| | Penguin 0.94 | Room 0.93 | Sweater 0.94 | Peach 0.95 | Three 0.94 |
| Skip-gram | Pup 0.76 | Sleep 0.63 | Sock 0.77 | Pear 0.58 | Six 0.88 |
| | Collie 0.62 | Crib 0.59 | Sneaker 0.77 | Raisin 0.56 | Four 0.83 |
| | Kitten 0.57 | Blanket 0.54 | Sandal 0.64 | Frozen 0.55 | Seven 0.77 |
| | Woggy 0.56 | Bedroom 0.53 | Pant 0.63 | Cereal 0.55 | Three 0.74 |
| | Bark 0.53 | Nap 0.47 | Shoelace 0.58 | Oatmeal 0.54 | Eight 0.69 |
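A nearest-neighbor lookup of the kind summarized in the table above can be sketched as follows. The sketch assumes each word is already represented by a single vector (for example an averaged hidden state, or a row of a weight matrix) and uses cosine similarity, which is an assumption here because this record does not state the similarity measure. The function names and the toy vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_neighbors(probe, vectors, k=5):
    """Return the k words most similar to `probe`, highest similarity first."""
    target = vectors[probe]
    scored = [(w, cosine(target, v)) for w, v in vectors.items() if w != probe]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional word vectors (illustrative only, not model output).
toy_vectors = {
    "dog": [1.0, 0.9, 0.1],
    "wolf": [0.9, 1.0, 0.2],
    "shoe": [0.1, 0.2, 1.0],
    "sock": [0.2, 0.1, 0.9],
}
neighbors = nearest_neighbors("dog", toy_vectors, k=2)
for word, sim in neighbors:
    print(f"{word} {sim:.2f}")
```

With real models, the vector for each word would come from the trained network rather than being written by hand, and the probe word is excluded from its own neighbor list as it is here.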
Nearest semantic neighbors from SRN, LSTM, and Skip-gram for two words in the categories ‘weather,’ ‘meat,’ and ‘months.’
| Category | SRN | LSTM | Skip-gram |
|---|---|---|---|
| Weather (first word) | Treasure 0.87 | Rocket 0.90 | Man 0.57 |
| | Log 0.86 | Fish 0.88 | Flake 0.55 |
| | Motorcycle 0.86 | Snail 0.88 | White 0.52 |
| | Taxi 0.86 | Cloud 0.88 | Baum 0.50 |
| | Mail 0.86 | Mole 0.88 | Melt 0.46 |
| Weather (second word) | Flash 0.92 | Dark 0.91 | Spout 0.49 |
| | Dust 0.91 | Daytime 0.90 | Outside 0.46 |
| | Land 0.90 | Dust 0.89 | Bitsy 0.45 |
| | Steam 0.90 | Steam 0.88 | Itsy 0.45 |
| | Crowd 0.90 | Colder 0.88 | Spider 0.44 |
| Meat (first word) | Salad 0.97 | Broccoli 0.96 | Soup 0.57 |
| | Bread 0.97 | Oatmeal 0.95 | Carrot 0.56 |
| | Pizza 0.96 | Salad 0.95 | Broccoli 0.55 |
| | Oatmeal 0.96 | Bread 0.95 | Cheese 0.53 |
| | Cereal 0.96 | Macaroni 0.95 | Vegetable 0.53 |
| Meat (second word) | Whale 0.91 | Penguin 0.92 | Angler 0.53 |
| | Hay 0.91 | Snail 0.91 | Turtle 0.49 |
| | Goldfish 0.91 | Goldfish 0.91 | Glub 0.48 |
| | Goose 0.91 | Whale 0.91 | Swim 0.46 |
| | Turkey 0.91 | Bug 0.91 | Fins 0.46 |
| Months (first word) | Buster 0.93 | Harvey 0.93 | Fifth 0.59 |
| | Harvey 0.93 | Darling 0.93 | February 0.56 |
| | Hank 0.93 | Abba 0.93 | Twenty 0.53 |
| | September 0.93 | Correct 0.92 | Saturday 0.52 |
| | January 0.93 | America 0.92 | October 0.51 |
| Months (second word) | Year 0.97 | Year 0.97 | Year 0.80 |
| | Degree 0.93 | Thousand 0.92 | Week 0.68 |
| | Ounce 0.93 | Hour 0.92 | Twenty 0.64 |
| | Dollar 0.93 | Week 0.92 | Ounce 0.58 |
| | Thousand 0.92 | Hundred 0.92 | Thirty 0.56 |