Francis Mollica, Steven T Piantadosi.
Abstract
We examine the conceptual development of kinship through the lens of program induction. We present a computational model for the acquisition of kinship term concepts, resulting in the first computational model of kinship learning that is closely tied to developmental phenomena. We demonstrate that our model can learn several kinship systems of varying complexity using cross-linguistic data from English, Pukapuka, Turkish, and Yanomamö. More importantly, the behavioral patterns observed in children learning kinship terms, under-extension and over-generalization, fall out naturally from our learning model. We then conducted interviews to simulate realistic learning environments and demonstrate that the characteristic-to-defining shift is a consequence of our learning model in naturalistic contexts containing abstract and concrete features. We use model simulations to understand the influence of logical simplicity and children's learning environment on the order of acquisition of kinship terms, providing novel predictions for the learning trajectories of these words. We conclude with a discussion of how this model framework generalizes beyond kinship terms, as well as a discussion of its limitations.
Keywords: Bayesian modeling; Conceptual development; Word-learning
Year: 2021 PMID: 34918269 PMCID: PMC9166873 DOI: 10.3758/s13423-021-02017-5
Source DB: PubMed Journal: Psychon Bull Rev ISSN: 1069-9384
The Probabilistic Context Free Grammar (PCFG) specifying the base functions and the rewrite rules that govern their composition. Each hypothesis starts with a SET symbol and there are 37 concrete referents in our learning context
| SET | SET | SET | SET |
| SET | SET | SET | SET |
| SET | SET | SET | SET |
| SET | SET | SET | SET |
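
A minimal sketch of how a PCFG assigns a prior to a hypothesis, assuming the standard property that the probability of a derivation is the product of the probabilities of the rules it uses. The rule set and the probabilities in `RULE_PROBS` are hypothetical placeholders, not the grammar from the table above, and the nested-tuple encoding of hypotheses is an illustrative choice.

```python
import math

# Hypothetical probabilities for expanding a SET symbol.
# These values are assumptions for illustration only.
RULE_PROBS = {
    "X": 0.4,       # the speaker
    "parent": 0.2,  # SET -> parent(SET)
    "child": 0.2,   # SET -> child(SET)
    "male": 0.1,    # SET -> male(SET)
    "female": 0.1,  # SET -> female(SET)
}

def log_prior(expr):
    """Log probability of a hypothesis written as a nested tuple,
    e.g. ("male", ("parent", "X")) for male(parent(X))."""
    if isinstance(expr, str):
        return math.log(RULE_PROBS[expr])
    head, *args = expr
    # One rule application for the head, plus the rules used by each argument.
    return math.log(RULE_PROBS[head]) + sum(log_prior(a) for a in args)

father_like = ("male", ("parent", "X"))
brother_like = ("male", ("child", ("parent", "X")))
print(log_prior(father_like), log_prior(brother_like))
```

Under these assumptions every additional function call multiplies in another probability below one, so longer hypotheses receive lower priors. This is one way a grammar-based prior can encode a preference for logical simplicity.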
Fig. 1. Family tree context for our simulations. Connections above figures reflect parent–child relationships. Connections under figures reflect lateral/spousal relationships. Men are denoted with hats. Numbers reflect the rank order of the amount of interaction a learner (i.e., 1) has with the other individuals on the tree
Fig. 2. Average lexicon posterior-weighted accuracy for each word as a function of data points of that word. Shaded region denotes 95% bootstrapped confidence intervals. Insets show the color-coded extension of the terms
The maximum a posteriori (MAP) hypotheses after learning
| Language | Word | Extension | MAP Hypothesis |
|---|---|---|---|
| Pukapuka | | | difference(generation0(X), sameGender(X)) |
| | | | male(child(parent(parent(X)))) |
| | | | female(child(parent(parent(X)))) |
| | | | intersection(generation0(X), sameGender(X)) |
| | | | male(child(parent(parent(parent(X))))) |
| | | | female(child(parent(parent(parent(X))))) |
| English | | | female(difference(generation1(X), parent(X))) |
| | | | male(child(parent(X))) |
| | | | difference(generation0(X), child(parent(X))) |
| | | | male(parent(X)) |
| | | | female(parent(parent(X))) |
| | | | male(parent(parent(X))) |
| | | | female(parent(X)) |
| | | | female(child(parent(X))) |
| | | | male(difference(generation1(X), parent(X))) |
| Turkish | | | male(child(parent(X))) |
| | | | female(child(parent(X))) |
| | | | intersection(sameGender( |
| | | | female(parent(X)) |
| | | | female(parent(female(parent(X)))) |
| | | | male(parent(X)) |
| | | | female(parent(male(parent(X)))) |
| | | | male(child(parent(female(parent(X))))) |
| | | | male(parent(parent(X))) |
| | | | intersection(lateral(child(parent(parent(X)))), male(complement(parent(X)))) |
| | | | female(child(parent(male(parent(X))))) |
| | | | difference(generation0(X), child(parent(X))) |
| | | | difference(difference(female(generation0(female(parent(X)))), X), parent(X)) |
| | | | difference(female(generation1(X)), union(child(parent(parent(X))), parent(X))) |
| Yanomamö | | | female(child(coreside(X))) |
| | | | male(child(coreside(X))) |
| | | | male(coreside(X)) |
| | | | female(coreside(X)) |
| | | | male(difference(generation1s(X), coreside(X))) |
| | | | difference(male(generation0(X)), child(coreside(X))) |
| | | | female(difference(generation0(X), child(coreside(X)))) |
| | | | difference(female(generation1s(X)), coreside(X)) |
Abbreviations: f: father, m: mother, p: parent, s: son, d: daughter, c: child, b: brother, z: sister, g: sibling, h: husband, w: wife, e: spouse.
‡ The extension is given with regard to a male speaker. For a female speaker, swap the two words. The MAP hypothesis computes the correct extension regardless of the speaker's gender.
‡‡ The MAP hypothesis for amca makes use of Fabio, the individual ranked 29 in Fig. 1, in order to construct the set of all men in the context.
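
To make the compositional hypotheses in the table concrete, here is a minimal sketch of how set-valued functions such as `parent`, `child`, `male`, and `female` could be evaluated on a family tree. The tree below is a small hypothetical one (not the 37-person context of Fig. 1), the learner `ego` is assumed to be female, and the function bodies are one plausible reading of the primitive names rather than the authors' implementation.

```python
# Hypothetical toy tree: each person maps to the set of their parents.
PARENTS = {
    "ego": {"mom", "dad"},
    "sis": {"mom", "dad"},
    "bro": {"mom", "dad"},
    "mom": {"grandma", "grandpa"},
    "uncle": {"grandma", "grandpa"},
}
MALES = {"dad", "bro", "grandpa", "uncle"}  # everyone else is treated as female

def parent(xs):
    """Everyone who is a parent of someone in xs."""
    return {p for x in xs for p in PARENTS.get(x, set())}

def child(xs):
    """Everyone who has at least one parent in xs."""
    return {c for c, ps in PARENTS.items() if ps & xs}

def male(xs):
    """Keep only the men in xs."""
    return xs & MALES

def female(xs):
    """Keep only the women in xs."""
    return xs - MALES

X = {"ego"}  # the speaker
print(male(child(parent(X))))     # hypothesis of the form male(child(parent(X)))
print(female(parent(parent(X))))  # hypothesis of the form female(parent(parent(X)))
print(female(parent(X)))          # hypothesis of the form female(parent(X))
```

In this toy tree the three expressions pick out the learner's brother, maternal grandmother, and mother respectively. Note that `child(parent(X))` also contains the speaker, so for a male speaker `male(child(parent(X)))` would include the speaker himself.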
Fig. 3. Probability of using abstraction as a function of unique data points at several different prior strengths for concrete reference. At higher prior values of concrete reference, the rise in the probability of abstraction is shifted to require more unique data points
Fig. 4. The posterior probability that each person on the tree is an uncle of the learner (in black) at various data amounts. Yellow (lighter color) indicates high probability and blue (darker color) indicates low probability
Fig. 5. Average lexicon posterior-weighted accuracy, precision, and recall for each word as a function of data points. Recall greater than precision is a hallmark of overgeneralization. Shaded regions represent 95% bootstrapped confidence intervals
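
The link between overgeneralization and recall exceeding precision can be illustrated with a short sketch. It uses the standard set-based definitions of precision and recall. The names and extensions below are hypothetical, and the sketch scores a single hypothesis rather than the posterior-weighted average reported in the figure.

```python
def precision_recall(predicted, true):
    """Precision and recall of a predicted extension against the true one."""
    hits = len(predicted & true)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(true) if true else 0.0
    return precision, recall

# Hypothetical extensions for the word "uncle".
true_uncles = {"uncle_1", "uncle_2"}
overgeneral = {"uncle_1", "uncle_2", "dad", "grandpa"}  # too many referents
underextended = {"uncle_1"}                             # too few referents

print(precision_recall(overgeneral, true_uncles))    # recall > precision
print(precision_recall(underextended, true_uncles))  # precision > recall
```

An overly broad hypothesis covers every true referent (high recall) but also includes non-referents (lower precision), while an under-extended hypothesis shows the opposite pattern.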
Fig. 6. Distance-ranked family trees from informants. Circles represent women; squares men. Bold lateral lines denote spousal relationships. Informant 1 (top left) provided 107 unique features; Informant 2 (top right) 88; Informant 3 (bottom left) 92; and Informant 4, 59
Additional rules for the PCFG in Table 1. Now, each hypothesis starts with a START symbol
| START | FSET | FSET | FSET |
| START | FSET | FSET | VALUE |
Fig. 7. Average posterior probability of using a characteristic or a defining hypothesis (y-axis) as a function of the amount of data observed (x-axis) for words (rows) and informants (columns). Shaded regions reflect 95% bootstrapped confidence intervals. For all words, there is a characteristic-to-defining shift
Best hypotheses for Informant One learning grandma at three different time points
| Time point | Hypothesis | Posterior Probability |
|---|---|---|
| Before seeing data | X (i.e., the speaker) | 0.354 |
| | male(X) | 0.006 |
| | complement(X) | 0.006 |
| After seeing 3 data points | outgoing(Yes) | 0.283 |
| | nosy(Yes) | 0.283 |
| | small(Yes) | 0.084 |
| One data point after shift | parents(parents(X)) | 0.289 |
| | female(parents(parents(X))) | 0.268 |
| | outgoing(Yes) | 0.219 |
Complexity in terms of Haviland and Clark (1974) aligns with the prior probability of our model
| Empirical Order | Word | Original H&C Order & Formalization | Log Prior | CHILDES Freq. |
|---|---|---|---|---|
| 1 | | Level I: [ | -9.457 | 6812 |
| 1 | | Level I: [ | -9.457 | 3605 |
| 2 | | Level III: [ | -13.146 | 41 |
| 2 | | Level III: [ | -13.146 | 89 |
| 3 | | Level II: [ | -13.146 | 526 |
| 3 | | Level II: [ | -13.146 | 199 |
| 4 | | Level IV: [ | -19.320 | 97 |
| 4 | | Level IV: [ | -19.320 | 68 |
| 4 | | Level IV: [ | -18.627 | 14 |
Contrary to the survey of Benson and Anglin (1987), CHILDES frequencies do not align with the order of acquisition
Fig. 8. Possible patterns of order of acquisition. The x-axis reflects the ordinal position of acquisition. The y-axis represents each word. The tiles are filled according to the probability of acquisition. Words that have zero probability at a given ordinal position are omitted
Fig. 9. Simulations of the order of acquisition of kinship terms as a function of changes in environmental data distributions (columns) and the inductive biases of the learner (rows). A tiny amount of random noise was added to probabilities in each simulation to settle ties
Quantitative description of consistency and correlation to attested order of acquisition
| Prior | Environment | Joint Entropy | Rank Correlation |
|---|---|---|---|
| Simplicity | CHILDES | 3.43 | 0.475 [0.197, 0.704] |
| Simplicity | Uniform | 3.42 | 0.469 [0.254, 0.704] |
| Simplicity | Zipf | 2.83 | 0.687 [0.592, 0.761] |
| Uniform | CHILDES | 3.28 | 0.365 [0.197, 0.535] |
| Uniform | Uniform | 3.25 | 0.365 [0.197, 0.535] |
| Uniform | Zipf | 2.96 | 0.611 [0.479, 0.761] |
Intervals are 95% posterior-weighted intervals. For reference, τ = 0.535 would be considered a significant correlation
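
As an illustration of the rank correlation column, the sketch below computes a rank correlation between an attested and a simulated order of acquisition, assuming that τ denotes Kendall's rank correlation. It implements the simple tau-a variant, in which tied pairs count as neither concordant nor discordant. The variant used in the paper is not stated here, and the two orderings are hypothetical.

```python
from itertools import combinations

def kendall_tau_a(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
        # sign == 0 means a tie in xs or ys; tau-a ignores the pair in the numerator
    n_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / n_pairs

# Hypothetical ranks for five words: attested order vs. one simulated order.
attested = [1, 2, 3, 4, 5]
simulated = [1, 3, 2, 4, 5]  # two adjacent words swapped
print(kendall_tau_a(attested, simulated))
```

With one adjacent swap among five items, nine of the ten pairs are concordant and one is discordant, so the correlation stays high. When many ranks are tied, as in an empirical order with shared positions, a tie-corrected variant such as tau-b may be more appropriate.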
Summary of the empirical behavior, how the model explains this behavior and the behavioral predictions to be generated by the model
| Empirical behavior | Model explanation | Behavioral predictions |
|---|---|---|
| Cross-linguistic learnability | Inductive learning | The number of data points before acquisition |
| Under-extension | Local data distribution | The number of data points before abstraction |
| Over-generalization | Trade-off between prior and likelihood | The pattern of generalization at each data amount |
| Characteristic-to-defining shift | Learning context | The presence of and the number of data points before the shift |
| Order of Acquisition | Environmental experience | The order of acquisition and number of data points before each term is acquired |
An example local max lexicon when permitting recursive calls in the lexicon space
| male(parent(X)) | |
| female(parent(X)) | |
| child(parent(X)) | |
| female( | |
| male( | |
| female( | |
| difference(generation0(X), |
Example learner
| Concept | Hypothesis | Relative Frequency |
|---|---|---|
| | female(generation1(X)) | 0.3 |
| | female(parent(X)) | 0.7 |