Robert Grimm, Giovanni Cassani, Steven Gillis, Walter Daelemans.
Abstract
Previous studies have suggested that children and adults form cognitive representations of co-occurring word sequences. We propose (1) that the formation of such multi-word unit (MWU) representations precedes and facilitates the formation of single-word representations in children and thus benefits word learning, and (2) that MWU representations facilitate adult word recognition and thus benefit lexical processing. Using a modified version of an existing computational model (McCauley and Christiansen, 2014), we extract MWUs from a corpus of child-directed speech (CDS) and a corpus of conversations among adults. We then correlate the number of MWUs within which each word appears with (1) age of first production and (2) adult reaction times on a word recognition task. In doing so, we take care to control for the effect of word frequency, as frequent words will naturally tend to occur in many MWUs. We also compare results to a baseline model that randomly groups words into sequences, and find that MWUs have a unique facilitatory effect on both response variables, suggesting that they benefit word learning in children and word recognition in adults. The effect is strongest on age of first production, implying that MWUs are comparatively more important for word learning than for adult lexical processing. We discuss possible underlying mechanisms and formulate testable predictions.
Keywords: age of first production; contextual diversity; language acquisition; lexical processing; multi-word units; reaction times; word learning
Year: 2017 PMID: 28450842 PMCID: PMC5390038 DOI: 10.3389/fpsyg.2017.00555
Source DB: PubMed Journal: Front Psychol ISSN: 1664-1078
Relevant statistics for the ADS and CDS corpora.
| Statistic | ADS | CDS |
| --- | --- | --- |
| nr. adult speakers | 124 | 201 |
| nr. tokens | 4,233,645 | 4,869,472 |
| nr. types | 34,267 | 25,109 |
| Median utterance length | 5 (IQR: 7) | 4 (IQR: 4) |
Relevant statistics about the distribution of MWUs in ADS and CDS.
| Model | Statistic | ADS | CDS |
| --- | --- | --- | --- |
| Chunk-Based Learner | nr. MWU tokens | 834,205 | 1,117,465 |
| | nr. MWU types | 495,610 | 467,849 |
| | Median MWU length | 5 (IQR: 4) | 4 (IQR: 3) |
| Random baseline | nr. word sequence tokens | 663,953 | 955,698 |
| | nr. word sequence types | 520,482 | 592,735 |
| | Median word sequence length | 6 (IQR: 4) | 5 (IQR: 3) |
Upper section: MWUs extracted by the Chunk-Based Learner. Lower section: MWUs extracted by the random baseline model.
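The record does not spell out how the random baseline groups words, so the following is only a minimal sketch of one way such a baseline could work: each gap between adjacent words in an utterance becomes a sequence boundary with a fixed probability. The function name `random_segmentation` and the parameter `boundary_prob` are hypothetical and are not taken from the paper.

```python
import random

def random_segmentation(words, boundary_prob=0.5, rng=None):
    """Split an utterance into contiguous word sequences at random gaps.

    Assumption (not from the paper): every gap between two adjacent words
    is a boundary with probability `boundary_prob`, independently.
    """
    rng = rng or random.Random()
    if not words:
        return []
    chunks = [[words[0]]]
    for word in words[1:]:
        if rng.random() < boundary_prob:
            chunks.append([word])    # start a new sequence at this gap
        else:
            chunks[-1].append(word)  # extend the current sequence
    return chunks

utterance = "there is a clever boy".split()
chunks = random_segmentation(utterance, rng=random.Random(42))

# Only sequences of two or more words are comparable to MWUs.
multi_word = [" ".join(c) for c in chunks if len(c) > 1]
print(chunks, multi_word)
```

Because such a baseline ignores co-occurrence statistics, any correlation that survives with it reflects sequence membership in general rather than learned chunks, which is what makes it a useful point of comparison.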
The two most frequent and two of the least frequent MWUs for the three target words.
| Target word | MWU (ADS) | MWU (CDS) | Frequency (ADS) | Frequency (CDS) |
| --- | --- | --- | --- | --- |
| Boy | Good boy | Good boy | 116 | 736 |
| | Old boy | Clever boy | 35 | 301 |
| | There is a clever boy | Poor little boy | 3 | 3 |
| | Good old boy | Oh you naughty boy | 2 | 2 |
| Sit | Sit down | Sit down | 106 | 324 |
| | Sit there | Sit up | 19 | 107 |
| | Sit in the back | Sit on your chair | 3 | 3 |
| | You sit there | Can I sit down | 2 | 2 |
| Nice | Very nice | That's nice | 132 | 354 |
| | That is nice | Is that nice | 88 | 219 |
| | Isn't it nice | Do I look nice | 3 | 3 |
| | You look nice | Looks quite nice | 2 | 2 |
The right-most two columns contain MWU frequencies.
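The predictor described in the abstract is the number of MWUs within which each word appears. A minimal sketch of that count follows, assuming that it refers to distinct MWU types rather than tokens (the record does not say which). The function name `count_mwus_per_word` is hypothetical.

```python
from collections import defaultdict

def count_mwus_per_word(mwu_types):
    """Map each word to the number of distinct MWU types that contain it.

    Assumption: MWUs are counted as types, so a word gets one count per
    distinct MWU regardless of how often that MWU occurs.
    """
    containing = defaultdict(set)
    for mwu in mwu_types:
        words = mwu.lower().split()
        if len(words) < 2:
            continue  # a single word is not a multi-word unit
        for word in set(words):
            containing[word].add(" ".join(words))
    return {word: len(mwus) for word, mwus in containing.items()}

# Examples for "boy" and "sit" from the left-hand MWU column of the table above.
examples = [
    "Good boy", "Old boy", "There is a clever boy", "Good old boy",
    "Sit down", "Sit there", "Sit in the back", "You sit there",
]
mwu_counts = count_mwus_per_word(examples)
print(mwu_counts["boy"], mwu_counts["sit"], mwu_counts["good"])
```

In the full corpus, frequent words take part in many more MWUs, which is why the analysis also controls for word frequency.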
Figure 1. Rank distribution of the number of tokens uttered by each child (A) and distribution of transcripts by child age in months (B).
Example data points.
| Mummy | 11,265 | 5,298 | 0.804 |
| Said | 4,894 | 2,357 | 1.111 |
| Body | 180 | 69 | 1.209 |
| Learn | 162 | 69 | 2.405 |
| Covered | 162 | 69 | 1.951 |
Statistics are estimated from the CDS corpus.
Figure 2. Full pairwise correlations with ADS and CDS predictor variants. (A) Correlations with #Freq. (B) Correlations with #MWUs.
Figure 3. Partial pairwise correlations with ADS and CDS predictor variants. (A) Correlations with #Freq. (B) Correlations with #MWUs.
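The difference between the full and partial correlations in Figures 2 and 3 can be illustrated with a small sketch. It assumes a Pearson partial correlation computed by regressing both variables on the control variable and correlating the residuals; the paper's exact correlation measure is not stated in this record. All data below are synthetic and serve only to show the mechanics.

```python
import numpy as np

def residualize(y, covariate):
    """Return the residuals of y after a least-squares linear fit on covariate."""
    design = np.column_stack([np.ones_like(covariate), covariate])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return y - design @ coef

def partial_correlation(x, y, covariate):
    """Pearson correlation of x and y with the covariate partialled out."""
    x, y, covariate = (np.asarray(v, dtype=float) for v in (x, y, covariate))
    return float(np.corrcoef(residualize(x, covariate),
                             residualize(y, covariate))[0, 1])

# Synthetic toy data (not from the paper): both variables depend on
# log frequency only, so their full correlation is spurious.
rng = np.random.default_rng(0)
log_freq = rng.normal(size=500)
n_mwus = log_freq + rng.normal(scale=0.5, size=500)
aofp = -log_freq + rng.normal(scale=0.5, size=500)

full_r = float(np.corrcoef(n_mwus, aofp)[0, 1])
partial_r = partial_correlation(n_mwus, aofp, log_freq)
print(f"full r = {full_r:.2f}, partial r = {partial_r:.2f}")
```

In this toy setting the full correlation is strongly negative while the partial correlation is close to zero, because frequency drives both variables. A partial correlation that stays clearly non-zero, as the abstract reports for #MWUs, indicates an effect beyond frequency.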
Figure 4. Comparison of correlations with MWUs from the Chunk-Based Learner and word sequences from a model which randomly groups words into MWUs. (A) Full correlations. (B) Partial correlations.
Figure 5. Comparison of correlations with predictors across dependent variable. (A) Full correlations. (B) Partial correlations.