Stewart M McCauley, Colin Bannard, Anna Theakston, Michelle Davis, Thea Cameron-Faulkner, Ben Ambridge.
Abstract
Psycholinguistic research over the past decade has suggested that children's linguistic knowledge includes dedicated representations for frequently-encountered multiword sequences. Important evidence for this comes from studies of children's production: it has been repeatedly demonstrated that children's rate of speech errors is greater for word sequences that are infrequent and thus unfamiliar to them than for those that are frequent. In this study, we investigate whether children's knowledge of multiword sequences can explain a phenomenon that has long represented a key theoretical fault line in the study of language development: errors of subject-auxiliary non-inversion in question production (e.g., "why we can't go outside?*"). In doing so we consider a type of error that has been ignored in discussion of multiword sequences to date. Previous work has focused on errors of omission - an absence of accurate productions for infrequent phrases. However, if children make use of dedicated representations for frequent sequences of words in their productions, we might also expect to see errors of commission - the appearance of frequent phrases in children's speech even when such phrases are not appropriate. Through a series of corpus analyses, we provide the first evidence that the global input frequency of multiword sequences (e.g., "she is going" as it appears in declarative utterances) is a valuable predictor of their errorful appearance (e.g., the uninverted question "what she is going to do?*") in naturalistic speech. This finding, we argue, constitutes powerful evidence that multiword sequences can be represented as linguistic units in their own right.
Keywords: chunking; corpus analysis; language acquisition; questions
Year: 2021 PMID: 34060184 PMCID: PMC8596434 DOI: 10.1111/desc.13125
Source DB: PubMed Journal: Dev Sci ISSN: 1363-755X
Details of CHILDES corpora used in analysis of uninversion errors
| Target Child | Corpus | Age Range |
|---|---|---|
| Abe | Kuczaj, 1977 | 2;04–5;00 |
| Adam | Brown, 1973 | 2;03–5;02 |
| Eleanor | Lieven et al., | 2;00–3;00 |
| Ethan | Demuth & McCullough, 2009 | 0;11–2;11 |
| Fraser | Lieven et al., | 2;00–3;01 |
| Laura | Braunwald, 1976 | 1;05–7;00 |
| Lara | Rowland & Fletcher, 2006 | 1;09–3;03 |
| Lily | Demuth & McCullough, 2009 | 1;01–4;00 |
| Naima | Demuth & McCullough, 2009 | 0;11–3;10 |
| Ross | MacWhinney, 1991 | 1;04–7;08 |
| Sarah | Brown, 1973 | 2;03–5;01 |
| Thomas | Maslen et al., 2004 | 2;00–4;11 |
FIGURE 1. Unigrams (individual words), bigrams, and trigrams for the correctly inverted (top) and corresponding errorful (bottom) forms of the example question "What are you doing there?" N‐grams excluded from the final statistical model are shown in black. N‐grams retained in the final statistical model are shown as green/red words (unigrams) and green/red lines (bigrams and trigrams). Note that this figure mixes the example level with the general design level for illustration purposes.
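The n-gram decomposition described in the caption can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the helper name `ngrams`, the lowercased whitespace tokenization, and the uninverted form "what you are doing there" (inferred from the pattern of the errors quoted in the abstract) are all assumptions.

```python
def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list, in order of position."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# Correctly inverted form of the example question (punctuation dropped).
inverted = "what are you doing there".split()
# Assumed errorful counterpart: subject and auxiliary left uninverted.
uninverted = "what you are doing there".split()

for label, tokens in (("inverted", inverted), ("uninverted", uninverted)):
    for n in (1, 2, 3):
        print(label, n, ngrams(tokens, n))
```

For a five-word question this yields five unigrams, four bigrams, and three trigrams per form, which matches the numbering of the predictors in the tables that follow.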
Results of model comparisons
| Left‐out Predictor | Log‐likelihood | χ² | p | Ex. |
|---|---|---|---|---|
| Unigram (full/baseline) | −702.13 | – | – | – |
| Unigram 1 | −705.6 | 6.95 | 0.00 ** | |
| Unigram 2 | −707.16 | 10.07 | 0.00 ** | |
| Unigram 3 | −702.27 | 0.29 | 0.59 | |
| Unigram 4 | −702.13 | 0.00 | 0.97 | |
| Unigram 5 | −702.20 | 0.14 | 0.71 | |
| Bigram (full/baseline) | −626.40 | – | – | – |
| Bigram 1 | −627.28 | 1.76 | 0.19 | |
| Bigram 2 | −627.20 | 1.59 | 0.21 | |
| Bigram 3 | −631.41 | 10.01 | 0.00 ** | |
| Bigram 4 | −632.68 | 12.55 | 0.00 *** | |
| Trigram (full/baseline) | −614.62 | – | – | – |
| Trigram 1 | −615.44 | 1.641 | 0.2002 | |
| Trigram 2 | −615.69 | 2.141 | 0.1434 | |
| Trigram 3 | −614.67 | 0.103 | 0.748 | |
| Uninverted Bigram (full/baseline) | −626.40 | – | – | – |
| Uninverted Bigram 1 | −626.42 | 0.02 | 0.88 | |
| Uninverted Bigram 2 | −634.79 | 16.77 | 0.00 *** | |
| Uninverted Bigram 3 | −634.87 | 16.94 | 0.00 *** | |
| Uninverted Bigram 4 | −632.5 | 12.19 | 0.00 *** | |
| Uninverted Trigram (full/baseline) | −614.62 | – | – | – |
| Uninverted Trigram 1 | −614.87 | 0.505 | 0.4772 | |
| Uninverted Trigram 2 | −617.55 | 5.874 | 0.02 * | |
| Uninverted Trigram 3 | −618.41 | 7.582 | 0.01 ** | |
Note. Errorful (uninverted) questions are coded as 1, while correctly inverted questions are coded as 0.
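The test statistics in the model-comparison table appear to be likelihood-ratio statistics: twice the difference between the log-likelihood of the full (baseline) model and that of the model with one predictor left out. The sketch below recomputes one row under the assumption that each comparison has a single degree of freedom. The function name is hypothetical, and the closed form for the p-value holds only for one degree of freedom.

```python
import math

def likelihood_ratio_test_1df(ll_full, ll_reduced):
    """Likelihood-ratio test for one left-out predictor (assumes df = 1).

    Returns (statistic, p_value). For one degree of freedom the chi-square
    survival function has the closed form erfc(sqrt(x / 2)).
    """
    stat = 2.0 * (ll_full - ll_reduced)
    stat = max(stat, 0.0)  # guard against tiny negative values from rounding
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Trigram 1 row: full model -614.62, model without Trigram 1 -615.44.
stat, p = likelihood_ratio_test_1df(-614.62, -615.44)
print(round(stat, 3), round(p, 4))
```

Because the log-likelihoods in the table are rounded to two decimals, recomputed values can differ slightly from the reported ones.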
Results of full model
| Item | β | Std. Error | Ex. |
|---|---|---|---|
| Intercept | −4.24 | 0.34 | |
| Uni 1 | −0.69 | 0.24 | |
| Uni 2 | −0.78 | 0.12 | |
| Bi 3 | −0.73 | 0.14 | |
| Bi 4 | −0.95 | 0.20 | |
| Bi 2 ( | 0.67 | 0.15 | |
| Bi 3 ( | 0.59 | 0.15 | |
| Bi 4 ( | 0.34 | 0.16 | |
| Tri 2 ( | 0.10 | 0.15 | |
| Tri 3 ( | 0.56 | 0.16 | |
Note. Errorful (uninverted) questions are coded as 1, while correctly inverted questions are coded as 0. Beta coefficients are included for transparency; conclusions regarding the significance of variables are based, instead, on the model comparisons (described above).
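As a reading aid for the coefficients, the sketch below converts them to the probability and odds scales. It assumes the full model is a logistic (logit-link) regression, which the 0/1 outcome coding suggests but the text here does not state. The scaling of the predictors is not given either, so "one unit" is left uninterpreted. The function names are hypothetical.

```python
import math

def inverse_logit(x):
    """Map a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

def odds_ratio(beta):
    """Multiplicative change in the odds of an error per one-unit increase."""
    return math.exp(beta)

# Intercept of -4.24: predicted error probability when all predictors are 0.
baseline_p = inverse_logit(-4.24)

# A positive coefficient (0.67) raises the odds of an uninverted question;
# a negative one (-0.95) lowers them.
print(round(baseline_p, 4), round(odds_ratio(0.67), 2), round(odds_ratio(-0.95), 2))
```

Under these assumptions, positive coefficients mark n-grams whose frequency goes with more uninversion errors, and negative coefficients mark n-grams whose frequency goes with fewer.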
CHILDES (English) frequency counts for N‐grams across dataset for all caregiver and child utterances
| N‐gram | Caregiver Min. Freq. | Caregiver Mean Freq. | Caregiver Max. Freq. | Child Min. Freq. | Child Mean Freq. | Child Max. Freq. |
|---|---|---|---|---|---|---|
| | 20,153 | 98,343.0391 | 219,514 | 7,414 | 30,833.4 | 50,444 |
| | 1,110 | 153,861.9 | 266,567 | 128 | 51,689.5 | 85,760 |
| | 0 | 241,017.3 | 508,191 | 1 | 68,353.7 | 187,512 |
| | 0 | 35,680.99 | 508,191 | 1 | 16,613.1 | 187,512 |
| | 0 | 71,296.2 | 508,191 | 1 | 25,878.5 | 187,512 |
| | 0 | 18,830.97 | 66,309 | 1 | 8,090.5 | 20,921 |
| | 0 | 16,062.03 | 66,796 | 1 | 2,978.7 | 7,216 |
| | 0 | 2,227.369 | 41,221 | 1 | 710.1 | 19,255 |
| | 0 | 2,622.009 | 66,796 | 1 | 737.4 | 14,432 |
| | 0 | 2,697.657 | 13,351 | 1 | 915.9 | 3,867 |
| | 0 | 609.1358 | 16,887 | 1 | 68.7 | 1,649 |
| | 0 | 265.8017 | 14,145 | 1 | 42.9 | 4,784 |
| | 0 | 1,348.902 | 6,884 | 1 | 669.9 | 20,921 |
| | 0 | 4,351.922 | 41,221 | 1 | 1,271.8 | 14,432 |
| | 0 | 593.7 | 43,418 | 1 | 336.4 | 9,373 |
| | 0 | 2,762.7 | 66,796 | 1 | 780.8 | 20,921 |
| | 0 | 55.3 | 653 | 1 | 16.4 | 205 |
| | 0 | 84.9 | 3,540 | 1 | 24.3 | 1,977 |
| | 0 | 68.0 | 2,592 | 1 | 16.6 | 1,146 |