Steven Moran¹, Damián E Blasi², Robert Schikowski³, Aylin C Küntay⁴, Barbara Pfeiler⁵, Shanley Allen⁶, Sabine Stoll³.
Abstract
How does a child map words to grammatical categories when words are not overtly marked either lexically or prosodically? Recent language acquisition theories have proposed that distributional information encoded in sequences of words or morphemes might play a central role in forming grammatical classes. To test this proposal, we analyze child-directed speech from seven typologically diverse languages to simulate maximum variation in the structures of the world's languages. We ask whether the input to children contains cues for assigning syntactic categories in frequent frames, which are frequently occurring nonadjacent sequences of words or morphemes. In accord with aggregated results from previous studies on individual languages, we find that frequent word frames do not provide a robust distributional pattern for accurately predicting grammatical categories. However, our results show that frames are extremely accurate cues cross-linguistically at the morpheme level. We theorize that the nonadjacent dependency pattern captured by frequent frames is a universal anchor point for learners on the morphological level to detect and categorize grammatical categories. Whether frames also play a role on higher linguistic levels such as words is determined by grammatical features of the individual language.
Keywords: Child-directed speech; Cross-linguistic language acquisition; Frequent frames; Input patterns; Nonadjacent dependency; Statistical learning
Year: 2018 PMID: 29518682 PMCID: PMC5894936 DOI: 10.1016/j.cognition.2018.02.005
Source DB: PubMed Journal: Cognition ISSN: 0010-0277
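The abstract defines frequent frames as frequently occurring nonadjacent sequences of words or morphemes. Such a frame is usually written A_x_B: two tokens A and B that co-occur with exactly one token x between them, and the set of tokens seen in the x slot is treated as a candidate category. The sketch below counts frames over tokenized utterances. It is a minimal illustration under assumptions, not the authors' pipeline: the function names, the toy utterances, and the restriction to frames inside a single utterance are illustrative choices, and the default of 45 frames only mirrors the fixed-threshold tables in this record. Passing lists of morphemes instead of words yields morpheme-level frames.

```python
from collections import Counter, defaultdict


def frame_counts(utterances):
    """Count A_x_B frames within each utterance.

    Returns (frames, fillers): frames maps (A, B) to its token frequency,
    fillers maps (A, B) to a Counter of the tokens seen in the x slot.
    """
    frames = Counter()
    fillers = defaultdict(Counter)
    for tokens in utterances:
        # Slide a three-token window; frames never cross utterance boundaries.
        for a, x, b in zip(tokens, tokens[1:], tokens[2:]):
            frames[(a, b)] += 1
            fillers[(a, b)][x] += 1
    return frames, fillers


def frequent_frames(utterances, top_n=45):
    """Return the top_n most frequent frames with their counts and fillers."""
    frames, fillers = frame_counts(utterances)
    return [(frame, count, fillers[frame]) for frame, count in frames.most_common(top_n)]


if __name__ == "__main__":
    # Hypothetical toy input; each utterance is a list of words (or morphemes).
    toy = [
        ["you", "want", "it"],
        ["you", "see", "it"],
        ["you", "got", "it"],
        ["the", "dog", "ran"],
    ]
    for frame, count, fill in frequent_frames(toy, top_n=2):
        print(frame, count, sorted(fill))
```

In the toy input, the frame you_x_it occurs three times and groups *want*, *see* and *got* as one candidate category.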
Results from previous studies.
| Language (corpus) | Utterances | Mean accuracy (words) | Mean accuracy (morphemes) | Mean completeness (words) | Mean completeness (morphemes) |
|---|---|---|---|---|---|
| English ( | 103,191 | 0.91 | | 0.12 | |
| Chinese ( | 22,137 | 0.70 | | | |
| Dutch ( | 49,635 | 0.71 | | | |
| French ( | 2006 | 1.0 | | 0.33 | |
| Spanish ( | 37,588 | 0.75 | | | |
| Turkish ( | 37,765 | 0.47 | 0.91 | 0.10 | 0.06 |
| German ( | 5685 | 0.86 | 0.88 | 0.07 | 0.05 |
| German ( | 30,601 | 0.77 | | | |
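The accuracy and completeness columns are not defined in this record. A common way to score frame-based categories is pairwise: every pair of items grouped by the same frame is a hit if the two share a gold part-of-speech tag and a false alarm otherwise, and a miss is a same-tag pair that no frame groups together. Accuracy is then hits / (hits + false alarms) and completeness is hits / (hits + misses). The sketch below assumes exactly that definition, counted over word types and pooled across frames. The SD columns in later tables suggest per-frame scores that are then averaged, so the pooling choice and the hypothetical argument names `frame_members` and `gold` are assumptions, not the study's documented procedure.

```python
from itertools import combinations


def accuracy_completeness(frame_members, gold):
    """Pairwise accuracy and completeness of frame-based categories.

    frame_members: dict mapping each frame to the set of types in its x slot.
    gold: dict mapping each type to its gold part-of-speech tag.
    Pairs grouped by more than one frame are counted once (pooled scoring).
    """
    grouped = set()
    hits = false_alarms = 0
    for members in frame_members.values():
        for pair in combinations(sorted(members), 2):
            if pair in grouped:
                continue
            grouped.add(pair)
            if gold[pair[0]] == gold[pair[1]]:
                hits += 1
            else:
                false_alarms += 1

    # Misses: same-tag pairs among all categorized types that no frame grouped.
    types = sorted(set().union(*frame_members.values()))
    same_tag_pairs = sum(1 for a, b in combinations(types, 2) if gold[a] == gold[b])
    misses = same_tag_pairs - hits

    accuracy = hits / (hits + false_alarms) if hits + false_alarms else 0.0
    completeness = hits / (hits + misses) if hits + misses else 0.0
    return accuracy, completeness


if __name__ == "__main__":
    # Hypothetical frames and tags for illustration only.
    frames = {
        ("you", "it"): {"want", "see", "dog"},
        ("the", "is"): {"dog", "cat"},
        ("a", "there"): {"ball"},
    }
    tags = {"want": "V", "see": "V", "dog": "N", "cat": "N", "ball": "N"}
    print(accuracy_completeness(frames, tags))
```

With the toy input, two of the four grouped pairs share a tag, and two same-tag noun pairs involving *ball* are never grouped, so both scores come out at 0.5.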
Language sample.
| Language | Spoken mainly in | Language family | Speakers |
|---|---|---|---|
| Chintang | Nepal | Sino-Tibetan | 6000 |
| Inuktitut | Canada | Eskimo-Aleut | 34,000 |
| Japanese | Japan | Japanese | 128,000,000 |
| Russian | Russia | Indo-European | 166,000,000 |
| Sesotho | South Africa | Bantu | 5,600,000 |
| Turkish | Turkey | Altaic | 70,900,000 |
| Yucatec | Mexico | Mayan | 766,000 |
Children in the corpora.
| Corpus | Children | Age ranges (years;months.days) |
|---|---|---|
| Chintang | 4 | 2;1.9–3;5.25, 2;0.29–3;5.13, 3;0.14–4;4.25, 2;11.2–4;3.14 |
| Inuktitut | 4 | 2;6.6–3;3.2, 2;0.11–2;9.5, 2;6.2–3;2.26, 2;9.16–3;6.12 |
| Japanese | 4 | 2;11.27–5;1.23, 2;11.28–5;0.17 (×2), 3;0.1–5;0.27 |
| Russian | 5 | 1;3.26–4;11.0, 1;4.22–5;6.26, 1;6.10–5;4.18, 1;11.28–4;3.14, 3;1.8–6;8.12 |
| Sesotho | 4 | 2;1–3;0, 2;1–3;2, 2;4–3;3, 3;8–4;7 |
| Turkish | 8 | 1;0.2–3;0.3, 0;7.28–3;0.24, 0;8.6–3;0.14, 0;8.1–1;9.28, 0;8.0–2;4.20, 0;8.2–3;0.14, 0;8.30–3;0.20, 0;9.27–2;9.13 |
| Yucatec | 3 | 1;11.9–3;5.4, 2;0.1–3;0.29, 2;1.5–3;3.11 |
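Ages in the table above follow the CHILDES convention years;months.days, with the day part sometimes absent (as in the Sesotho rows). To compare recording spans across corpora, the sketch below converts such strings to approximate days. The 365.25-day year and 30.4375-day month are simplifying assumptions, the helper names are hypothetical, and annotations such as "(×2)" are not handled.

```python
import re

# Y;M or Y;M.D, tolerating stray spaces after the semicolon.
AGE_RE = re.compile(r"^\s*(\d+)\s*;\s*(\d+)(?:\.(\d+))?\s*$")
DAYS_PER_YEAR = 365.25
DAYS_PER_MONTH = DAYS_PER_YEAR / 12  # 30.4375, an approximation


def age_in_days(age):
    """Convert a CHILDES-style age string such as '2;1.9' to approximate days."""
    match = AGE_RE.match(age)
    if match is None:
        raise ValueError(f"not a Y;M[.D] age: {age!r}")
    years, months = int(match.group(1)), int(match.group(2))
    days = int(match.group(3) or 0)
    return years * DAYS_PER_YEAR + months * DAYS_PER_MONTH + days


def span_in_months(age_range):
    """Length of a 'start–end' age range in approximate months."""
    start, end = re.split(r"\s*[–-]\s*", age_range.strip())
    return (age_in_days(end) - age_in_days(start)) / DAYS_PER_MONTH


if __name__ == "__main__":
    # First Chintang child's recording span.
    print(round(span_in_months("2;1.9–3;5.25"), 1))
```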
Corpus and analysis size.
| Language (corpus) | Utterances | Words (total) | Words (analyzed) | Morphemes (total) | Morphemes (analyzed) |
|---|---|---|---|---|---|
| Chintang ( | 396,412 | 987,120 | 473,918 | 1,594,829 | 814,076 |
| Inuktitut ( | 46,680 | 73,255 | 23,164 | 37,781 | 8673 |
| Japanese ( | 271,868 | 821,106 | 514,344 | 666,748 | 376,934 |
| Russian ( | 828,041 | 2,033,755 | 1,316,234 | NA | NA |
| Sesotho ( | 69,530 | 237,112 | 83,514 | 329,347 | 112,630 |
| Turkish ( | 400,836 | 1,136,332 | 938,955 | 300,907 | 272,459 |
| Yucatec ( | 91,825 | 257,496 | 89,219 | 198,761 | 84,928 |
Based on Miyata and Nisisawa, 2009, Miyata and Nisisawa, 2010 and Nisisawa and Miyata, 2009, Nisisawa and Miyata, 2010.
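The corpus-size table reports total and analyzed counts side by side; dividing one by the other shows how much of each corpus entered the frame analysis. The figures in the sketch below are copied from that table (Russian has no morpheme counts); the dictionary layout and function name are illustrative.

```python
# (utterances, words total, words analyzed, morphemes total, morphemes analyzed),
# copied from the corpus-size table; None marks the NA cells for Russian.
CORPUS_SIZE = {
    "Chintang": (396_412, 987_120, 473_918, 1_594_829, 814_076),
    "Inuktitut": (46_680, 73_255, 23_164, 37_781, 8_673),
    "Japanese": (271_868, 821_106, 514_344, 666_748, 376_934),
    "Russian": (828_041, 2_033_755, 1_316_234, None, None),
    "Sesotho": (69_530, 237_112, 83_514, 329_347, 112_630),
    "Turkish": (400_836, 1_136_332, 938_955, 300_907, 272_459),
    "Yucatec": (91_825, 257_496, 89_219, 198_761, 84_928),
}


def analyzed_shares(sizes):
    """Return {language: (word share, morpheme share)}; None where counts are missing."""
    shares = {}
    for language, (_, w_total, w_used, m_total, m_used) in sizes.items():
        word_share = w_used / w_total
        morph_share = None if m_total is None else m_used / m_total
        shares[language] = (word_share, morph_share)
    return shares


if __name__ == "__main__":
    for language, (w, m) in analyzed_shares(CORPUS_SIZE).items():
        m_text = "NA" if m is None else f"{m:.0%}"
        print(f"{language:10s} words {w:.0%}  morphemes {m_text}")
```

For example, 473,918 of Chintang's 987,120 words, about 48%, were analyzed.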
Typological features.
| Language | Word order | Synthesis (noun) | Synthesis (verb) | Adposition |
|---|---|---|---|---|
| Chintang | SOV | Mid | High | None |
| Inuktitut | SOV | High | High | None |
| Japanese | SOV | Low | Low | Post |
| Russian | SVO | Mid | Mid | Prep |
| Sesotho | SVO | Mid | High | None |
| Turkish | SOV | Mid | High | Post |
| Yucatec | VOS | Mid | High | Prep |
Frequent word frames in Chintang.
| Threshold | Child | Accuracy | SD | Completeness | SD | Frames | Min | Max | Median |
|---|---|---|---|---|---|---|---|---|---|
| Fixed threshold | LDCh1 | 0.66 | 0.25 | 0.03 | 0.02 | 45 | 27 | 1311 | 40 |
| | LDCh2 | 0.61 | 0.27 | 0.03 | 0.02 | 45 | 29 | 1434 | 42 |
| | LDCh3 | 0.55 | 0.26 | 0.03 | 0.02 | 45 | 24 | 478 | 36 |
| | LDCh4 | 0.54 | 0.21 | 0.03 | 0.02 | 45 | 36 | 592 | 56 |
| Variable threshold | LDCh1 | 0.64 | 0.25 | 0.04 | 0.02 | 39 | 29 | 1311 | 41 |
| | LDCh2 | 0.62 | 0.28 | 0.03 | 0.02 | 41 | 32 | 1434 | 42 |
| | LDCh3 | 0.54 | 0.27 | 0.04 | 0.03 | 33 | 30 | 478 | 39 |
| | LDCh4 | 0.55 | 0.23 | 0.04 | 0.02 | 35 | 42 | 592 | 66 |
Frequent word frames in Russian.
| Threshold | Child | Accuracy | SD | Completeness | SD | Frames | Min | Max | Median |
|---|---|---|---|---|---|---|---|---|---|
| Fixed threshold | Child1 | 0.49 | 0.25 | 0.05 | 0.03 | 45 | 96 | 852 | 130 |
| | Child2 | 0.49 | 0.24 | 0.05 | 0.03 | 45 | 132 | 1050 | 182 |
| | Child3 | 0.52 | 0.23 | 0.05 | 0.03 | 45 | 65 | 503 | 94 |
| | Child4 | 0.47 | 0.23 | 0.04 | 0.03 | 45 | 78 | 560 | 132 |
| Variable threshold | Child1 | 0.50 | 0.26 | 0.03 | 0.02 | 67 | 84 | 852 | 116 |
| | Child2 | 0.48 | 0.24 | 0.04 | 0.03 | 51 | 127 | 1050 | 177 |
| | Child3 | 0.51 | 0.22 | 0.04 | 0.02 | 50 | 61 | 503 | 85 |
| | Child4 | 0.46 | 0.22 | 0.05 | 0.03 | 44 | 82 | 560 | 137 |
Frequent morpheme frames in Chintang.
| Threshold | Child | Accuracy | SD | Completeness | SD | Frames | Min | Max | Median |
|---|---|---|---|---|---|---|---|---|---|
| Fixed threshold | LDCh1 | 0.95 | 0.09 | 0.08 | 0.07 | 45 | 202 | 2610 | 269 |
| | LDCh2 | 0.93 | 0.13 | 0.08 | 0.06 | 45 | 226 | 3138 | 362 |
| | LDCh3 | 0.92 | 0.15 | 0.07 | 0.07 | 45 | 202 | 3159 | 280 |
| | LDCh4 | 0.89 | 0.20 | 0.09 | 0.06 | 45 | 249 | 3806 | 387 |
| Variable threshold | LDCh1 | 0.94 | 0.13 | 0.07 | 0.07 | 55 | 175 | 2610 | 261 |
| | LDCh2 | 0.92 | 0.15 | 0.06 | 0.06 | 59 | 188 | 3138 | 298 |
| | LDCh3 | 0.92 | 0.15 | 0.07 | 0.06 | 49 | 174 | 3159 | 273 |
| | LDCh4 | 0.88 | 0.20 | 0.08 | 0.06 | 51 | 237 | 3806 | 370 |
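The three tables above compare a fixed and a variable frame threshold. Under the fixed threshold every child contributes exactly 45 frames, as the Frames column shows; under the variable threshold the number of frames differs by child. This record does not say how the variable cutoff is computed, so the sketch below assumes, hypothetically, a cutoff proportional to the child's total number of frame tokens (the parameter `min_share`). Only the fixed top-45 rule is grounded in the tables.

```python
from collections import Counter


def select_fixed(frames, n=45):
    """Fixed threshold: keep the n most frequent frames."""
    return [frame for frame, _ in frames.most_common(n)]


def select_variable(frames, min_share):
    """Hypothetical variable threshold: keep frames whose frequency is at least
    min_share times the total number of frame tokens for this child."""
    cutoff = min_share * sum(frames.values())
    return [frame for frame, count in frames.most_common() if count >= cutoff]


if __name__ == "__main__":
    # Hypothetical frame counts for one child.
    counts = Counter({("a", "b"): 50, ("c", "d"): 30, ("e", "f"): 15, ("g", "h"): 5})
    print(select_fixed(counts, n=2))
    print(select_variable(counts, min_share=0.1))
```

A proportional cutoff lets a child with more frame tokens keep a different number of frames than a child with fewer, which is one way frame counts could vary by child; it is not necessarily the rule used in the study.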
Frequent word frames.
| Language | Accuracy | SD | Completeness | SD | Frames | Min | Max | Median |
|---|---|---|---|---|---|---|---|---|
| Chintang | 0.57 | 0.24 | 0.04 | 0.02 | 33 | 90 | 2720 | 118.00 |
| Inuktitut | 0.98 | 0.11 | 0.03 | 0.01 | 37 | 2 | 3 | 2.00 |
| Japanese | 0.82 | 0.21 | 0.02 | 0.02 | 97 | 67 | 915 | 106.00 |
| Russian | 0.44 | 0.22 | 0.04 | 0.03 | 48 | 234 | 1485 | 310.00 |
| Sesotho | 0.83 | 0.23 | 0.01 | 0.01 | 107 | 8 | 163 | 12.00 |
| Turkish | 0.62 | 0.20 | 0.08 | 0.08 | 15 | 34 | 318 | 48.00 |
| Yucatec | 0.78 | 0.28 | 0.01 | 0.01 | 133 | 3 | 41 | 3.00 |
Frequent morpheme frames.
| Language | Accuracy | SD | Completeness | SD | Frames | Min | Max | Median |
|---|---|---|---|---|---|---|---|---|
| Chintang | 0.95 | 0.09 | 0.08 | 0.07 | 60 | 517 | 7940 | 779.00 |
| Inuktitut | 0.93 | 0.16 | 0.02 | 0.01 | 100 | 5 | 43 | 6.50 |
| Japanese | 0.98 | 0.04 | 0.02 | 0.03 | 187 | 83 | 1943 | 157.00 |
| Sesotho | 0.97 | 0.12 | 0.04 | 0.04 | 88 | 66 | 1358 | 109.50 |
| Turkish | 0.88 | 0.17 | 0.01 | 0.01 | 835 | 21 | 1000 | 37.00 |
| Yucatec | 0.90 | 0.18 | 0.01 | 0.02 | 153 | 20 | 584 | 34.00 |
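The abstract's central contrast, that morpheme frames are more accurate than word frames, can be read directly off the two tables above by subtracting word-frame accuracy from morpheme-frame accuracy for the six languages that have both (Russian has no morpheme-level results). The values in the sketch are copied from those tables; the code only does the subtraction, and the names are illustrative.

```python
# Mean accuracy of frequent frames, copied from the word- and morpheme-frame tables.
WORD_ACC = {"Chintang": 0.57, "Inuktitut": 0.98, "Japanese": 0.82, "Russian": 0.44,
            "Sesotho": 0.83, "Turkish": 0.62, "Yucatec": 0.78}
MORPHEME_ACC = {"Chintang": 0.95, "Inuktitut": 0.93, "Japanese": 0.98,
                "Sesotho": 0.97, "Turkish": 0.88, "Yucatec": 0.90}


def accuracy_gain(word_acc, morpheme_acc):
    """Morpheme-frame minus word-frame accuracy, for languages with both scores."""
    return {lang: round(morpheme_acc[lang] - word_acc[lang], 2)
            for lang in word_acc if lang in morpheme_acc}


if __name__ == "__main__":
    gains = accuracy_gain(WORD_ACC, MORPHEME_ACC)
    for lang, gain in sorted(gains.items(), key=lambda item: item[1], reverse=True):
        print(f"{lang:10s} {gain:+.2f}")
```

Chintang shows the largest gain (0.57 to 0.95), and Inuktitut is the only language whose morpheme frames (0.93) are less accurate than its word frames (0.98).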
Global accuracy and completeness of words (nouns and verbs).
| # | Corpus | POS | Accuracy | Completeness | Frames |
|---|---|---|---|---|---|
| 1 | Chintang | N | 0.23 | 0.02 | 234 |
| 2 | Inuktitut | N | 1.00 | 0.02 | 6 |
| 3 | Japanese | N | 0.72 | 0.16 | 893 |
| 4 | Russian | N | 0.43 | 0.05 | 937 |
| 5 | Sesotho | N | 0.89 | 0.05 | 81 |
| 6 | Turkish | N | 0.48 | 0.02 | 139 |
| 7 | Yucatec | N | 0.75 | 0.06 | 120 |
| 8 | Chintang | V | 0.77 | 0.05 | 1447 |
| 9 | Inuktitut | V | 1.00 | 0.02 | 13 |
| 10 | Japanese | V | 0.95 | 0.12 | 628 |
| 11 | Russian | V | 0.54 | 0.03 | 690 |
| 12 | Sesotho | V | 0.87 | 0.05 | 62 |
| 13 | Turkish | V | 0.70 | 0.01 | 95 |
| 14 | Yucatec | V | 0.70 | 0.04 | 96 |
Global accuracy and completeness of morphemes (nouns and verbs).
| # | Corpus | POS | Accuracy | Completeness | Frames |
|---|---|---|---|---|---|
| 1 | Inuktitut | N | 1.00 | 0.01 | 3 |
| 2 | Japanese | N | 0.93 | 0.06 | 291 |
| 3 | Sesotho | N | 0.97 | 0.14 | 105 |
| 4 | Turkish | N | 0.91 | 0.25 | 601 |
| 5 | Yucatec | N | 0.86 | 0.30 | 312 |
| 6 | Chintang | V | 0.99 | 0.51 | 479 |
| 7 | Inuktitut | V | 0.80 | 0.13 | 26 |
| 8 | Japanese | V | 0.97 | 0.44 | 422 |
| 9 | Sesotho | V | 1.00 | 0.62 | 448 |
| 10 | Turkish | V | 0.97 | 0.60 | 479 |
| 11 | Yucatec | V | 0.98 | 0.44 | 262 |
Morpheme and word types and their accuracy.
| Language | Word types | Word-frame accuracy | Morpheme types | Morpheme-frame accuracy |
|---|---|---|---|---|
| Chintang | 51,180 | 0.57 | 5518 | 0.95 |
| Inuktitut | 12,140 | 0.98 | 734 | 0.93 |
| Japanese | 20,746 | 0.82 | 7525 | 0.98 |
| Sesotho | 5517 | 0.83 | 2437 | 0.97 |
| Turkish | 61,277 | 0.62 | 4034 | 0.88 |
| Yucatec | 16,626 | 0.78 | 2612 | 0.90 |