Abstract
What are the sources of variation in the input, and how much do they matter for language acquisition? This study examines frequency variation in manner-of-articulation classes in child and adult input. The null hypothesis is that segmental frequency distributions of language varieties are unigram (modelable by stationary, ergodic processes), and that languages are unitary (modelable as a single language variety). Experiment I showed that English segments are not unigram; they exhibit a 'bursty' distribution in which the local frequency varies more than expected by chance alone. Experiment II showed that English segments are approximately unitary: the natural background variation in segmental frequencies that arises within a single language variety is much larger than numerical differences across varieties. Variation in segmental frequencies seems to be driven by variation in discourse topic; topic-associated words cause bursts/lulls in local segmental frequencies. The article concludes with some methodological recommendations for comparing language samples.
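One way to make "varies more than expected by chance alone" concrete is an overdispersion check. The sketch below is not the study's procedure: it assumes a plain binomial baseline as a stand-in for the unigram hypothesis, and the function name `dispersion_ratio` and the toy counts are invented for illustration.

```python
def dispersion_ratio(counts, lengths):
    """Pearson dispersion of per-document segment counts against a binomial baseline.

    counts[i]  -- occurrences of the segment in document i
    lengths[i] -- total number of segments in document i
    Under an i.i.d. (unigram-like) model the ratio should be near 1;
    values well above 1 suggest a bursty distribution.
    """
    if len(counts) != len(lengths) or len(counts) < 2:
        raise ValueError("need matching sequences with at least two documents")
    p = sum(counts) / sum(lengths)  # pooled relative frequency
    if not 0 < p < 1:
        raise ValueError("pooled frequency must lie strictly between 0 and 1")
    # Sum of squared deviations from the binomial mean, scaled by the binomial variance.
    chi2 = sum((k - n * p) ** 2 / (n * p * (1 - p)) for k, n in zip(counts, lengths))
    return chi2 / (len(counts) - 1)


# Toy data (invented): same pooled frequency, different spread across documents.
lengths = [1000, 1000, 1000, 1000]
even = [50, 50, 50, 50]      # perfectly even across documents
bursty = [10, 90, 20, 80]    # same total, concentrated in some documents

print(dispersion_ratio(even, lengths))
print(dispersion_ratio(bursty, lengths))
```

A ratio far above 1 for a real corpus would point in the same direction as Experiment I, though a proper test would also need a reference distribution for the ratio.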
Year: 2012 PMID: 23046894 PMCID: PMC3798116 DOI: 10.1017/S0305000912000372
Source DB: PubMed Journal: J Child Lang ISSN: 0305-0009
| Set of speakers present in the document | Number of documents | Cumulative % of documents |
|---|---|---|
| Target, Parent | 1797 | 38·9 |
| Target, Parent, Investigator | 1189 | 64·6 |
| Sibling, Target, Parent, Investigator | 283 | 70·7 |
| Sibling, Target, Parent | 271 | 76·6 |
| Adult, Target, Parent | 176 | 80·4 |
| Target | 114 | 82·9 |
| Target, Other, Parent | 82 | 84·6 |
| Adult, Target, Other, Parent | 66 | 86·1 |
| Target, Investigator | 57 | 87·3 |
| Family, Target, Parent | 49 | 88·4 |
note: The first column lists the set of speaker roles present in a document; the document count column indicates how many documents had exactly that combination of roles; the remaining column gives the cumulative percentage of all documents. For example, the 64·6% in the cumulative percentage cell of the second row indicates that documents containing just a child and their parent (38·9%) or just a child, their parent, and the investigator (64·6 − 38·9 = 25·7%) jointly make up 64·6% of all documents in the corpus.
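The subtraction in the note generalizes to every row: each row's own share is its cumulative percentage minus the previous row's. A minimal sketch using only the cumulative column of the table above (the variable names are illustrative, and because the inputs are already rounded, each recovered share may be off by about 0·1):

```python
# Cumulative % of documents, copied from the table above (rows 1-10).
cumulative = [38.9, 64.6, 70.7, 76.6, 80.4, 82.9, 84.6, 86.1, 87.3, 88.4]

# Per-row share: first row as-is, then successive differences.
shares = [cumulative[0]] + [
    round(current - previous, 1)
    for previous, current in zip(cumulative, cumulative[1:])
]

for rank, share in enumerate(shares, start=1):
    print(f"row {rank}: {share:.1f}% of documents")
```

For the second row this gives 25.7, the share of documents containing just the target child, a parent, and the investigator.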
| Role | Documents (#) | Utterances (#) | Words (#) | Documents (%) | Utterances (%) | Words (%) |
|---|---|---|---|---|---|---|
| Target | 4486 | 946024 | 3276634 | 97·1 | 38·4 | 32·3 |
| Parent | 4298 | 1198243 | 5496751 | 93·0 | 48·7 | 54·0 |
| Investigator | 1783 | 161283 | 746854 | 38·6 | 6·6 | 7·3 |
| Sibling | 691 | 55180 | 209738 | 15·0 | 2·2 | 2·1 |
| Adult | 490 | 56267 | 270826 | 10·6 | 2·3 | 2·7 |
| Other | 299 | 24613 | 98038 | 6·5 | 1·0 | 1·0 |
| Family | 167 | 16664 | 80151 | 3·6 | 0·7 | 0·8 |
| Child | 75 | 2329 | 9351 | 1·6 | 0·1 | 0·1 |
note: Absolute (left) and relative (right) amount of input by speaker role. The Role column indicates the role code. The Documents columns give the number or percentage of documents in which the role appears. The Utterances and Words columns give the number and percentage of utterances and words contributed by each speaker role.
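The relative utterance figures can be recomputed from the absolute counts. The sketch below assumes that the eight roles listed account for every utterance in the corpus, which the source does not state explicitly; under that assumption the recomputed shares agree with the relative Utterances column.

```python
# Absolute utterance counts per speaker role, copied from the table above.
utterances = {
    "Target": 946024,
    "Parent": 1198243,
    "Investigator": 161283,
    "Sibling": 55180,
    "Adult": 56267,
    "Other": 24613,
    "Family": 16664,
    "Child": 2329,
}

# ASSUMPTION: these eight roles cover all utterances, so their sum is the corpus total.
total = sum(utterances.values())

# Each role's share of all utterances, as a percentage rounded to one decimal.
shares = {role: round(100 * n / total, 1) for role, n in utterances.items()}

for role, share in shares.items():
    print(f"{role:<12} {share:5.1f}%")
```

Document percentages cannot be recovered the same way, because a single document usually contains several roles and the Documents column therefore does not sum to the number of documents.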
The most frequently occurring forms in CHILDES and the Buckeye corpus not listed in the original dictionary file
| Form (CHILDES) | Frequency | Form (Buckeye) | Frequency |
|---|---|---|---|
| xxx | 41201 | yknow | 2264 |
| xx | 36627 | um-hum | 565 |
| hmm | 10411 | mm-hmm | 39 |
| www | 6888 | hm | 19 |
| uhhuh | 5764 | mm | 17 |
Fig. 1. Expected versus observed relative frequency density distribution of [l], with 95% confidence intervals.
Fig. 2. Violin plot of manner class relative frequencies. Left violins of each pair indicate adult-directed speech; right violins indicate child-directed speech. See text for further details on interpretation.
Fig. 3. Log odds-transformed p-values for each manner class as a function of number of documents in subsamples.
| Rank | Word | Δ(glide \| ω) (%) | Word | Δ(nasal \| ω) (%) |
|---|---|---|---|---|
| 1 | you | 1·63 | and | −1·17 |
| 2 | what | 0·67 | mean | −0·36 |
| 3 | your | 0·31 | um | −0·28 |
| 4 | what's | 0·21 | i'm | −0·17 |
| 5 | want | 0·17 | my | −0·15 |
| … | … | … | … | … |
| −5 | well | −0·06 | mhm | 0·17 |
| −4 | were | −0·06 | want | 0·17 |
| −3 | years | −0·07 | on | 0·19 |
| −2 | when | −0·08 | can | 0·22 |
| −1 | was | −0·30 | no | 0·26 |
| Total | – | 2·96 | – | −1·85 |
note: Top five words contributing to the asymmetry between CDS and ADS in the relative frequency of glides (columns 2–3) and nasals (columns 4–5) are shown in the top five data rows. The five words that anti-contribute the most to the total asymmetry are shown in the bottom five data rows. The global manner asymmetry is shown in the bottom row. Positive numbers indicate the manner is more frequent in CDS than in ADS; negative numbers indicate the opposite.