Jayden L Macklin-Cordes, Erich R Round.
Abstract
Causal processes can give rise to distinctive distributions in the linguistic variables that they affect. Consequently, a secure understanding of a variable's distribution can hold a key to understanding the forces that have causally shaped it. A storied distribution in linguistics has been Zipf's law, a kind of power law. In the wake of a major debate in the sciences around power-law hypotheses and the unreliability of earlier methods of evaluating them, here we re-evaluate the distributions claimed to characterize phoneme frequencies. We infer the fit of power laws and three alternative distributions to 166 Australian languages, using a maximum likelihood framework. We find evidence supporting earlier results, but also nuancing them and increasing our understanding of them. Most notably, phonemic inventories appear to have a Zipfian-like frequency structure among their most-frequent members (though perhaps also a lognormal structure) but a geometric (or exponential) structure among the least-frequent. We compare these new insights to the kinds of causal processes that affect the evolution of phonemic inventories over time, and identify a potential account for why, despite there being an important role for phonetic substance in phonemic change, we could still expect inventories with highly diverse phonetic content to share similar distributions of phoneme frequencies. We conclude with priorities for future work in this promising program of research.
Keywords: Australian languages; Zipf's law; distributions; maximum likelihood; phoneme inventories; phonology; power laws
Year: 2020 PMID: 33329209 PMCID: PMC7714923 DOI: 10.3389/fpsyg.2020.570895
Source DB: PubMed Journal: Front Psychol ISSN: 1664-1078
Figure 1. Frequency of phonemes in the Walmajarri lexicon (Hudson and Richards, 1993). (A) Relative frequencies of each segment type. (B) The same frequencies on log-transformed x and y axes, the traditional visual device used to identify power laws.
Figure 2. Log-log plot of frequencies vs. frequency ranks in Walmajarri. When a linear model is fitted to the full distribution (dashed black), high- and low-frequency segments are overestimated and mid-rank segments are underestimated. When the lowest-frequency segments are removed from the model (solid blue), the model appears to fit well.
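The linear fit described in the Figure 2 caption can be sketched as an ordinary least-squares regression of log frequency on log rank. This is a minimal illustration only: the function name `loglog_slope`, the `drop_lowest` parameter, and the toy counts are hypothetical and are not taken from the paper or its data.

```python
import numpy as np

def loglog_slope(freqs, drop_lowest=0):
    """Least-squares slope and intercept of log(frequency) on log(rank).

    drop_lowest (hypothetical parameter) removes that many of the
    least-frequent items before fitting, mimicking the trimmed fit.
    """
    ordered = np.sort(np.asarray(freqs, dtype=float))[::-1]  # most frequent first
    if drop_lowest:
        ordered = ordered[:-drop_lowest]
    ranks = np.arange(1, len(ordered) + 1)
    slope, intercept = np.polyfit(np.log(ranks), np.log(ordered), 1)
    return slope, intercept

# Toy counts (assumption, not Walmajarri data): an exact 1/rank series.
toy_counts = 1000.0 / np.arange(1, 21)
slope_full, _ = loglog_slope(toy_counts)
slope_trimmed, _ = loglog_slope(toy_counts, drop_lowest=5)
```

On real phoneme counts the full and trimmed slopes would generally differ, which is the visual effect the caption describes. A straight line on log-log axes is only suggestive, which is why the abstract turns to maximum likelihood instead.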
Power law (without xmin).
| Parameter | Mean | SD | Min | Max |
|---|---|---|---|---|
| α | 1.38 | 0.17 | 1.16 | 2.18 |
| Goodness-of-fit | 0.35 | 0.07 | 0.15 | 0.53 |
| p-value | 0.01 | 0.03 | 0.00 | 0.27 |
Summary of the α parameter, goodness-of-fit, and p-values for the power law distribution fitted to each language's full phonemic inventory.
Power law distribution (with xmin).
| Parameter | Mean | SD | Min | Max |
|---|---|---|---|---|
| α | 2.75 | 0.65 | 1.51 | 6.14 |
| Goodness-of-fit | 0.14 | 0.03 | 0.08 | 0.22 |
| p-value | 0.62 | 0.26 | 0.01 | 1.00 |
Summary of the α parameter, goodness-of-fit, and p-values for the power law distribution fitted to a subset of more frequent phonemes in each language.
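As a rough illustration of what fitting a power law above a lower bound involves, the sketch below uses the standard continuous maximum likelihood estimator, α = 1 + n / Σ ln(x_i / xmin), together with a Kolmogorov-Smirnov distance between the empirical and fitted distributions. Both choices are assumptions made for illustration: the paper may use a discrete estimator and a different goodness-of-fit statistic. The function name `fit_power_law` and the synthetic sample are hypothetical.

```python
import numpy as np

def fit_power_law(values, xmin):
    """Continuous power-law MLE for alpha above xmin, plus a KS distance.

    Assumes xmin > 0 and that at least one value exceeds xmin
    (otherwise the log sum is zero and alpha is undefined).
    """
    tail = np.sort(np.asarray(values, dtype=float))
    tail = tail[tail >= xmin]                      # keep only the fitted region
    n = len(tail)
    alpha = 1.0 + n / np.sum(np.log(tail / xmin))  # maximum likelihood estimate
    model_cdf = 1.0 - (tail / xmin) ** (1.0 - alpha)
    upper = np.arange(1, n + 1) / n                # empirical CDF just after each point
    lower = np.arange(0, n) / n                    # empirical CDF just before each point
    ks = max(np.max(np.abs(upper - model_cdf)), np.max(np.abs(model_cdf - lower)))
    return alpha, ks, n

# Synthetic sample (assumption, not phoneme data): inverse-transform draws
# from a continuous power law with alpha = 2.5 and xmin = 1.
rng = np.random.default_rng(0)
toy = 1.0 * (1.0 - rng.random(20000)) ** (-1.0 / (2.5 - 1.0))
alpha_hat, ks_stat, n_tail = fit_power_law(toy, xmin=1.0)
```

In this kind of framework the p-value is usually obtained by bootstrapping: synthetic samples are drawn from the fitted model, each is refitted, and the p-value is the share whose distance exceeds the observed one. That step is omitted here.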
Figure 3. Four distribution types: power law, exponential, lognormal, and Poisson, each illustrated with four parameterizations.
Results summary.
| Distribution | Without xmin | With xmin | Prop. fitted |
|---|---|---|---|
| Power law | 2 (1%) | 158 (95%) | 56% |
| Lognormal | 93 (56%) | 155 (93%) | 78% |
| Exponential | 147 (89%) | 146 (88%) | 84% |
| Poisson | 0 (0%) | 43 (26%) | 17% |
For each of the four distributions considered, this table lists the number of languages (and percentage of the total language sample) for which the distribution plausibly fits, as indicated by an uncorrected p-value greater than 0.1. “Prop. fitted” gives the average proportion of each language's phoneme inventory above xmin.