Dan Dediu, Stephen C Levinson.
Abstract
Language is the best example of a cultural evolutionary system, able to retain a phylogenetic signal over many thousands of years. The temporal stability (conservatism) of basic vocabulary is relatively well understood, but the stability of the structural properties of language (phonology, morphology, syntax) is still unclear. Here we report an extensive Bayesian phylogenetic investigation of the structural stability of numerous features across many language families and we introduce a novel method for analyzing the relationships between the "stability profiles" of language families. We found that there is a strong universal component across language families, suggesting the existence of universal linguistic, cognitive and genetic constraints. Against this background, however, each language family has a distinct stability profile, and these profiles cluster by geographic area and likely deep genealogical relationships. These stability profiles seem to show, for example, the ancient historical relationships between the Siberian and American language families, presumed to be separated by at least 12,000 years, and possible connections between the Eurasian families. We also found preliminary support for the punctuated evolution of structural features of language across families, types of features and geographic areas. Thus, such higher-level properties of language seen as an evolutionary system might allow the investigation of ancient connections between languages and shed light on the peopling of the world.
Year: 2012 PMID: 23028843 PMCID: PMC3447929 DOI: 10.1371/journal.pone.0045198
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. The stability hyper-cube for two features, the stability profiles of three language families, and the stability distances between language families.
Note that two of the three families lie very close to each other in this space.
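The geometry in Figure 1 can be illustrated with a minimal sketch. It assumes that each family's stability profile is a vector of per-feature stabilities in [0, 1] and that the stability distance is the Euclidean distance between two such vectors; the caption does not state the metric, so this is an assumption. The family names and values are hypothetical.

```python
import math

def stability_distance(profile_a, profile_b):
    """Distance between two stability profiles.

    Assumption: Euclidean distance in the unit hyper-cube; the source
    caption does not specify which metric the authors used.
    """
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same features")
    return math.dist(profile_a, profile_b)

# Hypothetical stabilities of two features for three families.
profiles = {
    "family_A": (0.90, 0.20),
    "family_B": (0.85, 0.25),
    "family_C": (0.10, 0.80),
}

def distance_matrix(profiles):
    """All pairwise distances, keyed by (family, family)."""
    names = sorted(profiles)
    return {
        (p, q): stability_distance(profiles[p], profiles[q])
        for p in names
        for q in names
    }

distances = distance_matrix(profiles)
```

In this toy example `family_A` and `family_B` are close in the hyper-cube while `family_C` is far from both, which mirrors the situation the caption describes.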
Top and bottom 15 most stable features.
| Rank | Polymorphic features |
| 1 | Absence of Common Consonants |
| 2 | Front Rounded Vowels |
| 3 | The Optative |
| 4 | Vowel Nasalization |
| 5 | Obligatory Possessive Inflection |
| 6 | Order of Genitive and Noun |
| 7 | N-M Pronouns |
| 8 | Nominal and Locational Predication |
| 9 | Uvular Consonants |
| 10 | M-T Pronouns |
| 11 | Order of Object and Verb |
| 12 | Order of Numeral and Noun |
| 13 | Numeral Classifiers |
| 14 | Order of Subject and Verb |
| 15 | Tone |
| … | … |
| 54 | Locus of Marking in the Clause |
| 55 | Voicing in Plosives and Fricatives |
| 56 | Symmetric and Asymmetric Standard Negation |
| 57 | Applicative Constructions |
| 58 | Relationship between the Order of Obj. and Verb and the Order of Adj. and Noun |
| 59 | Order of Person Markers on the Verb |
| 60 | Indefinite Articles |
| 61 | Asymmetrical Case-Marking |
| 62 | Definite Articles |
| 63 | Third Person Pronouns and Demonstratives |
| 64 | Position of Polar Question Particles |
| 65 | Number of Cases |
| 66 | Ordinal Numerals |
| 67 | Consonant-Vowel Ratio |
| 68 | Consonant Inventories |
This ranking represents the consensus among all 12 datasets, given by the first principal component of a Principal Component Analysis run on all polymorphic ranks; this component captures the agreement among the datasets. See WALS [31], [32] for the description of the features.
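The idea of a consensus ranking from a first principal component can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the rank values are hypothetical, only three datasets are shown instead of 12, and the dominant eigenvector is found by plain power iteration to keep the example dependency-free.

```python
import math

def first_principal_component(rows, iterations=500):
    """Return (scores, explained_fraction) for PC1 of a rows x columns matrix.

    rows: one list of ranks per feature, one column per dataset.
    """
    n, m = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(m)]
    centered = [[row[j] - means[j] for j in range(m)] for row in rows]
    # Covariance between datasets (columns).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(m)]
           for i in range(m)]
    # Power iteration from an all-positive start, which suits the case
    # where the datasets broadly agree (positive loadings on PC1).
    vec = [1.0 / math.sqrt(m)] * m
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(m)) for i in range(m)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            break
        vec = [x / norm for x in nxt]
    eigenvalue = sum(vec[i] * cov[i][j] * vec[j]
                     for i in range(m) for j in range(m))
    total = sum(cov[i][i] for i in range(m))
    scores = [sum(c[j] * vec[j] for j in range(m)) for c in centered]
    return scores, (eigenvalue / total if total else 0.0)

# Hypothetical ranks of four features in three datasets (1 = most stable).
ranks = {
    "feature_a": [1, 1, 2],
    "feature_b": [2, 3, 1],
    "feature_c": [3, 2, 3],
    "feature_d": [4, 4, 4],
}
names = list(ranks)
scores, explained = first_principal_component([ranks[name] for name in names])
# Lower PC1 score corresponds to a lower (more stable) consensus rank.
consensus = [name for _, name in sorted(zip(scores, names))]
```

The fraction in `explained` is the share of total variance carried by the first component, which is the quantity the table note uses as a measure of agreement among datasets.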
Figure 2. Multidimensional scaling (MDS) plot of the relationships between the stability profiles of the language families for the MBE dataset.
Shown are the first (horizontal) and second (vertical) dimensions. We distinguished ten geographical regions, each represented by a distinct color and a single digit, as follows: South America (0, dark blue), Central America (1, blue), North America (2, light blue), Southern Africa (3, black), Northern Africa (4, red), Eurasia (5, pink), South Asia (6, orange), Oceania (7, green), Papua New Guinea (8, dark green) and Australia (9, cyan). The language families are represented by single lower-case letters allocated in alphabetical order per geographical region, as follows: Arawakan (0a), Carib (0b), Macro-Ge (0c), Tucanoan (0d), Tupi (0e), Chibchan (1a), Mayan (1b), Oto-Manguean (1c), Uto-Aztecan (1d), Algic (2a), Hokan (2b), Na-Dene (2c), Penutian (2d), Salishan (2e), Wakashan (2f), Khoisan (3a), Niger-Congo (3b), Afro-Asiatic (4a), Nilo-Saharan (4b), Altaic (5a), Chukotko-Kamchatkan (5b), Dravidian (5c), Indo-European (5d), North-Caucasian (5e), Uralic (5f), Austro-Asiatic (6a), Sino-Tibetan (6b), Tai-Kadai (6c), Austronesian (7a), Sepik (8a), Trans-New-Guinea (8b), West-Papuan (8c) and Australian (9a). Most of the American language families are separated from the others along the first dimension (left side), and they also follow the north (bottom) to south (top) geographic direction along the second dimension. Eurasia occupies the bottom-right quadrant, while South Asia and Oceania also group together. Interestingly, Chukotko-Kamchatkan (5b; marked with a black arrow) clusters with the (Central and North) American language families. See the supplementary figures for all 12 datasets.
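How a two-dimensional MDS layout is obtained from a matrix of pairwise distances can be sketched with classical (Torgerson) MDS. This is an illustration under assumptions: the caption does not say which MDS variant was used, the input below is a hypothetical set of four points rather than real stability distances, and eigenvectors are found by power iteration with deflation, which is adequate for small Euclidean examples but not a general-purpose solver.

```python
import math

def _dominant_eigenpair(matrix, iterations=1000):
    """Largest-magnitude eigenpair of a symmetric matrix via power iteration."""
    n = len(matrix)
    vec = [math.sin(i + 1.0) for i in range(n)]  # arbitrary fixed start
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm < 1e-12:
            return 0.0, [0.0] * n
        vec = [x / norm for x in nxt]
    eigenvalue = sum(vec[i] * matrix[i][j] * vec[j]
                     for i in range(n) for j in range(n))
    return eigenvalue, vec

def classical_mds(distances, dims=2):
    """Embed items in `dims` dimensions from a symmetric distance matrix."""
    n = len(distances)
    sq = [[d * d for d in row] for row in distances]
    row_mean = [sum(row) / n for row in sq]
    grand = sum(row_mean) / n
    # Double-centred matrix B = -1/2 * J * D^2 * J.
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand)
          for j in range(n)] for i in range(n)]
    coords = [[] for _ in range(n)]
    for _ in range(dims):
        lam, vec = _dominant_eigenpair(b)
        scale = math.sqrt(max(lam, 0.0))
        for i in range(n):
            coords[i].append(scale * vec[i])
        # Deflate so the next pass finds the next eigenvector.
        b = [[b[i][j] - lam * vec[i] * vec[j] for j in range(n)]
             for i in range(n)]
    return coords

# Hypothetical items whose true layout is a 3 x 4 rectangle.
points = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0), (3.0, 4.0)]
distances = [[math.dist(p, q) for q in points] for p in points]
embedding = classical_mds(distances)
```

The recovered coordinates match the original layout only up to rotation and reflection, which is why MDS plots such as Figure 2 are read in terms of relative positions and clusters rather than absolute axis values.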
Figure 3. Network representation of the relationships between the same stability profiles as in Figure 2 (same conventions apply).
The same clusters as in Figure 2 can be observed, but the attachment of Chukotko-Kamchatkan (5b; marked with a black arrow) is now clearer: it groups with the North American families Algic (2a), Penutian (2d) and Wakashan (2f), and with the Central American Uto-Aztecan (1d), whose geographical range in fact extends well into North America. See the supplementary figures for all 12 datasets.
Statistical robustness of sets of language families.
| Set of families | Raw: most conservative | Raw: number signif. | Controlling for geography: most conservative | Controlling for geography: number signif. |
| Africa | 0.074 | 3 | 0.39 | 0 |
| … | … | … | … | … |
| C America (vs world) | 0.38 | 0 | 0.90 | 0 |
| C America (vs America) | 0.99 | 0 | 0.96 | 0 |
| N America (vs world) | | | 0.072 | 2 |
| N America (vs America) | 0.12 | 3 | | |
| … | … | … | … | … |
| C America + Siberia | 0.37 | 0 | 0.42 | 0 |
| … | … | … | … | … |
| Eurasia | | | 0.70 | 3 |
| | | | 0.094 | |
| Nostratic v1 | | | 0.13 | 3 |
| Nostratic v2 | 0.24 | 0 | 0.77 | 0 |
| SE Asia + Oceania | 0.48 | 0 | 0.83 | 0 |
| Austro-Tai | 0.070 | 3 | 0.12 | 3 |
| PNG | | | 0.22 | 0 |
| Australia | 0.42 | 0 | 0.51 | 0 |
| PNG + Australia | 0.87 | 0 | 0.99 | 0 |
The most conservative combined p-value and the number of combined p-values significant at the α-level = 0.05 for the five methods (Fisher, Z-transform, Hartung, Simes and Makambi), as applied to all 12 datasets, for raw and geography-corrected stability distances. Combined p-values significant at the α-level = 0.05 are in bold. Sets with at least 4 significant combined p-values in both the raw and geography-corrected columns are also in bold.
(vs America): randomization only within the Americas. (vs world): randomization not restricted.
Here we report the results for the maximal composition of “Siberia”, namely Chukotko-Kamchatkan, Tungusic and Yukaghir (the results are very similar when excluding Tungusic). See the text for details.
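The two summary columns in the table (most conservative combined p-value, number of significant combined p-values) can be illustrated with a short sketch. It implements only three of the five methods named in the note (Fisher, Z-transform and Simes) in their textbook forms; the Hartung and Makambi methods, which adjust for dependence between tests, are omitted. The input p-values are hypothetical, and the function names are introduced here for illustration.

```python
import math
from statistics import NormalDist

def fisher(pvalues):
    """Fisher's method: -2 * sum(ln p) is chi-square with 2k d.f."""
    k = len(pvalues)
    half = -sum(math.log(p) for p in pvalues)  # equals X / 2
    # Closed-form chi-square survival function for even d.f. (2k).
    return math.exp(-half) * sum(half ** i / math.factorial(i)
                                 for i in range(k))

def z_transform(pvalues):
    """Unweighted Z-transform (Stouffer) method."""
    normal = NormalDist()
    z = sum(-normal.inv_cdf(p) for p in pvalues) / math.sqrt(len(pvalues))
    return normal.cdf(-z)

def simes(pvalues):
    """Simes' method: min over i of k * p_(i) / i for sorted p-values."""
    k = len(pvalues)
    return min(k * p / (i + 1) for i, p in enumerate(sorted(pvalues)))

def summarize(pvalues, alpha=0.05):
    """Return (most conservative combined p, number significant at alpha)."""
    combined = [method(pvalues) for method in (fisher, z_transform, simes)]
    return max(combined), sum(p < alpha for p in combined)

# Hypothetical per-dataset p-values for one set of families.
example = [0.01, 0.04, 0.03]
most_conservative, n_significant = summarize(example)
```

Taking the maximum across methods is what makes the reported value "most conservative": a set is counted as robust only if even the least favourable combination method keeps it below the threshold.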