Shijulal Nelson-Sathi, Johann-Mattis List, Hans Geisler, Heiner Fangerau, Russell D Gray, William Martin, Tal Dagan.
Abstract
Language evolution is traditionally described in terms of family trees with ancestral languages splitting into descendent languages. However, it has long been recognized that language evolution also entails horizontal components, most commonly through lexical borrowing. For example, the English language was heavily influenced by Old Norse and Old French; eight per cent of its basic vocabulary is borrowed. Borrowing is a distinctly non-tree-like process (akin to horizontal gene transfer in genome evolution) that cannot be recovered by phylogenetic trees. Here, we infer the frequency of hidden borrowing among 2346 cognates (etymologically related words) of basic vocabulary distributed across 84 Indo-European languages. The dataset includes 124 (5%) known borrowings. Applying the uniformitarian principle to inventory dynamics in past and present basic vocabularies, we find that 1373 (61%) of the cognates have been affected by borrowing during their history. Our approach correctly identified 117 (94%) known borrowings. Reconstructed phylogenetic networks that capture both vertical and horizontal components of evolutionary history reveal that, on average, eight per cent of the words of basic vocabulary in each Indo-European language were involved in borrowing during evolution. Basic vocabulary is often assumed to be relatively resistant to borrowing. Our results indicate that the impact of borrowing is far more widespread than previously thought.
Year: 2010 PMID: 21106583 PMCID: PMC3097823 DOI: 10.1098/rspb.2010.1917
Source DB: PubMed Journal: Proc Biol Sci ISSN: 0962-8452 Impact factor: 5.349
Figure 1. Etymological reconstruction of the concept tooth. The English and German word forms have descended from the Proto-Germanic ancestor [52]. The Italian and French words are descendants of Latin, and the Proto-Germanic and Latin forms stem from Proto-Indo-European [43,53].
Figure 2. Modules in the shared COGs network. (a) A graphic representation of cognate PAPs. Languages are sorted by their order on the reference phylogenetic tree [3]. COGs are sorted by their size in ascending order. The presence of a given COG in a given language is coloured blue if the COG pattern is congruent with the tree branching pattern and red otherwise. (b) A matrix representation of the shared COGs network in Indo-European languages. Cells in the matrix are edges in the network. Edges are colour-coded by the frequency of shared cognates according to the colour bar at the bottom. The languages in the matrix are sorted by order of appearance in the phylogenetic tree on the left. (c) Modules within the shared COGs network. Languages included in the same module are shown in the same colour.
Figure 3. Inference of borrowing frequency by ancestral vocabulary size. (a–d) Schematic (left) and dynamics of ancestral and contemporary vocabulary size (right) under the different borrowing models. The fraction of interquartile range, $(\mathrm{Median}_{\mathrm{ancestral}} - \mathrm{Median}_{\mathrm{contemporary}})/\mathrm{IQR}_{\mathrm{contemporary}}$, in the different models is as follows. Loss only: 2.92; origin only: 1.93; BOR1: 0.12; BOR3: −0.86. Green triangles, origin; red circles, loss; green circles, word presence; blue line, contemporary languages; red line, ancestral languages.
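The statistic quoted in the Figure 3 caption can be illustrated with a minimal sketch. The function name `iqr_fraction`, the sample vocabulary sizes, and the quartile convention (Python's `statistics.quantiles` with `method="inclusive"`) are all assumptions made for illustration; the source does not state how the quartiles were computed.

```python
from statistics import median, quantiles


def iqr_fraction(ancestral_sizes, contemporary_sizes):
    """Return (median_ancestral - median_contemporary) / IQR_contemporary.

    Positive values mean the inferred ancestral vocabularies are larger
    than the contemporary ones; values near zero mean the two
    distributions are centred at similar sizes.
    """
    # Quartiles of the contemporary sizes (convention is an assumption).
    q1, _, q3 = quantiles(contemporary_sizes, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        raise ValueError("contemporary vocabulary sizes have zero IQR")
    return (median(ancestral_sizes) - median(contemporary_sizes)) / iqr


# Hypothetical vocabulary sizes, not taken from the paper.
contemporary = [180, 190, 200, 210, 220]
ancestral = [230, 240, 250]
print(iqr_fraction(ancestral, contemporary))
```

Read this way, the caption's values suggest that the loss-only and origin-only models place the ancestral median well above the contemporary one (2.92 and 1.93 interquartile ranges), whereas BOR1 (0.12) keeps ancestral and contemporary vocabulary sizes close.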
Figure 4. The MLN of Indo-European languages. (a) An MLN for 84 contemporary languages reconstructed under the BOR1 model. Vertical edges are indicated in grey, with both the width and the shading of the edge shown proportional to the number of inferred vertically inherited COGs along the edge (see the scale). The lateral network is indicated by edges that do not map onto the vertical component, with the number of cognates per edge indicated in colour (see the scale). Lateral edges that link ancestral nodes represent laterally shared COGs among the descendent languages of the connected nodes, whose distribution pattern could not be explained by origin and LO under the ancestral vocabulary size constraint. The two heaviest edges of Slovene (Slavic) and Romanian (Romance) are marked by an arrow. (b) Distribution of connectivity, the number of one-edge-distanced neighbours for each vertex, in the network. (c) Frequency distribution of edge weight in the lateral component of the network.
Reconstructed borrowing events. The origin node that includes the reinserted borrowing is shaded in light grey.
| edge type | origin node | number of reinserted borrowings |
|---|---|---|
| external–external | | 1 |
| external–internal | | 18 |
| | | 58 |
| internal–internal | | 40 |
Lateral edge (LE) frequencies between and within groups in the MLN.
| group | n (a) | normalized borrowing, int (d) | normalized borrowing, ext (d) | median LE weight, int (b) | median LE weight, ext (b) | p-value (c) |
|---|---|---|---|---|---|---|
| Greek | 9 | 1.22 | 0.25 | 2 | 1 | <0.05 |
| Armenian | 3 | 0 | 0.17 | 0 | 1 | n.a. |
| Celtic | 13 | 1.61 | 0.29 | 2 | 1 | ≪0.05 |
| Romance | 31 | 2.45 | 0.36 | 1 | 1 | ≪0.05 |
| Germanic | 29 | 2.37 | 0.44 | 1 | 1 | ≪0.05 |
| Slavic | 31 | 2.35 | 0.64 | 1 | 1 | ≪0.05 |
| Albanian | 9 | 1.55 | 0.18 | 4 | 1 | ≪0.05 |
| Indic | 21 | 3.33 | 0.68 | 2 | 1 | ≪0.05 |
| Iranian | 14 | 2.35 | 0.75 | 2 | 1 | ≪0.05 |
(a) Number of languages within group.
(b) Range of median number of COGs per lateral edge.
(c) One-sided Kolmogorov–Smirnov test for lateral edge distribution.
(d) For internal edges (int), number of internal edges per number of nodes within the group; for external edges (ext), number of external edges per number of nodes outside the group.
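The normalization described in the last footnote can be sketched as follows. This is an illustrative reading only: the function name `normalized_borrowing`, the input layout (a node-to-group mapping plus a list of lateral edges), and the choice to count each lateral edge once regardless of its weight are assumptions, not details confirmed by the source.

```python
from collections import Counter


def normalized_borrowing(node_group, lateral_edges):
    """Return {group: (internal_norm, external_norm)}.

    internal_norm = internal lateral edges / nodes inside the group
    external_norm = external lateral edges / nodes outside the group
    """
    nodes_per_group = Counter(node_group.values())
    total_nodes = len(node_group)
    internal = Counter()
    external = Counter()
    for u, v in lateral_edges:
        gu, gv = node_group[u], node_group[v]
        if gu == gv:
            internal[gu] += 1
        else:
            # An edge between two groups is external for both of them.
            external[gu] += 1
            external[gv] += 1
    result = {}
    for group, n_in in nodes_per_group.items():
        n_out = total_nodes - n_in
        ext_norm = external[group] / n_out if n_out else 0.0
        result[group] = (internal[group] / n_in, ext_norm)
    return result


# Hypothetical toy network, not data from the paper.
groups = {"a1": "A", "a2": "A", "b1": "B", "b2": "B", "b3": "B"}
edges = [("a1", "a2"), ("a1", "b1"), ("b1", "b2"), ("b2", "b3")]
print(normalized_borrowing(groups, edges))
```

In the toy network, group A has one internal edge over two nodes and one external edge over the three nodes outside it, which mirrors how the int and ext columns of the table are defined.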