Susanne Vejdemo, Thomas Hörberg.
Abstract
The rate of lexical replacement estimates the diachronic stability of word forms on the basis of how frequently a proto-language word is replaced or retained in its daughter languages. Lexical replacement rate has been shown to be strongly related to word class and word frequency. In this paper, we argue that content words and function words behave differently with respect to lexical replacement rate, and we show that semantic factors predict the lexical replacement rate of content words. For the 167 content items in the Swadesh list, data were gathered on lexical replacement rate, word class, frequency, age of acquisition, synonyms, arousal, imageability and average mutual information, either from published databases or from corpora and lexica. A linear regression model shows that, in addition to frequency, synonyms, senses and imageability are significantly related to the lexical replacement rate of content words, in particular the number of synonyms that a word has. The model shows no differences in lexical replacement rate between word classes, and it outperforms a model with only word class and word frequency as predictors.
Year: 2016 PMID: 26820737 PMCID: PMC4731055 DOI: 10.1371/journal.pone.0147924
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Translation equivalents for the concepts DIRTY and TONGUE in some Slavic and Germanic languages.
Whereas the words for DIRTY come from eight different cognate classes, the words for TONGUE are all cognates of the original Indo-European word *dnghwa and therefore belong to a single cognate class.
| Language | Family | DIRTY: word | DIRTY: cognate class | TONGUE: word | TONGUE: cognate class |
|---|---|---|---|---|---|
| Byelorussian | Slavic | BRUDNY | 1 | Jazyk | 1 |
| Slovak | Slavic | BRUDNY | 1 | Jazyk | 1 |
| Polish | Slavic | BRUDNY | 1 | Jezyk | 1 |
| Czech | Slavic | SPINAVY | 2 | Jazyk | 1 |
| Icelandic | Germanic | SKITUGUR | 3 | Tunga | 1 |
| Norwegian | Germanic | SKIDDEN | 3 | Tunge | 1 |
| Faroese | Germanic | SKITIN | 3 | Tunga | 1 |
| Danish | Germanic | BESKIDT | 3 | Tunge | 1 |
| Sorbian | Slavic | MAZANY | 4 | Jazyk | 1 |
| Slovenian | Slavic | UMAZANU | 4 | Jezik | 1 |
| Bulgarian | Slavic | MRESNO | 5 | Ezik | 1 |
| Serbocroatian | Slavic | PRLJAV | 6 | Jezik | 1 |
| Macedonian | Slavic | PRLAV | 6 | Jazik | 1 |
| German | Germanic | SCHMUTZIG | 7 | Zunge | 1 |
| English | Germanic | DIRTY | 8 | Tongue | 1 |
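As a rough descriptive check on the table, one can count how many cognate classes each concept spans and what share of the languages falls outside the largest class. This is only an illustration on the 15 languages listed; it is not the lexical replacement rate estimate used in the study, and the helper name `class_summary` is hypothetical.

```python
from collections import Counter

# (language, DIRTY cognate class, TONGUE cognate class), copied from the table above
ROWS = [
    ("Byelorussian", 1, 1), ("Slovak", 1, 1), ("Polish", 1, 1),
    ("Czech", 2, 1), ("Icelandic", 3, 1), ("Norwegian", 3, 1),
    ("Faroese", 3, 1), ("Danish", 3, 1), ("Sorbian", 4, 1),
    ("Slovenian", 4, 1), ("Bulgarian", 5, 1), ("Serbocroatian", 6, 1),
    ("Macedonian", 6, 1), ("German", 7, 1), ("English", 8, 1),
]

def class_summary(classes):
    """Return (number of distinct cognate classes,
    share of languages outside the most common class)."""
    counts = Counter(classes)
    largest = counts.most_common(1)[0][1]
    return len(counts), 1 - largest / len(classes)

dirty = class_summary([d for _, d, _ in ROWS])
tongue = class_summary([t for _, _, t in ROWS])
print("DIRTY:", dirty, "TONGUE:", tongue)
```

DIRTY spreads over eight classes with most languages outside the largest one, while TONGUE stays in a single class, which is the contrast the caption describes.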
Fig 1. Lexical replacement rate as a function of normalized frequency in English, Greek, Russian and Spanish, for concepts of open (red: adjectives, green: nouns, blue: verbs) and closed (yellow: adverbs, grey: conjunctions, purple: numbers, turquoise: prepositions, orange: pronouns) word classes.
Correlation matrix of all variables in the study.
P-values for significance tests have been corrected for multiple comparisons using the Holm correction.
| | Rate | LogFrequency | Synonyms | Mutual-Information | Imageability | Arousal | LogSenses | LogAgeOfAcq |
|---|---|---|---|---|---|---|---|---|
| Rate | - | -.273 | .242 | -.281 | -.254 | .029 | -.046 | .255 |
| LogFrequency | -.273 | - | .381 | .221 | -.208 | -.046 | .357 | -.482 |
| Synonyms | .242 | .381 | - | -.003 | -.486 | .195 | .592 | -.043 |
| Mutual-Information | -.281 | .221 | -.003 | - | .506 | -.111 | .031 | -.367 |
| Imageability | -.254 | -.208 | -.486 | .506 | - | -.071 | -.369 | -.238 |
| Arousal | .029 | -.046 | .195 | -.111 | -.071 | - | .046 | .097 |
| LogSenses | -.046 | .357 | .592 | .031 | -.369 | .046 | - | -.178 |
| LogAgeOfAcq | .255 | -.482 | -.043 | -.367 | -.238 | .097 | -.178 | - |
***: p < .0001; **: p < .01; *: p < .05.
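The Holm correction named in the caption is a step-down procedure: sort the m p-values in ascending order, multiply the k-th smallest (counting from 0) by m - k, cap the result at 1, and never let an adjusted value drop below the one before it. A plain-Python sketch follows, together with a helper that applies the significance codes from the legend. The function names are hypothetical and the p-values are toy numbers, since the unadjusted p-values are not reported here.

```python
def holm_adjust(pvalues):
    """Holm step-down adjustment, returned in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_max = 0.0
    for k, i in enumerate(order):
        # multiply by (m - k), cap at 1, and keep the sequence non-decreasing
        running_max = max(running_max, min(1.0, (m - k) * pvalues[i]))
        adjusted[i] = running_max
    return adjusted

def stars(p):
    """Significance code using the thresholds in the legend above."""
    if p < 0.0001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""

raw = [0.01, 0.04, 0.03]  # toy p-values, not taken from the study
adj = holm_adjust(raw)
print(adj, [stars(p) for p in adj])
```

In practice a maintained implementation, such as the Holm option of a statistics package's multiple-testing routine, is preferable to hand-rolled code.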
β coefficients and inferential statistics of the original and the bootstrapped models.
The table also includes 95% pointwise confidence intervals for the coefficients, based on the 0.025 and 0.975 quantiles of the coefficient estimates across the 10,000 bootstrap samples, and ΔR² for each predictor, that is, the proportion of variance in Lexical Replacement Rate explained by that predictor over and above all other predictors in the model. The Word Class variable, which has three values (verb, noun or adjective), is represented as three binary variables (Word class: Noun, Word class: Verb, Word class: Adjective). Only the first two are entered into the model, because the third carries no additional information: a word that is neither a verb nor a noun is an adjective, so adjective serves as the reference level.
| Predictor | β (original) | Std. error (original) | t | p | β (bootstrap) | Std. error (bootstrap) | Z | p | CI lower | CI upper | ΔR² |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Intercept | 9.87 | 2.29 | 4.31 | < 0.001 | 9.92 | 2.68 | 3.71 | < 0.001 | 4.66 | 15.26 | - |
| | -0.76 | 0.15 | -4.93 | < 0.001 | -0.76 | 0.17 | -4.37 | < 0.001 | -1.11 | -0.42 | 16.3% |
| | 1.84 | 0.42 | 4.33 | < 0.001 | 1.83 | 0.42 | 4.38 | < 0.001 | 0.98 | 2.62 | 12.5% |
| | 0.03 | 0.12 | 0.21 | 0.834 | 0.02 | 0.14 | 0.14 | 0.891 | -0.26 | 0.27 | 0.0% |
| | -0.67 | 0.22 | -3.02 | 0.003 | -0.67 | 0.24 | -2.78 | 0.005 | -1.14 | -0.19 | 6.1% |
| | -0.19 | 0.15 | -1.25 | 0.215 | -0.2 | 0.14 | -1.41 | 0.158 | -0.48 | 0.08 | 1.0% |
| | -0.55 | 0.24 | -2.25 | 0.027 | -0.53 | 0.24 | -2.18 | 0.029 | -1 | -0.04 | 3.4% |
| | -0.36 | 0.74 | -0.48 | 0.629 | -0.36 | 0.74 | -0.48 | 0.629 | -1.79 | 1.16 | 0.2% |
| | 1.15 | 0.74 | 1.55 | 0.125 | 1.16 | 0.75 | 1.54 | 0.123 | -0.24 | 2.72 | 1.6% |
| | 0.38 | 0.54 | 0.7 | 0.486 | 0.4 | 0.46 | 0.87 | 0.386 | -0.47 | 1.33 | 0.3% |
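The caption above combines three standard ingredients: dummy (treatment) coding of Word Class with adjective as the omitted level, ΔR² as the drop in R² when a single predictor is removed from the full model, and percentile bootstrap intervals from 10,000 resamples. The NumPy sketch below illustrates them on synthetic data. It assumes ordinary least squares and case resampling (drawing whole items with replacement); the caption does not state the resampling scheme the authors used, and the helper names (`dummy_code`, `ols_r2`, `delta_r2`, `bootstrap_ci`) are hypothetical.

```python
import numpy as np

def dummy_code(labels, reference):
    """Treatment coding: one 0/1 column per non-reference level. The reference
    level is implied when every dummy column is 0."""
    levels = [lvl for lvl in sorted(set(labels)) if lvl != reference]
    cols = np.array([[1.0 if lab == lvl else 0.0 for lvl in levels] for lab in labels])
    return cols, levels

def ols_r2(X, y):
    """Least-squares fit of y on X (X already holds an intercept column); returns (beta, R^2)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return beta, r2

def delta_r2(X, y, col):
    """R^2 of the full model minus R^2 of the model without column `col`."""
    _, r2_full = ols_r2(X, y)
    _, r2_reduced = ols_r2(np.delete(X, col, axis=1), y)
    return r2_full - r2_reduced

def bootstrap_ci(X, y, col, n_boot=10_000, seed=0):
    """Case-resampling percentile bootstrap: resample rows with replacement,
    refit, and return the 0.025 and 0.975 quantiles of coefficient `col`."""
    rng = np.random.default_rng(seed)
    n = len(y)
    estimates = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)
        beta, _ = ols_r2(X[idx], y[idx])
        estimates[b] = beta[col]
    lower, upper = np.quantile(estimates, [0.025, 0.975])
    return lower, upper

# Synthetic illustration only; these are NOT the study's data.
rng = np.random.default_rng(1)
n = 167
word_class = rng.choice(["adjective", "noun", "verb"], size=n)
log_freq = rng.normal(size=n)
rate = 5.0 - 0.8 * log_freq + rng.normal(size=n)

wc_cols, wc_levels = dummy_code(list(word_class), reference="adjective")
X = np.column_stack([np.ones(n), log_freq, wc_cols])  # intercept, LogFrequency, noun, verb
beta, r2 = ols_r2(X, rate)
lo, hi = bootstrap_ci(X, rate, col=1)
print(wc_levels, beta.round(2), round(r2, 3), round(delta_r2(X, rate, 1), 3), (lo, hi))
```

Because adjective is the omitted level, the noun and verb coefficients read as differences from adjectives, which is why the table reports two word-class rows rather than three.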
Fig 2. Scatterplots of the relationship between the rate of lexical replacement and LogSenses.
The left-hand panel shows the relationship between Lexical Replacement Rate and LogSenses for three different levels of synonyms (Low: 0–0.65 mean synonyms; Medium: 0.65–1.1 mean synonyms; High: 1.1–2.65 mean synonyms). The right-hand panel shows the relationship between Lexical Replacement Rate and LogSenses when the average number of synonyms is not controlled for. Shaded areas represent 95% confidence intervals of the slopes of the regression lines.
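The left-hand panel splits words into three synonym groups and fits a separate regression line of Lexical Replacement Rate on LogSenses within each group. A minimal NumPy sketch of that stratification follows. The cut-offs come from the caption; the rule that a value exactly on a boundary goes to the upper group, the helper names and the synthetic data are assumptions.

```python
import numpy as np

# Group boundaries from the Fig 2 caption. Values exactly on a boundary go to
# the upper group (np.digitize default); this tie rule is an assumption.
EDGES = [0.65, 1.1]
LABELS = np.array(["Low", "Medium", "High"])

def synonym_level(mean_synonyms):
    """Map mean synonym counts to the Low/Medium/High groups."""
    return LABELS[np.digitize(mean_synonyms, EDGES)]

def slopes_by_level(log_senses, rate, mean_synonyms):
    """Slope of a least-squares line of rate on LogSenses within each group."""
    levels = synonym_level(mean_synonyms)
    slopes = {}
    for label in LABELS:
        mask = levels == label
        if mask.sum() >= 2:  # a line needs at least two points
            slope, _intercept = np.polyfit(log_senses[mask], rate[mask], 1)
            slopes[str(label)] = slope
    return slopes

# Synthetic data only (not the study's measurements).
rng = np.random.default_rng(2)
synonyms = rng.uniform(0, 2.65, size=167)
log_senses = rng.normal(size=167)
rate = 5 + 1.8 * synonyms - 0.5 * log_senses + rng.normal(scale=0.5, size=167)
print(slopes_by_level(log_senses, rate, synonyms))
```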
Fig 3. Scatterplots of the relationships between the rate of lexical replacement and (A) residualized LogFrequency, (B) residualized Synonyms, (C) residualized Imageability and (D) residualized Senses.
Shaded areas represent 95% confidence intervals of the slopes of the regression lines.
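Suppose "residualized" means that each predictor was regressed on the other predictors and only its residuals were kept, as in an added-variable (partial regression) plot. Under that assumption, the slope of each panel equals that predictor's coefficient in the full multiple regression, by the Frisch-Waugh-Lovell theorem. The NumPy sketch below checks this on synthetic data; the variable names and data are hypothetical.

```python
import numpy as np

def residualize(x, others):
    """Residuals of x after a least-squares regression on `others` plus an intercept."""
    Z = np.column_stack([np.ones(len(x)), others])
    coef, *_ = np.linalg.lstsq(Z, x, rcond=None)
    return x - Z @ coef

# Synthetic, deliberately correlated predictors (not the study's data).
rng = np.random.default_rng(3)
n = 167
synonyms = rng.normal(size=n)
log_freq = 0.4 * synonyms + rng.normal(size=n)
rate = 5 - 0.8 * log_freq + 1.8 * synonyms + rng.normal(size=n)

# Coefficient of log_freq in the full multiple regression.
X = np.column_stack([np.ones(n), log_freq, synonyms])
beta, *_ = np.linalg.lstsq(X, rate, rcond=None)

# Slope of rate on residualized log_freq (a panel-A style plot).
# The residuals have mean ~0, so the slope with an intercept reduces to this ratio.
r_freq = residualize(log_freq, synonyms)
slope = (r_freq @ rate) / (r_freq @ r_freq)
print(beta[1], slope)  # the two numbers agree up to floating-point error
```

Plotting against residualized predictors therefore shows each predictor's unique contribution, which a raw scatterplot can hide when predictors are correlated.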