John L A Huisman, Karlien Franco, Roeland van Hout.
Abstract
Dialectometry studies patterns of linguistic variation through correlations between geographic and aggregate measures of linguistic distance. However, aggregating smooths out the role of semantic characteristics, which have been shown to affect the distribution of lexical variants across dialects. Furthermore, although dialectologists have always been well aware of other variables such as population size, isolation and socio-demographic features, these characteristics are generally only included in dialectometric analyses afterwards, to interpret the results, rather than as explanatory variables. This study showcases linear mixed-effects modelling as a method that can incorporate both language-external and language-internal factors as explanatory variables of linguistic variation in the Limburgish dialect continuum in Belgium and the Netherlands. Covering four semantic domains that vary in their degree of basic vs. cultural vocabulary and their degree of standardization, the study models linguistic distances using a combination of external (e.g., geographic distance, separation by water, population size) and internal (semantic density, salience) sources of variation. The results show that both external and internal factors contribute to variation, but that the exact role of each individual factor differs across semantic domains. These findings highlight the need to incorporate language-internal factors in studies on variation, as well as the need for more comprehensive analysis tools to help better understand its patterns.
Keywords: computational sociolinguistics; dialectometry; lexical variation; limburg; mixed-effects regression; semantic variation; spatial analysis
Year: 2021 PMID: 34151254 PMCID: PMC8211982 DOI: 10.3389/frai.2021.668035
Source DB: PubMed Journal: Front Artif Intell ISSN: 2624-8212
FIGURE 1. The Limburgish dialect region (in green). The brown line within the green area is the boundary between the Dutch (on the right) and the Belgian (on the left) province of Limburg. The purple line in the south of the dialect region shows the boundary between the Dutch-speaking (northern) and French-speaking (southern) part of Belgium. The other provinces of the Netherlands and Belgium are colored light-blue and pink.
FIGURE 2. Map of locations included in the database, with their classification into one of six dialect areas.
Data per semantic domain in the database.
| Semantic domain | Number of locations | Number of concepts | Subsections at depth 1 | Subsections at depth 2 | Subsections at maximum depth | Ratio of multi-word concepts | Mean concept length | Median concept length |
|---|---|---|---|---|---|---|---|---|
| Church and religion | 114 | 592 | 5 | 19 | 35 | 0.30 | 12.9 | 11 |
| Clothing and personal hygiene | 188 | 323 | 5 | 18 | 43 | 0.48 | 13.8 | 12 |
| Human body | 179 | 180 | 3 | 14 | 18 | 0.32 | 10.7 | 9 |
| Society and education | 175 | 462 | 4 | 19 | 55 | 0.21 | 9.9 | 9 |
FIGURE 3. Locations available per semantic domain (in red). Grey circles indicate that no data is available for a particular semantic field.
Examples of concepts in the subsections of the Church and religion domain.
| Subsection | Examples |
|---|---|
| In and around the church building | Church, leaded window, credence (table), church bell, tombstone |
| Liturgy and devotion | Early mass, offertory, holy water, rosary, to pray |
| Catholic holy days and rites | Patron saint, advent, good friday, confirmation, confession |
| Catholic belief and faith | Catechism, devil, baby jesus, fasting |
| The clergy | Pope, franciscan, monk, dean |
Examples of concepts in the subsections of the Clothing and personal hygiene domain.
| Subsection | Examples |
|---|---|
| Clothing | Smock, undervest, to fit, women’s coat |
| Headgear | Beret, hat, pom-pom of a bonnet, bowler hat |
| Foot- and legwear | Barefoot, women’s shoe with medium or high heel, clog, sock |
| Jewelry and ornaments | Watch, medallion, jewelry, sequin |
| Personal hygiene | To shower, to brush teeth, toothpick, razor |
Examples of concepts in the subsections of the Human body domain.
| Subsection | Examples |
|---|---|
| The body and the body parts | Short, curly hair, eye, navel |
| Organs and their functions | To breathe, stomach, kidney, diaphragm |
| The senses | To see, to wink, flavor, to listen attentively |
Examples of concepts in the subsections of the Society and education domain.
| Subsection | Examples |
|---|---|
| Man and society | Company, to peddle, night owl, market booth, (Dutch) guilder, impolite, fairy tale, to complain |
| Societal organization | Mayor, liberal, charge, perjury, soldier, to fall in battle |
| Transportation | Pedestrian, women’s bicycle, train, steamboat, airplane, to travel |
| School and education | Boarding school, teacher, ruler, report card |
FIGURE 4. Distribution of linguistic distances across four semantic domains.
FIGURE 5. Linguistic distance over geographic distance across four semantic domains.
Multiple regression over distance matrices (MRM) results for the Church and religion domain.
| | Estimate | p |
|---|---|---|
| Intercept | 0.820 | <0.001 |
| Log geographic distance | 0.010 | <0.001 |
| Dialect area | 0.003 | 0.009 |
| National border | 0.033 | <0.001 |
| Border * distance | −0.003 | 0.110 |
| Separation by water | −0.023 | 0.001 |
| Water * distance | 0.008 | <0.001 |
| Log population difference | 0.001 | 0.266 |
R² = 0.329.
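MRM usually assesses significance by permutation: the rows and columns of the response distance matrix are shuffled together and the regression is refitted each time. The following is a minimal single-predictor sketch of that idea in Python, not the authors' implementation. The toy coordinates, the function names, and the restriction to one predictor are all assumptions made for illustration; the models reported here include several predictors.

```python
import random

def lower_triangle(m):
    """Unfold the strict lower triangle of a square matrix into a list."""
    return [m[i][j] for i in range(len(m)) for j in range(i)]

def ols_slope(x, y):
    """Slope of the least-squares line y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def mrm_slope_test(ling, geo, n_perm=999, seed=0):
    """Permutation p-value for the slope of linguistic on geographic distance."""
    rng = random.Random(seed)
    x = lower_triangle(geo)
    observed = ols_slope(x, lower_triangle(ling))
    n = len(ling)
    hits = 0
    for _ in range(n_perm):
        order = list(range(n))
        rng.shuffle(order)
        # Shuffle location labels: rows and columns move together.
        permuted = [[ling[order[i]][order[j]] for j in range(n)] for i in range(n)]
        if abs(ols_slope(x, lower_triangle(permuted))) >= abs(observed):
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

# Hypothetical toy data: six locations on a line (not from the study).
coords = [0.0, 1.0, 2.5, 4.0, 7.0, 9.0]
geo = [[abs(a - b) for b in coords] for a in coords]
ling = [[0.1 * d for d in row] for row in geo]  # linguistic = 0.1 * geographic

slope, p_value = mrm_slope_test(ling, geo)
```

Permuting whole locations rather than individual pairs keeps the dependence between distances that share a location intact, which is why a plain regression p-value is not used for distance matrices.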
Multiple regression over distance matrices (MRM) results for the Clothing and personal hygiene domain.
| | Estimate | p |
|---|---|---|
| Intercept | 0.698 | <0.001 |
| Log geographic distance | 0.039 | <0.001 |
| Dialect area | 0.004 | 0.004 |
| National border | 0.071 | <0.001 |
| Border * distance | −0.006 | 0.010 |
| Separation by water | −0.022 | 0.003 |
| Water * distance | 0.005 | 0.015 |
| Log population difference | 0.000 | 0.987 |
R² = 0.453.
Multiple regression over distance matrices (MRM) results for the Human body domain.
| | Estimate | p |
|---|---|---|
| Intercept | 0.769 | <0.001 |
| Log geographic distance | 0.021 | <0.001 |
| Dialect area | 0.004 | 0.006 |
| National border | 0.032 | <0.001 |
| Border * distance | −0.002 | 0.510 |
| Separation by water | 0.000 | 0.957 |
| Water * distance | 0.003 | 0.178 |
| Log population difference | −0.001 | 0.393 |
R² = 0.264.
Multiple regression over distance matrices (MRM) results for the Society and education domain.
| | Estimate | p |
|---|---|---|
| Intercept | 0.795 | 0.997 |
| Log geographic distance | 0.016 | 0.000 |
| Dialect area | 0.002 | 0.407 |
| National border | 0.011 | 0.374 |
| Border * distance | 0.004 | 0.197 |
| Separation by water | −0.008 | 0.423 |
| Water * distance | 0.003 | 0.216 |
| Log population difference | −0.001 | 0.263 |
R² = 0.087.
Significant explanatory factors (+ for positive coefficients; − for negative coefficients) and R²-values across the four semantic domains based on the multiple regression over distance matrices (MRM) analyses.
| | Church and religion | Clothing and personal hygiene | Human body | Society and education |
|---|---|---|---|---|
| Log geographic distance | + | + | + | + |
| Dialect area | + | + | + | |
| National border | + | + | + | |
| Border * distance | | − | | |
| Separation by water | − | − | | |
| Water * distance | + | + | | |
| Log population difference | | | | |
| MRM R² | 0.33 | 0.45 | 0.26 | 0.09 |
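The summary can be derived mechanically from the per-domain MRM tables. The sketch below does so for three of the predictors, assuming a significance threshold of 0.05 (an assumption that is consistent with the tabulated signs but not stated in the tables). Values are copied from the tables above; `<0.001` is stored as 0.0009 purely as a stand-in, and the ASCII `-` replaces the typographic minus sign.

```python
ALPHA = 0.05  # assumed significance threshold

# (estimate, p) per domain, copied from the MRM tables.
# "<0.001" is represented as 0.0009 (a stand-in, not a reported value).
mrm = {
    "Church and religion": {
        "Border * distance": (-0.003, 0.110),
        "Separation by water": (-0.023, 0.001),
        "Water * distance": (0.008, 0.0009),
    },
    "Clothing and personal hygiene": {
        "Border * distance": (-0.006, 0.010),
        "Separation by water": (-0.022, 0.003),
        "Water * distance": (0.005, 0.015),
    },
    "Human body": {
        "Border * distance": (-0.002, 0.510),
        "Separation by water": (0.000, 0.957),
        "Water * distance": (0.003, 0.178),
    },
    "Society and education": {
        "Border * distance": (0.004, 0.197),
        "Separation by water": (-0.008, 0.423),
        "Water * distance": (0.003, 0.216),
    },
}

def sign_if_significant(estimate, p, alpha=ALPHA):
    """Return '+' or '-' for a significant coefficient, '' otherwise."""
    if p >= alpha or estimate == 0:
        return ""
    return "+" if estimate > 0 else "-"

predictors = ["Border * distance", "Separation by water", "Water * distance"]
summary = {
    pred: [sign_if_significant(*mrm[domain][pred]) for domain in mrm]
    for pred in predictors
}
```

Each list in `summary` follows the column order of the table: Church and religion, Clothing and personal hygiene, Human body, Society and education.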
Linear mixed-effect modelling results for the Church and religion domain, showing beta coefficients, standard errors, t-values and significance levels.
| | β | SE | t | p |
|---|---|---|---|---|
| Intercept | −0.015 | 0.045 | 0.34 | 0.731 |
| Log geographic distance | 0.299 | 0.009 | 34.39 | <0.001 |
| Dialect area | 0.022 | 0.007 | 2.92 | 0.003 |
| National border | 0.352 | 0.008 | 42.99 | <0.001 |
| Border * distance | 0.034 | 0.008 | 4.06 | <0.001 |
| Separation by water | 0.097 | 0.007 | 13.72 | <0.001 |
| Water * distance | 0.035 | 0.008 | 4.46 | <0.001 |
| Log population difference | 0.012 | 0.009 | 1.38 | 0.167 |
Conditional R² = 0.538, Marginal R² = 0.314.
Linear mixed-effect modelling results for the Clothing and personal hygiene domain, showing beta coefficients, standard errors, t-values and significance levels.
| | β | SE | t | p |
|---|---|---|---|---|
| Intercept | −0.036 | 0.027 | 1.33 | 0.183 |
| Log geographic distance | 0.460 | 0.005 | 86.59 | <0.001 |
| Dialect area | −0.014 | 0.004 | 3.22 | 0.001 |
| National border | 0.346 | 0.005 | 72.43 | <0.001 |
| Border * distance | 0.136 | 0.005 | 25.60 | <0.001 |
| Separation by water | 0.011 | 0.005 | 2.24 | 0.025 |
| Water * distance | −0.020 | 0.005 | 3.99 | <0.001 |
| Log population difference | −0.003 | 0.006 | 0.46 | 0.645 |
Conditional R² = 0.546, Marginal R² = 0.420.
Linear mixed-effect modeling results for the Human body domain, showing beta coefficients, standard errors, t-values and significance levels.
| | β | SE | t | p |
|---|---|---|---|---|
| Intercept | −0.015 | 0.037 | 0.41 | 0.681 |
| Log geographic distance | 0.383 | 0.006 | 64.31 | <0.001 |
| Dialect area | 0.007 | 0.005 | 1.43 | 0.154 |
| National border | 0.241 | 0.005 | 47.00 | <0.001 |
| Border * distance | 0.073 | 0.005 | 13.25 | <0.001 |
| Separation by water | 0.097 | 0.005 | 2.68 | <0.001 |
| Water * distance | −0.012 | 0.005 | 2.42 | 0.016 |
| Log population difference | 0.011 | 0.006 | 1.86 | 0.063 |
Conditional R² = 0.510, Marginal R² = 0.286.
Linear mixed-effect modeling results for the Society and education domain, showing beta coefficients, standard errors, t-values and significance levels.
| | β | SE | t | p |
|---|---|---|---|---|
| Intercept | −0.007 | 0.028 | 0.27 | 0.791 |
| Log geographic distance | 0.215 | 0.007 | 28.92 | <0.001 |
| Dialect area | 0.008 | 0.006 | 1.33 | 0.183 |
| National border | 0.287 | 0.007 | 41.53 | <0.001 |
| Border * distance | 0.024 | 0.007 | 3.24 | 0.001 |
| Separation by water | 0.007 | 0.006 | 1.08 | 0.280 |
| Water * distance | 0.004 | 0.007 | 0.64 | 0.522 |
| Log population difference | 0.028 | 0.008 | 3.67 | <0.001 |
Conditional R² = 0.292, Marginal R² = 0.167.
Significant predictors across the four semantic domains (strongest predictor highlighted), and conditional and marginal R²-values for the linear mixed-effect models.
| | Church and religion | Clothing and personal hygiene | Human body | Society and education |
|---|---|---|---|---|
| Log geographic distance | + | + | + | + |
| Dialect area | + | − | | |
| National border | + | + | + | + |
| Border * distance | + | + | + | + |
| Separation by water | + | + | + | |
| Water * distance | + | − | − | |
| Log population difference | | | | + |
| Conditional R² | 0.54 | 0.55 | 0.51 | 0.29 |
| Marginal R² | 0.31 | 0.42 | 0.29 | 0.17 |
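Under the usual definitions, marginal R² is the variance explained by the fixed effects alone, and conditional R² is the variance explained by fixed and random effects together. Their difference therefore gives a rough indication of how much the random effects contribute in each domain. A small sketch using the three-decimal values reported under the per-domain tables (variable names are illustrative):

```python
# (conditional R², marginal R²) as reported under each per-domain table.
r2 = {
    "Church and religion": (0.538, 0.314),
    "Clothing and personal hygiene": (0.546, 0.420),
    "Human body": (0.510, 0.286),
    "Society and education": (0.292, 0.167),
}

# Share of variance attributed to the random effects:
# conditional R² minus marginal R².
random_share = {
    domain: round(conditional - marginal, 3)
    for domain, (conditional, marginal) in r2.items()
}

for domain, share in random_share.items():
    print(f"{domain}: {share:.3f}")
```

On these figures the gap is about 0.22 for Church and religion and Human body, and about 0.13 for the other two domains.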
FIGURE 6. Map showing the Limburgish area with values of the mean random intercept for each location (A), and distribution of random intercepts across six dialect areas (B).
Overview of models including language-internal factors, showing beta coefficients, Akaike information criterion values compared to the baseline model with external factors only, χ²-values, and significance levels.
| | β | AIC | χ² | p |
|---|---|---|---|---|
| External factors only | | 182,438 | | |
| Domain (nominal) | | | | |
| | 0.118 | 181,087 | 1,383 | <0.001 |
| | 0.079 | | | |
| | 0.036 | | | |
| All internal factors merged | 0.069 | 181,920 | 528.7 | <0.001 |
| Subsections at one level of depth | 0.068 | 181,928 | 521.5 | <0.001 |
| Subsections at two levels depth | 0.020 | 182,409 | 42.24 | <0.001 |
| Subsections at maximum depth | −0.024 | 182,388 | 67.13 | <0.001 |
| Total number of concepts | 0.035 | 182,331 | 129.1 | <0.001 |
| Concepts at one level of depth | −0.006 | 182,452 | 4.31 | 0.038 |
| Concepts at two levels depth | 0.039 | 182,297 | 156.5 | <0.001 |
| Concepts at maximum depth | 0.076 | 181,834 | 618.1 | <0.001 |
| Ratio of multi-word concepts | 0.057 | 182,086 | 359.3 | <0.001 |
| Mean concept length | 0.085 | 181,623 | 828.0 | <0.001 |
| Median concept length | 0.078 | 181,764 | 686.5 | <0.001 |
df for Domain as nominal variable = 3; all other df’s = 1.
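Two quantities in this table can be recomputed from the listed values: the AIC difference relative to the baseline model, and, for the rows with df = 1, the p-value implied by the χ² statistic. The sketch below does this for a few rows, reading the AIC column as six-digit values (for example 182438 for the baseline). The helper name is illustrative, and the closed form via `math.erfc` holds only for one degree of freedom.

```python
import math

BASELINE_AIC = 182438  # "External factors only" row

# (AIC, chi-square) for a few df = 1 rows, copied from the table.
models = {
    "Mean concept length": (181623, 828.0),
    "Concepts at maximum depth": (181834, 618.1),
    "Subsections at two levels depth": (182409, 42.24),
    "Concepts at one level of depth": (182452, 4.31),
}

def chi2_p_df1(statistic):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    # For df = 1, P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(statistic / 2))

# Negative values mean a lower AIC than the baseline model.
delta_aic = {name: aic - BASELINE_AIC for name, (aic, _) in models.items()}
p_values = {name: chi2_p_df1(chi2) for name, (_, chi2) in models.items()}
```

For χ² = 4.31 the function gives roughly 0.038, which agrees with the p-value listed for "Concepts at one level of depth".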