| Literature DB >> 34276486 |
Klára Jágrová (1), Michael Hedderich (1, 2), Marius Mosbach (1, 2), Tania Avgustinova (1, 3), Dietrich Klakow (1, 2)
Abstract
This contribution seeks to provide a rational probabilistic explanation for the intelligibility of words in a genetically related language that is unknown to the reader, a phenomenon referred to as intercomprehension. In this research domain, linguistic distance, among other factors, has been shown to correlate well with the mutual intelligibility of individual words. However, the role of context in the intelligibility of target words in sentences has been the subject of very few studies. To address this, we analyze data from web-based experiments in which Czech (CS) respondents were asked to translate highly predictable target words in the final position of Polish (PL) sentences. We compare the correlations between target word intelligibility and data from 3-gram language models (3-g LMs) with the correlations between intelligibility and data obtained from context-aware LMs. More specifically, we evaluate two context-aware LM architectures: Long Short-Term Memory networks (LSTMs), which can, in theory, take arbitrarily long-distance dependencies into account, and Transformer-based LMs, which can access the whole input sequence at the same time. We investigate how their use of context affects surprisal and its correlation with intelligibility.
Keywords: Czech; Long Short-Term Memory; Polish; context-aware language models; intercomprehension; predictive context; surprisal; transformer
Year: 2021 PMID: 34276486 PMCID: PMC8278517 DOI: 10.3389/fpsyg.2021.662277
Source DB: PubMed Journal: Front Psychol ISSN: 1664-1078
Figure 1. Example: the predictability of the PL target word głosu “voice [genitive]” is well reflected by its low surprisal score under the 3-g LM.
Figure 2. Example: the predictability of the PL target word gwoździa “nail [genitive]” is not well reflected by the 3-g language model (LM): the surprisal curve rises at the target word.
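Figures 1 and 2 rest on word-level surprisal, the negative log probability of a word given its preceding context. The following is a minimal sketch of how such a score can be read off a 3-g LM, assuming whitespace-tokenized input, surprisal in bits, and add-one (Laplace) smoothing. The function names and the toy English corpus are hypothetical and are not the study's models or data; practical n-gram toolkits typically use better smoothing, such as Kneser-Ney.

```python
import math
from collections import Counter
from typing import Iterable, List, Set, Tuple

BOS, EOS = "<s>", "</s>"  # sentence padding symbols (an assumption of this sketch)


def train_trigram(sentences: Iterable[List[str]]) -> Tuple[Counter, Counter, Set[str]]:
    """Count trigrams and their two-word histories over padded sentences."""
    trigrams: Counter = Counter()
    histories: Counter = Counter()
    vocab: Set[str] = set()
    for sent in sentences:
        toks = [BOS, BOS] + sent + [EOS]
        vocab.update(toks)
        for i in range(2, len(toks)):
            trigrams[(toks[i - 2], toks[i - 1], toks[i])] += 1
            histories[(toks[i - 2], toks[i - 1])] += 1
    return trigrams, histories, vocab


def surprisals(sentence: List[str], trigrams: Counter, histories: Counter,
               vocab: Set[str]) -> List[float]:
    """Per-word surprisal -log2 P(w_i | w_{i-2}, w_{i-1}) with add-one smoothing."""
    v = len(vocab)
    toks = [BOS, BOS] + sentence
    result = []
    for i in range(2, len(toks)):
        count = trigrams[(toks[i - 2], toks[i - 1], toks[i])]
        hist = histories[(toks[i - 2], toks[i - 1])]
        result.append(-math.log2((count + 1) / (hist + v)))
    return result


# Hypothetical toy corpus: "voice" follows "raise your" more often than "hand".
corpus = [["raise", "your", "voice"], ["raise", "your", "voice"], ["raise", "your", "hand"]]
model = train_trigram(corpus)
print(surprisals(["raise", "your", "voice"], *model)[-1])  # log2(3), about 1.585
print(surprisals(["raise", "your", "hand"], *model)[-1])   # log2(4.5), about 2.170
```

Because a 3-g LM conditions only on the two preceding words, any predictive cue that lies further back in the sentence is invisible to it, which is one plausible reason for curves like the one in Figure 2.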
The perplexity of the language models on the CS validation corpus.
| LSTM | 17.85 | 38.80 |
| Transformer | 15.59 | 32.67 |
| TransformerXL | | |
The lowest perplexity values are marked bold.
The perplexity of the LMs on the PL validation corpus.
| LSTM | 49.83 | 125.5 |
| Transformer | 31.12 | 70.11 |
| TransformerXL | | |
| ULMFiT-SP (Czapla et al.) | – | 117.67 |
| ULMFiT-SP (Czapla et al.) | – | 95.0 |
The lowest perplexity values are marked bold.
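Perplexity, as reported in the two tables above, is the exponentiated mean surprisal per token over a validation corpus: 2 raised to the average of -log2 P(token | context). The sketch below assumes the model's probability for each token is already available; the probabilities in the example are hypothetical. Since the mean is taken per token, perplexities computed over different token units (for example, subwords vs. whole words) are not directly comparable.

```python
import math
from typing import Sequence


def perplexity(token_probs: Sequence[float]) -> float:
    """Perplexity of a text from the model probability of each of its tokens.

    PPL = 2 ** (mean surprisal in bits), which equals exp(mean negative
    natural-log likelihood); lower values mean the text was less surprising.
    """
    if not token_probs:
        raise ValueError("need at least one token probability")
    mean_surprisal = sum(-math.log2(p) for p in token_probs) / len(token_probs)
    return 2.0 ** mean_surprisal


# Hypothetical per-token probabilities, not values from the study.
print(perplexity([0.5, 0.25, 0.125]))  # mean surprisal 2 bits, so 4.0
print(perplexity([0.1] * 20))          # uniform over 10 outcomes, about 10
```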
Traditionally calculated Levenshtein distance vs. pronunciation-based distance.
| | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
| PL | | | s | i | ł | o | w | n | i | ę |
| CS | p | o | s | i | l | o | v | n | u | |
| Distance | 1 | 1 | 0 | 0 | 0.5 | 0 | 1 | 0 | 1 | 1 |
| Normalized distance | 55% | | | | | | | | | |
| PL | | | s | i | ł | o | w | n | i | ę |
| CS | p | o | s | i | l | o | v | n | u | |
| Distance | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 |
| Normalized distance | 40% | | | | | | | | | |
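The normalized distances in this table are the summed per-position costs divided by the number of aligned positions: (1 + 1 + 0 + 0 + 0.5 + 0 + 1 + 0 + 1 + 1) / 10 = 5.5 / 10 = 55% for the first block and 4 / 10 = 40% for the second. The sketch below reproduces both values with a Levenshtein distance that uses weighted substitution costs and normalizes by alignment length. It assumes that insertions and deletions cost 1, identical characters cost 0, and any other substitution costs 1 unless listed in a cost table; the two cost tables are hypothetical, chosen only so that they match the two blocks of the table, and are not the study's actual weighting scheme. Ties between equally cheap alignments are broken toward the shorter alignment, an arbitrary choice that does not affect this word pair.

```python
from typing import Dict, List, Tuple

SubCosts = Dict[Tuple[str, str], float]


def weighted_levenshtein(a: str, b: str, sub_costs: SubCosts) -> Tuple[float, int]:
    """Return (minimum edit cost, alignment length) for turning a into b."""

    def sub(x: str, y: str) -> float:
        # Identical characters are free; listed pairs use their weight; else 1.
        return 0.0 if x == y else sub_costs.get((x, y), 1.0)

    n, m = len(a), len(b)
    # dp[i][j] holds (cost, alignment length) for the prefixes a[:i] and b[:j].
    dp: List[List[Tuple[float, int]]] = [[(0.0, 0)] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = (float(i), i)  # delete every character of a[:i]
    for j in range(1, m + 1):
        dp[0][j] = (float(j), j)  # insert every character of b[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost_sub, len_sub = dp[i - 1][j - 1]
            cost_del, len_del = dp[i - 1][j]
            cost_ins, len_ins = dp[i][j - 1]
            # Tuples compare by cost first, then by length (shorter wins ties).
            dp[i][j] = min(
                (cost_sub + sub(a[i - 1], b[j - 1]), len_sub + 1),
                (cost_del + 1.0, len_del + 1),
                (cost_ins + 1.0, len_ins + 1),
            )
    return dp[n][m]


def normalized_distance(a: str, b: str, sub_costs: SubCosts) -> float:
    """Edit cost divided by the number of aligned positions."""
    cost, length = weighted_levenshtein(a, b, sub_costs)
    return cost / length if length else 0.0


# Hypothetical weights that reproduce the two blocks of the table above.
WEIGHTS_BLOCK_1: SubCosts = {("ł", "l"): 0.5}
WEIGHTS_BLOCK_2: SubCosts = {("ł", "l"): 0.0, ("w", "v"): 0.0}

print(normalized_distance("siłownię", "posilovnu", WEIGHTS_BLOCK_1))  # 0.55
print(normalized_distance("siłownię", "posilovnu", WEIGHTS_BLOCK_2))  # 0.4
```

Normalizing by alignment length (10 here) rather than by the length of the longer word (9) is what yields exactly 55% and 40%.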
Correlations of the context-aware LMs with intelligibility (all sentences).
| Transformer CS | target word | |
| LSTM CS | target word | |
| TransformerXL CS | target word | |
| 3-g PL (Jágrová and Avgustinova) | sentence (sum) | |
| Reader Model | target word | |
| 3-g CS (Jágrová and Avgustinova) | target word | |
| 3-g PL (Jágrová and Avgustinova) | target word | |
| TransformerXL PL | target word | |
| LSTM PL | target word | |
| Transformer PL | target word | |
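The sketch below shows how a correlation between target-word intelligibility and surprisal can be computed, assuming Pearson's r (the coefficient used in the study is not stated in this excerpt). The function name and all example values are hypothetical and are not data from the study.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-target values, not data from the study:
intelligibility = [0.9, 0.7, 0.4, 0.2]  # share of correct translations
surprisal_bits = [2.0, 3.5, 6.0, 9.0]   # target-word surprisal from some LM
print(round(pearson_r(intelligibility, surprisal_bits), 3))  # -0.988
```

A negative r means that targets with higher surprisal tend to be translated correctly less often. For real analyses, `scipy.stats.pearsonr` returns the same coefficient together with a p-value.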
Figure 3. Intelligibility of target words and surprisal from the CS Transformer model.
Figure 4. Intelligibility of target words (including filtered subsets) and surprisal from the CS LSTM model.
Figure 5. Intelligibility of target words and surprisal from the CS TransformerXL model.
Figure 6. Relation between target word intelligibility and target word distance.
Figure 7. Relation between target word intelligibility and the number of non-cognates per sentence.
Surprisal scores of example sentences 1–7 under the 3-g vs. the context-aware LMs (surprisal values below the mean of the whole dataset are marked in bold).
| 1 | 26.76 | 40.47 | |||||
| 2 | 5.88 | 6.16 | 36.19 | 55.28 | 10.75 | 11.90 | |
| 3 | |||||||
| 4 | 29.28 | 8.77 | 7.67 | ||||
| 5 | 3.86 | 4.05 | 30.42 | ||||
| 6 | 3.52 | 29.03 | |||||
| 7 | 4.15 | 5.58 | 26.52 | 57.75 | 8.31 | 11.25 | |
| Mean surprisal (all) | 3.14 | 3.85 | 24.74 | 38.97 | 7.59 | 6.94 | |
| SE (all) | 1.76 | 1.96 | 5.91 | 22.73 | 4.34 | 4.25 | |
| CV (%) | 56.12 | 50.96 | 23.89 | 58.34 | 57.12 | 61.29 |
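The CV row appears to follow from the two rows above it as the ratio of the SE row to the mean row, in percent. For the first column:

```latex
\mathrm{CV} = 100 \cdot \frac{\mathrm{SE}}{\overline{s}} = 100 \cdot \frac{1.76}{3.14} \approx 56.1\,\%
```

The other columns check out the same way (e.g., 100 × 5.91 / 24.74 ≈ 23.89), with small deviations such as 56.05 vs. 56.12 attributable to rounding of the tabled inputs.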
Figure 8. Subword-level surprisal of the CS reader model when applied to CS testamentu (A) and PL testamencie (B) (both “testament [locative]”). From the perspective of a CS reader, the model shows a rise in surprisal at the unexpected PL subword cie, unlike at the CS subword units.
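Figure 8 plots surprisal per subword unit. By the chain rule, a model's probability for a word split into subwords is the product of the conditional probabilities of its subwords, so the word's surprisal is the sum of its subword surprisals; summing word surprisals over a sentence in the same way is one way to obtain sentence-level scores such as the "sentence (sum)" row of the correlation table. The segmentations and probabilities below are hypothetical, chosen only to mimic the rise at cie, and are not output of the reader model.

```python
import math
from typing import List, Sequence, Tuple


def subword_surprisals(probs: Sequence[float]) -> List[float]:
    """Surprisal in bits of each subword, given P(subword | preceding context)."""
    return [-math.log2(p) for p in probs]


def word_surprisal(probs: Sequence[float]) -> float:
    """Chain rule: P(word) is the product of its subword probabilities,
    so the word's surprisal is the sum of its subword surprisals."""
    return sum(subword_surprisals(probs))


# Hypothetical segmentations and probabilities, for illustration only.
cs_word: List[Tuple[str, float]] = [("testamen", 0.5), ("tu", 0.5)]
pl_word: List[Tuple[str, float]] = [("testamen", 0.5), ("cie", 0.0625)]

for name, pieces in (("CS testamentu", cs_word), ("PL testamencie", pl_word)):
    probs = [p for _, p in pieces]
    print(name, subword_surprisals(probs), word_surprisal(probs))
# CS testamentu [1.0, 1.0] 2.0
# PL testamencie [1.0, 4.0] 5.0
```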