Sophia Voigtmann, Augustin Speyer.
Abstract
This paper investigates whether Information Density (ID) correlates with the extraposition of relative clauses (RCs) in Early New High German. Because surprisal is linked to perceptual difficulty, frequent word combinations with low surprisal values place a lower load on working memory than rare combinations with high surprisal values. To aid text comprehension, producers therefore distribute information as evenly as possible across a discourse. On this view, extraposed RCs are expected to have higher surprisal values than embedded RCs. We test this prediction on RCs taken from scientific texts from the 17th to the 19th century. We built a corpus of tokenized, lemmatized, and normalized medical texts from the 17th to the 19th century, manually classified the RC variants, and trained a skip-gram language model to compute the 2-skip-bigram surprisal of every word in the relevant sentences. A logistic regression over the summed surprisal values yields a significant result, indicating a correlation between surprisal and extraposition: in these periods, RCs are more likely to be extraposed when their total surprisal is high. The influence of surprisal also appears to be stable over time, as a comparison of the analyzed periods shows no significant change.
Keywords: Early New High German; corpus linguistics; extraposition; information density; relative clauses
Year: 2021 PMID: 34899450 PMCID: PMC8660694 DOI: 10.3389/fpsyg.2021.650969
Source DB: PubMed Journal: Front Psychol ISSN: 1664-1078
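The surprisal measure named in the abstract can be illustrated with a minimal sketch. It assumes a count-based estimator in which each word is predicted from up to three preceding tokens (that is, with at most two tokens skipped), add-one smoothing, and a plain average over the available contexts. The estimator, the smoothing, the padding symbol, and the toy sentences are all assumptions made for illustration; the excerpt does not specify how the authors' model was built.

```python
import math
from collections import Counter

BOS = "<s>"  # assumed sentence-start padding so the first word has a context


def skip_bigrams(tokens, k=2):
    """Yield (context, word) pairs with at most k tokens skipped in between."""
    for j in range(1, len(tokens)):
        for i in range(max(0, j - k - 1), j):
            yield tokens[i], tokens[j]


def train(sentences, k=2):
    """Count skip-bigrams over sentence-start-padded sentences."""
    pair_counts, context_counts, vocab = Counter(), Counter(), set()
    for sentence in sentences:
        vocab.update(sentence)
        for context, word in skip_bigrams([BOS] + sentence, k):
            pair_counts[(context, word)] += 1
            context_counts[context] += 1
    return pair_counts, context_counts, vocab


def word_surprisals(sentence, model, k=2):
    """Surprisal in bits per word: -log2 of the averaged smoothed P(word | context)."""
    pair_counts, context_counts, vocab = model
    v = len(vocab) + 1  # add-one smoothing with one extra slot for unseen words
    padded = [BOS] + sentence
    out = []
    for j in range(1, len(padded)):
        contexts = padded[max(0, j - k - 1):j]
        probs = [(pair_counts[(c, padded[j])] + 1) / (context_counts[c] + v)
                 for c in contexts]
        out.append(-math.log2(sum(probs) / len(probs)))
    return out


def cumulative_surprisal(sentence, model):
    return sum(word_surprisals(sentence, model))


def mean_surprisal(sentence, model):
    return cumulative_surprisal(sentence, model) / len(sentence)


# Invented toy corpus, not taken from the paper's data.
toy_corpus = [
    ["der", "arzt", "heilt", "den", "kranken"],
    ["der", "arzt", "heilt", "die", "wunde"],
    ["der", "arzt", "sieht", "die", "wunde"],
]
model = train(toy_corpus)
print(cumulative_surprisal(["der", "arzt", "heilt"], model))
print(mean_surprisal(["der", "arzt", "heilt"], model))
```

Summing the per-word values gives the cumulative surprisal of a clause, and dividing by its length gives the mean surprisal, the two quantities that appear as predictors in the tables.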
Corpus.
| Period | Author |
|---|---|
| 1650–1700 | Purmann |
| 1700–1750 | Unzer |
| 1750–1800 | Gall |
| 1800–1850 | Reil |
| 1850–1900 | Ludwig |
Language model.
| Period | | | | |
|---|---|---|---|---|
| 1650–1700 | 2,107,590 | 48,1693 | 8.93% | 240 (116, 48%) |
| 1700–1750 | 1,481,259 | 39,251 | 6% | 680 (363, 53%) |
| 1750–1800 | 2,572,263 | 26,325 | 14.72% | 375 (130, 35%) |
| 1800–1850 | 998,639 | 16,757 | 6.28% | 1,023 (573, 56%) |
| 1850–1900 | 1,270,561 | 29,060 | 12.13% | 925 (467, 50%) |
Figure 1. Cumulative surprisal.
Figure 2. Mean surprisal.
Most influential effects in the final generalized linear model (GLM) predicting position from surprisal values.
| Effect | z-value | p-value |
|---|---|---|
| Cumulative surprisal | −2.23 | <0.05* |
| Type | 1.74 | <0.1 |
| Cumulative surprisal: length | 2.8 | <0.001** |
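As a rough illustration of the kind of model summarized above, the following sketch fits a one-predictor logistic regression by gradient ascent using only the standard library. The data points are invented, the outcome is coded as 1 for extraposed and 0 for embedded by assumption (the coding used in the source tables is not stated in this excerpt), and the fitting routine is a generic one rather than the authors' procedure.

```python
import math


def fit_logistic(xs, ys, lr=0.1, steps=5000):
    """Fit P(y=1 | x) = sigmoid(b0 + b1 * z), z = standardized x, by gradient ascent."""
    n = len(xs)
    mean = sum(xs) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    zs = [(x - mean) / sd for x in xs]
    b0 = b1 = 0.0
    for _ in range(steps):
        preds = [1 / (1 + math.exp(-(b0 + b1 * z))) for z in zs]
        # Average gradient of the log-likelihood with respect to b0 and b1.
        g0 = sum(y - p for y, p in zip(ys, preds)) / n
        g1 = sum((y - p) * z for y, p, z in zip(ys, preds, zs)) / n
        b0 += lr * g0
        b1 += lr * g1

    def predict(x):
        return 1 / (1 + math.exp(-(b0 + b1 * (x - mean) / sd)))

    return b0, b1, predict


# Invented example values: cumulative surprisal per RC and its position
# (1 = extraposed, 0 = embedded; this coding is an assumption).
surprisal = [10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65]
extraposed = [0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1]

intercept, slope, predict = fit_logistic(surprisal, extraposed)
print(slope)
print(predict(15), predict(60))
```

In practice a statistics package would also report standard errors and test statistics for each effect; the sketch only shows how a fitted slope turns a surprisal value into a predicted probability of extraposition.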
Descriptive statistics.
| Period | | | | | |
|---|---|---|---|---|---|
| 1650–1700 | 240 (116, 48%) | 6.697/242.55 (42.066) | 3.19/4.36 (3.74) | 3/58 (10.42) | 1/10 (2.38) |
| 1700–1750 | 680 (363, 53%) | 6.7/177.55 (34.99) | 3.19/4.41 (3.67) | 2/50 (9.7) | 1/17 (2.08) |
| 1750–1800 | 375 (130, 35%) | 7.17/216.88 (41.45) | 3.291/4.260 (3.716) | 3/37 (11.14) | 1/7 (1.87) |
| 1800–1850 | 1,023 (573, 56%) | 6.36/211.69 (39.39) | 3.13/4.18 (3.57) | 2/58 (10.42) | 1/14 (1.78) |
| 1850–1900 | 925 (467, 50%) | 6.87/222.628 (40.10) | 2.91/4.09 (3.62) | 3/66 (11.58) | 1/11 (1.74) |
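The counts in the second column can be checked with a little arithmetic. The sketch below recomputes each percentage from the two counts given per period and adds up the totals. What the count in parentheses refers to is not stated in this excerpt, so the name `subset` is a neutral placeholder.

```python
# (total, subset) per period, copied from the second column of the table.
counts = {
    "1650–1700": (240, 116),
    "1700–1750": (680, 363),
    "1750–1800": (375, 130),
    "1800–1850": (1023, 573),
    "1850–1900": (925, 467),
}

# Share of the subset within each period, rounded to whole percent.
percentages = {period: round(100 * subset / total)
               for period, (total, subset) in counts.items()}
for period, pct in percentages.items():
    print(period, f"{pct}%")

total_all = sum(total for total, _ in counts.values())
subset_all = sum(subset for _, subset in counts.values())
print(total_all, subset_all, f"{round(100 * subset_all / total_all)}%")
```

The recomputed shares agree with the percentages printed in the table, and the five periods together contain 3,243 items, of which 1,649 fall into the parenthesized subset.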
Most influential effects in the final GLM predicting position, 1650–1700.
| Effect | z-value | p-value |
|---|---|---|
| Cumulative surprisal | −2.669 | <0.01** |
| Length | 2.268 | <0.05* |
Most influential effects in the final GLM predicting position, 1700–1750.
| Effect | z-value | p-value |
|---|---|---|
| Mean surprisal | −1.693 | <0.1 |
| Length | −3.961 | <0.01** |
Most influential effects in the final GLM predicting position, 1750–1800.
| Effect | z-value | p-value |
|---|---|---|
| Cumulative surprisal | −4.471 | <0.01** |
| Mean surprisal | −0.186 | <0.1 |
Most influential effects in the final GLM predicting position, 1800–1850.
| Effect | z-value | p-value |
|---|---|---|
| Cumulative surprisal | −5.474 | <0.001*** |
| Mean surprisal | 0.853 | =0.394 |
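The relation between the two numeric columns can be sketched as follows, assuming the middle column holds a Wald z-statistic and the p-value is two-sided under a standard normal distribution. This reading is an assumption based on the numbers themselves, since the column headers are missing from this excerpt.

```python
import math


def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic z."""
    return math.erfc(abs(z) / math.sqrt(2))


# Values copied from the 1800–1850 table.
print(round(two_sided_p(0.853), 3))   # 0.394, matching the reported "=0.394"
print(two_sided_p(-5.474) < 0.001)    # True, matching the reported "<0.001"
```

Under this assumption the mean surprisal effect for 1800–1850 is far from significant, while the cumulative surprisal effect falls well below the 0.001 threshold.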
Most influential effects in the final GLM predicting position, 1850–1900.
| Effect | z-value | p-value |
|---|---|---|
| Cumulative surprisal | −1.8 | <0.1 |
| Mean surprisal | −1.736 | <0.1 |
| Cumulative surprisal: length | 2.67 | <0.05** |
Most influential effects in the final GLM predicting position, after removing interactions, 1850–1900.
| Effect | z-value | p-value |
|---|---|---|
| Cumulative surprisal | −8.027 | <0.001*** |
| Mean surprisal | −1.835 | <0.1 |
| | | | | |
|---|---|---|---|---|
| 1) Peter | Hat | Maria das | gegeben. | |
| Peter | Has | Maria the | given. | |
| 2) Peter | Hat | Maria | gegeben | |
| Peter | Has | Maria | given | |
| “Peter has given Maria a book that she needs urgently.” | | | | |