Arpit Agrawal, Sumeet Agarwal, Samar Husain.
Abstract
We used the Potsdam-Allahabad Hindi eye-tracking corpus to investigate the role of word-level and sentence-level factors during sentence comprehension in Hindi. Extending previous work on this eye-tracking data, we investigate the role of surprisal and retrieval cost metrics during sentence processing. While controlling for word-level predictors (word complexity, syllable length, unigram and bigram frequencies) as well as sentence-level predictors such as integration and storage costs, we find a significant effect of surprisal on first-pass reading time (FPRT): higher surprisal leads to longer FPRT. An effect of retrieval cost was found only at a higher degree of parser parallelism. Interestingly, while surprisal has a significant effect on FPRT, storage cost (another prediction-based metric) does not. A significant effect of storage cost shows up only in total fixation time (TFT), indicating that these two measures perhaps capture different aspects of prediction. The study replicates previous findings that both prediction-based and memory-based metrics are required to account for processing patterns during sentence comprehension. The results also show that parser model assumptions are critical for drawing generalizations about the utility of a metric (e.g., surprisal) across various phenomena in a language.
Keywords: sentence comprehension; Hindi comprehension; eye-tracking; incremental dependency parser; surprisal; working memory constraints
Year: 2017 PMID: 33828649 PMCID: PMC7141052 DOI: 10.16910/jemr.10.2.4
Source DB: PubMed Journal: J Eye Mov Res ISSN: 1995-8692 Impact factor: 0.957

Surprisal (k = 3) at different words for the sentence dilli meediaa kaa makkaa-madinaa hai – ‘Delhi is the epicenter of the media (in India).’
| Word | Gloss | α_i | Surprisal |
|---|---|---|---|
| dilli | Delhi | 1 | 0.00000 |
| meediaa | Media | 0.99997 | 0.00003 |
| kaa | GEN | 0.9985 | 0.00148 |
| makkaa-madinaa | Mecca-Medina | 0.3134 | 1.15865 |
| hai | Is | 0.2713 | 0.14419 |
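The surprisal column above is numerically consistent with the natural log of the ratio of consecutive α values, log(α_{i-1} / α_i), with α_0 = 1 before the first word. The sketch below assumes that reading of the table; the function name `surprisal_from_alphas` and the starting value `ALPHA_0` are illustrative choices, not taken from the paper. Small differences in the last digits are expected because the α values in the table are rounded.

```python
import math

# Values of the alpha_i column, copied from the table above.
words = ["dilli", "meediaa", "kaa", "makkaa-madinaa", "hai"]
alphas = [1.0, 0.99997, 0.9985, 0.3134, 0.2713]

# Assumption: the value before any word has been read is 1.
ALPHA_0 = 1.0


def surprisal_from_alphas(alphas, alpha_0=ALPHA_0):
    """Return log(alpha_{i-1} / alpha_i) for each word (natural log)."""
    values = []
    previous = alpha_0
    for current in alphas:
        values.append(math.log(previous / current))
        previous = current
    return values


surprisals = surprisal_from_alphas(alphas)
for word, value in zip(words, surprisals):
    print(f"{word:15s} {value:.5f}")
```

Under this reading, the large drop in α at makkaa-madinaa (from 0.9985 to 0.3134) is what produces the highest surprisal in the sentence.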
Results of linear mixed-effects model on log first-pass reading time.
| Predictor | Estimate (b) | Std. Error | t value |
|---|---|---|---|
| Intercept | 5.502 | 0.023 | 237.74 |
| Word complexity | 0.003 | 0.003 | 0.87 |
| Word frequency | -0.0003 | 0.006 | -0.04 |
| Word bigram frequency | -0.014 | 0.003 | -4.00 |
| Syllable length | 0.112 | 0.011 | 9.95 |
| Integration cost | 0.004 | 0.004 | 1.00 |
| Storage cost | 0.003 | 0.006 | 0.50 |
| Surprisal | 0.013 | 0.004 | 2.88 |
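To read a table like the one above, a common convention for linear mixed-effects models is to treat an absolute t value of about 2 or more as a reliable effect. The sketch below applies that convention to the reported rows. The threshold `T_THRESHOLD = 2.0` and the helper name `reliable_predictors` are assumptions made for illustration; the source does not state the criterion it used.

```python
# (predictor, estimate, std_error, t_value), copied from the table above.
# The intercept row is left out because it is not a predictor of interest.
fprt_rows = [
    ("Word complexity", 0.003, 0.003, 0.87),
    ("Word frequency", -0.0003, 0.006, -0.04),
    ("Word bigram frequency", -0.014, 0.003, -4.00),
    ("Syllable length", 0.112, 0.011, 9.95),
    ("Integration cost", 0.004, 0.004, 1.00),
    ("Storage cost", 0.003, 0.006, 0.50),
    ("Surprisal", 0.013, 0.004, 2.88),
]

# Assumption: |t| >= 2 is treated as a reliable effect.
T_THRESHOLD = 2.0


def reliable_predictors(rows, threshold=T_THRESHOLD):
    """Return the names of predictors whose reported |t| meets the threshold."""
    return [name for name, _estimate, _std_error, t in rows if abs(t) >= threshold]


print(reliable_predictors(fprt_rows))
```

With this threshold, the flagged predictors agree with the abstract: surprisal passes on first-pass reading time while storage cost does not.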
Results of linear mixed-effects model on log regression path duration.
| Predictor | Estimate (b) | Std. Error | t value |
|---|---|---|---|
| Intercept | 5.655 | 0.031 | 181.45 |
| Word complexity | 0.003 | 0.004 | 0.77 |
| Word frequency | -0.005 | 0.007 | -0.75 |
| Word bigram frequency | -0.023 | 0.003 | -6.53 |
| Syllable length | 0.116 | 0.011 | 10.44 |
| Integration cost | 0.012 | 0.005 | 2.26 |
| Storage cost | -0.011 | 0.007 | -1.57 |
| Surprisal | 0.002 | 0.005 | 0.52 |
Results of linear mixed-effects model on log total fixation time.
| Predictor | Estimate (b) | Std. Error | t value |
|---|---|---|---|
| Intercept | 5.619 | 0.030 | 181.32 |
| Word complexity | 0.005 | 0.002 | 1.97 |
| Word frequency | -0.016 | 0.007 | -2.24 |
| Word bigram frequency | -0.018 | 0.004 | -4.41 |
| Syllable length | 0.131 | 0.010 | 12.06 |
| Integration cost | 0.001 | 0.004 | 0.39 |
| Storage cost | 0.019 | 0.006 | 2.80 |
| Surprisal | 0.005 | 0.004 | 1.14 |
Figure 1. Husain et al. (2014) [20], Experiment 1: reading times in log ms at the critical region (relative clause verb) for the four conditions.




Results of linear mixed-effects model on log regression path duration (k = 25).
| Predictor | Estimate (b) | Std. Error | t value |
|---|---|---|---|
| Intercept | 5.656 | 0.031 | 180.98 |
| Word complexity | 0.002 | 0.003 | 0.61 |
| Word frequency | -0.006 | 0.007 | -0.81 |
| Word bigram frequency | -0.024 | 0.003 | -6.79 |
| Syllable length | 0.115 | 0.011 | 10.27 |
| Storage cost | -0.014 | 0.007 | -1.89 |
| Surprisal | 0.0001 | 0.005 | 0.03 |
| Retrieval cost | 0.009 | 0.004 | 1.91 |
Results of linear mixed-effects model on log first-pass reading time.
| Predictor | Estimate (b) | Std. Error | t value |
|---|---|---|---|
| Intercept | 5.501 | 0.023 | 237.72 |
| Word complexity | 0.002 | 0.003 | 0.67 |
| Word frequency | 0.000675 | 0.005 | 0.12 |
| Word bigram frequency | -0.013 | 0.003 | -4.03 |
| Syllable length | 0.110 | 0.011 | 9.90 |
| Storage cost | -0.00009 | 0.006 | -0.01 |
| Surprisal | 0.016 | 0.004 | 3.75 |
| Retrieval cost | -0.004 | 0.003 | -1.14 |
Results of linear mixed-effects model on log regression path duration.
| Predictor | Estimate (b) | Std. Error | t value |
|---|---|---|---|
| Intercept | 5.654 | 0.031 | 181.98 |
| Word complexity | 0.002 | 0.004 | 0.55 |
| Word frequency | -0.004 | 0.007 | -0.64 |
| Word bigram frequency | -0.023 | 0.003 | -6.58 |
| Syllable length | 0.113 | 0.011 | 10.11 |
| Storage cost | -0.015 | 0.007 | -2.17 |
| Surprisal | 0.004 | 0.005 | 0.75 |
| Retrieval cost | 0.007 | 0.005 | 1.42 |
Results of linear mixed-effects model on log total fixation time.
| Predictor | Estimate (b) | Std. Error | t value |
|---|---|---|---|
| Intercept | 5.618 | 0.030 | 182.30 |
| Word complexity | 0.004 | 0.002 | 1.68 |
| Word frequency | -0.014 | 0.006 | -2.09 |
| Word bigram frequency | -0.017 | 0.004 | -4.26 |
| Syllable length | 0.129 | 0.010 | 11.85 |
| Storage cost | 0.016 | 0.006 | 2.46 |
| Surprisal | 0.011 | 0.004 | 2.33 |
| Retrieval cost | -0.006 | 0.004 | -1.52 |
Coefficient of surprisal for log first-pass reading time at different values of k.
| k | Estimate(b) | Std. Error | t value |
|---|---|---|---|
| 1 | 0.006 | 0.003 | 1.75 |
| 2 | 0.009 | 0.003 | 2.62 |
| 3 | 0.010 | 0.004 | 2.55 |
| 4 | 0.010 | 0.004 | 2.50 |
| 5 | 0.011 | 0.004 | 2.80 |
| 10 | 0.012 | 0.004 | 2.88 |
| 15 | 0.011 | 0.004 | 2.65 |
| 20 | 0.010 | 0.004 | 2.38 |
| 25 | 0.009 | 0.004 | 2.20 |
Coefficient of retrieval cost for log regression path duration at different values of k.
| k | Estimate(b) | Std. Error | t value |
|---|---|---|---|
| 2 | 0.010 | 0.006 | 1.75 |
| 3 | 0.010 | 0.005 | 1.79 |
| 4 | 0.008 | 0.005 | 1.50 |
| 5 | 0.007 | 0.005 | 1.41 |
| 10 | 0.007 | 0.005 | 1.42 |
| 15 | 0.007 | 0.005 | 1.40 |
| 20 | 0.007 | 0.004 | 1.59 |
| 25 | 0.009 | 0.004 | 1.91 |