Noam Siegelman, Sascha Schroeder, Cengiz Acartürk, Hee-Don Ahn, Svetlana Alexeeva, Simona Amenta, Raymond Bertram, Rolando Bonandrini, Marc Brysbaert, Daria Chernova, Sara Maria Da Fonseca, Nicolas Dirix, Wouter Duyck, Argyro Fella, Ram Frost, Carolina A Gattei, Areti Kalaitzi, Nayoung Kwon, Kaidi Lõo, Marco Marelli, Timothy C Papadopoulos, Athanassios Protopapas, Satu Savo, Diego E Shalom, Natalia Slioussar, Roni Stein, Longjiao Sui, Analí Taboh, Veronica Tønnesen, Kerem Alp Usal, Victor Kuperman.
Abstract
Scientific studies of language behavior need to grapple with the large diversity of languages in the world and, for reading, with further variability in writing systems. Yet the ability to form meaningful theories of reading is contingent on the availability of cross-linguistic behavioral data. This paper offers new insights into aspects of reading behavior that are shared and those that vary systematically across languages through an investigation of eye-tracking data from 13 languages recorded during text reading. We begin by reporting a bibliometric analysis of eye-tracking studies showing that the current empirical base is insufficient for cross-linguistic comparisons. We respond to this empirical lacuna by presenting the Multilingual Eye-Movement Corpus (MECO), the product of an international multi-lab collaboration. We examine which behavioral indices differentiate between reading in different written languages, and which measures are stable across languages. One of the findings is that readers of different languages vary considerably in their skipping rate (i.e., the likelihood of not fixating on a word even once) and that this variability is explained by cross-linguistic differences in word length distributions. In contrast, if readers do not skip a word, they tend to spend a similar average time viewing it. We outline the implications of these findings for theories of reading. We also describe prospective uses of the publicly available MECO data and plans for its further development.
Keywords: Cross-linguistic research; Eye tracking; Language; Reading
Year: 2022 PMID: 35112286 PMCID: PMC8809631 DOI: 10.3758/s13428-021-01772-6
Source DB: PubMed Journal: Behav Res Methods ISSN: 1554-351X
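The abstract defines the skipping rate as the likelihood of not fixating on a word even once, and contrasts it with the average viewing time of words that are not skipped. A minimal sketch of both quantities follows, assuming a hypothetical per-word record of fixation counts and total viewing times; the function names and toy values are illustrative and are not taken from the MECO data or its tooling.

```python
import math


def skipping_rate(fixation_counts):
    """Share of words that received zero fixations."""
    if not fixation_counts:
        raise ValueError("need at least one word")
    skipped = sum(1 for n in fixation_counts if n == 0)
    return skipped / len(fixation_counts)


def mean_duration_of_fixated(durations_ms):
    """Mean viewing time over words fixated at least once (duration > 0)."""
    fixated = [d for d in durations_ms if d > 0]
    if not fixated:
        return math.nan
    return sum(fixated) / len(fixated)


# Hypothetical toy data for one reader: one entry per word in the text.
counts = [1, 0, 2, 0, 1, 1, 0, 3]                  # fixations per word
durations = [210, 0, 480, 0, 190, 250, 0, 700]     # total viewing time, ms

print(skipping_rate(counts))                 # 0.375
print(mean_duration_of_fixated(durations))   # 366.0
```

Keeping the two measures separate mirrors the distinction drawn in the abstract: skipped words enter the skipping rate but are excluded from the viewing-time average.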
Fig. 1 Distribution of investigated languages in 1078 (2000–2018) publications on eye movements in reading
Investigated languages and their properties
| Language | Language code | Typological family (branch) | Script (script type) | Morphological typology | Orthographic transparency | % of studies (2000–2018) |
|---|---|---|---|---|---|---|
| Dutch | DU | Indo-European (West Germanic) | Latin (alphabetic) | Synthetic, fusional | Moderate | 3.3 |
| English | EN | Indo-European (West Germanic) | Latin (alphabetic) | Moderately analytic | Opaque | 57.5 |
| Estonian | EE | Uralic (Finnic) | Latin (alphabetic) | Agglutinative, fusional | Transparent | <1 |
| Finnish | FI | Uralic (Finnic) | Latin (alphabetic) | Agglutinative, fusional | Transparent | 3.9 |
| German | GE | Indo-European (West Germanic) | Latin (alphabetic) | Synthetic, fusional | Moderate | 9.7 |
| Greek | GR | Indo-European (Hellenic) | Greek (alphabetic) | Synthetic, fusional | Transparent | <1 |
| Hebrew | HE | Semitic (Northwestern Semitic) | Hebrew (abjad) | Synthetic, fusional Semitic morphology | Opaque | <1 |
| Italian | IT | Indo-European (Romance) | Latin (alphabetic) | Synthetic, fusional | Transparent | 2.1 |
| Korean | KO | Koreanic | Hangul (alphabetic) | Agglutinative | Moderate | 1.0 |
| Norwegian | NO | Indo-European (North Germanic) | Latin (alphabetic) | Synthetic, fusional | Moderate | <1 |
| Russian | RU | Indo-European (East Slavic) | Cyrillic (alphabetic) | Synthetic, fusional | Moderate | <1 |
| Spanish | SP | Indo-European (Romance) | Latin (alphabetic) | Synthetic, fusional | Transparent | 4.1 |
| Turkish | TR | Turkic (Oghuz) | Latin (alphabetic) | Agglutinative | Transparent | <1 |
% of studies: the estimated share of eye-tracking studies (2000–2018) that investigated each language, based on the bibliometric search reported in Part I.
Information regarding participants in the participating sites
| Language | N | Mean age (range) | Mean years of education (SD) | Mean self-rating: speaking (SD) | Mean self-rating: oral comp (SD) | Mean self-rating: reading (SD) | Country | Institute | Participants' compensation | Trials after trimming (%) | Data points after trimming |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Dutch | 45 | 22.69 (19–30) | 16.12 (2.81) | 9.47 (0.69) | 9.56 (0.62) | 9.6 (0.58) | Belgium | Ghent University | 10 Euro/hour | 67 | 66,075 |
| English | 46 | 21.04 (18–28) | 15.76 (1.7) | 10 (0) | 10 (0) | 10 (0) | Canada | McMaster University | 20 CAD/hour or course credit | 87 | 83,246 |
| Estonian | 52 | 22.23 (18–30) | 14.51 (2.56) | 9.31 (0.90) | 9.64 (0.56) | 9.46 (0.79) | Estonia | University of Tartu | Gift card worth 7.5 Euro/hour | 74 | 58,249 |
| Finnish | 49 | 24.29 (19–35) | 15.04 (2.71) | 9.67 (0.59) | 9.84 (0.47) | 9.82 (0.44) | Finland | University of Turku | Course credit or 2 movie tickets | 91 | 64,673 |
| German | 45 | 23.76 (18–39) | 15.88 (2.75) | 9.5 (0.69) | 9.59 (0.63) | 9.41 (0.72) | Germany | University of Goettingen | 10 Euro/hour or course credit | 83 | 74,096 |
| Greek | 45 | 22.84 (18–30) | 17.04 (2.5) | 9 (0.88) | 9.67 (0.6) | 9.73 (0.58) | Cyprus | University of Cyprus | 10 Euro/hour or course credit | 66 | 60,382 |
| Hebrew | 47 | 24.04 (18–29) | 12.82 (1.37) | 9.68 (0.56) | 9.79 (0.41) | 9.6 (0.54) | Israel | Hebrew University | 40 NIS/hour or course credit | 72 | 64,786 |
| Italian | 54 | 22.83 (19–30) | 16.72 (2.15) | 9.59 (0.71) | 9.76 (0.55) | 9.76 (0.51) | Italy | University of Milano-Bicocca | 15 Euro or course credit | 76 | 84,976 |
| Korean | 32 | 21.97 (19–25) | 12.98 (2.13) | 8.53 (1.5) | 8.78 (1.31) | 8.69 (1.09) | South Korea | Konkuk University | 10,000 KRW | 62 | 34,685 |
| Norwegian | 42 | 25.69 (19–30) | 15.33 (3.27) | 9.31 (1.7) | 9.33 (1.6) | 9.21 (1.7) | Norway | University of Oslo | Volunteers | 71 | 61,548 |
| Russian | 46 | 24.26 (18–45) | 15.45 (2.06) | 9.38 (1.41) | 9.69 (1.08) | 9.46 (1.47) | Russia | St. Petersburg State University | Course credit/volunteers | 81 | 67,094 |
| Spanish | 48 | 23.04 (18–30) | 19.48 (3.8) | 9.73 (0.61) | 9.73 (0.64) | 9.58 (0.79) | Argentina | Universidad Torcuato Di Tella | 8 USD | 75 | 84,942 |
| Turkish | 29 | 23.69 (20–29) | 17.34 (2.38) | 9.41 (0.73) | 9.66 (0.61) | 9.34 (1.59) | Turkey | Middle East Technical University | 50 Turkish liras | 64 | 31,065 |
Number of sentences (#sent) and words (#word) in each text across languages
| Text # | Topic | Measure | DU | EE | EN | FI | GE | GR | HE | IT | KO | NO | RU | SP | TR |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1* | Janus | #sent | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 9 | 10 |
| | | #word | 186 | 131 | 183 | 128 | 174 | 189 | 130 | 185 | 142 | 177 | 151 | 210 | 146 |
| 2 | Shaka | #sent | 7 | 9 | 6 | 8 | 9 | 6 | 11 | 7 | 7 | 8 | 7 | 7 | 7 |
| | | #word | 194 | 133 | 185 | 116 | 161 | 171 | 209 | 174 | 150 | 169 | 145 | 190 | 131 |
| 3* | Doping | #sent | 9 | 9 | 9 | 10 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 |
| | | #word | 185 | 137 | 187 | 143 | 190 | 179 | 151 | 217 | 167 | 176 | 155 | 238 | 156 |
| 4 | Thylacine | #sent | 12 | 13 | 9 | 7 | 11 | 9 | 10 | 6 | 9 | 9 | 9 | 7 | 9 |
| | | #word | 206 | 142 | 182 | 130 | 180 | 177 | 168 | 176 | 158 | 181 | 190 | 169 | 167 |
| 5 | World Environment Day | #sent | 11 | 11 | 8 | 11 | 11 | 8 | 9 | 5 | 8 | 8 | 9 | 5 | 7 |
| | | #word | 173 | 147 | 167 | 127 | 154 | 180 | 168 | 137 | 160 | 158 | 139 | 182 | 139 |
| 6 | Monocle | #sent | 7 | 10 | 8 | 8 | 11 | 8 | 9 | 8 | 9 | 8 | 9 | 11 | 10 |
| | | #word | 180 | 126 | 152 | 97 | 153 | 143 | 151 | 150 | 143 | 149 | 165 | 212 | 142 |
| 7* | Wine tasting | #sent | 8 | 8 | 8 | 8 | 9 | 9 | 8 | 8 | 9 | 9 | 9 | 8 | 8 |
| | | #word | 213 | 156 | 199 | 135 | 199 | 202 | 164 | 212 | 167 | 189 | 165 | 229 | 150 |
| 8 | Orange juice | #sent | 10 | 7 | 6 | 8 | 9 | 6 | 14 | 7 | 7 | 11 | 7 | 7 | 10 |
| | | #word | 161 | 102 | 136 | 103 | 132 | 134 | 165 | 160 | 130 | 171 | 150 | 159 | 126 |
| 9 | Beekeeping | #sent | 10 | 9 | 8 | 11 | 9 | 7 | 10 | 6 | 9 | 16 | 11 | 10 | 9 |
| | | #word | 181 | 107 | 200 | 128 | 171 | 176 | 173 | 164 | 152 | 243 | 150 | 188 | 149 |
| 10 | National flag | #sent | 11 | 10 | 11 | 13 | 11 | 11 | 15 | 8 | 8 | 11 | 11 | 10 | 9 |
| | | #word | 187 | 109 | 180 | 149 | 181 | 181 | 201 | 176 | 168 | 177 | 164 | 234 | 127 |
| 11* | International Union for Conservation of Nature | #sent | 9 | 8 | 8 | 8 | 8 | 8 | 8 | 8 | 8 | 9 | 8 | 7 | 8 |
| | | #word | 196 | 132 | 176 | 120 | 172 | 181 | 140 | 182 | 125 | 170 | 164 | 225 | 139 |
| 12* | Vehicle registration plate | #sent | 8 | 8 | 8 | 8 | 8 | 8 | 8 | 8 | 8 | 8 | 8 | 8 | 8 |
| | | #word | 169 | 118 | 162 | 111 | 160 | 170 | 130 | 181 | 134 | 146 | 156 | 176 | 125 |
#sent: number of sentences; #word: number of words. Translated texts are marked with an asterisk; other texts were language-specific. Note that some small deviations in the number of sentences per text in matched texts are due to differences in spelling conventions (e.g., using a colon or a period before "For example").
Correlation table for reading measures across languages (N = 580). Values above the diagonal show Pearson correlations; values below the diagonal show p-values
| Measure | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| 1) Skipping | | –0.07 | –0.40 | –0.47 | 0.75 | –0.60 | –0.54 | –0.06 | –0.3 | –0.13 |
| 2) First fixation duration | 0.085 | | 0.82 | 0.61 | –0.45 | 0.14 | 0.03 | –0.07 | –0.04 | –0.08 |
| 3) Gaze duration | <0.001 | <0.001 | | 0.71 | –0.66 | 0.67 | 0.29 | –0.08 | –0.02 | –0.04 |
| 4) Total fixation duration | <0.001 | <0.001 | <0.001 | | –0.87 | 0.43 | 0.80 | 0.42 | 0.67 | 0.04 |
| 5) Reading rate | <0.001 | <0.001 | <0.001 | <0.001 | | –0.56 | –0.78 | –0.37 | –0.59 | –0.08 |
| 6) Number of fixations: first run | <0.001 | 0.001 | <0.001 | <0.001 | <0.001 | | 0.45 | –0.08 | 0.00 | 0.02 |
| 7) Number of fixations: total | <0.001 | 0.413 | <0.001 | <0.001 | <0.001 | <0.001 | | 0.57 | 0.88 | 0.10 |
| 8) Regressions in | 0.122 | 0.085 | 0.048 | <0.001 | <0.001 | 0.067 | <0.001 | | 0.70 | 0.05 |
| 9) Rereading | <0.001 | 0.327 | 0.58 | <0.001 | <0.001 | 0.956 | <0.001 | <0.001 | | 0.11 |
| 10) CFT | 0.003 | 0.046 | 0.309 | 0.315 | 0.068 | 0.684 | 0.018 | 0.211 | 0.008 | |
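The correlation table pairs each Pearson coefficient with a p-value at N = 580. As a rough illustration of how the two relate, the sketch below computes Pearson's r from raw values and an approximate two-sided p-value from r and the sample size. It assumes the standard test of zero correlation and, because the sample is large, replaces the t distribution with a normal approximation. The function names are hypothetical, and this is not presented as the procedure the authors used.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two sequences of equal length >= 3")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def approx_two_sided_p(r, n):
    """Approximate two-sided p-value for H0: rho = 0.

    Uses t = r * sqrt((n - 2) / (1 - r^2)) and treats t as standard
    normal, which is an assumption that is reasonable only for large n.
    """
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    t = r * math.sqrt((n - 2) / (1.0 - r * r))
    return math.erfc(abs(t) / math.sqrt(2.0))


# Example with a coefficient as printed in the table (rounded to two decimals).
p = approx_two_sided_p(-0.07, 580)
print(round(p, 3))
```

Because the coefficients in the table are rounded to two decimals, a p-value recomputed this way can be expected to land near the reported value rather than match it exactly.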
Fig. 2 Means of eye-movement measures across languages. Error bars stand for ± 1 SE. accuracy: percent answers correct; accuracyMatched: percent answers correct in matched texts; cft: score in the CFT test; firstFixationDuration: first fixation duration; gazeDuration: gaze duration; nFixationsFirstRun: first run number of fixations; nFixationsTotal: total number of fixations; readingRate: reading rate; regressionIn: regression rate; rereading: likelihood of second pass; skipping: skipping rate; totalFixationDuration: total fixation duration. du: Dutch; ee: Estonian; en: English; fi: Finnish; ge: German; gr: Greek; he: Hebrew; it: Italian; ko: Korean; no: Norwegian; ru: Russian; sp: Spanish; tr: Turkish
Fig. 3 Classification of participants into languages based on eye-movement measures. Best-split values of input variables are reported, along with p-values of the splits
Fig. 4 Estimated skipping rate for a word of an average length as a function of mean word length in language
Fig. 5 Hierarchical clustering of languages based on eye movements