Emmanuel Keuleers, Paula Lacey, Kathleen Rastle, Marc Brysbaert.
Abstract
We present a new database of lexical decision times for English words and nonwords, for which two groups of British participants each responded to 14,365 monosyllabic and disyllabic words and the same number of nonwords for a total duration of 16 h (divided over multiple sessions). This database, called the British Lexicon Project (BLP), fills an important gap between the Dutch Lexicon Project (DLP; Keuleers, Diependaele, & Brysbaert, Frontiers in Language Sciences. Psychology, 1, 174, 2010) and the English Lexicon Project (ELP; Balota et al., 2007), because it applies the repeated measures design of the DLP to the English language. The high correlation between the BLP and ELP data indicates that a high percentage of the variance in lexical decision data sets is systematic variance rather than noise, and that the results of megastudies are robust with respect to the selection and presentation of the stimuli. Because of its design, the BLP makes the same analyses possible as the DLP, offering researchers a new and interesting data set of word-processing times for mixed effects analyses and mathematical modeling. The BLP data are available at http://crr.ugent.be/blp and as Electronic Supplementary Materials.
Year: 2012 PMID: 21720920 PMCID: PMC3278621 DOI: 10.3758/s13428-011-0118-4
Source DB: PubMed Journal: Behav Res Methods ISSN: 1554-351X
List of visual word recognition megastudies
| Source | Task | Material | Participants |
|---|---|---|---|
| Seidenberg and Waters ( | Naming | 2,897 monosyllabic English words | 30 students |
| Treiman, Mullenix, Bijeljac-Babic, and Richmond-Welty ( | Naming | 1,329 English monosyllabic CVC words | 27 students |
| Spieler and Balota ( | Naming | 2,820 monosyllabic English words | 31 students |
| Spieler and Balota ( | Naming | 2,820 monosyllabic English words | 29 older adults (mean age = 73 years) |
| Chateau and Jared ( | Naming | 1,000 disyllabic six-letter words | 29 undergraduate students |
| Balota et al. ( | Lexical decision | 2,906 English monosyllabic words | 30 students and 30 older adults (mean age = 74 years) |
| Balota et al. ( | Lexical decision and naming | 40,481 English words | 400 students (naming), 816 students (lexical decision) |
| Lemhöfer et al. ( | Progressive demasking identification | 1,025 English words | 20 native speakers, and three groups of bilinguals with English as L2 |
| Ferrand et al. ( | Lexical decision | 38,840 French words | 975 students |
| Keuleers, Diependaele, & Brysbaert ( | Lexical decision | 14,037 Dutch monosyllabic and disyllabic words | 39 students and university personnel |
Comparison of the British Lexicon Project (BLP) with the Dutch Lexicon Project (DLP) and the monosyllabic and disyllabic words of the English Lexicon Project (ELP)
| | BLP | DLP | ELP (mono + di) |
|---|---|---|---|
| Number of words | 28,730 | 14,034 | 22,143 |
| Length (characters) | 6.5 (2–13) | 6.3 (2–12) | 6.5 (1–13) |
| Length (syllables) | 1.7 (1–2) | 1.8 (1–2) | 1.7 (1–2) |
| SUBTLEX frequency^a | 31.5 (0.02–41,857) | 59.7 (0.02–39,883) | 42.6 (0.02–41,857) |
| Accuracy words | 77% (0–100) | 84% (0–100) | 85% (0–100) |
| RT words | 654 (300–1,617) | 654 (312–1,382) | 730 (415–1,589) |
| Accuracy nonwords | 94% (0–100) | 94% (2–100) | 88%^b |
| RT nonwords | 639 (444–1,159) | 674 (508–1,135) | 856^b |
^a SUBTLEX frequencies refer to word form frequencies calculated on a corpus of 40–50 million words from film and television subtitles. Frequencies are expressed as frequency per million words. English frequencies are from Brysbaert and New (2009); Dutch frequencies are from Keuleers, Brysbaert, and New (2010). SUBTLEX frequencies were available for only 25,316 of the BLP words, partly because of spelling differences between British and American English. Therefore, unless indicated otherwise, the analyses reported in this article used the BNC frequencies, which had an average of 26.9 per million and ranged from 0.01 to 61,879 per million.
^b Based on the full ELP
Fig. 1 The effects of practice on accuracy (left panel) and response latency (right panel)
Fig. 2 ECVT test (Courrieu et al., 2011) on the zRT data of Group 1 (n = 38), after imputation of 26% missing data by the CRARI algorithm (Courrieu & Rey, 2011). The "predicted" and "observed" curves are indistinguishable, and the χ² test does not detect a significant difference between them. Therefore, the ICC method can be considered valid for these data
Correlations between the word data of the British Lexicon Project (BLP) and English Lexicon Project (ELP; lexical decision)
| | BLPzrt | BLPacc | ELPrt | ELPzrt | ELPacc |
|---|---|---|---|---|---|
| BLPrt | .954 | -.685 | .679 | .730 | -.588 |
| BLPzrt | | -.767 | .710 | .770 | -.656 |
| BLPacc | | | -.580 | -.653 | .788 |
| ELPrt | | | | .937 | -.595 |
| ELPzrt | | | | | -.690 |
All correlations are significant at the .0001 level (N=18,969)
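The zRT columns in this table are standardized reaction times. The record does not spell out how they were computed, so the following is only a minimal sketch under one common assumption: each RT is z-scored within its participant, and the z-scores are then averaged per word. The trial tuples and the function names `z_transform` and `item_means` are hypothetical and are not part of the BLP distribution.

```python
import statistics
from collections import defaultdict

def z_transform(trials):
    """Z-score each RT within its participant (assumed procedure).

    trials: list of (participant, word, rt_ms) tuples.
    Returns a list of (participant, word, z_rt) tuples.
    """
    rts_by_participant = defaultdict(list)
    for participant, _, rt in trials:
        rts_by_participant[participant].append(rt)
    # Per-participant mean and sample standard deviation
    params = {
        p: (statistics.mean(rts), statistics.stdev(rts))
        for p, rts in rts_by_participant.items()
    }
    return [
        (p, word, (rt - params[p][0]) / params[p][1])
        for p, word, rt in trials
    ]

def item_means(z_trials):
    """Average the z-scores per word to obtain one zRT per item."""
    zs_by_word = defaultdict(list)
    for _, word, z in z_trials:
        zs_by_word[word].append(z)
    return {word: statistics.mean(zs) for word, zs in zs_by_word.items()}
```

Standardizing within participants removes differences in overall speed and variability, so a fast and a slow participant who rank the same words in the same order contribute identical z-scores.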
Words with the largest residual difference in zRT between the British Lexicon Project (BLP) and English Lexicon Project (ELP)
| BLP Much Faster | BLP Much Slower |
|---|---|
| nightie | homer |
| greaseproof | lincoln |
| offence | boston |
| postcode | johnny |
| catchphrase | mom |
| sulphate | speedway |
| oxtail | sears |
| wholemeal | roger |
| levelled | softball |
| signposts | plato |
| gasworks | lawless |
| ferries | farthest |
| heartstrings | ralph |
| instructs | butts |
| transience | wick |
| tongs | dean |
| strengthens | heather |
| defence | peter |
| yachtsman | singed |
| drainpipe | babes |
| moulding | tooling |
Correlations between the word data in the BLP and in Balota et al. (2004; young adults [B04])
| | BLP zRT | BLP Accuracy | B04 RT | B04 Accuracy |
|---|---|---|---|---|
| BLPrt | .968 | -.682 | .693 | -.589 |
| BLPzrt | | -.728 | .716 | -.616 |
| BLPacc | | | -.589 | .612 |
| B04rt | | | | -.589 |
All correlations are significant at the .0001 level (N=2,328).
Correlations between word characteristics and average RTs and accuracies for Balota et al. (2004; young adults [B04]) and the British Lexicon Project (BLP; words for which all data are available; N = 2,328)
| | RT (B04) | RT (BLP) | Accuracy (B04) | Accuracy (BLP) |
|---|---|---|---|---|
| Length | .092** | .147** | .022 | .039 |
| Frequency | -.598** | -.617** | .414** | .456** |
| AoA | .649** | .645** | -.501** | -.500** |
| Familiarity | -.605** | -.608** | .445** | .454** |
| Imageability | -.274** | -.273** | .303** | .266** |
AoA, age of acquisition.
** p<.01
Fig. 3 The frequency effect in the English Lexicon Project (blue lines) and the British Lexicon Project (red lines). Stimuli were binned by frequency in groups of 1,000. Each line shows the mean RT and the standard deviation for a bin. Frequencies were based on the British National Corpus. Log10 frequency is the log10 of the frequency per million (i.e., for 0.1 per million, its value is -1.0)
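The binning described in the Fig. 3 caption can be sketched as follows. This is an illustration only, assuming items are sorted by frequency and cut into consecutive groups of `bin_size`; the input format and the function name `bin_by_frequency` are hypothetical.

```python
import math
import statistics

def bin_by_frequency(items, bin_size=1000):
    """Group items into consecutive frequency bins and summarize each bin.

    items: list of (frequency_per_million, rt_ms) tuples, frequency > 0.
    Returns one dict per bin with the mean log10 frequency, the mean RT,
    and the standard deviation of the RTs.
    """
    ordered = sorted(items, key=lambda item: item[0])
    summaries = []
    for start in range(0, len(ordered), bin_size):
        chunk = ordered[start:start + bin_size]
        log_freqs = [math.log10(freq) for freq, _ in chunk]
        rts = [rt for _, rt in chunk]
        summaries.append({
            "mean_log10_freq": statistics.mean(log_freqs),
            "mean_rt": statistics.mean(rts),
            # stdev needs at least two values; a final short bin may have one
            "sd_rt": statistics.stdev(rts) if len(rts) > 1 else 0.0,
        })
    return summaries
```

With this convention a word occurring 0.1 times per million has a log10 frequency of -1.0, as stated in the caption.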
Fig. 4 The word frequency effect in the English Lexicon Project (blue) and the British Lexicon Project (red) when expressed in standardized reaction times. The whiskers represent standard errors
Fig. 5 The word frequency effect on accuracy in the English Lexicon Project (blue) and British Lexicon Project (red)
Published lexical decision experiments involving frequency effects and virtual experiments with the same stimuli from the British Lexicon Project (studies are chronologically ordered)
| | | Original Experiment RTs | Virtual Experiment RTs |
|---|---|---|---|
| Monsell, Doyle, and Haggard ( | High frequency, person | 538 | 535 |
| | High frequency, thing | 541 | 539 |
| | Medium frequency, person | 553 | 570 |
| | Medium frequency, thing | 570 | 565 |
| | Low frequency, person | 639 | 640 |
| | Low frequency, thing | 617 | 618 |
| | Effect of frequency | 88** | 92** |
| | Effect of animacy | 1 | 8 |
| | Frequency × animacy interaction | | n.s. |
| Monsell et al. ( | High frequency, initial stress | 538 | 541 |
| | High frequency, final stress | 543 | 551 |
| | Low frequency, initial stress | 642 | 646 |
| | Low frequency, final stress | 616 | 598 |
| | Effect of frequency | 89** | 77** |
| | Effect of stress | 10 | 19+ |
| | Frequency × stress interaction | 15 | 29** |
| Morrison and Ellis ( | High frequency | 548 | 542 |
| | Low frequency | 602 | 576 |
| | Effect of frequency | 54** | 34** |
| Yap et al. ( | High frequency | 557 | 531 |
| | Low frequency | 605 | 574 |
| | Effect of frequency | 48** | 43** |
** p<.01, * p<.05, + p<.10 or significant only in F1 or F2
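The effect rows in the table above can be related to the cell means with simple arithmetic. The table does not state how the effects were derived; the sketch below assumes a main effect is the difference between marginal means and the interaction is half the difference of differences, a convention that matches the reported values for the frequency × stress design to within about 1 ms (the printed cell means are themselves rounded). The function name `effects_2x2` is hypothetical.

```python
from statistics import mean

def effects_2x2(hf_a, hf_b, lf_a, lf_b):
    """Main effects and interaction from the four cell means of a 2 x 2 design.

    hf_a, hf_b: high-frequency cells at levels a and b of the second factor.
    lf_a, lf_b: low-frequency cells at levels a and b of the second factor.
    Returns (frequency effect, second-factor effect, interaction) in ms.
    """
    frequency = mean([lf_a, lf_b]) - mean([hf_a, hf_b])
    second_factor = mean([hf_a, lf_a]) - mean([hf_b, lf_b])
    # Assumed convention: half the difference between the two simple effects
    interaction = ((lf_a - lf_b) - (hf_a - hf_b)) / 2
    return frequency, second_factor, interaction

# Frequency x stress design: a = initial stress, b = final stress
original = effects_2x2(538, 543, 642, 616)  # table reports 89, 10, 15
virtual = effects_2x2(541, 551, 646, 598)   # table reports 77, 19, 29
```

For the original experiment this gives 88.5, 10.5, and 15.5 ms; for the virtual experiment it gives 76, 19, and 29 ms.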
Data from Yap et al. (2008), illustrating the frequency effect as a function of vocabulary size
| University | Age | Years of Education | Vocabulary Age | RT (HF) | RT (LF) | Effect |
|---|---|---|---|---|---|---|
| Washington University | 20.9 | 13.8 | 18.7 | 612 | 678 | 66 |
| University of Waterloo | 20.9 | NA | 17.7 | 658 | 753 | 95 |
| University at Albany (SUNY) | 19.4 | 12.2 | 16.9 | 732 | 844 | 112 |
Studies addressing the effect of neighborhood size (N) in published lexical decision experiments and in virtual experiments with the same stimuli from the British Lexicon Project
| | | Original Experiment RTs | Virtual Experiment RTs |
|---|---|---|---|
| Andrews ( | High frequency, small | 570 | 539 |
| | High frequency, large | 586 | 535 |
| | Low frequency, small | 757 | 642 |
| | Low frequency, large | 714 | 625 |
| | Effect of frequency | 157** | 96** |
| | Effect of N | -13 | -7 |
| | Frequency × N | 29** | 6 |
| Sears, Hino, and Lupker ( | High frequency, small | 528 | 532 |
| | High frequency, large | 509 | 538 |
| | Low frequency, small | 587 | 564 |
| | Low frequency, large | 577 | 581 |
| | Effect of frequency | 63** | 28** |
| | Effect of N | 15+ | -12+ |
| | Frequency × N | 5 | 6 |
| Sears et al. ( | High frequency, small | 520 | 535 |
| | High frequency, large | 518 | 546 |
| | Low frequency, small | 669 | 595 |
| | Low frequency, large | 617 | 587 |
| | Effect of frequency | 124** | 50** |
| | Effect of N | 27+ | 1 |
| | Frequency × N | 25+ | 10 |
| Sears et al. ( | Small | 625 | 584 |
| | Small | 585 | 559 |
| | Small | 591 | 563 |
| | Large | 585 | 554 |
| | Large | 570 | 574 |
| | Large | 570 | 557 |
| | Effect of N | 25* | 7 |
| | Effect of higher neighbors | 27+ | 9 |
| | | n.s. | |
| Chateau and Jared ( | High frequency, small | 542 | 539 |
| | High frequency, large | 533 | 535 |
| | Low frequency, small | 694 | 642 |
| | Low frequency, large | 636 | 625 |
| | Effect of frequency | 127** | 96** |
| | Effect of N | 33** | -7 |
| | Frequency × N | 24** | 6 |
| Sears et al. ( | High frequency, small | 520 | 532 |
| | High frequency, large | 517 | 530 |
| | Low frequency, small | 567 | 552 |
| | Low frequency, large | 548 | 559 |
| | Effect of frequency | 39** | 24** |
| | Effect of N | 11+ | -3 |
| | Frequency × N | 8+ | 4 |
^a Same stimuli as in Andrews (1992), but with pseudohomophones as nonwords; only the high print exposure group. ** p<.01, * p<.05, + p<.10 or significant only in F1 or F2
Age of acquisition (AoA) effects in published lexical decision experiments and in virtual experiments with the same stimuli as those from the British Lexicon Project
| | | Original Experiment RTs | Virtual Experiment RTs |
|---|---|---|---|
| Morrison and Ellis ( | Early acquired | 582 | 552 |
| | Late acquired | 648 | 604 |
| | Effect of AoA | 66** | 52** |
| Gerhand and Barry ( | Early acquired, high frequency | 593 | 540 |
| | Early acquired, low frequency | 621 | 538 |
| | Late acquired, high frequency | 603 | 584 |
| | Late acquired, low frequency | 730 | 623 |
| | Effect of AoA | 59** | 65** |
| | Effect of frequency | 77** | 19 |
| | Frequency × AoA interaction | 50** | 20+ |
** p<.01, * p<.05, + p<.10 or significant only in F1 or F2
The interaction between word frequency and spelling–sound consistency in published lexical decision experiments and in virtual experiments with the same stimuli as those from the British Lexicon Project
| | | Original Experiment RTs | Virtual Experiment RTs |
|---|---|---|---|
| Seidenberg et al. ( | High frequency, regular inconsistent | 584 | 534 |
| | High frequency, strange | 570 | 526 |
| | High frequency, regular | 601 | 534 |
| | Low frequency, regular inconsistent | 626 | 603 |
| | Low frequency, strange | 673 | 613 |
| | Low frequency, regular | 633 | 598 |
| | Effect of frequency | 59** | 73** |
| | Effect of regularity | n.s. | n.s. |
| | Frequency × regularity interaction | | n.s. |
| Seidenberg et al. ( | High frequency, regular | 533 | 534 |
| | High frequency, exception | 530 | 564 |
| | Low frequency, regular | 601 | 600 |
| | Low frequency, exception | 604 | 593 |
| | Effect of frequency | 71** | 47** |
| | Effect of regularity | 0 | -11 |
| | Frequency × regularity interaction | 3 | 18 |
| Hino and Lupker ( | High frequency, regular | 500 | 543 |
| | High frequency, exception | 492 | 521 |
| | Low frequency, regular | 573 | 597 |
| | Low frequency, exception | 579 | 571 |
| | Effect of frequency | 80** | 52** |
| | Effect of regularity | -1 | -24* |
| | Frequency × regularity interaction | 7 | 2 |
| Stone et al. ( | Feedback consistent | 774 | 597 |
| | Feedback inconsistent | 807 | 595 |
| | Effect of feedback consistency | 33* | -2 |
| Stone et al. ( | Feedforward consistent, feedback consistent | 732 | 574 |
| | Feedforward consistent, feedback inconsistent | 778 | 620 |
| | Feedforward inconsistent, feedback consistent | 780 | 593 |
| | Feedforward inconsistent, feedback inconsistent | 770 | 604 |
| | Effect of feedforward consistency | 20+ | 2 |
| | Effect of feedback consistency | 18+ | 29** |
| | Feedforward × feedback interaction | 28+ | 18+ |
** p<.01, * p<.05, + p<.10 or significant only in F1 or F2
The effect of phonological neighborhood size in published lexical decision experiments and in virtual experiments with the same stimuli from BLP
| | | Original Experiment RTs | Virtual Experiment RTs |
|---|---|---|---|
| Yates et al. ( | Small phonological neighborhood | 681 | 633 |
| | Large phonological neighborhood | 620 | 578 |
| | Effect of neighborhood | 61** | 55** |
| Yates et al. ( | Small phonological neighborhood | 638 | 602 |
| | Large phonological neighborhood | 601 | 580 |
| | Effect of neighborhood | 37+ | 22 |
| Yates ( | Small phonological neighborhood | 729 | 610 |
| | Large phonological neighborhood | 656 | 567 |
| | Effect of neighborhood | 73** | 43** |
| Yates ( | | 647 | 575 |
| | | 620 | 566 |
| | Effect of | 27** | 9 |
** p<.01, * p<.05, + p<.10 or significant only in F1 or F2
Effects of frequency and bigram frequency in Andrews (1992, Experiment 3) and in a virtual experiment with the same stimuli as those in the British Lexicon Project
| | Original Experiment RTs | Virtual Experiment RTs |
|---|---|---|
| High word frequency, high bigram frequency | 592 | 532 |
| High word frequency, low bigram frequency | 594 | 531 |
| Low word frequency, high bigram frequency | 690 | 577 |
| Low word frequency, low bigram frequency | 686 | 591 |
| Effect of word frequency | 95** | 52** |
| Effect of bigram frequency | -1 | 6 |
| Word frequency × bigram frequency interaction | 3 | 7 |
** p<.01, * p<.05, + p<.10 or significant only in F1 or F2
Effects of polysemy on visual lexical decision times in published lexical decision experiments and in virtual experiments with the same stimuli as those in the British Lexicon Project
| | | Original Experiment RTs | Virtual Experiment RTs |
|---|---|---|---|
| Borowsky and Masson ( | Polysemous | 637 | 555 |
| | Monosemous | 647 | 562 |
| | Effect of polysemy | 10+ | 7 |
| Hino and Lupker ( | High frequency, polysemous | 548 | 524 |
| | High frequency, monosemous | 561 | 534 |
| | Low frequency, polysemous | 613 | 574 |
| | Low frequency, monosemous | 626 | 572 |
| | Effect of frequency | 65** | 44** |
| | Effect of polysemy | 13+ | 4 |
| | Frequency × polysemy interaction | 0 | 6 |
| Pexman, Hino, and Lupker ( | High frequency, polysemous | 513 | 529 |
| | High frequency, monosemous | 511 | 531 |
| | Low frequency, polysemous | 567 | 564 |
| | Low frequency, monosemous | 609 | 570 |
| | Effect of frequency | 76** | 37** |
| | Effect of polysemy | 20* | 4 |
| | Frequency × polysemy interaction | 22* | 2 |
** p<.01, * p<.05, + p<.10 or significant only in F1 or F2
Effect of number of senses and number of meanings in Rodd et al. (2000, Experiment 2) and in virtual experiments with the same stimuli as those in the British Lexicon Project
| | Original Experiment RTs | Virtual Experiment RTs |
|---|---|---|
| Many meanings, few senses | 587 | 571 |
| Many meanings, many senses | 578 | 559 |
| One meaning, few senses | 586 | 560 |
| One meaning, many senses | 567 | 550 |
| Effect of number of meanings | 6 | 10 |
| Effect of number of senses | 14* | 11+ |
| Interaction | 5 | 1 |
** p<.01, * p<.05, + p<.10 or significant only in F1 or F2
The feedforward and feedback consistency effects reported by Stone et al. (1997, Experiment 2) in the British Lexicon Project (BLP), in the English Lexicon Project (ELP), and combined. The first three columns show mean RTs, the final three columns show standardized RTs. The three final rows display the p-values of the effects
| | Original Experiment RTs | BLP RTs | ELP RTs | BLP zRTs | ELP zRTs | Combined zRTs |
|---|---|---|---|---|---|---|
| Feedforward consistent, feedback consistent | 732 | 574 | 634 | -.406 | -.505 | -.455 |
| Feedforward consistent, feedback inconsistent | 778 | 624 | 718 | -.057 | -.248 | -.152 |
| Feedforward inconsistent, feedback consistent | 780 | 598 | 669 | -.236 | -.386 | -.311 |
| Feedforward inconsistent, feedback inconsistent | 770 | 617 | 690 | -.087 | -.292 | -.189 |
| Effect of feedforward consistency (p) | .059 | .491 | .814 | .353 | .537 | .376 |
| Effect of feedback consistency (p) | .086 | .005 | .003 | .001 | .005 | .001 |
| Feedforward × feedback interaction (p) | .129 | .189 | .066 | .182 | .186 | .137 |
Fig. 6 Sample size required for finding an effect of a particular size (in milliseconds), derived from Monte Carlo simulation. For each combination of sample size (n = 10, 20, 40, 80, 160) and effect size (0, 5, 10, 20, 40 ms), we ran 1,000 simulations, each time taking two random samples of n words from the database. The y-axis indicates the proportion of simulations in which the null hypothesis (no effect) was rejected (alpha = .05). Sample sizes at which sufficient power (.8) is reached for the British Lexicon Project are about n = 40 for an effect of 40 ms and about n = 160 for an effect of 20 ms in both types of analyses. For the English Lexicon Project, sufficient power is reached at about n = 70 for an effect of 40 ms in the item analysis and about n = 100 for an effect of 40 ms in the trial analysis
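The simulation described in the Fig. 6 caption can be sketched as below. This is a simplified item-level illustration and not the authors' code. It makes several assumptions: the effect is added to one of the two samples, the test is a two-sample t test with equal n, and the item pool is synthetic (normally distributed mean RTs with a hypothetical mean of 650 ms and SD of 70 ms) rather than the real BLP items. The resulting power values therefore do not reproduce the figure. The critical t values are the standard two-sided values for alpha = .05 at the degrees of freedom implied by the caption's sample sizes.

```python
import random

# Two-sided critical t values (alpha = .05), keyed by df = 2n - 2,
# covering only the sample sizes named in the caption (n = 10 ... 160)
T_CRIT = {18: 2.101, 38: 2.024, 78: 1.991, 158: 1.975, 318: 1.967}

def rejects_null(sample_a, sample_b):
    """Two-sample t test with equal n and pooled variance."""
    n = len(sample_a)
    mean_a = sum(sample_a) / n
    mean_b = sum(sample_b) / n
    var_a = sum((x - mean_a) ** 2 for x in sample_a) / (n - 1)
    var_b = sum((x - mean_b) ** 2 for x in sample_b) / (n - 1)
    standard_error = ((var_a + var_b) / n) ** 0.5
    t = (mean_b - mean_a) / standard_error
    return abs(t) > T_CRIT[2 * n - 2]

def power(pool, n, effect_ms, n_sims=1000, rng=None):
    """Proportion of simulations in which the null hypothesis is rejected."""
    rng = rng or random.Random(0)
    rejections = 0
    for _ in range(n_sims):
        drawn = rng.sample(pool, 2 * n)          # two disjoint random samples
        sample_a = drawn[:n]
        sample_b = [rt + effect_ms for rt in drawn[n:]]  # add the effect
        rejections += rejects_null(sample_a, sample_b)
    return rejections / n_sims

# Synthetic stand-in for the item database (hypothetical parameters)
_pool_rng = random.Random(1)
pool = [_pool_rng.gauss(650, 70) for _ in range(5000)]
```

Calling `power(pool, n, effect_ms)` for each combination of n and effect size in the caption would trace out power curves of the kind the figure shows; with an effect of 0 ms the rejection rate should stay close to alpha.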