Christina L Gagné, Thomas L Spalding, Daniel Schmidtke.
Abstract
The Large Database of English Compounds (LADEC) consists of over 8,000 English words that can be parsed into two constituents that are free morphemes, making it the largest existing database specifically for use in research on compound words. Both monomorphemic (e.g., wheel) and multimorphemic (e.g., teacher) constituents were used. The items were selected from a range of sources, including CELEX, the English Lexicon Project, the British Lexicon Project, the British National Corpus, and Wordnet, and were hand-coded as compounds (e.g., snowball). Participants rated each compound in terms of how predictable its meaning is from its parts, as well as the extent to which each constituent retains its meaning in the compound. In addition, we obtained linguistic characteristics that might influence compound processing (e.g., frequency, family size, and bigram frequency). To show the usefulness of the database in investigating compound processing, we conducted a number of analyses that showed that compound processing is consistently affected by semantic transparency, as well as by many of the other variables included in LADEC. We also showed that the effects of the variables associated with the two constituents are not symmetric. In short, LADEC provides the opportunity for researchers to investigate a number of questions about compounds that have not been possible to investigate in the past, due to the lack of sufficiently large and robust datasets. In addition to directly allowing researchers to test hypotheses using the information included in LADEC, the database will contribute to future compound research by allowing better stimulus selection and matching.
Keywords: Bigram frequency; Compound words; Family size; Morphology; Psycholinguistics; Semantic transparency; Sentiment
Year: 2019 PMID: 31347038 PMCID: PMC6797637 DOI: 10.3758/s13428-019-01282-6
Source DB: PubMed Journal: Behav Res Methods ISSN: 1554-351X
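The abstract describes LADEC items as words that can be parsed into two constituents that are free morphemes, and notes that the items were hand-coded as compounds. As a rough illustration only, and not the authors' procedure, the sketch below enumerates candidate two-way parses of a letter string against a word list. The function name `candidate_parses`, the toy lexicon, and the minimum constituent length of 3 are assumptions made for this example.

```python
def candidate_parses(word, lexicon, min_len=3):
    """Return every split of `word` into two parts that are both in `lexicon`.

    `min_len` is an arbitrary lower bound on constituent length (an assumption).
    """
    parses = []
    for i in range(min_len, len(word) - min_len + 1):
        left, right = word[:i], word[i:]
        if left in lexicon and right in lexicon:
            parses.append((left, right))
    return parses


# Hypothetical toy lexicon, used only to exercise the function.
TOY_LEXICON = {"snow", "ball", "wheel", "chair", "teacher", "sea", "son", "season"}

print(candidate_parses("snowball", TOY_LEXICON))
print(candidate_parses("season", TOY_LEXICON))
```

A purely string-based search also returns splits such as sea + son for a word that is not a compound, which is one reason a human coding step of the kind the abstract mentions is useful.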
Frequency distribution of letter lengths for the bases
| Length | Frequency |
|---|---|
| 3 | 795 |
| 4 | 3,000 |
| 5 | 6,077 |
| 6 | 10,049 |
| 7 | 13,963 |
| 8 | 15,618 |
| 9 | 14,734 |
| 10 | 12,180 |
Descriptive statistics for semantic transparency ratings for the first constituent (ratingC1), second constituent (ratingC2), and full compound (ratingcmp), reported separately for all items, for correctly parsed items only, and for incorrectly parsed items only
| | N | Mean | SD | Min | Max |
|---|---|---|---|---|---|
| All Items | | | | | |
| ratingC1 | 8,299 | 64.59 | (19.68) | 3.28 | 98.54 |
| ratingC2 | 8,299 | 69.70 | (18.55) | 2.04 | 99.68 |
| ratingcmp | 8,299 | 61.00 | (18.80) | 8.14 | 96.96 |
| Correctly Parsed Items | | | | | |
| ratingC1 | 8,115 | 64.80 | (19.59) | 4.90 | 98.54 |
| ratingC2 | 8,115 | 71.00 | (16.46) | 3.13 | 99.68 |
| ratingcmp | 8,115 | 61.90 | (18.01) | 14.37 | 96.96 |
| Incorrectly Parsed Items | | | | | |
| ratingC1 | 184 | 55.73 | (21.92) | 3.28 | 94.32 |
| ratingC2 | 184 | 11.78 | (11.64) | 2.04 | 85.03 |
| ratingcmp | 184 | 21.84 | (7.43) | 8.14 | 61.90 |
Fig. 1 Histograms of semantic transparency ratings for the first constituent (ratingC1, top left), second constituent (ratingC2, top right), and full compound (ratingcmp, bottom)
Descriptive statistics for SNAUT cosine distance and latent semantic analysis (LSA) values
| | N | Mean | SD | Min | Max |
|---|---|---|---|---|---|
| c1c2_snautCos | 6,089 | 0.78 | 0.11 | 0.21 | 1.13 |
| c1stim_snautCos | 4,223 | 0.76 | 0.13 | 0.33 | 1.22 |
| c2stim_snautCos | 4,223 | 0.76 | 0.13 | 0.24 | 1.16 |
| LSAc1c2 | 8,618 | 0.16 | 0.14 | –0.13 | 1.00 |
| LSAc1stim | 4,473 | 0.17 | 0.18 | –0.15 | 0.95 |
| LSAc2stim | 4,442 | 0.15 | 0.16 | –0.17 | 0.94 |
Descriptive statistics for sentiment (probability of positive sentiment, probability of negative sentiment, and ratio of positive to negative), valence (Warriner et al., 2013), and concreteness (Brysbaert et al., 2014)
| | N | Mean | SD | Min | Max |
|---|---|---|---|---|---|
| Compound | | | | | |
| sentimentprobpos_stim | 8,956 | 0.40 | 0.04 | 0.07 | 0.92 |
| sentimentprobneg_stim | 8,956 | 0.35 | 0.04 | 0.02 | 0.78 |
| sentimentratioposneg_stim | 8,956 | 1.17 | 0.69 | 0.09 | 55.36 |
| valence_stim | 2,213 | 5.19 | 1.19 | 1.59 | 8.30 |
| concreteness_stim | 4,647 | 3.98 | 0.78 | 1.27 | 5.00 |
| First Constituent | | | | | |
| sentimentprobpos_c1 | 8,956 | 0.39 | 0.11 | 0.08 | 0.97 |
| sentimentprobneg_c1 | 8,956 | 0.33 | 0.12 | 0.02 | 0.82 |
| sentimentratioposneg_c1 | 8,956 | 1.43 | 0.90 | 0.22 | 49.06 |
| valence_c1 | 7,952 | 5.58 | 1.15 | 1.43 | 8.37 |
| concreteness_c1 | 8,383 | 4.17 | 0.82 | 1.43 | 5.00 |
| Second Constituent | | | | | |
| sentimentprobpos_c2 | 8,956 | 0.38 | 0.10 | 0.12 | 0.84 |
| sentimentprobneg_c2 | 8,956 | 0.34 | 0.10 | 0.04 | 0.76 |
| sentimentratioposneg_c2 | 8,956 | 1.31 | 0.55 | 0.13 | 12.03 |
| valence_c2 | 7,798 | 5.56 | 0.97 | 1.53 | 8.26 |
| concreteness_c2 | 6,480 | 4.23 | 0.81 | 1.22 | 5.00 |
Descriptive statistics for log word frequency from SUBTLEX, Zipf word frequency from SUBTLEX, log word frequency from Facebook data, lexical decision times from the English Lexicon Project (ELP) and British Lexicon Project (BLP), and naming times from ELP
| | N | Mean | SD | Min | Max |
|---|---|---|---|---|---|
| Facebook frequency | 3,202 | 1.91 | 0.97 | –0.05 | 5.88 |
| SUBTLEX frequency | 5,308 | 1.06 | 0.62 | 0.30 | 4.36 |
| SUBTLEX Zipf | 5,308 | 2.35 | 0.62 | 1.59 | 5.65 |
| ELP RT | 3,148 | 806 | 126 | 534 | 1,588 |
| BLP RT | 2,609 | 685 | 82 | 431 | 1,274 |
| ELP naming | 3,149 | 716 | 91 | 546 | 1,158 |
Pearson correlations among Juhasz et al.’s (2015) semantic transparency ratings (2015_trans); Kim et al.’s (2018) ratings for the first and second constituents (2018_ratingC1 and 2018_ratingC2); semantic transparency ratings for the first constituent (ratingC1), second constituent (ratingC2), and compound (ratingcmp); and SNAUT cosine distance and latent semantic analysis (LSA) values
| | ratingcmp | ratingC1 | ratingC2 | 2015_trans | 2018_ratingC1 | 2018_ratingC2 | c1c2_snautCos | c1stim_snautCos | c2stim_snautCos | LSAc1c2 | LSAc1stim |
|---|---|---|---|---|---|---|---|---|---|---|---|
| ratingC1 | .75*** | 1.00 | | | | | | | | | |
| ratingC2 | .66*** | .26*** | 1.00 | | | | | | | | |
| 2015_trans | .86*** | .73*** | .68*** | 1.00 | | | | | | | |
| 2018_ratingC1 | .62*** | .77*** | .23*** | .67*** | 1.00 | | | | | | |
| 2018_ratingC2 | .47*** | .17*** | .72*** | .53*** | .26*** | 1.00 | | | | | |
| c1c2_snautCos | – .26*** | – .17*** | – .18*** | – .23*** | – .16*** | – .23*** | 1.00 | | | | |
| c1stim_snautCos | – .43*** | – .51*** | – .19*** | – .49*** | – .60*** | – .21*** | .24*** | 1.00 | | | |
| c2stim_snautCos | – .26*** | – .07 | – .43*** | – .33*** | – .09 | – .55*** | .30*** | .31*** | 1.00 | | |
| LSAc1c2 | .19*** | .12* | .10* | .12* | .04 | .12* | – .71*** | – .12** | – .18*** | 1.00 | |
| LSAc1stim | .36*** | .38*** | .20*** | .38*** | .47*** | .20*** | – .15** | – .58*** | – .21*** | .11* | 1.00 |
| LSAc2stim | .21*** | .04 | .34*** | .23*** | .04 | .43*** | – .20*** | – .13** | – .50*** | .22*** | .29*** |
| N | 429 | | | | | | | | | | |
*p < .05, **p < .01, ***p < .001.
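For readers who want to compute correlations of this kind on their own subset of items, the following is a minimal sketch of a Pearson correlation that skips pairs with a missing value. Whether the authors handled missing ratings this way is not stated here, so the pairwise treatment is an assumption, as are the function name `pearson_r` and the toy numbers, which are invented and are not LADEC values.

```python
import math


def pearson_r(xs, ys):
    """Return (r, n), using only the pairs where both values are present."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two complete pairs")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    return cov / math.sqrt(var_x * var_y), n


# Invented toy ratings; None marks a missing rating.
toy_a = [1.0, 2.0, 3.0, 4.0, None]
toy_b = [2.0, 4.0, 6.0, 8.0, 10.0]
r, n = pearson_r(toy_a, toy_b)
print(r, n)
```

Returning `n` alongside `r` makes it easy to report how many items each coefficient is based on.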
Standardized regression coefficients with standard errors (in parentheses) from models using the semantic transparency measures and compound-based covariates to predict English Lexicon Project lexical decision times
| | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 | Model 6 | Model 7 |
|---|---|---|---|---|---|---|---|
| SUBTLEX frequency | – 0.53*** | – 0.52*** | – 0.51*** | – 0.51*** | – 0.49*** | – 0.45*** | – 0.49*** |
| | (0.00381) | (0.00382) | (0.00381) | (0.00380) | (0.00383) | (0.00161) | (0.00158) |
| stimlen | 0.15*** | 0.15*** | 0.15*** | 0.15*** | 0.15*** | 0.23*** | 0.22*** |
| | (0.00179) | (0.00179) | (0.00178) | (0.00177) | (0.00176) | (0.000654) | (0.000666) |
| 2015_trans | | – 0.03 | | | | | |
| | | (0.00190) | | | | | |
| 2018_ratingC1 | | | – 0.04 | | | | |
| | | | (0.00275) | | | | |
| 2018_ratingC2 | | | – 0.08* | | | | |
| | | | (0.00272) | | | | |
| ratingcmp | | | | – 0.12** | – 0.37*** | – 0.38*** | |
| | | | | (0.000135) | (0.000305) | (0.000132) | |
| ratingC1 | | | | | 0.22** | 0.16*** | – 0.08*** |
| | | | | | (0.000192) | (0.0000879) | (0.0000511) |
| ratingC2 | | | | | 0.12* | 0.15*** | – 0.04* |
| | | | | | (0.000189) | (0.0000837) | (0.0000571) |
| N | 456 | 456 | 456 | 456 | 456 | 2,513 | 2,513 |
| adj. R-sq | 0.308 | 0.308 | 0.316 | 0.322 | 0.334 | 0.349 | 0.324 |
| AIC | – 1,494.9 | – 1,493.6 | – 1,498.0 | – 1,503.0 | – 1,509.4 | – 8,065.1 | – 7,969.7 |
Standardized beta coefficients; standard errors are in parentheses. *p < .05, **p < .01, ***p < .001.
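As background for reading standardized coefficients and adjusted R-squared values, the sketch below uses the textbook closed form for a regression with exactly two predictors, expressed in terms of pairwise correlations. It is an illustration under that simplifying assumption, not the procedure used to produce these tables; the function names and the input correlations are invented for the example.

```python
def standardized_betas(r_y1, r_y2, r_12):
    """Standardized coefficients for y regressed on two predictors.

    r_y1, r_y2: correlations of the outcome with predictors 1 and 2.
    r_12: correlation between the two predictors.
    Returns (beta_1, beta_2, r_squared).
    """
    denom = 1.0 - r_12 ** 2
    beta_1 = (r_y1 - r_y2 * r_12) / denom
    beta_2 = (r_y2 - r_y1 * r_12) / denom
    r_squared = beta_1 * r_y1 + beta_2 * r_y2
    return beta_1, beta_2, r_squared


def adjusted_r_squared(r_squared, n, k):
    """Adjust R-squared for sample size n and number of predictors k."""
    return 1.0 - (1.0 - r_squared) * (n - 1) / (n - k - 1)


# Invented correlations: two uncorrelated predictors.
b1, b2, r2 = standardized_betas(r_y1=-0.5, r_y2=0.2, r_12=0.0)
adj = adjusted_r_squared(r2, n=101, k=2)
print(b1, b2, r2, adj)
```

When the two predictors are uncorrelated, each standardized coefficient equals the corresponding correlation with the outcome; once they are correlated, a coefficient can differ in size and even in sign from that correlation.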
Standardized regression coefficients with standard errors (in parentheses) from models using the semantic transparency measures and compound-based covariates to predict British Lexicon Project lexical decision times
| | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 | Model 6 | Model 7 |
|---|---|---|---|---|---|---|---|
| Analyses using SUBTLEX to estimate frequency | | | | | | | |
| SUBTLEX frequency | – 0.52*** | – 0.52*** | – 0.51*** | – 0.50*** | – 0.50*** | – 0.43*** | – 0.47*** |
| | (0.00375) | (0.00370) | (0.00369) | (0.00367) | (0.00371) | (0.00146) | (0.00144) |
| stimlen | – 0.06 | – 0.06 | – 0.05 | – 0.05 | – 0.06 | 0.02 | – 0.00 |
| | (0.00220) | (0.00217) | (0.00216) | (0.00215) | (0.00214) | (0.000766) | (0.000775) |
| 2015_trans | | – 0.14** | | | | | |
| | | (0.00184) | | | | | |
| 2018_ratingC1 | | | – 0.09 | | | | |
| | | | (0.00266) | | | | |
| 2018_ratingC2 | | | – 0.14** | | | | |
| | | | (0.00265) | | | | |
| ratingcmp | | | | – 0.19*** | – 0.33** | – 0.40*** | |
| | | | | (0.000131) | (0.000310) | (0.000122) | |
| ratingC1 | | | | | 0.17* | 0.19*** | – 0.06** |
| | | | | | (0.000196) | (0.0000812) | (0.0000477) |
| ratingC2 | | | | | 0.01 | 0.14*** | – 0.06** |
| | | | | | (0.000189) | (0.0000789) | (0.0000538) |
| N | 319 | 319 | 319 | 319 | 319 | 1,999 | 1,999 |
| adj. R-sq | 0.262 | 0.281 | 0.291 | 0.298 | 0.305 | 0.262 | 0.233 |
| AIC | – 1,161.7 | – 1,168.9 | – 1,172.4 | – 1,176.4 | – 1,177.8 | – 7,163.9 | – 7,089.9 |
| Analyses using BNC to estimate frequency | | | | | | | |
| BNC frequency | – 0.56*** | – 0.57*** | – 0.56*** | – 0.56*** | – 0.56*** | – 0.44*** | – 0.47*** |
| | (0.00307) | (0.00300) | (0.00300) | (0.00295) | (0.00295) | (0.00127) | (0.00128) |
| stimlen | – 0.01 | – 0.01 | – 0.00 | – 0.01 | – 0.02 | 0.07*** | 0.05** |
| | (0.00213) | (0.00207) | (0.00208) | (0.00204) | (0.00203) | (0.000726) | (0.000744) |
| 2015_trans | | – 0.19*** | | | | | |
| | | (0.00177) | | | | | |
| 2018_ratingC1 | | | – 0.09 | | | | |
| | | | (0.00255) | | | | |
| 2018_ratingC2 | | | – 0.16** | | | | |
| | | | (0.00256) | | | | |
| ratingcmp | | | | – 0.24*** | – 0.42*** | – 0.46*** | |
| | | | | (0.000125) | (0.000291) | (0.000115) | |
| ratingC1 | | | | | 0.21* | 0.23*** | – 0.05** |
| | | | | | (0.000185) | (0.0000785) | (0.0000476) |
| ratingC2 | | | | | 0.04 | 0.14*** | – 0.08*** |
| | | | | | (0.000178) | (0.0000769) | (0.0000535) |
| N | 319 | 319 | 319 | 319 | 319 | 2,392 | 2,392 |
| adj. R-sq | 0.307 | 0.342 | 0.343 | 0.362 | 0.374 | 0.285 | 0.247 |
| AIC | – 1,181.7 | – 1,197.1 | – 1,196.9 | – 1,207.1 | – 1,210.9 | – 8,366.6 | – 8,245.4 |
Standardized beta coefficients; standard errors are in parentheses. *p < .05, **p < .01, ***p < .001.
Standardized regression coefficients with standard errors (in parentheses) from models using the semantic transparency measures and compound-based covariates to predict English Lexicon Project naming times
| | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 | Model 6 | Model 7 |
|---|---|---|---|---|---|---|---|
| SUBTLEX frequency | – 0.47*** | – 0.47*** | – 0.47*** | – 0.46*** | – 0.44*** | – 0.38*** | – 0.42*** |
| | (0.00325) | (0.00326) | (0.00326) | (0.00324) | (0.00327) | (0.00135) | (0.00131) |
| stimlen | 0.25*** | 0.25*** | 0.25*** | 0.26*** | 0.25*** | 0.29*** | 0.28*** |
| | (0.00152) | (0.00153) | (0.00152) | (0.00151) | (0.00150) | (0.000548) | (0.000554) |
| 2015_trans | | – 0.02 | | | | | |
| | | (0.00162) | | | | | |
| 2018_ratingC1 | | | – 0.00 | | | | |
| | | | (0.00236) | | | | |
| 2018_ratingC2 | | | – 0.08 | | | | |
| | | | (0.00232) | | | | |
| ratingcmp | | | | – 0.12** | – 0.34*** | – 0.33*** | |
| | | | | (0.000115) | (0.000261) | (0.000111) | |
| ratingC1 | | | | | 0.20** | 0.11*** | – 0.10*** |
| | | | | | (0.000164) | (0.0000737) | (0.0000425) |
| ratingC2 | | | | | 0.11 | 0.14*** | – 0.02 |
| | | | | | (0.000162) | (0.0000701) | (0.0000476) |
| N | 456 | 456 | 456 | 456 | 456 | 2,514 | 2,514 |
| adj. R-sq | 0.305 | 0.303 | 0.307 | 0.317 | 0.327 | 0.304 | 0.286 |
| AIC | – 1,640.6 | – 1,638.8 | – 1,640.5 | – 1,648.0 | – 1,652.4 | – 8,957.4 | – 8,893.1 |
Standardized beta coefficients; standard errors are in parentheses. *p < .05, **p < .01, ***p < .001.
Standardized regression coefficients with standard errors (in parentheses) from models using the semantic transparency measures, compound-based covariates, and constituent-based covariates to predict English Lexicon Project lexical decision times
| | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 | Model 6 | Model 7 |
|---|---|---|---|---|---|---|---|
| Frequency | – 0.48*** | – 0.48*** | – 0.46*** | – 0.47*** | – 0.45*** | – 0.42*** | – 0.46*** |
| | (0.00393) | (0.00394) | (0.00394) | (0.00392) | (0.00393) | (0.00163) | (0.00160) |
| c1_SLlg10wf | – 0.06 | – 0.06 | – 0.07 | – 0.05 | – 0.06 | – 0.11*** | – 0.11*** |
| | (0.00274) | (0.00275) | (0.00273) | (0.00274) | (0.00272) | (0.00128) | (0.00130) |
| c2_SLlg10wf | – 0.15*** | – 0.15*** | – 0.17*** | – 0.15*** | – 0.14*** | – 0.09*** | – 0.09*** |
| | (0.00285) | (0.00285) | (0.00287) | (0.00283) | (0.00281) | (0.00114) | (0.00116) |
| c1len | 0.16*** | 0.16*** | 0.15*** | 0.16*** | 0.16*** | 0.20*** | 0.20*** |
| | (0.00221) | (0.00221) | (0.00219) | (0.00219) | (0.00218) | (0.000928) | (0.000943) |
| c2len | 0.00 | 0.00 | 0.00 | 0.01 | 0.01 | 0.09*** | 0.07*** |
| | (0.00280) | (0.00280) | (0.00278) | (0.00279) | (0.00276) | (0.000992) | (0.00100) |
| 2015_trans | | – 0.03 | | | | | |
| | | (0.00187) | | | | | |
| 2018_ratingC1 | | | – 0.04 | | | | |
| | | | (0.00270) | | | | |
| 2018_ratingC2 | | | – 0.11** | | | | |
| | | | (0.00271) | | | | |
| ratingcmp | | | | – 0.11** | – 0.33*** | – 0.35*** | |
| | | | | (0.000134) | (0.000302) | (0.000131) | |
| ratingC1 | | | | | 0.21** | 0.16*** | – 0.06*** |
| | | | | | (0.000189) | (0.0000875) | (0.0000512) |
| ratingC2 | | | | | 0.10 | 0.14*** | – 0.03 |
| | | | | | (0.000187) | (0.0000831) | (0.0000568) |
| N | 456 | 456 | 456 | 456 | 456 | 2,501 | 2,501 |
| adj. R-sq | 0.335 | 0.334 | 0.347 | 0.345 | 0.356 | 0.369 | 0.348 |
| AIC | – 1,509.7 | – 1,508.5 | – 1,516.4 | – 1,515.7 | – 1,521.3 | – 8,107.1 | – 8,024.5 |
Standardized beta coefficients; standard errors are in parentheses. *p < .05, **p < .01, ***p < .001.
Standardized regression coefficients with standard errors (in parentheses) from models using the semantic transparency measures, compound-based covariates, and constituent-based covariates to predict British Lexicon Project lexical decision times
| | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 | Model 6 | Model 7 |
|---|---|---|---|---|---|---|---|
| Analyses using SUBTLEX to estimate frequency | | | | | | | |
| SUBTLEX frequency | – 0.48*** | – 0.48*** | – 0.46*** | – 0.47*** | – 0.47*** | – 0.42*** | – 0.46*** |
| | (0.00393) | (0.00388) | (0.00385) | (0.00384) | (0.00387) | (0.00150) | (0.00149) |
| c1_SLlg10wf | – 0.02 | – 0.02 | – 0.04 | – 0.00 | – 0.01 | – 0.02 | – 0.03 |
| | (0.00268) | (0.00265) | (0.00263) | (0.00263) | (0.00262) | (0.00118) | (0.00121) |
| c2_SLlg10wf | – 0.12* | – 0.13* | – 0.16** | – 0.13* | – 0.12* | – 0.06** | – 0.05* |
| | (0.00299) | (0.00295) | (0.00298) | (0.00292) | (0.00292) | (0.00114) | (0.00116) |
| c1len | – 0.01 | – 0.01 | 0.00 | – 0.01 | – 0.01 | 0.02 | 0.01 |
| | (0.00306) | (0.00302) | (0.00298) | (0.00299) | (0.00299) | (0.00113) | (0.00115) |
| c2len | – 0.10* | – 0.10* | – 0.10* | – 0.09 | – 0.10 | – 0.02 | – 0.04 |
| | (0.00312) | (0.00308) | (0.00304) | (0.00305) | (0.00304) | (0.00113) | (0.00114) |
| 2015_trans | | – 0.15** | | | | | |
| | | (0.00183) | | | | | |
| 2018_ratingC1 | | | – 0.09 | | | | |
| | | | (0.00264) | | | | |
| 2018_ratingC2 | | | – 0.17*** | | | | |
| | | | (0.00268) | | | | |
| ratingcmp | | | | – 0.19*** | – 0.30** | – 0.42*** | |
| | | | | (0.000131) | (0.000311) | (0.000122) | |
| ratingC1 | | | | | 0.15 | 0.21*** | – 0.05* |
| | | | | | (0.000196) | (0.0000810) | (0.0000480) |
| ratingC2 | | | | | – 0.00 | 0.15*** | – 0.06** |
| | | | | | (0.000189) | (0.0000790) | (0.0000540) |
| N | 319 | 319 | 319 | 319 | 319 | 1,994 | 1,994 |
| adj. R-sq | 0.271 | 0.292 | 0.310 | 0.306 | 0.312 | 0.270 | 0.240 |
| AIC | – 1,162.7 | – 1,170.8 | – 1,178.1 | – 1,177.2 | – 1,177.9 | – 7,181.4 | – 7,102.1 |
| Analyses using BNC to estimate frequency | | | | | | | |
| BNC frequency | – 0.51*** | – 0.53*** | – 0.50*** | – 0.53*** | – 0.53*** | – 0.43*** | – 0.46*** |
| | (0.00329) | (0.00322) | (0.00319) | (0.00317) | (0.00317) | (0.00134) | (0.00136) |
| c1_BNC frequency | – 0.06 | – 0.04 | – 0.06 | – 0.03 | – 0.03 | – 0.04 | – 0.04* |
| | (0.00285) | (0.00279) | (0.00277) | (0.00277) | (0.00275) | (0.00127) | (0.00130) |
| c2_BNC frequency | – 0.10* | – 0.10* | – 0.13** | – 0.09 | – 0.08 | – 0.04* | – 0.03 |
| | (0.00314) | (0.00307) | (0.00310) | (0.00303) | (0.00303) | (0.00124) | (0.00127) |
| c1len | 0.02 | 0.02 | 0.03 | 0.02 | 0.01 | 0.05* | 0.05** |
| | (0.00298) | (0.00291) | (0.00289) | (0.00287) | (0.00286) | (0.00109) | (0.00111) |
| c2len | – 0.06 | – 0.05 | – 0.05 | – 0.04 | – 0.05 | 0.04* | 0.02 |
| | (0.00295) | (0.00288) | (0.00286) | (0.00285) | (0.00283) | (0.00102) | (0.00105) |
| 2015_trans | | – 0.19*** | | | | | |
| | | (0.00177) | | | | | |
| 2018_ratingC1 | | | – 0.09 | | | | |
| | | | (0.00254) | | | | |
| 2018_ratingC2 | | | – 0.18*** | | | | |
| | | | (0.00258) | | | | |
| ratingcmp | | | | – 0.23*** | – 0.39*** | – 0.46*** | |
| | | | | (0.000126) | (0.000293) | (0.000117) | |
| ratingC1 | | | | | 0.20* | 0.24*** | – 0.04* |
| | | | | | (0.000186) | (0.0000800) | (0.0000488) |
| ratingC2 | | | | | 0.03 | 0.14*** | – 0.08*** |
| | | | | | (0.000178) | (0.0000789) | (0.0000549) |
| N | 319 | 319 | 319 | 319 | 319 | 2,360 | 2,360 |
| adj. R-sq | 0.316 | 0.348 | 0.358 | 0.365 | 0.375 | 0.286 | 0.249 |
| AIC | – 1,182.9 | – 1,197.4 | – 1,201.4 | – 1,205.4 | – 1,208.6 | – 8,241.8 | – 8,122.9 |
Standardized beta coefficients; standard errors are in parentheses. *p < .05, **p < .01, ***p < .001.
Standardized regression coefficients with standard errors (in parentheses) from models using the semantic transparency measures, compound-based covariates, and constituent-based covariates to predict English Lexicon Project naming times
| | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 | Model 6 | Model 7 |
|---|---|---|---|---|---|---|---|
| SUBTLEX frequency | – 0.40*** | – 0.40*** | – 0.38*** | – 0.39*** | – 0.38*** | – 0.33*** | – 0.36*** |
| | (0.00326) | (0.00327) | (0.00328) | (0.00325) | (0.00327) | (0.00132) | (0.00129) |
| c1_SLlg10wf | – 0.19*** | – 0.19*** | – 0.19*** | – 0.18*** | – 0.18*** | – 0.22*** | – 0.22*** |
| | (0.00228) | (0.00228) | (0.00227) | (0.00227) | (0.00226) | (0.00104) | (0.00105) |
| c2_SLlg10wf | – 0.14*** | – 0.14*** | – 0.16*** | – 0.14*** | – 0.13*** | – 0.11*** | – 0.12*** |
| | (0.00236) | (0.00236) | (0.00239) | (0.00235) | (0.00234) | (0.000924) | (0.000933) |
| c1len | 0.24*** | 0.24*** | 0.23*** | 0.24*** | 0.23*** | 0.23*** | 0.23*** |
| | (0.00183) | (0.00183) | (0.00182) | (0.00182) | (0.00181) | (0.000753) | (0.000761) |
| c2len | 0.05 | 0.05 | 0.05 | 0.06 | 0.06 | 0.12*** | 0.11*** |
| | (0.00232) | (0.00233) | (0.00231) | (0.00231) | (0.00230) | (0.000805) | (0.000810) |
| 2015_trans | | – 0.02 | | | | | |
| | | (0.00155) | | | | | |
| 2018_ratingC1 | | | – 0.00 | | | | |
| | | | (0.00225) | | | | |
| 2018_ratingC2 | | | – 0.10** | | | | |
| | | | (0.00225) | | | | |
| ratingcmp | | | | – 0.09* | – 0.29*** | – 0.28*** | |
| | | | | (0.000111) | (0.000251) | (0.000107) | |
| ratingC1 | | | | | 0.19** | 0.12*** | – 0.06*** |
| | | | | | (0.000158) | (0.0000710) | (0.0000413) |
| ratingC2 | | | | | 0.09 | 0.12*** | – 0.02 |
| | | | | | (0.000156) | (0.0000675) | (0.0000459) |
| N | 456 | 456 | 456 | 456 | 456 | 2,502 | 2,502 |
| adj. R-sq | 0.367 | 0.366 | 0.375 | 0.374 | 0.383 | 0.364 | 0.351 |
| AIC | – 1,680.8 | – 1,678.9 | – 1,684.5 | – 1,684.7 | – 1,689.1 | – 9,153.2 | – 9,103.1 |
Standardized beta coefficients; standard errors are in parentheses. *p < .05, **p < .01, ***p < .001.
Standardized regression coefficients with standard errors (in parentheses) from models using vector-based measures of semantic transparency to predict English Lexicon Project (ELP) lexical decision (LD) times, British Lexicon Project (BLP) lexical decision times, and ELP naming times
| | ELP LD | ELP LD | BLP LD | BLP LD | BLP LD | BLP LD | ELP Naming | ELP Naming |
|---|---|---|---|---|---|---|---|---|
| SUBTLEX frequency | – 0.423*** (0.00190) | – 0.451*** (0.00191) | – 0.484*** (0.00197) | – 0.506*** (0.00197) | | | – 0.349*** (0.00154) | – 0.373*** (0.00153) |
| c1_SLlg10wf | – 0.143*** (0.00151) | – 0.138*** (0.00155) | – 0.034 (0.00164) | – 0.032 (0.00167) | | | – 0.248*** (0.00122) | – 0.244*** (0.00124) |
| c2_SLlg10wf | – 0.115*** (0.00138) | – 0.101*** (0.00141) | – 0.122*** (0.00154) | – 0.107*** (0.00160) | | | – 0.147*** (0.00111) | – 0.142*** (0.00113) |
| BNC_frequency | | | | | – 0.482*** (0.00186) | – 0.492*** (0.00180) | | |
| c1_BNC frequency | | | | | – 0.028 (0.00178) | – 0.036 (0.00179) | | |
| c2_BNC frequency | | | | | – 0.069** (0.00170) | – 0.065* (0.00173) | | |
| c1len | 0.180*** (0.00112) | 0.179*** (0.00113) | 0.026 (0.00152) | 0.024 (0.00152) | 0.057* (0.00153) | 0.059* (0.00152) | 0.194*** (0.000904) | 0.193*** (0.000903) |
| c2len | 0.095*** (0.00116) | 0.092*** (0.00116) | – 0.028 (0.00163) | – 0.029 (0.00163) | 0.047 (0.00160) | 0.045 (0.00159) | 0.104*** (0.000932) | 0.105*** (0.000931) |
| LSAc1c2 | 0.022 (0.00791) | | 0.005 (0.00866) | | – 0.030 (0.00860) | | 0.033 (0.00638) | |
| LSAc1stim | – 0.072*** (0.00631) | | – 0.064* (0.00673) | | – 0.055* (0.00679) | | – 0.031 (0.00509) | |
| LSAc2stim | – 0.047* (0.00729) | | – 0.031 (0.00751) | | – 0.043 (0.00749) | | – 0.018 (0.00588) | |
| c1c2_snautCos | | 0.002 (0.0107) | | 0.014 (0.0119) | | 0.061* (0.0113) | | – 0.043 (0.00860) |
| c1stim_snautCos | | 0.025 (0.00956) | | 0.036 (0.0100) | | 0.091*** (0.00992) | | – 0.007 (0.00765) |
| c2stim_snautCos | | – 0.042* (0.00915) | | – 0.049 (0.00979) | | 0.024 (0.00969) | | – 0.039 (0.00733) |
| N | 1,767 | 1,767 | 1,121 | 1,121 | 1,191 | 1,191 | 1,768 | 1,768 |
| R-sq | 0.350 | 0.344 | 0.304 | 0.301 | 0.293 | 0.304 | 0.344 | 0.346 |
| adj. R-sq | 0.347 | 0.341 | 0.299 | 0.296 | 0.288 | 0.299 | 0.341 | 0.343 |
| AIC | – 5,774.3 | – 5,757.3 | – 4,037.6 | – 4,033.1 | – 4,203.5 | – 4,221.1 | – 6,536.8 | – 6,543.4 |
Standardized beta coefficients; standard errors are in parentheses. *p < .05, **p < .01, ***p < .001.
Standardized regression coefficients with standard errors (in parentheses) from models using frequency, stimulus length (in letters), and log bigram frequency (based either on SUBTLEX or on Jones & Mewhort, 2004) to predict English Lexicon Project (ELP) lexical decision (LD) times, British Lexicon Project (BLP) lexical decision times, and ELP naming times
| | ELP LD | ELP LD | BLP LD | BLP LD | BLP LD | BLP LD | ELP Naming | ELP Naming |
|---|---|---|---|---|---|---|---|---|
| SUBTLEX frequency | – 0.491*** | – 0.491*** | – 0.479*** | – 0.478*** | | | – 0.415*** | – 0.414*** |
| | (0.00156) | (0.00156) | (0.00144) | (0.00144) | | | (0.00131) | (0.00131) |
| BNC frequency | | | | | – 0.486*** | – 0.486*** | | |
| | | | | | (0.00129) | (0.00129) | | |
| stimlen | 0.220*** | 0.217*** | – 0.015 | – 0.015 | 0.042* | 0.042* | 0.258*** | 0.256*** |
| | (0.000645) | (0.000646) | (0.000774) | (0.000774) | (0.000743) | (0.000743) | (0.000542) | (0.000543) |
| bgSUBTLEX | 0.078*** | | 0.038 | | 0.041* | | 0.087*** | |
| | (0.000975) | | (0.000902) | | (0.000852) | | (0.000819) | |
| bgJonesMewhort | | 0.088*** | | 0.035 | | 0.041* | | 0.087*** |
| | | (0.000460) | | (0.000422) | | (0.000399) | | (0.000387) |
| N | 2,593 | 2,593 | 2,002 | 2,002 | 2,396 | 2,396 | 2,594 | 2,594 |
| R-sq | 0.322 | 0.324 | 0.226 | 0.226 | 0.239 | 0.239 | 0.273 | 0.273 |
| adj. R-sq | 0.321 | 0.323 | 0.225 | 0.225 | 0.238 | 0.238 | 0.272 | 0.272 |
| AIC | – 8,199.6 | – 8,205.8 | – 7,079.1 | – 7,078.6 | – 8,231.6 | – 8,231.4 | – 9,106.2 | – 9,106.3 |
Standardized beta coefficients; standard errors are in parentheses. *p < .05, **p < .01, ***p < .001.
Standardized regression coefficients with standard errors (in parentheses) from models using frequency, stimulus length (in letters), log bigram frequency calculated from SUBTLEX, and whether the item was correctly parsed or not to predict English Lexicon Project (ELP) lexical decision (LD) times, British Lexicon Project (BLP) lexical decision times, and ELP naming times
| | ELP LD | BLP LD | BLP LD | ELP Naming |
|---|---|---|---|---|
| SUBTLEX frequency | – 0.487*** | – 0.472*** | | – 0.409*** |
| | (0.00152) | (0.00140) | | (0.00128) |
| BNC frequency | | | – 0.482*** | |
| | | | (0.00125) | |
| stimlen | 0.217*** | – 0.010 | 0.048** | 0.264*** |
| | (0.000627) | (0.000753) | (0.000723) | (0.000527) |
| log_bgSUBTLEX | – 0.335* | – 0.476** | – 0.409** | – 0.085 |
| | (0.00801) | (0.00693) | (0.00660) | (0.00673) |
| Correct parse | – 0.561** | – 0.788*** | – 0.652** | – 0.208 |
| | (0.0459) | (0.0401) | (0.0380) | (0.0386) |
| Bigram frequency x Correct parse | 0.244** | 0.023** | 0.021 | 0.008 |
| | (0.00807) | (0.00698) | (0.00665) | (0.00678) |
| N | 2,773 | 2,183 | 2,604 | 2,774 |
| adj. R-sq | 0.318 | 0.226 | 0.240 | 0.270 |
| AIC | – 8,762.3 | – 7,704.8 | – 8,929.3 | – 9,730.1 |
Standardized beta coefficients; standard errors are in parentheses. *p < .05, **p < .01, ***p < .001.
Standardized regression coefficients with standard errors (in parentheses) from models using frequency, stimulus length (in letters), log bigram frequency, family size of the first constituent, and family size of the second constituent to predict English Lexicon Project (ELP) lexical decision (LD) times, British Lexicon Project (BLP) lexical decision times, and ELP naming times
| | ELP LD | BLP LD | BLP LD | ELP Naming |
|---|---|---|---|---|
| SUBTLEX frequency | – 0.481*** | – 0.469*** | | – 0.397*** |
| | (0.00155) | (0.00143) | | (0.00127) |
| BNC frequency | | | – 0.472*** | |
| | | | (0.00129) | |
| stimlen | 0.227*** | – 0.031 | 0.034 | 0.263*** |
| | (0.000645) | (0.000787) | (0.000758) | (0.000525) |
| nc1_cmp | – 0.117*** | – 0.122*** | – 0.098*** | – 0.242*** |
| | (0.0000394) | (0.0000506) | (0.0000490) | (0.0000321) |
| nc2_cmp | 0.005 | – 0.038 | – 0.008 | – 0.024 |
| | (0.0000206) | (0.0000198) | (0.0000191) | (0.0000168) |
| N | 2,593 | 2,002 | 2,396 | 2,594 |
| adj. R-sq | 0.329 | 0.238 | 0.246 | 0.322 |
| AIC | – 8,227.7 | – 7,112.9 | – 8,253.5 | – 9,288.9 |
Standardized beta coefficients; standard errors are in parentheses. *p < .05, **p < .01, ***p < .001.
Standardized regression coefficients with standard errors (in parentheses) from models using frequency, stimulus length (in letters), and sentiment to predict English Lexicon Project (ELP) lexical decision (LD) times, British Lexicon Project (BLP) lexical decision times, and ELP naming times
| | ELP LD | BLP LD | BLP LD | ELP Naming |
|---|---|---|---|---|
| SUBTLEX frequency | – 0.473*** | – 0.455*** | | – 0.401*** |
| | (0.00162) | (0.00149) | | (0.00136) |
| BNC frequency | | | – 0.464*** | |
| | | | (0.00132) | |
| stimlen | 0.226*** | – 0.012 | 0.042* | 0.264*** |
| | (0.000647) | (0.000773) | (0.000743) | (0.000544) |
| sentimentprobneg_stim | 0.090*** | 0.126*** | 0.115*** | 0.065** |
| | (0.0219) | (0.0212) | (0.0221) | (0.0184) |
| sentimentprobneg_c1 | – 0.010 | – 0.010 | 0.004 | – 0.041 |
| | (0.0112) | (0.0107) | (0.0104) | (0.00942) |
| sentimentprobneg_c2 | 0.016 | 0.012 | – 0.003 | 0.008 |
| | (0.0125) | (0.0120) | (0.0114) | (0.0105) |
| sentimentprobpos_stim | 0.073*** | 0.083** | 0.052* | 0.050* |
| | (0.0215) | (0.0213) | (0.0225) | (0.0181) |
| sentimentprobpos_c1 | – 0.025 | – 0.002 | – 0.017 | – 0.015 |
| | (0.0114) | (0.0107) | (0.0104) | (0.00961) |
| sentimentprobpos_c2 | 0.025 | 0.019 | 0.012 | 0.023 |
| | (0.0131) | (0.0124) | (0.0118) | (0.0110) |
| N | 2,593 | 2,002 | 2,396 | 2,594 |
| adj. R-sq | 0.320 | 0.231 | 0.244 | 0.267 |
| AIC | – 8,187.5 | – 7,089.5 | – 8,242.7 | – 9,082.9 |
Standardized beta coefficients; standard errors are in parentheses. *p < .05, **p < .01, ***p < .001.
Standardized regression coefficients with standard errors (in parentheses) from models using frequency, stimulus length (in letters), and valence to predict English Lexicon Project (ELP) lexical decision (LD) times, British Lexicon Project (BLP) lexical decision times, and ELP naming times
| | ELP LD | BLP LD | BLP LD | ELP Naming |
|---|---|---|---|---|
| SUBTLEX frequency | – 0.302*** (0.00277) | – 0.417*** (0.00205) | | – 0.290*** (0.00215) |
| BNC frequency | | | – 0.534*** (0.00159) | |
| stimlen | 0.199*** (0.000965) | – 0.032 (0.00105) | 0.009 (0.000945) | 0.248*** (0.000750) |
| valence_stim | – 0.165*** (0.00144) | – 0.218*** (0.00126) | – 0.151*** (0.00117) | – 0.049 (0.00112) |
| valence_c1 | – 0.054 (0.00136) | – 0.022 (0.00116) | – 0.015 (0.00108) | – 0.055 (0.00106) |
| valence_c2 | – 0.003 (0.00151) | 0.087** (0.00137) | 0.046 (0.00127) | – 0.004 (0.00117) |
| N | 1,076 | 950 | 950 | 1,076 |
| adj. R-sq | 0.187 | 0.223 | 0.334 | 0.168 |
| AIC | – 3,605.7 | – 3,567.7 | – 3,716.5 | – 4,148.0 |
Standardized beta coefficients; standard errors are in parentheses. *p < .05, **p < .01, ***p < .001.
List of variables in LADEC
| Variable Name | Variable Description |
|---|---|
| id_master | id |
| c1 | first constituent |
| c2 | second constituent |
| stim | compound |
| obs | observation number by compound |
| obsc1 | observation number by first constituent |
| obsc2 | observation number by second constituent |
| stimlen | length of compound |
| c1len | length of first constituent |
| c2len | length of second constituent |
| nparses | number of parses per compound |
| correctParse | correct parse? (1 = |
| ratingcmp | Predictability rating |
| ratingC1 | Meaning retention rating for first constituent |
| ratingC2 | Meaning retention rating for second constituent |
| isPlural | is plural? (1 = |
| nc1_cmp | first constituent, family size based on correctly parsed compounds |
| nc2_cmp | second constituent, family size based on correctly parsed compounds |
| nc1_cmpnoplural | first constituent, family size based on correctly parsed compounds, no plurals |
| nc2_cmpnoplural | second constituent, family size based on correctly parsed compounds, no plurals |
| sentiment_stim | sentiment stim (Mathematica classifier) |
| sentiment_c1 | sentiment first constituent (Mathematica classifier) |
| sentiment_c2 | sentiment second constituent (Mathematica classifier) |
| sentimentprobpos_stim | probability of positive sentiment, stimulus (Mathematica classifier) |
| sentimentprobpos_c1 | probability of positive sentiment, first constituent (Mathematica classifier) |
| sentimentprobpos_c2 | probability of positive sentiment, second constituent (Mathematica classifier) |
| sentimentprobneg_stim | probability of negative sentiment, stimulus (Mathematica classifier) |
| sentimentprobneg_c1 | probability of negative sentiment, first constituent (Mathematica classifier) |
| sentimentprobneg_c2 | probability of negative sentiment, second constituent (Mathematica classifier) |
| sentimentratioposneg_stim | ratio of probPos/probNeg, stimulus (Mathematica sentiment classifier) |
| sentimentratioposneg_c1 | ratio of probPos/probNeg, first constituent (Mathematica sentiment classifier) |
| sentimentratioposneg_c2 | ratio of probPos/probNeg, second constituent (Mathematica sentiment classifier) |
| profanity_stim | is stimulus profane? (Mathematica classifier) |
| profanity_c1 | is first constituent profane? (Mathematica classifier) |
| profanity_c2 | is second constituent profane? (Mathematica classifier) |
| isCommonstim | is stimulus in Mathematica_CommonList? |
| isCommonC1 | Is first constituent in Mathematica's list of common words? (1 = |
| isCommonC2 | Is second constituent in Mathematica's list of common words? (1 = |
| bg_boundary | bigram at c1c2 boundary |
| bgJonesMewhort | bigram frequency from Jones & Mewhort (2004) |
| bgSUBTLEX | bigram frequency from SUBTLEX-US |
| bgFacebook | bigram frequency from Facebook |
| inSUBTLEX | compound in SUBTLEX-US? (1 = |
| inBLP | compound in BLP? (1 = |
| inELP | compound in ELP? (1 = |
| inJuhaszLaiWoodcock | compound in Juhasz, Lai, & Woodcock ( |
| c1_inELP | first constituent in ELP? (1 = |
| c1_inBrysbaert | first constituent in Brysbaert, Warriner, & Kuperman ( |
| c1_inWordnet | first constituent in Wordnet? (1 = |
| c1_inMMA | first constituent in Mathematica? (1 = |
| c2_inELP | second constituent in ELP? (1 = |
| c2_inBrysbaert | second constituent in Brysbaert, Warriner, & Kuperman ( |
| c2_inWordnet | second constituent in Wordnet? (1 = |
| c2_inMMA | second constituent in Mathematica? (1 = |
| LSAc1c2 | LSA between first constituent and second constituent |
| LSAc1stim | LSA between first constituent and compound |
| LSAc2stim | LSA between second constituent and compound |
| stim_SLlg10wf | SUBTLEX-US log word frequency for compound |
| C1_SLlg10wf | SUBTLEX-US log word frequency for first constituent |
| C2_SLlg10wf | SUBTLEX-US log word frequency for second constituent |
| BLPbncfrequency | BNC word frequency of the compound from the BLP |
| BLPbncfrequencymillion | BNC word frequency per million of the compound from the BLP |
| c1_BLPbncfrequency | BNC word frequency of the first constituent from the BLP |
| c1_BLPbncfrequencymillion | BNC word frequency per million of the first constituent from the BLP |
| c2_BLPbncfrequency | BNC word frequency of the second constituent from the BLP |
| c2_BLPbncfrequencymillion | BNC word frequency per million of the second constituent from the BLP |
| BLPrt | BLP lexical decision times |
| elp_ld_rt | ELP lexical decision times |
| elp_naming_mean_rt | ELP naming times |
| c1c2_snautCos | first and second constituent cosine distance SNAUT |
| c1stim_snautCos | first constituent and compound cosine distance SNAUT |
| c2stim_snautCos | second constituent and compound cosine distance SNAUT |
| fbusfreq | frequency per billion words in the American corpus |
| fbukfreq | frequency per billion words in the British corpus |
| valence_stim | valence of compound in Warriner, Kuperman, & Brysbaert (2013) |
| valence_c1 | valence of first constituent in Warriner, Kuperman, & Brysbaert (2013) |
| valence_c2 | valence of second constituent in Warriner, Kuperman, & Brysbaert (2013) |
| concreteness_stim | concreteness of compound in Brysbaert, Warriner, & Kuperman (2014) |
| concreteness_c1 | concreteness of first constituent in Brysbaert, Warriner, & Kuperman (2014) |
| concreteness_c2 | concreteness of second constituent in Brysbaert, Warriner, & Kuperman (2014) |
| Juhasz_tran (2015_trans) | Transparency rating from Juhasz, Lai, & Woodcock (2015) |
| st_c1_mean (2018_ratingC1) | Rating for first constituent from Kim et al. (2018) |
| st_c2_mean (2018_ratingC2) | Rating for second constituent from Kim et al. (2018) |
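The variable list above can support stimulus selection of the kind the abstract mentions. The following is a minimal sketch, assuming the database has been exported to CSV with these column names, that missing values appear as empty strings, and that `correctParse` and `isPlural` are coded 1 for yes and 0 for no (the coding is truncated in the table, so this is a guess). The rows are invented for illustration and are not LADEC values; `select_items` is a hypothetical helper.

```python
import csv
import io

# Invented toy rows; only the column names come from the variable list.
TOY_CSV = """stim,c1,c2,correctParse,isPlural,ratingcmp,elp_ld_rt
snowball,snow,ball,1,0,80.0,650
snowballs,snow,balls,1,1,78.0,
hogwash,hog,wash,1,0,20.0,700
season,sea,son,0,0,10.0,
"""


def select_items(rows, min_rating=None):
    """Keep correctly parsed, non-plural compounds that have an ELP lexical decision time.

    If `min_rating` is given, also require ratingcmp to be at least that value.
    """
    kept = []
    for row in rows:
        if row["correctParse"] != "1" or row["isPlural"] != "0":
            continue
        if row["elp_ld_rt"] == "":
            continue
        if min_rating is not None and float(row["ratingcmp"]) < min_rating:
            continue
        kept.append(row["stim"])
    return kept


rows = list(csv.DictReader(io.StringIO(TOY_CSV)))
all_items = select_items(rows)
transparent_items = select_items(rows, min_rating=50.0)
print(all_items)
print(transparent_items)
```

Filtering on `correctParse` first mirrors the distinction between correctly and incorrectly parsed items drawn in the descriptive statistics; the rating threshold of 50 is arbitrary.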