Hendrik Vankrunkelsven¹, Steven Verheyen¹, Gert Storms¹, Simon De Deyne¹,²
Abstract
In two studies we compare a distributional semantic model derived from word co-occurrences with a word association-based model in their ability to predict properties that affect lexical processing. We focus on age of acquisition (AoA), concreteness, and three affective variables, namely valence, arousal, and dominance, because all of these variables have been shown to be fundamental to word meaning. In both studies we use a model based on data obtained in a continued free word association task to predict these variables. In Study 1 we directly compare this model to a word co-occurrence model based on syntactic dependency relations to determine which model better predicts the variables under scrutiny in Dutch. In Study 2 we replicate our findings in English and compare our results to those reported in the literature. In both studies we find that the word association-based model is well suited to predicting diverse word properties. We show that the association model is superior to the distributional model, especially in predicting affective word properties.
Keywords: affective word characteristics; age of acquisition; concreteness; k-nearest neighbors; lexical norms; word associations
Year: 2018 PMID: 31517218 PMCID: PMC6634333 DOI: 10.5334/joc.50
Source DB: PubMed Journal: J Cogn ISSN: 2514-4820
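The abstract contrasts an association-based model with a co-occurrence model. As a rough illustration of how word-to-word similarity can be derived from association data, the sketch below treats each cue as a vector of response counts and compares cues with cosine similarity. This is a simplification under stated assumptions: the toy `pairs` data and the helpers `cue_vectors` and `cosine` are hypothetical, and the association model used in the paper may weight or propagate associations differently.

```python
import math
from collections import Counter, defaultdict


def cue_vectors(pairs):
    """Map each cue to a Counter of how often each response was given."""
    vectors = defaultdict(Counter)
    for cue, response in pairs:
        vectors[cue][response] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical (cue, response) pairs pooled over participants; in a continued
# association task each participant may give several responses per cue.
pairs = [
    ("dog", "cat"), ("dog", "bark"), ("dog", "pet"),
    ("cat", "dog"), ("cat", "pet"), ("cat", "meow"),
    ("car", "road"), ("car", "drive"), ("car", "wheel"),
]
vectors = cue_vectors(pairs)
print(round(cosine(vectors["dog"], vectors["cat"]), 3))  # 0.333 (shared response "pet")
print(cosine(vectors["dog"], vectors["car"]))            # 0.0 (no shared responses)
```

Raw counts are used here for simplicity; a realistic model would typically reweight responses (for example, to down-weight very frequent responses) before comparing cues.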
Information about the lexico-semantic norms used in Studies 1 and 2: number of words, number of raters per word, and split-half reliabilities.
| Variable | Study 1 words | Study 1 raters | Study 1 reliability | Study 2 words | Study 2 raters | Study 2 reliability |
|---|---|---|---|---|---|---|
| Valenceᵃ | 4,299 | 64 | .99ᵈ | 13,915 | 20 | .91 |
| Arousalᵃ | 4,299 | 64 | .97ᵈ | 13,915 | 20 | .69 |
| Dominanceᵃ | 4,299 | 64 | .96ᵈ | 13,915 | 20 | .77 |
| AoAᵇ | 4,299 | 32 | .97ᵈ | 30,121 | 18+ | .92 |
| Concretenessᶜ | 30,070 | 15 | .91–.93ᵈ,ᵉ | 37,058 | 25+ | – |
ᵃ Norms from Moors et al. (2013) for Study 1 and from Warriner et al. (2013) for Study 2.
ᵇ Norms from Moors et al. (2013) for Study 1 and from Kuperman et al. (2012) for Study 2.
ᶜ Norms from Brysbaert, Stevens, et al. (2014) for Study 1 and from Brysbaert, Warriner, and Kuperman (2014) for Study 2.
ᵈ Spearman-Brown corrected split-half correlations computed over 10,000 random splits of the participants.
ᵉ The reliabilities of each of five lists of ca. 6,000 words fell within this range.
Figure 1. Correlations between predicted ratings and human ratings for valence, arousal, dominance, AoA, and concreteness, using association data or word co-occurrence data. Values of k are 1 to 50, 60, 70, 80, 90, and 100.
Highest correlations and 95% confidence intervals for each variable per data source (associations and text co-occurrences) using k-NN. All correlations are leave-one-out cross-validated. The corresponding value of k is given in square brackets.
| Variable | N | Associations | Word co-occurrences |
|---|---|---|---|
| Valence | 2,831 | .91 (.91–.92) [50] | .78 (.77–.80) [38] |
| Arousal | 2,831 | .84 (.83–.85) [19] | .73 (.71–.75) [8] |
| Dominance | 2,831 | .84 (.83–.85) [8] | .66 (.64–.68) [8] |
| AoA | 2,831 | .71 (.69–.73) [43] | .64 (.61–.66) [24] |
| Concreteness | 2,831 | .87 (.86–.88) [10] | .87 (.86–.88) [11] |
Figure 2. Correlations between estimated values based on the word association data and human ratings for valence, arousal, dominance, AoA, and concreteness. Values of k are 1 to 50, 60, 70, 80, 90, and 100.
Highest correlations (r), 95% confidence intervals (95% CI), and sample sizes (N) for each variable using k-NN, with the corresponding value of k. All correlations are leave-one-out cross-validated.
| Variable | N | r | 95% CI | k |
|---|---|---|---|---|
| Valence | 8,770 | .86 | (.86–.87) | 24 |
| Arousal | 8,770 | .69 | (.68–.70) | 44 |
| Dominance | 8,770 | .75 | (.74–.76) | 25 |
| AoA | 10,032 | .59 | (.58–.61) | 26 |
| Concreteness | 10,957 | .87 | (.86–.87) | 8 |
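The tables report 95% confidence intervals for the correlations without stating how they were obtained. One standard approach, assumed here, is Fisher's z transformation. For the ANEW valence row (r = .92, N = 946) it yields an interval that rounds to the reported (.91–.93); `pearson_ci` is a hypothetical helper name.

```python
import math


def pearson_ci(r, n, z_crit=1.959963984540054):
    """Approximate 95% CI for a Pearson correlation via Fisher's z transform.

    z = atanh(r) is roughly normal with standard error 1 / sqrt(n - 3);
    the interval is built on the z scale and mapped back with tanh.
    """
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


# ANEW valence row: r = .92, N = 946.
low, high = pearson_ci(0.92, 946)
print(f"({low:.2f}-{high:.2f})")  # prints (0.91-0.93)
```

Because the interval is symmetric on the z scale rather than the r scale, it is narrower above r than below it for large correlations.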
Highest correlations (r), 95% confidence intervals (95% CI), and sample sizes (N) for each variable using k-NN, with the corresponding value of k, for the ANEW (Bradley & Lang, 1999) norms. All correlations are leave-one-out cross-validated.
| Variable | N | r | 95% CI | k |
|---|---|---|---|---|
| Valence | 946 | .92 | (.91–.93) | 11 |
| Arousal | 946 | .74 | (.71–.77) | 10 |
| Dominance | 946 | .83 | (.81–.85) | 10 |
Highest correlations (r), 95% confidence intervals (95% CI), and sample sizes (N) for each variable using k-NN, with the corresponding value of k. The model was trained on the Warriner et al. (2013) norms and tested on the ANEW (Bradley & Lang, 2017) norms.
| Variable | N | r | 95% CI | k |
|---|---|---|---|---|
| Valence | 2,156 | .89 | (.88–.89) | 13 |
| Arousal | 2,156 | .71 | (.68–.73) | 24 |
| Dominance | 2,156 | .76 | (.74–.77) | 23 |