Anita Peti-Stantić, Maja Anđel, Vedrana Gnjidić, Gordana Keresteš, Nikola Ljubešić, Irina Masnikosa, Mirjana Tonković, Jelena Tušek, Jana Willer-Gold, Mateusz-Milan Stanojević.
Abstract
Psycholinguistic databases containing ratings of concreteness, imageability, age of acquisition, and subjective frequency are used in psycholinguistic and neurolinguistic studies which require words as stimuli. Linguistic characteristics (e.g. word length, corpus frequency) are frequently coded, but word class is seldom systematically treated, although there are indications of its significance for imageability and concreteness. This paper presents the Croatian Psycholinguistic Database (CPD; available at: https://doi.org/10.17234/megahr.2019.hpb ), containing 6000 Croatian nouns, verbs, adjectives and adverbs, rated for concreteness, imageability, age of acquisition, and subjective frequency. Moreover, we present computationally obtained extrapolations of concreteness and imageability to the remainder of the Croatian lexicon (available at: https://github.com/megahr/lexicon/blob/master/predictions/hr_c_i.predictions.txt ). In the two studies presented here, we explore the significance of word class for concreteness and imageability in human and computationally obtained ratings. The observed correlations in the CPD indicate correspondences between psycholinguistic measures expected from the literature. Word classes exhibit differences in subjective frequency, age of acquisition, concreteness and imageability, with significant differences between nouns, verbs, adjectives and adverbs. In the computational study which focused on concreteness and imageability, concreteness obtained higher correlations with human ratings than imageability, and the system underpredicted the concreteness of nouns, and overpredicted the concreteness of adjectives and adverbs. Overall, this suggests that word class contains schematic conceptual and distributional information. Schematic conceptual content seems to be more significant in human ratings of concreteness and less significant in computationally obtained ratings, where distributional information seems to play a more significant role. 
This suggests that word class differences should be theoretically explored.
Keywords: Age of acquisition; Computational modeling; Concreteness; Croatian psycholinguistic database; Imageability; Subjective frequency
Year: 2021 PMID: 33904142 PMCID: PMC8367916 DOI: 10.3758/s13428-020-01533-x
Source DB: PubMed Journal: Behav Res Methods ISSN: 1554-351X
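The under- and overprediction pattern reported in the abstract can be made concrete with a small sketch. It is not the authors' evaluation code: the function names (`pearson`, `mean_signed_error_by_class`) and the toy ratings are invented for illustration only. The idea is that a negative mean signed error (predicted minus human) for a word class indicates underprediction, and a positive one indicates overprediction.

```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def mean_signed_error_by_class(rows):
    """Average (predicted - human) per word class.

    Negative values mean the system underpredicts that class,
    positive values mean it overpredicts.
    """
    errors = defaultdict(list)
    for word_class, human, predicted in rows:
        errors[word_class].append(predicted - human)
    return {wc: sum(errs) / len(errs) for wc, errs in errors.items()}


# Made-up toy ratings on a 1-5 scale: (word class, human, predicted).
# These are NOT values from the CPD; they only illustrate the pattern.
toy_rows = [
    ("noun", 4.6, 4.1),
    ("noun", 3.9, 3.6),
    ("noun", 2.1, 2.0),
    ("adjective", 2.4, 2.9),
    ("adjective", 3.1, 3.3),
    ("adverb", 2.0, 2.6),
    ("adverb", 2.8, 3.0),
]

bias = mean_signed_error_by_class(toy_rows)
r = pearson([h for _, h, _ in toy_rows], [p for _, _, p in toy_rows])

for word_class, value in bias.items():
    direction = "underpredicted" if value < 0 else "overpredicted"
    print(f"{word_class}: mean signed error {value:+.2f} ({direction})")
print(f"overall Pearson r = {r:.2f}")
```

Reporting the correlation and the per-class signed error side by side is useful because a system can correlate well with human ratings overall and still be systematically biased for individual word classes.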
Number of Raters for Each Psycholinguistic Category
| Category | Total M | Total SD | Total Min. | Total Max. | Female M | Female SD | Male M | Male SD |
|---|---|---|---|---|---|---|---|---|
| Concreteness | 30.20 | 1.35 | 20 | 36 | 21.98 | 3.83 | 6.90 | 3.39 |
| Imageability | 30.05 | 0.86 | 22 | 35 | 23.06 | 3.82 | 5.43 | 2.95 |
| Age of acquisition | 29.72 | 1.21 | 17 | 34 | 22.80 | 3.74 | 5.40 | 2.93 |
| Subjective frequency | 30.16 | 1.34 | 19 | 36 | 21.94 | 3.83 | 6.89 | 3.40 |
Descriptive statistics for 6000 words from the CPD
| Variable | Word type | N | M | SD | Min. | Max. | 1st Quartile | Median | 3rd Quartile |
|---|---|---|---|---|---|---|---|---|---|
| Concreteness | Nouns | 2617 | 3.62 | 0.85 | 1.27 | 5.00 | 2.97 | 3.63 | 4.37 |
| | Verbs | 1571 | 3.21 | 0.71 | 1.43 | 4.86 | 2.67 | 3.17 | 3.73 |
| | Adjectives | 1554 | 2.92 | 0.69 | 1.17 | 4.83 | 2.43 | 2.83 | 3.40 |
| | Adverbs | 258 | 2.66 | 0.64 | 1.33 | 4.43 | 2.20 | 2.59 | 3.13 |
| | Total | 6000 | 3.29 | 0.83 | 1.17 | 5.00 | 2.66 | 3.23 | 3.90 |
| Imageability | Nouns | 2617 | 3.84 | 0.77 | 1.29 | 5.00 | 3.27 | 3.90 | 4.53 |
| | Verbs | 1571 | 3.58 | 0.67 | 1.70 | 4.97 | 3.07 | 3.60 | 4.10 |
| | Adjectives | 1554 | 3.37 | 0.69 | 1.24 | 4.97 | 2.87 | 3.33 | 3.87 |
| | Adverbs | 258 | 3.09 | 0.66 | 1.76 | 4.77 | 2.63 | 2.97 | 3.60 |
| | Total | 6000 | 3.62 | 0.75 | 1.24 | 5.00 | 3.04 | 3.63 | 4.23 |
| Age of acquisition | Nouns | 2617 | 8.36 | 2.53 | 2.24 | 16.72 | 6.37 | 8.30 | 10.21 |
| | Verbs | 1571 | 7.80 | 2.16 | 3.00 | 14.43 | 6.10 | 7.67 | 9.35 |
| | Adjectives | 1554 | 8.84 | 2.23 | 3.70 | 15.27 | 7.23 | 8.86 | 10.40 |
| | Adverbs | 258 | 7.46 | 2.45 | 3.50 | 14.70 | 5.47 | 6.92 | 9.42 |
| | Total | 6000 | 8.30 | 2.39 | 2.24 | 16.72 | 6.40 | 8.27 | 10.03 |
| Subjective frequency | Nouns | 2617 | 3.24 | 0.78 | 1.14 | 4.97 | 2.67 | 3.23 | 3.83 |
| | Verbs | 1571 | 3.46 | 0.69 | 1.17 | 5.00 | 3.00 | 3.47 | 3.99 |
| | Adjectives | 1554 | 3.18 | 0.69 | 1.30 | 4.90 | 2.70 | 3.17 | 3.67 |
| | Adverbs | 258 | 3.91 | 0.81 | 1.30 | 5.00 | 3.50 | 4.10 | 4.52 |
| | Total | 6000 | 3.31 | 0.75 | 1.14 | 5.00 | 2.77 | 3.32 | 3.87 |
| Frequency | Nouns | 2617 | 66,077.88 | 162,126.28 | 2.00 | 4.075e+6 | 4927.00 | 15,532.00 | 62,690.00 |
| | Verbs | 1571 | 65,243.29 | 254,388.35 | 11.00 | 5.663e+6 | 4756.00 | 11,934.00 | 46,015.00 |
| | Adjectives | 1554 | 49,090.40 | 172,019.94 | 1.00 | 3.582e+6 | 4473.25 | 10,461.50 | 35,283.00 |
| | Adverbs | 258 | 215,711.79 | 413,298.93 | 4.00 | 3.077e+6 | 16,827.50 | 71,288.00 | 223,390.75 |
| | Total | 6000 | 67,893.86 | 210,739.43 | 1.00 | 5.663e+6 | 4804.50 | 13,495.50 | 54,260.75 |
| Length | Nouns | 2617 | 7.23 | 2.46 | 2.00 | 18.00 | 5.00 | 7.00 | 9.00 |
| | Verbs | 1571 | 8.74 | 2.07 | 3.00 | 18.00 | 7.00 | 9.00 | 10.00 |
| | Adjectives | 1554 | 8.23 | 2.35 | 3.00 | 16.00 | 7.00 | 8.00 | 10.00 |
| | Adverbs | 258 | 6.58 | 2.05 | 2.00 | 12.00 | 5.00 | 6.00 | 8.00 |
| | Total | 6000 | 7.86 | 2.42 | 2.00 | 18.00 | 6.00 | 8.00 | 9.00 |
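As a consistency check on the descriptive statistics, each "Total" mean should equal the N-weighted mean of the four word class means, up to rounding. The sketch below uses only the N and M values from the table; the helper name `pooled_mean` and the 0.01 tolerance (chosen to absorb two-decimal rounding) are assumptions made for this illustration.

```python
# (N, M) per word class in the order nouns, verbs, adjectives, adverbs,
# plus the reported Total mean, all copied from the table above.
TABLE = {
    "Concreteness": ([(2617, 3.62), (1571, 3.21), (1554, 2.92), (258, 2.66)], 3.29),
    "Imageability": ([(2617, 3.84), (1571, 3.58), (1554, 3.37), (258, 3.09)], 3.62),
    "Age of acquisition": ([(2617, 8.36), (1571, 7.80), (1554, 8.84), (258, 7.46)], 8.30),
    "Subjective frequency": ([(2617, 3.24), (1571, 3.46), (1554, 3.18), (258, 3.91)], 3.31),
    "Length": ([(2617, 7.23), (1571, 8.74), (1554, 8.23), (258, 6.58)], 7.86),
}


def pooled_mean(groups):
    """N-weighted mean of group means: sum(n_i * m_i) / sum(n_i)."""
    total_n = sum(n for n, _ in groups)
    return sum(n * m for n, m in groups) / total_n


# Map each variable to (pooled mean, reported Total mean).
results = {
    variable: (pooled_mean(groups), reported)
    for variable, (groups, reported) in TABLE.items()
}

for variable, (pooled, reported) in results.items():
    # Tolerance of 0.01 allows for the two-decimal rounding of the inputs.
    status = "ok" if abs(pooled - reported) < 0.01 else "MISMATCH"
    print(f"{variable}: pooled {pooled:.4f}, reported {reported:.2f} ({status})")
```

The same check cannot be applied to the SD, quartile, or median columns, because those statistics are not simple weighted averages of the per-class values.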
Fig. 1 Correlations between psycholinguistic and linguistic features of words in the CPD
| 1 apstraktno | 2 | 3 | 4 | 5 konkretno |
|---|---|---|---|---|
| pravda | | | | tuljan |
| smisao | | | | majica |
| morati | | | | jesti |
| koncipirati | | | | plivati |
| poetski | | | | slan |
| slobodan | | | | drven |

| 1 Gotovo nikada | 2 | 3 | 4 | 5 Jednom ili više puta dnevno |
|---|---|---|---|---|

| 1 abstract | 2 | 3 | 4 | 5 concrete |
|---|---|---|---|---|
| justice | | | | seal |
| meaning | | | | T-shirt |
| must | | | | eat |
| conceptualize | | | | swim |
| poetic | | | | salty |
| free | | | | wooden |

| 1 Almost never | 2 | 3 | 4 | 5 Once or several times a day |
|---|---|---|---|---|

| 1 - nisko predočivo | 2 | 3 | 4 | 5 - visoko predočivo |
|---|---|---|---|---|

| 1 - low imageability | 2 | 3 | 4 | 5 - high imageability |
|---|---|---|---|---|