Zahra Sadeghi, James L. McClelland, Paul Hoffman.
Abstract
An influential position in lexical semantics holds that semantic representations for words can be derived through analysis of patterns of lexical co-occurrence in large language corpora. Firth (1957) famously summarised this principle as "you shall know a word by the company it keeps". We explored whether the same principle could be applied to non-verbal patterns of object co-occurrence in natural scenes. We performed latent semantic analysis (LSA) on a set of photographed scenes in which all of the objects present had been manually labelled. This resulted in a representation of objects in a high-dimensional space in which similarity between two objects indicated the degree to which they appeared in similar scenes. These representations revealed similarities among objects belonging to the same taxonomic category (e.g., items of clothing) as well as cross-category associations (e.g., between fruits and kitchen utensils). We also compared representations generated from this scene dataset with two established methods for elucidating semantic representations: (a) a published database of semantic features generated verbally by participants and (b) LSA applied to a linguistic corpus in the usual fashion. Statistical comparisons of the three methods indicated significant associations between the structures revealed by each method, with the scene dataset displaying greater convergence with feature-based representations than did LSA applied to linguistic data. The results indicate that information about the conceptual significance of objects can be extracted from their patterns of co-occurrence in natural environments, opening the possibility for such data to be incorporated into existing models of conceptual representation.
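The scene-based LSA pipeline described in the abstract can be sketched as follows. The toy object-by-scene count matrix, the log weighting, and the choice of two latent dimensions are all illustrative assumptions, not the paper's actual settings: a real analysis would count the manually labelled objects across the photographed scenes.

```python
import numpy as np

# Hypothetical object-by-scene count matrix (rows = objects, columns = scenes).
objects = ["apple", "knife", "shirt", "trousers"]
X = np.array([
    [3, 0, 2, 0],  # apple: appears in kitchen-like scenes
    [2, 0, 1, 0],  # knife: co-occurs with apple
    [0, 4, 0, 1],  # shirt: appears in wardrobe-like scenes
    [0, 3, 0, 2],  # trousers: co-occurs with shirt
], dtype=float)

# Weight raw counts (a log transform is one common LSA choice) and factorise.
Xw = np.log1p(X)
U, S, Vt = np.linalg.svd(Xw, full_matrices=False)

k = 2                        # retain k latent dimensions (illustrative choice)
vectors = U[:, :k] * S[:k]   # object representations in the latent space

def cosine(a, b):
    """Cosine similarity between two object vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Objects from the same scene context end up closer in the latent space.
sim_apple_knife = cosine(vectors[0], vectors[1])
sim_apple_shirt = cosine(vectors[0], vectors[2])
print(sim_apple_knife > sim_apple_shirt)
```

The printed comparison illustrates the paper's core claim in miniature: objects that keep the same "company" across scenes (apple and knife) receive more similar latent representations than objects from disjoint scene types (apple and shirt).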
Keywords: Categorisation; Latent semantic analysis; Object knowledge; Semantic representation
Year: 2014 PMID: 25196838 PMCID: PMC4589736 DOI: 10.1016/j.neuropsychologia.2014.08.031
Source DB: PubMed Journal: Neuropsychologia ISSN: 0028-3932 Impact factor: 3.139
Fig. 1. Examples of three images and their object lists from the SUN database.
Fig. 2. Similarity matrix for a selection of objects in the feature dataset. Colour scale indicates the cosine similarity between pairs of objects (1=identical and 0=no similarity). Objects are ordered according to results of a hierarchical clustering algorithm applied to the data. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
Fig. 3. Similarity matrix for a selection of objects in the scene dataset.
Fig. 4. Similarity matrix for a selection of objects in the verbal LSA dataset.
Correlations between similarity matrices derived from each dataset.
| | Feature dataset | Scene dataset | Verbal LSA dataset |
|---|---|---|---|
| Feature dataset | 1 | | |
| Scene dataset | 0.29 | 1 | |
| Verbal LSA dataset | 0.23 | 0.30 | 1 |
p<0.001.
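Correlations like those in the table above are computed between pairwise-similarity matrices, not raw vectors. A minimal sketch, assuming two hypothetical symmetric similarity matrices: vectorise the strict lower triangle of each (so every object pair counts once and the trivial self-similarity diagonal is excluded), then take the Pearson correlation.

```python
import numpy as np

# Two hypothetical object-by-object similarity matrices (symmetric, unit
# diagonal), standing in for e.g. the feature and scene similarity structures.
A = np.array([[1.0, 0.8, 0.1],
              [0.8, 1.0, 0.2],
              [0.1, 0.2, 1.0]])
B = np.array([[1.0, 0.7, 0.3],
              [0.7, 1.0, 0.1],
              [0.3, 0.1, 1.0]])

# Strict lower triangle: each object pair appears once, diagonal excluded.
idx = np.tril_indices_from(A, k=-1)
a, b = A[idx], B[idx]

# Pearson correlation between the two sets of pairwise similarities.
r = np.corrcoef(a, b)[0, 1]
print(round(r, 3))
```

Because the entries of a similarity matrix are not independent observations, significance for such matrix correlations is usually assessed with a permutation (Mantel-type) test rather than the standard parametric p value.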
Results of multiple regression analysis predicting feature dataset similarities from the other two datasets.
| Predictor | B | Standard error | β | t |
|---|---|---|---|---|
| Scene dataset | 0.13 | 0.006 | 0.25 | 21.4 |
| Verbal LSA dataset | 0.11 | 0.008 | 0.16 | 13.8 |
p<0.001.
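The regression reported above can be reproduced in sketch form with ordinary least squares: feature-dataset similarities regressed simultaneously on scene and verbal-LSA similarities. The simulated data below (sample size, effect sizes, noise level) are illustrative assumptions, not the paper's values; the point is only the mechanics of obtaining coefficients, standard errors, and t statistics.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200  # hypothetical number of object pairs

# Simulated pairwise similarities: "feature" similarities depend on both
# predictors plus noise (all values illustrative).
scene = rng.normal(size=n)
verbal = rng.normal(size=n)
feature = 0.25 * scene + 0.15 * verbal + rng.normal(scale=0.5, size=n)

# Ordinary least squares with an intercept column.
X = np.column_stack([np.ones(n), scene, verbal])
coef, *_ = np.linalg.lstsq(X, feature, rcond=None)

# Standard errors and t statistics from the residual variance.
resid = feature - X @ coef
dof = n - X.shape[1]
sigma2 = resid @ resid / dof
se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
t = coef / se
print(coef[1:], t[1:])  # slopes for the two predictors and their t values
```

Both predictors contributing independently (both t values large) mirrors the table's finding that scene and verbal-LSA similarities each explain unique variance in the feature-based similarities.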
Correlations of similarity matrices for scene and verbal LSA datasets with similarity matrices generated from each type of semantic feature separately.
| Feature type | Scene dataset | Verbal LSA dataset |
|---|---|---|
| Perceptual | 0.18 | 0.14 |
| Functional | 0.20 | 0.19 |
| Encyclopaedic | 0.30 | 0.21 |
p<0.001.