Siying Xie, Daniel Kaiser, Radoslaw M Cichy.
Abstract
To behave adaptively with sufficient flexibility, biological organisms must cognize beyond immediate reactions to physically present stimuli. For this, humans use visual mental imagery [1, 2], the ability to conjure up a vivid internal experience from memory that stands in for the percept of a stimulus. Visually imagined contents subjectively mimic perceived contents, suggesting that imagery and perception share common neural mechanisms. Using multivariate pattern analysis on human electroencephalography (EEG) data, we compared the oscillatory time courses of mental imagery and perception of objects. We found that representations shared between imagery and perception emerged specifically in the alpha frequency band. These representations were present in posterior, but not anterior, electrodes, suggesting an origin in parieto-occipital cortex. Comparison of the shared representations to computational models using representational similarity analysis revealed a relationship to later layers of deep neural networks trained on object representations, but not to auditory or semantic models, suggesting that representations of complex visual features form the basis of this commonality. Together, our results identify and characterize alpha oscillations as a cortical signature of representations shared between visual mental imagery and perception.
Keywords: deep neural networks; feedback; mental imagery; object perception; oscillations
Year: 2020; PMID: 32531274; PMCID: PMC7342016; DOI: 10.1016/j.cub.2020.04.074
Source DB: PubMed; Journal: Curr Biol; ISSN: 0960-9822; Impact factor: 10.834
Figure 1. Methods and Results of Multivariate Classification Analyses
(A) Stimuli were a diverse set of twelve object images and twelve spoken words denoting these objects.
(B) In the perception task, participants viewed the object images in random order.
(C) In the mental imagery task, participants were cued to imagine an object by hearing the spoken word denoting the object.
(D) EEG data recorded from 64 electrodes during both tasks were epoched into trials and subjected to time-frequency decomposition using Morlet wavelets. This was done separately for each single trial and each electrode, yielding a trial-wise representation of induced oscillatory power. We aggregated these time-frequency data into three frequency bands (theta: 5–7 Hz; alpha: 8–13 Hz; beta: 14–31 Hz). Averaging across all frequencies within each band yielded a time- and frequency-resolved response vector (across EEG sensors) for each trial. These response vectors were entered into multivariate pattern analyses.
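The per-trial decomposition in (D) can be sketched with NumPy alone. This is a minimal illustration, not the Brainstorm pipeline the authors used: the wavelet width (`n_cycles=7`), the 1-Hz frequency grid from 5 to 31 Hz, and all function names are assumptions. Power is computed for each trial and electrode, averaged over the frequencies inside each band, and stacked across electrodes so that every time point yields one response vector.

```python
import numpy as np

# Frequency bands from the caption (Hz, inclusive bounds).
BANDS = {"theta": (5, 7), "alpha": (8, 13), "beta": (14, 31)}


def morlet_power(signal, sfreq, freqs, n_cycles=7.0):
    """Power of one trial at one electrode, shape (len(freqs), len(signal)).

    Assumption: n_cycles=7 is an illustrative wavelet width, not the paper's value.
    """
    n = len(signal)
    power = np.empty((len(freqs), n))
    for i, f in enumerate(freqs):
        sigma_t = n_cycles / (2 * np.pi * f)  # SD of the Gaussian envelope (s)
        half = int(np.ceil(3 * sigma_t * sfreq))
        t = np.arange(-half, half + 1) / sfreq  # symmetric, odd-length support
        wavelet = np.exp(2j * np.pi * f * t) * np.exp(-t**2 / (2 * sigma_t**2))
        wavelet /= np.sqrt(np.sum(np.abs(wavelet) ** 2))  # unit energy
        full = np.convolve(signal, wavelet, mode="full")
        start = (len(wavelet) - 1) // 2  # centre the output on the input samples
        power[i] = np.abs(full[start:start + n]) ** 2
    return power


def band_average(power, freqs, bands=BANDS):
    """Average power over all frequencies inside each band."""
    freqs = np.asarray(freqs)
    return {name: power[(freqs >= lo) & (freqs <= hi)].mean(axis=0)
            for name, (lo, hi) in bands.items()}


def trial_response_vectors(trial, sfreq, freqs=np.arange(5, 32)):
    """trial: (n_electrodes, n_times) -> {band: (n_electrodes, n_times)}.

    Column t of each array is the response vector across sensors at time t.
    """
    per_electrode = [band_average(morlet_power(x, sfreq, freqs), freqs)
                     for x in trial]
    return {band: np.stack([d[band] for d in per_electrode]) for band in BANDS}
```

Each column of the returned arrays plays the role of one response vector entered into the pattern analyses.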
(E) Multivariate pattern classification was performed separately for each frequency band. As perception and imagery need not emerge with similar temporal dynamics, we performed a time-generalization analysis in which we considered timing in the perception and imagery tasks independently. For every time point combination during perception (0–800 ms with respect to image onset) and imagery (0–2,500 ms with respect to word onset) separately, we conducted a pairwise cross-classification analysis where we trained support vector machine (SVM) classifiers to discriminate between response patterns for two different objects (here: car versus apple) when they were imagined and tested these classifiers on response patterns for the same two objects when they were perceived (and vice versa). We averaged classification accuracies for all pairwise classification analyses between objects, yielding a single time-generalization matrix for each frequency band. These matrices depict the temporal dynamics of representations shared between imagery and perception.
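The time-generalized cross-classification in (E) could look like the sketch below. It is hedged in several ways: a small linear soft-margin SVM fitted by subgradient descent stands in for the LIBSVM classifier listed in the resources table, features are z-scored with training-set statistics (an assumed preprocessing step), and the array layout, trial counts, and function names are hypothetical.

```python
import itertools
import numpy as np


def train_linear_svm(X, y, lam=0.01, n_iter=200, lr=0.1):
    """Soft-margin linear SVM (labels +1/-1) fitted by full-batch subgradient
    descent on lam/2 * ||w||^2 + mean hinge loss. Stand-in for LIBSVM."""
    w, b = np.zeros(X.shape[1]), 0.0
    for it in range(1, n_iter + 1):
        active = y * (X @ w + b) < 1  # samples inside or beyond the margin
        grad_w = lam * w - (y[active][:, None] * X[active]).sum(axis=0) / len(y)
        grad_b = -y[active].sum() / len(y)
        step = lr / np.sqrt(it)  # decaying step size
        w, b = w - step * grad_w, b - step * grad_b
    return w, b


def cross_decode(train_a, train_b, test_a, test_b):
    """Train on objects a vs b in one task, test on the same objects in the
    other task. Inputs are (n_trials, n_electrodes); returns test accuracy."""
    X = np.vstack([train_a, train_b])
    y = np.r_[np.ones(len(train_a)), -np.ones(len(train_b))]
    # Assumption: z-score with training statistics, applied to the test set too.
    mu, sd = X.mean(axis=0), X.std(axis=0) + 1e-12
    w, b = train_linear_svm((X - mu) / sd, y)
    Xt = np.vstack([test_a, test_b])
    yt = np.r_[np.ones(len(test_a)), -np.ones(len(test_b))]
    return np.mean(np.sign(((Xt - mu) / sd) @ w + b) == yt)


def time_generalization(imag, perc):
    """imag: (n_objects, n_trials, n_electrodes, n_t_imag) band-power patterns;
    perc: (n_objects, n_trials, n_electrodes, n_t_perc).
    Returns an (n_t_perc, n_t_imag) accuracy matrix averaged over all object
    pairs and both train/test directions."""
    n_tp, n_ti = perc.shape[-1], imag.shape[-1]
    pairs = list(itertools.combinations(range(imag.shape[0]), 2))
    acc = np.zeros((n_tp, n_ti))
    for tp in range(n_tp):
        for ti in range(n_ti):
            scores = []
            for i, j in pairs:
                im_i, im_j = imag[i, :, :, ti], imag[j, :, :, ti]
                pe_i, pe_j = perc[i, :, :, tp], perc[j, :, :, tp]
                scores.append(cross_decode(im_i, im_j, pe_i, pe_j))  # imagery -> perception
                scores.append(cross_decode(pe_i, pe_j, im_i, im_j))  # perception -> imagery
            acc[tp, ti] = np.mean(scores)
    return acc
```

Looping over every (perception, imagery) time pair is what allows the two tasks to unfold with different temporal dynamics; restricting the analysis to the diagonal would assume they are aligned in time.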
(F) We found significant cross-classification in the alpha frequency band, ranging from 200 to 660 ms in perception and from 600 to 2,280 ms in imagery. Peak decoding latency was at 480 ms (95% confidence intervals: 479–485 ms) in perception and 1,340 ms (95% confidence intervals: 1,324–1,346 ms) in imagery.
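Peak latencies with confidence intervals, as reported in (F), are commonly obtained by bootstrapping participants. The caption does not state the procedure, so the sketch below assumes resampling participants with replacement and taking 95% percentile intervals of the peak position; the time grids, `n_boot`, and function names are hypothetical.

```python
import numpy as np


def peak_times(mean_acc, times_perc, times_imag):
    """Perception and imagery times of the maximum of a time-generalization matrix."""
    tp, ti = np.unravel_index(np.argmax(mean_acc), mean_acc.shape)
    return times_perc[tp], times_imag[ti]


def bootstrap_peak_ci(acc, times_perc, times_imag, n_boot=1000, seed=0):
    """acc: (n_subjects, n_t_perc, n_t_imag).

    Assumption: participant bootstrap with percentile CIs (not stated in the
    caption). Returns the observed peak and 95% intervals for both time axes."""
    rng = np.random.default_rng(seed)
    n_sub = acc.shape[0]
    observed = peak_times(acc.mean(axis=0), times_perc, times_imag)
    boots = np.array([
        peak_times(acc[rng.integers(0, n_sub, size=n_sub)].mean(axis=0),
                   times_perc, times_imag)
        for _ in range(n_boot)
    ])
    ci_perc = np.percentile(boots[:, 0], [2.5, 97.5])
    ci_imag = np.percentile(boots[:, 1], [2.5, 97.5])
    return observed, ci_perc, ci_imag
```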
(G) To spatially localize these shared representations, we performed separate time-generalization analyses for anterior and posterior electrodes in our EEG setup. This analysis revealed significant cross-classification in the alpha band for posterior electrodes (from 20 to 800 ms during perception and from 660 to 2,500 ms during imagery), but not for anterior electrodes. This suggests that parieto-occipital alpha sources mediate the shared representations between perception and imagery. Black outlines indicate time point combinations with above-chance classification (N = 38; non-parametric sign-permutation tests; cluster-definition threshold p < 0.05; cluster threshold p < 0.05; Bonferroni corrected by 3 for the number of frequency bands tested). Dec. acc., decoding accuracy.
See also Figure S1.
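The statistics in the Figure 1 caption (sign-permutation tests with cluster-based correction) can be sketched as follows. Assumptions to flag: cluster size is used as the cluster statistic (cluster mass is an equally common choice), clusters are 4-connected, the Bonferroni factor of 3 is applied to the cluster-level threshold only, and the test is one-sided against chance (0.5 for pairwise classification). Function names are hypothetical.

```python
import numpy as np


def label_clusters(mask):
    """Label 4-connected clusters of True cells in a 2D boolean mask."""
    labels = np.zeros(mask.shape, dtype=int)
    n = 0
    for start in zip(*np.nonzero(mask)):
        if labels[start]:
            continue
        n += 1
        labels[start] = n
        stack = [start]
        while stack:
            r, c = stack.pop()
            for rr, cc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if (0 <= rr < mask.shape[0] and 0 <= cc < mask.shape[1]
                        and mask[rr, cc] and not labels[rr, cc]):
                    labels[rr, cc] = n
                    stack.append((rr, cc))
    return labels, n


def sign_permutation_cluster_test(acc, chance=0.5, n_perm=1000, p_def=0.05,
                                  p_cluster=0.05, n_bonferroni=3, seed=0):
    """acc: (n_subjects, n_rows, n_cols) decoding accuracies.
    Returns a boolean mask of cells belonging to significant clusters."""
    rng = np.random.default_rng(seed)
    data = acc - chance
    n_sub = data.shape[0]
    signs = rng.choice([-1.0, 1.0], size=(n_perm, n_sub))
    signs[0] = 1.0  # permutation 0 is the observed (unflipped) data
    null = np.einsum("ps,sij->pij", signs, data) / n_sub  # group means
    # Cluster-definition threshold: upper (1 - p_def) quantile per cell.
    supra = null > np.quantile(null, 1 - p_def, axis=0)
    # Null distribution of the largest cluster in each permutation.
    max_size = np.zeros(n_perm)
    for k in range(n_perm):
        labels, n = label_clusters(supra[k])
        if n:
            max_size[k] = np.bincount(labels.ravel())[1:].max()
    # Assumption: Bonferroni factor applied to the cluster-level threshold.
    size_thresh = np.quantile(max_size, 1 - p_cluster / n_bonferroni)
    labels, n = label_clusters(supra[0])
    sig = np.zeros(supra.shape[1:], dtype=bool)
    for lab in range(1, n + 1):
        if (labels == lab).sum() > size_thresh:
            sig |= labels == lab
    return sig
```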
Figure 2. Methods and Results of Relating Shared Representations to Computational Models
(A) We characterized the format of the representations shared between imagery and perception in the alpha frequency band by relating EEG signals to computational models using representational similarity analysis [10, 11]. For each participant, we first constructed a 12 × 12 neural representational dissimilarity matrix (RDM) that contained the pairwise cross-classification accuracy between imagery and perception for each possible object pair (data, models, and results are color-coded similarly; EEG data here in gray). This summarizes the representational geometry of the shared representations between imagery and perception in the alpha band. We then related (Spearman’s R) neural RDMs to model RDMs that captured hypotheses about the format of the shared representations: (1) a deep neural network (DNN) trained on visual object classification (VGG-19 [56]; color-coded red) to assess visual processing; (2) a category model that captures superordinate-level category membership of the objects in 4 categories (animals, body parts, plants, and man-made objects; color-coded purple) to assess semantic processing; and (3) a spectrotemporal auditory model [57] (color-coded green) and a DNN with two branches trained on musical genre and auditory word classification, respectively [58] (color-coded blue) to assess auditory processing. Visualizations of all model RDMs can be found in Figures S2A–S2C.
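The RSA step in (A) reduces to building RDMs and correlating their upper triangles. In the sketch below, model RDMs from network activations use 1 minus Pearson r between objects, which is an assumed dissimilarity measure; the neural RDM is filled with pairwise cross-classification accuracies as described; and Spearman's correlation is computed on average ranks so that tied entries (frequent in a binary category model) are handled correctly. Function names and the category labels in the test are hypothetical.

```python
import numpy as np


def model_rdm(features):
    """(n_objects, n_features) activations -> RDM of 1 - Pearson r (assumed metric)."""
    return 1.0 - np.corrcoef(features)


def category_rdm(categories):
    """0 for objects in the same category, 1 otherwise."""
    c = np.asarray(categories)
    return (c[:, None] != c[None, :]).astype(float)


def neural_rdm(pair_acc, n_objects=12):
    """Symmetric RDM from {(i, j): cross-classification accuracy}; diagonal unused."""
    rdm = np.zeros((n_objects, n_objects))
    for (i, j), a in pair_acc.items():
        rdm[i, j] = rdm[j, i] = a
    return rdm


def average_ranks(x):
    """1-based ranks; tied values share their mean rank, as Spearman's rho requires."""
    order = np.argsort(x, kind="mergesort")
    sx = x[order]
    ranks = np.empty(len(x))
    i = 0
    while i < len(x):
        j = i
        while j + 1 < len(x) and sx[j + 1] == sx[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rdm(rdm_a, rdm_b):
    """Spearman correlation between the upper triangles (diagonal excluded)."""
    iu = np.triu_indices_from(rdm_a, k=1)
    return np.corrcoef(average_ranks(rdm_a[iu]), average_ranks(rdm_b[iu]))[0, 1]
```

For one participant, `spearman_rdm(neural_rdm(pair_acc), category_rdm(labels))` would give the correlation with the category model, where `pair_acc` and `labels` are hypothetical inputs.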
(B–D) We found a significant relationship between neural and model RDMs only for the late layers of the DNN trained on visual object classification (B), but not for the semantic model (C) or the auditory models (D). Error bars represent standard errors of the mean. Asterisks indicate significant correlations between model RDMs and neural RDMs (N = 38; non-parametric sign-permutation tests; ∗p < 0.05; ∗∗p < 0.01; false discovery rate [FDR] corrected for multiple comparisons across RDMs per model).
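The FDR correction across the RDMs of a model (for instance, across DNN layers) is assumed here to be the standard Benjamini-Hochberg step-up procedure; the caption does not name the variant. A minimal sketch, with a hypothetical function name:

```python
import numpy as np


def fdr_bh(pvals, q=0.05):
    """Benjamini-Hochberg step-up procedure: boolean mask of discoveries at FDR q."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    thresholds = q * np.arange(1, m + 1) / m  # q * k / m for the k-th smallest p
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()  # largest k satisfying the bound
        reject[order[:k + 1]] = True    # reject it and every smaller p-value
    return reject
```

Because the procedure is step-up, a p-value above its own bound is still rejected when a larger p-value satisfies its bound.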
| REAGENT or RESOURCE | SOURCE | IDENTIFIER |
|---|---|---|
| Raw and analyzed data | This paper | |
| MATLAB | Mathworks Inc. | |
| Psychtoolbox | [ | |
| Brainstorm | [ | |
| LIBSVM Toolbox | [ | |
| MatConvNet MATLAB Toolbox | [ | |
| NSL MATLAB Toolbox | [ | |
| Deep neural network trained on auditory categorization | [ | |