Petra Vetter1, Fraser W Smith2, Lars Muckli3. 1. Centre for Cognitive Neuroimaging, Institute of Neuroscience and Psychology, College of Medical, Veterinary and Life Sciences, University of Glasgow, 58 Hillhead Street, Glasgow G12 8QB, UK; Laboratory for Behavioral Neurology and Imaging of Cognition, Department of Neuroscience, Medical School and Swiss Center for Affective Sciences, University of Geneva, Campus Biotech, Case Postale 60, 1211 Geneva, Switzerland. Electronic address: petra.vetter@unige.ch. 2. Centre for Cognitive Neuroimaging, Institute of Neuroscience and Psychology, College of Medical, Veterinary and Life Sciences, University of Glasgow, 58 Hillhead Street, Glasgow G12 8QB, UK. 3. Centre for Cognitive Neuroimaging, Institute of Neuroscience and Psychology, College of Medical, Veterinary and Life Sciences, University of Glasgow, 58 Hillhead Street, Glasgow G12 8QB, UK. Electronic address: lars.muckli@glasgow.ac.uk.
Abstract
Human early visual cortex was traditionally thought to process simple visual features such as orientation, contrast, and spatial frequency via feedforward input from the lateral geniculate nucleus (e.g., [1]). However, the role of nonretinal influence on early visual cortex is so far insufficiently investigated despite much evidence that feedback connections greatly outnumber feedforward connections [2-5]. Here, we explored in five fMRI experiments how information originating from audition and imagery affects the brain activity patterns in early visual cortex in the absence of any feedforward visual stimulation. We show that category-specific information from both complex natural sounds and imagery can be read out from early visual cortex activity in blindfolded participants. The coding of nonretinal information in the activity patterns of early visual cortex is common across actual auditory perception and imagery and may be mediated by higher-level multisensory areas. Furthermore, this coding is robust to mild manipulations of attention and working memory but affected by orthogonal, cognitively demanding visuospatial processing. Crucially, the information fed down to early visual cortex is category specific and generalizes to sound exemplars of the same category, providing evidence for abstract information feedback rather than precise pictorial feedback. Our results suggest that early visual cortex receives nonretinal input from other brain areas when it is generated by auditory perception and/or imagery, and this input carries common abstract information. Our findings are compatible with feedback of predictive information to the earliest visual input level (e.g., [6]), in line with predictive coding models [7-10].
Human early visual cortex was traditionally thought to process simple visual features such as orientation, contrast, and spatial frequency via feedforward input from the lateral geniculate nucleus (e.g., [1]). However, the role of nonretinal influence on early visual cortex is so far insufficiently investigated despite much evidence that feedback connections greatly outnumber feedforward connections [2-5]. Here, we explored in five fMRI experiments how information originating from audition and imagery affects the brain activity patterns in early visual cortex in the absence of any feedforward visual stimulation. We show that category-specific information from both complex natural sounds and imagery can be read out from early visual cortex activity in blindfolded participants. The coding of nonretinal information in the activity patterns of early visual cortex is common across actual auditory perception and imagery and may be mediated by higher-level multisensory areas. Furthermore, this coding is robust to mild manipulations of attention and working memory but affected by orthogonal, cognitively demanding visuospatial processing. Crucially, the information fed down to early visual cortex is category specific and generalizes to sound exemplars of the same category, providing evidence for abstract information feedback rather than precise pictorial feedback. Our results suggest that early visual cortex receives nonretinal input from other brain areas when it is generated by auditory perception and/or imagery, and this input carries common abstract information. Our findings are compatible with feedback of predictive information to the earliest visual input level (e.g., [6]), in line with predictive coding models [7-10].
Decoding of Sound and Imagery Content in Early Visual Cortex
We used fMRI in combination with multivariate pattern analysis (MVPA) to explore how complex information from audition and imagery translates to the coding space of early visual cortex in the absence of feedforward visual stimulation. Throughout our experiments, we omitted any visual stimulation by blindfolding our subjects (Figure 1). In experiment 1, subjects listened to three types of natural sounds: bird singing, traffic noise, and a talking crowd (see Figure 2). fMRI activity patterns were extracted from retinotopically mapped visual areas 1, 2, and 3 (V1, V2, and V3) (Figure 1 [11]) and fed into a multivariate pattern classifier (linear support vector machine; see Supplemental Experimental Procedures available online). The classifier successfully discriminated the three different sounds in early visual cortex, particularly in V2 and V3 (at ∼42%; see Figure 2; for results with increased statistical power, see Figure S1A). Hence, activity patterns in early visual cortex contained sufficient information from auditory stimulation to allow the content-specific discrimination of natural sounds. As expected, the classifier performed very well in auditory cortex (positive control), but not in an unrelated cortical area (motor cortex; negative control). At different eccentricities, classification was successful in peripheral and far peripheral areas, particularly in V1 and V2, but not in foveal regions, consistent with structural and functional evidence for auditory influences on early visual cortex (e.g., [12-14]).
Figure 1
Experimental Setup and ROI Definition
In each of the five experiments, ten healthy subjects were scanned with solely auditory stimulation in the absence of visual stimulation. Subjects wore a blindfold and were instructed to keep their eyes closed at all times, and room lights were switched off. In a separate session, retinotopic mapping was performed for all subjects in all experiments to define early visual areas V1, V2, and V3. We show probability maps from the retinotopic mapping data of experiment 1 (n = 10) as derived from functionally informed cortex-based alignment on a flattened Montreal Neurological Institute (MNI) template. White lines indicate mean eccentricity boundaries. Sound-induced blood-oxygen-level-dependent activation patterns from these regions of interest (ROIs) were fed into a multivariate pattern analysis.
Figure 2
Experimental Design and Classification Results of Experiments 1 and 2
(A) In experiment 1, subjects listened to one of three different natural sounds, interleaved with silence (apart from scanner noise).
(B) Mean classification accuracy of the classifier distinguishing the three natural sounds in the different ROIs. Early visual cortex (EVC) contains V1, ventral V2, dorsal V2, ventral V3, and dorsal V3. Chance level is at one out of three. Error bars indicate SEM. All p values were derived from a permutation analysis (see Supplemental Experimental Procedures). Results for V1, V2, and V3 are single threshold corrected. ∗p < 0.05, ∗∗p = 0.001. For significant results, confusion matrices are displayed underneath the graphs to show that classification was not solely successful due to the difference between the activity patterns evoked by one sound versus all other patterns. Columns of the confusion matrices indicate the sound displayed (F, forest; p, people; T, traffic), and rows indicate which sound the classifier predicted. Classifier performance is represented by color hues, with warm colors for above-chance classification and cold colors for below-chance classification.
(C) Mean classification accuracies for all visual ROIs divided into three eccentricities (fovea, periphery, and far periphery). ∗p < 0.05 (uncorrected), ∗∗p < 0.05 (false discovery rate corrected).
(D) In experiment 2, subjects received a word cue to imagine the sounds and the associated visual scene. Four runs with word cues were alternated with four runs of actual sound stimulation.
(E) Classification results are shown for imagined sounds. ∗p < 0.05, ∗∗p = 0.001.
(F) Mean classification accuracies for different eccentricities of the visual ROIs. ∗p < 0.05 (uncorrected), ∗∗p < 0.05 (false discovery rate corrected).
(G) Cross-classification results of experiment 2. The classifier was trained on real sounds and tested on imagined sounds and vice versa, and results were averaged. ∗p < 0.05, ∗∗p = 0.001.
Sounds could have induced crossmodal top-down expectations or mental imagery, which can be conceptualized as one form of nonretinal input to early visual cortex. In experiment 2, we investigated whether sounds could be decoded in early visual cortex even when they were merely imagined and whether feedback information from real and imagined sounds elicited similar activity patterns. Here, runs with natural sound stimulation were interleaved with runs in which subjects solely imagined the sounds upon hearing the word cues “forest,” “traffic,” or “people” (Figure 2D). Subjects were instructed to engage in mental imagery of the sounds and a corresponding natural scene. Successful discrimination of imagined sounds was observed in both foveal and peripheral areas of early visual cortex (but not far periphery), in V1 and auditory cortex (Figures 2E and 2F; classification of real sounds replicated the results of experiment 1, cf. Figure S1B). Therefore, even in the absence of both visual and auditory stimulation, the contents of mental imagery could be decoded from both V1 and auditory cortex.Furthermore, we performed a cross-classification analysis between auditory perception and imagery, i.e., we trained the classifier on runs with sound stimulation and tested on runs with pure imagery and vice versa. Cross-classification succeeded in V1 and V2 (Figure 2G). This demonstrates that both sounds and imagery cues induced similar activity patterns in early visual cortex and that feedback information is coded consistently across imagery and auditory perception. In auditory cortex, cross-classification did not succeed, indicating that activity patterns induced by feedforward auditory stimulation are coded differently than those induced by feedback through auditory imagery.
Decoding of Sounds while Manipulating Cognitive Resources
In experiments 3 and 4, we explored the robustness of cortical feedback to interference with orthogonal engagement of attention, working memory, and visuospatial processing. During natural sound stimulation, subjects performed an orthogonal task that was either an auditory working memory task (experiment 3) or a visuospatial imagery task (experiment 4). Again, both experiments omitted any visual stimulation. In experiment 3, subjects retained a list of five words (animals or everyday objects) in memory during the natural sound stimulation and subsequently matched it with a second word list in scrambled order (Figure 3A). Activity patterns during natural sound stimulation were again successfully decoded from early visual cortex, mainly in peripheral regions and consistently in V2 (Figures 3B and 3C). This demonstrates that simultaneous retention of orthogonal contents in working memory did not strongly affect classification.
Figure 3
Experimental Design of Experiments 3 and 4 and Classification Results of Experiments 3, 4, and 5
(A) In experiment 3, subjects performed an orthogonal auditory working memory task while hearing natural sounds. They retained a word list of five animals or everyday objects in working memory and matched them with a second scrambled word list containing one different word in half of the trials. Match or mismatch was indicated with a button press during response time.
(B) Classification results for the three different sounds during performance of the task. Significance levels and analysis parameters were the same as in experiments 1 and 2. Error bars indicate SEM.
(C) Mean classification accuracies for all visual ROIs divided into three eccentricities (fovea, periphery, and far periphery).
(D) In experiment 4, subjects performed a visuospatial imaginary cube-assembly task while hearing natural sounds. They mentally constructed an imaginary figure according to five assembly instructions, rotated the imaginary figure 90° clockwise, and indicated match or mismatch of the correct solution with the second list of instructions.
(F) Classification results by eccentricity of visual ROIs. ∗p < 0.05 (uncorrected), ∗∗p < 0.05 (false discovery rate corrected).
(G) In experiment 5, subjects listened to three different sound exemplars for each of the two categories, “human” (People 1, People 2, Kids) and “inanimate” (Traffic 1, Traffic 2, Airplane). Sounds were cut to 6 s, and interstimulus intervals were 6 s, otherwise the experimental design was the same as in experiment 1. The table shows early visual areas with significant above-chance classification for all combinations of “human” versus “inanimate” sounds. All p values were derived from permutation analyses. ∗p < .05, ∗∗p < 0.005, ∗∗∗p = 0.001.
(H) Cross-classification of one pair of exemplars against another.
In experiment 4, subjects engaged in an imaginary cube-assembly task [15]. Here, subjects mentally constructed an imaginary figure according to five assembly instructions and rotated the imaginary figure 90° clockwise while hearing the natural sound. Subsequently, they matched the rotated figure held in memory with a second list of instructions. Although the classifier failed to discriminate the three natural sounds in most of early visual cortex, residual above-chance classification remained in the far periphery of V2 (Figures 3E and 3F) despite the orthogonal engagement of attentionally demanding active visuospatial processing.
Whole-Brain Searchlight Results
We performed a whole-brain searchlight analysis to identify other areas that contain information from real and imagined sound content and may mediate information feedback to early visual cortex. Unsurprisingly, sounds could be decoded in a large part of bilateral superior temporal sulcus mostly belonging to auditory cortex (Figure 4). In experiments 1 and 2, real and imagined sounds could be decoded in parts of the precuneus and in posterior superior temporal sulcus (pSTS) (see overlapping regions in Figure 4). Sounds and, to a lesser extent, imagined sounds were successfully classified in a network of frontal regions, including superior and middle frontal sulci.
Figure 4
Results of the Whole-Brain Searchlight Analysis for Experiments 1–4
Overlay of significant above-chance classification of the three heard or imagined sounds onto a flattened and inflated cortical surface reconstruction (MNI template) for experiments 1–4. Note that a searchlight analysis is less sensitive than an ROI analysis because (1) the searchlight volume is small, and, thus, the classifier is less able to pick out subtle differences in activity patterns and because (2) correction for multiple comparisons is necessary on the whole-brain level (see Supplemental Experimental Procedures). Significance level is p < 0.05 with cluster threshold correction. Searchlight size was 343 voxels. For results with increased statistical power and a bigger searchlight, see Figure S4. Early visual areas depict probability maps as in Figure 1. pSTS, posterior superior temporal sulcus; SFS, superior frontal sulcus; MFS, middle frontal sulcus.
Univariate Activation Profile
Given previous controversial evidence of whether mental imagery elicits positive activity in early visual cortex, we performed a univariate generalized linear model analysis to see whether our decoding results were based on positive or negative activation profiles. Even at very liberal thresholds (p < 0.05 uncorrected; Figure S3), listening to sounds in the absence of visual stimulation elicited no positive activation in early visual areas but instead elicited a weak deactivation, consistent with previous findings (e.g.,[16]) and in contrast to classical findings for visual mental imagery [17, 18]. Imagery (experiment 2) elicited no positive activity but exhibited weak deactivations in both early visual and auditory cortices. In experiments 3 and 4, the secondary tasks activated early visual areas consistent with an engagement of object attention.
Category Specificity of the Information Fed Back to Early Visual Cortex
In experiment 5, we were interested in the specificity of the information that is fed back to early visual cortex. We hypothesized two possibilities: (1) sounds trigger a unique picture-like representation that reinstates the same activity patterns in early visual cortex as a real image does and thus allows successful decoding, and (2) higher-level abstract or categorical information is fed down to early visual cortex causing the differential activity patterns. The purpose of such information transfer could be to provide categorical expectations as proposed by models of predictive coding (e.g., [6, 7, 19]). We presented subjects with three different sound exemplars (6 s each) for each of the categories “human” and “inanimate.” The crucial experimental manipulation here was that two sound exemplars in each category could induce similar pictorial representations (different snapshots of a similar environment: “people 1” and “people 2” and “traffic 1” and “traffic 2”), whereas the third could induce a very different image due to a different feature (“playing kids” and “starting airplane”).Classification of exemplars of the “human” versus the “inanimate” category was successful in several early visual areas for eight out of nine exemplar combinations (Figure 3G; Table S1), replicating in part the results of experiment 1 and demonstrating decoding of sounds of the categories “human” and “inanimate” with different sound exemplars and shorter stimulus presentation times.Crucially, cross-classification succeeded in V2 and V3 in two out of three combinations, i.e., training the classifier for the pair “traffic 1” versus “people 1” lead to successful classification of “traffic 2” versus “people 2,” and training the classifier for the pair “traffic 2” versus “people 2” lead to successful classification of “airplane” versus “kids” (Figure 3H; Table S1). That is, the information contained in these activity patterns is generalizable across different sound exemplars within a category, demonstrating that sounds trigger shared categorical information transfer to early visual cortex rather than a fine-grained pictorial representation.
Discussion
Our series of five fMRI experiments provides converging evidence for consistent abstract information feedback from nonretinal sources to human early visual cortex.We show that category-specific information from audition and imagery can be decoded from early visual cortex activity. The fact that our classifier could predict which sound was heard or imagined means that our results go beyond previous studies demonstrating an overall activity increase in early visual cortex in response to auditory stimulation [20] or visual mental imagery [17, 18]. Our study shows that sound stimulation and associated imagery generate shared and meaningful information feedback to early visual cortex, carrying abstract and possibly semantic information.Previous studies focused on the decoding of visual mental imagery and the consistency of activity patterns across visual mental imagery and visual perception. Mostly, decoding of object categories worked in higher visual areas such as lateral occipital complex [21] or ventral temporal cortex [22] and to some extent in extrastriate cortex, but not in V1 [23, 24]. Our study is the first to show that inducing multisensory imagery allows decoding of complex mental imagery content in V1. Furthermore, whereas previous studies reported successful cross-classification between imagery and active visual perception, our cross-classification analysis demonstrates a consistency of activity patterns in early visual areas across imagery and auditory perception. This is converging evidence that nonretinal feedback is consistent with respect to its semantic content, no matter its exact source.Our results also show that this feedback is robust to mild interference with low attentional and working memory load (experiment 3) and to some extent even to interference with a visuospatially and attentionally highly demanding task (experiment 4).The whole-brain searchlight analysis identified higher-level multisensory brain areas such as pSTS and precuneus possibly mediating the information feedback from sounds and imagery to early visual areas. The precuneus has been identified as an area responding to both visual and auditory stimuli and possibly serving as an audiovisual convergence area [25]. pSTS is implicated in audiovisual integration and has been shown to feed down information to primary visual and auditory cortices [26]. In the context of our findings, we suggest that the content-specific information from sounds, when they are heard and/or imagined, is relayed from auditory cortex to early visual cortex via pSTS and precuneus, eliciting differential activity patterns in both of these regions. Apart from the route via multisensory areas, there is evidence for multisensory integration on the subcortical level [27] and for direct anatomical connections between early auditory and early visual areas [12, 28, 29], mostly reaching peripheral regions [12-14], consistent with both our eccentricity and searchlight results. Also, hippocampal projections to peripheral early visual regions have been demonstrated in the context of boundary extension for scene processing [30]. However, whether these pathways play a causal role in inducing differential activity patterns remains to be investigated.The successful classification in experiments 1 and 2 was driven by differential patterns of deactivation rather than activation, and, thus, our results are unlikely to be caused by the same neural mechanisms as those suggested in earlier studies on visual mental imagery [17, 18]. This also means that our results were not caused by an unspecific attention effect or a simple reactivation of early visual cortex due to pictorial visual mental imagery. The univariate activity profile also showed that classification was not driven by one sound eliciting more attention-related activity than another sound (Figure S3).The results of experiment 5 suggest that the information that is fed down to early visual cortex is not only content specific but also category specific, i.e., related to the information shared by sound exemplars of the same category. This suggests that information feedback is unlikely to be caused by an exact pictorial representation but instead contains abstract and possibly semantic information. The findings of experiment 5 furthermore demonstrate that the successful decoding in experiment 1 was not specific to the first sound exemplars we used and could not be caused by differential low-level acoustic features of the sounds (e.g., frequency distribution).Note that despite relatively low classification accuracies, our series of experiments replicated the successful decoding of sounds in early visual areas several times, demonstrating proof of principle and the robustness of our results across different subject and stimulus samples.Previous fMRI studies using MVPA have provided evidence for nonfeedforward input to early visual cortex. For example, activity patterns in nonstimulated parts of early visual cortex contain content-specific information from the surrounding visual context [31, 32], from objects presented in the periphery [33], and from visual stimuli solely held in working memory rather than being actively perceived [34, 35]. Moreover, higher visual areas project back to V1 the associated color of grayscale objects [36] or the predicted motion path of an apparent motion illusion [37, 38]. Our results provide further novel evidence that early visual cortex receives category-specific feedback from auditory, multisensory, memory, or imagery areas in the absence of any actual visual stimulation. Furthermore, many studies of top-down or multisensory influences on sensory regions, such as the decoding of sound-implying visual images in auditory cortex [39], the decoding of touch-implying visual images in somatosensory cortex [40, 41], the recruitment of early visual cortex in blindfolded subjects by touch [42], or the decoding of memory traces in early visual cortex [34, 35], could have been caused or accompanied by a form of mental imagery. Our study has explored the role of mental imagery in depth and has demonstrated that, in terms of reactivation of early visual cortex by a pictorial representation similar to actual visual perception, a simplistic mental imagery account falls short of explaining our results entirely.Why should category-specific information be fed down all the way to early visual areas? One interpretation is that the brain provides priors fitting to the best prediction, and these priors can be transmitted between different sensory modalities. Within the framework of predictive coding, early sensory areas are “prepared” with a predictive model for the external incoming information through cortical feedback from higher cognitive areas, the hippocampus, and other sensory modalities [6–10, 43]. In the present case, early visual cortex may anticipate certain visual information due to real or imagined auditory information. That is, auditory stimulation or imagery triggers a predictive model reaching early visual areas via feedback connections from higher multisensory or imagery areas and evoking content-specific activity patterns. Our results demonstrate that the information arriving in early visual cortex is categorical and independent of its exact source. In fact, previous accounts suggested that prediction and mental imagery may involve overlapping brain mechanisms [6, 43, 44], and mental imagery might have evolved from predictive brain mechanisms. What distinguishes both from each other remains an interesting question to be investigated, both experimentally and theoretically. Omitting feedforward stimulation is a promising step in studying nonvisual input to early visual cortex; however, without feedforward stimulation, it is difficult to study the functional role of this influence in actual visual perception. Audiovisual priming studies with natural stimuli indicate a facilitatory role for visual perception [45].Our results demonstrate that abstract information from nonretinal input, induced by both complex sound stimulation and mental imagery, can be translated to the coding space of early visual cortex. The purpose of such abstract information feedback might be to provide early visual cortex with a categorical prediction for the incoming visual input.
Authors: Mark A Williams; Chris I Baker; Hans P Op de Beeck; Won Mok Shim; Sabin Dang; Christina Triantafyllou; Nancy Kanwisher Journal: Nat Neurosci Date: 2008-11-02 Impact factor: 24.884
Authors: Kenneth I Vaden; Stefanie E Kuchinsky; Jayne B Ahlstrom; Judy R Dubno; Mark A Eckert Journal: J Neurosci Date: 2015-03-04 Impact factor: 6.167
Authors: Max A van den Boom; Mariska J Vansteensel; Melissa I Koppeschaar; Matthijs A H Raemaekers; Nick F Ramsey Journal: Biomed Phys Eng Express Date: 2019-08-02