
Cultural specialization of visual cortex.

John C Ksander¹, Laura E Paige¹, Hunter A Johndro¹, Angela H Gutchess¹.

Abstract

A growing body of evidence suggests culture influences how individuals perceive the world around them. This study investigates whether these cultural differences extend to a simple object viewing task and visual cortex by examining voxel pattern representations with multi-voxel pattern analysis (MVPA). During functional magnetic resonance imaging scanning, 20 East Asian and 20 American participants viewed photos of everyday items, equated for familiarity and conceptual agreement across cultures. Whole brain searchlight mapping with non-parametric statistical evaluation tested whether these stimuli evoked multi-voxel patterns that were distinct between cultural groups. We found that participants' cultural identities were successfully predicted from stimuli representations in visual cortex Brodmann areas 18 and 19. This result demonstrates culturally specialized visual cortex during a basic perceptual task ubiquitous to everyday life.

Year: 2018 PMID： 29897559 PMCID： PMC6121144 DOI： 10.1093/scan/nsy039

Source DB: PubMed Journal: Soc Cogn Affect Neurosci ISSN： 1749-5016 Impact factor: 3.436

Recent findings in psychology and neuroscience demonstrate that culture substantially impacts how individuals perceive the world around them. This evidence comes from experiments testing explicitly cultural behavior, such as reading in one’s native vs non-native language (e.g. Baker ), as well as experiments testing culturally-nonspecific behavior, such as simple visuospatial tasks (e.g. Hedden ). Multiple frameworks, the neuronal recycling hypothesis and theories of cultural differences in information processing, account for how culture might shape perception. The Dehaene and Cohen (2007) neuronal recycling theory hypothesizes that cortical specialization enables culturally acquired behavior. Cultural specialization is, in turn, strongly constrained by that cortex’s original functionality. For example, reading a written language would specialize the cortex performing the prototypical computations required for that ability. Reading is then both enabled by this specialized cortex and constrained by that cortex’s functional origins. Evidence for visual word form area sensitivity to written language (Cohen and Dehaene, 2004) and language-specific functionality (Paulesu ) supports this account. The neuronal recycling hypothesis mainly concerns cultural inventions, which are optional, evolutionarily recent behaviors acquired through learning (e.g. written language, arithmetic and some tool usage). Another prevalent theory proposes that culturally specialized perception reflects information processing biases. For example, culture can influence estimations of line lengths within different visual contexts (Kitayama ; Hedden ), focal object processing (Gutchess ), and change-detection ability (Boduroglu ). Some of these biases are discussed in the literature as differences in ‘analytic’ vs ‘holistic’ approaches. 
More specifically, individuals from Western cultures prioritize feature-based information, whereas individuals from Eastern cultures prioritize contextually based information (e.g. Masuda and Nisbett, 2001; Nisbett ; Masuda ; Rule ). For example, when viewing a photograph depicting a group of people, a Westerner may preferentially attend to the photograph’s visual details, such as the groups’ clothes, or hair color. An Easterner may instead preferentially attend to contextual information, such as relationships amongst individuals in the group, or the depicted group’s location. These biases have been linked to corresponding differences in visual search (Wang ) and recognition memory performance [Millar ; see Paige for a corresponding functional magnetic resonance imaging (fMRI) study]. Previous behavioral and fMRI studies have examined culturally biased information processing using simple visuospatial tasks and judgments of complex scenes. The behavioral findings largely support culturally specialized perception, in that Easterners prioritize contextual visual information while Westerners prioritize feature-based information, as reviewed by Gutchess and Indeck (2009). However, it is important to note that some behavioral experiments have not found perceptual differences between culture groups, and the reported behavioral findings do not unanimously support culturally specialized perception (e.g. Zhou ; Evans ). Unlike the behavioral literature, fMRI results uniformly show cultural differences in blood-oxygen-level dependent (BOLD) activity for perceptual tasks, based on two comprehensive reviews of cross-cultural fMRI studies (Rule ; Han and Ma, 2014). Therefore, the rich neural data afforded by neuroimaging methods may be advantageous for studying culturally specialized perception. 
Kitayama and Uskul (2011) argue that culture-specific cognition has a strong neural basis and, consequently, that neuroimaging experiments show greater sensitivity compared with behavioral studies. Modern neuroimaging techniques would then seem particularly suitable for evaluating whether perception involves culturally specialized cortex in experiments with conflicting behavioral results, such as simple visual tasks without explicitly culturally laden stimuli. This study investigates whether perception involves culturally specialized cortex during a simple object viewing task with multi-voxel pattern analysis (MVPA). Previous fMRI studies investigating culture and visual perception have exclusively utilized univariate fMRI analyses [see reviews by Chiao (2009), Han and Ma (2014) and Rule ], which only reveal where average brain activity differs, typically after spatial smoothing. This approach is likely suboptimal for this study, as regional BOLD activity may not reflect the fine-grained cortical specialization that is likely necessary to produce perceptual differences between cultures. How people perceive the world differently is also arguably multivariate by nature, and therefore poorly described by univariate measurements [see review by Charest and Kriegeskorte (2015)]. Unlike univariate approaches, MVPA methodology assumes information in the brain is represented by a distributed neural code (i.e. fine-grained patterns of voxel activity across cortex) (Norman ). This offers much greater analytical sensitivity, particularly in visual perception experiments (Haxby ; Serences and Boynton, 2007). Therefore, MVPA provides a more suitable approach for this study, despite its underutilization to date in cultural neuroscience.
The authors are aware of only one other cross-cultural study employing this approach, in which Raizada demonstrated that voxel patterns can predict how well Japanese and English speakers are able to discriminate between syllables specific to the English language. That study also concluded that univariate analyses insufficiently characterized the distinctions between cross-cultural representations of auditory stimuli, as mean BOLD signals did not show this effect. The present study investigates perception, employing a simple object viewing task with stimuli that are not specific to any particular culture. East Asian and American participants viewed pictures of common objects, which were equated for cultural familiarity in a previous study, and then further checked for cultural equivalence in this study. If these stimuli evoke multi-voxel patterns that are distinct between cultural groups, this will provide compelling evidence that perception involves culturally specialized cortex.

Materials and methods

Dataset

This study re-analyzes a dataset (Paige ) originally collected to investigate cultural specificity in memory. The present analyses evaluate perception, distinct from memory outcomes.

Participants

Twenty East Asian (10 female) and 20 American (17 female) participants were recruited for this study. East Asian participants originated from China (including Hong Kong and Taiwan), Japan, Korea, Malaysia, Thailand or Vietnam and had lived in the United States for < 5 years (M = 2.6, s.d. = 1.3, range = 0.5–4.5) prior to the experiment. All American participants were United States natives, native English speakers (learned before 5 years of age), and had not resided outside the United States for >5 years. All participants were between the ages of 18 and 35 (Americans M = 22.5, s.d. = 2.4; East Asians M = 24.3, s.d. = 3.8). All participants were right handed, had normal (or corrected-to-normal) vision and hearing and were screened for head trauma (loss of consciousness for >10 min), emotional, psychiatric or learning disorders, and other contraindications for scanning. Participants provided written informed consent for the protocol approved by the Brandeis University IRB.

Stimuli

The image stimuli were developed by Kensinger , and later used by Millar for testing cross-cultural differences in memory performance. Millar piloted the stimuli with American and East Asian participants to match cultural familiarity and conceptual agreement. Stimuli were images of common objects against white backgrounds (examples in Figure 1). In total 216 images were used for encoding, with two different exemplars sharing the same verbal label (e.g. apple) employed across participants.

Fig. 1.

Four example stimuli.

Picture viewing procedure

Each participant viewed 108 pictures during fMRI scanning, divided into two runs of 54 images. The selection of individual images was counterbalanced across four selection lists, determining which stimuli appeared in each run for each participant, and equated across cultures. The stimuli order within each run was randomized for each participant. At each trial, a prompt, presented in English, was shown for 2 s: ‘Please indicate whether you would approach/avoid/stay’. This was followed by an image for 500 ms and a fixation cross during the interstimulus interval from 3500 to 11 500 ms, with jitter determined by Optseq (Dale, 1999). When the image appeared, participants made their decision and responded with a button press. Before scanning, participants practiced the task until competent. The ‘approach/avoid/stay’ trial prompts were intended to ensure participants paid attention and encoded items throughout the experiment, as the original experiment included a surprise memory test. The prompt responses and retrieval performance are not of interest in this study, and will not be discussed further [see Paige a) for further treatment of this issue].

Stimulus familiarity

Approximately 10 days after encoding, participants completed an online survey in which they named and rated their familiarity with each object in the stimulus set (1–5 Likert rating scale). This survey assessed whether participants across cultures reported equal familiarity with the stimuli. Even though these stimuli were previously equated for cultural familiarity in Millar , this check is important for interpreting the current results so that differences between culture groups cannot simply be attributed to differences in stimulus familiarity.

fMRI data acquisition

Images were acquired using a Siemens Trio 3T whole-body scanner. The two encoding runs utilized a 32-channel head coil and simultaneous multi-slice scanning, for which 69 slices 2.0-mm thick were acquired with an echo-planar image sequence (TR = 2000 ms, TE = 30 ms, FOV = 216 mm, and flip angle = 80°). Stimulus presentations were timelocked to TR onsets. High-resolution anatomical images were acquired using a multiplanar rapidly acquired gradient echo sequence.

fMRI pre-processing

Pre-processing was implemented with SPM12 (Wellcome Department of Cognitive Neurology, London, UK). Images were slice-time corrected, realigned for motion correction, co-registered to anatomical data, and normalized to the Montreal Neurological Institute (MNI) template space. No spatial smoothing was performed after normalization; subsequent analyses were implemented with in-house Matlab scripts. A gray matter tissue probability map was applied to the unsmoothed data, and only voxels with at least 10% gray matter probability were retained. This tissue probability map was constructed with the IXI dataset (http://www.brain-development.org/), the same tissue probability map used in SPM segmentation and spatial normalization routines.

Hemodynamic response modeling

The hemodynamic response (HDR) for each stimulus was modeled using the least squares-separate (LS-S) method described in Mumford . Event-related fMRI designs with short interstimulus intervals, like this study, pose a general challenge for MVPA. Trial HDRs overlap in these designs, so activity during the HDR of one trial will often contain signal from a previous trial as well. The LS-S technique addresses this problem by deconvolving BOLD activity for individual trials from an experiment’s fMRI timecourse. This allows the estimation of voxel response patterns specific to individual stimuli. LS-S HDR modeling was performed separately for each subject and within individual runs. Modeling each run separately maintains the independence between training and testing datasets necessary for cross-validation in the subsequent analyses (Kriegeskorte ). Therefore, the following modeling procedure was applied to a single run from a single subject, and repeated for each of the 80 runs in the dataset (i.e. 40 subjects × 2 runs, every run modeled separately). First, each voxel’s timecourse was normalized to zero mean and unit variance (i.e. Z scored), then highpass temporal filtering was applied (Gaussian-weighted least-squares line, 128 s cut-off). The LS-S design matrix consisted of an intercept and two regressors. Following the procedure in Mumford , the first regressor modeled the trial of interest HDR, and the second modeled all other trial HDRs. The 128 s cut-off highpass filter was also applied to these regressors prior to HDR estimation. For every trial, HDR voxel betas were estimated, and then t-values were calculated by dividing the beta estimates by their standard errors. Transforming response pattern betas into t-values is a noise normalization technique shown to limit the influence of noisy voxels on classification analyses (Misaki ).
The estimated patterns for trials which began less than 3 TRs before the end of a run were discarded for insufficient data. Thus, the response pattern to each stimulus was measured by the voxel t-values calculated from LS-S modeling. Modeling each trial’s HDR pattern in this way served two purposes. First, stimulus-specific response patterns were obtained from the event-related timecourse. Second, this procedure should separate the activity related to the image stimuli from the activity related to the trial prompts that preceded every trial. As it has been demonstrated that reading in one’s native vs non-native language produces differences in mean BOLD activity (Baker ), the sensitivity of MVPA combined with this study’s overlapping trial HDRs presents the possibility that our results could be confounded with neural responses from reading the English trial prompts. Although participants likely habituated to the prompt (repeated throughout the study and practice), the possibility of differences resulting from native language was addressed analytically. The LS-S modeling performed in this study deconvolves HDRs timelocked to the image stimulus onsets, which suppresses prestimulus ‘bleed through’ activity. This model further accounts for activity common between trials by including an intercept term. Therefore, HDR estimates would not include prestimulus bleed through activity common to all trials, such as leftover activity from a common prompt.
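As an illustration of the LS-S design matrices described above, the following minimal sketch builds one design matrix per trial, with an intercept, a regressor for the trial of interest and a regressor for all other trials. It is not the authors' in-house Matlab code. The function names, the impulse-event simplification and the single-gamma response shape in `hrf` are assumptions made for illustration (SPM's canonical HRF differs), and highpass filtering and the beta and t-value estimation are omitted.

```python
import math


def hrf(t, peak=6.0):
    """Hypothetical single-gamma response shape peaking at `peak` seconds.

    This is an illustrative stand-in, not SPM's canonical HRF.
    """
    if t <= 0:
        return 0.0
    return (t / peak) ** peak * math.exp(peak - t)


def convolved_regressor(onsets, n_scans, tr=2.0):
    """Sum HRF-shaped responses to impulse events, sampled once per TR."""
    return [sum(hrf(scan * tr - onset) for onset in onsets)
            for scan in range(n_scans)]


def lss_design_matrices(onsets, n_scans, tr=2.0):
    """Return one design matrix per trial.

    Each row is [intercept, trial of interest, all other trials].
    """
    designs = []
    for i, onset in enumerate(onsets):
        others = onsets[:i] + onsets[i + 1:]
        target = convolved_regressor([onset], n_scans, tr)
        nuisance = convolved_regressor(others, n_scans, tr)
        designs.append([[1.0, t, o] for t, o in zip(target, nuisance)])
    return designs


# Three hypothetical stimulus onsets (in seconds) within a 20-scan run.
onsets = [4.0, 10.0, 18.0]
designs = lss_design_matrices(onsets, n_scans=20)
print(len(designs), len(designs[0]), len(designs[0][0]))
```

In every design matrix the two trial regressors sum to the same all-trials regressor; only the split between the trial of interest and the remaining trials changes from one model to the next.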

Multi-voxel pattern analysis

Pre-processing

Two further pre-processing steps were taken for these analyses; the first step removed unsuitable fMRI data, and the second step improved the suitable data’s signal-to-noise ratio. Only voxels valid in all subjects are appropriate for this study’s classification design (see ‘Analysis design’ section), so voxels with signal dropout in any subject at any timepoint during the experiment were considered invalid and removed (see Supplementary Material). Finally, run-wise temporal compression was performed on the valid voxels. This procedure averages the trial HDR estimates within each run, producing two voxel response patterns per subject (one per run). Temporal compression improves signal-to-noise ratio without spatial smoothing by averaging over noisy trial-to-trial pattern variance [Mourao-Miranda , see Supplementary Material]. These temporally compressed t-value patterns were the input data for all analyses.
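The two steps above can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation; the function names and the toy values are hypothetical. A voxel is kept only if it is valid in every subject, and the trial-wise t-value patterns of one run are averaged into a single pattern.

```python
def voxels_valid_in_all_subjects(subject_masks):
    """Keep a voxel only if every subject's mask marks it as valid."""
    return [all(flags) for flags in zip(*subject_masks)]


def temporal_compression(trial_patterns):
    """Average the trial-wise patterns of one run into a single pattern."""
    n_trials = len(trial_patterns)
    return [sum(voxel_values) / n_trials
            for voxel_values in zip(*trial_patterns)]


# Toy example: three voxels, two subjects, two trials in one run.
masks = [[True, True, False], [True, True, True]]
valid = voxels_valid_in_all_subjects(masks)

run_trials = [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]]
run_pattern = temporal_compression(run_trials)
compressed = [v for v, keep in zip(run_pattern, valid) if keep]
print(valid, run_pattern, compressed)
```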

Analysis design

The preprocessed data were submitted to a leave-one-subject-out (LOSO) classification design. In this scheme, a classifier is trained on every subject except one left-out subject, then tested on the left-out subject. Model cross-validation is achieved by repeating this process iteratively until every subject has been left out once (k-folds = 40). The classifier was trained with fMRI activation patterns and corresponding participant group labels (i.e. East Asian or American) in the training dataset, and predicted the unseen participant labels of response patterns in the testing dataset. In other words, the classifier was trained to distinguish East Asian from American response patterns, and then predicted whether the left-out subject’s response patterns were from an East Asian or an American participant. Classification accuracy was measured by the percentage of correctly predicted labels across all cross-validation folds, or, equivalently, the average LOSO classification accuracy.
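The LOSO scheme can be sketched as follows, assuming two patterns per subject (one per run) as described in the pre-processing section. The function names and the `fit`/`predict` callables are hypothetical placeholders for any classifier, and the sketch is not the authors' code.

```python
def loso_folds(subject_ids):
    """Yield (train, test) index lists, holding out one subject per fold."""
    for held_out in sorted(set(subject_ids)):
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield train, test


def loso_accuracy(patterns, labels, subject_ids, fit, predict):
    """Percentage of correctly predicted labels pooled across all folds."""
    correct = 0
    total = 0
    for train, test in loso_folds(subject_ids):
        model = fit([patterns[i] for i in train], [labels[i] for i in train])
        for i in test:
            correct += int(predict(model, patterns[i]) == labels[i])
            total += 1
    return 100.0 * correct / total


# 40 subjects with two run-wise patterns each gives 40 folds.
subject_ids = [s for s in range(40) for _ in range(2)]
folds = list(loso_folds(subject_ids))
print(len(folds), len(folds[0][0]), len(folds[0][1]))
```

Because every fold tests the same number of patterns, the pooled percentage equals the mean of the per-fold accuracies.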

Searchlight mapping

Searchlight MVPA approaches analyze response patterns contained within smaller regions of interest (ROIs) iteratively across the whole brain (Kriegeskorte ). This allows the localization of culturally specialized cortex without a priori ROIs in this study. Spherical searchlights 3 voxels (6 mm) in diameter were used, each with a volume of 19 total voxels. The average LOSO classification accuracy for each searchlight was mapped to its center voxel. Importantly, searchlights were only analyzed in locations where all voxels within the searchlight were valid for analysis. Searchlights containing any voxels excluded during preprocessing (invalid for LOSO cross-validation or <10% gray matter probability) were not analyzed. These parameters created 145 938 total searchlights.
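The reported searchlight size can be checked with a short sketch. A 19-voxel searchlight is obtained from a 3 × 3 × 3 voxel neighborhood by dropping its eight corners, that is, by keeping offsets whose squared distance from the center is at most 2. That the authors defined their spheres in exactly this way is an assumption, and the function names are hypothetical.

```python
from itertools import product


def searchlight_offsets(max_squared_distance=2):
    """Voxel offsets within a 3x3x3 cube, excluding the eight corners."""
    return [
        (dx, dy, dz)
        for dx, dy, dz in product((-1, 0, 1), repeat=3)
        if dx * dx + dy * dy + dz * dz <= max_squared_distance
    ]


def searchlight_is_valid(center, valid_voxels, offsets):
    """Analyze a searchlight only if every member voxel is valid."""
    cx, cy, cz = center
    return all((cx + dx, cy + dy, cz + dz) in valid_voxels
               for dx, dy, dz in offsets)


offsets = searchlight_offsets()
print(len(offsets))

# Toy volume: a fully valid 3x3x3 block, then the same block with one hole.
block = set(product(range(3), repeat=3))
print(searchlight_is_valid((1, 1, 1), block, offsets))
print(searchlight_is_valid((1, 1, 1), block - {(1, 1, 0)}, offsets))
```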

Classifier

Searchlights were analyzed with a Gaussian Naive Bayes (GNB) classifier according to the previously described LOSO design. The GNB classifier was implemented as detailed in Raizada and Lee (2013), with a class prior term for the slightly unbalanced LOSO training set. The classifier predicts one of K classes, $\hat{k}$, based on the largest joint probability of the class prior, $P(C_k)$, and voxel membership, $P(x_i \mid C_k)$:

$$\hat{k} = \underset{k \in \{1, \dots, K\}}{\arg\max} \; P(C_k) \prod_{i=1}^{n} P(x_i \mid C_k)$$

Class prior probability was defined as $P(C_k) = N_k / N$, where $N$ is the total number of training observations (i.e. patterns) and $N_k$ is the number of training observations belonging to class $k$. The equation for voxel class likelihood (i.e. the probability an individual searchlight voxel, $x_i$, belongs to a given class) is defined as:

$$P(x_i \mid C_k) = \frac{1}{\sqrt{2\pi\sigma_{ik}^{2}}} \exp\left(-\frac{(x_i - \mu_{ik})^{2}}{2\sigma_{ik}^{2}}\right)$$

where $\mu_{ik}$ and $\sigma_{ik}^{2}$ are the mean and variance of voxel $i$ across the training observations of class $k$.

The major advantage of searchlight MVPA with Gaussian Naive Bayes classification is that GNB’s ‘naiveté’ can be exploited during cross-validation for dramatically superior computation time compared with other algorithms. This fast GNB implementation was first reported by Pereira and Botvinick (2011), then expanded on by Raizada and Lee (2013). The GNB model (naively) assumes that class features are independent of one another (e.g. the correlations between response pattern voxels do not matter). Therefore, the GNB model can be trained on every voxel in the whole brain all at once, whereas other algorithms must be trained on each searchlight individually in order to model the feature relationships in different voxel groupings. The number of CV folds compounds this advantage further, as each fold requires model retraining. Importantly, this fast GNB implementation makes searchlight permutation testing computationally tractable.
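A Gaussian Naive Bayes classifier with a class prior term can be sketched as below. This is a plain reference version, not the fast whole-brain implementation of Pereira and Botvinick (2011) or Raizada and Lee (2013). The function names, the use of log probabilities and the small variance floor are assumptions made for illustration.

```python
import math
from collections import defaultdict


def gnb_fit(patterns, labels):
    """Estimate the class prior and per-voxel Gaussian parameters per class."""
    by_class = defaultdict(list)
    for x, y in zip(patterns, labels):
        by_class[y].append(x)
    n_total = len(patterns)
    model = {}
    for k, rows in by_class.items():
        n_k = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n_k for col in columns]
        # Variance floor (an assumption) avoids division by zero.
        variances = [max(sum((v - m) ** 2 for v in col) / n_k, 1e-9)
                     for col, m in zip(columns, means)]
        model[k] = (n_k / n_total, means, variances)
    return model


def gnb_predict(model, x):
    """Return the class with the largest log prior plus voxel log-likelihoods."""
    def log_joint(k):
        prior, means, variances = model[k]
        log_likelihood = sum(
            -0.5 * math.log(2 * math.pi * var) - (xi - m) ** 2 / (2 * var)
            for xi, m, var in zip(x, means, variances)
        )
        return math.log(prior) + log_likelihood
    return max(model, key=log_joint)


# Toy two-voxel patterns for two hypothetical groups.
train_x = [[0.0, 0.1], [0.2, -0.1], [3.0, 3.1], [2.8, 3.2]]
train_y = ["A", "A", "B", "B"]
model = gnb_fit(train_x, train_y)
print(gnb_predict(model, [0.1, 0.0]), gnb_predict(model, [3.0, 3.0]))
```

Because each voxel's mean and variance are estimated independently of every other voxel, the per-voxel log-likelihood terms can be computed once for the whole brain and then summed within each searchlight, which is the property the fast implementation exploits.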

Searchlight mapping statistical inference

A non-parametric permutation testing procedure was used to evaluate the statistical significance of searchlight classification accuracies. This approach is particularly warranted as recent publications have raised concerns over parametric significance testing in fMRI (Eklund , 2016; Stelzer ). Therefore, the non-parametric cluster significance test developed by Stelzer was adapted for the current between-subjects MVPA design. Precise details of how we implemented this method are in the Supplementary Material, but this adaptation only deviated from the original Stelzer procedure in one major regard. Specifically, group accuracy maps were not bootstrapped from single subject permutation maps. This is simply because the current analysis is between-subjects and thus does not involve single subject searchlight maps or group average accuracy maps. This required more computationally expensive permutation testing, although the permuted searchlight maps in this study were analyzed in the same fashion as the bootstrapped group accuracy maps in Stelzer .

Computation time

The searchlight mapping analysis was implemented with the Matlab parallel computing toolbox for running on a single server node (Brandeis University High Performance Computing Cluster) with 512GB RAM and 4× Intel Xeon CPU E5-4620 v2 @ 2.60 GHz, each with 8 physical cores (32 cores total). The total computation time was ∼15 days for this analysis. Even with a parallelized fast-GNB implementation and powerful computing resources, searchlight permutation testing accounted for >14 days of that run time. This computational limitation is why the fMRI analyses used fewer permutations during statistical evaluation than the behavioral analysis (see ‘Familiarity check’ section).

Results

Familiarity check

The post-task questionnaire revealed no differences in stimulus familiarity (P = 0.380) between East Asians (M = 4.38, s.d. = 0.20) and Americans (M = 4.30, s.d. = 0.31), tested via permutation testing of mean differences with 50 000 permutations. This finding replicates the Millar analyses, which also determined these stimuli were equally familiar across cultures.
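A permutation test of mean differences of the kind reported above can be sketched as follows. The two-sided comparison, the add-one correction to the P value and the function name are assumptions, since the text does not specify these details, and the toy ratings are not the study's data.

```python
import random


def permutation_test_mean_diff(group_a, group_b, n_permutations=50_000, seed=0):
    """Two-sided permutation P value for a difference in group means."""
    rng = random.Random(seed)
    n_a = len(group_a)
    pooled = list(group_a) + list(group_b)
    n_b = len(pooled) - n_a
    observed = abs(sum(group_a) / n_a - sum(group_b) / n_b)
    exceed = 0
    for _ in range(n_permutations):
        # Reassign observations to groups at random, keeping group sizes.
        rng.shuffle(pooled)
        diff = abs(sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b)
        exceed += int(diff >= observed)
    # Add-one correction (an assumption) keeps P strictly above zero.
    return (exceed + 1) / (n_permutations + 1)


# Toy ratings on a 1-5 scale for two hypothetical groups.
ratings_a = [4.5, 4.2, 4.4, 4.1, 4.6]
ratings_b = [4.3, 4.4, 4.0, 4.5, 4.2]
p_value = permutation_test_mean_diff(ratings_a, ratings_b, n_permutations=2000)
print(0.0 < p_value <= 1.0)
```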

Searchlight mapping

The whole brain searchlight mapping analysis revealed one significant searchlight cluster after non-parametric cluster size control (, , , ). This cluster comprised 41 contiguous searchlights spanning Brodmann areas 18 and 19 in visual cortex (see Figure 2). The MNI coordinates and Brodmann areas of each searchlight center in this cluster are listed in the Supplementary Material.

Fig. 2.

Searchlight mapping results. All voxels in the 41 searchlight cluster spanning BA 18 and 19 are shown in yellow. Glass brain rendered via Madan (2015).

ROI follow-up analyses

In order to determine whether culturally distinct neural codes are distributed across the visual cortex identified by searchlight mapping, follow-up ROI analyses were performed as prescribed by Etzel . In these analyses, an ROI was constructed from the significant searchlight cluster and tested. This analysis is deliberately circular (Kriegeskorte ), as discussed by Etzel . The purpose is to test whether the cortex identified in searchlight mapping is informative as a whole, rather than just an agglomeration of smaller informative searchlights. However, this analysis also afforded the opportunity to test three important qualities of our result. First, whether the effect generalized across different classifier algorithms, as testing such generalizability with searchlight mapping was computationally intractable. Second, whether this result truly reflected fine-grained neural coding, rather than differences in mean activity. Third, how differences in anatomical alignment or gender ratios may have influenced this result. These empirical tests were further characterized by visualizing the ROI data for illustrative purposes (see ‘t-Distributed stochastic neighbor embedding visualization’ section).

Model and cortical generalization

Gaussian Naive Bayes (GNB), diagonal quadratic discriminant analysis (DQDA), linear support vector machine (SVM-lin), and non-linear SVM (SVM-RBF) classifiers were tested. The DQDA classifier was implemented with the Matlab fitcdiscr function, and SVMs were implemented with LIBSVM (Chang and Lin, 2011). SVM-lin used a linear kernel, and SVM-RBF used a radial-basis-function kernel with . The cost parameter was fixed at 10 for both SVM-lin and SVM-RBF. The ROI data were submitted to the same LOSO cross-validation scheme used in the whole-brain analyses, and permutation testing was performed for statistical inference. For this permutation testing, class labels were randomly permuted, and ROI classification accuracies recalculated with the permuted class labels. This process was repeated 15 000 times, each with a consistent label permutation order which remained fixed across CV folds (matching the whole-brain analyses). All four classifiers performed better than chance, and these results are shown in Figure 3. This confirms that the cortex identified by searchlight mapping is informative as a whole, not just as contiguous individual searchlights. Furthermore, this analysis demonstrates that our results generalize over multiple classification models.
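The label permutation procedure can be sketched as follows. Labels are shuffled once per permutation and the full cross-validation is then rerun with that fixed assignment, so the permuted labels stay constant across CV folds. Shuffling at the subject level (so that both runs of a subject share a label), the add-one correction and all names are assumptions, and `accuracy_fn` is a hypothetical callable that runs the complete LOSO analysis for a given label assignment.

```python
import random


def permute_subject_labels(subject_labels, rng):
    """Shuffle group labels across subjects, preserving group sizes."""
    subjects = sorted(subject_labels)
    shuffled = [subject_labels[s] for s in subjects]
    rng.shuffle(shuffled)
    return dict(zip(subjects, shuffled))


def permutation_p_value(observed_accuracy, accuracy_fn, subject_labels,
                        n_permutations=1000, seed=0):
    """Return the P value and the empirical chance (null) accuracies."""
    rng = random.Random(seed)
    null = [accuracy_fn(permute_subject_labels(subject_labels, rng))
            for _ in range(n_permutations)]
    exceed = sum(acc >= observed_accuracy for acc in null)
    # Add-one correction (an assumption) keeps P strictly above zero.
    return (exceed + 1) / (n_permutations + 1), null


# Toy example: 40 subjects and a stand-in scorer that always returns 50%.
subject_labels = {s: ("EA" if s < 20 else "US") for s in range(40)}
p_value, null = permutation_p_value(
    90.0, lambda labels: 50.0, subject_labels, n_permutations=100)
print(p_value, sum(null) / len(null))
```

The mean of `null` plays the role of the empirical chance accuracy, and its percentiles give a confidence interval for chance performance.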

Fig. 3.

ROI follow‐up classification accuracies. The mean empirical chance accuracies were 49.29, 49.5, 49.47 and 48.82% for GNB, DQDA, SVM‐lin and SVM‐RBF, respectively. These values are approximated by the dashed line at 50%. Error bars are 90% CI for the mean, also obtained from the empirical chance distribution.

Mean activity

Group differences in mean activation (e.g. regional univariate effects) could influence these results. To address this possibility, we repeated the previously described analyses after treating the data to control for mean activation. The fixed SVM-RBF cost was increased to 100 in this analysis, and all other model parameters were unchanged. First, voxel-wise ‘cocktail blank’ normalization (MacEvoy and Epstein, 2009) was performed on the t-value patterns within each run by centering each voxel to zero mean and applying feature scaling. After temporal compression, the same normalization was applied individually to each pattern across voxels (i.e. spatially), so that all patterns had the same mean population activity. All four classifiers again performed better than chance on the treated data (Supplementary Figure S1), replicating the previous ROI follow-up analysis while controlling for mean activity. Furthermore, a whole-brain univariate analysis showed no group differences in mean BOLD activity for this cortical region (see Supplementary Material for univariate analysis details). These analyses confirm that the previous results were not simply driven by differences in mean activity; voxel patterns may be necessary, and are at least sufficient, for characterizing cortical specialization in this study [see Coutanche (2013) for greater detail on these analyses and their interpretations]. On a technical note, fixed SVM cost and parameters were chosen simply to avoid intensive parameter searches during cross-validation (Hsu ). Conducting such optimizations may have yielded superior performance, but the purpose of these analyses was to test generalization across different models, rather than comparing their relative performances.
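The two normalization steps can be sketched as below. Interpreting 'feature scaling' as scaling to unit variance (i.e. z-scoring) is an assumption, as are the function names; the sketch only illustrates the direction of each normalization (across patterns for each voxel, then across voxels for each pattern).

```python
import statistics


def zscore(values):
    """Center to zero mean and scale to unit variance (population s.d.)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd if sd > 0 else 0.0 for v in values]


def cocktail_blank(run_patterns):
    """Normalize each voxel across the trial patterns of a single run."""
    columns = [zscore(column) for column in zip(*run_patterns)]
    return [list(row) for row in zip(*columns)]


def spatial_normalize(pattern):
    """Normalize one pattern across its voxels, equating mean activity."""
    return zscore(pattern)


# Toy run with two trial patterns over two voxels.
run = [[1.0, 10.0], [3.0, 30.0]]
treated = cocktail_blank(run)
flat = spatial_normalize([2.0, 4.0, 6.0])
print(treated, flat)
```

After the spatial step every pattern has zero mean across voxels, so a classifier can no longer separate the groups using overall activation level alone.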

Voxel alignment

Although the authors are not aware of any empirical findings, group differences in anatomical alignment could hypothetically influence between-groups voxel pattern analyses. In order to account for this concern analytically, another ROI analysis was performed with shared response modeling [SRM; Chen ; also see Cohen for a broader review]. This method maps subjects’ fMRI data to a common latent variable space, based on subjects’ shared responses to stimuli. SRM effectively performs inter-subject alignment in this way, aligning data in abstract response space rather than 3D anatomical space. This is similar to hyperalignment (Haxby ); however, SRM naturally facilitates between-groups analyses by estimating the latent representations shared by subjects in order to find a common space. The ROI data were analyzed with the between-groups SRM analysis described in Chen experiment 3, where group-specific responses are separated from both the common and idiosyncratic responses. These group-specific responses are then used for the between-groups LOSO classification. See Supplementary Material for further details. This analysis successfully predicted participants’ cultural identities (GNB accuracy = 97.5%; 90% CI for the mean = 82.94–100%, P < 0.001), and the SRM estimates substantially improved classification accuracy. This methodology should improve analytical sensitivity, as demonstrated in Chen . More importantly, this result shows culturally distinct visual cortex representations using methodology tolerant to anatomical misalignment. If American and East Asian voxel patterns conveyed the same stimulus information, but were anatomically misaligned, this analysis would have shown no distinction between these groups. Instead, SRM estimations improved classification accuracy, providing good evidence that this study’s results reflect differences in representational content.

Gender

Although there was no reason to expect gender differences in visual cortex representations, this study’s participant groups were imbalanced with respect to gender. To address gender-imbalance concerns, two control analyses were conducted on the ROI data. In the first analysis, all male subjects were excluded, and the cross-validation scheme was altered to fairly account for those exclusions. This analysis successfully predicted participants’ cultural identities (GNB accuracy = 88.99%; 90% CI for the mean = 80.15–98.97%, P < 0.001), while controlling for gender. The second analysis excluded male subjects from training, but included every male subject in the testing-sets (in addition to female subjects). This second analysis successfully predicted cultural identity as well (GNB accuracy = 75.74%; 90% CI for the mean = 68.82–83.14%, P < 0.001). In both analyses the model was trained exclusively on female subjects, so the model’s decision boundary could not discriminate cultural identity based on gender differences. The second analysis demonstrated that this decision boundary also generalized to testing-sets composed of both female and male participants. This indicates that this study’s results are unlikely to be due to a gender imbalance between cultural groups. The Supplementary Material describes these analyses in detail, including both ROC analyses and confusion matrices.

t-SNE

Improved methods for visualizing high-dimensional data provide valuable context for interpreting fMRI analyses (Johnson ; Peltonen and Kaski, 2011; Vellido Alcacena ). Therefore, the t-distributed stochastic neighbor embedding (t-SNE) algorithm was used to visualize the fMRI data submitted to the ROI follow-up analyses. t-SNE is a non-linear dimension reduction technique which can produce 2D representations of high-dimensional data (Maaten and Hinton, 2008). Using this technique, a scatter plot was created illustrating how subjects’ neural representations in visual cortex relate to each other (Figure 4a), and the quadratic discriminant analysis boundary was calculated for this 2D representation (Figure 4b). This figure is included purely for illustrative purposes, and is not quantitative evidence (see Supplementary Material for detailed methods).

Fig. 4.

t‐SNE visualization of visual cortex representations. Each subject’s ROI data is shown in 2D t‐SNE space (a). The dashed line marks the QDA boundary (b). The visualized data consists of 80 total observations, with two ROI voxel patterns (one corresponding to each run) for each of the 40 subjects.

Discussion

These results demonstrate culturally specialized visual cortex during a simple object viewing task. Additionally, this study is the first to show culturally distinct multi-voxel representations of stimuli that are non-specific to any particular culture. Raizada found distinct fMRI pattern representations between Japanese and English participants, but in response to English syllables which have no functional distinction in Japanese. These stimuli are considerably different to Japanese and English speakers, so perhaps their dissimilar representation is less surprising. The stimuli in this experiment depicted objects familiar to both American and East Asian participants, yet the stimuli were nevertheless represented differently by American and East Asian participants in visual cortex. This result reveals that one’s cultural background specializes the cortex involved with object recognition, a routine and fundamental perceptual task. This finding is consistent with theories that predict prevalent and extensive cross-cultural differences, such as differences in information processing between cultures. It is difficult to say whether these results reflect neuronal recycling, as the stimuli used in this study do not represent cultural inventions unique to either group. Neuronal recycling also proposes that humans’ innate cortical organization limits how cultural acquisition may influence broader cognition. However, the real psychological limits imposed by these organizational constraints are unknown. In addition, this study cannot speak to whether culturally specialized visual cortex manifests as cross-cultural differences in information processing biases. Additional experiments would be needed to support that conclusion, as no experimental manipulations or analytical measures in this study addressed that specific hypothesis. 
That being said, the current findings could coincide with recent work examining the cultural differences in information processing in light of spatial frequency tuning. Tardif found Chinese participants utilized lower spatial frequency information than Canadian participants during a face recognition task [for further discussion of this idea, see Paige]. They concluded this difference in spatial frequency tuning underlies cultural differences in feature vs context information processing, because higher spatial frequency information is linked to narrower, feature-based focus (Shulman and Wilson, 1987). The cortex identified by searchlight mapping in the current experiment shows retinotopic sensitivity to spatial frequencies, that is, a strong cortical organization for responses to preferred spatial frequencies [see review by Tootell ]. Moreover, other research indicates that the region of visual cortex identified in this study responds to statistical regularities such as texture (Freeman ) and responds to the accumulation of high and low spatial frequency information (De Cesarei ). In addition, experimentally manipulating attention to high vs low spatial frequency consistently produces fMRI results with left-right hemispheric asymmetry (Han ; Iidaka ). It is plausible that cultural differences in spatial frequency tuning could explain why this study finds cultural specialization in left-lateralized visual cortex. Further research, employing stimuli designed to address this question, is needed (see Supplementary Material for a post hoc analysis). A worthwhile future direction would be to directly test this hypothesis with classification image methods, where behavioral responses can be directly mapped to changes in stimulus spatial frequency [see Murray (2011) for review]. 
Such an experiment would have ideal control over the stimuli’s spatial frequency, while also providing a strong test of cultural differences by employing artificial stimuli that should not differ across cultures. Although this study’s stimuli were rated equally familiar across cultures, it is impossible to ensure these stimuli were perfectly matched. However, Charest found pictorial stimuli representations in early visual cortex (EVC; V1–V3) are both robustly common across people, and idiosyncratic to individuals. Furthermore, in their experiment, idiosyncratic EVC representations were surprisingly not observed when the stimuli were unequally familiar across participants (e.g. one participant’s own house); that is, participants’ EVC representations only differed when the stimuli were equally familiar to everyone. Our results indicate that such EVC representations are also systematically different between cultures. Two other studies reporting culture-specific effects in this cortical area provide additional context for these results. Szwed reported higher mean BOLD activity in left-lateralized BA 17–19 when Chinese and French participants read letter-scrambled words in their own native script. Gutchess found dissimilar mean BOLD responses to focal objects vs contextual backgrounds between East Asian and American participants in left-lateralized BA 18 and 19 as well. That study, like the present one, involved a simple perceptual task, in which participants viewed pictures of a central object within a contextual background (e.g. an elephant by a watering hole), and also pictures of those objects and backgrounds in isolation (e.g. only the elephant or watering hole). These studies show how the cultural differentiation of visual cortex can support either neuronal recycling or cultural differences in information processing. However, the present study shows cultural specialization in visual cortex without an experimental manipulation specific to either theory. 
Given that these theories are not mutually exclusive, a more comprehensive hypothesis may unify these results. One such hypothesis may be that culture-specific behavior specializes the cortex functionally relevant for that behavior, which in turn may broadly influence cognition beyond the scope of culturally acquired behavior. For example, reading in one’s native script, an example of culture-specific behavior, would specialize visual cortex for reading that particular language. The specialization would tune this cortex for the spatial frequency information most useful for reading that language’s script (Horie ). Consequently, perceiving and recognizing objects, an example of non-culture-specific cognition, utilizes the visual cortex, which has been tuned for information on particular spatial frequencies. Language could potentially specialize visual cortex in a similar manner. Primary sensory cortices have shown multi-sensory coding (Liang ), and object representations have shown some consistency across modalities (Shinkareva ). The cortex performing object recognition may be culturally specialized through correlated auditory and visual stimuli, or higher-level associations with linguistic representations. This hypothesis would produce perceptual differences across cultures consistent with cultural differences in information processing, as may be the case in Tardif . Neuronal recycling may also accommodate this hypothesis, if the theoretical constraints proposed by Dehaene and Cohen (2007) are informed by evidence for broader perceptual differences. This hypothesis is far beyond this study’s reach, although future studies along these lines may provide a more comprehensive explanation for perceptual differences across cultures.

Conclusion

This study successfully predicts participants’ cultural identities from multi-voxel pattern representations of common objects in visual cortex. This result demonstrates the cultural specialization of visual cortex during a perceptual task ubiquitous to everyday life. Future research should address the fundamental mechanisms driving cultural cortex specialization, and how that specialization manifests downstream across behavioral domains. Attempting to answer these questions should further advance understanding of how culture shapes cognition.

57 in total

1. A cultural effect on brain function.

Authors: E Paulesu; E McCrory; F Fazio; L Menoncello; N Brunswick; S F Cappa; M Cotelli; G Cossu; F Corte; M Lorusso; S Pesenti; A Gallagher; D Perani; C Price; C D Frith; U Frith
Journal: Nat Neurosci Date: 2000-01 Impact factor: 24.884

2. Beyond mind-reading: multi-voxel pattern analysis of fMRI data.

Authors: Kenneth A Norman; Sean M Polyn; Greg J Detre; James V Haxby
Journal: Trends Cogn Sci Date: 2006-08-08 Impact factor: 20.229

Review 3. Cultural influences on memory.

Authors: Angela H Gutchess; Allie Indeck
Journal: Prog Brain Res Date: 2009 Impact factor: 2.453