Episodic memory enables humans to encode and later vividly retrieve information about our rich experiences, yet the neural representations that support this mental capacity are poorly understood. Using a large fMRI dataset (n = 468) of face-name associative memory tasks and principal component analysis to examine neural representational dimensionality (RD), we found that the human brain maintained a high-dimensional representation of faces through hierarchical representation within and beyond the face-selective regions. Critically, greater RD was associated with better subsequent memory performance both within and across participants, and this association was specific to episodic memory but not general cognitive abilities. Furthermore, the frontoparietal activities could suppress the shared low-dimensional fluctuations and reduce the correlations of local neural responses, resulting in greater RD. RD was not associated with the degree of item-specific pattern similarity, and it made complementary contributions to episodic memory. These results provide a mechanistic understanding of the role of RD in supporting accurate episodic memory.
One fundamental question in cognitive neuroscience concerns the neural representations that support human episodic memory. Existing studies suggest that both the content (i.e., what is encoded) and the geometry (i.e., the structured relations across input conditions) could substantially affect cognition. In particular, the dimensionality of neural representations, which refers to the minimum number of dimensions needed to account for the variance in neural population activity across input conditions, has been implicated in various cognitive functions, including visual perception, reinforcement learning, concept learning, adaptation, decision-making, and cognitive control. However, its role in episodic memory is still unknown.
Both high- and low-dimensional representations have specific characteristics that may benefit episodic memory. In particular, a high-dimensional representation will separate even similar inputs into orthogonal activity patterns, which is achieved by eliminating correlations in sensory inputs. These codes are sparse, can efficiently use the limited numbers of neurons in a brain region, and enable complex features to be read out by simple downstream networks. Recent studies have shown an increase in representational dimensionality (RD) with reinforcement learning of object value, and quick learners showed greater RD than slow learners. A low-dimensional representation, by contrast, will encode a diverse range of inputs into a small set of common orthogonal activity patterns. Neural codes with low dimensions are correlated and redundant, enabling robust representation in the presence of neural noise. Such codes could also increase the smoothness or generalization of representation, resulting in similar neural responses to similar images. A recent study has shown that the neural representations of complex inputs are fundamentally constrained by their smoothness. It thus remains an open question how neural representations balance efficiency and robustness to support good episodic memory.
Another question concerns the mechanisms underlying the association between RD and episodic memory. A large body of research has revealed that populations of neurons wax and wane as a group, demonstrating low-dimensional variance. Attentional processes could exert top-down control, which could reduce the correlation of neural responses and thus could potentially increase the dimensionality. Extant modeling work suggests that this top-down attentional effect could be achieved by suppressing the low-dimensional shared neural variability, yet evidence from human neuroimaging studies is still absent. Furthermore, these top-down processes have previously been shown to enhance the fidelity (i.e., item-level specificity) of neural representations, which is associated with better episodic memory. Thus, the primary goal of the current study is to examine the relationships among top-down processes, low-dimensional neural variability, and RD in the human brain and how they are related to the fidelity of neural representations and memory performance.
To achieve a mechanistic understanding of the role of RD in episodic memory, we collected functional magnetic resonance imaging (fMRI) data on a relatively large group of participants performing a face-name associative memory task, which requires the participants to precisely discriminate each novel face and remember its association with an arbitrary name. We used principal component analysis (PCA) to estimate the representational dimensionality within and beyond the face-selective regions (FSRs) and further linked these estimates to memory performance both within and across participants. Our results revealed a critical neural mechanism through which RD could affect episodic memory performance.
RESULTS
Behavioral results
A face-name associative memory task was used in the current study. In this task, participants were asked to remember 30 unfamiliar face-name pairs. To aid their memory, they were asked to make a subjective judgment on the fitness of name-face association. Each face-name association was studied twice within one scanning run, with an interrepetition interval ranging from 8 to 17 trials. We used a slow event-related design (12 s for each trial) to better estimate the single-trial blood oxygen level dependent (BOLD) responses (Fig. 1A). To prevent participants from further processing the face-name pairs, they were asked to make self-paced perceptual orientation judgments for 7.5 s before the next trial started. The average response time was 0.576 ± 0.240 s, and the mean accuracy was 0.856 ± 0.137 for the orientation task, suggesting that the participants were paying attention to the task.
Fig. 1.
Experimental paradigm and distribution of memory performance.
(A) A slow event-related design (12 s for each trial) was used to improve the accuracy in the estimation of single-trial responses. Each trial started with 0.5-s fixation, followed by the presentation of a picture for 2.5 s. Then, the frame of the picture turned red, which informed the participants to indicate whether the face “fitted” the name within 1.5 s. To prevent further encoding of the picture, a series of Gabor images tilting 45° to the left or the right was presented on the screen during the 7.5-s intertrial interval, and participants were asked to judge the direction of the Gabor images as quickly and accurately as possible. During the recognition memory test, participants were asked to indicate the corresponding correct name if the face was old; otherwise, they pressed the new button within 4 s. Please note that the facial images were censored here to protect privacy. They were not censored in the experiment. (B) Distribution of the accuracy of associative recognition (ACC_association) and d′ of item recognition (d′_item). The dashed lines represent the group mean (N = 468). (C) Correlation between ACC_association and d′_item.
After 24 min of working memory and decision-making tasks in the scanner, participants completed a recognition task where they were asked to select the correct given name for a presented face from three candidates or choose “new” if it was a new face. Thirty old faces and 20 new faces were included in the recognition test. Two behavioral measures were generated from this task, i.e., associative recognition and item recognition. Associative recognition refers to the correct recognition of face-name associations, which was quantified as the accuracy for both the old (i.e., choosing the correct name) and new faces (i.e., correct rejection). In contrast, item recognition refers to the correct recognition of old faces regardless of whether the correct name was chosen (hit), which was quantified as the d′ score: Z (hit rate) − Z (false alarm rate, i.e., incorrect recognition of new faces as old). The accuracy of associative recognition (ACC_association) was 0.603 ± 0.147 (kurtosis = −0.127, skewness = −0.269; Fig. 1B, left), which was significantly above the chance level (25%, P < 0.001). The d′ of item recognition (d′_item) was 2.048 ± 0.823 (kurtosis = −0.070, skewness = −0.299; Fig. 1B, right), which was significantly correlated with ACC_association (R = 0.850, P < 0.001; Fig. 1C).
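The d′ score defined above can be illustrated with a short Python sketch. The function name `d_prime`, its arguments, and the log-linear correction (which keeps hit or false alarm rates of exactly 0 or 1 from producing infinite Z scores) are assumptions made for illustration; the text does not report how extreme rates were handled.

```python
from statistics import NormalDist


def d_prime(hits, n_old, false_alarms, n_new):
    """Item-recognition sensitivity: Z(hit rate) - Z(false alarm rate)."""
    # Assumption: log-linear correction so that rates of 0 or 1 stay finite.
    hit_rate = (hits + 0.5) / (n_old + 1)
    fa_rate = (false_alarms + 0.5) / (n_new + 1)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)


# Hypothetical participant: 24 of 30 old faces and 2 of 20 new faces called "old".
print(round(d_prime(24, 30, 2, 20), 3))
```

With 30 old and 20 new faces, as in the recognition test, calling half of the old and half of the new faces "old" gives a d′ of 0 under this sketch, and d′ grows as hits increase or false alarms decrease.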
High-dimensional facial representations in the human brain
We examined the neural RD in FSR (Fig. 2B) and the hippocampus (HIP; Fig. 2C) using PCA. Existing studies have proposed two RD indices: the number of PCs required to explain 90% of the variance (RDvar) and the number of eigenvalues greater than 1 (RDeig). Whereas RDvar could potentially be affected by very small eigenvalues that are likely to be noise, RDeig is insensitive to differences in the cumulative variance explained by eigenvalues greater than 1. Here, we propose an alternative index, namely, effective RD (RDeff), which is defined as RDeig divided by the cumulative variance explained by eigenvalues larger than 1 (Fig. 2A, see Materials and Methods). This is based on the assumption that the unexplained variance is not all noise. For participants with the same RDeig, those with more unexplained variance should have greater effective dimensions. RDeff jointly considers the major eigenvalues (RDeig) and the explained cumulative variance (RDvar) and is less affected by the number of eigenvalues with very small values. Thus, it could provide a good measure of regional and individual differences in representational dimensions, although the absolute number is less meaningful.
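A minimal sketch of the three indices is given below for illustration. It assumes that the eigenvalues come from the trial-by-trial correlation matrix of voxel-centered patterns, so that the average eigenvalue is 1 and the "greater than 1" criterion is meaningful; the exact preprocessing is described in Materials and Methods and may differ. The function name `rd_indices` and the trials-by-voxels input layout are hypothetical.

```python
import numpy as np


def rd_indices(patterns, var_threshold=0.90):
    """Return (RDeig, RDvar, RDeff) for a (n_trials, m_voxels) pattern matrix."""
    x = np.asarray(patterns, dtype=float)
    # Mean-center each voxel across trials to remove the activation level.
    x = x - x.mean(axis=0, keepdims=True)
    # Assumption: PCA on the trial-by-trial correlation matrix.
    eig = np.linalg.eigvalsh(np.corrcoef(x))[::-1]  # descending order
    eig = np.clip(eig, 0.0, None)                   # remove tiny negative values
    explained = eig / eig.sum()

    major = eig > 1.0
    rd_eig = int(major.sum())                       # eigenvalues greater than 1
    v = float(explained[major].sum())               # variance they explain
    # Smallest number of PCs whose cumulative variance reaches the threshold.
    rd_var = int(np.searchsorted(np.cumsum(explained), var_threshold) + 1)
    rd_eff = rd_eig / v                             # RDeig divided by V
    return rd_eig, rd_var, rd_eff


# Simulated data only: 30 trials, 200 voxels of unstructured noise.
rng = np.random.default_rng(0)
print(rd_indices(rng.standard_normal((30, 200))))
```

Because V is at most 1, RDeff is never smaller than RDeig, and two participants with the same RDeig are separated by how much variance their major components leave unexplained.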
Fig. 2.
Schematic depiction of the RD analysis, the regions of interest, and the distribution of RDeig.
(A) PCA was performed on neural activation patterns (represented by m voxels) evoked for each of n trials. RDeig (i.e., k1) is the number of principal components (PCs) with an eigenvalue greater than 1, and RDvar (i.e., k2) is the number of PCs required to explain 90% of the variance. RDeff was obtained by dividing RDeig by the cumulative variance (V) explained by the PCs with an eigenvalue greater than 1. (B) FSRs were defined according to Zhen et al. Because the left anterior FFA contains a very small number of voxels (i.e., n = 182), the posterior and anterior FFA were merged and named FFA. (C) The bilateral hippocampus (HIP) was defined according to the Harvard-Oxford probabilistic (25%) atlas. (D) Population distribution of RDeig in FSR and HIP. The left, middle, and right dashed lines indicate the dimension of the facial images based on the OpenFace model (i.e., 4), the mean observed RDeig in FSR (i.e., 13.24) and HIP (i.e., 13.04) across participants, and the maximum possible dimensionality (i.e., 29), respectively. OFA, occipital face area; FFA, fusiform face area; pcSTS, posterior continuation of the superior temporal sulcus (STS); pSTS, posterior STS; aSTS, anterior STS.
We found that RDeig and RDeff in the gray matter were highly correlated across participants (r = 0.96), and they were moderately correlated with RDvar (r = 0.56 and 0.68 for RDeig and RDeff, respectively; fig. S1A). In addition, RDeig and RDeff were highly reliable across two repetitions, with intraclass correlation coefficients (ICCs) of 0.80 and 0.81, respectively. A relatively smaller ICC was observed for RDvar (i.e., 0.42; fig. S1B). Separately for each region of interest (ROI), RDeff was more reliable across the two repetitions than RDeig and RDvar (Supplementary Results). Because RDeff showed the most fine-grained distribution across participants (fig. S1) and the highest reliability, we used RDeff as the main measure of regional and individual differences in RD. The results based on RDeig and RDvar are included in the Supplementary Results. Meanwhile, RDeig was used to provide a conservative estimation of the dimensionality of neural representations.
Using the RDeig index to estimate the dimensionality, we found that FSR and HIP contained high-dimensional face representations (FSR: mean ± SD = 13.24 ± 1.32; HIP: mean ± SD = 13.04 ± 0.82; Fig. 2D). This dimensionality was much higher than that estimated from the computational model (i.e., OpenFace; see Materials and Methods) required to identify different faces (i.e., 4; left dashed line in Fig. 2D) but was significantly lower than the maximum possible dimensionality, i.e., 29 (Fig. 2D, right dashed line). Notably, the maximum possible dimensionality was 29 (30 − 1 in this study) because each voxel (i.e., matrix row) was mean-centered to eliminate the effects of activation level. Together, these results suggest that the brain maintains a high-dimensional yet robust representation of faces.
Dimensionality reflects hierarchical face representations
Previous studies have suggested that FSR (Fig. 2B) encodes faces hierarchically. For example, the occipital face area (OFA) engages in the early perception of facial features, the fusiform face area (FFA) analyzes the invariant aspects of faces that underlie recognition of individuals, and the superior temporal sulcus (STS) processes the changeable aspects of faces such as expressions and the direction of eye gaze. According to this hierarchical representation, one could hypothesize that the dimensionality of face representations in the higher-level face areas, such as FFA and STS, would be higher than that in the lower-level areas, such as OFA. To ameliorate the effects of spatial autocorrelation of the BOLD signal and the size of brain regions, we used a searchlight method to estimate RD. That is, we calculated the dimensionality in each cube containing 125 voxels across the whole brain (Fig. 3A). Along the ventral visual pathway, the dimensionality increased from posterior to anterior regions.
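The cubic searchlight can be sketched as follows, taking 125 voxels to mean a 5 × 5 × 5 cube around each center voxel. This is an illustrative sketch rather than the pipeline used here: the function names, the compact `rd_eff` helper (which assumes eigenvalues of the trial-by-trial correlation matrix), the skipping of centers near the volume edge, and the inclusion of out-of-mask voxels inside a cube are all assumptions.

```python
import numpy as np


def rd_eff(patterns):
    """RDeff for a (n_trials, n_voxels) matrix (assumed correlation-based PCA)."""
    x = patterns - patterns.mean(axis=0, keepdims=True)  # center each voxel
    eig = np.clip(np.linalg.eigvalsh(np.corrcoef(x)), 0.0, None)
    major = eig > 1.0
    return major.sum() / (eig[major].sum() / eig.sum())


def cubic_searchlight(betas, mask, half_width=2, stat=rd_eff):
    """Map `stat` over cubes of (2 * half_width + 1) ** 3 voxels.

    betas: single-trial estimates, shape (X, Y, Z, n_trials).
    mask:  boolean array, shape (X, Y, Z), marking candidate center voxels.
    """
    out = np.full(mask.shape, np.nan)
    h = half_width
    nx, ny, nz = mask.shape
    for i, j, k in np.argwhere(mask):
        # Assumption: centers whose cube would leave the volume are skipped.
        if min(i, j, k) < h or i + h >= nx or j + h >= ny or k + h >= nz:
            continue
        cube = betas[i - h:i + h + 1, j - h:j + h + 1, k - h:k + h + 1, :]
        # Flatten the cube to (n_trials, 125) before estimating dimensionality.
        out[i, j, k] = stat(cube.reshape(-1, cube.shape[-1]).T)
    return out


# Simulated volume only: 7 x 7 x 7 voxels, 30 trials.
rng = np.random.default_rng(0)
rd_map = cubic_searchlight(rng.standard_normal((7, 7, 7, 30)), np.ones((7, 7, 7), bool))
print(np.isfinite(rd_map).sum())
```

Using a fixed cube size keeps the number of voxels identical at every location, so differences in the resulting map cannot be attributed to region size.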
Fig. 3.
Hierarchical face representation revealed by dimensionality analysis.
(A) Group-averaged map of RDeff based on the searchlight method, which revealed increasing dimensionality from posterior to anterior regions. (B) Differences in RDeff among ROIs. Error bars represent SEs across participants. LH, left hemisphere; RH, right hemisphere. ***Pcorrected ≤ 0.001.
Focusing on each face-selective subregion and HIP, we found that OFA had the lowest dimensionality, whereas HIP had the highest dimensionality (Fig. 3B). Moreover, compared with the posterior FSR, the anterior regions showed higher representational dimensionality, and this effect was found in both the face area and STS (Fig. 3B). The dimensionality of the right pcSTS (posterior continuation of STS) and pSTS (posterior STS) was significantly higher than that of their left homologs (Pcorrected < 0.001; Fig. 3B), but that in the right HIP was significantly lower than that in the left HIP (Pcorrected < 0.001).
Higher RD was associated with better memory
Having characterized RD within and beyond FSR, we then moved on to examine the core hypothesis of the current study: Is higher dimensionality associated with better episodic memory? We performed two analyses to address this question, one on individual differences and the other on the subsequent memory effect (SME).
In light of the significant and stable individual differences in RDs (Fig. 2D and fig. S1), we first examined whether an individual with a higher RD had better memory. Because the dimensionality was correlated with activation in some ROIs (fig. S2), all of the following analyses controlled for the activation level. The results revealed that across participants, better associative memory performance was associated with greater dimensionality in FSR (R = 0.190, P < 0.001; Fig. 4A, left) and HIP (R = 0.110, P = 0.020; Fig. 4A, right). Separately for each face-selective subregion, we also found that higher dimensionalities were associated with better associative memory performance in all ROIs (Pcorrected ≤ 0.041), except for a marginal effect in the left FFA (R = 0.09, Pcorrected = 0.052; table S1). Similar positive correlations were also observed when using RDeig and RDvar as the dimensionality index and the item recognition d′ as the behavioral index (table S1 and Supplementary Results).
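Controlling for the activation level amounts to a partial correlation, which can be sketched by regressing the covariate out of both variables and correlating the residuals. The function name `partial_corr` and the simulated values are hypothetical; the software actually used for these analyses is not specified in this section.

```python
import numpy as np


def partial_corr(x, y, covariates):
    """Pearson correlation of x and y after regressing out the covariates."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Design matrix: intercept plus one column per covariate.
    z = np.column_stack([np.ones(len(x)), covariates])
    # Residuals of x and y after an ordinary least-squares fit on z.
    rx = x - z @ np.linalg.lstsq(z, x, rcond=None)[0]
    ry = y - z @ np.linalg.lstsq(z, y, rcond=None)[0]
    return float(np.corrcoef(rx, ry)[0, 1])


# Simulated values only: RD and memory both depend on activation level.
rng = np.random.default_rng(1)
activation = rng.standard_normal(468)
rd = 0.5 * activation + rng.standard_normal(468)
memory = 0.5 * activation + 0.2 * rd + rng.standard_normal(468)
print(round(partial_corr(rd, memory, activation), 3))
```

Residualizing both variables removes any association that is carried only by their shared dependence on activation, which is the concern raised by the RD-activation correlation in some ROIs.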
Fig. 4.
The association between RD and memory performance.
(A) Scatterplots of memory performance (ACC_association) and dimensionality of the (left) FSR and (right) HIP. See table S1 for the correlation coefficients for each subregion after controlling the activation (ACT) level. (B) The dimensionalities of the MTL, medial frontal gyrus, and part of parietal lobe were positively related to associative memory performance (Z > 3.29, whole-brain corrected) after controlling for activation level. (C) Differences in dimensionality between remembered pairs and forgotten pairs in FSR and HIP. (D) Differences in dimensionality between remembered pairs and forgotten pairs in face-selective subregions and HIP. Error bars represent SEs across participants. SME was based on 10 items under each condition. Similar results were found when using 7 to 13 items (fig. S6). ~Pcorrected ≤ 0.1, *Pcorrected ≤ 0.05, **Pcorrected ≤ 0.01, ***Pcorrected ≤ 0.001.
Given that local smoothness may affect the estimation of dimensionality, we controlled for the smoothness of each participant's activation images when conducting the correlation analysis. Specifically, 3dFWHMx was used to calculate the smoothness in the X, Y, and Z directions, which was then averaged for each participant. The results showed that RD and memory performance were still significantly correlated after controlling for smoothness (fig. S3). Moreover, whole-brain searchlight analysis revealed that the dimensionality in the medial temporal lobe (MTL) and parts of the frontal and parietal lobes was also significantly positively correlated with associative recognition (Fig. 4B), in addition to the fusiform and occipital gyri. The results suggest that the dimension of representation contributes to successful memory encoding beyond the activation level.
To examine the specificity of the association between RD and face-name associative memory, we further correlated RDeff with the behavioral performance of the digital n-back task conducted in the scanner (see Materials and Methods). We found that individuals who performed better in the n-back task also showed higher associative recognition accuracy (R = 0.38, P < 0.001) and item recognition d′ (R = 0.30, P < 0.001) in the face-name associative memory task (fig. S4), suggesting that working memory could contribute to episodic memory. Nevertheless, we found no significant correlation between working memory performance and RDeff in FSR or HIP after controlling for the activation level (all P > 0.9). Separately for each subregion, the correlations were not significant either (all P > 0.1). Whole-brain searchlight analysis also failed to reveal significant correlations. In addition, RDeff was significantly related to associative (R = 0.12, P = 0.018) and item memory (R = 0.15, P = 0.003) in FSR and item memory in HIP (R = 0.10, P = 0.038) after controlling for working memory performance. These results suggest that RDeff contributes specifically to face-name associative memory but not to general cognitive ability.
Subsequently remembered faces showed higher RDs
To further establish the relationship between RD and face memory, we examined SME by comparing RDs of subsequently remembered faces (faces with correct face-name associative recognition) with those of forgotten faces (all other faces). In this analysis, we excluded the participants whose accuracy for associative recognition was below chance level (i.e., 25%) or whose d′_item was smaller than 0 (n = 34). In addition, the number of items in the two conditions needed to be matched. As shown in fig. S5, there was a trade-off between the minimal number of items and the number of participants that could be included in this analysis. We therefore systematically selected 7 to 13 (step = 1) remembered and forgotten items for each participant and estimated their dimensionality in each region. We report the results for 10 items in the main text because this condition showed a good balance between the number of items and the number of participants (fig. S5). Similar results were found when different numbers of items were included (fig. S6).
Intriguingly, we found that subsequently remembered faces showed greater dimensionality than subsequently forgotten faces in FSR (t = 3.843, P < 0.001) and HIP (t = 2.267, P = 0.024; Fig. 4C). Separately for each subregion (Fig. 2, B and C), we found that the dimensionality in the bilateral pSTS (left: t = 2.976, Pcorrected = 0.019; right: t = 3.343, Pcorrected = 0.005) and anterior STS (aSTS; left: t = 2.721, Pcorrected = 0.034; right: t = 3.597, Pcorrected = 0.002) showed significant SME, whereas other subregions did not (Pcorrected > 0.09; Fig. 4D). Moreover, whole-brain searchlight analysis revealed that, in addition to the occipital gyrus, the left medial frontal gyrus, bilateral angular gyrus, and precuneus also showed SME of RDeff (fig. S7A). However, we did not find significant SME of RDeig and RDvar in any region (P > 0.052, uncorrected), which might be due to the insufficient discriminability of RDeig and RDvar.
Given the important role of HIP in associative memory, we further examined whether HIP and FSR could contribute differentially to item recognition and associative recognition. We thus compared RDeff between faces with only item recognition (n = 10) and those with associative recognition (n = 10). In total, 228 participants were included in the analyses. We found that faces with associative recognition showed greater RDeff than faces with item recognition in HIP (t = 2.045, P = 0.042) but not in FSR (t = 1.052, P = 0.294). Nevertheless, the region-by-memory interaction was not significant (F = 0.39, P = 0.53). These results provide partial support for the important role of high-dimensional hippocampal representations in associative memory.
RD did not reflect item-specific pattern similarity
Having identified the association between RD and face memory, we further examined the underlying mechanisms. Previous studies have found that item-specific neural representations are associated with memory performance, and a greater RD could result in a more distinctive and item-specific representation. To test the hypothesis that greater RD could improve episodic memory by increasing the item-specific representation, we first calculated the within-item (WI), between-item (BI), and item-specific pattern similarity (PS) and then correlated them with RD. WIPS was measured as the Pearson correlation of the activation pattern across the two repetitions of the same face in each ROI, whereas BIPS was measured as the correlation between pairs of different faces that matched the WI pairs in memory performance and intertrial interval.
Contrary to our hypothesis, partial correlation analysis with activation level as a covariate revealed no significant correlations between item-specific PS and RD in FSRs or HIP (R ranging from −0.091 to 0.059, all Pcorrected > 0.29). We used two one-sided tests (TOSTs) based on the TOSTER R package to statistically reject the presence of a meaningful correlation between RDeff and item-specific PS (i.e., an equivalence test). We found that with a sample size of 468 and 80% power, the equivalence bounds were −0.13 and 0.13. In most ROIs, the 90% confidence intervals (CIs) of TOST were within the lower and upper bounds (PTOST < 0.05), except for the right aSTS (PTOST = 0.061) and HIP (PTOST = 0.196; fig. S8), suggesting that RDeff and item-specific PS were statistically uncorrelated.
In addition, the item-specific PS was not correlated with the accuracy of associative recognition (R ranging from −0.068 to 0.082, Pcorrected > 0.45) or the d′ of item recognition (R ranging from −0.099 to 0.043, Pcorrected > 0.19). Using the R package cocor to compare the correlation of RDeff with memory performance against that of item-specific PS with memory performance, we found that the correlation between RDeff and ACC_association was significantly higher than that between item-specific PS and ACC_association in FSR (Z = 2.477, P = 0.013) but not in HIP (Z = 1.624, P = 0.104; table S2). The correlation between RDeff and d′_item was significantly higher than that between item-specific PS and d′_item in both FSR (Z = 3.921, P < 0.001) and HIP (Z = 3.296, P = 0.001; table S2). Separately for each subregion, significant differences were observed in the right OFA (Z = 2.708, Pcorrected = 0.034) and aSTS (Z = 2.818, Pcorrected = 0.029) for associative memory and in all subregions for item memory (all Pcorrected < 0.05; table S2).
In addition to intersubject correlation analysis, representational similarity analysis (RSA) revealed no SME of item-specific PS in FSRs or HIP (F ranging from 0.033 to 1.299, P > 0.2). Moreover, although several brain regions showed SME of item-specific PS, including the right inferior parietal lobule (Montreal Neurological Institute, MNI: 54, −46, 44; Z = 3.997) and superior frontal gyrus (SFG; MNI: 22, 32, 60; Z = 3.795; fig. S7B), in none of these regions was there SME of RD (fig. S7A). Direct comparison of SME of RDeff and item-specific PS revealed significantly larger SME of RDeff than item-specific PS in the left superior occipital gyrus (SOG) extending to the superior parietal lobe (SPL), the left middle occipital gyrus (MOG), the right posterior cingulate cortex (PCC), and the bilateral occipital fusiform gyrus (OFG), and an opposite effect in the right SFG, the left middle temporal gyrus (MTG), and the left precentral gyrus (fig. S7C). Together, these results suggest that item-specific PS and RD reflect different aspects of neural information representation and contribute complementarily to episodic memory.
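The within-item and between-item pattern similarity measures used in this section can be sketched as follows. Two simplifications are assumptions of this sketch, not the procedure reported here: BIPS is averaged over all cross-repetition pairs of different faces rather than over pairs matched on memory performance and intertrial interval, and item-specific PS is taken as WIPS minus BIPS. The function name `pattern_similarity` is hypothetical.

```python
import numpy as np


def pattern_similarity(rep1, rep2):
    """Return (WIPS, BIPS, item-specific PS) for one ROI.

    rep1, rep2: (n_items, m_voxels) patterns from the first and second
    presentation, with the same item order in both arrays.
    """
    n = rep1.shape[0]
    # Pearson correlations between every rep1 pattern and every rep2 pattern.
    r = np.corrcoef(rep1, rep2)[:n, n:]
    wips = float(np.diag(r).mean())                 # same face, two repetitions
    bips = float(r[~np.eye(n, dtype=bool)].mean())  # different faces (unmatched)
    return wips, bips, wips - bips                  # assumed definition of PS


# Simulated data only: each face has a stable pattern plus repetition noise.
rng = np.random.default_rng(2)
base = rng.standard_normal((30, 150))
rep1 = base + 0.5 * rng.standard_normal(base.shape)
rep2 = base + 0.5 * rng.standard_normal(base.shape)
print(pattern_similarity(rep1, rep2))
```

Item-specific PS measures how reproducible each face's pattern is across repetitions, whereas RD measures how many dimensions the set of patterns spans, which is one way to see why the two indices need not covary.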
Frontoparietal activity positively correlated with RD
The above analyses have revealed close relationships between RD and episodic memory. What could contribute to the dimensionality of representation? Existing studies suggest that frontoparietal attention might exert top-down control (), which modulates RD (, ). More specifically, this modulation is posited to be achieved by suppressing the shared low-dimensional variance () and reducing the correlation of neuronal activities in the posterior regions (, ), resulting in more neural encoding space and thus greater RD. To test this mechanism in human participants, we examined whether frontoparietal activities can modulate the shared low-dimensional variance, the cross-voxel correlation of brain activities, and RD.

Given the close relationship between attention control and memory (, , , , ), the top-down control regions were defined as the frontoparietal regions whose activity was associated with memory performance both across items and across participants. Using the univariate activation level to predict subsequent memory performance, we found that several brain regions showed stronger activation for subsequently recognized pairs than forgotten pairs (fig. S9, top), including the left inferior frontal gyrus (IFG) (MNI: −46, 28, 16; Z = 4.57), left SFG (MNI: −4, 62, 22; Z = 4.675) extended to frontal pole (FP), left MTL (MNI: −18, −4, −16; Z = 5.949), left MOG (MNI: −28, −102, −4; Z = 4.082), right inferior temporal gyrus (ITG) (MNI: 46, −60, −6; Z = 3.961), right inferior occipital gyrus (IOG) (MNI: 40, −88, −10; Z = 5.107) extended to fusiform, right IFG (MNI: 46, 26, 24; Z = 3.584), and right SPL (MNI: 28, −62, 50; Z = 3.997).

Correlational analysis suggested that individuals with better memory performance (ACC_association) showed greater activation in the left SFG (MNI: −4, 60, 36; Z = 4.176), left FP (MNI: 0, 8, 66; Z = 4.478), bilateral MOG (MNI: −4, −84, −16; Z = 5.879) extended to the fusiform gyrus and parietal lobe, bilateral MTL extended to the cingulate gyrus (MNI: 2, −44, 4; Z = 5.141), and bilateral caudate (MNI: 10, −4, 8; Z = 6.097) extended to the thalamus and right putamen (fig. S9, middle). Conjunction analysis revealed that the left FP, SFG, right SPL, bilateral MTL, and bilateral MOG were associated with episodic memory performance both across items and across participants (fig. S9, bottom). The left FP, SFG, and right SPL were thus defined as the top-down control regions for subsequent analyses.

Focusing on the frontoparietal regions, partial correlation analyses (after controlling for the activation level in the posterior regions) revealed that the activation in the left SFG was significantly related to dimensionality in the bilateral pSTS (left: R = 0.195, Pcorrected < 0.001; right: R = 0.190, Pcorrected < 0.001) and aSTS (left: R = 0.158, Pcorrected = 0.003; right: R = 0.157, Pcorrected = 0.003; table S3). Activation in the right SPL was significantly positively correlated with dimensionality in the bilateral pSTS (left: R = 0.150, Pcorrected = 0.007; right: R = 0.168, Pcorrected = 0.002), right pcSTS (R = 0.130, Pcorrected = 0.019), aSTS (R = 0.143, Pcorrected = 0.009), and HIP (R = 0.126, Pcorrected = 0.019; table S3). No significant correlation was found between left FP activation and dimensionality in the face-selective subregions or HIP (all P > 0.16).
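For readers who want to see the logic of a partial correlation of this kind, the following is a minimal Python sketch, not the analysis code used in this study. It assumes that "controlling for" a covariate means regressing it out of both variables by ordinary least squares and correlating the residuals. The function name `partial_correlation`, the variable names, and the simulated data (2000 hypothetical observations) are illustrative assumptions only.

```python
import numpy as np

def partial_correlation(x, y, covariate):
    """Pearson correlation between x and y after regressing the covariate out of both."""
    x, y, c = (np.asarray(a, dtype=float) for a in (x, y, covariate))
    design = np.column_stack([np.ones_like(c), c])  # intercept + covariate
    res_x = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]
    res_y = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    return float(np.corrcoef(res_x, res_y)[0, 1])

# Simulated (hypothetical) data: both measures depend on a shared covariate only.
rng = np.random.default_rng(1)
n = 2000
posterior = rng.standard_normal(n)              # stand-in for posterior activation
frontal = posterior + rng.standard_normal(n)    # stand-in for frontoparietal activation
rd = posterior + rng.standard_normal(n)         # stand-in for dimensionality

raw_r = float(np.corrcoef(frontal, rd)[0, 1])
partial_r = partial_correlation(frontal, rd, posterior)
```

In this simulation the raw correlation is driven entirely by the shared covariate, so the partial correlation falls close to zero. A partial correlation that remains reliably nonzero, as reported above, would therefore not be explained by the controlled activation level alone.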
Frontoparietal activities negatively correlated with low-dimensional variability
We found that the left SFG activation was negatively correlated with the eigenvalues of the first six components in FSR and HIP after controlling for the activation level (Fig. 5A, left). Similarly, the right SPL activation was negatively correlated with the first seven eigenvalues in FSR and the first six eigenvalues in HIP (Fig. 5A, right). These results are consistent with the results from population neuron recording () and further suggest that top-down activities could suppress the first few shared variances (i.e., low-dimensional variances) and increase the dimensionality of neural representations.
Fig. 5.
Top-down attentional modulations of RDs.
(A) Partial correlation between the activation in the left SFG (lSFG) and right SPL (rSPL) with the eigenvalue of each PC in FSR and HIP. Mean activation of these two regions was controlled as a confounding variable. (B) Results of Pearson correlation analyses between the eigenvalue of each PC with associative recognition (ACC_association) and item recognition (d′_item) after controlling for the activation level. The asterisks indicate that the partial correlation coefficient was statistically significant (i.e., P < 0.05). (C) SME for the eigenvalue of each PC in FSR. Error bars represent the within-subject SEs. (D) Schematic diagram of homogeneity of activation (HOA) analysis. (E) Local HOA mediates the relationship between lSFG and rSPL activation and local RD (related to Table 1). a × b, indirect (mediation) effect; c′, direct effect; c, total effect (indirect + direct); **P < 0.01.
Table 1.
The mediation effect of HOA on the correlation between frontoparietal activation and RDeff in face-selective areas.
To further examine the functional role of the low-dimensional variances, we correlated memory performance with the eigenvalue of each PC. For FSR, the eigenvalues of the first seven components were negatively correlated with associative recognition (ACC_association), whereas the eigenvalues from the eighth component onward were positively correlated with associative memory performance (Fig. 5B, left). For HIP, the first 9 eigenvalues (except for the 8th eigenvalue) were negatively correlated with associative memory performance, and positive associations were found from the 10th component onward (Fig. 5B, left). Similar results were found for item recognition (d′_item; Fig. 5B, right). We also systematically varied the number of items (n from 10 to 29) included in this analysis, which again revealed the same pattern of correlation in FSR and HIP (fig. S10).

Similar results were found when comparing subsequently remembered and forgotten items. In particular, for FSR, the eigenvalue of the second component (t = 2.826, P = 0.005) was greater for the forgotten faces (n = 10) than the remembered faces (n = 10), whereas a reversed pattern was found for the sixth component (t = −2.946, P = 0.003) (Fig. 5C). Together, these results suggest that low-dimensional variance was associated with worse memory performance and that frontoparietal activities could suppress this variance and improve memory performance.
Frontoparietal activities modulated the cross-voxel correlation of brain activities
One way to suppress the low-dimensional shared variance is to reduce the correlation (i.e., homogeneity) of local neural responses (, , , ), which could increase the encoding space of neurons and increase the dimensionality. To test this hypothesis, we further examined whether this top-down effect was achieved by reducing the cross-voxel correlation of brain activities in posterior regions. To do this, we calculated the homogeneity of activation (HOA), which is the averaged cross-voxel correlation (Fisher’s r-to-Z transformed) in each ROI (Fig. 5D). A higher HOA would indicate greater correlations among voxels and thus a smaller encoding space. Consistent with our hypothesis, we found that HOAs of all the subregions were negatively correlated with their RDeff after controlling for the activation level (R ranging from −0.137 to −0.528, all Pcorrected < 0.01; table S4). Moreover, greater left SFG activation was associated with lower HOA in the left pSTS (R = −0.169, Pcorrected = 0.002) and the right aSTS (R = −0.122, Pcorrected = 0.050), and greater right SPL activation was associated with lower HOA in the left pcSTS (R = −0.171, Pcorrected = 0.001) and the right pSTS (R = −0.134, Pcorrected = 0.022; table S4).

The mediation analyses revealed that HOA in the left pSTS, left aSTS, and right aSTS partially mediated the relationship between the left SFG activation and the dimensionality in these regions (Fig. 5E and Table 1). In addition, HOA in the left pcSTS completely mediated the relationship between rSPL activation and dimensionality in this region, and there was a partial mediation effect in the right pSTS (Fig. 5E and Table 1). These results indicate that top-down attentional control could increase the neural coding space in STS, resulting in a larger RD.
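The HOA measure can be illustrated with a short Python sketch. This is an illustration under stated assumptions rather than the authors' implementation: it assumes that the input is an items × voxels matrix of single-item responses for one ROI, that voxel pairs are correlated across items, and that correlations are clipped slightly inside ±1 so that the Fisher transform stays finite. The function name `homogeneity_of_activation` and the simulated data are hypothetical.

```python
import numpy as np

def homogeneity_of_activation(patterns):
    """Mean Fisher z-transformed correlation over all voxel pairs in one ROI.

    patterns: array of shape (n_items, n_voxels); this layout is an assumption.
    """
    patterns = np.asarray(patterns, dtype=float)
    r = np.corrcoef(patterns, rowvar=False)          # voxel-by-voxel correlation matrix
    upper = np.triu_indices_from(r, k=1)             # each voxel pair counted once
    r_pairs = np.clip(r[upper], -0.999999, 0.999999) # keep arctanh finite
    return float(np.mean(np.arctanh(r_pairs)))       # Fisher r-to-z, then average

# Simulated (hypothetical) ROI with 30 items and 50 voxels.
rng = np.random.default_rng(0)
shared = rng.standard_normal((30, 1))                # fluctuation shared by all voxels
independent = rng.standard_normal((30, 50))          # voxels with no shared signal
correlated = independent + 2.0 * shared              # shared signal added to every voxel

hoa_independent = homogeneity_of_activation(independent)
hoa_correlated = homogeneity_of_activation(correlated)
```

Adding a fluctuation shared by all voxels raises HOA, which matches the interpretation above that a higher HOA reflects greater cross-voxel correlation and a smaller encoding space.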
How the neural system encodes information to achieve efficiency and generalization is still under debate (, ). A neural system encodes information most efficiently, but at high cost, when its stimulus responses are high-dimensional and uncorrelated (, , ), and most robustly, but redundantly, when they are low-dimensional and correlated (, ). To achieve a balance, it is posited that the brain needs a trade-off mechanism that could reduce the dimensionality to eliminate irrelevant factors and at the same time recast the remaining factors into a high-dimensional space to generate complex behavior (). This mechanism has gained support from a recent animal study, which found efficient and robust representation in the mouse brain (). The current human neuroimaging study extends this observation by showing that the human brain also encodes high-dimensional yet robust face representations. The RD is lower than the maximal possible dimensionality but is much higher than the dimensionality required by the deep neural network model to identify individual faces.

Consistent with the hierarchical organization of face representation (), we found that the representations in the more anterior face regions showed significantly higher dimensionality. In particular, face-specific processes are initiated in OFA based on local facial features, and then the information is forwarded to higher-level regions, such as FFA, for holistic processing (, , ). Consistently, lesions in more posterior areas tend to cause facial perception deficits, whereas lesions in more anterior areas are likely to produce configuration processing or facial memory problems (–). A recent fMRI study combining computational models directly examined the information representations in FSRs and revealed that OFA represented low-level image-computable properties (such as the grayscale), whereas FFA contained high-level visual features and social information ().
Last, the greater RD in the right hemisphere FSR is also consistent with the right hemisphere lateralization of face processing (–).

Notably, we found that HIP had higher dimensionality than FSR. This is in line with its highly sparse and unrelated neural representations (–). A large percentage of cells in HIP exhibited poor spatial tuning () and showed sensitivity to the visual similarity of objects (). This sparse coding is posited to contribute to pattern separation (–) by projecting neuronal activity onto a larger population and reducing the proportion of active neurons, thereby reducing the overlap of different neuronal firing patterns (, ). Given the different neuronal computations in the hippocampal subfields, e.g., the dentate gyrus (DG) in pattern separation (, , ) and the CA3 region in pattern completion through autoassociation (–), future studies should further examine RDs in the hippocampal subfields using ultrahigh-resolution scans.

Our results revealed an important and robust relationship between RDs and subsequent memory. Using a large sample and individual difference approach, we found that individuals showing overall higher RDs had better memory performance. The positive correlation between RD and memory performance does not depend critically on specific measurements of dimensionality but is generally applicable to different indicators (i.e., RDeff and RDvar). In addition to FSRs and MTL, whole-brain searchlight analysis revealed that RD in the posterior midline regions (PMRs) could also predict memory performance. PMR usually shows below-baseline activation during episodic encoding, and this deactivation reliably predicts successful memory performance (–). Critically, RSA found that the (de)activation pattern carried information about the encoded items, and PS predicts subsequent memory performance (, ). As a result, PMR has been considered an extension of the hippocampal memory system (, ).
Consistently, we also found significant item-specific representation in this region (t = 5.120, P < 0.001). These results together emphasize the contribution of the posterior midline structure to successful memory encoding.

In addition to the cross-subject correlations, the within-subject analysis revealed that subsequently remembered faces showed greater RDs than subsequently forgotten faces. Partially supporting the role of HIP in associative memory (, ), we further found that faces with associative recognition showed higher RDeff than those with only item recognition in HIP but not in FSR. These results are consistent with the idea that the high dimensionality of neuronal representations might optimize information encoding () and could resist distractors (). Consistently, reinforcement learning could increase the dimensionality of neuronal response (), and quick learners have a higher dimensional representation than slow learners (). In contrast, error trials showed reduced dimensionality ().

Animal studies have shown that frontoparietal attentional activity can reduce the baseline correlation between neurons (–, ) and increase the dimensionality of neuronal encoding (, ). Computational modeling suggests that these observations could be parsimoniously achieved by a top-down modulation of inhibitory neurons, which could suppress population-wide fluctuations, reduce low-dimensional shared variabilities (), and stabilize neuronal responses in the posterior regions ().

The present study provides important human neuroimaging evidence to support and extend this top-down mechanism. In particular, we found that RD in face-selective subregions was correlated with activation in the frontoparietal area (i.e., left SFG and right SPL), a critical component of the dorsal attention system implicated in goal-directed and exogenous shifts in attention ().
We found that the frontoparietal activities were negatively correlated with the eigenvalues in the first few PCs, which reflected the shared low-dimensional variances of neuronal activity. We further found that these shared low-dimensional variances were associated with worse memory performance. Together, these human results are consistent with the hypothesis that top-down control could suppress the shared low-dimensional variances and reduce the correlation of local neural response, which, in turn, increases the encoding space (dimensionality) and the amount of information encoded by a given population of neurons. This ultimately leads to better episodic memory performance. Nevertheless, given the correlational nature of our analysis, the causal relationship should be established in future studies by using noninvasive brain stimulation approaches ().

Our results suggest that RD and item-specific representation might reflect different aspects of information representation and contribute separately to memory performance. First, extending previous studies (–, –), we found that item-specific PS in several brain regions was associated with better subsequent episodic memory (). Nevertheless, the regions showing significant SME of item-specific PS did not overlap with FSR and HIP, whose RDs were associated with memory performance. Second, RD did not correlate with item-specific PS. Last, the cross-subject analysis showed that memory performance was only correlated with RD but not with item-specific PS. Existing studies suggest that item-specific PS reflects the reproducibility and uniqueness of item representation, which could be contributed by study-phase retrieval (), dynamic representational transformation during encoding (), and top-down attentional modulation (). In contrast, RD reflects the degree of nonshared variances across all stimuli.
More studies are required to further understand the nature and contributing factors of these diverse aspects of neural representations.

Moreover, it is still a challenge to objectively determine the dimension of representations in the neural system with significant noise. To circumvent this issue, we relied on cross-subject and cross-item comparisons and used multiple indices to obtain consistent and reliable results. Nevertheless, the relatively small number of items might prevent a detailed characterization of the information representations in the brain. Future studies could use significantly more items () and cross-validation procedures () to recover the dimensionalities on data with noisy structures. In addition, future studies should use and compare multiple measures that reflect RD, including the intervoxel similarity that examines the averaged cross-voxel correlations of BOLD time series (, ).

Last, extant results support the idea that the brain encodes information in a goal-directed manner, and dynamic changes in RDs have been found during various stages of the task (). For example, in a categorization learning task, the ventral medial prefrontal cortex exhibited goal-directed dimension reduction, as predicted by the computational model (). Greater dimensional reduction is also found in an action learning task than in a value learning task, and the latter requires more fine-grained discrimination of different values (). During task switching, the dorsal lateral prefrontal cortex showed low-dimensional representations of the behaviorally relevant categories, enabling the exclusion of irrelevant categories ().
Future studies should examine how the dimensionality of neural representations is flexibly configured in various memory stages to support different memory functions (e.g., familiarity and recollections) and in different memory tasks (e.g., recognition and recall).

To conclude, combining a large fMRI sample and novel analytical approaches, the current study reveals a robust relationship between RD and episodic memory performance and further provides a systematic examination of the underlying mechanisms. These results significantly advance our understanding of the representational mechanisms of episodic memory ().
MATERIALS AND METHODS
Participants
Four hundred seventy-eight healthy Han Chinese college students were recruited for this study as part of a large cohort study in China, i.e., the Cognitive Neurogenetic Study of Chinese Young Adults (CNSCYA). Ten participants were excluded due to incomplete imaging data (N = 1) or large head motion (mean framewise displacement > 0.3, N = 9). As a result, 468 participants (243 females, mean age = 21.44 ± 2.10) were included in the analyses. This study was approved by the Institutional Review Board of the State Key Laboratory of Cognitive Neuroscience and Learning at Beijing Normal University.
Face-name associative memory task
Stimuli
The experimental stimuli consisted of 30 unfamiliar face photographs (15 men and 15 women, which were chosen from the internet). Fictional first names (e.g., “Carol”) and common surnames (e.g., “Lee”) were assigned to each face and were used for encoding. Another 20 unfamiliar face photographs (10 men and 10 women) were used for the memory test. All face pictures were converted into grayscale images with the same size (256 × 256 pixels) on a gray background. All these face pictures had a neutral facial expression. Four additional face-name pairs were used in the practice session.
Procedures
During fMRI data acquisition, participants were asked to remember 30 unfamiliar face-name pairs. For each face-name pair, participants were instructed to remember the name associated with the face for a later memory test and to press a button to indicate whether the name “fit” the face (right index finger = the name fit the face; right middle finger = the name did not fit the face). Participants were informed that it was a purely subjective judgment designed to help them memorize the association between faces and names. Each face-name pair was presented twice, with an interrepetition interval ranging from 8 to 17 trials. A slow event-related design (12 s for each trial) was used in this study to obtain better estimates of the single-trial BOLD response associated with each trial (Fig. 1A). Each trial started with a 0.5-s fixation, followed by a picture presented for 2.5 s. Then, the frame of the picture turned red, which indicated that the participants should press the button to indicate their response within 1.5 s. To prevent further encoding of the pictures, participants were asked to perform a perceptual orientation judgment task for 7.5 s. In this task, a Gabor image tilting 45° to the left or the right was presented on the screen, and participants were asked to identify the orientation of the Gabor image as quickly as possible by pressing one of the two buttons. This task was self-paced, and the next Gabor image appeared 0.2 s after the response. Participants finished only one run of the encoding task, which lasted 12 min. Before the scan, they finished a practice session to familiarize themselves with the task and key responses. They were informed that there would be a subsequent memory test later, but they were not informed of the specific process of the memory test.

Approximately 24 min later, during which the participants performed some other fMRI experiments (an n-back task and a decision-making task), participants were asked to complete the retrieval test in the fMRI scanner. The retrieval test stimuli consisted of the same 30 face pictures from the encoding stage and 20 new face pictures. All the pictures were randomly mixed. For each face picture, three old names from the encoding stage without a surname (the correct name that was actually paired with the face during encoding, and the other two names that were paired with different faces during encoding) and a new choice were presented underneath the face (Fig. 1A). The location of the correct name was counterbalanced so that it appeared equally often at each of the three name locations. Participants were asked to judge whether they had seen the face and to indicate the corresponding correct name if the face was old; otherwise, they had to press the new button if the face was new. Each trial started with a 0.5-s fixation, followed by a picture presented for 4 s. Participants were asked to press the button to indicate their response within 4 s. The responses were made using a button box (the left index finger corresponded to the first name, the left middle finger corresponded to the second name, the right index finger corresponded to the third name, and the right middle finger corresponded to new). After the face, a fixation cross of jittered duration (0 to 8 s) was placed on the center of the screen. This testing run lasted approximately 5 min.
Digital n-back task
After the face-name associative memory task, participants were asked to perform a digital n-back task (from 0-back to 3-back) in the scanner, with 0-back as the baseline. For the 0-back task, participants were to judge whether the number presented in the center of the screen was 7 or not. For 1-back to 3-back, participants had to decide whether the current number was the same as the one shown “n” items (n = 1, 2, or 3) before.
Behavioral data analysis
For the face-name associative task, studied faces recognized with the correct name were defined as remembered items, whereas those recognized with an incorrect name or judged as new were defined as forgotten items. Two behavioral measures were generated from this task, i.e., associative recognition and item recognition. Associative recognition refers to the correct recognition of face-name associations, which was quantified as the accuracy for both the old (i.e., choosing the correct name) and new faces (i.e., correct rejection). In contrast, item recognition refers to the correct recognition of old faces regardless of the correctness of names (hit). New faces that were incorrectly recognized with a given name were defined as false alarm items. The d′ score was used as the index of item recognition and was calculated by using the following formula: d′ = Z (hit rate) − Z (false alarm rate).

For the digital n-back task, we separately calculated d′ for the 1-back, 2-back, and 3-back tasks and averaged them to reflect the overall n-back task performance. Accuracy in the 0-back task was used to screen participants who were not paying attention to the task. Participants with an accuracy lower than 90% in the 0-back task (n = 54) were excluded from further analysis on the relationship between RD and n-back performance.
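The d′ formula above can be written as a small Python sketch, where Z is taken to be the inverse of the standard normal cumulative distribution function. The function name `d_prime` and the example rates are illustrative assumptions. The text does not state how hit or false alarm rates of exactly 0 or 1 were handled, so this sketch applies no correction and would raise an error for such values.

```python
from statistics import NormalDist

def d_prime(hit_rate, false_alarm_rate):
    """d' = Z(hit rate) - Z(false alarm rate), with Z the inverse standard normal CDF.

    Rates must lie strictly between 0 and 1; no correction for extreme
    rates is applied here because the source does not specify one.
    """
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(false_alarm_rate)

# Hypothetical participant: 80% hits on old faces, 10% false alarms on new faces.
example = d_prime(0.80, 0.10)   # approximately 2.12
```

Equal hit and false alarm rates give d′ = 0, that is, no discrimination between old and new faces.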
MRI data collection and processing
MRI data acquisition
Image data were acquired using a 3.0 T Siemens MRI scanner in the Brain Imaging Center at Beijing Normal University. Visual stimuli were projected onto a screen behind the scanner, which was made visible to the participant through a mirror attached to the head coil. Stimuli and responses were presented and recorded by MATLAB (MathWorks) and Psychtoolbox on a Windows PC. A single-shot T2*-weighted gradient-echo, echo planar imaging (EPI) sequence was used for the functional scan with the following parameters: repetition time (TR) = 2000 ms; echo time (TE) = 25 ms; flip angle (FA) = 90°; field of view (FOV) = 192 × 192 mm2; 64 × 64 matrix size with a resolution of 3 × 3 mm2. Forty-one 3-mm transversal slices parallel to the Anterior and Posterior Commissure (AC-PC) line were obtained to cover the whole cerebrum and partial cerebellum. The anatomical scan was acquired using a T1-weighted MPRAGE sequence with the following parameters: TI = 1100 ms, TR/TE/FA = 2530 ms/3.39 ms/7°, FOV = 256 × 256 mm, matrix = 256 × 256, slice thickness = 1.33 mm, and 144 sagittal slices.
fMRI data preprocessing
MRI data were first converted to Brain Imaging Data Structure (BIDS) format using in-house MATLAB scripts. MRIQC v0.15.1 () was used as a preliminary check of MRI data quality.

Image preprocessing analyses were performed by using fMRIPrep v1.4.0 (), which is based on Nipype 1.2.0 (). The first three volumes before the task were automatically discarded by the scanner to allow for T1 equilibrium. Each T1w volume was corrected for intensity nonuniformity and skull-stripped. Spatial normalization to the ICBM 152 Nonlinear 6th Generation Asymmetrical template was performed through nonlinear registration using brain-extracted versions of both the T1w volume and template. All analyses reported here use structural and functional data in MNI space. Brain tissue segmentation of cerebrospinal fluid, white matter, and gray matter was performed on the brain-extracted T1w image. Functional data were slice time–corrected and head motion–corrected. This was followed by coregistration to the corresponding T1w using boundary-based registration with nine degrees of freedom. A mask to exclude signals with cortical origin was obtained by eroding the brain mask, ensuring that it contained only subcortical structures. Framewise displacement was calculated for each functional run.
Univariate activation analysis
Before conducting the univariate activation analysis, data were spatially smoothed using a 6-mm full width at half maximum (FWHM) Gaussian kernel and filtered in the temporal domain using a nonlinear high-pass filter with a 100-s cutoff. General linear modeling within the FMRIB’s Improved Linear Model (FILM) module of FMRIB’s Software Library (FSL) was used to model the data. During the encoding stage, the remembered and forgotten face-name pairs were separately modeled. The incorrect trials in the perceptual orientation task were coded as an additional nuisance variable, whereas the correct trials were not coded and thus were treated as an implicit baseline. Events were modeled at the time of stimulus onset and convolved with the canonical hemodynamic response function (double gamma function). SME was defined as the difference between remembered and forgotten pictures. These contrasts were then used for group analysis with a random-effects model using full FLAME (FMRIB’s Local Analysis of Mixed Effects) Stage 1 with automatic outlier detection (, ). Unless otherwise noted, group images were thresholded using cluster detection statistics, with a height threshold of Z > 2.3 and a cluster probability of P < 0.05, corrected for whole-brain multiple comparisons using Gaussian random field theory.
Single-item response estimation
A general linear model (GLM) was fitted to estimate the activation pattern for each repetition of the face-name pairs during encoding. The same preprocessing procedure as in the univariate analysis was used except that no spatial smoothing was applied. A least-squares single approach was used in this single-trial model, in which the target trial was modeled as one explanatory variable (EV) and all other trials were modeled as another EV (). Each trial was modeled at its presentation time and convolved with a canonical hemodynamic response function (double gamma). This voxelwise GLM was used to compute the activation associated with each of the 60 trials in the task. Single-item response estimation was first performed in native space, and the resulting activation map (the t statistical map ()) was then transformed to MNI152 space using the antsApplyTransforms tool from the Advanced Normalization Tools (ANTs) (). The normalized activation map was used to estimate RD and perform RSA.
Definition of ROIs
We focused the analyses on FSR and bilateral HIP. FSR was defined by the probabilistic functional atlas (threshold at 10% probability) obtained from 202 healthy adults (). FSR consists of OFA, posterior and anterior FFAs (pFFAs and aFFAs), pcSTS, pSTS, and aSTS (Fig. 2B). Considering that aFFA is too small (e.g., the number of voxels in the left aFFA is 182), we combined aFFA and pFFA into FFA in this study. HIP was defined using the Harvard-Oxford probabilistic atlas (threshold at 25% probability; Fig. 2C).
RD estimation
The dimensionality of facial representation was estimated by PCA (Fig. 2A). The dimensionality of the 30 faces was estimated separately for each repetition (Rep1 and Rep2). For a given region containing m voxels, the activation values of the 30 faces were extracted and reshaped as an m (voxels) × n (items, n = 30) matrix. After demeaning along the rows (items) (), the standardized covariance (i.e., correlation) matrix M (n × n) was obtained by calculating pairwise Pearson correlations among activation patterns of items. We then performed PCA on the resulting matrix M by using the MATLAB function pcacov, which is based on singular value decomposition. We estimated RD according to three criteria. First, RDvar was estimated by the number of PCs required to explain 90% of the variance (). Second, RDeig was defined by the number of eigenvalues greater than 1, which is based on the Kaiser rule (). Notably, the Pearson correlation matrix was used to perform PCA because the Kaiser rule (i.e., eigenvalues greater than 1) is used to estimate RD. However, RDeig has one major limitation. That is, for the same number of eigenvalues larger than 1 (RDeig), the cumulative variance explained by these eigenvalues was quite variable among participants. Given the arbitrary cutoff of eigenvalues “larger than 1,” some of the leftover variance is signal rather than just noise. As a result, of two participants who have the same RDeig, the one whose eigenvalues larger than 1 explain less cumulative variance should have more variance explained by the “smaller than 1” eigenvalues and thus a higher overall dimensionality. RDvar has been proposed to address this limitation, where the number of dimensions is determined by the number of eigenvalues required to explain a certain amount of cumulative variance, e.g., 90%. However, this measure might be affected by the number of very small eigenvalues, which are likely to be noise. Third, to jointly take into consideration the major eigenvalues and their explained cumulative variance, we calculated RDeff by dividing RDeig by the cumulative variance (V) explained by eigenvalues larger than 1. Unless otherwise stated, the averaged dimensionality across two repetitions was used for further analyses.

We further estimated the dimensionality of the 30 facial materials in the computational model. First, the 30 faces were passed on to OpenFace () (http://cmusatyalab.github.io/openface/), a pretrained deep neural network, to generate 128 descriptor measurements for each face. The correlation matrix was then obtained by performing the Pearson correlation analysis for each pair of face images. Similarly, the dimensionality for facial images was estimated by using pcacov on the resulting correlation matrix and according to the Kaiser criterion.
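The three indices can be sketched in Python as follows. This is an illustrative approximation, not the MATLAB pipeline used in the study: it uses NumPy's `eigvalsh` in place of `pcacov`, treats V as a proportion between 0 and 1, and reads "demeaning along the rows" as subtracting each voxel's mean across items. All three choices are assumptions. The function names `rd_indices` and `estimate_rd` and the example eigenvalues are hypothetical.

```python
import numpy as np

def rd_indices(eigvals, var_threshold=0.90):
    """Return (RDvar, RDeig, RDeff) from the eigenvalues of a correlation matrix."""
    eigvals = np.sort(np.asarray(eigvals, dtype=float))[::-1]  # descending order
    explained = eigvals / eigvals.sum()            # proportion of variance per PC
    cumulative = np.cumsum(explained)
    # RDvar: number of PCs needed to reach the cumulative variance threshold
    rd_var = int(np.searchsorted(cumulative, var_threshold) + 1)
    # RDeig: Kaiser rule, number of eigenvalues greater than 1
    rd_eig = int(np.sum(eigvals > 1.0))
    # RDeff: RDeig divided by the proportion of variance (V) those PCs explain
    v = explained[:rd_eig].sum()
    rd_eff = rd_eig / v if rd_eig > 0 else float("nan")
    return rd_var, rd_eig, rd_eff

def estimate_rd(patterns, var_threshold=0.90):
    """patterns: (m voxels, n items) activation matrix for one ROI and repetition."""
    patterns = np.asarray(patterns, dtype=float)
    # Assumption: remove each voxel's mean across items before correlating items
    patterns = patterns - patterns.mean(axis=1, keepdims=True)
    corr = np.corrcoef(patterns, rowvar=False)     # n x n item-by-item correlations
    eigvals = np.clip(np.linalg.eigvalsh(corr), 0.0, None)  # drop tiny negative values
    return rd_indices(eigvals, var_threshold)

# Hypothetical eigenvalues of a 5 x 5 correlation matrix (they sum to 5).
rd_var, rd_eig, rd_eff = rd_indices([2.5, 1.5, 0.6, 0.3, 0.1])
```

In this toy example, two eigenvalues exceed 1 and together explain 80% of the variance, so RDeig = 2 and RDeff = 2 / 0.8 = 2.5, whereas three PCs are needed to reach 90% of the variance (RDvar = 3). Because V cannot exceed 1, RDeff is never smaller than RDeig, and it grows as the eigenvalues above 1 account for less of the total variance.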
Representational similarity analysis
RSA was based on the single-item response pattern. Before conducting RSA, the mean activation pattern across all trials was subtracted from the activation images for each participant. We then calculated WI, BI, and item-specific (WI-BI) PS. WIPS was measured as the Pearson correlation of the activation pattern across the two repetitions of the same face, whereas BIPS was measured as the correlation between pairs of different faces that matched the WI pairs in memory performance and intertrial interval. This analysis was performed in the predefined ROIs and across the whole brain using the searchlight method with 125 surrounding voxels. SME was examined by comparing the item-specific PS (i.e., WI-BI PS) between recognized and forgotten face-name pairs. A random-effects model was used for group analysis. Because no first-level variance was available, an ordinary least squares model was used. Notably, one participant was excluded from the SME analysis because of nearly perfect memory performance (i.e., 29 of the 30 face-name pairs were recognized).
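The WI, BI, and item-specific PS measures can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the function name item_specific_ps is hypothetical, and BIPS is averaged over all pairs of different faces, whereas the analysis described above restricted BI pairs to those matched with the WI pairs in memory performance and intertrial interval.

```python
import numpy as np

def item_specific_ps(rep1, rep2):
    """rep1, rep2: (n_items, n_voxels) single-item patterns for each repetition."""
    # Subtract the mean activation pattern across all trials.
    grand_mean = np.vstack([rep1, rep2]).mean(axis=0)
    rep1 = rep1 - grand_mean
    rep2 = rep2 - grand_mean
    n = rep1.shape[0]
    # cross[i, j] = Pearson correlation between item i (Rep1) and item j (Rep2).
    cross = np.corrcoef(rep1, rep2)[:n, n:]
    # WIPS: same face across the two repetitions (diagonal entries).
    wi = np.diag(cross).copy()
    # BIPS: mean over all different faces (simplification: no matching applied).
    bi = (cross.sum(axis=1) - wi) / (n - 1)
    return wi, bi, wi - bi

# Usage with simulated patterns (30 items, 125 voxels); values are illustrative only.
rng = np.random.default_rng(0)
base = rng.standard_normal((30, 125))
wi, bi, ips = item_specific_ps(base + rng.standard_normal((30, 125)),
                               base + rng.standard_normal((30, 125)))
print(wi.mean(), bi.mean(), ips.mean())
```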
Direct comparison of SME of RDeff and item-specific PS
In addition, we directly compared the SME between RDeff and item-specific PS using z score maps. First, RDeff for remembered and forgotten items was z-scored for each voxel and each participant, and the SME of RDeff was obtained by subtracting the z-scored RDeff of forgotten items from that of remembered items. Notably, this z transformation did not affect the SME. The same z transformation was performed for the item-specific PS, and the SME of item-specific PS was obtained by subtracting the z-scored item-specific PS of forgotten items from that of remembered items. The resultant z score maps were then directly compared to examine the difference in SME between RDeff and item-specific PS. The numbers of remembered and forgotten items were matched, and the 321 participants who had at least 10 remembered and 10 forgotten items were included in this analysis.
Two one-sided tests
TOSTs implemented in the TOSTER R package were used to test the independence of RDeff and item-specific PS. First, the powerTOSTr function was used to determine the equivalence bounds for a sample size of 468 and 80% power. The TOSTr function was then used to test the hypothesis of a lack of association between RDeff and item-specific PS. The TOST null hypothesis (i.e., that the true association lies outside the equivalence bounds) is rejected, and equivalence/independence can be concluded, if the 90% CI lies within the lower and upper bounds. In the null hypothesis significance test (NHST), the null hypothesis of no association is retained if the 95% CI includes zero.
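The logic of a TOST for a correlation coefficient can be sketched in Python using the Fisher z approximation. This is an illustration of the general procedure, not a reimplementation of the TOSTER functions; the function names are hypothetical, and the equivalence bounds of ±0.15 in the usage example are placeholder values, not the bounds used in this study.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def tost_correlation(r, n, low_bound, high_bound):
    """Two one-sided tests for a Pearson r against equivalence bounds (Fisher z)."""
    se = 1.0 / math.sqrt(n - 3)      # standard error of the Fisher z value
    z_r = math.atanh(r)
    # Test 1: H0 rho <= low_bound; a small p means r is reliably above the lower bound.
    p_lower = 1.0 - normal_cdf((z_r - math.atanh(low_bound)) / se)
    # Test 2: H0 rho >= high_bound; a small p means r is reliably below the upper bound.
    p_upper = normal_cdf((z_r - math.atanh(high_bound)) / se)
    # Equivalence requires both tests to be significant, so report the larger p.
    p_tost = max(p_lower, p_upper)
    # 90% CI for r; it lies within the bounds exactly when p_tost < 0.05.
    z_crit = 1.6448536269514722
    ci90 = (math.tanh(z_r - z_crit * se), math.tanh(z_r + z_crit * se))
    return p_tost, ci90

# Usage: n = 468 as in the sample; r and the bounds are placeholder values.
p_tost, ci90 = tost_correlation(r=0.0, n=468, low_bound=-0.15, high_bound=0.15)
print(p_tost, ci90)
```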
Mediation analysis
Mediation effect tests were implemented with the mediation package in R 4.1.1 to examine whether HOA mediated the relationship between frontoparietal activation and dimensionality in the face-selective areas. HOA was estimated by averaging the Fisher's z-transformed cross-trial Pearson correlation coefficients among the voxels in each ROI (Fig. 5D). We examined the relationship between (i) frontoparietal activation (X) and the dimensionality in the posterior ROIs (Y) (Y = k1 + cX + ε1), (ii) HOA in the posterior ROIs (M) and frontoparietal activation (X) (M = k2 + aX + ε2), and (iii) frontoparietal activation (X) and the dimensionality in the posterior ROIs (Y) with the mediator included (Y = k3 + c′X + bM + ε3). In the above equations, X (frontoparietal activation) is the predictor, Y (dimensionality) is the dependent variable, and M (HOA) is the mediator. The indirect effect was estimated as a × b, and the mediated proportion was estimated as the indirect effect divided by the total effect (c = a × b + c′). Unless otherwise specified, P values were corrected for multiple comparisons using the Holm-Bonferroni method.
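The three regression equations and the decomposition c = a × b + c′ can be illustrated with a minimal Python sketch based on ordinary least squares. It returns point estimates only and does not reproduce the inference performed by the R mediation package; the function names and the simulated data are hypothetical.

```python
import numpy as np

def ols(y, *predictors):
    """Least-squares coefficients [intercept, slope_1, slope_2, ...]."""
    design = np.column_stack([np.ones(len(y)), *predictors])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

def mediation_estimates(x, m, y):
    """Point estimates for the X -> M -> Y mediation model."""
    c = ols(y, x)[1]                  # (i)   Y = k1 + cX: total effect
    a = ols(m, x)[1]                  # (ii)  M = k2 + aX
    _, c_prime, b = ols(y, x, m)      # (iii) Y = k3 + c'X + bM
    indirect = a * b
    return {"a": a, "b": b, "total": c, "direct": c_prime,
            "indirect": indirect, "proportion": indirect / c}

# Usage with simulated data; the effect sizes are arbitrary illustrative values.
rng = np.random.default_rng(1)
x = rng.standard_normal(468)                       # stands in for frontoparietal activation
m = -0.5 * x + rng.standard_normal(468)            # stands in for HOA
y = 0.2 * x - 0.6 * m + rng.standard_normal(468)   # stands in for dimensionality
print(mediation_estimates(x, m, y))
```

With ordinary least squares fitted on the same sample, the total effect equals the sum of the indirect and direct effects, which is why the mediated proportion can be computed as a × b divided by c.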