An efficient functional magnetic resonance imaging data reduction strategy using neighborhood preserving embedding algorithm.

Wei Zhao¹, Huanjie Li¹, Yuxing Hao¹, Guoqiang Hu¹, Yunge Zhang¹, Blaise de B Frederick²,³, Fengyu Cong¹,⁴,⁵,⁶.

Abstract

High-dimensional data have become common in neuroimaging, especially in group-level functional magnetic resonance imaging (fMRI) datasets. fMRI connectivity analysis is a widely used, powerful technique for studying functional brain networks to probe the underlying mechanisms of brain function and neuropsychological disorders. However, data-driven techniques such as independent component analysis (ICA) can yield unstable and inconsistent results, confounding the true effects of interest and hindering the understanding of brain functionality and connectivity. A key contributing factor to this instability is the information loss that occurs during fMRI data reduction. Reducing high-dimensional fMRI data in the temporal domain to identify the important information within group datasets is necessary for such analyses and is crucial to ensure the accuracy and stability of the outputs. In this study, we describe an fMRI data reduction strategy based on an adapted neighborhood preserving embedding (NPE) algorithm. Both simulated and real data results indicate that, compared with the widely used data reduction method, principal component analysis, the NPE-based data reduction method (a) reduces data more efficiently while enhancing group-level information, (b) provides a unique stratagem for selecting components based on an adjacency graph of eigenvectors, (c) generates more reliable and reproducible brain networks under different model orders when the outputs of NPE are used for ICA, (d) is more sensitive in revealing task-evoked activation for task fMRI, and (e) is extremely attractive and powerful for the increasingly popular fast fMRI and very large datasets.

Keywords: ICA; NPE; dimensionality reduction; fMRI

Year: 2021 PMID: 34890077 PMCID: PMC8886658 DOI: 10.1002/hbm.25742

Source DB: PubMed Journal: Hum Brain Mapp ISSN: 1065-9471 Impact factor: 5.038

INTRODUCTION

Functional magnetic resonance imaging (fMRI) is one of the most widely used neuroimaging modalities for probing underlying neurobiological mechanisms. Functional connectivity analyses of fMRI data allow researchers to noninvasively estimate patterns of interregional neural interactions; these patterns are consistently observed at rest and correspond with patterns of task‐evoked activation and functional connectivity (Bijsterbosch et al., 2020; Lurie et al., 2020). fMRI data, especially group‐level fMRI data, are high dimensional, because the datasets contain tens to hundreds of thousands of voxels, hundreds to thousands of timepoints in a single brain, and ever‐growing numbers of subjects (Bijsterbosch et al., 2020; V. D. Calhoun, Silva, Adali, & Rachakonda, 2015; Rachakonda, Silva, Liu, & Calhoun, 2016; Smith, Hyvärinen, Varoquaux, Miller, & Beckmann, 2014). Thus, for brain network analyses, the dimensionality (timepoints × subjects) is far larger than the presumed number of true brain connectivity networks in fMRI data. To analyze the fMRI signals efficiently, dimensionality reduction methods are necessary to reduce group datasets to a reasonable number of dimensions. Principal component analysis (PCA), a generic method for reducing the dimensionality of a high‐dimensional dataset while preserving as much shared variance (i.e., statistical information) as possible, has been widely used to extract the dominant constituents from fMRI datasets (Jollife & Cadima, 2016; Mannfolk, Wirestam, Nilsson, Ståhlberg, & Olsrud, 2010; Rachakonda et al., 2016; Smith et al., 2014). By successively maximizing the variance of variables in the data after transformation, the most prominent orthogonal directions of variation (eigenvectors) in a high‐dimensional space are determined, and the amounts of variance along these directions (eigenvalues) are sorted from largest to smallest. 
The typical approach in PCA reduction is to retain the dimensions with the largest eigenvalues for further analysis and to remove the dimensions with smaller eigenvalues. The motivation for choosing the components with dominant variance is the assumption that principal components with the highest variance are most likely to be informative signals (e.g., brain networks), while noise components will have a small variance (Erhardt et al., 2011; McKeown, Hansen, & Sejnowski, 2003). The drawbacks of discarding small variance eigenvectors in PCA reduction are well‐known but not well‐handled. PCA is so efficient and convenient in reducing the dimension burden that the potential loss of meaningful brain networks with small variances under the orthogonal constraint is treated as a secondary concern. Even so, the number of dimensions after reduction is still higher than the number of dimensions expected after further analysis, for example, independent component analysis (ICA) or sparse dictionary learning. Model order estimation methods (C. F. Beckmann & Smith, 2004; Y. O. Li, Adali, & Calhoun, 2007) have been developed to assist in selecting a proper number, but they are often found to be unstable and dependent on a number of factors (e.g., field strength, number of time points, number of subjects, and data quality) (Abou‐Elseoud et al., 2010; Ding & Lee, 2013; Ray et al., 2013). Meanwhile, many information criteria applied for model order estimation, for example, Akaike's information criterion (Akaike, 1998), the Kullback information criterion (Cavanaugh, 1999), and the minimum description length criterion (V. D. Calhoun, Adali, Pearlson, & Pekar, 2001; Rissanen, 1978), rely heavily on variance when choosing how many components to select from the dimensionality reduction results. 
So far, no model order estimation method is robust and stable enough to serve as a solution for reducing the dimensionality of all fMRI datasets; moreover, results from different estimation methods vary widely. Meanwhile, many studies have shown that ICA with the two popular ICA toolboxes, GIFT (Egolf, Kiehl, & Calhoun, ) and MELODIC (Smith et al., 2004), both based on PCA dimensionality reduction, is too conservative in its dimensionality estimation under the above criteria to be the ideal choice for fMRI data analysis (Y. O. Li et al., 2007; Mannfolk et al., 2010; Ray et al., 2013; Suárez, Markello, Betzel, & Misic, 2020). Furthermore, some studies have found that higher order decompositions (e.g., orders over 70) revealed finer subnetworks compared to low‐order decompositions (e.g., orders from 20 to 70) (Abou‐Elseoud et al., 2010; Kiviniemi et al., 2009; Ray et al., 2013). This suggests that further useful information is carried in those “redundant” eigenvectors with small variance (Ystad, Eichele, Lundervold, & Lundervold, 2010). However, blindly selecting a higher model order brings the problem of overestimation and corruption of the ICA decomposition (Särelä & Vigário, 2004), even though it can compensate for the information loss during dimensionality reduction. Ideally, a dimensionality reduction method should relieve the dimension burden while simultaneously retaining sufficient information at a reasonable model order. In the present study, we propose a novel and more effective dimensionality reduction method based on neighborhood preserving embedding (NPE; He, Deng, Yan, & Zhang, 2005) to solve the issues mentioned above. NPE is a manifold learning method which aims to preserve the local neighborhood structure of the data once projected onto the manifold; this structure could be patterns (e.g., brain networks in fMRI data) or between‐group differences that exist among datasets (Ball, Adamson, Beare, & Seal, 2017). 
It has been widely applied in machine learning fields such as face recognition, and has proved to be effective in projecting high‐dimensional data into a low‐dimensional subspace (Khodor, Kashana, Khoder, & Younes, 2017; 2012; Wu & Zhao, 2011). By leveraging such characteristics, we propose an adapted NPE method, which can help preserve the brain networks that exist in fMRI data when reducing dimensions. First, we applied singular value decomposition (SVD) to each subject to unfold the original fMRI dataset into individual subspaces and obtain subject‐specific eigenvectors. Second, we constructed neighborhoods for each eigenvector to enhance the information shared across subjects, by calculating the correlation between every single eigenvector and those from other individual subspaces, and only retaining eigenvectors which lie within well‐compacted neighborhoods. Finally, generalized least squares (GLS) was applied to project the information of the components inside one neighborhood onto the eigenvector from the individual space, and the resulting approximations are the dimensionality reduction results. Our method takes advantage of the neighborhood construction procedure in NPE and has three novel traits compared to PCA and other model order estimation methods. The first trait is that our method does not rely on the variance of components; instead, the number of surviving neighborhoods becomes the criterion for reducing dimensions. As a result, components with small variance can be preserved so long as they help define a neighborhood. The second trait is the strengthening effect for group information such as group‐wise resting‐state functional connectivity or task‐evoked brain activation. Since neighborhoods consist of well‐connected components from different subjects, the projection accomplished with GLS helps to collect the information shared within neighborhoods. 
The third trait is that NPE is more efficient for fMRI data reduction, which can maintain more useful information with fewer eigenvectors. The advantages and improvements of NPE, that is, strengthening shared common information in lower SNR data, and compacting information in a more effective way, are evaluated and presented using both simulated and Human Connectome Project (HCP) datasets.

MATERIALS AND METHODS

Adapted NPE algorithm

In contrast to PCA, which aims to preserve the global Euclidean structure of a dataset, NPE aims to preserve the local manifold structure. It constructs an adjacency graph to compute the weights denoting the relationship between samples, and is widely used in machine learning fields. However, fMRI data are quite different from the typical machine learning dataset. Take the classic face recognition application as an example: each face image is a single sample, and a massive number of samples comprise a huge face space with the same basic structure, even though each sample has its own specific submanifold, such as an angry, happy, or sad expression. The number of face image samples far exceeds the possible types of faces, and each sample has its unique type in face space. Things are quite different for fMRI images, which usually are 4D or 3D maps. Every brain image shares the same major manifold, as is the case with face images. However, each fMRI image is originally a mixture of several different brain network sources, which can be extracted into different components under various model assumptions (i.e., independent components [ICs] for a linear mixing model), and each component shares the same submanifold, for example, the visual region, the motor region, and so on. Thus, neighborhood construction is valid and particularly well suited for fMRI dataset analysis. Based on these unique characteristics of fMRI datasets, we propose an adapted NPE stratagem to reduce fMRI data dimensionality. Considering that fMRI data are a mixture of several source components, SVD is performed first, primarily to transform individual data into an individual subspace of temporal and spatial eigenvectors that unfolds and separates the mixed information. This procedure brings the effect of the orthogonal constraint into individual datasets (Zhi & Ruan, 2007). The whole procedure is illustrated in Figure 1.

FIGURE 1

Graphical demonstration of the three stages of adapted neighborhood preserving embedding (NPE) for functional magnetic resonance imaging (fMRI). Stage 1: Singular value decomposition (SVD) is applied to each subject to generate subject‐specific spatial eigenvectors, and the correlation coefficients of all spatial eigenvectors are used to form the adjacency graph. “Hotter” colors of dots or strips denote stronger connections. The adjacency graph of all subjects is named G, and each subject consists of T nodes and T*(N‐1) connections. The values in the adjacency graph represent connections between two eigenvectors from different subjects. In a single adjacency graph, the green strip represents a spatial eigenvector from subject k, and the eigenvectors connected to this node are sorted from highest to lowest correlation. Stage 2: Illustration of a neighborhood. The hotter box and dots are qualified components that finally form a surviving neighborhood with the green dot, while darker dots are disqualified components with weak connections. Stage 3: Linear approximation with generalized least squares (GLS) is employed to compute the weights using the well‐constructed neighbors. After all the weights of the surviving neighbors of subject k are computed, they are used to project back the data reduction results.

Stage 1: Adjacency graph construction

Taking the brain functional networks as the sources, the noise‐free model is

$X = AS$ (1)

where $S \in \mathbb{R}^{R \times P}$ contains the source components, $A \in \mathbb{R}^{T \times R}$ is the mixing coefficients matrix (temporal course for each component), and $X \in \mathbb{R}^{T \times P}$ contains the observations. For the kth subject, the observations are $X_k$ with $k \in \{1, \ldots, N\}$, where N is the number of subjects, T is the number of time points, and P is the number of voxels. The number of time points T is usually significantly larger than the number of sources R, which leaves room to remove redundant dimensions in the temporal domain. Then, SVD transforms $X_k$ into a subspace:

$X_k = U_k \Sigma_k V_k^{\mathsf{T}}$ (2)

where $V_k$ and $U_k$ are the sets of spatial and temporal eigenvectors, and $\Sigma_k$ is the diagonal matrix holding the eigenvalues. The adjacency graph for subject k, $G_k$, is constructed from the correlations between the spatial eigenvectors of the kth subject and those of the remaining subjects. Finally, for all subjects, a total of T × N nodes (the total number of spatial eigenvectors) is generated, and the connections, or edges, are defined as the correlation coefficients of two spatial eigenvectors; a higher correlation between eigenvectors means the eigenvectors are more adjacent.
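
As an illustration only (not the authors' implementation), Stage 1 can be sketched in Python with NumPy. The sketch assumes each subject's data are arranged as a time × voxel matrix, and it uses the absolute correlation as edge strength because the sign of a singular vector is arbitrary; the function names and the toy data are hypothetical.

```python
import numpy as np

def spatial_eigenvectors(X):
    """Economy SVD of one subject's (time x voxel) matrix.
    The rows of the returned array are the spatial eigenvectors."""
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return Vt

def adjacency_graph(Vt_list, k):
    """Edge strengths between subject k's spatial eigenvectors and those of
    every other subject. Returns an array of shape (T, N-1, T).
    Assumption: absolute Pearson correlation is used as the edge strength."""
    Vk = Vt_list[k]
    T = Vk.shape[0]
    blocks = []
    for l, Vl in enumerate(Vt_list):
        if l == k:
            continue
        # np.corrcoef treats rows as variables; keep only the cross block
        C = np.corrcoef(Vk, Vl)[:T, T:]
        blocks.append(np.abs(C))
    return np.stack(blocks, axis=1)

# Toy example: N = 3 subjects, T = 5 time points, P = 50 voxels
rng = np.random.default_rng(0)
subjects = [rng.standard_normal((5, 50)) for _ in range(3)]
Vt_list = [spatial_eigenvectors(X) for X in subjects]
G0 = adjacency_graph(Vt_list, k=0)
print(G0.shape)  # (5, 2, 5): T nodes, N-1 other subjects, T candidate edges each
```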

Stage 2: Neighborhood construction

Normally, K‐nearest neighbors and ε‐neighborhoods are the two common ways to build neighborhoods in traditional NPE. Given edges sorted by a distance metric, the former recruits only the top K edges to form a neighborhood, while the latter rules out only the edges that fail the threshold. However, in the special case of fMRI data, the orthogonal constraint on the SVD results naturally divides each adjacency graph into T neighborhoods (each eigenvector, as a node, defines a neighborhood), which include T nodes and their corresponding T × (N‐1) edges. In this case, we adopt and adapt both stratagems to automatically reduce the dimensions from T to a suitable number. The adjacency graph denotes all potential neighborhoods of subject k through the relationships between the components of subject k and those of the other subjects. Good neighborhoods are compact and have strong connections, which means that only highly similar spatial eigenvectors can form solid edges. Realistically, the SVD cannot perfectly extract brain networks from the observations, and the similarity of eigenvectors can be weakened by the quality of the original datasets. Hence, several linkage principles are set up to assess the quality of neighborhoods and make sure the neighborhoods are suitably compact. First, we thresholded the edge correlations to define solid, effective edges; edges below the threshold are treated as weak connections (denoted as a black dot or eigenvector). Second, we retained only neighborhoods in which the number of solid edges exceeds half the group size (N/2, the number of components within the circle), to constrain the group‐level common trait.
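
The two linkage principles can be sketched as follows. This is a minimal illustration under stated assumptions: the edge array has the shape (T, N-1, T) used for one subject's adjacency graph, and the threshold value of 0.5 is a placeholder chosen for the toy example, not a value taken from this study.

```python
import numpy as np

def surviving_neighborhoods(G, threshold):
    """G: edge strengths for one subject, shape (T, N-1, T).
    Returns a boolean mask over the T candidate neighborhoods and the
    boolean array of solid edges."""
    n_subjects = G.shape[1] + 1
    solid = G >= threshold                 # principle 1: keep only solid edges
    n_solid = solid.sum(axis=(1, 2))       # solid edges per neighborhood
    survive = n_solid > n_subjects / 2     # principle 2: more than N/2 solid edges
    return survive, solid

# Toy graph: T = 3 eigenvectors, N = 5 subjects (4 other subjects)
G = np.zeros((3, 4, 3))
G[0, :3, 0] = 0.9   # node 0 has solid edges to 3 other subjects
G[1, :2, 1] = 0.9   # node 1 has solid edges to only 2 other subjects
survive, solid = surviving_neighborhoods(G, threshold=0.5)
# node 0 survives (3 > 2.5); nodes 1 and 2 are discarded
```

The number of `True` entries in `survive` then plays the role of the reduced dimension for that subject.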

Stage 3: Linear approximation and projection

Once the neighborhoods are well established, we target, for example, the eigenvector $v_i^{k}$ of any surviving neighborhood of subject k and employ GLS to solve the minimization problem for computing the weights $w_{ij}$:

$\min_{w} \left\lVert v_i^{k} - \sum_{j} w_{ij} v_j \right\rVert^{2}$

Here, $v_i^{k}$ is from the kth subject, while $\{v_j\}$ is the set of eigenvectors from the rest of the subjects that are within the same neighborhood as $v_i^{k}$. Namely, $w_{ij}$ is enforced to be 0 if no connection or edge is established, that is, if $v_j$ does not belong to the neighborhood. Finally, $\hat{v}_i^{k} = \sum_{j} w_{ij} v_j$ denotes the projection of the observation for subject k. This kind of linear approximation brings in adequate group‐level information while preserving the local structure inside the kth subject. The reconstructed result is also the dimensionality reduction result for subject k, and the eigenvalue and mixing coefficients corresponding to any reconstructed spatial eigenvector can be obtained from the eigenvalue matrix and the temporal eigenvectors of the SVD in Equation (2).
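
A minimal sketch of the linear approximation step is given below. It substitutes ordinary least squares (`np.linalg.lstsq`) for GLS, which corresponds to GLS with an identity error covariance; that simplification, the function name, and the toy vectors are assumptions made for illustration.

```python
import numpy as np

def approximate_eigenvector(target, neighbors):
    """Weights w minimizing ||target - neighbors.T @ w||^2.
    target: (P,) spatial eigenvector of subject k.
    neighbors: (M, P) eigenvectors of other subjects in the same neighborhood.
    Eigenvectors outside the neighborhood are simply left out, which is
    equivalent to forcing their weights to 0."""
    w, *_ = np.linalg.lstsq(neighbors.T, target, rcond=None)
    return w, neighbors.T @ w

# Toy check: a target built exactly from two neighbors is recovered
rng = np.random.default_rng(1)
neighbors = rng.standard_normal((2, 100))
target = 0.6 * neighbors[0] + 0.4 * neighbors[1]
w, approx = approximate_eigenvector(target, neighbors)
```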

Simulated fMRI data

Simulated 4D fMRI data were generated by using six resting‐state brain networks (Damoiseaux et al., 2006) and six ground truth temporal courses with 250 time points. A total of six subjects' fMRI data were generated; each individual fMRI dataset contained five of the six paired spatial maps and temporal courses. One unique pair of spatial map and corresponding temporal course was missing for each subject, so that we could test the ability of the proposed method to preserve individual space information without introducing nonexistent information into each individual subject. Meanwhile, expected variation in the fMRI data from each subject was simulated by adding spatial and temporal variability to each simulated dataset. The spatial variability was achieved by slightly rotating the same brain network for each subject, with a maximum rotation of 3°. The temporal variability was introduced by using subject‐specific temporal courses derived from real data for the six subjects to simulate the resting‐state fMRI signals, as shown in Figure 2. Task fMRI simulation data were also generated, by altering the temporal courses with a time‐lag block design, to test the performance of the proposed method (Supplementary Figure 1).

FIGURE 2

The spatial maps and temporal courses used as ground truth. (a) Six source maps are created from the resting‐state networks. The original spatial maps cover the whole brain (in the figure they are thresholded for clarity, as there is some overlap between these source maps). (b) The correlations among the total of 36 temporal courses, for six ground truth networks in each of six subjects. (c) The temporal courses are generated from real functional magnetic resonance imaging (fMRI) data, with one unique spatial–temporal pair omitted (set to zeros) for each subject.

The SNR of the fMRI data has a large effect when applying PCA for dimensionality reduction, because the interesting, ground truth information can be drowned out by noise and distributed over a large range of dimensions in the PCA results, especially when the SNR is low. Here, the SNR is defined as the ratio of the SD of the source signal to the SD of the Gaussian noise. Gaussian noise was added for each subject with amplitudes chosen to generate SNRs ranging from 1 to 4 in steps of 0.5. Each subject's fMRI data are originally formed as 4D maps (volume × timepoint, 61 × 73 × 61 × 250). A whole brain mask was applied, and the data were indexed into a two‐dimensional matrix (voxel × timepoint, 67,541 × 250) as the input to the dimensionality reduction.
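
The SNR definition and the masking step can be illustrated with a short sketch. The array sizes, function names, and the cubic mask are placeholders for the toy example; only the SNR definition (signal SD divided by noise SD) and the SNR grid from 1 to 4 in steps of 0.5 come from the text.

```python
import numpy as np

def add_gaussian_noise(signal, snr, rng):
    """Add zero-mean Gaussian noise so that SD(signal) / SD(noise) = snr."""
    noise_sd = signal.std() / snr
    return signal + rng.normal(0.0, noise_sd, size=signal.shape)

def mask_to_2d(vol4d, mask):
    """Index a 4D (x, y, z, time) array with a 3D boolean brain mask,
    giving a (voxel x timepoint) matrix."""
    return vol4d[mask]

rng = np.random.default_rng(2)
clean = rng.standard_normal((4, 4, 4, 250))      # toy stand-in for a 4D dataset
snr_levels = np.arange(1.0, 4.5, 0.5)            # 1, 1.5, ..., 4
noisy = {float(s): add_gaussian_noise(clean, s, rng) for s in snr_levels}

mask = np.zeros((4, 4, 4), dtype=bool)
mask[1:3, 1:3, 1:3] = True                       # toy "brain" of 8 voxels
X = mask_to_2d(noisy[1.0], mask)
print(X.shape)  # (8, 250)
```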

Real fMRI data

Resting‐state fMRI data

Resting‐state data from a total of 100 healthy subjects (70 females and 30 males, age: 30.2 ± 2.6), with two phase encode directions (right‐to‐left and left‐to‐right) and two sessions (Rest 1 and Rest 2), were selected from the HCP 3T data repositories (HCP: www.humanconnectome.org). All had undergone the “minimal preprocessing” procedure (Glasser et al., 2013), including gradient unwarping, motion correction, fieldmap‐based EPI distortion correction, brain‐boundary‐based registration of EPI to the structural T1‐weighted scan, nonlinear registration into MNI152 space, and grand‐mean intensity normalization. For details of the data acquisition parameters, see Smith et al. (2013). Meanwhile, the “clean” data after ICA_FIX were used to reduce the effects of motion. To minimize the effects of data acquisition and preprocessing, the only additional preprocessing we performed was smoothing the data with a kernel of FWHM = 6 voxels using the FMRIB Software Library, FSL (Smith et al., 2004, https://fsl.fmrib.ox.ac.uk/fsl/fslwiki/). After preprocessing, we averaged the data from the two phase encode directions to avoid distortion and extracted the last 1,000 volumes from the total of 1,200 timepoints. Finally, brain voxels were extracted from the subjects' original 4D (volume × timepoint; 91 × 109 × 91 × 1,000) data files and reformatted into two‐dimensional matrices (voxel × timepoint; 228,483 × 1,000) as the input to the dimensionality reduction. The SNR of in vivo fMRI data is usually evaluated with the temporal SNR (tSNR) metric. We calculated the tSNR in a given voxel of the resting‐state in vivo data by dividing the mean voxel value across time points by its temporal SD. The whole brain mean tSNR is 97.2 ± 8.1 across the 100 subjects.
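
The tSNR computation described above can be written in a few lines. This sketch assumes a voxel × timepoint matrix and the sample SD (`ddof=1`); the choice of `ddof` and the handling of zero-variance voxels are assumptions, since the text does not specify them.

```python
import numpy as np

def tsnr(data_2d):
    """Temporal SNR per voxel: mean over time divided by temporal SD.
    data_2d: (voxel x timepoint). Voxels with zero temporal SD get tSNR 0."""
    mean = data_2d.mean(axis=1)
    sd = data_2d.std(axis=1, ddof=1)
    return np.divide(mean, sd, out=np.zeros_like(mean), where=sd > 0)

# Toy data: baseline of 100 with unit-variance fluctuations
rng = np.random.default_rng(7)
data = 100.0 + rng.standard_normal((50, 1000))
values = tsnr(data)
whole_brain_mean = values.mean()   # close to 100 for this toy example
```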

Task fMRI data

The language processing task fMRI data were chosen from the HCP 3T datasets of the same subjects as used for the resting‐state fMRI data. The task consists of two runs that each interleave four blocks of a story task and four blocks of a math task. Only one phase encode direction (left to right) of the two runs is used, because this choice is not crucial for the final results and conclusions. A more detailed description of, and motivation for, the task design can be found in Barch et al. (2013) and Binder et al. (2011). The total length of this task design is 316 volumes. The lengths of the blocks vary (average of approximately 30 s), but the task was designed so that the math task blocks match the length of the story task blocks, with some additional math trials at the end of the task to complete the 3.8 min run as needed. The whole brain mean tSNR is 125.2 ± 16.8 across the 100 task fMRI datasets.

Comparison stratagem and assessment methods

Simulated data

To fully assess the performance of the proposed method, two different PCA strategies were utilized to demonstrate the pitfalls of dimensionality reduction when dealing with low SNR datasets. The first, called matched PCA (mPCA), matched the number of PCA dimensions to the number of surviving NPE neighborhoods. The second, called variance PCA (varPCA), did not match the dimensions but retained sufficient components to explain a given proportion of the signal variance in each dataset. After reducing dimensions, individual ICA (FastICA; Hyvärinen, 1999), group ICA (V. D. Calhoun et al., 2001), and dual regression (C. Beckmann, Mackay, Filippini, & Smith, 2009; Nickerson, Smith, Öngür, & Beckmann, 2017) were used to evaluate the dimensionality reduction performance of the NPE, mPCA, and varPCA methods. For dimensionality reduction of group fMRI data, in addition to varPCA and mPCA, full‐concatenated PCA (full‐PCA), which directly concatenates the original datasets along the temporal dimension and applies a single PCA for dimensionality reduction, was included as a further comparison for group ICA. The details of how the comparisons were conducted for simulation data are listed below. The comparison of individual ICA results mainly focused on the recovery of the ground truth networks, which were separately extracted under all levels of SNR. We calculated the correlation coefficients between the ICA decomposition results and the ground truth to evaluate the sensitivity and specificity of the derived spatial–temporal components. We also used the average of the correlation coefficients of all the ground truth networks for each SNR to determine how performance changed with increasing SNR. 
To clarify the advantage of the proposed method, both the ground truth subject‐specific spatial maps and temporal courses, as well as the group‐level overlapped spatial maps and temporal courses, were used to evaluate the performance of the different dimensionality reduction methods. The former targeted the sensitivity and specificity of individual datasets, while the latter focused more on group effects. For all levels of SNR, the between‐method differences were tested with a paired two‐sample t test and marked for significant differences (*p < .05, **p < .01). For group ICA, a similar comparison was conducted by calculating correlation coefficients between the recovered spatial–temporal components and the ground truth to quantify the performance of the different methods. Dual regression conducts a temporal linear model fit on group maps (the first step), and the resulting temporal courses are used to estimate subject‐specific spatial maps (the second step). Dual regression was conducted based on the results of group ICA to calculate subject‐specific spatial–temporal pairs.

Individual ICA: Dimensionality reduction + ICA was implemented. Three dimensionality reduction methods (adapted NPE, mPCA, and varPCA) were performed prior to ICA decomposition.

Group ICA: The classic two‐step PCA/adapted NPE + ICA was implemented. In addition to the three methods inherited from individual ICA above, the performance of full‐PCA, which simply conducts PCA on the temporally concatenated original datasets before the group ICA, was also evaluated.

Dual regression: After obtaining the group‐wise spatial maps from group ICA, we applied dual regression to obtain the temporal courses and spatial maps of individual subjects. For clarity and simplicity, only adapted NPE and mPCA were employed with dual regression, since the outcomes of varPCA and full‐PCA can be predicted from the correlation of the group‐level recovery of spatial maps.

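
The two regression steps of dual regression can be sketched with plain least squares. This is a bare-bones illustration, assuming time × voxel subject data and component × voxel group maps; it omits the demeaning and variance normalization that toolbox implementations typically apply, and the function name is hypothetical.

```python
import numpy as np

def dual_regression(X, group_maps):
    """X: (T, P) subject data; group_maps: (C, P) group-level spatial maps.
    Step 1: regress the data on the group maps to get time courses (T, C).
    Step 2: regress the data on those time courses to get subject maps (C, P)."""
    tc_t, *_ = np.linalg.lstsq(group_maps.T, X.T, rcond=None)   # (C, T)
    time_courses = tc_t.T
    subject_maps, *_ = np.linalg.lstsq(time_courses, X, rcond=None)
    return time_courses, subject_maps

# Noise-free toy example: X = A @ S, so both steps recover A and S exactly
rng = np.random.default_rng(3)
S = rng.standard_normal((3, 200))    # 3 spatial maps, 200 voxels
A = rng.standard_normal((30, 3))     # 30 time points
X = A @ S
time_courses, subject_maps = dual_regression(X, S)
```
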
Temporally concatenated group ICA was used to compare PCA with the proposed method. After dimension reduction, results under different model orders from low (50) to high (200) were fed into ICA decomposition. The final output IC spatial maps were z‐score transformed and thresholded. The correlation coefficients between ICs were Fisher z transformed. The lack of a “gold standard” for real data led us to use the following metrics for assessment.

Efficiency of dimensionality reduction. Based on the two‐step PCA procedure, we calculated the explained variance ratio and the number of dimensions of the reduction results to assess the efficiency of the two methods.

Reproducibility. One model order was chosen as the reference model order, and the decomposition results of the other model orders were compared with the reference to identify the reproducible components. Reproducibility was evaluated by comparing the number of highly correlated ICs (by correlation coefficient or Fisher z value) under different model orders (Groves et al., 2012). Hence, the focus of reproducibility is to justify the reliability of one specific model order by repeating ICA at another model order.

Test–retest reliability. To demonstrate the unique advantage of NPE in capturing group common information, the test–retest assessment was performed by comparing highly reproduced ICs from two different rest runs (Rest 1 scanned on Day 1 and Rest 2 scanned on Day 2) under several typical model orders.

Consistency. Consistency analysis was based on the idea of evaluating the persistence of ICs across a certain model order range. Consistent ICs under a specific (reference) model order were defined as those which can be repeatedly extracted over a range of model orders (Zhao et al., 2020). Unlike reproducibility, which focuses on the model order, consistency focuses on the persistence of ICs and aims to assess the reliability of consistent components.

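
As a rough sketch of the reproducibility count, the code below applies the Fisher z transformation and counts the reference ICs that have a highly correlated match at another model order. The threshold of 0.8, the use of absolute correlation, and the toy maps are assumptions for illustration; the threshold used in the study is not reproduced here.

```python
import numpy as np

def fisher_z(r):
    """Fisher z transformation, z = arctanh(r), clipped to avoid infinities."""
    return np.arctanh(np.clip(r, -0.999999, 0.999999))

def count_reproducible(ref_maps, other_maps, r_threshold):
    """Number of reference ICs whose best absolute spatial correlation with
    any IC from another model order exceeds r_threshold.
    ref_maps: (C_ref, P); other_maps: (C_other, P)."""
    n_ref = ref_maps.shape[0]
    C = np.abs(np.corrcoef(ref_maps, other_maps)[:n_ref, n_ref:])
    return int((C.max(axis=1) > r_threshold).sum())

# Toy example: 2 of 4 reference maps reappear (with noise) at another order
rng = np.random.default_rng(4)
ref = rng.standard_normal((4, 500))
other = np.vstack([ref[:2] + 0.1 * rng.standard_normal((2, 500)),
                   rng.standard_normal((3, 500))])
n_rep = count_reproducible(ref, other, r_threshold=0.8)   # hypothetical threshold
z_half = fisher_z(0.5)
```
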
Reconstructed 4D task fMRI data for each subject were generated after NPE‐based data reduction to demonstrate the ability of NPE to collect group common information, especially under low SNR conditions. The original temporal course of the language processing task contains 316 volumes and consists of four blocks of two types of task stimuli (story and math). Hence, we segmented it into half and quarter lengths by retaining only the first 158 and 79 volumes, respectively, which include one half and one quarter of the four blocks (see the HRF‐convolved time course in Figure 9a). The shorter segments here represent lower SNR, because it is more difficult to produce robust and strong activation in participants. The three temporal course lengths (including the full length) were then separately analyzed with a general linear model (GLM) on either the original data or the individual data reconstructed after NPE. The GLM was implemented with FSL by using the FEAT script provided by the HCP to estimate the first‐level effects of task stimuli (Barch et al., 2013). The group‐level voxel‐wise comparisons (one‐sample t tests) were conducted with DPABI (Chao‐Gan & Yu‐Feng, 2010) and visualized with FSLeyes (https://fsl.fmrib.ox.ac.uk/fsl/fslwiki/FSLeyes). All group‐level activation maps are uncorrected T maps, thresholded for display.

RESULTS

Simulated data—Dimensionality reduction

Figure 3 shows the performance of dimensionality reduction of the NPE, mPCA, and varPCA methods. The X‐axis shows the eigenvectors from the SVD for the NPE, mPCA, and varPCA methods. The eigenvectors are sorted, with eigenvalues from highest to lowest. The Y‐axis shows the sum of the correlation coefficients of each spatial eigenvector (total 250) with all the ground truth networks for SNR = 1. Different color bars denote the correlation value of each eigenvector with the different ground truth networks; each subject had five dominant colors, which was consistent with the design of one ground truth network omitted in each subject. For each eigenvector, the heights of all five bars were summed to represent the level of information contained in the eigenvector related to ground truth networks, that is, how “informative” the eigenvector was. It is clear that high eigenvalues do not necessarily mean the network was highly informative. Many dominant eigenvectors with higher eigenvalues were not informative; conversely, there were many informative eigenvectors with very small eigenvalues.
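
The “informativeness” score plotted on the Y‐axis can be sketched as the summed correlation of each spatial eigenvector with all ground truth maps. The sketch takes absolute correlations (an assumption, since singular vector signs are arbitrary) and uses a two‐source toy dataset rather than the simulated networks described above.

```python
import numpy as np

def informativeness(Vt, truth_maps):
    """Sum over ground truth maps of |correlation| with each spatial eigenvector.
    Vt: (T, P) spatial eigenvectors as rows; truth_maps: (R, P)."""
    T = Vt.shape[0]
    C = np.abs(np.corrcoef(Vt, truth_maps)[:T, T:])
    return C.sum(axis=1)

# Toy data: 2 sources mixed into 10 time points, with weak Gaussian noise
rng = np.random.default_rng(5)
S = rng.standard_normal((2, 1000))
A = rng.standard_normal((10, 2))
X = A @ S + 0.01 * rng.standard_normal((10, 1000))
_, s, Vt = np.linalg.svd(X, full_matrices=False)
info = informativeness(Vt, S)   # one score per eigenvector, sorted by eigenvalue
```

In this high‐SNR toy case the two leading eigenvectors carry the source information; the point made in the text is that this ordering breaks down at low SNR.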

FIGURE 3

The comparison of dimension selection. The bar height represents the sum of correlation coefficients between ground truth networks and spatial eigenvectors; the different colors denote the different networks. Each subject has only five dominant colors, because every subject was designed with one ground truth network absent in the simulation. The black dashed line is the cutoff of dimensionality reduction for matched principal component analysis (PCA). The black solid line is the cutoff of dimensionality reduction for variance PCA with the same variance ratio 90% (resting‐state functional magnetic resonance imaging [fMRI] simulation) or 90% (task fMRI simulation) in different SNRs. The pink background indicates the spatial eigenvectors located by neighborhood preserving embedding (NPE) with surviving neighborhoods.

The dimensionality reduction performance of the three methods for resting fMRI simulation data is shown in Figure 3. Under the lowest SNR level (Figure 3 top left), varPCA was able to collect most of the informative components with 90% of explained variance (the black solid line, numbered as 161, 151, 155, 150, 152, 123), but at the expense of including many noninteresting eigenvectors (which means low dimension reduction efficiency). With our proposed NPE method, the surviving spatial eigenvectors within qualified neighborhoods are indicated with a pink background; NPE identified informative eigenvectors at the subject‐specific level regardless of the variance of the eigenvectors. mPCA retained only a small number of informative components at the same dimension reduction efficiency as the NPE method (denoted by the black dotted line, numbered as 36, 53, 40, 43, 50, and 65). Noticeably, the NPE method can select almost all the informative components, even those with smaller eigenvalues, while keeping the highest dimension reduction efficiency. 
Furthermore, to clarify how the SNR level affects the dimensionality reduction performance of all three methods, one simulated dataset (subject #3) was chosen as an example (Figure 3 top right). When the SNR was higher (SNR ≥3.5), NPE shows slightly higher dimensionality reduction efficiency than mPCA by selecting mainly useful information with a smaller component number (the black dotted line, numbered as 45, 50, 57, 46, 30, 16, and 8). The performance of NPE and mPCA diverges when the SNR was lower (SNR ≤3.0); NPE demonstrates more effective dimension reduction than mPCA by retaining more useful information with the same number of components. The varPCA method shows lower dimension reduction efficiency, as it maintains the same proportion of explained variance (90%, the black solid line, 156, 153, 149, 144, 140, 133, and 126) for all the different SNR levels. Thus, the NPE‐based dimension reduction method shows better performance over the full range of SNR levels considered, and always targets group‐shared common information, even under lower SNR. NPE proved to be the most effective dimension reduction method, as it detects most of the informative spatial eigenvectors with the lowest dimension number.

The advantage of the NPE method on simulated task fMRI data is more obvious because the information distributions are more locally concentrated. In this case, varPCA needs around 110 components (black solid line) with 90% of explained variance to collect all ground truth information. NPE reduces the dimension number to 27, 43, 30, 24, 29, and 36 for subjects 1 to 6 (Figure 3 bottom left), and to 30, 44, 40, 31, 25, 14, and 12 for SNR 1 to 4 (Figure 3 bottom right), while still collecting all ground truth information.
Notably, the linkage between variance explained and efficiency at collecting ground truth components no longer holds: eigenvectors with large variance may not be helpful in further ICA decomposition and may only add to the model order burden. This can be one reason for the distinct performance of the different methods: NPE and mPCA have the same model order, but mPCA retains less of the important information, while varPCA exceeds the model order of NPE, which leads to corruption of the ICA decomposition at low SNR (see Sections 3.2 and 3.3). For both individual and group ICA, NPE has better dimension reduction efficacy in terms of the proportion of explained variance versus the number of retained eigenvectors (Supplementary Tables 1–4).
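For reference, the variance-based cutoff used by varPCA can be written as a short function. This is a hedged sketch: the function name `n_components_for_variance` is hypothetical, and it assumes explained variance is proportional to the squared singular values.

```python
import numpy as np

def n_components_for_variance(singular_values, ratio=0.90):
    """Smallest number of leading components whose cumulative explained
    variance reaches `ratio` (variance taken as squared singular values)."""
    variance = np.asarray(singular_values, dtype=float) ** 2
    cumulative = np.cumsum(variance) / variance.sum()
    # Index of the first cumulative value >= ratio, converted to a count
    count = int(np.searchsorted(cumulative, ratio)) + 1
    return min(count, variance.size)

# Toy spectrum: variances 9, 4, 1 -> cumulative ratios 9/14, 13/14, 14/14
print(n_components_for_variance([3.0, 2.0, 1.0], ratio=0.90))  # 2
```

A matched cutoff in the mPCA sense would instead keep a fixed number of leading components equal to the number NPE retains, independent of the variance ratio.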

Simulated data—Individual ICA

First, the performance of detecting group common information was evaluated for the three methods by calculating the correlation of individual ICA decomposition results with the overlapped spatial–temporal ground truth networks. As shown in Figure 4a, the boxplot of the three methods, mPCA (green), varPCA (blue), and adapted NPE (red), denotes the mean correlation coefficients of all the ground truth networks and paired ICA components for six subjects under SNR levels from 1 to 4. The right panel heat map represents the correlation coefficients of ICA components and ground truth networks for each single subject. The NPE method shows better performance in recovering the ground truth networks at low SNR levels compared with the other two methods, while not inducing nonexistent networks (represented as repeated diagonal elements). Overall, the adapted NPE‐based dimensionality reduction method outperformed mPCA and varPCA in recovering group‐shared spatial maps of all the ground truth networks under all SNR levels. The same results were also found for the simulated task fMRI data, as shown in Supplementary Figure 2.
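Pairing ICA components with ground truth networks requires some matching rule, because ICA returns components in arbitrary order and sign. The sketch below uses a simple greedy rule on absolute correlation; the pairing rule actually used in the study is not stated here, so `greedy_pairing` is an illustrative assumption only.

```python
import numpy as np

def abs_corr(a, b):
    """Absolute Pearson correlation between the rows of a and the rows of b."""
    a = a - a.mean(axis=1, keepdims=True)
    b = b - b.mean(axis=1, keepdims=True)
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return np.abs(a @ b.T)

def greedy_pairing(components, ground_truth):
    """Pair each ground truth map with a distinct component, best match first.

    Returns {truth_index: (component_index, |r|)}.
    """
    corr = abs_corr(ground_truth, components)  # (n_truth, n_components)
    remaining = corr.copy()
    pairs = {}
    for _ in range(min(corr.shape)):
        t, c = np.unravel_index(np.argmax(remaining), remaining.shape)
        pairs[int(t)] = (int(c), float(corr[t, c]))
        # Block the matched row and column so each is used only once
        remaining[t, :] = -1.0
        remaining[:, c] = -1.0
    return pairs
```

Averaging the paired correlations over networks gives one value per subject and SNR level, which is the kind of quantity a boxplot such as Figure 4a summarizes.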

FIGURE 4

The comparison of individual independent components analysis (ICA) results with ground truth networks with resting‐state functional magnetic resonance imaging (fMRI) data. Two types of ground truth, overlapped (a) and subject‐varied (b), were evaluated. Coefficients of spatial maps and temporal courses were separately assessed under SNR levels from 1 to 4 with a step of 0.5. The boxplot denoted the mean correlation of all ground truths for six subjects. The heat map represents detailed recovery performance for each single subject and ground truths. *p < .05, **p < .01

We then evaluated the performance of detecting individual‐specific information for three methods by calculating the correlation between individual ICA decomposition results and subject‐specific spatial–temporal ground truths. As shown in Figure 4b, compared with mPCA, NPE shows significantly better performance in recovering the subject‐specific spatial maps (in lower SNR, SNR ≤2.0) and temporal courses (for all SNR levels). Compared with varPCA, NPE is more powerful in recovering subject‐specific spatial maps and temporal courses under lower SNRs (SNR ≤2.0). This is expected, as both overestimated model order (varPCA) and mass information loss (mPCA) can corrupt the recovery of spatial maps and temporal courses. Of note, there was a cross‐over, in that varPCA outperformed NPE slightly at the highest SNR (4.0). This indicates that NPE slightly sacrifices some subject‐specific information as a tradeoff for strengthening the group common information.

Simulated data—Group ICA and dual regression

Since group ICA is already focused on group information, we only compared the group ICA decomposition results against the overlapped ground truth networks. The correlation coefficients of all the ground truth networks and corresponding ICA components for all SNRs are shown in Figure 5a. The proposed NPE method had overall better performance for group ICA results, and a significant difference was noticed under low SNR levels. Interestingly, full‐PCA clearly outperforms the other methods in recovering the temporal courses, because it incurs no information loss. The dual regression results used to recover subject‐specific ground truth networks are shown in Figure 5b; the proposed NPE (red) improved the estimation of subject‐specific spatial–temporal components compared to mPCA (green), especially under low SNRs (SNR ≤2.0), which was consistent with the performance on group‐level independent components (ICs). Similar results were also found in resting‐state fMRI data analysis (Supplementary Figure 3). We only show the comparison between the proposed NPE and mPCA methods, because the performance of dual regression relies heavily on the group ICA result.
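Dual regression, as generally understood, consists of two least-squares stages: group spatial maps are regressed against a subject's data to obtain subject time courses, and those time courses are then regressed against the same data to obtain subject-specific maps. The following is a minimal sketch of that generic procedure, not the pipeline used in this study; demeaning and variance normalization are omitted for brevity, and the array layouts are assumptions.

```python
import numpy as np

def dual_regression(subject_data, group_maps):
    """Generic two-stage dual regression.

    subject_data: (n_timepoints, n_voxels)
    group_maps:   (n_components, n_voxels) group-level spatial maps
    Returns (time_courses, subject_maps) with shapes
    (n_timepoints, n_components) and (n_components, n_voxels).
    """
    # Stage 1 (spatial regression): subject_data.T ~= group_maps.T @ time_courses.T
    tc_t, *_ = np.linalg.lstsq(group_maps.T, subject_data.T, rcond=None)
    time_courses = tc_t.T
    # Stage 2 (temporal regression): subject_data ~= time_courses @ subject_maps
    subject_maps, *_ = np.linalg.lstsq(time_courses, subject_data, rcond=None)
    return time_courses, subject_maps
```

Because stage 1 depends entirely on the group maps, any error in the group ICA result propagates into both outputs, which is why dual regression performance tracks group ICA quality.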

FIGURE 5

The comparison of group independent components analysis (ICA) results and dual regression results of task functional magnetic resonance imaging (fMRI) simulation. (a) The comparison of group ICA results with the overlapped ground truth networks between matched principal component analysis (mPCA) (green), full‐PCA (black), varPCA (blue), and neighborhood preserving embedding (NPE) (red). The boxplot shows the correlation coefficients between group‐level ICs and six paired spatial–temporal ground truths under SNR levels from 1 to 4. (b) The comparison of dual regression results with the subject‐varied ground truths between mPCA (green) and NPE (red). The boxplot denoted the mean correlation coefficients of all the regressed subject‐specific set of spatial–temporal components for six subjects under SNR levels from 1 to 4. *p < .05, **p < .01

Real fMRI data—Resting‐state fMRI

Efficiency of dimensionality reduction

After a two‐step dimension reduction procedure, the proposed NPE method showed better efficiency in compacting information, as shown in Figure 6. PCA‐based methods must employ much higher dimension numbers to achieve the same level of information ratio. For example, the NPE‐based method shows a higher explained variance ratio (around 90%), while the PCA‐based method only achieved a 73% explained variance ratio with the same model order of 200. Therefore, the NPE‐based method can be helpful in dimension selection during the first PCA step by targeting the commonly shared eigenvectors, which in turn benefits the second PCA with a higher compaction effect.

FIGURE 6

The comparison of dimension reduction number and corresponding explained variance ratio. Both runs, Rest 1 (red) and Rest 2 (blue), are assessed for the principal component analysis (PCA)‐based (dotted line) and neighborhood preserving embedding (NPE)‐based (solid line) methods. The black line represents the number and 90% ratio level

Reproducibility

Between‐model‐order reproducibility comparisons were implemented with the principle of proximity for four different model orders: 50, 100, 150, and 200. The quantitative metric employed was the similarity of ICs, and the correlation coefficients were computed as a function of model order. In Figure 7a, the left panel shows the results of the three paired comparisons (150 vs. 200, 100 vs. 150, and 50 vs. 100; results from Rest 1) as line plots for the two methods (solid line for NPE and dotted line for PCA). NPE slightly outperformed PCA for the first two pairs and showed superior performance for 150 versus 200, with more highly correlated ICs. To further explore the difference, we mapped the coefficients into a spatial distribution using the corresponding ICs identified as reproducible. Most areas are common to both methods; black circles denote the major differences, which show that NPE detects more cortical networks in prefrontal areas. More detailed spatial distribution comparisons for the other pairs (model orders 100 versus 150 and 50 versus 100), as well as the results from the other run, Rest 2, are provided in the supplementary materials (Supplementary Figure 5).
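The similarity metric between two model orders can be illustrated as a best-match correlation: for each IC at the lower order, take the highest absolute spatial correlation with any IC at the higher order. This sketch is an assumption about how such a comparison could be computed; in particular, the `threshold` of 0.5 for calling an IC reproducible is a hypothetical value, not one taken from the study.

```python
import numpy as np

def best_match_correlation(ics_low, ics_high):
    """For each IC (row) in ics_low, the highest |Pearson r| with any IC in ics_high."""
    def normalize(x):
        x = x - x.mean(axis=1, keepdims=True)
        return x / np.linalg.norm(x, axis=1, keepdims=True)

    corr = np.abs(normalize(ics_low) @ normalize(ics_high).T)
    return corr.max(axis=1)

def count_reproducible(ics_low, ics_high, threshold=0.5):
    """Number of lower-order ICs with a match above a hypothetical threshold."""
    return int((best_match_correlation(ics_low, ics_high) >= threshold).sum())
```

Sorting the best-match values in descending order yields a curve per model-order pair, comparable in spirit to the line plots in Figure 7a.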

FIGURE 7

The reproducibility and test–retest ability comparison between two methods. (a) The reproducibility is assessed between two model orders (yellow for 150 vs. 200, green for 100 vs. 150, and black for 50 vs. 100) in quantitative correlation. (b) The test–retest ability is assessed by comparing similarity of results from two resting runs under different model orders (blue for 200, yellow for 150, green for 100, and black for 50). The best result for each is shown in the right box by mapping correlation into a spatial distribution. Black circles mark the major differences between the two results

Test–retest reliability

The same four typical model orders, 50, 100, 150, and 200 are used to evaluate the test–retest reliability. For each model order level, NPE shows better performance in producing more reliable ICs between two isolated runs for the same datasets (Figure 7b left). The results for model order 100 were chosen to show the spatial distributions of those reproducible ICs generated by NPE and PCA methods (Figure 7b right). It shows that more prefrontal and occipital areas were identified using NPE. The complete comparisons of other model orders are shown in the supplementary materials (Supplementary Figure 6).

Consistency

Consistency was evaluated across model orders 50 to 200 with a step of 10, using model order 100 as the reference. As shown in the consistency map in Figure 8, the persistence of these 100 ICs was visualized as a heat map for the NPE‐based (left) and PCA‐based (right) methods. We sorted the ICs by consistency from top to bottom; a brighter row indicates higher consistency. The NPE‐based results showed better performance in producing consistent ICs. The top 30 highly consistent ICs were chosen to represent the spatial distributions. The areas of major difference are also marked with black circles. NPE shows more consistent ICs in cortical areas, while some of the PCA results are found in white matter. Similar results from Rest 2 are shown in the supplementary materials (Supplementary Figure 7).
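A consistency map of this kind can be assembled as a matrix with one row per reference IC and one column per model order, then sorted by row mean. The sketch below is illustrative only: the function name, the use of the row mean as the sorting key, and the inclusion of the reference order as a column are all assumptions.

```python
import numpy as np

def consistency_matrix(ics_by_order, reference_order=100):
    """Build a (n_reference_ics, n_orders) persistence matrix.

    ics_by_order: dict mapping model order -> (n_ics, n_voxels) array
    Entry [i, j] is the best |Pearson r| between reference IC i and any IC
    obtained at the j-th model order (orders sorted ascending).
    Returns the matrix with rows sorted from most to least consistent,
    and the row ranking that was applied.
    """
    def normalize(x):
        x = np.asarray(x, dtype=float)
        x = x - x.mean(axis=1, keepdims=True)
        return x / np.linalg.norm(x, axis=1, keepdims=True)

    reference = normalize(ics_by_order[reference_order])
    columns = []
    for order in sorted(ics_by_order):
        other = normalize(ics_by_order[order])
        columns.append(np.abs(reference @ other.T).max(axis=1))
    matrix = np.column_stack(columns)
    # Sort rows by mean consistency, highest first (assumed sorting key)
    ranking = np.argsort(-matrix.mean(axis=1), kind="stable")
    return matrix[ranking], ranking
```

Displaying the sorted matrix as a heat map puts the most persistent ICs at the top, so the top rows can be selected for spatial display.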

FIGURE 8

The consistency comparison between two methods. The consistency is assessed by computing the persistence of independent components (ICs) with correlation coefficients across model order 50 to 200 with a step of 10. The ICs are paired with model order 100 and sorted. The spatial distributions of the top 30 ICs are shown in the middle box. Black circles mark the differences between the two results

Real fMRI data—Task fMRI

Task fMRI data have the advantage that prior knowledge of the expected activation areas is available. For the selected task, Barch et al. (2013) found robust activation in ventral lateral prefrontal cortex and in both superior and inferior temporal cortices, including the anterior temporal poles bilaterally. Our results reproduced these findings, including a highly similar, stronger activation on the left side of the brain. Hence, the comparison between the three segment conditions is more reasonable when the activation areas and levels from Barch et al. (2013) are used as the “gold standard.” The group activation maps (contrast of story vs. math) in Figure 9c show that after NPE reconstruction (right panel), activation is stronger and wider under the full, half, and quarter lengths of the temporal course. With NPE strengthening, the half‐length data show better performance than the original full‐length data. To assess individual specificity, the first‐level activation maps of the 100 subjects in each segment condition were compared with their group‐level activation, and the correlation coefficients are shown in Figure 9b. Not all subjects show high correlation, especially under shorter lengths of the original data (GLM, dotted line). The data reconstructed with NPE show higher correlation (solid line) than the original data (dotted line) under all three data lengths. We also noticed that some subjects with worse correlation remained so after NPE reconstruction. This reflects the preservation of individual specificity, in accordance with the simulation results in both the resting‐state and task fMRI experiments. Meanwhile, a group‐wise comparison of the correlations (mean/SD) is presented in Figure 9b, which quantifies the differences between the two methods and the length conditions.

FIGURE 9

The individual and group‐level comparison of task activation map. (a) The HRF course of two task stimuli and the segment illustration. (b) The correlation between the participant activation maps and group activation maps under certain conditions and methods. The upward violin plot with mean/SD bar denotes group‐wise comparison. Three different colors purple, green, and yellow were used to represent three segment conditions—full, half, and quarter, respectively. Solid lines show general linear model (GLM) results after reconstruction with the neighborhood preserving embedding (NPE) method, and dotted lines show GLM on the original data. (c) The group activation maps of the two methods and three conditions

DISCUSSION

In this article, we propose and evaluate a novel and powerful dimensionality reduction method for fMRI data based on adapted NPE. It shows excellent performance for recovering both the spatial and temporal components of ground truth networks in simulated data, and for generating more reliable and reproducible results, especially for low SNR fMRI data. The advantage of the NPE‐based method comes from the unique stratagem of preferentially selecting dimensions found within “neighborhoods” indicating shared information, which helps collect and strengthen the useful information shared in the group during dimensionality reduction. This avoids the limitations of selecting components solely based on the variance of eigenvectors, giving preference to “important” variance.
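To make the idea of selecting dimensions within shared "neighborhoods" concrete, the following is a deliberately simplified sketch. It is not the authors' adapted NPE algorithm (their software is linked in the Conclusion); it only illustrates one plausible reading, in which an eigenvector survives if it is correlated above a threshold with some eigenvector in enough other subjects. The function name and the `threshold` and `min_subjects` parameters are hypothetical.

```python
import numpy as np

def select_shared_eigenvectors(eigvecs_per_subject, threshold=0.3, min_subjects=2):
    """Keep eigenvectors that have a correlated neighbor in other subjects.

    eigvecs_per_subject: list of (n_eigenvectors, n_voxels) arrays, one per subject
    threshold:    hypothetical |r| cutoff that defines an adjacency edge
    min_subjects: hypothetical number of other subjects that must contain a neighbor
    Returns a list of index arrays, one per subject.
    """
    def normalize(x):
        x = np.asarray(x, dtype=float)
        x = x - x.mean(axis=1, keepdims=True)
        return x / np.linalg.norm(x, axis=1, keepdims=True)

    normed = [normalize(v) for v in eigvecs_per_subject]
    kept = []
    for s, own in enumerate(normed):
        support = np.zeros(own.shape[0], dtype=int)
        for t, other in enumerate(normed):
            if t == s:
                continue
            # Adjacency between this subject's eigenvectors and another subject's
            adjacency = np.abs(own @ other.T) > threshold
            support += adjacency.any(axis=1)
        kept.append(np.flatnonzero(support >= min_subjects))
    return kept
```

Note that the selection depends only on cross-subject similarity, never on eigenvalue rank, which is the property that distinguishes it from a variance-based cutoff.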

Dimensionality reduction and information enhancement traits

To effectively reduce fMRI data, with such high dimensionality and high information content, it is crucial to find the right balance between information retention and dimension reduction. For the typical dimensionality reduction method, PCA, a high number of dimensions is typically used to retain as much variance as possible to avoid losing information. However, this is not practical when temporal courses are too long. This may potentially lead to overestimation of the number of dimensions and add computation cost, which is especially problematic for large‐scale fMRI data analyses. The simulation results of full‐PCA presented here demonstrate this issue. Moreover, some useful, group‐shared information with smaller variance may not be retained after PCA dimension reduction, even with a higher dimension selection. Instead, as shown in Figure 3, many irrelevant components are retained by the PCA‐based method. The NPE‐based dimensionality reduction method, however, inherits the efficiency of PCA in transforming fMRI data into an orthogonal subspace, and adds the unique advantages of selecting dimensions and gathering group information. It shows excellent performance for dimensionality reduction of low SNR fMRI data and increases the retention of group‐shared information. Thus, the NPE‐based dimension reduction method strikes a proper balance between high dimension reduction efficacy and high information retention.

These traits also show a specialized advantage in task fMRI data. On the one hand, task‐related brain region activations are stronger than those in the resting state, which leads to a higher SNR level. On the other hand, the fundamental assumption of commonality lets NPE better serve its purpose of collecting group information. After reconstruction with NPE results, the task fMRI data seem to be “cleaner” because the desired common information is preserved while unrelated, redundant information is discarded during the dimension reduction. Combining the above advantages and the comparison results, it is possible to save cost by conducting the task experiment with a shorter design length (data acquisition time).

Stability and reliability

In addition to being more sensitive, the NPE‐based dimensionality reduction method is able to generate more reliable brain network components under different model orders. As shown in the real fMRI data results, when the selected dimension numbers were matched for the NPE and PCA methods, the advantage of strengthening useful group‐common information favors NPE in producing more reliable ICA decompositions (more reliable brain network components than PCA in Figures 7 and 8). The quantitative comparison results demonstrated the stability of the NPE method across several metrics, including reproducibility, test–retest reliability, and consistency. Meanwhile, we used the whole‐brain mask (volumetric space) instead of a grayordinate mask because white matter and the brainstem show high variance, which helps demonstrate the limitation of variance‐based selection. As a result, the reliable components from the PCA‐based method are located in white matter, brainstem areas, and cerebrospinal fluid. The better performance in the spatial maps of stable components obtained by the NPE dimension reduction method is likely due to the strengthening of group information by NPE. The benefits primarily come from two sources. First, common components (cortical areas) may have lower variance, and therefore require a higher model order to be preserved in a PCA decomposition. Second, brain networks in the cortex are more vulnerable to noise arising from abrupt head movement, which could lead to a sensitivity to SNR for individual‐level dimension reduction with PCA.

Linear and nonlinear methods

Dimensionality reduction theories and methods are booming because of their potential to reveal key insights into brain imaging data by offering compact, low‐dimensional representations (Tang, Chen, & Li, 2021). Most methods can be classified into two general types, linear or nonlinear. Linear methods like PCA and canonical correlation analysis (CCA) and nonlinear methods like locally linear embedding (LLE; Roweis & Saul, 2009) and locality preserving projections (LPP; He & Niyogi, 2003) have been applied in fMRI data analysis (Mannfolk et al., 2010; Sui et al., 2010; Tian, Dey, Ashour, McCauley, & Shi, 2018; Tsatsishvili, Cong, Toiviainen, & Ristaniemi, 2015). However, there are certain limitations: the principle of CCA, maximizing correlation, is too aggressive in eliminating subject‐specific variations, and LLE and LPP, with their assumption of nonstationarity, are found to be more efficient in decomposing or representing low‐order features of fMRI data (Gallos, Galaris, & Siettos, 2020; Morioka, Calhoun, & Hyvärinen, 2020). NPE, developed as the linear approximation of LLE and combined with the orthogonal constraint from PCA, is adapted for fMRI based on correlation analysis and is shown to be efficient and promising in reducing temporal domain dimensions. In particular, it provides ICA with more stable and reliable results.

Limitations

Memory cost: The proposed NPE‐based dimensionality reduction method is as computationally challenging for extremely large datasets as the conventional PCA method, as both require loading all the subjects into memory for data reduction. For example, our in vivo fMRI datasets required around 200 GB of peak memory for a total of 100 subjects with 1,000 timepoints and 228,483 voxels per subject when using either method. Thus, very large datasets would be a problem for both NPE and PCA because of the increasing computational expense and memory requirement. With the growing interest in integrating large datasets for neuroscience research, this problem needs to be tackled. To address it, PCA‐based methods have been developed for large‐scale fMRI datasets that remove the limitation on subject number (Rachakonda et al., 2016; Smith et al., 2014). In future work, we will develop our proposed NPE method into a new version that is more reliable and reproducible for increasingly large datasets.

Parameter setting: The dimension reduction efficiency is related to the threshold used for constructing neighborhoods; currently, the threshold is set to balance information retention and dimension reduction in our analyses. However, it is possible to calculate the relation between data reduction efficiency and the threshold, which could provide a more deterministic way to choose the threshold (Supplementary Figure 8) for a smaller dimension number.
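The memory figure quoted under "Memory cost" can be sanity-checked with simple arithmetic. The calculation below assumes the data are held as one dense array of 8-byte (double precision) values, which is an assumption made here and not a detail stated in the text; peak memory in practice also includes working copies.

```python
def dense_array_gb(n_subjects, n_timepoints, n_voxels, bytes_per_value=8):
    """Size in gigabytes (1e9 bytes) of a dense array holding all subjects' data."""
    return n_subjects * n_timepoints * n_voxels * bytes_per_value / 1e9

# 100 subjects x 1,000 timepoints x 228,483 voxels
print(f"{dense_array_gb(100, 1000, 228_483):.1f} GB")                     # 182.8 GB
print(f"{dense_array_gb(100, 1000, 228_483, bytes_per_value=4):.1f} GB")  # 91.4 GB
```

Under the double precision assumption, the raw data alone account for roughly 183 GB, which is in line with the reported peak of around 200 GB.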

CONCLUSION

In this manuscript, we have evaluated the performance of PCA and NPE dimensionality reduction for ICA analyses of fMRI data. This includes multiple methods of threshold selection at the individual level (dimension matched and variance) and several decomposition strategies, such as individual ICA, group ICA, and dual regression. NPE has demonstrated significant benefits relative to PCA for recovering spatial and temporal components of ground truth networks (in simulated data) and generating more reliable and reproducible results (in real fMRI data). Controlled trials with simulated data at different SNR levels demonstrated that NPE can strengthen group information, especially in low SNR data, while providing high degrees of dimensionality reduction. This was verified in real fMRI data; structural components extracted from the NPE analyses had higher peak values and larger clusters than those derived from PCA analyses, and cortical and subcortical components were enhanced. Overall, our proposed NPE‐based method shows excellent performance for fMRI data dimensionality reduction. It has the advantage of utilizing and strengthening the group information at both the individual and group levels, while efficiently rejecting “unimportant” variance and reducing data dimensionality. Our software is available at https://github.com/WeiZhao04/fMRI_NPE.git. There is also a GUI beta version at https://github.com/WeiZhao04/Toolkit_GUI.git.

CONFLICT OF INTEREST

The authors declare no conflict of interest.

Appendix S1: Supporting Information

35 in total

1. Probabilistic independent component analysis for functional magnetic resonance imaging.

Authors: Christian F Beckmann; Stephen M Smith
Journal: IEEE Trans Med Imaging Date: 2004-02 Impact factor: 10.048

2. Estimating the number of independent components for functional magnetic resonance imaging data.

Authors: Yi-Ou Li; Tülay Adali; Vince D Calhoun
Journal: Hum Brain Mapp Date: 2007-11 Impact factor: 5.038

3. Nonlinear dimensionality reduction by locally linear inlaying.

Authors: Yuexian Hou; Peng Zhang; Xingxing Xu; Xiaowei Zhang; Wenjie Li
Journal: IEEE Trans Neural Netw Date: 2009-01-13

4. Mapping anterior temporal lobe language areas with fMRI: a multicenter normative study.

Authors: Jeffrey R Binder; William L Gross; Jane B Allendorfer; Leonardo Bonilha; Jessica Chapin; Jonathan C Edwards; Thomas J Grabowski; John T Langfitt; David W Loring; Mark J Lowe; Katherine Koenig; Paul S Morgan; Jeffrey G Ojemann; Christopher Rorden; Jerzy P Szaflarski; Madalina E Tivarus; Kurt E Weaver
Journal: Neuroimage Date: 2010-09-25 Impact factor: 6.556

5. A CCA+ICA based model for multi-task brain imaging data fusion and its application to schizophrenia.

Authors: Jing Sui; Tülay Adali; Godfrey Pearlson; Honghui Yang; Scott R Sponheim; Tonya White; Vince D Calhoun
Journal: Neuroimage Date: 2010-01-28 Impact factor: 6.556

6. Comparison of PCA approaches for very large group ICA.

Authors: Vince D Calhoun; Rogers F Silva; Tülay Adalı; Srinivas Rachakonda
Journal: Neuroimage Date: 2015-05-27 Impact factor: 6.556

7. An efficient functional magnetic resonance imaging data reduction strategy using neighborhood preserving embedding algorithm.

Authors: Wei Zhao; Huanjie Li; Yuxing Hao; Guoqiang Hu; Yunge Zhang; Blaise de B Frederick; Fengyu Cong
Journal: Hum Brain Mapp Date: 2021-12-10 Impact factor: 5.038

8. Group-PCA for very large fMRI datasets.

Authors: Stephen M Smith; Aapo Hyvärinen; Gaël Varoquaux; Karla L Miller; Christian F Beckmann
Journal: Neuroimage Date: 2014-08-03 Impact factor: 6.556

9. Resting-state fMRI in the Human Connectome Project.

Authors: Stephen M Smith; Christian F Beckmann; Jesper Andersson; Edward J Auerbach; Janine Bijsterbosch; Gwenaëlle Douaud; Eugene Duff; David A Feinberg; Ludovica Griffanti; Michael P Harms; Michael Kelly; Timothy Laumann; Karla L Miller; Steen Moeller; Steve Petersen; Jonathan Power; Gholamreza Salimi-Khorshidi; Abraham Z Snyder; An T Vu; Mark W Woolrich; Junqian Xu; Essa Yacoub; Kamil Uğurbil; David C Van Essen; Matthew F Glasser
Journal: Neuroimage Date: 2013-05-20 Impact factor: 6.556

10. The minimal preprocessing pipelines for the Human Connectome Project.

Authors: Matthew F Glasser; Stamatios N Sotiropoulos; J Anthony Wilson; Timothy S Coalson; Bruce Fischl; Jesper L Andersson; Junqian Xu; Saad Jbabdi; Matthew Webster; Jonathan R Polimeni; David C Van Essen; Mark Jenkinson
Journal: Neuroimage Date: 2013-05-11 Impact factor: 6.556
