Literature DB >> 33127479

Learning Clique Subgraphs in Structural Brain Network Classification with Application to Crystallized Cognition.

Lu Wang¹, Feng Vankee Lin², Martin Cole³, Zhengwu Zhang⁴.

Abstract

Structural brain networks constructed from diffusion MRI are important biomarkers for understanding human brain structure and its relation to cognitive functioning. There is increasing interest in learning differences in structural brain networks between groups of subjects in neuroimaging studies, leading to a variable selection problem in network classification. Traditional methods often use independent edgewise tests or unstructured generalized linear model (GLM) with regularization on vectorized networks to select edges distinguishing the groups, which ignore the network structure and make the results hard to interpret. In this paper, we develop a symmetric bilinear logistic regression (SBLR) with elastic-net penalty to identify a set of small clique subgraphs in network classification. Clique subgraphs, consisting of all the interconnections among a subset of brain regions, have appealing neurological interpretations as they may correspond to some anatomical circuits in the brain related to the outcome. We apply this method to study differences in the structural connectome between adolescents with high and low crystallized cognitive ability, using the crystallized cognition composite score, picture vocabulary and oral reading recognition tests from NIH Toolbox. A few clique subgraphs containing several small sets of brain regions are identified between different levels of functioning, indicating their importance in crystallized cognition.

Entities: Chemical Disease Gene Species

Keywords: Clique subgraphs; Network classification; Signal subgraph learning; Structural brain networks; Symmetric bilinear logistic regression

Mesh：

Year: 2020 PMID： 33127479 PMCID： PMC7826449 DOI： 10.1016/j.neuroimage.2020.117493

Source DB: PubMed Journal: Neuroimage ISSN： 1053-8119 Impact factor: 6.556

Introduction

Recent advances in magnetic resonance imaging (MRI) techniques enable us to noninvasively probe the human brain at higher resolutions than ever before (Glasser et al., 2016) and reconstruct connectomes with distinct physiological meanings (Park and Friston, 2013). Among them, the diffusion MRI (dMRI) infers the locations and directions of white matter (WM) fiber tracts via measuring the water molecular movement along major fiber bundles in WM. Diffusion MRI data are now collected in almost all major cohort-based neuroimaging studies, e.g., the Human Connectome Project (Van Essen et al., 2013), the UK Biobank (Miller et al., 2016) and the Adolescent Brain Cognitive Development (ABCD) Study (Casey et al., 2018). Structural connectivity (SC) analysis is among the most important applications of dMRI (Park and Friston, 2013; Yeh et al., 2016; Zhang et al., 2018; Zhao et al., 2015), where individual-level microstructural brain networks are constructed to delineate anatomical connections between brain regions. Fig. 1 illustrates the pipeline we used for extracting SC (Zhang et al., 2018) (Fig. 1a) and an SC matrix extracted from one subject in the ABCD data (Fig. 1b).

Fig. 1.

(a) shows the pipeline we used to extract the network and (b) shows a structural brain network extracted from the ABCD data.

Brain network classification and identification of predictive subnetworks are probably among the most important applications of SC into the mechanistic understanding of neuroscience phenomena. One typical approach to this network classification and variable selection problem is to treat all the connections of SC as a long feature vector, and apply existing classification methods for vectors, such as generalized linear models with L1 or elastic-net penalty (Zou and Hastie, 2005), support vector machines (Zhu et al., 2004) etc. Another popular approach in the neuroscience literature is to perform massive univariate tests at each edge with multiple testing correction to identify edges that are significantly different between two groups. While these methods may have good prediction, they ignore the network nature of the data and do not guarantee any structure among the selected individual edges, making the results hard to interpret. Other feature extraction approaches often employ a two-step procedure, where some unsupervised dimension reduction is first applied on the networks, and then a regression model is fitted on the extracted low-dimensional features. Zhang et al. (2019) proposed to use tensor decomposition approach to map networks to low-dimensional vectors. Simpson et al. (2011) proposed to reduce the networks to some topological summary measures such as clustering coefficient, network density, etc. However, such connectome simplification leads to an enormous loss of information and brings troubles in truly understanding which part of the brain network is responsible for the group difference. There are a few advanced statistical methods considering the network structure when analyzing group differences in connectome. Vogelstein et al. (2012) proposed to search for a minimum set of vertices and edges distinguishing groups, but this method only applies to binary edges and involves solving a combinatorial optimization problem. Durante et al. (2018) developed a method for testing global and local changes in SC between groups, which gains power by accounting for dependence structure among edges through a Bayesian nonparametric modeling of the networks. Arroyo Relión et al. (2019) proposed a graph classification method that uses edge weights as predictors and incorporates 1 and group lasso penalty to penalize both the number of edges and the number of nodes selected. We developed a symmetric bilinear logistic regression (SBLR) model with elastic-net penalty to select a set of small clique signal subgraphs in network classification. A clique subgraph in graph theory refers to a subset of nodes of an undirected graph such that every two different nodes in the clique are connected (Seidman and Foster, 1978). SBLR puts symmetry constraint on the coefficient matrix in the logistic regression, because the adjacency matrices of structural brain networks are symmetric. The novelties and significance of our approach reside on: The small clique subgraphs identified by our method have appealing interpretations in neuroscience field. Clique subgraphs (containing all the interconnections among a subset of nodes) potentially aligns with the physiological meaning of subgraphs that should be strongly interrelated in order to provide the most efficient neural support of a behavior (Bassett and Sporns, 2017). Despite the clique structure imposed on the selected subgraphs, our model maintains the flexibility of identifying individual edges in network classification and good classification rate. The elastic-net penalty penalizes the size of each identified clique subgraph, producing a parsimonious representation of differences in brain connectome. We develop a coordinate descent algorithm for model estimation where analytical solutions are derived for a sequence of conditional convex optimizations. The code for implementing the algorithm is publicly available at https://github.com/wangronglu/SBLR. The SBLR approach is applied to examine structural network classification for crystallized intelligence among 4213 right-handed adolescents aged 9-10 years in the ABCD study. Emerging literature suggests unique roles of white matter in supporting general cognition and differentiating crystallized and fluid intelligence (Góngora et al., 2020; Penke et al., 2012; Simpson-Kent et al., 2020). Of note, age 9-10 represents a critical change point for the relationship between white matter and crystallized intelligence (Simpson-Kent et al., 2020). Thus far, however, no study has specified the role of SC in crystallized intelligence of adolescents. We extensively analyzed SCs of subjects in ABCD with high vs. low crystallized intelligence using our SBLR model, aiming at learning interpretable SC-based brain connectome maps that can differentiate the levels of crystallized intelligence. The rest of the paper is organized as follows. Section 2 introduces the data preprocessing steps, our method and the algorithm for model estimation. Section 3 presents simulation studies demonstrating good performance of SBLR in recovering true clique signal subgraphs compared with other methods. Application of this method to ABCD data in Section 4 shows coherent subgraphs of crystallized intelligence across composite score and individual domains. We conclude with a brief discussion in Section 5.

Methods

ABCD data preprocessing

We focus on the ABCD dataset downloaded from NIH Data Archive (https://nda.nih.gov) (Casey et al., 2018). The main goal of the ABCD study is to track the brain development from childhood through adolescence to understand biological and environmental factors that can affect the brain’s developmental trajectory. ABCD recruits approximately 10,000 9 – 10 years old children. Longitudinal measures of the brain structure and function as well as rich behavior measures and genetic factors are collected across 21 sites in the United States (Auchter et al., 2018).

Imaging data and preprocessing:

The ABCD imaging protocol is harmonized for three types of 3T scanners: Siemens Prisma, General Electric (GE) 750 and Philips. We downloaded the structural T1 MRI and diffusion MRI (dMRI) data for 5253 subjects from the ABCD 2.0 release. The structural T1 image was acquired with isotropic resolution of 1 mm3. The diffusion MRI image was obtained based on imaging parameters: 1.7 mm3 resolution, four different b-values (b = 500, 1000, 2000, 3000) and 96 diffusion directions. There are 6 directions at b = 500, 15 directions at b = 1000, 15 directions at b = 2000, 60 directions at b = 3000, Multiband factor 3 is used for dMRI acceleration. See Casey et al. (2018) for more details about the data acquisition and preprocessing of the ABCD data. To obtain structural connectome, we used a state-of-the art dMRI data preprocessing framework named population-based structural connectome (PSC) mapping (Zhang et al., 2018). PSC uses a reproducible probabilistic tractography algorithm (Girard et al., 2014; Maier-Hein et al., 2017) to generate the whole-brain tractography. This tractography method borrows anatomical information from high-resolution T1-weighted imaging to reduce bias in reconstruction of tractography. On average, about 106 streamlines were generated for each subject. We used the popular Desikan–Killiany atlas (Desikan et al., 2006) to define the brain regions of interest (ROIs) corresponding to the nodes in the structural connectivity network. The Desikan–Killiany parcellation has 68 cortical surface regions with 34 nodes in each hemisphere. Freesurfer software (Dale et al., 1999; Fischl et al., 2004) is used to perform brain registration and parcellation. Next, for each pair of ROIs, we extracted the streamlines connecting them. In this process several procedures were used to increase the reproducibility: (1) each gray matter ROI is dilated to include a small portion of white matter region, (2) streamlines connecting multiple ROIs are cut into pieces so that we can extract the correct and complete pathway and (3) apparent outlier streamlines are removed. Extensive experiments have illustrated that these procedures can significantly improve the reproducibility of the extracted weighted networks, and readers can refer to Zhang et al. (2018) for more details. To analyze the brain as a network, a scalar number is usually extracted to summarize each connection. Here we use fiber count, but other measures, such as mean fractional anisotropy or connected surface area (Zhang et al., 2018) can be easily extracted using PSC. Applying PSC, we processed 5253 subjects downloaded from NIH Data Archive. To control for handedness, we only focused on the 4213 right-handed subjects in our analysis. The distributions of the age, crystallized cognition composite score, picture vocabulary score and reading score are shown in Fig. 2. For each composite or domain-specific crystallized intelligence score, subsets of subjects with age-adjusted scale scores 1 standard deviation above or below the national average are categorized into high vs. low crystallized intelligence group.

Fig. 2.

Histograms of ages, age-corrected crystallized cognition composite scores, picture vocabulary scores and reading scores of the 4213 subjects involved in this study.

Symmetric bilinear logistic regression (SBLR)

Our data can be summarized as {(y) : i = 1, …, n}, where y is a binary response; contains the regular covariates of subject i (age, gender, etc.) with the first entry being 1; denotes the structural brain network measured for subject i, which is a × symmetric matrix with zero diagonal entries. Our goal is to learn a set of small clique subgraphs from the brain network that are relevant to the outcome. With this aim in mind, we propose the following symmetric bilinear logistic regression (SBLR): where with the first entry being the intercept; and , h = 1, …, . We do not restrict the component vectors ’s to be orthogonal because such constraint may discourage the sparsity of these vectors. Model (1) assumes that the binary outcome y of each individual follows an independent Bernoulli distribution given the network observation and other covariates . The coefficients of the network predictor in model (1) are assumed to have components, where each component matrix selects a signal subgraph. For ease of interpretation, the logit link of (1) can be written in the matrix dot product form: where 〈〉 = trace(⊺) = vec()⊺ vec(). The parameters ’s in (2) are necessary to avoid constraining the coefficient matrix of to be positive semi-definite. From (2), we can see that the nonzero entries in each component matrix locate an outcome-relevant clique subgraph in the network predictor. If only contains two nonzero entries, the corresponding subgraph comes down to a single edge. To ensure both identifiability of model (2) and sparsity of coefficient matrices , we penalize the magnitude of the lower-triangular entries in these coefficient matrices with the following elastic-net penalty: where the overall penalty factor δ > 0 and η ∈ [0, 1] controlling the fraction of 1 regularization.

Estimation

The parameters in SBLR model (1) are estimated by minimizing the loss function below: where ll is the log-likelihood of subject i. Plugging in the logit link of (1), we have where represents the normalized network observation with mean 0 and variance 1 for each edge. The algorithm of block updating each component vector in tensor regression (Zhou et al., 2013) is not applicable to minimizing the loss function (4), because ll is not a concave function of when fixing the other parameters. Notice that which may not be negative semi-definite. However, since the diagonal entries of each adjacency matrix are zero, the loss function (4) is actually a convex function of each entry β in when fixing the others, making the coordinate descent algorithm very appealing (Wang et al., 2019). The technical details of deriving the analytic form update for each parameter are discussed in Appendix A. The coordinate descent algorithm for minimizing (4) is summarized in Algorithm 1, where all the parameters are iteratively updated until the relative change of the loss function (4) is smaller than a tolerance number ϵ. Typical value for ϵ is 10−5 or 10−6. Since the loss function (4) is lower bounded by 0 and each update always decreases the function value, Algorithm 1 is guaranteed to converge.

Algorithm 1

Coordinate descent for SBLR model with elastic-net penalty.

1:	Input: Normalized V × V symmetric matrix observations {W~i:i=1,…,n}, n × m design matrix X = (x₁, …, x_n)^⊺, binary outcome {y_i : i = 1, …, n}; rank K, overall penalty factor δ > 0, L₁ fractional penalty factor η ∈ [0, 1], tolerance ϵ > 0.
2:	Output: Estimates of α, {(λ_h, β_h) : h = 1, …, K}.
3:	Initialize: α = 0 and each parameter of {{λ_h, β_h) : h = 1, …, K} ~ U(−0.1, 0.1).
4:	repeat
5:	for h = 1 : K do
6:	for u = 1 : V do
7:	update β_hu by (A.11)
8:	end for
9:	end for
10:	for h = 1 : K do
11:	update λ_h by (A.15)
12:	end for
13:	update α by (A.16)
14:	until relative change of loss function (??) < ϵ.

Coordinate descent for SBLR model with elastic-net penalty. In general, the algorithm should be run from multiple initializations to locate a good local solution. Although the estimates for {() : h = 1, …, } will all be zero under sufficiently large penalty factor δ, we cannot initialize them at zero because the results will then get stuck at zero. The updating rules (A.11) and (A.15) imply that each parameter will be set to 0 given others being zero. In fact, we recommend to initialize all the parameters in {() : h = 1, …, } to be nonzero in case some components get degenerated unexpectedly at the beginning. In practice, we initialize each parameter in {() : h = 1, …, } from the uniform distribution (−0.1, 0.1) as discussed at the end of Section A.1.

Model selection

The penalty factor δ in the regularization (3) can be tuned on a validation set over a grid of values on [δmin, δmax] for a fixed η, where δmax is a roughly smallest value for which all the parameters become zero, and δmin = εδmax (e.g. ε = 0.01). The optimal δ produces the smallest deviance (minus twice the log-likelihood) on validation set. The fractional parameter of 1 penalty, η ∈ [0, 1], can also be selected by validation. Our proposed model (1) assumes a known number of components . In practice, we choose a large enough number for and allow the elastic-net penalty (3) to discard unnecessary components, leading to a data-driven estimate for the number of signal subgraphs. We assess the performance of the procedure and verify its lack of sensitivity to the chosen upper bound in simulations.

Simulation study

We use simulations to compare the performance of recovering true signal subgraphs and predictive performance among SBLR and the following methods: Logistic regression with elastic-net penalty on vectorized networks, where the upper triangular entries of each adjacency matrix are entered into a regularized logistic regression. The grid of values for the 1 fractional penalty factor α is picked as {0.1, 0.2, …, 1}. For each α, the optimal penalty factor is chosen from a sequence of 100 equally spaced values on the logarithmic scale. This method is fitted with glmnet toolbox in Matlab (http://www.stanford.edu/~hastie/glmnet_matlab). Penalized graph classification (GC) approach (Arroyo Relión et al., 2019), which also uses edge weights as predictors, but incorporates 1 and group lasso penalty to promote sparsity in selection of edges and nodes. Their penalty factor pair () is tuned over a 5 × 11 grid, where ∈ {10−4, 10−3, …, 1} × max and ρ ∈ {1, 10, 20, …, 100}, with (max, ρ = 100) ensuring that all the coefficients are penalized to zero. As Arroyo Relión et al. (2019) suggest, values of ρ < 1 do not result in node selection. This method is fitted with graphclass package in R. Screening method based on multiple testing with false discovery rate control (MT-FDR), where a two-sample t-test is performed on each edge in the network between the two groups. Multiple comparisons are adjusted by rejecting all local nulls having a p-value below the Benjamini-Yekutieli threshold (Benjamini et al., 2001) under arbitrary dependence assumptions on the multiple tests. The false discovery rate is controlled at level α = 0.05. A logistic regression is then fitted on the significant edge weights. Tensor network factorization analysis (TNFA) (Zhang et al., 2019), which embeds the × symmetric adjacency matrices { : i = 1, …, n} into a low dimensional n × matrix U ( ≤ ). Each row i of contains the principal component scores of subject i, and each column of corresponds to a basis network. A logistic regression is then fitted on the low embedding matrix . The basis networks corresponding to the significant coefficients are selected as signal subnetworks. We use full rank = for TNFA in simulations, which explains around 99% variation in the generated networks on average. In the data generating process, each adjacency matrix is a 20 × 20 symmetric matrix generated from a set of 15 basis subgraphs with individual loading vectors where ∈ {0, 1}20 is a random binary vector with ‖‖0 = h + 1, h = 1, 2, …, 15. The loadings {} in (6) are generated independently from (0, 1). Δ is a symmetric 20 × 20 noise matrix that adds 5% noise to each edge in . Specifically, the (u, v)-th entry in Δ (denoted as Δ) is sampled from a normal distribution with mean zero and standard deviation . The generating process (6) produces dense networks with complex correlation structure. These generated adjacency matrices {i : i = 1, …, n} are further standardized for each edge across subjects and the diagonals are set to zero.

Clique signal subgraphs

In this setting, the binary response y is associated with 3 clique signal subgraphs in the network. Specifically, y is generated from Bernoull(p) with The generating process (7) indicates that the true signal subgraphs relevant to y correspond to the first three basis subgraphs as displayed in Fig. 3.

Fig. 3.

True signal subgraphs (lower panel) corresponding to (upper panel) in one simulation.

We consider two sample sizes in this simulation: n = 500 and n = 1000. For each sample size n, a dataset {() : i = 1, …, n} is drawn from the generating process (6),(7). The dataset is then divided into two parts: training set (70%) and validation set (30%). Each method (SBLR, glmnet, GC) is fitted with training set and the optimal penalty pair is selected corresponding to the lowest deviance on validation set. We then refit the model with full dataset under the optimal penalty pair. For each dataset, we additionally generate 200 pairs {()} as test data and measure the predictive performance of each method by computing the area under an ROC curve (AUC) (DeLong et al., 1988) for test data. Some input parameters of Algorithm 1 for SBLR model are set as follows. The tolerance ϵ = 10−6. The 1 fractional penalty factor η is tuned over 5 fixed values {0.6, 0.7, …, 1}. For each value of η, we set δmin = 0.01δmax and choose a sequence of 11 equally spaced δ values on the logarithmic scale. 5 initializations are used in Algorithm 1. Performances under two different choices of are compared for SBLR: = 5 and = 10. The estimated results under optimal penalty pair from glmnet are displayed in Fig. 4. As can be seen, the accuracy of glmnet improves as the sample size n increases, in terms of selecting more true signal edges and fewer non-signal edges. But it is difficult to identify meaningful structure among the selected edges. Fig. 5 displays the estimated results under optimal penalty pair for the GC approach (Arroyo Relión et al., 2019). Although this approach identifies all the true signal edges under larger sample size, it also selects a larger number of false edges. As with glmnet, this approach does not guarantee any structure among the selected edges.

Fig. 4.

Estimated results of glmnet under n = 500 and n = 1000. In each panel, the left plot shows estimated coefficients (lower triangular) versus true values (upper triangular); the right plot shows the selected edges in the network, where black edges denote true signal edges and red ones falsely identified edges; the thickness of each edge is proportional to the magnitude of its estimated coefficient.

Fig. 5.

Estimated results of penalized graph classification (GC) approach under n = 500 and n = 1000. In each panel, the left plot shows estimated coefficients (lower triangular) versus true values (upper triangular); the right plot shows the selected edges in the network, where black edges denote true signal edges and red ones falsely identified edges; the thickness of each edge is proportional to the magnitude of its estimated coefficient.

Multiple t-test screening method (MT-FDR) deems many more edges to be significant in this case, taking up about 86% and 95% of the total number of edges in the network under n = 500 and n = 1000. respectively. Tensor factorization approach (TNFA) on the contrary selects no basis networks under each sample size, because none of the 20 components are significant in the logistic regression at the significance level of 0.05. The estimated results under the optimal penalty pair of SBLR with = 5 and = 10 are displayed in Fig. 6. The performance of SBLR improves with larger sample size. Compared to Fig. 3, Fig. 6 shows that SBLR recovers all the true signal subgraphs under n = 1000 with fewer wrong edges compared to n = 500. The cliques identified by SBLR may change under different due to random initialization for component vectors. As can be seen in Fig. 6, the selected subgraphs are different yet with many overlaps under the sample size n = 500 between = 5 and K = 10. The results become more stable under the larger sample size n = 1000. where the node memberships of the selected cliques are identical between = 5 and = 10 with very slight differences in the estimated coefficients.

Fig. 6.

Estimated nonzero coefficient components (left) of SBLR and their selected subgraphs (right) under K = 5 and K = 10 for different sample sizes n = 500 and n = 1000, respectively. Black edges denote true signal edges and red ones falsely identified edges; the thickness of each edge is proportional to the magnitude of its estimated coefficient.

The procedure above is repeated 100 times. For either sample size n, 100 datasets are generated based on the generating process (6),(7). We record for each method the true positive rate (TPR) representing the proportion of true signal edges that are correctly identified, the false positive rate (FPR) representing the proportion of non-signal edges that are falsely identified, and the false discovery rate (FDR) describing the proportion of selected edges that are false, as well as AUC for test data. Fig. 7 displays the mean and standard deviation (SD) of the TPR, FPR, FDR and AUC for each method across 100 datasets. Note that higher values for TPR and AUC are better, and lower scores of FPR and FDR are better.

Fig. 7.

Mean (bar labels) and standard deviation (error bars) of TPR, FPR, FDR and AUC across 100 simulations under two sample sizes (n = 500 and n = 1000). Glmlasso is logistic regression with L1 penalty; SBLR-L1 indicates that only L1 penalty (η = 1) is applied for SBLR.

Fig. 7 shows that the mean and SD of the four measures are very similar for SBLR under = 5 and = 10. with the same type of penalty (1 or elastic-net) and the same sample size. This implies that the performance of SBLR is robust to the chosen upper bound for the rank. In addition, Fig. 7 shows that the mean TPR and AUC improve with larger sample size for all the methods. But with increasing sample size, the mean FPR and FDR of GC, multiple testing (MT-FDR) and tensor factorization (TNFA) increase considerably. The mean FPR of GC and MT-FDR are also much higher than that of the other methods. The FDR of all the methods excluding TNFA are around or above 0.5 on average, while the mean FDR of glmlasso, glmnet and SBLR models decrease as the sample size increases. The high FDR is probably due to the complex correlations among edge weights in the simulated networks, which is often the case for structural brain networks. The correlations among edges add difficulty to variable selection as false edges correlated with true signal edges are likely to be selected. It is a challenging problem to further control the FDR here, as the MT-FDR, which is supposed to keep the FDR below 0.05, turns out to have a FDR around 0.9. All the methods except for MT-FDR have competitive predictive performance in terms of similar AUC for test data. Glmlasso has the lowest mean FPR but also low mean TPR. SBLR models with either 1 or elastic-net penalty achieve higher TPR and lower FPR on average than glmnet does under either sample size. All the numerical experiments are conducted on a machine with six Intel Core i7 3.2 GHz processors and 64 GB of RAM. Algorithm 1 of SBLR method is implemented in Matlab (R2018a). Under the sample size n = 1000, SBLR with = 10 takes 99.6 s on average to run a validation instance over the 5 × 11 grid under 5 initializations, glmnet takes 21 seconds and GC approach takes 130.4 s on average.

Non-clique signal subgraphs

Clique subgraphs have appealing interpretations but could be a restrictive assumption. We evaluate the performance of SBLR when such assumption is not true by associating the outcome y with non-clique signal subgraphs. Specifically, y is related to a 4-node ring graph and a 7-node star graph as displayed in Fig. 8. The nodes in the two subgraphs correspond to the basis vectors 3 and 6 in (6), respectively. In this setting, the total number of signal edges and their coefficients are the same as in Section 3.1. The sample size is set at n = 1000 and SBLR is estimated under = 10. The tuning method and other settings are maintained the same for SBLR, MT-FDR and TNFA as in Section 3.1. Of note, we use 10-fold cross validation combined with the “one-standard-error” rule (Hastie et al., 2009) for glmlasso, glmnet and GC to further encourage sparsity for these methods. Under the one-standard-error rule, the optimal tuning parameters are selected corresponding to the most parsimonious model whose mean cross-validated deviance is within one standard-error of the minimum.

Fig. 8.

True signal subgraphs in one simulation: a 4-node ring graph and a 7-node star graph.

For a dataset simulated based on the signal subgraphs in Fig. 8, the estimated results of glmnet and GC under the optimal penalty factors are displayed in Fig. 9. Glmnet and GC select fewer edges under the finer tuning approach, but the results can still be difficult to interpret. MT-FDR declares 92.11% of the total number of edges significant in this case, while TNFA selects no signal networks.

Fig. 9.

Estimated results of glmnet (left) and GC (right) under the optimal penalty factors. In each panel, the left plot shows estimated coefficients (lower triangular) versus true values (upper triangular); the right plot shows the selected edges in the network, where black edges denote true signal edges and red ones falsely identified edges; the thickness of each edge is proportional to the magnitude of its estimated coefficient.

SBLR selects 5 subgraphs under the optimal penalty pair as shown in Fig. 10. The first subgraph corresponding to in Fig. 10 partially recovers the star graph in Fig. 8, where the hub structure with Node 1 as the central node is evident. In specific, the first entry of has much larger magnitude than the other nonzero entries corresponding to the other 4 nodes in this subgraph. Therefore the edges from Node 1 to the other 4 nodes have much larger coefficients than other edges do in this subgraph, as shown in the estimated coefficient matrix in Fig. 10. The second subgraph corresponding to in Fig. 10 contains the ring graph in Fig. 8 with two wrong edges due to the clique constraint. But SBLR is still useful for detecting the set of nodes for this signal subgraph.

Fig. 10.

Estimated nonzero coefficient matrices (upper) of SBLR and their selected subgraphs (bottom). Black edges denote true signal edges and red ones falsely identified edges; the thickness of each edge is proportional to the magnitude of its estimated coefficient.

Fig. 11 displays the mean and SD of the TPR, FPR, FDR in edge selection and AUC on test data for each method across 100 datasets simulated under n = 1000 in this setting. Compared to Fig. 7, the performance of SBLR decreases as expected when the clique assumption is not true, in terms of lower TPR, higher FPR and higher FDR on average under the same sample size with either or elastic-net penalty. The mean FPR and FDR of glmlasso, glmnet and GC decrease considerably under the one-standard-error rule compared to Fig. 7, but their mean TPRs also decrease, especially for glmlasso and GC. In addition, the 10-fold cross validation is more time-consuming than the validation tuning in Section 3.1. The glmnet now takes 204 seconds on average to run a cross-validation instance and GC takes about 25 minutes on average, while SBLR takes 106.2 seconds on average to run a validation instance in this case.

Fig. 11.

Mean (bar labels) and standard deviation (error bars) of TPR, FPR, FDR and AUC across 100 simulations under n = 1000. Glmlassois logistic regression with L1 penalty; SBLR-L1 indicates that only L1 penalty (η = 1) is applied for SBLR (K = 10).

Application: SC subgraphs distinguishing high and low crystallized intelligence of adolescents

We apply our method to the ABCD dataset described in Section 2.1 to examine the associations between structural brain network and crystallized intelligence for adolecents aged 9–10 years. NIH toolbox measures crystallized intelligence through two core tests: Picture Vocabulary test and Oral Reading Recognition test. They also provide a composite score, Crystallized Cognition Composite, to allow for evaluation of overall crystallized intelligence. SBLR model is applied to differentiate high from low crystallized intelligence for each composite or domain-specific score, as well as identifying signal subgraphs contributing to the classification. Some input parameters of Algorithm 1 for SBLR are set below in this section. The fractional penalty factor η is set to 1 to encourage sparsity in the results, as the AUC on test data is not sensitive to different values of η in this case. We set = 20, which is larger than the number of selected subgraphs. The tolerance ϵ = 10−5 and 5 initializations are used in Algorithm 1. SBLR is trained over a sequence of 11 equally spaced δ values on the logarithmic scale with δmin = 0.1δmax. We also compare to glmlasso, penalized graph classification (GC), multiple testing screening (MT-FDR) and tensor factorization analysis (TNFA) on variable selection and predictive performance. Full rank = 68 is set in TNFA, which explains around 75% of the variation in the brain networks. We employ stability selection (Meinshausen and Bühlmann, 2010; Shah and Samworth, 2013) to enhance significance of the selected variables (subgraphs or individual edges) for each method, which involves many rounds of random data splitting and keeping the variables with high selection probability. This approach reduces the variations in the results caused by single data split and random initialization. Specifically, the dataset for each cognitive measure is divided into 3 parts: training (60%), validation (20%) and test set (20%). SBLR, glmlasso and GC are fitted with training set and the optimal penalty factor(s) is tuned by validation set. The grid settings for tuning glmlasso and GC are the same as in the simulation study. MT-FDR and TNFA are fitted with training and validation sets. We record the selected variables for each method (under the optimal penalty factor) as well as AUC for test data across 30 rounds of random data splitting, and calculate the probability of each variable being selected. Directly counting the frequency of each identified subgraph of SBLR across multiple data splitting involves massive permutations. We actually took advantage of the special structure in the extracted subgraphs when counting their frequencies in the stability selection. Since the identified subgraphs of SBLR from ABCD data often display hub structure, we labeled each selected subgraph with the central node whose entry has the largest magnitude in the corresponding component vector . Then we aligned the identified subgraphs (component vectors) with the same central node across 30 data splits to find the largest common intersection of these subgraphs that appear frequently, for example more than 50% of the time, which is still a clique.

Picture vocabulary

The Picture Vocabulary test uses an audio recording of words, presented with four photographic images on the computer screen. The participants are asked to select the picture that best matches the meaning of the word. Using the 1 standard deviation rule proposed in Section 2.1, we obtain a dataset containing 1034 kids with high picture vocabulary scores and 282 low scores. SBLR on average selects 9.6 nonzero subgraphs across 30 data splits with a standard deviation of 4.0. Summarizing the results from stability selection, the subgraphs with more than two ROIs and selection probabilities greater than 0.5 are displayed in Fig. 12. Fig. 12 also displays individual connections selected by glmlasso, GC and MT-FDR with probabilities greater than 0.5. As can be seen, the nonzero coefficients estimated by glmlasso and GC are quite similar while GC selects more predictive connections. But the results of these methods generally lack meaningful structure and are difficult to interpret neurologically. 12 components remain significant in TNFA more than 50% of the time, while their corresponding basis networks involve all the connections in the brain network. The average AUCs on test data across 30 splits of the data are all around 0.78 for SBLR, glmlasso and TNFA with a standard deviation around 0.03. GC achieves a bit higher AUC of 0.79 ± 0.03, while that of MT-FDR is 0.72 ± 0.04.

Fig. 12.

Results of Picture Vocabulary: (a) connections selected by glmlasso with selection probabilities > 0.5 across 30 rounds of random data splitting; (b) connections selected by GC with selection probabilities > 0.5; (c) connections selected by MT-FDR with selection probabilities > 0.5; (d) - (f) subgraphs (≥ 3 ROIs) selected by SBLR with selection probabilities > 0.5. The subgraphs are arranged in the descending order of selection frequency. The thickness of each edge is proportional to the magnitude of its mean estimated coefficient; the color goes from blue to red as the coefficient goes from negative to positive.

The three subgraphs identified by SBLR in Fig. 12 all seem to have hub structure, with the hub nodes being 26r (right rostral middle frontal), 34r (right insula) and 28r (right superior parietal) respectively. Plot (d) of Fig. 12 implies that right-handed children with stronger neural connections among 26r (right rostral middle frontal), 27r (right superior frontal) and 2r (right caudal anterior cingulate), and weaker connections among 26r, 29r (right superior temporal) and 30r (right supramarginal) are more likely to get high scores on Picture Vocabulary test. Plot (e) of Fig. 12 implies that right-handed children with stronger neural connections among 34r (right insula), 21r (right postcentral) and 1r (right bankssts), and weaker connections among 34r, 14r (right middle temporal) and 8r (right inferior temporal) are more likely to get high scores in Picture Vocabulary test. Plot (f) of Fig. 12 implies that right-handed children with stronger neural connection between 28r (right superior parietal) and 30r (right supramarginal), and weaker connection between 28r (right superior parietal) and 34l (left insula) are more likely to achieve high ability of this cognition measure. Some of these brain regions agree with the findings in neuroscience literature, for example, right superior parietal gyrus, right supramarginal gyrus and left insula are among the activated regions in auditory language processing tasks for children (Oh et al., 2014; Sugiura et al., 2011; Vogan et al., 2016).

Reading recognition

Participants on this test are asked to read and pronounce letters and words as accurately as possible. Applying the 1 standard deviation rule proposed in Section 2.1, we construct a dataset containing 918 subjects with high Reading Recognition scores and 477 subjects with low scores. SBLR on average selects 9.3 ± 4.2 nonzero subgraphs across 30 splits of the dataset. Fig. 13 displays the subgraphs identified by SBLR with selection probabilities > 0.5. along with connections selected by glmlasso, GC and MT-FDR with probabilities > 0.5. The estimated nonzero coefficients of glmlasso and GC still look similar, but GC on average selects almost double the connections as that of glmlasso. Four components remain significant in TNFA more than 50% of the time, which correspond to all the connections in the brain network. GC on average achieves the highest AUC for test data of 0.73 ± 0.02 across 30 splits. The AUCs of the rest methods are all around 0.70 ± 0.03 in this case.

Fig. 13.

Results of Reading Recognition: (a) connections selected by glmlasso with selection probabilities > 0.5 across 30 splits of the dataset; (b) connections selected by GC with selection probabilities > 0.5; (c) connections selected by MT-FDR with selection probabilities > 0.5; (d) - (f) subgraphs (≥ 3 ROIs) selected by SBLR with selection probabilities > 0.5. The subgraphs are arranged in the descending order of selection frequency. The thickness of each edge is proportional to the magnitude of its mean estimated coefficient; the color goes from blue to red as the coefficient goes from negative to positive.

The subgraphs in Fig. 13 located by SBLR display hub structure again and share many similarities to those associated with Picture Vocabulary. Compared to Fig. 12, Reading Recognition measure has two hub nodes in common with Picture Vocabulary: 26r (right rostral middle frontal) and 28r (right superior parietal). And Plot (f) of Fig. 12 is a subgraph of Plot (f) in Fig. 13. The latter has two extra ROIs: 21r (right postcentral) and 27l (left superior frontal), and stronger connections among the two nodes and the hub node (28r) have positive effect on the cognitive functioning of Reading Recognition. Plot (d) of Fig. 13implies that right-handed children with stronger neural connections among 17r (right pars opercularis), 26r (right rostral middle frontal) and 31r (right frontal pole) are more likely to get high scores on Reading Recognition test, while the connection strengths among 26r, 32r (right temporal pole) and 34r (right insula) may have negative effect on this cognitive functioning. Plot (e) of Fig. 13 implies that stronger neural connections among 17l (left pars opercularis), 27r (right superior frontal) and 2r (right caudal anterior cingulate) have positive effect on this cognitive functioning, while the connection strength between 27r and 9r (right isthmus cingulate) may have a slight negative effect.

Crystallized cognition composite

Crystallized Cognition Composite can be interpreted as a global assessment of verbal reasoning. This composite score is derived by averaging the normalized scores of Picture Vocabulary and Reading Recognition test, and the age-adjusted scale scores are calculated based on this new distribution (Weintraub et al., 2013). Under the 1 standard deviation rule, we identify 1084 high crystallized cognitive cases and 536 low crystallized cognitive cases. SBLR on average selects 12.3 ± 3.5 nonzero subgraphs across 30 splits of the data. Fig. 14 displays the subgraphs identified by SBLR with selection probabilities > 0.5, along with individual connections selected by glmlasso and MT-FDR with probabilities > 0.5. GC selects far more connections on average than glmlasso or MT-FDR does in this case, with almost two times that of MT-FDR, and hence its results are not displayed. 9 components maintain significant in TNFA more than 50% of the time, which again corresponds to all the connections in the brain network. The AUCs on test data across 30 splits are around 0.76 ± 0.02 for SBLR, glmlasso and TNFA, 0.78 ± 0.02 for GC, and 0.71 ± 0.03 for MT-FDR.

Fig. 14.

Results of Crystallized Cognition Composite: (a) connections selected by glmlasso with selection probabilities > 0.5 across 30 splits of the data; (b) connections selected by MT-FDR with probabilities > 0.5; (c) - (f) subgraphs (≥ 3 ROIs) selected by SBLR with probabilities > 0.5. The subgraphs are arranged in the descending order of selection frequency. The thickness of each edge is proportional to the magnitude of its mean estimated coefficient; the color goes from blue to red as the coefficient goes from negative to positive.

The subgraphs identified by SBLR in Fig. 14 predictive of high and low crystallized cognitive ability have many overlaps with the selected subgraphs associated with Picture Vocabulary and Reading Recognition. The subgraph in Plot (c) of Fig. 14 seems like a “composite” of the subgraphs in Plot (d)’s of both Figs. 12 and 13. Plot (f) of Fig. 14 echoes Plot (f)’s in both Figs. 12 and 13. The subgraph in Plot (e) of Fig. 14 overlaps with that in Plot (e) of Fig. 13. Plot (e) of Fig. 14 implies that righthanded children with stronger neural connections among 17l (left pars opercularis), 13r (right medial orbitofrontal), 27r (right superior frontal) and 2r (right caudal anterior cingulate) are more likely to have high level of functioning on crystallized cognition. Plot (d) of Fig. 14 implies that stronger connection between 31r (right frontal pole) and 23r (right precentral) has positive effect on crystallized cognitive functioning, while the connection strength between 31r and 20l (left pericalcarine) may have negative effect.

Discussion

We have presented a useful tool for studying differences in brain connectivity patterns between groups, which produces more interpretable results than unstructured classifiers do, while maintaining competitive predictive performance. Simulation studies show that SBLR can successfully identify true signal subgraphs with high TPR and low FPR when the model assumption is true, and its selection accuracy could generally be improved by increasing the number of subjects in the data. In applications to crystallized cognition data, although penalized graph classification method (Arroyo Relión et al., 2019) obtains a bit higher predictive performance than SBLR, our method contributes to a more insightful understanding of the sub-structure of brain connectome related to crystallized intelligence, with discovery of a sequence of sub-networks with hub structure as well as the leading brain regions in these subgraphs. SBLR is theoretically able to represent any coefficients in a generalized linear regression with sufficient number of components (one edge per component). But the regularization terms encourage small and fewer cliques. Simulation study shows that SBLR tends to introduce redundant edges when true signal subgraphs have sparse structure. But SBLR is still useful for capturing the hub structure of a star signal subgraph and detecting the sets of nodes for predictive subgraphs. In practice, we can evaluate the out-of-sample predictive performance of SBLR to determine whether the model assumption is reasonable. High prediction accuracy could justify the model assumption to the extent that putting the structured constraints on the coefficients makes the unstructured model less overfitting. We suggest to use the proposed model when the number of selected subgraphs is small and the out-of-sample predictive performance is comparable to or better than glmnet or GC. Ideally, we could have statistical tests for each coefficient in the subgraphs, but testing the significance of the coefficients in SBLR is a challenging open problem due to the bilinear framework and inclusion of the penalty terms (Xia et al., 2020). There are other ways to define subgraphs, for example, Vogelstein et al. (2012) defined a subgraph as a minimum set of vertices and edges distinguishing groups; Khambhati et al. (2018) constructed subgraphs based an unsupervised non-negative matrix factorization, which is similar to Zhang et al. (2019); Chen et al. (2020) detected and tested altered brain connectivity networks with k-partite graph topological structure; mutually exclusive subgraphs may capture the modular structure of brain connectome. More flexible ways of defining subgraphs with more interesting network topological properties can be explored in the future. Although SBLR is motivated from structural connectivity analysis, the method can certainly be used in functional brain connectivity analysis to identify signal subgraphs in the functional brain network related to a binary outcome, after constructing the functional connectivity matrices based on correlations of BOLD time series data of pairwise brain regions. In addition, it is an interesting future direction to consider how to aggregate community-level or lobe-level information into SBLR to improve the selection accuracy of signal subgraphs. For example, Xia et al. (2020) exploited both edge- and community-level information in modeling brain networks. Xie et al. (2013) compared different algorithms in overlapping community detection. Very few studies have focused on WM’s contribution to cognition. Specific to crystallized cognition, there have only been a handful of studies examining major WM tracts’ roles. For example, higher FA in superior longitudinal fasciculus was related to higher crystallized cognition in children (Simpson-Kent et al., 2020), higher FA in forceps minor was related to higher crystallized cognition in adults (Góngora et al., 2020). Both tracts contain numerous inter-hemisphere connections across several brain regions (e.g., superior longitudinal fasciculus) or within frontal region (e.g., forceps minor). Our study represents the very first effort in the literature specifying clique subgraphs of SC related to crystallized cognition. Importantly, across composite score and individual domains of crystallized cognition, we identified consistent brain regions and subgraphs (refer to Figure 12, 13 and 14) predominantly in right-hemisphere frontal-parietal regions. The finding is consistent to cumulative functional subgraph literature on frontal-parietal driven executive network during neuro-development (Chai et al., 2017), and flexible periphery of the language network (Fedorenko and Thompson-Schill, 2014). We also noticed that the classification accuracy with AUC around 0.78 for crystallized cognition is very high. From our previous study, SC is robust and reproducible and more predictive of cognition compared with functional connectivity (FC) derived from functional MRI (Zhang et al., 2019; 2018). We hypothesize that SC is a better biomarker for understanding the cognition development in adolescents. To verify this hypothesis, analyses and comparisons with FC seem to be a natural next step. The frontal-parietal driven subgraphs among those aged 9-10 years represent a set of critical biomarkers for overall neuro-development. However, these subgraphs’ physiological meaning can be transient across ages. Comparisons across a broader age range should be conducted to further confirm the role of these subgraphs in crystallized cognition and its development. With more data being recorded in the ABCD study, we hope to further analyze SC and cognition development.

34 in total

1. An automated labeling system for subdividing the human cerebral cortex on MRI scans into gyral based regions of interest.

Authors: Rahul S Desikan; Florent Ségonne; Bruce Fischl; Brian T Quinn; Bradford C Dickerson; Deborah Blacker; Randy L Buckner; Anders M Dale; R Paul Maguire; Bradley T Hyman; Marilyn S Albert; Ronald J Killiany
Journal: Neuroimage Date: 2006-03-10 Impact factor: 6.556

2. The role of the insula in speech and language processing.

Authors: Anna Oh; Emma G Duerden; Elizabeth W Pang
Journal: Brain Lang Date: 2014-07-10 Impact factor: 2.381

3. Tensor Regression with Applications in Neuroimaging Data Analysis.

Authors: Hua Zhou; Lexin Li; Hongtu Zhu
Journal: J Am Stat Assoc Date: 2013 Impact factor: 5.033

4. Correction for diffusion MRI fibre tracking biases: The consequences for structural connectomic metrics.

Authors: Chun-Hung Yeh; Robert E Smith; Xiaoyun Liang; Fernando Calamante; Alan Connelly
Journal: Neuroimage Date: 2016-05-20 Impact factor: 6.556

Review 5. Network neuroscience.

Authors: Danielle S Bassett; Olaf Sporns
Journal: Nat Neurosci Date: 2017-02-23 Impact factor: 24.884

6. Detecting and Testing Altered Brain Connectivity Networks with K-partite Network Topology.

Authors: Shuo Chen; F DuBois Bowman; Yishi Xing
Journal: Comput Stat Data Anal Date: 2019-07-09 Impact factor: 1.681

7. Brain white matter tract integrity as a neural foundation for general intelligence.

Authors: L Penke; S Muñoz Maniega; M E Bastin; M C Valdés Hernández; C Murray; N A Royle; J M Starr; J M Wardlaw; I J Deary
Journal: Mol Psychiatry Date: 2012-05-22 Impact factor: 15.992

8. Subgraphs of functional brain networks identify dynamical constraints of cognitive control.

Authors: Ankit N Khambhati; John D Medaglia; Elisabeth A Karuza; Sharon L Thompson-Schill; Danielle S Bassett
Journal: PLoS Comput Biol Date: 2018-07-06 Impact factor: 4.475

9. Crystallized and fluid intelligence are predicted by microstructure of specific white-matter tracts.

Authors: Daylín Góngora; Mayrim Vega-Hernández; Marjan Jahanshahi; Pedro A Valdés-Sosa; Maria L Bringas-Vega
Journal: Hum Brain Mapp Date: 2019-11-05 Impact factor: 5.038

10. The challenge of mapping the human connectome based on diffusion tractography.

Authors: Klaus H Maier-Hein; Peter F Neher; Jean-Christophe Houde; Marc-Alexandre Côté; Eleftherios Garyfallidis; Jidan Zhong; Maxime Chamberland; Fang-Cheng Yeh; Ying-Chia Lin; Qing Ji; Wilburn E Reddick; John O Glass; David Qixiang Chen; Yuanjing Feng; Chengfeng Gao; Ye Wu; Jieyan Ma; Renjie He; Qiang Li; Carl-Fredrik Westin; Samuel Deslauriers-Gauthier; J Omar Ocegueda González; Michael Paquette; Samuel St-Jean; Gabriel Girard; François Rheault; Jasmeen Sidhu; Chantal M W Tax; Fenghua Guo; Hamed Y Mesri; Szabolcs Dávid; Martijn Froeling; Anneriet M Heemskerk; Alexander Leemans; Arnaud Boré; Basile Pinsard; Christophe Bedetti; Matthieu Desrosiers; Simona Brambati; Julien Doyon; Alessia Sarica; Roberta Vasta; Antonio Cerasa; Aldo Quattrone; Jason Yeatman; Ali R Khan; Wes Hodges; Simon Alexander; David Romascano; Muhamed Barakovic; Anna Auría; Oscar Esteban; Alia Lemkaddem; Jean-Philippe Thiran; H Ertan Cetingul; Benjamin L Odry; Boris Mailhe; Mariappan S Nadar; Fabrizio Pizzagalli; Gautam Prasad; Julio E Villalon-Reina; Justin Galvis; Paul M Thompson; Francisco De Santiago Requejo; Pedro Luque Laguna; Luis Miguel Lacerda; Rachel Barrett; Flavio Dell'Acqua; Marco Catani; Laurent Petit; Emmanuel Caruyer; Alessandro Daducci; Tim B Dyrby; Tim Holland-Letz; Claus C Hilgetag; Bram Stieltjes; Maxime Descoteaux
Journal: Nat Commun Date: 2017-11-07 Impact factor: 14.919