
Group Guided Fused Laplacian Sparse Group Lasso for Modeling Alzheimer's Disease Progression.

Xiaoli Liu1, Jianzhong Wang2,3, Fulong Ren4, Jun Kong1,2.   

Abstract

As the largest cause of dementia, Alzheimer's disease (AD) has brought serious financial, psychological, and emotional burdens to patients and their families. In order to assess the progression of AD and develop new treatment methods for the disease, it is essential to infer the trajectories of patients' cognitive performance over time and to identify biomarkers that connect patterns of brain atrophy with AD progression. In this article, a structured regularized regression approach termed group guided fused Laplacian sparse group Lasso (GFL-SGL) is proposed to infer disease progression by jointly predicting the same cognitive scores at different time points (longitudinal analysis). The proposed GFL-SGL simultaneously exploits the interrelated structures within the MRI features and among the tasks with a sparse group Lasso (SGL) norm and a novel group guided fused Laplacian (GFL) regularization. This combination effectively incorporates both the relatedness among multiple longitudinal time points, via a general weighted (undirected) dependency graph, and the useful inherent group structure in the features. Furthermore, an alternating direction method of multipliers- (ADMM-) based algorithm is derived to optimize the nonsmooth objective function of the proposed approach. Experiments on the dataset from the Alzheimer's Disease Neuroimaging Initiative (ADNI) show that the proposed GFL-SGL outperformed other state-of-the-art algorithms and effectively fused the multimodality data. The compact sets of cognition-relevant imaging biomarkers identified by our approach are consistent with the results of clinical studies.
Copyright © 2020 Xiaoli Liu et al.

Year:  2020        PMID: 32104201      PMCID: PMC7033952          DOI: 10.1155/2020/4036560

Source DB:  PubMed          Journal:  Comput Math Methods Med        ISSN: 1748-670X            Impact factor:   2.238


1. Introduction

Alzheimer's disease (AD) is a chronic neurodegenerative disease that mainly affects memory function, and its progression ultimately culminates in a state of dementia in which all cognitive functions are affected. AD is therefore a devastating disease for those who are affected and presents a major burden to caretakers and society. According to reports conducted by the Alzheimer's Disease Neuroimaging Initiative (ADNI), the worldwide prevalence of AD is projected to reach 131.5 million by the year 2050, nearly three times the number in 2016 (i.e., 46.8 million) [1]. Moreover, the total worldwide cost of dementia caused by AD is about 818 billion US dollars and was expected to exceed a trillion dollars by 2018 [1]. Research suggests a strong connection between patterns of brain atrophy and AD progression [2, 3]. Thus, it is important to use measurements that assess patients' cognitive status so that the development of AD can be monitored [4, 5]. In the clinical field, criteria such as the Mini Mental State Examination (MMSE) and the Alzheimer's Disease Assessment Scale cognitive subscale (ADAS-Cog) have been widely applied to evaluate the cognitive status of patients for diagnosis of probable AD. However, the results of these clinical criteria may be affected by demographic factors and can be insensitive to the progressive changes occurring in severe Alzheimer's disease [6]. Furthermore, accurate diagnosis based on these criteria also depends on a doctor's expertise. Recently, machine learning-based techniques have been employed in AD research. Compared with the clinical criteria, these machine learning approaches are data-oriented: they seek to infer patients' cognitive abilities and track the progression of AD from biomarkers in neuroimaging data such as magnetic resonance imaging (MRI) and positron emission tomography (PET).
Regression-based models, which explore the relationship between patients' cognitive abilities and valuable factors that may cause AD or affect disease development, have been widely applied in the AD analysis field. Some early studies establish regression models for different cognitive scores, or for the same cognitive score over time, independently. However, researchers have found that there exist inherent correlations among different cognitive scores and within the same cognitive score over time, largely because the underlying pathology is the same and there is a clear pattern in disease progression over time [7-10]. To achieve more accurate predictions, multitask learning (MTL) was introduced for AD analysis to learn all of the models jointly rather than separately [11]. MTL has been shown in many studies to obtain better generalization performance than approaches that learn each task individually [12, 13]. An intuitive way to characterize the relationships among multiple tasks is to assume that all tasks are related and their respective models are similar to each other. In [14], Zhang et al. treated the regression models of different targets (such as MMSE and ADAS-Cog) as a multitask learning problem. In their method, all regression models are constrained to share a common set of features so that the relationship among different tasks can be captured. Wan et al. [15] proposed an approach called sparse Bayesian multitask learning, in which the correlation structure among tasks is adaptively learned by constraining the coefficient vectors of the regression models to be similar. In [16], the sparse group Lasso (SGL) method was also adopted to consider a two-level hierarchy with feature-level and group-level sparsity and parameter coupling across tasks. Besides, some studies have focused on analyzing longitudinal AD data with MTL.
That is, the aim of each task is to model a given cognitive score at a given time step, and different tasks are utilized to model different time steps for the same cognitive score. For AD, longitudinal data usually consist of measurements at a starting time point (t = 0), after 6 months (t = 6), after 12 months (t = 12), after 24 months (t = 24), and so on, usually up to 48 months (t = 48). Zhou et al. employed an MTL algorithm for longitudinal data analysis of AD [9]. In that work, a temporal group Lasso (TGL) regularization was developed to capture the relatedness of multiple tasks. However, since TGL enforces different regression models to select the same features at all time steps, the temporal patterns and variability of the biomarkers during disease progression may be ignored. To handle this issue, an MTL algorithm based on convex fused sparse group Lasso (cFSGL) was proposed [10]. Through a sparse group Lasso penalty, cFSGL can simultaneously select a common set of biomarkers at all time steps and a specific set of biomarkers at different time steps. Meanwhile, the fused Lasso penalty in cFSGL also takes the temporal smoothness of adjacent time steps into consideration [17]. Since cFSGL is nonsmooth, the MTL problem with cFSGL regularization was solved by a variant of the accelerated gradient method. Though TGL and cFSGL have been successfully applied to AD analysis, a major limitation is that the complex relationships among different time points and the structures within the ROIs are often ignored. Specifically, (1) the fused Lasso in TGL and cFSGL only takes into account the association between two consecutive time points and is thus likely to miss useful task dependencies beyond immediate neighbors.
To summarize, if every task (time step) is viewed as a node of a graph whose edges determine the task dependencies, cFSGL uses a graph with edges only between tasks t and t+1, t=1,…, T − 1, and no other edges. The assumption that scores at two consecutive time points should be close is quite logical [18]. Nevertheless, in medical practice this assumption is unlikely to hold all the time. Figure 1 shows how the real ADAS, MMSE, and RAVLT scores of several subjects from our dataset changed over the years: stable periods are interleaved with sharp falls and occasional improvements. This suggests that longitudinal medical scores may follow a more intricate evolution than simple linear trends with only local temporal relationships [19]. (2) Moreover, concerning MRI data, many MRI attributes are interconnected and jointly reflect brain cognitive activity [20]. In our data, multiple shape measures (including volume, area, and thickness) from the same region offer a detailed quantitative assessment of cortical atrophy and tend to be chosen together as joint predictors. Our earlier work proposed a framework that used prior knowledge to guide multitask feature learning; this model effectively uses group information to enforce intragroup similarity [21]. Thus, exploring and utilizing these interrelated structures is important for finding and selecting important, structurally correlated features together. In our previous work [22], we proposed an algorithm that generalized a fused group Lasso regularization to multitask feature learning to exploit the underlying structures.
This method considers a graph structure within tasks by constructing an undirected graph whose edge weights are pairwise Pearson correlation coefficients computed for each pair of tasks. Meanwhile, the method jointly learns a group structure from the image features by adopting a group Lasso penalty for each pair of correlated tasks. Thus, the regularization only considered the relationship between two time points in the graph at a time.
Figure 1

The change patterns of several patients' cognitive scores over the 6 time points: (a) ADAS, (b) MMSE, and (c) RAVLT.TOTAL. The different colors indicate different patients from our dataset.

To overcome these two limitations, a structured regularized regression approach, group guided fused Laplacian sparse group Lasso (GFL-SGL), is proposed in this paper. Our proposed GFL-SGL can exploit commonalities at the feature level, brain region level, and task level simultaneously so as to accurately identify the relevant biomarkers of the current cognitive status and disease progression. Specifically, we design a novel mixed structured sparsity norm, called group guided fused Laplacian (GFL), to capture more general weighted (undirected) dependency graphs among the tasks and ROIs. This regularizer is based on the natural assumption that if an ROI is important at one time point, it has similar, but not identical, importance at other time points. To discover such dependent structures, we employ the graph Laplacian of the task dependency matrix to uncover the relationships among time points. In our work, we consider weighted task dependency graphs based on a Gaussian kernel over the time steps, which yields a fully connected graph with decaying weights. At the same time, by considering the group structure among predictors, group information is incorporated into the regularization through a task-specific G2,1-norm, which enforces intragroup similarity with group sparsity. Besides, by incorporating task-common G2,1-norm and Lasso penalties into the GFL model, we can better understand the underlying associations of the prediction tasks of the cognitive measures, allowing more stable identification of cognition-relevant imaging markers. The task-common G2,1-norm combines multitask and sparse group learning, which learns shared subsets of ROIs for all the tasks; this has been demonstrated to be an effective approach in our previous study [23]. The Lasso penalty maintains sparsity at the feature level.
The resulting formulation is challenging to solve due to the use of nonsmooth penalties, including the GFL, G2,1-norm, and Lasso. In this work, we propose an effective ADMM algorithm to tackle the complex nonsmoothness. We perform extensive experiments using longitudinal data from the ADNI. Five types of cognitive scores are considered. Then, we empirically evaluate the performance of the proposed GFL-SGL methods along with several baseline methods, including ridge regression, Lasso, and the temporal smoothness models TGL [9] and cFSGL [24]. Experimental results indicate that GFL-SGL outperforms both the baselines and the temporal smoothness methods, which demonstrates that incorporating sparse group learning into temporal smoothness and multitask learning can improve predictive performance. Furthermore, based on the GFL-SGL models, stable MRI features and key regions of interest (ROIs) with significant predictive power are identified and discussed. We found that the results corroborate previous studies in neuroscience. Finally, in addition to the MRI features, we use multimodality data including PET, CSF, and demographic information for GFL-SGL as well as temporal smoothness models. While the additional modalities improve the predictive performance of all the models, GFL-SGL continues to significantly outperform other methods. The rest of the paper is organized as follows. In Section 2, we provide a description of the preliminary methodology: multitask learning (MTL), two types of group Lasso norms, and fused Lasso norm. In Section 3, we present the GFL-SGL model and discuss the details of the ADMM algorithm proposed for the optimization. We present experimental results and evaluate the performance using the MRI data from the ADNI-1 and multimodality data from the ADNI-2 in Section 4. The conclusions are presented in Section 5.

2. Preliminary Methodology

2.1. Multitask Learning

Consider a multitask learning (MTL) setting with k tasks [19, 21]. Suppose that p is the number of covariates, shared across all tasks, and n is the number of samples. Let X ∈ ℝ^{n×p} denote the matrix of covariates, Y ∈ ℝ^{n×k} the matrix of responses with each row corresponding to a sample, and Θ ∈ ℝ^{p×k} the parameter matrix, with column θ_{·m} ∈ ℝ^{p} corresponding to task m, m=1,…, k, and row θ_j ∈ ℝ^{1×k} corresponding to feature j, j=1,…, p. The MTL problem then amounts to estimating the parameters by minimizing an appropriately regularized loss function. To associate the imaging markers and the cognitive measures, the MTL model minimizes the objective

min_Θ L(Θ) + R(Θ),

where L(·) is the loss function and R(·) is the regularizer. In the present context, we take the loss to be the square loss, i.e.,

L(Θ) = (1/2) ∑_{i=1}^{n} ‖y_i − x_iΘ‖²₂,

where y_i ∈ ℝ^{1×k} and x_i ∈ ℝ^{1×p} denote the i-th rows of Y and X, corresponding to the multitask responses and the covariates of the i-th sample. We note that the MTL framework can be conveniently extended to other loss functions. Clearly, different choices of the penalty R(Θ) lead to significantly different multitask methodologies. Based on prior knowledge, we subsequently add a penalty R(Θ) to encode the relatedness among tasks.
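As a minimal sketch of the square multitask loss above (written in NumPy purely for illustration; the authors' released code is in MATLAB):

```python
import numpy as np

def mtl_squared_loss(X, Y, Theta):
    """Square loss L(Theta) = 1/2 * sum_i ||y_i - x_i Theta||_2^2
    for n samples, p features, and k tasks."""
    residual = Y - X @ Theta          # (n, k) residual matrix
    return 0.5 * np.sum(residual ** 2)

# Tiny example: n=3 samples, p=2 features, k=2 tasks.
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
Theta = np.array([[1.0, 2.0], [3.0, 4.0]])
Y = X @ Theta                          # perfect fit, so the loss is 0
print(mtl_squared_loss(X, Y, Theta))   # 0.0
```

Different regularizers R(Θ) would then be added to this loss, as discussed in the following subsections.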

2.2. G2,1-Norm

One attractive property of the ℓ2,1-norm regularization is that it encourages predictors from different tasks to share the same parameter sparsity pattern. The ℓ2,1-norm regularization considers

R(Θ) = λ‖Θ‖_{2,1} = λ ∑_{j=1}^{p} ‖θ_j‖₂,  (3)

and is appropriate for simultaneously enforcing sparsity over the attributes of each task. The main point of equation (3) is the use of the ℓ2-norm on θ_j, which forces the weights corresponding to the j-th attribute across multiple tasks to be grouped, and thus tends to select attributes based on the strength of all k tasks jointly. Moreover, a relationship exists among multiple cognitive tests: the hypothesis is that a relevant imaging predictor usually more or less impacts all of these scores, and that only a subset of brain regions is relevant to each evaluation. Through the ℓ2,1-norm, the relationship information among different tasks can be embedded into the framework to build a more suitable predictive model while identifying a subset of the attributes. However, the rows of Θ receive equal treatment in the ℓ2,1-norm, so potential structures among predictors are not taken into consideration. Despite the achievements mentioned above, few regression frameworks consider the covariance structure among predictors. For a given feature, brain imaging measures usually correlate with one another. For MRI data, the groups correspond to certain regions of interest (ROIs) in the brain, for instance, the entorhinal cortex and hippocampus, and individual attributes are specific properties of those areas, for example, cortical volume and thickness.
For each region (group), multiple attributes are derived to measure the atrophy information of each ROI, involving cortical thickness, surface area, and volume from gray matter as well as white matter in the current work. The multiple shape measures from the same region provide a comprehensive quantitative evaluation of cortical atrophy and tend to be selected together as joint predictors [23]. We assume that the p covariates are partitioned into q disjoint groups 𝒢_l, l=1,…, q, where group l has ν_l covariates. In the context of AD, each group corresponds to a region of interest (ROI) in the brain, and the covariates of each group correspond to particular attributes of that area. For AD, the number of attributes in each group, ν_l, is 1 or 4, whereas the number of groups q can be in the hundreds. We then introduce two different G2,1-norms according to the correlation between the brain regions (ROIs) and the cognitive tasks: a task-common norm encouraging a shared subset of ROIs for all the tasks and a task-specific norm encouraging a task-specific subset of ROIs. The task-common G2,1-norm is defined as

‖Θ‖_{G2,1} = ∑_{l=1}^{q} w_l ‖Θ_{𝒢_l}‖₂,

where w_l is the weight of each group and Θ_{𝒢_l} collects the rows of Θ belonging to group 𝒢_l. The task-common G2,1-norm enforces the ℓ2-norm on the features within the same ROI (intragroup) and keeps sparsity among the ROIs (intergroup) with the ℓ1-norm, to facilitate the selection of ROIs. It allows learning shared feature representations and ROI representations simultaneously. The task-specific G2,1-norm is defined as

∑_{m=1}^{k} ∑_{l=1}^{q} w_l ‖θ_{𝒢_l,m}‖₂,

where θ_{𝒢_l,m} ∈ ℝ^{ν_l} is the coefficient vector for group 𝒢_l and task m. The task-specific G2,1-norm allows selecting specific ROIs while learning a small number of common features for all tasks. It offers more flexibility by decoupling the group sparse regularization across tasks, so that different tasks can use different groups. The difference between these two norms is illustrated in Figure 2(a).
Figure 2

The illustration of three different regularizations. Each column of Θ is corresponding to a single task and each row represents a feature dimension. For each element in Θ, white color means zero-valued elements and color indicates nonzero values. (a) G2,1-norm. (b) Fused Lasso.
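To make the two norms concrete, here is a small NumPy sketch; the group index lists and the unit group weights w_l = 1 are illustrative assumptions, not the paper's exact configuration:

```python
import numpy as np

def task_common_g21(Theta, groups, weights):
    """Task-common G2,1-norm: sum_l w_l * ||Theta[G_l, :]||.
    Couples all tasks: a whole ROI block is kept or dropped jointly."""
    return sum(w * np.linalg.norm(Theta[g, :])
               for g, w in zip(groups, weights))

def task_specific_g21(Theta, groups, weights):
    """Task-specific G2,1-norm: sum_m sum_l w_l * ||Theta[G_l, m]||_2.
    Decouples groups across tasks: each task may keep different ROIs."""
    k = Theta.shape[1]
    return sum(w * np.linalg.norm(Theta[g, m])
               for m in range(k) for g, w in zip(groups, weights))
```

For a group whose coefficients are spread over several tasks, the task-common norm takes one joint ℓ2-norm over the whole block, whereas the task-specific norm sums one ℓ2-norm per task, which is why the latter can zero out a group for some tasks but not others.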

2.3. Fused Lasso

Fused Lasso was first proposed by Tibshirani et al. [25]. It is a variant of the Lasso in which pairwise differences between variables are penalized with the ℓ1-norm, encouraging successive variables to be similar. The fused Lasso norm is defined as

‖Θℋ^T‖₁, 

where ℋ is a (k − 1) × k sparse matrix with ℋ_{i,i}=1 and ℋ_{i,i+1}=−1. It encourages θ_{·t} and θ_{·t+1} to take the same value by shrinking the difference between them toward zero. This approach has been employed to incorporate temporal smoothness when modeling disease progression. In a longitudinal model, it is assumed that the difference between cognitive scores at two successive time points is relatively small. The fused Lasso norm is illustrated in Figure 2(b).
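A brief NumPy sketch of the difference matrix ℋ and the resulting penalty (illustrative, assuming the p × k parameter layout used above):

```python
import numpy as np

def fused_lasso_matrix(k):
    """Sparse difference matrix H of shape (k-1, k) with H[i, i] = 1 and
    H[i, i+1] = -1, so Theta @ H.T holds differences of adjacent columns."""
    H = np.zeros((k - 1, k))
    for i in range(k - 1):
        H[i, i] = 1.0
        H[i, i + 1] = -1.0
    return H

def fused_lasso_penalty(Theta, lam):
    """lam * ||Theta H^T||_1: penalizes changes between consecutive time points."""
    H = fused_lasso_matrix(Theta.shape[1])
    return lam * np.abs(Theta @ H.T).sum()
```

When all columns of Θ are identical (a perfectly flat trajectory), the penalty is exactly zero; each change between adjacent time points adds its absolute magnitude to the penalty.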

3. Group Guided Fused Laplacian Sparse Group Lasso (GFL-SGL)

3.1. Formulation

In longitudinal studies, the cognitive scores of the same subject are measured at several time points. Consider a multitask learning problem over k tasks, where each task corresponds to a time point t=1,…, k. For each time point t, we consider a regression task based on data (X, y_t), where X ∈ ℝ^{n×p} denotes the matrix of covariates and y_t ∈ ℝ^{n} is the vector of responses. Let Θ ∈ ℝ^{p×k} denote the regression parameter matrix over all tasks, so that column θ_{·t} ∈ ℝ^{p} corresponds to the parameters for the task at time step t. By considering the prediction of cognitive scores at a single time point as a regression task, tasks at different time points are temporally related to each other. To encode the dependency graph among all the tasks, we construct the Laplacian fused regularized penalty based on Θ𝒟, where 𝒟 ∈ ℝ^{k×k} is derived from the graph Laplacian of the task dependency matrix. We adopt a viewpoint inspired by local nonparametric regression, specifically kernel-based linear smoothers such as the Nadaraya–Watson kernel estimator [26]: each task's parameters are locally approximated by a kernel-weighted combination of the parameters of the other tasks. In our work, the weights are computed with a Gaussian kernel,

w_{tℓ} = exp(−(t − ℓ)² / (2σ²)),  (9)

where σ is the kernel bandwidth, which must be specified. When σ is small, the Gaussian curve decays quickly, so the weights w_{tℓ} decline rapidly as |t − ℓ| increases; conversely, when σ is large, the curve decays gradually, so the weights decline slowly with increasing |t − ℓ|. In this manner, the matrix 𝒟 is symmetric, with w_{tℓ} = w_{ℓt} a function of |t − ℓ|. Taking the covariance structure among predictors into account, we extend the Laplacian fused norm into a group guided Laplacian fused norm. The task-specific G2,1-norm is used here to decouple the group sparse regularization across tasks, allowing more flexibility so that different fused tasks are regularized by different groups.
The group guided fused Laplacian (GFL) regularization is defined as

GFL(Θ) = ∑_{t=1}^{k} ∑_{l=1}^{q} ‖(Θ𝒟)_{𝒢_l,t}‖₂.

The GFL regularization enforces the ℓ2-norm on the fused features within the same ROI and keeps sparsity among the ROIs with the ℓ1-norm, to facilitate the selection of ROIs. The GFL regularization is illustrated in Figure 3. The regularization involves two matrices: (1) the parameter matrix (left); for convenience, we let each group correspond to a time point in the transformation matrix, although in fact the transformation matrix operates on all groups; (2) the Gaussian kernel weighted fused Laplacian matrix with σ=1 (right); since this matrix is symmetric, we represent the columns as rows.
Figure 3

The illustration of GFL regularization. The regularization involves two matrices: parameter matrix (left); Gaussian kernel weighted fused Laplacian matrix with σ=1 (right).
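The kernel-weighted dependency structure can be sketched as follows; this uses the textbook graph Laplacian construction, and the exact form of 𝒟 in the paper may differ in normalization:

```python
import numpy as np

def gaussian_task_weights(k, sigma=1.0):
    """Fully connected task-dependency weights w_{tl} = exp(-(t-l)^2 / (2 sigma^2)).
    Small sigma -> fast decay (near-chain graph); large sigma -> slow decay."""
    t = np.arange(k)
    diff = (t[:, None] - t[None, :]).astype(float)
    return np.exp(-diff ** 2 / (2.0 * sigma ** 2))

def graph_laplacian(W):
    """Graph Laplacian D = diag(W 1) - W of the symmetric weight matrix W.
    Rows sum to zero, so applying Theta @ D compares each task's parameters
    with a weighted average of the other tasks' parameters."""
    return np.diag(W.sum(axis=1)) - W
```

With σ=1 and k=6 time points, the weight between adjacent tasks is exp(−1/2) ≈ 0.61 and decays smoothly for more distant pairs, in contrast to the 0/1 chain graph implicitly used by the fused Lasso.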

The clinical score data are incomplete at some time points for many patients, i.e., values may be missing in the target vector y_t ∈ ℝ^{n}. In order not to reduce the number of samples significantly, we use a matrix Λ ∈ ℝ^{n×k} to indicate the incomplete targets instead of simply removing all patients with missing values. Let Λ_{ij}=0 if the target value of sample i is missing at the j-th time point, and Λ_{ij}=1 otherwise. We use the componentwise operator ⊙ as follows: Z=A ⊙ B denotes z_{ij}=a_{ij}b_{ij} for all i, j. Then, adding the task-common G2,1-norm and the Lasso penalty to the GFL model, the objective function of group guided fused Laplacian sparse group Lasso (GFL-SGL) is given by the following optimization problem:

min_Θ (1/2)‖Λ ⊙ (Y − XΘ)‖²_F + R(Θ) + λ₃ GFL(Θ),  (11)

where R(Θ)=λ₁‖Θ‖₁+λ₂‖Θ‖_{G2,1} and λ₁, λ₂, λ₃ are the regularization parameters.
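The masked loss term can be sketched directly, showing how the indicator matrix Λ removes missing targets from the objective so that incomplete subjects are still usable:

```python
import numpy as np

def masked_loss(X, Y, Theta, Lam):
    """1/2 * ||Lam ⊙ (Y - X Theta)||_F^2: entries with Lam == 0 (missing
    targets) contribute nothing, so incomplete subjects need not be dropped."""
    R = Lam * (Y - X @ Theta)   # componentwise (Hadamard) product
    return 0.5 * np.sum(R ** 2)
```

For example, with two subjects, two time points, and one missing score, the masked loss counts only the three observed residuals.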

3.2. Efficient Optimization for GFL-SGL

3.2.1. ADMM

ADMM has recently become popular because it conveniently parallelizes distributed convex problems: solutions to small local subproblems are coordinated to find the globally optimal solution [27-29]. ADMM addresses problems of the form

min_{x,z} f(x) + g(z)  subject to  Ax + Bz = c,

where f and g are convex functions and A ∈ ℝ^{m×n}, x ∈ ℝ^{n}, B ∈ ℝ^{m×q}, z ∈ ℝ^{q}, c ∈ ℝ^{m}. The scaled-form augmented Lagrangian of ADMM is

L_ρ(x, z, u) = f(x) + g(z) + (ρ/2)‖Ax + Bz − c + u‖²₂,

where u denotes the scaled dual (augmented Lagrangian) multiplier and ρ is a nonnegative penalty parameter. In each iteration, ADMM solves the problem by alternately minimizing L_ρ(x, z, u) over x, z, and u. At the (k+1)-th iteration, ADMM updates

x^{(k+1)} = argmin_x L_ρ(x, z^{(k)}, u^{(k)}),
z^{(k+1)} = argmin_z L_ρ(x^{(k+1)}, z, u^{(k)}),
u^{(k+1)} = u^{(k)} + Ax^{(k+1)} + Bz^{(k+1)} − c.
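To illustrate the x/z/u update cycle, here is a compact scaled-form ADMM for the simpler Lasso problem (a standard textbook instance with A = I, B = −I, c = 0, not the GFL-SGL solver itself):

```python
import numpy as np

def soft_threshold(v, tau):
    """Elementwise soft-thresholding: prox of the l1 penalty."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def admm_lasso(X, y, lam, rho=1.0, n_iter=200):
    """Scaled-form ADMM for min 1/2 ||X theta - y||^2 + lam ||z||_1
    subject to theta - z = 0, illustrating the three-step update cycle."""
    n, p = X.shape
    theta, z, u = np.zeros(p), np.zeros(p), np.zeros(p)
    F = X.T @ X + rho * np.eye(p)   # constant; factorize once in practice
    Xty = X.T @ y
    for _ in range(n_iter):
        theta = np.linalg.solve(F, Xty + rho * (z - u))  # x-update (ridge system)
        z = soft_threshold(theta + u, lam / rho)         # z-update (prox of l1)
        u = u + theta - z                                # scaled dual update
    return z
```

The x-update solves a linear system with a constant matrix F, which is exactly why a one-time Cholesky factorization pays off, as in the Θ-update of Section 3.2.2.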

3.2.2. Efficient Optimization for GFL-SGL

We propose an efficient algorithm to solve the objective function in equation (11) via the equivalent constrained optimization problem

min_{Θ,Q,Γ} (1/2)‖Λ ⊙ (Y − XΘ)‖²_F + R(Q) + λ₃ GFL(Γ)  subject to  Θ = Q, Θ𝒟 = Γ,  (15)

where Q and Γ are slack variables. The solution of equation (15) can then be obtained by ADMM. The augmented Lagrangian (equation (16)) is formed with the augmented Lagrangian multipliers U and V.

Update Θ: from the augmented Lagrangian in equation (16), the update of Θ at the (s+1)-th iteration has a closed form, obtained by setting the gradient of equation (17) to zero. Note that 𝒟 is a symmetric matrix. We write Φ=𝒟𝒟^T, where Φ is also symmetric and Φ_{tℓ} denotes the weight of the pair (t, ℓ). Through this linearization, Θ can be updated in parallel over the individual columns θ_{·t}. In the (s+1)-th iteration, θ_{·t}^{(s+1)} can be updated efficiently using a Cholesky factorization: the subproblem is quadratic, and its optimal solution is given by θ_{·t}^{(s+1)} = F^{−1}b^{(s)}, where F and b^{(s)} follow from the quadratic subproblem. Computing θ_{·t}^{(s+1)} requires solving a linear system, the most time-consuming component of the entire algorithm. To compute it efficiently, we calculate the Cholesky factorization F = A^T A once as the algorithm begins; clearly, F is a constant, positive definite matrix. With the Cholesky factorization, each iteration only requires solving the two triangular linear systems A^T v = b^{(s)} and Aθ_{·t}^{(s+1)} = v; since A is an upper triangular matrix, solving these two systems is very efficient.

Update Q: updating Q requires solving the problem

Q^{(s+1)} = argmin_Q R(Q) + (ρ/2)‖Q − Ω^{(s)}‖²_F,

which equals the computation of the proximal operator of R(·), where Ω^{(s)}=Θ^{(s+1)}+(1/ρ)U^{(s)}. The goal is to compute Q^{(s+1)}=Ψ(Ω^{(s)}) efficiently.
The proximal operator of the composite regularizer can be computed efficiently in two steps [30, 31]: first an elementwise ℓ1 step, then a groupwise G2,1 step. Both steps can be carried out efficiently with suitable extensions of soft-thresholding. The update in equation (25a) can be computed with the soft-thresholding operator

ζ_τ(Ω)_{jm} = sign(Ω_{jm}) max(|Ω_{jm}| − τ, 0).

We then focus on the update of equation (25b), which is equivalent to computing the proximal operator of the G2,1-norm. Since the groups 𝒢_l used in our work are disjoint, equation (27) can be decoupled into independent subproblems of the form (28), one per group. Because φ(q) is strictly convex, q^{(s+1)} is its unique minimizer. We then introduce the following lemma [32] for the solution of equation (28).

Lemma 1 .

For any λ₂ ≥ 0, we have

q_j^{(s+1)} = max(0, 1 − λ₂/(ρ‖ω_j‖₂)) ω_j,

where q_j is the j-th row of Q and ω_j is the corresponding row of Ω^{(s)}.

Update Γ: the update for Γ requires solving the problem

Γ^{(s+1)} = argmin_Γ λ₃ GFL(Γ) + (ρ/2)‖Γ − Z^{(s)}‖²_F,

which is equivalent to computing the proximal operator of the GFL-norm, where Z^{(s)}=Θ^{(s+1)}𝒟+(1/ρ)V^{(s)}. Equation (31) can be decoupled across groups and tasks. Then, we introduce the following lemma [32].

Lemma 2 .

For any λ₃ ≥ 0, we have

γ_{𝒢_l,t}^{(s+1)} = max(0, 1 − λ₃/(ρ‖z_{𝒢_l,t}‖₂)) z_{𝒢_l,t},

where γ_{𝒢_l,t} and z_{𝒢_l,t} are the rows in group 𝒢_l for task t of Γ^{(s+1)} and Z^{(s)}, respectively.

Dual update for U and V: following the standard ADMM dual update, the dual variables are updated as

U^{(s+1)} = U^{(s)} + ρ(Θ^{(s+1)} − Q^{(s+1)}),
V^{(s+1)} = V^{(s)} + ρ(Θ^{(s+1)}𝒟 − Γ^{(s+1)}).

The dual updates can be carried out in an elementwise parallel way. Algorithm 1 provides a summary of the entire algorithm. MATLAB code of the proposed algorithm is available at https://XIAOLILIU@bitbucket.org/XIAOLILIU/gfl-sgl.
Algorithm 1

ADMM optimization of GFL-SGL.
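The elementwise and groupwise soft-thresholding steps behind the Q-update can be sketched as follows; the group index lists and unit group weights are illustrative assumptions, and the same group shrinkage applies to the Γ-update:

```python
import numpy as np

def soft_threshold(V, tau):
    """Elementwise soft-thresholding: prox of the l1 penalty (step 25a)."""
    return np.sign(V) * np.maximum(np.abs(V) - tau, 0.0)

def group_shrink(v, tau):
    """Group soft-thresholding (prox of the l2 norm, Lemma 1 style): scale
    the whole block toward zero, zeroing it when its norm falls below tau."""
    nv = np.linalg.norm(v)
    return np.zeros_like(v) if nv <= tau else (1.0 - tau / nv) * v

def prox_sparse_group(Omega, groups, lam1, lam2, rho):
    """Two-step prox for the composite l1 + G2,1 regularizer: first the
    elementwise soft-threshold, then a per-group shrinkage of the rows."""
    Q = soft_threshold(Omega, lam1 / rho)
    for g in groups:
        Q[g, :] = group_shrink(Q[g, :], lam2 / rho)
    return Q
```

Groups whose (already soft-thresholded) block norm falls below the threshold are zeroed out entirely, which is exactly how whole ROIs are removed from the model.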

3.3. Convergence

The convergence of Algorithm 1 is shown in the following theorem.

Theorem 3 .

Suppose there exists at least one solution Θ* of equation (11), L(Θ) is convex, and λ₁ > 0, λ₂ > 0, λ₃ > 0. Then the GFL-SGL iterations converge in objective value; furthermore, Θ^{(s)} → Θ* whenever equation (11) has a unique solution. The condition for convergence in Theorem 3 is very easy to meet: λ₁, λ₂, and λ₃ are regularization parameters, which are always required to be above zero. The detailed proof is elaborated in Cai et al. [33]. In contrast to Cai et al., we do not require L(Θ) to be differentiable; instead, we explicitly handle the nondifferentiability of L(Θ) through its subgradient ∂L(Θ), similar to the strategy used by Ye and Xie [28].

4. Experimental Results and Discussions

In this section, we present an empirical analysis to demonstrate the effectiveness of the proposed model in characterizing AD progression, using a dataset from the Alzheimer's Disease Neuroimaging Initiative (ADNI) [34]. The principal objective of ADNI has been to test whether serial MRI, PET, other biological markers, and clinical and neuropsychological assessments can be combined to measure the progression of MCI and early AD. Approaches for characterizing AD progression are expected to assist both researchers and clinicians in developing new therapies and monitoring their efficacy. Moreover, understanding disease progression should improve both the safety and efficiency of drug development and potentially lower the time and cost of clinical trials.

4.1. Experimental Setup

The ADNI project is a longitudinal study in which the enrolled subjects are classified into three baseline diagnostic cohorts, Cognitively Normal (CN), Mild Cognitive Impairment (MCI), and Alzheimer's Disease (AD), with follow-up visits at six- or twelve-month intervals. The date on which a subject is scheduled for screening serves as the baseline (BL), and each follow-up time point is indicated by the time elapsed since baseline. For example, we use the notation Month 6 (M6) to denote the time point half a year after the first visit. ADNI currently provides up to Month 48 follow-up data for some patients, though some patients drop out of the study for various reasons. The current work focuses on the MRI data. The MRI attributes used in our experiments are based on imaging data from the ADNI database processed by a team at UCSF (University of California at San Francisco), who carried out cortical reconstruction and volumetric segmentation with the FreeSurfer image analysis suite (http://surfer.nmr.mgh.harvard.edu/). In this investigation, we eliminated attributes with over 10% missing entries (across all patients and time points), excluded patients without baseline MRI records, and filled the remaining missing entries with the average value. This yields a total of n=788 subjects (173 AD, 390 MCI, and 225 CN) at baseline; for the M6, M12, M24, M36, and M48 time points, the sample sizes are 718 (155 AD, 352 MCI, and 211 CN), 662 (134 AD, 330 MCI, and 198 CN), 532 (101 AD, 254 MCI, and 177 CN), 345 (1 AD, 189 MCI, and 155 CN), and 91 (0 AD, 42 MCI, and 49 CN), respectively. In aggregate, forty-eight cortical regions and forty-four subcortical regions remain after this preprocessing.
Tables 1 and 2 [19, 21] list the names of the cortical and subcortical regions. For each cortical region, the cortical thickness average (TA), standard deviation of thickness (TS), surface area (SA), and cortical volume (CV) were calculated as features. For each subcortical region, the subcortical volume (SV) was calculated as a feature. The surface areas of the left and right hemispheres and the total intracranial volume (ICV) were also included. This yields a total of p=319 MRI features extracted from cortical/subcortical ROIs in each hemisphere (275 cortical and 44 subcortical features). Details of the analysis procedure are available at http://adni.loni.ucla.edu/research/mri-post-processing/.
Table 1

Cortical features from the following 71 (=35 × 2+1) cortical regions generated by FreeSurfer.

ID | ROI name | Laterality | Type
1 | Banks superior temporal sulcus | L, R | CV, SA, TA, TS
2 | Caudal anterior cingulate cortex | L, R | CV, SA, TA, TS
3 | Caudal middle frontal gyrus | L, R | CV, SA, TA, TS
4 | Cuneus cortex | L, R | CV, SA, TA, TS
5 | Entorhinal cortex | L, R | CV, SA, TA, TS
6 | Frontal pole | L, R | CV, SA, TA, TS
7 | Fusiform gyrus | L, R | CV, SA, TA, TS
8 | Inferior parietal cortex | L, R | CV, SA, TA, TS
9 | Inferior temporal gyrus | L, R | CV, SA, TA, TS
10 | Insula | L, R | CV, SA, TA, TS
11 | Isthmus cingulate | L, R | CV, SA, TA, TS
12 | Lateral occipital cortex | L, R | CV, SA, TA, TS
13 | Lateral orbital frontal cortex | L, R | CV, SA, TA, TS
14 | Lingual gyrus | L, R | CV, SA, TA, TS
15 | Medial orbital frontal cortex | L, R | CV, SA, TA, TS
16 | Middle temporal gyrus | L, R | CV, SA, TA, TS
17 | Paracentral lobule | L, R | CV, SA, TA, TS
18 | Parahippocampal gyrus | L, R | CV, SA, TA, TS
19 | Pars opercularis | L, R | CV, SA, TA, TS
20 | Pars orbitalis | L, R | CV, SA, TA, TS
21 | Pars triangularis | L, R | CV, SA, TA, TS
22 | Pericalcarine cortex | L, R | CV, SA, TA, TS
23 | Postcentral gyrus | L, R | CV, SA, TA, TS
24 | Posterior cingulate cortex | L, R | CV, SA, TA, TS
25 | Precentral gyrus | L, R | CV, SA, TA, TS
26 | Precuneus cortex | L, R | CV, SA, TA, TS
27 | Rostral anterior cingulate cortex | L, R | CV, SA, TA, TS
28 | Rostral middle frontal gyrus | L, R | CV, SA, TA, TS
29 | Superior frontal gyrus | L, R | CV, SA, TA, TS
30 | Superior parietal cortex | L, R | CV, SA, TA, TS
31 | Superior temporal gyrus | L, R | CV, SA, TA, TS
32 | Supramarginal gyrus | L, R | CV, SA, TA, TS
33 | Temporal pole | L, R | CV, SA, TA, TS
34 | Transverse temporal cortex | L, R | CV, SA, TA, TS
35 | Hemisphere | L, R | SA
36 | Total intracranial volume | Bilateral | CV

A total of 275 (=34 × 2 × 4 + 1 × 2 × 1 + 1) cortical features were calculated and analyzed in this study. Laterality indicates the feature types calculated for L (left hemisphere), R (right hemisphere), or Bilateral (whole hemisphere).

Table 2

Subcortical features from the following 44 (=16 × 2+12) subcortical regions generated by FreeSurfer.

Number | ROI | Laterality | Type
1 | Accumbens area | L, R | SV
2 | Amygdala | L, R | SV
3 | Caudate | L, R | SV
4 | Cerebellum cortex | L, R | SV
5 | Cerebellum white matter | L, R | SV
6 | Cerebral cortex | L, R | SV
7 | Cerebral white matter | L, R | SV
8 | Choroid plexus | L, R | SV
9 | Hippocampus | L, R | SV
10 | Inferior lateral ventricle | L, R | SV
11 | Lateral ventricle | L, R | SV
12 | Pallidum | L, R | SV
13 | Putamen | L, R | SV
14 | Thalamus | L, R | SV
15 | Ventricle diencephalon | L, R | SV
16 | Vessel | L, R | SV
17 | Brainstem | Bilateral | SV
18 | Corpus callosum anterior | Bilateral | SV
19 | Corpus callosum central | Bilateral | SV
20 | Corpus callosum middle anterior | Bilateral | SV
21 | Corpus callosum middle posterior | Bilateral | SV
22 | Corpus callosum posterior | Bilateral | SV
23 | Cerebrospinal fluid | Bilateral | SV
24 | Fourth ventricle | Bilateral | SV
25 | Nonwhite matter hypointensities | Bilateral | SV
26 | Optic chiasm | Bilateral | SV
27 | Third ventricle | Bilateral | SV
28 | White matter hypointensities | Bilateral | SV

A total of 44 subcortical features were calculated and analyzed in this study. Laterality indicates the feature types calculated for L (left hemisphere), R (right hemisphere), or Bilateral (whole hemisphere).

For predictive modeling, five sets of cognitive scores [25, 35] are examined: Alzheimer's Disease Assessment Scale (ADAS), Mini-Mental State Exam (MMSE), Rey Auditory Verbal Learning Test (RAVLT), Category Fluency (FLU), and Trail Making Test (TRAILS). ADAS is the gold standard for evaluating cognitive function in AD drug trials and the most widely used cognitive test for measuring the severity of the core symptoms of AD. MMSE measures cognitive impairment, including orientation to time and place, attention and calculation, spontaneous and delayed recall of words, and language and visuoconstructional abilities. RAVLT measures episodic memory and is used to diagnose memory disturbances; it comprises eight recall trials and a recognition test. FLU measures semantic memory (verbal fluency and language): the subject is asked to name exemplars from a given semantic category. TRAILS is a test of processing speed and executive function comprising two parts, in which the subject is instructed to connect a set of twenty-five dots as quickly as possible while maintaining accuracy. The specific scores we used are listed in Table 3. Note that the proposed GFL-SGL models are trained to model progression for each of these scores, with the different time points serving as distinct tasks. Since the five sets of cognitive scores include a total of ten different scores (see Table 3), results are reported on each of these ten scores separately.
Table 3

Description of the cognitive scores considered in the experiments.

Score name | Description
ADAS | Alzheimer's disease assessment scale
MMSE | Mini-mental state exam
RAVLT.TOTAL | Total score of the first 5 learning trials
RAVLT.TOT6 | Trial 6 total number of words recalled
RAVLT.T30 | 30 minute delay total number of words recalled
RAVLT.RECOG | 30 minute delay recognition
FLU.ANIM | Animal total score
FLU.VEG | Vegetable total score
TRAILS.A | Trail making test A score
TRAILS.B | Trail making test B score
In all experiments, 10-fold cross validation is employed to evaluate our framework and carry out the comparisons. For each experiment, 5-fold cross validation on the training set is used to select the regularization parameters (hyperparameters) (λ1, λ2, λ3); the fitted model with the selected parameters is then used for prediction on the test set. In this inner cross validation, for a fixed set of hyperparameters, four folds are used for training and one fold for evaluation using nMSE. For the hyperparameter search, we consider a grid in which each regularization parameter varies between 10−1 and 103 on a log scale. The data were z-scored before applying the regression methods. The reported results are those of each method with its best parameters. For the quantitative performance evaluation, we use the correlation coefficient (CC) and the root mean squared error (rMSE) between the predicted and target clinical scores for each regression task. To evaluate the overall performance across tasks, the normalized mean squared error (nMSE) [12, 24] and the weighted R-value (wR) [36] are used. The nMSE and wR are defined as nMSE(Y, Ŷ) = (∑_{i=1}^{t} ‖Y_i − Ŷ_i‖₂²/σ(Y_i)) / ∑_{i=1}^{t} n_i and wR(Y, Ŷ) = (∑_{i=1}^{t} Corr(Y_i, Ŷ_i) n_i) / ∑_{i=1}^{t} n_i, where Y_i and Ŷ_i are the ground truth and predicted cognitive scores of the i-th of t tasks, n_i is the number of samples of the i-th task, σ(Y_i) is the variance of Y_i, and Corr denotes the correlation coefficient. A smaller (higher) value of nMSE and rMSE (CC and wR) represents better regression performance. We report the mean and standard deviation based on 10 iterations of experiments on different splits of data for all comparative experiments. We also performed paired t-tests on the corresponding cross validation performances measured by nMSE and wR between predicted and actual scores to compare the proposed method and the other comparison methods [9, 24, 35, 37].
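The two overall metrics can be sketched directly from their definitions; the per-task list representation below is our own convention, not the authors' code:

```python
import numpy as np

def nmse(y_true_per_task, y_pred_per_task):
    """Normalized MSE across tasks: sum of SSE/variance per task, over total n."""
    num = sum(np.sum((yt - yp) ** 2) / np.var(yt)
              for yt, yp in zip(y_true_per_task, y_pred_per_task))
    return num / sum(len(yt) for yt in y_true_per_task)

def weighted_r(y_true_per_task, y_pred_per_task):
    """Sample-size-weighted correlation coefficient (wR) across tasks."""
    num = sum(np.corrcoef(yt, yp)[0, 1] * len(yt)
              for yt, yp in zip(y_true_per_task, y_pred_per_task))
    return num / sum(len(yt) for yt in y_true_per_task)
```

A perfect prediction gives nMSE = 0 and wR = 1, and worse fits push nMSE up and wR down.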
The p values are reported to examine whether the improvements in prediction performance are significant; a significant improvement has a low p value (e.g., less than 0.05). To assess the sensitivity of the three hyperparameters in the GFL-SGL formulation (equation (11)), we explored the 3D hyperparameter space and plotted the nMSE metric for all combinations of values, as in our previous study [19]. This sensitivity analysis is important for studying the influence of each term in the GFL-SGL formulation and for guiding how to set the hyperparameters appropriately. The hyperparameter space is defined as λ1, λ2, λ3 ∈ [0.1, 100], and the reported nMSE was computed on the test set. Owing to space constraints, Figure 4 only shows the plots for the ADAS and MMSE cognitive scores. We observe that, for all cognitive scores, smaller values of λ3 led to lower regression performance, which suggests that the temporal smoothness penalty contributes substantially to the prediction and requires careful tuning. Moreover, larger values of λ2 (associated with the task-common group Lasso penalty) tend to improve the results for smaller λ1. As λ1 increases, we enforce more sparsity on the θ parameters, thereby breaking the group structure present in the data.
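Enumerating the hyperparameter grid described above is straightforward; the five-point-per-axis resolution in this sketch is our own assumption, not the resolution used in the paper:

```python
import itertools
import numpy as np

# Each regularization parameter varies between 1e-1 and 1e3 on a log scale.
grid = np.logspace(-1, 3, num=5)  # 0.1, 1, 10, 100, 1000
# All (lam1, lam2, lam3) triples evaluated by the inner cross validation.
candidates = list(itertools.product(grid, grid, grid))
```

Each candidate triple would then be scored by the inner 5-fold cross validation, keeping the triple with the lowest validation nMSE.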
Figure 4

Hyperparameter sensitivity analysis: hyperparameter λ3, associated with the GFL-SGL temporal smoothness penalty, contributes substantially to the prediction and requires careful tuning. Larger values of λ2 (associated with the task-common group Lasso penalty) tend to improve the results for smaller λ1. (a) ADAS (λ1=1). (b) ADAS (λ1=10). (c) ADAS (λ1=100). (d) MMSE (λ1=1). (e) MMSE (λ1=10). (f) MMSE (λ1=100).

4.2. Prediction Performance Based on MRI Features

We compare the performance of GFL-SGL with several regression methods: ridge regression [38] and Lasso [39], which are applied independently at each time point, and temporal group Lasso (TGL) [9] and convex fused sparse group Lasso (cFSGL) [24], which are state-of-the-art methods for characterizing longitudinal AD progression. TGL incorporates three penalty terms to capture task relatedness: two ℓ2-norms to prevent overfitting and enforce temporal smoothness, and one ℓ2,1-norm to introduce joint feature selection. Its objective is formulated as minΘ L(Θ) + λ1‖Θ‖² + λ2‖ℛΘ‖² + λ3‖Θ‖2,1, where ℛ encodes differences between adjacent time points. cFSGL allows the simultaneous selection of a common set of biomarkers for multiple time points and specific sets of biomarkers for different time points using the sparse group Lasso penalty (SGL, λ1‖Θ‖2,1 + λ2‖Θ‖1), while incorporating temporal smoothness via the fused Lasso penalty (∑_{i,j}|θ_{i,j} − θ_{i,j+1}|). The codes for TGL and cFSGL were downloaded from the authors' websites, and the accelerated gradient method (AGM) is used as the optimization algorithm. Recall that each experiment focuses on a particular cognitive score, with the different time points serving as different tasks in the multitask learning formulations. Since there are ten cognitive scores in total, we run the experiments and report the results separately for each score. The mean and standard deviation of the performance measures are computed via 10-fold cross validation on different splits of the data and summarized in Table 4.
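For reference, the single-task ridge baseline applied independently at each time point has a simple closed form. This NumPy-only sketch is our own illustration of that baseline, not the compared implementations:

```python
import numpy as np

def ridge_per_timepoint(X_list, y_list, lam=1.0):
    """Fit an independent ridge regression at each time point (single-task baseline).

    X_list / y_list: one (n_i, p) design matrix and (n_i,) target per time point.
    Returns one weight vector per time point; no information is shared across tasks.
    """
    weights = []
    for X, y in zip(X_list, y_list):
        p = X.shape[1]
        # Closed-form ridge solution: (X'X + lam * I)^{-1} X'y
        w = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
        weights.append(w)
    return weights
```

The multitask methods (TGL, cFSGL, GFL-SGL) differ precisely in adding penalties that couple these per-time-point weight vectors.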
Table 4

Prediction performance results of ten cognitive scores of six time points based on MRI features.

Metric | Ridge | Lasso | TGL | cFSGL | GFL-SGL

Score: ADAS
nMSE | 10.122 ± 1.4156 | 6.7689 ± 0.7698 | 6.2740 ± 0.7861 | 6.3092 ± 0.6991 | *6.1389 ± 0.6951*
wR | 0.5638 ± 0.0509 | 0.6237 ± 0.0541 | 0.6628 ± 0.0561 | 0.6560 ± 0.0486 | *0.6658 ± 0.0469*
BL rMSE | 7.6553 ± 0.5576 | 6.8217 ± 0.4238 | 6.7275 ± 0.4298 | 6.7151 ± 0.4275 | *6.6479 ± 0.5045*
M6 rMSE | 9.1778 ± 1.2467 | 7.9602 ± 0.8484 | 7.7637 ± 0.9345 | *7.6846 ± 0.9493* | 7.6994 ± 0.9357
M12 rMSE | 9.7212 ± 1.0986 | 8.7050 ± 0.8651 | *8.3822 ± 0.9401* | 8.4646 ± 1.0594 | 8.4076 ± 1.0478
M24 rMSE | 11.676 ± 1.6463 | 10.191 ± 1.2914 | 9.6773 ± 1.6308 | 9.7859 ± 1.6170 | *9.4808 ± 1.6224*
M36 rMSE | 12.772 ± 2.4262 | 9.4852 ± 1.3806 | 8.9110 ± 1.3356 | 8.9313 ± 1.3762 | *8.7939 ± 1.2987*
M48 rMSE | 20.433 ± 2.6163 | 9.0161 ± 2.3381 | 8.2041 ± 1.1869 | 8.6279 ± 2.0852 | *8.0947 ± 1.6669*

Score: MMSE
nMSE | 10.447 ± 1.4590 | 2.5284 ± 0.2230 | 2.4911 ± 0.1411 | 2.5048 ± 0.1772 | *2.3975 ± 0.2140*
wR | 0.4188 ± 0.0530 | 0.5720 ± 0.0498 | 0.5898 ± 0.0431 | 0.5878 ± 0.0449 | *0.5975 ± 0.0425*
BL rMSE | 2.6943 ± 0.1767 | 2.2001 ± 0.1349 | 2.2204 ± 0.1367 | 2.1729 ± 0.1505 | *2.1478 ± 0.1159*
M6 rMSE | 3.5136 ± 0.3413 | 2.8571 ± 0.2697 | 2.8260 ± 0.2875 | 2.8069 ± 0.2882 | *2.7682 ± 0.2470*
M12 rMSE | 3.9044 ± 0.2313 | 3.2128 ± 0.3301 | 3.1438 ± 0.3328 | 3.1558 ± 0.3650 | *3.1375 ± 0.3660*
M24 rMSE | 5.0192 ± 0.6956 | 3.8663 ± 0.6975 | *3.8171 ± 0.7064* | 3.8316 ± 0.6355 | 3.8371 ± 0.7620
M36 rMSE | 5.7022 ± 0.5505 | 3.2518 ± 0.8592 | 3.2732 ± 0.8106 | 3.4828 ± 0.6365 | *3.1914 ± 0.8230*
M48 rMSE | 29.958 ± 0.7233 | 4.0539 ± 0.7097 | 4.0077 ± 0.8089 | 3.8018 ± 0.9474 | *3.5517 ± 0.6933*

Score: RAVLT.TOTAL
nMSE | 17.139 ± 1.2384 | 9.7932 ± 0.9119 | 9.1381 ± 0.8168 | 8.9621 ± 0.9867 | *8.7825 ± 0.9241*
wR | 0.4059 ± 0.0510 | 0.4989 ± 0.0587 | 0.5390 ± 0.0603 | 0.5498 ± 0.0533 | *0.5512 ± 0.0558*
BL rMSE | 11.404 ± 0.7043 | 9.8789 ± 0.9286 | 9.6628 ± 0.9091 | 9.6980 ± 0.7418 | *9.5445 ± 0.6940*
M6 rMSE | 11.828 ± 1.1623 | 10.210 ± 1.2512 | 9.9696 ± 1.1915 | 10.079 ± 1.1682 | *9.8337 ± 1.2773*
M12 rMSE | 13.027 ± 0.9974 | 11.457 ± 0.9096 | 10.945 ± 1.1063 | 10.865 ± 1.3290 | *10.788 ± 1.2737*
M24 rMSE | 14.647 ± 1.4006 | 12.330 ± 1.4231 | 11.997 ± 1.5765 | 11.756 ± 1.6851 | *11.740 ± 1.5374*
M36 rMSE | 15.899 ± 2.2567 | 11.512 ± 1.5268 | 10.640 ± 1.2792 | 10.331 ± 1.5089 | *10.306 ± 1.5535*
M48 rMSE | 41.462 ± 3.4404 | 12.728 ± 1.5048 | 13.105 ± 2.8874 | 11.333 ± 2.0937 | *11.803 ± 2.5033*

Score: RAVLT.TOT6
nMSE | 3.9829 ± 0.4397 | 2.9663 ± 0.1909 | 2.8853 ± 0.2057 | 2.8546 ± 0.1867 | *2.8198 ± 0.1772*
wR | 0.4528 ± 0.0703 | 0.5213 ± 0.0803 | 0.5412 ± 0.0682 | 0.5458 ± 0.0687 | *0.5541 ± 0.0730*
BL rMSE | 3.6885 ± 0.3741 | 3.2944 ± 0.2617 | 3.2949 ± 0.2611 | 3.2756 ± 0.2885 | *3.2540 ± 0.2390*
M6 rMSE | 3.4704 ± 0.3949 | 3.1592 ± 0.3443 | 3.1628 ± 0.2939 | 3.1386 ± 0.3116 | *3.1270 ± 0.2841*
M12 rMSE | 3.8384 ± 0.2676 | 3.4284 ± 0.2262 | 3.4271 ± 0.2632 | 3.4094 ± 0.2808 | *3.3763 ± 0.2575*
M24 rMSE | 4.0656 ± 0.3758 | 3.6252 ± 0.3469 | 3.5826 ± 0.3581 | 3.5894 ± 0.3360 | *3.5592 ± 0.3293*
M36 rMSE | 4.3074 ± 0.7174 | 3.5169 ± 0.3667 | 3.3890 ± 0.3799 | 3.3799 ± 0.3926 | *3.3557 ± 0.3799*
M48 rMSE | 7.4599 ± 1.0656 | 4.5834 ± 0.6969 | 3.7902 ± 0.7846 | *3.7275 ± 0.7056* | 3.7694 ± 0.7746

Score: RAVLT.T30
nMSE | 3.9392 ± 0.3946 | 3.0595 ± 0.2012 | 2.9876 ± 0.1950 | 2.9706 ± 0.2044 | *2.9358 ± 0.1919*
wR | 0.4580 ± 0.0609 | 0.5255 ± 0.0730 | 0.5384 ± 0.0679 | 0.5422 ± 0.0647 | *0.5474 ± 0.0646*
BL rMSE | 3.7877 ± 0.3069 | 3.4076 ± 0.2595 | 3.4176 ± 0.2485 | 3.4034 ± 0.2806 | *3.3806 ± 0.2491*
M6 rMSE | 3.4750 ± 0.3531 | 3.1839 ± 0.3380 | 3.2095 ± 0.2593 | 3.1991 ± 0.2871 | *3.1496 ± 0.3013*
M12 rMSE | 3.9611 ± 0.4480 | 3.6673 ± 0.3242 | 3.6343 ± 0.3799 | 3.6173 ± 0.3800 | *3.5943 ± 0.3790*
M24 rMSE | 4.2027 ± 0.5011 | 3.8070 ± 0.4648 | 3.7570 ± 0.4051 | 3.7562 ± 0.4151 | *3.7389 ± 0.4429*
M36 rMSE | 4.2142 ± 0.5102 | 3.5049 ± 0.4595 | 3.3604 ± 0.4241 | *3.3473 ± 0.4327* | 3.3852 ± 0.4545
M48 rMSE | 7.1834 ± 0.8145 | 4.5537 ± 0.6315 | *4.0102 ± 0.4413* | 4.0900 ± 0.5064 | 4.0727 ± 0.5386

Score: RAVLT.RECOG
nMSE | 6.2754 ± 1.2306 | 3.4921 ± 0.3325 | *3.2186 ± 0.2953* | 3.2282 ± 0.2992 | 3.2314 ± 0.2654
wR | 0.3496 ± 0.0851 | 0.4583 ± 0.0793 | 0.4993 ± 0.0779 | *0.5075 ± 0.0738* | 0.5058 ± 0.0799
BL rMSE | 4.3887 ± 0.4210 | 3.6494 ± 0.2993 | 3.5990 ± 0.3647 | 3.5721 ± 0.3709 | *3.5653 ± 0.3386*
M6 rMSE | 4.4959 ± 0.3686 | 3.7470 ± 0.2412 | 3.6722 ± 0.2928 | *3.6616 ± 0.2995* | 3.6627 ± 0.2815
M12 rMSE | 4.6874 ± 0.3574 | 3.7850 ± 0.2889 | 3.7034 ± 0.3521 | 3.7178 ± 0.2935 | *3.6942 ± 0.3141*
M24 rMSE | 4.8253 ± 0.4029 | 3.9168 ± 0.2251 | *3.7518 ± 0.2771* | 3.8103 ± 0.2391 | 3.8058 ± 0.2488
M36 rMSE | 5.4178 ± 0.6548 | 3.8073 ± 0.2366 | *3.6448 ± 0.2966* | 3.6962 ± 0.1616 | 3.7372 ± 0.1837
M48 rMSE | 12.411 ± 0.9035 | 5.1582 ± 1.0963 | 3.9023 ± 0.8880 | *3.7995 ± 0.8998* | 3.9423 ± 0.7025

Score: FLU.ANIM
nMSE | 9.6435 ± 1.1387 | 5.2513 ± 0.7213 | 5.1293 ± 0.6597 | 4.9992 ± 0.6243 | *4.9478 ± 0.6151*
wR | 0.2872 ± 0.0942 | 0.3858 ± 0.0834 | 0.4212 ± 0.0895 | *0.4433 ± 0.0839* | 0.4406 ± 0.0840
BL rMSE | 6.3878 ± 0.6423 | 5.2970 ± 0.5354 | 5.3535 ± 0.4841 | *5.1972 ± 0.5149* | 5.2026 ± 0.4857
M6 rMSE | 6.1380 ± 0.5975 | 5.3040 ± 0.4995 | 5.3207 ± 0.4732 | 5.2175 ± 0.4797 | *5.1951 ± 0.4563*
M12 rMSE | 6.6219 ± 0.7800 | 5.7413 ± 0.8672 | 5.6134 ± 0.7977 | 5.5704 ± 0.8052 | *5.5303 ± 0.7929*
M24 rMSE | 7.2828 ± 0.9366 | 5.8387 ± 0.7492 | 5.7844 ± 0.6280 | 5.7839 ± 0.7570 | *5.6815 ± 0.7035*
M36 rMSE | 7.8427 ± 1.4361 | 5.6450 ± 0.6733 | *5.3599 ± 0.7423* | 5.3988 ± 0.8188 | 5.3655 ± 0.6841
M48 rMSE | 20.613 ± 1.8524 | 6.2549 ± 1.6986 | *5.7005 ± 1.3167* | 5.7501 ± 1.5019 | 5.9240 ± 1.4382

Score: FLU.VEG
nMSE | 6.6621 ± 0.8499 | 3.5364 ± 0.3463 | 3.4061 ± 0.2879 | 3.3593 ± 0.3146 | 3.3575 ± 0.2867
wR | 0.3726 ± 0.0730 | 0.4934 ± 0.0830 | 0.5257 ± 0.0781 | *0.5357 ± 0.0746* | 0.5356 ± 0.0777
BL rMSE | 4.4121 ± 0.3082 | 3.7115 ± 0.2221 | 3.6980 ± 0.2387 | 3.6464 ± 0.2179 | *3.6368 ± 0.2016*
M6 rMSE | 4.7036 ± 0.1969 | 3.8593 ± 0.2589 | 3.8617 ± 0.2075 | 3.8033 ± 0.2318 | *3.7892 ± 0.2294*
M12 rMSE | 5.0566 ± 0.4772 | 3.9568 ± 0.4941 | 3.9319 ± 0.4757 | *3.9226 ± 0.4542* | 3.9267 ± 0.4761
M24 rMSE | 5.2146 ± 0.4402 | 4.2580 ± 0.4104 | *4.1408 ± 0.3444* | 4.1677 ± 0.4192 | 4.1908 ± 0.4275
M36 rMSE | 6.4334 ± 0.7933 | 4.4230 ± 0.3982 | 4.2656 ± 0.3702 | 4.2445 ± 0.4263 | 4.2392 ± 0.3829
M48 rMSE | 13.882 ± 1.4535 | 4.9607 ± 1.4253 | *3.9822 ± 1.3527* | 3.9887 ± 1.4023 | 4.0292 ± 1.4371

Score: TRAILS.A
nMSE | 33.513 ± 3.8491 | 23.711 ± 1.8805 | *22.756 ± 1.5155* | 23.151 ± 1.5754 | 23.349 ± 1.5768
wR | 0.3572 ± 0.0769 | 0.3740 ± 0.0658 | *0.4219 ± 0.0682* | 0.4122 ± 0.0688 | 0.3965 ± 0.0704
BL rMSE | 25.942 ± 3.8665 | 23.421 ± 4.0061 | *23.039 ± 3.6598* | 23.258 ± 3.7233 | 23.443 ± 3.8347
M6 rMSE | 28.290 ± 4.4832 | 25.328 ± 3.6847 | *25.021 ± 3.3715* | 25.198 ± 3.5600 | 25.634 ± 3.3660
M12 rMSE | 27.665 ± 3.8961 | 25.043 ± 3.4997 | *24.493 ± 3.3011* | 24.675 ± 3.3022 | 24.882 ± 3.2310
M24 rMSE | 31.805 ± 4.1087 | 28.384 ± 3.0384 | *27.845 ± 3.2106* | 28.073 ± 3.1074 | 27.855 ± 3.0427
M36 rMSE | 33.414 ± 8.1383 | 24.980 ± 7.0999 | *23.996 ± 5.2222* | 24.162 ± 5.9112 | 24.247 ± 6.0955
M48 rMSE | 53.906 ± 14.730 | 28.256 ± 16.054 | *26.493 ± 12.132* | 26.870 ± 11.598 | 25.241 ± 11.862

Score: TRAILS.B
nMSE | 94.882 ± 9.6015 | 68.077 ± 6.4277 | 64.789 ± 5.9269 | 63.707 ± 6.2629 | *63.604 ± 5.5813*
wR | 0.3837 ± 0.0509 | 0.4383 ± 0.0618 | 0.4845 ± 0.0565 | 0.4809 ± 0.0669 | *0.4858 ± 0.0595*
BL rMSE | 77.907 ± 6.5622 | 70.051 ± 4.5144 | 69.947 ± 4.9343 | *69.032 ± 3.8304* | 69.154 ± 4.0030
M6 rMSE | 83.326 ± 7.1076 | 74.327 ± 4.2985 | 72.514 ± 3.4677 | *71.401 ± 4.4814* | 71.756 ± 4.9096
M12 rMSE | 81.130 ± 8.9465 | 72.901 ± 6.0166 | *70.604 ± 5.8510* | 70.777 ± 6.4053 | 70.815 ± 5.7209
M24 rMSE | 89.969 ± 13.035 | 77.722 ± 8.9225 | *73.456 ± 9.3979* | 73.950 ± 8.5186 | 73.460 ± 9.2281
M36 rMSE | 100.25 ± 21.732 | 80.934 ± 26.923 | 78.130 ± 24.536 | 78.242 ± 27.867 | *77.639 ± 23.797*
M48 rMSE | 134.89 ± 29.881 | 67.923 ± 29.604 | 68.356 ± 11.968 | 65.858 ± 24.964 | *63.491 ± 18.188*

Note that the best results are marked with asterisks. Paired t-tests at the 0.05 level were used to assess whether GFL-SGL significantly outperformed each compared method on each score.

The results show that the multitask temporal smoothness models (TGL, cFSGL, and GFL-SGL) are more effective than the single-task learning models (ridge and Lasso) in terms of both nMSE and wR over all scores, especially for the tasks at later time points where training samples are limited. Both the fused Lasso norms (TGL and cFSGL) and the group guided fused Lasso (GFL-SGL) improve performance, which demonstrates that taking the local structure within the tasks into account improves prediction. Furthermore, GFL-SGL achieved better performance than TGL and cFSGL, which indicates that it is beneficial to simultaneously employ a transform matrix over all time points and group structure information among the features. Two types of group penalties are used in our model: the group Lasso (G2,1) norm and the group guided fused Laplacian norm. The former learns a shared subset of ROIs for all tasks, whereas the latter learns a task-specific subset of Laplacian-fused ROIs. Our GFL-SGL model performs consistently better than TGL and cFSGL, which further demonstrates that exploiting the underlying dependence structure can be advantageous, and that exploiting the structure among tasks and features simultaneously yields significantly better prediction performance. The statistical hypothesis tests reveal that GFL-SGL is significantly better than the competitors on most of the scores. We also show scatter plots of the actual values against the predicted values on the test dataset. For lack of space, only the scatter plots for ADAS and MMSE are shown, in Figures 5 and 6, respectively. Owing to the small sample sizes at the M36 and M48 time points, we show the scatter plots for the first four time points only. As the scatter plots indicate, the predicted and actual scores are highly correlated for both tasks.
The scatter plots also show that the prediction performance for ADAS is better than that for MMSE. Section 4.4 incorporates additional modalities, including PET, CSF, and demographic information, to further improve performance.
Figure 5

Scatter plots of the actual ADAS against the forecasted values on the test dataset by GFL-SGL using MRI features. High correlation is observed for the ADAS score at each time point. (a) Baseline (ADAS BL R = 0.678). (b) Month 6 (ADAS M6 R = 0.657). (c) Month 12 (ADAS M12 R = 0.673). (d) Month 24 (ADAS M24 R = 0.693).

Figure 6

Scatter plots of the actual MMSE against the forecasted values on the test dataset by GFL-SGL using MRI features. High correlation is observed for the MMSE score at each time point. (a) Baseline (MMSE BL R = 0.574). (b) Month 6 (MMSE M6 R = 0.572). (c) Month 12 (MMSE M12 R = 0.609). (d) Month 24 (MMSE M24 R = 0.662).

4.3. Identification of MRI Biomarkers

In Alzheimer's disease research, in addition to improving the prediction of cognitive scores, researchers are interested in identifying which brain regions are most affected by the disease, which can help diagnose its early stages and characterize how it spreads. We therefore turn to the identification of MRI biomarkers. GFL-SGL is a group sparse framework capable of identifying a compact set of relevant neuroimaging biomarkers at the region level through the group Lasso over the features, which is expected to yield better interpretability at the brain-region level. Due to lack of space, we only show the top 30 ROIs for ADAS and MMSE, obtained from the regression weights of all ROIs in each hemisphere for the six time points, in Figure 7. The value of each item (i, j) in the heat map indicates the weight of the i-th ROI for the j-th time point and is calculated as the ℓ2-norm of the regression coefficients of the MRI features k belonging to the i-th ROI. The larger the absolute value of a coefficient, the more important the corresponding brain region is in predicting that cognitive score at that time point. The figure illustrates that the proposed GFL-SGL clearly yields sparse results across all time points, which demonstrates that these biomarkers are longitudinally important thanks to the smooth temporal regularization. We also observe that different time points share similar ROIs for these two cognitive measures, which indicates a strong correlation among the multiple tasks of score prediction at multiple time points.
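Assuming the ℓ2-norm aggregation described above, the heat map values can be computed from the coefficient matrix as follows (function name, shapes, and the ROI index encoding are illustrative):

```python
import numpy as np

def roi_heatmap(theta, roi_index):
    """Aggregate feature-level coefficients into ROI-level weights.

    theta: (p, t) coefficient matrix (p features, t time points).
    roi_index: length-p array mapping each feature to its ROI id.
    Returns an (n_rois, t) array whose (i, j) entry is the l2-norm of the
    coefficients of ROI i's features at time point j.
    """
    rois = np.unique(roi_index)
    return np.vstack([np.linalg.norm(theta[roi_index == r], axis=0) for r in rois])
```

Rows of the returned matrix can then be sorted by their overall magnitude to select the top ROIs.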
Figure 7

Longitudinal heat maps of regression coefficients generated by GFL-SGL for ADAS and MMSE using 10 trials on different splits of data. The larger the value is, the more important the ROI is. (a) ADAS. (b) MMSE.

Moreover, the top 30 selected MRI features and brain regions (ROIs) for ADAS and MMSE are shown in Table 5. We also show brain maps of the top ROIs, including cortical and subcortical ROIs, in Figures 8 and 9. Note that the top features and ROIs are obtained by computing the overall weights over the six time points. From the top 30 features, we can observe the group sparsity of the GFL-SGL model at the ROI level: owing to the group structure imposed on the features, many top features come from the same ROI, such as L.Hippocampus, L.MidTemporal, L.InfLatVent, and R.Entorhinal.
Table 5

Top 30 MRI features and ROIs selected by GFL-SGL for the prediction of the ADAS and MMSE measures.

Num. | ADAS features | ADAS groups | MMSE features | MMSE groups
1 | SV of L.HippVol | L.Hippocampus | SV of L.HippVol | L.Hippocampus
2 | TA of L.MidTemporal | L.MidTemporal | TA of L.MidTemporal | L.InfLatVent
3 | TA of R.Entorhinal | L.InfLatVent | TA of R.Entorhinal | L.MidTemporal
4 | CV of R.Entorhinal | R.Entorhinal | CV of R.Entorhinal | R.Entorhinal
5 | SV of L.InfLatVent | L.CerebellCtx | SV of L.InfLatVent | R.InfLatVent
6 | SV of L.CerebellCtx | L.Thalamus | CV of L.MidTemporal | L.InfParietal
7 | TA of L.InfTemporal | L.Pallidum | TA of L.InfParietal | CC_Ant
8 | TS of L.Parahipp | CC_Ant | SV of R.InfLatVent | WMHypoInt
9 | TA of R.InfParietal | R.InfParietal | TS of L.Parahipp | R.InfParietal
10 | CV of L.Precentral | L.Precentral | TS of R.RostAntCing | L.Parahipp
11 | TA of L.Precuneus | L.InfTemporal | TA of R.InfParietal | R.RostAntCing
12 | SV of L.ThalVol | L.Precuneus | CV of L.InfParietal | Brainstem
13 | TS of L.ParsTriang | R.Precentral | SV of CC_Ant | L.Supramarg
14 | SV of L.PallVol | L.Parahipp | CV of R.InfParietal | R.Precentral
15 | CV of R.Precentral | L.ParsTriang | SV of WMHypoInt | 4thVent
16 | SA of L.Supramarg | L.Supramarg | TA of R.TransvTemporal | R.TransvTemporal
17 | TA of L.Postcentral | L.Postcentral | TS of L.InfParietal | R.BanksSTS
18 | CV of R.InfParietal | CSF | SA of L.Supramarg | L.InfTemporal
19 | SV of CC_Ant | OpticChiasm | CV of L.Supramarg | L.Precentral
20 | SA of L.RostAntCing | L.TemporalPole | SA of R.InfParietal | R.Cuneus
21 | TA of R.Precentral | R.TransvTemporal | SA of L.MidTemporal | L.Amygdala
22 | CV of L.TemporalPole | L.LatOrbFrontal | TA of L.Parahipp | OpticChiasm
23 | CV of L.LatOrbFrontal | L.RostAntCing | TA of R.Precentral | L.MedOrbFrontal
24 | CV of R.TransvTemporal | R.CerebWM | TA of L.InfTemporal | L.IsthmCing
25 | TS of L.SupFrontal | R.SupParietal | SA of L.Parahipp | L.ParsOper
26 | TS of R.Parahipp | L.SupFrontal | CV of R.Precentral | L.CerebellCtx
27 | CV of R.SupParietal | R.AccumbensArea | CV of R.TransvTemporal | L.ParsTriang
28 | CV of L.MidTemporal | R.Cuneus | TA of R.BanksSTS | R.ParsOper
29 | TS of L.Precentral | 3rdVent | SA of L.InfParietal | L.Precuneus
30 | TS of L.InfTemporal | L.IsthmCing | CV of L.Precentral | R.Fusiform
Figure 8

Brain maps of the top 30 ROIs selected by GFL-SGL for ADAS. (a)–(d) are cortical ROIs selected; (e)–(g) are subcortical ROIs selected. (a) Left hemisphere (outside). (b) Left hemisphere (inside). (c) Right hemisphere (outside). (d) Right hemisphere (inside). (e) Coronal view. (f) Horizontal view. (g) Sagittal view.

Figure 9

Brain maps of the top 30 ROIs selected by GFL-SGL for MMSE. (a)–(d) are cortical ROIs selected; (e)–(g) are subcortical ROIs selected. (a) Left hemisphere (outside). (b) Left hemisphere (inside). (c) Right hemisphere (outside). (d) Right hemisphere (inside). (e) Coronal view. (f) Horizontal view. (g) Sagittal view.

Some important brain regions are also selected by our GFL-SGL, such as the middle temporal gyrus [20, 40-42], hippocampus [42], entorhinal cortex [20], inferior lateral ventricle [35, 43], and parahippocampal gyrus [44], which are highly relevant to cognitive impairment. These results are consistent with the established understanding of the pathological pathway of AD. These regions have been identified in the recent literature and shown to be highly correlated with clinical function. For instance, the hippocampus, situated in the temporal lobe of the brain, plays a role in memory and spatial navigation. The entorhinal cortex is among the first brain regions to be affected and is the most severely impaired cortex in Alzheimer's disease [45]. In addition, recent findings stress the significance of parahippocampal atrophy as an early biomarker of AD, since parahippocampal volume discriminates better than hippocampal volume between healthy aging, MCI, and mild AD, particularly in the early stage of the disease [44]. The findings also reveal that thinning of the inferior parietal lobule occurs early in the progression from normal cognition to MCI and is associated with neuropsychological performance [46].

4.4. Fusion of Multimodality

Clinical and research studies commonly demonstrate that complementary brain imaging modalities allow a more accurate and rigorous assessment of disease status and cognitive function. The previous experiments were conducted on MRI, which measures the structure of the cerebrum and has proven to be an effective tool for detecting the structural changes caused by AD or MCI. Fluorodeoxyglucose PET (FDG-PET), a technique for measuring glucose metabolism, can determine the likelihood of deterioration of mental status. Each neuroimaging modality offers valuable information, and biomarkers from different modalities can offer complementary information on different aspects of a given disease process [4, 14, 47–49]. Since the multimodality data of ADNI-1 have severe missingness, the samples from ADNI-2 are used instead. The PET imaging data are from the ADNI database processed by the UC Berkeley team, who segment and parcellate a native-space MRI scan for each subject with FreeSurfer to generate summary cortical and subcortical ROIs, coregister each florbetapir scan to the corresponding MRI, and calculate the mean florbetapir uptake within the cortical and reference regions. The image processing procedure is described at http://adni.loni.usc.edu/updated-florbetapir-av-45-pet-analysis-results/. Because the number of patients with MRI at M48 is small (29 subjects) and there are no PET data at M6, data from 4 time points were used. Furthermore, there is no score measure for FLU.ANIM and there are insufficient samples for FLU.VEG and TRAILS, so we use ADAS, MMSE, and RAVLT, for a total of 6 scores in this experiment. We followed the same experimental procedure as described in Section 4.1, which yields a total of n = 897 subjects at baseline; for the M12, M24, and M36 time points, the sample sizes are 671, 470, and 62, respectively.
To estimate the effect of combining multimodality data with our GFL-SGL method, and to provide a more comprehensive comparison of our group guided method against methods without group structure, we perform further experiments: (1) employing only the MRI modality, (2) employing only the PET modality, (3) combining two modalities, MRI and PET (MP), and (4) combining four modalities, MRI, PET, CSF, and demographic information including age, gender, years of education, and ApoE genotyping (MPCD). Note that, for the CSF modality, the original three measures (i.e., Aβ42, t-tau, and p-tau) are used directly as features without any feature selection step. We compare the performance of TGL, cFSGL, and GFL-SGL on the fused multimodalities for predicting disease progression measured by the clinical scores (ADAS-Cog, MMSE, and RAVLT). For TGL and cFSGL, the features from the multiple modalities are concatenated into long feature vectors, while for our GFL-SGL, the features from the same modality are treated as a group. The prediction performance results are shown in Table 6. It is clear that the methods using multiple modalities outperform the methods using a single modality, which validates our assumption that the complementary information among different modalities is helpful for cognitive function prediction. In particular, when two modalities (MRI and PET) are used, the performance improves significantly compared with using the unimodal (MRI or PET) information, and when four modalities (MRI, PET, CSF, and demographic information) are used, the performance improves further. With either two or four modalities, the proposed multitask learning GFL-SGL achieves better performance than TGL and cFSGL. This justifies the motivation of learning multiple tasks simultaneously while considering groups of variables, whether defined by ROI structure or by modality structure.
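Treating each modality as one group amounts to partitioning the concatenated feature indices. This minimal sketch assumes the features are stacked in MRI, PET, CSF, demographics order (the function name and PET feature count are illustrative; CSF has 3 measures and demographics 4, as above):

```python
import numpy as np

def modality_groups(p_mri, p_pet, p_csf=3, p_demo=4):
    """Build index arrays, one group per modality (the MPCD setting)."""
    sizes = [p_mri, p_pet, p_csf, p_demo]
    groups, start = [], 0
    for s in sizes:
        groups.append(np.arange(start, start + s))
        start += s
    return groups
```

These index groups would then be fed to the group penalty in place of the ROI-based grouping used in the MRI-only experiments.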
Table 6

Prediction performance results of six cognitive scores at four time points based on multimodality features.

Method | TGL | cFSGL | GFL-SGL | TGL | cFSGL | GFL-SGL
(Best results are marked with asterisks.)

Score: ADAS
Modality: MRI (columns 1-3), PET (columns 4-6)
nMSE | 4.5264 ± 0.6382 | 4.4109 ± 0.5918 | 4.6987 ± 0.7419 | 4.6438 ± 0.6733 | 4.4061 ± 0.6413 | 4.4294 ± 0.6974
wR | 0.6806 ± 0.0877 | 0.6806 ± 0.0853 | 0.6692 ± 0.0897 | 0.6792 ± 0.0716 | 0.6940 ± 0.0755 | 0.6997 ± 0.0842
BL rMSE | 6.3971 ± 1.1270 | 6.3670 ± 1.0509 | 6.5227 ± 1.3104 | 6.5252 ± 1.4146 | 6.3755 ± 1.3076 | 6.3614 ± 1.3953
M12 rMSE | 5.8845 ± 1.1703 | 5.8519 ± 1.0680 | 6.0360 ± 1.1920 | 6.0747 ± 1.2560 | 5.8881 ± 1.0507 | 5.8035 ± 1.0189
M24 rMSE | 5.2531 ± 0.9237 | 5.2829 ± 0.9966 | 5.4175 ± 0.9846 | 5.4755 ± 1.0622 | 5.2970 ± 1.0386 | 5.2901 ± 0.9836
M36 rMSE | 5.6362 ± 1.4445 | 4.5437 ± 1.5677 | 5.0457 ± 1.8708 | 4.4315 ± 1.8231 | 4.2938 ± 1.5977 | 5.0727 ± 2.0042

Score: ADAS
Modality: MP (columns 1-3), MPCD (columns 4-6)
nMSE | 4.3771 ± 0.8225 | 4.0380 ± 0.5282 | 3.8140 ± 0.7056 | 4.1169 ± 0.5791 | 3.9251 ± 0.4846 | *3.7255 ± 0.6441*
wR | 0.7140 ± 0.0756 | 0.7178 ± 0.0717 | 0.7400 ± 0.0942 | 0.7222 ± 0.0633 | 0.7267 ± 0.0633 | *0.7477 ± 0.0842*
BL rMSE | 6.1640 ± 1.0841 | 6.1365 ± 1.0937 | 5.9200 ± 1.0238 | 6.0910 ± 1.1327 | 6.0447 ± 1.1147 | *5.8632 ± 1.0630*
M12 rMSE | 5.6180 ± 0.9929 | 5.5713 ± 0.9659 | 5.2731 ± 0.7940 | 5.5036 ± 1.0018 | 5.5110 ± 0.9877 | *5.2172 ± 0.7874*
M24 rMSE | 5.3149 ± 0.9609 | 5.0187 ± 0.9873 | 4.7865 ± 0.7474 | 5.1299 ± 1.0265 | 4.9841 ± 1.0004 | *4.7442 ± 0.7253*
M36 rMSE | 6.1291 ± 1.7931 | 4.3765 ± 1.5989 | 5.1638 ± 1.6369 | 5.4648 ± 1.8165 | *4.2363 ± 1.2992* | 5.0341 ± 1.5739

Score: MMSE
Modality: MRI (columns 1-3), PET (columns 4-6)
nMSE | 1.9059 ± 0.3673 | 1.5544 ± 0.1589 | 1.5446 ± 0.1709 | 2.0863 ± 0.9497 | 1.8916 ± 0.4145 | 1.5699 ± 0.1326
wR | 0.4737 ± 0.1132 | 0.5449 ± 0.1009 | 0.5383 ± 0.1014 | 0.4988 ± 0.0906 | 0.5233 ± 0.0813 | 0.5270 ± 0.0843
BL rMSE | 1.9866 ± 0.2782 | 1.8715 ± 0.3253 | 1.9085 ± 0.3409 | 2.0294 ± 0.3362 | 1.9772 ± 0.3649 | 1.8974 ± 0.3600
M12 rMSE | 1.9969 ± 0.4054 | 1.7781 ± 0.2863 | 1.7843 ± 0.2764 | 1.9950 ± 0.4867 | 1.9040 ± 0.2725 | 1.8229 ± 0.2912
M24 rMSE | 1.8220 ± 0.4220 | 1.6044 ± 0.2843 | 1.5656 ± 0.3140 | 1.9738 ± 0.7546 | 1.7230 ± 0.2846 | 1.5950 ± 0.2976
M36 rMSE | 1.9900 ± 1.0439 | 1.6339 ± 0.5279 | 1.4005 ± 0.4729 | 1.9142 ± 0.8379 | 2.3950 ± 2.0477 | 1.3472 ± 0.4827

Score: MMSE
Modality: MP (columns 1-3), MPCD (columns 4-6)
nMSE | 1.7323 ± 0.3153 | 1.5056 ± 0.1055 | 1.4386 ± 0.1310 | 1.7428 ± 0.4059 | 1.5697 ± 0.2846 | *1.3881 ± 0.1132*
wR | 0.5128 ± 0.0950 | 0.5763 ± 0.0969 | 0.5743 ± 0.0996 | 0.5352 ± 0.0931 | *0.5961 ± 0.1005* | 0.5899 ± 0.0882
BL rMSE | 1.9714 ± 0.3393 | 1.8456 ± 0.3451 | 1.8486 ± 0.3119 | 1.9487 ± 0.3099 | 1.8780 ± 0.3308 | *1.8185 ± 0.3082*
M12 rMSE | 1.8040 ± 0.2961 | 1.7277 ± 0.2223 | 1.7144 ± 0.2366 | 1.8753 ± 0.4289 | 1.7615 ± 0.2122 | *1.6804 ± 0.2269*
M24 rMSE | 1.7497 ± 0.4408 | 1.5849 ± 0.2728 | 1.4954 ± 0.2847 | 1.7516 ± 0.4781 | 1.5927 ± 0.2903 | *1.4702 ± 0.2631*
M36 rMSE | 1.8481 ± 0.8608 | 1.5549 ± 0.5338 | 1.3110 ± 0.3568 | 1.6768 ± 0.8175 | 1.5542 ± 0.5532 | *1.2835 ± 0.4067*

Score: RAVLT.TOTAL
MRIPET
nMSE8.0525 ± 0.81857.7486 ± 0.81797.6082 ± 0.68607.9924 ± 0.58397.8193 ± 0.82317.6544 ± 0.7192
wR0.5989 ± 0.08630.6094 ± 0.08430.6091 ± 0.07900.6060 ± 0.08470.6003 ± 0.08790.6114 ± 0.0853
BL rMSE9.9809 ± 0.44399.8006 ± 0.45099.7251 ± 0.46889.8305 ± 0.57599.7743 ± 0.53249.6843 ± 0.5663
M12 rMSE9.8284 ± 0.66859.6394 ± 0.72349.5819 ± 0.78339.8347 ± 0.73669.7590 ± 0.89699.6180 ± 0.8630
M24 rMSE9.4549 ± 0.72439.2849 ± 0.77499.2384 ± 0.64259.8301 ± 0.93719.4391 ± 0.97159.4183 ± 0.9896
M36 rMSE9.3364 ± 2.20438.9823 ± 1.79968.5631 ± 1.94447.8316 ± 3.00208.6391 ± 2.96978.3481 ± 2.5665

Score: RAVLT.TOTAL
MPMPCD
nMSE7.4966 ± 0.97177.2046 ± 0.83267.0655 ± 1.14937.1461 ± 0.87366.7422 ± 1.0602 6.6873±0.6733
wR0.6350 ± 0.08780.6474 ± 0.08880.6484 ± 0.08800.6617 ± 0.0839 0.6785±0.07770.6749 ± 0.0631
BL rMSE9.6097 ± 0.52999.4845 ± 0.43469.4516 ± 0.67099.4001 ± 0.4595 9.1793±0.45059.2208 ± 0.5886
M12 rMSE9.6195 ± 0.89689.2463 ± 0.80429.1269 ± 0.81639.3194 ± 0.80208.8950 ± 0.8305 8.8526±0.6871
M24 rMSE9.1631 ± 1.04798.9473 ± 0.97518.7158 ± 0.97219.0133 ± 1.13898.7459 ± 1.2098 8.5530±0.8763
M36 rMSE7.7625 ± 2.43388.2418 ± 2.45187.9290 ± 2.6269 7.5097±2.05727.5204 ± 1.93607.8103 ± 1.9563

Score: RAVLT.TOT6
MRIPET
nMSE2.8868 ± 0.28222.6401 ± 0.30552.6064 ± 0.27272.8255 ± 0.24112.7979 ± 0.30452.6485 ± 0.2487
wR0.5500 ± 0.09470.5910 ± 0.09240.5944 ± 0.09030.5577 ± 0.08940.5599 ± 0.09310.5890 ± 0.0806
BL rMSE3.4094 ± 0.20203.2999 ± 0.19863.2783 ± 0.18903.3321 ± 0.16123.3348 ± 0.22273.2591 ± 0.2057
M12 rMSE3.3799 ± 0.25173.2278 ± 0.22363.2090 ± 0.22463.3612 ± 0.19263.3499 ± 0.21583.2439 ± 0.2019
M24 rMSE3.3710 ± 0.24933.1341 ± 0.31273.1229 ± 0.28963.3609 ± 0.38453.2712 ± 0.35923.2008 ± 0.3248
M36 rMSE3.1387 ± 0.98382.9726 ± 0.81342.9418 ± 0.83773.1407 ± 0.53653.2738 ± 0.58653.1430 ± 0.6435

Score: RAVLT.TOT6
MPMPCD
nMSE2.7875 ± 0.38572.4537 ± 0.36212.4498 ± 0.33772.6628 ± 0.35592.4224 ± 0.3332 2.3788±0.3278
wR0.5778 ± 0.10490.6233 ± 0.09780.6234 ± 0.09700.5975 ± 0.09240.6322 ± 0.0941 0.6385±0.0945
BL rMSE3.3512 ± 0.19913.1895 ± 0.23253.1879 ± 0.22413.2930 ± 0.12803.1716 ± 0.1708 3.1401±0.1625
M12 rMSE3.3006 ± 0.29323.1072 ± 0.23493.0973 ± 0.23673.2311 ± 0.30663.0928 ± 0.2264 3.0549±0.2382
M24 rMSE3.2935 ± 0.31802.9931 ± 0.35422.9989 ± 0.36443.2161 ± 0.33362.9747 ± 0.3850 2.9621±0.3640
M36 rMSE3.1544 ± 0.95842.8170 ± 0.62292.8485 ± 0.66322.8216 ± 0.9254 2.7400±0.59872.7649 ± 0.6387

Score: RAVLT.T30
MRIPET
nMSE3.0202 ± 0.26552.8297 ± 0.34032.7928 ± 0.34982.9929 ± 0.46702.9441 ± 0.40392.9405 ± 0.4265
wR0.5590 ± 0.06150.5861 ± 0.07420.5930 ± 0.07210.5551 ± 0.07870.5692 ± 0.08080.5663 ± 0.0780
BL rMSE3.5917 ± 0.23743.5116 ± 0.25143.4962 ± 0.25303.5711 ± 0.28893.5457 ± 0.30323.5681 ± 0.3175
M12 rMSE3.5622 ± 0.22493.4155 ± 0.22753.3880 ± 0.25503.4596 ± 0.32943.4806 ± 0.25653.4492 ± 0.2890
M24 rMSE3.5364 ± 0.21773.3545 ± 0.25713.3244 ± 0.24033.5668 ± 0.34353.4654 ± 0.30943.4536 ± 0.3149
M36 rMSE2.8940 ± 1.01972.8383 ± 1.11632.8168 ± 1.19703.2296 ± 1.09623.2096 ± 1.01703.2185 ± 1.1173

Score: RAVLT.T30
MPMPCD
nMSE2.8661 ± 0.45742.6653 ± 0.42712.6191 ± 0.40002.8335 ± 0.42352.6732 ± 0.5128 2.5605±0.3929
wR0.5811 ± 0.07770.6213 ± 0.08280.6241 ± 0.08060.5955 ± 0.07520.6229 ± 0.0907 0.6369±0.0820
BL rMSE3.5165 ± 0.29413.4327 ± 0.29173.4035 ± 0.28993.4891 ± 0.27103.4373 ± 0.2959 3.3650±0.2547
M12 rMSE3.4078 ± 0.34793.3028 ± 0.25903.2698 ± 0.28023.4073 ± 0.32563.3030 ± 0.3404 3.2280±0.2819
M24 rMSE3.4425 ± 0.27633.1974 ± 0.33073.1882 ± 0.28263.4122 ± 0.28223.1894 ± 0.4132 3.1638±0.3322
M36 rMSE2.9862 ± 1.17632.7979 ± 1.02622.7246 ± 1.03792.9526 ± 1.16592.7488 ± 1.0675 2.6291±0.9136

Score: RAVLT.RECOG
MRIPET
nMSE2.7632 ± 0.26342.6598 ± 0.22042.6003 ± 0.33792.8033 ± 0.53742.6547 ± 0.40512.6324 ± 0.4138
wR0.4662 ± 0.10140.4728 ± 0.10370.5035 ± 0.11580.4830 ± 0.15300.4955 ± 0.14260.5053 ± 0.1311
BL rMSE3.1427 ± 0.27873.1581 ± 0.33013.0909 ± 0.31853.1383 ± 0.33983.0983 ± 0.36263.0782 ± 0.3718
M12 rMSE3.0770 ± 0.41683.0286 ± 0.37952.9733 ± 0.39813.0581 ± 0.49462.9974 ± 0.38162.9664 ± 0.3816
M24 rMSE2.9060 ± 0.32372.7350 ± 0.28522.7274 ± 0.29012.9088 ± 0.27682.7838 ± 0.35482.7737 ± 0.3809
M36 rMSE2.5846 ± 0.74152.2824 ± 0.50622.4006 ± 0.58762.7997 ± 0.73572.5641 ± 0.39842.5969 ± 0.8513

Score: RAVLT.RECOG
MPMPCD
nMSE2.7116 ± 0.30452.5529 ± 0.41802.4729 ± 0.35492.7455 ± 0.26602.5057 ± 0.3914 2.4583±0.3645
wR0.4849 ± 0.11140.5279 ± 0.11930.5376 ± 0.11760.4926 ± 0.11360.5351 ± 0.1247 0.5419±0.1240
BL rMSE3.1454 ± 0.26303.0738 ± 0.34043.0246 ± 0.32623.1666 ± 0.28853.0441 ± 0.3272 3.0206±0.3336
M12 rMSE3.0863 ± 0.45392.9311 ± 0.38652.8867 ± 0.37953.0925 ± 0.44552.9022 ± 0.4169 2.8690±0.3839
M24 rMSE2.7858 ± 0.34002.6819 ± 0.28552.6470 ± 0.29542.8290 ± 0.36992.6632 ± 0.3016 2.6380±0.3253
M36 rMSE2.3208 ± 0.70002.2783 ± 0.48322.3415 ± 0.53952.3693 ± 0.6535 2.2978±0.45412.3546 ± 0.4869

Note that the best results are boldfaced. A paired t-test at the 0.05 level was used to assess whether GFL-SGL significantly outperformed each competing method on each score.
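The evaluation metrics reported in Table 6 can be sketched as follows. This is a minimal illustration assuming the definitions common in the multitask AD-progression literature (per-time-point rMSE, variance-normalized nMSE, and sample-size-weighted correlation wR); the paper's exact normalization may differ:

```python
import numpy as np

def rmse(y, yhat):
    """Root mean squared error for one time point (BL, M12, M24, or M36)."""
    return np.sqrt(np.mean((y - yhat) ** 2))

def nmse(Y, Yhat):
    """Normalized MSE over tasks: per-task squared error divided by the
    target variance, summed over tasks and scaled by the total sample count
    (one common multitask-learning definition, assumed here)."""
    num = sum(np.sum((Y[t] - Yhat[t]) ** 2) / np.var(Y[t]) for t in range(len(Y)))
    return num / sum(len(Y[t]) for t in range(len(Y)))

def weighted_r(Y, Yhat):
    """Sample-size-weighted correlation between predictions and targets."""
    num = sum(np.corrcoef(Y[t], Yhat[t])[0, 1] * len(Y[t]) for t in range(len(Y)))
    return num / sum(len(Y[t]) for t in range(len(Y)))

# Toy usage: four time points, 50 subjects each, noisy predictions.
rng = np.random.default_rng(0)
Y = [rng.normal(size=50) for _ in range(4)]
Yhat = [y + 0.1 * rng.normal(size=50) for y in Y]
```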

5. Conclusion

In this paper, we investigated the progression of longitudinal Alzheimer's disease (AD) by means of multiple cognitive scores and multimodality data. We proposed a multitask learning formulation with group guided regularization that can exploit the correlation among different time points and the importance of ROIs or of multiple modalities for predicting the cognitive scores. An alternating direction method of multipliers (ADMM) based algorithm was presented to efficiently tackle the associated optimization problem. Experiments comparing this model with baseline and temporal smoothness methods show that GFL-SGL offers consistently better performance than the other algorithms on both MRI features and multimodality data. In the current work, group guided information is considered only for each cognitive score separately, with multiple tasks corresponding to the same cognitive score across multiple time points. Moreover, the group structure used in this work is predefined; the model cannot automatically learn the feature groups. Since the cognitive scores measure the same underlying medical condition in different ways and the features have different structures, we expect that a more general group guided framework that learns group information automatically could be applied to all cognitive scores across all time points simultaneously. While the current study illustrates the power of the proposed method, we plan more extensive experiments to validate its effectiveness in future work. All of the regions processed by UCSF are used in this work; we will consider the medical background and screen these features. To compare the significance of performance differences among methods more effectively, we will randomly split the subjects into training and test sets, repeating the split many times to obtain enough scores for statistical analysis.
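To illustrate the kind of nonsmooth subproblem an ADMM solver must handle for this model, the proximal operator of a plain sparse group Lasso (SGL) penalty has a simple closed form: elementwise soft-thresholding followed by group-wise l2 shrinkage. This is a sketch of the standard SGL prox (the z-update in one ADMM iteration), not the authors' full GFL-SGL update, which additionally involves the fused Laplacian term:

```python
import numpy as np

def soft_threshold(v, t):
    """Elementwise soft-thresholding: prox of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def prox_sgl(v, group_ids, lam1, lam2):
    """Prox of lam1 * ||w||_1 + lam2 * sum_g ||w_g||_2 (standard SGL prox).

    First apply the l1 soft-threshold, then shrink each group toward zero
    by its l2 norm; groups whose norm falls below lam2 are zeroed entirely,
    which is what produces group-level sparsity.
    """
    z = soft_threshold(v, lam1)
    out = np.zeros_like(z)
    for g in np.unique(group_ids):
        idx = group_ids == g
        norm = np.linalg.norm(z[idx])
        if norm > lam2:
            out[idx] = (1.0 - lam2 / norm) * z[idx]
    return out

# Toy usage: two groups of two features each.
v = np.array([3.0, -0.5, 0.2, 4.0])
gid = np.array([0, 0, 1, 1])
w = prox_sgl(v, gid, lam1=1.0, lam2=1.0)   # -> [1.0, 0.0, 0.0, 2.0]
```

Within ADMM, this prox is applied once per iteration to the splitting variable, while the smooth least-squares part is handled by a linear-system solve.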