Literature DB >> 34262756

Fast Covariance Estimation for Multivariate Sparse Functional Data.

Abstract

Covariance estimation is essential yet underdeveloped for analyzing multivariate functional data. We propose a fast covariance estimation method for multivariate sparse functional data using bivariate penalized splines. The tensor-product B-spline formulation of the proposed method enables a simple spectral decomposition of the associated covariance operator and explicit expressions of the resulting eigenfunctions as linear combinations of B-spline bases, thereby dramatically facilitating subsequent principal component analysis. We derive a fast algorithm for selecting the smoothing parameters in covariance smoothing using leave-one-subject-out cross-validation. The method is evaluated with extensive numerical studies and applied to an Alzheimer's disease study with multiple longitudinal outcomes.

Entities: Chemical

Keywords: Bivariate smoothing; Covariance function; Functional principal component analysis; Longitudinal data; Multivariate functional data; Prediction

Year: 2020 PMID： 34262756 PMCID： PMC8276768 DOI： 10.1002/sta4.245

Source DB: PubMed Journal: Stat (Int Stat Inst) ISSN： 2049-1573

INTRODUCTION

Functional data analysis (FDA) has been enjoying great successes in many applied fields, for example, neuroimaging (Reiss & Ogden, 2010; Lindquist, 2012; Goldsmith, Crainiceanu, Caffo, & Reich, 2012; Zhu, Li, & Kong, 2012), genetics (Leng & Müller, 2006; Reimherr & Nicolae, 2014; 2016), and wearable computing (Morris et al., 2006; Xiao et al., 2015). Functional principal component analysis (FPCA) conducts dimension reduction on the inherently infinite‐dimensional functional data and thus facilitates subsequent modelling and analysis. Traditionally, functional data are densely observed on a common grid and can be easily connected to multivariate data, although the notion of smoothness distinguishes the former from the latter. In recent years, covariance‐based FPCA (Yao, Müller, & Wang, 2005) has become a standard approach and has greatly expanded the applicability of functional data methods to irregularly spaced data such as longitudinal data. Various nonparametric methods have now been proposed to estimate the smooth covariance function, for example, Peng and Paul (2009), Cai and Yuan (2010), Goldsmith et al. (2012), Xiao, Li, Checkley, and Crainiceanu (2018), and Wong and Zhang (2019). There has been growing interest in multivariate functional data where multiple functions are observed for each subject. For dense functional data, Ramsay and Silverman (2005, Chapter 8.5) proposed to concatenate multivariate functional data as a single vector and conduct multivariate PCA on the long vectors, and Berrendero, Justel, and Svarc (2011) repeatedly applied point‐wise univariate PCA. For sparse and paired functional data, Zhou, Huang, and Carroll (2008) extended the low‐rank mixed effects model in James, Hastie, and Sugar (2000). Chiou, Chen, and Yang (2014) considered normalized multivariate FPCA through standardizing the covariance operator. Petersen and Müller (2016) proposed various metrics for studying cross‐covariance between multivariate functional data. More recently, Happ and Greven (2018) introduced a FPCA framework for multivariate functional data defined on different domains. The interest of the paper is FPCA for multivariate sparse functional data, where multiple responses are observed at time points that vary from subject to subject and may even vary between responses within subjects. There are much fewer works to handle such data. The approach in Zhou et al. (2008) focuses on bivariate functional data and can be extended to more than two‐dimensional functional data, although model selection (e.g., selection of smoothing parameters) can be computationally difficult and convergence of the expectation–maximization estimation algorithm could also be an issue. The local polynomial method in Chiou et al. (2014) can be applied to multivariate sparse functional data, although a major drawback is the selection of multiple bandwidths. Moreover, because the local polynomial method is a local approach, there is no guarantee that the resulting estimates of covariance functions will lead to a properly defined covariance operator. The approach in Happ and Greven (2018) (denoted by mFPCA hereafter) estimates cross‐covariances via scores from univariate FPCA and hence can be applied to multivariate sparse functional data. Although mFPCA is theoretically sound for dense functional data, it may not capture cross‐correlations between functions because scores from univariate FPCA for sparse functional data are shrunk towards zero. We propose a novel and fast covariance‐based FPCA method for multivariate sparse functional data. Note that multiple auto‐covariance functions for within‐function correlations and cross‐covariance functions for between‐function correlations have to be estimated. Tensor‐product B‐splines are employed to approximate the covariance functions, and a smoothness penalty as in bivariate penalized splines (Eilers & Marx, 2003) is adopted to avoid overfit. Then, the individual estimates of covariance functions will be pooled and refined. The advantages of the new method are multifold. First, the tensor‐product B‐spline formulation is computationally efficient to handle multivariate sparse functional data. Second, a fast fitting algorithm for selecting the smoothing parameters will be derived, which alleviates the computational burden of conducting leave‐one‐subject‐out cross‐validation. Third, the tensor‐product B‐spline representation of the covariance functions enables a straightforward spectral decomposition of the covariance operator for the multivariate functional data; see Proposition 1. In particular, the eigenfunctions associated with the covariance operator are explicit functions of the B‐spline bases. Last but not the least, via a simple truncation step, the refined estimates of the covariance functions lead to a properly defined covariance operator. Compared with mFPCA, the proposed method does not rely on scores from univariate FPCA, which could be a severe problem for sparse functional data and hence could better capture the correlations between functions. And an improved correlation estimation will lead to improved subsequent FPCA analysis and curve prediction. The proposed method also compares favourably with the local polynomial method in Chiou et al. (2014) because of the computationally efficient tensor‐product spline formulation of the covariance functions and the derived fast algorithm for selecting the smoothing parameters. Moreover, as mentioned above, there exists an explicit and easy‐to‐calculate relationship between the tensor‐product spline representation of covariance functions and the associated eigenfunctions/eigenvalues, which greatly facilitate subsequent FPCA analysis. In addition to FPCA, there are also abundant literatures on models for multivariate functional data with most focusing on dense functional data. For clustering of multivariate functional data, see Zhu, Brown, and Morris (2012), Jacques and Preda (2014), Huang, Li, and Guan (2014), and Park and Ahn (2017). For regression with multivariate functional responses, see Zhu et al. (2012), Luo and Qi (2017), Li, Huang, and Zhu, (2017), Wong, Li, and Zhu (2019), Zhu, Morris, Wei, and Cox (2017), Kowal, Matteson, and Ruppert (2017), and Qi and Luo (2018). Graphical models for multivariate functional data are studied in (Zhu, Strawn, & Dunson, 2016) and Qiao, Guo, and James (2019). Works on multivariate functional data include also Chiou and Müller (2014, 2016). The remainder of the paper proceeds as follows. In Section 2, we present our proposed method. We conduct extensive simulation studies in Section 3 and apply the proposed method to an Alzheimer's disease (AD) study in Section 4. A discussion is given in Section 5. All technical details are enclosed in the Appendix.

METHODS

Fundamentals of multivariate functional principal component analysis

Let p be a positive integer and denote by a continuous and bounded domain in the real line . Consider the Hilbert space equipped with the inner product and norm such that for arbitrary functions and in with each element in , and . Let be a set of p random functions with each function in . Assume that the p‐dimensional vector has a p‐dimensional smooth mean function, . Define the covariance function as and . Then, the covariance operator associated with the kernel C(s,t) can be defined such that for any , the kth element of Γf is given by where C (s,t)=(C (s,t),…,C (s,t))⊤. Note that Γ is a linear, self‐adjoint, compact, and non‐negative integral operator. By the Hilbert–Schmidt theorem, there exists a set of orthonormal bases , , and , such that where d is the ℓth largest eigenvalue corresponding to Ψ . Then, the multivariate Mercer's theorem gives where . As shown in Saporta (1981), x(t) has the multivariate Karhunen–Loève representation, , where are the scores with and . The covariance operator Γ has the positive semidefiniteness property; that is, for any , the covariance function of ⊤ x, denoted by C (s,t), satisfies that for any sets of time points with an arbitrary positive integer q, the square matrix is positive semidefinite.

Covariance estimation by bivariate penalized splines

Suppose that the observed data take the form , where is the observed time point, is the observed kth response, n is the number of subjects, and m is the number of observations for subject i's kth response. The model is where , are random noises with zero means and variances and are independent across i,j, and k. The goal is to estimate the covariance functions C . We adopt a three‐step procedure. In the first step, empirical estimates of the covariance functions are constructed. Let be the residuals and be the auxiliary variables. Note that for 1 ≤ j 1 ≤ m ,1 ≤ j 2 ≤ m . Thus, is an unbiased estimate of whenever k≠k ′ or j 1≠j 2. In the second step, the noisy auxiliary variables are smoothed to obtain smooth estimates of the covariance functions. For smoothing, we use bivariate P‐splines (Eilers & Marx, 2003) because it is an automatic smoother and is computationally simple. In the final step, we pool all estimates of the individual covariance functions and use an extra step of eigendecomposition to obtain refined estimates of covariance functions. The refined estimates lead to a covariance operator that is properly defined, that is, positive semidefinite. In practice, the mean functions μ (s are unknown, and we estimate them using P‐splines (Eilers & Marx, 1996) with the smoothing parameters selected by leave‐one‐subject‐out cross‐validation; see the Supporting Information for details. Denote the estimates by . Let and , the actual auxiliary variables. The bivariate P‐splines model uses tensor‐product splines for 1 ≤ k,k ′ ≤ p. Specifically, , where is a coefficient matrix, {B 1(·),…,B (·)} is the collection of B‐spline basis functions in , and c is the number of equally spaced interior knots plus the order (degree plus 1) of the B‐splines. Because , it is reasonable to impose the assumption that so that G (s,t)=G (t,s). Therefore, in the rest of the section, we consider only k ≤ k ′. Let denote a second‐order differencing matrix such that for a vector , . Also let ‖·‖ be the Frobenius norm. For the cross‐covariance function with k For the auto‐covariance functions C (s,t) with k=1,…,p, we conduct bivariate covariance smoothing by enforcing the following constraint on the coefficient matrix Θ (Xiao et al., 2018): It follows that G (s,t) is a symmetric function. Then, the coefficient matrix Θ and the error variance are jointly estimated by and , which minimize the penalized least squares over all symmetric Θ and λ is a smoothing parameter. Note that the two penalty terms in Equation (4) become the same when Θ is symmetric, and thus, only one smoothing parameter is needed for auto‐covariance estimation.

Estimation

We first introduce the notation. Let vec(·) be an operator that stacks the columns of a matrix into a column vector and denote by ⊗ the Kronecker product. Fix k and k ′ with k ≤ k ′. Let be a vector of the coefficients and denotes the B‐spline base. Then, We now organize the auxiliary responses for each pair of k and k ′. Let , and , where is the total number of auxiliary responses for the pair of k and k ′. As for the B‐splines, let , , and . For estimation of the cross‐covariance functions C with k For estimation of the auto‐covariance functions, because of the constraint on the coefficient matrix in Equation (5), let be a vector obtained by stacking the columns of the lower triangle of Θ , and let be a duplication matrix such that =G η (Seber, 2008, p. 246). Let and . Finally, let with . It follows that the penalized least squares in Equation (6) can be rewritten as where and . Therefore, we obtain It follows that , and the estimate of the auto‐covariance function C (s,t) is . The above estimates of covariance functions may not lead to a positive semidefinite covariance operator and thus have to be refined. We pool all estimates together, and we shall use the following proposition. Assume that . Let and assume that G is positive definite (Zhou, Shen, & Wolfe, 1998). Then, admits the spectral decomposition, , where d is the ℓth largest eigenvalue of the covariance operator Γ, and is the associated eigenvector with and such that . The proof is provided in Appendix A. Proposition 1 implies that, with the tensor‐product B‐spline representation of the covariance functions, one spectral decomposition gives us the eigenvalues and eigenfunctions. In particular, the eigenfunctions are linear combinations of the B‐spline basis functions, which means that they can be straightforwardly evaluated, an advantage of spline‐based methods compared with other smoothing methods for which eigenfunctions are approximated by spectral decompositions of the covariance functions evaluated at a grid of time points. Once we have , the estimate of the coefficient matrix Θ , the spectral decomposition of gives us estimates and . We discard negative to ensure that the multivariate covariance operator is positive semidefinite, and this leads to a refined estimate of the coefficient matrix Θ , . Then, the refined estimate of the covariance functions is . Proposition 1 also suggests that the eigenfunctions can be estimated by . For principal component analysis or curve prediction in practice, one may select further the number of principal components by either the proportion of variance explained (PVE) (Greven, Crainiceanu, Caffo, & Reich, 2010) or an Akaike information criterion‐type criterion (Li, Wang, & Carroll, 2013). Here, we follow Greven et al. (2010) using PVE with a value of 0.99.

Selection of smoothing parameters

We select the smoothing parameters in each auto‐covariance/cross‐covariance estimation using leave‐one‐subject‐out cross‐validation; see, for example, Yao et al. (2005) and Xiao et al. (2018). A fast approximate algorithm for the auto‐covariance has been derived in Xiao et al. (2018). So we focus on the cross‐covariance and use the notation in Equation (7). Note that there are two smoothing parameters for each cross‐covariance estimation. For simplicity, we suppress the superscript and subscript kk ′ in Equation (7) for both and B. Let be the prediction of the auxiliary responses from the estimate using data without the ith subject. Let ‖·‖ be the Euclidean norm, and the cross‐validation error is We shall also now suppress the subscript k from m and kk ′ from N . Let , , and . Then, a shortcut formula for Equation (8) is Similar to Xu and Huang (2012) and Xiao et al. (2018), the iCV can be further simplified by adopting the approximation , which results in the generalized cross‐validation, denoted by iGCV, Although iGCV is much easier to compute than iCV, the formula in Equation (9) is still computationally expensive to compute. Indeed, the smoother matrix S is of dimension 2,500×2,500 if n=100 and m =m=5 for all i. Thus, we need to further simplify the formula. Let , , , , , and . Also let , , and . Then, Equation (9) can be simplified as Note that Σ has two smoothing parameters. Following Wood (2000), we use an equivalent parameterization , where ρ=λ 1+λ 2 represents the overall smoothing level and w=λ 1 ρ −1∈[0,1] is the relative weight of λ 1. We conduct a two‐dimensional grid search of (ρ,w), as follows. For a given w, let Udiag(s)U ⊤ be the eigendecompsition of , where is an orthonormal matrix and is the vector of eigenvalues. Then, with . Let ⊙ stand for the point‐wise multiplication. Then, where , , , , and . The proof is provided in Appendix A. For each w, note that only depends on ρ and needs to be calculated repeatedly, and all other terms need to be calculated only once. The entire algorithm is presented in Algorithm 1. We give an evaluation of the complexity of the proposed algorithm. Assume that m =m for all i. The first initialization (Step 1) requires O(nm 2 c 2+nc 4+c 6) computations. For each w, the second initialization (Step 2) also requires computations. For each ρ, Steps 3–8 require O(nc 4) computations. Therefore, the formula in Proposition 2 is most efficient to calculate for sparse data with small numbers of observations per subject; that is, m s are small.

Prediction

For prediction, assume that the smooth curve x (t) is generated from a multivariate Gaussian process. Suppose that we want to predict the ith multivariate response x (t) at {s ,…,s } for m ≥ 1. Let be the vector of observations at for the kth response. Let be the vector of the kth mean function at the observed time points. Let and . Let be the vector of mean functions at the time points for prediction. It follows that Thus, we obtain Let and . Next, let and . Then, Cov(y ) given by , and Cov(x ) and Cov(y ,x ) are given by and , respectively. Let . Plugging in the estimates, we predict x by where is the estimate of , is the estimate of , . An approximate covariance matrix for is Therefore, a 95% point‐wise confidence interval for the kth response is given by where can be extracted from the diagonal of . Finally, we predict the first L ≥ 1 scores =(ξ ,…,ξ )⊤ for the ith subject. Note that . With a similar derivation as above, x (t)− (t) can be predicted by . By Proposition 1, the eigenfunctions are estimated by , and thus, . It follows that

SIMULATIONS

We evaluate the finite sample performance of the proposed method (denoted by mFACEs) against mFPCA via a synthetic simulation study and a simulation study mimicking the Alzheimer's Disease Neuroimaging Initiative (ADNI) data in the real data example. Here, we report the details and results of the former as the conclusions remain the same for the latter, and details are provided in the Supporting Information.

Simulation settings and evaluation criteria

We generate data by model (3) with p=3 responses. The mean functions are . We first specify the auto‐covariance functions. Let , , and . Also let Then, the auto‐covariance functions are . For the cross‐covariance functions, let for k≠k ′, where ρ∈[0,1] is a parameter to be specified. The induced covariance operator from the above specifications is proper; see Lemma 1 in Appendix B. It is easy to derive that the absolute value of cross‐correlation is bounded by ρ. Hence, ρ controls the overall level of correlation between responses: if ρ=0, then the responses are uncorrelated from each other. The eigendecomposition of the multivariate covariance function gives nine nonzero eigenvalues with associated multivariate eigenfunctions; hence, for ℓ=1,…,9, we simulate the scores ξ from , where d are the induced eigenvalues. Next, we simulate the white noises from , where is determined according to the signal‐to‐noise ratio . Here, we let SNR=2. For each response, the sampling time points are drawn from a uniform distribution in the unit interval, and the number of observations for each subject, m , is generated from a uniform discrete distribution on {3,4,5,6,7}. Thus, the sampling points vary not only from subject to subject but also across responses within each subject. We use a factorial design with two factors: the number of subjects n and the correlation parameter ρ. We let n=100,200, or 400. We let ρ=0.5, which corresponds to a weak correlation between responses as the average absolute correlation between responses is only 0.36. Another value of ρ is 0.9, which corresponds to a moderate correlation between responses as the average absolute correlation between responses is about 0.50. In total, we have six model conditions, and for each model condition, we generate 200 datasets. To evaluate the prediction accuracy of the various methods, we draw 200 additional subjects as testing data. The true correlation functions and a sample of the simulated data are shown in the Supporting Information. We compare mFACEs and mFPCA in terms of estimation accuracy of the covariance functions, the eigenfunctions and eigenvalues, and prediction of new subjects. For covariance function estimation, we use the relative integrated square errors (RISE). Let be an estimate of , and then RISE are given by For estimating the ℓth eigenfunction, we use the integrated square errors (ISE), which are defined as Note that the range of ISE is [0,2]. For estimating the eigenvalues, we use the ratio of the estimate against the truth; that is, . For predicting new curves, we use the mean integrated square errors (MISE), which are given by For the curve prediction using mFPCA, we truncate the number of principal components using a PVE of 0.99. It is worth noting that if no truncation is adopted, then the curve prediction using mFPCA reduces to curve prediction using univariate FPCA. We shall also consider the conditional expectation method based on the estimates of covariance functions from mFPCA. The method is denoted by mFPCA(CE), and its difference with mFACEs is that different estimates of covariance functions are used.

Simulation results

Figure 1 gives boxplots of RISEs of mFACEs and mFPCA for estimating covariance functions. Under all model conditions, mFACES outperforms mFPCA, and the improvement in RISEs as the sample size increases is much more pronounced for mFACEs. Under the model conditions with moderate correlations (ρ=0.9), the advantage of mFACEs is substantial even for the small sample size n=100.

Figure 1

Boxplots of relative integrated square errors (RISEs) of mFACEs and mFPCA for estimating the covariance function

Boxplots of relative integrated square errors (RISEs) of mFACEs and mFPCA for estimating the covariance function Figures 2 and 3 give boxplots of ISEs and violin plots of mFACEs and mFPCA for estimating the top two eigenfunctions and eigenvalues, respectively. The top two eigenvalues account for about 60% of the total variation in the functional data for ρ=0.5, and it is 80% for ρ=0.9. Figure 2 shows that although the two methods are overall comparable for estimating the first eigenfunction, mFACEs has a much better accuracy for estimating the second eigenfunction than mFPCA. The violin plots in Figure 3 show that mFACEs outperforms mFPCA substantially for estimating both eigenvalues under all model conditions. The mFPCA always underestimates the eigenvalues as the variation of scores from univariate FPCA is smaller than the true variation and hence leads to underestimates of eigenvalues.

Figure 2

Boxplots of integrated square errors (ISEs) of mFACEs and mFPCA for estimating the top two eigenfunctions

Figure 3

Violin plots of mFACEs and mFPCA for estimating the top two eigenvalues. The red horizontal lines indicate that the estimates are equal to the truth

Boxplots of integrated square errors (ISEs) of mFACEs and mFPCA for estimating the top two eigenfunctions Violin plots of mFACEs and mFPCA for estimating the top two eigenvalues. The red horizontal lines indicate that the estimates are equal to the truth Finally, we consider the prediction of new subjects by mFACEs, mFPCA, and mFPCA(CE). We define the relative efficiencies of different methods as the ratios of MISEs with respect to those of univariate FPCA; see Figure 4. Univariate FPCA is implemented in the R package face (Xiao, Li, Checkley, & Crainiceanu, 2018). We have the following findings. Under all model conditions, mFACEs has the smallest MISE, mFPCA(CE) has the second best performance, and mFPCA is close to univariate FPCA. Thus, on average, mFACEs provides the most accurate curve prediction. These results indicate that (a) mFACEs has better covariance estimation than mFPCA(CE) and so is the prediction based on it; (b) compared with mFPCA/univariate FPCA, mFPCA(CE) exploits the correlation information and hence results in better predictions.

Figure 4

Boxplots of relative efficiency of three methods for curve prediction. The grey horizontal lines indicate the mean integrated square errors (MISEs) for univariate functional principal component analysis (FPCA)

APPLICATION TO ALZHEIMER'S DISEASE STUDY

The ADNI is a two‐stage longitudinal observational study launched in year 2003 with the primary goal of investigating whether serial neuroimages, biological markers, clinical and neuropsychological assessments can be combined to measure the progression of AD (Weiner et al., 2017). The ADNI‐1 data from the first stage contain 379 patients with amnestic mild cognitive impairment (MCI, a risk state for AD) at baseline who had at least one follow‐up visit. Participants were assessed at baseline, 6, 12, 18, 24, and 36 months with additional annual follow‐ups included in the second stage of the study. At each visit, various neuropsychological assessments, clinical measures, and brain images were collected. The ADNI‐2 data include 424 additional patients suffering from MCI and significant memory concern, with at least one follow‐up visit and longitudinal data collected over 4 years. Thus, for the combined data, the total number of subjects is 803, and the average number of visits is 4.72. The data are publicly available at http://ida.loni.ucla.edu/. We consider five longitudinal markers commonly measured in studies of AD with strong comparative predictive value (Li, Chan, Doody, Quinn, & Luo, 2017). Among the five markers, Disease Assessment Scale‐Cognitive 13 items (ADAS‐Cog 13), Rey Auditory Verbal Learning Test immediate recall (RAVLT.imme), Rey Auditory Verbal Learning Test learning curve (RAVLT.learn), and Mini‐Mental State Examination (MMSE) are neuropsychological assessments. Functional Assessment Questionnaire (FAQ) is a functional and behavioural assessment. High values of ADAS‐Cog 13 and FAQ indicate a high‐risk state for AD, whereas low values of RAVLT.imme, RAVLT.learn, and MMSE reflect severe cognitive impairment. The longitudinal trajectories in ADNI‐1 and ADNI‐2 are defined on the same time domain with the largest follow‐up time of 96 months from the start of ADNI‐1 (Time 0).

Multivariate FPCA via mFACEs

We analyse the five longitudinal biomarkers using mFACEs. For better visualization, we plot in Figure 5 the estimated correlation functions . The plot indicates two groups of biomarkers, ADAS‐Cog 13 and FAQ, in one group and RAVLT.imme, RAVLT.learn, and MMSE in another group. The biomarkers within the groups are positively correlated and negatively correlated between groups, which makes sense as high values of ADAS‐Cog 13 and FAQ and low values for the other biomarkers suggest of AD. Next, we display in Figure 6 the two estimated (multivariate) eigenfunctions associated with the top two estimated eigenvalues, which account for 69% and 11% of the total variance in the functional part of the data. The eigenfunctions reveal how the five biomarkers covary and how a subject's trajectories of biomarkers deviate from the population mean. Indeed, we see from Figure 6 that the first eigenfunction (solid curves) is below the zero line for ADAS‐Cog 13 and FAQ and above the zero line for the other three biomarkers. This means that the score corresponding to the first eigenfunction might be used as an indicator of AD. Indeed, a negative score for the first eigenfunction means higher‐than‐population‐mean values of the former whereas lower‐than‐population‐mean values of the latter, indicating more severe AD status. The second eigenfunction (dashed curves) for the five biomarkers is below the zero line at first and then above it or the other way around, potentially suggesting of a longitudinal pattern of the AD progression. Specifically, these subjects with a positive score for the second eigenfunction will have higher ADAS‐Cog 13/FAQ and lower RAVLT and MMSE over the months, suggesting of AD progression. Finally, we illustrate in Figure 7 the predicted curves along with the associated 95% point‐wise confidence bands for three subjects. We focus on predicting the trajectories over the first 4 years as there are more observations. We can see that the confidence bands are getting wider at the later time points because of fewer observations.

Figure 5

Figure 6

Estimated top two eigenfunctions for the longitudinal markers. ADAS‐Cog 13, Disease Assessment Scale‐Cognitive 13 items; FAQ, Functional Assessment Questionnaire; MMSE, Mini‐Mental State Examination; RAVLT.imme, Rey Auditory Verbal Learning Test immediate recall; RAVLT.learn, Rey Auditory Verbal Learning Test learning curve

Figure 7

Predicted subject‐specific curves (red solid line) of the longitudinal markers and associated 95% point‐wise confidence bands (blue dashed line) for three subjects. ADAS‐Cog 13, Disease Assessment Scale‐Cognitive 13 items; FAQ, Functional Assessment Questionnaire; MMSE, Mini‐Mental State Examination; RAVLT.imme, Rey Auditory Verbal Learning Test immediate recall; RAVLT.learn, Rey Auditory Verbal Learning Test learning curve

Estimated correlation functions for the longitudinal markers. ADAS‐Cog 13, Disease Assessment Scale‐Cognitive 13 items; FAQ, Functional Assessment Questionnaire; MMSE, Mini‐Mental State Examination; RAVLT.imme, Rey Auditory Verbal Learning Test immediate recall; RAVLT.learn, Rey Auditory Verbal Learning Test learning curve Estimated top two eigenfunctions for the longitudinal markers. ADAS‐Cog 13, Disease Assessment Scale‐Cognitive 13 items; FAQ, Functional Assessment Questionnaire; MMSE, Mini‐Mental State Examination; RAVLT.imme, Rey Auditory Verbal Learning Test immediate recall; RAVLT.learn, Rey Auditory Verbal Learning Test learning curve Predicted subject‐specific curves (red solid line) of the longitudinal markers and associated 95% point‐wise confidence bands (blue dashed line) for three subjects. ADAS‐Cog 13, Disease Assessment Scale‐Cognitive 13 items; FAQ, Functional Assessment Questionnaire; MMSE, Mini‐Mental State Examination; RAVLT.imme, Rey Auditory Verbal Learning Test immediate recall; RAVLT.learn, Rey Auditory Verbal Learning Test learning curve

Comparison of prediction performance of different methods

We compare the proposed mFACEs with mFPCA and mFPCA(CE) for predicting the five longitudinal biomarkers. The prediction performance is evaluated by the average squared prediction errors (APE), where is the predicted value of the kth biomarker for the ith subject at time . We conduct two types of validation: an internal validation and an external validation. For the internal validation, we perform a 10‐fold cross‐validation to the combined data of ADNI‐1 and ADNI‐2. For the external validation, we fit the model using only the ADNI‐1 data and then predict ADNI‐2 data. Figure 8 summarizes the results. For simplicity, we present the relative efficiency of APE, which is the ratio of APEs of one method against the mFPCA. In both cases, mFACEs achieves better prediction accuracy than competing methods. Note that mFPCA(CE) outperforms mFPCA for predicting almost all biomarkers. The results suggest that (a) mFACEs is better than competing methods for analysing the longitudinal biomarkers. and (b) exploiting the correlations between the biomarkers improves prediction.

Figure 8

The internal and external prediction validations for the Alzheimer's Disease Neuroimaging Initiative longitudinal makers. ADAS‐Cog 13, Disease Assessment Scale‐Cognitive 13 items; FAQ, Functional Assessment Questionnaire; MMSE, Mini‐Mental State Examination; RAVLT.imme, Rey Auditory Verbal Learning Test immediate recall; RAVLT.learn, Rey Auditory Verbal Learning Test learning curve

DISCUSSION

The prevalence of multivariate functional data has sparked much research interests in recent years. However, covariance estimation for multivariate sparse functional data remains underdeveloped. We proposed a new method, mFACEs, and its features include the following: (a) A covariance smoothing framework is proposed to tackle multivariate sparse functional data; (b) an automatic and fast fitting algorithm is adopted to ensure the scalability of the method; (c) eigenfunctions and eigenvalues can be obtained through a one‐time spectral decomposition, and eigenfunctions can be easily evaluated at any sampling points; and (d) a multivariate extension of the conditional expectation approach (Yao et al., 2005) is derived to exploit correlations between outcomes. The simulation study and the data example showed that mFACEs could better capture between‐function correlations and thus gives improved principal component analysis and curve prediction. When the magnitude of functional data are quite different, one may first normalize the functional data, as recommended by Chiou et al. (2014). One method of normalization is to rescale the functional data using the estimated variance function as in Chiou et al. (2014) and Jacques and Preda (2014). An alternative method is to use a global rescaling factor like as in Happ and Greven (2018). Both methods can be easily incorporated into our proposed method. In our data analysis, we find that the results with normalization are very close to those without normalization; thus, we present the results without normalization. Because multivariate FPCA is more complex than univariate FPCA, weak correlations between the functions and small sample size may offset the benefit of conducting multivariate FPCA; see Section 7.3 in Wong et al. (2019). Thus, it is of future interest to develop practical tests to determine if correlations between multivariate functional data are different from 0. The mFACEs method has been implemented in an R package mfaces and will be submitted to CRAN for public access.

DATA AVAILABILITY STATEMENT

Data used in preparation of this article were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database (http://adni.loni.usc.edu). As such, the investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in analysis or writing of this paper. A complete listing of ADNI investigators can be found at: http://adni.loni.usc.edu/wp‐content/uploads/how_to_apply/ADNI_Acknowledgement_List.pdf. The Supporting Information contains additional details and results of a simulation that mimics the ADNI data. R package mfaces (https://github.com/cli9/mfaces) contains an open‐source implementation of the proposed method described in the article. Demo zip file contains codes to demonstrate the proposed method with a simulated dataset in Section 3. Supporting info item Click here for additional data file.

19 in total

1. Functional generalized linear models with images as predictors.

Authors: Philip T Reiss; R Todd Ogden
Journal: Biometrics Date: 2009-05-08 Impact factor: 2.571

2. Clustering multivariate functional data with phase variation.

Authors: Juhyun Park; Jeongyoun Ahn
Journal: Biometrics Date: 2016-05-24 Impact factor: 2.571

3. Longitudinal functional principal component analysis.

Authors: Sonja Greven; Ciprian Crainiceanu; Brian Caffo; Daniel Reich
Journal: Electron J Stat Date: 2010 Impact factor: 1.125

4. Longitudinal Penalized Functional Regression for Cognitive Outcomes on Neuronal Tract Measurements.

Authors: Jeff Goldsmith; Ciprian M Crainiceanu; Brian Caffo; Daniel Reich
Journal: J R Stat Soc Ser C Appl Stat Date: 2012-01-05 Impact factor: 1.864

5. Robust classification of functional and quantitative image data using functional mixed models.

Authors: Hongxiao Zhu; Philip J Brown; Jeffrey S Morris
Journal: Biometrics Date: 2012-06-06 Impact factor: 2.571

6. Prediction of Conversion to Alzheimer's Disease with Longitudinal Measures and Time-To-Event Data.

Authors: Kan Li; Wenyaw Chan; Rachelle S Doody; Joseph Quinn; Sheng Luo
Journal: J Alzheimers Dis Date: 2017 Impact factor: 4.472

7. Multivariate functional response regression, with application to fluorescence spectroscopy in a cervical pre-cancer study.

Authors: Hongxiao Zhu; Jeffrey S Morris; Fengrong Wei; Dennis D Cox
Journal: Comput Stat Data Anal Date: 2017-02-15 Impact factor: 1.681

8. Using Wavelet-Based Functional Mixed Models to Characterize Population Heterogeneity in Accelerometer Profiles: A Case Study.

Authors: Jeffrey S Morris; Cassandra Arroyo; Brent A Coull; Louise M Ryan; Richard Herrick; Steven L Gortmaker
Journal: J Am Stat Assoc Date: 2006-12-01 Impact factor: 5.033