
Fixed-effects inference and tests of correlation for longitudinal functional data.

Ruonan Li1, Luo Xiao1, Ekaterina Smirnova2, Erjia Cui3, Andrew Leroux4, Ciprian M Crainiceanu3.   

Abstract

We propose an inferential framework for fixed effects in longitudinal functional models and introduce tests for the correlation structures induced by the longitudinal sampling procedure. The framework provides a natural extension of standard longitudinal correlation models for scalar observations to functional observations. Using simulation studies, we compare fixed effects estimation under correctly and incorrectly specified correlation structures and also test the longitudinal correlation structure. Finally, we apply the proposed methods to a longitudinal functional dataset on physical activity. The computer code for the proposed method is available at https://github.com/rli20ST758/FILF.
© 2022 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd.

Keywords:  accelerometry data; covariance function; hypothesis test; mixed effects model

Year:  2022        PMID: 35491388      PMCID: PMC9283332          DOI: 10.1002/sim.9421

Source DB:  PubMed          Journal:  Stat Med        ISSN: 0277-6715            Impact factor:   2.497


INTRODUCTION

Due to technological advancement, longitudinal functional data have become common in scientific studies. For example, objective physical activity measurements obtained from wearable accelerometers usually consist of multiple daily profiles measured at every minute of the day for hundreds of study participants at a number of visits. This type of data structure gives rise to longitudinal functional data, which resemble traditional longitudinal data but have a function instead of a scalar as the basic unit of observation. The average activity measured at multiple visits gives rise to standard longitudinal data, whereas minute-by-minute activity profiles observed at multiple visits give rise to longitudinal functional data. Another example is diffusion tensor imaging (DTI), where imaging data along brain tracts were collected for patients with multiple sclerosis and healthy controls at a number of visits. The physical activity measurements and the DTI data are usually collected at sparse longitudinal visits with densely observed functions, which is the data structure this article focuses on. This new and practical data structure requires specialized methodological tools to address scientific problems using the original structure of the data. Longitudinal functional data often exhibit complex within- and between-visit correlations. To accommodate such correlations, various functional mixed effects models have been proposed. These approaches, after data transformation or projections, allow inference on the fixed effects parameters using the linear mixed effects (LME) inferential machinery. Alternatively, bootstrap of study participants can be used to construct confidence bands and conduct hypothesis tests for fixed effect functions. In spite of the growing interest in this area of research, modeling longitudinal functional data continues to be daunting.
On the one hand, it is often difficult to specify within-visit correlation structures that account for the continuity of the functional domain with dense observations. On the other hand, the between-visit correlations may be modeled flexibly using structured specifications that represent the known and usually relatively sparse longitudinal sampling structure (eg, nesting of functions within individuals). With these considerations in mind, we propose a unified inferential model framework that builds on the marginal decomposition model proposed by Park and Staicu, which focused on modeling the covariance of longitudinal functional data but did not consider fixed effects inference or tests of correlations. The marginal decomposition model used longitudinal-time-invariant, data-adaptive eigenfunctions to model within-visit correlations and a sequence of bivariate covariance functions (either parametric or nonparametric) to model the longitudinal correlations, and is computationally simpler than the approach in Chen and Müller, which modeled longitudinal functional data with a four-dimensional nonparametric covariance function. For longitudinal functional data with sparse longitudinal visits and densely observed functions, it might be sufficient, and even preferable, to model the longitudinal correlation with a simple parametric specification. More importantly, the separation of within-subject correlation and longitudinal correlation in the marginal decomposition model not only allows us to compare fixed-effects inference under different longitudinal correlation structures but also enables us to compare different longitudinal correlation structures via hypothesis testing; these are the major contributions of this article. For the latter, we conduct multiple tests on the correlation structure of the sequence of bivariate covariance functions for modeling the longitudinal correlations in the above marginal decomposition model.
For each test, we use the bootstrap-based approach for covariance testing proposed by Chen et al, which compared parametric covariance functions against a nonparametric covariance function for modeling traditional longitudinal data. To the best of our knowledge, no existing test of the longitudinal correlations has been developed for longitudinal functional data. Moreover, the test in Chen et al is the only available test with an unspecified nonparametric alternative that can be applied to longitudinal data observed at irregular time points; see our discussion in Section 3 on alternative tests that are applicable only in limited scenarios. In terms of fixed-effects inference, the proposed framework may compare favorably against the functional additive model proposed by Scheipl et al, which only considered multilevel functional models with multiple covariance functions. It is worth noting that the aforementioned model can be computationally challenging to fit when using its spline-based formulation of functional random effects. Bootstrap approaches such as the one in Park et al are usually computationally demanding, even for small datasets. In addition, as shall be shown in our simulations, simple bootstrapping of subjects may give inefficient inference when subjects have multiple visits exhibiting within-visit correlations. In summary, the contributions of this article are, for longitudinal functional data, to: (a) generalize standard longitudinal correlation models; (b) introduce testing for longitudinal correlation structures; and (c) study the effect of correctly and incorrectly specified correlation structures on fixed-effects inference. The article is organized as follows. In Section 2, we introduce the general functional mixed effects model and the specific longitudinal correlation structures for modeling longitudinal functional data. In Section 3, we address the problem of testing the longitudinal correlation structure.
Section 4 assesses the properties of the statistical methods proposed using simulations. Section 5 applies our methods to real data. We conclude the article with a discussion in Section 6. The computer code for the proposed method and an associated RMD file with demonstration details are available at https://github.com/rli20ST758/FILF.

FUNCTIONAL MIXED EFFECTS MODELS

Let be a compact interval and the index be called the functional index. Let denote the observed response variable at for subject , , at the th, , visit that occurred at time . Denote by the vector of covariates for subject at visit , and the matrix of covariates for subject . Some of the covariates may depend on the visit number and some may not, but we use the index for all variables to simplify notation. For one data application on physical activity, the covariates are age, body mass index (BMI), sex, and weekend, and thus . Let , where is a fixed grid in . Let be the ‐dimensional column vector obtained by vectorizing all observations for subject . We denote by the conditional mean of the functional process given the covariates and assume that where are smooth, unknown functions of . Using the approach in Park and Staicu, we decompose the functional residuals, , where and is white noise with zero mean and variance . The functions form an orthonormal basis and are associated random coefficients. We assume that are normally distributed with zero mean and covariance and are uncorrelated over and . Then the covariance of the process is a four‐dimensional operator with the following structure We also assume that the sampling visits are independent over and and have a smooth probability density function, . Then the basis functions are eigenfunctions of the marginal covariance operator where are eigenvalues and are assumed to be ordered decreasingly, that is, , and the random coefficients satisfy . Standard longitudinal models using working covariance often assume either independence, exchangeability, or leave them unspecified. A natural generalization of these assumptions to longitudinal functional data is via the specification of the working covariance, , for every eigenfunction, , indexed by . 
The continuity of the functional domain is accounted for by within-visit correlation structures, and the longitudinal sampling structure is represented by between-visit correlations. We propose to model the within-visit covariance nonparametrically and the between-visit covariance using approaches inspired by standard and functional longitudinal data analysis. The rationale is that the functions are often densely observed and exhibit complex correlation structure, for which a nonparametric approach provides the needed flexibility, while the longitudinal sampling visits are often sparse, for which a simpler correlation structure might be sufficient and easier to interpret.
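To make the induced covariance structure concrete, the following sketch evaluates the implied four-dimensional covariance operator, Cov{X(s1, t1), X(s2, t2)} = sum_k phi_k(s1) phi_k(s2) Gamma_k(t1, t2), on a discretized grid for one subject. All numerical choices (two eigenfunctions, exchangeable-style longitudinal covariances, visit times) are illustrative assumptions, not taken from the paper, whose implementation is in R:

```python
import numpy as np

# Hypothetical marginal eigenfunctions on a grid of the functional domain [0, 1].
s = np.linspace(0, 1, 101)
phi = np.vstack([np.sqrt(2) * np.sin(2 * np.pi * s),
                 np.sqrt(2) * np.cos(2 * np.pi * s)])      # (K, S) eigenfunctions

t = np.array([0.1, 0.4, 0.9])                              # visit times, one subject

def gamma_k(t1, t2, k):
    # hypothetical exchangeable-style Gamma_k: between-visit covariance plus a
    # within-visit component on the diagonal
    lam_between, lam_within = (1.0, 0.5) if k == 0 else (0.5, 0.25)
    return lam_between + lam_within * float(t1 == t2)

# Covariance of the vectorized observations (visits vary slowest, grid fastest):
# Cov{X(s1, t1), X(s2, t2)} = sum_k phi_k(s1) phi_k(s2) Gamma_k(t1, t2)
K = np.zeros((len(t) * len(s), len(t) * len(s)))
for k in range(phi.shape[0]):
    G = np.array([[gamma_k(a, b, k) for b in t] for a in t])   # (T, T)
    K += np.kron(G, np.outer(phi[k], phi[k]))
```

The Kronecker structure makes explicit how the within-visit covariance (outer products of eigenfunctions) and the between-visit covariance (the Gamma_k matrices) are separated.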

Longitudinal working covariances

We consider three common correlation models, including two parametric correlation structures, but the proposed method can easily be extended to other parametric correlation structures. The first is the working independence model, where in Equation (2) are assumed to be independent and identically distributed for all and . Under this framework the random scores , where are uncorrelated across and for each and . Then, we have . The second is the exchangeable correlation model, that is, , where are uncorrelated with all and are mutually uncorrelated for all , , and . Denote and , and then . The exchangeable model for longitudinal functional data is a particular case of the multilevel functional principal component model. Indeed, we may write with and . The third is the unspecified smooth correlation model, where is modeled nonparametrically. In this case , where is a zero-mean smooth process with covariance function , and are zero-mean random errors with . In particular, are mutually uncorrelated for all , and . The induced covariance function is . We further decompose the covariance function as , where and are orthonormal eigenfunctions that satisfy . It follows that , where are uncorrelated random scores with zero mean and variance . The unspecified smooth covariance model contains the independent and exchangeable models as special cases. Indeed, the working independence model corresponds to while the exchangeable correlation model corresponds to . Thus, for model estimation below, we focus on the unspecified covariance model, as the other two can be treated as special cases.
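The three working covariance models can be written down directly as visit-by-visit covariance matrices for the scores of one eigenfunction. The sketch below is in Python rather than the paper's R, and the variance parameters, visit times, and the squared-exponential kernel chosen as one possible smooth covariance are all illustrative assumptions:

```python
import numpy as np

t = np.array([0.1, 0.3, 0.7, 0.8])   # visit times for one subject (illustrative)

def gamma_independent(t, lam=1.0):
    # scores independent and identically distributed across visits: lam * I
    return lam * np.eye(len(t))

def gamma_exchangeable(t, lam_b=1.0, lam_w=0.5):
    # score = subject-level effect + visit-level effect: constant between-visit
    # covariance lam_b plus an extra lam_w on the diagonal
    n = len(t)
    return lam_b * np.ones((n, n)) + lam_w * np.eye(n)

def gamma_unspecified(t, sigma2=1.0, rho=0.3, lam_w=0.5):
    # zero-mean smooth process with a (hypothetical) squared-exponential
    # covariance, plus an uncorrelated white-noise component on the diagonal
    d = t[:, None] - t[None, :]
    return sigma2 * np.exp(-((d / rho) ** 2)) + lam_w * np.eye(len(t))
```

As in the text, the unspecified model contains the other two as special cases: a constant smooth covariance recovers the exchangeable model, and a zero smooth covariance recovers working independence.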

Model estimation

We first consider estimation of using a B-spline expansion of the coefficient functions . Specifically, let be a collection of B-spline basis functions in the unit interval, with being the number of basis functions. Then , where is the unknown coefficient vector. We use cubic B-splines and a relatively large number of equally spaced knots, that is, a large , so that the approximation bias is negligible. Let , then . Denote , . Then, we have . Let and . Then, we have . Now we consider in Equation (2), where for each , As both terms involve infinite sums, we propose to use a number of eigenfunctions that explains at least a given percentage of variance explained (PVE) in the marginal covariance function, . Denote the selected number of by . Next, we determine the approximation of for all simultaneously by ranking , the variances of the scores . We retain the that are larger than , where is the largest value that satisfies This will discard with small variances. According to our simulation studies, discarding small variances leads to improved estimation. Let be the number of eigenfunctions selected for . Then, approximately, Let , , , and . Then Let , where , and . Note that has a diagonal covariance matrix and has a diagonal covariance matrix . Let and . Then Let , , and . Then It is easy to show that has a diagonal covariance matrix and that has a diagonal covariance matrix . For the random error term , let and . Finally, we obtain the mixed model representation where are independent across subjects. Therefore, given the eigenfunctions and , the fixed effects and variance components , , and can be estimated using standard mixed effects software. To avoid overfitting, we impose a quadratic penalty on each . Let be a second-order difference penalty matrix. To be specific, we use the penalty , where is the smoothing parameter for the th function .
The model can be fitted using the gam or bam function in the R package mgcv, and we use generalized cross-validation to select the smoothing parameters.
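The penalized B-spline machinery described above can be sketched in a few lines: build a cubic B-spline design matrix, form the second-order difference penalty, and solve the penalized least-squares problem. This is a minimal NumPy/SciPy sketch of the idea only; the paper fits the full mixed model with mgcv's gam/bam and selects smoothing parameters by generalized cross-validation, and the basis size and smoothing parameter below are illustrative:

```python
import numpy as np
from scipy.interpolate import splev

def bspline_basis(x, knots, degree=3):
    # evaluate each B-spline basis function by setting one coefficient to 1
    n_basis = len(knots) - degree - 1
    B = np.empty((len(x), n_basis))
    for j in range(n_basis):
        c = np.zeros(n_basis)
        c[j] = 1.0
        B[:, j] = splev(x, (knots, c, degree))
    return B

s = np.linspace(0, 1, 101)
interior = np.linspace(0, 1, 8)
knots = np.concatenate([[0.0] * 3, interior, [1.0] * 3])   # clamped cubic knots
B = bspline_basis(s, knots)                                # (101, 10) design

# second-order difference penalty: penalizes sum_j (b_j - 2 b_{j+1} + b_{j+2})^2
D = np.diff(np.eye(B.shape[1]), n=2, axis=0)
P = D.T @ D

y = np.sin(2 * np.pi * s)                                  # noiseless toy target
lam = 1e-4                                                 # smoothing parameter
beta = np.linalg.solve(B.T @ B + lam * P, B.T @ y)         # penalized least squares
fit = B @ beta
```

With a small smoothing parameter the fit essentially interpolates the smooth target; increasing lam shrinks the fit toward a linear function, the null space of the second-order difference penalty.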

Longitudinal covariance estimation and model estimation algorithm

We first obtain an initial estimate of for the fixed effects by fitting model (4) without including the term . This is equivalent to assuming that all observations are mutually independent. A small simulation study shows that the proposed method is not sensitive to the initial estimate; see Section S.1 of the supplementary material for alternative methods of obtaining the initial estimate of . With an initial estimate of , the residuals can be calculated and used for estimating the marginal eigenfunctions and the longitudinal covariances. First, the eigenfunctions can be estimated by applying the fast FACE algorithm of the sandwich smoother to the set of residuals. Specifically, let , , and , where . Then we apply the FACE algorithm to . Then, given , we calculate as the prediction for . For the working independence model, can be estimated along with directly from the FACE algorithm. For the exchangeable model, we have and a one-way nested ANOVA model can be used to estimate and . Finally, when is unspecified, we approximate it using tensor-product B-splines and apply the fast covariance estimation method for sparse functional data to the dataset to obtain estimates of and . We use 10 marginal cubic B-splines with equally spaced knots. Then, an eigendecomposition of leads to estimates of the eigenfunctions and associated eigenvalues . We retain all eigenfunctions with positive eigenvalues and then use Equation (3) to discard small eigenvalues. Once we have obtained estimates of the marginal eigenfunctions and the longitudinal covariances, we refine the estimates of the fixed effects by fitting the mixed model (4) to obtain estimates of , denoted by . Given the estimated variance parameters and treating the smoothness penalty on the fixed effects functions as prior distributions, has a posterior multivariate normal distribution, which can be used to construct either pointwise or simultaneous confidence bands for the fixed effects functions.
We omit the details and refer to the textbook by Wood. We summarize the steps of the proposed method with unspecified smooth covariance in Algorithm 1. The estimation algorithm with independent and exchangeable covariances is available in Section S.2 of the supplementary material. Notice that the fixed effects are estimated twice (an initial estimate and a refined estimate), but the procedure is not iterated further, since we have found that additional iterations do not improve estimation. The reason is that, as long as a good initial estimate of the fixed effects is obtained, the subsequent estimation of covariances via residuals is already sufficiently accurate and does not improve substantially when the residuals are recomputed from refined estimates. As will be shown in our simulation study, the initial estimates are often comparable with the refined estimates. We nevertheless use the refined estimates as our final estimates because inference would not be valid with the initial estimates. Finally, a small simulation for the iterative procedure that alternates between estimating the mean and the covariance structure is shown in Section S.1 in the supplementary material.
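The two-stage procedure (initial fit, residual-based covariance estimation, refit) can be illustrated on toy data. The sketch below substitutes a pointwise mean for the paper's penalized spline fit and a raw covariance eigendecomposition for the FACE sandwich smoother, and stops before the mixed-model refit; all data-generating choices are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy longitudinal functional data: n subjects, m visits each, grid of length S.
n, m, S = 30, 4, 50
s = np.linspace(0, 1, S)
mu = np.sin(2 * np.pi * s)                      # true mean (no covariates here)
phi1 = np.sqrt(2) * np.cos(2 * np.pi * s)       # true marginal eigenfunction
Y = np.empty((n, m, S))
for i in range(n):
    b = rng.normal(0.0, 1.0)                    # shared subject-level score
    for j in range(m):
        xi = b + rng.normal(0.0, 0.5)           # exchangeable score at visit j
        Y[i, j] = mu + xi * phi1 + rng.normal(0.0, 0.1, S)

# Step 1: initial mean estimate assuming independent observations
# (a pointwise mean here; the paper uses a penalized spline fit via pffr).
mu_hat = Y.reshape(-1, S).mean(axis=0)

# Step 2: marginal covariance of the residuals and its leading eigenfunction
# (the paper smooths this with the FACE sandwich smoother; this is the raw
# empirical version).
R = (Y - mu_hat).reshape(-1, S)
K_marg = R.T @ R / R.shape[0]
vals, vecs = np.linalg.eigh(K_marg)
delta = s[1] - s[0]
phi_hat = vecs[:, -1] / np.sqrt(delta)          # L2-normalize on [0, 1]

# Step 3 (omitted): estimate the longitudinal covariances of the scores and
# refit the mean as a mixed model with the estimated eigenfunctions.
align = np.sign(phi_hat @ phi1)                 # eigenvectors have arbitrary sign
err = np.max(np.abs(align * phi_hat - phi1))
```

Even this crude version recovers the leading eigenfunction well, which is consistent with the observation above that a single refinement pass suffices.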

TESTS OF LONGITUDINAL COVARIANCES

We use the goodness-of-fit test in Chen et al to test the structure of the covariance in the following scenarios: (1) test the independent model against the unspecified covariance model; and (2) test the exchangeable model against the unspecified covariance model. The only other available method for carrying out the above tests is the multivariate test in Zhong et al, which can only be applied to data with a common longitudinal design and assumes an unstructured covariance that does not account for smoothness in the alternative covariance. Thus, when applicable, the multivariate test tends to be less powerful than the above goodness-of-fit test. It is also worth mentioning that if the alternative is also parametric, then likelihood-based tests such as the one in Crainiceanu and Ruppert may instead be used. We consider two types of tests: a test of an individual covariance or a joint test of multiple covariances. To illustrate the idea, consider the case when the null hypothesis assumes independence. For the individual test, we may test against is unspecified for each . For the joint test, we may test for all against is unspecified for at least one . The null model for the joint test corresponds to the assumption that the longitudinal functions are uncorrelated across visits, , within study participants. The individual longitudinal covariance test can be written as where is a parametric covariance function with a finite number of unknown parameters. For example, to test the null hypothesis of an exchangeable covariance model, we have , where . For testing the null hypothesis of working independence, everything stays the same, except that . We now briefly describe the test proposed in Chen et al. Suppose that the null covariance is and the alternative is . First, we estimate the unknown parameters under the null hypothesis and denote the resulting estimate by .
Under the alternative hypothesis, we estimate nonparametrically using, for example, smoothing based on tensor products of B-splines. If the estimator under the alternative hypothesis is , then the test statistic is the Hilbert-Schmidt norm distance where and is the smoothed null covariance estimate, again using tensor-product B-splines. The rationale for smoothing is to remove the potential bias induced by the smoothing procedure used for obtaining . Larger values of the test statistic, , provide evidence against the null hypothesis. The null distribution of is obtained via wild bootstrap; that is, the estimated error variance under the alternative hypothesis is used to generate bootstrap samples. To explore the overall structure of longitudinal covariances for modeling longitudinal functional data, we propose a joint test of for all . As we do not expect a large (only a small number of multiple comparisons), the approach we propose is straightforward: conduct individual covariance tests and use a Bonferroni correction to control the family-wise error rate. While not explored here, one may also use more refined multiple testing methods such as the Benjamini-Hochberg method.

SIMULATION STUDIES

We use simulations to compare the performance of inferential approaches for the longitudinal functional models using independent, exchangeable, and unspecified covariance assumptions. We focus on estimation and inference for fixed effects as well as the size and power of the covariance tests introduced in Section 3. The longitudinal functional models are compared with the standard functional data method implemented in the pffr function of the R package refund. This approach was designed for functional data without residual correlation and will be referred to as the ind_obs method, in the sense that the method assumes independence across longitudinal observations. The methods implemented in the pffr function work very well in practice when residuals are not correlated. We also compare with the bootstrap method, which is used to obtain confidence bands for the coefficient functions. The point estimators for the bootstrap approach are obtained using the pffr function.

Simulation settings

Data are generated from the model . The functional mean is where the number of covariates is , , and , where with . We let and . The coefficient functions are , and . The number of visits per subject, , will be specified later, while the visit times, , are generated uniformly from the unit interval and then ordered. Using the orthonormal functions and , is generated from one of the three models below:
Independent covariance: .
Exchangeable covariance: .
Unspecified covariance: .
For the independent covariance, and . For the exchangeable covariance, , , , and . For the unspecified covariance, , , , and . We generate the random terms as , , , , , and . The functional arguments form a grid of 101 equidistant points in the unit interval. The mutually independent random errors are generated from . The value of is 1.5, which is determined by a signal-to-noise ratio (SNR) with a value of 5. For the independent and exchangeable covariances, we define the SNR in the functional data as ; for the unspecified covariance, we define the SNR as . For the simulations, we use a factorial design with three factors: (a) the number of subjects is either 100 or 200; (b) the number of visits per subject is generated from either or ; and (c) the covariance model is either independent, exchangeable, or unspecified. There are 12 model conditions. For each condition, 500 replications are conducted on an Intel Core i7-8700 with 32GB RAM using one core per simulation. For the tests of longitudinal covariances, data are generated similarly, except that under the alternative we add terms to ensure that the null model is misspecified. Specifically, we consider:
Deviation from independent covariance: .
Deviation from exchangeable covariance: .
The scalar controls the magnitude of the deviation from the null model. The scores and are independent of the nonlinear random function , where . The terms and are specified as before.
In addition to the above simulation settings, we have also conducted simulations when the number of covariates is four and the number of marginal eigenfunctions is also four. The details of the simulation settings and results can be found in Section S.6 of the supplementary material.

Model estimation and evaluation criteria

The initial mean estimation is conducted using the pffr function with 10 cubic B-spline bases and a second-order difference penalty. The marginal eigenfunctions are estimated by fitting the functional residuals via the fpca.face function in the R package refund with 20 knots. The unspecified covariance model components and are estimated using the face.sparse function in the R package face with 15 knots. The truncation parameters and (for the unspecified covariance model) are all determined using the prespecified PVE level of 0.95. The final mixed model is fitted using the bam function in the R package mgcv. The estimation accuracy for functional parameters is assessed using the mean integrated squared error (MISE) and the properties of the pointwise confidence intervals. Specifically, for a function with estimate , . For the pointwise confidence intervals, we report the integrated actual pointwise coverage (IAC) and integrated actual width (IAW), where IAC is defined as and is the pointwise approximate confidence interval for the functional parameter . For example, an approximate pointwise confidence interval for can be constructed as . The length of the confidence interval at is and .
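The three evaluation criteria can be computed directly from estimates on a grid. The sketch below does this for a single replicate (in practice the criteria are averaged over the 500 replications); the true function, estimate, and pointwise standard errors are hypothetical:

```python
import numpy as np

def trapz(f, x):
    # trapezoidal rule, written out to avoid NumPy version differences
    return float(np.sum((f[1:] + f[:-1]) * np.diff(x)) / 2.0)

s = np.linspace(0, 1, 101)
beta_true = np.sin(2 * np.pi * s)                      # hypothetical true coefficient
beta_hat = beta_true + 0.05 * np.cos(4 * np.pi * s)    # hypothetical estimate
se_hat = np.full_like(s, 0.1)                          # hypothetical pointwise SE

# integrated squared error for this replicate; MISE averages over replications
ise = trapz((beta_hat - beta_true) ** 2, s)

# 95% pointwise confidence interval
lo = beta_hat - 1.96 * se_hat
hi = beta_hat + 1.96 * se_hat

# IAC: fraction of the domain where the interval covers the truth
iac = trapz(((lo <= beta_true) & (beta_true <= hi)).astype(float), s)

# IAW: integrated width of the interval
iaw = trapz(hi - lo, s)
```

Here the error of 0.05 everywhere is well inside the half-width of 0.196, so the interval covers the truth over the whole domain.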

Simulation results for estimation

Tables 1, 2, and 3 display the results for estimating the coefficient functions when the data generating mechanism uses independent, exchangeable, or unspecified covariances, respectively. For each table, we compare results using five methods. The ind_obs method refers to the method implemented in the pffr function; the bootstrap method is used to obtain the confidence bands for the coefficient functions; and the independent, exchangeable, and unspecified methods refer to the proposed methods with independent, exchangeable, and unspecified covariances, respectively. We first focus on the point estimation, which is partially captured by the MISE in the three tables. Results indicate that as the number of subjects and visits increases, MISEs decrease for all methods; see the MISE columns in Tables 1, 2, and 3. Moreover, the MISE attains its smallest value when the correlation is correctly specified. In particular, the reduction in MISE is substantial for data with exchangeable and unspecified covariance structures when the correct model is used compared to one that does not account for correlation. For example, in Table 2, which corresponds to data simulated from an exchangeable correlation structure, for n = 200 and ni ~ Unif[8,12], the MISE is 0.028 for β2(s) for the method that accounts for exchangeable correlation, while it is 0.066 for the bootstrap method, which does not. Finally, the unspecified correlation model consistently provides good estimators, with the corresponding MISE comparable to that obtained from the correctly specified model. See Section S.3 in the supplementary material for additional simulation results.
TABLE 1

MISE, IAC for pointwise confidence intervals, and IAW for estimating β0(s), β1(s), and β2(s) based on independent covariance data using five methods across 500 simulations

                        β0(s)               β1(s)               β2(s)
Method        Time (s)  MISE  IAW   IAC    MISE  IAW   IAC     MISE  IAW   IAC
n = 100 and ni ~ Unif[3,6]
ind_obs         0 (0.0) 0.127 0.23  0.58   0.119 0.15  0.46    0.082 0.09  0.40
Bootstrap      72 (0.1) 0.127 0.54  0.93   0.119 0.51  0.93    0.082 0.35  0.93
Independent     3 (0.0) 0.125 0.80  0.98   0.115 0.51  0.95    0.073 0.33  0.95
Exchangeable    7 (0.1) 0.125 0.82  0.98   0.115 0.52  0.96    0.073 0.34  0.95
Unspecified     8 (0.0) 0.125 0.80  0.98   0.115 0.51  0.95    0.073 0.34  0.95
n = 100 and ni ~ Unif[8,12]
ind_obs         1 (0.0) 0.086 0.16  0.58   0.079 0.10  0.46    0.052 0.07  0.45
Bootstrap     150 (0.1) 0.086 0.36  0.94   0.079 0.34  0.94    0.052 0.23  0.95
Independent    32 (0.1) 0.085 0.53  0.96   0.078 0.34  0.94    0.048 0.23  0.96
Exchangeable   49 (0.4) 0.085 0.54  0.97   0.078 0.35  0.95    0.048 0.23  0.96
Unspecified    37 (0.1) 0.085 0.53  0.96   0.078 0.34  0.94    0.048 0.23  0.96
n = 200 and ni ~ Unif[3,6]
ind_obs         1 (0.0) 0.086 0.16  0.61   0.084 0.11  0.44    0.056 0.07  0.42
Bootstrap     138 (0.1) 0.086 0.38  0.95   0.084 0.36  0.94    0.056 0.25  0.96
Independent    24 (0.1) 0.085 0.57  0.97   0.082 0.36  0.95    0.051 0.24  0.97
Exchangeable   49 (0.4) 0.085 0.57  0.97   0.082 0.36  0.96    0.051 0.24  0.97
Unspecified    33 (0.1) 0.085 0.57  0.97   0.082 0.36  0.95    0.051 0.24  0.97
n = 200 and ni ~ Unif[8,12]
ind_obs         1 (0.0) 0.059 0.11  0.61   0.057 0.07  0.46    0.038 0.05  0.44
Bootstrap     288 (0.4) 0.059 0.25  0.95   0.057 0.24  0.93    0.038 0.17  0.94
Independent   264 (0.9) 0.058 0.38  0.97   0.056 0.24  0.94    0.036 0.16  0.95
Exchangeable  381 (3.0) 0.058 0.39  0.98   0.056 0.24  0.94    0.036 0.17  0.96
Unspecified   275 (0.9) 0.058 0.38  0.97   0.056 0.24  0.94    0.036 0.16  0.95

Note: Time (standard error), in seconds, is the run time per simulation averaged over 500 replications.

TABLE 2

MISE, IAC for pointwise confidence intervals, and IAW for estimating β0(s), β1(s), and β2(s) based on exchangeable covariance data using five methods across 500 simulations

                        β0(s)               β1(s)               β2(s)
Method        Time (s)  MISE  IAW   IAC    MISE  IAW   IAC     MISE  IAW   IAC
n = 100 and ni ~ Unif[3,6]
ind_obs         0 (0.0) 0.225 0.22  0.34   0.226 0.15  0.24    0.119 0.09  0.28
Bootstrap      74 (0.1) 0.225 0.97  0.94   0.226 0.94  0.94    0.119 0.51  0.94
Independent     3 (0.0) 0.223 0.80  0.86   0.220 0.50  0.69    0.107 0.33  0.82
Exchangeable    6 (0.0) 0.215 1.44  0.97   0.202 0.90  0.95    0.063 0.29  0.95
Unspecified    11 (0.0) 0.215 1.42  0.99   0.208 0.87  0.93    0.064 0.29  0.94
n = 100 and ni ~ Unif[8,12]
ind_obs         1 (0.0) 0.206 0.15  0.27   0.205 0.10  0.17    0.092 0.07  0.26
Bootstrap     152 (0.1) 0.206 0.91  0.95   0.205 0.90  0.95    0.092 0.39  0.94
Independent    32 (0.2) 0.205 0.54  0.74   0.202 0.34  0.54    0.086 0.23  0.75
Exchangeable   46 (0.2) 0.203 1.39  0.97   0.187 0.87  0.97    0.037 0.17  0.96
Unspecified    51 (0.2) 0.205 1.36  0.98   0.190 0.84  0.96    0.037 0.17  0.95
n = 200 and ni ~ Unif[3,6]
ind_obs         1 (0.0) 0.162 0.16  0.34   0.149 0.11  0.26    0.085 0.07  0.29
Bootstrap     137 (0.2) 0.162 0.69  0.94   0.149 0.66  0.95    0.085 0.36  0.95
Independent    23 (0.1) 0.160 0.57  0.85   0.147 0.35  0.72    0.079 0.24  0.82
Exchangeable   44 (0.2) 0.151 1.02  0.97   0.141 0.64  0.95    0.047 0.21  0.95
Unspecified    54 (0.2) 0.152 1.01  0.99   0.142 0.63  0.95    0.047 0.21  0.95
n = 200 and ni ~ Unif[8,12]
ind_obs         1 (0.0) 0.147 0.11  0.28   0.140 0.07  0.20    0.066 0.05  0.25
Bootstrap     288 (0.3) 0.147 0.65  0.95   0.140 0.63  0.95    0.066 0.28  0.95
Independent   261 (0.9) 0.147 0.38  0.73   0.139 0.24  0.58    0.063 0.16  0.76
Exchangeable  373 (1.5) 0.144 0.97  0.97   0.131 0.62  0.96    0.028 0.12  0.95
Unspecified   383 (1.5) 0.147 0.98  0.98   0.134 0.61  0.95    0.029 0.12  0.95

Note: Time (standard error), in seconds, is the run time per simulation averaged over 500 replications.

TABLE 3

MISE, IAC for pointwise confidence intervals, and IAW for estimating β0(s), β1(s), and β2(s) based on unspecified covariance data using five methods across 500 simulations

                        β0(s)               β1(s)               β2(s)
Method        Time (s)  MISE  IAW   IAC    MISE  IAW   IAC     MISE  IAW   IAC
n = 100 and ni ~ Unif[3,6]
ind_obs         0 (0.0) 0.127 0.23  0.58   0.122 0.15  0.42    0.086 0.10  0.37
Bootstrap      74 (0.1) 0.127 0.53  0.93   0.122 0.50  0.93    0.086 0.37  0.94
Independent     3 (0.0) 0.125 0.81  0.97   0.118 0.51  0.95    0.076 0.34  0.95
Exchangeable    7 (0.1) 0.125 0.82  0.97   0.118 0.52  0.96    0.076 0.34  0.95
Unspecified    19 (0.1) 0.081 0.62  1.00   0.077 0.31  0.93    0.052 0.22  0.94
n = 100 and ni ~ Unif[8,12]
ind_obs         1 (0.0) 0.081 0.16  0.61   0.070 0.10  0.49    0.063 0.07  0.36
Bootstrap     152 (0.1) 0.081 0.34  0.94   0.070 0.31  0.94    0.063 0.28  0.95
Independent    32 (0.1) 0.079 0.55  0.97   0.069 0.34  0.97    0.059 0.23  0.91
Exchangeable   43 (0.3) 0.079 0.55  0.97   0.069 0.34  0.97    0.059 0.23  0.91
Unspecified    77 (0.4) 0.045 0.37  1.00   0.041 0.18  0.95    0.032 0.13  0.94
n = 200 and ni ~ Unif[3,6]
ind_obs         1 (0.0) 0.085 0.16  0.60   0.079 0.11  0.47    0.059 0.07  0.40
Bootstrap     138 (0.1) 0.085 0.37  0.95   0.079 0.35  0.95    0.059 0.26  0.94
Independent    24 (0.1) 0.084 0.56  0.97   0.077 0.36  0.96    0.055 0.24  0.94
Exchangeable   45 (0.3) 0.084 0.57  0.97   0.077 0.36  0.96    0.055 0.24  0.94
Unspecified   116 (0.6) 0.054 0.43  1.00   0.050 0.21  0.94    0.036 0.15  0.95
n = 200 and ni ~ Unif[8,12]
ind_obs         1 (0.0) 0.056 0.11  0.64   0.052 0.07  0.49    0.044 0.05  0.37
Bootstrap     288 (0.4) 0.056 0.24  0.94   0.052 0.22  0.94    0.044 0.20  0.95
Independent   262 (0.8) 0.055 0.38  0.97   0.051 0.24  0.96    0.043 0.16  0.91
Exchangeable  339 (1.5) 0.055 0.38  0.97   0.051 0.24  0.97    0.043 0.16  0.90
Unspecified   585 (3.6) 0.031 0.25  1.00   0.029 0.12  0.94    0.022 0.09  0.95

Note: Time (standard error), in seconds, is the run time per simulation averaged over 500 replications.

For confidence bands, all methods outperform the ind_obs method in terms of coverage. The confidence bands from the ind_obs method are on average too narrow and fall far below the nominal coverage level. The correctly specified and the unspecified covariance models provide confidence bands with coverage close to the nominal level, though slightly conservative at times. The bootstrap method also provides confidence bands with close-to-nominal coverage, though it requires more computational time and yields higher average widths for estimating slope functions and lower average widths for estimating intercept functions. When models are misspecified (eg, the exchangeable model for unspecified covariance data), the average coverage of the confidence bands tends to be close to the nominal level, but at the expense of much wider confidence bands. In terms of run times per simulation averaged over 500 replications, the bootstrap method is the slowest and the ind_obs method is the fastest. Computational time increases with more complex longitudinal correlation structures.
However, when the dataset has a large number of subjects and a large number of visits per subject, the computation time for fitting the proposed methods can be comparable to or longer than that of the bootstrap method. The proposed method may be sped up via parallel computing in the bam function, but in our simulations we used only one core. In summary, the unspecified covariance models perform well in terms of inference for the fixed effects in a variety of scenarios. However, in practice one may still want to know whether a simpler correlation structure is reasonable for the data, as the correctly specified model slightly outperforms the unspecified covariance model. In the next section, we provide results for testing the longitudinal covariance.
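As a rough illustration of how the two summaries reported in these tables can be computed, here is a minimal Python sketch (the authors' implementation is in R, at the linked GitHub repository): IAC is the empirical coverage of the pointwise intervals averaged over the grid and replications, and IAW is the corresponding average width. The simulated estimates, standard errors, and the 95% level are hypothetical stand-ins, not the paper's actual simulation design.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setup: a true coefficient function on a grid of 100 points,
# with estimates and pointwise standard errors from 500 simulated fits.
grid_len, n_sim = 100, 500
beta_true = np.sin(np.linspace(0, 2 * np.pi, grid_len))
beta_hat = beta_true + rng.normal(0, 0.1, size=(n_sim, grid_len))
se_hat = np.full((n_sim, grid_len), 0.1)

# 95% pointwise confidence intervals for each simulated fit.
lower = beta_hat - 1.96 * se_hat
upper = beta_hat + 1.96 * se_hat

# IAC: coverage of the pointwise intervals, averaged over grid and replications.
iac = np.mean((lower <= beta_true) & (beta_true <= upper))

# IAW: interval width, averaged over grid and replications.
iaw = np.mean(upper - lower)

print(f"IAC = {iac:.3f}, IAW = {iaw:.3f}")
```

With correctly calibrated standard errors, IAC should sit near the nominal 0.95; intervals that are too narrow (as for the ind_obs method above) would show up as IAC well below nominal.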

Simulation results for longitudinal covariance tests

For the covariance tests introduced in Section 3, we report the empirical type I error rate (estimated size) at the nominal 0.05 and 0.10 levels based on 5000 simulated datasets, and the power at the nominal level based on 1000 simulated datasets. For the joint test, we use the Bonferroni correction to account for the multiplicity of hypothesis tests. For the individual covariance tests, the power curves are presented as functions of the deviation from the null in each setting; for the joint test, the deviation from the null is defined analogously. The deviation from the null hypothesis is a number between 0 and 1. Table 4 reports the empirical type I errors of the individual and joint tests, indicating that all tests have empirical levels close to their nominal levels. The tests can be conservative for exchangeable covariance with a larger number of visits per subject. We have also evaluated the dependence among the test statistics from the different layers for a few simulation settings; see Section S.4 of the supplementary material for details. While we found significant positive correlations among the test statistics in most scenarios, the correlations appear small. Nevertheless, for the joint test, where the Bonferroni correction is used, multiple testing methods that account for positive dependence, such as the Simes method, might instead be employed.
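The size control provided by the Bonferroni-corrected joint test can be illustrated with a small simulation, a sketch under the simplifying assumption that the layer-wise p-values are independent and uniform under the null (the number of layers, level, and replication count here are illustrative, not the paper's exact design):

```python
import numpy as np

rng = np.random.default_rng(2023)

# Joint (Bonferroni) test over K layer-wise tests: reject the joint null
# when any layer's p-value falls below alpha / K. Under the null with
# uniform p-values this controls the familywise error rate at alpha.
n_sim, K, alpha = 5000, 2, 0.05
p_values = rng.uniform(size=(n_sim, K))       # null p-values for each layer
reject_joint = (p_values < alpha / K).any(axis=1)

empirical_size = reject_joint.mean()          # should be close to (and <=) alpha
print(f"empirical size of the joint test: {empirical_size:.3f}")
```

Under positive dependence among the layer statistics, Bonferroni remains valid but becomes conservative, which matches the slightly conservative empirical levels reported in Table 4.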
TABLE 4

Empirical type I error of covariance tests at the nominal 0.05 and 0.10 levels, based on 5000 datasets, by sample size (n) and number of observations per subject (ni)

Null          ni          Layer         n=100             n=200
                                        α=0.05   α=0.10   α=0.05   α=0.10
Exchangeable  Unif[3,6]   First layer   0.055    0.105    0.046    0.098
                          Second layer  0.050    0.099    0.045    0.093
                          Joint test    0.053    0.103    0.043    0.090
              Unif[8,12]  First layer   0.042    0.094    0.039    0.085
                          Second layer  0.037    0.080    0.040    0.092
                          Joint test    0.037    0.078    0.033    0.078
Independent   Unif[3,6]   First layer   0.056    0.115    0.057    0.110
                          Second layer  0.051    0.108    0.052    0.103
                          Joint test    0.053    0.105    0.056    0.106
              Unif[8,12]  First layer   0.053    0.103    0.057    0.108
                          Second layer  0.054    0.103    0.055    0.102
                          Joint test    0.056    0.104    0.058    0.108
Figure 1 presents the power curves for the proposed tests, indicating similar patterns for the two individual tests and the joint test. The test for independent covariance has higher power than the test for exchangeable covariance. For both types of null hypothesis, power increases with the number of subjects and the number of observations per subject. Power seems to be particularly sensitive to the number of visits per curve. Indeed, with 100 study participants and exchangeable covariance, for all three tests, the power under the larger number of visits per subject is close to one at a deviation of around 0.25. In contrast, under the smaller number of visits, the deviation needs to be closer to 0.7 to reach the same power. As expected, power depends on how closely the true model matches the specified alternative assumed by the test.
FIGURE 1

Power curves (type I error ) for individual and joint tests of covariances, under deviation from the null. Shown are: (solid line) and (dashed line), for (gray) and (black)

In summary, in all settings, the proposed tests maintain proper size and have power to detect alternative longitudinal covariance structures, both when the null hypothesis assumes independent covariance and when it assumes exchangeable covariance.

DATA APPLICATION

Wearable devices such as accelerometers provide objective and detailed measurements of physical activity and enable researchers to study how physical activity changes with age, personal characteristics, and health status. In earlier studies based on cross‐sectional data, the systematic and random circadian rhythms of physical activity were analyzed as functions of time of day and age. In this analysis, our objective is to quantify the circadian rhythm of physical activity over multiple days, relate it to age, sex, and BMI, and provide the associated inference. Data were collected in the National Health and Nutrition Examination Survey (NHANES), a large cohort study conducted in 2‐year waves by the US Centers for Disease Control and Prevention (CDC) to assess the health and nutritional status of the US population. We focus on the NHANES 2011‐2012 and 2013‐2014 waves, where each study participant was asked to wear a wrist‐worn device at all times for seven consecutive days. The physical activity data were collected and summarized in minute‐level Monitor‐Independent Movement Summary (MIMS) units, a physical activity intensity unit optimized to capture normal human motion. For each study participant and day, the initial data are MIMS values measured at 1440 minutes from midnight to midnight. The final dataset consists of 513 study participants (208 men and 305 women) with an average age of 57 years and an average BMI of 28. Figure 2 displays the physical activity profiles of two randomly selected NHANES study participants. For each study participant, we further subset to the days meeting a minimum estimated wear time, leading to a total of 3187 days of wear with an average of 6.21 days of wear per study participant. For each subject and day, the average MIMS at each hour of the day is our longitudinal functional outcome.
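The reduction from 1440 minute-level MIMS values per subject-day to the 24 hourly averages used as the functional outcome can be sketched as follows; the gamma draws are a hypothetical stand-in for real MIMS data, and the variable names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sketch: one subject-day of minute-level MIMS
# (1440 minutes, midnight to midnight), reduced to 24 hourly averages.
mims_minute = rng.gamma(shape=2.0, scale=5.0, size=1440)  # simulated MIMS

# Reshape to (24 hours, 60 minutes) and average within each hour.
hourly_mims = mims_minute.reshape(24, 60).mean(axis=1)

print(hourly_mims.shape)  # one 24-point curve per subject-day
```

Because each hour contains the same number of minutes, the hourly averaging preserves the day-level mean while smoothing minute-level noise.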
FIGURE 2

Physical activity profiles of two NHANES study participants over available days. Panels in the top row are from a male with age 54 and BMI 20.5. Panels in the bottom row are from a female with age 57 and BMI 31.54. Each panel displays MIMS of 1 day from midnight to midnight, titled by day of the week

Activity profiles often differ between weekends (Saturday/Sunday) and weekdays (Monday‐Friday). Thus, we create a binary weekend variable, which is 1 for weekends and 0 for weekdays. We include sex, age, BMI, and weekend as the covariates in this analysis. The sex variable is 0 for males and 1 for females. Both age and BMI are centered so that the intercept function is the population mean MIMS corresponding to males of age 57 with a BMI of 28. The longitudinal covariate is the day of wear. Because physical activity profiles are periodic, that is, physical activity counts are the same at the start and at the end of the day, the coefficient functions should also be cyclic, and we use cyclic P‐splines. To investigate the longitudinal correlation structure of the data, we first estimate the marginal covariance function via the fpca.face R function using cubic B‐spline basis functions with 10 knots. Using the procedure described in Section 2.2, we identify eight eigenfunctions that explain the required proportion of the marginal variance. Using these eight eigenfunctions, we conduct a joint test of the null hypothesis that all longitudinal covariances can be induced by an exchangeable covariance between daily profiles within each subject. The p‐values are 0.04, 0.25, 0.06, 0.49, 0.62, 0.49, 0.09, and 0.19, respectively. Thus, with the Bonferroni correction, we do not reject the null hypothesis at the 0.05 level. The final model fit is obtained by using exchangeable covariances for all longitudinal covariances. The estimated coefficient functions are displayed in Figure 3, along with the pointwise confidence bands.
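The Bonferroni decision above can be checked directly: with eight layer-wise tests, the threshold at the 0.05 level is 0.05/8 = 0.00625, and the smallest reported p-value, 0.04, exceeds it, so the joint exchangeable null is not rejected. A minimal sketch:

```python
# Bonferroni joint decision using the eight p-values reported above.
p_values = [0.04, 0.25, 0.06, 0.49, 0.62, 0.49, 0.09, 0.19]
alpha = 0.05
K = len(p_values)

threshold = alpha / K               # 0.05 / 8 = 0.00625
reject = min(p_values) < threshold  # 0.04 > 0.00625 -> do not reject

print(f"threshold = {threshold:.5f}, reject = {reject}")
```

Note that the smallest p-value (0.04) would be significant at the unadjusted 0.05 level; the multiplicity correction is what keeps the joint null intact.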
The top left panel in Figure 3 displays the estimated mean activity profile for males of age 57 with BMI 28. The shape is consistent with published average activity plots over the course of the day. On average, individuals are less active at night. Their activity increases sharply between 5 AM and 11 AM, remains elevated between 11 AM and 7 PM, and decreases rapidly after 7 PM. The top middle panel indicates that females in this age group (50‐70) are significantly more active than males throughout most of the day, except around midnight. The top right panel displays the estimated coefficient function for age, indicating a loss of activity for older individuals from the afternoon to 1 AM. There is no substantial age effect on activity in the morning, while the largest age effect occurs in the early night. These findings are consistent with previous reports that older individuals go to sleep earlier at night and wake up earlier in the morning, while younger individuals are more active at night. The bottom left panel indicates that activity tends to decrease with increasing BMI, especially in the daytime. Compared to weekdays, weekends correspond to lower levels of activity throughout the day, significantly so in the morning from 5 AM to 11 AM. These results are consistent with the common observation that individuals are less active on weekend mornings.
FIGURE 3

Estimated coefficient functions for the NHANES dataset along with pointwise confidence bands under exchangeable longitudinal covariance structure

Figure 4 displays the estimated mean activity profiles for females and males at different age and BMI combinations. From the left two panels, we see that, for a given BMI value, there is a pronounced decrease in activity as a function of age, both for females and for males. Moreover, younger individuals have on average two activity peaks, while older individuals tend to lose the second activity peak. This indicates that the change in physical activity patterns with age is driven by reduced activity in the afternoon and at night. The right two panels indicate that the average activity profile decreases with increasing BMI. The negative effect of age appears only after noon, whereas individuals with larger BMI have less activity throughout the daytime. Moreover, the effect of two units on the BMI scale is roughly equivalent to that of two years of age in terms of overall activity loss. Finally, the estimated covariances of physical activity profiles within each day and across days, and the top three estimated eigenfunctions, are shown in Section S.5 of the supplementary material.
FIGURE 4

Left two panels: Estimated mean activity profiles of females and males with BMI 28 at four different ages. Right two panels: Estimated mean activity profiles of females and males with age 57 and with four different BMI values


DISCUSSION

We introduced an inferential framework for estimating fixed effects functions for longitudinal functional data with complex correlation structures. The methods are natural extensions of standard longitudinal methods for scalar observations and use the flexible marginal decomposition model framework of Park and Staicu. Simulation studies indicate that the estimation procedure works remarkably well even when the longitudinal correlations are misspecified. However, the confidence bands obtained under misspecified correlation models may have smaller than nominal coverage or be wider than necessary. Our results indicate that using unspecified correlation matrices is the safer approach when it is computationally feasible. More work will be necessary to improve the computational feasibility of these methods for larger sample sizes in terms of the number of study participants, visits, and observations per curve. We also introduced a testing framework for longitudinal correlations in longitudinal functional data, which extends the testing method of Chen et al. The proposed tests provide insight into which correlation structures are appropriate and may help reduce the computational cost when a simpler longitudinal correlation structure is chosen. This article considered only scalar covariates, and it would be of interest to extend the proposed framework to include functional covariates.
REFERENCES (10 of 21 shown)

1.  Bayesian Semiparametric Functional Mixed Models for Serially Correlated Functional Data, with Application to Glaucoma Data.

Authors:  Wonyul Lee; Michelle F Miranda; Philip Rausch; Veerabhadran Baladandayuthapani; Massimo Fazio; J Crawford Downs; Jeffrey S Morris
Journal:  J Am Stat Assoc       Date:  2018-08-15       Impact factor: 5.033

2.  Fast Covariance Estimation for High-dimensional Functional Data.

Authors:  Luo Xiao; Vadim Zipunnikov; David Ruppert; Ciprian Crainiceanu
Journal:  Stat Comput       Date:  2014-06-27       Impact factor: 2.559

3.  Generalized multilevel function-on-scalar regression and principal component analysis.

Authors:  Jeff Goldsmith; Vadim Zipunnikov; Jennifer Schrack
Journal:  Biometrics       Date:  2015-01-25       Impact factor: 2.571

4.  Simple fixed-effects inference for complex functional models.

Authors:  So Young Park; Ana-Maria Staicu; Luo Xiao; Ciprian M Crainiceanu
Journal:  Biostatistics       Date:  2018-04-01       Impact factor: 5.899

5.  Functional CAR models for large spatially correlated functional datasets.

Authors:  Lin Zhang; Veerabhadran Baladandayuthapani; Hongxiao Zhu; Keith A Baggerly; Tadeusz Majewski; Bogdan A Czerniak; Jeffrey S Morris
Journal:  J Am Stat Assoc       Date:  2016-08-18       Impact factor: 5.033

6.  Additive Functional Cox Model.

Authors:  Erjia Cui; Ciprian M Crainiceanu; Andrew Leroux
Journal:  J Comput Graph Stat       Date:  2021-01-01       Impact factor: 2.302

7.  Functional Additive Mixed Models.

Authors:  Fabian Scheipl; Ana-Maria Staicu; Sonja Greven
Journal:  J Comput Graph Stat       Date:  2015-04-01       Impact factor: 2.302

8.  Bootstrap-based inference on the difference in the means of two correlated functional processes.

Authors:  Ciprian M Crainiceanu; Ana-Maria Staicu; Shubankar Ray; Naresh Punjabi
Journal:  Stat Med       Date:  2012-08-01       Impact factor: 2.373

9.  Quantifying the lifetime circadian rhythm of physical activity: a covariate-dependent functional approach.

Authors:  Luo Xiao; Lei Huang; Jennifer A Schrack; Luigi Ferrucci; Vadim Zipunnikov; Ciprian M Crainiceanu
Journal:  Biostatistics       Date:  2014-10-30       Impact factor: 5.899

10.  Fast covariance estimation for sparse functional data.

Authors:  Luo Xiao; Cai Li; William Checkley; Ciprian Crainiceanu
Journal:  Stat Comput       Date:  2017-04-11       Impact factor: 2.559
