
A joint design for functional data with application to scheduling ultrasound scans.

So Young Park¹, Luo Xiao², Jayson D Willbur³, Ana-Maria Staicu², N L'ntshotsholé Jumbe⁴.

Abstract

A joint design for sampling functional data is proposed to achieve optimal prediction of both functional data and a scalar outcome. The motivating application is fetal growth, where the objective is to determine the optimal times to collect ultrasound measurements in order to recover fetal growth trajectories and to predict child birth outcomes. The joint design is formulated using an optimization criterion and implemented in a pilot study. Performance of the proposed design is evaluated via simulation study and application to fetal ultrasound data.


Keywords: Covariance function; Fetal growth; Functional data analysis; Longitudinal data; Prediction

Year: 2018 PMID: 29861518 PMCID: PMC5840761 DOI: 10.1016/j.csda.2018.01.009

Source DB: PubMed Journal: Comput Stat Data Anal ISSN: 0167-9473 Impact factor: 1.681

Introduction

Functional data analysis has been a popular statistical research area for the last two decades and has found application in many fields such as brain imaging (Jiang et al., 2009; Greven et al., 2010; Reiss and Ogden, 2010; Lindquist, 2012; Lu and Marron, 2014; Park and Staicu, 2015), biosignals (Crainiceanu et al., 2012; Randolph et al., 2012; Goldsmith and Kitago, 2016), genetics (Tang and Müller, 2009; Reimherr and Nicolae, 2014) and wearable computing (Morris et al., 2006; Li et al., 2014; Xiao et al., 2015). For a comprehensive treatment of functional data analysis see Ramsay and Silverman (2002, 2005) and Horváth and Kokoszka (2012).

This paper considers sampling design for noisy growth data. The motivation arises from the study of fetal growth, where measurements of fetal size may be obtained during pregnancy using ultrasound. The particular question to be addressed is: when a fixed number of ultrasound scans will be taken during pregnancy, what are the optimal time points for data collection? Optimality can be defined either in terms of recovering individual fetal growth trajectories or in terms of predicting a birth outcome, such as birth weight. However, in practice it may be important to predict both individual growth trajectories and birth outcomes, and in such cases a joint optimality criterion must be formulated. We also consider the closely related question of the number of ultrasound scans required to achieve a desired level of optimality. We address this question within the functional data framework.

Design for functional data has received some interest recently. For example, Ferraty et al. (2010) considered a nonparametric model with a scalar response and a functional predictor and Delaigle et al. (2012) studied a similar problem for classifying and clustering functional data. Both methods are restricted to densely sampled functional data and focus on dimensionality reduction for a dense functional predictor.
For spatially correlated functional data, Rasekhi et al. (2014) and Bohorquez et al. (2015) considered the problem of selecting spatial sampling points. Design for functional data has also been extended to longitudinal data. Ji and Müller (2017) proposed prediction-based criteria for sampling functional data with the target of either recovering individual functions or predicting a scalar outcome. Wu et al. (2017) exploited the mixed effects model representation of functional data and proposed a design criterion based on Fisher’s information matrix of eigenvalues of the covariance function.

There are several limitations with these approaches. Wu et al. (2017) focused on recovering individual functions, while Ji and Müller (2017) studied the two designs separately and did not consider a joint design, which is the focus of our data application. In addition, in these works the number of design points was fixed in advance and no data-driven method was developed for choosing it. Finally, Ji and Müller (2017) did not compare functional data models with parametric mixed effects models for prediction-based designs. Our work addresses these gaps. Following early work on design, such as Ylvisaker (1987) and the references therein, and recent work by Ji and Müller (2017), we consider prediction-based designs and propose a unified design criterion for both recovering individual functions and predicting scalar outcomes from a functional predictor. We also propose a practical data-driven method for selecting the number of design points, building on the result that a larger number of design points never worsens the prediction (see Theorem 1). Finally, we conduct a comprehensive simulation study to evaluate the performance of functional data models as compared to parametric mixed effects models, and demonstrate numerically that functional data models might be preferred over parametric mixed effects models for prediction-based optimal designs for longitudinal data.

The rest of the paper is organized as follows. In Section 2 we introduce functional data models and propose a unified prediction-based design criterion for sampling functional data. In Section 3 we study the theoretical properties of the proposed design. In Section 4 we discuss implementation of the design and propose a data-driven method for selecting the number of design points. In Section 5 we illustrate the proposed method using fetal ultrasound data. In Section 6 we investigate the performance of the design via simulation studies.

Optimal design for functional data

In this section, we first describe functional data models and then formulate two optimal design problems for sampling functional data: one design targets accurate prediction of individual functions while the other targets accurate prediction of a scalar outcome. Then, we propose a unified design criterion that targets both recovering individual functions and predicting a scalar outcome. In particular, the unified design contains the previous two designs as special cases.

Statistical models

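Consider a random function $X(t)$ defined over a continuous and compact time domain $\mathcal{T}$. Suppose that $X$ is a Gaussian process with mean function $\mu(t) = E\{X(t)\}$ and covariance function $r(s,t) = \mathrm{Cov}\{X(s), X(t)\}$. We assume that $X$ is square integrable in $\mathcal{T}$ and without loss of generality we let $\mathcal{T} = [0,1]$. In practice, $X$ is observed at a finite number of time points and contaminated with noise. Hence, for a random function $X_i$ with a subject index $i$ observed at time points $t_{i1}, \ldots, t_{im_i}$, the observations are

$$W_{ij} = X_i(t_{ij}) + \epsilon_{ij}, \quad j = 1, \ldots, m_i, \tag{1}$$

where the $\epsilon_{ij}$ are i.i.d. $N(0, \sigma_\epsilon^2)$ and independent of $X_i$. Let $Y_i$ be a scalar outcome with a functional predictor $X_i$, and consider the functional linear model

$$Y_i = \alpha + \int_{\mathcal{T}} \beta(t) X_i(t)\,dt + e_i, \tag{2}$$

where $\alpha$ is an intercept, $\beta(t)$, $t \in \mathcal{T}$, is a smooth coefficient function, and $e_i$ is white noise independent of $X_i$ with mean zero and variance $\sigma_e^2$.

The fundamental element in functional data analysis is the covariance function $r(s,t)$. By Mercer’s theorem, $r(s,t)$ can be written as

$$r(s,t) = \sum_{k \ge 1} \lambda_k \phi_k(s) \phi_k(t),$$

where $\lambda_1 \ge \lambda_2 \ge \cdots \ge 0$ is the collection of eigenvalues and the $\phi_k$ are the associated eigenfunctions which satisfy $\int_{\mathcal{T}} \phi_k(t)\phi_l(t)\,dt = 1_{\{k = l\}}$. Here $1_{\{\cdot\}}$ is 1 if the condition inside the bracket holds and 0 otherwise. To ensure that $\beta(t)$ is identifiable, we assume that the coefficient function can be written as $\beta(t) = \sum_{k=1}^{K} \beta_k \phi_k(t)$, where the $\beta_k$ are scalars and a possibly infinite $K$ represents the number of non-zero eigenvalues.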
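To make the two models concrete, the following minimal sketch draws one subject from the observation model (1) and the functional linear model (2) through a truncated eigen-expansion. The cosine eigenfunctions, the parameter values and the helper names `eigenfunctions` and `simulate_subject` are illustrative assumptions and are not taken from the paper; the mean function is set to zero.

```python
import numpy as np

def eigenfunctions(t, K):
    """Hypothetical orthonormal basis on [0, 1]: phi_k(t) = sqrt(2) * cos(k * pi * t)."""
    t = np.atleast_1d(t)
    k = np.arange(1, K + 1)
    return np.sqrt(2.0) * np.cos(np.pi * np.outer(t, k))  # shape (len(t), K)

def simulate_subject(rng, lam, beta, sigma2_eps, sigma2_e, m, alpha=0.0):
    """Draw one subject from models (1) and (2), assuming a zero mean function."""
    K = len(lam)
    xi = rng.normal(0.0, np.sqrt(lam))                 # scores xi_k ~ N(0, lam_k)
    t = np.sort(rng.uniform(0.0, 1.0, size=m))         # irregular observation times
    X_t = eigenfunctions(t, K) @ xi                    # X_i(t_ij)
    W = X_t + rng.normal(0.0, np.sqrt(sigma2_eps), size=m)   # noisy observations, model (1)
    # With beta(t) = sum_k beta_k phi_k(t), the integral in (2) equals sum_k beta_k xi_k.
    Y = alpha + beta @ xi + rng.normal(0.0, np.sqrt(sigma2_e))
    return t, W, Y

rng = np.random.default_rng(0)
lam = np.array([1.0, 0.5, 0.25])     # illustrative eigenvalues
beta = np.array([1.0, -0.5, 0.2])    # illustrative coefficients beta_k
t, W, Y = simulate_subject(rng, lam, beta, sigma2_eps=0.1, sigma2_e=0.1, m=5)
```

Because the eigenfunctions are orthonormal, the integral in (2) reduces to an inner product between the coefficients and the scores, which is why no numerical integration is needed here.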

Optimal design for predicting functions

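Fix $p$ and assume that $p$ observations will be collected from a new subject. The goal is to select the optimal sampling points in $\mathcal{T}$ for predicting the new subject’s curve with the smallest possible error. Let $\mathbf{t} = (t_1, \ldots, t_p)^T$ be the vector of sampling points and $\mathbf{W}^*_{\mathbf{t}} = \{W^*(t_1), \ldots, W^*(t_p)\}^T$ be the noisy observations for a new subject $*$. Under model (1), the best predictor of $X^*(t)$ conditional on $\mathbf{W}^*_{\mathbf{t}}$ is the best linear unbiased predictor (BLUP) of $X^*(t)$,

$$\hat{X}^*_{\mathbf{t}}(t) = \mu(t) + \mathbf{r}_{\mathbf{t}}(t)^T \Sigma_{\mathbf{t}}^{-1} (\mathbf{W}^*_{\mathbf{t}} - \boldsymbol{\mu}_{\mathbf{t}}), \tag{3}$$

where $\mathbf{r}_{\mathbf{t}}(t) = \{r(t_1, t), \ldots, r(t_p, t)\}^T$, $\boldsymbol{\mu}_{\mathbf{t}} = E(\mathbf{W}^*_{\mathbf{t}})$ and $\Sigma_{\mathbf{t}} = \mathrm{Var}(\mathbf{W}^*_{\mathbf{t}})$. For simplicity, we suppress the notation $*$ from $X^*$ and $\mathbf{W}^*_{\mathbf{t}}$ and use $X$ and $\mathbf{W}_{\mathbf{t}}$. The optimal sampling points can be selected by minimizing the mean integrated squared error of the BLUP,

$$\mathcal{F}_1(\mathbf{t}) = E \int_{\mathcal{T}} \{X(t) - \hat{X}_{\mathbf{t}}(t)\}^2\,dt.$$

The optimal design is then defined as $\arg\min_{\mathbf{t} \in \mathcal{T}^p} \mathcal{F}_1(\mathbf{t})$. We simplify $\mathcal{F}_1(\mathbf{t})$ as

$$\mathcal{F}_1(\mathbf{t}) = \int_{\mathcal{T}} r(t,t)\,dt - \mathrm{tr}(\Sigma_{\mathbf{t}}^{-1} R_{\mathbf{t}}),$$

where $\mathrm{tr}(\cdot)$ is the trace operator, $\Sigma_{\mathbf{t}}$ has $(j,k)$ element $r(t_j, t_k) + \sigma_\epsilon^2 1_{\{j = k\}}$, and $R_{\mathbf{t}}$ is the $p \times p$ matrix whose $(j,k)$ element is given by $\int_{\mathcal{T}} r(t_j, s) r(t_k, s)\,ds$. We write $\mathbf{r}_{\mathbf{t}}(t) = \Phi_{\mathbf{t}} \Lambda \boldsymbol{\phi}(t)$, where $\Phi_{\mathbf{t}}$ is the $p \times K$ matrix with $(j,k)$ element $\phi_k(t_j)$, $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_K)$ and $\boldsymbol{\phi}(t) = \{\phi_1(t), \ldots, \phi_K(t)\}^T$. Note that

$$\int_{\mathcal{T}} r(t,t)\,dt = \sum_{k=1}^{K} \lambda_k = \mathrm{tr}(\Lambda).$$

It is also easy to show that

$$R_{\mathbf{t}} = \Phi_{\mathbf{t}} \Lambda^2 \Phi_{\mathbf{t}}^T.$$

Therefore,

$$\mathcal{F}_1(\mathbf{t}) = \mathrm{tr}(\Lambda) - \mathrm{tr}(\Lambda \Phi_{\mathbf{t}}^T \Sigma_{\mathbf{t}}^{-1} \Phi_{\mathbf{t}} \Lambda).$$

If we let $\Omega_{\mathbf{t}} = \Lambda - \Lambda \Phi_{\mathbf{t}}^T \Sigma_{\mathbf{t}}^{-1} \Phi_{\mathbf{t}} \Lambda$, then we obtain the simplified form

$$\mathcal{F}_1(\mathbf{t}) = \mathrm{tr}(\Omega_{\mathbf{t}}). \tag{5}$$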
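As a numerical illustration of this criterion, the sketch below evaluates the mean integrated squared error of the BLUP as the trace of the score prediction-error covariance, written here as Omega_t = Lam - Lam Phi' Sigma^{-1} Phi Lam with Sigma = Phi Lam Phi' + sigma^2 I. This notation, the cosine eigenfunctions and the parameter values are assumptions made for the example, and `basis`, `score_error_cov` and `mise` are hypothetical helper names.

```python
import numpy as np

def basis(t, K):
    """Hypothetical orthonormal eigenfunctions on [0, 1]: sqrt(2) * cos(k * pi * t)."""
    return np.sqrt(2.0) * np.cos(np.pi * np.outer(np.atleast_1d(t), np.arange(1, K + 1)))

def score_error_cov(t, lam, sigma2_eps):
    """Omega_t = Lam - Lam Phi' Sigma^{-1} Phi Lam for the sampling points t."""
    Phi = basis(t, len(lam))                                        # p x K, entries phi_k(t_j)
    Lam = np.diag(lam)
    Sigma = Phi @ Lam @ Phi.T + sigma2_eps * np.eye(Phi.shape[0])   # Var(W_t)
    return Lam - Lam @ Phi.T @ np.linalg.solve(Sigma, Phi @ Lam)

def mise(t, lam, sigma2_eps):
    """Mean integrated squared error of the BLUP, computed as tr(Omega_t)."""
    return float(np.trace(score_error_cov(t, lam, sigma2_eps)))

lam = np.array([1.0, 0.5, 0.25])              # illustrative eigenvalues
f1_one = mise([0.5], lam, 0.2)                # one scan at t = 0.5
f1_two = mise([0.25, 0.75], lam, 0.2)         # two scans
```

When all eigenvalues are positive, the same matrix can be written through the Woodbury identity as the inverse of Lam^{-1} + Phi' Phi / sigma^2, which gives a convenient cross-check of the implementation.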

Optimal design for predicting an outcome

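Similar to Section 2.2, assume that $p$ observations will be collected from a new subject indexed by $*$. But let the goal now be to select the optimal sampling points in $\mathcal{T}$ for predicting the new subject’s scalar outcome with the smallest possible error. Using the same notation as in Section 2.2, let $\mathbf{t}$ be the vector of sampling points and $\mathbf{W}_{\mathbf{t}}$ be the noisy observations for subject $*$. Under the functional linear model (2), $Y = \tilde{Y} + e$, where $\tilde{Y} = \alpha + \int_{\mathcal{T}} \beta(t) X(t)\,dt$. Then under the functional data model (1), the best predictor of $\tilde{Y}$ conditional on $\mathbf{W}_{\mathbf{t}}$ is the best linear unbiased predictor of $\tilde{Y}$,

$$\hat{Y}_{\mathbf{t}} = \alpha + \int_{\mathcal{T}} \beta(t) \hat{X}_{\mathbf{t}}(t)\,dt.$$

The mean squared error for predicting $\tilde{Y}$ is

$$\mathcal{F}_2(\mathbf{t}) = E(\tilde{Y} - \hat{Y}_{\mathbf{t}})^2.$$

Then the optimal design is $\arg\min_{\mathbf{t} \in \mathcal{T}^p} \mathcal{F}_2(\mathbf{t})$. Note that the mean squared error for predicting $Y$ is $\mathcal{F}_2(\mathbf{t}) + \sigma_e^2$, which results in the same design. This design was studied in earlier work including Ritter (1996), and more recently Ji and Müller (2017). By Eq. (3),

$$\hat{Y}_{\mathbf{t}} = \alpha + \int_{\mathcal{T}} \beta(t)\mu(t)\,dt + \left\{\int_{\mathcal{T}} \beta(t)\mathbf{r}_{\mathbf{t}}(t)\,dt\right\}^T \Sigma_{\mathbf{t}}^{-1} (\mathbf{W}_{\mathbf{t}} - \boldsymbol{\mu}_{\mathbf{t}}).$$

Thus,

$$\mathcal{F}_2(\mathbf{t}) = \mathrm{Var}(\tilde{Y}) - \left\{\int_{\mathcal{T}} \beta(t)\mathbf{r}_{\mathbf{t}}(t)\,dt\right\}^T \Sigma_{\mathbf{t}}^{-1} \left\{\int_{\mathcal{T}} \beta(t)\mathbf{r}_{\mathbf{t}}(t)\,dt\right\}.$$

Then,

$$\int_{\mathcal{T}} \beta(t)\mathbf{r}_{\mathbf{t}}(t)\,dt = \Phi_{\mathbf{t}} \Lambda \boldsymbol{\beta},$$

where $\boldsymbol{\beta}$ is a $K$-dimensional vector with $\boldsymbol{\beta} = (\beta_1, \ldots, \beta_K)^T$, $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_K)$, and $\Phi_{\mathbf{t}}$ is the $p \times K$ matrix with $(j,k)$ element $\phi_k(t_j)$. We also obtain

$$\mathrm{Var}(\tilde{Y}) = \boldsymbol{\beta}^T \Lambda \boldsymbol{\beta},$$

which leads to

$$\mathcal{F}_2(\mathbf{t}) = \boldsymbol{\beta}^T \Lambda \boldsymbol{\beta} - \boldsymbol{\beta}^T \Lambda \Phi_{\mathbf{t}}^T \Sigma_{\mathbf{t}}^{-1} \Phi_{\mathbf{t}} \Lambda \boldsymbol{\beta}.$$

Therefore, with $\Omega_{\mathbf{t}} = \Lambda - \Lambda \Phi_{\mathbf{t}}^T \Sigma_{\mathbf{t}}^{-1} \Phi_{\mathbf{t}} \Lambda$, we obtain the simplified form

$$\mathcal{F}_2(\mathbf{t}) = \boldsymbol{\beta}^T \Omega_{\mathbf{t}} \boldsymbol{\beta} = \mathrm{tr}(\boldsymbol{\beta}\boldsymbol{\beta}^T \Omega_{\mathbf{t}}). \tag{6}$$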
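The closed form for the outcome criterion can be checked by simulation. The sketch below compares the quadratic form beta' Omega_t beta, with Omega_t = Lam - Lam Phi' Sigma^{-1} Phi Lam, against a Monte Carlo estimate of the squared error of the BLUP of the noise-free outcome. The cosine eigenfunctions, the parameter values and the zero mean function are assumptions made only for this example.

```python
import numpy as np

rng = np.random.default_rng(1)
lam = np.array([1.0, 0.5, 0.25])      # illustrative eigenvalues
beta = np.array([1.0, -0.5, 0.2])     # illustrative coefficients beta_k
sigma2_eps = 0.2
t = np.array([0.2, 0.7])              # two sampling points

# Hypothetical eigenfunctions sqrt(2) * cos(k * pi * t) evaluated at t (p x K).
Phi = np.sqrt(2.0) * np.cos(np.pi * np.outer(t, np.arange(1, 4)))
Lam = np.diag(lam)
Sigma = Phi @ Lam @ Phi.T + sigma2_eps * np.eye(len(t))
Omega = Lam - Lam @ Phi.T @ np.linalg.solve(Sigma, Phi @ Lam)
mse_formula = float(beta @ Omega @ beta)

# Monte Carlo: draw scores and noisy observations, predict the scores by their BLUP.
N = 200_000
xi = rng.normal(size=(N, 3)) * np.sqrt(lam)
W = xi @ Phi.T + rng.normal(scale=np.sqrt(sigma2_eps), size=(N, len(t)))
xi_hat = W @ np.linalg.solve(Sigma, Phi @ Lam)     # row i holds Lam Phi' Sigma^{-1} W_i
mse_mc = float(np.mean(((xi - xi_hat) @ beta) ** 2))
```

The two quantities `mse_formula` and `mse_mc` should agree up to Monte Carlo error, since the prediction error of the noise-free outcome is the inner product of the coefficients with the score prediction errors.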

A joint design for functional data

In practice, there might be multiple goals in design with each goal resulting in one optimal design. Depending on the goal, the corresponding optimal design may vary and may not be optimal for alternative goals. Indeed, the optimal sampling points for predicting functions may not be the optimal sampling points for predicting a scalar outcome, and vice versa. It may thus be useful to consider a joint design to balance between the different goals. Note that joint designs may also be referred to as compound designs in the statistical design literature (Atkinson et al., 2007, Chapter 21).

Before formulating a joint design, consider first the design objective function

$$\mathcal{F}_A(\mathbf{t}) = \mathrm{tr}(A\,\Omega_{\mathbf{t}}), \quad \Omega_{\mathbf{t}} = \Lambda - \Lambda \Phi_{\mathbf{t}}^T \Sigma_{\mathbf{t}}^{-1} \Phi_{\mathbf{t}} \Lambda, \tag{7}$$

where $A$ is an arbitrary positive semidefinite matrix and we will call it a “linear design criterion matrix” as the objective function depends linearly on elements of $A$. The form in (7) is general with different $A$ leading to different designs. In particular, it includes as special cases the objective functions for the design for predicting functions (5) and for the design for predicting an outcome (6). Indeed, for predicting the growth curve of a new subject, $A$ is the identity matrix and for predicting a scalar outcome of a new subject, $A = \boldsymbol{\beta}\boldsymbol{\beta}^T$. Additionally, if it is more important to predict a curve more accurately at some time points than others, one may consider a weighted mean integrated squared error $E \int_{\mathcal{T}} w(t)\{X(t) - \hat{X}_{\mathbf{t}}(t)\}^2\,dt$ for some known weight function $w(t)$. It can be shown that the objective function to be minimized still takes the form in (7) with a particular design criterion matrix. Specifically, $A_w = \int_{\mathcal{T}} w(t)\boldsymbol{\phi}(t)\boldsymbol{\phi}(t)^T\,dt$ and $A_w$ is positive semidefinite with a finite operator norm by Lemma 1 in Appendix A.

Now consider a bivariate continuous function $g(x,y)$ on $[0,\infty)^2$ and the objective function $\mathcal{G}(\mathbf{t}) = g\{\mathcal{F}_1(\mathbf{t}), \mathcal{F}_2(\mathbf{t})\}$. Suppose the joint design is to minimize the objective function $\mathcal{G}(\mathbf{t})$. It is reasonable to impose the following assumption on $g$.

Assumption 1. $g(x,y)$ is nondecreasing along both $x$ and $y$ for $x, y \ge 0$. Moreover, $g(0,0) = 0$.

Let $\rho_1$ and $\rho_2$ be two fixed non-negative constants. Two sensible forms of $g$ are: $g(x,y) = \rho_1 x + \rho_2 y$ and $g(x,y) = \max(\rho_1 x, \rho_2 y)$. The former is a joint design that minimizes a linear combination of two prediction errors while the latter means that the joint design aims to minimize the maximum of the two prediction errors (up to multiplicative weights). In particular, $\rho_1 \mathcal{F}_1(\mathbf{t}) + \rho_2 \mathcal{F}_2(\mathbf{t}) = \mathcal{F}_A(\mathbf{t})$ with $A = \rho_1 I + \rho_2 \boldsymbol{\beta}\boldsymbol{\beta}^T$. It is straightforward to show that both forms satisfy Assumption 1. The two constants $\rho_1$ and $\rho_2$ are used to control the weights of the two different design objective functions. One reasonable choice of $\rho_1$ and $\rho_2$ is to balance the two design objective functions such that one design does not dominate the other. In view of (7) and Theorem 2 from the following section, we may let $\rho_1 = (\sum_k \lambda_k)^{-1}$ and $\rho_2 = (\sum_k \lambda_k \beta_k^2)^{-1}$, and it can then be shown that $0 \le \rho_1 \mathcal{F}_1(\mathbf{t}) \le 1$ and $0 \le \rho_2 \mathcal{F}_2(\mathbf{t}) \le 1$.
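The relation between the linear joint objective and a single criterion matrix can be illustrated numerically. The sketch below assumes the trace form tr(A Omega_t) for the design objective, with Omega_t = Lam - Lam Phi' Sigma^{-1} Phi Lam, and uses hypothetical cosine eigenfunctions and parameter values; `omega` and `F_A` are illustrative helper names.

```python
import numpy as np

def omega(Phi, lam, sigma2_eps):
    """Omega_t = Lam - Lam Phi' Sigma^{-1} Phi Lam (assumed notation)."""
    Lam = np.diag(lam)
    Sigma = Phi @ Lam @ Phi.T + sigma2_eps * np.eye(Phi.shape[0])
    return Lam - Lam @ Phi.T @ np.linalg.solve(Sigma, Phi @ Lam)

def F_A(A, Omega):
    """Linear design objective tr(A Omega_t) for a criterion matrix A."""
    return float(np.trace(A @ Omega))

lam = np.array([1.0, 0.5, 0.25])      # illustrative eigenvalues
beta = np.array([1.0, -0.5, 0.2])     # illustrative coefficients beta_k
K = len(lam)
t = np.array([0.3, 0.8])
Phi = np.sqrt(2.0) * np.cos(np.pi * np.outer(t, np.arange(1, K + 1)))  # hypothetical basis
Omega = omega(Phi, lam, sigma2_eps=0.2)

F1 = F_A(np.eye(K), Omega)               # criterion for recovering the curve
F2 = F_A(np.outer(beta, beta), Omega)    # criterion for predicting the outcome

rho1 = 1.0 / lam.sum()                   # balancing weights
rho2 = 1.0 / np.sum(lam * beta**2)
g_linear = rho1 * F1 + rho2 * F2         # linear form of g
g_max = max(rho1 * F1, rho2 * F2)        # maximum form of g

# The linear joint objective is itself of the form tr(A Omega_t).
A_joint = rho1 * np.eye(K) + rho2 * np.outer(beta, beta)
```

With these weights each scaled criterion lies between 0 and 1, so neither term can dominate the joint objective purely because of its scale.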

Properties of the joint design objective

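In this section we study the properties of $\mathcal{G}(\mathbf{t}) = g\{\mathcal{F}_1(\mathbf{t}), \mathcal{F}_2(\mathbf{t})\}$ for any function $g$ that satisfies Assumption 1. We assume that the random functions, $X$, are square integrable (i.e., $E\int_{\mathcal{T}} X^2(t)\,dt < \infty$) and the coefficient function in the functional linear regression model (2) is also square integrable (i.e., $\int_{\mathcal{T}} \beta^2(t)\,dt < \infty$). Proofs of the theorems are provided in Appendix A.

Theorem 1. Suppose $\mathbf{t}_p \in \mathcal{T}^p$ for some fixed integer $p$ and $\mathbf{t}_{p+1} = (\mathbf{t}_p^T, t_{p+1})^T$ with $t_{p+1} \in \mathcal{T}$; then $\mathcal{G}(\mathbf{t}_{p+1}) \le \mathcal{G}(\mathbf{t}_p)$.

Theorem 1 implies that more observations (i.e., larger $p$) do not increase the value of the objective function $\mathcal{G}$. We also study the deterministic bound of $\mathcal{G}$ as $p$ diverges to infinity according to a fixed design.

Theorem 2. Suppose that the assumptions stated in Appendix A hold. For the fixed design $\mathbf{t} = (t_1, \ldots, t_p)^T$ where the $t_j$ are equally spaced in $\mathcal{T}$, we have $\mathcal{G}(\mathbf{t}) \to 0$ as $p \to \infty$.

Theorem 2 provides the rationale that a dense set of time points in $\mathcal{T}$ is sufficient as the candidate sampling points. In practice, because of the cost of data collection and other considerations, a small number of sampling points with reasonable prediction power might be preferred. In Section 4.2, we propose a data-driven method for selecting the number of optimal time points.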
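Both properties can be examined numerically in a small example. The sketch below evaluates a linear joint objective along a nested sequence of designs, where each design adds one point to the previous one, and on a dense equally spaced grid. It assumes the objective rho1 tr(Omega_t) + rho2 beta' Omega_t beta with balancing weights, hypothetical cosine eigenfunctions and illustrative parameter values; it is a check on one configuration, not a proof.

```python
import numpy as np

def joint_objective(t, lam, beta, sigma2_eps):
    """Linear joint criterion rho1 * tr(Omega_t) + rho2 * beta' Omega_t beta (assumed form)."""
    t = np.atleast_1d(t)
    K = len(lam)
    Phi = np.sqrt(2.0) * np.cos(np.pi * np.outer(t, np.arange(1, K + 1)))  # hypothetical basis
    Lam = np.diag(lam)
    Sigma = Phi @ Lam @ Phi.T + sigma2_eps * np.eye(len(t))
    Omega = Lam - Lam @ Phi.T @ np.linalg.solve(Sigma, Phi @ Lam)
    rho1 = 1.0 / lam.sum()
    rho2 = 1.0 / np.sum(lam * beta**2)
    return float(rho1 * np.trace(Omega) + rho2 * (beta @ Omega @ beta))

lam = np.array([1.0, 0.5, 0.25])
beta = np.array([1.0, -0.5, 0.2])
rng = np.random.default_rng(2)

# Nested designs: the first p entries of one fixed sequence of time points.
points = rng.uniform(0.0, 1.0, size=8)
nested_values = [joint_objective(points[:p], lam, beta, 0.2) for p in range(1, 9)]

# Dense equally spaced design on [0, 1].
dense = (np.arange(2000) + 0.5) / 2000
dense_value = joint_objective(dense, lam, beta, 0.2)
```

In this configuration `nested_values` is nonincreasing and `dense_value` is close to zero, in line with the two statements above.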

Implementation

Model estimation using pilot data

To implement the proposed optimal design, we need to estimate the covariance function $r(s,t)$, error variance $\sigma_\epsilon^2$ and coefficient function $\beta(t)$ using pilot data. Many methods exist for covariance function estimation including local polynomial regression (Yao et al., 2005), mixed effects models (James et al., 2000) and geometric PCA (Peng and Paul, 2009). We use the fast covariance estimation method (FACEs) from Xiao et al. (2017), which uses a penalized tensor product of cubic B-splines for approximating the true covariance function. The error variance $\sigma_\epsilon^2$ can also be estimated by FACEs. As for estimating $\beta(t)$, we select $K$, the number of eigenfunctions, by the percentage of variance explained (PVE) with a value of 0.95.
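The PVE rule for choosing the number of eigenfunctions can be sketched as follows. The function name `select_num_components` and the example eigenvalues are hypothetical; the covariance estimation step itself (FACEs) is not reproduced here.

```python
import numpy as np

def select_num_components(eigenvalues, pve=0.95):
    """Smallest K whose leading eigenvalues explain at least `pve` of the total variance."""
    lam = np.sort(np.clip(np.asarray(eigenvalues, dtype=float), 0.0, None))[::-1]
    cum_pve = np.cumsum(lam) / lam.sum()
    K = int(np.searchsorted(cum_pve, pve) + 1)   # first index with cum_pve >= pve
    return min(K, len(lam))                      # guard against rounding at pve = 1

# Illustrative eigenvalues; the cumulative PVE first exceeds 0.95 at the fourth one.
K = select_num_components([4.0, 2.0, 1.0, 0.5, 0.25, 0.05])
```

Negative estimated eigenvalues, which can arise from smoothing, are truncated at zero before the cumulative proportion is computed; this is a common convention and an assumption of the sketch.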

Optimization algorithm and selection of number of design points

In practice, the optimal sampling points are selected from a pre-determined set of candidate time points, denoted by $\mathcal{T}_0$. Theorem 2 suggests that equally spaced sampling points can form a reasonable set of candidate points. If the number of selected design points $p$ is small, then we use a full search algorithm (i.e., we evaluate $\mathcal{G}(\mathbf{t})$ for every combination of $p$ points from $\mathcal{T}_0$). If the number of selected design points is large, a full search becomes computationally difficult and one may use the Monte Carlo sampling method in Wu et al. (2017) or the sequential search method in Ji and Müller (2017). In this paper we focus on the full search algorithm.

In many applications, the number of optimal time points $p$ may not be known a priori. One approach is to choose the smallest $p$ such that the expected error is smaller than some pre-determined tolerance error. Alternatively, similar to Ferraty et al. (2010), one may take the cost of collecting more sampling points into consideration. Here we propose a new method for selecting $p$. First, when $\mathbf{t} = \emptyset$, an empty set, we define $\mathcal{F}_1(\emptyset) = \sum_k \lambda_k$, $\mathcal{F}_2(\emptyset) = \sum_k \lambda_k \beta_k^2$, and $\mathcal{G}(\emptyset) = g\{\mathcal{F}_1(\emptyset), \mathcal{F}_2(\emptyset)\}$. Note that $\sum_k \lambda_k$ is the total variation of the functional predictor while $\sum_k \lambda_k \beta_k^2$ is the total variation of the response that can be explained by the functional predictor. Then it can be easily verified by the definitions in (5) and (6) and the assumptions on $g$ that $\mathcal{G}(\mathbf{t}) \le \mathcal{G}(\emptyset)$ for any $\mathbf{t}$ of any dimension. Let $\mathcal{G}^*_p = \min_{\mathbf{t} \in \mathcal{T}^p} \mathcal{G}(\mathbf{t})$. Then,

$$p_0 = \min\left\{p : \frac{\mathcal{G}^*_p - \mathcal{G}^*_{p+1}}{\mathcal{G}(\emptyset)} \le \delta\right\}, \tag{8}$$

where $\delta$ is a fixed constant corresponding to the maximum percent reduction in expected squared error gained by augmenting the design with an additional design point. By Theorem 1 in Section 3, for any $p$, $\mathcal{G}^*_{p+1} \le \mathcal{G}^*_p$. Thus, the relative error level $\mathcal{G}^*_p / \mathcal{G}(\emptyset)$ is a decreasing function of $p$ and converges to 0 by Theorem 2. This implies that for any fixed $\delta > 0$, $p_0$ is finite. In practice, we plug estimated model components (see Section 4.1) into $\mathcal{G}$ to obtain $\hat{\mathcal{G}}$. Then we let $\hat{\mathcal{G}}^*_p = \min_{\mathbf{t} \in \mathcal{T}_0^p} \hat{\mathcal{G}}(\mathbf{t})$ and define

$$\hat{p} = \min\left\{p : \frac{\hat{\mathcal{G}}^*_p - \hat{\mathcal{G}}^*_{p+1}}{\hat{\mathcal{G}}(\emptyset)} \le \delta\right\}. \tag{9}$$

Small values of $\delta$ seem preferable and we use $\delta = 0.05$ in both the data application and simulations. This implies that we select a $p$ such that the addition of a new design point will reduce the expected squared error by no more than 5%, relative to the error reduced by using the fully observed functional predictor.
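The full search and the stopping rule for the number of design points can be sketched as below. The sketch assumes a linear joint objective with balancing weights, for which the value at the empty design equals 2, together with hypothetical cosine eigenfunctions, illustrative parameter values and a small candidate grid. The names `full_search` and `select_num_points` are illustrative and are not the interface of the FDAdesign package.

```python
from itertools import combinations
import numpy as np

def joint_objective(t, lam, beta, sigma2_eps):
    """Linear joint criterion rho1 * tr(Omega_t) + rho2 * beta' Omega_t beta (assumed form)."""
    t = np.atleast_1d(t)
    K = len(lam)
    Phi = np.sqrt(2.0) * np.cos(np.pi * np.outer(t, np.arange(1, K + 1)))  # hypothetical basis
    Lam = np.diag(lam)
    Sigma = Phi @ Lam @ Phi.T + sigma2_eps * np.eye(len(t))
    Omega = Lam - Lam @ Phi.T @ np.linalg.solve(Sigma, Phi @ Lam)
    return float(np.trace(Omega) / lam.sum() + (beta @ Omega @ beta) / np.sum(lam * beta**2))

def full_search(candidates, p, objective):
    """Evaluate the objective on every p-subset of the candidates and keep the best one."""
    best_t, best_val = None, np.inf
    for combo in combinations(candidates, p):
        val = objective(np.array(combo))
        if val < best_val:
            best_t, best_val = combo, val
    return best_t, best_val

def select_num_points(candidates, objective, g_empty, delta=0.05, p_max=4):
    """Smallest p whose next added point reduces the relative error by at most delta."""
    prev_t, prev_val = full_search(candidates, 1, objective)
    for p in range(1, p_max + 1):
        next_t, next_val = full_search(candidates, p + 1, objective)
        if (prev_val - next_val) / g_empty <= delta:
            return p, prev_t, prev_val / g_empty   # size, design, relative error
        prev_t, prev_val = next_t, next_val
    return None                                    # rule not met within p_max

lam = np.array([1.0, 0.5, 0.25])
beta = np.array([1.0, -0.5, 0.2])
candidates = np.linspace(0.05, 0.95, 13)
objective = lambda t: joint_objective(t, lam, beta, 0.2)
result = select_num_points(candidates, objective, g_empty=2.0)
```

The number of subsets grows combinatorially with the design size, which is why the exhaustive search is only practical for a small number of design points and a moderate candidate grid.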

Software and shiny interactive graphic

The proposed optimal design method has been implemented as an R package (R Core Team, 2016) FDAdesign that includes interactive graphics using shiny (Chang et al., 2016) which can be used to evaluate design objectives corresponding to different sampling designs. The interface of the graphic is illustrated in the data application. Details about using the FDAdesign package and the interactive graphics can be found in Section S.1 of the Supplementary materials.

Application to fetal ultrasound

We apply the proposed methodology to fetal growth data, where ultrasound scans were performed at different weeks of gestational age (GA). For this analysis, we model measurements of abdominal circumference by ultrasound (scaled within the range of 0 and 1) to estimate individual fetal growth trajectories and use newborn birth weight as the scalar outcome. The fetal growth dataset contains between 1 and 6 ultrasound scans for each of 2388 subjects, with most subjects having 5 scans. The spaghetti plot for the ultrasound measurements is shown in Fig. 1, with data from 3 subjects highlighted. While the 3 subjects do show some degree of curvilinearity, the overall pattern of trajectories raises the question of whether a linear mixed effects model would suffice for this data.

Fig. 1

Spaghetti plot of the fetal ultrasound data.

Thus, we compare the functional model and the linear mixed effects model using 10-fold cross validation. We find that the linear mixed effects model has twice the prediction error of the functional model. Figure S.2 in the Supplementary materials illustrates the prediction performance of the two models for one particular case, where 90% of the data are used for model estimation and the remaining 10% are used for evaluation. Therefore, the functional model seems more appropriate for this application. Fig. 2 displays the fPCA fit. The top panels of Fig. 2 show that both the estimated mean function and variance function are increasing with gestational age. The bottom left panel of Fig. 2 indicates high positive correlations () when both gestational ages are smaller than 32 weeks. The top three estimated eigenvalues are , and , respectively, with the corresponding estimated eigenfunctions shown in the bottom right panel of Fig. 2. The estimated error variance is .

Fig. 2

fPCA fit to the ultrasound measurements.

When we predict subject birth weight using the functional linear model (2) with abdominal circumference as the functional covariate, it turns out that about of the variation in the birth weight is explained by the functional covariate. For the estimated coefficient function, see Figure S.3 of the Supplementary materials.

Finally, we consider a linear joint design with the target of accurately recovering the ultrasound measurements of fetal abdominal circumference and predicting the newborn outcome of birth weight. The objective function for the joint design is $\hat{\mathcal{G}}(\mathbf{t}) = \hat{\rho}_1 \hat{\mathcal{F}}_1(\mathbf{t}) + \hat{\rho}_2 \hat{\mathcal{F}}_2(\mathbf{t})$, where $\hat{\mathcal{F}}_1$ is the estimated objective function for recovering individual functions while $\hat{\mathcal{F}}_2$ is the estimated objective function for predicting a scalar outcome; see Section 2.4 for more details. To balance the two objective functions, we let $\hat{\rho}_1 = (\sum_{k=1}^{3} \hat{\lambda}_k)^{-1}$ and $\hat{\rho}_2 = (\sum_{k=1}^{3} \hat{\lambda}_k \hat{\beta}_k^2)^{-1}$, where the $\hat{\lambda}_k$ and the $\hat{\beta}_k$ are estimated from the fetal data and the top 3 eigenvalues are selected using a PVE of 0.95. The above weights ensure that $0 \le \hat{\rho}_1 \hat{\mathcal{F}}_1(\mathbf{t}) \le 1$ and $0 \le \hat{\rho}_2 \hat{\mathcal{F}}_2(\mathbf{t}) \le 1$. As a result, one objective function will not dominate the other. We let the set of candidate time points be the collection of half weeks between 13 and 41 weeks gestational age.

Using the proposed method, we determine the optimal sampling points when the number of sampling points is fixed at 1, 2 and 3. We also calculate the relative error and Fig. 3 displays the results. The top left panel of Fig. 3 shows that if only 1 sampling point is selected, then 37 weeks is the optimal time point for collecting the ultrasound measurement and its relative error is about 0.20 (bottom right panel of Fig. 3). If 2 sampling points are desired, then 32 and 38 weeks are the optimal time points for collecting ultrasound measurements. With 2 optimal sampling points, the relative error is 0.13, which is smaller than the relative error with only 1 optimal sampling point. The bottom right panel of Fig. 3 displays the relative error for several values of $p$. As expected, the relative error decreases as $p$ increases. Using the selection criterion (9) with $\delta = 0.05$, we determine that the optimal number of sampling points is 2.

Fig. 3

Optimal sampling points and the corresponding relative error levels. The dashed gray vertical lines in the top left, top right and bottom left panels are the candidate sampling points. The relative error levels are $\hat{\mathcal{G}}(\hat{\mathbf{t}}_p) / \hat{\mathcal{G}}(\emptyset)$, where $\hat{\mathbf{t}}_p$ is the vector of optimal sampling points determined by $\hat{\mathcal{G}}$.

To evaluate the uncertainty in the estimated optimal sampling points, we bootstrap the fetal ultrasound data at the subject level and select the optimal sampling points for 1000 bootstrapped datasets. Fig. 4 gives the histograms of the selected optimal sampling points, which show small variability of the estimated optimal sampling points. For example, for $p = 1$, week 37 is selected about 60% of the time.

Fig. 4

Histograms of selected optimal scan weeks from 1000 bootstrapped datasets for and . The blue dashed lines are the estimated optimal scan weeks using the original fetal ultrasound data.

Finally, we plot in Fig. 5 screenshots of the Shiny interface for the fetal ultrasound. The top panel displays the heat map of the objective function/prediction errors as a bivariate function of two scan weeks. The optimal weeks are highlighted. The heat map indicates that at least one sampling point needs to be no earlier than 33 weeks in order to obtain a relatively small prediction error. As these plots evaluate the prediction error of any combination of candidate sampling points, they can be used to find all candidate sampling points that give a prediction error smaller than a given fixed error. The interface is interactive: users can select the first scan week and the application will then find the optimal second scan week. Moreover, users can go further by selecting the second scan week and comparing the results with the optimal scan weeks. For example, as illustrated in the bottom panel, when 13 weeks is selected for the first scan, 37 weeks is found to be the optimal second scan week (left plot in the bottom panel). If 16 weeks is also selected for the second scan, then the result can be compared with several different choices including the optimal scan weeks (right plot in the bottom panel). A similar screenshot with the goal of selecting just one scan is presented in Figure S.4 of the Supplementary materials.

Fig. 5

Screenshots of the interface of the Shiny application for the fetal ultrasound. (a) Heat map of the objective function evaluated with two scans. (b) Objective function evaluated with two scans while one fixed at 16 weeks.

A simulation study

We conduct a simulation study to investigate the performance of the proposed design for (a) estimating optimal sampling points and for (b) selecting the number of optimal sampling points. We also compare functional data models against a parametric mixed effects model in terms of estimating optimal sampling points, when data are generated from either a functional data model or a parametric mixed effects model. We focus on the linear joint design where the goal is to best predict both an underlying true curve and a scalar outcome, and we use the same design criterion matrix as in the data example with weights $\rho_1 = (\sum_k \lambda_k)^{-1}$ and $\rho_2 = (\sum_k \lambda_k \beta_k^2)^{-1}$.

Simulation settings

For each simulation scenario, we use Monte Carlo samples from the model in (1). For simplicity, we let the mean function $\mu(t)$ be zero for all $t$. We generate $X_i$ by $X_i(t) = \sum_{k=1}^{5} \xi_{ik}\phi_k(t)$, where $\{\phi_k\}$ is a set of orthonormal eigenfunctions (to be specified later) and $\xi_{ik}$ is sampled from a normal distribution with mean zero and variance $\lambda_k$. Random errors $\epsilon_{ij}$ are sampled independently from a normal distribution with mean zero and variance $\sigma_\epsilon^2$, chosen so that the signal to noise ratio equals one. The number of observations per subject, $m_i$, varies across subjects and the sampling time points are drawn from the uniform distribution in the unit interval. We consider a factorial design with three experimental factors:

Covariance function $r(s,t)$: (a) Periodic covariance induced by five Fourier bases, with $\phi_k$ defined separately for odd $k$ and for even $k$. The covariance function is periodic, so that each sampling point in the left half of the domain has a counterpart in the right half carrying the same information. (b) Non-periodic covariance induced by the five eigenfunctions shown in Figure S.6 of the Supplementary materials; these eigenfunctions do not have analytical forms.

Number of observations per subject: (a) $m_i \sim \{3, 4, 5\}$ and (b) $m_i \sim \{7, \ldots, 10\}$.

Number of subjects: (a) $n = 400$; (b) $n = 800$; and (c) $n = 1500$.

Thus, in total there are 12 model conditions to examine. Then the scalar outcomes, $Y_i$, are generated from the functional linear model in (2). For simplicity, we let the intercept $\alpha = 0$. We use four different coefficient functions (see Figure S.7 of the Supplementary materials): , with and 0.5. . . Note that in FLM-Case1 the coefficient function depends on the eigenfunctions and is different for the periodic and non-periodic covariances (see Figure S.7 of the Supplementary materials). Random errors in (2) were sampled independently from a normal distribution with mean zero and variance $\sigma_e^2$.

Results for estimation of optimal sampling points

We consider estimation of optimal sampling points when the number of optimal points $p$ is fixed at either 3, 4 or 5. Let $\mathbf{t}^*$ be the optimal sampling points that minimize the true objective function $\mathcal{G}$ and let $\hat{\mathbf{t}}$ be the selected sampling points that minimize the estimated objective function $\hat{\mathcal{G}}$. We evaluate the accuracy of the estimated optimal sampling points using the following evaluation criterion:

$$\mathrm{ARE} = \frac{|\mathcal{G}(\hat{\mathbf{t}}) - \mathcal{G}(\mathbf{t}^*)|}{\mathcal{G}(\mathbf{t}^*)}. \tag{10}$$

The absolute relative error, ARE, measures how close the expected (integrated) squared error using observations collected at the estimated optimal sampling points is to the expected (integrated) squared error using the true optimal points. We compare $\mathcal{G}(\hat{\mathbf{t}})$ and $\mathcal{G}(\mathbf{t}^*)$, rather than $\hat{\mathbf{t}}$ and $\mathbf{t}^*$, for the following reasons. First, when the covariance function is periodic, as is the one shown in the top left panel of Figure S.6 of the Supplementary materials, $\mathbf{t}^*$ is not identifiable. This is because with a periodic covariance function, data (excluding random errors) collected at any sampling point in the left half of the domain are the same as data collected at one sampling point in the right half. The identifiability issue is illustrated in Section S.3.2 of the Supplementary materials. Second, as our ultimate goal is to minimize $\mathcal{G}$, the expected (integrated) squared error, we consider the ARE measure more appropriate.

In addition to functional data methods, we consider the following linear mixed effects (LME) model,

$$W_{ij} = \mu(t_{ij}) + b_{i0} + b_{i1} t_{ij} + \epsilon_{ij},$$

for estimating the covariance function $r(s,t)$, where $b_{i0}$ and $b_{i1}$ are subject-specific random intercept and slope, respectively. The above model leads to a quadratic covariance function. In the following tables we use the labels non-parametric and parametric to indicate covariance estimation using the functional data model and using the linear mixed effects model, respectively.

The results with the periodic covariance function are summarized in Table 1. The proposed design works well and the ARE decreases as the number of subjects $n$ and the number of observations per subject $m_i$ increase. The improved performance is due to improved estimation accuracy of the covariance function (and associated eigenfunctions and eigenvalues) as well as of the error variance of the random errors (results not shown). The results with the non-periodic covariance function are similar and are shown in Section S.3.3 of the Supplementary materials.
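The evaluation measure can be sketched as follows: a design is selected by full search under an estimated objective and then scored under the true objective. The sketch assumes a linear joint objective with balancing weights and hypothetical cosine eigenfunctions. The "estimated" parameters are simply perturbed copies of the true ones and stand in for a fitted model; they are not results from the paper.

```python
from itertools import combinations
import numpy as np

def joint_objective(t, lam, beta, sigma2_eps):
    """Linear joint criterion rho1 * tr(Omega_t) + rho2 * beta' Omega_t beta (assumed form)."""
    t = np.atleast_1d(t)
    K = len(lam)
    Phi = np.sqrt(2.0) * np.cos(np.pi * np.outer(t, np.arange(1, K + 1)))  # hypothetical basis
    Lam = np.diag(lam)
    Sigma = Phi @ Lam @ Phi.T + sigma2_eps * np.eye(len(t))
    Omega = Lam - Lam @ Phi.T @ np.linalg.solve(Sigma, Phi @ Lam)
    return float(np.trace(Omega) / lam.sum() + (beta @ Omega @ beta) / np.sum(lam * beta**2))

def best_design(candidates, p, objective):
    """Full search over all p-subsets of the candidate points."""
    return min(combinations(candidates, p), key=objective)

def absolute_relative_error(G, t_hat, t_star):
    """ARE = |G(t_hat) - G(t_star)| / G(t_star), evaluated under the true objective G."""
    return abs(G(t_hat) - G(t_star)) / G(t_star)

candidates = np.linspace(0.05, 0.95, 13)
lam = np.array([1.0, 0.5, 0.25])
beta = np.array([1.0, -0.5, 0.2])
G_true = lambda t: joint_objective(t, lam, beta, 0.2)

# Perturbed parameters standing in for estimates from pilot data (illustrative only).
lam_hat = np.array([0.9, 0.6, 0.2])
beta_hat = np.array([1.1, -0.4, 0.3])
G_hat = lambda t: joint_objective(t, lam_hat, beta_hat, 0.25)

t_star = best_design(candidates, 3, G_true)   # optimal under the true model
t_hat = best_design(candidates, 3, G_hat)     # optimal under the estimated model
are = absolute_relative_error(G_true, t_hat, t_star)
```

Scoring both designs under the true objective sidesteps the identifiability issue: two different sets of points can have the same true prediction error, in which case the ARE is zero even though the designs differ.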

Table 1

Median of absolute relative errors, and the corresponding interquartile ranges (IQR) in parentheses for the case of the periodic covariance.

			Joint-Case1		Joint-Case2		Joint-Case3
			Non-parametric	Parametric	Non-parametric	Parametric	Non-parametric	Parametric
p=3

	n=400	mi∼{3,4,5}	0.019 (0.019)	1.292 (0.032)	0.018 (0.033)	1.507 (0.284)	0.055 (0.053)	1.598 (0.006)
		mi∼{7,…,10}	0.011 (0.016)	1.537 (0.264)	0.010 (0.015)	1.507 (0.284)	0.024 (0.026)	1.598 (0.006)
	n=800	mi∼{3,4,5}	0.011 (0.016)	1.537 (0.000)	0.010 (0.023)	1.507 (0.000)	0.030 (0.037)	1.598 (0.006)
		mi∼{7,…,10}	0.005 (0.011)	1.537 (0.264)	0.009 (0.007)	1.507 (0.284)	0.012 (0.014)	1.598 (0.006)
	n=1500	mi∼{3,4,5}	0.011 (0.014)	1.537 (0.000)	0.010 (0.014)	1.507 (0.000)	0.026 (0.024)	1.601 (0.006)
		mi∼{7,…,10}	0.005 (0.007)	1.537 (0.264)	0.005 (0.007)	1.507 (0.284)	0.012 (0.010)	1.598 (0.006)

p=4

	n=400	mi∼{3,4,5}	0.047 (0.030)	1.585 (0.175)	0.046 (0.037)	1.612 (0.286)	0.070 (0.054)	1.983 (0.099)
		mi∼{7,…,10}	0.015 (0.023)	1.676 (0.000)	0.016 (0.023)	1.612 (0.000)	0.027 (0.035)	1.983 (0.000)
	n=800	mi∼{3,4,5}	0.031 (0.040)	1.676 (0.264)	0.029 (0.043)	1.612 (0.286)	0.045 (0.041)	1.983 (0.025)
		mi∼{7,…,10}	0.012 (0.013)	1.676 (0.000)	0.011 (0.011)	1.612 (0.000)	0.02 (0.022)	1.983 (0.000)
	n=1500	mi∼{3,4,5}	0.018 (0.024)	1.676 (0.000)	0.018 (0.026)	1.612 (0.000)	0.033 (0.032)	1.983 (0.000)
		mi∼{7,…,10}	0.007 (0.010)	1.676 (0.000)	0.007 (0.010)	1.612 (0.000)	0.008 (0.016)	1.983 (0.000)

p=5

	n=400	mi∼{3,4,5}	0.059 (0.051)	1.713 (0.050)	0.051 (0.044)	1.929 (0.342)	0.063 (0.070)	2.167 (0.020)
		mi∼{7,…,10}	0.027 (0.027)	1.695 (0.321)	0.022 (0.027)	1.587 (0.342)	0.026 (0.028)	2.167 (0.020)
	n=800	mi∼{3,4,5}	0.039 (0.037)	2.016 (0.321)	0.039 (0.037)	1.929 (0.342)	0.043 (0.041)	2.167 (0.020)
		mi∼{7,…,10}	0.020 (0.022)	1.695 (0.321)	0.013 (0.020)	1.587 (0.342)	0.016 (0.016)	2.167 (0.020)
	n=1500	mi∼{3,4,5}	0.029 (0.027)	2.016 (0.321)	0.029 (0.031)	1.929 (0.342)	0.032 (0.036)	2.167 (0.020)
		mi∼{7,…,10}	0.009 (0.017)	1.695 (0.000)	0.007 (0.009)	1.587 (0.000)	0.010 (0.015)	2.167 (0.020)

Note: Joint-Case1 indicates that the scalar responses are generated using the coefficient function in FLM-Case1; similarly, Joint-Case2 corresponds to FLM-Case2 and Joint-Case3 to FLM-Case3. Non-parametric and Parametric refer to covariance estimation using the fPCA and LME models, respectively.

In addition to the ARE measure, we study the behavior of the objective function for different choices of $p$ by investigating the median and interquartile range (IQR) of $\mathcal{G}(\hat{\mathbf{t}})$. The statistics for the periodic and non-periodic covariance cases are presented in Tables S.1 and S.2 of the Supplementary materials, respectively. The true objective function $\mathcal{G}$ depends on the true model components, including the covariance function and the coefficient function. Thus $\mathcal{G}(\hat{\mathbf{t}})$ can only be compared across different $p$, but not across simulation settings with different covariance or coefficient functions. As expected, $\mathcal{G}(\hat{\mathbf{t}})$ decreases as the number of optimal sampling points increases. The same holds when we use the parametric covariance estimation, that is, when $r(s,t)$ is estimated using the LME model. Because the LME model is misspecified for modeling functional data, selecting more optimal points by the parametric estimation has only a slight effect on improving the prediction accuracy. In all cases the proposed method with the non-parametric covariance estimation gives a smaller prediction error than the parametric estimation. When data are generated from the LME model, the proposed method performs equally well with both the non-parametric and parametric covariance estimation; see Section S.3.4 of the Supplementary materials. In conclusion, the proposed method with non-parametric covariance estimation using the fPCA model performs well on data with both simple and complex covariance structures.

Results for selection of number of optimal sampling points

Now we evaluate the performance of the proposed method in (9) for selecting the number of optimal sampling points $p$. We use $\delta = 0.05$ and the true number of optimal points determined by (8) is 3. The performance of the proposed method is assessed in terms of the proportion of simulated datasets for which the correct number of optimal sampling points is selected, $S^{-1}\sum_{s=1}^{S} 1_{\{\hat{p}_s = 3\}}$, where $1_{\{\cdot\}}$ is an indicator function, $S$ is the number of simulated datasets and $\hat{p}_s$ is the number of optimal sampling points determined by (9) using the $s$th simulated dataset. The simulation results are presented in Table 2. The proposed method selects the correct number of points in at least 94% of the simulated datasets in all cases. The results for the non-periodic covariance function are similarly good and are presented in Table S.4 of the Supplementary materials.

Table 2

Proportion of selected number of points being equal to 3 for the case of the periodic covariance.

		Joint-Case1	Joint-Case2	Joint-Case3
n=400	mi∼{3,4,5}	0.94	0.97	0.94
	mi∼{7,…,10}	0.98	0.99	0.95
n=800	mi∼{3,4,5}	0.98	0.99	0.95
	mi∼{7,…,10}	0.99	1.00	0.98
n=1500	mi∼{3,4,5}	0.99	0.99	0.96
	mi∼{7,…,10}	1.00	1.00	0.99

Note: Joint-Case1 indicates that the scalar responses are generated using the coefficient function in FLM-Case1; similarly, Joint-Case2 corresponds to FLM-Case2 and Joint-Case3 to FLM-Case3.


Uncertainty of estimated optimal sampling points

To assess the uncertainty of the optimal sampling points estimated from the proposed method, we use a bootstrap approach as in the data application. For each simulated dataset, we bootstrap at the subject level, select optimal sampling points from the model estimated on the bootstrapped data, and calculate the third quartile and the 90th percentile of the absolute relative errors in (10). The medians of the percentiles are presented in Tables S.7 and S.8 of the Supplementary materials. The results show good stability of the estimated optimal sampling points, which improves when either the sample size or the number of observations per subject increases.

References


1. Functional generalized linear models with images as predictors.

2. Time-synchronized clustering of gene expression trajectories.

3. Hierarchical functional data with mixed continuous and binary measurements.

4. Longitudinal functional principal component analysis.

5. Structured penalties for functional linear models-partially empirical eigenvectors for regression.

6. Longitudinal Functional Data Analysis.

7. Smoothing dynamic positron emission tomography time courses using functional principal components.

8. Assessing systematic effects of stroke on motorcontrol by using hierarchical function-on-scalar regression.

9. Functional Causal Mediation Analysis With an Application to Brain Connectivity.

10. Using Wavelet-Based Functional Mixed Models to Characterize Population Heterogeneity in Accelerometer Profiles: A Case Study.