
Gene Region Association Analysis of Longitudinal Quantitative Traits Based on a Function-On-Function Regression Model.

Shijing Li1,2, Shiqin Li3, Shaoqiang Su1, Hui Zhang1,2, Jiayu Shen1,2, Yongxian Wen1,2.   

Abstract

During growth and development, the genes that control quantitative traits are switched on or off over time. Studies of longitudinal traits are therefore of great significance for revealing the genetic mechanisms of biological development. With the development of ultra-high-density sequencing technology, association analysis poses tremendous challenges for statistical methods. In this paper, a longitudinal functional data association test (LFDAT) method is proposed based on the function-on-function regression model. LFDAT treats both the phenotypic traits and the marker information as continuous variables and analyzes the association between longitudinal quantitative traits and gene regions. Simulation studies showed that: 1) LFDAT performs well in both linkage equilibrium and linkage disequilibrium simulations; 2) LFDAT performs well for gene regions containing common variants, low-frequency variants, rare variants, or a mixture of these; and 3) LFDAT can accurately identify gene switching during growth and development. LFDAT was applied to longitudinal data on the projected shoot area of Oryza sativa, where it showed the advantage of fast computation. Further, an association analysis was conducted between longitudinal traits and gene regions by integrating the micro effects of multiple related variants and using the information of the entire gene region. LFDAT provides a feasible method for studying the formation and expression of longitudinal traits.
Copyright © 2022 Li, Li, Su, Zhang, Shen and Wen.


Keywords:  association testing; functional data analysis; gene region; longitudinal traits; rare variants

Year:  2022        PMID: 35265102      PMCID: PMC8899465          DOI: 10.3389/fgene.2022.781740

Source DB:  PubMed          Journal:  Front Genet        ISSN: 1664-8021            Impact factor:   4.599


1 Introduction

With the development of sequencing technology, genome-wide association studies (GWASs) have successfully identified thousands of genetic variants (Robinson et al., 2014). This research plays an important role in identifying the genetic associations of complex traits and diseases. However, GWASs that assess quantitative traits at a single time point cannot fully reveal the genetic mechanisms of biological development. Longitudinal traits have long been a major scientific issue in biology. As early as 1962, Kheiralla and Whittington (1962) found that genetic effects behave differently in different periods. In the 1970s, Lewis (1978) revealed the molecular mechanism of morphological development in Drosophila, laying a foundation for developmental genetics. At present, more and more researchers are studying longitudinal traits and exploring how longitudinal traits respond to genetic variation during crop development (Smith et al., 2010; Cousminer et al., 2013; Tang et al., 2014). With the development of molecular biotechnology, the genomic positions of genes that control phenotypic traits have been determined by linkage analysis and association analysis, revealing the influence of genetic variation on phenotypic traits. For quantitative traits, many QTL (quantitative trait locus) mapping methods have been proposed. Early QTL mapping methods for longitudinal traits based on linkage analysis fall into three types: 1) treating phenotypes at different time points as repeated measurements of one trait; 2) treating phenotypes at different time points as different traits and analyzing them with multiple-trait methods; and 3) modeling the relationship between time points and phenotypes (Zhang, 2006). The first two methods are discrete in both the quantitative trait and the gene locus directions (shown as a diamond in Figure 1). 
The third method fits longitudinal traits to a continuous curve, but the locus direction remains discrete (shown as a square in Figure 1). In longitudinal data analysis, the third method is the most commonly used, and several statistical methods have been developed for it, such as random effects models (Laird and Ware, 1982), hierarchical linear models (Raudenbush and Bryk, 2002), empirical Bayes models (Hui and Berger, 1983), and growth mixture models (Muthen, 2004). With the development of GWASs, these statistical methods have been applied to test single genetic variants for association with longitudinal traits. Das et al. (2011) integrated the growth curve describing traits into the GWAS framework and established a functional GWAS model to improve the power to detect variants. Fan et al. (2012) proposed temporal association mapping models for longitudinal population data, with both parametric and nonparametric versions applicable to multiple diallelic genetic markers. Meanwhile, Meirelles et al. (2013) established a shrinking average model based on the empirical Bayes algorithm; the power of this dynamic model using multiple time points was significantly higher than that using a single time point.
FIGURE 1

Research status on quantitative traits and genetic variants.

GWASs of quantitative traits fall mainly into two types: 1) association analyses based on common variants and 2) association analyses based on rare variants. GWASs based on common variants have relied mainly on single-variant association analysis, and many such methods have been proposed with great success. However, single-variant association analysis has limited power for the rare variants that are common in high-throughput sequencing data (Han and Pan, 2010; Sha et al., 2016). Common variants explain only a small part of genetic variation, and most of the associated sites that control complex traits are rare variants (Gibson, 2012; Marouli et al., 2017). Single-variant analysis also ignores the overall information of the gene region in which rare variants are located. Gene-region-based association analysis instead combines the effects of the variant sites across the entire gene region, which reduces the multiple-testing burden and gives greater power (Neale and Sham, 2004; Wu et al., 2010). Most gene-region-based methods were designed for phenotypic traits measured at a single time point, and they fall into three types: 1) burden tests based on collapsing (Madsen and Browning, 2009; Han and Pan, 2010; Morris and Zeggini, 2010; Price et al., 2010; Lin and Tang, 2011), 2) variance component methods based on mixed effects (Liu et al., 2007; Kwee et al., 2008; Liu et al., 2008; Wu et al., 2010; Wu et al., 2011; Schifano et al., 2012; Chen et al., 2013), and 3) methods based on functional data analysis (Luo et al., 2012; Svishcheva et al., 2015; Svishcheva et al., 2016a; Svishcheva et al., 2016b; Li et al., 2020). 
Current functional data analysis methods keep the quantitative trait direction discrete (a single time point) and treat the many discrete SNPs located in a narrow gene region as a continuous variable in the gene locus direction, so that gene regions containing large amounts of genetic variation can be analyzed by functional data analysis (shown as a triangle in Figure 1). Many studies have shown that functional data analysis methods have higher power than burden tests based on collapsing and variance component methods based on mixed effects (Luo et al., 2012; Fan et al., 2013; Svishcheva et al., 2016a). The literature on statistical methods for gene region association analysis of longitudinal traits is limited (Beyene and Hamid, 2014; Wu et al., 2014; Yan et al., 2015; Chien et al., 2016; Cao et al., 2017). Recently, Wang et al. (2017) proposed a longitudinal trait association test with covariates based on the sequence kernel association test (SKAT), called LSKAT (Longitudinal SNP-set/SKAT). This method combines the features of linear mixed models and kernel machine methods to analyze the association between genetic variant regions and longitudinal traits. The same study also proposed a longitudinal burden test (LBT), which tests the association between traits and burden scores in a linear mixed model. However, these methods require inverting the correlation matrix, and computing p-values via eigenvalue decomposition of the correlation matrix imposes a computational burden. In addition, they do not consider time-varying genetic effects, even though the influence of genes on traits may change over time. 
In functional data analysis, the function-on-function regression model can both fit the growth curves of quantitative traits and transform dense discrete gene loci into continuous functions (shown as a dot in Figure 1). We therefore propose a longitudinal functional data association test (LFDAT), in which the function-on-function regression model is used to detect the association between a gene region and a longitudinal trait. This method aggregates small effects distributed across multiple sites, gathers the association information of the entire gene region, and improves the power to detect related sites with micro effects.

2 Methods

2.1 Function-On-Function Regression Model

Suppose that there are n individuals in a population, and that population structure and other factors are not considered. The SNP sequence constitutes the gene region [0, M] containing L genetic loci, and the growth and development traits are measured over the time period [0, T]. Let $y_i(t)$ denote the phenotype of the ith subject at time point t ($t \in [0, T]$) and $x_i(s)$ denote the marker information of the ith subject at the sth genetic locus ($s = 1, 2, \ldots, L$). Consider a QTL with two alleles, Q and q, which form three genotypes: QQ, Qq, and qq. The value of $x_i(s)$ is 2 for QQ, 1 for Qq, and 0 for qq. At time point t, the relationship between the phenotypic trait and the marker information can be described by the following multivariate linear genetic model:

$$y_i(t) = \mu(t) + \sum_{s=1}^{L} x_i(s)\,\beta(s,t) + \varepsilon_i(t), \qquad (1)$$

where $\mu(t)$ is the population mean, $\varepsilon_i(t)$ is a random error following the normal distribution $N(0, \sigma^2)$, and $\rho$ is the time correlation coefficient between the errors $\varepsilon_i(t)$ at different time points. Further, $\beta(s,t)$ is the genetic effect of the sth genetic locus at time point t. When the genetic markers become infinitely dense, the genetic model of the phenotype can be expressed by a function-on-function regression model:

$$y_i(t) = \mu(t) + \int_0^M x_i(s)\,\beta(s,t)\,ds + \varepsilon_i(t), \qquad (2)$$

where $x_i(s)$, $s \in [0, M]$, is the genetic marker function of the ith subject over the gene region and $\beta(s,t)$ is the genetic effect at locus s and time point t, referred to herein as the time-varying function of the genetic effect.
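To make the discrete model (1) concrete, the sketch below draws phenotypes from it on a locus-by-time grid. It is an illustration, not the authors' code: the function name `simulate_model_one` is hypothetical, and the text only says that ρ is "the time correlation coefficient" between errors, so an exchangeable structure (the same correlation ρ between any two time points of one subject) is assumed here.

```python
import numpy as np


def simulate_model_one(X, beta, mu, sigma=1.0, rho=0.5, rng=None):
    """Draw phenotypes from model (1): y_i(t) = mu(t) + sum_s x_i(s) beta(s, t) + eps_i(t).

    X     : (n, L) genotype codes (QQ -> 2, Qq -> 1, qq -> 0)
    beta  : (L, J) effect beta(s, t) of locus s at time point t
    mu    : (J,) population mean at each time point
    sigma : error standard deviation
    rho   : correlation between one subject's errors at any two time points
            (an exchangeable structure is ASSUMED; the text does not specify it)
    """
    rng = np.random.default_rng(rng)
    n, J = X.shape[0], beta.shape[1]
    shared = rng.standard_normal((n, 1))   # component common to all time points
    own = rng.standard_normal((n, J))      # component specific to each time point
    # Var = sigma^2 (rho + 1 - rho) = sigma^2; Corr between time points = rho.
    eps = sigma * (np.sqrt(rho) * shared + np.sqrt(1.0 - rho) * own)
    return mu[None, :] + X @ beta + eps


# Usage: five subjects, ten loci, nine time points, one causal locus.
rng = np.random.default_rng(42)
X = rng.binomial(2, 0.3, size=(5, 10))
beta = np.zeros((10, 9))
beta[3] = np.linspace(0.0, 1.0, 9)         # hypothetical effect growing over time
Y = simulate_model_one(X, beta, mu=np.ones(9), rng=rng)
```

Splitting the error into a subject-level term and a time-specific term is just one convenient way to obtain a common correlation ρ; an AR(1) structure would be an equally plausible reading of the text.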

2.2 Parameter Estimation

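To simplify the discussion of estimating model (2), the intercept function $\mu(t)$ is removed by centering. Following the functional data analysis approach (Malfait and Ramsay, 2003; Zhao, 2015), let $y_i^*(t) = y_i(t) - \bar{y}(t)$ and $x_i^*(s) = x_i(s) - \bar{x}(s)$, where $\bar{y}(t) = \frac{1}{n}\sum_{i=1}^{n} y_i(t)$ and $\bar{x}(s) = \frac{1}{n}\sum_{i=1}^{n} x_i(s)$. Then

$$y_i^*(t) = \int_0^M x_i^*(s)\,\beta(s,t)\,ds + \varepsilon_i^*(t). \qquad (3)$$

The asterisk is dropped in what follows to simplify the notation. Let $\tilde{\beta}(s,t)$ be an approximation of $\beta(s,t)$. Following the functional data analysis approach, $\tilde{\beta}(s,t)$ is expressed as a linear combination of K known two-dimensional basis functions $\theta_k(s,t)$:

$$\beta(s,t) \approx \tilde{\beta}(s,t) = \sum_{k=1}^{K} b_k\,\theta_k(s,t) = \boldsymbol{\theta}(s,t)^{\top}\mathbf{b}. \qquad (4)$$

Combining (4) and (3) gives

$$y_i(t) = \sum_{k=1}^{K} b_k \int_0^M x_i(s)\,\theta_k(s,t)\,ds + e_i(t) = \mathbf{z}_i(t)^{\top}\mathbf{b} + e_i(t), \qquad (5)$$

where $\mathbf{z}_i(t) = (z_{i1}(t), \ldots, z_{iK}(t))^{\top}$ with $z_{ik}(t) = \int_0^M x_i(s)\,\theta_k(s,t)\,ds$, and $\mathbf{b} = (b_1, \ldots, b_K)^{\top}$. Here $e_i(t) = \varepsilon_i(t) + \delta_i(t)$ is the combined error composed of the random error $\varepsilon_i(t)$ and the approximation error $\delta_i(t) = \int_0^M x_i(s)\,[\beta(s,t) - \tilde{\beta}(s,t)]\,ds$. In matrix form, Eq. 5 becomes

$$\mathbf{y}(t) = \mathbf{Z}(t)\,\mathbf{b} + \mathbf{e}(t), \qquad (6)$$

where $\mathbf{y}(t) = (y_1(t), \ldots, y_n(t))^{\top}$, $\mathbf{Z}(t) = (\mathbf{z}_1(t), \ldots, \mathbf{z}_n(t))^{\top}$ is an $n \times K$ matrix, and $\mathbf{e}(t) = (e_1(t), \ldots, e_n(t))^{\top}$. The least squares method estimates the coefficient vector $\mathbf{b}$ by minimizing the sum of integrated squared errors,

$$\mathrm{LMSSE}(\mathbf{b}) = \int_0^T [\mathbf{y}(t) - \mathbf{Z}(t)\mathbf{b}]^{\top}[\mathbf{y}(t) - \mathbf{Z}(t)\mathbf{b}]\,dt, \qquad (7)$$

which is equivalent to solving

$$\left[\int_0^T \mathbf{Z}(t)^{\top}\mathbf{Z}(t)\,dt\right]\mathbf{b} = \int_0^T \mathbf{Z}(t)^{\top}\mathbf{y}(t)\,dt. \qquad (8)$$

We impose a roughness penalty on the two-dimensional basis expansion in each dimension separately. Let $\mathrm{PEN}_s(\mathbf{b})$ and $\mathrm{PEN}_t(\mathbf{b})$ denote the roughness penalties in the s and t directions, respectively:

$$\mathrm{PEN}_s(\mathbf{b}) = \lambda_s \int_0^T\!\!\int_0^M \left[\frac{\partial^2 \tilde{\beta}(s,t)}{\partial s^2}\right]^2 ds\,dt = \lambda_s\,\mathbf{b}^{\top}\mathbf{R}_s\,\mathbf{b}, \qquad (9)$$

where $\mathbf{R}_s = \int_0^T\!\int_0^M \boldsymbol{\theta}_{ss}(s,t)\,\boldsymbol{\theta}_{ss}(s,t)^{\top}\,ds\,dt$ is a $K \times K$ matrix, $\lambda_s$ is a smoothing parameter, and $\boldsymbol{\theta}_{ss}(s,t) = \partial^2\boldsymbol{\theta}(s,t)/\partial s^2$ is the second derivative of $\boldsymbol{\theta}(s,t)$ in the s direction. Similarly,

$$\mathrm{PEN}_t(\mathbf{b}) = \lambda_t \int_0^T\!\!\int_0^M \left[\frac{\partial^2 \tilde{\beta}(s,t)}{\partial t^2}\right]^2 ds\,dt = \lambda_t\,\mathbf{b}^{\top}\mathbf{R}_t\,\mathbf{b}, \qquad (10)$$

where $\mathbf{R}_t = \int_0^T\!\int_0^M \boldsymbol{\theta}_{tt}(s,t)\,\boldsymbol{\theta}_{tt}(s,t)^{\top}\,ds\,dt$ is a $K \times K$ matrix, $\lambda_t$ is a smoothing parameter, and $\boldsymbol{\theta}_{tt}(s,t) = \partial^2\boldsymbol{\theta}(s,t)/\partial t^2$ is the second derivative of $\boldsymbol{\theta}(s,t)$ in the t direction. We now minimize the sum of the integrated squared errors and the two penalties:

$$\mathrm{PENSSE}(\mathbf{b}) = \mathrm{LMSSE}(\mathbf{b}) + \lambda_s\,\mathbf{b}^{\top}\mathbf{R}_s\,\mathbf{b} + \lambda_t\,\mathbf{b}^{\top}\mathbf{R}_t\,\mathbf{b}, \qquad (11)$$

which is equivalent to solving

$$\left[\int_0^T \mathbf{Z}(t)^{\top}\mathbf{Z}(t)\,dt + \lambda_s\mathbf{R}_s + \lambda_t\mathbf{R}_t\right]\mathbf{b} = \int_0^T \mathbf{Z}(t)^{\top}\mathbf{y}(t)\,dt. \qquad (12)$$

We evaluate $\mathbf{Z}(t)$ and $\mathbf{y}(t)$ at a set of time points $t_1, \ldots, t_J$. Let $\mathbf{Z} = (\mathbf{Z}(t_1)^{\top}, \ldots, \mathbf{Z}(t_J)^{\top})^{\top}$ and $\mathbf{y} = (\mathbf{y}(t_1)^{\top}, \ldots, \mathbf{y}(t_J)^{\top})^{\top}$. Then the normal equation of the least squares method obtained from Eq. 12 is

$$(\mathbf{Z}^{\top}\mathbf{Z} + \lambda_s\mathbf{R}_s + \lambda_t\mathbf{R}_t)\,\mathbf{b} = \mathbf{Z}^{\top}\mathbf{y}. \qquad (13)$$

Finally, the least squares estimate of the coefficient vector in Eq. 13 is

$$\hat{\mathbf{b}} = (\mathbf{Z}^{\top}\mathbf{Z} + \lambda_s\mathbf{R}_s + \lambda_t\mathbf{R}_t)^{-1}\mathbf{Z}^{\top}\mathbf{y}. \qquad (14)$$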
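The estimation steps (centering, basis expansion, stacking Z(t_1), ..., Z(t_J), and the penalized normal equation) can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions, not the paper's implementation (which uses `linmod` from the R package fda with cubic B-splines): a tensor product of Legendre polynomials is swapped in as the two-dimensional basis θ_k(s, t) = φ_a(s)ψ_b(t) because NumPy can evaluate it and its second derivatives directly, all integrals are replaced by Riemann sums on evenly spaced grids, and the names `legendre_basis` and `fit_fof` are hypothetical.

```python
import numpy as np
from numpy.polynomial import legendre as leg


def legendre_basis(grid, k, deriv=0):
    """k Legendre polynomials (or their derivatives) on an evenly spaced grid.

    Stand-in for the B-spline basis used in the paper.
    """
    lo, hi = grid[0], grid[-1]
    u = 2.0 * (grid - lo) / (hi - lo) - 1.0   # map the grid onto [-1, 1]
    scale = 2.0 / (hi - lo)                    # chain-rule factor du/dgrid
    cols = []
    for j in range(k):
        c = np.zeros(k)
        c[j] = 1.0                             # coefficients of P_j alone
        if deriv:
            c = leg.legder(c, deriv, scl=scale)
        cols.append(leg.legval(u, c))
    return np.column_stack(cols)


def fit_fof(X, Y, s, t, ks=5, kt=4, lam_s=0.0, lam_t=0.0):
    """Penalized least squares for y_i(t) = int x_i(s) beta(s, t) ds + e_i(t).

    X : (n, L) genotypes at loci s;  Y : (n, J) phenotypes at time points t.
    Returns b_hat (Eq. 14) and beta_hat evaluated on the (s, t) grid.
    """
    Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)   # centering removes mu(t)
    ds, dt = s[1] - s[0], t[1] - t[0]                   # Riemann-sum weights
    Phi, Psi = legendre_basis(s, ks), legendre_basis(t, kt)
    W = Xc @ Phi * ds                                   # int x_i(s) phi_a(s) ds
    # theta_k(s, t) = phi_a(s) psi_b(t) with k = a * kt + b, so
    # z_ik(t_j) = W[i, a] * Psi[j, b]; stack Z(t_1), ..., Z(t_J) row-wise.
    Z = np.vstack([np.kron(W, Psi[j:j + 1]) for j in range(len(t))])
    y = Yc.T.reshape(-1)                                # same (time, subject) order
    # For a tensor-product basis, R_s and R_t factor into Kronecker products.
    Phi2, Psi2 = legendre_basis(s, ks, 2), legendre_basis(t, kt, 2)
    R_s = np.kron(Phi2.T @ Phi2 * ds, Psi.T @ Psi * dt)
    R_t = np.kron(Phi.T @ Phi * ds, Psi2.T @ Psi2 * dt)
    A = Z.T @ Z + lam_s * R_s + lam_t * R_t             # left side of Eq. 13
    b_hat = np.linalg.solve(A, Z.T @ y)
    beta_hat = Phi @ b_hat.reshape(ks, kt) @ Psi.T      # (L, J) effect surface
    return b_hat, beta_hat
```

With λ_s = λ_t = 0 and noise-free data generated inside the basis, the solver recovers the true coefficients exactly, which is a useful sanity check; in practice the smoothing parameters would be chosen by leave-one-out cross-validation, as described in the simulation section.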

2.3 Hypothesis Testing Based on the Function-On-Function Regression Model

To detect whether an association exists between the gene region and the phenotypes, we consider the hypotheses

$$H_0: \beta(s,t) = 0 \ \text{for all } (s,t) \quad \text{vs.} \quad H_1: \beta(s,t) \neq 0 \ \text{for some } (s,t).$$

Because the time-varying function of the genetic effect is a linear combination of two-dimensional basis functions (Eq. 4), this is equivalent to testing

$$H_0: \mathbf{b} = \mathbf{0} \quad \text{vs.} \quad H_1: \mathbf{b} \neq \mathbf{0}.$$

The test statistic for these hypotheses is built from $\mathrm{RSS}_0$ and $\mathrm{RSS}_1$, the sums of squared residuals under the null model and the alternative model, respectively.
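The exact form of the LFDAT statistic did not survive extraction, so the sketch below should be read as an illustration of the RSS comparison only. It assumes the relative reduction (RSS₀ − RSS₁)/RSS₁, uses unpenalized least squares, and swaps in a permutation p-value (shuffling subjects' genotype rows) rather than a reference distribution, because repeated measurements on one subject are correlated. The Legendre tensor basis and the names `design`, `rss_statistic`, and `permutation_pvalue` are hypothetical.

```python
import numpy as np
from numpy.polynomial import legendre as leg


def to_unit(g):
    """Map an evenly spaced grid onto [-1, 1]."""
    return 2.0 * (g - g[0]) / (g[-1] - g[0]) - 1.0


def design(X, s, t, ks, kt):
    """Stacked design Z = [Z(t_1); ...; Z(t_J)] for a Legendre tensor basis."""
    Phi = leg.legvander(to_unit(s), ks - 1)          # (L, ks)
    Psi = leg.legvander(to_unit(t), kt - 1)          # (J, kt)
    W = (X - X.mean(axis=0)) @ Phi * (s[1] - s[0])   # int x_i(s) phi_a(s) ds
    return np.vstack([np.kron(W, Psi[j:j + 1]) for j in range(len(t))])


def rss_statistic(X, Y, s, t, ks=4, kt=3):
    """(RSS0 - RSS1) / RSS1 for H0: b = 0 (assumed form, unpenalized fit)."""
    y = (Y - Y.mean(axis=0)).T.reshape(-1)           # centered, (time, subject) order
    Z = design(X, s, t, ks, kt)
    b, *_ = np.linalg.lstsq(Z, y, rcond=None)
    rss0 = y @ y                                     # null model: means only
    rss1 = np.sum((y - Z @ b) ** 2)
    return (rss0 - rss1) / rss1


def permutation_pvalue(X, Y, s, t, n_perm=199, seed=0, **kw):
    """Shuffle whole genotype rows, keeping each subject's curve intact."""
    rng = np.random.default_rng(seed)
    obs = rss_statistic(X, Y, s, t, **kw)
    null = [rss_statistic(X[rng.permutation(len(X))], Y, s, t, **kw)
            for _ in range(n_perm)]
    return (1 + sum(v >= obs for v in null)) / (n_perm + 1)
```

Permuting entire rows of X preserves the within-subject time correlation of the phenotypes, which is why it is used here instead of an F reference distribution that would assume independent residuals.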

2.4 The Evaluation Indicators of the Estimation Result of the Time-Varying Function of the Genetic Effect

To further measure how well LFDAT estimates the time-varying function of the genetic effect in the gene region, and to provide additional reference indicators for LFDAT in the association analysis of longitudinal traits, we establish several evaluation indicators. Let $\mathcal{N}(\beta)$ denote the null region of $\beta(s,t)$ and $\mathcal{S}(\beta)$ its non-null region, defined as

$$\mathcal{N}(\beta) = \{(s,t) \in [0,M] \times [0,T] : \beta(s,t) = 0\}$$

and

$$\mathcal{S}(\beta) = \{(s,t) \in [0,M] \times [0,T] : \beta(s,t) \neq 0\}.$$

From a statistical genetics point of view, if the time-varying function of the genetic effect is zero at (s, t), the sth genetic locus is not associated with the longitudinal trait at time point t. For the null and non-null regions, following Lin et al. (2017) and Centofanti et al. (2020), we use the integrated squared error (ISE) as the fitting criterion for the estimator $\hat{\beta}(s,t)$. The ISE over the null region (ISE0) and over the non-null region (ISE1) are defined as

$$\mathrm{ISE}_0 = \frac{1}{\ell(\mathcal{N})}\iint_{\mathcal{N}(\beta)} [\hat{\beta}(s,t) - \beta(s,t)]^2\,ds\,dt$$

and

$$\mathrm{ISE}_1 = \frac{1}{\ell(\mathcal{S})}\iint_{\mathcal{S}(\beta)} [\hat{\beta}(s,t) - \beta(s,t)]^2\,ds\,dt,$$

where $\ell(\mathcal{N})$ and $\ell(\mathcal{S})$ are the measures of the null and non-null regions, respectively. ISE0 and ISE1 thus measure the integrated squared error between the true function and the estimated function on the null and non-null regions. Predictive performance is measured by the prediction mean squared error (PMSE):

$$\mathrm{PMSE} = \frac{1}{N}\sum_{i \in \text{test}} \int_0^T \left[y_i(t) - \hat{\mu}(t) - \int_0^M x_i(s)\,\hat{\beta}(s,t)\,ds\right]^2 dt,$$

where test denotes the test data set, N is its size, and $\hat{\mu}(t)$ is the estimated intercept $\mu(t)$. In the gene region, we use ISE0, ISE1, and PMSE as the criteria for the fit of the time-varying effect function. ISE0 measures the fit of the null effect, that is, the overall deviation between the true and estimated values at sites in the gene region that have no effect. 
The expression is as follows:

$$\mathrm{ISE}_0 = \frac{1}{|A_0|\,|\mathcal{T}|}\sum_{s \in A_0}\sum_{t \in \mathcal{T}} [\beta(s,t) - \hat{\beta}(s,t)]^2,$$

where $A_0$ denotes the collection of SNP sites in the region that are not associated with the trait, $|A_0|$ is the number of elements in $A_0$, $\mathcal{T}$ denotes the collection of measurement time points, $|\mathcal{T}|$ is the number of elements in $\mathcal{T}$, and $\beta(s,t)$ and $\hat{\beta}(s,t)$ are the actual and estimated effects of the sth genetic locus at time point t in $A_0$. ISE1 measures the fit of the non-null effect, that is, the overall deviation between the true and estimated values at sites in the gene region that have an effect:

$$\mathrm{ISE}_1 = \frac{1}{|A_1|\,|\mathcal{T}|}\sum_{s \in A_1}\sum_{t \in \mathcal{T}} [\beta(s,t) - \hat{\beta}(s,t)]^2,$$

where $A_1$ denotes the collection of SNP sites in the region that are associated with the trait, $|A_1|$ is the number of elements in $A_1$, and $\beta(s,t)$ and $\hat{\beta}(s,t)$ are the actual and estimated effects of the sth genetic locus at time point t in $A_1$. PMSE measures the fit of the genetic model, that is, the overall deviation between the trait values fitted by the model and the true trait values in the test set:

$$\mathrm{PMSE} = \frac{1}{N}\sum_{i \in \text{test}}\sum_{t \in \mathcal{T}} [y_i(t) - \hat{y}_i(t)]^2,$$

where test denotes the test data set, N is its size, $y_i(t)$ is the true trait value of the ith subject in the test set at time point t, and $\hat{y}_i(t)$ is the predicted trait value of that subject at time point t.
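The discrete indicators reduce to masked averages over a loci-by-time grid. The sketch below is illustrative only: the function names are hypothetical, the associated sites A₁ are passed in as a boolean mask, and the PMSE follows the discrete form above (summed over time points, averaged over the N test subjects).

```python
import numpy as np


def ise0_ise1(beta_true, beta_hat, causal):
    """Discrete ISE0 / ISE1 on an (L loci x J time points) grid.

    causal : boolean mask of length L marking the associated SNP sites (A1);
             the remaining sites form A0. An empty set yields NaN.
    """
    causal = np.asarray(causal, dtype=bool)
    sq = (np.asarray(beta_true) - np.asarray(beta_hat)) ** 2
    ise0 = sq[~causal].mean()   # averages over |A0| * |T| cells
    ise1 = sq[causal].mean()    # averages over |A1| * |T| cells
    return ise0, ise1


def pmse(y_test, y_pred):
    """Squared prediction error summed over time points, averaged over N test subjects."""
    y_test, y_pred = np.asarray(y_test), np.asarray(y_pred)
    return np.sum((y_test - y_pred) ** 2) / y_test.shape[0]


# Usage: two loci (the second associated), two time points.
ise0, ise1 = ise0_ise1([[0, 0], [1, 2]], [[0.1, -0.1], [1, 1]], [False, True])
```

Here ISE0 = (0.01 + 0.01)/2 = 0.01 and ISE1 = (0 + 1)/2 = 0.5: a small ISE0 means the estimator correctly keeps unassociated sites near zero, while ISE1 tracks how well nonzero effects are recovered.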

3 Simulation Studies

Computer-generated SNP sequence data are used for type I error and power simulations to evaluate the feasibility of LFDAT. Luo et al. (2012) proposed the FLM and the Smoothed FLM to test the association between a gene region and a quantitative trait. Because both the Smoothed FLM and LFDAT include a smoothing penalty, we compare the power of the Smoothed FLM with that of LFDAT in the simulations. Because the Smoothed FLM applies only to a single measurement, it is applied separately at each time point to detect the association between the gene region and the trait. We consider both a linkage equilibrium simulation and a linkage disequilibrium simulation. The simulated SNP sequence data set contains a 50 kb gene region, and a 1 kb genetic subregion is randomly selected from it to assess type I error rates and power. The sample sizes are 1,000, 1,500, and 2,000. Five kinds of gene regions are considered: 1) regions with only common variants, 2) regions with only rare variants, 3) regions with only low-frequency variants, 4) regions randomly composed of 20% common variants and 80% rare variants, and 5) regions randomly composed of 80% common variants and 20% rare variants. The MAF (minor allele frequency) of each variant follows U(a, b), with lower limit a and upper limit b depending on the variant type: (0.0005, 0.01) for rare variants, (0.01, 0.05) for low-frequency variants, and (0.05, 0.5) for common variants. The analyses in this paper use the linmod function in the fda package of the R software (Ramsay et al., 2009). In the simulations, the number of two-dimensional B-spline basis functions K is set to 15 and the order d to 4. Leave-one-out cross-validation (Ramsay et al., 2009) can be used to select the optimal smoothing parameters $\lambda_s$ and $\lambda_t$ from a set of candidate values. 
Due to space limitations, all simulation results are provided in Supplementary Data S1–S6.
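The MAF ranges above translate directly into a genotype generator for the five region types. The sketch below covers the linkage equilibrium case only and rests on assumptions not stated in the text: loci are independent and in Hardy-Weinberg equilibrium (so each genotype is Binomial(2, MAF)), and the mixture regions contain exactly 20% or 80% common variants. The names `region_mafs` and `genotypes_linkage_equilibrium` are hypothetical.

```python
import numpy as np

# MAF ranges U(a, b) taken from the text.
MAF_RANGES = {"rare": (0.0005, 0.01), "low": (0.01, 0.05), "common": (0.05, 0.5)}
# Share of common variants in the two mixture regions (rest are rare).
MIXTURE_COMMON_SHARE = {"mixture_one": 0.2, "mixture_two": 0.8}


def region_mafs(kind, n_snps, rng):
    """Draw one MAF per SNP for one of the five region types."""
    if kind in MAF_RANGES:
        return rng.uniform(*MAF_RANGES[kind], size=n_snps)
    n_common = round(MIXTURE_COMMON_SHARE[kind] * n_snps)
    is_common = rng.permutation(n_snps) < n_common   # random positions, exact count
    common = rng.uniform(*MAF_RANGES["common"], size=n_snps)
    rare = rng.uniform(*MAF_RANGES["rare"], size=n_snps)
    return np.where(is_common, common, rare)


def genotypes_linkage_equilibrium(mafs, n, rng):
    """Independent loci under Hardy-Weinberg equilibrium: x ~ Binomial(2, MAF)."""
    return rng.binomial(2, mafs, size=(n, len(mafs)))


# Usage: 1,000 subjects, a 50-SNP "mixture one" region.
rng = np.random.default_rng(0)
G = genotypes_linkage_equilibrium(region_mafs("mixture_one", 50, rng), 1000, rng)
```

The linkage disequilibrium simulation would need correlated loci (for example, haplotypes drawn from a coalescent simulator), which this independent-loci sketch does not attempt.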

3.1 Linkage Equilibrium Simulation

3.1.1 Type I Error Rates

We use the following model to generate phenotype data for assessing the type I error rates of LFDAT:

$$y_i(t) = \mu(t) + \varepsilon_i(t),$$

where $\mu(t)$ is the population mean, $\varepsilon_i(t)$ is normally distributed random error, and the time correlation coefficient between the random errors is 0.5. We randomly selected a 1 kb subregion from the SNP sequence data set as the genotype data of the gene region. Under this model the null hypothesis holds, so the phenotypes are unrelated to the genotypes. A total of 1,000 genotype-phenotype data sets were simulated for each sample size, and the test statistic and corresponding p-value were calculated for each. At a given significance level α, the proportion of data sets whose p-value is less than α is taken as the type I error rate. All type I error results are given in Supplementary Data S1. Table 1 shows the type I error rates of LFDAT and the Smoothed FLM at significance levels of 0.05, 0.01, and 0.001 in the linkage equilibrium simulation. LFDAT controls the type I error rate at every significance level; at α = 0.05 its rates range from 0.000 to 0.032, below the nominal level. The type I error rates of rare and low-frequency gene regions are lower than those of common gene regions, and gene regions with more common variants generally have higher type I error rates than those with fewer common variants. As the significance level becomes more stringent (smaller α), the type I error rates decrease. At even smaller significance levels (α = 1e-4, 1e-5, and 1e-6), LFDAT still performs well, with all type I error rates equal to 0 (see Supplementary Data S1). The type I error rates of the Smoothed FLM are generally higher than those of LFDAT and lie close to or above the nominal level (about 0.035 to 0.073 at α = 0.05), which means that the Smoothed FLM declares more gene regions falsely associated with the quantitative trait. These simulations suggest that association analysis that combines multiple measurements of a quantitative trait can reduce type I error rates.
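Both the type I error rate and (later) the power are the same quantity: the share of replicate data sets rejected at level α. The sketch below computes it, together with an approximate 95% Monte Carlo band based on the normal approximation to the binomial, which helps judge whether an empirical rate from 1,000 replicates differs from α by more than simulation noise. The function names are hypothetical, and the band is an added interpretive aid, not part of the paper's procedure.

```python
import numpy as np


def empirical_rejection_rate(pvalues, alpha):
    """Share of replicate data sets whose p-value is below alpha.

    Under H0 this estimates the type I error rate; under H1 it estimates power.
    """
    return float(np.mean(np.asarray(pvalues) < alpha))


def binomial_band(alpha, n_rep=1000, z=1.96):
    """Approximate 95% band for a rate estimated from n_rep replicates when the true rate is alpha."""
    half = z * np.sqrt(alpha * (1.0 - alpha) / n_rep)
    return max(alpha - half, 0.0), alpha + half


# Usage: with 1,000 replicates, a correctly calibrated test at alpha = 0.05
# should mostly produce empirical rates between about 0.0365 and 0.0635.
lo, hi = binomial_band(0.05, n_rep=1000)
```

Read against Table 1, this band suggests that LFDAT's rates at α = 0.05 (all at most 0.032) sit below what calibration alone would produce, whereas most Smoothed FLM rates fall inside it.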
TABLE 1

Type I error rates of LFDAT and the Smoothed FLM based on 1,000 simulated replicates in the linkage equilibrium simulation.

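α | Sample size | Gene region | LFDAT, t = 1 | t = 2 | t = 3 | t = 4 | t = 5 | t = 6 | t = 7 | t = 8 | t = 9 | Smoothed FLM, t = 1 | t = 2 | t = 3 | t = 4 | t = 5 | t = 6 | t = 7 | t = 8 | t = 9
0.05 | 1,000 | Common | 0.025 | 0.008 | 0.011 | 0.012 | 0.008 | 0.006 | 0.011 | 0.010 | 0.027 | 0.063 | 0.043 | 0.057 | 0.057 | 0.053 | 0.049 | 0.061 | 0.048 | 0.057
0.05 | 1,000 | Rare | 0.013 | 0.007 | 0.005 | 0.003 | 0.004 | 0.002 | 0.001 | 0.005 | 0.012 | 0.050 | 0.056 | 0.057 | 0.048 | 0.058 | 0.053 | 0.046 | 0.048 | 0.056
0.05 | 1,000 | Low | 0.014 | 0.012 | 0.005 | 0.004 | 0.004 | 0.006 | 0.005 | 0.004 | 0.016 | 0.055 | 0.059 | 0.055 | 0.047 | 0.045 | 0.041 | 0.053 | 0.053 | 0.057
0.05 | 1,000 | Mixture one | 0.019 | 0.007 | 0.008 | 0.006 | 0.000 | 0.007 | 0.005 | 0.016 | 0.018 | 0.062 | 0.048 | 0.056 | 0.050 | 0.049 | 0.061 | 0.056 | 0.058 | 0.059
0.05 | 1,000 | Mixture two | 0.021 | 0.011 | 0.006 | 0.010 | 0.008 | 0.009 | 0.006 | 0.010 | 0.017 | 0.054 | 0.053 | 0.040 | 0.048 | 0.059 | 0.059 | 0.060 | 0.052 | 0.052
0.05 | 1,500 | Common | 0.025 | 0.009 | 0.002 | 0.010 | 0.005 | 0.009 | 0.007 | 0.009 | 0.027 | 0.058 | 0.048 | 0.044 | 0.061 | 0.035 | 0.045 | 0.049 | 0.053 | 0.054
0.05 | 1,500 | Rare | 0.012 | 0.004 | 0.000 | 0.002 | 0.000 | 0.003 | 0.005 | 0.002 | 0.007 | 0.053 | 0.038 | 0.057 | 0.045 | 0.043 | 0.040 | 0.049 | 0.040 | 0.046
0.05 | 1,500 | Low | 0.019 | 0.006 | 0.011 | 0.008 | 0.002 | 0.006 | 0.004 | 0.013 | 0.017 | 0.060 | 0.042 | 0.063 | 0.050 | 0.064 | 0.055 | 0.052 | 0.062 | 0.059
0.05 | 1,500 | Mixture one | 0.013 | 0.010 | 0.006 | 0.002 | 0.000 | 0.008 | 0.002 | 0.006 | 0.021 | 0.062 | 0.060 | 0.044 | 0.044 | 0.047 | 0.045 | 0.040 | 0.036 | 0.054
0.05 | 1,500 | Mixture two | 0.031 | 0.014 | 0.009 | 0.012 | 0.005 | 0.007 | 0.010 | 0.010 | 0.028 | 0.061 | 0.073 | 0.050 | 0.053 | 0.046 | 0.049 | 0.045 | 0.045 | 0.054
0.05 | 2,000 | Common | 0.022 | 0.009 | 0.009 | 0.005 | 0.005 | 0.009 | 0.011 | 0.007 | 0.020 | 0.047 | 0.041 | 0.044 | 0.038 | 0.048 | 0.057 | 0.053 | 0.044 | 0.039
0.05 | 2,000 | Rare | 0.011 | 0.013 | 0.002 | 0.003 | 0.005 | 0.004 | 0.004 | 0.007 | 0.010 | 0.053 | 0.058 | 0.046 | 0.043 | 0.045 | 0.040 | 0.054 | 0.044 | 0.037
0.05 | 2,000 | Low | 0.013 | 0.011 | 0.009 | 0.011 | 0.002 | 0.008 | 0.009 | 0.010 | 0.011 | 0.053 | 0.059 | 0.059 | 0.053 | 0.050 | 0.053 | 0.046 | 0.046 | 0.041
0.05 | 2,000 | Mixture one | 0.023 | 0.005 | 0.011 | 0.004 | 0.007 | 0.004 | 0.009 | 0.015 | 0.014 | 0.049 | 0.053 | 0.059 | 0.051 | 0.047 | 0.046 | 0.051 | 0.047 | 0.048
0.05 | 2,000 | Mixture two | 0.024 | 0.010 | 0.009 | 0.006 | 0.010 | 0.012 | 0.016 | 0.006 | 0.032 | 0.045 | 0.052 | 0.056 | 0.055 | 0.051 | 0.055 | 0.069 | 0.049 | 0.062
0.01 | 1,000 | Common | 0.003 | 0.001 | 0.001 | 0.001 | 0.002 | 0.000 | 0.001 | 0.002 | 0.006 | 0.009 | 0.008 | 0.011 | 0.013 | 0.012 | 0.011 | 0.014 | 0.011 | 0.012
0.01 | 1,000 | Rare | 0.001 | 0.000 | 0.000 | 0.000 | 0.000 | 0.001 | 0.000 | 0.002 | 0.001 | 0.014 | 0.008 | 0.013 | 0.011 | 0.009 | 0.009 | 0.011 | 0.012 | 0.012
0.01 | 1,000 | Low | 0.003 | 0.002 | 0.000 | 0.001 | 0.000 | 0.000 | 0.001 | 0.000 | 0.004 | 0.011 | 0.016 | 0.009 | 0.009 | 0.007 | 0.012 | 0.012 | 0.010 | 0.011
0.01 | 1,000 | Mixture one | 0.003 | 0.001 | 0.000 | 0.001 | 0.000 | 0.000 | 0.000 | 0.001 | 0.002 | 0.012 | 0.012 | 0.012 | 0.015 | 0.007 | 0.012 | 0.010 | 0.014 | 0.010
0.01 | 1,000 | Mixture two | 0.001 | 0.000 | 0.000 | 0.000 | 0.001 | 0.001 | 0.000 | 0.001 | 0.005 | 0.010 | 0.008 | 0.010 | 0.010 | 0.011 | 0.020 | 0.007 | 0.011 | 0.011
0.01 | 1,500 | Common | 0.005 | 0.002 | 0.000 | 0.001 | 0.001 | 0.000 | 0.000 | 0.002 | 0.003 | 0.009 | 0.009 | 0.002 | 0.015 | 0.006 | 0.008 | 0.011 | 0.007 | 0.008
0.01 | 1,500 | Rare | 0.002 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.001 | 0.000 | 0.000 | 0.013 | 0.006 | 0.005 | 0.008 | 0.007 | 0.009 | 0.013 | 0.006 | 0.009
0.01 | 1,500 | Low | 0.003 | 0.000 | 0.001 | 0.000 | 0.000 | 0.002 | 0.000 | 0.001 | 0.003 | 0.015 | 0.010 | 0.020 | 0.012 | 0.010 | 0.015 | 0.007 | 0.012 | 0.012
0.01 | 1,500 | Mixture one | 0.002 | 0.003 | 0.001 | 0.001 | 0.000 | 0.000 | 0.000 | 0.001 | 0.001 | 0.010 | 0.010 | 0.009 | 0.004 | 0.004 | 0.014 | 0.005 | 0.008 | 0.010
0.01 | 1,500 | Mixture two | 0.006 | 0.001 | 0.001 | 0.001 | 0.000 | 0.002 | 0.002 | 0.002 | 0.004 | 0.012 | 0.019 | 0.011 | 0.016 | 0.010 | 0.011 | 0.009 | 0.012 | 0.009
0.01 | 2,000 | Common | 0.003 | 0.000 | 0.001 | 0.000 | 0.000 | 0.000 | 0.003 | 0.002 | 0.008 | 0.004 | 0.011 | 0.009 | 0.007 | 0.011 | 0.011 | 0.014 | 0.005 | 0.010
0.01 | 2,000 | Rare | 0.002 | 0.002 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.001 | 0.001 | 0.011 | 0.019 | 0.005 | 0.013 | 0.012 | 0.009 | 0.008 | 0.011 | 0.008
0.01 | 2,000 | Low | 0.001 | 0.001 | 0.000 | 0.001 | 0.000 | 0.001 | 0.000 | 0.002 | 0.001 | 0.008 | 0.008 | 0.013 | 0.018 | 0.006 | 0.012 | 0.015 | 0.006 | 0.005
0.01 | 2,000 | Mixture one | 0.005 | 0.000 | 0.002 | 0.002 | 0.000 | 0.000 | 0.002 | 0.000 | 0.002 | 0.014 | 0.013 | 0.019 | 0.008 | 0.011 | 0.008 | 0.010 | 0.018 | 0.006
0.01 | 2,000 | Mixture two | 0.006 | 0.002 | 0.000 | 0.002 | 0.001 | 0.002 | 0.003 | 0.001 | 0.004 | 0.012 | 0.014 | 0.010 | 0.013 | 0.013 | 0.019 | 0.018 | 0.009 | 0.010
0.001 | 1,000 | Common | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.002 | 0.000 | 0.000 | 0.002 | 0.001 | 0.001 | 0.002 | 0.000 | 0.002 | 0.002 | 0.002
0.001 | 1,000 | Rare | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.001 | 0.000 | 0.000 | 0.001 | 0.000 | 0.004 | 0.001
0.001 | 1,000 | Low | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.002 | 0.000 | 0.001 | 0.001 | 0.000 | 0.001 | 0.000 | 0.002
0.001 | 1,000 | Mixture one | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.001 | 0.000 | 0.000 | 0.001 | 0.000 | 0.002 | 0.000 | 0.000 | 0.000
0.001 | 1,000 | Mixture two | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.001 | 0.000 | 0.000 | 0.000 | 0.001 | 0.001 | 0.002 | 0.000 | 0.001 | 0.001
0.001 | 1,500 | Common | 0.001 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.001 | 0.003 | 0.000 | 0.001 | 0.001 | 0.000 | 0.000 | 0.000 | 0.000
0.001 | 1,500 | Rare | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.003 | 0.000 | 0.000 | 0.002 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000
0.001 | 1,500 | Low | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.002 | 0.001 | 0.001 | 0.001 | 0.000 | 0.002 | 0.001 | 0.003 | 0.000
0.001 | 1,500 | Mixture one | 0.000 | 0.000 | 0.001 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.001 | 0.001 | 0.001 | 0.001 | 0.000 | 0.001 | 0.001 | 0.001 | 0.000
0.001 | 1,500 | Mixture two | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.001 | 0.000 | 0.000 | 0.000 | 0.000 | 0.001 | 0.001 | 0.001 | 0.000 | 0.002 | 0.003 | 0.002 | 0.000
0.001 | 2,000 | Common | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.001 | 0.001 | 0.000 | 0.001 | 0.001 | 0.000 | 0.000 | 0.001 | 0.003 | 0.002 | 0.003
0.001 | 2,000 | Rare | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.001 | 0.002 | 0.000 | 0.001 | 0.001 | 0.000 | 0.000 | 0.001 | 0.001
0.001 | 2,000 | Low | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.002 | 0.001 | 0.001 | 0.000 | 0.002 | 0.001 | 0.004 | 0.000
0.001 | 2,000 | Mixture one | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.001 | 0.000 | 0.001 | 0.003 | 0.000 | 0.000 | 0.002 | 0.000 | 0.000
0.001 | 2,000 | Mixture two | 0.001 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.002 | 0.001 | 0.000 | 0.000 | 0.000 | 0.001 | 0.002 | 0.000 | 0.000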

Note: Common denotes gene regions only with common variants, Rare denotes gene regions only with rare variants, Low denotes gene regions only with low-frequency variants, Mixture one denotes gene regions with 20% of common variants and 80% of rare variants, and Mixture two denotes gene regions with 80% of common variants and 20% of rare variants.


3.1.2 Power

To measure the power of LFDAT for gene regions, we randomly selected a 1 kb subregion from the SNP sequence data set as the genotype data of the variant region under the alternative hypothesis. Phenotypic data are generated from the following model:

$$y_i(t) = \mu(t) + \sum_{s \in C} x_i(s)\,\beta(s,t) + \varepsilon_i(t),$$

where $x_i(s)$ is the genotype of the ith subject at the sth genetic locus, C denotes the collection of causal variants in the simulated gene region, $\beta(s,t)$ is the genetic effect of the sth variant at time point t, $\mu(t)$ is the population mean, $\varepsilon_i(t)$ is normally distributed random error, and the time correlation coefficient between the random errors is 0.5. The following scenarios are simulated: 1) the proportion of causal variants in the gene region is 1, 2, or 4%, and 2) the proportion of causal variants with negative effects is 0, 20, or 50%. Life processes are accompanied by the selective switching on and off of different genes, and some genes are expressed only at certain developmental stages. Reflecting this phenomenon, two cases were considered for the time-varying function of the genetic effect. Case one: the time-varying function of the genetic effect is $\beta(s,t) = g(s)\,h(t)$, where g(s) is the genetic effect function (Wu et al., 2011; Lee et al., 2012; Chen et al., 2013; Fan et al., 2013; Belonogova et al., 2018), which depends on a constant c and on $\mathrm{MAF}_s$, the minor allele frequency of the sth genetic locus, and h(t) is the time effect function. The constant c directly determines the size of the genetic effect function and is set to 3, 5, or 7 in the simulation. Case two: the time-varying function of the genetic effect is likewise composed of a genetic effect function and a time effect function, but with a time effect function that differs from that of case one. For each scenario, 1,000 genotype-phenotype data sets are simulated. At a given significance level α, the proportion of data sets whose p-value is less than α is taken as the power. Within each genotype-phenotype data set, the causal variant region is the same for all individuals, but it may differ between data sets.
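The case-one effect surface can be sketched as an outer product of a per-locus genetic effect and a time effect, with a chosen share of causal variants given negative sign. The exact formulas for g(s) and h(t) were lost in extraction, so both are assumptions here: g(s) = c·|log₁₀ MAF_s| is the weighting commonly used in the SKAT-style simulations the text cites, and h is left as a user-supplied function. The name `case_one_effects` is hypothetical.

```python
import numpy as np


def case_one_effects(mafs, t, prop_causal, prop_negative, c, h, rng):
    """Hypothetical case-one effect surface beta(s, t) = g(s) h(t).

    ASSUMED g(s) = c * |log10(MAF_s)| on causal loci and 0 elsewhere;
    h is a user-supplied time effect function (the paper's form is not shown).
    Returns an (L, J) array: rows are loci, columns are time points.
    """
    mafs = np.asarray(mafs, dtype=float)
    L = mafs.size
    n_causal = max(1, round(prop_causal * L))
    causal = rng.choice(L, size=n_causal, replace=False)   # random causal loci
    sign = np.ones(n_causal)
    sign[: round(prop_negative * n_causal)] = -1.0         # share of negative effects
    g = np.zeros(L)
    g[causal] = sign * c * np.abs(np.log10(mafs[causal]))
    return np.outer(g, h(np.asarray(t, dtype=float)))


# Usage: 100 loci, 4% causal, half of them negative, c = 3, nine time points,
# with a hypothetical time effect that rises linearly over the period.
rng = np.random.default_rng(3)
beta = case_one_effects(np.full(100, 0.01), np.arange(1, 10), 0.04, 0.5, 3,
                        lambda t: t / t.max(), rng)
```

Under this assumed g, rarer variants receive larger effects, which is consistent with the explanation given below for why negative effects cancel less in mixture region one.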

Case One Simulation

We assessed the power of LFDAT for the five gene regions under different sample sizes (n = 1,000, 1,500, and 2,000), and the results show the same patterns for every sample size. All power results are given in Supplementary Data S1, which also shows power figures at the nine time points for different significance levels, values of the constant c, proportions of negative effects, and proportions of causal variants. Figure 2 (which shows only the results for c = 3 and a sample size of 2,000) shows that power differs across time points, possibly because the time-varying effect function takes different values at different time points. Power increases as the constant c (see Supplementary Data S1) and the proportion of causal variants increase, and it decreases as the proportion of negative effects increases and as the significance level becomes more stringent. Overall, power is high for all five gene regions. The simulation shows that for common, rare, and low-frequency variants alike, power increases as the genetic effects increase. The proportion of negative effects has a smaller impact on the power for mixture region one than for the other four gene regions. This may be because the effects of rare variants are larger than those of common variants, so that positive and negative effects cancel less in mixture region one than in the other regions. LFDAT is therefore applicable to common, rare, and low-frequency variants.
FIGURE 2

Power of LFDAT for the five gene regions in case one and case two of the linkage equilibrium simulation, with c = 3 and a sample size of 2,000. Panels (A–C) show the power in case one and panels (D–F) the power in case two; the two cases use different time effect functions. (A, D) Proportion of causal variants is 1%. (B, E) Proportion of causal variants is 2%. (C, F) Proportion of causal variants is 4%. Note: Common region denotes gene regions with only common variants, Rare region denotes gene regions with only rare variants, Low region denotes gene regions with only low-frequency variants, Mixture region one denotes gene regions with 20% common variants and 80% rare variants, and Mixture region two denotes gene regions with 80% common variants and 20% rare variants.

We also compared the power of LFDAT and the Smoothed FLM (see Supplementary Data S1); power increases as the sample size increases. Table 2 (which, due to space limitations, shows only the results for a significance level of 0.05, a sample size of 2,000, c = 7, and 1% causal variants) shows that the power of LFDAT is very close to that of the Smoothed FLM. Together with the type I error results, these findings indicate that LFDAT reduces the probability of type I errors while maintaining high power.
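"Very close" can be quantified directly from Table 2. The short script below copies the case-one rows with 0% negative effects and computes the power gap (Smoothed FLM minus LFDAT) at each time point; it is plain arithmetic on the reported values, not a rerun of the simulation.

```python
import numpy as np

# Table 2, case one, 0% negative effects; columns are t = 1..9.
regions = ["Common", "Rare", "Low", "Mixture one", "Mixture two"]
lfdat = np.array([
    [0.982, 0.987, 0.990, 0.989, 0.989, 0.989, 0.991, 0.989, 0.987],
    [0.976, 0.985, 0.992, 0.991, 0.989, 0.992, 0.993, 0.991, 0.992],
    [0.990, 0.991, 0.992, 0.992, 0.992, 0.995, 0.993, 0.995, 0.993],
    [0.908, 0.915, 0.924, 0.937, 0.934, 0.931, 0.932, 0.931, 0.925],
    [0.979, 0.974, 0.984, 0.984, 0.981, 0.986, 0.986, 0.986, 0.983],
])
smoothed_flm = np.array([
    [0.983, 0.989, 0.990, 0.990, 0.991, 0.990, 0.992, 0.989, 0.987],
    [0.984, 0.991, 0.992, 0.992, 0.993, 0.993, 0.994, 0.993, 0.993],
    [0.992, 0.991, 0.992, 0.992, 0.992, 0.995, 0.995, 0.996, 0.994],
    [0.916, 0.922, 0.932, 0.940, 0.937, 0.934, 0.938, 0.938, 0.929],
    [0.979, 0.977, 0.984, 0.985, 0.983, 0.988, 0.987, 0.988, 0.985],
])
gap = smoothed_flm - lfdat          # positive: Smoothed FLM has slightly more power
for name, row in zip(regions, gap):
    print(f"{name:12s} largest gap {row.max():.3f}")
print(f"max gap {gap.max():.3f}, min gap {gap.min():.3f}")  # prints: max gap 0.008, min gap 0.000
```

In these rows the Smoothed FLM is never less powerful than LFDAT, but the difference never exceeds 0.008, which is small compared with the gap in type I error rates shown in Table 1.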
TABLE 2

Power of LFDAT and the Smoothed FLM in the linkage equilibrium simulation at a significance level of 0.05 when the sample size is 2,000, c is 7, and the proportion of causal variants is 1%.

Case | Proportion of negative effects (%) | Gene region | LFDAT, t = 1 to 9 | Smoothed FLM, t = 1 to 9
Case 1 | 0 | Common | 0.982 0.987 0.990 0.989 0.989 0.989 0.991 0.989 0.987 | 0.983 0.989 0.990 0.990 0.991 0.990 0.992 0.989 0.987
Case 1 | 0 | Rare | 0.976 0.985 0.992 0.991 0.989 0.992 0.993 0.991 0.992 | 0.984 0.991 0.992 0.992 0.993 0.993 0.994 0.993 0.993
Case 1 | 0 | Low | 0.990 0.991 0.992 0.992 0.992 0.995 0.993 0.995 0.993 | 0.992 0.991 0.992 0.992 0.992 0.995 0.995 0.996 0.994
Case 1 | 0 | Mixture one | 0.908 0.915 0.924 0.937 0.934 0.931 0.932 0.931 0.925 | 0.916 0.922 0.932 0.940 0.937 0.934 0.938 0.938 0.929
Case 1 | 0 | Mixture two | 0.979 0.974 0.984 0.984 0.981 0.986 0.986 0.986 0.983 | 0.979 0.977 0.984 0.985 0.983 0.988 0.987 0.988 0.985
Case 1 | 20 | Common | 0.913 0.921 0.931 0.925 0.923 0.931 0.934 0.929 0.934 | 0.913 0.924 0.933 0.927 0.927 0.932 0.936 0.929 0.935
Case 1 | 20 | Rare | 0.921 0.941 0.960 0.959 0.964 0.964 0.968 0.962 0.952 | 0.941 0.953 0.967 0.970 0.975 0.970 0.972 0.967 0.968
Case 1 | 20 | Low | 0.945 0.948 0.959 0.962 0.962 0.958 0.964 0.960 0.954 | 0.951 0.954 0.961 0.963 0.963 0.961 0.965 0.963 0.960
Case 1 | 20 | Mixture one | 0.884 0.883 0.898 0.906 0.899 0.909 0.898 0.896 0.901 | 0.893 0.896 0.902 0.914 0.910 0.915 0.906 0.901 0.903
Case 1 | 20 | Mixture two | 0.941 0.944 0.948 0.950 0.955 0.951 0.953 0.953 0.950 | 0.944 0.951 0.955 0.954 0.955 0.952 0.957 0.955 0.951
Case 1 | 50 | Common | 0.843 0.867 0.874 0.883 0.878 0.880 0.877 0.876 0.877 | 0.846 0.877 0.877 0.887 0.884 0.886 0.885 0.877 0.881
Case 1 | 50 | Rare | 0.860 0.888 0.894 0.909 0.915 0.920 0.915 0.909 0.903 | 0.895 0.906 0.912 0.926 0.928 0.932 0.925 0.920 0.924
Case 1 | 50 | Low | 0.887 0.912 0.916 0.916 0.916 0.920 0.916 0.911 0.912 | 0.897 0.916 0.918 0.926 0.924 0.922 0.917 0.919 0.915
Case 1 | 50 | Mixture one | 0.833 0.853 0.859 0.866 0.870 0.870 0.877 0.866 0.863 | 0.846 0.859 0.866 0.875 0.876 0.876 0.880 0.873 0.867
Case 1 | 50 | Mixture two | 0.870 0.889 0.899 0.902 0.896 0.900 0.899 0.896 0.899 | 0.878 0.895 0.905 0.908 0.905 0.903 0.901 0.900 0.905
Case 2 | 0 | Common | 0.986 0.971 0.000 0.967 0.933 0.968 0.000 0.973 0.987 | 0.991 0.98 0.063 0.97 0.99 0.972 0.047 0.98 0.989
Case 2 | 0 | Rare | 0.979 0.950 0.000 0.949 0.972 0.939 0.000 0.947 0.982 | 0.995 0.972 0.058 0.971 0.994 0.963 0.055 0.972 0.993
Case 2 | 0 | Low | 0.99 0.985 0.000 0.986 0.962 0.977 0.000 0.981 0.991 | 0.993 0.989 0.063 0.989 0.992 0.982 0.05 0.984 0.992
Case 2 | 0 | Mixture one | 0.894 0.85 0.000 0.852 0.819 0.849 0.000 0.855 0.891 | 0.912 0.877 0.043 0.872 0.911 0.871 0.047 0.886 0.917
Case 2 | 0 | Mixture two | 0.977 0.952 0.000 0.959 0.922 0.961 0.000 0.961 0.975 | 0.981 0.967 0.051 0.968 0.981 0.967 0.046 0.968 0.979
Case 2 | 20 | Common | 0.924 0.879 0.000 0.881 0.787 0.882 0.000 0.88 0.921 | 0.94 0.908 0.051 0.896 0.938 0.899 0.057 0.906 0.932
Case 2 | 20 | Rare | 0.919 0.881 0.000 0.876 0.892 0.884 0.000 0.882 0.924 | 0.971 0.925 0.05 0.913 0.967 0.917 0.045 0.923 0.968
Case 2 | 20 | Low | 0.941 0.92 0.000 0.931 0.85 0.933 0.000 0.928 0.945 | 0.965 0.94 0.056 0.94 0.965 0.942 0.055 0.942 0.964
Case 2 | 20 | Mixture one | 0.878 0.839 0.000 0.828 0.803 0.832 0.000 0.83 0.876 | 0.909 0.865 0.05 0.858 0.907 0.861 0.052 0.857 0.906
Case 2 | 20 | Mixture two | 0.933 0.896 0.000 0.907 0.827 0.904 0.000 0.907 0.934 | 0.95 0.915 0.052 0.916 0.947 0.915 0.046 0.919 0.948
Case 2 | 50 | Common | 0.852 0.809 0.000 0.811 0.686 0.804 0.000 0.81 0.855 | 0.884 0.837 0.039 0.838 0.886 0.83 0.043 0.844 0.89
Case 2 | 50 | Rare | 0.838 0.796 0.000 0.793 0.794 0.799 0.000 0.791 0.852 | 0.921 0.857 0.053 0.838 0.928 0.852 0.059 0.842 0.928
Case 2 | 50 | Low | 0.88 0.874 0.000 0.873 0.771 0.888 0.000 0.859 0.889 | 0.929 0.897 0.042 0.891 0.926 0.907 0.048 0.895 0.93
Case 2 | 50 | Mixture one | 0.861 0.817 0.000 0.827 0.778 0.827 0.000 0.815 0.859 | 0.892 0.858 0.058 0.856 0.895 0.851 0.057 0.852 0.888
Case 2 | 50 | Mixture two | 0.891 0.847 0.000 0.844 0.749 0.856 0.000 0.84 0.883 | 0.909 0.87 0.037 0.864 0.916 0.879 0.047 0.877 0.911

Note: Common denotes gene regions only with common variants, Rare denotes gene regions only with rare variants, Low denotes gene regions only with low-frequency variants, Mixture one denotes gene regions with 20% of common variants and 80% of rare variants, and Mixture two denotes gene regions with 80% of common variants and 20% of rare variants.

Case Two Simulation

In this case, the time effect function is . The time effect is 0 at certain time points (t = 3 and t = 7), so the genes are not expressed at these time points. The remaining settings are the same as in the case one simulation, and all power results are given in Supplementary Data S2. The settings of case two in Table 2 and Figure 2 are also the same as for case one. Figure 2 shows that the power of LFDAT is 0 at time points 3 and 7 for all five gene regions; that is, no association is detected at the time points where the genes are switched off. This indicates that LFDAT can accurately detect the selective expression of genes. The other features and trends in these figures are consistent with the case one simulation. We also compare the power of LFDAT and the Smoothed FLM for case two (see Supplementary Data S1). In Table 2, the rejection rate of LFDAT is 0 at time points 3 and 7, whereas the Smoothed FLM still rejects in 0.037 to 0.063 of the replicates at these time points. Because the true effect is zero there, these rejections are false positives rather than power. The simulation results therefore show that the performance of LFDAT is stable across scenarios and that it detects gene switching more accurately than the Smoothed FLM: while maintaining high power, it accurately identifies whether genes are expressed.
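A toy simulation shows why a rejection rate at t = 3 and t = 7 in case two measures false positives rather than power. It uses a deliberately simple per-time-point test, not LFDAT or the Smoothed FLM: a Pearson correlation test between genotype and trait with a large-sample normal approximation. The following are hypothetical choices for illustration only:

* an effect size of 0.3,
* a minor allele frequency of 0.3,
* independent noise across time points,
* all function names.

A calibrated test rejects at roughly the nominal α where the effect is zero. This matches the behavior of the Smoothed FLM in Table 2.

```python
import math
import random


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def marginal_p_value(x, y):
    """Two-sided p-value for zero correlation (normal approximation to the t statistic)."""
    n = len(x)
    r = pearson(x, y)
    z = r * math.sqrt((n - 2) / (1.0 - r * r))
    return math.erfc(abs(z) / math.sqrt(2.0))


def simulate_switching(effects, n=300, reps=300, maf=0.3, alpha=0.05, seed=2022):
    """Per-time-point rejection rates when the genetic effect at time index t is effects[t]."""
    rng = random.Random(seed)
    hits = [0] * len(effects)
    for _ in range(reps):
        # Additive genotype coding 0/1/2 under Hardy-Weinberg equilibrium.
        g = [(rng.random() < maf) + (rng.random() < maf) for _ in range(n)]
        for t, beta in enumerate(effects):
            y = [beta * gi + rng.gauss(0.0, 1.0) for gi in g]
            if marginal_p_value(g, y) < alpha:
                hits[t] += 1
    return [h / reps for h in hits]


# Hypothetical case-two-like pattern: effect switched off at t = 3 and t = 7.
effects = [0.0 if t in (3, 7) else 0.3 for t in range(1, 10)]
rates = simulate_switching(effects)
for t, rate in zip(range(1, 10), rates):
    print(f"t = {t}: rejection rate {rate:.3f}")
```

At the switched-off time points the printed rates should sit near 0.05, while the other time points show high power.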

3.1.3 Estimation of ISE0, ISE1, and PMSE

We estimated the three evaluation indicators for both cases and all five gene regions (see Supplementary Data S1 and S2). Table 3 shows only the results of case one and case two when c is 3 and the sample size is 2,000. In both cases, the gene region with rare variants has the largest means of ISE0 and ISE1 among the five gene regions. The gene region with low-frequency variants has the largest mean PMSE, and the gene region with rare variants has the smallest. This indicates that LFDAT fits the time-varying effect function better for variants with smaller genetic effects. For a given proportion of causal variants and value of c, changing the proportion of negative effects has little influence on the means and standard errors of ISE0, ISE1, and PMSE. As the proportion of causal variants and c increase (see Supplementary Data S1 and S2), the means and standard errors of ISE0 and PMSE gradually increase, whereas those of ISE1 decrease. All three indicators are smaller in case two than in case one, possibly because of gene switching. In case two, the time-varying genetic effect is zero at certain time points (t = 3 and t = 7), so the difference between the estimated and the true time-varying functions is smaller.
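The definitions of ISE0, ISE1, and PMSE are not repeated in this section. The sketch below rests on two assumptions:

* ISE is the integrated squared difference between an estimated and a true effect function over the time range, approximated with the trapezoidal rule on the observation grid.
* PMSE is the mean squared difference between observed and predicted trait values over all individuals and time points.

The function names and toy inputs are hypothetical. If a coefficient function depends on two arguments, the trapezoidal rule would be applied in both directions.

```python
import math
from typing import Sequence


def integrated_squared_error(grid: Sequence[float],
                             beta_hat: Sequence[float],
                             beta_true: Sequence[float]) -> float:
    """Trapezoidal approximation of the integral of (beta_hat(t) - beta_true(t))^2."""
    if not (len(grid) == len(beta_hat) == len(beta_true)) or len(grid) < 2:
        raise ValueError("grid and function values must align and contain >= 2 points")
    sq = [(a - b) ** 2 for a, b in zip(beta_hat, beta_true)]
    return sum((grid[k + 1] - grid[k]) * (sq[k] + sq[k + 1]) / 2.0
               for k in range(len(grid) - 1))


def prediction_mse(y: Sequence[Sequence[float]],
                   y_hat: Sequence[Sequence[float]]) -> float:
    """Mean squared prediction error over all individuals (rows) and time points (columns)."""
    total, count = 0.0, 0
    for row, row_hat in zip(y, y_hat):
        for obs, pred in zip(row, row_hat):
            total += (obs - pred) ** 2
            count += 1
    return total / count


# Toy check: an estimate off by a constant 0.1 on [0, 1] has ISE 0.01.
grid = [k / 10 for k in range(11)]
truth = [math.sin(t) for t in grid]
estimate = [v + 0.1 for v in truth]
print(round(integrated_squared_error(grid, estimate, truth), 6))  # 0.01
print(prediction_mse([[1.0, 2.0], [3.0, 4.0]], [[1.0, 1.0], [3.0, 5.0]]))  # 0.5
```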
TABLE 3

Means and standard errors (in parentheses) of the three indicators based on LFDAT in the linkage equilibrium simulation when the sample size is 2,000 and c is 3.

Case | Proportion of causal variants (%) | Proportion of negative effects (%) | ISE0 (Common, Rare, Low-frequency, Mixture one, Mixture two regions) | ISE1 (same order) | PMSE (same order)
Case 1 | 1 | 0 | 0.021 0.240 0.125 0.075 0.022 | 2.127 26.648 11.445 22.246 7.017 | 5.509 2.878 5.772 3.394 5.027
 | | | (0.029) (0.314) (0.170) (0.113) (0.032) | (0.610) (3.242) (1.263) (4.131) (3.378) | (0.568) (0.245) (0.305) (0.448) (0.618)
Case 1 | 1 | 20 | 0.020 0.239 0.135 0.071 0.023 | 2.135 26.966 11.596 22.288 7.116 | 5.516 2.888 5.810 3.420 4.992
 | | | (0.028) (0.354) (0.176) (0.110) (0.030) | (0.583) (3.127) (1.347) (4.128) (3.226) | (0.567) (0.243) (0.312) (0.452) (0.585)
Case 1 | 1 | 50 | 0.021 0.224 0.136 0.071 0.025 | 2.167 27.160 11.647 22.405 7.353 | 5.553 2.899 5.809 3.429 4.980
 | | | (0.028) (0.330) (0.183) (0.111) (0.032) | (0.573) (3.247) (1.316) (4.060) (3.519) | (0.536) (0.245) (0.325) (0.459) (0.609)
Case 1 | 2 | 0 | 0.042 0.473 0.266 0.153 0.050 | 4.560 57.106 24.532 47.390 15.375 | 10.701 5.029 11.252 6.190 9.565
 | | | (0.056) (0.608) (0.375) (0.238) (0.063) | (1.176) (7.101) (2.663) (9.112) (7.599) | (1.163) (0.501) (0.689) (0.970) (1.296)
Case 1 | 2 | 20 | 0.042 0.444 0.306 0.146 0.046 | 4.580 57.847 24.641 47.954 15.174 | 10.714 5.066 11.337 6.166 9.609
 | | | (0.058) (0.665) (0.412) (0.235) (0.058) | (1.238) (7.153) (2.809) (8.801) (7.187) | (1.168) (0.525) (0.676) (0.968) (1.277)
Case 1 | 2 | 50 | 0.042 0.473 0.297 0.140 0.052 | 4.618 58.338 24.984 48.150 15.305 | 10.743 5.048 11.363 6.160 9.589
 | | | (0.055) (0.646) (0.404) (0.202) (0.070) | (1.265) (6.747) (2.739) (9.142) (7.503) | (1.188) (0.517) (0.685) (1.008) (1.328)
Case 1 | 4 | 0 | 0.058 0.691 0.394 0.212 0.076 | 6.653 83.587 35.623 69.951 22.251 | 15.127 6.882 16.001 8.452 13.489
 | | | (0.086) (0.983) (0.510) (0.329) (0.107) | (1.730) (10.062) (4.284) (12.158) (10.635) | (1.739) (0.757) (1.011) (1.261) (1.861)
Case 1 | 4 | 20 | 0.059 0.677 0.426 0.226 0.068 | 6.766 84.347 36.388 70.579 22.052 | 15.279 6.942 16.071 8.516 13.565
 | | | (0.084) (0.984) (0.556) (0.358) (0.094) | (1.877) (9.980) (4.281) (12.590) (10.306) | (1.787) (0.770) (1.015) (1.408) (1.836)
Case 1 | 4 | 50 | 0.058 0.608 0.413 0.177 0.073 | 6.827 85.606 36.675 70.777 23.065 | 15.242 6.919 16.126 8.542 13.488
 | | | (0.078) (0.781) (0.592) (0.264) (0.102) | (1.868) (10.091) (4.380) (12.597) (11.529) | (1.726) (0.769) (0.964) (1.387) (1.938)
Case 2 | 1 | 0 | 0.009 0.104 0.058 0.028 0.010 | 1.185 15.102 6.505 12.644 4.087 | 3.533 2.077 3.717 2.341 3.241
 | | | (0.012) (0.144) (0.073) (0.038) (0.013) | (0.315) (1.666) (0.670) (2.355) (2.006) | (0.312) (0.140) (0.183) (0.253) (0.346)
Case 2 | 1 | 20 | 0.008 0.108 0.056 0.030 0.010 | 1.197 15.315 6.568 12.569 4.064 | 3.559 2.076 3.725 2.375 3.258
 | | | (0.010) (0.147) (0.079) (0.046) (0.013) | (0.330) (1.683) (0.650) (2.257) (1.901) | (0.324) (0.142) (0.184) (0.255) (0.351)
Case 2 | 1 | 50 | 0.009 0.104 0.056 0.030 0.009 | 1.192 15.240 6.553 12.618 3.877 | 3.534 2.091 3.749 2.369 3.282
 | | | (0.011) (0.137) (0.076) (0.049) (0.014) | (0.331) (1.670) (0.701) (2.390) (1.885) | (0.321) (0.138) (0.195) (0.263) (0.342)
Case 2 | 2 | 0 | 0.018 0.196 0.128 0.061 0.020 | 2.574 32.611 13.920 26.987 8.521 | 6.470 3.297 6.832 3.948 5.866
 | | | (0.025) (0.262) (0.167) (0.092) (0.028) | (0.696) (3.627) (1.367) (4.893) (4.069) | (0.685) (0.280) (0.384) (0.542) (0.726)
Case 2 | 2 | 20 | 0.018 0.188 0.117 0.067 0.019 | 2.626 32.912 14.042 26.612 8.770 | 6.515 3.302 6.859 3.982 5.859
 | | | (0.024) (0.267) (0.167) (0.111) (0.025) | (0.712) (3.757) (1.324) (4.753) (4.163) | (0.669) (0.313) (0.390) (0.520) (0.713)
Case 2 | 2 | 50 | 0.019 0.197 0.122 0.066 0.021 | 2.570 32.996 14.117 27.021 8.695 | 6.479 3.308 6.886 3.957 5.858
 | | | (0.026) (0.272) (0.170) (0.106) (0.031) | (0.691) (3.549) (1.557) (4.992) (4.238) | (0.667) (0.301) (0.398) (0.542) (0.755)
Case 2 | 4 | 0 | 0.024 0.293 0.162 0.080 0.030 | 3.742 47.563 20.455 39.247 13.014 | 8.974 4.364 9.521 5.261 7.990
 | | | (0.033) (0.394) (0.224) (0.135) (0.042) | (1.029) (5.306) (2.079) (7.110) (6.094) | (0.976) (0.422) (0.568) (0.780) (1.046)
Case 2 | 4 | 20 | 0.024 0.284 0.168 0.090 0.028 | 3.791 47.697 20.684 39.809 12.961 | 9.038 4.405 9.541 5.266 8.105
 | | | (0.031) (0.355) (0.230) (0.136) (0.040) | (1.006) (5.391) (2.106) (7.001) (5.980) | (0.936) (0.436) (0.565) (0.787) (1.048)
Case 2 | 4 | 50 | 0.024 0.292 0.161 0.096 0.030 | 3.813 48.092 20.674 39.038 12.581 | 9.070 4.391 9.587 5.376 8.141
 | | | (0.033) (0.424) (0.232) (0.162) (0.040) | (1.006) (5.468) (2.026) (6.924) (6.147) | (0.999) (0.437) (0.588) (0.816) (1.062)

Note: Common region denotes gene regions only with common variants, Rare region denotes gene regions only with rare variants, Low-frequency region denotes gene regions only with low-frequency variants, Mixture region one denotes gene regions with 20% of common variants and 80% of rare variants, and the Mixture region two denotes gene regions with 80% of common variants and 20% of rare variants.

3.2 Linkage Disequilibrium Simulation

Linkage disequilibrium is measured by r², which is randomly generated from a uniform distribution U(a, b), so the linkage disequilibrium differs between pairs of SNPs. We consider two scenarios: r² between 0.01 and 0.25, and r² between 0.25 and 0.64. The simulation settings for type I error rates and power are the same as in Section 3.1, and all simulation results are given in Supplementary Data S3–S6. Owing to space limitations, and because the two scenarios show similar features and trends, we display only part of the results of the second scenario (r² between 0.25 and 0.64). Table 4 shows the type I error rates of LFDAT and the Smoothed FLM at significance levels of 0.05, 0.01, and 0.001 for the linkage disequilibrium simulation. Part of the power results is shown in Table 5 (significance level 0.05, sample size 2,000, c = 7, and 1% causal variants) and Figure 3 (sample size 2,000 and c = 3). The type I error rates of the rare and low-frequency gene regions are again lower than those of the other gene regions. The type I error rates of LFDAT are again lower than those of the Smoothed FLM, whose type I error rates are slightly inflated. This again suggests that using repeated measurements of the traits reduces the probability of type I errors. Power under linkage disequilibrium is very high in both cases. In case one, the power for all five gene regions is 100% when the proportion of negative effects of causal variants is 0% or 20%. Power in the linkage disequilibrium simulation is much higher than in the linkage equilibrium simulation. We attribute this to the correlation among loci, because the effects of the correlated variants in a gene region are considered jointly as an overall effect.
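This section does not state how genotypes with a given r² are generated. One standard way is to convert a target r² for two biallelic SNPs into the four haplotype frequencies and then sample haplotypes from them. This is shown only as an assumed illustration, not necessarily the authors' procedure. Given allele frequencies p_A and p_B, the disequilibrium coefficient is D = sqrt(r² p_A q_A p_B q_B). A positive D cannot exceed min(p_A q_B, q_A p_B). As a result, large r² values are unattainable when the two allele frequencies differ greatly, for example when a rare variant is paired with a common one. All function names are hypothetical.

```python
import math
import random


def haplotype_frequencies(p_a: float, p_b: float, r2: float):
    """Frequencies of haplotypes (AB, Ab, aB, ab) with positive LD giving the target r^2."""
    q_a, q_b = 1.0 - p_a, 1.0 - p_b
    d = math.sqrt(r2 * p_a * q_a * p_b * q_b)
    d_max = min(p_a * q_b, q_a * p_b)  # keeps all four frequencies non-negative
    if d > d_max:
        raise ValueError(f"r2={r2:.3f} not attainable for p_a={p_a}, p_b={p_b}")
    return (p_a * p_b + d, p_a * q_b - d, q_a * p_b - d, q_a * q_b + d)


def r_squared(freqs, p_a: float, p_b: float) -> float:
    """Recover r^2 = D^2 / (p_a q_a p_b q_b) from the AB haplotype frequency."""
    d = freqs[0] - p_a * p_b
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))


def sample_genotypes(freqs, n: int, rng: random.Random):
    """Draw n diploid individuals; return (count of A alleles, count of B alleles) per person."""
    haps = [(1, 1), (1, 0), (0, 1), (0, 0)]  # (carries A, carries B) for AB, Ab, aB, ab
    out = []
    for _ in range(n):
        h1, h2 = rng.choices(haps, weights=freqs, k=2)
        out.append((h1[0] + h2[0], h1[1] + h2[1]))
    return out


rng = random.Random(1)
r2 = rng.uniform(0.25, 0.64)  # r^2 range of the second scenario
freqs = haplotype_frequencies(0.5, 0.4, r2)
print(abs(r_squared(freqs, 0.5, 0.4) - r2) < 1e-12)  # True
genotypes = sample_genotypes(freqs, 5, rng)
```

With p_A = 0.5 and p_B = 0.4, the largest attainable r² is about 0.67, so every value in the second scenario's range can be reached.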
TABLE 4

Type I error rates of LFDAT and the Smoothed FLM based on 1,000 simulated replicates in the linkage disequilibrium simulation.

α | Sample size | Gene region | LFDAT, t = 1 to 9 | Smoothed FLM, t = 1 to 9
0.05 | 1,000 | Common | 0.029 0.009 0.019 0.010 0.010 0.014 0.010 0.010 0.028 | 0.052 0.041 0.059 0.043 0.049 0.049 0.044 0.034 0.041
0.05 | 1,000 | Rare | 0.010 0.005 0.005 0.008 0.005 0.004 0.005 0.008 0.010 | 0.041 0.054 0.051 0.051 0.056 0.052 0.056 0.047 0.048
0.05 | 1,000 | Low | 0.018 0.007 0.008 0.002 0.009 0.005 0.009 0.006 0.021 | 0.034 0.048 0.047 0.044 0.045 0.044 0.061 0.055 0.042
0.05 | 1,000 | Mixture one | 0.030 0.014 0.011 0.007 0.003 0.006 0.003 0.003 0.032 | 0.051 0.055 0.053 0.043 0.040 0.046 0.042 0.043 0.053
0.05 | 1,000 | Mixture two | 0.032 0.010 0.011 0.013 0.011 0.016 0.009 0.017 0.036 | 0.048 0.033 0.049 0.043 0.053 0.060 0.058 0.055 0.052
0.05 | 1,500 | Common | 0.036 0.009 0.010 0.011 0.008 0.011 0.015 0.009 0.033 | 0.062 0.048 0.057 0.049 0.044 0.043 0.057 0.050 0.049
0.05 | 1,500 | Rare | 0.010 0.004 0.003 0.004 0.004 0.005 0.002 0.005 0.010 | 0.049 0.050 0.041 0.047 0.056 0.044 0.044 0.044 0.056
0.05 | 1,500 | Low | 0.015 0.007 0.004 0.004 0.008 0.008 0.006 0.008 0.021 | 0.048 0.056 0.041 0.053 0.069 0.050 0.054 0.051 0.055
0.05 | 1,500 | Mixture one | 0.034 0.013 0.012 0.005 0.009 0.007 0.008 0.009 0.039 | 0.057 0.055 0.058 0.061 0.056 0.039 0.050 0.045 0.064
0.05 | 1,500 | Mixture two | 0.035 0.007 0.015 0.008 0.011 0.019 0.010 0.016 0.039 | 0.052 0.042 0.053 0.044 0.052 0.055 0.050 0.061 0.057
0.05 | 2,000 | Common | 0.032 0.020 0.016 0.011 0.018 0.015 0.020 0.019 0.043 | 0.046 0.054 0.041 0.045 0.057 0.052 0.048 0.060 0.054
0.05 | 2,000 | Rare | 0.019 0.003 0.007 0.004 0.008 0.007 0.011 0.011 0.017 | 0.048 0.047 0.054 0.045 0.057 0.062 0.045 0.048 0.055
0.05 | 2,000 | Low | 0.016 0.015 0.010 0.007 0.006 0.006 0.008 0.008 0.023 | 0.043 0.051 0.052 0.039 0.055 0.055 0.038 0.039 0.044
0.05 | 2,000 | Mixture one | 0.023 0.006 0.007 0.007 0.012 0.003 0.011 0.007 0.026 | 0.053 0.047 0.043 0.055 0.065 0.059 0.064 0.045 0.048
0.05 | 2,000 | Mixture two | 0.036 0.014 0.016 0.018 0.014 0.017 0.017 0.018 0.036 | 0.052 0.054 0.058 0.048 0.049 0.048 0.047 0.059 0.050
0.01 | 1,000 | Common | 0.006 0.004 0.002 0.003 0.003 0.006 0.001 0.001 0.003 | 0.014 0.010 0.018 0.012 0.013 0.013 0.006 0.007 0.005
0.01 | 1,000 | Rare | 0.001 0.002 0.001 0.000 0.001 0.002 0.001 0.003 0.004 | 0.009 0.008 0.008 0.012 0.008 0.011 0.010 0.014 0.011
0.01 | 1,000 | Low | 0.001 0.002 0.001 0.000 0.001 0.000 0.000 0.001 0.000 | 0.004 0.009 0.010 0.002 0.011 0.007 0.012 0.004 0.008
0.01 | 1,000 | Mixture one | 0.008 0.001 0.002 0.000 0.001 0.000 0.000 0.000 0.006 | 0.016 0.017 0.010 0.011 0.007 0.010 0.003 0.004 0.013
0.01 | 1,000 | Mixture two | 0.008 0.002 0.002 0.003 0.002 0.001 0.000 0.002 0.007 | 0.012 0.007 0.012 0.011 0.010 0.013 0.011 0.016 0.011
0.01 | 1,500 | Common | 0.005 0.001 0.001 0.002 0.002 0.001 0.001 0.000 0.006 | 0.013 0.008 0.010 0.011 0.008 0.008 0.012 0.009 0.008
0.01 | 1,500 | Rare | 0.001 0.000 0.000 0.001 0.001 0.000 0.000 0.000 0.002 | 0.009 0.009 0.005 0.014 0.014 0.009 0.005 0.011 0.014
0.01 | 1,500 | Low | 0.004 0.002 0.000 0.000 0.002 0.000 0.000 0.003 0.003 | 0.007 0.010 0.005 0.010 0.009 0.010 0.015 0.012 0.008
0.01 | 1,500 | Mixture one | 0.006 0.003 0.001 0.000 0.002 0.000 0.001 0.002 0.008 | 0.013 0.015 0.012 0.010 0.018 0.009 0.007 0.014 0.018
0.01 | 1,500 | Mixture two | 0.006 0.001 0.002 0.002 0.003 0.003 0.003 0.001 0.007 | 0.010 0.005 0.014 0.008 0.014 0.013 0.007 0.009 0.009
0.01 | 2,000 | Common | 0.009 0.002 0.004 0.001 0.002 0.003 0.004 0.001 0.007 | 0.011 0.013 0.012 0.005 0.011 0.011 0.018 0.013 0.011
0.01 | 2,000 | Rare | 0.003 0.000 0.000 0.000 0.001 0.000 0.001 0.001 0.004 | 0.013 0.004 0.010 0.007 0.012 0.012 0.017 0.010 0.013
0.01 | 2,000 | Low | 0.005 0.002 0.000 0.000 0.001 0.001 0.001 0.000 0.004 | 0.009 0.015 0.009 0.010 0.009 0.011 0.006 0.010 0.010
0.01 | 2,000 | Mixture one | 0.003 0.001 0.001 0.001 0.003 0.001 0.002 0.002 0.005 | 0.007 0.005 0.008 0.013 0.018 0.003 0.012 0.012 0.013
0.01 | 2,000 | Mixture two | 0.011 0.002 0.003 0.004 0.004 0.002 0.003 0.001 0.004 | 0.013 0.007 0.012 0.013 0.009 0.012 0.012 0.012 0.011
0.001 | 1,000 | Common | 0.000 0.002 0.000 0.000 0.000 0.000 0.000 0.000 0.000 | 0.001 0.002 0.001 0.002 0.001 0.005 0.001 0.002 0.001
0.001 | 1,000 | Rare | 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 | 0.000 0.002 0.001 0.002 0.001 0.002 0.001 0.004 0.002
0.001 | 1,000 | Low | 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 | 0.000 0.001 0.000 0.000 0.003 0.001 0.001 0.001 0.000
0.001 | 1,000 | Mixture one | 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.001 | 0.001 0.003 0.003 0.000 0.001 0.000 0.000 0.000 0.001
0.001 | 1,000 | Mixture two | 0.001 0.000 0.000 0.000 0.000 0.001 0.000 0.000 0.001 | 0.001 0.000 0.000 0.001 0.001 0.001 0.000 0.001 0.001
0.001 | 1,500 | Common | 0.000 0.000 0.000 0.000 0.001 0.001 0.000 0.000 0.000 | 0.000 0.001 0.000 0.002 0.001 0.001 0.001 0.000 0.001
0.001 | 1,500 | Rare | 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 | 0.000 0.000 0.001 0.001 0.002 0.001 0.000 0.000 0.002
0.001 | 1,500 | Low | 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.001 | 0.001 0.001 0.000 0.001 0.003 0.000 0.001 0.003 0.002
0.001 | 1,500 | Mixture one | 0.001 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.002 | 0.001 0.001 0.001 0.000 0.002 0.001 0.001 0.001 0.002
0.001 | 1,500 | Mixture two | 0.000 0.000 0.000 0.000 0.001 0.000 0.000 0.000 0.000 | 0.000 0.000 0.000 0.000 0.003 0.001 0.003 0.000 0.001
0.001 | 2,000 | Common | 0.003 0.000 0.000 0.000 0.000 0.001 0.001 0.000 0.000 | 0.003 0.001 0.000 0.001 0.001 0.001 0.003 0.000 0.000
0.001 | 2,000 | Rare | 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 | 0.002 0.000 0.002 0.001 0.002 0.000 0.002 0.002 0.001
0.001 | 2,000 | Low | 0.000 0.000 0.000 0.000 0.000 0.001 0.000 0.000 0.001 | 0.000 0.002 0.000 0.001 0.001 0.002 0.002 0.001 0.002
0.001 | 2,000 | Mixture one | 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 | 0.000 0.000 0.001 0.000 0.001 0.001 0.000 0.001 0.000
0.001 | 2,000 | Mixture two | 0.000 0.000 0.002 0.000 0.000 0.000 0.000 0.000 0.000 | 0.001 0.001 0.003 0.000 0.003 0.002 0.002 0.000 0.000

Note: (i) The r² measure of linkage disequilibrium is between 0.25 and 0.64; (ii) Common denotes gene regions only with common variants, Rare denotes gene regions only with rare variants, Low denotes gene regions only with low-frequency variants, Mixture one denotes gene regions with 20% of common variants and 80% of rare variants, and Mixture two denotes gene regions with 80% of common variants and 20% of rare variants.

TABLE 5

Power of LFDAT and the Smoothed FLM in the linkage disequilibrium simulation at a significance level of 0.05 when the sample size is 2,000, c is 7, and the proportion of causal variants is 1%.

Case | Proportion of negative effects (%) | Gene region | LFDAT, t = 1 to 9 | Smoothed FLM, t = 1 to 9
Case 1 | 0 | Common | 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 | 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
Case 1 | 0 | Rare | 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 | 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
Case 1 | 0 | Low | 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 | 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
Case 1 | 0 | Mixture one | 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 | 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
Case 1 | 0 | Mixture two | 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 | 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
Case 1 | 20 | Common | 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 | 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
Case 1 | 20 | Rare | 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 | 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
Case 1 | 20 | Low | 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 | 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
Case 1 | 20 | Mixture one | 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 | 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
Case 1 | 20 | Mixture two | 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 | 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
Case 1 | 50 | Common | 0.999 0.999 0.999 1.000 1.000 0.999 0.999 1.000 0.999 | 0.999 0.999 0.999 1.000 1.000 0.999 0.999 1.000 0.999
Case 1 | 50 | Rare | 0.997 0.999 1.000 0.999 1.000 0.999 0.999 0.999 1.000 | 0.997 0.999 1.000 1.000 1.000 0.999 0.999 0.999 1.000
Case 1 | 50 | Low | 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 | 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
Case 1 | 50 | Mixture one | 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 | 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
Case 1 | 50 | Mixture two | 0.999 0.999 1.000 0.999 0.999 0.999 0.999 0.999 0.999 | 0.999 0.999 1.000 0.999 0.999 0.999 0.999 0.999 0.999
Case 2 | 0 | Common | 1.000 1.000 0.000 1.000 1.000 1.000 0.000 1.000 1.000 | 1.000 1.000 0.052 1.000 1.000 1.000 0.060 1.000 1.000
Case 2 | 0 | Rare | 1.000 1.000 0.000 1.000 1.000 1.000 0.000 1.000 1.000 | 1.000 1.000 0.051 1.000 1.000 1.000 0.049 1.000 1.000
Case 2 | 0 | Low | 1.000 1.000 0.000 1.000 1.000 1.000 0.000 1.000 1.000 | 1.000 1.000 0.042 1.000 1.000 1.000 0.049 1.000 1.000
Case 2 | 0 | Mixture one | 1.000 1.000 0.000 1.000 1.000 1.000 0.000 1.000 1.000 | 1.000 1.000 0.058 1.000 1.000 1.000 0.043 1.000 1.000
Case 2 | 0 | Mixture two | 1.000 1.000 0.000 1.000 1.000 1.000 0.000 1.000 1.000 | 1.000 1.000 0.057 1.000 1.000 1.000 0.051 1.000 1.000
Case 2 | 20 | Common | 1.000 1.000 0.000 1.000 1.000 1.000 0.000 1.000 1.000 | 1.000 1.000 0.053 1.000 1.000 1.000 0.048 1.000 1.000
Case 2 | 20 | Rare | 1.000 1.000 0.000 1.000 1.000 1.000 0.000 1.000 1.000 | 1.000 1.000 0.049 1.000 1.000 1.000 0.044 1.000 1.000
Case 2 | 20 | Low | 1.000 1.000 0.000 1.000 1.000 1.000 0.000 1.000 1.000 | 1.000 1.000 0.060 1.000 1.000 1.000 0.065 1.000 1.000
Case 2 | 20 | Mixture one | 1.000 1.000 0.000 1.000 1.000 1.000 0.000 1.000 1.000 | 1.000 1.000 0.053 1.000 1.000 1.000 0.063 1.000 1.000
Case 2 | 20 | Mixture two | 1.000 1.000 0.000 1.000 1.000 1.000 0.000 1.000 1.000 | 1.000 1.000 0.053 1.000 1.000 1.000 0.041 1.000 1.000
Case 2 | 50 | Common | 1.000 1.000 0.000 1.000 1.000 1.000 0.000 1.000 1.000 | 1.000 1.000 0.051 1.000 1.000 1.000 0.054 1.000 1.000
Case 2 | 50 | Rare | 0.998 0.994 0.000 0.998 0.998 0.998 0.000 0.996 0.998 | 0.998 0.996 0.052 0.998 0.998 0.998 0.044 0.999 0.998
Case 2 | 50 | Low | 1.000 1.000 0.000 1.000 0.999 1.000 0.000 1.000 1.000 | 1.000 1.000 0.052 1.000 1.000 1.000 0.054 1.000 1.000
Case 2 | 50 | Mixture one | 1.000 0.999 0.000 1.000 0.999 0.998 0.000 0.999 1.000 | 1.000 0.999 0.061 1.000 1.000 0.998 0.054 1.000 1.000
Case 2 | 50 | Mixture two | 1.000 1.000 0.000 1.000 1.000 1.000 0.000 1.000 1.000 | 1.000 1.000 0.053 1.000 1.000 1.000 0.058 1.000 1.000

Note: (i) The r² measure of linkage disequilibrium is between 0.25 and 0.64; (ii) Common denotes gene regions only with common variants, Rare denotes gene regions only with rare variants, Low denotes gene regions only with low-frequency variants, Mixture one denotes gene regions with 20% of common variants and 80% of rare variants, and Mixture two denotes gene regions with 80% of common variants and 20% of rare variants.

FIGURE 3

Power of LFDAT for the five gene regions in case one and case two of the linkage disequilibrium simulation when c is 3 and the sample size is 2,000. Panels (A–C) show the power results of case one, and panels (D–F) show those of case two. The time effect function is for case one, and for case two. Case one: (A) Proportion of causal variants is 1%. (B) Proportion of causal variants is 2%. (C) Proportion of causal variants is 4%. Case two: (D) Proportion of causal variants is 1%. (E) Proportion of causal variants is 2%. (F) Proportion of causal variants is 4%. Note: The r² measure of linkage disequilibrium is between 0.25 and 0.64; Common region denotes gene regions containing only common variants, Rare region denotes gene regions containing only rare variants, Low region denotes gene regions containing only low-frequency variants, Mixture region one denotes gene regions with 20% common variants and 80% rare variants, and Mixture region two denotes gene regions with 80% common variants and 20% rare variants.

We also evaluated the three indicators for both cases and all five gene regions (see Supplementary Data S3–S6). Table 6 shows the results of case one and case two for the second scenario (r² between 0.25 and 0.64) when c is 3 and the sample size is 2,000. As in the linkage equilibrium simulation, the rare gene region has the largest means of ISE0 and ISE1 and the smallest mean of PMSE. This again supports the conclusion that LFDAT fits the time-varying effect function better for gene regions with smaller genetic effects.
TABLE 6

Means and standard errors (in parentheses) of the three indicators based on LFDAT in the linkage disequilibrium simulation when the sample size is 2,000 and c is 3.

Case | Proportion of causal variants (%) | Proportion of negative effects (%) | ISE0 (Common, Rare, Low-frequency, Mixture one, Mixture two regions) | ISE1 (same order) | PMSE (same order)
Case 1 | 1 | 0 | 0.014 0.215 0.097 0.076 0.018 | 1.500 24.640 10.842 10.978 2.220 | 5.262 2.822 5.603 5.492 5.897
 | | | (0.021) (0.286) (0.128) (0.109) (0.023) | (0.200) (1.945) (0.866) (2.210) (0.424) | (0.405) (0.167) (0.299) (0.514) (0.466)
Case 1 | 1 | 20 | 0.012 0.220 0.096 0.071 0.018 | 1.522 24.998 10.966 11.203 2.226 | 5.258 2.821 5.660 5.500 5.928
 | | | (0.017) (0.305) (0.136) (0.098) (0.025) | (0.219) (2.132) (0.836) (2.241) (0.410) | (0.411) (0.170) (0.318) (0.524) (0.459)
Case 1 | 1 | 50 | 0.014 0.201 0.103 0.072 0.018 | 1.545 25.089 11.083 11.206 2.243 | 5.303 2.831 5.645 5.520 5.933
 | | | (0.019) (0.291) (0.139) (0.101) (0.026) | (0.214) (1.918) (0.837) (2.293) (0.433) | (0.414) (0.168) (0.319) (0.527) (0.467)
Case 1 | 2 | 0 | 0.029 0.427 0.209 0.154 0.038 | 3.262 52.701 23.190 23.840 4.727 | 10.148 4.903 10.905 10.629 11.507
 | | | (0.038) (0.588) (0.285) (0.223) (0.055) | (0.478) (4.013) (1.631) (5.010) (0.917) | (0.882) (0.361) (0.665) (1.164) (0.977)
Case 1 | 2 | 20 | 0.030 0.469 0.209 0.159 0.039 | 3.312 53.610 23.556 24.294 4.814 | 10.226 4.929 10.975 10.589 11.605
 | | | (0.042) (0.679) (0.275) (0.219) (0.058) | (0.473) (3.914) (1.711) (4.985) (0.976) | (0.840) (0.348) (0.657) (1.153) (1.010)
Case 1 | 2 | 50 | 0.028 0.449 0.208 0.163 0.039 | 3.312 53.686 23.834 24.057 4.820 | 10.257 4.973 11.011 10.697 11.639
 | | | (0.041) (0.625) (0.321) (0.223) (0.053) | (0.478) (4.070) (2.070) (4.988) (0.927) | (0.894) (0.346) (0.680) (1.157) (0.957)
Case 1 | 4 | 0 | 0.041 0.713 0.302 0.252 0.053 | 4.770 77.248 33.976 34.620 6.915 | 14.435 6.714 15.535 15.082 16.429
 | | | (0.053) (0.970) (0.415) (0.376) (0.073) | (0.674) (6.265) (2.621) (7.464) (1.333) | (1.259) (0.520) (0.985) (1.629) (1.400)
Case 1 | 4 | 20 | 0.042 0.671 0.318 0.240 0.050 | 4.829 78.215 34.583 35.011 7.012 | 14.472 6.774 15.571 15.114 16.459
 | | | (0.058) (0.941) (0.426) (0.331) (0.070) | (0.701) (6.598) (3.307) (7.349) (1.386) | (1.250) (0.526) (0.949) (1.763) (1.368)
Case 1 | 4 | 50 | 0.042 0.637 0.310 0.241 0.052 | 4.817 78.831 34.598 35.228 7.045 | 14.468 6.787 15.662 15.237 16.546
 | | | (0.059) (0.932) (0.428) (0.354) (0.071) | (0.692) (6.105) (2.502) (7.163) (1.380) | (1.282) (0.525) (0.989) (1.659) (1.381)
Case 2 | 1 | 0 | 0.006 0.091 0.041 0.029 0.007 | 0.878 13.948 6.194 6.516 1.312 | 3.476 2.069 3.661 3.548 3.855
 | | | (0.008) (0.129) (0.056) (0.039) (0.009) | (0.107) (0.906) (0.392) (1.372) (0.264) | (0.221) (0.100) (0.187) (0.330) (0.259)
Case 2 | 1 | 20 | 0.006 0.092 0.040 0.027 0.007 | 0.880 14.068 6.257 6.589 1.320 | 3.480 2.066 3.662 3.523 3.870
 | | | (0.008) (0.116) (0.057) (0.039) (0.010) | (0.109) (1.015) (0.367) (1.412) (0.248) | (0.226) (0.101) (0.188) (0.318) (0.247)
Case 2 | 1 | 50 | 0.006 0.089 0.041 0.027 0.007 | 0.884 14.164 6.311 6.522 1.351 | 3.471 2.066 3.666 3.538 3.878
 | | | (0.009) (0.120) (0.053) (0.037) (0.010) | (0.114) (0.923) (0.398) (1.332) (0.310) | (0.226) (0.102) (0.176) (0.305) (0.262)
Case 2 | 2 | 0 | 0.013 0.190 0.087 0.053 0.016 | 1.864 29.977 13.332 13.878 2.856 | 6.275 3.292 6.697 6.461 7.146
 | | | (0.019) (0.246) (0.110) (0.074) (0.022) | (0.233) (1.974) (0.810) (2.794) (0.626) | (0.467) (0.203) (0.370) (0.658) (0.561)
Case 2 | 2 | 20 | 0.013 0.194 0.088 0.053 0.016 | 1.892 30.220 13.441 13.989 2.892 | 6.319 3.292 6.688 6.461 7.146
 | | | (0.018) (0.269) (0.121) (0.074) (0.022) | (0.244) (1.954) (0.747) (2.993) (0.688) | (0.471) (0.208) (0.393) (0.702) (0.570)
Case 2 | 2 | 50 | 0.013 0.204 0.087 0.059 0.017 | 1.895 30.472 13.560 14.027 2.873 | 6.320 3.288 6.696 6.446 7.179
 | | | (0.018) (0.298) (0.114) (0.082) (0.023) | (0.244) (2.134) (0.865) (2.908) (0.609) | (0.480) (0.209) (0.386) (0.670) (0.535)
Case 2 | 4 | 0 | 0.019 0.261 0.125 0.081 0.022 | 2.727 43.856 19.508 20.433 4.123 | 8.745 4.354 9.340 8.970 9.984
 | | | (0.029) (0.347) (0.169) (0.118) (0.030) | (0.343) (2.944) (1.198) (4.177) (0.816) | (0.703) (0.293) (0.540) (0.949) (0.804)
Case 2 | 4 | 20 | 0.019 0.260 0.119 0.089 0.021 | 2.759 44.145 19.702 20.322 4.214 | 8.754 4.347 9.334 9.022 10.045
 | | | (0.025) (0.346) (0.155) (0.129) (0.027) | (0.350) (2.815) (1.187) (4.260) (0.853) | (0.698) (0.296) (0.565) (0.984) (0.796)
Case 2 | 4 | 50 | 0.018 0.289 0.127 0.077 0.022 | 2.775 44.346 19.757 20.320 4.224 | 8.802 4.355 9.374 9.003 10.032
 | | | (0.024) (0.442) (0.185) (0.105) (0.030) | (0.344) (2.867) (1.210) (4.165) (0.973) | (0.716) (0.299) (0.543) (0.975) (0.791)

Note: (i) The r² measure of linkage disequilibrium is between 0.25 and 0.64; (ii) Common region denotes gene regions only with common variants, Rare region denotes gene regions only with rare variants, Low-frequency region denotes gene regions only with low-frequency variants, Mixture region one denotes gene regions with 20% of common variants and 80% of rare variants, and the Mixture region two denotes gene regions with 80% of common variants and 20% of rare variants.

3.3 Comparison of Simulation

The linkage equilibrium simulation and the two scenarios of the linkage disequilibrium simulation are compared when the sample size is 1,500, c is 5, and the proportion of causal variants is 2% (Tables 7–9). The remaining simulation results show characteristics similar to those of the simulations above. In general, the type I error rates in the two linkage disequilibrium scenarios are larger than in the linkage equilibrium simulation. This is consistent with the higher power under linkage disequilibrium coming with slightly higher type I error rates. The results of both cases show that as the r² measure of linkage disequilibrium increases, the power also increases for all five gene regions. The power in the linkage disequilibrium simulation is also much less affected by the proportion of negative effects than in the linkage equilibrium simulation, which is likewise attributable to the correlation between loci.
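With 1,000 simulated replicates (Table 4), an empirical type I error rate is itself a noisy estimate. Judgments such as "lower", "close to", or "slightly inflated" are therefore easiest to make against a Monte Carlo band around α. The sketch below assumes a normal approximation to the binomial. That approximation is rough at α = 0.001 with 1,000 replicates, where an exact binomial interval would be preferable. The function names are hypothetical, and the example rates are taken from Table 4.

```python
import math


def mc_band(alpha: float, n_reps: int, z: float = 1.96):
    """Approximate 95% band for an empirical rejection rate when the true rate is alpha."""
    half = z * math.sqrt(alpha * (1.0 - alpha) / n_reps)
    return max(0.0, alpha - half), alpha + half


def classify(rate: float, alpha: float, n_reps: int = 1000) -> str:
    """Compare an observed type I error rate with the Monte Carlo band around alpha."""
    lo, hi = mc_band(alpha, n_reps)
    if rate < lo:
        return "conservative"
    if rate > hi:
        return "inflated"
    return "consistent with alpha"


print(mc_band(0.05, 1000))  # roughly (0.0365, 0.0635)
# Rates from Table 4 (alpha = 0.05, 1,000 replicates):
print(classify(0.005, 0.05))  # LFDAT, rare region, n = 1,000, t = 2
print(classify(0.061, 0.05))  # Smoothed FLM, low region, n = 1,000, t = 7
print(classify(0.069, 0.05))  # Smoothed FLM, low region, n = 1,500, t = 5
```

Under this band, most LFDAT rates at α = 0.05 fall below the lower limit. Most Smoothed FLM rates fall inside it, with occasional values just above it.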
TABLE 7

Compare the type Ⅰ error rates of linkage equilibrium and linkage disequilibrium simulation based on LFDAT when sample size is 1,500.

α      Simulation  Gene region  t = 1  t = 2  t = 3  t = 4  t = 5  t = 6  t = 7  t = 8  t = 9
0.05   LE          Common       0.025  0.009  0.002  0.010  0.005  0.009  0.007  0.009  0.027
0.05   LE          Rare         0.012  0.004  0.000  0.002  0.000  0.003  0.005  0.002  0.007
0.05   LE          Low          0.019  0.006  0.011  0.008  0.002  0.006  0.004  0.013  0.017
0.05   LE          Mixture one  0.013  0.010  0.006  0.002  0.000  0.008  0.002  0.006  0.021
0.05   LE          Mixture two  0.031  0.014  0.009  0.012  0.005  0.007  0.010  0.010  0.028
0.05   LD1         Common       0.029  0.012  0.012  0.009  0.012  0.010  0.008  0.017  0.031
0.05   LD1         Rare         0.007  0.005  0.004  0.005  0.003  0.003  0.003  0.006  0.011
0.05   LD1         Low          0.020  0.007  0.007  0.006  0.007  0.006  0.004  0.006  0.022
0.05   LD1         Mixture one  0.019  0.005  0.006  0.003  0.006  0.012  0.005  0.011  0.021
0.05   LD1         Mixture two  0.030  0.011  0.008  0.008  0.006  0.008  0.006  0.014  0.028
0.05   LD2         Common       0.036  0.009  0.010  0.011  0.008  0.011  0.015  0.009  0.033
0.05   LD2         Rare         0.010  0.004  0.003  0.004  0.004  0.005  0.002  0.005  0.010
0.05   LD2         Low          0.015  0.007  0.004  0.004  0.008  0.008  0.006  0.008  0.021
0.05   LD2         Mixture one  0.034  0.013  0.012  0.005  0.009  0.007  0.008  0.009  0.039
0.05   LD2         Mixture two  0.035  0.007  0.015  0.008  0.011  0.019  0.010  0.016  0.039
0.01   LE          Common       0.005  0.002  0.000  0.001  0.001  0.000  0.000  0.002  0.003
0.01   LE          Rare         0.002  0.000  0.000  0.000  0.000  0.000  0.001  0.000  0.000
0.01   LE          Low          0.003  0.000  0.001  0.000  0.000  0.002  0.000  0.001  0.003
0.01   LE          Mixture one  0.002  0.003  0.001  0.001  0.000  0.000  0.000  0.001  0.001
0.01   LE          Mixture two  0.006  0.001  0.001  0.001  0.000  0.002  0.002  0.002  0.004
0.01   LD1         Common       0.003  0.002  0.001  0.002  0.000  0.001  0.001  0.005  0.005
0.01   LD1         Rare         0.001  0.001  0.000  0.000  0.001  0.000  0.001  0.001  0.002
0.01   LD1         Low          0.003  0.002  0.000  0.001  0.002  0.001  0.000  0.000  0.002
0.01   LD1         Mixture one  0.005  0.000  0.002  0.000  0.002  0.000  0.001  0.001  0.001
0.01   LD1         Mixture two  0.007  0.001  0.000  0.002  0.001  0.001  0.000  0.004  0.002
0.01   LD2         Common       0.005  0.001  0.001  0.002  0.002  0.001  0.001  0.000  0.006
0.01   LD2         Rare         0.001  0.000  0.000  0.001  0.001  0.000  0.000  0.000  0.002
0.01   LD2         Low          0.004  0.002  0.000  0.000  0.002  0.000  0.000  0.003  0.003
0.01   LD2         Mixture one  0.006  0.003  0.001  0.000  0.002  0.000  0.001  0.002  0.008
0.01   LD2         Mixture two  0.006  0.001  0.002  0.002  0.003  0.003  0.003  0.001  0.007
0.001  LE          Common       0.001  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000
0.001  LE          Rare         0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000
0.001  LE          Low          0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000
0.001  LE          Mixture one  0.000  0.000  0.001  0.000  0.000  0.000  0.000  0.000  0.000
0.001  LE          Mixture two  0.000  0.000  0.000  0.000  0.000  0.001  0.000  0.000  0.000
0.001  LD1         Common       0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000
0.001  LD1         Rare         0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000
0.001  LD1         Low          0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.001
0.001  LD1         Mixture one  0.001  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000
0.001  LD1         Mixture two  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000
0.001  LD2         Common       0.000  0.000  0.000  0.000  0.001  0.001  0.000  0.000  0.000
0.001  LD2         Rare         0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000
0.001  LD2         Low          0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.001
0.001  LD2         Mixture one  0.001  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.002
0.001  LD2         Mixture two  0.000  0.000  0.000  0.000  0.001  0.000  0.000  0.000  0.000

Note: Common denotes gene regions containing only common variants, Rare denotes gene regions containing only rare variants, Low denotes gene regions containing only low-frequency variants, Mixture one denotes gene regions with 20% common variants and 80% rare variants, and Mixture two denotes gene regions with 80% common variants and 20% rare variants. LE denotes the linkage equilibrium simulation, LD1 denotes the linkage disequilibrium simulation with r² between 0.01 and 0.25, and LD2 denotes the linkage disequilibrium simulation with r² between 0.25 and 0.64.
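The LD1 and LD2 scenarios are defined by ranges of the r² measure of linkage disequilibrium. As a rough illustration only: for SNPs coded as 0/1/2 allele counts with unknown haplotype phase, r² is often approximated by the squared Pearson correlation of the two genotype vectors. Whether the authors' simulation controls r² this way or through haplotype frequencies is not stated in this excerpt, and the genotypes below are hypothetical.

```python
import numpy as np

def genotype_r2(g1, g2):
    """Squared Pearson correlation between two SNPs coded as 0/1/2 allele counts.

    A common proxy for the LD measure r^2 when haplotype phase is unknown.
    """
    g1 = np.asarray(g1, dtype=float)
    g2 = np.asarray(g2, dtype=float)
    r = np.corrcoef(g1, g2)[0, 1]
    return float(r ** 2)

# Hypothetical genotypes for two SNPs in eight individuals.
snp_a = [0, 1, 2, 2, 1, 0, 2, 1]
snp_b = [0, 1, 2, 1, 1, 0, 2, 2]
print(round(genotype_r2(snp_a, snp_b), 3))
```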

TABLE 9

Comparison of the estimated means and standard errors (in parentheses) of three indicators for the linkage equilibrium and linkage disequilibrium simulations based on the LFDA method when the sample size is 1,500, c is 5, and the proportion of causal variants is 2%.

Sim. = simulation; Neg. (%) = proportion of negative effects (%). Within each indicator, the columns are the Common, Rare, Low-frequency, Mixture one, and Mixture two regions; standard errors are given in parentheses on the line below each row of means.
Case    Sim. Neg. (%) | ISE0 | ISE1 | PMSE
Case 1  LE   0        | 0.094 0.940 0.652 0.301 0.095 | 4.263 53.520 23.161 44.892 14.530 | 20.067 8.967 21.179 11.246 17.846
                      | (0.132) (1.256) (0.897) (0.437) (0.123) | (0.878) (5.466) (2.597) (6.130) (4.935) | (1.694) (0.758) (1.117) (1.399) (1.858)
Case 1  LE   20       | 0.095 0.882 0.594 0.292 0.102 | 4.344 55.046 23.742 45.444 14.881 | 20.240 9.086 21.481 11.305 17.961
                      | (0.132) (1.211) (0.792) (0.430) (0.147) | (0.907) (5.469) (2.519) (5.994) (5.050) | (1.890) (0.771) (1.152) (1.392) (1.882)
Case 1  LE   50       | 0.093 0.893 0.658 0.305 0.102 | 4.382 55.385 24.188 45.453 14.475 | 20.398 9.159 21.587 11.319 18.199
                      | (0.128) (1.177) (0.937) (0.466) (0.134) | (0.872) (5.589) (2.774) (5.847) (4.688) | (1.738) (0.814) (1.180) (1.339) (1.893)
Case 1  LD1  0        | 0.072 0.867 0.502 0.345 0.104 | 3.570 50.547 22.150 31.986 6.287 | 20.154 9.112 21.366 16.162 21.750
                      | (0.100) (1.089) (0.709) (0.493) (0.144) | (0.607) (3.914) (2.014) (4.871) (1.444) | (1.617) (0.647) (1.021) (1.702) (1.639)
Case 1  LD1  20       | 0.068 0.867 0.496 0.342 0.100 | 3.625 51.869 22.681 32.242 6.434 | 20.348 9.179 21.705 16.390 22.008
                      | (0.092) (1.211) (0.686) (0.470) (0.138) | (0.583) (4.084) (2.083) (4.817) (1.489) | (1.562) (0.666) (1.063) (1.758) (1.633)
Case 1  LD1  50       | 0.071 0.817 0.501 0.348 0.102 | 3.679 52.621 23.049 32.364 6.558 | 20.545 9.232 21.754 16.488 22.139
                      | (0.096) (1.118) (0.703) (0.473) (0.139) | (0.584) (3.878) (2.112) (4.895) (1.580) | (1.549) (0.640) (1.148) (1.804) (1.594)
Case 1  LD2  0        | 0.061 0.871 0.444 0.270 0.080 | 3.072 49.213 21.558 22.395 4.428 | 19.331 8.883 20.777 20.075 21.867
                      | (0.081) (1.120) (0.580) (0.371) (0.106) | (0.354) (3.871) (1.747) (3.489) (0.603) | (1.483) (0.688) (1.386) (1.890) (1.755)
Case 1  LD2  20       | 0.062 0.853 0.474 0.304 0.078 | 3.111 50.440 22.191 22.617 4.489 | 19.424 8.973 21.022 20.434 21.967
                      | (0.084) (1.143) (0.638) (0.440) (0.107) | (0.337) (4.512) (1.834) (3.493) (0.601) | (1.430) (0.662) (1.339) (1.987) (1.785)
Case 1  LD2  50       | 0.059 0.864 0.432 0.310 0.078 | 3.173 51.085 22.448 22.797 4.601 | 19.634 9.008 21.251 20.501 22.222
                      | (0.077) (1.199) (0.579) (0.420) (0.110) | (0.356) (3.620) (1.779) (3.516) (0.655) | (1.454) (0.671) (1.319) (2.002) (1.772)
Case 2  LE   0        | 0.035 0.357 0.244 0.128 0.046 | 2.429 30.721 13.170 25.563 8.196 | 11.782 5.561 12.572 6.809 10.601
                      | (0.048) (0.480) (0.345) (0.202) (0.060) | (0.479) (2.844) (1.155) (3.300) (2.727) | (1.009) (0.454) (0.644) (0.767) (1.104)
Case 2  LE   20       | 0.039 0.358 0.241 0.121 0.043 | 2.479 31.295 13.446 25.666 8.430 | 11.987 5.578 12.666 6.861 10.603
                      | (0.050) (0.480) (0.322) (0.171) (0.055) | (0.488) (2.569) (1.227) (3.197) (2.843) | (1.005) (0.431) (0.651) (0.788) (1.046)
Case 2  LE   50       | 0.036 0.355 0.245 0.119 0.041 | 2.507 31.339 13.552 25.849 8.106 | 12.044 5.630 12.715 6.867 10.740
                      | (0.048) (0.496) (0.361) (0.182) (0.058) | (0.475) (2.575) (1.123) (3.401) (2.628) | (0.991) (0.459) (0.647) (0.863) (1.066)
Case 2  LD1  0        | 0.028 0.366 0.199 0.118 0.037 | 2.026 28.752 12.660 17.391 3.731 | 11.978 5.712 12.698 9.943 12.798
                      | (0.037) (0.513) (0.268) (0.163) (0.053) | (0.299) (1.915) (0.947) (2.573) (0.887) | (0.900) (0.363) (0.570) (0.958) (0.878)
Case 2  LD1  20       | 0.028 0.351 0.203 0.135 0.036 | 2.068 29.114 12.845 17.498 3.759 | 12.078 5.761 12.801 10.026 12.851
                      | (0.038) (0.467) (0.284) (0.181) (0.045) | (0.320) (1.858) (0.811) (2.616) (0.916) | (0.902) (0.383) (0.618) (1.017) (0.925)
Case 2  LD1  50       | 0.027 0.376 0.204 0.116 0.037 | 2.074 29.618 13.026 17.808 3.781 | 12.143 5.741 12.865 9.947 12.952
                      | (0.037) (0.510) (0.284) (0.153) (0.048) | (0.297) (2.271) (0.974) (2.595) (0.900) | (0.880) (0.368) (0.596) (1.029) (0.889)
Case 2  LD2  0        | 0.024 0.344 0.174 0.117 0.030 | 1.771 28.726 12.259 12.307 2.711 | 11.549 5.487 12.611 12.333 13.343
                      | (0.032) (0.450) (0.231) (0.155) (0.041) | (0.185) (1.572) (0.795) (1.803) (0.349) | (0.852) (0.375) (0.768) (1.101) (0.999)
Case 2  LD2  20       | 0.023 0.323 0.160 0.123 0.034 | 1.810 29.327 12.513 12.495 2.737 | 11.626 5.456 12.575 12.266 13.416
                      | (0.031) (0.416) (0.210) (0.191) (0.046) | (0.186) (1.668) (0.823) (1.804) (0.369) | (0.851) (0.351) (0.733) (1.083) (1.001)
Case 2  LD2  50       | 0.023 0.381 0.176 0.119 0.031 | 1.840 29.771 12.644 12.547 2.780 | 11.702 5.444 12.575 12.276 13.494
                      | (0.031) (0.537) (0.255) (0.161) (0.042) | (0.184) (1.907) (0.723) (1.756) (0.359) | (0.848) (0.395) (0.775) (1.079) (0.950)

Note: Common denotes gene regions containing only common variants, Rare denotes gene regions containing only rare variants, Low denotes gene regions containing only low-frequency variants, Mixture one denotes gene regions with 20% common variants and 80% rare variants, and Mixture two denotes gene regions with 80% common variants and 20% rare variants. LE denotes the linkage equilibrium simulation, LD1 denotes the linkage disequilibrium simulation with r² between 0.01 and 0.25, and LD2 denotes the linkage disequilibrium simulation with r² between 0.25 and 0.64.

In both cases, as the r² measure of linkage disequilibrium increases, the means and standard errors of ISE0 for the common, low-frequency, and mixture two gene regions gradually decrease, and the standard errors of ISE1 for all five gene regions decrease. The mean PMSE for the common, rare, and low-frequency gene regions first increases and then decreases, but the changes are small. These patterns may arise because the fitting error of LFDAT for the time-varying effect function gradually decreases as the r² measure of linkage disequilibrium increases. Although LFDAT shows a slight bias in fitting the time-varying effect function, this does not affect its power to detect a gene region. In general, LFDAT performs well in both the linkage equilibrium and linkage disequilibrium simulations, with low type I error rates and high power for gene regions.
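The ISE0, ISE1, and PMSE indicators are defined in the methods section, outside this excerpt. Assuming the ISE indicators take the usual function-on-function form, namely the integral of the squared difference between the estimated and true coefficient surfaces β(s, t) over genomic position s and time t, they can be approximated on a grid with the trapezoidal rule. This is an illustrative sketch under that assumption; the grids and the constant bias are hypothetical.

```python
import numpy as np

def trapezoid(y, x):
    """Trapezoidal rule along the last axis of y, on the grid x."""
    y = np.asarray(y, dtype=float)
    dx = np.diff(np.asarray(x, dtype=float))
    return np.sum(0.5 * (y[..., 1:] + y[..., :-1]) * dx, axis=-1)

def integrated_squared_error(s, t, beta_hat, beta_true):
    """Approximate the double integral of (beta_hat - beta_true)^2 over s and t.

    beta_hat and beta_true are evaluated on the grid, with shape (len(s), len(t)).
    """
    sq = (np.asarray(beta_hat, float) - np.asarray(beta_true, float)) ** 2
    over_t = trapezoid(sq, t)            # integrate over time t -> shape (len(s),)
    return float(trapezoid(over_t, s))   # then integrate over genomic position s

s = np.linspace(0.0, 1.0, 51)   # hypothetical rescaled genomic positions
t = np.linspace(0.0, 1.0, 51)   # hypothetical rescaled time points
S, T = np.meshgrid(s, t, indexing="ij")
beta_true = np.zeros_like(S)
beta_hat = 0.2 * np.ones_like(S)   # hypothetical constant estimation bias
print(integrated_squared_error(s, t, beta_hat, beta_true))
```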

4 Application to PSA Data Set

We apply LFDAT to a longitudinal data set of Oryza sativa projected shoot area (PSA) (Campbell et al., 2018) to demonstrate its applicability. The data set comprises 378 lines of RDP1 (Zhao et al., 2011). All experiments were carried out at the Plant Accelerator in the Australian Plant Phenomics Facility at the University of Adelaide, SA, Australia, and were repeated three times from February to April 2016. For details of the experimental design, see Campbell et al. (2018). Briefly, three uniformly germinated seedlings were first transplanted into pots, and seven days after transplanting (DAT) the plants were thinned to one seedling per pot. The plants were imaged daily from 13 to 33 DAT with a red-green-blue (RGB) camera, from two side views separated by 90° and from a single top view. Each experiment adopted a partially replicated design in which 54 randomly selected lines were replicated twice. The three experiments produced 73,537 images, and "plant pixels" were extracted from the RGB images using the LemnaGrid software. The sum of the plant pixels extracted from the three RGB images (two side views and one top view) is used as an indicator of shoot biomass, referred to as PSA. PSA has been shown to be an accurate proxy for shoot biomass (Golzarian et al., 2011; Campbell et al., 2015; Knecht et al., 2016) and can describe the morphology and dynamic growth of plants. The first set of data from the first experiment is used as the phenotypic data, and samples with missing values are removed, leaving 350 samples for the subsequent analysis. The development trajectories of shoot biomass are shown in Figure 4, with the trajectories of all individuals shown in the background. The genotype data contain a total of 36,901 markers on 12 chromosomes. Missing genotypes are estimated, and SNPs with a minor allele frequency of less than 0.005 are removed, leaving 36,058 SNPs.
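The genotype quality control described above (estimate missing genotypes, then remove SNPs with a minor allele frequency below 0.005) might look as follows. The excerpt does not say how missing genotypes were estimated, so per-SNP mean imputation is used here purely as a stand-in; genotypes are assumed to be coded 0/1/2 with NaN for missing values, and the function name and toy matrix are hypothetical.

```python
import numpy as np

def impute_and_filter_maf(genotypes, maf_threshold=0.005):
    """Mean-impute missing genotypes, then drop SNPs with MAF below the threshold.

    genotypes: (n_samples, n_snps) array of 0/1/2 allele counts, NaN = missing.
    Returns the filtered genotype matrix and a boolean mask of retained SNPs.
    """
    g = np.array(genotypes, dtype=float)       # copy; the input is left unchanged
    col_means = np.nanmean(g, axis=0)          # per-SNP mean dosage, ignoring NaN
    rows, cols = np.where(np.isnan(g))
    g[rows, cols] = col_means[cols]            # mean imputation (an assumption)
    allele_freq = g.mean(axis=0) / 2.0
    maf = np.minimum(allele_freq, 1.0 - allele_freq)
    keep = maf >= maf_threshold                # SNPs with MAF < threshold are removed
    return g[:, keep], keep

geno = [[0, 2, 1, 0],
        [0, 2, np.nan, 0],
        [0, 0, 1, 0],
        [2, 2, 1, 0]]   # hypothetical 4 samples x 4 SNPs; the last SNP is monomorphic
filtered, kept = impute_and_filter_maf(geno)
print(kept, filtered.shape)
```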
To be consistent with Campbell et al. (2015), we treat each chromosome as a gene region in the association analysis. The number of SNPs and the p-values of the association analysis for each gene region are shown in Table 10. The correlation coefficients between the trait values measured at different time points are close to one. Significant SNP sites are identified on every chromosome, and those on chromosome 3 are more significant. The type I error simulation (Table 1) shows that LFDAT has low type I error rates, which indicates that the method is unlikely to identify false-positive gene regions. Therefore, the significant SNPs detected on each chromosome are largely credible, which is consistent with the results of Campbell et al. (2015). However, no significant SNP sites are detected at the first two time points. This may be because the PSA growth trajectory increases exponentially: PSA values become very large at later time points, whereas at the first two time points PSA varies within too narrow a range for differences among the rice lines to be identified.
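The near-unit correlations between time points can be checked directly from the phenotype matrix. The sketch assumes a hypothetical array with one row per rice line and one column per time point; the synthetic exponential growth curves only mimic the general shape of the trajectories in Figure 4 and are not the PSA data.

```python
import numpy as np

def time_point_correlations(Y):
    """Pearson correlation matrix between time points.

    Y has shape (n_lines, n_time_points): one row per line, one column per time point.
    """
    return np.corrcoef(np.asarray(Y, dtype=float), rowvar=False)

# Hypothetical exponential growth curves for 350 lines at 10 time points
# (synthetic stand-in data, not the PSA measurements).
rng = np.random.default_rng(1)
times = np.arange(1, 11)
growth_rate = rng.normal(0.30, 0.03, size=(350, 1))
noise = rng.normal(0.0, 0.01, size=(350, 10))
Y = 100.0 * np.exp(growth_rate * times) * (1.0 + noise)
corr = time_point_correlations(Y)
print(np.round(corr, 2))
```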
FIGURE 4

The shoot biomass development trajectories of the 350 samples. The solid gray lines represent the trajectories of individual samples, and the solid black line represents the average trajectory.

TABLE 10

The number of SNPs and the p-values of the association analysis for each gene region based on LFDAT at a significance level of 0.05

Chr  No. of SNPs in test  p-value
                          t = 1       t = 2       t = 3       t = 4       t = 5       t = 6       t = 7       t = 8       t = 9       t = 10
1    6,332                1           1           2.0 × 10⁻²  3.2 × 10⁻⁶  6.6 × 10⁻⁸  3.0 × 10⁻⁸  8.3 × 10⁻⁹  3.2 × 10⁻¹⁰ 2.3 × 10⁻¹¹ 1.7 × 10⁻¹²
2    3,808                1           1           6.7 × 10⁻¹  1.5 × 10⁻⁵  7.2 × 10⁻⁷  1.1 × 10⁻⁶  2.4 × 10⁻⁷  6.8 × 10⁻⁸  3.3 × 10⁻⁹  1.2 × 10⁻⁹
3    4,298                1           1           1.9 × 10⁻²  1.2 × 10⁻⁸  3.6 × 10⁻¹⁰ 2.2 × 10⁻¹⁰ 5.9 × 10⁻¹¹ 3.0 × 10⁻¹² 7.0 × 10⁻¹⁴ 5.7 × 10⁻¹⁴
4    2,802                1           1           9.8 × 10⁻³  1.4 × 10⁻⁶  7.0 × 10⁻⁸  5.2 × 10⁻⁸  6.6 × 10⁻⁹  3.9 × 10⁻⁹  2.4 × 10⁻¹⁰ 1.2 × 10⁻¹⁰
5    2,800                1           1           1.2 × 10⁻²  5.1 × 10⁻⁷  1.1 × 10⁻⁸  9.3 × 10⁻⁹  7.5 × 10⁻¹⁰ 7.0 × 10⁻¹¹ 4.6 × 10⁻¹² 4.8 × 10⁻¹²
6    3,177                1           1           9.1 × 10⁻⁴  2.7 × 10⁻⁸  2.5 × 10⁻¹⁰ 1.2 × 10⁻¹⁰ 2.2 × 10⁻¹¹ 3.1 × 10⁻¹² 4.8 × 10⁻¹⁴ 1.8 × 10⁻¹³
7    2,024                1           1           1.2 × 10⁻⁵  1.0 × 10⁻⁹  4.5 × 10⁻¹¹ 7.0 × 10⁻¹¹ 3.4 × 10⁻¹² 1.8 × 10⁻¹² 3.1 × 10⁻¹³ 2.8 × 10⁻¹³
8    2,233                1           1           5.8 × 10⁻³  2.6 × 10⁻⁶  1.9 × 10⁻⁷  6.7 × 10⁻⁸  4.9 × 10⁻⁹  1.1 × 10⁻⁹  1.0 × 10⁻¹⁰ 2.7 × 10⁻¹¹
9    1,939                1           1           1.7 × 10⁻²  4.6 × 10⁻⁶  4.2 × 10⁻⁷  1.6 × 10⁻⁷  4.9 × 10⁻⁹  7.7 × 10⁻⁹  7.8 × 10⁻¹⁰ 3.3 × 10⁻¹⁰
10   1,672                1           1           1.5 × 10⁻³  4.5 × 10⁻⁸  2.2 × 10⁻⁹  3.3 × 10⁻⁹  3.6 × 10⁻¹⁰ 7.2 × 10⁻¹¹ 2.4 × 10⁻¹² 1.4 × 10⁻¹²
11   2,857                1           1           1.4 × 10⁻¹  1.2 × 10⁻⁵  7.4 × 10⁻⁷  5.4 × 10⁻⁷  1.1 × 10⁻⁷  1.7 × 10⁻⁸  1.3 × 10⁻⁹  2.2 × 10⁻¹⁰
12   2,121                1           1           2.5 × 10⁻²  1.0 × 10⁻⁷  5.6 × 10⁻⁹  1.7 × 10⁻⁹  2.5 × 10⁻¹⁰ 7.2 × 10⁻¹¹ 5.2 × 10⁻¹² 3.6 × 10⁻¹²
The whole analysis took 15 s on an Intel Core 3.40 GHz CPU, which indicates that gene region association analysis based on LFDAT is computationally feasible.

5 Discussion

Association analysis of quantitative traits at multiple time points makes it possible to better observe how time-varying gene effects influence quantitative traits, and basing longitudinal trait research on gene regions can improve power. We propose the LFDAT method, which applies a function-on-function regression model to detect the association between gene regions and longitudinal traits. It treats both the phenotypic traits and the marker information as continuous functions and makes full use of the information carried by the traits to explore the influence of genes on longitudinal traits. Compared with other dynamic association analysis methods, LFDAT considers both the genetic effects of the variants in the entire gene region and the time effects of genes, and it can accurately detect the selective expression of genes. Gene region association analysis based on LFDAT places few restrictions on the direction of gene effects and offers low computational cost, fast detection, few false positives, and high power. It also has stronger explanatory power regarding how gene effects on quantitative traits change over time. We considered both linkage equilibrium and linkage disequilibrium simulations and compared the powers for the five gene regions to demonstrate the feasibility of LFDAT for longitudinal trait association analysis. Two cases were also set for the time-varying function of the genetic effects to explore whether LFDAT can detect the selective expression of genes at different time points. The simulation results show that LFDAT has low type I error rates and high power in the association analysis of the five gene regions, and that it can accurately detect the selective expression of genes in both simulations. In addition, different settings for the variance and correlation coefficient of the random error were simulated.
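The error settings discussed next vary the variance (1 or 25) and the correlation coefficient (0.5 or 0.95) of the random error across time points. This excerpt does not give the correlation structure, so the sketch below assumes an exchangeable (compound-symmetry) structure, in which every pair of time points has correlation ρ. The function name is hypothetical, and the sample size of 1,500 and the nine time points are borrowed from the simulation tables.

```python
import numpy as np

def simulate_longitudinal_errors(n_samples, n_times, sigma2, rho, seed=None):
    """Draw errors with variance sigma2 at every time point and correlation rho
    between any two time points (compound symmetry, an assumption)."""
    cov = sigma2 * ((1.0 - rho) * np.eye(n_times) + rho * np.ones((n_times, n_times)))
    rng = np.random.default_rng(seed)
    return rng.multivariate_normal(np.zeros(n_times), cov, size=n_samples)

# Variance and correlation values mentioned in the text.
for sigma2 in (1.0, 25.0):
    for rho in (0.5, 0.95):
        e = simulate_longitudinal_errors(1500, 9, sigma2, rho, seed=0)
        print(sigma2, rho, e.shape)
```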
When the variance is 25 rather than 1, the powers under both linkage equilibrium and linkage disequilibrium are significantly reduced; all three indicators increase under linkage disequilibrium, while under linkage equilibrium ISE0 and PMSE increase but ISE1 decreases. When the correlation coefficient is 0.95 rather than 0.5, the power under linkage disequilibrium increases and all three indicators decrease, whereas the changes in power and in the three indicators under linkage equilibrium are not obvious. For the continuous effect function, we plotted the estimated time-varying function of the genetic effect in the linkage equilibrium and linkage disequilibrium simulations at the first time point, with the time effect function and the genetic effect function fixed to constants and the causal variants set to the 55th, 66th, 77th, 88th, 99th, 110th, 121st, 132nd, 143rd, and 154th SNPs. In most plots, the fitted time-varying function of the genetic effect is smoother in the linkage disequilibrium simulation than in the linkage equilibrium simulation (Supplementary Data S7). This is because the SNPs are associated with one another under linkage disequilibrium, which constrains the fit of the time-varying function. Furthermore, Haseman and Elston (1972) proposed a linear model for detecting linkage between a marker and a QTL in a full-sib design, in which the expected regression coefficient is expressed in terms of the additive genetic variance of the QTL. Chen (2014, 2016) proposed that variance component methods, such as Haseman–Elston (HE) regression and the linear mixed model (LMM), provide valid estimates of heritability from GWAS data for complex traits. The estimated heritability may reveal the genetic architecture underlying a complex trait.
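For reference, the Haseman–Elston relation mentioned above is commonly written as below, assuming no dominance. The notation is ours rather than the paper's: for sib pair $j$, $Y_{j1}$ and $Y_{j2}$ are the two trait values, $\hat{\pi}_j$ is the estimated proportion of alleles shared identical by descent at the marker, $\theta$ is the recombination fraction between the marker and the QTL, and $\sigma_a^2$ is the additive genetic variance of the QTL.

$$
\mathbb{E}\left[(Y_{j1}-Y_{j2})^{2}\mid\hat{\pi}_{j}\right]=\alpha+\beta\,\hat{\pi}_{j},\qquad \beta=-2\,(1-2\theta)^{2}\,\sigma_{a}^{2}
$$

A significantly negative slope $\beta$ therefore indicates linkage, and when the marker coincides with the QTL ($\theta = 0$) the slope reduces to $-2\sigma_a^2$.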
To our knowledge, the relationship between heritability and the continuous effect function within a gene region has not yet been studied in the framework of functional data analysis; we will pursue this direction in future research. LFDA, which converts gene loci into continuous variables, also has some shortcomings. First, covariates, population structure, and locus weights are not considered. Within gene regions, the weak effects of rare variants are difficult to detect, which makes it challenging to identify gene regions of rare variants. The common solution is to assign different weights to different types of variants. The LSKAT and LBT methods proposed by Wang et al. (2017) account for covariates and population structure and give common and rare variants different weights. This paper mainly sought to study the mechanisms of plant growth and development, but future research could add factors such as population structure and introduce variant weights to improve the detection ability of LFDAT. Second, the fitting errors measured by ISE0, ISE1, and PMSE are relatively large. This might be due to a limitation of LFDAT noted by Lin et al. (2017): it cannot shrink the time-varying function of genetic effects close to zero. This is one direction of our future research. Third, in the simulation of selective gene expression, the powers under linkage equilibrium for the five gene regions are unstable and are lower at some of the time points at which the gene is switched on. By simulating different time-varying functions of the genetic effects, we found that this instability is related to the form of the time-varying function. Therefore, accurately characterizing how genetic effects change over time is a direction worth studying. Fourth, in the application to the Oryza sativa PSA data set, no significant SNP loci are detected at the first two time points.
This indicates that gene region association analysis based on LFDAT needs further improvement to make detection more accurate, so that it can be better applied to the gene region association analysis of different longitudinal traits. In the real-data analysis, each chromosome was analyzed as an independent gene region. However, the variants that control longitudinal traits might be distributed across different gene regions; if the causal variants in different gene regions are correlated, association analysis must be performed on multiple gene regions. Moreover, when each chromosome is regarded as a gene region, the region may be too large to accurately detect the genes that control longitudinal traits, and the SNP sequence then needs to be divided into multiple smaller gene regions. Extending longitudinal trait association analysis based on functional data analysis to multiple gene regions will be a future research direction.
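The first shortcoming listed above is that LFDAT assigns no weights to variants. One widely used scheme, the default in SKAT-type kernel tests, weights each variant by the Beta(1, 25) density evaluated at its minor allele frequency, which up-weights rare variants. The sketch below shows only this weighting step; how such weights would be incorporated into LFDAT is left open by the paper, and the MAF values are hypothetical.

```python
import math
import numpy as np

def beta_maf_weights(maf, a=1.0, b=25.0):
    """Beta(a, b) density evaluated at each minor allele frequency (0 < maf < 1).

    With a = 1 and b = 25 (the SKAT default), rarer variants get larger weights.
    """
    maf = np.asarray(maf, dtype=float)
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    log_pdf = log_norm + (a - 1.0) * np.log(maf) + (b - 1.0) * np.log1p(-maf)
    return np.exp(log_pdf)

mafs = np.array([0.005, 0.01, 0.05, 0.2, 0.5])   # hypothetical MAFs
print(np.round(beta_maf_weights(mafs), 4))
```

Working on the log scale keeps the density numerically stable when b is large, and the MAF filter applied earlier (MAF ≥ 0.005) guarantees the input stays strictly inside (0, 1).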
TABLE 8

Comparison of the power of the linkage equilibrium and linkage disequilibrium simulations based on LFDAT when the significance level is 0.05, the sample size is 1,500, c is 5, and the proportion of causal variants is 2%.

Sim. = simulation; Neg. (%) = proportion of negative effects (%).
Gene region  Sim.  Neg. (%)  | Case 1                                                | Case 2
                             | t=1   t=2   t=3   t=4   t=5   t=6   t=7   t=8   t=9   | t=1   t=2   t=3   t=4   t=5   t=6   t=7   t=8   t=9
Common       LE    0         | 0.991 0.990 0.988 0.990 0.987 0.988 0.990 0.991 0.989 | 0.988 0.976 0.000 0.979 0.951 0.982 0.000 0.981 0.993
Common       LE    20        | 0.917 0.922 0.929 0.927 0.924 0.932 0.932 0.937 0.927 | 0.917 0.886 0.000 0.894 0.733 0.897 0.000 0.886 0.919
Common       LE    50        | 0.735 0.734 0.744 0.752 0.752 0.757 0.760 0.755 0.759 | 0.708 0.676 0.000 0.684 0.482 0.675 0.000 0.684 0.700
Common       LD1   0         | 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 | 1.000 1.000 0.000 1.000 1.000 1.000 0.000 1.000 1.000
Common       LD1   20        | 0.996 0.996 0.996 0.997 0.996 0.997 0.997 0.998 0.998 | 0.994 0.992 0.000 0.991 0.986 0.992 0.000 0.988 0.992
Common       LD1   50        | 0.938 0.949 0.952 0.954 0.957 0.950 0.949 0.951 0.947 | 0.942 0.916 0.000 0.913 0.851 0.923 0.000 0.915 0.942
Common       LD2   0         | 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 | 1.000 1.000 0.000 1.000 1.000 1.000 0.000 1.000 1.000
Common       LD2   20        | 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 | 1.000 1.000 0.000 1.000 1.000 1.000 0.000 1.000 1.000
Common       LD2   50        | 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 | 1.000 0.999 0.000 0.999 0.998 1.000 0.000 1.000 1.000
Rare         LE    0         | 0.992 0.996 0.998 0.998 0.999 0.998 0.998 1.000 1.000 | 0.989 0.976 0.000 0.980 0.983 0.981 0.000 0.981 0.988
Rare         LE    20        | 0.913 0.944 0.944 0.949 0.954 0.958 0.952 0.954 0.951 | 0.895 0.881 0.000 0.874 0.861 0.880 0.000 0.873 0.896
Rare         LE    50        | 0.786 0.837 0.854 0.861 0.864 0.859 0.862 0.859 0.848 | 0.721 0.709 0.000 0.713 0.657 0.716 0.000 0.713 0.722
Rare         LD1   0         | 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 | 1.000 1.000 0.000 1.000 1.000 1.000 0.000 1.000 1.000
Rare         LD1   20        | 0.993 0.996 0.997 0.995 0.996 0.998 0.996 0.996 0.996 | 0.996 0.987 0.000 0.989 0.994 0.988 0.000 0.987 0.997
Rare         LD1   50        | 0.946 0.952 0.954 0.962 0.959 0.961 0.955 0.957 0.945 | 0.914 0.903 0.000 0.905 0.876 0.898 0.000 0.908 0.911
Rare         LD2   0         | 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 | 1.000 1.000 0.000 1.000 1.000 1.000 0.000 1.000 1.000
Rare         LD2   20        | 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 | 1.000 1.000 0.000 1.000 1.000 1.000 0.000 1.000 1.000
Rare         LD2   50        | 1.000 1.000 1.000 1.000 1.000 1.000 1.000 0.999 1.000 | 0.999 0.995 0.000 0.998 0.996 0.997 0.000 0.994 0.999
Low          LE    0         | 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 | 1.000 0.996 0.000 0.997 0.983 0.997 0.000 0.996 0.998
Low          LE    20        | 0.948 0.948 0.955 0.954 0.962 0.962 0.955 0.955 0.953 | 0.895 0.887 0.000 0.903 0.798 0.907 0.000 0.893 0.895
Low          LE    50        | 0.827 0.837 0.837 0.846 0.840 0.855 0.849 0.850 0.839 | 0.752 0.761 0.000 0.781 0.587 0.785 0.000 0.776 0.749
Low          LD1   0         | 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 | 1.000 1.000 0.000 1.000 0.999 1.000 0.000 1.000 1.000
Low          LD1   20        | 0.996 0.996 0.996 0.998 0.997 0.998 0.998 0.996 0.997 | 0.998 0.997 0.000 0.998 0.989 0.999 0.000 0.996 0.999
Low          LD1   50        | 0.955 0.956 0.964 0.965 0.965 0.963 0.962 0.958 0.958 | 0.942 0.938 0.000 0.938 0.860 0.941 0.000 0.934 0.940
Low          LD2   0         | 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 | 1.000 1.000 0.000 1.000 1.000 1.000 0.000 1.000 1.000
Low          LD2   20        | 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 | 1.000 1.000 0.000 1.000 1.000 1.000 0.000 1.000 1.000
Low          LD2   50        | 0.998 0.998 0.999 0.999 0.998 0.999 0.998 0.999 0.998 | 0.999 0.998 0.000 0.999 0.996 0.999 0.000 0.998 0.999
Mixture one  LE    0         | 0.961 0.969 0.972 0.968 0.974 0.975 0.972 0.975 0.971 | 0.958 0.928 0.000 0.938 0.905 0.939 0.000 0.930 0.958
Mixture one  LE    20        | 0.913 0.917 0.930 0.931 0.927 0.925 0.933 0.930 0.919 | 0.915 0.876 0.000 0.896 0.821 0.887 0.000 0.868 0.903
Mixture one  LE    50        | 0.846 0.874 0.878 0.890 0.894 0.897 0.901 0.890 0.881 | 0.840 0.807 0.000 0.814 0.714 0.815 0.000 0.800 0.847
Mixture one  LD1   0         | 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 | 1.000 1.000 0.000 1.000 1.000 1.000 0.000 0.999 1.000
Mixture one  LD1   20        | 0.991 0.995 0.995 0.994 0.992 0.993 0.994 0.992 0.993 | 0.997 0.991 0.000 0.994 0.984 0.995 0.000 0.989 0.996
Mixture one  LD1   50        | 0.980 0.984 0.985 0.990 0.987 0.990 0.988 0.988 0.985 | 0.980 0.966 0.000 0.966 0.925 0.969 0.000 0.961 0.978
Mixture one  LD2   0         | 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 | 1.000 1.000 0.000 1.000 1.000 1.000 0.000 1.000 1.000
Mixture one  LD2   20        | 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 | 1.000 1.000 0.000 1.000 1.000 1.000 0.000 1.000 1.000
Mixture one  LD2   50        | 1.000 0.999 1.000 1.000 1.000 0.999 1.000 1.000 1.000 | 1.000 0.999 0.000 0.999 0.999 0.999 0.000 1.000 0.999
Mixture two  LE    0         | 0.992 0.993 0.990 0.991 0.992 0.995 0.994 0.995 0.995 | 0.991 0.979 0.000 0.985 0.951 0.985 0.000 0.983 0.991
Mixture two  LE    20        | 0.915 0.923 0.925 0.936 0.937 0.940 0.931 0.933 0.929 | 0.909 0.875 0.000 0.893 0.761 0.883 0.000 0.874 0.899
Mixture two  LE    50        | 0.789 0.795 0.806 0.808 0.813 0.815 0.813 0.813 0.808 | 0.768 0.743 0.000 0.734 0.538 0.741 0.000 0.721 0.777
Mixture two  LD1   0         | 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 | 1.000 1.000 0.000 1.000 1.000 1.000 0.000 1.000 1.000
Mixture two  LD1   20        | 0.992 0.992 0.994 0.992 0.993 0.994 0.992 0.994 0.993 | 0.997 0.996 0.000 0.994 0.985 0.995 0.000 0.997 0.997
Mixture two  LD1   50        | 0.932 0.941 0.942 0.950 0.948 0.944 0.944 0.944 0.940 | 0.942 0.928 0.000 0.943 0.854 0.931 0.000 0.932 0.941
Mixture two  LD2   0         | 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 | 1.000 1.000 0.000 1.000 1.000 1.000 0.000 1.000 1.000
Mixture two  LD2   20        | 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 | 1.000 1.000 0.000 1.000 1.000 1.000 0.000 1.000 1.000
Mixture two  LD2   50        | 1.000 0.999 1.000 1.000 1.000 1.000 1.000 1.000 1.000 | 0.997 0.998 0.000 0.998 0.997 0.997 0.000 0.996 0.997

Note: Common denotes gene regions containing only common variants, Rare denotes gene regions containing only rare variants, Low denotes gene regions containing only low-frequency variants, Mixture one denotes gene regions with 20% common variants and 80% rare variants, and Mixture two denotes gene regions with 80% common variants and 20% rare variants. LE denotes the linkage equilibrium simulation, LD1 denotes the linkage disequilibrium simulation with r² between 0.01 and 0.25, and LD2 denotes the linkage disequilibrium simulation with r² between 0.25 and 0.64.

  46 in total

1.  Optimal tests for rare variant effects in sequencing association studies.

Authors:  Seunggeun Lee; Michael C Wu; Xihong Lin
Journal:  Biostatistics       Date:  2012-06-14       Impact factor: 5.899

2.  Powerful SNP-set analysis for case-control genome-wide association studies.

Authors:  Michael C Wu; Peter Kraft; Michael P Epstein; Deanne M Taylor; Stephen J Chanock; David J Hunter; Xihong Lin
Journal:  Am J Hum Genet       Date:  2010-06-11       Impact factor: 11.025

3.  A powerful and flexible multilocus association test for quantitative traits.

Authors:  Lydia Coulter Kwee; Dawei Liu; Xihong Lin; Debashis Ghosh; Michael P Epstein
Journal:  Am J Hum Genet       Date:  2008-02       Impact factor: 11.025

4.  On the reconciliation of missing heritability for genome-wide association studies.

Authors:  Guo-Bo Chen
Journal:  Eur J Hum Genet       Date:  2016-07-20       Impact factor: 4.246

5.  Longitudinal SNP-set association analysis of quantitative phenotypes.

Authors:  Zhong Wang; Ke Xu; Xinyu Zhang; Xiaowei Wu; Zuoheng Wang
Journal:  Genet Epidemiol       Date:  2016-11-09       Impact factor: 2.135

6.  Quantitative trait locus analysis for next-generation sequencing with the functional linear models.

Authors:  Li Luo; Yun Zhu; Momiao Xiong
Journal:  J Med Genet       Date:  2012-08       Impact factor: 6.318

7.  Longitudinal association analysis of quantitative traits.

Authors:  Ruzong Fan; Yiwei Zhang; Paul S Albert; Aiyi Liu; Yuanjia Wang; Momiao Xiong
Journal:  Genet Epidemiol       Date:  2012-09-10       Impact factor: 2.135

Review 8.  Explaining additional genetic variation in complex traits.

Authors:  Matthew R Robinson; Naomi R Wray; Peter M Visscher
Journal:  Trends Genet       Date:  2014-03-11       Impact factor: 11.639

9.  Genome-wide association and longitudinal analyses reveal genetic loci linking pubertal height growth, pubertal timing and childhood adiposity.

Authors:  Diana L Cousminer; Diane J Berry; Nicholas J Timpson; Wei Ang; Elisabeth Thiering; Enda M Byrne; H Rob Taal; Ville Huikari; Jonathan P Bradfield; Marjan Kerkhof; Maria M Groen-Blokhuis; Eskil Kreiner-Møller; Marcella Marinelli; Claus Holst; Jaakko T Leinonen; John R B Perry; Ida Surakka; Olli Pietiläinen; Johannes Kettunen; Verneri Anttila; Marika Kaakinen; Ulla Sovio; Anneli Pouta; Shikta Das; Vasiliki Lagou; Chris Power; Inga Prokopenko; David M Evans; John P Kemp; Beate St Pourcain; Susan Ring; Aarno Palotie; Eero Kajantie; Clive Osmond; Terho Lehtimäki; Jorma S Viikari; Mika Kähönen; Nicole M Warrington; Stephen J Lye; Lyle J Palmer; Carla M T Tiesler; Claudia Flexeder; Grant W Montgomery; Sarah E Medland; Albert Hofman; Hakon Hakonarson; Mònica Guxens; Meike Bartels; Veikko Salomaa; Joanne M Murabito; Jaakko Kaprio; Thorkild I A Sørensen; Ferran Ballester; Hans Bisgaard; Dorret I Boomsma; Gerard H Koppelman; Struan F A Grant; Vincent W V Jaddoe; Nicholas G Martin; Joachim Heinrich; Craig E Pennell; Olli T Raitakari; Johan G Eriksson; George Davey Smith; Elina Hyppönen; Marjo-Riitta Järvelin; Mark I McCarthy; Samuli Ripatti; Elisabeth Widén
Journal:  Hum Mol Genet       Date:  2013-02-27       Impact factor: 6.150

10.  Longitudinal data analysis for rare variants detection with penalized quadratic inference function.

Authors:  Hongyan Cao; Zhi Li; Haitao Yang; Yuehua Cui; Yanbo Zhang
Journal:  Sci Rep       Date:  2017-04-05       Impact factor: 4.379
