
Bivariate association analysis of longitudinal phenotypes in families.

Abstract

Statistical genetic methods incorporating temporal variation allow for greater understanding of genetic architecture and consistency of biological variation influencing development of complex diseases. This study proposes a bivariate association method jointly testing association of two quantitative phenotypic measures from different time points. Measured genotype association was analyzed for single-nucleotide polymorphisms (SNPs) for systolic blood pressure (SBP) from the first and third visits using 200 simulated Genetic Analysis Workshop 18 (GAW18) replicates. Bivariate association, in which the effect of an SNP on the mean trait values of the two phenotypes is constrained to be equal for both measures and is included as a covariate in the analysis, was compared with a bivariate analysis in which the effect of an SNP was estimated separately for the two measures and with univariate association analyses for 9 SNPs that explained greater than 0.001% SBP variance over all 200 GAW18 replicates. The SNP 3_48040283 was significantly associated with SBP in all 200 replicates, with the constrained bivariate method providing increased signal over the unconstrained bivariate method. This method improved signal in all 9 SNPs with simulated effects on SBP for nominal significance (p-value <0.05). However, this appears to be determined by the effect size of the SNP on the phenotype. This bivariate association method applied to longitudinal data improves genetic signal for quantitative traits when the effect size of the variant is moderate to large.


Year: 2014 PMID: 25519346 PMCID: PMC4143799 DOI: 10.1186/1753-6561-8-S1-S90

Source DB: PubMed Journal: BMC Proc ISSN: 1753-6561

Background

Traditional analyses of genetic variants influencing complex diseases focus on phenotypes and covariate measurements from a single time point. However, the majority of human epidemiologic studies collect information from multiple measurements. This, coupled with the knowledge that many quantitative phenotypes correlated with complex disease change with age or environmental confounders, suggests that inclusion of a temporal component may allow for increased understanding of complex diseases. Given the nature of these longitudinal data, methods that jointly use multiple time points when performing association analysis may have increased statistical power over univariate association methods [1-6]. However, although some statistical methods have been proposed for the analysis of longitudinal data, few have been adopted by the wider genetic epidemiologic community because they are difficult to implement. One potential drawback of bivariate methods is the addition of a degree of freedom as a result of the additional phenotype, which may reduce statistical power to detect genetic signals that do not vary with time or age. We present a method for bivariate association using longitudinal data from the same phenotype in families, applied to the Genetic Analysis Workshop 18 (GAW18) simulated single-nucleotide polymorphism (SNP) data for the phenotype systolic blood pressure (SBP) from visits 1 and 3. We have previously applied this method to the analysis of different phenotypic measures of heart rate (echo- and electrocardiograms) in American Indian participants of the Strong Heart Family Study [7] but wish to test its efficacy in a simulated longitudinal data set. To test this method, we first conducted association using measured genotype analysis of all SNPs for SBP from visits 1 and 3 using the GAW18 family data. We then conducted two bivariate analyses within the variance-component framework using 20 SNPs known to influence SBP from the GAW18 SNPs and 20 SNPs that did not explain any of the SBP variance identified in our association analysis. This work was done with knowledge of the GAW18 simulating model.

Methods

Data description

The GAW18 data set contains 959 individuals from 20 extended Mexican American pedigrees from the Type 2 Diabetes Consortium. Each of the 200 simulated data sets includes the following information for each individual for three time periods along with gender: age, SBP, diastolic blood pressure (DBP), hypertension status, blood pressure medication status, and smoking [8].

Univariate association

Maximum likelihood methods, taking into account relationships among family members, were used to determine association for the phenotypes SBP at visit 1 (SBP_1) and visit 3 (SBP_3) independently in a polygenic model available in the computer program Sequential Oligogenic Linkage Analysis Routines (SOLAR) [9]. Covariates included age, sex, and their interactions as well as smoking for both visits 1 and 3. Variables were carried forward to association models if associated with SBP_1 or SBP_3 at a p-value below 0.05. Measured genotype analysis was conducted for all available GAW18 polymorphic variants. In this analysis, the number of minor alleles is added to the quantitative polygenic genetic model as a covariate to assess the effect of the SNP on the mean of the trait using the equation

$$y = \mu + \alpha s_i + \sum_{k} \beta_k x_k + g + e,$$

where $y$ is the trait value; $\mu$ is the trait mean; $s_i$ defines a variate for the $i$th SNP that takes the values 0, 1, and 2 for the marker genotypes AA, Aa, and aa, respectively; $\alpha$ represents one-half the displacement between homozygous marker means; $\beta_k$ represents fixed-effect regression coefficients for any measured covariates $x_k$; and $g$ and $e$ are random effects representing residual genetic effects and random environmental effects [10]. This model tests whether $\alpha$ is different from 0 using a likelihood ratio test. Twice the difference in log-likelihoods is distributed as a $\chi^2$ random variable with 1 degree of freedom.
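The genotype coding and likelihood ratio test described in this section can be illustrated with a short sketch. It is not the SOLAR implementation: the function names and the two log-likelihood values are hypothetical, and the pedigree-based likelihood maximization is assumed to have been carried out elsewhere. The sketch only shows the additive 0/1/2 coding of the SNP variate and the conversion of twice the log-likelihood difference into a p-value, using the closed-form upper tail of a chi-square distribution with 1 degree of freedom.

```python
import math


def genotype_dosage(genotype: str, minor_allele: str = "a") -> int:
    """Count copies of the minor allele: AA -> 0, Aa -> 1, aa -> 2."""
    return sum(1 for allele in genotype if allele == minor_allele)


def lrt_pvalue_1df(loglik_null: float, loglik_alt: float) -> float:
    """P-value for a likelihood ratio test with 1 degree of freedom.

    loglik_null: maximized log-likelihood with the SNP effect fixed at 0.
    loglik_alt:  maximized log-likelihood with the SNP effect estimated.
    """
    stat = max(2.0 * (loglik_alt - loglik_null), 0.0)
    # For a chi-square variable with 1 df, P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(stat / 2.0))


dosages = [genotype_dosage(g) for g in ("AA", "Aa", "aa")]

# Hypothetical log-likelihoods, chosen so that the statistic is about 3.84.
p_value = lrt_pvalue_1df(loglik_null=-1502.00, loglik_alt=-1500.08)
print(dosages, round(p_value, 4))
```

With these illustrative inputs the statistic sits near the conventional 1-degree-of-freedom critical value, so the p-value is close to 0.05.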

Bivariate association

We also applied maximum likelihood methods accounting for familial relationships in bivariate association analyses. This bivariate method investigates two related phenotypes simultaneously, modeling genetic and environmental correlations between them [11]. Our proposed method investigates the effect of an SNP on the mean trait values of two longitudinal phenotypes $i$ and $j$, constraining the displacement in trait means ($\alpha$) with each copy of the minor allele to be equal for both measures using the equations

$$y_i = \mu_i + \alpha s + \sum_{k} \beta_{ik} x_{ik} + g_i + e_i$$
$$y_j = \mu_j + \alpha s + \sum_{k} \beta_{jk} x_{jk} + g_j + e_j$$

where $s$ is the number of minor alleles at the SNP; $\alpha$ and the elements of $\beta_i$ and $\beta_j$ are fixed-effect regression coefficients; and $g$ and $e$ are modeled through random effects, with the bivariate model allowing for correlations between $g_i$ and $g_j$ ($\rho_g$) and between $e_i$ and $e_j$ ($\rho_e$). Twice the difference between the log-likelihoods of a model in which the SNP effect is estimated and one in which it is constrained to zero is then distributed as a $\chi^2$ distribution with 1 degree of freedom. For our bivariate analysis, we used the same covariates from the univariate analysis along with 9 variants that explained greater than 0.001 of SBP variance from the GAW18 answers. We then compared these results with univariate association models and a bivariate model in which the effect of genotype on the mean trait value of the two phenotypes was estimated separately, with a test statistic distributed as a $\chi^2$ distribution with 2 degrees of freedom. Results were compared between approaches over 200 GAW18 replicates to determine which method provided the best evidence for genetic signal for these SNPs, tallying the proportion of replicates in which association was detected at the p-value thresholds reported in Table 1.
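The trade-off between the constrained (1 degree of freedom) and unconstrained (2 degrees of freedom) tests can be made concrete with a small numerical sketch. The two test statistics below are hypothetical and are not taken from the GAW18 analysis. They are chosen to reflect the case the method targets: an SNP effect that is nearly the same at both visits, so that freeing the second effect adds little to the likelihood. Because the unconstrained model nests the constrained one, its statistic against the no-SNP-effect model can never be smaller.

```python
import math


def chi2_sf(stat: float, df: int) -> float:
    """Upper-tail probability of a chi-square variable with 1 or 2 df."""
    if stat < 0:
        raise ValueError("test statistic must be non-negative")
    if df == 1:
        # P(X > x) = erfc(sqrt(x / 2)) for 1 degree of freedom.
        return math.erfc(math.sqrt(stat / 2.0))
    if df == 2:
        # P(X > x) = exp(-x / 2) for 2 degrees of freedom.
        return math.exp(-stat / 2.0)
    raise ValueError("this sketch supports only df = 1 or df = 2")


# Hypothetical statistics: twice the log-likelihood difference of each
# bivariate model versus the model with no SNP effect.
stat_constrained = 12.0    # one shared SNP effect for both visits
stat_unconstrained = 12.3  # separate SNP effect for each visit

p_constrained = chi2_sf(stat_constrained, df=1)
p_unconstrained = chi2_sf(stat_unconstrained, df=2)

print(f"constrained   (1 df): p = {p_constrained:.5f}")
print(f"unconstrained (2 df): p = {p_unconstrained:.5f}")
```

In this illustration the constrained test falls below a 0.001 threshold while the unconstrained test does not, even though the unconstrained statistic is slightly larger. If the true effects differed substantially between visits, the unconstrained statistic could be much larger and the comparison could reverse.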

Results

The average genetic correlation ($\rho_g$) for SBP over 200 GAW18 replicates between visits 1 and 3 was with an average environmental correlation of . This high $\rho_g$ value demonstrates that these two phenotypes are measures of the same genetic mechanism and are therefore appropriate for our proposed bivariate association approach. Table 1 shows results of three different association analyses for 9 SNPs influencing SBP across all 200 GAW18 replicates for p-values below 0.05, 0.001, and . All analyses identified the variant 3_48040283 in MAP4 as genome-wide significant. The MAP4 SNP 3_47957996 was significant in 199 of the constrained bivariate tests and 200 of the unconstrained tests, with the number of genome-wide significant replicates dropping slightly for univariate models. Two additional variants, 1_66075952 from LEPR and MAP4 variant 3_28601297, demonstrated low numbers of genome-wide significant associations across the four tested association methods.

Table 1

Comparisons of association analyses results for 9 functional variants explaining more than 0.001 of the trait variance.

Variant (% variance SBP¹)	Bivariate constrained			Bivariate unconstrained			Univariate visit 1			Univariate visit 3
p-value threshold	0.001	5.0 × 10⁻⁵	5.0 × 10⁻⁹	0.001	5.0 × 10⁻⁵	5.0 × 10⁻⁹	0.001	5.0 × 10⁻⁵	5.0 × 10⁻⁹	0.001	5.0 × 10⁻⁵	5.0 × 10⁻⁹
3_48040283 (0.0278)	200²	200	200	200	200	200	200	200	199	200	200	197
1_66075952 (0.0206)	153	76	1	121	50	1	137	64	1	125	57	2
3_47957996 (0.0149)	200	200	199	200	200	200	200	200	193	200	200	195
3_47956424 (0.0143)	182	133	3	177	111	4	169	108	5	169	115	3
3_48040284 (0.011)	49	14	0	33	8	0	24	4	0	30	7	0
13_28624294 (0.0081)	26	0	0	4	0	0	11	1	0	18	4	0
3_47913455 (0.004)	11	1	0	8	1	0	3	0	0	5	0	0
3_58109162 (0.0027)	41	9	0	22	5	0	13	0	0	8	0	0
19_12541795 (0.0017)	0	0	0	0	0	0	0	0	0	0	0	0

¹Percent of the variance explained by the variant for SBP from the Genetic Analysis Workshop 18 (GAW18) answers.

²Number of replicates exceeding threshold.
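The replicate counts in Table 1 can be read as empirical detection rates by dividing by the 200 replicates. The sketch below does this for three of the variants, using the counts from the first threshold column of each method as they appear in the table. The dictionary layout, method labels, and function names are illustrative choices, not part of the original analysis.

```python
N_REPLICATES = 200

METHODS = (
    "bivariate_constrained",
    "bivariate_unconstrained",
    "univariate_visit1",
    "univariate_visit3",
)

# Counts copied from the first threshold column of each method in Table 1.
FIRST_THRESHOLD_COUNTS = {
    "1_66075952": (153, 121, 137, 125),
    "3_48040284": (49, 33, 24, 30),
    "3_58109162": (41, 22, 13, 8),
}


def detection_proportions(counts, n_replicates=N_REPLICATES):
    """Convert replicate counts into the proportion of replicates detected."""
    return {method: count / n_replicates for method, count in zip(METHODS, counts)}


def best_method(counts):
    """Return the method with the highest detection proportion."""
    proportions = detection_proportions(counts)
    return max(proportions, key=proportions.get)


summary = {v: detection_proportions(c) for v, c in FIRST_THRESHOLD_COUNTS.items()}
for variant, proportions in summary.items():
    print(variant, proportions, "best:", best_method(FIRST_THRESHOLD_COUNTS[variant]))
```

For these three rows the constrained bivariate test has the highest detection proportion, for example 153/200 = 0.765 for 1_66075952 compared with 121/200 = 0.605 for the unconstrained test.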

When comparing the different methods, the bivariate method in which the effect of genotype on the mean trait values of the two phenotypes is constrained to be equal provided the most robust analysis. It improved association for all 9 variants compared with the bivariate analysis in which these values were estimated separately and with the univariate analyses of visits 1 and 3 in cases where the p-value is less than 0.001 or below 5.0 × 10⁻⁵. To ensure that the improved power for the constrained bivariate approach did not come at the expense of increased false-positive rates, we chose 20 SNPs that did not explain any of the variance in the simulated model. For these 20 null markers, an average of 8.1 replicates had p-values below 0.05 for the constrained bivariate method (range, 1-28), indicating no systematic inflation of p-values under the null (data not shown).
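The false-positive check can be put in context with a short calculation. Under the null hypothesis, and assuming for illustration that replicates behave as independent trials (an assumption made here, not a claim from the study), the number of replicates with a p-value below 0.05 for a single null SNP follows a binomial distribution with 200 trials. The sketch compares the reported average of 8.1 with the count expected at the nominal level; the variable and function names are hypothetical.

```python
import math

N_REPLICATES = 200
ALPHA = 0.05
OBSERVED_MEAN_HITS = 8.1  # average over the 20 null SNPs, as reported in the text


def expected_null_hits(n_replicates: int, alpha: float):
    """Binomial mean and standard deviation of the null rejection count for one SNP."""
    mean = n_replicates * alpha
    sd = math.sqrt(n_replicates * alpha * (1.0 - alpha))
    return mean, sd


expected_mean, expected_sd = expected_null_hits(N_REPLICATES, ALPHA)
observed_rate = OBSERVED_MEAN_HITS / N_REPLICATES

print(f"expected hits per null SNP: {expected_mean:.1f} (SD {expected_sd:.2f})")
print(f"observed average rate: {observed_rate:.4f} versus nominal {ALPHA}")
```

The expected count is 10 replicates per null SNP, so the reported average of 8.1 corresponds to an empirical rate of about 0.04, slightly below the nominal 0.05.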

Discussion

The analysis of genetic variants using longitudinal data has the potential to be a valuable resource for determining biological and environmental factors affecting complex disease phenotypes over time. This type of analysis may provide increased power to detect rare genetic variants in complex diseases or to better understand when genetic components contribute to human development [4]. In addition, these types of analyses may allow for the identification of environmental covariates associated with complex diseases [2]. However, although statistical genetic methods for the analysis of longitudinal data have been proposed, they have not been widely adopted. The single-degree-of-freedom association test we propose could also be implemented easily in generalized estimating equations (GEEs) or other mixed-model frameworks. However, a theoretical advantage of the likelihood-based variance component framework is that the bivariate variance component model explicitly allows both shared/stable and unshared/changing genetic and environmental effects across time and age in the random-effects portion of the model through the estimation of genetic and environmental correlations.

Conclusions

In this paper, we present a bivariate approach to increase the genetic signal for a variant by constraining the effect of the SNP on the phenotype using a variance-component model. This model is predicated on the assumption that there is no gene-by-age interaction; however, the structure is general and is applicable to other issues in genetic epidemiology. As whole-genome data becomes more affordable for large-scale epidemiologic studies, an important consideration will be to maximize the ability to detect rare variants that have a large effect on complex disease. The easiest way to detect these rare variants will be through large pedigrees because they are amplified in families. However, the sample size of family studies is often small, making it difficult to determine association; therefore, methodologies that maximize the use of the genetic data and phenotypes from longitudinal studies may allow for an increased ability to identify genetic variants associated with complex disease. The model presented in this manuscript can be used as an early step in the analysis of longitudinal data and may lead to the development of more complex models.

Competing interests

There are no competing interests.

Authors' contributions

LA designed the overall study. PM conducted the analysis and drafted the manuscript. All authors read and accepted the final manuscript.

References

1. Bivariate genetic association of KIAA1797 with heart rate in American Indians: the Strong Heart Family Study.

Authors: Phillip E Melton; Sue Rutherford; Venkata Saroja Voruganti; Harald H H Göring; Sandra Laston; Karin Haack; Anthony G Comuzzie; Thomas D Dyer; Matthew P Johnson; Jack W Kent; Joanne E Curran; Eric K Moses; John Blangero; Ana Barac; Elisa T Lee; Lyle G Best; Richard R Fabsitz; Richard B Devereux; Peter M Okin; Jonathan N Bella; Uli Broeckel; Barbara V Howard; Jean W MacCluer; Shelley A Cole; Laura Almasy
Journal: Hum Mol Genet Date: 2010-07-03 Impact factor: 6.150

2. Quantitative trait nucleotide analysis using Bayesian model selection.

Authors: John Blangero; Harald H H Goring; Jack W Kent; Jeff T Williams; Charles P Peterson; Laura Almasy; Thomas D Dyer
Journal: Hum Biol Date: 2005-10 Impact factor: 0.553

3. Ignoring temporal trends in genetic effects substantially reduces power of quantitative trait linkage analysis.

Authors: Gang Shi; D C Rao
Journal: Genet Epidemiol Date: 2008-01 Impact factor: 2.135

4. On the replication of genetic associations: timing can be everything!

Authors: Jessica Lasky-Su; Helen N Lyon; Valur Emilsson; Iris M Heid; Cliona Molony; Benjamin A Raby; Ross Lazarus; Barbara Klanderman; Manuel E Soto-Quiros; Lydiana Avila; Edwin K Silverman; Gudmar Thorleifsson; Unnur Thorsteinsdottir; Florian Kronenberg; Caren Vollmert; Thomas Illig; Caroline S Fox; Daniel Levy; Nan Laird; Xiao Ding; Matt B McQueen; Johannah Butler; Kristin Ardlie; Constantina Papoutsakis; George Dedoussis; Christopher J O'Donnell; H-Erich Wichmann; Juan C Celedón; Eric Schadt; Joel Hirschhorn; Scott T Weiss; Kari Stefansson; Christoph Lange
Journal: Am J Hum Genet Date: 2008-04 Impact factor: 11.025

5. Bivariate quantitative trait linkage analysis: pleiotropy versus co-incident linkages.

Authors: L Almasy; T D Dyer; J Blangero
Journal: Genet Epidemiol Date: 1997 Impact factor: 2.135

6. Multipoint quantitative-trait linkage analysis in general pedigrees.

Authors: L Almasy; J Blangero
Journal: Am J Hum Genet Date: 1998-05 Impact factor: 11.025

7. Evidence for a gene influencing blood pressure on chromosome 17. Genome scan linkage results for longitudinal blood pressure phenotypes in subjects from the Framingham Heart Study.

Authors: D Levy; A L DeStefano; M G Larson; C J O'Donnell; R P Lifton; H Gavras; L A Cupples; R H Myers
Journal: Hypertension Date: 2000-10 Impact factor: 10.190

8. Longitudinal association analysis of quantitative traits.

Authors: Ruzong Fan; Yiwei Zhang; Paul S Albert; Aiyi Liu; Yuanjia Wang; Momiao Xiong
Journal: Genet Epidemiol Date: 2012-09-10 Impact factor: 2.135

9. Genome-wide association mapping with longitudinal data.

Authors: Nicholas A Furlotte; Eleazar Eskin; Susana Eyheramendy
Journal: Genet Epidemiol Date: 2012-05-11 Impact factor: 2.135

10. Data for Genetic Analysis Workshop 18: human whole genome sequence, blood pressure, and simulated phenotypes in extended pedigrees.

Authors: Laura Almasy; Thomas D Dyer; Juan M Peralta; Goo Jun; Andrew R Wood; Christian Fuchsberger; Marcio A Almeida; Jack W Kent; Sharon Fowler; Tom W Blackwell; Sobha Puppala; Satish Kumar; Joanne E Curran; Donna Lehman; Goncalo Abecasis; Ravindranath Duggirala; John Blangero
Journal: BMC Proc Date: 2014-06-17

2 in total

1. Rare-Variant Kernel Machine Test for Longitudinal Data from Population and Family Samples.

Authors: Qi Yan; Daniel E Weeks; Hemant K Tiwari; Nengjun Yi; Kui Zhang; Guimin Gao; Wan-Yu Lin; Xiang-Yang Lou; Wei Chen; Nianjun Liu
Journal: Hum Hered Date: 2016-04-29 Impact factor: 0.444

2. Constrained multivariate association with longitudinal phenotypes.

Authors: Phillip E Melton; Juan M Peralta; Laura Almasy
Journal: BMC Proc Date: 2016-10-18
