Using a latent growth curve model for an integrative assessment of the effects of genetic and environmental factors on multiple phenotypes.

Jemila S Hamid, Nicole M Roslin, Andrew D Paterson, Joseph Beyene.

Abstract

We propose the use of a latent growth curve model to assess the influence of genetic, environmental, demographic, and lifestyle factors on multiple phenotypes related to coronary heart disease. We model four quantitative traits (systolic blood pressure, high-density lipoprotein, low-density lipoprotein, and triglycerides) simultaneously in a multivariate framework that allows us to study their change over time, assess individual variation, and investigate cross-phenotype relationships. Environmental, demographic, and lifestyle covariates are included at different levels of the model as time-varying or time-invariant, as appropriate. To investigate the change over time attributed to genetic factors, we use candidate markers that have previously been shown to be associated with the quantitative traits. We illustrate our approach using independent observations from the offspring cohort of the Framingham Heart Study data.

Year:  2009        PMID: 20018036      PMCID: PMC2795943          DOI: 10.1186/1753-6561-3-S7-S44

Source DB:  PubMed          Journal:  BMC Proc        ISSN: 1753-6561


Background

Numerous studies have identified environmental, demographic, and genetic factors that increase the risk of coronary heart disease (CHD). A major study that led to the identification of several risk factors for heart disease is the Framingham Heart Study (FHS), which began in 1948. The study provides measurements of major risk factors such as blood pressure and lipid levels taken over a long period of time, offering the opportunity to model developmental trajectories. Very recently, FHS participants were genotyped, which permits researchers to perform genome-wide association and/or linkage analyses to identify potential genetic factors that may influence the development of CHD. Environmental and genetic variables influencing quantitative traits related to CHD, such as systolic blood pressure, have been studied extensively. Methods ranging from simple regression to more complicated multilevel models have been used to model the longitudinal aspects of blood pressure and other quantitative traits of interest [1]. However, few studies have looked at more than one phenotype simultaneously, and cross-phenotype relationships are rarely investigated. In this paper, we consider longitudinal measurements of four phenotypes known to be associated with CHD: systolic blood pressure (SBP), low-density lipoprotein (LDL), high-density lipoprotein (HDL), and triglycerides (TG). We propose the use of latent growth curve (LGC) modelling to simultaneously model these quantitative traits in a multivariate framework that allows us to investigate cross-phenotype correlations and to study the effect of environmental, genetic, and other covariates on the change of these phenotypes over time.

Methods

Data description

We included data from the FHS offspring cohort provided by Genetic Analysis Workshop 16 (GAW16). We restricted our analysis to independent members of the offspring cohort, selected as follows. Starting with the original 1538 families, the Generation 3 cohort was removed, which split the pedigrees into 3379 independent sub-pedigrees. The maximal set of independent samples was then obtained from among the samples that belonged to the offspring cohort, had consented to have their phenotype data used, and had genotype data, which resulted in 1488 individuals. An additional 171 samples without family data were added for a total of 1659 independent (kinship coefficient = 0) individuals. Among them, 221 individuals were missing one or more elements of genotype information and were excluded. We considered three time-varying covariates: smoking, hypertension treatment, and cholesterol treatment. Other variables related to CHD, including age, sex, body mass index (BMI), and diabetes status, were included in the analysis as time-invariant covariates. Selected markers that have previously been identified as linked and/or associated with the traits were included in the model to account for the genetic contribution. The authors have adhered to the data use agreement for FHS data, and this agreement has been reviewed and approved by the Research Ethics Board at the Research Institute, The Hospital for Sick Children, Toronto, Canada.
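
The sample counts above can be tallied with a few lines of arithmetic. This is only a sketch: the variable names are ours, and the final figure assumes that no exclusions were applied beyond those stated in the paragraph.

```python
# Counts quoted in the data description.
maximal_independent_set = 1488      # offspring-cohort members with consent and genotypes
samples_without_family_data = 171   # added singletons
missing_genotype = 221              # excluded for incomplete genotype information

total_independent = maximal_independent_set + samples_without_family_data
# Assumption: no further exclusions were applied after the genotype filter.
analysis_sample = total_independent - missing_genotype

print(total_independent, analysis_sample)
```

Under that assumption, the number of individuals remaining for analysis is the total of independent individuals minus those with missing genotypes.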

Marker selection

For the lipid traits, eight markers were selected from genes or gene regions that showed evidence for association with lipoprotein or lipid concentrations and were confirmed in a meta-analysis [2]. Two of these eight markers were not present in either the 500 k or 50 k marker sets, and were also not in strong linkage disequilibrium with any marker. Marker rs11591147 (chromosome 1, in PCSK9) was replaced with rs11206510. Marker rs4420638 (chromosome 19, in the APOE-C1-C4-C2 gene cluster), which showed a weaker association in Willer et al. [3], was replaced with rs10402271. Similarly, rs1800775 was replaced with rs1150802. No genome-wide study has shown evidence of significant association with either blood pressure or hypertension. However, we included the two markers with the smallest p-values from a genotypic test in the Wellcome Trust Case-Control Study [4]. Information about the markers is provided in Table 1.
Table 1

Selected markers known to be associated with cardiovascular-related traits

Marker       Chromosome   Position (bp)   Nearest gene    Associated trait
rs11206510   1            55,268,627      PCSK9           LDL
rs2820037    1            237,503,165     CHRM3           SBP
rs693        2            21,085,700      APOB            LDL, TG
rs328        8            19,864,004      LPL             HDL, TG
rs3890182    9            106,687,476     ABCA1           HDL
rs28927680   11           116,124,283     APOA1 cluster   HDL, TG
rs1800588    15           56,510,967      LIPC            HDL
rs2398162    15           94,631,554      NR2F2           SBP
rs1150802    16           55,552,737      CETP            HDL
rs10402271   19           50,021,054      APOE cluster    LDL

LGC model

LGC modelling is used to study the effect of genetic and environmental factors on the change of SBP, HDL, LDL, and TG over time. One of the strengths of LGC modelling is that it allows us to study multiple outcomes over time in a multivariate framework, which is particularly useful for investigating changes in the levels of several phenotypes simultaneously and for assessing cross-phenotype relationships. Suppose y_{tip} is a measurement taken from individual i in pedigree p at exam t, where i = 1, 2, ..., n, p = 1, 2, ..., k, and t = 1, 2, ..., q. The general growth curve model is then described by Eqs. (1)-(3), where α_{ip} and β_{ip} are the intercept and the slope [5]. Time-varying covariates such as v_{tip} are included in the model at the individual level as in Eq. (1), whereas time-invariant covariates such as w_{ip} enter the model through the growth parameters (intercept and slope) as in Eq. (2). Covariates affecting the phenotypes at the pedigree level, such as z_p, are included at the family (or pedigree) level as in Eq. (3). In our case, the measurements corresponding to y are SBP, HDL, LDL, and TG, and these four phenotypes are modelled simultaneously as parallel processes. Moreover, we do not have pedigree-level parameters α_p and β_p because we considered unrelated individuals. We analyzed the data using Mplus statistical software [6].
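
A conventional three-level linear growth model consistent with this description can be sketched as follows. The subscript ordering, the time scores x_t, the coefficient symbols κ, γ, and δ, the grand means α_0 and β_0, and the residual terms ε and ζ are notational assumptions of this sketch and are not taken from the original equations.

```latex
\begin{align}
y_{tip} &= \alpha_{ip} + \beta_{ip}\, x_t + \kappa\, v_{tip} + \varepsilon_{tip} \tag{1} \\
\alpha_{ip} &= \alpha_{p} + \gamma_{\alpha}\, w_{ip} + \zeta_{\alpha ip}, \qquad
\beta_{ip} = \beta_{p} + \gamma_{\beta}\, w_{ip} + \zeta_{\beta ip} \tag{2} \\
\alpha_{p} &= \alpha_{0} + \delta_{\alpha}\, z_{p} + \zeta_{\alpha p}, \qquad
\beta_{p} = \beta_{0} + \delta_{\beta}\, z_{p} + \zeta_{\beta p} \tag{3}
\end{align}
```

With unrelated individuals the pedigree level (3) drops out, so the growth parameters in (2) vary around fixed means rather than around pedigree-specific values.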

Results

The path diagram in Figure 1 describes the growth curve model used for the longitudinal measurements of SBP, HDL, LDL, and TG. Paths with one arrow represent causal relationships, whereas those with two arrows indicate correlations between the traits involved. For simplicity, we have not included all cross-trait relationships in the diagram; however, the results are provided in Tables 2 and 3. A considerable amount of the variation in the intercepts is explained by the time-invariant variables sex, age, baseline BMI, and diabetes status (Table 2). For SBP and HDL, 35.6% and 33.6% of the variation in the intercepts, respectively, is explained by these covariates (Table 2). However, a significant amount of the variation (64.5%, p-value < 0.0001 and 66.4%, p-value < 0.0001, for SBP and HDL, respectively) has not been accounted for. On the other hand, only a small amount of the variation in the slopes is explained by the time-invariant covariates; the largest explained variance is for the LDL slope (24.0%).
Figure 1

Path diagram describing growth curve modeling of longitudinal measurements of SBP, HDL, LDL, and TG taken at exams 1, 3, 5, and 7. The environmental and demographic covariates given on both sides of the path diagram represent the time invariant covariates sex, age, baseline BMI, and diabetes status. Genetic covariates represent the ten selected markers. The numbers on the lines connecting these covariates with the intercepts and slopes are percentages of explained variation and correlations. Paths with one arrow indicate causal relationships whereas those with two show correlations. The boxes contain tvalues representing time-varying covariates hypertensive and cholesterol treatments as well as number of cigarettes smoked.
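
To make the idea of parallel growth processes concrete, the following sketch fits a separate least-squares line to each individual's trajectory for two traits and then correlates the resulting intercepts and slopes across traits. It is a two-stage approximation for illustration only, not the simultaneous latent-variable estimation performed in Mplus. The trajectories are hypothetical (not FHS data), and the time scores 0, 2, 4, 6 for exams 1, 3, 5, and 7 are an assumption.

```python
from math import sqrt
from statistics import mean

# Assumed time scores for exams 1, 3, 5, and 7 (exam number minus one).
TIME_SCORES = [0, 2, 4, 6]

def ols_line(x, y):
    """Return (intercept, slope) of the least-squares line through (x, y)."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    a_bar, b_bar = mean(a), mean(b)
    cov = sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))
    ss_a = sum((ai - a_bar) ** 2 for ai in a)
    ss_b = sum((bi - b_bar) ** 2 for bi in b)
    return cov / sqrt(ss_a * ss_b)

# Hypothetical trajectories for three individuals (one row per individual).
sbp = [[110, 112, 114, 116], [120, 124, 128, 132], [130, 133, 136, 139]]
hdl = [[60, 60, 60, 60], [50, 51, 52, 53], [40, 42, 44, 46]]

# Stage 1: individual growth parameters for each trait.
sbp_growth = [ols_line(TIME_SCORES, y) for y in sbp]
hdl_growth = [ols_line(TIME_SCORES, y) for y in hdl]

# Stage 2: cross-trait correlations between growth parameters.
intercept_corr = pearson([g[0] for g in sbp_growth], [g[0] for g in hdl_growth])
slope_corr = pearson([g[1] for g in sbp_growth], [g[1] for g in hdl_growth])

print(sbp_growth, hdl_growth, intercept_corr, slope_corr)
```

In a full LGC model the intercepts and slopes are latent variables estimated jointly with their covariances, so measurement error in the individual lines does not attenuate the cross-trait correlations as it can in this two-stage version.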

Table 2

Estimated variance for the latent variables and percentage of variation explained by the time-invariant covariates, genetic covariates, and the combined model

                                                 % Variance explained by
Latent variable   Mean      Estimated variance   Environmental factors   Genetic factors   Combined model
HDL
 Intercept        52.020    136.926              33.6                    3.9               37.5
 Slope            0.284     7.860                16.1                    3.5               19.6
LDL
 Intercept        125.238   926.508              23.1                    3.3               26.7
 Slope            1.473     58.518               24.0                    0.2               24.0
TG
 Intercept        71.989    3258.821             16.0                    18.0              2.1
 Slope            22.294    643.769              5.5                     7.9               2.0
SBP
 Intercept        119.696   130.198              35.6                    0.009             36.3
 Slope            2.601     18.524               9.4                     0.028             11.8
Table 3

Correlations explained by environmental and genetic covariates.a

            HDL                   LDL                   TRG                   SBP
            Intercept  Slope      Intercept  Slope      Intercept  Slope      Intercept  Slope
HDL
 Intercept  1.000      0.094      <0.0001    0.003      <0.0001    0.021      0.018      0.001
 Slope      0.142      1.000      0.483      <0.0001    0.549      <0.0001    <0.0001    0.058
LDL
 Intercept  -0.245b    0.040      1.000      <0.0001    <0.0001    0.221      0.961      0.253
 Slope      0.159      0.527      -0.285     1.000      0.001      0.001      0.825      0.017
TRG
 Intercept  -0.318     -0.031     0.176      -0.179     1.000      <0.0001    <0.0001    0.139
 Slope      -0.094     -0.608     0.049      0.204      -0.225     1.000      0.569      0.008
SBP
 Intercept  0.101      -0.258     0.002      -0.012     0.156      0.024      1.000      0.020
 Slope      -0.163     0.138      0.055      0.158      -0.067     0.134      -0.163     1.000

aValues above diagonal are the corresponding p-values.

bBold font indicates significance at α = 0.005.

For the genetic factors, the results from our analysis are in agreement with the previous association findings indicated in Table 1. We found strong associations between HDL and markers rs28927680 (p-value < 0.0001), rs1800588 (p-value = 0.002), and rs1150802 (p-value < 0.0001) (through the slope). A weak association between the HDL slope and marker rs328 was also observed. Markers rs693 and rs10402271 were strongly associated with the intercept of LDL (both with p-value < 0.0001), whereas marker rs11206510 showed a weak association (p-value = 0.026). Markers rs28927680 and rs328 were also strongly associated with the intercept and slope of TG, respectively. For blood pressure, no marker was associated with the intercept of the model; however, a strong association between marker rs1800588 and the SBP slope was found (p-value = 0.001). This marker has previously been linked to HDL [2], but no study has linked it with SBP. It is important to note that markers rs2820037 and rs2398162, which had the smallest p-values from a genotypic test (for SBP) in the Wellcome Trust Case-Control Study [4], did not show any association in our data. In general, a small amount of variation in all the quantitative traits is attributed to the genetic covariates; the largest explained variation (3.9%) was for the intercept of HDL (Table 2). The variation in the latent variables explained by the combined model with both the environmental and genetic factors is shown in Table 2. It can be seen that 37.1% of the variation in the intercept of HDL is explained by the model; however, a significant amount (86 out of the total 136.93, p-value < 0.0001) is left unexplained. Further analysis with more environmental and genetic factors is needed to explain this variation. Moreover, the slope and/or intercept of one or more of the phenotypes could be included as a covariate in the analysis to account for a possible causal dependence between the phenotypes. We plan to consider these analyses in future studies. Here we only investigated the cross-phenotype relationships via correlations. Model-estimated correlations for the latent variables are given in Figure 1 using curved, double-arrow lines. Table 3 shows the correlations (along with p-values) explained by the environmental and genetic covariates. The residual correlations (data not shown) show that a significant percentage of the correlations is not explained by the model, indicating that there are other common factors affecting these phenotypes simultaneously.
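
The percentage of variance explained for a latent variable can be recovered from its total and residual variances. The sketch below applies that relation to the figures quoted for the variance of 136.93 (the HDL intercept variance in Table 2); the function name is ours, and because the residual of 86 is a rounded figure, the result only approximately matches the reported percentages.

```python
def percent_explained(total_variance, residual_variance):
    """Percentage of a latent variable's variance accounted for by the covariates."""
    return 100.0 * (1.0 - residual_variance / total_variance)

# Total variance from Table 2 (HDL intercept) and the rounded residual
# variance quoted in the text.
hdl_intercept_explained = percent_explained(136.926, 86.0)

print(f"{hdl_intercept_explained:.1f}% explained")
```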

Discussion

Our results show that a significant amount of the variation in the intercepts of the traits is explained by environmental and demographic factors. Moreover, the results identified markers that have previously been associated with the traits. We also found a novel association between marker rs1800588 and SBP. In general, however, only a small percentage of the variation in the traits was attributed to the genetic factors. In our LGC modelling, we considered unrelated individuals (with kinship coefficient = 0) from the offspring cohort of the FHS data. However, one might be interested to know how the intercepts and slopes vary not only at the individual level but also at the family level. Therefore, it is important to use models that take the correlation among family members into account. This would also allow us to explain some of the residual variances and correlations. Two approaches can be used to deal with this challenge: 1) adjust for the dependency by treating the familial correlation as a nuisance parameter and estimating standard errors and goodness-of-fit statistics with the sandwich estimator, or 2) use a two-level LGC model that allows modelling not only the average change in the values of the phenotypes over time but also how these changes vary between individuals in the same family and between families. We plan to address these issues in subsequent studies.
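
The first approach can be illustrated in its simplest form: a cluster-robust (sandwich) standard error for a sample mean, where observations are grouped by family. This is a minimal sketch for an intercept-only model without small-sample corrections, using hypothetical values and family labels; it is not the estimator as implemented in any particular software package.

```python
from math import sqrt

def naive_se_of_mean(values):
    """Sandwich standard error of the mean treating every observation as independent."""
    n = len(values)
    m = sum(values) / n
    return sqrt(sum((v - m) ** 2 for v in values)) / n

def cluster_robust_se_of_mean(values, clusters):
    """Sandwich standard error of the mean with residuals summed within clusters."""
    n = len(values)
    m = sum(values) / n
    score_by_cluster = {}
    for v, c in zip(values, clusters):
        score_by_cluster[c] = score_by_cluster.get(c, 0.0) + (v - m)
    # "Meat" of the sandwich: squared within-cluster residual sums.
    meat = sum(s ** 2 for s in score_by_cluster.values())
    # "Bread" for an intercept-only model is 1/n on each side.
    return sqrt(meat) / n

# Hypothetical data: two families whose members have identical values.
values = [10.0, 10.0, 20.0, 20.0]
families = ["F1", "F1", "F2", "F2"]

naive_se = naive_se_of_mean(values)
robust_se = cluster_robust_se_of_mean(values, families)
print(naive_se, robust_se)
```

When family members resemble each other, the cluster-robust standard error is larger than the naive one, which is why ignoring familial correlation would overstate precision.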

List of abbreviations used

BMI: Body mass index; CHD: Coronary heart disease; FHS: Framingham Heart Study; GAW16: Genetic Analysis Workshop 16; HDL: High-density lipoprotein; LDL: Low-density lipoprotein; LGC: Latent growth curve; SBP: Systolic blood pressure; SNP: Single-nucleotide polymorphism; TG: Triglyceride

Competing interests

The authors declare that they have no competing interests.

Authors' contributions

JSH contributed to the conception and design of the study, carried out the phenotype modeling, and drafted the manuscript. NMR performed marker selection and helped in drafting the manuscript. ADP participated in drafting the manuscript and helped in the biological interpretation of the results. JB contributed to the conception and design of the study, participated in the phenotype modeling and drafting of the manuscript. All authors read and approved the final manuscript.

References

1.  Six new loci associated with blood low-density lipoprotein cholesterol, high-density lipoprotein cholesterol or triglycerides in humans.

Authors:  Sekar Kathiresan; Olle Melander; Candace Guiducci; Aarti Surti; Noël P Burtt; Mark J Rieder; Gregory M Cooper; Charlotta Roos; Benjamin F Voight; Aki S Havulinna; Björn Wahlstrand; Thomas Hedner; Dolores Corella; E Shyong Tai; Jose M Ordovas; Göran Berglund; Erkki Vartiainen; Pekka Jousilahti; Bo Hedblad; Marja-Riitta Taskinen; Christopher Newton-Cheh; Veikko Salomaa; Leena Peltonen; Leif Groop; David M Altshuler; Marju Orho-Melander
Journal:  Nat Genet       Date:  2008-01-13

2.  Genome-wide linkage analysis of systolic blood pressure slope using the Genetic Analysis Workshop 13 data sets.

Authors:  Dushanthi Pinnaduwage; Joseph Beyene; Shafagh Fallah
Journal:  BMC Genet       Date:  2003-12-31

3.  Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls.

Authors: 
Journal:  Nature       Date:  2007-06-07

4.  Newly identified loci that influence lipid concentrations and risk of coronary artery disease.

Authors:  Cristen J Willer; Serena Sanna; Anne U Jackson; Angelo Scuteri; Lori L Bonnycastle; Robert Clarke; Simon C Heath; Nicholas J Timpson; Samer S Najjar; Heather M Stringham; James Strait; William L Duren; Andrea Maschio; Fabio Busonero; Antonella Mulas; Giuseppe Albai; Amy J Swift; Mario A Morken; Narisu Narisu; Derrick Bennett; Sarah Parish; Haiqing Shen; Pilar Galan; Pierre Meneton; Serge Hercberg; Diana Zelenika; Wei-Min Chen; Yun Li; Laura J Scott; Paul A Scheet; Jouko Sundvall; Richard M Watanabe; Ramaiah Nagaraja; Shah Ebrahim; Debbie A Lawlor; Yoav Ben-Shlomo; George Davey-Smith; Alan R Shuldiner; Rory Collins; Richard N Bergman; Manuela Uda; Jaakko Tuomilehto; Antonio Cao; Francis S Collins; Edward Lakatta; G Mark Lathrop; Michael Boehnke; David Schlessinger; Karen L Mohlke; Gonçalo R Abecasis
Journal:  Nat Genet       Date:  2008-01-13
