
Analysis of baseline, average, and longitudinally measured blood pressure data using linear mixed models.

Abstract

This article compares baseline, average, and longitudinal data analysis methods for identifying genetic variants in a genome-wide association study using the Genetic Analysis Workshop 18 data. We apply methods that include (a) linear mixed models with baseline measures, (b) random intercept linear mixed models with mean measures as the outcome, and (c) random intercept linear mixed models with longitudinal measurements. In the linear mixed models, covariates are included as fixed effects, whereas relatedness among individuals is incorporated as the variance-covariance structure of the random effect for the individuals. The overall strategy of applying linear mixed models to decorrelate the data is based on Aulchenko et al.'s GRAMMAR. By analyzing systolic and diastolic blood pressure, which are used separately as outcomes, we compare the 3 methods in identifying a known genetic variant that is associated with blood pressure, using chromosome 3 and simulated phenotype data. We also analyze the real phenotype data to illustrate the methods. We conclude that the linear mixed model with longitudinal measurements of diastolic blood pressure is the most accurate at identifying the known single-nucleotide polymorphism among the methods, but linear mixed models with baseline measures perform best with systolic blood pressure as the outcome.


Year: 2014 PMID: 25519409 PMCID: PMC4143715 DOI: 10.1186/1753-6561-8-S1-S80

Source DB: PubMed Journal: BMC Proc ISSN: 1753-6561

Background

Hypertension is a major morbidity and mortality risk factor for stroke, myocardial infarction, heart failure, and end-stage renal disease [1]. It is a multifactorial disorder resulting from inheritance of several susceptibility genes, as well as multiple environmental determinants, including weight control, dietary intake, physical activity, and alcohol consumption [2]. To date, several variants have been identified by genome-wide association studies (GWAS) as being associated with blood pressure and hypertension [1,3,4]. Various statistical, data mining, and machine learning strategies have shown some promise for identifying genetic variants, but are not scalable to large-scale GWAS [5,6]. Linear mixed models (LMMs) are widely used in controlling for phenotypes and relatedness within GWAS [7]. When LMMs are applied to GWAS data, the covariates are included as fixed effects, whereas kinship among individuals is incorporated as a variance-covariance structure of the random effect for the individuals. We followed Aulchenko et al.'s [8] residual approach, which is based on a 2-step strategy in the application of the LMM. The first step optimizes a reduced LMM with the genetic marker effect excluded. In the second step, the residual from the reduced LMM is fitted as the dependent variable to test each marker in a linear model. We give an overview of 3 LMMs for the analysis of Genetic Analysis Workshop 18 (GAW18) data, paying attention to the power of selecting an associated single-nucleotide polymorphism (SNP) using chromosome 3 and simulated phenotype data. In particular, we apply 3 types of LMMs for statistical analysis of baseline measurements, mean measurements, and longitudinal data. We compare the LMMs through simulations and illustrate them using the real phenotype data.

Methods

Data and quality control

We use 3 models to analyze the GWAS data set from chromosome 3 of the GAW18: a Diabetes-GENES Project, which consists of whole genome sequence data in a pedigree-based sample, longitudinal phenotype data for hypertension and related traits, and selected covariates. In these GWAS data, 65,519 SNPs have been genotyped for chromosome 3. In the simulated phenotype data, 849 subjects were measured at 3 time points for age, medication use, smoking status, and blood pressure. As is standard practice, SNPs with minor allele frequency (MAF) <1% were excluded from data analysis. We also filtered out SNPs with low call rates (<90%) and deviation from Hardy-Weinberg equilibrium (p value ≤ 1e−6). The quality controls were implemented using the R package SNPassoc [9]. In addition, we excluded 4 individuals because more than half of their SNP values were missing. After filtering, a total of 27,313 SNPs and 845 samples met our quality-control criteria and were used for analysis. The family relationships among these individuals were taken from the pedigree of the real data.
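The SNP-level filters described above can be sketched in a few lines. The following is a minimal illustration in Python, not the SNPassoc implementation used in the study. The function names, the 0/1/2 genotype coding with `None` for missing calls, and the use of a 1-degree-of-freedom chi-square test for Hardy-Weinberg equilibrium are assumptions made for the example; only the three thresholds come from the text.

```python
import math

def hwe_p_value(n_aa, n_ab, n_bb):
    """Chi-square (1 df) test of Hardy-Weinberg equilibrium from genotype counts."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)            # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2.0))

def passes_qc(genotypes, maf_min=0.01, call_rate_min=0.90, hwe_alpha=1e-6):
    """Apply the three SNP filters to one SNP.

    genotypes: list with one entry per individual, coded as the count of one
    allele (0, 1, or 2), or None when the call is missing (hypothetical coding).
    """
    called = [g for g in genotypes if g is not None]
    if not called or len(called) / len(genotypes) < call_rate_min:
        return False                            # call rate below 90%
    freq = sum(called) / (2 * len(called))
    if min(freq, 1.0 - freq) < maf_min:
        return False                            # MAF below 1%
    n_bb, n_ab, n_aa = (called.count(v) for v in (0, 1, 2))
    return hwe_p_value(n_aa, n_ab, n_bb) > hwe_alpha   # drop if p <= 1e-6
```

A SNP is kept only if `passes_qc` returns `True`; the sample-level exclusion (individuals with more than half of their SNP values missing) would be a separate pass over individuals.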

Statistical analysis to evaluate the effect of SNPs

We consider 3 LMMs to evaluate the effect of SNPs on systolic blood pressure (SBP) and diastolic blood pressure (DBP) separately.

Model 1: GRAMMAR approach for baseline measures analysis

Aulchenko et al. [8] proposed genome-wide rapid association using mixed model and regression (GRAMMAR) to assess the significance of the effect of a polymorphism. The method first obtains residuals adjusted for family effects and then analyzes the association between these residuals and genetic polymorphisms using least-squares methods. The model is expressed as follows.

Initial model. The initial model is $y_{ij} = \sum_k \beta_k x_{ijk} + G_{ij} + e_{ij}$, where $y_{ij}$ is the value of the phenotype for the jth individual in the ith pedigree, $x_{ijk}$ is the value of the kth covariate or fixed effect, $\beta_k$ is the coefficient of the kth fixed effect or covariate, and $e_{ij}$ is the residual effect. $G$ is the random polygenic effect, which follows a multivariate normal distribution with mean 0 and variance $\Phi\sigma_G^2$, where $\Phi$ is the relationship matrix (kinship matrix) and $\sigma_G^2$ is the additive genetic variance as a result of polygenes. The estimated residuals are given by $\hat{e}_{ij} = y_{ij} - (\sum_k \hat{\beta}_k x_{ijk} + \hat{G}_{ij})$.

SNP model. The residuals are used as the dependent trait in a simple linear regression for each SNP, $\hat{e}_{ij} = \alpha_l + \gamma_l^{(1)} g_{ijl} + \varepsilon_{ij}$, where $g_{ijl}$ is the genotype at the lth SNP, $\alpha_l$ is an intercept, $\varepsilon_{ij}$ is the error term, and $\gamma_l^{(1)}$ is the coefficient of the lth SNP from the model 1 scenario. The method adjusts for familial relationship and is computationally fast, but the model considers only the time 1 information from the GAW18 data. The first-stage model is implemented using the polygenic() function of the R package GenABEL, and the kinship matrix is estimated using the R package kinship2. Next, the lm() function is used to fit the linear model with the residuals obtained from the first-stage model.
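As an illustration of the second (SNP) stage only, the sketch below regresses first-stage residuals on the genotype of a single SNP by ordinary least squares. It is written in Python rather than with the lm() call used in the study, and it assumes the residuals have already been obtained from the first-stage mixed model. The function name and the normal approximation to the t distribution (reasonable for a sample of several hundred individuals, but not what lm() reports) are assumptions of this example.

```python
from statistics import NormalDist, fmean

def snp_association(residuals, genotypes):
    """Simple linear regression of first-stage residuals on one SNP.

    residuals: residuals from the first-stage model, one per individual
    genotypes: allele counts (0, 1, or 2) for the same individuals
    Returns (slope, test statistic, approximate two-sided p value).
    """
    n = len(residuals)
    g_bar, r_bar = fmean(genotypes), fmean(residuals)
    sxx = sum((g - g_bar) ** 2 for g in genotypes)
    sxy = sum((g - g_bar) * (r - r_bar) for g, r in zip(genotypes, residuals))
    slope = sxy / sxx
    intercept = r_bar - slope * g_bar
    rss = sum((r - intercept - slope * g) ** 2
              for g, r in zip(genotypes, residuals))
    se = (rss / (n - 2) / sxx) ** 0.5           # standard error of the slope
    stat = slope / se
    # Assumption: normal approximation in place of the t distribution.
    p_value = 2.0 * NormalDist().cdf(-abs(stat))
    return slope, stat, p_value
```

Running this function once per SNP and sorting by the returned p value gives the kind of ranking used in the Results. A monomorphic SNP or a perfect fit would cause a division by zero here, which is one reason the quality-control filters are applied first.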

Model 2: Two-stage LMM for mean measured outcome analysis

We considered the mean of the measurements across the 3 time points as the outcome and followed the 2-stage approach. The model formula for the first stage is $\bar{y}_{ij} = \sum_k \beta_k x_{ijk} + G_{ij} + e_{ij}$, where $\bar{y}_{ij}$ denotes the mean phenotype across the time points for the jth individual in the ith pedigree. $\beta_k$ is the coefficient for the kth unknown fixed effect representing nongenetic effects (mean age across time points, sex, smoking status at time 1, and medication use at time 1), and $G$ is the random polygenic effect that follows a multivariate normal distribution with mean 0 and variance $K\sigma_G^2$, where $K$ is the kinship matrix with elements calculated from pedigrees and $\sigma_G^2$ is an unknown genetic variance; $e$ is a vector of random residual effects that are normally distributed with zero mean and variance-covariance $I\sigma_e^2$, where $I$ is the identity matrix and $\sigma_e^2$ is the unknown residual variance. In the second stage we consider the residuals $\hat{e}_{ij}$ as the outcome and fit the linear model $\hat{e}_{ij} = \alpha_l + \gamma_l^{(2)} g_{ijl} + \varepsilon_{ij}$, where $g_{ijl}$ is the genotype at the lth SNP, $\alpha_l$ is an intercept, $\varepsilon_{ij}$ is the error term, and $\gamma_l^{(2)}$ is the coefficient of the lth SNP from the model 2 scenario. We implemented the model using the R package kinship (R 2.10.1). The lmekin() function is used to obtain the residuals from the first-stage model.
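To make concrete what "elements calculated from pedigrees" means, the sketch below computes kinship coefficients with the standard recursive definition. It is an illustrative Python example, not the algorithm of the R kinship packages named in the text. The input layout (a list of `(id, father, mother)` tuples with parents listed before their children and `None` for unknown parents) and the function name are assumptions of this example.

```python
from functools import lru_cache

def kinship(pedigree):
    """Kinship coefficients for every pair of individuals in a pedigree.

    pedigree: list of (id, father, mother) with parents listed before their
    children; founders have father = mother = None (hypothetical layout).
    Returns a dict mapping (a, b) to the kinship coefficient of a and b.
    """
    order = {pid: k for k, (pid, _, _) in enumerate(pedigree)}
    parents = {pid: (f, m) for pid, f, m in pedigree}

    @lru_cache(maxsize=None)
    def phi(a, b):
        if a is None or b is None:
            return 0.0                          # unknown parent: unrelated
        if order[a] < order[b]:
            a, b = b, a                         # recurse on the later-listed one
        father, mother = parents[a]
        if a == b:
            return 0.5 * (1.0 + phi(father, mother))
        return 0.5 * (phi(father, b) + phi(mother, b))

    return {(a, b): phi(a, b) for a in order for b in order}

# Nuclear family: two unrelated founders and their two children.
family = [("F", None, None), ("M", None, None),
          ("C1", "F", "M"), ("C2", "F", "M")]
K = kinship(family)
```

In this example the parent-child and sibling-sibling entries both equal 0.25, and each diagonal entry equals 0.5 because nobody is inbred. The recursion depth grows with the number of generations, so very deep pedigrees would need an iterative version.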

Model 3: Two-stage LMM for longitudinal analysis

We also evaluate a 2-stage LMM that takes longitudinal measurements into account. We consider 2 models, without and with time effects, in the application of the longitudinal analysis. In the first stage, we fit a random intercept LMM as follows:

(A) $y_{ijt} = \sum_k \beta_k x_{ijkt} + G_{ij} + e_{ijt}$ (1)

Next, we extend the first-stage model to allow for time points:

(B) $y_{ijt} = \sum_k \beta_k x_{ijkt} + v z_t + G_{ij} + e_{ijt}$ (2)

where $y_{ijt}$ denotes the phenotype (SBP or DBP) for the jth individual in the ith pedigree at time t, $x_{ijkt}$ is the kth fixed-effect time-dependent covariate, $v$ is the slope coefficient for the time points $z_t$; t = 1, 2, 3, and $G$ is the random polygenic effect as in model 2; $e$ is a vector of random residual effects that are normally distributed with zero mean and covariance $I\sigma_e^2$, and $\sigma_e^2$ is the unknown residual variance. Then we consider the mean of the residuals across the time points as the outcome and fit the model $\bar{\hat{e}}_{ij} = \alpha_l + \gamma_l^{(3)} g_{ijl} + \varepsilon_{ij}$, where $\bar{\hat{e}}$ is the vector of mean residuals across the time points, $g_{ijl}$ is the genotype at the lth SNP, $\alpha_l$ is an intercept, $\varepsilon_{ij}$ is the error term, and $\gamma_l^{(3)}$ is the coefficient of the lth SNP from the model 3 scenario. We applied the model using the R packages pedigreemm and kinship.
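The step that links the two stages of Model 3 is the collapse of the longitudinal residuals to one value per individual. A minimal Python sketch of that step follows; the long-format input of `(subject_id, time, residual)` rows and the function name are assumptions made for the illustration, and the residuals themselves would come from the first-stage random intercept model.

```python
from collections import defaultdict
from statistics import fmean

def mean_residuals(long_residuals):
    """Average first-stage residuals over time points for each subject.

    long_residuals: iterable of (subject_id, time, residual) rows
    (hypothetical long-format layout). Returns {subject_id: mean residual}.
    """
    by_subject = defaultdict(list)
    for subject, _time, resid in long_residuals:
        by_subject[subject].append(resid)
    return {subject: fmean(values) for subject, values in by_subject.items()}

rows = [("a", 1, 1.0), ("a", 2, 2.0), ("a", 3, 3.0),
        ("b", 1, -1.0), ("b", 2, 0.0), ("b", 3, 1.0)]
per_subject = mean_residuals(rows)
```

The resulting per-subject means play the role of the outcome in the second-stage SNP regression. Because the average is taken over whatever rows are present, a subject with a missing time point would simply be averaged over fewer values.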

Results

Simulated data analysis

We investigated the performance of the 3 LMMs for selecting a known associated SNP in simulation studies. The 3 models are applied after adjusting for covariates and pedigree information, and the p values for each SNP are used to rank the SNPs. The simulated phenotype data in GAW18 have 10 known SNPs from chromosome 3 that are associated with blood pressure. Among these SNPs, 2 have MAF >0.05: rs6442089 (gene symbol: MAP4, position: 47956424, MAF: 0.367) and rs1131356 (gene symbol: FLNB, position: 58109162, MAF: 0.488). We compared the 3 LMMs in terms of their performance in selecting rs6442089. We selected rs6442089 because it is a well-known SNP from the gene MAP4 that affects blood pressure. We call a SNP significant either if its p value is smaller than a cutoff value or if it ranks within a target number of top-ranked SNPs. For example, if the target number of selected SNPs is 200, then the SNP is called truly identified in a simulated phenotype data set if its rank is within the top 200. Alternatively, if the target cutpoint for the p value is 0.001, then the SNP is called truly identified if its p value is less than 0.001. The proportion is estimated by counting how many times across the 200 simulations the SNP (rs6442089) was in the list of target SNPs or within the p value cutpoint. Figure 1 shows the proportions for the 3 methods. Figure 1A indicates that the GRAMMAR procedure with the baseline measures is more effective than any of the other methods in selecting the SNP with SBP as the outcome. However, the GRAMMAR procedure was not effective with baseline DBP measures relative to the other models (Figure 1C). As seen in Figures 1B and D, the LMM with the mean measures outcome has greater power to detect the genetic variant when a p value cutpoint is used. It appears from the figures that applying the LMM to longitudinal DBP data provides better results in selecting the SNP than any of the other methods. Model 3(A) and Model 3(B) gave qualitatively similar results: in both cases, the performance in selecting the SNP is lower for SBP and higher for DBP.
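The two selection criteria described above (rank within a target number of SNPs, or p value below a cutpoint) can be written as a short function. This is a sketch in Python under stated assumptions: each simulated data set is represented as a dictionary of p values keyed by SNP identifier, and the function name and tie handling (rank is one plus the number of strictly smaller p values) are choices made for the example, not details given in the text.

```python
def selection_proportions(replicates, target, top_k=200, p_cut=0.001):
    """Proportion of simulated data sets in which the target SNP is selected.

    replicates: list of dicts {snp_id: p_value}, one per simulated data set
    target: identifier of the known associated SNP
    Returns (proportion ranked within top_k, proportion with p < p_cut).
    """
    by_rank = 0
    by_cut = 0
    for p_values in replicates:
        p_target = p_values[target]
        # Rank 1 is the smallest p value in this replicate.
        rank = 1 + sum(1 for p in p_values.values() if p < p_target)
        by_rank += rank <= top_k
        by_cut += p_target < p_cut
    n = len(replicates)
    return by_rank / n, by_cut / n

# Toy input with two replicates and three SNPs (made-up p values).
toy = [{"target": 0.0005, "snp_a": 0.2, "snp_b": 0.01},
       {"target": 0.3, "snp_a": 0.02, "snp_b": 0.01}]
prop_rank, prop_cut = selection_proportions(toy, "target", top_k=1)
```

In the toy input the target SNP ranks first and passes the cutpoint in one of the two replicates, so both proportions are 0.5. With the study design, `replicates` would hold 200 dictionaries, one per simulated phenotype data set, for a given model and outcome.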

Figure 1

Identifying a known significant SNP by three LMMs using the simulated GAW18 data.

Application to real data

We applied the 3 LMMs to the real phenotype SBP data after adjusting for covariates, and we ranked the SNPs using the p values for each SNP. We considered the first 3 time points to avoid the missing values of the fourth time point, and we applied the 3 models to the same 845 individuals who were selected in the simulated data analysis. We report the 5 top-ranked SNPs in Table 1. It can be seen that the p values from the LMM with longitudinal measurements are conservative compared to those from the other methods. After investigation of the top 20 SNPs we found 3 SNPs in common across the models. The ranks for the known SNP, rs6442089, are 4376, 3105, and 758 by the LMMs with baseline, longitudinal (Model 3A), and mean measures outcome, respectively. Therefore, the real phenotype data suggest that the LMM with mean measurements performs best among the 3 methods for identifying the SNP rs6442089.

Table 1

Top 5 SNPs by the 3 LMMs considering the outcome SBP

Rank	Model 1: Baseline measures	Model 2: Average measures	Model 3A: Longitudinal measures
1	rs2712464 (7.062e-06)	rs9846213 (1.745e-05)	rs9813958 (4.211e-04)
2	rs2953046 (9.053e-06)	rs3911499 (2.500e-05)	rs2662090 (1.586e-03)
3	rs1445065 (1.864e-05)	rs17005789 (2.813e-05)	rs10511379 (1.651e-03)
4	rs2867840 (2.708e-05)	rs534185 (3.187e-05)	rs12488556 (1.811e-03)
5	rs1386291 (3.357e-05)	rs2161060 (4.696e-05)	rs2366104 (1.810e-03)

p Values are shown in parentheses.


Discussion

In this article we applied 3 LMMs to the GAW18 family data in settings relevant to baseline measures, mean measures, and longitudinally measured data. The statistical analysis of GWAS for GAW18 data using LMMs with longitudinal DBP measurements is capable of revealing the dynamic pattern of genetic control over chromosome 3 but did not perform competitively with other models for longitudinal SBP measurements. Exploratory/graphical analysis of the trajectories of SBP and DBP measurements also supported the conclusion that DBP had more subject-specific variability in slopes than SBP. However, the GRAMMAR approach with single-measure SBP data at baseline can be useful for SNP selection. A general consideration applicable to all the methods discussed here concerns the issue of whether the outcome is linear or nonlinear. An alternative approach could be to relax the conditions imposed on linear models and explore the hidden structure by using a varying coefficient model [10]. Consequently, it will be interesting to apply another method that assumes the effects of SNPs are smooth functions of time.

Conclusion

We showed that, among the methods considered in this manuscript, the linear mixed model with longitudinal measurements of diastolic blood pressure was the most accurate at identifying the known single-nucleotide polymorphism. In contrast, baseline measures performed best with systolic blood pressure, highlighting that, depending on the trajectory profile of the quantitative trait of interest, either just baseline values or serially measured values can be useful in genetic association studies.

Competing interests

The authors declare that they have no competing interests.

Authors' contributions

AH designed the overall study, performed all of the data analysis and drafted the manuscript. JB assisted in conceiving the idea, helped in drafting the manuscript and provided overall supervision. Both authors read and approved the final manuscript.

9 in total

1. Trimming, weighting, and grouping SNPs in human case-control association studies.

Authors: J Hoh; A Wille; J Ott
Journal: Genome Res Date: 2001-12 Impact factor: 9.043

2. SNPassoc: an R package to perform whole genome association studies.

Authors: Juan R González; Lluís Armengol; Xavier Solé; Elisabet Guinó; Josep M Mercader; Xavier Estivill; Víctor Moreno
Journal: Bioinformatics Date: 2007-01-31 Impact factor: 6.937

3. Genomewide rapid association using mixed model and regression: a fast and simple method for genomewide pedigree-based quantitative trait loci association analysis.

Authors: Yurii S Aulchenko; Dirk-Jan de Koning; Chris Haley
Journal: Genetics Date: 2007-07-29 Impact factor: 4.562

4. Neural network analysis of complex traits.

Authors: P R Lucek; J Ott
Journal: Genet Epidemiol Date: 1997 Impact factor: 2.135

5. A common genetic variant of FCN3/CD164L2 is associated with essential hypertension in a Chinese population.

Authors: Jingyi Lu; Ming Li; Rong Zhang; Cheng Hu; Congrong Wang; Feng Jiang; Weihui Yu; Wen Qin; Shanshan Tang; Weiping Jia
Journal: Clin Exp Hypertens Date: 2012-04-03 Impact factor: 1.749

6. Mixed linear model approach adapted for genome-wide association studies.

Authors: Zhiwu Zhang; Elhan Ersoz; Chao-Qiang Lai; Rory J Todhunter; Hemant K Tiwari; Michael A Gore; Peter J Bradbury; Jianming Yu; Donna K Arnett; Jose M Ordovas; Edward S Buckler
Journal: Nat Genet Date: 2010-03-07 Impact factor: 38.330

7. Joint associations of physical activity and aerobic fitness on the development of incident hypertension: coronary artery risk development in young adults.

Authors: Mercedes R Carnethon; Natalie S Evans; Timothy S Church; Cora E Lewis; Pamela J Schreiner; David R Jacobs; Barbara Sternfeld; Stephen Sidney
Journal: Hypertension Date: 2010-06-01 Impact factor: 10.190

8. Genome-wide association study of blood pressure and hypertension.

Authors: Daniel Levy; Georg B Ehret; Kenneth Rice; Germaine C Verwoert; Lenore J Launer; Abbas Dehghan; Nicole L Glazer; Alanna C Morrison; Andrew D Johnson; Thor Aspelund; Yurii Aulchenko; Thomas Lumley; Anna Köttgen; Ramachandran S Vasan; Fernando Rivadeneira; Gudny Eiriksdottir; Xiuqing Guo; Dan E Arking; Gary F Mitchell; Francesco U S Mattace-Raso; Albert V Smith; Kent Taylor; Robert B Scharpf; Shih-Jen Hwang; Eric J G Sijbrands; Joshua Bis; Tamara B Harris; Santhi K Ganesh; Christopher J O'Donnell; Albert Hofman; Jerome I Rotter; Josef Coresh; Emelia J Benjamin; André G Uitterlinden; Gerardo Heiss; Caroline S Fox; Jacqueline C M Witteman; Eric Boerwinkle; Thomas J Wang; Vilmundur Gudnason; Martin G Larson; Aravinda Chakravarti; Bruce M Psaty; Cornelia M van Duijn
Journal: Nat Genet Date: 2009-05-10 Impact factor: 38.330

9. Common variants in the ATP2B1 gene are associated with susceptibility to hypertension: the Japanese Millennium Genome Project.

Authors: Yasuharu Tabara; Katsuhiko Kohara; Yoshikuni Kita; Nobuhito Hirawa; Tomohiro Katsuya; Takayoshi Ohkubo; Yumiko Hiura; Atsushi Tajima; Takayuki Morisaki; Toshiyuki Miyata; Tomohiro Nakayama; Naoyuki Takashima; Jun Nakura; Ryuichi Kawamoto; Norio Takahashi; Akira Hata; Masayoshi Soma; Yutaka Imai; Yoshihiro Kokubo; Tomonori Okamura; Hitonobu Tomoike; Naoharu Iwai; Toshio Ogihara; Itsuro Inoue; Katsushi Tokunaga; Toby Johnson; Mark Caulfield; Patricia Munroe; Satoshi Umemura; Hirotsugu Ueshima; Tetsuro Miki
Journal: Hypertension Date: 2010-10-04 Impact factor: 10.190
