Mixed-effects models for GAW18 longitudinal blood pressure data.

Abstract

In this paper, we propose two mixed-effects models for Genetic Analysis Workshop 18 (GAW18) longitudinal blood pressure data. The first method extends EMMA, an efficient mixed-model association-mapping algorithm. EMMA corrects for population structure and genetic relatedness using a kinship similarity matrix. We replace the kinship similarity matrix in EMMA with an estimated correlation matrix for modeling the dependence structure of repeated measurements. Our second approach is a Bayesian multiple association-mapping algorithm based on a mixed-effects model with a built-in variable selection feature. It models multiple single-nucleotide polymorphisms (SNPs) simultaneously and allows for SNP-SNP interactions and SNP-environment interactions. We applied these two methods to the longitudinal systolic blood pressure (SBP) and diastolic blood pressure (DBP) data from GAW18. The extended EMMA method identified a single SNP on Chr5:75506197 (p-value = 4.67 × 10⁻⁷) for SBP and three SNPs on Chr3:23715851 (p-value = 9.00 × 10⁻⁸), Chr17:54834217 (p-value = 1.98 × 10⁻⁷), and Chr21:18744081 (p-value = 4.95 × 10⁻⁷) for DBP. The Bayesian method identified several additional SNPs on Chr1:17876090 (Bayes factor [BF] = 102), Chr3:197469358 (BF = 69), Chr15:87675666 (BF = 43), and Chr19:41642807 (BF = 33) for SBP. Furthermore, for SBP, we found a single SNP on Chr3:197469358 (BF = 69) that has a strong interaction with age. We further evaluated the performances of the proposed methods by simulations.

Year: 2014 PMID: 25519345 PMCID: PMC4143717 DOI: 10.1186/1753-6561-8-S1-S87

Source DB: PubMed Journal: BMC Proc ISSN: 1753-6561

Background

Genome-wide association studies (GWAS) have been used for examining genetic variants associated with blood pressure and hypertension [1,2]. Because blood pressure changes over time, it is important to collect multiple blood pressure measurements to study time-dependent genetic effects. Genetic Analysis Workshop 18 (GAW18) data included systolic blood pressure (SBP) and diastolic blood pressure (DBP) measurements from a human whole genome sequencing (WGS) study [3]. The study was longitudinal, and the majority of participants had three measurements collected at approximately 5-year intervals. This paper proposes two mixed-effects models for GAW18 longitudinal SBP and DBP data. The first approach extends the EMMA method [4], an efficient mixed-model association-mapping algorithm. EMMA corrects for population structure and genetic relatedness using a kinship similarity matrix. We replace the kinship similarity matrix in EMMA with an estimated correlation matrix for the dependence structure of the multiple measurements from each individual. With this extended approach, hundreds of thousands or even millions of association tests can be performed efficiently. However, this approach tests only one single-nucleotide polymorphism (SNP) at a time and may have low power to map SNPs that interact with each other. Furthermore, it is not straightforward to tweak EMMA software for testing SNP by time interaction, an important question that can be addressed through longitudinal data. To address these concerns, we developed a Bayesian method based on the composite model space framework of Yi et al [5-7]. The proposed method fits multiple SNPs simultaneously. In addition, it allows for SNP-SNP interactions and SNP-time interactions.

Methods

Extended EMMA

For testing association between a given SNP and the longitudinal phenotype, we fit the mixed-effects model

$$y_i = \mu_i + X_i\beta + g_i\gamma 1_{n_i} + u_i + \varepsilon_i, \qquad (1)$$

where $y_i = (y_{i1}, \ldots, y_{in_i})^T$ is the phenotype vector of individual $i$; $\mu_i = \mu 1_{n_i}$, with μ being the grand mean and $1_{n_i}$ being the $n_i \times 1$ vector whose elements are all equal to 1; $X_i$ is the design matrix corresponding to nongenetic covariates (e.g., time), and $\beta$ is the associated nongenetic effects; $g_i$ is the numerically coded genotype of individual $i$ and $\gamma$ is the corresponding SNP effect. In the model, we assume random effect $u_i \sim N(0, \sigma_u^2 \Sigma_i)$, where $\Sigma_i$ is an $n_i \times n_i$ matrix, and random error $\varepsilon_i \sim N(0, \sigma_e^2 I_{n_i})$. The SNP effect can be tested as $H_0: \gamma = 0$ versus $H_1: \gamma \neq 0$ via the likelihood ratio test. For GWAS or WGS data, this test needs to be performed with a large number of SNPs, which can be computationally intensive if we treat the $\Sigma_i$ as the unknowns and estimate them jointly with the fixed effects.

EMMA [4] is an efficient algorithm originally developed for GWAS data in which samples are potentially structured. EMMA models the structure effect via a similarity matrix. An R package that implements EMMA can either estimate the similarity matrix using genotype data or take any similarity matrix provided by users. We tweak EMMA for our purpose. We provide EMMA with the following similarity matrix for the $n$ individuals:

$$K = \mathrm{diag}(\hat\Sigma_1, \ldots, \hat\Sigma_n),$$

where the $\hat\Sigma_i$ are the estimated correlation matrices from model (1) in which $\gamma$ is set to 0. The idea of estimating the $\Sigma_i$ this way is not new and has been used in EMMAX [8], a fast version of EMMA. These estimates should be reasonable unless some SNPs have large effects, which is rare for most complex traits.
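As a minimal sketch of how such a similarity matrix could be assembled, the Python snippet below stacks one correlation block per individual along the diagonal of a single matrix. It assumes the matrix is block-diagonal, which fits unrelated individuals whose repeated measurements are correlated only within a person. The function name `block_diagonal` and the example values are hypothetical and are not taken from the EMMA package.

```python
def block_diagonal(blocks):
    """Stack square matrices (lists of lists) along the diagonal of one matrix."""
    total = sum(len(b) for b in blocks)
    out = [[0.0] * total for _ in range(total)]
    offset = 0
    for b in blocks:
        n = len(b)
        for r in range(n):
            if len(b[r]) != n:
                raise ValueError("each block must be square")
            for c in range(n):
                out[offset + r][offset + c] = float(b[r][c])
        offset += n
    return out


# Two hypothetical individuals: one with 2 visits, one with 3 visits.
sigma_1 = [[1.0, 0.6],
           [0.6, 1.0]]
sigma_2 = [[1.0, 0.6, 0.36],
           [0.6, 1.0, 0.6],
           [0.36, 0.6, 1.0]]

# Similarity matrix over all 5 measurements; entries between
# different individuals stay at zero.
K = block_diagonal([sigma_1, sigma_2])
```

The resulting matrix has one row and column per measurement rather than per individual, so the genotype and covariate vectors passed alongside it would need to be expanded to the same length.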

Bayesian multiple QTL mapping

To further identify SNPs interacting with each other and with other nongenetic factors, such as time, we consider the following mixed-effects model

$$y_i = \mu_i + X_i\beta + u_i + \varepsilon_i, \qquad (2)$$

where $X_i$ is the design matrix corresponding to nongenetic factors, p putative SNPs, two-way interactions between p SNPs (resulting in a total of p(p−1)/2 terms) and other selected SNP-environment interactions (for GAW18 data, we consider p SNP-age interactions); $\beta$ is the vector of all fixed effects. We define $\mu_i$ the same way as in model (1). The random effects $u_i$ and $\varepsilon_i$ are also assumed to follow the same distributions as described in model (1). Model (2) includes the effects of all putative SNPs; thus, the number of such effects can be large. To identify SNPs associated with the trait of interest, we use a Bayesian variable selection procedure in which a set of latent binary variables indicates which of the q genetic effects (be they main genetic effects, epistasis effects, or SNP-by-environment interactions) are associated with the trait.

As in model (1), we assume the matrix $\Sigma_i$ is known. We apply the Cholesky decomposition to $\Sigma_i$ such that $\Sigma_i = L_i L_i^T$, where $L_i$ is the lower triangular Cholesky decomposition matrix of $\Sigma_i$. Then model (2) can be reparameterized as

$$y_i = \mu_i + X_i\beta + \sigma_u L_i b_i + \varepsilon_i,$$

where $b_i \sim N(0, I_{n_i})$. Apart from the prior described next, we use the same prior distributions for μ, β, and the other model parameters as in Yi et al [7]. We set the prior of $\sigma_u$ to $N^{+}(m_0, v_0)$, where $N^{+}$ is the positive truncated normal density with mean $m_0$ and variance $v_0$, and both $m_0$ and $v_0$ are prespecified hyperparameters. The proposed method was implemented on top of the widely used R package R/qtlbim [9] for these GAW18 longitudinal data.
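The Cholesky step can be illustrated with a short, self-contained Python sketch. It factors a small, hypothetical 3 × 3 correlation matrix into a lower triangular factor and multiplies the factor by its own transpose to recover the input, which is the property the reparameterization relies on: a vector of independent standard normal draws multiplied by the factor has the original correlation structure. The helper names and the example matrix are assumptions made for illustration and are not part of R/qtlbim.

```python
import math


def cholesky_lower(a):
    """Return lower triangular L with L * L^T == a (a symmetric positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                d = a[i][i] - s
                if d <= 0.0:
                    raise ValueError("matrix is not positive definite")
                L[i][j] = math.sqrt(d)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def times_own_transpose(L):
    """Compute L * L^T for a square matrix stored as a list of lists."""
    n = len(L)
    return [[sum(L[i][k] * L[j][k] for k in range(n)) for j in range(n)]
            for i in range(n)]


# Hypothetical correlation matrix for one individual with three visits.
sigma = [[1.0, 0.6, 0.36],
         [0.6, 1.0, 0.6],
         [0.36, 0.6, 1.0]]

L = cholesky_lower(sigma)
rebuilt = times_own_transpose(L)  # should reproduce sigma up to rounding
```

Because the factor is computed once from a matrix that is treated as known, only the scalar scale of the random effect remains to be sampled.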

Results and discussion

GAW18 data

The GAW18 data included 849 individuals from 20 extended pedigrees with both phenotype and imputed genotype data. Each participant's blood pressure was measured multiple times at approximately 5-year intervals. Among these 849 individuals, 139 were genetically unrelated and had measurements of age, sex, medication use, smoking status, and blood pressure. Our analysis was restricted to these 139 unrelated individuals. The number of SBP and DBP measurements ranged from one to four per individual. The WGS data provided by GAW18 contained 8,348,674 SNPs from the odd-numbered autosomes. All SNPs provided had passed initial quality control, but among the 2,796,608 SNPs with minor allele frequency (MAF) greater than 0.05, 17,463 failed the Hardy-Weinberg equilibrium (HWE) test (p-value < 0.05/2,796,608, a Bonferroni-corrected genome-wide threshold). We removed all SNPs with MAF less than 0.05 and those that failed the HWE test, leaving 2,779,145 SNPs for the subsequent analyses. To check for population outliers and potential population substructure, we generated a subset of SNPs that are not in high linkage disequilibrium (LD) with each other and performed a multidimensional scaling (MDS) analysis in PLINK [10]. Pairwise scatter plots of the top four MDS scores showed that the 139 individuals are homogeneous in terms of ethnicity. However, two samples, T2DG0400207 and T2DG0400247, have an estimated identity-by-descent (IBD) value of 0.3 between them, indicating that they are likely related. We retained all 139 samples because the number of putatively related samples is small and their inclusion should have a negligible effect on our results. We applied the two proposed procedures to these filtered GAW18 data on the two log-transformed phenotypes, log(SBP) and log(DBP). Five covariates (age, age², sex, medication use, and smoking status) were included in the analyses.
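The SNP filtering described above can be sketched in a few lines of Python. The text does not say which HWE test was used, so this sketch assumes a 1-degree-of-freedom chi-square goodness-of-fit test on genotype counts; the function names and the example counts are hypothetical. Only the two thresholds (MAF of 0.05 and a Bonferroni-corrected HWE p-value of 0.05/2,796,608) and the SNP totals come from the text.

```python
import math

N_COMMON_SNPS = 2_796_608            # SNPs with MAF > 0.05, from the text
HWE_THRESHOLD = 0.05 / N_COMMON_SNPS  # Bonferroni-corrected cutoff
MAF_THRESHOLD = 0.05


def minor_allele_frequency(n_aa, n_ab, n_bb):
    """Frequency of the rarer allele from the three genotype counts."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    return min(p, 1.0 - p)


def hwe_chisq_pvalue(n_aa, n_ab, n_bb):
    """P-value of a 1-df chi-square test of Hardy-Weinberg proportions (assumed test)."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2.0))


def passes_qc(n_aa, n_ab, n_bb):
    """Keep a SNP unless its MAF is below 0.05 or it fails the HWE test."""
    if minor_allele_frequency(n_aa, n_ab, n_bb) < MAF_THRESHOLD:
        return False
    return hwe_chisq_pvalue(n_aa, n_ab, n_bb) >= HWE_THRESHOLD


# Bookkeeping reported in the text: common SNPs minus HWE failures.
remaining_snps = N_COMMON_SNPS - 17_463
```

With 139 individuals, hypothetical counts such as `passes_qc(70, 55, 14)` are retained, whereas a SNP with no heterozygotes, for example `passes_qc(69, 0, 70)`, is removed by the HWE criterion.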
We fitted these data with different covariance structures in SAS 9.2 and selected the spatial power covariance structure for the downstream analysis based on the Akaike information criterion (AIC). Specifically, we let the $(j, j')$ element of $\Sigma_i$ be $\rho^{d_{ijj'}}$, where $d_{ijj'}$ is the time distance between the $j$th and $j'$th examinations for individual $i$. After obtaining the parameter estimate of ρ, $\hat\rho$, from model (1) with $\gamma = 0$, we substituted the kinship matrix K in EMMA by $\hat\Sigma = \mathrm{diag}(\hat\Sigma_1, \ldots, \hat\Sigma_n)$, where the $(j, j')$ element of $\hat\Sigma_i$ is $\hat\rho^{d_{ijj'}}$. Figure 1(a) displays the Manhattan plots of the two phenotypes from the extended EMMA model. For SBP, one SNP on Chr5:75506197 (p-value = 4.67 × 10⁻⁷) reached genome-wide significance (p-value < 5 × 10⁻⁷, a cutoff suggested by Burton et al [11]). For DBP, three SNPs on Chr3:23715851 (p-value = 9.00 × 10⁻⁸), Chr17:54834217 (p-value = 1.98 × 10⁻⁷), and Chr21:18744081 (p-value = 4.95 × 10⁻⁷) exceeded the genome-wide significance threshold.
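A spatial power structure makes the correlation between two measurements decay as a power of the time separating them. The Python sketch below builds such a matrix for one individual, assuming the usual form in which the correlation between examinations equals ρ raised to the absolute time difference. The function name, the visit times, and the value of ρ are hypothetical; the estimate obtained in the analysis is not reported in the text.

```python
def spatial_power_corr(times, rho):
    """Correlation matrix whose (j, k) entry is rho ** |t_j - t_k|."""
    if not 0.0 <= rho < 1.0:
        raise ValueError("rho is assumed to lie in [0, 1)")
    return [[rho ** abs(tj - tk) for tk in times] for tj in times]


# Hypothetical individual examined at baseline and roughly 5 and 10 years later.
visit_years = [0.0, 5.0, 10.0]
rho_hat = 0.95  # illustrative value only

sigma_i = spatial_power_corr(visit_years, rho_hat)
```

Unlike an equally spaced autoregressive structure, this form handles individuals whose examinations are unevenly spaced or who have different numbers of visits, because each entry depends only on the actual time gap.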

Figure 1

Manhattan plots on Genetic Analysis Workshop 18 (GAW18) longitudinal data. (a) Manhattan plots of -log10(p-value) for systolic blood pressure (SBP) and diastolic blood pressure (DBP) from the extended EMMA. The two dashed horizontal lines represent the genome-wide thresholds for suggestive (p-value = 10⁻⁵) and significant (p-value = 5 × 10⁻⁷) associations. (b) Manhattan plots of 2 ln(BF) for the proposed Bayesian method. The two dashed horizontal lines represent the genome-wide thresholds for moderate (BF = 10) and strong (BF = 30) associations.

Because of the limited sample size, it is not feasible to include all available SNPs in our Bayesian analysis. For each phenotype, we selected from the extended EMMA results a list of the 3000 top-ranked SNPs that are not highly correlated with each other (correlation < 0.95, to avoid multicollinearity). We applied the Bayesian method with the same covariates used in the extended EMMA method. For all analyses, the first 1000 MCMC iterations were discarded as burn-in, and the chain was then thinned by keeping every 40th iteration to obtain the MCMC samples for the posterior analysis. Based on the posterior inclusion probability of each SNP, the Bayes factor (BF) (see [6,7] for details) was estimated and used to judge the importance of each SNP. Figure 1(b) shows the Manhattan plots of 2 ln(BF) for the combined genetic effects of each SNP, which include the main effects, epistasis effects, and SNP-age interactions. We found several additional SNPs with strong signals (BF > 30, as suggested by Yandell et al [12]) on Chr1:17876090 (BF = 102), Chr3:197469358 (BF = 69), Chr15:87675666 (BF = 43), and Chr19:41642807 (BF = 33) for SBP. No new SNPs were found for DBP. For SBP, we found that one SNP, located on Chr3:197469358 (BF = 69), has a strong interaction with age.
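To make the link between posterior inclusion probability and Bayes factor concrete, the Python sketch below uses the common definition of a Bayes factor as the ratio of posterior odds to prior odds. This is an assumption: the exact estimator is described in [6,7] and may differ in detail. The function names and the example probabilities are hypothetical; the cutoffs of 10 and 30 are the ones drawn in Figure 1(b), and treating them as inclusive is also an assumption.

```python
import math


def bayes_factor(posterior_prob, prior_prob):
    """Posterior odds of inclusion divided by prior odds of inclusion."""
    if not 0.0 < prior_prob < 1.0:
        raise ValueError("prior_prob must lie strictly between 0 and 1")
    if not 0.0 <= posterior_prob < 1.0:
        raise ValueError("posterior_prob must lie in [0, 1)")
    posterior_odds = posterior_prob / (1.0 - posterior_prob)
    prior_odds = prior_prob / (1.0 - prior_prob)
    return posterior_odds / prior_odds


def evidence_label(bf):
    """Classify a Bayes factor using the cutoffs shown in Figure 1(b)."""
    if bf >= 30:
        return "strong"
    if bf >= 10:
        return "moderate"
    return "below threshold"


# Hypothetical SNP: prior inclusion probability 0.001, posterior 0.05.
bf = bayes_factor(posterior_prob=0.05, prior_prob=0.001)
score = 2.0 * math.log(bf)  # the quantity plotted on the 2 ln(BF) scale
label = evidence_label(bf)
```

On the 2 ln(BF) scale, the two cutoffs correspond to approximately 4.6 and 6.8.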

Simulations

To evaluate the performance of the proposed methods, we conducted the following simulations. From the 3000 top-ranked SBP SNPs previously selected, we randomly picked 10 that are at least 10 Mb apart as causal SNPs. Among the 10 causal SNPs, we let 7 have only main effects, 2 have an epistasis effect, and 1 have a SNP-age interaction. The estimated correlation matrices $\hat\Sigma_i$, along with the estimated variance component $\hat\sigma_u^2$, were used to simulate the random effects $u_i$. We set the residual variance $\sigma_e^2$ to 1. The data were simulated from a mixed-effects model containing these main, epistasis, and SNP-age interaction effects together with the random effects and random errors. A total of 100 simulations were performed.

We compared the two proposed methods with each other and with two existing methods, the original EMMA and R/qtlbim. The last two methods only work for univariate data, so we applied them to the simulated data using only the first measurement of each individual. To make the methods comparable, we generated the receiver operating characteristic (ROC) curve for each method as follows. For a given cutoff of p-value or BF, we counted the true and false positive findings: a significant finding is a true positive if it is located less than 1 Mb from any one of the simulated causal SNPs; otherwise it is a false positive. The ROC curves with false-positive rate less than 0.2 are presented in Figure 2. As expected, our two methods, which use all available data, are more powerful than the corresponding univariate methods, which use only the first measurement. Furthermore, the Bayesian method is more powerful than the extended EMMA, as expected, because (a) the Bayesian model allows for SNP-SNP and SNP-age interactions, which are ignored by the extended EMMA, and (b) the Bayesian model jointly models multiple SNPs, whereas the extended EMMA tests only one SNP at a time.
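The rule for labeling findings as true or false positives can be sketched as follows in Python. The sketch assumes that a finding must lie on the same chromosome as a causal SNP and strictly less than 1,000,000 base pairs away to count as a true positive. The function names and all positions are hypothetical and are not the simulated causal SNPs of this study.

```python
ONE_MB = 1_000_000


def is_true_positive(finding, causal_snps, window=ONE_MB):
    """True if the finding is on the same chromosome and within the window of a causal SNP."""
    chrom, pos = finding
    return any(c == chrom and abs(pos - p) < window for c, p in causal_snps)


def count_findings(findings, causal_snps, window=ONE_MB):
    """Return (true positives, false positives) among the significant findings."""
    tp = sum(1 for f in findings if is_true_positive(f, causal_snps, window))
    return tp, len(findings) - tp


# Hypothetical causal SNPs and significant findings as (chromosome, position).
causal = [(1, 10_000_000), (3, 50_000_000)]
significant = [
    (1, 10_400_000),   # 0.4 Mb from a causal SNP on chromosome 1
    (1, 12_000_000),   # 2 Mb away on chromosome 1
    (5, 50_000_000),   # matching position but a different chromosome
    (3, 49_100_000),   # 0.9 Mb from a causal SNP on chromosome 3
]

tp, fp = count_findings(significant, causal)
```

Repeating this count over a grid of p-value or BF cutoffs, and over the simulated data sets, gives the points from which an ROC curve can be drawn.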

Figure 2

Receiver operating characteristic curves on simulated data. Solid line represents proposed Bayesian method (i.e., modified R/qtlbim) on all data; dashed line, R/qtlbim on first time point data; dotted line, extended EMMA on all data; and dot-dashed line, EMMA with first time point data.

Conclusions

In this paper, we developed two mixed-effects models for the GAW18 longitudinal blood pressure data. The first approach extends the EMMA method. We replace the kinship similarity matrix in EMMA with an estimated correlation matrix to handle the dependence structure of the repeated measurements. The second approach is a Bayesian method that models multiple SNPs simultaneously and allows for SNP-SNP interactions and SNP-time interactions. The advantages of the Bayesian method were demonstrated by our simulations. Both methods are currently developed for unrelated samples, whereas the GAW18 data contain extended pedigrees. Ideally, we should use all available data in our analysis. What complicates the analysis of longitudinal pedigree data is that both the correlation structure of the repeated measurements and the familial correlation structure of related individuals must be considered. We are currently extending the two proposed methods to the GAW18 pedigree data. Furthermore, in both analyses, we assume that the covariance matrix is known up to a constant. For the Bayesian model, this assumption can be relaxed, and we are developing a semiparametric approach in which the covariance matrix is assumed unknown and is estimated with a modified Cholesky decomposition [13]. Last, our Bayesian model for GWAS data relies on a set of preselected putative SNPs. How to select a good set of putative SNPs, especially those with low marginal effects but strong interactions with other SNPs or environmental factors, is challenging and deserves further investigation.

Competing interests

The authors declare that they have no competing interests.

Authors' contributions

WC developed and implemented the methods, performed the statistical analysis, and drafted the manuscript. FZ designed the study, directed the research, revised the manuscript critically, and gave final approval for publication. All authors read and approved the final manuscript.

References

1. A unified Markov chain Monte Carlo framework for mapping multiple quantitative trait loci.

Authors: Nengjun Yi
Journal: Genetics Date: 2004-06 Impact factor: 4.562

2. Random effects selection in linear mixed models.

Authors: Zhen Chen; David B Dunson
Journal: Biometrics Date: 2003-12 Impact factor: 2.571

3. Bayesian model selection for genome-wide epistatic quantitative trait loci analysis.

Authors: Nengjun Yi; Brian S Yandell; Gary A Churchill; David B Allison; Eugene J Eisen; Daniel Pomp
Journal: Genetics Date: 2005-05-23 Impact factor: 4.562

4. R/qtlbim: QTL with Bayesian Interval Mapping in experimental crosses.

Authors: Brian S Yandell; Tapan Mehta; Samprit Banerjee; Daniel Shriner; Ramprasad Venkataraman; Jee Young Moon; W Whipple Neely; Hao Wu; Randy von Smith; Nengjun Yi
Journal: Bioinformatics Date: 2007-01-19 Impact factor: 6.937

5. An efficient Bayesian model selection approach for interacting quantitative trait loci models with many effects.

Authors: Nengjun Yi; Daniel Shriner; Samprit Banerjee; Tapan Mehta; Daniel Pomp; Brian S Yandell
Journal: Genetics Date: 2007-05-04 Impact factor: 4.562

6. PLINK: a tool set for whole-genome association and population-based linkage analyses.

Authors: Shaun Purcell; Benjamin Neale; Kathe Todd-Brown; Lori Thomas; Manuel A R Ferreira; David Bender; Julian Maller; Pamela Sklar; Paul I W de Bakker; Mark J Daly; Pak C Sham
Journal: Am J Hum Genet Date: 2007-07-25 Impact factor: 11.025

7. Efficient control of population structure in model organism association mapping.

Authors: Hyun Min Kang; Noah A Zaitlen; Claire M Wade; Andrew Kirby; David Heckerman; Mark J Daly; Eleazar Eskin
Journal: Genetics Date: 2008-03 Impact factor: 4.562

8. Genome-wide association study of blood pressure extremes identifies variant near UMOD associated with hypertension.

Authors: Sandosh Padmanabhan; Olle Melander; Toby Johnson; Anna Maria Di Blasio; Wai K Lee; Davide Gentilini; Claire E Hastie; Cristina Menni; Maria Cristina Monti; Christian Delles; Stewart Laing; Barbara Corso; Gerjan Navis; Arjan J Kwakernaak; Pim van der Harst; Murielle Bochud; Marc Maillard; Michel Burnier; Thomas Hedner; Sverre Kjeldsen; Björn Wahlstrand; Marketa Sjögren; Cristiano Fava; Martina Montagnana; Elisa Danese; Ole Torffvit; Bo Hedblad; Harold Snieder; John M C Connell; Morris Brown; Nilesh J Samani; Martin Farrall; Giancarlo Cesana; Giuseppe Mancia; Stefano Signorini; Guido Grassi; Susana Eyheramendy; H Erich Wichmann; Maris Laan; David P Strachan; Peter Sever; Denis Colm Shields; Alice Stanton; Peter Vollenweider; Alexander Teumer; Henry Völzke; Rainer Rettig; Christopher Newton-Cheh; Pankaj Arora; Feng Zhang; Nicole Soranzo; Timothy D Spector; Gavin Lucas; Sekar Kathiresan; David S Siscovick; Jian'an Luan; Ruth J F Loos; Nicholas J Wareham; Brenda W Penninx; Ilja M Nolte; Martin McBride; William H Miller; Stuart A Nicklin; Andrew H Baker; Delyth Graham; Robert A McDonald; Jill P Pell; Naveed Sattar; Paul Welsh; Patricia Munroe; Mark J Caulfield; Alberto Zanchetti; Anna F Dominiczak
Journal: PLoS Genet Date: 2010-10-28 Impact factor: 5.917

9. Genome-wide association study of blood pressure and hypertension.

Authors: Daniel Levy; Georg B Ehret; Kenneth Rice; Germaine C Verwoert; Lenore J Launer; Abbas Dehghan; Nicole L Glazer; Alanna C Morrison; Andrew D Johnson; Thor Aspelund; Yurii Aulchenko; Thomas Lumley; Anna Köttgen; Ramachandran S Vasan; Fernando Rivadeneira; Gudny Eiriksdottir; Xiuqing Guo; Dan E Arking; Gary F Mitchell; Francesco U S Mattace-Raso; Albert V Smith; Kent Taylor; Robert B Scharpf; Shih-Jen Hwang; Eric J G Sijbrands; Joshua Bis; Tamara B Harris; Santhi K Ganesh; Christopher J O'Donnell; Albert Hofman; Jerome I Rotter; Josef Coresh; Emelia J Benjamin; André G Uitterlinden; Gerardo Heiss; Caroline S Fox; Jacqueline C M Witteman; Eric Boerwinkle; Thomas J Wang; Vilmundur Gudnason; Martin G Larson; Aravinda Chakravarti; Bruce M Psaty; Cornelia M van Duijn
Journal: Nat Genet Date: 2009-05-10 Impact factor: 38.330

10. Data for Genetic Analysis Workshop 18: human whole genome sequence, blood pressure, and simulated phenotypes in extended pedigrees.

Authors: Laura Almasy; Thomas D Dyer; Juan M Peralta; Goo Jun; Andrew R Wood; Christian Fuchsberger; Marcio A Almeida; Jack W Kent; Sharon Fowler; Tom W Blackwell; Sobha Puppala; Satish Kumar; Joanne E Curran; Donna Lehman; Goncalo Abecasis; Ravindranath Duggirala; John Blangero
Journal: BMC Proc Date: 2014-06-17
