Generalized admixture mapping for complex traits.

Bin Zhu, Allison E Ashley-Koch, David B Dunson.

Abstract

Admixture mapping is a popular tool to identify regions of the genome associated with traits in a recently admixed population. Existing methods have been developed primarily for identification of a single locus influencing a dichotomous trait within a case-control study design. We propose a generalized admixture mapping (GLEAM) approach, a flexible and powerful regression method for both quantitative and qualitative traits, which is able to test for association between the trait and local ancestries in multiple loci simultaneously and adjust for covariates. The new method is based on the generalized linear model and uses a quadratic normal moment prior to incorporate admixture prior information. Through simulation, we demonstrate that GLEAM achieves lower type I error rate and higher power than ANCESTRYMAP both for qualitative traits and more significantly for quantitative traits. We applied GLEAM to genome-wide SNP data from the Illumina African American panel derived from a cohort of black women participating in the Healthy Pregnancy, Healthy Baby study and identified a locus on chromosome 2 associated with the averaged maternal mean arterial pressure during 24 to 28 weeks of pregnancy.

Keywords: generalized linear model; local ancestry; mapping by admixture linkage disequilibrium; quadratic normal moment prior; quantitative traits

Year: 2013 PMID: 23665878 PMCID: PMC3704244 DOI: 10.1534/g3.113.006478

Source DB: PubMed Journal: G3 (Bethesda) ISSN: 2160-1836 Impact factor: 3.154

Admixture mapping, also known as mapping by admixture linkage disequilibrium, has become an important tool for localizing disease genes. A number of admixture mapping studies have successfully identified candidate loci associated with common complex traits and biomarkers (Reich ; Zhu ; Freedman ; Kao ; Yang ). As a genome-wide association approach, admixture mapping aims to identify susceptibility loci, which confer risk or are linked with other loci harboring risk variants, for complex-traits that have different prevalences between ancestral populations (McKeigue 2005; Winkler ). In recently admixed populations, such as African Americans or Hispanic Americans, the chromosome resembles a mosaic of ancestry blocks, with alleles inherited together from one ancestral population within each block. The ancestral populations have different risks for the trait, which is assumed to be due in part to frequency differences in risk variants. The block containing the risk variant is more likely to have originated from the high-risk ancestral population than the low-risk ancestral population. Hence, detecting the association between ancestry block and trait helps us to localize the susceptibility loci. The ancestral status of a block at a specific genomic region, or local ancestry, is unobserved and can be estimated based on ancestry informative markers (AIMs), such as single-nucleotide polymorphisms (SNPs), which vary in frequency across ancestral populations. AIMs tag the status of an ancestry block, similar to that of tagSNPs, which are used to characterize common haplotypes in a chromosomal region. In the African-American population, the linkage disequilibrium due to admixture extends for a much wider region than the linkage disequilibrium between haplotypes (Smith ; Patterson ). 
Hence, compared with the tagSNP-based genome-wide association study, admixture mapping requires far fewer markers to tag the whole genome and therefore increases the detection power at a reduced resolution, which is still greater than that of linkage analysis (Patterson ; Smith and O’Brien 2005). Moreover, admixture mapping is less vulnerable to allelic heterogeneity because it relies on local ancestry instead of alleles directly. Given the local ancestries of each individual, several hypothesis testing-based approaches have been proposed to test, one locus at a time, the null hypothesis that the AIM is unlinked to the complex trait or disease for a dichotomous trait within a case-control study design. McKeigue (1998) proposed a test for gametic disequilibrium between an AIM locus and the trait locus, conditional on the parental admixture. Patterson suggested a Bayesian likelihood ratio test, comparing the likelihood under the alternative hypothesis (a given AIM locus is associated with the trait) vs. the one under the null hypothesis, for cases and controls respectively. Zhu described a Z-score statistic, similar to the one proposed by Montana and Pritchard (2004), for testing whether the estimated local ancestry proportion is equal to one under the null hypothesis for case-control and case-only studies. In contrast, few methods have been proposed for quantitative traits or for considering multiple loci simultaneously while adjusting for other risk factors. To apply the aforementioned admixture methods primarily developed for a dichotomous trait, the common practice has been to dichotomize subjects with the least and greatest q% (e.g., 20%) of the quantitative trait value as cases and controls; the remaining subjects with in-between quantitative trait values are then discarded (Reich ; Cheng ; Scherer ). In addition, ADMIXMAP (Hoggart ) has been proposed for quantitative traits based on the generalized linear model, which is also used by Basu and Zhu for one locus at a time. 
However, complex traits are commonly caused by the joint effects of multiple genes and other risk factors, such as age, sex, and smoking status. Investigating the association between AIM loci and a trait one locus at a time, without considering other loci or risk factors, may capture only a small proportion of the joint effects and may lead to inconsistent conclusions. Similar considerations have been addressed in association mapping using shrinkage priors (Wu ; Guan and Stephens 2011). With these motivations, we propose regression-based generalized admixture mapping (GLEAM) for both quantitative and qualitative traits. The new approach is able to examine the association between the complex trait and single or multiple loci simultaneously while also adjusting for other risk factors. GLEAM is based on generalized linear models (GLMs) (McCullagh and Nelder 1989), with linear regression for continuous traits, logistic regression for binary (e.g., case-control) traits, and Poisson regression for count traits. The predictors in the GLM include local ancestries at the given AIM loci and other risk factors. The local ancestry is defined as the number of alleles from the high-risk ancestral population, for example, 0, 1, or 2 alleles from African ancestry at a given AIM locus. We assume for complex genetic traits that most loci have no association with the trait, a few loci may have small to modest association (e.g., odds ratio <2 for binary traits), and the loci with greater proportions of disease-causing alleles from the high-risk population would possibly have stronger association with the traits. This prior knowledge is incorporated into GLEAM by using a quadratic normal moment (QNM) prior (Johnson and Rossell 2010) for the coefficients in the GLM (see Material and Methods), with the benefit of reducing the type I error while increasing the power, as demonstrated by the simulations in Results. 
The number of AIMs (1500 to 3000) (Smith ) is usually larger than the number of study subjects, and keeps increasing (>4000) (Tandon ) with advances due to the HapMap project (The International HapMap Consortium 2005) and commercially available genome-wide SNP arrays. It is not feasible to consider all loci together simultaneously due to the “curse of dimensionality” (Bellman 1961). Rather, we propose a two-stage approach: in the first stage, we examine the association between local ancestries and the trait for one locus at a time and select a small subset of susceptibility loci; in the second stage, the associations between the various combinations of these selected loci and the trait are evaluated and the most significant ones are reported. The associations in both stages are assessed by the Bayes factor (BF), the ratio between the likelihood of observed traits under the alternative hypothesis (presence of association between single or multiple loci and the trait) and that under the null hypothesis (lack of association). In contrast to association mapping based on directly measured SNPs, the local ancestries are unobserved and are inferred on the basis of the AIMs by using the Hidden Markov Model (HMM) detailed in the Appendices. At each AIM locus, the number of alleles from the high-risk ancestral population is imputed multiple times for every subject, using a Markov chain Monte Carlo (MCMC) algorithm. By using the multiple imputed datasets of local ancestries, we are able to assess the association between the traits and local ancestries directly while taking imputation uncertainty into account through Bayesian averaging. Importantly, our multiple imputation approach preserves the admixture linkage disequilibrium between the AIM loci, which is crucial for multilocus admixture mapping in GLEAM. In addition, GLEAM can also use the local ancestries sampled by other local ancestry inference methods, such as HAPMIX (Price ).

Material and Methods

Generalized linear model with QNM prior

GLEAM is a regression method that extends the current approaches in various ways. The most obvious extension is to accommodate both quantitative and qualitative traits y through a generalized linear model with the ability to adjust for covariates E = (E_1, E_2, …, E_q)′. Specifically, we use the linear model for continuous traits,

$$ y = \mathbf{S}'\boldsymbol{\beta} + \mathbf{E}'\boldsymbol{\alpha} + \varepsilon, $$

and the logistic model for dichotomous traits,

$$ \operatorname{logit}\{\Pr(y = 1)\} = \mathbf{S}'\boldsymbol{\beta} + \mathbf{E}'\boldsymbol{\alpha}, $$

where p local ancestries S = (S_1, S_2, …, S_p)′ are considered and centered to have mean zero, β = (β_1, β_2, …, β_p)′ and α = (α_1, α_2, …, α_q)′ are the regression coefficients for S and E respectively, and ε ~ N(0, σ²). We use the Bayes factor to assess the admixture association between local ancestries and the trait of interest. The Bayes factor is the ratio of the marginal likelihoods under the alternative hypothesis, H1: β_j ≠ 0 for j = 1, …, p, and the null hypothesis, H0: β_j = 0 for j = 1, …, p. Marginal likelihoods remove the parameters from the likelihood by integrating over the prior distribution. The larger the Bayes factor, the stronger the evidence in favor of H1. As a prior distribution for β under H1, we use the QNM prior having density

$$ \pi(\boldsymbol{\beta}) = \frac{\boldsymbol{\beta}'\Sigma^{-1}\boldsymbol{\beta}}{p\,\tau\sigma^{2}}\; N_p(\boldsymbol{\beta};\, \mathbf{0},\, \tau\sigma^{2}\Sigma), $$

where N_p(β; 0, τσ²Σ) is the p-dimensional multivariate normal density with mean vector 0 and covariance matrix τσ²Σ, and τ is the dispersion parameter. The QNM prior is able to accommodate the case of a large number of loci of tiny effect. As shown in Figure 1A, the modes of the prior distribution move toward zero when we reduce the value of τ. For illustration purposes, we only showed a particular value of τ = 0.01, but as we decrease this value, tiny effects are accommodated. For data containing a large number of loci of tiny effect, the empirical Bayes approach should estimate a very small value of τ, and the QNM prior will concentrate on very small effect sizes. Usual priors face major problems in distinguishing the signal from the noise, and we argue that nonlocal priors such as the quadratic normal moment prior provide more accurate results for genetic effects on complex traits. 
Hence, the QNM prior increases the evidence in favor of both the true null and the true alternative hypothesis, compared to other prior distributions (e.g., intrinsic and Cauchy priors) (Johnson and Rossell 2010). Moreover, we specify σ²Σ as the covariance matrix of the (iteratively weighted) least squares estimate of β in the GLM. This choice not only leads to convenient computation but also easily incorporates the prior knowledge about the effect of local ancestry on the trait. For example, when S is orthogonal to E, Σ = (S′S)⁻¹ in the linear model for the continuous trait, with S here denoting the matrix of centered local ancestries (subjects in rows, loci in columns). As illustrated by Figure 1B, the QNM prior with Σ = (S′S)⁻¹ suggests that for each locus, the greater the proportion of alleles from the high-risk population (p), on average the larger the risk effect of local ancestry. Such relationships frequently are observed in admixture mapping but not in association mapping based on SNPs in general. More importantly, when we investigate multiple loci simultaneously, it is crucial to take the correlation (linkage disequilibrium, LD) between the local ancestries into consideration. Figure 2 plots several volcano-shaped bivariate QNM densities for various correlations between two local ancestries. It is clear that two loci in admixture linkage equilibrium (as shown in Figure 2A), such as two loci on different chromosomes, would have independent risk effects, and that two loci with high admixture LD (as shown in Figure 2D), usually located in the same gene, would have similar risk effects.
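The behavior of the prior described above can be checked numerically in the univariate case. The following is a minimal sketch, assuming the first-order moment form (β²/v) × N(β; 0, v) with v = τσ²Σ; the function names and the helper `expected_scale`, which replaces (S′S)⁻¹ for one centered locus by its expected value under Hardy-Weinberg equilibrium, are illustrative assumptions and are not taken from the GLEAM software.

```python
import math

def qnm_density(beta, tau, sigma2=1.0, s=1.0):
    """Univariate QNM density with normal scale v = tau * sigma2 * s.

    Assumed form: (beta^2 / v) * N(beta; 0, v). Dividing by v normalizes the
    density, because E[beta^2] = v under N(0, v).
    """
    v = tau * sigma2 * s
    normal = math.exp(-beta * beta / (2.0 * v)) / math.sqrt(2.0 * math.pi * v)
    return (beta * beta / v) * normal

def qnm_mode(tau, sigma2=1.0, s=1.0):
    """Positive mode of the density; the other mode is its mirror image."""
    return math.sqrt(2.0 * tau * sigma2 * s)

def expected_scale(p, n):
    """Approximate (S'S)^{-1} for one centered locus under Hardy-Weinberg
    equilibrium, using E[S'S] = n * 2p(1 - p). This is an assumption."""
    return 1.0 / (n * 2.0 * p * (1.0 - p))

# Modes shrink toward zero as tau decreases (compare Figure 1A).
for tau in (0.1, 0.05, 0.01):
    print(tau, qnm_mode(tau, s=expected_scale(0.8, 1000)))

# Modes move outward as the ancestry proportion p grows (compare Figure 1B).
for p in (0.8, 0.9, 0.99):
    print(p, qnm_mode(0.01, s=expected_scale(p, 1000)))
```

The density is exactly zero at β = 0, which is what distinguishes a nonlocal prior from a normal or Cauchy prior centered at the null value.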

Figure 1

Univariate quadratic normal moment prior (A) for τ = 0.01 (—), τ = 0.05 (⋅⋅⋅), and τ = 0.1 (−⋅−) when p = 0.8; (B) for p = 0.8 (—), p = 0.9 (⋅⋅⋅), and p = 0.99 (−⋅−) when τ = 0.01. In both cases, σ² = 1 and Σ = (S′S)⁻¹ with Pr(S = 0) = (1−p)², Pr(S = 1) = 2p(1−p), and Pr(S = 2) = p².

Figure 2

Bivariate quadratic normal moment prior with τσ² = 0.1 and Σ = (S′S)⁻¹, where S = [S_1, S_2] has columns S_1 = (S_{1,1}, S_{2,1}, …, S_{1000,1})′ and S_2 = (S_{1,2}, S_{2,2}, …, S_{1000,2})′, with entries in {0, 1, 2}. We introduce correlation between S_1 and S_2 through the latent variables (Z_1, Z_2), where Z_1 ~ N(0, 1), Z_2 ~ N(0, 1), and Cov(Z_1, Z_2) = ρ. Let S_1 = 0 if Z_1 ≤ C_0; S_1 = 2 if Z_1 > C_1; and S_1 = 1 otherwise (and likewise S_2 from Z_2), with C_0 = Φ⁻¹((1−p)²) and C_1 = Φ⁻¹(1−p²), where Φ⁻¹(⋅) denotes the standard normal inverse cumulative distribution function. We consider four scenarios when p = 0.8: (A) ρ = 0; (B) ρ = 0.25; (C) ρ = 0.5; and (D) ρ = 0.75, with contours drawn beneath the probability density function’s surface.

Under the QNM prior for β, the Bayes factor has a simple closed form, equation (3), in which β̂ is the maximum likelihood estimate of β, adjusted for other risk covariates when necessary, Σ̂ is the corresponding covariance matrix estimate, and τ̂ and σ̂² are the empirical Bayes estimates. Bayes factor (3) is used to identify the loci associated with the traits, as detailed below.
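To make the role of the QNM prior in the Bayes factor concrete, the sketch below evaluates a closed form that follows under two stated assumptions: the maximum likelihood estimate is approximately normal around β with covariance σ²Σ, and the prior is the first-order moment prior with normal scale τσ²Σ. Writing W for the Wald statistic β̂′Σ̂⁻¹β̂/σ̂², integrating β out gives BF = (1+τ)^(−p/2−1) × [1 + τW/(p(1+τ))] × exp{τW/(2(1+τ))}. This expression is derived here for illustration and may differ in form from the article's own equation; the function name `log10_bf_qnm` is hypothetical, and τ is passed in rather than estimated by empirical Bayes.

```python
import math

def log10_bf_qnm(wald, p, tau):
    """log10 Bayes factor of H1 (QNM prior on beta) against H0: beta = 0.

    wald : beta_hat' Sigma_hat^{-1} beta_hat / sigma2_hat (Wald statistic)
    p    : number of loci tested jointly
    tau  : dispersion parameter of the prior (assumed known here)
    """
    shrink = tau / (1.0 + tau)
    log_bf = (-(p / 2.0 + 1.0) * math.log1p(tau)   # (1 + tau)^(-p/2 - 1)
              + math.log1p(shrink * wald / p)      # 1 + tau W / (p (1 + tau))
              + 0.5 * shrink * wald)               # exp{tau W / (2 (1 + tau))}
    return log_bf / math.log(10.0)

# Example: a single locus whose estimate lies four standard errors from zero.
print(log10_bf_qnm(wald=16.0, p=1, tau=0.5))
```

With W = 0 the Bayes factor is below one for any τ > 0, so an estimate at the null value counts as evidence for H0, and the Bayes factor grows with W thereafter.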

Generalized admixture mapping procedure

We propose a two-stage approach for GLEAM. In the first stage, we examine the marginal association between a single AIM locus and the trait, using the Bayes factor (3), one locus at a time for J AIM loci. The loci at which log10BF(y) > δ are considered susceptibility loci. Although the “one locus at a time” approach explores the marginal association and is widely used, marginal association only reflects part of the relationship between the AIM loci and the trait. Several loci in different regions may show associations with the trait. Thus, it is desirable to quantify the evidence for joint association of multiple loci with the trait. For this reason, in the second stage, we list all possible combinations of the susceptibility loci selected in the first stage. For each set of susceptibility loci, we can again calculate the Bayes factor for the joint association at those loci simultaneously. The most significant ones are reported. The local ancestries at the AIM loci are unobserved and imputed from the HMM. The imputation uncertainty can be properly accounted for by calculating a weighted average of the Bayes factors over the imputed local ancestry datasets, which is similar to the strategy used by Guan and Stephens (2008) in imputation-based association mapping for testing untyped variants.
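The two stages and the averaging over imputed datasets can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: `average_log10_bf` and `two_stage` are hypothetical names, equal weights are assumed when none are given, and the function that scores a set of loci (`log10_bf`) is supplied by the caller.

```python
import math
from itertools import combinations

def average_log10_bf(log10_bfs, weights=None):
    """Weighted average of Bayes factors over imputed datasets, returned on
    the log10 scale. Equal weights are assumed if none are supplied."""
    n = len(log10_bfs)
    weights = weights or [1.0 / n] * n
    top = max(log10_bfs)  # factor out the largest term for numerical stability
    total = sum(w * 10.0 ** (b - top) for w, b in zip(weights, log10_bfs))
    return top + math.log10(total)

def two_stage(loci, log10_bf, delta=2.0):
    """Stage 1: keep loci whose single-locus log10 BF exceeds delta.
    Stage 2: score every combination of the kept loci and rank them.

    `log10_bf` maps a tuple of loci to a log10 Bayes factor."""
    selected = [j for j in loci if log10_bf((j,)) > delta]
    scored = []
    for k in range(1, len(selected) + 1):
        for subset in combinations(selected, k):
            scored.append((subset, log10_bf(subset)))
    scored.sort(key=lambda item: item[1], reverse=True)
    return selected, scored
```

Because the second stage enumerates all subsets of the selected loci, its cost doubles with each additional locus, which is one reason the first-stage screen needs to return a small set.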

Simulation studies

We carried out simulation studies to assess the performance of GLEAM in terms of type I error rate and power under various scenarios and compared it with the method based on Bayesian likelihood ratio (BLR) by Patterson , which is implemented by the software ANCESTRYMAP (http://genepath.med.harvard.edu/∼reich/Software.htm) as well as regularized regression methods Lasso and elastic net (Tibshirani 1996; Zou and Hastie 2005; Friedman ). GLEAM and ANCESTRYMAP use slightly different HMMs to impute the local ancestries and regularized regression methods require given local ancestries. Because of these differences, we assumed the true local ancestries were given and focused on evaluating the ability of localizing susceptibility loci instead of estimating local ancestries. Our simulations were based on empirical data of local ancestries for 1001 African-American subjects from the HPHB Study (Miranda ), with 1296 AIM loci measured across the genome. The MATLAB codes for simulating and analyzing the data are included in a Supporting Information folder online. We started by investigating the type I error rates for the local ancestries that were scattered around different regions of the genome and in linkage equilibrium. Under this scenario, the falsely localized AIM locus would be in the region remote from the true disease causing locus, which leads to a false positive finding. We first randomly sampled 1000 AIM loci with replacement from 1296 AIM loci for 1000 subjects. At each AIM locus, we simulated the local ancestries measured by the number of alleles from the African ancestral population from their maximum a posteriori (MAP) frequency estimates under the assumption of Hardy-Weinberg equilibrium. 
Ten sets of trait data were then generated such that we were able to assess the type I error rates under the genome-wide threshold level (e.g., α = 10⁻⁴), by using the following null model for continuous traits: y = αE + ε, and for binary traits: logit{Prob(y = 1)} = αE, where the continuous risk covariate E and the measurement error ε followed standard normal distributions. We considered two situations: α = 0 in the absence of a covariate effect and α = 1 in the presence of a covariate effect. We next examined power under the single locus alternative models. We simulated 100 sets of traits. Each set included 1000 subjects and one disease-associated local ancestry whose location was randomly sampled from 259 AIM loci, at which the proportion of African ancestry ranged from 0.8321 to 0.8817 and was in the top 20% among the 1296 AIM loci. Given the local ancestry S, with the continuous covariate E and measurement error ε generated in the same way as for the null model, continuous traits were simulated from y = αE + βS + ε and binary traits from logit{Prob(y = 1)} = αE + βS. Under both models, β was specified as β = c × proportion of African ancestral population, which reflected the a priori observation that the locus with the larger proportion of the high-risk ancestral (here African) population usually demonstrated stronger association with the traits. For continuous traits, we chose the values of the effect size multiplier c as 0.2, 0.25, 0.3, 0.35, and 0.4, respectively, with the largest possible effect size equal to 0.3527. Similarly, we picked the c values as 0.4, 0.5, 0.6, 0.7, and 0.8 for binary traits with the largest possible odds ratio equal to 1.8537. We further considered a multilocus alternative model where two local ancestries were associated with the traits and there existed admixture linkage disequilibrium. 
To do so, we generated an artificial chromosome composed of two pieces from chromosome 1 and chromosome 4 with lengths of 139.50 Mb and 114.88 Mb, respectively, for 1000 subjects, based on empirical data on local ancestries from the HPHB study. In the middle of each chromosome piece with 51 loci, there is one locus whose proportion of African ancestry was among the highest of all 1296 AIM loci. In the simulations, those two loci are assumed to be associated with the traits. We generated 100 sets of continuous and binary traits respectively, each of which was simulated similarly to the single locus alternative model except with two local ancestries involved and both effect size multiplier c values set at 0.7 for continuous traits and 0.35 for binary traits. The simulated datasets were analyzed by GLEAM and the BLR method. Because the BLR method was primarily developed for binary traits, it required transformation of continuous traits into binary ones, such as defining the subjects with the top 20% of trait values as cases and those with the bottom 20% as controls.
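The single locus alternative model described above can be sketched in a few lines. This is a simplified illustration, assuming one locus, one standard normal covariate, and local ancestry drawn as the sum of two independent Bernoulli(p) alleles (which yields the Hardy-Weinberg proportions); the function names are hypothetical, and the study's own MATLAB code is not reproduced here.

```python
import math
import random

def sample_ancestry(p, rng):
    """Number of African-ancestry alleles (0, 1, or 2) under Hardy-Weinberg
    equilibrium, drawn as the sum of two independent Bernoulli(p) alleles."""
    return (rng.random() < p) + (rng.random() < p)

def simulate_traits(n, p, c, alpha, rng):
    """Simulate n subjects from y = alpha*E + beta*S + eps (continuous) and
    logit{Pr(y = 1)} = alpha*E + beta*S (binary), with beta = c * p."""
    beta = c * p
    rows = []
    for _ in range(n):
        s = sample_ancestry(p, rng)
        e = rng.gauss(0.0, 1.0)                 # continuous risk covariate E
        eta = alpha * e + beta * s              # linear predictor
        y_cont = eta + rng.gauss(0.0, 1.0)      # add standard normal error
        y_bin = int(rng.random() < 1.0 / (1.0 + math.exp(-eta)))
        rows.append((s, e, y_cont, y_bin))
    return beta, rows

rng = random.Random(2013)  # fixed seed so the sketch is reproducible
beta, rows = simulate_traits(n=1000, p=0.8817, c=0.4, alpha=1.0, rng=rng)
print(round(beta, 4))
```

Setting c = 0 recovers the null model, so the same function can be used to generate data for both the type I error and the power simulations.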

Results

Figure 3 presents the empirical type I error rates for both the binary and continuous traits, with or without covariate effects. For the GLEAM and the BLR methods, we chose a threshold of 2 for log10BF(y) to control the genome-wide type I error rates. The regularization parameters of Lasso and elastic net are chosen to minimize the cross-validation error. The loci with nonzero regression coefficients are regarded as the ones associated with the traits. As illustrated in Figure 3A and Figure 3B, under the null model in which all the local ancestries are in linkage equilibrium, the type I error rate is controlled at a low level, with the median around 5 × 10⁻⁴ for GLEAM and 4.2 × 10⁻³ for the BLR method. In both cases, those type I error rates seem overly conservative. However, in the application to real data, slight admixture linkage disequilibrium between the AIM loci will significantly inflate the type I error rate close to the nominal levels (i.e., α = 0.05 or 0.005), which is discussed in the later paragraphs. Comparing Figure 3A and Figure 3B reveals that the type I error rates of GLEAM are consistently smaller than those of the BLR method and are little affected by the presence of covariate effects when properly adjusted. The covariates are not considered by the BLR method and have a mixed effect on its type I error rates: the median is slightly reduced while the maximal type I error rate increases. For the regularized regression methods Lasso and elastic net, the type I errors are significantly inflated, as shown in Figure 3C and Figure 3D. In addition, when a nonzero covariate effect is present, the type I errors increase further.

Figure 3

The type I error rates under the null model (note the different scaling of the Y-axis for panels). The type I error rates are presented for both the binary and continuous traits respectively, with or without covariate effect (denoted by E1 and E0, respectively). For each simulated dataset, we calculate one type I error rate for each method. The results for 100 replications are summarized by the box plots, where the center bar is the median, the bottom and top of the box are the 25th and 75th percentiles, and the whiskers stretch out to the extreme values. (A) Generalized admixture mapping; (B) Method based on BLR; (C) Regularized regression with Lasso; (D) Regularized regression with elastic net.

Power of the methods was also evaluated for binary and continuous traits under the single locus alternative model, with or without covariate effects. We considered various effect sizes of local ancestries, with the results shown in Figure 4. For the binary trait, when the effect size is small, the BLR method performs better with larger power. As the effect sizes increase, GLEAM gradually outperforms the BLR method. For both methods, covariates have moderate effects on power, which is more obvious for the smaller effect sizes. For the continuous trait, GLEAM performs significantly better at each effect size. These results were expected since the BLR method discards part of the dataset in order to transform the continuous trait into a binary one (case vs. control), which inevitably loses power. For all situations considered, the power of the GLEAM approach increases with the local ancestry effect size, most rapidly when the effect sizes are smaller, and then levels off with larger effect sizes. In comparison, the power of the BLR method increases roughly linearly. Both GLEAM and BLR are less powerful than the regularized methods, especially when the effect sizes are small. As the effect size grows, the power of GLEAM quickly increases and becomes comparable to that of regularized regression.

Figure 4

Powers for single locus alternative models. Power is calculated for each dataset with 100 replications total for the binary or continuous traits simulated under the single locus alternative model with or without covariate effect. The × indicates the median of powers by GLEAM and • denotes the median of powers by the method based on Bayesian likelihood ratio; ∘ denotes the median of regularized regression with lasso; ∘ denotes the median of regularized regression with elastic net. The whiskers on each bar represent the minimal and maximal powers respectively. The effect sizes of local ancestries are equal to the product of the effect size multiplier c and the proportion of African ancestry. (A) Binary traits without covariate effect; (B) Binary traits with covariate effect; (C) Continuous traits without covariate effect; (D) Continuous traits with covariate effect.

To understand the impact of admixture linkage disequilibrium on type I error rates and to evaluate the ability to localize multiple loci simultaneously, we generated a set of artificial chromosomes as described previously, where two loci, named Locus 1 and Locus 2, were associated with the traits. Besides Locus 1 and Locus 2, we divided the remaining loci into three regions: region 1 (REG1) with 42 loci and region 2 (REG2) with 35 loci, where the admixture linkage disequilibrium measured by the correlation coefficient between a given locus in these regions and Locus 1 or Locus 2 was larger than 0.12 respectively; and region 3 (REG3), the unassociated loci which did not belong to region 1 or region 2. Strictly speaking, the identified loci other than Locus 1 and Locus 2 were all false positives. However, in contrast to the loci found in region 3, which were completely false findings, the loci identified in region 1 and region 2 were partially correct and could be regarded as low-resolution findings instead, because the true associated locus did exist in the nearby region. Therefore, we evaluated the false positives in the three regions separately. An ideal method under the prespecified genome-wide threshold would lead to few completely false positives in region 3 and to a small number of partially false positives in regions 1 and 2, while being able to identify the true associated loci with high frequency. Table 1 summarizes the frequencies of identified loci for each locus or locus combination at different regions by GLEAM, BLR, and regularized regression methods. For the GLEAM method, we applied the two-step approach outlined in the “Generalized admixture mapping procedure” subsection. The results from applying the first step only (GLEAM1) and from applying the two-step approach (GLEAM2) are both presented. For binary traits, both the BLR method and GLEAM1 could localize both Locus 1 and Locus 2 with high power. The type I error rates in region 3 were around the nominal level (0.025 and 0.003, respectively). The type I error rates in region 1 and region 2 were higher than the ones in region 3, which would decrease the resolution of the finding. Compared with GLEAM1, further applying the second step of the generalized admixture mapping procedure (GLEAM2) could significantly improve the resolution by reducing the type I errors in region 1 (from 0.013 to 0.002) and region 2 (from 0.014 to 0.003). For continuous traits, GLEAM2 also performed best, with much higher power and lower type I error rate than the BLR method. Similar to the simulation results under the null and single locus alternative models, regularized regressions show marginally higher power at the cost of an inflated type I error rate, e.g., power 1 for detecting both Locus 1 and Locus 2 with type I error rates of 0.023 for Lasso and 0.029 for elastic net in region 3 for the continuous trait.

Table 1

The frequency of identified loci for each locus or locus combination at different regions of the artificial chromosome

Trait	Method	REG1	REG2	REG3	Locus1	Locus2	Locus1/2^a
Binary	BLR	0.103	0.047	0.025	0.000	0.000	1.000
	GLEAM1^b	0.013	0.014	0.003	0.020	0.020	0.960
	GLEAM2^c	0.002	0.003	0.001	0.030	0.030	0.940
	Lasso	0.030	0.025	0.017	0.000	0.000	1.000
	Elastic net	0.045	0.038	0.025	0.000	0.000	1.000
Continuous	BLR	0.035	0.018	0.011	0.030	0.400	0.560
	GLEAM1	0.021	0.017	0.004	0.030	0.000	0.970
	GLEAM2	0.004	0.003	0.002	0.040	0.000	0.960
	Lasso	0.039	0.031	0.023	0.000	0.000	1.000
	Elastic net	0.049	0.037	0.029	0.000	0.000	1.000

BLR, Bayesian likelihood ratio; GLEAM, generalized admixture mapping.

^a The combination of Locus 1 and Locus 2.

^b Applying the first step of the generalized admixture mapping procedure only.

^c Applying both steps of the generalized admixture mapping procedure.


Application

We applied our approach to data from the Healthy Pregnancy, Healthy Baby (HPHB) study, which is a prospective cohort study of pregnant women aimed at identifying genetic, social, and environmental contributors to disparities in adverse birth outcomes in the Southern United States (Miranda ). Consistent with previous studies, African-American women in HPHB have greater risk for maternal hypertension than white women during pregnancy, which contributes to poor birth outcomes (Allen ). Even within the African-American subpopulation, some women have much higher blood pressure, and we hypothesize that one possible contributor may be the percentage of African ancestry. To explore this hypothesis, we applied GLEAM to investigate the association between the averaged maternal mean arterial pressure (MAP), defined as (1/3 × systolic blood pressure) + (2/3 × diastolic blood pressure), during 24 to 28 weeks of pregnancy and local ancestries among these pregnant African-American women. Clinical and genetic data were available for 1004 non-Hispanic black women. A total of 1509 SNP AIMs were genotyped using the Illumina African-American admixture panel. After quality control measures described previously (A. E. Ashley-Koch, Me. E. Garrett, S. Edwards, K. S. Quinn, G. K. Swamy, and M. L. Miranda, unpublished results), the dataset consisted of 1001 non-Hispanic black women with 1296 AIMs. The proposed GLEAM approach was applied to this dataset to identify the local ancestry associated with the averaged maternal MAP, a continuous trait, while adjusting for mother’s age. The local ancestries were multiply imputed based on the HMM. We first examined the marginal association between the trait and local ancestries, one locus at a time. The results are summarized in Figure 5, where one local ancestry on chromosome 2 was identified, with log10(Bayes factor) = 2.05 exceeding the threshold of 2. 
With only one local ancestry localized, the second step of the generalized admixture mapping procedure was unnecessary. The same data were analyzed by the BLR method, which treated the subjects with averaged maternal MAP more than 93.67 (top 20% quantile) as cases and the ones with averaged maternal MAP less than 79.33 (bottom 20% quantile) as controls. No local ancestry was identified as being associated with the averaged maternal MAP with this approach, presumably due to its relatively low power compared with the GLEAM approach.
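The loss of information from dichotomizing a continuous trait can be seen in a small sketch. The helper names below are hypothetical, and the cutoffs are taken from sample order statistics, which is one simple convention and not necessarily the one used in the study; subjects between the two cutoffs are labelled None and would be discarded by a case-control method.

```python
def mean_arterial_pressure(systolic, diastolic):
    """MAP as one third systolic plus two thirds diastolic blood pressure."""
    return systolic / 3.0 + 2.0 * diastolic / 3.0

def dichotomize(values, q=0.20):
    """Label the top q fraction as cases (1), the bottom q fraction as
    controls (0), and everything in between as None (discarded)."""
    ranked = sorted(values)
    k = max(1, int(len(ranked) * q))      # subjects per tail
    low, high = ranked[k - 1], ranked[-k]
    return [1 if v >= high else 0 if v <= low else None for v in values]

labels = dichotomize([float(v) for v in range(100)])
print(labels.count(1), labels.count(0), labels.count(None))
```

With q = 20%, three fifths of the subjects carry no weight in the case-control analysis, whereas a regression on the continuous trait uses all of them.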

Figure 5

Manhattan plot of log10(Bayes factor) for the association between the averaged maternal MAP during 24 to 28 weeks of pregnancy and genome-wide local ancestries among 1001 African-American subjects.

Discussion

By exploiting admixture linkage disequilibrium, admixture mapping is a valuable tool for localizing alleles associated with qualitative or quantitative traits and diseases that vary in prevalence across the ancestral populations. In this article, we propose a flexible and powerful generalized admixture mapping approach, which is based on the generalized linear model, incorporates admixture prior information through the quadratic normal moment prior, and adjusts for covariates. The proposed method is applicable to both qualitative and quantitative traits, achieves satisfactory power while keeping the type I error rate low, and is easy to implement, as we demonstrated with our HPHB example. In addition to the flexibility to handle different types of traits, another attractive generalization is the simultaneous consideration of multiple loci. It is known that admixture linkage disequilibrium extends much further than haplotype linkage disequilibrium. Consequently, if we examine only one locus at a time, local ancestries that are highly correlated with the true disease-associated local ancestry tend to be identified as significant as well. As demonstrated by the simulations, these false positives can be substantially reduced by considering multiple susceptibility loci simultaneously, which reduces the type I error rate and improves the mapping resolution. In addition, GLEAM specifies a hidden Markov model that allows recombination rates to vary across the genome, which lets us infer recombination “hotspots” in admixed populations. Moreover, within the generalized linear model framework, it is straightforward to extend the current method to populations with more than two ancestral populations, such as Hispanic populations, by adding extra ancestral population covariates. 
It is also easy to consider interactions between the local ancestries and covariates, given proper specification of the priors on the interaction coefficients.
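To make the two proposed extensions concrete, the following Python sketch shows how a generalized linear model predictor could gain extra ancestry columns and ancestry-by-covariate interaction terms. It is a sketch under stated assumptions, not the GLEAM implementation: the coding of local ancestry as allele counts (0, 1, or 2) for each of K-1 ancestral populations, the function names, and all coefficient values are hypothetical, and the quadratic normal moment prior is not modeled here.

```python
import math


def design_row(covariates, ancestry_counts, interact=False):
    """Build one subject's predictor row.

    covariates: adjustment covariates (for example, mother's age).
    ancestry_counts: assumed coding of local ancestry at one locus, one
        allele count per non-reference ancestral population (K-1 values).
    interact: if True, append every ancestry-by-covariate product.
    """
    row = [1.0] + list(covariates) + list(ancestry_counts)
    if interact:
        row += [a * c for a in ancestry_counts for c in covariates]
    return row


def linear_predictor(row, beta):
    """Inner product of the predictor row and the coefficient vector."""
    if len(row) != len(beta):
        raise ValueError("row and beta must have the same length")
    return sum(x * b for x, b in zip(row, beta))


def mean_response(eta, link="identity"):
    """Inverse link: identity for a continuous trait, logit for a binary one."""
    if link == "identity":
        return eta
    if link == "logit":
        return 1.0 / (1.0 + math.exp(-eta))
    raise ValueError("unsupported link: " + link)


# Three ancestral populations: two ancestry columns instead of one.
row = design_row(covariates=[30.0], ancestry_counts=[1, 0])

# The same subject with ancestry-by-age interaction columns appended.
row_with_interaction = design_row([30.0], [1, 0], interact=True)

# Hypothetical coefficients: intercept, age, and two ancestry effects.
eta = linear_predictor(row, [80.0, 0.1, 2.0, 0.0])
```

Moving from two to three ancestral populations only widens the predictor row by one column per locus, and interactions add one further column per ancestry-covariate pair; each new coefficient would still need its own prior.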
