
Comparing strategies for combined testing of rare and common variants in whole sequence and genome-wide genotype data.

Dörthe Malzahn, Stefanie Friedrichs, Heike Bickeböller

Abstract

We used our extension of the kernel score test to family data to analyze real and simulated baseline systolic blood pressure in extended pedigrees. We compared the power for different kernels and for different weightings of genetic markers. Moreover, we compared the power of rare and common markers with 3 strategies for joint testing and on marker panels with different densities. Marker weights had much greater influence on power than the kernel chosen. Inverse minor allele frequency weights often increased power on common markers but could decrease power on rare markers. Furthermore, defining the gene region based on linkage disequilibrium blocks often yielded robust power of joint tests of rare and common markers.

Year:  2016        PMID: 27980648      PMCID: PMC5133495          DOI: 10.1186/s12919-016-0042-9

Source DB:  PubMed          Journal:  BMC Proc        ISSN: 1753-6561


Background

The kernel score test is a global covariate-adjusted multilocus procedure that tests for overall association of sets of markers (see Schaid [1] for a review). This reduces the multiple-testing burden. Tested marker sets can, for example, belong to a pathway or candidate gene. The kernel score test can be applied to common and rare variants alike, as well as to data of genome-wide association studies (GWAS) or sequence data, where it is known as SKAT (sequence kernel association test). The kernel score test was developed for independent subjects [1]. Recent contributions by others and ourselves [2-6] extended the kernel score test to family data. The kernel is chosen to describe genetic correlation among subjects. Different kernels have been suggested for genetic epidemiological applications. These kernels differ in whether marker–marker interactions are modeled and how complex the interaction effects may be. A frequent choice is to apply the kernel function on weighted minor allele dosage data (thus using an additive coding of minor allele effects). The dosage weights increase with decreasing minor allele frequency, corresponding to the a priori assumption that less-frequent variants may have larger effects. Weighting allows rarer variants to contribute more to the overall test despite their low frequencies. With appropriate weighting, rare and common variants may be entered together into the kernel for joint testing. Recently, however, Ionita-Laza et al. [7] proposed alternatives that can be more powerful. We explored these alternative joint tests on rare and common variants in the Genetic Analysis Workshop 19 (GAW19) family data. Moreover, we compared the power of different marker weights and kernels on sequence and GWAS panels. As we focused on genes, we also explored how size or positioning of a flanking region affects the test power.

Methods

Data

We analyzed baseline systolic blood pressure (SBP) and dosage data in the extended Mexican American pedigrees of the GAW19 family data, which are identical to the Genetic Analysis Workshop 18 data [8]. As before [6], we considered subjects with known baseline SBP and baseline diastolic blood pressure, sex, and age, who were not on blood pressure medication (real SBP: 706 subjects, excluding the first listed monozygotic twin of 2 observed twin pairs; simulated SBP: 740 to 781 subjects, numbers vary for 200 simulated study replicates because of inclusion criteria). For real SBP, we considered candidate gene AGTR1 [9] on chromosome (chr) 3 that tends to associate with SBP in the present family sample [6]. For simulated SBP, we selected from the simulation answers 5 strongly associated genes with various linkage disequilibrium (LD) structures: MAP4 (very homogeneous LD, chr3) and, in the order of increasing variability of LD, TNN (chr1), FLT3 (chr13), LEPR (chr1), and GSN (chr9). We used NCBI build 37, International Haplotype Map Project (HapMap) [10] reference data for Mexican Americans and the default algorithm in Haploview 4.2 [11] with a required fraction of strong LD of 0.7 and confidence interval limits of 0.5 and 0.8 to determine LD-blocks based on the D’ measure. Gene regions were defined as the LD-block(s) that contained the gene. For AGTR1, we also considered the region from the first to the last exonic position and flanking regions of 30 kb or 500 kb. For the same subjects, we used 2 single-nucleotide polymorphism (SNP) panels: sequence (allele dosage data) and GWAS (allele dosage data reduced to GWAS SNPs). Biallelic SNPs were included for testing if their Hardy-Weinberg equilibrium test p values were equal to or greater than 10^−5 (rounding imputed dosages for this purpose only) and if at least 7 observations of the minor allele were present in the sample. The latter parallels minimum data requirements in parametric regression.
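The SNP inclusion rule can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the text does not say which Hardy-Weinberg test was used, so a 1-df chi-square test is assumed here, and "observations of the minor allele" is assumed to mean the minor allele count after rounding dosages. The function names `hwe_chisq_pvalue` and `keep_snp` are hypothetical.

```python
import math

def hwe_chisq_pvalue(n0, n1, n2):
    """Chi-square (1 df) Hardy-Weinberg test from counts of genotypes 0, 1, 2.

    Assumption: the source does not name the HWE test; this is one common choice.
    """
    n = n0 + n1 + n2
    p = (2 * n0 + n1) / (2 * n)      # frequency of the allele not counted by the dosage
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        return 1.0                   # monomorphic SNP: no departure detectable
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    stat = sum((o - e) ** 2 / e for o, e in zip((n0, n1, n2), expected))
    # Survival function of a chi-square variable with 1 degree of freedom
    return math.erfc(math.sqrt(stat / 2.0))

def keep_snp(dosages, min_minor_obs=7, hwe_threshold=1e-5):
    """Apply both inclusion criteria to one SNP's (possibly imputed) dosages."""
    # Dosages are rounded for the filter only, as described in the text
    genotypes = [min(2, max(0, round(d))) for d in dosages]
    counts = [genotypes.count(k) for k in (0, 1, 2)]
    allele_count = counts[1] + 2 * counts[2]
    minor_count = min(allele_count, 2 * len(genotypes) - allele_count)
    if minor_count < min_minor_obs:
        return False
    return hwe_chisq_pvalue(*counts) >= hwe_threshold
```

Under these assumptions, a SNP with 3 heterozygotes among 100 subjects is dropped by the count criterion, while a SNP with only the two homozygous genotypes in equal numbers is dropped by the Hardy-Weinberg criterion.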

Kernel score test for family data

Here we briefly summarize our method introduced in [6], denoting vectors and matrices by bold letters. Baseline SBP has a right-skewed distribution and was therefore rank-normalized by Blom transformation [12] to standard normally distributed target variables $Y = (Y_1, \ldots, Y_n)$. The model is $Y = Xb + Zc + h(G) + e$ (1). $Y$ depends on fixed covariate effects $b$ (intercept, age, sex, age × sex interaction), random effects $c$ that adjust for familial polygenic background, a semiparametric model $h(G)$ of genetic markers $G$, and regression residuals $e \sim N(0, \sigma^2 I)$ with residual variance $\sigma^2$. $X$ and $Z$ are the design matrices for fixed covariate effects and random family effects. $h(G) = K a^T$ depends on an $n \times n$ dimensional kernel matrix $K$ of genetic similarities between $n$ subjects on markers $G$, and on multivariate normally distributed random effects $a \sim N(0, \tau K)$ [1]. One tests for a genetic covariance component $\tau$. The kernel score test is computed from restricted maximum likelihood parameter estimates of the genetic null model (where $h(G) = 0$). Thus, the null model estimates fixed covariate effects $b$, random pedigree effects $c$, the variance $\sigma^2_{fam}$ of the polygenic familial component, and the residual variance $\sigma^2_0$. The null model was adjusted for polygenic familial background based on the kinship coefficient matrix $\Phi_{kin} = Z Z^T$ using R-packages kinship2 and coxme with R-function lmekin. The kernel score test statistic is $Q = R^T M R$ (2). $R = P_0^{1/2} Y$ are standard normally distributed residuals and matrix $M = (P_0^{1/2} K P_0^{1/2})/2$ incorporates the kernel [6]. $P_0 = V_0^{-1} - V_0^{-1} X (X^T V_0^{-1} X)^{-1} X^T V_0^{-1}$ is the null projection matrix with $V_0 = \sigma^2_0 I + \sigma^2_{fam} Z Z^T$. The p values for test statistic (2) were calculated by Davies’ exact method [13] with the R package CompQuadForm from sample estimates $Q$ and all eigenvalues of matrix $M$.
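The quantities in this summary can be sketched numerically. The sketch below assumes the statistic is the quadratic form implied by the definitions of R and M, that is Q = R^T M R = Y^T P0 K P0 Y / 2, and it takes the null variance components as given rather than estimating them by restricted maximum likelihood. Davies' method is not reimplemented; a plain Monte Carlo approximation of the weighted chi-square mixture is swapped in for illustration. All function names and the toy inputs are hypothetical.

```python
import numpy as np

def null_projection(X, V0):
    """P0 = V0^-1 - V0^-1 X (X^T V0^-1 X)^-1 X^T V0^-1 (V0 symmetric)."""
    V0_inv = np.linalg.inv(V0)
    A = V0_inv @ X
    return V0_inv - A @ np.linalg.solve(X.T @ A, A.T)

def score_statistic(Y, K, P0):
    """Q = Y^T P0 K P0 Y / 2, which equals R^T M R."""
    PY = P0 @ Y
    return 0.5 * PY @ K @ PY

def mixture_eigenvalues(K, P0):
    """Eigenvalues of M = P0^{1/2} K P0^{1/2} / 2."""
    w, U = np.linalg.eigh((P0 + P0.T) / 2)
    P_half = (U * np.sqrt(np.clip(w, 0.0, None))) @ U.T
    M = P_half @ K @ P_half / 2
    return np.linalg.eigvalsh((M + M.T) / 2)

def monte_carlo_pvalue(Q, eigenvalues, n_draws=20_000, seed=0):
    """Approximate P(sum_k lambda_k * chi2_1 >= Q) by simulation.

    Assumption: stands in for Davies' exact method used in the paper.
    """
    rng = np.random.default_rng(seed)
    lam = eigenvalues[np.abs(eigenvalues) > 1e-10]
    draws = rng.chisquare(1, size=(n_draws, lam.size)) @ lam
    return float(np.mean(draws >= Q))

# Toy usage with simulated inputs (all values hypothetical)
rng = np.random.default_rng(0)
n = 30
X = np.column_stack([np.ones(n), rng.normal(size=n)])   # intercept + one covariate
Z = np.kron(np.eye(10), np.ones((3, 1)))                # 10 toy families of 3 subjects
V0 = 0.6 * np.eye(n) + 0.4 * Z @ Z.T                    # assumed variance components
G = rng.binomial(2, 0.2, size=(n, 8)).astype(float)     # toy allele dosages
K = G @ G.T                                             # unweighted linear kernel
Y = rng.normal(size=n)
P0 = null_projection(X, V0)
Q = score_statistic(Y, K, P0)
print(Q, monte_carlo_pvalue(Q, mixture_eigenvalues(K, P0)))
```

The Monte Carlo step limits how small a p value can be resolved (roughly 1/n_draws), which is one reason an exact method such as Davies' is preferable at stringent significance levels.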

Kernels and single-nucleotide polymorphism weights

We applied all kernel functions on allele dosage data $g_i$, $g_j$ (for pairs of subjects $i$, $j$) on $N_{SNP}$ biallelic SNP markers. The kernel matrix entries are $K_{ij} = g_i^T W g_j$ (3) for the linear kernel and $K_{ij} = \exp(-(g_i - g_j)^T W (g_i - g_j)/\mu)$ (4) for the radial basis function (RBF) kernel, with diagonal weight matrix $W$. The linear kernel (3) does not allow for SNP interactions, in contrast to the RBF kernel (4), which yields polynomial models. Dosage weights are normalized as $W_{mm} = f(\nu_m)/\sum_{m'} f(\nu_{m'})$ for any chosen SNP set $m = 1, \ldots, N_{SNP}$ and depend on the minor allele frequency (MAF) $\nu_m$ of the respective SNP. We considered: $f(\nu_m) = 1$ (treating SNPs alike), $f(\nu_m) = 1/\nu_m$, as well as $f(\nu_m) = \mathrm{Beta}(\nu_m, 1, 25)$ for $\nu_m$ equal to or less than 5 % and $f(\nu_m) = \mathrm{Beta}(\nu_m, 0.5, 0.5)$ for $\nu_m$ greater than 5 %, as suggested earlier [7]. Beta-density weights distinguish MAFs more moderately than 1/ν-weights. For the RBF kernel (4), the scale parameter $\mu$ was the average weighted squared genetic difference between subjects, $\sum_{i,j} (g_i - g_j)^T W (g_i - g_j)/n^2$, multiplied by the effective number of independent SNPs in the tested set [14].
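The weighting schemes and kernels described here can be sketched as below. This is an illustration under stated assumptions: the linear kernel is taken as g_i^T W g_j and the RBF kernel as exp(−d_ij/μ) with d_ij the weighted squared difference, which matches the definition of μ in the text. The effective number of independent SNPs [14] is not computed and is passed in as a hypothetical argument `n_eff`. Function names are illustrative only.

```python
import math
import numpy as np

def beta_density(x, a, b):
    """Density of the Beta(a, b) distribution at x, for 0 < x < 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log1p(-x))

def snp_weights(maf, scheme="equal"):
    """Normalized diagonal of W: f(nu_m) / sum_m f(nu_m)."""
    maf = np.asarray(maf, dtype=float)
    if scheme == "equal":
        f = np.ones_like(maf)
    elif scheme == "inverse_maf":
        f = 1.0 / maf
    elif scheme == "beta":
        # Beta(1, 25) density for MAF <= 5 %, Beta(0.5, 0.5) density otherwise
        f = np.array([beta_density(v, 1, 25) if v <= 0.05
                      else beta_density(v, 0.5, 0.5) for v in maf])
    else:
        raise ValueError(f"unknown weighting scheme: {scheme}")
    return f / f.sum()

def linear_kernel(G, w):
    """K_ij = g_i^T W g_j for an n x N_SNP dosage matrix G."""
    return (G * w) @ G.T

def weighted_sq_distances(G, w):
    """d_ij = (g_i - g_j)^T W (g_i - g_j), computed from the linear kernel."""
    K_lin = linear_kernel(G, w)
    d = np.diag(K_lin)
    return np.clip(d[:, None] + d[None, :] - 2.0 * K_lin, 0.0, None)

def rbf_kernel(G, w, n_eff=1.0):
    """K_ij = exp(-d_ij / mu), mu = mean of d_ij over all pairs times n_eff.

    Assumption: n_eff is supplied by the caller. Requires at least two
    subjects with different dosages, otherwise mu is zero.
    """
    D = weighted_sq_distances(G, w)
    mu = D.mean() * n_eff
    return np.exp(-D / mu)
```

For example, `snp_weights([0.01, 0.05, 0.2, 0.5], "inverse_maf")` gives the rarest SNP the largest share of the total weight, whereas the beta scheme changes more gradually across the same MAF values.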

Strategies for combined testing of common and rare variants

By default, the kernel score test, Eq. (2), is performed with a kernel matrix $K$ computed on all dosages with a weighting of common and rare SNPs. In contrast, Ionita-Laza et al. [7] recently suggested computing the kernel separately for rare SNPs ($K_{rare}$) and for common SNPs ($K_{common}$), respectively, in a region of interest. Analogous to Eq. (2), this yields matrices $M_{rare}$, $M_{common}$, test statistics $Q_{rare}$, $Q_{common}$, and p values $p_{rare}$, $p_{common}$. The null model, $P_0$, and $R$ were always the same. The weighted sum test (WS) on common and rare variants has test statistic [7] $Q_{WS} = (1 - \varphi) Q_{rare} + \varphi Q_{common}$ (5). Weight $\varphi = \left(\mathrm{tr}(M_{rare} M_{rare})/(\mathrm{tr}(M_{rare} M_{rare}) + \mathrm{tr}(M_{common} M_{common}))\right)^{1/2}$ may be chosen such that $(1 - \varphi) Q_{rare}$ and $\varphi Q_{common}$ have the same variance. P values are obtained by Davies’ exact method from sample estimates $Q_{WS}$ and all eigenvalues of matrix $(1 - \varphi) M_{rare} + \varphi M_{common}$. Alternatively, Fisher’s p value pooling can be applied, $Q_{FISHER} = -2 \ln p_{rare} - 2 \ln p_{common}$ (6). Under H0, $Q_{FISHER}/(1 + 0.25\,\mathrm{cov})$ is chi-square distributed with $16/(4 + \mathrm{cov})$ degrees of freedom [7]. With $r = \mathrm{tr}(M_{rare} M_{common})/(\mathrm{tr}(M_{rare} M_{rare})\,\mathrm{tr}(M_{common} M_{common}))^{1/2}$, the covariance term for $p_{rare}$ and $p_{common}$ (on the $-2 \ln p$ scale) is $\mathrm{cov} \approx r(3.25 + 0.75 r)$ for $0 \le r \le 1$ and $\mathrm{cov} \approx r(3.27 + 0.71 r)$ for $-0.5 \le r \le 0$. Only test statistic (6) yields approximate p values; all other p values are obtained with Davies’ method and are exact.
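The two alternative joint tests can be sketched as follows. Several points are assumptions: the subscripts in the printed trace formulas are read as rare in the numerator of φ and as the cross term rare-common in the numerator of r, the weighted sum is taken as (1 − φ)·Q_rare + φ·Q_common, and the Fisher statistic is taken as the standard −2 ln p_rare − 2 ln p_common. The code returns the scaled Fisher statistic and its degrees of freedom rather than a p value. Function names are hypothetical.

```python
import math
import numpy as np

def trace_product(A, B):
    """tr(A B) without forming the full matrix product."""
    return float(np.sum(A * B.T))

def ws_weight(M_rare, M_common):
    """phi as printed in the text, with subscripts filled in by assumption."""
    t_rare = trace_product(M_rare, M_rare)
    t_common = trace_product(M_common, M_common)
    return math.sqrt(t_rare / (t_rare + t_common))

def weighted_sum(Q_rare, Q_common, M_rare, M_common):
    """Return Q_WS and the matrix whose eigenvalues define its null distribution."""
    phi = ws_weight(M_rare, M_common)
    Q_ws = (1.0 - phi) * Q_rare + phi * Q_common
    M_ws = (1.0 - phi) * M_rare + phi * M_common
    return Q_ws, M_ws

def fisher_correlated(p_rare, p_common, M_rare, M_common):
    """Return (scaled Fisher statistic, chi-square degrees of freedom)."""
    r = trace_product(M_rare, M_common) / math.sqrt(
        trace_product(M_rare, M_rare) * trace_product(M_common, M_common))
    # Piecewise approximation of the covariance term, as given in the text
    cov = r * (3.25 + 0.75 * r) if r >= 0 else r * (3.27 + 0.71 * r)
    Q_fisher = -2.0 * (math.log(p_rare) + math.log(p_common))
    return Q_fisher / (1.0 + 0.25 * cov), 16.0 / (4.0 + cov)
```

Two limiting cases serve as a sanity check. With r = 0 the scaling factor is 1 and the degrees of freedom are 4, which is ordinary Fisher pooling of two independent p values. With r = 1 the covariance term is 4, so the statistic is halved and referred to 2 degrees of freedom, as for a single test. A p value would follow from a chi-square survival function that accepts non-integer degrees of freedom, for example `scipy.stats.chi2.sf`.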

Results and discussion

Our test extension to families holds the nominal significance level and correctly adjusts for a polygenic familial variance component (as demonstrated in [6]). Table 1 lists the p values obtained for association testing of AGTR1 on real SBP, considering common SNPs (MAF >5 %) and rare SNPs (MAF ≤5 %) as well as 3 joint tests (default test K, WS, Fisher). Beta-weights (not shown) performed between equal weights and 1/ν-weights. The 1/ν-weight lowered p values particularly on common SNPs. AGTR1 association is suggested by common as well as rare SNPs. Joint testing of rare and common SNPs was beneficial. In particular, WS and Fisher test p values were often smaller than (and otherwise close to) the smallest p value of the separate rare and common SNP tests. When using ad hoc definitions of the AGTR1 flanking region, Fisher and WS p values remained relatively stable and were also smaller than those of the default test K. However, on the AGTR1-containing LD-block, all joint tests performed very similarly; p values were the smallest and also relatively stable regardless of SNP weights and SNP density.
Table 1. Analysis of real data: real SBP and candidate gene AGTR1

| Region (positions) | SNP panel | Weight | Common SNPs (MAF >5 %): NSNP | Common SNPs: p value | Rare SNPs (MAF ≤5 %): NSNP | Rare SNPs: p value | Joint test, default: p value | Joint test, WS: p value | Joint test, Fisher: p value |
|---|---|---|---|---|---|---|---|---|---|
| AGTR1 with no flanking region (148415571–148460795) | GWAS | equal | 11 | 0.189 | 7 | 0.097 | 0.177 | 0.102 | 0.101 |
| | GWAS | 1/ν | 11 | 0.113 | 7 | 0.050* | 0.054 | 0.044* | 0.043* |
| | SEQ | equal | 74 | 0.203 | 138 | 0.060 | 0.173 | 0.076 | 0.076 |
| | SEQ | 1/ν | 74 | 0.160 | 138 | 0.098 | 0.083 | 0.088 | 0.090 |
| AGTR1 with 30 kb flanking region (148385571–148490795) | GWAS | equal | 30 | 0.100 | 12 | 0.072 | 0.092 | 0.050* | 0.052 |
| | GWAS | 1/ν | 30 | 0.045* | 12 | 0.069 | 0.030* | 0.029* | 0.029* |
| | SEQ | equal | 198 | 0.053 | 300 | 0.067 | 0.047* | 0.030* | 0.032* |
| | SEQ | 1/ν | 198 | 0.039* | 300 | 0.172 | 0.045* | 0.044* | 0.050* |
| AGTR1 with 500 kb flanking region (147915571–148960795) | GWAS | equal | 277 | 0.206 | 51 | 0.048* | 0.196 | 0.061 | 0.065 |
| | GWAS | 1/ν | 277 | 0.151 | 51 | 0.064 | 0.102 | 0.059 | 0.066 |
| | SEQ | equal | 2170 | 0.192 | 2244 | 0.069 | 0.173 | 0.080 | 0.085 |
| | SEQ | 1/ν | 2170 | 0.157 | 2244 | 0.051 | 0.062 | 0.057 | 0.060 |
| AGTR1-containing LD-block (148344702–148568958) | GWAS | equal | 80 | 0.058 | 19 | 0.076 | 0.055 | 0.035* | 0.036* |
| | GWAS | 1/ν | 80 | 0.040* | 19 | 0.114 | 0.034* | 0.036* | 0.039* |
| | SEQ | equal | 499 | 0.029* | 592 | 0.106 | 0.027* | 0.027* | 0.030* |
| | SEQ | 1/ν | 499 | 0.027* | 592 | 0.112 | 0.025* | 0.026* | 0.030* |

Association of AGTR1 with real SBP was tested with a linear kernel on minor allele dosage data for GWAS and sequence (SEQ) panels; p values ≤0.05 are marked with *. The NSNP common and rare SNPs, respectively, were combined into joint tests: kernel K (default), weighted sum test (WS), and Fisher’s p value pooling for correlated p values.

Next, we analyzed LD-blocks that contain the genes MAP4, TNN, LEPR, GSN, or FLT3. Figure 1 displays the average test power on 200 data replicates of simulated SBP. Sequence-derived variants were often more powerful than GWAS SNPs, with some exceptions (Fig. 1 left and middle panels, black solid lines vs. gray dashed lines). The 1/ν-weights (circle) were often the best; otherwise, equal weights (diamond) were favored. In particular, 1/ν-weights may be beneficial on common SNPs (LEPR) and occasionally detrimental on rare SNPs (MAP4). The latter is an exceptional finding but is consistent with Table 1 on candidate gene AGTR1. On rare MAP4 SNPs, 1/ν-weights lowered the power, especially when extremely rare SNPs were also tested (encircled plus), but less so when testing only SNPs with MAF equal to or less than 5 % that had at least 7 observations of the minor allele (filled circle; sequence data). On gene-containing LD-blocks, all joint tests (default test K, WS, Fisher) often had similar power (Fig. 1, right panel: LEPR, FLT3, TNN with highly similar results [only TNN shown]; GSN sequence). However, default test K was the most powerful test on the gene with homogeneous strong LD (MAP4: sequence [Fig. 1, right] and GWAS [not shown]) and on the gene with the most variable LD structure (GSN: when using GWAS SNPs, not shown). In these cases, K likely exploited SNP correlations better. When LD-blocks were enlarged by flanking regions, WS and Fisher were often slightly more powerful than K (results not shown). The linear kernel always had similar or better power than the RBF kernel (results not shown).
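The power figures discussed above are proportions over simulated replicates. A minimal sketch of that arithmetic follows, assuming power is estimated as the fraction of replicates with p ≤ α. The binomial standard error is added here for orientation and is not reported in the text; the function name is hypothetical.

```python
import math

def empirical_power(p_values, alpha):
    """Fraction of replicates rejected at level alpha, with a binomial standard error."""
    n = len(p_values)
    power = sum(1 for p in p_values if p <= alpha) / n
    std_error = math.sqrt(power * (1.0 - power) / n)
    return power, std_error
```

With 200 replicates the standard error is at most about 0.035 (reached at a power of 0.5), so small power differences between tests should be read with some caution.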
Fig. 1. Test power on simulated SBP may greatly depend on SNP weights. Left and middle panels: Power of the kernel score test over 200 study replicates of simulated SBP as a function of the significance level for different SNP weights and SNP panels. Right panel: Power of joint tests of rare and common SNPs at 2 significance levels, α = 0.05 and α = 10^−6, when using 1/ν-weights on the sequence of gene-containing LD-blocks. Power estimates for LEPR (positions 65743083–66106465) and FLT3 (positions 28490385–28713642) (not shown) were highly similar to TNN.

Conclusions

As the power of kernel methods increases through the exploitation of SNP correlations [2], this ability should be utilized fully by analyzing LD-blocks. SNP weights have a far greater impact on test power than the kernel chosen. Currently, the benefit of 1/ν-weights may be underestimated for common SNPs. On rare SNPs, 1/ν-weights often improve power, but can also be detrimental. These findings are consistent between real and simulated data. Our results suggest testing LD-blocks with 1/ν-weights on all SNPs in a single kernel K, including only SNPs with sufficient minor allele observations. Alternatively, one may use WS with 1/ν-weights on common SNPs and equal weights on rare SNPs in the kernels. WS upweights the rare variant contribution globally; see Eq. (5).
References (12 in total)

J C Barrett, B Fry, J Maller, M J Daly. Haploview: analysis and visualization of LD and haplotype maps. Bioinformatics. 2004-08-05.

Daniel J Schaid. Genomic similarity and kernel methods I: advancements by building on mathematical and statistical foundations. Hum Hered. 2010-07-03.

Iuliana Ionita-Laza, Seunggeun Lee, Vlad Makarov, Joseph D Buxbaum, Xihong Lin. Sequence kernel association tests for the combined effect of rare and common variants. Am J Hum Genet. 2013-05-16.

Bruno Baudin. Polymorphism in angiotensin II receptor genes and hypertension. Exp Physiol. 2005-01-07.

Elizabeth D Schifano, Michael P Epstein, Lawrence F Bielak, Min A Jhun, Sharon L R Kardia, Patricia A Peyser, Xihong Lin. SNP set association analysis for familial data. Genet Epidemiol. 2012-09-11.

Han Chen, James B Meigs, Josée Dupuis. Sequence kernel association test for quantitative traits in family samples. Genet Epidemiol. 2012-12-26.

Karim Oualkacha, Zari Dastani, Rui Li, Pablo E Cingolani, Timothy D Spector, Christopher J Hammond, J Brent Richards, Antonio Ciampi, Celia M T Greenwood. Adjusted sequence kernel association test for rare variants controlling for cryptic and family relatedness. Genet Epidemiol. 2013-03-25.

Dörthe Malzahn, Stefanie Friedrichs, Albert Rosenberger, Heike Bickeböller. Kernel score statistic for dependent data. BMC Proc. 2014-06-17.

Laura Almasy, Thomas D Dyer, Juan M Peralta, Goo Jun, Andrew R Wood, Christian Fuchsberger, Marcio A Almeida, Jack W Kent, Sharon Fowler, Tom W Blackwell, Sobha Puppala, Satish Kumar, Joanne E Curran, Donna Lehman, Goncalo Abecasis, Ravindranath Duggirala, John Blangero. Data for Genetic Analysis Workshop 18: human whole genome sequence, blood pressure, and simulated phenotypes in extended pedigrees. BMC Proc. 2014-06-17.

Jing Huang, Yong Chen, Michael D Swartz, Iuliana Ionita-Laza. Comparing the power of family-based association tests for sequence data with applications in the GAW18 simulated data. BMC Proc. 2014-06-17.
