Quantifying the contribution of dominance deviation effects to complex trait variation in biobank-scale data.

Ali Pazokitoroudi¹, Alec M Chiu², Kathryn S Burch², Bogdan Pasaniuc³, Sriram Sankararaman⁴.

Abstract

The proportion of variation in complex traits that can be attributed to non-additive genetic effects has been a topic of intense debate. The availability of biobank-scale datasets of genotype and trait data from unrelated individuals opens up the possibility of obtaining precise estimates of the contribution of non-additive genetic effects. We present an efficient method to estimate the variation in a complex trait that can be attributed to additive (additive heritability) and dominance deviation (dominance heritability) effects across all genotyped SNPs in a large collection of unrelated individuals. Over a wide range of genetic architectures, our method yields unbiased estimates of additive and dominance heritability. We applied our method, in turn, to array genotypes as well as imputed genotypes (at common SNPs with minor allele frequency [MAF] > 1%) and 50 quantitative traits measured in 291,273 unrelated white British individuals in the UK Biobank. Averaged across these 50 traits, we find that additive heritability on array SNPs is 21.86% while dominance heritability is 0.13% (about 0.48% of the additive heritability) with qualitatively similar results for imputed genotypes. We find no statistically significant evidence for dominance heritability (p<0.05/50 accounting for the number of traits tested) and estimate that dominance heritability is unlikely to exceed 1% for the traits analyzed. Our analyses indicate a limited contribution of dominance heritability to complex trait variation.

Keywords: additive; biobank; complex traits; dominance; genetic variation; heritability; mixed models; variance components

Year: 2021 PMID: 33811807 PMCID: PMC8206203 DOI: 10.1016/j.ajhg.2021.03.018

Source DB: PubMed Journal: Am J Hum Genet ISSN: 0002-9297 Impact factor: 11.025

Introduction

Variation in complex traits can be partitioned into variation due to additive, dominance, and epistatic effects. Despite decades of theoretical and experimental efforts, the quantification of non-additive genetic variation in outbred populations such as humans remains challenging.2, 3, 4, 5, 6 One approach to estimating non-additive sources of heritability in humans has focused on comparing phenotypic similarity between close relatives. These estimates, however, can be biased by confounding due to shared environmental factors. Further, the limited sample sizes of family and twin studies lead to large standard errors in estimates of non-additive effects. An alternative approach relies on the analysis of unrelated individuals. The relatively small estimates of non-additive sources of heritability from prior studies suggest that achieving sufficient power will require the analysis of large numbers of unrelated individuals and methods that can be run on these large sample sizes. To this end, we extend our previously proposed variance components method to jointly estimate the heritability due to additive and dominance deviation effects attributed to SNPs genotyped across hundreds of thousands of individuals. Additive variance refers to the variance in genotypic value (the conditional mean of phenotype given genotype) explained by regression of the genotypic value on an additive representation of the genotype, while dominance variance denotes the residual variance that is not explained by a model with only additive effects. Under this definition, the additive variance component captures the variance attributed to breeding values and includes both additive and dominant genetic effects. The additive (dominance) heritability refers to the ratio of the additive (dominance) variance to the phenotypic variance.
Our method can jointly fit multiple additive and dominance variance components, thereby allowing it to provide unbiased estimates of heritability for genetic architectures in which SNP effect sizes vary as a function of minor allele frequency (MAF) and linkage disequilibrium (LD). Our method obtains unbiased estimates of additive and dominance heritability under a range of MAF- and LD-dependent architectures while controlling the false positive rate of rejecting the null hypothesis of no dominance heritability under genetic architectures that assume no dominance. Analyzing a total of 50 continuous traits measured in 291,273 unrelated white British individuals in the UK Biobank, we find that additive heritability is 21.86% on average while dominance heritability is 0.13% on average (about 0.48% of the heritability attributed to additive effects) across common array SNPs (M = 459,792 SNPs, MAF > 1%). Analyzing common imputed SNPs (M = 4,824,392, MAF > 1%), we find that additive heritability is 22.83% on average while dominance heritability is 0.06% on average (about 0.47% of the heritability attributed to additive effects). We find no evidence for traits that have non-zero dominance heritability after correcting for multiple testing (p < 0.05/50). Based on our power estimates, we estimate that dominance heritability is unlikely to exceed 1% for the traits analyzed.

Material and methods

Variance components model with additive and dominance components

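We are interested in estimating how much extra genetic variance can be explained by dominance variation on top of a model with only additive effects. To this end, we fit a variance components model that relates phenotypes $y$ measured across $N$ individuals to the additive values and dominance deviations over $M$ SNPs (while allowing for multiple additive and dominance components):

$$y = \sum_{i=1}^{K} X_{A_i}\beta_i + \sum_{j=1}^{L} X_{D_j}\alpha_j + \epsilon, \qquad \beta_i \sim \mathcal{D}\left(0, \frac{\sigma^2_{A_i}}{M_{A_i}} I_{M_{A_i}}\right), \quad \alpha_j \sim \mathcal{D}\left(0, \frac{\sigma^2_{D_j}}{M_{D_j}} I_{M_{D_j}}\right), \quad \epsilon \sim \mathcal{D}\left(0, \sigma^2_e I_N\right)$$

Here $\mathcal{D}(\mu, \Sigma)$ is an arbitrary distribution over a random vector with mean $\mu$ and covariance matrix $\Sigma$. SNPs are partitioned into $K$ additive categories and $L$ dominance categories, where $X_{A_i}$ and $X_{D_j}$ are the $N \times M_{A_i}$ and $N \times M_{D_j}$ matrices consisting of standardized additive and dominance deviation encodings of SNPs belonging to additive category $i$ and dominance category $j$, respectively; $\sigma^2_e$ is the residual variance, and $\sigma^2_{A_i}$ and $\sigma^2_{D_j}$ are the variance components of the $i$th additive and $j$th dominance categories, respectively. Our encoding captures dominance deviation, which is different from the dominance effect of markers. We encode additive and dominance deviation effects using a representation that leads to uncorrelated variance components. For alleles A and B at a SNP with the frequency of allele B denoted by $f$, the additive and dominance deviation encodings of the genotypes are defined as follows:

$$x_A = \begin{cases} 0 & \text{AA} \\ 1 & \text{AB} \\ 2 & \text{BB} \end{cases} \qquad x_D = \begin{cases} 0 & \text{AA} \\ 2f & \text{AB} \\ 4f - 2 & \text{BB} \end{cases}$$

The proportion of phenotypic variance explained by additive variation (additive heritability) at all SNPs is defined as:

$$h^2_A = \frac{\sum_{i=1}^{K}\sigma^2_{A_i}}{\sum_{i=1}^{K}\sigma^2_{A_i} + \sum_{j=1}^{L}\sigma^2_{D_j} + \sigma^2_e}$$

The proportion of phenotypic variance explained by dominance deviation (dominance heritability) at all SNPs is defined as:

$$h^2_D = \frac{\sum_{j=1}^{L}\sigma^2_{D_j}}{\sum_{i=1}^{K}\sigma^2_{A_i} + \sum_{j=1}^{L}\sigma^2_{D_j} + \sigma^2_e}$$

The proposed model extends previous models by introducing the component corresponding to dominance deviation effects in addition to the additive effects.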
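The claim that the additive and dominance deviation encodings are uncorrelated can be checked numerically. The sketch below is illustrative only: it assumes the commonly used raw encodings (0, 1, 2) for the additive term and (0, 2f, 4f − 2) for the dominance deviation term, standardized by their Hardy-Weinberg means and standard deviations. The function names `encode_snp` and `hwe_moments` are hypothetical and do not come from the authors' software.

```python
import numpy as np

def encode_snp(genotypes, f):
    """Return standardized (additive, dominance-deviation) encodings for one SNP.

    genotypes: integer counts of allele B (0 = AA, 1 = AB, 2 = BB).
    f: frequency of allele B, assumed to satisfy 0 < f < 1.
    """
    g = np.asarray(genotypes, dtype=int)
    # Assumed raw encodings: additive (0, 1, 2); dominance deviation (0, 2f, 4f - 2).
    raw_dom = np.array([0.0, 2.0 * f, 4.0 * f - 2.0])[g]
    # Standardize with the Hardy-Weinberg mean and standard deviation of each encoding.
    additive = (g - 2.0 * f) / np.sqrt(2.0 * f * (1.0 - f))
    dominance = (raw_dom - 2.0 * f ** 2) / (2.0 * f * (1.0 - f))
    return additive, dominance

def hwe_moments(f):
    """Moments of the two encodings under Hardy-Weinberg genotype proportions."""
    probs = np.array([(1 - f) ** 2, 2 * f * (1 - f), f ** 2])
    a, d = encode_snp([0, 1, 2], f)
    return {
        "mean_a": probs @ a, "mean_d": probs @ d,
        "var_a": probs @ a ** 2, "var_d": probs @ d ** 2,
        "cov_ad": probs @ (a * d),
    }
```

Under these assumptions both encodings have mean 0 and variance 1, and their covariance is 0 for any allele frequency, which is what allows the additive and dominance variance components to be interpreted separately.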
Further, the proposed model allows for the joint estimation of multiple additive and dominance components, e.g., corresponding to SNPs with varying minor allele frequency (MAF) and linkage disequilibrium (LD) annotations that have been previously shown to lead to relatively unbiased estimates of SNP heritability. The key inference problem in this model is the estimation of the variance components $(\sigma^2_A, \sigma^2_D, \sigma^2_e)$, where $\sigma^2_A = (\sigma^2_{A_1}, \dots, \sigma^2_{A_K})$ and $\sigma^2_D = (\sigma^2_{D_1}, \dots, \sigma^2_{D_L})$. We use a scalable method-of-moments estimator, i.e., finding values of the variance components such that the population moments match the sample moments.13, 14, 15, 16, 17 Our method uses a randomized algorithm that avoids explicitly computing genetic relatedness matrices. Instead, it operates on a smaller matrix formed by multiplying the input genotype matrix with a small number of random vectors, allowing it to scale to large samples. We estimate standard errors (SE) using an efficient block jackknife over SNPs with 100 blocks.
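A generic delete-one-block jackknife over SNPs can be sketched as follows. This is a plain textbook version that re-runs the estimator once per block; the efficient variant mentioned above presumably avoids this recomputation, and its details are not given here. The function name `block_jackknife_se` and the `estimator` callback (which takes the indices of the retained SNPs and returns a scalar) are hypothetical.

```python
import numpy as np

def block_jackknife_se(estimator, n_snps, n_blocks=100):
    """Delete-one-block jackknife standard error.

    estimator: callable taking an array of retained SNP indices, returning a float.
    n_snps: total number of SNPs; they are split into contiguous blocks.
    """
    blocks = np.array_split(np.arange(n_snps), n_blocks)
    # Re-estimate with each block of SNPs left out in turn.
    leave_one_out = np.array([
        estimator(np.concatenate(blocks[:j] + blocks[j + 1:]))
        for j in range(n_blocks)
    ])
    centered = leave_one_out - leave_one_out.mean()
    return np.sqrt((n_blocks - 1) / n_blocks * np.sum(centered ** 2))
```

As a sanity check, when the estimator is a simple mean and every block holds one item, this reduces to the usual standard error of the mean.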

Method-of-moments for estimating variance components

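To estimate the variance components, we use a method-of-moments (MoM) estimator that estimates parameter values so that the population moments are close to the sample moments. Since E[y] = 0, we derived the MoM estimates by equating the population covariance to the empirical covariance. The population covariance is given by:

$$\mathrm{cov}(y) = E[yy^T] = \sum_{i=1}^{K}\sigma^2_{A_i}K_{A_i} + \sum_{j=1}^{L}\sigma^2_{D_j}K_{D_j} + \sigma^2_e I_N$$

Here $K_{A_k} = X_{A_k}X_{A_k}^T / M_{A_k}$ ($K_{D_k} = X_{D_k}X_{D_k}^T / M_{D_k}$) is the additive (dominance) genetic relatedness matrix (GRM) computed from all SNPs of the $k$th category. Using $yy^T$ as our estimate of the empirical covariance, we need to solve the following least-squares problem to find the variance components:

$$\min_{\sigma^2_A, \sigma^2_D, \sigma^2_e} \left\| yy^T - \left( \sum_{i=1}^{K}\sigma^2_{A_i}K_{A_i} + \sum_{j=1}^{L}\sigma^2_{D_j}K_{D_j} + \sigma^2_e I_N \right) \right\|_F^2$$

For simplicity, we denote $K_i = K_{A_i}$ and $\sigma^2_i = \sigma^2_{A_i}$ for i = 1,…, K, $K_{K+j} = K_{D_j}$ and $\sigma^2_{K+j} = \sigma^2_{D_j}$ for j = 1,…, L, and J = K + L. The MoM estimator satisfies the following normal equations (Equation 5):

$$\begin{bmatrix} T & b \\ b^T & N \end{bmatrix} \begin{bmatrix} \sigma^2_g \\ \sigma^2_e \end{bmatrix} = \begin{bmatrix} c \\ y^Ty \end{bmatrix}$$

Here $\sigma^2_g = (\sigma^2_1, \dots, \sigma^2_J)^T$, $T$ is a J × J matrix with entries $T_{k,l} = \mathrm{tr}(K_kK_l)$, $b$ is a J-vector with entries $b_k = \mathrm{tr}(K_k) = N$ (because the $X_{A_i}$s and $X_{D_j}$s are standardized), and $c$ is a J-vector with entries $c_k = y^TK_ky$. Each GRM $K_k$ can be computed in $O(N^2M_k)$ time and $O(N^2)$ memory. Given J GRMs, the quantities $T_{k,l}$, $c_k$, $k, l \in \{1, \dots, J\}$, can be computed in $O(J^2N^2)$. Given the quantities $T_{k,l}$, $c_k$, the normal Equation 5 can be solved in $O(J^3)$. Therefore, the total time complexity for estimating the variance components is $O(N^2M + J^2N^2 + J^3)$.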
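For a toy problem that is small enough to form each GRM explicitly, the normal equations can be assembled and solved directly. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function name `mom_variance_components` is hypothetical, each matrix in `X_list` is assumed to have columns standardized to mean 0 and variance 1, and the system is assumed to take the standard form with entries tr(K_k K_l), tr(K_k), N on the left and y^T K_k y, y^T y on the right.

```python
import numpy as np

def mom_variance_components(y, X_list):
    """Exact method-of-moments estimates for a small problem.

    y: centered phenotype vector of length N.
    X_list: list of J standardized N x M_k matrices (additive or dominance encodings).
    Returns an array (sigma2_1, ..., sigma2_J, sigma2_e).
    """
    N = y.shape[0]
    J = len(X_list)
    # Explicit GRMs: feasible only for small N (O(N^2) memory each).
    grms = [X @ X.T / X.shape[1] for X in X_list]
    A = np.empty((J + 1, J + 1))
    rhs = np.empty(J + 1)
    for k in range(J):
        for l in range(J):
            # tr(K_k K_l) equals the elementwise sum for symmetric matrices.
            A[k, l] = np.sum(grms[k] * grms[l])
        A[k, J] = A[J, k] = np.trace(grms[k])
        rhs[k] = y @ grms[k] @ y
    A[J, J] = N
    rhs[J] = y @ y
    return np.linalg.solve(A, rhs)
```

The estimates are left unconstrained, so individual components can be negative in finite samples.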

Randomized estimator of multiple variance components

The key bottleneck in solving the normal Equation 5 is the computation of $T_{k,l}$, $k, l \in \{1, \dots, J\}$, which takes $O(N^2M)$. Instead of computing the exact value of $T_{k,l}$, we use an unbiased estimator of the trace based on the following identity: for a given N × N matrix $C$, $z^TCz$ is an unbiased estimator of $\mathrm{tr}(C)$ ($E[z^TCz] = \mathrm{tr}(C)$), where $z$ is a random vector with mean zero and covariance $I_N$. Hence, we can estimate the values $T_{k,l}$, $k, l \in \{1, \dots, J\}$, as follows:

$$\hat{T}_{k,l} = \frac{1}{B}\,\frac{1}{M_kM_l}\sum_{b=1}^{B} z_b^T X_k X_k^T X_l X_l^T z_b$$

where the matrix $X_k$ can be a standardized additive or dominance matrix. Here $z_1, \dots, z_B$ are B independent random vectors with zero mean and covariance $I_N$. We draw these random vectors independently from a standard normal distribution. Computing $\hat{T}_{k,l}$ using the unbiased estimator involves four multiplications of sub-matrices of the genotype matrix with a vector, repeated B times. Therefore, the total running time for estimating the matrix $T$ is $O(NMB)$.
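The randomized trace estimator described above can be sketched in a few lines. This is an illustrative version, not the authors' code: the function name `estimate_trace_products` is hypothetical, the probes are drawn from a standard normal distribution as stated in the text, and the products K_k z are computed once per category and reused, which gives the same quantity as multiplying through all four matrices for each pair.

```python
import numpy as np

def estimate_trace_products(X_list, B=10, seed=0):
    """Estimate T[k, l] = tr(K_k K_l), with K_k = X_k X_k^T / M_k, without forming any GRM."""
    rng = np.random.default_rng(seed)
    N = X_list[0].shape[0]
    J = len(X_list)
    Z = rng.standard_normal((N, B))  # B random probe vectors stored as columns
    # U[k] = K_k Z, computed as X_k (X_k^T Z) / M_k so only N x M_k products are needed.
    U = [X @ (X.T @ Z) / X.shape[1] for X in X_list]
    T = np.empty((J, J))
    for k in range(J):
        for l in range(k, J):
            # z^T K_k K_l z = (K_k z) . (K_l z) because K_k is symmetric.
            T[k, l] = T[l, k] = np.sum(U[k] * U[l]) / B
    return T
```

The cost is dominated by the products with `Z`, which scale with N, M, and B rather than with N squared.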

Simulations

We simulated phenotypes from UK Biobank genotypes consisting of M = 459,792 array SNPs and N = 291,273 unrelated white British individuals (see section on UK Biobank data). We simulated phenotypes from genotypes using the following model (Equation 7):

$$y = X_A\beta + X_D\alpha + \epsilon, \qquad \beta_m \sim \mathcal{N}(0, \sigma^2_{A,m}), \quad \alpha_m \sim \mathcal{N}(0, \sigma^2_{D,m}), \quad \epsilon \sim \mathcal{N}\left(0, (1 - h^2_A - h^2_D) I_N\right)$$

where $\sigma^2_{A,m} = S\, c_m w_m^b [f_m(1 - f_m)]^a$ and $\sigma^2_{D,m} = S'\, c'_m w_m^b [f_m(1 - f_m)]^a$. Here S and S′ are normalizing constants chosen so that $\sum_m \sigma^2_{A,m} = h^2_A$ and $\sum_m \sigma^2_{D,m} = h^2_D$. Additive and dominance deviation effect sizes are denoted by β and α, respectively. $f_m$ and $w_m$ are the minor allele frequency and LDAK score of the mth SNP, respectively. In this model, $c_m, c'_m \in \{0, 1\}$ are indicator variables for the causal status of SNP m in the additive and dominance components, respectively. The LDAK score of a SNP is computed based on local levels of LD such that the LDAK score tends to be higher for SNPs in regions of low LD. The above models relating genotype to phenotype are commonly used in methods for estimating SNP heritability: the GCTA model (when a = b = 0 in Equation 7), which is used by the software GCTA, and the LDAK model (where a = 0.75, b = 1 in Equation 7), which is used by the software LDAK. Moreover, under each model, we varied the proportion and minor allele frequency (MAF) of causal variants (CVs). The proportion of causal variants was set to either 100% or 1%, and the MAF of causal variants was drawn uniformly from [0, 0.5], [0.01, 0.05], or [0.05, 0.5], to consider genetic architectures that are either infinitesimal or sparse, as well as architectures that include a mixture of common and rare SNPs and ones that include only common SNPs. We generated 100 sets of simulated phenotypes for each setting of parameters. In experiments to assess the false positive rate, the additive heritability was set to 0.5 while the dominance heritability was set to 0. Let $\hat{h}^2_{D,i}$ be the estimate of $h^2_D$ while $\widehat{SE}_i$ is the jackknife estimate of its standard error on the ith replicate for $i \in \{1, \dots, 100\}$. We computed the p value of the two-tailed test of the null hypothesis of no dominance heritability on the ith replicate from the Z score defined as $Z_i = \hat{h}^2_{D,i} / \widehat{SE}_i$.
To test the bias of the estimator, for every simulation setting, we first compute the mean $\bar{h}^2$ of the estimates and its standard error $SE(\bar{h}^2)$ from all replicates, and then report p values of the two-tailed test of no bias from the Z score defined as $Z = (\bar{h}^2 - h^2) / SE(\bar{h}^2)$, where $h^2$ is the true simulated value.
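The simulation design can be sketched in code. Since the exact form of Equation 7 is not recoverable from this copy, the sketch assumes per-SNP effect variances proportional to c_m · w_m^b · [f_m(1 − f_m)]^a, normalized to sum to the target heritability, with normally distributed effects and noise. The function names `per_snp_variances` and `simulate_phenotype` and all argument names are hypothetical.

```python
import numpy as np

def per_snp_variances(causal, maf, ldak_w, h2, a, b):
    """Effect variances proportional to c_m * w_m^b * [f_m (1 - f_m)]^a, summing to h2."""
    raw = causal * ldak_w ** b * (maf * (1.0 - maf)) ** a
    total = raw.sum()
    if total == 0:
        return np.zeros_like(raw, dtype=float)
    return h2 * raw / total

def simulate_phenotype(XA, XD, maf, ldak_w, h2_a, h2_d,
                       a=0.0, b=0.0, p_causal_a=1.0, p_causal_d=1.0, seed=0):
    """Draw one phenotype from standardized additive (XA) and dominance (XD) encodings."""
    rng = np.random.default_rng(seed)
    N, M = XA.shape
    # Causal indicators are drawn independently for the two components (an assumption).
    causal_a = (rng.random(M) < p_causal_a).astype(float)
    causal_d = (rng.random(M) < p_causal_d).astype(float)
    var_a = per_snp_variances(causal_a, maf, ldak_w, h2_a, a, b)
    var_d = per_snp_variances(causal_d, maf, ldak_w, h2_d, a, b)
    beta = rng.standard_normal(M) * np.sqrt(var_a)
    alpha = rng.standard_normal(M) * np.sqrt(var_d)
    noise = rng.standard_normal(N) * np.sqrt(1.0 - h2_a - h2_d)
    return XA @ beta + XD @ alpha + noise
```

Setting `a = b = 0` corresponds to the GCTA model and `a = 0.75, b = 1` to the LDAK model as described above; setting `h2_d = 0` gives the null simulations used to assess the false positive rate.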

Power

To assess the power of our method to detect dominance heritability, we considered simulations under different genetic architectures with a non-zero dominance heritability. Across 16 different genetic architectures, we vary the additive and dominance heritabilities and the proportion of causal dominance variants. We simulated 100 replicates for every genetic architecture. Let $\hat{h}^2_{D,i}$ be the estimate of $h^2_D$ while $\widehat{SE}_i$ is the jackknife estimate of its standard error on the ith replicate for $i \in \{1, \dots, 100\}$. We computed the p value of a test of the null hypothesis of no dominance heritability on the ith replicate from the Z score defined as $Z_i = \hat{h}^2_{D,i} / \widehat{SE}_i$. Finally, we reported the percentage of replicates with p value < t as the power of our method on a given simulated genetic architecture for a p value threshold of t.
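The power calculation above amounts to counting how many replicates reject the null. A minimal sketch, assuming a two-sided normal test on the Z score (estimate divided by its jackknife SE); the function names `two_sided_p` and `rejection_rate` are hypothetical.

```python
import math

def two_sided_p(estimate, se):
    """Two-sided p value for H0: parameter = 0, using a normal approximation."""
    z = estimate / se
    # 2 * (1 - Phi(|z|)) written with the complementary error function.
    return math.erfc(abs(z) / math.sqrt(2.0))

def rejection_rate(estimates, ses, t):
    """Fraction of replicates whose p value falls below the threshold t."""
    pvals = [two_sided_p(e, s) for e, s in zip(estimates, ses)]
    return sum(p < t for p in pvals) / len(pvals)
```

The same function gives the false positive rate when applied to replicates simulated with no dominance heritability, and power when applied to replicates with non-zero dominance heritability.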

UK Biobank data

We applied our method to UK Biobank data. We restricted our study to individuals of self-reported white British ancestry who are more distantly related than third-degree relatives, defined as pairs of individuals with kinship coefficient < 1/2^(9/2). Furthermore, we removed individuals who are outliers for genotype heterozygosity and/or missingness. We removed SNPs with greater than 1% missingness or minor allele frequency <1%, as well as SNPs that fail the test of Hardy-Weinberg equilibrium at significance threshold 10⁻⁷. Finally, we obtained a set of N = 291,273 individuals and M = 459,792 SNPs to use in the real data analyses. We included age, sex, and the top 20 genetic principal components (PCs) as covariates in our analysis for all traits. We used PCs precomputed by the UK Biobank from a superset of 488,295 individuals. Additional covariates were used for waist-to-hip ratio (adjusted for BMI) and diastolic/systolic blood pressure (adjusted for cholesterol-lowering medication, blood pressure medication, insulin, hormone replacement therapy, and oral contraceptives). Further, we also analyzed M = 4,824,392 imputed SNPs with MAF > 1% (excluding SNPs with missingness >1% and SNPs that fail the Hardy-Weinberg test at significance threshold 10⁻⁷) across N = 291,273 unrelated white British individuals.

Results

Accuracy of estimates of dominance heritability in simulations

Previous studies estimate a relatively small contribution of dominance heritability for complex traits, so we wanted to test the false positive rate of a test of the hypothesis of no dominance heritability. To assess the false positive rate of our method, we performed simulations in the absence of dominance deviation effects (M = 459,792 SNPs, N = 291,273 individuals). Since additive SNP effects tend to vary as a function of MAF and LD patterns at the SNP, and SNP heritability estimates tend to be sensitive to these assumptions, we simulated phenotypes according to 16 MAF- and LD-dependent architectures by varying the additive heritability, the proportion of variants that have non-zero effects (causal variants or CVs), the distribution of causal variants across minor allele frequencies (CVs distributed across all minor allele frequency bins or CVs restricted to either common or low-frequency bins), and the form of coupling between the SNP effect size and MAF as well as LD. The key parameter in applying RHE-mc is the number of random vectors B. We performed a set of experiments to explore the choice of B. We simulated 100 phenotypes based on M = 459,792 array SNPs and N = 291,273 individuals with p(A) = 1 and p(D) = 0.02. We observe that the Pearson's correlation coefficients (r) between estimates with B = 10 and estimates with B = 100 are 0.94 (additive heritability) and 0.91 (dominance heritability). Therefore, B = 10 is sufficient for the applications considered (Figure S2). To obtain unbiased estimates, we also do not constrain the estimates of the variance components (allowing for negative estimates).
Recent studies have shown that methods that fit a single additive variance component yield biased estimates of SNP heritability due to the LD- and MAF-dependent architecture of complex traits, while models that allow for SNP effects to vary with MAF and LD obtain relatively unbiased estimates. Thus, we ran our method using 24 bins for additive effects (based on 6 MAF and 4 LD bins) and a single bin for dominance deviation effects (although our method allows for fitting multiple dominance bins). Across the range of genetic architectures, we obtained accurate estimates of $h^2_A$ when we jointly fit additive and dominance heritability: biases range from −2 × 10⁻³ to 2 × 10⁻³ (Figure 1). We also obtain unbiased estimates of $h^2_D$ with biases ranging from −5 × 10⁻⁵ to 6 × 10⁻⁴ (Figure 1). Importantly, the false positive rate of rejecting the null hypothesis of no dominance heritability across 16 diverse genetic architectures is controlled at level 0.05 (see Table 1). We performed additional simulations that demonstrate accurate heritability estimates for a smaller sample size of N = 10,000 individuals (Figure S1 and Table S1).

Figure 1

The accuracy of estimates of dominance and additive heritabilities in simulations with no dominance heritability

We plot estimates from our method in the absence of dominance deviation effects under 16 different genetic architectures (N = 291,273 unrelated individuals, M = 459,792 array SNPs). We varied the MAF range of causal variants (MAF of CV), the coupling of MAF with effect size (a), and the effect of local LD on effect size (b = 0 indicates no LDAK weights and b = 1 indicates LDAK weights). We ran 100 replicates where the true additive and dominance heritabilities of the phenotype are 0.5 and 0.0, respectively. We ran our method using a single dominance bin and 24 additive bins formed by the combination of 6 bins based on MAF as well as 4 bins based on quartiles of the LDAK score of a SNP. Black points and error bars represent the mean and ±2 SE, respectively. Each boxplot represents estimates from 100 simulations. Boxplot whiskers extend to the minimum and maximum estimates located within 1.5× interquartile range (IQR) from the first and third quartiles, respectively.

Table 1

Calibration of tests of dominance heritability

Genetic architecture			P(rejection at p < t)		Test of bias
Proportion of causal SNPs	MAF of causal SNPs	MAF and LD coupling	t = 0.05	t = 10⁻³	p value
0.01	[0.01,0.05]	a = b = 0	6%	0%	0.192
0.01	[0.01,0.05]	a = 0,b = 1	5%	0%	0.006
0.01	[0.01,0.05]	a = 0.75,b = 0	6%	1%	0.011
0.01	[0.01,0.05]	a = 0.75,b = 1	8%	0%	0.187
0.01	[0.0,0.5]	a = b = 0	4%	0%	0.388
0.01	[0.0,0.5]	a = 0,b = 1	8%	0%	0.415
0.01	[0.0,0.5]	a = 0.75,b = 0	4%	0%	0.593
0.01	[0.0,0.5]	a = 0.75,b = 1	2%	0%	0.367
0.01	[0.05,0.5]	a = b = 0	7%	0%	0.046
0.01	[0.05,0.5]	a = 0,b = 1	4%	0%	0.813
0.01	[0.05,0.5]	a = 0.75,b = 0	6%	1%	0.105
0.01	[0.05,0.5]	a = 0.75,b = 1	1%	0%	0.855
1.0	[0.0,0.5]	a = b = 0	2%	0%	0.196
1.0	[0.0,0.5]	a = 0,b = 1	5%	0%	0.298
1.0	[0.0,0.5]	a = 0.75,b = 0	7%	0%	0.522
1.0	[0.0,0.5]	a = 0.75,b = 1	2%	0%	0.130

We assess the false positive rate of tests of dominance heritability based on our method in the absence of dominance deviation effects under 16 different genetic architectures. We varied the MAF range of causal variants, the coupling of MAF with effect size (a), and the effect of local LD on effect size (b = 0 indicates no LDAK weights and b = 1 indicates LDAK weights). Probability of rejection is computed from 100 replicates. We report the p value of a test of the null hypothesis of no bias in the estimates of $h^2_D$.

Next, we considered simulations under genetic architectures with a non-zero dominance heritability. We evaluated the accuracy of additive and dominance heritability estimates across 16 different genetic architectures where we vary the additive and dominance heritabilities and proportion of causal dominance variants.
We ran our method using 24 bins for additive effects (based on 6 MAF and 4 LD bins) and a single bin for dominance deviation effects. We obtained accurate estimates of $h^2_A$ and $h^2_D$ when we jointly fit additive and dominance heritability: biases range from −1.6 × 10⁻³ to 2.7 × 10⁻⁴ for dominance heritability, while the biases range from −2.3 × 10⁻³ to 1.4 × 10⁻⁴ for additive heritability (Figure 2).

Figure 2

The accuracy of estimates of dominance and additive heritabilities in simulations with non-zero dominance heritability

We plot estimates from our method under 16 different genetic architectures (N = 291,273 unrelated individuals, M = 459,792 array SNPs). We varied the additive heritability $h^2_A$, the dominance heritability $h^2_D$, and the proportion of dominance causal variants (causal ratio). Black points and error bars represent the mean and ±2 SE, respectively. Each boxplot represents estimates from 100 simulations. Boxplot whiskers extend to the minimum and maximum estimates located within 1.5× interquartile range (IQR) from the first and third quartiles, respectively.

In addition, we observe high power (>95% for a p value threshold of 0.05) to detect dominance heritability as low as 1% in a sample size of N = 291,273 (Table 2). A more realistic assessment of power would consider the multiple testing burden incurred when testing a collection of phenotypes with the goal of discovering traits with significant dominance heritability. Assuming we test 50 phenotypes (matching our analyses of the UK Biobank), we estimate 100% power to detect $h^2_D = 0.02$ and >50% power to detect $h^2_D = 0.01$ in a sample of N = 291,273 individuals (p < 0.05/50).

Table 2

Accuracy and power to detect dominance heritability in simulations

Genetic architecture		Power		$\hat{h}^2_D$		Test of bias
Additive component	Dominance component	t = 0.05	t = 10⁻³	Mean	SE	p value
$p_{causal}(A)=1$, $h^2_A=0.5$	$p_{causal}(D)=1$, $h^2_D=0.05$	100%	100%	0.05	0.003	0.432
$p_{causal}(A)=1$, $h^2_A=0.5$	$p_{causal}(D)=0.01$, $h^2_D=0.05$	100%	100%	0.049	0.003	0.596
$p_{causal}(A)=1$, $h^2_A=0.5$	$p_{causal}(D)=1$, $h^2_D=0.02$	100%	100%	0.02	0.002	0.351
$p_{causal}(A)=1$, $h^2_A=0.5$	$p_{causal}(D)=0.01$, $h^2_D=0.02$	100%	100%	0.02	0.002	0.869
$p_{causal}(A)=1$, $h^2_A=0.5$	$p_{causal}(D)=1$, $h^2_D=0.01$	97%	68%	0.01	0.002	0.901
$p_{causal}(A)=1$, $h^2_A=0.5$	$p_{causal}(D)=0.01$, $h^2_D=0.01$	98%	67%	0.0099	0.002	0.730
$p_{causal}(A)=1$, $h^2_A=0.5$	$p_{causal}(D)=1$, $h^2_D=0.002$	11%	2%	0.0018	0.0025	0.738
$p_{causal}(A)=1$, $h^2_A=0.5$	$p_{causal}(D)=0.01$, $h^2_D=0.002$	10%	1%	0.0019	0.0027	0.590
$p_{causal}(A)=1$, $h^2_A=0.25$	$p_{causal}(D)=1$, $h^2_D=0.05$	100%	100%	0.049	0.003	0.434
$p_{causal}(A)=1$, $h^2_A=0.25$	$p_{causal}(D)=0.01$, $h^2_D=0.05$	100%	100%	0.048	0.003	2.5e-06
$p_{causal}(A)=1$, $h^2_A=0.25$	$p_{causal}(D)=1$, $h^2_D=0.02$	100%	100%	0.02	0.002	0.889
$p_{causal}(A)=1$, $h^2_A=0.25$	$p_{causal}(D)=0.01$, $h^2_D=0.02$	100%	100%	0.02	0.002	0.476
$p_{causal}(A)=1$, $h^2_A=0.25$	$p_{causal}(D)=1$, $h^2_D=0.01$	93%	73%	0.01	0.002	0.744
$p_{causal}(A)=1$, $h^2_A=0.25$	$p_{causal}(D)=0.01$, $h^2_D=0.01$	93%	66%	0.0098	0.002	0.632
$p_{causal}(A)=1$, $h^2_A=0.25$	$p_{causal}(D)=1$, $h^2_D=0.002$	9%	0%	0.0017	0.0024	0.373
$p_{causal}(A)=1$, $h^2_A=0.25$	$p_{causal}(D)=0.01$, $h^2_D=0.002$	12%	1%	0.0017	0.0026	0.292

We assess power, bias, and SE of our method in the presence of dominance and additive heritability under 16 different genetic architectures (N = 291,273 unrelated individuals, M = 459,792 array SNPs). Power, mean, and SE are computed from 100 replicates. We report the p value of a test of the null hypothesis of no bias in the estimates of $h^2_D$. Here, $p_{causal}(A)$ and $p_{causal}(D)$ denote the proportions of additive and dominance causal variants, respectively. $h^2_A$ and $h^2_D$ denote the total additive and dominance heritabilities. For both components we assumed the GCTA model, which is defined as setting a = b = 0 in Equation 7. Power is reported for p value thresholds of t = 0.05 and t = 10⁻³.

We performed simulations to compare the accuracy of RHE-mc to that of REML and HE regression as implemented in the GCTA software. For computational reasons, we simulated phenotypes from a subsampled set of 10,000 genotypes across M = 459,792 array SNPs from the UK Biobank data. We simulated 100 phenotypes with p(A) = 1 and p(D) = 0.05. All three methods obtain unbiased estimates of additive and dominance heritability. The standard error of RHE-mc is 3% and 12% larger than that of REML (GCTA) for additive and dominance heritability, respectively. The standard error of RHE-mc is the same as that of HE (GCTA) for additive heritability and 3% smaller than that of HE (GCTA) for dominance heritability (Table S8). Further, we evaluated the accuracy of the jackknife estimate of standard error in simulations (N = 291,273 unrelated individuals, M = 459,792 array SNPs) across diverse genetic architectures. We observe that the jackknife SE yields estimates with relative bias −1.7% on average over 13 genetic architectures (Table S5). Finally, we performed experiments to measure the extent to which we are able to capture additive and dominance variation of causal SNPs when only a subset of causal SNPs are observed due to imperfect tagging.
In the first set of experiments, we simulated phenotypes based on array SNPs (N = 291,273 unrelated individuals, M = 459,792 array SNPs) where the proportion of causal variants in the additive component is varied between 1% and 100% while the proportion of causal variants in the dominance variance component is set to 1%. We ran RHE-mc on genotypes with varying proportions of observed causal SNPs. While estimates of additive heritability remain relatively unbiased, estimates of dominance heritability are biased downward with the magnitude of the bias being proportional to the percentage of observed causal SNPs (Table S6). These experiments suggest that dominance heritability is more sensitive to imperfect tagging than additive heritability (although this sensitivity might also be partly explained by the smaller magnitudes of the dominance heritability in our simulations). To further explore this issue, we repeated this experiment using M = 4,824,392 imputed genotypes with MAF > 1% with the same genetic architecture used in the analysis of array SNPs. We observe that both additive and dominance heritability estimates are relatively unbiased even when the percentage of observed causal SNPs is as low as 0% (Table S7). These observations likely reflect the better tagging of SNPs that encode additive and dominance genotypes in the imputed data.

Estimates of additive and dominance deviation effects in the UK Biobank

We applied our method to estimate additive and dominance heritability for 50 quantitative traits in the UK Biobank by partitioning the additive component into 8 bins (based on two MAF bins [MAF ≤ 0.05, MAF > 0.05] and quartiles of the LD-scores) and a single dominance bin. We restricted our analysis to N = 291,273 unrelated white British individuals and M = 459,792 SNPs (MAF > 1%) that were present in the UK Biobank Axiom array. Further, we chose a subset of 50 traits out of a total of 57 traits that have evidence for non-zero additive heritability (Z score > 3). Across the 50 traits, we observe that the average additive heritability is 21.86% (standard deviation of 9.21% across traits) (Figure 3). On the other hand, we estimate average dominance heritability to be 0.13% (SD = 0.39%). On average, we observe that dominance heritability is about 0.48% of additive heritability. We find no evidence for traits that have statistically significant non-zero dominance heritability after correcting for multiple testing (p < 0.05/50).

Figure 3

Estimates of additive and dominance heritability from array SNPs for 50 quantitative phenotypes in the UK Biobank

We ran our method partitioning the additive component into 8 bins defined based on two MAF bins (MAF ≤ 0.05, MAF > 0.05) and quartiles of the LD-scores and a single dominance bin. We summarize the estimates of additive and dominance heritability across the 50 phenotypes (N = 291,273 unrelated white British individuals, M = 459,792 common array SNPs [MAF > 1%]). Black error bars in (A) mark ±2 standard errors centered on the estimated heritability. In (B) and (C) we plot the histograms of $\hat{h}^2_A$ and $\hat{h}^2_D$, respectively. Point estimates and SEs are reported in Table S2.

Applying our method with a single additive component (no MAF/LD partitioning), we obtain an average $\hat{h}^2_A$ (SD = 12.14%) and average $\hat{h}^2_D$ (SD = 0.42%) across 50 traits with no evidence for statistically significant non-zero $h^2_D$ (Table S4). To assess the effect of population stratification on our results, we repeated our analyses retaining the first 10 PCs and 40 PCs. While our original results with the first 20 PCs suggested that average $\hat{h}^2_D$ = 0.13% (SD = 0.39%), we observe average $\hat{h}^2_D$ (SD = 0.38%) with the first 10 PCs and average $\hat{h}^2_D$ (SD = 0.42%) with the first 40 PCs. Across these analyses, none of the traits show evidence for non-zero estimates that are statistically significant (Table S9). To explore the impact of imperfect tagging of causal variants on our results, we analyzed M = 4,824,392 imputed genotypes with MAF > 1%. We observed average $\hat{h}^2_A$ = 22.83% (SD = 9.49%) across the 50 traits (Figure 4; Pearson's correlation between the point estimates of $h^2_A$ across array and imputed genotypes is 0.998) with no statistically significant differences between the estimates. On the imputed genotypes, we estimated average $\hat{h}^2_D$ to be 0.06% (SD = 0.19%) with the dominance heritability being about 0.47% of additive heritability.
We also did not observe any statistically significant differences between the dominance heritability estimates across array and imputed genotypes, suggesting that imperfect tagging of common causal SNPs (MAF > 1%) is unlikely to explain our results. Although we did not find evidence for statistically significant non-zero dominance heritability after correcting for multiple testing, we found suggestive evidence for non-zero dominance heritability for blood biochemistry traits: aspartate, basal metabolic rate, blood reticulocyte count, glucose, and calcium (p < 0.05).
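
The distinction drawn above between suggestive evidence (p < 0.05) and significance after multiple-testing correction (p < 0.05/50) can be illustrated with a small calculation. The sketch below assumes a one-sided Wald z-test of the estimate against zero, which is an assumption made here for illustration; the text does not state which test statistic was used. The function names and the example numbers are hypothetical and are not taken from the supplementary tables.

```python
import math


def wald_p_value(estimate, se):
    """One-sided p-value for H0: heritability = 0 vs H1: heritability > 0.

    Assumes the estimate is approximately normal with standard error `se`
    (an illustrative assumption, not the paper's stated procedure).
    """
    z = estimate / se
    # Upper-tail probability of a standard normal: P(Z > z).
    return 0.5 * math.erfc(z / math.sqrt(2))


def is_significant(estimate, se, n_traits=50, alpha=0.05):
    """Apply a Bonferroni threshold of alpha / n_traits (0.05/50 in the text)."""
    return wald_p_value(estimate, se) < alpha / n_traits


# Hypothetical trait: dominance heritability estimate of 0.4% with SE 0.2%.
p = wald_p_value(0.004, 0.002)
print(f"p = {p:.4f}, nominal: {p < 0.05}, Bonferroni: {is_significant(0.004, 0.002)}")
```

Under these assumptions, an estimate two standard errors above zero is nominally significant but falls well short of the corrected threshold of 0.001, which is the pattern described for the suggestive traits.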

Figure 4

Estimates of additive and dominance heritability from imputed SNPs for 50 quantitative phenotypes in the UK Biobank

We ran our method, partitioning the additive component into 8 bins (two MAF bins [MAF ≤ 0.05, MAF > 0.05] crossed with quartiles of the LD scores) alongside a single dominance bin. We summarize the estimates of additive and dominance heritability across the 50 phenotypes (N = 291,273 unrelated white British individuals, M = 4,824,392 common imputed SNPs [MAF > 1%]). Black error bars in (A) mark ±2 standard errors centered on the estimated heritability. In (B) and (C) we plot the histograms of the additive and dominance heritability estimates, respectively. Point estimates and SEs are reported in Table S3.
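
The 8 additive bins described in the caption (two MAF bins crossed with LD-score quartiles) could be assigned along the following lines. This is only a sketch: it assumes the quartile cut points are computed over all SNPs rather than within each MAF bin, which the text does not specify, and `assign_additive_bins` and the toy inputs are hypothetical.

```python
import bisect
import statistics

MAF_THRESHOLD = 0.05  # from the caption: MAF <= 0.05 versus MAF > 0.05


def assign_additive_bins(mafs, ld_scores):
    """Return a bin index in 0..7 for each SNP.

    Bins 0-3 hold SNPs with MAF <= 0.05 and bins 4-7 hold SNPs with
    MAF > 0.05; within each group the index increases with the LD-score
    quartile. Quartiles are taken over all SNPs (an assumption).
    """
    # Three cut points that split the LD scores into four quartiles.
    cuts = statistics.quantiles(ld_scores, n=4)
    bins = []
    for maf, ld in zip(mafs, ld_scores):
        maf_bin = 0 if maf <= MAF_THRESHOLD else 1
        ld_bin = bisect.bisect_left(cuts, ld)  # 0, 1, 2 or 3
        bins.append(maf_bin * 4 + ld_bin)
    return bins


# Toy example with eight SNPs (values are illustrative only).
toy_mafs = [0.02, 0.03, 0.04, 0.05, 0.10, 0.20, 0.30, 0.40]
toy_ld_scores = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
print(assign_additive_bins(toy_mafs, toy_ld_scores))
```

Each additive bin would then receive its own variance component, with all SNPs sharing the single dominance component.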


Discussion

The contribution of non-additive genetic effects to complex trait variation has been intensely debated. Here, we have extended our previously developed variance components method to jointly estimate multiple additive and dominance variance components on biobank-scale genotype-trait data. We find that our method accurately estimates additive and dominance heritability across a range of MAF- and LD-dependent genetic architectures. While tests for the existence of a dominance component have well-controlled false positive rates, our method has high power to detect dominance components with sufficiently large dominance heritability in a sample of unrelated individuals. In application to 50 quantitative traits in the UK Biobank with genotypes measured across 459,792 array SNPs (MAF > 1%) as well as genotypes measured across 4,824,392 imputed SNPs (MAF > 1%), we observe substantial additive heritability (21.86% on average for array SNPs, 22.83% on average for imputed SNPs). On the other hand, estimates of dominance heritability tend to be low (0.13% for array and 0.06% for imputed SNPs), and we do not find any trait with statistically significant evidence of dominance heritability. While a previous study estimated a 3% dominance heritability (point estimate averaged across 79 traits), we estimate a dominance heritability of 0.13% (point estimate averaged across 50 traits). The differences in the point estimates could be due to differences in the set of phenotypes and individuals analyzed as well as in the methodology used. However, our results are concordant with Zhu et al. in that we find no statistically significant estimate of dominance heritability across the traits analyzed. Further, Zhu et al. analyzed 7,000 individuals, which leads to larger SEs than our results based on 300K individuals. The authors of Zhu et al. note that the power to estimate a dominance heritability of 0.05 with a sample size of 7,000 is only about 12%. 
On the other hand, our power calculations indicate that dominance heritability is unlikely to be larger than 1% for the traits analyzed. Taken together, our results suggest that systematic identification of dominance heritability will require analysis of even larger samples than the one we analyzed here. While the growth of biobank-scale datasets will facilitate such estimates, such analyses will also require the development of novel methods that can analyze data at scale.

We discuss several limitations of our study as well as directions for future work. First, the analysis of dominance variance that we have undertaken relies on a specific encoding of dominance and additive effects that leads to uncorrelated components. Due to the choice of this representation, the additive variance component that we estimate includes a contribution from dominant genetic effects, while the dominance variance component quantifies the extra genetic variance that can be explained by dominance deviation on top of the additive-only model. Alternative encodings might be associated with different statistical and biological interpretations. Second, while our analysis has focused primarily on common SNPs (MAF > 1%), previous work has shown that, under imperfect tagging, dominance deviation effects tend to decay faster than additive effects, leading to a larger bias in estimates of these effects. The concordance of our results across array and imputed genotypes suggests that our estimates of dominance heritability attributed to common SNPs are likely to be robust, although we would still underestimate the contribution from low-frequency SNPs. The scalability of our method allows for the exploration of alternative encodings and low-frequency variants at scale. Finally, while our current work focuses on quantitative traits, methods that have previously been proposed to estimate heritability in case-control studies can be extended to estimate dominance heritability for binary traits.
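
The first limitation above turns on an encoding in which the additive and dominance components are uncorrelated. The discussion does not spell out that encoding, so the sketch below assumes the commonly used dominance-deviation coding in which genotypes with 0, 1, and 2 copies of an allele with frequency p are coded 0, 2p, and 4p − 2, and it assumes Hardy-Weinberg equilibrium. All function names are hypothetical. Exact rational arithmetic is used so that the zero covariance is not obscured by floating-point rounding.

```python
from fractions import Fraction


def hwe_genotype_probs(p):
    """P(x = 0, 1, 2 copies of the counted allele) under Hardy-Weinberg equilibrium."""
    q = 1 - p
    return [q * q, 2 * p * q, p * p]


def additive_code(p):
    """Centered additive coding x - 2p for x = 0, 1, 2."""
    return [x - 2 * p for x in (0, 1, 2)]


def dominance_code(p):
    """Centered dominance-deviation coding (assumed): {0, 2p, 4p - 2} minus its mean 2p^2."""
    raw = [0, 2 * p, 4 * p - 2]
    return [r - 2 * p * p for r in raw]


def expectation(values, probs):
    """Expected value of a genotype-indexed quantity."""
    return sum(v * w for v, w in zip(values, probs))


def covariance(a, b, probs):
    """Covariance of two genotype-indexed codings under the given genotype probabilities."""
    mean_a = expectation(a, probs)
    mean_b = expectation(b, probs)
    return expectation([(x - mean_a) * (y - mean_b) for x, y in zip(a, b)], probs)


# The covariance between the two codings is printed for a few allele frequencies.
for p in (Fraction(1, 100), Fraction(1, 20), Fraction(3, 10), Fraction(1, 2)):
    probs = hwe_genotype_probs(p)
    print(p, covariance(additive_code(p), dominance_code(p), probs))
```

Under these assumptions the dominance coding has mean zero and variance (2pq)², so dividing by 2pq standardizes it in the same way that dividing the additive coding by the square root of 2pq does. Because the two codings are uncorrelated, any variance attributed to the dominance component is variance left unexplained by the additive fit, which matches the interpretation given in the text.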

Declaration of interests

The authors declare no competing interests.

22 in total

Review 1. On epistasis: why it is unimportant in polygenic directional selection.

Authors: James F Crow
Journal: Philos Trans R Soc Lond B Biol Sci Date: 2010-04-27 Impact factor: 6.237

2. Dominance genetic variation contributes little to the missing heritability for human complex traits.

Authors: Zhihong Zhu; Andrew Bakshi; Anna A E Vinkhuyzen; Gibran Hemani; Sang Hong Lee; Ilja M Nolte; Jana V van Vliet-Ostaptchouk; Harold Snieder; Tonu Esko; Lili Milani; Reedik Mägi; Andres Metspalu; William G Hill; Bruce S Weir; Michael E Goddard; Peter M Visscher; Jian Yang
Journal: Am J Hum Genet Date: 2015-02-12 Impact factor: 11.025

3. Measuring missing heritability: inferring the contribution of common variants.

Authors: David Golan; Eric S Lander; Saharon Rosset
Journal: Proc Natl Acad Sci U S A Date: 2014-11-24 Impact factor: 11.205

4. Estimating SNP-Based Heritability and Genetic Correlation in Case-Control Studies Directly and with Summary Statistics.

Authors: Omer Weissbrod; Jonathan Flint; Saharon Rosset
Journal: Am J Hum Genet Date: 2018-07-05 Impact factor: 11.025

5. Age at menarche as a fitness trait: nonadditive genetic variance detected in a large twin sample.

Authors: S A Treloar; N G Martin
Journal: Am J Hum Genet Date: 1990-07 Impact factor: 11.025

6. Influence of gene interaction on complex trait variation with multilocus models.

Authors: Asko Mäki-Tanila; William G Hill
Journal: Genetics Date: 2014-07-01 Impact factor: 4.562

7. Epistasis and its contribution to genetic variance components.

Authors: J M Cheverud; E J Routman
Journal: Genetics Date: 1995-03 Impact factor: 4.562

8. Efficient variance components analysis across millions of genomes.

Authors: Ali Pazokitoroudi; Yue Wu; Kathryn S Burch; Kangcheng Hou; Aaron Zhou; Bogdan Pasaniuc; Sriram Sankararaman
Journal: Nat Commun Date: 2020-08-11 Impact factor: 14.919

9. Functional architecture of low-frequency variants highlights strength of negative selection across coding and non-coding annotations.

Authors: Steven Gazal; Po-Ru Loh; Hilary K Finucane; Andrea Ganna; Armin Schoech; Shamil Sunyaev; Alkes L Price
Journal: Nat Genet Date: 2018-10-08 Impact factor: 38.330

10. The UK Biobank resource with deep phenotyping and genomic data.

Authors: Clare Bycroft; Colin Freeman; Desislava Petkova; Gavin Band; Lloyd T Elliott; Kevin Sharp; Allan Motyer; Damjan Vukcevic; Olivier Delaneau; Jared O'Connell; Adrian Cortes; Samantha Welsh; Alan Young; Mark Effingham; Gil McVean; Stephen Leslie; Naomi Allen; Peter Donnelly; Jonathan Marchini
Journal: Nature Date: 2018-10-10 Impact factor: 49.962

7 in total

1. gJLS2: an R package for generalized joint location and scale analysis in X-inclusive genome-wide association studies.

Authors: Wei Q Deng; Lei Sun
Journal: G3 (Bethesda) Date: 2022-04-04 Impact factor: 3.154

Review 2. From Mendel to quantitative genetics in the genome era: the scientific legacy of W. G. Hill.

Authors: Brian Charlesworth; Michael E Goddard; Karin Meyer; Peter M Visscher; Bruce S Weir; Naomi R Wray
Journal: Nat Genet Date: 2022-07-11 Impact factor: 41.307

3. Genetic and environmental contributions to IQ in adoptive and biological families with 30-year-old offspring.

Authors: Emily A Willoughby; Matt McGue; William G Iacono; James J Lee
Journal: Intelligence Date: 2021-08-25

4. Fully exploiting SNP arrays: a systematic review on the tools to extract underlying genomic structure.

Authors: Laura Balagué-Dobón; Alejandro Cáceres; Juan R González
Journal: Brief Bioinform Date: 2022-03-10 Impact factor: 11.622

5. Polygenic prediction of educational attainment within and between families from genome-wide association analyses in 3 million individuals.

Authors: Aysu Okbay; Alexander I Young; Yeda Wu; Nancy Wang; Hariharan Jayashankar; Michael Bennett; Seyed Moeen Nehzati; Julia Sidorenko; Hyeokmoon Kweon; Grant Goldman; Tamara Gjorgjieva; Yunxuan Jiang; Barry Hicks; Chao Tian; David A Hinds; Rafael Ahlskog; Patrik K E Magnusson; Sven Oskarsson; Caroline Hayward; Archie Campbell; David J Porteous; Jeremy Freese; Pamela Herd; Chelsea Watson; Jonathan Jala; Dalton Conley; Philipp D Koellinger; Magnus Johannesson; David Laibson; Michelle N Meyer; James J Lee; Augustine Kong; Loic Yengo; David Cesarini; Patrick Turley; Peter M Visscher; Jonathan P Beauchamp; Daniel J Benjamin
Journal: Nat Genet Date: 2022-03-31 Impact factor: 41.307

6. Genome-wide variance quantitative trait locus analysis suggests small interaction effects in blood pressure traits.

Authors: Gang Shi
Journal: Sci Rep Date: 2022-07-25 Impact factor: 4.996

7. The X factor: A robust and powerful approach to X-chromosome-inclusive whole-genome association studies.

Authors: Bo Chen; Radu V Craiu; Lisa J Strug; Lei Sun
Journal: Genet Epidemiol Date: 2021-07-05 Impact factor: 2.344
