Abstract
Exploring the heritability of complex traits is a central focus of statistical genetics. Among the methods previously proposed to estimate heritability, variance component methods are advantageous when heritability is estimated from markers. Because data from genome-wide association studies (GWAS) are high-dimensional and the genetic architecture is often unknown, the most appropriate heritability estimator is often unclear. Haseman-Elston (HE) regression is a variance component method that was originally proposed only for linkage studies. This study presents a theoretical basis for a modified HE regression that models linkage disequilibrium for a quantitative trait and can therefore be used for GWAS. After replacing identity by descent (IBD) scores with identity by state (IBS) scores, we applied the IBS-based HE regression to single-marker association studies (scenario I) and used it to estimate the variance component from multiple markers (scenario II). In scenario II, we discuss the circumstances under which HE regression and the mixed linear model are equivalent; the two methods diverge when the additive variance includes a covariance component. When we extended the IBS-based HE regression to case-control studies in a subsequent simulation study, it provided a nearly unbiased estimate of heritability that was more precise than the mixed linear model estimate. Thus, for case-control studies, HE regression is preferable. The HE regression method is implemented in the freely available GEnetic Analysis Repository (GEAR; http://sourceforge.net/p/gbchen/wiki/GEAR/) software.
Keywords: GWAS; Haseman–Elston regression; REML; case-control; identity by state; missing heritability; mixed linear model; variance component
Year: 2014 PMID: 24817879 PMCID: PMC4012219 DOI: 10.3389/fgene.2014.00107
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
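The core idea of scenario II, regressing squared phenotype differences on IBS-based relatedness, can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the GEAR implementation: unrelated individuals, unlinked markers in Hardy-Weinberg equilibrium, relatedness Ω_ij taken as the average product of standardized genotype scores, and an ordinary least-squares fit over all pairs i < j, from which h² = −slope/2 for a standardized phenotype. The function names and the simulation settings (n, m, allele-frequency range, every marker causal) are hypothetical.

```python
import numpy as np


def standardize_genotypes(X):
    """Standardize 0/1/2 allele counts with sample allele frequencies.

    Monomorphic markers are dropped because their variance is zero."""
    p = X.mean(axis=0) / 2
    keep = (p > 0) & (p < 1)
    X, p = X[:, keep], p[keep]
    return (X - 2 * p) / np.sqrt(2 * p * (1 - p))


def he_regression_h2(Z, y):
    """IBS-based HE regression over all pairs i < j.

    Regresses Y_ij = (y_i - y_j)^2 on Omega_ij = sum_l z_il z_jl / m.
    With a standardized phenotype the slope is about -2 h^2."""
    n, m = Z.shape
    omega = Z @ Z.T / m                        # IBS-based relatedness matrix
    ys = (y - y.mean()) / y.std()              # standardize the phenotype
    i, j = np.triu_indices(n, k=1)             # every unordered pair once
    w = omega[i, j]
    Y = (ys[i] - ys[j]) ** 2                   # squared phenotype differences
    slope = np.cov(w, Y, bias=True)[0, 1] / np.var(w)  # least-squares slope
    return -slope / 2


def simulate(n, m, h2, rng):
    """Hypothetical data: unlinked markers in HWE, every marker causal."""
    p = rng.uniform(0.05, 0.5, size=m)         # allele frequencies
    X = rng.binomial(2, p, size=(n, m))        # genotypes coded 0/1/2
    Z = standardize_genotypes(X)
    g = Z @ rng.standard_normal(Z.shape[1])    # additive genetic values
    g *= np.sqrt(h2 / g.var())                 # genetic variance set to h2
    e = rng.standard_normal(n)
    e *= np.sqrt((1 - h2) / e.var())           # residual variance set to 1 - h2
    return Z, g + e


rng = np.random.default_rng(2014)
Z, y = simulate(n=1000, m=200, h2=0.5, rng=rng)
print(f"HE estimate of h2: {he_regression_h2(Z, y):.3f} (simulated 0.5)")
```

Fitting every pair keeps the sketch short, but the number of pairs grows as n², so this direct form only suits a few thousand individuals.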
Notation definitions.
| Symbol | Definition |
| | Allele frequencies of |
| | Linkage disequilibrium of a pair of loci, |
| ρ | |
| | The mean of the squared correlation between any marker pair, including the marker with itself. This can be estimated from the genotype data. |
| | The mean of the squared correlation between any marker and a QTL. |
| Λ | The ratio between |
| | The number of markers. |
| | The effective number of markers. See the text and Supplementary Note |
| | The genotype set for the |
| | Standardized genotype scores for the |
| | The number of QTLs. |
| | Sample size. |
| | The phenotype of the |
| | The square of the phenotype difference between the |
| Ω | The genetic relatedness between the |
| β | The additive effect of the |
| σ² | Total additive variance. |
| h² | Narrow-sense heritability. |
| σ | The square root of the additive variance of the |
| Hong23 | Expressed as |
| Subscript | Subscripts |
The joint distribution of two loci.
[Table: joint probabilities of the genotype pairs at the two loci, with marginal probabilities.]
Each cell lists the joint probability of a genotype pair at the two loci.
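Under the usual assumptions of Hardy-Weinberg equilibrium and random union of haplotypes, the joint probabilities of two-locus genotypes follow from the allele frequencies and the linkage disequilibrium D. A minimal Python sketch, with hypothetical function and argument names, that builds the 3 × 3 table of genotype-pair probabilities (genotypes coded as allele counts) by summing over ordered pairs of haplotypes:

```python
from itertools import product


def two_locus_genotype_probs(pA, pB, D):
    """Joint probabilities P(x_A, x_B) of allele counts at two loci.

    Assumes Hardy-Weinberg equilibrium with random union of haplotypes;
    D is the linkage disequilibrium between alleles A and B."""
    qA, qB = 1 - pA, 1 - pB
    hap = {(1, 1): pA * pB + D,   # haplotype AB
           (1, 0): pA * qB - D,   # haplotype Ab
           (0, 1): qA * pB - D,   # haplotype aB
           (0, 0): qA * qB + D}   # haplotype ab
    if min(hap.values()) < 0:
        raise ValueError("D is outside its admissible range for these frequencies")
    probs = {}
    for first, second in product(hap, repeat=2):       # ordered haplotype pair
        geno = (first[0] + second[0], first[1] + second[1])  # (copies of A, copies of B)
        probs[geno] = probs.get(geno, 0.0) + hap[first] * hap[second]
    return probs


for (xa, xb), pr in sorted(two_locus_genotype_probs(0.3, 0.6, 0.05).items()):
    print(xa, xb, round(pr, 4))
```

Summing over one locus recovers the Hardy-Weinberg marginal probabilities at the other, and setting D = 0 makes every cell the product of its two marginals.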
The joint distribution of the genetic relatedness between a pair of individuals.
A was set as the reference allele, with frequency p; genotypes aa, Aa, and AA were coded as 0, 1, and 2, respectively.
A reorganization of Table .
[Table: values of Ω and their frequencies.]
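The relatedness tables can be illustrated at a single locus. This is a minimal sketch, assuming two unrelated individuals in Hardy-Weinberg equilibrium, the genotype coding from the footnote above (aa = 0, Aa = 1, AA = 2, with A at frequency p), and the standardized relatedness Ω_ij = (x_i − 2p)(x_j − 2p)/[2p(1 − p)]; that scaling is the usual IBS-based definition and is an assumption here, since the paper's formula is not visible in this excerpt. The sketch lists Ω and the probability of each of the nine genotype pairs, then regroups the pairs by value of Ω.

```python
from collections import defaultdict


def single_locus_relatedness(p):
    """Relatedness Omega_ij for two unrelated individuals at one locus.

    x counts the reference allele A (frequency p): aa = 0, Aa = 1, AA = 2.
    Omega_ij = (x_i - 2p)(x_j - 2p) / (2p(1 - p)), genotypes in HWE."""
    geno_prob = {0: (1 - p) ** 2, 1: 2 * p * (1 - p), 2: p ** 2}
    cells = {}                          # (x_i, x_j) -> (Omega_ij, probability)
    by_omega = defaultdict(float)       # Omega value -> total probability
    for xi, prob_i in geno_prob.items():
        for xj, prob_j in geno_prob.items():
            omega = (xi - 2 * p) * (xj - 2 * p) / (2 * p * (1 - p))
            cells[(xi, xj)] = (omega, prob_i * prob_j)
            by_omega[round(omega, 12)] += prob_i * prob_j
    return cells, dict(by_omega)


cells, by_omega = single_locus_relatedness(0.3)
for omega, prob in sorted(by_omega.items()):
    print(f"Omega = {omega:+.3f}  probability = {prob:.4f}")
```

Each off-diagonal genotype pair shares its value of Ω with its mirror image, so the nine cells collapse to six values (fewer at p = 0.5, where a heterozygote gives Ω = 0 with any genotype). Under these assumptions Ω has mean 0 and mean square 1.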
The expected phenotype conditional on one's genotype at the observed marker.
It is assumed that the k.
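The expected phenotype for each marker genotype can be computed directly when the QTL and the marker are in LD. A minimal sketch, assuming an additive model y = μ + β·(copies of QTL allele A) + e, Hardy-Weinberg equilibrium, and random union of haplotypes, so that each of a person's two haplotypes carries A with probability P(A | B) or P(A | b) depending on the marker allele it carries. The function and argument names are hypothetical.

```python
def expected_phenotype_given_marker(pA, pB, D, beta, mu=0.0):
    """E[y | marker genotype] under y = mu + beta * (copies of QTL allele A) + e.

    The QTL (A/a) and marker (B/b) are in LD D; with random union of
    haplotypes, each haplotype carries A with probability P(A | its marker allele)."""
    qB = 1 - pB
    pA_given_B = (pA * pB + D) / pB      # P(A | B) = freq(AB) / pB
    pA_given_b = (pA * qB - D) / qB      # P(A | b) = freq(Ab) / qB
    # g = copies of marker allele B (0, 1 or 2)
    return {g: mu + beta * (g * pA_given_B + (2 - g) * pA_given_b)
            for g in (0, 1, 2)}


print(expected_phenotype_given_marker(pA=0.3, pB=0.6, D=0.05, beta=1.0))
```

Adjacent marker genotypes differ by βD/[p_B(1 − p_B)], which is β times the regression of QTL allele count on marker allele count; with D = 0 every marker genotype has the same expectation μ + 2p_Aβ.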
The joint distribution of .
Of the nine cells, symmetric cells are highlighted in the same color. In each highlighted cell, the three terms, from top to bottom, are Ω.
Summary of the derivations.
| One marker and one QTL | −4τ² | −2ρ² | |
| One marker and multiple QTLs | As above | |
| Multiple markers and multiple QTLs | −2σ² | |
For .
When the phenotype is standardized to unit variance, h² equals the total additive variance σ².
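For the one-marker, one-QTL row of this summary, the slope can be checked with a short covariance argument. This is a sketch under assumptions not all visible in this excerpt: unrelated individuals i ≠ j; marker and QTL scores z_M and z_Q standardized to mean 0 and variance 1, with correlation ρ; a phenotype y = β_k z_Q + e standardized to unit variance, with β_k² = σ_k² (the additive variance of the QTL); Ω_ij = z_{M,i} z_{M,j}; and Y_ij = (y_i − y_j)². The symbols z_M, z_Q and b_{YΩ} are introduced here for illustration. The y_i² and y_j² terms drop out because the two individuals are independent and E[z_M] = 0, and E[y z_M] = β_k E[z_Q z_M] = β_k ρ.

```latex
\begin{aligned}
\operatorname{Cov}(Y_{ij},\Omega_{ij})
  &= \operatorname{Cov}\!\left(y_i^2 + y_j^2 - 2y_iy_j,\; z_{M,i}z_{M,j}\right)
   = -2\,\mathrm{E}[y_i z_{M,i}]\,\mathrm{E}[y_j z_{M,j}]
   = -2(\beta_k\rho)^2 = -2\rho^2\sigma_k^2,\\
\operatorname{Var}(\Omega_{ij})
  &= \mathrm{E}[z_{M,i}^2]\,\mathrm{E}[z_{M,j}^2] = 1,\\
b_{Y\Omega}
  &= \frac{\operatorname{Cov}(Y_{ij},\Omega_{ij})}{\operatorname{Var}(\Omega_{ij})}
   = -2\rho^2\sigma_k^2 .
\end{aligned}
```

With σ_k² = 0.5 this gives −ρ², i.e. −0.0625, −0.25 and −0.5625 for ρ = 0.25, 0.5 and 0.75, in line with the rounded values reported for Equation (7) if those simulations used h² = 0.5 (not stated in this excerpt).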
Simulation evaluations of Equation (7).
| ρ = 0.25 | −0.062 (0.004) | −0.062 (0.0039) |
| ρ = 0.5 | −0.25 (0.004) | −0.25 (0.0039) |
| ρ = 0.75 | −0.56 (0.004) | −0.56 (0.0039) |
The standard error was calculated: . Here N = 1000 and M.
The standard errors in parentheses indicate the mean of the standard error from 100 simulation replications.
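A small NumPy simulation in the spirit of this table: one QTL and one marker, both with allele frequency 0.5 and LD correlation ρ, a standardized phenotype with h² = 0.5 explained entirely by the QTL, and the HE slope obtained by regressing (y_i − y_j)² on Ω_ij = z_i z_j at the marker over all pairs. All settings and names are hypothetical, and the comparison value −2ρ²h² assumes the one-marker, one-QTL slope −2ρ²σ² from the summary table with σ² = h².

```python
import numpy as np


def simulate_marker_qtl(n, rho, h2, rng):
    """Hypothetical data: one QTL (A/a) and one marker (B/b), both with allele
    frequency 0.5 and LD correlation rho; y = sqrt(h2) * z_QTL + noise."""
    D = rho * 0.25                     # D = rho * sqrt(pA qA pB qB) = rho / 4
    hap_freq = [0.25 + D, 0.25 - D, 0.25 - D, 0.25 + D]   # AB, Ab, aB, ab
    haps = rng.choice(4, size=(n, 2), p=hap_freq)       # two haplotypes each
    x_qtl = np.isin(haps, [0, 1]).sum(axis=1)          # copies of A
    x_mrk = np.isin(haps, [0, 2]).sum(axis=1)          # copies of B
    z_qtl = (x_qtl - x_qtl.mean()) / x_qtl.std()
    z_mrk = (x_mrk - x_mrk.mean()) / x_mrk.std()
    y = np.sqrt(h2) * z_qtl + np.sqrt(1 - h2) * rng.standard_normal(n)
    return z_mrk, (y - y.mean()) / y.std()


def single_marker_he_slope(z_mrk, y):
    """Least-squares slope of (y_i - y_j)^2 on Omega_ij = z_i * z_j, pairs i < j."""
    i, j = np.triu_indices(len(y), k=1)
    omega = z_mrk[i] * z_mrk[j]
    Y = (y[i] - y[j]) ** 2
    return np.cov(omega, Y, bias=True)[0, 1] / np.var(omega)


rng = np.random.default_rng(7)
for rho in (0.25, 0.5, 0.75):
    z, y = simulate_marker_qtl(1000, rho, 0.5, rng)
    print(rho, round(single_marker_he_slope(z, y), 3), -2 * rho**2 * 0.5)
```

Across independent runs the printed slopes should scatter around −0.0625, −0.25 and −0.5625; the spread reflects both the regression error and the sampling variation of the realized LD in a finite sample.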
The sample size required for the single-marker HE regression to detect a QTL associated with the target marker.
| 0.005 | 33,276 | 8,319 | 3,697 |
| 0.01 | 16,638 | 4,159 | 1,849 |
| 0.025 | 6,655 | 1,664 | 739 |
| 0.05 | 3,327 | 832 | 370 |
Here the p-value cutoff was 10.
The required sample size that makes the HE regression more powerful than the conventional single-marker linear regression.
| 0.005 | 12,800 | 3,200 | 1,423 |
| 0.01 | 6,400 | 1,600 | 712 |
| 0.025 | 2,560 | 640 | 285 |
| 0.05 | 1,280 | 320 | 143 |
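The entries of this table are reproduced exactly by N = ⌈4/(ρ²q)⌉ if each row is the QTL heritability q and the three columns correspond to ρ = 0.25, 0.5 and 0.75; the column headers are missing from this excerpt, so that mapping (and the symbol q) is our inference. One route to that bound, under rough approximations rather than the paper's stated derivation: the HE slope is −2ρ²q, and treating the N²/2 pairs as independent with Var(Y_ij) = 8 for a standardized normal phenotype gives a standard error of about 4/N, hence a non-centrality of (Nρ²q/2)²; single-marker regression has a non-centrality of about Nρ²q. The two are equal at N = 4/(ρ²q). A Python sketch with a hypothetical function name:

```python
import math
from fractions import Fraction


def he_beats_single_marker_n(q, rho):
    """Smallest N at which the approximate HE non-centrality (N rho^2 q / 2)^2
    reaches the single-marker regression non-centrality N rho^2 q.

    Solving (N rho^2 q / 2)^2 >= N rho^2 q gives N >= 4 / (rho^2 q).
    Decimal strings are converted to exact Fractions so that ceil() is not
    thrown off by floating-point rounding (e.g. 4 / 0.0003125)."""
    q, rho = Fraction(q), Fraction(rho)
    return math.ceil(4 / (rho ** 2 * q))


for q in ("0.005", "0.01", "0.025", "0.05"):
    print(q, [he_beats_single_marker_n(q, r) for r in ("0.25", "0.5", "0.75")])
# 0.005 [12800, 3200, 1423]
# 0.01 [6400, 1600, 712]
# 0.025 [2560, 640, 285]
# 0.05 [1280, 320, 143]
```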
Simulation evaluations of Equation (11) and comparison between the HE regression and the mixed linear model method (Δ = 0).
| ρ = 0 | 0.5 (0.020) | 0.499 (0.020) | 0.499 (0.041) |
| ρ = 0.25 | 0.5 (0.019) | 0.500 (0.019) | 0.501 (0.042) |
| ρ = 0.5 | 0.5 (0.016) | 0.502 (0.015) | 0.491 (0.043) |
| ρ = 0.75 | 0.5 (0.011) | 0.488 (0.011) | 0.508 (0.048) |
Calculated given Δ = 0.
The standard errors in parentheses indicate the mean of the standard error from 100 simulation replications.
Simulation evaluations of Equation (11) when the covariance summation is not zero (Δ ≠ 0).
| ρ = 0 | 0.500 (0.020) | 0.497 (0.020) | 0.499 (0.041) |
| ρ = 0.25 | 0.715 (0.019) | 0.712 (0.019) | 0.414 (0.041) |
| ρ = 0.5 | 0.853 (0.015) | 0.850 (0.015) | 0.347 (0.043) |
| ρ = 0.75 | 0.878 (0.011) | 0.881 (0.011) | 0.291 (0.048) |
Calculated given Δ ≠ 0.
The standard errors in parentheses indicate the mean of the standard error from 100 simulation replications.
Figure 1. Estimation of heritability on the liability scale using the HE regression and mixed linear model methods. In each row, from left to right, the panels show case-control samples simulated under the same liability-scale heritability (h²) but with different prevalences. In each panel, the vertical axis shows the estimated liability-scale heritability (h²), and the horizontal axis shows which of the three methods (REML, non-constrained REML, and HE regression [least-squares estimate]) was used. The standard error of the mean (SEM) is shown at the top of each bar.
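Figure 1 reports estimates on the liability scale, whereas fitting HE regression or REML to 0/1 case-control status gives an estimate on the observed scale. A widely used conversion for an ascertained case-control sample is h²_l = h²_o K²(1 − K)² / [z² P(1 − P)], where K is the population prevalence, P the proportion of cases in the sample, and z the standard normal density at the liability threshold Φ⁻¹(1 − K). This excerpt does not state which transformation was used, so treat the following Python sketch (hypothetical function name, standard library only) as an assumption:

```python
from statistics import NormalDist


def observed_to_liability(h2_obs, K, P):
    """Convert an observed-scale (0/1) heritability estimate to the liability scale.

    K: population prevalence; P: proportion of cases in the sample.
    Assumes a normal liability with a threshold at the (1 - K) quantile."""
    if not (0 < K < 1 and 0 < P < 1):
        raise ValueError("K and P must lie strictly between 0 and 1")
    std_normal = NormalDist()
    t = std_normal.inv_cdf(1 - K)          # liability threshold
    z = std_normal.pdf(t)                  # normal density at the threshold
    return h2_obs * K ** 2 * (1 - K) ** 2 / (z ** 2 * P * (1 - P))


# hypothetical inputs: observed-scale estimate 0.3, prevalence 1%, half cases
print(round(observed_to_liability(0.3, K=0.01, P=0.5), 3))
```

With K = P = 0.5 the threshold is 0 and the scaling factor reduces to π/2, a convenient sanity check.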