Anthony F Herzig, Teresa Nutile, Daniela Ruggiero, Marina Ciullo, Hervé Perdry, Anne-Louise Leutenegger.
Abstract
Published estimates of dominance heritability differ between studies of human genetic isolates and studies of human outbred populations, which prompts the question of whether these differences result from particular trait architectures or from specific population structures. We analyse simulated datasets, characteristic of genetic isolates and of unrelated individuals, before analysing the isolate of Cilento for various commonly studied traits. We show that, in a population isolate, genetic relationship matrices have advantages over identity-by-descent based methods for variance decomposition, and that heritability estimates in isolates avoid the downward biases that may occur in studies of samples of unrelated individuals, irrespective of the simulated distribution of causal variants. Yet we also show that precise estimation of dominance in isolates is problematic in the presence of shared environmental effects, and that such effects should be accounted for. Nevertheless, we demonstrate how studying isolates can help determine the existence or non-existence of dominance for complex traits, and we find strong indications of non-zero dominance for low-density lipoprotein level in Cilento. Finally, we recommend future study designs to analyse trait variance decomposition from ensemble data across multiple population isolates.
Year: 2018 PMID: 30575761 PMCID: PMC6303332 DOI: 10.1038/s41598-018-36050-7
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Table 1. Published results for additive and dominance genetic variability from various study designs.
| Phenotype | Abney, McPeek, & Ober: additive | Abney, McPeek, & Ober: dominance | Pilia: additive | Pilia: dominance | Traglia: additive | Traglia: dominance | Zaitlen: additive | Zaitlen: dominance | van Dongen: additive | van Dongen: dominance | Chen: additive | Chen: dominance | Chen: additive | Chen: dominance | Zhu: additive | Zhu: dominance | Nolte: additive | Nolte: dominance |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Height | — | — | 0.77 | 0.23* | 0.78 | 0.22* | — | — | 0.81 | 0.09 | 0.77 | 0.09* | 0.62 | 0.00 | 0.48 | 0.02 | 0.49 | 0.00 |
| BMI | 0.54 | 0.00 | 0.36 | 0.32* | 0.33 | 0.17 | 0.16 | 0.09 | 0.41 | 0.37 | 0.28 | 0.41* | 0.21 | 0.02 | 0.23 | 0.15* | 0.25 | 0.02 |
| TGLY | 0.37 | 0.00 | 0.30 | 0.42* | 0.39 | 0.35* | — | — | 0.33 | 0.25 | 0.42 | 0.14 | 0.31 | 0.28* | — | — | 0.19 | 0.01 |
| HDL | 0.63 | 0.00 | 0.47 | 0.11 | 0.62 | 0.00 | 0.42 | 0.14* | 0.40 | 0.27 | 0.66 | 0.00 | 0.24 | 0.01 | 0.25 | 0.07 | 0.19 | 0.00 |
| Total Chol | — | — | 0.38 | 0.29* | 0.23 | 0.77* | — | — | 0.51 | 0.16 | 0.28 | 0.19* | 0.15 | 0.00 | 0.21 | 0.01 | 0.23 | 0.00 |
| LDL | 0.36 | 0.60* | 0.37 | 0.27* | 0.33 | 0.66* | 0.20 | 0.26* | 0.51 | 0.18 | 0.23 | 0.24* | 0.16 | 0.00 | 0.26 | 0.02 | 0.27 | 0.00 |
*Dominance estimates presented as statistically significant at the 5% level.
‘—’ Trait not studied for dominance in the article.
(1) Estimates based on estimating K and D from expected proportions of identity-by-descent (IBD) sharing coming from pedigree information.
(2) The depth of pedigree information in these studies did not allow the differentiation between a dominance model (including non-additive genetic variation) and a household model (including an effect of shared environment between siblings).
(3) The authors of this study analysed a large sample from the Icelandic population for whom extensive pedigree data were available. Matrices K and D were estimated by locating and counting stretches of IBD between pairs of individuals.
(4) This study analyses a large cohort of monozygotic and dizygotic adult twins. Standard errors are only presented for broad-sense heritability, though it is likely that the dominance estimates for all traits other than height were significantly different from zero.
(5) The authors of this study performed two separate analyses: first, a twin-based study using structural equation methods with adjustments for reported levels of time spent in a shared environment between twins; second, a study of a large sample of unrelated individuals, which included one individual from most of the twin pairs in the first analysis.
(6) Estimates based on calculating correlations between additively and non-additively coded genotypes to compute matrices K and D.
Abbreviations: BMI: Body-mass index; TGLY: Triglycerides; HDL: High-density lipoproteins; Total Chol: Total cholesterol; LDL: Low-density lipoproteins; N: Sample size.
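Note (6) describes computing K and D from additively and non-additively coded genotypes. The following is a minimal sketch of one widely used coding, in which the dominance code is orthogonal to the additive code under Hardy-Weinberg equilibrium. It is an assumption that this matches the coding used in any of the cited studies; the function name `additive_dominance_grms` and the input layout (individuals in rows, 0/1/2 allele counts in columns) are hypothetical.

```python
import numpy as np

def additive_dominance_grms(genotypes):
    """Return (K, D) from an (n_individuals, n_variants) array of 0/1/2 allele counts.

    Illustrative sketch only: frequencies are estimated from the sample itself
    and monomorphic variants are dropped.
    """
    g = np.asarray(genotypes, dtype=float)
    p = g.mean(axis=0) / 2.0              # sample allele frequency per variant
    keep = (p > 0.0) & (p < 1.0)          # monomorphic variants carry no information
    g, p = g[:, keep], p[keep]
    q = 1.0 - p

    # Additive coding: centred allele count, scaled to unit expected variance.
    x_add = (g - 2.0 * p) / np.sqrt(2.0 * p * q)

    # Dominance coding: 0, 2p, 4p-2 for genotypes 0, 1, 2, centred by 2p^2
    # and scaled by 2pq, which also gives unit expected variance.
    dom_raw = np.where(g == 0, 0.0, np.where(g == 1, 2.0 * p, 4.0 * p - 2.0))
    x_dom = (dom_raw - 2.0 * p**2) / (2.0 * p * q)

    m = g.shape[1]
    K = x_add @ x_add.T / m               # additive relationship matrix
    D = x_dom @ x_dom.T / m               # dominance relationship matrix
    return K, D
```

Both matrices are averages of per-variant cross-products, so their diagonals are close to 1 on average when genotypes are near Hardy-Weinberg proportions.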
Figure 1. Estimating heritability components in simulated populations with different structures. (a) Maximum likelihood estimates (MLEs) of the additive and dominance heritability components are presented for each simulated phenotype by vertical descending gold and ascending blue bars, respectively. The middle grey bars represent the remaining environmental variation. Each phenotype was simulated using a different number of causal variants (M) for each variance component, which corresponds to the x-axis. Causal variants are mostly rare, as they are selected completely at random (Causal Variant Scenario A). All MLEs are displayed for the 4 populations, either Isolated(N) or Outbred(N), where N denotes the sample size. Horizontal gold and blue lines indicate the values used for simulation. Matrices K and D were calculated using roughly 5.8 million frequent UK10K positions. A missing bar for either component indicates that the maximum likelihood estimate of that parameter was zero. (b) An example of one set of MLEs from panel (a) is given for the population Isolated(1444) and M = 10^5. (c) Gold and blue diamonds represent the empirical standard errors of the MLEs for a selection of values of M. Simulations were repeated 500 times.
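The simulation design in this caption (M causal variants drawn at random for each variance component, with fixed target heritabilities) can be illustrated with a toy sketch. Everything below is an assumption made for illustration, not the authors' simulation code: the function name `simulate_phenotype`, the normal effect sizes, the classical dominance deviation coding, and the rescaling of each component to its target variance are all hypothetical choices.

```python
import numpy as np

def simulate_phenotype(genotypes, m_causal, h2_add, h2_dom, rng):
    """Simulate one phenotype; returns (phenotype, additive part, dominance part).

    Assumes at least one selected causal variant is polymorphic in each set.
    """
    g = np.asarray(genotypes, dtype=float)
    n, n_variants = g.shape

    # Causal variants are drawn completely at random, separately per component.
    add_idx = rng.choice(n_variants, size=m_causal, replace=False)
    dom_idx = rng.choice(n_variants, size=m_causal, replace=False)

    # Additive value: allele counts weighted by random effects.
    add_value = g[:, add_idx] @ rng.normal(size=m_causal)

    # Dominance value: classical dominance deviations (-2p^2, 2pq, -2q^2 for
    # genotypes 0, 1, 2) weighted by random effects.
    gd = g[:, dom_idx]
    p = gd.mean(axis=0) / 2.0
    q = 1.0 - p
    dom_dev = np.where(gd == 0, -2.0 * p**2,
                       np.where(gd == 1, 2.0 * p * q, -2.0 * q**2))
    dom_value = dom_dev @ rng.normal(size=m_causal)

    noise = rng.normal(size=n)

    def rescale(v, target_var):
        # Centre and scale so the sample variance equals the target share.
        v = v - v.mean()
        return v * np.sqrt(target_var / v.var())

    a = rescale(add_value, h2_add)
    d = rescale(dom_value, h2_dom)
    e = rescale(noise, 1.0 - h2_add - h2_dom)
    return a + d + e, a, d
```

Because each component is rescaled separately, the total phenotypic variance is only approximately 1 in any one sample; the small discrepancy comes from chance covariance between the components.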
Figure 2. Heritability estimates when causal variants are non-rare. Here, phenotypes are simulated by choosing causal variants that are all non-rare, as they are selected to have MAF > 0.01 (Causal Variant Scenario B). Legends and the configuration of this plot are identical to those of Fig. 1a. Here, and for subsequent figures, we overlay the empirical standard error estimates, whose values correspond to the second y-axis on the right of the figure.
Figure 3. Effect of sample size on heritability estimates in an isolate. Estimates of the additive and dominance heritability components are compared for populations with isolate characteristics of size 1,444, 4,332, and 8,664. Phenotypes are simulated under Causal Variant Scenario A and under a single fixed heritability setting. Legends and the configuration of this plot are identical to those of Fig. 2.
Figure 4. Effect of relatedness matrix estimation method in an isolate. Here, we compare methods of estimating matrices K and D for the simulated population isolate ‘Isolated(1444)’. K and D are estimated using genetic relationship matrices (GRM), pedigree information, or true IBD-sharing (IBD). Results are displayed on a simplex governed by two parameters, the additive and dominance heritability components, each of which can range between 0 and 1. The heritability scenario used to simulate all phenotypes is marked by the triangular point in the centre of each simplex. Minimal ellipses containing 95% of the maximum likelihood estimates (MLEs) from 500 simulated phenotypes under either Causal Variant Scenario A or B (see Figs 1 and 2) are presented. Here, phenotypes are simulated from a large set of causal variants (M = 100,000).
Figure 5. Effect of shared environmental factors on heritability component estimates in an isolate. Estimates of the additive and dominance heritability components are compared under models with and without a shared environment component (model KDS and model KD, respectively). As in Fig. 4, minimal ellipses containing 95% of the maximum likelihood estimates (MLEs) from 500 simulated phenotypes are presented, but now under a different heritability setting. Matrices K and D are calculated either as genetic relationship matrices (GRMs) or from pedigree information. In the case of model KD when using pedigree information (right), all MLEs were found to lie directly on the bottom edge of the simplex, so the minimal ellipse degenerated into a line segment. Here, phenotypes are simulated from a large set of causal variants (M = 100,000).
Table 2. Maximum likelihood estimates for the contribution of each variance component considered in a linear mixed model (LMM).
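| Phenotype | GRM, model K: additive | GRM, model KD: additive | GRM, model KD: dominance | GRM, model KS: additive | GRM, model KS: shared env. | GRM, model KDS: additive | GRM, model KDS: dominance | GRM, model KDS: shared env. | Pedigree, model K: additive | Pedigree, model KD: additive | Pedigree, model KD: dominance | Pedigree, model KS: additive | Pedigree, model KS: shared env. | Pedigree, model KDS: additive | Pedigree, model KDS: dominance | Pedigree, model KDS: shared env. |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|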
| Height | 0.76 | 0.74 | 0.13 | 0.74 | 0.04 | 0.74 | 0.12 | 0.01 | 0.75 | 0.74 | 0.15 | 0.74 | 0.04 | 0.74 | 0.15 | 0.00 |
| BMI | 0.40 | 0.35 | 0.58 | 0.31 | 0.23 | 0.31 | 0.00 | 0.23 | 0.44 | 0.35 | 0.65 | 0.35 | 0.21 | 0.35 | 0.00 | 0.21 |
| TGLY | 0.27 | 0.24 | 0.26 | 0.21 | 0.11 | 0.21 | 0.00 | 0.11 | 0.28 | 0.23 | 0.45 | 0.23 | 0.11 | 0.23 | 0.41 | 0.01 |
| HDL | 0.49 | 0.49 | 0.00 | 0.44 | 0.02 | 0.44 | 0.00 | 0.02 | 0.48 | 0.49 | 0.00 | 0.48 | 0.01 | 0.48 | 0.00 | 0.01 |
| Total Chol | 0.29 | 0.23 | 0.55 | 0.23 | 0.18 | 0.22 | 0.27 | 0.12 | 0.29 | 0.21 | 0.72 | 0.22 | 0.18 | 0.21 | 0.47 | 0.06 |
| LDL | 0.32 | 0.25 | 0.52 | 0.24 | 0.17 | 0.23 | 0.29 | 0.10 | 0.33 | 0.24 | 0.66 | 0.24 | 0.16 | 0.24 | 0.45 | 0.06 |
Model names refer to the set of variance components included. K denotes the additive genetic component, D the non-additive or dominant genetic component, and S the component accounting for shared environmental effects between siblings. The previously reported results from Table 1 can be compared to our results under the model KD. Matrices K and D are calculated either as genetic relationship matrices (GRMs) or from pedigree information.
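As an illustration of how maximum likelihood estimates under model KD can be obtained, the sketch below evaluates a Gaussian likelihood in which the phenotype covariance is proportional to h2_add·K + h2_dom·D + (1 − h2_add − h2_dom)·I, with the mean and the total variance profiled out, and then searches a coarse grid over the simplex. This is a hedged sketch under those stated assumptions, not the software used for the table; the names `profile_loglik` and `grid_mle`, the intercept-only mean, and the grid search itself are hypothetical choices.

```python
import numpy as np

def profile_loglik(y, K, D, h2_add, h2_dom):
    """Profile log-likelihood of model KD at fixed variance shares."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    V = h2_add * K + h2_dom * D + (1.0 - h2_add - h2_dom) * np.eye(n)
    sign, logdet = np.linalg.slogdet(V)
    if sign <= 0:                          # V must be positive definite
        return -np.inf
    ones = np.ones(n)
    # Generalised least squares estimate of the intercept.
    mu = (ones @ np.linalg.solve(V, y)) / (ones @ np.linalg.solve(V, ones))
    resid = y - mu
    # Closed-form ML estimate of the total variance given V.
    sigma2 = resid @ np.linalg.solve(V, resid) / n
    return -0.5 * (n * np.log(2.0 * np.pi * sigma2) + logdet + n)

def grid_mle(y, K, D, step=0.05):
    """Return (log-likelihood, h2_add, h2_dom) at the best grid point."""
    grid = np.arange(0.0, 1.0 + 1e-9, step)
    best = (-np.inf, 0.0, 0.0)
    for ha in grid:
        for hd in grid:
            # Stay strictly below h2_add + h2_dom = 1 so that V keeps a
            # positive residual term and remains invertible.
            if ha + hd > 1.0 - 1e-9:
                continue
            ll = profile_loglik(y, K, D, ha, hd)
            if ll > best[0]:
                best = (ll, ha, hd)
    return best
```

A grid search is slow but transparent, and it returns estimates on the boundary (for example a dominance share of exactly 0.00) without any special handling. Adding a shared environment matrix S for model KDS would require one more term in V and one more grid dimension.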
Figure 6. Heritability analysis for BMI and LDL in Cilento. Black contours represent the likelihood profile from the model KD (see Fig. 5), with matrices K and D calculated as genetic relationship matrices (GRMs). The red zone represents the 95% confidence interval for the red maximum likelihood estimate (MLE) (red triangular peak). The corresponding MLE and 95% confidence boundary for the analysis using pedigree information to estimate K and D are added to the plot in blue.
Figure 7. Effect of shared environmental factors on heritability analysis for BMI and LDL in Cilento. Here we compare models KD and KDS (see Fig. 5) for the two traits in Cilento. Black contours represent the likelihood profile for the model KD, with the red zone indicating the 95% confidence interval for the red maximum likelihood estimate (MLE) (red triangular peak). The corresponding MLE for the KDS model is added in green. We also add in the previously observed estimates from the literature (Table 1).
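One standard way to turn a two-parameter likelihood profile into a 95% confidence zone is a likelihood-ratio cut-off. The sketch below assumes the log-likelihood has already been evaluated on a grid of (additive, dominance) values. Whether the zones in Figs 6 and 7 were built this way is not stated here, so this is an assumption; the function name `confidence_region` is hypothetical. The chi-square approximation is also only approximate when the estimate lies on the edge of the simplex.

```python
import math
import numpy as np

def confidence_region(loglik_grid, level=0.95):
    """Boolean mask of grid points inside the likelihood-ratio confidence region."""
    ll = np.asarray(loglik_grid, dtype=float)
    ll_max = np.nanmax(ll)
    # For 2 degrees of freedom the chi-square quantile has the closed form
    # -2 * ln(1 - level), about 5.99 at the 95% level.
    threshold = -2.0 * math.log(1.0 - level)
    return 2.0 * (ll_max - ll) <= threshold
```

Grid points with a log-likelihood of minus infinity, such as points outside the simplex, fall outside the region automatically.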