Abstract
In gene mapping, it is common to test for association between the phenotype and the genotype at a large number of loci, i.e., the same response variable is used repeatedly to test a large number of non-independent and non-nested hypotheses. In many of these genetic problems, the underlying model is a mixed model consisting of one or very few major genes together with a genetic background effect, usually thought of as polygenic in nature and, consequently, modeled through a random effects term with a well-defined covariance structure that depends on the kinship between individuals. Either because the interest lies only in the major genes or to simplify the analysis, it is customary to drop the random effects term and use a simple linear regression model, sometimes complemented with testing via resampling in an attempt to minimize the consequences of this practice. Here, it is shown that dropping the random effects term not only has extreme negative effects on the control of the type I error rate, but is also unlikely to be fixed by resampling because, whenever the mixed model is correct, this practice does not allow some basic requirements of resampling in a gene mapping context to be met. Furthermore, simulations show that the type I error rates when the random term is ignored can be unacceptably high. As an alternative, this paper introduces a new bootstrap procedure to handle the specific case of mapping with recombinant congenic strains under a linear mixed model. A simulation study showed that the type I error rates of the proposed procedure are very close to the nominal ones, although they tend to be slightly inflated for larger values of the random effects variance.
Overall, this paper illustrates the extent of the adverse consequences of ignoring the random effects term due to polygenic factors when testing for genetic linkage and warns of potential modeling issues whenever simple linear regression for a major gene yields multiple significant linkage peaks.
Keywords: bootstrapping mixed models; ignoring random effects; mapping quantitative trait loci; misspecified genetic models; recombinant congenic strains
Year: 2014 PMID: 24765102 PMCID: PMC3980105 DOI: 10.3389/fgene.2014.00068
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
Percentage of declared significant peaks with a bootstrap genome-wide adjusted significance level of 0.01 when the proposed mixed model methodology is used.
Estimates based on 1000 simulated datasets for each λ.
Empirical genome-wide type I error rates obtained via bootstrap in the simulation study (0.01 is the nominal value and the number of simulated datasets for each λ is 1000).
| Naive regression | 0.008 | 0.389 | 0.527 | 0.612 | 0.761 | 0.808 |
| Mixed model | 0.008 | 0.013 | 0.011 | 0.017 | 0.015 | 0.016 |
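The contrast between the two rows above can be illustrated in miniature with a toy simulation. The sketch below is not the paper's bootstrap procedure and is not genome-wide: it tests a single marker with no true effect, assumes a deliberately simple covariance structure (one shared random effect per strain, with variance `var_u`), and compares a naive regression on individuals with a test on strain means as a simple stand-in for a strain-aware analysis. All names (`rejection_rates`, `var_u`, `n_strains`, `n_per`) and the design sizes are hypothetical choices made for illustration; `var_u` is not claimed to be the paper's λ.

```python
import random

# Two-sided 5% critical values of Student's t, hard-coded to avoid a SciPy
# dependency. They match the default design below (20 strains x 10 animals).
T_CRIT_DF198 = 1.972  # naive test on 200 individuals in 2 genotype groups
T_CRIT_DF18 = 2.101   # test on 20 strain means in 2 genotype groups


def mean_var(x):
    """Return the sample mean and the unbiased sample variance of x."""
    m = sum(x) / len(x)
    return m, sum((v - m) ** 2 for v in x) / (len(x) - 1)


def pooled_t(a, b):
    """Pooled-variance two-sample t statistic.

    For a binary genotype this equals the slope t statistic of a simple
    linear regression of phenotype on genotype.
    """
    (ma, va), (mb, vb) = mean_var(a), mean_var(b)
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    return (ma - mb) / (sp2 * (1 / na + 1 / nb)) ** 0.5


def rejection_rates(var_u, n_sim=1000, n_strains=20, n_per=10, seed=1):
    """Estimate single-marker type I error rates under the null.

    Returns (naive_rate, strain_mean_rate). Assumed model, for illustration:
    phenotype = strain random effect (variance var_u) + residual (variance 1),
    with the marker genotype constant within each strain and no marker effect.
    """
    rng = random.Random(seed)
    naive_hits = strain_hits = 0
    for _ in range(n_sim):
        individuals = ([], [])   # phenotypes grouped by genotype
        strain_means = ([], [])  # strain means grouped by genotype
        for s in range(n_strains):
            g = s % 2  # alternate genotypes so the design is balanced
            u = rng.gauss(0.0, var_u ** 0.5)
            ys = [u + rng.gauss(0.0, 1.0) for _ in range(n_per)]
            individuals[g].extend(ys)
            strain_means[g].append(sum(ys) / n_per)
        naive_hits += int(abs(pooled_t(*individuals)) > T_CRIT_DF198)
        strain_hits += int(abs(pooled_t(*strain_means)) > T_CRIT_DF18)
    return naive_hits / n_sim, strain_hits / n_sim


if __name__ == "__main__":
    for var_u in (0.0, 0.5, 1.0):
        naive, strain = rejection_rates(var_u)
        print(f"var_u={var_u}: naive={naive:.3f}, strain means={strain:.3f}")
```

Under these assumptions, the naive test should stay near the nominal 5% level only when `var_u` is zero and reject far more often as `var_u` grows, because it treats animals from the same strain as independent. The strain-mean test should stay near 5% throughout. The hard-coded critical values must be updated if `n_strains` or `n_per` is changed.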
Figure 1. Typical histogram of simulated data. The p-value profiles of the data in this histogram were computed and plotted in Figure 2.
Figure 2. Bootstrap genome-wide corrected p-value profiles. Dashed line: naive model (Equation 2); solid line (sometimes hardly distinguishable from the x-axis): mixed model (Equation 6). Note that both profiles have been corrected for multiple testing.
Percentage of declared significant peaks with a bootstrap genome-wide adjusted significance level of 0.01 when a naive regression at the markers is used.
Estimates based on 1000 simulated datasets for each λ.