Nicolas Heslot, Jean-Luc Jannink.
Abstract
BACKGROUND: For genomic prediction and genome-wide association studies (GWAS) using mixed models, covariance between individuals is estimated using molecular markers. Based on the properties of mixed models, using available molecular data for prediction is optimal if this covariance is known. Under this assumption, adding individuals to the analysis should never be detrimental. However, some empirical studies showed that increasing training population size decreased prediction accuracy. Recently, results from theoretical models indicated that even if marker density is high and the genetic architecture of traits is controlled by many loci with small additive effects, the covariance between individuals, which depends on relationships at causal loci, is not always well estimated by the whole-genome kinship.
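
As a minimal sketch of how covariance between individuals can be estimated from markers, the Python below builds a whole-genome kinship matrix with the widely used VanRaden estimator and, for comparison, a generic Gaussian kernel on marker distance. The function names, the 0/1/2 genotype coding, the `bandwidth` parameter and the toy genotypes are all assumptions made for illustration; the K-kernel and C-kernel compared in the figures are not reproduced here, and the Gaussian kernel is not necessarily parametrized as in the paper.

```python
import math


def allele_frequencies(markers):
    """Frequency of the counted allele at each marker (genotypes coded 0/1/2)."""
    n = len(markers)
    n_markers = len(markers[0])
    return [sum(row[j] for row in markers) / (2.0 * n) for j in range(n_markers)]


def vanraden_kinship(markers):
    """Whole-genome kinship G = ZZ' / (2 * sum p(1-p)), with Z = M - 2p."""
    p = allele_frequencies(markers)
    centered = [[g - 2.0 * pj for g, pj in zip(row, p)] for row in markers]
    scale = 2.0 * sum(pj * (1.0 - pj) for pj in p)
    return [
        [sum(a * b for a, b in zip(ri, rj)) / scale for rj in centered]
        for ri in centered
    ]


def gaussian_kernel(markers, bandwidth=1.0):
    """Generic Gaussian kernel exp(-d^2 / bandwidth) on mean squared marker distance."""
    n_markers = len(markers[0])

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) / n_markers

    return [[math.exp(-sq_dist(a, b) / bandwidth) for b in markers] for a in markers]


# Hypothetical toy genotypes: 4 individuals x 4 markers.
toy_markers = [
    [0, 1, 2, 1],
    [1, 1, 0, 2],
    [2, 0, 1, 1],
    [0, 2, 1, 0],
]
G = vanraden_kinship(toy_markers)
K = gaussian_kernel(toy_markers, bandwidth=0.5)
```

Both matrices are computed from all markers, so they describe relationships across the whole genome rather than at causal loci specifically, which is the gap the abstract points to.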
Year: 2015 PMID: 26612537 PMCID: PMC4661961 DOI: 10.1186/s12711-015-0171-z
Source DB: PubMed Journal: Genet Sel Evol ISSN: 0999-193X Impact factor: 4.297
Table 1 Summary of the datasets used
| Name | Species | Description | Traits |
|---|---|---|---|
| Loblolly pine | Loblolly pine | 926 individuals, 5000 SNPs | Five wood quality traits |
| CIMMYT wheat | Wheat | 599 individuals, 1279 DArT markers | Yield in four environments |
| Pig | Pig | 3460 individuals, 53k SNPs | Two anonymous traits |
| Maize panel | Maize | 2279 inbreds, 68,120 imputed GBS SNPs | Flowering time in degree days |
| Maize connected crosses | Maize | 635 inbreds, 17k SNPs | Five traits (two diseases, three yield components) |
| Cornell wheat | Wheat | Breeding population, 365 individuals, 32k imputed GBS SNPs | Four traits (yield, height, heading date, pre-harvest sprouting) |
| Rice panel | Rice | Diverse panel, 398 individuals, 36,901 SNPs | 28 traits (flowering time, yield components and quality traits) |
SNP: single nucleotide polymorphism; GBS: genotyping by sequencing; DArT: diversity arrays technology
Fig. 1 Minus log P values of the log-likelihood ratio tests for the Gaussian kernel (gray), K-kernel (red) and C-kernel (blue) for each trait and dataset except the rice dataset (see Fig. 2). The horizontal line indicates the significance level at 0.05 with Bonferroni correction for multiple testing on trait-dataset combinations; the vertical lines separate the different datasets; datasets are presented in Table 1 and more details on the traits are available in Additional file 1: Table S1
Fig. 2 Minus log P values of the log-likelihood ratio tests for the Gaussian kernel (gray), K-kernel (red) and C-kernel (blue) for each trait of the rice dataset. The horizontal line indicates the significance level at 0.05 with Bonferroni correction for multiple testing; the rice dataset is in Table 1 and more details on the traits are available in Additional file 1: Table S1
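
The horizontal line in these figures can be located with a short calculation. The sketch below assumes the minus log P values are base 10 and that the number of tests for the rice dataset equals its 28 traits; neither assumption is stated in the captions, and the function names are hypothetical.

```python
import math


def bonferroni_line(alpha, n_tests):
    """Height of the significance line on a -log10(P) axis after Bonferroni correction."""
    return -math.log10(alpha / n_tests)


def is_significant(p_value, alpha, n_tests):
    """True if a single test passes the Bonferroni-corrected level."""
    return p_value < alpha / n_tests


# Assumption: one test per rice trait (28 traits in the dataset table).
n_rice_tests = 28
rice_line = bonferroni_line(0.05, n_rice_tests)
```

Under these assumptions the corrected level is 0.05 / 28, so a trait is significant when its bar rises above `rice_line` on the plot.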
Average statistical power of the different kernels across QTN size and traits
| Dataset | Baseline | Gaussian | K-kernel | C-kernel |
|---|---|---|---|---|
| Maize panel | 0.700 | 0.699 | 0.700 | 0.700 |
| Rice panel | 0.523 | 0.549 | 0.547 | 0.532 |
Power is the average fraction of true QTN detected. All models included a Q matrix
Fig. 3 Gain in power for each trait in the rice panel with alternative kernels as a function of the reduction in AIC compared to GBLUP. a Gaussian kernel. b K-kernel. c C-kernel
Fig. 4 Gain in accuracy with stratified cross-validation (a) and leave-one-cluster-out cross-validation (b) for the Gaussian kernel (gray filled circle), the K-kernel (red square) and the C-kernel (blue triangle), for all trait-dataset combinations, as a function of the reduction in AIC compared to GBLUP
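
The x-axis quantity in Figs. 3 and 4, the reduction in AIC relative to GBLUP, can be sketched as follows using the standard definition AIC = 2k - 2 ln L. The log-likelihoods and parameter counts below are hypothetical values chosen for illustration, not results from the study, and the function names are assumptions.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2.0 * n_params - 2.0 * log_likelihood


def aic_reduction(loglik_gblup, k_gblup, loglik_alt, k_alt):
    """Positive values mean the alternative kernel fits better than GBLUP."""
    return aic(loglik_gblup, k_gblup) - aic(loglik_alt, k_alt)


# Hypothetical fit: the alternative kernel has one extra parameter
# and a higher log-likelihood than GBLUP.
reduction = aic_reduction(loglik_gblup=-512.4, k_gblup=3, loglik_alt=-508.9, k_alt=4)
```

A positive reduction means the extra kernel parameter is paid for by a sufficiently better fit; the figures relate that quantity to the gain in power or accuracy.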