M Angeles Pérez-Cabal, Ana I Vazquez, Daniel Gianola, Guilherme J M Rosa, Kent A Weigel.
Abstract
The impact of the extent of genetic relatedness on the accuracy of genome-enabled predictions was assessed using a dairy cattle population, and alternative cross-validation (CV) strategies were compared. Three CV layouts were considered, with training and testing sets obtained from random allocation of individuals (RAN); from a kernel-based clustering of individuals using the additive relationship matrix, designed to yield two subsets that were as unrelated as possible (UNREL); and from stratification by generation (GEN). The UNREL layout decreased the average genetic relationships between training and testing animals but produced accuracies similar to those of the RAN design, and both were about 15% higher than in the GEN setting. Results indicate that the CV structure can have an important effect on the accuracy of whole-genome predictions. However, the connection between average genetic relationships across training and testing sets and the estimated predictive ability is not straightforward, and may also depend on the kind of relatedness that exists between the two subsets and on the heritability of the trait. For high-heritability traits, close relatives such as parents and full-sibs make the greatest contributions to accuracy, although half-sibs or grandsires can compensate when close relatives are absent. For low-heritability traits, however, the inclusion of close relatives is crucial, and including more relatives of various types in the training set tends to lead to greater accuracy. In practice, CV designs should resemble the intended use of the predictive models (e.g., within- or between-family predictions, or within- or across-generation predictions), so that the estimated predictive ability is consistent with the actual application being considered.
Keywords: accuracy; genetic relationships; training–testing designs
Year: 2012 PMID: 22403583 PMCID: PMC3288819 DOI: 10.3389/fgene.2012.00027
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
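As a rough illustration of how the RAN and GEN layouts differ, the following Python sketch splits a list of animals either at random or by a birth-year cutoff. It is not the authors' code: the `Animal` record, the use of birth year as a proxy for generation, the 20% testing fraction, and the cutoff year are hypothetical choices made for illustration only.

```python
import random
from dataclasses import dataclass


@dataclass
class Animal:
    # Hypothetical record: an identifier plus birth year (used as a proxy for generation).
    animal_id: str
    birth_year: int


def random_split(animals, test_fraction=0.2, seed=1):
    """RAN-style layout: assign animals to training or testing at random."""
    rng = random.Random(seed)
    shuffled = list(animals)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (training, testing)


def generation_split(animals, cutoff_year):
    """GEN-style layout: older animals form the training set, younger ones are tested."""
    training = [a for a in animals if a.birth_year < cutoff_year]
    testing = [a for a in animals if a.birth_year >= cutoff_year]
    return training, testing


if __name__ == "__main__":
    # Toy population: 30 bulls born between 1995 and 2009 (hypothetical data).
    herd = [Animal(f"bull{i}", 1995 + i % 15) for i in range(30)]
    ran_train, ran_test = random_split(herd)
    gen_train, gen_test = generation_split(herd, cutoff_year=2005)
    print(len(ran_train), len(ran_test))  # 24 6
    print(len(gen_train), len(gen_test))  # 20 10
```

Under a generational split, offspring of testing bulls are born after the cutoff and therefore land in the testing set, which is consistent with the zero offspring count for GEN in the table of relatives.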
Number of relatives of testing bulls present in the training set for the generational (GEN), random (RAN), and A-matrix decomposition (UNREL) designs.
| Type of relative | GEN | RAN | UNREL |
|---|---|---|---|
| Sire | 121 | 134 | 130 |
| Maternal grandsire | 105 | 102 | 94 |
| Paternal grandsire | 53 | 47 | 39 |
| Full-sibs | 6 | 242 | 206 |
| Half-sibs | 767 | 2,803 | 2,777 |
| Offspring | 0 | 782 | 965 |
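The relative types in this table can be tallied from a pedigree. The sketch below assumes a hypothetical pedigree dictionary mapping each animal to its (sire, dam) pair, with `None` for an unknown parent, and counts as half-sibs any animals sharing exactly one known parent; the definitions used in the study may differ.

```python
from collections import Counter
from typing import Dict, Optional, Set, Tuple

# Hypothetical pedigree format: animal -> (sire, dam); None marks an unknown parent.
Pedigree = Dict[str, Tuple[Optional[str], Optional[str]]]


def relatives_in_training(bull: str, pedigree: Pedigree, training: Set[str]) -> Counter:
    """Count relatives of one testing bull that are present in the training set."""

    def parents(animal: Optional[str]) -> Tuple[Optional[str], Optional[str]]:
        # Animals missing from the pedigree are treated as having unknown parents.
        return pedigree.get(animal, (None, None)) if animal is not None else (None, None)

    sire, dam = parents(bull)
    counts: Counter = Counter()
    if sire in training:
        counts["Sire"] += 1
    if parents(dam)[0] in training:
        counts["Maternal grandsire"] += 1
    if parents(sire)[0] in training:
        counts["Paternal grandsire"] += 1
    for other in training:
        if other == bull:
            continue
        o_sire, o_dam = parents(other)
        same_sire = sire is not None and o_sire == sire
        same_dam = dam is not None and o_dam == dam
        if same_sire and same_dam:
            counts["Full-sibs"] += 1
        elif same_sire or same_dam:
            counts["Half-sibs"] += 1
        if bull in (o_sire, o_dam):
            counts["Offspring"] += 1
    return counts


if __name__ == "__main__":
    ped = {
        "T1": ("S1", "D1"),   # testing bull
        "S1": ("PGS", None),
        "D1": ("MGS", None),
        "FS": ("S1", "D1"),   # full-sib of T1
        "HS": ("S1", "D2"),   # paternal half-sib of T1
        "KID": ("T1", "D3"),  # offspring of T1
    }
    training = {"S1", "PGS", "MGS", "FS", "HS", "KID"}
    print(sorted(relatives_in_training("T1", ped, training).items()))
```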
Figure 1. Box plots of average and maximum additive genetic relationships between a testing individual and all individuals in the training set for the generational (GEN), random (RAN), and A-matrix decomposition (UNREL) designs.
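Both summaries in Figure 1 start from the additive relationship matrix A. A minimal sketch, assuming a pedigree that is already sorted so that parents precede their offspring, builds A with the standard tabular method and then computes, for each testing animal, the average and maximum relationship to the training animals. The pedigree format and function names are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical pedigree format: animal -> (sire, dam); None marks an unknown parent.
Pedigree = Dict[str, Tuple[Optional[str], Optional[str]]]


def additive_relationship_matrix(order: Sequence[str], pedigree: Pedigree):
    """Build A with the tabular method; `order` must list parents before offspring."""
    idx = {animal: k for k, animal in enumerate(order)}
    n = len(order)
    A: List[List[float]] = [[0.0] * n for _ in range(n)]
    for i, animal in enumerate(order):
        sire, dam = pedigree.get(animal, (None, None))
        s, d = idx.get(sire), idx.get(dam)
        for j in range(i):
            # a_ij is half the sum of the parents' relationships with animal j;
            # an unknown parent contributes zero.
            a_js = A[j][s] if s is not None else 0.0
            a_jd = A[j][d] if d is not None else 0.0
            A[i][j] = A[j][i] = 0.5 * (a_js + a_jd)
        # Diagonal: 1 + F_i, where F_i is half the relationship between the parents.
        A[i][i] = 1.0 + (0.5 * A[s][d] if s is not None and d is not None else 0.0)
    return A, idx


def relationship_summary(A, idx, testing, training) -> Dict[str, Tuple[float, float]]:
    """Average and maximum a_ij between each testing animal and all training animals."""
    summary = {}
    for t in testing:
        values = [A[idx[t]][idx[r]] for r in training]
        summary[t] = (sum(values) / len(values), max(values))
    return summary


if __name__ == "__main__":
    ped = {"T": ("S", "D"), "H": ("S", "X")}  # T and H are paternal half-sibs
    A, idx = additive_relationship_matrix(["S", "D", "X", "T", "H"], ped)
    print(relationship_summary(A, idx, testing=["T"], training=["S", "D", "X", "H"]))
    # {'T': (0.3125, 0.5)}
```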
Total number of relatives of individuals in the testing set contributing to the training set, by degree of relatedness (…).
| Degree of relatedness | GEN | RAN | UNREL |
|---|---|---|---|
| … | 1,505 | 887 | 39 |
| 0.5 < … | 19,129 | 56,085 | 16,952 |
| … | 11,070 | 28,930 | 8,867 |
Accuracy measured as the correlation between direct genomic values and realized PTA in the testing set for protein yield and somatic cell score (SCS) for different training–testing designs: generational (GEN), random (RAN), and the A-matrix decomposition (UNREL).
| Trait | GEN_0308 | GEN_0808 | RAN | UNREL |
|---|---|---|---|---|
| Protein yield | 0.7080 | 0.7077 | 0.8218 | 0.8106 |
| SCS | 0.6706 | 0.6709 | 0.6864 | 0.7121 |
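Accuracy in this table is a Pearson correlation between direct genomic values (DGV) and realized PTA. The sketch below computes that correlation and, as a worked check of the abstract's comparison, the relative gain of RAN and UNREL over GEN_0308 for protein yield: roughly 16% and 14.5%, in line with the "about 15%" quoted in the abstract. The same arithmetic on the SCS row gives smaller gains of about 2% and 6%. Function names are illustrative.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation, used here as the accuracy of DGV against realized PTA."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length with at least two values")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def relative_gain(accuracy: float, baseline: float) -> float:
    """Percentage by which `accuracy` exceeds `baseline`."""
    return 100.0 * (accuracy - baseline) / baseline


if __name__ == "__main__":
    # Protein-yield accuracies copied from the table above.
    print(f"RAN vs GEN_0308:   {relative_gain(0.8218, 0.7080):.1f}%")  # 16.1%
    print(f"UNREL vs GEN_0308: {relative_gain(0.8106, 0.7080):.1f}%")  # 14.5%
```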
Figure 2. Accuracy of predictions for protein yield and somatic cell score (SCS) depending on the number and type of relatives in the training set for the generational (GEN), random (RAN), and A-matrix decomposition (UNREL) designs (* indicates a correlation obtained from fewer than 10 individuals).
Figure 3. Scatter plots of direct genomic value and PTA, with the regression line (dashed), for three testing sires common to all layouts, which differed in the number of relatives included in the training set (red square: no close relatives; green triangle: one close relative; blue circle: two close relatives).
Summary of relatives in the training set for three testing sires common to all layouts (GEN, generational; RAN, random; UNREL, A-matrix decomposition).
| Sire | GEN | RAN | UNREL |
|---|---|---|---|
| Red (no close relatives) | 0 + 2 | 0 + 6 | 0 + 8 |
| Green (one close relative) | 1 + 0 | 1 + 27 | 1 + 0 |
| Blue (two close relatives) | 2 + 0 | 2 + 4 | 1 + 7 |
Values are expressed as the number of relatives with a genetic relationship greater than or equal to 0.5 (sire, offspring, and full-sibs) plus the number of relatives with a genetic relationship greater than or equal to 0.25 and less than 0.5 (grandsires, half-sibs, and grandsons).
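The "x + y" notation in this table follows directly from the two relationship thresholds just described. A small sketch, assuming the relationships a_ij between one testing sire and each training animal are already available (for example from an additive relationship matrix), is:

```python
from typing import Iterable


def close_plus_moderate(relationships: Iterable[float]) -> str:
    """Summarize a_ij values as '<count a >= 0.5> + <count 0.25 <= a < 0.5>'."""
    values = list(relationships)
    close = sum(1 for a in values if a >= 0.5)            # sire, offspring, full-sibs
    moderate = sum(1 for a in values if 0.25 <= a < 0.5)  # grandsires, half-sibs, grandsons
    return f"{close} + {moderate}"


if __name__ == "__main__":
    # Hypothetical relationships of one testing sire to training animals:
    # its sire (0.5), two half-sibs (0.25 each) and a more distant relative (0.125).
    print(close_plus_moderate([0.5, 0.25, 0.25, 0.125]))  # 1 + 2
```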
Accuracy, measured as the correlation between direct genomic values and realized PTA in the testing set, for protein yield and somatic cell score (SCS) in sires with and without offspring in the training set, as estimated from the random (RAN) and A-matrix decomposition (UNREL) designs.
| Trait | RAN: offspring in training set | RAN: no offspring in training set | UNREL: offspring in training set | UNREL: no offspring in training set |
|---|---|---|---|---|
| Protein yield | 0.89 | 0.81 | 0.86 | 0.81 |
| SCS | 0.79 | 0.68 | 0.87 | 0.70 |