Réka Howard, Alicia L Carriquiry, William D Beavis.
Abstract
An epistatic genetic architecture can have a significant impact on the prediction accuracy of genomic prediction (GP) methods. Machine learning methods predict traits with epistatic genetic architectures more accurately than statistical methods based on additive mixed linear models. The differences between these types of GP methods suggest a diagnostic for revealing the genetic architectures underlying traits of interest. In addition to genetic architecture, the performance of GP methods may be influenced by the sample size of the training population, the number of QTL, and the proportion of phenotypic variability due to genotypic variability (heritability). The possible values of these factors, and the number of combinations of factor levels that influence the performance of GP methods, can be large. Thus, efficient methods are needed for identifying the combinations of factor levels that produce the most accurate GPs. Herein, we employ response surface methods (RSMs) to find the experimental conditions that produce the most accurate GPs. We illustrate RSM with an example of simulated doubled haploid populations and identify the combination of factors that maximizes the difference between the prediction accuracies of the best linear unbiased prediction (BLUP) and support vector machine (SVM) GP methods. The greatest impact on the response is due to the genetic architecture of the population, the heritability of the trait, and the sample size. The advantage of the SVM method over the BLUP method is greatest when epistasis is responsible for all of the genotypic variance, heritability is equal to one, and the training population is large. However, except for values close to the maximum, most of the response surface shows little difference between the methods.
We also determined that BLUP achieved its greatest prediction accuracy when the genetic architecture consisted solely of additive effects and heritability was equal to one.
Keywords: epistasis; genomic prediction; machine learning; mixed models
Year: 2017 PMID: 28720710 PMCID: PMC5592935 DOI: 10.1534/g3.117.044453
Source DB: PubMed Journal: G3 (Bethesda) ISSN: 2160-1836 Impact factor: 3.154
Figure 1. (A) The response surface of yield in relation to temperature and drought. (B) The contour plot (level curves) of the response surface of yield.
Combinations and factorial effects for a design
| Treatment Combination | I | A | B | C | AB | AC | BC | ABC |
|---|---|---|---|---|---|---|---|---|
| a | + | + | − | − | − | − | + | + |
| b | + | − | + | − | − | + | − | + |
| c | + | − | − | + | + | − | − | + |
| abc | + | + | + | + | + | + | + | + |
| ab | + | + | + | − | + | − | − | − |
| ac | + | + | − | + | − | + | − | − |
| bc | + | − | + | + | − | − | + | − |
| (1) | + | − | − | − | + | + | + | − |
Factor levels are denoted + and −. Taking only the treatment combinations for which the three-factor interaction effect ABC (the last column) is +, or only those for which it is −, gives a one-half fraction, i.e., a 2^(3−1) fractional-factorial design.
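The sign columns follow mechanically from the factor levels: the sign of each interaction column is the product of the signs of its factors, and a half fraction keeps the rows that share one sign of the three-factor interaction. A minimal Python sketch, assuming the conventional labels A, B, and C for the three factors and lowercase letters for factors at their high level (these names are illustrative, not taken from the paper):

```python
from itertools import combinations, product
from math import prod

FACTORS = "ABC"  # illustrative factor names (assumption)

def sign_table(factors=FACTORS):
    """Return {treatment label: {effect: sign}} for a 2^k full factorial design.

    A treatment is labeled by the lowercase letters of the factors at their
    high (+1) level; "(1)" is the run with every factor at its low level.
    """
    effects = ["".join(c) for r in range(1, len(factors) + 1)
               for c in combinations(factors, r)]
    table = {}
    for levels in product((-1, 1), repeat=len(factors)):
        level_of = dict(zip(factors, levels))
        label = "".join(f.lower() for f in factors if level_of[f] == 1) or "(1)"
        row = {"I": 1}  # the identity column is always +
        for effect in effects:
            # An interaction's sign is the product of its factors' signs.
            row[effect] = prod(level_of[f] for f in effect)
        table[label] = row
    return table

def half_fraction(table, generator="ABC", sign=1):
    """Treatment labels whose `generator` column equals `sign` (+1 or -1)."""
    return sorted(label for label, row in table.items() if row[generator] == sign)

table = sign_table()
print(half_fraction(table))           # ['a', 'abc', 'b', 'c']
print(half_fraction(table, sign=-1))  # ['(1)', 'ab', 'ac', 'bc']
```

Either list is a one-half fraction: four of the eight treatment combinations, chosen so that the ABC column is constant within it.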
Possible values for n, m, QTL, epi, and h
| Factor | Minimum | Maximum | Other Constraints |
|---|---|---|---|
| n | 0 | ∞ | |
| m | 0 | ∞ | |
| QTL | 0 | ∞ | qtl is an integer, qtl ≤ |
| epi | 0 | 1 | |
| h | 0 | 1 | |
n, number of segregating progeny; m, number of markers; QTL, number of QTL; epi, proportion of genetic variance due to epistasis; h, heritability in the broad sense.
Specification of the factors including n, m, QTL, epi, and h
| Factor | Level 1 | Level 2 | Level 3 | Level 4 | Level 5 |
|---|---|---|---|---|---|
| n | 200 | 1000 | 2000 | | |
| m | 100 | 400 | 1000 | | |
| QTL | 10 | 50 | 100 | | |
| epi | 0 | 0.2 | 0.5 | 0.8 | 1 |
| h | 0.2 | 0.5 | 0.8 | | |
n, number of segregating progeny; m, number of markers; QTL, number of QTL; epi, proportion of genetic variance due to epistasis; h, heritability.
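Even this coarse specification yields many runs: a full factorial crosses every level of every factor, so the number of combinations is the product of the level counts, 3 × 3 × 3 × 5 × 3 = 405. A short Python sketch that enumerates the grid (the dictionary layout and function name are illustrative assumptions):

```python
from itertools import product

# Factor levels copied from the specification table.
LEVELS = {
    "n": [200, 1000, 2000],
    "m": [100, 400, 1000],
    "QTL": [10, 50, 100],
    "epi": [0, 0.2, 0.5, 0.8, 1],
    "h": [0.2, 0.5, 0.8],
}

def full_factorial(levels):
    """Yield every combination of factor levels as a {factor: level} dict."""
    names = list(levels)
    for combo in product(*(levels[name] for name in names)):
        yield dict(zip(names, combo))

grid = list(full_factorial(LEVELS))
print(len(grid))  # 405
print(grid[0])    # {'n': 200, 'm': 100, 'QTL': 10, 'epi': 0, 'h': 0.2}
```

Because the count is a product, adding one more level to any factor multiplies the total rather than adding to it, which is what makes sequential designs such as RSM attractive here.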
Figure 2. Histogram of the differences in prediction accuracy between SVM and BLUP (left panel) and histogram of the prediction accuracies for BLUP (right panel).
Starting values for the factors ind, m, QTL, epi, and h in terms of natural units and coded units
| Factor | Natural Units, Low Level | Natural Units, High Level | Coded Units, Low Level | Coded Units, High Level |
|---|---|---|---|---|
| ind | 200 | 1000 | −1 | 1 |
| m | 100 | 400 | −1 | 1 |
| QTL | 10 | 100 | −1 | 1 |
| epi | 0.2 | 0.5 | −1 | 1 |
| h | 0.2 | 0.5 | −1 | 1 |
ind, number of progeny; m, number of markers; QTL, number of QTL; epi, proportion of genetic variability explained by the epistatic variability vs. the additive variability; h, proportion of phenotypic variability explained by the genetic variability.
Mean accuracy of BLUP, mean accuracy of SVM, and the difference between the mean accuracy of SVM and mean accuracy of BLUP for 16 combinations of factors
| Treatment Combination (ind level) | BLUP Accuracy | SVM Accuracy | Response (SVM − BLUP) |
|---|---|---|---|
| 1000 | 0.59 | 0.58 | −0.01 |
| 200 | 0.53 | 0.54 | 0.01 |
| 200 | 0.39 | 0.36 | −0.03 |
| 1000 | 0.59 | 0.58 | −0.01 |
| 200 | 0.40 | 0.39 | −0.01 |
| 1000 | 0.45 | 0.44 | −0.01 |
| 1000 | 0.41 | 0.41 | 0.00 |
| 200 | 0.33 | 0.31 | −0.02 |
| 200 | 0.31 | 0.29 | −0.02 |
| 1000 | 0.36 | 0.34 | −0.02 |
| 1000 | 0.29 | 0.27 | −0.02 |
| 200 | 0.21 | 0.20 | −0.01 |
| 1000 | 0.27 | 0.24 | −0.03 |
| 200 | 0.18 | 0.15 | −0.03 |
| 200 | 0.06 | 0.05 | −0.01 |
| 1000 | 0.23 | 0.22 | −0.01 |
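Each ind level appears in eight of the 16 runs, so the main effect of the number of individuals on the response can be estimated as a simple contrast: the mean response at the high level minus the mean at the low level. The table lists only the ind level for each run, so the other four effects cannot be computed from it. A minimal Python sketch of this contrast; it is an illustration of main-effect estimation, not the authors' fitted first-order model:

```python
# (ind, BLUP accuracy, SVM accuracy), copied row by row from the table above.
RUNS = [
    (1000, 0.59, 0.58), (200, 0.53, 0.54), (200, 0.39, 0.36), (1000, 0.59, 0.58),
    (200, 0.40, 0.39), (1000, 0.45, 0.44), (1000, 0.41, 0.41), (200, 0.33, 0.31),
    (200, 0.31, 0.29), (1000, 0.36, 0.34), (1000, 0.29, 0.27), (200, 0.21, 0.20),
    (1000, 0.27, 0.24), (200, 0.18, 0.15), (200, 0.06, 0.05), (1000, 0.23, 0.22),
]

def main_effect(runs, low=200, high=1000):
    """Mean response (SVM - BLUP) at the high ind level minus at the low level."""
    hi = [svm - blup for ind, blup, svm in runs if ind == high]
    lo = [svm - blup for ind, blup, svm in runs if ind == low]
    return sum(hi) / len(hi) - sum(lo) / len(lo)

print(round(main_effect(RUNS), 5))  # 0.00125
```

The near-zero contrast is consistent with the Response column, whose values all lie between −0.03 and 0.01.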
Levels of factors, average of the levels of the factors, and half of the difference between the levels of factors
| Factor | Level 1 | Level 2 | Average | Half-Difference |
|---|---|---|---|---|
| ind | 200 | 1000 | 600 | 400 |
| m | 100 | 400 | 250 | 150 |
| QTL | 10 | 100 | 55 | 45 |
| epi | 0.2 | 0.5 | 0.35 | 0.15 |
| h | 0.2 | 0.5 | 0.35 | 0.15 |
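The average and half-difference are exactly what is needed to move between natural and coded units: coded value x = (natural value − average) / half-difference, so the low level maps to −1 and the high level to +1. A minimal Python sketch; the factor keys follow the abbreviations ind, m, QTL, epi, and h, and the function names are illustrative:

```python
# (low, high) natural-unit levels from the table above.
LEVELS = {
    "ind": (200, 1000),
    "m": (100, 400),
    "QTL": (10, 100),
    "epi": (0.2, 0.5),
    "h": (0.2, 0.5),
}

def to_coded(factor, value):
    """Natural -> coded units: x = (value - average) / half_difference."""
    low, high = LEVELS[factor]
    average, half_difference = (low + high) / 2, (high - low) / 2
    return (value - average) / half_difference

def to_natural(factor, x):
    """Coded -> natural units: value = average + x * half_difference."""
    low, high = LEVELS[factor]
    return (low + high) / 2 + x * (high - low) / 2

print(to_coded("ind", 200), to_coded("ind", 1000))  # -1.0 1.0
print(round(to_natural("ind", 0.74)))              # 896
```

Working in coded units puts every factor on the same −1 to +1 scale, so regression coefficients fitted to the design can be compared directly across factors with very different natural ranges.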
Base, step size in natural units, and the coordinates of the steepest ascent for the number of individuals, number of markers, number of QTL, proportion of epistasis, and the degree of heritability to determine the second set of factor combinations for response
| | Individuals | Markers | QTL | Epistasis | Heritability |
|---|---|---|---|---|---|
| Base | 600 | 250 | 55 | 0.35 | 0.35 |
| Increment | 400(0.74) | 150(−0.16) | 45(−0.12) | 0.25 | 0.15(0.78) |
| Step size (Δ) | 296 | −24 | −5.4 | 0.25 | 0.12 |
| Base + Δ | 896 | 226 | 50 | 0.6 | 0.47 |
| Base + 2Δ | 1192 | 202 | 44 | 0.85 | 0.59 |
| Base + 3Δ | 1488 | 178 | 39 | 1 | 0.71 |
| Base + 4Δ | 1784 | 154 | 33 | 1 | 0.83 |
| Base + 5Δ | 2080 | 130 | 28 | 1 | 0.95 |
| Base + 6Δ | 2376 | 106 | 23 | 1 | 1 |
| Base + 7Δ | 2672 | 82 | 17 | 1 | 1 |
| Base + 8Δ | 2968 | 58 | 12 | 1 | 1 |
| Base + 9Δ | 3264 | 34 | 6 | 1 | 1 |
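Each row after the step-size row equals the base point plus a whole number k of steps; for example, the number of individuals at k = 3 is 600 + 3 × 296 = 1488. A minimal Python sketch that regenerates the path from the base and step-size rows. The rounding of counts to whole numbers and the clipping of epi and h at 1 are assumptions inferred from the tabled values, not rules stated in the source:

```python
# Base point and step size (natural units) for the SVM - BLUP response,
# copied from the table above, in the order ind, m, QTL, epi, h.
BASE = (600, 250, 55, 0.35, 0.35)
STEP = (296, -24, -5.4, 0.25, 0.12)

def steepest_ascent_path(base, step, n_steps):
    """Return the points base + k * step for k = 1..n_steps.

    Assumed conventions: counts (ind, m, QTL) are rounded to whole numbers;
    the proportions (epi, h) are clipped to [0, 1] and rounded to 2 decimals.
    """
    path = []
    for k in range(1, n_steps + 1):
        ind, m, qtl, epi, h = (b + k * s for b, s in zip(base, step))
        path.append((
            round(ind),
            round(m),
            round(qtl),
            round(min(max(epi, 0.0), 1.0), 2),
            round(min(max(h, 0.0), 1.0), 2),
        ))
    return path

for point in steepest_ascent_path(BASE, STEP, 9):
    print(point)  # first point: (896, 226, 50, 0.6, 0.47)
```

With BASE = (600, 250, 55, 0.35, 0.35) and STEP = (91, −4.4, 6.6, −0.25, 0.13), the same function reproduces the five points of the path used for the BLUP response.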
Coordinates of the steepest ascent for the number of individuals, number of markers, number of QTL, proportion of epistasis, and the degree of heritability for the additional runs when the response is the mean accuracy difference between the SVM and BLUP, and the corresponding mean accuracy for BLUP, SVM, and for SVM–BLUP
| Factor Level | Individuals | Markers | QTL | Epistasis | Heritability | BLUP | SVM | SVM − BLUP |
|---|---|---|---|---|---|---|---|---|
| 1 | 896 | 226 | 50 | 0.60 | 0.47 | 0.37 | 0.38 | 0.01 |
| 2 | 1192 | 202 | 44 | 0.85 | 0.59 | 0.26 | 0.29 | 0.03 |
| 3 | 1488 | 178 | 39 | 1 | 0.71 | 0.00 | 0.23 | 0.23 |
| 4 | 1784 | 154 | 33 | 1 | 0.83 | 0.01 | 0.31 | 0.30 |
| 5 | 2080 | 130 | 28 | 1 | 0.95 | 0.01 | 0.46 | 0.45 |
| 6 | 2376 | 106 | 23 | 1 | 1 | 0.01 | 0.55 | 0.54 |
| 7 | 2672 | 82 | 17 | 1 | 1 | 0.00 | 0.62 | 0.62 |
| 8 | 2968 | 58 | 12 | 1 | 1 | 0.01 | 0.73 | 0.72 |
| 9 | 3264 | 34 | 6 | 1 | 1 | 0.00 | 0.98 | 0.98 |
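In the usual steepest-ascent procedure, runs continue along the path until the response stops improving, and the best point found becomes the center of the next design. Here the SVM − BLUP response rises at every one of the nine steps, so no decline is observed. A minimal Python sketch of that stopping check; the function name and the "first decline" rule are illustrative assumptions:

```python
# Mean SVM - BLUP response at steps 1..9 of the path, from the table above.
RESPONSES = [0.01, 0.03, 0.23, 0.30, 0.45, 0.54, 0.62, 0.72, 0.98]

def stopping_step(responses):
    """Return the 1-based step reached just before the first decline.

    If the response never declines, the last step is returned, which
    signals that the path may not yet have reached a maximum.
    """
    for k in range(1, len(responses)):
        if responses[k] < responses[k - 1]:
            return k  # step k (1-based) is the last one before the drop
    return len(responses)

print(stopping_step(RESPONSES))  # 9
```

A rule based on a single decline is sensitive to simulation noise; in practice one might require two consecutive decreases before stopping.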
Base, step size in natural units, and the coordinates of the steepest ascent for the number of individuals, number of markers, number of QTL, proportion of epistasis, and the degree of heritability for response
| | Individuals | Markers | QTL | Epistasis | Heritability |
|---|---|---|---|---|---|
| Base | 600 | 250 | 55 | 0.35 | 0.35 |
| Increment | 400(0.228) | 150(−0.029) | 45(0.146) | −0.25 | 0.15(0.842) |
| Step size (Δ) | 91 | −4.4 | 6.6 | −0.25 | 0.13 |
| Base + Δ | 691 | 246 | 62 | 0.10 | 0.48 |
| Base + 2Δ | 782 | 241 | 68 | 0 | 0.61 |
| Base + 3Δ | 873 | 237 | 75 | 0 | 0.74 |
| Base + 4Δ | 964 | 232 | 81 | 0 | 0.87 |
| Base + 5Δ | 1055 | 228 | 88 | 0 | 1 |
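Reading each parenthesized value in the Increment row as the step in coded units (an interpretation inferred from the numbers, not stated in the source), the step in natural units is the half-difference times the coded step; for individuals, 400 × 0.228 = 91.2, which the table rounds to 91. A minimal Python sketch of that conversion for the BLUP path. Epistasis is omitted because its step, −0.25, is given directly in natural units:

```python
# Half-differences (natural units per coded unit) for ind, m, QTL, and h.
HALF_RANGE = {"ind": 400, "m": 150, "QTL": 45, "h": 0.15}
# Coded-unit steps, read from the parentheses in the Increment row (assumption).
CODED_STEP = {"ind": 0.228, "m": -0.029, "QTL": 0.146, "h": 0.842}

def natural_steps(half_range, coded_step):
    """Convert coded-unit steps to natural units: delta = half_range * coded step."""
    return {f: half_range[f] * coded_step[f] for f in coded_step}

for factor, delta in natural_steps(HALF_RANGE, CODED_STEP).items():
    print(factor, round(delta, 4))
```

This prints 91.2, −4.35, 6.57, and 0.1263 for ind, m, QTL, and h, which the table rounds to 91, −4.4, 6.6, and 0.13. The same rule applied to the SVM − BLUP path (coded steps 0.74, −0.16, −0.12, and 0.78) gives 296, −24, −5.4, and 0.117, the last rounded to 0.12 in that table.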
Coordinates of the steepest ascent for the number of individuals, number of markers, number of QTL, proportion of epistasis, and the degree of heritability for the additional runs when the response is the mean accuracy for BLUP, and the corresponding mean accuracy for BLUP
| Run | Individuals | Markers | QTL | Epistasis | Heritability | BLUP |
|---|---|---|---|---|---|---|
| Run 1 | 691 | 246 | 62 | 0.1 | 0.48 | 0.62 |
| Run 2 | 782 | 241 | 68 | 0 | 0.61 | 0.74 |
| Run 3 | 873 | 237 | 75 | 0 | 0.74 | 0.83 |
| Run 4 | 964 | 232 | 81 | 0 | 0.87 | 0.92 |
| Run 5 | 1055 | 228 | 88 | 0 | 1 | 1 |