Anita Ehret, David Hochstuhl, Daniel Gianola, Georg Thaller.
Abstract
BACKGROUND: Recently, artificial neural networks (ANN) have been proposed as promising machines for marker-based genomic prediction of complex traits in animal and plant breeding. ANN are universal approximators of complex functions that can capture cryptic relationships between SNPs (single nucleotide polymorphisms) and phenotypic values without the need to explicitly define a genetic model. This concept is attractive for high-dimensional and noisy data, especially when the genetic architecture of the trait is unknown. However, the properties of ANN for predicting future outcomes of genomic selection using real data are not well characterized and, because of high computational costs, using whole-genome marker sets is difficult. We examined different non-linear network architectures, as well as several genomic covariate structures as network inputs, in order to assess their ability to predict milk traits in three dairy cattle data sets using large-scale SNP data. For training, a regularized back-propagation algorithm was used. Predictive ability was assessed as the average correlation between observed and predicted phenotypes over 20 replicates of 5-fold cross-validation. A linear network model served as a benchmark.
Year: 2015 PMID: 25886037 PMCID: PMC4379719 DOI: 10.1186/s12711-015-0097-5
Source DB: PubMed Journal: Genet Sel Evol ISSN: 0999-193X Impact factor: 4.297
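The evaluation scheme described in the abstract (average Pearson correlation between observed and predicted phenotypes over 20 replicates of 5-fold cross-validation) can be sketched in plain Python as below. This is a minimal sketch, not the authors' code: it assumes the correlation is computed within each test fold and then averaged over all 20 x 5 = 100 folds, and the `fit`/`predict` callables and the toy single-covariate regression are hypothetical stand-ins for the ANN.

```python
import math
import random
from statistics import mean, variance


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def kfold_indices(n, k, rng):
    """Randomly partition indices 0..n-1 into k disjoint folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


def repeated_cv(X, y, fit, predict, k=5, repeats=20, seed=1):
    """Return (mean, variance) of per-fold correlations over repeats x k folds."""
    rng = random.Random(seed)
    rs = []
    for _ in range(repeats):
        for test in kfold_indices(len(y), k, rng):
            held_out = set(test)
            train = [i for i in range(len(y)) if i not in held_out]
            model = fit([X[i] for i in train], [y[i] for i in train])
            preds = [predict(model, X[i]) for i in test]
            rs.append(pearson(preds, [y[i] for i in test]))
    return mean(rs), variance(rs)


# Hypothetical stand-in model: least-squares regression on the first covariate.
def fit_simple(X_train, y_train):
    xs = [row[0] for row in X_train]
    mx, my = mean(xs), mean(y_train)
    slope = sum((a - mx) * (b - my) for a, b in zip(xs, y_train)) / sum(
        (a - mx) ** 2 for a in xs
    )
    return slope, my - slope * mx


def predict_simple(model, row):
    slope, intercept = model
    return slope * row[0] + intercept


# Toy data: y = 2x + small Gaussian noise (not the paper's data).
noise = random.Random(0)
X = [[i / 50] for i in range(50)]
y = [2 * row[0] + noise.gauss(0, 0.05) for row in X]
r_mean, r_var = repeated_cv(X, y, fit_simple, predict_simple)
```

The variance returned alongside the mean mirrors the "variance of cross-validation runs" reported with r in the model comparison table, again under the per-fold assumption above.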
Figure 1. Schematic representation of an artificial neuron. x = input value; w = weights linked to single input values; f(.) = activation function of the artificial neuron; z = output of the artificial neuron; indicates some computation.
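Figure 1 can be read as z = f(sum_j w_j x_j): the neuron forms a weighted sum of its inputs and passes it through the activation f. The sketch below assumes exactly that (the caption only says the node performs "some computation"), uses tanh as an example activation, and adds an optional bias term `b` that the caption does not mention; the function name `neuron` is hypothetical.

```python
import math


def neuron(x, w, b=0.0, f=math.tanh):
    """Artificial neuron: z = f(sum_j w_j * x_j + b)."""
    if len(x) != len(w):
        raise ValueError("x and w must have the same length")
    activation = sum(wj * xj for wj, xj in zip(w, x)) + b
    return f(activation)


# Three hypothetical SNP genotypes coded 0/1/2 and arbitrary weights.
z = neuron([0, 1, 2], [0.5, -0.25, 0.1])  # tanh(0 - 0.25 + 0.2) = tanh(-0.05)
z_linear = neuron([0, 1, 2], [0.5, -0.25, 0.1], f=lambda a: a)  # identity activation
```

Swapping `f` for the identity gives the linear neuron used in the benchmark model; any other activation turns the same weighted sum into a non-linear unit.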
Figure 2. Architecture of a two-layer feed-forward neural network. x = network input, e.g., marker genotype j of individual i; w₁ = network weights from the input to the hidden layer; w₂ = network weights from the hidden to the output layer; y = network output, e.g., predicted phenotype of an individual; f(.) = activation function at the hidden neurons; g(.) = activation function at the output neuron; indicates some computation.
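A forward pass through the network in Figure 2, plus one step of regularized back-propagation, might look like the sketch below. It is illustrative only and rests on assumptions not stated in the source: f = tanh at the hidden neurons, g = identity at the output neuron, a squared-error loss with plain L2 weight decay `lam` as a simple stand-in for the paper's unspecified regularization, and stochastic gradient descent with learning rate `lr`. All function names are hypothetical.

```python
import math
import random


def init_net(n_in, n_hidden, seed=0):
    """Small random weights; w1[k][j] links input j to hidden neuron k."""
    rng = random.Random(seed)
    return {
        "w1": [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_hidden)],
        "b1": [0.0] * n_hidden,
        "w2": [rng.uniform(-0.1, 0.1) for _ in range(n_hidden)],
        "b2": 0.0,
    }


def forward(net, x):
    """Return (y, h): h_k = tanh(w1_k . x + b1_k), y = w2 . h + b2 (g = identity)."""
    h = [math.tanh(sum(w * xj for w, xj in zip(row, x)) + b)
         for row, b in zip(net["w1"], net["b1"])]
    y = sum(w * hk for w, hk in zip(net["w2"], h)) + net["b2"]
    return y, h


def train_step(net, x, target, lr=0.01, lam=1e-3):
    """One SGD step on 0.5*(y - t)^2 + 0.5*lam*||w||^2 (biases not penalized)."""
    y, h = forward(net, x)
    err = y - target
    w2_old = list(net["w2"])  # back-propagate through the pre-update output weights
    for k, hk in enumerate(h):
        net["w2"][k] -= lr * (err * hk + lam * net["w2"][k])
    net["b2"] -= lr * err
    for k, hk in enumerate(h):
        delta = err * w2_old[k] * (1.0 - hk * hk)  # tanh'(a) = 1 - tanh(a)^2
        row = net["w1"][k]
        for j, xj in enumerate(x):
            row[j] -= lr * (delta * xj + lam * row[j])
        net["b1"][k] -= lr * delta
    return 0.5 * err * err


def data_loss(net, data):
    return sum(0.5 * (forward(net, x)[0] - t) ** 2 for x, t in data)


# Toy genotype vectors (0/1/2) and phenotypes; purely illustrative.
data = [([0, 1, 2], 0.5), ([2, 0, 1], -0.3), ([1, 1, 0], 0.2), ([0, 2, 2], 0.7)]
net = init_net(n_in=3, n_hidden=4)
loss_before = data_loss(net, data)
for _ in range(500):
    for x, t in data:
        train_step(net, x, t, lr=0.05)
loss_after = data_loss(net, data)
```

The weight-decay term shrinks every weight towards zero at each step, which is one common way to keep a network with thousands of SNP inputs from overfitting; the number of hidden neurons (`n_hidden`) is the quantity varied along the horizontal axis of Figure 3.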
Data used

| Data set | Number of animals | Markers | Phenotypes |
|---|---|---|---|
| German Fleckvieh bulls | 3 341 | 39 344 SNPs | DYD of milk traits |
| Holstein-Friesian bulls | 2 303 | 41 995 SNPs | DYD of milk traits |
| Holstein-Friesian cows | 777 | 41 718 SNPs | YD of milk traits |

SNP = single nucleotide polymorphism, YD = yield deviations, DYD = daughter yield deviations.
Summary statistics of phenotypes used

| Trait | Mean | Variance | Minimum | Maximum |
|---|---|---|---|---|
| Milk yield DYD | 1 779.16 | 219 257.40 | -852.48 | 3 372.67 |
| Protein yield DYD | 59.34 | 214.18 | -23.56 | 108.65 |
| Fat yield DYD | 59.34 | 320.81 | -39.12 | 137.11 |
| | | | | |
| Milk yield DYD | 707.44 | 434 324.64 | -852.09 | 3 706.01 |
| Protein yield DYD | 41.88 | 391.42 | -24.19 | 104.57 |
| Fat yield DYD | 41.14 | 645.42 | -45.81 | 139.74 |
| | | | | |
| Milk yield YD | 3.26 | 26.13 | -14.36 | 19.37 |
| Protein yield YD | 0.91 | 1.54 | -4.86 | 4.23 |
| Fat yield YD | 0.21 | 0.50 | -4.17 | 1.80 |

DYD = daughter yield deviations, YD = yield deviations.
Figure 3. Comparison of predictive abilities for all scenarios. Columns correspond to data sets and rows to milk, protein and fat yield. Panels (a-h) show the average Pearson correlation coefficient over cross-validation runs (vertical axis) against the number of hidden neurons tested (horizontal axis). Each panel presents results for the different genomic covariate structures used as inputs (X, G, UD).
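Figure 3 compares, among other inputs, the marker matrix X and the genomic relationship matrix G. The caption does not say how G was built; a common choice, assumed here, is VanRaden's first method, G = ZZ' / (2 sum_j p_j(1 - p_j)), where Z is the 0/1/2 genotype matrix centred by twice the allele frequencies p_j. Reading "G as input" as feeding row i of G (the relationships of animal i to the other animals) to the network for individual i is likewise an assumption, as is the function name `vanraden_g`.

```python
def vanraden_g(M):
    """Genomic relationship matrix G = Z Z' / (2 * sum_j p_j (1 - p_j)).

    M: list of n rows (animals), each with m genotypes coded 0/1/2.
    """
    n, m = len(M), len(M[0])
    p = [sum(row[j] for row in M) / (2.0 * n) for j in range(m)]  # allele frequencies
    scale = 2.0 * sum(pj * (1.0 - pj) for pj in p)
    if scale == 0.0:
        raise ValueError("all markers are monomorphic")
    Z = [[row[j] - 2.0 * p[j] for j in range(m)] for row in M]  # centred genotypes
    return [[sum(a * b for a, b in zip(Z[i], Z[k])) / scale for k in range(n)]
            for i in range(n)]


M = [[0, 1, 2], [1, 1, 0], [2, 1, 1]]  # 3 hypothetical animals x 3 SNPs
G = vanraden_g(M)
network_input_animal_0 = G[0]  # row of G used as the input vector for animal 0
```

One practical appeal of this input, consistent with the abstract's remark on computational cost, is that each animal is described by n relationship values instead of tens of thousands of SNP genotypes, so the input layer shrinks from roughly 40 000 to a few thousand nodes in these data sets.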
Phenotypic and marker-based genetic correlations between traits within data sets

| | Milk yield | Protein yield | Fat yield |
|---|---|---|---|
| Milk yield DYD | 0.87 (0.04) | 0.58 (0.03) | 0.73 (0.04) |
| Protein yield DYD | 0.70 (0.01) | 0.79 (0.04) | 0.62 (0.03) |
| Fat yield DYD | 0.89 (0.01) | 0.81 (0.01) | 0.77 (0.04) |
| | | | |
| Milk yield DYD | 0.67 (0.05) | 0.24 (0.04) | 0.52 (0.04) |
| Protein yield DYD | 0.43 (0.02) | 0.82 (0.05) | 0.42 (0.04) |
| Fat yield DYD | 0.86 (0.01) | 0.63 (0.01) | 0.60 (0.04) |
| | | | |
| Milk yield YD | 0.61 (0.08) | 0.24 (0.06) | 0.51 (0.08) |
| Protein yield YD | 0.48 (0.03) | 0.67 (0.08) | 0.31 (0.07) |
| Fat yield YD | 0.92 (0.01) | 0.60 (0.02) | 0.51 (0.08) |

Within each data set, the diagonal shows the marker-based heritability, the upper off-diagonal the marker-based genetic correlations and the lower off-diagonal the phenotypic correlations. Standard errors (SE) are given in parentheses. DYD = daughter yield deviation, YD = yield deviation.
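For reference, the three kinds of entries in this table are conventionally defined as below, where σ²_g and σ²_e are the genetic (here: marker-based) and residual variances and subscripts 1 and 2 index two traits. The source does not state how the marker-based variance components were estimated (a mixed model built on the genomic relationship matrix is typical), so this is background notation rather than the authors' exact procedure.

```latex
h^2 = \frac{\sigma^2_{g}}{\sigma^2_{g} + \sigma^2_{e}}, \qquad
r_g = \frac{\sigma_{g_{12}}}{\sqrt{\sigma^2_{g_1}\,\sigma^2_{g_2}}}, \qquad
r_p = \frac{\operatorname{cov}(y_1, y_2)}{\sqrt{\operatorname{var}(y_1)\,\operatorname{var}(y_2)}}
```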
Model comparison of linear and non-linear ANN models

| Trait | Linear ANN (1 hidden neuron, G), r (variance) | Non-linear ANN (1 hidden neuron, G), r (variance) | Best non-linear ANN, r (variance) |
|---|---|---|---|
| Milk yield DYD | 0.68 (0.0007) | 0.52 (0.0016) | 0.68 (0.0008) |
| Protein yield DYD | 0.68 (0.0006) | 0.53 (0.0011) | 0.67 (0.0005) |
| Fat yield DYD | 0.66 (0.0005) | 0.56 (0.0008) | 0.65 (0.0005) |
| | | | |
| Milk yield DYD | 0.60 (0.0006) | 0.53 (0.0011) | 0.58 (0.0008) |
| Protein yield DYD | 0.59 (0.0009) | 0.50 (0.0013) | 0.57 (0.0009) |
| Fat yield DYD | 0.57 (0.0009) | 0.51 (0.0010) | 0.56 (0.0009) |
| | | | |
| Milk yield YD | 0.47 (0.0031) | 0.44 (0.0040) | 0.47 (0.0027) |
| Protein yield YD | 0.37 (0.0033) | 0.35 (0.0039) | 0.35 (0.0032) |
| Fat yield YD | 0.46 (0.0037) | 0.39 (0.0049) | 0.47 (0.0028) |

Compared are the linear and the non-linear ANN, each with one neuron in the hidden layer and the G matrix as network input, and the best non-linear ANN. DYD = daughter yield deviation, YD = yield deviation, r = average Pearson correlation coefficient over the cross-validation runs; the variance across cross-validation runs is given in parentheses.
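The linear benchmark can be read as an ordinary linear regression on the network inputs. Assuming identity activations f(a) = a and g(a) = a, omitting bias terms, and writing w_{1kj} for the weight from input j to hidden neuron k and w_{2k} for the weight from hidden neuron k to the output (notation chosen here to follow Figure 2), the output for individual i collapses as follows:

```latex
\hat{y}_i = \sum_{k} w_{2k} \sum_{j} w_{1kj}\, x_{ij}
          = \sum_{j} \Big( \sum_{k} w_{2k}\, w_{1kj} \Big) x_{ij}
          = \sum_{j} \beta_j\, x_{ij},
\qquad \beta_j = \sum_{k} w_{2k}\, w_{1kj}.
```

Adding linear hidden neurons therefore changes only how the coefficients β_j are parameterized and regularized, not the class of functions the network can represent; any gain of a non-linear ANN over this benchmark has to come from the non-linear activation f.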