Tiago L. Passafaro, Fernando B. Lopes, João R. R. Dórea, Mark Craven, Vivian Breen, Rachel J. Hawken, Guilherme J. M. Rosa.
Abstract
BACKGROUND: Deep neural networks (DNN) are a particular case of artificial neural networks (ANN) composed of multiple hidden layers, and have recently gained attention in genome-enabled prediction of complex traits. Yet, few studies in genome-enabled prediction have assessed the performance of DNN relative to traditional regression models. Strikingly, no clear superiority of DNN has been reported so far, and results seem highly dependent on the species and traits to which they are applied. Nevertheless, the relatively small datasets used in previous studies, most with fewer than 5000 observations, may have prevented DNN from reaching their full potential. Therefore, the objective of this study was to investigate the impact of dataset sample size on the performance of DNN relative to Bayesian regression models for genome-enabled prediction of body weight in broilers, by sub-sampling the 63,526 observations of the training set.
Keywords: Multilayer perceptron; Body weight; Broilers; Deep neural networks; Genome-enabled prediction
Year: 2020 PMID: 33167865 PMCID: PMC7654004 DOI: 10.1186/s12864-020-07181-x
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
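The study design described in the abstract evaluates progressively larger fractions of the 63,526-record training set. Below is a minimal sketch of such a sub-sampling scheme, not the authors' code; nested sub-samples and the fixed seed are illustrative assumptions.

```python
import numpy as np

# Training-set fractions evaluated in the tables below (1% to 100%).
FRACTIONS = [0.01, 0.03, 0.05, 0.07, 0.10, 0.15, 0.20,
             0.30, 0.40, 0.50, 0.60, 0.70, 0.80, 0.90, 1.00]

def subsample_indices(n=63526, seed=0):
    """Draw nested random sub-samples of an n-record training set."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n)  # one shared permutation makes the subsets nested
    return {f: perm[: int(round(f * n))] for f in FRACTIONS}
```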
Hyperparameters considered in the neural architecture search of deep neural networks (DNN)^a
| Hyperparameter | Space |
|---|---|
| Number of units | 1, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 2000, 3000, 4000, 5000 |
| Hidden layers | 1, 2, 3, 4 |
| Dropout rate^b | 0.5, 0.6, 0.7, 0.8, 0.9, 1 |
| L2^c | 0.0000, 0.0025, 0.0050, 0.0075, 0.0100, 0.0125, 0.0150, 0.0175, 0.0200, 0.0225, 0.0250, 0.0275, 0.0300, 0.0325, 0.0350, 0.0375, 0.0400, 0.0425, 0.0450, 0.0475, 0.0500, 0.0525, 0.0550, 0.0575, 0.0600, 0.0625, 0.0650, 0.0675, 0.0700, 0.0725, 0.0750, 0.0775, 0.0800, 0.0825, 0.0850, 0.0875, 0.0900, 0.0925, 0.0950, 0.0975, 0.1000 |
^a The hyperparameters were randomly selected and combined to find the optimal DNN architecture
^b The dropout rate was applied to all layers except the output layer
^c L2 = ridge regularization
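As a hedged illustration of the random search over this space, the sketch below draws one candidate network, assuming a Keras/TensorFlow implementation (the paper's actual framework and training settings are not reproduced here). Following Fig. 4, the loss is mean square error minimized by stochastic gradient descent; treating the table's dropout value as a keep probability (so 1.0 means no dropout) is an assumption about its convention.

```python
import random
import tensorflow as tf

# Search space mirroring the table above.
UNITS = [1] + list(range(100, 1001, 100)) + [2000, 3000, 4000, 5000]
HIDDEN_LAYERS = [1, 2, 3, 4]
KEEP_RATES = [0.5, 0.6, 0.7, 0.8, 0.9, 1.0]          # assumed keep probabilities
L2_GRID = [round(i * 0.0025, 4) for i in range(41)]  # 0.0000 ... 0.1000

def sample_candidate(n_snps):
    """Draw one random architecture: one L2 and one dropout value per
    network, with the number of units drawn independently per layer."""
    l2 = random.choice(L2_GRID)
    keep = random.choice(KEEP_RATES)
    model = tf.keras.Sequential([tf.keras.Input(shape=(n_snps,))])
    for _ in range(random.choice(HIDDEN_LAYERS)):
        model.add(tf.keras.layers.Dense(
            random.choice(UNITS), activation="relu",
            kernel_regularizer=tf.keras.regularizers.l2(l2)))
        model.add(tf.keras.layers.Dropout(rate=1.0 - keep))  # keep=1.0 -> no dropout
    model.add(tf.keras.layers.Dense(1))  # single output: predicted body weight
    model.compile(optimizer="sgd", loss="mse")
    return model
```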
The best deep neural network architecture, selected based on prediction correlation on the tuning set, for each sub-sample of the training set
| Size (%) | Number of layers | Number of units per layer^a | L2^b | Dropout rate^c | Accuracy | MSEP^d |
|---|---|---|---|---|---|---|
| 1 | 4 | 5000(1)-1(2)-600(3)-800(4) | 0.0600 | 1.0 | 0.090 | 30,589.3 |
| 3 | 4 | 5000(1)-300(2)-200(3)-4000(4) | 0.0675 | 1.0 | 0.137 | 29,649.9 |
| 5 | 3 | 400(1)-200(2)-900(3) | 0.0100 | 0.5 | 0.145 | 30,408.7 |
| 7 | 2 | 500(1)-2000(2) | 0.0450 | 0.8 | 0.166 | 29,062.4 |
| 10 | 2 | 800(1)-100(2) | 0.0025 | 0.6 | 0.200 | 28,440.9 |
| 15 | 2 | 800(1)-900(2) | 0.0050 | 0.5 | 0.236 | 27,755.0 |
| 20 | 4 | 600(1)-100(2)-500(3)-700(4) | 0.0325 | 0.5 | 0.226 | 28,849.5 |
| 30 | 1 | 1000(1) | 0.0100 | 0.7 | 0.274 | 27,025.5 |
| 40 | 1 | 2000(1) | 0.0800 | 0.6 | 0.285 | 26,877.4 |
| 50 | 3 | 600(1)-4000(2)-100(3) | 0.0975 | 0.5 | 0.285 | 27,250.3 |
| 60 | 1 | 300(1) | 0.0800 | 0.8 | 0.304 | 26,622.3 |
| 70 | 1 | 400(1) | 0.0800 | 0.5 | 0.309 | 26,506.4 |
| 80 | 1 | 800(1) | 0.0925 | 0.7 | 0.308 | 26,484.5 |
| 90 | 1 | 400(1) | 0.0800 | 0.5 | 0.307 | 26,710.1 |
| 100 | 1 | 500(1) | 0.0600 | 1.0 | 0.322 | 26,264.8 |
^a The number in parentheses indicates the corresponding hidden layer
^b L2 = ridge regularization
^c The dropout rate was applied to all layers except the output layer
^d MSEP = mean square error of prediction
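Continuing the sketch above, selecting the winning architecture for a given sub-sample amounts to ranking the fitted candidates by their prediction correlation on the tuning set; this is an illustrative reading of the table caption, not the authors' code.

```python
import numpy as np

def select_best(fitted_models, x_tune, y_tune):
    """Return the candidate with the highest tuning-set prediction correlation."""
    def tuning_correlation(model):
        preds = model.predict(x_tune, verbose=0).ravel()
        return np.corrcoef(y_tune, preds)[0, 1]
    return max(fitted_models, key=tuning_correlation)
```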
Fig. 1 Predictive performance of each of the 200 deep neural networks generated by the neural architecture search, in terms of (a) prediction correlation and (b) mean square error of prediction (MSEP). The solid black line represents the median of the 200 deep neural networks for each sub-sample of the training set
Fig. 2 Predictive performance of Bayes Cπ, Bayesian Ridge Regression (BRR), Deep Neural Networks (DNN), Bayes Cπ fit with the tuning set (Bayes Cπ-WT), and Bayesian Ridge Regression fit with the tuning set (BRR-WT) for each sub-sample of the training set, in terms of (a) prediction correlation and (b) mean square error of prediction (MSEP)
Fig. 3 Predictive bias of Bayes Cπ, Bayesian Ridge Regression (BRR), Deep Neural Networks (DNN), Bayes Cπ fit with the tuning set (Bayes Cπ-WT), and Bayesian Ridge Regression fit with the tuning set (BRR-WT) for each sub-sample of the training set
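For reference, the three measures plotted in Figs. 1-3 can be computed as sketched below. Defining predictive bias as the slope of the regression of observed on predicted values (a slope of 1 indicating no bias) is an assumption about the paper's definition, though it is the common one.

```python
import numpy as np

def evaluate(y_obs, y_pred):
    r = np.corrcoef(y_obs, y_pred)[0, 1]     # prediction correlation
    msep = np.mean((y_obs - y_pred) ** 2)    # mean square error of prediction
    slope, _ = np.polyfit(y_pred, y_obs, 1)  # predictive bias (assumed definition)
    return r, msep, slope
```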
Spearman rank correlations between body weights predicted by the different genome-enabled prediction approaches, for the various sub-samples of the training dataset
| Predictive approach | 1% | 3% | 5% | 7% | 10% | 15% | 20% | 30% | 40% | 50% | 60% | 70% | 80% | 90% | 100% |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| BRR x Bayes Cπ | 0.99 | 0.99 | 0.99 | 0.99 | 0.99 | 0.99 | 0.99 | 0.99 | 0.98 | 0.98 | 0.98 | 0.98 | 0.98 | 0.98 | 0.98 |
| BRR x DNN | 0.79 | 0.89 | 0.86 | 0.95 | 0.95 | 0.96 | 0.78 | 0.97 | 0.94 | 0.91 | 0.94 | 0.95 | 0.95 | 0.95 | 0.94 |
| BRR x BRR-WT | 0.33 | 0.52 | 0.64 | 0.69 | 0.78 | 0.83 | 0.87 | 0.91 | 0.94 | 0.95 | 0.96 | 0.96 | 0.96 | 0.97 | 0.97 |
| BRR x Bayes Cπ-WT | 0.33 | 0.52 | 0.63 | 0.69 | 0.79 | 0.83 | 0.85 | 0.90 | 0.93 | 0.93 | 0.95 | 0.95 | 0.95 | 0.96 | 0.96 |
| Bayes Cπ x DNN | 0.79 | 0.89 | 0.86 | 0.96 | 0.95 | 0.95 | 0.78 | 0.95 | 0.93 | 0.88 | 0.93 | 0.94 | 0.94 | 0.93 | 0.94 |
| Bayes Cπ x BRR-WT | 0.32 | 0.52 | 0.64 | 0.69 | 0.78 | 0.82 | 0.86 | 0.90 | 0.93 | 0.93 | 0.94 | 0.95 | 0.95 | 0.95 | 0.96 |
| Bayes Cπ x Bayes Cπ-WT | 0.32 | 0.52 | 0.63 | 0.69 | 0.80 | 0.82 | 0.85 | 0.90 | 0.93 | 0.94 | 0.95 | 0.95 | 0.96 | 0.97 | 0.97 |
| DNN x BRR-WT | 0.33 | 0.52 | 0.58 | 0.66 | 0.76 | 0.79 | 0.71 | 0.88 | 0.89 | 0.87 | 0.92 | 0.92 | 0.93 | 0.92 | 0.92 |
| DNN x Bayes Cπ-WT | 0.33 | 0.52 | 0.57 | 0.66 | 0.76 | 0.79 | 0.69 | 0.87 | 0.87 | 0.86 | 0.90 | 0.91 | 0.92 | 0.91 | 0.91 |
| BRR-WT x Bayes Cπ-WT | 0.99 | 0.99 | 0.99 | 0.99 | 0.99 | 0.99 | 0.98 | 0.98 | 0.98 | 0.97 | 0.99 | 0.98 | 0.98 | 0.98 | 0.98 |
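The rank correlations in this table can be reproduced from any two vectors of predicted body weights; a minimal sketch using SciPy (an assumed dependency):

```python
from scipy.stats import spearmanr

def rank_correlation(pred_a, pred_b):
    """Spearman rank correlation between two models' predictions."""
    rho, _ = spearmanr(pred_a, pred_b)
    return rho
```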
Agreement (%) among the top 10%-ranked broilers selected across the different genome-enabled prediction approaches, for the various sub-samples of the training dataset
| Predictive approach | 1% | 3% | 5% | 7% | 10% | 15% | 20% | 30% | 40% | 50% | 60% | 70% | 80% | 90% | 100% |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| BRR x Bayes Cπ | 91.5 | 94.5 | 95.0 | 94.4 | 95.9 | 93.9 | 93.2 | 90.8 | 90.3 | 87.6 | 86.6 | 89.1 | 87.6 | 87.9 | 88.6 |
| BRR x DNN | 57.8 | 66.9 | 64.9 | 79.3 | 78.4 | 82.6 | 55.5 | 79.8 | 76.6 | 73.1 | 72.9 | 77.1 | 78.5 | 75.9 | 76.5 |
| BRR x BRR-WT | 28.2 | 37.3 | 44.5 | 50.1 | 57.7 | 62.8 | 68.5 | 74.3 | 78.8 | 81.5 | 83.3 | 84.6 | 86.4 | 86.1 | 87.9 |
| BRR x Bayes Cπ-WT | 28.3 | 36.8 | 44.9 | 50.5 | 58.1 | 63.4 | 64.7 | 72.1 | 75.8 | 76.4 | 79.5 | 80.1 | 80.6 | 83.0 | 83.5 |
| Bayes Cπ x DNN | 57.6 | 65.7 | 64.2 | 78.8 | 77.9 | 80.9 | 55.3 | 76.9 | 74.8 | 69.4 | 71.6 | 74.9 | 75.2 | 72.2 | 74.2 |
| Bayes Cπ x BRR-WT | 28.5 | 37.3 | 44.0 | 50.7 | 56.4 | 62.5 | 67.6 | 73.3 | 76.8 | 79.1 | 79.8 | 82.2 | 82.0 | 81.8 | 82.6 |
| Bayes Cπ x Bayes Cπ-WT | 28.7 | 36.9 | 44.2 | 50.9 | 57.3 | 63.4 | 64.1 | 73.2 | 75.8 | 78.7 | 80.4 | 81.3 | 81.3 | 83.3 | 85.2 |
| DNN x BRR-WT | 28.2 | 37.5 | 38.0 | 47.3 | 55.3 | 58.8 | 47.4 | 66.5 | 68.5 | 68.3 | 69.7 | 72.8 | 73.6 | 72.2 | 72.3 |
| DNN x Bayes Cπ-WT | 28.1 | 36.5 | 38.1 | 47.4 | 55.4 | 58.8 | 46.3 | 64.4 | 67.8 | 65.2 | 67.7 | 71.0 | 71.0 | 71.2 | 70.1 |
| BRR-WT x Bayes Cπ-WT | 94.1 | 96.2 | 93.3 | 93.9 | 94.5 | 93.9 | 89.5 | 88.9 | 88.7 | 85.7 | 89.6 | 88.6 | 88.1 | 88.3 | 88.2 |
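The agreement measure in this table is the percentage overlap between the sets of top-ranked broilers under two prediction approaches. A minimal sketch follows; the 10% selection fraction is an assumption consistent with the caption.

```python
import numpy as np

def top_agreement(pred_a, pred_b, frac=0.10):
    """Percentage overlap of the top-`frac` broilers under two approaches."""
    k = int(round(frac * len(pred_a)))
    top_a = set(np.argsort(pred_a)[-k:])  # indices of the k highest predictions
    top_b = set(np.argsort(pred_b)[-k:])
    return 100.0 * len(top_a & top_b) / k
```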
Fig. 4 Representation of a multilayer perceptron (MLP) architecture. (a) The structure of the deep neural network (DNN) and the training process, including forward and backward propagation. In forward propagation, information flows from the input layer to the output layer, with each layer passing the output of its activation functions to the next. In backward propagation, the network output is assessed with a loss function L(W) (e.g., mean square error), which is minimized by updating the network weights via stochastic gradient descent. (b) The calculations each unit performs to provide its output to the next layer: the weight vector [W(.)] and the inputs are linearly combined and transformed by an activation function, here the rectified linear unit, which outputs the maximum of zero and the linear combination of weights and inputs. This figure is adapted from the diagram proposed by Angermueller et al. (2016) [44]
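The per-unit computation in Fig. 4b reduces to a few lines of NumPy, shown here as a minimal sketch for a whole layer (weight matrix W and bias vector b are assumed names):

```python
import numpy as np

def relu_layer(x, W, b):
    """Linear combination of inputs and weights, then ReLU: max(0, Wx + b)."""
    return np.maximum(0.0, W @ x + b)
```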