Carlos Maldonado, Freddy Mora-Poblete, Rodrigo Iván Contreras-Soto, Sunny Ahmar, Jen-Tsung Chen, Antônio Teixeira do Amaral Júnior, Carlos Alberto Scapim.
Abstract
Genomic selection models were investigated to predict several complex traits in breeding populations of Zea mays L. and Eucalyptus globulus Labill. Two Machine Learning (ML) methods were implemented: (i) Deep Learning (DL) and (ii) Bayesian Regularized Neural Network (BRNN), each in combination with different hyperparameters. These ML methods were compared with Genomic Best Linear Unbiased Prediction (GBLUP) and several Bayesian regression models [Bayes A, Bayes B, Bayes Cπ, Bayesian Ridge Regression, Bayesian LASSO, and Reproducing Kernel Hilbert Space (RKHS)]. DL models that used Rectified Linear Units as the activation function had higher predictive ability; across the DL models, values ranged from 0.27 (pilodyn penetration of 6-year-old eucalypt trees) to 0.78 (flowering-related traits of maize). Moreover, the largest mini-batch size (100%) gave significantly higher predictive ability for wood-related traits than the smallest mini-batch size (10%). For the BRNN method, the one- and two-layer architectures that used only the pureline function predicted best; BRNN values ranged from 0.21 (pilodyn penetration) to 0.71 (flowering traits). DL showed a significant increase in predictive ability compared with the other genomic prediction methods (Bayesian alphabet models, GBLUP, RKHS, and BRNN). Another important finding was the usefulness of DL models (through an iterative algorithm) as an SNP detection strategy for genome-wide association studies. The results of this study confirm the importance of DL for genome-wide analyses and crop/tree improvement strategies, which holds promise for accelerating breeding progress.
Keywords: Bayesian regularized neural network; deep learning; eucalypt; genomic prediction; machine learning; single-nucleotide polymorphisms; tropical maize
Year: 2020 PMID: 33329658 PMCID: PMC7728740 DOI: 10.3389/fpls.2020.593897
Source DB: PubMed Journal: Front Plant Sci ISSN: 1664-462X Impact factor: 5.753
FIGURE 1. Diagram of a Long Short-Term Memory (LSTM) block. This block is a recurrently connected subnet that contains a memory cell and gate modules. X_t, h_{t-1}, and C_{t-1} are the inputs of the LSTM unit: the input of the current time step, the output of the previous LSTM unit, and the memory of the previous unit, respectively. C_t denotes the memory of the current unit and h_t denotes its output. The LSTM block is divided into three parts: forget (blue), update (green), and output (yellow). Each part contains a sigmoid function (σ) that computes a gate activation (f: forget gate, i: input gate, o: output gate) from the input weights (W_X), recurrent weights (W_h), and bias (b). The update and output parts also use a hyperbolic tangent (Tanh) function to calculate the input update (g) and the output of the current unit (h_t), respectively.
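The block described in the Figure 1 caption can be sketched as a single forward step. This is a minimal NumPy illustration of the standard LSTM equations, not the authors' implementation; the function name `lstm_step`, the dictionary layout of the weights, and the toy dimensions are assumptions made for the example.

```python
import numpy as np


def sigmoid(z):
    """Logistic function used by the forget, input, and output gates."""
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, c_prev, W_x, W_h, b):
    """One forward step of a standard LSTM block.

    x_t    : input at the current time step, shape (n_in,)
    h_prev : output of the previous unit, shape (n_hidden,)
    c_prev : memory of the previous unit, shape (n_hidden,)
    W_x, W_h, b : dicts keyed by "f", "i", "o", "g" (hypothetical layout)
                  holding input weights (n_hidden, n_in), recurrent weights
                  (n_hidden, n_hidden), and biases (n_hidden,).
    """
    def pre(key):
        # Pre-activation shared by all gates: W_x x_t + W_h h_prev + b
        return W_x[key] @ x_t + W_h[key] @ h_prev + b[key]

    f = sigmoid(pre("f"))       # forget gate
    i = sigmoid(pre("i"))       # input gate
    o = sigmoid(pre("o"))       # output gate
    g = np.tanh(pre("g"))       # input update (candidate memory)

    c_t = f * c_prev + i * g    # memory of the current unit
    h_t = o * np.tanh(c_t)      # output of the current unit
    return h_t, c_t


# Toy usage with random weights (all values are illustrative only).
rng = np.random.default_rng(0)
n_in, n_hidden = 4, 3
W_x = {k: rng.normal(size=(n_hidden, n_in)) for k in "fiog"}
W_h = {k: rng.normal(size=(n_hidden, n_hidden)) for k in "fiog"}
b = {k: np.zeros(n_hidden) for k in "fiog"}
h_t, c_t = lstm_step(rng.normal(size=n_in), np.zeros(n_hidden),
                     np.zeros(n_hidden), W_x, W_h, b)
print(h_t.shape, c_t.shape)
```

Because the output gate lies in (0, 1) and Tanh lies in (-1, 1), every element of the output stays strictly between -1 and 1, whatever the scale of the inputs.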
Predictive ability of complex traits in maize (FF, female flowering; MF, male flowering; ASI, anthesis-silking interval) and eucalypt (WD, pilodyn penetration; ST, stem straightness; BQ, branch quality; TH, tree height; DBH, diameter at breast height) for six deep learning models, considering different hyperparameters: activation function (Rectified Linear Units: lstm1, lstm3, and lstm5; hyperbolic tangent: lstm2, lstm4, and lstm6) and mini-batch (10%: lstm1 and lstm2, 50%: lstm3 and lstm4, 100%: lstm5 and lstm6).
| Model | FF (Sabaudia) | FF (Cambira) | MF (Sabaudia) | MF (Cambira) | ASI (Sabaudia) | ASI (Cambira) | WD | ST | BQ | TH | DBH |
| lstm1 | 0.533 | 0.724 | 0.623 | 0.764 | 0.455 | 0.552 | 0.317 | 0.469 | 0.422 | 0.368 | 0.377 |
| lstm2 | 0.545 | 0.740 | 0.631 | 0.757 | 0.492 | 0.548 | 0.271 | 0.416 | 0.382 | 0.350 | 0.369 |
| lstm3 | 0.558 | 0.742 | 0.639 | 0.763 | 0.486 | 0.561 | 0.395 | 0.481 | 0.388 | 0.423 | 0.472 |
| lstm4 | 0.539 | 0.737 | 0.628 | 0.751 | 0.475 | 0.506 | 0.365 | 0.345 | 0.343 | 0.408 | 0.404 |
| lstm5 | 0.565 | 0.751 | 0.639 | 0.776 | 0.528 | 0.610 | 0.471 | 0.557 | 0.460 | 0.496 | 0.556 |
| lstm6 | 0.558 | 0.730 | 0.627 | 0.765 | 0.488 | 0.537 | 0.408 | 0.558 | 0.436 | 0.474 | 0.452 |
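The excerpt does not define how predictive ability is computed. A common convention in genomic prediction, assumed here, is the Pearson correlation between observed phenotypes and the values predicted for individuals held out of model training. The sketch below illustrates that convention only; the function name and the toy numbers are hypothetical and are not data from the study.

```python
import numpy as np


def predictive_ability(observed, predicted):
    """Pearson correlation between observed and predicted phenotypes.

    Assumption: this is the definition of predictive ability; the
    excerpt itself does not state the formula.
    """
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    if observed.shape != predicted.shape:
        raise ValueError("observed and predicted must have the same shape")
    return float(np.corrcoef(observed, predicted)[0, 1])


# Toy example: made-up phenotypes for five held-out individuals.
y_obs = np.array([62.0, 65.0, 70.0, 58.0, 66.0])
y_hat = np.array([63.1, 64.2, 68.5, 60.0, 65.0])
r = predictive_ability(y_obs, y_hat)
print(round(r, 3))
```

Under this convention the value is scale-free, so models can be compared within a trait even when their predictions are on different scales.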
Predictive ability of complex traits in maize (FF, female flowering; MF, male flowering; ASI, anthesis-silking interval) and eucalypt (WD, pilodyn penetration; ST, stem straightness; BQ, branch quality; TH, tree height; DBH, diameter at breast height) for Bayesian regularized neural network models, considering different hyperparameters: activation function (pureline: brnn1, brnn4, brnn7, brnn8, brnn9, and brnn10; logsig: brnn2, brnn5, brnn8, brnn9, and brnn10; tansig: brnn3, brnn6, brnn7, brnn9, and brnn10) and number of layers (one layer: brnn1, brnn2, and brnn3; two layers: brnn4, brnn5, brnn6, brnn7, and brnn8; three layers: brnn9 and brnn10).
| Model | FF (Sabaudia) | FF (Cambira) | MF (Sabaudia) | MF (Cambira) | ASI (Sabaudia) | ASI (Cambira) | WD | ST | BQ | TH | DBH |
| brnn1 | 0.481 | 0.617 | 0.548 | 0.709 | 0.447 | 0.461 | 0.454 | 0.469 | 0.419 | 0.491 | 0.490 |
| brnn2 | 0.410 | 0.548 | 0.436 | 0.609 | 0.272 | 0.230 | 0.211 | 0.333 | 0.349 | 0.374 | 0.399 |
| brnn3 | 0.307 | 0.294 | 0.311 | 0.673 | 0.337 | 0.453 | 0.271 | 0.398 | 0.311 | 0.349 | 0.390 |
| brnn4 | 0.486 | 0.652 | 0.584 | 0.710 | 0.423 | 0.460 | 0.466 | 0.459 | 0.412 | 0.504 | 0.501 |
| brnn5 | 0.444 | 0.607 | 0.543 | 0.672 | 0.333 | 0.379 | 0.223 | 0.412 | 0.391 | 0.410 | 0.463 |
| brnn6 | 0.413 | 0.593 | 0.439 | 0.618 | 0.406 | 0.459 | 0.371 | 0.427 | 0.374 | 0.380 | 0.409 |
| brnn7 | 0.434 | 0.604 | 0.533 | 0.322 | 0.413 | 0.407 | 0.385 | 0.345 | 0.208 | 0.378 | 0.400 |
| brnn8 | 0.444 | 0.596 | 0.497 | 0.641 | 0.415 | 0.286 | 0.397 | 0.315 | 0.260 | 0.409 | 0.432 |
| brnn9 | 0.440 | 0.581 | 0.521 | 0.637 | 0.407 | 0.454 | 0.407 | 0.422 | 0.374 | 0.375 | 0.403 |
| brnn10 | 0.426 | 0.532 | 0.520 | 0.647 | 0.39 | 0.416 | 0.231 | 0.443 | 0.222 | 0.301 | 0.253 |
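The three activation functions named in the caption can be written out explicitly. The sketch below assumes that "pureline" refers to the linear transfer function known as `purelin` in the MATLAB naming convention, with `logsig` the logistic sigmoid and `tansig` the hyperbolic tangent sigmoid; the excerpt does not say which software was used, and the `forward` helper and toy weights are hypothetical.

```python
import numpy as np


def purelin(x):
    """Linear transfer function: returns its input unchanged."""
    return np.asarray(x, dtype=float)


def logsig(x):
    """Logistic sigmoid, output in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-np.asarray(x, dtype=float)))


def tansig(x):
    """Hyperbolic tangent sigmoid, output in (-1, 1); equals tanh(x)."""
    return 2.0 / (1.0 + np.exp(-2.0 * np.asarray(x, dtype=float))) - 1.0


TRANSFER = {"purelin": purelin, "logsig": logsig, "tansig": tansig}


def forward(x, layers):
    """Feed x through a list of (W, b, transfer_name) layers."""
    a = np.asarray(x, dtype=float)
    for W, b, name in layers:
        a = TRANSFER[name](W @ a + b)
    return a


# Two stacked linear layers collapse to one affine map:
# W2 (W1 x + b1) + b2 = (W2 W1) x + (W2 b1 + b2)
rng = np.random.default_rng(1)
W1, b1 = rng.normal(size=(3, 5)), rng.normal(size=3)
W2, b2 = rng.normal(size=(1, 3)), rng.normal(size=1)
x = rng.normal(size=5)
two_layer = forward(x, [(W1, b1, "purelin"), (W2, b2, "purelin")])
one_layer = (W2 @ W1) @ x + (W2 @ b1 + b2)
print(np.allclose(two_layer, one_layer))
```

One consequence worth keeping in mind when reading the table: a network built only from linear transfer functions is mathematically a single affine map of its inputs, regardless of how many layers it has.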
Estimates of predictive ability of complex traits for different genomic models assessed in 6-year-old eucalypt trees (BL, Bayesian LASSO; BRR, Bayesian Ridge Regression).
| Model/traits | WD | ST | BQ | TH | DBH |
| Bayes A | 0.267 | 0.376 | 0.216 | 0.304 | 0.352 |
| Bayes B | 0.295 | 0.518 | 0.128 | 0.319 | 0.341 |
| Bayes Cπ | 0.455 | 0.544 | 0.162 | 0.441 | 0.394 |
| BL | 0.301 | 0.200 | 0.056 | 0.204 | 0.169 |
| BRR | 0.321 | 0.481 | 0.309 | 0.303 | 0.444 |
| GBLUP | 0.187 | 0.226 | 0.142 | 0.159 | 0.220 |
| RKHS | 0.223 | 0.225 | 0.180 | 0.197 | 0.230 |
| BRNN | 0.454 | 0.469 | 0.419 | 0.491 | 0.490 |
| DL | 0.471 | 0.557 | 0.460 | 0.496 | 0.556 |
| | 0.09 (0.05) | 0.01 (0.03) | 0.05 (0.04) | 0.04 (0.04) | 0.01 (0.03) |
Estimates of predictive ability of complex traits for different genomic models assessed in maize inbred lines (BL, Bayesian LASSO; BRR, Bayesian Ridge Regression).
| Model/traits | FF (Sabaudia) | FF (Cambira) | MF (Sabaudia) | MF (Cambira) | ASI (Sabaudia) | ASI (Cambira) |
| Bayes A | 0.512 | 0.635 | 0.592 | 0.652 | 0.464 | 0.510 |
| Bayes B | 0.499 | 0.633 | 0.567 | 0.661 | 0.462 | 0.550 |
| Bayes Cπ | 0.487 | 0.648 | 0.586 | 0.644 | 0.469 | 0.533 |
| BL | 0.498 | 0.624 | 0.561 | 0.664 | 0.527 | 0.540 |
| BRR | 0.511 | 0.617 | 0.594 | 0.660 | 0.479 | 0.543 |
| GBLUP | 0.531 | 0.558 | 0.563 | 0.645 | 0.429 | 0.454 |
| RKHS | 0.526 | 0.560 | 0.590 | 0.667 | 0.421 | 0.469 |
| BRNN | 0.481 | 0.617 | 0.548 | 0.709 | 0.447 | 0.461 |
| DL | 0.565 | 0.751 | 0.639 | 0.776 | 0.528 | 0.610 |
| | 0.606 (0.12) | 0.847 (0.08) | 0.614 (0.12) | 0.778 (0.1) | 0.287 (0.1) | 0.295 (0.1) |
| | 0.208 (0.43) | 0.506 (0.47) | 0.206 (0.43) | 0.758 (0.54) | 0.253 (0.14) | 0.274 (0.12) |
| | 0.398 (0.48) | 0.341 (0.49) | 0.408 (0.48) | 0.020 (0.57) | 0.034 (0.12) | 0.021 (0.1) |
FIGURE 2. Estimates of marker effects obtained using the deep learning model for male flowering (MF) of maize (A,C,E) and stem straightness (ST) of 6-year-old eucalypt trees (B,D,F). Three iterations of the algorithm of Wang et al. (2012) are shown.