Weverton Gomes da Costa¹, Maurício de Oliveira Celeri², Ivan de Paiva Barbosa³, Gabi Nunes Silva⁴, Camila Ferreira Azevedo³, Aluizio Borem³, Moysés Nascimento², Cosme Damião Cruz¹.
Abstract
Genome-wide selection (GWS) is one of the contributions of molecular genetics to breeding. Machine learning (ML) and artificial neural network (ANN) methods are nonparametric and can produce more accurate and parsimonious models for GWS analysis. Multivariate Adaptive Regression Splines (MARS) is considered one of the most flexible ML methods, automatically modeling nonlinearities and interactions among the predictor variables. This study aimed to evaluate and compare ANN-based methods, ML methods (including MARS), and G-BLUP for GWS. An F2 population of 1000 individuals genotyped for 4010 SNP markers was simulated, together with twelve traits from a model with epistatic effects, with QTL numbers ranging from eight to 480 and heritability (h²) of 0.3, 0.5, or 0.8. Variation in heritability and in the number of QTLs affected the performance of the methods. For quantitative traits (40, 80, 120, 240, and 480 QTLs), the highest R² values were observed for the Radial Basis Function network (RBF) and G-BLUP, followed by Random Forest (RF), Bagging (BA), and Boosting (BO). RF and BA also gave better results for traits with h² of 0.3, with R² values of 16.51% and 16.30%, respectively, while MARS methods gave better results for oligogenic traits, with R² values ranging from 39.12% to 43.20% at h² of 0.5 and from 59.92% to 78.56% at h² of 0.8. Non-additive MARS methods also showed high R² for traits with high heritability and 240 or more QTLs. ANN and ML methods are powerful tools for predicting genetic values of traits with epistatic effects across different degrees of heritability and QTL numbers.
Keywords: Genome wide selection; Genome-enabled prediction; Multivariate adaptive regression splines; Non-additive effects; Quantitative trait locus
Year: 2022 PMID: 36249559 PMCID: PMC9547190 DOI: 10.1016/j.csbj.2022.09.029
Source DB: PubMed Journal: Comput Struct Biotechnol J ISSN: 2001-0370 Impact factor: 6.155
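The simulation design summarized in the abstract (an F2 population, a subset of markers acting as QTLs, epistatic effects, and a target heritability) can be illustrated with a minimal sketch. This is not the authors' simulation procedure: the function name `simulate_f2_trait`, the unlinked markers, the unit effect sizes, and the choice of epistasis as products of consecutive QTL pairs are all assumptions made for illustration. The only relation taken as given is that the residual variance is set so that Var(g) / (Var(g) + Var(e)) equals h².

```python
import numpy as np


def simulate_f2_trait(n_ind=1000, n_snp=4010, n_qtl=8, h2=0.5, seed=0):
    """Hypothetical sketch: simulate F2 genotypes and one trait with heritability h2."""
    rng = np.random.default_rng(seed)

    # F2 genotypes coded as allele counts 0/1/2 in the expected 1:2:1 ratio.
    # Assumption: markers are sampled independently (no linkage map).
    geno = rng.choice([0, 1, 2], size=(n_ind, n_snp), p=[0.25, 0.5, 0.25])

    # Pick the markers that act as QTLs and centre their coding to -1/0/1.
    qtl = rng.choice(n_snp, size=n_qtl, replace=False)
    x = geno[:, qtl] - 1.0

    # Assumption: unit additive effects plus pairwise epistasis between
    # consecutive QTLs (product of their centred codes).
    additive = x.sum(axis=1)
    epistatic = (x[:, :-1] * x[:, 1:]).sum(axis=1)
    g = additive + epistatic

    # Choose the residual variance so that Var(g) / (Var(g) + Var(e)) = h2.
    var_e = g.var() * (1.0 - h2) / h2
    e = rng.normal(0.0, np.sqrt(var_e), size=n_ind)

    return geno, g, g + e


if __name__ == "__main__":
    geno, g, y = simulate_f2_trait(n_qtl=8, h2=0.5)
    print("realized heritability:", g.var() / y.var())
```

Because the residuals are sampled, the realized ratio Var(g) / Var(y) only approximates the requested h² for a finite population.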
Table 1. Number of controlling loci and heritability (h²) of the 12 simulated traits (C1 to C18).
| C1 | C3 | C4 | C5 | C6 | ||
| C7 | C8 | C9 | C10 | C11 | C12 | |
| C13 | C14 | C15 | C16 | C17 | C18 | |
Fig. 1. Boxplot of the genetic and phenotypic values of the 18 simulated traits, considering a coefficient of variation of 12% and a mean of 100. The specification of each trait is given in Table 1.
Fig. 2. Average selective accuracy as a function of the number of genes and heritability for the method families: Trees [Bagging (BA), Boosting (BO), Decision Tree (DT), and Random Forest (RF)]; Networks [Multilayer Perceptron Network (MLP) and Radial Basis Function Network (RBF)]; MARS (MARS 1, 2 and 3); and G-BLUP. The red dashed line marks the overall mean selective accuracy across all methods, for comparison. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
Fig. 3. Average predictive accuracy (RMSE) as a function of the number of genes and heritability for the method families: Trees [Bagging (BA), Boosting (BO), Regression Tree (DT), and Random Forest (RF)]; Networks [Multilayer Perceptron Network (MLP) and Radial Basis Function Network (RBF)]; MARS (MARS 1, 2 and 3); and G-BLUP. The red dashed line marks the overall mean predictive accuracy (RMSE) across all methods, for comparison. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
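The two summary measures named in the captions can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: it assumes the R² reported as a percentage is the squared Pearson correlation between predicted and reference values, and it uses the standard definition of RMSE. The function names `r2_percent` and `rmse` are hypothetical.

```python
import numpy as np


def r2_percent(y_true, y_pred):
    """Squared Pearson correlation between predictions and reference values, in percent.

    Assumption: this is the form of R² used for selective accuracy.
    """
    r = np.corrcoef(np.asarray(y_true, dtype=float),
                    np.asarray(y_pred, dtype=float))[0, 1]
    return 100.0 * r ** 2


def rmse(y_true, y_pred):
    """Root mean squared error between predictions and reference values."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


if __name__ == "__main__":
    y_true = [1.0, 2.0, 3.0, 4.0]
    y_pred = [2.0, 4.0, 6.0, 8.0]
    print(r2_percent(y_true, y_pred), rmse(y_true, y_pred))
```

The toy values show why the two measures are reported side by side: predictions that rank individuals perfectly give a squared correlation of 100% even when they are off in scale, whereas RMSE penalizes the scale error.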