Lígia de Oliveira Amaral, Glauco Vieira Miranda, Bruno Henrique Pedroso Val, Alice Pereira Silva, Alyce Carla Rodrigues Moitinho, Sandra Helena Unêda-Trevisoli.
Abstract
Soybean has a recognized narrow genetic base, which often makes it difficult to visualize the available genetic and phenotypic variability and to identify superior genotypes during selection. In addition, the phenotypic expression of soybean plants is strongly affected by photoperiod, so a given variety is grown in the latitude range that offers ideal conditions for its development, as defined by its relative maturity group (RMG). This study therefore evaluated the efficiency of artificial neural networks (ANNs) as a tool for discriminating and classifying tropical soybean genotypes according to their RMG during population selection, with the aim of optimizing the phenotypic performance of the selected genotypes. For this purpose, three biparental populations were synthesized: one with wide genetic variability for the RMG character, obtained from the hybridization of genitors of RMG 5 (sub-tropical, 23° LS) × RMG 9.4 (tropical, 0° LS), and two with narrow variability, obtained from genitors RMG 7.3 (tropical, 20° LS) × RMG 9.4 and RMG 5.3 × RMG 6.7, respectively. The developed ANN architecture was compared with Fisher's linear and Anderson's quadratic parametric discriminant methods for discriminating and classifying the genotypes. The ANN showed an apparent error rate of 8.16% or less and a low influence of environmental factors, correctly classifying the genotypes even in cases of reduced genetic variability, such as the RMG 5 × RMG 6 population. In contrast, the discriminant functions were inefficient at classifying the genotypes in the populations with genealogical similarity (RMG 5 × RMG 6) and with wide genetic variability, with error rates of more than 50%.
Based on these results, ANNs can be used to discriminate genotypes in the initial generations of selection in breeding programs that develop high-performance cultivars for wide and reduced photoperiod amplitudes, even with fewer selection environments, and can do so more efficiently and with less time and fewer resources. Where the parents are similar, ANNs can correctly classify genotypes from populations with a narrow genetic base, as well as pure lines and genotypes with a high degree of inbreeding.
Keywords: apparent error rate; data mining; Glycine max; machine learning; photoperiod; relative maturity
Year: 2022 PMID: 35909774 PMCID: PMC9328155 DOI: 10.3389/fpls.2022.814046
Source DB: PubMed Journal: Front Plant Sci ISSN: 1664-462X Impact factor: 6.627
Genealogy and relative maturity group (RMG) of the 11 soybean populations used in the study.
| Population | Genealogy | RMG |
|---|---|---|
| Brazil | BRS 278 RR × 5953 RSF RR | 9.4/5.0 |
| Southern | BMX Potência RR × BMX Energia RR | 6.7/5.3 |
| Northern | BRS 245 RR × BRS 278 RR | 7.3/9.4 |
| GBN1 | BRS 278 RR | 9.4 |
| GB2 | 5953 RSF RR | 5.0 |
| GS1 | BMX Potência RR | 6.7 |
| GS2 | BMX Energia RR | 5.3 |
| GN2 | BRS 245 RR | 7.3 |
| TGM7 | TMG 1174 RR | 7.4 |
| TGM6 | TMG 7262 RR | 6.2 |
| TGM8 | TMG 1179 RR | 7.9 |
Classification of soybean genotypes into 11 populations of different relative maturity groups and estimation of apparent error rate (AER) according to Fisher’s and Anderson’s discriminant analysis and hold-out and k-fold ANN approaches.
| Approach | Total ratings | Misclassifications | AER (%) |
|---|---|---|---|
| Fisher | 1,517 | 889 | 58.60 |
| Anderson | 1,517 | 769 | 50.59 |
| Hold-out | 1,458 | 119 | 8.16 |
| k-fold | 729 | 41 | 5.62 |
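
The AER column follows from the two count columns: it is the share of misclassified ratings, expressed as a percentage. A minimal Python sketch of that arithmetic, using the counts from the table above (the function name `apparent_error_rate` is illustrative and not taken from the study):

```python
def apparent_error_rate(misclassified: int, total: int) -> float:
    """Return the apparent error rate (AER) as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * misclassified / total


# (total ratings, misclassifications) as reported in the table
counts = {
    "Fisher": (1517, 889),
    "Anderson": (1517, 769),
    "Hold-out": (1458, 119),
    "k-fold": (729, 41),
}

# AER per approach, rounded to two decimals like the table
aer = {
    name: round(apparent_error_rate(wrong, total), 2)
    for name, (total, wrong) in counts.items()
}

for name, value in aer.items():
    print(f"{name}: {value:.2f}%")
```

The Fisher, hold-out, and k-fold rows reproduce the tabulated values. For the Anderson row, 769 of 1,517 works out to 50.69%, slightly above the tabulated 50.59%; the source gives no explanation for the difference.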
Model prediction quality evaluation metrics for hold-out and k-fold approaches in classifying soybean genotypes in 11 populations from different relative maturity groups.
| Approach | Loss (%) | Accuracy (%) | Precision (%) | Recall (%) | |
|---|---|---|---|---|---|
| Hold-out | 34.10 | 91.84 | 92.14 | 91.70 | 91.94 |
| k-fold | 26.39 | 93.36 | 93.49 | 93.23 | 93.36 |
Classification of soybean genotypes in 11 populations from different relative maturity groups according to Anderson’s discriminant analysis.
| POP | Brazil | Southern | Northern | GBN1 | GB2 | GS1 | GS2 | GN2 | TGM7 | TGM6 | TGM8 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Brazil | … | 96 | 84 | 2 | 10 | 70 | 7 | 15 | 27 | 28 | 25 |
| Southern | 22 | … | 0 | 0 | 19 | 197 | 3 | 0 | 0 | 40 | 12 |
| Northern | 0 | 0 | … | 3 | 0 | 0 | 0 | 15 | 1 | 0 | 0 |
| GBN1 | 0 | 0 | 0 | … | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| GB2 | 0 | 2 | 0 | 0 | … | 0 | 12 | 0 | 0 | 2 | 0 |
| GS1 | 0 | 1 | 0 | 0 | 0 | … | 0 | 0 | 0 | 2 | 1 |
| GS2 | 0 | 1 | 0 | 0 | 1 | 0 | … | 0 | 0 | 0 | 0 |
| GN2 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | … | 0 | 0 | 1 |
| TGM7 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 3 | … | 0 | 7 |
| TGM6 | 1 | 23 | 0 | 0 | 1 | 18 | 1 | 0 | 0 | … | 1 |
| TGM8 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 8 | 4 | … |
The column population is that to which the genotype belongs and the row population is that allocated by the model. Therefore, the correct classifications are on the highlighted diagonal and the incorrect ones outside of it.
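
To make the orientation concrete, here is a small sketch with a hypothetical 3 × 3 matrix. The counts and the labels A, B, and C are invented for illustration and are not data from the study. Rows are the populations allocated by the model and columns are the populations the genotypes belong to, so each column sums to the number of genotypes of that true population.

```python
# Hypothetical toy matrix: matrix[row][col] = number of genotypes that
# belong to population `col` and were allocated to population `row`.
labels = ["A", "B", "C"]
matrix = [
    [8, 1, 0],  # allocated to A
    [2, 9, 1],  # allocated to B
    [0, 0, 9],  # allocated to C
]


def per_population_error(matrix):
    """Error rate (%) for each true population, i.e. for each column."""
    size = len(matrix)
    rates = []
    for col in range(size):
        column_total = sum(matrix[row][col] for row in range(size))
        correct = matrix[col][col]  # diagonal entry of this column
        rates.append(100.0 * (column_total - correct) / column_total)
    return rates


# Overall apparent error rate: everything off the diagonal over the total.
correct = sum(matrix[i][i] for i in range(len(matrix)))
total = sum(sum(row) for row in matrix)
overall_aer = 100.0 * (total - correct) / total

for label, rate in zip(labels, per_population_error(matrix)):
    print(f"true population {label}: {rate:.1f}% misclassified")
print(f"overall AER: {overall_aer:.1f}%")
```

Reading down a column shows where the genotypes of one population ended up; reading across a row shows which populations the model confused with the allocated one.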
Classification of soybean genotypes in 11 populations of different relative maturity groups by k-fold approach.
| POP | Brazil | Southern | Northern | GBN1 | GB2 | GS1 | GS2 | GN2 | TGM7 | TGM6 | TGM8 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Brazil | … | 11 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 0 |
| Southern | 4 | … | 0 | 0 | 1 | 4 | 0 | 0 | 0 | 3 | 0 |
| Northern | 1 | 0 | … | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| GBN1 | 0 | 0 | 0 | … | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| GB2 | 0 | 2 | 0 | 0 | … | 0 | 0 | 0 | 0 | 0 | 0 |
| GS1 | 0 | 2 | 0 | 0 | 0 | … | 0 | 0 | 0 | 1 | 0 |
| GS2 | 0 | 0 | 0 | 0 | 0 | 0 | … | 0 | 0 | 0 | 0 |
| GN2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | … | 0 | 0 | 0 |
| TGM7 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | … | 0 | 0 |
| TGM6 | 3 | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | … | 0 |
| TGM8 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | … |
The column population is that to which the genotype belongs and the row population is that allocated by the model. Therefore, the correct classifications are on the highlighted diagonal and the incorrect ones outside of it.
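
As a consistency check, the off-diagonal counts of the k-fold table can be summed and compared with the 41 misclassifications reported for the k-fold approach. The sketch below assumes that the rows follow the same population order as the columns, and it marks the diagonal as `None` because the diagonal counts are not legible in this copy of the table. The variable names are illustrative.

```python
populations = ["Brazil", "Southern", "Northern", "GBN1", "GB2", "GS1",
               "GS2", "GN2", "TGM7", "TGM6", "TGM8"]

# k-fold table: rows = allocated population, columns = true population.
# None marks the diagonal (correct classifications), which is not used here.
kfold = [
    [None, 11, 0, 0, 0, 0, 0, 0, 0, 3, 0],
    [4, None, 0, 0, 1, 4, 0, 0, 0, 3, 0],
    [1, 0, None, 0, 0, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, None, 0, 0, 0, 0, 0, 0, 0],
    [0, 2, 0, 0, None, 0, 0, 0, 0, 0, 0],
    [0, 2, 0, 0, 0, None, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, None, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, None, 0, 0, 0],
    [1, 0, 0, 0, 0, 0, 0, 0, None, 0, 0],
    [3, 4, 0, 0, 0, 0, 0, 0, 0, None, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, None],
]

# Total number of misclassified genotypes (all off-diagonal cells).
misclassified = sum(v for row in kfold for v in row if v is not None)

# Misclassified genotypes per true population (off-diagonal column sums).
by_true_population = {
    name: sum(row[j] for row in kfold if row[j] is not None)
    for j, name in enumerate(populations)
}

print(misclassified)
print(by_true_population)
```

The total comes to 41, which matches the misclassification count reported for the k-fold approach. Under the stated row-order assumption, most of these errors involve genotypes that belong to the Southern and Brazil populations.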
Figure 1. Incorrect classifications of soybean genotypes from 11 populations by the k-fold methodology in the agricultural years of evaluation 2017/2018, 2018/2019, and 2019/2020.