Antônio Carlos da Silva Júnior, Michele Jorge da Silva, Cosme Damião Cruz, Isabela de Castro Sant'Anna, Gabi Nunes Silva, Moysés Nascimento, Camila Ferreira Azevedo.
Abstract
The present study evaluated the importance of auxiliary traits for a principal trait based on phenotypic information and a previously known genetic structure, using computational intelligence and machine learning to develop predictive tools for plant breeding. Data for an F2 population of 500 individuals, obtained from a cross between contrasting homozygous parents, were simulated. Phenotypic traits were simulated based on previously established means and heritability estimates (30%, 50%, and 80%); traits were distributed in a genome with 10 linkage groups, considering two alleles per marker. Four different scenarios were considered. For the principal trait, heritability was 50%, and 40 control loci were distributed in five linkage groups. Another phenotypic control trait was simulated with the same complexity as the principal trait but without any genetic relationship with it, and without pleiotropy or a factorial link between the control loci of the two traits. The other auxiliary traits shared a large number of control loci with the principal trait, but could be distinguished by the differential action of the environment on them, as reflected in heritability estimates (30%, 50%, and 80%). The coefficient of determination was used to evaluate the proposed methodologies. Multiple regression, computational intelligence, and machine learning were used to predict the importance of the tested traits. Computational intelligence and machine learning were superior in extracting nonlinear information from model inputs and quantifying the relative contributions of phenotypic traits. The R2 values ranged from 44.0% to 83.0% for computational intelligence and from 79.0% to 94.0% for machine learning. In conclusion, the relative contributions of auxiliary traits in different scenarios in plant breeding programs can be efficiently predicted using computational intelligence and machine learning.
Year: 2021 PMID: 34843488 PMCID: PMC8629227 DOI: 10.1371/journal.pone.0257213
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Description of phenotypic traits (PT) in relation to heritability (h2) and the distribution of linkage groups (LG).
| PT | h2 | LG1 | LG2 | LG3 | LG4 | LG5 | LG6 | LG7 | LG8 | LG9 | LG10 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.5 | 8 | 8 | 8 | 8 | 8 | - | - | - | - | - |
| 2 | 0.5 | - | - | - | - | - | 8 | 8 | 8 | 8 | 8 |
| 3 | 0.3 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 |
| 4 | 0.5 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 |
| 5 | 0.8 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 |
| 6 | 0.3 | 4 | 4 | 4 | - | - | 8 | 8 | 4 | 4 | 4 |
| 7 | 0.5 | 4 | 4 | 4 | - | - | 8 | 8 | 4 | 4 | 4 |
| 8 | 0.8 | 4 | 4 | 4 | - | - | 8 | 8 | 4 | 4 | 4 |
| 9 | 0.3 | 4 | - | - | - | - | 8 | 8 | 8 | 8 | 4 |
| 10 | 0.5 | 4 | - | - | - | - | 8 | 8 | 8 | 8 | 4 |
| 11 | 0.8 | 4 | - | - | - | - | 8 | 8 | 8 | 8 | 4 |
Phenotypic traits (PT) were simulated using previously established means and heritability (h2) values of 30%, 50%, and 80%. Each trait was controlled by alleles at 40 loci, chosen among 1,000 markers in 10 LGs, with differential additive effects. Entries give the number of control loci per LG; a dash indicates none.
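The relation between heritability and environmental noise in such a simulation can be illustrated with a minimal sketch. It is not the authors' simulation procedure: it assumes independent loci (linkage is ignored), a 1:2:1 F2 genotype ratio, and hypothetical additive effects and trait mean. The environmental variance is set so that Var(G) / (Var(G) + Var(E)) matches the target h2.

```python
import numpy as np

def simulate_trait(genotypes, effects, h2, mean, rng):
    """Return phenotypes = mean + centred genetic value + environmental noise.

    The noise variance is chosen so that Var(G) / (Var(G) + Var(E)) equals
    h2 in expectation.
    """
    g = genotypes @ effects
    g = g - g.mean()
    var_g = g.var()
    var_e = var_g * (1.0 - h2) / h2
    e = rng.normal(0.0, np.sqrt(var_e), size=g.shape)
    return mean + g + e

rng = np.random.default_rng(0)
n_ind, n_loci = 500, 40

# F2 genotypes coded as the number of favourable alleles (0, 1, 2), 1:2:1 ratio.
# Assumption: loci are sampled independently, so linkage is ignored here.
genotypes = rng.choice([0, 1, 2], size=(n_ind, n_loci), p=[0.25, 0.5, 0.25])

# Hypothetical additive effects and trait mean, chosen only for illustration.
effects = rng.uniform(0.5, 1.5, size=n_loci)
pheno = {h2: simulate_trait(genotypes, effects, h2, 100.0, rng)
         for h2 in (0.3, 0.5, 0.8)}

for h2, y in pheno.items():
    realized = (genotypes @ effects).var() / y.var()
    print(f"target h2 = {h2:.1f}, realized h2 = {realized:.2f}")
```

With 500 individuals the realized ratio fluctuates around the target because the sampled noise is finite, which is also why traits sharing the same loci can still be told apart by their heritability.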
Location of markers and control loci of traits.
| Linkage group | Markers | Control loci |
|---|---|---|
| 1 | 1–100 | 10 20 30 40 50 60 70 80 |
| 2 | 101–200 | 110 120 130 140 150 160 170 180 |
| 3 | 201–300 | 210 220 230 240 250 260 270 280 |
| 4 | 301–400 | 310 320 330 340 350 360 370 380 |
| 5 | 401–500 | 410 420 430 440 450 460 470 480 |
| 6 | 501–600 | 510 520 530 540 550 560 570 580 |
| 7 | 601–700 | 610 620 630 640 650 660 670 680 |
| 8 | 701–800 | 710 720 730 740 750 760 770 780 |
| 9 | 801–900 | 810 820 830 840 850 860 870 880 |
| 10 | 901–1000 | 910 920 930 940 950 960 970 980 |
Eight control loci were used for each linkage group.
Maximum estimate of the coefficient of determination (R2) for all methodologies using the explanatory traits for phenotypic trait 1.
| Scenario | CI: MLP (NN) | CI: RBF | ML: DT | ML: RF | ML: BA | ML: BO | MR: Stepwise |
|---|---|---|---|---|---|---|---|
| 1 | 83.02 (30) | 54.42 | 51.00 | 94.40 | 94.64 | 82.12 | 41.03 |
| 2 | 77.89 (29) | 48.51 | 49.24 | 93.82 | 93.83 | 79.74 | 33.88 |
| 3 | 75.49 (29) | 44.04 | 43.66 | 93.99 | 93.89 | 79.86 | 34.82 |
| 4 | 82.14 (25) | 47.06 | 45.75 | 93.49 | 93.32 | 80.01 | 38.16 |
CI: computational intelligence, ML: machine learning, MR: multiple regression, MLP: multilayer perceptron, RBF: radial basis function, DT: decision tree, RF: random forest, BA: bagging, BO: boosting. NN: number of neurons in hidden layer.
Pearson’s correlation coefficients between phenotypic trait (PT) 1 and other traits in the four scenarios.
| PT | Scenario 1 | Scenario 2 | Scenario 3 | Scenario 4 |
|---|---|---|---|---|
| 2 | 0.08ns | 0.10* | -0.08ns | -0.05ns |
| 3 | 0.07ns | 0.38** | 0.34** | 0.27** |
| 4 | 0.43** | 0.30** | 0.38** | 0.47** |
| 5 | 0.47** | 0.43** | 0.39** | 0.47** |
| 6 | 0.14** | 0.05ns | 0.05ns | 0.04ns |
| 7 | 0.25** | 0.19** | 0.26** | 0.14** |
| 8 | 0.12** | 0.24** | 0.15** | 0.23** |
| 9 | 0.03ns | -0.13** | 0.06ns | 0.21** |
| 10 | 0.03ns | -0.04ns | 0.03ns | 0.12** |
| 11 | 0.04ns | 0.03ns | 0.02ns | 0.08ns |
Significant at **1% and *5% probability of error by t-test. ns: non-significant. Scenario 1 represents the first four control loci, scenario 2 represents the last four control loci, scenario 3 represents the first two and last two control loci, and scenario 4 represents the central control loci (excluding the first and last two loci).
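The significance labels can be checked from the correlation alone with the usual statistic t = r·sqrt((n − 2) / (1 − r²)). The sketch below assumes n = 500 individuals (the population size given in the abstract) and replaces the exact t distribution with a normal approximation, which is close at 498 degrees of freedom; the function name is hypothetical.

```python
import math
from statistics import NormalDist

def r_significance(r, n):
    """Return (t, label) for H0: rho = 0, with t = r * sqrt((n - 2) / (1 - r^2))."""
    t = r * math.sqrt((n - 2) / (1.0 - r * r))
    # Two-sided p-value from the normal approximation to Student's t
    # (an exact test would use the t distribution with n - 2 degrees of freedom).
    p = 2.0 * (1.0 - NormalDist().cdf(abs(t)))
    label = "**" if p < 0.01 else "*" if p < 0.05 else "ns"
    return t, label

for r in (0.08, 0.10, 0.12):
    t, label = r_significance(r, 500)
    print(f"r = {r:.2f}  t = {t:.2f}  {label}")
```

Under these assumptions, r = 0.08, 0.10, and 0.12 are labelled ns, *, and ** respectively, in line with the corresponding entries in the table. Because the tabulated correlations are rounded to two decimals, values very close to a threshold may not reproduce exactly.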
Fig 1. Percentage gain of indirect selection between phenotypic trait (PT) 1 and the other traits in the four scenarios.
| PT | Scenario 1 | Scenario 2 | Scenario 3 | Scenario 4 |
|---|---|---|---|---|
| 2 | 0.06 | 1.32 | | |
| 3 | 0.05 | 3.94+ | 0.25 | 1.97 |
| 4 | 0.31+ | 2.12+ | 0.28+ | 3.46+ |
| 5 | 0.34+ | 2.73+ | 0.29+ | 3.12+ |
| 6 | 0.1 | -0.32 | 0.04 | -0.27 |
| 7 | 0.18 | 1.05 | 0.19 | 1.18 |
| 8 | 0.08 | 1.63 | 0.11 | 1.26 |
| 9 | 0.02 | -1.4 | 0.04 | 2.51 |
| 10 | 0.02 | -0.42 | 0.02 | 0.76 |
| 11 | 0.03 | 0.06 | 0.02 | 0.32 |
+: minor importance in PT1 prediction. Scenario 1 represents the first four control loci, scenario 2 represents the last four control loci, scenario 3 represents the first two and last two control loci, and scenario 4 represents the central control loci (excluding the first and last two loci).
Estimation of the coefficient of determination (R2) for the prediction of phenotypic trait 1 (PT1) using multilayer perceptron (MLP).
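| PT | Zero (scenario 1) | Zero (scenario 2) | Zero (scenario 3) | Zero (scenario 4) | Permutation (scenario 1) | Permutation (scenario 2) | Permutation (scenario 3) | Permutation (scenario 4) |
|---|---|---|---|---|---|---|---|---|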
| 2 | 34.13 | 16.28 | 17.71 | 6.69 | 45.54 | 36.33 | 43.89* | 44.14* |
| 3 | 11.33 | 0.81 | 11.23 | 1.47 | 36.14 | 29.23 | 38.78 | 35.15 |
| 4 | 10.32 | 3.62 | 2.72 | 0.13 | 19.45 | 32.84 | 41.32 | 12.03+ |
| 5 | 11.93 | 1.22+ | 3.30 | 0.76 | 17.83+ | 18.72+ | 19.09+ | 16.29 |
| 6 | 14.52 | 17.54* | 14.86 | 0.03+ | 50.35* | 31.42 | 33.34 | 21.11 |
| 7 | 8.34 | 3.80 | 23.53* | 1.87 | 42.83 | 37.08 | 37.47 | 24.82 |
| 8 | 0.05+ | 10.99 | 1.41+ | 0.19 | 22.30 | 39.62 | 32.28 | 29.26 |
| 9 | 22.52 | 24 | 13.1 | 15.44* | 31.26 | 56.68* | 38.01 | 41.98 |
| 10 | 36.77* | 1.83 | 7.11 | 14.21 | 46.71 | 30.00 | 30.96 | 34.12 |
| 11 | 24.94 | 7.99 | 7.24 | 0.06 | 34.48 | 32.46 | 38.42 | 21.34 |
Auxiliary traits of +major and *minor importance in the prediction of PT1. Scenario 1 represents the first four control loci, scenario 2 represents the last four control loci, scenario 3 represents the first two and last two control loci, and scenario 4 represents the central control loci (excluding the first and last two loci).
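The "Zero" and "Permutation" columns report R2 after one input trait has been perturbed, so a lower R2 points to a more important trait. The general idea can be sketched as follows. This is an illustration only: the toy data and the ordinary least squares stand-in model are assumptions, not the MLP or the data of the study, and the helper names are hypothetical.

```python
import numpy as np

def r2_score(y, y_hat):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def perturbed_r2(predict, X, y, rng):
    """R2 after zeroing or permuting each input column, one column at a time."""
    zero, perm = [], []
    for j in range(X.shape[1]):
        Xz = X.copy()
        Xz[:, j] = 0.0                        # remove the information in input j
        zero.append(r2_score(y, predict(Xz)))
        Xp = X.copy()
        Xp[:, j] = rng.permutation(Xp[:, j])  # break the link between input j and y
        perm.append(r2_score(y, predict(Xp)))
    return np.array(zero), np.array(perm)

# Toy data: input 0 matters most, input 1 a little, input 2 not at all.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3))
y = 2.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=500)

# Stand-in model: ordinary least squares with an intercept (NOT an MLP).
A = np.column_stack([np.ones(len(X)), X])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)

def predict(Z):
    return np.column_stack([np.ones(len(Z)), Z]) @ beta

zero_r2, perm_r2 = perturbed_r2(predict, X, y, rng)
print("R2 after zeroing:   ", np.round(zero_r2, 3))
print("R2 after permuting: ", np.round(perm_r2, 3))
```

The model is fitted once and only the inputs are perturbed at prediction time, so the procedure applies to any fitted predictor, including a neural network.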
Percentage of relative contribution, estimated using Garson's algorithm (1991) as modified by Goh (1995), for 10 phenotypic traits (PTs) relative to PT1 in the four scenarios.
| PT | Scenario 1 | Scenario 2 | Scenario 3 | Scenario 4 |
|---|---|---|---|---|
| 2 | 6.12* | 5.24* | 8.28* | 8.77* |
| 3 | 9.26 | 8.65 | 10.63 | 9.27 |
| 4 | 7.89 | 11.13 | 9.47 | 10.26 |
| 5 | 13.11+ | 12.04+ | 11.41+ | 12.73+ |
| 6 | 9.47 | 10.16 | 9.00 | 9.11 |
| 7 | 9.87 | 10.49 | 8.98 | 9.79 |
| 8 | 11.14 | 11.96 | 10.64 | 10.16 |
| 9 | 10.81 | 9.85 | 11.03 | 10.35 |
| 10 | 10.77 | 9.96 | 10.43 | 11.33 |
| 11 | 11.55 | 10.53 | 11.12 | 11.21 |
Auxiliary traits of +major and *minor importance in the prediction of PT1. Scenario 1 represents the first four control loci, scenario 2 represents the last four control loci, scenario 3 represents the first two and last two control loci, and scenario 4 represents the central control loci (excluding the first and last two loci).
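Garson-type weight partitioning derives relative contributions from the connection weights of a trained single-hidden-layer network. The sketch below follows one commonly cited form: each input's share of the absolute weights into a hidden neuron is scaled by that neuron's absolute output weight, and the totals are normalized to 100%. The exact variant used in the study is not stated here, so the function name, argument layout, and the omission of bias terms are assumptions.

```python
import numpy as np

def garson_importance(w_ih, w_ho):
    """Relative contribution (%) of each input in a single-hidden-layer network.

    w_ih: array of shape (n_inputs, n_hidden), input-to-hidden weights.
    w_ho: array of shape (n_hidden,), hidden-to-output weights.
    Bias terms are ignored in this sketch.
    """
    w_ih = np.abs(np.asarray(w_ih, dtype=float))
    w_ho = np.abs(np.asarray(w_ho, dtype=float))
    # Share of each input within every hidden neuron (each column sums to 1).
    share = w_ih / w_ih.sum(axis=0, keepdims=True)
    # Scale each hidden neuron's shares by its output connection strength.
    contrib = (share * w_ho).sum(axis=1)
    return 100.0 * contrib / contrib.sum()

# Hypothetical weights: two inputs, two hidden neurons, one output.
w_ih = [[1.0, 2.0],
        [3.0, 2.0]]
w_ho = [1.0, 1.0]
print(garson_importance(w_ih, w_ho))  # [37.5 62.5]
```

Because the contributions are normalized, they sum to 100% across inputs, which matches the column totals of the table. Absolute values discard the sign of each connection, so the result describes the magnitude of a trait's influence and not its direction.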
Estimation of the coefficient of determination (R2) for the prediction of phenotypic trait 1 (PT1) using the radial basis function (RBF).
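| PT | Zero (scenario 1) | Zero (scenario 2) | Zero (scenario 3) | Zero (scenario 4) | Permutation (scenario 1) | Permutation (scenario 2) | Permutation (scenario 3) | Permutation (scenario 4) |
|---|---|---|---|---|---|---|---|---|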
| 2 | 37.15* | 9.38 | 27.92 | 37.99* | 47.96 | 32.27 | 33.27 | 37.82 |
| 3 | 12.63 | 23.66 | 15.44 | 25.94 | 27.35 | 37.66 | 24.24 | 37.01 |
| 4 | 19.39 | 23.39 | 21.65 | 18.91 | 24.96 | 24.40 | 36.66 | 32.15 |
| 5 | 8.40 | 8.60+ | 9.73+ | 19.91 | 9.31+ | 16.19+ | 14.46+ | 16.86+ |
| 6 | 32.37 | 20.21 | 15.79 | 30.83 | 44.45 | 29.01 | 33.78 | 38.95 |
| 7 | 28.17 | 19.46 | 33.69 | 19.94 | 44.39 | 40.89* | 31.02 | 33.63 |
| 8 | 29.26 | 13.03 | 11.02 | 9.79+ | 43.74 | 38.73 | 33.11 | 39.69 |
| 9 | 36.10 | 36.20* | 22.62 | 28.53 | 33.47 | 36.65 | 39.25 | 28.79 |
| 10 | 39.41 | 26.63 | 32.27 | 19.12 | 50.79* | 39.77 | 39.40* | 40.38* |
| 11 | 2.01+ | 30.15 | 37.32* | 25.23 | 29.40 | 40.14 | 34.84 | 30.49 |
Auxiliary traits of +major and *minor importance in the prediction of PT1. Scenario 1 represents the first four control loci, scenario 2 represents the last four control loci, scenario 3 represents the first two and last two control loci, and scenario 4 represents the central control loci (excluding the first and last two loci).
Average estimate of the relative contribution of explanatory phenotypic traits (PTs) to the prediction of PT1 using machine learning in the four scenarios.
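| PT | Bagging (scenario 1) | Bagging (scenario 2) | Bagging (scenario 3) | Bagging (scenario 4) | Random forest (scenario 1) | Random forest (scenario 2) | Random forest (scenario 3) | Random forest (scenario 4) | Boosting (scenario 1) | Boosting (scenario 2) | Boosting (scenario 3) | Boosting (scenario 4) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|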
| 2 | 8.35* | 10.63 | 16.5 | 13.14 | 7.32 | 13.79 | 15.71 | 15.94 | 1.08* | 3.20 | 11.18 | 11.42 |
| 3 | 20.50 | 25.36 | 18.23 | 12.70 | 22.51 | 25.27 | 18.39 | 13.81 | 11.92 | 19.22 | 16.58 | 9.88 |
| 4 | 29.58 | 14.72 | 19.22 | 23.92 | 28.7 | 14.69 | 20.33 | 24.85 | 26.69 | 6.73 | 13.04 | 27.57 |
| 5 | 34.13+ | 40.45+ | 24.8+ | 29.36+ | 33.9+ | 34.78+ | 25.56+ | 26.98+ | 33.82+ | 31.31+ | 23.28+ | 24.67+ |
| 6 | 10.67 | 16.48 | 7.18 | 9.17 | 9.98 | 16.4 | 7.82* | 10.05 | 3.23 | 8.89 | 5.85 | 5.94 |
| 7 | 12.22 | 9.45* | 12.42 | 7.75 | 13.65 | 8.51* | 13.46 | 8.38 | 3.89 | 2.39* | 9.35 | 1.69 |
| 8 | 12.28 | 11.99 | 9.34 | 5.41 | 12.96 | 11.89 | 9.31 | 6.03 | 3.64 | 3.86 | 3.88 | 1.68* |
| 9 | 13.00 | 14.24 | 7.55* | 11.57 | 13.63 | 14.99 | 8.59 | 14.11 | 4.71 | 7.70 | 1.82* | 6.82 |
| 10 | 4.03 | 15.59 | 11.64 | 4.00* | 3.82* | 15.92 | 13.05 | 5.35* | 3.20 | 10.68 | 4.96 | 2.62 |
| 11 | 15.43 | 11.61 | 18.6 | 14.48 | 15.9 | 12.62 | 15.03 | 16.39 | 7.82 | 6.04 | 10.07 | 7.70 |
Auxiliary traits of +major and *minor importance in the prediction of PT1. Scenario 1 represents the first four control loci, scenario 2 represents the last four control loci, scenario 3 represents the first two and last two control loci, and scenario 4 represents the central control loci (excluding the first and last two loci).