Karansher S Sandhu, Shruti Sunil Patil, Meriem Aoun, Arron H Carter.
Abstract
Soft white wheat is a wheat class used in foreign and domestic markets to make various end products requiring specific quality attributes. Because of the associated cost, time, and amount of seed needed, phenotyping for end-use quality traits is delayed until later generations. Previously, we explored the potential of using genomic selection (GS) to select superior genotypes earlier in the breeding program. Breeders typically measure multiple traits across various locations, which opens an avenue for exploring multi-trait GS models. The main objective of this study was to explore the potential of multi-trait GS models for predicting seven end-use quality traits using cross-validation, independent prediction, and across-location prediction in a wheat breeding program. The population consisted of 666 soft white wheat genotypes planted for 5 years at two locations in Washington, United States. We optimized and compared the performance of four uni-trait and multi-trait GS models, namely, Bayes B, genomic best linear unbiased prediction (GBLUP), multilayer perceptron (MLP), and random forests. Prediction accuracies for multi-trait GS models were 5.5% and 7.9% higher than those of uni-trait models for the within-environment and across-location predictions, respectively. Multi-trait machine and deep learning models outperformed GBLUP and Bayes B for across-location predictions, but their advantage diminished when the genotype-by-environment component was included in the model. The largest improvement in prediction accuracy, 35%, was obtained for flour protein content with the multi-trait MLP model. This study showed the potential of multi-trait GS models to enhance prediction accuracy by using information from previously phenotyped traits, which would help shorten the breeding cycle in a cost-effective manner.
Keywords: end-use quality; genomic prediction; heritability; machine learning; multi-trait; secondary traits; wheat
Year: 2022 PMID: 35173770 PMCID: PMC8841657 DOI: 10.3389/fgene.2022.831020
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
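The cross-validation scheme mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: ridge regression on marker scores stands in for GBLUP (the two are closely related under standard assumptions), prediction accuracy is assumed to be the Pearson correlation between predicted and observed phenotypes, and the marker matrix, effect sizes, fold count, and penalty `lam` are all synthetic or hypothetical.

```python
import numpy as np

def ridge_fit_predict(X_train, y_train, X_test, lam):
    """Fit ridge regression on marker scores and predict test phenotypes."""
    x_mean = X_train.mean(axis=0)
    y_mean = y_train.mean()
    Xc = X_train - x_mean
    # Dual form: solve an n x n system, cheaper when markers outnumber genotypes.
    K = Xc @ Xc.T
    alpha = np.linalg.solve(K + lam * np.eye(K.shape[0]), y_train - y_mean)
    beta = Xc.T @ alpha  # marker effects
    return (X_test - x_mean) @ beta + y_mean

def cv_accuracy(X, y, k=5, lam=1.0, seed=0):
    """Mean Pearson correlation between predictions and observations over k folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    accuracies = []
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        pred = ridge_fit_predict(X[train_idx], y[train_idx], X[test_idx], lam)
        accuracies.append(np.corrcoef(pred, y[test_idx])[0, 1])
    return float(np.mean(accuracies))

# Synthetic data (assumption): 200 genotypes, 500 markers coded 0/1/2.
rng = np.random.default_rng(42)
n_genotypes, n_markers = 200, 500
X = rng.integers(0, 3, size=(n_genotypes, n_markers)).astype(float)
effects = rng.normal(0, 0.1, n_markers)
y = X @ effects + rng.normal(0, 1.0, n_genotypes)

accuracy = cv_accuracy(X, y, k=5, lam=100.0)
print(f"five-fold prediction accuracy: {accuracy:.2f}")
```

Each genotype is predicted only by a model that never saw its phenotype, which is what makes the averaged correlation an estimate of accuracy on unphenotyped lines.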
Summary statistics of seven end-use quality traits evaluated from the SWW population.
| Trait | Abbreviation | Mean | Standard error | Heritability | Units |
|---|---|---|---|---|---|
| Grain protein content | GPC | 10.73 | 0.05 | 0.56 | percent |
| Flour protein | FPROT | 8.93 | 0.04 | 0.57 | percent |
| Flour ash | FASH | 0.39 | 0.001 | 0.88 | percent |
| Milling score | MSCOR | 85.6 | 0.10 | 0.81 | unitless |
| Flour yield | FYELD | 69.9 | 0.09 | 0.91 | percent |
| Cookie diameter | CODI | 9.2 | 0.008 | 0.89 | cm |
| Flour SDS sedimentation | FSDS | 10.1 | 0.09 | 0.92 | g/mL |
FIGURE 1. Phenotypic correlation among the seven end-use quality traits evaluated from the SWW population.
FIGURE 2. Genetic correlation among the seven end-use quality traits evaluated from the SWW population.
FIGURE 3. Principal component analysis for the 666 SWW genotypes obtained using 40,518 SNP markers.
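A principal component analysis of a marker matrix, as in Figure 3, can be sketched with a singular value decomposition. This is an illustration only: the two simulated subpopulations, their allele frequencies, the 0/1/2 coding, and the function name `marker_pca` are assumptions, not details taken from the study.

```python
import numpy as np

def marker_pca(M, n_components=2):
    """Return PC scores and the fraction of variance explained per component."""
    Mc = M - M.mean(axis=0)                      # center each marker
    U, S, Vt = np.linalg.svd(Mc, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    explained = S**2 / np.sum(S**2)
    return scores, explained[:n_components]

# Synthetic data (assumption): two subpopulations of 40 genotypes, 300 markers.
rng = np.random.default_rng(0)
freq_a = rng.uniform(0.1, 0.9, 300)
freq_b = np.clip(freq_a + rng.normal(0, 0.2, 300), 0.05, 0.95)
pop_a = rng.binomial(2, freq_a, size=(40, 300))
pop_b = rng.binomial(2, freq_b, size=(40, 300))
M = np.vstack([pop_a, pop_b]).astype(float)

scores, explained = marker_pca(M)
print(scores.shape, explained)
```

Plotting the first column of `scores` against the second gives the usual population-structure scatter plot.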
Hyperparameters optimized for seven end-use quality traits using the uni-trait MLP model.
| Hyperparameter | GPC | FPROT | FASH | MSCOR | FYELD | CODI | FSDS |
|---|---|---|---|---|---|---|---|
| Activation function | relu | relu | tanh | relu | relu | tanh | tanh |
| Epochs | 200 | 200 | 100 | 150 | 150 | 200 | 150 |
| Dropout | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 |
| Learning rate | adaptive | adaptive | constant | adaptive | constant | adaptive | constant |
| No. of hidden layers | 4 | 3 | 4 | 3 | 3 | 4 | 2 |
| No. of neurons | (30, 30, 30, 30) | (24, 24, 24) | (50, 50, 25, 25) | (30, 30, 10) | (90, 90) | (100, 50, 25, 25) | (50, 50) |
| Regularization | 0.1 | 0.1 | 0.05 | 0.02 | 0.05 | 0.1 | 0.001 |
| Solver | Adam | Adam | SGD | L-BFGS | SGD | L-BFGS | SGD |
Hyperparameters optimized for seven end-use quality traits using the multi-trait MLP model.
| Hyperparameter | GPC and FPROT | FPROT and FSDS | FASH and MSCOR | FYELD and MSCOR | CODI and FSDS |
|---|---|---|---|---|---|
| Activation function | relu | relu | relu | relu | relu |
| Epochs | 200 | 200 | 200 | 200 | 200 |
| Dropout | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 |
| Learning rate | adaptive | adaptive | adaptive | adaptive | adaptive |
| No. of hidden layers | 5 | 4 | 5 | 4 | 4 |
| No. of neurons | (90, 90, 90, 90, 90) | (100, 60, 60, 60) | (50, 50, 50, 50) | (30, 15, 15, 10) | (100, 90, 90, 70) |
| Regularization | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 |
| Solver | Adam | Adam | Adam | Adam | Adam |
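To make the multi-trait architecture concrete, the sketch below runs a forward pass through a network shaped like the "GPC and FPROT" column: five hidden layers of 90 relu units, dropout of 0.2, and two output units (one per trait) that share all hidden layers. The weight initialisation, the placement of dropout after every hidden layer, the linear output layer, and the synthetic marker matrix are assumptions. Training (Adam, 200 epochs, regularization of 0.1) is omitted.

```python
import numpy as np

HIDDEN_LAYERS = (90, 90, 90, 90, 90)  # "GPC and FPROT" column of the table
DROPOUT = 0.2
N_TRAITS = 2                          # GPC and FPROT predicted jointly

def init_params(n_inputs, hidden, n_outputs, rng):
    """Random weights and zero biases; He initialisation is an assumption."""
    sizes = (n_inputs, *hidden, n_outputs)
    return [
        (rng.normal(0, np.sqrt(2 / fan_in), (fan_in, fan_out)), np.zeros(fan_out))
        for fan_in, fan_out in zip(sizes[:-1], sizes[1:])
    ]

def forward(X, params, rng=None, dropout=0.0):
    """Forward pass; dropout is applied only when an rng is supplied (training mode)."""
    h = X
    for W, b in params[:-1]:
        h = np.maximum(h @ W + b, 0.0)            # relu activation
        if rng is not None and dropout > 0:
            mask = rng.random(h.shape) >= dropout
            h = h * mask / (1 - dropout)          # inverted dropout keeps the expected scale
    W_out, b_out = params[-1]
    return h @ W_out + b_out                      # one output column per trait

# Synthetic marker matrix (assumption): 8 genotypes, 1,000 markers coded 0/1/2.
rng = np.random.default_rng(1)
X = rng.integers(0, 3, size=(8, 1000)).astype(float)
params = init_params(X.shape[1], HIDDEN_LAYERS, N_TRAITS, rng)

pred = forward(X, params)                               # inference mode
pred_train = forward(X, params, rng=rng, dropout=DROPOUT)  # training mode
print(pred.shape)
```

The only structural difference from a uni-trait network is the width of the output layer, so the hidden layers learn a representation shared by both traits.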
FIGURE 4. Prediction accuracies for seven end-use quality traits using four different uni- and multi-trait genomic selection models for the Pullman location.
FIGURE 5. Prediction accuracies for seven end-use quality traits using four different uni- and multi-trait genomic selection models for the Lind location.
Prediction accuracies for seven end-use quality traits using four different uni- and multi-trait genomic selection models at the two locations, Pullman and Lind, across years using the cross-validation approach.
| Location | Trait | Uni-trait models | | | | Multi-trait models | | | |
|---|---|---|---|---|---|---|---|---|---|
| Pullman | GPC | 0.55 | 0.54 | 0.59 | 0.60 | 0.59 | 0.50 | | 0.72 |
| | FPROT | 0.58 | 0.58 | 0.61 | 0.62 | 0.64 | 0.61 | 0.66 | |
| | FASH | 0.55 | 0.59 | 0.58 | 0.59 | 0.63 | 0.58 | 0.62 | |
| | MSCOR | 0.58 | 0.52 | 0.60 | 0.63 | 0.66 | 0.57 | 0.64 | |
| | FYELD | 0.71 | 0.64 | | 0.75 | 0.68 | 0.65 | 0.75 | 0.73 |
| | CODI | 0.67 | 0.67 | | | 0.64 | 0.61 | 0.67 | 0.64 |
| | FSDS | 0.67 | 0.66 | 0.69 | 0.70 | 0.71 | 0.72 | 0.73 | |
| Lind | GPC | 0.51 | 0.51 | 0.54 | 0.55 | 0.55 | 0.53 | 0.58 | |
| | FPROT | 0.48 | 0.46 | 0.51 | 0.53 | 0.53 | 0.50 | | 0.54 |
| | FASH | 0.51 | 0.44 | 0.54 | 0.56 | 0.59 | 0.40 | 0.62 | |
| | MSCOR | 0.48 | 0.53 | 0.50 | 0.52 | 0.57 | 0.57 | 0.55 | |
| | FYELD | 0.64 | 0.58 | 0.68 | 0.67 | 0.66 | 0.59 | 0.69 | |
| | CODI | 0.56 | 0.54 | 0.57 | 0.58 | 0.55 | 0.54 | 0.58 | |
| | FSDS | 0.59 | 0.59 | 0.62 | 0.63 | 0.64 | 0.62 | | 0.64 |
Highest prediction accuracies are bolded for each trait.
Prediction accuracies for seven end-use quality traits using four different uni- and multi-trait genomic prediction models for the across-location predictions. 2019_Pullman_Lind represents the scenario where predictions were made on 2019_Pullman by training models on the Lind dataset.
| Scenario | Trait | Uni-trait models | | | | Multi-trait models | | | | Multi-trait multi-environment models |
|---|---|---|---|---|---|---|---|---|---|---|
| 2019_Pullman_Lind | GPC | 0.25 | 0.23 | 0.30 | 0.31 | 0.32 | 0.28 | | 0.31 | 0.31 |
| | FPROT | 0.35 | 0.34 | 0.40 | 0.40 | 0.40 | 0.29 | 0.39 | 0.44 | |
| | FASH | 0.40 | 0.41 | 0.41 | 0.41 | 0.42 | | 0.44 | 0.43 | |
| | MSCOR | 0.27 | 0.23 | 0.30 | 0.30 | 0.33 | 0.27 | 0.35 | | 0.36 |
| | FYELD | 0.41 | 0.42 | 0.48 | 0.50 | 0.42 | 0.45 | 0.51 | 0.50 | |
| | CODI | 0.40 | 0.43 | 0.45 | 0.46 | 0.47 | 0.44 | 0.49 | 0.53 | |
| | FSDS | 0.36 | 0.30 | 0.44 | 0.43 | 0.38 | 0.34 | 0.47 | | 0.46 |
| 2019_Lind_Pullman | GPC | 0.27 | 0.29 | 0.30 | 0.28 | 0.31 | 0.33 | 0.37 | 0.36 | |
| | FPROT | 0.34 | 0.37 | 0.42 | 0.42 | 0.37 | 0.39 | 0.42 | | 0.38 |
| | FASH | 0.41 | 0.38 | 0.42 | 0.42 | | 0.46 | 0.44 | 0.45 | 0.47 |
| | MSCOR | 0.28 | 0.28 | 0.29 | 0.31 | 0.31 | 0.28 | 0.31 | | 0.31 |
| | FYELD | 0.43 | 0.42 | 0.47 | 0.50 | 0.47 | 0.43 | 0.52 | 0.51 | |
| | CODI | 0.42 | 0.45 | 0.44 | | 0.43 | 0.44 | 0.41 | 0.46 | |
| | FSDS | 0.38 | 0.35 | 0.41 | 0.40 | 0.42 | 0.39 | | | 0.42 |
Highest prediction accuracies are bolded for each trait.
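The relative advantage of multi-trait over uni-trait models can be worked through on one row. The sketch uses the eight uni- and multi-trait accuracies listed for GPC under 2019_Lind_Pullman. Averaging over the four models before comparing is an assumption made for illustration and is not necessarily how the reported 5.5% and 7.9% figures were derived.

```python
def relative_gain(uni, multi):
    """Percent change of the mean multi-trait accuracy over the mean uni-trait accuracy."""
    uni_mean = sum(uni) / len(uni)
    multi_mean = sum(multi) / len(multi)
    return 100.0 * (multi_mean - uni_mean) / uni_mean

# GPC, 2019_Lind_Pullman: four uni-trait values, then four multi-trait values.
uni_gpc = [0.27, 0.29, 0.30, 0.28]
multi_gpc = [0.31, 0.33, 0.37, 0.36]

gain = relative_gain(uni_gpc, multi_gpc)
print(f"{gain:.1f}%")  # prints 20.2%
```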
FIGURE 6. Prediction accuracies across environment Pullman with training on the Lind dataset for seven end-use quality traits using four different uni- and multi-trait and one Bayesian multi-trait multi-environment genomic prediction models.
FIGURE 7. Prediction accuracies across environment Lind with training on the Pullman dataset for seven end-use quality traits using four different uni- and multi-trait and one Bayesian multi-trait multi-environment genomic prediction models.