Xue Wang1, Shaolei Shi1, Guijiang Wang2, Wenxue Luo2, Xia Wei3, Ao Qiu1, Fei Luo2, Xiangdong Ding4.
Abstract
BACKGROUND: Recently, machine learning (ML) has become attractive in genomic prediction, but its superiority over conventional (ss)GBLUP methods and the choice of optimal ML methods need to be investigated.
Keywords: Genomic prediction; Machine learning; Pig; Prediction accuracy
Year: 2022 PMID: 35578371 PMCID: PMC9112588 DOI: 10.1186/s40104-022-00708-0
Source DB: PubMed Journal: J Anim Sci Biotechnol ISSN: 1674-9782
Summary of two reproduction traits of Yorkshire pigs
| Trait^a | Number of records | Birth year | Genotyped animals | Mean | SD | Minimum | Maximum | σ²a | σ²e | h² (SE) |
|---|---|---|---|---|---|---|---|---|---|---|
| TNB | 4274 | 2016–2020 | 2566 | 13 | 3.38 | 3 | 24 | 1.26 | 8.95 | 0.12 (0.034) |
| NBA | 4274 | 2016–2020 | 2566 | 12 | 3.13 | 3 | 24 | 0.98 | 7.13 | 0.12 (0.032) |
a TNB: total number of piglets born; NBA: number of piglets born alive
SE: standard error
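The reported heritabilities can be checked against the variance components in the table. The sketch below assumes that σ²a is the additive genetic variance, σ²e the residual variance, and that the phenotypic variance is the sum of just these two components; the table lists no other component, so this is an assumption rather than the authors' stated model. The function name is hypothetical.

```python
def heritability(var_additive: float, var_residual: float) -> float:
    """Heritability as the additive share of (additive + residual) variance."""
    return var_additive / (var_additive + var_residual)


# Variance components copied from the summary table: (σ²a, σ²e).
variance_components = {"TNB": (1.26, 8.95), "NBA": (0.98, 7.13)}

h2 = {trait: heritability(va, ve) for trait, (va, ve) in variance_components.items()}

for trait, value in h2.items():
    print(f"{trait}: h2 = {value:.2f}")
```

Under that assumption both traits come out close to the tabulated value of 0.12.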
The optimal hyperparameters of each ML model obtained through a grid search for TNB and NBA traits in 20 replicates of 5-fold CV
| Method | Optimal hyperparameters^a |
|---|---|
| SVR | kernel = ‘rbf’, C = 7, gamma = 0.0001 |
| KRR | kernel = ‘rbf’, |
| RF | n_estimators = 250, max_depth = None |
| Adaboost.R2_SVR | n_estimators = 50, kernel = ‘rbf’, C = 7, gamma = 0.0001 |
| Adaboost.R2_KRR | n_estimators = 50, kernel = ‘rbf’, |
a Optimal hyperparameters: The optimal hyperparameters of each machine learning method obtained by using a grid search
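A grid search of this kind could be sketched as follows. The parameter names in the table (`kernel`, `C`, `gamma`, `n_estimators`, `max_depth`) match scikit-learn estimators, but the record does not name the library, so using scikit-learn is an assumption. The genotypes and phenotypes are synthetic stand-ins, the candidate grid is hypothetical (it merely contains the reported SVR optimum), and scoring by mean squared error is also an assumption.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, RepeatedKFold
from sklearn.svm import SVR

rng = np.random.default_rng(0)

# Synthetic stand-in data: 60 animals x 40 SNPs coded 0/1/2 (not the study's data).
X = rng.integers(0, 3, size=(60, 40)).astype(float)
y = X[:, :5].sum(axis=1) + rng.normal(scale=2.0, size=60)

# Hypothetical candidate grid; it includes the reported optimum C = 7, gamma = 0.0001.
param_grid = {"C": [1, 7], "gamma": [0.0001, 0.001]}

# 20 replicates of 5-fold cross-validation, as in the table caption.
cv = RepeatedKFold(n_splits=5, n_repeats=20, random_state=0)

search = GridSearchCV(
    SVR(kernel="rbf"),
    param_grid,
    cv=cv,
    scoring="neg_mean_squared_error",  # assumed criterion
)
search.fit(X, y)

best = search.best_params_
print(best)
```

On real data the grid would be wider and the selected values would depend on the trait and marker panel.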
Fig. 1 Imputation accuracy. Imputation accuracy of GenoBaits Porcine SNP 50 K to PorcineSNP50 BeadChip at different minor allele frequency (MAF) intervals (a) and chromosomes (b). DR2, the estimated squared correlation between the estimated allele dose and the true allele dose; Genotype concordance rate (CR), the ratio of correctly imputed genotypes; Genotype correlation (COR), the correlation coefficient between the imputed variants and the true variants
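Two of the measures named in the caption, CR and COR, can be illustrated with a small sketch. It assumes genotypes coded as 0/1/2 allele counts and uses made-up toy values; the function names are hypothetical, and DR2 is left out because it is reported by the imputation software rather than computed from true genotypes.

```python
from math import sqrt


def concordance_rate(true_gt, imputed_gt):
    """Share of genotypes that were imputed exactly right."""
    matches = sum(t == i for t, i in zip(true_gt, imputed_gt))
    return matches / len(true_gt)


def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Toy genotypes at one SNP for eight animals (illustrative only).
true_gt = [0, 1, 2, 1, 0, 2, 1, 1]
imputed_gt = [0, 1, 2, 1, 1, 2, 1, 0]

cr = concordance_rate(true_gt, imputed_gt)
cor = pearson(true_gt, imputed_gt)
print(f"CR = {cr:.2f}, COR = {cor:.2f}")
```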
Accuracies and unbiasedness of genomic prediction on TNB and NBA from seven methods in 20 replicates of 5-fold CV
| Hyperparameters | Method | TNB^1 Accuracy^3 | TNB^1 Unbiasedness^4 | NBA^2 Accuracy^3 | NBA^2 Unbiasedness^4 |
|---|---|---|---|---|---|
| – | GBLUP | 0.248^a ± 0.026 | 0.958 ± 0.132 | 0.208^a ± 0.025 | 0.931 ± 0.142 |
| – | ssGBLUP | 0.251^a ± 0.026 | 0.901 ± 0.121 | 0.221^ab ± 0.026 | 0.844 ± 0.113 |
| – | BayesHE | 0.243^a ± 0.025 | 1.015 ± 0.148 | 0.207^a ± 0.026 | 1.009 ± 0.171 |
| Tuning | SVR | 0.295^b ± 0.025 | 1.23 ± 0.119 | 0.254^b ± 0.023 | 1.106 ± 0.11 |
| Tuning | KRR | 0.295^b ± 0.025 | 1.266 ± 0.125 | 0.256^b ± 0.023 | 1.151 ± 0.113 |
| Tuning | RF | 0.270^ab ± 0.029 | 1.229 ± 0.152 | 0.248^ab ± 0.028 | 1.188 ± 0.147 |
| Tuning | Adaboost.R2_SVR | 0.293^b ± 0.025 | 1.363 ± 0.138 | 0.254^b ± 0.024 | 1.256 ± 0.131 |
| Tuning | Adaboost.R2_KRR | 0.292^b ± 0.025 | 1.344 ± 0.136 | 0.258^b ± 0.024 | 1.249 ± 0.129 |
| Default | SVR | 0.255 ± 0.027 | 1.275 ± 0.147 | 0.224 ± 0.023 | 1.098 ± 0.126 |
| Default | KRR | 0.264 ± 0.025 | 1.007 ± 0.108 | 0.222 ± 0.024 | 0.879 ± 0.101 |
| Default | RF | 0.246 ± 0.028 | 1.064 ± 0.142 | 0.225 ± 0.027 | 1.002 ± 0.128 |
| Default | Adaboost.R2_SVR | 0.273 ± 0.024 | 0.998 ± 0.106 | 0.228 ± 0.026 | 0.822 ± 0.099 |
| Default | Adaboost.R2_KRR | 0.254 ± 0.024 | 0.759 ± 0.085 | 0.209 ± 0.027 | 0.636 ± 0.085 |
1 TNB: total number of piglets born
2 NBA: number of piglets born alive
3 Accuracy: the correlation between corrected phenotypes and predicted values of the validation population
4 Unbiasedness: the regression of corrected phenotypes onto the predicted values
Different superscripts on accuracy values indicate significant differences according to the Hotelling-Williams test
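The two footnoted definitions translate directly into a correlation and a regression slope. The sketch below uses made-up corrected phenotypes and predictions for five validation animals; the function name is hypothetical. A slope near 1 is conventionally read as unbiased prediction, while a slope above 1 suggests the predictions vary less than the corrected phenotypes do.

```python
from math import sqrt


def accuracy_and_unbiasedness(corrected, predicted):
    """Return (correlation, slope of corrected phenotypes regressed on predictions)."""
    n = len(corrected)
    mean_c = sum(corrected) / n
    mean_p = sum(predicted) / n
    cov = sum((c - mean_c) * (p - mean_p) for c, p in zip(corrected, predicted))
    var_c = sum((c - mean_c) ** 2 for c in corrected)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    accuracy = cov / sqrt(var_c * var_p)
    slope = cov / var_p  # regression of corrected phenotype on predicted value
    return accuracy, slope


# Toy validation set (illustrative values, not from the study).
corrected = [1.0, -0.5, 2.0, 0.0, -1.5]
predicted = [0.4, -0.2, 0.9, 0.1, -0.7]

acc, slope = accuracy_and_unbiasedness(corrected, predicted)
print(f"accuracy = {acc:.3f}, unbiasedness = {slope:.3f}")
```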
Mean squared error (MSE) and mean absolute error (MAE) of seven methods for TNB and NBA as assessed with 20 replicates of 5-fold CV
| Hyperparameters | Method | TNB MSE | TNB MAE | NBA MSE | NBA MAE |
|---|---|---|---|---|---|
| – | GBLUP | 5.259 | 1.749 | 4.168 | 1.606 |
| – | ssGBLUP | 5.26 | 1.748 | 3.95 | 1.532 |
| – | BayesHE | 5.32 | 1.763 | 4.023 | 1.556 |
| Tuning | SVR | 5.129 | 1.730 | 3.880 | 1.521 |
| Tuning | KRR | 5.134 | 1.731 | 3.876 | 1.521 |
| Tuning | RF | 5.212 | 1.747 | 3.901 | 1.527 |
| Tuning | Adaboost.R2_SVR | 5.158 | 1.739 | 3.892 | 1.528 |
| Tuning | Adaboost.R2_KRR | 5.153 | 1.737 | 3.883 | 1.526 |
| Default | SVR | 5.271 | 1.748 | 3.956 | 1.522 |
| Default | KRR | 5.21 | 1.743 | 3.944 | 1.531 |
| Default | RF | 5.266 | 1.756 | 3.93 | 1.531 |
| Default | Adaboost.R2_SVR | 5.202 | 1.75 | 3.95 | 1.541 |
| Default | Adaboost.R2_KRR | 5.309 | 1.771 | 4.04 | 1.566 |
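For reference, the two error measures in this table can be computed as in the following minimal sketch. The litter sizes and predictions are made up, and the function names are hypothetical.

```python
def mean_squared_error(observed, predicted):
    """Average of squared prediction errors."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


def mean_absolute_error(observed, predicted):
    """Average of absolute prediction errors."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


# Toy litter sizes and predictions (illustrative only).
observed = [12.0, 15.0, 9.0, 14.0]
predicted = [13.0, 13.0, 10.0, 14.0]

mse = mean_squared_error(observed, predicted)
mae = mean_absolute_error(observed, predicted)
print(f"MSE = {mse}, MAE = {mae}")
```

MSE penalizes large errors more heavily than MAE, which is why the two columns need not rank the methods identically.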
Accuracy and mean squared error (MSE) of genomic prediction of TNB and NBA in younger individuals from seven methods
| Hyperparameters | Method | TNB^1 Accuracy^3 | TNB^1 MSE | TNB^1 Optimal hyperparameters^4 | NBA^2 Accuracy^3 | NBA^2 MSE | NBA^2 Optimal hyperparameters^4 |
|---|---|---|---|---|---|---|---|
| – | GBLUP | 0.355^ab | 11.598 | – | 0.264^ab | 10.203 | – |
| – | ssGBLUP | 0.408^b | 11.221 | – | 0.288^ab | 9.974 | – |
| – | BayesHE | 0.357^ab | 11.566 | – | 0.262^ab | 10.143 | – |
| Tuning | SVR | 0.307^a | 11.488 | kernel = ‘rbf’; gamma = 0.00005; C = 14 | 0.229^a | 10.235 | kernel = ‘rbf’; gamma = 0.00005; C = 13 |
| Tuning | KRR | 0.362^ab | 11.367 | kernel = ‘rbf’; gamma = 0.000001; λ = 0.07 | 0.266^ab | 10.121 | kernel = ‘rbf’; gamma = 0.000001; λ = 0.12 |
| Tuning | RF | 0.385^ab | 11.337 | n_estimators = 430; max_depth = None | 0.285^ab | 10.116 | n_estimators = 400; max_depth = None |
| Tuning | Adaboost.R2_KRR | 0.395^b | 11.254 | n_estimators = 70; kernel = ‘rbf’; gamma = 0.00001; λ = 1 | 0.328^b | 9.794 | n_estimators = 60; kernel = ‘rbf’; gamma = 0.00001; λ = 0.9 |
| Default | SVR | 0.271 | 11.858 | – | 0.17 | 10.37 | – |
| Default | KRR | 0.346 | 11.538 | – | 0.259 | 10.158 | – |
| Default | RF | 0.26 | 11.867 | – | 0.179 | 10.335 | – |
| Default | Adaboost.R2_KRR | 0.36 | 11.392 | – | 0.322 | 9.797 | – |
1 TNB: total number of piglets born
2 NBA: number of piglets born alive
3 Accuracy: the correlation between corrected phenotypes and predicted values of the validation population
4 Optimal hyperparameters: the optimal hyperparameters of each machine learning method obtained by using a grid search
Different superscripts on accuracy values indicate significant differences according to the Hotelling-Williams test
Average computing time to complete each fold of 5-fold CV according to different genomic prediction methods
| Method | TNB | NBA |
|---|---|---|
| GBLUP | 2 min 6 s | 2 min 2 s |
| ssGBLUP | 3 min 12 s | 3 min 16 s |
| BayesHE | 3 h 57 min 1 s | 3 h 35 min 13 s |
| SVR | 5 min 27 s | 5 min 7 s |
| KRR | 1 min 4 s | 1 min 16 s |
| RF | 50 min 38 s | 56 min 16 s |
| Adaboost.R2_SVR | 1 h 35 min 13 s | 1 h 15 min 28 s |
| Adaboost.R2_KRR | 5 min 3 s | 5 min 16 s |
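To compare these timings on one scale, the duration strings can be converted to seconds. The sketch below does so for the TNB column; the parsing helper `to_seconds` is hypothetical, and the ratios are simple arithmetic on the tabulated values rather than new measurements.

```python
import re

UNIT_SECONDS = {"h": 3600, "min": 60, "s": 1}


def to_seconds(text: str) -> int:
    """Convert a string such as '3 h 57 min 1 s' to seconds."""
    parts = re.findall(r"(\d+)\s*(h|min|s)\b", text)
    return sum(int(value) * UNIT_SECONDS[unit] for value, unit in parts)


# TNB column of the computing-time table.
tnb_times = {
    "GBLUP": "2 min 6 s",
    "ssGBLUP": "3 min 12 s",
    "BayesHE": "3 h 57 min 1 s",
    "SVR": "5 min 27 s",
    "KRR": "1 min 4 s",
    "RF": "50 min 38 s",
    "Adaboost.R2_SVR": "1 h 35 min 13 s",
    "Adaboost.R2_KRR": "5 min 3 s",
}

seconds = {method: to_seconds(t) for method, t in tnb_times.items()}
baseline = seconds["GBLUP"]

# Print methods from fastest to slowest with their time relative to GBLUP.
for method, secs in sorted(seconds.items(), key=lambda item: item[1]):
    print(f"{method}: {secs} s ({secs / baseline:.1f}x GBLUP)")
```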