Minsu Park, Tae-Hun Kim, Eun-Seok Cho, Heebal Kim, Hee-Seok Oh.
Abstract
This study considers genomic selection (GS) in Yorkshire pigs using adjacent genetic markers, which are typically correlated. GS has been widely used to efficiently estimate target variables, such as molecular breeding values, using markers across the entire genome. Recently, GS has been applied to animals as well as plants, especially to pigs. For efficient selection of variables associated with specific traits in pig breeding, a variable selection method should have the following properties: i) it produces a simple model by identifying insignificant variables; ii) it improves the accuracy of prediction for future data; and iii) it can handle high-dimensional data in which the number of variables is larger than the number of observations. In this paper, we applied several variable selection methods, including the least absolute shrinkage and selection operator (LASSO), the fused LASSO and the elastic net, to data comprising 47K single nucleotide polymorphisms and litter size for 519 observed sows. Based on experiments, we observed that the fused LASSO outperforms the other approaches.
Keywords: Genomic Selection; Litter Size; Pig; Regularized Regression; Single Nucleotide Polymorphism
Year: 2014 PMID: 25358359 PMCID: PMC4213677 DOI: 10.5713/ajas.2014.14236
Source DB: PubMed Journal: Asian-Australas J Anim Sci ISSN: 1011-2367 Impact factor: 2.509
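For reference, the three regularized regressions named in the abstract differ only in the penalty added to a least-squares loss. The forms below are the standard textbook definitions, written in assumed notation (litter size $y_i$ for sow $i$, SNP covariates $x_{ij}$, coefficients $\beta_j$, tuning parameters $\lambda, \lambda_1, \lambda_2 \ge 0$); the abstract does not state the exact parameterization the authors used.

```latex
\begin{align*}
L(\beta) &= \sum_{i=1}^{n} \Bigl( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j \Bigr)^{2} \\
\hat{\beta}^{\text{LASSO}} &= \arg\min_{\beta} \; L(\beta) + \lambda \sum_{j=1}^{p} |\beta_j| \\
\hat{\beta}^{\text{fused}} &= \arg\min_{\beta} \; L(\beta) + \lambda_1 \sum_{j=1}^{p} |\beta_j|
  + \lambda_2 \sum_{j=2}^{p} |\beta_j - \beta_{j-1}| \\
\hat{\beta}^{\text{enet}} &= \arg\min_{\beta} \; L(\beta) + \lambda_1 \sum_{j=1}^{p} |\beta_j|
  + \lambda_2 \sum_{j=1}^{p} \beta_j^{2}
\end{align*}
```

The second term of the fused LASSO penalizes differences between the coefficients of neighboring markers, which is why it is a natural candidate when adjacent SNPs are correlated. All three penalties remain usable when $p > n$, as is the case here with roughly 47K SNPs and 519 sows.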
Figure 1. Observed sows per parity. Parity ranges from 1 to 12, and the initial set of distinct observations with litter size values comprises 4,163 Yorkshire sows.
Figure 2. Boxplot of litter size per parity. The boxplots depict the distribution of litter size at each parity through its quartiles. The black line in each box represents the second quartile (median) of litter size, and the upper and lower boundaries of the box represent the third and first quartiles, respectively.
Figure 3. Identification of the Gaussian assumption. The empirical distribution of average litter size after the Box-Cox transformation (left) and the normal Q-Q plot comparing randomly generated independent normal data with the standard normal population (right).
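The Box-Cox transformation mentioned in the Figure 3 caption can be sketched as follows. This is a minimal illustration of the standard definition, not the authors' code: the function name `box_cox`, the value `lam=0.5`, and the sample litter sizes are all assumptions, since the record does not report the transformation parameter or the raw data.

```python
import math


def box_cox(values, lam):
    """Apply the standard Box-Cox transform to strictly positive values.

    For lam != 0 the transform is (v**lam - 1) / lam; for lam == 0 it is log(v).
    """
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]


# Hypothetical average litter sizes (not taken from the study).
litter_sizes = [8.5, 10.0, 11.25, 12.0, 13.5]

# lam=0.5 is an arbitrary illustrative choice; in practice lam is usually
# chosen by maximizing a normal log-likelihood over a grid of candidates.
transformed = box_cox(litter_sizes, lam=0.5)
```

The transform is monotone increasing, so it preserves the ranking of sows by average litter size while reshaping the distribution toward normality.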
PE¹ and average PE by regularized regressions, and MFPE²
| Fold | MFPE | LASSO | Fused LASSO | Elastic net |
|---|---|---|---|---|
| 1 | 1.0061 | 0.9673 | 0.9514 | 0.9610 |
| 2 | 0.9796 | 0.8052 | 0.7777 | 0.7794 |
| 3 | 0.8429 | 0.7227 | 0.7049 | 0.7210 |
| 4 | 0.9595 | 0.8370 | 0.8172 | 0.8306 |
| 5 | 1.0420 | 0.8282 | 0.8061 | 0.8257 |
| 6 | 1.0147 | 0.9110 | 0.8885 | 0.9236 |
| 7 | 1.0813 | 0.9950 | 0.9809 | 0.9918 |
| 8 | 1.0241 | 0.8880 | 0.8635 | 0.8851 |
| 9 | 1.0163 | 0.8972 | 0.8784 | 0.8931 |
| 10 | 0.9568 | 0.8235 | 0.8074 | 0.8169 |
| Ave PE | 0.9923 | 0.8675 | 0.8476 | 0.8628 |
PE, prediction error; MFPE, mean fitting prediction error; LASSO, least absolute shrinkage and selection operator; Ave PE, average prediction error for fold.
¹ The root mean squared error with respect to the fitted coefficients from the first-deep training set.
² The root mean squared error with respect to the fitted mean of litter size from the first-deep training set.
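The distinction between PE and MFPE in the footnotes above can be made concrete with a short sketch. It assumes, as the footnotes suggest, that both are root mean squared errors on a validation fold: PE uses predictions from coefficients fitted on the training fold, while MFPE predicts every validation sow by the training-fold mean litter size. All function names and the toy numbers are hypothetical.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def mean_fitting_prediction_error(y_train, y_valid):
    """Baseline error: predict each validation value by the training mean."""
    train_mean = sum(y_train) / len(y_train)
    return rmse(y_valid, [train_mean] * len(y_valid))


def prediction_error(x_valid, y_valid, intercept, coefs):
    """Model error: predict with coefficients fitted on the training fold."""
    preds = [intercept + sum(b * x for b, x in zip(coefs, row)) for row in x_valid]
    return rmse(y_valid, preds)


# Toy example with two markers; the values are illustrative, not study data.
y_train = [10.0, 12.0, 11.0, 13.0]
x_valid = [[0.0, 1.0], [2.0, 1.0], [1.0, 0.0]]
y_valid = [11.0, 13.0, 12.0]

mfpe = mean_fitting_prediction_error(y_train, y_valid)
pe = prediction_error(x_valid, y_valid, intercept=10.5, coefs=[1.0, 0.5])
```

A regularized model is only useful if its PE falls below the MFPE baseline, which is the comparison the table reports fold by fold.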
Number of non-zero estimated coefficients derived from training set by regularized regression methods (total SNPs: 47,112)
| Fold | LASSO | Fused LASSO | Elastic net |
|---|---|---|---|
| 1 | 445 | 884 | 903 |
| 2 | 850 | 847 | 678 |
| 3 | 499 | 759 | 863 |
| 4 | 510 | 514 | 368 |
| 5 | 535 | 851 | 601 |
| 6 | 553 | 821 | 869 |
| 7 | 949 | 1,339 | 1,035 |
| 8 | 1,056 | 963 | 1,139 |
| 9 | 917 | 850 | 1,004 |
| 10 | 899 | 1,369 | 620 |
| Ave numb | 721.3 | 953.3 | 808.0 |
LASSO, least absolute shrinkage and selection operator; Ave numb, average number.
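The counts above are possible because an L1 penalty sets many coefficients exactly to zero. A minimal sketch of the mechanism is the soft-thresholding operator used in coordinate-wise LASSO updates; the record does not say which solver the authors used, so this only illustrates the general principle, with hypothetical function names and values.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam and return exactly 0.0 when |z| <= lam."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def count_nonzero(coefs):
    """Number of coefficients that are not exactly zero."""
    return sum(1 for b in coefs if b != 0.0)


# Hypothetical unpenalized coefficient estimates for five markers.
raw = [0.9, -0.2, 0.05, -1.4, 0.3]
shrunk = [soft_threshold(z, lam=0.25) for z in raw]
selected = count_nonzero(shrunk)
```

Raising `lam` zeroes out more coefficients, so the number of selected SNPs in each fold depends on the tuning parameter chosen for that fold.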
Accuracy (Pearson correlation¹) and average correlation by regularized regression methods
| Fold | LASSO | Fused LASSO | Elastic net |
|---|---|---|---|
| 1 | 0.3627 | 0.4150 | 0.3972 |
| 2 | 0.6802 | 0.6978 | 0.6966 |
| 3 | 0.6136 | 0.6410 | 0.6239 |
| 4 | 0.5600 | 0.5848 | 0.5694 |
| 5 | 0.7295 | 0.7510 | 0.7338 |
| 6 | 0.5973 | 0.6265 | 0.6011 |
| 7 | 0.4849 | 0.5126 | 0.4925 |
| 8 | 0.5931 | 0.6070 | 0.5962 |
| 9 | 0.5200 | 0.5422 | 0.5291 |
| 10 | 0.5708 | 0.5891 | 0.5777 |
| Ave corr | 0.5712 | 0.5967 | 0.5818 |
LASSO, least absolute shrinkage and selection operator; Ave corr, average Pearson correlation for fold.
¹ The Pearson correlation coefficient is computed between the true litter sizes in the validation set and the values predicted by the model whose coefficients were estimated from the training set at each fold.
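The accuracy measure described in the footnote can be sketched as below. The function `pearson_correlation` is a plain implementation of the usual sample correlation and its name is an assumption; the per-fold values are the fused LASSO column copied from the table above.

```python
import math


def pearson_correlation(y_true, y_pred):
    """Sample Pearson correlation between true and predicted values."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    if var_t == 0 or var_p == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_t * var_p)


# Fused LASSO accuracy per fold, as reported in the table.
fused_lasso_fold_corr = [
    0.4150, 0.6978, 0.6410, 0.5848, 0.7510,
    0.6265, 0.5126, 0.6070, 0.5422, 0.5891,
]

# Averaging the ten folds reproduces the "Ave corr" entry up to rounding.
average_corr = sum(fused_lasso_fold_corr) / len(fused_lasso_fold_corr)
```

In a cross-validation loop, `pearson_correlation` would be called once per fold on that fold's validation sows, and the ten results averaged as shown.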
The 10 SNPs with the highest coefficients (in absolute value) selected by the fused LASSO¹
| Name of SNP | Coef² |
|---|---|
| M1GA0023299 | 0.0099 |
| MARC0015851 | 0.0094 |
| H3GA0002658 | 0.0084 |
| ASGA0001125 | 0.0074 |
| ALGA0106999 | 0.0069 |
| MARC0016306 | 0.0068 |
| ASGA0080059 | 0.0064 |
| ASGA0054467 | −0.0063 |
| MARC0027886 | −0.0064 |
| MARC0023564 | −0.0064 |
SNP, single nucleotide polymorphism; LASSO, least absolute shrinkage and selection operator.
¹ The significant SNPs in the final model are obtained from the whole data set using the fused LASSO.
² Coefficient estimated by the fused LASSO.
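Ranking SNPs by the absolute value of their estimated coefficients, as done for the table above, can be sketched as follows. The helper name `top_k_by_magnitude` is hypothetical; the coefficients are the ten values listed in the table.

```python
def top_k_by_magnitude(coefs, k):
    """Return the k (name, coefficient) pairs with the largest |coefficient|."""
    ranked = sorted(coefs.items(), key=lambda item: abs(item[1]), reverse=True)
    return ranked[:k]


# Fused LASSO coefficients as listed in the table.
fused_lasso_coefs = {
    "M1GA0023299": 0.0099,
    "MARC0015851": 0.0094,
    "H3GA0002658": 0.0084,
    "ASGA0001125": 0.0074,
    "ALGA0106999": 0.0069,
    "MARC0016306": 0.0068,
    "ASGA0080059": 0.0064,
    "ASGA0054467": -0.0063,
    "MARC0027886": -0.0064,
    "MARC0023564": -0.0064,
}

top_three = top_k_by_magnitude(fused_lasso_coefs, 3)
```

Sorting on the absolute value keeps markers with large negative effects in the ranking alongside those with large positive effects.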
Figure 4. Scatter plot of true average litter size and predicted litter size obtained by the fused LASSO in the final model. The sample correlation coefficient is 0.7041. LASSO, least absolute shrinkage and selection operator.