Yangfan Wang, Xiao-Lin Wu, Zhi Li, Zhenmin Bao, Richard G Tait, Stewart Bauck, Guilherme J M Rosa.
Abstract
A variety of statistical methods, such as admixture models, have been used to estimate genomic breed composition (GBC). These methods, however, tend to assign non-zero components to reference breeds that share some genomic similarity with a test animal. These non-essential GBC components, in turn, offset the estimated GBC for the breed to which the animal belongs. As a result, not all purebred animals have 100% GBC of their respective breeds, which statistically indicates an elevated false-negative rate in the identification of purebred animals when 100% GBC is used as the cutoff. Otherwise, a lower cutoff of estimated GBC has to be used, which is arbitrary and makes the results less interpretable. In the present study, three admixture models with regularization were proposed, which produced sparse solutions by suppressing the noise in the estimated GBC due to genomic similarities. The regularization or penalty forms included the L1 norm penalty, the minimax concave penalty (MCP), and the smoothly clipped absolute deviation (SCAD). The performance of these regularized admixture models in estimating GBC was examined in purebred and composite animals, respectively, and compared to that of the non-regularized admixture model as the baseline. The results showed that, given optimal values for λ, the three sparsely regularized admixture models had higher power, and thus a lower false-negative rate, for the breed identification of purebred animals than the non-regularized admixture model. Of the three regularized admixture models, the two with a non-convex penalty outperformed the one with the L1 norm penalty. In Brangus, a composite cattle breed, estimated GBC was roughly comparable among the four admixture models, but all four models underestimated the GBC for these composite animals when non-ancestral breeds were included as the reference.
In conclusion, the admixture models with sparse regularization gave more parsimonious, consistent, and interpretable estimates of GBC for purebred animals than the non-regularized admixture model. Nevertheless, the use of regularized admixture models for estimating GBC in crossbred or composite animals should be approached with caution.
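The three penalty forms named in the abstract can be illustrated with their standard textbook definitions. The sketch below is not taken from the paper: the function names are hypothetical, the shape parameters `gamma=3.0` (MCP) and `a=3.7` (SCAD) are common conventional defaults rather than values reported here, and how the penalties enter the admixture likelihood is not described in this record.

```python
def l1_penalty(b, lam):
    """L1 norm penalty on a single coefficient: lam * |b|."""
    return lam * abs(b)


def mcp_penalty(b, lam, gamma=3.0):
    """Minimax concave penalty (MCP); gamma > 1 is an assumed default."""
    t = abs(b)
    if t <= gamma * lam:
        return lam * t - t * t / (2.0 * gamma)
    # Beyond gamma * lam the penalty is flat, so large coefficients
    # are not shrunk any further.
    return 0.5 * gamma * lam * lam


def scad_penalty(b, lam, a=3.7):
    """Smoothly clipped absolute deviation (SCAD); a > 2 is an assumed default."""
    t = abs(b)
    if t <= lam:
        return lam * t
    if t <= a * lam:
        return (2.0 * a * lam * t - t * t - lam * lam) / (2.0 * (a - 1.0))
    # Flat region for large coefficients, as with MCP.
    return lam * lam * (a + 1.0) / 2.0


# Compare the penalties on a small and a large coefficient (lam values
# follow the table footnote: 0.1 for L1, 0.25 for MCP and SCAD).
for b in (0.05, 0.9):
    print(b, l1_penalty(b, 0.1), mcp_penalty(b, 0.25), scad_penalty(b, 0.25))
```

Unlike the L1 penalty, which keeps growing with the coefficient, both non-convex penalties level off, so a large GBC component is penalized no more than a moderate one.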
Keywords: SNP; admixture models; bovine; breed composition; linear regression; nonconvex penalty; sparse regularization
Year: 2020 PMID: 32595700 PMCID: PMC7300184 DOI: 10.3389/fgene.2020.00576
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
Descriptive statistics of genotype data for the ten cattle breeds used in the present study.
| Breed | Number of animals | Number of SNPs | Mean FreqA (SD) |
| Angus | 20,359 (20,322) | 49,463 | 0.492 (0.247) |
| Brahman | 349 (349) | 777,962 | 0.439 (0.343) |
| | 68 (43) | 49,463 | 0.431 (0.363) |
| Brangus | 3,605 | 49,463 | 0.477 (0.231) |
| Hereford | 2,423 (2,421) | 49,463 | 0.496 (0.271) |
| Holstein | 20,350 (20,246) | 49,463 | 0.489 (0.254) |
| Jersey | 15,689 (15,607) | 49,463 | 0.489 (0.288) |
| Limousin | 5,043 (5,041) | 49,463 | 0.490 (0.228) |
| Shorthorn | 1,232 (1,218) | 49,463 | 0.491 (0.258) |
| Simmental | 14,754 (14,727) | 49,463 | 0.490 (0.226) |
| Wagyu | 23,721 (21,844) | 49,463 | 0.483 (0.302) |
Values in parentheses are the numbers of genotyped animals remaining after excluding outliers.
Mean FreqA (SD) = mean (standard deviation) of allele A frequencies across the genotyped SNPs for each breed.
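As an illustration of the Mean FreqA (SD) summary, the sketch below computes the allele A frequency per SNP and then the mean and standard deviation across SNPs. It assumes genotypes coded as the count of allele A (0, 1, or 2) with `None` for missing calls, and it uses the sample standard deviation; neither the coding nor the SD variant is stated in this record, and the toy data are invented.

```python
from statistics import mean, stdev


def allele_a_frequencies(genotypes):
    """Return the allele A frequency of each SNP.

    genotypes: list of animals, each a list of allele A counts per SNP
    (0, 1, or 2; None marks a missing call). This coding is an assumption.
    """
    n_snps = len(genotypes[0])
    freqs = []
    for j in range(n_snps):
        calls = [animal[j] for animal in genotypes if animal[j] is not None]
        # Each diploid animal carries two alleles per SNP.
        freqs.append(sum(calls) / (2 * len(calls)))
    return freqs


# Toy data: 3 animals x 4 SNPs (not from the study).
toy_genotypes = [
    [0, 1, 2, 2],
    [1, 1, 2, None],
    [0, 2, 2, 0],
]
freqs = allele_a_frequencies(toy_genotypes)
mean_freq_a = mean(freqs)
sd_freq_a = stdev(freqs)
print(freqs, mean_freq_a, sd_freq_a)
```

A SNP with no called genotypes would raise a division-by-zero error in this sketch, so such SNPs would need to be filtered out beforehand.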
Figure 1. Percent of individuals with GBC = 1 obtained by the three regularized ADMIXTURE methods, each with a varying value of the regularization parameter lambda (λ). Curves were extracted from the surfaces in this figure by fixing GBC = 1 for ADMIXTURE-L1, ADMIXTURE-MCP, and ADMIXTURE-SCAD in Angus, Holstein, and Limousin, respectively.
Percent (%) of animals by category of estimated GBC obtained using four statistical models with the 16K SNP panel in Angus (A), Holstein (H), and Limousin (L).
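| GBC | ADMIXTURE (A) | ADMIXTURE (H) | ADMIXTURE (L) | ADMIXTURE-L1 (A) | ADMIXTURE-L1 (H) | ADMIXTURE-L1 (L) | ADMIXTURE-MCP (A) | ADMIXTURE-MCP (H) | ADMIXTURE-MCP (L) | ADMIXTURE-SCAD (A) | ADMIXTURE-SCAD (H) | ADMIXTURE-SCAD (L) |
| 1 | 69.6 | 70.7 | 47.4 | 94.1 | 97.7 | 65.1 | 98.6 | 99.2 | 72.5 | 96.5 | 99.6 | 70.9 |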
| [0.9, 1) | 18.9 | 19.5 | 9.4 | 3.3 | 1.2 | 6.7 | 0.4 | 0.3 | 4.4 | 2.3 | 0.1 | 4.3 |
| [0.8, 0.9) | 8.5 | 7.0 | 9.4 | 1.5 | 1.0 | 5.8 | 0.5 | 0.4 | 3.6 | 0.4 | 0.1 | 4.5 |
| [0.7, 0.8) | 1.8 | 2.4 | 9.2 | 0.5 | 0.1 | 8.7 | 0.1 | 0.0 | 5.0 | 0.2 | 0.0 | 6.1 |
| [0.6, 0.7) | 0.4 | 0.2 | 13.5 | 0.2 | 0.0 | 6.9 | 0.2 | 0.0 | 5.5 | 0.2 | 0.0 | 7.2 |
| [0.5, 0.6) | 0.3 | 0.0 | 6.2 | 0.1 | 0.0 | 2.8 | 0.1 | 0.0 | 4.4 | 0.1 | 0.0 | 3.5 |
| [0.4, 0.5) | 0.2 | 0.0 | 2.6 | 0.1 | 0.0 | 2.0 | 0.0 | 0.0 | 1.8 | 0.1 | 0.0 | 1.1 |
| [0.3, 0.4) | 0.1 | 0.0 | 1.2 | 0.1 | 0.0 | 0.9 | 0.0 | 0.0 | 1.2 | 0.0 | 0.0 | 0.8 |
| [0.2, 0.3) | 0.1 | 0.0 | 0.5 | 0.1 | 0.0 | 0.8 | 0.0 | 0.0 | 0.8 | 0.0 | 0.0 | 0.4 |
| [0.1, 0.2) | 0.0 | 0.0 | 0.4 | 0.0 | 0.0 | 0.2 | 0.0 | 0.0 | 0.4 | 0.0 | 0.0 | 0.3 |
| [0, 0.1) | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
ADMIXTURE, non-regularized admixture model (λ = 0); ADMIXTURE-L1, admixture model with L1 norm penalty (λ = 0.1); ADMIXTURE-MCP, admixture model with MCP penalty (λ = 0.25); ADMIXTURE-SCAD, admixture model with SCAD penalty (λ = 0.25).
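The percentages in the table above could be tallied along the following lines. This is a minimal sketch with hypothetical function names and invented input values; the tolerance `tol` used to decide that an estimate equals 1 is an assumption, since this record does not state how GBC = 1 was determined numerically.

```python
from collections import Counter


def gbc_category(gbc, tol=1e-9):
    """Map an estimated GBC in [0, 1] to a category label.

    Estimates within tol of 1 are labeled "1"; all others fall into
    half-open bins of width 0.1, e.g. "[0.9, 1.0)".
    """
    if gbc >= 1.0 - tol:
        return "1"
    k = min(int(gbc * 10), 9)  # index of the lower bin edge
    return f"[{k / 10:.1f}, {(k + 1) / 10:.1f})"


def percent_by_category(gbc_values):
    """Return the percent of animals falling in each GBC category."""
    counts = Counter(gbc_category(g) for g in gbc_values)
    n = len(gbc_values)
    return {cat: 100.0 * c / n for cat, c in counts.items()}


# Toy estimates for five animals (not from the study).
toy_gbc = [1.0, 1.0, 0.95, 0.85, 0.42]
result = percent_by_category(toy_gbc)
print(result)
```

With 100% GBC as the purebred cutoff, the share in category "1" is the proportion of purebred animals correctly identified, and the remainder corresponds to the false negatives discussed in the abstract.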
Figure 2. Histogram of the means of estimated GBC for 5,041 Limousin animals, obtained using each of the four statistical models. Bar plot of the mean GBC across the 10 breeds, estimated by ADMIXTURE, ADMIXTURE-L1 (λ = 0.1), ADMIXTURE-MCP (λ = 0.25), and ADMIXTURE-SCAD (λ = 0.25) using the 5K SNP panel. The standard deviation (SD) is labeled on the Limousin bar.
Figure 3. Histogram of the means of estimated GBC for 3,605 Brangus animals (0.625 Angus, 0.375 Brahman), obtained using each of the four statistical models. Bar plot of the mean GBC across the ten breeds, estimated by ADMIXTURE, ADMIXTURE-L1 (λ = 0.1), ADMIXTURE-MCP (λ = 0.25), and ADMIXTURE-SCAD (λ = 0.25) using the 5K SNP panel. Standard deviations (SD) are labeled on the Angus and Brahman bars.
Percent (%) of animals by categories of estimated GBC obtained using four statistical models in Brangus.
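| Model | Angus % (relative %), nine reference breeds | SD | Brahman % (relative %), nine reference breeds | SD | Angus %, two reference breeds | SD | Brahman %, two reference breeds | SD |
| ADMIXTURE | 54.3 (68.3) | 11.9 | 25.1 (31.7) | 6.31 | 71.1 | 6.70 | 28.9 | 6.70 |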
| ADMIXTURE-L1 | 61.5 (68.2) | 15.6 | 28.6 (31.8) | 12.1 | 77.1 | 8.70 | 22.9 | 8.70 |
| ADMIXTURE-MCP | 59.8 (68.1) | 12.9 | 27.9 (31.9) | 9.1 | 74.6 | 7.10 | 25.4 | 7.10 |
| ADMIXTURE-SCAD | 59.5 (67.9) | 13.1 | 28.1 (32.1) | 10.4 | 75.3 | 7.50 | 24.7 | 7.50 |
Values in parentheses are the relative GBC proportions of Angus and Brahman origin only, computed with nine reference breeds.
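The relative proportions can be read as a renormalization of the Angus and Brahman components so that the two sum to 100%. The sketch below applies that reading to the tabled means for the non-regularized ADMIXTURE model; the function name is hypothetical, and the assumption that the published values were obtained this way (rather than, for example, per animal before averaging) is not confirmed by this record.

```python
def relative_ratio(angus, brahman):
    """Rescale two GBC components so that they sum to 100%."""
    total = angus + brahman
    return 100.0 * angus / total, 100.0 * brahman / total


# Mean GBC (%) for ADMIXTURE with nine reference breeds, from the table.
rel_angus, rel_brahman = relative_ratio(54.3, 25.1)
print(round(rel_angus, 1), round(rel_brahman, 1))
```

Renormalizing the rounded means can differ from the tabled values by about 0.1 percentage point, which would be consistent with the published figures having been computed from unrounded or per-animal estimates.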
Figure 4. Population distribution across the first (PC1) and second (PC2) principal components of the genotype data of the Brangus individuals. Animals are labeled according to their Angus percentage of GBC estimated by ADMIXTURE.