Machine learning classification procedure for selecting SNPs in genomic selection: application to early mortality in broilers.

N Long, D Gianola, G J M Rosa, K A Weigel, S Avendaño.

Abstract

Genome-wide association studies using single nucleotide polymorphisms (SNPs) can identify genetic variants related to complex traits. Typically thousands of SNPs are genotyped, whereas the number of phenotypes for which there is genomic information may be smaller. When predicting phenotypes, options for statistical model building range from incorporating all possible markers into the specification to including only sets of relevant SNPs (features). In the latter case, an efficient method of selecting influential features is required. A two-step feature selection method for binary traits was developed, consisting of filtering (using information gain) and wrapping (using naïve Bayesian classification). The filter reduces the large number of SNPs to a much smaller set, to facilitate the wrapper step. As the procedure is tailored for discrete outcomes, an approach based on discretization of phenotypic values was developed to enable feature selection in a classification framework. The method was applied to chick mortality rates (0-14 days of age) on progeny from 201 sires in a commercial broiler line, with the goal of identifying SNPs (from over 5000) related to progeny mortality. To mimic a case-control study, sires were clustered into two groups, low and high, according to two arbitrarily chosen mortality rate cut points. By varying these thresholds, 11 different 'case-control' samples were formed, and the SNP selection procedure was applied to each sample. To compare the 11 sets of chosen SNPs, the predicted residual sum of squares (PRESS) from a linear model was used. The two-step method improved naïve Bayesian classification accuracy in each case-control sample, from around 50% without feature selection to above 90% with it. The best case-control group (63 sires above or below the thresholds) had the smallest PRESS statistic among groups with model p-values below 0.003. The 17 SNPs selected using this group accounted for 31% of the variation in raw mortality rates between sire families.
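
The discretization and filter steps described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (`case_control`, `information_gain`, `filter_snps`), the 0/1/2 genotype coding, the inclusive handling of the cut points, and the toy data are all assumptions made for illustration. The wrapper step (naïve Bayesian classification) is not shown.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(genotypes, labels):
    """Entropy of the labels minus their entropy conditional on genotype."""
    n = len(labels)
    by_genotype = {}
    for g, y in zip(genotypes, labels):
        by_genotype.setdefault(g, []).append(y)
    conditional = sum(len(ys) / n * entropy(ys) for ys in by_genotype.values())
    return entropy(labels) - conditional

def case_control(rates, low_cut, high_cut):
    """Label sires 'low' or 'high' by two cut points; sires in between are dropped.
    Treating the cut points as inclusive is an assumption of this sketch."""
    kept = {}
    for i, rate in enumerate(rates):
        if rate <= low_cut:
            kept[i] = "low"
        elif rate >= high_cut:
            kept[i] = "high"
    return kept

def filter_snps(snp_matrix, labels, k):
    """Return the column indices of the k SNPs with the largest information gain."""
    n_snps = len(snp_matrix[0])
    gains = [information_gain([row[j] for row in snp_matrix], labels)
             for j in range(n_snps)]
    return sorted(range(n_snps), key=lambda j: gains[j], reverse=True)[:k]

# Toy data (made up, not from the study): 6 sires, 3 SNPs coded 0/1/2.
rates = [0.01, 0.02, 0.05, 0.09, 0.10, 0.03]
snps = [
    [0, 1, 2],
    [0, 2, 2],
    [1, 0, 2],
    [2, 1, 2],
    [2, 2, 2],
    [0, 0, 2],
]

groups = case_control(rates, low_cut=0.03, high_cut=0.09)
kept_rows = [snps[i] for i in sorted(groups)]
kept_labels = [groups[i] for i in sorted(groups)]
selected = filter_snps(kept_rows, kept_labels, k=2)
```

In this toy sample the sire with an intermediate mortality rate is excluded, the first SNP separates the two groups perfectly, and the monomorphic third SNP carries no information, so the filter would pass the first two SNPs on to a wrapper. Varying `low_cut` and `high_cut` yields different 'case-control' samples, in the spirit of the 11 samples described above.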

Year:  2007        PMID: 18076475     DOI: 10.1111/j.1439-0388.2007.00694.x

Source DB:  PubMed          Journal:  J Anim Breed Genet        ISSN: 0931-2668            Impact factor:   2.380


  34 in total

1.  A non-parametric mixture model for genome-enabled prediction of genetic value for a quantitative trait.

Authors:  Daniel Gianola; Xiao-Lin Wu; Eduardo Manfredi; Henner Simianer
Journal:  Genetica       Date:  2010-08-25       Impact factor: 1.082

2.  The impact of genetic architecture on genome-wide evaluation methods.

Authors:  Hans D Daetwyler; Ricardo Pong-Wong; Beatriz Villanueva; John A Woolliams
Journal:  Genetics       Date:  2010-04-20       Impact factor: 4.562

3.  Genomic selection using low-density marker panels.

Authors:  D Habier; R L Fernando; J C M Dekkers
Journal:  Genetics       Date:  2009-03-18       Impact factor: 4.562

Review 4.  Additive genetic variability and the Bayesian alphabet.

Authors:  Daniel Gianola; Gustavo de los Campos; William G Hill; Eduardo Manfredi; Rohan Fernando
Journal:  Genetics       Date:  2009-07-20       Impact factor: 4.562

5.  Nonparametric methods for incorporating genomic information into genetic evaluations: an application to mortality in broilers.

Authors:  Oscar González-Recio; Daniel Gianola; Nanye Long; Kent A Weigel; Guilherme J M Rosa; Santiago Avendaño
Journal:  Genetics       Date:  2008-04       Impact factor: 4.562

6.  Reproducing kernel Hilbert spaces regression methods for genomic assisted prediction of quantitative traits.

Authors:  Daniel Gianola; Johannes B C H M van Kaam
Journal:  Genetics       Date:  2008-04       Impact factor: 4.562

7.  Marker-assisted prediction of non-additive genetic values.

Authors:  Nanye Long; Daniel Gianola; Guilherme J M Rosa; Kent A Weigel
Journal:  Genetica       Date:  2011-06-15       Impact factor: 1.082

8.  Comparisons of single-stage and two-stage approaches to genomic selection.

Authors:  Torben Schulz-Streeck; Joseph O Ogutu; Hans-Peter Piepho
Journal:  Theor Appl Genet       Date:  2012-08-19       Impact factor: 5.699

9.  Genetic variants and their interactions in the prediction of increased pre-clinical carotid atherosclerosis: the cardiovascular risk in young Finns study.

Authors:  Sebastian Okser; Terho Lehtimäki; Laura L Elo; Nina Mononen; Nina Peltonen; Mika Kähönen; Markus Juonala; Yue-Mei Fan; Jussi A Hernesniemi; Tomi Laitinen; Leo-Pekka Lyytikäinen; Riikka Rontu; Carita Eklund; Nina Hutri-Kähönen; Leena Taittonen; Mikko Hurme; Jorma S A Viikari; Olli T Raitakari; Tero Aittokallio
Journal:  PLoS Genet       Date:  2010-09-30       Impact factor: 5.917

10.  A fast and efficient approach for genomic selection with high-density markers.

Authors:  Vitara Pungpapong; William M Muir; Xianran Li; Dabao Zhang; Min Zhang
Journal:  G3 (Bethesda)       Date:  2012-10-01       Impact factor: 3.154
