The gradient boosting algorithm and random boosting for genome-assisted evaluation in large data sets.

O González-Recio, J A Jiménez-Montero, R Alenda.

Abstract

In the next few years, with the advent of high-density single nucleotide polymorphism (SNP) arrays and genome sequencing, genomic evaluation methods will need to deal with a large number of genetic variants and an increasing sample size. The boosting algorithm is a machine-learning technique that may alleviate the drawbacks of dealing with such large data sets. This algorithm combines different predictors in a sequential manner with some shrinkage on them; each predictor is applied consecutively to the residuals from the committee formed by the previous ones to form a final prediction based on a subset of covariates. Here, a detailed description is provided and examples using a toy data set are included. A modification of the algorithm called "random boosting" was proposed to increase predictive ability and decrease computation time of genome-assisted evaluation in large data sets. Random boosting uses a random selection of markers to add a subsequent weak learner to the predictive model. These modifications were applied to a real data set composed of 1,797 bulls genotyped for 39,714 SNP. Deregressed proofs of 4 yield traits and 1 type trait from January 2009 routine evaluations were used as dependent variables. A 2-fold cross-validation scenario was implemented. Sires born before 2005 were used as a training sample (1,576 and 1,562 for production and type traits, respectively), whereas younger sires were used as a testing sample to evaluate predictive ability of the algorithm on yet-to-be-observed phenotypes. Comparison with the original algorithm was provided. The predictive ability of the algorithm was measured as Pearson correlations between observed and predicted responses. Further, estimated bias was computed as the average difference between observed and predicted phenotypes. 
The results showed that the modified boosting algorithm could be run in 1% of the time required by the original algorithm, with negligible differences in accuracy and bias. This modification may be used to speed up the computation of genome-assisted evaluations in large data sets such as those obtained from consortia.
Copyright © 2013 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
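
The procedure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the weak learner, so a single-marker least-squares regression is assumed here, and the function names (`random_boost`, `predict`, `pearson`, `bias`), the parameter defaults, and the simulated toy data are all hypothetical. Each round fits the weak learner to the current residuals, considers only a random subset of markers (the "random boosting" idea), and adds the shrunken fit to the committee. Predictive ability and bias are computed as the abstract defines them.

```python
import numpy as np


def random_boost(X, y, n_iter=200, shrinkage=0.1, frac_markers=0.1, seed=0):
    """Boosting on residuals; each round screens a random subset of markers.

    Assumption: the weak learner is a one-marker least-squares regression.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    n, p = X.shape
    k = max(1, int(round(frac_markers * p)))  # markers screened per round
    mu = y.mean()
    x_mean = X.mean(axis=0)
    Xc = X - x_mean                  # centered genotypes
    ss = (Xc ** 2).sum(axis=0)       # per-marker sum of squares
    beta = np.zeros(p)
    resid = y - mu                   # residuals from the current committee
    for _ in range(n_iter):
        idx = rng.choice(p, size=k, replace=False)
        idx = idx[ss[idx] > 0]       # skip markers with no variation
        if idx.size == 0:
            continue
        b = Xc[:, idx].T @ resid / ss[idx]   # single-marker slopes
        gain = b ** 2 * ss[idx]              # drop in residual sum of squares
        best = int(np.argmax(gain))
        j = idx[best]
        step = shrinkage * b[best]           # shrunken contribution
        beta[j] += step
        resid -= step * Xc[:, j]
    intercept = mu - x_mean @ beta
    return intercept, beta


def predict(X, intercept, beta):
    return intercept + np.asarray(X, dtype=float) @ beta


def pearson(observed, predicted):
    """Predictive ability: Pearson correlation of observed and predicted."""
    return float(np.corrcoef(observed, predicted)[0, 1])


def bias(observed, predicted):
    """Estimated bias: average of (observed - predicted)."""
    return float(np.mean(np.asarray(observed) - np.asarray(predicted)))


def demo():
    # Simulated toy data (hypothetical): 300 animals, 500 markers coded 0/1/2.
    rng = np.random.default_rng(1)
    X = rng.integers(0, 3, size=(300, 500)).astype(float)
    true_beta = np.zeros(500)
    true_beta[:10] = rng.normal(size=10)
    y = X @ true_beta + rng.normal(size=300)
    train, test = slice(0, 200), slice(200, 300)
    intercept, beta = random_boost(X[train], y[train])
    y_hat = predict(X[test], intercept, beta)
    print("correlation:", pearson(y[test], y_hat))
    print("bias:", bias(y[test], y_hat))


if __name__ == "__main__":
    demo()
```

Setting `frac_markers=1.0` in this sketch screens every marker in every round, which corresponds to boosting without the random marker selection. Smaller fractions reduce the work per round, which is the source of the speed-up the abstract reports.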

Year:  2012        PMID: 23102953     DOI: 10.3168/jds.2012-5630

Source DB:  PubMed          Journal:  J Dairy Sci        ISSN: 0022-0302            Impact factor:   4.034


  9 in total

1.  Machine learning in postgenomic biology and personalized medicine.

Authors:  Animesh Ray
Journal:  Wiley Interdiscip Rev Data Min Knowl Discov       Date:  2022-01-24

2.  Genomic Prediction Methods Accounting for Nonadditive Genetic Effects.

Authors:  Luis Varona; Andres Legarra; Miguel A Toro; Zulma G Vitezica
Journal:  Methods Mol Biol       Date:  2022

3.  Genome-Enabled Prediction Methods Based on Machine Learning.

Authors:  Edgar L Reinoso-Peláez; Daniel Gianola; Oscar González-Recio
Journal:  Methods Mol Biol       Date:  2022

4.  A Model for Predicting Cervical Cancer Using Machine Learning Algorithms.

Authors:  Naif Al Mudawi; Abdulwahab Alazeb
Journal:  Sensors (Basel)       Date:  2022-05-29       Impact factor: 3.847

5.  Guidelines for Developing and Reporting Machine Learning Predictive Models in Biomedical Research: A Multidisciplinary View.

Authors:  Wei Luo; Dinh Phung; Truyen Tran; Sunil Gupta; Santu Rana; Chandan Karmakar; Alistair Shilton; John Yearwood; Nevenka Dimitrova; Tu Bao Ho; Svetha Venkatesh; Michael Berk
Journal:  J Med Internet Res       Date:  2016-12-16       Impact factor: 5.428

6.  Benchmarking Parametric and Machine Learning Models for Genomic Prediction of Complex Traits.

Authors:  Christina B Azodi; Emily Bolger; Andrew McCarren; Mark Roantree; Gustavo de Los Campos; Shin-Han Shiu
Journal:  G3 (Bethesda)       Date:  2019-11-05       Impact factor: 3.154

7.  Prediction performance of linear models and gradient boosting machine on complex phenotypes in outbred mice.

Authors:  Bruno C Perez; Marco C A M Bink; Karen L Svenson; Gary A Churchill; Mario P L Calus
Journal:  G3 (Bethesda)       Date:  2022-04-04       Impact factor: 3.154

8.  Symptom-Based COVID-19 Prognosis through AI-Based IoT: A Bioinformatics Approach.

Authors:  Madhumita Pal; Smita Parija; Ranjan K Mohapatra; Snehasish Mishra; Ali A Rabaan; Abbas Al Mutair; Saad Alhumaid; Jaffar A Al-Tawfiq; Kuldeep Dhama
Journal:  Biomed Res Int       Date:  2022-07-23       Impact factor: 3.246

9.  Deep learning versus parametric and ensemble methods for genomic prediction of complex phenotypes.

Authors:  Rostam Abdollahi-Arpanahi; Daniel Gianola; Francisco Peñagaricano
Journal:  Genet Sel Evol       Date:  2020-02-24       Impact factor: 4.297
