
Comparing regression, naive Bayes, and random forest methods in the prediction of individual survival to second lactation in Holstein cattle.

E M M van der Heide, R F Veerkamp, M L van Pelt, C Kamphuis, I Athanasiadis, B J Ducro.

Abstract

In this study, we compared multiple logistic regression, a linear method, with naive Bayes and random forest, 2 nonlinear machine-learning methods. We used all 3 methods to predict individual survival to second lactation in dairy heifers. The data set used for prediction contained 6,847 heifers that were born between January 2012 and June 2013 and had known survival outcomes. Each animal had 50 genomic estimated breeding values available at birth and up to 65 phenotypic variables that accumulated over time. Survival was predicted at 5 moments in life: at birth, at 18 mo, at first calving, at 6 wk after first calving, and at 200 d after first calving. The data sets were randomly split into 70% training and 30% testing sets to evaluate model performance for 20-fold validation. The methods were compared for accuracy, sensitivity, specificity, area under the curve (AUC) value, contrasts between groups for the prediction outcomes, and increase in surviving animals in a practical scenario. At birth and 18 mo, all methods had overlapping performance; no method significantly outperformed the others. At first calving, 6 wk after first calving, and 200 d after first calving, random forest and naive Bayes had overlapping performance, and both machine-learning methods outperformed multiple logistic regression. Overall, naive Bayes had the highest average AUC at all decision points up to 200 d after first calving. Random forest had the highest AUC at 200 d after first calving. All methods obtained similar increases in survival in the practical scenario. Despite this, the methods appeared to predict the survival of individual heifers differently. All methods improved over time, but the changes in mean model outcomes for surviving and non-surviving animals differed by method. Furthermore, the correlations of individual predictions between methods ranged from r = 0.417 to r = 0.700; the lowest correlations were at first calving for all methods.
In short, all 3 methods were able to predict survival at a population level, because all methods improved survival in a practical scenario. However, predictions for individual animals differed considerably between methods.

The Authors. Published by FASS Inc. and Elsevier Inc. on behalf of the American Dairy Science Association®. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
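The comparison criteria named in the abstract (accuracy, sensitivity, specificity, and AUC) can be illustrated with a minimal Python sketch. The labels and scores below are invented toy values, not data from the study, and the 0.5 probability cutoff and the function names are assumptions made for illustration; the abstract does not state how the authors thresholded or computed these quantities.

```python
def confusion_metrics(y_true, y_score, threshold=0.5):
    """Accuracy, sensitivity, and specificity at an assumed probability cutoff."""
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, y_score):
        pred = 1 if s >= threshold else 0
        if y == 1 and pred == 1:
            tp += 1
        elif y == 1 and pred == 0:
            fn += 1
        elif y == 0 and pred == 1:
            fp += 1
        else:
            tn += 1
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),  # share of survivors predicted to survive
        "specificity": tn / (tn + fp),  # share of non-survivors predicted not to survive
    }


def auc(y_true, y_score):
    """AUC as the probability that a random survivor is scored above a
    random non-survivor; ties count as one half."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example: 1 = survived to second lactation, 0 = did not (hypothetical values).
y_true = [1, 1, 1, 0, 1, 0, 1, 0]
y_score = [0.9, 0.8, 0.4, 0.6, 0.7, 0.3, 0.55, 0.2]

metrics = confusion_metrics(y_true, y_score)
auc_value = auc(y_true, y_score)
print(metrics, round(auc_value, 3))
```

Because AUC depends only on how the scores rank animals, two methods can reach similar AUC values while ranking individual animals differently, which is consistent with the moderate between-method correlations reported above.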

Keywords:  machine learning; naive Bayes; phenotypic prediction; random forest; regression

Year:  2019        PMID: 31447154     DOI: 10.3168/jds.2019-16295

Source DB:  PubMed          Journal:  J Dairy Sci        ISSN: 0022-0302            Impact factor:   4.034


  8 in total

1.  Genome-Enabled Prediction Methods Based on Machine Learning.

Authors:  Edgar L Reinoso-Peláez; Daniel Gianola; Oscar González-Recio
Journal:  Methods Mol Biol       Date:  2022

2.  Identification of Age-Specific and Common Key Regulatory Mechanisms Governing Eggshell Strength in Chicken Using Random Forests.

Authors:  Faisal Ramzan; Selina Klees; Armin Otto Schmitt; David Cavero; Mehmet Gültas
Journal:  Genes (Basel)       Date:  2020-04-24       Impact factor: 4.096

3.  Living With COVID-19: A Systemic and Multi-Criteria Approach to Enact Evidence-Based Health Policy.

Authors:  Didier Raboisson; Guillaume Lhermie
Journal:  Front Public Health       Date:  2020-06-16

4.  Application of Internet of Things and Naive Bayes in Public Health Environmental Management of Government Institutions in China.

Authors:  Zhipeng Zhang; Shuxiang Zhang
Journal:  J Healthc Eng       Date:  2021-08-13       Impact factor: 2.682

5.  Over 20 Years of Machine Learning Applications on Dairy Farms: A Comprehensive Mapping Study. (Review)

Authors:  Philip Shine; Michael D Murphy
Journal:  Sensors (Basel)       Date:  2021-12-22       Impact factor: 3.576

6.  Can machine learning algorithms perform better than multiple linear regression in predicting nitrogen excretion from lactating dairy cows.

Authors:  Xianjiang Chen; Huiru Zheng; Haiying Wang; Tianhai Yan
Journal:  Sci Rep       Date:  2022-07-21       Impact factor: 4.996

7.  Machine learning outperformed logistic regression classification even with limit sample size: A model to predict pediatric HIV mortality and clinical progression to AIDS.

Authors:  Sara Domínguez-Rodríguez; Miquel Serna-Pascual; Andrea Oletto; Shaun Barnabas; Peter Zuidewind; Els Dobbels; Siva Danaviah; Osee Behuhuma; Maria Grazia Lain; Paula Vaz; Sheila Fernández-Luis; Tacilta Nhampossa; Elisa Lopez-Varela; Kennedy Otwombe; Afaaf Liberty; Avy Violari; Almoustapha Issiaka Maiga; Paolo Rossi; Carlo Giaquinto; Louise Kuhn; Pablo Rojo; Alfredo Tagarro
Journal:  PLoS One       Date:  2022-10-14       Impact factor: 3.752

8.  Combining Random Forests and a Signal Detection Method Leads to the Robust Detection of Genotype-Phenotype Associations.

Authors:  Faisal Ramzan; Mehmet Gültas; Hendrik Bertram; David Cavero; Armin Otto Schmitt
Journal:  Genes (Basel)       Date:  2020-08-05       Impact factor: 4.096
