Feature Selection Stability and Accuracy of Prediction Models for Genomic Prediction of Residual Feed Intake in Pigs Using Machine Learning.

Miriam Piles¹, Rob Bergsma², Daniel Gianola³,⁴, Hélène Gilbert⁵, Llibertat Tusell¹,⁵.

Abstract

Feature selection (FS, i.e., selection of a subset of predictor variables) is essential in high-dimensional datasets to prevent overfitting of prediction/classification models and to reduce computation time and resources. In genomics, FS makes it possible to identify relevant markers and to design low-density SNP chips for evaluating selection candidates. In this research, several univariate and multivariate FS algorithms, combined with various parametric and non-parametric learners, were applied to predict feed efficiency in growing pigs from high-dimensional genomic data. The objective was to find the combination of feature selector, SNP subset size, and learner that yields accurate and stable (i.e., less sensitive to changes in the training data) prediction models. Genomic best linear unbiased prediction (GBLUP) without SNP pre-selection was the benchmark. Three types of FS methods were implemented: (i) filter methods: univariate (univ.dtree, spearcor) or multivariate (cforest, mrmr), with random selection as benchmark; (ii) embedded methods: elastic net and least absolute shrinkage and selection operator (LASSO) regression; (iii) combinations of filter and embedded methods. Ridge regression, support vector machine (SVM), and gradient boosting (GB) were applied after pre-selection with the filter methods. The data comprised 5,708 individual records of residual feed intake, to be predicted from each animal's own genotype. Accuracy (stability of results) was measured as the median (interquartile range) of the Spearman correlation between observed and predicted values in a 10-fold cross-validation. The best prediction in terms of accuracy and stability was obtained with SVM and GB using 500 or more SNPs [0.28 (0.02) and 0.27 (0.04) for SVM and GB with 1,000 SNPs, respectively]. With larger subset sizes (1,000-1,500 SNPs), the filter method had no influence on prediction quality, which was similar to that attained with random selection.
With 50-250 SNPs, the FS method had a large impact on prediction quality: prediction was very poor when the tree-based filters (univ.dtree, cforest) were combined with any learner, but good, and similar to that obtained with larger SNP subsets, when spearcor or mrmr was used, with or without embedded methods. These two filters also led to very stable results, suggesting their potential use for designing low-density SNP chips for genome-based evaluation of feed efficiency.
Copyright © 2021 Piles, Bergsma, Gianola, Gilbert and Tusell.
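
A minimal sketch of three ideas from the abstract, assuming plain Python (standard library only) and genotypes coded 0/1/2 per SNP. It shows a Spearman-correlation filter that keeps the k SNPs with the largest absolute rank correlation with the phenotype (our reading of what a filter called spearcor does; the authors' exact implementation is not given here), the random-selection benchmark, and the summary of per-fold Spearman correlations as median (accuracy) and interquartile range (stability). All function names are hypothetical. The quartile convention of `statistics.quantiles` (default `method="exclusive"`) is also an assumption; other software may compute the IQR slightly differently.

```python
import random
import statistics


def rank(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # e.g. a monomorphic SNP carries no ranking information
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman's rho = Pearson correlation of the ranks."""
    return pearson(rank(x), rank(y))


def spearcor_filter(genotypes, phenotype, k):
    """Hypothetical spearcor-style filter: indices of the k SNPs with the
    largest |rho| with the phenotype. genotypes[i][j] = SNP j of animal i."""
    n_snps = len(genotypes[0])
    scores = [abs(spearman([row[j] for row in genotypes], phenotype))
              for j in range(n_snps)]
    return sorted(range(n_snps), key=lambda j: scores[j], reverse=True)[:k]


def random_filter(n_snps, k, seed=0):
    """Random-selection benchmark: k distinct SNP indices drawn uniformly."""
    return sorted(random.Random(seed).sample(range(n_snps), k))


def kfold_indices(n, k=10, seed=0):
    """Shuffle animal indices and deal them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def median_iqr(fold_rhos):
    """Accuracy (median) and stability (IQR) of per-fold Spearman rhos."""
    q1, _, q3 = statistics.quantiles(fold_rhos, n=4)  # method="exclusive"
    return statistics.median(fold_rhos), q3 - q1
```

In a 10-fold cross-validation the filter would normally be re-run on the training folds only, so that held-out animals do not influence which SNPs are kept; selecting SNPs on the full dataset first tends to inflate the estimated accuracy.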

Keywords:  SNP; feature selection; feed efficiency and growth; genomic prediction; machine learning; pigs; stability

Year:  2021        PMID: 33692825      PMCID: PMC7938892          DOI: 10.3389/fgene.2021.611506

Source DB:  PubMed          Journal:  Front Genet        ISSN: 1664-8021            Impact factor:   4.599


  6 in total

1.  Using machine learning to improve the accuracy of genomic prediction of reproduction traits in pigs.

Authors:  Xue Wang; Shaolei Shi; Guijiang Wang; Wenxue Luo; Xia Wei; Ao Qiu; Fei Luo; Xiangdong Ding
Journal:  J Anim Sci Biotechnol       Date:  2022-05-17

2.  Improvement of Genomic Predictions in Small Breeds by Construction of Genomic Relationship Matrix Through Variable Selection.

Authors:  Enrico Mancin; Lucio Flavio Macedo Mota; Beniamino Tuliozi; Rina Verdiglione; Roberto Mantovani; Cristina Sartori
Journal:  Front Genet       Date:  2022-05-18       Impact factor: 4.772

3.  Objective Phenotyping of Root System Architecture Using Image Augmentation and Machine Learning in Alfalfa (Medicago sativa L.).

Authors:  Zhanyou Xu; Larry M York; Anand Seethepalli; Bruna Bucciarelli; Hao Cheng; Deborah A Samac
Journal:  Plant Phenomics       Date:  2022-04-07

4.  Machine Learning Applied to the Search for Nonlinear Features in Breeding Populations.

Authors:  Iulian Gabur; Danut Petru Simioniuc; Rod J Snowdon; Dan Cristea
Journal:  Front Artif Intell       Date:  2022-05-20

5.  Meta-analysis across Nellore cattle populations identifies common metabolic mechanisms that regulate feed efficiency-related traits.

Authors:  Lucio F M Mota; Samuel W B Santos; Gerardo A Fernandes Júnior; Tiago Bresolin; Maria E Z Mercadante; Josineudson A V Silva; Joslaine N S G Cyrillo; Fábio M Monteiro; Roberto Carvalheiro; Lucia G Albuquerque
Journal:  BMC Genomics       Date:  2022-06-07       Impact factor: 4.547

6.  Machine Learning-Based Radiomics for Prediction of Epidermal Growth Factor Receptor Mutations in Lung Adenocarcinoma.

Authors:  Jiameng Lu; Xiaoqing Ji; Lixia Wang; Yunxiu Jiang; Xinyi Liu; Zhenshen Ma; Yafei Ning; Jie Dong; Haiying Peng; Fei Sun; Zihan Guo; Yanbo Ji; Jianping Xing; Yue Lu; Degan Lu
Journal:  Dis Markers       Date:  2022-05-07       Impact factor: 3.464
