
A comparison of principal component regression and genomic REML for genomic prediction across populations.

Christos Dadousis, Roel F Veerkamp, Bjørg Heringstad, Marcin Pszczola, Mario P L Calus.

Abstract

BACKGROUND: Genomic prediction faces two main statistical problems: multicollinearity and n ≪ p (many fewer observations than predictor variables). Principal component (PC) analysis is a multivariate statistical method that is often used to address these problems. The objective of this study was to compare the performance of PC regression (PCR) for genomic prediction with that of a commonly used REML model with a genomic relationship matrix (GREML) and to investigate the full potential of PCR for genomic prediction.
METHODS: The PCR model used either a common or a semi-supervised approach, where PC were selected based either on their eigenvalues (i.e. the proportion of variance in SNP (single nucleotide polymorphism) genotypes that they explain) or on their association with phenotypic variance in the reference population (i.e. their contribution to the regression sum of squares). Cross-validation within the reference population was used to select the optimum PCR model, defined as the one that minimizes mean squared error. Pre-corrected average daily milk, fat and protein yields of 1609 first-lactation Holstein heifers from Ireland, the UK, the Netherlands and Sweden, which were genotyped with 50k SNPs, were analysed. Each testing subset included animals from only one country, or from only one selection line for the UK.
RESULTS: In general, accuracies of GREML and PCR were similar, but GREML slightly outperformed PCR. Including genotyping information of validation animals in model training (semi-supervised PCR) did not result in more accurate genomic predictions. The highest achievable PCR accuracies were obtained across a wide range of numbers of PC fitted in the regression (from one to more than 1000), across test populations and traits. Using cross-validation within the reference population to derive the number of PC yielded substantially lower accuracies than the highest achievable accuracies obtained across all possible numbers of PC.
CONCLUSIONS: On average, PCR performed only slightly less well than GREML. When the optimal number of PC was determined based on realized accuracy in the testing population, PCR showed a higher potential in terms of achievable accuracy, but this potential was not capitalized on when PC selection was based on cross-validation. A standard approach for selecting the optimal set of PC in PCR remains a challenge.
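
The PCR procedure summarised in METHODS can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`rank_pcs`, `fit_predict_pcr`, `choose_n_pc`), the SVD-based computation, the random fold assignment and the tolerance for discarding near-zero eigenvalues are all assumptions made for illustration. PC are computed from the reference genotypes only, ranked either by eigenvalue or by their contribution to the regression sum of squares, and the number of PC is chosen by cross-validated mean squared error.

```python
import numpy as np

def rank_pcs(scores, eig, yc, by="eigenvalue"):
    """Return PC indices, best first, skipping PC with ~zero eigenvalue."""
    valid = np.flatnonzero(eig > 1e-10 * eig.max())
    if by == "eigenvalue":
        crit = eig[valid]
    elif by == "regression":
        # PC scores are orthogonal, so each PC's regression sum of
        # squares is (t_j' y)^2 / (t_j' t_j), with t_j' t_j = eig_j.
        crit = (scores[:, valid].T @ yc) ** 2 / eig[valid]
    else:
        raise ValueError("by must be 'eigenvalue' or 'regression'")
    return valid[np.argsort(crit)[::-1]]

def fit_predict_pcr(X_ref, y_ref, X_new, n_pc, by="eigenvalue"):
    """Fit PCR on reference animals and predict phenotypes for X_new."""
    mu = X_ref.mean(axis=0)
    U, s, Vt = np.linalg.svd(X_ref - mu, full_matrices=False)
    scores, eig = U * s, s ** 2
    yc = y_ref - y_ref.mean()
    order = rank_pcs(scores, eig, yc, by)[:n_pc]
    # Least-squares coefficients; closed form because PC are orthogonal.
    beta = (scores[:, order].T @ yc) / eig[order]
    # Project new animals onto the loadings of the selected PC.
    scores_new = (X_new - mu) @ Vt[order].T
    return y_ref.mean() + scores_new @ beta

def choose_n_pc(X, y, candidates, n_folds=5, by="eigenvalue", seed=0):
    """Pick the number of PC that minimizes cross-validated MSE."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    mse = []
    for k in candidates:
        err = 0.0
        for test_idx in folds:
            train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
            pred = fit_predict_pcr(X[train_idx], y[train_idx],
                                   X[test_idx], k, by)
            err += np.sum((y[test_idx] - pred) ** 2)
        mse.append(err / len(y))
    return candidates[int(np.argmin(mse))]
```

A call such as `choose_n_pc(X_ref, y_ref, [1, 10, 100])` followed by `fit_predict_pcr(X_ref, y_ref, X_test, k)` mirrors the cross-validation step described above. The semi-supervised variant, in which PC are derived from reference and validation genotypes together, is not shown.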

Year: 2014  PMID: 25370926  PMCID: PMC4220066  DOI: 10.1186/s12711-014-0060-x

Source DB: PubMed  Journal: Genet Sel Evol  ISSN: 0999-193X  Impact factor: 4.297


