Doug Speed, David J Balding.
Abstract
BLUP (best linear unbiased prediction) is widely used to predict complex traits in plant and animal breeding, and increasingly in human genetics. The BLUP mathematical model, which consists of a single random effect term, was adequate when kinships were measured from pedigrees. However, when genome-wide SNPs are used to measure kinships, the BLUP model implicitly assumes that all SNPs have the same effect-size distribution, which is a severe and unnecessary limitation. We propose MultiBLUP, which extends the BLUP model to include multiple random effects, allowing greatly improved prediction when the random effects correspond to classes of SNPs with distinct effect-size variances. The SNP classes can be specified in advance, for example, based on SNP functional annotations, and we also provide an adaptive procedure for determining a suitable partition of SNPs. We apply MultiBLUP to genome-wide association data from the Wellcome Trust Case Control Consortium (seven diseases), and from much larger studies of celiac disease and inflammatory bowel disease, finding that it consistently provides better prediction than alternative methods. Moreover, MultiBLUP is computationally very efficient; for the largest data set, which includes 12,678 individuals and 1.5 M SNPs, the total analysis can be run on a single desktop PC in less than a day and can be parallelized to run even faster. Tools to perform MultiBLUP are freely available in our software LDAK.
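To make the difference between the two models concrete, the following is a minimal sketch of prediction with several random effects, one per SNP class. It is not the LDAK implementation: the function names `kinship` and `multi_blup_predict` are hypothetical, the variance components are assumed to be known (in practice they would be estimated, for example by REML), and the fixed effect is reduced to a plain training-set mean for brevity. Standard BLUP is the special case with a single kinship matrix.

```python
import numpy as np


def kinship(genotypes):
    """Genomic similarity matrix from an (individuals x SNPs) genotype array.

    Assumption for this sketch: each SNP is standardized to mean 0 and
    variance 1, and no SNP is monomorphic (which would give zero variance).
    """
    z = (genotypes - genotypes.mean(axis=0)) / genotypes.std(axis=0)
    return z @ z.T / genotypes.shape[1]


def multi_blup_predict(kinships, variances, noise_var, y, train, test):
    """Predict phenotypes of `test` individuals from `train` individuals.

    kinships  : list of (n x n) similarity matrices, one per SNP class
    variances : assumed-known genetic variance for each SNP class
    noise_var : assumed-known residual variance
    y         : length-n phenotype vector (only y[train] is used)
    """
    mu = y[train].mean()  # simplification: intercept taken as the training mean
    # Phenotypic covariance among training individuals:
    #   V = sum_m variances[m] * K_m[train, train] + noise_var * I
    V = noise_var * np.eye(len(train))
    # Genetic covariance between test and training individuals.
    C = np.zeros((len(test), len(train)))
    for K, s2 in zip(kinships, variances):
        V += s2 * K[np.ix_(train, train)]
        C += s2 * K[np.ix_(test, train)]
    # Conditional mean of the test genetic values given the training data.
    return mu + C @ np.linalg.solve(V, y[train] - mu)
```

Passing a single kinship matrix built from all SNPs reproduces the single-random-effect model. Passing one matrix per SNP class, each with its own variance, lets classes with larger effect-size variance contribute more to the prediction, which is the idea the abstract describes.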
Year: 2014 PMID: 24963154 PMCID: PMC4158754 DOI: 10.1101/gr.169375.113
Source DB: PubMed Journal: Genome Res ISSN: 1088-9051 Impact factor: 9.043
Figure 1. Prediction performance of BLUP and MultiBLUP on simulated quantitative traits. The two plots correspond to unrelated humans (left) and related mice (right). They show across 50 repetitions the correlation between predicted and observed phenotypes in the test set for BLUP (white boxes) and MultiBLUP (shaded boxes). The x-axis indexes the simulation scenarios, with increasing heterogeneity of effect sizes across the five regions. Here, MultiBLUP uses five GSMs, one for each region. Within each plot, the true (simulated) heritability is 0.5 (left half) or 0.8 (right half).
Prediction of case/control status for WTCCC1 human traits
Prediction of case/control status for celiac disease and inflammatory bowel disease