Hanni P Kärkkäinen, Mikko J Sillanpää.
Abstract
Because of the increased availability of genome-wide sets of molecular markers along with reduced cost of genotyping large samples of individuals, genomic estimated breeding values have become an essential resource in plant and animal breeding. Bayesian methods for breeding value estimation have proven to be accurate and efficient; however, the ever-increasing data sets are placing heavy demands on the parameter estimation algorithms. Although a commendable number of fast estimation algorithms are available for Bayesian models of continuous Gaussian traits, there is a shortage of such algorithms for corresponding models of discrete or censored phenotypes. In this work, we consider a threshold approach to binary, ordinal, and censored Gaussian observations for Bayesian multilocus association models and Bayesian genomic best linear unbiased prediction and present a high-speed generalized expectation maximization algorithm for parameter estimation under these models. We demonstrate our method with simulated and real data. Our example analyses suggest that the use of the extra information present in an ordered categorical or censored Gaussian data set, instead of dichotomizing the data into case-control observations, increases the accuracy of genomic breeding values predicted by Bayesian multilocus association models or by Bayesian genomic best linear unbiased prediction. Furthermore, the example analyses indicate that the correct threshold model is more accurate than the directly used Gaussian model with censored Gaussian data, while with binary or ordinal data the superiority of the threshold model could not be confirmed.
Keywords: G-BLUP; GenPred; binary; censored Gaussian; genomic selection; multilocus association model; ordinal; shared data resources; threshold model
Year: 2013 PMID: 23821618 PMCID: PMC3755911 DOI: 10.1534/g3.113.007096
Source DB: PubMed Journal: G3 (Bethesda) ISSN: 2160-1836 Impact factor: 3.154
Figure 1. Hierarchical structure of the model framework. The ellipses indicate random parameters and rectangles fixed values, whereas the round-cornered rectangle representing the Gaussian phenotype may be either, depending on whether the threshold module is included in the model. Solid arrows indicate statistical dependency and dashed arrows functional relationship. The background boxes indicate the main modules of the model framework.
Hyperprior selection for the Bayesian LASSO
| Data/Model | Binary 50% | Binary 80% | Ordinal Even | Ordinal Odd | Censored 20% | Censored 50% | Censored 80% | Gaussian |
|---|---|---|---|---|---|---|---|---|
| QTL-MAS | ||||||||
| TH | 0.20/0.81 | 0.20/0.75 | 0.20/0.86 | 0.20/0.82 | 0.20/0.86 | 0.20/0.85 | 0.20/0.81 | − |
| 0.50/0.81 | 0.50/0.88 | − | ||||||
| 1.00/0.84 | 1.00/0.87 | 1.00/0.84 | 1.00/0.87 | 1.00/0.81 | − | |||
| 2.00/0.82 | 2.00/0.80 | 2.00/0.85 | 2.00/0.81 | 2.00/0.87 | 2.00/0.86 | 2.00/0.78 | − | |
| G | 0.05/0.84 | 0.02/0.79 | 0.10/0.85 | 0.10/0.80 | 0.05/0.84 | 0.05/0.85 | 0.01/0.76 | 0.10/0.87 |
| 0.20/0.87 | 0.20/0.84 | 0.10/0.87 | 0.02/0.79 | 0.20/0.88 | ||||
| 0.10/0.84 | 0.10/0.80 | 0.25/0.83 | ||||||
| 0.20/0.81 | 0.20/0.75 | 0.60/0.87 | 0.60/0.82 | 0.50/0.86 | 0.50/0.79 | 0.06/0.77 | 0.60/0.88 | |
| Pig | ||||||||
| TH | 1.00/0.51 | 3.00/0.48 | 1.00/0.56 | 1.00/0.55 | 1.00/0.56 | 1.00/0.54 | 1.00/0.47 | − |
| 3.00/0.59 | 3.00/0.59 | − | ||||||
| 5.00/0.55 | 7.00/0.49 | 5.00/0.55 | 5.00/0.57 | 5.00/0.48 | − | |||
| 7.00/0.54 | 9.00/0.48 | 7.00/0.58 | 7.00/0.54 | 7.00/0.59 | 7.00/0.56 | 7.00/0.48 | − | |
| G | 0.10/0.51 | 0.10/0.46 | 0.50/0.56 | 1.00/0.55 | 0.20/0.55 | 0.20/0.53 | 0.04/0.41 | 0.50/0.59 |
| 0.20/0.55 | 1.00/0.59 | 1.50/0.56 | 0.40/0.58 | |||||
| 0.20/0.47 | 0.40/0.54 | 0.06/0.41 | 1.50/0.59 | |||||
| 0.40/0.53 | 0.25/0.46 | 2.00/0.57 | 2.50/0.56 | 0.80/0.59 | 0.50/0.51 | 0.07/0.41 | 2.00/0.57 | |
Different values given for the scale parameter ξ of the gamma hyperprior for the LASSO parameter, and the corresponding average accuracy of the genomic breeding value estimates (ξ/accuracy) within the 100 QTL-MAS data replicates and the 10 cross-validation partitions of the pig data set. The boldface values are the ones selected for the analyses. “Model” refers to the model type used, TH being the correct threshold model and G the linear Gaussian model used directly. The accuracy in the pig data is computed as the correlation between the estimated genomic breeding values and the Gaussian phenotypes, divided by the square root of the predetermined heritability 0.62. The “Binary” phenotype has either 50% or 80% success probability. The class sizes of the “Ordinal” phenotype are “Even,” 20:30:30:20%, and “Odd,” 70:10:10:10%. The percentage of censored observations in the “Censored” phenotype is 20%, 50%, or 80%. “Gaussian” refers to the original fully observed Gaussian phenotype.
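The data types named in the caption above can be illustrated by cutting a fully observed Gaussian phenotype at empirical quantiles. The following is only a minimal sketch, not the authors' procedure: the function names are hypothetical, the use of sample quantiles as thresholds is an assumption, and right-censoring (replacing the largest values by the censoring limit) is also assumed, because the caption gives the censored percentage but not the direction.

```python
import numpy as np

def to_binary(y, success_prob):
    """Code the top `success_prob` fraction of y as 1 (success), the rest as 0."""
    threshold = np.quantile(y, 1.0 - success_prob)
    return (y > threshold).astype(int)

def to_ordinal(y, class_fractions):
    """Cut y into ordered classes 0..K-1 whose sizes follow `class_fractions`."""
    # Interior cut points sit at the cumulative class fractions.
    cuts = np.quantile(y, np.cumsum(class_fractions)[:-1])
    return np.digitize(y, cuts)

def to_censored(y, censored_fraction):
    """Right-censor (assumed direction) the top `censored_fraction` of y.

    Returns the observed values and a boolean mask of censored records.
    """
    limit = np.quantile(y, 1.0 - censored_fraction)
    is_censored = y > limit
    return np.where(is_censored, limit, y), is_censored

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    y = rng.normal(size=1000)                       # toy Gaussian phenotype
    binary_80 = to_binary(y, 0.80)                  # "Binary", 80% success
    ordinal_even = to_ordinal(y, [0.2, 0.3, 0.3, 0.2])   # "Even" classes
    ordinal_odd = to_ordinal(y, [0.7, 0.1, 0.1, 0.1])    # "Odd" classes
    observed, mask = to_censored(y, 0.50)           # "Censored", 50%
    print(binary_80.mean(), np.bincount(ordinal_even) / y.size, mask.mean())
```

Deriving every data type from the same Gaussian vector keeps the comparison between data types tied to the amount of information retained rather than to differences between samples.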
Prior selection for the Bayesian G-BLUP
| Data/Model | Binary 50% | Binary 80% | Ordinal Even | Ordinal Odd | Censored 20% | Censored 50% | Censored 80% | Gaussian |
|---|---|---|---|---|---|---|---|---|
| QTL-MAS | ||||||||
| TH | 400/0.74 | 400/0.71 | 400/0.78 | 400/0.74 | 400/0.78 | 400/0.77 | 400/0.72 | − |
| 800/0.75 | 800/0.72 | 800/0.75 | − | |||||
| 1000/0.79 | 1000/0.80 | 1000/0.78 | 1000/0.73 | − | ||||
| 1200/0.75 | 1200/0.72 | 1200/0.79 | 1200/0.75 | 1200/0.79 | 1200/0.78 | 1200/0.73 | − | |
| G | 25/0.70 | 10/0.64 | 200/0.77 | 200/0.73 | 100/0.76 | 50/0.74 | 10/0.67 | 200/0.78 |
| 50/0.74 | 25/0.71 | 400/0.79 | 200/0.79 | 100/0.77 | ||||
| 600/0.74 | 50/0.70 | 500/0.80 | ||||||
| 200/0.74 | 100/0.71 | 800/0.79 | 800/0.73 | 600/0.79 | 400/0.75 | 100/0.66 | 600/0.80 | |
| Pig | ||||||||
| TH | 400/0.56 | 400/0.54 | 400/0.60 | 400/0.59 | 400/0.61 | 400/0.60 | 400/0.55 | − |
| 800/0.58 | 800/0.55 | 800/0.62 | 800/0.62 | 800/0.61 | − | |||
| 1000/0.60 | 1000/0.55 | − | ||||||
| 1600/0.56 | 1200/0.55 | 1200/0.62 | 1200/0.59 | 1200/0.62 | 1200/0.61 | 1200/0.55 | − | |
| G | 25/0.55 | 10/0.49 | 200/0.60 | 200/0.56 | 100/0.59 | 50/0.57 | 5/0.47 | 200/0.62 |
| 50/0.57 | 25/0.54 | 400/0.59 | ||||||
| 600/0.60 | 300/0.61 | 150/0.57 | 25/0.49 | 500/0.62 | ||||
| 200/0.34 | 100/0.51 | 800/0.41 | 800/0.59 | 400/0.49 | 200/0.49 | 50/0.44 | 600/0.44 | |
Different values given for the scale parameter τ² of the inverse-χ² prior for the polygene variance, and the corresponding average accuracy of the genomic breeding value estimates (τ²/accuracy) within the 100 QTL-MAS data replicates and the 10 cross-validation partitions of the pig data set. The boldface values are the ones selected for the analyses. The column “Model” refers to the model type used, TH being the correct threshold model and G the linear Gaussian model used directly. The accuracy in the pig data is computed as the correlation between the estimated genomic breeding values and the Gaussian phenotypes, divided by the square root of the predetermined heritability 0.62. The “Binary” phenotype has either 50% or 80% success probability. The class sizes of the “Ordinal” phenotype are “Even,” 20:30:30:20%, and “Odd,” 70:10:10:10%. The percentage of censored observations in the “Censored” phenotype is 20%, 50%, or 80%. “Gaussian” refers to the original fully observed Gaussian phenotype.
Model accuracy
| Data/Model | Binary 50% | Binary 80% | Ordinal Even | Ordinal Odd | Censored 20% | Censored 50% | Censored 80% | Gaussian |
|---|---|---|---|---|---|---|---|---|
| Bayesian LASSO | ||||||||
| QTL-MAS | ||||||||
| TH | 0.85 ± 0.02 | 0.82 ± 0.02 | 0.88 ± 0.01 | 0.85 ± 0.02 | 0.88 ± 0.01 | 0.87 ± 0.01 | 0.83 ± 0.02 | − |
| G | 0.85 ± 0.02 | 0.82 ± 0.02 | 0.88 ± 0.01 | 0.84 ± 0.02 | 0.88 ± 0.01 | 0.86 ± 0.02 | 0.80 ± 0.03 | 0.89 ± 0.01 |
| Pig | ||||||||
| TH | 0.55 ± 0.03 | 0.49 ± 0.05 | 0.59 ± 0.03 | 0.56 ± 0.04 | 0.60 ± 0.03 | 0.57 ± 0.05 | 0.49 ± 0.04 | − |
| G | 0.55 ± 0.03 | 0.48 ± 0.05 | 0.59 ± 0.03 | 0.56 ± 0.04 | 0.59 ± 0.03 | 0.54 ± 0.05 | 0.42 ± 0.05 | 0.61 ± 0.03 |
| Bayesian G-BLUP | ||||||||
| QTL-MAS | ||||||||
| TH | 0.75 ± 0.02 | 0.72 ± 0.03 | 0.79 ± 0.02 | 0.75 ± 0.02 | 0.80 ± 0.02 | 0.78 ± 0.02 | 0.74 ± 0.02 | − |
| G | 0.75 ± 0.02 | 0.72 ± 0.03 | 0.79 ± 0.02 | 0.74 ± 0.02 | 0.79 ± 0.02 | 0.77 ± 0.02 | 0.71 ± 0.03 | 0.80 ± 0.02 |
| Pig | ||||||||
| TH | 0.58 ± 0.04 | 0.55 ± 0.05 | 0.62 ± 0.04 | 0.60 ± 0.04 | 0.62 ± 0.04 | 0.61 ± 0.04 | 0.55 ± 0.05 | − |
| G | 0.57 ± 0.04 | 0.54 ± 0.05 | 0.61 ± 0.04 | 0.59 ± 0.04 | 0.61 ± 0.04 | 0.58 ± 0.04 | 0.50 ± 0.05 | 0.63 ± 0.04 |
Correlation coefficients (± 1 SD) between the true and estimated genomic breeding values in the 100 replicates of the QTL-MAS data set and the 10 cross-validation partitions of the pig data set. “Model” refers to the model type used, TH being the correct threshold model and G the linear Gaussian model used directly. The correlation in the pig data is computed as correlation between the estimated genomic breeding values and the Gaussian phenotypes, divided by the square root of the predetermined heritability 0.62. The “Binary” phenotype has either 50% or 80% success probability. The class sizes of the “Ordinal” phenotype are “Even,” 20:30:30:20% and “Odd,” 70:10:10:10%. The percentage of censored observations in the “Censored” phenotype is 20%, 50%, or 80%. “Gaussian” refers to the original fully observed Gaussian phenotype.
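The accuracy measure in the caption above, correlation with the Gaussian phenotype divided by the square root of the heritability, can be sketched as follows. This is an illustrative sketch under stated assumptions: the function names are hypothetical, Pearson correlation is assumed, and the sample standard deviation (`ddof=1`) is assumed for the "± 1 SD" summary, since the caption does not specify the estimator.

```python
import numpy as np

def scaled_accuracy(gebv, phenotype, heritability):
    """Correlation between estimated breeding values and the Gaussian
    phenotype, divided by sqrt(heritability) as in the pig data caption."""
    r = np.corrcoef(gebv, phenotype)[0, 1]   # Pearson correlation (assumed)
    return r / np.sqrt(heritability)

def summarize(accuracies):
    """Mean and sample SD (ddof=1, assumed) over replicates or partitions."""
    a = np.asarray(accuracies, dtype=float)
    return a.mean(), a.std(ddof=1)

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    h2 = 0.62                                  # heritability from the caption
    per_partition = []
    for _ in range(10):                        # 10 toy cross-validation partitions
        g = rng.normal(scale=np.sqrt(h2), size=500)        # toy true breeding values
        y = g + rng.normal(scale=np.sqrt(1 - h2), size=500)  # toy phenotype
        noisy_gebv = g + rng.normal(scale=0.5, size=500)   # toy imperfect estimate
        per_partition.append(scaled_accuracy(noisy_gebv, y, h2))
    mean_acc, sd_acc = summarize(per_partition)
    print(f"{mean_acc:.2f} ± {sd_acc:.2f}")
```

The division by the square root of the heritability rescales a phenotype-based correlation so that a perfect predictor of the breeding value would score close to 1, which makes the pig data values roughly comparable to the true-versus-estimated correlations available for the simulated QTL-MAS data.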