Ning Gao, Jinyan Teng, Shaopan Ye, Xiaolong Yuan, Shuwen Huang, Hao Zhang, Xiquan Zhang, Jiaqi Li, Zhe Zhang.
Abstract
In recent years, a series of methods for genomic prediction (GP) have been established, and the advantages of GP over pedigree-based best linear unbiased prediction (BLUP) have been reported. However, most previously proposed GP models are based purely on mathematical considerations and seldom take the abundant biological knowledge into account. The predictive ability of those models depends largely on the consistency between their statistical assumptions and the underlying genetic architecture of the traits of interest. In this study, gene annotation information was incorporated into GP models by constructing haplotypes from SNPs mapped to genic regions. Haplotype allele similarity between pairs of individuals was measured with different approaches at the single-gene level and then converted to the whole-genome level; the resulting similarity matrix was treated as a special kernel and used in kernel-based GP models. The results showed that the gene annotation guided methods gave higher or at least comparable predictive ability for some traits, especially in the Arabidopsis dataset and the rice breeding population. Compared with SNP models and haplotype models without gene annotation, the gene annotation based models improved predictive ability by 0.56–26.67% in Arabidopsis and by 1.62–16.53% in the rice breeding population. In a chicken population, however, incorporating gene annotation slightly improved predictive ability for several traits but gave no extra gain for the remaining traits. In conclusion, integrating gene annotation into GP models could be beneficial for some traits, species, and populations compared with SNP models and haplotype models without gene annotation. However, more studies are needed to investigate the characteristics of these gene annotation guided models.
Keywords: complex phenotypes; gene annotation; genomic prediction; genomic selection; haplotype models
Year: 2018 PMID: 30233646 PMCID: PMC6127733 DOI: 10.3389/fgene.2018.00364
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
Genotype matrix of five individuals and 10 consecutive markers from a protein-coding gene.
| Individual | Haplotype | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Paternal | 1 | 0 | 1 | 1 | 1 | 1 | 0 | 1 | 0 | 0 |
| 1 | Maternal | 0 | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 0 |
| 2 | Paternal | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 |
| 2 | Maternal | 1 | 0 | 1 | 1 | 1 | 1 | 0 | 1 | 0 | 0 |
| 3 | Paternal | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 1 |
| 3 | Maternal | 0 | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 0 |
| 4 | Paternal | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 |
| 4 | Maternal | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 |
| 5 | Paternal | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 |
| 5 | Maternal | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 1 |
A haplotype block containing four haplotype alleles is defined by these 10 consecutive markers from a protein-coding gene.
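As an illustration of the table above, the following minimal Python sketch labels the distinct haplotype alleles and computes one possible pairwise similarity between individuals. The similarity used here (the fraction of identical haplotype pairs among the four paternal/maternal comparisons) is an assumption chosen for simplicity; this section does not specify which measures the study used, and the names `allele_id` and `shared_allele_similarity` are hypothetical.

```python
from itertools import product

# Phased haplotypes from the table: rows 2*i and 2*i+1 are the paternal and
# maternal haplotypes of individual i+1, over 10 consecutive markers.
rows = [
    (1, 0, 1, 1, 1, 1, 0, 1, 0, 0),
    (0, 1, 1, 1, 0, 0, 1, 0, 1, 0),
    (0, 0, 0, 1, 0, 0, 1, 0, 1, 0),
    (1, 0, 1, 1, 1, 1, 0, 1, 0, 0),
    (1, 1, 1, 0, 0, 0, 1, 1, 0, 1),
    (0, 1, 1, 1, 0, 0, 1, 0, 1, 0),
    (0, 0, 0, 1, 0, 0, 1, 0, 1, 0),
    (0, 0, 0, 1, 0, 0, 1, 0, 1, 0),
    (0, 0, 0, 1, 0, 0, 1, 0, 1, 0),
    (1, 1, 1, 0, 0, 0, 1, 1, 0, 1),
]

# Give each distinct marker string an integer label (a "haplotype allele"),
# numbered in order of first appearance.
allele_id = {}
for hap in rows:
    allele_id.setdefault(hap, len(allele_id))
n_alleles = len(allele_id)

# Each individual is represented by its (paternal, maternal) allele labels.
individuals = [
    (allele_id[rows[2 * i]], allele_id[rows[2 * i + 1]])
    for i in range(len(rows) // 2)
]


def shared_allele_similarity(a, b):
    """Assumed measure: share of the 4 cross comparisons that are identical."""
    return sum(x == y for x, y in product(a, b)) / 4.0


# Gene-level similarity matrix between all pairs of individuals.
similarity = [
    [shared_allele_similarity(a, b) for b in individuals]
    for a in individuals
]

print(n_alleles)  # 4 distinct haplotype alleles, matching the caption
for row in similarity:
    print(row)
```

Under this assumed measure, a genome-level matrix could be formed, for example, by averaging such gene-level matrices over all annotated genes; that aggregation rule is likewise an assumption and is not taken from the text.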
Description of the datasets.
| Rice | 315 | 58,227 | 44,831 | 22,509 | 25,453 | |
| Arabidopsis | 349 | 208,481 | 193,646 | 27,169 | 167,837 | |
| Chicken | 435 | 408,715 | 233,417 | 17,686 | 45,470 |
Denoted “the number.”
Pearson's correlation between observed and predicted phenotypes in the rice breeding population (Mean ± SE).
| DS_PH | 0.486 ± 0.007 | 0.498 ± 0.007 | 0.493 ± 0.007 | 0.501 ± 0.007 | 0.498 ± 0.007 | 0.503 ± 0.007 | |
| DS_FLW | 0.534 ± 0.005 | 0.555 ± 0.005 | 0.530 ± 0.005 | 0.552 ± 0.005 | 0.540 ± 0.005 | 0.553 ± 0.005 | |
| DS_YLD | 0.289 ± 0.006 | 0.285 ± 0.006 | 0.286 ± 0.006 | 0.312 ± 0.006 | 0.286 ± 0.006 | 0.311 ± 0.006 | |
| WS_PH | 0.482 ± 0.006 | 0.496 ± 0.005 | 0.489 ± 0.006 | 0.507 ± 0.005 | 0.492 ± 0.006 | 0.509 ± 0.005 | |
| WS_FLW | 0.467 ± 0.007 | 0.487 ± 0.006 | 0.465 ± 0.006 | 0.491 ± 0.006 | 0.474 ± 0.006 | 0.492 ± 0.006 | |
| WS_YLD | 0.258 ± 0.007 | 0.242 ± 0.007 | 0.268 ± 0.008 | 0.264 ± 0.007 | 0.256 ± 0.007 | 0.280 ± 0.008 |
For each trait (row), the values in boldface indicate the best prediction among all models. DS, dry season; WS, wet season; PH, plant height; FLW, flowering time; YLD, grain yield.
Genomic best linear unbiased prediction (VanRaden, .
Haplotype similarity based models without gene annotation. HAPI, HAP1, and Hap2 differ in how haplotype similarity is evaluated.
~|GA denotes gene annotation guided GP models.
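The tables report predictive ability as the Pearson correlation between observed and predicted phenotypes, summarized as mean ± SE. A minimal sketch of how such a summary, and a relative improvement like those quoted in the abstract, might be computed is given below. It assumes the SE is the standard error of the mean over repeated validation runs, which this section does not state; the function names are hypothetical.

```python
import math
import statistics


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mean_se(values):
    """Mean and standard error of the mean (assumed meaning of 'Mean ± SE')."""
    mean = statistics.fmean(values)
    se = statistics.stdev(values) / math.sqrt(len(values))
    return mean, se


def relative_gain(baseline, candidate):
    """Percentage improvement of `candidate` over `baseline`."""
    return 100.0 * (candidate - baseline) / baseline


# Two values taken from the DS_YLD row above, chosen only to illustrate
# the arithmetic of a relative improvement.
gain = relative_gain(0.289, 0.312)
print(f"{gain:.2f}%")
```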
Pearson's correlation between observed and predicted phenotypes in the Arabidopsis population (Mean ± SE).
| LAbv | 0.163 ± 0.009 | 0.161 ± 0.009 | 0.170 ± 0.009 | 0.164 ± 0.009 | 0.166 ± 0.009 | 0.174 ± 0.009 | |
| LAav | 0.201 ± 0.006 | 0.205 ± 0.006 | 0.200 ± 0.006 | 0.209 ± 0.006 | 0.201 ± 0.006 | 0.208 ± 0.006 | |
| PH1S | 0.191 ± 0.005 | 0.196 ± 0.005 | 0.190 ± 0.005 | 0.213 ± 0.005 | 0.191 ± 0.005 | 0.211 ± 0.005 | |
| TPH | 0.185 ± 0.007 | 0.183 ± 0.007 | 0.175 ± 0.007 | 0.181 ± 0.007 | 0.185 ± 0.007 | 0.179 ± 0.007 | |
| MSB | 0.340 ± 0.004 | 0.346 ± 0.004 | 0.337 ± 0.004 | 0.346 ± 0.004 | 0.337 ± 0.004 | 0.348 ± 0.004 | |
| RB | 0.281 ± 0.006 | 0.283 ± 0.007 | 0.281 ± 0.007 | 0.277 ± 0.006 | 0.282 ± 0.006 | 0.276 ± 0.006 | |
| LL | 0.356 ± 0.006 | 0.355 ± 0.005 | 0.353 ± 0.005 | 0.356 ± 0.006 | 0.357 ± 0.005 | ||
| PL | 0.303 ± 0.006 | 0.301 ± 0.006 | 0.301 ± 0.005 | 0.305 ± 0.006 | 0.306 ± 0.006 | 0.307 ± 0.006 | |
| PL/LL | 0.255 ± 0.009 | 0.249 ± 0.009 | 0.237 ± 0.008 | 0.247 ± 0.008 | 0.257 ± 0.009 | 0.245 ± 0.008 | |
| FT | 0.643 ± 0.003 | 0.653 ± 0.003 | 0.642 ± 0.003 | 0.658 ± 0.003 | 0.644 ± 0.003 | 0.660 ± 0.003 | |
| RGRbv | 0.045 ± 0.007 | 0.050 ± 0.007 | 0.042 ± 0.007 | 0.054 ± 0.008 | 0.042 ± 0.007 | 0.054 ± 0.008 | |
| RGRav | 0.184 ± 0.006 | 0.179 ± 0.006 | 0.194 ± 0.006 | 0.184 ± 0.006 | 0.183 ± 0.006 | 0.199 ± 0.006 |
For each trait (row), the values in boldface indicate the best prediction among all models. LAbv, leaf area before vernalization; LAav, leaf area after vernalization; FT, flowering time; PL/LL, petiole to leaf length ratio; PL, petiole length; LL, leaf length; RB, rosette branching; MSB, main stem branching; PH1S, plant height at 1st silique; TPH, total plant height; RGRbv, relative growth rate before vernalization; RGRav, relative growth rate after vernalization.
Genomic best linear unbiased prediction (VanRaden, .
Haplotype similarity based models without gene annotation. HAPI, HAP1, and Hap2 differ in how haplotype similarity is evaluated.
~|GA denotes gene annotation guided GP models.
Pearson's correlation between observed and predicted phenotypes in the yellow chicken population (Mean ± SE).
| ADG | 0.344 ± 0.005 | 0.342 ± 0.004 | 0.345 ± 0.005 | 0.345 ± 0.004 | 0.345 ± 0.005 | 0.345 ± 0.004 | |
| ADFI | 0.437 ± 0.004 | 0.438 ± 0.004 | 0.436 ± 0.004 | 0.439 ± 0.004 | 0.437 ± 0.004 | ||
| MTW | 0.322 ± 0.005 | 0.315 ± 0.004 | 0.314 ± 0.005 | 0.325 ± 0.004 | 0.316 ± 0.005 | 0.326 ± 0.004 | |
| MTMW | 0.322 ± 0.005 | 0.315 ± 0.004 | 0.314 ± 0.005 | 0.325 ± 0.004 | 0.316 ± 0.005 | 0.327 ± 0.004 | |
| RFI | 0.464 ± 0.005 | 0.465 ± 0.005 | 0.466 ± 0.005 | 0.467 ± 0.005 | 0.467 ± 0.005 | ||
| FCR | 0.288 ± 0.004 | 0.274 ± 0.004 | 0.286 ± 0.004 | 0.271 ± 0.004 | 0.288 ± 0.004 | 0.273 ± 0.004 | |
| EWG | 0.253 ± 0.009 | 0.256 ± 0.009 | 0.253 ± 0.009 | 0.256 ± 0.008 | 0.254 ± 0.009 | 0.256 ± 0.009 | |
| EW | 0.253 ± 0.009 | 0.249 ± 0.010 | 0.253 ± 0.009 | 0.250 ± 0.010 | 0.250 ± 0.010 | ||
| BMW | 0.142 ± 0.011 | 0.138 ± 0.011 | 0.144 ± 0.011 | 0.142 ± 0.011 | 0.143 ± 0.011 | 0.141 ± 0.011 | |
| BMP | 0.128 ± 0.011 | 0.128 ± 0.011 | 0.123 ± 0.011 | 0.128 ± 0.011 | 0.129 ± 0.011 | 0.126 ± 0.011 | |
| DW | 0.175 ± 0.010 | 0.172 ± 0.010 | 0.176 ± 0.010 | 0.175 ± 0.010 | 0.174 ± 0.010 | 0.179 ± 0.010 | |
| DP | 0.128 ± 0.011 | 0.128 ± 0.011 | 0.123 ± 0.011 | 0.128 ± 0.011 | 0.129 ± 0.011 | 0.126 ± 0.011 | |
| AFW | 0.108 ± 0.009 | 0.104 ± 0.009 | 0.112 ± 0.009 | 0.110 ± 0.009 | 0.111 ± 0.009 | 0.108 ± 0.009 | |
| AFP | 0.128 ± 0.011 | 0.128 ± 0.011 | 0.123 ± 0.011 | 0.128 ± 0.011 | 0.129 ± 0.011 | 0.126 ± 0.011 | |
| GW | 0.067 ± 0.011 | 0.070 ± 0.010 | 0.066 ± 0.011 | 0.068 ± 0.011 | 0.070 ± 0.011 | 0.067 ± 0.011 | |
| IL | 0.041 ± 0.005 | 0.037 ± 0.005 | 0.043 ± 0.005 | 0.040 ± 0.005 | 0.043 ± 0.005 | 0.039 ± 0.005 | |
| BW45 | 0.307 ± 0.005 | 0.306 ± 0.005 | 0.303 ± 0.005 | 0.302 ± 0.005 | 0.304 ± 0.005 | 0.304 ± 0.005 |
For each trait (row), the values in boldface indicate the best prediction among all models. ADG, average daily gain; ADFI, average daily feed intake; MTW, mid-term body weight; MTMW, mid-term metabolic body weight; RFI, residual feed intake; FCR, feed conversion rate; EWG, eviscerated weight with giblet; EW, eviscerated weight; BMW, breast muscle weight; BMP, breast muscle percentage; DW, drumstick weight; DP, drumstick percentage; AFW, abdominal fat weight; AFP, abdominal fat percentage; GW, gizzard weight; IL, intestine length; BW45, body weight at 45 days.
Genomic best linear unbiased prediction (VanRaden, .
Haplotype similarity based models without gene annotation. HAPI, HAP1, and Hap2 differ in how haplotype similarity is evaluated.
~|GA denotes gene annotation guided GP models.