Dzianis Prakapenka¹, Chunkao Wang¹, Zuoxiang Liang¹, Cheng Bian¹,², Cheng Tan¹,³, Yang Da¹.
Abstract
Haplotype prediction models open many possibilities to improve the accuracy of genomic selection but require more data processing and computing time than single-SNP prediction models. To facilitate haplotype analysis for genomic prediction and estimation using structural and functional genomic information, we developed a computing pipeline that prepares input data for haplotype analysis, performs genomic prediction and estimation using GVCHAP, and analyzes GVCHAP results. Data preparation includes utility programs for haplotype imputation; for defining haplotype blocks by a fixed number of SNPs, a fixed distance in base pairs per block, or user-defined block lengths based on structural or functional genomic information (or a mixture of both types of information); and for defining haplotype genotypes within each haplotype block. GVCHAP is the main program for genomic prediction and estimation. It calculates GREML (genomic restricted maximum likelihood) estimates of variance components and heritabilities, and GBLUP (genomic best linear unbiased prediction) of additive and dominance values of single SNPs and additive values of haplotypes, with reliability estimates for training and validation populations. A two-step strategy and a method of multi-node processing are implemented to remove the computing bottleneck caused by creating genomic relationship matrices for large samples. The analysis of GVCHAP results includes calculation of observed prediction accuracies from validation studies and preparation of input files for graphical visualization of heritability estimates of haplotype blocks, as well as of estimates of SNP effects and heritabilities. The entire pipeline provides an efficient and versatile computing tool for identifying the most accurate haplotype model among many candidate models that use structural and functional genomic information for genomic selection.
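GVCHAP's prediction step is GBLUP. The sketch below illustrates only the textbook single-trait case, under stated assumptions: one additive genomic relationship matrix G, variance components treated as known (GVCHAP itself estimates them by GREML), and no dominance or haplotype terms. The function names `gblup` and `solve` are hypothetical and are not part of GVCHAP.

```python
"""Minimal GBLUP sketch (illustrative, not GVCHAP's implementation).

Model: y = 1*mu + g + e, with g ~ N(0, G*var_g) and e ~ N(0, I*var_e).
Assumption: var_g and var_e are known; GVCHAP estimates them by GREML.
"""
from typing import List, Tuple

Matrix = List[List[float]]


def solve(a: Matrix, b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def gblup(G: Matrix, y: List[float], var_g: float, var_e: float) -> Tuple[float, List[float]]:
    """Return (mu_hat, g_hat): GLS mean and GBLUP genomic values."""
    n = len(y)
    lam = var_e / var_g
    # V / var_g = G + lambda * I; the var_g scaling cancels in mu_hat and g_hat.
    V = [[G[i][j] + (lam if i == j else 0.0) for j in range(n)] for i in range(n)]
    v_inv_1 = solve(V, [1.0] * n)  # V^{-1} 1
    v_inv_y = solve(V, y)          # V^{-1} y
    mu = sum(v_inv_y) / sum(v_inv_1)  # (1'V^{-1}1)^{-1} 1'V^{-1}y
    w = [v_inv_y[i] - mu * v_inv_1[i] for i in range(n)]  # V^{-1}(y - 1 mu)
    g_hat = [sum(G[i][j] * w[j] for j in range(n)) for i in range(n)]  # G V^{-1}(y - 1 mu)
    return mu, g_hat
```

For instance, with G equal to the identity and equal variances, each prediction is the phenotype's deviation from the estimated mean shrunk by one half.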
Keywords: SNP; genomic selection; haplotype; heritability; prediction accuracy
Year: 2020 PMID: 32318093 PMCID: PMC7154123 DOI: 10.3389/fgene.2020.00282
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
FIGURE 1 Structure of the GVCHAP computing pipeline. The pipeline consists of three components: preparation of input data for haplotype analysis, GVCHAP analysis for genomic prediction and estimation, and post-GVCHAP analysis.
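The post-GVCHAP component calculates observed prediction accuracies from validation studies. A common definition, assumed here rather than confirmed as GVCHAP's exact formula, is the Pearson correlation between predicted genomic values and observed phenotypes of the held-out individuals, averaged over the folds of a k-fold validation. The names `pearson` and `mean_fold_accuracy` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mean_fold_accuracy(folds: List[Tuple[Sequence[float], Sequence[float]]]) -> float:
    """Average per-fold accuracy of a k-fold validation (assumed convention).

    Each fold is a (predicted_values, observed_phenotypes) pair for the
    individuals held out in that fold.
    """
    accs = [pearson(pred, obs) for pred, obs in folds]
    return sum(accs) / len(accs)
```

Some studies instead pool the held-out predictions across all folds and compute a single correlation; the two conventions can give slightly different numbers.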
FIGURE 2 Computing pipeline for preparing input files for GVCHAP analysis.
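The data-preparation step defines haplotype blocks (by a fixed number of SNPs or a fixed distance in base pairs) and then haplotype genotypes within each block. The sketch below is a minimal illustration under assumptions of our own: phased haplotypes stored as 0/1 allele strings, SNPs sorted by position, and distance windows aligned to multiples of the window width starting at 0 bp. All function names are hypothetical, not GVCHAP utilities.

```python
from collections import Counter
from typing import Dict, List, Tuple

Block = Tuple[int, int]  # half-open SNP index range [start, end)


def blocks_by_snp_count(n_snps: int, k: int) -> List[Block]:
    """Consecutive blocks of k SNPs; the last block may be shorter."""
    return [(s, min(s + k, n_snps)) for s in range(0, n_snps, k)]


def blocks_by_distance(positions: List[int], width_bp: int) -> List[Block]:
    """Group SNPs (sorted by bp position) into fixed-width windows.

    Assumption: window w covers [w * width_bp, (w + 1) * width_bp);
    windows containing no SNP produce no block.
    """
    blocks: List[Block] = []
    start = 0
    for i in range(1, len(positions) + 1):
        if i == len(positions) or positions[i] // width_bp != positions[start] // width_bp:
            blocks.append((start, i))
            start = i
    return blocks


def haplotype_genotypes(
    phased: List[Tuple[str, str]], block: Block
) -> Tuple[List[str], List[Dict[str, int]]]:
    """Code each individual's genotype in one block as haplotype copy counts.

    phased[i] holds individual i's two phased allele strings, one character
    per SNP. Returns the sorted distinct haplotypes in the block and, per
    individual, copies (1 or 2) of each haplotype it carries; haplotypes
    not listed for an individual have zero copies.
    """
    s, e = block
    pairs = [(h1[s:e], h2[s:e]) for h1, h2 in phased]
    alleles = sorted({h for pair in pairs for h in pair})
    counts = [dict(Counter(pair)) for pair in pairs]
    return alleles, counts
```

User-defined blocks based on structural or functional annotation would simply replace the two block functions with a list of index ranges supplied by the user.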
FIGURE 3 Input and output files of GVCHAP analysis.
Seven prediction models configured by parameters for starting values of variance components in the parameter file and the main results of GBLUP and GREML for each model.
| Model | SNP additive variance | SNP dominance variance | Residual variance | Haplotype additive variance |
|---|---|---|---|---|
| Model 1 | var_snp_a NPV | var_snp_d NPV | var_snp_e NPV | var_hap_a NPV |
| Model 2 | var_snp_a NPV | #var_snp_d | var_snp_e NPV | var_hap_a NPV |
| Model 3 | #var_snp_a | var_snp_d NPV | var_snp_e NPV | var_hap_a NPV |
| Model 4 | #var_snp_a | #var_snp_d | var_snp_e NPV | var_hap_a NPV |
| Model 5 | var_snp_a NPV | var_snp_d NPV | var_snp_e NPV | #var_hap_a |
| Model 6 | var_snp_a NPV | #var_snp_d | var_snp_e NPV | #var_hap_a |
| Model 7 | #var_snp_a | var_snp_d NPV | var_snp_e NPV | #var_hap_a |
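In the model table, a leading `#` on a parameter name appears to comment that line out, so the corresponding variance component drops out of the model (for example, Model 6 would keep only the SNP additive and residual components). The sketch below reads such a configuration under that interpretation. The one-`name value`-pair-per-line format and the function `active_components` are hypothetical; consult the GVCHAP documentation for its actual parameter-file syntax.

```python
from typing import Dict

# Variance-component parameter names taken from the model table.
COMPONENTS = ("var_snp_a", "var_snp_d", "var_snp_e", "var_hap_a")


def active_components(param_text: str) -> Dict[str, float]:
    """Return starting values of the variance components left active.

    Hypothetical format: one 'name value' pair per line; a line whose first
    non-blank character is '#' is commented out, dropping that component.
    """
    active: Dict[str, float] = {}
    for raw in param_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank or commented-out line
        parts = line.split()
        if len(parts) < 2 or parts[0] not in COMPONENTS:
            continue  # not a variance-component line
        active[parts[0]] = float(parts[1])
    return active


# Model 6 from the table, with made-up positive starting values.
model6 = "var_snp_a 0.3\n#var_snp_d 0.1\nvar_snp_e 0.6\n#var_hap_a 0.2\n"
print(active_components(model6))
```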
FIGURE 4 Examples of graphical visualization of SNP heritabilities (left) and haplotype heritabilities (right).
Computing time of GVCHAP using the Mesabi supercomputer.
| Number of individuals (n) | 7,549 | 7,549 | 15,098 | 15,098 | 37,745ᵃ |
|---|---|---|---|---|---|
| Number of SNPs (m) | 82,128 | 328,512 | 328,512 | 657,024 | 328,512 |
| SNP model | 0.12 h | 0.71 h | 0.70 h | 1.32 h | 2.67 h |
| Haplotype model | 1.31 h | 7.88 h | 14.31 h | 22.11 h | ≈3 hᵇ |
| Time per iteration | 19 sᶜ | 37 sᶜ | 82 sᶜ | 62 sᶜ | 22–29 minᶜ (one node) |
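The abstract identifies the creation of genomic relationship matrices for large samples as the computing bottleneck. For n individuals and m SNPs, forming ZZ' takes on the order of n²m multiply-adds, and each block of rows of the matrix can be computed independently, which is one way work can be spread across nodes. The sketch below uses the standard VanRaden (2008) method-1 additive relationship matrix as a stand-in; the source does not state that GVCHAP uses this standardization or this row partitioning, and the function names are hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def allele_freqs(M: Sequence[Sequence[int]]) -> List[float]:
    """Frequency p_j of the counted allele at each SNP (genotypes coded 0/1/2)."""
    n = len(M)
    return [sum(row[j] for row in M) / (2.0 * n) for j in range(len(M[0]))]


def grm_rows(M: Sequence[Sequence[int]], p: List[float], r0: int, r1: int) -> Matrix:
    """Rows r0..r1-1 of G = ZZ' / (2 * sum_j p_j (1 - p_j)), with Z = M - 2p.

    A node computing these rows still needs all of Z, because each row pairs
    one individual with every individual. Fails if every SNP is monomorphic.
    """
    denom = 2.0 * sum(pj * (1.0 - pj) for pj in p)
    Z = [[g - 2.0 * pj for g, pj in zip(row, p)] for row in M]
    return [
        [sum(a * b for a, b in zip(Z[i], Z[k])) / denom for k in range(len(M))]
        for i in range(r0, r1)
    ]


def grm_in_chunks(M: Sequence[Sequence[int]], n_chunks: int) -> Matrix:
    """Assemble the full matrix from n_chunks row blocks (simulating nodes)."""
    p = allele_freqs(M)
    n = len(M)
    step = -(-n // n_chunks)  # ceiling division
    G: Matrix = []
    for r0 in range(0, n, step):
        G.extend(grm_rows(M, p, r0, min(r0 + step, n)))
    return G
```

Because every element is computed by the same expression regardless of the chunking, the assembled matrix is identical for any number of chunks.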
Approximate savings in computing time of GVCHAP from the two-step strategy or multi-node processing (MNP) for a 10-fold validation study, relative to a single node of the Mesabi supercomputer without the two-step strategy or MNP.
| Number of individuals (n) | 7,549 | 15,098 | 15,098 |
|---|---|---|---|
| Number of SNPs (m) | 328,512 | 328,512 | 657,024 |
| 10-fold validation, 1 model per trait | 3.22 days | 6 days | 9 days |
| 10-fold validation, 10 models per trait | 32.2 days | 60 days | 90 days |
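The last row of the table is the per-model saving multiplied by the number of candidate models, since each model requires its own 10-fold validation. The short sketch below reproduces that arithmetic from the table values; the `n_traits` multiplier is a hypothetical extension for studies that analyze several traits, assuming each trait costs the same.

```python
# Per-model saving (days) for one trait in a 10-fold validation, from the table,
# keyed by (number of individuals, number of SNPs).
per_model_days = {
    (7549, 328_512): 3.22,
    (15_098, 328_512): 6.0,
    (15_098, 657_024): 9.0,
}


def total_savings(days_per_model: float, n_models: int, n_traits: int = 1) -> float:
    """Total saving when every model is validated separately for every trait.

    Assumption: cost scales linearly with models and traits.
    """
    return days_per_model * n_models * n_traits


for (n, m), d in per_model_days.items():
    print(f"n={n}, m={m}: {total_savings(d, 10):.1f} days for 10 models")
```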