Heather Manching, Randall J Wisser.
Abstract
MOTIVATION: Ancestral haplotype maps provide useful information about genomic variation and insights into biological processes. Reconstructing the descendent haplotype structure of homologous chromosomes, particularly for large numbers of individuals, can help with characterizing the recombination landscape, elucidating genotype-to-phenotype relationships, improving genomic predictions and more. Inferring haplotype maps from sparse genotype data is an efficient approach to whole-genome haplotyping, but this is a non-trivial problem. A standardized approach is needed to validate whether haplotype reconstruction software, conceived population designs and existing data for a given population provide accurate haplotype information for further inference.
Year: 2021 PMID: 32840564 PMCID: PMC8097754 DOI: 10.1093/bioinformatics/btaa749
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. Overview of SPEARS. (A) Steps in the pipeline, file inputs/outputs and standard metrics. The pipeline begins with the creation of a simulated population (n virtual genotypes) using SAEGUS, based on a user-provided population design, genetic map and parental genotype data. Genotyping errors and missing data are induced in the simulated data, which are then processed via MaCH (imputation) and RABBIT (ancestral haplotype reconstruction). Highlighted in gray are the main steps of the pipeline and the files used to compute SPEARS metrics. (B) Diplotypes (phased genotypes) for a genomic segment, illustrating how each summary statistic is calculated. Each of the three colors represents a distinct parent-of-origin. Heterozygous genotypes are shown in white text. The top diplotype represents known data that would be generated by simulation. The bottom diplotype represents inferred data that would be generated by RABBIT as part of the SPEARS pipeline. The formulas and variables used to calculate each metric are shown. CCC is computed as the correlation between known and inferred crossover events for all individuals and is therefore not shown for the example diplotype. The example shows five crossovers for the known individual and four crossovers for the inferred individual.
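The crossover comparison described in panel (B) can be illustrated with a minimal sketch. This is not the SPEARS implementation: it assumes each haplotype is a string of parent-of-origin labels at consecutive markers, that a crossover is any change of label along a haplotype, and that CCC is a Pearson correlation of per-individual crossover counts (the caption says only "correlation"). The function names and all example data are hypothetical.

```python
from math import sqrt


def count_crossovers(haplotype):
    """Count changes in parent-of-origin label along one haplotype."""
    return sum(1 for a, b in zip(haplotype, haplotype[1:]) if a != b)


def diplotype_crossovers(diplotype):
    """Sum the crossover counts over both haplotypes of a diplotype."""
    return sum(count_crossovers(h) for h in diplotype)


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (assumed form of CCC)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined when a series is constant")
    return cov / sqrt(vx * vy)


# Hypothetical diplotypes with three parents (A, B, C), chosen to mirror the
# caption's example: five known crossovers versus four inferred crossovers.
known = ("AAABBBCCAA", "BBBBCCCAAA")
inferred = ("AAABBBBBAA", "BBBBCCCAAA")
known_count = diplotype_crossovers(known)        # 3 + 2 = 5
inferred_count = diplotype_crossovers(inferred)  # 2 + 2 = 4

# Hypothetical per-individual crossover counts for a small population.
known_counts = [5, 3, 4, 6]
inferred_counts = [4, 3, 4, 5]
ccc = pearson(known_counts, inferred_counts)
print(known_count, inferred_count, round(ccc, 4))
```

Because the statistic is computed across individuals, a single diplotype pair such as the one in the figure contributes only one point to the correlation, which is why the caption does not report CCC for the example.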