
Data supporting the high-accuracy haplotype imputation using unphased genotype data as the references.

Wenzhi Li, Wei Xu, Shaohua He, Li Ma, Qing Song.

Abstract

The data presented in this article relate to the research article entitled "High-accuracy haplotype imputation using unphased genotype data as the references", which reports that unphased genotype data can be used as the reference for haplotype imputation [1]. This article describes the pipelines used to generate three different implementations (A, B and C) and reports performance comparisons among those implementations and between HiFi and three major imputation software tools. Our data showed that the three implementations are similar in accuracy, with implementation B slightly but consistently more accurate than A and C. HiFi performed better on haplotype imputation accuracy, whereas the three other tools performed slightly better on genotype imputation accuracy. These data may provide a strategy for choosing the optimal phasing pipeline and software for different studies.


Year:  2016        PMID: 27595130      PMCID: PMC4995474          DOI: 10.1016/j.dib.2016.06.029

Source DB:  PubMed          Journal:  Data Brief        ISSN: 2352-3409


Value of the data

These data are beneficial to researchers who are interested in haplotyping. The data may provide guidance on how to choose the optimal phasing pipeline.

These data are beneficial to researchers who are interested in imputation and in comparing HiFi with three major phasing software tools (MACH, IMPUTE2 and BEAGLE) on accuracy and speed. The data may provide guidance on how to choose suitable software for different studies.

These data help compare HiFi and three major phasing software tools (MACH, IMPUTE2 and BEAGLE) in their tolerance of statistical reference panels.

Data

The data presented are summaries of two comparisons: HiFi performance with three different implementations (A, B and C), and the performance of HiFi and three standard imputation software tools with a molecular reference and a statistical reference. The data showed that the accuracy of implementation B is slightly but consistently higher than that of A and C. They also showed that HiFi performed better on haplotype imputation accuracy and speed, whereas the three other tools performed slightly better on genotype imputation.

Experimental design, materials and methods

Acquisition and processing of HapMap data for different implementations

We downloaded chromosome 1 genotype and haplotype data for the CEU population (CEPH; Utah residents with ancestry from northern and western Europe) from HapMap in text format [5], [6]. We used the original haplotype data as the molecular reference. To generate the statistical haplotype reference panel, we erased the phase information from the trio haplotypes downloaded from HapMap and then used Beagle version 3.3.2 to resolve the haplotypes from the unphased genotypes. We then generated the following three implementations with Beagle version 3.3.2: (A) Beagle statistical phasing of unrelated persons and Mendelian-inheritance-based phasing of trios, with the results then pooled; (B) Beagle statistical phasing of pooled unrelated persons and trios, treating all samples as unrelated; and (C) Beagle statistical phasing of pooled unrelated persons and trios, with the family structure specified in the input. We chose the same six samples [2] for further analysis.
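
As a rough illustration of two ideas in the paragraph above, the sketch below first erases phase from a pair of haplotypes and then applies a simple Mendelian rule to re-phase a child within a trio. It is a minimal sketch under stated assumptions, not the procedure used by Beagle or by the authors: the function names `erase_phase` and `phase_child_site`, the single-character allele encoding, and the toy sequences are all hypothetical.

```python
from typing import List, Optional, Tuple

Genotype = Tuple[str, str]


def erase_phase(hap_a: str, hap_b: str) -> List[Genotype]:
    """Collapse two phased haplotypes into unphased genotypes.

    Each site becomes an alphabetically sorted allele pair, so the
    information about which allele sat on which haplotype is discarded.
    (Hypothetical helper; one character per site is an assumption.)
    """
    if len(hap_a) != len(hap_b):
        raise ValueError("haplotypes must cover the same sites")
    return [tuple(sorted((a, b))) for a, b in zip(hap_a, hap_b)]


def phase_child_site(child: Genotype, father: Genotype,
                     mother: Genotype) -> Optional[Genotype]:
    """Phase one site of a child from trio genotypes by Mendelian inheritance.

    Returns (paternal_allele, maternal_allele), or None when the site is
    ambiguous (several consistent assignments) or shows a Mendelian error
    (no consistent assignment).
    """
    a, b = child
    if a == b:
        return (a, b)  # homozygous sites need no phasing
    # Keep the orderings in which each parent could have transmitted its allele.
    options = [(p, m) for p, m in ((a, b), (b, a))
               if p in father and m in mother]
    return options[0] if len(options) == 1 else None


# Toy example: phase is lost, then partly recovered from the parents.
unphased = erase_phase("ACGT", "ATGA")
resolved = phase_child_site(("A", "G"), ("A", "A"), ("A", "G"))
ambiguous = phase_child_site(("A", "G"), ("A", "G"), ("A", "G"))
```

In this toy case `resolved` is `('A', 'G')` because only the father can have supplied `A` as the paternal allele, while `ambiguous` is `None` because all three family members are heterozygous. Sites like this are where statistical phasing has to take over.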

Comparison of HiFi performances with three different implementations A, B and C

We compared HiFi performance across the three implementations. The three implementations were similar in accuracy, with implementation B slightly but consistently more accurate than A and C (Table S1).

Comparison of HiFi and three standard imputation software performances with molecular reference and statistical reference

We compared the performance of HiFi with that of three standard imputation software tools (MACH, IMPUTE2 and BEAGLE) [7], [8], [9]. HiFi performed better on haplotype imputation accuracy (Table S2) and speed (Table S4), whereas MACH, IMPUTE2 and BEAGLE performed slightly better on genotype imputation accuracy (Table S3), with MACH and IMPUTE2 performing best.
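
The difference between genotype imputation accuracy and haplotype imputation accuracy can be made concrete with a small sketch. This article does not spell out the scoring formulas, so the definitions below are assumptions chosen for illustration: genotype accuracy is the fraction of sites whose unordered allele pair is correct, and haplotype accuracy is the fraction of phased alleles that match the truth under the better of the two haplotype orderings. The function names and toy sequences are hypothetical.

```python
from typing import Tuple

HapPair = Tuple[str, str]  # two haplotypes, one character per site (assumed)


def genotype_accuracy(truth: HapPair, imputed: HapPair) -> float:
    """Fraction of sites where the unordered allele pair is correct."""
    t1, t2 = truth
    i1, i2 = imputed
    matches = sum(sorted((a, b)) == sorted((c, d))
                  for a, b, c, d in zip(t1, t2, i1, i2))
    return matches / len(t1)


def haplotype_accuracy(truth: HapPair, imputed: HapPair) -> float:
    """Fraction of phased alleles that match the truth.

    The two imputed haplotypes carry no intrinsic order, so both pairings
    with the true haplotypes are scored and the better one is kept.
    """
    def site_matches(x: str, y: str) -> int:
        return sum(a == b for a, b in zip(x, y))

    t1, t2 = truth
    i1, i2 = imputed
    direct = site_matches(t1, i1) + site_matches(t2, i2)
    swapped = site_matches(t1, i2) + site_matches(t2, i1)
    return max(direct, swapped) / (2 * len(t1))


# Toy example: every genotype is right, but the alleles at the second
# site sit on the wrong haplotypes.
truth = ("ACGT", "ATGA")
imputed = ("ATGT", "ACGA")
g_acc = genotype_accuracy(truth, imputed)   # 1.0
h_acc = haplotype_accuracy(truth, imputed)  # 0.75
```

Under these assumed definitions a tool can score perfectly on genotypes while still losing haplotype accuracy to phase errors, which is why the two measures are reported separately.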

Specifications Table

Subject area:  Biology
More specific subject area:  Bioinformatics
Type of data:  Tables
How data was acquired:  Genotype and haplotype data were obtained from the International HapMap Project database
Data format:  Analyzed
Experimental factors:  The original data were reformatted to fit the requirements of the different software tools
Experimental features:  We generated different implementations from the HapMap data set. We then (1) compared the performance of the different implementations and (2) compared the phasing performance of HiFi, MACH 1.0, IMPUTE2 and BEAGLE.
Data source location:  Atlanta, Georgia, USA
Data accessibility:  The data are with this article

References

1.  GenomeLaser: fast and accurate haplotyping from pedigree genotypes.

Authors:  Wenzhi Li; Guoxing Fu; Weinian Rao; Wei Xu; Li Ma; Shiwen Guo; Qing Song
Journal:  Bioinformatics       Date:  2015-08-18       Impact factor: 6.937

2.  Rapid and accurate haplotype phasing and missing-data inference for whole-genome association studies by use of localized haplotype clustering.

Authors:  Sharon R Browning; Brian L Browning
Journal:  Am J Hum Genet       Date:  2007-09-21       Impact factor: 11.025

3.  High-resolution whole-genome haplotyping using limited seed data.

Authors:  Weinian Rao; Yamin Ma; Li Ma; Jian Zhao; Qiling Li; Weikuan Gu; Kui Zhang; Vincent C Bond; Qing Song
Journal:  Nat Methods       Date:  2013-01       Impact factor: 28.547

4.  MaCH: using sequence and genotype data to estimate haplotypes and unobserved genotypes.

Authors:  Yun Li; Cristen J Willer; Jun Ding; Paul Scheet; Gonçalo R Abecasis
Journal:  Genet Epidemiol       Date:  2010-12       Impact factor: 2.135

5.  Functional pseudogenes inhibit the superoxide production.

Authors:  Wei Xu; Li Ma; Wenzhi Li; Tiffany A Brunson; Xiaohua Tian; Jendai Richards; Qiling Li; Tameka Bythwood; Zuyi Yuan; Qing Song
Journal:  Precis Med       Date:  2015

6.  High-accuracy haplotype imputation using unphased genotype data as the references.

Authors:  Wenzhi Li; Wei Xu; Guoxing Fu; Li Ma; Jendai Richards; Weinian Rao; Tameka Bythwood; Shiwen Guo; Qing Song
Journal:  Gene       Date:  2015-07-30       Impact factor: 3.688

7.  Accurate inference of local phased ancestry of modern admixed populations.

Authors:  Yamin Ma; Jian Zhao; Jian-Syuan Wong; Li Ma; Wenzhi Li; Guoxing Fu; Wei Xu; Kui Zhang; Rick A Kittles; Yun Li; Qing Song
Journal:  Sci Rep       Date:  2014-07-23       Impact factor: 4.379

8.  References for Haplotype Imputation in the Big Data Era.

Authors:  Wenzhi Li; Wei Xu; Qiling Li; Li Ma; Qing Song
Journal:  Mol Biol (Los Angel)       Date:  2015-10-31

9.  A flexible and accurate genotype imputation method for the next generation of genome-wide association studies.

Authors:  Bryan N Howie; Peter Donnelly; Jonathan Marchini
Journal:  PLoS Genet       Date:  2009-06-19       Impact factor: 5.917


