Lei Zhang1, Yu-Fang Pei1, Xiaoying Fu2, Yong Lin2, Yu-Ping Wang2, Hong-Wen Deng2.
1. School of Public Health, Xi'an Jiaotong University, Shaanxi, China, Department of Biostatistics and Bioinformatics, Tulane University, New Orleans, USA and Center of System Biomedical Sciences, University of Shanghai for Science and Technology, Shanghai, China.
2. School of Public Health, Xi'an Jiaotong University, Shaanxi, China, Department of Biostatistics and Bioinformatics, Tulane University, New Orleans, USA and Center of System Biomedical Sciences, University of Shanghai for Science and Technology, Shanghai, China.
Abstract
MOTIVATION: Fast and accurate genotype imputation is necessary for facilitating gene-mapping studies, especially with the ever-increasing numbers of both common and rare variants generated by high-throughput sequencing experiments. However, most existing imputation approaches suffer from either inaccurate results or heavy computational demand.
RESULTS: In this article, we propose a novel method for imputing diploid genotypes that is both fast and accurate. Specifically, we extend a hidden Markov model that is widely used to describe haplotype structures, but we map hidden states onto single reference haplotypes rather than onto pairs of haplotypes. Consequently, the computational complexity is linear in the size of the reference haplotype set. We further develop an algorithm, 'merge-and-recover' (MAR), to speed up the calculation. Working on a compact representation of segmental reference haplotypes, the MAR algorithm always calculates an exact form of the transition probabilities, regardless of how the segments are partitioned. Both simulation studies and real-data analyses demonstrated that the proposed method was comparable to most of the existing popular methods in terms of imputation accuracy, but was much more efficient computationally. The MAR algorithm can further speed up the calculation several-fold without loss of accuracy. The proposed method will be useful in large-scale imputation studies with a large number of reference subjects.
AVAILABILITY: The implemented multi-threading software FISH is freely available for academic use at https://sites.google.com/site/lzhanghomepage/FISH.
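The abstract's claim that the cost grows linearly with the reference set can be illustrated with a minimal sketch. This is not the FISH implementation and it does not reproduce the MAR algorithm; it assumes a standard Li-Stephens-style haploid copying model in which each hidden state is one reference haplotype, the chain stays on the same haplotype with probability 1 - switch, and otherwise jumps uniformly to any of the K haplotypes. The function names, the `switch` and `err` parameters, and the toy data are hypothetical. Under that assumption, the sum over previous states collapses to a single shared term, so one forward step costs O(K) instead of O(K^2).

```python
import math


def emit(ref_allele, obs_allele, err):
    """Emission probability; a missing observation (None) is uninformative."""
    if obs_allele is None:
        return 1.0
    return 1.0 - err if ref_allele == obs_allele else err


def forward_linear(ref_haps, obs, switch=0.01, err=0.001):
    """Scaled forward pass with O(K) work per site (K = number of haplotypes).

    ref_haps: K lists of 0/1 alleles, each of length M.
    obs: M observed alleles (0, 1, or None for missing).
    Returns (normalized forward probabilities at the last site, log-likelihood).
    """
    K = len(ref_haps)
    alpha = [emit(ref_haps[k][0], obs[0], err) / K for k in range(K)]
    total = sum(alpha)
    alpha = [a / total for a in alpha]
    loglik = math.log(total)
    for m in range(1, len(obs)):
        # alpha sums to 1, so the "jump from anywhere" mass is just switch / K.
        jump = switch / K
        alpha = [((1.0 - switch) * alpha[k] + jump) * emit(ref_haps[k][m], obs[m], err)
                 for k in range(K)]
        total = sum(alpha)
        alpha = [a / total for a in alpha]
        loglik += math.log(total)
    return alpha, loglik


def forward_naive(ref_haps, obs, switch=0.01, err=0.001):
    """Same model, but summing over all K x K transitions explicitly (O(K^2))."""
    K = len(ref_haps)
    alpha = [emit(ref_haps[k][0], obs[0], err) / K for k in range(K)]
    total = sum(alpha)
    alpha = [a / total for a in alpha]
    loglik = math.log(total)
    for m in range(1, len(obs)):
        new_alpha = []
        for k in range(K):
            s = 0.0
            for j in range(K):
                trans = switch / K + ((1.0 - switch) if j == k else 0.0)
                s += alpha[j] * trans
            new_alpha.append(s * emit(ref_haps[k][m], obs[m], err))
        total = sum(new_alpha)
        alpha = [a / total for a in new_alpha]
        loglik += math.log(total)
    return alpha, loglik


# Toy data (hypothetical): three reference haplotypes, one missing observation.
ref_haps = [[0, 0, 1, 1], [0, 1, 0, 1], [1, 1, 1, 0]]
obs = [0, None, 1, 1]
alpha_fast, ll_fast = forward_linear(ref_haps, obs)
alpha_slow, ll_slow = forward_naive(ref_haps, obs)
```

Both functions compute the same quantities; the only difference is that `forward_linear` exploits the uniform-jump structure of the assumed transition matrix. A full imputation method would also need a backward pass and a treatment of diploid genotypes, which this sketch leaves out.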