A comparative analysis of current phasing and imputation software.

Adriano De Marino, Abdallah Amr Mahmoud, Madhuchanda Bose, Karatuğ Ozan Bircan, Andrew Terpolovsky, Varuna Bamunusinghe, Sandra Bohn, Umar Khan, Biljana Novković, Puya G Yazdi.

Abstract

Whole-genome data has become significantly more accessible over the last two decades. This can largely be attributed to both reduced sequencing costs and imputation models, which make it possible to obtain nearly whole-genome data from less expensive genotyping methods such as microarray chips. Although there are many different approaches to imputation, the Hidden Markov Model (HMM) remains the most widely used. In this study, we compared the latest versions of the most popular HMM-based tools for phasing and imputation: Beagle5.4, Eagle2.4.1, ShapeIT4, Impute5 and Minimac4. We benchmarked them on four input datasets with three levels of chip density. We assessed each imputation tool on the basis of accuracy, speed and memory usage, and showed how the choice of imputation accuracy metric can lead to different interpretations. The highest average concordance rate was achieved by Beagle5.4, followed by Impute5 and Minimac4, using a reference-based approach during phasing and the highest-density chip. The IQS and R² metrics revealed that Impute5 and Minimac4 obtained better results for low-frequency markers, while Beagle5.4 remained more accurate for common markers (MAF > 5%). Computational load, as measured by run time, was lower for Beagle5.4 than for Minimac4 and Impute5, while Minimac4 used the least memory of the imputation tools we compared. ShapeIT4 used the least memory of the phasing tools examined with genotype chip data, while Eagle2.4.1 used the least memory when phasing WGS data. Finally, we determined the combination of phasing software, imputation software and reference panel best suited to different situations and analysis needs, and created an automated pipeline that lets users create customized chips designed to optimize their imputation results.
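
The point that the choice of accuracy metric can change the interpretation can be illustrated with a small sketch. It is not the study's code: it assumes hard-call genotypes coded 0/1/2, takes IQS in its chance-corrected (Cohen's kappa) form, and computes R² as a squared Pearson correlation between true and imputed genotypes. The function names, the toy data, and the convention of returning 0.0 when a variance is zero are all illustrative assumptions.

```python
from collections import Counter


def concordance(truth, imputed):
    """Fraction of samples whose imputed genotype equals the true genotype."""
    matches = sum(t == i for t, i in zip(truth, imputed))
    return matches / len(truth)


def iqs(truth, imputed):
    """Chance-corrected concordance: (Po - Pc) / (1 - Pc)."""
    n = len(truth)
    p_obs = concordance(truth, imputed)
    true_counts = Counter(truth)
    imp_counts = Counter(imputed)
    # Agreement expected by chance from the marginal genotype frequencies.
    p_chance = sum(true_counts[g] * imp_counts[g] for g in (0, 1, 2)) / (n * n)
    if p_chance == 1.0:
        return 0.0  # assumption: treat a monomorphic marker as uninformative
    return (p_obs - p_chance) / (1.0 - p_chance)


def r_squared(truth, imputed):
    """Squared Pearson correlation between true and imputed genotypes."""
    n = len(truth)
    mean_t = sum(truth) / n
    mean_i = sum(imputed) / n
    cov = sum((t - mean_t) * (i - mean_i) for t, i in zip(truth, imputed))
    var_t = sum((t - mean_t) ** 2 for t in truth)
    var_i = sum((i - mean_i) ** 2 for i in imputed)
    if var_t == 0 or var_i == 0:
        return 0.0  # assumption: report 0 when the correlation is undefined
    return cov * cov / (var_t * var_i)


# Toy rare marker: two heterozygous carriers, all imputed as homozygous reference.
rare_truth = [0] * 98 + [1] * 2
rare_imputed = [0] * 100

# Toy common marker with the same number of errors (two miscalled genotypes).
common_truth = [0] * 50 + [1] * 40 + [2] * 10
common_imputed = list(common_truth)
common_imputed[0] = 1   # a true 0 imputed as 1
common_imputed[50] = 0  # a true 1 imputed as 0

for name, t, i in [("rare", rare_truth, rare_imputed),
                   ("common", common_truth, common_imputed)]:
    print(name, concordance(t, i), iqs(t, i), r_squared(t, i))
```

In this toy case both markers have the same concordance, but the rare marker scores zero on the chance-corrected metrics because every carrier was missed. This is one way a ranking based on concordance alone could differ from one based on IQS or R² at low-frequency markers.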

Year:  2022        PMID: 36260643      PMCID: PMC9581364          DOI: 10.1371/journal.pone.0260177

Source DB:  PubMed          Journal:  PLoS One        ISSN: 1932-6203            Impact factor:   3.752

