
P-smoother: efficient PBWT smoothing of large haplotype panels.

William Yue, Ardalan Naseri, Victor Wang, Pramesh Shakya, Shaojie Zhang, Degui Zhi.

Abstract

Motivation: As large haplotype panels become increasingly available, efficient string matching algorithms such as the positional Burrows-Wheeler transform (PBWT) are promising for identifying shared haplotypes. However, recent mutations and genotyping errors create occasional mismatches, which pose a challenge for exact haplotype matching. Previous solutions are based on probabilistic models or seed-and-extension algorithms that passively tolerate mismatches.
Results: Here, we propose a PBWT-based smoothing algorithm, P-smoother, to actively 'correct' these mismatches and thus 'smooth' the panel. P-smoother runs a bidirectional PBWT-based panel scan that flips mismatching alleles based on the overall haplotype matching context, which we call the IBD (identical-by-descent) prior. In a simulated panel with 4000 haplotypes and a 0.2% error rate, we show that it can reliably correct 85% of errors. As a result, PBWT algorithms running over the smoothed panel can identify more pairwise IBD segments than those running over the unsmoothed panel. Most strikingly, a PBWT-cluster algorithm running over the smoothed panel, which we call PS-cluster, achieves state-of-the-art performance for identifying multiway IBD segments, a problem that has challenged the computational community for years. We also show that PS-cluster is adequately efficient for UK Biobank data. Therefore, P-smoother opens up new possibilities for efficient error-tolerating algorithms for biobank-scale haplotype panels.
Availability and implementation: Source code is available at github.com/ZhiGroup/P-smoother.
© The Author(s) 2022. Published by Oxford University Press.
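
The idea of flipping an allele based on its matching context can be illustrated with a toy sketch. The code below is not the authors' implementation: it scans in one direction only (the abstract describes a bidirectional scan), and it replaces the IBD prior with a plain majority vote. The function names and the parameters min_len, min_block and max_minor_frac are hypothetical, chosen for illustration. The PBWT update itself follows the standard prefix-array and divergence-array recurrence.

```python
from typing import List, Tuple


def pbwt_step(column: List[int], ppa: List[int], div: List[int],
              k: int) -> Tuple[List[int], List[int]]:
    """Standard PBWT update: take the ordering over sites < k and return
    the ordering over sites <= k (prefix array and divergence array)."""
    a, b, d, e = [], [], [], []
    p = q = k + 1  # match start for the next 0-allele / 1-allele haplotype
    for idx, start in zip(ppa, div):
        p = max(p, start)
        q = max(q, start)
        if column[idx] == 0:
            a.append(idx)
            d.append(p)
            p = 0
        else:
            b.append(idx)
            e.append(q)
            q = 0
    return a + b, d + e


def smooth_forward(panel: List[List[int]], min_len: int = 4,
                   min_block: int = 3,
                   max_minor_frac: float = 0.34) -> List[List[int]]:
    """Toy one-directional smoothing (an assumption, not P-smoother itself).

    At each site k, haplotypes that matched over the previous min_len sites
    form a block; a rare minority allele inside a large enough block is
    flipped to the block's majority allele.
    """
    n_hap, n_site = len(panel), len(panel[0])
    out = [row[:] for row in panel]  # leave the input panel untouched
    ppa, div = list(range(n_hap)), [0] * n_hap
    for k in range(n_site):
        if k >= min_len:
            start = 0
            for i in range(1, n_hap + 1):
                # A block ends where the match with the predecessor is too short.
                if i == n_hap or div[i] > k - min_len:
                    block = ppa[start:i]
                    ones = sum(out[h][k] for h in block)
                    minor = min(ones, len(block) - ones)
                    if (len(block) >= min_block
                            and 0 < minor <= max_minor_frac * len(block)):
                        major = 1 if 2 * ones > len(block) else 0
                        for h in block:
                            out[h][k] = major
                    start = i
        # Advance the PBWT on the smoothed column so corrections propagate.
        column = [out[h][k] for h in range(n_hap)]
        ppa, div = pbwt_step(column, ppa, div, k)
    return out


if __name__ == "__main__":
    base = [0, 1, 1, 0, 1, 0, 0, 1, 1, 0]
    other = [1 - x for x in base]
    truth = [base[:] for _ in range(5)] + [other[:] for _ in range(3)]
    noisy = [row[:] for row in truth]
    noisy[2][6] ^= 1  # inject a single error
    print(smooth_forward(noisy) == truth)
```

A majority vote is the simplest stand-in for a context-based decision rule. It cannot distinguish a genotyping error from a genuine recent mutation carried by a single haplotype, so the thresholds here would need tuning against simulated data before any real use.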

Year:  2022        PMID: 35785021      PMCID: PMC9245627          DOI: 10.1093/bioadv/vbac045

Source DB:  PubMed          Journal:  Bioinform Adv        ISSN: 2635-0041


