Jana Ebler, Peter Ebert, Wayne E Clarke, Tobias Rausch, Peter A Audano, Torsten Houwaart, Yafei Mao, Jan O Korbel, Evan E Eichler, Michael C Zody, Alexander T Dilthey, Tobias Marschall.
Abstract
Typical genotyping workflows map reads to a reference genome before identifying genetic variants. Generating such alignments introduces reference biases and comes with a substantial computational burden. Furthermore, short-read lengths limit the ability to characterize repetitive genomic regions, which are particularly challenging for fast k-mer-based genotypers. In the present study, we propose a new algorithm, PanGenie, that leverages a haplotype-resolved pangenome reference together with k-mer counts from short-read sequencing data to genotype a wide spectrum of genetic variation, a process we refer to as genome inference. Compared with mapping-based approaches, PanGenie is more than 4 times faster at 30-fold coverage and achieves better genotype concordances for almost all variant types and coverages tested. Improvements are especially pronounced for large insertions (≥50 bp) and variants in repetitive regions, enabling the inclusion of these classes of variants in genome-wide association studies. PanGenie efficiently leverages the increasing number of haplotype-resolved assemblies to unravel the functional impact of previously inaccessible variants while being faster than alignment-based workflows.
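To illustrate the general idea of genotyping from k-mer counts without read alignment, here is a minimal toy sketch. It is not PanGenie's algorithm, which additionally uses the haplotype structure of a pangenome reference. The function names (`kmers`, `count_kmers`, `allele_support`), the example sequences, and the choice of k = 4 are hypothetical and chosen only for illustration.

```python
from collections import Counter


def kmers(seq, k):
    """Yield every length-k substring of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def count_kmers(reads, k):
    """Count k-mer occurrences across all reads (no alignment involved)."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts


def allele_support(alleles, read_counts, k):
    """Sum read counts of the k-mers that are unique to each allele.

    `alleles` maps an allele name to its sequence including flanking bases.
    This is a toy score, not a statistical genotype model.
    """
    kmer_sets = {name: set(kmers(seq, k)) for name, seq in alleles.items()}
    support = {}
    for name, own in kmer_sets.items():
        others = set().union(*(s for n, s in kmer_sets.items() if n != name))
        support[name] = sum(read_counts[km] for km in own - others)
    return support


# Hypothetical example: a single-base difference (C vs. T) with flanks.
alleles = {"ref": "ACGTACGGTCA", "alt": "ACGTATGGTCA"}
reads = ["CGTATGG", "TATGGTC"]  # both reads carry the alternative allele
support = allele_support(alleles, count_kmers(reads, 4), 4)
print(support)  # {'ref': 0, 'alt': 6}
```

In this toy case only k-mers unique to the alternative allele are observed in the reads, so a simple comparison of the two scores would favour the alternative allele. Real data would require much larger k, handling of sequencing errors and coverage, and a probabilistic model.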
Year: 2022 PMID: 35410384 PMCID: PMC9005351 DOI: 10.1038/s41588-022-01043-w
Source DB: PubMed Journal: Nat Genet ISSN: 1061-4036 Impact factor: 38.330