Chen Sun (1), Paul Medvedev (1, 2, 3). Affiliations: (1) Department of Computer Science and Engineering, Pennsylvania State University, USA; (2) Department of Biochemistry and Molecular Biology, Pennsylvania State University, USA; (3) Center for Computational Biology and Bioinformatics, Pennsylvania State University, USA.
Abstract
Motivation: Genotyping a set of variants from a database is an important step in identifying known genetic traits and disease-related variants within an individual. The growing size of variant databases, together with the high depth of sequencing data, poses an efficiency challenge. In clinical applications, where time is crucial, alignment-based methods are often not fast enough. To fill this gap, Shajii et al. proposed LAVA, an alignment-free method that genotypes single nucleotide polymorphisms (SNPs) more quickly; however, substantial room remains for improvement in both running time and accuracy.
Results: We present VarGeno, a method for SNP genotyping from Illumina whole genome sequencing data. VarGeno builds upon LAVA by improving the speed of k-mer querying as well as the accuracy of the genotyping strategy. We evaluate VarGeno on several read datasets using different genotyping SNP lists. VarGeno runs 7 to 13 times faster than LAVA with similar memory usage, while improving accuracy.
Availability and implementation: VarGeno is freely available at https://github.com/medvedevgroup/vargeno.
Supplementary information: Supplementary data are available at Bioinformatics online.