Jasmijn A Baaijens (1), Bastiaan Van der Roest (2), Johannes Köster (3,4), Leen Stougie (1,5,6), Alexander Schönhuth (1,6,7).
(1) Life Sciences and Health Group, Centrum Wiskunde & Informatica, Amsterdam, Netherlands.
(2) Center for Molecular Medicine, University Medical Center Utrecht, Utrecht, Netherlands.
(3) Institute of Human Genetics, University Hospital Essen, University of Duisburg-Essen, Essen, Germany.
(4) Medical Oncology, Dana Farber Cancer Institute, Harvard Medical School, Boston, MA, USA.
(5) Department of Econometrics and Operations Research, Vrije Universiteit, Amsterdam, Netherlands.
(6) INRIA-Erable, Grenoble, France.
(7) Theoretical Biology and Bioinformatics, Utrecht University, Utrecht, Netherlands.
Abstract
MOTIVATION: Viruses populate their hosts as a viral quasispecies: a collection of genetically related mutant strains. Viral quasispecies assembly is the reconstruction of strain-specific haplotypes from read data; predicting their relative abundances within the mix of strains is an important step for various treatment-related reasons. Reference-genome-independent ('de novo') approaches have yielded benefits over reference-guided approaches, because reference-induced biases can become overwhelming when dealing with divergent strains. Although very accurate, extant de novo methods yield only rather short contigs. The remaining challenge is to reconstruct full-length haplotypes, together with their abundances, from such contigs.

RESULTS: We present Virus-VG as a de novo approach to viral haplotype reconstruction from preassembled contigs. Our method constructs a variation graph from the short input contigs without making use of a reference genome. Then, to obtain paths through the variation graph that reflect the original haplotypes, we solve a minimization problem that yields a selection of maximal-length paths that is optimal in terms of being compatible with the read coverages computed for the nodes of the variation graph. We output the resulting selection of maximal-length paths as the haplotypes, together with their abundances. Benchmarking experiments on challenging simulated and real datasets show significant improvements in assembly contiguity compared to the input contigs, while preserving low error rates compared to the state-of-the-art viral quasispecies assemblers.

AVAILABILITY AND IMPLEMENTATION: Virus-VG is freely available at https://bitbucket.org/jbaaijens/virus-vg.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
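The path-selection step described under RESULTS can be illustrated with a toy sketch. This is not the Virus-VG implementation: the function name `estimate_abundances`, the toy graph, the coverage values, and the choice of a squared-error objective solved by coordinate descent are all assumptions made for illustration, and the objective used by the tool itself may differ. The sketch takes a set of candidate paths through a graph and per-node read coverages, then looks for non-negative path abundances whose sums over each node best match that node's coverage.

```python
def estimate_abundances(paths, coverage, sweeps=200):
    """Fit non-negative path abundances to node coverages (illustrative only).

    Minimizes sum_v (coverage[v] - sum of abundances of paths through v)^2
    by cyclic coordinate descent. Assumes every path is non-empty and
    visits each node at most once.
    """
    x = [0.0] * len(paths)
    # fitted[v] = total abundance of all paths currently passing through node v
    fitted = {v: 0.0 for v in coverage}
    for _ in range(sweeps):
        for i, path in enumerate(paths):
            # Residual coverage on this path's nodes, with path i's own
            # contribution taken out of the fitted totals.
            resid = sum(coverage[v] - (fitted[v] - x[i]) for v in path)
            # Best single-variable update, clipped to stay non-negative.
            new_x = max(0.0, resid / len(path))
            for v in path:
                fitted[v] += new_x - x[i]
            x[i] = new_x
    return x


# Hypothetical toy variation graph: a bubble 0 -> {1, 2} -> 3.
paths = [[0, 1, 3], [0, 2, 3]]                     # two candidate haplotype paths
coverage = {0: 100.0, 1: 70.0, 2: 30.0, 3: 100.0}  # made-up node coverages
abundances = estimate_abundances(paths, coverage)
print(abundances)
```

In this toy case the two estimates should come out close to the coverages of the two bubble branches (nodes 1 and 2), since those nodes are each covered by exactly one path. Paths whose estimated abundance is near zero would be the natural candidates to discard.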
Authors: Jordan M Eizenga; Adam M Novak; Jonas A Sibbesen; Simon Heumos; Ali Ghaffaari; Glenn Hickey; Xian Chang; Josiah D Seaman; Robin Rounthwaite; Jana Ebler; Mikko Rautiainen; Shilpa Garg; Benedict Paten; Tobias Marschall; Jouni Sirén; Erik Garrison Journal: Annu Rev Genomics Hum Genet Date: 2020-05-26 Impact factor: 8.929
Authors: Kim M Pepin; Matthew W Hopken; Susan A Shriner; Erica Spackman; Zaid Abdo; Colin Parrish; Steven Riley; James O Lloyd-Smith; Antoinette J Piaggio Journal: Philos Trans R Soc Lond B Biol Sci Date: 2019-08-12 Impact factor: 6.237
Authors: Anton Eliseev; Keylie M Gibson; Pavel Avdeyev; Dmitry Novik; Matthew L Bendall; Marcos Pérez-Losada; Nikita Alexeev; Keith A Crandall Journal: Infect Genet Evol Date: 2020-03-06 Impact factor: 3.342
Authors: Sergey Knyazev; Viachaslau Tsyvina; Anupama Shankar; Andrew Melnyk; Alexander Artyomenko; Tatiana Malygina; Yuri B Porozov; Ellsworth M Campbell; William M Switzer; Pavel Skums; Serghei Mangul; Alex Zelikovsky Journal: Nucleic Acids Res Date: 2021-09-27 Impact factor: 16.971
Authors: Christopher Quince; Sergey Nurk; Sebastien Raguideau; Robert James; Orkun S Soyer; J Kimberly Summers; Antoine Limasset; A Murat Eren; Rayan Chikhi; Aaron E Darling Journal: Genome Biol Date: 2021-07-26 Impact factor: 13.583