BACKGROUND: Reading the nucleotides from both ends of a DNA fragment is called paired-end tag (PET) sequencing. When the fragment is longer than the combined read length, a gap of unsequenced nucleotides remains between the two reads of a pair. If the target in such experiments is sequenced deeply enough to provide redundant coverage, it may be possible to bridge these gaps using bioinformatics methods. Konnector is a local de novo assembly tool that addresses this problem. Here we report on version 2.0 of our tool.

RESULTS: Konnector uses a probabilistic and memory-efficient data structure called a Bloom filter to represent a k-mer spectrum, that is, all sequences of length k that occur in an input file, such as the collection of reads in a PET sequencing experiment. It performs look-ups on this data structure to construct an implicit de Bruijn graph, which describes the (k-1) base pair overlaps between adjacent k-mers. It traverses this graph to bridge the gap between a given pair of flanking sequences.

CONCLUSIONS: Here we report the performance of Konnector v2.0 on simulated and experimental datasets, and compare it against other tools with similar functionality. We note that, because it represents k-mers with 1.5 bytes of memory on average, Konnector can scale to very large genomes. With our parallel implementation, it can also process over a billion bases on commodity hardware.
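The idea described in RESULTS can be illustrated with a minimal Python sketch. It is not Konnector's implementation: the class `BloomFilter`, the functions `load_kmers`, `successors` and `bridge_gap`, the filter size, the hash scheme, the plain breadth-first search and the `max_len` cutoff are all assumptions chosen for illustration, and the sketch ignores reverse complements and sequencing errors. It stores the k-mers of a set of reads in a Bloom filter, treats the filter as an implicit de Bruijn graph by querying the four possible one-base extensions of a k-mer, and searches that graph for a path joining two flanking sequences.

```python
import hashlib
from collections import deque


class BloomFilter:
    """Toy Bloom filter over strings (sizes are illustrative assumptions)."""

    def __init__(self, num_bits=1 << 16, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive several bit positions by salting one hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(
                item.encode(), digest_size=8, salt=bytes([seed])
            ).digest()
            yield int.from_bytes(digest, "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # May report false positives, never false negatives.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def load_kmers(reads, k, bloom):
    """Insert every k-mer of every read into the filter."""
    for read in reads:
        for i in range(len(read) - k + 1):
            bloom.add(read[i:i + k])


def successors(kmer, bloom):
    """Neighbours in the implicit graph: k-mers overlapping by k-1 bases."""
    suffix = kmer[1:]
    return [suffix + base for base in "ACGT" if suffix + base in bloom]


def bridge_gap(left, right, k, bloom, max_len=500):
    """Breadth-first search from the last k-mer of `left` to the first
    k-mer of `right`; returns the merged sequence, or None if no path
    is found within `max_len` bases."""
    start, goal = left[-k:], right[:k]
    queue = deque([(start, left)])
    seen = {start}
    while queue:
        kmer, path = queue.popleft()
        if kmer == goal:
            return path + right[k:]
        if len(path) >= max_len:
            continue
        for nxt in successors(kmer, bloom):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + nxt[-1]))
    return None


# Hypothetical 61-base target, tiled by overlapping 25-base reads.
genome = "ATGGCGTACGTTAGCCTAGGATCCGATTACAGGCTTAACGTCAGTCCATGGTACCGAATTC"
k = 12
reads = [genome[i:i + 25] for i in range(0, len(genome) - 24, 3)]

bloom = BloomFilter()
load_kmers(reads, k, bloom)

# Flanks play the role of a read pair with an unsequenced gap between them.
merged = bridge_gap(genome[:20], genome[-20:], k, bloom)
print(merged == genome)
```

Because the graph is never stored explicitly, memory use depends only on the size of the bit array. The price is that a false positive in the filter can introduce a spurious edge, which a production tool has to guard against.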