String graph construction using incremental hashing.

Ilan Ben-Bassat, Benny Chor.

Abstract

MOTIVATION: New sequencing technologies generate larger amounts of short-read data at decreasing cost. De novo sequence assembly is the problem of combining these reads back into the original genome sequence without relying on a reference genome. This presents algorithmic and computational challenges, especially for long and repetitive genome sequences. Most existing approaches to the assembly problem operate in the framework of de Bruijn graphs. Yet a number of recent works use the string graph paradigm, with a variety of methods for storing and processing suffixes and prefixes, such as suffix arrays, the Burrows-Wheeler transform or the FM index. Our work is motivated by a search for new approaches to constructing the string graph, using alternative yet simple data structures and algorithmic concepts.
RESULTS: We introduce a novel hash-based method for constructing the string graph. We use incremental hashing, specifically a modification of the Karp-Rabin fingerprint, together with Bloom filters. These probabilistic methods might create false-positive and false-negative edges during the algorithm's execution, but all of these are detected and corrected. The advantages of the proposed approach over existing methods are its simplicity and the incorporation of established probabilistic techniques in the context of de novo genome sequencing. Our preliminary implementation compares favorably with the first string graph construction of Simpson and Durbin (2010) (but not with subsequent improvements). Further research and optimizations will hopefully enable the algorithm to be incorporated, with noticeable performance improvement, in state-of-the-art string graph-based assemblers.
© The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
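
The abstract does not give implementation details, so the following is only a minimal sketch of the general idea of incremental (Karp-Rabin style) hashing for finding suffix-prefix overlaps between reads. It is not the authors' algorithm: the names `prefix_hashes`, `suffix_hashes` and `find_overlaps`, the base, and the modulus are all assumptions chosen for illustration. The Bloom filter is replaced here by an ordinary dictionary, and transitive edge reduction is omitted. Candidate edges found by fingerprint are verified by direct string comparison, which is one simple way to discard false positives.

```python
from collections import defaultdict

BASE = 4                      # assumed: one digit per nucleotide
MOD = (1 << 61) - 1           # assumed: a large Mersenne prime modulus
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def prefix_hashes(read):
    """Yield (length, fingerprint) for each prefix, extending by one base per step."""
    h = 0
    for length, ch in enumerate(read, start=1):
        h = (h * BASE + CODE[ch]) % MOD
        yield length, h


def suffix_hashes(read):
    """Yield (length, fingerprint) for each suffix, growing leftwards by one base per step."""
    h = 0
    power = 1                 # BASE**(length - 1) mod MOD for the next step
    for length, ch in enumerate(reversed(read), start=1):
        h = (CODE[ch] * power + h) % MOD
        power = (power * BASE) % MOD
        yield length, h


def find_overlaps(reads, min_overlap):
    """Return (suffix_read, prefix_read, overlap_length) for proper overlaps."""
    # Index every sufficiently long proper prefix by (length, fingerprint).
    index = defaultdict(list)
    for rid, read in enumerate(reads):
        for length, h in prefix_hashes(read):
            if min_overlap <= length < len(read):
                index[(length, h)].append(rid)

    edges = []
    for sid, read in enumerate(reads):
        for length, h in suffix_hashes(read):
            if length < min_overlap or length >= len(read):
                continue
            for pid in index.get((length, h), []):
                # Verify the candidate to rule out fingerprint collisions.
                if pid != sid and reads[pid][:length] == read[-length:]:
                    edges.append((sid, pid, length))
    return edges
```

Each fingerprint is derived from the previous one in constant time, so all prefixes and suffixes of a read are hashed in time linear in the read length. Calling `find_overlaps(["ACGTAC", "TACGGA", "GGATTT"], 3)` reports an overlap of length 3 from the first read to the second and from the second to the third.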

Year:  2014        PMID: 25183486     DOI: 10.1093/bioinformatics/btu578

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937

