Abstract
BACKGROUND: Processing of reads from high-throughput sequencing is often done in terms of edges in the de Bruijn graph representing all k-mers from the reads. The memory requirements for storing all k-mers in a lookup table can be demanding, even after removal of read errors, but can be alleviated by using a memory-efficient data structure.
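As a rough illustration of the k-mer view described above (not the paper's implementation), the sketch below collects the k-mers of a few reads and treats each one as an edge from its (k−1)-prefix to its (k−1)-suffix. The function name `de_bruijn_edges` and the toy reads are hypothetical; the plain dictionary used here is the kind of lookup table whose memory cost a compact structure aims to avoid.

```python
from collections import defaultdict


def de_bruijn_edges(reads, k):
    """Map each (k-1)-mer vertex to the set of vertices it has an edge to."""
    graph = defaultdict(set)
    for read in reads:
        # Slide a window of length k over the read; each window is one edge.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


# Toy reads, made up for illustration only.
reads = ["ACGTAC", "CGTACG"]
graph = de_bruijn_edges(reads, 4)
for src in sorted(graph):
    print(src, "->", sorted(graph[src]))
```

Overlapping reads contribute the same k-mers, so the number of stored edges grows with the number of distinct k-mers rather than with the number of reads.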
Year: 2013 PMID: 24152242 PMCID: PMC4015147 DOI: 10.1186/1471-2105-14-313
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. The kFM-index data and corresponding de Bruijn subgraph. Representation of the data structure for DNA 4-mers. The vertex strings, lexicographically sorted, are not stored, but reconstructed from the edge and group end data. The edges columns indicate incoming edges to each vertex, i.e. letters that may prefix the vertex strings. The group end flag indicates groups of vertices with the same (k−2)-prefix. The previous position data can be generated from the edge set data and group end data and is constant within each vertex group; a subset is stored for computational speed.
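The columns described in the caption can be approximated with a small sketch. It assumes that vertices are the (k−1)-mers occurring as a prefix or suffix of some k-mer, and that the group end flag is set on the last vertex, in sorted order, of each run sharing a (k−2)-prefix; both are readings of the caption, not details confirmed by it. The function name `build_tables` and the example k-mers are hypothetical, and the previous position data is not modelled.

```python
def build_tables(kmers):
    """Return one (vertex, incoming letters, group end flag) row per vertex.

    The vertex string is included only for readability; per the caption,
    the actual structure does not store it.
    """
    kmers = set(kmers)
    k = len(next(iter(kmers)))
    # Assumption: a vertex is any (k-1)-mer that starts or ends a k-mer.
    vertices = sorted({km[:-1] for km in kmers} | {km[1:] for km in kmers})
    rows = []
    for i, v in enumerate(vertices):
        # Incoming edges: letters a such that a + v is one of the k-mers.
        incoming = "".join(a for a in "ACGT" if a + v in kmers)
        # Assumption: flag the last vertex of each group sharing a (k-2)-prefix.
        is_last = i + 1 == len(vertices)
        group_end = is_last or vertices[i + 1][:k - 2] != v[:k - 2]
        rows.append((v, incoming, group_end))
    return rows


# Toy 4-mers, made up for illustration only.
rows = build_tables({"ACGT", "ACGA", "CGTA", "GTAC", "TACG"})
for vertex, incoming, group_end in rows:
    print(vertex, incoming or "-", int(group_end))
```

In this toy input, `CGA` and `CGT` share the prefix `CG`, so only `CGT` carries the group end flag under the stated assumption.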