Phillip E C Compeau, Pavel A Pevzner, Glenn Tesler.
Year: 2011 PMID: 22068540 PMCID: PMC5531759 DOI: 10.1038/nbt.2023
Source DB: PubMed Journal: Nat Biotechnol ISSN: 1087-0156 Impact factor: 54.908
Figure 1. Bridges of Königsberg problem
(a) A map of old Königsberg, in which each area of the city is labeled with a differently colored point. (b) The Königsberg Bridge Graph, formed by representing each of four land areas as a node and each of the city’s seven bridges as an edge.
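The bridge graph in panel (b) can be checked for an Eulerian cycle by counting how many bridges touch each land area. The following is a minimal sketch, not part of the original figure: the node names (`island`, `north`, `south`, `east`) and the function names are labels chosen here for illustration, and the edge list assumes the standard historical layout of the seven bridges.

```python
from collections import Counter

# Assumed layout of the seven bridges (node names are illustrative labels):
# the central island has two bridges to each river bank and one to the
# eastern land area, which in turn has one bridge to each bank.
BRIDGES = [
    ("island", "north"), ("island", "north"),
    ("island", "south"), ("island", "south"),
    ("island", "east"),
    ("north", "east"),
    ("south", "east"),
]


def degrees(edges):
    """Count how many edge endpoints touch each node of an undirected multigraph."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return degree


def has_eulerian_cycle(edges):
    """Euler's criterion for a connected undirected multigraph:
    an Eulerian cycle exists only if every node has even degree.
    Connectivity is assumed here, not checked."""
    return all(d % 2 == 0 for d in degrees(edges).values())


if __name__ == "__main__":
    print(dict(degrees(BRIDGES)))
    print(has_eulerian_cycle(BRIDGES))
```

Under this layout every land area touches an odd number of bridges, so no walk can cross each bridge exactly once and return to its starting point.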
Figure 2. Two strategies for genome assembly: from Hamiltonian cycles to Eulerian cycles
(a) A simplified example of a small circular genome. (b) In traditional Sanger sequencing algorithms, reads were represented as nodes in a graph, and edges represented alignments between reads. Walking along a Hamiltonian cycle by following the edges in numerical order allows one to reconstruct the circular genome by combining alignments between successive reads. At the end of the cycle, the sequence wraps around to the start of the genome; the repeated part of the sequence is grayed out in the alignment diagram. (c) An alternative assembly technique first splits reads into all possible k-mers: with k = 3, “ATGGCGT” comprises ATG, TGG, GGC, GCG and CGT. Following a Hamiltonian cycle (indicated by red edges) allows one to reconstruct the genome by forming an alignment in which each successive k-mer (from successive nodes) is shifted by one position. This procedure recovers the genome but does not scale well to large graphs. (d) Modern short-read assembly algorithms construct a de Bruijn graph by representing all k-mer prefixes and suffixes as nodes, then drawing edges that represent k-mers having a particular prefix and suffix. For example, k-mer edge ATG has prefix AT and suffix TG. Finding an Eulerian cycle allows one to reconstruct the genome by forming an alignment in which each successive k-mer (from successive edges) is shifted by one position. This generates the same cyclic genome sequence without the computational strain of finding a Hamiltonian cycle.
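The de Bruijn construction in panel (d) can be sketched in a few lines. This is an illustrative sketch, not the authors' implementation. The function names are hypothetical, and the ten-base circular genome is an example chosen so that its first five 3-mers match those listed in the caption. The Eulerian cycle is found with Hierholzer's algorithm, a standard linear-time method that the caption does not name.

```python
from collections import defaultdict


def kmers_of_circular(genome, k):
    """All k-mers of a circular sequence, one starting at each position."""
    wrapped = genome + genome[:k - 1]
    return [wrapped[i:i + k] for i in range(len(genome))]


def de_bruijn(kmers):
    """Nodes are (k-1)-mers; each k-mer is an edge from its prefix to its suffix."""
    graph = defaultdict(list)
    for kmer in kmers:
        graph[kmer[:-1]].append(kmer[1:])
    return graph


def eulerian_cycle(graph):
    """Hierholzer's algorithm. Assumes the graph is connected and balanced
    (in-degree equals out-degree at every node), so a cycle exists."""
    remaining = {node: list(successors) for node, successors in graph.items()}
    stack = [next(iter(remaining))]
    cycle = []
    while stack:
        node = stack[-1]
        if remaining.get(node):
            # Follow an unused edge out of the current node.
            stack.append(remaining[node].pop())
        else:
            # Dead end: all edges used, so this node is final in the cycle.
            cycle.append(stack.pop())
    cycle.reverse()
    return cycle


def spell_circular(cycle):
    """Each edge shifts the alignment by one position, so every step
    contributes the last character of the node it arrives at."""
    return "".join(node[-1] for node in cycle[1:])


if __name__ == "__main__":
    reads = kmers_of_circular("ATGGCGTGCA", 3)  # example genome, k = 3
    cycle = eulerian_cycle(de_bruijn(reads))
    print(spell_circular(cycle))
```

The reconstructed string is a circular sequence, so it may start at a different position from the input. When a (k-1)-mer occurs more than once in the genome, the graph can contain more than one Eulerian cycle, and each one spells a sequence with the same k-mer content.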