Hongyu Zheng, Carl Kingsford, Guillaume Marçais.
Abstract
Universal hitting sets (UHS) are sets of words that are unavoidable: every long enough sequence is hit by the set (i.e., it contains a word from the set). There is a tight relationship between UHS and minimizer schemes, where minimizer schemes with low density (i.e., efficient schemes) correspond to UHS of small size. Local schemes are a generalization of minimizer schemes that can be used as replacements for minimizer schemes, with the possibility of being much more efficient. We establish the link between efficient local schemes and the minimum length of a string that must be hit by a UHS. We give bounds for the remaining path length of the Mykkeltveit UHS. In addition, we create a local scheme with the lowest known density that is only a log factor away from the theoretical lower bound.
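The notions of "hitting" and "remaining path length" can be illustrated with a small brute-force sketch. It is only an illustration under stated assumptions, not the authors' implementation: the function name `remaining_path_length`, the binary alphabet, and the convention of counting path length in k-mers are all choices made here. The sketch removes the words of a candidate set from the de Bruijn graph and measures the longest path that survives; if a cycle survives, arbitrarily long sequences avoid the set, so it is not a UHS.

```python
from itertools import product


def remaining_path_length(hitting_set, k, alphabet="01"):
    """Longest path (counted in k-mers) left in the order-k de Bruijn graph
    once the k-mers of `hitting_set` are removed; None if a cycle survives."""
    nodes = {"".join(p) for p in product(alphabet, repeat=k)} - set(hitting_set)
    longest = {}      # memo: node -> longest path starting at that node
    on_stack = set()  # nodes on the current DFS path, used to detect cycles

    def visit(node):
        if node in longest:
            return longest[node]
        if node in on_stack:
            raise ValueError("cycle")
        on_stack.add(node)
        best = 1
        for c in alphabet:
            nxt = node[1:] + c  # de Bruijn edge: drop first letter, append c
            if nxt in nodes:
                best = max(best, 1 + visit(nxt))
        on_stack.remove(node)
        longest[node] = best
        return best

    try:
        return max((visit(n) for n in nodes), default=0)
    except ValueError:
        return None


# Hypothetical toy sets over binary 2-mers (not taken from the paper).
print(remaining_path_length({"00", "11", "01"}, k=2))  # 1
print(remaining_path_length({"00", "11"}, k=2))        # None: 01 -> 10 -> 01 survives
```

With a remaining path length of 1 in the first call, any binary sequence containing two consecutive 2-mers (that is, any sequence of length 3) includes a word from the set.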
Keywords: de Bruijn graph; depathing set; minimizers; sequence sketch; universal hitting set
Year: 2020 PMID: 33325773 PMCID: PMC8066347 DOI: 10.1089/cmb.2020.0432
Source DB: PubMed Journal: J Comput Biol ISSN: 1066-5277 Impact factor: 1.549
FIG. 1. (a) Example of selecting minimizers with k = 3, w = 5, and the lexicographic order (i.e., A < C < G < T). The top line is the input sequence; each subsequent line is a 7-base window (the number of bases in a window is w + k − 1 = 7) with the minimum 3-mer highlighted. The positions {1, 2, 5, 9, 10, 11} are selected, which determines the density. (b) On the same sequence, an example of a selection scheme (k = 1 because it is a selection scheme, hence the number of bases in a window is also w). The set of positions selected is {1, 6, 7, 8, 11, 13, 14}. This is not a forward scheme, as the sequence of selected positions is not nondecreasing. (c) A forward selection scheme with selected positions {1, 7, 8, 12, 13}. Like the minimizer scheme, the sequence of selected positions is nondecreasing.
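The selection procedure described in panel (a) can be sketched in a few lines. This is a minimal sketch under assumptions, not the authors' code: the function names, the 0-based positions, the leftmost tie-breaking rule, the toy input sequence, and the density convention (distinct selected positions divided by the number of k-mers in the sequence) are all choices made for illustration.

```python
def minimizer_picks(seq, k, w):
    """For each window of w consecutive k-mers (w + k - 1 bases), return the
    0-based start of its lexicographically smallest k-mer (leftmost on ties)."""
    picks = []
    for start in range(len(seq) - (w + k - 1) + 1):
        kmers = [seq[start + i:start + i + k] for i in range(w)]
        offset = min(range(w), key=lambda i: (kmers[i], i))
        picks.append(start + offset)
    return picks


def density(seq, k, w):
    """Assumed convention: distinct selected positions / number of k-mers."""
    return len(set(minimizer_picks(seq, k, w))) / (len(seq) - k + 1)


def is_forward(picks):
    """A scheme behaves as forward on this input if picks never move backward."""
    return all(a <= b for a, b in zip(picks, picks[1:]))


# Hypothetical toy input, not the sequence shown in the figure.
picks = minimizer_picks("ACGTACGT", k=2, w=3)
print(picks)              # [0, 1, 4, 4, 4]
print(is_forward(picks))  # True
```

Consecutive windows often agree on the same k-mer, which is why the number of distinct selected positions (three here) is smaller than the number of windows (five).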
FIG. 2. (a) Mykkeltveit embedding of the de Bruijn graph of order 5 on the binary alphabet. The nodes of a conjugacy class have the same color and form a circle (there is more than one class per circle). The pure rotations are represented by the red edges. A nonpure rotation is a red edge followed by a horizontal shift (blue edge). The set of nodes circled in gray is the Mykkeltveit set. (b) Weight-in embedding of the same graph. Multiple w-mers map to the same position in this embedding, and each circle represents a conjugacy class. The gray dots on the horizontal axis are the w centers of rotation, and the vertical gray lines going through the centers separate the space into subregions of interest.
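The geometry in panel (a) can be checked numerically. The sketch below assumes the embedding sends a binary w-mer a to the sum of a_i ω^(i+1), with ω = e^(2πi/w); this indexing is an assumption chosen so that the shift comes out horizontal, and the function names are hypothetical. Under that assumption, following a de Bruijn edge rotates the point by 1/ω and then shifts it along the real axis by the difference between the incoming and outgoing bits, so a pure rotation (same bit in and out) keeps the point on its circle.

```python
import cmath


def embed(word):
    """Assumed embedding: sum of word[i] * omega**(i + 1), omega = exp(2*pi*i/w)."""
    w = len(word)
    omega = cmath.exp(2j * cmath.pi / w)
    return sum(int(bit) * omega ** (i + 1) for i, bit in enumerate(word))


def successor(word, new_bit):
    """Follow a de Bruijn edge: drop the first letter and append new_bit."""
    return word[1:] + new_bit


word = "00101"  # hypothetical 5-mer on the binary alphabet
omega = cmath.exp(2j * cmath.pi / len(word))
for new_bit in "01":
    rotated = embed(word) / omega               # rotation part (red edge)
    shift = int(new_bit) - int(word[0])         # horizontal shift (blue edge)
    gap = abs(embed(successor(word, new_bit)) - (rotated + shift))
    print(new_bit, shift, gap < 1e-9)
```

When `new_bit` equals the first letter of `word`, the shift is zero and the norm of the embedded point is unchanged, which matches the description of conjugacy classes lying on circles.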
FIG. 3. (a) For w = 40, each set of four arrows of the same color represents a quadruple set of roots of unity. There are a total of five sets. They were crafted so that the four vectors in each set cancel out. (b) The path generated by these quadruple sets. The top circle of radius 1 is traveled many times (between tags r1 and r2 in each quadruple), as after setting the 4 bits to 0, the w-mer has the same norm as the starting point.
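As a worked illustration of how four roots of unity can cancel, consider two antipodal pairs. This is only one simple way to obtain a vanishing quadruple for even w, and the quadruples drawn in the figure may be constructed differently; the exponents a and b are hypothetical. For w = 40, the exponents would be a, a + 20, b, and b + 20.

```latex
\[
\omega = e^{2\pi i / w}, \qquad \omega^{w/2} = e^{\pi i} = -1,
\]
\[
\omega^{a} + \omega^{a + w/2} + \omega^{b} + \omega^{b + w/2}
  = \omega^{a}\,\bigl(1 + \omega^{w/2}\bigr) + \omega^{b}\,\bigl(1 + \omega^{w/2}\bigr)
  = 0 .
\]
```

Because such a quadruple contributes nothing to a sum of roots of unity, setting the four corresponding bits to 0 leaves the embedded point, and hence its norm, unchanged.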