Yaron Orenstein, Robert Puccinelli, Ryan Kim, Polly Fordyce, Bonnie Berger.
Abstract
Sequence libraries that cover all k-mers enable universal, unbiased measurements of binding to both oligonucleotides and peptides. While the number of k-mers grows exponentially in k, space on all experimental platforms is limited. Here, we shrink k-mer library sizes by using joker characters, which represent all characters in the alphabet simultaneously. We present the JokerCAKE (joker covering all k-mers) algorithm for generating a short sequence such that each k-mer appears at least p times with at most one joker character per k-mer. By running our algorithm on a range of parameters and alphabets, we show that JokerCAKE produces near-optimal sequences. Moreover, through comparison with data from hundreds of DNA-protein binding experiments and with new experimental results for both standard and JokerCAKE libraries, we establish that accurate binding scores can be inferred for high-affinity k-mers using JokerCAKE libraries. JokerCAKE libraries allow researchers to search a significantly larger sequence space using the same number of experimental measurements and at the same cost.
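The coverage criterion described above (every k-mer appears at least p times, with at most one joker per k-mer window) can be made concrete with a small checker. The sketch below is a minimal illustration under stated assumptions, not the JokerCAKE algorithm itself: it assumes the joker is written as `N`, and the helper names `kmer_coverage`, `covers_all`, and `linear_de_bruijn` are hypothetical. `linear_de_bruijn` builds a standard de Bruijn sequence (a baseline library in which every k-mer appears exactly once) using the classic Lyndon-word construction; `kmer_coverage` expands each joker in a length-k window into every alphabet character and counts how often each k-mer is covered.

```python
from collections import Counter
from itertools import product


def linear_de_bruijn(alphabet: str, k: int) -> str:
    """Standard de Bruijn sequence (Lyndon-word construction), linearized so
    every k-mer over `alphabet` appears exactly once as a window."""
    n = len(alphabet)
    a = [0] * (k + 1)
    seq: list[int] = []

    def db(t: int, p: int) -> None:
        if t > k:
            if k % p == 0:
                seq.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, n):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    cyclic = "".join(alphabet[i] for i in seq)
    # Append the first k-1 characters so the wrap-around windows become linear.
    return cyclic + cyclic[:k - 1]


def kmer_coverage(seq: str, k: int, alphabet: str, joker: str = "N") -> Counter:
    """Count how many windows of `seq` cover each k-mer, where a joker matches
    any alphabet character. Assumption: windows may hold at most one joker."""
    counts: Counter = Counter()
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window.count(joker) > 1:
            raise ValueError(f"window {window!r} at position {i} has >1 joker")
        # Each joker position ranges over the whole alphabet; others are fixed.
        choices = [alphabet if c == joker else c for c in window]
        for kmer in product(*choices):
            counts["".join(kmer)] += 1
    return counts


def covers_all(seq: str, k: int, p: int, alphabet: str, joker: str = "N") -> bool:
    """True if every k-mer over `alphabet` is covered at least p times."""
    counts = kmer_coverage(seq, k, alphabet, joker)
    return all(counts["".join(km)] >= p for km in product(alphabet, repeat=k))


if __name__ == "__main__":
    lib = linear_de_bruijn("ACGT", 3)
    print(len(lib), covers_all(lib, 3, 1, "ACGT"))  # 66 True
    print(linear_de_bruijn("AC", 2))                # AACCA
    print(covers_all("ANCA", 2, 1, "AC"))           # True
```

On the two-letter alphabet {A, C} with k = 2, the hand-picked sequence `ANCA` covers all four 2-mers in four characters, whereas the linearized de Bruijn sequence `AACCA` needs five. This toy case only illustrates how jokers can shorten a library; it is not output from JokerCAKE, and for p > 1 or larger k the checker would simply report whether a candidate sequence meets the stated coverage requirement.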
Keywords: de Bruijn graph; microarray design; sequence libraries
Year: 2017 PMID: 28957657 PMCID: PMC5661997 DOI: 10.1016/j.cels.2017.07.006
Source DB: PubMed Journal: Cell Syst ISSN: 2405-4712 Impact factor: 10.304