Joker de Bruijn: Covering k-Mers Using Joker Characters.

Yaron Orenstein, Yun William Yu, Bonnie Berger.

Abstract

Sequence libraries that cover all k-mers enable universal and unbiased measurements of nucleotide and peptide binding. The shortest sequence to cover all k-mers is a de Bruijn sequence of length |Σ|^k + k − 1, where Σ is the alphabet. Researchers would like to increase k to measure interactions in greater detail, but face a challenging problem: the number of k-mers grows exponentially in k, while the space on the experimental device is limited. In this study, we introduce a novel advance to shrink k-mer library sizes by using joker characters, which represent all characters in the alphabet. Theoretically, the use of joker characters can reduce the library size tremendously, but it should be limited as the introduced degeneracy lowers the statistical robustness of measurements. In this work, we consider the problem of generating a minimum-length sequence that covers a given set of k-mers using joker characters. The number and positions of the joker characters are provided as input. We first prove that the problem is NP-hard. We then present the first solution to the problem, which is based on two algorithmic innovations: (1) a greedy heuristic and (2) an integer linear programming (ILP) formulation. We first run the heuristic to find a good feasible solution, and then run an ILP solver to improve it. We ran our algorithm on DNA and amino acid alphabets to cover all k-mers for different values of k and k-mer multiplicity. Results demonstrate that it produces sequences that are very close to the theoretical lower bound.
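
The two ideas the abstract relies on, covering all k-mers with a de Bruijn sequence and widening coverage with joker characters, can be illustrated with a minimal Python sketch. It is not the authors' greedy heuristic or ILP formulation. The helper names `de_bruijn` and `covered_kmers` and the use of `N` as the joker symbol are assumptions made for illustration only. The de Bruijn construction is the standard Lyndon-word concatenation method, swapped in here because the abstract does not say how the baseline sequence is built.

```python
from itertools import product

JOKER = "N"  # assumed joker symbol; it stands for every character of the alphabet


def de_bruijn(alphabet, k):
    """Return a linear sequence containing every k-mer over `alphabet` exactly once.

    Uses the standard Lyndon-word concatenation construction, then unrolls
    the cyclic result by appending its first k - 1 characters.
    """
    n = len(alphabet)
    a = [0] * (n * k)
    cyclic = []

    def db(t, p):
        if t > k:
            if k % p == 0:
                cyclic.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, n):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    s = "".join(alphabet[i] for i in cyclic)
    return s + s[:k - 1]


def covered_kmers(seq, k, alphabet):
    """Return the set of concrete k-mers covered by `seq`.

    A window containing jokers covers every k-mer obtained by substituting
    each joker with any character of the alphabet.
    """
    covered = set()
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        choices = [alphabet if c == JOKER else c for c in window]
        covered.update("".join(p) for p in product(*choices))
    return covered


if __name__ == "__main__":
    dna = "ACGT"
    s = de_bruijn(dna, 3)
    print(len(s))                                # 4**3 + 3 - 1 = 66
    print(len(covered_kmers(s, 3, dna)))         # 64, i.e. all DNA 3-mers
    print(sorted(covered_kmers("ANA", 3, dna)))  # one window, four 3-mers
    print(len(covered_kmers("NN", 2, dna)))      # 16, all 2-mers from length 2
```

The last line shows the trade-off in its extreme form: a sequence made only of jokers covers every k-mer in k characters, but each measurement then averages over the whole alphabet, which is why the number and positions of jokers are treated as a constrained input.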

Keywords:  de Bruijn sequence; microarray library design; peptide arrays; protein binding; protein binding microarrays

Year:  2018        PMID: 30117747      PMCID: PMC6247992          DOI: 10.1089/cmb.2018.0032

Source DB:  PubMed          Journal:  J Comput Biol        ISSN: 1066-5277            Impact factor:   1.479


