Sven Rahmann
Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Berlin, Germany. Sven.Rahmann@molgen.mpg.de
Abstract
MOTIVATION: During microarray production, several thousand oligonucleotides (short DNA sequences) are synthesized in parallel, one nucleotide at a time. We want to find the shortest possible nucleotide deposition sequence that synthesizes all oligos, in order to reduce production time and increase oligo quality. We therefore study the shortest common super-sequence problem for several thousand short strings over a four-letter alphabet.

RESULTS: We present a statistical analysis of the basic ALPHABET-LEFTMOST approximation algorithm and propose several practical heuristics to reduce the length of the super-sequence. Our results show that in the microarray production setting it is hard to beat ALPHABET-LEFTMOST by more than two characters, but these savings can improve overall oligo quality by more than four percent.

AVAILABILITY: Source code in C may be obtained by contacting the author or from http://oligos.molgen.mpg.de.
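To make the baseline concrete, here is a minimal sketch of a leftmost, alphabet-cycling deposition strategy. It assumes that ALPHABET-LEFTMOST cycles through the nucleotides in the fixed order A, C, G, T, appends a nucleotide only when at least one oligo needs it as its next unsynthesized character, and then advances every such oligo (a greedy leftmost embedding). The function name `alphabet_leftmost` and its interface are hypothetical and illustrative only; this is not the author's released C code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not the author's released code): greedy
 * ALPHABET-LEFTMOST-style deposition sequence for n oligos over {A,C,G,T}.
 * Writes a NUL-terminated super-sequence into out (capacity cap) and returns
 * its length, or (size_t)-1 on invalid input or an undersized buffer. */
size_t alphabet_leftmost(const char *const *oligos, size_t n,
                         char *out, size_t cap)
{
    static const char cycle[] = "ACGT"; /* assumed fixed deposition order */
    size_t *pos;                        /* next unsynthesized index per oligo */
    size_t len = 0, k = 0;

    assert(oligos != NULL || n == 0);
    if (cap == 0)
        return (size_t)-1;
    for (size_t i = 0; i < n; i++)      /* reject characters outside ACGT */
        if (oligos[i][strspn(oligos[i], cycle)] != '\0')
            return (size_t)-1;

    pos = calloc(n ? n : 1, sizeof *pos);
    if (pos == NULL)
        return (size_t)-1;

    for (;;) {
        char c = cycle[k];
        int remaining = 0, needed = 0;
        k = (k + 1) % 4;

        for (size_t i = 0; i < n; i++) {
            char next = oligos[i][pos[i]];
            if (next != '\0') {
                remaining = 1;
                if (next == c)
                    needed = 1;
            }
        }
        if (!remaining)
            break;                      /* every oligo is fully synthesized */
        if (!needed)
            continue;                   /* skip c: no oligo needs it next */

        if (len + 1 >= cap) {           /* keep room for the terminator */
            free(pos);
            return (size_t)-1;
        }
        out[len++] = c;
        for (size_t i = 0; i < n; i++)  /* leftmost embedding: advance greedily */
            if (oligos[i][pos[i]] == c)
                pos[i]++;
    }
    out[len] = '\0';
    free(pos);
    return len;
}
```

For example, the oligos `ACG` and `TTA` yield the deposition sequence `ACGTTA` (length 6): after `ACGT`, the next A, C and G steps are skipped because no oligo needs them, so the cycle resumes at T. Because every oligo is over {A,C,G,T}, each full cycle of four steps advances at least one unfinished oligo, so the loop always terminates.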