A Bolshoy (1), I Ioshikhes, E N Trifonov.
(1) Department of Membranes Research and Biophysics, Weizmann Institute of Science, Rehovot, Israel. bmbolsho@dapsas1.weizmann.ac.il
Abstract
MOTIVATION: The nucleosome DNA positioning pattern is known to be one of the weakest (highly degenerate) sequence patterns. The alignment procedure recently developed for extracting such a pattern is based on statistical matching of the sequences, and its success depends on the pattern/background ratio in the individual sequences and in the generated pattern. The heuristic nature of the method and the distinctive properties of the pattern raise the question of the efficiency and sensitivity of the procedure. This paper presents a method of verification for this multiple sequence alignment algorithm.
RESULTS: To verify the applicability of the multiple alignment approach, we constructed a set of sequences carrying a hidden pattern. The pattern was represented by weak ('signal') oscillations in the occurrence of AA and TT dinucleotides along otherwise random sequences. Only a few dinucleotides of any given 145-base-long sequence corresponded to the signal, appearing in about the same phase within the simulated periodic pattern. The novelty of our simulation approach is that we simulated the database as a whole, as opposed to simulating each sequence separately. The correlation between the hidden pattern and a sequence from the database is negligible on average, but our statistical multicycle alignment procedure produced a pattern with attributes very close to the simulated ones. The accuracy of the procedure was tested and calibrated. The presence in a typical sequence of as few as three dinucleotides corresponding to the signal is sufficient to generate (detect) the pattern hidden in a collection of 204 sequences.
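The kind of test set described in RESULTS can be illustrated with a minimal Python sketch. It is not the authors' simulation: the sequence length (145), database size (204) and three signal dinucleotides per sequence come from the abstract, while the 10-base period, the uniform random background, the function names `simulate_database` and `dinucleotide_profile`, and the fact that all sequences are generated already in phase are assumptions made only for illustration. The multicycle alignment procedure itself is not reproduced.

```python
import random

SEQ_LEN = 145        # from the abstract
N_SEQS = 204         # from the abstract
SIGNAL_PER_SEQ = 3   # from the abstract: three signal dinucleotides per sequence
PERIOD = 10.0        # ASSUMPTION: the abstract does not state the period


def simulate_database(n_seqs=N_SEQS, seq_len=SEQ_LEN, period=PERIOD,
                      signal_per_seq=SIGNAL_PER_SEQ, seed=0):
    """Random sequences with a few AA/TT dinucleotides planted in phase.

    Hypothetical helper: background bases are uniform over ACGT and the
    sequences are produced already aligned (no random offsets).
    """
    rng = random.Random(seed)
    # Positions where the periodic signal may place a dinucleotide.
    peaks = [round(k * period) for k in range(int((seq_len - 2) // period) + 1)]
    database = []
    for _ in range(n_seqs):
        seq = [rng.choice("ACGT") for _ in range(seq_len)]
        # Plant only a few signal dinucleotides in each sequence.
        for pos in rng.sample(peaks, signal_per_seq):
            dinuc = rng.choice(["AA", "TT"])
            seq[pos], seq[pos + 1] = dinuc[0], dinuc[1]
        database.append("".join(seq))
    return database


def dinucleotide_profile(database, dinucs=("AA", "TT")):
    """Count AA/TT occurrences at each position across the whole database."""
    seq_len = len(database[0])
    counts = [0] * (seq_len - 1)
    for seq in database:
        for i in range(seq_len - 1):
            if seq[i:i + 2] in dinucs:
                counts[i] += 1
    return counts


if __name__ == "__main__":
    db = simulate_database()
    profile = dinucleotide_profile(db)
    print(profile[:25])
```

In this sketch each individual sequence carries only three planted dinucleotides against a random background, yet the position-wise counts summed over the database rise at the planted phase. Recovering that phase from unaligned sequences is the task the alignment procedure addresses.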
Authors: Noam Kaplan; Irene K Moore; Yvonne Fondufe-Mittendorf; Andrea J Gossett; Desiree Tillo; Yair Field; Emily M LeProust; Timothy R Hughes; Jason D Lieb; Jonathan Widom; Eran Segal Journal: Nature Date: 2008-12-17 Impact factor: 49.962
Authors: Travis N Mavrich; Ilya P Ioshikhes; Bryan J Venters; Cizhong Jiang; Lynn P Tomsho; Ji Qi; Stephan C Schuster; Istvan Albert; B Franklin Pugh Journal: Genome Res Date: 2008-06-12 Impact factor: 9.043