Gilles Didier (1), Carito Guziolowski.
(1) Institut de Mathématiques de Luminy, 163 avenue de Luminy, Case 907, 13288 Marseille Cedex 9, France. didier@iml.univ-mrs.fr
Abstract
BACKGROUND: We present the N-map method, a pairwise and asymmetric approach for comparing sequences that takes into account evolutionary events producing shuffled, reversed or repeated elements. The optimal N-map of a sequence s over a sequence t is the best way of partitioning s into N parts and placing each part, possibly reverse-complemented, over t so as to maximize the sum of their gapless alignment scores.
RESULTS: We introduce an algorithm that computes an optimal N-map in O(|s| × |t| × N) time using O(|s| × |t| × N) memory. Among all numbers of parts within a reasonable range, we select the value N for which the optimal N-map has the most significant score. To evaluate this significance, we study the empirical distributions of the scores of optimal N-maps and show that they can be approximated by normal distributions with reasonable accuracy. We test the approach on random sequences to which we apply artificial evolutionary events.
PRACTICAL APPLICATION: The method is illustrated with four case studies of pairs of sequences involving non-standard evolutionary events.
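The definition of an optimal N-map can be made concrete with a small dynamic program. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function name `optimal_n_map_score`, the +1/-1 match/mismatch scoring, and the `COMP` table are hypothetical choices for illustration. It further assumes that each part must lie entirely within t and that parts may overlap on t (so that a repeated element can be mapped more than once). It returns only the optimal score, so it keeps two rows of the table; recovering the parts themselves would require storing the full table for a traceback.

```python
NEG = float("-inf")
# Watson-Crick complement table (assumption: plain DNA alphabet).
COMP = {"A": "T", "C": "G", "G": "C", "T": "A"}


def optimal_n_map_score(s, t, n, match=1, mismatch=-1):
    """Best total gapless score of splitting s into exactly n contiguous
    parts, each placed on t either as-is or reverse-complemented.

    Returns -inf when no valid placement exists (e.g. a part would have
    to be longer than t).
    """
    if not 1 <= n <= len(s) or not t:
        raise ValueError("need 1 <= n <= len(s) and a non-empty t")
    m = len(t)

    def sub(a, b):
        return match if a == b else mismatch

    # best_prev[k]: best score of a k-map of the prefix read so far.
    best_prev = [0] + [NEG] * n  # empty prefix: zero parts, score 0
    # fwd_prev[k][p] / rev_prev[k][p]: best k-map of the prefix whose last
    # part is forward / reverse-complemented and ends with its last
    # character of s aligned to t[p].
    fwd_prev = [[NEG] * m for _ in range(n + 1)]
    rev_prev = [[NEG] * m for _ in range(n + 1)]

    for c in s:
        fwd = [[NEG] * m for _ in range(n + 1)]
        rev = [[NEG] * m for _ in range(n + 1)]
        best = [NEG] * (n + 1)
        for k in range(1, n + 1):
            start = best_prev[k - 1]  # open a new part at this character
            for p in range(m):
                # Forward parts advance left to right on t.
                f_ext = fwd_prev[k][p - 1] if p > 0 else NEG
                fwd[k][p] = sub(c, t[p]) + max(f_ext, start)
                # Reverse-complemented parts advance right to left on t.
                r_ext = rev_prev[k][p + 1] if p + 1 < m else NEG
                rev[k][p] = sub(c, COMP.get(t[p])) + max(r_ext, start)
            best[k] = max(max(fwd[k]), max(rev[k]))
        fwd_prev, rev_prev, best_prev = fwd, rev, best

    return best_prev[n]


if __name__ == "__main__":
    # Two swapped blocks: one part scores poorly, two parts recover both.
    print(optimal_n_map_score("AAACCC", "CCCAAA", 1))
    print(optimal_n_map_score("AAACCC", "CCCAAA", 2))
```

The three nested loops visit each (character of s, position of t, part count) triple once, which is consistent with the O(|s| × |t| × N) time bound quoted in the abstract. The asymmetry of the method is visible here as well: s must be covered completely, whereas positions of t may be used several times or not at all.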