
Sorting signed circular permutations by super short operations.

Andre R Oliveira¹, Guillaume Fertin², Ulisses Dias³, Zanoni Dias¹.

Abstract

BACKGROUND: One way to estimate the evolutionary distance between two given genomes is to determine the minimum number of large-scale mutations, or genome rearrangements, that are necessary to transform one into the other. In this context, genomes can be represented as ordered sequences of genes, each gene being represented by a signed integer. If no gene is repeated, genomes are thus modeled as signed permutations of the form π = (π1 π2 … πn), and in that case we can consider without loss of generality that one of them is the identity permutation ιn = (1 2 … n), and that we just need to sort the other (i.e., transform it into ιn). The most studied genome rearrangement events are reversals, where a segment of the genome is reversed and reincorporated at the same location; and transpositions, where two consecutive segments are exchanged. Many variants, e.g., combining different types of (possibly constrained) rearrangements, have been proposed in the literature. One of them considers that the number of genes involved, in a reversal or a transposition, is never greater than two, which is known as the problem of sorting by super short operations (or SSOs).

RESULTS AND CONCLUSIONS: All problems considering SSOs in permutations have been shown to be in P, except for one, namely sorting signed circular permutations by super short reversals and super short transpositions. Here we fill this gap by introducing a new graph structure called cyclic permutation graph and providing a series of intermediate results, which allows us to design a polynomial algorithm for sorting signed circular permutations by super short reversals and super short transpositions.


Keywords: Circular permutations; Genome rearrangements; Super short operations

Year: 2018 PMID: 30065782 PMCID: PMC6060566 DOI: 10.1186/s13015-018-0131-6

Source DB: PubMed Journal: Algorithms Mol Biol ISSN: 1748-7188 Impact factor: 1.405

Background

In bioinformatics, comparative genomics consists in analyzing the contents of two (or more) genomes in order to extract information. In particular, estimating the evolutionary distance between two extant species can be achieved by counting the minimum number of large-scale evolution events (called genome rearrangements) that separate two genomes. This is usually modeled as the following algorithmic problem: given two genomes G1 and G2 represented as ordered sequences of (possibly signed) genes, and a set S of allowed genome rearrangement events, determine the minimum number of events from S needed to obtain G2 from G1. The first such genome rearrangement problems were studied in the 1990s, and the topic has given rise to a very large literature since then (see, for example, Fertin et al. [1] for a survey). Two well-studied rearrangements are reversals, in which a segment of the genome is reversed and reincorporated at the same location, and transpositions, where two consecutive segments are exchanged. If every gene appears exactly once in G1 (resp. G2), a genome can be represented by a (possibly signed) permutation, and one can without loss of generality rewrite the two input genomes into G1' and G2', leaving the distance unchanged, in such a way that G2' is the positive identity permutation ιn = (1 2 … n), where n is the number of genes in G1 and G2. In that case, we talk about sorting genome G1'. Sorting by reversals has been shown to be in P for signed genomes [2] and to be NP-hard for unsigned genomes [3], where the best known approximation factor is 1.375 [4]. Sorting by transpositions is NP-hard in unsigned genomes [5], and the best known approximation factor is 1.375 [6]. Sorting by reversals and transpositions is of unknown complexity both for signed and unsigned permutations, and the best known approximation factors are 2 for signed permutations [7] and 2k for unsigned permutations [8], where k is the approximation factor of the algorithm used for cycle decomposition [9].
Many other variants have been considered, notably considering different combinations and constraints for the set of allowed rearrangements [1]. One of these variants considers that the number of genes involved in any rearrangement in S is never greater than two; such rearrangements are called super short operations (or SSOs). Although such models are of more theoretical interest, they are also motivated by the fact that rearrangements affecting large portions of a genome are less likely to occur [10] and that short reversals are prevalent in the evolution of some species [11, 12]. Sorting by SSOs has been studied in linear and circular genomes, signed and unsigned, when the allowed operations are reversals and/or transpositions. To cover circular genomes adequately, we will define in “Genome representation and super short operations” section cyclic SSOs, a particular type of SSOs which modify the permutation cyclically. On (a) unsigned permutations, a super short reversal has the same effect as a super short transposition, which results in two different versions: (a.1) Sorting Permutations by SSOs and (a.2) Sorting Permutations by cyclic SSOs. Besides, as we will see in the next section, since transpositions cannot change the signs of elements, we do not consider transpositions alone on (b) signed permutations: this operation must be used together with super short reversals. This results in four different problems: (b.1) Sorting Signed Permutations by Super Short Reversals, (b.2) Sorting Signed Permutations by cyclic Super Short Reversals, (b.3) Sorting Signed Permutations by SSOs, and (b.4) Sorting Signed Permutations by cyclic SSOs. The summary of problems and the results known so far are shown in Table 1. In all cases the problem has been shown to be in P, except for the latter case, which was left open, and which we solve in this paper.
More precisely, we prove that sorting signed circular permutations by super short reversals and transpositions is in P, thereby closing a gap in the literature concerning super short operations.

Table 1

List of sorting by SSOs and cyclic SSOs problems

Permutation type	Allowed genome rearrangement events	Polynomial-time algorithm
Unsigned	(a.1) Super short operations	[15]
Unsigned	(a.2) Cyclic super short operations	[13]
Signed	(b.1) Super short reversals	[16]
Signed	(b.2) Cyclic super short reversals	[14]
Signed	(b.3) Super short operations	[16]
Signed	(b.4) Cyclic super short operations	This paper

This paper is organized as follows. “Preliminaries and notations” section presents some important concepts and notations that we use throughout the paper, notably the cp-graph that we introduce here and use extensively. “Related results” section reviews Sorting Permutations by SSOs and cyclic SSOs. In “Sorting Signed Permutations by cyclic SSOs” section, we provide a series of intermediate results, which allows us to design a polynomial algorithm for sorting signed linear permutations by cyclic super short reversals and transpositions. From this, we derive our main result, i.e., a proof that sorting signed circular permutations by super short reversals and transpositions is in P. “Conclusion” section concludes the paper.

Preliminaries and notations

In this section we present the important concepts and notations that we use throughout the paper.

Genome representation and super short operations

A genome g can be transformed into a reduced mathematical representation by modeling it as an n-tuple whose elements represent its genes. In this paper, we assume that g contains no duplicated genes, thus the n-tuple is a permutation , with and whenever . Each element has a sign, or −, indicating the gene orientation, and we say that is a signed permutation. If the genome represented by is circular, then the elements and are considered to be adjacent and we say that is a circular permutation (also called n-cycle in the literature); otherwise is a linear permutation. Given two permutations and , the composition between and , denoted by results in the permutation . If , then , and otherwise. The inverse of , denoted by , is the permutation such that . That said, we can rewrite a pair of permutations as , with , such that the distance between permutations and is the same as the distance between permutations and . For example, if and , we have that , so the distance between permutations and is the same as the distance between and A reversal , , is a rearrangement that reverses the order and signs of the genes in the subset of adjacent elements . More precisely, it transforms the permutation into . A cyclic reversal is the extension of a reversal to the case where : if , whereas if , then the subset reversed by is . A reversal (resp. cyclic reversal) (resp. ) is called a z-reversal, where . We say that a z-reversal is super short if A transposition , , is a rearrangement that transforms into . In other words, exchanges subsets of adjacent elements and Note that, since these subsets are not reversed, transpositions never change signs. As for reversals, a cyclic transposition is the extension of transpositions to the cases (a) (in which subsets and are exchanged), and (b) (in which subsets and are exchanged). A transposition (resp. cyclic transposition) (resp. 
) is called a z-transposition, where with and , and we say that a z-transposition is super short if In the remainder of the paper, we will call a super short operation (or SSO) any 1-reversal, 2-reversal, or 2-transposition; moreover, any SSO that is not a 1-reversal will be called a swap, and the swap between elements and is denoted by Given a permutation , the sorting distance of , denoted by , is the length of a minimum-length sequence of SSOs needed to sort . This paper is devoted to finding the smallest number of SSOs that are needed to sort a signed circular permutation of size n using for that a polynomial algorithm designed for sorting signed linear permutations by cyclic super short reversals and transpositions.
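
The three super short operations can be made concrete with a small brute-force sketch. It assumes a signed permutation is stored as a tuple of nonzero integers, and that in the cyclic model the last and first positions form one additional adjacent pair; the function names `neighbors` and `sso_distance` are hypothetical. The search is exponential and only meant for checking tiny examples by hand; it is not the polynomial algorithm developed in this paper.

```python
from collections import deque

def neighbors(perm, cyclic=False):
    """Yield every signed permutation reachable from `perm` by one SSO."""
    n = len(perm)
    # 1-reversals: flip the sign of a single element.
    for i in range(n):
        p = list(perm)
        p[i] = -p[i]
        yield tuple(p)
    # Swaps act on adjacent positions; the cyclic model (an assumption of
    # this sketch) also pairs the last position with the first one.
    pairs = [(i, i + 1) for i in range(n - 1)]
    if cyclic and n > 2:
        pairs.append((n - 1, 0))
    for i, j in pairs:
        p = list(perm)
        p[i], p[j] = perm[j], perm[i]      # 2-transposition: signs kept
        yield tuple(p)
        p[i], p[j] = -perm[j], -perm[i]    # 2-reversal: order and signs flipped
        yield tuple(p)

def sso_distance(perm, cyclic=False):
    """Breadth-first search from `perm` to the positive identity."""
    start = tuple(perm)
    goal = tuple(range(1, len(perm) + 1))
    dist = {start: 0}
    queue = deque([start])
    while queue:
        cur = queue.popleft()
        if cur == goal:
            return dist[cur]
        for nxt in neighbors(cur, cyclic):
            if nxt not in dist:
                dist[nxt] = dist[cur] + 1
                queue.append(nxt)
    raise ValueError("identity not reachable")
```

For instance, under these assumptions `sso_distance((3, 2, 1))` needs three swaps in the linear model, while `sso_distance((3, 2, 1), cyclic=True)` needs a single cyclic 2-transposition.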

VD-vector, crossing value, and crossing number

Most of the present paper will be concerned with cyclic SSOs in linear permutations. This section introduces a structure (called valid displacement vector) that allows us to compute the minimum number of cyclic swaps (i.e., 2-reversals or 2-transpositions) that put every element in its correct position (this number is called crossing number). It is important to note that this structure does not take into account the signs of the elements mainly for two reasons: (i) it does not make a distinction between 2-reversals and 2-transpositions (both are swaps), and (ii) it does not take into account 1-reversals, since they are not swaps by definition. Given a sequence of k cyclic SSOs that sort a linear permutation ( is also called a sorting sequence for ), and given , we denote by (resp. ) the number of cyclic SSOs in that move to the right (resp. to the left). For any , the displacement value of with respect to is given by , and the displacement vector of associated to is For instance, let us consider the permutation and the sequence of cyclic SSOs that sorts . The sequence results in the following sequence of swaps: Note that and , so . One can also see that Let be a displacement vector and let be a permutation. We say that X is a valid displacement vector (or VD-vector) for if (i.e., for each element of that moves one position to the right, another must move one position to the left and vice versa) and for (i.e., every element must be in its correct position at the end). For instance, is a VD-vector for since and for Given a VD-vector and two distinct integers , let and Note that r measures how distant the element is from in (if (resp. ) then is located in a position to the left (resp. right) of ), while s measures how distant the element will be from in their final positions, i.e., positions for and for . 
The crossing value between i and j, , with respect to X is defined as follows:In other words, represents the minimum number of times and are swapped (by a 2-reversal or a 2-transposition) in , a sorting sequence X is associated to. The sign of is positive (resp. negative) if is to the left (resp. right) of before the swap between these two elements takes place. For this reason, is undefined, and for any . Besides, for If X is a VD-vector associated to some sorting sequence for , we say that X induces a swap between two elements and if and . We also say that is induced by X. Given any VD-vector X for , there exists at least one sorting sequence such that . For instance, we can use 2-transpositions to apply the swaps induced by X (that will put every element in its correct position) followed by a sequence of 1-reversals applied to every negative element. The crossing number of a VD-vector X is defined as . Informally, represents the minimum number of swaps in the sorting sequence X is associated to. Take again and the VD-vector . Given and , we have that and , and we have the crossing value . Now for and , we have that and , and we have . After computing every crossing value, we obtain the crossing number Given a VD-vector X for a permutation , we denote by the size of a minimum-length sequence of SSOs needed to sort by applying swaps induced by X. Formally, , where y is the minimum number of 1-reversals over all sorting sequences having associated VD-vector X. Given two integers , we define the transformation over a VD-vector as the one that creates the VD-vector with , for , , and After such transformation is applied, each crossing value of the form and is one unit smaller than and , respectively, and each crossing value of the form and is one unit larger than and , respectively, with , , and . Besides, (resp. ) is two units larger (resp. smaller) than (resp. The following property was given by Jerrum [13]. 
Note that the author mistakenly wrote , which was later corrected by Galvão et al. [14].

Property 1

Let be a VD-vector for , and let . Then A transformation is called contracting (resp. strictly contracting) if and only if (resp. ), which implies by Property 1 that (resp. ). If a VD-vector X admits no strictly contracting transformation, we have that for any , and thus for any VD-vector Y, Given the VD-vector for , we obtain the vector with and , so Note that is also a VD-vector for , which means there exists a sorting sequence for such that . By Property 1,
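
The definitions of this section can be sketched in code. The formulas are not legible in this copy of the text, so the sketch below assumes the standard definitions attributed to Jerrum [13] as corrected in [14]: positions are 1-indexed, x_i is the displacement of the element at position i, a vector is valid when its entries sum to zero and i + x_i is congruent to |π_i| modulo n, the crossing value counts the multiples of n between r = i − j and s = (i + x_i) − (j + x_j), and the transformation subtracts n from x_i and adds n to x_j. All function names are hypothetical.

```python
def is_vd_vector(perm, x):
    """Check the two (assumed) VD-vector conditions for `perm`."""
    n = len(perm)
    lands_home = all((i + x[i - 1] - abs(perm[i - 1])) % n == 0
                     for i in range(1, n + 1))
    return sum(x) == 0 and lands_home

def multiples_in(lo, hi, n):
    """Number of integers k with lo <= k <= hi that are divisible by n."""
    return hi // n - (lo - 1) // n

def crossing_value(x, i, j):
    """Signed crossing value between positions i and j (i != j)."""
    n = len(x)
    r = i - j
    s = (i + x[i - 1]) - (j + x[j - 1])
    if r <= s:
        return multiples_in(r, s, n)
    return -multiples_in(s, r, n)

def crossing_number(x):
    """Half the sum of |c_ij| over all ordered pairs of distinct positions."""
    n = len(x)
    total = sum(abs(crossing_value(x, i, j))
                for i in range(1, n + 1)
                for j in range(1, n + 1) if i != j)
    return total // 2

def transform(x, i, j):
    """Assumed transformation: x_i loses n units and x_j gains n units."""
    n = len(x)
    y = list(x)
    y[i - 1] -= n
    y[j - 1] += n
    return y
```

As an illustration (not an example from the paper), the permutation (3 2 1) admits the VD-vector (2, 0, −2) with crossing number 3; applying `transform` with i = 1 and j = 3 yields (−1, 0, 1), which is still valid and has crossing number 1.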

Cyclic permutation graph

Since a VD-vector does not take into account the signs of the elements, we will introduce a new graph structure called cyclic permutation graph. This graph is constructed based on VD-vectors, and will help us to determine the minimum number of SSOs that sorts a permutation (now taking into account the signs of the elements and also considering 1-reversals). Given a VD-vector X for a permutation , we define the cyclic permutation graph (or cp-graph) of X and , as the undirected graph , with and We associate weights to edges of as follows: if and , then . Note that, by construction, we have . If every edge satisfies , then for any vertex has at least edges (since ), and thus the connected component that contains has at least vertices. We denote by the number of connected components of . Moreover, a connected component of is said to be odd if it contains an odd number of vertices such that , and is said to be even otherwise. The number of odd connected components in is denoted Let be a permutation, be a VD-vector for , and be a cyclic SSO induced by X (i.e., is a 2-reversal or a 2-transposition) applied to two adjacent elements and of . The resulting VD-vector for is such that for every , and . Moreover, , and the cp-graph can be obtained from by decreasing the weight of the edge between vertices and by 1 (or by removing that edge if its previous weight was one). For instance, take again , , and . We can apply the 2-reversal that is induced by to obtain the permutation and the VD-vector . The corresponding cp-graphs , , and are given in Fig. 1. In Fig. 1a we have and ; in Fig. 1b we have , , and ; in Fig. 1c we have , , and .

Fig. 1

In a we have the cp-graph for and , with , , and . In b we have the cp-graph for . By Property 1 (recall that is also the sum of weights in the graph ), , and . We can see in and that the 2-reversal is induced by but not by X; c the cp-graph for and its VD-vector with , , and
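
A minimal sketch of the cp-graph construction follows. It assumes the same Jerrum-style crossing value as above (the formula itself is not legible in this copy), takes positions 1..n as vertices, creates an edge of weight |c_ij| whenever the crossing value is nonzero, and calls a component odd when it contains an odd number of positions holding a negative element. The names `cp_graph` and `component_counts` are hypothetical.

```python
def crossing_value(x, i, j):
    """Assumed Jerrum-style crossing value for 1-indexed positions i != j."""
    n = len(x)
    r = i - j
    s = (i + x[i - 1]) - (j + x[j - 1])
    lo, hi = min(r, s), max(r, s)
    count = hi // n - (lo - 1) // n   # multiples of n inside [lo, hi]
    return count if r <= s else -count

def cp_graph(x):
    """Weighted edge map {(i, j): |c_ij|} over positions i < j."""
    n = len(x)
    edges = {}
    for i in range(1, n + 1):
        for j in range(i + 1, n + 1):
            w = abs(crossing_value(x, i, j))
            if w:
                edges[(i, j)] = w
    return edges

def component_counts(perm, edges):
    """Return (number of components, number of odd components)."""
    n = len(perm)
    parent = list(range(n + 1))       # union-find over vertices 1..n

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for i, j in edges:
        parent[find(i)] = find(j)
    negatives = {}                    # root -> number of negative elements
    for v in range(1, n + 1):
        root = find(v)
        negatives[root] = negatives.get(root, 0) + (perm[v - 1] < 0)
    odd = sum(1 for c in negatives.values() if c % 2 == 1)
    return len(negatives), odd
```

For the illustrative permutation (−3 2 1) with VD-vector (−1, 0, 1), the graph has a single edge between positions 1 and 3, hence two components, one of which is odd. The sum of the edge weights equals the crossing number of the vector.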

Related results

In this section, we provide related results of solving Sorting Permutations by SSOs and cyclic SSOs problems.

Sorting by SSOs

Given a permutation , a pair of elements is called an inversion if and , with and . Let be the number of inversions in . Knuth [15, p. 108] showed in 1973 that Sorting Unsigned Permutations by SSOs belongs to and that the sorting distance for this version is . For instance, taking the unsigned permutation , we have that so it follows that . The number of inversions is a natural lower bound for Sorting Signed Permutations by SSOs: for any signed permutation , . Galvão et al. [16] proved in 2015 that Sorting Signed Permutations by Super Short Operations is in . Let be the inversion graph of the signed permutation . is such that is formed by the elements of and is formed by the pairs of inversions in . A component in is odd if it contains an odd number of negative elements, and it is even otherwise. The authors showed that there exists a minimum sorting sequence for this problem that uses swaps plus k 1-reversals, such that k is the number of odd components in . For instance, taking the signed permutation , we have that has two components: an odd component with the element only, and an even component with the remaining elements, so it follows that . Note that the sorting distance for is decreased by two compared to the version that allows only super short reversals.
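
The closed formula described above for the linear signed case (number of inversions plus number of odd components of the inversion graph) can be sketched as follows. This is an illustrative reading of the result attributed to [16], with hypothetical function names and 0-indexed positions; it returns only the distance, not a sorting sequence.

```python
def inversions(perm):
    """Pairs of positions (i, j), i < j, whose elements are out of order."""
    n = len(perm)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if abs(perm[i]) > abs(perm[j])]

def odd_components(perm, edges):
    """Count components holding an odd number of negative elements."""
    n = len(perm)
    parent = list(range(n))           # union-find over positions 0..n-1

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for i, j in edges:
        parent[find(i)] = find(j)
    negatives = {}
    for v in range(n):
        root = find(v)
        negatives[root] = negatives.get(root, 0) + (perm[v] < 0)
    return sum(1 for c in negatives.values() if c % 2 == 1)

def signed_sso_distance(perm):
    """Inversions (swaps) plus odd components (1-reversals)."""
    inv = inversions(perm)
    return len(inv) + odd_components(perm, inv)
```

For example, `signed_sso_distance((2, -1, 3))` counts one inversion and one odd component, while `signed_sso_distance((-2, -1, 3))` counts one inversion and no odd component, since a single 2-reversal fixes both the order and the signs.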

Sorting by cyclic SSOs

Jerrum [13] showed in 1985 that Sorting Unsigned Permutations by cyclic SSOs belongs to . The author proved that the sorting distance for this version is is a VD-vector for . Take the permutation again. We have that , with , is a VD-vector for . Besides, for any VD-vector for , , so it follows that . Note that the sorting distance for decreases from 10 to 6 by allowing cyclic SSOs. In 2016, Galvão et al. [14] proved that Sorting Signed Permutations by cyclic super short reversals is also in . Given a signed permutation and a VD-vector X, let be the set of elements from such that is even and , and let be the set of elements from such that is odd and . In a similar way as in Sorting Signed Permutations by SSOs, the authors proved that, given any VD-vector X for such that X has the minimum crossing number over all VD-vectors for , the sorting distance of is precisely , where k is the number of elements in . Taking and (recall that any VD-vector for is such that ), we have , , and , so it follows that . Compared to the version that does not allow cyclic super short reversals, the sorting distance for decreases from 13 to 11. For Sorting Signed Permutations by cyclic SSOs, a trivial lower bound comes from the unsigned version with cyclic SSOs: is a VD-vector for . Inspired by , we defined in “Cyclic permutation graph” section the cp-graph, creating edges according to the crossing values of a VD-vector X instead of inversions. Although these graphs are different, the classification of odd and even components is the same and will be useful later. Note that in all previous problems showed in this section the sorting distance is always associated with the minimum number of inversions or the minimum crossing number. What makes Sorting Signed Permutations by cyclic SSOs not trivial is that, as we will see later, unlike all previous problems a minimum sorting sequence is not necessarily associated to a VD-vector with minimum crossing number (see Fig. 5 for an example).

Fig. 5

Given , a and b show the two cp-graphs and for VD-vectors and . and are the two VD-vectors with minimum crossing number (i.e., ), so . Note that (resp. ) can be obtained from (resp. ) by (resp. ), so Algorithm 1 will generate both VD-vectors, starting either with or . Note that , so . In c we have the cp-graph for VD-vector with , so , but since . Note that , so . Among all VD-vectors in , is in fact the VD-vector that minimizes the sum and it follows that

Sorting linear permutations by cyclic SSOs vs. sorting circular permutations by SSOs

Note that, although sorting linear permutations by cyclic SSOs and sorting circular permutations by SSOs are different problems, we can use the first to solve the latter. Just as an example, the permutation has a sorting distance of 8, considering the model that only allows SSOs, and it has a sorting distance of 4 considering the model that allows cyclic SSOs. But if is circular, then is also a linear representation for , since it respects all adjacencies between elements. This linear representation has a sorting distance of 2 for the model that only allows SSOs and also in the model that allows cyclic SSOs. Besides, is, in fact, the linear representation for the circular permutation with the lowest sorting distance by cyclic SSOs, so it follows that the sorting distance of the circular permutation is 2. A more detailed explanation of how to use a linear model to solve circular permutations will be given at the end of “Sorting Signed Permutations by cyclic SSOs” section.

Sorting Signed Permutations by cyclic SSOs

This section is devoted to proving our two main results, namely the fact that sorting signed linear permutations by cyclic SSOs is in P (Theorem 1), and, consequently, that sorting signed circular permutations by SSOs is also in P (Theorem 2). For this, we study in depth (and provide properties of) sorting signed linear permutations by cyclic SSOs, which rely heavily on the cp-graph we introduced in “Preliminaries and notations” section.

Properties of VD-vectors

Before we provide a series of lemmas that will lead to our final algorithm, we begin with the three following properties, which will prove useful in this section. In Property 2 we will show that if a VD-vector X has a displacement value whose absolute value is greater than or equal to n, then there is a crossing value (in absolute value) greater than one. In Property 3 we will show that if a VD-vector X has a crossing value greater than zero, then elements , , must be in the same component in its corresponding cp-graph. Property 4 is an extension of Property 3, where we show that if a VD-vector X has a crossing value (in absolute value) greater than one, then all elements are in the same component in its corresponding cp-graph.

Property 2

Let be a VD-vector for . If there exists such that , then there exists such that

Proof

Let be a VD-vector for , and let us suppose that there exists such that . We know, by definition, that . Since there are crossing values of the form (one for each ), this necessarily implies that for some .

Property 3

Let be a VD-vector for . If (resp. ) for some , then, for any (resp. ), we have and Let X be a VD-vector such that for two elements and , , . Since , let us suppose, without loss of generality, that with . Let and . Since is positive we have, by definition of crossing value, that . Suppose first that . Since and , then , otherwise . We have that . Suppose that we have an element with such that . For , we have that , so , otherwise . It follows that , and, since , we have that . For , we have that , so , otherwise . It follows that , which is a contradiction to the fact that and , so we conclude that . Now let us suppose . In this case, we can split the interval that goes from i to j into . Since and , then , otherwise . It follows that . Suppose that we have an element such that and . We have to consider two cases: when (i.e., ) and when (i.e., ):In all cases, it follows that , thus . : in this case, for both and , we have that and , so and , otherwise . For and , we have , resulting in . Since , we have that . For and , we have , resulting in , which is a contradiction to the fact that and . : in this case, for both and , we have that and , so and , otherwise . For and , we have , resulting in . Since , we have that . For and , we have , resulting in , which is a contradiction to the fact that and .

Property 4

Let be a VD-vector for . If there exists a such that then Let X be a VD-vector for some permutation such that for some . Let us suppose, without loss of generality, that (recall that by definition), and for readability, let . Let and . Since is positive we have, by definition of crossing value, that . Let us now consider two cases: either , or . Suppose first . In this case, is such that . Since , we have that (we use () because , so and ), otherwise . It follows that . Now suppose that we have an element such that . We have three cases:Now let us consider . In this case, is such that . Since , we have that , otherwise . It follows that . Now suppose that we have an element such that . We also have three cases:It follows that if , then, for any element with , we have that , so . Since , we also have that , so . : in this case, and , so we must have that and , with and . For , we have , and since we have that . For , we have , but this is not possible since . : in this case, and , so and , with and . For , we have , and since we have that . For , we have , but this is not possible since . : in this case, and , so and , with and . For , we have , and since we have that . For , we have , but this is not possible since . : in this case, and , so we must have that and , with and . For , we have , and, since , we have that . For , we have , but this is not possible since . : in this case, and , so and , with and . For , we have , and, since , we have that . For , we have , but this is not possible since . : in this case, and , so and , with and . For , we have , and, since , we have that . For , we have , but this is not possible since .

SSOs and the cp-graphs

In this section we provide five lemmas relating SSOs with cp-graphs. In Lemma 1 (resp. Lemma 2) we will analyze the difference in the number of odd components in the cp-graph when we apply a 1-reversal (resp. a swap, i.e., a 2-reversal or a 2-transposition) to the permutation . In Lemma 3 we will show that we can always apply a swap induced by X without increasing the number of odd components in the resulting cp-graph. Let denote the length of a sorting sequence . In Lemma 4 (resp. Lemma 5) we will show that if a sorting sequence has an SSO that increases the number of odd components (resp. the weight of an edge), then there is another sorting sequence with such that does not contain such SSOs.

Lemma 1

Let X and be two VD-vectors of such that X is a VD-vector for and is a VD-vector for , where is a cyclic SSO. If is a 1-reversal, then , and If is a 1-reversal, we have that for every , which implies that , , and . Now if the connected component impacted by in is even (resp. odd), then it will become odd (resp. even) in , thus .

Lemma 2

Let X and be two VD-vectors of such that X is a VD-vector for and is a VD-vector for , where is a cyclic SSO. If is a 2-reversal or a 2-transposition induced by X in , then and either and or and . Since is a cyclic SSO induced by X, the crossing values between elements and impacted by are different from zero, which implies that . By definition, in this edge either decreases its weight by one or is removed, so . Suppose first . This means that the SSO applied to leaves the connected component to which it is applied in unchanged. If the SSO is a 2-reversal (resp. a 2-transposition), two (resp. zero) elements inside have changed sign. In both cases, we have that . Now let us suppose . This means that has been split into two connected components and , thus . If the SSO is a 2-transposition, zero elements of have changed sign. If the SSO is a 2-reversal, two elements changed sign such that one element is in and the other is in . In both cases, if is odd then and have distinct parities, and ; if is even then and have the same parity, and . At this point, we know by Lemma 1 that a 1-reversal always increases or decreases the number of odd components by one, and by Lemma 2 that a 2-reversal or a 2-transposition induced by X can only increase by two or leave the number of odd components unchanged in the cp-graph.

Lemma 3

Let be a VD-vector for . If , it is always possible to find a cyclic SSO induced by X such that is a VD-vector for and Let be a swap induced by X (recall that by definition, is either a 2-reversal or a 2-transposition), and let . Note that, since is induced by X, applying it to necessarily decreases by one unit the weight of an edge from in , thus and . If , then, by Lemma 2, we know that and we are done. Otherwise, we necessarily have . As shown in the proof of Lemma 2, if the component impacted by in is odd, we know that and we are done. Now suppose that the component impacted by in is even. Let us consider the two components obtained from after is applied. If both components are even, then trivially and we are done again. Finally, if both components are odd, then we can replace by , where (i) acts on the same elements of as , and (ii) is a 2-transposition (resp. a 2-reversal) if is a 2-reversal (resp. a 2-transposition). Note that is induced by X, and that applying also yields two connected components on . Moreover, since 2-reversals change signs while 2-transpositions do not, the two components obtained in the new cp-graph after is applied on are both even. Thus and is the sought SSO.

Lemma 4

Let be a sequence of cyclic SSOs that sorts a permutation , and let be its associated VD-vector. If is a minimum-length sequence of all sorting sequences induced by X, then does not contain SSOs that increase the number of odd components. We will prove the following: if a sorting sequence for , of VD-vector X, contains a cyclic SSO that increases the number of odd components at some point in , then we can always find an alternate sorting sequence for , also with associated VD-vector X, that contains no cyclic SSO that increases the number of odd components, and such that . Note that, in order to sort a permutation, we need to end up with a cp-graph with n even components. From Lemmas 1 and 2, we have that only 1-reversals can decrease the number of odd components. Then, contains at least -reversals. Suppose that is a minimum-length sequence that sorts , and that has an SSO that increases the number of odd components. If is a 1-reversal, then it is necessarily applied to an even component. Thus, the total number of 1-reversals of must be greater than or equal to . In that case, let , where is a 1-reversal applied to the odd component created by . Note that we apply the same sequence of swaps in and , so both sequences are induced by X. If is a 2-reversal (resp. a 2-transposition), then, as shown in the proof of Lemma 2, it is necessarily applied to an even component , transforming it into two odd components. Thus, the total number of 1-reversals in is greater than or equal to . Let be the sequence obtained from by changing into the 2-transposition (resp. 2-reversal) acting on the same elements as , and by removing the two 1-reversals applied to the odd components created by . Because has been transformed into a 2-transposition (resp. a 2-reversal), it now creates two even components from . Note that we apply the same sequence of swaps in and (in this case they differ only at the type of swap but it uses the same pair of elements), so both sequences are induced by X. 
In the above cases, the new sequence is also a sorting sequence for , and of length , a contradiction to the fact that is of minimum length. Thus does not contain SSOs that increase the number of odd components.

Lemma 5

Given a permutation , let be a sequence of cyclic SSOs that sorts , and let be its associated VD-vector. If is a minimum-length sequence of all sorting sequences induced by X, then only uses cyclic SSOs that do not increase the edge weights in We will prove the following: if a sorting sequence for , of VD-vector X, contains a cyclic SSO that increases the weight of an edge e at some point in , then we can always find an alternate sorting sequence for , also with associated VD-vector X, that contains no cyclic SSO that increases the weight of an edge, and such that . Suppose, without loss of generality, that a cyclic SSO in increases the weight of an edge e in the cp-graph, and consider the first such SSO, say . Note that, since 1-reversals do not change the cp-graph, is necessarily a 2-reversal or a 2-transposition. Note also that since this swap increases the weight of an edge, it is not induced by X. If is applied to two elements in the same component, then and , with . Otherwise, is merging two components, say A and B, and . If both components are odd, then the resultant component will be even, so , and otherwise. Since we increase the weight of e, then . It follows that . Let be the operation that decreases the weight of e at some point during the sorting such that , where is the permutation with all operations before in the sorting sequence applied. By Lemma 3 we have that decreases the crossing number by one unit and keeps the same number of odd components so . It follows that both and are decreasing the sum of crossing number and odd components by one unit at most. Let . If merged two odd components A and B, we add at the beginning of two 1-reversals: one applied to any element , the other to any . As shown in the proof of Lemma 1, each 1-reversal here decreases the number of odd components by exactly one unit and keeps the same crossing number. 
It follows that the newly built sequence is not longer than , sorts , and uses cyclic SSOs which never increase the weight of edges in the cp-graph, so the only swaps it contains are induced by X.

A polynomial-time algorithm for Sorting Signed Permutations by SSOs

In this section, we first provide a closed formula for computing the length of a sorting sequence of cyclic SSOs for signed linear permutations based on its associated VD-vector X. Then, we provide a polynomial-time algorithm for sorting signed circular permutations by SSOs.

Lemma 6

Let be a minimum-length sequence of cyclic SSOs that sorts a signed linear permutation , and let X be its associated VD-vector. Then

Let us partition into two sequences and in which (resp. ) contains all 1-reversals (resp. swaps) of . In addition, since 1-reversals do not modify the order of elements in the permutation, we can assume, without loss of generality, that the swaps of are applied first. We will show that . To see this, suppose that we apply a swap (i.e., a 2-reversal or a 2-transposition) of in , obtaining a permutation , and let , and its associated VD-vector. Then, by Lemma 3, we know that . In addition, by Lemma 4, the number of odd components is not increased by and, by Lemma 1, can be reduced only by 1-reversals. Note that the sum of weights of edges in is , and, by Lemma 2, applying any cyclic SSO either increases or decreases this sum by one unit, thus . By Lemma 5, we can assume that contains no cyclic SSO that increases the weight of an edge, so it follows that .

Lemma 6 shows us that the problem of sorting a signed permutation by cyclic SSOs is equivalent to the following optimization problem: find a VD-vector for which minimizes . We will now prove that finding such a VD-vector can be achieved in polynomial time. First, we will introduce Lemma 7, where we show that if a VD-vector X has a cp-graph with only one component, then any VD-vector with has . Then, in Lemma 8, we will prove that any VD-vector such that necessarily belongs to one of two sets that we will define. Finally, we will show in Theorem 1 that we can find a VD-vector such that in polynomial time.

Lemma 7

Consider two VD-vectors X and of for a signed linear permutation , such that . If , then

We know, by Lemma 3, that we can apply induced swaps in while keeping odd components. Using Lemmas 4 and 5, we have that . Using the same argument, we have that . Since , then all elements from are in the same component, and (resp. ) if there is an odd (resp. even) number of negative elements in , and any is such that (if has an odd number of negative elements then for any VD-vector there is at least one odd component in ). Since we also have that , it follows that , and the lemma follows.

Let , i.e., is the minimum crossing number over all VD-vectors for .

Lemma 8

Let S be the set of all VD-vectors such that , and let be the set of all VD-vectors such that , for some with . Then there exists a VD-vector such that

Recall that for any VD-vector there is a strictly contracting transformation that we can apply. Consider a sequence of strictly contracting transformations applied to until we reach a VD-vector . Here we will prove by contradiction that there is always a VD-vector in this sequence such that and , i.e., every VD-vector outside has . If there is a VD-vector such that , then, by Lemma 7, for any such that , we have , and it follows that the VD-vector with is such that .

Now suppose that a VD-vector is such that . Since , X does not admit a strictly contracting transformation, so, for any two distinct , we have that . Let be a VD-vector such that for some , and let be a VD-vector such that for some . As we can see in Fig. 2, if and then and . If (resp. ), then (resp. ) and .

Fig. 2

Transformation flow starting with a VD-vector X. Note that applying a second transformation at the same positions from the first transformation but reversed results again in X (see the transformations where both indices are in red). Note also that every VD-vector obtained from X (in gray) can be transformed into two different VD-vectors (that are also obtained from X) when we use one of the two indices from the first transformation from X (see the transformations from gray VD-vectors where one of the indices is in red). The VD-vector can be obtained from the four VD-vectors in gray but not from X. Supposing , if all VD-vectors in gray are in , then . If one (or more) VD-vector in gray is also in S, then it follows that

Suppose now that , , and is such that . Since , then and , and we have that and . This means that, as shown in Fig. 2, can be obtained by four different transformations of VD-vectors from : , , , and , such that , , , and . The VD-vectors , and are in , and we say they are adjacent to . Recall that, since and , we have that , , , and , otherwise at least one VD-vector between , , , and would be generated by a contracting transformation and is in S, and as a consequence . It follows that .

Let us suppose that for any VD-vector adjacent to we have . Using Lemma 7 we have that values , , , and must be strictly greater than 1, otherwise . This observation implies that values , , , and are all equal to 1 (recall that, by definition of transformation, if , then ), otherwise, in at least one of the four VD-vectors in adjacent to there is a crossing value (in absolute value) greater than one, and, by Property 4, the corresponding cp-graph would have only one connected component. Hence, we conclude that the four elements , and are in the same component in .
For the same reason, we have that and , otherwise at least one of the four VD-vectors in contains a displacement value involved in the transformation whose absolute value is greater than or equal to n (recall that, by definition of transformation, if , then and ), which implies, by Property 2, that this VD-vector has a crossing value with absolute value greater than 1 and, by Property 4, that the cp-graph of this VD-vector has only one component. Now we argue that cannot be a vector such that . Note that since we know, by Property 3, that all elements with must be in the same component. Suppose, without loss of generality, that and . We show in Fig. 3 all the possible configurations for these intervals, depending on their relative positions. We can see that we always have either (i) a VD-vector X with only one component, so by Lemma 7 (Fig. 3a–f), or (ii) a VD-vector obtained from X with only one component, so by Lemma 7 (Fig. 3g–j), and it follows that . Thus the VD-vector with necessarily satisfies .

Fig. 3

a–f The six possible configurations for VD-vector , which we call . For VD-vectors and , the union of intervals, highlighted in gray, contains all the elements, thus . For VD-vectors to , the union of intervals does not necessarily contain all elements (cf. white regions), but for each of these configurations there exists a VD-vector obtained by transforming X, shown in g–j, in which the union of intervals, highlighted in gray, also contains all elements, i.e., . Note that in g , in h , in i , and in j

Theorem 1

Finding a VD-vector X for that minimizes can be achieved in polynomial time, and thus sorting signed linear permutations by cyclic SSOs is in P.

Given a permutation , we first compute a VD-vector X (lines 1–3 of Algorithm 1), then iteratively apply strictly contracting transformations (lines 4–7) until none exists. Now let S be the set with all VD-vectors X such that . Jerrum [13] proved that (i) when no further strictly contracting transformations can be performed on a VD-vector X, we have that ; (ii) for any two VD-vectors X and such that , we can go from X to by a sequence of contracting transformations, i.e., we do not need to go through a VD-vector that is not in S. The above properties are in the context of VD-vectors for unsigned permutations, but since VD-vectors for signed permutations do not take into account the signs of the elements we have that (i) and (ii) also apply in our context.

By (i), we have that (line 9), and, by (ii), we know that we can use X to generate the remaining VD-vectors that are also in S (lines 10–16). If there is a VD-vector such that then, by Lemma 7, we have that for any VD-vector , so and we can just return this value as the sorting distance (line 15). Otherwise, by Lemma 8 the VD-vector with satisfies , where is the set of all VD-vectors obtained by some , with and (instructions in lines 17–21 generate all the VD-vectors of ).

Now we argue that the set of instructions in lines 10–16 of Algorithm 1 either provides all VD-vectors of S or at least a VD-vector such that . Note that if it provides all VD-vectors of S, we just need to generate and find the VD-vector with . Otherwise, if we have a VD-vector such that then, by Lemma 7, we have that, for any VD-vector , , so . Arguing by contradiction we show that this set of instructions always leads to one of these two situations. Suppose that the instructions in lines 10–16 do not provide all VD-vectors in S and that, for all VD-vectors X provided, we have .
We have the following facts: Suppose now that we have a VD-vector such that (which implies that ), and suppose that cannot be obtained by one contracting transformation on X (i.e., Algorithm 1 cannot generate ). We will show that if requires at least two contracting transformations then X admits a contracting transformation using only indices from the first and the second transformation such that the resulting VD-vector has only one component, and, by Lemma 7, already has . Since, for any , , we must have, by Property 4, that , with . For a VD-vector such that with , if , then , which implies, by Property 1, that . Since, for any , , if we have and with and , then and (otherwise we have that or ). Note that, by definition of displacement value, . Since, for any , we have , it follows, by definition of cp-graph, that (resp. ) is located in a component with at least (resp. ) elements on . If then and must be in the same component, otherwise would have at least vertices. Besides, this component has at least vertices. For the same reason, if we have more than one distinct pair of elements such that , then all these pairs must be in the same component.

Let us assume then, without loss of generality, that , where and are two distinct contracting transformations (i.e., with and ). In a similar way as explained in the proof of Lemma 8, this means that the VD-vector can be reached by four distinct pairs of transformations (namely , , , and ). However, by Property 1, and using the above mentioned facts, we can conclude that and , so these four intermediate VD-vectors between X and are also in S, since these transformations are contracting transformations. Since these four vectors are generated by Algorithm 1 and are in the set S, and since, for any , we assumed that , we have, by Property 4, that . Now we can use Fig. 3 again to show that, with the above properties and whatever the order in which elements , , , and appear, we always have at least one vector reachable from X by one contracting transformation such that , a contradiction to the fact that S has no VD-vector such that . It follows that Algorithm 1 provides either all VD-vectors or at least a VD-vector such that .

Algorithm 1 presents the above mentioned procedure, which consists in finding a VD-vector X that minimizes , the latter value being the sought distance.

We now turn to evaluating the computational complexity of Algorithm 1. Our primary goal is to ensure polynomiality of the algorithm, and our analysis can certainly be improved. The loop in lines 1–2, line 3, and the loop in lines 4–7 run in linear time each. Line 8 takes time to compute plus time to compute , resulting in time. Line 9 runs in linear time. The loop in lines 10–16 runs in time: it iterates in and line 12 runs in time. The loop in lines 17–21 runs in time: it iterates in and line 20 runs in time. The overall running time complexity of our algorithm is then . An example where Algorithm 1 does not generate all VD-vectors in S is given in Fig. 4, and an example where Algorithm 1 generates all VD-vectors in is given in Fig. 5.
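The first phase described in the proof (compute a VD-vector, then apply strictly contracting transformations until none exists) can be sketched as follows. This is only an illustration under assumptions that this section does not restate: the initial displacement of the element at position i is taken to be |π_i| − i, and a transformation on indices (i, j) is taken, as in Jerrum's setting, to subtract n from x_i and add n to x_j, being strictly contracting when x_i − x_j > n. The function names are hypothetical, and the later phases of Algorithm 1 (crossing numbers, cp-graphs, the sets of candidate VD-vectors) are not covered.

```python
def initial_displacement(perm):
    """Assumed initial VD-vector: x_i = |perm_i| - i (positions are 1-based)."""
    return [abs(value) - (index + 1) for index, value in enumerate(perm)]


def apply_strictly_contracting(x):
    """Apply strictly contracting transformations until none exists.

    Assumption: a transformation on (i, j) maps x_i -> x_i - n and
    x_j -> x_j + n, and is strictly contracting when x_i - x_j > n.
    Each step lowers the sum of squares of x, so the loop terminates.
    """
    n = len(x)
    x = list(x)
    while n > 0:
        i = max(range(n), key=x.__getitem__)  # index of the largest value
        j = min(range(n), key=x.__getitem__)  # index of the smallest value
        if x[i] - x[j] <= n:
            break  # no pair differs by more than n
        x[i] -= n
        x[j] += n
    return x


x0 = initial_displacement((3, -2, 1))
print(x0, apply_strictly_contracting(x0))
```

Picking the current maximum and minimum at each step is one simple way to find a strictly contracting pair when one exists; it is not claimed to match the linear-time bound stated for lines 4–7.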

Fig. 4

Given , we have in a the cp-graph for , in b the cp-graph for , in c the cp-graph for , in d the cp-graph for , in e the cp-graph for , and in f the cp-graph for . to are the six possible VD-vectors for with minimum crossing number (i.e., with ). Note that, by definition, they are in S, but Algorithm 1 will not generate all of them. For instance, if the algorithm starts S (at line 9) with , it will not generate (in loop at lines 10–16) since this VD-vector is reachable from using two transformations but not using only one (note that they differ in exactly four displacement values). As proved in Theorem 1, since Algorithm 1 is not capable of generating all VD-vectors in S, then there is at least one VD-vector generated in the path between and with only one component (in this example, the four intermediate VD-vectors , , , and that are generated by Algorithm 1 satisfy this condition).

Fig. 5

Given , a and b show the two cp-graphs and for VD-vectors and . and are the two VD-vectors with minimum crossing number (i.e., ), so . Note that (resp. ) can be obtained from (resp. ) by (resp. ), so Algorithm 1 will generate both VD-vectors, starting either with or . Note that , so . In c we have the cp-graph for VD-vector with , so , but since . Note that , so . Among all VD-vectors in , is in fact the VD-vector that minimizes the sum and it follows that

Theorem 1 shows that computing the sorting distance for signed permutations, using cyclic SSOs, is in P. From this, one can easily derive a polynomial-time algorithm for sorting signed circular permutations by SSOs: it suffices to cut the circular permutation (n different cuts are possible), and to decide which extremity is left and which is right (2 possible cases). Once this is done, we are left with a linear permutation, which can be sorted using cyclic SSOs using Algorithm 1. The sorting distance is the minimum value obtained by Algorithm 1 over the 2n possible linear permutations obtained from the circular one.
Thus we have the following result.

Theorem 2

Sorting signed circular permutations by SSOs is in P.
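The reduction behind Theorem 2 (n cuts, 2 orientations, minimum over the 2n linear permutations) can be sketched as below. The function names are hypothetical, and `linear_distance` is a stand-in for Algorithm 1 that the caller must supply. Whether reading the circle in the opposite direction also negates the signs is not stated in this section, so it is exposed as the parameter `negate_on_mirror`; its default is an assumption.

```python
from typing import Callable, Iterator, Sequence, Tuple

SignedPerm = Tuple[int, ...]


def linearizations(circular: Sequence[int],
                   negate_on_mirror: bool = True) -> Iterator[SignedPerm]:
    """Yield the 2n linear permutations obtained from a circular one.

    For each of the n cut points, yield the permutation read in one
    direction and in the other. negate_on_mirror is an assumption: the
    source only says that the two extremities can be chosen in 2 ways.
    """
    n = len(circular)
    for cut in range(n):
        rotated = tuple(circular[cut:]) + tuple(circular[:cut])
        yield rotated
        mirrored = rotated[::-1]
        if negate_on_mirror:
            mirrored = tuple(-value for value in mirrored)
        yield mirrored


def circular_distance(circular: Sequence[int],
                      linear_distance: Callable[[SignedPerm], int],
                      negate_on_mirror: bool = True) -> int:
    """Minimum of linear_distance over all 2n linearizations (n >= 1)."""
    return min(linear_distance(p)
               for p in linearizations(circular, negate_on_mirror))
```

Because the linear distance is called 2n times, the overall running time is 2n times that of the linear procedure, which stays polynomial.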

Computing a sorting sequence

Algorithm 1 only returns the length of a minimum-length sorting sequence, not the sequence itself. However, we can easily provide a minimum-length sequence using the cp-graph , where is the input signed permutation and X is a VD-vector such that . For this, we iteratively remove each edge from by applying a swap over two adjacent elements connected by an edge, and choosing between a 2-reversal or a 2-transposition so that the number of odd components does not increase, as shown in Lemma 3. When has no more edges, we just need to apply 1-reversals over the remaining negative elements.
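For very small inputs, the length of such a sequence can be cross-checked by brute force. The sketch below is not the cp-graph procedure described above: it is a plain breadth-first search over all signed permutations reachable by the three operations. The function names are hypothetical, and the encoding of the operations is an assumption: a 1-reversal negates one element, a 2-transposition exchanges two cyclically adjacent elements (positions n and 1 count as adjacent), and a 2-reversal exchanges them and negates both.

```python
from collections import deque


def neighbors(p):
    """Yield every signed permutation one cyclic SSO away from p."""
    n = len(p)
    for i in range(n):  # 1-reversals: negate a single element
        q = list(p)
        q[i] = -q[i]
        yield tuple(q)
    if n < 2:
        return
    for i in range(n):  # swaps on cyclically adjacent positions i and j
        j = (i + 1) % n
        q = list(p)
        q[i], q[j] = p[j], p[i]  # 2-transposition
        yield tuple(q)
        q = list(p)
        q[i], q[j] = -p[j], -p[i]  # 2-reversal
        yield tuple(q)


def brute_force_distance(perm):
    """Minimum number of cyclic SSOs turning perm into (1, 2, ..., n).

    Exponential in n (up to 2^n * n! states); only meant for tiny inputs.
    """
    start = tuple(perm)
    target = tuple(range(1, len(start) + 1))
    dist = {start: 0}
    queue = deque([start])
    while queue:
        p = queue.popleft()
        if p == target:
            return dist[p]
        for q in neighbors(p):
            if q not in dist:
                dist[q] = dist[p] + 1
                queue.append(q)
    raise ValueError("input is not a signed permutation of 1..n")
```

For example, `brute_force_distance((3, 2, 1))` uses a single swap across the cyclic boundary, which a non-cyclic model would not allow.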

Conclusion

In this work, we presented a polynomial-time algorithm for sorting signed circular permutations by SSOs. This solution closes a gap in the literature concerning the use of SSOs to sort linear and circular permutations, considering both signed and unsigned versions. Some theoretical questions concerning SSOs and signed permutations remain open, such as diameter issues: what is the maximum distance over all permutations of size n? Another interesting question consists in refining the model by taking into account the sizes of the intergenic regions between genes, as was recently done for the classical DCJ distance [17-19]. In particular, sorting by DCJ becomes NP-hard when intergenic regions are considered, while it is in P otherwise, and SSOs seem to be very well-suited for a similar study.
