
Quarnet Inference Rules for Level-1 Networks.

Katharina T Huber¹, Vincent Moulton², Charles Semple³, Taoyang Wu².

Abstract

An important problem in phylogenetics is the construction of phylogenetic trees. One way to approach this problem, known as the supertree method, involves inferring a phylogenetic tree with leaves consisting of a set X of species from a collection of trees, each having leaf-set some subset of X. In the 1980s, Colonius and Schulze gave certain inference rules for deciding when a collection of 4-leaved trees, one for each 4-element subset of X, can be simultaneously displayed by a single supertree with leaf-set X. Recently, it has become of interest to extend this and related results to phylogenetic networks. These are a generalization of phylogenetic trees which can be used to represent reticulate evolution (where species can come together to form a new species). It has recently been shown that a certain type of phylogenetic network, called a (unrooted) level-1 network, can essentially be constructed from 4-leaved trees. However, the problem of providing appropriate inference rules for such networks remains unresolved. Here, we show that by considering 4-leaved networks, called quarnets, as opposed to 4-leaved trees, it is possible to provide such rules. In particular, we show that these rules can be used to characterize when a collection of quarnets, one for each 4-element subset of X, can all be simultaneously displayed by a level-1 network with leaf-set X. The rules are an intriguing mixture of tree inference rules, and an inference rule for building up a cyclic ordering of X from orderings on subsets of X of size 4. This opens up several new directions of research for inferring phylogenetic networks from smaller ones, which could yield new algorithms for solving the supernetwork problem in phylogenetics.


Keywords: Closure; Cyclic orderings; Inference rules; Level-1 network; Phylogenetic network; Qnet; Quarnet; Quartet trees


Year: 2018 PMID: 29869043 PMCID: PMC6061523 DOI: 10.1007/s11538-018-0450-2

Source DB: PubMed Journal: Bull Math Biol ISSN: 0092-8240 Impact factor: 1.758

Introduction

One of the main goals in phylogenetics is to develop methods for constructing evolutionary trees, the tree-of-life being a prime example of such a tree (Letunic and Bork 2016). Mathematically speaking, for a set X of species, a phylogenetic X-tree is a (graph theoretical) tree with leaf-set X and no degree-2 vertices; it is binary if every internal vertex has degree three. A popular approach to constructing such trees, called the supertree method, is to build them up from smaller trees (Bininda-Emonds 2014). The smallest possible trees that can be used in this approach are quartet trees, that is, binary phylogenetic trees having 4 leaves (see e.g. Fig. 1 for the quartet tree ab|cd with leaf-set $\{a, b, c, d\}$). Thus, it is natural to ask the following question: How should we decide whether or not it is possible to simultaneously display all of the quartet trees in a given collection of quartet trees by some phylogenetic tree?

Fig. 1

i A level-1 phylogenetic network. ii Top: a quartet tree with leaf-set $\{a, b, c, d\}$, also denoted by ab|cd. Bottom: a quarnet. Both the quartet tree and quarnet are displayed by the level-1 network in i

In case the collection consists of a quartet tree for every possible subset of X of size 4 (we denote the set of all such subsets by $\binom{X}{4}$), this problem has an elegant solution that was originally presented by Colonius and Schulze (1981) (see also Bandelt and Dress 1986 for related results). We present full details in Theorem 1, but essentially their result states that, given a collection $\mathcal{Q}$ of quartet trees, one for each element in $\binom{X}{4}$, there exists a (necessarily unique) binary phylogenetic X-tree displaying every quartet tree in the collection if and only if when the quartet trees ab|cx and ab|xd are contained in $\mathcal{Q}$ then so is the quartet tree ab|cd. Rules such as ab|cx plus ab|xd implies ab|cd are known as inference rules, and they have been extensively studied in the phylogenetics literature (see e.g. Semple and Steel 2003, Chapter 6.7 and the references therein).

Although phylogenetic trees are extremely useful for representing evolutionary histories, in certain circumstances they can be inadequate. For example, when two viruses recombine to form a new virus (e.g. swine flu), this is not best represented by a tree as it involves species combining together to form a new one rather than splitting apart. In such cases, phylogenetic networks provide a more accurate alternative to trees and there has been much recent work on such structures (see e.g. Steel 2016, Chapter 10 for a recent review).

In this paper, we will consider properties of a particular type of phylogenetic network called a level-1 network (Gambette et al. 2012). For a set X of species, this is a connected graph with leaf-set X and such that every maximal subgraph with no cut-edge is either a vertex or a cycle (see Sect. 2 for more details).
Our main results will apply to binary level-1 networks, where we also assume that every vertex has degree 1 or 3. We present an example of such a network in Fig. 1. Note that a phylogenetic X-tree is a special example of a level-1 network with leaf-set X. As with phylogenetic X-trees, it is possible to construct level-1 networks from quartets (Gambette et al. 2012). However, it has been pointed out that there are problems with understanding such networks in terms of inference rules (see e.g. Keijsper and Pendavingh 2014, p. 2540). Here, we circumvent these problems by considering a certain type of subnetwork of a level-1 network called a quarnet instead of using quartet trees. A quarnet is a 4-leaved, binary, level-1 network (see e.g. Fig. 1); quarnets are displayed by binary level-1 networks in a similar way to quartets (see Sect. 3 for details). As we shall see, quarnets naturally lead to inference rules for level-1 networks which can be thought of as a combination of quartet inference and inference rules for building circular orderings of a set. Moreover, in our main result we show that, just as with phylogenetic trees, the quarnet inference rules that we introduce can be used to characterize when a collection of quarnets, one for each 4-element subset of X, is equal to the set of quarnets displayed by a binary level-1 network with leaf-set X.

We now summarize the contents of the rest of the paper. In the next section, we present some preliminaries concerning phylogenetic trees and level-1 networks, as well as their relationship with quartets. Then, in Sect. 3, we prove an analogous theorem to the quartet results of Colonius and Schulze for level-1 networks (Theorem 2). In Sect. 4, we use Theorem 2 to provide a characterization for when a set of quartets, one for each 4-element subset of X, can be displayed by a binary level-1 network (Theorem 3). In Sect. 5, we then define the closure of a set of quarnets.
This can be thought of as the collection of quarnets that is obtained by applying inference rules to a given collection of quarnets until no further quarnets are generated. We show that this has similar properties to the so-called semi-dyadic closure of a set of quartets (see Theorem 4). We conclude with a brief discussion of some possible further directions.

Preliminaries

In this section, we review some definitions as well as results concerning the connection between phylogenetic trees and quartets. From now on, we assume that X is a finite set with .

Definitions

An unrooted phylogenetic network N (on X) (or network N (on X) for short) is a connected graph (V, E) with $X \subseteq V$ in which every vertex has either degree 1 or degree at least 3, and the set of degree-1 vertices is X. The elements in X are the leaves of N. We also denote the leaf-set of N by L(N). The network is called binary if every vertex in N has degree 1 or 3. An interior vertex of N is a vertex that is not a leaf. A cherry in N is a pair of leaves that are adjacent to the same vertex. Two phylogenetic networks N and $N'$ on X are isomorphic if there exists a graph theoretical isomorphism between N and $N'$ whose restriction to X is the identity map. Note that a phylogenetic (X-) tree is a network which is also a tree. For any three vertices $v_1, v_2, v_3$ in such a tree T, their median is the unique vertex in T that is contained in every path between any two vertices in $\{v_1, v_2, v_3\}$.

A cut-vertex of a network is a vertex whose removal disconnects the network, and a cut-edge of a network is an edge whose removal disconnects the network. A cut-edge is trivial if one of the connected components induced by removing the cut-edge is a vertex (which must necessarily be a leaf). A network is simple if all of the cut-edges are trivial (so for instance, note that phylogenetic trees with more than three leaves are not simple networks). A network N is level-1 if every maximal subgraph in N that has no cut-edge is either a vertex or a cycle. We shall say that a network N on X, where $|X| \ge 3$, is of cycle type if it contains a unique cycle of length |X|, and the number of vertices in N is 2|X| (so in particular, a network is of cycle type if it is simple, binary, level-1 and is not a phylogenetic tree).

In what follows it will be useful to consider a certain type of operation on a level-1 network, which we define as follows. For a level-1 network N on X, let u be an interior vertex of N that is not contained in any cycle in N.
Furthermore, let $v_1, \dots, v_k$, where $k \ge 3$, be a circular ordering of the set of vertices in N that are adjacent to u. Then, we obtain a new network $N'$ on X from N by removing vertex u and all edges incident with it and inserting new vertices $u_1, \dots, u_k$ and new edges $\{u_i, v_i\}$ and $\{u_i, u_{i+1}\}$ for all $1 \le i \le k$ (see Fig. 2). Here, we use the convention that $k + 1$ is identified with 1. We say that $N'$ is obtained from N by a blow-up operation on u (using the given circular ordering of its neighbours). Note that $N'$ is a level-1 network with one more cycle than N. Note that blow-up operations on the same vertex but with different circular orderings of its neighbours may lead to non-isomorphic networks. We illustrate a blow-up operation in Fig. 2.
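The blow-up operation described above can be sketched in a few lines of Python. This is only an illustrative sketch: the adjacency-dictionary representation, the function name `blow_up`, and the naming of the new cycle vertices as pairs `(u, i)` are assumptions made for the example, and the code does not check that u lies outside every cycle of the input network.

```python
from typing import Dict, Hashable, List, Set

Graph = Dict[Hashable, Set[Hashable]]  # assumed representation: vertex -> set of neighbours


def blow_up(graph: Graph, u: Hashable, ordering: List[Hashable]) -> Graph:
    """Return a copy of `graph` in which vertex `u` is replaced by a cycle.

    `ordering` is the chosen circular ordering v_1, ..., v_k of the neighbours of u.
    """
    if len(ordering) < 3 or len(set(ordering)) != len(ordering) or set(ordering) != graph[u]:
        raise ValueError("ordering must list each of the (at least three) neighbours of u exactly once")
    k = len(ordering)
    # Remove u together with all edges incident with it.
    new_graph: Graph = {v: set(nbrs) - {u} for v, nbrs in graph.items() if v != u}
    # Insert the new vertices u_1, ..., u_k (hypothetical names: (u, 0), ..., (u, k - 1)).
    cycle = [(u, i) for i in range(k)]
    for c in cycle:
        new_graph[c] = set()
    for i, v in enumerate(ordering):
        nxt = cycle[(i + 1) % k]  # index k + 1 is identified with 1
        new_graph[cycle[i]].update({v, nxt})  # edges {u_i, v_i} and {u_i, u_{i+1}}
        new_graph[v].add(cycle[i])
        new_graph[nxt].add(cycle[i])
    return new_graph


# A star tree with centre "u" and four leaves, blown up using the ordering a, b, c, d.
star = {"u": {"a", "b", "c", "d"}, "a": {"u"}, "b": {"u"}, "c": {"u"}, "d": {"u"}}
blown = blow_up(star, "u", ["a", "b", "c", "d"])
```

In this example the result has eight vertices and eight edges, which matches the vertex count 2|X| required of a network of cycle type on four leaves.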

Fig. 2

Example of blow-up operations: $N'$ is obtained from N by a blow-up operation on u

Quartets, Trees and Networks

We now briefly recall some notation and results concerning quartet systems (for more details see Dress et al. 2012, Chapter 3). Although quartets are often considered as being 4-leaved trees, here it is more convenient to consider a quartet Q to be a partition of a subset Y of X of size 4 into two subsets of size 2. The set Y is called the support of Q. If $Q = \{\{a, b\}, \{c, d\}\}$ for $a, b, c, d \in X$ distinct, we denote Q by ab|cd. The set of all quartets on X is denoted by $\mathcal{Q}(X)$, and any non-empty subset $\mathcal{Q} \subseteq \mathcal{Q}(X)$ is called a quartet system (on X). Given a quartet system $\mathcal{Q}$ on X and a subset $Y \subseteq X$ of size 4, let $m_{\mathcal{Q}}(Y)$ be the number of quartets in $\mathcal{Q}$ whose support is Y. For simplicity, we write $m_{\mathcal{Q}}(\{a, b, c, d\})$ as m(a, b, c, d). If $m_{\mathcal{Q}}(Y) \ge 1$ for every subset $Y \subseteq X$ of size 4, then $\mathcal{Q}$ is said to be dense. Following the terminology in Dress et al. (2012), a quartet system $\mathcal{Q}$ is:

thin if no pair of quartets in $\mathcal{Q}$ have the same support;
saturated if for all pairwise distinct $a, b, c, d, x \in X$ with $ab|cd \in \mathcal{Q}$, the system $\mathcal{Q}$ contains at least one quartet in $\{ax|cd, ab|cx\}$;
transitive if for all pairwise distinct $a, b, c, d, x \in X$, if $\{ab|cx, ab|xd\} \subseteq \mathcal{Q}$ holds, then ab|cd is also contained in $\mathcal{Q}$.

These concepts are related as follows:

Lemma 1

Suppose that $\mathcal{Q}$ is a quartet system on X. If $\mathcal{Q}$ is saturated and thin, then $\mathcal{Q}$ is transitive.
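The three properties of a quartet system, and the implication stated in Lemma 1, can be checked by brute force on small examples. The following is a minimal sketch under stated assumptions: quartets are encoded as a frozenset of two 2-element frozensets, and all function names (`quartet`, `support`, `labelings`, `is_thin`, `is_saturated`, `is_transitive`) are hypothetical helpers introduced here, not notation from the paper.

```python
from itertools import permutations


def quartet(a, b, c, d):
    """Encode the quartet ab|cd as an unordered pair of unordered pairs."""
    return frozenset({frozenset({a, b}), frozenset({c, d})})


def support(q):
    """Return the 4-element support of a quartet."""
    return frozenset().union(*q)


def labelings(q):
    """Yield every tuple (a, b, c, d) such that q equals ab|cd."""
    first, second = tuple(q)
    for left, right in ((first, second), (second, first)):
        for a, b in permutations(left):
            for c, d in permutations(right):
                yield a, b, c, d


def is_thin(system):
    """No two quartets share a support."""
    supports = [support(q) for q in system]
    return len(supports) == len(set(supports))


def is_saturated(system, X):
    """For every ab|cd in the system and every other x, ax|cd or ab|cx is present."""
    for q in system:
        for a, b, c, d in labelings(q):
            for x in X - support(q):
                if quartet(a, x, c, d) not in system and quartet(a, b, c, x) not in system:
                    return False
    return True


def is_transitive(system, X):
    """Whenever ab|cx and ab|xd are present, so is ab|cd."""
    for q in system:
        for a, b, c, x in labelings(q):  # read q as ab|cx
            for d in X - support(q):
                if quartet(a, b, x, d) in system and quartet(a, b, c, d) not in system:
                    return False
    return True


# Quartets of the caterpillar tree with cherries {1, 2} and {4, 5} and middle leaf 3.
X = frozenset(range(1, 6))
caterpillar = {
    quartet(1, 2, 3, 4), quartet(1, 2, 3, 5), quartet(1, 2, 4, 5),
    quartet(1, 3, 4, 5), quartet(2, 3, 4, 5),
}
```

On this example the system is thin and saturated, and, as Lemma 1 predicts, also transitive.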

Proof

We use a similar argument to that used by Bandelt and Dress (1986, Lemma 1). Suppose $\{ab|cx, ab|xd\} \subseteq \mathcal{Q}$ with a, b, c, d, x pairwise distinct. We need to show $ab|cd \in \mathcal{Q}$. Since $\mathcal{Q}$ is saturated and ab|cx is contained in $\mathcal{Q}$, we have $\{ad|cx, ab|cd\} \cap \mathcal{Q} \neq \emptyset$. Using a similar argument, ab|dx in $\mathcal{Q}$ implies that $\{ac|dx, ab|cd\} \cap \mathcal{Q} \neq \emptyset$. Therefore, we must have $ab|cd \in \mathcal{Q}$ as otherwise $\{ad|cx, ac|dx\} \subseteq \mathcal{Q}$, a contradiction to the assumption that $\mathcal{Q}$ is thin.

A quartet ab|cd on X is displayed by a phylogenetic X-tree T if the path between a and b in T is vertex disjoint from the path between c and d in T. The quartet system displayed by T is denoted by $\mathcal{Q}(T)$. In view of Dress et al. (2012, Theorem 3.7) and the last lemma, we have the following slightly stronger characterisation of quartet systems displayed by a phylogenetic tree, which was stated in Bandelt and Dress (1986, Proposition 2) using slightly different terminology.

Theorem 1

A quartet system $\mathcal{Q}$ is the quartet system displayed by a (necessarily unique) phylogenetic X-tree T if and only if $\mathcal{Q}$ is thin and saturated.

We now turn our attention to the relationship between quartets and level-1 networks. A split A|B of X is a bipartition of X into two non-empty parts A and B (note that since A|B is a bipartition, order does not matter, that is, A|B = B|A). Such a split is induced by a network N if there exists a cut-edge in N whose removal results in two connected components, one with leaf-set A and the other with leaf-set B. A quartet ab|cd is exhibited by a network N if there exists a split A|B induced by N such that $a, b \in A$ and $c, d \in B$. Note that if a quartet is exhibited by N, then it is displayed by N, that is, N contains two disjoint paths, one from a to b, and the other from c to d. However, the converse is not true. For example, quartet ab|cd is displayed by the network in Fig. 4(iv), but ab|cd is not exhibited by this network. In the light of the last remark, the set of quartets exhibited by a network N is clearly contained in the set of quartets displayed by N.
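To make the notion of an exhibited quartet concrete, the sketch below enumerates the quartets exhibited by a list of splits. It assumes that the splits induced by the cut-edges of a network are already available as pairs of sets; the function name `exhibited_quartets` and the two example split lists are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def exhibited_quartets(splits):
    """Return every quartet ab|cd with a, b on one side and c, d on the other side of some split."""
    quartets = set()
    for part_a, part_b in splits:
        for ab in combinations(part_a, 2):
            for cd in combinations(part_b, 2):
                quartets.add(frozenset({frozenset(ab), frozenset(cd)}))
    return quartets


# Non-trivial splits of a caterpillar tree on {a, b, c, d, e}.
tree_splits = [({"a", "b"}, {"c", "d", "e"}), ({"a", "b", "c"}, {"d", "e"})]

# A network of cycle type on {a, b, c, d} only has trivial cut-edges,
# so every induced split has a singleton side.
cycle_splits = [
    ({"a"}, {"b", "c", "d"}), ({"b"}, {"a", "c", "d"}),
    ({"c"}, {"a", "b", "d"}), ({"d"}, {"a", "b", "c"}),
]
```

The second example reflects the remark above: a network of cycle type on four leaves displays quartets but exhibits none, because a split with a singleton side cannot contribute a pair of leaves on that side.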

Fig. 4

Four types of qnets on $\{a, b, c, d\}$: i a Type I qnet; ii a Type II qnet; iii a Type III qnet; iv a Type IV qnet. Type IV is of cycle type

Quarnets

In this section, we shall show that an analogue of Theorem 1 holds for quarnets and level-1 networks. We begin by formally defining the concept of a quarnet and how quarnets can be obtained from level-1 networks. Given a binary, level-1 phylogenetic network N on X and a subset $A \subseteq X$, we let $N|_A$ denote the network induced on A by N, which is obtained from N by deleting all edges that are not contained in some path between a pair of elements in A, removing all isolated vertices, and then repeatedly applying the following two operations until neither of them is applicable: (i) suppressing degree-2 vertices, and (ii) suppressing parallel edges. Note that $N|_A$ is a binary, level-1 phylogenetic network on A.

We now consider the different possible phylogenetic networks on three and four leaves. First note that there are two possible types of phylogenetic networks with three leaves (see Fig. 3). We call these cycle type and tree type depending on whether they contain a cycle or not, respectively. Similarly, a quarnet, or qnet for short, is a binary, level-1 phylogenetic network with four leaves. The leaf-set L(F) of a qnet F is called its support. As illustrated in Fig. 4, there are four types of qnets: Type I qnets contain no cycles; Type II qnets contain one cycle and one non-trivial cut-edge; Type III qnets contain two cycles; and Type IV qnets contain no non-trivial cut-edge. A qnet system on X is a collection of qnets all of whose supports are contained in X. We shall say that a qnet F with support $A \subseteq X$ is displayed by a network N on X if F is isomorphic to $N|_A$. Moreover, the qnet system displayed by N is the collection of all qnets $N|_A$ with $A \subseteq X$ and $|A| = 4$.

Fig. 3

The two types of three-leaved networks: tree type (left) and cycle type (right)
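The four qnet types (no cycle; one cycle and one non-trivial cut-edge; two cycles; no non-trivial cut-edge) can be told apart by counting cycles and non-trivial cut-edges. The sketch below is illustrative only: it assumes the input is already a valid qnet given as a simple graph, uses the fact that the cycles of a level-1 network share no edges (so their number equals |E| - |V| + 1), and introduces the hypothetical helpers `build`, `connected_without` and `qnet_type`.

```python
def build(edges):
    """Build an adjacency dictionary from a list of undirected edges."""
    adj = {}
    for v, w in edges:
        adj.setdefault(v, set()).add(w)
        adj.setdefault(w, set()).add(v)
    return adj


def connected_without(adj, removed):
    """Check whether the graph stays connected after deleting the edge `removed`."""
    start = next(iter(adj))
    seen, stack = {start}, [start]
    while stack:
        v = stack.pop()
        for w in adj[v]:
            if frozenset((v, w)) != removed and w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == len(adj)


def qnet_type(adj):
    """Classify a qnet (assumed binary, level-1, four leaves) as Type I, II, III or IV."""
    edges = {frozenset((v, w)) for v in adj for w in adj[v]}
    cycles = len(edges) - len(adj) + 1  # number of cycles when cycles are edge-disjoint
    # A cut-edge is non-trivial when neither of its endpoints is a leaf.
    nontrivial_cut_edges = [
        e for e in edges
        if all(len(adj[v]) > 1 for v in e) and not connected_without(adj, e)
    ]
    if cycles == 0:
        return "I"
    if cycles == 2:
        return "III"
    if cycles == 1:
        return "II" if nontrivial_cut_edges else "IV"
    raise ValueError("not a qnet: unexpected number of cycles")


type1 = build([("a", "u"), ("b", "u"), ("u", "v"), ("c", "v"), ("d", "v")])
type2 = build([("a", "t1"), ("b", "t2"), ("t1", "t2"), ("t2", "t3"), ("t3", "t1"),
               ("t3", "v"), ("c", "v"), ("d", "v")])
type3 = build([("a", "s1"), ("b", "s2"), ("s1", "s2"), ("s2", "s3"), ("s3", "s1"),
               ("s3", "t3"), ("t1", "t2"), ("t2", "t3"), ("t3", "t1"),
               ("c", "t1"), ("d", "t2")])
type4 = build([("a", "u1"), ("b", "u2"), ("c", "u3"), ("d", "u4"),
               ("u1", "u2"), ("u2", "u3"), ("u3", "u4"), ("u4", "u1")])
```

Testing every edge by deletion is quadratic in the size of the graph, which is harmless here because a qnet has at most a handful of vertices.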

We now turn to characterizing when a qnet system is displayed by a level-1 network. To do this, we introduce some additional concepts concerning qnet systems. First, a qnet system $\mathcal{F}$ on X is consistent (on subsets of X of size three) if for all subsets $A \subseteq X$ with $|A| = 3$, $F|_A$ is isomorphic to $F'|_A$, for each pair of qnets $F, F'$ in $\mathcal{F}$ with $A \subseteq L(F) \cap L(F')$. In addition, a qnet system $\mathcal{F}$ on X is minimally dense if for all $Y \subseteq X$ with $|Y| = 4$, there exists precisely one qnet in $\mathcal{F}$ with support Y. Second, we say that a qnet system on X is cyclically transitive or cyclative if for all subsets with , the system also contains . Note that this is closely related to the cyclic-ordering inference rule given in Bandelt and Dress (1992, Proposition 1). Finally, we say that a qnet system $\mathcal{F}$ on X is saturated, if for all subsets , the following hold:

(S1) If $\mathcal{F}$ contains , then , or , or , or is contained in $\mathcal{F}$.
(S2) If $\mathcal{F}$ contains , then , or , or , or is contained in $\mathcal{F}$.
(S3) If $\mathcal{F}$ contains , then , or , or , or is contained in $\mathcal{F}$.

We next show how these concepts are related. To prove the following result, given a qnet system $\mathcal{F}$, we shall consider the quartet system consisting of those quartets that are exhibited by some qnet in $\mathcal{F}$, which we shall denote by .

Lemma 2

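Suppose that $\mathcal{F}$ is a qnet system on X, and consider the quartet system consisting of those quartets that are exhibited by some qnet in $\mathcal{F}$. (i) If $\mathcal{F}$ is minimally dense, then this quartet system is thin. (ii) If $\mathcal{F}$ is saturated, then this quartet system is saturated.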

Proof

For the proof of (i), as is minimally dense, for each subset Y of X with size four, there exists precisely one qnet F in whose support is Y. Hence, there exists at most one quartet in with support Y. To prove (ii), consider a quartet in and an arbitrary element x in X that is distinct from a, b, c, d. Let F be a qnet in such that Q is the quartet exhibited by F. Then, F is Type I, II or III. Assume first that F is Type I, then . Since is saturated, by (S1),and so one of the quartets ab|cx and ax|cd is contained in , as required. If F is of Type II or III, then similar arguments using (S2) and (S3), respectively, show that ab|cx or ax|cd is contained in . We now characterize when a minimally dense set of qnets is displayed by a level-1 network.

Theorem 2

Let be a minimally dense qnet system on X with . Then, for some (necessarily unique) binary, level-1 network N on X if and only if is consistent, cyclative and saturated. Clearly, if holds for a binary, level-1 network N, then is consistent, cyclative and saturated. We now show that the converse holds. Suppose that is a minimally dense qnet system on X that is consistent, cyclative and saturated. Consider the quartet system . By Lemma 2, is thin and saturated. Therefore, by Theorem 1, there exists a unique phylogenetic tree T with . For each interior vertex v in T, let denote the partition of X induced by deleting v from T so that, in particular, the number of parts in is equal to the degree of v. Note that, for all , if and , the path in T between a and b must contain v, and if , the path between a and b does not contain v. We next partition the set of interior vertices of T. Let be the set of degree-3 vertices v in T with the property that there exist three elements, one from each distinct part of , so that there exists a qnet F in whose restriction to these three elements is of cycle type. Let be the set of degree-3 vertices in T not contained in . Lastly, let be the set of interior vertices in T with degree at least 4.

Claim 1

A degree-3 vertex v in T is contained in if and only if, for each subset Y of X of size three that contains precisely one element from each part of , the restriction is of cycle type for every qnet F in with . Since is minimally dense, the “if ” direction follows directly from the definition of . Conversely, let be such that , , are all contained in distinct parts of and there exists a qnet in such that is of cycle type. Now let with all contained in distinct parts of and let F be an arbitrary qnet in with . We shall show that is of cycle type by considering the size of the intersection . First assume that , that is, . Then, as is consistent, is of cycle type since it is isomorphic to . Second assume that . By swapping the indices, we may further assume that , , and . In other words, we have . Consider and let be the qnet in with . Since are both contained in , the quartet is contained in . As is of cycle type, this implies that is either or . In both cases, is of cycle type, and hence is also of cycle type in view of the consistency of . Next assume that . By swapping the indices, we may further assume that, for , elements and are contained in the same part of but . Consider the sets and , and put and . Then, we have for . Repeatedly applying the argument used when the size of the intersection is two, it follows that is of cycle type, as required. Lastly, the case can be established using a similar argument to that when the size of the intersection is zero. This completes the proof of the claim. Although we will not use this fact later, note that it follows from Claim 1 that a vertex v in T is contained in if and only if, for each subset Y of X of size three whose elements are contained in distinct elements of , the restriction is a tree type for every qnet F in with .

Claim 2

Suppose . Let be contained in distinct parts of , respectively. Then, the qnet F in with support is of Type IV. Moreover, if F is , then, for all , , and , the qnet with support is . Suppose F is not of Type IV. Then, contains precisely one quartet, denoted by Q, and . This implies that . However, Q is not contained in because the path between any pair of distinct elements in A contains v; a contradiction. Thus, F is of Type IV. Now, suppose . Then, we may further assume without loss of generality that , , , and . Hence, . Note that the argument in the last paragraph implies that is of Type IV. If is not isomorphic to , then is isomorphic to either or . In the first subcase, since is cyclative and , the qnet is contained in . This implies that the quartet is not contained in , a contradiction since are contained in while p, y are contained in . The second subcase follows in a similar way. Lastly, if , then note that there exists a list of 4-element subsets for some such that, for , we have and the two elements in are contained in the same part of . Claim 2 follows by repeatedly applying the argument in the last paragraph to the list. Using the last claim, we next establish the following.

Claim 3

For each vertex , there exists a unique circular ordering of the parts of such that, for each tuple with , the qnet in with support is isomorphic to .

In the light of Claim 2, we can define a quaternary relation || on the parts of by setting AB||CD, for all distinct parts , if and only if, for all , , and , the qnet with support is . Put differently, the distance between x and p in the qnet with support is two, and so is the distance between y and q. Now, for all distinct parts A, B, C, D, E, we show that

(BD-1) AB||CD implies BA||CD and CD||AB;
(BD-2) either AB||CD, or AC||BD, or AD||BC (exclusively);
(BD-3) AC||BD and AD||CE implies AC||BE.

Indeed, let , , , , . Then, (BD-1) holds since is isomorphic to and to . Next, (BD-2) follows immediately since is minimally dense. To see (BD-3) holds, note that since AD||CE and AC||BD imply that and are contained in , using the fact that is cyclative implies that is in , and hence AC||EB holds. Using (BD-1), it follows that AC||BE, as required.

Since the quaternary relation || on satisfies the conditions (BD-1)–(BD-3) as specified in Proposition 1 on page 73 of Bandelt and Dress (1992), it follows that || determines a unique circular ordering of the parts in as specified in Claim 3.

Now let , and for each vertex , fix a circular ordering of its neighbourhood induced by the ordering of in Claim 3 if , or the necessarily unique circular ordering (clockwise and anticlockwise are treated as the same) of if (and hence ). Let N be the level-1 network obtained from T by blowing up each vertex u in using the given circular ordering of . We next show that . To this end, fix four arbitrary elements a, b, c, d in X and let F be the qnet in with support $\{a, b, c, d\}$. We need to show that . There are four cases depending upon whether F is Type I, II, III, or IV. First suppose F is of Type I. Without loss of generality, we may assume that . Let . If , then a, b, c are contained in three distinct parts in the partition of X on u.
By Claims 1 and 2, it follows that with is of cycle type, a contradiction. Thus, and so there exists a cut-vertex in N whose removal induces three connected components, containing a, b and c, respectively. Similarly, the median is contained in . Hence, there exists a cut-vertex in N whose removal induces three connected components, containing a, c and d, respectively. Let be the qnet in whose support is . Thus, by inspecting all possible qnets on , it follows that is isomorphic to , and hence . Second, suppose that F is of Type II. Without loss of generality, we may assume that . Let be the qnet in whose support is . Let u be the median of a, c, d in T. Then, by an argument similar to the one used in the last paragraph, it follows that there exists a cut-vertex in N (and hence also a cut-vertex in ) whose removal results in three connected components, containing a, c and d, respectively. On the other hand, let v be the median of in T. Then, a, b, c are contained in three distinct parts of . Since is of cycle type, by Claim 2 it follows that , which implies that is also of cycle type. Thus, by inspecting all possible qnets on , it follows that is isomorphic to , and hence . Next, suppose that F is of Type III. Without loss of generality, we may assume that . Let be the qnet in whose support is . Let u be the median of in T and v be the median of in T. Since the quartet ab|cd is contained in , we know that u and v are distinct. Hence, there exists a cut-edge whose deletion puts a and b in one component and c and d in the other connected component. By an argument similar to that used for analysing when F is of Type II, it follows that and are both of cycle type. Hence, by inspecting all possible qnets on , the qnet is isomorphic to , and hence . Lastly, suppose that F is of Type IV. Without loss of generality, we may assume that . Let be the qnet in whose support is . Hence, there exists no quartet in whose support is A. Therefore, . 
Denoting this median by u, it follows that u is necessarily contained in , and hence contains vertices. Now let be the unique circular ordering of vertices induced by the circular ordering of in Claim 3. Without loss of generality, we may assume that . Then, there exists such that . By the construction of N (which locally is the blow-up at u with respect to the circular ordering), it follows that is isomorphic to F, and hence . This shows that . Since and are both minimally dense, we have . Finally, the uniqueness statement concerning N is a direct consequence of the uniqueness of T and the unique way in which N is constructed from T.
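The conditions (BD-1)–(BD-3) used in the proof of Claim 3 can be checked by brute force for the relation that a circular ordering induces. The sketch below rests on an assumption made here for illustration: AB||CD is read as "the pair {A, B} separates the pair {C, D} on the circle", which is one natural reading consistent with the three conditions as stated. The function names `separates` and `satisfies_bd` are hypothetical.

```python
from itertools import permutations


def separates(order, a, b, c, d):
    """True if, on the circular ordering `order`, exactly one of c, d lies strictly between a and b."""
    pos = {x: i for i, x in enumerate(order)}
    lo, hi = sorted((pos[a], pos[b]))

    def inside(x):
        return lo < pos[x] < hi

    return inside(c) != inside(d)


def satisfies_bd(order):
    """Check (BD-1), (BD-2) and (BD-3) for the separation relation of a circular ordering."""
    for a, b, c, d in permutations(order, 4):
        ab_cd = separates(order, a, b, c, d)
        # (BD-1): AB||CD implies BA||CD and CD||AB.
        if ab_cd and not (separates(order, b, a, c, d) and separates(order, c, d, a, b)):
            return False
        # (BD-2): exactly one of AB||CD, AC||BD, AD||BC holds.
        if [ab_cd, separates(order, a, c, b, d), separates(order, a, d, b, c)].count(True) != 1:
            return False
    for a, b, c, d, e in permutations(order, 5):
        # (BD-3): AC||BD and AD||CE imply AC||BE.
        if (separates(order, a, c, b, d) and separates(order, a, d, c, e)
                and not separates(order, a, c, b, e)):
            return False
    return True


parts = ["A", "B", "C", "D", "E", "F"]
```

The proof uses the converse direction, namely that a relation satisfying the three conditions determines a unique circular ordering (Bandelt and Dress 1992, Proposition 1); the code only confirms that orderings do give rise to such relations.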

A Characterization of Level-1 Quartet Systems

We now use Theorem 2 to characterize when a quartet system is equal to the set of quartets displayed by a binary level-1 network. This characterization is given as Theorem 3. Let $\mathcal{Q}$ be a quartet system on X. A quartet Q in $\mathcal{Q}$ is distinguished if Q is the only quartet in $\mathcal{Q}$ with support equal to the leaf-set of Q. Moreover, a network N is called 3-cycle free if it does not contain any cycle consisting of three vertices.

Theorem 3

Let be a dense quartet system on X with . Then, for some binary level-1 network N on X if and only if the following three conditions hold:Moreover, if satisfies (D1)–(D3), then there exists a unique level-1, 3-cycle free network N with . For all , we have or . If , then , for distinct. If ab|cd is a distinguished quartet in , then, for each where are distinct, either ax|cd or ab|cx is a distinguished quartet in . It is easily checked that, if holds for some binary level-1 network N, then (D1)–(D3) holds. Conversely, let be a dense quartet system satisfying (D1)–(D3). Let be the set consisting of the distinguished quartets contained in . We first associate a phylogenetic X-tree T to . If , then we let T denote the phylogenetic X-tree which contains precisely one vertex that is not a leaf (i.e. a “star tree”). If , then let be some quartet contained in , . Suppose that there exists some . Then, by (D3), either or . It follows that . Moreover, as is clearly thin and by (D3) is saturated, it follows by Theorem 1, that there exists a phylogenetic X-tree T with . Now we construct a qnet system as follows. Let be the subset of consisting of those Y with , and . To each we associate a qnet as follows. Swapping the labels of the elements in if necessary, we may assume that is the (necessarily unique) quartet in with leaf-set . Now let and be the median of in Q and T, respectively. Similarly, let and be the median of in Q and T, respectively. Then, is the qnet on obtained from Q by performing a blow-up on each of , where , if and only if the degree of in T is at least four. We also associate a qnet to each as follows. Swapping the labels of the elements in if necessary, we may assume that the quartets in with leaf-set are ab|cd and ad|bc. We then define to be the qnet . Now, let . By construction is minimally dense. Moreover, , and is cyclative in view of (D2). Next, we shall show that is consistent. Fix a subset and consider its median v in T. 
By construction, it suffices to establish the claim that the degree of v is three in T if and only if, for each , the set is not contained in . To see that this claim holds first note that if v has degree three, then each of the three components of contains precisely one element in . Without loss of generality, we may assume that element d is contained in the connected component containing element c. But this implies that ab|cd is a quartet in , and hence . On the other hand, if v has degree at least four, then there exists an element such that x, a, b, c belong to four different connected components of . Therefore, and are disjoint. This implies that is not contained in , and so it is contained in . This establishes the claim. Next, we show that is saturated. We shall show that (S2) holds; the fact that satisfies (S1) and (S3) can be established by a similar argument. Let be a set that satisfies the condition in (S2), that is, is contained in . Then, ab|cd is a quartet in . Furthermore, put and , then the degree of u is at least four and the degree of v is three. Now, fix an element . If x and a are in the same connected component resulting from deleting v from T, then ax|cd is a quartet in . Since the median of a, c, d in T has degree three, by construction either or (but not both) is contained in . Otherwise, ab|cx is a quartet in . Since the median u of a, b, c in T has degree greater than three, by construction we can conclude that either or is contained in (but not both). This completes the verification of (S2). It follows that is minimally dense, cyclative, consistent and saturated. By Theorem 2, there exists a unique binary level-1 network N on X such that . By construction, it also follows that . The uniqueness statement in the theorem follows from the uniqueness of N and the fact that for two binary level-1 networks N and if and only if N and on X differ only by 3-cycles (see e.g. Keijsper and Pendavingh 2014, Lemma 2).

Quarnet Inference Rules and Closure

For a quartet system $\mathcal{Q}$ on X, we write $\mathcal{Q} \vdash ab|cd$ precisely if every phylogenetic X-tree that displays $\mathcal{Q}$ also displays ab|cd. The statement $\mathcal{Q} \vdash ab|cd$ is known as a quartet inference rule (Semple and Steel 2003). A well-known example of such a rule leads to the concept of the semi-dyadic closure of the set $\mathcal{Q}$, that is, the minimal set of quartets that contains $\mathcal{Q}$ and is closed under that rule. In this section, we define analogous concepts for qnets and show that they have similar properties to those enjoyed by phylogenetic trees. If $\mathcal{F}$ is a qnet system, we write $\mathcal{F} \vdash F$ for some qnet F if every binary level-1 network that displays $\mathcal{F}$ also displays F. Now, let denote symbols in . For example, is equivalent to when and . We introduce three qnet inference rules, referred to as (CL1)–(CL3). We illustrate two of these rules in Fig. 5.

Fig. 5

An illustration of the (CL2) and (CL3) inference rules. Top: The first part of the (CL2) inference rule with . Bottom: the (CL3) inference rule

We remark in passing that the qnet system implies that inference rules (CL1)–(CL3) are independent from one another. Using Theorem 2, it is straightforward to show that the above three rules are well defined. That is, given three qnets $F_1$, $F_2$ and F such that $\{F_1, F_2\} \vdash F$ holds for one of the above three rules, then every binary level-1 network that displays $\{F_1, F_2\}$ must display F. For a qnet system $\mathcal{F}$, we define the closure of $\mathcal{F}$ to be the minimal qnet system (under set-inclusion) that contains $\mathcal{F}$ and has the property that if $\{F_1, F_2\} \vdash F$ holds under (CL1)–(CL3) for qnets $F_1, F_2$ in it, then F is also contained in it. The following key proposition is analogous to that for semi-dyadic closure for quartet systems (cf. Meacham 1983; Huber et al. 2005, Proposition 2.1). It follows from the fact that the closure of a qnet system $\mathcal{F}$ can clearly be obtained from $\mathcal{F}$ by repeatedly applying the qnet rules (CL1)–(CL3) until the sequence of sets so obtained stabilizes. Note that this process must clearly terminate in polynomial time.
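The idea of computing a closure by applying rules until nothing new appears can be sketched as a fixed-point iteration. Because the statements of (CL1)–(CL3) are not reproduced here, the example below deliberately swaps in the quartet rule from the Introduction (ab|cx plus ab|xd implies ab|cd) as a stand-in; it is not the qnet closure itself. All names (`quartet`, `labelings`, `transitive_rule`, `closure`) are hypothetical.

```python
from itertools import permutations


def quartet(a, b, c, d):
    """Encode the quartet ab|cd as an unordered pair of unordered pairs."""
    return frozenset({frozenset({a, b}), frozenset({c, d})})


def labelings(q):
    """Yield every tuple (a, b, c, d) such that q equals ab|cd."""
    first, second = tuple(q)
    for left, right in ((first, second), (second, first)):
        for a, b in permutations(left):
            for c, d in permutations(right):
                yield a, b, c, d


def transitive_rule(q1, q2):
    """Stand-in rule: from q1 = ab|cx and q2 = ab|xd infer ab|cd."""
    inferred = set()
    for a, b, c, x in labelings(q1):
        for a2, b2, x2, d in labelings(q2):
            if {a2, b2} == {a, b} and x2 == x and d != c:
                inferred.add(quartet(a, b, c, d))
    return inferred


def closure(system, rule):
    """Apply a two-premise rule repeatedly until the set of objects stabilizes."""
    closed = set(system)
    while True:
        new = set()
        for q1 in closed:
            for q2 in closed:
                new |= rule(q1, q2) - closed
        if not new:
            return closed
        closed |= new


start = {quartet(1, 2, 3, 4), quartet(1, 2, 4, 5)}
closed = closure(start, transitive_rule)
```

Each round either adds at least one new object or stops, and only finitely many quartets exist on a finite set X, so the loop terminates. The same loop structure would apply to qnets once rule functions for (CL1)–(CL3) are supplied in place of `transitive_rule`.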

Proposition 1

Let be a qnet system and let N be a binary, level-1 network. Then, N displays if and only if N displays .

We now show that behaves in a similar way to the semi-dyadic closure of a quartet system (cf. Semple and Steel 2003, Exercise 19, p. 143).

Theorem 4

Suppose that is a minimally dense, consistent set of qnets on X with . Then, the following statements are equivalent:

(i) holds for a (necessarily unique) binary, level-1 network N on X;
(ii) ;
(iii) For every 3-element subset of , the subset is displayed by some binary level-1 network on X.

That (i) implies (ii) and that (i) implies (iii) is straightforward. We complete the proof by showing that (ii) implies (i) and (iii) implies (i).

For the proof of (ii) implies (i), suppose that . Note first that by (CL3) is cyclative. Moreover, is minimally dense and consistent by assumption. Hence, by Theorem 2, it suffices to show that is saturated. To this end, let w, x, y, z, t be five pairwise distinct elements in X such that is contained in with and . We need to show that satisfies (S1)–(S3). For , let be the qnet on that is contained in (which must exist as is minimally dense).

First assume that there exists some element p in such that the qnet is of Type IV. Without loss of generality, assume (the other cases can be established in a similar manner). Since is of Type IV, by the consistency of we have . Now, applying (CL2) with , , , , implies that , by (ii). Therefore, satisfies (S2) and (S3) (corresponding, respectively, to taking and ). It follows that in the remainder of the proof we can assume that none of the qnets in is of Type IV.

For convenience, in the following, we use the convention that when we apply (CL1), we write a 5-tuple whose ith element corresponds to the ith element in the tuple (a, b, c, d, e) of elements used in (CL1) for .

To show that satisfies (S1), suppose that . Note first that if , then applying (CL1) to (x, w, y, z, t) implies , and hence (S1) holds. Similarly, if , then applying (CL1) to (z, y, w, x, t) implies , and hence (S1) holds. Therefore, if (S1) does not hold, then, by consistency, we may assume and with . Considering and , and applying (CL1) to (x, y, t, w, z) implies .
On the other hand, considering F and and applying (CL1) to (z, y, x, w, t) implies that , a contradiction to the fact that is minimally dense. Thus, satisfies (S1). Using an argument similar to the one that we used to show that satisfies (S1), it is straightforward to deduce that satisfies (S2) and (S3).

We next prove that (iii) implies (i). Since is minimally dense and consistent by assumption, it follows by Theorem 2 that it suffices to show that is cyclative and saturated.

First, we show that is cyclative. If not, then there exist five elements such that and are contained in but is not contained in . Let be the (necessarily unique) qnet in whose leaf-set is . Then, . Consider the set . The assumption (iii) implies that is displayed by a binary level-1 network N on X. Consider . Then, . By Theorem 2, is minimally dense and cyclative. Since , it follows that , a contradiction in view of .

Second, we show that is saturated. Here, we only show that satisfies (S2), as showing that satisfies (S1) and (S3) can be done in a similar manner. If does not satisfy (S2), then there exists a 5-element set such that is contained in while, for the qnet system we have . Let and be the qnets in with leaf sets and , respectively, which must exist as is minimally dense by assumption. Then, neither nor is contained in . Lastly, consider the subset of . Then, as assumption (iii) holds, it follows that is displayed by a binary level-1 network N on X. Consider . Then, . By Theorem 2, is minimally dense and saturated. Using the fact that is saturated, it follows that as . Therefore, contains either two distinct qnets on A or two distinct qnets on B, a contradiction to the fact that is minimally dense. Thus, (iii) implies (i), thereby completing the proof of the theorem.

Note that it follows from Theorem 4 that we can decide whether or not a given minimally dense set of qnets is displayed by a binary, level-1 phylogenetic network on leaves in time. This follows since we can compute in time. It would be interesting to see if this time bound can be improved upon.

Discussion

We have shown that by considering quarnets we can define natural inference rules, as well as the concept of quarnet closure. With quartets, there are various types of inference rules, which imply alternative definitions of closure for quartet systems (see e.g. Bryant and Steel 1995; Semple and Steel 2003). It would thus be of interest to explore whether there are other types of inference rules for quarnets and, if so, what their properties are.

In this paper, we have focused on understanding the closure for a minimally dense set of quarnets. For real data, it may be necessary to consider non-minimally dense sets (e.g. when data are missing). Hence, it could be useful to develop results for such situations. However, it should be noted that understanding the closure of a non-minimally dense set of quartets is already quite challenging: in contrast to the minimally dense case, deciding whether or not an arbitrary set of quartets can be displayed by a phylogenetic tree is NP-complete (Steel 1992).

In many applications, biologists prefer to use weighted phylogenetic trees and networks to model their data, where non-negative numbers are assigned to edges of the tree or network to, for example, represent evolutionary distance. The problem of deciding when a dense set of weighted quartets can be represented by a weighted phylogenetic tree has been considered in Dress and Erdös (2003) and Grünewald et al. (2008). Given the results in this paper, it could therefore be of interest to consider how weighted level-1 networks may be inferred from dense sets of weighted quarnets.

In applications, it can also be useful to consider rooted networks, which are essentially leaf-labelled, directed acyclic graphs. Edges in such networks have a direction which represents the fact that species evolve through time from a common ancestor (represented in graph theoretical terms by a root vertex). For such networks, the concept of level-1 networks can be defined in a similar way to the unrooted case, and algorithms are known for deciding when minimally dense collections of 3-leaved, rooted level-1 phylogenetic networks (which are known as trinets) can be displayed by a single phylogenetic network (Huber and Moulton 2013; Huber et al. 2017). It would thus be of interest to consider inference rules for trinets.

Moreover, for both the rooted and unrooted case, it could be worth exploring whether there are inference rules for more complicated networks (e.g. networks with level higher than one, as defined in e.g. Gambette et al. 2012). Although results in Iersel and Moulton (2017) indicate that such inference rules might exist, we expect that any such rules will probably be quite complicated.



References

1. QNet: an agglomerative method for the construction of phylogenetic networks from weighted quartets.

2. Encoding phylogenetic trees in terms of weighted quartets.

3. Quartets and unrooted phylogenetic networks.

4. Reconstructing a phylogenetic level-1 network from quartets.

5. Interactive tree of life (iTOL) v3: an online tool for the display and annotation of phylogenetic and other trees.
