
The hybrid number of a ploidy profile.

Abstract

Polyploidization, whereby an organism inherits multiple copies of the genome of their parents, is an important evolutionary event that has been observed in plants and animals. One way to study such events is in terms of the ploidy number of the species that make up a dataset of interest. It is therefore natural to ask: How much information about the evolutionary past of the set of species that form a dataset can be gleaned from the ploidy numbers of the species? To help answer this question, we introduce and study the novel concept of a ploidy profile which allows us to formalize it in terms of a multiplicity vector indexed by the species the dataset is comprised of. Using the framework of a phylogenetic network, we present a closed formula for computing the hybrid number (i.e. the minimal number of polyploidization events required to explain a ploidy profile) of a large class of ploidy profiles. This formula relies on the construction of a certain phylogenetic network from the simplification sequence of a ploidy profile and the hybrid number of the ploidy profile with which this construction is initialized. Both of them can be computed easily in case the ploidy numbers that make up the ploidy profile are not too large. To help illustrate the applicability of our approach, we apply it to a simplified version of a publicly available Viola dataset.


Keywords: Binary representation; Hybrid number; Multiplicity vector; Phylogenetic network; Ploidy profile; Prime factor decomposition; Simplification sequence


Year: 2022 PMID: 36114394 PMCID: PMC9481518 DOI: 10.1007/s00285-022-01792-6

Source DB: PubMed Journal: J Math Biol ISSN: 0303-6812 Impact factor: 2.164

Introduction

Datasets such as the Viola dataset considered in Marcussen et al. (2012) arise when species inherit multiple sets of chromosomes from their parents. Generally referred to as polyploidization, this can be due to whole genome duplication (also called autopolyploidization), as in the case of e.g. watermelons and bananas (Varoquaux et al. 2000), or to obtaining an additional complete set of chromosomes via hybridization (also called allopolyploidization), as in the case of the frog genus Xenopus (Ownbey 1950). This poses the following intriguing question at the center of this paper: How much information about the evolutionary past of a set of species can be gleaned from the ploidy number (i.e. the number of complete chromosome sets in a genome) of the species? Invoking parsimony to capture the idea that polyploidization is a relatively rare evolutionary event, we rephrase this question as follows: What is the minimum number of polyploidization events necessary to explain a dataset's observed ploidy profile? For a set X of species that make up a dataset, we define such a profile to be the multiplicity vector $\vec{m} = (m_1, \ldots, m_n)$ for $n = |X|$, indexed by the species in X where, for each $1 \le i \le n$, the ploidy number of species $x_i$ is $m_i$. As it turns out, an answer to this question is well-known if the ploidy profile in question is presented in terms of a multi-labelled tree (see e.g. Huber and Moulton 2006; Huber et al. 2006; Marcussen et al. 2015, 2012). Since it is, however, not always clear how to derive a biologically meaningful multi-labelled tree from the dataset in the first place (Huber et al. 2012), we focus here on ploidy profiles for which such a tree is not necessarily available. Due to the reticulate nature of the signal left behind by polyploidization (Sagitov et al. 2013; Wagner et al. 2017; Waight et al. 2020), phylogenetic networks offer themselves as a natural framework to formalize and answer our question.
Although we present a definition of such structures (and all other concepts used in this section) below, from an intuition development point of view, it suffices to observe at this stage that a phylogenetic network can sometimes be thought of as a rooted directed bifurcating tree T with a pre-given set X as leaves to which additional arcs have been added via joining subdivision vertices of arcs of T so that the following property holds. The resulting graph is a rooted directed acyclic graph with leaf set X such that a subdivision vertex v of T either only has additional arcs starting at it or only additional arcs ending at it. For our purposes we only allow the case that v has one additional outgoing arc. Subdivision vertices that have at least one additional incoming arc are called hybrid vertices and are assumed to represent reticulate evolutionary events such as polyploidization. If a hybrid vertex in a phylogenetic network N also has overall degree three then N is generally called a binary phylogenetic network. We refer the interested reader to Fig. 1i for an example of a binary phylogenetic network on that is obtained from the tree depicted in Fig. 1ii and to Gusfield (2014), Huber and Moulton (2013), Huson et al. (2010), Steel (2016) for methodology and construction algorithms surrounding phylogenetic networks. Note that to be able to account for autopolyploidization, we deviate from the usual notion of a phylogenetic network by allowing our phylogenetic networks to have parallel arcs (but no loops) – see e.g. Huber et al. (2021), Van Iersel et al. (2020) and the references therein for further results concerning such networks.

Fig. 1

i One of potentially many phylogenetic networks that realize the ploidy profile $\vec{m} = (12, 6, 6, 5)$ on $X = \{x_1, \ldots, x_4\}$. To improve clarity of exposition, we always assume that arcs are directed downward, away from the root. ii A (phylogenetic) tree to which subdivision vertices and arcs have been added to obtain the phylogenetic network in i (see the text for details)

By taking for every leaf x of a binary phylogenetic network N on some finite set X the number of directed paths from the root of N to x, every such phylogenetic network induces a multiplicity vector $\vec{m}$ indexed by the elements in X. Saying that N realizes $\vec{m}$ in this case (see Sect. 3 for an extension of this concept to phylogenetic networks) allows us to formalize our question as follows. Suppose $\vec{m}$ is a ploidy profile indexed by the elements of some finite set X. What can be said about the minimum number of hybrid vertices required by a binary phylogenetic network on X to realize $\vec{m}$? We call this number, which is central to the paper, the hybrid number of $\vec{m}$ and denote it by $h(\vec{m})$. If a binary phylogenetic network N that realizes $\vec{m}$ has $h(\vec{m})$ hybrid vertices then we also say that N attains $\vec{m}$ (see again Sect. 3 for an extension of this concept to phylogenetic networks). The interested reader is referred to Steel (2016) for an overview of the related concept of the hybrid number of a set of phylogenetic trees (i.e. leaf-labelled rooted trees without any vertices of indegree and outdegree one whose leaf set is a pre-given set). Before proceeding with presenting an example to help illustrate this question we remark that multiplicity vectors realized by binary phylogenetic networks have been used in Rossello et al. (2008) to define a metric for a certain class of binary phylogenetic networks. Furthermore, the stronger assumption that the number of directed paths from every vertex of a binary phylogenetic network N to every leaf of N is known has led to the introduction of the concept of an ancestral profile for N (Steel et al. 2019). Returning to our question, consider the ploidy profile $\vec{m} = (12, 6, 6, 5)$ indexed by $X = \{x_1, \ldots, x_4\}$ where the multiplicity of $x_1$ is 12, that of $x_2$ and $x_3$ is 6, and that of $x_4$ is 5.
Since no binary phylogenetic network on one leaf and two hybrid vertices can realize the ploidy profile (5) because it has at most $2^2 = 4$ directed paths from the root to the leaf, it follows that a binary phylogenetic network that realizes (5), and therefore also one that realizes $\vec{m}$, must have at least three hybrid vertices. In fact, the subnetwork in bold of the phylogenetic network depicted in Fig. 1i is the unique (subject to letting the arc a finish at a subdivision vertex of an outgoing or incoming arc of the hybrid vertex h or letting a start at a subdivision vertex of an outgoing or incoming arc of the vertex t) binary phylogenetic network that realizes (5) and uses a minimum number of hybrid vertices. To be able to realize the ploidy profile (6, 5) and therefore also the ploidy profile (6, 6, 5), at least four hybrid vertices are therefore needed. By counting directed paths from the root to each leaf, the phylogenetic network obtained from the one depicted in Fig. 1i by removing $x_1$, the hybrid vertex above $x_1$, the two incoming arcs of that hybrid vertex, and one further arc, and by suppressing any resulting vertices of indegree and outdegree one, clearly realizes (6, 6, 5). In a similar sense as the network that realizes (5), that phylogenetic network is also unique. To obtain from it a binary phylogenetic network that realizes $\vec{m}$, at least one further hybrid vertex is needed. Again by counting directed paths from the root to each leaf, it is easy to check that the binary phylogenetic network depicted in Fig. 1i realizes $\vec{m}$ and postulates five hybrid vertices. As we shall see as a direct consequence of Theorem 2, $h(\vec{m}) = 5$. As a further consequence of that theorem, we obtain a closed formula for the hybrid number of a ploidy profile (Corollary 1).

The outline of the paper is as follows. In the next section, we present some relevant basic terminology and notation concerning phylogenetic networks. This also includes an unfold operation for phylogenetic networks and a fold-up operation that generates phylogenetic networks, both of which were introduced originally in Huber and Moulton (2006). In Sect. 3, we extend the concept of attainment from binary phylogenetic networks to phylogenetic networks and study structural properties of phylogenetic networks that attain a ploidy profile. As part of this, we introduce the two main concepts of the paper: a simple ploidy profile and an attainment of a ploidy profile. In Sect. 4, we associate two binary phylogenetic networks to a simple ploidy profile. As we shall see, the former is based on the prime factor decomposition of a positive integer m and the latter on a binary representation of m. In Sect. 5, we associate a sequence to a ploidy profile $\vec{m}$ which we call the simplification sequence of $\vec{m}$ (Algorithm 1). As part of this, we also present some basic results concerning such sequences. This includes an infinite family of ploidy profiles that shows that such a sequence can grow exponentially large. Starting from a binary phylogenetic network that attains the last element of the simplification sequence for $\vec{m}$, we then employ a traceback through that sequence to obtain the aforementioned binary phylogenetic network (Algorithm 2). Motivated by our partial results for binary phylogenetic networks that realize a simple ploidy profile summarized in Theorem 1, we provide an upper bound on the hybrid number of a ploidy profile for special cases (Proposition 2). After collecting some preliminary results for this network in Sect. 5, we establish in Sect. 6 that it attains $\vec{m}$ for a large class of ploidy profiles (Theorem 2). In Sect. 7, we turn our attention to computing the hybrid number of the ploidy profile of a simplified version of the aforementioned Viola dataset from Marcussen et al. (2012). We conclude with Sect. 8 where we outline potential directions of further research.
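
The path-counting argument used in the example above can be written as a general lower bound. The following is a sketch under our own notation rather than a statement taken from the paper: $k$ denotes the number of hybrid vertices of a binary phylogenetic network $N$, and $m_N(x)$ the number of directed root-to-leaf paths ending in the leaf $x$. Tracing such a path backwards from $x$, a choice between two incoming arcs arises only at hybrid vertices, which gives the first inequality.

```latex
% Sketch (our notation): k hybrid vertices, each of indegree two in a binary network
m_N(x) \;\le\; 2^{k} \quad \text{for every leaf } x
\qquad\Longrightarrow\qquad
h(\vec{m}) \;\ge\; \Bigl\lceil \log_2 \max_{1 \le i \le n} m_i \Bigr\rceil .
% Example: a leaf of multiplicity 5 needs k >= 3, since 2^2 = 4 < 5 <= 2^3.
```

The bound need not be tight, since it only looks at the largest component of the profile and ignores how the remaining leaves constrain the network.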

Preliminaries

We start with introducing basic concepts surrounding phylogenetic networks. Subsequent to this, we briefly describe two basic operations concerning phylogenetic networks that are central for establishing a key result (Proposition 1). For the convenience of the reader, we illustrate both operations in Figs. 2 and 3 by means of an example. Throughout the paper we assume that X is a non-empty finite set. We denote the size of X by n.

Fig. 2

i The MUL-tree M obtained by unfolding the phylogenetic network on in iv. The trees T(u) and T(v) rooted at u and v and indicated with a double arrow, respectively, are equivalent. In fact, they are maximal inextendible. ii Subdivision of the incoming arcs of u and v by and , respectively. iii Identifying the vertices and . iv Deleting the subtree T(v) and the incoming arc of v (indicated by dotted lines in iii)

Fig. 3

i The MUL-tree M obtained by unfolding the phylogenetic network pictured in iv. The vertices u and v as indicated in i are the roots of the maximal inextendible subtrees of M to which the subdivision, identification and deletion process described in Fig. 2 is applied to obtain the rooted directed acyclic graph G presented in iii. The two leaves of G that carry the same label are the roots of two equivalent maximal inextendible subtrees of G, and applying the subdivision, identification, and deletion process to them results in F(M). In each case, the equivalent subMUL-trees are indicated by a double arrow

Basic concepts

Suppose for the following that G is a rooted directed connected acyclic graph which might contain parallel arcs but no loops. Then we denote the vertex set of G by V(G) and its set of arcs by A(G). We denote an arc a starting at a vertex u and ending in a vertex v by (u, v) and refer to u as the tail of a and to v as the head of a. We call an arc a a cut-arc if the deletion of a disconnects G. We call a cut-arc a of G trivial if the head of a is a leaf. Following Van Iersel et al. (2020), we call an induced subgraph of G with two vertices u and v and two parallel arcs from u to v a bead of G. Suppose $v \in V(G)$. Then we refer to the number of arcs coming into v as the indegree of v, denoted by $\mathrm{indeg}_G(v)$, and the number of outgoing arcs of v as the outdegree of v, denoted by $\mathrm{outdeg}_G(v)$. If G is clear from the context then we will omit the subscript in $\mathrm{indeg}_G(v)$ and $\mathrm{outdeg}_G(v)$, respectively. We call v the root of G, denoted by $\rho_G$, if $\mathrm{indeg}(v) = 0$, and we call v a leaf of G if $\mathrm{indeg}(v) = 1$ and $\mathrm{outdeg}(v) = 0$. We denote the set of leaves of G by L(G). We call v a tree vertex if $\mathrm{indeg}(v) = 1$ and $\mathrm{outdeg}(v) \ge 2$. And we call v a hybrid vertex if $\mathrm{indeg}(v) \ge 2$ and $\mathrm{outdeg}(v) = 1$. We denote the set of hybrid vertices of G by H(G). We call any two leaves x and y of G a cherry, denoted by $\{x, y\}$, if x and y share a parent. We say that G is binary if $\mathrm{outdeg}(\rho_G) = 2$ and, for all $v \in V(G)$ other than $\rho_G$, we have that the degree sum $\mathrm{indeg}(v) + \mathrm{outdeg}(v)$ is three. We say that a vertex $w \in V(G)$ is above v if there exists a directed path P from w to v. In that case, we also say that v is below w. If, in addition, $w \ne v$ then we say that w is strictly above v and that v is strictly below w. We call G a (phylogenetic) network (on X) if $\mathrm{outdeg}(\rho_G) \ge 2$, every non-leaf vertex other than $\rho_G$ is a tree vertex or a hybrid vertex, and $L(G) = X$. Note that phylogenetic networks in our sense were called semi-resolved phylogenetic networks in Huber and Moulton (2006). Also note that our definition of a phylogenetic network differs from the standard definition of such an object (see e.g. Steel 2016) by allowing beads. To emphasise that a phylogenetic network has no beads, we will sometimes refer to it as a beadless phylogenetic network.
Suppose G is a phylogenetic network on X. Then following Bordewich and Semple (2007), we define the hybrid number h(G) of G to be

$h(G) = \sum_{v \in H(G)} (\mathrm{indeg}(v) - 1).$

We refer to a phylogenetic network G (on X) as a phylogenetic tree (on X) if $h(G) = 0$. For a phylogenetic tree T on X and a non-root vertex $v \in V(T)$ we denote by T(v) the subtree of T obtained by deleting the incoming arc of v and the subsequently generated connected component that does not contain v. Suppose that N is a phylogenetic network on X. Then we denote the number of directed paths from the root of N to a leaf x of N by $m_N(x)$. In case N is clear from the context, we will write m(x) rather than $m_N(x)$. For a further phylogenetic network N' on X, we say that N and N' are equivalent if there exists a graph isomorphism between N and N' that is the identity on X. Furthermore, we say that N' is a (binary) resolution of N if N' is obtained from N by resolving all vertices in H(N) so that every vertex in H(N') has indegree two. Note that for any resolution N' of N, we have $h(N) = h(N')$.
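
The two quantities defined in this subsection, the number of directed root-to-leaf paths and the hybrid number, are easy to compute for small examples. The following is a minimal sketch, assuming a network is given as a list of `(tail, head)` arcs in which a repeated pair models parallel arcs. The function names `path_counts` and `hybrid_number` and the two toy networks are our own illustrations and do not come from the paper.

```python
from collections import Counter, defaultdict
from functools import lru_cache


def path_counts(arcs, root):
    """Return {leaf: number of directed paths from root to that leaf}.

    arcs: list of (tail, head) pairs; listing a pair twice models a bead.
    """
    parents = defaultdict(list)
    tails = set()
    for tail, head in arcs:
        parents[head].append(tail)  # parallel arcs appear twice on purpose
        tails.add(tail)
    leaves = [v for v in parents if v not in tails]

    @lru_cache(maxsize=None)
    def paths_to(v):
        # The root is reached by exactly one (empty) path.
        if v == root:
            return 1
        return sum(paths_to(p) for p in parents[v])

    return {x: paths_to(x) for x in leaves}


def hybrid_number(arcs, root):
    """Sum of (indegree - 1) over all vertices of indegree at least two."""
    indeg = Counter(head for _, head in arcs)
    return sum(d - 1 for v, d in indeg.items() if v != root and d >= 2)


# Toy network 1: a single bead above the leaf x.
BEAD = [("r", "h"), ("r", "h"), ("h", "x")]
# Toy network 2: a binary network with two hybrid vertices above x.
THREE = [("r", "u"), ("r", "h1"), ("u", "h1"),
         ("u", "h2"), ("h1", "h2"), ("h2", "x")]

print(path_counts(BEAD, "r"), hybrid_number(BEAD, "r"))
print(path_counts(THREE, "r"), hybrid_number(THREE, "r"))
```

Counting paths through the parent lists, rather than enumerating them, keeps the computation linear in the number of arcs even when the number of paths grows exponentially.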

The fold-up F(U(N)) of the unfold U(N) of a phylogenetic network N

Phylogenetic trees on X were generalized in Huber and Moulton (2006) to so-called multi-labelled trees (on X) or MUL-trees (on X), for short, by replacing the leaf set of a phylogenetic tree by a multiset Y on X. Put differently, X is the set obtained from Y by ignoring the multiplicities of the elements in Y. As was pointed out in the same paper, every phylogenetic network N gives rise to a MUL-tree U(N) on X by recording, for every vertex v of N, every directed path from the root of N to v. More precisely, the vertex set of U(N) is, for all vertices $v \in V(N)$, the set of all directed paths P from the root of N to v where we identify P with its end vertex v. Two vertices P and P' in U(N) are joined by an arc if there exists an arc $a \in A(N)$ such that P is obtained from P' by extending P' by the arc a. For example, the vertex u in Fig. 2i is a directed path from the root via s to u in the phylogenetic network in Fig. 2iv which crosses the arc a. The vertex v in Fig. 2i is a directed path from the root via s to u in Fig. 2iv which crosses a different arc.

Reading Fig. 2 from left to right suggests that the unfolding operation can also be reversed. We next briefly outline this reversal operation which may be thought of as the fold-up of a MUL-tree M into a phylogenetic network F(M) (see Huber and Moulton 2006 for details, Huber et al. 2016; Huber and Scholz 2020 for more on both constructions, and Fig. 3 for an example). To make this more precise, we require further terminology. Suppose that M is a MUL-tree on X. Then, for a non-root vertex v of M, we refer to the tail of the incoming arc of v as the parent of v.
Extending the relevant notions from phylogenetic trees to MUL-trees, we say that a subMUL-tree T with root u of M is inextendible if there exists a subMUL-tree T' of M with root vertex $u' \ne u$ such that T and T' are equivalent and either u and u' have the same parent, or they have different parents and the subMUL-trees of M rooted at these parents are not equivalent. By definition, every subMUL-tree of M that is equivalent with an inextendible subMUL-tree of M is necessarily also inextendible. In view of this, we refer to an inextendible subMUL-tree T of M as maximal inextendible if no subMUL-tree of M that is equivalent with T is a subMUL-tree of an inextendible subMUL-tree of M. So, for example, the subMUL-tree T(u) of the MUL-tree M depicted in Fig. 3i is inextendible but the subMUL-tree is not. In fact, T(u) is maximal inextendible because the only equivalent copy of T(u) in M that is not T(u) is T(v) and neither T(u) nor T(v) is a subMUL-tree of an inextendible subMUL-tree in M. To construct F(M), we first construct a sequence of subMUL-trees of M which we call a guide sequence for F(M) and which we initialize with the empty sequence. Let T denote a maximal inextendible subMUL-tree of M. Let u denote the root of T, and let U denote the set of vertices v of M such that the subMUL-tree rooted at v is equivalent with T(u). Then, for all $v \in U$, we first subdivide the incoming arc of v by a new vertex (cf. Fig. 2ii) and then identify all these subdivision vertices with the one that subdivides the incoming arc of u (cf. Fig. 2iii). By construction, the resulting vertex clearly has |U| incoming arcs and also |U| outgoing arcs. From these |U| outgoing arcs, we delete all but one arc and, for each deleted arc a, we remove the subMUL-tree T(v) rooted at the head v of a (cf. Fig. 2iv). We then grow the guide sequence by adding an equivalent copy of T(u) at its end in case it is not the empty sequence. Otherwise we add T(u) as its first element.
Replacing M with the resulting graph, we then find a new maximal inextendible subMUL-tree in that graph and proceed as before (where we canonically extend the notions of a maximal inextendible subMUL-tree and of a subMUL-tree rooted at a vertex to that graph). In the case of the example in Fig. 3, the next maximal inextendible subMUL-tree in Fig. 3ii is one of the two leaves that carry the same label. By construction, the process of subdividing (cf. Fig. 2ii), identifying (cf. Fig. 2iii), and deleting (cf. Fig. 2iv) terminates in a phylogenetic network on X. That network is F(M). We depict F(M) in Fig. 3iv for the MUL-tree M pictured in Fig. 3i. As was pointed out in Huber and Moulton (2006, Section 6), F(M) is independent of the order in which ties are resolved when processing maximal inextendible subMUL-trees. Also, all tree vertices of F(M) have outdegree two because M is a binary MUL-tree. However, F(M) might contain hybrid vertices whose indegree is two or more since, when processing a maximal inextendible subMUL-tree T, there might be more than two subMUL-trees in the graph generated thus far that are equivalent with T. Finally, F(M) cannot contain arcs whose tail and head are both hybrid vertices because the hybrid vertices of F(M) are in bijective correspondence with the elements in the guide sequence for F(M).

We conclude the outline of both constructions with the following remark. Suppose N is a phylogenetic network on X.
Then we call two distinct tree vertices u and v in V(N) an identifiable pair if the subMUL-tree of U(N) rooted at the vertex that is a directed path in N from the root of N to u is equivalent with the subMUL-tree of U(N) rooted at the vertex that is a directed path in N from the root of N to v. Let C(N) denote the compressed phylogenetic network obtained from N, i.e. the phylogenetic network obtained from N by contracting all arcs (u, v) for which both u and v are hybrid vertices. Bearing in mind that the phylogenetic network F(M) associated to a MUL-tree M was denoted differently in Huber and Moulton (2006), the following holds:

(R1) F(U(N)) does not contain an identifiable pair of vertices (Huber and Moulton 2006, Theorem 3).
(R2) If N and N' are phylogenetic networks such that the MUL-trees U(N) and U(N') are equivalent then $h(F(U(N))) \le h(N')$ (Huber and Moulton 2006, Corollary 2(ii)).
(R3) If N is a phylogenetic network that does not contain an identifiable pair of vertices then the compressed phylogenetic networks C(F(U(N))) and C(N) are equivalent (consequence of (R1) and Huber and Moulton 2006, Theorem 2).
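
The unfold operation described in this section can be sketched in a few lines. This is an illustrative sketch, not the authors' implementation: the function names `unfold` and `leaf_multiset` are hypothetical, a network is assumed to be a list of `(tail, head)` arcs with string vertex labels, and the MUL-tree is returned as nested tuples. Each recursive call corresponds to one directed path from the root, which is exactly how the vertices of U(N) are defined. Suppressing vertices with a single child (the copies of hybrid vertices) is our assumption about how the resulting tree is cleaned.

```python
from collections import Counter


def unfold(arcs, root):
    """Unfold a network into a MUL-tree given as nested tuples.

    A leaf is represented by its label (a string), an inner vertex by the
    tuple of its subtrees. Every call of build(v) stands for one directed
    path from the root to v, i.e. one vertex of the unfolded tree.
    """
    children = {}
    for tail, head in arcs:
        children.setdefault(tail, []).append(head)

    def build(v):
        kids = children.get(v)
        if not kids:                 # v is a leaf of the network
            return v
        if len(kids) == 1:           # assumption: suppress outdegree-one copies
            return build(kids[0])
        return tuple(build(c) for c in kids)

    return build(root)


def leaf_multiset(tree):
    """Return the multiset of leaf labels of a nested-tuple MUL-tree."""
    if not isinstance(tree, tuple):
        return Counter([tree])
    total = Counter()
    for subtree in tree:
        total += leaf_multiset(subtree)
    return total


# Hypothetical one-leaf network with two hybrid vertices h1 and h2.
THREE = [("r", "u"), ("r", "h1"), ("u", "h1"),
         ("u", "h2"), ("h1", "h2"), ("h2", "x")]
mul_tree = unfold(THREE, "r")
print(mul_tree, leaf_multiset(mul_tree))
```

The multiplicity of a label in the unfolded tree equals the number of directed root-to-leaf paths in the network, which is why the ploidy profile can be read off either structure. The fold-up direction is more involved, since it requires detecting equivalent subMUL-trees, and is not attempted here.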

Properties of phylogenetic networks that attain the hybrid number of a ploidy profile

In this section, we collect structural properties of phylogenetic networks that attain the hybrid number of a ploidy profile. For ease of readability, we will assume from now on that for a ploidy profile $\vec{m} = (m_1, \ldots, m_n)$ on X the elements in X are always ordered in such a way that $m_i$ is the component of $\vec{m}$ indexed by $x_i$ for all $1 \le i \le n$ and that $\vec{m}$ is in descending order, that is, $m_i \ge m_{i+1}$ holds for all $1 \le i \le n-1$. We start with some notations and definitions. Suppose that N is a phylogenetic network on X and that $\vec{m}$ is a ploidy profile on X. Then we call $\vec{m}$ simple if $m_i = 1$ for all $2 \le i \le n$ (i.e. $m_1$ is the only component of $\vec{m}$ that is at least 2). Moreover, we call $\vec{m}$ strictly simple if $\vec{m}$ is simple and $n = 1$. We say that N realizes a ploidy profile $\vec{m}$ if the elements in X can be ordered in such a way that $m(x_i) = m_i$ holds for all $1 \le i \le n$. In this case, we also call N a realization of $\vec{m}$. Furthermore, we say that N is a binary realization of $\vec{m}$ if N is binary. We say that N attains $\vec{m}$ if N realizes $\vec{m}$ and h(N) equals the hybrid number $h(\vec{m})$ of $\vec{m}$. In this case, we refer to N as an attainment of $\vec{m}$. If N is an attainment and also binary then we call N a binary attainment of $\vec{m}$. As is straightforward to verify using the definition of m(x), the construction of the phylogenetic network indicated in Fig. 4 yields, for every ploidy profile on X, a realization together with an upper bound on its number of hybrid vertices. Thus, the hybrid number of a ploidy profile always exists. As we shall see in Proposition 2, this bound can be improved for many ploidy profiles.
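
The definitions of a simple ploidy profile and of a realization translate directly into small checks. The sketch below is illustrative only: the function names are ours, a profile is assumed to be a tuple of positive integers, and the path counts of a network are assumed to be supplied as a dictionary from leaf labels to numbers of root-to-leaf paths (computed elsewhere).

```python
def is_simple(profile):
    """True if every component except possibly the largest equals 1."""
    ordered = sorted(profile, reverse=True)   # descending order, as in the text
    return all(m == 1 for m in ordered[1:])


def realizes(leaf_path_counts, profile):
    """True if the leaves can be ordered so that path counts match the profile.

    leaf_path_counts: dict mapping each leaf to its number of root-to-leaf
    paths. Matching 'up to ordering' is a comparison of sorted multisets.
    """
    return sorted(leaf_path_counts.values(), reverse=True) == \
        sorted(profile, reverse=True)


print(is_simple((5, 1, 1)))          # only the first component exceeds 1
print(is_simple((12, 6, 6, 5)))      # several components exceed 1
print(realizes({"a": 6, "b": 12, "c": 5, "d": 6}, (12, 6, 6, 5)))
```

Checking attainment is harder than checking realization, because it additionally requires knowing the hybrid number of the profile, which is the quantity the paper sets out to compute.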

Fig. 4

A phylogenetic network on that realizes the ploidy profile on X. For all , the number of curved lines is

To be able to collect some simple properties of attainments, which we will do next, we require further terminology and notation. Suppose N is a binary phylogenetic network on X. Then we say that N is semi-stable if N is equivalent to a resolution of F(U(N)). Motivated by the fact that a beadless phylogenetic network N that is equivalent to F(U(N)) was called stable in Huber et al. (2016), we canonically extend this concept to our types of phylogenetic networks by saying that a phylogenetic network N is stable if N is equivalent with F(U(N)). For example, the binary phylogenetic network N depicted in Fig. 5i is semi-stable but not stable since U(N) is the MUL-tree depicted in Fig. 5ii and F(U(N)) is the phylogenetic network depicted in Fig. 5iii. The phylogenetic network pictured in Fig. 5iv is not semi-stable. In fact, for a binary phylogenetic network N to be stable it cannot contain the phylogenetic network pictured in Fig. 5iv as an induced subgraph (where its two labelled vertices need not be leaves in N) since the fold-up of the unfold of that network is the phylogenetic network depicted in Fig. 5v. As we shall see below, certain types of binary phylogenetic networks called beaded trees are examples of stable phylogenetic networks. Although introduced in Van Iersel et al. (2020) in the context of a study of binary phylogenetic networks whose root has indegree one and not zero as in our case, the main feature of beaded trees is that a hybrid vertex must be contained in a bead. In view of this, we call a binary phylogenetic network N on X a beaded tree if N is either a phylogenetic tree on X or every hybrid vertex is contained in a bead (see e.g. Huber et al. 2021 for more on such graphs). Then since a beaded tree N cannot contain an identifiable pair of vertices, it follows by (R3) that the compressed phylogenetic networks C(N) and F(U(N)) are equivalent.
Since N is a beaded tree and so does not contain arcs whose tail and head are hybrid vertices, it follows that C(N) is in fact N. Thus, N must be stable.

Fig. 5

The phylogenetic network N depicted in i is semi-stable but not stable since it is not equivalent with F(U(N)), i.e. the phylogenetic network depicted in iii. The MUL-tree U(N) is pictured in ii. The phylogenetic network pictured in iv is not semi-stable. For a phylogenetic network to be stable it cannot contain the phylogenetic network pictured in iv as an induced subgraph since the fold-up of the unfold of that network is the phylogenetic network depicted in v

Suppose N is an attainment of a ploidy profile on X that contains a cut-arc a. Then deleting a results in two connected components and , one of which contains the root of N, say , and the other is a phylogenetic network on . For we let denote the phylogenetic network on obtained from by adding a pendant arc to tail(a) and labelling the head of by x. For any phylogenetic network N on X, we denote by the ploidy profile on X realized by N.

Lemma 1

Suppose that N is an attainment of a ploidy profile $\vec{m}$ on X. Then the following holds.

(i) F(U(N)) and any resolution of F(U(N)) is an attainment of $\vec{m}$.
(ii) N is semi-stable.
(iii) Suppose N contains a cut-arc a and and are the connected components of N obtained by deleting a. If and then is an attainment of and is an attainment of .

Proof

(i): Clearly, U(N) is the unfold of N and also of F(U(N)). In view of (R2), we obtain $h(F(U(N))) \le h(N)$. Since N is an attainment of $\vec{m}$ and F(U(N)) realizes $\vec{m}$ it follows that $h(N) \le h(F(U(N)))$ must hold too. Thus, $h(N) = h(F(U(N)))$. Consequently, F(U(N)) is an attainment of $\vec{m}$. To see the remainder, suppose for contradiction that F(U(N)) has a resolution D that is not an attainment of $\vec{m}$. Then $h(D) > h(\vec{m}) = h(F(U(N)))$; a contradiction because a resolution has the same hybrid number as the network it resolves. (ii): Since N is an attainment of $\vec{m}$ it cannot contain a pair of identifiable vertices as otherwise $h(F(U(N))) < h(N)$ would hold which is impossible in view of Assertion (i). By (R3) it follows that the compressed networks C(N) and C(F(U(N))) are equivalent. Hence N must be a resolution of F(U(N)). (iii): Since a is a cut-arc of N and therefore cannot have a head that is a hybrid vertex, we have . Since every directed path from the root of N to a leaf of must cross a because a is a cut-arc of N it follows that holds for all . This implies the statement.

The unfold and fold-up operations described in Sect. 2.2 lie at the heart of the proof of Proposition 1.

Proposition 1

Suppose is a ploidy profile on and that N is an attainment of . Then there must exist a directed path P from the root of F(U(N)) to in F(U(N)) such that every hybrid vertex in F(U(N)) lies on P. If, in addition, N is stable then P must be a directed path in N. Put . Suppose for contradiction that there exists no directed path from the root of F(U(N)) to in F(U(N)) that contains all hybrid vertices of F(U(N)). Then since N is an attainment of , Lemma 1 implies that F(U(N)) is also an attainment of . Consequently, . Let , some , denote a guide sequence for F(U(N)). Without loss of generality we may assume that since otherwise F(U(N)) only contains one hybrid vertex and, so, the proposition holds. Then there must exist some such that is not a subMUL-tree of as otherwise all hybrid vertices of F(U(N)) would lie on a directed path from to . Without loss of generality, we may assume that i is as small as possible with this property, i. e. is a subMUL-tree of , for all . Let M denote the MUL-tree obtained from U(N) as follows. For let denote the number of equivalent copies of in U(N). Let . Then . Choose t equivalent copies of in U(N). For all , delete the incoming arc of the root of . Next choose t equivalent copies of in U(N) and, for all , subdivide the incoming arc of the root of by a vertex . Note that this is possible since is the first element in and so cannot be U(N). Last-but-not-least, add the arcs , for all . Since this might have resulted in arcs whose head is not contained in X and also vertices that have indegree one and outdegree one, we clean the resulting MUL-tree by removing the former and repeatedly suppressing the latter. Also we repeatedly identify the root with its unique child if this has rendered it a vertex with outdegree one. By construction, F(M) is a phylogenetic network that realizes . Furthermore, must hold since ; a contradiction as N is an attainment of . 
The remainder of the proposition is an immediate consequence because N and F(U(N)) are equivalent in this case. Since, as mentioned above, beaded trees are stable phylogenetic networks the corresponding result for beaded trees in (Van Iersel et al. 2020, Lemma 13) is a consequence of Proposition 1 (once an incoming arc has been added to the root).

Lemma 2

Suppose is a simple ploidy profile on X such that is a prime number. Then any cut-arc in an attainment of must be trivial. Suppose N is an attainment of . Then the phylogenetic network obtained from N by removing, for all , the cut arcs ending in a leaf of N as well as the leaves (suppressing the resulting vertices of indegree one and outdegree one and also the root in case this has rendered it an outdegree one vertex) is a phylogenetic network on . Note that since none of the elements indexing , , contributes to h(N), we have . Thus, is an attainment of the ploidy profile . Put and . If then the lemma clearly holds since the only cut arc of is the incoming arc of and therefore is trivial. So assume that . Assume for contradiction that has a non-trivial cut-arc a. Let and denote the connected components of obtained by deleting a. Assume without loss of generality that the root of is contained in . Let . Then since for all leaves z in a phylogenetic network M the number of directed paths from the root of M to z is it follows that . Since and m is prime this is impossible.

Realizing simple ploidy profiles

We start this section with associating to a simple ploidy profile a binary phylogenetic network that is based on the prime factor decomposition of and also a binary phylogenetic network that is based on the unique bitwise representation of . As we shall see, other ways to define binary realizations of that are based on the prime factor decomposition of or on the bitwise representation of and that are similar in spirit to the definitions of and are conceivable. Furthermore, the ploidy profiles considered in Fig. 6 suggest that the relationship between the number of hybrid vertices in and in is not straightforward.

Fig. 6

For a strictly simple ploidy profile we depict in i, iii, v and viii the phylogenetic network and in ii, iv, and vi the phylogenetic network . i and ii: and ; iii and iv: and ; v and vi: and . vii A realization of the ploidy profile that uses eight hybrid vertices. viii The realization of the ploidy profile in vii in terms of

Suppose that , , is a ploidy profile on .

The phylogenetic network

We begin with introducing further terminology. Suppose that m is a positive integer and that, for all , is a prime and is an integer such that is a prime factor decomposition of m. Without loss of generality, we may assume throughout the remainder of the paper that the primes are indexed in such a way that holds for all . For all , let denote the strictly simple ploidy profile on . Also let denote a binary phylogenetic network on Y that attains . Note that need not be unique. For all , we then define a binary phylogenetic network on Y as follows:

The phylogenetic network

We take the root of to be the root of . If then we take to be . If then we make equivalent copies of and order them in some way. Next, we identify the unique leaf of the first of the copies of under that ordering with the root of the second copy of and so on until we have processed all copies of this way. The resulting directed acyclic graph is in this case. To illustrate this construction, assume that . Then , , and . Furthermore, the phylogenetic network depicted in Fig. 3iv with the leaf and its incoming arc removed, and the resulting vertex of indegree and outdegree one suppressed, is .

From to in case is strictly simple

Suppose is strictly simple. Then we obtain by ‘stacking’ the networks obtained as described above for a prime factor decomposition of and a choice of attainment of , for all . If then is . So assume . Then we define to be the phylogenetic network on obtained by identifying, for all , the unique leaf of with the root of . For the convenience of the reader, we depict for the strictly simple ploidy profile on in Fig. 6iv.
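The stacking step can be illustrated with a small sketch. Identifying the unique leaf of one single-leaf network with the root of the next multiplies the numbers of root-to-leaf paths and adds the numbers of hybrid vertices, so stacking along a prime factor decomposition of m gives a network with m root-to-leaf paths. The function names, the increasing ordering of the primes, and the dictionary of per-prime hybrid counts are assumptions made for this illustration and are not taken from the paper.

```python
def prime_factorization(m):
    """Return [(p, alpha), ...] with primes in increasing order whose product of p**alpha is m."""
    if m < 1:
        raise ValueError("m must be a positive integer")
    factors, p = [], 2
    while p * p <= m:
        alpha = 0
        while m % p == 0:      # strip every copy of the prime p
            m //= p
            alpha += 1
        if alpha:
            factors.append((p, alpha))
        p += 1
    if m > 1:                  # whatever is left is itself a prime
        factors.append((m, 1))
    return factors


def stacked_hybrid_count(m, hybrids_for_prime):
    """Hybrid vertices of the stacked network.

    hybrids_for_prime is a hypothetical, user-supplied dict giving the number
    of hybrid vertices of the single-leaf network chosen for each prime factor.
    """
    return sum(alpha * hybrids_for_prime[p] for p, alpha in prime_factorization(m))
```

For instance, 12 = 2 · 2 · 3, so with one hybrid vertex for the prime 2 (a single bead) and two for the prime 3, `stacked_hybrid_count(12, {2: 1, 3: 2})` evaluates to 2 · 1 + 1 · 2 = 4. The count is only as good as the per-prime networks supplied.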

From to in case is not strictly simple

For all primes p in the prime factor decomposition of , choose a binary attainment of the strictly simple ploidy profile and construct the network for the strictly simple ploidy profile as described above. That network we then process further as follows. First, we choose an outgoing arc a of the root of and subdivide it with subdivision vertices where, starting at the tail of a, the first subdivision vertex is , the next is , and so on. To the vertices , we then add the arcs to obtain . As an immediate consequence of the construction of , we have that does not contain an identifiable pair of vertices. In view of (R1) it follows that is semi-stable. In summary, we therefore have the following result.

Lemma 3

Suppose is a simple ploidy profile on X. Then is a binary, semi-stable phylogenetic network on X that realizes .

Note that, as the strictly simple ploidy profile with shows, the phylogenetic network depicted in Fig. 6v uses fewer hybrid vertices to attain than the phylogenetic network depicted in Fig. 6vi. Thus, an attainment of a simple ploidy profile need not be obtained from a prime factor decomposition of the first component of .

For the remainder of this section, assume again that , is a simple ploidy profile on . We start with associating two vectors to a positive integer m which we call the bitwise representation (of m) and the binary representation (of m), respectively. For m a positive integer, the first is the 0-1 vector such that . For ease of presentation, and unless stated otherwise, we denote by the most significant bit that is one. The second is the vector , and , for all , such that holds. Informally speaking, the j-th entry of that vector is the exponent of the term in the bitwise representation of m. Note that indexes the component of . For example, the bitwise representation of m = 11 is (1, 0, 1, 1) and the binary representation of m is (3, 1, 0).
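The two vectors can be computed directly from the integer. The following is a minimal sketch, assuming the bitwise representation is written with the most significant bit first and the binary representation lists the exponents of the non-zero bits in decreasing order, as in the example with (1, 0, 1, 1) and (3, 1, 0). The function names are ours.

```python
def bitwise_representation(m):
    """0-1 vector of m, most significant bit first."""
    if m < 1:
        raise ValueError("m must be a positive integer")
    return tuple(int(bit) for bit in bin(m)[2:])


def binary_representation(m):
    """Exponents i with 2**i appearing in m, in decreasing order."""
    if m < 1:
        raise ValueError("m must be a positive integer")
    return tuple(i for i in range(m.bit_length() - 1, -1, -1) if (m >> i) & 1)
```

With these definitions, summing 2**i over the entries of `binary_representation(m)` recovers m, and the first entry is the position of the most significant bit.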

The phylogenetic network in case is strictly simple

Then and . Let B(q) denote the beaded tree with unique leaf and hybrid vertices. Let denote the binary representation of . Then is obtained from the beaded tree as follows. Choose one of the two outgoing arcs of the root of and subdivide it with vertices not contained in so that is the child of the root of , is the child of , and so on. For all , we then add an arc to whose head is a subdivision vertex of the outgoing arc of the hybrid vertex of that has precisely hybrid vertices of strictly below it. We refer the interested reader to Fig. 6iii for an illustration of for the strictly simple ploidy profile .
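The construction can be checked computationally by building the arc list and counting directed paths. The sketch below follows our reading of the description above: a beaded tree whose number of beads is the largest exponent in the binary representation of m, one subdivided root arc, and one extra arc per remaining exponent. Vertex names such as `rho`, `u`, `h`, `t` and `w` are hypothetical labels introduced here, and the code is an illustration rather than the authors' implementation.

```python
from collections import Counter, defaultdict
from functools import lru_cache


def binary_representation(m):
    """Exponents of the powers of two summing to m, largest first."""
    return [i for i in range(m.bit_length() - 1, -1, -1) if (m >> i) & 1]


def build_network(m):
    """Arc list (parallel arcs allowed) of a network with root 'rho' and single leaf 'x'."""
    if m < 1:
        raise ValueError("m must be a positive integer")
    exps = binary_representation(m)
    q, extra = exps[0], exps[1:]
    if q == 0:                       # m == 1: a single arc, no hybrid vertex
        return [("rho", "x")]
    arcs, prev = [], "rho"
    for k in extra:                  # subdivide one outgoing arc of the root
        arcs.append((prev, f"u{k}"))
        prev = f"u{k}"
    arcs += [(prev, "h1"), ("rho", "h1")]
    for b in range(1, q + 1):        # bead b ends in the hybrid vertex h_b
        below = q - b                # beaded-tree hybrid vertices strictly below h_b
        src, nxt = f"h{b}", ("x" if b == q else f"t{b}")
        if below in extra:           # the extra arc adds 2**below further paths
            arcs += [(src, f"w{below}"), (f"u{below}", f"w{below}")]
            src = f"w{below}"
        arcs.append((src, nxt))
        if b < q:
            arcs += [(nxt, f"h{b + 1}")] * 2   # two parallel arcs of the next bead
    return arcs


def count_paths(arcs, source="rho", target="x"):
    """Number of directed paths from source to target in the multigraph."""
    children = defaultdict(list)
    for tail, head in arcs:
        children[tail].append(head)

    @lru_cache(maxsize=None)
    def paths(v):
        return 1 if v == target else sum(paths(c) for c in children[v])

    return paths(source)


def hybrid_vertices(arcs):
    """Number of vertices with indegree at least two."""
    indegree = Counter(head for _, head in arcs)
    return sum(1 for d in indegree.values() if d >= 2)
```

For m = 11 the sketch yields a network with eleven root-to-leaf paths and five vertices of indegree two: three from the beads and two heads of extra arcs. This is a count for the sketch only and gives an upper bound on the hybrid number of the profile, not necessarily its value.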

The phylogenetic network in case is not strictly simple

We first construct the phylogenetic network for the strictly simple ploidy profile on . Next, we choose one of the two outgoing arcs of the root of and subdivide that arc with subdivision vertices such that is the child of the root of , is the child of and so on. Finally, we attach to each the arc , . To illustrate this construction, consider the simple ploidy profile on . Then and the phylogenetic network D depicted in Fig. 8 is . In fact, is a binary attainment of .

Fig. 8

The construction of for the ploidy profile on where we have combined the steps and into the step . The leftmost network D on is an attainment of in the form of and initializes the construction of . The network on realizes the ploidy profile and the network on X realizes the ploidy profile . The rightmost network is . The arrow labels indicate how a ploidy profile in was obtained

As indicated in Fig. 6, the relationship between , , and a binary attainment of a simple ploidy profile is far from clear in general. This holds even if is strictly simple and m is a prime. Indeed for the hybrid number of is at most eight since the phylogenetic network depicted in Fig. 6vi realizes . However . This implies that, in general, with and p a prime cannot be used as an attainment with which to initialize the construction of . As an immediate consequence of the construction of , we have the following companion result of Lemma 3 since similar arguments as in the case of imply that is semi-stable.

Lemma 4

Suppose is a simple ploidy profile on X. Then is a binary, semi-stable phylogenetic network on X that realizes . To gain insight into the structure of , we next present formulae for counting, for a simple ploidy profile , the number of vertices in and also the number of hybrid vertices of . Note that such formulae are known for certain types of phylogenetic networks without beads (see e.g. McDiarmid et al. 2015; van Iersel and Kelk 2011 and Steel 2016 for more). To state them, we require further terminology. Suppose is an integer and is the bitwise representation of m. Then we denote by p(m) the number of non-zero bits in bar the first one. For example, if then . Furthermore, we denote the dimension of a vector by . Armed with this, the construction of from a simple ploidy profile implies our first main result.

Theorem 1

Suppose that , , is a simple ploidy profile. Let , some , denote the binary representation of . Then . Furthermore, has hybrid vertices. We remark in passing that in case is strictly simple then any binary phylogenetic network N that realizes has vertices since N has only one leaf and, so, the number of tree vertices of N plus the root must equal its number of hybrid vertices. Note that in case N is then this also follows from Theorem 1 since and is the number of hybrid vertices of N and therefore also the number of tree vertices of N plus the root.

Realizing general ploidy profiles

To help establish a formula for computing the hybrid number of a ploidy profile, we start by associating a binary phylogenetic network on X to a ploidy profile on X that realizes . This network is recursively obtained via a two-phase process which we present in the form of pseudo-code in Algorithms 1 (Phase I) and 2 (Phase II). We next outline both phases and refer the reader to Fig. 7 for an illustration of the three cases considered in Algorithm 2 and to Fig. 8 for an illustration of the construction of from the ploidy profile . The phylogenetic network D in that figure is the phylogenetic network with which the construction of is initialized.

Fig. 7

The three cases in the construction of the network from a ploidy profile considered in Algorithm 2. For , the case is depicted in i, the case in ii, and the case in iii. In iii, the dashed arc and the vertex are deleted and the vertex v is suppressed. In each case, the grey disk indicates the part of the phylogenetic network of no relevance to the discussion

Suppose is a ploidy profile on X. Then, in Phase I, we iteratively generate a simple ploidy profile from . This process is captured via a sequence of ploidy profiles which we call the simplification sequence for and formally define as the output of Algorithm 1 when given as input. The first element of is and the last element is a simple ploidy profile which we call the terminal element of and denote by . We denote the number of elements of other than by . Note that if is a simple ploidy profile then as holds in this case. Informally speaking, the purpose of is to allow us to construct, for all , the network from by reusing (or parts of it) as much as possible (see Huber and Maher 2022 for more on such sequences). To formally state Algorithm 1, we require further notations. Suppose is a ploidy profile on X. Then we denote for all the element of X that indexes by . Furthermore, for any non-empty sequence and any z, we denote by the sequence obtained by adding z to the end of . Phase II is concerned with generating the phylogenetic network from the simplification sequence of and the set (for both see Phase I), and an attainment of . Note that in case an attainment for is not known, we can always initialize the construction of with or . The number of hybrid vertices of the generated network in this case is an upper bound on and therefore also on the hybrid number of . To obtain , we use a trace-back through starting with . More precisely, assume that , some and are two ploidy profiles in , some . Then to obtain from we distinguish again between the cases that , and , see Fig. 7. Note that there might be non-equivalent attainments of with which to initialize the construction of .
To illustrate the construction of , consider the ploidy profile on . Then , (6, 6, 6, 5), (6, 6, 5), (6, 5), (5, 1) is the simplification sequence associated to because, by definition, the first element of is always . The ploidy profile (5, 1) is . The phylogenetic network D on on the left of Fig. 8 is an attainment of in the form of . Initializing Algorithm 2 with yields the phylogenetic network at the right of that figure. Apart from the second arrow which is labelled as it combines the steps and , each arrow is labelled with the corresponding traceback step in . For any attainment of the terminal element of the simplification sequence of a ploidy profile on X, the graph is a phylogenetic network on X that realizes . Also, at each step in the traceback through the number of vertices is increased by exactly two. Denoting the number of vertices of by and the number of vertices in a binary attainment of by , we obtain our next result.
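Phase I can be sketched in a few lines. Algorithm 1 is not reproduced in this section, so the step rule used below is an assumption: with the components in non-increasing order, the first component is dropped when it equals the second, and is otherwise replaced by the difference of the two, after which the profile is re-sorted. A profile is treated as simple when every component after the first equals one. The starting profile (12, 6, 6, 5) is likewise an assumed input, chosen because under this rule it produces the remaining terms of the sequence quoted in the example above.

```python
def is_simple(profile):
    """True if every component after the first equals one."""
    return all(component == 1 for component in profile[1:])


def simplification_sequence(profile):
    """List of profiles from the input down to a simple terminal element.

    Assumed step rule: drop the first component if it equals the second,
    otherwise replace it by their difference and re-sort.
    """
    current = tuple(sorted(profile, reverse=True))
    sequence = [current]
    while not is_simple(current):
        alpha = current[0] - current[1]
        rest = current[1:]
        if alpha == 0:
            current = rest
        else:
            current = tuple(sorted(rest + (alpha,), reverse=True))
        sequence.append(current)
    return sequence
```

Under these assumptions, `simplification_sequence((12, 6, 6, 5))` returns (12, 6, 6, 5), (6, 6, 6, 5), (6, 6, 5), (6, 5), (5, 1), whose last four terms agree with the sequence in the text. The loop terminates because each step strictly decreases the sum of the components.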

Lemma 5

Suppose is a ploidy profile on X. Then for any binary attainment of used in the initialization of the construction of , we have that is a binary phylogenetic network on X that realizes . Furthermore, . In combination with Theorem 1, it follows that has at most vertices and also at most hybrid vertices where , some , and is the first component in the binary representation of . Furthermore, we have

Proposition 2

Suppose is a ploidy profile on X such that is a binary attainment of . For all , let denote the binary representation of , some . Then the following holds. . In case is simple, which is sharp. If holds for all then .

(i) To see the stated inequality, we construct a binary phylogenetic network B on from as follows. For all , we first construct where is the strictly simple ploidy profile . Next, we add a new vertex and, for all , an arc from to the root of . If the resulting phylogenetic network on X is binary then that network is B. Otherwise, B is a phylogenetic network obtained by resolving so that has outdegree two. By construction, B realizes because realizes , for all . By Theorem 1, it follows that . Thus, , as required. If is simple then and so . (ii) This is a straightforward consequence of (i) and the fact that in this case is the beaded tree .

Note that as the example of the ploidy profile for some shows, there exists an infinite family of ploidy profiles for which the length of the simplification sequence for is at least and therefore grows exponentially in l. As a consequence of this, we also have, for any attainment of , that the number of hybrid vertices in can grow exponentially in l. In view of this, we next study simplification sequences for special types of ploidy profiles. To this end we call an element maximum if is the last component of a ploidy profile , , that is not one.

Proposition 3

Suppose is a ploidy profile on X. Let q denote the maximum index of . Then the following holds. (i) If is an integer such that holds for all then . (ii) If and are integers such that holds for all then .

Note first that for both statements, we may assume without loss of generality that since elements in X with ploidy number one do not contribute to . (i): Since holds for all , the difference in dimension between any two consecutive ploidy profiles in is one. Hence, operations are needed to transform into . Consequently, . (ii): Since holds for all , it follows that operations are needed to transform into a ploidy profile of the form where the components after the last k may or may not exist. To transform into a ploidy profile of the form a further operations are needed. By Assertion (i), a further operations are needed to transform into a simple ploidy profile. Since is the concatenation of the underlying simplification sequences it follows that .

Together with Lemma 5, the next result may be viewed as the companion result of Lemmas 3 and 4 for general ploidy profiles.

Proposition 4

For any ploidy profile on X and any binary attainment of the terminal element in , the graph is a binary, semi-stable phylogenetic network on X that realizes . In view of Lemma 5, it suffices to show that is semi-stable. Assume for contradiction that there exists a ploidy profile on X such that is not semi-stable. Since the construction of is initialized with an attainment of the terminal element of , some and, by Lemma 1(ii), an attainment is semi-stable there must exist some such that the network is not semi-stable but all networks , are semi-stable. Without loss of generality, we may assume that . Put . We claim first that . Indeed, if then . Hence, Line 6 in Algorithm 2 is executed to obtain from . Since, by assumption, is not semi-stable it follows that is not semi-stable; a contradiction. Thus, , as claimed. We next claim that cannot hold either. Assume for contradiction that . Put . Assume first that . Then Line 8 in Algorithm 2 is executed to obtain from . Since is semi-stable, and this does not introduce an identifiable pair of vertices in , it follows that is also semi-stable which is impossible. So assume that . Then Line 10 in Algorithm 2 is executed to obtain from . Similar arguments as in the previous two cases imply again a contradiction. This completes the proof of the claim. Thus, must hold. Consequently, is not a ploidy profile; a contradiction. Thus, must be semi-stable.

The hybrid number of a ploidy profile

In this section, we prove Theorem 2 which implies a closed formula for the hybrid number of a ploidy profile (Corollary 1). To help illustrate our theorem, we remark that for Line 8 in Algorithm 2 not to be executed we must have for every element , some , in the simplification sequence of that does not hold.

Theorem 2

Suppose is a ploidy profile on X such that, for every ploidy profile in , Line 8 in Algorithm 2 is not executed. If is an attainment for with which the construction of is initialized then is an attainment for . Put and assume that is such that is an attainment of . Suppose , . Note that we may assume that as otherwise is simple. Hence, and, so, the theorem follows by assumption on . Similar arguments as before imply that we may also assume that is not simple. Assume for contradiction that is not an attainment of . Let Q denote an attainment of . Then . In view of Proposition 1, there must exist a directed path R in F(U(Q)) from the root of F(U(Q)) to that contains all hybrid vertices of F(U(Q)). Since as C(Q) and F(U(Q)) are equivalent by (R3), it follows that we may also assume that Q is binary and that R gives rise to a path P from to that contains all hybrid vertices of Q. Since the construction of is initialized with an attainment of , there must exist a ploidy profile in such that there exists a binary phylogenetic network that realizes and for which holds. Without loss of generality, we may assume that is such that for all ploidy profiles succeeding in we have for all binary phylogenetic networks that realize . For ease of presentation we may assume that . Put , some . Also, put , , and . Since Line 8 in Algorithm 2 is not executed for any element in , it follows that either or that since either Line 6 or Line 10 of that algorithm must be executed in a pass through the algorithm’s while loop. Case (a): Assume that . Let and as in Line 7 in Algorithm 2. Let such that holds. By the minimality of h(Q) it follows that the induced subgraph T of Q connecting the elements in must be a phylogenetic tree on where, for all , we put . Subject to potentially having to relabel the leaves of T, we may assume that is a cherry in T. 
Since the directed acyclic graph obtained from Q by deleting and its incoming arc (suppressing resulting vertices of indegree and outdegree one) and renaming by , for all , is a phylogenetic network on . Clearly, realizes since Q realizes . By assumption on it follows that is an attainment of . Hence, . Since N is obtained from by executing Line 6 in Algorithm 2 it follows that because T is a tree; a contradiction. Consequently, N must attain in this case. Case (b): Assume that . Let j, , and be as in Line 11 in Algorithm 2. We start with analyzing the structure of Q with regards to and . To this end, note first that must hold since otherwise is simple and the theorem follows in view of our observation at the beginning of the proof. By assumption on Q, there must exist a hybrid vertex h on P such that there is a directed path from h to because . Without loss of generality, we may assume that h is such that every vertex on other than h is either a tree vertex or a leaf of Q. Let t be the last vertex on P that is also contained in . We next transform Q into a new phylogenetic network that is an attainment of (see Fig. 9 for an illustration). To do this, note first that since there must exist a hybrid vertex on P below t. We modify Q as follows to obtain a further attainment of . If t is the parent of then is Q. So assume that t is not the parent of . Then we delete the subtree T of Q that is rooted at the child of t not contained in P. Note that T must have at least two leaves. Next, we subdivide the incoming arc of t by subdivision vertices. To each created subdivision vertex we add an arc and bijectively label the heads of these arcs by the elements in . Next, we add an arc to t and label its head by so that t is now the parent of . By construction, is a phylogenetic network on X that attains because .

Fig. 9

The transformation of Q i into the phylogenetic networks ii and iii as described in Case (b) of Theorem 2 for . In each case, the dashed lines indicate paths. Note that in iii the dashed line could also start at

Let be a hybrid vertex on the subpath of P from t to so that no vertex strictly below is a hybrid vertex of . Let denote the incoming arc of that lies on . Furthermore, let denote the incoming arc of that does not lie on . For , let denote the tail of . Note that might hold. Also note that the assumptions on Q imply that must be below t. Finally, note that must be a hybrid vertex unless . We claim that if then any vertex v on other than t and must be a hybrid vertex. Assume for contradiction that there exists a vertex on that is a tree vertex. We show first that must also be below t. Since all hybrid vertices of Q lie on P, it follows that v contributes at least to the number of directed paths from to as is the number of directed paths from to and therefore, also from to t. Since contributes at least one further directed path from to in case is not below t, it follows that for some . Hence, because . Thus, ; a contradiction as . Hence, must also be below t, as required. We next show that must be a vertex on . Indeed, if were not a vertex of then it cannot be a hybrid vertex in view of our assumptions on Q. Thus, must be a tree vertex in this case. Since we obtain a contradiction as the choice of implies that is the parent of . Thus, must be a vertex of , as required. Since is a tree vertex it contributes at least directed paths from to . Since contributes at least a further directed paths from to , we obtain a contradiction using similar arguments as before. Thus any vertex on other than t and must be a hybrid vertex in case , as claimed. We claim that if then has precisely four vertices and there exist two arcs from to . To see this claim, note that contributes at least directed paths from to because it is a tree vertex. 
If there existed a vertex v on distinct from , , , t then v would contribute at least further directed paths from to . Thus, we have again at least directed paths from to . Similar arguments as in the previous claim yield again a contradiction. By the choice of it follows that t, , and are the only vertices on . Since and are the parents of and , it follows that there are two parallel arcs from to . This concludes the proof of our second claim. Bearing in mind the previous two claims, we next transform into a new phylogenetic network on X as follows. If then we first delete from and add an arc from to the child of t on . Next, we remove the arc and suppress and t as they are now vertices with indegree one and outdegree one. The resulting directed acyclic graph is . By construction, is clearly a phylogenetic network on X. Furthermore, the construction combined with our two claims, implies that realizes because the arc contributes directed paths from to in Q and therefore also in . By construction, . Furthermore, by the construction of N from . By the minimality of h(Q) and the choice of , it follows that ; a contradiction. This concludes the proof of the theorem in case . If then we delete one of the two parallel arcs from to and suppress and as this has rendered them vertices of indegree one and outdegree one. The resulting directed acyclic graph is in this case. As before, is a phylogenetic network that, in view of our second claim, realizes . Similar arguments as in the case that yield again a contradiction. This concludes the proof of the theorem in this case, and therefore, the proof of the theorem. To illustrate Theorem 2, note that the ploidy profile in Fig. 1 satisfies the assumptions of Theorem 2. Consequently, the phylogenetic network depicted in that figure is an attainment of . As the example depicted in Fig. 10 indicates, the assumption that Line 8 in Algorithm 2 is not executed is necessary for Theorem 2 to hold. 
In fact, if is a ploidy profile such that contains the subgraph highlighted by the dashed rectangle in the network in Fig. 10, then can in general not be an attainment of .

Fig. 10

i The phylogenetic network for the ploidy profile on obtained via Algorithms 1 and 2. ii A phylogenetic network on X that attains and has fewer hybrid vertices than

Theorem 2 and Case (b) in its proof combined with Theorem 1 and Proposition 2 imply our next result since additional hybrid vertices are inserted into to obtain where is a simple ploidy profile and , , is the binary representation of the first component of . To state it we require a further definition. Let denote the simplification sequence of a ploidy profile . Then we denote by the number of steps in , for which holds where and .

Corollary 1

Suppose is a ploidy profile such that Line 8 in Algorithm 1 is not executed when constructing . Then . If is an attainment of and is the binary representation of the first component of , some , then .

A Viola dataset

In this section, we turn our attention to computing the hybrid number of the ploidy profile of a Viola dataset that appeared in more general form in Marcussen et al. (2012). Denoting that dataset by X, the authors of Marcussen et al. (2012) constructed a MUL-tree M on X and then used the PADRE software Huber et al. (2006) to derive a phylogenetic network N to help them shed light on the evolutionary past of their Viola species (Marcussen et al. 2012, Figure 4). We depict a simplified network representing that past in Fig. 11i, the only difference being that we have removed species that are not below a hybrid vertex of N as they do not contribute to the number of hybrid vertices of N. If more than one species was below a hybrid vertex of N, then we have also randomly removed all but one of them, thereby ensuring that the hybrid vertex is still present in . The resulting simplified dataset comprises the taxa V.langsdorffii, V.tracheliifolia, V.grahamii, V.721palustris, V.blanda, V.933palustris, V.glabella, V.macloskeyi, V.repens, V.verecunda, Viola, and Rubellium (see Huber and Maher 2022 for more details on the simplified dataset). The labels of the internal vertices of represent the ploidy number of the ancestral species represented by that vertex where we canonically extend the concept of a ploidy profile to the interior vertices of a phylogenetic network. By counting directed paths from the root to each leaf, it is easy to check, .

Fig. 11

A phylogenetic network on leaf set V.langsdorffii, V.tracheliifolia, V.grahamii, V.721palustris, V.blanda, V.933palustris, V.glabella, V.macloskeyi, V.repens, V.verecunda, Viola, Rubellium adapted from a more general phylogenetic network that appeared as Figure 4 in Marcussen et al. (2012). Hybrid vertices are indicated with a filled circle and labelled by their corresponding ploidy number i. e. the number of directed paths from the root to the vertex times two because the root is assumed to be diploid. Leaves are labelled by the first two characters of their names (omitting ’V.’, where applicable)

By taking directed paths from the root to the leaves of , we obtain the ploidy profile on X. Note that, since the root is diploid (labelled ), multiplying each component of by two results in the ploidy numbers induced by the hybrid vertices in the network. The simplification sequence for contains twelve elements and . Since an attainment of must have one hybrid vertex and are equal and have one hybrid vertex each, it follows that is an attainment for . The phylogenetic network obtained by initializing Algorithm 2 with is depicted in Fig. 11ii. Since at no stage in the construction of Line 8 of that algorithm is executed, it follows by Theorem 2 that is an attainment of . Counting again directed paths from the root to each leaf, it is easy to check that has five hybrid vertices implying that . To compute the hybrid number of a ploidy profile whose components are not too large, so that an attainment of its terminal element can be found, we refer the interested reader to our R-function ‘ploidy profile hybrid number bound (PPHNB)’ which is obtainable from [1].
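Reading a ploidy profile off a network amounts to counting directed root-to-leaf paths and, for a diploid root, doubling the counts. The following minimal sketch does this for an arbitrary rooted directed acyclic graph given as an arc list. The function names and the toy three-leaf network in the usage note are hypothetical and unrelated to the Viola data or to the R-function mentioned above.

```python
from collections import Counter, defaultdict
from functools import lru_cache


def path_multiplicities(arcs, root, leaves):
    """Map each leaf to the number of directed paths from the root to it."""
    parents = defaultdict(list)
    for tail, head in arcs:
        parents[head].append(tail)

    @lru_cache(maxsize=None)
    def paths_to(v):
        # Every path into v passes through exactly one incoming arc.
        return 1 if v == root else sum(paths_to(p) for p in parents[v])

    return {leaf: paths_to(leaf) for leaf in leaves}


def ploidy_numbers(arcs, root, leaves, root_ploidy=2):
    """Ploidy number of each leaf, assuming the root has ploidy root_ploidy."""
    counts = path_multiplicities(arcs, root, leaves)
    return {leaf: root_ploidy * count for leaf, count in counts.items()}


def hybrid_vertex_count(arcs):
    """Number of vertices with indegree at least two."""
    indegree = Counter(head for _, head in arcs)
    return sum(1 for degree in indegree.values() if degree >= 2)
```

As a usage example, take the toy arc list `[("r", "a"), ("r", "b"), ("a", "h"), ("b", "h"), ("h", "x1"), ("a", "x2"), ("b", "x3")]`. It has one hybrid vertex `h`, path counts 2, 1, 1 for `x1`, `x2`, `x3`, and hence ploidy numbers 4, 2, 2 under a diploid root.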

Discussion

Motivated by the signal left behind by polyploidization, we have introduced and studied the problem of computing the hybrid number of a ploidy profile . Our arguments apply, however, to any type of dataset that induces a multiplicity vector. Although stated within a phylogenetics context, the underlying optimization problem is, at its heart, a natural mathematical problem: “Given a multiplicity vector find a rooted, leaf-labelled, directed acyclic graph G so that is the path-multiplicity vector of G and the cyclomatic number of G is minimum”. Our results might therefore also be of relevance beyond phylogenetics. Using the framework of a phylogenetic network, we provide a construction of a phylogenetic network that is guaranteed to attain a ploidy profile for a large class of ploidy profiles provided the construction of is initialized with an attainment of the terminal element of the simplification sequence associated to . Members of that class include the ploidy profiles described in Proposition 3(ii). As a consequence, we obtain an exact formula for the hybrid number of and also the size of the vertex set of in terms of the length of and the number of vertices of for the members of our class. In case the ploidy numbers that make up are not too large, both and can be computed easily by computing to obtain and using, for example, an exhaustive search for . Having said this, we also present an infinite family of ploidy profiles for which grows exponentially. Motivated by this, we provide a bound for and show that this bound is sharp for certain types of ploidy profiles. To help demonstrate the applicability of our approach, we compute the hybrid number of a simplified version of a Viola dataset that appeared in more general form in Marcussen et al. (2012). Our result suggests that the authors of Marcussen et al. (2012) potentially overestimate the number of polyploidization events that gave rise to their dataset. 
Despite these encouraging results, numerous questions that might merit further research remain. These include “What can be said about if the ploidy profile is not a member of our class?”, and “Can we shed more light on the length of and also into attainments of the terminal element of ?”. Looking a little bit further afield, it might also be of interest to explore the relationship between so called accumulation phylogenies introduced in Baroni and Steel (2006) and ploidy profiles and also the relationship between ploidy profiles and ancestral profiles introduced in Steel et al. (2019).

11 in total

1. Less is better: new approaches for seedless fruit production. (Review)

2. Reconstructing the evolutionary history of polyploids from multilabeled trees.

3. A class of phylogenetic networks reconstructable from ancestral profiles.

4. Statistical inference of allopolyploid species networks in the presence of incomplete lineage sorting.

5. A distance metric for a class of tree-sibling phylogenetic networks.

6. Phylogenetic networks from multi-labelled trees.

7. Inferring species networks from gene trees in high-polyploid North American and Hawaiian violets (Viola, Violaceae).

8. Folding and unfolding phylogenetic trees and networks.

9. The rigid hybrid number for two phylogenetic trees.

10. From gene trees to a dated allopolyploid network: insights from the angiosperm genus Viola (Violaceae).