Katharina T Huber1, Leo Van Iersel1, Vincent Moulton2, Taoyang Wu1. 1. School of Computing Sciences, University of East Anglia, Norwich, UK, and Centrum Wiskunde & Informatica (CWI), Amsterdam, Netherlands. 2. School of Computing Sciences, University of East Anglia, Norwich, UK, and Centrum Wiskunde & Informatica (CWI), Amsterdam, Netherlands vincent.moulton@cmp.uea.ac.uk.
Abstract
Phylogenetic networks are a generalization of evolutionary trees and are an important tool for analyzing reticulate evolutionary histories. Recently, there has been great interest in developing new methods to construct rooted phylogenetic networks, that is, networks whose internal vertices correspond to hypothetical ancestors, whose leaves correspond to sampled taxa, and in which vertices with more than one parent correspond to taxa formed by reticulate evolutionary events such as recombination or hybridization. Several methods for constructing evolutionary trees use the strategy of building up a tree from simpler building blocks (such as triplets or clusters), and so it is natural to look for ways to construct networks from smaller networks. In this article, we shall demonstrate a fundamental issue with this approach. Namely, we show that even if we are given all of the subnetworks induced on all proper subsets of the leaves of some rooted phylogenetic network, we still do not have all of the information required to completely determine that network. This implies that even if all of the building blocks for some reticulate evolutionary history were to be taken as the input for any given network building method, the method might still output an incorrect history. We also discuss some potential consequences of this result for constructing phylogenetic networks.
Modern systematics assumes a tree as an integral component of the evolutionary model (Penny et al. 1992). However, genome science is also delivering a level of complexity previously underappreciated for many biological systems and organisms (e.g., Mallet 2007; Abbott et al. 2013; Liu et al. 2013; Muhlfeld et al. 2014). This growing appreciation has in turn motivated the development of phylogenetic networks (see e.g., Huson et al. 2010; Morrison 2011). These networks are a generalization of evolutionary trees and, in the broadest sense, can be any type of graph-theoretical network that is used to represent potentially complex patterns of evolutionary relationship.

Some networks, also referred to as data-display or split networks (Dress and Huson 2004; Morrison 2010), attempt only to represent bipartitions or splits in data, and the evidence these splits provide for contradictory relationships. In these networks the internal nodes usually have no explicit meaning. Such networks generalize unrooted evolutionary trees and have been used to visualize homoplasy and detect errors in human sequence data (e.g., Bandelt et al. 2000; Bandelt et al. 2001), for visualizing the support for particular bifurcating trees and hypotheses (e.g., Holland et al. 2005) and for exploring the genetic complexity of plant and animal data sets (e.g., Morrison 2005). There are numerous ways of computing these graphs (e.g., Huson and Bryant 2006). Methods essentially differ in the extent to which they visualize incompatibilities either because of the way they compute the splits and/or because of the dimensionality of the displayed network.

Other networks, also referred to as genealogical networks, are constructed to model evolutionary history wherein the evolution is suspected of being reticulate in nature.
These networks, which are the focus of the present study, are typically rooted and contain internal vertices that represent hypothetical ancestors and leaves that represent taxa sampled from the data (extant or extinct). They are directed graphs with a single root vertex and leaves labeled with taxon names (see e.g., Fig. 1 and Mathematical Definitions section). They also contain no directed cycles, thus ensuring that no taxon can be a descendent of itself. In these networks, vertices with more than one parent correspond to taxa that are formed by reticulate evolutionary events such as recombination or hybridization. In particular, a rooted evolutionary tree is a special type of rooted phylogenetic network which does not represent any reticulate evolutionary events. Genealogical networks are reviewed in for example (Huson et al. 2010), and have been used to study the evolution of organisms such as plants (Marcussen et al. 2011), viruses (Visser et al. 2012), and bacteria (Kunin et al. 2005).
Figure 1. i) and ii): Two phylogenetic networks on the set of taxa . iii) and iv): The trinets induced on the leaves in networks in (i) and (ii), respectively. (v): The trinet induced on the leaves by both of the networks in (i) and (ii). Here the network in (i) is the subnetwork of the network in (van Iersel et al. 2009, Fig. 10) computed from a data set of the yeast Cryptococcus gattii, where taxa correspond to taxa 1, 16, 8, 18, 7, and 20, respectively, in the yeast network.

Various methods have been proposed to construct genealogical phylogenetic networks, although it is generally agreed that there is still much more to be done in this direction (see e.g., Nakhleh 2011; Bapteste et al. 2013). Many of these methods follow a strategy that is also commonly used to build evolutionary trees (e.g., to construct supertrees), namely to infer networks from building blocks such as triplets (evolutionary trees with three leaves) (Huber et al. 2011), evolutionary trees (Kelk et al. 2012; Huson and Scornavacca 2012), or clusters/clades (van Iersel et al. 2010). However, a fundamental issue with this strategy is that the commonly used building blocks do not necessarily determine or encode networks, in contrast to evolutionary trees. In other words, there can be pairs of rooted phylogenetic networks that do not represent the same evolutionary histories, but still display exactly the same building blocks (see e.g., Gambette and Huber (2012) for triplets and clusters, and Willson (2011) for evolutionary trees). For example, consider the two networks in Figure 1 (i) and (ii), the first one of which is adapted from the network pictured in van Iersel et al. (2009, Fig. 10), which was constructed from a data set of the yeast Cryptococcus gattii (see Hagen et al. 2013, for a follow-up study).
Both networks display the same collection of evolutionary trees (pictured in the Supplementary Material available on Dryad (http://dx.doi.org/10.5061/dryad.f6n8s)), and therefore the same triplets, but they are not equivalent as networks. This is of importance since it implies that even if all of the building blocks for some reticulate evolutionary history were to be taken as the input for any given network building method, the method might still output an incorrect history.

To address this problem, it was recently proposed that networks be constructed using a network analog of triplets called trinets (Huber and Moulton 2012). Trinets are rooted phylogenetic networks with three leaves (see e.g., Fig. 1 (iii)–(v)); they can be induced on any three leaves of a rooted network by taking the union of all paths from the root to one of the three leaves, and then removing all vertices that lie above the last vertex that is on all such paths, and suppressing parallel edges (see Supplementary Material available on Dryad (http://dx.doi.org/10.5061/dryad.f6n8s) for an illustration). For example, in Figure 1 the trinet pictured in (iii) is induced on the three leaves of the network pictured in (i). Note that in this example, even though the two networks in (i) and (ii) both induce the trinet pictured in (v), the trinets (iii) and (iv) that they induce on are not equivalent. In particular, it follows that the networks in (i) and (ii) are also not equivalent. Thus, considering trinets could hold some promise for distinguishing between networks, especially since some special types of rooted phylogenetic networks (e.g., level-1, level-2, and tree-child) are in fact encoded by their trinets (see e.g., Huber and Moulton 2012; van Iersel and Moulton 2014).

Even so, when trying to extend these results on trinet encodings to more general networks we were somewhat surprised to discover that trinets do not necessarily encode networks.
Indeed, more generally, in this article we shall show that even if we are given the networks induced on all subsets of the leaves of a network except for the leaf-set itself, (which includes all possible trinets), we still do not necessarily have enough information to encode the network. More specifically, for any set of taxa of size at least three, we shall present an example of two nonequivalent rooted, binary phylogenetic networks with leaf set which both induce exactly the same network on any subset of with (see Theorem 2). As an illustration, we present these two networks in the case that has four elements in Figure 2. In addition, in the Supplementary Material available on Dryad (http://dx.doi.org/10.5061/dryad.f6n8s), we show that these networks also induce exactly the same set of evolutionary trees. Hence, even knowing all of the induced networks together with all of the induced trees for each of these two networks is still not enough information to distinguish between them.
Figure 2. Two distinct rooted phylogenetic networks on the set of taxa . The networks induce exactly the same set of trinets (pictured in the Supplementary Material available on Dryad (http://dx.doi.org/10.5061/dryad.f6n8s)) and also the same set of trees.

Our examples were inspired by some results due to Thatte concerning the reconstructability of the so-called pedigree graphs (Thatte 2008), which are used to represent ancestral relationships between individuals in a population. Thatte was able to show that a pedigree cannot in general be reconstructed from the collection of its proper subpedigrees. Although this result is similar in nature to ours, it is not a simple corollary, as pedigree graphs have quite a different structure to phylogenetic networks (e.g., a pedigree graph can have multiple roots or “founders” and all other vertices have two parents). Moreover, Thatte's concept of a subpedigree is different from our concept of a network induced on a subset of a network's leaves. Intriguingly, both Thatte's and our results are somewhat related to the Kelly–Ulam reconstruction conjecture that states that a graph is uniquely determined by all of its subgraphs. This conjecture is still open, although for directed graphs it is known to be false (see e.g., Stockmeyer 1977). Even so there are again important mathematical distinctions between graphs in general and phylogenetic networks and pedigrees (e.g., graphs are not labeled by a set of taxa and the concept of a subgraph is different from an induced network).

The contents of the rest of the article are as follows. First we present some mathematical preliminaries on phylogenetic networks and also some terminology concerning binary sequences which will be key for constructing our examples. Then, given any leaf set of size at least three we present an example of two distinct nonbinary, rooted phylogenetic networks having the same leaf set which both induce exactly the same network on any proper subset of their leaves.
These were the first examples that we discovered, and at the time we were uncertain as to whether or not there could be examples of binary networks with this property, as there are various mathematical results in phylogenetics that hold for binary trees/networks but not for nonbinary ones. However, by adapting our nonbinary networks we are also able to construct two binary networks with the same property. Since the proof of this fact follows the same approach to that for the nonbinary case but is considerably more technical, we shall present this in the Appendix. We conclude with a brief discussion of some ramifications and future directions as well as some potential consequences of our results for constructing reticulate evolutionary histories.
Mathematical Definitions
Digraphs
The basic graph-theoretical structure that underlies the phylogenetic networks in this article is called a digraph. This is a connected, directed graph $D$ consisting of a set of vertices representing taxa (both hypothetical and sampled) and a set of directed edges or arcs that join pairs of them. We denote an arc starting at vertex $u$ and ending at vertex $v$ by $(u,v)$, and call $u$ a parent of $v$ and $v$ a child of $u$. This represents the fact that $u$ is a direct ancestor of $v$. The in- and outdegree of a vertex $v$ in $D$ is the number of arcs ending and starting at $v$, respectively. A vertex of $D$ that has outdegree 0 is called a leaf of $D$ (which corresponds to a sampled taxon, either extinct or extant), and the set of all leaves of $D$ is denoted by $L(D)$. Note that vertices with indegree at most one and outdegree at least two represent speciations, whereas those with indegree at least two represent reticulations (e.g., evolutionary events such as hybridization and recombination). If a digraph $D$ has a unique vertex with indegree zero, corresponding to a common ancestor of all of the taxa in question, then that vertex is called the root of $D$, denoted by $\rho_D$, and we call $D$ a rooted digraph. If $D$ is rooted and $D'$ is a further rooted digraph then we say that $D$ and $D'$ are isomorphic (as digraphs) if they are isomorphic in the usual graph-theoretical sense. If, in addition to being isomorphic, every leaf is mapped to itself by the underlying map, then $D$ and $D'$ are called equivalent.

A digraph with no directed cycles is called a directed acyclic graph (DAG). For a rooted DAG $D$, a vertex in $D$ that is neither a leaf nor the root is called an interior vertex of $D$. In addition, a vertex $u$ in $D$ is called an ancestor of a vertex $v$ in $D$ if $u$ and $v$ are equal or there exists a directed path in $D$ starting at $u$ and ending at $v$. (Although this means that in mathematical terms every vertex is considered to be an ancestor of itself, we adopt this mathematical convention as it simplifies the mathematics and is a common assumption in the theory of directed graphs.) If $u$ is an ancestor of $v$ but $u \neq v$ then we say that $v$ is below $u$. Thus, in a DAG a vertex $v$ can never be below itself, which corresponds to the fact that $v$ cannot be a biological descendant of itself. Furthermore, if $D$ has at least three vertices and $v$ is a vertex with outdegree one then we call $v$ degenerate if the indegree of $v$ is not at least two. Finally, we call $D$ binary if the outdegree of the root of $D$ is two and the sum of the indegree and outdegree of every interior vertex of $D$ is three.
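The definitions in this section translate directly into a few lines of code. The following Python sketch is only an illustration: the class name `Digraph`, its method names, and the small example network are hypothetical and do not come from the article. A digraph is stored as a set of arcs, and the sketch does not check connectivity or acyclicity.

```python
class Digraph:
    """A directed graph stored as a set of arcs (u, v), read as 'u is a parent of v'."""

    def __init__(self, arcs):
        self.arcs = set(arcs)
        self.vertices = {v for arc in self.arcs for v in arc}

    def indegree(self, v):
        return sum(1 for _, head in self.arcs if head == v)

    def outdegree(self, v):
        return sum(1 for tail, _ in self.arcs if tail == v)

    def leaves(self):
        """Vertices with outdegree 0."""
        return {v for v in self.vertices if self.outdegree(v) == 0}

    def root(self):
        """The unique vertex of indegree zero, or None if there is not exactly one."""
        sources = [v for v in self.vertices if self.indegree(v) == 0]
        return sources[0] if len(sources) == 1 else None

    def degenerate_vertices(self):
        """Vertices with outdegree one whose indegree is not at least two."""
        if len(self.vertices) < 3:
            return set()
        return {v for v in self.vertices
                if self.outdegree(v) == 1 and self.indegree(v) < 2}

    def is_binary(self):
        """Root has outdegree two; indegree + outdegree is three at every interior vertex."""
        root = self.root()
        if root is None or self.outdegree(root) != 2:
            return False
        interior = self.vertices - self.leaves() - {root}
        return all(self.indegree(v) + self.outdegree(v) == 3 for v in interior)


# Hypothetical example: a single reticulation vertex "h" with parents "a" and "b".
network = Digraph([("r", "a"), ("r", "b"), ("a", "x"), ("a", "h"),
                   ("b", "h"), ("b", "y"), ("h", "z")])
```

In this example the vertex `"h"` has indegree two and outdegree one, so it is a reticulation rather than a degenerate vertex, and the network is binary.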
Phylogenetic Trees and Networks
Suppose for the remainder of the article that $X$ is some (nonempty) set of taxa. A (phylogenetic) network $N$ (on $X$) is a rooted DAG without degenerate vertices whose set of leaves is $X$. Unless the phylogenetic network in question has precisely two vertices, we always assume that the outdegree of the root of $N$ is at least two. Note that a network that does not contain vertices with indegree two or more is just an evolutionary or phylogenetic tree (on $X$). As usual, we call a phylogenetic tree in which every leaf is the child of the root a star tree.

Now, suppose that $Y$ is a nonempty subset of the set $X$ of species. We now consider the subnet of $N$ induced by restricting our attention to the leaves in $Y$. The lowest stable ancestor $\mathrm{lsa}(Y)$ of $Y$ in $N$ is the vertex that lies on all directed paths from the root of $N$ to the elements in $Y$, so that no vertex of $N$ below $\mathrm{lsa}(Y)$ enjoys this property. In case $\mathrm{lsa}(X)$ is the root of $N$, we call $N$ recoverable. The subnet $N|_Y$ of $N$ induced by $Y$ is defined as the phylogenetic network on $Y$ obtained from $N$ as follows: First, delete all vertices of $N$ (and their incident arcs) that are not on a directed path from $\mathrm{lsa}(Y)$ to some element in $Y$. Next, repeatedly suppress all resulting degenerate vertices (i.e., replace any such vertex $v$ and the two arcs $(u,v)$ and $(v,w)$ containing it by a single arc $(u,w)$) and remove all parallel arcs until a phylogenetic network on $Y$ is obtained. This definition for a subnet was introduced by Huber and Moulton (2012), and it aims to capture features that can be recovered from data (e.g., all degenerate vertices are suppressed as it would not be possible to decide how many degenerate vertices to include in a reconstructed network). Note that $N|_X = N$ if and only if $N$ is recoverable. Also note that every subnet of $N$ induced by restricting $N$ to some nonempty subset of its leaves is necessarily recoverable.

We say that two phylogenetic networks $N$ and $N'$ on $X$ are network-equivalent if for every nonempty, proper subset $Y$ of $X$, the phylogenetic networks $N|_Y$ and $N'|_Y$ are equivalent. Thus, two phylogenetic networks are equivalent if and only if they represent the same evolutionary histories. Note that the following useful observation concerning network-equivalence is an immediate consequence of our definitions.

Lemma 1.
Suppose that $N$ and $N'$ are two recoverable phylogenetic networks on $X$. If $N$ and $N'$ are equivalent, then $N$ and $N'$ are network-equivalent.
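The lowest stable ancestor and the induced subnet described above can be computed mechanically. The Python sketch below is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`reachable_from`, `lsa`, `subnet`) and the example network are hypothetical, arcs are kept in a set (so parallel arcs merge automatically), and only vertices with indegree one and outdegree one are suppressed.

```python
def reachable_from(arcs, start):
    """All vertices reachable from start along directed arcs (start included)."""
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for a, b in arcs:
            if a == u and b not in seen:
                seen.add(b)
                stack.append(b)
    return seen


def lsa(arcs, root, targets):
    """Lowest vertex lying on every directed path from root to every target."""
    vertices = {v for arc in arcs for v in arc}

    def stable(v):
        if v == root:
            return True
        # v is on all such paths iff deleting v cuts the root off from
        # every target other than v itself.
        pruned = {(a, b) for a, b in arcs if v not in (a, b)}
        still = reachable_from(pruned, root)
        return not any(t in still for t in targets if t != v)

    candidates = [v for v in vertices if stable(v)]
    for v in candidates:
        below = reachable_from(arcs, v) - {v}
        if not any(c in below for c in candidates):
            return v


def subnet(arcs, root, targets):
    """Arc set of the network induced on the leaf subset `targets`."""
    top = lsa(arcs, root, targets)
    reversed_arcs = {(b, a) for a, b in arcs}
    above_targets = set().union(*(reachable_from(reversed_arcs, t) for t in targets))
    # Keep exactly the vertices on a directed path from the lsa to some target.
    keep = reachable_from(arcs, top) & above_targets
    current = {(a, b) for a, b in arcs if a in keep and b in keep}
    while True:
        found = None
        for v in {x for arc in current for x in arc}:
            ins = [a for a, b in current if b == v]
            outs = [b for a, b in current if a == v]
            if len(ins) == 1 and len(outs) == 1:
                found = (ins[0], v, outs[0])
                break
        if found is None:
            return current
        u, v, w = found
        # Suppress v: replace the arcs (u, v) and (v, w) by the single arc (u, w).
        current = (current - {(u, v), (v, w)}) | {(u, w)}


# Hypothetical example: reticulation "h" with parents "a" and "b".
example = {("r", "a"), ("r", "b"), ("a", "x"), ("a", "h"),
           ("b", "h"), ("b", "y"), ("h", "z")}
```

Restricting `example` to the leaves `"x"` and `"y"` leaves a two-leaf tree, whereas restricting it to `"x"` and `"z"` keeps the reticulation, because two distinct routes from the root to `"z"` survive the suppression step.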
Binary Sequences
All of our networks will be constructed using special types of binary sequences, that is, sequences over the alphabet . We use binary sequences since they provide a convenient way to encode the vertices of certain phylogenetic trees that will be relevant to our constructions.As our examples rely on using some special types of binary sequences we now introduce some general terminology concerning such sequences. Suppose that is a nonnegative integer. We denote by the length of a binary sequence . We let denote the empty sequence, that is, the unique sequence with length 0, let be the binary sequence of length with 0's in all but the -th place, , and let , be the binary sequences of length consisting of all 0's and all 1's, respectively. We also let denote the set of all binary sequences that have length . Note that .Now, assume . For each sequence in and all , we denote by the -th letter of starting from the left. We define the weight of as , that is, the number of 1's contained in . Moreover, for each sequence we define the support
of to be the subset of consisting of all indices with . Finally, we denote by and the subsets of consisting of sequences whose weights are odd and even, respectively. Note that we will assume that is the only sequence in that is contained in neither nor . Thus, while . As an illustration of these definitions, , the weight of the sequence 011 is 2 and its support is , and , .Now, suppose that and are two binary sequences. Then is called a prefix of
if and holds for all . Note that the empty sequence is a prefix of every binary sequence. Also, if is a set of binary sequences and , then we call a sequence a precursor of
(in
) if is a prefix of , that is, a prefix of is a precursor of in if and only if this prefix is contained in . And if, in addition, every precursor of in other than is also a precursor of in , then we say that is the maximal precursor of
(in
). Note that if this exists, then it is unique. Finally, we call a common precursor of
if, for every sequence , is a precursor of in .
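The weight, support, and parity classes of binary sequences are easy to experiment with. The following Python sketch is illustrative only: the function names are hypothetical, sequences are represented as strings over `0` and `1`, and positions are counted from 1 starting at the left, as in the text.

```python
from itertools import product


def weight(s):
    """Number of 1's in the binary sequence s."""
    return s.count("1")


def support(s):
    """Set of positions i (counted from 1) at which the i-th letter of s is 1."""
    return {i for i, letter in enumerate(s, start=1) if letter == "1"}


def unit(n, i):
    """Sequence of length n with 0's in all but the i-th place."""
    return "0" * (i - 1) + "1" + "0" * (n - i)


def by_parity(n):
    """Split the sequences of length n into odd-weight and even-weight sets.

    The all-zero sequence is placed in neither set, following the convention
    stated in the text.
    """
    odd, even = set(), set()
    for bits in product("01", repeat=n):
        s = "".join(bits)
        if weight(s) == 0:
            continue
        (odd if weight(s) % 2 else even).add(s)
    return odd, even
```

For sequences of length 4, `by_parity(4)` returns the eight odd-weight and the seven even-weight sequences that are listed in the caption of the figure showing the two rooted DAGs.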
Main Results
Nonbinary Network Examples
In this section we shall present two nonbinary, phylogenetic networks and on an arbitrary set with at least three elements that are not equivalent and prove that they are network-equivalent.We begin by defining two rooted DAGs and from which we will obtain and , respectively. Let , let , and let be a set such that . For , associate to and the rooted DAG with vertex set , and arc set comprising of (i) for all the arcs , (ii) for all the arcs , and (iii) for all and the arcs if and only if . Note that is the set of leaves of and is the root. We illustrate these DAGs for in Figure 3 and list the binary sequences that label the vertices in and in its caption. To obtain the phylogenetic networks and we just suppress all degenerate vertices of and , respectively. Note that both and are recoverable because we have lsa in and lsa
in .
Figure 3. The rooted DAGs and for the case and . The labels of the vertices in , , directly below the root in both DAGs are omitted; listed from left to right they are 1110, 1101, 1000, 1011, 0100, 0111, 0010, 0001 for , and 1111, 1100, 1010, 1001, 0110, 0101, 0011 for .

We now prove the first of our main results.

Theorem 1
For every
, the networks
and
are not equivalent. However,
and
are network-equivalent.Proof. To see that and are not equivalent note that and that sequence is contained in . Consequently, there exists a child of the root of or (but not both) that has outdegree . Thus, and cannot be equivalent.We next show that and are network-equivalent. Let and put , . Note that and are recoverable as they are subnets of and , respectively. In view of Lemma 1, it therefore suffices to show that and are equivalent.To this end, for any , we associate for a rooted DAG to with the leaf removed. In particular, we define to be the rooted DAG with leaf set obtained from by first deleting all arcs from that do not lie on a path from the root of to a leaf in and then removing all resulting isolated vertices. Note that .For brevity, for the rest of this proof, we let and denote the vertex set and edge set of , respectively, and we put . We define a map from to as follows. Let denote the map from to that “flips” precisely the -th letter of a sequence in , that is the map given by, for all , putting for , and for . Note that and . Moreover, the map induces a bijection from to . Using this bijection, we now define the map by putting, for ,
We shall show that this map is a bijection from to that extends to an isomorphism from to which maps every element in to itself. This implies that and are equivalent.Clearly, maps every element in to itself. To see that is a bijection, note that since is the only element in that is contained in but not , we have . Combined with the fact that also holds and that is a bijection, it follows that is a bijection.To see that induces an isomorphism between and it suffices to show for all , that if and only if . In view of holding for all and holding for all , it follows that we may restrict our attention to showing that for all and all we have that if and only if . So let and . Assume first that . Then and so as . Thus, as . Conversely, assume that . Then since we have , and, hence in view of . Thus, , as required. ▪
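The key step of the proof, namely that flipping one letter of every sequence label matches up the two DAGs once a leaf has been removed, can be checked by brute force for small leaf sets. The Python sketch below rests on an assumed reading of the construction: a root joined to every nonzero sequence of a fixed weight parity, each sequence joined to the bottom-layer vertex `("v", i)` exactly when its i-th letter is 1, and each `("v", i)` joined to the leaf `("x", i)`. These vertex names, the function names, and the 0-based indexing are hypothetical choices made for illustration.

```python
from itertools import product


def sequences(n, parity):
    """Nonzero 0/1 tuples of length n whose weight is odd (parity=1) or even (parity=0)."""
    return [s for s in product((0, 1), repeat=n)
            if sum(s) > 0 and sum(s) % 2 == parity]


def build_dag(n, parity):
    """Arc set of the assumed three-layer DAG: root -> sequence -> v_i -> x_i."""
    arcs = set()
    for s in sequences(n, parity):
        arcs.add(("root", s))
        for i in range(n):
            if s[i] == 1:
                arcs.add((s, ("v", i)))
    for i in range(n):
        arcs.add((("v", i), ("x", i)))
    return arcs


def restrict(arcs, n, removed):
    """Keep the arcs whose head can still reach a leaf other than x_removed.

    Every vertex of these DAGs is reachable from the root, so these are
    exactly the arcs on a path from the root to one of the remaining leaves.
    """
    good = {("x", j) for j in range(n) if j != removed}
    changed = True
    while changed:
        changed = False
        for u, w in arcs:
            if w in good and u not in good:
                good.add(u)
                changed = True
    return {(u, w) for u, w in arcs if w in good}


def flip(s, i):
    """Flip the i-th letter (0-based) of the sequence s."""
    return s[:i] + (1 - s[i],) + s[i + 1:]


def relabel(vertex, i):
    """Apply the flip to sequence vertices; fix the root, the v_j and the leaves."""
    if isinstance(vertex, tuple) and isinstance(vertex[0], int):
        return flip(vertex, i)
    return vertex


def restrictions_match(n, i):
    """True if flipping letter i maps the restricted odd DAG onto the restricted even DAG."""
    odd = restrict(build_dag(n, 1), n, i)
    even = restrict(build_dag(n, 0), n, i)
    return {(relabel(u, i), relabel(w, i)) for u, w in odd} == even
```

Under these assumptions, `restrictions_match(n, i)` holds for every leaf index when `n` is 3, 4, or 5, while the unrestricted DAGs already differ in the number of children of the root. This is a finite sanity check on the argument, not a substitute for the proof.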
Binary Network Examples
We now extend the definitions of the networks and defined in the previous section so as to define two binary phylogenetic networks and that are not equivalent, but which are network-equivalent. We shall just present the definitions of and ; the proof of their network-equivalence is quite technical and can be found in the Appendix.Let and . Starting with the rooted DAGs defined in the previous section, we shall define a sequence of three rooted DAGs all having leaf set , the last one of which will yield . We illustrate this process in Figure 4, for the rooted DAG depicted in Figure 3.
Figure 4. Constructing the network from the network in the case . At each stage, we indicate those vertices that have been inserted by unfilled circles. i) The network in which the root has been replaced by the tree . ii) The network with the vertices in the middle layer indicated by squares. The tree is indicated in bold. iii) The network . The tree associated to the binary sequence is indicated in bold. iv) The network obtained by suppressing all vertices in having indegree and outdegree equal to one.

Step 1: We begin by replacing the star tree containing the root vertex of by a tree with leaf set that is a subtree of a certain tree which is defined as follows. Let be the set of all binary sequences with length at most . The tree is the rooted tree with vertex set and arc set consisting of all pairs for which is the maximal precursor of in . Note that the common precursor of is clearly the root of . In addition, since each sequence is the maximal precursor of exactly two sequences in if , and is not the maximal precursor of any sequence in if , it follows that is a binary phylogenetic tree on . We depict the tree in Figure 5(i).
Figure 5. i) The tree on . ii) The rooted caterpillar .

Now, we replace the subgraph of with vertex set consisting of the root of and the children of (i.e., the star tree on with root ) by the (necessarily binary) restriction of to . Let denote the resulting rooted DAG (see e.g., Fig. 4(i)).

Step 2: We now replace each of the vertices , , in the “bottom layer” of by a tree . This tree is defined by reversing the direction of all arcs in the tree obtained by restricting the tree to the set of all binary sequences in whose -th letter is 1. Note that the unique leaf of is since for all the source set of is and which is the leaf set of .

Now, note that, for all , the indegree of in is and that therefore the indegree of in is also . We now replace for all the subgraph of induced on the set by . Let denote the resulting rooted DAG (see e.g., Fig. 4(ii)).

Step 3: The final stage of the construction involves replacing each of the vertices in the “middle layer” of with another phylogenetic tree which is defined as follows.

Let , , denote a set of positive integers with . If , then we denote by the phylogenetic tree whose unique leaf is labeled by the sole element in . More generally, for we denote by the (up to equivalence) unique binary phylogenetic tree on such that, over all non-leaf vertices of , the collection of leaves below is . Note that is an example of a rooted caterpillar tree (see e.g., Semple and Steel 2003). In Figure 5(ii) we present the tree for .

Now, any nondegenerate vertex of is not binary if and only if and . Therefore, we shall consider vertices in whose support has size at least three. We shall replace all such vertices by a rooted tree that is derived from the tree as follows (essentially, we replace and its outgoing arcs by a rooted caterpillar whose leaves are the children of ). Put . Then, since for all the trees and defined in Step 2 do not share an interior vertex in , it follows that for every child of there exists a unique vertex below .
For all let denote the child of in that lies on the path from to so that, in particular, the set of children of is . To obtain the final digraph in our sequence, we replace, for each with , the subgraph of induced on the set consisting of and its children by the tree obtained from by replacing each of its leaves by the corresponding child of and replacing its root by (see e.g., Fig. 4(iii)).The phylogenetic network is now defined to be the rooted DAG obtained from by suppressing all degenerate vertices (see e.g., Fig. 4(iv)). Note that, by construction, the leaf set of is and is binary. Also note that is recoverable.The proof of our second main result is quite technical and is given in the Appendix.Theorem2
For every
, the binary phylogenetic networks
and
are not equivalent. However,
and
are network-equivalent.
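The tree on binary sequences used in Step 1 joins every sequence to its maximal precursor. The Python sketch below illustrates this under two assumptions that are not spelled out in the surviving text: a precursor is taken to be a proper prefix, and the vertex set is taken to be all binary sequences of length at most `n`. The function names are hypothetical.

```python
from itertools import product


def all_sequences(max_len):
    """All binary sequences (as strings) of length at most max_len, including the empty one."""
    return ["".join(bits)
            for k in range(max_len + 1)
            for bits in product("01", repeat=k)]


def maximal_precursor(s, pool):
    """Longest proper prefix of s that belongs to pool, or None if there is none."""
    for k in range(len(s) - 1, -1, -1):
        if s[:k] in pool:
            return s[:k]
    return None


def prefix_tree(n):
    """Arcs (maximal precursor, sequence) over all sequences of length at most n."""
    pool = set(all_sequences(n))
    return {(maximal_precursor(s, pool), s) for s in pool if s != ""}
```

With the full pool, every sequence of length less than `n` has exactly two children and the sequences of length `n` are the leaves, so the result is a complete binary tree rooted at the empty sequence. With a smaller pool, `maximal_precursor` skips over missing prefixes: in the pool `{"", "0", "011"}` the maximal precursor of `"011"` is `"0"`.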
Discussion
Our examples illustrate a problem with generalizing evolutionary models from rooted trees to rooted networks. We show that there are pairs of phylogenetic networks on an arbitrary set of taxa that are not equivalent, and yet display the same set of evolutionary trees (see Supplementary Material available on Dryad (http://dx.doi.org/10.5061/dryad.f6n8s) for additional details), as well as the same set of induced subnetworks. Although these examples are artificial in their construction, they still point to the possibility that this phenomenon could arise in nature, especially since phylogenetic networks can be extremely complex (see e.g., Kunin et al. 2005; Dagan et al. 2008).

The problem that we have presented has some potential ramifications for the development and use of new methods for constructing networks that explicitly represent evolution. First, as mentioned in the introduction, it implies that in practice we will have to be careful to ensure that the output from any network construction method is uniquely determined by its input. This in itself is not necessarily a great problem since even when we construct phylogenetic trees there can be multiple solutions (e.g., there can be several most parsimonious trees). Second, given that we know that there are cases where a network cannot be uniquely recovered from all of its induced subnetworks, it becomes important to characterize under which conditions these cases will be manifested, and we should try to understand how often biological data will actually meet these conditions. Finally, in the context of extending consensus and supertree methods to include phylogenetic networks, our result shows that, unlike trees, it will not be possible to develop supernetwork methods in general that are consistent, that is, methods that are guaranteed to output a given network from all of its induced subnetworks.
However, again it will be interesting to better understand how important this will actually be in practice.

Even though we have found that networks are not necessarily encoded by their induced subnets in general, some classes of networks are. For example, level-2 networks and thus also phylogenetic trees and level-1 networks are encoded by their trinets (note that the level of a binary phylogenetic network is the maximum number of indegree-2 vertices taken over all biconnected components of the network) (van Iersel and Moulton 2014). Hence, it might be of interest to determine which types of networks are encoded by their induced subnets and also to possibly concentrate on developing methods to construct these special types of networks. Note that various methods have already been designed to construct special types of networks (see e.g., Willson 2012), but this obviously requires some care to ensure that the properties of the networks under consideration are realistic enough to represent real data. Note also that the level of the networks in our examples is exponential in (it is with ). Hence, it could be of interest to decide whether networks with reasonably low level relative to the size of their leaf set (e.g., linear level as function of ) are encoded by their subnets.

Even if we are not necessarily able to encode a network by its subnets, it could still be of interest to investigate whether at least some parameters (e.g., the number of reticulation vertices) can be determined or at least approximated by the knowledge of their induced subnets (or even trees). In addition, it could be useful to decide whether or not networks might be encoded if more information is available (e.g., if we are given branch lengths/dates for vertices or some model of evolution).
Note that Thatte and Steel investigated reconstructability of pedigrees assuming a certain probabilistic model and were able to prove some encoding results for pedigrees in general (see e.g., Thatte and Steel 2008; Thatte 2013), so analogous results might also hold for phylogenetic networks.

There are some related mathematical problems that are also worth mentioning. It has been shown that a graph drawn uniformly at random is encoded by its subgraphs with probability 1, as the size of the vertex set goes to infinity (see e.g., Bollobás 1990). It would be interesting to work out the probability that a randomly selected phylogenetic network is encoded by its induced subnets. This might also provide some clues about whether or not networks arising in practice could be expected to be encoded by their induced subnets. In particular, the aforementioned probabilistic result suggests that networks on large sets of taxa that are not encoded by their induced subnetworks might be quite rare in practice. In addition, an interesting algorithmic question is the following: if we are given a phylogenetic network, can we decide efficiently if it is uniquely encoded by its induced subnets? And, if we are given a set of networks, can we efficiently decide if they are induced subnets of some network?

In conclusion, even if there may be more than one network that can induce the same set of trees and/or subnetworks, it is still useful to find ways to construct these networks so that alternative evolutionary scenarios can be explored. This has already proven a useful strategy in phylogenetics (for example, understanding the number of reconciliations of a gene tree with a species tree (Bansal et al. 2013)). In this regard, it would be interesting to develop ways to determine how many networks can potentially display the same set of subnetworks. 
More generally, a better understanding of the structure of networks in terms of substructures could also give us a better understanding of the performance of current methods for network construction, and will hopefully also eventually help us to design new methods for confidently recovering reticulate evolutionary histories.
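
The notion of level used in the discussion above (the maximum number of indegree-2 vertices taken over all biconnected components) can be made concrete with a short sketch. The code below is only an illustration under stated assumptions, not a method from this article: the function name `network_level` and the representation of a network as a list of (parent, child) arcs are hypothetical choices, and the network is assumed to be binary with no parallel arcs. Biconnected components of the underlying undirected graph are found with the standard depth-first search that keeps a stack of edges.

```python
from collections import defaultdict

def network_level(arcs):
    """Level of a rooted binary network given as (parent, child) arcs.

    Assumptions (hypothetical representation): vertex labels are hashable
    and there are no parallel arcs between the same pair of vertices.
    """
    parents = defaultdict(set)   # child -> set of its parents
    adj = defaultdict(set)       # underlying undirected graph
    for u, v in arcs:
        parents[v].add(u)
        adj[u].add(v)
        adj[v].add(u)
    # Reticulation vertices are those with indegree 2.
    reticulations = {v for v, ps in parents.items() if len(ps) == 2}

    depth, low = {}, {}
    edge_stack = []
    best = 0

    def dfs(u, parent, d):
        nonlocal best
        depth[u] = low[u] = d
        for w in adj[u]:
            if w == parent:
                continue
            if w not in depth:
                edge_stack.append((u, w))
                dfs(w, u, d + 1)
                low[u] = min(low[u], low[w])
                if low[w] >= depth[u]:
                    # u separates w's subtree: pop one biconnected component.
                    comp = set()
                    while True:
                        a, b = edge_stack.pop()
                        comp.update((a, b))
                        if (a, b) == (u, w):
                            break
                    # Count reticulations whose two parents lie in the component.
                    count = sum(1 for r in reticulations & comp
                                if parents[r] <= comp)
                    best = max(best, count)
            elif depth[w] < depth[u]:
                # Back edge to an ancestor.
                edge_stack.append((u, w))
                low[u] = min(low[u], depth[w])

    for start in list(adj):
        if start not in depth:
            dfs(start, None, 0)
    return best

# A tree, a network with one reticulation, and one with two
# reticulations in the same biconnected component.
tree = [("r", "a"), ("r", "b")]
level1 = [("r", "u"), ("r", "v"), ("u", "h"), ("v", "h"),
          ("u", "a"), ("v", "b"), ("h", "c")]
level2 = [("r", "a"), ("r", "b"), ("a", "h1"), ("a", "c"), ("b", "h1"),
          ("b", "h2"), ("c", "h2"), ("c", "x"), ("h1", "y"), ("h2", "z")]
print(network_level(tree), network_level(level1), network_level(level2))
```

A reticulation is counted only in the component that contains both of its parents, so a cut arc leaving a reticulation does not add to the count of the trivial two-vertex component formed by that arc.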