Mercè Llabrés, Francesc Rosselló, Gabriel Valiente.
Abstract
The Robinson-Foulds (RF) distance, one of the most widely used metrics for comparing phylogenetic trees, has the advantage of being intuitive, with a natural interpretation in terms of common splits, and it can be computed in linear time, but it has a very low resolution, and it may become trivial for phylogenetic trees with overlapping taxa, that is, phylogenetic trees that share some but not all of their leaf labels. In this article, we study the properties of the Generalized Robinson-Foulds (GRF) distance, a recently proposed metric for comparing any structures that can be described by multisets of multisets of labels, when applied to rooted phylogenetic trees with overlapping taxa, which are described by sets of clusters, that is, by sets of sets of labels. We show that the GRF distance has a very high resolution, it can also be computed in linear time, and it is not (uniformly) equivalent to the RF distance.
Keywords: Robinson-Foulds distance; metrics; phylogenetic tree
Year: 2021 PMID: 34714118 PMCID: PMC8742253 DOI: 10.1089/cmb.2021.0342
Source DB: PubMed Journal: J Comput Biol ISSN: 1066-5277 Impact factor: 1.479
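The abstract describes rooted phylogenetic trees by their sets of clusters and interprets the RF distance in terms of shared structure. As a minimal sketch of that idea, assuming the common rooted-tree formulation in which the RF distance is the size of the symmetric difference of the two cluster sets, the following Python code computes it for trees written as nested tuples. The names `Tree`, `clusters`, and `rf_distance` are illustrative and not taken from the article, and this recursive version favors clarity over the linear-time bound mentioned in the abstract. It does not implement the GRF distance, whose definition is not given in this record.

```python
from typing import FrozenSet, Set, Tuple, Union

# Hypothetical representation: a leaf is a label string, an inner node a tuple of subtrees.
Tree = Union[str, Tuple["Tree", ...]]


def clusters(tree: Tree) -> Set[FrozenSet[str]]:
    """Return the clusters of a rooted tree: the leaf-label set below each node."""
    found: Set[FrozenSet[str]] = set()

    def visit(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            labels = frozenset([node])
        else:
            # The cluster of an inner node is the union of its children's clusters.
            labels = frozenset().union(*(visit(child) for child in node))
        found.add(labels)
        return labels

    visit(tree)
    return found


def rf_distance(t1: Tree, t2: Tree) -> int:
    """Assumed rooted RF distance: number of clusters present in exactly one tree."""
    return len(clusters(t1) ^ clusters(t2))


same_taxa_1 = (("a", "b"), ("c", "d"))
same_taxa_2 = (("a", "c"), ("b", "d"))
overlapping = (("a", "b"), ("c", "e"))  # shares a, b, c with same_taxa_1 but not d

print(rf_distance(same_taxa_1, same_taxa_2))  # 4
print(rf_distance(same_taxa_1, overlapping))  # 6
```

In the second call, every cluster that contains `d` or `e` is counted as a difference, because a cluster containing a label absent from the other tree can never match. This is one way to see why plain cluster matching loses information when taxa only partially overlap.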
FIG. 1. Chloroplast DNA phylogeny (left) and mitochondrial DNA phylogeny (right) of several species of the genus Solanum. 1: S. lycopersicoides; 2: S. juglandifolium; 3: S. peruvianum; 4: S. chilense; 5: S. pennellii; 6: S. hirsutum; 7: S. chmielewskii; 8: S. esculentum; 9: S. pimpinellifolium; 10: S. cheesmaniae; 11: S. rickii.
FIG. 2. The contraction of edge in the caterpillar on the left yields the phylogenetic tree at the center, and the contraction of edge in the caterpillar on the left yields the phylogenetic tree on the right.
FIG. 3. The caterpillars with n and leaves.
Number of Different Values Taken by the Distance, the Cluster Dissimilarity, and the Generalized Robinson-Foulds Distance, on All Pairs of Binary Phylogenetic Trees with Labeled Leaves (a) and for a Random Uniform Sample of 10,000 Pairs of Binary Phylogenetic Trees with Labeled Leaves (b)
| (a) n | No. of values | | |
|---|---|---|---|
| | | CD | GRF |
| 3 | 2 | 2 | 2 |
| 4 | 5 | 3 | 9 |
| 5 | 9 | 7 | 32 |
| 6 | 15 | 11 | 142 |
CD, cluster dissimilarity; GRF, generalized Robinson-Foulds.
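A count of this kind can be sketched by brute force for small n. The example below is an illustration only: it enumerates all rooted binary phylogenetic trees on n labeled leaves and counts the distinct values taken by the classical RF distance, assumed here to be the size of the symmetric difference of the cluster sets. It does not compute the measures tabulated above, and the helper names (`insertions`, `all_binary_trees`, `distinct_rf_values`) are hypothetical.

```python
from itertools import combinations_with_replacement


def insertions(tree, leaf):
    """Yield every tree obtained by attaching `leaf` as a sibling of some node of `tree`."""
    yield (tree, leaf)  # attach next to the whole (sub)tree
    if isinstance(tree, tuple):
        left, right = tree
        for new_left in insertions(left, leaf):
            yield (new_left, right)
        for new_right in insertions(right, leaf):
            yield (left, new_right)


def all_binary_trees(n):
    """All rooted binary trees with leaves labeled "1".."n", as nested pairs."""
    trees = ["1"]
    for k in range(2, n + 1):
        trees = [t for tree in trees for t in insertions(tree, str(k))]
    return trees


def clusters(tree):
    """Set of clusters (leaf-label sets below each node) of a nested-pair tree."""
    found = set()

    def visit(node):
        if isinstance(node, str):
            labels = frozenset([node])
        else:
            labels = visit(node[0]) | visit(node[1])
        found.add(labels)
        return labels

    visit(tree)
    return found


def distinct_rf_values(n):
    """Distinct values of the assumed rooted RF distance over all pairs of trees."""
    cluster_sets = [clusters(t) for t in all_binary_trees(n)]
    return {len(a ^ b) for a, b in combinations_with_replacement(cluster_sets, 2)}


for n in (3, 4, 5):
    print(n, len(all_binary_trees(n)), sorted(distinct_rf_values(n)))
# 3 3 [0, 2]
# 4 15 [0, 2, 4]
# 5 105 [0, 2, 4, 6]
```

Under these assumptions the classical RF distance takes only n - 1 distinct values for the small n checked here, which is consistent with the low resolution attributed to it in the abstract. The enumeration grows as (2n - 3)!!, so this approach is only practical for small n.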
Ratio of Generalized Robinson-Foulds-Equidistant Triplets to Robinson-Foulds-Equidistant Triplets and Vice Versa, for All the Triplets of Binary Phylogenetic Trees with Labeled Leaves (a) and for a Random Uniform Sample of 10,000 Triplets of Binary Phylogenetic Trees with Labeled Leaves (b)
| (a) n | GRF vs. RF | RF vs. GRF |
|---|---|---|
| 3 | 0.000000 | 0.000000 |
| 4 | 0.697417 | 0.000000 |
| 5 | 0.801518 | 0.000000 |
| 6 | 0.957795 | 0.000228 |
FIG. 4. Triplet of phylogenetic trees with but .