Sylvain Lespinats, Delphine Grando, Eric Maréchal, Mohamed-Ali Hakimi, Olivier Tenaillon, Olivier Bastien.
Abstract
Whatever the phylogenetic method, genetic sequences are often described as strings of characters; molecular sequences can thus be viewed as elements of a multi-dimensional space. As a consequence, studying motion in this space (i.e., the evolutionary process) must deal with the surprising features of high-dimensional spaces, such as the concentration of measure phenomenon. To study how these features might influence phylogeny reconstructions, we examined a particular popular ...
Keywords: Fitch-Margoliash; Least Square methods; Multi Dimensional Scaling; Sammon’s mapping; molecular phylogeny
Year: 2011 PMID: 21697992 PMCID: PMC3118699 DOI: 10.4137/EBO.S7048
Source DB: PubMed Journal: Evol Bioinform Online ISSN: 1176-9343 Impact factor: 1.625
Figure 1. 200 random data points are uniformly distributed in a unit cube (of a given dimensionality). Histograms of the distances between every pair of points (19 900 distances) are displayed according to the dimension of the space. Distributions of distances for dimensions larger than 200 would have the same Gaussian-like shape, but their centers would be shifted to the right proportionally to the square root of the dimension.
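The concentration of measure effect described in this caption can be reproduced numerically. The following is a minimal sketch, not the authors' code: it assumes NumPy, Euclidean distances, and an arbitrary choice of dimensions and random seed (all illustrative), and it reports summary statistics instead of plotting histograms.

```python
import numpy as np


def pairwise_distances(points):
    """Return the Euclidean distances between all unordered pairs of rows."""
    # Gram-matrix identity: |x - y|^2 = |x|^2 + |y|^2 - 2 x.y
    sq_norms = np.sum(points ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * points @ points.T
    np.maximum(sq_dists, 0.0, out=sq_dists)  # clip tiny negative rounding errors
    upper = np.triu_indices(len(points), k=1)  # count each pair once
    return np.sqrt(sq_dists[upper])


def distance_summary(n_points=200, dims=(1, 2, 5, 20, 200), seed=0):
    """Map each dimension to (number of pairs, mean distance, std of distances)."""
    rng = np.random.default_rng(seed)
    summary = {}
    for dim in dims:
        points = rng.random((n_points, dim))  # uniform in the unit cube
        dists = pairwise_distances(points)
        summary[dim] = (dists.size, dists.mean(), dists.std())
    return summary


if __name__ == "__main__":
    for dim, (count, mean, std) in distance_summary().items():
        print(f"dim={dim:4d} pairs={count} mean={mean:.3f} std={std:.3f} "
              f"std/mean={std / mean:.3f} mean/sqrt(dim)={mean / np.sqrt(dim):.3f}")
```

As the dimension grows, the ratio std/mean shrinks while mean/sqrt(dim) settles near a constant, which corresponds to the rightward shift proportional to the square root of the dimension.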
Figure 2. Summary of the eight tested criteria and the links between them. Every combination of criterion components is tested to evaluate each improvement. The components allow: penalizing either “tears” alone (as in the original criteria) or “tears” and “false neighborhoods” together (as suggested here); and emphasizing small distances either through the traditional inverse function or by accounting for the concentration of measure phenomenon, with more (cases ζ2 and ζ5) or less (cases ζ1 and ζ4) focus on smaller distances. ζFM corresponds to the original Fitch-Margoliash criterion and ζSa to that of Sanjuán and Wróbel. ζ1 and ζ2 avoid both the risk of “false neighborhoods” and the risks related to the concentration of measure phenomenon. ζ3 and ζ6 avoid the risks related to the concentration of measure phenomenon and the risk of “tears” (but do not consider the risk of “false neighborhoods”). ζ4 and ζ5 avoid the risks related to the concentration of measure phenomenon (while focusing strongly on smaller distances in the case of ζ5) but only lightly penalize “false neighborhoods”.
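For reference, the classical Fitch-Margoliash criterion (ζFM in the figure) is usually written as a weighted least-squares sum in which each squared discrepancy between an observed distance and the corresponding tree distance is divided by the squared observed distance. The sketch below implements only that textbook form. The function name and the `power` parameter are illustrative assumptions, and the modified criteria (ζ1 to ζ6, ζSa) are not reproduced because their formulas do not appear in this excerpt.

```python
import numpy as np


def fitch_margoliash_criterion(observed, tree_dist, power=2.0):
    """Weighted least-squares misfit between observed and tree distances.

    power=2 gives the classical Fitch-Margoliash weighting (1 / D_ij^2);
    power=0 gives the unweighted least-squares sum. Hypothetical helper.
    """
    observed = np.asarray(observed, dtype=float)
    tree_dist = np.asarray(tree_dist, dtype=float)
    if observed.shape != tree_dist.shape or observed.shape[0] != observed.shape[1]:
        raise ValueError("both inputs must be square matrices of the same shape")
    i, j = np.triu_indices(observed.shape[0], k=1)  # each pair of taxa once
    d_obs, d_tree = observed[i, j], tree_dist[i, j]
    if np.any(d_obs <= 0):
        raise ValueError("observed off-diagonal distances must be positive")
    return float(np.sum((d_obs - d_tree) ** 2 / d_obs ** power))


if __name__ == "__main__":
    observed = [[0, 2, 4], [2, 0, 4], [4, 4, 0]]
    fitted = [[0, 2, 5], [2, 0, 5], [5, 5, 0]]
    print(fitch_margoliash_criterion(observed, fitted))       # 0.125
    print(fitch_margoliash_criterion(observed, fitted, 0.0))  # 2.0
```

Dividing by the squared observed distance makes an error on a small distance count more than the same error on a large distance, which is the emphasis on small distances that the caption refers to.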
Kappa coefficients (cf. Section 5.3) for each method (in columns) and for various values of k (in rows). Note that the total number of pairwise distances between 15 species is 105.
| k | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.85 | 0.98 | 0.91 | 0.85 | 0.98 | 0.90 | 0.88 | 0.89 | 0.86 |
| 2 | 0.83 | 0.96 | 0.90 | 0.84 | 0.95 | 0.89 | 0.90 | 0.89 | 0.87 |
| 3 | 0.80 | 0.96 | 0.87 | 0.81 | 0.95 | 0.87 | 0.86 | 0.86 | 0.88 |
| 4 | 0.81 | 0.96 | 0.88 | 0.82 | 0.94 | 0.88 | 0.87 | 0.87 | 0.85 |
| 5 | 0.82 | 0.94 | 0.87 | 0.83 | 0.92 | 0.87 | 0.86 | 0.86 | 0.86 |
| 10 | 0.85 | 0.88 | 0.87 | 0.85 | 0.87 | 0.86 | 0.86 | 0.86 | 0.87 |
| 15 | 0.85 | 0.85 | 0.86 | 0.85 | 0.85 | 0.86 | 0.86 | 0.86 | 0.87 |
Figure 3. Example of a comparison between two tree-building methods: the tree built with the classical Fitch-Margoliash method versus the tree built with the new criterion. The original data (and the associated distance matrix) are displayed in the upper inset. The left and right insets show the two trees. “Continuity” and “trustworthiness” can then be compared in the lower inset (the grey curve corresponds to the Fitch-Margoliash method and the black curve to the new method).
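“Continuity” and “trustworthiness” are neighborhood-preservation measures; a widely used formulation is the rank-based one of Venna and Kaski. The sketch below assumes that formulation (the exact definition used in the paper may differ), compares a matrix of original distances with a matrix of tree distances, and breaks ties in distance by index order. The function names are illustrative.

```python
import numpy as np


def _ranks(dist):
    """ranks[i, j] = rank of j among the neighbours of i (1 = nearest, self = 0)."""
    d = np.array(dist, dtype=float)  # copy so the caller's matrix is untouched
    n = d.shape[0]
    np.fill_diagonal(d, -np.inf)  # each point sorts first in its own row
    order = np.argsort(d, axis=1, kind="stable")  # ties resolved by index order
    ranks = np.empty((n, n), dtype=int)
    ranks[np.arange(n)[:, None], order] = np.arange(n)[None, :]
    return ranks


def _score(rank_ref, rank_test, k):
    n = rank_ref.shape[0]
    if not 1 <= k < n / 2:
        raise ValueError("this normalisation assumes 1 <= k < n / 2")
    # points among the k nearest under rank_test but not under rank_ref
    intruders = (rank_test >= 1) & (rank_test <= k) & (rank_ref > k)
    penalty = np.sum((rank_ref - k)[intruders])
    return 1.0 - 2.0 * penalty / (n * k * (2 * n - 3 * k - 1))


def trustworthiness(dist_original, dist_tree, k):
    """Penalise tree neighbours that are not original neighbours (false neighbourhoods)."""
    return _score(_ranks(dist_original), _ranks(dist_tree), k)


def continuity(dist_original, dist_tree, k):
    """Penalise original neighbours that are no longer neighbours in the tree (tears)."""
    return _score(_ranks(dist_tree), _ranks(dist_original), k)
```

Both scores equal 1 when every k-neighborhood is preserved and decrease as neighbors are gained or lost, so plotting them against k gives curves comparable in spirit to those in the figure.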
Figure 4. Evaluation of distance preservation by nine tree-building methods (analysis on 200 sets of 15 random data points in a two-dimensional space). The upper-left inset shows the Robinson and Foulds distance between the trees generated by the various methods: the distance separating two methods in the graph reflects the average Robinson and Foulds distance between the methods’ trees (see Hillis et al. [78] for a somewhat similar display of Robinson and Foulds distances). The other insets show “continuity” (left insets) and “trustworthiness” (right insets). Every curve is reported on each graph to allow an easy comparison. In each inset, the method concerned corresponds to the black curve.
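The Robinson and Foulds distance used in this figure counts the bipartitions of the taxa that are induced by an internal edge of one tree but not of the other. The following is a minimal sketch under stated assumptions: trees are written as nested tuples of leaf labels (a representation chosen here for illustration), the distance is the unnormalised size of the symmetric difference (some authors halve it), and the averaging over data sets described in the caption is not included.

```python
def leaf_set(tree):
    """Return the set of leaf labels of a nested-tuple tree."""
    if not isinstance(tree, tuple):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def splits(tree):
    """Return the non-trivial bipartitions induced by the internal edges."""
    all_leaves = leaf_set(tree)
    reference = min(all_leaves)  # fixed leaf used to pick a canonical side
    found = set()

    def visit(node):
        if not isinstance(node, tuple):
            return frozenset([node])
        below = frozenset().union(*(visit(child) for child in node))
        other = all_leaves - below
        if len(below) > 1 and len(other) > 1:  # ignore trivial splits
            found.add(below if reference in below else other)
        return below

    visit(tree)
    return found


def robinson_foulds(tree_a, tree_b):
    """Number of bipartitions present in exactly one of the two trees."""
    if leaf_set(tree_a) != leaf_set(tree_b):
        raise ValueError("both trees must have the same leaf labels")
    return len(splits(tree_a) ^ splits(tree_b))


if __name__ == "__main__":
    t1 = (("A", "B"), ("C", "D"), "E")
    t2 = (("A", "C"), ("B", "D"), "E")
    t3 = (("A", "B"), ("C", "E"), "D")
    print(robinson_foulds(t1, t2))  # 4: no shared bipartition
    print(robinson_foulds(t1, t3))  # 2: the split AB | CDE is shared
```

Storing each bipartition by the side that contains a fixed reference leaf makes the comparison independent of where a tree happens to be rooted.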
Figure 5. Evaluation of distance preservation (data in a 100-dimensional space). The upper-left inset shows the Robinson and Foulds distance between the trees generated by the various methods. The other insets show “continuity” (left insets) and “trustworthiness” (right insets). Every curve is reported on each graph to allow an easy comparison. In each inset, the method concerned corresponds to the black curve.
Kappa coefficients (cf. Section 5.3) for each method (in columns) and for various values of k (in rows). Note that the total number of pairwise distances between 15 species is 105.
| k | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.77 | 0.87 | 0.75 | 0.78 | 0.89 | 0.77 | 0.74 | 0.77 | 0.77 |
| 2 | 0.71 | 0.87 | 0.68 | 0.71 | 0.89 | 0.69 | 0.67 | 0.69 | 0.69 |
| 3 | 0.68 | 0.86 | 0.66 | 0.71 | 0.84 | 0.66 | 0.65 | 0.67 | 0.66 |
| 4 | 0.69 | 0.85 | 0.65 | 0.73 | 0.83 | 0.65 | 0.65 | 0.65 | 0.67 |
| 5 | 0.69 | 0.84 | 0.65 | 0.72 | 0.81 | 0.64 | 0.64 | 0.64 | 0.66 |
| 10 | 0.67 | 0.72 | 0.64 | 0.70 | 0.71 | 0.64 | 0.64 | 0.64 | 0.65 |
| 15 | 0.68 | 0.64 | 0.65 | 0.69 | 0.67 | 0.65 | 0.66 | 0.65 | 0.65 |
Robustness of nearest neighbor interchange (NNI) according to the criterion that drives the optimization. Each method is evaluated by the percentage of generated trees that correspond to the optimal tree according to its criterion.
| | ζ1 | ζ2 | ζ3 | ζ4 | ζ5 | ζFM | ζ6 | ζSa |
|---|---|---|---|---|---|---|---|---|
| dim = 2 | 40.9% | 61.7% | 44.8% | 30.9% | 56.0% | 33.4% | 39.1% | 33.9% |
| dim = 100 | 45.2% | 48.8% | 45.0% | 41.3% | 48.6% | 42.1% | 45.2% | 42.2% |
| | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| A | 0 | 17998 | 17998 | 17998 | 17998 | 17998 | 17998 | 17998 | 17998 | 17998 | 17998 | 17998 | 17798 | 17998 | 17998 |
| B | 17998 | 0 | 13858 | 13858 | 13858 | 13858 | 13858 | 13858 | 13858 | 13858 | 13858 | 13858 | 13858 | 16750 | 16750 |
| C | 17998 | 13858 | 0 | 10530 | 10800 | 10800 | 10800 | 10800 | 11344 | 11344 | 11344 | 11344 | 13678 | 16750 | 16750 |
| D | 17998 | 13858 | 10530 | 0 | 10800 | 10800 | 10800 | 10800 | 11344 | 11344 | 11344 | 11344 | 13678 | 16750 | 16750 |
| E | 17998 | 13858 | 10800 | 10800 | 0 | 2108 | 2472 | 2472 | 11344 | 11344 | 11344 | 11344 | 13678 | 16750 | 16750 |
| F | 17998 | 13858 | 10800 | 10800 | 2108 | 0 | 2472 | 2472 | 11344 | 11344 | 11344 | 11344 | 13678 | 16750 | 16750 |
| G | 17998 | 13858 | 10800 | 10800 | 2472 | 2472 | 0 | 804 | 11344 | 11344 | 11344 | 11344 | 13678 | 16750 | 16750 |
| H | 17998 | 13858 | 10800 | 10800 | 2472 | 2472 | 804 | 0 | 11344 | 11344 | 11344 | 11344 | 13678 | 16750 | 16750 |
| I | 17998 | 13858 | 11344 | 11344 | 11344 | 11344 | 11344 | 11344 | 0 | 4134 | 4134 | 8352 | 13678 | 16750 | 16750 |
| K | 17998 | 13858 | 11344 | 11344 | 11344 | 11344 | 11344 | 11344 | 4134 | 236 | 0 | 8352 | 13678 | 16750 | 16750 |
| L | 17998 | 13858 | 11344 | 11344 | 11344 | 11344 | 11344 | 11344 | 8352 | 8352 | 8352 | 0 | 13678 | 16750 | 16750 |
| M | 17798 | 13858 | 13678 | 13678 | 13678 | 13678 | 13678 | 13678 | 13678 | 13678 | 13678 | 13678 | 0 | 16750 | 16750 |
| N | 17798 | 16750 | 16750 | 16750 | 16750 | 16750 | 16750 | 16750 | 16750 | 16750 | 16750 | 16750 | 16750 | 0 | 722 |
| O | 17798 | 16750 | 16750 | 16750 | 16750 | 16750 | 16750 | 16750 | 16750 | 16750 | 16750 | 16750 | 16750 | 722 | 0 |
| | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| A | 0 | 3000 | 3000 | 3000 | 5296 | 5296 | 18406 | 18406 | 18406 | 18406 | 18406 | 18406 | 18406 | 18406 | 18406 |
| B | 3000 | 0 | 2088 | 2088 | 5296 | 5296 | 18406 | 18406 | 18406 | 18406 | 18406 | 18406 | 18406 | 18406 | 18406 |
| C | 3000 | 2088 | 0 | 444 | 5296 | 5296 | 18406 | 18406 | 18406 | 18406 | 18406 | 18406 | 18406 | 18406 | 18406 |
| D | 3000 | 2088 | 444 | 0 | 5296 | 5296 | 18406 | 18406 | 18406 | 18406 | 18406 | 18406 | 18406 | 18406 | 18406 |
| E | 5296 | 5296 | 5296 | 5296 | 0 | 5014 | 18406 | 18406 | 18406 | 18406 | 18406 | 18406 | 18406 | 18406 | 18406 |
| F | 5296 | 5296 | 5296 | 5296 | 5014 | 0 | 18406 | 18406 | 18406 | 18406 | 18406 | 18406 | 18406 | 18406 | 18406 |
| G | 18406 | 18406 | 18406 | 18406 | 18406 | 18406 | 0 | 2716 | 15744 | 15744 | 15744 | 15744 | 15744 | 15744 | 15744 |
| H | 18406 | 18406 | 18406 | 18406 | 18406 | 18406 | 2716 | 0 | 15744 | 15744 | 16248 | 16248 | 16248 | 16248 | 16248 |
| I | 18406 | 18406 | 18406 | 18406 | 18406 | 18406 | 15744 | 15744 | 0 | 7154 | 16248 | 16248 | 16248 | 16248 | 16248 |
| J | 18406 | 18406 | 18406 | 18406 | 18406 | 18406 | 15744 | 15744 | 7154 | 0 | 16248 | 16248 | 16248 | 16248 | 16248 |
| K | 18406 | 18406 | 18406 | 18406 | 18406 | 18406 | 16248 | 16248 | 16248 | 16248 | 0 | 3690 | 13248 | 13248 | 13248 |
| L | 18406 | 18406 | 18406 | 18406 | 18406 | 18406 | 16248 | 16248 | 16248 | 16248 | 3690 | 0 | 13248 | 13248 | 13248 |
| M | 18406 | 18406 | 18406 | 18406 | 18406 | 18406 | 16248 | 16248 | 16248 | 16248 | 13248 | 13248 | 0 | 3178 | 9164 |
| N | 18406 | 18406 | 18406 | 18406 | 18406 | 18406 | 16248 | 16248 | 16248 | 16248 | 13248 | 13248 | 3178 | 0 | 9164 |
| O | 18406 | 18406 | 18406 | 18406 | 18406 | 18406 | 16248 | 16248 | 16248 | 16248 | 13248 | 13248 | 9164 | 9164 | 0 |