Christophe Dessimoz, Manuel Gil, Adrian Schneider, Gaston H Gonnet.
Abstract
BACKGROUND: The estimation of the difference between two evolutionary distances within a triplet of homologs is a common operation, used for example to determine which of two sequences is closer to a third one. The most accurate method is currently maximum likelihood over the entire triplet. However, this approach is relatively time consuming.
Year: 2006 PMID: 17147817 PMCID: PMC1762028 DOI: 10.1186/1471-2105-7-529
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Unrooted tree topology of all triplets of homologs. Sequences X, Y and Z originate from O. The problem addressed here is the estimation of the difference $\Delta = d_{XZ} - d_{YZ} = d_{XO} - d_{YO}$.
Coefficients of the approximation of σ²()
| Type | | | | | | error | dim |
| Day | -1.3090 | 1.0435 | 0.6895 | -0.3339 | 0.1590 | 0.087 | 2.13 |
| DNA | -1.2449 | 1.0933 | 0.6591 | -0.3026 | 0.1181 | 0.098 | 2.13 |
| JTT | -1.2921 | 1.0978 | 0.6741 | -0.3065 | 0.1144 | 0.080 | 2.10 |
Coefficients of the regression on the logarithms for the three types of scoring matrices. The error column shows the mean error, which by virtue of being a regression on logarithms is very close to the relative error.
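Why an error measured on the logarithmic scale reads as a relative error can be seen from a first-order expansion. In the sketch below, $v$ denotes a true value and $\hat{v}$ its fitted value; both symbols are notation introduced here for illustration, and natural logarithms are assumed.

```latex
\[
\varepsilon = \ln\hat{v} - \ln v
\quad\Longrightarrow\quad
\frac{\hat{v} - v}{v} = e^{\varepsilon} - 1
= \varepsilon + \frac{\varepsilon^{2}}{2} + \cdots
\approx \varepsilon
\qquad (|\varepsilon| \ll 1).
\]
```

Under that assumption, a log-scale error of 0.087 corresponds to a relative error of about 0.091, so the two agree to within half a percentage point for the values in the table.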
Verification of accuracy of confidence intervals
| | | 95% confidence interval | 99% confidence interval |
| | | 0.95129 ± 0.00067 | 0.99062 ± 0.00030 |
| | | 0.9511 ± 0.0020 | 0.99001 ± 0.00091 |
| | | 0.94641 ± 0.00070 | 0.98896 ± 0.00032 |
| | | 0.94808 ± 0.00069 | 0.98953 ± 0.00032 |
| | | 0.98137 ± 0.00042 | 0.99774 ± 0.00015 |
Comparison among the different methods to estimate the variance of the two estimators, resulting from a simulation using updated Dayhoff matrices over 400,000 protein triplets, except for the bootstrapping method, which is based on 40,000 samples. The first column tests the 95% confidence interval, the second the 99% confidence interval.
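The ± values in the table above look consistent with the sampling uncertainty of an observed coverage fraction, although the caption does not say how they were computed. The following is a minimal sketch under that assumption: it treats each coverage as a binomial proportion and reports 1.96 standard errors. The function name `coverage_half_width` is hypothetical.

```python
import math


def coverage_half_width(coverage: float, n_samples: int, z: float = 1.96) -> float:
    """Half-width of a normal-approximation interval for an observed proportion.

    Assumption: the n_samples trials are independent, so the observed
    coverage has standard error sqrt(p * (1 - p) / n).
    """
    if not 0.0 <= coverage <= 1.0:
        raise ValueError("coverage must lie in [0, 1]")
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    standard_error = math.sqrt(coverage * (1.0 - coverage) / n_samples)
    return z * standard_error


# First row of the table: 400,000 triplets, observed coverage 0.95129.
print(round(coverage_half_width(0.95129, 400_000), 5))

# Bootstrapping row: 40,000 samples, observed coverage 0.9511.
print(round(coverage_half_width(0.9511, 40_000), 4))
```

For the first row this gives about 0.00067, matching the reported value. For the bootstrapping row it gives about 0.0021, close to but not identical to the reported 0.0020, so the binomial reading should be taken as plausible rather than confirmed.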
Figure 2. Scatter plots comparing the variance estimators. The upper-left plot shows the strong agreement between σ²() and our approximation σ²(). From the upper-right and the lower-left plots, it can be seen that both have similar correlation with (). Finally, the lower-right plot confirms that variance estimation under the assumption of independence can yield a large overestimation of the correct variance.
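The overestimation seen in the lower-right plot is what the standard variance identity for a difference of two correlated estimates would predict. In the sketch below, $\hat{d}_1$ and $\hat{d}_2$ stand for the two distance estimates being subtracted; the symbols are notation assumed here, not taken from the figure.

```latex
\[
\operatorname{Var}\!\left(\hat{d}_1 - \hat{d}_2\right)
= \operatorname{Var}\!\left(\hat{d}_1\right)
+ \operatorname{Var}\!\left(\hat{d}_2\right)
- 2\,\operatorname{Cov}\!\left(\hat{d}_1, \hat{d}_2\right).
\]
```

Assuming independence amounts to setting the covariance term to zero. When the covariance is positive, as one would expect when both distances involve the same third sequence, the independence-based value exceeds the true variance by twice the covariance.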
Figure 3. Detection of asymmetric evolution. Comparison between the results of Kellis et al. and the three variants of closer, with k = 1.96. The circles separate cases of significant asymmetry (inside) from insignificant asymmetry (outside). For instance, there were 92 cases where all three variants of closer reported significant asymmetry, while the method of Kellis et al. did not detect significant asymmetry.
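One plausible reading of the threshold k = 1.96 is a two-sided test at roughly the 5% level: the estimated difference is called significant when its absolute value exceeds k standard deviations. The caption does not spell out the decision rule, so the sketch below is an assumption, and the function `is_significant_difference` is a hypothetical illustration rather than the authors' closer.

```python
import math


def is_significant_difference(delta_hat: float, variance: float, k: float = 1.96) -> bool:
    """Return True when |delta_hat| exceeds k standard deviations.

    Assumption: delta_hat is approximately normally distributed with the
    given variance, so k = 1.96 corresponds to a two-sided 5% test.
    """
    if variance < 0.0:
        raise ValueError("variance must be non-negative")
    return abs(delta_hat) > k * math.sqrt(variance)


# Hypothetical numbers: a difference of 10 with variance 16 (standard
# deviation 4) clears the threshold of 1.96 * 4 = 7.84; a difference of 5 does not.
print(is_significant_difference(10.0, 16.0))
print(is_significant_difference(5.0, 16.0))
```

Under this rule, an overestimated variance makes the test more conservative, which is one way a variance computed under the independence assumption could hide real asymmetry.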
Figure 4. Tree randomly generated for closest homolog simulation. Example of a random tree (see text for description of the procedure) used to compare the different methods to infer the closest homolog to each leaf. Distances indicated are in PAM units.
Figure 5. Identification of the closest homolog. Comparison between methods using alignment score (1), distance with assumption of independence (2) and distance using our variance approximation (3), on simulated data.