Abstract
BACKGROUND: Evolutionary distances are a critical measure in comparative genomics and molecular evolutionary biology. A simulation study was used to examine the effect of alignment accuracy of DNA sequences on evolutionary distance estimation.
Year: 2005 PMID: 15840174 PMCID: PMC1087827 DOI: 10.1186/1471-2105-6-102
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Relationship of alignment accuracy and sequence identity. (A) Proportion of sites correctly aligned versus true percent identity among the sequences. (B) Proportion of sites correctly aligned versus observed percent identity after alignment. (C) Observed versus true percent identity. JC = Jukes-Cantor model [37]; HKY = Hasegawa-Kishino-Yano model [38]; HKY + Γ = Hasegawa-Kishino-Yano plus gamma-distributed rates model. All points represent the average of 1000 simulation replicates.
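The two quantities plotted in Figure 1 can be illustrated with a small sketch. The caption does not say how "sites correctly aligned" was scored, so the definition below (the fraction of residue pairs in the true alignment that also share a column in the observed alignment) is an assumption, as are the function names `aligned_pairs`, `proportion_correct`, and `percent_identity`.

```python
def aligned_pairs(row_a, row_b, gap="-"):
    """Return the set of (i, j) residue-index pairs placed in the same column."""
    pairs = set()
    i = j = 0  # running residue indices in the ungapped sequences
    for a, b in zip(row_a, row_b):
        if a != gap and b != gap:
            pairs.add((i, j))
        if a != gap:
            i += 1
        if b != gap:
            j += 1
    return pairs


def proportion_correct(true_aln, observed_aln):
    """Fraction of truly homologous pairs recovered by the observed alignment.

    Assumed scoring rule; the source does not define it.
    """
    true_pairs = aligned_pairs(*true_aln)
    observed_pairs = aligned_pairs(*observed_aln)
    if not true_pairs:
        raise ValueError("true alignment has no aligned residue pairs")
    return len(true_pairs & observed_pairs) / len(true_pairs)


def percent_identity(row_a, row_b, gap="-"):
    """Percent of ungapped columns in which both sequences carry the same base."""
    columns = [(a, b) for a, b in zip(row_a, row_b) if a != gap and b != gap]
    if not columns:
        raise ValueError("no ungapped columns")
    return 100.0 * sum(a == b for a, b in columns) / len(columns)


# Toy pairwise alignments (made-up data, for illustration only).
true_aln = ("ACG-T", "ACGGT")
observed_aln = ("ACGT-", "ACGGT")
accuracy = proportion_correct(true_aln, observed_aln)
true_identity = percent_identity(*true_aln)
observed_identity = percent_identity(*observed_aln)
```

In this toy case the misplaced gap lowers both the alignment accuracy and the observed identity relative to the true alignment, which is the kind of relationship panels A to C describe.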
Figure 2. Relationships among true and estimated evolutionary distances. True distances were measured using the appropriate substitution model. (A) True P-distance versus true evolutionary distance. (B) Estimated evolutionary distance from aligned data versus true evolutionary distance.
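The relationship between P-distance and evolutionary distance in panel A can be sketched for the simplest model in the study. The code below uses the standard Jukes-Cantor correction, d = -(3/4) ln(1 - 4p/3), and its inverse; it is an illustration only and does not cover the HKY or HKY + Γ models, which need their own estimators. The function names are hypothetical.

```python
import math


def jc_distance(p):
    """Jukes-Cantor distance (substitutions per site) from a P-distance."""
    if not 0.0 <= p < 0.75:
        raise ValueError("Jukes-Cantor distance is undefined for p >= 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def jc_expected_p(d):
    """Expected P-distance under Jukes-Cantor for a true distance d."""
    if d < 0.0:
        raise ValueError("distance must be non-negative")
    return 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))


# The expected P-distance flattens toward 0.75 as the true distance grows,
# so the same small change in p maps to an ever larger change in d.
curve = [(d, jc_expected_p(d)) for d in (0.1, 0.5, 1.0, 2.0)]
```

Because the curve saturates, small errors in the observed proportion of differing sites translate into larger errors in the corrected distance for more divergent sequences.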
Figure 3. Effect of alignment error on evolutionary distance estimation. Relative error in evolutionary distance is measured as the absolute difference between the distance estimated from the true alignment and the distance estimated from the observed alignment, divided by the distance estimate from the true alignment. (A) Relative error in evolutionary distance versus proportion of correctly aligned sites. (B) Relative error in evolutionary distance versus true percent identity.
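The relative error defined in this caption is a one-line calculation. A minimal sketch, with a hypothetical function name and made-up input values:

```python
def relative_error(d_true_alignment, d_observed_alignment):
    """|d_true - d_observed| / d_true, following the Figure 3 definition."""
    if d_true_alignment <= 0.0:
        raise ValueError("distance from the true alignment must be positive")
    return abs(d_true_alignment - d_observed_alignment) / d_true_alignment


# Illustrative values only: the estimate from the observed alignment is 0.6
# where the true alignment gives 0.5.
example_error = relative_error(0.5, 0.6)
```

Note that the measure is unsigned, so it does not distinguish overestimates from underestimates of the distance.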
Figure 4. Effects of parameter changes on alignment accuracy and relative error in evolutionary distance estimation. Ordinate axes are scaled to match Figure 3. Alignment accuracy (A, C, E, G, & I) and evolutionary distance estimation (B, D, F, H, & J). (A & B) Effect of initial sequence length. (C & D) Effect of mean insertion and deletion size. (E & F) Effect of insertion and deletion rate. (G & H) Effect of intersite rate variation. (I & J) Effect of nucleotide frequency bias (G+C content). Error bars represent ± one standard deviation. Black and white points represent HKY simulations with expected distances of 0.5 and 1.0, respectively.
Summary of all simulation conditions
| Model | G+C content (proportion) | Initial # Sites | κ | α | Insertion/Deletion Rate | Mean Indel Size | True distances simulated |
| JC | 0.5 | 1000 | n/a | n/a | 100/40 | 4 | 0.02, 0.04, 0.06, 0.08, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.2, 1.4, 1.6, 1.8, 2.0 |
| HKY | 0.6 | 1000 | 3.6 | n/a | 100/40 | 4 | 0.02, 0.04, 0.06, 0.08, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.2, 1.4, 1.6, 1.8, 2.0 |
| HKY + Γ | 0.6 | 1000 | 3.6 | 1.0 | 100/40 | 4 | 0.02, 0.04, 0.06, 0.08, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.2, 1.4, 1.6, 1.8, 2.0 |
| HKY | 0.6 | 100 | 3.6 | n/a | 100/40 | 4 | 0.5, 1.0 |
| HKY | 0.6 | 200 | 3.6 | n/a | 100/40 | 4 | 0.5, 1.0 |
| HKY | 0.6 | 300 | 3.6 | n/a | 100/40 | 4 | 0.5, 1.0 |
| HKY | 0.6 | 400 | 3.6 | n/a | 100/40 | 4 | 0.5, 1.0 |
| HKY | 0.6 | 500 | 3.6 | n/a | 100/40 | 4 | 0.5, 1.0 |
| HKY | 0.6 | 1500 | 3.6 | n/a | 100/40 | 4 | 0.5, 1.0 |
| HKY | 0.6 | 2000 | 3.6 | n/a | 100/40 | 4 | 0.5, 1.0 |
| HKY | 0.6 | 5000 | 3.6 | n/a | 100/40 | 4 | 0.5, 1.0 |
| HKY | 0.6 | 10000 | 3.6 | n/a | 100/40 | 4 | 0.5, 1.0 |
| HKY | 0.6 | 1000 | 3.6 | n/a | 100/40 | 2 | 0.5, 1.0 |
| HKY | 0.6 | 1000 | 3.6 | n/a | 100/40 | 6 | 0.5, 1.0 |
| HKY | 0.6 | 1000 | 3.6 | n/a | 100/40 | 8 | 0.5, 1.0 |
| HKY | 0.6 | 1000 | 3.6 | n/a | 100/40 | 10 | 0.5, 1.0 |
| HKY | 0.6 | 1000 | 3.6 | n/a | 200/80 | 4 | 0.5, 1.0 |
| HKY | 0.6 | 1000 | 3.6 | n/a | 150/60 | 4 | 0.5, 1.0 |
| HKY | 0.6 | 1000 | 3.6 | n/a | 75/30 | 4 | 0.5, 1.0 |
| HKY | 0.6 | 1000 | 3.6 | n/a | 50/20 | 4 | 0.5, 1.0 |
| HKY | 0.7 | 1000 | 3.6 | n/a | 100/40 | 4 | 0.5, 1.0 |
| HKY | 0.8 | 1000 | 3.6 | n/a | 100/40 | 4 | 0.5, 1.0 |
| HKY | 0.9 | 1000 | 3.6 | n/a | 100/40 | 4 | 0.5, 1.0 |
| HKY + Γ | 0.6 | 1000 | 3.6 | 0.25 | 100/40 | 4 | 0.5, 1.0 |
| HKY + Γ | 0.6 | 1000 | 3.6 | 0.5 | 100/40 | 4 | 0.5, 1.0 |
κ is the transition/transversion bias. α is the shape parameter for Γ-distributed intersite rate variation. Insertion/Deletion Rate is relative to the point mutation rate, i.e., a rate of 100/40 indicates 1 insertion every 100 point mutations and 1 deletion every 40 point mutations. Each simulation condition was replicated 1000 times.
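To make the Insertion/Deletion Rate column concrete, the sketch below converts a rate such as 100/40 into expected event counts. It assumes the expected number of point mutations is simply the true distance multiplied by the initial number of sites, ignoring any change in sequence length during the simulation; that simplification and the function names are assumptions, not a description of the simulator.

```python
def parse_indel_rate(rate):
    """Split a rate like '100/40' into (mutations per insertion, mutations per deletion)."""
    insertion_interval, deletion_interval = (int(x) for x in rate.split("/"))
    return insertion_interval, deletion_interval


def expected_indels(distance, n_sites, rate="100/40"):
    """Expected (insertions, deletions) for a given distance and sequence length.

    Assumes expected point mutations = distance * n_sites (a simplification).
    """
    insertion_interval, deletion_interval = parse_indel_rate(rate)
    point_mutations = distance * n_sites
    return point_mutations / insertion_interval, point_mutations / deletion_interval


# Default condition from the table: 1000 initial sites, rate 100/40, distance 0.5.
insertions, deletions = expected_indels(0.5, 1000, "100/40")
```

Under these assumptions, the default condition at a distance of 0.5 corresponds to about 5 insertions and 12.5 deletions on average, and a rate of 200/80 halves both counts.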