Alexander K Hudek, Daniel G Brown.
Abstract
BACKGROUND: Multiple genome alignment is an important problem in bioinformatics. An important subproblem used by many multiple alignment approaches is that of aligning two multiple alignments. Many popular alignment algorithms for DNA use the sum-of-pairs heuristic, where the score of a multiple alignment is the sum of its induced pairwise alignment scores. However, the biological meaning of the sum-of-pairs heuristic is not obvious. Additionally, many algorithms based on the sum-of-pairs heuristic are complicated and slow compared to pairwise alignment algorithms. An alternative approach to aligning alignments is to first infer ancestral sequences for each alignment, and then align the two ancestral sequences. In addition to being fast, this method has a clear biological basis that takes into account the evolution implied by an underlying phylogenetic tree. In this study we explore the accuracy of aligning alignments by ancestral sequence alignment. We examine the use of both maximum likelihood and parsimony to infer ancestral sequences. Additionally, we investigate the effect on accuracy of allowing ambiguity in our ancestral sequences.
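The sum-of-pairs definition in the abstract can be illustrated with a minimal sketch. The match, mismatch, and gap values below are illustrative assumptions (a simple linear gap penalty), not the scoring scheme used in the study, and the function names are hypothetical.

```python
from itertools import combinations


def pair_score(row_a, row_b, match=1.0, mismatch=-1.0, gap=-2.0):
    """Score the pairwise alignment induced by two rows of a multiple alignment.

    The scoring values are illustrative assumptions, not the paper's parameters.
    """
    total = 0.0
    for a, b in zip(row_a, row_b):
        if a == "-" and b == "-":
            # Column is empty in the induced pairwise alignment, so skip it.
            continue
        if a == "-" or b == "-":
            total += gap
        elif a == b:
            total += match
        else:
            total += mismatch
    return total


def sum_of_pairs(alignment, **scoring):
    """Sum the induced pairwise scores over every pair of rows."""
    return sum(pair_score(a, b, **scoring) for a, b in combinations(alignment, 2))


# Toy three-sequence alignment (hypothetical data).
alignment = ["ACG-T", "A-GCT", "ACGCT"]
print(sum_of_pairs(alignment))  # 3.0
```

Because every pair of rows is scored, the work grows quadratically with the number of sequences, which is one reason sum-of-pairs methods tend to be slower than a single pairwise alignment of two ancestral sequences.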
Year: 2005 PMID: 16293191 PMCID: PMC1310622 DOI: 10.1186/1471-2105-6-273
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 3. Phylogenetic tree with early growth. Phylogenetic tree with a period of rapid growth followed by a period of slow growth. This tree resembles the tree of placental mammals, and the distance from the root to the taxa is approximately 100 million years.
Figure 4. Phylogenetic tree with late growth. Phylogenetic tree with a period of slow growth followed by a period of rapid growth. The distance from the root to the taxa is approximately 100 million years.
Figure 1. Example random trees. Example of a random tree with early growth (A) and a random tree with late growth (B).
Alignment accuracies using correct gap costs. Alignment accuracies using a gap open cost of 7 and the optimal gap extension cost of 0.57. P-values are computed using a paired Student's t-test.
| Data Set | Measure | Parsimony | Ambiguous Parsimony | ML | Ambiguous ML | Ma et al. | ClustalW |
|---|---|---|---|---|---|---|---|
| Early Growth | Mean | 88.43% | 83.83% | 87.86% | 86.26% | 91.74% | 91.25% |
| | Std. | 2.89% | 5.94% | 2.79% | 4.27% | 1.69% | 2.28% |
| | P-value | 1.6461 × 10^-65 | 1.1992 × 10^-17 | N/A | | | |
| Early Growth Double Length | Mean | 64.68% | 57.24% | 63.54% | 59.89% | 74.29% | 72.35% |
| | Std. | 5.23% | 8.25% | 4.99% | 7.59% | 4.62% | 4.30% |
| | P-value | 1.5922 × 10^-103 | 1.9161 × 10^-21 | N/A | | | |
| Late Growth | Mean | 95.76% | 96.01% | 89.44% | 87.52% | 96.63% | 96.79% |
| | Std. | 1.29% | 1.28% | 5.44% | 7.03% | 0.99% | 1.10% |
| | P-value | 7.7818 × 10^-10 | 1.0404 × 10^-12 | N/A | | | |
| Late Growth Double Length | Mean | 86.71% | 88.12% | 64.76% | 55.67% | 89.68% | 89.54% |
| | Std. | 2.71% | 2.85% | 7.94% | 12.62% | 2.42% | 2.11% |
| | P-value | 2.1699 × 10^-43 | 2.5033 × 10^-50 | N/A | | | |
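The captions state that P-values come from a paired Student's t-test. As a rough sketch of what that test computes, the snippet below derives the t statistic and degrees of freedom from per-alignment accuracy differences. The accuracy lists are hypothetical and are not data from the study; the function name is an assumption for illustration.

```python
import math
import statistics


def paired_t_statistic(xs, ys):
    """Return (t, degrees_of_freedom) for a paired Student's t-test.

    xs and ys hold accuracies of two methods on the same test alignments.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally sized samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(xs, ys)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd_diff == 0:
        raise ValueError("all paired differences are identical")
    n = len(diffs)
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1


# Hypothetical per-alignment accuracies for two methods.
method_a = [0.90, 0.88, 0.91, 0.87]
method_b = [0.85, 0.86, 0.88, 0.84]
t, df = paired_t_statistic(method_a, method_b)
print(f"t = {t:.3f}, df = {df}")
```

The P-value then follows from the t distribution with `df` degrees of freedom. If SciPy is available, `scipy.stats.ttest_rel` performs the same test and returns the P-value directly.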
Mean pairwise alignment accuracies using correct gap costs. Mean pairwise alignment accuracies using a gap open cost of 7 and the optimal gap extension cost of 0.57. P-values are computed using a paired Student's t-test.
| Data Set | Measure | Parsimony | Ambiguous Parsimony | ML | Ambiguous ML | Ma et al. | ClustalW |
|---|---|---|---|---|---|---|---|
| Early Growth | Mean | 95.64% | 92.61% | 96.19% | 95.41% | 97.61% | 97.48% |
| | Std. | 1.43% | 3.47% | 1.08% | 1.98% | 0.48% | 0.64% |
| | P-value | 1.3830 × 10^-67 | 9.6503 × 10^-17 | N/A | | | |
| Early Growth Double Length | Mean | 84.33% | 78.29% | 84.30% | 82.19% | 91.20% | 90.58% |
| | Std. | 2.99% | 5.50% | 2.91% | 4.95% | 1.75% | 1.51% |
| | P-value | 9.1586 × 10^-114 | 3.5959 × 10^-18 | N/A | | | |
| Late Growth | Mean | 98.13% | 98.26% | 94.73% | 93.74% | 98.65% | 98.67% |
| | Std. | 0.62% | 0.62% | 3.14% | 4.05% | 0.39% | 0.44% |
| | P-value | 5.9142 × 10^-09 | 1.1169 × 10^-10 | N/A | | | |
| Late Growth Double Length | Mean | 93.63% | 94.45% | 80.79% | 75.51% | 95.50% | 95.44% |
| | Std. | 1.37% | 1.47% | 4.85% | 7.91% | 1.14% | 0.95% |
| | P-value | 2.1676 × 10^-44 | 6.5523 × 10^-47 | N/A | | | |
Figure 2. Plot of column scores against mean pairwise column scores. Plot of column accuracy versus mean pairwise column accuracy for alignments from all experiments. The column accuracy and the mean pairwise column accuracy have a roughly linear relationship.
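Alignment accuracies for differing gap open costs. Alignment accuracies for gap open costs of 5, 7, and 9 using a gap extension cost of 0.57. Each Change row is the absolute difference between the accuracies at gap open costs 5 and 9.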
| Data Set | Gap Open Cost | Parsimony | Ambiguous Parsimony | ML | Ambiguous ML | Ma et al. | ClustalW |
|---|---|---|---|---|---|---|---|
| Early Growth | 5 | 89.27% | 88.41% | 88.43% | 88.09% | 92.77% | 90.48% |
| | 7 | 88.43% | 83.83% | 87.86% | 86.26% | 91.74% | 91.25% |
| | 9 | 86.17% | 76.04% | 86.13% | 82.58% | 90.42% | 91.48% |
| | Change | 3.10% | 12.37% | 2.30% | 5.51% | 2.35% | 1.00% |
| Early Growth Double Length | 5 | 63.43% | 63.32% | 63.01% | 63.05% | 78.19% | 69.63% |
| | 7 | 64.68% | 57.24% | 63.54% | 59.89% | 74.29% | 72.35% |
| | 9 | 61.40% | 46.72% | 60.02% | 52.33% | 68.44% | 73.59% |
| | Change | 2.03% | 16.60% | 2.99% | 10.72% | 9.75% | 3.96% |
| Late Growth | 5 | 95.95% | 96.35% | 92.06% | 91.70% | 96.97% | 96.50% |
| | 7 | 95.76% | 96.01% | 89.44% | 87.52% | 96.63% | 96.79% |
| | 9 | 95.21% | 95.39% | 84.95% | 80.71% | 96.12% | 96.91% |
| | Change | 0.74% | 0.96% | 7.11% | 10.99% | 0.85% | 0.41% |
| Late Growth Double Length | 5 | 86.21% | 88.91% | 73.21% | 68.18% | 91.43% | 88.17% |
| | 7 | 86.71% | 88.12% | 64.76% | 55.67% | 89.68% | 89.54% |
| | 9 | 85.58% | 86.16% | 54.36% | 43.02% | 87.08% | 90.15% |
| | Change | 0.63% | 2.75% | 18.85% | 25.16% | 4.35% | 1.98% |
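The Change rows in the table above appear to be the absolute difference between the accuracies at gap open costs 5 and 9; this reading is inferred from the numbers rather than stated in the caption. A short check on the Early Growth rows, using values copied from the table (the function name is hypothetical):

```python
# Early Growth accuracies (%) at gap open costs 5, 7, and 9, copied from the table.
early_growth = {
    "Parsimony": {5: 89.27, 7: 88.43, 9: 86.17},
    "Ambiguous Parsimony": {5: 88.41, 7: 83.83, 9: 76.04},
    "ML": {5: 88.43, 7: 87.86, 9: 86.13},
    "Ambiguous ML": {5: 88.09, 7: 86.26, 9: 82.58},
    "Ma et al.": {5: 92.77, 7: 91.74, 9: 90.42},
    "ClustalW": {5: 90.48, 7: 91.25, 9: 91.48},
}


def change(acc_by_cost, low=5, high=9):
    """Absolute accuracy difference between the lowest and highest gap open cost."""
    return round(abs(acc_by_cost[low] - acc_by_cost[high]), 2)


changes = {method: change(acc) for method, acc in early_growth.items()}
for method, value in changes.items():
    print(f"{method}: {value:.2f}%")
```

Under this reading, a small Change value indicates a method whose accuracy is relatively insensitive to the choice of gap open cost over the tested range.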
Alignment accuracies using different gap open cost scaling functions. Alignment accuracies using two different gap cost scaling functions. The unscaled gap open cost is 7 and the unscaled gap extension cost is 1. The Max method scales gap open costs according to the maximum value in the scoring matrix. The Expected method scales gap open costs according to the expected score of a related symbol pair from the two ancestral sequences.
| Data Set | Gap Scaling Method | Parsimony | Ambiguous Parsimony | ML | Ambiguous ML |
|---|---|---|---|---|---|
| Early Growth | Max | 86.92% | 85.07% | 86.59% | 86.13% |
| | Expected | 89.95% | 89.79% | 89.17% | 89.31% |
| Early Growth Double Length | Max | 66.10% | 64.51% | 67.17% | 66.10% |
| | Expected | 71.12% | 73.48% | 72.02% | 73.81% |
| Late Growth | Max | 95.02% | 95.21% | 93.52% | 92.31% |
| | Expected | 96.04% | 96.35% | 95.03% | 94.68% |
| Late Growth Double Length | Max | 85.95% | 86.39% | 82.74% | 75.98% |
| | Expected | 88.38% | 89.91% | 87.17% | 85.35% |
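The two scaling methods named in the caption can be sketched roughly as follows. This is only one plausible reading: it assumes the scaling is a simple multiplication of the unscaled gap open cost by a factor derived from the scoring matrix, and the toy matrix, the pair probabilities, and the function names are all hypothetical. The exact formulas used in the study are not given in this record.

```python
def scale_gap_open_max(unscaled_open, score):
    """Max method (assumed form): scale by the largest score in the matrix."""
    return unscaled_open * max(score.values())


def scale_gap_open_expected(unscaled_open, score, pair_prob):
    """Expected method (assumed form): scale by the expected score of a
    related symbol pair, weighting each pair's score by its probability."""
    expected = sum(prob * score[pair] for pair, prob in pair_prob.items())
    return unscaled_open * expected


# Hypothetical two-symbol scoring matrix and related-pair probabilities.
score = {("A", "A"): 2.0, ("A", "C"): -1.0, ("C", "A"): -1.0, ("C", "C"): 2.0}
pair_prob = {("A", "A"): 0.4, ("A", "C"): 0.1, ("C", "A"): 0.1, ("C", "C"): 0.4}

unscaled_open = 7  # unscaled gap open cost from the caption
print(scale_gap_open_max(unscaled_open, score))
print(scale_gap_open_expected(unscaled_open, score, pair_prob))
```

In this toy setup the Expected factor is smaller than the Max factor, because mismatched pairs pull the average below the best possible score, so the Expected method yields a lower scaled gap open cost.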