Joseph P Rusinko, Brian Hipp.
Abstract
BACKGROUND: First proposed by Cavender and Felsenstein, and by Lake, invariant-based algorithms for phylogenetic reconstruction were widely dismissed by practicing biologists because invariants were perceived to have limited accuracy in constructing trees from DNA sequences of reasonable length. Recent developments by algebraic geometers have led to the construction of lists of invariants that have been demonstrated to be more accurate on short sequences, but these could only be used for trees with small numbers of taxa. We have developed and tested an invariant-based quartet puzzling algorithm that is accurate and efficient for biologically reasonable data sets.
Year: 2012 PMID: 23217018 PMCID: PMC3549829 DOI: 10.1186/1748-7188-7-35
Source DB: PubMed Journal: Algorithms Mol Biol ISSN: 1748-7188 Impact factor: 1.405
Figure 1. Depiction of unrooted tree topology (12)(34).
Minimal invariant scores for crab data
| Rank | Ordering | Score | Topology |
|---|---|---|---|
| 1 | 4231 | .00660042 | (13)(24) |
| 2 | 2341 | .00662293 | (14)(23) |
| 3 | 3214 | .00664202 | (14)(23) |
| 4 | 3241 | .00664335 | (14)(23) |
| 5 | 3142 | .00668515 | (13)(24) |
| 6 | 3412 | .00669565 | (12)(34) |
| 7 | 1234 | .00672918 | (12)(34) |
| 8 | 4123 | .00681494 | (14)(23) |
| 9 | 1432 | .00683403 | (14)(23) |
| 10 | 4132 | .00683536 | (13)(23) |
Minimal invariant scores for selected orderings of a four species subset of crab data.
Biologically symmetric invariant scores
| Rank | Topology | Score |
|---|---|---|
| 1 | (14)(23) | .00595688 |
| 2 | (13)(24) | .00601559 |
| 3 | (12)(34) | .00632047 |
Biologically symmetric scores for a four species subset of crab data.
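As a minimal sketch of how the table above could be read, the snippet below picks the quartet topology with the smallest biologically symmetric score. It assumes that a lower score indicates a better fit, which matches the ascending ranking in the table; the names `bsi_scores` and `best_topology` are hypothetical and not taken from the paper.

```python
# Scores copied from the table above for the four species crab subset.
# Assumption: a lower score means the topology fits the data better.
bsi_scores = {
    "(14)(23)": 0.00595688,
    "(13)(24)": 0.00601559,
    "(12)(34)": 0.00632047,
}


def best_topology(scores):
    """Return the quartet topology whose score is smallest."""
    return min(scores, key=scores.get)


print(best_topology(bsi_scores))  # prints (14)(23)
```

Note that the three scores differ by well under ten percent, so the selection here is only as reliable as the scores themselves.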
Length 300 simulation accuracy percentages
| | ||||||||
|---|---|---|---|---|---|---|---|---|
| M=0.1 BSI-JC | 5 | 9 | 8 | 7 | 11 | 10 | 12 | 11 |
| M=0.1 K2P | 5 | 8 | 6 | 6 | 10 | 10 | 11 | 11 |
| M=0.1 ML | 1 | 3 | 2 | 2 | 3 | 3 | 4 | 3 |
| M=0.3 BSI-JC | 14 | 27 | 17 | 19 | 33 | 34 | 35 | 34 |
| M=0.3 K2P | 5 | 22 | 11 | 13 | 27 | 25 | 25 | 26 |
| M=0.3 ML | 4 | 14 | 7 | 9 | 18 | 24 | 21 | 21 |
| M=1.0 BSI-JC | 3 | 11 | 3 | 6 | 20 | 25 | 23 | 23 |
| M=1.0 K2P | 1 | 7 | 2 | 3 | 5 | 7 | 6 | 6 |
| M=1.0 ML | 0 | 3 | 1 | 2 | 17 | 26 | 22 | 22 |
| M=2.0 BSI-JC | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| M=2.0 K2P | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| M=2.0 ML | 0 | 0 | 0 | 0 | 1 | 3 | 1 | 2 |
Comparison of the percentage of reconstructed trees that match the correct tree using Biologically Symmetric Invariants (BSI) and Kimura 2 parameter invariants (K2P) versus Maximum Likelihood (ML) for sequences of length 300.
Length 300 simulation Robinson-Foulds distances
| | ||||||||
|---|---|---|---|---|---|---|---|---|
| M=0.1 BSI-JC | 4.9 | 4.8 | 4.9 | 4.9 | 4.4 | 4.5 | 4.3 | 4.4 |
| M=0.1 K2P | 5.1 | 4.9 | 5.0 | 5.0 | 4.7 | 4.7 | 4.4 | 4.6 |
| M=0.1 ML | 4.3 | 3.8 | 4.1 | 4.1 | 3.6 | 3.7 | 3.6 | 3.6 |
| M=0.3 BSI-JC | 3.2 | 2.6 | 3.3 | 3.0 | 2.1 | 2.1 | 2.1 | 2.1 |
| M=0.3 K2P | 4.4 | 3.3 | 4.0 | 3.9 | 2.7 | 2.6 | 2.7 | 2.7 |
| M=0.3 ML | 3.1 | 2.2 | 2.9 | 2.7 | 1.8 | 1.7 | 1.8 | 1.8 |
| M=1.0 BSI-JC | 4.8 | 4.8 | 5.3 | 5.0 | 3.1 | 2.8 | 2.9 | 2.9 |
| M=1.0 K2P | 5.9 | 5.7 | 6.1 | 5.9 | 5.9 | 5.5 | 5.7 | 5.7 |
| M=1.0 ML | 4.3 | 3.6 | 4.1 | 4.0 | 1.9 | 1.7 | 1.8 | 1.8 |
| M=2.0 BSI-JC | 9.1 | 11.6 | 10.3 | 10.3 | 11.0 | 11.4 | 11.2 | 11.2 |
| M=2.0 K2P | 8.8 | 11.7 | 9.9 | 10.1 | 15.1 | 15.0 | 14.9 | 15.0 |
| M=2.0 ML | 6.6 | 6.5 | 6.6 | 6.6 | 4.3 | 4.1 | 4.2 | 4.2 |
Comparison of the average Robinson-Foulds distance between the reconstructed tree and the correct tree for sequences of length 300.
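For readers unfamiliar with the metric, the following is a rough sketch of the Robinson-Foulds distance as it is commonly defined: the number of non-trivial bipartitions (splits) found in one unrooted tree but not the other. It assumes each tree is supplied as a list of splits, each given by one side of the bipartition, and that the distance is not normalized; whether the paper uses exactly this convention is an assumption. The function names are hypothetical.

```python
def normalize_split(side, taxa):
    """Store a bipartition as the side that excludes a fixed reference taxon,
    so that {1, 2} and its complement {3, 4} compare as equal."""
    side = frozenset(side)
    reference = min(taxa)
    return frozenset(taxa) - side if reference in side else side


def robinson_foulds(splits_a, splits_b, taxa):
    """Count splits that appear in exactly one of the two trees."""
    a = {normalize_split(s, taxa) for s in splits_a}
    b = {normalize_split(s, taxa) for s in splits_b}
    return len(a ^ b)  # size of the symmetric difference


taxa = {1, 2, 3, 4}
tree_12_34 = [{1, 2}]  # quartet topology (12)(34) has one internal split
tree_13_24 = [{1, 3}]  # quartet topology (13)(24)

print(robinson_foulds(tree_12_34, tree_12_34, taxa))  # prints 0
print(robinson_foulds(tree_12_34, tree_13_24, taxa))  # prints 2
```

Under this convention a distance of 0 means the reconstructed tree matches the correct tree, and larger values mean more internal branches disagree.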
Length 600 simulation accuracy percentages
| | ||||||||
|---|---|---|---|---|---|---|---|---|
| M=0.1 BSI-JC | 36 | 40 | 37 | 38 | 50 | 45 | 49 | 48 |
| M=0.1 K2P | 28 | 35 | 30 | 31 | 47 | 44 | 44 | 45 |
| M=0.1 ML | 17 | 29 | 21 | 22 | 29 | 28 | 27 | 28 |
| M=0.3 BSI-JC | 57 | 66 | 63 | 62 | 77 | 80 | 80 | 79 |
| M=0.3 K2P | 33 | 57 | 43 | 44 | 70 | 69 | 69 | 69 |
| M=0.3 ML | 48 | 61 | 54 | 54 | 74 | 79 | 78 | 77 |
| M=1.0 BSI-JC | 23 | 41 | 29 | 31 | 70 | 71 | 68 | 70 |
| M=1.0 K2P | 9 | 36 | 16 | 20 | 32 | 36 | 36 | 35 |
| M=1.0 ML | 21 | 43 | 26 | 30 | 76 | 84 | 79 | 80 |
| M=2.0 BSI-JC | 0 | 2 | 0 | 1 | 2 | 5 | 2 | 3 |
| M=2.0 K2P | 0 | 4 | 1 | 2 | 0 | 0 | 0 | 0 |
| M=2.0 ML | 0 | 1 | 0 | 1 | 24 | 36 | 30 | 30 |
Comparison of the percentage of reconstructed trees that match the correct tree using Biologically Symmetric Invariants (BSI) and Kimura 2 parameter invariants (K2P) versus Maximum Likelihood (ML) for sequences of length 600.
Length 600 simulation Robinson-Foulds distances
| | ||||||||
|---|---|---|---|---|---|---|---|---|
| M=0.1 BSI-JC | 1.9 | 1.7 | 1.9 | 1.8 | 1.4 | 1.6 | 1.5 | 1.5 |
| M=0.1 K2P | 2.4 | 2.1 | 2.6 | 2.3 | 1.6 | 1.7 | 1.7 | 1.6 |
| M=0.1 ML | 1.9 | 1.5 | 1.7 | 1.7 | 1.4 | 1.5 | 1.5 | 1.5 |
| M=0.3 BSI-JC | 1.0 | 0.8 | 0.9 | 0.9 | 0.5 | 0.5 | 0.5 | 0.5 |
| M=0.3 K2P | 1.9 | 1.2 | 1.6 | 1.5 | 0.7 | 0.7 | 0.7 | 0.7 |
| M=0.3 ML | 0.8 | 0.6 | 0.7 | 0.7 | 0.4 | 0.3 | 0.3 | 0.3 |
| M=1.0 BSI-JC | 2.3 | 1.7 | 2.3 | 2.1 | 0.7 | 0.7 | 0.8 | 0.7 |
| M=1.0 K2P | 3.6 | 2.2 | 3.0 | 2.9 | 2.2 | 2.0 | 2.0 | 2.1 |
| M=1.0 ML | 1.8 | 1.0 | 1.5 | 1.4 | 0.4 | 0.3 | 0.3 | 0.3 |
| M=2.0 BSI-JC | 6.7 | 7.8 | 7.6 | 7.4 | 7.4 | 6.9 | 7.2 | 7.2 |
| M=2.0 K2P | 6.3 | 7.0 | 6.8 | 6.7 | 11.6 | 11.8 | 11.8 | 11.7 |
| M=2.0 ML | 4.4 | 3.8 | 4.3 | 4.2 | 1.7 | 1.4 | 1.6 | 1.6 |
Comparison of the average Robinson-Foulds distance between the reconstructed tree and the correct tree for sequences of length 600.
Comparison of BSI with relevant invariants only
| M=0.1 BSI-JC | 7 | 38 | 11 | 48 |
| M=0.1 JC-R | 10 | 41 | 11 | 45 |
| M=0.3 BSI-JC | 19 | 62 | 34 | 79 |
| M=0.3 JC-R | 24 | 69 | 32 | 78 |
| M=1.0 BSI-JC | 6 | 31 | 23 | 70 |
| M=1.0 JC-R | 5 | 26 | 14 | 48 |
| M=2.0 BSI-JC | 0 | 1 | 0 | 3 |
| M=2.0 JC-R | 0 | 1 | 0 | 2 |
Comparison of percentage of reconstructed trees that match the correct tree using biologically symmetric invariants (BSI-JC) compared with using only the relevant invariants (JC-R).
Comparison of BSI-invariants to minimal invariants
| CC M=0.1 | 11 | 10 |
| CC M=0.3 | 33 | 32 |
| CC M=1.0 | 20 | 15 |
| CC M=2.0 | 0 | 0 |
| CD M=0.1 | 12 | 10 |
| CD M=0.3 | 36 | 32 |
| CD M=1.0 | 21 | 17 |
| CD M=2.0 | 0 | 0 |
| DD M=0.1 | 10 | 9 |
| DD M=0.3 | 34 | 32 |
| DD M=1.0 | 23 | 20 |
| DD M=2.0 | 0 | 0 |
Percentage of reconstructed trees that match the correct tree using BSI invariants (BSI-JC) vs minimal Jukes Cantor invariants (JC) (length=300 bp).
BSI-JC accuracy based on number of orderings
| CC M=0.1 | 11 | 10 | 11 | 11 | 10 |
| CC M=0.3 | 33 | 33 | 34 | 32 | 27 |
| CC M=1.0 | 20 | 19 | 18 | 18 | 14 |
| CC M=2.0 | 0 | 0 | 0 | 0 | 0 |
| CD M=0.1 | 12 | 13 | 13 | 12 | 11 |
| CD M=0.3 | 36 | 33 | 34 | 33 | 27 |
| CD M=1.0 | 21 | 22 | 20 | 19 | 14 |
| CD M=2.0 | 0 | 0 | 0 | 0 | 0 |
| DD M=0.1 | 10 | 10 | 10 | 10 | 9 |
| DD M=0.3 | 34 | 34 | 34 | 33 | 28 |
| DD M=1.0 | 23 | 24 | 22 | 22 | 18 |
| DD M=2.0 | 0 | 0 | 0 | 0 | 0 |
Percentage of reconstructed trees that match the correct tree using N randomly generated orderings (length=300 bp).
Effect of model selection on reconstruction accuracy for sequences of length 300 and 600 with low, medium or high evolutionary rates
| Sequences generated under Jukes Cantor model assumptions | 35.3 | 31.3 |
| Sequences generated under Kimura 2 Parameter model assumptions | 34.1 | 23.0 |
Percentage of reconstructed trees for data sets generated by Jukes Cantor versus Kimura 2 parameter model assumptions.
Effect of model selection on reconstruction accuracy for sequences of length 5000 with very high evolutionary rates
| Sequences generated under Jukes Cantor model assumptions for tree AA | 24 | 12 |
| Sequences generated under Kimura 2 Parameter model assumptions for tree AA | 56 | 69 |
| Sequences generated under Jukes Cantor model assumptions for tree CC | 99 | 84 |
| Sequences generated under Kimura 2 Parameter model assumptions for tree CC | 63 | 88 |
Percentage of reconstructed trees for data sets generated by Jukes Cantor versus Kimura 2 parameter model assumptions.