Aleksandra Czarna, Rafael Sanjuán, Fernando González-Candelas, Borys Wróbel.
Abstract
BACKGROUND: The least squares (LS) method for constructing confidence sets of trees is closely related to LS tree building methods, in which the goodness of fit of the distances measured on the tree (patristic distances) to the observed distances between taxa is the criterion used for selecting the best topology. The generalized LS (GLS) method for topology testing is often frustrated by the computational difficulties in calculating the covariance matrix and its inverse, which in practice requires approximations. The weighted LS (WLS) allows for a more efficient albeit approximate calculation of the test statistic by ignoring the covariances between the distances.
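The difference between the two statistics described in the abstract can be illustrated with a minimal sketch. It assumes the patristic distances have already been fitted on the topology under test and that the variances and covariances of the observed distances are supplied by the caller; the function names and the toy numbers are hypothetical and are not taken from the article. GLS uses the full covariance matrix, while WLS keeps only its diagonal.

```python
from typing import List, Sequence


def wls_statistic(observed: Sequence[float],
                  patristic: Sequence[float],
                  variances: Sequence[float]) -> float:
    """Weighted LS: squared residuals divided by each distance's variance.

    Covariances between distances are ignored (assumption of the WLS test).
    """
    return sum((d - p) ** 2 / v
               for d, p, v in zip(observed, patristic, variances))


def _solve(a: Sequence[Sequence[float]], b: Sequence[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def gls_statistic(observed: Sequence[float],
                  patristic: Sequence[float],
                  covariance: Sequence[Sequence[float]]) -> float:
    """Generalized LS: r^T V^{-1} r, with r = observed - patristic."""
    residuals = [d - p for d, p in zip(observed, patristic)]
    weighted = _solve(covariance, residuals)  # V^{-1} r without forming V^{-1}
    return sum(r * w for r, w in zip(residuals, weighted))


# Hypothetical toy data: three pairwise distances.
observed = [0.10, 0.20, 0.30]
patristic = [0.12, 0.18, 0.30]
variances = [0.01, 0.02, 0.04]
diagonal_cov = [[0.01, 0.0, 0.0], [0.0, 0.02, 0.0], [0.0, 0.0, 0.04]]

wls = wls_statistic(observed, patristic, variances)
gls = gls_statistic(observed, patristic, diagonal_cov)
```

When the covariance matrix is diagonal the two statistics coincide, which is one way to see WLS as GLS with the off-diagonal terms set to zero. The cost of GLS comes from estimating and inverting the full matrix, whose size grows with the square of the number of pairwise distances.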
Year: 2006 PMID: 17150093 PMCID: PMC1698936 DOI: 10.1186/1471-2148-6-105
Source DB: PubMed Journal: BMC Evol Biol ISSN: 1471-2148 Impact factor: 3.260
Confidence sets of trees derived from mammalian mitochondrial protein data
| (opposum,(mouse,(rabbit,((seal, cow), human)))) | |||||
| (opposum,(mouse,((rabbit,(seal, cow)), human))) | |||||
| (opposum,(mouse,((seal, cow),(rabbit, human)))) | |||||
| (opposum,((mouse,(rabbit,(seal, cow))), human)) | |||||
| (opposum,((rabbit,(seal, cow)),(mouse, human))) | 0.0012 | 0.0090 | |||
| (opposum,(rabbit,(mouse,((seal, cow), human)))) | 0.0000 | 0.0050 | 0.0244 | ||
| (opposum,((rabbit, mouse),((seal, cow), human))) | 0.0039 | 0.0244 | |||
| 0.0037 | 0.0410 | 0.0135 | 0.0494 | ||
| (opposum,(((seal, cow),(rabbit, mouse)), human)) | 0.0135 | 0.0496 | |||
| (opposum,(rabbit,((seal, cow),(mouse, human)))) | 0.0000 | 0.0000 | 0.0130 | 0.0496 | |
| (opposum,((seal, cow),(mouse, (rabbit, human)))) | 0.0004 | 0.0150 | 0.0130 | 0.0494 | |
| (opposum,((seal, cow),(rabbit,(mouse, human)))) | 0.0026 | 0.0180 | 0.0130 | 0.0495 | |
| (opposum,((mouse,(seal, cow)),(rabbit, human))) | 0.0000 | 0.0000 | 0.0130 | 0.0493 | |
| (opposum,(rabbit,((mouse,(seal, cow)), human))) | 0.0000 | 0.0000 | 0.0130 | 0.0494 | |
| (opposum,((rabbit,(mouse,(seal, cow))), human)) | 0.0000 | 0.0120 | 0.0130 | 0.0495 | |
| (opposum,(mouse,(rabbit,(seal,(cow, human))))) | 0.0000 | 0.0010 | 0.0480 | 0.0000 | 0.0000 |
| (opposum,(mouse,(rabbit,(cow,(seal, human))))) | 0.0000 | 0.0030 | 0.0480 | 0.0000 | 0.0000 |
| (opposum,((rabbit, mouse),(seal,(cow, human)))) | 0.0000 | 0.0010 | 0.0010 | 0.0000 | 0.0000 |
| (opposum,(rabbit,(mouse,(seal,(cow, human))))) | 0.0000 | 0.0000 | 0.0010 | 0.0000 | 0.0000 |
| (opposum,(rabbit,(mouse,(cow,(seal, human))))) | 0.0000 | 0.0000 | 0.0010 | 0.0000 | 0.0000 |
Amino acid sequences were taken from six mammalian species: Homo sapiens (human), Phoca vitulina (seal), Bos taurus (cow), Oryctolagus cuniculus (rabbit), Mus musculus (mouse), and Didelphis virginiana (opposum). The tree believed to be the best estimate of mammalian phylogeny [40] has been underlined. Values in bold indicate the trees included in the 0.95 confidence set: for the expected likelihood weights (ELW) test, the trees with the highest confidence levels, which add up to 0.95; for the one-tailed Kishino-Hasegawa (KH) test, the Shimodaira-Hasegawa (SH) test, the generalised least squares (GLS) test, and the weighted least squares (WLS) test, the trees with P-values above 0.05.
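The two inclusion rules in this caption can be sketched as follows. This is an illustration only: the tree labels, weights and P-values are hypothetical, and the cumulative rule for ELW is one straightforward reading of "the trees with the highest confidence levels which add up to 0.95".

```python
from typing import Dict, List


def elw_confidence_set(weights: Dict[str, float], level: float = 0.95) -> List[str]:
    """Add trees in order of decreasing weight until the cumulative weight
    reaches the requested level (assumed reading of the ELW rule)."""
    ranked = sorted(weights.items(), key=lambda item: item[1], reverse=True)
    chosen: List[str] = []
    total = 0.0
    for tree, weight in ranked:
        if total >= level:
            break
        chosen.append(tree)
        total += weight
    return chosen


def pvalue_confidence_set(pvalues: Dict[str, float], alpha: float = 0.05) -> List[str]:
    """Keep every tree whose P-value is above alpha (KH, SH, GLS, WLS rule)."""
    return [tree for tree, p in pvalues.items() if p > alpha]


# Hypothetical example with four candidate topologies.
elw_weights = {"T1": 0.60, "T2": 0.30, "T3": 0.08, "T4": 0.02}
test_pvalues = {"T1": 1.00, "T2": 0.31, "T3": 0.0494, "T4": 0.0000}

elw_set = elw_confidence_set(elw_weights)
pvalue_set = pvalue_confidence_set(test_pvalues)
```

Under the second rule a tree with P = 0.0494 is excluded, although it is only marginally below the threshold.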
Confidence sets of trees derived from eight-taxon data sets obtained from the EMBL-ALIGN database.
| 1 | ALIGN_000002 | 1,2,3,4,5,6,7,8 | 1632 | 6.4141 | 22.9 | 141 | 14 | 9 | 135 |
| 2 | ALIGN_000205 | 2,3,4,6,8,10,11,12 | 1386 | 6.2104 | 4.3 | 15 | 6 | 18 | 9 |
| 3 | ALIGN_000297 | 2,3,4,6,15,16,17,19 | 1167 | 0.0000 | 31.4 | 315 | 258 | 315 | 315 |
| 4 | ALIGN_000397 | 2,3,4,6,7,8,9,10 | 1662 | 0.0000 | 24.3 | 2745 | 328 | 10395 | 10206 |
| 5 | ALIGN_000398 | 1,2,3,4,5,6,7,8 | 1656 | 3.1567 | 0.0 | 477 | 20 | 815 | 77 |
| 6 | ALIGN_000521 | 1,2,3,4,5,6,7,8 | 1325 | 8.4751 | 5.7 | 135 | 11 | 105 | 107 |
| 7 | ALIGN_000623 | 2,3,4,5,6,10,11,12 | 1312 | 0.0000 | 0.0 | 380 | 20 | 10395 | 33 |
| 8 | ALIGN_000628 | 2,3,4,5,7,13,17,31 | 1385 | 0.3071 | 0.0 | 141 | 5 | 117 | 21 |
| 9 | ALIGN_000767 | 2,3,4,5,6,7,8,10 | 1386 | 6.6421 | 4.3 | 15 | 6 | 9 | 9 |
| 10 | ALIGN_000771 | 1,2,3,4,5,6,7,8 | 4547 | 1.8389 | 14.3 | 81 | 9 | 10 | 15 |
| 11 | ALIGN_000788 | 2,3,4,5,6,7,12,14 | 1629 | 0.7085 | 24.3 | 945 | 80 | 225 | 135 |
| 12 | ALIGN_000832 | 2,3,4,5,6,7,9,10 | 1185 | 0.0000 | 10.0 | 327 | 50 | 49 | 315 |
| 13 | ALIGN_000853 | 1,2,3,4,5,6,7,8 | 5307 | 3.3574 | 12.9 | 225 | 20 | 225 | 45 |
| 14 | ALIGN_000930 | 2,3,4,5,6,7,8,9 | 1321 | 0.0000 | 78.6 | 10395 | 8925 | 10395 | 10393 |
| 15 | ALIGN_000931 | 2,3,5,14,15,16,19,21 | 1231 | 0.8196 | 100.0 | 10395 | 9876 | 10395 | 10395 |
| 16 | ALIGN_000984 | 2,3,4,6,7,11,12,13 | 1139 | 2.0933 | 45.7 | 10395 | 2344 | 10395 | 10391 |
The sequences taken from each alignment are listed in the second column. ΔAIC values for the Felsenstein84 nucleotide substitution model were calculated using PAUP* and ModelTest. The percentage of four-taxon subsets for which the star topology was the ML solution (unresolved quartets) was calculated using TreePuzzle. The last four columns show the number of trees, out of 10395 possible, included in the 0.95 confidence set using the expected likelihood weights (ELW) test, the Shimodaira-Hasegawa (SH) test, the generalised least squares (GLS) test, and the weighted least squares (WLS) test.
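The figure of 10395 possible trees matches the standard count of unrooted bifurcating topologies for eight taxa, (2n − 5)!! with n = 8. The short sketch below checks this, and also counts the four-taxon subsets of eight sequences; the function name is illustrative.

```python
import math


def num_unrooted_bifurcating_trees(n_taxa: int) -> int:
    """Number of unrooted, fully resolved topologies: (2n - 5)!! for n >= 3."""
    if n_taxa < 3:
        raise ValueError("at least three taxa are needed")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 5.
    for k in range(3, 2 * n_taxa - 4, 2):
        count *= k
    return count


eight_taxon_trees = num_unrooted_bifurcating_trees(8)
quartets_in_eight_taxa = math.comb(8, 4)  # four-taxon subsets of 8 sequences
```

With 70 quartets per data set, the percentages of unresolved quartets in the table are consistent with whole numbers of quartets (for example, 22.9% corresponds to 16 of 70).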
Figure 1. The 10-, 15- and 20-taxon trees used in the simulations.
The confidence sets and their coverage for the simulated data
| 10 | 19.28 | 5.26 | 5.32 | 14.98 | 0.84 | 1.00 |
| 15 | 14.43 | 3.65 | 8.30 | 14.84 | 0.78 | 1.00 |
| 20 | 57.69 | 32.48 | 52.04 | 95.43 | 0.77 | 1.00 |
The table shows the average number of trees (over 100 simulations) in the 95% confidence sets of GLS, WLS, SH and ELW, out of 100 trees constructed for simulated 10-, 15- and 20-taxon sequences, and the coverage of these confidence sets for GLS and WLS (the frequency with which the confidence set included the true tree).
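The two summaries reported in this table, average confidence-set size and coverage, can be computed as in the sketch below. The data structure (one pair of confidence set and true tree per simulation) and the toy values are assumptions made for illustration, not the authors' pipeline.

```python
from typing import List, Set, Tuple

# One entry per simulated data set: (confidence set of tree ids, id of the true tree).
Simulation = Tuple[Set[int], int]


def average_set_size(simulations: List[Simulation]) -> float:
    """Mean number of trees in the confidence set across simulations."""
    return sum(len(conf_set) for conf_set, _ in simulations) / len(simulations)


def coverage(simulations: List[Simulation]) -> float:
    """Fraction of simulations whose confidence set contains the true tree."""
    hits = sum(1 for conf_set, true_tree in simulations if true_tree in conf_set)
    return hits / len(simulations)


# Hypothetical results from four simulations; tree 0 is the true tree.
toy_runs: List[Simulation] = [
    ({0, 1, 2}, 0),
    ({1, 2}, 0),
    ({0}, 0),
    ({0, 3, 4, 5}, 0),
]

toy_size = average_set_size(toy_runs)
toy_coverage = coverage(toy_runs)
```

A test with nominal level 0.95 would be expected to show coverage near 0.95; values well below that indicate a set that is too small, and values of 1.00 may indicate a conservative one.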
Topology testing with a large data set of closely related Hepatitis C Virus sequences.
| none (ML tree) | ||||
| LC-51 | ||||
| LC-86 | 0.0430 | |||
| LC-26 | 0.0057 | 0.0180 | ||
| LC-24 | 0.0026 | 0.0090 | ||
| LC-59 | 0.0051 | 0.0230 | ||
| LC-53 | ||||
| LC-38 | ||||
| LC-63 | 0.0131 | 0.0210 | ||
| EO-79 | ||||
| EO-47 | ||||
| EO-95 | 0.0009 | 0.0060 | ||
| EO-12 | 0.0055 | 0.0260 | ||
| EO-02 | ||||
| O1–00 | 0.0000 | 0.0000 | 0.0000 | |
| O1–69 | 0.0057 | 0.0170 | ||
| O2–60 | 0.0000 | 0.0000 | 0.0000 | 0 (56449.21) |
| O2–61 | 0.0000 | 0.0000 | 0.0000 | |
| O2–29 | 0.0000 | 0.0000 | 0.0000 | 0 (64100.40) |
| O3–10 | 0.0000 | 0.0000 | 0.0000 | 0 (50280.91) |
| O3–72 | 0.0000 | 0.0000 | 0.0000 | 0 (57742.44) |
| O3–62 | 0.0000 | 0.0000 | 0.0010 | 0 (56689.96) |
| O3–63 | 0.0000 | 0.0000 | 0.0000 | 0 (58317.22) |
| O3–91 | 0.0000 | 0.0000 | 0.0000 | 0 (54015.96) |
| O3–12 | 0.0000 | 0.0000 | 0.0000 | 0 (63679.38) |
| O3–03 | 0.0000 | 0.0000 | 0.0000 | 0 (63107.46) |
| O3–42 | 0.0000 | 0.0000 | 0.0000 | 0 (62828.41) |
| O3–50 | 0.0000 | 0.0000 | 0.0000 | 0 (55717.93) |
| O3–54 | 0.0000 | 0.0000 | 0.0000 | 0 (45640.57) |
| O3–80 | 0.0000 | 0.0000 | 0.0010 | 0 (63024.57) |
| O3–79 | 0.0000 | 0.0000 | 0.0000 | 0 (58853.93) |
| O3–98 | 0.0000 | 0.0000 | 0.0000 | 0 (51672.80) |
The data set consisted of 295 sequences corresponding to the E1–E2 region of the viral genome taken from 23 patients plus 8 local control (LC) sequences of the same genotype (HCV-1b) taken from individuals unrelated to the outbreak. The results from a more detailed analysis with an expanded data set were used to separate the 23 patients into four groups: EO, excluded from the outbreak; O1, involved in the outbreak, transmission chain 1; O2, involved in transmission chain 2; and O3, involved in transmission chain 3. The test set consisted of 32 trees: the ML tree and 31 trees in which the sequences from each patient were moved to form a monophyletic group with the external controls. For each alternative topology, the probability associated with the corresponding test statistic (see abbreviations in Table 1) is shown. Topologies included in the confidence set around the ML tree at the 0.05 level are shown in bold. For WLS, the value of the corresponding statistic is shown in parentheses.
Figure 2. Shape of the chi-square distribution with 42778 degrees of freedom. Panel A shows the probability density; panel B shows the cumulative distribution.
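The number 42778 is consistent with counting the pairwise distances among 295 sequences and subtracting the branches of an unrooted bifurcating tree: 295 × 294 / 2 − (2 × 295 − 3) = 43365 − 587 = 42778. That derivation is an inference from the numbers, not something stated in the caption. The sketch below reproduces it and uses a normal approximation to the chi-square distribution (mean df, variance 2·df), which is reasonable for so many degrees of freedom, to show why WLS statistics in the range reported for the Hepatitis C Virus data give P-values that round to zero. The function names are illustrative.

```python
import math


def ls_degrees_of_freedom(n_sequences: int) -> int:
    """Pairwise distances minus branch lengths of an unrooted bifurcating tree.

    Assumption: this is how the 42778 in the caption arises.
    """
    n_distances = n_sequences * (n_sequences - 1) // 2
    n_branches = 2 * n_sequences - 3
    return n_distances - n_branches


def chi2_upper_tail_normal_approx(x: float, df: int) -> float:
    """Approximate P(X > x) for chi-square(df) by a normal with mean df and
    variance 2 * df. An approximation for large df, not an exact tail."""
    z = (x - df) / math.sqrt(2 * df)
    return 0.5 * math.erfc(z / math.sqrt(2))


df = ls_degrees_of_freedom(295)
std_dev = math.sqrt(2 * df)

# Smallest WLS statistic listed for the rejected topologies: 45640.57.
z_smallest = (45640.57 - df) / std_dev
p_smallest = chi2_upper_tail_normal_approx(45640.57, df)
```

The distribution is concentrated within a few hundred units of 42778, so a statistic several thousand units above the mean lies many standard deviations into the upper tail.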
Figure 3. The analysis of sea dollar DNA/DNA hybridization data using the WLS method compared with the results of bootstrap [33].