Hidenori Nishihara, Norihiro Okada, Masami Hasegawa.
Abstract
BACKGROUND: Ongoing genome sequencing projects have led to a phylogenetic approach based on genome-scale data (phylogenomics), which is beginning to shed light on long-standing unresolved phylogenetic issues. The use of large datasets in phylogenomic analysis results in a global increase in resolution due to a decrease in sampling error. However, a fully resolved tree can still be wrong if the phylogenetic inference is biased.
Year: 2007 PMID: 17883877 PMCID: PMC2375037 DOI: 10.1186/gb-2007-8-9-r199
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
Figure 1. Three phylogenetic hypotheses for the root of the eutherian tree. (a) Tree 1: basal Afrotheria. (b) Tree 2: basal Xenarthra. (c) Tree 3: basal Boreotheria, i.e., an Afrotheria/Xenarthra clade. The phylogenetic relationships within Boreotheria (cow, dog, mouse, rat, human, chimpanzee, and macaque) are fixed in this study.
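With three clades and a single root, there are exactly three rooted, fully resolved topologies, one for each choice of the clade that diverges first; the three hypotheses in Figure 1 therefore cover every possibility. The sketch below enumerates them at the clade level in Newick notation. Collapsing each clade to a single tip is a simplification for illustration, and the helper name `rooted_clade_trees` is hypothetical.

```python
from itertools import combinations

# Clade names as used in Figure 1; each clade is collapsed to a single tip.
CLADES = ("Afrotheria", "Xenarthra", "Boreotheria")

# Figure 1 labels each hypothesis by the clade that branches off first.
FIGURE_LABEL = {"Afrotheria": "Tree 1", "Xenarthra": "Tree 2", "Boreotheria": "Tree 3"}


def rooted_clade_trees(clades):
    """Return {basal clade: Newick string} for every rooted binary tree of three clades."""
    trees = {}
    for a, b in combinations(clades, 2):
        # The clade not in the sister pair is the one that diverges at the root.
        basal = next(c for c in clades if c not in (a, b))
        trees[basal] = f"({basal},({a},{b}));"
    return trees


if __name__ == "__main__":
    trees = rooted_clade_trees(CLADES)
    for basal, newick in sorted(trees.items(), key=lambda kv: FIGURE_LABEL[kv[0]]):
        print(FIGURE_LABEL[basal], newick)
```

Each unordered sister pair fixes the remaining clade as basal, so three pairs give three trees.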
Comparison of log-likelihoods of the three hypotheses under each model
| Concatenate or separate model | Substitution model | Tree | ΔlnL ± SE | KH | wSH | BP | #p | AIC |
|---|---|---|---|---|---|---|---|---|
| Concatenate model | GTR + Γ8 | 1 | -117.2 ± 31.1 | 0.000 | 0.000 | 0.0 | | |
| | | 2 | -147.3 ± 29.7 | 0.000 | 0.000 | 0.0 | | |
| | | 3 | < -4,076,316.3 > | | | 100.0 | 26 | 8,152,684.6 |
| | Codon + Γ4 | 1 | < -3,828,351.7 > | | | 88.1 | 81 | 7,656,865.4 |
| | | 2 | -77.8 ± 64.5 | 0.112 | 0.185 | 11.3 | | |
| | | 3 | -142.7 ± 65.0 | 0.014 | 0.026 | 0.6 | | |
| | JTT-F + Γ8 | 1 | < -1,905,933.9 > | | | 51.6 | 37 | 3,811,941.8 |
| | | 2 | -84.1 ± 37.4 | 0.014 | 0.028 | 0.2 | | |
| | | 3 | -1.7 ± 41.9 | 0.478 | 0.637 | 48.2 | | |
| Separate model (among 2,789 genes) | GTR + Γ8 | 1 | < -3,963,489.9 > | | | 86.2 | 72,514 | 8,072,007.8 |
| | | 2 | -117.4 ± 72.3 | 0.050 | 0.092 | 4.1 | | |
| | | 3 | -91.4 ± 72.7 | 0.104 | 0.174 | 9.7 | | |
| | Codon + Γ4 | 1 | < -3,621,322.1 > | | | 89.6 | 225,909 | 7,694,462.2 |
| | | 2 | -128.0 ± 103.2 | 0.107 | 0.164 | 10.4 | | |
| | | 3 | -527.9 ± 96.3 | 0.000 | 0.000 | 0.0 | | |
| | JTT-F + Γ8 | 1 | < -1,799,245.4 > | | | 93.4 | 103,193 | 3,804,876.8 |
| | | 2 | -134.9 ± 88.5 | 0.064 | 0.112 | 6.6 | | |
| | | 3 | -317.6 ± 85.5 | 0.000 | 0.000 | 0.0 | | |
Maximum likelihood (ML) trees varied with the substitution model used in the concatenate analysis, whereas the separate model analyses consistently supported tree 1. The log-likelihood of the ML tree is given in angled brackets; for each alternative tree, the difference in log-likelihood from the ML tree ± 1 standard error is given, estimated using the formula of Kishino and Hasegawa [28]. Tree numbering corresponds to that in Figure 1. KH and wSH denote P values from the test of Kishino and Hasegawa [28] and the weighted test of Shimodaira and Hasegawa [27], respectively, calculated with the CONSEL program [47]. BP, bootstrap probability (%); AIC, the Akaike Information Criterion [29]; #p, number of parameters of the model.
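The ΔlnL ± SE entries rest on per-site log-likelihoods: the difference between two trees is the sum of the site-wise differences d_i, and the Kishino-Hasegawa estimate of its variance is n/(n - 1) times the sum of squared deviations of the d_i from their mean. A minimal sketch follows, assuming the per-site log-likelihoods for each tree are already available as plain lists; `kh_delta_se` is a hypothetical helper and is not part of CONSEL.

```python
import math
from typing import Sequence, Tuple


def kh_delta_se(site_lnl_alt: Sequence[float], site_lnl_ml: Sequence[float]) -> Tuple[float, float]:
    """Return (delta, se) for lnL(alt) - lnL(ML) with the Kishino-Hasegawa variance estimate.

    delta      = sum_i d_i, where d_i = l_alt,i - l_ML,i
    Var(delta) = n / (n - 1) * sum_i (d_i - mean(d))^2
    """
    if len(site_lnl_alt) != len(site_lnl_ml):
        raise ValueError("both trees must be evaluated on the same sites")
    n = len(site_lnl_alt)
    if n < 2:
        raise ValueError("need at least two sites")
    d = [a - m for a, m in zip(site_lnl_alt, site_lnl_ml)]
    delta = sum(d)
    mean_d = delta / n
    var_delta = n / (n - 1) * sum((x - mean_d) ** 2 for x in d)
    return delta, math.sqrt(var_delta)


if __name__ == "__main__":
    # Toy per-site log-likelihoods for four sites (illustrative numbers, not from the study).
    ml_tree = [-1.0, -2.0, -3.0, -4.0]
    alt_tree = [-1.5, -2.0, -3.5, -4.0]
    delta, se = kh_delta_se(alt_tree, ml_tree)
    print(f"{delta:.1f} ± {se:.3f}")  # -1.0 ± 0.577
```

A negative ΔlnL means the alternative tree fits worse than the ML tree, matching the sign convention in the table. The P values in the KH and wSH columns were computed by CONSEL and are not reproduced by this sketch.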
Comparison of BPs among trees 1 to 3 analyzed with concatenate and separate models
| Model | #c | lnL | #p | #s | #s/#p | AIC | AICc | Tree 1 | Tree 2 | Tree 3 |
|---|---|---|---|---|---|---|---|---|---|---|
| Nucleotide (GTR + Γ8) | 1 | -4,076,316.3 | 26 | 1,011,870 | 38,918.1 | | 8,152,684.6 | 0.0 | 0.0 | 100.0 |
| | 5 | -4,059,904.9 | 130 | 1,011,870 | 7,783.6 | | 8,120,069.8 | 0.0 | 0.0 | 100.0 |
| | 10 | -4,058,547.6 | 260 | 1,011,870 | 3,891.8 | | 8,117,615.3 | 0.0 | 0.0 | 100.0 |
| | 56 | -4,055,469.5 | 1,456 | 1,011,870 | 695.0 | | 8,113,855.2 | 0.1 | 0.0 | 99.9 |
| | 100 | -4,053,634.1 | 2,600 | 1,011,870 | 389.2 | | 8,112,481.6 | 0.1 | 0.0 | 99.9 |
| | 200 | -4,049,237.9 | 5,200 | 1,011,870 | 194.6 | | 8,108,929.5 | 0.2 | 0.0 | 99.8 |
| | 558 | -4,035,535.0 | 14,508 | 1,011,870 | 69.7 | | 8,100,508.1 | 1.7 | 0.0 | 98.3 |
| | 930 | -4,022,303.0 | 24,180 | 1,011,870 | 41.8 | | 8,094,150.0 | 3.6 | 0.0 | 96.4 |
| | 1,395 | -4,006,623.4 | 36,270 | 1,011,870 | 27.9 | 8,085,786.8 | | 25.0 | 0.7 | 74.3 |
| Codon (+ Γ4) | 1 | -3,828,351.7 | 81 | 337,290 | 4,164.1 | | 7,656,865.4 | 88.1 | 11.3 | 0.6 |
| | 5 | -3,810,589.3 | 405 | 337,290 | 832.8 | | 7,621,989.6 | 94.3 | 5.1 | 0.7 |
| | 10 | -3,808,198.7 | 810 | 337,290 | 416.4 | | 7,618,021.3 | 93.3 | 5.9 | 0.8 |
| | 56 | -3,802,941.9 | 4,536 | 337,290 | 74.4 | | 7,615,079.5 | 93.0 | 5.2 | 1.7 |
| | 200 | -3,791,928.7 | 16,200 | 337,290 | 20.8 | 7,616,257.4 | | 91.0 | 8.1 | 1.0 |
| | 558 | -3,766,336.0 | 45,198 | 337,290 | 7.5 | 7,623,068.0 | | 96.7 | 2.9 | 0.3 |
| | 930 | -3,741,173.9 | 75,330 | 337,290 | 4.5 | 7,633,007.8 | | 98.0 | 1.7 | 0.3 |
| | 1,395 | -3,712,084.5 | 112,995 | 337,290 | 3.0 | 7,650,159.0 | | 96.2 | 3.8 | 0.0 |
| | 2,789 | -3,621,322.1 | 225,909 | 337,290 | 1.5 | 7,694,462.2 | | 89.6 | 10.4 | 0.0 |
| Amino acid (JTT-F + Γ8) | 1 | -1,905,933.9 | 37 | 337,290 | 9,115.9 | | 3,811,941.8 | 51.6 | 0.2 | 48.2 |
| | 5 | -1,879,320.4 | 185 | 337,290 | 1,823.2 | | 3,759,011.0 | 63.4 | 0.2 | 36.5 |
| | 10 | -1,877,405.7 | 370 | 337,290 | 911.6 | | 3,755,552.2 | 63.9 | 0.3 | 35.9 |
| | 100 | -1,873,607.4 | 3,700 | 337,290 | 91.2 | | 3,754,696.9 | 58.7 | 0.5 | 40.9 |
| | 200 | -1,870,213.5 | 7,400 | 337,290 | 45.6 | | 3,755,559.0 | 59.8 | 0.2 | 40.1 |
| | 558 | -1,858,842.6 | 20,646 | 337,290 | 16.3 | 3,758,977.2 | | 81.2 | 1.1 | 17.7 |
| | 930 | -1,847,528.8 | 34,410 | 337,290 | 9.8 | 3,763,877.6 | | 81.6 | 6.5 | 11.9 |
| | 1,395 | -1,834,624.0 | 51,615 | 337,290 | 6.5 | 3,772,478.0 | | 87.1 | 10.9 | 2.0 |
| | 2,789 | -1,799,245.4 | 103,193 | 337,290 | 3.3 | 3,804,876.8 | | 93.4 | 6.6 | 0.0 |
Maximum likelihood (ML) analyses with nucleotide, codon, and amino acid substitution models, and comparison of bootstrap probabilities (BPs, %) among trees 1 to 3. Concatenate (#c = 1) and separate analyses were performed for each dataset. #c is the number of categories into which the 2,789 genes were divided according to their total branch length, #p is the number of parameters, and #s is the number of characters (sites). AIC is the Akaike Information Criterion, and AICc is the AIC with a second-order correction. AIC with #s/#p > 40 and AICc with #s/#p < 40 are shown in italics. The best models based on AIC or AICc are shown in bold.
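The AIC and AICc values in the table follow from lnL, #p, and #s alone: AIC = -2 lnL + 2#p, and AICc adds 2#p(#p + 1)/(#s - #p - 1). In the separate analyses, #p is consistent with each category receiving its own copy of the model (26, 81, and 37 parameters per category for the nucleotide, codon, and amino acid models, inferred here from the #c = 1 rows). The sketch below, with hypothetical helper names, recomputes a few table entries and reports which side of the #s/#p = 40 threshold used in the caption a row falls on; by a common rule of thumb, AICc is preferred when this ratio is below about 40.

```python
# Parameters per category, inferred from the #c = 1 rows of the table (an assumption).
PARAMS_PER_CATEGORY = {"GTR+G8": 26, "codon+G4": 81, "JTT-F+G8": 37}


def aic(lnl: float, k: int) -> float:
    """Akaike Information Criterion: -2 lnL + 2k."""
    return -2.0 * lnl + 2.0 * k


def aicc(lnl: float, k: int, n: int) -> float:
    """AIC with the second-order (small-sample) correction 2k(k + 1)/(n - k - 1)."""
    if n - k - 1 <= 0:
        raise ValueError("AICc is undefined when n <= k + 1")
    return aic(lnl, k) + 2.0 * k * (k + 1) / (n - k - 1)


def total_params(model: str, n_categories: int) -> int:
    """#p for a separate analysis: one independent copy of the model per category."""
    return PARAMS_PER_CATEGORY[model] * n_categories


def small_sample(n: int, k: int, threshold: float = 40.0) -> bool:
    """True when #s/#p is below the threshold (rule of thumb: prefer AICc)."""
    return n / k < threshold


if __name__ == "__main__":
    # (model, #c, lnL, #s) copied from rows of the table.
    rows = [
        ("codon+G4", 5, -3_810_589.3, 337_290),
        ("codon+G4", 2_789, -3_621_322.1, 337_290),
        ("JTT-F+G8", 200, -1_870_213.5, 337_290),
    ]
    for model, c, lnl, n in rows:
        k = total_params(model, c)
        label = "small-sample" if small_sample(n, k) else "large-sample"
        print(model, c, k, f"AIC={aic(lnl, k):,.1f}", f"AICc={aicc(lnl, k, n):,.1f}", label)
```

When #s/#p is large the correction term is tiny (for #c = 1 it changes nothing at one decimal place), whereas at #c = 2,789 it would add several hundred thousand units, which is why the choice between AIC and AICc matters most for the heavily partitioned models.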
Figure 2. BPs of the three trees for datasets constructed by successively removing the 50 most rapidly evolving genes. The horizontal axis shows the number of genes removed from the whole dataset of 2,789 genes. The datasets were analyzed using (a) the concatenate model; (b) the separate model, in which each category contains 50 genes grouped according to their total branch length; and (c) the separate model in which each gene was given its own parameters. Each analysis was performed using nucleotide (GTR + Γ8; left column of panels), codon (+ Γ4; middle column), and amino acid (JTT + Γ8; right column) substitution models.
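The datasets behind Figure 2 can be described as two list operations on genes ranked by total branch length: drop the k fastest-evolving genes, then, for panel (b), cut the remaining genes into consecutive groups of 50. The sketch below assumes each gene is represented only by a name and its total branch length, and that a final category smaller than 50 is kept as is; the caption does not say how leftover genes were handled, so both points are assumptions, and `remove_fastest` and `categorize` are hypothetical helpers. Under this rule, 2,789 genes give 56 categories, which is consistent with the #c = 56 row of the table.

```python
from typing import List, Tuple

Gene = Tuple[str, float]  # (gene name, total branch length); hypothetical representation


def remove_fastest(genes: List[Gene], n_remove: int) -> List[Gene]:
    """Drop the n_remove genes with the largest total branch length."""
    ranked = sorted(genes, key=lambda g: g[1], reverse=True)
    return ranked[n_remove:]


def categorize(genes: List[Gene], size: int = 50) -> List[List[Gene]]:
    """Sort genes by total branch length and cut them into consecutive groups of `size`.

    The last group holds the remainder and may be smaller than `size` (an assumption).
    """
    ranked = sorted(genes, key=lambda g: g[1])
    return [ranked[i:i + size] for i in range(0, len(ranked), size)]


if __name__ == "__main__":
    # Synthetic branch lengths for illustration only.
    genes = [(f"gene{i}", 0.001 * i) for i in range(2_789)]
    for removed in range(0, 301, 50):
        kept = remove_fastest(genes, removed)
        cats = categorize(kept, 50)
        print(f"removed={removed} kept={len(kept)} categories={len(cats)}")
```

Re-categorizing after each removal step keeps every category at 50 genes (apart from the remainder) as the dataset shrinks; whether the original analysis re-binned in this way is not stated.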