Chih-Horng Kuo, John P Wares, Jessica C Kissinger.
Abstract
The protistan phylum Apicomplexa contains many important pathogens and is the subject of intense genome sequencing efforts. Based upon the genome sequences from seven apicomplexan species and a ciliate outgroup, we identified 268 single-copy genes suitable for phylogenetic inference. Both concatenation and consensus approaches inferred the same species tree topology. This topology is consistent with most prior conceptions of apicomplexan evolution based upon ultrastructural and developmental characters: the piroplasm genera Theileria and Babesia form the sister group to the Plasmodium species, the coccidian genera Eimeria and Toxoplasma are monophyletic and are the sister group to the Plasmodium species and piroplasm genera, and Cryptosporidium forms the sister group to all of the above, with the ciliate Tetrahymena as the outgroup. The level of incongruence among gene trees appears to be high at first glance; only 19% of the genes support the species tree, and a total of 48 different gene-tree topologies are observed. Detailed investigations suggest that the low signal-to-noise ratio in many genes may be the main source of incongruence. The probability of being consistent with the species tree increases as a function of the minimum bootstrap support observed at tree nodes for a given gene tree. Moreover, gene sequences that generate high bootstrap support are robust to changes in alignment parameters or in the phylogenetic method used. However, caution should be taken in that some genes can infer a "wrong" tree with strong support because of paralogy, model violations, or other causes. The importance of examining multiple, unlinked genes that possess a strong phylogenetic signal cannot be overstated.
Year: 2008 PMID: 18820254 PMCID: PMC2582981 DOI: 10.1093/molbev/msn213
Source DB: PubMed Journal: Mol Biol Evol ISSN: 0737-4038 Impact factor: 16.240
List of Species Name Abbreviations and Data Sources
| Abbreviation | Species Name | Data Source | Version Date | Number of Proteins | Genome Size (Mb) |
| | | GenBank | 06 August 2007 | 3,703 | 8 |
| | | CryptoDB.org | 13 November 2007 | 3,805 | 9 |
| | | GeneDB.org | 01 January 2005 | 11,393 | 60 |
| | | PlasmoDB.org | 24 September 2007 | 5,460 | 23 |
| | | PlasmoDB.org | 24 September 2007 | 5,352 | 27 |
| | | GeneDB.org | 17 July 2005 | 3,795 | 8 |
| | | ToxoDB.org | 01 November 2007 | 7,793 | 63 |
| | | J. Craig Venter Institute | 04 October 2006 | 27,424 | 104 |
The annotated protein sequences were downloaded from the respective data source with the version date as indicated.
All annotated protein sequences from each species are used to identify single-copy genes that are shared by all species.
The free-living ciliate, T. thermophila, is included as the outgroup.
Fig. 1. The inferred apicomplexan species tree. The ML tree is generated from the concatenated alignment of 268 single-copy genes (71,830 aligned amino acid sites). One free-living ciliate, Tetrahymena thermophila, is included as the outgroup to root the tree. Bootstrap support based on 100 replicates is 100% for all internal branches. Labels above branches indicate the level of consensus support (%) based on ML, MP, and NJ.
Fig. 2. Frequency distribution of gene-tree topologies. Based on the 268 single-copy genes examined, we observed a total of 48 gene-tree topologies. The six most frequently observed gene-tree topologies, each supported by more than 5% of the genes, are provided in figure 3.
Fig. 3. The six most frequently observed gene-tree topologies. Each topology is supported by more than 5% of the 268 genes examined. The exact count and frequency of genes that support (or significantly reject) each topology are provided under the tree. ML: frequency of genes that infer the specific topology using ML inference; AU: frequency of genes that significantly reject the topology using the AU test; SH: frequency of genes that significantly reject the topology using the SH test.
Effects of Removing Genes Based on the Minimum Bootstrap Support
| Minimum Bootstrap Cutoff (%) | Number of Genes | Number of Topologies | Percentage of Genes that Inferred the Species Tree | Clade Support Based on ML Consensus (%) | |
| | | | | (( | ((( |
| 0 | 268 | 48 | 19 | 44 | 38 |
| 50 | 130 | 25 | 25 | 50 | 40 |
| 60 | 69 | 15 | 29 | 55 | 49 |
| 70 | 30 | 10 | 47 | 63 | 60 |
| 80 | 15 | 5 | 73 | 73 | 80 |
| 90 | 5 | 1 | 100 | 100 | 100 |
The bootstrap support for each gene is inferred by the ML method based on 100 replicates. A gene is removed from the analysis if the minimum bootstrap support observed on the gene tree does not meet the cutoff.
Number of observed gene-tree topologies based on ML.
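The cutoff rule in the footnote above can be illustrated with a minimal Python sketch. The function name, the dictionary layout, and the bootstrap values are hypothetical and are not taken from the study; the sketch also assumes the cutoff is inclusive, so a gene whose weakest node equals the cutoff is kept.

```python
def filter_by_min_bootstrap(gene_supports, cutoff):
    """Keep genes whose weakest internal node meets the cutoff (in %).

    gene_supports maps a gene name to the list of bootstrap values
    observed on the internal nodes of that gene's tree.
    """
    return {
        gene: supports
        for gene, supports in gene_supports.items()
        if supports and min(supports) >= cutoff  # assumption: inclusive cutoff
    }


# Hypothetical bootstrap values (%), invented for illustration only.
example = {
    "geneA": [100, 95, 92, 99, 91],
    "geneB": [88, 45, 70, 100, 63],
    "geneC": [72, 81, 76, 90, 85],
}

kept = filter_by_min_bootstrap(example, 70)
print(sorted(kept))
```

Because the filter depends on the single weakest node, one poorly supported branch is enough to exclude a gene, which is consistent with the steep drop in gene counts at higher cutoffs in the table.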
Robustness to Alignment Settings as a Function of the Minimum Bootstrap Support
| Minimum Bootstrap Cutoff (%) | Percentage of Genes in Each Class | | |
| | Robust | Intermediate | Sensitive |
| 0 | 60 | 27 | 12 |
| 50 | 77 | 18 | 5 |
| 60 | 83 | 16 | 1 |
| 70 | 90 | 10 | 0 |
| 80 | 93 | 7 | 0 |
| 90 | 100 | 0 | 0 |
Genes are categorized into three classes based on the sensitivity to sequence alignment settings.
A gene is classified as robust if it produces the same gene-tree topology under all three alignment settings (for details, see Materials and Methods).
A gene is classified as intermediate if it produces the same gene-tree topology under two out of the three alignment settings.
A gene is classified as sensitive if each alignment setting leads to a different gene-tree topology.
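The three classes defined above reduce to counting how many distinct topologies a gene produces across the three alignment settings. The following Python sketch shows that logic; it assumes each topology has already been reduced to a canonical, hashable form (for example a normalized Newick string), which is an assumption of this example and not a detail given in the text.

```python
def classify_alignment_sensitivity(topologies):
    """Classify a gene from its topologies under three alignment settings.

    topologies: sequence of three canonical, hashable topology
    representations (assumed here, e.g. normalized Newick strings).
    """
    if len(topologies) != 3:
        raise ValueError("expected exactly three topologies")
    distinct = len(set(topologies))
    # 1 distinct topology: all settings agree; 2: two of three agree;
    # 3: every setting gives a different topology.
    return {1: "robust", 2: "intermediate", 3: "sensitive"}[distinct]


# Hypothetical placeholder topologies, for illustration only.
print(classify_alignment_sensitivity(["t1", "t1", "t1"]))
print(classify_alignment_sensitivity(["t1", "t2", "t1"]))
print(classify_alignment_sensitivity(["t1", "t2", "t3"]))
```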
Robustness to Substitution Model as a Function of the Minimum Bootstrap Support
| Minimum Bootstrap Cutoff (%) | Percentage of Genes in Each Class | | | | |
| | JTT = LG = WAG | JTT = LG | JTT = WAG | LG = WAG | All Different |
| 0 | 67 | 6 | 10 | 10 | 7 |
| 50 | 79 | 5 | 7 | 6 | 3 |
| 60 | 84 | 6 | 4 | 4 | 1 |
| 70 | 93 | 0 | 0 | 7 | 0 |
| 80 | 93 | 0 | 0 | 7 | 0 |
| 90 | 80 | 0 | 0 | 20 | 0 |
Genes are categorized into five classes based on the agreements among the three substitution models used in ML inference. Note that this classification only concerns the consistency of gene-tree topologies inferred by different substitution models for each individual gene. The agreement between a gene tree and the species tree is not considered.
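The five-class scheme described above can be sketched in Python as follows. The function name and input layout are hypothetical, and the topologies are assumed to be canonical, hashable values so that equality means identical topology. Because equality is transitive, at most one pair can agree when the three topologies are not all identical, so the five classes are exhaustive and mutually exclusive.

```python
from itertools import combinations


def agreement_class(topologies):
    """Return the agreement class for three labelled topologies.

    topologies: dict mapping three labels (e.g. model names) to a
    canonical, hashable topology representation (an assumption here).
    """
    if len(topologies) != 3:
        raise ValueError("expected exactly three labelled topologies")
    labels = sorted(topologies)
    if len(set(topologies.values())) == 1:
        return " = ".join(labels)
    # If not all three agree, at most one pair can be identical.
    for a, b in combinations(labels, 2):
        if topologies[a] == topologies[b]:
            return f"{a} = {b}"
    return "All Different"


# Hypothetical placeholder topologies, for illustration only.
print(agreement_class({"JTT": "t1", "LG": "t1", "WAG": "t1"}))
print(agreement_class({"JTT": "t1", "LG": "t2", "WAG": "t2"}))
```

The same function applies unchanged to the comparison of inference methods by passing the labels ML, MP, and NJ.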
Methodological Concordance as a Function of the Minimum Bootstrap Support
| Minimum Bootstrap Cutoff (%) | Percentage of Genes in Each Class | | | | |
| | ML = MP = NJ | ML = MP | ML = NJ | MP = NJ | All Different |
| 0 | 22 | 12 | 25 | 8 | 34 |
| 50 | 32 | 14 | 34 | 5 | 15 |
| 60 | 43 | 7 | 38 | 4 | 7 |
| 70 | 57 | 7 | 33 | 3 | 0 |
| 80 | 60 | 7 | 33 | 0 | 0 |
| 90 | 100 | 0 | 0 | 0 | 0 |
Genes are categorized into five classes based on the agreements among the three phylogenetic methods used. Note that this classification only concerns the consistency of gene-tree topologies inferred by different phylogenetic methods for each individual gene. The agreement between a gene tree and the species tree is not considered. Because we used the strict consensus method to consolidate all equally parsimonious trees of a gene, a multifurcating MP tree always has a nonzero topology distance from a fully bifurcating ML or NJ tree.
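The last point of the footnote above, that a multifurcating strict-consensus tree can never be at zero distance from a fully bifurcating tree, can be illustrated with a small Python sketch. The distance measure used in the study is not specified in this excerpt; the sketch assumes a Robinson-Foulds-style symmetric difference computed on rooted clades, uses nested tuples as a hypothetical tree encoding, and uses made-up taxon labels A to D.

```python
def clades(tree):
    """Return the set of non-trivial clades of a nested-tuple rooted tree.

    Leaves are strings; internal nodes are tuples of children.
    """
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        leaves = frozenset().union(*(visit(child) for child in node))
        found.add(leaves)
        return leaves

    all_leaves = visit(tree)
    found.discard(all_leaves)  # the root clade is shared by every tree
    return found


def symmetric_difference_distance(tree_a, tree_b):
    """Count clades present in exactly one of the two trees."""
    return len(clades(tree_a) ^ clades(tree_b))


# Hypothetical four-taxon trees, for illustration only.
resolved = ((("A", "B"), "C"), "D")   # fully bifurcating
polytomy = (("A", "B", "C"), "D")     # strict consensus with one polytomy

print(symmetric_difference_distance(resolved, resolved))
print(symmetric_difference_distance(resolved, polytomy))
```

A fully bifurcating tree always carries more clades than a tree with a polytomy on the same taxa, so the two clade sets cannot be equal and the distance is at least one. Under this assumption, part of the disagreement involving MP in the table may reflect unresolved consensus nodes and not necessarily a conflicting topology.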
Effects of Taxon Removal
| Removal of the outgroup | |||||
| Minimum bootstrap cutoff (%) | Number of genes | Number of topologies | Consensus support based on ML (%) | | |
| | | | (( | (( | (( |
| 0 | 268 | 16 | 57 | 20 | 22 |
| 50 | 215 | 10 | 62 | 17 | 20 |
| 60 | 169 | 8 | 64 | 16 | 20 |
| 70 | 124 | 7 | 48 | 15 | 17 |
| 80 | 81 | 4 | 70 | 14 | 15 |
| 90 | 42 | 3 | 71 | 10 | 19 |