Abstract
A phylogenetic profile captures the pattern of gene gain and loss throughout evolutionary time. Proteins that interact directly or indirectly within the cell to perform a biological function will often co-evolve, and this co-evolution should be well reflected within their phylogenetic profiles. Thus, similar phylogenetic profiles are commonly used for grouping proteins into functional groups. However, it remains unclear how the size and content of the phylogenetic profile affect the ability to predict function, particularly in Eukaryotes. Here we developed a straightforward approach to address this question by constructing a complete set of phylogenetic profiles for 31 fully sequenced Eukaryotes. Using Gene Ontology as our gold standard, we compared the accuracy of functional predictions made by a comprehensive array of permutations on the complete set of genomes. Our permutations showed that phylogenetic profiles containing between 25 and 31 Eukaryotic genomes performed equally well and significantly better than all other permuted genome sets, with one exception: we uncovered a core group of 18 genomes that achieved statistically identical accuracy. This core group contained genomes from each branch of the eukaryotic phylogeny, but also contained several groups of closely related organisms, suggesting that a balance between phylogenetic breadth and depth may improve our ability to use Eukaryote-specific phylogenetic profiles for functional annotations.
Keywords: comparative genomics; functional prediction; phylogenetic profiles
Year: 2008 PMID: 19204819 PMCID: PMC2614202 DOI: 10.4137/ebo.s863
Source DB: PubMed Journal: Evol Bioinform Online ISSN: 1176-9343 Impact factor: 1.625
Figure 1. The functional prediction accuracy of phylogenetic profiles from 30 different eukaryotic genome sets. Accuracy was determined by t-test comparison of the average Hamming distance for phylogenetic profiles within a functional module versus phylogenetic profiles in all 885 other functional modules. P values were adjusted to account for multiple testing by the method described in (Storey, 2002). The plot depicts the number of corrected p values per functional module and demonstrates that there are 4 zones of decreasing accuracy. Within each zone, the performance of the phylogenetic profiles for predicting function is statistically indistinguishable; between zones, performance declines significantly.
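The comparison described in the Figure 1 caption can be illustrated with a minimal sketch. It assumes that profiles are binary presence/absence strings with one character per genome, and it uses Welch's t statistic because the caption does not say which t-test variant was applied. The function names and the toy profiles are hypothetical, and the multiple-testing correction of (Storey, 2002) is not shown.

```python
from itertools import combinations, product
from math import sqrt
from statistics import mean, variance


def hamming(a, b):
    """Number of genomes in which two presence/absence profiles differ."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same genomes")
    return sum(x != y for x, y in zip(a, b))


def within_distances(module):
    """Hamming distances between all pairs of profiles inside one module."""
    return [hamming(a, b) for a, b in combinations(module, 2)]


def between_distances(module, others):
    """Hamming distances from each module profile to each outside profile."""
    return [hamming(a, b) for a, b in product(module, others)]


def welch_t(xs, ys):
    """Welch's t statistic (unequal variances). Assumed variant, not from the source."""
    return (mean(xs) - mean(ys)) / sqrt(
        variance(xs) / len(xs) + variance(ys) / len(ys)
    )


# Toy data (invented): 5 genomes, one module of 3 profiles, 3 outside profiles.
module = ["11110", "11100", "11111"]
others = ["00011", "00001", "00111"]

within = within_distances(module)
between = between_distances(module, others)
t_stat = welch_t(within, between)
print(within, round(mean(between), 2), t_stat < 0)
```

A negative statistic here means that profiles inside the module are, on average, closer to each other than to profiles outside it, which is the signal the accuracy measure relies on.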
Order and identity of genome deletions for the 30 permutations depicted in Figure 1. Genomes are listed in decreasing order from largest to smallest outlier. Phylogenetic profiles for a given genome set were grouped into biological processes defined by Gene Ontology; a pairwise distance matrix was then generated to identify the outlier genome (no ties were discovered). To evaluate the effect of size and composition on the power of a phylogenetic profile to predict function, outlier genomes were removed one by one until the size of the genome set equaled 2, i.e. the set was composed of just Hsa and Mus.
| Genome |
|---|
| Cint |
| Ame |
| Cel |
| Aga |
| Spu |
| Sce |
| Ath |
| Dme |
| Spa |
| Cgl |
| Cal |
| Spo |
| Sba |
| Sca |
| Gga |
| Dre |
| Tni |
| Fru |
| Sku |
| Mmu |
| Xtr |
| Skl |
| Ptr |
| Mdo |
| Smi |
| Cfa |
| Rno |
| Bta |
| Ecu |
| Hsa* |
| Mus* |
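The one-by-one removal procedure in the table caption can be sketched as follows. The source does not state how the outlier is chosen from the pairwise distance matrix, so this sketch assumes it is the genome with the largest mean Hamming distance to all other genomes. The genome labels `G1` to `G4` and the profile values are invented placeholders, not data from the study.

```python
def genome_distance(profiles, i, j):
    """Number of profiles in which genomes i and j disagree."""
    return sum(p[i] != p[j] for p in profiles)


def find_outlier(profiles, n_genomes):
    """Index of the genome with the largest mean distance to the others.

    Assumed criterion; the source only says a pairwise distance matrix
    was used. Ties resolve to the first index (the source reports none).
    """
    def mean_distance(i):
        total = sum(
            genome_distance(profiles, i, j) for j in range(n_genomes) if j != i
        )
        return total / (n_genomes - 1)

    return max(range(n_genomes), key=mean_distance)


def removal_order(profiles, genomes, min_size=2):
    """Remove outlier genomes one by one until min_size genomes remain."""
    profiles = [list(p) for p in profiles]  # copy so the input is untouched
    genomes = list(genomes)
    order = []
    while len(genomes) > min_size:
        k = find_outlier(profiles, len(genomes))
        order.append(genomes.pop(k))
        for p in profiles:
            del p[k]  # drop that genome's column from every profile
    return order, genomes


# Toy data (invented): rows are protein profiles, columns are genomes.
genomes = ["G1", "G2", "G3", "G4"]
profiles = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 1, 1, 1],
    [1, 1, 0, 1],
    [1, 1, 1, 0],
]
order, remaining = removal_order(profiles, genomes)
print(order, remaining)  # ['G4', 'G3'] ['G1', 'G2']
```

Each intermediate genome set produced by this loop corresponds to one permutation whose profiles would then be scored for prediction accuracy.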
Functional modules predicted with 80% or higher accuracy by the 30 genome sets tested in the present study. A functional module includes all Gene Ontology (GO) terms along the path to the root of the GO process ontology, excluding those closest to the root (specifically, terms at levels 1 and 2).
| Parent GO ID | Biological Process Description | Size of Module | # Profiles in Subgraph |
|---|---|---|---|
| GO:0048691 | positive regulation of axon extension involved in regeneration | 24 | 309 |
| GO:0048478 | replication fork protection | 32 | 267 |
| GO:0048128 | oocyte axis determination, oocyte nuclear migration | 43 | 274 |
| GO:0046638 | positive regulation of alpha-beta T cell differentiation | 69 | 687 |
| GO:0045500 | sevenless signaling pathway | 30 | 378 |
| GO:0045082 | positive regulation of interleukin-10 biosynthetic process | 73 | 677 |
| GO:0043306 | positive regulation of mast cell degranulation | 15 | 223 |
| GO:0042776 | mitochondrial ATP synthesis coupled proton transport | 29 | 291 |
| GO:0035067 | negative regulation of histone acetylation | 44 | 660 |
| GO:0035056 | negative regulation of nuclear mRNA splicing via U2-type spliceosome | 22 | 216 |
| GO:0030702 | chromatin silencing at centromere | 33 | 371 |
| GO:0008377 | light-induced release of internally sequestered calcium ion | 47 | 482 |
| GO:0007253 | cytoplasmic sequestering of NF-kappaB | 78 | 569 |
| GO:0000752 | agglutination during conjugation with cellular fusion | 10 | 203 |
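The definition of a functional module given in the table caption can be sketched as an ancestor walk over the ontology graph. This sketch assumes that the root sits at level 1 and that a term's level is one more than its shortest distance to the root; the source does not define how levels are counted. The term names are hypothetical placeholders rather than real GO identifiers.

```python
def term_levels(parents, root):
    """Level of each term: root is level 1, its children level 2, and so on.

    Uses the shortest path to the root (an assumption of this sketch).
    """
    children = {}
    for term, term_parents in parents.items():
        for p in term_parents:
            children.setdefault(p, []).append(term)
    levels = {root: 1}
    frontier = [root]
    while frontier:
        next_frontier = []
        for t in frontier:
            for c in children.get(t, []):
                if c not in levels:
                    levels[c] = levels[t] + 1
                    next_frontier.append(c)
        frontier = next_frontier
    return levels


def functional_module(term, parents, levels, min_level=3):
    """The term plus all its ancestors, minus terms at levels 1 and 2."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t in seen:
            continue
        seen.add(t)
        stack.extend(parents.get(t, []))
    return {t for t in seen if levels[t] >= min_level}


# Toy ontology (invented): each term maps to its direct parents.
parents = {
    "root": [],
    "A": ["root"],
    "B": ["A"],
    "C": ["B"],
    "D": ["B", "A"],
}
levels = term_levels(parents, "root")
print(sorted(functional_module("C", parents, levels)))  # ['B', 'C']
```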
Figure 2. Heatmap showing significant differences in the accuracy of the functional predictions made by the 30 different genome sets tested in the present study. Significance was measured by Kolmogorov-Smirnov (KS) tests. Red indicates a non-significant p value (p = 1); blue indicates a significant p value (p = 0). The 4 large zones of accuracy also evident in Figure 1 (Z1–Z4) are shown here to differ significantly, with all KS-test p values < 0.01.
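The pairwise comparison behind the Figure 2 heatmap can be illustrated with a small sketch of the two-sample KS statistic, which is the largest gap between the empirical distribution functions of two samples. The sketch computes only the statistic, not the p value shown in the heatmap; in practice a library routine such as `scipy.stats.ks_2samp` would supply both. The genome-set labels and accuracy values are invented for illustration.

```python
from bisect import bisect_right


def ks_statistic(xs, ys):
    """Two-sample KS statistic: max |F_x(v) - F_y(v)| over all observed v."""
    xs, ys = sorted(xs), sorted(ys)
    d = 0.0
    for v in xs + ys:
        fx = bisect_right(xs, v) / len(xs)  # empirical CDF of xs at v
        fy = bisect_right(ys, v) / len(ys)  # empirical CDF of ys at v
        d = max(d, abs(fx - fy))
    return d


def pairwise_ks(samples):
    """Symmetric matrix of KS statistics between every pair of samples."""
    names = list(samples)
    return {
        a: {b: ks_statistic(samples[a], samples[b]) for b in names}
        for a in names
    }


# Toy data (invented): per-module accuracy scores for three genome sets.
samples = {
    "set_A": [0.90, 0.85, 0.80, 0.95],
    "set_B": [0.88, 0.84, 0.81, 0.93],
    "set_C": [0.40, 0.35, 0.50, 0.45],
}
matrix = pairwise_ks(samples)
print(matrix["set_A"]["set_C"])  # 1.0, the two samples do not overlap
```

Sets with statistically indistinguishable accuracy would yield small statistics against each other and large ones against sets in a different zone, which is the block structure the heatmap displays.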