Wai Lok Sibon Li, Allen G Rodrigo.
Abstract
Recent studies have shown evidence for the coevolution of functionally related genes. This coevolution results from constraints that maintain functional relationships between interacting proteins. These studies have focused on the correlation in gene tree branch lengths of proteins that interact directly with each other. Here we hypothesize that the correlation in branch lengths is not limited to proteins that directly interact, but also extends to proteins that operate within the same pathway. Using generalized linear models (GLMs) as the basis for identifying correlation, we attempted to predict the gene ontology (GO) terms of a gene from its gene tree branch lengths. We applied our method to a dataset consisting of proteins from ten prokaryotic species. We found that the accuracy with which we could predict the function of the proteins from their gene trees varied substantially between GO terms. In particular, our model could accurately predict genes involved in translation and certain ribosomal activities, with an area under the receiver operating characteristic (ROC) curve of up to 0.92. Further analysis showed that the similarity between the trees of genes labeled with similar GO terms was not limited to genes that physically interacted, but also extended to genes functioning within the same pathway. We discuss the relevance of our findings as they relate to the use of phylogenetic methods in comparative genomics.
Year: 2009 PMID: 20041191 PMCID: PMC2793527 DOI: 10.1371/journal.pone.0008487
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Prediction accuracy of the GLMs in the leave-one-out tests, measured by the area under the ROC curve.
| GO biological process(es) | GO molecular function(s) | Number of genes | ROC area | Adjusted p-value |
| translation | structural constituent of ribosome | 38 | 0.92 | 0.00 |
| translation | rRNA binding | 28 | 0.88 | 0.00 |
| translation | RNA binding | 34 | 0.82 | 0.00 |
| translation | tRNA binding | 7 | 0.80 | 0.01 |
| translation | protein binding | 22 | 0.69 | 0.03 |
| translation | aminoacyl-tRNA ligase activity; ATP binding; ligase activity | 12 | 0.71 | 0.05 |
| regulation of transcription, DNA-dependent | protein binding | 8 | 0.70 | 0.10 |
| transport | protein binding | 7 | 0.69 | 0.10 |
| protein folding | protein binding | 7 | 0.66 | 0.17 |
| DNA replication | protein binding | 8 | 0.67 | 0.19 |
| tRNA aminoacylation for protein translation | aminoacyl-tRNA ligase activity; ATP binding; ligase activity; nucleotide binding | 7 | 0.62 | 0.23 |
| DNA repair | hydrolase activity | 8 | 0.61 | 0.23 |
| translation | nucleotide binding | 15 | 0.59 | 0.23 |
| response to DNA damage stimulus | hydrolase activity | 7 | 0.55 | 0.37 |
| transport | ATP binding | 7 | 0.54 | 0.41 |
| DNA replication | DNA binding | 7 | 0.52 | 0.41 |
| SOS response | DNA binding | 7 | 0.49 | 0.52 |
| metabolic process | transferase activity | 10 | 0.48 | 0.58 |
| regulation of transcription, DNA-dependent | RNA binding | 7 | 0.43 | 0.67 |
| metabolic process | protein binding | 7 | 0.39 | 0.74 |
| metabolic process | catalytic activity | 13 | 0.42 | 0.74 |
| DNA repair; response to DNA damage stimulus | DNA binding | 11 | 0.41 | 0.74 |
| cell cycle; cell division | nucleotide binding | 8 | 0.38 | 0.74 |
| DNA repair; response to DNA damage stimulus | ATP binding; nucleotide binding | 7 | 0.33 | 0.80 |
| transcription | DNA binding | 7 | 0.29 | 0.86 |
| transcription | protein binding | 8 | 0.25 | 0.91 |
Different GO process terms and function terms often shared exactly the same set of genes. For example, the functions “aminoacyl-tRNA ligase activity”, “ATP binding” and “ligase activity” within the “translation” process involve the same genes. Such terms are grouped as a single category in the table.
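The ROC area reported in the table can be read as the probability that a randomly chosen gene carrying the GO label receives a higher predicted value than a randomly chosen gene without it. The following is a minimal sketch of that calculation, not the authors' code: the function name `roc_auc` and the example labels and scores are hypothetical, standing in for held-out predictions from leave-one-out tests.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve by pairwise comparison.

    labels: 1 for genes annotated with the GO category, 0 otherwise.
    scores: predicted values for the same genes (higher = more likely labeled).
    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical held-out predictions for seven genes (not data from the paper).
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
auc = roc_auc(labels, scores)
print(f"ROC area: {auc:.2f}")
```

An area near 0.5 corresponds to predictions no better than chance, which is how the lower rows of the table can be interpreted.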
Figure 1. Plots of true positive rate against false positive rate for a few example GO process-function pairs.
The predictions from the GLM for each function were evaluated at different values of the cut-off point (shown by the colored scale on the right), and error rates were calculated from these predictions. Panels (a)–(d) show the accuracy for four related ribosomal functions within the GO process “translation”. The four GO functions are “structural constituent of ribosome”, “tRNA binding”, “rRNA binding” and “RNA binding”, respectively.
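The curves described in this caption can be traced by sweeping a cut-off over the predicted values and recording the error rates at each step. The sketch below illustrates the idea under stated assumptions: `roc_points`, the cut-off grid and the toy predictions are hypothetical and are not taken from the paper.

```python
def roc_points(labels, scores, cutoffs):
    """Return (cutoff, false positive rate, true positive rate) per cut-off.

    A gene is predicted to carry the GO label when its score >= cutoff.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative example")
    points = []
    for c in cutoffs:
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= c)
        fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= c)
        points.append((c, fp / n_neg, tp / n_pos))
    return points


# Hypothetical predictions for six genes (not data from the paper).
labels = [1, 1, 0, 1, 0, 0]
scores = [0.95, 0.80, 0.70, 0.55, 0.30, 0.10]
points = roc_points(labels, scores, cutoffs=[0.0, 0.5, 0.75, 1.0])
for cutoff, fpr, tpr in points:
    print(f"cutoff={cutoff:.2f}  FPR={fpr:.2f}  TPR={tpr:.2f}")
```

Plotting the true positive rate against the false positive rate for each cut-off gives one curve per GO process-function pair.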
Figure 2. A detailed analysis of the proteins in our dataset annotated as being involved in GO process “translation” and GO function “structural constituent of ribosome”.
(a) The pathway interaction network of these proteins, as shown in Cytoscape [64]. Proteins P02378, P02371, P02373 and P02372 (in the first column) have no known physical interactions with any other proteins in our list. (b) Example gene trees of proteins from our dataset. From top left to bottom right, the trees are for P02386, P02410 (a protein known to physically interact with P02386), P02351 (a protein that does not interact with either of the previous two proteins but contributes to the pathway) and the consensus of all gene trees in our dataset not labeled with these two GO terms. (c) The models built by the GLMs for (i) the proteins labeled with the two GO terms and (ii) the 10,000 randomizations of the null distribution. The final predicted value is obtained by summing the products of each coefficient and its corresponding predictor value, and then adding the intercept.
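The last sentence of the caption for panel (c) describes a linear predictor: the intercept plus the sum of each coefficient multiplied by its predictor value. A minimal sketch follows. All numbers are hypothetical, the predictors are assumed to be branch lengths from a gene tree, and the inverse-logit step is an assumption about the link function, which this text does not state.

```python
import math


def linear_predictor(intercept, coefficients, predictors):
    """Intercept plus the sum of coefficient * predictor products."""
    if len(coefficients) != len(predictors):
        raise ValueError("coefficients and predictors must have equal length")
    return intercept + sum(b * x for b, x in zip(coefficients, predictors))


def inverse_logit(eta):
    """Map a linear predictor to (0, 1); only valid if a logit link is assumed."""
    return 1.0 / (1.0 + math.exp(-eta))


# Hypothetical fitted values (not taken from the paper).
intercept = -1.0
coefficients = [2.0, -0.5, 1.5]    # one per branch-length predictor
branch_lengths = [0.5, 0.2, 0.4]   # predictors for one gene tree

eta = linear_predictor(intercept, coefficients, branch_lengths)
prob = inverse_logit(eta)
print(f"linear predictor: {eta:.3f}, assumed-logit probability: {prob:.3f}")
```

Comparing `eta` (or `prob`, under the assumed link) against a cut-off gives the binary prediction for that gene.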