Zhong-Hui Duan, Brent Hughes, Lothar Reichel, Dianne M Perez, Ting Shi.
Abstract
BACKGROUND: One main research challenge in the post-genomic era is to understand the relationship between protein sequences and their biological functions. In recent years, several automated annotation systems have been developed for the functional assignment of uncharacterized proteins. The underlying assumption of these systems is that similar sequences imply similar biological functions. However, it has been noted that matching sequences do not always imply similar functions.
Year: 2006 PMID: 17217503 PMCID: PMC1780109 DOI: 10.1186/1471-2105-7-S4-S11
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
The p-value distributions of protein sequence pairs.
| p-value range | Percentage of pairs | Percentage of pairs in biological process | Percentage of pairs in molecular function | Percentage of pairs in cellular component |
| [1, 10^-1) | 87.81 | 86.08 | 86.92 | 85.96 |
| [10^-1, 10^-2) | 8.559 | 9.466 | 9.105 | 9.589 |
| [10^-2, 10^-3) | 2.244 | 2.634 | 2.399 | 2.676 |
| [10^-3, 10^-4) | 0.7139 | 0.8862 | 0.7671 | 0.8903 |
| [10^-4, 10^-5) | 0.2654 | 0.3493 | 0.29 | 0.3485 |
| [10^-5, 10^-6) | 0.1194 | 0.1618 | 0.1342 | 0.1614 |
| [10^-6, 10^-10) | 0.1556 | 0.2208 | 0.1843 | 0.2126 |
| [10^-10, 10^-15) | 0.05141 | 0.07762 | 0.06913 | 0.06826 |
| [10^-15, 10^-20) | 0.02427 | 0.03747 | 0.0342 | 0.03117 |
| [10^-20, 10^-50) | 0.03834 | 0.06107 | 0.06368 | 0.04699 |
| [10^-50, 10^-100) | 0.01032 | 0.01468 | 0.016365 | 0.01087 |
| [10^-100, 10^-200) | 0.004558 | 0.005692 | 0.007037 | 0.004185 |
| [10^-200, 10^-300) | 5.74E-05 | 8.03E-05 | 7.27E-05 | 5.35E-05 |
| [10^-300, 0] | 0.004419 | 0.004935 | 0.007291 | 0.002721 |
Figure 1. Numbers of GO groups at different levels of the gene ontologies.
Figure 2. Average size of GO groups at different levels of the ontologies.
Figure 3. The p-value distributions of sequence pairs annotated for molecular function. Each curve shows, across the levels of the GO category, the percentage of sequence pairs with a p-value less than or equal to a given threshold. (Some curves do not cover all levels because no sequence pair on those levels has a p-value at or below the corresponding threshold.)
Figure 4. The p-value distributions of sequence pairs annotated for biological process. Each curve shows, across the levels of the GO category, the percentage of sequence pairs with a p-value less than or equal to a given threshold. (Some curves do not cover all levels because no sequence pair on those levels has a p-value at or below the corresponding threshold.)
Figure 5. The p-value distributions of sequence pairs annotated for cellular component. Each curve shows, across the levels of the GO category, the percentage of sequence pairs with a p-value less than or equal to a given threshold. (Some curves do not cover all levels because no sequence pair on those levels has a p-value at or below the corresponding threshold.)
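The "less than or equal to a threshold" percentages plotted in the figures are cumulative, whereas the tables report percentages per p-value bin. The following minimal sketch shows one way to convert between the two, assuming each bin written as [a, b) contains the p-values p with b < p <= a. The function name `cumulative_percentages` is hypothetical, and the example row is the one reported for GO0005215 (transporter activity) in the transporter table, not the per-level data behind the figures.

```python
from itertools import accumulate

# Upper bound of each bin, in the order used by the seven-bin tables.
THRESHOLDS = ["1", "1e-2", "1e-5", "1e-10", "1e-20", "1e-50", "1e-100"]


def cumulative_percentages(binned):
    """Return, for each bin's upper bound, the percentage of pairs whose
    p-value is less than or equal to that bound.

    `binned` lists per-bin percentages ordered from the least significant
    bin, [1, 1e-2), to the most significant bin, [1e-100, 0].
    """
    # Accumulate from the most significant bin backwards, then restore order.
    tail_sums = list(accumulate(reversed(binned)))
    return [round(x, 2) for x in reversed(tail_sums)]


# Per-bin percentages for GO0005215 (transporter activity).
transporter = [94.13, 3.68, 0.66, 0.42, 0.52, 0.26, 0.33]
cumulative = cumulative_percentages(transporter)

for threshold, pct in zip(THRESHOLDS, cumulative):
    print(f"p <= {threshold}: {pct}%")
```

Under this reading, the second entry is the share of pairs with a p-value of at most 10^-2, that is, every bin except the first.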
The p-value distribution of sequence pairs in some GO groups on the transporter activity branch.
| GO ID¹ | [1, 10^-2) | [10^-2, 10^-5) | [10^-5, 10^-10) | [10^-10, 10^-20) | [10^-20, 10^-50) | [10^-50, 10^-100) | [10^-100, 0] | GO group description |
| * GO0005215(392) | 94.13 | 3.68 | 0.66 | 0.42 | 0.52 | 0.26 | 0.33 | transporter activity |
| ** GO0042626(43) | 85.6 | 2.1 | 2.21 | 4.1 | 2.21 | 1.99 | 1.77 | ATPase activity, coupled to transmembrane movement of substances |
| *** GO0042625(30) | 85.75 | 1.84 | 1.61 | 3.68 | 3.91 | 2.07 | 1.15 | ATPase activity, coupled to transmembrane movement of ions |
| **** GO0019829(19) | 92.4 | 2.92 | 0 | 1.75 | 1.75 | 0 | 1.17 | cation-transporting ATPase activity |
| ***** GO0046961(18) | 92.16 | 2.61 | 0 | 1.96 | 1.96 | 0 | 1.31 | hydrogen-transporting ATPase activity, rotational mechanism |
| **** GO0015662(12) | 30.3 | 0 | 10.61 | 19.7 | 21.21 | 13.64 | 4.55 | ATPase activity, coupled to transmembrane movement of ions, phosphorylative mechanism |
| ** GO0015267(11) | 87.27 | 3.64 | 1.82 | 0 | 0 | 1.82 | 5.45 | channel or pore class transporter activity |
| *** GO0015268(9) | 88.89 | 5.56 | 2.78 | 0 | 0 | 2.78 | 0 | alpha-type channel activity |
| **** GO0005216(7) | 85.71 | 4.76 | 4.76 | 0 | 0 | 4.76 | 0 | ion channel activity |
| ** GO0015238(14) | 41.76 | 10.99 | 15.38 | 12.09 | 10.99 | 1.1 | 7.69 | drug transporter activity |
| *** GO0015239(11) | 47.27 | 9.09 | 14.55 | 9.09 | 12.73 | 0 | 7.27 | multidrug transporter activity |
| ** GO0015144(27) | 21.08 | 1.71 | 3.13 | 8.55 | 23.65 | 5.7 | 36.18 | carbohydrate transporter activity |
| *** GO0051119(24) | 0.36 | 1.81 | 3.62 | 10.87 | 30.07 | 7.25 | 46.01 | sugar transporter activity |
| **** GO0015145(17) | 0 | 0 | 0 | 0 | 0 | 11.03 | 88.97 | monosaccharide transporter activity |
| ***** GO0015149(17) | 0 | 0 | 0 | 0 | 0 | 11.03 | 88.97 | hexose transporter activity |
| ****** GO0005354(6) | 0 | 0 | 0 | 0 | 0 | 33.33 | 66.67 | galactose transporter activity |
| ****** GO0015578(15) | 0 | 0 | 0 | 0 | 0 | 0 | 100 | mannose transporter activity |
| ****** GO0005353(15) | 0 | 0 | 0 | 0 | 0 | 0 | 100 | fructose transporter activity |
| ****** GO0005355(16) | 0 | 0 | 0 | 0 | 0 | 0 | 100 | glucose transporter activity |
| ** GO0015075(140) | 95.6 | 3.02 | 0.16 | 0.29 | 0.37 | 0.2 | 0.36 | ion transporter activity |
| *** GO0046873(38) | 91.18 | 4.98 | 0.14 | 0.28 | 0.71 | 1.28 | 1.42 | metal ion transporter activity |
| **** GO0046915(29) | 88.67 | 6.9 | 0.25 | 0.25 | 0.49 | 1.48 | 1.97 | transition metal ion transporter activity |
| ***** GO0005375(8) | 82.14 | 10.71 | 0 | 3.57 | 0 | 3.57 | 0 | copper ion transporter activity |
| ***** GO0005381(10) | 75.56 | 6.67 | 0 | 0 | 0 | 8.89 | 8.89 | iron ion transporter activity |
| *** GO0042625(30) | 85.75 | 1.84 | 1.61 | 3.68 | 3.91 | 2.07 | 1.15 | ATPase activity, coupled to transmembrane movement of ions |
| **** GO0019829(19) | 92.4 | 2.92 | 0 | 1.75 | 1.75 | 0 | 1.17 | cation-transporting ATPase activity |
| ***** GO0046961(18) | 92.16 | 2.61 | 0 | 1.96 | 1.96 | 0 | 1.31 | hydrogen-transporting ATPase activity, rotational mechanism |
| **** GO0015662(12) | 30.3 | 0 | 10.61 | 19.7 | 21.21 | 13.64 | 4.55 | ATPase activity, coupled to transmembrane movement of ions, phosphorylative mechanism |
¹ The tree structure of the transporter activity branch is embedded in the first column of the table. The number of stars before a GO ID indicates the level of the GO group. If a group A is one level deeper (one more star) than the group B on the row immediately above, A is a child of B. Otherwise, A is a sibling of the nearest group above it that is at the same level as A. The number in parentheses after the GO ID is the size of the group.
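The star convention described in the footnote can be decoded mechanically. The sketch below is one possible reading of it, not the authors' code: it assumes each first-column cell has the form `stars GOid(size)`, and the helper name `parse_go_tree` is hypothetical. A stack of the most recent group at each level gives the parent of every row.

```python
import re

# One first-column cell: leading stars, GO ID, group size in parentheses.
CELL = re.compile(r"^\s*(\*+)\s*(GO\d+)\((\d+)\)\s*$")


def parse_go_tree(cells):
    """Return a list of (go_id, level, size, parent_id) tuples.

    The level is the number of stars. The parent of a row is the nearest
    row above it with a smaller level (None for the top row of the branch).
    """
    stack = []   # (level, go_id) of the current ancestors, shallowest first
    result = []
    for cell in cells:
        match = CELL.match(cell)
        if match is None:
            continue  # skip cells that do not follow the assumed format
        level = len(match.group(1))
        go_id = match.group(2)
        size = int(match.group(3))
        # Discard groups at the same or a deeper level: they are siblings
        # or descendants of siblings, not ancestors of this row.
        while stack and stack[-1][0] >= level:
            stack.pop()
        parent = stack[-1][1] if stack else None
        result.append((go_id, level, size, parent))
        stack.append((level, go_id))
    return result


# First-column cells of the first seven rows of the transporter table.
cells = [
    "* GO0005215(392)",
    "** GO0042626(43)",
    "*** GO0042625(30)",
    "**** GO0019829(19)",
    "***** GO0046961(18)",
    "**** GO0015662(12)",
    "**GO0015267(11)",
]
tree = parse_go_tree(cells)
for go_id, level, size, parent in tree:
    print(go_id, level, size, parent)
```

Because a GO group can be reachable along more than one path, the same group may appear on several rows with different parents, as GO0042625 does in the table; this sketch reports the parent implied by each row separately.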
Posterior probability of a correct assignment for sequences in GO groups on the kinase activity branch.
| GO ID | 1 | 10^-2 | 10^-5 | 10^-10 | 10^-20 | 10^-50 | 10^-100 | GO group description |
| **** GO0016301(161) | 1.25 | 2.70 | 29.14 | 30.75 | 33.48 | 4.57 | 18.67 | kinase activity |
| ***** GO0019205(10) | 0.070 | 0.21 | 10.20 | 29.41 | 54.53 | 28.54 | 99.90 | nucleobase, nucleoside, nucleotide kinase activity |
| ****** GO0019201(6) | 0.039 | 0.20 | 8.11 | 19.35 | 60.00 | 33.33 | 99.98 | nucleotide kinase activity |
| ***** GO0004672(94) | 0.72 | 3.52 | 30.07 | 30.37 | 31.81 | 33.25 | 13.76 | protein kinase activity |
| ****** GO0004674(65) | 0.50 | 2.70 | 21.43 | 20.10 | 18.71 | 3.00 | 14.07 | protein serine/threonine kinase activity |
| ******* GO0004693(6) | 0.039 | 0.176 | 1.21 | 1.42 | 2.83 | 2.86 | 0 | cyclin-dependent protein kinase activity |
| ******* GO0004680(7) | 0.046 | 0.23 | 1.83 | 1.34 | 4.50 | 6.67 | 10.00 | casein kinase activity |
| ******* GO0004702(8) | 0.054 | 0.33 | 2.29 | 2.67 | 2.64 | 1.03 | 5.88 | receptor signaling protein serine/threonine kinase activity |
| ****** GO0004713(6) | 0.039 | 0.24 | 1.70 | 1.45 | 0.87 | 0 | 0 | protein-tyrosine kinase activity |
| ***** GO0019200(15) | 0.11 | 0.23 | 15.11 | 25.43 | 65.24 | 65.24 | 28.6 | carbohydrate kinase activity |
| ***** GO0001727(8) | 0.054 | 0.16 | 3.53 | 4.38 | 28.58 | 25.03 | 25.06 | lipid kinase activity |
| ***** GO0004428(12) | 0.085 | 0.15 | 3.21 | 4.97 | 29.39 | 29.96 | 33.22 | inositol or phosphatidylinositol kinase activity |
The p-value distribution of sequence pairs in GO groups on the kinase activity branch.
| GO ID | [1, 10^-2) | [10^-2, 10^-5) | [10^-5, 10^-10) | [10^-10, 10^-20) | [10^-20, 10^-50) | [10^-50, 10^-100) | [10^-100, 0] | GO group description |
| **** GO0016301(161) | 70.45 | 4.28 | 5.17 | 10.82 | 8.48 | 0.5 | 0.31 | kinase activity |
| ***** GO0019205(10) | 66.67 | 11.11 | 0 | 8.89 | 8.89 | 0 | 4.44 | nucleobase, nucleoside, nucleotide kinase activity |
| ****** GO0019201(6) | 40 | 20 | 0 | 0 | 26.67 | 0 | 13.33 | nucleotide kinase activity |
| ***** GO0004672(94) | 25.14 | 5.51 | 14.37 | 29.83 | 23.5 | 1.14 | 0.5 | protein kinase activity |
| ****** GO0004674(65) | 18.89 | 5.19 | 17.31 | 33.65 | 22.45 | 1.59 | 0.91 | protein serine/threonine kinase activity |
| ******* GO0004693(6) | 33.33 | 0 | 0 | 0 | 26.67 | 40 | 0 | cyclin-dependent protein kinase activity |
| ******* GO0004680(7) | 47.62 | 0 | 19.05 | 9.52 | 4.76 | 14.29 | 4.76 | casein kinase activity |
| ******* GO0004702(8) | 0 | 0 | 0 | 42.86 | 42.86 | 10.71 | 3.57 | receptor signaling protein serine/threonine kinase activity |
| ****** GO0004713(6) | 0 | 0 | 26.67 | 53.33 | 20 | 0 | 0 | protein-tyrosine kinase activity |
| ***** GO0019200(15) | 78.1 | 1.9 | 5.71 | 0 | 0 | 8.57 | 5.71 | carbohydrate kinase activity |
| ***** GO0001727(8) | 53.57 | 7.14 | 14.29 | 10.71 | 7.14 | 3.57 | 3.57 | lipid kinase activity |
| ***** GO0004428(12) | 72.73 | 6.06 | 6.06 | 7.58 | 3.03 | 1.52 | 3.03 | inositol or phosphatidylinositol kinase activity |