Kamal Taha, Paul D Yoo.
Abstract
BACKGROUND: All proteins associate with other molecules. These associated molecules are highly predictive of the potential functions of proteins. The association of a protein and a molecule can be determined from their co-occurrences in biomedical abstracts. Extensive semantically related co-occurrences of a protein's name and a molecule's name in the sentences of biomedical abstracts can be considered as indicative of the association between the protein and the molecule. Dependency parsers extract textual relations from a text by determining the grammatical relations between words in a sentence. They can be used for determining the textual relations between proteins and molecules. Despite their success, they may extract textual relations with low precision. This is because they do not consider the semantic relationships between terms in a sentence (i.e., they consider only the structural relationships between the terms). Moreover, they may not be well suited for complex sentences and for long-distance textual relations.
Year: 2016 PMID: 26767846 PMCID: PMC4714473 DOI: 10.1186/s12859-016-0882-3
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
The distribution of semantically related and semantically unrelated co-occurrences of a molecule m and protein p pair in an abstract A
The weight of associations between 10 molecules and protein PA1535 based on their co-occurrences in the abstract of Förster et al. [41]
| Molecule | related | unrelated | related | unrelated | related | unrelated | related | unrelated | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| AtuD | 4 | 3 | 11 | 9 | 1.6 | 1.5 | 0.24 | 0.18 | 0.069 | 0.049 | 0.020 |
| citronellyl-CoA | 2 | 2 | 12 | 8 | 1.3 | 1.3 | 0.22 | 0.19 | 0.116 | 0.099 | 0.017 |
| octanoyl-CoA | 2 | 1 | 10 | 9 | 1.3 | 1 | 0.25 | 0.23 | 0.115 | 0.103 | 0.012 |
| terpenoid-CoA | 1 | 1 | 10 | 8 | 1 | 1 | 0.23 | 0.26 | 0.009 | 0.001 | 0.008 |
| isovaleryl-CoA | 1 | 2 | 2 | 6 | 1 | 1.3 | 0.12 | 0.22 | 0.005 | 0.002 | 0.003 |
| Docosenoyl-CoA | 0 | 0 | 9 | 7 | 0 | 0 | 0.24 | 0.25 | 0 | 0 | 0 |
| OPC4-CoA | 0 | 0 | 11 | 6 | 0 | 0 | 0.24 | 0.24 | 0 | 0 | 0 |
| Sirodesmin H | 0 | 0 | 5 | 8 | 0 | 0 | 0.21 | 0.28 | 0 | 0 | 0 |
| OPC8-CoA | 0 | 0 | 7 | 3 | 0 | 0 | 0.23 | 0.18 | 0 | 0 | 0 |
| 3-dipole | 0 | 0 | 4 | 2 | 0 | 0 | 0.22 | 0.15 | 0 | 0 | 0 |
The weight of associations between 10 molecules and protein PA1535 based on their co-occurrences in 12 abstracts
| Abstract | AtuD | citronellyl-CoA | octanoyl-CoA | terpenoid-CoA | isovaleryl-CoA | Docosenoyl-CoA | OPC4-CoA | Sirodesmin H | OPC8-CoA | 3-dipole |
|---|---|---|---|---|---|---|---|---|---|---|
| A1 | 0.020 | 0.017 | 0.012 | 0.008 | 0.003 | 0 | 0 | 0 | 0 | 0 |
| A2 | 0.060 | 0 | 0 | 0 | 0.778 | 0 | 0.060 | 0.270 | 0.060 | 0 |
| A3 | 0 | 0.060 | 0.778 | 0.060 | 0 | 0 | 0 | 0.060 | 0 | 0.088 |
| A4 | 0.060 | 0.060 | 0.118 | 0 | 0 | 0.270 | 0 | 0 | 0.088 | 0 |
| A5 | 0.060 | 0 | 0 | 0 | 0.778 | 0 | 0.060 | 0.270 | 0.060 | 0 |
| A6 | 0 | 0.652 | 0 | 0.055 | 0.121 | 0 | 0.004 | 0 | 0 | 0.058 |
| A7 | 0.493 | 0.116 | 0 | 0.008 | 0.072 | 0.002 | 0 | 0.603 | 0 | 0 |
| A8 | 0 | 0 | 0.387 | 0.184 | 0 | 0 | 0.035 | 0 | 0.004 | 0.002 |
| A9 | 0 | 0.002 | 0.0548 | 0 | 0.735 | 0.017 | 0 | 0.357 | 0 | 0.085 |
| A10 | 0.664 | 0.183 | 0 | 0.006 | 0 | 0 | 0.736 | 0 | 0.002 | 0.006 |
| A11 | 0.068 | 0.389 | 0.216 | 0.003 | 0 | 0.047 | 0.009 | 0 | 0 | 0.364 |
| A12 | 0.213 | 0 | 0.735 | 0 | 0.043 | 0.003 | 0 | 0.007 | 0 | 0 |
Beats/losses scores and normalized weights of the 10 molecules that associate with protein PA1535 based on their co-occurrences in 12 abstracts, calculated from their weights shown in Table 3
| | AtuD | citronellyl-CoA | octanoyl-CoA | terpenoid-CoA | isovaleryl-CoA | Docosenoyl-CoA | OPC4-CoA | Sirodesmin H | OPC8-CoA | 3-dipole |
|---|---|---|---|---|---|---|---|---|---|---|
| AtuD | 0 | − | + | − | − | − | − | 0 | − | − |
| citronellyl-CoA | + | 0 | 0 | − | − | − | − | 0 | − | − |
| octanoyl-CoA | − | 0 | 0 | − | − | − | − | − | − | − |
| terpenoid-CoA | + | + | + | 0 | + | − | − | 0 | − | + |
| isovaleryl-CoA | + | + | + | − | 0 | − | − | − | − | − |
| Docosenoyl-CoA | + | + | + | + | + | 0 | 0 | + | − | + |
| OPC4-CoA | + | + | + | + | + | 0 | 0 | + | − | 0 |
| Sirodesmin H | 0 | 0 | + | 0 | + | − | − | 0 | − | 0 |
| OPC8-CoA | + | + | + | + | + | + | + | + | 0 | + |
| 3-dipole | + | + | + | − | + | − | 0 | 0 | − | 0 |
| Score S(m, p) | +6 | +5 | +8 | +2 | +3 | −6 | −5 | +1 | −9 | −1 |
| Normalized weight | 0.16 | 0.15 | 0.18 | 0.12 | 0.13 | 0.03 | 0.04 | 0.10 | 0 | 0.09 |
The symbol “+” denotes that the column molecule beats the row molecule in the abstracts, while “−” denotes that the column molecule lost. “0” denotes that the two molecules have the same number of beats and losses. The last two rows give the score S(m, p) and the normalized weight, respectively, of each molecule m in the 12 abstracts. Each entry is read in column-row order.
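The pairwise comparison described in this footnote can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. It assumes that a molecule "beats" another when its weight is higher in more abstracts than it is lower, and that the score is the number of molecules beaten minus the number lost to. The normalization step (shift every score by the minimum, then divide by the sum) is an assumption that appears roughly consistent with the normalized weights listed above; the source does not state the formula. The function names are hypothetical. The weights are the AtuD, citronellyl-CoA, and octanoyl-CoA columns of the 12-abstract weight table, assuming those are its first three weight columns.

```python
from itertools import combinations

# Weights for abstracts A1..A12, copied from the 12-abstract weight table.
weights = {
    "AtuD":            [0.020, 0.060, 0, 0.060, 0.060, 0, 0.493, 0, 0, 0.664, 0.068, 0.213],
    "citronellyl-CoA": [0.017, 0, 0.060, 0.060, 0, 0.652, 0.116, 0, 0.002, 0.183, 0.389, 0],
    "octanoyl-CoA":    [0.012, 0, 0.778, 0.118, 0, 0, 0, 0.387, 0.0548, 0, 0.216, 0.735],
}


def pairwise_outcome(a, b):
    """Return +1 if a beats b in more abstracts than it loses, -1 if fewer, 0 if equal."""
    beats = sum(x > y for x, y in zip(a, b))
    losses = sum(x < y for x, y in zip(a, b))
    return (beats > losses) - (beats < losses)


def scores(weights):
    """Score of each molecule: molecules it beats minus molecules it loses to."""
    result = {m: 0 for m in weights}
    for m1, m2 in combinations(weights, 2):
        outcome = pairwise_outcome(weights[m1], weights[m2])
        result[m1] += outcome
        result[m2] -= outcome
    return result


def normalize(score):
    """ASSUMED normalization: shift by the minimum score, then divide by the total."""
    low = min(score.values())
    shifted = {m: s - low for m, s in score.items()}
    total = sum(shifted.values())
    return {m: (s / total if total else 0.0) for m, s in shifted.items()}


s = scores(weights)
print(s)
print(normalize(s))
```

Because only three of the ten molecules are compared here, the scores differ from the full-table scores; the pairwise outcomes (AtuD beats citronellyl-CoA, octanoyl-CoA beats AtuD, citronellyl-CoA ties octanoyl-CoA) match the corresponding entries of the beats/losses table.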
The GO dataset used in the experiments
| | Biological process sub-ontology | Molecular function sub-ontology |
|---|---|---|
| No. of GO terms selected for the experiments | 70 | 30 |
| No. of proteins annotated to the GO terms | 584,973 | 604,625 |
| No. of proteins selected for the experiments (a) | 62,386 | 16,576 |
(a) We selected for the experiments only the proteins that: (1) are associated with at least one PubMed abstract based on their entries in UniProtKB [28], and (2) have an experimental evidence code: IDA, IC, IPI, EXP, IEP, IMP, TAS, or IGI
Fig. 1 Performance of the four systems using the CAFA dataset and 5-fold cross-validation for predicting: (a) the Biological Process annotations, and (b) the Molecular Function annotations
Fig. 2 Performance of the four systems using the Yeast protein dataset and 5-fold cross-validation for predicting: (a) the Biological Process annotations, and (b) the Molecular Function annotations
Performance of predicting the biological process annotations using randomly selected sets of training and testing proteins
| GO Term | Average depth (level) of GO term | Number of training proteins | Number of testing proteins | PPFBM R | PPFBM P | PPFBM F | GOstruct R | GOstruct P | GOstruct F | Text-KNN R | Text-KNN P | Text-KNN F |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| GO:0048856 | 4 | 2130 | 420 | 0.74 | 0.71 | 0.72 | 0.45 | 0.49 | 0.47 | 0.24 | 0.26 | 0.25 |
| GO:0002009 | 4 | 633 | 125 | 0.55 | 0.60 | 0.57 | 0.37 | 0.35 | 0.36 | 0.19 | 0.22 | 0.20 |
| GO:0072088 | 4 | 36 | 9 | 0.34 | 0.35 | 0.34 | 0.54 | 0.58 | 0.56 | 0.12 | 0.04 | 0.06 |
| GO:0035295 | 4 | 1890 | 370 | 0.75 | 0.78 | 0.76 | 0.45 | 0.43 | 0.44 | 0.30 | 0.28 | 0.29 |
| GO:0035239 | 4 | 1304 | 260 | 0.71 | 0.75 | 0.73 | 0.36 | 0.35 | 0.35 | 0.28 | 0.26 | 0.27 |
| GO:0001763 | 4 | 865 | 173 | 0.66 | 0.65 | 0.65 | 0.43 | 0.45 | 0.44 | 0.20 | 0.24 | 0.22 |
| GO:0072001 | 5 | 450 | 90 | 0.55 | 0.59 | 0.57 | 0.45 | 0.50 | 0.47 | 0.20 | 0.25 | 0.22 |
| GO:0009653 | 5 | 1345 | 265 | 0.75 | 0.73 | 0.74 | 0.41 | 0.47 | 0.44 | 0.25 | 0.32 | 0.28 |
| GO:0009888 | 5 | 859 | 171 | 0.66 | 0.67 | 0.66 | 0.35 | 0.39 | 0.37 | 0.18 | 0.23 | 0.20 |
| GO:0048589 | 5 | 1828 | 360 | 0.76 | 0.8 | 0.78 | 0.54 | 0.57 | 0.55 | 0.25 | 0.27 | 0.26 |
| GO:0060562 | 5 | 1212 | 240 | 0.71 | 0.74 | 0.72 | 0.43 | 0.47 | 0.45 | 0.23 | 0.27 | 0.25 |
| GO:0001657 | 5 | 438 | 87 | 0.51 | 0.56 | 0.53 | 0.46 | 0.46 | 0.46 | 0.19 | 0.22 | 0.20 |
| GO:0061138 | 5 | 792 | 158 | 0.69 | 0.75 | 0.72 | 0.42 | 0.45 | 0.43 | 0.15 | 0.21 | 0.18 |
| GO:0060429 | 6 | 528 | 105 | 0.60 | 0.65 | 0.62 | 0.34 | 0.27 | 0.30 | 0.22 | 0.30 | 0.25 |
| GO:0048731 | 6 | 1183 | 225 | 0.72 | 0.78 | 0.75 | 0.38 | 0.43 | 0.40 | 0.31 | 0.31 | 0.31 |
| GO:0072009 | 6 | 86 | 20 | 0.38 | 0.41 | 0.39 | 0.45 | 0.49 | 0.47 | 0.09 | 0.07 | 0.08 |
| GO:0001655 | 6 | 204 | 41 | 0.41 | 0.46 | 0.43 | 0.35 | 0.31 | 0.33 | 0.18 | 0.24 | 0.21 |
| GO:0001822 | 6 | 110 | 30 | 0.39 | 0.48 | 0.43 | 0.37 | 0.41 | 0.39 | 0.19 | 0.15 | 0.17 |
| GO:0072073 | 6 | 84 | 21 | 0.46 | 0.49 | 0.47 | 0.53 | 0.62 | 0.57 | 0.07 | 0.09 | 0.08 |
| GO:0060560 | 6 | 1062 | 200 | 0.69 | 0.71 | 0.70 | 0.37 | 0.39 | 0.38 | 0.27 | 0.31 | 0.29 |
| GO:0072033 | 6 | 61 | 13 | 0.29 | 0.33 | 0.31 | 0.48 | 0.55 | 0.51 | 0.09 | 0.2 | 0.12 |
| GO:0060675 | 6 | 277 | 55 | 0.41 | 0.42 | 0.41 | 0.40 | 0.44 | 0.42 | 0.20 | 0.25 | 0.22 |
| GO:0045165 | 6 | 1379 | 270 | 0.72 | 0.78 | 0.75 | 0.39 | 0.40 | 0.39 | 0.31 | 0.33 | 0.32 |
| GO:0007267 | 6 | 1532 | 290 | 0.70 | 0.76 | 0.73 | 0.48 | 0.51 | 0.49 | 0.23 | 0.24 | 0.23 |
| GO:0030154 | 6 | 1596 | 310 | 0.71 | 0.78 | 0.74 | 0.45 | 0.47 | 0.46 | 0.23 | 0.26 | 0.24 |
| GO:0065008 | 6 | 1400 | 270 | 0.73 | 0.76 | 0.74 | 0.44 | 0.47 | 0.45 | 0.27 | 0.28 | 0.27 |
| GO:0048754 | 6 | 687 | 137 | 0.55 | 0.57 | 0.56 | 0.37 | 0.39 | 0.38 | 0.17 | 0.20 | 0.18 |
| GO:0009887 | 6 | 12 | 4 | 0.17 | 0.24 | 0.20 | 0.52 | 0.53 | 0.52 | 0.00 | 0.00 | 0.00 |
| GO:0044699 | 6 | 1912 | 370 | 0.72 | 0.71 | 0.71 | 0.38 | 0.37 | 0.37 | 0.27 | 0.32 | 0.29 |
| GO:2001141 | 6 | 1731 | 335 | 0.70 | 0.71 | 0.70 | 0.42 | 0.43 | 0.42 | 0.32 | 0.36 | 0.34 |
| GO:0010468 | 6 | 1758 | 340 | 0.75 | 0.70 | 0.72 | 0.39 | 0.41 | 0.40 | 0.22 | 0.26 | 0.24 |
| GO:2000112 | 6 | 1637 | 320 | 0.64 | 0.68 | 0.66 | 0.39 | 0.40 | 0.39 | 0.31 | 0.35 | 0.33 |
| GO:0048513 | 7 | 1107 | 220 | 0.65 | 0.72 | 0.68 | 0.47 | 0.47 | 0.47 | 0.23 | 0.29 | 0.26 |
| GO:0048729 | 7 | 465 | 93 | 0.55 | 0.62 | 0.58 | 0.39 | 0.40 | 0.39 | 0.19 | 0.23 | 0.21 |
| GO:0001656 | 7 | 72 | 18 | 0.38 | 0.42 | 0.40 | 0.43 | 0.51 | 0.47 | 0.20 | 0.25 | 0.22 |
| GO:0060993 | 7 | 109 | 21 | 0.39 | 0.42 | 0.40 | 0.42 | 0.43 | 0.42 | 0.16 | 0.23 | 0.19 |
| GO:0072006 | 7 | 100 | 25 | 0.37 | 0.42 | 0.39 | 0.37 | 0.42 | 0.39 | 0.04 | 0.18 | 0.07 |
| GO:0001658 | 7 | 402 | 80 | 0.52 | 0.57 | 0.54 | 0.44 | 0.45 | 0.44 | 0.22 | 0.25 | 0.23 |
| GO:0061326 | 7 | 309 | 61 | 0.49 | 0.53 | 0.51 | 0.38 | 0.39 | 0.38 | 0.21 | 0.28 | 0.24 |
| GO:0045168 | 7 | 459 | 91 | 0.79 | 0.82 | 0.80 | 0.43 | 0.45 | 0.44 | 0.23 | 0.23 | 0.23 |
| GO:0051094 | 7 | 1768 | 340 | 0.75 | 0.81 | 0.78 | 0.49 | 0.52 | 0.50 | 0.31 | 0.35 | 0.33 |
| GO:0051240 | 7 | 1780 | 340 | 0.76 | 0.79 | 0.77 | 0.44 | 0.44 | 0.44 | 0.28 | 0.32 | 0.30 |
| GO:0022603 | 7 | 1850 | 350 | 0.67 | 0.70 | 0.68 | 0.39 | 0.41 | 0.40 | 0.33 | 0.35 | 0.34 |
| GO:0072087 | 7 | 44 | 11 | 0.33 | 0.38 | 0.35 | 0.54 | 0.64 | 0.59 | 0.00 | 0.00 | 0.00 |
| GO:0090183 | 7 | 345 | 69 | 0.50 | 0.56 | 0.53 | 0.43 | 0.42 | 0.42 | 0.22 | 0.29 | 0.25 |
| GO:0061005 | 7 | 279 | 55 | 0.49 | 0.48 | 0.48 | 0.43 | 0.43 | 0.43 | 0.19 | 0.23 | 0.21 |
| GO:0032835 | 7 | 338 | 67 | 0.45 | 0.53 | 0.49 | 0.40 | 0.42 | 0.41 | 0.20 | 0.26 | 0.23 |
| GO:2000027 | 8 | 631 | 126 | 0.59 | 0.61 | 0.60 | 0.34 | 0.37 | 0.35 | 0.21 | 0.25 | 0.23 |
| GO:0072080 | 8 | 241 | 48 | 0.40 | 0.43 | 0.41 | 0.36 | 0.36 | 0.36 | 0.21 | 0.23 | 0.22 |
| GO:0003338 | 8 | 52 | 13 | 0.26 | 0.35 | 0.30 | 0.41 | 0.48 | 0.44 | 0.07 | 0.17 | 0.10 |
| GO:0044767 | 8 | 1755 | 351 | 0.78 | 0.82 | 0.80 | 0.38 | 0.45 | 0.41 | 0.23 | 0.26 | 0.24 |
| GO:0072028 | 8 | 48 | 12 | 0.36 | 0.38 | 0.37 | 0.42 | 0.54 | 0.47 | 0.00 | 0.00 | 0.00 |
| GO:0006366 | 8 | 1840 | 350 | 0.67 | 0.71 | 0.69 | 0.45 | 0.47 | 0.46 | 0.24 | 0.27 | 0.25 |
| GO:0006355 | 8 | 1804 | 350 | 0.51 | 0.55 | 0.53 | 0.38 | 0.39 | 0.38 | 0.30 | 0.29 | 0.29 |
| GO:0031128 | 8 | 213 | 42 | 0.42 | 0.44 | 0.43 | 0.46 | 0.46 | 0.46 | 0.17 | 0.23 | 0.20 |
| GO:0090184 | 8 | 1717 | 34 | 0.70 | 0.73 | 0.71 | 0.40 | 0.38 | 0.39 | 0.32 | 0.35 | 0.33 |
| GO:0072210 | 8 | 72 | 18 | 0.39 | 0.46 | 0.42 | 0.45 | 0.46 | 0.45 | 0.00 | 0.00 | 0.00 |
| GO:0072215 | 8 | 132 | 26 | 0.42 | 0.44 | 0.43 | 0.39 | 0.41 | 0.40 | 0.10 | 0.12 | 0.11 |
| GO:0077273 | 8 | 199 | 39 | 0.46 | 0.47 | 0.46 | 0.39 | 0.42 | 0.40 | 0.15 | 0.17 | 0.16 |
| GO:0072202 | 8 | 119 | 24 | 0.42 | 0.44 | 0.43 | 0.41 | 0.38 | 0.39 | 0.12 | 0.14 | 0.13 |
| GO:0072207 | 8 | 125 | 25 | 0.41 | 0.45 | 0.43 | 0.33 | 0.41 | 0.37 | 0.09 | 0.12 | 0.10 |
| GO:0072075 | 8 | 183 | 36 | 0.41 | 0.43 | 0.42 | 0.44 | 0.45 | 0.44 | 0.18 | 0.21 | 0.19 |
| GO:0072170 | 8 | 108 | 28 | 0.32 | 0.36 | 0.34 | 0.39 | 0.42 | 0.40 | 0.11 | 0.15 | 0.13 |
| GO:0072234 | 9 | 176 | 45 | 0.39 | 0.46 | 0.42 | 0.32 | 0.45 | 0.37 | 0.08 | 0.17 | 0.11 |
| GO:0072017 | 9 | 104 | 20 | 0.38 | 0.47 | 0.42 | 0.40 | 0.45 | 0.42 | 0.15 | 0.19 | 0.17 |
| GO:0072077 | 9 | 32 | 8 | 0.25 | 0.38 | 0.30 | 0.46 | 0.51 | 0.48 | 0.00 | 0.00 | 0.00 |
| GO:0072078 | 9 | 148 | 38 | 0.36 | 0.39 | 0.37 | 0.40 | 0.41 | 0.40 | 0.14 | 0.16 | 0.15 |
| GO:0072070 | 9 | 147 | 37 | 0.38 | 0.46 | 0.42 | 0.36 | 0.42 | 0.39 | 0.19 | 0.23 | 0.21 |
| GO:0072050 | 9 | 67 | 15 | 0.38 | 0.45 | 0.41 | 0.40 | 0.38 | 0.39 | 0.14 | 0.07 | 0.09 |
| GO:0006357 | 9 | 1992 | 390 | 0.69 | 0.75 | 0.72 | 0.39 | 0.45 | 0.42 | 0.31 | 0.32 | 0.31 |
The table shows the average depth (level) of each GO term in the biological process subontology and the accuracy of predicting the function of this term. R, P, and F denote Recall, Precision, and F-value, respectively
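As a reading aid for the R, P, and F columns, the sketch below computes an F-value from recall and precision. It assumes the F-value is the standard harmonic mean of the two (the F1 measure); the source does not spell out the formula here, although this definition appears to reproduce rows of the table such as GO:0048856 for PPFBM. The function name is hypothetical.

```python
def f_value(recall, precision):
    """Harmonic mean of recall and precision (ASSUMED to be the table's F-value)."""
    if recall + precision == 0:
        return 0.0  # avoid division by zero when both are 0
    return 2 * recall * precision / (recall + precision)


# PPFBM row for GO:0048856: R = 0.74, P = 0.71
print(round(f_value(0.74, 0.71), 2))
```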
Fig. 3 Performance of the four systems using the GO dataset and 5-fold cross-validation for predicting: (a) the Biological Process annotations, and (b) the Molecular Function annotations
Fig. 4 Precision-Recall curves plotted using CAFA protein-centric metrics with confidence scores above thresholds distributed evenly in the range [0, 1] at step size 0.01. (a) shows the curves for the Biological Process annotations, and (b) shows the curves for the Molecular Function annotations
Fig. 5 The average Recall, Precision, and F-value of predicting the functions of each set of GO terms located at the same average depth (level) in the Biological Process subontology
Fig. 6 The average Recall, Precision, and F-value of predicting the functions of each set of GO terms located at the same average depth (level) in the Molecular Function subontology
Performance of predicting the molecular function annotations using randomly selected sets of training and testing proteins
| GO Term | Average depth (level) of GO term | Number of training proteins | Number of testing proteins | PPFBM R | PPFBM P | PPFBM F | GOstruct R | GOstruct P | GOstruct F | Text-KNN R | Text-KNN P | Text-KNN F |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| GO:0038023 | 4 | 830 | 210 | 0.62 | 0.67 | 0.55 | 0.41 | 0.43 | 0.53 | 0.25 | 0.33 | 0.28 |
| GO:0009927 | 4 | 51 | 15 | 0.42 | 0.46 | 0.44 | 0.53 | 0.56 | 0.54 | 0.00 | 0.00 | 0.00 |
| GO:0000156 | 4 | 1399 | 350 | 0.78 | 0.79 | 0.78 | 0.37 | 0.44 | 0.40 | 0.37 | 0.46 | 0.41 |
| GO:0005057 | 4 | 1014 | 250 | 0.59 | 0.64 | 0.61 | 0.40 | 0.43 | 0.41 | 0.35 | 0.43 | 0.39 |
| GO:0004888 | 5 | 580 | 140 | 0.60 | 0.64 | 0.62 | 0.43 | 0.48 | 0.45 | 0.24 | 0.29 | 0.26 |
| GO:0015026 | 5 | 109 | 20 | 0.46 | 0.54 | 0.50 | 0.45 | 0.49 | 0.47 | 0.21 | 0.29 | 0.24 |
| GO:0005220 | 5 | 42 | 8 | 0.37 | 0.42 | 0.39 | 0.49 | 0.56 | 0.52 | 0.18 | 0.31 | 0.23 |
| GO:0030594 | 5 | 546 | 130 | 0.78 | 0.79 | 0.78 | 0.48 | 0.50 | 0.49 | 0.30 | 0.43 | 0.35 |
| GO:0000155 | 5 | 1034 | 250 | 0.81 | 0.84 | 0.82 | 0.42 | 0.46 | 0.44 | 0.39 | 0.42 | 0.40 |
| GO:0009881 | 5 | 289 | 70 | 0.61 | 0.66 | 0.55 | 0.45 | 0.48 | 0.46 | 0.29 | 0.37 | 0.33 |
| GO:0008329 | 5 | 136 | 30 | 0.50 | 0.55 | 0.52 | 0.43 | 0.46 | 0.44 | 0.15 | 0.30 | 0.20 |
| GO:0004887 | 5 | 81 | 20 | 0.44 | 0.53 | 0.48 | 0.46 | 0.59 | 0.52 | 0.09 | 0.16 | 0.12 |
| GO:0003707 | 5 | 878 | 220 | 0.59 | 0.68 | 0.63 | 0.38 | 0.41 | 0.39 | 0.23 | 0.32 | 0.27 |
| GO:0004896 | 6 | 130 | 35 | 0.52 | 0.52 | 0.52 | 0.45 | 0.48 | 0.46 | 0.26 | 0.32 | 0.29 |
| GO:0016502 | 6 | 169 | 45 | 0.56 | 0.57 | 0.56 | 0.47 | 0.49 | 0.48 | 0.25 | 0.31 | 0.28 |
| GO:0005035 | 6 | 51 | 10 | 0.46 | 0.48 | 0.47 | 0.51 | 0.53 | 0.52 | 0.00 | 0.00 | 0.00 |
| GO:0016917 | 6 | 198 | 50 | 0.55 | 0.63 | 0.59 | 0.46 | 0.48 | 0.47 | 0.30 | 0.38 | 0.34 |
| GO:0008066 | 6 | 301 | 80 | 0.56 | 0.63 | 0.59 | 0.43 | 0.46 | 0.44 | 0.32 | 0.39 | 0.35 |
| GO:0008158 | 6 | 138 | 35 | 0.49 | 0.56 | 0.52 | 0.46 | 0.51 | 0.48 | 0.25 | 0.35 | 0.29 |
| GO:0008046 | 6 | 58 | 15 | 0.44 | 0.45 | 0.44 | 0.47 | 0.55 | 0.51 | 0.00 | 0.00 | 0.00 |
| GO:0004984 | 6 | 3474 | 870 | 0.84 | 0.87 | 0.85 | 0.33 | 0.39 | 0.35 | 0.33 | 0.41 | 0.37 |
| GO:0035586 | 6 | 207 | 55 | 0.54 | 0.65 | 0.59 | 0.44 | 0.46 | 0.45 | 0.31 | 0.40 | 0.35 |
| GO:0017154 | 6 | 82 | 20 | 0.51 | 0.59 | 0.56 | 0.47 | 0.56 | 0.51 | 0.16 | 0.20 | 0.18 |
| GO:0019199 | 6 | 756 | 190 | 0.56 | 0.60 | 0.58 | 0.37 | 0.41 | 0.55 | 0.28 | 0.40 | 0.33 |
| GO:0042813 | 6 | 141 | 40 | 0.48 | 0.53 | 0.50 | 0.43 | 0.46 | 0.44 | 0.29 | 0.32 | 0.30 |
| GO:0004915 | 7 | 111 | 30 | 0.44 | 0.47 | 0.45 | 0.38 | 0.43 | 0.40 | 0.19 | 0.27 | 0.22 |
| GO:0004908 | 7 | 35 | 10 | 0.38 | 0.39 | 0.38 | 0.54 | 0.57 | 0.55 | 0.17 | 0.29 | 0.21 |
| GO:0004950 | 7 | 210 | 50 | 0.55 | 0.58 | 0.56 | 0.44 | 0.42 | 0.42 | 0.28 | 0.42 | 0.34 |
| GO:0004897 | 7 | 29 | 7 | 0.41 | 0.42 | 0.41 | 0.56 | 0.58 | 0.67 | 0.19 | 0.34 | 0.24 |
| GO:0004904 | 7 | 176 | 45 | 0.55 | 0.56 | 0.55 | 0.42 | 0.43 | 0.42 | 0.26 | 0.35 | 0.30 |
The table shows the average depth (level) of each GO term in the molecular function subontology and the accuracy of predicting the function of this term. R, P, and F denote Recall, Precision, and F-value, respectively
Sample of the 6086 yeast proteins downloaded from [34] and their biological process annotations identified by PPFBM
| Protein | Already published biological process annotations that are also identified by PPFBM | Missing (unpublished) annotations identified by PPFBM |
|---|---|---|
| YKR087C | GO:0006515 (misfolded or incompletely synthesized protein catabolic process); GO:0006508 (proteolysis) | GO:0044257 (cellular protein catabolic process) |
| YML120C | GO:0006120 (mitochondrial electron transport, NADH to ubiquinone); GO:0001300 (chronological cell aging); GO:0055114 (oxidation-reduction process); GO:0006116 (NADH oxidation) | GO:0042775 (mitochondrial ATP synthesis coupled electron transport); GO:0022904 (respiratory electron transport chain); GO:0045333 (cellular respiration); GO:0022900 (electron transport chain); GO:0044237 (cellular metabolic process); GO:0009987 (cellular process) |
| YIL156W | GO:0006511 (ubiquitin-dependent protein breakdown); GO:0006508 (peptidolysis) | GO:0044257 (cellular protein breakdown) |
| YJL207C | GO:0008104 (protein localization); GO:0006810 (transport); GO:0015031 (protein transport); GO:0042147 (retrograde transport, endosome to Golgi) | GO:0051179 (localization); GO:0051641 (cellular localization) |
| YML074C | GO:0000412 (histone peptidyl-prolyl isomerization); GO:0018208 (peptidyl-proline modification); GO:0006457 (protein folding) | GO:0000413 (protein peptidyl-prolyl isomerization) |
| YIL115C | GO:0031081 (nuclear pore distribution); GO:0006810 (transport); GO:0015031 (protein transport); GO:0006611 (protein export from nucleus); GO:0006607 (NLS-bearing protein import into nucleus); GO:0051028 (mRNA transport); GO:0016973 (poly(A)+ mRNA export from nucleus); GO:0000055 (ribosomal large subunit export, nucleus); GO:0000056 (ribosomal small subunit export, nucleus) | GO:0051179 (localization); GO:0034613 (cellular protein localization); GO:0008104 (protein localization); GO:0051641 (cellular localization); GO:0034504 (protein localization to nucleus); GO:0006403 (RNA localization); GO:0033750 (ribosome localization); GO:0051640 (organelle localization) |
| YNL305C | GO:0019722 (calcium-mediated signaling); GO:0006915 (apoptotic process); GO:0030968 (endoplasmic reticulum unfolded response) | GO:0023052 (signaling); GO:0007154 (cell communication) |
| YFL016C | GO:0006515 (misfolded or incompletely synthesized protein catabolic process); GO:0006457 (protein folding); GO:0006458 ('de novo' protein folding); GO:0042026 (protein refolding); GO:0006950 (response to stress); GO:0009408 (response to heat) | GO:0044257 (cellular protein catabolic process) |
| YGL001C | GO:0055114 (oxidation-reduction process); GO:0006694 (steroid biosynthetic process); GO:0016126 (sterol biosynthetic process); GO:0006696 (ergosterol biosynthetic process) | GO:0008610 (lipid biosynthetic process) |
| YJR068W | GO:0006260 (DNA replication); GO:0006298 (mismatch repair); GO:0006272 (leading strand elongation); GO:0007049 (cell cycle); GO:0007062 (sister chromatid cohesion) | GO:0006261 (DNA-dependent DNA replication); GO:0007059 (chromosome segregation); GO:0009987 (cellular process) |
| YOR201C | GO:0032259 (methylation); GO:0001510 (RNA methylation); GO:0006396 (RNA processing); GO:0000154 (rRNA modification) | GO:0010467 (gene expression); GO:0043170 (macromolecule metabolic process) |
| YNL267W | GO:0046854 (phosphatidylinositol phosphorylation); GO:0016310 (phosphorylation); GO:0048015 (phosphatidylinositol-mediated) | GO:0007154 (cell communication); GO:0023052 (signaling) |
| YPR188C | GO:0007049 (cell cycle); GO:0051301 (cell division); GO:0000916 (actomyosin contractile ring contraction) | GO:0033205 (cell cycle cytokinesis); GO:0000910 (cytokinesis); GO:0022402 (cell cycle process); GO:0009987 (cellular process) |
| YOR332W | GO:0007035 (vacuolar acidification); GO:0015991 (ATP hydrolysis coupled proton transport); GO:0006810 (transport); GO:0006811 (ion transport); GO:0015992 (proton trans) | GO:0051179 (localization) |
| YJR042W | GO:0006606 (protein import into nucleus); GO:0000055 (ribosomal large subunit export from nucleus); GO:0051028 (mRNA transport); GO:0006406 (mRNA transport); GO:0006810 (transport); GO:0015031 (protein transport); GO:0031081 (nuclear pore distribution) | GO:0034504 (protein localization to nucleus); GO:0006403 (RNA localization); GO:0033365 (protein localization to organelle); GO:0008104 (protein localization); GO:0051641 (cell. localization); GO:0033036 (macromolecule localization); GO:0051179 (localization); GO:0033750 (ribosome localization) |
| YNL090W | GO:0007017 (microtubule-based process); GO:0030010 (establishment of cell polarity); GO:0007015 (actin filament organization); GO:0007264 (small GTPase mediated signal transduction) | GO:0007154 (cell communication); GO:0023052 (signaling) |
| YMR223W | GO:0006511 (ubiquitin-dependent protein catabolic process); GO:0006351 (transcription, DNA-templated); GO:0034729 (histone H3-K methylation); GO:0051568 (histone H3-K4 methylation); GO:0006508 (proteolysis); GO:0016578 (histone deubiquitination) | GO:0044257 (cell protein catabolic process); GO:0043170 (macromolecule metabolic process); GO:0008152 (metabolic proc.); GO:0010467 (gene exp.) |
| YML085C | GO:0006184 (GTP catabolic process); GO:0007017 (microtubule-based process); GO:0000070 (mitotic sister chromatid segregation); GO:0045143 (homologous chromosome segregation); GO:0030473 (nuclear migration along microtubule) | GO:0051647 (nucleus localization); GO:0000747 (conjugation with cellular fusion); GO:0051640 (organelle localization); GO:0051641 (cellular localization); GO:0000746 (conjugation); GO:0051704 (multi-organism process); GO:0007018 (microtubule-based movement); GO:0022403 |
| YPR187W | GO:0006351 (transcription, DNA-templated); GO:0006360 (transcription from RNA polymerase I promoter); GO:0006366 (transcription from RNA polymerase II promoter); GO:0006383 (transcription from RNA polymerase III promoter); GO:0042797 (tRNA transcription from RNA polymerase III promoter) | GO:0043170 (macromolecule metabolic process); GO:0008152 (metabolic process); GO:0010467 (gene expression) |
| YGL103W | GO:0006412 (translation); GO:0002181 (cytoplasmic translation); GO:0046677 (response to antibiotic); GO:0046898 (response to cycloheximide) | GO:0010467 (gene expression); GO:0043170 (macromolecule metabolic process); GO:0008152 (metabolic process) |
| YGR216C | GO:0006506 (GPI anchor biosynthetic process) | GO:0042158 (lipoprotein biosynthetic process) |
| YER157W | GO:0016236 (macroautophagy); GO:0030242 (peroxisome degradation); GO:0006886 (intracellular protein transport); GO:0006810 (transport); GO:0015031 (protein transport); GO:0032258 (CVT pathway); GO:0006888 (ER Golgi vesicle-mediated transport); GO:0006891 (intra-Golgi vesicle-mediated transport); GO:0000301 (retrograde transport within Golgi) | GO:0008104 (protein localization); GO:0051641 (cellular localization); GO:0033036 (macromolecule localization); GO:0051179 (localization); GO:0034613 (cellular protein localization) |
| YGR247W | GO:0009187 (cyclic nucleotide metabolic process) | GO:0016070 (RNA metabolic process) |
| YGL243W | GO:0006396 (RNA processing); GO:0006400 (tRNA modification); GO:0008033 (tRNA processing) | GO:0010467 (gene expression); GO:0043170 (macromolecule metabolic process); GO:0008152 (metabolic process) |
| YMR166C | GO:0055085 (transmembrane transport); GO:0006810 (transport) | GO:0051179 (localization) |
| YMR178W | GO:0008150 (biological_process); GO:0006777 (Mo-molybdopterin cofactor biosynthetic process) | GO:0044267 (cellular protein metabolic process) |
| YML077W | GO:0006914 (autophagy); GO:0006810 (transport); GO:0016192 (vesicle-mediated transport); GO:0006888 (ER vesicle- transport) | GO:0051179 (localization); GO:0051641 (cellular localization) |
| YML073C | GO:0006412 (translation); GO:0002181 (cytoplasmic translation) | GO:0043170 (macromolecule metabolic proc.); GO:0008152 (metabolic proc.) |
| YOR035C | GO:0007533 (mating type switching); GO:0030036 (actin cytoskeleton organization); GO:0008298 (intracellular mRNA localization) | GO:0030154 (cell differentiation); GO:0032505 (reproduction of a single-celled organism); GO:0000003 (reproduction) |
| YOR222W | GO:0055085 (transmembrane transport); GO:0006810 (transport); GO:0006839 (mitochondrial transport) | GO:0051179 (localization); GO:0051641 (cellular localization) |
| YNL135C | GO:0018208 (peptidyl-proline modification); GO:0000413 (protein peptidyl-prolyl isomerization); GO:0006457 (protein folding) | GO:0009092 (homoserine metabolic process) |
| YGL200C | GO:0006810 (transport); GO:0015031 (protein transport); GO:0016192 (vesicle-mediated transport); GO:0006888 (ER to Golgi vesicle-mediated transport) | GO:0051179 (localization); GO:0051641 (cellular localization) |
| YGR260W | GO:0055085 (transmembrane transport); GO:0006810 (transport); GO:0015890 (nicotinamide mononucleotide transport) | GO:0051179 (localization) |
| YPR166C | GO:0006412 (translation); GO:0032543 (mitochondrial translation) | GO:0010467 (gene expression); GO:0043170 (macromolecule metabolic proc.) |
| YKR019C | GO:0006914 (autophagy); GO:0006629 (lipid metabolic process); GO:0009267 (cellular response to starvation); GO:0000183 (chromatin silencing at rDNA); GO:0048017 (inositol lipid-mediated signaling); GO:0032258 (CVT pathway) | GO:0007154 (cell communication); GO:0023052 (signaling); GO:0034613 (cellular protein localization); GO:0008104 (protein localization); GO:0051641 (cellular localization); GO:0051179 (localization) |
| YLR348C | GO:0006810 (transport); GO:0006817 (phosphate ion transport) | GO:0051179 (localization) |
| YLR431C | GO:0006914 (autophagy); GO:0034497 (protein localization to pre-autophagosomal structure); GO:0006810 (transport); GO:0015031 (protein transport); GO:0032258 (CVT pathway) | GO:0034613 (cellular protein localization); GO:0008104 (protein localization); GO:0051179 (localization) |
| YJL004C | GO:0006810 (transport); GO:0015031 (protein transport); GO:0043001 (Golgi to plasma membrane protein transport); GO:0006895 (Golgi to endosome transport) | GO:0051179 (localization); GO:0034613 (cell protein localization); GO:0008104 (protein localization); GO:0051641 (cell localization) |
| YFL055W | GO:0055085 (transmembrane transport); GO:0003333 (amino acid transmembrane transport); GO:0006810 (transp) | GO:0051179 (localization) |
| YPR179C | GO:0006351 (transcription, DNA-templated); GO:0016575 (histone deacetylation); GO:0007059 (chromosome segregation); GO:0010978 (gene silencing involved in chronological cell aging); GO:0031047 (gene silencing by RNA) | GO:0043170 (macromolecule metabolic process); GO:0008152 (metabolic process); GO:0001300 (chronological cell aging); GO:0007568 (aging); GO:0009987 (cellular process) |
Both the already known annotations and the missing annotations identified by PPFBM are shown. A demo of PPFBM that identifies the biological process annotations of the complete yeast protein dataset is available at: http://ecesrvr.kustar.ac.ae:8080/PPFBM/
Fig. 7 The Recall, Precision, and F-value for predicting GO Biological Process annotations using a successively accumulating set of training proteins
Fig. 8 The Recall, Precision, and F-value for predicting GO Molecular Function annotations using a successively accumulating set of training proteins
We illustrate some of the concepts presented in this paper using a running example pertaining to protein PA1535. We illustrate in the running example how the molecules associated with PA1535 can be used as a vector of weights to represent the protein. In Example 1, we present the abstract of Förster et al. [