Omer Sinan Saraç, Volkan Atalay, Rengul Cetin-Atalay.
Abstract
Functional protein annotation is an important matter for in vivo and in silico biology. Several computational methods have been proposed that make use of a wide range of features such as motifs, domains, homology, structure and physicochemical properties. No single method performs best in all functional classification problems, because the information obtained from any of these features depends on the function to be assigned to the protein. In this study, we present a novel approach that combines different methods to better represent protein function. First, we formulated the function annotation problem as a classification problem defined on 300 different Gene Ontology (GO) terms from the molecular function aspect. We present a method to form positive and negative training examples while taking into account the directed acyclic graph (DAG) structure and evidence codes of GO. We applied three different methods and their combinations. Results show that combining different methods improves prediction accuracy in most cases. The proposed method, GOPred, is available as an online computational annotation tool (http://kinaz.fen.bilkent.edu.tr/gopred).
Year: 2010 PMID: 20824206 PMCID: PMC2930845 DOI: 10.1371/journal.pone.0012382
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Sample GO DAG showing how we prepared positive and negative training data.
The double-circled node indicates the target term. Nodes with a green check mark represent terms included in the positive dataset, while those labeled with a red X represent terms included in the negative dataset.
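The selection shown in Figure 1 can be illustrated with a minimal sketch. The rule used here is an assumption, not taken from the text: the target term and its descendants are treated as positive, the target's ancestors are left out of both sets, and every remaining term is treated as negative. The toy DAG and the function names are hypothetical.

```python
from collections import defaultdict

def descendants(children, term):
    """Return every term reachable from `term` by following child edges."""
    seen, stack = set(), [term]
    while stack:
        for child in children.get(stack.pop(), ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen

def ancestors(children, term):
    """Return every term from which `term` is reachable (invert edges, then walk)."""
    parents = defaultdict(set)
    for parent, kids in children.items():
        for kid in kids:
            parents[kid].add(parent)
    return descendants(parents, term)

def split_terms(children, target):
    """Assumed rule: positives = target + descendants; ancestors are excluded;
    all other terms are negatives."""
    all_terms = set(children) | {k for kids in children.values() for k in kids}
    positive = {target} | descendants(children, target)
    negative = all_terms - positive - ancestors(children, target)
    return positive, negative

# Hypothetical toy DAG: the target "T" has two parents and two children.
toy_dag = {"root": ["A", "B"], "A": ["T", "C"], "B": ["T", "D"], "T": ["E", "F"]}
positive_terms, negative_terms = split_terms(toy_dag, "T")
```

In this toy graph the positive set is the target with its two children, the negative set is the two sibling terms, and the ancestors contribute to neither set.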
Evidence codes used by the GOA Project.
| Code | Explanation |
| IDA | Inferred from Direct Assay |
| IEP | Inferred from Expression Pattern |
| IGI | Inferred from Genetic Interaction |
| IMP | Inferred from Mutant Phenotype |
| IPI | Inferred from Physical Interaction |
| RCA | Inferred from Reviewed Computational Analysis |
| TAS | Traceable Author Statement |
| IC | Inferred by Curator |
| IEA | Inferred by Electronic Annotation |
| ISS | Inferred from Sequence or Structural Similarity |
| NAS | Non-Traceable Author Statement |
| ND | No biological Data available |
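As an illustration of how evidence codes can be taken into account when building training sets, the sketch below filters annotation records by code. Which codes are excluded is an assumption made for the example (IEA and ND here); the text does not state the set the authors used. The protein identifiers are hypothetical.

```python
# Assumption: electronic (IEA) and no-data (ND) annotations are dropped.
# The actual set of codes used for GOPred is not stated in this text.
EXCLUDED_CODES = frozenset({"IEA", "ND"})

def filter_annotations(annotations, excluded=EXCLUDED_CODES):
    """Keep (protein, go_term, evidence_code) triples whose code is not excluded."""
    return [a for a in annotations if a[2] not in excluded]

# Hypothetical annotation records.
records = [
    ("protein_1", "GO:0003677", "IDA"),
    ("protein_2", "GO:0003677", "IEA"),
    ("protein_3", "GO:0016787", "TAS"),
    ("protein_4", "GO:0016787", "ND"),
]
kept = filter_annotations(records)
```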
Features used in Pepstats-SVM and their dimensions.
| Feature | Dimension |
| Molecular Weight | 1 |
| Number of residues | 1 |
| Average residue weight | 1 |
| Isoelectric point | 1 |
| Charge | 1 |
| A280 Molar Extinction Coefficient | 1 |
| A280 Extinction Coefficient 1mg/ml | 1 |
| Improbability of expression in inclusion bodies | 1 |
| Dayhoff Statistics for each amino acid | 20 |
| Percent of tiny residues | 1 |
| Percent of small residues | 1 |
| Percent of aliphatic residues | 1 |
| Percent of aromatic residues | 1 |
| Percent of non-polar residues | 1 |
| Percent of polar residues | 1 |
| Percent of charged residues | 1 |
| Percent of basic residues | 1 |
| Percent of acidic residues | 1 |
| Total | 37 |
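The nine residue-group percentages in the table can be computed directly from a sequence. The sketch below is illustrative only: the residue groupings are assumed to follow the EMBOSS pepstats convention, and the function and variable names are hypothetical. It also checks that the feature layout adds up to 37 dimensions.

```python
# Assumed residue groupings (EMBOSS pepstats convention, including the
# ambiguity codes B and Z); not taken from the text above.
RESIDUE_GROUPS = {
    "tiny": "ACGST",
    "small": "ABCDGNPSTV",
    "aliphatic": "AILV",
    "aromatic": "FHWY",
    "non_polar": "ACFGILMPVWY",
    "polar": "DEHKNQRSTZ",
    "charged": "BDEHKRZ",
    "basic": "HKR",
    "acidic": "BDEZ",
}

# Layout of the feature vector, grouped from the table:
# 8 scalar properties, 20 Dayhoff statistics, 9 residue-group percentages.
FEATURE_DIMS = {"scalar": 8, "dayhoff": 20, "group_percent": len(RESIDUE_GROUPS)}

def group_percentages(sequence):
    """Return the percentage of residues falling in each group."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return {
        name: 100.0 * sum(seq.count(r) for r in residues) / len(seq)
        for name, residues in RESIDUE_GROUPS.items()
    }

# Hypothetical four-residue peptide.
percentages = group_percentages("AKDF")
```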
Average F1 scores, sensitivity and specificity values over 300 GO functional term classifiers.
| Method | F1 score | Sensitivity (%) | Specificity (%) | Precision |
| SPMap | 0.62 | 89.12 | 88.92 | 0.51 |
| BLAST-kNN | 0.70 | 92.07 | 92.53 | 0.59 |
| Pepstats-SVM | 0.39 | 75.47 | 75.48 | 0.29 |
| Voting | 0.71 | 90.50 | 92.85 | 0.61 |
| Mean | 0.74 | 91.11 | 93.74 | 0.65 |
| Weighted Mean | 0.77 | 91.82 | 94.79 | 0.68 |
| Addition | 0.70 | 92.72 | 92.49 | 0.60 |
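The four combination rows in the table (Voting, Mean, Weighted Mean, Addition) can be read as different ways of merging the per-classifier scores for one protein and one GO term. The following is a minimal sketch under stated assumptions: each classifier outputs a signed score where a positive value means "has the function", Addition is taken to be a plain sum, and the weights are arbitrary illustrative numbers. None of these details are confirmed by the text.

```python
def vote(scores, threshold=0.0):
    """Majority vote: True if more than half of the scores exceed the threshold."""
    positives = sum(s > threshold for s in scores)
    return positives * 2 > len(scores)

def mean(scores):
    """Unweighted average of the classifier scores."""
    return sum(scores) / len(scores)

def weighted_mean(scores, weights):
    """Average of the scores, each scaled by a per-classifier weight."""
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)

def addition(scores):
    """Plain sum of the classifier scores (assumed meaning of 'Addition')."""
    return sum(scores)

# Hypothetical scores from three classifiers for one protein and one term.
example_scores = [0.8, 0.4, -0.2]
example_weights = [3.0, 2.0, 1.0]  # illustrative only

combined = {
    "voting": vote(example_scores),
    "mean": mean(example_scores),
    "weighted_mean": weighted_mean(example_scores, example_weights),
    "addition": addition(example_scores),
}
```

With these toy numbers the weighted mean is pulled toward the first classifier, which carries the largest weight; that is the intended effect of weighting more reliable classifiers more heavily.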
Figure 2. Histogram of F1 scores of BLAST-kNN and Weighted Mean methods for 300 GO terms.
Figure 3. Comparison of sensitivity and specificity values with and without threshold relaxation.
The first and second columns are the sensitivity without and with threshold relaxation; the third and fourth columns are the specificity without and with threshold relaxation.
Changes in sensitivity and specificity, total change and change in F1 score when threshold relaxation is applied.
| Methods | Change in sensitivity | Change in specificity | Total change | Change in F1 score |
| SPMap | 20.80 | −9.94 | 10.86 | 0.14 |
| BLAST-kNN | 4.13 | −3.44 | 0.69 | −0.10 |
| Pepstats-SVM | 62.66 | −21.17 | 41.49 | 0.26 |
| Voting | 21.68 | −5.86 | 15.82 | −0.14 |
| Mean | 24.02 | −4.95 | 19.07 | −0.13 |
| Weighted Mean | 15.56 | −3.96 | 11.60 | −0.14 |
| Addition | 25.63 | −6.19 | 19.44 | −0.05 |
A positive value indicates an increase whereas a negative value indicates a decrease.
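The trade-off in this table can be reproduced on toy data. The sketch below assumes that threshold relaxation means lowering the decision threshold of a classifier, which is an interpretation and not a definition given in the text. The scores, labels and thresholds are hypothetical. The total change is the sum of the sensitivity and specificity changes, as in the table.

```python
def evaluate(scores, labels, threshold):
    """Return (sensitivity %, specificity %, F1) at a given decision threshold."""
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted = score >= threshold
        if predicted and label:
            tp += 1
        elif predicted and not label:
            fp += 1
        elif label:
            fn += 1
        else:
            tn += 1
    sensitivity = 100.0 * tp / (tp + fn)
    specificity = 100.0 * tn / (tn + fp)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return sensitivity, specificity, f1

# Hypothetical scores: 4 positive examples followed by 8 negative examples.
toy_scores = [0.9, 0.6, 0.3, 0.1, 0.4, 0.2, -0.1, -0.5, -0.3, -0.6, -0.7, -0.8]
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

strict = evaluate(toy_scores, toy_labels, threshold=0.5)
relaxed = evaluate(toy_scores, toy_labels, threshold=0.25)

delta_sensitivity = relaxed[0] - strict[0]
delta_specificity = relaxed[1] - strict[1]
total_change = delta_sensitivity + delta_specificity
delta_f1 = relaxed[2] - strict[2]
```

On this toy data, lowering the threshold gains 25 points of sensitivity at a cost of 12.5 points of specificity, a net positive total change of the kind reported for every method in the table.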
GOPred, ConFunc, PFP and GOtcha annotations for 12 human gene entries from the NCBI gene database.
| Gene Symbol | Literature Report | GOPred annotations: Probability | ConFunc (GO c-value) | PFP: Probability | GOtcha: Est. likelihood % |
| DDX11L1 | a protein from novel transcript family from human subtelomeric regions with unestablished function | hydrolase activity, acting on ester bonds: | RNA binding (c:4.5569e-05), nucleic acid binding (c:0.00020006), binding (c:0.00020006) | hydrolase activity, acting on acid anhydrides:100%, purine nucleotide binding:100%, binding:98% | catalytic activity:52%, DNA helicase activity:52%, hydrolase activity:52%, helicase activity:52%, nucleoside-triphosphatase activity:52% |
| KILLIN | Nuclear inhibitor of DNA synthesis with high affinity DNA binding | Exonuclease activity: | No results generated because insufficient Annotated sequences were identified | DNA binding:34%, nucleotide binding:26%, ATP binding:26% | Molecular function child node absent |
| GLRX | glutaredoxin-like, oxidoreductase | oxidoreductase activity: | glutathione disulfide oxidoreductase activity (c:1.0138e-08), peptide disulfide oxidoreductase activity (c:1.0138e-08), disulfide oxidoreductase activity (c:7.3644e-08), oxidoreductase activity (c:1.0175e-07), catalytic activity (c:1.0175e-07) | purine nucleotide binding:97%, porter activity:96%, binding:89%, steroid sulfotransferase activity:87% | Molecular function child node absent |
| FINP2 | AMPK and FLCN interaction ( | enzyme activator activity: | No results generated because insufficient Annotated sequences were identified | binding:88%, transition metal ion binding:80%, cation binding:71% | Molecular function child node absent |
| KIF18B | microtubule-associated motor protein that uses ATP | microtubule binding: | motor activity (c:1.3769e-17) | purine nucleotide binding:97%, porter activity:96%, binding:89%, steroid sulfotransferase activity:87% | binding:33%, ribonucleotide binding:33%, nucleotide binding:33%, purine nucleotide binding:33%, purine ribonucleotide binding:33% |
| HELT | transcription regulator activity | protein homodimerization activity: | DNA binding (c:1.2677e-09), nucleic acid binding (c:1.2677e-09), binding (c:1.2677e-09) | hydrolase activity, acting on acid anhydrides:100%, purine nucleotide binding:100%, binding:98% | transcription regulator activity:23%, binding:23%, DNA binding:23%, nucleic acid binding:23%, transcription factor activity:23% |
| RGL4 | guanine nucleotide dissociation | guanyl-nucleotide exchange factor: | receptor binding (c:1.3056e-10), protein binding (c:2.4304e-09), binding (c:2.4283e-09), Molecular Function (c:4.5798e-10) | binding:78%, cation binding:71%, trimethylamine-N-oxide reductase (cytochrome c) activity:65%, nucleic acid binding:63% | Molecular function child node absent |
| PGAP1 | GPI inositol-deacylase | lipase activity: | phosphoric ester hydrolase activity (c:0), nuclease activity (c:0), hydrolase activity, acting on ester bonds (c:3.0683e-17), hydrolase activity (c:1.5396e-17), catalytic activity (c:1.5396e-17) | cation binding:62%, binding:59%, ion binding:58%, metal ion binding:52% | Molecular function child node absent |
| COBRA1 | member of negative elongation factor complex during transcription, inhibitor of AP1 | ribonucleotide binding: | binding (c:3.361e-18) | binding:88%, transition metal ion binding:80%, cation binding:71%, nucleic acid binding:68% | Molecular function child node absent |
| GCK | phosphorylation of glucose during glycolysis | carbohydrate kinase activity: | glucose binding (c:2.7105e-19), monosaccharide binding (c:2.7105e-19), sugar binding (c:2.7105e-19), carbohydrate binding (c:2.7105e-19), binding (c:2.7555e-14) | hexokinase activity:100%, binding:97%, transferase activity, transferring phosphorus-containing groups:89%, catalytic activity:80%, nucleotide binding:77%, glucokinase activity:76% | binding:38%, nucleotide binding:38%, adenyl ribonucleotide binding:38%, ribonucleotide binding:38%, purine nucleotide binding:38%, ATP binding:38% |
| TP53 | p53 tumor suppressor, transcription regulation | chromatin binding: | transcription factor activity (c:3.0644e-12), DNA binding (c:1.0205e-11), nucleic acid binding (c:1.0205e-11), binding (c:1.0205e-11), transcription regulator activity (c:1.1331e-11) | purine nucleotide binding:100%, DNA strand annealing activity:100%, binding:99%, nucleic acid binding:98%, transcription factor activity:94%, single-stranded DNA binding:90% | binding:33%, ion binding:33%, metal ion binding:33%, cation binding:33%, zinc ion binding:33%, transition metal ion binding:33% |
| HRAS | v-Ha-ras Harvey rat sarcoma viral oncogene homolog | protein C-terminus binding: | GTP-dependent protein binding (c:2.1523e-09), protein binding (c:7.4218e-09), binding (c:1.003e-06) | hydrolase activity, acting on acid anhydrides:100%, purine nucleotide binding:100%, guanyl nucleotide binding:100%, GTP binding:99%, binding:99% | binding:34%, nucleotide binding:34%, purine nucleotide binding:34%, ribonucleotide binding:34%, purine ribonucleotide binding:34%, guanyl ribonucleotide binding:31%, GTP binding:31% |
Figure 4. GOPred output for the HELT (HES/HEY-like transcription factor) protein.