Jeffrey M Yunes, Patricia C Babbitt.
Abstract
Motivation: Critical evaluation of protein function prediction methods shows that data integration improves their performance, but a basic BLAST-based method is still a top contender. We sought to engineer a method that modernizes the classical approach while avoiding pitfalls common to state-of-the-art methods.
Year: 2019 PMID: 30084920 PMCID: PMC6361244 DOI: 10.1093/bioinformatics/bty672
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. Graphical summary of Effusion, using cytochrome P450 CYP1C1 (UniProt Q4ZIL6) as an example. (a) Homologs of UniProt Q4ZIL6 collected by BLAST. UniProt Q4ZIL6 has been experimentally annotated to a descendant of steroid hydroxylase activity (GO:0008395), but this annotation was among those withheld during the test phase. It has no annotation to arachidonic acid 14,15-epoxygenase activity (GO:0008404), an annotation that BLAST reports for a homolog of the query. (b) A sequence similarity network (SSN) is built from the all-by-all edges computed via DIAMOND. The network is visualized with Cytoscape (Shannon, 2003) using Organic Layout. (c) The reduced network. The layout is applied to all edges, but only the MST edges are used in the model. The resulting network is used as the topology of a probabilistic graphical model (PGM). (d) The protein function of each node is represented by a subset of GO, with each GO term represented by a Bernoulli random variable. (e) Two views of the network following inference. The left figure is shaded according to the probability of GO:0008395. The right figure is shaded according to the probability of GO:0008404. (f) Probabilities for a subset of terms for query UniProt Q4ZIL6. Probabilities are calculated for each candidate GO term for each node. Nodes are shaded from white (0%) to black (100%), except for the node representing GO:0008395, which is colored a shade of blue based on its posterior probability.
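
The reduction step in panel (c), which keeps only the MST edges of the similarity network, can be illustrated with a minimal sketch. It assumes that MST denotes a minimum spanning tree and that edge weights are distance-like (lower means more similar). The caption confirms neither assumption, and with similarity scores such as bit scores one would negate the weights instead. The function name `minimum_spanning_tree`, the toy node names, and the weights are hypothetical and are not taken from Effusion.

```python
def minimum_spanning_tree(nodes, edges):
    """Kruskal's algorithm over an undirected weighted graph.

    nodes: iterable of hashable node ids.
    edges: iterable of (weight, u, v) tuples; lower weight is assumed
           to mean "more similar" (an assumption, not from the paper).
    Returns the list of edges kept in the spanning tree (or forest).
    """
    parent = {n: n for n in nodes}

    def find(x):
        # Follow parent pointers to the set representative,
        # halving the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for weight, u, v in sorted(edges):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:  # the edge joins two components, so keep it
            parent[root_u] = root_v
            tree.append((weight, u, v))
    return tree


# Hypothetical toy network: a query protein and three homologs.
nodes = ["query", "A", "B", "C"]
edges = [
    (1.0, "query", "A"),
    (2.0, "query", "B"),
    (1.5, "A", "B"),
    (3.0, "B", "C"),
    (4.0, "A", "C"),
]
mst_edges = minimum_spanning_tree(nodes, edges)
```

In this toy case the full network has five edges and the reduced one has three, so a model built on the tree has far fewer pairwise dependencies to handle than one built on the all-by-all network.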
Raw contingency table for GO:0016790
| Protein’s annotation to GO:0016788 (GO parent) | BLAST neighbor’s annotation to GO:0016790 | Count where protein’s annotation to GO:0016790 is negative or unknown | Count where protein is positively annotated to GO:0016790 |
|---|---|---|---|
| | | 41446 | 0 |
| | | 5 | 0 |
| | | 1453 | 5 |
| | | 3 | 19 |
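
One way to read a raw contingency table like this is to normalize each row's pair of counts into the fraction of proteins positively annotated to GO:0016790. The sketch below does only that arithmetic on the four count pairs shown. Whether Effusion uses exactly this normalization is an assumption, and the function name `positive_fraction` and the optional `pseudocount` parameter are hypothetical.

```python
# Count pairs copied from the table, in row order:
# (negative or unknown, positively annotated)
raw_counts = [(41446, 0), (5, 0), (1453, 5), (3, 19)]


def positive_fraction(negative_or_unknown, positive, pseudocount=0.0):
    """Fraction of proteins in one table row that are positively annotated.

    pseudocount is a hypothetical smoothing term added to both cells;
    the default of 0.0 gives the plain empirical fraction.
    """
    total = negative_or_unknown + positive + 2 * pseudocount
    if total == 0:
        return None  # no observations in this row
    return (positive + pseudocount) / total


fractions = [positive_fraction(neg, pos) for neg, pos in raw_counts]
for (neg, pos), frac in zip(raw_counts, fractions):
    print(f"{pos} / {neg + pos} = {frac:.4f}")
```

The last row, for example, gives 19 / 22, or roughly 0.86, while the first two rows give 0 because they contain no positively annotated proteins.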
Fig. 2. Performance plots over all proteins in the test set, regardless of whether any of the methods failed to make predictions. (Left) Precision vs. Recall. (Right) Sample-Weighted Weighted Precision vs. Sample-Weighted Weighted Recall.
Fig. 3. Performance plots over treated proteins. Carroll2006 was not plotted because of its low coverage. SIFTER was not plotted due to its effect on the scale. (Left) Precision vs. Recall. (Right) Sample-Weighted Weighted Precision vs. Sample-Weighted Weighted Recall.