Ying Hong Li, Jing Yu Xu, Lin Tao, Xiao Feng Li, Shuang Li, Xian Zeng, Shang Ying Chen, Peng Zhang, Chu Qin, Cheng Zhang, Zhe Chen, Feng Zhu, Yu Zong Chen.
Abstract
Knowledge of protein function is important for biological, medical and therapeutic studies, but the functions of many proteins remain unknown, and improved functional prediction methods are needed. Our SVM-Prot web server employs a machine learning method to predict protein functional families from protein sequences irrespective of sequence similarity. It complements similarity-based and other methods in predicting diverse classes of proteins, including distantly related proteins and homologous proteins of different functions. Since its publication in 2003, we have made major improvements to SVM-Prot: (1) expanded coverage from 54 to 192 functional families, (2) more diverse protein descriptors for protein representation, (3) improved predictive performance owing to enriched training datasets and a wider variety of protein descriptors, (4) a newly integrated BLAST analysis option for assessing proteins in the SVM-Prot predicted functional families that are similar in sequence to a query protein, and (5) a newly added batch submission option that supports the classification of multiple proteins. In addition, two more machine learning approaches, K nearest neighbor and probabilistic neural networks, were added to facilitate collective assessment of protein functions by multiple methods. SVM-Prot can be accessed at http://bidd2.nus.edu.sg/cgi-bin/svmprot/svmprot.cgi.
Year: 2016 PMID: 27525735 PMCID: PMC4985167 DOI: 10.1371/journal.pone.0155290
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Partial list of the protein functional families covered by SVM-Prot and the prediction performance of the SVM, kNN and PNN models on the independent testing sets.
The complete list is provided in . Performance is reported as sensitivity SE = TP/(TP + FN), specificity SP = TN/(TN + FP) and precision PR = TP/(TP + FP), where TP, FN, TN and FP denote the numbers of true positives, false negatives, true negatives and false positives, respectively.
| Family Name | GO Id | Training Positive | Training Negative | Testing Positive | Testing Negative | Independent Positive | Independent Negative | SVM SE (%) | SVM SP (%) | SVM PR (%) | KNN SE (%) | KNN SP (%) | KNN PR (%) | PNN SE (%) | PNN SP (%) | PNN PR (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Actin capping | GO:0051693 | 652 | 41797 | 128 | 39584 | 102 | 36797 | 95.1 | 99.99 | 93.3 | 73.3 | 99.9 | 55.0 | 91.2 | 99.9 | 71.0 |
| Calmodulin-binding | GO:0005516 | 465 | 41405 | 223 | 39198 | 164 | 36421 | 87.2 | 99.99 | 90.5 | 70.0 | 99.4 | 41.6 | 82.9 | 99.9 | 84.0 |
| DNA recombination | GO:0006310 | 1678 | 10614 | 3382 | 18224 | 2391 | 13763 | 85.7 | 97.4 | 92.1 | 67.5 | 99.3 | 80.3 | 77.6 | 98.9 | 77.0 |
| DNA repair | GO:0006281 | 2142 | 10643 | 1179 | 17646 | 1438 | 13544 | 88.7 | 96.8 | 85.9 | 67.6 | 96.8 | 68.0 | 64.3 | 99.3 | 90.4 |
| DNA-directed DNA polymerase | GO:0003887 | 825 | 9588 | 963 | 19524 | 869 | 13900 | 81.9 | 98.7 | 88.5 | 51.1 | 99.4 | 41.2 | 80.2 | 99.7 | 66.4 |
| EC1.5 Oxidoreductases (CH-NH donors) | GO:0016645 | 276 | 8755 | 59 | 15283 | 70 | 12006 | 58.6 | 99.6 | 66.1 | 84.5 | 95.8 | 76.0 | 64.2 | 99.2 | 92.6 |
| EC2.9 Transferases (selenium-containing) | GO:0016785 | 693 | 41834 | 620 | 39620 | 617 | 36835 | 96.0 | 99.99 | 99.3 | 83.7 | 99.7 | 81.4 | 92.4 | 99.9 | 92.5 |
| EC3.7 Acting on carbon-carbon bonds | GO:0016822 | 1429 | 41786 | 760 | 39543 | 738 | 36786 | 96.5 | 99.9 | 94.6 | 84.4 | 99.4 | 78.0 | 91.2 | 99.9 | 95.2 |
| EC4.4 Carbon-sulfur lyases | GO:0016846 | 182 | 8999 | 76 | 15086 | 58 | 12031 | 60.3 | 99.9 | 83.3 | 77.0 | 99.0 | 82.7 | 83.8 | 99.2 | 86.8 |
| EC5.1 Racemases and Epimerases | GO:0016854 | 379 | 8796 | 95 | 15268 | 66 | 12020 | 53.0 | 99.4 | 53.9 | 80.7 | 93.8 | 80.0 | 69.3 | 98.7 | 94.5 |
| EC6.6 Forming nitrogen-metal bonds | GO:0051002 | 1590 | 41758 | 348 | 39529 | 336 | 36762 | 89.3 | 99.9 | 91.7 | 88.4 | 98.7 | 55.7 | 79.5 | 99.99 | 94.0 |
| Elongation factor activity | GO:0003746 | 1069 | 41788 | 938 | 39570 | 914 | 36788 | 97.5 | 99.99 | 98.8 | 95.8 | 99.6 | 83.7 | 84.1 | 99.9 | 94.0 |
| G protein coupled receptors | GO:0004930 | 927 | 8320 | 4998 | 20216 | 2532 | 14244 | 95.6 | 98.1 | 94.5 | 96.6 | 98.9 | 64.1 | 94.1 | 99.9 | 93.4 |
| Growth factor activity | GO:0008083 | 423 | 41680 | 301 | 39458 | 243 | 36696 | 88.9 | 99.9 | 88.5 | 76.7 | 99.9 | 81.9 | 86.0 | 99.9 | 86.7 |
| GTPase activation | GO:0005096 | 429 | 41584 | 207 | 39359 | 113 | 36597 | 92.9 | 99.9 | 83.3 | 61.8 | 99.6 | 42.2 | 86.7 | 99.9 | 78.4 |
| Heparin-binding | GO:0008201 | 182 | 41591 | 123 | 39344 | 92 | 36600 | 89.1 | 99.9 | 73.9 | 70.7 | 99.9 | 75.0 | 90.2 | 99.9 | 61.0 |
| Lipid degradation | GO:0016042 | 403 | 8775 | 233 | 20635 | 237 | 14701 | 78.9 | 99.9 | 97.4 | 64.8 | 99.8 | 72.0 | 75.1 | 99.9 | 89.6 |
| Lipid-binding | GO:0008289 | 274 | 8530 | 166 | 20926 | 167 | 14724 | 84.4 | 99.9 | 93.4 | 72.8 | 99.6 | 71.2 | 66.9 | 99.7 | 72.1 |
| rRNA-binding protein | GO:0019843 | 708 | 7972 | 1245 | 16044 | 101 | 11997 | 94.1 | 98.7 | 59.0 | 96.5 | 98.3 | 91.4 | 95.8 | 98.7 | 93.6 |
| Sigma factor activity | GO:0016987 | 101 | 41835 | 60 | 39616 | 54 | 36835 | 87.0 | 99.99 | 85.5 | 68.3 | 99.9 | 50.6 | 83.3 | 99.99 | 81.8 |
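
The three performance measures defined in the table caption can be computed directly from confusion-matrix counts. The following is a minimal sketch, not the SVM-Prot implementation: the names `ConfusionCounts`, `sensitivity`, `specificity`, `precision` and `as_percent` are hypothetical, and the counts in the usage lines are made-up illustration values, not results from the paper.

```python
from dataclasses import dataclass
import math


@dataclass(frozen=True)
class ConfusionCounts:
    """Confusion-matrix counts for one functional family (hypothetical container)."""
    tp: int  # true positives
    fn: int  # false negatives
    tn: int  # true negatives
    fp: int  # false positives


def _ratio(numerator: int, denominator: int) -> float:
    """Return numerator/denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else math.nan


def sensitivity(c: ConfusionCounts) -> float:
    """SE = TP / (TP + FN): fraction of family members that are recovered."""
    return _ratio(c.tp, c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """SP = TN / (TN + FP): fraction of non-members that are rejected."""
    return _ratio(c.tn, c.tn + c.fp)


def precision(c: ConfusionCounts) -> float:
    """PR = TP / (TP + FP): fraction of positive predictions that are correct."""
    return _ratio(c.tp, c.tp + c.fp)


def as_percent(value: float, digits: int = 1) -> float:
    """Convert a fraction to a percentage rounded to the given number of digits."""
    return round(100.0 * value, digits)


# Illustrative counts only (assumed values, not taken from the table).
counts = ConfusionCounts(tp=90, fn=10, tn=980, fp=20)
se = as_percent(sensitivity(counts))
sp = as_percent(specificity(counts))
pr = as_percent(precision(counts))
print(f"SE = {se}%, SP = {sp}%, PR = {pr}%")
```

Because the negative sets in the table are far larger than the positive sets, SP stays close to 100% even when a model produces a noticeable number of false positives, so PR is the more informative column when comparing SVM, KNN and PNN on the same family.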