Shengnan Tang, Tonghua Li, Peisheng Cong, Wenwei Xiong, Zhiheng Wang, Jiangming Sun.
Abstract
Knowledge of the subcellular localizations (SCLs) of plant proteins relates to their functions and aids in understanding the regulation of biological processes at the cellular level. We present PlantLoc, a highly accurate and fast webserver for predicting the multi-label SCLs of plant proteins. The PlantLoc server has two innovative features: it builds localization motif libraries by a recursive method without alignment or Gene Ontology information, and it uses a simple architecture to identify plant protein SCLs rapidly and accurately without a machine learning algorithm. PlantLoc reports the predicted SCLs, confidence estimates, and the substantiality motif together with its position in the sequence. When benchmarked on a new test dataset against other plant SCL prediction webservers, PlantLoc achieved the highest accuracy in identifying plant protein SCLs (overall accuracy of 80.8%). The ability of PlantLoc to predict multiple sites was also significantly higher than that of any other webserver. The predicted substantiality motifs of queries also have great potential for analyzing relationships with protein functional regions. The PlantLoc server is available at http://cal.tongji.edu.cn/PlantLoc/.
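To make the motif-library idea concrete, here is a minimal Python sketch of how a query sequence might be matched against per-location motif libraries and how the motif positions could be reported. It is not the PlantLoc implementation: the library contents, the function names `find_motifs` and `score_locations`, and the scoring rule (summing the training-set frequencies of matched motifs and normalizing them) are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical toy library: location -> {motif: training-set frequency}.
# The motifs and counts are invented for illustration only.
TOY_LIBRARY = {
    "chloroplast": {"MASS": 12, "SSLA": 7},
    "nucleus": {"KRKR": 9, "RKRK": 5},
}


def find_motifs(sequence, library):
    """Return {location: [(motif, start_index, frequency), ...]} for exact motif hits."""
    hits = defaultdict(list)
    for location, motifs in library.items():
        for motif, freq in motifs.items():
            start = sequence.find(motif)
            while start != -1:
                hits[location].append((motif, start, freq))
                # Continue one position later so overlapping hits are found too.
                start = sequence.find(motif, start + 1)
    return dict(hits)


def score_locations(sequence, library):
    """Assumed scoring rule: sum matched motif frequencies, then normalize to 1."""
    hits = find_motifs(sequence, library)
    raw = {loc: sum(freq for _, _, freq in found) for loc, found in hits.items()}
    total = sum(raw.values())
    return {loc: score / total for loc, score in raw.items()} if total else {}
```

Calling `score_locations("MASSKRKRK", TOY_LIBRARY)` returns a normalized score for each location with at least one hit, and `find_motifs` gives the 0-based start index of every matched motif. A multi-label prediction could then keep every location whose score exceeds a chosen threshold, which is again an assumption rather than the published procedure.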
Year: 2013 PMID: 23729470 PMCID: PMC3692052 DOI: 10.1093/nar/gkt428
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. The numbers of plant proteins in the training and testing datasets. The STrain (4436 entries, the sum of the numbers in the cylinders) and STest (230 entries) sets are divided by time (bottom). The names and numbers of sequences of the 11 subcellular domains of plant proteins are shown in different colors (top). The 36 651 entries (bottom middle) obtained by ‘similarity’ annotation were used in the selection of LMs (see text).
Figure 2. The process of the LM library building strategy. The LM algorithm is an enumeration and merging procedure; prefix-of-seed and suffix-of-seed are marked in tangerine. In LM selection, the negative set is enlarged. The LM libraries contain LMs (expressed as characters) and their frequencies in the training sets.
Figure 3. Comparison of results with other methods on STest.
Performance of PlantLoc and other webservers on SMS
| | PlantLoc | iLoc-Plant | Yloc | mGoasum | ngLOC | WoLF PSORT | WegoLoc |
|---|---|---|---|---|---|---|---|
| MSA (%) | 86.3 | 25.5 | 35.3 | 41.2 | 39.2 | 58.8 | 64.7 |
| RA (%) | 100.0 | 50.0 | 75.0 | 72.4 | 27.8 | 42.9 | 45.8 |
Figure 4. A screenshot of PlantLoc output and the substantiality motifs obtained for Q8S8N6 (protein ID). The prediction probability is shown as a graph, together with the identified SCL(s) and the probability of each localization.
Figure 5. The substantiality motifs with annotation from UniProtKB. Characters colored blue are annotated in UniProtKB. Characters colored red are substantiality motifs for EXC and GOL.