Brian R King, Suleyman Vural, Sanjit Pandey, Alex Barteau, Chittibabu Guda.
Abstract
BACKGROUND: Understanding protein subcellular localization is a necessary component toward understanding the overall function of a protein. Numerous computational methods have been published over the past decade, with varying degrees of success. Despite the large number of published methods in this area, only a small fraction of them are available for researchers to use in their own studies. Of those that are available, many are limited by predicting only a small number of organelles in the cell. Additionally, the majority of methods predict only a single location for a sequence, even though it is known that a large fraction of the proteins in eukaryotic species shuttle between locations to carry out their function.
Year: 2012 PMID: 22780965 PMCID: PMC3532370 DOI: 10.1186/1756-0500-5-351
Source DB: PubMed Journal: BMC Res Notes ISSN: 1756-0500
Eukaryotic training datasets
| Location | Code | Animal | Plant |
| --- | --- | --- | --- |
| Cytoplasm | CYT | 2513 | 481 |
| Cytoskeleton | CSK | 778 | 550 |
| Endoplasmic Reticulum | END | 870 | 121 |
| Extracellular/Secreted | EXC | 9618 | 238 |
| Golgi Apparatus | GOL | 290 | 59 |
| Lysosome | LYS | 215 | --- |
| Mitochondria | MIT | 2348 | 469 |
| Nuclear | NUC | 4216 | 630 |
| Plasma Membrane | PLA | 6006 | 351 |
| Peroxisome | POX | 183 | 50 |
| Cell Junction | JNC | 62 | --- |
| Chloroplast | CHL | --- | 4862 |
| Vacuole | VAC | --- | 131 |
| Multiple localizations | | 3309 | 304 |
Prokaryotic training datasets
| Location | Code | Gram-negative | Gram-positive |
| --- | --- | --- | --- |
| Cytoplasm | CYT | 4139 | 1776 |
| Extracellular | EXC | 263 | 292 |
| Inner Membrane | IN | 1397 | 347 |
| Outer Membrane | OUT | 344 | --- |
| Periplasm | PER | 415 | --- |
| Cell Wall | WAL | --- | 32 |
Figure 1. Details of the query output showing the top three predictions with probability scores.
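Class-wise performance of ngLOC method on eukaryotic datasets
| Location | Code | Animal Prec | Animal Sens | Animal Spec | Animal MCC | Plant Prec | Plant Sens | Plant Spec | Plant MCC |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Cytoplasm | CYT | 0.818 | 0.750 | 0.983 | 0.762 | 0.864 | 0.832 | 0.991 | 0.838 |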
| Cytoskeleton | CSK | 0.937 | 0.784 | 0.998 | 0.853 | 0.988 | 0.965 | 1.000 | 0.976 |
| Endoplasmic Reticulum | END | 0.970 | 0.785 | 0.999 | 0.869 | 0.876 | 0.645 | 0.999 | 0.748 |
| Extracellular | EXC | 0.953 | 0.946 | 0.974 | 0.922 | 0.966 | 0.723 | 0.999 | 0.831 |
| Golgi Apparatus | GOL | 0.940 | 0.593 | 1.000 | 0.745 | 1.000 | 0.509 | 1.000 | 0.712 |
| Lysosome | LYS | 0.949 | 0.693 | 1.000 | 0.810 | | | | |
| Mitochondria | MIT | 0.979 | 0.852 | 0.998 | 0.906 | 0.912 | 0.727 | 0.995 | 0.804 |
| Nuclear | NUC | 0.805 | 0.914 | 0.960 | 0.831 | 0.769 | 0.873 | 0.976 | 0.802 |
| Plasma Membrane | PLA | 0.876 | 0.957 | 0.961 | 0.890 | 0.796 | 0.866 | 0.989 | 0.822 |
| Peroxisome | POX | 0.946 | 0.760 | 1.000 | 0.847 | 0.906 | 0.580 | 1.000 | 0.724 |
| Cell Junction | JNC | 0.774 | 0.387 | 1.000 | 0.547 | | | | |
| Chloroplast | CHL | | | | | 0.946 | 0.977 | 0.899 | 0.889 |
| Vacuole | VAC | | | | | 0.844 | 0.702 | 0.998 | 0.766 |
Prec-precision; Sens-sensitivity; Spec-specificity; MCC-Matthews correlation coefficient.
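The four measures abbreviated above can be computed per class from one-vs-rest confusion counts. The following is a minimal sketch, assuming the standard definitions of each measure; the function name `class_metrics` and the example counts are hypothetical and are not taken from the ngLOC evaluation.

```python
import math

def class_metrics(tp, fp, tn, fn):
    """Return (precision, sensitivity, specificity, MCC) for one class.

    tp/fp/tn/fn are one-vs-rest confusion counts for that class.
    Ratios with an empty denominator are reported as 0.0.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return precision, sensitivity, specificity, mcc

# Hypothetical counts, for illustration only.
prec, sens, spec, mcc = class_metrics(tp=80, fp=20, tn=880, fn=20)
print(f"Prec={prec:.3f} Sens={sens:.3f} Spec={spec:.3f} MCC={mcc:.3f}")
```

Because MCC uses all four counts, a class can combine high specificity with a modest MCC when its sensitivity is low, as in the Cell Junction row.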
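Class-wise performance of ngLOC method on prokaryotic datasets
| Location | Code | Gram-negative Prec | Gram-negative Sens | Gram-negative Spec | Gram-negative MCC | Gram-positive Prec | Gram-positive Sens | Gram-positive Spec | Gram-positive MCC |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Cytoplasm | CYT | 0.887 | 0.992 | 0.785 | 0.822 | 0.888 | 0.993 | 0.668 | 0.755 |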
| Extracellular | EXC | 0.940 | 0.597 | 0.998 | 0.742 | 0.899 | 0.637 | 0.990 | 0.731 |
| Inner Membrane | IN | 0.908 | 0.830 | 0.977 | 0.835 | 0.941 | 0.648 | 0.993 | 0.754 |
| Outer Membrane | OUT | 0.987 | 0.680 | 1.000 | 0.812 | | | | |
| Periplasm | PER | 0.925 | 0.561 | 0.997 | 0.707 | | | | |
| Cell Wall | WAL | | | | | 0.786 | 0.344 | 0.999 | 0.516 |
Prec-precision; Sens-sensitivity; Spec-specificity; MCC-Matthews correlation coefficient.
Benchmarking the confidence score on eukaryotic datasets
| Dataset | Measure | | | | | | | | | | |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Animal | % of dataset | - | 5.0 | 10.5 | 5.5 | 5.4 | 5.4 | 7.8 | 11.2 | 21.0 | 28.3 |
| | % accuracy | - | 50.2 | 48.9 | 80.4 | 89.0 | 95.4 | 98.4 | 98.9 | 99.5 | 99.9 |
| | Cumulative % of data | 100.0 | 100.0 | 95.0 | 84.5 | 79.0 | 73.6 | 68.2 | 60.5 | 49.3 | 28.3 |
| | Cumulative % accuracy | 89.9 | 89.9 | 92.0 | 97.3 | 98.5 | 99.2 | 99.5 | 99.6 | 99.8 | 99.9 |
| Plant | % of dataset | - | 7.9 | 6.1 | 4.6 | 4.8 | 4.7 | 6.1 | 8.3 | 14.3 | 43.2 |
| | % accuracy | - | 40.3 | 65.8 | 83.9 | 88.8 | 94.1 | 97.8 | 99.0 | 99.9 | 100.0 |
| | Cumulative % of data | 100.0 | 100.0 | 92.1 | 86.0 | 81.4 | 76.6 | 71.9 | 65.8 | 57.5 | 43.2 |
| | Cumulative % accuracy | 91.4 | 91.4 | 95.8 | 97.9 | 98.7 | 99.3 | 99.7 | 99.9 | 100.0 | 100.0 |
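The cumulative rows appear to pool every bin from a given column through the last (highest-confidence) column. This reading is an assumption inferred from the numbers, not something the table states. A minimal sketch using the per-bin Animal values copied from the table; the helper name `cumulative_from` is hypothetical.

```python
# Per-bin values copied from the Animal rows of the table above
# (the first column, which has no data, is skipped).
pct_of_dataset = [5.0, 10.5, 5.5, 5.4, 5.4, 7.8, 11.2, 21.0, 28.3]
pct_accuracy = [50.2, 48.9, 80.4, 89.0, 95.4, 98.4, 98.9, 99.5, 99.9]

def cumulative_from(bins_pct, bins_acc):
    """For each bin i, pool bins i..end and return (coverage, accuracy) pairs.

    Coverage is the summed share of the dataset; accuracy is the
    coverage-weighted mean of the per-bin accuracies.
    """
    out = []
    for i in range(len(bins_pct)):
        coverage = sum(bins_pct[i:])
        correct = sum(p * a for p, a in zip(bins_pct[i:], bins_acc[i:]))
        out.append((coverage, correct / coverage))
    return out

for coverage, accuracy in cumulative_from(pct_of_dataset, pct_accuracy):
    print(f"{coverage:6.1f}% of data -> {accuracy:5.1f}% accuracy")
```

Since the published per-bin figures are rounded, the recomputed values agree with the cumulative rows only to within about 0.1 to 0.2 percentage points.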
Class-wise performance of ngLOC and SherLoc2 on animal test dataset
| Location | ngLOC TP | ngLOC sens (%) | SherLoc2 TP | SherLoc2 sens (%) | Total |
| --- | --- | --- | --- | --- | --- |
| CYT | 212 | 78.5 | 238 | | 270 |
| CSK | 60 | | 0 | 0.0 | 83 |
| END | 72 | | 52 | 56.5 | 92 |
| EXC | 930 | | 508 | 51.9 | 978 |
| GOL | 12 | | 3 | 15.0 | 20 |
| LYS | 6 | | 2 | 20.0 | 10 |
| MIT | 185 | | 112 | 52.3 | 214 |
| NUC | 357 | | 149 | 38.0 | 392 |
| PLA | 556 | | 275 | 47.0 | 585 |
| POX | 14 | 87.5 | 14 | 87.5 | 16 |
| Total (Single) | 2404 | | 1353 | 50.9 | 2660 |
| Total (Multi) | 249 | | 218 | 72.2 | 302 |
| TOTAL | 2653 | | 1571 | 53.0 | 2962 |
Bold letters denote better performance. TP- true positives; sens- sensitivity. Class-wise sensitivities are calculated for single-localized sequences only.
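The sensitivities that survive in the animal table above are consistent with TP divided by the final value in each row, taken here to be the number of test sequences in that class. That interpretation is an assumption. A short sketch that checks it against three rows copied from the table; the function name `sensitivity_pct` is hypothetical.

```python
def sensitivity_pct(true_positives, n_sequences):
    """Sensitivity as a percentage: correctly predicted / total in the class."""
    if n_sequences == 0:
        raise ValueError("class has no sequences")
    return 100.0 * true_positives / n_sequences

# (TP, total) pairs copied from the table: the POX row and the
# SherLoc2 values of the END and MIT rows.
rows = {"POX": (14, 16), "END": (52, 92), "MIT": (112, 214)}
for code, (tp, total) in rows.items():
    print(code, round(sensitivity_pct(tp, total), 1))
```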
Class-wise performance of ngLOC and SherLoc2 on plant test dataset (single-localized only)
| Location | ngLOC TP | ngLOC sens (%) | SherLoc2 TP | SherLoc2 sens (%) | Total |
| --- | --- | --- | --- | --- | --- |
| CYT | 45 | 84.9 | 49 | | 53 |
| END | 13 | | 9 | 58.8 | 17 |
| GOL | 4 | | 3 | 30.0 | 10 |
| CSK | 13 | | | 0.0 | 13 |
| MIT | 40 | 69.0 | 36 | | 58 |
| NUC | 63 | | 46 | 72.5 | 80 |
| PLA | 36 | | 1 | 5.0 | 40 |
| EXC | 21 | | 14 | 61.3 | 31 |
| CHL | 539 | | | 0.0 | 543 |
| VAC | 10 | | | 0.0 | 15 |
| POX | 3 | | 2 | 40.0 | 5 |
| TOTAL | 787 | | 229 | 26.5 | 865 |
Bold letters denote better performance. TP- true positives; sens- sensitivity.
Class-wise performance of ngLOC and WegoLoc on animal test dataset
| Location | ngLOC TP | ngLOC sens (%) | WegoLoc TP | WegoLoc sens (%) | Total |
| --- | --- | --- | --- | --- | --- |
| CSK | 93 | | 0 | 0.0 | 117 |
| CYT | 271 | 75.1 | 329 | | 361 |
| END | 90 | 76.9 | 93 | | 117 |
| EXC | 1298 | 94.7 | 1307 | | 1371 |
| GOL | 18 | 58.1 | 19 | | 31 |
| LYS | 22 | 81.5 | 23 | | 27 |
| MIT | 301 | 84.3 | 344 | | 357 |
| NUC | 555 | 90.8 | 581 | | 611 |
| PLA | 798 | | 689 | 83.2 | 828 |
| POX | 16 | 66.7 | 24 | | 24 |
| Total (Single) | 3462 | | 3409 | 88.5 | 3854 |
| Total (Multi) | 415 | | 400 | 82.3 | 486 |
| TOTAL | 3877 | | 3809 | 87.8 | 4340 |
Bold letters denote better performance. TP- true positives; sens- sensitivity. Class-wise sensitivities are calculated for single-localized sequences only.
Class-wise performance of ngLOC and WegoLoc on plant test dataset (single-localized data only)
| Location | ngLOC TP | ngLOC sens (%) | WegoLoc TP | WegoLoc sens (%) | Total |
| --- | --- | --- | --- | --- | --- |
| CYT | 54 | 81.8 | 63 | | 66 |
| END | 9 | | 5 | 38.5 | 13 |
| GOL | 2 | 22.2 | 4 | | 9 |
| CSK | 22 | | 0 | 0.0 | 23 |
| MIT | 39 | 75.0 | 50 | | 52 |
| NUC | 53 | 80.3 | 60 | | 66 |
| PLA | 35 | | 19 | 43.2 | 44 |
| EXC | 20 | | 14 | 46.7 | 30 |
| CHL | 587 | | 293 | 49.5 | 592 |
| VAC | 11 | 61.1 | 12 | | 18 |
| POX | 4 | | 3 | 50.0 | 6 |
| TOTAL | 836 | | 523 | 56.9 | 919 |
Bold letters denote better performance. TP- true positives; sens- sensitivity.