Josefine Sprenger, J Lynn Fink, Rohan D Teasdale.
Abstract
BACKGROUND: Determination of the subcellular location of a protein is essential to understanding its biochemical function. This information can provide insight into the function of hypothetical or novel proteins. These data are difficult to obtain experimentally but have become especially important since many whole genome sequencing projects have been finished and many resulting protein sequences are still lacking detailed functional information. In order to address this paucity of data, many computational prediction methods have been developed. However, these methods have varying levels of accuracy and perform differently based on the sequences that are presented to the underlying algorithm. It is therefore useful to compare these methods and monitor their performance.
Year: 2006 PMID: 17254308 PMCID: PMC1764480 DOI: 10.1186/1471-2105-7-S5-S3
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
The distribution of individual subcellular locations in the SP3763 and LOC2145 datasets
| Subcellular location | SP3763 (n) | SP3763 (%) | LOC2145 (n) | LOC2145 (%) |
| | 1147 | 26.8 | 559 | 25.5 |
| | 637 | 14.9 | 87 | 4.0 |
| | 347 | 8.1 | 175 | 8.0 |
| | 1547 | 36.1 | 206 | 9.4 |
| | 396 | 9.2 | 703 | 32.1 |
| | 96 | 2.2 | 155 | 7.1 |
| | 75 | 1.7 | 163 | 7.4 |
| | 20 | 0.5 | 57 | 2.6 |
| | 21 | 0.5 | 86 | 3.9 |
Some proteins have been reported to localize to multiple subcellular locations and are thus represented multiple times in the table.
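Because multi-location proteins are counted once per location, the percentages in the table appear to be fractions of the total number of location annotations in each column rather than of the number of proteins. The following minimal Python sketch checks this against the counts above. The variable and function names are illustrative, and reading the numeric suffix of each dataset name as its protein count is an assumption.

```python
# Counts per subcellular location, copied from the table rows in order.
sp3763_counts = [1147, 637, 347, 1547, 396, 96, 75, 20, 21]
loc2145_counts = [559, 87, 175, 206, 703, 155, 163, 57, 86]


def percentages(counts):
    """Return each count as a percentage of the column total (1 decimal)."""
    total = sum(counts)
    return [round(100 * c / total, 1) for c in counts]


# Column totals count a multi-location protein once per location, so they
# can exceed the number of proteins in the dataset.
print(sum(sp3763_counts), percentages(sp3763_counts))
print(sum(loc2145_counts), percentages(loc2145_counts))
```

The column totals come out at 4286 and 2191, both larger than the figures in the dataset names, which is consistent with some proteins being represented more than once.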
Evaluation of subcellular prediction methods on SP3763 (A) and LOC2145 (B)
| | Sens | Spec | Sens | Spec | Sens | Spec | Sens | Spec | Sens | Spec | Sens | Spec |
| (A) SP3763, overall | 0.58 | | 0.51 | | 0.77 (0.65) | | 0.49 | | 0.61 | | | |
| | 0.83 | 0.63 | 0.55 | 0.79 | 0.87 (0.80) | 0.91 | 0.62 | 0.78 | 0.75 | 0.74 | 0.11 | 0.30 |
| | 0.27 | 0.40 | 0.51 | 0.39 | 0.57 (0.49) | 0.56 | 0.41 | 0.40 | 0.40 | 0.41 | 0.11 | 0.17 |
| | 0.45 | 0.51 | 0.41 | 0.53 | 0.66 (0.55) | 0.81 | 0.50 | 0.50 | 0.45 | 0.41 | 0.11 | 0.09 |
| | 0.47 | 0.82 | 0.41 | 0.81 | 0.70 (0.58) | 0.90 | 0.30 | 0.88 | 0.62 | 0.84 | 0.11 | 0.41 |
| | 0.38 | 0.28 | 0.38 | 0.26 | 0.10 (0.07) | 0.38 | 0.42 | 0.33 | 0.45 | 0.36 | 0.11 | 0.11 |
| | 0.04 | 0.80 | 0.08 | 0.08 | 0.48 (0.41) | 0.45 | 0.24 | 0.12 | 0.02 | 0.33 | 0.11 | 0.03 |
| | 0.03 | 0.22 | 0.16 | 0.08 | 0.57 (0.43) | 0.19 | 0.32 | 0.11 | 0.05 | 0.05 | 0.11 | 0.02 |
| | 0.30 | 0.67 | 0.55 | 0.07 | 0.46 (0.30) | 0.55 | 0.55 | 0.08 | 0.05 | 0.03 | 0.11 | 0.01 |
| | 0.24 | 0.10 | 0.24 | 0.05 | 0.50 (0.43) | 0.19 | 0.19 | 0.02 | 0 | 0 | 0.11 | 0.01 |
| (B) LOC2145, overall | 0.44 | | 0.43 | | 0.56 (0.43) | | 0.45 | | 0.43 | | | |
| | 0.62 | 0.55 | 0.36 | 0.70 | 0.67 (0.57) | 0.92 | 0.42 | 0.71 | 0.49 | 0.71 | 0.11 | 0.26 |
| | 0.23 | 0.08 | 0.39 | 0.08 | 0.63 (0.51) | 0.18 | 0.34 | 0.10 | 0.32 | 0.07 | 0.11 | 0.04 |
| | 0.60 | 0.48 | 0.54 | 0.57 | 0.83 (0.77) | 0.80 | 0.59 | 0.43 | 0.52 | 0.43 | 0.11 | 0.08 |
| | 0.65 | 0.30 | 0.58 | 0.33 | 0.92 (0.90) | 0.34 | 0.52 | 0.52 | 0.84 | 0.33 | 0.11 | 0.10 |
| | 0.44 | 0.57 | 0.53 | 0.76 | 0.16 (0.10) | 0.76 | 0.51 | 0.74 | 0.54 | 0.65 | 0.11 | 0.33 |
| | 0.05 | 0.78 | 0.25 | 0.44 | 0.58 (0.50) | 0.73 | 0.30 | 0.33 | 0.03 | 1.00 | 0.11 | 0.07 |
| endoplasmic reticulum | 0.03 | 0.83 | 0.14 | 0.27 | 0.61 (0.48) | 0.56 | 0.31 | 0.35 | 0.07 | 0.22 | 0.11 | 0.08 |
| | 0.12 | 1.00 | 0.46 | 0.17 | 0.31 (0.20) | 0.92 | 0.51 | 0.24 | 0.04 | 0.06 | 0.11 | 0.03 |
| | 0.05 | 0.25 | 0.09 | 0.11 | 0.17 (0.13) | 0.50 | 0.17 | 0.09 | 0 | 0.06 | 0.11 | 0.04 |
The sensitivity (Sens) and specificity (Spec) achieved by each predictor were calculated overall and for individual locations. Values obtained when all unpredicted subcellular locations are regarded as failed (i.e. false negatives) are shown in parentheses. When calculating the overall sensitivity, a protein with multiple subcellular locations was counted as a true positive when any of its locations was correctly predicted.
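The overall-sensitivity rule described in this note can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function name, the use of `None` for a protein without a prediction, and the toy location labels are all assumptions. It covers only overall sensitivity, since the note does not give a formula for specificity.

```python
def overall_sensitivity(truth, predicted, unpredicted_as_failed=False):
    """Overall sensitivity, TP / (TP + FN), over a set of proteins.

    truth:     list of sets, the annotated locations of each protein
    predicted: list of predicted locations; None means no prediction (assumed encoding)
    unpredicted_as_failed: if True, count unpredicted proteins as false
        negatives (the values shown in parentheses in the table);
        otherwise leave them out of the calculation.
    """
    tp = fn = 0
    for true_locs, pred in zip(truth, predicted):
        if pred is None:
            if unpredicted_as_failed:
                fn += 1
            continue
        # A multi-location protein is a true positive if the prediction
        # matches any one of its annotated locations.
        if pred in true_locs:
            tp += 1
        else:
            fn += 1
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical toy data, not taken from SP3763 or LOC2145.
truth = [{"nucleus"}, {"nucleus", "cytoplasm"}, {"mitochondrion"}, {"cytoplasm"}]
predicted = ["nucleus", "cytoplasm", "nucleus", None]

plain = overall_sensitivity(truth, predicted)
strict = overall_sensitivity(truth, predicted, unpredicted_as_failed=True)
print(plain, strict)
```

In the toy data, two of the three predicted proteins are correct. Counting the unpredicted protein as a failure lowers the value from 2/3 to 0.5, in the same way that the parenthesised values in the table are lower than the plain ones.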