Jiren Wang, Wing-Kin Sung, Arun Krishnan, Kuo-Bin Li.
Abstract
BACKGROUND: Predicting the subcellular localization of proteins is important for determining their function. Previous work on predicting protein localization in Gram-negative bacteria has obtained good results overall, but these methods had relatively low accuracy for extracellular proteins. This paper studies ways to improve the accuracy of predicting extracellular localization in Gram-negative bacteria.
Year: 2005 PMID: 16011808 PMCID: PMC1190155 DOI: 10.1186/1471-2105-6-174
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Number of protein sequences in different sites
| cytoplasmic | 248 |
| inner membrane | 268 |
| periplasmic | 244 |
| outer membrane | 352 |
| extracellular | 190 |
| cytoplasmic / inner membrane | 14 |
| inner membrane / periplasmic | 49 |
| outer membrane / extracellular | 76 |
| All sites | 1441 |
Prediction recall for a single localization.
| Cytoplasmic | 94.8% (235 / 248) |
| Extracellular | 83.2% (158 / 190) |
| Innermembrane | 88.1% (236 / 268) |
| Outermembrane | 93.2% (328 / 352) |
| Periplasmic | 86.9% (212 / 244) |
| Overall recall | 89.8% (1169/1302) |
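The percentages in this table follow directly from the counts in parentheses. A minimal sketch of that arithmetic, assuming recall here means correctly predicted proteins divided by all proteins at that site (the variable and function names are illustrative, not from the paper):

```python
# (correct, total) per site, copied from the single-localization table.
single_site = {
    "cytoplasmic": (235, 248),
    "extracellular": (158, 190),
    "inner membrane": (236, 268),
    "outer membrane": (328, 352),
    "periplasmic": (212, 244),
}


def recall(correct, total):
    """Fraction of proteins at a site that were predicted at that site."""
    return correct / total


per_site = {site: recall(c, t) for site, (c, t) in single_site.items()}

# Overall recall pools the counts rather than averaging the percentages.
total_correct = sum(c for c, _ in single_site.values())
total_proteins = sum(t for _, t in single_site.values())
overall = recall(total_correct, total_proteins)

for site, value in per_site.items():
    print(f"{site}: {value:.1%}")
print(f"overall: {overall:.1%} ({total_correct}/{total_proteins})")
```

Pooling the counts weights each site by its size, which is why the overall figure is not the plain mean of the five per-site percentages.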
Prediction recall for dual localizations.
| Cytoplasmic / innermembrane | 92.9% (13/14) |
| Outermembrane / extracellular | 98.7% (75/76) |
| Periplasmic / innermembrane | 75.5% (37/49) |
| Overall recall | 89.9% (125/139) |
Performance comparison of the P-CLASSIFIER, PSORT-B, and CELLO methods.
| Cytoplasmic | 94.6% | 0.85 | 90.7% | 0.85 | 69.4% | 0.79 |
| Extracellular | 86.0% | 0.89 | 78.9% | 0.82 | 70.0% | 0.79 |
| Innermembrane | 87.1% | 0.92 | 88.4% | 0.92 | 78.7% | 0.85 |
| Outermembrane | 93.6% | 0.90 | 94.6% | 0.90 | 90.3% | 0.93 |
| Periplasmic | 85.9% | 0.81 | 86.9% | 0.80 | 57.6% | 0.69 |
| Overall recall | 89.8% | - | 88.9% | - | 74.8% | - |
Prediction recall for dual localizations when "half" predictions are only counted as half correct.
| Cytoplasmic / innermembrane | 75.0% (10.5/14) |
| Outermembrane / extracellular | 84.2% (64/76) |
| Periplasmic / innermembrane | 38.8% (19/49) |
| Overall recall | 67.3% (93.5/139) |
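Comparing this table with the earlier dual-localization table gives a rough picture of how many predictions matched both sites and how many matched only one. The sketch below assumes that the earlier table counted a one-site match as a full hit, and that this table scores it as 0.5; both are readings of the captions rather than statements from the paper, and the names are illustrative.

```python
# (lenient_hits, half_credit_score, total) per dual-site class, copied from
# the two dual-localization tables.
dual = {
    "cytoplasmic / inner membrane": (13, 10.5, 14),
    "outer membrane / extracellular": (75, 64.0, 76),
    "periplasmic / inner membrane": (37, 19.0, 49),
}


def split_hits(lenient_hits, half_credit_score):
    """Recover (full, half) match counts.

    ASSUMPTION: lenient_hits = full + half and
    half_credit_score = full + 0.5 * half.
    """
    half = round(2 * (lenient_hits - half_credit_score))
    full = lenient_hits - half
    return full, half


breakdown = {name: split_hits(hits, score) for name, (hits, score, _) in dual.items()}

half_credit_total = sum(full + 0.5 * half for full, half in breakdown.values())
overall_half_credit = half_credit_total / sum(total for _, _, total in dual.values())

for name, (full, half) in breakdown.items():
    print(f"{name}: {full} both sites, {half} one site only")
print(f"overall half-credit recall: {overall_half_credit:.1%}")
```

Under these assumptions, most periplasmic / inner membrane hits would be one-site matches, which would explain the large drop for that class.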
An example of clustering 20 amino acids into 4 groups.
| Group 1 members | Grouping score | Next move |
| (A, G, I, L, M, P, V) | 71.2413% | Move 'G' from group 1 to group 4 |
| (A, I, L, M, P, V) | 74.0941% | Move 'A' from group 1 to group 4 |
| (I, L, M, P, V) | 75.9445% | Move 'P' from group 1 to group 4 |
| (I, L, M, V) | 77.5636% | Move 'C' from group 2 to group 3 |
| (I, L, M, V) | 78.4888% | Move 'Q' from group 2 to group 3 |
| (I, L, M, V) | 78.9514% | Move 'Y' from group 4 to group 3 |
| (I, L, M, V) | 79.0285% | Reach local maximal grouping score and stop. |
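The moves in the table can be replayed on a concrete grouping to see how group 1 shrinks to (I, L, M, V). In this sketch only the starting content of group 1 comes from the table; the starting contents of groups 2 to 4 are hypothetical, chosen only so that every listed move is legal, and the `move` helper is illustrative.

```python
def move(groups, residue, src, dst):
    """Return a new grouping with `residue` moved from group `src` to `dst`."""
    if residue not in groups[src]:
        raise ValueError(f"{residue!r} is not in group {src}")
    updated = {gid: set(members) for gid, members in groups.items()}
    updated[src].remove(residue)
    updated[dst].add(residue)
    return updated


# Group 1 is taken from the first table row; groups 2-4 are HYPOTHETICAL.
start = {
    1: set("AGILMPV"),
    2: set("CNQST"),
    3: set("DEHKR"),
    4: set("FWY"),
}

# The six moves listed in the table, in order.
moves = [
    ("G", 1, 4),
    ("A", 1, 4),
    ("P", 1, 4),
    ("C", 2, 3),
    ("Q", 2, 3),
    ("Y", 4, 3),
]

final = start
for residue, src, dst in moves:
    final = move(final, residue, src, dst)

print("group 1 after all moves:", sorted(final[1]))
```

Each move relocates exactly one amino acid, so the total number of residues across the four groups stays at 20 throughout.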
Algorithm for searching amino acid subalphabets
| 1 | current_node ← the initial group assignment by dividing the 20 amino acids into Ng groups. |
| 2 | REPEAT |
| 3 | best_node ← current_node |
| 4 | REPEAT |
| 5 | current_node ← best_node |
| 6 | generate all child nodes of the current node in the search tree. |
| 7 | best_node ← the child node with the highest |
| 8 | UNTIL |
| 9 | IF |
| 10 | current_node ← randomly re-generate initial group assignment |
| 11 | ENDIF |
| 12 | UNTIL |
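The pseudocode above reads as hill climbing with random restarts over group assignments. The following is a minimal sketch under several assumptions: a child node is taken to be any assignment reached by moving one amino acid to another group, the inner loop is assumed to stop when no child scores higher, and the restart condition is replaced by a fixed number of restarts. The scoring function is passed in; `toy_score` is a made-up stand-in and not the paper's grouping score.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def random_assignment(n_groups, rng):
    """Randomly split the 20 amino acids into n_groups non-empty groups."""
    letters = list(AMINO_ACIDS)
    rng.shuffle(letters)
    # The first n_groups letters seed one group each so that none is empty.
    assignment = {aa: i for i, aa in enumerate(letters[:n_groups])}
    for aa in letters[n_groups:]:
        assignment[aa] = rng.randrange(n_groups)
    return assignment


def children(assignment, n_groups):
    """Yield every assignment reachable by moving one amino acid,
    skipping moves that would leave a group empty."""
    sizes = [0] * n_groups
    for group in assignment.values():
        sizes[group] += 1
    for aa, src in assignment.items():
        if sizes[src] == 1:
            continue
        for dst in range(n_groups):
            if dst != src:
                child = dict(assignment)
                child[aa] = dst
                yield child


def hill_climb(score, n_groups, restarts=5, seed=0):
    """Greedy ascent from several random starts; returns the best local maximum."""
    rng = random.Random(seed)
    best_overall, best_score = None, float("-inf")
    for _ in range(restarts):
        current = random_assignment(n_groups, rng)
        current_score = score(current)
        while True:
            best_child = max(children(current, n_groups), key=score)
            child_score = score(best_child)
            if child_score <= current_score:
                break  # no child improves: local maximum reached
            current, current_score = best_child, child_score
        if current_score > best_score:
            best_overall, best_score = current, current_score
    return best_overall, best_score


def toy_score(assignment):
    """HYPOTHETICAL score: rewards a group holding exactly I, L, M and V."""
    target = set("ILMV")
    home = assignment["I"]
    members = {aa for aa, group in assignment.items() if group == home}
    return len(members & target) - len(members - target)


best, best_score = hill_climb(toy_score, n_groups=4, restarts=3, seed=1)
```

In practice the score would be expensive to evaluate, so caching scores per assignment would be a natural refinement of this sketch.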
Algorithm for SVM subset selection
| 1 | Let M = {SVM1, SVM2, ..., SVMN} be the set of candidate SVMs |
| 2 | Let Scoremax = V(S, M) and Setmax = M |
| 3 | FOR i = N-1 to 1 |
| 4 | Vmax = max{V(S, M - {SVMr}) : SVMr ∈ M, 1 ≤ r ≤ N} |
| 5 | IF V(S, M - {SVMj}) == Vmax (1 ≤ j ≤ N) THEN |
| 6 | M = M - {SVMj} |
| 7 | ENDIF |
| 8 | IF Vmax ≥ Scoremax THEN |
| 9 | Scoremax = Vmax |
| 10 | Setmax = M |
| 11 | ENDIF |
| 12 | END FOR |
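The selection procedure above is a greedy backward elimination: at each step it drops the SVM whose removal leaves the highest-scoring set, and it remembers the best set seen. A minimal sketch follows, in which `evaluate` stands in for V(S, M) and the weights in the usage example are invented purely for illustration.

```python
def select_svm_subset(candidates, evaluate):
    """Greedy backward elimination over a collection of classifiers.

    candidates: iterable of hashable, sortable classifier identifiers.
    evaluate:   callable taking a frozenset of identifiers and returning a
                score (a stand-in for V(S, M) in the pseudocode).
    """
    current = frozenset(candidates)
    best_set, best_score = current, evaluate(current)
    while len(current) > 1:
        # Try dropping each member; keep the removal that scores highest.
        # Sorting makes tie-breaking deterministic.
        score, dropped = max(
            ((evaluate(current - {c}), c) for c in sorted(current)),
            key=lambda pair: pair[0],
        )
        current = current - {dropped}
        if score >= best_score:  # ties favour the smaller set
            best_set, best_score = current, score
    return best_set, best_score


# HYPOTHETICAL usage: each classifier adds or subtracts a fixed amount.
weights = {"a": 3, "b": 2, "c": -1, "d": -2}


def toy_evaluate(subset):
    return sum(weights[name] for name in subset)


best_set, best_score = select_svm_subset(weights, toy_evaluate)
print(sorted(best_set), best_score)
```

The loop keeps eliminating even after the score starts to fall, as the pseudocode does, so a later and smaller set can still replace the current best if it scores at least as well.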
The encoding methods of input vectors in the fifteen selected SVMs.
| 1 | 1-gram with 2 partitioned parts |
| 2 | 1-gram with 3 partitioned parts |
| 3 | 1-gram with 4 partitioned parts |
| 4 | 1-gram with 4 partitioned parts (apply feature selection to No. 3) |
| 5 | 1-gram with 6 partitioned parts |
| 6 | 2-gram without any gaps |
| 7 | 2-gram without any gaps (apply feature selection to No. 6) |
| 8 | 2-gram with one gap |
| 9 | 3-gram with 6 merged groups |
| 10 | 3-gram with 7 merged groups |
| 11 | 3-gram with 8 merged groups |
| 12 | 4-gram with 4 merged groups |
| 13 | 4-gram with 4 merged groups |
| 14 | 4-gram with 4 merged groups |
| 15 | 4-gram with 4 merged groups (apply feature selection to No. 14) |
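The table names three kinds of encoding without defining them here. The sketch below shows one plausible reading of each, and every detail is an assumption: "partitioned parts" is taken to mean splitting the sequence into consecutive segments and concatenating their compositions, "gap" to mean the number of residues skipped between the two members of a pair, and "merged groups" to mean replacing residues by group labels before counting n-grams. The example sequence and grouping are hypothetical.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq):
    """Relative frequency of each of the 20 amino acids in seq."""
    counts = Counter(seq)
    n = max(len(seq), 1)
    return [counts[aa] / n for aa in AMINO_ACIDS]


def partitioned_1gram(seq, parts):
    """Concatenate the 1-gram composition of `parts` consecutive segments."""
    bounds = [round(i * len(seq) / parts) for i in range(parts + 1)]
    vector = []
    for start, end in zip(bounds, bounds[1:]):
        vector.extend(composition(seq[start:end]))
    return vector


def gapped_2gram(seq, gap=0):
    """Frequencies of residue pairs separated by `gap` intervening positions."""
    pairs = Counter(zip(seq, seq[gap + 1:]))
    total = max(sum(pairs.values()), 1)
    return [pairs[(a, b)] / total for a, b in product(AMINO_ACIDS, repeat=2)]


def grouped_ngram(seq, groups, n):
    """n-gram frequencies after replacing each residue by its group index."""
    lookup = {aa: i for i, group in enumerate(groups) for aa in group}
    reduced = [lookup[aa] for aa in seq]
    grams = Counter(tuple(reduced[i:i + n]) for i in range(len(reduced) - n + 1))
    total = max(sum(grams.values()), 1)
    return [grams[g] / total for g in product(range(len(groups)), repeat=n)]


# HYPOTHETICAL inputs, for illustration only.
example_seq = "MKTLLVAGICDEHNPQRSFWY"
example_groups = ["ILMV", "CNQST", "DEHKR", "AFGPWY"]

v_part = partitioned_1gram(example_seq, parts=2)    # 2 x 20 features
v_gap = gapped_2gram(example_seq, gap=1)            # 20 x 20 features
v_group = grouped_ngram(example_seq, example_groups, n=4)  # 4^4 features
print(len(v_part), len(v_gap), len(v_group))
```

Merging residues into groups keeps the feature count manageable for longer n-grams: with 4 groups a 4-gram needs 256 features, whereas the full 20-letter alphabet would need 160,000.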