Paul D Taylor, Teresa K Attwood, Darren R Flower.
Abstract
Methods based on Bayesian networks were developed for protein sequence-based prediction of bacterial subcellular location, with a separate predictive algorithm for each of the eight bacterial subcellular locations. Several variants were explored. They differed in the number of residues considered within the query sequence, which ranged from the N-terminal 10 residues to the whole sequence, and in residue representation, which took the form of amino acid composition, percentage amino acid composition, or normalised amino acid composition. The accuracies of the best-performing networks were then compared with those of PSORTB. All individual location predictors outperform PSORTB except the Gram+ cytoplasmic protein predictor, for which accuracies were essentially equal, and the outer membrane protein predictor, for which PSORTB outperforms the binary predictor. The method described here is an important new approach to developing subcellular location prediction methods. It is also a new, potentially valuable tool for selecting candidate subunit vaccines.
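The composition-based residue representations named in the abstract can be illustrated with a short feature-extraction sketch. It computes raw amino acid counts and percentage composition over the first *k* residues of a query (or over the whole sequence), mirroring the sub-sequence lengths used in the tables. The function names, the hypothetical query sequence, and the choice to ignore non-standard residue letters are assumptions; the normalisation behind "normalised amino acid composition" is not described in this section, so it is not implemented here.

```python
from collections import Counter
from typing import List, Optional

# The 20 standard amino acids (one-letter codes).
# Any other letter in a sequence is ignored (an assumption of this sketch).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def n_terminal(seq: str, length: Optional[int]) -> str:
    """Return the first `length` residues, or the whole sequence if length is None."""
    seq = seq.upper()
    return seq if length is None else seq[:length]


def aa_composition(seq: str, length: Optional[int] = None) -> List[int]:
    """Count each standard amino acid in the N-terminal sub-sequence."""
    counts = Counter(n_terminal(seq, length))
    return [counts[aa] for aa in AMINO_ACIDS]


def percentage_composition(seq: str, length: Optional[int] = None) -> List[float]:
    """Express each count as a percentage of the standard residues counted."""
    counts = aa_composition(seq, length)
    total = sum(counts)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [100.0 * c / total for c in counts]


# Hypothetical query sequence, used only to show the feature vectors.
query = "MKKLLPTAAAGLLLLAAQPAMA"
for k in (10, 20, None):  # None stands for "All sequence"
    print(k, aa_composition(query, k))
    print(k, [round(p, 1) for p in percentage_composition(query, k)])
```

Each call yields a fixed-length, 20-element vector regardless of sub-sequence length, which is what makes composition convenient as input to a per-location classifier.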
Year: 2006 PMID: 17597907 PMCID: PMC1891713 DOI: 10.6026/97320630001276
Source DB: PubMed Journal: Bioinformation ISSN: 0973-2063
Prediction accuracies of the Gram+ individual location predictors. The results of highest accuracy are shown in bold. Specificity (Spec) refers to the accuracy of prediction on the positive test set, while sensitivity (Sens) refers to the accuracy of prediction on the negative test set.
| Sequence representation | Sub-sequence length | Cytoplasmic Spec (%) | Cytoplasmic Sens (%) | Membrane Spec (%) | Membrane Sens (%) | Extracellular Spec (%) | Extracellular Sens (%) |
|---|---|---|---|---|---|---|---|
| Amino acid composition | 10 | 88.42 | 34.25 | 73.54 | 55.25 | 76.23 | 42.55 |
| | 20 | 89.84 | 42.83 | 72.92 | 58.03 | 73.73 | 38.08 |
| | 30 | 94.52 | 55.90 | 84.25 | 67.06 | 77.81 | 65.97 |
| | 40 | 93.60 | 77.68 | 89.03 | 78.94 | 80.51 | 81.60 |
| | 50 | 96.78 | 94.24 | 96.30 | 89.51 | 82.53 | 93.90 |
| | All sequence | 91.51 | 90.82 | 91.41 | 80.03 | 84.91 | 74.84 |
| Actual amino acids | 10 | 52.51 | 22.36 | 12.14 | 1.41 | 0.04 | 1.14 |
| | 20 | 63.35 | 26.51 | 15.52 | 6.77 | 3.62 | 2.62 |
| | 30 | 64.93 | 34.99 | 24.05 | 9.59 | 9.27 | 5.01 |
| | 40 | 68.34 | 38.27 | 29.15 | 17.97 | 15.73 | 12.63 |
| | 50 | 69.41 | 48.15 | 32.33 | 16.11 | 18.42 | 11.09 |
| | All sequence | 72.42 | 58.73 | 36.87 | 23.72 | 16.64 | 14.78 |
| Normalised amino acid composition | 10 | 89.52 | 29.86 | 69.93 | 52.93 | 77.36 | 40.42 |
| | 20 | 89.72 | 38.11 | 70.95 | 61.09 | 78.41 | 47.73 |
| | 30 | 91.42 | 44.19 | 74.01 | 73.60 | 82.25 | 57.80 |
| | 40 | 92.20 | 61.07 | 79.44 | 85.26 | 81.09 | 71.03 |
| | 50 | 93.13 | 79.14 | 83.10 | 97.88 | 83.98 | 84.76 |
| | All sequence | 90.96 | 73.77 | 84.76 | 93.61 | 80.15 | 84.08 |
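The two accuracy columns per location can be computed from a binary predictor's output as in the sketch below, which follows the caption's definitions: "Spec" is the percentage of the positive test set predicted correctly and "Sens" the percentage of the negative test set predicted correctly. Under the more common convention, sensitivity is the true-positive rate measured on the positive set and specificity the true-negative rate measured on the negative set, so the caption's labels are the reverse of that usage; the sketch keeps the caption's naming. The function name `set_accuracies` and the example predictions are hypothetical.

```python
from typing import Sequence, Tuple


def set_accuracies(y_true: Sequence[bool], y_pred: Sequence[bool]) -> Tuple[float, float]:
    """Return (% correct on the positive test set, % correct on the negative test set).

    True marks a protein that belongs to the target location (positive set).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pos = [p for t, p in zip(y_true, y_pred) if t]
    neg = [p for t, p in zip(y_true, y_pred) if not t]
    # A positive example is correct when predicted True; a negative one when predicted False.
    pos_acc = 100.0 * sum(pos) / len(pos) if pos else float("nan")
    neg_acc = 100.0 * sum(not p for p in neg) / len(neg) if neg else float("nan")
    return pos_acc, neg_acc


# Hypothetical output of a binary "cytoplasmic vs. not cytoplasmic" predictor.
truth = [True, True, True, True, False, False, False, False]
preds = [True, True, True, False, False, True, True, False]
spec, sens = set_accuracies(truth, preds)  # named as in the caption above
print(f"positive set (Spec): {spec:.2f}%  negative set (Sens): {sens:.2f}%")
```

Reporting the two sets separately matters here because a predictor can score highly on its positive set while misclassifying many negatives, as several short sub-sequence rows in the table show.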
Prediction accuracies of the Gram- individual location predictors. The results of highest accuracy are shown in bold.
| Sequence representation | Sub-sequence length | Cytoplasmic accuracy (%) | Inner membrane accuracy (%) | Periplasmic accuracy (%) | Outer membrane accuracy (%) | Extracellular accuracy (%) |
|---|---|---|---|---|---|---|
| Amino acid composition | 10 | 98.35 | 78.42 | 84.35 | 48.85 | 71.14 |
| | 20 | 92.52 | 81.09 | 88.89 | 56.50 | 76.91 |
| | 30 | 94.99 | 91.75 | 90.14 | 63.88 | 81.06 |
| | 40 | 96.34 | 89.33 | 93.98 | 69.25 | 82.37 |
| | 50 | 97.48 | 96.83 | 94.57 | 77.90 | 87.97 |
| | All sequence | 91.41 | 94.79 | 94.02 | 73.21 | 81.96 |
| Actual amino acids | 10 | 68.53 | 64.36 | 24.79 | 13.16 | 52.62 |
| | 20 | 74.52 | 60.23 | 33.05 | 14.93 | 58.35 |
| | 30 | 77.90 | 61.85 | 41.21 | 17.09 | 52.70 |
| | 40 | 74.08 | 66.33 | 45.82 | 24.51 | 55.08 |
| | 50 | 79.76 | 64.67 | 53.68 | 22.74 | 61.98 |
| | All sequence | 73.13 | 63.16 | 53.68 | 25.88 | 59.22 |
| Normalised amino acid composition | 10 | 94.32 | 77.35 | 80.41 | 51.51 | 71.01 |
| | 20 | 93.45 | 84.24 | 83.85 | 53.86 | 75.25 |
| | 30 | 93.78 | 86.94 | 87.02 | 57.12 | 72.09 |
| | 40 | 96.26 | 90.24 | 91.97 | 63.26 | 73.63 |
| | 50 | 94.78 | 93.12 | 93.52 | 61.03 | 77.60 |
| | All sequence | 93.21 | 91.51 | 92.87 | 67.73 | 74.28 |
Prediction accuracies of the Gram- individual location predictors for the negative test sets. The results of highest accuracy are shown in bold.
| Sequence representation | Sub-sequence length | Cytoplasmic accuracy (%) | Inner membrane accuracy (%) | Periplasmic accuracy (%) | Outer membrane accuracy (%) | Extracellular accuracy (%) |
|---|---|---|---|---|---|---|
| Amino acid composition | 10 | 51.03 | 83.52 | 44.02 | 37.85 | 73.31 |
| | 20 | 53.64 | 81.09 | 58.23 | 51.67 | 76.90 |
| | 30 | 64.07 | 84.24 | 62.68 | 64.69 | 85.02 |
| | 40 | 81.75 | 88.31 | 79.43 | 77.11 | 86.42 |
| | 50 | 90.13 | 94.76 | 92.01 | 86.36 | 92.85 |
| | All sequence | 88.24 | 93.41 | 84.42 | 86.02 | 88.22 |
| Actual amino acids | 10 | 40.03 | 53.59 | 20.52 | 12.05 | 23.24 |
| | 20 | 41.49 | 58.21 | 23.84 | 16.73 | 22.86 |
| | 30 | 48.79 | 68.84 | 55.08 | 22.41 | 26.72 |
| | 40 | 55.32 | 61.33 | 42.21 | 25.62 | 30.55 |
| | 50 | 58.62 | 63.71 | 49.33 | 32.41 | 29.86 |
| | All sequence | 64.28 | 59.35 | 43.57 | 34.79 | 30.08 |
| Normalised amino acid composition | 10 | 44.63 | 84.04 | 41.93 | 44.32 | 74.59 |
| | 20 | 48.32 | 88.68 | 56.26 | 46.72 | 77.35 |
| | 30 | 61.04 | 87.14 | 63.17 | 51.48 | 81.68 |
| | 40 | 71.38 | 93.73 | 71.87 | 56.37 | 84.24 |
| | 50 | 77.20 | 96.26 | 78.88 | 68.53 | 85.32 |
| | All sequence | 83.56 | 92.47 | 73.29 | 59.25 | 84.99 |
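Choosing the "best performing" network depends on how the two Gram- tables are combined. The sketch below assumes that the first Gram- table reports accuracy on the positive test sets (its counterpart is labelled as the negative test sets) and ranks every cytoplasmic configuration by the unweighted mean of the two accuracies. That criterion is an assumption, not something stated in this section; it is shown because ranking on the positive set alone favours a configuration that misclassifies about half of the negative set. The configuration it selects has the 97.48% positive-set accuracy that appears in the PSORTB comparison table.

```python
# Gram- cytoplasmic predictor: (representation, sub-sequence length,
# positive-set accuracy %, negative-set accuracy %), copied from the two Gram- tables.
ROWS = [
    ("Amino acid composition", "10", 98.35, 51.03),
    ("Amino acid composition", "20", 92.52, 53.64),
    ("Amino acid composition", "30", 94.99, 64.07),
    ("Amino acid composition", "40", 96.34, 81.75),
    ("Amino acid composition", "50", 97.48, 90.13),
    ("Amino acid composition", "All sequence", 91.41, 88.24),
    ("Actual amino acids", "10", 68.53, 40.03),
    ("Actual amino acids", "20", 74.52, 41.49),
    ("Actual amino acids", "30", 77.90, 48.79),
    ("Actual amino acids", "40", 74.08, 55.32),
    ("Actual amino acids", "50", 79.76, 58.62),
    ("Actual amino acids", "All sequence", 73.13, 64.28),
    ("Normalised amino acid composition", "10", 94.32, 44.63),
    ("Normalised amino acid composition", "20", 93.45, 48.32),
    ("Normalised amino acid composition", "30", 93.78, 61.04),
    ("Normalised amino acid composition", "40", 96.26, 71.38),
    ("Normalised amino acid composition", "50", 94.78, 77.20),
    ("Normalised amino acid composition", "All sequence", 93.21, 83.56),
]


def mean_accuracy(row):
    """Unweighted mean of positive- and negative-set accuracy (assumed criterion)."""
    _, _, pos, neg = row
    return (pos + neg) / 2


best_by_mean = max(ROWS, key=mean_accuracy)
best_by_positive = max(ROWS, key=lambda r: r[2])
print("best by mean:", best_by_mean[:2], best_by_mean[2:])
print("best by positive set only:", best_by_positive[:2], best_by_positive[2:])
```

A weighted mean, or a threshold on the negative-set accuracy, would be equally plausible choices; the point is only that some criterion combining both sets is needed.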
Results of the individual location predictors compared with the PSORTB algorithm.
| Gram type | Subcellular location | PSORTB accuracy (%) | Individual location predictor accuracy (%) |
|---|---|---|---|
| Gram+ | Cytoplasmic | 96.38 | 96.78 |
| | Membrane | 91.47 | 96.30 |
| | Extracellular | 70.42 | 82.53 |
| Gram- | Cytoplasmic | 91.37 | 97.48 |
| | Inner membrane | 94.68 | 96.83 |
| | Periplasmic | 84.69 | 94.57 |
| | Outer membrane | 83.70 | 77.90 |
| | Extracellular | 77.55 | 87.97 |
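The abstract's summary of this comparison can be checked directly from the table by computing, for each location, the difference between the individual predictor and PSORTB. The 1-percentage-point margin used below to call a result "essentially equal" is an assumed threshold, chosen only for illustration; the source does not give one.

```python
# (Gram type, location, PSORTB accuracy %, individual predictor accuracy %),
# copied from the comparison table.
RESULTS = [
    ("Gram+", "Cytoplasmic", 96.38, 96.78),
    ("Gram+", "Membrane", 91.47, 96.30),
    ("Gram+", "Extracellular", 70.42, 82.53),
    ("Gram-", "Cytoplasmic", 91.37, 97.48),
    ("Gram-", "Inner membrane", 94.68, 96.83),
    ("Gram-", "Periplasmic", 84.69, 94.57),
    ("Gram-", "Outer membrane", 83.70, 77.90),
    ("Gram-", "Extracellular", 77.55, 87.97),
]

TIE_MARGIN = 1.0  # percentage points treated as "essentially equal" (assumed threshold)


def verdict(diff: float) -> str:
    """Classify a (individual - PSORTB) accuracy difference."""
    if abs(diff) < TIE_MARGIN:
        return "essentially equal"
    return "individual predictor better" if diff > 0 else "PSORTB better"


for gram, location, psortb, individual in RESULTS:
    diff = round(individual - psortb, 2)
    print(f"{gram:6} {location:15} {diff:+7.2f}  {verdict(diff)}")
```

With this margin, six locations favour the individual predictors, Gram+ cytoplasmic (+0.40 points) counts as a tie, and Gram- outer membrane (-5.80 points) is the one location where PSORTB is more accurate.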