Alla Bulashevska, Roland Eils.
Abstract
BACKGROUND: The subcellular location of a protein is closely related to its function. A method that predicts the subcellular location of a protein from its amino acid sequence alone would therefore be valuable. Although many methods have been proposed for predicting subcellular location from sequence information only, prediction accuracy still needs to improve.
Year: 2006 PMID: 16774677 PMCID: PMC1525000 DOI: 10.1186/1471-2105-7-298
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Performance comparison of a single Bayesian classifier (BC) and a hierarchical ensemble of Bayesian classifiers (HensBC).
| Dataset | BC-approach Accuracy (%) | HensBC-approach Accuracy (%) |
| Data_Euk | 70.8 | 78.7 |
| Data_Prok | 89.0 | 89.3 |
| Data_SWISS | 62.9 | 80.2 |
| Data_Gram | 77.2 | 83.2 |
| Data_OMP | 89.9 | 91.2 |
| Data_Apoptosis | 85.7 | 89.8 |
Predictive accuracy per subcellular location for the single Bayesian classifier (BC) and the hierarchical ensemble of Bayesian classifiers (HensBC) on Data_Euk.
| Cellular location | BC Accuracy (%) | BC MCC | HensBC Accuracy (%) | HensBC MCC |
| Cytoplasmic | 78.8 | 0.53 | 76.3 | 0.63 |
| Extracellular | 61.5 | 0.59 | 78.8 | 0.76 |
| Mitochondrial | 52.3 | 0.41 | 53.0 | 0.53 |
| Nuclear | 74.0 | 0.64 | 87.7 | 0.73 |
| Overall accuracy | 70.8 | - | 78.7 | - |
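The MCC columns report the Matthews correlation coefficient, computed per location by treating that location as the positive class and all others as negative. The following is a minimal sketch of that calculation; the function names `mcc` and `per_class_mcc` are illustrative and do not come from the source, and returning 0 when the denominator vanishes is a common convention assumed here.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from one-vs-rest confusion counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Assumed convention: an empty row or column of the confusion matrix gives 0.
        return 0.0
    return (tp * tn - fp * fn) / denom


def per_class_mcc(true_labels, predicted_labels, location):
    """MCC for one location, treating every other location as the negative class."""
    tp = tn = fp = fn = 0
    for truth, pred in zip(true_labels, predicted_labels):
        if truth == location and pred == location:
            tp += 1
        elif truth == location:
            fn += 1
        elif pred == location:
            fp += 1
        else:
            tn += 1
    return mcc(tp, tn, fp, fn)
```

Unlike per-class accuracy, MCC also penalizes false positives, which is why a location can lose accuracy yet gain MCC (as Cytoplasmic does between BC and HensBC above).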
Predictive accuracy per subcellular location for the single Bayesian classifier (BC) and the hierarchical ensemble of Bayesian classifiers (HensBC) on Data_Prok.
| Cellular location | BC Accuracy (%) | BC MCC | HensBC Accuracy (%) | HensBC MCC |
| Cytoplasmic | 95.1 | 0.81 | 95.5 | 0.82 |
| Extracellular | 74.8 | 0.75 | 72.9 | 0.75 |
| Periplasmic | 75.8 | 0.69 | 76.7 | 0.70 |
| Overall accuracy | 89.0 | - | 89.3 | - |
Predictive accuracy per subcellular location for the single Bayesian classifier (BC) and the hierarchical ensemble of Bayesian classifiers (HensBC) on Data_SWISS.
| Cellular location | BC Accuracy (%) | BC MCC | HensBC Accuracy (%) | HensBC MCC |
| Chloroplast | 53.6 | 0.53 | 75.8 | 0.74 |
| Cytoplasm | 52.7 | 0.38 | 75.7 | 0.67 |
| Cytoskeleton | 58.3 | 0.19 | 66.7 | 0.57 |
| Endoplasmic reticulum | 48.9 | 0.31 | 67.2 | 0.67 |
| Extracellular | 71.3 | 0.61 | 86.2 | 0.82 |
| Golgi apparatus | 11.8 | 0.07 | 20.6 | 0.23 |
| Lysosome | 78.6 | 0.52 | 80.2 | 0.69 |
| Mitochondria | 59.8 | 0.43 | 64.1 | 0.61 |
| Nuclear | 66 | 0.55 | 85.3 | 0.76 |
| Peroxisome | 38.5 | 0.33 | 58.2 | 0.57 |
| Vacuole | 24.1 | 0.23 | 51.9 | 0.57 |
| Overall accuracy | 62.9 | - | 80.2 | - |
Predictive accuracy per subcellular location for the single Bayesian classifier (BC) and the hierarchical ensemble of Bayesian classifiers (HensBC) on Data_Gram.
| Cellular location | BC Accuracy (%) | BC MCC | HensBC Accuracy (%) | HensBC MCC |
| Cytoplasmic | 84.6 | 0.67 | 76.9 | 0.71 |
| Inner membrane | 80.7 | 0.82 | 85.8 | 0.84 |
| Periplasmic | 75.9 | 0.66 | 79.1 | 0.71 |
| Outer membrane | 78.2 | 0.71 | 87.1 | 0.79 |
| Extracellular | 60.2 | 0.61 | 73.9 | 0.73 |
| Overall accuracy | 77.2 | - | 83.2 | - |
Utility of predictions under the partial credit method. "Label" denotes the true localization.
| Label | Prediction | Utility |
| Cytoplasmic | Cytoplasmic | 1 |
| Cytoplasmic/Inner membrane | Cytoplasmic/Inner membrane | 1 |
| Cytoplasmic/Inner membrane | Cytoplasmic | 0.5 |
| Cytoplasmic | Cytoplasmic/Inner membrane | 0.5 |
| Cytoplasmic/Inner membrane | Inner membrane/Periplasmic | 0.333 |
| Cytoplasmic | Inner membrane | 0 |
| Periplasmic | Cytoplasmic/Inner membrane | 0 |
| Periplasmic | Cytoplasmic | 0 |
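The table does not state the formula behind these utilities, but all eight rows are reproduced by the size of the intersection of the true and predicted location sets divided by the size of their union (the Jaccard index). The sketch below assumes that reading; `partial_credit_utility` is a hypothetical name, and splitting on "/" with case-insensitive matching is a choice made here to handle labels such as "Inner Membrane/Periplasmic".

```python
def partial_credit_utility(label: str, prediction: str) -> float:
    """Overlap of true and predicted location sets: |intersection| / |union|.

    Assumption: this Jaccard-style rule is inferred from the table rows,
    not taken from a formula given in the source.
    """
    true_sites = {site.strip().lower() for site in label.split("/")}
    predicted_sites = {site.strip().lower() for site in prediction.split("/")}
    return len(true_sites & predicted_sites) / len(true_sites | predicted_sites)


# Rows of the utility table as (label, prediction) pairs.
examples = [
    ("Cytoplasmic", "Cytoplasmic"),
    ("Cytoplasmic/Inner membrane", "Cytoplasmic"),
    ("Cytoplasmic/Inner membrane", "Inner Membrane/Periplasmic"),
    ("Periplasmic", "Cytoplasmic/Inner membrane"),
]
for label, prediction in examples:
    print(label, "|", prediction, "|", round(partial_credit_utility(label, prediction), 3))
```

Under this rule, a prediction earns full credit only when it names exactly the true set of locations, and predicting an extra location costs as much as missing one.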
Predictive accuracy per class for the single Bayesian classifier (BC) and the hierarchical ensemble of Bayesian classifiers (HensBC) on Data_OMP.
| Cellular location | BC Accuracy (%) | BC MCC | HensBC Accuracy (%) | HensBC MCC |
| OMP | 90.7 | 0.79 | 94.2 | 0.82 |
| Globular | 89.5 | 0.79 | 89.6 | 0.82 |
| Overall accuracy | 89.9 | - | 91.2 | - |
Predictive accuracy per subcellular location for the single Bayesian classifier (BC) and the hierarchical ensemble of Bayesian classifiers (HensBC) on Data_Apoptosis.
| Cellular location | BC Accuracy (%) | BC MCC | HensBC Accuracy (%) | HensBC MCC |
| Cytoplasmic | 90.7 | 0.81 | 95.3 | 0.89 |
| Plasma membrane | 90 | 0.83 | 90 | 0.83 |
| Mitochondrial | 92.3 | 0.83 | 92.3 | 0.83 |
| Other | 50.0 | 0.57 | 66.7 | 0.80 |
| Overall accuracy | 85.7 | - | 89.8 | - |
Eukaryotic sequences within each subcellular location group (Data_Euk).
| Cellular location | Number of proteins |
| Cytoplasmic | 684 |
| Extracellular | 325 |
| Mitochondrial | 321 |
| Nuclear | 1097 |
| Sum | 2427 |
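The group sizes above link the per-location accuracies for Data_Euk to the overall figures in the performance comparison table. A minimal sketch, assuming that each per-location "Accuracy (%)" is the share of proteins from that location that were predicted correctly, so that overall accuracy is the size-weighted mean; the name `overall_accuracy` is illustrative.

```python
# Group sizes for Data_Euk, copied from the table above.
counts = {"Cytoplasmic": 684, "Extracellular": 325, "Mitochondrial": 321, "Nuclear": 1097}

# Per-location accuracies (%) for Data_Euk, copied from the accuracy table.
bc_accuracy = {"Cytoplasmic": 78.8, "Extracellular": 61.5, "Mitochondrial": 52.3, "Nuclear": 74.0}
hensbc_accuracy = {"Cytoplasmic": 76.3, "Extracellular": 78.8, "Mitochondrial": 53.0, "Nuclear": 87.7}


def overall_accuracy(per_class_accuracy, class_counts):
    """Size-weighted mean of per-class accuracies, in percent."""
    total = sum(class_counts.values())
    weighted = sum(per_class_accuracy[c] * class_counts[c] for c in class_counts)
    return weighted / total


print(round(overall_accuracy(bc_accuracy, counts), 1))      # BC
print(round(overall_accuracy(hensbc_accuracy, counts), 1))  # HensBC
```

The weighted means come out at 70.8 and 78.7, matching the Data_Euk row of the performance comparison table. Because Nuclear proteins make up almost half of the dataset, the gain on that class dominates the overall improvement.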
Prokaryotic sequences within each subcellular location group (Data_Prok).
| Cellular location | Number of proteins |
| Cytoplasmic | 688 |
| Extracellular | 107 |
| Periplasmic | 202 |
| Sum | 997 |
Protein sequences within each subcellular location group (Data_SWISS).
| Cellular location | Number of proteins |
| Chloroplast | 1145 |
| Cytoplasm | 2465 |
| Cytoskeleton | 24 |
| Endoplasmic reticulum | 137 |
| Extracellular | 4228 |
| Golgi apparatus | 34 |
| Lysosome | 131 |
| Mitochondria | 1106 |
| Nuclear | 3419 |
| Peroxisome | 122 |
| Vacuole | 54 |
| Sum | 12865 |
Bacterial proteins within each subcellular location group (Data_Gram).
| Cellular location | Number of proteins |
| Cytoplasmic | 278 |
| Cytoplasmic/Inner membrane | 16 |
| Inner membrane | 309 |
| Inner membrane/Periplasmic | 51 |
| Periplasmic | 276 |
| Periplasmic/Outer membrane | 2 |
| Outer membrane | 391 |
| Outer membrane/Extracellular | 78 |
| Extracellular | 190 |
| Sum | 1591 |
Apoptosis proteins within each subcellular location group (Data_Apoptosis).
| Cellular location | Number of proteins |
| Cytoplasmic | 43 |
| Plasma membrane | 30 |
| Mitochondrial | 13 |
| Other | 12 |
| Sum | 98 |
Figure 1. Hierarchical ensemble constructed for the task of discriminating outer membrane proteins (OMPs) from globular proteins. Each node is labelled with the numbers of OMPs and globular proteins associated with it. At each internal node, a Markov chain-based Bayesian classifier is learned from the associated proteins, stored in the node, and applied to those same proteins. Two edges, labelled "OMP" and "Globular", originate from each internal node and lead to its child nodes, which receive the proteins that the classifier assigned to the OMP or Globular class, respectively. The final localization class output at each leaf node in the application phase is underlined.
Figure 2. Hierarchical ensemble consisting of three classifiers, constructed to solve a three-class classification problem.
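The construction described in the Figure 1 caption can be sketched as a recursive procedure: learn a classifier on the proteins at a node, apply it to those same proteins, and pass each predicted group to a child node. The code below is an illustrative sketch only, not the authors' implementation. The first-order Markov chain with add-one smoothing, the majority-vote leaf label, the `max_depth` stopping rule, and the names `MarkovBayes`, `build`, and `classify` are all assumptions made here.

```python
import math
from collections import Counter, defaultdict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


class MarkovBayes:
    """One first-order Markov chain per class, combined with class priors."""

    def fit(self, sequences, labels):
        self.classes = sorted(set(labels))
        self.log_prior, self.log_trans = {}, {}
        for c in self.classes:
            seqs = [s for s, y in zip(sequences, labels) if y == c]
            self.log_prior[c] = math.log(len(seqs) / len(labels))
            counts = defaultdict(Counter)
            for s in seqs:
                for a, b in zip(s, s[1:]):
                    counts[a][b] += 1
            # Add-one smoothing over the 20 standard residues (assumption).
            self.log_trans[c] = {
                a: {b: math.log((counts[a][b] + 1)
                                / (sum(counts[a].values()) + len(AMINO_ACIDS)))
                    for b in AMINO_ACIDS}
                for a in AMINO_ACIDS
            }
        return self

    def predict(self, seq):
        def score(c):
            trans = self.log_trans[c]
            # Transitions involving non-standard residues are skipped.
            return self.log_prior[c] + sum(
                trans[a][b] for a, b in zip(seq, seq[1:])
                if a in trans and b in trans[a])
        return max(self.classes, key=score)


def build(sequences, labels, depth=0, max_depth=5):
    """Recursively build the hierarchy; leaves carry the majority class."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or depth == max_depth:
        return {"leaf": majority}
    clf = MarkovBayes().fit(sequences, labels)
    groups = defaultdict(list)
    for s, y in zip(sequences, labels):
        groups[clf.predict(s)].append((s, y))
    if len(groups) == 1:  # the classifier no longer splits this node
        return {"leaf": majority}
    children = {}
    for predicted, members in groups.items():
        seqs, ys = zip(*members)
        children[predicted] = build(list(seqs), list(ys), depth + 1, max_depth)
    return {"clf": clf, "children": children}


def classify(node, seq):
    """Follow the edge chosen by each node's classifier until a leaf is reached."""
    while "leaf" not in node:
        predicted = node["clf"].predict(seq)
        if predicted not in node["children"]:
            return predicted  # no child was built for this class during training
        node = node["children"][predicted]
    return node["leaf"]


# Toy data (invented for illustration, not from the source datasets).
train = ["AYAYAYAYAY", "YAYAYAYAYA", "AYAYAYAY", "KEKEKEKEKE", "EKEKEKEKEK", "KEKEKEKE"]
labels = ["OMP"] * 3 + ["Globular"] * 3
tree = build(train, labels)
print(classify(tree, "AYAYAY"), classify(tree, "KEKEKE"))
```

A child node that still contains a mix of classes gets its own classifier trained only on the proteins routed to it, so later classifiers specialize on the cases the earlier ones confuse. With more than two classes the same recursion applies, with one edge per predicted class.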