Abstract
BACKGROUND: The large gap between the number of protein sequences in databases and the number of functionally characterized proteins calls for a fast computational tool, generally applicable to protein sequences, for predicting subnuclear and subcellular localization. Localization information may reveal the molecular function of novel proteins and provide insight into the biological pathways in which they function. Most past work has focused on subcellular localization, and no tool has been dedicated to prediction at the subnuclear level despite its importance. The key to designing a suitable predictive system is extracting the subtle sequence signals that discriminate among proteins with different subnuclear localizations.
Year: 2005 PMID: 16336650 PMCID: PMC1325059 DOI: 10.1186/1471-2105-6-291
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
The summary of the nuclear proteins
| Label | Compartment | Number of sequences |
| 1 | PML BODY | 38 |
| 2 | Nuclear Lamina | 55 |
| 3 | Nuclear Splicing Speckles | 56 |
| 4 | Chromatin | 61 |
| 5 | Nucleoplasm | 75 |
| 6 | Nucleolus | 219 |
| - | Multiple Localizations | 92 |
AA – amino acid composition encoding method;
DI – di-peptide encoding method;
TRI – tri-peptide encoding method;
D1X1 – amino acid composition encoding vector transformed with D1;
D2X2 – di-peptide encoding vector transformed with D2;
D3X3 – tri-peptide encoding vector transformed with D3.
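The AA, DI, and TRI encodings listed above can be illustrated with a short sketch. It assumes each encoding is the normalized frequency vector of k-peptides (k = 1, 2, 3) over the 20 standard amino acids; the function name `kmer_composition` and the example sequence are hypothetical, and the D1, D2, and D3 transformations are not reproduced because this page does not define them.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def kmer_composition(sequence, k):
    """Return the normalized k-peptide frequency vector of a protein sequence.

    Assumption: windows containing non-standard residues (e.g. X) are skipped.
    """
    sequence = sequence.upper()
    # Fixed ordering of all 20**k possible k-peptides.
    kmers = ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]
    valid = set(kmers)
    counts = Counter(
        sequence[i:i + k]
        for i in range(len(sequence) - k + 1)
        if sequence[i:i + k] in valid
    )
    total = sum(counts.values())
    return [counts[m] / total if total else 0.0 for m in kmers]

# Hypothetical toy sequence, not taken from the data set.
aa = kmer_composition("MKVLAAGIVK", 1)  # AA: 20 dimensions
di = kmer_composition("MKVLAAGIVK", 2)  # DI: 400 dimensions
```

With k = 3 the same function yields an 8000-dimensional vector, which is very sparse for sequences of typical length.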
Results for each individual encoding method (accuracy % [MCC] per compartment)
| Compartment | AA | DI | TRI | D1X1 | D2X2 | D3X3 |
| PML BODY | 26.3 [0.144] | 13.2 [0.091] | 0.0 [-0.045] | 31.6 [0.183] | 29.0 [0.139] | 10.5 [0.066] |
| Nuclear Lamina | 40.0 [0.363] | 27.3 [0.256] | 40.0 [0.228] | 45.5 [0.340] | 41.8 [0.279] | 36.4 [0.331] |
| Nuclear Splicing Speckles | 30.4 [0.326] | 32.1 [0.358] | 30.4 [0.365] | 33.9 [0.321] | 33.9 [0.316] | 33.9 [0.391] |
| Chromatin | 14.8 [0.174] | 11.5 [0.106] | 13.1 [0.191] | 19.8 [0.215] | 21.3 [0.248] | 21.3 [0.271] |
| Nucleoplasm | 25.3 [0.189] | 26.7 [0.207] | 12.0 [0.123] | 20.0 [0.182] | 22.7 [0.246] | 28.0 [0.229] |
| Nucleolus | 78.1 [0.374] | 83.1 [0.357] | 85.8 [0.357] | 73.5 [0.357] | 72.2 [0.364] | 83.1 [0.367] |
| Single-localization Overall Accuracy and MCC | | | | | | |
| Multi-localization Overall Accuracy and MCC | | | | | | |
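The "Accuracy % [MCC]" pairs can be read with the help of a small sketch. It assumes that per-compartment accuracy is the percentage of that compartment's sequences predicted correctly, and that the Matthews correlation coefficient (MCC) is computed one-vs-rest for each compartment; this page does not state either definition, and `per_class_scores` and the toy labels are hypothetical.

```python
import math

def per_class_scores(y_true, y_pred, label):
    """Return (accuracy %, MCC) for one compartment, treated one-vs-rest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    tn = sum(t != label and p != label for t, p in pairs)
    # Assumed definition: share of this compartment's sequences recovered.
    accuracy = 100.0 * tp / (tp + fn) if (tp + fn) else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, mcc

# Toy labels for illustration only, not the paper's data.
y_true = [1, 1, 2, 2, 3, 3]
y_pred = [1, 2, 2, 2, 3, 1]
acc1, mcc1 = per_class_scores(y_true, y_pred, 1)  # (50.0, 0.25)
```

Under these definitions, a compartment with no correct predictions but some false positives gets a negative MCC, which is consistent with an entry such as 0.0 [-0.045].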
Results using combined methods (accuracy % [MCC] per compartment)
| Compartment | Combination of AA, DI, and TRI | Combination of D1X1, D2X2, and D3X3 |
| PML BODY | 13.2 [0.073] | 29.0 [0.172] |
| Nuclear Lamina | 30.9 [0.275] | 43.6 [0.338] |
| Nuclear Splicing Speckles | 32.1 [0.410] | 35.7 [0.363] |
| Chromatin | 9.8 [0.170] | 19.7 [0.260] |
| Nucleoplasm | 20.0 [0.182] | 22.7 [0.206] |
| Nucleolus | 88.1 [0.374] | 76.7 [0.367] |
| Single-localization Overall Accuracy and MCC | | |
| Multi-localization Overall Accuracy and MCC | | |
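This page does not say how the individual predictors are combined. One common option is a plain majority vote over the labels returned by the base classifiers, sketched below purely as an assumption; `majority_vote` and its tie-breaking rule are hypothetical and may differ from the rule actually used.

```python
from collections import Counter

def majority_vote(predictions):
    """Combine one label per base classifier into a single label.

    Assumption: ties are broken in favor of the earliest classifier in the list.
    """
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label

# Hypothetical labels from three base classifiers (e.g. AA, DI, TRI).
combined = majority_vote([6, 6, 2])
```

A vote of this kind tends to favor the majority class when the base classifiers share the same bias.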