Abdollah Dehzangi, Sohrab Sohrabi, Rhys Heffernan, Alok Sharma, James Lyons, Kuldip Paliwal, Abdul Sattar.
Abstract
BACKGROUND: The function of a protein depends on its location in the cell. Predicting protein subcellular localization is therefore an important step towards protein function prediction. Recent studies have shown that using Gene Ontology (GO) annotations for feature extraction can improve prediction performance. However, GO annotations are not available for newly sequenced proteins, and for these proteins the performance of GO-based methods degrades significantly.
Year: 2015 PMID: 25734546 PMCID: PMC4347615 DOI: 10.1186/1471-2105-16-S4-S1
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
List of the physicochemical attributes and the number assigned to each.
| No. | Physicochemical attribute |
|---|---|
| 1 | Average number of surrounding residues |
| 2 | Polarity |
| 3 | |
| 4 | Mean |
| 5 | Solvent accessible reduction ratio |
| 6 | Partition Coefficient |
| 7 | Rigidity |
| 8 | Average surrounding hydrophobicity |
| 9 | Hydrophobicity scale (contact energy derived from 3D data) |
| 10 | Hydrophilicity scale derived from HPLC peptide retention data |
Figure 1. Overlapped segmented distribution-based feature extraction method.
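The OSD method is only named here through its figure caption. The Python sketch below illustrates one way overlapped segmented distribution features could be computed. It assumes (this is not stated in the section) that each residue is replaced by its attribute value, that the values are non-negative (real scales such as hydrophobicity would first need shifting or normalizing), and that the features are the fraction of the sequence needed for the running sum to reach 25%, 50% and 75% of the total, scanned from both termini so the two scans overlap in the middle. `TOY_SCALE`, the 25% `step` and the 75% `max_fraction` cap are hypothetical choices.

```python
# Hypothetical attribute values for two residue types (not from the paper).
TOY_SCALE = {"A": 1.0, "L": 3.0}


def cumulative_positions(values, fractions):
    """For each fraction f, count the residues needed (scanning from the start)
    for the running sum of attribute values to reach f * total."""
    total = sum(values)
    positions = []
    running = 0.0
    idx = 0
    for f in fractions:
        target = f * total
        while idx < len(values) and running < target:
            running += values[idx]
            idx += 1
        positions.append(idx)
    return positions


def osd_features(sequence, scale, step=0.25, max_fraction=0.75):
    """Sketch of overlapped segmented distribution features (assumed definition)."""
    values = [scale[aa] for aa in sequence]
    if not values or any(v < 0 for v in values):
        raise ValueError("need a non-empty sequence and non-negative attribute values")
    n_steps = round(max_fraction / step)
    fractions = [step * i for i in range(1, n_steps + 1)]
    length = len(values)
    forward = cumulative_positions(values, fractions)          # from the N-terminus
    backward = cumulative_positions(values[::-1], fractions)   # from the C-terminus
    # Normalize by sequence length so proteins of different lengths are comparable.
    return [p / length for p in forward + backward]


print(osd_features("LAAA", TOY_SCALE))  # [0.25, 0.25, 0.75, 0.5, 0.75, 1.0]
```

The forward and backward halves differ for "LAAA" because the high-valued residue sits at the N-terminus; scanning from both ends keeps that positional information.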
Figure 2. The overall architecture of our proposed approach.
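The architecture combines several feature groups (PSSM_AAC, PSSM_AC, OSD and OSA, as named in the feature-group table). A minimal Python sketch of the two PSSM-based groups and of combining groups follows. It assumes, without confirmation from this section, that PSSM_AAC is the column-wise mean of the L x 20 position-specific scoring matrix, that PSSM_AC is the autocorrelation of each column at residue distances 1 to D, and that groups are joined by plain concatenation. `max_distance` and the toy matrix are hypothetical.

```python
from typing import List

Matrix = List[List[float]]  # one row per residue, one column per amino acid type


def pssm_aac(pssm: Matrix) -> List[float]:
    """Column-wise mean of the PSSM (20 values for a standard L x 20 PSSM)."""
    length = len(pssm)
    return [sum(row[j] for row in pssm) / length for j in range(len(pssm[0]))]


def pssm_ac(pssm: Matrix, max_distance: int) -> List[float]:
    """Autocorrelation of each PSSM column at distances 1..max_distance."""
    length = len(pssm)
    if not 1 <= max_distance < length:
        raise ValueError("max_distance must be at least 1 and shorter than the sequence")
    features = []
    for d in range(1, max_distance + 1):
        for j in range(len(pssm[0])):
            total = sum(pssm[i][j] * pssm[i + d][j] for i in range(length - d))
            features.append(total / (length - d))
    return features


def combine(*groups: List[float]) -> List[float]:
    """Concatenate feature groups into a single vector for the classifier."""
    vector: List[float] = []
    for group in groups:
        vector.extend(group)
    return vector


toy = [[1, 2], [3, 4], [5, 6]]  # hypothetical 3-residue, 2-column matrix
print(pssm_aac(toy))    # [3.0, 4.0]
print(pssm_ac(toy, 2))  # [9.0, 16.0, 5.0, 12.0]
```

For a real L x 20 PSSM these functions return 20 and 20 x D values respectively, and `combine` would append the OSD and OSA vectors after them.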
Prediction accuracy (%) on the Gram-positive data set for the feature vectors extracted from each of the 10 physicochemical attributes (Comb_i uses attribute i), for each of the four subcellular locations: (1) cell membrane, (2) cell wall, (3) cytoplasm and (4) extracellular.
| Features | (1) | (2) | (3) | (4) | Overall |
|---|---|---|---|---|---|
| Comb_1 | 79.9 | 16.7 | 89.9 | 79.7 | 81.6 |
| Comb_2 | 78.7 | 16.7 | 89.4 | 80.5 | 81.2 |
| Comb_3 | 78.7 | 16.7 | 88.9 | 81.3 | 81.2 |
| Comb_4 | 81.0 | 11.1 | 88.9 | 79.7 | 81.5 |
| Comb_5 | 76.4 | 16.6 | 90.7 | 82.1 | 81.5 |
| Comb_6 | 78.2 | 16.7 | 89.9 | 80.5 | 81.3 |
| Comb_7 | 77.0 | 16.7 | 91.3 | 80.5 | 81.5 |
| Comb_8 | 77.6 | 16.7 | 90.4 | 82.1 | 81.7 |
| Comb_9 | 82.2 | 16.7 | 92.3 | 80.5 | 83.6 |
| Comb_10 | 79.9 | 16.7 | 91.8 | 82.9 | 83.1 |
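Each row reports one accuracy per location plus an overall figure. Assuming the per-location value is the percentage of proteins from that location that are predicted correctly, and the overall value is the percentage correct over the whole data set (the validation protocol is not given in this section), both can be computed from true and predicted labels as sketched below. The labels in the example are hypothetical.

```python
from collections import Counter


def location_accuracies(y_true, y_pred):
    """Per-location accuracy and overall accuracy, both in percent."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label lists of equal length")
    totals = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    per_location = {loc: 100.0 * correct[loc] / n for loc, n in totals.items()}
    # Overall accuracy pools all proteins, so large locations dominate it.
    overall = 100.0 * sum(correct.values()) / len(y_true)
    return per_location, overall


y_true = ["cytoplasm"] * 8 + ["cell wall"] * 2
y_pred = ["cytoplasm"] * 8 + ["cytoplasm", "cell wall"]
print(location_accuracies(y_true, y_pred))
```

In this toy case the small "cell wall" class scores 50% while the overall accuracy is still 90%, which is the same pattern as the low cell-wall column next to a high overall column in the table.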
Prediction accuracy (%) on the Gram-negative data set for the feature vectors extracted from each of the 10 physicochemical attributes (Comb_i uses attribute i), for each of the eight subcellular locations: (1) cell inner membrane, (2) cell outer membrane, (3) cytoplasm, (4) extracellular, (5) fimbrium, (6) flagellum, (7) nucleoid and (8) periplasm.
| Features | (1) | (2) | (3) | (4) | (5) | (6) | (7) | (8) | Overall |
|---|---|---|---|---|---|---|---|---|---|
| Comb_1 | 86.9 | 54.0 | 88.5 | 51.1 | 56.3 | 16.7 | 12.5 | 61.7 | 76.4 |
| Comb_2 | 86.3 | 50.0 | 88.8 | 50.4 | 59.4 | 25.0 | 0.0 | 58.9 | 75.6 |
| Comb_3 | 86.7 | 53.2 | 88.3 | 52.6 | 62.5 | 0.0 | 0.0 | 58.9 | 76.0 |
| Comb_4 | 87.5 | 56.5 | 87.1 | 49.6 | 59.4 | 0.0 | 12.5 | 62.2 | 76.4 |
| Comb_5 | 87.4 | 51.6 | 87.3 | 45.9 | 68.8 | 8.3 | 12.5 | 61.7 | 75.9 |
| Comb_6 | 86.9 | 52.4 | 86.3 | 52.6 | 68.8 | 0.0 | 25.0 | 62.8 | 76.2 |
| Comb_7 | 87.0 | 54.0 | 88.3 | 48.9 | 65.6 | 16.7 | 0.0 | 58.9 | 76.0 |
| Comb_8 | 86.4 | 55.6 | 87.8 | 51.1 | 59.4 | 8.3 | 25.0 | 61.7 | 76.3 |
| Comb_9 | 87.7 | 53.2 | 87.6 | 49.6 | 68.8 | 16.7 | 12.5 | 61.7 | 76.6 |
| Comb_10 | 86.4 | 53.2 | 87.7 | 51.1 | 65.6 | 8.3 | 12.5 | 60.6 | 75.8 |
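The two tables can be compared directly. The sketch below uses only numbers copied from them: it finds the Comb_i row with the highest overall accuracy for each data set and contrasts that overall value with the unweighted mean of the same row's per-location accuracies. The variable names are illustrative.

```python
# Overall accuracy column (%) for Comb_1..Comb_10, copied from the two tables.
GRAM_POS_OVERALL = [81.6, 81.2, 81.2, 81.5, 81.5, 81.3, 81.5, 81.7, 83.6, 83.1]
GRAM_NEG_OVERALL = [76.4, 75.6, 76.0, 76.4, 75.9, 76.2, 76.0, 76.3, 76.6, 75.8]

# Per-location accuracies (%) of the Comb_9 rows.
GRAM_POS_COMB_9 = [82.2, 16.7, 92.3, 80.5]
GRAM_NEG_COMB_9 = [87.7, 53.2, 87.6, 49.6, 68.8, 16.7, 12.5, 61.7]


def best_combination(overall):
    """Label of the Comb_i row with the highest overall accuracy."""
    best = max(range(len(overall)), key=lambda i: overall[i])
    return f"Comb_{best + 1}"


def macro_average(per_location):
    """Unweighted mean over locations: every location counts equally."""
    return sum(per_location) / len(per_location)


for name, overall, comb9 in [
    ("Gram-positive", GRAM_POS_OVERALL, GRAM_POS_COMB_9),
    ("Gram-negative", GRAM_NEG_OVERALL, GRAM_NEG_COMB_9),
]:
    print(f"{name}: best = {best_combination(overall)}, "
          f"overall = {max(overall):.1f}, "
          f"macro average of Comb_9 = {macro_average(comb9):.1f}")
```

Both data sets peak at Comb_9, while the macro averages (about 67.9 and 54.7) sit well below the overall values (83.6 and 76.6). This is consistent with the overall figure being driven by the large locations rather than small ones such as cell wall, flagellum or nucleoid.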
Overall prediction accuracy (%) achieved by Rotation Forest with each feature group investigated in this study.
| Features | Gram-positive | Gram-negative |
|---|---|---|
| PSSM_AAC | 75.7 | 71.2 |
| PSSM_AC | 79.5 | 72.2 |
| OSD | 63.9 | 63.9 |
| OSA | 67.7 | 68.9 |
| PSSM_AAC + PSSM_AC | 80.5 | 74.9 |
| PSSM_AAC + PSSM_AC + OSD | 81.3 | 75.6 |
| PSSM_AAC + PSSM_AC + OSD + OSA | 83.6 | 76.6 |
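The last three rows read as an incremental ablation. The sketch below computes, from the reported numbers only, how much each added group contributes and how far the full combination is above the best single group; nothing new is measured.

```python
# Overall accuracy (%) from the feature-group table: (Gram-positive, Gram-negative).
SINGLE_GROUPS = {
    "PSSM_AAC": (75.7, 71.2),
    "PSSM_AC": (79.5, 72.2),
    "OSD": (63.9, 63.9),
    "OSA": (67.7, 68.9),
}
# Cumulative combinations: each row adds one group to the previous row.
CUMULATIVE = [
    ("PSSM_AAC + PSSM_AC", (80.5, 74.9)),
    ("+ OSD", (81.3, 75.6)),
    ("+ OSA", (83.6, 76.6)),
]


def incremental_gains(rows):
    """Accuracy change (percentage points) contributed by each added group."""
    return [
        (label, tuple(round(c - p, 1) for c, p in zip(cur, prev)))
        for (_, prev), (label, cur) in zip(rows, rows[1:])
    ]


def gain_over_best_single(singles, combined):
    """Improvement of a combination over the best single group, per data set."""
    best = tuple(max(v[i] for v in singles.values()) for i in range(2))
    return tuple(round(c - b, 1) for c, b in zip(combined, best))


print(incremental_gains(CUMULATIVE))
print(gain_over_best_single(SINGLE_GROUPS, CUMULATIVE[-1][1]))
```

OSD and OSA are the weakest groups on their own, yet adding them on top of the PSSM groups still raises accuracy, with OSA giving the larger step on both data sets.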
The table also shows the results achieved by Rotation Forest with combinations of feature groups, up to the combination of all four feature groups. For this experiment we use attribute 9 (hydrophobicity scale, contact energy derived from 3D data), because this attribute gives our best results.
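In Rotation Forest, each base decision tree is trained on a rotated copy of the feature matrix. This section does not give the ensemble settings, so the NumPy sketch below shows only the core rotation step under assumed defaults: the features are split at random into disjoint subsets, PCA is fitted on a bootstrap sample of the rows (75% of the data, an assumed default) for each subset, and all principal axes are kept in a block-diagonal rotation matrix. It omits the random class-subset selection of the full algorithm and the tree training; `n_subsets` and `sample_fraction` are hypothetical parameter names.

```python
import numpy as np


def rotation_matrix(X, n_subsets, sample_fraction=0.75, rng=None):
    """Build one Rotation Forest style rotation matrix (simplified sketch)."""
    rng = np.random.default_rng() if rng is None else rng
    n_samples, n_features = X.shape
    if not 1 <= n_subsets <= n_features:
        raise ValueError("n_subsets must be between 1 and the number of features")
    R = np.zeros((n_features, n_features))
    # Random disjoint feature subsets whose sizes differ by at most one.
    subsets = np.array_split(rng.permutation(n_features), n_subsets)
    n_rows = max(2, int(sample_fraction * n_samples))
    for cols in subsets:
        # Bootstrap sample of rows used to fit PCA for this subset.
        rows = rng.choice(n_samples, size=n_rows, replace=True)
        block = X[np.ix_(rows, cols)]
        block = block - block.mean(axis=0)
        # Right singular vectors of the centred block are its principal axes.
        _, _, vt = np.linalg.svd(block, full_matrices=True)
        # Keep every axis and place it at the subset's original feature positions.
        R[np.ix_(cols, cols)] = vt.T
    return R


X = np.random.default_rng(0).normal(size=(30, 6))
R = rotation_matrix(X, n_subsets=3, rng=np.random.default_rng(1))
X_rotated = X @ R  # a base tree would be trained on these rotated features
```

A full ensemble would repeat this for every tree with fresh randomness, train each tree on its own `X @ R`, and average the trees' class probabilities at prediction time. Because `R` is orthogonal, the rotation keeps all the information in the features while giving each tree a different view of them.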