Yu-Hua Yao, Ya-Ping Lv, Ling Li, Hui-Min Xu, Bin-Bin Ji, Jing Chen, Chun Li, Bo Liao, Xu-Ying Nan.
Abstract
BACKGROUND: Predicting the subcellular localization of proteins is an important task in bioinformatics, with applications in drug design and other areas. Many computational tools for predicting protein subcellular location have been developed in recent decades; however, existing methods differ in the protein sequence representation techniques and classification algorithms they adopt.
Keywords: Gene ontology; Physicochemical properties; Position-specific score matrix; Principal component analysis; Support vector machine
Year: 2019 PMID: 31888447 PMCID: PMC6936157 DOI: 10.1186/s12859-019-3232-4
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Composition of the Gneg1456 dataset
| Subcellular location | Number of proteins |
|---|---|
| Cell inner membrane | 557 |
| Cell outer membrane | 124 |
| Cytoplasm | 410 |
| Extracellular | 133 |
| Fimbrium | 32 |
| Flagellum | 12 |
| Nucleoid | 8 |
| Periplasm | 180 |
| Sum | 1456 |
Reduced amino acid alphabet
| Classification | Shorthand | Amino acids |
|---|---|---|
| Hydrophilic | L | R, D, E, N, Q, K, H |
| Hydrophobic | B | L, I, V, A, M, F |
| Neutral | W | S, T, Y, W |
| Proline | P | P |
| Glycine | G | G |
| Cysteine | C | C |
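The reduction in the table above can be sketched as a simple lookup. This is a minimal illustration, not the authors' code: the names `REDUCED_CLASSES`, `AA_TO_CLASS` and `reduce_sequence`, and the `X` placeholder for residues outside the 20 standard amino acids, are assumptions introduced here.

```python
# Class symbols and members are copied from the reduced-alphabet table.
# All identifiers below are hypothetical, chosen for illustration only.
REDUCED_CLASSES = {
    "L": "RDENQKH",  # hydrophilic
    "B": "LIVAMF",   # hydrophobic
    "W": "STYW",     # neutral
    "P": "P",        # proline
    "G": "G",        # glycine
    "C": "C",        # cysteine
}

# Invert the table: one-letter amino acid code -> reduced class symbol.
AA_TO_CLASS = {aa: sym for sym, members in REDUCED_CLASSES.items() for aa in members}


def reduce_sequence(seq, unknown="X"):
    """Map a protein sequence onto the six-letter reduced alphabet.

    Residues not listed in the table (e.g. ambiguity codes) are replaced by
    `unknown`; how the original work handles them is not stated in this record.
    """
    return "".join(AA_TO_CLASS.get(aa, unknown) for aa in seq.upper())


print(reduce_sequence("MKRPGC"))
```

Note that the shorthand symbols `L` and `W` are class labels here, not the amino acids leucine and tryptophan, so the reduced string should not be read as a protein sequence.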
Fig. 1 Gapped k-mer calculation method based on a multi-tree, and the final binary tree with 1 space in the gapped k-mer
Fig. 2 ZD98 and CL317 classification results as d varies
Prediction results based on gapped k-mers for the ZD98 data set
| k | Space(g) | Dimension | OA(%) |
|---|---|---|---|
| 2 | 0 | 36 | 87.76 |
| 2 | 1 | 12 | 84.69 |
| 3 | 0 | 221 | 88.78 |
| 3 | 1 | 108 | 89.80 |
| 3 | 2 | 18 | 85.71 |
| 4 | 0 | 1071 | 90.82 |
| 4 | 1 | 849 | 91.84 |
| 4 | 2 | 216 | 90.82 |
| 4 | 3 | 24 | 86.73 |
| 5 | 0 | 3732 | 93.88 |
| 5 | 1 | 5351 | 92.86 |
| 5 | 2 | 2127 | 93.88 |
| 5 | 3 | 360 | 89.80 |
| 5 | 4 | 30 | 87.76 |
| 6 | 0 | 8698 | 92.86 |
| 6 | 1 | 22263 | 92.86 |
| 6 | 2 | 15986 | 93.88 |
| 6 | 3 | 4260 | 93.88 |
| 6 | 4 | 540 | 91.84 |
| 6 | 5 | 36 | 88.78 |
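Many of the dimensions reported for small feature spaces (36, 12, 108, 18, 216, 24, 360, 30, 540) equal C(k, g) · 6^(k−g). That is the number of patterns obtained if the g spaces may occupy any positions in a window of length k over the six-letter reduced alphabet. This reading is an inference from the numbers, not a definition given in this record. Larger entries mostly fall below the bound, presumably because only observed patterns are counted, and one entry (221 for k = 3, g = 0) exceeds it, so the authors' exact construction may differ. A small sketch of the bound, with a hypothetical function name:

```python
from math import comb


def max_gapped_kmer_dimension(k, g, alphabet_size=6):
    """Upper bound on the number of distinct gapped k-mer patterns.

    Assumes (not confirmed by the source) that g of the k window positions
    are spaces and the remaining k - g positions each hold one of
    `alphabet_size` reduced symbols.
    """
    if not 0 <= g < k:
        raise ValueError("require 0 <= g < k")
    return comb(k, g) * alphabet_size ** (k - g)


for k, g in [(2, 0), (2, 1), (3, 1), (4, 2), (5, 3), (6, 4), (6, 5)]:
    print(k, g, max_gapped_kmer_dimension(k, g))
```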
Prediction results based on gapped k-mers for the CL317 data set
| k | Space(g) | Dimension | OA(%) |
|---|---|---|---|
| 2 | 0 | 36 | 82.22 |
| 2 | 1 | 12 | 71.75 |
| 3 | 0 | 216 | 86.23 |
| 3 | 1 | 108 | 86.98 |
| 3 | 2 | 18 | 75.56 |
| 4 | 0 | 1234 | 88.89 |
| 4 | 1 | 864 | 88.89 |
| 4 | 2 | 216 | 88.89 |
| 4 | 3 | 24 | 77.78 |
| 5 | 0 | 5607 | 87.94 |
| 5 | 1 | 6145 | 90.48 |
| 5 | 2 | 2158 | 88.89 |
| 5 | 3 | 360 | 89.84 |
| 5 | 4 | 30 | 81.59 |
| 6 | 0 | 17637 | 90.16 |
| 6 | 1 | 33470 | 90.48 |
| 6 | 2 | 18424 | 90.48 |
| 6 | 3 | 4316 | 89.84 |
| 6 | 4 | 540 | 90.16 |
| 6 | 5 | 36 | 82.22 |
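To make the k and Space(g) columns concrete, the following is a minimal sketch of one way to count gapped k-mers in a sequence that has already been mapped to a reduced alphabet. It assumes that a gapped k-mer is a window of length k in which g positions are masked, and it enumerates windows directly instead of using the tree-based method named in Fig. 1. The function name, the `*` wildcard and the example string are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def gapped_kmer_counts(reduced_seq, k, g, wildcard="*"):
    """Count gapped k-mer patterns by brute-force window enumeration.

    Assumption (not taken from the source): every choice of g masked
    positions within each length-k window contributes one pattern.
    """
    if not 0 <= g < k:
        raise ValueError("require 0 <= g < k")
    gap_sets = list(combinations(range(k), g))  # [()] when g == 0
    counts = Counter()
    for start in range(len(reduced_seq) - k + 1):
        window = reduced_seq[start:start + k]
        for gaps in gap_sets:
            pattern = "".join(
                wildcard if i in gaps else ch for i, ch in enumerate(window)
            )
            counts[pattern] += 1
    return counts


# Example on a short, made-up reduced sequence.
print(gapped_kmer_counts("BLLP", k=2, g=1))
```

The length of the resulting counter is the number of observed patterns, which would explain why high-dimensional settings can yield fewer features than the combinatorial maximum.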
Fig. 3 The highest overall accuracy on the two data sets for different k values
Comparison of results on the Gneg1456 data set
| Methods | Overall |
|---|---|
| iLoc-Gneg by Xiao et al. | 91.4% |
| Li and Yu | 93.2% |
| Gneg-mPLoc by Shen and Chou | 85.7% |
| The proposed method | 93.3% |