Manish Kumar, Gajendra P S Raghava.
Abstract
BACKGROUND: The nucleus, a highly organized organelle, plays an important role in cellular homeostasis. Nuclear proteins are crucial for chromosomal maintenance/segregation, gene expression, RNA processing/export, and many other processes. Several methods have been developed in the past for predicting nuclear proteins. The aim of the present study is to develop a new method for predicting nuclear proteins with higher accuracy.
Year: 2009 PMID: 19152693 PMCID: PMC2632991 DOI: 10.1186/1471-2105-10-22
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Average amino acid composition of nuclear and non-nuclear protein sequences of the main dataset.
The performance of SVM models using various types of composition.
| 81.33 | 81.75 | 81.64 | 0.59 | |
| 82.03 | 83.11 | 82.83 | 0.61 | |
| 82.18 | 82.68 | 82.55 | 0.61 | |
| 83.03 | 85.76 | 85.05 | 0.65 | |
| 83.69 | 83.89 | 83.84 | 0.63 | |
| 77.41 | 82.06 | 80.84 | 0.56 | |
| 80.29 | 80.93 | 80.77 | 0.57 | |
| 80.19 | 81.22 | 80.95 | 0.57 | |
| 81.48 | 85.04 | 84.11 | 0.63 | |
| 83.62 | 83.65 | 83.64 | 0.63 | |
| 83.77 | 84.03 | 83.96 | 0.63 | |
| 82.77 | 83.95 | 83.64 | 0.62 | |
| 83.80 | 83.47 | 83.55 | 0.63 |
In split amino acid composition, the whole protein was divided into X (X = 2, 3, 4) equal parts; the amino acid composition of each fragment was determined individually, and the compositions were concatenated to form the final input vector of dimension 20*X. NT15 = amino acid composition of the N-terminal 15 residues, CT15 = amino acid composition of the C-terminal 15 residues, and so on; R = amino acid composition of the remaining residues.
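The feature construction described in this footnote can be sketched in Python as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, compositions are expressed as percentages, fragment boundaries for lengths not divisible by X are chosen by integer division, and the order in which the terminal and remaining compositions are concatenated is a guess.

```python
# Hypothetical helper names; the source gives no code for this step.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(seq):
    """Return the percent composition of the 20 standard amino acids."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return [0.0] * len(AMINO_ACIDS)
    # Non-standard residues count towards the length but get no feature
    # of their own (an assumption).
    return [100.0 * seq.count(a) / n for a in AMINO_ACIDS]


def split_composition(seq, parts):
    """Divide seq into `parts` near-equal fragments and concatenate the
    composition of each fragment (vector of dimension 20 * parts)."""
    n = len(seq)
    bounds = [i * n // parts for i in range(parts + 1)]
    vec = []
    for start, end in zip(bounds, bounds[1:]):
        vec.extend(composition(seq[start:end]))
    return vec


def terminal_split_composition(seq, n_term=0, c_term=0):
    """Concatenate the composition of the N-terminal residues, the
    C-terminal residues and the remaining residues (R).
    The NT, CT, R ordering is an assumption."""
    cut = max(n_term, len(seq) - c_term)  # start of the C-terminal part
    vec = []
    if n_term:
        vec.extend(composition(seq[:n_term]))
    if c_term:
        vec.extend(composition(seq[cut:]))
    vec.extend(composition(seq[n_term:cut]))
    return vec


toy = "ACDEFGHIKLMNPQRSTVWY" * 2          # toy 40-residue sequence
print(len(split_composition(toy, 3)))     # dimension 20 * 3
print(len(terminal_split_composition(toy, n_term=25)))  # NT25+R
```

Each feature vector produced this way could then be passed to an SVM trainer; the kernel and parameters used in the study are not given in this section.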
Figure 2. Variation in the preference of amino acids at the N-terminal 25 residues, the C-terminal 25 residues and the full length of nuclear and non-nuclear proteins.
The performance of the hybrid module, which combines the HMM and SVM models (using NT25+R).
| 99.19 | 88.03 | 90.95 | 0.80 | |
| 99.04 | 88.86 | 91.53 | 0.82 | |
| 98.86 | 89.61 | 92.03 | 0.82 | |
| 98.63 | 90.37 | 92.53 | 0.83 | |
| 98.34 | 91.08 | 92.98 | 0.84 | |
| 97.82 | 91.67 | 93.28 | 0.85 | |
| 97.49 | 92.39 | 93.72 | 0.85 | |
| 96.86 | 92.96 | 93.98 | 0.86 | |
| 96.13 | 93.41 | 94.12 | 0.86 | |
| 95.46 | 94.05 | 94.42 | 0.86 | |
| 94.02 | 95.00 | 94.75 | 0.87 | |
| 92.80 | 95.22 | 94.59 | 0.86 | |
| 92.07 | 95.77 | 94.80 | 0.87 | |
| 90.70 | 96.18 | 94.75 | 0.87 | |
| 89.52 | 96.62 | 94.77 | 0.87 | |
| 87.90 | 96.91 | 94.55 | 0.86 | |
| 86.27 | 97.25 | 94.38 | 0.85 | |
| 85.21 | 97.57 | 94.34 | 0.85 | |
| 83.58 | 97.70 | 94.01 | 0.84 | |
| 82.66 | 98.00 | 93.99 | 0.84 |
Threshold is the SVM score cut-off at which performance is calculated. Values in bold show the region where sensitivity and specificity are approximately equal.
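To make the role of the threshold concrete, the sketch below sweeps a cut-off over classifier scores and reports sensitivity, specificity, accuracy and the Matthews correlation coefficient (MCC) using their standard definitions. The source does not state its formulas in this section, so treating these four measures as the ones tabulated above is an assumption, as are the function names, the toy scores, and the convention that a score at or above the threshold is predicted as nuclear.

```python
import math


def confusion(scores, labels, threshold):
    """Count TP, FP, TN, FN; score >= threshold means 'nuclear' (assumed)."""
    tp = fp = tn = fn = 0
    for score, is_nuclear in zip(scores, labels):
        predicted = score >= threshold
        if predicted and is_nuclear:
            tp += 1
        elif predicted and not is_nuclear:
            fp += 1
        elif not predicted and is_nuclear:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def metrics(tp, fp, tn, fn):
    """Return (sensitivity %, specificity %, accuracy %, MCC)."""
    total = tp + fp + tn + fn
    sens = 100.0 * tp / (tp + fn) if tp + fn else 0.0
    spec = 100.0 * tn / (tn + fp) if tn + fp else 0.0
    acc = 100.0 * (tp + tn) / total if total else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sens, spec, acc, mcc


# Toy data for illustration only; not taken from the study.
scores = [1.2, 0.4, 0.1, -0.3, -0.6, -1.1]
labels = [1, 1, 0, 1, 0, 0]
for threshold in (-0.5, 0.0, 0.5):
    sens, spec, acc, mcc = metrics(*confusion(scores, labels, threshold))
    print(f"{threshold:+.1f}  {sens:6.2f}  {spec:6.2f}  {acc:6.2f}  {mcc:5.2f}")
```

Raising the threshold trades sensitivity for specificity, which matches the trend of the first two columns in the table above.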
The performance of different subcellular localization methods on the blind/independent dataset used in BaCelLo.
| 66.1 | 56.4 | 61.1 | 66.4 | 71.3 | 68.8 | |
| 62.2 | 49.5 | 55.5 | 66.4 | 66.9 | 66.6 | |
| 70.2 | 43.0 | 54.9 | 71.1 | 44.2 | 56.1 | |
| 67.8 | 37.2 | 50.2 | 70.5 | 38.4 | 52.0 | |
| 79.1 | 35.8 | 53.2 | 84.4 | 37.5 | 56.3 | |
| 80.2 | 38.7 | 55.7 | 88.5 | 51.0 | 67.2 | |
| 73.3 | 64.2 | 68.6 | 62.3 | 63.5 | 62.9 | |
Cov is coverage/sensitivity; nAcc is normalized accuracy; GAv is the geometric average of coverage and normalized accuracy (*see Additional file 1 for a detailed description of these parameters).
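As a quick check of the GAv definition, the snippet below computes the geometric average as the square root of the product of its two inputs. The exact definition is in Additional file 1, which is not reproduced here, so this formula is an assumption; it is, however, consistent with the first row of the table above if its columns are read as Cov, nAcc, GAv (twice). The function name is hypothetical.

```python
import math


def geometric_average(cov, nacc):
    """Geometric average of coverage and normalized accuracy (assumed form)."""
    return math.sqrt(cov * nacc)


# Values taken from the first row of the table above.
print(round(geometric_average(66.1, 56.4), 1))  # → 61.1
print(round(geometric_average(66.4, 71.3), 1))  # → 68.8
```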
The performance of nuclear protein prediction methods on 2213 human proteins (1526 non-nuclear and 687 nuclear).
| 0.31 | 0.62 | |
| 0.63 | 0.48 | |
| 0.23 | 0.63 | |
| 0.70 | 0.47 | |
| 0.17 | 0.73 | |
| 0.43 | 0.57 | |
| 0.63 | 0.59 | |
| 0.61 | 0.67 | |
PPV = Tp/(Tp + Fp); Tp = true positive predictions; Fp = false positive predictions.
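The positive predictive value defined in this footnote can be computed as in the minimal sketch below. The function name and the counts in the example are hypothetical and are not taken from the study.

```python
def ppv(tp, fp):
    """Positive predictive value: the fraction of positive predictions
    that are correct. Returns 0.0 when nothing is predicted positive."""
    predicted_positive = tp + fp
    if predicted_positive == 0:
        return 0.0
    return tp / predicted_positive


# Hypothetical counts for illustration only.
print(ppv(30, 10))  # → 0.75
```

Because PPV depends on the ratio of nuclear to non-nuclear proteins in the test set, values measured on one dataset need not carry over to a dataset with a different class balance.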
Figure 3. Prediction of nuclear proteins using NpPred in proteome of Yeast (.