Pengmian Feng, Hao Lin, Wei Chen, Yongchun Zuo.
Abstract
J-proteins are molecular chaperones present in a wide variety of organisms, from prokaryotes to eukaryotes. Based on their domain organization, J-proteins can be classified into four types: Type I, Type II, Type III, and Type IV. Different types of J-proteins play distinct roles in influencing cancer properties and cell death. Reliably annotating the type of a J-protein is therefore essential to better understand its molecular function. In the present work, a support vector machine based method was developed to identify the types of J-proteins using the tripeptide composition of a reduced amino acid alphabet. In jackknife cross-validation, a maximum overall accuracy of 94% was achieved on a stringent benchmark dataset. We also analyzed the amino acid compositions using analysis of variance and found distinct distributions of amino acids in each family of J-proteins. To enhance the practical value of the proposed model, an online web server was developed and can be freely accessed.
Year: 2014 PMID: 24804260 PMCID: PMC3996952 DOI: 10.1155/2014/935719
Source DB: PubMed Journal: Biomed Res Int Impact factor: 3.411
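The abstract reports accuracy under jackknife cross-validation, which in this literature usually means leave-one-out testing: each sequence is held out in turn and predicted by a model built from all the others. The sketch below illustrates that loop under stated assumptions. The nearest-centroid classifier is a deliberately simple stand-in, not the support vector machine used in the study, and the function names are hypothetical.

```python
def nearest_centroid_predict(train_x, train_y, query):
    """Stand-in classifier (NOT an SVM): return the label of the closest class mean."""
    sums, sizes = {}, {}
    for x, y in zip(train_x, train_y):
        acc = sums.setdefault(y, [0.0] * len(x))
        for j, value in enumerate(x):
            acc[j] += value
        sizes[y] = sizes.get(y, 0) + 1
    best_label, best_dist = None, float("inf")
    for label, acc in sums.items():
        centroid = [v / sizes[label] for v in acc]
        # Squared Euclidean distance from the query to this class centroid.
        dist = sum((q - c) ** 2 for q, c in zip(query, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def jackknife_accuracy(features, labels, predict=nearest_centroid_predict):
    """Leave-one-out: predict each sample from a model built on all the others."""
    correct = 0
    for i in range(len(features)):
        train_x = features[:i] + features[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if predict(train_x, train_y, features[i]) == labels[i]:
            correct += 1
    return correct / len(features)
```

Any classifier with the same `predict(train_x, train_y, query)` shape could be passed in place of the stand-in; with N samples the loop builds N models, which is why the procedure is costly on large datasets.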
Breakdown of the benchmark dataset used in the current study.
| Total number | Subfamily | Number |
|---|---|---|
| 1245 | Type I J-protein | 63 |
| | Type II J-protein | 53 |
| | Type III J-protein | 1107 |
| | Type IV J-protein | 22 |
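The class counts above are heavily skewed toward Type III, which matters when reading the overall accuracy (OA) values in the results tables. As a rough worked calculation that is not part of the original study, the snippet below computes each type's share of the benchmark and the accuracy a trivial classifier would reach by always predicting the largest class. The counts come from the table; the variable names are illustrative.

```python
# Class counts taken from the benchmark dataset table.
counts = {"Type I": 63, "Type II": 53, "Type III": 1107, "Type IV": 22}

total = sum(counts.values())

# Share of each J-protein type in the benchmark, in percent.
shares = {name: 100.0 * n / total for name, n in counts.items()}

# Hypothetical baseline: label every sequence with the most frequent class.
majority_class = max(counts, key=counts.get)
baseline_oa = shares[majority_class]

print(f"total = {total}")
for name, pct in shares.items():
    print(f"{name}: {pct:.2f}%")
print(f"majority baseline ({majority_class}): {baseline_oa:.2f}%")
```

Because this baseline already sits close to 89%, OA alone says little about the minority types, so the per-type Sn and MCC values are worth reading alongside it.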
Scheme for reduced amino acid alphabet based on protein blocks method.
| Cluster profiles | Protein blocks method |
|---|---|
| CP(13) | G- |
| CP(11) | G- |
| CP(9) | G- |
| CP(8) | G- |
| CP(5) | G- |
Feature vector dimension of n-peptide composition with different cluster profiles.
| n-peptide | CP(13) | CP(11) | CP(9) | CP(8) | CP(5) |
|---|---|---|---|---|---|
| n = 1 | 13 | 11 | 9 | 8 | 5 |
| n = 2 | 169 | 121 | 81 | 64 | 25 |
| n = 3 | 2197 | 1331 | 729 | 512 | 125 |
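The dimensions in this table follow from counting every possible n-peptide over an alphabet of k clusters, which gives k^n features. The sketch below illustrates the idea under stated assumptions: the three-group alphabet in `TOY_GROUPS` is a made-up placeholder, not the protein blocks clustering used in the study, and the function names are hypothetical.

```python
from itertools import product

# Hypothetical reduced alphabet: NOT the paper's protein-blocks clusters.
TOY_GROUPS = ["AVLIMC", "FWYH", "STNQGPDEKR"]


def build_mapping(groups):
    """Map each amino acid letter to a one-character cluster label."""
    labels = "abcdefghijklmnopqrst"
    return {aa: labels[i] for i, group in enumerate(groups) for aa in group}


def feature_dimension(k, n):
    """Number of distinct n-peptides over an alphabet of k clusters."""
    return k ** n


def npeptide_composition(sequence, groups, n=3):
    """Return normalized n-peptide frequencies of a sequence in the reduced alphabet."""
    mapping = build_mapping(groups)
    # Residues outside the mapping (e.g. X) are dropped in this sketch,
    # which joins their neighbours; a real pipeline may handle them differently.
    reduced = "".join(mapping[aa] for aa in sequence if aa in mapping)
    labels = sorted(set(mapping.values()))
    keys = ["".join(p) for p in product(labels, repeat=n)]
    counts = dict.fromkeys(keys, 0)
    for i in range(len(reduced) - n + 1):
        counts[reduced[i:i + n]] += 1
    # Guard against sequences shorter than n, which yield an all-zero vector.
    total = max(len(reduced) - n + 1, 1)
    return [counts[key] / total for key in keys]
```

For example, `feature_dimension(13, 3)` reproduces the 2197 entry in the table, and `npeptide_composition("MAKDYYEILGV", TOY_GROUPS)` returns a 27-element vector because the toy alphabet has only three clusters.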
Results obtained in identifying J-protein functional types for the tripeptide case (n = 3).
| Subfamily | Metrics | Feature dimension of | | | | |
|---|---|---|---|---|---|---|
| | | CP(13) | CP(11) | CP(9) | CP(8) | CP(5) |
| | | 2197 | 1331 | 729 | 512 | 125 |
| Type I J-protein | Sn | 63.49% | 74.60% | 77.78% | | 60.31% |
| | Sp | 99.56% | 98.94% | 99.11% | | 98.93% |
| | MCC | 0.74 | 0.76 | 0.79 | | 0.66 |
| Type II J-protein | Sn | 37.73% | 45.28% | 39.62% | | 24.53% |
| | Sp | 100% | 99.31% | 99.39% | | 99.56% |
| | MCC | 0.60 | 0.57 | 0.53 | | 0.41 |
| Type III J-protein | Sn | 99.81% | 98.82% | 99.09% | | 99.19% |
| | Sp | 44.44% | 58.78% | 55.72% | | 40.00% |
| | MCC | 0.63 | 0.68 | 0.67 | | 0.56 |
| Type IV J-protein | Sn | 0 | 27.27% | 13.64% | | 4.54% |
| | Sp | 100.00% | 100.00% | 100.00% | | 100.00% |
| | MCC | 0 | 0.52 | 0.37 | | 0.21 |
| OA | | 93.57% | 94.06% | 93.98% | | 92.36% |
Comparison of SVM with other methods for J-protein type classification.
| Subfamily | SVM | | | Random Forest | | | Naïve Bayes | | |
|---|---|---|---|---|---|---|---|---|---|
| | Sn | Sp | MCC | Sn | Sp | MCC | Sn | Sp | MCC |
| Type I J-protein | 74.60% | 98.76% | 0.75 | 14.29% | 99.55% | 0.29 | 74.60% | 92.17% | 0.47 |
| Type II J-protein | 49.06% | 99.05% | 0.57 | 13.33% | 99.82% | 0.31 | 54.72% | 94.67% | 0.39 |
| Type III J-protein | 98.56% | 62.02% | 0.69 | 99.73% | 12.70% | 0.31 | 88.62% | 65.83% | 0.43 |
| Type IV J-protein | 31.81% | 100.00% | 0.56 | 4.55% | 100.00% | 0.21 | 13.64% | 100.00% | 0.37 |
| OA | 94.06% | | | 89.96% | | | 85.14% | | |
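Sn, Sp, MCC, and OA are not defined in this excerpt. The sketch below uses their standard one-vs-rest definitions computed from a multiclass confusion matrix, which is an assumption about how the study derived them. The function names are hypothetical.

```python
from math import sqrt


def one_vs_rest_metrics(confusion, cls):
    """Sn, Sp, and MCC for class index `cls`, where confusion[i][j] counts
    samples of true class i that were predicted as class j."""
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    tp = confusion[cls][cls]
    fn = sum(confusion[cls]) - tp
    fp = sum(confusion[i][cls] for i in range(n)) - tp
    tn = total - tp - fn - fp
    sn = tp / (tp + fn) if tp + fn else 0.0  # sensitivity (recall)
    sp = tn / (tn + fp) if tn + fp else 0.0  # specificity
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally reported as 0 when the denominator vanishes.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sn, sp, mcc


def overall_accuracy(confusion):
    """Fraction of all samples that lie on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return correct / total
```

With an invented two-class matrix `[[8, 2], [1, 9]]`, class 0 has Sn = 0.8 and Sp = 0.9, and the overall accuracy is 0.85. Under these definitions a class that is never predicted gets Sn = 0, Sp = 1, and MCC = 0.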
Figure 1. Statistical results showing the divergent distributions of the 20 amino acids among the four types (I, II, III, and IV) of J-proteins. Green boxes indicate that the frequency differences among different types of J-proteins are not significant. Blue boxes indicate that the amino acid is significantly enriched (P < 0.05; LSD test) in one type of J-protein compared with its counterpart. Taking W as an example, the blue box at coordinate (W, I–IV) indicates that W is enriched in Type I J-proteins compared with Type IV J-proteins. Red boxes indicate that the amino acid is lacking in one type of J-protein but significantly enriched (P < 0.05; LSD test) in its counterpart. Again taking W as the example, the two red boxes at coordinates (W, I–III) and (W, II–III) indicate that W is lacking in both Type I and Type II J-proteins compared with Type III J-proteins.
Figure 2. A semiscreenshot showing the top page of the web server. It is available at http://lin.uestc.edu.cn/server/Jpred.
(a) For the single amino acid case (n = 1)
| Subfamily | Metrics | Feature dimension of | | | | | |
|---|---|---|---|---|---|---|---|
| | | CP(20) | CP(13) | CP(11) | CP(9) | CP(8) | CP(5) |
| | | 20 | 13 | 11 | 9 | 8 | 5 |
| Type I J-protein | Sn | 71.42% | 65.08% | 68.25% | 52.38% | 50.79% | 22.22% |
| | Sp | 98.58% | 98.66% | 98.66% | 98.93% | 98.48% | 98.03% |
| | MCC | 0.71 | 0.67 | 0.69 | 0.60 | 0.56 | 0.26 |
| Type II J-protein | Sn | 33.96% | 30.19% | 33.96% | 16.98% | 16.98% | 15.09% |
| | Sp | 99.82% | 99.21% | 99.12% | 99.47% | 99.56% | 99.09% |
| | MCC | 0.54 | 0.42 | 0.45 | 0.30 | 0.31 | 0.23 |
| Type III J-protein | Sn | 98.74% | 98.28% | 98.10% | 99.09% | 98.73% | 98.19% |
| | Sp | 48.12% | 42.86% | 45.52% | 32.31% | 31.54% | 17.46% |
| | MCC | 59.71% | 0.53 | 0.54 | 0.48 | 0.45 | 0.26 |
| Type IV J-protein | Sn | 4.54% | 0 | 0 | 0 | 0 | 0 |
| | Sp | 99.91% | 100.00% | 100.00% | 100.00% | 100.00% | 100.00% |
| | MCC | 0.15 | 0.53 | 0 | 0 | 0 | 0 |
| OA | | 92.93% | 91.97% | 92.13% | 91.48% | 91.08% | 89.08% |
(b) For the dipeptide case (n = 2)
| Subfamily | Metrics | Feature dimension of | | | | | |
|---|---|---|---|---|---|---|---|
| | | CP(20) | CP(13) | CP(11) | CP(9) | CP(8) | CP(5) |
| | | 400 | 169 | 121 | 81 | 64 | 25 |
| Type I J-protein | Sn | 74.42% | 60.31% | 73.02% | 60.32% | 58.73% | 49.20% |
| | Sp | 97.58% | 98.59% | 98.76% | 97.71% | 98.32% | 97.79% |
| | MCC | 0.75 | 0.63 | 0.73 | 0.58 | 0.60 | 0.5 |
| Type II J-protein | Sn | 39.76% | 45.23% | 39.62% | 39.62% | 35.84% | 28.30% |
| | Sp | 94.31% | 99.29% | 99.48% | 99.03% | 98.60% | 97.99% |
| | MCC | 0.57 | 0.57 | 0.54 | 0.49 | 0.42 | 0.31 |
| Type III J-protein | Sn | 98.88% | 98.10% | 98.82% | 97.74% | 98.01% | 97.31% |
| | Sp | 46.37% | 50.74% | 51.14% | 50.79% | 48.80% | 40.34% |
| | MCC | 60.08% | 0.59 | 0.62 | 0.57 | 0.56 | 0.46 |
| Type IV J-protein | Sn | 13.16% | 27.27% | 0 | 22.73% | 25.00% | 9.09% |
| | Sp | 99.91% | 99.91% | 100.00% | 100.00% | 100.00% | 99.91% |
| | MCC | 0.13 | 0.48 | 0 | 0.47 | 0.47 | 0.24 |
| OA | | 91.47% | 92.93% | 93.25% | 91.97% | 92.04% | 91.16% |