Yan-ping Zhang, Ya-jun Sheng, Wei Zheng, Ping-an He, Ji-shuo Ruan.
Abstract
The hydrophobicity and hydrophilicity of amino acids play an important role in protein folding, in a protein's interactions with its environment and with other molecules, and in its catalytic mechanism. Based on these two physicochemical indexes, we introduce a 2D graphical representation of protein sequences. On the basis of this representation, we propose a new numerical characteristic for computing the distance between sequences, which we use to analyze sequence similarity/dissimilarity. We then apply the new distance to the similarities/dissimilarities of the ND5 proteins of nine species and to the prediction of the four major structural classes on a dataset of 639 domains. The results show that the method is simple and effective.
Year: 2015 PMID: 25705698 PMCID: PMC4332462 DOI: 10.1155/2015/909567
Source DB: PubMed Journal: Biomed Res Int Impact factor: 3.411
The EH0 and Hp0 values of the 20 amino acids and their coordinates (EH1, Hp1) in the 2D Cartesian plane, derived from Equation (1).
| Amino acid | Code | EH0 | Hp0 | EH1 | Hp1 |
|---|---|---|---|---|---|
| Alanine | A | 0.62 | 1.8 | 0.62 | 2.29 |
| Cysteine | C | 0.29 | 2.5 | 0.29 | 2.99 |
| Aspartate | D | −0.9 | −3.5 | −0.9 | −3.01 |
| Glutamate | E | −0.74 | −3.5 | −0.74 | −3.01 |
| Phenylalanine | F | 1.19 | 2.8 | 1.19 | 3.29 |
| Glycine | G | 0.48 | −0.4 | 0.48 | 0.09 |
| Histidine | H | −0.4 | −3.2 | −0.4 | −2.71 |
| Isoleucine | I | 1.38 | 4.5 | 1.38 | 4.99 |
| Lysine | K | −1.5 | −3.9 | −1.5 | −3.41 |
| Leucine | L | 1.06 | 3.8 | 1.06 | 4.29 |
| Methionine | M | 0.64 | 1.9 | 0.64 | 2.39 |
| Asparagine | N | −0.78 | −3.5 | −0.78 | −3.01 |
| Proline | P | 0.12 | −1.6 | 0.12 | −1.11 |
| Glutamine | Q | −0.85 | −3.5 | −0.85 | −3.01 |
| Arginine | R | −2.53 | −4.5 | −2.53 | −4.01 |
| Serine | S | −0.18 | −0.8 | −0.18 | −0.31 |
| Threonine | T | −0.05 | −0.7 | −0.05 | −0.21 |
| Valine | V | 1.08 | 4.2 | 1.08 | 4.69 |
| Tryptophan | W | 0.81 | −0.9 | 0.81 | −0.41 |
| Tyrosine | Y | 0.26 | −1.3 | 0.26 | −0.81 |
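Equation (1) itself is not reproduced in this excerpt. The tabulated values are, however, consistent with centering each index on its mean over the 20 amino acids: the EH0 column already averages to 0, while Hp0 averages to −0.49, so every EH1 equals EH0 and every Hp1 equals Hp0 + 0.49. The Python sketch below assumes that this centering is what Equation (1) does; the helper name `center` is illustrative only.

```python
# Per-residue indexes copied from the EH0 and Hp0 columns of the table above.
EH0 = {
    "A": 0.62, "C": 0.29, "D": -0.90, "E": -0.74, "F": 1.19,
    "G": 0.48, "H": -0.40, "I": 1.38, "K": -1.50, "L": 1.06,
    "M": 0.64, "N": -0.78, "P": 0.12, "Q": -0.85, "R": -2.53,
    "S": -0.18, "T": -0.05, "V": 1.08, "W": 0.81, "Y": 0.26,
}
HP0 = {
    "A": 1.8, "C": 2.5, "D": -3.5, "E": -3.5, "F": 2.8,
    "G": -0.4, "H": -3.2, "I": 4.5, "K": -3.9, "L": 3.8,
    "M": 1.9, "N": -3.5, "P": -1.6, "Q": -3.5, "R": -4.5,
    "S": -0.8, "T": -0.7, "V": 4.2, "W": -0.9, "Y": -1.3,
}


def center(scale):
    """Assumed form of Equation (1): subtract the mean over the 20 amino acids."""
    mean = sum(scale.values()) / len(scale)
    return {aa: round(value - mean, 2) for aa, value in scale.items()}


EH1 = center(EH0)  # unchanged, because the EH0 mean is already 0
HP1 = center(HP0)  # shifted by +0.49, because the HP0 mean is -0.49

print(EH1["R"], HP1["R"])
```

For arginine this gives the point (−2.53, −4.01), matching the EH1 and Hp1 columns.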
Protein I: WTFESRNKPAKDPVILWLNGGPGCSSLTGL.
Protein II: WFFESRNKPANDPIILWLNGGPGCSSFTGL.
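To make the graphical idea concrete, the sketch below maps proteins I and II to point sequences using the (EH1, Hp1) columns of the table above. The excerpt does not state how the curves in Figure 1 are built. The cumulative walk used here, in which each residue adds its coordinate vector to the previous point, is an assumption borrowed from common 2D sequence-graph constructions and is not necessarily the authors' construction. `walk` is a hypothetical helper name.

```python
# (EH1, Hp1) coordinates copied from the table above.
COORDS = {
    "A": (0.62, 2.29), "C": (0.29, 2.99), "D": (-0.90, -3.01), "E": (-0.74, -3.01),
    "F": (1.19, 3.29), "G": (0.48, 0.09), "H": (-0.40, -2.71), "I": (1.38, 4.99),
    "K": (-1.50, -3.41), "L": (1.06, 4.29), "M": (0.64, 2.39), "N": (-0.78, -3.01),
    "P": (0.12, -1.11), "Q": (-0.85, -3.01), "R": (-2.53, -4.01), "S": (-0.18, -0.31),
    "T": (-0.05, -0.21), "V": (1.08, 4.69), "W": (0.81, -0.41), "Y": (0.26, -0.81),
}

PROTEIN_I = "WTFESRNKPAKDPVILWLNGGPGCSSLTGL"
PROTEIN_II = "WFFESRNKPANDPIILWLNGGPGCSSFTGL"


def walk(sequence):
    """Hypothetical curve: running sum of per-residue (EH1, Hp1) vectors."""
    x = y = 0.0
    points = []
    for residue in sequence:
        dx, dy = COORDS[residue]
        x, y = x + dx, y + dy
        points.append((x, y))
    return points


curve_i, curve_ii = walk(PROTEIN_I), walk(PROTEIN_II)
# 1-based positions where the two sequences carry different residues.
mismatches = [i + 1 for i, (a, b) in enumerate(zip(PROTEIN_I, PROTEIN_II)) if a != b]
print("differing positions:", mismatches)
```

The two sequences differ only at positions 2, 11, 14, and 27, so under this construction their curves share the first point and separate from the second residue onward.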
Figure 1: The curves of protein sequences I and II, plotted from their coordinate values.
Slope difference distances between the ND5 proteins of nine species, computed with our approach.
| | Gorilla | Pygmy chimpanzee | Common chimpanzee | Fin whale | Blue whale | Rat | Mouse | Opossum |
|---|---|---|---|---|---|---|---|---|
| Human | | | | 0.7717 | 0.7816 | 0.8681 | 0.8075 | 1.5101 |
| Gorilla | | | | 0.7824 | 0.7899 | 0.9509 | 0.8444 | 1.6152 |
| Pygmy chimpanzee | | | | 0.7747 | 0.7843 | 0.8898 | 0.8082 | 1.5345 |
| Common chimpanzee | | | | 0.7588 | 0.7700 | 0.8909 | 0.7701 | 1.5315 |
| Fin whale | | | | | | 0.7588 | 0.7314 | 1.4427 |
| Blue whale | | | | | | 0.7947 | 0.7452 | 1.4880 |
| Rat | | | | | | | | 1.4290 |
| Mouse | | | | | | | | 1.3969 |
Distance matrix for the ND5 proteins of nine species, calculated by ClustalW.
| | Gorilla | Pygmy chimpanzee | Common chimpanzee | Fin whale | Blue whale | Rat | Mouse | Opossum |
|---|---|---|---|---|---|---|---|---|
| Human | | | | 41.0 | 41.3 | 50.2 | 48.9 | 50.4 |
| Gorilla | | | | 42.7 | 42.4 | 51.4 | 49.9 | 54.0 |
| Pygmy chimpanzee | | | | 40.1 | 40.1 | 50.2 | 48.9 | 50.1 |
| Common chimpanzee | | | | 40.4 | 40.4 | 50.8 | 49.6 | 51.4 |
| Fin whale | | | | | | 45.3 | 46.8 | 52.7 |
| Blue whale | | | | | | 45.0 | 45.9 | 52.7 |
| Rat | | | | | | | | 54.0 |
| Mouse | | | | | | | | 50.8 |
Figure 2: Correlation analysis between ClustalW and other methods.
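Figure 2 compares the slope difference distances with the ClustalW distances. As a rough check, the sketch below computes a Pearson correlation coefficient over only the 28 species pairs whose values appear in both distance tables (the closest pairs, within primates, between the two whales, and between rat and mouse, are blank there). Because of those missing pairs it will not necessarily reproduce the coefficient shown in the figure. The helper `pearson` is illustrative.

```python
import math

# (slope difference distance, ClustalW distance) for pairs present in both tables.
PAIRS = {
    ("Human", "Fin whale"): (0.7717, 41.0),
    ("Human", "Blue whale"): (0.7816, 41.3),
    ("Human", "Rat"): (0.8681, 50.2),
    ("Human", "Mouse"): (0.8075, 48.9),
    ("Human", "Opossum"): (1.5101, 50.4),
    ("Gorilla", "Fin whale"): (0.7824, 42.7),
    ("Gorilla", "Blue whale"): (0.7899, 42.4),
    ("Gorilla", "Rat"): (0.9509, 51.4),
    ("Gorilla", "Mouse"): (0.8444, 49.9),
    ("Gorilla", "Opossum"): (1.6152, 54.0),
    ("Pygmy chimpanzee", "Fin whale"): (0.7747, 40.1),
    ("Pygmy chimpanzee", "Blue whale"): (0.7843, 40.1),
    ("Pygmy chimpanzee", "Rat"): (0.8898, 50.2),
    ("Pygmy chimpanzee", "Mouse"): (0.8082, 48.9),
    ("Pygmy chimpanzee", "Opossum"): (1.5345, 50.1),
    ("Common chimpanzee", "Fin whale"): (0.7588, 40.4),
    ("Common chimpanzee", "Blue whale"): (0.7700, 40.4),
    ("Common chimpanzee", "Rat"): (0.8909, 50.8),
    ("Common chimpanzee", "Mouse"): (0.7701, 49.6),
    ("Common chimpanzee", "Opossum"): (1.5315, 51.4),
    ("Fin whale", "Rat"): (0.7588, 45.3),
    ("Fin whale", "Mouse"): (0.7314, 46.8),
    ("Fin whale", "Opossum"): (1.4427, 52.7),
    ("Blue whale", "Rat"): (0.7947, 45.0),
    ("Blue whale", "Mouse"): (0.7452, 45.9),
    ("Blue whale", "Opossum"): (1.4880, 52.7),
    ("Rat", "Opossum"): (1.4290, 54.0),
    ("Mouse", "Opossum"): (1.3969, 50.8),
}


def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


slope, clustal = zip(*PAIRS.values())
r = pearson(slope, clustal)
print(f"Pearson r over {len(PAIRS)} pairs: {r:.3f}")
```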
Comparison of jackknife accuracies (%) of different algorithms for each structural class.
| Dataset | Algorithm | All-α (%) | All-β (%) | α/β (%) | α + β (%) | Overall (%) |
|---|---|---|---|---|---|---|
| 639 domains (25% sequence identity) | SVM [ | 73.91 | 61.04 | 81.92 | 33.92 | 62.34 |
| | IB1 [ | 53.62 | 46.10 | 68.93 | 34.50 | 50.94 |
| | C4.5 [ | 59.42 | 49.35 | 58.19 | 28.65 | 48.44 |
| | Naive Bayes [ | 55.07 | 62.34 | 80.26 | 19.88 | 54.38 |
| | Logistic regression [ | 69.57 | 58.44 | 61.58 | 29.82 | 54.06 |
| | | 54.35 | 36.36 | 77.97 | 37.06 | 51.96 |
| | Our method | 54.71 | | 72.32 | | |
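The jackknife test reported above is leave-one-out cross-validation: each domain is held out once and classified using all the others, and the accuracy for a class is the fraction of its held-out members predicted correctly. The classifier behind "Our method" is not described in this excerpt, so the sketch below pairs the jackknife loop with a 1-nearest-neighbour rule over an arbitrary distance function. The `jackknife_accuracy` helper, the 1-NN rule, and the toy data are all hypothetical.

```python
from collections import defaultdict


def jackknife_accuracy(samples, labels, distance):
    """Leave-one-out (jackknife) evaluation of a 1-nearest-neighbour classifier.

    Returns per-class accuracy and overall accuracy as fractions.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for i, (query, true_label) in enumerate(zip(samples, labels)):
        # Hold sample i out and find its nearest neighbour among the rest.
        best_j = min(
            (j for j in range(len(samples)) if j != i),
            key=lambda j: distance(query, samples[j]),
        )
        total[true_label] += 1
        if labels[best_j] == true_label:
            correct[true_label] += 1
    per_class = {c: correct[c] / total[c] for c in total}
    overall = sum(correct.values()) / len(samples)
    return per_class, overall


# Toy 1D descriptors standing in for sequence features (hypothetical data).
xs = [0.1, 0.2, 0.3, 1.0, 1.1, 1.2]
ys = ["all-alpha"] * 3 + ["all-beta"] * 3
per_class, overall = jackknife_accuracy(xs, ys, lambda a, b: abs(a - b))
print(per_class, overall)
```

Overall accuracy here is the fraction of all held-out samples classified correctly, so it weights each class by its size rather than averaging the per-class figures.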
Four further jackknife performance measures of our method for each structural class.
| Classes | Sensitivity (%) | Specificity (%) | MCC (%) | AUC (%) |
|---|---|---|---|---|
| All-α | 52.97 | 61.40 | 11.64 | 60.93 |
| All-β | 61.36 | 64.89 | 25.97 | 61.63 |
| α/β | 65.25 | 91.58 | 50.36 | 87.51 |
| α + β | 52.14 | 57.14 | 8.21 | 53.51 |
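Sensitivity, specificity, and the Matthews correlation coefficient (MCC) for one class are computed one-versus-rest from that class's counts of true and false positives and negatives; the table reports them as percentages. The sketch below uses the standard definitions. The counts in the usage line are made up and do not come from the paper.

```python
import math


def class_metrics(tp, fp, tn, fn):
    """One-versus-rest sensitivity, specificity, and Matthews correlation coefficient."""
    sensitivity = tp / (tp + fn)  # fraction of true class members recovered
    specificity = tn / (tn + fp)  # fraction of non-members correctly rejected
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, specificity, mcc


# Hypothetical counts for a single class.
sens, spec, mcc = class_metrics(tp=40, fp=5, tn=45, fn=10)
print(f"sensitivity={sens:.2%} specificity={spec:.2%} MCC={mcc:.4f}")
```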
Figure 3: ROC curves for the four classes (all-α, all-β, α/β, and α + β) with their respective AUC values.
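Each AUC in Figure 3 is the area under a one-versus-rest ROC curve. An equivalent way to compute it, without drawing the curve, is the probability that a randomly chosen member of the class receives a higher score than a randomly chosen non-member, with ties counted as one half. The sketch assumes that the classifier outputs a real-valued score per domain, which this excerpt does not specify; the scores in the usage line are invented.

```python
def auc(pos_scores, neg_scores):
    """ROC AUC via pairwise comparison, i.e. the Mann-Whitney U statistic / (n_pos * n_neg)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical scores: class members vs. non-members.
print(auc([0.9, 0.8], [0.1, 0.85]))
```

The double loop is quadratic in the number of domains, which is acceptable for a dataset of a few hundred sequences; a rank-based formula gives the same value in O(n log n).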