Zhao-Hui Qi1, Ke-Cheng Li1, Jin-Long Ma1, Yu-Hua Yao2, Ling-Yun Liu1.
Abstract
In this article, we propose a 3-dimensional graphical representation of protein sequences based on 10 physicochemical properties of the 20 amino acids and the BLOSUM62 matrix. The representation incorporates evolutionary information and provides an intuitive visualization. To further analyze the similarity of proteins, we extract a specific vector from the graphical representation curve. The vector is used to calculate the similarity distance between 2 protein sequences. To demonstrate the effectiveness of our approach, we apply it to 3 real data sets. The results are consistent with known evolutionary relationships and indicate that our method is effective for phylogenetic analysis.
Keywords: BLOSUM62 matrix; graphical representation; physicochemical properties; protein sequences
Year: 2018 PMID: 29977111 PMCID: PMC6024350 DOI: 10.1177/1176934318777755
Source DB: PubMed Journal: Evol Bioinform Online ISSN: 1176-9343 Impact factor: 1.625
Numerical values of the 10 physicochemical properties of the 20 amino acids.
| Amino acid | Pro1 | Pro2 | Pro3 | Pro4 | Pro5 | Pro6 | Pro7 | Pro8 | Pro9 | Pro10 |
|---|---|---|---|---|---|---|---|---|---|---|
| A (Ala) | 2.34 | 9.69 | 7.0 | 6.00 | 0.33 | −0.062 | 31 | −0.11 | 0.239 | 8.1 |
| R (Arg) | 2.17 | 9.04 | 9.1 | 10.76 | −0.176 | −0.167 | 124 | 0.079 | 0.211 | 10.5 |
| N (Asn) | 2.02 | 8.80 | 10.0 | 5.41 | −0.233 | 0.166 | 56 | −0.136 | 0.249 | 11.6 |
| D (Asp) | 2.09 | 9.82 | 13.0 | 2.77 | −0.371 | −0.079 | 54 | −0.285 | 0.171 | 13.0 |
| C (Cys) | 1.71 | 10.78 | 4.8 | 5.07 | 0.074 | 0.38 | 55 | −0.184 | 0.22 | 5.5 |
| Q (Gln) | 2.17 | 9.13 | 8.6 | 5.65 | −0.409 | −0.025 | 85 | −0.246 | 0.26 | 10.5 |
| E (Glu) | 2.19 | 9.67 | 12.5 | 3.22 | −0.254 | −0.184 | 83 | −0.067 | 0.187 | 12.3 |
| G (Gly) | 2.34 | 9.60 | 7.9 | 5.97 | 0.37 | −0.017 | 3 | −0.073 | 0.16 | 9.0 |
| H (His) | 1.82 | 9.17 | 8.4 | 7.59 | −0.078 | 0.056 | 96 | 0.32 | 0.205 | 10.4 |
| I (Ile) | 2.36 | 9.68 | 4.9 | 6.02 | 0.149 | −0.309 | 111 | 0.001 | 0.273 | 5.2 |
| L (Leu) | 2.36 | 9.60 | 4.9 | 5.98 | 0.129 | −0.264 | 111 | −0.008 | 0.281 | 4.9 |
| K (Lys) | 2.18 | 8.95 | 10.1 | 9.74 | −0.075 | −0.371 | 119 | 0.049 | 0.228 | 11.3 |
| M (Met) | 2.28 | 9.21 | 5.3 | 5.74 | −0.092 | 0.077 | 105 | −0.041 | 0.253 | 5.7 |
| F (Phe) | 1.83 | 9.13 | 5.0 | 5.48 | 0.011 | 0.074 | 132 | 0.438 | 0.234 | 5.2 |
| P (Pro) | 1.99 | 10.60 | 6.6 | 6.30 | 0.37 | −0.036 | 32.5 | −0.016 | 0.165 | 8.0 |
| S (Ser) | 2.21 | 9.15 | 7.5 | 5.68 | 0.022 | 0.47 | 32 | −0.153 | 0.236 | 9.2 |
| T (Thr) | 2.63 | 10.43 | 6.6 | 6.16 | 0.136 | 0.348 | 61 | −0.208 | 0.213 | 8.6 |
| W (Trp) | 2.38 | 9.39 | 5.2 | 5.89 | 0.011 | 0.05 | 170 | 0.493 | 0.183 | 5.4 |
| Y (Tyr) | 2.20 | 9.11 | 5.4 | 5.66 | −0.138 | 0.22 | 136 | 0.381 | 0.193 | 6.2 |
| V (Val) | 2.32 | 9.62 | 5.6 | 5.96 | 0.245 | 0.212 | 84 | −0.155 | 0.255 | 5.9 |
Pro1, pK1 (–COOH); Pro2, pK2 (–NH3); Pro3, polar requirement; Pro4, isoelectric point; Pro5, hydrogenation; Pro6, hydroxythiolation; Pro7, molecular volume; Pro8, aromaticity; Pro9, aliphaticity; Pro10, polarity.
Grouping of the 20 amino acids after clustering on each property.
| Properties | G1 | G2 | G3 | G4 | G5 | G6 | G7 | G8 | G9 | G10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Pro1 | AGILMWV | HF | RQEKSY | T | NP | C | D | | | |
| Pro2 | AEI | QHMFSY | P | RK | W | C | T | N | GLV | D |
| Pro3 | CILF | QH | E | A | NK | MWYV | GS | PT | R | D |
| Pro4 | ANCQGHILMFPSTWYV | RK | DE | | | | | | | |
| Pro5 | ILT | HKM | DQ | GP | NE | FSW | V | RY | A | C |
| Pro6 | IL | HMFW | CT | QGP | YV | RE | S | K | AD | N |
| Pro7 | ILM | APS | W | QEV | NDCT | FY | G | H | RK | |
| Pro8 | ARNDCQEGILKMPSTV | HFWY | | | | | | | | |
| Pro9 | IL | RCHT | AKFS | DGP | EWY | NQMV | | | | |
| Pro10 | NK | CILMFWYV | AGPST | DE | RQH | | | | | |
The similarity degree of each pair of amino acids.
| Amino acid | A | R | N | D | C | Q | E | G | H | I | L | K | M | F | P | S | T | W | Y | V |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| A (Ala) | | −1 | −4 | −4 | 0 | −2 | −2 | 0 | −2 | −4 | −3 | −2 | −3 | −4 | −4 | 5 | 0 | −6 | −2 | 0 |
| R (Arg) | −1 | | 0 | −2 | −6 | 3 | 0 | −2 | 0 | −3 | −2 | 10 | −1 | 0 | −2 | −2 | −2 | 0 | −4 | −3 |
| N (Asn) | −4 | 0 | | 2 | −9 | 0 | 0 | 0 | 1 | −6 | −6 | 0 | −6 | −3 | −6 | 2 | 0 | −4 | −2 | −9 |
| D (Asp) | −4 | −2 | 2 | | −6 | 0 | 6 | −2 | 0 | −3 | −4 | −1 | −3 | 0 | −2 | 0 | −2 | 0 | 0 | −3 |
| C (Cys) | 0 | −6 | −9 | −6 | | −6 | −4 | −6 | −6 | −4 | −4 | −3 | −3 | −6 | −6 | −2 | −5 | −4 | −4 | −3 |
| Q (Gln) | −2 | 3 | 0 | 0 | −6 | | 6 | −6 | 0 | −6 | −4 | 2 | 0 | −6 | −3 | 0 | −2 | −2 | −3 | −8 |
| E (Glu) | −2 | 0 | 0 | 6 | −4 | 6 | | −2 | 0 | −6 | −3 | 2 | −2 | 0 | −1 | 0 | −1 | −3 | −4 | −4 |
| G (Gly) | 0 | −2 | 0 | −2 | −6 | −6 | −2 | | −2 | −12 | −16 | −2 | −9 | −3 | −12 | 0 | −6 | −4 | −3 | −12 |
| H (His) | −2 | 0 | 1 | 0 | −6 | 0 | 0 | −2 | | −3 | −3 | −1 | −8 | −5 | −2 | −2 | −4 | −6 | 6 | −3 |
| I (Ile) | −4 | −3 | −6 | −3 | −4 | −6 | −6 | −12 | −3 | | 18 | −3 | 5 | 0 | −6 | −4 | −3 | −9 | −2 | 12 |
| L (Leu) | −3 | −2 | −6 | −4 | −4 | −4 | −3 | −16 | −3 | 18 | | −2 | 10 | 0 | −6 | −4 | −3 | −6 | −2 | 5 |
| K (Lys) | −2 | 10 | 0 | −1 | −3 | 2 | 2 | −2 | −1 | −3 | −2 | | −2 | −3 | −1 | 0 | −1 | 0 | −2 | −2 |
| M (Met) | −3 | −1 | −6 | −3 | −3 | 0 | −2 | −9 | −8 | 5 | 10 | −2 | | 0 | −4 | −3 | −2 | −5 | −4 | 6 |
| F (Phe) | −4 | 0 | −3 | 0 | −6 | −6 | 0 | −3 | −5 | 0 | 0 | −3 | 0 | | −4 | −8 | −2 | 5 | 15 | −2 |
| P (Pro) | −4 | −2 | −6 | −2 | −6 | −3 | −1 | −12 | −2 | −6 | −6 | −1 | −4 | −4 | | −4 | −4 | −4 | −3 | −4 |
| S (Ser) | 5 | −2 | 2 | 0 | −2 | 0 | 0 | 0 | −2 | −4 | −4 | 0 | −3 | −8 | −4 | | 3 | −6 | −6 | −4 |
| T (Thr) | 0 | −2 | 0 | −2 | −5 | −2 | −1 | −6 | −4 | −3 | −3 | −1 | −2 | −2 | −4 | 3 | | −2 | −2 | 0 |
| W (Trp) | −6 | 0 | −4 | 0 | −4 | −2 | −3 | −4 | −6 | −9 | −6 | 0 | −5 | 5 | −4 | −6 | −2 | | 10 | −12 |
| Y (Tyr) | −2 | −4 | −2 | 0 | −4 | −3 | −4 | −3 | 6 | −2 | −2 | −2 | −4 | 15 | −3 | −6 | −2 | 10 | | −4 |
| V (Val) | 0 | −3 | −9 | −3 | −3 | −8 | −4 | −12 | −3 | 12 | 5 | −2 | 6 | −2 | −4 | −4 | 0 | −12 | −4 | |
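The tabulated similarity degrees appear to follow a simple rule that can be checked against the grouping table: the number of properties for which two amino acids fall in the same group, multiplied by their BLOSUM62 score. This rule is inferred from the table values, not stated in the text reproduced here, so the Python sketch below is a consistency check and not the authors' implementation. The names `GROUPS`, `BLOSUM62`, `shared_properties`, and `similarity_degree` are hypothetical, and only the BLOSUM62 scores needed for five spot-checked pairs are included.

```python
# Groups per property, copied from the grouping table.
# Pro5: S is placed with F and W (their tabulated values are closest);
# treat this placement as an assumption.
GROUPS = {
    "Pro1": ["AGILMWV", "HF", "RQEKSY", "T", "NP", "C", "D"],
    "Pro2": ["AEI", "QHMFSY", "P", "RK", "W", "C", "T", "N", "GLV", "D"],
    "Pro3": ["CILF", "QH", "E", "A", "NK", "MWYV", "GS", "PT", "R", "D"],
    "Pro4": ["ANCQGHILMFPSTWYV", "RK", "DE"],
    "Pro5": ["ILT", "HKM", "DQ", "GP", "NE", "FSW", "V", "RY", "A", "C"],
    "Pro6": ["IL", "HMFW", "CT", "QGP", "YV", "RE", "S", "K", "AD", "N"],
    "Pro7": ["ILM", "APS", "W", "QEV", "NDCT", "FY", "G", "H", "RK"],
    "Pro8": ["ARNDCQEGILKMPSTV", "HFWY"],
    "Pro9": ["IL", "RCHT", "AKFS", "DGP", "EWY", "NQMV"],
    "Pro10": ["NK", "CILMFWYV", "AGPST", "DE", "RQH"],
}

# A handful of standard BLOSUM62 scores, keyed by unordered pair.
BLOSUM62 = {
    frozenset("IL"): 2,
    frozenset("AS"): 1,
    frozenset("RK"): 2,
    frozenset("GL"): -4,
    frozenset("AR"): -1,
}


def shared_properties(a, b):
    """Count the properties for which a and b belong to the same group."""
    return sum(
        any(a in group and b in group for group in groups)
        for groups in GROUPS.values()
    )


def similarity_degree(a, b):
    """Assumed rule: (number of shared groups) x (BLOSUM62 score)."""
    return shared_properties(a, b) * BLOSUM62[frozenset((a, b))]


for pair in ("IL", "AS", "RK", "GL", "AR"):
    print(pair, shared_properties(*pair), similarity_degree(*pair))
```

For these five pairs the sketch gives 18, 5, 10, −16, and −1, which match the corresponding table entries.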
Slope, intercept, and linear equation of each amino acid similarity degree sequence.
| Amino acid (X) | Slope (a) | Intercept (b) | Linear equation |
|---|---|---|---|
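| A (Ala) | 0.05 | −2.46 | y = 0.05x − 2.46 |
| R (Arg) | −0.05 | −0.4 | y = −0.05x − 0.4 |
| N (Asn) | −0.14 | −1.23 | y = −0.14x − 1.23 |
| D (Asp) | 0.01 | −1.35 | y = 0.01x − 1.35 |
| C (Cys) | 0.09 | −5.44 | y = 0.09x − 5.44 |
| Q (Gln) | −0.22 | 0.26 | y = −0.22x + 0.26 |
| E (Glu) | −0.19 | 1.0 | y = −0.19x + 1.0 |
| G (Gly) | −0.3 | −2.25 | y = −0.3x − 2.25 |
| H (His) | −0.08 | −1.28 | y = −0.08x − 1.28 |
| I (Ile) | 0.32 | −5.26 | y = 0.32x − 5.26 |
| L (Leu) | 0.23 | −4.16 | y = 0.23x − 4.16 |
| K (Lys) | −0.19 | 1.33 | y = −0.19x + 1.33 |
| M (Met) | 0.16 | −3.4 | y = 0.16x − 3.4 |
| F (Phe) | 0.32 | −4.61 | y = 0.32x − 4.61 |
| P (Pro) | 0.02 | −4.26 | y = 0.02x − 4.26 |
| S (Ser) | −0.36 | 1.74 | y = −0.36x + 1.74 |
| T (Thr) | 0.05 | −2.51 | y = 0.05x − 2.51 |
| W (Trp) | 0.06 | −3.65 | y = 0.06x − 3.65 |
| Y (Tyr) | 0.23 | −3.11 | y = 0.23x − 3.11 |
| V (Val) | 0.05 | −3.09 | y = 0.05x − 3.09 |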
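The slopes and intercepts in this table are consistent with an ordinary least-squares line fitted to each amino acid's 19 similarity degrees (the pair with itself is left out), taken in table order against positions x = 1, ..., 19. That reading is inferred from the numbers and is not described in the text reproduced here. The following Python sketch, with the hypothetical helper `fit_line`, reproduces the Ala and Arg rows under that assumption.

```python
def fit_line(values):
    """Least-squares fit y = a*x + b with x = 1, 2, ..., len(values)."""
    n = len(values)
    xs = range(1, n + 1)
    mean_x = sum(xs) / n
    mean_y = sum(values) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Similarity degrees of Ala and Arg with the other 19 amino acids,
# in the column order of the similarity-degree table.
ALA = [-1, -4, -4, 0, -2, -2, 0, -2, -4, -3, -2, -3, -4, -4, 5, 0, -6, -2, 0]
ARG = [-1, 0, -2, -6, 3, 0, -2, 0, -3, -2, 10, -1, 0, -2, -2, -2, 0, -4, -3]

for name, seq in (("Ala", ALA), ("Arg", ARG)):
    a, b = fit_line(seq)
    print(name, round(a, 2), round(b, 2))
```

Rounded to two decimals, the fit gives a = 0.05, b = −2.46 for Ala and a = −0.05, b = −0.4 for Arg, in agreement with the table.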
Figure 1. Graphical representation of protein I and protein II by our method.
The similarity matrix for the 9 ND5 protein sequences.
| Species | Human | Common chimpanzee | Pygmy chimpanzee | Gorilla | Fin whale | Blue whale | Rat | Mouse | Opossum |
|---|---|---|---|---|---|---|---|---|---|
| Human | 0 | 0.03204 | 0.04244 | 0.04723 | 0.07263 | 0.07744 | 0.18081 | 0.21400 | 0.24337 |
| Common chimpanzee | | 0 | 0.02842 | 0.04783 | 0.08253 | 0.08917 | 0.17555 | 0.21422 | 0.24196 |
| Pygmy chimpanzee | | | 0 | 0.05134 | 0.07250 | 0.07938 | 0.16509 | 0.20525 | 0.22669 |
| Gorilla | | | | 0 | 0.06761 | 0.07679 | 0.17006 | 0.20336 | 0.23017 |
| Fin whale | | | | | 0 | 0.02457 | 0.16685 | 0.18642 | 0.20773 |
| Blue whale | | | | | | 0 | 0.16515 | 0.18039 | 0.21064 |
| Rat | | | | | | | 0 | 0.07539 | 0.11658 |
| Mouse | | | | | | | | 0 | 0.12586 |
| Opossum | | | | | | | | | 0 |
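One way to read this matrix is to expand the upper triangle into a full symmetric matrix and look up, for each species, the species at the smallest distance. The Python sketch below does this. The names `UPPER`, `full_matrix`, and `nearest_neighbours` are illustrative only, and a nearest-neighbour lookup is a quick sanity check, not the tree-building procedure used for the figures.

```python
SPECIES = ["Human", "Common chimpanzee", "Pygmy chimpanzee", "Gorilla",
           "Fin whale", "Blue whale", "Rat", "Mouse", "Opossum"]

# Upper triangle of the table, row by row, each row starting at the diagonal.
UPPER = [
    [0, 0.03204, 0.04244, 0.04723, 0.07263, 0.07744, 0.18081, 0.21400, 0.24337],
    [0, 0.02842, 0.04783, 0.08253, 0.08917, 0.17555, 0.21422, 0.24196],
    [0, 0.05134, 0.07250, 0.07938, 0.16509, 0.20525, 0.22669],
    [0, 0.06761, 0.07679, 0.17006, 0.20336, 0.23017],
    [0, 0.02457, 0.16685, 0.18642, 0.20773],
    [0, 0.16515, 0.18039, 0.21064],
    [0, 0.07539, 0.11658],
    [0, 0.12586],
    [0],
]


def full_matrix(upper):
    """Mirror the upper triangle into a full symmetric distance matrix."""
    n = len(upper)
    d = [[0.0] * n for _ in range(n)]
    for i, row in enumerate(upper):
        for offset, value in enumerate(row):
            j = i + offset
            d[i][j] = d[j][i] = value
    return d


def nearest_neighbours(names, d):
    """Map each species to the other species at the smallest distance."""
    result = {}
    for i, name in enumerate(names):
        others = (k for k in range(len(names)) if k != i)
        result[name] = names[min(others, key=lambda k: d[i][k])]
    return result


NEAREST = nearest_neighbours(SPECIES, full_matrix(UPPER))
for name, other in NEAREST.items():
    print(f"{name:18s} -> {other}")
```

On these values the two chimpanzees pair with each other, as do the two whales and the two rodents, while the opossum lies closest to the rat.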
Information on the 29 spike protein sequences.
| Number | Abbreviation | Accession number |
|---|---|---|
| 1 | TGEVG | CAB91145 |
| 2 | TGEV | NP_058424 |
| 3 | PEDVC | AAK38656 |
| 4 | PEDV | NP_598310 |
| 5 | HCoVOC43 | NP_937950 |
| 6 | BCoVE | AAK83356 |
| 7 | BCoVL | AAL57308 |
| 8 | BCoVM | AAA66399 |
| 9 | BCoVQ | AAL40400 |
| 10 | MHVA | AAB86819 |
| 11 | MHVJHM | YP_209233 |
| 12 | MHVP | AAF69334 |
| 13 | MHVM | AAF69344 |
| 14 | IBVBJ | AAP92675 |
| 15 | IBVC | AAS00080 |
| 16 | IBV | NP_040831 |
| 17 | GD03T0013 | AAS10463 |
| 18 | PC4127 | AAU93318 |
| 19 | PC4137 | AAV49720 |
| 20 | PC4205 | AAU93319 |
| 21 | civet007 | AAU04646 |
| 22 | civet010 | AAU04649 |
| 23 | A022 | AAV91631 |
| 24 | GD01 | AAP51227 |
| 25 | GZ02 | AAS00003 |
| 26 | BJ01 | AAP30030 |
| 27 | FRA | AAP50485 |
| 28 | TOR2 | AAP41037 |
| 29 | TaiwanTC1 | AAQ01597 |
Figure 2. The phylogenetic tree of the 29 coronavirus spike proteins using our method.
Figure 3. The phylogenetic tree of the 560 influenza A (H1N1) isolates from March to April 2009 using our method.
Figure 4. The phylogenetic tree of the 560 influenza A (H1N1) isolates from March to April 2009 using the ClustalW method in MEGA6.0.