Wei Deng, Yihui Luan.
Abstract
Based on the detailed hydrophobic-hydrophilic (HP) model of amino acids, we propose a dual-vector curve (DV-curve) representation of protein sequences, which uses two vectors to represent each letter of a protein sequence. This graphical representation not only avoids degeneracy but also remains easy to visualize regardless of sequence length, and it reflects the length of the protein sequence. We then transform the 2D graphical representation into a numerical characterization that facilitates quantitative comparison of protein sequences. The utility of this approach is illustrated by two examples: a similarity/dissimilarity comparison among different ND6 protein sequences based on their DV-curve figures, and a phylogenetic analysis of coronaviruses based on their spike proteins.
Year: 2014 PMID: 24899916 PMCID: PMC4034481 DOI: 10.1155/2014/203871
Source DB: PubMed Journal: Comput Math Methods Med ISSN: 1748-670X Impact factor: 2.238
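The idea in the abstract, two vectors per letter over a four-letter reduced alphabet, can be illustrated with a minimal Python sketch. The grouping of amino acids into B1 to B4 and the vector pairs below are assumptions chosen for illustration only; this record gives neither the paper's actual grouping nor its vectors. What the sketch does show is why such a construction avoids degeneracy and reflects sequence length: every step advances x by 1, so the curve never crosses itself and ends at x = 2n for a sequence of n residues.

```python
# Hypothetical reduction of the 20 amino acids to four letters B1..B4.
# This grouping is an assumption for illustration, not the paper's table.
GROUPS = {
    "B1": "AILMFPWV",  # assumed: strongly hydrophobic
    "B2": "DEKNQR",    # assumed: strongly hydrophilic
    "B3": "STYHG",     # assumed: weakly hydrophobic/hydrophilic
    "B4": "C",         # assumed: special
}

# Hypothetical vector pairs: each letter contributes two steps (dx, dy).
# dx is always 1, so x grows by 2 per residue and the curve cannot loop.
VECTORS = {
    "B1": ((1, 1), (1, 1)),
    "B2": ((1, 1), (1, -1)),
    "B3": ((1, -1), (1, 1)),
    "B4": ((1, -1), (1, -1)),
}

# Reverse lookup: amino acid letter -> group name.
RESIDUE_TO_GROUP = {aa: g for g, residues in GROUPS.items() for aa in residues}


def dv_curve(sequence):
    """Return the list of (x, y) points of the sketch curve, starting at (0, 0)."""
    x, y = 0, 0
    points = [(x, y)]
    for aa in sequence.upper():
        if aa not in RESIDUE_TO_GROUP:
            raise ValueError(f"unsupported residue: {aa!r}")
        for dx, dy in VECTORS[RESIDUE_TO_GROUP[aa]]:
            x, y = x + dx, y + dy
            points.append((x, y))
    return points


# Example with the sequence used in Figure 2.
curve = dv_curve("WTFESR")
print(len(curve), curve[-1])  # 13 (12, 4)
```

Because each pair of steps decodes to exactly one letter, two different reduced sequences always give different point lists under these assumed vectors; residues in the same group are, by design, not distinguished.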
Figure 1. The representation of the four letters of the DV-curve alphabet: (a) B1, (b) B2, (c) B3, and (d) B4.
Figure 2. The DV-curve of the sequence “WTFESR”.
Figure 3. The DV-curve graphical representations of different ND6 proteins.
Information on the 35 coronavirus spike proteins.
| Number | Accession number | Abbreviation notation | Length (aa) | Group |
|---|---|---|---|---|
| 1 | P10033 | FCoV1 | 1452 | I |
| 2 | Q66928 | FCoV2 | 1454 | I |
| 3 | Q91AV1 | PEDV3 | 1383 | I |
| 4 | Q9DY22 | TGEV4 | 1449 | I |
| 5 | P18450 | TGEV5 | 1449 | I |
| 6 | P36300 | CCoV6 | 1451 | I |
| 7 | Q9J3E7 | MHV7 | 1324 | II |
| 8 | Q83331 | MHV8 | 1361 | II |
| 9 | P11224 | MHV9 | 1324 | II |
| 10 | O55253 | MHV10 | 1360 | II |
| 11 | Q9IKD1 | RtCoV11 | 1360 | II |
| 12 | P25190 | BCoV12 | 1363 | II |
| 13 | P15777 | BCoV13 | 1363 | II |
| 14 | Q9QAR5 | BCoV14 | 1363 | II |
| 15 | P36334 | BCoV15 | 1363 | II |
| 16 | P36334 | HCoV16 | 1353 | II |
| 17 | Q82666 | IBV17 | 1166 | III |
| 18 | P05135 | IBV18 | 1163 | III |
| 19 | P12722 | IBV19 | 1154 | III |
| 20 | Q64930 | IBV20 | 1168 | III |
| 21 | Q82624 | IBV21 | 1159 | III |
| 22 | P11223 | IBV22 | 1162 | III |
| 23 | Q98Y27 | IBV23 | 1162 | III |
| 24 | AAP41037 | SCoV24 | 1255 | IV |
| 25 | AAP300030 | SCoV25 | 1255 | IV |
| 26 | AAR91586 | SCoV26 | 1255 | IV |
| 27 | AAP51227 | SCoV27 | 1255 | IV |
| 28 | AAP33697 | SCoV28 | 1255 | IV |
| 29 | AAP13441 | SCoV29 | 1255 | IV |
| 30 | AAQ01597 | SCoV30 | 1255 | IV |
| 31 | AAU81608 | SCoV31 | 1255 | IV |
| 32 | AAS00003 | SCoV32 | 1255 | IV |
| 33 | AAR86788 | SCoV33 | 1255 | IV |
| 34 | AAR23250 | SCoV34 | 1255 | IV |
| 35 | AAT76147 | SCoV35 | 1255 | IV |
Figure 4. The phylogenetic tree based on the spike proteins.