Wenbing Hou1, Qiuhui Pan2, Qianying Peng3, Mingfeng He4.
Abstract
Sequence similarity analysis is one of the major topics in bioinformatics. It helps researchers reveal the evolutionary relationships among different species. In this paper, we outline a new method for analyzing the similarity of proteins using the Discrete Fourier Transform (DFT) and Dynamic Time Warping (DTW). The original symbol sequences are converted to numerical sequences according to their physico-chemical properties. We obtain the power spectra of the sequences by DFT and extend the spectra to the same length, then calculate the distance between different sequences by DTW. Our method is tested on different datasets and the results are compared with those of other software. In the comparison, we find that our scheme can correct some misclassifications that appear in other software. The comparison shows that our approach is reasonable and effective.
Keywords: Discrete Fourier Transform; Dynamic Time Warping; Phylogenetic tree; Protein sequences similarity analysis
Year: 2016 PMID: 27974244 PMCID: PMC7125777 DOI: 10.1016/j.ygeno.2016.12.002
Source DB: PubMed Journal: Genomics ISSN: 0888-7543 Impact factor: 5.736
Hydropathy and isoelectric point values of the 20 amino acids.
| Amino acid | Abbreviation | Hydropathy | Isoelectric point |
|---|---|---|---|
| Isoleucine | I | 4.5 | 6.02 |
| Valine | V | 4.2 | 5.96 |
| Leucine | L | 3.8 | 5.98 |
| Phenylalanine | F | 2.8 | 5.48 |
| Cysteine | C | 2.5 | 5.07 |
| Methionine | M | 1.9 | 5.74 |
| Alanine | A | 1.8 | 6.00 |
| Glycine | G | −0.4 | 5.97 |
| Threonine | T | −0.7 | 6.16 |
| Serine | S | −0.8 | 5.68 |
| Tryptophan | W | −0.9 | 5.89 |
| Tyrosine | Y | −1.3 | 5.66 |
| Proline | P | −1.6 | 6.30 |
| Histidine | H | −3.2 | 7.59 |
| Aspartic acid | D | −3.5 | 2.77 |
| Asparagine | N | −3.5 | 5.41 |
| Glutamic acid | E | −3.5 | 3.22 |
| Glutamine | Q | −3.5 | 5.65 |
| Lysine | K | −3.9 | 9.74 |
| Arginine | R | −4.5 | 10.76 |
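The conversion from symbols to numbers and the DFT step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it uses only the hydropathy column of the table above, the function names `encode` and `power_spectrum` and the toy peptide `"MKVLA"` are hypothetical, and the step that extends spectra to a common length is omitted because the text here does not specify it.

```python
import cmath

# Hydropathy values copied from the table above.
HYDROPATHY = {
    "I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5, "M": 1.9, "A": 1.8,
    "G": -0.4, "T": -0.7, "S": -0.8, "W": -0.9, "Y": -1.3, "P": -1.6,
    "H": -3.2, "D": -3.5, "N": -3.5, "E": -3.5, "Q": -3.5, "K": -3.9,
    "R": -4.5,
}


def encode(protein, scale=HYDROPATHY):
    """Map each residue letter to its numeric property value."""
    return [scale[residue] for residue in protein]


def power_spectrum(signal):
    """Naive O(n^2) DFT; returns |X(k)|^2 for k = 0 .. n-1."""
    n = len(signal)
    spectrum = []
    for k in range(n):
        coeff = sum(
            x * cmath.exp(-2j * cmath.pi * k * t / n)
            for t, x in enumerate(signal)
        )
        spectrum.append(abs(coeff) ** 2)
    return spectrum


# Hypothetical toy peptide, chosen only to exercise the functions.
signal = encode("MKVLA")
spectrum = power_spectrum(signal)
```

Passing a dictionary built from the isoelectric point column as `scale` would give the second numerical representation. For real sequences, an FFT routine would replace the quadratic loop.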
Fig. 1. Original sequences and sequences after warping.
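The warping step can be illustrated with the textbook dynamic-programming form of DTW. The sketch below assumes an absolute-difference local cost and no window constraint. The paper's exact cost function and normalization are not given in this text, so `dtw_distance` is a hypothetical name for a generic version rather than the authors' code.

```python
def dtw_distance(a, b):
    """Classic DTW: minimal accumulated |a_i - b_j| over all warping paths."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] holds the best accumulated cost aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # repeat b[j-1] (a advances)
                cost[i][j - 1],      # repeat a[i-1] (b advances)
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


# A locally stretched copy aligns at zero cost; a constant offset does not.
stretched = dtw_distance([1, 2, 3], [1, 2, 2, 3])
shifted = dtw_distance([0, 0], [1, 1])
```

Because the recurrence allows one sequence to repeat an element while the other advances, DTW by itself accepts inputs of different lengths.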
Information on the sequences used in our test.
| Sequence name | NCBI accession number |
|---|---|
| Blue whale | |
| Bornean orangutan | |
| Cat | |
| Common chimpanzee | |
| Fin whale | |
| Gibbon | |
| Gorilla | |
| Gray seal | |
| Harbor seal | |
| Human | |
| Horse | |
| Mouse | |
| Opossum | |
| Pygmy chimpanzee | |
| Platypus | |
| Rat | |
| Rhino | |
| Sumatran orangutan | |
| Wallaroo | |
| Tiger | |
| Korean bovine | |
| Spain bovine |
Similarity/dissimilarity matrix of the 22 animal species.
| | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0 | |||||||||||||||||||||
| 2 | 0.773 | 0 | ||||||||||||||||||||
| 3 | 0.6073 | 0.7402 | 0 | |||||||||||||||||||
| 4 | 0.7356 | 0.5935 | 0.7373 | 0 | ||||||||||||||||||
| 5 | 0.2693 | 0.7698 | 0.573 | 0.7089 | 0 | |||||||||||||||||
| 6 | 0.8056 | 0.6492 | 0.7325 | 0.6371 | 0.796 | 0 | ||||||||||||||||
| 7 | 0.8362 | 0.7991 | 0.7825 | 0.6393 | 0.8809 | 0.6817 | 0 | |||||||||||||||
| 8 | 0.6922 | 0.787 | 0.6168 | 0.8027 | 0.7342 | 0.7899 | 0.9324 | 0 | ||||||||||||||
| 9 | 0.7234 | 0.7823 | 0.6484 | 0.7766 | 0.7437 | 0.7727 | 0.8938 | 0.2107 | 0 | |||||||||||||
| 10 | 0.7832 | 0.6711 | 0.722 | 0.4971 | 0.7885 | 0.5414 | 0.6569 | 0.7941 | 0.8339 | 0 | ||||||||||||
| 11 | 0.7148 | 0.7234 | 0.6448 | 0.7555 | 0.7189 | 0.7683 | 0.8991 | 0.683 | 0.6984 | 0.7577 | 0 | |||||||||||
| 12 | 0.7184 | 0.8341 | 0.7096 | 0.8027 | 0.7216 | 0.8231 | 0.9665 | 0.7239 | 0.7332 | 0.8207 | 0.7348 | 0 | ||||||||||
| 13 | 0.7284 | 0.8274 | 0.7539 | 0.8621 | 0.7458 | 0.8686 | 1 | 0.7839 | 0.7394 | 0.8784 | 0.7287 | 0.7241 | 0 | |||||||||
| 14 | 0.7525 | 0.6402 | 0.7168 | 0.4056 | 0.7466 | 0.6185 | 0.6591 | 0.8118 | 0.7849 | 0.5381 | 0.7685 | 0.8589 | 0.9143 | 0 | ||||||||
| 15 | 0.7398 | 0.7428 | 0.7173 | 0.7593 | 0.7239 | 0.7997 | 0.8807 | 0.7663 | 0.7657 | 0.7718 | 0.7532 | 0.7776 | 0.8091 | 0.7927 | 0 | |||||||
| 16 | 0.7544 | 0.9225 | 0.7435 | 0.8505 | 0.7864 | 0.7991 | 0.8177 | 0.8257 | 0.8221 | 0.7799 | 0.7279 | 0.8625 | 0.8292 | 0.8865 | 0.8853 | 0 | ||||||
| 17 | 0.7875 | 0.7944 | 0.8335 | 0.8411 | 0.8157 | 0.8174 | 0.8964 | 0.8312 | 0.8217 | 0.852 | 0.7951 | 0.8116 | 0.792 | 0.8712 | 0.8102 | 0.8054 | 0 | |||||
| 18 | 0.6592 | 0.5813 | 0.6342 | 0.658 | 0.6543 | 0.7318 | 0.8514 | 0.7249 | 0.7178 | 0.7364 | 0.6682 | 0.7195 | 0.7576 | 0.7077 | 0.6915 | 0.7887 | 0.7581 | 0 | ||||
| 19 | 0.6858 | 0.7034 | 0.7492 | 0.7788 | 0.6866 | 0.7099 | 0.8833 | 0.7057 | 0.7166 | 0.7887 | 0.6786 | 0.7561 | 0.7009 | 0.7752 | 0.6889 | 0.8266 | 0.8192 | 0.6458 | 0 | |||
| 20 | 0.6882 | 0.8015 | 0.5009 | 0.803 | 0.6389 | 0.809 | 0.8869 | 0.6557 | 0.6988 | 0.7584 | 0.7406 | 0.7652 | 0.7902 | 0.8204 | 0.8072 | 0.7709 | 0.8307 | 0.699 | 0.7963 | 0 | ||
| 21 | 0.6547 | 0.708 | 0.648 | 0.7473 | 0.6951 | 0.6789 | 0.793 | 0.6331 | 0.6644 | 0.7088 | 0.6631 | 0.7599 | 0.7063 | 0.7526 | 0.7416 | 0.7576 | 0.78 | 0.6697 | 0.6959 | 0.6222 | 0 | |
| 22 | 0.6455 | 0.7223 | 0.5916 | 0.747 | 0.6751 | 0.6874 | 0.7846 | 0.6316 | 0.6914 | 0.6847 | 0.652 | 0.7448 | 0.7061 | 0.7319 | 0.7017 | 0.726 | 0.768 | 0.6829 | 0.7218 | 0.5976 | 0.05929 | 0 |
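To show how the lower-triangular matrix is read, the sketch below copies six rows and columns from it and looks up each taxon's nearest neighbour. It assumes that the indices 1 to 22 follow the order of the species table above (1 = Blue whale, 5 = Fin whale, 8 = Gray seal, 9 = Harbor seal, 21 = Korean bovine, 22 = Spain bovine). The text does not state this mapping explicitly, so it should be treated as an assumption. The helper names are hypothetical.

```python
# Excerpt of the matrix above, keyed by (row, column) with row > column.
LOWER = {
    (5, 1): 0.2693,
    (8, 1): 0.6922, (8, 5): 0.7342,
    (9, 1): 0.7234, (9, 5): 0.7437, (9, 8): 0.2107,
    (21, 1): 0.6547, (21, 5): 0.6951, (21, 8): 0.6331, (21, 9): 0.6644,
    (22, 1): 0.6455, (22, 5): 0.6751, (22, 8): 0.6316, (22, 9): 0.6914,
    (22, 21): 0.05929,
}

# Assumed index-to-species mapping (order of the species table).
NAMES = {
    1: "Blue whale", 5: "Fin whale", 8: "Gray seal",
    9: "Harbor seal", 21: "Korean bovine", 22: "Spain bovine",
}


def distance(i, j):
    """Symmetric lookup into the lower-triangular excerpt."""
    if i == j:
        return 0.0
    return LOWER[(max(i, j), min(i, j))]


def nearest(i):
    """Index of the taxon in the excerpt closest to taxon i."""
    return min((j for j in NAMES if j != i), key=lambda j: distance(i, j))


nearest_neighbour = {NAMES[i]: NAMES[nearest(i)] for i in NAMES}
```

Under the assumed mapping, the two whales, the two seals, and the two bovines pair with each other within this excerpt. A tree-building routine would start from the same kind of pairwise lookup.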
Fig. 2. The phylogenetic tree of 22 species based on our algorithm.
Fig. 3. The phylogenetic tree of 22 species based on Yau's protein map.
Information on the protein sequences used in this paper.
| Sequence name | NCBI accession number |
|---|---|
| A/Adachi/2/1957(H2N2) | |
| A/bar-headed_goose/Qinghai/1/2005(H5N1) | |
| A/Beijing/4/2009(H1N1) | |
| A/Berkeley/1/1968(H2N2) | |
| A/blue-winged_teal/Ohio/566/2006(H7N9) | |
| A/California/1/1966(H2N2) | |
| A/California/04/2009(H1N1) | |
| A/cat/Germany/R606/06(H5N1) | |
| A/chicken/Dongguan/1096/2014(H7N9) | |
| A/Cygnus_olor/Italy/742/2006(H5N1) | |
| A/chicken/Quzhou/2/2015(H7N9) | |
| A/Duck/Ohio/118C/93(H1N1) | |
| A/blue_winged_teal/Louisiana/A00557206/2009(H7N7) | |
| A/canine/Guangxi/1/2011(H9N2) | |
| A/chicken/China/AH-10-01/2010(H9N2) | |
| A/chicken/Hubei/01-MA01/1999(H9N2) | |
| A/chicken/Iran/B263/2004(H9N2) | |
| A/England/1/1961(H2N2) | |
| A/equine/Prague/1/1956(H7N7) | |
| A/equine/Santiago/77(H7N7) | |
| A/fowl/Weybridge(H7N7) | |
| A/Georgia/1/1967(H2N2) | |
| A/goose/Czech_Republic/1848-K9/2009(H7N9) | |
| A/GuangzhouSB/01/2009(H1N1) | |
| A/Nagasaki/07N020/2008(H1N1) | |
| A/lesser_white-fronted_goose/HuNan/412-3Y/2010(H7N7) | |
| A/muscovy_duck/Vietnam/LBM66/2011(H5N1) | |
| A/tree_sparrow/Shanghai/01/2013(H7N9) |
Fig. 4. The phylogenetic tree of 28 influenza A viruses calculated by MEGA 6.06.
Fig. 5. The phylogenetic tree of 28 influenza A viruses calculated by our method.
Fig. 6. The phylogenetic tree of 28 influenza A viruses calculated by Clustal X.
Fig. 7. The phylogenetic tree of 50 coronavirus spike proteins.