
Novel Method of 3-Dimensional Graphical Representation for Proteins and Its Application.

Zhao-Hui Qi1, Ke-Cheng Li1, Jin-Long Ma1, Yu-Hua Yao2, Ling-Yun Liu1.   

Abstract

In this article, we propose a 3-dimensional graphical representation of protein sequences based on 10 physicochemical properties of the 20 amino acids and the BLOSUM62 matrix. It contains evolutionary information and provides intuitive visualization. To further analyze the similarity of proteins, we extract a specific vector from the graphical representation curve. The vector is used to calculate the similarity distance between 2 protein sequences. To demonstrate the effectiveness of our approach, we apply it to 3 real data sets. The results are consistent with known evolutionary relationships and show that our method is effective in phylogenetic analysis.

Keywords:  BLOSUM62 matrix; graphical representation; physicochemical properties; protein sequences

Year:  2018        PMID: 29977111      PMCID: PMC6024350          DOI: 10.1177/1176934318777755

Source DB:  PubMed          Journal:  Evol Bioinform Online        ISSN: 1176-9343            Impact factor:   1.625


Introduction

With the number of available biological sequences growing rapidly, how to mine essential information from a huge amount of sequence data effectively and reliably has become a critical problem. As a result, researchers have proposed many methods for information extraction. Among them, the graphical representation of DNA sequences is an effective method for visualization and similarity analysis. Graphical representation is an alignment-free method. It provides intuitive information by visualizing biological sequences. Moreover, it is generally applicable because its mathematical description of the data facilitates numerical analysis without difficult calculations. Numerous works based on graphical representation have therefore been presented.[1-8] For example, Randić et al[1] proposed a graphical representation of RNA secondary structure based on twelve symbols. Bielińska-Waż et al[5] proposed a 2D-dynamic representation of DNA sequences in 2007 and later proposed further dynamic representations of DNA sequences for generalization.[6,7] However, the graphical representation of protein sequences is much more difficult because there are 20 amino acids instead of 4 nucleotides, and approaches have been proposed only recently.[9-16] Many of them are based primarily on the physicochemical properties of amino acids. Randić[9] proposed an early 2-dimensional graphical representation of proteins based on a pair of physicochemical properties in 2007. After that, Yu et al[11] proposed a mapping method for protein sequences based on 10 physicochemical properties. Wang et al[10] presented a graphical representation of protein sequences based on 9 physicochemical properties.
In the works by He et al[15] and Hu,[16] the physicochemical properties are also indispensable for extracting information from proteins because they affect the rate and pattern of protein evolution. Physicochemical properties are thus widely used in graphical representations of protein sequences, and the reported results are good. In this article, we propose a 3-dimensional (3D) graphical representation of protein sequences based on 10 physicochemical properties[17-21] of amino acids and the BLOSUM62 matrix.[22] The representation provides good visualization without degeneracy or circuits. We then extract a specific vector from the graphical curve of a protein sequence. In addition, we present 2 applications based on the vector to analyze the similarity and evolutionary relationships in 3 data sets. The results are consistent with known evolutionary relationships and with works by other researchers. This shows that our approach can be applied to hundreds of sequences of different lengths and performs well.

Methods

As we know, a protein sequence is usually composed of 20 kinds of amino acids. Every amino acid has its own particular physicochemical properties. Therefore, to mine essential information from a protein sequence, we propose an effective graphical method combining physicochemical properties of amino acids and the BLOSUM62 matrix.

BLOSUM62 matrix

The BLOSUM62 matrix of Henikoff and Henikoff[22] is a substitution matrix used in the alignment of protein sequences. Each entry is a log-odds score that reflects how often one amino acid is observed to substitute for another in aligned blocks of related proteins, relative to what would be expected by chance. In this scoring scheme, a positive score represents a higher similarity between 2 amino acids and a negative score represents a lower similarity.

Physicochemical properties of amino acids

Here, we consider 10 primary physicochemical properties of amino acids: the pK1 (–COOH),[17] the pK2 (–NH3),[21] the polar requirement,[21] the isoelectric point,[18] the hydrogenation,[20] the hydroxythiolation,[20] the molecular volume,[19] the aromaticity,[20] the aliphaticity,[20] and the polarity values.[19] The values of these 10 properties for the 20 amino acids are shown in Table 1.
Table 1. Numerical values of 10 physicochemical properties of the 20 amino acids.

Amino acid | Pro1 | Pro2 | Pro3 | Pro4 | Pro5 | Pro6 | Pro7 | Pro8 | Pro9 | Pro10
A (Ala) | 2.34 | 9.69 | 7.0 | 6.00 | 0.33 | −0.062 | 31 | −0.11 | 0.239 | 8.1
R (Arg) | 2.17 | 9.04 | 9.1 | 10.76 | −0.176 | −0.167 | 124 | 0.079 | 0.211 | 10.5
N (Asn) | 2.02 | 8.80 | 10.0 | 5.41 | −0.233 | 0.166 | 56 | −0.136 | 0.249 | 11.6
D (Asp) | 2.09 | 9.82 | 13.0 | 2.77 | −0.371 | −0.079 | 54 | −0.285 | 0.171 | 13.0
C (Cys) | 1.71 | 10.78 | 4.8 | 5.07 | 0.074 | 0.38 | 55 | −0.184 | 0.22 | 5.5
Q (Gln) | 2.17 | 9.13 | 8.6 | 5.65 | −0.409 | −0.025 | 85 | −0.246 | 0.26 | 10.5
E (Glu) | 2.19 | 9.67 | 12.5 | 3.22 | −0.254 | −0.184 | 83 | −0.067 | 0.187 | 12.3
G (Gly) | 2.34 | 9.60 | 7.9 | 5.97 | 0.37 | −0.017 | 3 | −0.073 | 0.16 | 9.0
H (His) | 1.82 | 9.17 | 8.4 | 7.59 | −0.078 | 0.056 | 96 | 0.32 | 0.205 | 10.4
I (Ile) | 2.36 | 9.68 | 4.9 | 6.02 | 0.149 | −0.309 | 111 | 0.001 | 0.273 | 5.2
L (Leu) | 2.36 | 9.60 | 4.9 | 5.98 | 0.129 | −0.264 | 111 | −0.008 | 0.281 | 4.9
K (Lys) | 2.18 | 8.95 | 10.1 | 9.74 | −0.075 | −0.371 | 119 | 0.049 | 0.228 | 11.3
M (Met) | 2.28 | 9.21 | 5.3 | 5.74 | −0.092 | 0.077 | 105 | −0.041 | 0.253 | 5.7
F (Phe) | 1.83 | 9.13 | 5.0 | 5.48 | 0.011 | 0.074 | 132 | 0.438 | 0.234 | 5.2
P (Pro) | 1.99 | 10.60 | 6.6 | 6.30 | 0.37 | −0.036 | 32.5 | −0.016 | 0.165 | 8.0
S (Ser) | 2.21 | 9.15 | 7.5 | 5.68 | 0.022 | 0.47 | 32 | −0.153 | 0.236 | 9.2
T (Thr) | 2.63 | 10.43 | 6.6 | 6.16 | 0.136 | 0.348 | 61 | −0.208 | 0.213 | 8.6
W (Trp) | 2.38 | 9.39 | 5.2 | 5.89 | 0.011 | 0.05 | 170 | 0.493 | 0.183 | 5.4
Y (Tyr) | 2.20 | 9.11 | 5.4 | 5.66 | −0.138 | 0.22 | 136 | 0.381 | 0.193 | 6.2
V (Val) | 2.32 | 9.62 | 5.6 | 5.96 | 0.245 | 0.212 | 84 | −0.155 | 0.255 | 5.9

Pro1, the pK1 (–COOH); Pro2, the pK2 (–NH3); Pro3, the polar requirement; Pro4, the isoelectric point; Pro5, the hydrogenation; Pro6, the hydroxythiolation; Pro7, the molecular volume; Pro8, the aromaticity; Pro9, the aliphaticity; and Pro10, the polarity values.

The 3D graphical representation of protein sequences based on BLOSUM62 matrix and physicochemical properties of amino acids

For each physicochemical property, we will use K-means clustering method[23] to classify the 20 amino acids into several groups. K-means clustering is an efficient unsupervised clustering method which is widely used in a diverse range of fields such as data mining, bioinformatics, and natural language processing.[24] However, there are some weaknesses in K-means. K-means needs to be given the number of clusters beforehand. Silhouette[25] is a cluster validity index that can be used to determine the number of clusters. It considers 2 factors: cohesion and separation. Its value ranges from −1 to 1 and a higher value represents a better effect of clustering. According to this index, we can obtain a valid number of clusters of the given data set. In this way, we can obtain 10 kinds of clustering classification based on the 10 different properties, which are shown in Table 2.
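The silhouette index described above can be sketched in a few lines. The example below is only an illustration: it scores one given partition of the pK1 values (the Pro1 grouping reported in Table 2) and does not rerun K-means, and the function name `silhouette_score_1d` is hypothetical. Singleton clusters are given a width of 0, which is the usual convention. In a full pipeline, one would compute this score for the K-means partition at each candidate number of clusters and keep the number with the highest score.

```python
from statistics import mean

# pK1 (-COOH) values from Table 1.
PK1 = {"A": 2.34, "R": 2.17, "N": 2.02, "D": 2.09, "C": 1.71, "Q": 2.17,
       "E": 2.19, "G": 2.34, "H": 1.82, "I": 2.36, "L": 2.36, "K": 2.18,
       "M": 2.28, "F": 1.83, "P": 1.99, "S": 2.21, "T": 2.63, "W": 2.38,
       "Y": 2.20, "V": 2.32}

# Pro1 grouping as read from Table 2 (an input here, not recomputed).
PRO1_GROUPS = ["AGILMWV", "HF", "RQEKSY", "T", "NP", "C", "D"]


def silhouette_score_1d(values, labels):
    """Mean silhouette width for 1-D data; singleton clusters score 0."""
    clusters = {}
    for v, lab in zip(values, labels):
        clusters.setdefault(lab, []).append(v)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    widths = []
    for v, lab in zip(values, labels):
        own = clusters[lab]
        if len(own) == 1:
            widths.append(0.0)
            continue
        # a: cohesion, mean distance to the other members of the same cluster
        a = sum(abs(v - u) for u in own) / (len(own) - 1)
        # b: separation, smallest mean distance to any other cluster
        b = min(mean(abs(v - u) for u in other)
                for key, other in clusters.items() if key != lab)
        widths.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return mean(widths)


values, labels = [], []
for group_index, group in enumerate(PRO1_GROUPS):
    for residue in group:
        values.append(PK1[residue])
        labels.append(group_index)

score = silhouette_score_1d(values, labels)
print(f"mean silhouette of the 7-group pK1 partition: {score:.3f}")
```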
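Table 2. Grouping information of 20 amino acids after clustering.

Properties | G1 | G2 | G3 | G4 | G5 | G6 | G7 | G8 | G9 | G10
Pro1 | AGILMWV | HF | RQEKSY | T | NP | C | D | | |
Pro2 | AEI | QHMFSY | P | RK | W | C | T | N | GLV | D
Pro3 | CILF | QH | E | A | NK | MWYV | GS | PT | R | D
Pro4 | ANCQGHILMFPSTWYV | RK | DE | | | | | | |
Pro5 | ILT | HKM | DQ | GP | NE | FSW | V | RY | A | C
Pro6 | IL | HMFW | CT | QGP | YV | RE | S | K | AD | N
Pro7 | ILM | APS | W | QEV | NDCT | FY | G | H | RK |
Pro8 | ARNDCQEGILKMPSTV | HFWY | | | | | | | |
Pro9 | IL | RCHT | AKFS | DGP | EWY | NQMV | | | |
Pro10 | NK | CILMFWYV | AGPST | DE | RQH | | | | |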
According to the property pK1 (–COOH), we can divide the 20 amino acids into 7 groups: G1 (A, G, I, L, M, W, V), G2 (H, F), G3 (R, Q, E, K, S, Y), G4 (T), G5 (N, P), G6 (C), and G7 (D). If 2 or more amino acids fall into the same group, they are similar to each other with respect to pK1 (–COOH). Taking all 10 properties into consideration, we can count the number of properties for which each pair of amino acids falls into the same group. If $x$ denotes an amino acid and $y$ denotes another amino acid, then we define the similar degree of the 2 amino acids as follows:

$$SD(x, y) = N(x, y) \times B(x, y) \qquad (1)$$

where $N(x, y)$ is the number of similar properties between amino acids $x$ and $y$, and $B(x, y)$ is the value for $x$ and $y$ in the BLOSUM62 matrix. For example, I and L fall into the same group for 9 of the 10 properties and have a BLOSUM62 score of 2, which gives the value 18 in Table 3. We then calculate $SD(x, y)$ for all pairs of amino acids; the result is shown in Table 3.
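As a rough check of how the similar degree is obtained, the sketch below counts the properties for which 2 amino acids share a group and multiplies the count by their BLOSUM62 score. Two things are assumptions rather than statements from the text: the group boundaries are our reading of Table 2, and the product form is inferred from the entries of Table 3. Only the few BLOSUM62 scores needed for the example are included, and the names `shared_properties` and `similar_degree` are hypothetical.

```python
# Groupings per property (Pro1..Pro10), as read from Table 2 (assumed boundaries).
GROUPS = [
    ["AGILMWV", "HF", "RQEKSY", "T", "NP", "C", "D"],
    ["AEI", "QHMFSY", "P", "RK", "W", "C", "T", "N", "GLV", "D"],
    ["CILF", "QH", "E", "A", "NK", "MWYV", "GS", "PT", "R", "D"],
    ["ANCQGHILMFPSTWYV", "RK", "DE"],
    ["ILT", "HKM", "DQ", "GP", "NE", "FSW", "V", "RY", "A", "C"],
    ["IL", "HMFW", "CT", "QGP", "YV", "RE", "S", "K", "AD", "N"],
    ["ILM", "APS", "W", "QEV", "NDCT", "FY", "G", "H", "RK"],
    ["ARNDCQEGILKMPSTV", "HFWY"],
    ["IL", "RCHT", "AKFS", "DGP", "EWY", "NQMV"],
    ["NK", "CILMFWYV", "AGPST", "DE", "RQH"],
]

# A small subset of BLOSUM62 scores, enough for the pairs used below.
BLOSUM62_SUBSET = {("I", "L"): 2, ("A", "S"): 1, ("K", "W"): -3, ("G", "L"): -4}


def shared_properties(x, y):
    """Number of properties for which x and y fall into the same group."""
    return sum(any(x in g and y in g for g in prop) for prop in GROUPS)


def similar_degree(x, y):
    """Assumed form: shared-property count times the BLOSUM62 score."""
    score = BLOSUM62_SUBSET.get((x, y), BLOSUM62_SUBSET.get((y, x)))
    if score is None:
        raise KeyError(f"BLOSUM62 score for {x}-{y} not in the subset")
    return shared_properties(x, y) * score


for pair in [("I", "L"), ("A", "S"), ("K", "W"), ("G", "L")]:
    print(pair, shared_properties(*pair), similar_degree(*pair))
```

The four values printed for these pairs agree with the corresponding entries of Table 3, which is what motivates the assumed product form.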
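Table 3. The similar degree of each pair of amino acids.

Amino acid | A | R | N | D | C | Q | E | G | H | I | L | K | M | F | P | S | T | W | Y | V
A (Ala) |  | −1 | −4 | −4 | 0 | −2 | −2 | 0 | −2 | −4 | −3 | −2 | −3 | −4 | −4 | 5 | 0 | −6 | −2 | 0
R (Arg) | −1 |  | 0 | −2 | −6 | 3 | 0 | −2 | 0 | −3 | −2 | 10 | −1 | 0 | −2 | −2 | −2 | 0 | −4 | −3
N (Asn) | −4 | 0 |  | 2 | −9 | 0 | 0 | 0 | 1 | −6 | −6 | 0 | −6 | −3 | −6 | 2 | 0 | −4 | −2 | −9
D (Asp) | −4 | −2 | 2 |  | −6 | 0 | 6 | −2 | 0 | −3 | −4 | −1 | −3 | 0 | −2 | 0 | −2 | 0 | 0 | −3
C (Cys) | 0 | −6 | −9 | −6 |  | −6 | −4 | −6 | −6 | −4 | −4 | −3 | −3 | −6 | −6 | −2 | −5 | −4 | −4 | −3
Q (Gln) | −2 | 3 | 0 | 0 | −6 |  | 6 | −6 | 0 | −6 | −4 | 2 | 0 | −6 | −3 | 0 | −2 | −2 | −3 | −8
E (Glu) | −2 | 0 | 0 | 6 | −4 | 6 |  | −2 | 0 | −6 | −3 | 2 | −2 | 0 | −1 | 0 | −1 | −3 | −4 | −4
G (Gly) | 0 | −2 | 0 | −2 | −6 | −6 | −2 |  | −2 | −12 | −16 | −2 | −9 | −3 | −12 | 0 | −6 | −4 | −3 | −12
H (His) | −2 | 0 | 1 | 0 | −6 | 0 | 0 | −2 |  | −3 | −3 | −1 | −8 | −5 | −2 | −2 | −4 | −6 | 6 | −3
I (Ile) | −4 | −3 | −6 | −3 | −4 | −6 | −6 | −12 | −3 |  | 18 | −3 | 5 | 0 | −6 | −4 | −3 | −9 | −2 | 12
L (Leu) | −3 | −2 | −6 | −4 | −4 | −4 | −3 | −16 | −3 | 18 |  | −2 | 10 | 0 | −6 | −4 | −3 | −6 | −2 | 5
K (Lys) | −2 | 10 | 0 | −1 | −3 | 2 | 2 | −2 | −1 | −3 | −2 |  | −2 | −3 | −1 | 0 | −1 | 0 | −2 | −2
M (Met) | −3 | −1 | −6 | −3 | −3 | 0 | −2 | −9 | −8 | 5 | 10 | −2 |  | 0 | −4 | −3 | −2 | −5 | −4 | 6
F (Phe) | −4 | 0 | −3 | 0 | −6 | −6 | 0 | −3 | −5 | 0 | 0 | −3 | 0 |  | −4 | −8 | −2 | 5 | 15 | −2
P (Pro) | −4 | −2 | −6 | −2 | −6 | −3 | −1 | −12 | −2 | −6 | −6 | −1 | −4 | −4 |  | −4 | −4 | −4 | −3 | −4
S (Ser) | 5 | −2 | 2 | 0 | −2 | 0 | 0 | 0 | −2 | −4 | −4 | 0 | −3 | −8 | −4 |  | 3 | −6 | −6 | −4
T (Thr) | 0 | −2 | 0 | −2 | −5 | −2 | −1 | −6 | −4 | −3 | −3 | −1 | −2 | −2 | −4 | 3 |  | −2 | −2 | 0
W (Trp) | −6 | 0 | −4 | 0 | −4 | −2 | −3 | −4 | −6 | −9 | −6 | 0 | −5 | 5 | −4 | −6 | −2 |  | 10 | −12
Y (Tyr) | −2 | −4 | −2 | 0 | −4 | −3 | −4 | −3 | 6 | −2 | −2 | −2 | −4 | 15 | −3 | −6 | −2 | 10 |  | −4
V (Val) | 0 | −3 | −9 | −3 | −3 | −8 | −4 | −12 | −3 | 12 | 5 | −2 | 6 | −2 | −4 | −4 | 0 | −12 | −4 |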
From Table 3, we can see that the similar degree differs numerically from one pair of amino acids to another. To describe the similar degree graphically, we use simple (univariate) linear regression to extract features for every amino acid. Here, we take I = {1, 2, 3, ..., 19} as the independent variable of the regression. For amino acid X, we take the 19 values in its row of Table 3, in column order, as the dependent variable. The regression gives a slope and an intercept for amino acid X, and this pair of numbers is used to describe X. The slopes and intercepts of all 20 amino acids are given in Table 4.
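The regression step can be reproduced with an ordinary least-squares fit. The sketch below is a minimal illustration with a hypothetical helper `fit_line`; it fits the 19 values of the A (Ala) and R (Arg) rows of Table 3 against x = 1, ..., 19 and rounds the result to 2 decimals for comparison with Table 4.

```python
# Rows of Table 3 (diagonal omitted), in column order.
ROW_A = [-1, -4, -4, 0, -2, -2, 0, -2, -4, -3, -2, -3, -4, -4, 5, 0, -6, -2, 0]
ROW_R = [-1, 0, -2, -6, 3, 0, -2, 0, -3, -2, 10, -1, 0, -2, -2, -2, 0, -4, -3]


def fit_line(ys):
    """Least-squares slope and intercept of ys against x = 1..len(ys)."""
    n = len(ys)
    xs = range(1, n + 1)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope_a, intercept_a = fit_line(ROW_A)
slope_r, intercept_r = fit_line(ROW_R)
print(round(slope_a, 2), round(intercept_a, 2))  # compare with the A (Ala) row of Table 4
print(round(slope_r, 2), round(intercept_r, 2))  # compare with the R (Arg) row of Table 4
```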
Table 4. Slope, intercept, and linear equation of each amino acid similarity degree sequence.

Amino acid (X) | Slope (a) | Intercept (b) | Linear equation
A (Ala) | 0.05 | −2.46 | y = 0.05x + (−2.46)
R (Arg) | −0.05 | −0.4 | y = −0.05x + (−0.4)
N (Asn) | −0.14 | −1.23 | y = −0.14x + (−1.23)
D (Asp) | 0.01 | −1.35 | y = 0.01x + (−1.35)
C (Cys) | 0.09 | −5.44 | y = 0.09x + (−5.44)
Q (Gln) | −0.22 | 0.26 | y = −0.22x + (0.26)
E (Glu) | −0.19 | 1.0 | y = −0.19x + (1.0)
G (Gly) | −0.3 | −2.25 | y = −0.3x + (−2.25)
H (His) | −0.08 | −1.28 | y = −0.08x + (−1.28)
I (Ile) | 0.32 | −5.26 | y = 0.32x + (−5.26)
L (Leu) | 0.23 | −4.16 | y = 0.23x + (−4.16)
K (Lys) | −0.19 | 1.33 | y = −0.19x + (1.33)
M (Met) | 0.16 | −3.4 | y = 0.16x + (−3.4)
F (Phe) | 0.32 | −4.61 | y = 0.32x + (−4.61)
P (Pro) | 0.02 | −4.26 | y = 0.02x + (−4.26)
S (Ser) | −0.36 | 1.74 | y = −0.36x + (1.74)
T (Thr) | 0.05 | −2.51 | y = 0.05x + (−2.51)
W (Trp) | 0.06 | −3.65 | y = 0.06x + (−3.65)
Y (Tyr) | 0.23 | −3.11 | y = 0.23x + (−3.11)
V (Val) | 0.05 | −3.09 | y = 0.05x + (−3.09)
Let $S = s_1 s_2 \ldots s_n$ be an arbitrary protein sequence composed of n amino acids, and let $(x_i, y_i, z_i)$ be the 3D coordinates assigned to residue $s_i$ in the protein sequence. The coordinates are constructed by equation (2) from the slope $a_{s_i}$ and intercept $b_{s_i}$ of $s_i$ given in Table 4, starting from a fixed initial condition. Connecting the n points in sequence order gives a graphical curve. To demonstrate the effectiveness of the 3D graphical method, we take 2 protein sequences as an example. Both sequences are taken from the yeast Saccharomyces cerevisiae.[26] The graphical representations of the 2 protein sequences are shown in Figure 1.
Figure 1. Graphical representation of protein I and protein II by our method.

Protein I: WTFESRNDPAKDPVILWLNGGPGCSSLTGL
Protein II: WFFESRNDPANDPIILWLNGGPGCSSFTGL

As can be seen from Figure 1, the 2 curves are similar to each other. Furthermore, there are some differences between the 2 curves, arising at residues 2, 11, 14, and 27, where the 2 sequences differ (T/F, K/N, V/I, and L/F). This indicates the effectiveness of the proposed 3D graphical representation method.
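The positions at which the 2 example sequences differ can be listed directly. The short sketch below uses a hypothetical helper, `differing_positions`, and reports 1-based positions together with the residues found in each sequence.

```python
PROTEIN_I = "WTFESRNDPAKDPVILWLNGGPGCSSLTGL"
PROTEIN_II = "WFFESRNDPANDPIILWLNGGPGCSSFTGL"


def differing_positions(seq_a, seq_b):
    """Return (position, residue_a, residue_b) for every mismatch, 1-based."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    return [(i, a, b)
            for i, (a, b) in enumerate(zip(seq_a, seq_b), start=1)
            if a != b]


differences = differing_positions(PROTEIN_I, PROTEIN_II)
for position, res_i, res_ii in differences:
    print(f"position {position}: {res_i} in protein I, {res_ii} in protein II")
```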

Numerical Characterization and Similarity Analysis of Proteins

Based on the constructed graphical curve, we can get a specific vector from a protein sequence. Using this vector we can analyze the similarity between 2 protein sequences effectively.

40-dimensional characteristic vector

A characteristic vector is a common way to calculate the pairwise distance between 2 protein sequences. A good characteristic vector should be insensitive to differences in sequence length and should not require complicated calculation. Here, we define a 2-tuple for each amino acid. Given a protein sequence $S = s_1 s_2 \ldots s_n$ composed of $n$ amino acids, we compute the 2-tuple $T_k$ of amino acid $k$ as follows:

$$T_k = \left( a_k \frac{n_k}{n},\; b_k \frac{n_k}{n} \right) \qquad (3)$$

where $a_k$ and $b_k$ are, respectively, the slope and intercept of amino acid $k$ in Table 4, and $n_k$ is the number of occurrences of amino acid $k$ in the sequence. From equation (3), we can see that $n_k / n$ is the proportion of amino acid $k$ in the whole sequence. Taking the proportion into consideration eliminates the effect of protein length. Thus, for each kind of amino acid, we get a 2-tuple. As there are 20 amino acids, we can construct a 40-dimensional characteristic vector $V = (T_1, T_2, \ldots, T_{20})$. The component order is the same as in Table 4. Taking a short segment of 10 amino acids, AARRARRNNN, as an example, the numbers of amino acids A, R, and N in the segment are 3, 4, and 3. Therefore, we obtain the 40-dimensional characteristic vector (0.015, −0.738, −0.02, −0.16, −0.042, −0.369, 0, 0, . . ., 0, 0) according to Table 4 and equation (3).
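The worked example above can be reproduced with the following sketch, which weights each slope and intercept of Table 4 by the proportion of the corresponding amino acid. The function name `characteristic_vector` is hypothetical, and the handling of non-standard residues (they count toward the length but contribute no component) is an assumption, since the text does not discuss them.

```python
from collections import Counter

# (slope, intercept) per amino acid, in the row order of Table 4.
TABLE4 = {
    "A": (0.05, -2.46), "R": (-0.05, -0.4), "N": (-0.14, -1.23),
    "D": (0.01, -1.35), "C": (0.09, -5.44), "Q": (-0.22, 0.26),
    "E": (-0.19, 1.0), "G": (-0.3, -2.25), "H": (-0.08, -1.28),
    "I": (0.32, -5.26), "L": (0.23, -4.16), "K": (-0.19, 1.33),
    "M": (0.16, -3.4), "F": (0.32, -4.61), "P": (0.02, -4.26),
    "S": (-0.36, 1.74), "T": (0.05, -2.51), "W": (0.06, -3.65),
    "Y": (0.23, -3.11), "V": (0.05, -3.09),
}


def characteristic_vector(sequence):
    """40-dimensional vector: (slope, intercept) scaled by residue proportion."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    counts = Counter(sequence)
    n = len(sequence)  # assumption: unknown residues still count toward n
    vector = []
    for residue, (slope, intercept) in TABLE4.items():
        proportion = counts[residue] / n
        vector.extend([slope * proportion, intercept * proportion])
    return vector


vec = characteristic_vector("AARRARRNNN")
print([round(v, 3) for v in vec[:6]])
```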

Similarity analysis

The similarity/dissimilarity between 2 protein sequences can be represented by a similarity distance. Several distance measures are available, such as the Euclidean distance and the city block (Manhattan) distance. Here, we use the Euclidean distance, computed on the 40-dimensional characteristic vectors. If the two 40-dimensional characteristic vectors are denoted as $V^{(1)} = (v^{(1)}_1, \ldots, v^{(1)}_{40})$ and $V^{(2)} = (v^{(2)}_1, \ldots, v^{(2)}_{40})$, their Euclidean distance is calculated as follows:

$$d\left(V^{(1)}, V^{(2)}\right) = \sqrt{\sum_{k=1}^{40} \left(v^{(1)}_k - v^{(2)}_k\right)^2} \qquad (4)$$

The smaller the distance is, the more similar the 2 protein sequences are.
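A minimal sketch of the distance computation follows. The helper names `euclidean_distance` and `distance_matrix` are hypothetical, and the short vectors in the usage lines are toy values rather than real 40-dimensional characteristic vectors.

```python
from math import sqrt


def euclidean_distance(u, v):
    """Euclidean distance between two equal-length numeric vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def distance_matrix(vectors):
    """Symmetric matrix of pairwise Euclidean distances."""
    return [[euclidean_distance(u, v) for v in vectors] for u in vectors]


# Toy vectors for illustration only.
toy_vectors = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
matrix = distance_matrix(toy_vectors)
for row in matrix:
    print([round(d, 2) for d in row])
```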

Applications and Discussion

Similarity analysis of 9 ND5 proteins and 29 spike proteins

To show the effectiveness of the proposed similarity analysis method, we apply it to 9 ND5 protein sequences (provided as Supplementary File 1): human, common chimpanzee, pygmy chimpanzee, gorilla, fin whale, blue whale, rat, mouse, and opossum (their accession numbers in NCBI [National Center for Biotechnology Information] are AP_000649, NP_008196, NP_008209, NP_008222, NP_006899, NP_007066, AP_004902, NP_904338, and NP_007105, respectively). According to the method given in section “Similarity analysis,” we obtain the similarity distance matrix of these protein sequences. The result is shown in Table 5.
Table 5. The similarity matrix for the 9 ND5 protein sequences.

Species | Human | Common chimpanzee | Pygmy chimpanzee | Gorilla | Fin whale | Blue whale | Rat | Mouse | Opossum
Human | 0 | 0.03204 | 0.04244 | 0.04723 | 0.07263 | 0.07744 | 0.18081 | 0.21400 | 0.24337
Common chimpanzee |  | 0 | 0.02842 | 0.04783 | 0.08253 | 0.08917 | 0.17555 | 0.21422 | 0.24196
Pygmy chimpanzee |  |  | 0 | 0.05134 | 0.07250 | 0.07938 | 0.16509 | 0.20525 | 0.22669
Gorilla |  |  |  | 0 | 0.06761 | 0.07679 | 0.17006 | 0.20336 | 0.23017
Fin whale |  |  |  |  | 0 | 0.02457 | 0.16685 | 0.18642 | 0.20773
Blue whale |  |  |  |  |  | 0 | 0.16515 | 0.18039 | 0.21064
Rat |  |  |  |  |  |  | 0 | 0.07539 | 0.11658
Mouse |  |  |  |  |  |  |  | 0 | 0.12586
Opossum |  |  |  |  |  |  |  |  | 0
On the basis of Table 5, we can see that the distance between fin whale and blue whale is the smallest, which indicates a high degree of similarity. The distances among human, common chimpanzee, pygmy chimpanzee, and gorilla are relatively small, which means that they are similar to each other. In addition, opossum is quite dissimilar to the other species because the similarity distances between opossum and the other species are large. All these results are consistent with evolutionary theory and recent studies.[14-16] That is, the proposed method can analyze the similarities of proteins effectively.

To further demonstrate the effectiveness of our method, we apply it to another data set that is widely used in other works.[10,27] This data set consists of 29 spike protein sequences of coronaviruses (provided as Supplementary File 2). The basic information of the protein sequences is shown in Table 6. We construct the phylogenetic tree of the 29 spike protein sequences from our distances using the UPGMA method (Figure 2). From Figure 2, we can see that the sequences are mainly classified into 4 groups by our method. This is consistent with previous works[10,27] and with the known biological fact that coronaviruses are classified into 4 groups: group I (contains PEDV, TGEV), group II (contains BCoV, MHV, RtCoV), group III (contains IBV, TCoV), and the SARS-CoVs (severe acute respiratory syndrome coronaviruses).
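UPGMA, the tree-building method named above, repeatedly merges the 2 clusters with the smallest average pairwise distance. The sketch below is a generic, unoptimized average-linkage implementation, not the software used for Figure 2, and it is run on the ND5 distances of Table 5 because the spike protein distances are not listed in the text. The function names are hypothetical.

```python
from itertools import combinations

SPECIES = ["Human", "Common chimpanzee", "Pygmy chimpanzee", "Gorilla",
           "Fin whale", "Blue whale", "Rat", "Mouse", "Opossum"]

# Upper triangle of Table 5 (row i holds distances to species i+1, i+2, ...).
UPPER = [
    [0.03204, 0.04244, 0.04723, 0.07263, 0.07744, 0.18081, 0.21400, 0.24337],
    [0.02842, 0.04783, 0.08253, 0.08917, 0.17555, 0.21422, 0.24196],
    [0.05134, 0.07250, 0.07938, 0.16509, 0.20525, 0.22669],
    [0.06761, 0.07679, 0.17006, 0.20336, 0.23017],
    [0.02457, 0.16685, 0.18642, 0.20773],
    [0.16515, 0.18039, 0.21064],
    [0.07539, 0.11658],
    [0.12586],
]


def build_distance_lookup(upper):
    """Map each unordered pair of leaf indices to its distance."""
    dist = {}
    for i, row in enumerate(upper):
        for offset, d in enumerate(row, start=1):
            dist[frozenset((i, i + offset))] = d
    return dist


def upgma(names, upper):
    """Return the merge order as (sorted member names, merge distance)."""
    dist = build_distance_lookup(upper)
    clusters = [(i,) for i in range(len(names))]  # tuples of leaf indices

    def average(c1, c2):
        # UPGMA distance: mean over all leaf pairs across the two clusters
        total = sum(dist[frozenset((i, j))] for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    merges = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda p: average(*p))
        merges.append((sorted(names[i] for i in c1 + c2), average(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges


merges = upgma(SPECIES, UPPER)
for members, height in merges:
    print(f"{height:.5f}  {members}")
```

Each printed line is one internal node of the tree, listed in the order in which the clusters were joined.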
Table 6. The information of 29 spike protein sequences.

Number | Abbreviation | Accession number
1 | TGEVG | CAB91145
2 | TGEV | NP_058424
3 | PEDVC | AAK38656
4 | PEDV | NP_598310
5 | HCoVOC43 | NP_937950
6 | BCoVE | AAK83356
7 | BCoVL | AAL57308
8 | BCoVM | AAA66399
9 | BCoVQ | AAL40400
10 | MHVA | AAB86819
11 | MHVJHM | YP_209233
12 | MHVP | AAF69334
13 | MHVM | AAF69344
14 | IBVBJ | AAP92675
15 | IBVC | AAS00080
16 | IBV | NP_040831
17 | GD03T0013 | AAS10463
18 | PC4127 | AAU93318
19 | PC4137 | AAV49720
20 | PC4205 | AAU93319
21 | civet007 | AAU04646
22 | civet010 | AAU04649
23 | A022 | AAV91631
24 | GD01 | AAP51227
25 | GZ02 | AAS00003
26 | BJ01 | AAP30030
27 | FRA | AAP50485
28 | TOR2 | AAP41037
29 | TaiwanTC1 | AAQ01597

Figure 2. The phylogenetic tree of the 29 spike proteins of coronavirus using our method.

Similarity analysis of 560 gene sequences of influenza A (H1N1) virus

In this section, we apply the similarity analysis to HA gene sequences of influenza A (H1N1) collected from March 1, 2009 to April 30, 2009 (available online at https://www.ncbi.nlm.nih.gov). The data set consists of 560 full-length gene sequences (provided as Supplementary File 3). To further demonstrate the validity of our method, we apply it to this data set. For each virus isolate, we obtain the corresponding 40-dimensional characteristic vector, which gives a set of 560 vectors. By computing the similarity distance between each pair of vectors, we obtain a similarity distance matrix. Next, we construct the phylogenetic tree based on our method (Figure 3). To make the results easier to analyze, we mark 2 typical strains: A/California/07/2009 (H1N1) and A/Indiana/08/2009 (H1N1). From Figure 3, it is easy to see that the virus isolates are mainly classified into 2 groups. This illustrates that there were 2 different kinds of influenza A (H1N1) virus isolates from March 1, 2009 to April 30, 2009. This result is consistent with the works by Qi et al.[14,28] Furthermore, the result is also consistent with the biological fact that a new influenza virus, the A/California/07/2009 (H1N1)–like virus, appeared and showed a strong ability to infect human beings in April 2009.[23] The branch length in Figure 3 is the similarity distance between 2 virus isolates.
Figure 3. The phylogenetic tree of the 560 influenza A (H1N1) isolates from March to April 2009 by our method.

ClustalW is one of the most widely used multiple sequence alignment methods for nucleic acid and protein sequences in molecular biology. For comparison, we construct the phylogenetic tree of the 560 gene sequences using the ClustalW method[29] under MEGA6.0 software. From the phylogenetic tree in Figure 4, we can see that the virus isolates are also classified into 2 groups. In the figure, we again mark the 2 typical strains: A/California/07/2009 (H1N1) and A/Indiana/08/2009 (H1N1). Comparing Figures 3 and 4, one can see that the results of our method are consistent with those of the ClustalW method. Furthermore, it takes about 126 minutes to obtain the multiple sequence alignment result on our Windows PC with an Intel Core i5-3230M CPU @ 2.60 GHz and 4 GB RAM. In contrast, the computation time of our method, implemented as a Python program, is 105.769 seconds. This indicates that our method is computationally efficient when dealing with sequences of different lengths.
Figure 4. The phylogenetic tree of the 560 influenza A (H1N1) isolates from March to April 2009 using ClustalW method under MEGA6.0 software.

Conclusions

In this article, a new 3D graphical representation of protein sequences is introduced based on 10 physicochemical properties and the BLOSUM62 matrix. On the basis of the graphical representation curve, we extract a specific vector and use it to calculate the similarity distance between 2 protein sequences. To demonstrate the effectiveness of our method, we apply it to 3 real data sets. The results, compared with related works and known evolutionary relationships, show the validity of our method in phylogenetic analysis.

Supplemental material: Supplementary Files 1 to 3 (uomf_march_22_2018_supplementary_file_1, uomf_march_22_2018_supplementary_file_2, and uomf_march_22_2018_supplementary_file_3) for Novel Method of 3-Dimensional Graphical Representation for Proteins and Its Application by Zhao-Hui Qi, Ke-Cheng Li, Jin-Long Ma, Yu-Hua Yao and Ling-Yun Liu in Evolutionary Bioinformatics.
  15 in total

1.  Amino acid substitution matrices from protein blocks.

Authors:  S Henikoff; J G Henikoff
Journal:  Proc Natl Acad Sci U S A       Date:  1992-11-15       Impact factor: 11.205

2.  Graphical representation of proteins as four-color maps and their numerical characterization.

Authors:  Milan Randić; Ketij Mehulić; Damir Vukicević; Tomaz Pisanski; Drazen Vikić-Topić; Dejan Plavsić
Journal:  J Mol Graph Model       Date:  2008-11-01       Impact factor: 2.518

3.  MEGA6: Molecular Evolutionary Genetics Analysis version 6.0.

Authors:  Koichiro Tamura; Glen Stecher; Daniel Peterson; Alan Filipski; Sudhir Kumar
Journal:  Mol Biol Evol       Date:  2013-10-16       Impact factor: 16.240

4.  The graphical representation of protein sequences based on the physicochemical properties and its applications.

Authors:  Ping-An He; Yan-Ping Zhang; Yu-Hua Yao; Yi-Fa Tang; Xu-Ying Nan
Journal:  J Comput Chem       Date:  2010-08       Impact factor: 3.376

5.  A protein mapping method based on physicochemical properties and dimension reduction.

Authors:  Zhao-Hui Qi; Meng-Zhe Jin; Su-Li Li; Jun Feng
Journal:  Comput Biol Med       Date:  2014-11-28       Impact factor: 4.589

6.  Spectral-dynamic representation of DNA sequences.

Authors:  Dorota Bielińska-Wąż; Piotr Wąż
Journal:  J Biomed Inform       Date:  2017-06-03       Impact factor: 6.317

7.  Amino acid difference formula to help explain protein evolution.

Authors:  R Grantham
Journal:  Science       Date:  1974-09-06       Impact factor: 47.728

8.  The genetic code and error transmission.

Authors:  C Alff-Steinberger
Journal:  Proc Natl Acad Sci U S A       Date:  1969-10       Impact factor: 11.205

9.  On the fundamental nature and evolution of the genetic code.

Authors:  C R Woese; D H Dugre; S A Dugre; M Kondo; W C Saxinger
Journal:  Cold Spring Harb Symp Quant Biol       Date:  1966

10.  An efficient numerical method for protein sequences similarity analysis based on a new two-dimensional graphical representation.

Authors:  A El-Lakkani; H Mahran
Journal:  SAR QSAR Environ Res       Date:  2015       Impact factor: 3.000
