
Three 3D graphical representations of DNA primary sequences based on the classifications of DNA bases and their applications.

Abstract

In this article, we introduce three 3D graphical representations of DNA primary sequences, which we call RY-curve, MK-curve and SW-curve, based on three classifications of the DNA bases. The advantages of our representations are that (i) these 3D curves are strictly non-degenerate and there is no loss of information when transferring a DNA sequence to its mathematical representation and (ii) the coordinates of every node on these 3D curves have clear biological implication. Two applications of these 3D curves are presented: (a) a simple formula is derived to calculate the content of the four bases (A, G, C and T) from the coordinates of nodes on the curves; and (b) a 12-component characteristic vector is constructed to compare similarity among DNA sequences from different species based on the geometrical centers of the 3D curves. As examples, we examine similarity among the coding sequences of the first exon of beta-globin gene from eleven species and validate similarity of cDNA sequences of beta-globin gene from eight species.

Year: 2010 PMID: 20969878 PMCID: PMC7126940 DOI: 10.1016/j.jtbi.2010.10.018

Source DB: PubMed Journal: J Theor Biol ISSN: 0022-5193 Impact factor: 2.691

Introduction

Advances in DNA sequencing technology and DNA databases have greatly facilitated biological research involving DNA sequences. However, the information contained in DNA sequences is difficult for humans to comprehend without careful extraction and processing. Many methods have been proposed to characterize DNA sequences, with special effort devoted to representing sequences graphically. Graphical approaches can provide an intuitive picture and useful insights that help analyze complicated relations in biological systems, as demonstrated by many previous studies on a series of important biological topics, such as analysis of codon usage (Chou and Zhang, 1992; Zhang and Chou, 1993; Zhang and Chou, 1994), base frequencies in the anti-sense strands (Chou et al., 1996), analysis of DNA and protein sequences (Qi et al., 2007; Wu et al., 2010; Yu et al., 2009), enzyme-catalyzed reactions (Andraos, 2008; Chou, 1980; Chou, 1981; Chou, 1989; Chou et al., 1979; Chou and Forsen, 1980; Chou and Liu, 1981; Lin and Neet, 1990; Zhou and Deng, 1984), protein folding kinetics and folding rates (Chou, 1990; Chou and Shen, 2009; Shen et al., 2009), inhibition kinetics of processive nucleic acid polymerases and nucleases (Chou et al., 1994), and drug metabolism systems (Chou, 2010). Moreover, graphical methods have also been introduced to deal with other biological and medical problems (Diao et al., 2007; Gonzalez-Diaz et al., 2009; Munteanu et al., 2009).
Recently, images of cellular automata (Wolfram, 1984; Wolfram, 2002) were also used to represent biological sequences (Xiao et al., 2005a) for predicting protein structural classes (Xiao et al., 2008) and subcellular location (Xiao et al., 2006b), identifying G-protein-coupled receptor functional classes (Xiao et al., 2009), investigating HBV gene missense mutations (Xiao et al., 2005b) and HBV viral infections (Xiao et al., 2006a), as well as analyzing SARS-CoV (Gao et al., 2006; Wang et al., 2005).

Graphical representation of DNA sequences was first proposed by Hamori and Ruskin (1983). Gates (1986), Nandy (1994) and Leong and Morgenthaler (1995) developed 2D graphical representations of DNA sequences. These methods are straightforward but are accompanied by some loss of information, because the curve representing the DNA sequence can overlap and cross itself and because circuits in the curve cause degeneracy. Randic et al. (2003) developed a novel 2D representation method in which there is no loss of information in transferring a DNA sequence to its mathematical representation. More recently, several other 2D representations have been proposed (Wang and Zhang, 2006; Zhang, 2009; Yao et al., 2008; Zhao et al., 2010).

As for 3D graphical representation, Hamori and Ruskin (1983) developed the H-curve, which can uniquely represent a DNA sequence. Based on the classifications of DNA bases, Zhang et al. (Zhang and Zhang, 1994; Zhang, 1997; Zhang et al., 2003) created the Z-curve to represent DNA sequences, in which the four bases (A, G, T and C) are represented by the four vertices of a regular tetrahedron, as A(1,1,1), T(–1,–1,1), C(–1,1,–1) and G(1,–1,–1). The Z-curve is a 3D graphical representation with clear biological implication. However, as pointed out by Tang et al. (2010), the Z-curve representation has a defect: it may produce a loop in the resulting spatial curve if the frequencies of the four bases present in the sequence are the same. Randic et al. (2000) presented another 3D graphical representation method, but the limitation in the form of crossing and overlapping of the spatial curve representing a DNA sequence still remains. Recently, further 3D representations were developed by several authors (Li and Wang, 2004; Liao and Wang, 2004; Yu et al., 2009) to overcome the problem of degeneracy in graphical representation. These methods, however, do not seem to possess apparent biological meaning.

In this article, we introduce three novel 3D graphical representations of DNA primary sequences, namely the RY-curve, the MK-curve and the SW-curve. These curves are derived from three classifications of the four DNA bases A, G, T and C, respectively. It can be proved that the proposed representations are strictly non-degenerate and therefore avoid potential information loss when transferring a DNA sequence to its representation. Moreover, the coordinates of every node on these 3D curves have clear biological implication. In Section 4, we present two applications developed on the basis of the proposed representations.

Construction of RY-curve, MK-curve and SW-curve

The four DNA bases (A, G, T and C) can be classified in the following three ways according to their chemical properties:

(a) Chemical structure of the bases: R (purine) = A, G / Y (pyrimidine) = T, C.

(b) Functional groups of the bases: M (amino) = A, C / K (keto) = G, T.

(c) Strength of the hydrogen bonds between paired bases: S (strong) = G, C / W (weak) = A, T.

First consider the R/Y classification. In a 3D space, a point or a vector has three components. We assign the following vectors to the four DNA bases:

$$A \to (1,-1,0),\quad G \to (1,1,0),\quad T \to (1,0,1),\quad C \to (1,0,-1). \tag{1}$$

Notice that we restrict the two vectors representing the purine bases R=A,G to the x–y plane and the two vectors representing the pyrimidine bases Y=T,C to the x–z plane (see Fig. 1).

Fig. 1

The vectors representing the four bases according to the R/Y classification. Purine bases R=A,G are restricted to the x–y plane and pyrimidine bases Y=T,C to the x–z plane.

Given a DNA sequence with n bases, $S=s_1s_2\cdots s_n$, we look at one base at a time. For the i-th base (i=1, 2,…,n), a corresponding point $P_i(x_i,y_i,z_i)$ can be determined in the 3D space as follows:

$$x_i=\sum_{k=1}^{i}a_k^{x},\qquad y_i=\sum_{k=1}^{i}a_k^{y},\qquad z_i=\sum_{k=1}^{i}a_k^{z}, \tag{2}$$

where $a_k^{x}$, $a_k^{y}$ and $a_k^{z}$ represent the x-component, y-component and z-component of the vector corresponding to $s_k$, respectively. All n bases of the DNA sequence are examined consecutively, and in the end we obtain n points $P_1,P_2,\ldots,P_n$ in the 3D space. Then, starting from the origin (0, 0, 0) and connecting adjacent points, we obtain a 3D curve, called the RY-curve.

As an example, suppose we have the sequence S=ATGGTCTTG. Applying the proposed method with the vectors A→(1,−1,0), G→(1,1,0), T→(1,0,1) and C→(1,0,−1), we get ten points corresponding to the nine bases of the sequence (including the origin): (0,0,0), (1,−1,0), (2,−1,1), (3,0,1), (4,1,1), (5,1,2), (6,1,1), (7,1,2), (8,1,3) and (9,2,3). Connecting these points sequentially, we obtain the RY-curve (see Fig. 2) for this particular DNA sequence.
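The construction described above can be sketched in a few lines of Python. This is only an illustration, not the authors' code: the dictionary `RY_VECTORS` assumes that assignment (1) maps A, G, T and C to (1,−1,0), (1,1,0), (1,0,1) and (1,0,−1), as the vector set listed in Section 3 suggests, and the function name `curve_points` is hypothetical.

```python
# Assumed form of assignment (1): purines (A, G) in the x-y plane,
# pyrimidines (T, C) in the x-z plane. This mapping is an assumption
# inferred from the vector set quoted in Section 3.
RY_VECTORS = {
    "A": (1, -1, 0),
    "G": (1, 1, 0),
    "T": (1, 0, 1),
    "C": (1, 0, -1),
}


def curve_points(sequence, vectors=RY_VECTORS):
    """Return the origin followed by the nodes P_1..P_n of Eq. (2).

    Each node is the running sum of the base vectors seen so far.
    """
    x = y = z = 0
    points = [(0, 0, 0)]
    for base in sequence.upper():
        dx, dy, dz = vectors[base]
        x, y, z = x + dx, y + dy, z + dz
        points.append((x, y, z))
    return points


ry_points = curve_points("ATGGTCTTG")
for point in ry_points:
    print(point)
```

Passing a different dictionary as `vectors` gives the other curves, provided the corresponding assignment is known.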

Fig. 2

The RY-curve representation of the sequence ATGGTCTTG.

Now we consider the M/K classification and the S/W classification of the four bases. For the M/K classification, vectors are assigned to the four bases by assignment (3), which restricts the two vectors representing the amino bases M=A, C to the x–y plane and the two vectors representing the keto bases K=G, T to the x–z plane. A different way of representing the DNA sequence graphically is thus established. We call the 3D curve generated under this definition the MK-curve. Similarly, for the S/W classification, assignment (4) restricts the weak hydrogen-bond bases W=A, T to the x–y plane and the strong hydrogen-bond bases S=G, C to the x–z plane. We then obtain the third 3D graphical representation of the DNA sequence. The 3D curve generated under this definition is called the SW-curve. As an example, in Fig. 3 we plot the RY-curve, MK-curve and SW-curve of exon 1 of the human beta-globin gene given in Table 1.

Fig. 3

The RY-curve, MK-curve and SW-curve of the coding sequence of the first exon of the human beta-globin gene. (A) The RY-curve. (B) The MK-curve. (C) The SW-curve.

Table 1

The coding sequences of the first exon of beta-globin gene of 11 different species.

Species	Coding sequences
Human	ATGGTGCACCTGACTCCTGAGGAGAAGTCTGCCGTTACTGCCCTGTGGGGCAAGGTGAACGTGGATTAAGTTGGTGGTGAGGCCCTGGGCAG
Goat	ATGCTGACTGCTGAGGAGAAGGCTGCCGTCACCGGCTTCTGGGGCAAGGTGAAAGTGGATGAAGTTGGTGCTGAGGCCCTGGGCAG
Gallus	ATGGTGCACTGGACTGCTGAGGAGAAGCAGCTCATCACCGGCCTCTGGGGCAAGGTCAATGTGGCCGAATGTGGGGCCGAAGCCCTGGCCAG
Mouse	ATGGTTGCACCTGACTGATGCTGAGAAGTCTGCTGTCTCTTGCCTGTGGGCAAAGGTGAACCCCGATGAAGTTGGTGGTGAGGCCCTGGGCAGG
Rat	ATGGTGCACCTAACTGATGCTGAGAAGGCTACTGTTAGTGGCCTGTGGGGAAAGGTGAACCCTGATAATGTTGGCGCTGAGGCCCTGGGCAG
Chimpanzee	ATGGTGCACCTGACTCCTGAGGAGAAGTCTGCCGTTACTGCCCTGTGGGGCAAGGTGAACGTGGATGAAGTTGGTGGTGAGGCCCTGGGCAGGTTGGTATCAAGG
Bovine	ATGCTGACTGCTGAGGAGAAGGCTGCCGTCACCGCCTTTTGGGGCAAGGTGAAAGTGGATGAAGTTGGTGGTGAGGCCCTGGGCAG
Gorilla	ATGGTGCACCTGACTCCTGAGGAGAAGTCTGCCGTTACTGCCCTGTGGGGCAAGGTGAACGTGGATGAAGTTGGTGGTGAGGCCCTGGGCAGG
Opossum	ATGGTGCACTTGACTTCTGAGGAGAAGAACTGCATCACTACCATCTGGTCTAAGGTGCAGGTTGACCAGACTGGTGGTGAGGCCCTTGGCAG
Lemur	ATGACTTTGCTGAGTGCTGAGGAGAATGCTCATGTCACCTCTCTGTGGGGCAAGGTGGATGTAGAGAAAGTTGGTGGCGAGGCCTTGGGCAG
Rabbit	ATGGTGCATCTGTCCAGTGAGGAGAAGTCTGCGGTCACTGCCCTGTGGGGCAAGGTGAATGTGGAAGAAGTTGGTGGTGAGGCCCTGGGC

Properties of RY-curve, MK-curve and SW-curve

In this section, we prove some properties of the RY-curve, MK-curve and SW-curve. We use the notations $A_n$, $G_n$, $T_n$ and $C_n$ to denote the content of bases A, G, T and C, respectively, in a DNA sequence $S=s_1s_2\cdots s_n$, $s_i\in\{A,T,G,C\}$.

Property 3.1. There is no circuit or degeneracy in the RY-curve, MK-curve and SW-curve.

Proof. We prove this property by contradiction. First consider the RY-curve. Suppose that there are one or more circuits in an RY-curve. Then there exists at least one point in the 3D space at which the curve crosses itself. That is, two points on the curve, say $P_i$ and $P_j$, $i\neq j$, have exactly the same coordinates $(x_i,y_i,z_i)=(x_j,y_j,z_j)$. So we must have $x_i=x_j$. According to assignment (1) and Eq. (2), every base vector has x-component 1, so $x_i=i$ and $x_j=j$. This implies i=j, which contradicts the supposition that $i\neq j$. Therefore, there is no circuit or degeneracy in the RY-curve. Similarly, we can show that there is no circuit or degeneracy in the MK-curve and SW-curve. □

Property 3.2. There exists a one-to-one correspondence between a DNA sequence and an RY-curve (MK-curve or SW-curve), and no information is lost.

Proof. First consider the RY-curve. From the previous proof, we know that, for a given DNA sequence $S=s_1s_2\cdots s_n$, there exists one unique RY-curve.

Conversely, suppose that the RY-curve of a DNA sequence is given; it then follows immediately that the coordinates of all n nodes on the RY-curve, $(x_i,y_i,z_i)$, i=1,2,…,n, are given. Let $(x_0,y_0,z_0)=(0,0,0)$. According to Eq. (2), the base $s_i$ corresponding to the node $P_i(x_i,y_i,z_i)$ on the RY-curve can be calculated using the following formula:

$$a_i=(x_i,y_i,z_i)-(x_{i-1},y_{i-1},z_{i-1}),\qquad i=1,2,\ldots,n. \tag{5}$$

Formula (5) consists of the following set of equations:

$$\begin{cases}(x_1,y_1,z_1)-(x_0,y_0,z_0)=a_1,\\ (x_2,y_2,z_2)-(x_1,y_1,z_1)=a_2,\\ \qquad\vdots\\ (x_n,y_n,z_n)-(x_{n-1},y_{n-1},z_{n-1})=a_n,\end{cases} \tag{6}$$

where $a_i\in\{(1,-1,0),(1,1,0),(1,0,1),(1,0,-1)\}$, i=1,2,…,n, is known. Note that $(x_0,y_0,z_0)=(0,0,0)$. Regarding $(x_1,y_1,z_1),\ldots,(x_n,y_n,z_n)$ as unknowns, we obtain the coefficient matrix of Eqs. (6) to be

$$A=\begin{pmatrix}1&0&0&\cdots&0\\-1&1&0&\cdots&0\\0&-1&1&\cdots&0\\\vdots&&\ddots&\ddots&\vdots\\0&0&\cdots&-1&1\end{pmatrix}.$$

The determinant |A|=1≠0; therefore, for the given RY-curve, Eqs. (6) have a unique solution. This implies that one RY-curve uniquely determines one corresponding DNA sequence.
Hence, the correspondence between DNA sequences and RY-curves is one-to-one and there is no loss of information. Similarly, we can prove that Property 3.2 holds for the MK-curve and SW-curve as well. □

Property 3.3. The x-component $x_i$ of the node $P_i(x_i,y_i,z_i)$ of the RY-curve (MK-curve or SW-curve) is just the length of the DNA sequence $S=s_1s_2\cdots s_i$; that is,

$$x_i=A_i+G_i+T_i+C_i=i,$$

where $A_i$, $G_i$, $T_i$ and $C_i$ denote the contents of the four bases among the first i bases.

Proof. The proof follows immediately from assignment (1) and Eq. (2). □

Property 3.4.

(i) For the RY-curve, its projections (2D curves) onto the x–y plane and the x–z plane denote the distributions of purine bases (A, G) and pyrimidine bases (T, C) along the sequence $S=s_1s_2\cdots s_n$, respectively, and we have $y_k=G_k-A_k$ and $z_k=T_k-C_k$, k=1,2,…,n.

(ii) For the MK-curve, its projections (2D curves) onto the x–y plane and the x–z plane denote the distributions of the amino bases (A, C) and the keto bases (G, T) along the sequence, respectively; $y_k$ and $z_k$ equal the differences between the contents of A and C and of G and T among the first k bases, with signs fixed by assignment (3).

(iii) For the SW-curve, its projections (2D curves) onto the x–y plane and the x–z plane denote the distributions of the bases with weak H-bonds (A, T) and strong H-bonds (G, C) along the sequence, respectively; $y_k$ and $z_k$ equal the differences between the contents of A and T and of G and C among the first k bases, with signs fixed by assignment (4).

Proof. First we prove (i). From assignment (1), we know that the vectors representing the bases G and A are symmetrical about the x axis in the x–y plane, and the vectors representing the bases T and C are symmetrical about the x axis in the x–z plane. So, according to Eq. (2), we have $y_k=G_k-A_k$ and $z_k=T_k-C_k$. The projection of the RY-curve onto the x–y plane is a 2D curve with nodes $\{(x_k,y_k),\ k=1,2,\ldots,n\}$, and the projection of the RY-curve onto the x–z plane is a 2D curve with nodes $\{(x_k,z_k),\ k=1,2,\ldots,n\}$. So, the projections of the RY-curve onto the x–y plane and the x–z plane display the distributions of purine bases (A, G) and pyrimidine bases (T, C) along the sequence, respectively. Similarly, we can prove (ii) and (iii). □

Fig. 4 shows the projections of the RY-curve of exon 1 of the human beta-globin gene onto the x–y plane and the x–z plane.
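Properties 3.1 to 3.4 can be checked numerically on a short sequence. The sketch below is illustrative only: it assumes that assignment (1) maps A, G, T and C to (1,−1,0), (1,1,0), (1,0,1) and (1,0,−1), and the helper names `encode` and `decode` are hypothetical. Decoding takes differences of consecutive nodes, as in formula (5).

```python
# Assumed form of assignment (1); the mapping is an assumption inferred
# from the vector set quoted in the proof of Property 3.2.
RY_VECTORS = {
    "A": (1, -1, 0),
    "G": (1, 1, 0),
    "T": (1, 0, 1),
    "C": (1, 0, -1),
}


def encode(sequence, vectors=RY_VECTORS):
    """Sequence -> list of nodes, starting at the origin (Eq. (2))."""
    current = (0, 0, 0)
    points = [current]
    for base in sequence:
        current = tuple(c + d for c, d in zip(current, vectors[base]))
        points.append(current)
    return points


def decode(points, vectors=RY_VECTORS):
    """List of nodes -> sequence, via differences of consecutive nodes."""
    inverse = {vec: base for base, vec in vectors.items()}
    steps = (
        tuple(b - a for a, b in zip(p, q))
        for p, q in zip(points, points[1:])
    )
    return "".join(inverse[step] for step in steps)


seq = "ATGGTCTTG"
pts = encode(seq)

print(decode(pts) == seq)                                # round trip (Property 3.2)
print([p[0] for p in pts] == list(range(len(seq) + 1)))  # x_i = i (Property 3.3)
print(pts[-1][1] == seq.count("G") - seq.count("A"))     # y_n = G_n - A_n
print(pts[-1][2] == seq.count("T") - seq.count("C"))     # z_n = T_n - C_n

# 2D projections used in Property 3.4 (i)
xy_projection = [(x, y) for x, y, _ in pts]
xz_projection = [(x, z) for x, _, z in pts]
```

Because the x-coordinate grows by exactly one per base, no two nodes can coincide, which is the content of Property 3.1.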

Fig. 4

The 2D projections of the RY-curve of the first exon of the human beta-globin gene onto the x–y plane and the x–z plane. (A) The projection onto the x–y plane, which denotes the distribution of purine bases (A, G) along the coding sequence. (B) The projection onto the x–z plane, which denotes the distribution of pyrimidine bases (T, C) along the coding sequence.

From Fig. 4, we can see that $G_n>A_n$ and $T_n>C_n$; the changing trend of the content of bases A, G and bases T, C along the sequence is also directly observable.

Property 3.5. Let $(x_n,y_n,z_n)$ denote the coordinates of the terminal point on an RY-curve. Then the following relationships hold:

(i) if $y_n>0$ and $z_n>0$, then $T_n+G_n>A_n+C_n$;

(ii) if $y_n<0$ and $z_n<0$, then $T_n+G_n<A_n+C_n$;

(iii) if $y_n>0$ and $z_n<0$, then $G_n+C_n>A_n+T_n$;

(iv) if $y_n<0$ and $z_n>0$, then $G_n+C_n<A_n+T_n$;

(v) if $y_n=0$ and $z_n=0$, then $T_n+G_n=A_n+C_n$, with $T_n=C_n$ and $G_n=A_n$.

Proof. The above results follow directly from assignment (1) and Eq. (2). For the MK-curve and SW-curve, there are analogous properties as well. □
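As a worked check of the first case of Property 3.5, take the base contents reported for the human sequence in Table 2 ($A_n=17$, $G_n=35$, $T_n=21$, $C_n=19$). The relations $y_n=G_n-A_n$ and $z_n=T_n-C_n$ used here are an assumption: they hold if assignment (1) maps A, G, T and C to (1,−1,0), (1,1,0), (1,0,1) and (1,0,−1).

```latex
\begin{aligned}
y_n &= G_n - A_n = 35 - 17 = 18 > 0,\\
z_n &= T_n - C_n = 21 - 19 = 2 > 0,\\
T_n + G_n &= 21 + 35 = 56 \;>\; A_n + C_n = 17 + 19 = 36 .
\end{aligned}
```

Both terminal coordinates are positive, and the inequality $T_n+G_n>A_n+C_n$ holds, as the property states.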

The applications of RY-curve, MK-curve and SW-curve

In this section, we will present two applications of RY-curve, MK-curve and SW-curve.

Calculation of the base content of a DNA sequence

Based on Property 3.3 and Property 3.4, we can obtain three equations for each curve. For the RY-curve, with terminal point $(x_n,y_n,z_n)$, we have

$$x_n=A_n+G_n+T_n+C_n,\qquad y_n=G_n-A_n,\qquad z_n=T_n-C_n. \tag{7}$$

For the MK-curve, the analogous Eqs. (8) relate the terminal coordinates to $A_n+G_n+T_n+C_n$, to the difference between $A_n$ and $C_n$, and to the difference between $G_n$ and $T_n$. For the SW-curve, Eqs. (9) involve the difference between $A_n$ and $T_n$ and the difference between $G_n$ and $C_n$. Without loss of generality, we select four independent equations from (7), (8) and (9) to form the system (10). Since the coefficient matrix of Eq. (10) is nonsingular, there exists one unique solution, which is given by formula (11).

For example, for the coding sequence of the human beta-globin gene, the terminal coordinates of its RY-curve and MK-curve are substituted into formula (11) to give the base content. Similarly, using formula (11), we can calculate the base content of the DNA sequences of the eleven species presented in Table 1 (see Table 2).
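The calculation can be sketched as a small linear solve. The sketch rests on two labeled assumptions: the RY-curve gives $y=G-A$ and $z=T-C$ at its terminal point, and the MK-curve gives $y=A-C$ (the sign of the MK relation is assumed here, since it depends on assignment (3)). The function name `base_content` and its parameter names are hypothetical.

```python
def base_content(n, y_ry, z_ry, y_mk):
    """Recover (A, G, T, C) counts from terminal coordinates.

    Solves the system
        A + G + T + C = n        (x-coordinate of any curve)
        G - A = y_ry             (RY-curve, assumed sign)
        T - C = z_ry             (RY-curve, assumed sign)
        A - C = y_mk             (MK-curve, assumed sign)
    by substitution: 4C = n - y_ry - z_ry - 2*y_mk.
    """
    four_c = n - y_ry - z_ry - 2 * y_mk
    if four_c % 4 != 0 or four_c < 0:
        raise ValueError("coordinates are not consistent with integer counts")
    c = four_c // 4
    a = c + y_mk
    g = a + y_ry
    t = c + z_ry
    return a, g, t, c


# Terminal coordinates computed from the human row of Table 2
# (A=17, G=35, T=21, C=19) under the sign assumptions above.
print(base_content(n=92, y_ry=18, z_ry=2, y_mk=-2))  # (17, 35, 21, 19)
```

Any other choice of four independent equations works the same way; only the substitution order changes.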

Table 2

The base contents of the 11 coding sequences of Table 1.

Species	A	G	T	C	Total
Human	17	35	21	19	92
Goat	17	35	17	17	86
Gallus	19	34	15	24	92
Mouse	17	34	23	20	94
Rat	20	33	21	18	92
Chimpanzee	20	41	24	20	105
Bovine	17	35	18	16	86
Gorilla	17	37	20	19	93
Opossum	21	29	22	20	92
Lemur	19	35	23	15	92
Rabbit	17	37	20	16	90

Similarity analysis based on the RY-curve, MK-curve and SW-curve

Construction of the 12-component sequence descriptor

In Section 2, we constructed the RY-curve representing a DNA sequence by restricting the purine bases R=A,G to the x–y plane and the pyrimidine bases Y=T,C to the x–z plane. Under this restriction, there exist four possible ways of assigning the four vectors to the four bases (A, G, T and C), obtained by exchanging the vectors of A and G within the x–y plane and those of T and C within the x–z plane; they are listed in assignment (12). Thus, from assignment (12) and Eq. (2), we obtain four different kinds of RY-curve, denoted as RY-curve11, RY-curve12, RY-curve13 and RY-curve14. Note that these curves are listed in the same order as the assignments appear in (12). Analogously, for the MK-curve, we can form four kinds of MK-curve, denoted by MK-curve21, MK-curve22, MK-curve23 and MK-curve24. For the SW-curve, we obtain four kinds of SW-curve, denoted by SW-curve31, SW-curve32, SW-curve33 and SW-curve34. Therefore, we have a total of twelve 3D curves representing a DNA sequence.

For a given sequence of length n, we have a set of points $(x_i,y_i,z_i)$, i=1, 2, 3,…, n, from the graphical representation of the sequence. The coordinates of the geometrical center of all the points, denoted by $x_0$, $y_0$ and $z_0$, can be calculated as follows (Yao et al., 2005):

$$x_0=\frac{1}{n}\sum_{i=1}^{n}x_i,\qquad y_0=\frac{1}{n}\sum_{i=1}^{n}y_i,\qquad z_0=\frac{1}{n}\sum_{i=1}^{n}z_i. \tag{13}$$

Next, from the center (13) we calculate an index, defined by formula (14). Using formula (14), we calculate an index vector based on all twelve 3D curves, denoted as vector (15). Here, the first subscript denotes the particular curve (RY, MK, SW) and the second subscript denotes which of the four vector assignments is used. The 12-component vector (15) can be used as a sequence descriptor. To ease notation, the 12-component vector (15) is then rewritten in a more compact form.
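The geometrical centers of the four RY-curve variants can be computed as in the following sketch. It makes two assumptions that the text does not fix: the four assignments are generated by swapping A/G in the x–y plane and T/C in the x–z plane around the vectors (1,±1,0) and (1,0,±1), and their ordering here is arbitrary. The function names are hypothetical, and the sketch stops at the centers of Eq. (13) without attempting the index of formula (14).

```python
def ry_assignments():
    """Four R/Y assignments: A/G swapped in the x-y plane, T/C in the x-z plane.

    The ordering is an assumption; the first entry maps A, G, T, C to
    (1,-1,0), (1,1,0), (1,0,1), (1,0,-1).
    """
    assignments = []
    for y_a in (-1, 1):
        for z_t in (1, -1):
            assignments.append({
                "A": (1, y_a, 0), "G": (1, -y_a, 0),
                "T": (1, 0, z_t), "C": (1, 0, -z_t),
            })
    return assignments


def curve_nodes(sequence, vectors):
    """Nodes P_1..P_n of the curve (the origin is not included)."""
    x = y = z = 0
    nodes = []
    for base in sequence:
        dx, dy, dz = vectors[base]
        x, y, z = x + dx, y + dy, z + dz
        nodes.append((x, y, z))
    return nodes


def geometric_center(nodes):
    """Arithmetic mean of the node coordinates, as in Eq. (13)."""
    n = len(nodes)
    return tuple(sum(p[k] for p in nodes) / n for k in range(3))


centers = [
    geometric_center(curve_nodes("ATGGTCTTG", vectors))
    for vectors in ry_assignments()
]
for center in centers:
    print(center)
```

The same two helpers apply to the MK and SW variants once their assignments are supplied, giving twelve centers per sequence.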

Similarity analysis of the coding sequences of beta-globin gene among different species

Comparison based on sequence descriptors is a method routinely used in similarity analysis. Here, we use the 12-component vector (15) as the index for comparing different DNA sequences. Suppose that for species i and j, their 12-component vectors are $V_i=(v_{i1},v_{i2},\ldots,v_{i,12})$ and $V_j=(v_{j1},v_{j2},\ldots,v_{j,12})$. We introduce two measures to quantify the similarity between the two species: the Euclidean distance $d_{ij}$ and the correlation angle $\theta_{ij}$,

$$d_{ij}=\sqrt{\sum_{k=1}^{12}\left(v_{ik}-v_{jk}\right)^2},\qquad \theta_{ij}=\arccos\left(\frac{\sum_{k=1}^{12}v_{ik}v_{jk}}{\sqrt{\sum_{k=1}^{12}v_{ik}^2}\,\sqrt{\sum_{k=1}^{12}v_{jk}^2}}\right).$$

The smaller $d_{ij}$ and $\theta_{ij}$ are, the more similar species i and j are. Calculating $d_{ij}$ and $\theta_{ij}$ for all eleven species presented in Table 1, we obtain two similarity matrices, M1 and M2, where $M1=(d_{ij})_{11\times 11}$ and $M2=(\theta_{ij})_{11\times 11}$. To combine the information from these two matrices, we compute a weighted sum, M(a)=aM1+(1−a)M2 (0≤a≤1), as the overall similarity matrix of the eleven species. Setting a=1/2, we compute the overall similarity matrix M(1/2) for the eleven species and list the result in Table 3.
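The two measures and their weighted combination can be sketched as follows. The descriptor values in the example are made-up three-component toy vectors, not the paper's 12-component descriptors, and the function names are hypothetical. The correlation angle is taken to be the arccosine of the normalized dot product, which is an assumption about the omitted formula.

```python
import math


def euclidean_distance(u, v):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def correlation_angle(u, v):
    """Angle (radians) between two nonzero descriptor vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    # Clamp to [-1, 1] to guard against rounding just outside the domain.
    cosine = max(-1.0, min(1.0, dot / norm))
    return math.acos(cosine)


def combined_similarity(descriptors, a=0.5):
    """Weighted sum M(a) = a*M1 + (1 - a)*M2 over all pairs."""
    n = len(descriptors)
    return [
        [
            a * euclidean_distance(descriptors[i], descriptors[j])
            + (1 - a) * correlation_angle(descriptors[i], descriptors[j])
            for j in range(n)
        ]
        for i in range(n)
    ]


# Toy descriptors for illustration only (hypothetical values).
toy = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (2.0, 0.0, 0.0)]
matrix = combined_similarity(toy, a=0.5)
for row in matrix:
    print([round(value, 4) for value in row])
```

Note that the distance is scale dependent while the angle is not, so the weight `a` controls how much the magnitude of the descriptors matters in M(a).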

Table 3

The similarity matrix of the 11 coding sequences of Table 1: M(1/2).

Species	Human	Goat	Gallus	Mouse	Rat	Chimpanzee	Bovine	Gorilla	Opossum	Lemur	Rabbit
Human	0	0.0789	1.1475	1.489	1.0999	0.0062	0.0735	0.0424	0.6468	0.5246	0.3757
Goat		0	1.2189	1.5666	1.1719	0.0736	0.0057	0.0367	0.7203	0.4509	0.3034
Gallus			0	0.3509	0.0478	1.1529	1.2142	1.1849	0.6076	1.4876	1.5201
Mouse				0	0.397	1.4951	1.5617	1.5305	0.8499	1.1405	1.2874
Rat					0	1.1054	1.1671	1.1376	0.5699	1.5337	1.4727
Chimpanzee						0	0.0681	0.0371	0.6528	0.52	0.371
Bovine							0	0.0314	0.7159	0.4563	0.3086
Gorilla								0	0.6872	0.486	0.3379
Opossum									0	1.0483	0.9319
Lemur										0	0.1497
Rabbit											0

Table 4

The degree of similarity of the coding sequences of several species with the coding sequences of human.

Species	Chimpanzee	Gorilla	Gallus	Opossum	Bovine	Goat
Our work, Table 3	0.0062	0.0424	1.1475	0.6468	0.0735	0.0789
Liao and Ding (2006), Table 5	0.022893	0.025960	0.0106123	0.095765	0.048664	0.052039
Liu et al. (2006), Table 5	0.0145	0.0079	0.2417	0.2815	0.0750	0.1078
Yao et al. (2008), Table 10	0.00449	0.00478	0.02916	0.02999	0.01359	0.01633
Zhang (2009), Table 1	0.9572	0.2633	1.1559	1.1863	0.3606	0.4769
Tang et al. (2010), Table 3	0.0399	0.0441	0.1766	0.1598	0.0799	0.0869
Tang et al. (2010), Table 4	0.0379	0.0423	0.1781	0.1598	0.0796	0.0855

Similarity analysis of cDNA sequences of beta-globin gene among different species

Based on the method proposed in Section 4.2.1, we compare similarities among the cDNA sequences of the beta-globin gene of the eight species in Table 5. The results are listed in Table 6.

Table 5

The cDNA sequences of beta-globin gene of 8 species.

Species	Release date	UCSC version	Length (bp)
Human	Feb. 2009	hg19/GRCh37	444
Chimpanzee	Mar. 2006	panTro2	444
Rat	Nov. 2004	rn4	444
Mouse	July 2007	mm9	444
Tetraodon	Mar. 2007	tetNig2	448
Fugu	Oct. 2004	fr2	444
Mouse lemur	Jun. 2003	micMur1	443
Bushbaby	Dec. 2006	otoGar1	444

Table 6

The similarity matrix of the cDNA of 8 species in Table 5: M(1/2).

Species	Human	Chimpanzee	Rat	Mouse	Tetraodon	Fugu	Mouse lemur	Bushbaby
Human	0	0.013149	0.36733	0.44743	0.60224	1.4339	0.37625	0.42344
Chimpanzee		0	0.38038	0.46053	0.61479	1.4465	0.38932	0.43653
Rat			0	0.083386	0.27248	1.0744	0.009278	0.057991
Mouse				0	0.19741	0.99561	0.074264	0.025603
Tetraodon					0	0.83818	0.26465	0.22101
Fugu						0	1.066	1.0195
Mouse lemur							0	0.048804
Bushbaby								0

It can be observed in Table 6 that the following pairs of species have significantly smaller similarity scores, that is, are more similar: human–chimpanzee, rat–mouse and mouse lemur–bushbaby. In fact, the eight species chosen here form four pairs of close evolutionary relatives, namely human–chimpanzee, rat–mouse, mouse lemur–bushbaby and tetraodon–fugu. However, we notice that although the similarity score of tetraodon–fugu is the smallest in the seventh column of Table 6, it is much bigger than the scores of the other three pairs of close relatives. This problem remains to be studied further.

Conclusion

In this paper, we propose three graphical representations, namely the RY-curve, MK-curve and SW-curve, to represent a DNA sequence in 3D space. We prove that the 3D curves are strictly non-degenerate and that there is no loss of information in transferring the DNA sequence to the proposed curves. Compared with other graphical representations, the main advantage of our method is that the 2D projections of the RY-curve, MK-curve and SW-curve onto the x–y plane and the x–z plane have clear biological implication. For example, the 2D projection of the RY-curve onto the x–y plane denotes the changing trend of the content of A and G (see Fig. 4). The three components of the terminal node of these 3D curves relate algebraically to the content of the bases A, G, T and C (see Property 3.3 and Property 3.4). Therefore, more information is retained by our method compared to other available methods. As an application of the graphical representation, we derive a simple formula to recover the content of the four kinds of bases (A, G, C and T) in a DNA sequence from the proposed curves. The 12-component sequence descriptors we have constructed enable us to conduct similarity analysis among the coding sequences of the first exon of the beta-globin gene of 11 species. Our results are in overall agreement with the results reported in the literature (Zhang, 2009; Yao et al., 2008; Tang et al., 2010; Liao and Ding, 2006; Liu et al., 2006) (see Table 4). Our method also validates the similarities among the cDNA sequences of closely related species. The computation involved in implementing the proposed methods is fairly straightforward.

References (40 in total)

1. Chun Li; Jun Wang. On a 3-D representation of DNA primary sequences. Comb Chem High Throughput Screen, 2004-02.

2. K C Chou; C T Zhang; D W Elrod. Do "antisense proteins" exist? J Protein Chem, 1996-01.

3. C T Zhang; K C Chou. Graphic analysis of codon usage strategy in 1490 human proteins. J Protein Chem, 1993-06.

4. K C Chou. A new schematic method in enzyme kinetics. Eur J Biochem, 1980-12.

5. G P Zhou; M H Deng. An extension of Chou's graphic rules for deriving enzyme kinetic equations to systems involving parallel reaction pathways. Biochem J, 1984-08-15.

6. A probability cellular automaton model for hepatitis B viral infections.

7. GPCR-CA: A cellular automaton image approach for predicting G-protein-coupled receptor functional classes.

8. Using cellular automata images and pseudo amino acid composition to predict protein subcellular location.

9. A novel fingerprint map for detecting SARS-CoV.

10. The community structure of human cellular signaling network.