
New 3D graphical representation of DNA sequence based on dual nucleotides.

Xiao-Qin Qi, Jie Wen, Zhao-Hui Qi.

Abstract

We introduce a 3D graphical representation of DNA sequences based on the pairs of dual nucleotides (DNs). Based on this representation, we consider some mathematical invariants and construct two 16-component vectors associated with these invariants. The vectors are used to characterize and compare the complete coding sequence part of beta globin gene of nine different species. The examination of similarities/dissimilarities illustrates the utility of the approach.

Year:  2007        PMID: 17931659      PMCID: PMC7094097          DOI: 10.1016/j.jtbi.2007.08.025

Source DB:  PubMed          Journal:  J Theor Biol        ISSN: 0022-5193            Impact factor:   2.691


Introduction

The number of biological sequences in biological databases is increasing rapidly, and analyzing this large volume of sequence data mathematically is one of the challenges facing bio-scientists. Graphic representations are useful for studying complicated biological systems because they provide an intuitive picture and help people gain useful insights. Graphical approaches have been used to deal with a wide variety of biological problems. For instance, various graphic approaches have been successfully used to study enzyme-catalyzed systems (see, e.g., King and Altman, 1956; Chou et al., 1979; Chou and Forsen, 1980; Chou and Liu, 1981; Zhou and Deng, 1984; Chou, 1989; Chou, 1990; Kuzmic et al., 1992; Lin and Neet, 1990), protein folding kinetics (Chou, 1990; Chou, 1993), codon usage (Chou and Zhang, 1992; Zhang and Chou, 1994), HIV reverse transcriptase inhibition mechanisms (Althaus et al., 1993a; Althaus et al., 1993b; Althaus et al., 1993c) and base frequency distribution in the anti-sense strands (Chou and Zhang, 1996). Recently, images of cellular automata have also been used to represent biological sequences (Xiao et al., 2005a), predict protein subcellular location (Xiao et al., 2006a), investigate HBV virus gene missense mutation (Xiao et al., 2005b) and HBV viral infections (Xiao et al., 2006b), and analyze the fingerprint of SARS coronavirus (Wang et al., 2005). As an important class of graphical techniques, graphical representations of DNA sequences have been proposed by several authors (Zhang, 1991; Nandy, 1994; Nandy and Nandy, 2003; Liao and Wang, 2004; Randic et al., 2003; Liu et al., 2006; Zhang and Chen, 2006). Some of them, for example Nandy's graphical representation (Nandy, 1994), lose some visual information because the curve crosses and overlaps itself.
To avoid the limitations related to crossing and overlapping, Liao (Liao and Wang, 2004) and Randic (Randic and Vracko, 2000; Randic et al., 2003) presented their 2D or 3D graphical representations. However, their approaches involve computing D/D and L/L matrices and leading eigenvalues, which requires a great deal of running time and memory. Dinucleotide analysis has also been tried by several previous authors. Randic (2000) proposed a condensed representation of DNA based on pairs of nucleotides, which offers fast, qualitative comparisons of DNA and allows quantitative comparisons of DNA from different sources. Wu et al. (2003) and Liu et al. (2006) proposed analysis approaches based on neighboring nucleotides of a DNA sequence, which reveal the biological information hidden in the dual nucleotides (DNs). Qi and Qi (2007) also suggested a dinucleotide analysis method to reveal the biological information of DNA sequences. Recently, Qi and Fan (2007) proposed a 3D graphical representation of DNA sequences based on a pair of nucleotides (the PN-curve). Addressing the same research object, in this paper we introduce a new 3D graphical representation (the 3D-DN curve) of DNA primary sequences, in which there is likewise no loss of information in the transfer of data from a DNA sequence to its mathematical representation. Our representation differs from the PN-curve (Qi and Fan, 2007) in many respects: the methods and contents of the research, the map used to construct the graphical representation, the graphical curve and the numerical invariants characterizing DNA sequences. The introduced representation is simple and direct, and gives more biological information based on DNs.

3D graphical representation of DNA sequences based on dual nucleotides

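Given a DNA primary sequence, there are 16 kinds of pairs of neighboring nucleotides. These pairs can be classified into four categories based on their chemical properties: purine-DN {AG, GA}/pyrimidine-DN {CT, TC}, amino-DN {AC, CA}/keto-DN {GT, TG}, weak-H bond DN {AT, TA}/strong-H bond DN {CG, GC} and repeat-DN {AA, CC, GG, TT}. We arrange the 16 DNs in a matrix according to these four categories and use it to build a new 3D graphical representation of DNA sequences. The matrix is

\[
M = \begin{pmatrix}
\mathrm{AG} & \mathrm{GA} & \mathrm{CT} & \mathrm{TC} \\
\mathrm{AC} & \mathrm{CA} & \mathrm{GT} & \mathrm{TG} \\
\mathrm{AT} & \mathrm{TA} & \mathrm{CG} & \mathrm{GC} \\
\mathrm{AA} & \mathrm{CC} & \mathrm{GG} & \mathrm{TT}
\end{pmatrix}
\]

Every element of the matrix has a corresponding index $(i, j)$, $i = 0, 1, 2, 3$; $j = 0, 1, 2, 3$. Based on the index, we assign each DN as follows:

\[
\begin{aligned}
\mathrm{AG} &\to (0,0), & \mathrm{GA} &\to (0,1), & \mathrm{CT} &\to (0,2), & \mathrm{TC} &\to (0,3), \\
\mathrm{AC} &\to (1,0), & \mathrm{CA} &\to (1,1), & \mathrm{GT} &\to (1,2), & \mathrm{TG} &\to (1,3), \\
\mathrm{AT} &\to (2,0), & \mathrm{TA} &\to (2,1), & \mathrm{CG} &\to (2,2), & \mathrm{GC} &\to (2,3), \\
\mathrm{AA} &\to (3,0), & \mathrm{CC} &\to (3,1), & \mathrm{GG} &\to (3,2), & \mathrm{TT} &\to (3,3).
\end{aligned}
\]

That is to say, we assign every DN to its corresponding index $(i, j)$, while the corresponding curve extends along the z-axis. In detail, let $G = g_1 g_2 \cdots g_N$ be an arbitrary DNA primary sequence. Then we define a map $\phi$ as follows:

\[
\phi(g_k g_{k+1}) = (i, j, k), \qquad k = 1, 2, \ldots, N-1,
\]

where $(i, j)$ is the index of the DN $g_k g_{k+1}$ in $M$. The map $\phi$ maps G into a plot set. For example, the corresponding plot set of the sequence ATGGTGCACC is {(2, 0, 1), (1, 3, 2), (3, 2, 3), (1, 2, 4), (1, 3, 5), (2, 3, 6), (1, 1, 7), (1, 0, 8), (3, 1, 9)}. This plot set is called the characteristic plot set. The curve connecting all points of the characteristic plot set in turn is called the 3D-DN curve. Table 1 and Fig. 1 show the corresponding coordinates and the 3D graphical representation of the sequence, respectively.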
Table 1

Cartesian 3D coordinates for the sequence ATGGTGCACC of the coding sequence of the first exon of human β-globin gene

Base  DNs  x  y  z
1     AT   2  0  1
2     TG   1  3  2
3     GG   3  2  3
4     GT   1  2  4
5     TG   1  3  5
6     GC   2  3  6
7     CA   1  1  7
8     AC   1  0  8
9     CC   3  1  9
Fig. 1

Characteristic curve of the sequence ATGGTGCACC, the dots denote the DNs making up the sequence.

From the construction of the matrix, we know that its design is not unique. There are 16 kinds of DNs, so they can be arranged in 16! ways. We design the matrix based on the classification of nucleotides, and in this paper we consider only the above matrix to illustrate our scheme.
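The construction of the characteristic plot set can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the names `DN_MATRIX`, `DN_INDEX` and `dn_curve` are hypothetical, and the matrix layout is an assumption in which rows 1 to 3 agree with Table 1 and row 0 follows the order of the category listing.

```python
# Sketch of the 3D-DN mapping (hypothetical names, not the authors' code).
# Assumed matrix layout: rows 1-3 agree with Table 1; row 0 follows the
# order in which the purine/pyrimidine DNs are listed in the text.
DN_MATRIX = [
    ["AG", "GA", "CT", "TC"],  # purine-DN / pyrimidine-DN
    ["AC", "CA", "GT", "TG"],  # amino-DN / keto-DN
    ["AT", "TA", "CG", "GC"],  # weak-H bond DN / strong-H bond DN
    ["AA", "CC", "GG", "TT"],  # repeat-DN
]

# Lookup table: dual nucleotide -> (row index i, column index j).
DN_INDEX = {
    dn: (i, j)
    for i, row in enumerate(DN_MATRIX)
    for j, dn in enumerate(row)
}


def dn_curve(sequence):
    """Return the characteristic plot set [(i, j, k), ...] of a DNA sequence.

    The k-th overlapping dual nucleotide (k starts at 1) is placed at its
    matrix index (i, j) and at height z = k.
    """
    seq = sequence.upper()
    points = []
    for k in range(1, len(seq)):
        i, j = DN_INDEX[seq[k - 1:k + 1]]
        points.append((i, j, k))
    return points


if __name__ == "__main__":
    for point in dn_curve("ATGGTGCACC"):
        print(point)
```

For the sequence ATGGTGCACC the function returns the nine points listed in Table 1; connecting them in order gives the 3D-DN curve.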

Numerical characterization of DNA sequences

Consider a DNA sequence $G = g_1 g_2 \cdots g_N$ of length N. Based on the definition of the map $\phi$ given above, we obtain a set of points $P_1, P_2, \ldots, P_{N-1}$, and the correspondence between the DNs and the points is one-to-one. In order to find invariants sensitive to the form of the characteristic curve, we transform the 3D graphical representation into another mathematical object. Firstly, let $n_{ab}$ denote the total number of occurrences of the DN ab in the given sequence, $a, b \in \{\mathrm{A}, \mathrm{C}, \mathrm{G}, \mathrm{T}\}$. The vertex (dot) $P_1$ denotes the first dot of the 3D-DN curve, and the vertex (dot) $P_i$ denotes the ith dot. Then let $s_{ab}$ denote the sum of geometrical lengths of edges between the vertices (dots) $P_1$ and $P_{ab}$ of the 3D-DN curve, where $P_{ab}$ denotes the vertex representing the DN ab in the given sequence. The parameter $p_{ab}$ is defined as the distribution of the DN ab frequency. For the 3D-DN curve, after simple computation, we obtain

\[
\bar{s}_{ab} = \frac{1}{n_{ab}} \sum_{k=1}^{n_{ab}} s_{ab}^{k}, \qquad p_{ab} = \frac{n_{ab}}{N-1},
\]

where $s_{ab}^{k}$ denotes the sum of geometrical lengths of edges between the vertices (dots) $P_1$ and $P_{ab}^{k}$ of the 3D-DN curve, $P_{ab}^{k}$ being the vertex at which the DN ab appears for the kth time in the given sequence. Here, we calculate the $\bar{s}/p$-Matrix: the $4 \times 4$ matrix whose element at index $(i, j)$ is the pair $(\bar{s}_{ab}, p_{ab})$ for the DN ab located at $(i, j)$ in $M$. The direct biological meaning of the $\bar{s}/p$-Matrix is that its entries indicate the mean spaces and the distributions of DNs in the graph of the given sequence, respectively. We regard them as invariants that numerically characterize the DNA sequence.

In Nandy et al. (2006), Nandy suggests that authors apply their techniques to complete genes, or at least to the complete coding sequence part. The complete beta globin genes have an interrupted structure with three exons and two introns. Comparisons of related genes in different species show that the sequences of the corresponding exons are usually conserved, whereas the sequences of the introns are much less well related. In this paper, we apply our method to the complete coding sequence part (i.e. the three exons). For brevity, Table 2 gives the complete coding sequence for only some of the species; for the others only the first exon is listed.
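The two invariants can be sketched as follows. This is a hedged illustration under stated assumptions: the function name `dn_invariants` is hypothetical, the matrix layout is the category-based arrangement (row 0 assumed from the listing order), and the "space" of an occurrence is read as the path length along the curve from the first vertex to that occurrence, so the first vertex itself has space 0. The distribution is taken as the count divided by N - 1, which agrees with the values in Table 3.

```python
import math
from collections import defaultdict

# Assumed category-based arrangement of the 16 DNs (row 0 is an assumption).
DN_MATRIX = [
    ["AG", "GA", "CT", "TC"],
    ["AC", "CA", "GT", "TG"],
    ["AT", "TA", "CG", "GC"],
    ["AA", "CC", "GG", "TT"],
]
DN_INDEX = {
    dn: (i, j)
    for i, row in enumerate(DN_MATRIX)
    for j, dn in enumerate(row)
}


def dn_invariants(sequence):
    """Return (mean_space, distribution), two dicts keyed by DN.

    mean_space[ab]   : average path length from the first vertex of the
                       3D-DN curve to the occurrences of ab.
    distribution[ab] : occurrences of ab divided by N - 1.
    """
    seq = sequence.upper()
    path_sums = defaultdict(float)  # sum of path lengths per DN
    counts = defaultdict(int)       # number of occurrences per DN
    travelled = 0.0                 # curve length from the first vertex
    previous = None
    for k in range(1, len(seq)):
        dn = seq[k - 1:k + 1]
        point = (*DN_INDEX[dn], k)
        if previous is not None:
            travelled += math.dist(previous, point)
        path_sums[dn] += travelled
        counts[dn] += 1
        previous = point
    total = len(seq) - 1
    mean_space = {dn: path_sums[dn] / counts[dn] for dn in counts}
    distribution = {dn: counts[dn] / total for dn in counts}
    return mean_space, distribution


if __name__ == "__main__":
    mean_space, distribution = dn_invariants("ATGGTGCACC")
    for dn in sorted(mean_space):
        print(dn, round(mean_space[dn], 4), round(distribution[dn], 4))
```

In the ten-base example, TG occurs twice among nine DNs, so its distribution is 2/9, and its mean space is the average of the path lengths to the second and fifth vertices.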
Table 2

The complete coding sequences of β-globin genes of nine species

Species    Complete coding sequence

Human
ACCESSION U01317; REGION: join(62187..62278,62409..62631,63482..63610)
Exon1 1..92; Exon2 93..315; Exon3 316..444;
ATGGTGCACCTGACTCCTGAGGAGAAGTCTGCCGTTACTGCCCTGTGGGGCAAGGTGAACGTGGATGAAGTTGGTGGTGAGGCCCTGGGCAGGCTGCTGGTGGTCTACCCTTGGACCCAGAGGTTCTTTGAGTCCTTTGGGGATCTGTCCACTCCTGATGCTGTTATGGGCAACCCTAAGGTGAAGGCTCATGGCAAGAAAGTGCTCGGTGCCTTTAGTGATGGCCTGGCTCACCTGGACAACCTCAAGGGCACCTTTGCCACACTGAGTGAGCTGCACTGTGACAAGCTGCACGTGGATCCTGAGAACTTCAGGCTCCTGGGCAACGTGCTGGTCTGTGTGCTGGCCCATCACTTTGGCAAAGAATTCACCCCACCAGTGCAGGCTGCCTATCAGAAAGTGGTGGCTGGTGTGGCTAATGCCCTGGCCCACAAGTATCACTAA


Goat
ACCESSION M15387; REGION: join(279..364,493..715,1621..1749)
Exon1 1..86; Exon2 87..309; Exon3 310..438;
ATGCTGACTGCTGAGGAGAAGGCTGCCGTCACCGGCTTCTGGGGCAAGGTGAAAGTGGATGAAGTTGGTGCTGAGGCCCTGGGCAGGCTGCTGGTTGTCTACCCCTGGACTCAGAGGTTCTTTGAGCACTTTGGGGACTTGTCCTCTGCTGATGCTGTTATGAACAATGCTAAGGTGAAGGCCCATGGCAAGAAGGTGCTAGACTCCTTTAGTAACGGCATGAAGCATCTTGACGACCTCAAGGGCACCTTTGCTCAGCTGAGTGAGCTGCACTGTGATAAGCTGCACGTGGATCCTGAGAACTTCAAGCTCCTGGGCAACGTGCTGGTGGTTGTGCTGGCTCGCCACCATGGCAGTGAATTCACCCCGCTGCTGCAGGCTGAGTTTCAGAAGGTGGTGGCTGGTGTTGCCAATGCCCTGGCCCACAGATATCACTAA


North American opossum
ACCESSION J03643; REGION: join(467..558,672..894,2360..2488)
Exon1 1..92; Exon2 93..315; Exon3 316..444;
ATGGTGCACTTGACTTCTGAGGAGAAGAACTGCATCACTACCATCTGGTCTAAGGTGCAGGTTGACCAGACTGGTGGTGAGGCCCTTGGCAGGATGCTCGTTGTCTACCCCTGGACCACCAGGTTTTTTGGGAGCTTTGGTGATCTGTCCTCTCCTGGCGCTGTCATGTCAAATTCTAAGGTTCAAGCCCATGGTGCTAAGGTGTTGACCTCCTTCGGTGAAGCAGTCAAGCATTTGGACAACCTGAAGGGTACTTATGCCAAGTTGAGTGAGCTCCACTGTGACAAGCTGCATGTGGACCCTGAGAACTTCAAGATGCTGGGGAATATCATTGTGATCTGCCTGGCTGAGCACTTTGGCAAGGATTTTACTCCTGAATGTCAGGTTGCTTGGCAGAAGCTCGTGGCTGGAGTTGCCCATGCCCTGGCCCACAAGTACCACTAA


Gallus
ACCESSION V00409; REGION: join(465..556,649..871,1682..1810)
For simplification, only Exon1 (1..92) is listed;
ATGGTGCACTGGACTGCTGAGGAGAAGCAGCTCATCACCGGCCTCTGGGGCAAGGTCAATGTGGCCGAATGTGGGGCCGAAGCCCTGGCCAG


Black lemur
ACCESSION M15734; REGION: join(154..245,376..598,1467..1595)
For simplification, only Exon1 (1..92) is listed;
ATGACTTTGCTGAGTGCTGAGGAGAATGCTCATGTCACCTCTCTGTGGGGCAAGGTGGATGTAGAGAAAGTTGGTGGCGAGGCCTTGGGCAG


House mouse
ACCESSION V00722; REGION: join(275..367,484..705,1334..1462)
For simplification, only Exon1 (1..93) is listed;
ATGGTGCACCTGACTGATGCTGAGAAGTCTGCTGTCTCTTGCCTGTGGGCAAAGGTGAACCCCGATGAAGTTGGTGGTGAGGCCCTGGGCAGG


Rabbit
ACCESSION V00882; REGION: join(277..368,495..717,1291..1419)
For simplification, only Exon1 (1..92) is listed;
ATGGTGCATCTGTCCAGTGAGGAGAAGTCTGCGGTCACTGCCCTGTGGGGCAAGGTGAATGTGGAAGAAGTTGGTGGTGAGGCCCTGGGCAG


Norway rat
ACCESSION X06701; REGION: join(310..401,517..739,1377..>1505)
For simplification, only Exon1 (1..92) is listed;
ATGGTGCACCTAACTGATGCTGAGAAGGCTACTGTTAGTGGCCTGTGGGGAAAGGTGAACCCTGATAATGTTGGCGCTGAGGCCCTGGGCAG


Cattle
ACCESSION X00376; REGION: join(278..363,492..714,1613..1741)
For simplification, only Exon1 (1..86) is listed;
ATGCTGACTGCTGAGGAGAAGGCTGCCGTCACCGCCTTTTGGGGCAAGGTGAAAGTGGATGAAGTTGGTGGTGAGGCCCTGGGCAG
In Table 3, the matrix of mean spaces and distributions (the $\bar{s}/p$-Matrix) is constructed for the nine species presented in Table 2. For convenient comparison, Fig. 2 shows the mean spaces and the distributions p of DNs among the nine species. Taking a closer look at Fig. 2, we can find some common features of the nine DNA primary sequences that are not easily visible in Table 3. These features may give us more information about their evolution. The DNs AG, GA, TC, AC, CA, GT, GC, AA, CC and TT occur with moderate frequency in all species presented in Table 2. The DNs CT, TG and GG occur more frequently in all species. The DNs AT, TA and CG occur more rarely in all species. Moreover, observing the lines whose positions and heights denote the distributions of the corresponding DNs and the mean space of every two identical DNs, respectively, we find that Gallus (the only nonmammalian species) and opossum (the species most remote from the remaining mammals) show larger entries than the other species.
Table 3

The $\bar{s}/p$-Matrix of the nine different species presented in Table 2. For each species, each pair of rows gives the mean spaces (upper row) and the distributions p (lower row) of the four DNs in one row of the matrix M.

Human                                     Goat
500.6136  396.8885  586.6441  591.9827    456.9633  418.3000  524.6166  563.2798
0.0609    0.0564    0.0993    0.0451      0.0709    0.0686    0.1030    0.0412
379.2788  613.9212  416.0631  381.3174    380.0210  614.6291  456.6932  413.2502
0.0542    0.0677    0.0722    0.1354      0.0458    0.0618    0.0595    0.1350
432.6606  614.6310  446.4609  423.7280    455.4911  499.6837  521.9378  425.9806
0.0293    0.0203    0.0113    0.0790      0.0343    0.0206    0.0183    0.0961
597.4704  487.8472  486.8177  487.0639    569.0279  539.2355  491.6873  515.4189
0.0519    0.0790    0.0993    0.0384      0.0503    0.0572    0.0892    0.0481

North American opossum                    Gallus
523.8582  533.3919  556.4765  444.8475    513.3896  474.0916  615.7531  521.0959
0.0677    0.0677    0.0926    0.0542      0.0519    0.0564    0.0948    0.0655
346.9337  518.0918  402.6976  459.9530    417.5562  523.9502  421.4063  431.0943
0.0519    0.0700    0.0677    0.1242      0.0587    0.0835    0.0429    0.1016
426.4998  515.1239  495.2860  549.4444    376.1917  699.1648  409.4878  425.4332
0.0429    0.0248    0.0090    0.0655      0.0361    0.0068    0.0316    0.0858
643.2122  540.4797  525.8275  525.6048    632.2262  544.2404  464.1535  659.8438
0.0497    0.0655    0.0790    0.0677      0.0542    0.1151    0.0835    0.0316

Black lemur                               House mouse
451.0993  413.3322  561.5003  453.7127    520.6454  492.1484  600.5855  553.3528
0.0655    0.0609    0.1038    0.0587      0.0609    0.0609    0.1038    0.0429
442.8868  600.2678  447.9226  413.2137    417.1156  610.4255  391.1613  369.3045
0.0497    0.0677    0.0722    0.1309      0.0542    0.0655    0.0519    0.1242
321.9795  414.8371  482.6028  431.5504    409.7837  468.5294  419.5267  436.9504
0.0271    0.0135    0.0158    0.0790      0.0361    0.0248    0.0113    0.0835
650.6342  530.5800  526.9254  534.0476    482.2962  548.1338  589.0919  536.1029
0.0474    0.0564    0.1016    0.0497      0.0609    0.0903    0.0903    0.0384

Rabbit                                    Norway rat
443.4465  442.0772  626.0979  512.4670    545.4528  513.0164  551.5921  604.1700
0.0677    0.0587    0.0971    0.0609      0.0564    0.0609    0.0971    0.0384
449.8174  577.2756  371.9956  388.3690    393.9856  622.8308  434.4702  409.3697
0.0429    0.0722    0.0767    0.1332      0.0564    0.0632    0.0564    0.1264
376.8481  590.5806  470.5930  415.0455    385.2784  364.5492  376.9265  405.8272
0.0361    0.0158    0.0090    0.0745      0.0451    0.0339    0.0045    0.0700
572.4836  553.8482  543.3979  589.6406    535.1879  529.7384  513.1191  633.0783
0.0632    0.0564    0.0971    0.0384      0.0632    0.0858    0.0948    0.0474

Cattle
483.0319  428.6478  545.2948  609.2222
0.0686    0.0686    0.0915    0.0389
369.6844  610.7933  423.1906  411.7154
0.0366    0.0572    0.0618    0.1350
479.0289  529.5687  471.0091  421.5105
0.0435    0.0229    0.0160    0.0892
593.1544  478.8359  526.3381  517.2071
0.0549    0.0618    0.0938    0.0595
Fig. 2

The comparison of the mean spaces and the distributions of DNs among the nine species in Table 2; i of the x-coordinate denotes the ith species in Table 2, i = 1, 2, ..., 9; the value of the y-coordinate denotes the distributions of DNs; the value of the z-coordinate denotes the mean spaces of DNs.


Similarities/dissimilarities among the complete coding sequences of β-globin gene of different species

In this section, we illustrate the use of the quantitative characterization of DNA sequences by examining the similarities/dissimilarities among the nine complete coding sequences in Table 2. The analysis of similarity/dissimilarity between two DNA sequences represented by vectors is based on the assumption that two sequences are similar if the corresponding vectors point in a similar direction and have similar magnitudes. A similar assumption is made in Randic et al. (2001). To facilitate the quantitative comparison of different species, we extract some invariants with simple methods. Firstly, we calculate the space-sum matrix (s-M) as follows:

\[
s\text{-}M = (s_{ij})_{4 \times 4}, \qquad s_{ij} = \sum_{k=1}^{n_{ab}} s_{ab}^{k} = n_{ab}\,\bar{s}_{ab}.
\]

The element $s_{ij}$ of the matrix s-M is the total sum of geometrical lengths of edges between the vertex (dot) $P_1$ and all vertices $P_{ab}^{k}$ of the 3D-DN curve, where ab is the DN at index $(i, j)$, $i, j = 0, 1, 2, 3$, and $k = 1, 2, \ldots, n_{ab}$. Similarly, we have the distribution matrix (p-M) as follows:

\[
p\text{-}M = (p_{ij})_{4 \times 4}, \qquad p_{ij} = p_{ab} = \frac{n_{ab}}{N-1}.
\]

The element $p_{ij}$ of the matrix p-M indicates the distribution of the corresponding DN on the 3D-DN curve. We construct two 16-component vectors: the s-vector, consisting of the 16 space-sums in the matrix s-M, and the p-vector, consisting of the 16 distributions in the matrix p-M. Based on the above assumption, the similarities among such vectors can be computed in two ways: (1) calculating the Euclidean distance between the end points of the s-vectors; (2) calculating the Euclidean distance between the end points of the p-vectors. When comparing two DNA sequences, we suppose that there are two species $A$ and $B$; the parameters i and j denote the row number and column number of the matrix, respectively. The distance between the two s-vectors is

\[
d_s(A, B) = \sqrt{\sum_{i=0}^{3} \sum_{j=0}^{3} \left( s_{ij}^{A} - s_{ij}^{B} \right)^2}.
\]

The distance between the two p-vectors is

\[
d_p(A, B) = \sqrt{\sum_{i=0}^{3} \sum_{j=0}^{3} \left( p_{ij}^{A} - p_{ij}^{B} \right)^2}.
\]

The smaller the Euclidean distance is, the more similar the DNA sequences are.
We list the similarities and dissimilarities for the nine complete coding sequences in Tables 4 and 5.
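As a small check of the p-vector distance, the sketch below takes the Human and Goat distributions from Table 3 and computes the Euclidean distance between them. The names `HUMAN_P`, `GOAT_P` and `matrix_distance` are hypothetical and introduced only for this illustration; the numbers are copied from Table 3.

```python
import math

# Distribution rows (p) of Human and Goat, copied from Table 3.
HUMAN_P = [
    [0.0609, 0.0564, 0.0993, 0.0451],
    [0.0542, 0.0677, 0.0722, 0.1354],
    [0.0293, 0.0203, 0.0113, 0.0790],
    [0.0519, 0.0790, 0.0993, 0.0384],
]
GOAT_P = [
    [0.0709, 0.0686, 0.1030, 0.0412],
    [0.0458, 0.0618, 0.0595, 0.1350],
    [0.0343, 0.0206, 0.0183, 0.0961],
    [0.0503, 0.0572, 0.0892, 0.0481],
]


def matrix_distance(first, second):
    """Euclidean distance between two 4 x 4 matrices read as 16-vectors."""
    return math.sqrt(
        sum(
            (a - b) ** 2
            for row_a, row_b in zip(first, second)
            for a, b in zip(row_a, row_b)
        )
    )


if __name__ == "__main__":
    print(round(matrix_distance(HUMAN_P, GOAT_P), 4))
```

The result agrees, to four decimal places, with the Human/Goat entry of Table 5 (0.0398). The same function applies to space-sum matrices once each mean space is multiplied by the corresponding DN count.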
Table 4

The similarity/dissimilarity matrix for the complete coding sequences of Table 2 based on the Euclidean distances between the end points of the 16-component vectors of the space-sums of 16 DNs

Species                 Human   Goat    North American opossum  Gallus  Black lemur  House mouse  Rabbit  Norway rat  Cattle
Human                   0       8160    11877                   14923   7077         8683         5874    8801        10650
Goat                            0       7790                    17978   8224         12258        9379    10704       5732
North American opossum                  0                       18830   11643        13728        11886   8483        6705
Gallus                                                          0       18706        11792        17419   14367       20194
Black lemur                                                             0            12323        5636    10983       9938
House mouse                                                                          0            10992   8422        13848
Rabbit                                                                                            0       9813        10067
Norway rat                                                                                                0           10173
Cattle                                                                                                                0
Table 5

The similarity/dissimilarity matrix for the complete coding sequences of Table 2 based on the Euclidean distances between the end points of the 16-component vectors of the distributions of the 16 DNs

Species                 Human   Goat    North American opossum  Gallus  Black lemur  House mouse  Rabbit  Norway rat  Cattle
Human                   0       0.0398  0.0480                  0.0713  0.0320       0.0311       0.0348  0.0358      0.0442
Goat                            0       0.0474                  0.0857  0.0345       0.0442       0.0430  0.0519      0.0243
North American opossum                  0                       0.0834  0.0424       0.0522       0.0453  0.0450      0.0416
Gallus                                                          0       0.0834       0.0559       0.0853  0.0715      0.0900
Black lemur                                                             0            0.0509       0.0265  0.0547      0.0415
House mouse                                                                          0            0.0515  0.0253      0.0479
Rabbit                                                                                            0       0.0525      0.0450
Norway rat                                                                                                0           0.0468
Cattle                                                                                                                0
Observing Tables 4 and 5, we find that Gallus and opossum are very dissimilar to the other species because their corresponding rows have larger entries. On the other hand, the most similar species pairs (Table 4) are Human–Rabbit, Goat–Cattle and Black lemur–Rabbit. The next most similar species pairs are Human–Goat, House mouse–Norway rat, Human–Black lemur, Goat–North American opossum and North American opossum–Cattle. This is not an accident, but indicates that these species have a close evolutionary relationship. For comparison, we denote the degree of similarity of the pair Human–Gallus as 1, and we show the resulting degrees of similarity between human and the other species in Fig. 3. As Fig. 3 shows, the two methods agree overall on the similarity between human and the other species, but the results for species 5 and 6 show divergent trends between methods a and b. We think that the divergent trends arise because different invariants are derived from the same 3D-DN curve. Method a (i.e. the s-vector or matrix s-M) includes both the distribution information and the position information hidden in the DNs of the 3D-DN curve, whereas method b (i.e. the p-vector or matrix p-M) reveals only the distribution information of the DNs. In Fig. 3 the divergence between methods a and b is noticeable for species 5 and 6 but not for the other species. Method a, which carries more biological information, is in our view relatively more credible than method b.
Fig. 3

The degree of similarity of the complete coding sequences of several species with the complete coding sequence of human (a: from [this work, Table 4]; b: from [this work, Table 5]); i of x-coordinate denotes the species of Table 4 (x-coord 1: Goat, x-coord 2: North American opossum, x-coord 3: Gallus, x-coord 4: Black lemur, x-coord 5: House mouse, x-coord 6: Rabbit, x-coord 7: Norway rat, x-coord 8: Cattle).
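The normalization behind Fig. 3 can be sketched as follows, assuming that the degree of similarity is the distance to human divided by the Human–Gallus distance (the text only states that the Human–Gallus pair is set to 1). The names `SPECIES`, `HUMAN_ROW_S`, `HUMAN_ROW_P` and `relative_to_reference` are hypothetical; the numbers are the Human rows of Tables 4 and 5.

```python
# Species in the order used on the x-axis of Fig. 3.
SPECIES = [
    "Goat", "North American opossum", "Gallus", "Black lemur",
    "House mouse", "Rabbit", "Norway rat", "Cattle",
]
# Distances between human and each species (Human rows of Tables 4 and 5).
HUMAN_ROW_S = [8160, 11877, 14923, 7077, 8683, 5874, 8801, 10650]
HUMAN_ROW_P = [0.0398, 0.0480, 0.0713, 0.0320, 0.0311, 0.0348, 0.0358, 0.0442]


def relative_to_reference(distances, reference="Gallus"):
    """Scale distances so that the reference species gets the value 1.

    Assumption: the degree of similarity is a plain ratio of distances.
    """
    ref = distances[SPECIES.index(reference)]
    return [d / ref for d in distances]


if __name__ == "__main__":
    method_a = relative_to_reference(HUMAN_ROW_S)
    method_b = relative_to_reference(HUMAN_ROW_P)
    for name, a, b in zip(SPECIES, method_a, method_b):
        print(f"{name:24s} a={a:.3f} b={b:.3f}")
```

Under this reading, method a places Rabbit closer to human than House mouse, while method b reverses the order of these two species, which matches the divergence described for species 5 and 6.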


Discussion

In this paper we arrange the 16 DNs in a matrix according to the four categories. The matrix M is

\[
M = \begin{pmatrix}
\mathrm{AG} & \mathrm{GA} & \mathrm{CT} & \mathrm{TC} \\
\mathrm{AC} & \mathrm{CA} & \mathrm{GT} & \mathrm{TG} \\
\mathrm{AT} & \mathrm{TA} & \mathrm{CG} & \mathrm{GC} \\
\mathrm{AA} & \mathrm{CC} & \mathrm{GG} & \mathrm{TT}
\end{pmatrix}
\]

From the construction of the matrix, we know that its design is not unique. There are 16 kinds of DNs, so they can be arranged in 16! ways. The same situation arises in Randic (2000) and Liu et al. (2006). Randic (2000) considers that the ordering of the nucleic bases in his matrix is not important. Liu et al. (2006) consider only one ordering of their matrix to illustrate their method. We suggest a novel approach based on DNs to compute parameters that determine the similarity/dissimilarity between two DNA sequences. The ordering of the nucleic bases in our matrix should not be important, but we want to know whether the ordering of the DNs in the matrix brings about divergent trends in these parameters. So we randomly arrange the 16 DNs in another 4 × 4 matrix. As before, every element of this matrix has a corresponding index $(i, j)$, $i = 0, 1, 2, 3$; $j = 0, 1, 2, 3$, and each DN is assigned to its index. Based on this assignment we can draw another 3D-DN curve to represent the same DNA sequence. For comparison, Table 6 lists the similarities and dissimilarities for the nine complete coding sequences obtained with the s-vector derived from the new 3D-DN curve based on the new random matrix. As one can see, there is an overall agreement among the similarities obtained by the different methods, despite some variation in the numerical values. The variation in the numerical values is not important; what matters is whether divergent trends exist in the parameters that determine the similarity/dissimilarity between two DNA sequences. We show the degree of similarity between human and the other species in Fig. 4. From Fig. 4 we conclude that the s-vectors derived from different 3D-DN curves show the same trends in the parameters that determine the similarity/dissimilarity between two DNA sequences. The ordering of the nucleic bases in the suggested approach is therefore not important.
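The experiment with a second arrangement can be sketched as below. Everything here is a hedged stand-in: the random matrix used for Table 6 is not available, so a seeded shuffle of the category-based order is used instead, the three sequences are only the first 30 bases of the Human, Rabbit and Gallus entries of Table 2, and the names `arrangement_index`, `space_sums` and `s_distance` are hypothetical.

```python
import math
import random

# Category-based order, filled row by row (row 0 order is an assumption).
CATEGORY_ORDER = [
    "AG", "GA", "CT", "TC", "AC", "CA", "GT", "TG",
    "AT", "TA", "CG", "GC", "AA", "CC", "GG", "TT",
]
ARRANGEMENTS = math.factorial(16)  # number of possible 4 x 4 arrangements

# First 30 bases of three Table 2 sequences (truncated for illustration).
HUMAN = "ATGGTGCACCTGACTCCTGAGGAGAAGTCT"
RABBIT = "ATGGTGCATCTGTCCAGTGAGGAGAAGTCT"
GALLUS = "ATGGTGCACTGGACTGCTGAGGAGAAGCAG"


def arrangement_index(dns_in_order):
    """Map each DN to (i, j) for a 4 x 4 matrix filled row by row."""
    return {dn: divmod(pos, 4) for pos, dn in enumerate(dns_in_order)}


def space_sums(sequence, index):
    """Return {DN: sum of path lengths from the first vertex to each occurrence}."""
    seq = sequence.upper()
    sums = dict.fromkeys(index, 0.0)
    travelled = 0.0
    previous = None
    for k in range(1, len(seq)):
        dn = seq[k - 1:k + 1]
        point = (*index[dn], k)
        if previous is not None:
            travelled += math.dist(previous, point)
        sums[dn] += travelled
        previous = point
    return sums


def s_distance(seq_a, seq_b, index):
    """Euclidean distance between the s-vectors of two sequences."""
    sums_a = space_sums(seq_a, index)
    sums_b = space_sums(seq_b, index)
    return math.sqrt(sum((sums_a[dn] - sums_b[dn]) ** 2 for dn in index))


# Hypothetical stand-in for the random matrix: a seeded shuffle.
shuffled = CATEGORY_ORDER[:]
random.Random(0).shuffle(shuffled)

if __name__ == "__main__":
    for name, order in (("category-based", CATEGORY_ORDER), ("shuffled", shuffled)):
        index = arrangement_index(order)
        print(
            name,
            round(s_distance(HUMAN, RABBIT, index), 2),
            round(s_distance(HUMAN, GALLUS, index), 2),
        )
```

Comparing the two printed rows shows how much the numerical values move when the arrangement changes; whether the ranking of species is preserved has to be checked on the full coding sequences, as is done in Table 6 and Fig. 4.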
Table 6

The similarity/dissimilarity matrix for the complete coding sequences of Table 2 based on the Euclidean distances between the end points of the 16-component vectors of the space-sums of 16 DNs (by using the s-vector derived from the new 3D-DN curve based on the new random matrix)

Species                 Human   Goat    North American opossum  Gallus  Black lemur  House mouse  Rabbit  Norway rat  Cattle
Human                   0       10379   12545                   19172   9210         10814        7897    9602        12403
Goat                            0       8054                    20541   11125        13402        10123   12887       6669
North American opossum                  0                       19740   13302        14171        12391   9086        8976
Gallus                                                          0       21693        11952        20071   16045       23525
Black lemur                                                             0            15087        6044    13759       13335
House mouse                                                                          0            13184   9340        16032
Rabbit                                                                                            0       10148       11375
Norway rat                                                                                                0           14140
Cattle                                                                                                                0
Fig. 4

The degree of similarity of the complete coding sequences of several species with the complete coding sequence of human (a: from [this work, Table 4]; c: from [this work, Table 6]); i of x-coordinate denotes the species of Table 4 (x-coord 1: Goat, x-coord 2: North American opossum, x-coord 3: Gallus, x-coord 4: Black lemur, x-coord 5: House mouse, x-coord 6: Rabbit, x-coord 7: Norway rat, x-coord 8: Cattle).


Conclusion

In this paper, we give a novel approach to graphically characterize DNA primary sequences. The properties of the DNs in a DNA sequence are presented in a 3D graphical representation based on a matrix consisting of the 16 DNs. Based on this representation, we construct two 16-component vectors and use them to characterize and compare the complete coding sequence part of the beta globin gene of nine different species. The results show that our method is useful for visualizing the local and global features of long or short DNA sequences and can reveal the visual characteristics of a DNA sequence. The advantage of our approach is that it allows visual inspection of DN data, which helps in recognizing major similarities among different DNA sequences.
References (10 of 23 listed)

1.  On the characterization of DNA primary sequences by triplet of nucleic acid bases.

Authors:  M Randić; X Guo; S C Basak
Journal:  J Chem Inf Comput Sci       Date:  2001 May-Jun

2.  Mixtures of tight-binding enzyme inhibitors. Kinetic analysis by a recursive rate equation.

Authors:  P Kuzmic; K Y Ng; T D Heath
Journal:  Anal Biochem       Date:  1992-01       Impact factor: 3.365

3.  Invariants of DNA sequences based on 2DD-curves.

Authors:  Yusen Zhang; Wei Chen
Journal:  J Theor Biol       Date:  2006-05-02       Impact factor: 2.691

4.  An application of gene comparative image for predicting the effect on replication ratio by HBV virus gene missense mutation.

Authors:  Xuan Xiao; Shihuang Shao; Yongsheng Ding; Zhengde Huang; Xiaojing Chen; Kuo-Chen Chou
Journal:  J Theor Biol       Date:  2005-03-24       Impact factor: 2.691

5.  Analysis of distribution of bases in the coding sequences by a diagrammatic technique.

Authors:  C T Zhang; R Zhang
Journal:  Nucleic Acids Res       Date:  1991-11-25       Impact factor: 16.971

6.  An extension of Chou's graphic rules for deriving enzyme kinetic equations to systems involving parallel reaction pathways.

Authors:  G P Zhou; M H Deng
Journal:  Biochem J       Date:  1984-08-15       Impact factor: 3.857

7.  Graphical rules for non-steady state enzyme kinetics.

Authors:  K C Chou; W M Liu
Journal:  J Theor Biol       Date:  1981-08-21       Impact factor: 2.691

8.  Graphical rules for enzyme-catalysed rate laws.

Authors:  K C Chou; S Forsén
Journal:  Biochem J       Date:  1980-06-01       Impact factor: 3.857

9.  A graphic approach to analyzing codon usage in 1562 Escherichia coli protein coding sequences.

Authors:  C T Zhang; K C Chou
Journal:  J Mol Biol       Date:  1994-04-22       Impact factor: 5.469

10.  Using cellular automata images and pseudo amino acid composition to predict protein subcellular location.

Authors:  X Xiao; S Shao; Y Ding; Z Huang; K-C Chou
Journal:  Amino Acids       Date:  2005-07-28       Impact factor: 3.520
