Effective Encoding for DNA Sequence Visualization Based on Nucleotide's Ring Structure.

A T M Golam Bari, Mst Rokeya Reaz, A K M Tauhidul Islam, Ho-Jin Choi, Byeong-Soo Jeong

Abstract

Effective representation of DNA sequences is an important task in the study of genome sequences. In this paper, we propose a graphical representation of DNA sequences based on the ring structure of nucleotides. In the proposed representation, we map the 16 dinucleotides of a DNA sequence onto a hexagon so that the chemical properties and positional information of the nucleotides are preserved. Our approach supports efficient and accurate similarity comparison between DNA sequences. Furthermore, the representation is unique and free of degeneracy. In the experimental study, we use phylogeny analysis to examine the evolutionary relationship among different species. The performance study shows that the proposed method reflects the degree of similarity better than existing methods.

Keywords: DNA curve; hexagon; ring structure; β-globin gene

Year: 2013 PMID: 23908584 PMCID: PMC3712558 DOI: 10.4137/EBO.S12160

Source DB: PubMed Journal: Evol Bioinform Online ISSN: 1176-9343 Impact factor: 1.625

Introduction

The rapid growth of biological sequence data, such as DNA, RNA, and protein sequences, has created a demand for effective methods of analyzing large biological sequences. The results of such analysis help biological researchers predict the structure and function of genes and compare the similarity of genes across species. Two approaches have mainly been used for biological sequence analysis: (i) sequence alignment methods and (ii) non-sequence alignment methods. The first approach obtains the degree of similarity between DNA sequences by comparing the alignment scores of two sequences. Its computational cost becomes expensive as the sequences grow longer. The second approach analyzes DNA sequences by establishing a statistical model, a graphical representation model, or a machine learning model of the sequences. This approach has recently been widely studied because it can give better accuracy at low computational overhead. For non-sequence alignment methods, an effective DNA sequence representation, or effective feature selection from DNA sequences, is essential in areas such as gene prediction, similarity comparison between genes of different species, and finding gene structure and function. For this purpose, several graphical representations have been proposed based on the chemical structures of the four nucleotides; they reflect the distribution of nucleotides with different chemical structures and allow numerical characterization. For feature selection, several machine learning techniques have been applied effectively, such as principal component analysis (PCA), neural networks, and several classification models. In this paper, we propose a graphical representation of DNA sequences based on the ring structure of nucleotides. In the proposed representation, we map the 16 dinucleotides of a DNA sequence onto a hexagon so that the chemical properties and positional information of the nucleotides are preserved. Our approach is unique, and no degeneracy of DNA sequences is observed. It also supports efficient and accurate similarity comparison between DNA sequences. The performance study shows that the proposed method reflects the degree of similarity better than existing methods.

Related Works

The graphical representation of DNA sequences was initiated by Hamori and Ruskin.1 Afterwards, many 2D,2–5,14,15 3D,6,7 4D,8 5D,9 and 6D10 representations of DNA sequences were developed. In this type of graphical representation, nucleotides, dinucleotides, or trinucleotides are assigned Cartesian coordinates in 2D through 6D space. DNA sequences are then mapped into a set of Cartesian points and plotted. Other research compares DNA sequences based on mathematical invariants. For example, Wu et al11 proposed 10 correlation factors: 4 mononucleotide and 6 dinucleotide factors. Qi et al12 recently proposed a graph-theory-based representation of DNA sequences. The word-based measure13 is one of the most widely used alignment-free approaches for sequence comparison; each sequence is mapped into an n-dimensional vector according to its k-word frequencies or probabilities. Randić et al14,15 considered several kinds of condensed matrices. Castro-Chavez21 proposed genomic rules to compare biological sequences and to find compatible genomes, using the classic circular genetic code to present the practical aspect of the code rules of variation. Castro-Chavez22,23 also proposed natural patterns of symmetry and periodicity for a tetrahedral representation of the genetic code. The method is applied to the defragged I Ching genetic code and compared to Nirenberg’s 16 × 4 codon table. These two properties (ie, symmetry and periodicity) act as the harmony between the chosen geometry and the biological reality. Graphical representations of DNA sequences based on mononucleotides, dinucleotides, trinucleotides, etc. need to consider this harmony. Otherwise, they would merely display the nucleotides (eg, mononucleotide, dinucleotide, codon) with little biological sense. When representing DNA sequences graphically, it must be ensured that no information is lost due to overlaps, loops, etc. and that the conversion from DNA sequence to graph and from graph to DNA sequence is one to one. However, some representations do not meet these criteria. Therefore, a graphical representation that is unique and free of degeneracy is a further contribution to DNA sequence visualization. Methods based on non-graphical representations must likewise ensure that no information is lost. If not, their results would be comparable to, but less precise than, those of methods that have no loss in conversion. In this paper, we convert DNA sequences into DNA curves without any loss of information or degeneracy.

DNA Sequence Visualization by Hexagonal Structure

Chemical structure and classification of DNA bases

As stated previously, DNA sequences are strings of four bases: A, T, C and G. The core of each base is a heterocyclic organic compound, which forms a ring in its chemical structure. The purines (A and G) have two rings, while the pyrimidines (C and T) have one. The chemical ring structures of the four bases are depicted in Figure 1.

Figure 1

Heterogenic cycle of four bases.

The elements of these cycles are carbon, nitrogen, hydrogen and oxygen. In Figure 1, sky-blue balls are carbon and blue balls are nitrogen. The hexagonal cycle has nitrogen in positions 1 and 3, and carbon in positions 2, 4, 5, and 6. In addition to carbon and nitrogen, the bases have oxygen and hydrogen bonded to carbon and nitrogen in different numbers. Hence, the bases differ in molecular weight. The molecular weights of A, T, C and G are 135.13, 112.1, 111.1 and 151.13 respectively. Their ascending order in terms of molecular weight is C→T→A→G. The bases also differ by heterogenic cycle, functionality, and hydrogen bonding. Based on functionality, A and C fall into the amino category while G and T are in the keto group. G and C pair through three hydrogen bonds and hence form the strong-H group, while A and T form the weak-H group as they pair through only two hydrogen bonds.
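The weight ordering that drives the encoding can be checked with a minimal Python sketch. The molecular weights are the values quoted in the text; the variable names are illustrative assumptions and are not taken from the original work.

```python
# Molecular weights of the four bases, as quoted in the text.
MOLECULAR_WEIGHT = {"A": 135.13, "T": 112.1, "C": 111.1, "G": 151.13}

# Classification by ring count and by functional group, as described in the text.
PURINES, PYRIMIDINES = {"A", "G"}, {"C", "T"}
AMINO, KETO = {"A", "C"}, {"G", "T"}

# Sort the bases from lightest to heaviest.
ascending = sorted(MOLECULAR_WEIGHT, key=MOLECULAR_WEIGHT.get)
descending = ascending[::-1]

print("→".join(ascending))   # ascending order used for the hexagon ends
print("→".join(descending))  # descending order used for some midpoints
```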

Proposed DNA encoding

The proposed encoding of dinucleotides for DNA sequence visualization is based solely on the ring structure of the DNA bases and their molecular weights. The bases are paired into dinucleotides in such a way that their ascending or descending order of molecular weight remains intact. The dinucleotides are placed on the six ends of the heterocyclic hexagon as well as at the midpoint of each arm of the hexagon. The six dinucleotides placed on the ends of the hexagon are in ascending order of molecular weight, while the midpoint dinucleotides are positioned in descending order. We place the six ordered dinucleotides on opposite ends of the heterogenic cycle. The opposite ends are 1–4, 2–5, and 3–6. Any class (purine, pyrimidine; amino, keto; strong-H, weak-H) can be positioned at either end of the hexagon. Therefore, there are six possible combinations, as shown in Figure 2. These combinations are named Cycle 1, Cycle 2, Cycle 3, Cycle 4, Cycle 5, and Cycle 6 respectively.

Figure 2

Six combinations of heterogenic cycle in 2D space.

In Cycle 1, purines and pyrimidines are positioned at ends 1 and 4 of the hexagon respectively. A and G, the purines, form two dinucleotides: AG and GA. We keep AG on end 1 as it retains the order C→T→A→G. For the same reason, CT is placed on end 4. CA (amino) and TG (keto) are placed on ends 2 and 5, respectively, and CG (strong-H) and TA (weak-H) are positioned on ends 3 and 6, respectively. The midpoints of the 2–3, 3–4 and 4–5 arms are determined by the following rule: take the nucleotides that the two ends do not share and form a dinucleotide from them such that the descending order of molecular weight (G→A→T→C) holds. For example, the midpoint of the 2–3 arm is GA because CA and CG share C, so A and G are the uncommon nucleotides. The rule is different for the midpoints of the 5–6, 6–1 and 1–2 arms: take the common nucleotide followed by the nucleotide that appears on neither end. For example, the midpoint of the 5–6 arm is TC because T is common to TG and TA, while C is in neither. We follow these simple rules to position 12 dinucleotides on the hexagon (six on the ends and six on the midpoints of the arms). Based on the above, Cycle 1 is drawn in 2D Cartesian space, as shown in Figure 3.
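The two midpoint rules for Cycle 1 can be sketched in Python as follows. This is a minimal illustration that assumes the end assignments stated in the text (AG, CA, CG, CT, TG, TA on ends 1 to 6); the function and variable names are hypothetical.

```python
# Bases from lightest to heaviest, as stated in the text (C→T→A→G).
ASCENDING = "CTAG"

# Dinucleotides on the six ends of the hexagon in Cycle 1.
ENDS = {1: "AG", 2: "CA", 3: "CG", 4: "CT", 5: "TG", 6: "TA"}

def midpoint_uncommon(left, right):
    """Arms 2-3, 3-4, 4-5: pair the bases the ends do not share, heaviest first."""
    uncommon = set(left) ^ set(right)
    return "".join(sorted(uncommon, key=ASCENDING.index, reverse=True))

def midpoint_common(left, right):
    """Arms 5-6, 6-1, 1-2: the shared base followed by the base on neither end."""
    (common,) = set(left) & set(right)
    (missing,) = set(ASCENDING) - set(left) - set(right)
    return common + missing

midpoints = {}
for a, b in [(2, 3), (3, 4), (4, 5)]:
    midpoints[(a, b)] = midpoint_uncommon(ENDS[a], ENDS[b])
for a, b in [(5, 6), (6, 1), (1, 2)]:
    midpoints[(a, b)] = midpoint_common(ENDS[a], ENDS[b])

for arm, dinucleotide in midpoints.items():
    print(arm, dinucleotide)
```

The six results agree with the midpoint dinucleotides in the coordinate list derived from Figure 3.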

Figure 3

Cartesian coordinates of 16 dinucleotides in a hexagon.

From Figure 3, we can derive the position coordinates of the 16 dinucleotides: (0, 1.5) → AG, (0.5, 1.25) → AT, (1, 1) → CA, (1, 0) → GA, (1, −1) → CG, (0.5, −1.25) → GT, (0, −1.5) → CT, (−0.5, −1.25) → GC, (−1, −1) → TG, (−1, 0) → TC, (−1, 1) → TA, (−0.5, 1.25) → AC, (0, 1) → AA, (−0.5, 0) → CC, (−1, 0) → GG, (0.5, 0) → TT. Let $S = s_1 s_2 \ldots s_N$ be a DNA sequence, where $s_i \in \Sigma = \{A, T, C, G\}$ and $i = 1, 2, 3, \ldots, N$. S is mapped into a series of points $P_1, P_2, \ldots, P_{N-1}$. We introduce a map function $\varphi$ such that S can be formulated as $S = \varphi(s_i s_{i+1})\,\varphi(s_{i+1} s_{i+2}) \ldots \varphi(s_{N-1} s_N)$ with $\varphi(s_i s_{i+1}) = (x_{s_i s_{i+1}},\ y_{s_i s_{i+1}},\ i)$, where $x_{s_i s_{i+1}}$, $y_{s_i s_{i+1}}$ and $i$ represent the x-coordinate, y-coordinate, and z-coordinate respectively. Thus, we connect the N-1 points from the first one and derive a 3D curve. To capture the local and global features of the 3D curve as well as to visualize it, we take another numerical representation. Let $(x_k, y_k)$ be the planar coordinates of the $k$-th dinucleotide $s_k s_{k+1}$; we derive another mapping function for the cumulative feature of the 3D curve such that $P_i = (X_i, Y_i, Z_i) = \left(\sum_{k=1}^{i} x_k,\ \sum_{k=1}^{i} y_k,\ i\right)$. Connecting the N-1 points from the first one, we get the proposed novel 3D zigzag curve.
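A minimal Python sketch of this mapping is given below. It assumes the Cycle 1 coordinates exactly as listed in the text and accumulates the x and y values along the sequence while using the dinucleotide index as z. The names `CYCLE1` and `dna_curve` are illustrative and not from the original work.

```python
# Cycle 1 coordinates of the 16 dinucleotides, copied from the list in the text.
# Note: GG and TC carry the same printed coordinate (-1, 0) in the source.
CYCLE1 = {
    "AG": (0.0, 1.5), "AT": (0.5, 1.25), "CA": (1.0, 1.0), "GA": (1.0, 0.0),
    "CG": (1.0, -1.0), "GT": (0.5, -1.25), "CT": (0.0, -1.5), "GC": (-0.5, -1.25),
    "TG": (-1.0, -1.0), "TC": (-1.0, 0.0), "TA": (-1.0, 1.0), "AC": (-0.5, 1.25),
    "AA": (0.0, 1.0), "CC": (-0.5, 0.0), "GG": (-1.0, 0.0), "TT": (0.5, 0.0),
}

def dna_curve(sequence, coords=CYCLE1):
    """Return the N-1 cumulative 3D points of a DNA sequence of length N."""
    x = y = 0.0
    points = []
    for i in range(len(sequence) - 1):
        dx, dy = coords[sequence[i:i + 2]]  # overlapping dinucleotide s_i s_{i+1}
        x += dx
        y += dy
        points.append((x, y, i + 1))        # z is the 1-based dinucleotide index
    return points

points = dna_curve("ATACGATGCAG")
for p in points:
    print(p)
```

For the sequence ATACGATGCAG this reproduces the Cycle 1 columns of Table 1, which supports the cumulative reading of the mapping.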

Example of the proposed method

The following example uses the arbitrary DNA sequence ATACGATGCAG. The length of the string is 11, hence there are 10 dinucleotides. The 3D coordinates of the sequence for all cycles are shown in Table 1.

Table 1

3D coordinates of ATACGATGCAG based on the proposed method.

Points	Dinucleotide	Cycle 1			Cycle 2			Cycle 3			Cycle 4			Cycle 5			Cycle 6

		x	y	z	x	y	z	x	y	z	x	y	z	x	y	z	x	y	z
P₁	AT	0.5	1.25	1	1	0	1	0.5	−1.25	1	−0.5	−1.25	1	−1	0	1	−0.5	1.25	1
P₂	TA	−0.5	2.25	2	1	1.5	2	1.5	−0.25	2	0.5	−2.25	2	−1	−1.5	2	−1.5	0.25	2
P₃	AC	−1.0	3.5	3	1.5	2.75	3	2.5	−0.25	3	1.0	−3.5	3	−1.5	−2.75	3	−2.5	0.25	3
P₄	CG	0	2.5	4	1.5	1.25	4	1.5	−1.25	4	0	−2.5	4	−1.5	−1.25	4	−1.5	1.25	4
P₅	GA	1.0	2.5	5	2	0	5	1.0	−2.5	5	−1.0	−2.5	5	−2	0	5	−1.0	2.5	5
P₆	AT	1.5	3.75	6	3	0	6	1.5	−3.75	6	−1.5	−3.75	6	−3	0	6	−1.5	3.75	6
P₇	TG	0.5	2.75	7	2	1	7	1.5	−2.25	7	−0.5	−2.75	7	−2	−1	7	−1.5	2.25	7
P₈	GC	0	1.5	8	1	1	8	1	−1.0	8	0	−1.5	8	−1	−1	8	−1.0	1.0	8
P₉	CA	1.0	2.5	9	2	0	9	1	−2.5	9	−1	−2.5	9	−2	0	9	−1.0	2.5	9
P₁₀	AG	1.0	4.0	10	3	1	10	2	−3.5	10	−1	−4	10	−3	−1	10	−2.0	3.5	10
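The values in Table 1 are consistent with reading each successive cycle as a shift of the twelve perimeter dinucleotides by one end of the hexagon (two perimeter positions) clockwise. The Python sketch below works under that assumption, which is inferred from the table rather than stated in the text. It covers only the twelve perimeter dinucleotides, since the example sequence gives no information on how AA, CC, GG and TT move between cycles; the names are hypothetical.

```python
# Perimeter positions clockwise from end 1, with the Cycle 1 dinucleotide
# that sits at each position (ends and arm midpoints alternate).
PERIMETER = [
    ((0.0, 1.5), "AG"), ((0.5, 1.25), "AT"), ((1.0, 1.0), "CA"),
    ((1.0, 0.0), "GA"), ((1.0, -1.0), "CG"), ((0.5, -1.25), "GT"),
    ((0.0, -1.5), "CT"), ((-0.5, -1.25), "GC"), ((-1.0, -1.0), "TG"),
    ((-1.0, 0.0), "TC"), ((-1.0, 1.0), "TA"), ((-0.5, 1.25), "AC"),
]

def cycle_coords(cycle):
    """Coordinate map for Cycle 1..6 (perimeter dinucleotides only)."""
    shift = 2 * (cycle - 1)  # assumed: one hexagon end per cycle
    n = len(PERIMETER)
    return {PERIMETER[p][1]: PERIMETER[(p + shift) % n][0] for p in range(n)}

def dna_curve(sequence, coords):
    """Cumulative 3D points; raises KeyError for AA, CC, GG or TT."""
    x = y = 0.0
    points = []
    for i in range(len(sequence) - 1):
        dx, dy = coords[sequence[i:i + 2]]
        x += dx
        y += dy
        points.append((x, y, i + 1))
    return points

curves = {c: dna_curve("ATACGATGCAG", cycle_coords(c)) for c in range(1, 7)}
for c, curve in curves.items():
    print("Cycle", c, curve[-1])
```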

As for graphical representation, the 10 points P1, P2, …, P10 are plotted in 3D space for the example sequence ATACGATGCAG. The six possible DNA curves for the example sequence are shown in Figure 4.

Figure 4

The graphical representation of the proposed model for the example sequence ATACGATGCAG.

In this way, each DNA sequence is converted into a series of points, and a DNA curve is drawn from those points. Connecting the N-1 points from the first one, we get the proposed 3D zigzag curve in 3D space. The DNA curve helps to distinguish among different species. The example graphical representation contains no overlaps or loops. This property holds for any DNA sequence because the value of “i” (the z-coordinate) in the proposed method is unique for every point.

Graphical representation of the proposed method

The proposed model is useful for showing hidden properties of long DNA sequences that cannot be seen from the sequence itself. The pictorial presentation helps in understanding the evolutionary similarity/dissimilarity of different species. Figure 5 shows the 3D zigzag curves, based on Cycle 1, of the first exon of the β-globin gene for 11 different species. The graphical representation clearly shows that:

Figure 5

DNA curves of 11 different species.

DNA curves of human, gorilla, chimpanzee and lemur are closely similar; mouse and rat have nearly the same DNA curve, and the rabbit’s curve is similar to theirs; goat and bovine are similar; and Gallus and opossum appear to be outliers.

Experimental Analysis

Performance metric, dataset and experimental environment

To evaluate the performance of the proposed method, we examine the similarities/dissimilarities among the β-globin genes of 11 different species, listed in Table 2, which were also studied previously.16–20 The table shows the important characteristics of the dataset. First, we show the overall performance of the proposed method. To do this, two features are extracted from the DNA curves: (i) the geometric center and (ii) the mathematical descriptor. Each DNA sequence is finally represented by its mathematical descriptors, which form a six-dimensional feature vector. The Euclidean distance is then calculated between the feature vectors of the DNA sequences. Second, we draw the phylogenic tree from the similarity/dissimilarity matrix using the UPGMA method in the PHYLIP package. Finally, we compare the proposed method with the research works mentioned above16–20 to show its advantage over them.

Table 2

The first exon of β-globin gene of 11 different species.

Species	ID/Accession	Database	Length
Human	U01317	NCBI	92
Chimpanzee	X02345	NCBI	105
Gorilla	X61109	NCBI	93
Lemur	M15734	NCBI	92
Rat	X06701	NCBI	92
Mouse	V00722	NCBI	93
Rabbit	V00882	NCBI	92
Goat	M15387	NCBI	86
Bovine	X00376	NCBI	86
Opossum	J03643	NCBI	92
Gallus	V00409	NCBI	92

Our programs were written in Python 2.7 and run under the Windows XP operating system on a Pentium dual-core 2.13 GHz CPU with 2 GB of main memory. We used BioPython 1.60 for sequence parsing and ACD/ChemSketch for drawing the ring structures of the nucleotides.

Numerical analysis of the proposed method

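As stated earlier, two features are extracted from the DNA curves. First, the geometric center $(u_x, u_y, u_z)$ of each curve is calculated as the mean of its point coordinates: $u_x = \frac{1}{N-1}\sum_{i=1}^{N-1} X_i$, $u_y = \frac{1}{N-1}\sum_{i=1}^{N-1} Y_i$, $u_z = \frac{1}{N-1}\sum_{i=1}^{N-1} Z_i$, where $(X_i, Y_i, Z_i)$ are the coordinates of point $P_i$ of the curve. Table 3 shows the geometric centers of the 11 DNA curves.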
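As a small illustration, the geometric center can be computed as the mean of the curve points. The sketch below uses the Cycle 1 points of the example sequence ATACGATGCAG from Table 1 rather than the β-globin data; reading the center as a plain coordinate mean is an assumption supported by the u_z values in Table 3, and the function name is hypothetical.

```python
def geometric_center(points):
    """Mean x, y and z over all points of a DNA curve."""
    n = len(points)
    return tuple(sum(p[axis] for p in points) / n for axis in range(3))

# Cycle 1 points of ATACGATGCAG, copied from Table 1.
CYCLE1_POINTS = [
    (0.5, 1.25, 1), (-0.5, 2.25, 2), (-1.0, 3.5, 3), (0.0, 2.5, 4),
    (1.0, 2.5, 5), (1.5, 3.75, 6), (0.5, 2.75, 7), (0.0, 1.5, 8),
    (1.0, 2.5, 9), (1.0, 4.0, 10),
]

center = geometric_center(CYCLE1_POINTS)
print(center)
```

With z running from 1 to N-1, the mean z of a sequence of length 92 is 46, which matches most of the u_z entries in Table 3.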

Table 3

Geometrical center of 11 different species.

Species	Cycle 1			Cycle 2			Cycle 3			Cycle 4			Cycle 5			Cycle 6

	u_x	u_y	u_z	u_x	u_y	u_z	u_x	u_y	u_z	u_x	u_y	u_z	u_x	u_y	u_z	u_x	u_y	u_z
Human	−4.489	−15.0275	46	−9.81	−3.75	46	−6.74725	5.947802	46	−0.45055	9.55	46	6.967033	−1.71703	46	3.901099	−11.4203	46
Chimpanzee	−4.7981	−17.5024	52.5	−10.88	−4.89	52.5	−7.59135	5.947115	52.5	−0.70192	10.33	52.5	7.875	−2.28606	52.5	4.581731	−13.1202	52.5
Gorilla	−4.5163	−15.2255	46.5	−9.9	−3.85	46.5	−6.82065	5.942935	46.5	−0.47283	9.62	46.5	7.032609	−1.76087	46.5	3.951087	−11.5516	46.5
Lemur	−2.3132	−12.6181	46	−6.71	−2.23	46	−3.65385	6.035714	46	2.208791	8.78	46	8.208791	−1.61538	46	5.148352	−9.88187	46
Rat	−6.7802	−11.8489	46	−9.78	1.53	46	−3.74176	10.74725	46	3.615385	9.04	46	8.296703	−4.3489	46	2.258242	−13.5604	46
Mouse	−5.6882	−18.2285	47	−13	−1.65	47	−7.8172	13.25	47	2.844086	16.7	47	12.13441	0.172043	47	6.903226	−14.7769	47
Rabbit	−2.3736	−13.6401	46	−7.31	−4.88	46	−5.9011	2.239011	46	−2.49451	7.46	46	5.39011	−1.29121	46	3.978022	−8.41484	46
Goat	−2.7824	−16.1882	43	−9.106	−4.98	43	−6.68235	5.679412	43	1.088235	10.82	43	14.5	−2.75	43	5.964706	−11.0441	43
Bovine	−1.7824	−14.6706	43	−7.747	−4.76	43	−6.06471	4.447059	43	0.517647	10.15	43	12	−2.25	43	5.864706	−8.96471	43
Opossum	−1.8571	−3.8159	46	−1.92	0.266	46	0.489011	2.379121	46	1.686813	0.431	46	3.032967	−3.6511	46	0.620879	−5.76374	46
Gallus	−2.4615	−11.5604	46	−5.9	−5.24	46	−4.34615	0.898352	46	−0.33516	3.341	46	4.071429	−2.97527	46	2.521978	−9.11813	46

The significance of the geometric center is that it gives the average values of the x, y and z coordinates. Normally, if the geometric centers are plotted, similar species fall into the same cluster. The geometric center is therefore an important feature of the DNA curves for analyzing the evolutionary relationship among different species. To make our system unbiased, we take all the possible rotations (Cycle 1, Cycle 2, …, Cycle 6) of the hexagonal ring structure and extract the geometric center for each. Second, a mathematical descriptor is obtained from the geometric center of each cycle as its distance from the origin, $p = \sqrt{u_x^2 + u_y^2 + u_z^2}$. Table 4 shows the mathematical descriptors of the 11 curves.
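The human rows of Tables 3 and 4 are consistent with the descriptor being the Euclidean norm of the geometric center. The Python sketch below checks this reading on the human data; the formula is inferred from the tabulated values, and the names are illustrative.

```python
import math

def descriptor(center):
    """Distance of a geometric center (u_x, u_y, u_z) from the origin."""
    return math.sqrt(sum(c * c for c in center))

# Geometric centers of the human curve for Cycles 1 to 6 (Table 3).
HUMAN_CENTERS = [
    (-4.489, -15.0275, 46), (-9.81, -3.75, 46), (-6.74725, 5.947802, 46),
    (-0.45055, 9.55, 46), (6.967033, -1.71703, 46), (3.901099, -11.4203, 46),
]

# Six descriptors, one per cycle, form the feature vector of the sequence.
feature_vector = [descriptor(c) for c in HUMAN_CENTERS]
print([round(p, 5) for p in feature_vector])
```

The six values agree with the human row of Table 4 to about three decimal places; small differences come from rounding in the tabulated centers.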

Table 4

Mathematical descriptor of 11 different species.

Species	Cycle 1	Cycle 2	Cycle 3	Cycle 4	Cycle 5	Cycle 6
Human	48.60017	47.18367	46.87112	46.98303	46.55629	47.55673
Chimpanzee	55.54823	53.83806	53.37834	53.51123	53.13654	54.30821
Gorilla	49.13718	47.69782	47.37182	47.48703	47.06175	48.07599
Lemur	47.75529	46.54027	46.53795	46.88248	46.75461	47.3303
Rat	47.98299	47.05305	47.38675	47.01907	46.9441	48.01026
Mouse	50.73099	48.79265	49.45373	49.95977	48.54146	49.74948
Rabbit	48.03838	46.83215	46.43098	46.6677	46.33272	46.93223
Goat	46.03042	44.23482	43.88519	44.35377	43.81217	44.79453
Bovine	45.46871	43.95081	43.65269	44.18473	43.65795	44.31434
Opossum	46.19535	46.04082	46.06408	46.03293	46.24424	46.36385
Gallus	47.49423	46.67191	46.21359	46.12239	46.27557	46.96276

We use the Euclidean distance for similarity measurement. Let i and j be two different species. The mathematical descriptors of i are p1i, p2i, p3i, p4i, p5i and p6i, and those of j are p1j, p2j, p3j, p4j, p5j and p6j. The Euclidean distance between i and j is then calculated as $d_{ij} = \sqrt{\sum_{k=1}^{6} (p_{ki} - p_{kj})^2}$. The similarity/dissimilarity matrix obtained from this distance is shown in Table 5.
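A minimal Python sketch of the distance computation on three rows of Table 4 follows. It applies the plain Euclidean distance to the descriptors as printed. The resulting values are on a different scale from the entries of Table 5, and the text does not say how those entries were scaled, so the sketch illustrates only the distance formula and the relative ordering of pairs. The names are hypothetical.

```python
import math

# Six-dimensional descriptor vectors copied from Table 4.
DESCRIPTORS = {
    "Human": [48.60017, 47.18367, 46.87112, 46.98303, 46.55629, 47.55673],
    "Chimpanzee": [55.54823, 53.83806, 53.37834, 53.51123, 53.13654, 54.30821],
    "Gorilla": [49.13718, 47.69782, 47.37182, 47.48703, 47.06175, 48.07599],
}

def euclidean(p, q):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

species = sorted(DESCRIPTORS)
matrix = {(a, b): euclidean(DESCRIPTORS[a], DESCRIPTORS[b])
          for a in species for b in species}

for (a, b), d in matrix.items():
    if a < b:
        print(a, b, round(d, 4))
```

As in Table 5, the pair (human, gorilla) comes out closer than the pair (human, chimpanzee).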

Table 5

Euclidean distance among 11 different species.

Species	Chimpanzee	Gorilla	Lemur	Rat	Mouse	Rabbit	Goat	Bovine	Opossum	Gallus
Human	0.0020	0.0002	0.0124	0.0109	0.0336	0.0116	0.0236	0.0139	0.0343	0.0189
Chimpanzee		0.0018	0.0138	0.0115	0.0330	0.0131	0.0226	0.0137	0.0358	0.0201
Gorilla			0.0126	0.0109	0.0336	0.0118	0.0235	0.0139	0.0344	0.0190
Lemur				0.0134	0.0420	0.0080	0.0280	0.0156	0.0236	0.0114
Rat					0.0155	0.0174	0.0233	0.0318	0.0342	0.0218
Mouse						0.0443	0.0280	0.0330	0.0644	0.0516
Rabbit							0.0311	0.0186	0.0238	0.0088
Goat								0.0131	0.0494	0.0368
Bovine									0.0370	0.0245
Opossum										0.0169

Several observations follow from Table 5, and they are consistent with the graphical representation in Section 3.4. The smallest entry is 0.0002 for the pair (human, gorilla), showing that human and gorilla are almost the same in terms of evolutionary characteristics. The same holds for the pair (human, chimpanzee), with an entry of 0.0020. Therefore, human, chimpanzee and gorilla are similar species. The pair (goat, bovine) has a small entry, 0.0131, which indicates the evolutionary similarity between goat and bovine. In biological taxonomy, both bovine and goat are even-toed ungulates and belong to the family “Bovidae”.16 Rat and mouse also show a small entry, which indicates their evolutionary closeness. The remote mammal opossum has large entries with respect to the other mammals.

Phylogenic analysis

A phylogenic tree was drawn from the above similarity matrix using the UPGMA method of the PHYLIP software package to show the relationship among the species. The tree is shown in Figure 6.
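The original tree was built with PHYLIP. As an illustration only, the sketch below implements a small pure-Python UPGMA (average-linkage) clustering and applies it to five species from Table 5. It is not the PHYLIP implementation, it reports only the tree topology without branch lengths, and the function and variable names are hypothetical.

```python
from itertools import combinations

def upgma(names, dist):
    """Average-linkage clustering; returns (newick_topology, merge_history).

    dist maps frozenset({a, b}) to the distance between leaves a and b.
    """
    # Each cluster is (newick_label, tuple_of_leaf_names).
    clusters = [(name, (name,)) for name in names]
    merges = []

    def average(c1, c2):
        # UPGMA distance: mean of all leaf-to-leaf distances between clusters.
        pairs = [dist[frozenset((a, b))] for a in c1[1] for b in c2[1]]
        return sum(pairs) / len(pairs)

    while len(clusters) > 1:
        # Pick the closest pair of clusters and merge them.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: average(clusters[ij[0]], clusters[ij[1]]))
        c1, c2 = clusters[i], clusters[j]
        merges.append((c1[1], c2[1]))
        merged = ("(%s,%s)" % (c1[0], c2[0]), c1[1] + c2[1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0][0], merges

NAMES = ["Human", "Chimpanzee", "Gorilla", "Goat", "Bovine"]
# Pairwise entries copied from Table 5.
TABLE5 = {
    ("Human", "Chimpanzee"): 0.0020, ("Human", "Gorilla"): 0.0002,
    ("Human", "Goat"): 0.0236, ("Human", "Bovine"): 0.0139,
    ("Chimpanzee", "Gorilla"): 0.0018, ("Chimpanzee", "Goat"): 0.0226,
    ("Chimpanzee", "Bovine"): 0.0137, ("Gorilla", "Goat"): 0.0235,
    ("Gorilla", "Bovine"): 0.0139, ("Goat", "Bovine"): 0.0131,
}
DIST = {frozenset(pair): d for pair, d in TABLE5.items()}

tree, history = upgma(NAMES, DIST)
print(tree)
```

On this subset the primates and the two bovids form separate groups, in line with the groupings reported for Figure 6.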

Figure 6

Phylogenic analysis of 11 different species.

The tree also shows the similarity within the groups (human, chimpanzee, gorilla), (mouse, rat), and (goat, bovine). In contrast, gallus is the outlier, and opossum is the mammalian species most remote from the others.

Comparison with other methods

We see that there is overall agreement between the numerical and phylogenic analyses. To show this visually, we set the degree of similarity of the pair (human, gorilla) in Table 5 to 1, and plot the degree of similarity/dissimilarity between human and each of the other species under the Euclidean measurement in Figure 7. To draw the curves for the other methods, we used Table 3 of Qi’s work,16 Table 7 of Jafarzadeh’s work,17 Table 2 of Huang’s work,18 Table 3 of Qi’s work,19 and Table 4 of Liao’s work.20 Those tables provide the best similarity/dissimilarity values reported in those works.

Figure 7

The degree of similarity/dissimilarity of the other 10 species with human.

Several reference papers16–20 work on the same dataset. Of them, the papers of Qi16 and Huang18 were based on dinucleotides, Jafarzadeh17 used trinucleotides, and Qi19 and Liao20 used single nucleotides. Those works do not reflect the degree of similarity/dissimilarity among different species as accurately as they should. For example, the degrees of similarity/dissimilarity for the pairs (rat, opossum), (mouse, opossum), and (goat, opossum) are almost the same in the above papers.16–20 This is not true in nature, as the opossum is the most remote mammal species. It can therefore be concluded that the intra-mammalian degree of similarity/dissimilarity is not properly reflected by those methods. The proposed method, on the other hand, clearly shows this natural consistency among (rat, opossum), (mouse, opossum), and (goat, opossum). As the opossum shows the highest peak, it is not an outlier of the dataset but is actually very different from the other mammalian species. The only non-mammalian species, gallus, is not truly represented by the above methods, as the difference between the peak values of gallus and opossum is not significant. From Table 5, we see that the similarity score of the opossum varies from species to species. This is reflected in Figure 7. Gallus has a positive difference of distance, in terms of degree of similarity/dissimilarity, from all species analyzed except goat and opossum, for which the difference is negative. From this analysis alone, goat and opossum would be either both mammals or both non-mammals; since the goat is known to be a mammal, the opossum must also be a mammal, albeit the one most remote from the remaining mammals. As a result, gallus is the only species that is neither mammalian nor shows a score like that of the opossum. Hence, gallus falls into a group of its own within the species analyzed: non-mammalian.

Conclusion

A graphical method based on dinucleotides and their positional information is proposed in this work. Graphical and numerical analyses of the model show that the proposed method is consistent with the natural evolutionary relationship of 11 different species. In this paper, DNA sequences are transformed into 3D DNA curves, and features are then extracted from those curves. Each DNA curve is represented by its feature vector. Subsequently, the Euclidean distance is applied to those feature vectors to deduce the evolutionary relationship among the 11 species. Trinucleotide-based DNA sequence analysis using the proposed method is recommended as future work.

9 in total

1. Analysis of similarity/dissimilarity of DNA sequences based on nonoverlapping triplets of nucleotide bases.

Authors: Bo Liao; Tian-Ming Wang
Journal: J Chem Inf Comput Sci Date: 2004 Sep-Oct

2. Numerical characterization of DNA sequences based on digital signal method.

Authors: Zhao-Hui Qi; Xiao-Qin Qi
Journal: Comput Biol Med Date: 2009-03-03 Impact factor: 4.589

3. Defragged Binary I Ching Genetic Code Chromosomes Compared to Nirenberg's and Transformed into Rotating 2D Circles and Squares and into a 3D 100% Symmetrical Tetrahedron Coupled to a Functional One to Discern Start From Non-Start Methionines through a Stella Octangula.

Authors: Fernando Castro-Chavez
Journal: J Proteome Sci Comput Biol Date: 2012

4. A Tetrahedral Representation of the Genetic Code Emphasizing Aspects of Symmetry.

Authors: Fernando Castro-Chavez
Journal: BIOcomplexity Date: 2012-06-29

5. C-curve: a novel 3D graphical representation of DNA sequence based on codons.

Authors: Nafiseh Jafarzadeh; Ali Iranmanesh
Journal: Math Biosci Date: 2012-12-13 Impact factor: 2.144

6. H curves, a novel method of representation of nucleotide series especially suited for long DNA sequences.

Authors: E Hamori; J Ruskin
Journal: J Biol Chem Date: 1983-01-25 Impact factor: 5.157

7. Most Used Codons per Amino Acid and per Genome in the Code of Man Compared to Other Organisms According to the Rotating Circular Genetic Code.

Authors: Fernando Castro-Chavez
Journal: Neuroquantology Date: 2011-12

8. A novel model for DNA sequence similarity analysis based on graph theory.

Authors: Xingqin Qi; Qin Wu; Yusen Zhang; Eddie Fuller; Cun-Quan Zhang
Journal: Evol Bioinform Online Date: 2011-10-04 Impact factor: 1.625

9. Numerical characterization of DNA sequence based on dinucleotides.

Authors: Xingqin Qi; Edgar Fuller; Qin Wu; Cun-Quan Zhang
Journal: ScientificWorldJournal Date: 2012-04-24
