Eran Elhaik, Dan Graur, Kresimir Josić.
Abstract
BACKGROUND: The Z-curve is a three-dimensional representation of DNA sequences that was proposed over a decade ago and has been applied extensively to sequence segmentation, horizontal gene transfer detection, and sequence analysis. Based on the Z-curve, a "genome order index" was proposed, defined as S = a² + c² + t² + g², where a, c, t, and g are the frequencies of the nucleotides A, C, T, and G, respectively. This index was found to be smaller than 1/3 for almost all tested genomes, which was taken as support for the existence of a constraint on genome composition. A geometric explanation for this constraint has been suggested: each genome was represented by a point P whose distances from the four faces of a regular tetrahedron were given by the frequencies a, c, t, and g. It was claimed that an inscribed sphere of radius r = 1/√3 contains almost all points corresponding to various genomes, implying that S < r². The distribution of the points P obtained by S was studied using the Z-curve.
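The definitions in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names are hypothetical, ambiguous bases (anything other than A, C, G, T) are simply ignored by assumption, and the Z-curve coordinates are taken as the three frequency differences AG vs. CT, AC vs. GT, and AT vs. GC. Because a + c + g + t = 1, these coordinates satisfy 4S = 1 + x² + y² + z², so S < 1/3 is equivalent to x² + y² + z² < 1/3.

```python
from collections import Counter


def nucleotide_frequencies(seq):
    """Return (a, c, g, t) as fractions of the unambiguous bases in seq."""
    counts = Counter(seq.upper())
    total = sum(counts[base] for base in "ACGT")
    if total == 0:
        raise ValueError("sequence contains no A, C, G, or T")
    # Assumption: bases other than A, C, G, T are ignored.
    return tuple(counts[base] / total for base in "ACGT")


def genome_order_index(seq):
    """Genome order index S = a^2 + c^2 + t^2 + g^2."""
    a, c, g, t = nucleotide_frequencies(seq)
    return a * a + c * c + t * t + g * g


def z_curve_point(seq):
    """Frequency-based Z-curve coordinates (x, y, z) of a sequence."""
    a, c, g, t = nucleotide_frequencies(seq)
    x = (a + g) - (c + t)  # AG versus CT
    y = (a + c) - (g + t)  # AC versus GT
    z = (a + t) - (g + c)  # AT versus GC
    return x, y, z


if __name__ == "__main__":
    example = "ATGCGATATTAACGT"  # made-up sequence, for illustration only
    s = genome_order_index(example)
    x, y, z = z_curve_point(example)
    print("S =", s, "below 1/3:", s < 1 / 3)
    print("x^2 + y^2 + z^2 =", x * x + y * y + z * z)
```

A sequence with equal base frequencies gives the minimum S = 1/4 and sits at the origin of the Z-curve coordinates. A single-base sequence gives the maximum S = 1.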
Year: 2010 PMID: 20158921 PMCID: PMC2841071 DOI: 10.1186/1745-6150-5-10
Source DB: PubMed Journal: Biol Direct ISSN: 1745-6150 Impact factor: 4.540
Figure 1. Points of 235 bacterial genomes mapped to a regular tetrahedron with a height of 1 and edge lengths of √6/2. Here, the face of the tetrahedron towards the observer is flat against the plane of projection. The points are mapped according to Z-curve coordinates with an origin in the center of the tetrahedron. The two spheres in the figure are the inscribed sphere of the regular tetrahedron, with a radius of 0.25, and the sphere calculated by Zhang and Zhang [7]. Forty-five percent of the points P lie outside the inscribed sphere, thus violating the "constraint."
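The tetrahedron geometry in this caption can be checked numerically. The sketch below is an illustration under stated assumptions, not a reconstruction of the figure: the vertex coordinates, the assignment of frequencies to vertices, and all function names are hypothetical choices. It builds a regular tetrahedron of height 1 centered at the origin, computes its edge length and inscribed-sphere radius, and places a point P as the weighted average of the vertices so that its distance from each face equals the corresponding frequency.

```python
import math

# Regular tetrahedron centered at the origin, scaled so that its height is 1.
# The choice of vertices is an assumption made for this sketch.
SCALE = math.sqrt(3) / 4
VERTICES = [
    tuple(SCALE * coord for coord in corner)
    for corner in [(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1)]
]


def edge_length():
    """Length of one edge (all edges are equal in a regular tetrahedron)."""
    return math.dist(VERTICES[0], VERTICES[1])


def inradius():
    """Radius of the inscribed sphere: distance from the center to a face."""
    face = VERTICES[1:]  # the face opposite vertex 0
    centroid = tuple(sum(v[i] for v in face) / 3 for i in range(3))
    return math.hypot(*centroid)


def point_from_frequencies(freqs):
    """Weighted average of the vertices; freqs must sum to 1."""
    return tuple(
        sum(w * v[i] for w, v in zip(freqs, VERTICES)) for i in range(3)
    )


def distances_to_faces(p):
    """Distance from an interior point p to the face opposite each vertex."""
    r = inradius()
    result = []
    for v in VERTICES:
        norm = math.hypot(*v)
        # The face opposite v lies in the plane x . (-v/|v|) = r.
        result.append(r + sum(pi * vi for pi, vi in zip(p, v)) / norm)
    return result


if __name__ == "__main__":
    freqs = (0.4, 0.3, 0.2, 0.1)  # made-up frequencies, for illustration only
    p = point_from_frequencies(freqs)
    print("edge length:", edge_length())
    print("inradius:", inradius())
    print("distances to faces:", distances_to_faces(p))
```

In this construction the four distances always sum to the height of the tetrahedron, which is why frequencies summing to 1 map to a point inside a tetrahedron of height 1.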
Figure 2. Histogram of the genome order index S.
Figure 3. Genome order index S. The bacterial genome at (1.76, 0.32) has the smallest GC content, 0.22.
Figure 4. Histograms of the three Z-curve coordinates for 235 bacterial genomes, representing the differences between the nucleotides (a) AG and CT, (b) AC and GT, and (c) AT and GC.