Abstract
In theoretical physics, there exist two basic mathematical approaches, algebraic and geometrical methods, which, in most cases, are complementary. In the area of genome sequence analysis, however, algebraic approaches have been widely used, while geometrical approaches have long been less explored. The Z-curve theory is a geometrical approach to genome analysis. The Z-curve is a three-dimensional curve that represents a given DNA sequence in the sense that each can be uniquely reconstructed given the other. The Z-curve, therefore, contains all the information that the corresponding DNA sequence carries. The analysis of a DNA sequence can then be performed by studying the corresponding Z-curve. The Z-curve method has found applications in a wide range of areas in the past two decades, including the identification of protein-coding genes, replication origins, horizontally transferred genomic islands, promoters, translational start sites and isochores, as well as studies on phylogenetics, genome visualization and comparative genomics. Here, we review the progress of Z-curve studies from aspects of both theory and applications in genome analysis.
Keywords: GC profile; Gene finding; Genomic island; Replication origin; Z-curve.
Year: 2014 PMID: 24822026 PMCID: PMC4009844 DOI: 10.2174/1389202915999140328162433
Source DB: PubMed Journal: Curr Genomics ISSN: 1389-2029 Impact factor: 2.236
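The one-to-one correspondence between a DNA sequence and its Z-curve stated in the abstract can be illustrated with a minimal Python sketch. It assumes the standard Z-curve definition, which this record does not spell out: x accumulates purines (A, G) minus pyrimidines (C, T), y accumulates amino bases (A, C) minus keto bases (G, T), and z accumulates weak hydrogen-bond bases (A, T) minus strong ones (G, C). The names `STEP`, `z_curve` and `sequence_from_z_curve` are illustrative and do not come from any published Z-curve software.

```python
# Per-base unit steps under the assumed standard Z-curve definition:
#   x: purine (A, G) vs pyrimidine (C, T)
#   y: amino (A, C) vs keto (G, T)
#   z: weak H-bond (A, T) vs strong H-bond (G, C)
STEP = {
    "A": (1, 1, 1),
    "C": (-1, 1, -1),
    "G": (1, -1, -1),
    "T": (-1, -1, 1),
}
# Each base has a distinct step, so the mapping can be inverted.
BASE = {step: base for base, step in STEP.items()}


def z_curve(seq):
    """Return the list of points (x_n, y_n, z_n) for n = 1..len(seq)."""
    points = []
    x = y = z = 0
    for base in seq.upper():
        dx, dy, dz = STEP[base]
        x, y, z = x + dx, y + dy, z + dz
        points.append((x, y, z))
    return points


def sequence_from_z_curve(points):
    """Recover the DNA sequence from consecutive differences of the curve."""
    bases = []
    prev = (0, 0, 0)
    for point in points:
        step = tuple(c - p for c, p in zip(point, prev))
        bases.append(BASE[step])
        prev = point
    return "".join(bases)
```

Because the four steps are pairwise distinct, the difference between consecutive points identifies the base at each position, which is why no information is lost in either direction.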
Twelve Elements of the DNA Group (A4 Group or the Tetrahedron Group).
| Element | A4 Group | Tetrahedron Group |
|---|---|---|
| I | A C G T | x y z |
| Rx | G T A C | x -y -z |
| Ry | C A T G | -x y -z |
| Rz | T G C A | -x -y z |
| RA | A T C G | z x y |
| RC | G C T A | z -x -y |
| RG | T A G C | -z -x y |
| RT | C G A T | -z x -y |
| R2A | A G T C | y z x |
| R2C | T C A G | -y -z x |
| R2G | C T G A | -y z -x |
| R2T | G A C T | y -z -x |
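The correspondence in the table above can be checked mechanically. The sketch below is an illustration rather than code from the reviewed work. It assumes that each entry in the A4 Group column lists the images of A, C, G and T in that order, that each entry in the Tetrahedron Group column gives the new (x, y, z) in terms of the old coordinates, and that the four bases sit at the vertices (±1, ±1, ±1) implied by the standard Z-curve definition. The names `VERTEX`, `TABLE`, `apply_transform`, `is_even` and `table_is_consistent` are hypothetical.

```python
# Assumed unit vertices of the tetrahedron (scale is irrelevant for this check).
VERTEX = {"A": (1, 1, 1), "C": (-1, 1, -1), "G": (1, -1, -1), "T": (-1, -1, 1)}

# Element -> (images of A, C, G, T; new coordinates in terms of old x, y, z),
# transcribed from the table above.
TABLE = {
    "I":   ("ACGT", "x y z"),
    "Rx":  ("GTAC", "x -y -z"),
    "Ry":  ("CATG", "-x y -z"),
    "Rz":  ("TGCA", "-x -y z"),
    "RA":  ("ATCG", "z x y"),
    "RC":  ("GCTA", "z -x -y"),
    "RG":  ("TAGC", "-z -x y"),
    "RT":  ("CGAT", "-z x -y"),
    "R2A": ("AGTC", "y z x"),
    "R2C": ("TCAG", "-y -z x"),
    "R2G": ("CTGA", "-y z -x"),
    "R2T": ("GACT", "y -z -x"),
}


def apply_transform(spec, vertex):
    """Apply a signed coordinate permutation such as 'z -x -y' to a vertex."""
    coords = dict(zip("xyz", vertex))
    result = []
    for term in spec.split():
        sign = -1 if term.startswith("-") else 1
        result.append(sign * coords[term[-1]])
    return tuple(result)


def is_even(perm):
    """True if the permutation of 'ACGT' has an even number of inversions."""
    idx = ["ACGT".index(b) for b in perm]
    inversions = sum(
        1 for i in range(4) for j in range(i + 1, 4) if idx[i] > idx[j]
    )
    return inversions % 2 == 0


def table_is_consistent():
    """Check that every rotation moves each base's vertex to its listed image."""
    base_at = {v: b for b, v in VERTEX.items()}
    for perm, spec in TABLE.values():
        for src, dst in zip("ACGT", perm):
            if base_at[apply_transform(spec, VERTEX[src])] != dst:
                return False
    return True
```

Under these assumptions every row is consistent, and all twelve base permutations are even, as expected for the alternating group A4.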
Coordinates of the 4 Vertices of the Regular Tetrahedron ACGT a.
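| Coordinate | A | C | G | T |
|---|---|---|---|---|
| x | $\sqrt{3}/4$ | $-\sqrt{3}/4$ | $\sqrt{3}/4$ | $-\sqrt{3}/4$ |
| y | $\sqrt{3}/4$ | $\sqrt{3}/4$ | $-\sqrt{3}/4$ | $-\sqrt{3}/4$ |
| z | $\sqrt{3}/4$ | $-\sqrt{3}/4$ | $-\sqrt{3}/4$ | $\sqrt{3}/4$ |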
a Refer to Fig. 2(d) for the original coordinate system, where the height of the tetrahedron is 1. Consequently, the edge length of the tetrahedron is $\sqrt{6}/2$, and the edge length of the cube is $\sqrt{3}/2$.
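The geometry in this footnote follows from the height alone, which a short numerical sketch can confirm. It assumes the tetrahedron is centred at the origin with vertices proportional to (±1, ±1, ±1), as in the standard Z-curve construction, and rescales them so that the height equals 1. The names `UNIT` and `scaled_tetrahedron` are hypothetical.

```python
import math

# Assumed unscaled vertices; only their relative positions matter here.
UNIT = {"A": (1, 1, 1), "C": (-1, 1, -1), "G": (1, -1, -1), "T": (-1, -1, 1)}


def scaled_tetrahedron(height=1.0):
    """Rescale the unit vertices so the tetrahedron has the given height."""
    # In a regular tetrahedron the altitude from A meets the opposite face
    # at that face's centroid.
    opposite = [UNIT[b] for b in "CGT"]
    centroid = tuple(sum(c) / 3 for c in zip(*opposite))
    unit_height = math.dist(UNIT["A"], centroid)
    scale = height / unit_height
    return {b: tuple(scale * c for c in v) for b, v in UNIT.items()}


vertices = scaled_tetrahedron(1.0)
tetrahedron_edge = math.dist(vertices["A"], vertices["C"])
# The circumscribing cube spans from -s to +s along each axis.
cube_edge = 2 * vertices["A"][0]
```

With height 1, `tetrahedron_edge` equals √6/2, `cube_edge` equals √3/2, and each vertex coordinate has magnitude √3/4.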
A Partial List of Z-curve Applications in Genome Analysis.
| Research areas | Involved Z-Curve Components | Algorithm, Software or Database | Life Domains or Virus | Species |
|---|---|---|---|---|
| Protein-coding gene recognition a | x, y, z, S | Z-curve algorithm [1, 2], Zcurve [12] | Bacteria | Acinetobacter baumannii [17], Variovorax paradoxus [18], Amycolatopsis mediterranei [19], Bacillus thuringiensis [20], Streptomyces tendae [21], Phaeobacter gallaeciensis [22], Desulfobacterium autotrophicum [23], Mycobacterium tuberculosis [24], Magnetospirillum gryphiswaldense [25], Beggiatoa [26] |
| | | | Phage, plasmid | Fosmids of marine Planctomycetes [127], plasmids in the human gut [128], phage Rtp [129] |
| | | | Archaea | Archaea of the ANME-1 group [27] |
| | | | Eukaryotes | Leptospira interrogans [130], Yeast [11], Short human protein-coding genes [56, 131], Drosophila [55] |
| | | Zcurve_V [14], Zcurve_CoV [15] | Virus, Coronavirus, phages | Prophage [33], Me Tri virus [28], novel human coronaviruses NL63 and HKU1 [34], novel bat coronaviruses [35], bat coronaviruses 1A, 1B and HKU8 [36], novel human coronavirus [37] |
| | | | SARS_CoV | Various strains of SARS_CoV [38-53] |
| Replication origin identification | AT, GC, MK and RY disparity b | Ori-finder [78], DoriC [132, 133] | Archaea | Methanosarcina mazei [69], Halobacterium species NRC-1 [63], Methanocaldococcus jannaschii [68], Sulfolobus acidocaldarius [72], Haloferax volcanii [73], Desulfurococcus kamchatkensis [74], Thermococcus sibiricus [75], Sulfolobus islandicus [76] |
| | | | Bacteria | Moraxella catarrhalis [79], Sorangium cellulosum [80], Microcystis aeruginosa [80], Cyanothece [81], Cupriavidus metallidurans [82], Azolla filiculoides [83], Variovorax paradoxus [18], Corynebacterium pseudotuberculosis [84, 85], Orientia tsutsugamushi [86], Propionibacterium freudenreichii [87], Laribacter hongkongensis [88], Legionella pneumophila [89], Ehrlichia canis [90] |
| | | | Phage, plasmid | Streptococcus pneumoniae Virulent Phage Dp-1 [134], R-plasmid pPRS3a from Bacillus cereus [135] |
| Genomic island identification | z’ | GC profile [9, 10] | Bacteria | Corynebacterium efficiens [105], Rhodopseudomonas palustris [106], Corynebacterium glutamicum [104], Vibrio vulnificus and Bacillus cereus [103], Agrobacterium tumefaciens, Ralstonia solanacearum, Xanthomonas axonopodis, Xanthomonas campestris, Xylella fastidiosa and Pseudomonas syringae [107], Streptomyces lividans [108], Parachlamydiaceae UWE25 [109], epsilon proteobacteria Sulfurovum and Nitratiruptor [110], Acinetobacter oleivorans [111], Silicibacter pomeroyi [112] |
| | | | Archaea | Haloquadratum walsbyi [136] |
| GC content variation, isochore, genome segmentation | z’, S | GC profile [9, 10] | Eukaryotes | Human genome: isochores [94, 98, 137] and replication time zones [138]; Isochores for chicken [97], Arabidopsis thaliana [96], mice [95] and pig [99]; DNA curvature profile for Aspergillus fumigatus [100] |
| | | | Bacteria | Bifidobacterium longum [139], Streptomyces avermitilis [140], Erwinia amylovora [141], Ralstonia pickettii [142] |
| Promoter, translational start sites, nucleosome positioning | x, y, z | Z-curve algorithm [11, 12], GS-finder [113] | Bacteria | Translational start sites [113] and promoters [115] of Escherichia coli and Bacillus subtilis |
| | | | Eukaryotes | Human Pol II promoter [114], Yeast genome for stable and dynamic nucleosome positioning [116] |
| Comparative genomics, genome visualization | x, y, z, z’ | Z-curve database [117] | Bacteria, archaea, eukaryotes and viruses | |
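The Z-curve components listed in the table above can be sketched in a few lines of Python. This record does not define them, so the formulas below are assumptions based on the usual Z-curve conventions: the RY and MK disparities are the x and y components, the AT and GC disparities are (x + y)/2 and (x − y)/2 (the cumulative excess of A over T and of G over C), and z’ is the z component with a fitted linear trend k·n removed. Fitting the line through the origin by least squares is a choice made for this sketch. The names `components`, `disparity_curves` and `gc_profile` are hypothetical and are not the interfaces of Ori-finder or GC profile.

```python
# Per-base unit steps under the assumed standard Z-curve definition.
STEP = {"A": (1, 1, 1), "C": (-1, 1, -1), "G": (1, -1, -1), "T": (-1, -1, 1)}


def components(seq):
    """Return the cumulative x, y and z components as three lists."""
    xs, ys, zs = [], [], []
    x = y = z = 0
    for base in seq.upper():
        dx, dy, dz = STEP[base]
        x, y, z = x + dx, y + dy, z + dz
        xs.append(x)
        ys.append(y)
        zs.append(z)
    return xs, ys, zs


def disparity_curves(seq):
    """Return the four disparity curves (assumed definitions, see lead-in)."""
    xs, ys, _ = components(seq)
    return {
        "RY": xs,                                      # purines - pyrimidines
        "MK": ys,                                      # amino - keto
        "AT": [(x + y) // 2 for x, y in zip(xs, ys)],  # A count - T count
        "GC": [(x - y) // 2 for x, y in zip(xs, ys)],  # G count - C count
    }


def gc_profile(seq):
    """Return z'_n = z_n - k*n, with k from a least-squares line through 0."""
    _, _, zs = components(seq)
    if not zs:
        return []
    ns = range(1, len(zs) + 1)
    k = sum(n * z for n, z in zip(ns, zs)) / sum(n * n for n in ns)
    return [z - k * n for n, z in zip(ns, zs)]
```

Under these definitions z rises by one for each A or T and falls by one for each G or C, so a stretch where z’ climbs is more AT-rich than the sequence average and a stretch where it falls is more GC-rich.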