Abstract
Understanding the evolution, structure and function of genomes requires knowledge of the general compositional features of DNA sequences. A new segmentation algorithm based on the quadratic divergence has been put forward to partition a given genome or DNA sequence into compositionally distinct domains. With the aid of the cumulative GC profile technique, the distribution of segmentation points can be displayed intuitively. We have therefore combined the two into GC-Profile, an interactive web-based software system that can be used to segment prokaryotic and eukaryotic genomes. GC-Profile provides a quantitative and qualitative view of genome organization. Based on the results obtained, the relationships between the G+C content and other genomic features, such as the distributions of genes and CpG islands, can be analyzed in an easily perceivable manner. The results suggest that GC-Profile is an appropriate starting point for analyzing the isochore structure of higher eukaryotic genomes, and an intuitive tool for identifying genomic islands in prokaryotic genomes. GC-Profile is freely available at http://tubic.tju.edu.cn/GC-Profile/. In addition, precompiled binaries, together with examples and documentation, can be freely downloaded for local execution.
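The idea of segmenting by a quadratic divergence can be illustrated with a minimal sketch. The abstract does not give the formula, so the definition used here is an assumption: a two-symbol order index S = p^2 + (1 - p)^2, where p is the G+C fraction, and a divergence equal to the length-weighted sum of S over the two candidate subsequences minus S of the whole sequence. The function names `order_index` and `best_split` are hypothetical and are not part of GC-Profile. The sketch finds only a single best split point. It does not implement the recursive procedure or the halting parameter t.

```python
def gc_prefix_counts(seq):
    """Prefix counts of G/C bases, so any segment's G+C count is O(1)."""
    prefix = [0]
    for base in seq.upper():
        prefix.append(prefix[-1] + (base in "GC"))
    return prefix


def order_index(p):
    """Assumed two-symbol order index: squared frequencies of G+C and A+T."""
    return p * p + (1.0 - p) * (1.0 - p)


def best_split(seq):
    """Return (position, divergence) of the split that maximizes the
    assumed quadratic divergence. Position n means seq[:n] | seq[n:]."""
    n_total = len(seq)
    if n_total < 2:
        raise ValueError("need at least two bases to split")
    prefix = gc_prefix_counts(seq)
    s_whole = order_index(prefix[-1] / n_total)

    best_pos, best_div = None, -1.0
    for n in range(1, n_total):
        p_left = prefix[n] / n
        p_right = (prefix[-1] - prefix[n]) / (n_total - n)
        w = n / n_total  # length weight of the left subsequence
        div = w * order_index(p_left) + (1.0 - w) * order_index(p_right) - s_whole
        if div > best_div:
            best_pos, best_div = n, div
    return best_pos, best_div


if __name__ == "__main__":
    # Toy sequence: an AT-only block followed by a GC-only block.
    toy = "AT" * 10 + "GC" * 10
    print(best_split(toy))
```

On the toy sequence the split falls at the boundary between the two blocks. A full segmentation would apply the same step recursively to each subsequence and stop once the divergence falls below a chosen threshold.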
Year: 2006 PMID: 16845098 PMCID: PMC1538862 DOI: 10.1093/nar/gkl040
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. An example of the output pages of GC-Profile when the input is the sequence of chicken chromosome 28. (A) Coordinates, sizes and G + C contents of the segmented domains as an HTML table. (B) Number, coordinates, segmentation strength, segmentation times and segmented contig of the segmentation points as an HTML table. (C) The negative cumulative GC profile for chicken chromosome 28 marked with the segmentation points obtained. The lower plot shows the distributions of the G + C content and CpG islands along chicken chromosome 28. The G + C content is calculated for the domains segmented at t0 = 300. Here, the halting parameter t calculated for each segmentation point is also referred to as the segmentation strength, which is defined based on the quadratic divergence instead of the Jensen–Shannon divergence.
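A negative cumulative GC profile of the kind plotted in panel (C) can be sketched as follows. The construction is an assumption, since the caption does not define the curve: a cumulative sum z_n counts +1 for A or T and -1 for G or C, a linear trend k*n is subtracted, and the sign is flipped so that a rising curve marks a region richer in G+C than the sequence average. For simplicity the sketch takes k from the endpoint (k = z_N / N) rather than from a fitted line, which is a stand-in and may differ from what GC-Profile does. The function name is hypothetical.

```python
def negative_cumulative_gc_profile(seq):
    """Return the detrended, sign-flipped cumulative AT-minus-GC curve.

    Assumed construction (not taken from the source):
      z_n    = (#A + #T) - (#G + #C) over the first n bases
      k      = z_N / N             (endpoint slope, a simplification)
      result = -(z_n - k * n)      for n = 1..N
    Bases other than A, C, G, T leave z_n unchanged.
    """
    z = 0
    cumulative = []
    for base in seq.upper():
        if base in "AT":
            z += 1
        elif base in "GC":
            z -= 1
        cumulative.append(z)

    n_total = len(cumulative)
    if n_total == 0:
        return []

    k = cumulative[-1] / n_total  # average AT-minus-GC excess per base
    return [-(z_n - k * (i + 1)) for i, z_n in enumerate(cumulative)]


if __name__ == "__main__":
    # AT-rich half followed by GC-rich half: the curve falls, then rises.
    for value in negative_cumulative_gc_profile("ATATGCGC"):
        print(value)
```

In this toy case the curve reaches its minimum at the boundary between the AT-rich and GC-rich halves, which is the kind of turning point that segmentation points are expected to mark.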
Figure 2. The negative cumulative GC profile for the genome of V. vulnificus CMCP6 chromosome I marked with the segmentation points obtained. It shows that from 357 145 to 394 176 bp, 2 432 023 to 2 603 700 bp and 3 250 386 to 3 281 945 bp, there are three regions of low GC content, which are recognized as genomic islands. The segmentation points are obtained at t0 = 100. Here, we also mapped the horizontally transferred genes from HGT-DB to the negative cumulative GC profile. It can be seen that the three regions contain clusters of horizontally transferred genes, which strongly suggests that these regions are horizontally transferred genomic islands.