Abstract
Because the leading and lagging strands are replicated by different mechanisms, nucleotide composition asymmetries are widespread in bacterial genomes. In general, the leading strand is enriched in guanine (G) and thymine (T), while the lagging strand is rich in adenine (A) and cytosine (C). However, some bacteria, such as Bacillus subtilis, have been found to contain more A than T on the leading strand. To investigate this difference, we analyze nucleotide asymmetry from the perspective of the correlation between AT and GC biases. In this study, we propose a windowless method, the Z-curve Correlation Coefficient (ZCC) index, based on the Z-curve method, and apply it to more than 2000 bacterial genomes. We find that most bacteria show a negative correlation between AT and GC biases, whereas most genomes in Firmicutes and Tenericutes have positive ZCC indexes. The presence of PolC, purine asymmetry and a stronger gene preference for the leading strand are not confined to Firmicutes but are also likely to occur in other phyla dominated by positive ZCC indexes. This method also provides new insight into other relevant features, such as aerobism, and can be applied to analyze the correlation between RY (purine/pyrimidine) and MK (amino/keto) biases, among others.
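The abstract does not give the ZCC formula. As a minimal sketch, assuming the ZCC index is the Pearson correlation coefficient between the cumulative AT disparity curve (A_n - T_n) and the cumulative GC disparity curve (G_n - C_n), computed base by base over the whole sequence (hence no sliding window), it could be computed as below. The function names `z_curve` and `zcc_index` are hypothetical. The disparity curves come from the Z-curve components, using x_n + y_n = 2(A_n - T_n) and x_n - y_n = 2(G_n - C_n).

```python
import numpy as np


def z_curve(seq: str):
    """Return the cumulative Z-curve components x_n, y_n, z_n of a DNA sequence.

    x_n = (A_n + G_n) - (C_n + T_n)   purine vs pyrimidine (RY)
    y_n = (A_n + C_n) - (G_n + T_n)   amino vs keto (MK)
    z_n = (A_n + T_n) - (G_n + C_n)   weak vs strong hydrogen bonding
    A_n etc. count each base in the first n positions; other symbols (e.g. N) are ignored.
    """
    s = np.frombuffer(seq.upper().encode("ascii"), dtype=np.uint8)
    a = np.cumsum(s == ord("A"))
    c = np.cumsum(s == ord("C"))
    g = np.cumsum(s == ord("G"))
    t = np.cumsum(s == ord("T"))
    x = (a + g) - (c + t)
    y = (a + c) - (g + t)
    z = (a + t) - (g + c)
    return x, y, z


def zcc_index(seq: str) -> float:
    """Hypothetical ZCC sketch: Pearson r between the AT and GC disparity curves."""
    x, y, _ = z_curve(seq)
    at_disparity = (x + y) / 2  # equals A_n - T_n
    gc_disparity = (x - y) / 2  # equals G_n - C_n
    return float(np.corrcoef(at_disparity, gc_disparity)[0, 1])


# Toy "genomes": first half stands in for the leading strand, second half for the lagging strand.
print(zcc_index("GT" * 50 + "CA" * 50))  # G/T-rich leading half: strongly negative
print(zcc_index("GA" * 50 + "CT" * 50))  # G/A-rich leading half (B. subtilis-like): strongly positive
```

Applying `np.corrcoef` to `x` and `y` instead of the disparity curves would give the analogous RY/MK correlation mentioned in the abstract.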
Year: 2017 PMID: 28158313 PMCID: PMC5291525 DOI: 10.1371/journal.pone.0171408
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Summary information of ZCC indexes in different phyla.
| Phylum | Average ZCC index | Standard deviation | Genomes with negative ZCC index | Genomes with positive ZCC index | Total genomes |
|---|---|---|---|---|---|
| Proteobacteria | -0.594 | 0.567 | 875 (82.9%) | 180 (17.1%) | 1055 |
| Actinobacteria | -0.617 | 0.475 | 195 (85.2%) | 34 (14.8%) | 229 |
| Bacteroidetes | -0.288 | 0.657 | 55 (64.7%) | 30 (35.3%) | 85 |
| Cyanobacteria | -0.350 | 0.595 | 47 (69.1%) | 21 (30.9%) | 68 |
| Spirochaetes | -0.848 | 0.195 | 57 (100.0%) | 0 | 57 |
| Chlamydiae | -0.941 | 0.043 | 52 (100.0%) | 0 | 52 |
| Deinococcus-Thermus | -0.782 | 0.332 | 19 (95.0%) | 1 (5.0%) | 20 |
| Chloroflexi | -0.897 | 0.131 | 17 (100.0%) | 0 | 17 |
| Firmicutes | 0.818 | 0.415 | 34 (7.3%) | 433 (92.7%) | 467 |
| Tenericutes | 0.663 | 0.505 | 7 (14.0%) | 43 (86.0%) | 50 |
| Thermotogae | 0.096 | 0.634 | 6 (40.0%) | 9 (60.0%) | 15 |
| Sum | - | - | 1364 (64.5%) | 751 (35.5%) | 2115 |
Fig 1. Percentages of bacterial genomes with positive and negative ZCC indexes in 11 phyla.
The abbreviated phylum names in the histogram stand for Proteobacteria, Actinobacteria, Bacteroidetes, Cyanobacteria, Spirochaetes, Chlamydiae, Deinococcus-Thermus, Chloroflexi, Firmicutes, Tenericutes and Thermotogae, respectively. According to the predominant sign of the genomes in each phylum, we classified Proteobacteria, Actinobacteria, Bacteroidetes, Cyanobacteria, Spirochaetes, Chlamydiae, Deinococcus-Thermus and Chloroflexi as the Negative ZCC phylum group (N-ZCC group), and Firmicutes, Tenericutes and Thermotogae as the Positive ZCC phylum group (P-ZCC group).
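The majority-sign rule behind the two phylum groups can be reproduced from the counts in the summary table. This is a minimal sketch; the names `COUNTS`, `zcc_group` and `positive_share` are illustrative, and raising an error on a tie is an assumption, since no phylum in the table has equal numbers of positive and negative genomes.

```python
# (negative, positive) ZCC genome counts per phylum, copied from the summary table.
COUNTS = {
    "Proteobacteria": (875, 180),
    "Actinobacteria": (195, 34),
    "Bacteroidetes": (55, 30),
    "Cyanobacteria": (47, 21),
    "Spirochaetes": (57, 0),
    "Chlamydiae": (52, 0),
    "Deinococcus-Thermus": (19, 1),
    "Chloroflexi": (17, 0),
    "Firmicutes": (34, 433),
    "Tenericutes": (7, 43),
    "Thermotogae": (6, 9),
}


def zcc_group(neg: int, pos: int) -> str:
    """Assign a phylum to N-ZCC or P-ZCC by the sign shared by most of its genomes."""
    if neg == pos:
        # Assumption: the source does not say how ties would be handled.
        raise ValueError("no predominant ZCC sign")
    return "N-ZCC" if neg > pos else "P-ZCC"


def positive_share(neg: int, pos: int) -> float:
    """Percentage of genomes with a positive ZCC index, rounded to one decimal as in the table."""
    return round(100 * pos / (neg + pos), 1)


groups = {name: zcc_group(*c) for name, c in COUNTS.items()}
p_zcc = sorted(name for name, g in groups.items() if g == "P-ZCC")
print(p_zcc)  # ['Firmicutes', 'Tenericutes', 'Thermotogae']
```

Thermotogae (6 negative vs 9 positive) is the closest call, which matches its average ZCC index of only 0.096.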
Fig 2. Box plot of ZCC indexes in different phyla.
Small circles represent outliers with extreme ZCC indexes. Genomes tend to have large absolute ZCC indexes, indicating that the correlation between AT and GC disparities is widespread and pronounced.
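The caption does not state which rule marks a genome as an outlier. Assuming the common Tukey convention used by many box-plot tools (points beyond 1.5 × IQR from the quartiles, e.g. matplotlib's default `whis=1.5`), outliers could be flagged as in this sketch; `tukey_outliers` is a hypothetical helper and the example values are made up.

```python
import numpy as np


def tukey_outliers(values, k=1.5):
    """Split values into (outliers, kept) using Tukey fences at k * IQR beyond Q1 and Q3."""
    v = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(v, [25, 75])
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    mask = (v < low) | (v > high)
    return v[mask], v[~mask]


# Hypothetical ZCC indexes for one phylum: mostly strongly negative, one positive genome.
outliers, kept = tukey_outliers([-0.9, -0.85, -0.8, -0.95, -0.88, 0.7])
print(outliers)  # [0.7]
```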
Fig 3. Z-curve disparity plots.
Across genomes, GC disparity curves consistently show an inverted-V shape, while the shapes of AT disparity curves vary with phylum and with the sign and magnitude of the ZCC index.
Distribution of genomes between the DE and PC polymerase groups in different phyla.
| Polymerase group | Prot. | Act. | Bact. | Cyan. | Spir. | Chla. | Dein. | Chlf. | Firm. | Tene. | Ther. |
|---|---|---|---|---|---|---|---|---|---|---|---|
| DE | 321/285 | 101/93 | 66/66 | 23/23 | 17/16 | 8/7 | 12/10 | 10/9 | 7 | 0/0 | 0/0 |
| PC | 0/0 | 1 | 0/0 | 0/0 | 0/0 | 0/0 | 0/0 | 0/0 | 153/137 | 17/15 | 12/12 |
a Firm., Tene. and Ther. are the Positive ZCC phyla; the remaining phyla belong to the Negative ZCC phyla.
b DE: genomes that encode only DnaE1-DnaEXs, without PolC.
c PC: genomes that encode PolC in addition to DnaE1-DnaEXs.
d Each cell contains two numbers, X/Y: X is the number of genomes among all 772 datasets, and Y is the number of genomes after excluding the outliers in Fig 2.
e Genome counts that are exceptions to the general trend.
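Pooling the X values of the polymerase table by ZCC group makes the PolC association explicit. This is a minimal sketch; the dictionary names are illustrative, and reading the two single-number cells (Firm. in DE, Act. in PC) as X values is an assumption, since their Y values did not survive in the table.

```python
# X values (genomes among all 772 datasets) from the polymerase-group table.
# Assumption: the single-number cells (Firm. in DE, Act. in PC) are read as X values.
DE = {"Prot.": 321, "Act.": 101, "Bact.": 66, "Cyan.": 23, "Spir.": 17,
      "Chla.": 8, "Dein.": 12, "Chlf.": 10, "Firm.": 7, "Tene.": 0, "Ther.": 0}
PC = {"Prot.": 0, "Act.": 1, "Bact.": 0, "Cyan.": 0, "Spir.": 0,
      "Chla.": 0, "Dein.": 0, "Chlf.": 0, "Firm.": 153, "Tene.": 17, "Ther.": 12}
P_ZCC = {"Firm.", "Tene.", "Ther."}
N_ZCC = set(DE) - P_ZCC


def polc_counts(phyla):
    """Return (DE genomes, PC genomes, fraction of genomes that encode PolC) for a set of phyla."""
    de = sum(DE[p] for p in phyla)
    pc = sum(PC[p] for p in phyla)
    return de, pc, pc / (de + pc)


for name, phyla in (("P-ZCC", P_ZCC), ("N-ZCC", N_ZCC)):
    de, pc, frac = polc_counts(phyla)
    print(f"{name}: DE={de}, PC={pc}, PolC fraction={frac:.3f}")
```

With these counts, 182 of 189 P-ZCC genomes fall in the PC group, compared with 1 of 559 N-ZCC genomes.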
Fig 4. (A) Mean GC content of genomes in each phylum. (B) Average percentage of genes on the leading strand, grouped by genomes with positive and negative ZCC indexes in each phylum. In histogram (A), the mean GC content of every N-ZCC phylum is higher than that of every P-ZCC phylum. Histogram (B) shows that genes are preferentially located on the leading strand. Moreover, strand-biased gene distribution (SGD) is generally stronger in genomes with positive ZCC indexes than in those with negative ZCC indexes.
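The caption does not describe how genes were assigned to the leading strand. A minimal sketch, assuming a circular chromosome with known origin (`ori`) and terminus (`ter`) coordinates, each gene reduced to one coordinate plus a strand sign, and the usual convention that a gene is on the leading strand when it is co-oriented with the replication fork; `on_leading_strand` and `leading_fraction` are hypothetical helpers.

```python
def on_leading_strand(pos: int, strand: str, ori: int, ter: int, genome_len: int) -> bool:
    """Return True if a gene at `pos` on `strand` ('+' or '-') is co-oriented with replication.

    Replichore 1 runs from ori to ter in the direction of increasing coordinates
    (wrapping around the circular chromosome), so its fork moves in the '+' direction
    and '+' genes are leading. In replichore 2 the fork moves in the '-' direction.
    """
    in_replichore1 = (pos - ori) % genome_len < (ter - ori) % genome_len
    return (strand == "+") == in_replichore1


def leading_fraction(genes, ori: int, ter: int, genome_len: int) -> float:
    """Fraction of (position, strand) genes that lie on the leading strand."""
    hits = sum(on_leading_strand(p, s, ori, ter, genome_len) for p, s in genes)
    return hits / len(genes)


# Hypothetical 100-bp circular genome with ori at 0 and ter at 50.
genes = [(10, "+"), (20, "+"), (30, "-"), (70, "-"), (80, "+")]
print(leading_fraction(genes, ori=0, ter=50, genome_len=100))  # 0.6
```

Averaging this fraction over genomes with positive or negative ZCC indexes within a phylum would give values comparable to those plotted in panel (B).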