Hai-Long Zhao, Zhong-Kui Xia, Fa-Zhan Zhang, Yuan-Nong Ye, Feng-Biao Guo.
Abstract
Composition bias, a deviation from Chargaff's second parity rule (PR2), has long been observed in sequenced genomes and is believed to relate strongly to the replication process in microbial genomes. However, some disagreement on the underlying reason for strand composition bias remains. We performed an integrative analysis of various genomic features that might influence composition bias using a large-scale dataset of 1111 genomes. Our results indicate that (1) the bias was stronger in obligate intracellular bacteria than in free-living species (p-value = 0.0305); (2) Fusobacteria and Firmicutes had the highest average bias among the 24 microbial phyla analyzed; (3) the strength of selected codon usage bias and generation times were not observably related to strand composition bias (p-value = 0.3247); (4) significant negative relationships were found between composition bias and GC content, genome size, rearrangement frequency, and the Clusters of Orthologous Groups (COG) functional subcategories A, C, I, and Q (p-values < 1.0 × 10⁻⁸); (5) gene density and COG functional subcategories D, F, J, L, and V were positively related to composition bias (p-value < 2.2 × 10⁻¹⁶); and (6) gene density made the most important contribution to composition bias, indicating that transcriptional bias was strongly associated with strand composition bias. Therefore, strand composition bias was found to be influenced by multiple factors with varying weights.
Keywords: COG functional category; gene density; genomic features; multiple factors; obligate intracellular bacteria; strand composition bias
Year: 2015 PMID: 26404268 PMCID: PMC4613354 DOI: 10.3390/ijms160923111
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 5.923
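Chargaff's second parity rule states that, within a single DNA strand, the counts of A and T (and of G and C) are approximately equal; strand composition bias is a departure from that equality. As a minimal sketch of how such a departure can be quantified, the Python function below computes the commonly used AT and GC skews for one strand. The function name `strand_skews` and the toy sequences are illustrative assumptions, and these skews are not necessarily the bias score used in this study.

```python
from collections import Counter


def strand_skews(seq):
    """Return (AT skew, GC skew) for a single-strand DNA sequence.

    AT skew = (A - T) / (A + T), GC skew = (G - C) / (G + C).
    Both are 0 when the strand obeys PR2 exactly.
    """
    counts = Counter(seq.upper())
    a, t, g, c = (counts[base] for base in "ATGC")
    at_skew = (a - t) / (a + t) if a + t else 0.0
    gc_skew = (g - c) / (g + c) if g + c else 0.0
    return at_skew, gc_skew


# Toy sequences (hypothetical, not taken from any genome in the dataset).
balanced = strand_skews("ATGC")      # equal A/T and G/C counts
skewed = strand_skews("AAATGGGC")    # excess of A over T and of G over C
```

A skew near zero indicates a strand close to PR2; larger absolute values indicate stronger composition bias on that strand.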
Figure 1. Box-and-whisker plots of composition bias for all genomes, sorted into 24 phyla. The bottom and top of each box mark the first and third quartiles, and the band inside the box denotes the median. The ends of the whiskers represent the lowest datum still within 1.5 IQR (interquartile range) of the lower quartile and the highest datum still within 1.5 IQR of the upper quartile. Any datum that falls outside the whiskers is plotted as an outlier with a small circle. The boxplot depicts the distribution of bias within each phylum.
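The whisker rule described in the caption can be sketched in a few lines of Python. This is an illustration only: the function name `box_stats` and the sample values are hypothetical, and the quartile convention (`method="inclusive"`) is an assumption, since plotting tools differ in how they interpolate quartiles.

```python
import statistics


def box_stats(values):
    """Compute box-and-whisker statistics using the 1.5 IQR rule.

    Requires at least two values.
    """
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    # Whiskers end at the most extreme data points still inside the fences.
    inside = [v for v in data if low_fence <= v <= high_fence]
    outliers = [v for v in data if v < low_fence or v > high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": outliers,
    }


# Hypothetical values with one extreme point.
stats = box_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
```

Here the value 100 lies beyond the upper fence, so it would be drawn as a small circle while the upper whisker stops at 9.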
Strand composition bias for each phylum a.
| Phylum | Standard Deviation | Variance | Mean |
|---|---|---|---|
| | 0.005309 | 2.82 × 10⁻⁵ | 0.009124 |
| | 0.015749 | 0.000248 | 0.018728 |
| | 0.005263 | 2.77 × 10⁻⁵ | 0.00957 |
| | 0.016805 | 0.000282 | 0.027048 |
| | 0.012521 | 0.000157 | 0.055526 |
| | 0.018046 | 0.000326 | 0.051947 |
| | 0.015056 | 0.000227 | 0.024993 |
| | 0.021638 | 0.000468 | 0.019847 |
| | 0.007318 | 5.36 × 10⁻⁵ | 0.051752 |
| | 0.007668 | 5.88 × 10⁻⁵ | 0.015442 |
| | 0.010132 | 0.000103 | 0.052093 |
| | 0.030697 | 0.000942 | 0.052418 |
| | NA | NA | 0.056901 |
| | 0.028571 | 0.000816 | 0.071236 |
| | 0.048886 | 0.00239 | 0.099682 |
| | NA | NA | 0.020857 |
| | NA | NA | 0.013445 |
| | 0.012161 | 0.000148 | 0.023082 |
| | 0.017163 | 0.000295 | 0.028607 |
| | 0.046978 | 0.002207 | 0.062153 |
| | 0.012306 | 0.000151 | 0.047907 |
| | 0.023255 | 0.000541 | 0.030599 |
| | 0.004126 | 1.70 × 10⁻⁵ | 0.016197 |
| | 0.005228 | 2.73 × 10⁻⁵ | 0.029585 |
a All genomes are grouped by phylum. NA indicates that the phylum contains only one species, so no standard deviation or variance could be computed. The phylum Fusobacteria had the highest mean bias value, and Firmicutes had the second highest.
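The relationship between the standard deviation and variance columns, and the reason a single-species phylum yields NA, can be illustrated as follows. This is a sketch under the assumption that sample (n − 1) statistics were used, which the table does not state; the helper name `sd_and_variance` is hypothetical.

```python
import math
import statistics


def sd_and_variance(bias_values):
    """Return (sample SD, sample variance), or (None, None) for fewer than two values."""
    if len(bias_values) < 2:
        return None, None  # corresponds to NA in the table
    return statistics.stdev(bias_values), statistics.variance(bias_values)


# Two (SD, variance) pairs copied from the table above:
# the variance column should equal the SD column squared, up to rounding.
rows = [(0.005309, 2.82e-5), (0.015749, 0.000248)]
consistent = [math.isclose(sd ** 2, var, rel_tol=0.01) for sd, var in rows]

single_species = sd_and_variance([0.02])   # hypothetical one-genome phylum
```

Both checked rows satisfy variance ≈ SD², which supports reading the three numeric columns as standard deviation, variance, and mean, in that order.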
Mean values of various biological characteristics for each phylum a.
| Phylum | Genome Size | GC Content | Gene Density | | |
|---|---|---|---|---|---|
| | 6,581,121.33 | 0.602611 | 0.524179 | 0.546299 | 0.239179 |
| | 4,434,386.26 | 0.647473 | 0.591745 | 0.655926 | 0.5707 |
| | 1,680,594.86 | 0.3874153 | 0.514286 | 0.026764 | 0.090473 |
| | 3,688,038.52 | 0.4246355 | 0.553854 | 0.035009 | 0.101365 |
| | 1,265,852.44 | 0.4046721 | 0.544713 | 0.022567 | 0.081014 |
| | 2,618,734.27 | 0.5079388 | 0.583907 | 0.061015 | 0.114787 |
| | 2,435,937.54 | 0.5531583 | 0.519221 | 0.044977 | 0.063278 |
| | 3,397,176.98 | 0.4460103 | 0.508569 | −0.33356 | −0.55797 |
| | 2,728,233 | 0.3682745 | 0.642415 | 0.012609 | 0.057666 |
| | 2,411,100.11 | 0.66285 | 0.517812 | −0.10793 | −0.12243 |
| | 1,907,773.5 | 0.3384917 | 0.681195 | 0.01941 | 0.055101 |
| | 1,384,709.5 | 0.3757977 | 0.726988 | 0.014904 | 0.078649 |
| | 3,842,635 | 0.4805184 | 0.580603 | 0.047916 | 0.088216 |
| | 3,077,249.49 | 0.3853 | 0.786812 | 0.020021 | 0.081354 |
| | 2,680,383 | 0.29141 | 0.72341 | 0.01046 | 0.05595 |
| | 4,636,964 | 0.6427436 | 0.566455 | 0.043068 | 0.055612 |
| | 2,003,803 | 0.341289 | 0.552386 | 0.019141 | 0.07548 |
| | 6,254,950 | 0.5550987 | 0.502151 | 0.116125 | 0.138471 |
| | 3,506,416.55 | 0.5337785 | 0.569934 | 0.067462 | 0.135439 |
| | 1,702,653.17 | 0.3721947 | 0.600467 | 0.021591 | 0.121083 |
| | 1,914,533 | 0.5454971 | 0.75006 | 0.023406 | 0.050368 |
| | 892,007.889 | 0.2794737 | 0.665323 | −0.02018 | −0.08702 |
| | 1,976,742.36 | 0.4028872 | 0.54724 | 0.024232 | 0.083806 |
| | 3,998,507 | 0.5480856 | 0.51413 | 0.093882 | 0.10771 |
| | 3,329,265.48 | 0.4952767 | 0.612158 | 0.092191 | 0.127667 |
a Genome size, GC content, and rearrangement frequency of Fusobacteria and Firmicutes are all smaller than the averages of these traits across all genomes, whereas the opposite is true for gene density.
Figure 2. The phylogenetic tree of the 24 phyla. N is the total number of strains in a phylum; M is the average Score in a phylum.
The correlation of each Clusters of Orthologous Groups (COG) functional subcategory and strand composition bias.
| COG | Functional Category | p-value | Correlation |
|---|---|---|---|
| J | Translation, ribosomal structure and biogenesis (P) | 8.11 × 10⁻³² | 0.341886 |
| A | RNA processing and modification (N) | 2.44 × 10⁻¹³ | −0.21728 |
| K | Transcription | 0.099239 | −0.04948 |
| L | Replication, recombination and repair (P) | 1.01 × 10⁻⁸ | 0.170797 |
| B | Chromatin structure and dynamics | 0.002404 | −0.09097 |
| D | Cell cycle control, cell division, chromosome partitioning (P) | 1.05 × 10⁻⁴⁵ | 0.407564 |
| Y | Nuclear structure | 0.222949 | 0.036592 |
| V | Defense mechanisms (P) | 3.93 × 10⁻¹⁴ | 0.224269 |
| T | Signal transduction mechanisms | 1.77 × 10⁻⁷ | −0.15589 |
| M | Cell wall/membrane/envelope biogenesis | 0.609835 | −0.01533 |
| N | Cell motility | 0.198305 | 0.038623 |
| Z | Cytoskeleton | 0.006632 | −0.0814 |
| W | Extracellular structures | 0.901043 | −0.00373 |
| U | Intracellular trafficking, secretion, and vesicular transport | 0.908091 | 0.003467 |
| O | Posttranslational modification, protein turnover, chaperones | 0.188347 | −0.0395 |
| C | Energy production and conversion (N) | 4.51 × 10⁻¹¹ | −0.1959 |
| G | Carbohydrate transport and metabolism | 0.193919 | 0.039003 |
| E | Amino acid transport and metabolism | 0.417676 | −0.02434 |
| F | Nucleotide transport and metabolism (P) | 5.99 × 10⁻³⁹ | 0.377498 |
| H | Coenzyme transport and metabolism | 0.01405 | 0.073666 |
| I | Lipid transport and metabolism (N) | 1.22 × 10⁻¹⁹ | −0.26737 |
| P | Inorganic ion transport and metabolism | 0.081681 | −0.05226 |
| Q | Secondary metabolites biosynthesis, transport and catabolism (N) | 6.65 × 10⁻⁴⁰ | −0.38194 |
N denotes a significantly negative correlation between a subcategory and composition bias. P denotes a significantly positive correlation between a subcategory and composition bias.
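The two numeric columns of the table above are linked: for a correlation coefficient r computed from n paired observations, the statistic t = r·sqrt((n − 2)/(1 − r²)) gives a two-sided p-value. The sketch below approximates that p-value with a normal tail, which is adequate only for large n. It assumes that every test used all 1111 genomes and that a t-based test was applied; neither is stated in the table, and the function name `correlation_p_value` is hypothetical.

```python
import math


def correlation_p_value(r, n):
    """Approximate two-sided p-value for a correlation r from n paired observations.

    Converts r to t = r * sqrt((n - 2) / (1 - r^2)) and uses a normal
    approximation to the t distribution (reasonable for large n only).
    """
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    if n < 3:
        raise ValueError("need at least 3 observations")
    t = r * math.sqrt((n - 2) / (1.0 - r * r))
    return math.erfc(abs(t) / math.sqrt(2.0))


N_GENOMES = 1111  # assumption: each correlation used the full dataset

p_k = correlation_p_value(-0.04948, N_GENOMES)  # subcategory K (Transcription)
p_w = correlation_p_value(-0.00373, N_GENOMES)  # subcategory W (Extracellular structures)
```

Under these assumptions the approximations for K and W land close to the tabulated 0.099239 and 0.901043. For the strongly correlated subcategories (for example D or Q) the normal tail understates the p-value, so an exact t distribution would be needed to reproduce those entries.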
Average value of discrepant times (AVDT) between the strong-biased group and the weak-biased group for each functional subcategory, in descending order.
| COG | AVDT | COG | AVDT |
|---|---|---|---|
| D | 5.709197 | C | 1.086021 |
| K | 3.415376 | H | 1.053758 |
| N | 2.848684 | F | 1.046122 |
| T | 2.229241 | V | 1.02066 |
| M | 2.181872 | E | 0.99786 |
| O | 2.089135 | I | 0.936222 |
| U | 2.013089 | P | 0.914553 |
| G | 1.472415 | A | 0.864394 |
| L | 1.363586 | Z | 0.775298 |
| B | 1.266486 | Q | 0.64794 |
| J | 1.23429 | W | 0.6 |
Relationship between each type of replication and repair genes and composition bias.
| Pathway | Function | p-value | Correlation |
|---|---|---|---|
| ko03030 | DNA replication | 3.69 × 10⁻¹⁰ | 0.18656 |
| ko03032 | DNA replication proteins | 6.70 × 10⁻⁹ | 0.172841 |
| ko03036 | Chromosome and associated proteins | 3.28 × 10⁻⁷ | 0.152472 |
| ko03400 | DNA repair and recombination proteins | 6.73 × 10⁻¹⁰ | 0.183808 |
| ko03410 | Base excision repair | 2.11 × 10⁻⁶ | 0.141724 |
| ko03420 | Nucleotide excision repair | 4.15 × 10⁻¹² | 0.2059713 |
| ko03430 | Mismatch repair | 9.39 × 10⁻¹² | 0.2025802 |
| ko03440 | Homologous recombination | 1.16 × 10⁻¹⁰ | 0.191753 |
| ko03450 | Non-homologous end-joining | 0.926821 | 0.002759 |
| ko03460 | Fanconi anemia pathway | 0.002531 | 0.090509 |
Principal component regression analysis of various genomic features a.
| Genomic Features | Genome Size | Gene Density | GC Content | SCOGs | WCOGs | A | ||
|---|---|---|---|---|---|---|---|---|
| 0.0558 | 0.0648 | 0.0391 | 0.0004 | 0.0003 | 0.0332 | 0.0326 | 0.0122 | |
| Genomic features | C | D | F | I | J | L | Q | V |
| 0.0634 | 0.0348 | 0.0272 | 0.0238 | 0.0299 | 0.0371 | 0.0262 | 0.0297 |
a Detailed values for each of the genomic features and strand composition bias are listed in Table S2.