Abstract
BACKGROUND: Nucleotide compositional asymmetry between the leading and lagging strands of bacterial genomes has been studied intensively in recent years, and almost all bacterial genomes exhibit the same kind of base asymmetry. This work investigates the strand biases in the Chlamydia muridarum genome and shows the potential of the Z curve method for quantitatively differentiating genes on the leading and lagging strands.
Year: 2007 PMID: 17925038 PMCID: PMC2089121 DOI: 10.1186/1471-2164-8-366
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. Distribution of points along the two most important axes of a correspondence analysis of the nine variables u1–u9 for the 909 genes of the C. muridarum genome. Genes transcribed on the leading strand are denoted by crosses, and genes located on the lagging strand by open circles. The separation between the two groups of points shows that genes on the two replication strands have distinct base usage.
Base usage for genes located on the leading and lagging strands in the C. muridarum genome
| Codon position | Leading a | Leading c | Leading g | Leading t | Leading g-c | Leading t-a | Lagging a | Lagging c | Lagging g | Lagging t | Lagging g-c | Lagging t-a |
| 1st codon position | 0.261 | 0.177 | 0.333 | 0.229 | 0.156 | -0.032 | 0.280 | 0.222 | 0.271 | 0.227 | 0.049 | -0.053 |
| 2nd codon position | 0.303 | 0.208 | 0.172 | 0.316 | -0.036 | 0.02 | 0.300 | 0.243 | 0.140 | 0.317 | -0.103 | 0.017 |
| 3rd codon position | 0.284 | 0.122 | 0.212 | 0.383 | 0.090 | 0.099 | 0.307 | 0.199 | 0.135 | 0.359 | -0.0643 | 0.052 |
| Average | 0.283 | 0.169 | 0.239 | 0.309 | 0.070 | 0.029 | 0.296 | 0.221 | 0.182 | 0.301 | -0.039 | 0.005 |
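The quantities in this table can be recomputed from coding sequences. The following is a minimal sketch, not the authors' code: it assumes each gene is given as an in-frame coding sequence read 5' to 3', and the function name `codon_position_usage` and the toy sequences are hypothetical.

```python
from collections import Counter

def codon_position_usage(genes):
    """For codon positions 1-3, return base frequencies plus the
    g-c and t-a differences, pooled over all genes."""
    counts = [Counter() for _ in range(3)]
    for seq in genes:
        seq = seq.upper()
        usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
        for i in range(usable):
            if seq[i] in "ACGT":          # skip ambiguous bases
                counts[i % 3][seq[i]] += 1
    table = []
    for c in counts:
        total = sum(c.values())
        freq = {b: (c[b] / total if total else 0.0) for b in "ACGT"}
        freq["g-c"] = freq["G"] - freq["C"]
        freq["t-a"] = freq["T"] - freq["A"]
        table.append(freq)
    return table

# Toy input (made up for illustration, not C. muridarum data).
genes = ["ATGGCTTAA", "ATGTTTGGGTGA"]
usage = codon_position_usage(genes)
for pos, row in enumerate(usage, start=1):
    print(pos, {k: round(v, 3) for k, v in row.items()})
```

Running the function separately on the leading-strand and lagging-strand gene sets would give the two halves of the table.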
Results of K-means clustering based on the variables u1–u9 defined in equation (3)
| Genome | Leading: no. of genes | Leading: clustered correctly a | Lagging: no. of genes | Lagging: clustered correctly a | Total: no. of genes | Total: clustered correctly a |
| B. burgdorferi b | 565 | 547 (96.8%) | 285 | 277 (97.2%) | 850 | 824 (96.9%) |
| T. pallidum c | 680 | 604 (88.8%) | 351 | 319 (90.9%) | 1031 | 923 (89.5%) |
| C. trachomatis d | 494 | 479 (97.0%) | 400 | 355 (88.8%) | 894 | 834 (93.3%) |
| C. muridarum e | 499 | 481 (96.4%) | 410 | 372 (90.7%) | 909 | 853 (93.8%) |
a The percentage in parentheses is the number of genes clustered correctly divided by the total number of genes.
b The origin of replication of the linear chromosome is assumed to be upstream of dnaA (BB0437 gene).
c The origin and termination of replication are assumed to be upstream of dnaA (TP001 gene) and between genes TP0515 and TP0516, respectively.
d The origin of replication is assumed to lie between CT632 and CT633, while the termination is assumed to lie between CT177 and CT178.
e The locations of the origin and termination of replication are given in the Material and Method section.
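The clustering summarized in this table can be illustrated with a small, self-contained sketch. Equation (3) is not reproduced in this record, so the nine variables below are an assumption: the three Z-curve-style combinations of base frequencies at each codon position, in an arbitrary order. The two-cluster K-means, its deterministic initialization, all function names and the toy sequences are hypothetical and are not the authors' implementation.

```python
def z_variables(seq):
    """Nine variables per gene: (x, y, z) for each codon position.
    Assumes an in-frame sequence whose length is a multiple of 3."""
    seq = seq.upper()
    out = []
    for pos in range(3):
        bases = seq[pos::3]
        n = len(bases) or 1
        a, c, g, t = (bases.count(b) / n for b in "ACGT")
        out += [(a + g) - (c + t), (a + c) - (g + t), (a + t) - (g + c)]
    return out

def dist2(p, q):
    return sum((x - y) ** 2 for x, y in zip(p, q))

def two_means(points, iters=50):
    """Plain K-means with k = 2; starts from the first point and the
    point farthest from it (an arbitrary, deterministic choice)."""
    centers = [list(points[0]),
               list(max(points, key=lambda p: dist2(p, points[0])))]
    labels = None
    for _ in range(iters):
        new = [0 if dist2(p, centers[0]) <= dist2(p, centers[1]) else 1
               for p in points]
        if new == labels:
            break
        labels = new
        for k in (0, 1):
            members = [p for p, l in zip(points, labels) if l == k]
            if members:
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def clustered_correctly(labels, truth):
    """Cluster ids are arbitrary, so score under the better of the two mappings."""
    same = sum(l == t for l, t in zip(labels, truth))
    return max(same, len(labels) - same)

# Toy genes: two G/T-rich and two C/A-rich sequences (made up).
toy = ["GTGGTTGGTTTG", "GGTTGTGTGGTT", "CACCAACCAAAC", "CCAACACACCAA"]
truth = [0, 0, 1, 1]  # 0 = leading, 1 = lagging (hypothetical labels)
labels = two_means([z_variables(s) for s in toy])
correct = clustered_correctly(labels, truth)
print(correct, "of", len(toy))
```

The percentages in the table follow the same ratio as footnote a; for example, `round(100 * 853 / 909, 1)` gives 93.8 for the last row.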
Codon usage for genes located on the leading and lagging strands in the C. muridarum genome
| AA | Codon | Leading N | Leading RSCU | Significance a | Lagging N | Lagging RSCU |
| Phe | TTT | 6116 | 1.45 | >> | 4543 | 1.24 |
| | TTC | 2333 | 0.55 | << | 2806 | 0.76 |
| Leu | TTA | 6382 | 2.01 | >> | 4608 | 1.61 |
| | TTG | 4653 | 1.46 | >> | 1955 | 0.68 |
| | CTT | 3866 | 1.21 | << | 4497 | 1.57 |
| | CTC | 1099 | 0.35 | << | 2230 | 0.78 |
| | CTA | 1748 | 0.55 | << | 2616 | 0.91 |
| | CTG | 1348 | 0.42 | -- | 1266 | 0.44 |
| Ile | ATT | 6568 | 1.78 | >> | 5719 | 1.60 |
| | ATC | 2264 | 0.61 | << | 3227 | 0.91 |
| | ATA | 2225 | 0.60 | >> | 1747 | 0.49 |
| Met | ATG | 3833 | 1.00 | -- | 2565 | 1.00 |
| Val | GTT | 5637 | 1.73 | >> | 3056 | 1.59 |
| | GTC | 1470 | 0.45 | << | 1515 | 0.79 |
| | GTA | 3172 | 0.97 | << | 2012 | 1.05 |
| | GTG | 2766 | 0.85 | >> | 1104 | 0.57 |
| Tyr | TAT | 4137 | 1.52 | >> | 2842 | 1.27 |
| | TAC | 1321 | 0.48 | << | 1628 | 0.73 |
| TER | TAA | 270 | 0.00 | -- | 256 | 0.00 |
| | TAG | 161 | 0.00 | -- | 89 | 0.00 |
| His | CAT | 2882 | 1.58 | >> | 2406 | 1.31 |
| | CAC | 776 | 0.42 | << | 1276 | 0.69 |
| Gln | CAA | 4154 | 1.17 | << | 4833 | 1.52 |
| | CAG | 2974 | 0.83 | >> | 1539 | 0.48 |
| Asn | AAT | 4480 | 1.52 | >> | 3811 | 1.29 |
| | AAC | 1418 | 0.48 | << | 2085 | 0.71 |
| Lys | AAA | 6850 | 1.29 | << | 6888 | 1.60 |
| | AAG | 3801 | 0.71 | >> | 1726 | 0.40 |
| Asp | GAT | 7033 | 1.66 | >> | 4364 | 1.43 |
| | GAC | 1462 | 0.34 | << | 1758 | 0.57 |
| Glu | GAA | 7392 | 1.18 | << | 6318 | 1.49 |
| | GAG | 5141 | 0.82 | >> | 2180 | 0.51 |
| Ser | TCT | 5851 | 2.58 | -- | 5418 | 2.51 |
| | TCC | 1530 | 0.67 | << | 2523 | 1.17 |
| | TCA | 1626 | 0.72 | -- | 1640 | 0.76 |
| | TCG | 1193 | 0.53 | >> | 786 | 0.36 |
| Pro | CCT | 3633 | 2.16 | >> | 3588 | 1.99 |
| | CCC | 834 | 0.50 | << | 1532 | 0.85 |
| | CCA | 1606 | 0.95 | -- | 1648 | 0.91 |
| | CCG | 666 | 0.40 | >> | 454 | 0.25 |
| Thr | ACT | 2794 | 1.44 | >> | 2898 | 1.35 |
| | ACC | 1010 | 0.52 | << | 1875 | 0.88 |
| | ACA | 2504 | 1.29 | -- | 2849 | 1.33 |
| | ACG | 1466 | 0.75 | >> | 939 | 0.44 |
| Ala | GCT | 6448 | 2.01 | >> | 4661 | 1.82 |
| | GCC | 1459 | 0.45 | << | 1797 | 0.70 |
| | GCA | 3191 | 0.99 | << | 2826 | 1.11 |
| | GCG | 1753 | 0.55 | >> | 936 | 0.37 |
| Cys | TGT | 2044 | 1.41 | >> | 1266 | 1.08 |
| | TGC | 851 | 0.59 | << | 1079 | 0.92 |
| TER | TGA | 69 | 0.00 | -- | 65 | 0.00 |
| Trp | TGG | 1863 | 1.00 | -- | 1275 | 1.00 |
| Arg | CGT | 2571 | 1.65 | >> | 1399 | 1.40 |
| | CGC | 915 | 0.59 | << | 1351 | 1.35 |
| | CGA | 1844 | 1.18 | << | 1362 | 1.36 |
| | CGG | 989 | 0.63 | >> | 336 | 0.34 |
| Ser | AGT | 2208 | 0.97 | >> | 1149 | 0.53 |
| | AGC | 1211 | 0.53 | << | 1411 | 0.65 |
| Arg | AGA | 2228 | 1.43 | >> | 1284 | 1.28 |
| | AGG | 815 | 0.52 | >> | 271 | 0.27 |
| Gly | GGT | 2331 | 0.76 | >> | 1199 | 0.59 |
| | GGC | 1241 | 0.41 | << | 1312 | 0.64 |
| | GGA | 5116 | 1.67 | << | 4021 | 1.98 |
| | GGG | 3538 | 1.16 | >> | 1611 | 0.79 |
a Results of the chi-squared test: >> indicates that the leading strand genes use the codon more frequently than the lagging strand genes; << indicates that the lagging strand genes use the codon more frequently than the leading strand genes; -- indicates no significant difference in usage of the codon between the two strands. Significance is assessed at the 5% level.
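As a worked check on the table, the sketch below recomputes RSCU values from the codon counts and runs a simple chi-squared test. RSCU is taken here as the observed count of a codon multiplied by the number of synonymous codons for its amino acid and divided by the total count for that amino acid. How the authors built their contingency tables is not stated in this record, so the 2x2 layout (one codon versus the other synonymous codons, leading versus lagging) is an assumption, and the function names are hypothetical.

```python
def rscu(counts):
    """counts: {codon: N} for all synonymous codons of one amino acid."""
    total = sum(counts.values())
    k = len(counts)
    return {codon: k * n / total for codon, n in counts.items()}

def chi2_2x2(a, b, c, d):
    """Pearson chi-squared statistic (no continuity correction) for
    the 2x2 table [[a, b], [c, d]]; 1 degree of freedom."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

CRITICAL_5PCT_DF1 = 3.841  # chi-squared critical value, df = 1, 5% level

# Counts copied from the table above.
phe_leading = {"TTT": 6116, "TTC": 2333}
phe_lagging = {"TTT": 4543, "TTC": 2806}
leu_leading = {"TTA": 6382, "TTG": 4653, "CTT": 3866,
               "CTC": 1099, "CTA": 1748, "CTG": 1348}
leu_lagging = {"TTA": 4608, "TTG": 1955, "CTT": 4497,
               "CTC": 2230, "CTA": 2616, "CTG": 1266}

print(round(rscu(phe_leading)["TTT"], 2))  # 1.45
print(round(rscu(phe_lagging)["TTT"], 2))  # 1.24

def codon_vs_rest(codon, leading, lagging):
    a, c = leading[codon], lagging[codon]
    b, d = sum(leading.values()) - a, sum(lagging.values()) - c
    return chi2_2x2(a, b, c, d)

chi_ttt = codon_vs_rest("TTT", phe_leading, phe_lagging)
chi_ctg = codon_vs_rest("CTG", leu_leading, leu_lagging)
print(chi_ttt > CRITICAL_5PCT_DF1)  # True: TTT usage differs between strands
print(chi_ctg > CRITICAL_5PCT_DF1)  # False: consistent with the -- entry for CTG
```

Under this assumed layout the TTT and CTG outcomes agree with the table, but other entries may depend on details of the original test that are not given here.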
Figure 2. (a) The y components of the Z curves for the B. burgdorferi, T. pallidum, C. trachomatis and C. muridarum genomes. (b) The y component of the Z curve for the E. coli K-12 genome. Comparing the coordinate values in (a) and (b) shows that the y component increases or decreases much more slowly along the DNA sequence for the E. coli genome than for the other four genomes. Note that the y component of the Z curve represents the sum of the cumulative excesses of G over C and of T over A. Therefore, the four bacterial genomes in (a) show a remarkable excess of G over C and of T over A.
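The curve described in this caption can be sketched as a running sum along the sequence. This is an illustration under stated assumptions rather than the authors' code: the sign convention is taken from the caption (each G or T adds 1, each C or A subtracts 1), other definitions of the Z curve may use the opposite sign, and the function name `cumulative_gt_excess` is hypothetical.

```python
from itertools import accumulate

# +1 for G or T, -1 for C or A, following the caption's description;
# any other character (e.g. N) contributes 0.
STEP = {"G": 1, "T": 1, "C": -1, "A": -1}

def cumulative_gt_excess(seq):
    """Running total of (G - C) + (T - A) after each base of seq."""
    return list(accumulate(STEP.get(b, 0) for b in seq.upper()))

curve = cumulative_gt_excess("GGTACC")
print(curve)  # [1, 2, 3, 2, 1, 0]
```

Plotting such a curve against sequence position for a whole chromosome would give one panel of the figure; the slope of each segment reflects how strong the combined G-over-C and T-over-A excess is on the strand being read.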