B R Powdel, Siddhartha Sankar Satapathy, Aditya Kumar, Pankaj Kumar Jha, Alak Kumar Buragohain, Munindra Borah, Suvendra Kumar Ray.
Abstract
Chargaff's rule of intra-strand parity (ISP) between complementary mono/oligonucleotides in chromosomes is well established in the scientific literature. Although a large number of papers discussing ISP have been published in the genomic era, scientists have yet to identify all the factors responsible for such a universal phenomenon in chromosomes. In the present work, we address the issue from a new perspective, a feature parallel to ISP. The compositional abundance values of mono/oligonucleotides were determined in all non-overlapping sub-chromosomal regions of a specific size, and the frequency distributions of the mono/oligonucleotides among the regions were compared using the Kolmogorov-Smirnov test. Interestingly, the frequency distributions of complementary mono/oligonucleotides were statistically similar, a feature we named intra-strand frequency distribution parity (ISFDP). ISFDP was observed as a general feature in chromosomes of bacteria, archaea and eukaryotes. Violation of ISFDP was also observed in several chromosomes. Chromosomes of different strains belonging to a single species of bacteria/archaea (Haemophilus influenzae, Xylella fastidiosa etc.) and chromosomes of a eukaryote were found to differ from each other with respect to ISFDP violation. ISFDP correlates weakly with ISP in chromosomes, suggesting that the latter is not entirely responsible for the former. Asymmetry of replication topography and the composition of forward-encoded sequences between the strands in chromosomes were found to be insufficient to explain the ISFDP feature in all chromosomes. This suggests that multiple factors in chromosomes are responsible for establishing ISFDP.
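The windowed comparison described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names, the window size and the toy sequence are assumptions, and the P-value uses a standard asymptotic approximation to the Kolmogorov distribution rather than whatever software the study used.

```python
import math
from collections import Counter


def window_counts(seq, window):
    """Count A, T, G and C in each non-overlapping window of `window` bases."""
    seq = seq.upper()
    counts = {base: [] for base in "ATGC"}
    # A trailing fragment shorter than one full window is ignored (assumption).
    for start in range(0, len(seq) - window + 1, window):
        c = Counter(seq[start:start + window])
        for base in "ATGC":
            counts[base].append(c[base])
    return counts


def ks_two_sample(x, y):
    """Two-sample KS statistic D and an approximate asymptotic P-value."""
    xs, ys = sorted(x), sorted(y)
    n, m = len(xs), len(ys)
    i = j = 0
    d = 0.0
    # Walk both sorted samples and track the largest gap between the two ECDFs.
    while i < n and j < m:
        v = min(xs[i], ys[j])
        while i < n and xs[i] == v:
            i += 1
        while j < m and ys[j] == v:
            j += 1
        d = max(d, abs(i / n - j / m))
    ne = n * m / (n + m)
    lam = (math.sqrt(ne) + 0.12 + 0.11 / math.sqrt(ne)) * d
    if lam < 0.2:
        # The alternating series converges poorly here; the P-value is ~1.
        return d, 1.0
    p = 2 * sum((-1) ** (k - 1) * math.exp(-2 * k * k * lam * lam)
                for k in range(1, 101))
    return d, min(max(p, 0.0), 1.0)


# Toy sequence (hypothetical): every 40-base window holds 10 of each base.
seq = "AATTGGCC" * 50
counts = window_counts(seq, 40)
d_w, p_w = ks_two_sample(counts["A"], counts["T"])  # W pair: A versus T
d_s, p_s = ks_two_sample(counts["G"], counts["C"])  # S pair: G versus C
```

A large P-value for a complementary pair would be read as ISFDP holding for that pair; a small one would be read as a violation.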
Year: 2009 PMID: 19861381 PMCID: PMC2780954 DOI: 10.1093/dnares/dsp021
Source DB: PubMed Journal: DNA Res ISSN: 1340-2838 Impact factor: 4.458
ISFDP analysis in bacterial chromosomes
| Serial number | Strain name | Size (kb) | GC% | KS (W) | KS (S) | ATS: \|(∑A − ∑T)\|/(∑A + ∑T) | GCS: \|(∑G − ∑C)\|/(∑G + ∑C) | Bacterial group | TB (°) |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 3598 | 40.43 | 0.745 | 0.00068 | 0.00484 | G-Proteobacteria | 7.07 | ||
| 2 | 2274 | 41.3 | 0.436 | 0.819 | 0.00187 | 0.00109 | NA | ||
| 3 | 2319 | 44.91 | 0.312 | 0.291 | 0.00232 | 0.00291 | |||
| 4 | 4744 | 61.55 | 0.88 | 0.19 | 0.00141 | 0.00139 | |||
| 5 | 4702 | 58.51 | 0.04 | 0.959 | 0.00215 | 0.00073 | |||
| 6 | 2841 | 59.38 | 0.00694 | 0.00967 | A-Proteobacteria | 7.37 | |||
| 7 | 3123 | 36.26 | 0.00615 | 0.01324 | Firmicutes | NA | |||
| 8 | 5013 | 74.9 | 0.077 | 0.00476 | 0.00249 | D-Proteobacteria | 70.57 | ||
| 9 | 5277 | 73.53 | 0.712 | 0.00073 | 0.00216 | 7.48 | |||
| 10 | 5227 | 35.38 | 0.00215 | 0.00581 | Firmicutes | NA | |||
| 11 | 5227 | 35.38 | 0.00215 | 0.00582 | 7.48 | ||||
| 12 | 5228 | 35.38 | 0.00221 | 0.00588 | 7.46 | ||||
| 13 | 4214 | 43.52 | 0.219 | 0.234 | 0.00212 | 0.00224 | 13.69 | ||
| 14 | 5257 | 35.43 | 0.123 | 0.00042 | 0.00081 | NA | |||
| 15 | 5237 | 35.41 | 0.015 | 0.00194 | 0.00438 | 3.98 | |||
| 16 | 4773 | 68.1 | 0.433 | 0.00247 | 0.00776 | B-Proteobacteria | 37.01 | ||
| 17 | 4086 | 67.72 | 0.861 | 0.00022 | 0.00390 | 71.28 | |||
| 18 | 9105 | 64.06 | 0.512 | 0.31 | 0.00070 | 0.00038 | A-Proteobacteria | 7.07 | |
| 19 | 8264 | 64.92 | 0.381 | 0.01 | 0.00100 | 0.00163 | NA | ||
| 20 | 1177 | 57.35 | 0.472 | 0.00227 | 0.00312 | ||||
| 21 | 2052 | 39.43 | 0.033 | 0.048 | 0.00038 | 0.00599 | E-Proteobacteria | ||
| 22 | 1971 | 44.54 | 0.028 | 0.752 | 0.00745 | 0.00282 | |||
| 23 | 1777 | 30.31 | 0.574 | 0.23 | 0.00330 | 0.00436 | 8.69 | ||
| 24 | 1628 | 30.54 | 0.491 | 0.029 | 0.00250 | 0.00613 | NA | ||
| 25 | 1641 | 30.55 | 0.067 | 0.132 | 0.00296 | 0.00457 | 10.25 | ||
| 26 | Candidatus | 3944 | 56.17 | 0.258 | 0.133 | 0.00199 | 0.00157 | Firmicutes | NA |
| 27 | 4016 | 67.22 | 0.042 | 0.171 | 0.00396 | 0.00188 | A-Proteobacteria | 8.56 | |
| 28 | 1072 | 40.34 | 0.221 | 0.853 | 0.00107 | 0.00337 | Chlamydiae | 1.17 | |
| 29 | 1044 | 41.31 | 0.228 | 0.284 | 0.00230 | 0.00059 | 1.30 | ||
| 30 | 1144 | 39.87 | 0.534 | 0.00065 | 0.00361 | 0.57 | |||
| 31 | 2158 | 42.44 | 0.00592 | 0.00573 | G-Proteobacteria | NA | |||
| 32 | 1995 | 42.66 | 0.014 | 0.467 | 0.00198 | 0.00029 | 31.15 | ||
| 33 | 3730 | 57.84 | 0.59 | 0.00189 | 0.00322 | Firmicutes | 10.70 | ||
| 34 | 3462 | 63.01 | 0.3 | 0.159 | 0.00152 | 0.00106 | D-Proteobacteria | NA | |
| 35 | 3570 | 63.14 | 0.557 | 0.082 | 0.00143 | 0.00024 | 4.78 | ||
| 36 | 4368 | 56.77 | 0.167 | 0.388 | 0.00359 | 0.00044 | G-Proteobacteria | NA | |
| 37 | 4518 | 52.98 | 0.645 | 0.39 | 0.00169 | 0.00163 | NA | ||
| 38 | 4938 | 50.52 | 0.714 | 0.084 | 0.00062 | 0.00328 | 7.40 | ||
| 39 | 5082 | 50.55 | 0.779 | 0.576 | 0.00032 | 0.00070 | NA | ||
| 40 | 5231 | 50.48 | 0.112 | 0.92 | 0.00173 | 0.00080 | 5.66 | ||
| 41 | 4979 | 50.62 | 0.736 | 0.128 | 0.00205 | 0.00212 | NA | ||
| 42 | 4643 | 50.82 | 0.328 | 0.469 | 0.00151 | 0.00207 | |||
| 43 | 4639 | 50.79 | 0.732 | 0.587 | 0.00054 | 0.00113 | 4.28 | ||
| 44 | 5065 | 50.6 | 0.51 | 0.237 | 0.00076 | 0.00203 | 3.70 | ||
| 45 | 4646 | 50.8 | 0.873 | 0.729 | 0.00073 | 0.00091 | 12.64 | ||
| 46 | 7497 | 72.82 | 0.463 | 0.036 | 0.00141 | 0.00139 | Actinobacteria | NA | |
| 47 | 5433 | 70.08 | 0.808 | 0.662 | 0.00129 | 0.00017 | |||
| 48 | 1914 | 38.16 | 0.886 | 0.654 | 0.00089 | 0.00044 | G-Proteobacteria | ||
| 49 | 1813 | 38.04 | 0.544 | 0.038 | 0.00054 | 0.00317 | |||
| 50 | 1887 | 38.01 | 0.125 | 0.00005 | 0.01016 | ||||
| 51 | 1830 | 38.15 | 0.154 | 0.00298 | 0.00472 | 46.61 | |||
| 52 | 1553 | 38.18 | 0.596 | 0.00869 | 0.00164 | E-Proteobacteria | NA | ||
| 53 | 1799 | 35.93 | 0.161 | 0.00499 | 0.01518 | 46.54 | |||
| 54 | 1643 | 39.19 | 0.246 | 0.256 | 0.00259 | 0.00510 | 10.97 | ||
| 55 | 1993 | 34.72 | 0.382 | 0.00066 | 0.01644 | Firmicutes | 19.54 | ||
| 56 | 2291 | 46.22 | 0.023 | 0.00271 | 0.02882 | NA | |||
| 57 | 1856 | 49.69 | 0.491 | 0.264 | 0.00201 | 0.00087 | |||
| 58 | 1999 | 38.87 | 0.00122 | 0.01040 | |||||
| 59 | 2529 | 35.75 | 0.233 | 0.056 | 0.00352 | 0.00524 | |||
| 60 | 2438 | 35.86 | 0.399 | 0.521 | 0.00147 | 0.00136 | |||
| 61 | 4719 | 54.17 | 0.00490 | 0.01198 | Magnetococcus | ||||
| 62 | 4967 | 65.09 | 0.031 | 0.00339 | 0.00288 | A-Proteobacteria | 2.14 | ||
| 63 | 2971 | 55.72 | 0.03 | 0.916 | 0.00226 | 0.00135 | B-Proteobacteria | 10.57 | |
| 64 | 3304 | 63.59 | 0.145 | 0.00150 | 0.00287 | G-Proteobacteria | NA | ||
| 65 | 3268 | 57.8 | 0.00378 | 0.00609 | Actinobacteria | 7.04 | |||
| 66 | 5737 | 68.44 | 0.389 | 0.478 | 0.00030 | 0.00060 | NA | ||
| 67 | 4424 | 65.62 | 0.366 | 0.00006 | 0.00198 | ||||
| 68 | 5631 | 65.47 | 0.00433 | 0.00374 | |||||
| 69 | 996 | 31.45 | 0.18 | 0.615 | 0.00626 | 0.00021 | Tenericutes | 9.32 | |
| 70 | 580 | 31.69 | 0.148 | 0.01219 | 0.00433 | 3.75 | |||
| 71 | 897 | 28.52 | 0.033 | 0.599 | 0.01020 | 0.00067 | NA | ||
| 72 | 816 | 40.01 | 0.115 | 0.01767 | 0.00243 | 16.23 | |||
| 73 | 2153 | 52.69 | 0.07 | 0.033 | 0.00601 | 0.00144 | B-Proteobacteria | 9.20 | |
| 74 | 2273 | 51.52 | 0.695 | 0.00135 | 0.00806 | NA | |||
| 75 | 4406 | 61.72 | 0.332 | 0.53 | 0.00112 | 0.00041 | A-Proteobacteria | ||
| 76 | 3402 | 62.05 | 0.011 | 0.00323 | 0.00294 | 37.15 | |||
| 77 | 3481 | 50.32 | 0.02 | 0.056 | 0.00530 | 0.00243 | G-Proteobacteria | 8.39 | |
| 78 | 2661 | 48.49 | 0.992 | 0.318 | 0.00043 | 0.00162 | B-Proteobacteria | NA | |
| 79 | 6413 | 41.35 | 0.134 | 0.857 | 0.00129 | 0.00162 | Cyanobacteria | ||
| 80 | 5888 | 64.16 | 0.657 | 0.251 | 0.00078 | 0.00173 | G-Proteobacteria | 1.99 | |
| 81 | 6438 | 60.52 | 0.028 | 0.00443 | 0.00222 | 3.18 | |||
| 82 | 5959 | 61.86 | 0.602 | 0.013 | 0.00113 | 0.00187 | 36.81 | ||
| 83 | 2912 | 66.78 | 0.238 | 0.47 | 0.00483 | 0.00023 | B-Proteobacteria | NA | |
| 84 | 3716 | 67.04 | 0.056 | 0.00636 | 0.00581 | 22.40 | |||
| 85 | 4381 | 61.27 | 0.107 | 0.00175 | 0.01177 | A-Proteobacteria | 17.65 | ||
| 86 | 5057 | 61.09 | 0.00363 | 0.01196 | NA | ||||
| 87 | 1522 | 31.65 | 0.00859 | 0.01514 | 26.08 | ||||
| 88 | 1268 | 32.44 | 0.584 | 0.052 | 0.00294 | 0.00634 | 16.28 | ||
| 89 | 1257 | 32.47 | 0.575 | 0.00182 | 0.00767 | NA | |||
| 90 | 1111 | 28.92 | 0.919 | 0.00020 | 0.01395 | 26.15 | |||
| 91 | 4809 | 52.09 | 0.267 | 0.043 | 0.00151 | 0.00152 | G-Proteobacteria | 9.85 | |
| 92 | 4857 | 52.22 | 0.89 | 0.585 | 0.00043 | 0.00008 | 3.58 | ||
| 93 | 4519 | 51.21 | 0.571 | 0.00022 | 0.00249 | 11.05 | |||
| 94 | 4574 | 50.92 | 0.48 | 0.268 | 0.00147 | 0.00214 | NA | ||
| 95 | 2742 | 32.78 | 0.788 | 0.427 | 0.00130 | 0.00247 | Firmicutes | 0.10 | |
| 96 | 2499 | 32.1 | 0.01246 | 0.01087 | 21.12 | ||||
| 97 | 2685 | 32.79 | 0.00584 | 0.00643 | NA | ||||
| 98 | 2030 | 36.83 | 0.111 | 0.046 | 0.00403 | 0.00679 | |||
| 99 | 1860 | 38.73 | 0.619 | 0.15 | 0.00133 | 0.00154 | 3.71 | ||
| 100 | 1796 | 39.08 | 0.05 | 0.863 | 0.00537 | 0.00459 | 2.63 | ||
| 101 | 8667 | 72.12 | 0.037 | 0.00394 | 0.00134 | Actinobacteria | NA | ||
| 102 | 1860 | 46.25 | 0.171 | 0.00344 | 0.01548 | Thermotogae | 39.15 | ||
| 103 | 1824 | 46.09 | 0.733 | 0.00013 | 0.01687 | NA | |||
| 104 | 2909 | 66.07 | 0.962 | 0.086 | 0.00027 | 0.00059 | B-Proteobacteria | 5.70 | |
| 105 | 3024 | 47.78 | 0.069 | 0.00514 | 0.00105 | G-Proteobacteria | NA | ||
| 106 | 1332 | 37.03 | 0.037 | 0.00994 | 0.00491 | ||||
| 107 | 5076 | 65.07 | 0.196 | 0.719 | 0.00302 | 0.00038 | |||
| 108 | 4941 | 63.69 | 0.87 | 0.499 | 0.00104 | 0.00065 | |||
| 109 | 2679 | 52.68 | 0.04727 | 0.05291 | 62.97 | ||||
| 110 | 2519 | 51.78 | 0.044 | 0.00379 | 0.01093 | 6.44 | |||
| 111 | 4653 | 47.64 | 0.649 | 0.00090 | 0.00520 | NA | |||
| 112 | 4744 | 47.61 | 0.969 | 0.00124 | 0.00496 |
TB, termination bias. Chromosomes of bacteria analyzed in this study. KS test P-values for the difference between the frequency distributions of complementary nucleotides are given as KS (W) for A versus T and KS (S) for G versus C. In bacteria, archaea and eukaryotes, P-values of <10⁻⁴ (strong violation of ISFDP) are shown in bold and P-values of <0.01 but ≥10⁻⁴ (weak violation of ISFDP) are shown in italics. P-values between 10⁻⁴ and 10⁻³ are shown as 0.000. The relative absolute difference between the abundance values of complementary nucleotides is given by |(∑A − ∑T)|/(∑A + ∑T) and |(∑G − ∑C)|/(∑G + ∑C) for ATS and GCS, respectively. In the chromosome of X. fastidiosa 9a5c, the GCS/ATS value is the highest, suggesting that the difference between the abundance values of complementary nucleotides is high. The P-value from the KS test is concordant with the ATS/GCS value, suggesting that the abundance difference can be represented by the frequency distribution of the nucleotides. A similar relation is also observed in other chromosomes.
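The ATS and GCS values defined in the table note can be computed directly from whole-chromosome nucleotide counts. The sketch below assumes a plain string of A/T/G/C characters; the function name and the choice to return 0.0 when a pair is absent are illustrative assumptions, not taken from the study.

```python
from collections import Counter


def skew_values(seq):
    """Return (ATS, GCS): |A - T| / (A + T) and |G - C| / (G + C)."""
    c = Counter(seq.upper())
    a, t, g, cy = c["A"], c["T"], c["G"], c["C"]
    # Returning 0.0 for an empty pair is an assumption made for this sketch.
    ats = abs(a - t) / (a + t) if (a + t) else 0.0
    gcs = abs(g - cy) / (g + cy) if (g + cy) else 0.0
    return ats, gcs


# Toy input (hypothetical): three A versus one T, three G versus one C.
ats, gcs = skew_values("AAATGGGC")
```

Values close to zero indicate near-equal abundance of the complementary nucleotides, as in most rows of the table.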
ISFDP analysis in archaeal chromosomes
| Serial number | Strain name | Size (kb) | GC% | KS (W) | KS (S) | ATS: \|(∑A − ∑T)\|/(∑A + ∑T) | GCS: \|(∑G − ∑C)\|/(∑G + ∑C) | Archaeal group |
|---|---|---|---|---|---|---|---|---|
| 1 | 1670 | 56.3 | 0.025 | 0.01292 | 0.00695 | Crenarchaeota | ||
| 2 | 2179 | 48.5 | 0.037 | 0.093 | 0.00365 | 0.00350 | Euryarchaeota | |
| 3 | 2078 | 43.08 | 0.586 | 0.643 | 0.00146 | 0.00104 | Crenarchaeota | |
| 4 | Candidatus | 2543 | 54.51 | 0.058 | 0.191 | 0.00311 | 0.00108 | Euryarchaeota |
| 5 | 2046 | 57.34 | 0.101 | 0.00574 | 0.00161 | Crenarchaeota | ||
| 6 | 3132 | 62.35 | 0.252 | 0.905 | 0.01075 | 0.00024 | Euryarchaeota | |
| 7 | 2015 | 67.88 | 0.862 | 0.313 | 0.00056 | 0.00151 | ||
| 8 | 3133 | 47.85 | 0.578 | 0.027 | 0.00160 | 0.00523 | ||
| 9 | 1668 | 53.7 | 0.019 | 0.908 | 0.00531 | 0.00100 | Crenarchaeota | |
| 10 | 1298 | 56.5 | 0.118 | 0.901 | 0.00199 | 0.00014 | ||
| 11 | 2192 | 46.21 | 0.00668 | 0.01423 | Crenarchaeota | |||
| 12 | 1854 | 31.02 | 0.02048 | 0.03768 | Euryarchaeota | |||
| 13 | 1666 | 31.4 | 0.132 | 0.031 | 0.00450 | 0.01128 | ||
| 14 | 2576 | 40.74 | 0.078 | 0.00266 | 0.00845 | |||
| 15 | 1570 | 30.02 | 0.218 | 0.52 | 0.00399 | 0.00063 | ||
| 16 | 1781 | 32.99 | 0.065 | 0.00846 | 0.00454 | |||
| 17 | 1745 | 33.4 | 0.045 | 0.045 | 0.00553 | 0.00224 | ||
| 18 | 1773 | 33.27 | 0.256 | 0.784 | 0.00430 | 0.00088 | ||
| 19 | 1662 | 33.08 | 0.021 | 0.08 | 0.00619 | 0.00259 | ||
| 20 | 1721 | 31.31 | 0.505 | 0.519 | 0.00364 | 0.00400 | ||
| 21 | 1806 | 49.97 | 0.606 | 0.05 | 0.00097 | 0.00404 | ||
| 22 | 2479 | 62.04 | 0.816 | 0.745 | 0.00234 | 0.00000 | ||
| 23 | 1696 | 61.22 | 0.556 | 0.032 | 0.00230 | 0.00471 | ||
| 24 | 1880 | 53.53 | 0.673 | 0.00018 | 0.00595 | |||
| 25 | 5752 | 42.67 | 0.839 | 0.00628 | 0.00083 | |||
| 26 | 4838 | 39.27 | 0.00475 | 0.00391 | ||||
| 27 | 4097 | 41.47 | 0.252 | 0.812 | 0.00212 | 0.00079 | ||
| 28 | 1768 | 27.62 | 0.275 | 0.00897 | 0.00652 | |||
| 29 | 3545 | 45.14 | 0.015 | 0.00951 | 0.00411 | |||
| 30 | 1752 | 49.52 | 0.022 | 0.114 | 0.00566 | 0.00166 | ||
| 31 | 491 | 31.55 | 0.549 | 0.177 | 0.00000 | 0.00127 | Nanoarchaeota | |
| 32 | 2596 | 63.42 | 0.473 | 0.228 | 0.00174 | 0.00091 | Euryarchaeota | |
| 33 | 1646 | 31.15 | 0.00921 | 0.00855 | Crenarchaeota | |||
| 34 | 1546 | 35.96 | 0.296 | 0.00096 | 0.00793 | Euryarchaeota | ||
| 35 | 2223 | 51.34 | 0.00727 | 0.01022 | Crenarchaeota | |||
| 36 | 2122 | 54.98 | 0.795 | 0.431 | 0.00138 | 0.00316 | ||
| 37 | 2010 | 57.13 | 0.148 | 0.337 | 0.00294 | 0.00008 | ||
| 38 | 1827 | 49.58 | 0.305 | 0.436 | 0.00085 | 0.00183 | ||
| 39 | 1766 | 44.69 | 0.652 | 0.574 | 0.00219 | 0.00342 | Euryarchaeota | |
| 40 | 1909 | 40.75 | 0.754 | 0.757 | 0.00004 | 0.00094 | ||
| 41 | 1739 | 41.86 | 0.133 | 0.00229 | 0.01262 | |||
| 42 | 1571 | 35.71 | 0.02078 | 0.02726 | Crenarchaeota | |||
| 43 | 2227 | 36.69 | 0.413 | 0.526 | 0.00309 | 0.00124 | ||
| 44 | 2993 | 35.77 | 0.747 | 0.00533 | 0.00241 | |||
| 45 | 2695 | 32.78 | 0.029 | 0.00521 | 0.00659 | |||
| 46 | 2089 | 51.98 | 0.062 | 0.328 | 0.00418 | 0.00160 | Euryarchaeota | |
| 47 | 1782 | 57.66 | 0.014 | 0.00346 | 0.00665 | Crenarchaeota | ||
| 48 | 1565 | 45.99 | 0.016 | 0.016 | 0.00680 | 0.00383 | Euryarchaeota | |
| 49 | 1585 | 39.91 | 0.055 | 0.361 | 0.00404 | 0.00263 |
Chromosomes of archaea analyzed in this study. KS test P-values for the difference between the frequency distributions of complementary nucleotides are given as KS (W) for A versus T and KS (S) for G versus C. In bacteria, archaea and eukaryotes, P-values of <10⁻⁴ (strong violation of ISFDP) are shown in bold and P-values of <0.01 but ≥10⁻⁴ (weak violation of ISFDP) are shown in italics. P-values between 10⁻⁴ and 10⁻³ are shown as 0.000. The relative absolute difference between the abundance values of complementary nucleotides is given by |(∑A − ∑T)|/(∑A + ∑T) and |(∑G − ∑C)|/(∑G + ∑C) for ATS and GCS, respectively. In the chromosome of X. fastidiosa 9a5c, the GCS/ATS value is the highest, suggesting that the difference between the abundance values of complementary nucleotides is high. The P-value from the KS test is concordant with the ATS/GCS value, suggesting that the abundance difference can be represented by the frequency distribution of the nucleotides. A similar relation is also observed in other chromosomes.
ISFDP analysis in eukaryotic chromosomes
| Serial number | Strain name | Size (kb) | GC% | KS (W) | KS (S) | ATS: \|(∑A − ∑T)\|/(∑A + ∑T) | GCS: \|(∑G − ∑C)\|/(∑G + ∑C) | Eukaryotic group |
|---|---|---|---|---|---|---|---|---|
| 1 | 197 | 25.64 | 0.411 | 0.468 | 0.00080 | 0.00517 | Cryptophyta | |
| 2 | 181 | 26.7 | 0.435 | 0.35 | 0.00451 | 0.00356 | ||
| 3 | 175 | 26.81 | 0.671 | 0.403 | 0.00051 | 0.00622 | ||
| 4 | 270 | 62.84 | 0.055 | 0.01290 | 0.02500 | Euglenozoa | ||
| 5 | 644 | 20.52 | 0.69 | 0.02184 | 0.01210 | Alveolata | ||
| 6 | 1344 | 19.32 | 0.01288 | 0.01482 | ||||
| 7 | 2036 | 18.95 | 0.043 | 0.027 | 0.00339 | 0.00994 | ||
| 8 | 2272 | 19.31 | 0.05 | 0.677 | 0.00597 | 0.00376 | ||
| 9 | 2733 | 19.11 | 0.105 | 0.266 | 0.00422 | 0.00914 | ||
| 10 | 3292 | 18.43 | 0.258 | 0.062 | 0.00275 | 0.00730 | ||
| 11 | 231 | 39.14 | 0.731 | 0.088 | 0.00100 | 0.01231 | Fungi | |
| 12 | 1532 | 37.9 | 0.807 | 0.379 | 0.00240 | 0.00345 | ||
| 13 | 1091 | 38.05 | 0.285 | 0.85 | 0.00136 | 0.00080 | ||
| 14 | 1079 | 38.44 | 0.055 | 0.461 | 0.00325 | 0.00173 | ||
| 15 | 1092 | 38.13 | 0.181 | 0.64 | 0.00584 | 0.00387 | ||
| 16 | 5574 | 36.09 | 0.4 | 0.076 | 0.00073 | 0.00086 | ||
| 17 | 4510 | 35.92 | 0.461 | 0.825 | 0.00207 | 0.00039 | ||
| 18 | 2453 | 36.23 | 0.152 | 0.012 | 0.00217 | 0.00369 |
Chromosomes of eukaryotes analyzed in this study. KS test P-values for the difference between the frequency distributions of complementary nucleotides are given as KS (W) for A versus T and KS (S) for G versus C. In bacteria, archaea and eukaryotes, P-values of <10⁻⁴ (strong violation of ISFDP) are shown in bold and P-values of <0.01 but ≥10⁻⁴ (weak violation of ISFDP) are shown in italics. P-values between 10⁻⁴ and 10⁻³ are shown as 0.000. The relative absolute difference between the abundance values of complementary nucleotides is given by |(∑A − ∑T)|/(∑A + ∑T) and |(∑G − ∑C)|/(∑G + ∑C) for ATS and GCS, respectively. In the chromosome of X. fastidiosa 9a5c, the GCS/ATS value is the highest, suggesting that the difference between the abundance values of complementary nucleotides is high. The P-value from the KS test is concordant with the ATS/GCS value, suggesting that the abundance difference can be represented by the frequency distribution of the nucleotides. A similar relation is also observed in other chromosomes.
Figure 1 (A–E). Frequency distribution of nucleotides in chromosomes. Smooth curves present the group-frequency distribution of the four nucleotides a (square), t (asterisk), g (triangle) and c (rhombus). The X-axis represents the abundance values of the nucleotide spanning a range, whereas the Y-axis represents the frequency of the abundance values. In (A), the chromosome is AT rich; in (B), the chromosome has similar AT and GC content; and in (C), the chromosome is GC rich. This is also evident from the group-frequency distribution curves. The smooth frequency curves of complementary nucleotides in these chromosomes overlap each other. The KS test results for S and W nucleotides are shown separately adjacent to the figures [a(ii, iii)–e(ii, iii)]. The KS test is in concordance with the curves obtained by smoothing the group-frequency distributions. In (D) and (E), the group-frequency distributions for the chromosomes of two strains of X. fastidiosa are shown. In the 9a5c strain chromosome, the smooth frequency curves of the complementary nucleotides do not overlap, which is also indicated by the KS test. However, in the Temecula 1 strain chromosome, the parity is maintained.
Summary of ISFDP violations in chromosomes of Bacteria, Archaea and Eukaryotes
| Organism | Number of chromosomes | Number of chromosomes exhibiting ISFDP for both W and S | Number of chromosomes violating* ISFDP for both W and S | Number of chromosomes violating ISFDP only between S nucleotides | Number of chromosomes violating ISFDP only between W nucleotides |
|---|---|---|---|---|---|
| Bacteria | 112 | 60 | 15 (5a+8b+0c+2d) | 30 (13e+17f) | 07 (1g+6h) |
| Archaea | 49 | 30 | 06 (2a+2b+2c+0d) | 06 (0e+6f) | 07 (2g+5h) |
| Eukaryotes | 18 | 15 | 01 (0a+0b+0c+1d) | 01 (1e+0f) | 01 (0g+1h) |
*Violation of ISFDP includes both weak (10⁻² > P ≥ 10⁻⁴) and strong (P < 10⁻⁴) violations.
a, strong violation between S nucleotides as well as between W nucleotides.
b, strong violation between S nucleotides but weak violation between W nucleotides.
c, weak violation between S nucleotides but strong violation between W nucleotides.
d, weak violation between S nucleotides as well as between W nucleotides.
e, strong violation only between S nucleotides.
f, weak violation only between S nucleotides.
g, strong violation only between W nucleotides.
h, weak violation only between W nucleotides.
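The footnote categories follow mechanically from the two KS P-values of a chromosome. The sketch below encodes the thresholds stated in the footnotes (strong: P < 10⁻⁴; weak: 10⁻⁴ ≤ P < 10⁻²); the function names and the example P-values are hypothetical.

```python
def violation_level(p):
    """Classify one KS P-value using the thresholds given in the footnotes."""
    if p < 1e-4:
        return "strong"
    if p < 1e-2:
        return "weak"
    return "none"


# Keys are (level for S nucleotides, level for W nucleotides).
_CATEGORY = {
    ("strong", "strong"): "a",
    ("strong", "weak"): "b",
    ("weak", "strong"): "c",
    ("weak", "weak"): "d",
    ("strong", "none"): "e",
    ("weak", "none"): "f",
    ("none", "strong"): "g",
    ("none", "weak"): "h",
}


def footnote_category(p_w, p_s):
    """Return the footnote letter a-h, or None when ISFDP holds for W and S."""
    return _CATEGORY.get((violation_level(p_s), violation_level(p_w)))


# Hypothetical P-values: weak violation for W, strong violation for S.
example = footnote_category(p_w=0.005, p_s=0.00001)
```

Counting the letters returned over all chromosomes of a domain would reproduce the layout of the summary table, with `None` corresponding to the column of chromosomes exhibiting ISFDP for both W and S.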
Figure 2. A schematic representation of the coding sequence arrangements studied. In the upper row, the entire DNA strand is composed of forward-encoded sequences (black). Parity is not observed in this case. In the lower row, the DNA strand is made up of 50% forward-encoded sequences and 50% reverse-encoded sequences (white). Parity is observed in this case.
Figure 3 (A and B). Frequency distribution study of nucleotides in coding sequences. Smooth curves present the group-frequency distribution of the four nucleotides a (square), t (asterisk), g (triangle) and c (rhombus). The X-axis represents the abundance values of the nucleotide spanning a range, whereas the Y-axis represents the frequency of the abundance values. In (A), the frequency of the nucleotides in a DNA strand composed only of forward-encoded sequences of E. coli is shown (coding sequences analyzed for other chromosomes exhibited a similar feature). It is evident from (A) that the frequency distributions of the complementary nucleotides do not overlap. In (B), the frequency of the nucleotides is shown for the same DNA strand after 50% of the sequence was reverse complemented and joined with the rest (see the Materials and methods section). This resembles a strand composed of 50% forward-encoded sequences and 50% reverse-encoded sequences. It is evident from the figure that parity between the complementary nucleotides is observed in this case. These observations have been confirmed by the KS test.
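The construction behind panel (B) can be illustrated with a toy strand. The sketch below assumes that the second half of the sequence is the part that is reverse complemented and that the two halves are joined end to end; the source only points to its Materials and methods section for the details, so both choices, the function names and the toy sequence are assumptions.

```python
from collections import Counter

# Translation table mapping each base to its complement.
COMPLEMENT = str.maketrans("ATGC", "TACG")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case A/T/G/C string."""
    return seq.translate(COMPLEMENT)[::-1]


def half_reverse_complemented(seq):
    """Keep the first half as is and reverse complement the second half."""
    mid = len(seq) // 2
    return seq[:mid] + reverse_complement(seq[mid:])


# Toy 'forward-encoded only' strand with a strong A-over-T and G-over-C bias.
forward_only = "AAAG" * 10
mixed = half_reverse_complemented(forward_only)

forward_counts = Counter(forward_only)  # biased: no T and no C at all
mixed_counts = Counter(mixed)           # A matches T and G matches C
```

In this toy case the bias of the forward-only strand is cancelled exactly because both halves have the same composition; in real coding sequences the cancellation would only be approximate.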
Figure 4. Relative disproportionate composition of ORFs between Ws and Cs in chromosomes. The composition of ORFs in Ws and Cs of seven bacteria and two archaea was studied. The relative disproportionate composition was calculated by taking the difference in ORF numbers between the two strands and dividing it by the total number of ORFs present in both strands. In A. tumefaciens, the relative disproportionate value was found to be the minimum, suggesting that the difference in the number of ORFs between the strands is the smallest among the strains compared. In the archaeon S. marinus, the value was found to be the maximum among these nine strains. Both A. tumefaciens and S. marinus exhibited ISFDP violations, whereas no significant ISFDP violation was observed in E. coli and B. subtilis. A comparison between the strains of X. fastidiosa and H. influenzae is also shown.
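The relative disproportionate composition described in the caption reduces to a one-line ratio. The sketch below assumes that Ws and Cs denote the two strands and that the absolute difference is used; the function name and the ORF counts are hypothetical and do not come from the study.

```python
def relative_orf_disproportion(n_ws, n_cs):
    """Return |n_ws - n_cs| / (n_ws + n_cs) for ORF counts on the two strands."""
    total = n_ws + n_cs
    if total == 0:
        raise ValueError("at least one ORF is required")
    # Taking the absolute difference is an assumption of this sketch.
    return abs(n_ws - n_cs) / total


# Hypothetical counts: 60 ORFs on one strand and 40 on the other.
value = relative_orf_disproportion(60, 40)
```

A value near zero means the ORFs are split almost evenly between the strands, and a value near one means nearly all ORFs lie on a single strand.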