Oleg N Reva, Burkhard Tümmler.
Abstract
BACKGROUND: Oligonucleotide frequencies have been shown to be conserved signatures of bacterial genomes; however, the underlying constraints have not yet been resolved in detail. In this paper we analyzed oligonucleotide usage (OU) biases in a comprehensive collection of 155 completely sequenced bacterial chromosomes, 316 plasmids and 104 phages.
Year: 2004 PMID: 15239845 PMCID: PMC487896 DOI: 10.1186/1471-2105-5-90
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Tetranucleotide usage patterns of […]. The deviation Δ of observed from expected counts, defined by eq. 6, is shown for all 256 tetranucleotides (16 × 16 cells) by color code (right bar). Tetranucleotides are grouped into 39 classes of equivalent structural features [13] and sorted by decreasing base stacking energy, row by row, starting at the upper left corner (class 39). Within a class, members are sorted alphabetically.
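The quantity plotted in such a pattern can be illustrated with a minimal Python sketch. It counts overlapping tetranucleotides and computes a relative deviation of observed from expected counts. Eq. 6 is not reproduced in this section, so the deviation used here, (observed − expected) / expected with equal expected counts for all 256 words, is an assumption made for illustration and not necessarily the authors' definition. The function names `tetranucleotide_counts` and `relative_deviation` are hypothetical.

```python
from itertools import product


def tetranucleotide_counts(seq):
    """Count overlapping 4-mers; words containing non-ACGT characters are skipped."""
    seq = seq.upper()
    # Start every one of the 256 possible words at zero so absent words are reported too.
    counts = {"".join(p): 0 for p in product("ACGT", repeat=4)}
    for i in range(len(seq) - 3):
        word = seq[i:i + 4]
        if word in counts:
            counts[word] += 1
    return counts


def relative_deviation(counts):
    """Relative deviation of observed from expected counts.

    Assumption: every word has the same expected count (total / 256).
    This stands in for the paper's eq. 6, which is not shown in this section.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no countable tetranucleotides")
    expected = total / len(counts)
    return {word: (n - expected) / expected for word, n in counts.items()}


if __name__ == "__main__":
    counts = tetranucleotide_counts("ACGTACGT")
    deviations = relative_deviation(counts)
    # Print the most over-represented words first.
    for word, dev in sorted(deviations.items(), key=lambda kv: -kv[1])[:4]:
        print(word, counts[word], round(dev, 2))
```

Arranging the 256 deviation values in a 16 × 16 grid, ordered by structural class as the caption describes, would additionally require the class assignments of [13], which are not given here.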
Figure 2. Pattern skew of DNA sequences of different length. (A) Pattern skew (n0_4mer PS) values determined for a comprehensive collection of sequences of bacterial chromosomes, plasmids and phages are plotted against sequence length on a logarithmic scale. The grey shaded area depicts the 95% confidence interval of variation of n0_4mer PS values in the complete chromosomal and plasmid sequences. Accession names of genomes discussed in the text are shown. Genomes whose n0_4mer PS values exceeded the corresponding n1_4mer PS values by more than 2.5 σPS (see equation 2) are shown in red. (B) The n0_4mer PS values determined for arbitrary loci randomly cut out of the E. coli K12 (blue) and B. subtilis 168 (red) chromosome sequences are compared with PS values determined for complete bacterial chromosomes.
Local and global PS of bacterial chromosomes
| Bacterial chromosome | Length (bp): total | Length (bp): coding | Length (bp): non-coding | Global n0_4mer PS (%): total | Global n0_4mer PS (%): coding | Global n0_4mer PS (%): non-coding | Local n0_4mer PS (%)*: median (inner quartiles, range) |
| | 1,669,695 | 1,490,824 | 178,871 | 4.92 | 4.46 | 7.81 | 9.61 (6.19 – 13.63, 4.37 – 41.24) |
| | 1,551,335 | 1,448,950 | 102,385 | 3.40 | 3.49 | 8.18 | 5.37 (4.15 – 7.36, 2.89 – 32.50) |
| | 4,214,814 | 3,684,952 | 529,862 | 2.50 | 2.40 | 5.27 | 23.71 (21.50 – 26.81, 4.36 – 47.16) |
| | 8,619,960 | 7,515,107 | 1,104,853 | 1.27 | 1.38 | 3.01 | 5.89 (5.18 – 7.09, 2.18 – 23.18) |
| | 1,641,481 | 1,555,799 | 85,682 | 2.26 | 2.49 | 15.10 | 15.86 (11.46 – 20.79, 1.80 – 28.85) |
| | 3,309,401 | 2,867,342 | 442,059 | 3.71 | 3.49 | 9.68 | 17.84 (16.45 – 21.99, 6.26 – 56.73) |
| | 4,639,221 | 4,096,745 | 542,476 | 2.15 | 2.44 | 5.85 | 15.01 (12.04 – 23.11, 2.86 – 49.02) |
| | 963,879 | 869,493 | 94,386 | 2.45 | 2.52 | 6.37 | Not applicable |
| | 2,410,873 | 1,982,808 | 428,065 | 15.97 | 15.75 | 14.45 | 31.40 (19.54 – 35.60, 3.46 – 45.97) |
| | 1,751,080 | 1,566,066 | 185,014 | 1.82 | 1.81 | 4.85 | 9.99 (6.89 – 12.90, 1.58 – 37.52) |
| | 6,181,863 | 5,439,657 | 742,206 | 2.75 | 2.22 | 5.85 | 10.95 (9.67 – 12.91, 2.14 – 22.30) |
| | 7,145,576 | 6,817,640 | 327,936 | 3.45 | 3.45 | 6.65 | 14.83 (11.77 – 17.05, 3.69 – 26.42) |
| | 2,814,816 | 2,357,692 | 457,124 | 2.12 | 2.04 | 3.78 | 22.21 (19.99 – 23.74, 7.27 – 38.55) |
| | 8,667,507 | 7,379,401 | 1,288,106 | 1.48 | 1.42 | 1.77 | 4.70 (3.51 – 6.52, 1.91 – 20.42) |
| | 2,679,306 | 2,244,990 | 434,316 | 24.27 | 21.00 | 36.74 | 50.64 (39.86 – 57.52, 7.41 – 68.29) |
| | 2,519,802 | 1,967,507 | 552,295 | 6.38 | 5.02 | 13.69 | 53.17 (41.73 – 59.23, 8.80 – 71.71) |
*PS values of n0_4mer patterns were calculated for 100 arbitrary loci (200 for E. coli and B. subtilis) of 5 to 1,000 kbp in size (median 289,752 bp, inner quartiles 114,406 – 477,801 bp).
Figure 3. Relation between PS and the length of oligonucleotide words. OU patterns were determined for typical bacterial genomes, represented by the chromosome sequences of B. subtilis 168, E. coli K12 and P. aeruginosa PAO1, and for 5 chromosomal sequences with anomalously high PS values.
Figure 4. Deviations of oligonucleotide usage patterns in local loci of two bacterial chromosomes. Lower panel: distances (eq. 8) between n0_4mer patterns calculated for local regions of the leading strand and the standard patterns determined for the clockwise replichor of the two bacterial chromosomes: (A) X. fastidiosa 9a5c; (B) P. putida KT2440. Local patterns were determined in 15 kbp sliding windows in steps of 7.5 kbp. The 95% confidence interval of distance values is depicted as the turquoise shaded area. The abscissa indicates the chromosome coordinates, starting from the putative replication origins (Ori). Positions of the putative chromosomal replication termini are marked Term. Upper panel: GC-skew between leading and lagging strands of the (A) X. fastidiosa 9a5c and (B) P. putida KT2440 chromosomes.
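The windowing scheme in this caption (15 kbp windows advanced in 7.5 kbp steps) can be sketched in Python as follows. The window and step sizes come from the caption. The GC-skew formula (G − C) / (G + C) is the commonly used definition and is an assumption here, since the authors' exact calculation is not given in this section. The function names `sliding_windows`, `gc_skew` and `windowed_gc_skew` are hypothetical.

```python
def sliding_windows(length, window=15000, step=7500):
    """Yield (start, end) coordinates of full-length windows over a sequence."""
    for start in range(0, length - window + 1, step):
        yield start, start + window


def gc_skew(seq):
    """GC-skew as (G - C) / (G + C); returns 0.0 when the sequence has no G or C.

    Assumption: this is the common definition, not necessarily the paper's.
    """
    seq = seq.upper()
    g = seq.count("G")
    c = seq.count("C")
    return (g - c) / (g + c) if (g + c) else 0.0


def windowed_gc_skew(seq, window=15000, step=7500):
    """Return a list of (window start, GC-skew) pairs along the sequence."""
    return [
        (start, gc_skew(seq[start:end]))
        for start, end in sliding_windows(len(seq), window, step)
    ]


if __name__ == "__main__":
    # Toy sequence with small windows so the output is easy to inspect.
    for start, skew in windowed_gc_skew("GGGGCCCC", window=4, step=2):
        print(start, skew)
```

With a step of half the window size, each position (away from the sequence ends) is covered by two windows. The same loop could compute a local n0_4mer pattern per window in place of the skew; the distance of eq. 8 is not shown here and is therefore left out.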
Figure 5. OUV values determined for n1_4mer patterns of 155 bacterial chromosomes, plotted against the mol% GC content. The curves depict the boundaries of the 95% confidence interval of OUV variation determined by eqs. 3 and 4.