Jon Bohlin, Eystein Skjerve, David W Ussery.
Abstract
BACKGROUND: The recent explosion in the availability of bacterial genomic sequences now makes it possible to analyze genomic signatures across more than 800 bacterial chromosomes from a wide variety of environments.
Using genomic signatures, we compared 867 genomic DNA sequences pair-wise; the sequences were taken from chromosomes and plasmids more than 100,000 base pairs in length. Hierarchical clustering was performed on the outcome of the comparisons, and a multinomial regression model was then fitted. The regression model used the cluster groups as the response variable, with AT content, phyla, growth temperature, selective pressure, habitat, sequence size, oxygen requirement and pathogenicity as predictors.
Year: 2009 PMID: 19845945 PMCID: PMC2770534 DOI: 10.1186/1471-2164-10-487
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Regression models of genomic di-, tetra- and hexanucleotide frequencies and AT content
| Dinucleotides | $Y_2 = \exp(-6.42 - 8.64\,X_{AT} + 6.59\,X_{AT}^2)$ |
| Tetranucleotides | $Y_4 = \exp(-8.85 - 14.73\,X_{AT} + 12.39\,X_{AT}^2)$ |
| Hexanucleotides | $Y_6 = \exp(-11.74 - 21.94\,X_{AT} + 19.40\,X_{AT}^2)$ |
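The three fitted curves share the form exp(b0 + b1·x + b2·x²), so they can be evaluated with one small helper. The following is a minimal sketch, not the authors' code: the names `MODELS`, `predict` and `at_of_minimum` are hypothetical, and it assumes that the AT content is expressed as a fraction between 0 and 1 rather than as a percentage, which the table does not state.

```python
import math

# Coefficients (intercept, linear, quadratic) copied from the table above.
MODELS = {
    2: (-6.42, -8.64, 6.59),     # dinucleotides
    4: (-8.85, -14.73, 12.39),   # tetranucleotides
    6: (-11.74, -21.94, 19.40),  # hexanucleotides
}


def predict(k: int, at_fraction: float) -> float:
    """Evaluate Y_k = exp(b0 + b1*x + b2*x^2) for AT content x.

    Assumption: x is a fraction in [0, 1], not a percentage.
    """
    if not 0.0 <= at_fraction <= 1.0:
        raise ValueError("at_fraction must lie in [0, 1]")
    b0, b1, b2 = MODELS[k]
    return math.exp(b0 + b1 * at_fraction + b2 * at_fraction ** 2)


def at_of_minimum(k: int) -> float:
    """AT fraction at which the fitted curve is lowest.

    The quadratic coefficient is positive in all three models, so the
    vertex of the exponent, -b1 / (2 * b2), is a minimum.
    """
    _, b1, b2 = MODELS[k]
    return -b1 / (2.0 * b2)


if __name__ == "__main__":
    for k in sorted(MODELS):
        print(k, predict(k, 0.5), at_of_minimum(k))
```

Under the fraction assumption, each curve is U-shaped in AT content, with its lowest point somewhat above 50% AT.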
Figure 1. Cluster diagram of 867 prokaryotic genomic DNA sequences compared pair-wise using hexanucleotide-based genomic signatures. Hierarchical clustering was performed on the resulting 867 × 867 correlation matrix using average linkage and Euclidean distance. The cluster diagram was divided into segments, Groups 1-7, based on the cluster tree, which reflects how the prokaryotic DNA sequences compared pair-wise. Lighter colors mean higher correlation scores, and thus closer similarity between the compared genomes. The multi-colored horizontal bar on top indicates each chromosome's phylum, while the vertical red and blue bar shows AT/GC content, where red means GC content larger than 50% and blue means AT content larger than 50%. Groups 5 and 7 are mainly populated by free-living, GC-rich prokaryotes with diverse metabolic capabilities. Groups 1 and 3 consist predominantly of AT-rich, host-associated archaea and bacteria, while groups 2 and 6 consist mainly of larger host-associated γ-Proteobacteria. Group 4 is the smallest and most dissimilar group and contains many extremophiles.
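The clustering step named in the caption (average linkage with Euclidean distance on a correlation matrix) can be illustrated on a toy example. This is a naive pure-Python sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, the Euclidean distance is assumed to be taken between rows of the correlation matrix, and cutting the tree at a fixed number of clusters is an assumption, since the caption does not say how Groups 1-7 were delimited. The 4 × 4 matrix is made-up illustrative data.

```python
import math
from itertools import combinations


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def average_linkage(rows, n_clusters):
    """Naive agglomerative clustering with average linkage.

    rows: list of equal-length numeric vectors (here, rows of a
    correlation matrix). Returns a sorted list of clusters, each a
    sorted list of row indices.
    """
    n = len(rows)
    if not 1 <= n_clusters <= n:
        raise ValueError("n_clusters must be between 1 and len(rows)")
    # Pairwise distances between rows, computed once.
    dist = [[euclidean(rows[i], rows[j]) for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]
    while len(clusters) > n_clusters:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            # Average linkage: mean distance over all cross-cluster pairs.
            total = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
            d = total / (len(clusters[a]) * len(clusters[b]))
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        merged = sorted(clusters[a] + clusters[b])
        clusters = [c for idx, c in enumerate(clusters) if idx not in (a, b)]
        clusters.append(merged)
    return sorted(clusters)


# Toy correlation matrix for four hypothetical genomes (illustrative only).
corr = [
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.3, 0.2],
    [0.2, 0.3, 1.0, 0.8],
    [0.1, 0.2, 0.8, 1.0],
]

if __name__ == "__main__":
    print(average_linkage(corr, 2))
```

The loop is cubic in the number of sequences, so for a matrix of the size described in the caption a library routine such as SciPy's `scipy.cluster.hierarchy.linkage` with `method="average"` would be the practical choice.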
Figure 2. Average AT scores and OUV content in cluster groups. The graphs show average AT content (left) and OUV scores (right) on the vertical axis for each group on the horizontal axis. High OUV scores indicate strong bias in genomic hexanucleotide usage, while low scores imply more random DNA composition. Free-living archaea and bacteria (groups 5 and 7) obtain higher average OUV scores than host-associated ones (groups 1 and 3), indicating pronounced differences in mutational pressures between the respective environments. Average AT content was considerably higher in the host-associated groups than in the free-living groups.
Polychotomous regression model with added predictors to the far left
| Model 0: constant | -1534 | 0 | 0 | 3080 |
| Model 1: Size | -1475 | 0.04 | 95 | 2985 |
| Model 2: AT content | -796 | 0.48 | 1333 | 1652 |
| Model 3: OUV | -775 | 0.49 | 30 | 1622 |
| Model 4: Phyla | -433 | 0.72 | 455 | 1167 |
| Model 5: Oxygen req. | -414 | 0.73 | 15 | 1152 |
| Model 6: Habitat | -360 | 0.77 | 61 | 1091 |
| Model 7: Temperature | -320 | 0.79 | 56 | 1035 |
| Final model | -320 | 0.79 | - | 1035 |
The table shows a forward fitting of a set of predictors to the response variable representing the cluster groups.
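The column headers of this table are not preserved here, so the meaning of each column has to be inferred. The sketch below checks two readings that the numbers are consistent with: that the first numeric column is a log-likelihood and the second is McFadden's pseudo-R² (1 minus the ratio of the model log-likelihood to the constant-only log-likelihood), and that the third column is the drop in the fourth column relative to the previous model. Both readings are assumptions rather than statements from the source, the fourth column is treated as an unspecified fit criterion, and the names `TABLE`, `mcfadden_r2` and `check_table` are hypothetical.

```python
# Rows copied from the table: (added predictor, first column,
# second column, third column, fourth column). The interpretation of
# the columns is inferred, not taken from the source.
TABLE = [
    ("constant", -1534, 0.00, 0, 3080),
    ("Size", -1475, 0.04, 95, 2985),
    ("AT content", -796, 0.48, 1333, 1652),
    ("OUV", -775, 0.49, 30, 1622),
    ("Phyla", -433, 0.72, 455, 1167),
    ("Oxygen req.", -414, 0.73, 15, 1152),
    ("Habitat", -360, 0.77, 61, 1091),
    ("Temperature", -320, 0.79, 56, 1035),
]


def mcfadden_r2(ll_model, ll_null):
    """McFadden's pseudo-R^2: 1 - LL(model) / LL(constant-only model)."""
    return 1.0 - ll_model / ll_null


def check_table(rows, tol=0.005):
    """Return (name, r2_matches, drop_matches) for each row.

    r2_matches: the second column agrees with McFadden's pseudo-R^2
    to two-decimal rounding. drop_matches: the third column equals the
    decrease in the fourth column from the previous row.
    """
    ll_null = rows[0][1]
    results = []
    prev_criterion = None
    for name, ll, r2_reported, drop_reported, criterion in rows:
        r2_ok = abs(mcfadden_r2(ll, ll_null) - r2_reported) <= tol
        drop_ok = prev_criterion is None or prev_criterion - criterion == drop_reported
        results.append((name, r2_ok, drop_ok))
        prev_criterion = criterion
    return results


if __name__ == "__main__":
    for name, r2_ok, drop_ok in check_table(TABLE):
        print(f"{name:12s} pseudo-R2 consistent: {r2_ok}  drop consistent: {drop_ok}")
```

On this reading, AT content and phyla account for the two largest improvements in fit among the predictors added.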