Dan B Jensen, Tammi C Vesth, Peter F Hallin, Anders G Pedersen, David W Ussery.
Abstract
BACKGROUND: The preferred habitat of a given bacterium can provide a hint of which types of enzymes of potential industrial interest it might produce. These might include enzymes that are stable and active at very high or very low temperatures. Being able to predict this accurately from a genome sequence would thus allow an efficient and targeted search for production organisms, reducing the need for culturing experiments.
Year: 2012 PMID: 23282160 PMCID: PMC3521210 DOI: 10.1186/1471-2164-13-S7-S3
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. Phylogenetic relationships of 117 bacteria from the four different thermophilicity classes. The relationships are based on predicted 16S rRNA sequences. Red tips are hyperthermophiles, orange are thermophiles, green are mesophiles and blue are psychrophiles. The purple lines indicate the species exemplifying evolutionary flexibility, as discussed in the text.
The number of protein families found to be overrepresented in each of the three thermophilicity classes.
| Class of overrepresentation | Number of protein families |
|---|---|
| Thermophiles | 8 |
| Mesophiles | 0 |
| Psychrophiles | 32 |
Overrepresentation is defined as presence in more than 65% of the genomes in one class, combined with a significantly (p < 0.01) lower frequency in each of the other classes.
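The overrepresentation criterion can be sketched in Python. The caption does not name the statistical test, so this sketch assumes a one-sided Fisher's exact test of the family's presence in the target class against each other class in turn. The `counts` layout and the function names `fisher_greater` and `is_overrepresented` are hypothetical.

```python
from math import comb


def fisher_greater(a, b, c, d):
    """One-sided Fisher's exact test p-value for the 2x2 table
        [[a, b],   # target class: genomes with / without the family
         [c, d]]   # other class:  genomes with / without the family
    i.e. P(X >= a) under the hypergeometric null (assumed test)."""
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    return sum(comb(row1, x) * comb(n - row1, col1 - x)
               for x in range(a, min(row1, col1) + 1)) / denom


def is_overrepresented(counts, target, threshold=0.65, alpha=0.01):
    """counts maps class name -> (genomes containing the family, genomes in class).
    Hypothetical helper: True if the family occurs in more than `threshold` of the
    target class AND at a significantly lower frequency in every other class."""
    present, total = counts[target]
    if present / total <= threshold:
        return False
    for cls, (other_present, other_total) in counts.items():
        if cls == target:
            continue
        p = fisher_greater(present, total - present,
                           other_present, other_total - other_present)
        if p >= alpha:  # not significant against this class
            return False
    return True
```

For example, a family found in 14 of 15 thermophiles, 5 of 60 mesophiles and 1 of 20 psychrophiles (invented counts) passes both the 65% threshold and the significance test against both other classes. Testing each other class separately mirrors the "in each of the other classes" wording; pooling the other classes into one group would be a different, less strict criterion.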
Predictive performance of the naïve Bayesian inference program when using as likelihoods (a) Gaussian likelihood functions of the observed structural characteristics alone, (b) the observed protein family frequencies alone, and (c) the observed protein family frequencies combined with the Gaussian likelihood functions of the observed structural characteristics.
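| Likelihoods | Class | Test set | |
|---|---|---|---|
| (a) Structural characteristics | Thermophiles | 0.24 | 80.0 |
| (a) Structural characteristics | Mesophiles | 0.36 | 50.0 |
| (a) Structural characteristics | Psychrophiles | 0.47 | 25.0 |
| (b) Protein family frequencies | Thermophiles | 0.60 | 92.9 |
| (b) Protein family frequencies | Mesophiles | 0.13 | 28.6 |
| (b) Protein family frequencies | Psychrophiles | 0.51 | 50.0 |
| (c) Combined | Thermophiles | 0.67 | 92.0 |
| (c) Combined | Mesophiles | 0.40 | 57.1 |
| (c) Combined | Psychrophiles | 0.68 | 50.0 |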
(For the individual predictions, see Additional files 5, 6 and 7.)
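The three configurations (a) to (c) can be illustrated with a minimal naïve Bayes sketch. Several details are assumptions not stated in the source: uniform class priors, Laplace smoothing (+1/+2) of the protein family frequencies, a Bernoulli term for each family's presence or absence, and the toy feature values and family IDs. The functions `fit`, `log_gaussian` and `predict` are hypothetical, not the authors' program.

```python
import math
from statistics import mean, stdev


def fit(genomes, family_ids):
    """genomes: list of (label, features, families); features is a list of floats
    (structural characteristics), families a set of protein family IDs.
    Each class needs at least two genomes so that stdev is defined."""
    model = {}
    for label in {g[0] for g in genomes}:
        rows = [(feats, fams) for lab, feats, fams in genomes if lab == label]
        n = len(rows)
        # Per-feature Gaussian (mean, sd); the sd floor avoids division by zero.
        gauss = [(mean(col), max(stdev(col), 1e-6))
                 for col in zip(*(feats for feats, _ in rows))]
        # Laplace-smoothed presence frequency of each family (assumption).
        freq = {fid: (sum(fid in fams for _, fams in rows) + 1) / (n + 2)
                for fid in family_ids}
        model[label] = (gauss, freq)
    return model


def log_gaussian(x, mu, sd):
    """Log of the normal density N(x; mu, sd)."""
    return -math.log(sd * math.sqrt(2 * math.pi)) - (x - mu) ** 2 / (2 * sd ** 2)


def predict(model, features, families, use_structure=True, use_families=True):
    """Return the class with the highest log posterior (uniform priors assumed,
    so they cancel). The two flags select configuration (a), (b) or (c)."""
    scores = {}
    for label, (gauss, freq) in model.items():
        score = 0.0
        if use_structure:
            score += sum(log_gaussian(x, mu, sd)
                         for x, (mu, sd) in zip(features, gauss))
        if use_families:
            score += sum(math.log(p if fid in families else 1 - p)
                         for fid, p in freq.items())
        scores[label] = score
    return max(scores, key=scores.get)


# Toy data (invented numbers and family IDs, for illustration only).
toy = [
    ("thermophile", [0.9, 60.0], {"A"}),
    ("thermophile", [1.1, 62.0], {"A"}),
    ("mesophile", [0.0, 40.0], set()),
    ("mesophile", [0.2, 42.0], {"B"}),
    ("psychrophile", [-1.0, 30.0], {"B"}),
    ("psychrophile", [-1.2, 28.0], {"B"}),
]
model = fit(toy, {"A", "B"})
print(predict(model, [1.0, 61.0], {"A"}))  # thermophile
```

Working in log space turns the naïve product of per-feature likelihoods into a sum and avoids numerical underflow when many protein families contribute. Configuration (a) corresponds to `use_families=False`, and configuration (b) to `use_structure=False`.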
Figure 2. Pearson's correlation coefficients between thermophilicity class-associated protein families, shown as a heat map. Lighter colors indicate stronger correlations. The top seven protein families (including 2075, 4698, 1149, 6954, 11184 and 14495) were all found to be overrepresented in thermophile genomes. The remaining protein families were overrepresented in psychrophile genomes. Families associated with the same thermophilicity class tend to correlate moderately with each other and anti-correlate moderately with families associated with other classes.
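Figure 2 is built from pairwise Pearson coefficients. A minimal sketch, assuming each protein family is encoded as a 0/1 presence vector over the same ordered list of genomes (for binary vectors Pearson's r equals the phi coefficient). The family names and the helper names `pearson` and `correlation_matrix` are hypothetical placeholders.

```python
import math


def pearson(x, y):
    """Pearson's r between two equal-length numeric vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)  # undefined (ZeroDivisionError) for constant vectors


def correlation_matrix(profiles):
    """profiles: dict family -> 0/1 presence vector over the same ordered genomes.
    Returns the family names and the full r matrix (the heat map's values)."""
    names = list(profiles)
    return names, [[pearson(profiles[a], profiles[b]) for b in names] for a in names]


# Hypothetical presence/absence profiles over six genomes.
profiles = {
    "fam_T1": [1, 1, 1, 0, 0, 0],
    "fam_T2": [1, 1, 0, 0, 0, 0],
    "fam_P1": [0, 0, 0, 1, 1, 1],
}
names, matrix = correlation_matrix(profiles)
```

Here `fam_T1` and `fam_T2` correlate positively (r is about 0.71) while `fam_T1` and `fam_P1` are perfectly anti-correlated (r = -1), the same qualitative pattern the caption describes between families of the same and of different classes. On Python 3.10 and later, `statistics.correlation` computes the same coefficient.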
Figure 3. Pearson's correlation coefficients of sequence features, presented as a heat map. Lighter color indicates stronger correlation. The blue lines separate the amino acids from the codons, while the green lines separate the codons from the genome size and the AT content (genomic and 16S rDNA).
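Figure 3 correlates per-genome sequence features. The sketch below shows one way two of them, AT content and codon frequencies, might be computed from raw sequence. The function names, ignoring non-ACGT characters, and dropping a trailing partial codon are assumptions, not the authors' documented procedure.

```python
from collections import Counter


def at_content(seq):
    """Fraction of A+T among unambiguous bases (non-ACGT characters ignored)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return (seq.count("A") + seq.count("T")) / acgt


def codon_frequencies(cds):
    """Relative frequency of each codon in an in-frame coding sequence;
    a trailing partial codon is dropped."""
    cds = cds.upper()
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    counts = Counter(codons)
    return {codon: n / len(codons) for codon, n in counts.items()}


print(at_content("ATGCATTA"))              # 0.75
print(codon_frequencies("ATGAAAAAATAAG"))  # {'ATG': 0.25, 'AAA': 0.5, 'TAA': 0.25}
```

The same `at_content` call would apply to a whole genome or to a 16S rDNA gene alone. Computed for every genome, each feature becomes one vector, and pairwise Pearson coefficients between such vectors would give a matrix of the kind shown in Figure 3.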