Dan B Jensen, David W Ussery.
Abstract
BACKGROUND: Predicting the optimal habitat conditions for a given bacterium from its genome sequence alone would be valuable for both scientific and industrial purposes. One example of such a habitat adaptation is the requirement for oxygen. Despite the wide availability of genome data, there have been only a few attempts to predict bacterial oxygen requirements from genome sequences. Here, we describe a method for distinguishing aerobic, anaerobic and facultative anaerobic bacteria, based on genome sequence-derived input, using naive Bayesian inference. In contrast, other studies in the literature only demonstrate the ability to distinguish two classes at a time.
Keywords: Comparative genomics, oxygen requirements, prediction, Bayesian inference
Year: 2013 PMID: 26913185 PMCID: PMC4743139 DOI: 10.12688/f1000research.2-184.v1
Source DB: PubMed Journal: F1000Res ISSN: 2046-1402
Number of genomes of the three different oxygen requirement classifications included in this study.
| Classification | Number of included genomes |
|---|---|
| Aerobe | 175 |
| Anaerobe | 112 |
| Facultative | 91 |
Figure 1. Schematic overview of the two methods used to predict oxygen requirement in bacteria.
a) In the one-step prediction method, the genomes in the test set are assigned a posterior probability for each of the three included classifications, given their protein domain profile. Each genome is predicted to belong to the classification for which it has the highest posterior probability. b) In the two-step prediction method, the genomes in the test set are first assigned posterior probabilities for being able or unable to respire, based on their protein domain profile. Using a second model, those genomes found most likely to be capable of respiration are then assigned a posterior probability of belonging to the classifications Aerobe or Facultative.
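The two schemes in Figure 1 can be illustrated with a small sketch. It assumes binary presence/absence domain profiles, a Bernoulli likelihood with Laplace smoothing, and that genomes judged unable to respire are labelled Anaerobe. None of these details are stated in the text above, and the function names and toy profiles are hypothetical.

```python
import math

def train_nb(profiles, labels, alpha=1.0):
    """Fit a Bernoulli naive Bayes model (assumed form) on binary domain profiles."""
    n_features = len(profiles[0])
    model = {}
    for c in sorted(set(labels)):
        rows = [p for p, y in zip(profiles, labels) if y == c]
        prior = len(rows) / len(profiles)
        # Laplace-smoothed probability that each domain is present in class c
        present = [(sum(r[j] for r in rows) + alpha) / (len(rows) + 2 * alpha)
                   for j in range(n_features)]
        model[c] = (prior, present)
    return model

def posteriors(model, profile):
    """Return the normalised posterior probability of each class for one genome."""
    log_scores = {}
    for c, (prior, present) in model.items():
        score = math.log(prior)
        for x, p in zip(profile, present):
            score += math.log(p if x else 1.0 - p)
        log_scores[c] = score
    top = max(log_scores.values())  # subtract the max for numerical stability
    weights = {c: math.exp(s - top) for c, s in log_scores.items()}
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()}

def predict_one_step(model, profile):
    """One-step method: pick the class with the highest posterior."""
    post = posteriors(model, profile)
    return max(post, key=post.get)

def train_two_step(profiles, labels):
    """Two-step method: respiring vs. non-respiring, then Aerobe vs. Facultative."""
    # Assumption: Anaerobe genomes form the "non-respiring" group.
    step1_labels = ["non-respiring" if y == "Anaerobe" else "respiring" for y in labels]
    step1 = train_nb(profiles, step1_labels)
    respiring = [(p, y) for p, y in zip(profiles, labels) if y != "Anaerobe"]
    step2 = train_nb([p for p, _ in respiring], [y for _, y in respiring])
    return step1, step2

def predict_two_step(step1, step2, profile):
    post1 = posteriors(step1, profile)
    if post1["respiring"] <= post1["non-respiring"]:
        return "Anaerobe"
    post2 = posteriors(step2, profile)
    return max(post2, key=post2.get)

# Hypothetical toy data: three made-up protein domains per genome.
TOY_PROFILES = [[1, 0, 0], [1, 0, 0], [0, 1, 1], [0, 1, 1], [1, 1, 0], [1, 1, 0]]
TOY_LABELS = ["Aerobe", "Aerobe", "Anaerobe", "Anaerobe", "Facultative", "Facultative"]

one_step_model = train_nb(TOY_PROFILES, TOY_LABELS)
step1_model, step2_model = train_two_step(TOY_PROFILES, TOY_LABELS)
for profile in ([1, 0, 0], [0, 1, 1], [1, 1, 0]):
    print(profile,
          predict_one_step(one_step_model, profile),
          predict_two_step(step1_model, step2_model, profile))
```

In the two-step variant the second model only ever sees respiring genomes, so it does not have to separate Aerobe from Anaerobe at the same time as Aerobe from Facultative.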
Predictive performance, measured as the Matthews correlation coefficient (MCC), achieved when using N-fold cross-validation for one-step prediction.
All classes are predicted better than random chance, although aerobe and anaerobe bacteria are predicted clearly better than facultative anaerobe bacteria.
| Classification | Predictive performance (MCC) |
|---|---|
| Aerobe | 0.63 |
| Anaerobe | 0.76 |
| Facultative | 0.31 |
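For reference, the standard definition of the MCC for a single class is given below. It assumes each class is scored one-vs-rest, with TP, TN, FP and FN denoting the true positives, true negatives, false positives and false negatives for that class; the text does not state how the per-class values were obtained.

```latex
\mathrm{MCC} = \frac{TP \cdot TN - FP \cdot FN}
                    {\sqrt{(TP + FP)\,(TP + FN)\,(TN + FP)\,(TN + FN)}}
```

The coefficient ranges from -1 to 1, where 1 is a perfect prediction and 0 corresponds to random chance.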
Overview of how the different classes are predicted, when using the one-step method.
Aerobe bacteria are correctly predicted as aerobes in 87% of the cases and are mis-predicted as facultative anaerobes in 11% of the cases. Similarly, anaerobe bacteria are correctly predicted in 88% of the cases; mis-predictions of anaerobes as aerobes or as facultative anaerobes are equally frequent, each occurring in 6% of the cases. Facultative anaerobes are correctly predicted in only 35% of the cases and are most commonly mis-predicted as aerobes, in 44% of the cases.
| Prediction | Aerobe genomes | Anaerobe genomes | Facultative genomes |
|---|---|---|---|
| Aerobe | 137 (87%) | 6 (6%) | 43 (44%) |
| Anaerobe | 3 (2%) | 95 (88%) | 21 (21%) |
| Facultative | 17 (11%) | 7 (6%) | 34 (35%) |
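As a consistency check, the per-class MCC can be recomputed from the counts in the table above. The sketch below assumes a one-vs-rest scoring of each class, which the text does not confirm; the function and variable names are illustrative.

```python
import math

CLASSES = ["Aerobe", "Anaerobe", "Facultative"]

# ONE_STEP_COUNTS[predicted][actual], copied from the one-step table above
ONE_STEP_COUNTS = {
    "Aerobe":      {"Aerobe": 137, "Anaerobe": 6,  "Facultative": 43},
    "Anaerobe":    {"Aerobe": 3,   "Anaerobe": 95, "Facultative": 21},
    "Facultative": {"Aerobe": 17,  "Anaerobe": 7,  "Facultative": 34},
}

def mcc_one_vs_rest(counts, target):
    """MCC for `target` against all other classes pooled (assumed scoring scheme)."""
    total = sum(counts[p][a] for p in CLASSES for a in CLASSES)
    tp = counts[target][target]
    fp = sum(counts[target][a] for a in CLASSES if a != target)
    fn = sum(counts[p][target] for p in CLASSES if p != target)
    tn = total - tp - fp - fn
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

for name in CLASSES:
    print(f"{name}: MCC = {mcc_one_vs_rest(ONE_STEP_COUNTS, name):.3f}")
```

Under this assumption the recomputed values fall within about 0.01 of the reported 0.63, 0.76 and 0.31.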
Predictive performance, measured as the Matthews correlation coefficient (MCC), of the two-step Bayesian network for oxygen requirement prediction.
The performance for aerobe and anaerobe predictions is the same as for the one-step prediction method, but the performance for prediction of facultative anaerobes has increased from 0.31 to 0.39.
| Classification | MCC |
|---|---|
| Aerobe | 0.63 |
| Anaerobe | 0.76 |
| Facultative | 0.39 |
Overview of how the different classes are predicted, when using the two-step method.
Notice that the frequency of correctly predicted facultative anaerobes has not increased compared with the one-step method (33% vs. 35%), but that the fraction of aerobe and anaerobe bacteria erroneously predicted as facultative anaerobes has decreased (5% vs. 11% for aerobes, 4% vs. 6% for anaerobes). Thus the better performance in predicting facultative anaerobe genomes is due to increased accuracy in predicting aerobe and anaerobe bacteria rather than increased accuracy in predicting facultative anaerobe bacteria.
| Prediction | Aerobe genomes | Anaerobe genomes | Facultative genomes |
|---|---|---|---|
| Aerobe | 141 (90%) | 8 (7%) | 47 (48%) |
| Anaerobe | 8 (5%) | 96 (89%) | 18 (19%) |
| Facultative | 8 (5%) | 4 (4%) | 32 (33%) |
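The reasoning in the caption can be made concrete by comparing, for the Facultative class, how many predictions were correct and how many were false alarms under each method. The sketch below uses the counts from the two confusion tables; precision and recall are not reported in the text and are introduced here only as an illustration.

```python
CLASSES = ["Aerobe", "Anaerobe", "Facultative"]

# counts[predicted][actual], copied from the one-step and two-step tables
ONE_STEP = {
    "Aerobe":      {"Aerobe": 137, "Anaerobe": 6,  "Facultative": 43},
    "Anaerobe":    {"Aerobe": 3,   "Anaerobe": 95, "Facultative": 21},
    "Facultative": {"Aerobe": 17,  "Anaerobe": 7,  "Facultative": 34},
}
TWO_STEP = {
    "Aerobe":      {"Aerobe": 141, "Anaerobe": 8,  "Facultative": 47},
    "Anaerobe":    {"Aerobe": 8,   "Anaerobe": 96, "Facultative": 18},
    "Facultative": {"Aerobe": 8,   "Anaerobe": 4,  "Facultative": 32},
}

def class_summary(counts, target="Facultative"):
    """Count hits, false alarms and misses for one class, plus precision and recall."""
    tp = counts[target][target]
    fp = sum(counts[target][a] for a in CLASSES if a != target)  # others called target
    fn = sum(counts[p][target] for p in CLASSES if p != target)  # target called other
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": tp / (tp + fp), "recall": tp / (tp + fn)}

one_step_summary = class_summary(ONE_STEP)
two_step_summary = class_summary(TWO_STEP)
for label, s in (("one-step", one_step_summary), ("two-step", two_step_summary)):
    print(f"{label}: TP={s['tp']} FP={s['fp']} FN={s['fn']} "
          f"precision={s['precision']:.2f} recall={s['recall']:.2f}")
```

The number of correctly predicted facultative genomes is nearly unchanged between the methods, while the number of aerobe and anaerobe genomes wrongly called facultative is halved, which is consistent with the explanation given above for the higher MCC.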