Babak Shahbaba, Radford M Neal.
Abstract
BACKGROUND: We investigate whether annotation of gene function can be improved by a classification scheme that takes into account the hierarchical organization of functional classes. The classifiers use phylogenetic descriptors, sequence-based attributes, and predicted secondary structure. We discuss three Bayesian models and compare their predictive accuracy: the ordinary multinomial logit (MNL) model, a hierarchical model based on a set of nested MNL models, and an MNL model with a prior that introduces correlations between the parameters of classes that are nearby in the hierarchy. We also provide a new scheme for combining different sources of information. We use these models to predict the functional class of Open Reading Frames (ORFs) from the E. coli genome.
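To make the first two models concrete, here is a minimal sketch in plain Python of how an ordinary MNL model and a nested (tree) MNL model turn a feature vector into class probabilities for fixed parameter values. The function names, the `(alphas, betas)` parameter layout, and the toy hierarchy are hypothetical illustrations, not the authors' code; the paper treats the parameters in a Bayesian framework, which this sketch does not attempt.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical layout: per-choice intercepts and per-choice coefficient vectors.
Params = Tuple[List[float], List[List[float]]]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max score before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mnl_probs(x: Sequence[float], alphas: Sequence[float],
              betas: Sequence[Sequence[float]]) -> List[float]:
    """Ordinary MNL: P(y = j | x) is proportional to exp(alpha_j + x . beta_j)."""
    scores = [a + sum(xi * bi for xi, bi in zip(x, b))
              for a, b in zip(alphas, betas)]
    return softmax(scores)


def tree_mnl_leaf_probs(x: Sequence[float], node_params: Dict[str, Params],
                        children: Dict[str, List[str]], node: str = "root",
                        prob: float = 1.0,
                        out: Optional[Dict[str, float]] = None) -> Dict[str, float]:
    """Nested MNL sketch: at each internal node an MNL chooses among its children;
    a leaf's probability is the product of the choice probabilities on its path."""
    if out is None:
        out = {}
    kids = children.get(node, [])
    if not kids:  # no children: this node is a leaf class
        out[node] = prob
        return out
    alphas, betas = node_params[node]
    for kid, p in zip(kids, mnl_probs(x, alphas, betas)):
        tree_mnl_leaf_probs(x, node_params, children, kid, prob * p, out)
    return out
```

Because each internal node's choice probabilities sum to one, the leaf probabilities of the nested model also sum to one, just as the flat MNL probabilities do.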
Year: 2006 PMID: 17038174 PMCID: PMC1618412 DOI: 10.1186/1471-2105-7-448
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
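Comparison of the models' predictive accuracy (%) at each level of the functional hierarchy, using each data source separately (SEQ: sequence-based attributes; STR: predicted secondary structure; SIM: phylogenetic descriptors).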
| Accuracy (%) | SEQ Level 1 | SEQ Level 2 | SEQ Level 3 | STR Level 1 | STR Level 2 | STR Level 3 | SIM Level 1 | SIM Level 2 | SIM Level 3 |
|---|---|---|---|---|---|---|---|---|---|
| Baseline | 42.56 | 21.21 | 8.15 | 42.56 | 21.21 | 8.15 | 42.56 | 21.21 | 8.15 |
| MNL | 60.25 | 33.99 | 20.93 | 50.98 | 25.14 | 15.87 | 69.10 | 45.79 | 30.76 |
| treeMNL | 59.27 | 34.13 | 18.26 | 52.67 | 27.39 | 16.29 | 67.70 | 45.93 | 30.34 |
| corMNL | | | | | | | | | |
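Accuracy is reported separately at each level of the functional hierarchy. A minimal sketch of one way to score this, assuming (hypothetically) that each class label is a dot-separated code such as `1.2.3` whose k-th component is the level-k class, and that every code has at least `level` components: a prediction counts as correct at level k when its first k components match those of the true label. The function name is illustrative.

```python
from typing import Sequence


def level_accuracy(true_codes: Sequence[str], pred_codes: Sequence[str],
                   level: int) -> float:
    """Percentage of ORFs whose predicted code matches the true code on its
    first `level` dot-separated components (hypothetical label format)."""
    if not true_codes or len(true_codes) != len(pred_codes):
        raise ValueError("need two non-empty label lists of equal length")
    # Compare prefixes of the class path; True counts as 1 in the sum.
    hits = sum(
        t.split(".")[:level] == p.split(".")[:level]
        for t, p in zip(true_codes, pred_codes)
    )
    return 100.0 * hits / len(true_codes)
```

Because a match at level k implies a match at every shallower level, this measure can only stay the same or fall as the level increases, which is the pattern seen in every row of the table.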
Comparison of the models' predictive accuracy (%) at the coverage (%) given in parentheses below each column heading. The C5 results and the coverage values are from [3].
| Accuracy (%) | SEQ Level 1 | SEQ Level 2 | SEQ Level 3 | STR Level 1 | STR Level 2 | STR Level 3 | SIM Level 1 | SIM Level 2 | SIM Level 3 |
|---|---|---|---|---|---|---|---|---|---|
| Coverage (%) | (20) | (18) | (4) | (10) | (1) | (5) | (29) | (26) | (16) |
| C5 | 64 | 63 | 41 | 59 | 44 | 17 | 75 | 74 | 69 |
| MNL | 81 | 79 | 88 | 67 | 96 | ||||
| treeMNL | 81 | 76 | 70 | 70 | 86 | 69 | 95 | 87 | 84 |
| corMNL | 82 | ||||||||
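Coverage is the percentage of test ORFs for which a prediction is made, and accuracy is computed only on those ORFs. The source does not say how the probabilistic models were restricted to a given coverage; one common approach, assumed in this sketch, is to rank ORFs by the predictive probability of their predicted class and keep the most confident fraction. The function name and arguments are hypothetical.

```python
import math
from typing import Sequence


def accuracy_at_coverage(confidences: Sequence[float], correct: Sequence[bool],
                         coverage: float) -> float:
    """Accuracy (%) on the `coverage` percent of cases predicted with the highest
    confidence. Ties keep their original order (Python's sort is stable)."""
    n = len(confidences)
    if n == 0 or n != len(correct) or not 0 < coverage <= 100:
        raise ValueError("need matching non-empty inputs and 0 < coverage <= 100")
    k = max(1, math.ceil(n * coverage / 100.0))  # number of cases kept
    order = sorted(range(n), key=lambda i: confidences[i], reverse=True)
    kept = order[:k]
    return 100.0 * sum(correct[i] for i in kept) / k
```

Lowering the coverage keeps only the most confident predictions, which is why accuracy typically rises as coverage falls.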
Accuracy (%) of models on the combined dataset with and without separate scale parameters. Results using SIM alone are provided for comparison.
| Accuracy (%) | SIM only, Level 1 | SIM only, Level 2 | SIM only, Level 3 | Combined, single scale parameter, Level 1 | Combined, single scale parameter, Level 2 | Combined, single scale parameter, Level 3 | Combined, separate scale parameters, Level 1 | Combined, separate scale parameters, Level 2 | Combined, separate scale parameters, Level 3 |
|---|---|---|---|---|---|---|---|---|---|
| MNL | 69.10 | 45.79 | 30.76 | 69.66 | 48.88 | 32.02 | 70.65 | | 33.71 |
| treeMNL | 67.70 | 45.93 | 30.34 | 68.26 | 46.63 | 30.34 | 68.82 | 46.63 | 31.74 |
| corMNL | | | | | | | | | |
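The table contrasts a single scale parameter with separate scale parameters when the three data sources are combined. One plausible reading, assumed here rather than taken from the source, is that a scale parameter is the standard deviation of a zero-mean Gaussian prior on the MNL coefficients, and that "separate" means each source (SEQ, STR, SIM) gets its own scale. The sketch below evaluates that log prior for fixed scales; the hyperpriors a full Bayesian treatment would place on the scales are omitted, and all names are hypothetical.

```python
import math
from typing import Dict, Sequence


def gaussian_log_prior(coefs_by_source: Dict[str, Sequence[float]],
                       scale_by_source: Dict[str, float]) -> float:
    """Sum of log N(beta; 0, sigma_s^2) over all coefficients, where sigma_s is the
    (assumed) scale of the data source s that the coefficient's feature comes from."""
    total = 0.0
    for source, coefs in coefs_by_source.items():
        sigma = scale_by_source[source]
        for b in coefs:
            total += -0.5 * math.log(2 * math.pi * sigma ** 2) - b ** 2 / (2 * sigma ** 2)
    return total
```

A single scale corresponds to mapping every source to the same value, e.g. `{s: 1.0 for s in coefs}`. With separate scales, a source whose features carry little information can be given a small scale, shrinking its coefficients toward zero without also shrinking those of the more informative sources.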
Predictive accuracy (%) for different coverage values (%) of the corMNL model using all three sources with separate scale parameters.
| Accuracy (%) | Coverage (%) | | | | | |
|---|---|---|---|---|---|---|
| Level 1 | 100 | 98 | 96 | 92 | 76 | 73 |
| Level 2 | 100 | 98 | 96 | 71 | 53 | 49 |
| Level 3 | 100 | 97 | 80 | 52 | 36 | 34 |
Figure 1. The corMNL model for a simple hierarchy. The coefficient parameter for each class is the sum of parameters at different levels of the hierarchy.
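The construction in the Figure 1 caption can be written out directly: each class's coefficient vector is the sum of the parameter vectors attached to the nodes on its path from the root, so sibling classes share their parent's term and are correlated a priori. A minimal sketch, assuming a two-level toy hierarchy and no parameter at the root (a term shared by every class would add the same amount to every class's score and cancel in the softmax); the names `phi`, `parent`, and the hierarchy itself are hypothetical.

```python
from typing import Dict, List, Optional


def path_to_root(node: str, parent: Dict[str, Optional[str]]) -> List[str]:
    """Nodes from `node` up to, but excluding, the root (whose parent is None)."""
    path = []
    while parent[node] is not None:
        path.append(node)
        node = parent[node]
    return path


def cormnl_coefficients(phi: Dict[str, List[float]],
                        parent: Dict[str, Optional[str]],
                        leaves: List[str]) -> Dict[str, List[float]]:
    """beta_leaf = elementwise sum of the phi vectors on the leaf's path, so
    classes sharing an ancestor share that ancestor's phi term."""
    betas = {}
    for leaf in leaves:
        vecs = [phi[n] for n in path_to_root(leaf, parent)]
        betas[leaf] = [sum(components) for components in zip(*vecs)]
    return betas
```

For example, with `root -> A, B`, `A -> c1, c2`, and `B -> c3, c4`, the coefficients of `c1` and `c2` both include `phi["A"]`, so placing a prior on the `phi` vectors induces a prior correlation between those two classes' coefficients.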