Meng Hu, Kwangmin Choi, Wei Su, Sun Kim, Jiong Yang.
Abstract
BACKGROUND: Mining gene patterns that are common to multiple genomes is an important biological problem that can lead to novel biological insights. When a family classification of genes is available, the problem resembles the pattern mining problem studied in the data mining community. When no family classification is available, however, mining gene patterns is challenging. Several well-developed algorithms, such as FISH and DAGchainer, predict gene patterns in a pair of genomes. These algorithms formulate the task as an optimization problem and solve it with dynamic programming. Unfortunately, extending them to multiple genomes is not trivial because time and space complexity grow rapidly.
Year: 2008 PMID: 18302784 PMCID: PMC2279103 DOI: 10.1186/1471-2105-9-124
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. DISPattern algorithm. Flowchart of the DISPattern algorithm.
Figure 2. Accuracy of DISPattern w.r.t. support. Different support thresholds yield different sets of DISPattern patterns, and applying these different pattern sets may produce different gene annotations. This figure shows the recall and precision of the gene annotation w.r.t. the support threshold. Since each gene is assigned to exactly one group, the precision is the same as the recall.
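The Figure 2 caption's remark that precision equals recall can be checked with a small sketch. It assumes micro-averaged definitions (correct assignments divided by the number of predictions for precision, and by the number of genes with a known group for recall); the function name, gene names, and group labels below are hypothetical and not taken from the paper.

```python
def precision_recall(predicted, truth):
    """Micro-averaged precision and recall over genes (assumed definitions).

    predicted: dict mapping gene -> predicted group; unassigned genes are absent.
    truth:     dict mapping gene -> true group.
    """
    # A prediction counts as correct when it matches the gene's true group.
    correct = sum(1 for gene, group in predicted.items() if truth.get(gene) == group)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(truth) if truth else 0.0
    return precision, recall


# Hypothetical toy data: four genes, each with one true group.
truth = {"g1": "A", "g2": "B", "g3": "C", "g4": "D"}

# Every gene receives exactly one group, so both denominators are 4.
full = {"g1": "A", "g2": "B", "g3": "X", "g4": "D"}
p_full, r_full = precision_recall(full, truth)

# One gene is left unassigned, so the denominators differ (3 vs. 4).
partial = {"g1": "A", "g2": "B", "g3": "X"}
p_part, r_part = precision_recall(partial, truth)

print(p_full, r_full)
print(p_part, r_part)
```

When every gene receives exactly one assignment, the two denominators coincide and the two measures are necessarily equal; they diverge only if some genes are left unassigned.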
Figure 3. Accuracy of DISPattern w.r.t. top-k ortholog candidates. When assigning a gene, we use the k most common orthologs as candidates. This figure shows the accuracy when different numbers of ortholog candidates are used.
Results on two example target genomes
| Genome | NC_000962 | NC_002929 |
| No. of genes | 2756 | 2723 |
| Recall of BBH | 86.8% | 89.4% |
| Recall of DISPattern | 93.5% | 91.0% |
| Precision of BBH | 86.8% | 89.4% |
| Precision of DISPattern | 93.5% | 91.0% |
| No. of errors by BBH | 186 | 106 |
| No. of errors by DISPattern | 66 | 61 |
Example genes correctly predicted by DISPattern but not predicted by BBH (target genome NC_000962)
| Gene(id) | COG | Pattern | Occurrence |
| Rv2141c (id: 57116951) | COG0624 Acetylornithine deacetylase Succinyl-diaminopimelate desuccinylase and related deacylases | {COG0167 COG0624 COG1881} | NC_002755 |
| Rv0558 (id: 57116754) | COG2226 Methylase involved in ubiquinone/menaquinone biosynthesis | {COG0438 COG2226} | NC_002935 NC_004369 |
| Rv1319c (id: 15608459) | COG2114 Adenylate cyclase, family 3 (some proteins contain HAMP domain) | {COG1637 COG2114 COG2114 COG2114} | NC_002755 |
| Rv1034c (id: 15608174) | COG3039 Transposase and inactivated derivatives, IS5 family | {COG0642 COG0745 COG2156 COG2216 COG3039} | NC_002755 |
| nrp (id: 15607243) | COG3320 Putative dehydrogenase domain of multifunctional non-ribosomal peptide synthetases and related enzymes | {COG0227 COG0474 COG0523 COG0664 COG2217 COG3320 COG3336} | NC_002755 |
Example genes correctly predicted by DISPattern but not predicted by BBH (target genome NC_002929)
| Gene(id) | COG | Pattern | Occurrence |
| BP0202 (id: 33591446), BP0203 (id: 33591447), BP0210 (id: 33591454), BP0211 (id: 33591455) | COG2801 Transposase and inactivated derivatives | {COG2801 COG2801} | NC_005090 NC_002662 |
| BP0153 (id: 33591402) | COG3565 Predicted dioxygenase of extradiol dioxygenase family | {COG0111 COG0583 COG0642 COG3019 COG3565} | NC_002929 |
| BP0778 (id: 33591402) | COG0318 Acyl-CoA synthetases (AMP-forming)/AMP-acid ligases II | {COG0318 COG0604 COG1802 COG4625} | NC_002929 |