Yuki Moriya, Masumi Itoh, Shujiro Okuda, Akiyasu C Yoshizawa, Minoru Kanehisa.
Abstract
The number of complete and draft genomes has grown rapidly in recent years, and it has become increasingly important to automate the identification of functional properties and biological roles of genes in these genomes. In the KEGG database, genes in complete genomes are annotated with KEGG orthology (KO) identifiers, or K numbers, based on best hit information using Smith-Waterman scores as well as on manual curation. Each K number represents an ortholog group of genes, and it is directly linked to an object in the KEGG pathway map or the BRITE functional hierarchy. Here, we have developed a web-based server called KAAS (KEGG Automatic Annotation Server: http://www.genome.jp/kegg/kaas/), which implements a rapid method to automatically assign K numbers to genes in a genome, enabling reconstruction of KEGG pathways and BRITE hierarchies. The method is based on sequence similarities, bi-directional best hit information and some heuristics, and has achieved a high degree of accuracy when compared with the manually curated KEGG GENES database.
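The bi-directional best hit (BBH) idea mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the KAAS implementation: the score tables, the gene names, the placeholder K numbers and the function names (`best_hits`, `bidirectional_best_hits`, `assign_k_numbers`) are all assumptions made for illustration, and the additional heuristics the abstract refers to are not modelled.

```python
from typing import Dict, List, Tuple

# Hypothetical input: scores[(a, b)] is the similarity score of sequence a
# searched against sequence b (e.g. from a sequence-similarity search).
ScoreTable = Dict[Tuple[str, str], float]


def best_hits(scores: ScoreTable) -> Dict[str, str]:
    """Map each query sequence to its single highest-scoring subject."""
    best: Dict[str, Tuple[str, float]] = {}
    for (query, subject), score in scores.items():
        # Keep the first-seen subject on ties (an arbitrary choice here).
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def bidirectional_best_hits(forward: ScoreTable,
                            reverse: ScoreTable) -> List[Tuple[str, str]]:
    """Return (query, reference) pairs that are each other's best hit."""
    fwd = best_hits(forward)   # query gene -> best reference gene
    rev = best_hits(reverse)   # reference gene -> best query gene
    return sorted((q, r) for q, r in fwd.items() if rev.get(r) == q)


def assign_k_numbers(pairs: List[Tuple[str, str]],
                     ko_of: Dict[str, str]) -> Dict[str, str]:
    """Transfer the K number of each BBH partner to the query gene."""
    return {q: ko_of[r] for q, r in pairs if r in ko_of}


# Toy data with made-up identifiers and scores.
forward = {("q1", "ref_a"): 250.0, ("q1", "ref_b"): 90.0,
           ("q2", "ref_b"): 120.0, ("q3", "ref_a"): 200.0}
reverse = {("ref_a", "q1"): 245.0, ("ref_a", "q3"): 190.0,
           ("ref_b", "q2"): 118.0}
ko_of = {"ref_a": "K_PLACEHOLDER_1"}  # ref_b has no K number in this toy set

pairs = bidirectional_best_hits(forward, reverse)
assignments = assign_k_numbers(pairs, ko_of)
print(pairs)
print(assignments)
```

In this toy data `q3` has `ref_a` as its best hit, but `ref_a` prefers `q1`, so `q3` receives no assignment. Requiring the hit to hold in both directions is what makes the criterion stricter than a one-directional best hit.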
Year: 2007 PMID: 17526522 PMCID: PMC1933193 DOI: 10.1093/nar/gkm321
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. An example of genome annotation with the KO identifiers, or K numbers, by the KAAS service, which is integrated into the KEGG resource. Once KAAS assigns K numbers to query genes, the mapping to KEGG pathways and BRITE hierarchies is generated using the existing framework of the KEGG system.
Figure 2. The overall procedure of KAAS.
Accuracy of K number assignment by KAAS with the BBH method and the whole set of KEGG GENES
| Species | | | | |
|---|---|---|---|---|
| Sensitivity | 83.7% | 70.4% | 85.2% | 97.4% |
| Specificity | 98.6% | 91.5% | 94.1% | 94.3% |
| PPV | 93.6% | 47.9% | 80.7% | 94.9% |
| Precision | 98.0% | 85.5% | 91.6% | 98.5% |
Sensitivity is the ratio of true positives to all genes with KO annotations. Specificity is the ratio of true negatives to all genes without KO annotations. PPV is the ratio of true positives to all positives, taken over all genes in each organism. Precision is the fraction of correctly annotated genes when the test set is limited to the genes with KO annotations.
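The four measures defined above can be made concrete with a short Python sketch. It rests on one reading of those definitions, which is an assumption: a true positive is a gene whose predicted K number equals its reference K number, a true negative is a gene with no reference K number and no prediction, "all positives" are all genes that received any prediction, and the precision denominator is the set of KO-annotated genes that received any prediction. The function name `ko_assignment_metrics` and the toy identifiers are hypothetical.

```python
from typing import Dict, Optional

Annotation = Dict[str, Optional[str]]  # gene -> K number, or None if unannotated


def _ratio(numerator: int, denominator: int) -> float:
    """Return numerator/denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def ko_assignment_metrics(reference: Annotation,
                          predicted: Annotation) -> Dict[str, float]:
    """Compute sensitivity, specificity, PPV and precision as read above."""
    with_ko = [g for g, k in reference.items() if k is not None]
    without_ko = [g for g, k in reference.items() if k is None]

    # Correct assignment: prediction matches the reference K number.
    tp = sum(1 for g in with_ko if predicted.get(g) == reference[g])
    # Correct rejection: no reference K number and no prediction.
    tn = sum(1 for g in without_ko if predicted.get(g) is None)
    # Every gene that received some prediction, annotated or not.
    all_positives = sum(1 for g in reference if predicted.get(g) is not None)
    # Predictions restricted to genes that have a reference K number.
    positives_with_ko = sum(1 for g in with_ko if predicted.get(g) is not None)

    return {
        "sensitivity": _ratio(tp, len(with_ko)),
        "specificity": _ratio(tn, len(without_ko)),
        "ppv": _ratio(tp, all_positives),
        "precision": _ratio(tp, positives_with_ko),
    }


# Toy example with made-up gene names and placeholder K numbers.
reference = {"g1": "KA", "g2": "KB", "g3": "KC", "g4": None, "g5": None}
predicted = {"g1": "KA", "g2": "KX", "g3": None, "g4": None, "g5": "KZ"}

metrics = ko_assignment_metrics(reference, predicted)
print(metrics)
```

Under this reading, precision can never be lower than PPV, because its denominator excludes predictions made for genes that have no reference annotation. That ordering matches the values reported in the tables.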
Accuracy of K number assignment by KAAS with the BBH method and the representative set
| Species | | | | |
|---|---|---|---|---|
| Sensitivity | 85.4% | 62.5% | 86.8% | 90.1% |
| Specificity | 98.9% | 91.3% | 96.8% | 94.9% |
| PPV | 94.4% | 44.3% | 87.7% | 93.2% |
| Precision | 97.9% | 83.8% | 94.9% | 96.6% |