Augusto F Vellozo, Amélie S Véron, Patrice Baa-Puyoulet, Jaime Huerta-Cepas, Ludovic Cottret, Gérard Febvay, Federica Calevro, Yvan Rahbé, Angela E Douglas, Toni Gabaldón, Marie-France Sagot, Hubert Charles, Stefano Colella.
Abstract
In recent years, genomes from an increasing number of organisms have been sequenced, but their annotation remains a time-consuming process. The BioCyc databases offer a framework for the integrated analysis of metabolic networks. The Pathway Tools software suite allows the automated construction of a database starting from an annotated genome, but it requires prior integration of all annotations into a specific summary file or into a GenBank file. To allow the easy creation and update of a BioCyc database starting from the multiple genome annotation resources available over time, we have developed an ad hoc data management system called the Cyc Annotation Database System (CycADS). CycADS is centred on a specific database model and on a set of Java programs to import, filter and export relevant information. Data from GenBank and other annotation sources (including, for example, KAAS, PRIAM, Blast2GO and PhylomeDB) are collected into a database, then filtered and extracted to generate a complete annotation file. This file is then used to build an enriched BioCyc database using the PathoLogic program of Pathway Tools. The CycADS pipeline for annotation management was used to build the AcypiCyc database for the pea aphid (Acyrthosiphon pisum), whose genome was recently sequenced. For comparative analyses, the AcypiCyc database webpage also includes two other metabolic reconstruction BioCyc databases generated using CycADS: TricaCyc for Tribolium castaneum and DromeCyc for Drosophila melanogaster. Owing to its flexible design, CycADS offers a powerful software tool for the generation and regular updating of enriched BioCyc databases. The CycADS system is particularly suited for metabolic gene annotation and network reconstruction in newly sequenced genomes. Because the same annotation procedure is applied uniformly for metabolic network reconstruction, CycADS is particularly useful for comparative analysis of the metabolism of different organisms.
Database URL: http://www.cycadsys.org
Year: 2011 PMID: 21474551 PMCID: PMC3072769 DOI: 10.1093/database/bar008
Source DB: PubMed Journal: Database (Oxford) ISSN: 1758-0463 Impact factor: 3.451
Figure 1. CycADS annotation management system workflow. Genomic information is combined in CycADS with the annotation data obtained using different methods, and the collected data are filtered to produce the PathoLogic files (PF files output) that are then used to generate the BioCyc databases with the Pathway Tools system (PathoLogic module). The annotations can also be extracted for other applications using the filtering system (other files output).
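The collect-then-filter step of this workflow can be illustrated with a minimal Python sketch. It is not the CycADS implementation (which the abstract describes as a database model plus Java programs). The gene identifiers, EC numbers, function names and the "minimum number of supporting methods" rule are all hypothetical, chosen only to show how annotations from several sources might be pooled per gene before export.

```python
from collections import defaultdict

# Hypothetical (gene, method, EC number) records; not taken from AcypiCyc.
records = [
    ("gene_A", "KAAS-Eukaryotes", "2.7.1.1"),
    ("gene_A", "PRIAM", "2.7.1.1"),
    ("gene_A", "Blast2GO", "2.7.1.2"),
    ("gene_B", "KAAS-GENES", "1.1.1.1"),
]


def collect(records):
    """Group the supporting methods by (gene, EC number) pair."""
    support = defaultdict(set)
    for gene, method, ec in records:
        support[(gene, ec)].add(method)
    return support


def filter_by_support(support, min_methods=1):
    """Keep EC numbers backed by at least `min_methods` methods.

    The threshold is an illustrative filter rule, assumed for this sketch.
    """
    kept = defaultdict(list)
    for (gene, ec), methods in sorted(support.items()):
        if len(methods) >= min_methods:
            kept[gene].append(ec)
    return dict(kept)


support = collect(records)
all_ecs = filter_by_support(support, min_methods=1)    # union of all methods
consensus = filter_by_support(support, min_methods=2)  # at least two methods agree

for gene, ecs in all_ecs.items():
    print(gene, ecs)
```

With `min_methods=1` every annotation is retained, which corresponds to an enriched, inclusive reconstruction; raising the threshold trades coverage for agreement between methods.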
Annotation methods by database
| Database | EC annotation | No. | GO annotation | No. |
|---|---|---|---|---|
| AcypiCyc | KAAS (2), PRIAM, Blast2GO | 4 | PhylomeDB inference | 3 |
| TricaCyc | KAAS (2), PRIAM, Blast2GO | 4 | PhylomeDB inference | 3 |
| DromeCyc | GenBank from NCBI | 1 | GO from FlyBase | 1 |
A summary table of the annotation methods used for each database (at the time of publication). For the EC numbers: ‘KAAS’: two different KAAS methods were used to annotate the protein sequences, using two different reference datasets (Eukaryotes and GENES, see text for details); ‘PRIAM’: the annotation was performed using default parameters; ‘Blast2GO’: inference of EC numbers from the GO annotation; ‘GenBank from NCBI’: downloaded file. For the GO annotations: ‘PhylomeDB inference’: inference with three levels of confidence (see text for details); ‘GO from FlyBase’: downloaded file.
Figure 2. Screenshots of a BioCyc database generated by CycADS. An example page from AcypiCyc showing the enrichment of a BioCyc gene page with complementary information about the annotation source, included in the ‘Summary’, and extra hyperlinks (‘Unification Links’) to important resources.
Figure 3. Comparison of the EC annotation by different methods in AcypiCyc. (A) Reaction annotation by EC methods. Venn diagrams showing the number of reactions (total of 1176) identified in the metabolic reconstructions using data from the different annotation methods [PRIAM, KAAS (two methods), Blast2GO-EC]. The total number of reactions annotated by each method is given in black below the method name; the number of reactions unique to a method or shared among methods is given in white. (B) Gene annotation by EC methods. Venn diagrams showing the number of genes (total of 2281) annotated using the different methods [colour code for annotations as in (A)]. Note: multiple genes may catalyse a single reaction. This figure was generated using Aduna Cluster Map (http://www.aduna-software.com/technology/clustermap).
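The unique and shared counts in a Venn diagram of this kind can be derived by recording, for each reaction (or gene), the exact set of methods that annotated it. The sketch below shows that counting under stated assumptions: the reaction identifiers and method memberships are invented toy data, and `venn_regions` is a hypothetical helper, not part of CycADS or Aduna Cluster Map.

```python
from collections import Counter


def venn_regions(annotations):
    """Count items per exact combination of methods (one Venn region each)."""
    membership = {}
    for method, items in annotations.items():
        for item in items:
            membership.setdefault(item, set()).add(method)
    return Counter(frozenset(methods) for methods in membership.values())


# Toy data: which hypothetical reactions each method annotated.
annotations = {
    "PRIAM": {"r1", "r2", "r3"},
    "KAAS": {"r2", "r3", "r4"},
    "Blast2GO": {"r3", "r5"},
}

regions = venn_regions(annotations)
total = sum(regions.values())  # number of distinct reactions overall

for methods, count in sorted(regions.items(), key=lambda kv: sorted(kv[0])):
    print(" & ".join(sorted(methods)), count)
```

The per-method totals (the black numbers in the figure) are simply the sizes of the input sets, while each entry of `regions` corresponds to one white number.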