Kamal Kumar, Valmik Desai, Li Cheng, Maxim Khitrov, Deepak Grover, Ravi Vijaya Satya, Chenggang Yu, Nela Zavaljevski, Jaques Reifman.
Abstract
BACKGROUND: The annotation of genomes from next-generation sequencing platforms needs to be rapid, high-throughput, and fully integrated and automated. Although a few Web-based annotation services have recently become available, they may not be the best solution for researchers who need to annotate a large number of genomes, possibly including proprietary data, and store them locally for further analysis. To address this need, we developed a standalone software application, the Annotation of microbial Genome Sequences (AGeS) system, which incorporates publicly available and in-house-developed bioinformatics tools and databases, many of which are parallelized for high-throughput performance.
Year: 2011 PMID: 21408217 PMCID: PMC3049762 DOI: 10.1371/journal.pone.0017469
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Annotation of microbial Genome Sequences (AGeS) system architecture.
The Web server hosts the AGeS Web application and accepts user requests via standard Web browsers. The workflow manager handles user requests for sequence management, runs the annotation pipeline, and presents the annotation results via GBrowse visualization. The sequence database stores all sequence and job-related data.
Figure 2. Schematic representation of the various tools of the genome annotation pipeline.
Given assembled contigs in a FASTA format file, processing starts with the Do-It-Yourself Annotation (DIYA) genome annotation tool, followed by post-processing, tandem repeat annotation, and protein function prediction with Pipeline for Protein Annotation (PIPA).
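The stage order in Figure 2 can be sketched as a simple sequential driver. This is a minimal illustration under stated assumptions, not the AGeS implementation: `read_fasta`, `passthrough`, `run_pipeline`, and the `record` dictionary are hypothetical names, and each stage is a placeholder rather than a call to the real DIYA or PIPA tools.

```python
from typing import Any, Callable, Dict, List, Tuple

Record = Dict[str, Any]
Stage = Tuple[str, Callable[[Record], Record]]


def read_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA-formatted text into {contig_id: sequence}."""
    chunks: Dict[str, List[str]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The contig ID is the first whitespace-delimited token of the header.
            fields = line[1:].split()
            current = fields[0] if fields else ""
            chunks[current] = []
        elif current is not None:
            chunks[current].append(line)
    return {name: "".join(parts) for name, parts in chunks.items()}


def passthrough(record: Record) -> Record:
    """Hypothetical placeholder; a real stage would attach annotations."""
    return record


# Stage order as described for Figure 2; the callables are stand-ins.
STAGES: List[Stage] = [
    ("DIYA", passthrough),
    ("post-processing", passthrough),
    ("tandem repeat annotation", passthrough),
    ("PIPA", passthrough),
]


def run_pipeline(contigs: Dict[str, str], stages: List[Stage] = STAGES) -> Record:
    """Run each stage in order and record which stages have completed."""
    record: Record = {"contigs": contigs, "completed": []}
    for name, stage in stages:
        record = stage(record)
        record["completed"].append(name)
    return record
```

Passing the stages as a list keeps the ordering explicit, so a stage could be swapped or skipped without touching the driver.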
List of genome annotation tools incorporated in DIYA and protein annotation tools integrated in PIPA.
| Resource | Description |
|---|---|
| Genome annotation tools (DIYA) | |
| Glimmer | Program for microbial gene identification |
| RNAmmer | Program for rRNA gene prediction |
| tRNAscan-SE | Program to identify tRNAs |
| TRF | Tandem Repeats Finder |
| Protein annotation tools (PIPA) | |
| CatFam | Enzyme profile databases based on three- and four-digit EC numbers |
| CDD | NCBI Conserved Domains Database |
| COG | Clusters of Orthologous Groups of proteins |
| InterPro | Integrated member databases |
| PSORTb | Prediction of bacterial subcellular localization |
| Phobius | A combined transmembrane topology and signal peptide predictor |
DIYA, Do-It-Yourself Annotation; PIPA, Pipeline for Protein Annotation; EC, Enzyme Commission.
Summary of genomic features predicted by AGeS and other annotation methods for two draft genomes and one completed genome.
| Feature | AGeS | JCVI | AGeS | BCM | AGeS | Sanger Institute |
|---|---|---|---|---|---|---|
| Genes | 2,229 | 2,244 | 2,652 | 2,805 | 4,336 | 4,103 |
| CDSs | 2,172 | 2,182 | 2,591 | 2,738 | 4,249 | 3,885 |
| rRNAs | 4 | 4 | 4 | 4 | 19 | 19 |
| tRNAs | 53 | 52 | 57 | 57 | 68 | 70 |
| Tandem Repeats | 60 | NA* | 123 | NA* | 780 | NA* |
AGeS, Annotation of microbial Genome Sequences; JCVI, J. Craig Venter Institute; BCM, Baylor College of Medicine; CDSs, coding sequences; NA, not applicable.
*The original source did not provide annotation for this feature.
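The size of the gap between an AGeS count and the corresponding reference count can be worked out directly from the table. The sketch below uses the AGeS and JCVI columns as input; `COUNTS` and `count_differences` are hypothetical names introduced only for illustration, and `None` stands in for the NA entries.

```python
from typing import Dict, Optional, Tuple

# (AGeS count, reference count) per feature, copied from the AGeS/JCVI
# column pair of the table; None marks a feature the reference did not annotate.
COUNTS: Dict[str, Tuple[int, Optional[int]]] = {
    "Genes": (2229, 2244),
    "CDSs": (2172, 2182),
    "rRNAs": (4, 4),
    "tRNAs": (53, 52),
    "Tandem Repeats": (60, None),
}


def count_differences(
    counts: Dict[str, Tuple[int, Optional[int]]]
) -> Dict[str, Optional[Tuple[int, float]]]:
    """Return (absolute difference, percent difference vs. reference) per feature."""
    result: Dict[str, Optional[Tuple[int, float]]] = {}
    for feature, (ages, ref) in counts.items():
        if not ref:
            # No reference count (None or zero): a relative difference is undefined.
            result[feature] = None
            continue
        diff = ages - ref
        result[feature] = (diff, 100.0 * diff / ref)
    return result


for feature, delta in count_differences(COUNTS).items():
    if delta is None:
        print(f"{feature}: no reference annotation")
    else:
        print(f"{feature}: {delta[0]:+d} ({delta[1]:+.2f}%)")
```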
Detailed comparison of overlapping gene segments for the three analyzed genomes, displaying the number and percentage of genes in each category.
| Category | No. of genes | Percentage | No. of genes | Percentage | No. of genes | Percentage |
|---|---|---|---|---|---|---|
| 1) Identical | 1,753 | 78.7 | 2,037 | 76.8 | 2,639 | 60.9 |
| 2) Identical start | 252 | 11.3 | 286 | 10.8 | 634 | 14.6 |
| 3) Identical end | 210 | 9.4 | 283 | 10.7 | 655 | 15.1 |
| 4) Overlap | 10 | 0.4 | 20 | 0.7 | 201 | 4.6 |
| 5) No overlap | 4 | 0.2 | 26 | 1.0 | 207 | 4.8 |
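One way to read the five categories is as a classification of each AGeS gene against its closest reference gene by coordinates. The sketch below is an interpretation, not the authors' procedure: it assumes inclusive `(start, end)` coordinates with `start <= end`, ignores strand, and takes the pairing of genes as given. The names `classify` and `summarize` are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

Gene = Tuple[int, int]  # (start, end), inclusive, start <= end (assumed)


def classify(ages: Gene, ref: Optional[Gene]) -> str:
    """Assign an AGeS gene to one of the five comparison categories."""
    if ref is None:
        return "no overlap"
    a_start, a_end = ages
    r_start, r_end = ref
    if a_start == r_start and a_end == r_end:
        return "identical"
    if a_start == r_start:
        return "identical start"
    if a_end == r_end:
        return "identical end"
    # Two intervals overlap when each one starts before the other ends.
    if a_start <= r_end and r_start <= a_end:
        return "overlap"
    return "no overlap"


def summarize(pairs: List[Tuple[Gene, Optional[Gene]]]) -> Dict[str, Tuple[int, float]]:
    """Count genes per category and express each count as a percentage."""
    counts = Counter(classify(ages, ref) for ages, ref in pairs)
    total = sum(counts.values())
    return {cat: (n, round(100.0 * n / total, 1)) for cat, n in counts.items()}


# Toy coordinates for illustration only; these are not genes from the paper.
pairs = [
    ((100, 500), (100, 500)),
    ((100, 500), (100, 450)),
    ((120, 500), (100, 500)),
    ((100, 500), (300, 700)),
    ((100, 200), None),
]
print(summarize(pairs))
```

A strand-aware version would need to swap the roles of start and end for genes on the reverse strand, which this sketch leaves out.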
Comparison of enzyme protein function (EC number) predictions between AGeS and other annotation methods for the three analyzed genomes.
| | AGeS vs. JCVI | AGeS vs. BCM | AGeS vs. Sanger |
|---|---|---|---|
| No. of enzymes | 515 (AGeS) and 565 (JCVI) | 562 (AGeS) and 583 (BCM) | 833 (AGeS) and 836 (Sanger) |
| No. of overlapping enzymes | 413 | 459 | 671 |
| No. of enzymes with multiple EC numbers | 36 (AGeS) and 18 (JCVI) | 43 (AGeS) and 0 (BCM) | 64 (AGeS) and 22 (Sanger) |
| No. of overlapping enzymes with identical EC numbers | 379 | 437 | 606 |
AGeS, Annotation of microbial Genome Sequences; JCVI, J. Craig Venter Institute; BCM, Baylor College of Medicine; EC, Enzyme Commission.
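The four rows of this comparison can be derived from two sets of EC assignments. The sketch below is a hedged illustration rather than the method used in the paper: it assumes both annotations share gene identifiers, treats "identical EC numbers" as exact equality of the EC sets, and uses the hypothetical name `compare_ec`.

```python
from typing import Dict, Set


def compare_ec(ages: Dict[str, Set[str]], ref: Dict[str, Set[str]]) -> Dict[str, int]:
    """Summarize agreement between two gene -> EC-number-set annotations."""
    shared = ages.keys() & ref.keys()  # genes called enzymes by both methods
    return {
        "enzymes_ages": len(ages),
        "enzymes_ref": len(ref),
        "overlapping": len(shared),
        "multiple_ec_ages": sum(1 for ecs in ages.values() if len(ecs) > 1),
        "multiple_ec_ref": sum(1 for ecs in ref.values() if len(ecs) > 1),
        "identical_ec": sum(1 for gene in shared if ages[gene] == ref[gene]),
    }


# Toy annotations for illustration only; gene IDs are made up.
ages = {"g1": {"1.1.1.1"}, "g2": {"2.7.1.1", "2.7.1.2"}, "g3": {"3.4.21.4"}}
ref = {"g1": {"1.1.1.1"}, "g2": {"2.7.1.1"}, "g4": {"4.2.1.1"}}
print(compare_ec(ages, ref))
```

Applied to the counts in the table, the share of overlapping enzymes with identical EC numbers is 379/413 ≈ 91.8%, 437/459 ≈ 95.2%, and 606/671 ≈ 90.3% for the JCVI, BCM, and Sanger comparisons, respectively.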