Roland J Siezen, Sacha A F T van Hijum.
Year: 2010 PMID: 21255336 PMCID: PMC3815804 DOI: 10.1111/j.1751-7915.2010.00191.x
Source DB: PubMed Journal: Microb Biotechnol ISSN: 1751-7915 Impact factor: 5.813
Figure 1. A generalised flow chart of genome annotation. Statistical gene prediction: use of methods like GeneMark or Glimmer to predict protein‐coding genes. General database search: searching sequence databases (typically NCBI NR) for sequence similarity, usually using BLAST. Specialized database search: searching domain databases (such as Pfam, SMART and CDD) for conserved domains; genome‐oriented databases (such as COGs) for identification of orthologous relationships and refined functional prediction; metabolic databases (such as KEGG) for metabolic pathway reconstruction; and other databases. Prediction of structural features: prediction of signal peptides, transmembrane segments, coiled‐coil domains and other features in putative proteins.
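The stages of the flow chart in Figure 1 can be read as a sequence of independent evidence-gathering steps applied to each predicted gene. The following is a minimal sketch of that idea, not an implementation of any pipeline listed here. It assumes gene prediction has already been done (for example with GeneMark or Glimmer), and every name in it (`GeneCall`, `make_general_search`, `make_domain_search`, `transmembrane_hint`, `annotate`) is hypothetical. Dictionary lookups stand in for the BLAST and domain-database searches, and the transmembrane check is a toy heuristic rather than a real predictor.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Residues treated as hydrophobic by the toy structural-feature check.
HYDROPHOBIC = set("AILMFVW")


@dataclass
class GeneCall:
    """A predicted protein-coding gene and the evidence gathered for it."""
    gene_id: str
    protein: str
    evidence: Dict[str, str] = field(default_factory=dict)


# A stage inspects one gene call and returns an evidence string, or None.
Stage = Callable[[GeneCall], Optional[str]]


def make_general_search(toy_db: Dict[str, str]) -> Stage:
    """Exact-match lookup standing in for a similarity search (not BLAST)."""
    return lambda gene: toy_db.get(gene.protein)


def make_domain_search(toy_motifs: Dict[str, str]) -> Stage:
    """Literal substring matching standing in for a domain-database search."""
    def search(gene: GeneCall) -> Optional[str]:
        hits = [name for motif, name in toy_motifs.items() if motif in gene.protein]
        return "; ".join(hits) if hits else None
    return search


def transmembrane_hint(gene: GeneCall, window: int = 19) -> Optional[str]:
    """Toy heuristic: flag any fully hydrophobic window of `window` residues."""
    p = gene.protein
    for i in range(len(p) - window + 1):
        if all(res in HYDROPHOBIC for res in p[i:i + window]):
            return "possible transmembrane segment"
    return None


def annotate(genes: List[GeneCall], stages: Dict[str, Stage]) -> List[GeneCall]:
    """Run every stage on every gene and record the non-empty results."""
    for gene in genes:
        for name, stage in stages.items():
            result = stage(gene)
            if result is not None:
                gene.evidence[name] = result
    return genes
```

Keeping each stage as a plain function makes it easy to add or swap searches without touching the loop in `annotate`, which mirrors how the flow chart treats the database searches as parallel sources of evidence.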
Selection of re‐annotated microbial genomes.
| Genome | Re‐sequencing | Deleted genes | New genes | Corrected genes | Original publication | Publication |
|---|---|---|---|---|---|---|
| Eukaryotes | ||||||
| | No | 370 | 3 | 46 | 1996 | |
| | No | 640 | 494 | 2005 | ||
| Prokaryotes | ||||||
| | 454 pyro, Solexa | 171 | 326 | 1997 | ||
| | No | 2000 | ||||
| | No | 608 | 299 | 435 | 2002 | |
| | No | 10 | 82 | 60 | 1998 | |
| | 454 pyro | 271 | 48 | 539 | 2005 | |
Includes new pseudogenes.
Includes corrected pseudogenes, but not genes with SNPs leading to only amino acid changes.
Genome (re‐)annotation databases.
| Database | Organization | Description | Access/distribution | Reference |
|---|---|---|---|---|
| NCBI Genbank | National Institutes of Health, USA | An annotated collection of all publicly available DNA sequences | ||
| DDBJ | DDBJ (DNA Data Bank of Japan) | General nucleotide database | None | |
| EMBL | EMBL‐EBI | Nucleotide sequence database | None | |
| Entrez Genome Project | National Institutes of Health, USA | Collection of complete and incomplete genome sequences | None | |
| ERGO | Integrated Genomics, USA | A systems‐biology informatics toolkit for comparative genomics | ||
| Genome Reviews | EMBL‐EBI | Up‐to‐date, standardised and comprehensively annotated complete genomes | ||
| RefSeq | National Institutes of Health, USA | A curated non‐redundant sequence database | ||
| The SEED | Fellowship for Integration of Genomes (FIG) | Subsystems approach to genome annotation | ||
| IMG | DOE Joint Genome Institute, USA | Integrated microbial genomes database | ||
| Microbes Online | Virtual Institute for Microbial Stress and Survival | An integrated portal for comparative and functional genomics | ||
| CMR | J. Craig Venter Institute (JCVI) | Comprehensive Microbial Resource: displays information on all publicly available, complete prokaryotic genomes | ||
| GOLD | DOE Joint Genome Institute, USA | Genomes On Line Database | ||
| Genome information broker (GIB) | DDBJ (DNA Data Bank of Japan) | Database of microbial genomes and some comparative genomic tools | ||
| Genome Atlas | CBS, Technical University of Denmark | DNA structural atlases for complete microbial genomes | ||
| Pedant | Munich Information Center for Protein Sequences (MIPS) | PEDANT 3 database: a Protein Extraction, Description and ANalysis Tool | ||
| REGANOR | CeBiTec, Germany | Gene prediction server and database | ||
| BacMap | University of Alberta, Canada | An interactive picture atlas of annotated bacterial genomes | ||
| MOSAIC | INRA, France | Database dedicated to the comparative genomics of bacterial strains at the intra‐species level | ||
| InterPro | EMBL‐EBI | Integrative protein signature database | ||
| Pfam | Sanger Institute, UK | Protein families and domains database | ||
| SMART | EMBL, Germany | Protein domain architecture database | ||
| Gene Ontology Annotation (GOA) | The Gene Ontology | GO controlled vocabulary of biological processes | ||
| TIGRFAMs | J. Craig Venter Institute (JCVI) | Assignment of molecular function and biological process | ||
| Pseudogene.Org | Yale Gerstein Group | A comprehensive database and comparison platform for pseudogene annotation | ||
| ExPASy ENZYME | Swiss Institute of Bioinformatics (SIB) | Enzyme nomenclature database | ||
| MetaCyc | SRI International, USA | Database of metabolic pathways and enzymes | ||
| KEGG | Kyoto Encyclopedia of Genes and Genomes: Kanehisa Laboratories | A bioinformatics resource for linking genomes to life and the environment |
Genome (re‐)annotation pipelines.
| Pipeline | Organization | Description | Access/distribution | Reference |
|---|---|---|---|---|
| IGS | University of Maryland | A FREE resource for genomics researchers and educators bringing advanced bioinformatics tools to the lab bench and the classroom | None | |
| JCVI annotation service | J. Craig Venter Institute (JCVI) | Free to use genome annotation service | None | |
| MiGAP | Database Center for Life Sciences (DBCLS) | Microbial Genome Annotation Pipeline (MiGAP) for diverse users | ||
| MaGe/MicroScope | GENOSCOPE | Magnifying Genomes: microbial genome annotation system | ||
| BASys | University of Alberta, Canada | A web server for bacterial genome annotation | ||
| RAST | Fellowship for Integration of Genomes (FIG) | The RAST Server: Rapid Annotations using Subsystems Technology based on the Seed | ||
| xBASE | University of Birmingham, UK | Bacterial genome annotation service | ||
| IMG ER | Joint Genome Institute (JGI) | A system for microbial genome annotation expert review and curation | ||
| GenVar | Virginia Bioinformatics Institute | Bacterial gene annotation and comparative genomics pipeline | ||
| Pedant‐Pro | Biomax | Genome analysis package for comprehensive analysis of DNA and protein sequences | ||
| AGMIAL | INRA, France | An annotation strategy for prokaryote genomes as a distributed system | ||
| GenDB | CeBiTec, Germany | Bacterial annotation system | ||
| DIYA | DIY Genomics Consortium | A bacterial annotation pipeline for any genomics lab | ||
| SABIA | LNCC, Brazil | Bacterial annotation system | ||
| MAGPIE | Genome Prairie Project, Canada | Genome annotation system | ||
| Restauro‐G | Institute for Advanced Biosciences, Keio University | A Rapid Genome Re‐Annotation System for Comparative Genomics | ||
| ATUCG system | Universidade Federal do Rio Grande do Sul, Brazil | Agent‐based environment for automatic annotation of genomes | None; software should be requested from the authors | |
| Taverna: annotation of genomes | University of Manchester | Interactive genome annotation pipeline. | ||
| KAAS | Kyoto Encyclopedia of Genes and Genomes (KEGG) | KEGG automated annotation service for metabolic pathways |
Figure 2. Simplified prokaryotic genome database (PkGDB) relational model composed of three main components: sequence and annotation data (in green), annotation management (in blue) and functional predictions (in purple). Sequences and annotations come from public databanks, sequencing centres and specialized databases focused on model organisms. For genomes of interest, a (re)‐annotation process is performed using AMIGene (Bocs ) and leads to the creation of new ‘Genomic Objects’. Each ‘Genomic Object’ and its associated functional prediction results are stored in the PkGDB. The database architecture supports integration of automatic and manual annotations, and management of a history of annotations and sequence updates. Reproduced from Vallenet and colleagues (2006).
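To make the three components of Figure 2 more concrete, the sketch below sets up a much-reduced relational layout with Python's built-in `sqlite3` module. It is an illustration only: the table and column names are assumptions chosen for this example and are not the actual PkGDB schema. The `annotation` table is append-only, so automatic and manual annotations for the same genomic object coexist and the full history stays available.

```python
import sqlite3

# Hypothetical, simplified schema; NOT the real PkGDB layout.
SCHEMA = """
CREATE TABLE sequence (
    seq_id   INTEGER PRIMARY KEY,
    organism TEXT    NOT NULL,
    version  INTEGER NOT NULL
);
CREATE TABLE genomic_object (
    go_id     INTEGER PRIMARY KEY,
    seq_id    INTEGER NOT NULL REFERENCES sequence(seq_id),
    begin_pos INTEGER NOT NULL,
    end_pos   INTEGER NOT NULL,
    strand    TEXT    NOT NULL
);
CREATE TABLE annotation (
    annot_id INTEGER PRIMARY KEY AUTOINCREMENT,
    go_id    INTEGER NOT NULL REFERENCES genomic_object(go_id),
    product  TEXT    NOT NULL,
    source   TEXT    NOT NULL CHECK (source IN ('automatic', 'manual'))
);
CREATE TABLE functional_prediction (
    pred_id INTEGER PRIMARY KEY,
    go_id   INTEGER NOT NULL REFERENCES genomic_object(go_id),
    method  TEXT    NOT NULL,
    result  TEXT    NOT NULL
);
"""


def build_db() -> sqlite3.Connection:
    """Create an in-memory database holding the toy schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def add_annotation(conn, go_id: int, product: str, source: str) -> None:
    """Append a new annotation; earlier rows are kept as history."""
    conn.execute(
        "INSERT INTO annotation (go_id, product, source) VALUES (?, ?, ?)",
        (go_id, product, source),
    )


def current_annotation(conn, go_id: int):
    """Return (product, source) of the most recently added annotation."""
    return conn.execute(
        "SELECT product, source FROM annotation "
        "WHERE go_id = ? ORDER BY annot_id DESC LIMIT 1",
        (go_id,),
    ).fetchone()


def history(conn, go_id: int):
    """Return all annotations for a genomic object, oldest first."""
    return conn.execute(
        "SELECT product, source FROM annotation "
        "WHERE go_id = ? ORDER BY annot_id",
        (go_id,),
    ).fetchall()
```

With this layout, a manual correction is recorded by calling `add_annotation` again rather than by updating a row, which is one simple way to keep the annotation history the caption describes.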
Figure 3. Venn diagram comparing gene predictions in Halorhabdus utahensis from the RAST, IMG and JCVI automated annotation services. The diagram shows the number of predicted protein‐coding genes that share both start and stop sites with the other annotations. Overlapping regions indicate genes with exact matches between annotations. Adapted from Bakke and colleagues (2009).
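The counts in a three-way Venn diagram like Figure 3 can be derived with plain set operations once each predicted gene is reduced to its coordinates. The sketch below assumes each gene is represented as a `(start, stop, strand)` tuple, so two predictions match only if both ends agree exactly. The function name `venn3_counts` and the tuple representation are assumptions for illustration, and this is not the procedure used by Bakke and colleagues.

```python
from typing import Dict, Set, Tuple

# A gene prediction reduced to its coordinates: (start, stop, strand).
Gene = Tuple[int, int, str]


def venn3_counts(a: Set[Gene], b: Set[Gene], c: Set[Gene]) -> Dict[str, int]:
    """Count genes in each of the seven regions of a three-set Venn diagram.

    Genes are compared as whole tuples, so a match requires an identical
    start site, stop site and strand.
    """
    return {
        "a_only": len(a - b - c),
        "b_only": len(b - a - c),
        "c_only": len(c - a - b),
        "a_and_b_only": len((a & b) - c),
        "a_and_c_only": len((a & c) - b),
        "b_and_c_only": len((b & c) - a),
        "all_three": len(a & b & c),
    }
```

Calling `venn3_counts(rast, img, jcvi)` on three sets of coordinate tuples returns the seven region sizes, which always sum to the number of distinct predictions across the three services. A gene whose stop site is shared but whose start site differs lands in separate regions, which is why exact-match comparisons tend to understate agreement on gene presence.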