Ross Overbeek, Robert Olson, Gordon D Pusch, Gary J Olsen, James J Davis, Terry Disz, Robert A Edwards, Svetlana Gerdes, Bruce Parrello, Maulik Shukla, Veronika Vonstein, Alice R Wattam, Fangfang Xia, Rick Stevens.
Abstract
In 2004, the SEED (http://pubseed.theseed.org/) was created to provide consistent and accurate genome annotations across thousands of genomes and as a platform for discovering and developing de novo annotations. The SEED is a constantly updated integration of genomic data with a genome database, web front end, API and server scripts. It is used by many scientists for predicting gene functions and discovering new pathways. In addition to being a powerful database for bioinformatics research, the SEED also houses subsystems (collections of functionally related protein families) and their derived FIGfams (protein families), which represent the core of the RAST annotation engine (http://rast.nmpdr.org/). When a new genome is submitted to RAST, genes are called and their annotations are made by comparison to the FIGfam collection. If the genome is made public, it is then housed within the SEED and its proteins populate the FIGfam collection. This annotation cycle has proven to be a robust and scalable solution to the problem of annotating the exponentially increasing number of genomes. To date, >12 000 users worldwide have annotated >60 000 distinct genomes using RAST. Here we describe the interconnectedness of the SEED database and RAST, the RAST annotation pipeline and updates to both resources.
Year: 2013 PMID: 24293654 PMCID: PMC3965101 DOI: 10.1093/nar/gkt1226
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. The ‘Compare Regions’ tool in the SEED. The Staphylococcal SCCmec element is shown as an example. Re-arrangements within the Staphylococcal SCCmec element lead to constitutive expression of resistance determinant MecA due to (partial) deletion of repressor MecI and/or sensor-transducer MecR. Homologous genes are presented as arrows with matching colors and numbers. Genes not conserved within the displayed region are gray. The graphic is centered on the focus gene (red, #1): Methicillin resistance determinant MecA; green, #8: Methicillin resistance regulatory sensor-transducer MecR1; blue, #18: Methicillin resistance repressor MecI; green, #2: transposase for IS431.
Figure 2. Circle plot showing the comparison of eight Brucella genomes relative to a user-defined reference genome. The zoomed regions highlight insertions/deletions (colored versus white) and changes in conservation relative to the reference genome (going from blue representing the highest protein sequence similarity to red representing the lowest).
Online resources supported by SEED technology
| Resource | Input | Usage | Description | URL |
|---|---|---|---|---|
| PubSEED | Genome, gene, protein, functional role, pathway (text search and sequence search) | • Browse SEED and explore SEED-based knowledge about the feature of interest • Find contextual clues based on gene co-localization, fusion events, phylogenetic profiling • Compare genomes (sequence based or function based) • Explore subsystems • Browse pre-computed alignments and trees for protein of interest • Register as user and get annotation rights (add/change annotations, build subsystems, add literature and so forth) | Genome database and collection of tools designed for high-quality genome annotation and comparative genome analysis for research applications; genome context analysis tools use gene co-localization, fusion events, phyletic (occurrence) profiling; the only major database editable and expandable by a user (on registration); intended for experimental biologists, does not require programming skills | http://pubseed.theseed.org/ |
| RAST | DNA sequence (genome, phage, plasmid) | • Download RAST-annotated genome (gene calls, protein functions, subsystems) and use your own tools • Browse RAST-annotated genome in SEED Viewer (compare with public genomes or other genomes that you have submitted to RAST) • Curate your RAST-annotated genome in SEED Viewer (change annotations, add/delete gene calls) • Allow collaborators pre-publication access to your RAST-annotated genome • Request automatic metabolic model when submitting your genome to RAST | Automatic server for rapid and accurate annotation of prokaryotic, phage or plasmid genomes using SEED technology | http://rast.nmpdr.org/ |
| myRAST | DNA sequence (prokaryotic genome or metagenomic data) | • Download and install locally a myRAST distribution package • Perform automated and manual annotation of private genome or metagenome on your laptop • Use pre-programmed scripts (>150 available) • Extract various types of data from SEED or run numerous computational tasks remotely | Standalone application for a user's computer capable of performing computationally expensive operations (e.g. annotation of genomes or collections of metagenomic sequences) using SEED web service technology | |
| Server scripts | Research questions | • Download and install locally a small Client Package (Perl or Java) that defines network-based SEED API • Use pre-programmed scripts (>150 available) or pipe them together • Extract various types of data from SEED or run numerous computational tasks remotely | High-performance network-based servers that provide programmatic access to all data types in SEED: genomes, annotations and metabolic models | |
| ModelSEED | RAST-annotated genome | • Generate draft genome-scale metabolic model starting from a RAST-annotated prokaryotic genome sequence • Compare two or more models for the same organism or for different species • Predict culture conditions for an organism • Predict essential genes | Public resource for the generation, optimization, exploration, comparison and analysis of genome-scale metabolic models | |
| PATRIC | Bacterial taxon, genome, gene, pathway, transcriptomic data (text search, sequence search, metadata-based filtering and browsing) | • View and analyze RAST-annotated genomes, compare annotations from different sources • Compare protein families and pathways across hundreds of genomes using interactive analysis and visualization tools • View, analyze and compare public and private transcriptomic data sets • Use metadata-based filtering and smart searches to find data of interest • Analyze protein–protein interactions and disease-associated data • Work in private workspace and save default data sets | The all-bacterial bioinformatics resource center (BRC) that provides integrated data and analysis tools; intended as a resource for experimental biologists, it tries to meet the needs of both bioinformaticians and the computationally naïve user | |
| PhAnToMe | Phage or prophage | • Browse phage database • Identify prophages in microbial genomes • Compare phages and prophages in SEED Viewer • Explore phage subsystems | Phage and prophage annotation database with a visual programming interface | |
Major milestones and improvements in the RAST system over the past 5 years
| Categories | 2008 | 2013 |
|---|---|---|
| Users | 120 | 12 000 |
| Jobs | 1200 | 100 000 |
| Distinct genomes | 350 | 60 000 |
| Number of FIGfams | 100 000 | 185 000 |
| Number of PEGs in FIGfams | 1.1 million | 16 million |
| Throughput | 50–100 genomes/day | 500–1000 genomes/day |
| Maximum throughput | 300 genomes/day | 1000 genomes/day |
| Number of subsystems | 700 | 1600 |
| Number of literature references attached to features | 19 562 | 1 349 874 |
| Data types accepted | Complete genomes | Phages, plasmids, draft genomes, complete genomes |
| Formats accepted | FASTA | FASTA, GenBank |
| Submissions | Single, web-based submissions only | Web submissions and batch submissions |
| ORF calling | Glimmer2 | Glimmer3, RAST, user-provided ORF calls |
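
The growth implied by the numeric rows of the table above can be expressed as fold changes between 2008 and 2013. The following is a minimal sketch in Python, not part of the original article: the values are copied from the table, while the names `milestones`, `fold_change` and `summarize` are hypothetical and chosen only for illustration.

```python
# Values copied from the milestones table: (2008, 2013).
# The dictionary and function names are illustrative assumptions.
milestones = {
    "Users": (120, 12_000),
    "Jobs": (1_200, 100_000),
    "Distinct genomes": (350, 60_000),
    "Number of FIGfams": (100_000, 185_000),
    "Number of PEGs in FIGfams": (1_100_000, 16_000_000),
    "Number of subsystems": (700, 1_600),
    "Number of literature references attached to features": (19_562, 1_349_874),
}


def fold_change(before, after):
    """Return how many times larger `after` is than `before`."""
    return after / before


def summarize(table):
    """Map each category to its 2008-to-2013 fold change, rounded to one decimal."""
    return {name: round(fold_change(a, b), 1) for name, (a, b) in table.items()}


if __name__ == "__main__":
    for name, factor in summarize(milestones).items():
        print(f"{name}: {factor}x")
```

For instance, the number of users grew 100-fold (from 120 to 12 000) over the five years covered by the table.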
Figure 3. Number of users (open squares) and number of jobs (closed circles) in the RAST system. As of September 2013, there were over 100 000 jobs processed by RAST and >12 000 active users of the system.
Figure 4. Genomes processed by RAST displayed over a taxonomic tree. In all, 12 289 RAST-annotated public genomes for PATRIC available on the PubSEED were compared at the order level using the NCBI taxonomy (25). Black bars show the number of sequenced representatives per order. White bars show those orders with no sequenced representatives. The tree was created using the Interactive Tree of Life (http://itol.embl.de/) and is unrooted.