Thomas Brettin, James J Davis, Terry Disz, Robert A Edwards, Svetlana Gerdes, Gary J Olsen, Robert Olson, Ross Overbeek, Bruce Parrello, Gordon D Pusch, Maulik Shukla, James A Thomason, Rick Stevens, Veronika Vonstein, Alice R Wattam, Fangfang Xia.
Abstract
The RAST (Rapid Annotation using Subsystem Technology) annotation engine was built in 2008 to annotate bacterial and archaeal genomes. It works by offering a standard software pipeline for identifying genomic features (i.e., protein-encoding genes and RNA) and annotating their functions. Recently, in order to make RAST a more useful research tool and to keep pace with advancements in bioinformatics, it has become desirable to build a version of RAST that is both customizable and extensible. In this paper, we describe the RAST tool kit (RASTtk), a modular version of RAST that enables researchers to build custom annotation pipelines. RASTtk offers a choice of software for identifying and annotating genomic features as well as the ability to add custom features to an annotation job. RASTtk also accommodates the batch submission of genomes and the ability to customize annotation protocols for batch submissions. This is the first major software restructuring of RAST since its inception.
Year: 2015 PMID: 25666585 PMCID: PMC4322359 DOI: 10.1038/srep08365
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. RASTtk options that are available on the RAST website (http://rast.nmpdr.org).
A table of options is displayed when the user selects the RASTtk annotation scheme and clicks the “Customize” checkbox. Individual steps can be turned on and off using the checkboxes. Parameters and conditions can be changed or added as needed. Dragging and dropping table rows changes the order of the steps.
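The customization described above (switching steps on and off, and reordering them) can be modeled as operations on an ordered list. The following is only an illustrative sketch: the `Step` class, the helper functions, and the `params` field are hypothetical and do not reflect the format that the RAST website or RASTtk actually uses to store a workflow. Only the tool names are taken from the table of RASTtk scripts.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """One row of a hypothetical customization table (not RASTtk's real format)."""
    tool: str
    enabled: bool = True
    params: dict = field(default_factory=dict)  # placeholder for user-set parameters


def toggle(steps, tool, enabled):
    """Return a new list in which the named step is switched on or off."""
    return [
        Step(s.tool, enabled if s.tool == tool else s.enabled, dict(s.params))
        for s in steps
    ]


def move(steps, tool, new_index):
    """Return a new list with the named step moved to new_index (drag and drop)."""
    moved = [s for s in steps if s.tool == tool]
    if not moved:
        raise KeyError(tool)
    rest = [s for s in steps if s.tool != tool]
    rest.insert(new_index, moved[0])
    return rest


def active_tools(steps):
    """List the tools that would run, in order, skipping disabled steps."""
    return [s.tool for s in steps if s.enabled]


# An assumed starting order, chosen for illustration only.
steps = [
    Step("rast-call-features-rRNA-SEED"),
    Step("rast-call-features-tRNA-trnascan"),
    Step("rast-call-features-CDS-prodigal"),
    Step("rast-call-features-CDS-glimmer3"),
    Step("rast-annotate-proteins-kmer-v2"),
]

# Uncheck one gene caller and drag another to the top of the table.
custom = toggle(steps, "rast-call-features-CDS-glimmer3", False)
custom = move(custom, "rast-call-features-CDS-prodigal", 0)
print(active_tools(custom))
```

Both helpers return new lists rather than mutating their input, so the default order stays available for comparison with the customized one.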
Characteristics of the RASTtk scripts
| Tool | Feature Type Annotated | Input file type | Output file type | Default | Citation |
|---|---|---|---|---|---|
| rast-create-genome | n/a | Contigs in FASTA format | GTO | yes | This study |
| rast-process-genome | CDS, RNA, Repeat Regions, CRISPRs | GTO | GTO | yes | This study |
| rast-export-genome | All feature types | GTO | FASTA, GenBank, feature table, etc. | yes | This study |
| rast-add-features | user-defined | tab-delimited text | GTO | no | This study |
| rast-annotate-proteins-kmer-v1 | CDS | GTO | GTO | yes | [ |
| rast-annotate-proteins-kmer-v2 | CDS | GTO | GTO | yes | This study |
| rast-annotate-proteins-similarity | CDS | GTO | GTO | no | This study |
| rast-call-features-CDS-genemark | CDS | GTO | GTO | no | [ |
| rast-call-features-CDS-glimmer3 | CDS | GTO | GTO | yes | [ |
| rast-call-features-CDS-prodigal | CDS | GTO | GTO | yes | [ |
| rast-call-features-crispr | CRISPR array, CRISPR repeat and CRISPR spacer | GTO | GTO | yes | This study |
| rast-call-features-insertion-sequences | IS elements | GTO | GTO | no | This study |
| rast-call-features-prophage-phispy | Prophage | GTO | GTO | no | [ |
| rast-call-features-pyrrolysoprotein | CDS | GTO | GTO | yes | [ |
| rast-call-features-repeat-region-SEED | Repeat regions | GTO | GTO | yes | This study |
| rast-call-features-rRNA-SEED | RNA (rRNA) | GTO | GTO | yes | This study |
| rast-call-features-selenoprotein | CDS | GTO | GTO | yes | [ |
| rast-call-features-strep-pneumo-repeat | Repeat regions | GTO | GTO | conditional | [ |
| rast-call-features-strep-suis-repeat | Repeat regions | GTO | GTO | conditional | [ |
| rast-call-features-tRNA-trnascan | RNA (tRNA) | GTO | GTO | yes | [ |
| rast-resolve-overlapping-features | n/a | GTO | GTO | yes | This study |
| rast-update-annotations | n/a | GTO | GTO | no | This study |
| rast-set-metadata | n/a | GTO | GTO | no | This study |
| rast-process-genome-batch | CDS, RNA, Repeat Regions, CRISPRs, IS elements | GTO | n/a | no | This study |
| rast-query-genome-batch | n/a | n/a | n/a | no | This study |
| rast-download-genome-batch | n/a | n/a | GTO | no | This study |
| rast-call-features-ProtoCDS-kmer-v1 | n/a | GTO | GTO | no | This study |
| rast-call-features-ProtoCDS-kmer-v2 | n/a | GTO | GTO | no | This study |
| rast-compute-special-proteins | n/a | GTO | tab-delimited text | no | [ |
| rast-enumerate-special-protein-databases | n/a | n/a | n/a | no | This study |
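Because most scripts in the table read a GTO and write a GTO, they can be chained in sequence, while tools such as rast-create-genome and rast-export-genome sit at the ends of a chain. The sketch below illustrates that idea by checking whether each tool's output type matches the next tool's input type. The `TOOL_IO` dictionary and `first_mismatch` function are hypothetical helpers written for this illustration; the type labels "FASTA" and "export formats" are shorthand for the table entries, and only a handful of rows are included.

```python
# (input type, output type) for a few rows copied from the table above.
# The short type labels are simplifications made for this sketch.
TOOL_IO = {
    "rast-create-genome": ("FASTA", "GTO"),
    "rast-call-features-rRNA-SEED": ("GTO", "GTO"),
    "rast-call-features-CDS-prodigal": ("GTO", "GTO"),
    "rast-annotate-proteins-kmer-v2": ("GTO", "GTO"),
    "rast-resolve-overlapping-features": ("GTO", "GTO"),
    "rast-compute-special-proteins": ("GTO", "tab-delimited text"),
    "rast-export-genome": ("GTO", "export formats"),
}


def first_mismatch(pipeline):
    """Return the first (producer, consumer) pair whose types differ, or None."""
    for producer, consumer in zip(pipeline, pipeline[1:]):
        if TOOL_IO[producer][1] != TOOL_IO[consumer][0]:
            return (producer, consumer)
    return None


# A chain in which every output feeds a matching input.
good = [
    "rast-create-genome",
    "rast-call-features-rRNA-SEED",
    "rast-call-features-CDS-prodigal",
    "rast-annotate-proteins-kmer-v2",
    "rast-resolve-overlapping-features",
    "rast-export-genome",
]

# Exporting too early: the annotation step would not receive a GTO.
bad = [
    "rast-create-genome",
    "rast-export-genome",
    "rast-annotate-proteins-kmer-v2",
]

print(first_mismatch(good))
print(first_mismatch(bad))
```

This check only compares declared file types. It says nothing about whether a given order of steps is biologically sensible.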
Figure 2. The RAST workflow.
Each individual step is bounded by a box, and steps are connected by arrows. New RASTtk steps are indicated by red boxes and arrows. Improvements in the original steps are indicated in red text. Steps that are no longer part of the RASTtk pathway are indicated by gray arrows.