Yasuhiro Tanizawa, Takatomo Fujisawa, Yasukazu Nakamura.
Abstract
Summary: We developed a prokaryotic genome annotation pipeline, DFAST, that also supports genome submission to public sequence databases. DFAST originally started as an on-line annotation server, and over 7000 jobs have been processed since its first launch in 2016. Here, we present a newly implemented background annotation engine for DFAST, which is also available as a standalone command-line program. The new engine can annotate a typical-sized bacterial genome within 10 min and reports rich information such as pseudogenes, translation exceptions and orthologous gene assignment between given reference genomes. In addition, the modular framework of DFAST allows users to customize the annotation workflow easily and will facilitate extensions for new functions and the incorporation of new tools in the future.
Availability and implementation: The software is implemented in Python 3 and runs in both Python 2.7 and 3.4 or later on Macintosh and Linux systems. It is freely available at https://github.com/nigyta/dfast_core/ under the GPLv3 license, with external binaries bundled in the software distribution. An on-line version is also available at https://dfast.nig.ac.jp/.
Contact: yn@nig.ac.jp.
Supplementary information: Supplementary data are available at Bioinformatics online.
Year: 2018 PMID: 29106469 PMCID: PMC5860143 DOI: 10.1093/bioinformatics/btx713
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. DFAST annotation workflow. Items marked with asterisks are included in the default workflow.
Comparison of annotation results of E. coli O26:H11 str. 11368
| Data source/Annotation tool | INSDC | RefSeq | DFAST | Prokka | MiGAP |
|---|---|---|---|---|---|
| Total CDS | 5795 | 6243 | 5740 | 5759 | 5721 |
| | 276 | 337 (250/87) | 344 (158/186) | [30] | — |
| | 3 | 1 | 3 | — | — |
| | — | — | 3965 | — | 4392 |
| | 1203 | 1514 | 1347 | 2068 | 418 |
| tRNA | 101 | 101 | 105 | 105 | 100 |
| rRNA | 22 | 22 | 22 | 22 | 22 |
| CRISPR array | — | 2 | 2 | 2 | — |
| Running time | — | — | 3 m 27 s | 3 m 20 s | 4 h 43 m |
Note: Numbers represent annotated features and running time. DFAST and Prokka were run on a 4-core Macintosh laptop with default settings.
INSDC: original annotation by submitters (GCA_000091005.1).
RefSeq: annotated by PGAP (GCF_000091005.1).
Numbers in parentheses denote genes with an internal stop codon or frameshift and partial genes, respectively.
Value in brackets (Prokka): candidates for pseudogenes are mentioned in the log file, not in the result.
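The running times in the table mix units (minutes and seconds for DFAST and Prokka, hours and minutes for MiGAP), so a direct comparison takes a small conversion. The sketch below converts the three reported values to seconds and expresses each relative to DFAST. The helper name `to_seconds` and the dictionary layout are illustrative choices, not part of DFAST. The table note states the hardware only for DFAST and Prokka, so the MiGAP ratio should be read as a rough indication rather than a like-for-like benchmark.

```python
import re

# Running times copied verbatim from the comparison table.
RUNNING_TIMES = {
    "DFAST": "3 m 27 s",
    "Prokka": "3 m 20 s",
    "MiGAP": "4 h 43 m",
}

# Seconds per unit letter used in the table (h, m, s).
UNIT_SECONDS = {"h": 3600, "m": 60, "s": 1}


def to_seconds(text):
    """Convert a duration such as '3 m 27 s' or '4 h 43 m' to seconds."""
    pairs = re.findall(r"(\d+)\s*([hms])", text)
    if not pairs:
        raise ValueError(f"unrecognized duration: {text!r}")
    return sum(int(value) * UNIT_SECONDS[unit] for value, unit in pairs)


seconds = {tool: to_seconds(t) for tool, t in RUNNING_TIMES.items()}
ratio_to_dfast = {tool: s / seconds["DFAST"] for tool, s in seconds.items()}

for tool in RUNNING_TIMES:
    print(f"{tool}: {seconds[tool]} s ({ratio_to_dfast[tool]:.1f}x DFAST)")
```

With the table values, DFAST and Prokka come out at 207 s and 200 s, while MiGAP takes 16980 s, roughly 82 times the DFAST running time.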