David Arndt, Jason R Grant, Ana Marcu, Tanvir Sajed, Allison Pon, Yongjie Liang, David S Wishart.
Abstract
PHASTER (PHAge Search Tool - Enhanced Release) is a significant upgrade to the popular PHAST web server for the rapid identification and annotation of prophage sequences within bacterial genomes and plasmids. Although the steps in the phage identification pipeline in PHASTER remain largely the same as in the original PHAST, numerous software improvements and significant hardware enhancements have made PHASTER faster, more efficient, more visually appealing and much more user-friendly. In particular, PHASTER is now 4.3× faster than PHAST when analyzing a typical bacterial genome. More specifically, software optimizations have made the backend of PHASTER 2.7× faster than PHAST, while the addition of 80 CPUs to the PHASTER compute cluster is responsible for the remaining speed-up. PHASTER can now process a typical bacterial genome in 3 min from the raw sequence alone, or in 1.5 min when given a pre-annotated GenBank file. A number of other optimizations have also been implemented, including automated algorithms to reduce the size and redundancy of PHASTER's databases, improvements in handling multiple (metagenomic) queries and higher user traffic, and the ability to perform automated look-ups against 14 000 bacterial genomes previously annotated by PHAST/PHASTER (which can yield complete phage annotations in seconds rather than minutes). PHASTER's web interface has also been entirely rewritten. A new graphical genome browser has been added, gene/genome visualization tools have been improved, and the graphical interface is now more modern, robust and user-friendly. PHASTER is available online at www.phaster.ca.
Year: 2016 PMID: 27141966 PMCID: PMC4987931 DOI: 10.1093/nar/gkw387
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. A screenshot montage of the upgraded PHASTER user interface.
Detail of PHASTER's performance upgrades, and a comparison of their impact on runtimes for a 5.5 Mbp test genome (Escherichia coli O157:H7, GenBank accession NC_002655)
| Cumulative set of performance enhancements | BLAST versus viral DB runtime (s) | BLAST versus bacterial DB runtime (s) | Total runtime on GenBank annotated genome (s) | Total runtime on unannotated genome (s) |
|---|---|---|---|---|
| PHAST (baseline) current DBs, no other upgrades | 191 | 576 | 270 | 899 |
| PHASTER (upgrade 1) – filter bacterial DB | 191 | 309 | 270 | 632 |
| PHASTER (upgrade 2) – cluster upgrade | 88 | 166 | 167 | 386 |
| PHASTER (upgrade 3) – BLAST+ | 77 | 132 | 156 | 341 |
| PHASTER (upgrade 4) – partition query sequences evenly | 47 | 103 | 126 | 282 |
| PHASTER (upgrade 5) – bacterial DB, optimize parameters | 47 | 48 | 126 | 227 |
| PHASTER (upgrade 6) – faster front-end server | 47 | 48 | 100 | 205 |
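The cumulative speed-up at each stage can be read off the two total-runtime columns by dividing the baseline runtime by the runtime after each upgrade. The short Python sketch below does that arithmetic on the values copied from the table; the variable and function names are illustrative only and are not part of PHASTER. For the unannotated genome the final ratio is 899/205 ≈ 4.39, close to the roughly 4.3× figure quoted in the abstract.

```python
# Total runtimes in seconds, copied from the table above
# (5.5 Mbp E. coli O157:H7 test genome, NC_002655).
STAGES = [
    "PHAST (baseline)",
    "upgrade 1: filter bacterial DB",
    "upgrade 2: cluster upgrade",
    "upgrade 3: BLAST+",
    "upgrade 4: partition query sequences evenly",
    "upgrade 5: bacterial DB, optimize parameters",
    "upgrade 6: faster front-end server",
]
TOTAL_ANNOTATED_S = [270, 270, 167, 156, 126, 126, 100]
TOTAL_UNANNOTATED_S = [899, 632, 386, 341, 282, 227, 205]


def cumulative_speedups(runtimes):
    """Return baseline_runtime / runtime for every stage (baseline is stage 0)."""
    baseline = runtimes[0]
    return [baseline / seconds for seconds in runtimes]


annotated = cumulative_speedups(TOTAL_ANNOTATED_S)
unannotated = cumulative_speedups(TOTAL_UNANNOTATED_S)

# Print one line per stage with both cumulative speed-up factors.
for stage, a, u in zip(STAGES, annotated, unannotated):
    print(f"{stage:<46} annotated {a:4.2f}x   unannotated {u:4.2f}x")
```

Because the upgrades are cumulative, each row's factor includes every upgrade listed above it, so the contribution of a single upgrade is the ratio between adjacent rows rather than the printed value itself.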
A feature comparison between PHAST and PHASTER
| Feature | PHAST (as of January 2011) | PHASTER |
|---|---|---|
| Viral sequence database | ∼45 000 sequences | ∼187 000 sequences |
| Bacterial sequence database | ∼4 million sequences | ∼9 million sequences, streamlined through CD-HIT filtering |
| Computing cluster | 32 CPU cores | 112 CPU cores |
| BLAST | Legacy version 2.2.16 | BLAST+ version 2.3.0+ |
| Cluster use optimization | Rudimentary | Smart partitioning of query sequences and target bacterial DB; optimized execution parameters |
| Front-end server | Shared, single CPU | 50% faster, dedicated |
| Front-end website | Perl and CGI | Ruby on Rails |
| Genome viewer | Adobe Flash | JavaScript, AngularPlasmid and D3 |
| Queuing system | Flat file | Uses Sidekiq for threading submissions |
| Recall previous user submissions | Bookmark page | ‘My Searches’ feature or bookmark |
| Pre-computed genome results for quick query searching | 0 | >14 000 |
| Retrieve previously annotated genome results | GenBank accession or GI number only | GenBank accession, GI number, or full sequence |
| Metagenomic data handling | NA | Supported for contigs >2000 bp |
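The table mentions partitioning query sequences across the cluster and a 2000 bp cut-off for metagenomic contigs, but it does not describe how PHASTER implements either. The Python sketch below is therefore only one plausible illustration, not PHASTER's code: it keeps contigs longer than 2000 bp and then balances sequences across a chosen number of workers with a greedy longest-first heuristic. The function names, the input dictionaries and the choice of heuristic are all assumptions.

```python
import heapq

# From the table: metagenomic input is supported for contigs >2000 bp.
MIN_CONTIG_LENGTH = 2000


def filter_contigs(contigs, min_length=MIN_CONTIG_LENGTH):
    """Keep only contigs strictly longer than min_length (hypothetical helper)."""
    return {name: seq for name, seq in contigs.items() if len(seq) > min_length}


def partition_evenly(lengths, n_workers):
    """Split sequences into n_workers groups with roughly equal total length.

    Greedy longest-first heuristic (an assumption, not PHASTER's documented
    method): visit sequences from longest to shortest and give each one to
    the worker that currently has the smallest total length.
    """
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    heap = [(0, worker) for worker in range(n_workers)]  # (current load, worker id)
    heapq.heapify(heap)
    groups = [[] for _ in range(n_workers)]
    # Sort by descending length, breaking ties by name for reproducibility.
    for name, length in sorted(lengths.items(), key=lambda kv: (-kv[1], kv[0])):
        load, worker = heapq.heappop(heap)
        groups[worker].append(name)
        heapq.heappush(heap, (load + length, worker))
    return groups


# Usage with made-up sequence names and lengths.
example_lengths = {"a": 900, "b": 500, "c": 400, "d": 300, "e": 100}
example_groups = partition_evenly(example_lengths, n_workers=2)
example_loads = [sum(example_lengths[n] for n in g) for g in example_groups]
print(example_groups, example_loads)
```

Balancing by total sequence length rather than by sequence count matters when lengths vary widely, since a search job finishes only when its slowest partition does.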