
Sequenceserver: A Modern Graphical User Interface for Custom BLAST Databases.

Anurag Priyam¹, Ben J Woodcroft², Vivek Rai³, Ismail Moghul¹, Alekhya Munagala⁴, Filip Ter¹, Hiten Chowdhary⁴, Iwo Pieniak¹, Lawrence J Maynard¹, Mark Anthony Gibbins⁵, HongKee Moon⁶, Austin Davis-Richardson⁷, Mahmut Uludag⁸, Nathan S Watson-Haigh⁹, Richard Challis¹⁰,¹¹, Hiroyuki Nakamura¹², Emeline Favreau¹, Esteban A Gómez¹, Tomáš Pluskal¹³, Guy Leonard¹⁴, Wolfgang Rumpf¹⁵, Yannick Wurm¹,¹⁶.

Abstract

Comparing newly obtained and previously known nucleotide and amino-acid sequences underpins modern biological research. BLAST is a well-established tool for such comparisons but is challenging to use on new data sets. We combined a user-centric design philosophy with sustainable software development approaches to create Sequenceserver, a tool for running BLAST and visually inspecting BLAST results for biological interpretation. Sequenceserver uses simple algorithms to prevent potential analysis errors and provides flexible text-based and visual outputs to support researcher productivity. Our software can be rapidly installed for use by individuals or on shared servers.


Keywords: BLAST; comparative genomics; sequence analysis; visualization


Year: 2019 PMID: 31411700 PMCID: PMC6878946 DOI: 10.1093/molbev/msz185

Source DB: PubMed Journal: Mol Biol Evol ISSN: 0737-4038 Impact factor: 16.240

Introduction

The dramatic drop in sequencing costs has created many opportunities for individuals and groups of researchers to generate genomic or transcriptomic sequences from previously understudied organisms. Many research questions require small- or large-scale sequence comparisons, and BLAST (Basic Local Alignment Search Tool) is the most established tool for many such analyses (Altschul et al. 1990; Camacho et al. 2009). Unfortunately, BLAST analysis of new data can be challenging. New data become publicly available on central BLAST repositories such as that of the NCBI (National Center for Biotechnology Information) only after submission and release delays, and only small queries are feasible on such repositories. BLAST can be downloaded and installed locally, but it can be difficult to use for researchers without command-line experience. Finally, commercial software that overcomes these hurdles is too costly for many laboratories. Here, we present Sequenceserver, a free graphical interface for BLAST designed to increase the productivity of biologists performing and interpreting BLAST searches on custom data sets, and of bioinformaticians setting up shared laboratory or community databases. Its user-centric design (Garrett 2011) focuses on accompanying researchers through their work process. Below, we provide an overview of Sequenceserver features that facilitate BLAST query submission and interpretation.

Assisted Installation and BLAST Query Submission

Installing Sequenceserver on computers running macOS or Linux is typically rapid, requiring only one or a few commands (see online documentation). If necessary, Sequenceserver automates the download of BLAST (Camacho et al. 2009) binaries and can convert FASTA files into BLAST databases. A user accesses Sequenceserver’s graphical interface in a web browser at http://localhost:4567 (fig. 1), where all detected BLAST databases are automatically listed. The user types, pastes, or drag-and-drops FASTA-format query sequences into a text field (fig. 1). To prevent common errors, an alert message is shown and query submission is disabled if the query is invalid (e.g., if it combines nucleotide and protein sequences). The user then selects databases, and the appropriate basic BLAST algorithm is chosen automatically (supplementary fig. S1, Supplementary Material online). When multiple algorithms are appropriate, a drop-down menu on the BLAST submission button allows the user to toggle between them. An “advanced parameters” field provides access to all standard BLAST parameters.
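The query validation and algorithm selection described above can be sketched in Python. This is a minimal illustration under stated assumptions, not Sequenceserver's own (Ruby) code: the 90% nucleotide-composition threshold, the helper names `parse_fasta`, `guess_type` and `choose_algorithms`, and the error messages are hypothetical. The mapping from query and database types to programs follows the standard BLAST+ definitions (blastn/tblastx, blastx, tblastn, blastp).

```python
# Illustrative threshold (an assumption): a sequence counts as nucleotide when at
# least 90% of its letters are A, C, G, T, U or N; otherwise it counts as protein.
NUCLEOTIDE_CHARS = set("ACGTUN")
NUCLEOTIDE_FRACTION = 0.9

# Basic BLAST+ programs for each (query type, database type) combination.
ALGORITHMS = {
    ("nucleotide", "nucleotide"): ["blastn", "tblastx"],
    ("nucleotide", "protein"): ["blastx"],
    ("protein", "nucleotide"): ["tblastn"],
    ("protein", "protein"): ["blastp"],
}


def parse_fasta(text):
    """Split FASTA text into (identifier, uppercase sequence) pairs."""
    records, ident, chunks = [], None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Flush the previous record (or a header-less sequence) before starting anew.
            if ident is not None or chunks:
                records.append((ident or "unnamed", "".join(chunks)))
            fields = line[1:].split()
            ident = fields[0] if fields else "unnamed"
            chunks = []
        else:
            chunks.append(line.upper())
    if ident is not None or chunks:
        records.append((ident or "unnamed", "".join(chunks)))
    return records


def guess_type(sequence):
    """Classify one sequence by its letter composition; None if it has no letters."""
    letters = [c for c in sequence.upper() if c.isalpha()]
    if not letters:
        return None
    fraction = sum(c in NUCLEOTIDE_CHARS for c in letters) / len(letters)
    return "nucleotide" if fraction >= NUCLEOTIDE_FRACTION else "protein"


def choose_algorithms(fasta_text, database_types):
    """Return the BLAST programs valid for this query and database selection."""
    query_types = {guess_type(seq) for _, seq in parse_fasta(fasta_text)}
    query_types.discard(None)
    if not query_types:
        raise ValueError("no query sequence found")
    if len(query_types) > 1:
        # Mirrors the alert shown when nucleotide and protein queries are mixed.
        raise ValueError("query mixes nucleotide and protein sequences")
    db_types = set(database_types)
    if len(db_types) != 1:
        # In this sketch, databases of both types cannot be searched together.
        raise ValueError("select one or more databases of a single type")
    return ALGORITHMS[(query_types.pop(), db_types.pop())]
```

For example, `choose_algorithms(">q\nMKTAYIAKQR\n", ["protein"])` returns `["blastp"]`, matching the single “BlastP” option in fig. 1A, whereas a nucleotide query against nucleotide databases offers two programs to toggle between.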

Fig. 1.

(A) Partial screenshot of the query interface. Numbers circled in red highlight the steps involved and some specific features. (1) Three or more sequences were pasted into the query field (typewriter font; only the identifier is visible for the third sequence); a message confirms to the user that these are amino acid sequences. (2) The Swiss-Prot protein database was the first database to be selected. As a result, additional database selections are limited to protein databases; nucleotide databases are disabled. (3) Optional advanced parameters were entered that restrict the results to the ten strongest hits with E-values below 10⁻¹⁰. (4) The BLAST button is automatically activated and labeled “BlastP” as this is the only possible basic BLAST algorithm for the given query-database combination. As the user’s mouse pointer hovers over the BlastP button, a tooltip indicates that a keyboard shortcut exists for this button. (B) Partial screenshot of a Sequenceserver BLAST report. An interactive version of this figure is online at http://sequenceserver.com/paper/resultsinteractive (last accessed August 25, 2019). Three amino acid sequences were compared against the Swiss-Prot database using BlastP with an E-value cutoff of 10⁻¹⁰, keeping only the ten strongest hits per query. This screenshot shows a portion of the results for the first query. Numbers circled in red highlight some specific features of this report. (1) An index overview summarizes the query and database information and provides clickable links to query-specific results. (2) Results for the first query are shown. These include a graphical overview indicating which parts of the query sequence align to each hit, a tabular summary of all hits, and alignment details for each hit. (3) The first hit is selected for download; its alignment details have been folded away. (4) The user is studying the second hit; the mouse pointer hovers over the link to the hit’s UniProt page.
(C) Sequenceserver usage as of June 11, 2019. These include download statistics from https://rubygems.org/gems/sequenceserver, Google Analytics statistics for http://sequenceserver.com, citation statistics from https://app.dimensions.ai/details/publication/pub.1085102830, and GitHub statistics from https://github.com/wurmlab/sequenceserver.
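The advanced parameters in panel A (3) could be expressed as the standard BLAST+ options `-evalue 1e-10 -max_target_seqs 10`. The sketch below uses a hypothetical helper, `build_blast_command`, that appends such a user-supplied string to a BLAST+ command line requesting XML output (`-outfmt 5`). The set of reserved options it refuses is an assumption for illustration, not Sequenceserver's actual rule.

```python
import shlex

# Options this hypothetical wrapper sets itself; user-supplied copies are rejected.
RESERVED_OPTIONS = {"-query", "-db", "-out", "-outfmt"}


def build_blast_command(program, query_path, databases, advanced=""):
    """Assemble a BLAST+ argument list, appending user-supplied advanced options."""
    # shlex.split honours shell-style quoting inside the advanced-parameters string.
    extra = shlex.split(advanced)
    clashes = RESERVED_OPTIONS.intersection(extra)
    if clashes:
        raise ValueError("reserved option(s): " + ", ".join(sorted(clashes)))
    return [
        program,
        "-query", query_path,
        # BLAST+ accepts several databases as one space-separated -db value.
        "-db", " ".join(databases),
        # -outfmt 5 requests BLAST XML output.
        "-outfmt", "5",
        *extra,
    ]
```

The returned list could then be executed with `subprocess.run(cmd, capture_output=True, text=True, check=True)`; passing a list rather than a single shell string avoids quoting problems with user-supplied parameters.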

BLAST Result Visualization and Further Analysis

The Sequenceserver results page is designed to facilitate navigation, interpretation, and follow-up analysis (fig. 1 and http://sequenceserver.com/paper/resultsinteractive; last accessed August 25, 2019). Results are visually structured and will feel familiar to users of NCBI BLAST. If multiple query sequences were submitted, a clickable index of queries is shown. Queries, hits, and BLAST HSPs (high-scoring segment pairs) are numbered to facilitate navigation. For each query, identified hits are summarized in a table and an overview graphic. Each hit includes links for FASTA download, sequence visualization, and optionally other resources. Such links can be generated automatically by matching hit identifiers against regular expressions (see online documentation). BLAST results can be downloaded in XML or tab-delimited formats for further analysis. Similarly, a FASTA file containing all hit sequences or a selection of them can be downloaded.
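The tab-delimited download and identifier-based links can be illustrated together. The sketch below parses the 12 default columns of BLAST+ tabular output (`-outfmt 6`) and applies regular-expression link rules to hit identifiers. The rule table, its UniProt URL template, and the function names are hypothetical; Sequenceserver's own link mechanism is described in its online documentation.

```python
import re

# The 12 default columns of BLAST+ tabular output (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]
TEXT_COLUMNS = {"qseqid", "sseqid"}
FLOAT_COLUMNS = {"pident", "evalue", "bitscore"}

# Hypothetical link rule: Swiss-Prot/TrEMBL-style identifiers such as
# "sp|P69905|HBA_HUMAN" are mapped to a UniProt entry URL (template assumed).
LINK_RULES = [
    (re.compile(r"^(?:sp|tr)\|([A-Z0-9]+(?:-\d+)?)\|"),
     "https://www.uniprot.org/uniprot/{0}"),
]


def parse_tabular(text):
    """Parse default -outfmt 6 lines into dicts, skipping blank and comment lines."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        row = {}
        for name, value in zip(COLUMNS, line.split("\t")):
            if name in TEXT_COLUMNS:
                row[name] = value
            elif name in FLOAT_COLUMNS:
                row[name] = float(value)
            else:
                # Alignment lengths, counts and coordinates are integers.
                row[name] = int(value)
        hits.append(row)
    return hits


def links_for(sseqid):
    """Return every URL whose rule matches the hit identifier."""
    urls = []
    for pattern, template in LINK_RULES:
        match = pattern.search(sseqid)
        if match:
            urls.append(template.format(*match.groups()))
    return urls
```

Keeping link rules as data (pattern plus URL template) means a community database could add links to its own genome browser by appending a rule, without touching the parsing code.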

Usage by Individual Researchers and as Part of Community Databases

Usage statistics, including downloads, preprint citations, GitHub activity, and mailing list participation (fig. 1C), indicate that Sequenceserver is extensively used for molecular-genetic research on emerging model organisms (supplementary table S1, Supplementary Material online). For example, Sequenceserver installations on personal computers helped characterize the evolution of tunicate genomes (Blanchoud et al. 2018), fire ant olfactory genes (Pracana et al. 2017), and loci affecting Sorghum shoot architecture (McCormick et al. 2016). Sequenceserver has also been used to analyze human prostate cancer genomes (Seim et al. 2017) and to identify bacteria affecting the shelf life of milk (Reichler et al. 2018). Importantly, Sequenceserver is also a main querying mechanism for more than 50 community genome databases (supplementary table S2, Supplementary Material online), including the PHI-base database of genes underpinning pathogen–host interactions (Winnenburg et al. 2006), an initiative to sequence 1,000 wild yeast genomes (Shen et al. 2016), and the reefgenomics.org coral genomics database (http://reefgenomics.org, last accessed August 25, 2019; Liew et al. 2016). Such community resources typically integrate Sequenceserver into larger web servers (e.g., Nginx [Reese 2008]) and customize it by adding links from BLAST hits to genome browsers or other gene-specific information. Additionally, many password-protected Sequenceserver instances exist for unpublished data.

Outlook

In creating Sequenceserver, we aimed to follow user-centric design principles as well as open-source and sustainable software engineering practices (Supplementary Material online). Our software is built using Ruby and JavaScript frameworks commonly used in professional software development. The resulting robust and flexible architecture facilitates customization and integration with other tools. This has led to improvements and bug fixes contributed by 21 bioinformaticians unrelated to the initial project; many are now coauthors. Our community is testing the ability to import preexisting BLAST or DIAMOND XML result files (Buchfink et al. 2015) and new ways of visualizing results (Wintersinger and Wasmuth 2015; Cui et al. 2016). Such efforts will continue to improve researchers’ ability to analyze and interpret genomic data.

Data Availability

Source code is available under the GNU Affero General Public License (AGPL) 3.0 at https://github.com/sequenceserver (last accessed August 25, 2019). Additional documentation is available online at http://sequenceserver.com (last accessed August 25, 2019).

Supplementary Material

Supplementary materials are available at Molecular Biology and Evolution online.

References

1. BioCircos.js: an interactive Circos JavaScript library for biological data visualization on web applications.

Authors: Ya Cui; Xiaowei Chen; Huaxia Luo; Zhen Fan; Jianjun Luo; Shunmin He; Haiyan Yue; Peng Zhang; Runsheng Chen
Journal: Bioinformatics Date: 2016-01-27 Impact factor: 6.937

2. Kablammo: an interactive, web-based BLAST results visualizer.

Authors: Jeff A Wintersinger; James D Wasmuth
Journal: Bioinformatics Date: 2014-12-05 Impact factor: 6.937

3. Fast and sensitive protein alignment using DIAMOND.

Authors: Benjamin Buchfink; Chao Xie; Daniel H Huson
Journal: Nat Methods Date: 2014-11-17 Impact factor: 28.547

4. 3D Sorghum Reconstructions from Depth Images Identify QTL Regulating Shoot Architecture.

Authors: Ryan F McCormick; Sandra K Truong; John E Mullet
Journal: Plant Physiol Date: 2016-08-15 Impact factor: 8.340

5. PHI-base: a new database for pathogen host interactions.

Authors: Rainer Winnenburg; Thomas K Baldwin; Martin Urban; Chris Rawlings; Jacob Köhler; Kim E Hammond-Kosack
Journal: Nucleic Acids Res Date: 2006-01-01 Impact factor: 16.971

6. Reefgenomics.Org - a repository for marine genomics data.

Authors: Yi Jin Liew; Manuel Aranda; Christian R Voolstra
Journal: Database (Oxford) Date: 2016-12-26 Impact factor: 3.451

7. Whole-Genome Sequence of the Metastatic PC3 and LNCaP Human Prostate Cancer Cell Lines.

Authors: Inge Seim; Penny L Jeffery; Patrick B Thomas; Colleen C Nelson; Lisa K Chopin
Journal: G3 (Bethesda) Date: 2017-06-07 Impact factor: 3.154

8. Reconstructing the Backbone of the Saccharomycotina Yeast Phylogeny Using Genome-Scale Data.

Authors: Xing-Xing Shen; Xiaofan Zhou; Jacek Kominek; Cletus P Kurtzman; Chris Todd Hittinger; Antonis Rokas
Journal: G3 (Bethesda) Date: 2016-12-07 Impact factor: 3.154

9. De novo draft assembly of the Botrylloides leachii genome provides further insight into tunicate evolution.

Authors: Simon Blanchoud; Kim Rutherford; Lisa Zondag; Neil J Gemmell; Megan J Wilson
Journal: Sci Rep Date: 2018-04-03 Impact factor: 4.379

10. Fire ant social chromosomes: Differences in number, sequence and expression of odorant binding proteins.

Authors: Rodrigo Pracana; Ilya Levantis; Carlos Martínez-Ruiz; Eckart Stolle; Anurag Priyam; Yannick Wurm
Journal: Evol Lett Date: 2017-08-23
