
The EMBL-EBI bioinformatics web and programmatic tools framework.

Weizhong Li, Andrew Cowley, Mahmut Uludag, Tamer Gur, Hamish McWilliam, Silvano Squizzato, Young Mi Park, Nicola Buso, Rodrigo Lopez.

Abstract

Since 2009 the EMBL-EBI Job Dispatcher framework has provided free access to a range of mainstream sequence analysis applications. These include sequence similarity search services (https://www.ebi.ac.uk/Tools/sss/) such as BLAST, FASTA and PSI-Search, multiple sequence alignment tools (https://www.ebi.ac.uk/Tools/msa/) such as Clustal Omega, MAFFT and T-Coffee, and other sequence analysis tools (https://www.ebi.ac.uk/Tools/pfa/) such as InterProScan. Through these services users can search mainstream sequence databases such as ENA, UniProt and Ensembl Genomes, either through a uniform web interface or systematically through Web Services interfaces (https://www.ebi.ac.uk/Tools/webservices/) using common programming languages, and obtain enriched results with novel visualisations. Integration with EBI Search (https://www.ebi.ac.uk/ebisearch/) and the dbfetch retrieval service (https://www.ebi.ac.uk/Tools/dbfetch/) further expands the usefulness of the framework. New tools and updates such as NCBI BLAST+, InterProScan 5 and PfamScan, new categories such as RNA analysis tools (https://www.ebi.ac.uk/Tools/rna/), new databases such as ENA non-coding, WormBase ParaSite, Pfam and Rfam, and new workflow methods, together with the retirement of deprecated services, ensure that the framework remains relevant to today's biological community.
© The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.


Year:  2015        PMID: 25845596      PMCID: PMC4489272          DOI: 10.1093/nar/gkv279

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

The European Bioinformatics Institute (EMBL-EBI; https://www.ebi.ac.uk) has provided free and open access to a range of bioinformatics applications for sequence analysis since 1998 (1). In 2009 the Job Dispatcher framework (2,3) was released to provide consistent, robust and updatable access to modern bioinformatics tools such as NCBI BLAST+ (4) and PSI-Search (5) for sequence similarity searching; InterProScan (6) and PfamScan (7) for protein functional analysis; and multiple sequence alignment tools such as Clustal Omega (8), Kalign2 (9) and MAFFT (10). Through these applications the latest mainstream bioinformatics databases can be searched, for example ENA (11), Ensembl Genomes (12), UniProt (13), InterPro (14) and Pfam (15). The framework is used by academic and industry scientists, and in 2014 handled roughly 110 million analysis jobs, up from 65 million in 2013. Help pages, tutorials and user guides (available as protocols (16)) are provided, together with training courses and helpdesk support. Continued feedback from the biological community, collaboration with bioinformatics tools and data providers, and comprehensive metrics analysis help to drive improvements to the accessibility and quality of the services.

THE TOOLS FRAMEWORK

The EMBL-EBI Job Dispatcher is a modular and configuration-driven framework aimed at both novice and expert users. A uniform web browser interface enables users to upload their data or select existing data from our databases for analysis in a wide range of applications (Table 1). Browser inputs are checked to validate all the parameters required for successful job submission, and guidance is provided to the user in the case of failure. Default parameter choices are set in collaboration with the tool authors for the intended uses of the tools, and can be adjusted by the user. Results are presented visually and enriched with data from other applications, for example cross-reference annotations via EBI Search (17) or functional domain predictions via InterPro (14). Biological data entries discovered as part of the analysis can be retrieved via the dbfetch service (3). SOAP and REST Web Services provide stable APIs for programmatic use. Input validation and parameter help are also built into the Web Services, and results can be retrieved in a range of graphical and machine-readable formats. Sample Web Services clients are available in a range of programming languages (e.g. C#, Java, Perl, Python and Ruby).
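As an illustration of programmatic use, the following is a minimal Python sketch of the submit, poll and retrieve cycle that a REST client typically follows. It is not one of the official sample clients. The base URL, the `run`, `status` and `result` path segments, the status strings and the `out` result type are assumptions that should be checked against the Web Services documentation (https://www.ebi.ac.uk/Tools/webservices/). The function name `run_job` is hypothetical.

```python
import time
from typing import Callable, Dict, Optional
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: base URL of the REST services (not stated in the text).
BASE_URL = "https://www.ebi.ac.uk/Tools/services/rest"
# Assumption: status strings that mean the job will never finish.
FAILED_STATES = {"ERROR", "FAILURE", "NOT_FOUND"}


def http_call(url: str, data: Optional[Dict[str, str]] = None) -> str:
    """GET the URL, or POST form-encoded data when given; return the body as text."""
    body = urlencode(data).encode("utf-8") if data is not None else None
    with urlopen(Request(url, data=body)) as response:
        return response.read().decode("utf-8")


def run_job(
    tool: str,
    params: Dict[str, str],
    result_type: str = "out",  # assumption: identifier of the main tool output
    call: Callable[..., str] = http_call,
    poll_seconds: float = 5.0,
    max_polls: int = 120,
) -> str:
    """Submit a job, poll its status until it finishes, then fetch one result."""
    # Step 1: submit the parameters; the service is assumed to reply with a job id.
    job_id = call(f"{BASE_URL}/{tool}/run", params).strip()

    # Step 2: poll the status a bounded number of times.
    for _ in range(max_polls):
        status = call(f"{BASE_URL}/{tool}/status/{job_id}").strip()
        if status == "FINISHED":
            # Step 3: retrieve the requested result type.
            return call(f"{BASE_URL}/{tool}/result/{job_id}/{result_type}")
        if status in FAILED_STATES:
            raise RuntimeError(f"job {job_id} ended with status {status}")
        time.sleep(poll_seconds)

    raise TimeoutError(f"job {job_id} did not finish after {max_polls} polls")
```

The HTTP function is passed in as `call` so that the polling logic can be exercised without network access. A real call might look like `run_job("clustalo", {"email": "user@example.org", "sequence": fasta_text})`, where the tool identifier and parameter names are again assumptions.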
Table 1. Tool services available in the Job Dispatcher framework

Category | Tools
EMBOSS Programs (https://www.ebi.ac.uk/Tools/emboss/) | needle, stretcher, water, matcher, transeq, sixpack, backtranseq, backtranambig, pepinfo, pepstats, pepwindow, cpgplot, newcpgreport, isochore & seqret
Multiple Sequence Alignment (https://www.ebi.ac.uk/Tools/msa/) | clustal omega, clustalw2, dbclustal, kalign, mafft, mafft_addseq, muscle, mview, tcoffee & prank
Pairwise Sequence Alignment (https://www.ebi.ac.uk/Tools/psa/) | needle, stretcher, water, matcher, lalign, wise2dba, genewise & promoterwise
Phylogeny Analysis (https://www.ebi.ac.uk/Tools/phylogeny/) | clustalw2 phylogeny & raxml_epa
Protein Functional Analysis (https://www.ebi.ac.uk/Tools/pfa/) | censor, fingerprintscan, interproscan 5, pfamscan, phobius, pratt, prosite scan & radar
RNA Analysis (https://www.ebi.ac.uk/Tools/rna/) | infernal_cmscan & mapmi
Sequence Format Conversion (https://www.ebi.ac.uk/Tools/sfc/) | seqret, readseq & mview
Sequence Operation (https://www.ebi.ac.uk/Tools/so/) | censor & seqcksum
Sequence Similarity Search (https://www.ebi.ac.uk/Tools/sss/) | ncbiblast+, fasta, ggsearch, glsearch, psiblast, psisearch, ssearch & wublast
Sequence Statistics (https://www.ebi.ac.uk/Tools/seqstats/) | pepinfo, pepstats, pepwindow, saps, cpgplot, newcpgreport & isochore
Sequence Translation (https://www.ebi.ac.uk/Tools/st/) | transeq, sixpack, backtranseq & backtranambig

New analysis tools and databases

New tool developments include NCBI BLAST+ for sequence similarity searching, InterProScan 5 (6) and PfamScan (7) for protein functional analysis, Infernal_cmscan (18) and MapMi (19) for RNA analysis, RAxML_EPA (20) for phylogenetic analysis and MAFFT_addseq (10) for multiple sequence alignment. Please see the Supplementary Information for sample inputs for PfamScan, Infernal_cmscan and MapMi. New sequence databases include the ENA Coding and Non-coding sequence databases, WormBase ParaSite (21), Pfam (15) and Rfam (22), along with many new genomes and proteomes as existing databases are updated.

Tool and database retirements

Legacy applications SRS (23), InterProScan 4 (24) and NCBI BLAST (non-plus) have been retired. EMBLCDS (11), HGVbase (25), IPI (26) and proteomes databases (13) have been removed from sequence similarity searching services.

New functionalities

As a result of user feedback we have incorporated additional workflow functionalities into the framework. Interactive workflows help the investigator move between different tool categories and can be utilised through both the web browser interface and Web Services. Figure 1 illustrates an example workflow constructing a phylogenetic tree from sequence similarity search results. The top BLAST hit sequences (Figure 1a) are selected and aligned using the Clustal Omega tool (8); the alignment (Figure 1b) is then used to generate a phylogenetic analysis (Neighbor-Joining clustering (27)) and the final phylogenetic tree is displayed (Figure 1c). The user can control which sequences are selected at each stage in the process. Sequences are retrieved behind the scenes, and robust filtering and validation procedures, together with additional pre- and post-processing steps, have been implemented to ensure successful job submissions across the workflow.
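The hit selection and retrieval step of such a workflow can be sketched in Python as follows. This is an illustrative sketch and not the framework's own implementation. The `Hit` record, the E-value threshold, the function names and the dbfetch query parameters (`db`, `id`, `format`, `style`) are assumptions made for the example; only the dbfetch service itself is taken from the text.

```python
from dataclasses import dataclass
from typing import List
from urllib.parse import urlencode

# Assumption: query endpoint under the dbfetch service URL given in the text.
DBFETCH_URL = "https://www.ebi.ac.uk/Tools/dbfetch/dbfetch"


@dataclass(frozen=True)
class Hit:
    """Hypothetical record for one row of a similarity search summary table."""
    accession: str
    evalue: float


def select_top_hits(hits: List[Hit], max_hits: int = 10, max_evalue: float = 1e-5) -> List[Hit]:
    """Filter by E-value, sort best first, and drop repeated accessions."""
    kept = sorted((h for h in hits if h.evalue <= max_evalue), key=lambda h: h.evalue)
    seen = set()
    unique = []
    for hit in kept:
        if hit.accession not in seen:
            seen.add(hit.accession)
            unique.append(hit)
    return unique[:max_hits]


def retrieval_url(db: str, hits: List[Hit], fmt: str = "fasta") -> str:
    """Build a dbfetch URL that fetches the selected entries in one request."""
    # Validation step: a multiple alignment needs at least two sequences.
    if len(hits) < 2:
        raise ValueError("select at least two hits before aligning")
    query = urlencode({
        "db": db,
        "id": ",".join(h.accession for h in hits),
        "format": fmt,
        "style": "raw",
    })
    return f"{DBFETCH_URL}?{query}"
```

The sequences fetched from the resulting URL would then be passed as input to the alignment tool, and the alignment to the phylogeny tool. Validating the selection before submission mirrors the filtering described above, so that a failure surfaces at the step where the user can correct it.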
Figure 1.

An example workflow from NCBI BLAST+ to Clustal Omega and construction of a phylogenetic tree. (a) Perform a NCBI BLAST+ similarity search and select sequence hits from the summary table to align with Clustal Omega; (b) Perform a simple phylogenetic analysis on the Clustal Omega alignment; (c) Visualise the phylogenetic tree.


New result representations

The web interfaces have adopted the latest EMBL-EBI web style guidelines (https://www.ebi.ac.uk/web/guidelines) and are more user-friendly as a result of extensive usability testing. Feature annotations that highlight UniProt sequence features present within well-aligned regions can now be displayed (Figure 2); they are available in the FASTA (28), PSI-Search and LALIGN services. The result summary tables (Figure 1a) for sequence similarity searching can now be downloaded in XML, CSV, TSV and JSON formats. The NCBI BLAST+ service now offers more BLAST alignment views, including ASN archive format. Phylogenetic analysis offers output as a percentage identity matrix (PIM), and a new tree viewing component (Figure 1c) is now available that uses JavaScript technologies such as BioJS (29) and D3 (d3js.org).
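To make the percentage identity matrix concrete, the sketch below computes one from a small set of pre-aligned sequences. It assumes that identity is counted over columns where neither sequence has a gap; the services may use a different denominator, so values are not guaranteed to match their output. The function names are hypothetical.

```python
from typing import Dict, List

GAP_CHARS = frozenset("-.")


def percent_identity(a: str, b: str) -> float:
    """Percentage of identical residues over columns where neither row has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in GAP_CHARS or y in GAP_CHARS:
            continue  # skip columns with a gap in either sequence
        compared += 1
        if x == y:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


def identity_matrix(alignment: Dict[str, str]) -> List[List[float]]:
    """Square matrix of pairwise identities, rows and columns in input order."""
    names = list(alignment)
    return [
        [percent_identity(alignment[row], alignment[col]) for col in names]
        for row in names
    ]
```

For the alignment `{"s1": "ACGT-A", "s2": "ACGTTA", "s3": "AC-TTT"}`, `s1` and `s3` share three of four comparable columns, giving 75.0, while `s2` and `s3` share four of five, giving 80.0.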
Figure 2.

An example domain display from PSI-Search output, showing UniProt sequence features that are present in significantly aligned regions.

WSDLs for the SOAP Web Services API have been provided since the first availability of the framework in 2009. Users of the REST API are now supported through the provision of equivalent WADLs for all tools. The parameter settings of analysis jobs can be accessed through the REST API as well. Integrated tests have been implemented to make the Web Services more robust and stable.
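A client can use such parameter listings to check its inputs before submitting a job. The Python sketch below assumes that a tool's parameter list is returned as a small XML document with one `id` element per parameter; that layout, the sample document and the function names are assumptions for illustration and should be checked against the WADL of the tool in question.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Iterable, List


def parse_parameter_ids(xml_text: str) -> List[str]:
    """Extract parameter names from an XML listing with <id> child elements."""
    root = ET.fromstring(xml_text)
    return [el.text.strip() for el in root.iter("id") if el.text and el.text.strip()]


def unknown_parameters(params: Dict[str, str], known_ids: Iterable[str]) -> List[str]:
    """Return the submitted parameter names that the tool does not declare."""
    known = set(known_ids)
    return [name for name in params if name not in known]


# Hypothetical response body, used here in place of a live request.
SAMPLE_LISTING = """
<parameters>
  <id>program</id>
  <id>stype</id>
  <id>sequence</id>
</parameters>
"""
```

With the sample listing, `unknown_parameters({"program": "blastp", "matrx": "BLOSUM62"}, parse_parameter_ids(SAMPLE_LISTING))` reports the misspelt `matrx`, so the error is caught locally rather than after a failed submission.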

Help and documentation

EMBL-EBI offers helpdesk support and training courses for the use of the tool services provided by the framework. General help, FAQ pages, tutorials and example protocols (16) are available for using the services via web browser interfaces and sample clients for Web Services. A brief guide to Web Services technologies is also provided for those wishing to learn more and develop their own client programs (https://www.ebi.ac.uk/Tools/webservices/tutorials/00_contents).

FUTURE DEVELOPMENTS

As well as continuing to maintain existing services, future planned developments include the integration of new tools such as HMMER 3 (30) and R-COFFEE (31), and of new data resources such as the ENA Barcode and Geospatial databases (11). Further cross-resource integration will be available, such as additional annotations to sequence similarity results using the EBI Search Web Services (17) and visualisations using novel client-side technologies that render complex data faster and more efficiently than traditional server-side methods. Ensembl data (32) will be available via the NCBI BLAST+ service. Some applications have been flagged for retirement from EMBL-EBI in 2015. These include ClustalW2 (27), DaliLite (33), DbClustal (34), MaxSprout (35), ReadSeq (36) and WU-BLAST (37). Further details will be announced on the web site. Additional support for users in the future will include webinars and the production of video-based tutorials and other integrated online learning capabilities.

DISCUSSION

Having a tools framework for EMBL-EBI applications gives users access to a range of services through uniform interfaces and helps maintain a robust, relevant service by enabling individual applications to be added, updated or retired as required. Improvements to the web browser interface help usability and allow more complex analyses to be carried out through the provision of workflow mechanisms between tools. Integration of other resources such as EBI Search and dbfetch expands the resources the framework can draw on and facilitates user acquisition of biological data. New and updated tools and databases ensure that scientists have access to the most recent analyses and data available, while retirement of deprecated services helps to ensure that the application set is well maintained and resources are dedicated to the most relevant services. Since becoming available in 2009, the framework has been used by academic and industry users for almost 260 million analysis jobs, and usage has been increasing significantly, with roughly 110 million analyses in 2014 alone. Web Services in particular lend themselves to integration in third party pipelines, and the applications have been of use to commercial and academic organisations as well as to other EMBL-EBI teams such as Ensembl Genomes, Pfam and UniProt. Where such integrations are present it is especially important not to break dependencies, so a careful process of communication and change management is in place, including updates through mailing lists, news feeds, web site announcements and Twitter.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.
  37 in total

1.  Improved tools for biological sequence comparison.

Authors:  W R Pearson; D J Lipman
Journal:  Proc Natl Acad Sci U S A       Date:  1988-04       Impact factor: 11.205

2.  MAFFT multiple sequence alignment software version 7: improvements in performance and usability.

Authors:  Kazutaka Katoh; Daron M Standley
Journal:  Mol Biol Evol       Date:  2013-01-16       Impact factor: 16.240

3.  A new bioinformatics analysis tools framework at EMBL-EBI.

Authors:  Mickael Goujon; Hamish McWilliam; Weizhong Li; Franck Valentin; Silvano Squizzato; Juri Paern; Rodrigo Lopez
Journal:  Nucleic Acids Res       Date:  2010-05-03       Impact factor: 16.971

4.  MapMi: automated mapping of microRNA loci.

Authors:  José Afonso Guerra-Assunção; Anton J Enright
Journal:  BMC Bioinformatics       Date:  2010-03-16       Impact factor: 3.169

5.  InterProScan: protein domains identifier.

Authors:  E Quevillon; V Silventoinen; S Pillai; N Harte; N Mulder; R Apweiler; R Lopez
Journal:  Nucleic Acids Res       Date:  2005-07-01       Impact factor: 16.971

6.  Challenges in homology search: HMMER3 and convergent evolution of coiled-coil regions.

Authors:  Jaina Mistry; Robert D Finn; Sean R Eddy; Alex Bateman; Marco Punta
Journal:  Nucleic Acids Res       Date:  2013-04-17       Impact factor: 16.971

7.  Analysis Tool Web Services from the EMBL-EBI.

Authors:  Hamish McWilliam; Weizhong Li; Mahmut Uludag; Silvano Squizzato; Young Mi Park; Nicola Buso; Andrew Peter Cowley; Rodrigo Lopez
Journal:  Nucleic Acids Res       Date:  2013-05-13       Impact factor: 16.971

8.  Predicting active site residue annotations in the Pfam database.

Authors:  Jaina Mistry; Alex Bateman; Robert D Finn
Journal:  BMC Bioinformatics       Date:  2007-08-09       Impact factor: 3.169

9.  R-Coffee: a method for multiple alignment of non-coding RNA.

Authors:  Andreas Wilm; Desmond G Higgins; Cédric Notredame
Journal:  Nucleic Acids Res       Date:  2008-04-17       Impact factor: 16.971

10.  The InterPro protein families database: the classification resource after 15 years.

Authors:  Alex Mitchell; Hsin-Yu Chang; Louise Daugherty; Matthew Fraser; Sarah Hunter; Rodrigo Lopez; Craig McAnulla; Conor McMenamin; Gift Nuka; Sebastien Pesseat; Amaia Sangrador-Vegas; Maxim Scheremetjew; Claudia Rato; Siew-Yit Yong; Alex Bateman; Marco Punta; Teresa K Attwood; Christian J A Sigrist; Nicole Redaschi; Catherine Rivoire; Ioannis Xenarios; Daniel Kahn; Dominique Guyot; Peer Bork; Ivica Letunic; Julian Gough; Matt Oates; Daniel Haft; Hongzhan Huang; Darren A Natale; Cathy H Wu; Christine Orengo; Ian Sillitoe; Huaiyu Mi; Paul D Thomas; Robert D Finn
Journal:  Nucleic Acids Res       Date:  2014-11-26       Impact factor: 16.971

