SOAP-based services provided by the European Bioinformatics Institute.

S Pillai, V Silventoinen, K Kallio, M Senger, S Sobhany, J Tate, S Velankar, A Golovin, K Henrick, P Rice, P Stoehr, R Lopez.

Abstract

SOAP (Simple Object Access Protocol) (http://www.w3.org/TR/soap) based Web Services technology (http://www.w3.org/ws) has gained much attention as an open standard enabling interoperability among applications across heterogeneous architectures and different networks. The European Bioinformatics Institute (EBI) is using this technology to provide robust data retrieval and data analysis mechanisms to the scientific community and to enhance utilization of the biological resources it already provides [N. Harte, V. Silventoinen, E. Quevillon, S. Robinson, K. Kallio, X. Fustero, P. Patel, P. Jokinen and R. Lopez (2004) Nucleic Acids Res., 32, 3-9]. These services are available free to all users from http://www.ebi.ac.uk/Tools/webservices.

Year: 2005; PMID: 15980463; PMCID: PMC1160251; DOI: 10.1093/nar/gki491

Source DB: PubMed; Journal: Nucleic Acids Res; ISSN: 0305-1048; Impact factor: 16.971


INTRODUCTION

Today, biological databases are large collections of data that are relatively difficult to maintain outside the centres and institutions that produce them. These data are traditionally accessed using browser-based World Wide Web interfaces. When large amounts of data need to be retrieved and analysed, this often proves to be tedious and impractical. Web Services technology enables scientists to access these data and analysis applications as if they were installed on their laboratory computers. Similarly, it enables programmers to build complex applications without the need to install and maintain the databases and analysis tools (1) and without having to take on the financial overheads that accompany these. Moreover, Web Services provide easier integration and interoperability between bioinformatics applications and the data they require.

THE TECHNOLOGIES

The European Bioinformatics Institute (EBI) has tried and tested standards such as CORBA and Web Services. CORBA is standardized and mature; it uses the Internet Inter-ORB Protocol (IIOP), which can be tunnelled through HTTP but does not natively support it, so communicating through firewalls is trickier. Web Services use SOAP (the Simple Object Access Protocol), typically over HTTP, and interact with other systems using messages based on eXtensible Markup Language (XML). A SOAP message can be transferred using almost any application or transport protocol. A Web Service uses the Web Services Description Language (WSDL) to describe its interface. A SOAP client can read the WSDL at runtime and dynamically select the proper data-encoding scheme and network transfer protocol. SOAP implementations are available for many programming languages, including Perl and Java, which are popular among bioinformaticians. On the basis of these observations, the EBI has chosen Web Services technology to expose its services in a programmatically accessible manner. All that is required by the bioinformatics programmer is a lightweight program that communicates with existing services running at the EBI. These services have several advantages. Because traditional web browsers are not designed to be driven programmatically, these services provide an easy and flexible way to deal with repetitive tasks such as bulk submission with minimal intervention from the user. Web Services clients allow the programmer as well as the service provider to integrate and build more complex analysis workflows using existing EBI services. Using these services also avoids the need to maintain many programs and databases locally.
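As a rough illustration of the XML-over-HTTP messaging described above, the following Python sketch builds a SOAP 1.1 request envelope and wraps it in an HTTP POST without sending it. The envelope namespace is the standard SOAP 1.1 one; the service namespace, the endpoint URL and the helper names build_envelope and build_http_request are placeholders invented for this example and are not taken from any EBI WSDL.

```python
import urllib.request
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # standard SOAP 1.1 namespace
SERVICE_NS = "urn:example:service"  # hypothetical; a real value comes from the WSDL
ENDPOINT = "http://soap.example.org/service"  # hypothetical endpoint, not an EBI URL

# Use a readable prefix instead of the auto-generated "ns0".
ET.register_namespace("soapenv", SOAP_ENV)


def build_envelope(method, params):
    """Serialize an RPC-style call as a SOAP 1.1 envelope (bytes).

    params is an ordered list of (name, value) pairs.
    """
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{method}")
    for name, value in params:
        ET.SubElement(call, name).text = str(value)
    return ET.tostring(envelope, encoding="utf-8")


def build_http_request(method, params):
    """Wrap the envelope in an HTTP POST; nothing is sent until urlopen() is called."""
    return urllib.request.Request(
        ENDPOINT,
        data=build_envelope(method, params),
        headers={"Content-Type": "text/xml; charset=utf-8", "SOAPAction": '""'},
        method="POST",
    )


if __name__ == "__main__":
    # Print the envelope for a call that takes no arguments.
    print(build_envelope("getSupportedDBs", []).decode("utf-8"))
```

In practice a SOAP toolkit would normally generate such messages from the WSDL; building the envelope by hand here only serves to show what travels over the wire.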

SERVICES

In this article, we describe services currently available at the EBI via a SOAP server. These include tools for sequence and literature data retrieval, sequence similarity search services, protein function analysis and structural analysis tools that access the Macromolecular Structure Database (MSD) (2) and a set of Web Services called Soaplab (3) for the European Molecular Biology Open Software Suite (EMBOSS) (4). Table 1 lists services available, links to the web pages and WSDL.
Table 1. Web Services available at the EBI

Service | URL | WSDL
WSDbfetch
WSFasta
WSWUBlast
WSInterProScan
MSD Services
Soaplab

Sequence and Literature data retrieval: WSDbfetch

WSDbfetch provides programmatic access to the popular sequence and literature data retrieval tool dbfetch. The databases currently available for data retrieval using this service include EMBL (5), EMBL-SVA (6), MEDLINE, UniProt (7), InterPro (8), PDB (9), RefSeq (10) and HGVBase (11). The data backends currently used are SRS (12), the Sequence Version Archive (SVA) and the UniProt consortium server at the EBI, but the service can be easily modified to use other data retrieval systems. It is implemented on Apache Axis. Users can call the WSDbfetch service from an application written in any programming language that supports SOAP. Data can be retrieved from a database using either a primary or secondary identifier. Each database supports various formats and styles, of which one is set as the default. The results can be obtained as plain ASCII text, HTML with hyperlinks or XML, where available. The methods available for this service and their descriptions are shown in Table 2. Fully functional Java and Perl client programs are also available. A sample client in Perl is shown in Figure 1, and Figure 2 illustrates how to run the client.
Table 2. Methods available in the WSDbfetch service

Method | Description
getSupportedDBs | Lists the databases available for data retrieval
getSupportedFormats | Lists the formats in which data can be obtained for each of the available databases, along with the default format
getSupportedStyles | Lists the styles in which the result can be obtained
fetchData | Takes the database name followed by a primary or secondary identifier, format and style as parameters and returns the result as a string
Figure 1. A sample Perl client calling the fetchData method.

Figure 2. A sample client invocation showing the method called and result obtained.
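The call flow of a fetchData client can be sketched in Python as follows. This is not the Perl client of Figure 1 but a minimal stand-in under stated assumptions: the service namespace, the parameter element names (db, id, format, style), the "default" argument values and the canned reply are all hypothetical, and the transport is injected so that the example runs without a network connection.

```python
import urllib.request
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # standard SOAP 1.1 namespace
SERVICE_NS = "urn:example:dbfetch"  # hypothetical; the real one is defined in the WSDL


def build_fetch_request(db, identifier, fmt, style):
    """Build a fetchData request; the parameter element names are assumptions."""
    fields = [("db", db), ("id", identifier), ("format", fmt), ("style", style)]
    args = "".join(f"<{name}>{escape(value)}</{name}>" for name, value in fields)
    return (
        f'<soapenv:Envelope xmlns:soapenv="{SOAP_ENV}" xmlns:svc="{SERVICE_NS}">'
        f"<soapenv:Body><svc:fetchData>{args}</svc:fetchData></soapenv:Body>"
        "</soapenv:Envelope>"
    ).encode("utf-8")


def parse_response(xml_bytes):
    """Return the string result of a reply, or raise on a SOAP fault."""
    root = ET.fromstring(xml_bytes)
    body = root.find(f"{{{SOAP_ENV}}}Body")
    if body is None or len(body) == 0:
        raise ValueError("SOAP response has no Body content")
    reply = body[0]
    if reply.tag == f"{{{SOAP_ENV}}}Fault":
        raise RuntimeError(reply.findtext("faultstring", default="SOAP fault"))
    result = reply[0] if len(reply) else reply  # first return value, if wrapped
    return result.text or ""


def fetch_data(transport, db, identifier, fmt="default", style="default"):
    """Send one fetchData call through transport (bytes in, bytes out)."""
    return parse_response(transport(build_fetch_request(db, identifier, fmt, style)))


def http_transport(endpoint):
    """Return a transport that POSTs to endpoint (the URL is supplied by the caller)."""
    def send(request_bytes):
        req = urllib.request.Request(
            endpoint,
            data=request_bytes,
            headers={"Content-Type": "text/xml; charset=utf-8", "SOAPAction": '""'},
        )
        with urllib.request.urlopen(req) as resp:
            return resp.read()
    return send


def canned_transport(request_bytes):
    """Offline stand-in for a server; the reply content is a placeholder."""
    return (
        f'<soapenv:Envelope xmlns:soapenv="{SOAP_ENV}"><soapenv:Body>'
        "<fetchDataResponse><fetchDataReturn>&gt;PLACEHOLDER entry</fetchDataReturn>"
        "</fetchDataResponse></soapenv:Body></soapenv:Envelope>"
    ).encode("utf-8")


if __name__ == "__main__":
    print(fetch_data(canned_transport, "uniprot", "EXAMPLE_ID", "fasta", "raw"))
```

Swapping canned_transport for http_transport(url) would send the same request to a live endpoint; the parsing step stays unchanged because the result is returned as a single string.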

Sequence searching and protein function analysis: WSFasta, WSWUBlast, WSInterProScan

The EBI provides Web Services for sequence similarity tools such as Fasta (13) and WUBlast (14), and for the protein function analysis tool InterProScan (15). These Web Services provide the same functionality as the corresponding traditional browser-based services. They are implemented on a Perl-based SOAP::Lite server. The methods provided for using these services are listed in Table 3. Fully functional Java and Perl client programs are also available. A sample client in Perl is shown in Figure 3, and Figure 4 illustrates how to run the client. Depending on the input and the databases chosen for the search, jobs may take anything from seconds to a few hours to complete. Two modes of job submission exist: synchronous and asynchronous.
Table 3. Methods available in the WSFasta, WSWUBlast and WSInterProScan services

Method | Description
doFasta/doWUBlast/doIprscan | Input parameters are a set of key-value pairs that correspond to choosing program names, databases, gap values, matrices, job mode etc., and the input sequence in Fasta format. Depending on the job mode chosen, the method returns either a job identifier or the result of the job when completed
polljob | This method is used when a job was submitted in asynchronous mode. It takes the job identifier and an optional format name as arguments, and returns either the result or the status of the job
Figure 3. A sample Perl client for WSFasta calling the doFasta method asynchronously.

Figure 4. A sample client invocation. Note that ‘\’ marks the continuation of a single one-line command. The input file (e.g. mysequence) contains a Fasta-formatted sequence.

Synchronous mode

This mode is equivalent to a user running a command on a console or terminal and waiting for it to complete. This requires the client to be constantly connected to the server. This mode is suitable for database searches that can be executed in up to 5 min (e.g. protein versus protein searches).

Asynchronous mode

In this mode, the user submits a job and receives a job identifier in return. This is analogous to running a UNIX command in the background and obtaining a job id, after which the ‘jobs’ command can be used to list the processes running in the background. Similarly, the user can query or poll the status of an asynchronous job and receive one of the following four states in response: JOB RUNNING (the job is currently being processed), JOB PENDING (the job is in a queue awaiting processing), JOB NOT FOUND (the job id is no longer available; job results are deleted after 24 h) and JOB FAILED (the job failed or no results were found). The asynchronous submission mode is recommended when users are submitting batch jobs (e.g. many protein sequences to analyse using InterProScan) or large database searches (e.g. searching the whole of the EMBL nucleotide sequence database). One advantage of this mode is that it is largely unaffected by client system or network failures, because the client does not need to remain connected while the job runs. The results of jobs are stored at the EBI for 24 h after the job has completed.
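The polling pattern described above can be sketched in Python as follows. The four status strings are those listed in the text; everything else is an assumption made for illustration: that the poll call returns a status as one of these exact strings and anything else as the finished result, and the names wait_for_result, interval and max_polls. The poll callable stands in for a polljob request and is injected so that the sketch runs offline.

```python
import time

RUNNING = "JOB RUNNING"
PENDING = "JOB PENDING"
NOT_FOUND = "JOB NOT FOUND"
FAILED = "JOB FAILED"


def wait_for_result(poll, job_id, interval=5.0, max_polls=100, sleep=time.sleep):
    """Poll until the job finishes and return its result.

    poll(job_id) is assumed to return one of the four status strings or,
    once the job has completed, the result itself.
    """
    for _ in range(max_polls):
        reply = poll(job_id)
        if reply in (RUNNING, PENDING):
            sleep(interval)  # still queued or running: wait, then ask again
            continue
        if reply in (NOT_FOUND, FAILED):
            raise RuntimeError(f"{job_id}: {reply}")
        return reply  # anything else is treated as the job result
    raise TimeoutError(f"{job_id}: no result after {max_polls} polls")


if __name__ == "__main__":
    # Simulated server replies for one job (placeholder result text).
    replies = iter([PENDING, RUNNING, RUNNING, "placeholder result"])
    print(wait_for_result(lambda job_id: next(replies), "job-1", interval=0.1))
```

Because only the job identifier needs to be kept between polls, a client built this way can disconnect, or even restart, and still collect the result within the 24 h retention period.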

Structural Analysis

The EBI provides a Web Services interface to tools that access the MSD. This service enables software developers to query the MSD directly from their own application programs. The available functions are described in the corresponding WSDL description. As well as simple extraction of data from the database, the interface provides methods for performing complex queries on the MSD relational database remotely. For protein structure analysis, MSDfold, a protein secondary structure-matching tool, is available as a Web Service. An example client is also provided.

Soaplab

Soaplab is a tool that can automatically generate and deploy Web Services on top of existing command-line analysis programs. It is especially well suited to EMBOSS-type applications. It allows the integration of many applications within a single programming interface. It can also interoperate with the other Web Services described earlier (e.g. WSInterProScan), and it can create Web Services on top of existing web resources (e.g. extracting data from a third-party web page and providing them as a Web Service). Soaplab in its basic form is a tool for non-programmers, who need only create metadata describing resources (command-line applications, web pages) and let Soaplab generate the rest. The resulting Web Services are uniform and provide a good platform for integration into a workflow system such as Taverna. The initial metadata are available from the Soaplab Web Services interface, which makes the services self-describing. Soaplab is also a reference implementation of the Object Management Group (OMG) standard for the Life Sciences Analysis Engine (LSAE).
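The idea of driving a command-line program from metadata alone can be illustrated with a small Python sketch. It is only an analogy: the ToolMetadata structure, the build_command and run helpers and the example tool are invented here and do not reproduce Soaplab's actual metadata format or interface. The example tool simply echoes its arguments so that the sketch runs anywhere Python is installed.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class ToolMetadata:
    """Hypothetical description of a command-line tool."""
    name: str
    argv: tuple    # base command line
    options: dict  # maps an input name to its command-line flag


def build_command(meta, inputs):
    """Translate named inputs into a command line, using only the metadata."""
    unknown = set(inputs) - set(meta.options)
    if unknown:
        raise KeyError(f"{meta.name}: unknown inputs {sorted(unknown)}")
    cmd = list(meta.argv)
    for name, flag in meta.options.items():  # metadata order keeps commands stable
        if name in inputs:
            cmd += [flag, str(inputs[name])]
    return cmd


def run(meta, inputs):
    """Uniform entry point: the same call shape works for every described tool."""
    done = subprocess.run(
        build_command(meta, inputs), capture_output=True, text=True, check=True
    )
    return done.stdout


# Illustrative tool: a Python one-liner that prints the arguments it receives.
ECHO_TOOL = ToolMetadata(
    name="echo_args",
    argv=(sys.executable, "-c", "import sys; print(' '.join(sys.argv[1:]))"),
    options={"sequence": "-sequence", "outfile": "-outfile"},
)

if __name__ == "__main__":
    print(run(ECHO_TOOL, {"sequence": "in.fa", "outfile": "out.txt"}))
```

Adding another tool in this sketch means writing one more metadata record rather than more code, which is the property that makes uniformly generated services convenient building blocks for workflows.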

CONCLUSION

We present here a set of applications that give the user more direct access to data and services from the EBI. From the user's perspective, these are equivalent to installing and maintaining software and databases on local computers. From the programmer's point of view, Web Services provide a robust and flexible environment in which to build applications and provide complex and novel services.
REFERENCES

1.  The Protein Data Bank.

Authors:  H M Berman; J Westbrook; Z Feng; G Gilliland; T N Bhat; H Weissig; I N Shindyalov; P E Bourne
Journal:  Nucleic Acids Res       Date:  2000-01-01       Impact factor: 16.971

2.  EMBOSS: the European Molecular Biology Open Software Suite.

Authors:  P Rice; I Longden; A Bleasby
Journal:  Trends Genet       Date:  2000-06       Impact factor: 11.639

3.  Introducing RefSeq and LocusLink: curated human genome resources at the NCBI.

Authors:  K D Pruitt; K S Katz; H Sicotte; D R Maglott
Journal:  Trends Genet       Date:  2000-01       Impact factor: 11.639

4.  The EMBL sequence version archive.

Authors:  Rasko Leinonen; Francesco Nardone; Olalekan Oyewole; Nicole Redaschi; Peter Stoehr
Journal:  Bioinformatics       Date:  2003-09-22       Impact factor: 6.937

5.  UniProt: the Universal Protein knowledgebase.

Authors:  Rolf Apweiler; Amos Bairoch; Cathy H Wu; Winona C Barker; Brigitte Boeckmann; Serenella Ferro; Elisabeth Gasteiger; Hongzhan Huang; Rodrigo Lopez; Michele Magrane; Maria J Martin; Darren A Natale; Claire O'Donovan; Nicole Redaschi; Lai-Su L Yeh
Journal:  Nucleic Acids Res       Date:  2004-01-01       Impact factor: 16.971

6.  Gapped BLAST and PSI-BLAST: a new generation of protein database search programs.

Authors:  S F Altschul; T L Madden; A A Schäffer; J Zhang; Z Zhang; W Miller; D J Lipman
Journal:  Nucleic Acids Res       Date:  1997-09-01       Impact factor: 16.971

7.  Improved tools for biological sequence comparison.

Authors:  W R Pearson; D J Lipman
Journal:  Proc Natl Acad Sci U S A       Date:  1988-04       Impact factor: 11.205

8.  SRS--an indexing and retrieval tool for flat file data libraries.

Authors:  T Etzold; P Argos
Journal:  Comput Appl Biosci       Date:  1993-02

9.  HGVbase: a curated resource describing human DNA variation and phenotype relationships.

Authors:  D Fredman; G Munns; D Rios; F Sjöholm; M Siegfried; B Lenhard; H Lehväslaiho; A J Brookes
Journal:  Nucleic Acids Res       Date:  2004-01-01       Impact factor: 16.971

10.  E-MSD: an integrated data resource for bioinformatics.

Authors:  S Velankar; P McNeil; V Mittard-Runte; A Suarez; D Barrell; R Apweiler; K Henrick
Journal:  Nucleic Acids Res       Date:  2005-01-01       Impact factor: 16.971
