iHOP web services.

José M Fernández, Robert Hoffmann, Alfonso Valencia.

Abstract

iHOP provides fast, accurate, comprehensive, and up-to-date summary information on more than 80,000 biological molecules by automatically extracting key sentences from millions of PubMed documents. Its intuitive user interface and navigation scheme have made iHOP extremely successful among biologists, counting more than 500,000 visits per month (iHOP access statistics: http://www.ihop-net.org/UniPub/iHOP/info/logs/). Here we describe a public programmatic API that enables the integration of main iHOP functionalities in bioinformatic programs and workflows.

Year: 2007 PMID: 17485473 PMCID: PMC1933131 DOI: 10.1093/nar/gkm298

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971

INTRODUCTION

iHOP (1) (iHOP literature server, http://www.ihop-net.org) allows researchers to explore a network of gene and protein interactions by directly navigating the pool of published scientific literature. Rather than returning long lists of entire abstracts for keyword searches, iHOP selectively retrieves information that is specific to genes and proteins and summarizes their interactions and functions. The system adds value by filtering and ranking extracted sentences according to significance, impact factor, date of publication and syntax. iHOP web content is pre-compiled and generated in a multi-step process that annotates biomedical texts with gene and protein names, chemical compounds and MeSH terms. This annotation task is computationally expensive because of the sheer number of entities and, more importantly, is hindered by the heavy semantic overloading of abbreviations and synonyms in biomedicine. The continuous development and optimization of heuristics and machine learning algorithms to improve entity detection and synonym disambiguation is therefore a central effort in the maintenance of iHOP. Given the complexity and effort that go into the development and maintenance of a text-mining pipeline, it makes sense to build upon the existing infrastructure of iHOP rather than reinventing the wheel. Numerous online resources already link to iHOP, and novel tools based on the iHOP resource are emerging, e.g. iHOPerator (2). The iHOP web service API has been tested in selected projects over the last 2 years and is now made publicly available.

Although the APIs of a biocomputing facility are not very visible to the end user, they are very important for the different omics disciplines, which usually depend on large-scale data set analysis. Such analyses run as distributed workflows, which have to semantically integrate the results from diverse biocomputing facilities and data sources. Other large-scale biocomputing facilities provide comparable programmatic environments, such as NCBI Entrez (3) (Entrez CGI services, http://www.ncbi.nlm.nih.gov/entrez/query/static/eutils_help.html; Entrez SOAP services, http://www.ncbi.nlm.nih.gov/entrez/query/static/esoap_help.html) or EBI WS (4) (EBI SOAP services, http://www.ebi.ac.uk/Tools/webservices/).

METHODS

To make the iHOP programmatic interface remotely accessible and easy to integrate into workflows in a way that is independent of programming language and vendor, we decided to implement the public API in the form of web services (5). Three popular web service API models have been implemented for iHOP: the REST model (6) (Wikipedia description of REST, http://en.wikipedia.org/wiki/Representational_State_Transfer), which the DAS (7) protocol follows; SOAP + WSDL, which relies on WSDL documents for service description and on SOAP and XML for messaging; and BioMOBY (8), which is focused on building bioinformatics workflows (9). Table 1 contains a brief description of these API models. All three API implementations are based on a common internal library and common XML schemas to facilitate maintenance and future development. The MOBY implementation required additional effort to integrate the iHOP web services into the MOBY ontology.

Table 1.

Brief description of the web service API models used by iHOP web services

Web services API models	Description
REST	The Representational State Transfer paradigm is based on the HTTP protocol. It is an improvement over the CGI sub-protocol, which web browsers use to send the data in HTML forms. The main difference lies in the results provided: a REST service is usually restricted to answering with an XML document containing all the information, whereas a CGI service can return any document type (HTML, GIF, PDF, etc.). CGI and REST also differ in how queries are sent: a CGI service receives the query data as key-value pairs, whereas REST implementations send an XML document with the whole information. Some implementations of the REST paradigm (e.g. DAS and iHOP CGI-XML) follow the CGI sub-protocol for the queries and answer with an XML document.
SOAP+WSDL	The Simple Object Access Protocol (SOAP) web services paradigm tries to isolate the communications protocol from the message representation by using XML-formatted messages and different message exchange patterns. SOAP messages have an envelope and a body, so complex features and message exchange patterns can be implemented regardless of the communications protocol used. Although there are SOAP client and server libraries for FTP, SMTP, POP3 and other protocols, HTTP is the most widely used communication protocol. Web Services Description Language (WSDL) is a companion technology of SOAP services. WSDL documents are usually used to describe SOAP services: their inputs and outputs, their XML data types, the data encoding and the message patterns to use. Although any SOAP service can be used without a WSDL document describing it, WSDL documents are a standard way to distribute that information.
BioMOBY	The BioMOBY web services architecture has three main roles: MOBY Central, MOBY clients and MOBY services. MOBY Central is the repository of the ontologies used by the architecture: the namespace ontology, which is used to label biological references so they can be unambiguously identified; the object ontology, which defines the object types that can be consumed and produced by the services; the service type ontology, which contains the classification of service types used to label the different services; and the service ontology, which contains the description of all the registered MOBY services. One feature that distinguishes the BioMOBY architecture from SOAP is that many queries can be bundled into a single message when they are sent to a service. Although the BioMOBY architecture has its own message formats and query protocols, SOAP infrastructure is used to wrap MOBY messages. The BioMOBY community is now discussing the need to drop SOAP (due to its overhead) in favour of the lighter REST paradigm.
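
As a minimal sketch of the key-value query style that Table 1 attributes to CGI-style REST implementations, the following Python snippet encodes a query as a URL. The base URL and the parameter names (`symbol`, `taxid`) are hypothetical placeholders chosen for illustration; they are not the documented iHOP interface, whose actual URLs are published on the iHOP web services site.

```python
from urllib.parse import urlencode

# Hypothetical endpoint, for illustration only (not a real iHOP URL).
BASE_URL = "http://example.org/cgi-xml/getSymbolInfo"


def build_query_url(base_url, **params):
    """Encode the query as key-value pairs, skipping parameters left as None."""
    supplied = {key: value for key, value in params.items() if value is not None}
    return base_url + "?" + urlencode(supplied)


# Free text plus an optional NCBI TaxID (parameter names are assumptions).
url = build_query_url(BASE_URL, symbol="breast cancer", taxid=9606)
```

The response to such a request would then be an XML document, in line with the CGI-XML behaviour described in the table.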

The schema design was driven by the iHOP functionalities that are directly useful for bioinformatics workflows (Figures 1 and 2). Table 2 contains a brief description of these functionalities, with their inputs and outputs.

Figure 1.

Schema of operations of iHOP web services. Each box is a web service, and double boxes are the recommended starting points for workflows. Links show some suggested flows between services that are useful for workflow building. Green links represent bi-directional flow, and black arrowed blue lines mean uni-directional flow. Grey thick lines illustrate bidirectional external access and orange dashed lines are accesses to the iHOP core (http://www.ihop-net.org/).

Figure 2.

This Taverna workflow diagram takes free text as input (e.g. ‘breast cancer’). The workflow fetches the gene and protein symbols related to the input text and returns those symbols, their synonyms and all the abstracts containing sentences in which the symbols show evidence of interaction with other protein or gene symbols.
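
The chaining described in this caption can be sketched in Python under stated assumptions. The caption does not say which services the Taverna workflow actually calls, so the mapping onto getRelatedSymbols, getSymbolInfo and getSymbolInteractions is an assumption, and the three services are passed in as plain callables. The stub data (`SYM_1`, `ABS_1`, etc.) are invented placeholders, not iHOP content.

```python
def run_interaction_workflow(free_text, get_related_symbols,
                             get_symbol_info, get_symbol_interactions):
    """Free text -> related symbols -> synonyms and interaction abstracts per symbol."""
    report = {}
    for symbol_id in get_related_symbols(free_text):
        info = get_symbol_info(symbol_id)
        hits = get_symbol_interactions(symbol_id)
        report[symbol_id] = {
            "synonyms": info.get("synonyms", []),
            # Several sentences may come from one abstract, so deduplicate the IDs.
            "abstract_ids": sorted({hit["abstract_id"] for hit in hits}),
        }
    return report


# Placeholder stubs standing in for remote service calls (all values invented).
_FAKE_SYMBOLS = {"example query": ["SYM_1", "SYM_2"]}
_FAKE_INFO = {"SYM_1": {"synonyms": ["alias 1a", "alias 1b"]}, "SYM_2": {"synonyms": []}}
_FAKE_HITS = {
    "SYM_1": [{"abstract_id": "ABS_2"}, {"abstract_id": "ABS_1"}, {"abstract_id": "ABS_2"}],
    "SYM_2": [],
}

report = run_interaction_workflow(
    "example query",
    lambda text: _FAKE_SYMBOLS.get(text, []),
    lambda symbol_id: _FAKE_INFO[symbol_id],
    lambda symbol_id: _FAKE_HITS[symbol_id],
)
```

Passing the services as callables keeps the sketch independent of whether the CGI-XML, SOAP or BioMOBY variant is used underneath.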

Table 2.

Functionality	Inputs	Results
getRelatedSymbols	Free text (e.g. P53, breast cancer or BRCA2), and an optional NCBI TaxID.	A list of the possible iHOP identified gene or protein symbols, related to the input.
getSymbolsFromReference	A biological database reference (e.g. UniProt accession, NCBI GENE).	The iHOP identified gene or protein symbol related to the input.
guessSymbolIdFromSymbolText	Free text, and an optional NCBI TaxID.	The iHOP identified gene or protein symbol related to the input, chosen by naïve heuristics.
guessSymbolIdFromReference	A biological database reference	The iHOP identified gene or protein symbol related to the input.
getSymbolInfo	Free text and an optional NCBI TaxID, or an iHOP gene or protein symbol ID, or a biological database reference.	The information available at the iHOP server about the iHOP identified gene or protein related to the input (name, organism, database references, synonyms, etc.). When free text is used, naïve heuristics are used to choose the iHOP identified gene.
getSymbolDefinitions	Free text and an optional NCBI TaxID, or an iHOP gene or protein symbol ID, or a biological database reference.	The abstract sentences available at the iHOP server which uniquely define the iHOP identified gene or protein related to the input, along with their score, iHOP abstract ID, journal impact, etc. When free text is used, naïve heuristics are used to choose the iHOP symbol.
getSymbolInteractions	Free text and an optional NCBI TaxID, or an iHOP gene or protein symbol ID, or a biological database reference.	The abstract sentences available at the iHOP server which show evidence of interactions between the iHOP identified gene or protein symbol related to the input and other iHOP symbols, along with their score, iHOP abstract ID, journal impact, etc. When free text is used, naïve heuristics are used to choose the iHOP symbol.
getPubMed	A PubMed PMID, or an iHOP abstract ID.	The iHOP analysed and enriched PubMed abstract associated with the input. All the abstract sentences are annotated, focusing on remarkable sentence elements (verbs, nouns, adjectives, gene or protein symbols, etc.).

Logical iHOP web services functionalities. These functionalities and the web services that implement them are focused on automation, so almost all functionalities accept more than one input type. Depending on the API model, some of these functionalities have therefore been implemented more than once, one for each possible input type.

A key issue in the development was the design of an XML schema rich enough to describe and integrate the valuable information that is already accessible through the iHOP user interface. For instance, annotated sentences are generated by the getSymbolDefinitions, getSymbolInteractions and getPubMed functionalities. Each sentence also provides information about the abstract, the journal and the journal impact factor. Basic symbol information is provided by getSymbolInfo, and it can also be found in getSymbolDefinitions and getSymbolInteractions results. The XML Schema, along with its documentation, is available at the iHOP web services site.
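
To illustrate how a client might consume sentence-level results of this kind, here is a minimal Python sketch. The XML element and attribute names below are invented for illustration and do not reproduce the published iHOP XML Schema; the sample sentences, journals and impact factors are placeholders as well.

```python
import xml.etree.ElementTree as ET

# Hypothetical result document: names and values are assumptions, not the iHOP schema.
SAMPLE = """<result>
  <sentence abstractId="ABS_1" journal="Example Journal" impactFactor="1.5">GENE_A binds GENE_B.</sentence>
  <sentence abstractId="ABS_2" journal="Another Journal" impactFactor="3.0">GENE_A regulates GENE_C.</sentence>
</result>"""


def parse_sentences(xml_text):
    """Return one dict per sentence with its text and abstract/journal metadata."""
    root = ET.fromstring(xml_text)
    return [
        {
            "text": (node.text or "").strip(),
            "abstract_id": node.get("abstractId"),
            "journal": node.get("journal"),
            "impact_factor": float(node.get("impactFactor")),
        }
        for node in root.iter("sentence")
    ]


sentences = parse_sentences(SAMPLE)
# Example of client-side use: order sentences by journal impact factor.
ranked = sorted(sentences, key=lambda s: s["impact_factor"], reverse=True)
```

A real client would use the element names defined in the schema documentation on the iHOP web services site instead of the placeholders above.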
Gene symbol disambiguation is usually a hard task that is ultimately left to the user, and automating it is essential for a useful workflow. Using heuristics specific to these web services, we have created an additional functionality called guessSymbolIdFromSymbolText, which guesses the nearest unambiguous iHOP gene symbol ID from free text input and an optional target organism. The concept is very similar to Google's ‘I'm Feeling Lucky’ function, and it speeds up workflow building. Workflow writers are not tied to this service and its heuristics, because anyone can implement their own symbol selection heuristics on top of getRelatedSymbols output.

Under the REST (Representational State Transfer) paradigm there is a CGI-XML service available for all the functionalities described earlier. Special return cases have been modelled using standard HTTP codes: when there is no answer for a query in a CGI-XML service, a 404 Not Found error is returned; if an internal error happens, a 500 Internal Server Error is returned; if no input parameter is specified, a 400 Bad Request error is returned.

For SOAP (Simple Object Access Protocol), we created several variations of the same web service for each functionality, to simplify workflow building. SOAP services use the RPC/encoded WSDL style, so they can be used from Perl programs with any SOAP::Lite version. Critical errors (no input parameter, internal server error) are reported by the iHOP SOAP services using the standard SOAP fault mechanism. When there is no answer to return, the services return a specific XML structure (iHOPSOAPNotFound) designed for these SOAP services instead of raising a SOAP fault. This is important because some workflow enactment tools (like Taverna) stop the whole workflow when a SOAP service returns a SOAP fault, which is undesirable when the service invocation has not actually failed.
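
Returning to symbol selection, a workflow writer's own heuristic over getRelatedSymbols output might look like the following Python sketch. It is not the heuristic used by guessSymbolIdFromSymbolText, which is not described here. The candidate record layout (`id`, `name`, `synonyms`, `taxid`) is an assumption, and the IDs `X1` and `X2` are placeholders.

```python
def pick_symbol(candidates, query, taxid=None):
    """Pick one candidate ID: exact name match beats synonym match; None if no match."""
    wanted = query.strip().lower()
    # Restrict to the requested organism when a TaxID is given.
    pool = [c for c in candidates if taxid is None or c["taxid"] == taxid]

    def score(candidate):
        if candidate["name"].lower() == wanted:
            return 2
        if wanted in (synonym.lower() for synonym in candidate["synonyms"]):
            return 1
        return 0

    # max() keeps the first of several equally scored candidates.
    best = max(pool, key=score, default=None)
    return best["id"] if best is not None and score(best) > 0 else None


# Assumed record layout; the IDs are placeholders, not iHOP identifiers.
candidates = [
    {"id": "X1", "name": "TP53", "synonyms": ["P53"], "taxid": 9606},
    {"id": "X2", "name": "Trp53", "synonyms": ["p53"], "taxid": 10090},
]

human_or_first = pick_symbol(candidates, "p53")
mouse_only = pick_symbol(candidates, "p53", taxid=10090)
```

Without a TaxID the two candidates tie and the first one is kept, which shows why supplying the optional organism matters for unambiguous selection.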
In the design of the BioMOBY services it was necessary to comply with the common object ontology on MOBY Central and with the portfolio of services that use this ontology. Although the main iHOP MOBY services take the same input parameters and use the same XML schema for their outputs as the CGI-XML and SOAP services, the true power of the iHOP MOBY services lies in the additional translation services. These services take as input the iHOP XML structures generated by the iHOP services and translate their content into a collection of usable MOBY objects. This way, other MOBY services that use the same ontology can be chained to this output.

CGI-XML services were tested using both web browsers and command-line HTTP retrieval tools (like wget). We tested and cross-validated the functionality of the iHOP SOAP web services with unit tests based on the Perl SOAP::Lite library and in the context of Taverna (10,11), a workflow enactment tool extensively used by the bioinformatics community. We found that SOAP::Lite 0.60 behaved better than earlier versions and than some later intermediate ones (the latest version is 0.69). Taverna 1.4 and 1.5 are discouraged, because SOAP service results are pruned. Taverna 1.5.1 solves these and other issues and is recommended. Older versions, like Taverna 1.3.1, also work, but they have many limitations related to BioMOBY services.
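
When exercising the CGI-XML services with HTTP tools, the return conventions described for them (404, 500 and 400) can be checked with a small classifier such as the Python sketch below. Treating status 200 as success is an assumption based on standard HTTP practice, and the decision to let a workflow continue on "no answer" mirrors the reasoning given for iHOPSOAPNotFound; neither is taken from the iHOP documentation.

```python
from enum import Enum


class Outcome(Enum):
    OK = "answer returned"
    NO_ANSWER = "no answer for the query"
    BAD_REQUEST = "no input parameter specified"
    SERVER_ERROR = "internal error"
    UNEXPECTED = "status not covered by the described conventions"


_STATUS_TO_OUTCOME = {
    200: Outcome.OK,  # assumption: standard HTTP success status
    404: Outcome.NO_ANSWER,
    400: Outcome.BAD_REQUEST,
    500: Outcome.SERVER_ERROR,
}


def classify_status(status_code):
    """Map an HTTP status code onto the return cases described for CGI-XML services."""
    return _STATUS_TO_OUTCOME.get(status_code, Outcome.UNEXPECTED)


def should_abort_workflow(outcome):
    """An empty result is not a failure, so only real errors stop the workflow."""
    return outcome not in (Outcome.OK, Outcome.NO_ANSWER)
```

For example, `should_abort_workflow(classify_status(404))` is false, so a workflow step that finds nothing can simply pass an empty result downstream.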

RESULTS AND DISCUSSION

Proof of the functionality, completeness and usefulness of the iHOP web service APIs is provided by a number of collaborative projects that make programmatic use of iHOP content. Table 3 contains a brief list of the projects in which iHOP web services have been used, and Figure 3 shows a CARGO (Cases et al., submitted to NAR-WEB 2007) widget using information provided by iHOP CGI-XML services.

Table 3.

Projects where iHOP web services have been (or are being) used

Project	Description	URL/Reference
ORIEL	The CGI-XML API was developed in the context of this project, and it was integrated with other biological information resources.	http://www.oriel.org/
DIAMONDS	The iHOP SOAP interface was developed and applied to the dynamic extraction of proteins related to the cell cycle in various genomes, with special emphasis on Arabidopsis proteins. This information was used as input for the modelling approaches developed by the partners of this project.	http://www.sbcellcycle.org/
ECID	These projects (ECID, COMBIO, ENFIN and CARGO) and their tools (e.g. ENFIN Spindle proteins tool, CARGO framework) are currently using these iHOP web service APIs in different biological and technical contexts.	http://www.pdg.cnb.uam.es/ecid/
COMBIO	(see ECID)	http://somosierra.cnb.uam.es/Servers/COMBIO/
ENFIN (Enabling Systems Biology) Network of Excellence	(see ECID)	http://www.enfin.org/, http://www.pdg.cnb.uam.es/ENFIN/index_spindle.php
CARGO	(see ECID)	http://cargo.bioinfo.cnio.es/
INB	BioMOBY iHOP web services were funded by the Spanish National Bioinformatics Institute (INB), and published on the INB specific MOBY repository, integrated in the INB curated bioinformatics object ontology. The services will also be made available in the central MOBY repository of Canada.	http://www.inab.org/

Figure 3.

This is a snapshot of the CARGO framework, showing information about P53. The iHOP widget shows sentences with evidence of relationships between P53 and other genes.

When using iHOP as a web service, it is necessary to be aware of the current limitations of biological text mining. BioCreAtIvE (12) and other blind community assessments (13) have clearly shown that name identification, and in particular matching gene names in the literature with the corresponding database entries, is a hard problem and that the best systems are still far from perfect (14). Our own evaluation of iHOP in 2005 (15) shows that in model organisms the average precision is around 94% and the recall around 87%. Even though additional refinements and dictionaries are producing continuous progress, the community's poor adherence to naming standards (16) will continue to create problems in this area. Other obvious limitations of iHOP and of all other current text mining systems are imposed by the limited availability of full-text sources [the main reason for the common use of abstract collections (17)] and by the still limited possibilities for incorporating effective Natural Language Processing techniques for the extraction of additional features from biomedical text. A more detailed description of the status of this fast-developing field can be found in (18–20).

Despite these general limitations in the field, the iHOP web interface has become popular among biologists searching for information about the function and relations of the genes and proteins of interest to them. To our knowledge, iHOP is the only large-scale text mining resource in biology that is offered as an open web service. We therefore expect that the novel possibilities described in this work will contribute to the use of iHOP as part of numerous high-throughput analysis environments.

AVAILABILITY

Information relevant to developers, such as detailed documentation of the iHOP web service XML file format, the URLs required to invoke the REST API, the WSDL document describing the SOAP services, and usage examples in Perl and Taverna, is available at http://www.ihop-net.org/UniPub/iHOP/webservices/.

16 in total

1. BioMOBY: an open source biological web services proposal.

Authors: Mark D Wilkinson; Matthew Links
Journal: Brief Bioinform Date: 2002-12 Impact factor: 11.622

2. Distribution of information in biomedical abstracts and full-text publications.

Authors: M J Schuemie; M Weeber; B J A Schijvenaars; E M van Mulligen; C C van der Eijk; R Jelier; B Mons; J A Kors
Journal: Bioinformatics Date: 2004-05-06 Impact factor: 6.937

3. Taverna: a tool for the composition and enactment of bioinformatics workflows.

Authors: Tom Oinn; Matthew Addis; Justin Ferris; Darren Marvin; Martin Senger; Mark Greenwood; Tim Carver; Kevin Glover; Matthew R Pocock; Anil Wipat; Peter Li
Journal: Bioinformatics Date: 2004-06-16 Impact factor: 6.937

4. A gene network for navigating the literature.

Authors: Robert Hoffmann; Alfonso Valencia
Journal: Nat Genet Date: 2004-07 Impact factor: 38.330

Review 5. Text-mining approaches in molecular biology and biomedicine.

Authors: Martin Krallinger; Ramon Alonso-Allende Erhardt; Alfonso Valencia
Journal: Drug Discov Today Date: 2005-03-15 Impact factor: 7.851

6. BioMOBY successfully integrates distributed heterogeneous bioinformatics Web Services. The PlaNet exemplar case.

Authors: Mark Wilkinson; Heiko Schoof; Rebecca Ernst; Dirk Haase
Journal: Plant Physiol Date: 2005-05 Impact factor: 8.340

Review 7. Text mining for metabolic pathways, signaling cascades, and protein networks.

Authors: Robert Hoffmann; Martin Krallinger; Eduardo Andres; Javier Tamames; Christian Blaschke; Alfonso Valencia
Journal: Sci STKE Date: 2005-05-10

8. iHOPerator: user-scripting a personalized bioinformatics Web, starting with the iHOP website.

Authors: Benjamin M Good; Edward A Kawas; Byron Yu-Lin Kuo; Mark D Wilkinson
Journal: BMC Bioinformatics Date: 2006-12-15 Impact factor: 3.169

9. Overview of BioCreAtIvE task 1B: normalized gene lists.

Authors: Lynette Hirschman; Marc Colosimo; Alexander Morgan; Alexander Yeh
Journal: BMC Bioinformatics Date: 2005-05-24 Impact factor: 3.169

10. Overview of BioCreAtIvE: critical assessment of information extraction for biology.

Authors: Lynette Hirschman; Alexander Yeh; Christian Blaschke; Alfonso Valencia
Journal: BMC Bioinformatics Date: 2005-05-24 Impact factor: 3.169
