The Ontology Lookup Service: bigger and better.

Richard Côté, Florian Reisinger, Lennart Martens, Harald Barsnes, Juan Antonio Vizcaíno, Henning Hermjakob.

Abstract

The Ontology Lookup Service (OLS; http://www.ebi.ac.uk/ols) has provided several means to query, browse and navigate biomedical ontologies and controlled vocabularies since it first went into production 4 years ago, and usage statistics indicate that it has become a heavily accessed service with millions of hits monthly. The volume of data available for querying has increased 7-fold since its inception. OLS functionality has been integrated into several high-usage databases and data entry tools. Improvements in the data model and loaders, as well as interface enhancements, have made the OLS easier to use and allow it to capture more annotations from the source data. In addition, newly released software packages now provide easy means to fully integrate OLS functionality in external applications.

Year:  2010        PMID: 20460452      PMCID: PMC2896109          DOI: 10.1093/nar/gkq331

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

Ontologies and controlled vocabularies (CVs) have more than demonstrated their essential function in dealing with the large volumes of complex data currently being generated by high-throughput multi-domain analysis techniques (1). They provide a framework around which large data sets can be systematically annotated and queried. For this framework to function efficiently, however, the ontologies and CVs must be made available to the user community.

The Ontology Lookup Service (OLS) has been in production since mid-2005 and has quickly become one of the most accessed services in the Proteomics Services team at the EBI, with monthly usage figures in the millions of hits. This includes both the programmatic and the interactive interfaces that the service offers. The OLS has been described previously, and readers are invited to refer to the original publications for in-depth information on the technical architecture and data models (2,3).

The core functionality of the OLS has remained largely unchanged since its inception: users can query ontologies and CVs by name or identifier and obtain metadata, such as synonyms, definitions, cross references and other annotations, for a given term. Users can also traverse the relationships between terms. The usability and the volume of data captured, however, have been enhanced, as described in the sections below.

The OLS has always been designed to be used in other projects as a means to integrate ontology and CV annotation and query functionality. A SOAP web service has been available since the OLS went into production. A full description of the web service has already been published (2,3), and users who wish to make use of it are encouraged to go to the OLS web service developer section for the most up-to-date documentation and code samples (http://www.ebi.ac.uk/ontology-lookup/WSDLDocumentation.do).

AVAILABLE DATA

The first OLS publication described it as containing 42 ontologies, accounting for roughly 135 000 terms. Over a 4-year period, the data loaded into the OLS has been expanded to 79 ontologies, representing over 971 000 unique terms (Figure 1). These cover far-ranging topics such as model organism anatomy and development, physiology and disease, instrumentation and methods and many others. In the 2 years since the OLS was previously published in NAR, 25 new ontologies have been added (Table 1). Users are encouraged to go online at http://www.ebi.ac.uk/ontology-lookup/ontologyList.do to access a full listing of currently available ontologies and CVs.
Figure 1.

Growth chart of the OLS data content. The amount of data loaded into the OLS, based on unique terms, has shown a 7-fold increase since the service went online. The first large increase is due to the incorporation of the NEWT taxonomy and the second large increase is due to the addition of a large number of ontologies at once.

Table 1.

A list of ontologies that have been added to the OLS in the last 2 years

Ontology prefix    Ontology name
AAO                Amphibian Gross Anatomy Ontology
APO                Yeast Phenotype Ontology
ATO                Amphibian Taxonomy
CCO                Cell Cycle Ontology
EFO                ArrayExpress Experimental Factor Ontology
ENA                European Nucleotide Archive Submission Ontology
FBsp               Flybase Taxonomy
FMA                Foundational Model of Anatomy Ontology
HAO                Hymenoptera Anatomy Ontology
HOM                Homology Ontology
HP                 Human Phenotype Ontology
IDO                Infectious Disease Ontology
LSM                Leukocyte Surface Marker Ontology
MIAA               Minimal Information about Anatomy Ontology
MIRO               Mosquito Insecticide Resistance Ontology
MPATH              Mouse Pathology Ontology
MS                 Mass Spectrometry Ontology
PAR                Protein Affinity Reagents Ontology
PRO                Protein Ontology
TADS               Tick Gross Anatomy Ontology
TTO                Teleost Taxonomy
WBbt               C. elegans Gross Anatomy Ontology
WBls               C. elegans Development Ontology
WBPhenotype        C. elegans Phenotype Ontology
ZFA                Zebrafish Anatomy and Development Ontology

The ontologies and CVs loaded in the OLS are maintained by various external groups that are domain experts in their fields. To keep the OLS as up to date as possible with the current state of knowledge, the ontology providers are polled on a daily basis and updated files are downloaded and parsed to update the core OLS database. Currently, the OLS loaders poll six different Concurrent Versions System (CVS) repositories, complemented by three Subversion (SVN) repositories thanks to the recently added SVN support. A mechanism to download individual files available by HTTP or FTP has also been implemented, which allows the loaders to track changes in files that are not in CVS or SVN.

The OLS codebase is made available under the permissive Apache 2.0 Open Source License and is freely available from the Google Code project repository (http://code.google.com/p/ols-ebi/). A weekly updated MySQL database dump is also made available from the EBI FTP server (ftp://ftp.ebi.ac.uk/pub/databases/ols).
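
The daily polling of provider files described in this section can be illustrated with a minimal sketch. The text does not say how the OLS loaders decide that a file has changed; comparing content digests is an assumption made here for illustration, and the function names (`content_digest`, `detect_changes`) are hypothetical. No network access is performed: the fetched files are passed in as bytes.

```python
import hashlib


def content_digest(data):
    """Hex SHA-256 digest of a file's raw bytes."""
    return hashlib.sha256(data).hexdigest()


def detect_changes(fetched, known_digests):
    """Compare freshly fetched files against digests from the previous poll.

    fetched:       {file name: bytes} as downloaded in this polling round
    known_digests: {file name: hex digest} remembered from earlier rounds
    Returns (names that are new or changed, updated digest map).
    Hashing is an illustrative assumption, not the documented OLS mechanism.
    """
    changed = []
    updated = dict(known_digests)
    for name in sorted(fetched):
        digest = content_digest(fetched[name])
        if known_digests.get(name) != digest:
            changed.append(name)      # this file would be re-parsed and reloaded
            updated[name] = digest
    return changed, updated
```

Only the files returned in the first element would need to be parsed again; unchanged files are skipped, which keeps a daily poll over many providers cheap.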

DATA MODEL IMPROVEMENTS

The OLS data loaders have been upgraded to be able to parse ontologies produced according to the Open Biomedical Ontology (OBO) 1.2 specification (http://www.geneontology.org/GO.format.obo-1_2.shtml) and can now capture previously unavailable information, such as custom name–value pairs and new synonym types (Figure 2). Another important feature of the OBO 1.2 specification is the ability to ‘import’ other ontologies and create relationships between local and imported terms.
Figure 2.

An example of custom synonyms and annotations. The synonyms in the blue box are uniquely defined in the PSI-MOD ontology. The annotations in the red box are examples of how the OLS can capture user-defined name–value attribute pairs.
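
To make the kind of information captured from OBO 1.2 files more concrete, the following is a minimal sketch of a stanza parser that keeps typed synonyms and any other tag as a name–value pair. It is not the OLS loader code. The sample stanza, including the identifier `MOD:99999`, is invented for illustration, and the parser handles only the small subset of the format shown here (no escapes in tag values, no trailing modifiers, no `!` comment stripping).

```python
import re
from collections import defaultdict

# Matches a synonym value such as:  "ExMod" EXACT PSI-MOD-label []
SYNONYM_RE = re.compile(
    r'^"(?P<text>(?:[^"\\]|\\.)*)"\s+(?P<scope>\w+)(?:\s+(?P<type>[\w-]+))?\s*\['
)


def parse_obo_terms(text):
    """Return {term id: {"name", "synonyms", "tags"}} for every [Term] stanza."""
    terms = {}
    current = None
    in_term = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("!"):
            continue
        if line.startswith("["):
            in_term = line == "[Term]"   # ignore [Typedef] and other stanzas
            current = None
            continue
        if not in_term or ":" not in line:
            continue                     # header lines are skipped in this sketch
        tag, value = line.split(":", 1)
        value = value.strip()
        if tag == "id":
            current = {"name": None, "synonyms": [], "tags": defaultdict(list)}
            terms[value] = current
        elif current is None:
            continue
        elif tag == "name":
            current["name"] = value
        elif tag == "synonym":
            match = SYNONYM_RE.match(value)
            if match:
                current["synonyms"].append(
                    (match["text"], match["scope"], match["type"])
                )
        else:
            current["tags"][tag].append(value)   # generic name-value capture
    return terms


# Invented sample data, not taken from a real ontology file.
SAMPLE = '''format-version: 1.2
synonymtypedef: PSI-MOD-label "Short label" EXACT

[Term]
id: MOD:99999
name: example modification
synonym: "ExMod" EXACT PSI-MOD-label []
synonym: "example mod" RELATED []
xref: DiffMono: "15.994915"

[Typedef]
id: part_of
name: part of
'''
```

Calling `parse_obo_terms(SAMPLE)` yields one term whose first synonym carries the ontology-specific synonym type and whose `xref` value is retained verbatim as an annotation.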

In order to avoid loading multiple copies of imported ontologies, the loaders and database back end have been refactored such that each ontology is only loaded once. The OLS loaders are configured so that each ontology or CV defines one or more term prefixes that are local to it (e.g. GO for the Gene Ontology). If the loaders encounter term identifiers that begin with a non-local prefix, they query the OLS database, retrieve the latest version of the term in question and then proceed as normal. In this way, relationships across linked ontologies always refer to the most up-to-date data. These cross-ontology links can now also be queried and browsed, as shown in Figure 3.
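
The prefix-based handling of imported terms can be sketched as follows. This is an illustration under stated assumptions rather than the actual loader: the database is modelled as a plain dictionary, identifiers are assumed to follow the `PREFIX:accession` convention, and the accessions and the `has_units` relation name in the usage example are placeholders.

```python
def split_prefix(term_id):
    """Return the prefix of a 'PREFIX:accession' identifier."""
    return term_id.split(":", 1)[0]


def link_terms(local_prefixes, local_terms, relationships, database):
    """Resolve relationship targets without reloading imported ontologies.

    local_prefixes: prefixes owned by the ontology being loaded, e.g. {"MS"}
    local_terms:    {term id: record} parsed from the file being loaded
    relationships:  iterable of (subject id, relation, target id)
    database:       {term id: record} for ontologies already stored
                    (a stand-in for a query against the OLS database)
    Returns (resolved links, unresolved triples).
    """
    resolved, unresolved = [], []
    for subject, relation, target in relationships:
        if split_prefix(target) in local_prefixes:
            record = local_terms.get(target)
        else:
            record = database.get(target)   # reuse the single stored copy
        if record is None:
            unresolved.append((subject, relation, target))
        else:
            resolved.append((subject, relation, target, record["name"]))
    return resolved, unresolved


# Placeholder identifiers, loosely modelled on the cross-ontology example in Figure 3.
stored = {"UO:0000001": {"name": "second"}, "UO:0000002": {"name": "minute"}}
local = {"MS:0000001": {"name": "scan start time"}}
triples = [
    ("MS:0000001", "has_units", "UO:0000001"),
    ("MS:0000001", "has_units", "UO:0000002"),
    ("MS:0000001", "has_units", "XX:0000009"),   # target not available anywhere
]
links, missing = link_terms({"MS"}, local, triples, stored)
```

Because non-local targets are always looked up in the shared store, every ontology that refers to a unit term sees the same, most recently loaded record.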
Figure 3.

An example term hierarchy graph from the Ontology browser of the OLS. When using the ontology browser, selecting a term will provide a graphical display of all paths from that term to the ontology root term(s). Users can click on the terms to zoom the ontology browser to a particular term. Note the cross-ontology links in this example. The term scan start time from the MS (mass spectrometry) ontology has a relation to the second and minute terms of the unit ontology (UO), which in turn has relations to the PATO (phenotypic quality ontology).

INTERACTIVE USER INTERFACE IMPROVEMENTS

Users of the OLS website typically do one of two things: query the database using the auto-suggestion search box or browse an ontology (or a subset thereof). Once a term has been highlighted, either from the search suggestions or from the ontology browser, the user is shown a table containing all the metadata associated with this term (synonyms, definitions, comments, cross-references and any other annotations that were captured during the loading process). When using the ontology browser, a graph is also shown: either all possible paths from the selected term to the root term(s) of the ontology, together with the relationships between all involved terms (Figure 3), or a local relationship graph containing only the direct parent and child terms. The type of graph to be displayed is configurable from the ontology browser interface. These graphs are clickable image maps that zoom and re-root the ontology browser to the selected term.
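
The two graph views described above correspond to two simple traversals of the term graph. The sketch below is an illustration only: it assumes the ontology is acyclic and is given as a dictionary mapping each term to its direct parents, and the term names are arbitrary placeholders.

```python
def paths_to_roots(term, parents):
    """Enumerate every path from `term` up to a root term.

    parents: {term: [direct parent terms]}; a term without parents is a root.
    Assumes the graph is acyclic, as expected for an ontology hierarchy.
    """
    direct = parents.get(term, [])
    if not direct:
        return [[term]]
    paths = []
    for parent in direct:
        for tail in paths_to_roots(parent, parents):
            paths.append([term] + tail)
    return paths


def local_neighbourhood(term, parents):
    """Direct parents and direct children only (the 'local' graph view)."""
    children = sorted(c for c, ps in parents.items() if term in ps)
    return {"parents": list(parents.get(term, [])), "children": children}


# Placeholder hierarchy: 'd' has two parents that share the root 'a'.
PARENTS = {"d": ["b", "c"], "b": ["a"], "c": ["a"], "a": []}
```

Here `paths_to_roots("d", PARENTS)` returns both routes to the root, while `local_neighbourhood("b", PARENTS)` returns only the immediate neighbours of `b`.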

REUSABLE CODE COMPONENTS

As mentioned previously, the OLS has always provided a SOAP web service. This service has been used by several large projects, such as PRIDE (4), IntAct (5), ChEBI (6) and the Proteomics Standards Initiative (PSI) (7). However, the main drawback to its wider acceptance and uptake has been the lack of a simple GUI component that could easily be plugged in to existing code projects. This has now been solved through the release of the open source OLS Dialog GUI component (8) (Figure 4). The OLS Dialog can easily be integrated into existing Java applications and gives access to the full range of query types supported by the OLS. Users can search for terms by name or by identifier, as well as use a graphical ontology browser to navigate an ontology and select a term. It is also possible to query terms from the PSI protein modification ontology (PSI-MOD) (9) based on annotations captured from the source ontology. Users can select the type of annotation to query, enter a mass in Daltons and a desired precision, and obtain all of the PSI-MOD entries that fit those parameters (e.g. find all PSI-MOD entries whose annotated monoisotopic mass is 120 Da ± 1 Da).
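
The mass-based PSI-MOD query amounts to a tolerance window around a target value. A minimal sketch follows; the entries, their identifiers and the `mono_mass` field name are invented for illustration and do not reflect real PSI-MOD records or the OLS Dialog API.

```python
def within_mass_window(entries, target_mass, precision):
    """Return entries whose annotated mass lies within target_mass ± precision.

    entries: iterable of dicts with an 'id' and a 'mono_mass' (float or None)
    Entries without a mass annotation are skipped; the window is inclusive.
    """
    return [
        entry
        for entry in entries
        if entry["mono_mass"] is not None
        and abs(entry["mono_mass"] - target_mass) <= precision
    ]


# Invented example records.
ENTRIES = [
    {"id": "EX:1", "mono_mass": 118.5},
    {"id": "EX:2", "mono_mass": 119.25},
    {"id": "EX:3", "mono_mass": 121.0},
    {"id": "EX:4", "mono_mass": None},
    {"id": "EX:5", "mono_mass": 121.5},
]
hits = within_mass_window(ENTRIES, 120.0, 1.0)
```

With a target of 120 Da and a precision of 1 Da, only the records at 119.25 and 121.0 are selected.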
Figure 4.

Two screenshots of the OLS Dialog GUI component. The OLS Dialog allows Java application developers to seamlessly integrate OLS functionality in existing tools. Users can query the OLS by term name or ID. They can also locate terms by browsing an ontology and search the PSI-MOD ontology entries by term annotations specific to the ontology. In the left panel, a search on term names will also include partial matches and synonyms. In both cases, when a term is selected, the relevant associated metadata will be displayed and a graph similar to Figure 3 can be shown (not shown in these examples).

The OLS Dialog has been developed as part of the PRIDE Converter toolkit (10), which allows users to convert multiple mass spectrometry file formats into PRIDE XML in preparation for submission to the PRIDE database and requires users to annotate their submission files with terms from specific ontologies. User feedback has indicated that the PRIDE Converter and OLS Dialog have made submissions to PRIDE much easier, and this is reflected in the PRIDE submission figures (11).

DISCUSSION

The OLS has matured into a stable system and has proven to be popular beyond our initial expectations. Besides being used as a stand-alone system, its functionality has been incorporated into several independent tools and large-scale projects, and it is also being used by several ontology developers as their primary ontology browser (e.g. 12,13).

When it went into production in mid-2005, the OLS was without peer. While each major ontology provider (GO, TAIR, FlyBase, Wormbase, etc.) generally provided its own website to browse its individual ontology, there was no unified resource to interactively and programmatically query multiple ontologies using a single, constant interface. Other services quickly followed suit, and current systems that perform a similar function now include the National Center for Biomedical Ontology (NCBO) BioPortal (14) and the National Cancer Institute BioPortal (http://bioportal.nci.nih.gov/ncbo/faces/index.xhtml), which uses a scaled-down version of the NCBO BioPortal codebase.

A continuous increase in the number and scope of ontologies and CVs made available, coupled with an enhanced data model and better cross-ontology support, will ensure that the OLS keeps its place as a valuable tool for a broad segment of the scientific community. The development team and its collaborators are always trying to make it easier to integrate OLS functionality into other projects, and the release of the OLS Dialog will go a long way towards achieving this goal. Ontology developers who wish to make their ontology available to the OLS can do so easily and through a variety of means, thanks to a versatile and automated loading process.

The OLS team is always looking for feedback to improve the project. Users are encouraged to contact pride-support@ebi.ac.uk with comments, problems and suggestions for new functionality.

FUNDING

The OLS is funded by the European Commission ‘Serving Life-science Information for the Next Generation’ (SLING), grant agreement number 226073 (Integrating Activity) within Research Infrastructures of the FP7. Formerly OLS was funded by the Biotechnology and Biological Sciences Research Council (BBSRC) ISPIDER grant and European Union (EU) FP6 ‘Free European Life-science Information and Computational Services’ (FELICS) [contract number 021902 (RII3)] grants. Funding for open access charge: EU FP6 ‘Felics’ [contract number 021902 (RII3)].

Conflict of interest statement. None declared.

  14 in total

1.  The HUPO proteomics standards initiative--overcoming the fragmentation of proteomics data.

Authors:  Henning Hermjakob
Journal:  Proteomics       Date:  2006-09       Impact factor: 3.984

Review 2.  Biomedical ontologies: a functional perspective.

Authors:  Daniel L Rubin; Nigam H Shah; Natalya F Noy
Journal:  Brief Bioinform       Date:  2007-12-12       Impact factor: 11.622

3.  PRIDE Converter: making proteomics data-sharing easy.

Authors:  Harald Barsnes; Juan Antonio Vizcaíno; Ingvar Eidhammer; Lennart Martens
Journal:  Nat Biotechnol       Date:  2009-07       Impact factor: 54.908

4.  OLS dialog: an open-source front end to the ontology lookup service.

Authors:  Harald Barsnes; Richard G Côté; Ingvar Eidhammer; Lennart Martens
Journal:  BMC Bioinformatics       Date:  2010-01-17       Impact factor: 3.169

5.  IntAct--open source resource for molecular interaction data.

Authors:  S Kerrien; Y Alam-Faruque; B Aranda; I Bancarz; A Bridge; C Derow; E Dimmer; M Feuermann; A Friedrichsen; R Huntley; C Kohler; J Khadake; C Leroy; A Liban; C Lieftink; L Montecchi-Palazzi; S Orchard; J Risse; K Robbe; B Roechert; D Thorneycroft; Y Zhang; R Apweiler; H Hermjakob
Journal:  Nucleic Acids Res       Date:  2006-12-01       Impact factor: 16.971

6.  The Ontology Lookup Service, a lightweight cross-platform tool for controlled vocabulary queries.

Authors:  Richard G Côté; Philip Jones; Rolf Apweiler; Henning Hermjakob
Journal:  BMC Bioinformatics       Date:  2006-02-28       Impact factor: 3.169

7.  BioPortal: ontologies and integrated data resources at the click of a mouse.

Authors:  Natalya F Noy; Nigam H Shah; Patricia L Whetzel; Benjamin Dai; Michael Dorf; Nicholas Griffith; Clement Jonquet; Daniel L Rubin; Margaret-Anne Storey; Christopher G Chute; Mark A Musen
Journal:  Nucleic Acids Res       Date:  2009-05-29       Impact factor: 16.971

8.  ArrayExpress update--from an archive of functional genomics experiments to the atlas of gene expression.

Authors:  Helen Parkinson; Misha Kapushesky; Nikolay Kolesnikov; Gabriella Rustici; Mohammad Shojatalab; Niran Abeygunawardena; Hugo Berube; Miroslaw Dylag; Ibrahim Emam; Anna Farne; Ele Holloway; Margus Lukk; James Malone; Roby Mani; Ekaterina Pilicheva; Tim F Rayner; Faisal Rezwan; Anjan Sharma; Eleanor Williams; Xiangqun Zheng Bradley; Tomasz Adamusiak; Marco Brandizi; Tony Burdett; Richard Coulson; Maria Krestyaninova; Pavel Kurnosov; Eamonn Maguire; Sudeshna Guha Neogi; Philippe Rocca-Serra; Susanna-Assunta Sansone; Nataliya Sklyar; Mengyao Zhao; Ugis Sarkans; Alvis Brazma
Journal:  Nucleic Acids Res       Date:  2008-11-10       Impact factor: 16.971

9.  The Ontology Lookup Service: more data and better tools for controlled vocabulary queries.

Authors:  Richard G Côté; Philip Jones; Lennart Martens; Rolf Apweiler; Henning Hermjakob
Journal:  Nucleic Acids Res       Date:  2008-05-08       Impact factor: 16.971

10.  An anatomy ontology to represent biological knowledge in Dictyostelium discoideum.

Authors:  Pascale Gaudet; Jeffery G Williams; Petra Fey; Rex L Chisholm
Journal:  BMC Genomics       Date:  2008-03-18       Impact factor: 3.969
