Carlo Ravagli, Francois Pognan and Philippe Marc
PreClinical Safety, Translational Sciences, Novartis Institute for Biomedical Research, Basel, CH-4002, Switzerland.
Abstract
The lack of controlled terminology and ontology usage leads to incomplete search results and poor interoperability between databases. One of the major underlying challenges of data integration is curating data to adhere to controlled terminologies and/or ontologies. Finding subject matter experts with the time and skills required to perform data curation is often problematic. In addition, existing tools are not designed for continuous data integration and collaborative curation. This results in time-consuming curation workflows that often become unsustainable. The primary objective of OntoBrowser is to provide an easy-to-use online collaborative solution for subject matter experts to map reported terms to preferred ontology (or code list) terms and to facilitate ontology evolution. Additional features include web service access to data, visualization of ontologies in hierarchical/graph format and a peer review/approval workflow with alerting. AVAILABILITY AND IMPLEMENTATION: The source code is freely available under the Apache v2.0 license. Source code and installation instructions are available at http://opensource.nibr.com. The software is designed to run on a Java EE application server and store data in a relational database. CONTACT: philippe.marc@novartis.com.
1 Introduction
Many code lists and ontologies have been created to model biological concepts. Databases are able to consolidate and integrate data from multiple sources by adhering to controlled terminologies and ontologies. Contributors to such databases are generally required to submit data according to a compatible vocabulary (Côté; de Coronado; Smedley). However, biological results are often captured using inconsistent nomenclatures and/or vocabularies incompatible with the target databases. To achieve data consistency and compatibility, nomenclature from the original data must be mapped to a target ontology or code list. This translation task needs to be conducted by domain experts.
Typically, ontology creation tools such as WebProtégé (Horridge) are not designed for this type of task or user. Other tools such as Karma (Szekely) are designed specifically for mapping/alignment and address the initial integration problem (for a review of useful tools, see http://www.mkbergman.com/1769/50-ontology-mapping-and-alignment-tools/). However, none of these tools is designed to be part of an ecosystem facilitating continuous data integration. Consequently, the vocabulary mapping/translation task and the resulting ontology evolution are often performed using snapshots of exported data followed by reconciliation.
Continuous data integration coupled with evolving ontologies was a major challenge faced by the Innovative Medicines Initiative eTOX consortium (Cases). Results from over 6000 toxicology reports were manually extracted over a 5-year period. The original reports, generated over several decades, were contributed by 13 independent pharmaceutical companies and hence written using many different nomenclatures. The complexity and scale of the challenge were addressed by developing the OntoBrowser tool. The tool has matured over 4 years and has been collaboratively used by over a dozen consortium domain experts to map more than 70 000 distinct terms to 6352 preferred ontology or code list terms.
2 Using OntoBrowser
2.1 Online collaborative curation
It is common for multiple curators, potentially located at different sites, to work collaboratively. The majority of tools for ontology curation and/or mapping of reported terms to controlled terminologies are deployed locally and restricted to manipulating data in isolation. This leads to multiple local copies of the data being modified independently, which then require merging at each milestone. The peer-review process also requires careful coordination. Even with the correct file formats, tools and procedures in place, reconciliation and coordination can be time-consuming and error-prone, and they add unnecessary overhead and complexity to the curation process. OntoBrowser was specifically designed for multi-user online collaboration and peer review. A central database hosts all the data (i.e. multiple ontologies and code lists), providing a single working copy shared by all users. As a web-based application, it can be deployed on a server accessible via the internet (like the eTOX instance) or within an intranet.
The user interface has been developed in close collaboration with multiple biologists to ensure that the design is both logical and efficient. It allows searching and browsing of the concepts. The user interface supports a read-only mode and a curation mode, depending on the privileges defined for the user. The curation mode exposes functionality for modifying ontologies, mapping reported terms and approving (or rejecting) pending changes. The peer review workflow is implemented as part of the core application functionality (Fig. 1). E-mail alerts are sent to curators when pending changes are outstanding and require approval. Other features include versioning and a complete audit history.
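The review cycle described above (a pending change that is later approved or rejected, with an audit history and alerts for outstanding items) can be illustrated with a minimal Python sketch. This is not OntoBrowser's implementation: the `Mapping` class, the `Status` values, the `pending_alerts` helper and the rule that authors cannot review their own changes are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class Status(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class Mapping:
    """A proposed mapping of a reported term to a preferred term (hypothetical model)."""
    reported_term: str
    preferred_term: str
    created_by: str
    status: Status = Status.PENDING
    audit: list = field(default_factory=list)

    def __post_init__(self):
        self._log("created", self.created_by)

    def _log(self, action, user):
        # Every state change is appended, never overwritten, to keep a full history.
        self.audit.append((datetime.now(timezone.utc), action, user))

    def review(self, reviewer, approve):
        if self.status is not Status.PENDING:
            raise ValueError("only pending changes can be reviewed")
        # Assumed rule, not stated in the text: authors cannot review their own change.
        if reviewer == self.created_by:
            raise PermissionError("a change must be reviewed by another curator")
        self.status = Status.APPROVED if approve else Status.REJECTED
        self._log(self.status.value, reviewer)


def pending_alerts(mappings):
    """Return the changes that would trigger an alert because they still await review."""
    return [m for m in mappings if m.status is Status.PENDING]


if __name__ == "__main__":
    change = Mapping("Tubules, renal", "renal tubule", created_by="alice")
    print(len(pending_alerts([change])))  # one change awaits review
    change.review("bob", approve=True)
    print(change.status.value, [entry[1] for entry in change.audit])
```

Keeping the audit trail as an append-only list on each change is one simple way to obtain a complete history; a production system would more plausibly persist it in the central database.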
Fig. 1. OntoBrowser integration in an ecosystem. OntoBrowser internal mechanics (right), supplying web services to another application (left).
Another key feature of the software is the automatic pre-mapping of unmapped reported terms to ontologies (or code lists). The matching logic applies stemming and ignores word order to provide fuzzy matching. This automated pre-mapping greatly reduced the curation work required of the scientists during the eTOX project.
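To make the idea of stemming combined with order-insensitive matching concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the algorithm used by OntoBrowser: the crude `stem` function, the `match_key` and `pre_map` helpers and the example terms are all hypothetical, and a real system would likely use an established stemmer.

```python
import re


def stem(word):
    """Crude illustrative stemmer: drops a plural 's' and a trailing 'e'."""
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        word = word[:-1]
    if len(word) > 3 and word.endswith("e"):
        word = word[:-1]
    return word


def match_key(term):
    """Normalise a term to an unordered set of stems, so word order is ignored."""
    tokens = re.findall(r"[a-z0-9]+", term.lower())
    return frozenset(stem(token) for token in tokens)


def pre_map(reported_terms, preferred_terms):
    """Propose a preferred term for each reported term, or None when nothing matches."""
    index = {}
    for preferred in preferred_terms:
        # Keep the first preferred term seen for a given key.
        index.setdefault(match_key(preferred), preferred)
    return {reported: index.get(match_key(reported)) for reported in reported_terms}


if __name__ == "__main__":
    proposals = pre_map(
        ["Tubules, renal", "LIVERS", "spleen"],
        ["renal tubule", "liver"],
    )
    for reported, preferred in proposals.items():
        print(reported, "->", preferred)
```

In a curation setting such proposals would be treated as suggestions that still pass through peer review rather than as final mappings.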
2.2 Web services enabling system integration
OntoBrowser provides web services to expose ontology data and application functionality to other applications or services. For example, Novartis utilized OntoBrowser web services in its Translational Safety Platform (TSP) data warehouse to develop an interactive histopathology search application enabling users to query microscopic findings using multiple ontology terms. These findings are continually consolidated from multiple source systems into the data warehouse. OntoBrowser is used by domain experts to map the tissue and histopathology vocabularies to two respective ontologies. At runtime, the TSP frontend calls OntoBrowser’s search and ontology visualisation web services to provide a user interface for building search criteria, allowing users to search and browse the anatomy and histopathology ontologies directly within the TSP application (Fig. 1). The TSP backend then calls OntoBrowser web services to retrieve the list of subclasses of the ontology terms selected by the user and uses that list to query the data warehouse.
Using the web services, ontologies (optionally including synonyms) can also be fully exported from OntoBrowser. Several standard ontology formats are supported to ensure interoperability with other tools/systems, e.g. OWL (RDF and XML), OBO, Manchester and Turtle.
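The subclass expansion performed by the TSP backend can be sketched in Python as follows. The sketch works on an in-memory parent-to-children dictionary instead of calling any web service, since the actual OntoBrowser endpoints are not described here; the `subclasses` and `search_findings` functions, the record layout and the example ontology are assumptions for illustration only.

```python
from collections import deque


def subclasses(children, root):
    """Return the root term together with all of its transitive subclasses."""
    seen = {root}
    queue = deque([root])
    while queue:
        term = queue.popleft()
        for child in children.get(term, ()):
            if child not in seen:  # guards against terms reachable via several parents
                seen.add(child)
                queue.append(child)
    return seen


def search_findings(findings, children, selected_term):
    """Keep findings whose tissue is the selected term or any of its subclasses."""
    terms = subclasses(children, selected_term)
    return [finding for finding in findings if finding["tissue"] in terms]


if __name__ == "__main__":
    # Hypothetical anatomy fragment: parent term -> direct subclasses.
    anatomy = {
        "digestive system": ["liver", "intestine"],
        "intestine": ["small intestine"],
    }
    findings = [
        {"id": 1, "tissue": "liver"},
        {"id": 2, "tissue": "small intestine"},
        {"id": 3, "tissue": "kidney"},
    ]
    hits = search_findings(findings, anatomy, "digestive system")
    print([finding["id"] for finding in hits])
```

Expanding the selected term once and then filtering by set membership keeps the data warehouse query independent of the depth of the ontology.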
3 Installing OntoBrowser
OntoBrowser requires a Java EE application server (e.g. WildFly or WebLogic) and a relational database (e.g. MySQL or Oracle). Setting up a new instance, including the initial load of ontologies and connection, takes approximately 2 h. A full installation guide is provided in the source code repository. Public-domain ontologies provided by the OBO Foundry (Smith) can be easily imported and synchronized.
Authors: Barry Smith; Michael Ashburner; Cornelius Rosse; Jonathan Bard; William Bug; Werner Ceusters; Louis J Goldberg; Karen Eilbeck; Amelia Ireland; Christopher J Mungall; Neocles Leontis; Philippe Rocca-Serra; Alan Ruttenberg; Susanna-Assunta Sansone; Richard H Scheuermann; Nigam Shah; Patricia L Whetzel; Suzanna Lewis Journal: Nat Biotechnol Date: 2007-11 Impact factor: 54.908
Authors: Matthew Horridge; Tania Tudorache; Csongor Nuylas; Jennifer Vendetti; Natalya F Noy; Mark A Musen Journal: Bioinformatics Date: 2014-04-26 Impact factor: 6.937
Authors: Richard Côté; Florian Reisinger; Lennart Martens; Harald Barsnes; Juan Antonio Vizcaino; Henning Hermjakob Journal: Nucleic Acids Res Date: 2010-05-11 Impact factor: 16.971
Authors: Damian Smedley; Syed Haider; Steffen Durinck; Luca Pandini; Paolo Provero; James Allen; Olivier Arnaiz; Mohammad Hamza Awedh; Richard Baldock; Giulia Barbiera; Philippe Bardou; Tim Beck; Andrew Blake; Merideth Bonierbale; Anthony J Brookes; Gabriele Bucci; Iwan Buetti; Sarah Burge; Cédric Cabau; Joseph W Carlson; Claude Chelala; Charalambos Chrysostomou; Davide Cittaro; Olivier Collin; Raul Cordova; Rosalind J Cutts; Erik Dassi; Alex Di Genova; Anis Djari; Anthony Esposito; Heather Estrella; Eduardo Eyras; Julio Fernandez-Banet; Simon Forbes; Robert C Free; Takatomo Fujisawa; Emanuela Gadaleta; Jose M Garcia-Manteiga; David Goodstein; Kristian Gray; José Afonso Guerra-Assunção; Bernard Haggarty; Dong-Jin Han; Byung Woo Han; Todd Harris; Jayson Harshbarger; Robert K Hastings; Richard D Hayes; Claire Hoede; Shen Hu; Zhi-Liang Hu; Lucie Hutchins; Zhengyan Kan; Hideya Kawaji; Aminah Keliet; Arnaud Kerhornou; Sunghoon Kim; Rhoda Kinsella; Christophe Klopp; Lei Kong; Daniel Lawson; Dejan Lazarevic; Ji-Hyun Lee; Thomas Letellier; Chuan-Yun Li; Pietro Lio; Chu-Jun Liu; Jie Luo; Alejandro Maass; Jerome Mariette; Thomas Maurel; Stefania Merella; Azza Mostafa Mohamed; Francois Moreews; Ibounyamine Nabihoudine; Nelson Ndegwa; Céline Noirot; Cristian Perez-Llamas; Michael Primig; Alessandro Quattrone; Hadi Quesneville; Davide Rambaldi; James Reecy; Michela Riba; Steven Rosanoff; Amna Ali Saddiq; Elisa Salas; Olivier Sallou; Rebecca Shepherd; Reinhard Simon; Linda Sperling; William Spooner; Daniel M Staines; Delphine Steinbach; Kevin Stone; Elia Stupka; Jon W Teague; Abu Z Dayem Ullah; Jun Wang; Doreen Ware; Marie Wong-Erasmus; Ken Youens-Clark; Amonida Zadissa; Shi-Jian Zhang; Arek Kasprzyk Journal: Nucleic Acids Res Date: 2015-04-20 Impact factor: 16.971
Authors: Montserrat Cases; Katharine Briggs; Thomas Steger-Hartmann; François Pognan; Philippe Marc; Thomas Kleinöder; Christof H Schwab; Manuel Pastor; Jörg Wichard; Ferran Sanz Journal: Int J Mol Sci Date: 2014-11-14 Impact factor: 5.923
Authors: Sean Watford; Stephen Edwards; Michelle Angrish; Richard S Judson; Katie Paul Friedman Journal: Toxicol Appl Pharmacol Date: 2019-08-09 Impact factor: 4.219
Authors: Oriol López-Massaguer; Kevin Pinto-Gil; Ferran Sanz; Alexander Amberg; Lennart T Anger; Manuela Stolte; Carlo Ravagli; Philippe Marc; Manuel Pastor Journal: Toxicol Sci Date: 2018-03-01 Impact factor: 4.849
Authors: François Pognan; Thomas Steger-Hartmann; Carlos Díaz; Niklas Blomberg; Frank Bringezu; Katharine Briggs; Giulia Callegaro; Salvador Capella-Gutierrez; Emilio Centeno; Javier Corvi; Philip Drew; William C Drewe; José M Fernández; Laura I Furlong; Emre Guney; Jan A Kors; Miguel Angel Mayer; Manuel Pastor; Janet Piñero; Juan Manuel Ramírez-Anguita; Francesco Ronzano; Philip Rowell; Josep Saüch-Pitarch; Alfonso Valencia; Bob van de Water; Johan van der Lei; Erik van Mulligen; Ferran Sanz Journal: Pharmaceuticals (Basel) Date: 2021-03-08