BioCIDER: a Contextualisation InDEx for biological Resources discovery.

Carlos Horro¹, Martin Cook², Teresa K Attwood³, Michelle D Brazas⁴, John M Hancock¹, Patricia Palagi⁵, Manuel Corpas⁶, Rafael Jimenez².

Abstract

SUMMARY: The vast, uncoordinated proliferation of bioinformatics resources (databases, software tools, training materials etc.) makes it difficult for users to find them. To facilitate their discovery, various services are being developed to collect such resources into registries. We have developed BioCIDER, which, rather like online shopping 'recommendations', provides a contextualization index to help identify biological resources relevant to the content of the sites in which it is embedded.
AVAILABILITY AND IMPLEMENTATION: BioCIDER (www.biocider.org) is an open-source platform. Documentation is available online (https://goo.gl/Klc51G), and source code is freely available via GitHub (https://github.com/BioCIDER). The BioJS widget that enables websites to embed contextualization is available from the BioJS registry (http://biojs.io/). All code is released under an MIT licence.
CONTACT: carlos.horro@earlham.ac.uk or rafael.jimenez@elixir-europe.org or manuel@repositive.io.

Year: 2017 PMID: 28407033 PMCID: PMC5870719 DOI: 10.1093/bioinformatics/btx213

Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937

1 Introduction

Life-science resources (i.e. databases, tools, training materials, courses and event information) are many, diverse, widely dispersed and hard to find. The 2016 Nucleic Acids Research (NAR) Database Issue (Rigden et al.) reported 1685 major databases in the molecular biology domain, while the latest NAR Web Server Issue (Editorial: Nucleic Acids Research annual Web Server Issue in 2016, 2016) presented 94 new resources for 2016 alone. It is thus difficult for researchers to be aware of, let alone familiar with, all current and relevant research assets, which compromises the uptake and general utility of those assets. Researchers need not just better but, crucially, more practical ways to discover resources.

Discoverability can be significantly enhanced if resources are exposed to users in the context of the information they are currently browsing; if sufficiently relevant and well placed, such suggestions may introduce advantageous new information and obviate the need to browse further. An analogy can be drawn with prominent online retailers that use widgets to display ‘customers also bought’ or ‘recommended items based on your search’. To our knowledge, there is no life science-focused service that provides contextualized information driving researchers to discover relevant databases, tools, events and training materials.

To address this gap, we have developed BioCIDER, a Contextualisation InDEx for biological Resources discovery. BioCIDER automatically collects information (metadata and source description) from a variety of centralized registries, including the GOBLET training portal (Corpas et al.), the Bio.tools service registry (Ison et al., 2015), the iAnn collaborative event dissemination portal (Jimenez et al.) and TeSS, the ELIXIR training portal (https://tess.elixir-uk.org/); others (e.g. biosharing.org; McQuilton et al.) will be added in future. BioCIDER can be embedded in any website via its companion widget from the BioJavaScript (BioJS) open source library of components (Corpas et al.).

2 Materials and methods

The BioCIDER service comprises three parts: (i) a set of Python scripts that periodically import data from different sources across the Internet (the so-called data-import layer); (ii) a centralized Solr index, which stores all the information collected by these scripts; and (iii) a Web service provided by the Solr indexing system (http://lucene.apache.org/solr/) that allows access to the data from any location (i.e. not necessarily through one specific client).

The data-import layer is highly modularized, so new scripts can be added to incorporate additional data sources into the platform. Data from each source are updated independently and automatically: timers trigger source-specific procedures at frequencies chosen to match the known update frequency of each site.

Solr is an open-source search platform that features indexed text storage. Data from the import layer can therefore be stored and sorted, and complex searches can be run rapidly across the entire content. These searches can be done (i) locally, (ii) through a Web-management application or (iii) via a Web service whose URL is publicly available and is used by the BioCIDER widget. Once a query is sent to the Web service, a simple JSON-formatted (JavaScript Object Notation) file is retrieved.

The contextualization process is based on the Solr Term Frequency (TF)–Inverse Document Frequency (IDF) algorithm. This allows retrieval of lists of resources ordered by their similarity to the search phrase, by measuring the TF (the number of times each term occurs in each document) and the IDF (a measure of how common or rare the term is across all documents).

BioCIDER can be used in any website with bioinformatics content, and can be shared and re-used by the BioJS community (Yachdav et al.). As input, the BioCIDER widget requires a query phrase; it returns a list of results (red rectangle, Fig. 1) showing the names of known resources, with links to their original source for further information (the number and type of results shown is configurable). The more descriptive the input words, the more relevant the suggestions. The widget can be configured to retrieve input automatically from content displayed in the webpage being browsed. Its functionality is easy to integrate: it works autonomously, without interfering with the website’s behaviour, and can be themed to match the design of the host site.
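
The TF–IDF ranking described above can be illustrated with a minimal Python sketch. It is a simplified, textbook-style scoring and not Solr's actual implementation, which applies further normalization; the smoothed IDF formula, the function names and the three resource descriptions are assumptions made purely for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lower-case a string and split it on whitespace (deliberately naive)."""
    return text.lower().split()


def rank_by_tf_idf(query, documents):
    """Return (name, score) pairs sorted by descending TF-IDF score.

    `documents` maps a resource name to its description text.
    """
    tokenized = {name: tokenize(text) for name, text in documents.items()}
    n_docs = len(tokenized)

    # Document frequency: the number of documents in which each term occurs.
    doc_freq = Counter()
    for tokens in tokenized.values():
        doc_freq.update(set(tokens))

    scores = {}
    for name, tokens in tokenized.items():
        term_counts = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            tf = term_counts[term]  # times the term occurs in this document
            if tf == 0:
                continue
            # Smoothed IDF (an assumption of this sketch): rarer terms weigh more.
            idf = math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0
            score += tf * idf
        scores[name] = score

    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical resource descriptions, invented for illustration only.
resources = {
    "course_a": "introduction to ngs bioinformatics ngs data analysis",
    "tool_b": "sequence alignment tool for protein sequences",
    "event_c": "workshop on ngs variant calling",
}

ranking = rank_by_tf_idf("ngs bioinformatics", resources)
for name, score in ranking:
    print(name, round(score, 3))
```

In this toy collection, the course description that mentions both query terms ranks first, the event that mentions only one of them ranks second, and the unrelated tool scores zero.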

Fig. 1

Screen-shot of the ‘Ensembl Browser Workshop: Plants and Microbes’ course page accessed from the GOBLET portal (www.mygoblet.org). The BioCIDER widget, framed inside the red rectangle, shows related databases, events, tools and training materials, contextualized to NGS. The widget is populated with short descriptions and links to relevant content on the course page. Original sources for each BioCIDER result can be accessed by clicking on them. The widget dynamically adapts to the shape and visual styles of its container, appearing as an integral part of the website.

3 Conclusions

BioCIDER provides an infrastructure for intuitive, fast and non-intrusive discovery of bioinformatics databases, tools, training materials and events. Its Web service can be freely used by any client website or user, retrieving contextualized resource information in a simple JSON-formatted file. This Web service is based on the Solr index system and its TF–IDF algorithm, which receives the query phrase from the client, measures its relevance to known resources, and returns a sorted, relevance-ranked list. An open source BioJS widget is provided to embed the query results in host webpages. The BioCIDER widget is already being used by organizations such as GOBLET (Attwood et al.) and ELIXIR-UK (http://www.elixir-uk.org). For example, users interested in NGS courses who find the ‘Introduction to NGS Bioinformatics’ course on the GOBLET Training Portal (http://www.mygoblet.org/training-portal/courses/introduction-ngs-bioinformatics) will also discover, in the BioCIDER widget (called ‘Similar items’ on that page), many topic-related training materials and events potentially useful to them.
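
As a rough illustration of how a client other than the widget might consume such a Web service, the following Python sketch builds a query URL and extracts results from a JSON reply. The `q`, `rows` and `wt` parameters and the `response`/`docs` layout follow Solr's standard select handler; the endpoint URL, the `title` and `link` field names and the sample reply are hypothetical and do not describe the actual BioCIDER schema. No network request is made.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint: the real BioCIDER Web service URL is not given here.
BASE_URL = "https://example.org/solr/biocider/select"


def build_query_url(phrase, max_results=5):
    """Build a Solr-style select URL asking for JSON output."""
    params = {"q": phrase, "rows": max_results, "wt": "json"}
    return BASE_URL + "?" + urlencode(params)


def extract_results(response_text):
    """Return (title, link) pairs from a Solr-style JSON reply.

    The 'title' and 'link' field names are assumptions of this sketch.
    """
    payload = json.loads(response_text)
    docs = payload.get("response", {}).get("docs", [])
    return [(doc.get("title", ""), doc.get("link", "")) for doc in docs]


# Canned reply standing in for a real one; its contents are invented.
sample_response = json.dumps({
    "responseHeader": {"status": 0},
    "response": {
        "numFound": 2,
        "start": 0,
        "docs": [
            {"title": "Example NGS course", "link": "https://example.org/course"},
            {"title": "Example NGS event", "link": "https://example.org/event"},
        ],
    },
})

url = build_query_url("NGS bioinformatics", max_results=3)
results = extract_results(sample_response)
print(url)
for title, link in results:
    print(title, link)
```

Solr returns documents in descending relevance order by default, so a client of this kind would simply display the results in the order received.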

References

1. GOBLET: the Global Organisation for Bioinformatics Learning, Education and Training.

Authors: Teresa K Attwood; Erik Bongcam-Rudloff; Michelle E Brazas; Manuel Corpas; Pascale Gaudet; Fran Lewitter; Nicola Mulder; Patricia M Palagi; Maria Victoria Schneider; Celia W G van Gelder
Journal: PLoS Comput Biol Date: 2015-04-09 Impact factor: 4.475

2. Anatomy of BioJS, an open source community for the life sciences.

Authors: Guy Yachdav; Tatyana Goldberg; Sebastian Wilzbach; David Dao; Iris Shih; Saket Choudhary; Steve Crouch; Max Franz; Alexander García; Leyla J García; Björn A Grüning; Devasena Inupakutika; Ian Sillitoe; Anil S Thanki; Bruno Vieira; José M Villaveces; Maria V Schneider; Suzanna Lewis; Steve Pettifer; Burkhard Rost; Manuel Corpas
Journal: Elife Date: 2015-07-08 Impact factor: 8.140

3. iAnn: an event sharing platform for the life sciences.

Authors: Rafael C Jimenez; Juan P Albar; Jong Bhak; Marie-Claude Blatter; Thomas Blicher; Michelle D Brazas; Cath Brooksbank; Aidan Budd; Javier De Las Rivas; Jacqueline Dreyer; Marc A van Driel; Michael J Dunn; Pedro L Fernandes; Celia W G van Gelder; Henning Hermjakob; Vassilios Ioannidis; David P Judge; Pascal Kahlem; Eija Korpelainen; Hans-Joachim Kraus; Jane Loveland; Christine Mayer; Jennifer McDowall; Federico Moran; Nicola Mulder; Tommi Nyronen; Kristian Rother; Gustavo A Salazar; Reinhard Schneider; Allegra Via; Jose M Villaveces; Ping Yu; Maria V Schneider; Teresa K Attwood; Manuel Corpas
Journal: Bioinformatics Date: 2013-06-05 Impact factor: 6.937

4. The GOBLET training portal: a global repository of bioinformatics training materials, courses and trainers.

Authors: Manuel Corpas; Rafael C Jimenez; Erik Bongcam-Rudloff; Aidan Budd; Michelle D Brazas; Pedro L Fernandes; Bruno Gaeta; Celia van Gelder; Eija Korpelainen; Fran Lewitter; Annette McGrath; Daniel MacLean; Patricia M Palagi; Kristian Rother; Jan Taylor; Allegra Via; Mick Watson; Maria Victoria Schneider; Teresa K Attwood
Journal: Bioinformatics Date: 2014-09-04 Impact factor: 6.937

5. BioSharing: curated and crowd-sourced metadata standards, databases and data policies in the life sciences.

Authors: Peter McQuilton; Alejandra Gonzalez-Beltran; Philippe Rocca-Serra; Milo Thurston; Allyson Lister; Eamonn Maguire; Susanna-Assunta Sansone
Journal: Database (Oxford) Date: 2016-05-17 Impact factor: 3.451

6. Editorial: Nucleic Acids Research annual Web Server Issue in 2016.

Authors:
Journal: Nucleic Acids Res Date: 2016-07-08 Impact factor: 16.971

7. BioJS: an open source standard for biological visualisation - its status in 2014.

Authors: Manuel Corpas; Rafael Jimenez; Seth J Carbon; Alex García; Leyla Garcia; Tatyana Goldberg; John Gomez; Alexis Kalderimis; Suzanna E Lewis; Ian Mulvany; Aleksandra Pawlik; Francis Rowland; Gustavo Salazar; Fabian Schreiber; Ian Sillitoe; William H Spooner; Anil S Thanki; José M Villaveces; Guy Yachdav; Henning Hermjakob
Journal: F1000Res Date: 2014-02-13

8. Tools and data services registry: a community effort to document bioinformatics resources.

Authors: Jon Ison; Kristoffer Rapacki; Hervé Ménager; Matúš Kalaš; Emil Rydza; Piotr Chmura; Christian Anthon; Niall Beard; Karel Berka; Dan Bolser; Tim Booth; Anthony Bretaudeau; Jan Brezovsky; Rita Casadio; Gianni Cesareni; Frederik Coppens; Michael Cornell; Gianmauro Cuccuru; Kristian Davidsen; Gianluca Della Vedova; Tunca Dogan; Olivia Doppelt-Azeroual; Laura Emery; Elisabeth Gasteiger; Thomas Gatter; Tatyana Goldberg; Marie Grosjean; Björn Grüning; Manuela Helmer-Citterich; Hans Ienasescu; Vassilios Ioannidis; Martin Closter Jespersen; Rafael Jimenez; Nick Juty; Peter Juvan; Maximilian Koch; Camille Laibe; Jing-Woei Li; Luana Licata; Fabien Mareuil; Ivan Mičetić; Rune Møllegaard Friborg; Sebastien Moretti; Chris Morris; Steffen Möller; Aleksandra Nenadic; Hedi Peterson; Giuseppe Profiti; Peter Rice; Paolo Romano; Paola Roncaglia; Rabie Saidi; Andrea Schafferhans; Veit Schwämmle; Callum Smith; Maria Maddalena Sperotto; Heinz Stockinger; Radka Svobodová Vařeková; Silvio C E Tosatto; Victor de la Torre; Paolo Uva; Allegra Via; Guy Yachdav; Federico Zambelli; Gert Vriend; Burkhard Rost; Helen Parkinson; Peter Løngreen; Søren Brunak
Journal: Nucleic Acids Res Date: 2015-11-03 Impact factor: 16.971

9. The 2016 database issue of Nucleic Acids Research and an updated molecular biology database collection.

Authors: Daniel J Rigden; Xosé M Fernández-Suárez; Michael Y Galperin
Journal: Nucleic Acids Res Date: 2016-01-04 Impact factor: 16.971
