InterMineR: an R package for InterMine databases.

Konstantinos A Kyritsis, Bing Wang, Julie Sullivan, Rachel Lyne, Gos Micklem.

Abstract

SUMMARY: InterMineR is a package designed to provide a flexible interface between the R programming environment and biological databases built using the InterMine platform. The package offers access to the flexible query builder and the library of term enrichment tools of the InterMine framework, as well as interoperability with other Bioconductor packages. This facilitates automation of data retrieval tasks as well as downstream analysis with existing statistical tools in the R environment.
AVAILABILITY AND IMPLEMENTATION: InterMineR is free and open source, released under the LGPL licence and available from the Bioconductor project and GitHub (https://bioconductor.org/packages/release/bioc/html/InterMineR.html, https://github.com/intermine/interMineR).
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
© The Author(s) 2019. Published by Oxford University Press.

Year:  2019        PMID: 30668641      PMCID: PMC6736411          DOI: 10.1093/bioinformatics/btz039

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


1 Introduction

The problem of storing, accessing and analyzing huge amounts of data is acutely felt in the life sciences. InterMine is a data warehouse framework that provides the ability to rapidly access, retrieve and analyze a variety of biological data (Smith et al., 2012). With intuitive tools such as gene set statistical analysis, customized queries and pre-defined templates that incorporate popular queries for specific types of biological data, InterMine databases facilitate the analysis of heterogeneous biological information. Many model organism groups have adopted InterMine (see http://registry.intermine.org), resulting in its use in many studies. The R programming language is primarily characterized by its powerful statistical and graphical capabilities and is one of the tools of choice for the field of data science (R Core Team, 2008). The language has gained further popularity through its use by Bioconductor, an open source software project based on R, which aims to facilitate the integrative analysis of biological data (Gentleman et al., 2004; Huber et al., 2015). The InterMineR package has been developed to provide access to InterMine databases through the R programming environment, and its output formats are compatible with many Bioconductor workflows.

2 Implementation

2.1 Performing complex queries

InterMineR performs standard HTTP requests to the InterMine web service API through the httr package. Input lists of data identifiers are uploaded to InterMine, and the query results are returned as JSON or XML before being converted to human-readable data.frame or list R objects, which can easily be used for downstream analysis. The package provides access to the pre-defined search forms (template queries) of each InterMine instance, which can be explored and used through the getTemplates() and getTemplateQuery() functions, and it also allows users to create custom queries. Users can assign several different data identifiers as input to these queries and edit or add constraints as required.

The creation of custom queries in InterMineR is based on the data model of InterMine. Users define which data they want to select, and constraints can be added to any attribute type, for instance numeric (e.g. genomic locations) or text (e.g. gene identifiers). Constraints can be added, removed or modified to restrict the data returned by the query. To support this, the getModel() function retrieves detailed information about the available attributes of each InterMine database. The functions setConstraints() and setQuery() assist users in creating custom queries and in assigning multiple data identifiers to a specific filter constraint. These functions avoid the manual design and manipulation of lengthy query list objects: the constraints and the query itself are defined in two steps, producing an R object of class InterMineR that constitutes the final query.
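The request flow described above can be illustrated independently of R. The following is a minimal Python sketch, not the InterMineR implementation: it assumes the InterMine PathQuery XML layout (a `query` element with a `view` attribute and `constraint` children, with `ONE OF` constraints carrying nested `value` elements) and the `/query/results` web service path. The helper names `build_query_xml` and `build_request_url`, the base URL and the example paths are hypothetical choices made for illustration, and no request is actually sent.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Assumed service root, used only to show the shape of the URL; nothing is requested.
SERVICE_ROOT = "https://www.humanmine.org/humanmine/service"


def build_query_xml(select, constraints, model="genomic"):
    """Serialize selected columns and filter constraints as PathQuery-style XML.

    `select` is a list of attribute paths to return; `constraints` is a list of
    dicts with keys "path", "op" and "value". A list or tuple as "value" is
    written as a multi-value ONE OF constraint (assumed XML layout).
    """
    query = ET.Element("query", {"model": model, "view": " ".join(select)})
    for constraint in constraints:
        value = constraint["value"]
        if isinstance(value, (list, tuple)):
            # Several identifiers attached to a single filter constraint.
            node = ET.SubElement(
                query, "constraint", {"path": constraint["path"], "op": "ONE OF"}
            )
            for item in value:
                ET.SubElement(node, "value").text = str(item)
        else:
            ET.SubElement(
                query,
                "constraint",
                {"path": constraint["path"], "op": constraint["op"], "value": str(value)},
            )
    return ET.tostring(query, encoding="unicode")


def build_request_url(query_xml, fmt="json"):
    """Build the GET URL that would carry the query to the web service."""
    return SERVICE_ROOT + "/query/results?" + urlencode({"query": query_xml, "format": fmt})


if __name__ == "__main__":
    xml_text = build_query_xml(
        select=["Gene.symbol", "Gene.chromosomeLocation.start"],
        constraints=[
            {"path": "Gene.symbol", "op": "ONE OF", "value": ["PPARG", "INS"]},
            {"path": "Gene.chromosomeLocation.start", "op": ">=", "value": 1000},
        ],
    )
    print(xml_text)
    print(build_request_url(xml_text))
```

Keeping the constraints as plain dictionaries mirrors the two-step idea in the text: the constraints are defined first, then combined with the selected columns into one query object.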

2.2 Enrichment analysis

InterMineR also provides an interface between the statistical features of the R language and the gene set enrichment analysis provided by the InterMine framework. Specifically, InterMine provides Gene Ontology enrichment statistics as well as enrichment statistics for other annotation types (Smith et al., 2012). The function getWidgets() can be used to obtain the enrichment analysis 'widgets' of InterMine, which can then be used to calculate enrichment for a pre-defined list of biological entities. The hypergeometric distribution is used to calculate P-values, and various methods are available for multiple test correction. To facilitate visualization of the enrichment analysis results, the function convertToGeneAnswers() was designed. GeneAnswers is an R package that provides statistical and network visualization functions to explore possible relationships between a group of genes and a list of categories (e.g. Gene Ontology terms) (Feng et al., 2010, 2012; Huang et al.) (Supplementary Fig. S1).
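The statistics behind such an enrichment test can be sketched in a few lines. The Python example below is an illustration under stated assumptions, not the InterMine server code: it computes the upper-tail hypergeometric P-value for observing at least k annotated genes in a list of n genes drawn from a background of N genes, of which K carry the annotation, and it applies the Benjamini-Hochberg procedure as one common example of multiple test correction (the text does not say which methods are offered). The function names are hypothetical.

```python
from math import comb


def hypergeom_pvalue(k, n, K, N):
    """Return P(X >= k) for a hypergeometric draw.

    k: annotated genes observed in the list
    n: size of the gene list
    K: annotated genes in the background
    N: size of the background
    """
    total = comb(N, n)
    upper = min(n, K)
    # math.comb returns 0 when the second argument exceeds the first,
    # so impossible configurations contribute nothing to the sum.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    # Toy numbers: 8 of 20 listed genes carry a term held by 100 of 5000 genes.
    raw = [hypergeom_pvalue(8, 20, 100, 5000), hypergeom_pvalue(1, 20, 100, 5000)]
    print(raw)
    print(benjamini_hochberg(raw))
```

Exact integer binomial coefficients are used here for clarity; for very large backgrounds a log-space or library implementation would be preferable.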

2.3 Conversion functions for InterMineR query results

For better integration of the InterMineR package in Bioconductor workflows we created two new functions. The convertToGRanges() function converts genomic location data, retrieved by InterMineR queries, to GRanges objects, which constitute scalable data structures for annotated genomic ranges (Lawrence et al., 2013) (Supplementary Table S1). The GenomicRanges package, which defines the GRanges class, allows a host of range-based operations such as overlap queries and nearest-neighbour searches. The function convertToRangedSummarizedExperiment() was designed to facilitate the analysis of gene expression data and associated annotations that are retrieved from InterMineR queries. This function converts InterMineR query results to R objects of the class RangedSummarizedExperiment, a flexible class that keeps the information about genes (rows), samples (columns) and gene expression values in separate R objects (Morgan et al.).
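To show what a range-based overlap query means in practice, here is a small Python sketch. It is not the GenomicRanges implementation, which uses far more efficient interval structures; it assumes closed, 1-based intervals on named sequences, and the `Range` type and function names are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class Range(NamedTuple):
    """A genomic interval with closed, 1-based coordinates (assumed convention)."""
    seqname: str
    start: int
    end: int


def overlaps(a: Range, b: Range) -> bool:
    """Two ranges overlap if they share a sequence and at least one position."""
    return a.seqname == b.seqname and a.start <= b.end and b.start <= a.end


def find_overlaps(query: List[Range], subject: List[Range]) -> List[Tuple[int, int]]:
    """Return (query index, subject index) pairs for every overlapping pair.

    Brute-force comparison of all pairs, which is fine for a small illustration.
    """
    return [
        (i, j)
        for i, q in enumerate(query)
        for j, s in enumerate(subject)
        if overlaps(q, s)
    ]


if __name__ == "__main__":
    genes = [Range("chr1", 100, 200), Range("chr1", 500, 600), Range("chr2", 100, 200)]
    peaks = [Range("chr1", 150, 160), Range("chr2", 201, 300)]
    print(find_overlaps(peaks, genes))
```

In this toy input only the first peak falls inside a gene; the second peak starts one base after the chr2 gene ends, so it does not count as an overlap under the closed-interval convention.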

3 Conclusion

Programmatic access to the InterMine data model allows complex queries to be iterated and repeated, with the option to adjust specific filter constraints and values. With the InterMineR package, complex queries against different InterMine databases can be generated and the results analyzed with the wealth of statistical and graphical tools offered by the R language and the many Bioconductor packages. To facilitate InterMineR usage, vignettes with detailed examples are available from both the Bioconductor project and the GitHub repository of the package. In the future, a graphical user interface will be developed for InterMineR, based on the Shiny framework (Chang et al.). This aims to further simplify the design of custom queries and facilitate the use of the package by novice R users.
References

1. Using the Bioconductor GeneAnswers package to interpret gene lists.
Authors: Gang Feng; Pamela Shaw; Steven T Rosen; Simon M Lin; Warren A Kibbe
Journal: Methods Mol Biol. Date: 2012

2. Orchestrating high-throughput genomic analysis with Bioconductor.
Authors: Wolfgang Huber; Vincent J Carey; Robert Gentleman; Simon Anders; Marc Carlson; Benilton S Carvalho; Hector Corrada Bravo; Sean Davis; Laurent Gatto; Thomas Girke; Raphael Gottardo; Florian Hahne; Kasper D Hansen; Rafael A Irizarry; Michael Lawrence; Michael I Love; James MacDonald; Valerie Obenchain; Andrzej K Oleś; Hervé Pagès; Alejandro Reyes; Paul Shannon; Gordon K Smyth; Dan Tenenbaum; Levi Waldron; Martin Morgan
Journal: Nat Methods. Date: 2015-02

3. A collection of Bioconductor methods to visualize gene-list annotations.
Authors: Gang Feng; Pan Du; Nancy L Krett; Michael Tessel; Steven Rosen; Warren A Kibbe; Simon M Lin
Journal: BMC Res Notes. Date: 2010-01-19

4. Bioconductor: open software development for computational biology and bioinformatics.
Authors: Robert C Gentleman; Vincent J Carey; Douglas M Bates; Ben Bolstad; Marcel Dettling; Sandrine Dudoit; Byron Ellis; Laurent Gautier; Yongchao Ge; Jeff Gentry; Kurt Hornik; Torsten Hothorn; Wolfgang Huber; Stefano Iacus; Rafael Irizarry; Friedrich Leisch; Cheng Li; Martin Maechler; Anthony J Rossini; Gunther Sawitzki; Colin Smith; Gordon Smyth; Luke Tierney; Jean Y H Yang; Jianhua Zhang
Journal: Genome Biol. Date: 2004-09-15

5. InterMine: a flexible data warehouse system for the integration and analysis of heterogeneous biological data.
Authors: Richard N Smith; Jelena Aleksic; Daniela Butano; Adrian Carr; Sergio Contrino; Fengyuan Hu; Mike Lyne; Rachel Lyne; Alex Kalderimis; Kim Rutherford; Radek Stepan; Julie Sullivan; Matthew Wakeling; Xavier Watkins; Gos Micklem
Journal: Bioinformatics. Date: 2012-09-27

6. Software for computing and annotating genomic ranges.
Authors: Michael Lawrence; Wolfgang Huber; Hervé Pagès; Patrick Aboyoun; Marc Carlson; Robert Gentleman; Martin T Morgan; Vincent J Carey
Journal: PLoS Comput Biol. Date: 2013-08-08
