
The Ensembl REST API: Ensembl Data for Any Language.

Andrew Yates1, Kathryn Beal1, Stephen Keenan1, William McLaren1, Miguel Pignatelli1, Graham R S Ritchie2, Magali Ruffier1, Kieron Taylor1, Alessandro Vullo1, Paul Flicek2.   

Abstract

MOTIVATION: We present a Web service to access Ensembl data using Representational State Transfer (REST). The Ensembl REST server enables the easy retrieval of a wide range of Ensembl data by most programming languages, using standard formats such as JSON and FASTA while minimizing client work. We also introduce bindings to the popular Ensembl Variant Effect Predictor tool permitting large-scale programmatic variant analysis independent of any specific programming language.
AVAILABILITY AND IMPLEMENTATION: The Ensembl REST API can be accessed at http://rest.ensembl.org and source code is freely available under an Apache 2.0 license from http://github.com/Ensembl/ensembl-rest.
© The Author 2014. Published by Oxford University Press.


Year:  2014        PMID: 25236461      PMCID: PMC4271150          DOI: 10.1093/bioinformatics/btu613

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


1 INTRODUCTION

Ensembl data (Flicek et al., 2014) are accessible in a variety of ways, including our genome browser, the BioMart data-mining tool (Kinsella et al., 2011), the Bioconductor R package (Gentleman et al., 2004) and viewers such as Dalliance (Down et al., 2011). Direct programmatic access has historically required a native client that interacts with the database in its own programming language. This approach required reimplementing functionality across multiple languages, which was costly to maintain and was one reason we focused solely on a Perl API for Ensembl. Third-party Ensembl API bindings do exist, but they may struggle to keep pace with new developments and can become out of date. Remote procedure calls and Web services are a widely accepted way to provide a single programming interface to multiple languages. SOAP is one popular such technology (McWilliam et al., 2013), but it imposes significant setup and processing overhead on the client. Newer Web services are based on the Representational State Transfer (REST) pattern (Fielding, 2000). REST reuses HTTP technology to send and receive data in the same way a Web browser requests and receives a Web page via uniform resource locators (URLs), and it imposes no restrictions on the format of the returned data. The Distributed Annotation System (Jenkinson et al., 2008) was an attempt to design a generic REST system for biological data, but it required custom client libraries and an XML format that was seen as verbose and inflexible. We present a set of REST bindings to access Ensembl data and tools, exposing these data in simple formats that are well understood by a large proportion of programming languages.

2 IMPLEMENTATION

2.1 The REST API

Ensembl REST API calls are based on simple URLs that specify both the data required and the format returned. For example, to request the protein sequence for BRCA2 (ENSP00000439902) as a JSON document, it is only necessary to enter the following URL in a Web browser: http://rest.ensembl.org/sequence/id/ENSP00000439902.json.

The components of the URL define the desired data and/or action. For example, /sequence performs sequence retrieval, while /vep provides access to the results of the Ensembl Variant Effect Predictor (VEP) (McLaren et al., 2010). With similar commands, users can retrieve features such as genes, orthologs, variants, genomic alignments and gene trees, or perform actions such as converting coordinates between assemblies.

Incorporating Ensembl data into any analysis requires only an HTTP library and a JSON parser. HTTPS is supported for secure client access. Below is an example of a request from Python that prints the number of variants overlapping BRAF (ENSG00000157764):

```python
import requests

# URL of the endpoint returning the variants that overlap ENSG00000157764
url = ''
r = requests.get(url)
if r.status_code != 200:
    raise Exception('Bad response')
print(len(r.json()))
```

Required parameters are embedded within the URL; in the first example, ENSP00000439902 was a required parameter. Optional parameters are specified as simple 'key=value' pairs appended to the URL. Whenever possible, the server infers parameters from others; for example, the species is determined from an Ensembl gene or transcript stable ID. HTTP headers can be used to control the output format or to enable on-the-fly gzip compression. Each endpoint emits four general output formats (JSON, XML, YAML and JSON-P), in addition to specialized formats such as PhyloXML (Han and Zmasek, 2009). Clients of our public REST server are rate limited to 15 requests per second, i.e. 54 000 requests per hour.
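A minimal sketch of these URL conventions in Python, using only the standard library, is given below. It places required parameters in the path, appends optional parameters as key=value pairs, and selects JSON output either through a file extension or a request header. The helper names `build_url` and `fetch_json`, the `/overlap/id/` endpoint with `feature=variation`, and the use of the `Content-Type` request header for format selection are assumptions for illustration rather than a description of the service; check the endpoint documentation at http://rest.ensembl.org before relying on them.

```python
import gzip
import json
import urllib.parse
import urllib.request

# Base URL from the text; the paper notes that HTTPS is also supported.
BASE = "https://rest.ensembl.org"


def build_url(path, required=(), optional=None, extension=None):
    """Build an endpoint URL (hypothetical helper).

    Required parameters become extra path segments; optional parameters are
    appended as key=value pairs; `extension` (e.g. "json") selects the format
    in the style of /sequence/id/ENSP00000439902.json.
    """
    segments = [path.strip("/")]
    segments += [urllib.parse.quote(str(p), safe="") for p in required]
    url = BASE + "/" + "/".join(segments)
    if extension:
        url += "." + extension
    if optional:
        url += "?" + urllib.parse.urlencode(optional)
    return url


def fetch_json(url, gzip_ok=True):
    """GET `url` and decode the JSON body (requires network access)."""
    # Assumption: the server reads the desired output format from Content-Type.
    headers = {"Content-Type": "application/json"}
    if gzip_ok:
        # Ask for on-the-fly gzip compression of the response.
        headers["Accept-Encoding"] = "gzip"
    request = urllib.request.Request(url, headers=headers)
    # urlopen raises urllib.error.HTTPError for non-2xx status codes.
    with urllib.request.urlopen(request) as response:
        body = response.read()
        if response.headers.get("Content-Encoding") == "gzip":
            body = gzip.decompress(body)
    return json.loads(body)
```

Under these assumptions, `len(fetch_json(build_url("overlap/id", ["ENSG00000157764"], {"feature": "variation"})))` would give the number of variants overlapping BRAF. Using `urllib` instead of `requests` keeps the sketch free of third-party dependencies, and because `urlopen` raises on error statuses no explicit status check is needed.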
To help users manage this or any future limit, each response from the REST server includes HTTP headers describing the current limits for the client's IP address and how long remains before the limit resets. Once the limit is exceeded, the server returns a Retry-After header, and the client is expected to wait for that period before making a new request. All endpoints are accompanied by automatically generated documentation available from http://rest.ensembl.org. Each page lists the endpoint's parameters with brief descriptions and example values, and shows example URLs alongside their output and example clients written in popular programming languages such as Python, Perl and Ruby. A higher-level user guide, covering version migration, best practices and more advanced clients, is also provided at https://github.com/Ensembl/ensembl-rest/wiki.
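The header-driven back-off described above could be handled on the client roughly as follows. This is a sketch, not an official client: only `Retry-After` is named in the text, so the `X-RateLimit-Remaining` and `X-RateLimit-Reset` header names, the reading of `X-RateLimit-Reset` as a number of seconds and the use of HTTP status 429 are assumptions. The hypothetical `Throttle` class additionally spaces requests to stay under 15 requests per second; its clock and sleep functions are injectable so the logic can be tested without waiting.

```python
import time

# Only Retry-After is named in the text; the other two names are assumptions.
RETRY_AFTER = "Retry-After"
REMAINING = "X-RateLimit-Remaining"
RESET = "X-RateLimit-Reset"


def seconds_to_wait(status, headers):
    """Return how long to sleep before the next request.

    `headers` is any mapping from header name to string value.
    """
    if RETRY_AFTER in headers:
        # Limit exceeded: wait for the period the server asks for.
        return float(headers[RETRY_AFTER])
    if status == 429:
        # Assumed "Too Many Requests" status without Retry-After: back off 1 s.
        return 1.0
    if headers.get(REMAINING) == "0" and RESET in headers:
        # Quota used up: wait until the window resets (assumed to be seconds).
        return float(headers[RESET])
    return 0.0


class Throttle:
    """Keep successive calls at least 1/rate seconds apart."""

    def __init__(self, rate=15.0, clock=time.monotonic, sleep=time.sleep):
        self.interval = 1.0 / rate   # 15 requests/s on the public server
        self.clock = clock
        self.sleep = sleep
        self.last = None             # time of the previous call, if any

    def wait(self):
        now = self.clock()
        if self.last is not None:
            gap = self.interval - (now - self.last)
            if gap > 0:
                self.sleep(gap)
                now += gap
        self.last = now
```

A caller would invoke `throttle.wait()` before each request and sleep for `seconds_to_wait(status, response_headers)` after it, so the client both paces itself and obeys the server when a limit is reached.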

2.2 Large-scale variant annotation

During the beta phase of the REST API, we observed significant traffic to the VEP endpoint from users annotating human variation. Individual IP addresses sent ∼3 million requests to the VEP endpoint, which we interpreted as full annotation of human variomes. This process is inefficient: using HTTP GET and rate limited to 15 requests per second, annotating a single human variome would take 2.3 days (3 million requests ÷ 15 requests per second = 200 000 s). In response to this use case, we extended the Ensembl REST API to accept up to 1000 variants in a single HTTP POST. Using VEP's offline cache files enables our service to annotate large quantities of variants without exceeding HTTP time-outs. Benchmarking (Table 1) has shown annotation rates of ∼1000 variants per second for a single sample (HG00096) extracted from the 1000 Genomes Phase 1 data (The 1000 Genomes Project Consortium, 2012). Annotating a human variome within an hour is therefore feasible using our public server.
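The batching idea can be sketched as splitting a variant list into chunks of at most 1000 and serializing each chunk as one POST body, together with the arithmetic behind the 2.3-day estimate. The function names are hypothetical, and the `{"variants": [...]}` body shape, as well as sending it to a path such as `/vep/homo_sapiens/region`, are assumptions to check against the endpoint documentation.

```python
import json
import math

MAX_BATCH = 1000  # maximum variants per POST, as stated in the text


def batches(items, size=MAX_BATCH):
    """Yield consecutive slices of `items` holding at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def vep_payload(variants):
    """Serialize one batch as a JSON POST body (assumed {"variants": [...]} shape)."""
    if len(variants) > MAX_BATCH:
        raise ValueError("at most %d variants per request" % MAX_BATCH)
    return json.dumps({"variants": list(variants)})


def rate_limited_days(n_variants, per_request, requests_per_second=15):
    """Lower bound, in days, imposed by the request rate limit alone."""
    n_requests = math.ceil(n_variants / per_request)
    return n_requests / requests_per_second / 86400
```

Under the rate limit alone, 3 million single-variant GET requests need about 2.3 days, whereas 3000 batched POST requests need only about 200 s, so the server-side annotation rate, such as the ∼1000 variants per second in Table 1, is likely to become the limiting factor instead.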
Table 1. Benchmarking 3.5 million non-synonymous single nucleotide variants from three Amazon Elastic Compute Cloud (EC2) locations

EC2 location    Elapsed time (s)    Variants per second
Ireland         3631                946.66
VA, USA         3508                998.59
Singapore       5385                652.45

Benchmarks are averaged over three runs with a single Perl program with nine concurrent connections.
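The benchmark client is not described beyond its use of nine concurrent connections. As an illustration only (in Python rather than Perl, and not the program used for Table 1), fixed-size batches could be dispatched over a thread pool as below. `annotate_all` and `post_batch` are hypothetical names; `post_batch` stands for a callable that sends one batch, for example as an HTTP POST, and returns its annotations. A shared throttle may still be needed to stay within the server's request limit.

```python
from concurrent.futures import ThreadPoolExecutor


def annotate_all(variants, post_batch, workers=9, batch_size=1000):
    """Send fixed-size batches over `workers` threads; keep results in input order.

    `post_batch` is a hypothetical callable that submits one batch (for
    instance as an HTTP POST) and returns a list of annotations for it.
    """
    chunks = [variants[i:i + batch_size]
              for i in range(0, len(variants), batch_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map yields results in chunk order even though the
        # requests themselves run concurrently.
        per_chunk = list(pool.map(post_batch, chunks))
    return [annotation for result in per_chunk for annotation in result]
```

Threads suit this workload because each worker spends most of its time waiting on the network rather than computing.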


3 DISCUSSION

The Ensembl REST API can be used to query Ensembl data resources and tools from a variety of programming languages, and it enables flexible programmatic access previously supported only by our Perl API. The reduced setup cost for clients means that users can work with the latest Ensembl data without needing to track our regular API releases. Supporting POST requests for VEP enables the annotation of large-scale variation datasets without the need to download or host the VEP code or cache files. HTTP has also proven to be a more robust data protocol than MySQL, improving the experience of users worldwide. A number of native third-party APIs have been developed to access the REST API from languages such as Python, R and JavaScript, demonstrating its usefulness to these increasingly popular bioinformatics languages. JavaScript applications such as Wasabi (http://wasabiapp.org) import Ensembl gene trees and genome multiple sequence alignments via REST, creating a seamless link between tool and data. RNAcentral (Bateman et al., 2011) displays non-coding gene models alongside Ensembl annotation in Genoverse (http://genoverse.org/), an HTML5 genome browser, using data from our REST API. REST has proven to be a sustainable model for distributing genomic data to multiple programming languages. We plan to expand the coverage of Ensembl data and tools hosted by the service and to provide more output formats, such as VCF output from our VEP endpoint. We will continue to work with tool developers to ensure the service is suitable for their purposes.
REFERENCES

1.  RNAcentral: A vision for an international database of RNA sequences.

Authors:  Alex Bateman; Shipra Agrawal; Ewan Birney; Elspeth A Bruford; Janusz M Bujnicki; Guy Cochrane; James R Cole; Marcel E Dinger; Anton J Enright; Paul P Gardner; Daniel Gautheret; Sam Griffiths-Jones; Jen Harrow; Javier Herrero; Ian H Holmes; Hsien-Da Huang; Krystyna A Kelly; Paul Kersey; Ana Kozomara; Todd M Lowe; Manja Marz; Simon Moxon; Kim D Pruitt; Tore Samuelsson; Peter F Stadler; Albert J Vilella; Jan-Hinnerk Vogel; Kelly P Williams; Mathew W Wright; Christian Zwieb
Journal:  RNA       Date:  2011-09-22       Impact factor: 4.942

2.  Bioconductor: open software development for computational biology and bioinformatics.

Authors:  Robert C Gentleman; Vincent J Carey; Douglas M Bates; Ben Bolstad; Marcel Dettling; Sandrine Dudoit; Byron Ellis; Laurent Gautier; Yongchao Ge; Jeff Gentry; Kurt Hornik; Torsten Hothorn; Wolfgang Huber; Stefano Iacus; Rafael Irizarry; Friedrich Leisch; Cheng Li; Martin Maechler; Anthony J Rossini; Gunther Sawitzki; Colin Smith; Gordon Smyth; Luke Tierney; Jean Y H Yang; Jianhua Zhang
Journal:  Genome Biol       Date:  2004-09-15       Impact factor: 13.583

3.  Dalliance: interactive genome viewing on the web.

Authors:  Thomas A Down; Matias Piipari; Tim J P Hubbard
Journal:  Bioinformatics       Date:  2011-01-19       Impact factor: 6.937

4.  Deriving the consequences of genomic variants with the Ensembl API and SNP Effect Predictor.

Authors:  William McLaren; Bethan Pritchard; Daniel Rios; Yuan Chen; Paul Flicek; Fiona Cunningham
Journal:  Bioinformatics       Date:  2010-06-18       Impact factor: 6.937

5.  Ensembl BioMarts: a hub for data retrieval across taxonomic space.

Authors:  Rhoda J Kinsella; Andreas Kähäri; Syed Haider; Jorge Zamora; Glenn Proctor; Giulietta Spudich; Jeff Almeida-King; Daniel Staines; Paul Derwent; Arnaud Kerhornou; Paul Kersey; Paul Flicek
Journal:  Database (Oxford)       Date:  2011-07-23       Impact factor: 3.451

6.  Integrating biological data--the Distributed Annotation System.

Authors:  Andrew M Jenkinson; Mario Albrecht; Ewan Birney; Hagen Blankenburg; Thomas Down; Robert D Finn; Henning Hermjakob; Tim J P Hubbard; Rafael C Jimenez; Philip Jones; Andreas Kähäri; Eugene Kulesha; José R Macías; Gabrielle A Reeves; Andreas Prlić
Journal:  BMC Bioinformatics       Date:  2008-07-22       Impact factor: 3.169

7.  Ensembl 2014.

Authors:  Paul Flicek; M Ridwan Amode; Daniel Barrell; Kathryn Beal; Konstantinos Billis; Simon Brent; Denise Carvalho-Silva; Peter Clapham; Guy Coates; Stephen Fitzgerald; Laurent Gil; Carlos García Girón; Leo Gordon; Thibaut Hourlier; Sarah Hunt; Nathan Johnson; Thomas Juettemann; Andreas K Kähäri; Stephen Keenan; Eugene Kulesha; Fergal J Martin; Thomas Maurel; William M McLaren; Daniel N Murphy; Rishi Nag; Bert Overduin; Miguel Pignatelli; Bethan Pritchard; Emily Pritchard; Harpreet S Riat; Magali Ruffier; Daniel Sheppard; Kieron Taylor; Anja Thormann; Stephen J Trevanion; Alessandro Vullo; Steven P Wilder; Mark Wilson; Amonida Zadissa; Bronwen L Aken; Ewan Birney; Fiona Cunningham; Jennifer Harrow; Javier Herrero; Tim J P Hubbard; Rhoda Kinsella; Matthieu Muffato; Anne Parker; Giulietta Spudich; Andy Yates; Daniel R Zerbino; Stephen M J Searle
Journal:  Nucleic Acids Res       Date:  2013-12-06       Impact factor: 16.971

8.  phyloXML: XML for evolutionary biology and comparative genomics.

Authors:  Mira V Han; Christian M Zmasek
Journal:  BMC Bioinformatics       Date:  2009-10-27       Impact factor: 3.169

9.  An integrated map of genetic variation from 1,092 human genomes.

Authors:  Goncalo R Abecasis; Adam Auton; Lisa D Brooks; Mark A DePristo; Richard M Durbin; Robert E Handsaker; Hyun Min Kang; Gabor T Marth; Gil A McVean
Journal:  Nature       Date:  2012-11-01       Impact factor: 49.962

10.  Analysis Tool Web Services from the EMBL-EBI.

Authors:  Hamish McWilliam; Weizhong Li; Mahmut Uludag; Silvano Squizzato; Young Mi Park; Nicola Buso; Andrew Peter Cowley; Rodrigo Lopez
Journal:  Nucleic Acids Res       Date:  2013-05-13       Impact factor: 16.971
