
Deriving the consequences of genomic variants with the Ensembl API and SNP Effect Predictor.

William McLaren, Bethan Pritchard, Daniel Rios, Yuan Chen, Paul Flicek, Fiona Cunningham.

Abstract

SUMMARY: A tool to predict the effect that newly discovered genomic variants have on known transcripts is indispensable for prioritizing and categorizing such variants. In Ensembl, a web-based tool (the SNP Effect Predictor) and an API can now functionally annotate variants in all species supported by Ensembl and Ensembl Genomes. AVAILABILITY: The Ensembl SNP Effect Predictor can be accessed via the Ensembl website at http://www.ensembl.org/. The Ensembl API is open source software; installation instructions are at http://www.ensembl.org/info/docs/api/api_installation.html.

Year:  2010        PMID: 20562413      PMCID: PMC2916720          DOI: 10.1093/bioinformatics/btq330

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


1 INTRODUCTION

As the costs of resequencing and genotyping fall, increasing amounts of variation data are being produced that cannot be annotated effectively without access to considerable computational resources and genomic annotation databases. Often the most valuable information about a variant is the effect its observed alleles have on transcripts. This can guide the selection of variants for genotyping studies and, in turn, contribute to the discovery of new drug targets and other biologically significant loci. Deriving this information manually is laborious and error-prone, impractical for large datasets and impossible without access to suitable genomic annotation resources. Also of critical interest is whether a dataset contains any novel variant positions, and what information is available for known variant loci. Although these answers are available through dbSNP (Sherry et al., 1999), submitting data to the NCBI for processing and annotation can take months and requires the data to be made public. Even before the developments reported in this article, Ensembl (Flicek et al., 2010) could be used to derive similar annotation, but only by setting up a full local Ensembl database containing the variant information and running scripts from the Ensembl Variation database production pipeline. Other tools for annotating single nucleotide polymorphisms (SNPs) in humans are comprehensively reviewed by Karchin (2009). Existing methods of deriving the effects of variants can be limiting: many present too high a hurdle in terms of turnaround time, privacy or ease of use; others are limited to particular species. To address this, Ensembl has been extended to include an easy-to-use web-based tool for deriving variant consequences, as well as programmatic access to the same functionality through the Ensembl Perl API.

2 IMPLEMENTATION

2.1 SNP Effect Predictor

The Ensembl project provides access to genomic annotation for numerous species via its web-based genome browser, as well as programmatic access via its object-oriented Perl API. The SNP Effect Predictor tool on the Ensembl website is accessed via the ‘Manage your data’ link on any species-specific Ensembl page (e.g. http://www.ensembl.org/Homo_sapiens/). It uses the API calls described below to provide consequence prediction without the need to write code. The SNP Effect Predictor can be used for all species in Ensembl, including those with no existing variation dataset. Users upload lists of variant positions and alleles via an HTML form. Input for each variant consists simply of a chromosome (or a contig name, in the absence of assembled chromosomes), start and end coordinates, a strand designation and a set of alleles. Users can then select text or HTML output, the latter incorporating hyperlinks to loci, transcripts and genes in the Ensembl genome browser. The output includes the following: Ensembl stable identifiers for the relevant transcript and gene; transcript-relative coordinates; possible amino acids; and the identifier of any existing variant co-located with the user-defined variant. Because a variant may co-locate with more than one transcript, one line of output is provided for each co-located transcript. Consequence types predicted by Ensembl are shown in transcript context in Figure 1, with further detail at http://www.ensembl.org/info/docs/variation/index.html.
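The input fields and the "one output line per co-located transcript" rule described above can be illustrated with a minimal Python sketch. This is not the SNP Effect Predictor's code. The whitespace-separated column order (chromosome, start, end, slash-separated alleles, strand) is an assumption, and the `Transcript` record, `parse_variant_line` and `output_rows` are hypothetical names used for illustration only.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Tuple


@dataclass(frozen=True)
class VariantInput:
    """One user-supplied variant with the fields described in the text."""
    chrom: str                # chromosome, or contig name if unassembled
    start: int                # start coordinate
    end: int                  # end coordinate
    strand: int               # +1 (forward) or -1 (reverse)
    alleles: Tuple[str, ...]  # e.g. ("A", "G")


@dataclass(frozen=True)
class Transcript:
    """Hypothetical minimal transcript record: identifiers plus genomic span."""
    transcript_id: str
    gene_id: str
    chrom: str
    start: int
    end: int


_STRANDS = {"+": 1, "1": 1, "-": -1, "-1": -1}


def parse_variant_line(line: str) -> VariantInput:
    """Parse an assumed whitespace-separated line: chrom start end alleles strand."""
    chrom, start, end, alleles, strand = line.split()
    return VariantInput(chrom, int(start), int(end), _STRANDS[strand],
                        tuple(alleles.split("/")))


def output_rows(variant: VariantInput,
                transcripts: Iterable[Transcript]) -> Iterator[Tuple[VariantInput, str, str]]:
    """Yield one (variant, transcript_id, gene_id) row per co-located transcript."""
    for tr in transcripts:
        # Closed-interval overlap test on the same sequence region
        if tr.chrom == variant.chrom and tr.start <= variant.end and variant.start <= tr.end:
            yield variant, tr.transcript_id, tr.gene_id


if __name__ == "__main__":
    variant = parse_variant_line("1 1000 1000 A/G +")
    transcripts = [
        Transcript("TX_A", "GENE_1", "1", 500, 1500),
        Transcript("TX_B", "GENE_1", "1", 900, 2000),
        Transcript("TX_C", "GENE_2", "1", 3000, 4000),
    ]
    for _, tx_id, gene_id in output_rows(variant, transcripts):
        print(tx_id, gene_id)  # two rows: TX_A and TX_B
```

A single variant overlapping two transcripts produces two rows, which mirrors the one-line-per-co-location output described above.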
Fig. 1.

Consequence types predicted by Ensembl in the context of transcript structure. The other types shown apply to non-protein coding genes.

User-uploaded variations can subsequently be viewed in the context of their location on the Ensembl browser, with each uploaded file given its own track on the browser's location view.
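The idea behind Figure 1, which places a variant position relative to transcript structure, can be sketched in Python as follows. The sketch makes several assumptions. It handles only a forward-strand transcript with 1-based closed coordinates. It uses a 5000 bp upstream/downstream window. It ignores splice sites and regulatory regions. Its labels are illustrative, so consult the Ensembl documentation linked above for the actual consequence types and their definitions. `classify_position` is a hypothetical helper, not part of the Ensembl API.

```python
from typing import List, Optional, Tuple

FLANK_BP = 5000  # assumed upstream/downstream window; illustrative only


def classify_position(pos: int,
                      exons: List[Tuple[int, int]],
                      coding_start: Optional[int] = None,
                      coding_end: Optional[int] = None) -> str:
    """Classify a 1-based position against one forward-strand transcript.

    exons: sorted, non-overlapping closed intervals (start, end).
    coding_start/coding_end: genomic CDS bounds, or None if non-coding.
    """
    tx_start, tx_end = exons[0][0], exons[-1][1]
    if pos < tx_start:
        return "UPSTREAM" if tx_start - pos <= FLANK_BP else "INTERGENIC"
    if pos > tx_end:
        return "DOWNSTREAM" if pos - tx_end <= FLANK_BP else "INTERGENIC"
    if not any(start <= pos <= end for start, end in exons):
        return "INTRONIC"
    if coding_start is None or coding_end is None:
        return "WITHIN_NON_CODING_GENE"
    if pos < coding_start:
        return "5PRIME_UTR"
    if pos > coding_end:
        return "3PRIME_UTR"
    return "CODING"  # to be refined further by comparing codons


if __name__ == "__main__":
    exons = [(100, 200), (300, 400)]
    for p in (50, 120, 160, 250, 380, 450, 10_000):
        print(p, classify_position(p, exons, coding_start=150, coding_end=350))
```

On a reverse-strand transcript, the upstream/downstream and 5′/3′ labels would swap. A fuller implementation would handle that case explicitly.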

2.2 Ensembl API

The Ensembl API can be installed on any operating system that supports Perl and MySQL, and can be configured to use any combination of local or remote databases. The Ensembl Variation API (Chen et al., 2010; Rios et al., 2010) retrieves variation data such as SNPs, insertions and deletions from Ensembl databases. Entities such as variants are represented as objects, which are created by adaptors that act as factories for specific object types. Example code demonstrating the use of the API to derive consequences for a list of variant positions is shown in Supplementary Figure 1. Documentation for the API is available at http://www.ensembl.org/info/docs/Pdoc/ensembl-variation/index.html. Given a variant position, the API retrieves overlapping transcripts from the Ensembl Core database and determines where in the transcript structure the variant falls. If the variant falls within a coding exon, a new codon is derived for each variant allele and compared with the reference codon. The location of the variant relative to regulatory regions is also assessed using the Ensembl Functional Genomics database, where available. The results, including amino acid changes and relative positions in the cDNA and peptide sequences, are stored in the resulting transcript variation objects, along with one or more named consequence types. At present Ensembl provides only a Perl API. However, the open source nature of the project has enabled the creation of Python APIs (PyCogent, http://pycogent.sourceforge.net/examples/query_ensembl.html; PyGr, http://code.google.com/p/pygr/wiki/PygrOnEnsembl). As yet, these do not cover the full scope of Ensembl and hence do not include consequence prediction functionality.
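The codon comparison step described above can be sketched in Python for the simplest case, a single-base substitution. The sketch assumes the following: the reference coding sequence and the alternate allele are already on the transcript strand; the standard genetic code applies; and the consequence labels are illustrative. `snv_codon_effect` and `CODON_TABLE` are hypothetical names, and this is not the Ensembl Perl implementation.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...)
_BASES = "TCAG"
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(_BASES, repeat=3), _AMINO_ACIDS)}


def snv_codon_effect(cds: str, cds_pos: int, alt: str) -> dict:
    """Compare reference and alternate codons for a single-base substitution.

    cds: reference coding sequence (5'->3' on the transcript strand).
    cds_pos: 1-based position of the variant within the CDS.
    alt: alternate base, already on the transcript strand.
    """
    cds, alt = cds.upper(), alt.upper()
    if len(alt) != 1 or not 1 <= cds_pos <= len(cds):
        raise ValueError("expected a single-base allele inside the CDS")
    idx = cds_pos - 1
    offset = idx % 3                  # position within the codon (0, 1 or 2)
    codon_start = idx - offset
    ref_codon = cds[codon_start:codon_start + 3]
    if len(ref_codon) != 3:
        raise ValueError("variant falls in an incomplete terminal codon")
    alt_codon = ref_codon[:offset] + alt + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        consequence = "SYNONYMOUS_CODING"
    elif alt_aa == "*":
        consequence = "STOP_GAINED"
    elif ref_aa == "*":
        consequence = "STOP_LOST"
    else:
        consequence = "NON_SYNONYMOUS_CODING"
    return {
        "peptide_position": codon_start // 3 + 1,
        "codons": f"{ref_codon}/{alt_codon}",
        "amino_acids": f"{ref_aa}/{alt_aa}",
        "consequence": consequence,
    }


if __name__ == "__main__":
    cds = "ATGGCCTGGTAA"  # hypothetical CDS: Met-Ala-Trp-stop
    for pos, alt in [(6, "T"), (4, "A"), (9, "A"), (10, "C")]:
        print(pos, alt, snv_codon_effect(cds, pos, alt))
```

Insertions, deletions and multi-base substitutions would require deriving the altered reading frame rather than swapping one base. That is why this sketch rejects anything other than a single base.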

3 RESULTS

The SNP Effect Predictor tool can be used to quickly and accurately predict the effects of variants on Ensembl-annotated transcripts. Up to 750 variant loci can be uploaded in a file to http://www.ensembl.org/. Within a species, the time taken to return results scales linearly with the number of variants uploaded. Calculation time also varies between species, depending on the number of transcripts. A file containing 750 variants in Homo sapiens takes ∼35 s to return results, and an equivalent calculation in Danio rerio takes 20 s. Users with more than 750 variants may download a standalone script that runs locally and produces identical results. The script can be configured to connect to the public Ensembl databases or to any combination of local and remote databases. The script also supports a wider range of input file formats, including the commonly used pileup variant format. A simple web interface to powerful algorithms that transparently process large data volumes is a valuable asset to users without computing expertise. It also serves those who need a quick and easy way to retrieve annotation for novel variants. Integrating this tool with the extensive annotation available on the Ensembl website will facilitate interpretation and analysis of the data. Direct use of the Ensembl Variation API enables users to incorporate consequence prediction into their own variation software and pipelines, providing predicted consequences for an unlimited number of variants. By optimizing code and database access times, it is possible to retrieve consequences for 1000 distinct variants in H. sapiens in <30 s, and in D. rerio in <15 s. The flexibility of the Ensembl API means that consequences can be predicted for any species with an Ensembl gene set, or using any valid Ensembl database on users' own systems.
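The reported timings can be turned into per-variant costs with simple arithmetic, sketched here in Python. The numbers are the approximate figures quoted above, and the API figures are upper bounds. The text states linear scaling only for the web tool within a species. Extrapolating the API figures in the same way is an assumption, as are the helper names (`seconds_per_variant`, `extrapolate_seconds`, `needs_local_script`).

```python
# Approximate timings reported in the text: (number of variants, seconds).
# The API figures are upper bounds ("<30 s", "<15 s").
WEB_TIMINGS = {"Homo sapiens": (750, 35.0), "Danio rerio": (750, 20.0)}
API_TIMINGS = {"Homo sapiens": (1000, 30.0), "Danio rerio": (1000, 15.0)}
WEB_UPLOAD_LIMIT = 750  # maximum variants per web upload, as stated in the text


def seconds_per_variant(timings: dict, species: str) -> float:
    """Average time per variant implied by a reported timing."""
    n_variants, seconds = timings[species]
    return seconds / n_variants


def extrapolate_seconds(timings: dict, species: str, n_variants: int) -> float:
    """Assumes linear scaling: total time = n_variants * per-variant time."""
    return n_variants * seconds_per_variant(timings, species)


def needs_local_script(n_variants: int) -> bool:
    """True when a variant file exceeds the web upload limit."""
    return n_variants > WEB_UPLOAD_LIMIT


if __name__ == "__main__":
    for species in ("Homo sapiens", "Danio rerio"):
        print(species,
              round(seconds_per_variant(WEB_TIMINGS, species), 4),
              round(seconds_per_variant(API_TIMINGS, species), 4))
```

For H. sapiens this gives 35/750 ≈ 0.047 s per variant via the web tool, compared with under 0.03 s per variant via the optimized API. If that per-variant cost held constant, 10,000 human variants would take under about 300 s through the API.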
Using these features in combination with others in the API enables the creation of advanced pipelines that produce biologically important information from high-throughput experimental data. Such information is invaluable both for screening variants and for studying phenotypically linked variants.

Funding: Wellcome Trust; European Community's Seventh Framework Programme (FP7/2007-2013) under grant agreement number 200754 – the GEN2PHEN project.

Conflict of Interest: none declared.

REFERENCES

1.  dbSNP-database for single nucleotide polymorphisms and other classes of minor genetic variation.

Authors:  S T Sherry; M Ward; K Sirotkin
Journal:  Genome Res       Date:  1999-08       Impact factor: 9.043

2.  A database and API for variation, dense genotyping and resequencing data.

Authors:  Daniel Rios; William M McLaren; Yuan Chen; Ewan Birney; Arne Stabenau; Paul Flicek; Fiona Cunningham
Journal:  BMC Bioinformatics       Date:  2010-05-11       Impact factor: 3.169

3.  Next generation tools for the annotation of human SNPs.

Authors:  Rachel Karchin
Journal:  Brief Bioinform       Date:  2009-01       Impact factor: 11.622

4.  Novel insights into the genomic basis of citrus canker based on the genome sequences of two strains of Xanthomonas fuscans subsp. aurantifolii.

Authors:  Leandro M Moreira; Nalvo F Almeida; Neha Potnis; Luciano A Digiampietri; Said S Adi; Julio C Bortolossi; Ana C da Silva; Aline M da Silva; Fabrício E de Moraes; Julio C de Oliveira; Robson F de Souza; Agda P Facincani; André L Ferraz; Maria I Ferro; Luiz R Furlan; Daniele F Gimenez; Jeffrey B Jones; Elliot W Kitajima; Marcelo L Laia; Rui P Leite; Milton Y Nishiyama; Julio Rodrigues Neto; Letícia A Nociti; David J Norman; Eric H Ostroski; Haroldo A Pereira; Brian J Staskawicz; Renata I Tezza; Jesus A Ferro; Boris A Vinatzer; João C Setubal
Journal:  BMC Genomics       Date:  2010-04-13       Impact factor: 3.969

5.  Ensembl's 10th year.

Authors:  Paul Flicek; Bronwen L Aken; Benoit Ballester; Kathryn Beal; Eugene Bragin; Simon Brent; Yuan Chen; Peter Clapham; Guy Coates; Susan Fairley; Stephen Fitzgerald; Julio Fernandez-Banet; Leo Gordon; Stefan Gräf; Syed Haider; Martin Hammond; Kerstin Howe; Andrew Jenkinson; Nathan Johnson; Andreas Kähäri; Damian Keefe; Stephen Keenan; Rhoda Kinsella; Felix Kokocinski; Gautier Koscielny; Eugene Kulesha; Daniel Lawson; Ian Longden; Tim Massingham; William McLaren; Karine Megy; Bert Overduin; Bethan Pritchard; Daniel Rios; Magali Ruffier; Michael Schuster; Guy Slater; Damian Smedley; Giulietta Spudich; Y Amy Tang; Stephen Trevanion; Albert Vilella; Jan Vogel; Simon White; Steven P Wilder; Amonida Zadissa; Ewan Birney; Fiona Cunningham; Ian Dunham; Richard Durbin; Xosé M Fernández-Suarez; Javier Herrero; Tim J P Hubbard; Anne Parker; Glenn Proctor; James Smith; Stephen M J Searle
Journal:  Nucleic Acids Res       Date:  2009-11-11       Impact factor: 16.971
