UCSC Data Integrator and Variant Annotation Integrator.

Angie S Hinrichs, Brian J Raney, Matthew L Speir, Brooke Rhead, Jonathan Casper, Donna Karolchik, Robert M Kuhn, Kate R Rosenbloom, Ann S Zweig, David Haussler, W James Kent.

Abstract

Two new tools on the UCSC Genome Browser web site provide improved ways of combining information from multiple datasets, optionally including the user's own custom track data and/or data from track hubs. The Data Integrator combines columns from multiple data tracks, showing all items from the first track along with overlapping items from the other tracks. The Variant Annotation Integrator is tailored to adding functional annotations to variant calls; it offers a more restricted set of underlying data tracks but adds predictions of each variant's consequences for any overlapping or nearby gene transcript. When available, it optionally adds further annotations, including effect prediction scores from dbNSFP for missense mutations, ENCODE regulatory summary tracks and conservation scores.
AVAILABILITY AND IMPLEMENTATION: The web tools are freely available at http://genome.ucsc.edu/ and the underlying database is available for download at http://hgdownload.cse.ucsc.edu/. The software (written in C and JavaScript) is available from https://genome-store.ucsc.edu/ and is free for academic and non-profit use; commercial users must obtain a license.
CONTACT: angie@soe.ucsc.edu
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
© The Author 2016. Published by Oxford University Press.

Year:  2016        PMID: 26740527      PMCID: PMC4848401          DOI: 10.1093/bioinformatics/btv766

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


1 Introduction

The UCSC Genome Browser database (Karolchik et al.; Speir et al.) contains a wealth of genomic datasets. One of its strengths is the suite of web tools at http://genome.ucsc.edu/ for visualizing and extracting data from the database in combination with the user's own custom track data as well as data provided via track hubs (Raney et al.). For over a decade, the Table Browser (Karolchik et al.) has provided the capability to extract textual data from any data track, with many options such as filtering by values, format conversion and sequence output. However, its ability to combine data from multiple tracks is limited. It provides an intersection function that retains items in the selected track that overlap with items in a second track; however, the identities and attributes of items in the second track are not retained, so it is not possible to associate items in one track with items in another track. Over the years, many users of the UCSC Genome Browser have requested that capability, so we have developed a new tool, the Data Integrator (DI), to provide a flexible and open-ended query interface for combining data columns from multiple tracks. One common request is to add annotations to a user's custom track of variant calls, for example the name of any gene that the variant intersects. Variant functional annotation is a well-studied problem (although by no means solved) for which many tools have been developed, such as the Ensembl Variant Effect Predictor (McLaren et al.), snpEff (Cingolani et al.) and ANNOVAR (Wang et al.). Inspired by those tools, we have added the Variant Annotation Integrator (VAI) with a focus on data tracks that may help to predict whether a given variant may modify a gene or regulatory region.

2 Data integrator

The Data Integrator is a single-page web application for building a query on Genome Browser tracks, including user custom tracks and track hubs. It is reachable from the ‘Tools’ menu in the top navigation bar of the Genome Browser web site (http://genome.ucsc.edu). The ‘Help’ menu links to the Data Integrator User’s Guide (http://genome.ucsc.edu/goldenPath/help/hgIntegratorHelp.html). The steps for building a query are as follows:

1. Select the genome and assembly version to use.
2. Select the genomic region(s) to annotate: the entire genome, the position range viewed in the Genome Browser, or a list of regions. The position range box accepts search terms such as gene symbols, cytobands, sequence accessions, or keywords.
3. Add data source(s) by selecting a track from the menus in the ‘Add Data Source’ section and clicking the ‘Add’ button. Tracks can be dragged and dropped to change their order, or removed by clicking the ‘X’ icon. The track at the top of the list is the primary track; all of its items within the chosen region(s) will appear in the output. Items from the remaining tracks are included only if they overlap an item from the primary track and lie within the chosen region.
4. Configure the output. It may be downloaded to a local file, optionally compressed with gzip, or viewed in the browser window. Click the ‘Choose fields…’ button to select or deselect the data source columns that appear in the output.
5. Click the ‘Get output’ button to start the query.

The results of the query are returned as tab-separated text, with the selected columns of the primary data source followed by the selected columns of the additional data sources.
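The overlap rule described above (every primary-track item in the region is kept; items from other tracks appear only when they overlap a primary item and fall in the region) can be illustrated with a minimal Python sketch. This is not the Data Integrator's implementation (the tool itself is written in C and JavaScript); the `Item` type, the `integrate` function, the 0-based half-open coordinates and the comma-joined layout for multiple overlaps are all assumptions made for illustration.

```python
from typing import NamedTuple


class Item(NamedTuple):
    """A track item; coordinates are assumed 0-based, half-open (BED-like)."""
    chrom: str
    start: int
    end: int
    name: str


def overlaps(a: Item, b: Item) -> bool:
    """True if the two intervals share at least one base on the same sequence."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def integrate(primary: list[Item], others: list[list[Item]], region: Item) -> list[str]:
    """Return one tab-separated row per primary item inside the region.

    Each further column holds the names of items from one additional track
    that overlap both the primary item and the region (empty if none).
    A simple nested scan is used for clarity, not speed.
    """
    rows = []
    for p in primary:
        if not overlaps(p, region):
            continue  # primary items outside the chosen region are dropped
        row = [p.name]
        for track in others:
            hits = [s.name for s in track
                    if overlaps(s, p) and overlaps(s, region)]
            row.append(",".join(hits))
        rows.append("\t".join(row))
    return rows


if __name__ == "__main__":
    region = Item("chr1", 0, 1000, "region")
    genes = [Item("chr1", 100, 200, "geneA"),
             Item("chr1", 500, 600, "geneB"),
             Item("chr1", 2000, 2100, "geneC")]
    snps = [Item("chr1", 150, 151, "rs1"),
            Item("chr1", 199, 200, "rs2"),
            Item("chr1", 300, 301, "rs3")]
    for line in integrate(genes, [snps], region):
        print(line)
```

Note that, unlike a plain intersection, a primary item with no overlapping secondary items still produces a row, and the identities of the overlapping secondary items are retained.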

3 Variant annotation integrator

While the DI offers the entire set of tracks with none selected by default, the VAI requires variant calls as its input and requires a gene annotation track. A limited selection of additional tracks is offered. The benefit of this imposed query structure is that a more in-depth analysis of possible functional impacts of each variant can be performed. Like the DI, the VAI is reachable from the ‘Tools’ menu. Documentation appears following the configuration section. Variant calls can be provided in Variant Call Format (VCF; Danecek et al.), Personal Genome SNP format (http://genome.ucsc.edu/FAQ/FAQformat.html#format10), or as a collection of dbSNP rsNNNNN identifiers. The VAI predicts functional consequences based on the location of a variant within a gene transcript if applicable, using terms from the Sequence Ontology (SO; Eilbeck et al.) to facilitate downstream analysis and comparison of results with other variant analysis tools. For example, a single-base substitution in the coding region of a transcript is classified as synonymous_variant, missense_variant, stop_lost or stop_gained (see Supplementary Table S2 for the complete set of consequence SO terms used by the VAI). The gene annotation set should be chosen carefully, because small differences in transcript annotations can result in significant differences in predicted consequences (McCarthy et al.). The Genome Browser database includes a variety of gene annotation sets; experimentation in the VAI may help to choose the best one for a particular purpose. The VAI offers additional data sources when they are available in the chosen assembly database; these may be added if desired. For identifying putative regulatory regions, two summary tracks from ENCODE (The ENCODE Project Consortium, 2012) are offered for hg19/GRCh37 and hg38/GRCh38: DNase Clusters and Transcription Factor ChIP-Seq peaks.
For missense coding variants in hg19/GRCh37 and hg38/GRCh38, dbNSFP (Liu et al.) provides scores from several tools that predict the likelihood of harm from an amino acid change. Variant identifiers from dbSNP (Wheeler et al.) are added if the variant coordinates match. Conservation scores and elements from phastCons (Siepel et al.) and scores from phyloP (Pollard et al.) can be added if available. The user may add filters to reduce the volume of output, for example restricting the output to annotations with a particular consequence type, or filtering by overlap with common variants from dbSNP or with conserved elements. Output may be either an HTML-formatted table in the web browser window, or tab-separated text that can be viewed in the web browser window or downloaded as a file, optionally compressed with gzip. Columns are comparable to the output of the Variant Effect Predictor (McLaren et al.). To make it clear to users that the VAI is only a research tool, and should in no way be used to inform medical decisions, a dialog pops up the first time a user gets output from the VAI, requiring a click-through agreement.
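Downloaded tab-separated output can also be filtered after the fact. The Python sketch below keeps only rows whose consequence column contains one of a chosen set of SO terms. The column names (`Variant`, `Consequence`), the presence of a plain header row, the comma-separated consequence values and the file name in the usage comment are assumptions for illustration; the actual VAI column layout should be checked against real output before use.

```python
import csv
import gzip
import io
from typing import Iterable


def filter_by_consequence(lines: Iterable[str], wanted: set[str],
                          column: str = "Consequence") -> list[dict]:
    """Return rows whose `column` lists at least one term in `wanted`.

    `lines` is any iterable of text lines whose first line is a
    tab-separated header (an assumption about the file layout).
    """
    reader = csv.DictReader(lines, delimiter="\t")
    return [row for row in reader
            if wanted & set(row[column].split(","))]


def open_text(path: str):
    """Open a plain or gzip-compressed text file for reading."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", newline="")
    return open(path, "r", newline="")


if __name__ == "__main__":
    # In-memory stand-in for a downloaded file; with a real download one
    # would use, e.g.: with open_text("vai_output.txt.gz") as fh: ...
    demo = io.StringIO(
        "Variant\tConsequence\n"
        "rs1\tmissense_variant\n"
        "rs2\tintron_variant\n"
        "rs3\tstop_gained,splice_region_variant\n"
    )
    for row in filter_by_consequence(demo, {"missense_variant", "stop_gained"}):
        print(row["Variant"], row["Consequence"])
```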

4 Conclusion

The DI and VAI offer two new, complementary ways to interactively mine data from the UCSC Genome Browser database, making a useful addition to the existing Table Browser. Future plans for the DI include adding selection from related database tables where applicable, drag-reorder of output columns, filters on inputs and outputs and more options for configuring intersection of items. Future plans for the VAI include VCF output, HGVS notation (http://www.hgvs.org/mutnomen) and more annotation choices.
References (first 10 of 15)

1.  The UCSC Table Browser data retrieval tool.

Authors:  Donna Karolchik; Angela S Hinrichs; Terrence S Furey; Krishna M Roskin; Charles W Sugnet; David Haussler; W James Kent
Journal:  Nucleic Acids Res       Date:  2004-01-01       Impact factor: 16.971

2.  The UCSC Genome Browser Database.

Authors:  D Karolchik; R Baertsch; M Diekhans; T S Furey; A Hinrichs; Y T Lu; K M Roskin; M Schwartz; C W Sugnet; D J Thomas; R J Weber; D Haussler; W J Kent
Journal:  Nucleic Acids Res       Date:  2003-01-01       Impact factor: 16.971

3.  A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3.

Authors:  Pablo Cingolani; Adrian Platts; Le Lily Wang; Melissa Coon; Tung Nguyen; Luan Wang; Susan J Land; Xiangyi Lu; Douglas M Ruden
Journal:  Fly (Austin)       Date:  2012 Apr-Jun       Impact factor: 2.160

4.  Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes.

Authors:  Adam Siepel; Gill Bejerano; Jakob S Pedersen; Angie S Hinrichs; Minmei Hou; Kate Rosenbloom; Hiram Clawson; John Spieth; Ladeana W Hillier; Stephen Richards; George M Weinstock; Richard K Wilson; Richard A Gibbs; W James Kent; Webb Miller; David Haussler
Journal:  Genome Res       Date:  2005-07-15       Impact factor: 9.043

5.  dbNSFP v3.0: A One-Stop Database of Functional Predictions and Annotations for Human Nonsynonymous and Splice-Site SNVs.

Authors:  Xiaoming Liu; Chunlei Wu; Chang Li; Eric Boerwinkle
Journal:  Hum Mutat       Date:  2016-01-05       Impact factor: 4.878

6.  The variant call format and VCFtools.

Authors:  Petr Danecek; Adam Auton; Goncalo Abecasis; Cornelis A Albers; Eric Banks; Mark A DePristo; Robert E Handsaker; Gerton Lunter; Gabor T Marth; Stephen T Sherry; Gilean McVean; Richard Durbin
Journal:  Bioinformatics       Date:  2011-06-07       Impact factor: 6.937

7.  The Sequence Ontology: a tool for the unification of genome annotations.

Authors:  Karen Eilbeck; Suzanna E Lewis; Christopher J Mungall; Mark Yandell; Lincoln Stein; Richard Durbin; Michael Ashburner
Journal:  Genome Biol       Date:  2005-04-29       Impact factor: 13.583

8.  Database resources of the National Center for Biotechnology Information.

Authors:  David L Wheeler; Tanya Barrett; Dennis A Benson; Stephen H Bryant; Kathi Canese; Vyacheslav Chetvernin; Deanna M Church; Michael DiCuccio; Ron Edgar; Scott Federhen; Lewis Y Geer; Yuri Kapustin; Oleg Khovayko; David Landsman; David J Lipman; Thomas L Madden; Donna R Maglott; James Ostell; Vadim Miller; Kim D Pruitt; Gregory D Schuler; Edwin Sequeira; Steven T Sherry; Karl Sirotkin; Alexandre Souvorov; Grigory Starchenko; Roman L Tatusov; Tatiana A Tatusova; Lukas Wagner; Eugene Yaschenko
Journal:  Nucleic Acids Res       Date:  2006-12-14       Impact factor: 16.971

9.  An integrated encyclopedia of DNA elements in the human genome.

Authors: 
Journal:  Nature       Date:  2012-09-06       Impact factor: 49.962

10.  The UCSC Genome Browser database: 2016 update.

Authors:  Matthew L Speir; Ann S Zweig; Kate R Rosenbloom; Brian J Raney; Benedict Paten; Parisa Nejad; Brian T Lee; Katrina Learned; Donna Karolchik; Angie S Hinrichs; Steve Heitner; Rachel A Harte; Maximilian Haeussler; Luvina Guruvadoo; Pauline A Fujita; Christopher Eisenhart; Mark Diekhans; Hiram Clawson; Jonathan Casper; Galt P Barber; David Haussler; Robert M Kuhn; W James Kent
Journal:  Nucleic Acids Res       Date:  2015-11-20       Impact factor: 16.971
