
An interactive genome browser of association results from the UK10K cohorts project.

Matthias Geihs, Ying Yan, Klaudia Walter, Jie Huang, Yasin Memari, Josine L Min, Daniel Mead, Tim J Hubbard, Nicholas J Timpson, Thomas A Down, Nicole Soranzo.

Abstract

High-throughput sequencing technologies survey genetic variation at genome scale and are increasingly used to study the contribution of rare and low-frequency genetic variants to human traits. As part of the Cohorts arm of the UK10K project, genetic variants called from low read-depth (average 7×) whole-genome sequencing of 3621 cohort individuals were analysed for statistical associations with 64 different phenotypic traits of biomedical importance. Here, we describe a novel genome browser, based on the Biodalliance platform, developed to provide interactive access to the association results of the project.
AVAILABILITY AND IMPLEMENTATION: The browser is available at http://www.uk10k.org/dalliance.html. Source code for the Biodalliance platform is available under a BSD license from http://github.com/dasmoth/dalliance, and for the LD-display plugin and backend from http://github.com/dasmoth/ldserv.
© The Author 2015. Published by Oxford University Press.

Year:  2015        PMID: 26315906      PMCID: PMC4673976          DOI: 10.1093/bioinformatics/btv491

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


1 Introduction

Rare and low-frequency genetic variants play an important role in determining the population variance of complex traits and disease; however, until recently their systematic evaluation has been beyond the reach of empirical population-based genetic studies. The UK10K project was designed to characterize rare and low-frequency variation genome-wide in the UK population, and to study its contribution to a broad spectrum of biomedically relevant quantitative traits and diseases with different predicted genetic architectures. The data generated by the different arms of the UK10K project and their use are described elsewhere (UK10K Consortium, 2015). Here, we describe the development of a novel browser for genetic association data based on the Biodalliance platform (Down et al., 2011), designed to facilitate the retrieval of genotype–phenotype association results from the UK10K-cohorts arm of the project and their visualization in the context of different annotation features (Fig. 1). In particular, we developed a novel interactive display for genetic variants showing both the strength of association with a trait and the pattern of linkage disequilibrium (LD) within the cohort.
Fig. 1.

Example screenshot of the UK10K Genome Browser. The panel shows a 30-kb region of the human genome around the CETP gene, where UK10K SNPs associate with HDL cholesterol with highly significant P-values. Colouring identifies groups of independent SNPs.

2 Implementation

Biodalliance is a pure JavaScript genome browser that uses the HTML5 canvas for its displays. Its preferred approach for fetching data is to use indexed binary files such as bigWig or bigBed (Kent et al., 2010), which can be served from ordinary web servers and are frequently organized as track-hubs (Raney et al., 2013). Because Biodalliance is pure JavaScript, it can be embedded in any normal webpage, making it easy to create custom browsers such as the one presented here.

However, creating a display of genetic association data posed two additional challenges. First, there is no ideal file format for representing the association results themselves, which comprise both a P-value from the association test and additional information about the SNP. Second, our desire to present LD data in a flexible, interactive manner does not fit well with the usual indexed-file approach of retrieving a single block of data corresponding to a range of genomic coordinates.

To represent the association results, we used a pair of binary files. A bigWig file contains the association test result (expressed as a log P-value) for each position in the genome where a variant was tested. A bigBed file supplements this with the variant ID and other information such as SNP consequence predictions. We added support in the core Biodalliance code for merging multiple data sources into a single track. The Biodalliance renderer and stylesheet system run on this merged dataset, so in our association tracks we can show the association score (from the bigWig file) as the y-axis position of a feature while choosing a point style based on the SNP consequence (from the bigBed file). This encoding had the added benefit of enabling the browser to search for the next variant in the genome above an association score threshold (exploiting information encoded in the reduced-resolution views of bigWig files) or for a variant by identifier (using a name index in the bigBed file).
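The merging of two data sources into a single track, and the threshold search built on it, can be illustrated with a minimal Python sketch. This is not the Biodalliance implementation (which is JavaScript and reads indexed binary files): the `Feature` class, the function names, the coordinates, the variant IDs and the scores are all hypothetical, and the search is a plain linear scan rather than a lookup in reduced-resolution bigWig views.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Feature:
    """One variant in the merged track (hypothetical structure)."""
    chrom: str
    pos: int
    score: Optional[float] = None       # association score from the bigWig-like source
    variant_id: Optional[str] = None    # from the bigBed-like source
    consequence: Optional[str] = None   # from the bigBed-like source


def merge_sources(scores, annotations):
    """Join per-position scores and annotations on (chrom, pos)."""
    merged = {}
    for chrom, pos, score in scores:
        merged[(chrom, pos)] = Feature(chrom, pos, score=score)
    for chrom, pos, variant_id, consequence in annotations:
        feature = merged.setdefault((chrom, pos), Feature(chrom, pos))
        feature.variant_id = variant_id
        feature.consequence = consequence
    # Return features sorted by chromosome name, then position.
    return [merged[key] for key in sorted(merged)]


def next_above(features, chrom, start, threshold):
    """First feature on `chrom` after `start` with score >= threshold, else None."""
    for feature in features:
        if (feature.chrom == chrom and feature.pos > start
                and feature.score is not None and feature.score >= threshold):
            return feature
    return None


# Made-up example data: scores play the role of the bigWig file,
# annotations the role of the bigBed file.
scores = [("chr16", 1000, 2.1), ("chr16", 2000, 12.4), ("chr16", 3000, 30.7)]
annotations = [
    ("chr16", 2000, "var_A", "intron_variant"),
    ("chr16", 3000, "var_B", "upstream_gene_variant"),
]

features = merge_sources(scores, annotations)
hit = next_above(features, "chr16", 1000, 10.0)
```

A renderer working on `features` could then take the y-axis position from `score` and the point style from `consequence`, which is the division of labour between the two files described above.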
Dynamic display of LD information poses a different challenge: all-against-all LD data are unwieldy, yet we do not know in advance which variants users might want to select as a reference. One option would be to create a Biodalliance plug-in that calculates LD on the client side. However, this would require complete genotype data for the cohort, which are also very large, and would raise data security concerns. Instead, we developed a simple server component that calculates LD on the fly for one or more selected reference variants, allowing the genotype data themselves to remain securely on the server. We also developed Biodalliance plug-ins for selecting reference variants and for communicating with the LD server. Once again, the capability to merge results from multiple data sources proved useful: the LD scores are simply merged into the rest of the feature data before they are passed to the Biodalliance renderer.

Having extended the Biodalliance code in this way, bigWig and bigBed files containing P-values were prepared for each combination of the traits in Supplementary Appendix S1 and the statistical tests in Supplementary Appendix S2. A description of the statistical tests implemented is given in Supplementary Appendix 3. The large number of resulting files was organized in a track-hub, which the UK10K Genome Browser was configured to use by default. The browser functionality is described in Supplementary Appendix 4. A reference summary of test statistics and key navigation functions is given in Supplementary Appendix 5. Finally, a step-by-step user tutorial for the browser is provided in Supplementary Appendix 6.
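The on-the-fly LD calculation against a selected reference variant can be sketched as follows. The text does not specify how the server computes its LD statistic, so this example assumes one common estimator: r² taken as the squared Pearson correlation of unphased genotype dosages (0, 1 or 2 copies of the alternate allele). The function names, variant labels and genotypes are hypothetical.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two genotype dosage vectors."""
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("dosage vectors must be non-empty and of equal length")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # A monomorphic variant has no variance, so r^2 is undefined;
        # this sketch reports 0.0 in that case.
        return 0.0
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return (sxy * sxy) / (sxx * syy)


def ld_against_reference(genotypes, reference_id):
    """Return r^2 between the reference variant and every other variant."""
    reference = genotypes[reference_id]
    return {
        variant_id: r_squared(reference, dosages)
        for variant_id, dosages in genotypes.items()
        if variant_id != reference_id
    }


# Made-up dosages for six individuals at four variants.
genotypes = {
    "ref":     [0, 1, 2, 1, 0, 2],
    "flipped": [2, 1, 0, 1, 2, 0],   # same signal with alleles swapped
    "partial": [0, 2, 0, 1, 1, 2],
    "mono":    [0, 0, 0, 0, 0, 0],   # no variation in this sample
}

ld_scores = ld_against_reference(genotypes, "ref")
```

Only the per-variant r² values in `ld_scores` would need to leave the server, which matches the design goal stated above: the client merges these scores into its feature data, while individual-level genotypes stay on the server.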

3 Results

The UK10K Genome Browser builds on the core features of the Biodalliance genome browser. In addition, a number of custom features were designed specifically to aid navigation and interpretation of the UK10K-cohorts data. First, the browser supports dynamic estimation of local LD statistics (the r² metric) for inferring statistical independence between local association signals and one or more SNPs of interest. Calculations are performed in real time using the UK10K-cohorts sequence data. Second, it allows local association results to be exported as high-quality Scalable Vector Graphics (SVG) images that can be used in publications. Finally, it provides dynamic annotation of each SNP, including minor allele frequency and SNP quality metrics, as well as links to SNP annotation resources such as dbSNP and to functional genome annotation track-hubs such as those generated by ENCODE (ENCODE Project Consortium, 2012), NIH Roadmap (Roadmap Epigenomics Consortium, 2015) and BLUEPRINT (Adams et al., 2012).

4 Conclusions

In summary, the UK10K browser provides an intuitive and efficient platform for accessing association statistics for common, low-frequency and rare variants against a large number of human phenotypic traits. As efforts progress to systematically map the contribution of human genetic variation to healthy and disease phenotypes, and to integrate it with functional genomic resources, platforms like ours will become essential instruments for the integration and cross-validation of genetic discoveries within the scientific community.
References

1.  BigWig and BigBed: enabling browsing of large distributed datasets.

Authors:  W J Kent; A S Zweig; G Barber; A S Hinrichs; D Karolchik
Journal:  Bioinformatics       Date:  2010-07-17       Impact factor: 6.937

2.  BLUEPRINT to decode the epigenetic signature written in blood.

Authors:  David Adams; Lucia Altucci; Stylianos E Antonarakis; Juan Ballesteros; Stephan Beck; Adrian Bird; Christoph Bock; Bernhard Boehm; Elias Campo; Andrea Caricasole; Fredrik Dahl; Emmanouil T Dermitzakis; Tariq Enver; Manel Esteller; Xavier Estivill; Anne Ferguson-Smith; Jude Fitzgibbon; Paul Flicek; Claudia Giehl; Thomas Graf; Frank Grosveld; Roderic Guigo; Ivo Gut; Kristian Helin; Jonas Jarvius; Ralf Küppers; Hans Lehrach; Thomas Lengauer; Åke Lernmark; David Leslie; Markus Loeffler; Elizabeth Macintyre; Antonello Mai; Joost H A Martens; Saverio Minucci; Willem H Ouwehand; Pier Giuseppe Pelicci; Hèléne Pendeville; Bo Porse; Vardhman Rakyan; Wolf Reik; Martin Schrappe; Dirk Schübeler; Martin Seifert; Reiner Siebert; David Simmons; Nicole Soranzo; Salvatore Spicuglia; Michael Stratton; Hendrik G Stunnenberg; Amos Tanay; David Torrents; Alfonso Valencia; Edo Vellenga; Martin Vingron; Jörn Walter; Spike Willcocks
Journal:  Nat Biotechnol       Date:  2012-03-07       Impact factor: 54.908

3.  Dalliance: interactive genome viewing on the web.

Authors:  Thomas A Down; Matias Piipari; Tim J P Hubbard
Journal:  Bioinformatics       Date:  2011-01-19       Impact factor: 6.937

4.  An integrated encyclopedia of DNA elements in the human genome.

Authors:  ENCODE Project Consortium
Journal:  Nature       Date:  2012-09-06       Impact factor: 49.962

5.  Integrative analysis of 111 reference human epigenomes.

Authors:  Anshul Kundaje; Wouter Meuleman; Jason Ernst; Misha Bilenky; Angela Yen; Alireza Heravi-Moussavi; Pouya Kheradpour; Zhizhuo Zhang; Jianrong Wang; Michael J Ziller; Viren Amin; John W Whitaker; Matthew D Schultz; Lucas D Ward; Abhishek Sarkar; Gerald Quon; Richard S Sandstrom; Matthew L Eaton; Yi-Chieh Wu; Andreas R Pfenning; Xinchen Wang; Melina Claussnitzer; Yaping Liu; Cristian Coarfa; R Alan Harris; Noam Shoresh; Charles B Epstein; Elizabeta Gjoneska; Danny Leung; Wei Xie; R David Hawkins; Ryan Lister; Chibo Hong; Philippe Gascard; Andrew J Mungall; Richard Moore; Eric Chuah; Angela Tam; Theresa K Canfield; R Scott Hansen; Rajinder Kaul; Peter J Sabo; Mukul S Bansal; Annaick Carles; Jesse R Dixon; Kai-How Farh; Soheil Feizi; Rosa Karlic; Ah-Ram Kim; Ashwinikumar Kulkarni; Daofeng Li; Rebecca Lowdon; GiNell Elliott; Tim R Mercer; Shane J Neph; Vitor Onuchic; Paz Polak; Nisha Rajagopal; Pradipta Ray; Richard C Sallari; Kyle T Siebenthall; Nicholas A Sinnott-Armstrong; Michael Stevens; Robert E Thurman; Jie Wu; Bo Zhang; Xin Zhou; Arthur E Beaudet; Laurie A Boyer; Philip L De Jager; Peggy J Farnham; Susan J Fisher; David Haussler; Steven J M Jones; Wei Li; Marco A Marra; Michael T McManus; Shamil Sunyaev; James A Thomson; Thea D Tlsty; Li-Huei Tsai; Wei Wang; Robert A Waterland; Michael Q Zhang; Lisa H Chadwick; Bradley E Bernstein; Joseph F Costello; Joseph R Ecker; Martin Hirst; Alexander Meissner; Aleksandar Milosavljevic; Bing Ren; John A Stamatoyannopoulos; Ting Wang; Manolis Kellis
Journal:  Nature       Date:  2015-02-19       Impact factor: 69.504

6.  Track data hubs enable visualization of user-defined genome-wide annotations on the UCSC Genome Browser.

Authors:  Brian J Raney; Timothy R Dreszer; Galt P Barber; Hiram Clawson; Pauline A Fujita; Ting Wang; Ngan Nguyen; Benedict Paten; Ann S Zweig; Donna Karolchik; W James Kent
Journal:  Bioinformatics       Date:  2013-11-13       Impact factor: 6.937

7.  The UK10K project identifies rare variants in health and disease.

Authors:  Klaudia Walter; Josine L Min; Jie Huang; Lucy Crooks; Yasin Memari; Shane McCarthy; John R B Perry; ChangJiang Xu; Marta Futema; Daniel Lawson; Valentina Iotchkova; Stephan Schiffels; Audrey E Hendricks; Petr Danecek; Rui Li; James Floyd; Louise V Wain; Inês Barroso; Steve E Humphries; Matthew E Hurles; Eleftheria Zeggini; Jeffrey C Barrett; Vincent Plagnol; J Brent Richards; Celia M T Greenwood; Nicholas J Timpson; Richard Durbin; Nicole Soranzo
Journal:  Nature       Date:  2015-09-14       Impact factor: 49.962
