
Global Biobank Engine: enabling genotype-phenotype browsing for biobank summary statistics.

Gregory McInnes1, Yosuke Tanigawa1,2, Chris DeBoever2, Adam Lavertu1, Julia Eve Olivieri3, Matthew Aguirre2, Manuel A Rivas2.   

Abstract

SUMMARY: Large biobanks linking phenotype to genotype have led to an explosion of genetic association studies across a wide range of phenotypes. Sharing the knowledge generated by these resources with the scientific community remains a challenge due to patient privacy and the vast amount of data. Here, we present Global Biobank Engine (GBE), a web-based tool that enables exploration of the relationship between genotype and phenotype in biobank cohorts, such as the UK Biobank. GBE supports browsing for results from genome-wide association studies, phenome-wide association studies, gene-based tests and genetic correlation between phenotypes. We envision GBE as a platform that facilitates the dissemination of summary statistics from biobanks to the scientific and clinical communities.
AVAILABILITY AND IMPLEMENTATION: GBE currently hosts data from the UK Biobank and is freely available at biobankengine.stanford.edu.
© The Author(s) 2018. Published by Oxford University Press.

Year:  2019        PMID: 30520965      PMCID: PMC6612820          DOI: 10.1093/bioinformatics/bty999

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


1 Introduction

Population-scale biobanks linking rich phenotype and molecular data are transforming the landscape of biomedical research. UK Biobank, a long-term prospective cohort study, has collected array-genotyped variants from 500 000 individuals and linked them with medical records, activity monitors, imaging and survey data (Sudlow et al.). Availability of these data enables researchers to perform analyses across a broad range of phenotypes at an unprecedented scale (Bycroft, 2017). The value of large sequencing and genotyping efforts lies not only in primary publications but also in the dissemination of summary statistic data to the scientific community. Other large-scale efforts to sequence and analyze genetic data, such as ExAC and gnomAD (Lek et al.), have made data available to the scientific community at large via web browsers (Karczewski et al.). Browsers serve as an effective communication tool that enables researchers around the world to interrogate genetic statistics of interest. Often, these tools limit the information shared to summary statistics, which confers a decreased privacy risk for individuals included in the study (Erlich and Narayanan, 2014) and limits the computational resources required to interrogate the data. However, to date no such tool exists that offers researchers the opportunity to study the relationship between genotype and phenotype. Here, we present Global Biobank Engine (GBE), a web-based tool that presents summary statistics resulting from analysis of genotype-phenotype associations derived from data in population-scale biobanks. GBE serves as a means to communicate scientific discoveries to the scientific community without requiring sharing of individual-level data. In particular, we present results from genome-wide association studies (GWAS) and phenome-wide association studies (PheWAS) for White British individuals (n = 337 199) in UK Biobank, gene-level phenotype associations, genetic correlations and others.
Results for each analysis are pre-computed, allowing for rapid browsing. Phenotypes currently available in the browser are those made available by UK Biobank, including cancer, disease status, family history of disease, medication and quantitative measures, as well as computational groupings of phenotypes based on self-reported data and ICD-10 codes from hospital in-patient record data (as described in DeBoever et al.). We encourage use of GBE but note that case-control results are provided as general guides and may not have been subjected to the data quality, statistical and population genetics review that would normally be required for publication of clinical inference.

2 Features

GBE serves as a platform to host summary statistics that explore different facets of biobank data. Here, we describe the features available.

2.1 Phenotype page

The phenotype page presents a summary of the results of a GWAS run for a phenotype of interest. The first part of the page displays relevant data such as the sample count included in the GWAS as well as links to other analyses related to this phenotype (Fig. 1.A1). Next, the Manhattan plot is displayed, including all variants with P-value < 0.001 (Fig. 1.A2). Finally, a table with detailed information for each variant is included (Fig. 1.A3). The table can be subset to show all variants, protein-truncating variants (PTVs) only, or both PTVs and missense variants.
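The filtering described for the phenotype page can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not GBE's implementation: the row layout, the field names (`variant`, `p`, `consequence`) and the example values are hypothetical, and only the P-value threshold of 0.001 and the three table subsets come from the text.

```python
import math

# Hypothetical summary-statistic rows; names and values are illustrative only.
rows = [
    {"variant": "var_a", "p": 2e-12, "consequence": "ptv"},
    {"variant": "var_b", "p": 4e-5, "consequence": "missense"},
    {"variant": "var_c", "p": 0.2, "consequence": "intronic"},
    {"variant": "var_d", "p": 8e-4, "consequence": "intronic"},
]

P_THRESHOLD = 0.001  # display threshold stated in the text

# The three table subsets named in the text; None means "no consequence filter".
SUBSETS = {
    "all": None,
    "ptv": {"ptv"},
    "ptv_missense": {"ptv", "missense"},
}


def manhattan_points(rows, subset="all"):
    """Return (variant, -log10 P) pairs for variants passing both filters."""
    allowed = SUBSETS[subset]
    points = []
    for row in rows:
        if row["p"] >= P_THRESHOLD:
            continue  # not significant enough to be displayed
        if allowed is not None and row["consequence"] not in allowed:
            continue  # excluded by the chosen consequence subset
        points.append((row["variant"], -math.log10(row["p"])))
    return points
```

Calling `manhattan_points(rows, "ptv_missense")` keeps only the PTV and missense variants that also pass the P-value threshold; plotting the second element of each pair against genomic position would give the y-axis of a Manhattan plot.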
Fig. 1.

Screenshots of phenotype page (left) and variant page (right). Shown here is the phenotype page for asthma in the UK Biobank and the variant page for the protein-truncating variant rs146597587 in IL33 found to protect against asthma. (A1) Summary of phenotype information including sample count and links to other analyses. (A2) Manhattan plot displaying significance of association of each variant. (A3) Detailed variant information is summarized in a table. (B1) Variant summary and link-outs to external references. (B2) Manhattan plot for a PheWAS. Phenotypes are binned by category. (B3) Effect size estimate plot of the log (OR) for each phenotype. (B4) Variant annotations and links to associated genes. (B5) Figures can be manipulated using the tools provided


2.2 Variant page

The variant page presents the annotation of a genetic variant (Fig. 1.B4), links to external resources (Fig. 1.B1) and two plots summarizing the results from PheWAS analysis of the variant. The PheWAS Manhattan plot at the top presents the statistical significance of associations (Fig. 1.B2), while the effect size plot at the bottom presents the log odds ratio for binary traits and the regression coefficient for continuous traits (Fig. 1.B3). The phenotypes in the plots are sorted by category and can be filtered by P-value. The plots can be exported to image files to facilitate scientific communication (Fig. 1.B5).
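As a rough illustration of how the data behind the effect size plot could be assembled, the sketch below picks the log odds ratio for binary traits and the regression coefficient for continuous traits, filters by P-value and orders phenotypes by category. The record layout, field names and numbers are all hypothetical and are not taken from GBE.

```python
import math

# Hypothetical PheWAS results for a single variant; illustrative values only.
results = [
    {"phenotype": "trait_b", "category": "disease", "kind": "binary",
     "odds_ratio": 0.8, "p": 1e-3},
    {"phenotype": "trait_c", "category": "quantitative", "kind": "continuous",
     "beta": 0.02, "p": 0.3},
    {"phenotype": "trait_a", "category": "disease", "kind": "binary",
     "odds_ratio": 0.5, "p": 1e-8},
]


def effect_size(result):
    """log(OR) for binary traits, regression coefficient for continuous ones."""
    if result["kind"] == "binary":
        return math.log(result["odds_ratio"])
    return result["beta"]


def phewas_points(results, max_p=1.0):
    """Keep results with P <= max_p, ordered so each category is contiguous."""
    kept = [r for r in results if r["p"] <= max_p]
    kept.sort(key=lambda r: (r["category"], r["phenotype"]))
    return [(r["category"], r["phenotype"], effect_size(r)) for r in kept]
```

A negative log odds ratio, as for `trait_a` here, corresponds to an odds ratio below 1, which is how a protective association would appear on such a plot.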

2.3 Gene page

The gene page presents a summary of all genotype–phenotype statistics related to a single gene. This page includes a Manhattan plot that displays each variant in the gene region together with the phenotype with the lowest P-value for that variant, as well as a table summarizing additional variant information. The page also includes a figure showing the five phenotypes most strongly associated with the gene in a rare variant aggregate analysis, MRP (DeBoever et al.). The MRP results are generated using coding variants with less than 1% minor allele frequency for each gene.
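The two selection steps described for the gene page, choosing the lowest-P-value phenotype for each variant and restricting the aggregate analysis to rare coding variants, can be sketched as follows. The data structures and values are hypothetical, and the sketch does not implement MRP itself; it only shows the kind of filtering that would precede such an analysis.

```python
# Hypothetical per-variant annotations and associations for one gene region.
variants = {
    "var_a": {"maf": 0.002, "coding": True},
    "var_b": {"maf": 0.150, "coding": True},
    "var_c": {"maf": 0.004, "coding": False},
}
associations = [
    ("var_a", "trait_1", 1e-4),
    ("var_a", "trait_2", 1e-9),
    ("var_b", "trait_1", 0.03),
    ("var_c", "trait_3", 5e-6),
]

MAF_CUTOFF = 0.01  # "less than 1% minor allele frequency" in the text


def top_phenotype_per_variant(associations):
    """Map each variant to the (phenotype, P-value) pair with the lowest P."""
    best = {}
    for variant, phenotype, p in associations:
        if variant not in best or p < best[variant][1]:
            best[variant] = (phenotype, p)
    return best


def rare_coding_variants(variants, maf_cutoff=MAF_CUTOFF):
    """Names of coding variants below the minor allele frequency cutoff."""
    return sorted(
        name for name, info in variants.items()
        if info["coding"] and info["maf"] < maf_cutoff
    )
```

In this toy input only `var_a` is both coding and rare, so it is the only variant that would enter the aggregate test, while every variant still receives a top phenotype for the gene-level Manhattan plot.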

2.4 Genetic correlation page

GBE includes an interactive application for browsing genetic correlation estimates for pairs of traits from the UK Biobank. Genetic correlations have been estimated by applying the multivariate polygenic mixture model (MVPMM) to GWAS summary statistics for more than one million pairs of traits and can be visualized using the app (DeBoever et al.). Users can select phenotypes of interest and filter the displayed results by applying statistical thresholds. MVPMM also estimates other genetic parameters, including polygenicity and scale of effects, which can be seen by mousing over the plot.

2.5 HLA alleles page

The HLA alleles page shows posterior probabilities of causal associations between 175 HLA allelotypes and 270 diseases in the UK Biobank. For each allelotype there is a plot showing the log odds ratio with a 95% confidence interval for each associated phenotype with posterior probability greater than 0.7. Users can also view donut charts displaying the frequencies of allelotypes at each locus. For a more detailed description of all the analyses available, please see the website FAQ (https://biobankengine.stanford.edu/faq).
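The selection rule for the HLA allelotype plots (posterior probability greater than 0.7, log odds ratio with a 95% confidence interval) can be illustrated with a small sketch. The text does not say how the interval is computed; the code below assumes a symmetric Wald interval derived from a standard error, and the field names and numbers are hypothetical.

```python
POSTERIOR_CUTOFF = 0.7   # threshold stated in the text
Z_95 = 1.959963984540054  # standard normal quantile for a two-sided 95% interval


def hla_plot_rows(associations):
    """Keep associations above the posterior cutoff and attach a Wald 95% CI.

    Assumption: each association carries a log odds ratio and its standard
    error, so the interval is log_or +/- Z_95 * se.
    """
    rows = []
    for a in associations:
        if a["posterior"] <= POSTERIOR_CUTOFF:
            continue  # the text requires strictly greater than 0.7
        half_width = Z_95 * a["se"]
        rows.append({
            "phenotype": a["phenotype"],
            "log_or": a["log_or"],
            "ci_low": a["log_or"] - half_width,
            "ci_high": a["log_or"] + half_width,
        })
    return rows


# Hypothetical associations for one allelotype; illustrative values only.
example = [
    {"phenotype": "disease_1", "posterior": 0.9, "log_or": 0.4, "se": 0.1},
    {"phenotype": "disease_2", "posterior": 0.7, "log_or": 0.2, "se": 0.1},
    {"phenotype": "disease_3", "posterior": 0.3, "log_or": -0.1, "se": 0.2},
]
plot_rows = hla_plot_rows(example)
```

With this input only `disease_1` is retained, since a posterior probability of exactly 0.7 does not exceed the cutoff.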

3 Implementation

GBE extends the ExAC browser (Karczewski et al.), which is built in Python, uses the Flask framework and renders plots with d3 and plot.ly. One change made in our implementation is the use of a SciDB backend to host the summary statistic data presented in the browser (Rivers, 2017). We found SciDB to have superior performance with the large amount of data that needs to be stored and queried.

4 Availability

GBE browsing capabilities are now publicly available at biobankengine.stanford.edu.

5 Future directions

GBE is under active development. Here, we describe several areas of improvement. We are developing improved search functionality for phenotypes and variants; current search is limited by the availability of variants and phenotypes within the database. We aim to incorporate more genetic annotations and filtering options, such as filtering by regulatory regions. At this time the data hosted within GBE are limited to the UK Biobank, and we are working to streamline the incorporation of more data sources. As more biobanks come online, we aim to include summary statistics from any available source. Finally, we plan to open-source the GBE code repository to allow users to create their own private version of GBE.
References

Review 1.  Routes for breaching and protecting genetic privacy.

Authors:  Yaniv Erlich; Arvind Narayanan
Journal:  Nat Rev Genet       Date:  2014-05-08       Impact factor: 53.242

2.  UK biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age.

Authors:  Cathie Sudlow; John Gallacher; Naomi Allen; Valerie Beral; Paul Burton; John Danesh; Paul Downey; Paul Elliott; Jane Green; Martin Landray; Bette Liu; Paul Matthews; Giok Ong; Jill Pell; Alan Silman; Alan Young; Tim Sprosen; Tim Peakman; Rory Collins
Journal:  PLoS Med       Date:  2015-03-31       Impact factor: 11.069

3.  Analysis of protein-coding genetic variation in 60,706 humans.

Authors:  Monkol Lek; Konrad J Karczewski; Eric V Minikel; Kaitlin E Samocha; Eric Banks; Timothy Fennell; Anne H O'Donnell-Luria; James S Ware; Andrew J Hill; Beryl B Cummings; Taru Tukiainen; Daniel P Birnbaum; Jack A Kosmicki; Laramie E Duncan; Karol Estrada; Fengmei Zhao; James Zou; Emma Pierce-Hoffman; Joanne Berghout; David N Cooper; Nicole Deflaux; Mark DePristo; Ron Do; Jason Flannick; Menachem Fromer; Laura Gauthier; Jackie Goldstein; Namrata Gupta; Daniel Howrigan; Adam Kiezun; Mitja I Kurki; Ami Levy Moonshine; Pradeep Natarajan; Lorena Orozco; Gina M Peloso; Ryan Poplin; Manuel A Rivas; Valentin Ruano-Rubio; Samuel A Rose; Douglas M Ruderfer; Khalid Shakir; Peter D Stenson; Christine Stevens; Brett P Thomas; Grace Tiao; Maria T Tusie-Luna; Ben Weisburd; Hong-Hee Won; Dongmei Yu; David M Altshuler; Diego Ardissino; Michael Boehnke; John Danesh; Stacey Donnelly; Roberto Elosua; Jose C Florez; Stacey B Gabriel; Gad Getz; Stephen J Glatt; Christina M Hultman; Sekar Kathiresan; Markku Laakso; Steven McCarroll; Mark I McCarthy; Dermot McGovern; Ruth McPherson; Benjamin M Neale; Aarno Palotie; Shaun M Purcell; Danish Saleheen; Jeremiah M Scharf; Pamela Sklar; Patrick F Sullivan; Jaakko Tuomilehto; Ming T Tsuang; Hugh C Watkins; James G Wilson; Mark J Daly; Daniel G MacArthur
Journal:  Nature       Date:  2016-08-18       Impact factor: 49.962

4.  The ExAC browser: displaying reference data information from over 60 000 exomes.

Authors:  Konrad J Karczewski; Ben Weisburd; Brett Thomas; Matthew Solomonson; Douglas M Ruderfer; David Kavanagh; Tymor Hamamsy; Monkol Lek; Kaitlin E Samocha; Beryl B Cummings; Daniel Birnbaum; Mark J Daly; Daniel G MacArthur
Journal:  Nucleic Acids Res       Date:  2016-11-28       Impact factor: 16.971

5.  Medical relevance of protein-truncating variants across 337,205 individuals in the UK Biobank study.

Authors:  Christopher DeBoever; Yosuke Tanigawa; Malene E Lindholm; Greg McInnes; Adam Lavertu; Erik Ingelsson; Chris Chang; Euan A Ashley; Carlos D Bustamante; Mark J Daly; Manuel A Rivas
Journal:  Nat Commun       Date:  2018-04-24       Impact factor: 14.919
