
PhenoScanner V2: an expanded tool for searching human genotype-phenotype associations.

Mihir A Kamat1, James A Blackshaw1, Robin Young1, Praveen Surendran1, Stephen Burgess1,2, John Danesh1,3,4, Adam S Butterworth1,4, James R Staley1,5.   

Abstract

SUMMARY: PhenoScanner is a curated database of publicly available results from large-scale genetic association studies in humans. This online tool facilitates 'phenome scans', where genetic variants are cross-referenced for association with many phenotypes of different types. Here we present a major update of PhenoScanner ('PhenoScanner V2'), including over 150 million genetic variants and more than 65 billion associations (compared to 350 million associations in PhenoScanner V1) with diseases and traits, gene expression, metabolite and protein levels, and epigenetic markers. The query options have been extended to include searches by genes, genomic regions and phenotypes, as well as for genetic variants. All variants are positionally annotated using the Variant Effect Predictor and the phenotypes are mapped to Experimental Factor Ontology terms. Linkage disequilibrium statistics from the 1000 Genomes project can be used to search for phenotype associations with proxy variants.
AVAILABILITY AND IMPLEMENTATION: PhenoScanner V2 is available at www.phenoscanner.medschl.cam.ac.uk.
© The Author(s) 2019. Published by Oxford University Press.

Year:  2019        PMID: 31233103      PMCID: PMC6853652          DOI: 10.1093/bioinformatics/btz469

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


1 Introduction

Dense array-based human genetic studies, such as genome-wide association studies (GWAS), have identified many thousands of associations between genetic variants and a diverse set of phenotypes. The challenge now facing the human genomics community is to understand the mechanisms underlying these associations. One approach to aid biological insight into disease mechanisms is to cross-reference genetic associations across a range of phenotypes, including disease states, cellular traits and other intermediate traits. To enable such ‘phenome scans’ we developed the online tool PhenoScanner (Staley et al.). Since its release in 2016, PhenoScanner has been accessed by hundreds of users to assist a range of analyses, from linking proteins to disease (Sun et al.) to interrogating novel loci associated with blood cell phenotypes (Astle et al.). In recent years, the availability of genetic association statistics has expanded rapidly as genetic biobanks with rich phenotypic information have matured. Moreover, the scope of molecular phenotypes in genetic association studies has increased with the publication of multi-tissue gene expression GWAS (GTEx Consortium) and GWAS of thousands of plasma proteins (Sun et al.). However, integrating genetic associations across this vast array of data sources remains challenging. Hence, to facilitate improved ‘phenome scans’, we have released an updated version of PhenoScanner (PhenoScanner V2) with new features including: (i) an expanded database of human genotype–phenotype associations split into phenotype classes (diseases and traits, gene expression, proteins, metabolites and epigenetics); (ii) additional search options including gene, genomic region and phenotype-based queries; (iii) linkage disequilibrium (LD) information for the five super-ancestries in 1000 Genomes; (iv) variant annotation and trait ontology mappings; and (v) a new web interface and API.

2 Materials and methods

PhenoScanner V2 consists of a Python-R interface which connects to a series of MySQL databases. To develop the catalogue of human genotype–phenotype associations, we identified and collated >5000 genetic association datasets from publicly available lists of full summary association results compiled by the NHGRI-EBI (https://www.ebi.ac.uk/gwas/downloads/summary-statistics) and NHLBI (https://grasp.nhlbi.nih.gov/FullResults.aspx), as well as from recent literature reviews and lists of omics GWAS (e.g. Sun et al. for protein levels). The catalogue currently contains results for diseases and traits (∼30 billion associations), gene expression (∼84 million associations), protein levels (∼35 billion associations), metabolite levels (∼3 billion associations) and epigenetic markers (∼13 million associations). To ensure consistent formatting across datasets, all of the variants were aligned to the NCBI plus strand, rsIDs were updated to dbSNP 147 (Sherry et al.) and chromosome-positions [GRCh37 (hg19) and GRCh38 (hg38)] were added or updated using dbSNP 147 and liftOver (https://genome.ucsc.edu/cgi-bin/hgLiftOver). LD measures between neighbouring variants in the autosomal chromosomes were calculated using phased haplotypes for the five super-ancestries (European, African, Admixed American, East Asian and South Asian) in the 1000 Genomes Project phase 3 (1000 Genomes Project Consortium). We calculated D' and r² for pairs of variants within 500 Kb and kept LD statistics with r² ≥ 0.5. All phenotypes were mapped to Experimental Factor Ontology terms (Malone et al.) using ZOOMA (https://www.ebi.ac.uk/spot/zooma/). Variant and gene annotation for all of the variants was performed using Ensembl Variant Effect Predictor V88 (McLaren et al.) with GENCODE transcripts V26 (Harrow et al.) mapped to build 37 positions. Nearest genes for intergenic variants were retrieved using the BEDOPS tool version 2.4.26 (Neph et al.).
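The LD statistics mentioned above can be illustrated with a minimal sketch that computes D' and r² for one pair of biallelic variants from phased haplotypes, then applies the 500 Kb and r² ≥ 0.5 retention rule. This is not the authors' pipeline: the function names, the 0/1 allele coding and the use of the absolute value of D' are assumptions made for illustration.

```python
from typing import Sequence, Tuple


def ld_stats(hap_a: Sequence[int], hap_b: Sequence[int]) -> Tuple[float, float]:
    """Return (|D'|, r^2) for two biallelic variants from phased haplotypes.

    hap_a[i] and hap_b[i] are the alleles (0 or 1) carried by haplotype i
    at variant A and variant B. The 0/1 coding is an illustrative assumption.
    """
    if len(hap_a) != len(hap_b) or len(hap_a) == 0:
        raise ValueError("haplotype vectors must be non-empty and of equal length")
    n = len(hap_a)
    p_a = sum(hap_a) / n                                  # frequency of allele 1 at A
    p_b = sum(hap_b) / n                                  # frequency of allele 1 at B
    p_ab = sum(a & b for a, b in zip(hap_a, hap_b)) / n   # frequency of the 1-1 haplotype
    if p_a in (0.0, 1.0) or p_b in (0.0, 1.0):
        raise ValueError("both variants must be polymorphic in the sample")

    d = p_ab - p_a * p_b                                  # raw disequilibrium coefficient D
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max                              # d_max > 0 for polymorphic variants
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d_prime, r2


def keep_pair(pos_a: int, pos_b: int, r2: float,
              max_dist: int = 500_000, min_r2: float = 0.5) -> bool:
    """Retention rule from the text: pairs within 500 Kb with r^2 >= 0.5."""
    return abs(pos_a - pos_b) <= max_dist and r2 >= min_r2


if __name__ == "__main__":
    # Four haplotypes; variant B's allele 1 always sits on an allele-1 haplotype of A.
    d_prime, r2 = ld_stats([1, 1, 0, 0], [1, 0, 0, 0])
    print(d_prime, r2, keep_pair(1_000_000, 1_200_000, r2))
```

The example pair shows why both statistics are reported: the two variants are in complete LD by D' but only weakly correlated by r², so the pair would fall below the r² ≥ 0.5 retention threshold.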
Users may enter one genetic variant, gene, genomic region or trait into the text box on the home page (www.phenoscanner.medschl.cam.ac.uk) or upload up to 100 genetic variants, 10 genes or 10 genomic regions as a tab-delimited text file. PhenoScanner V2 also has an API with an associated R package and Python command line tool (www.phenoscanner.medschl.cam.ac.uk/tools), allowing users to search for genotype–phenotype associations from PhenoScanner V2 inside R or from a terminal. When querying genetic variants, all results regardless of P-value can be displayed, allowing the user to identify evidence against associations with phenotypes. To produce manageable result sets, only results with P < 1 × 10⁻⁵ are returned for queries of genes, genomic regions or phenotypes. Once a query is invoked, the Python-R interface annotates the genetic variant, gene, genomic region or phenotype using dbSNP (or ZOOMA for trait queries), before searching the requested association databases and filtering the results based on the specified P-value threshold. The new web interface then presents the results and makes them available to download. All associations for each genetic variant are aligned such that the effect allele is the same across all results. The associations with proxy variants are aligned such that their effect alleles are given with respect to the effect allele of the corresponding queried variant.
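The alignment and filtering steps described above can be sketched as follows. This is an illustration under stated assumptions, not PhenoScanner's implementation: the `Association` record and its field names are hypothetical, alleles are assumed to be already on the plus strand, and negating the effect estimate when the alleles are swapped is the usual harmonisation convention rather than something the text specifies.

```python
from dataclasses import dataclass, replace
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Association:
    """Hypothetical result record; field names are illustrative only."""
    rsid: str
    effect_allele: str
    other_allele: str
    beta: float
    p: float
    trait: str


def align_to(assoc: Association, ref_effect_allele: str) -> Association:
    """Express one association with respect to a common effect allele."""
    if assoc.effect_allele == ref_effect_allele:
        return assoc
    if assoc.other_allele == ref_effect_allele:
        # Swap the alleles and flip the sign of the effect estimate.
        return replace(
            assoc,
            effect_allele=assoc.other_allele,
            other_allele=assoc.effect_allele,
            beta=-assoc.beta,
        )
    raise ValueError(f"{assoc.rsid}: allele {ref_effect_allele!r} not present")


def harmonise(results: Iterable[Association], ref_effect_allele: str,
              p_threshold: Optional[float] = None) -> List[Association]:
    """Align every result to one effect allele, then optionally filter by P-value."""
    aligned = [align_to(r, ref_effect_allele) for r in results]
    if p_threshold is None:          # variant queries may show all results
        return aligned
    return [r for r in aligned if r.p < p_threshold]


if __name__ == "__main__":
    rows = [
        Association("rs1", "A", "G", 0.12, 3e-9, "trait 1"),
        Association("rs1", "G", "A", 0.05, 2e-6, "trait 2"),
        Association("rs1", "A", "G", 0.01, 0.4, "trait 3"),
    ]
    for row in harmonise(rows, "A", p_threshold=1e-5):
        print(row)
```

Passing `p_threshold=None` mirrors variant queries, where results can be shown regardless of P-value; passing `1e-5` mirrors the cut-off applied to gene, region and phenotype queries.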

3 Results

To demonstrate the value of the expanded database and additional functionality of PhenoScanner V2, we searched for ‘rs10840293’, ‘SWAP70’ and ‘coronary heart disease’. PhenoScanner V2 found >150 000 results with rs10840293 (variant annotation: intronic variant in SWAP70) or one of its proxies (r² ≥ 0.8 in Europeans), more than 100 times the number of associations found for the same variant query using PhenoScanner V1 (1405 associations); the NHGRI-EBI GWAS Catalog (MacArthur et al.) returns only four results for rs10840293. In particular, PhenoScanner V2 identifies strong associations of rs10840293 with coronary heart disease (van der Harst and Verweij, 2018), blood pressure (https://www.nealelab.is/uk-biobank) and platelet width (Astle et al.), as well as with whole blood gene expression (Võsa et al.) and plasma protein levels (Sun et al.) of SWAP70 (all with P < 5 × 10⁻⁸). This suggests a possible blood pressure-related mechanism affecting coronary heart disease risk at this locus, potentially regulated via SWAP70 expression. Variants in the SWAP70 gene had >6000 associations with P < 1 × 10⁻⁵ (compared with 27 associations found by the GWAS Catalog), while there were >50 000 genetic associations with coronary heart disease with P < 1 × 10⁻⁵ across the genome (compared with 1092 associations found by the GWAS Catalog).

4 Conclusion

PhenoScanner V2 is a large curated database of human genotype–phenotype associations from publicly available genetic association studies. This catalogue of results greatly extends PhenoScanner V1 in both scale and phenotypic breadth, with tables of genetic associations for diseases and traits, gene expression, protein levels, metabolite levels and epigenetic markers. PhenoScanner V2 also has additional annotation and functionality. The database can now be searched for genes, genomic regions and traits, while variant annotation, phenotype ontology mappings and LD statistics from a wider range of ethnic groups have been incorporated to enhance utility and interpretation.

Funding

This work was supported by the UK Medical Research Council [G0800270, MR/L003120/1]; the British Heart Foundation [SP/09/002, RG/13/13/30194, RG/18/13/33946]; Pfizer [G73632]; the European Research Council [268834]; the European Commission Framework Programme 7 [HEALTH-F2-2012-279233]; the National Institute for Health Research; and Health Data Research UK. The views expressed are those of the authors and not necessarily those of the NHS or the NIHR. Conflict of Interest: none declared.
References

1.  dbSNP: the NCBI database of genetic variation.

Authors:  S T Sherry; M H Ward; M Kholodov; J Baker; L Phan; E M Smigielski; K Sirotkin
Journal:  Nucleic Acids Res       Date:  2001-01-01       Impact factor: 16.971

2.  BEDOPS: high-performance genomic feature operations.

Authors:  Shane Neph; M Scott Kuehn; Alex P Reynolds; Eric Haugen; Robert E Thurman; Audra K Johnson; Eric Rynes; Matthew T Maurano; Jeff Vierstra; Sean Thomas; Richard Sandstrom; Richard Humbert; John A Stamatoyannopoulos
Journal:  Bioinformatics       Date:  2012-05-09       Impact factor: 6.937

3.  GENCODE: the reference human genome annotation for The ENCODE Project.

Authors:  Jennifer Harrow; Adam Frankish; Jose M Gonzalez; Electra Tapanari; Mark Diekhans; Felix Kokocinski; Bronwen L Aken; Daniel Barrell; Amonida Zadissa; Stephen Searle; If Barnes; Alexandra Bignell; Veronika Boychenko; Toby Hunt; Mike Kay; Gaurab Mukherjee; Jeena Rajan; Gloria Despacio-Reyes; Gary Saunders; Charles Steward; Rachel Harte; Michael Lin; Cédric Howald; Andrea Tanzer; Thomas Derrien; Jacqueline Chrast; Nathalie Walters; Suganthi Balasubramanian; Baikang Pei; Michael Tress; Jose Manuel Rodriguez; Iakes Ezkurdia; Jeltje van Baren; Michael Brent; David Haussler; Manolis Kellis; Alfonso Valencia; Alexandre Reymond; Mark Gerstein; Roderic Guigó; Tim J Hubbard
Journal:  Genome Res       Date:  2012-09       Impact factor: 9.043

4.  PhenoScanner: a database of human genotype-phenotype associations.

Authors:  James R Staley; James Blackshaw; Mihir A Kamat; Steve Ellis; Praveen Surendran; Benjamin B Sun; Dirk S Paul; Daniel Freitag; Stephen Burgess; John Danesh; Robin Young; Adam S Butterworth
Journal:  Bioinformatics       Date:  2016-06-17       Impact factor: 6.937

5.  The new NHGRI-EBI Catalog of published genome-wide association studies (GWAS Catalog).

Authors:  Jacqueline MacArthur; Emily Bowler; Maria Cerezo; Laurent Gil; Peggy Hall; Emma Hastings; Heather Junkins; Aoife McMahon; Annalisa Milano; Joannella Morales; Zoe May Pendlington; Danielle Welter; Tony Burdett; Lucia Hindorff; Paul Flicek; Fiona Cunningham; Helen Parkinson
Journal:  Nucleic Acids Res       Date:  2016-11-29       Impact factor: 16.971

6.  The Allelic Landscape of Human Blood Cell Trait Variation and Links to Common Complex Disease.

Authors:  William J Astle; Heather Elding; Tao Jiang; Dave Allen; Dace Ruklisa; Alice L Mann; Daniel Mead; Heleen Bouman; Fernando Riveros-Mckay; Myrto A Kostadima; John J Lambourne; Suthesh Sivapalaratnam; Kate Downes; Kousik Kundu; Lorenzo Bomba; Kim Berentsen; John R Bradley; Louise C Daugherty; Olivier Delaneau; Kathleen Freson; Stephen F Garner; Luigi Grassi; Jose Guerrero; Matthias Haimel; Eva M Janssen-Megens; Anita Kaan; Mihir Kamat; Bowon Kim; Amit Mandoli; Jonathan Marchini; Joost H A Martens; Stuart Meacham; Karyn Megy; Jared O'Connell; Romina Petersen; Nilofar Sharifi; Simon M Sheard; James R Staley; Salih Tuna; Martijn van der Ent; Klaudia Walter; Shuang-Yin Wang; Eleanor Wheeler; Steven P Wilder; Valentina Iotchkova; Carmel Moore; Jennifer Sambrook; Hendrik G Stunnenberg; Emanuele Di Angelantonio; Stephen Kaptoge; Taco W Kuijpers; Enrique Carrillo-de-Santa-Pau; David Juan; Daniel Rico; Alfonso Valencia; Lu Chen; Bing Ge; Louella Vasquez; Tony Kwan; Diego Garrido-Martín; Stephen Watt; Ying Yang; Roderic Guigo; Stephan Beck; Dirk S Paul; Tomi Pastinen; David Bujold; Guillaume Bourque; Mattia Frontini; John Danesh; David J Roberts; Willem H Ouwehand; Adam S Butterworth; Nicole Soranzo
Journal:  Cell       Date:  2016-11-17       Impact factor: 41.582

7.  Genomic atlas of the human plasma proteome.

Authors:  Benjamin B Sun; Joseph C Maranville; James E Peters; David Stacey; James R Staley; James Blackshaw; Stephen Burgess; Tao Jiang; Ellie Paige; Praveen Surendran; Clare Oliver-Williams; Mihir A Kamat; Bram P Prins; Sheri K Wilcox; Erik S Zimmerman; An Chi; Narinder Bansal; Sarah L Spain; Angela M Wood; Nicholas W Morrell; John R Bradley; Nebojsa Janjic; David J Roberts; Willem H Ouwehand; John A Todd; Nicole Soranzo; Karsten Suhre; Dirk S Paul; Caroline S Fox; Robert M Plenge; John Danesh; Heiko Runz; Adam S Butterworth
Journal:  Nature       Date:  2018-06-06       Impact factor: 49.962

8.  The Ensembl Variant Effect Predictor.

Authors:  William McLaren; Laurent Gil; Sarah E Hunt; Harpreet Singh Riat; Graham R S Ritchie; Anja Thormann; Paul Flicek; Fiona Cunningham
Journal:  Genome Biol       Date:  2016-06-06       Impact factor: 13.583

9.  Genetic effects on gene expression across human tissues.

Authors:  Alexis Battle; Christopher D Brown; Barbara E Engelhardt; Stephen B Montgomery
Journal:  Nature       Date:  2017-10-11       Impact factor: 49.962

10.  Identification of 64 Novel Genetic Loci Provides an Expanded View on the Genetic Architecture of Coronary Artery Disease.

Authors:  Pim van der Harst; Niek Verweij
Journal:  Circ Res       Date:  2017-12-06       Impact factor: 17.367
