
HaploReg v4: systematic mining of putative causal variants, cell types, regulators and target genes for human complex traits and disease.

Lucas D Ward, Manolis Kellis.

Abstract

More than 90% of common variants associated with complex traits do not affect proteins directly, but instead the circuits that control gene expression. This has increased the urgency of understanding the regulatory genome as a key component for translating genetic results into mechanistic insights and ultimately therapeutics. To address this challenge, we developed HaploReg (http://compbio.mit.edu/HaploReg) to aid the functional dissection of genome-wide association study (GWAS) results, the prediction of putative causal variants in haplotype blocks, the prediction of likely cell types of action, and the prediction of candidate target genes by systematic mining of comparative, epigenomic and regulatory annotations. Since first launching the website in 2011, we have greatly expanded HaploReg, increasing the number of chromatin state maps to 127 reference epigenomes from ENCODE 2012 and Roadmap Epigenomics, incorporating regulator binding data, expanding regulatory motif disruption annotations, and integrating expression quantitative trait locus (eQTL) variants and their tissue-specific target genes from GTEx, Geuvadis, and other recent studies. We present these updates as HaploReg v4, and illustrate a use case of HaploReg for attention deficit hyperactivity disorder (ADHD)-associated SNPs with putative brain regulatory mechanisms.
© The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

Year:  2015        PMID: 26657631      PMCID: PMC4702929          DOI: 10.1093/nar/gkv1340

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

Phenotype-associated loci from genome-wide association studies (GWAS) are usually non-coding, and functionally interpreting them is a challenge due to linkage disequilibrium (LD) and our almost complete inability to predict regulatory function directly from non-coding sequence. Therefore, regulatory genomic data such as maps of enhancers and transcription factor binding sites are essential to interpreting GWAS, developing mechanistic hypotheses, and ultimately understanding the genetic architecture of complex traits and disease (1–3). For human geneticists, these regulatory data can be unwieldy to translate from a genome browser to insights about a set of genomically-dispersed disease variants. HaploReg (4) integrates regulatory genomic maps together in the context of haplotype blocks, allowing researchers to intersect regulatory elements with genetic variants to quickly formulate functional hypotheses, both through dissection of multiple variants within a haplotype block and through global enrichment analysis of a set of associated loci. HaploReg annotation of GWAS has successfully been applied for haplotype fine-mapping (5–9) and enrichment analysis (7,10,11).

DATA AND INTERFACE UPDATES

HaploReg has been expanded substantially since it first launched in 2011. Here we describe the updates incorporated in HaploReg v4 in response to new research in regulatory genomics and feedback from users.

Catalog of variants

HaploReg v4 defines a core set of 52 054 804 variants, consisting primarily of single-nucleotide polymorphisms (SNPs), using all refSNP IDs, hg19 positions and alleles from dbSNP release b137 (12). Corresponding hg38 coordinates for these variants were obtained from dbSNP release b141. This core set of dbSNP variants was integrated with other data sets either by rsID (for GWAS, eQTL and 1000 Genomes data) or by intersecting intervals by coordinate using the BEDTools software package (13) (for all other functional tracks). Linkage disequilibrium was calculated using phased low-coverage whole-genome autosomal sequences for four ancestral super-populations (AFR, AMR, ASN and EUR) from the 1000 Genomes Project Phase 1 release (14), using a search space of all variants within 250 kilobases of each other. Allele frequencies were also obtained for each population. The location of variants relative to genes was calculated using BEDTools with both GENCODE (15) and RefSeq (16) annotations.
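The pairwise LD statistics reported in the haplotype views (r² and D′) can be illustrated with a minimal sketch. It assumes phased, biallelic haplotypes coded as 0/1 and uses the textbook definitions; the function names `ld_stats` and `within_window` and the input format are hypothetical and are not taken from the HaploReg pipeline.

```python
def within_window(pos_a, pos_b, window=250_000):
    """True if two variants on the same chromosome are close enough to compare.

    The 250 kb default mirrors the search space described in the text.
    """
    return abs(pos_a - pos_b) <= window


def ld_stats(hap_a, hap_b):
    """Return (r2, d_prime) for two biallelic sites given phased haplotypes.

    hap_a and hap_b are equal-length sequences of 0/1 alleles, one entry per
    phased chromosome (a hypothetical input format chosen for illustration).
    """
    n = len(hap_a)
    if n == 0 or n != len(hap_b):
        raise ValueError("haplotype vectors must be non-empty and equal length")
    p_a = sum(hap_a) / n  # frequency of allele 1 at site A
    p_b = sum(hap_b) / n  # frequency of allele 1 at site B
    # frequency of haplotypes carrying allele 1 at both sites
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b  # raw disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both sites must be polymorphic")
    r2 = d * d / denom
    # D' rescales |D| by its maximum possible value given the allele frequencies
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return r2, abs(d) / d_max
```

Two sites can show D′ = 1 while r² is well below 1 when their allele frequencies differ, which is why both statistics are useful when reading a haplotype block.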

Genome-wide association studies

GWAS results were obtained from the EBI-NHGRI GWAS Catalog (17) (downloaded 30 October 2015). When multiple GWAS were available for the same trait, trait-wide pruning was performed: when two results from different studies overlapped or lay within one megabase of each other, only the strongest (lowest P-value) result was retained.
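The trait-wide pruning step can be sketched as a greedy pass over the hits for one trait, sorted by P-value. This is one plausible reading of the description rather than the authors' code: the greedy strategy, the record fields (`rsid`, `chrom`, `pos`, `p`, `study`) and the function name `prune_trait_hits` are all assumptions made for illustration.

```python
def prune_trait_hits(hits, window=1_000_000):
    """Greedily keep the lowest-P hits for one trait.

    `hits` is a list of dicts with hypothetical keys: rsid, chrom, pos, p, study.
    A hit is dropped when a stronger, already-kept hit from a *different* study
    lies on the same chromosome within `window` base pairs.
    """
    kept = []
    for hit in sorted(hits, key=lambda h: h["p"]):
        clash = any(
            k["study"] != hit["study"]
            and k["chrom"] == hit["chrom"]
            and abs(k["pos"] - hit["pos"]) <= window
            for k in kept
        )
        if not clash:
            kept.append(hit)
    return kept


example_hits = [
    {"rsid": "rsA", "chrom": "1", "pos": 1_000_000, "p": 1e-9, "study": "s1"},
    {"rsid": "rsB", "chrom": "1", "pos": 1_500_000, "p": 1e-8, "study": "s2"},
    {"rsid": "rsC", "chrom": "2", "pos": 1_200_000, "p": 1e-7, "study": "s2"},
]
pruned = prune_trait_hits(example_hits)
```

In this toy input, rsB is removed because the stronger rsA from another study lies 500 kb away, while rsC survives because it is on a different chromosome.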

Sequence conservation

Mammalian evolutionarily constrained elements are defined as originally reported, using both SiPhy elements (18) and GERP elements (19). Both of these comparative genomics studies report base-level conservation scores as well as discretized elements; we chose to report the discretized elements resulting from the authors’ algorithms for the sake of simplicity and interpretability. A colored cell indicates that the element is conserved according to the algorithm.

Regulatory protein binding

Protein-binding sites from a variety of cell types and experimental conditions were obtained from the ENCODE Project ChIP-Seq data (20), processed by the narrowPeak algorithm.

Reference epigenomes

Epigenomic data from the Roadmap Epigenomics project (11) were included for the following data sets: ChromHMM states corresponding to enhancer or promoter elements, from the 15-state core model and the 25-state model incorporating imputed data (21); histone modification ChIP-seq peaks called with the gappedPeak algorithm for H3K4me1, H3K4me3, H3K27ac and H3K9ac; and DNase hypersensitivity peaks called with the narrowPeak algorithm.

Expression quantitative trait loci

Expression QTL (eQTL) results were obtained from the GTEx pilot analysis v6 (22), the GEUVADIS project (23) and 12 other studies (10,24–34) in order to annotate variants with their putative regulatory target genes and the tissue(s) in which genotype has been associated with gene expression level. A wide range of QTLs, including eQTLs and other molecular QTLs such as metabolite QTLs, were also extracted from the GRASP database, build 2.0.0.0 (35,36).

Regulatory motifs

A library of position weight matrices (PWMs) drawn from commercial sources, the literature and motif-finding analysis of the ENCODE project (37) was used to score the effect of variants on regulatory motifs, using the PWM-scanning process described previously (4).
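To make the idea of scoring a variant against a PWM concrete, the following is a generic log-odds sketch. It is not the scanning procedure of reference (4), whose details are not reproduced in this text; the uniform background, the toy three-column matrix and the function names are all assumptions chosen for illustration.

```python
import math

# Assumed uniform nucleotide background (an illustrative simplification).
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}

# Toy PWM: one dict of base probabilities per motif position (hypothetical values).
TOY_PWM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
]


def log_odds(pwm, seq):
    """Sum of per-position log2(probability / background) for a window."""
    return sum(math.log2(col[base] / BACKGROUND[base]) for col, base in zip(pwm, seq))


def best_score(pwm, seq, snp_index):
    """Best log-odds score over all motif-width windows that cover snp_index."""
    width = len(pwm)
    first = max(0, snp_index - width + 1)
    last = min(snp_index, len(seq) - width)
    return max(log_odds(pwm, seq[s:s + width]) for s in range(first, last + 1))


def motif_delta(pwm, ref_seq, alt_seq, snp_index):
    """Change in best motif score (alt minus ref); negative means a weaker match."""
    return best_score(pwm, alt_seq, snp_index) - best_score(pwm, ref_seq, snp_index)


# The reference allele A completes an ATG match; the alternate allele G weakens it.
delta = motif_delta(TOY_PWM, "CCATGCC", "CCGTGCC", snp_index=2)
```

A real analysis would also scan the reverse strand and apply a match threshold before reporting a disruption; both are omitted here to keep the sketch short.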

Enrichment analysis

For a given set of lead SNPs from a GWAS or user-input SNPs, the overlap of SNPs with predicted enhancers in each reference epigenome is assessed. Users have four options for defining enhancers, available in the option panel: the 15-state core model, the 25-state model incorporating imputed epigenomes, H3K4me1 peaks and H3K27ac peaks. The overlap with enhancers in each cell type is compared to two background models to assess enrichment: all 1000 Genomes variants with a frequency above 5% in any population, and all independent GWAS catalog SNPs. Enrichment relative to these background frequencies is assessed using a binomial test, and uncorrected P-values are reported in an enrichment table underneath the haplotype views.
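The binomial enrichment calculation can be sketched with the standard library alone. The text does not state whether the test is one- or two-sided, so the upper-tail form below is an assumption, as are the function name `binomial_enrichment_p` and the example counts.

```python
from math import comb


def binomial_enrichment_p(k, n, background_rate):
    """Upper-tail P(X >= k) for X ~ Binomial(n, background_rate).

    k: number of query SNPs overlapping enhancers in one epigenome
    n: total number of query SNPs
    background_rate: fraction of background SNPs overlapping the same enhancers
    """
    if not 0 <= k <= n:
        raise ValueError("k must satisfy 0 <= k <= n")
    if not 0.0 <= background_rate <= 1.0:
        raise ValueError("background_rate must be a probability")
    q = 1.0 - background_rate
    return sum(
        comb(n, i) * background_rate ** i * q ** (n - i) for i in range(k, n + 1)
    )


# Hypothetical example: 8 of 26 lead SNPs overlap enhancers that cover 5% of
# background SNPs in the same epigenome.
p_value = binomial_enrichment_p(8, 26, 0.05)
```

Because the reported P-values are uncorrected and one test is run per reference epigenome, a reader may wish to apply a multiple-testing correction before interpreting a single tissue as enriched.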

USE EXAMPLE

To become acquainted with HaploReg, use the GWAS drop-down menu to select ‘Attention deficit hyperactivity disorder (Lesch KP, 2008, 26 SNPs)’ and select ‘Submit’. Notice that the first two haplotype blocks from this study (38) are driven by lead SNPs with the same P-value = 1 × 10⁻⁸. Go to the second haplotype result, for lead SNP rs864643 (Figure 1). Note that the top row in the haplotype block shows the SNP rs561543, and that it has LD of r² = 0.81 and D′ = 0.95 with the lead variant rs864643. It overlaps with an HMM-predicted enhancer in four major tissue types; hover over ‘4 tissues’ in that row to see a variety of enhancer tissues, including brain. Note that there is also an experiment with HNF4 protein bound by ChIP-Seq, 9 QTL results and an HNF4 motif disruption.
Figure 1. Example of haplotype summary view.

Notice the enrichment results at the bottom of the page below the haplotype results. Note that the strongest enrichment for enhancers (as defined by the 15-state core ChromHMM model) is in the angular gyrus sample from brain, with binomial P = 2.0 × 10⁻⁶ relative to all common SNPs. Then go to the entry on the block for the lead SNP itself, rs864643. Click on the rsID, which is colored red because it is the lead SNP. Note that in the full table of epigenomic information from Roadmap Epigenomics (11), there is a cluster of enhancer activity in brain, and that it is classified as a genic enhancer by the 15-state core model and as a transcribed 3′ enhancer by the 25-state model (Figure 2). Note that H3K4me1, H3K27ac and H3K9ac all contribute to the chromatin state assignment at this locus. Black cells on the right-hand side of this part of the table indicate that DNase was not assayed by Roadmap in these tissues.
Figure 2. Example of epigenome details in a SNP detail view.

Go to the bottom of the detail page for rs864643. Note that the SNP has been correlated with MOBP expression in two brain tissues (29), MRPL15 expression in blood (39) and the serum ratio of allantoin to quinate (40); all three of these studies were curated by GRASP and found by cross-referencing this SNP to its database (35,36) (Figure 3). Looking at studies individually curated by HaploReg, notice that the SNP has been associated with differential expression of a single exon of RPSA in lymphoblastoid cells by the GEUVADIS study (23). In the motif table, note that the SNP changes the match to the p300 PWM, ATTAYRWCA, with the alternate allele changing the matching A at the fourth position to a G. Hover over the ‘p300_disc’ ID to see that the motif was discovered using the Trawler algorithm on a p300 ChIP-Seq experiment in HeLa cells from the ENCODE dataset (37).
Figure 3. Example of eQTL and motif alteration details from a SNP detail view.

These lines of evidence suggest regulatory mechanisms by which the SNPs from this GWAS may affect the complex phenotype of ADHD. Although each piece of evidence is individually weak, together they suggest experiments that molecular biologists could pursue to establish mechanisms more definitively. For example, the GWAS-wide enrichment suggests global differential gene regulation in angular gyrus, which has been associated with hyperactivation in ADHD by fMRI (41), and points to a tissue in which to study gene expression directly in animal models. ChIP-seq and motif data suggest testing whether HNF4 binds differentially to the alleles of rs561543, and the strong motif coupled with eQTL data suggests testing whether p300 binds differentially to the alleles of rs864643 in a brain tissue model. Finally, the MOBP eQTL evidence suggests experiments to dissect the mechanism of MOBP differential expression, perhaps modulated by p300 at rs864643, and suggests that it may be useful to perform ADHD-relevant behavioral assays of MOBP-deficient mice, which do not show an overt behavioral phenotype (42).
REFERENCES (10 of 42 shown)

1.  Common genetic variants modulate pathogen-sensing responses in human dendritic cells.

Authors:  Mark N Lee; Chun Ye; Alexandra-Chloé Villani; Towfique Raj; Weibo Li; Thomas M Eisenhaure; Selina H Imboywa; Portia I Chipendo; F Ann Ran; Kamil Slowikowski; Lucas D Ward; Khadir Raddassi; Cristin McCabe; Michelle H Lee; Irene Y Frohlich; David A Hafler; Manolis Kellis; Soumya Raychaudhuri; Feng Zhang; Barbara E Stranger; Christophe O Benoist; Philip L De Jager; Aviv Regev; Nir Hacohen
Journal:  Science       Date:  2014-03-07       Impact factor: 47.728

2.  BEDTools: a flexible suite of utilities for comparing genomic features.

Authors:  Aaron R Quinlan; Ira M Hall
Journal:  Bioinformatics       Date:  2010-01-28       Impact factor: 6.937

3.  Molecular genetics of adult ADHD: converging evidence from genome-wide association and extended pedigree linkage studies.

Authors:  Klaus-Peter Lesch; Nina Timmesfeld; Tobias J Renner; Rebecca Halperin; Christoph Röser; T Trang Nguyen; David W Craig; Jasmin Romanos; Monika Heine; Jobst Meyer; Christine Freitag; Andreas Warnke; Marcel Romanos; Helmut Schäfer; Susanne Walitza; Andreas Reif; Dietrich A Stephan; Christian Jacob
Journal:  J Neural Transm (Vienna)       Date:  2008-10-07       Impact factor: 3.575

4.  Innate immune activity conditions the effect of regulatory variants upon monocyte gene expression.

Authors:  Peter Humburg; Seiko Makino; Benjamin P Fairfax; Vivek Naranbhai; Daniel Wong; Evelyn Lau; Luke Jostins; Katharine Plant; Robert Andrews; Chris McGee; Julian C Knight
Journal:  Science       Date:  2014-03-07       Impact factor: 47.728

5.  Systematic discovery and characterization of regulatory motifs in ENCODE TF binding experiments.

Authors:  Pouya Kheradpour; Manolis Kellis
Journal:  Nucleic Acids Res       Date:  2013-12-13       Impact factor: 16.971

6.  The NHGRI GWAS Catalog, a curated resource of SNP-trait associations.

Authors:  Danielle Welter; Jacqueline MacArthur; Joannella Morales; Tony Burdett; Peggy Hall; Heather Junkins; Alan Klemm; Paul Flicek; Teri Manolio; Lucia Hindorff; Helen Parkinson
Journal:  Nucleic Acids Res       Date:  2013-12-06       Impact factor: 16.971

7.  Genome-wide identification of expression quantitative trait loci (eQTLs) in human heart.

Authors:  Tamara T Koopmann; Michiel E Adriaens; Perry D Moerland; Roos F Marsman; Margriet L Westerveld; Sean Lal; Taifang Zhang; Christine Q Simmons; Istvan Baczko; Cristobal dos Remedios; Nanette H Bishopric; Andras Varro; Alfred L George; Elisabeth M Lodder; Connie R Bezzina
Journal:  PLoS One       Date:  2014-05-20       Impact factor: 3.240

8.  The Mouse Genome Database (MGD): facilitating mouse as a model for human biology and disease.

Authors:  Janan T Eppig; Judith A Blake; Carol J Bult; James A Kadin; Joel E Richardson
Journal:  Nucleic Acids Res       Date:  2014-10-27       Impact factor: 16.971

9.  Genetic variability in the regulation of gene expression in ten regions of the human brain.

Authors:  Adaikalavan Ramasamy; Daniah Trabzuni; Sebastian Guelfi; Vibin Varghese; Colin Smith; Robert Walker; Tisham De; Lachlan Coin; Rohan de Silva; Mark R Cookson; Andrew B Singleton; John Hardy; Mina Ryten; Michael E Weale
Journal:  Nat Neurosci       Date:  2014-08-31       Impact factor: 24.884

10.  Mapping the genetic architecture of gene expression in human liver.

Authors:  Eric E Schadt; Cliona Molony; Eugene Chudin; Ke Hao; Xia Yang; Pek Y Lum; Andrew Kasarskis; Bin Zhang; Susanna Wang; Christine Suver; Jun Zhu; Joshua Millstein; Solveig Sieberts; John Lamb; Debraj GuhaThakurta; Jonathan Derry; John D Storey; Iliana Avila-Campillo; Mark J Kruger; Jason M Johnson; Carol A Rohl; Atila van Nas; Margarete Mehrabian; Thomas A Drake; Aldons J Lusis; Ryan C Smith; F Peter Guengerich; Stephen C Strom; Erin Schuetz; Thomas H Rushmore; Roger Ulrich
Journal:  PLoS Biol       Date:  2008-05-06       Impact factor: 8.029
