
HaploReg: a resource for exploring chromatin states, conservation, and regulatory motif alterations within sets of genetically linked variants.

Lucas D Ward, Manolis Kellis.

Abstract

The resolution of genome-wide association studies (GWAS) is limited by the linkage disequilibrium (LD) structure of the population being studied. Selecting the most likely causal variants within an LD block is relatively straightforward within coding sequence, but is more difficult when all variants are intergenic. Predicting functional non-coding sequence has been recently facilitated by the availability of conservation and epigenomic information. We present HaploReg, a tool for exploring annotations of the non-coding genome among the results of published GWAS or novel sets of variants. Using LD information from the 1000 Genomes Project, linked SNPs and small indels can be visualized along with their predicted chromatin state in nine cell types, conservation across mammals and their effect on regulatory motifs. Sets of SNPs, such as those resulting from GWAS, are analyzed for an enrichment of cell type-specific enhancers. HaploReg will be useful to researchers developing mechanistic hypotheses of the impact of non-coding variants on clinical phenotypes and normal variation. The HaploReg database is available at http://compbio.mit.edu/HaploReg.

Year:  2011        PMID: 22064851      PMCID: PMC3245002          DOI: 10.1093/nar/gkr917

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

Genome-wide association studies (GWAS) are providing a flood of data associating genetic variants with common phenotypes (1). A confounding factor in such studies is linkage disequilibrium (LD), which allows many variants at the same locus to be associated with a phenotype even if only one of them is causal. Within genes, prioritizing the likely causal variant is relatively straightforward: variants are easily annotated as synonymous, missense or nonsense, as altering the consensus sequence at splice sites, or as residing in introns or UTRs. Often, however, GWAS associations lie far from known genes or transcribed regions, presumably in distal tissue-specific enhancers. One of the most striking examples is the gene desert at 8q24, which contains regions specifically and independently linked to prostate, breast, ovarian, colorectal and bladder cancer. These variants have been shown to correspond to cell-type-specific distal enhancers of the MYC oncogene (2,3). Recent systematic comparisons of expression quantitative trait loci (eQTL) and GWAS results suggest that the association of intergenic variants with complex phenotypes results from alterations of elements that regulate gene expression (4,5). Ernst and colleagues (6) recently developed a map of chromatin states, including enhancers, promoters, insulators and heterochromatin, in nine human cell lines based on a variety of histone modifications. Using this map, they demonstrated that these states can be used to prioritize SNPs within LD blocks associated with disease and, in some cases, reveal biologically plausible enrichments for cell type-specific enhancers. Here we present a tool, HaploReg, that systematically mines these chromatin state data, along with conservation data and regulatory motif alterations.
A wide range of resources exists for predicting the functional consequences of variants and for navigating groups of linked variants using LD information. PolyPhen (7), SIFT (8) and SNPs3D (9) all predict the impact of missense SNPs. Algorithms such as is-rSNP (10) and RAVEN (11) use regulatory motif changes to predict SNPs that may influence transcriptional regulation. SNPinfo (12) combines missense predictions with TRANSFAC PWM disruption predictions and conservation information across 17 vertebrates for HapMap Phase III SNPs. SNAP (13) provides LD calculations using 1000 Genomes Project pilot data, together with information about neighboring genes and array membership for proxy/tag SNP selection, but does not currently include indels. HaploReg improves on SNAP by also reporting 1000 Genomes Project pilot indels in LD with query SNPs. In addition, HaploReg improves on the features of SNPinfo by incorporating evolutionary constraint from two algorithms (based on the sequences of at least 29 mammals) and by considering a much larger library of PWMs. The UCSC Genome Browser (14) and Ensembl Genome Browser (15,16) both allow genomic regions to be annotated with the results of cutting-edge genomic data, including chromatin state segmentations, ENCODE data, 1000 Genomes variants, evolutionary constraint, LD calculations and NHGRI catalog variants. However, the output of these browsers can be overwhelming, especially when one is interested only in a limited subset of loci (such as the variants linked to a GWAS hit). HaploReg therefore combines the focus on haplotype blocks provided by tools such as SNAP and SNPinfo with the breadth of genomic annotation provided by full-featured genome browsers.

METHODS

HaploReg consists of a PHP interface to a MySQL database. The initial database table was populated using genomic coordinates and sequences for 16 151 841 biallelic SNPs and small indels from the pilot release of the 1000 Genomes Project (17). In some cases, such as novel indels, the variant call format (VCF) file from the pilot release did not provide a RefSNP identifier (rsid); to give every variant in the database a unique identifier, these variants were labeled ‘chromosome:position’ in hg18 coordinates. To provide backward compatibility with obsolete rsids, dbSNP release 132 (18) was checked for variants at the same positions as 1000 Genomes pilot variants, in order to capture positions with multiple rsids. Annotations of functional consequences were also extracted from dbSNP. A variety of functional annotations were then intersected with the set of variants using the BEDTools package (19), including the chromatin state segmentation of Ernst et al. (6) and the regions conserved according to GERP (20) and SiPhy (21,22). To obtain gene annotations, RefSeq genes (23) were downloaded from the UCSC Genome Browser, and GENCODE version 7 (24) was downloaded from the project website. BEDTools was then used to calculate, for each variant, the distance to the nearest gene under either annotation, as well as the orientation (5′ or 3′) relative to the nearest end of that gene, based on the gene's strand. To annotate variants by their effect on regulatory motifs, a library of position weight matrices (PWMs) was constructed from literature sources and scored on genomic sequences as described previously (6). Briefly, PWMs were collected from TRANSFAC (25), JASPAR (26) and protein-binding microarray (PBM) experiments (27–29). The reference and alternate alleles of each 1000 Genomes pilot SNP and indel were concatenated with 29 bp of genomic context on each side, using the hg18 sequence obtained from the UCSC Genome Browser (30).
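The identifier fallback and the allele-window construction described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not HaploReg's code: the helper names variant_key and allele_windows are hypothetical, the chromosome sequence is assumed to be a plain string, and positions are assumed to be 1-based as in VCF files.

```python
# Hypothetical sketch of the allele-window step; names are illustrative only.
FLANK = 29  # bp of genomic context on each side, as stated in the text


def variant_key(rsid, chrom, pos):
    """Use the rsid if present, otherwise a 'chromosome:position' label."""
    return rsid if rsid else f"{chrom}:{pos}"


def allele_windows(chrom_seq, pos, ref, alt, flank=FLANK):
    """Return (reference_window, alternate_window) for one variant.

    chrom_seq -- chromosome sequence as a string (assumed; e.g. hg18)
    pos       -- 1-based position of the first reference base (VCF style)
    ref, alt  -- allele strings; unequal lengths represent small indels
    """
    start = pos - 1  # convert the 1-based position to a 0-based index
    end = start + len(ref)
    if chrom_seq[start:end].upper() != ref.upper():
        raise ValueError(f"reference allele {ref!r} does not match sequence")
    left = chrom_seq[max(0, start - flank):start]
    right = chrom_seq[end:end + flank]
    return left + ref + right, left + alt + right
```

For a SNP both windows are 59 bp long; for a deletion such as ref TA, alt T, the reference window is one base longer than the alternate window, so motif positions in the two windows line up only to the left of the variant.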
PWMs were then scored for instances that passed either of two thresholds: a stringent threshold of P < 4⁻⁸ and a less-stringent threshold of P < 4⁻⁷ (31). Only motif instances that (i) passed the stringent threshold of a PWM in either the reference or the alternate sequence and (ii) overlapped the variable nucleotide(s), thus changing the PWM score, were considered. The change in log-odds (LOD) score was then calculated. Where the weaker match did not pass the less-stringent threshold, an approximate minimum change in LOD score was reported, equal to the difference between the score of the stronger match and the score required to pass the less-stringent threshold. Where both alleles surpassed the less-stringent threshold, the exact difference in score was reported. GWAS results were obtained from the catalog curated by NHGRI (32) (accessed June 29, 2011). Where multiple studies were annotated as pertaining to the same phenotype, their unique independent SNPs were consolidated into a single list. LD was calculated using the phased genotype information accompanying the 1000 Genomes Project pilot release (17). VCFtools (33) was used to perform the calculation, with an LD threshold of r² = 0.80 and a maximum distance between variants of 200 kb. The VCFtools results were then consolidated so that, for every variant in the database, a list of linked variants and their r² values is available for each of the three pilot populations. To perform enhancer enrichment analysis on sets of variants, tables of common array designs were obtained from the UCSC Table Browser (34), and lists were constructed of the 1000 Genomes SNPs segregating in each of the three pilot populations, as well as of all SNPs in the database. Then, for each of these background sets, the frequency of variants annotated as overlapping a strong enhancer state was calculated for each cell type.
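The motif filtering and LOD-change rules above can be illustrated with a short Python sketch. It assumes SNPs only (equal-length windows), a PWM given directly as per-position log-odds values, and fixed LOD cutoffs standing in for the two P-value thresholds, which in practice would have to be converted into score cutoffs for each PWM. All names and values are hypothetical.

```python
# Minimal sketch for SNPs only; the PWM values and LOD cutoffs are hypothetical
# stand-ins for the library PWMs and their P-value thresholds.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase A/C/G/T string."""
    return seq.translate(COMPLEMENT)[::-1]


def pwm_score(pwm, seq):
    """Sum per-position log-odds; pwm is a list of {base: LOD} dicts."""
    return sum(column[base] for column, base in zip(pwm, seq))


def motif_changes(pwm, ref_win, alt_win, snp_idx, strict, lenient):
    """Yield (offset, strand, delta_lod, exact) for instances covering the SNP.

    ref_win, alt_win -- equal-length windows differing only at snp_idx
    strict, lenient  -- LOD cutoffs for the stringent / less-stringent tests
    """
    width = len(pwm)
    first = max(0, snp_idx - width + 1)
    last = min(snp_idx, len(ref_win) - width)
    for offset in range(first, last + 1):  # every placement overlapping the SNP
        for strand in "+-":
            ref_site = ref_win[offset:offset + width]
            alt_site = alt_win[offset:offset + width]
            if strand == "-":
                ref_site, alt_site = revcomp(ref_site), revcomp(alt_site)
            s_ref = pwm_score(pwm, ref_site)
            s_alt = pwm_score(pwm, alt_site)
            stronger, weaker = max(s_ref, s_alt), min(s_ref, s_alt)
            if stronger < strict:
                continue  # neither allele passes the stringent threshold
            if weaker >= lenient:
                yield offset, strand, s_alt - s_ref, True  # exact change
            else:
                # weaker allele fails the lenient cutoff: report a minimum change
                gap = stronger - lenient
                yield offset, strand, gap if s_alt > s_ref else -gap, False
```

With a toy four-column PWM scoring +1 for each consensus base of GGAA and -1 otherwise, a C-to-G change that completes the site raises the score from 2 to 4; with cutoffs of 4 and 2 the sketch reports an exact change of +2, while raising the lenient cutoff to 3 turns the report into a minimum change of +1.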
When a user submits a query list of variants, the coverage of strong enhancers in each cell type is calculated. If the coverage exceeds that of the background set selected by the user, a binomial test is performed, and enrichment is reported if it passes an uncorrected significance threshold of 0.05.
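The enrichment step can be sketched as a one-sided binomial test. The sketch assumes that "coverage" means the fraction of query SNPs overlapping a strong enhancer state in a given cell type and that the upper tail of the binomial distribution is used; the function names and input layout are hypothetical.

```python
from math import comb


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def enhancer_enrichment(query_counts, background_freq, alpha=0.05):
    """Return {cell_type: p_value} for cell types reported as enriched.

    query_counts    -- {cell_type: (n_in_strong_enhancer, n_query_snps)}
    background_freq -- {cell_type: fraction of background SNPs in strong enhancers}
    """
    enriched = {}
    for cell_type, (k, n) in query_counts.items():
        p0 = background_freq[cell_type]
        if n == 0 or k / n <= p0:
            continue  # only test when query coverage exceeds the background
        p_value = binomial_upper_tail(k, n, p0)
        if p_value < alpha:  # uncorrected threshold, as in the text
            enriched[cell_type] = p_value
    return enriched
```

For example, 5 of 10 query SNPs in GM12878 strong enhancers against a background frequency of 0.1 gives P ≈ 0.0016 and is reported, whereas 2 of 10 against the same background (P ≈ 0.26) is not. Because no multiple-testing correction is applied across the nine cell types, borderline results should be read with caution.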

USAGE

A user may submit queries in two formats: a comma-delimited list of rsids, or one of the studies or traits from the NHGRI GWAS catalog. To illustrate (Figure 1), we select the lupus study by Han et al. (35). Because the study was conducted in Han Chinese, we select ASN (CHB + JPT) as the population for LD calculation and all SNPs in the ASN population as the background for enhancer enrichment analysis. As reported by Ernst et al. (6), there is a strong enrichment for GM12878 (lymphoblastoid) enhancers. To demonstrate LD blocks, we select an LD threshold of r² = 0.95. In the LD block with lead SNP rs9271100, the SNP rs9271055 affects an Ets-family binding site. Clicking on rs9271055 leads to a detail view (Figure 2) in which the complete chromatin state data are available. The variant's position within two literature-derived motifs for Ets-family proteins can be seen, and the alternate T allele strengthens the predicted affinity relative to the reference G allele. In addition, links are provided to NCBI RefSeq and Ensembl pages describing the neighboring HLA-DRB1 gene.
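The LD lists behind these blocks rest on the standard r² statistic computed from phased haplotypes. The sketch below computes r² directly and filters linked variants by a threshold and a maximum distance; HaploReg itself used VCFtools for this step, so the functions and the input layout here are hypothetical stand-ins. The defaults mirror the database settings (r² ≥ 0.8 within 200 kb), and passing threshold=0.95 mimics the stricter block view chosen in the example above.

```python
def r_squared(hap_a, hap_b):
    """r^2 between two biallelic variants from phased haplotypes.

    hap_a, hap_b -- equal-length sequences of 0/1 alleles, one entry per
    haplotype (two per diploid individual), in the same haplotype order.
    """
    n = len(hap_a)
    p_a = sum(hap_a) / n
    p_b = sum(hap_b) / n
    p_ab = sum(1 for x, y in zip(hap_a, hap_b) if x == 1 and y == 1) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return 0.0  # a monomorphic variant: LD undefined, treated as unlinked
    d = p_ab - p_a * p_b  # the LD coefficient D
    return d * d / denom


def ld_block(lead, variants, threshold=0.8, max_dist=200_000):
    """Return [(variant_id, r2)] linked to `lead`, strongest first.

    variants -- {variant_id: (position, haplotypes)} for one population
    """
    lead_pos, lead_haps = variants[lead]
    block = []
    for vid, (pos, haps) in variants.items():
        if vid == lead or abs(pos - lead_pos) > max_dist:
            continue  # skip the lead itself and variants beyond the window
        r2 = r_squared(lead_haps, haps)
        if r2 >= threshold:
            block.append((vid, r2))
    return sorted(block, key=lambda item: -item[1])
```

Because r² depends on population-specific allele and haplotype frequencies, the same lead SNP can yield different blocks in CEU, YRI and ASN, which is why the population is chosen to match the study cohort.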
Figure 1.

HaploReg view of the SNPs from the lupus GWAS by Han et al.

Figure 2.

HaploReg detail view for the SNP rs9271055.

FUNDING

National Institutes of Health (R01-HG004037, RC1-HG005334); National Science Foundation (0644282). Funding for open access charge: National Institutes of Health. Conflict of interest statement. None declared.
REFERENCES (35 in total)

1.  dbSNP: the NCBI database of genetic variation.

Authors:  S T Sherry; M H Ward; M Kholodov; J Baker; L Phan; E M Smigielski; K Sirotkin
Journal:  Nucleic Acids Res       Date:  2001-01-01       Impact factor: 16.971

2.  The human genome browser at UCSC.

Authors:  W James Kent; Charles W Sugnet; Terrence S Furey; Krishna M Roskin; Tom H Pringle; Alan M Zahler; David Haussler
Journal:  Genome Res       Date:  2002-06       Impact factor: 9.043

3.  Compact, universal DNA microarrays to comprehensively determine transcription-factor binding site specificities.

Authors:  Michael F Berger; Anthony A Philippakis; Aaron M Qureshi; Fangxue S He; Preston W Estep; Martha L Bulyk
Journal:  Nat Biotechnol       Date:  2006-09-24       Impact factor: 54.908

4.  Potential etiologic and functional implications of genome-wide association loci for human diseases and traits.

Authors:  Lucia A Hindorff; Praveen Sethupathy; Heather A Junkins; Erin M Ramos; Jayashri P Mehta; Francis S Collins; Teri A Manolio
Journal:  Proc Natl Acad Sci U S A       Date:  2009-05-27       Impact factor: 11.205

5.  SNAP: a web-based tool for identification and annotation of proxy SNPs using HapMap.

Authors:  Andrew D Johnson; Robert E Handsaker; Sara L Pulit; Marcia M Nizzari; Christopher J O'Donnell; Paul I W de Bakker
Journal:  Bioinformatics       Date:  2008-10-30       Impact factor: 6.937

6.  Trait-associated SNPs are more likely to be eQTLs: annotation to enhance discovery from GWAS.

Authors:  Dan L Nicolae; Eric Gamazon; Wei Zhang; Shiwei Duan; M Eileen Dolan; Nancy J Cox
Journal:  PLoS Genet       Date:  2010-04-01       Impact factor: 5.917

7.  Ensembl 2011.

Authors:  Paul Flicek; M Ridwan Amode; Daniel Barrell; Kathryn Beal; Simon Brent; Yuan Chen; Peter Clapham; Guy Coates; Susan Fairley; Stephen Fitzgerald; Leo Gordon; Maurice Hendrix; Thibaut Hourlier; Nathan Johnson; Andreas Kähäri; Damian Keefe; Stephen Keenan; Rhoda Kinsella; Felix Kokocinski; Eugene Kulesha; Pontus Larsson; Ian Longden; William McLaren; Bert Overduin; Bethan Pritchard; Harpreet Singh Riat; Daniel Rios; Graham R S Ritchie; Magali Ruffier; Michael Schuster; Daniel Sobral; Giulietta Spudich; Y Amy Tang; Stephen Trevanion; Jana Vandrovcova; Albert J Vilella; Simon White; Steven P Wilder; Amonida Zadissa; Jorge Zamora; Bronwen L Aken; Ewan Birney; Fiona Cunningham; Ian Dunham; Richard Durbin; Xosé M Fernández-Suarez; Javier Herrero; Tim J P Hubbard; Anne Parker; Glenn Proctor; Jan Vogel; Stephen M J Searle
Journal:  Nucleic Acids Res       Date:  2010-11-02       Impact factor: 16.971

8.  A high-resolution map of human evolutionary constraint using 29 mammals.

Authors:  Kerstin Lindblad-Toh; Manuel Garber; Or Zuk; Michael F Lin; Brian J Parker; Stefan Washietl; Pouya Kheradpour; Jason Ernst; Gregory Jordan; Evan Mauceli; Lucas D Ward; Craig B Lowe; Alisha K Holloway; Michele Clamp; Sante Gnerre; Jessica Alföldi; Kathryn Beal; Jean Chang; Hiram Clawson; James Cuff; Federica Di Palma; Stephen Fitzgerald; Paul Flicek; Mitchell Guttman; Melissa J Hubisz; David B Jaffe; Irwin Jungreis; W James Kent; Dennis Kostka; Marcia Lara; Andre L Martins; Tim Massingham; Ida Moltke; Brian J Raney; Matthew D Rasmussen; Jim Robinson; Alexander Stark; Albert J Vilella; Jiayu Wen; Xiaohui Xie; Michael C Zody; Jen Baldwin; Toby Bloom; Chee Whye Chin; Dave Heiman; Robert Nicol; Chad Nusbaum; Sarah Young; Jane Wilkinson; Kim C Worley; Christie L Kovar; Donna M Muzny; Richard A Gibbs; Andrew Cree; Huyen H Dihn; Gerald Fowler; Shalili Jhangiani; Vandita Joshi; Sandra Lee; Lora R Lewis; Lynne V Nazareth; Geoffrey Okwuonu; Jireh Santibanez; Wesley C Warren; Elaine R Mardis; George M Weinstock; Richard K Wilson; Kim Delehaunty; David Dooling; Catrina Fronik; Lucinda Fulton; Bob Fulton; Tina Graves; Patrick Minx; Erica Sodergren; Ewan Birney; Elliott H Margulies; Javier Herrero; Eric D Green; David Haussler; Adam Siepel; Nick Goldman; Katherine S Pollard; Jakob S Pedersen; Eric S Lander; Manolis Kellis
Journal:  Nature       Date:  2011-10-12       Impact factor: 49.962

9.  GENCODE: producing a reference annotation for ENCODE.

Authors:  Jennifer Harrow; France Denoeud; Adam Frankish; Alexandre Reymond; Chao-Kung Chen; Jacqueline Chrast; Julien Lagarde; James G R Gilbert; Roy Storey; David Swarbreck; Colette Rossier; Catherine Ucla; Tim Hubbard; Stylianos E Antonarakis; Roderic Guigo
Journal:  Genome Biol       Date:  2006-08-07       Impact factor: 13.583

10.  Efficient and accurate P-value computation for Position Weight Matrices.

Authors:  Hélène Touzet; Jean-Stéphane Varré
Journal:  Algorithms Mol Biol       Date:  2007-12-11       Impact factor: 1.405
