Yu Gyoung Tak, Peggy J. Farnham
Abstract
Considerable progress towards an understanding of complex diseases has been made in recent years due to the development of high-throughput genotyping technologies. Using microarrays that contain millions of single-nucleotide polymorphisms (SNPs), genome-wide association studies (GWASs) have identified SNPs that are associated with many complex diseases or traits. For example, as of February 2015, 2111 association studies had identified 15,396 SNPs for various diseases and traits, and the number of identified SNP-disease/trait associations has been increasing rapidly. However, it has been difficult for researchers to understand disease risk from GWAS results, because most GWAS-identified SNPs are located in non-coding regions of the genome. It is important to consider that the GWAS-identified SNPs serve only as representatives for all SNPs in the same haplotype block, and it is equally likely that other SNPs in high linkage disequilibrium (LD) with the array-identified SNPs are causal for the disease. Because it was hoped that disease-associated coding variants would be identified if the true causal SNPs were known, investigators have expanded their analyses using LD calculations and fine-mapping. However, such analyses also identified risk-associated SNPs located in non-coding regions. Thus, the GWAS field has been left with the conundrum of how a single-nucleotide change in a non-coding region could confer increased risk for a specific disease. One possible answer to this puzzle is that the variant SNPs cause changes in gene expression levels rather than changes in protein function.
This review describes (1) advances in genomic and epigenomic approaches that incorporate functional annotation of regulatory elements to prioritize disease risk-associated SNPs located in non-coding regions of the genome for follow-up studies, (2) computational tools that aid in identifying gene expression changes caused by non-coding disease-associated SNPs, and (3) experimental approaches to identify target genes of, and study the biological phenotypes conferred by, non-coding disease-associated SNPs.
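The LD expansion step mentioned in the abstract amounts to comparing each nearby SNP with the index SNP and keeping those whose r² exceeds a cutoff (r² > 0.8 is the threshold several of the annotation programs listed here use). The Python sketch below is illustrative only: it assumes unphased genotype dosages (0, 1 or 2 copies of the alternate allele) measured in the same individuals, and it uses the squared Pearson correlation of dosages as an approximation to the haplotype-based r² that LD tools usually report. The function names, SNP IDs and genotypes are hypothetical.

```python
def genotype_r2(g1: list[int], g2: list[int]) -> float:
    """Squared Pearson correlation of two SNPs' allele dosages (0, 1 or 2).

    Approximates haplotype-based LD r^2 without requiring phased data.
    """
    n = len(g1)
    if n != len(g2) or n < 2:
        raise ValueError("genotype vectors must have equal length >= 2")
    m1, m2 = sum(g1) / n, sum(g2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(g1, g2))
    var1 = sum((a - m1) ** 2 for a in g1)
    var2 = sum((b - m2) ** 2 for b in g2)
    if var1 == 0 or var2 == 0:
        return 0.0  # a monomorphic SNP carries no LD information; treat as unlinked
    return cov * cov / (var1 * var2)


def expand_index_snp(index_id: str, genotypes: dict[str, list[int]],
                     threshold: float = 0.8) -> list[str]:
    """Return IDs of SNPs whose r^2 with the index SNP exceeds the threshold."""
    index_g = genotypes[index_id]
    return sorted(
        snp for snp, g in genotypes.items()
        if snp != index_id and genotype_r2(index_g, g) > threshold
    )


# Hypothetical dosages for six individuals.
example = {
    "rsA": [0, 1, 2, 2, 1, 0],  # index SNP
    "rsB": [0, 1, 2, 2, 1, 0],  # identical pattern: r^2 = 1
    "rsC": [2, 1, 0, 0, 1, 2],  # mirror image: r = -1, so r^2 = 1
    "rsD": [0, 0, 1, 2, 0, 1],  # weakly correlated: r^2 = 0.3
}
print(expand_index_snp("rsA", example))  # ['rsB', 'rsC']
```

Because r² is the square of the correlation, a SNP in perfect negative correlation with the index SNP (rsC) is retained just like a perfectly positively correlated one.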
Keywords: Enhancers; GWAS; Genome engineering; Non-coding SNPs
Year: 2015 PMID: 26719772 PMCID: PMC4696349 DOI: 10.1186/s13072-015-0050-4
Source DB: PubMed Journal: Epigenetics Chromatin ISSN: 1756-8935 Impact factor: 4.954
Fig. 1 Making sense of GWAS: an overview. Shown is a flow chart of analytical and experimental steps that can be followed to understand how a non-coding SNP can be associated with an increased risk for a specific disease. Index SNPs are identified using GWAS arrays and then expanded to a larger set of SNPs (termed Refined Associated SNPs) using LD scores and fine-mapping. These Refined Associated SNPs are then prioritized using functional annotation to identify Regulatory SNPs (Reg SNPs) or linkage to allele-specific gene expression to identify eQTL SNPs, producing a set of Candidate Functional SNPs. The Candidate Functional SNPs can either be studied directly or further refined by testing the Regulatory SNPs for possible SNP-RNA linkages or by testing the eQTL SNPs for functional annotation. If a Candidate Functional SNP (yellow arrowhead) lies within a distal regulatory element, it can be deleted or modified using genomic nucleases or epigenomic toggle switches (Approach A); putative target genes are then identified using RNA-seq. Distal regulatory elements that cause changes in gene expression when deleted or modified can then be studied using allele-specific analyses (Approach B); promoters harboring risk-associated SNPs (pink arrowhead) can be directly studied using Approach B. As described in the text, cells deleted for the distal regulatory elements can be used to identify an appropriate phenotypic assay for analysis of the candidate target genes. Then, the genes that show expression changes linked to distal SNPs and the genes regulated by the promoter SNPs can be studied using those biological assays to identify possible therapeutic targets and/or candidates for diagnostic tests. Finally, looping assays can be performed to distinguish direct from indirect targets of the distal regulatory elements.
It is important to note that a gene whose expression is indirectly affected by a non-coding SNP could be a more important diagnostic or therapeutic target than the directly affected gene.
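The prioritization step in Fig. 1 can be expressed as set operations on SNP identifiers. In this hypothetical Python sketch, `regulatory` stands for SNPs that overlap annotated regulatory elements and `eqtl` for SNPs with eQTL support; Candidate Functional SNPs are taken as the union of the two subsets of Refined Associated SNPs, and SNPs supported by both lines of evidence correspond to the optional cross-check described in the caption. The function name, inputs and rsIDs are assumptions for illustration.

```python
def prioritize(refined: set[str], regulatory: set[str],
               eqtl: set[str]) -> tuple[list[str], list[str]]:
    """Split Refined Associated SNPs into Candidate Functional SNPs.

    refined    -- SNPs after LD expansion and fine-mapping
    regulatory -- SNPs overlapping annotated regulatory elements (hypothetical input)
    eqtl       -- SNPs linked to allele-specific expression (hypothetical input)
    Returns (candidates, supported_by_both), each sorted.
    """
    reg_snps = refined & regulatory      # "Regulatory SNPs"
    eqtl_snps = refined & eqtl           # "eQTL SNPs"
    candidates = reg_snps | eqtl_snps    # either line of evidence suffices
    both = reg_snps & eqtl_snps          # supported by both annotations
    return sorted(candidates), sorted(both)


refined = {"rs1", "rs2", "rs3", "rs4"}
candidates, both = prioritize(refined, {"rs1", "rs2", "rs9"}, {"rs2", "rs3"})
print(candidates)  # ['rs1', 'rs2', 'rs3']
print(both)        # ['rs2']
```

Note that rs9 is ignored even though it is annotated as regulatory, because it was never among the Refined Associated SNPs.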
Publicly available functional annotation programs
| Tool | Type | Minimum input | Output | Epigenetic annotation file used | URL | PMID |
|---|---|---|---|---|---|---|
| HaploReg | Web server | rsID | Overlapping annotated features and TF motif disruption information for SNPs (input) and correlated SNPs with r² > 0.8 | ChromHMM, DNase-seq, a library of position weight matrices (PWMs) from TRANSFAC, JASPAR, and protein-binding microarrays (PBMs), and eQTL | | 22064851 |
| RegulomeDB | Web server | rsID | Overlapping annotated features for SNPs (input), with scores that depend on the combination of overlapping annotated features, and a UCSC Genome Browser view showing the overlapping features | TF binding, DNase-seq, FAIRE, DNase footprinting, eQTL, dsQTL, ChIP-exo, and DNA methylation | | 22955989 |
| FORGE | Web server | rsID | Overlapping DNase I hotspots for SNPs (input) | DNase I hotspots | | |
| rSNPBase | Web server | rsID or gene name | Proximal or distal transcriptional regulation, miRNA regulation, RNA-binding protein (RBP)-mediated regulation, and eQTL results for SNPs (input) and correlated SNPs (r² > 0.8) | Histone modifications, TF binding, CpG islands, RBP, and miRNA data | | 24285297 |
| FunciSNP | R package | Tab-delimited file of GWAS index SNP information (chrom:position, rsID, population); user-defined biofeature information in .bed format | Overlapping annotated features for index SNPs (input) and correlated SNPs, with user-defined r² thresholds | Any biofeature annotation in .bed format | | 22684628 |
| GREGOR | Perl package | A single-column file of index SNPs; user-defined biofeature information in .bed format | Prioritized variants based on overlap with selected regulatory regions; enrichment analysis with P values showing whether index SNPs or correlated SNPs are enriched in an annotated feature compared with control SNPs | Any biofeature annotation in .bed format | | 25886982 |
| Enlight | Web server | rsID | Plots showing LD and overlapping annotated features for SNPs (input) | ChromHMM, histone modifications, DNA methylation, TF binding, eQTL, Hi-C, or a custom BED file of biofeatures | | 25262152 |
| GWAS3D | Web server | rsID | TF motif analysis and overlapping annotated features for SNPs (input) | 5C, Hi-C, ChIA-PET, ChromHMM, H3K27Ac, p300, CTCF, DHS (with an option to select cell lines relevant to the disease) | | 23723249 |
| motifbreakR | R package | SNP information in .bed or .vcf format | Comprehensive TF binding site disruption at SNPs (input) | TF motif information from ScerTF, FlyFactorSurvey, hPDI, UniPROBE, JASPAR, ENCODE, HOMER, Factorbook, and HOCOMOCO | | 26272984 |
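Several of these programs (for example FunciSNP and GREGOR) accept user-supplied annotations in .bed format, and the core question they ask is whether a SNP falls inside an annotated interval. Below is a minimal Python sketch of that overlap test; it is not the code of any listed tool. It assumes BED's 0-based, half-open coordinates and 1-based SNP positions (as in dbSNP and VCF), merges overlapping intervals per chromosome, and uses binary search for each lookup. The function names, intervals and positions are hypothetical.

```python
import bisect
from collections import defaultdict


def load_bed(lines):
    """Parse BED records into merged, sorted intervals per chromosome.

    BED intervals are 0-based and half-open: [start, end).
    Returns {chrom: (starts, ends)} with non-overlapping intervals.
    """
    raw = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank, comment and header lines
        chrom, start, end = line.split()[:3]
        raw[chrom].append((int(start), int(end)))
    merged = {}
    for chrom, ivs in raw.items():
        ivs.sort()
        out = [list(ivs[0])]
        for s, e in ivs[1:]:
            if s <= out[-1][1]:          # overlapping or touching: extend
                out[-1][1] = max(out[-1][1], e)
            else:
                out.append([s, e])
        merged[chrom] = ([s for s, _ in out], [e for _, e in out])
    return merged


def snp_overlaps(bed, chrom, pos1):
    """True if a SNP at 1-based position pos1 lies inside any BED interval."""
    if chrom not in bed:
        return False
    starts, ends = bed[chrom]
    p0 = pos1 - 1                              # 1-based -> 0-based coordinate
    i = bisect.bisect_right(starts, p0) - 1    # last interval starting at or before p0
    return i >= 0 and p0 < ends[i]


# Hypothetical enhancer annotations.
bed = load_bed([
    "chr6\t100\t200\tenhancer_1",
    "chr6\t150\t300\tenhancer_2",
    "chr6\t500\t600\tenhancer_3",
])
print([snp_overlaps(bed, "chr6", p) for p in (100, 101, 300, 301, 550)])
# [False, True, True, False, True]
```

The off-by-one conversion is the detail most worth checking in practice: BED start 100 covers 1-based position 101 but not 100.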
Fig. 2 Prioritizing SNPs using functional annotation. Shown is a figure produced using the Enlight program. a Shown is an index SNP (rs2071278, indicated by the purple diamond) for rheumatoid arthritis and correlated SNPs within ±20 kb; the high-LD SNPs (r² > 0.8) are indicated in orange. b Shown is the ChromHMM segmentation for the region, with the colors (defined in the inset box) indicating the different chromatin states in the blood cell lines GM12878 and K562; note that the high-LD SNPs fall into enhancer categories (yellow bars). c Shown are the genes within the region. d Shown is an eQTL plot with scores based on −log10 P values, taken from the UChicago eQTL browser. e Shown are H3K27Ac and DNase-seq data for GM12878 and K562 and the TF ChIP-seq track from the ENCODE browser.
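Panels a and d rest on two simple operations: restricting attention to SNPs within a window (±20 kb here) around the index SNP, and converting association P values into −log10 P scores so that smaller P values plot higher. The Python sketch below illustrates both under stated assumptions: the function names are hypothetical, all SNPs are assumed to be on the index SNP's chromosome, and the positions and P values are invented (the index position is not the actual coordinate of rs2071278).

```python
import math


def neg_log10_p(p: float) -> float:
    """Convert a P value into the -log10(P) score plotted in eQTL tracks."""
    if not 0.0 < p <= 1.0:
        raise ValueError("P value must lie in (0, 1]")
    return -math.log10(p)


def eqtl_track(index_pos: int, snps, window: int = 20_000):
    """Return (position, -log10 P) for SNPs within +/- window bp of the index SNP.

    snps -- iterable of (position, p_value) pairs on the index SNP's chromosome.
    """
    return [
        (pos, neg_log10_p(p))
        for pos, p in sorted(snps)
        if abs(pos - index_pos) <= window
    ]


# Hypothetical positions and P values around a hypothetical index position.
track = eqtl_track(1_000_000, [(985_000, 1e-3), (1_010_000, 1e-8), (1_030_000, 1e-12)])
for pos, score in track:
    print(pos, round(score, 2))
```

The SNP at 1,030,000 is dropped despite its very small P value because it lies 30 kb from the index SNP, outside the plotted window.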
Sources of eQTL databases
| Tool | Features | URL | PMID |
|---|---|---|---|
| NCBI eQTL browser | cis-eQTLs from liver, lymphoblastoid cells, and brain | | |
| seeQTL | Browser for cis- and trans-eQTLs from lymphoblastoid cells, brain, and monocytes | | 22171328 |
| Chicago eQTL | QTLs (eQTL, dsQTL, trQTL, exonQTL) from lymphoblastoid cells, brain, liver, fibroblasts, and T cells | | |
| GTEx Portal | eQTL data from >60 tissues and an eQTL IGV browser | | 25954001 |
| GeneVar | eQTL and meQTL data from >5 tissues, with visualization | | 20702402 |
| Blood eQTL | Blood cis- and trans-eQTLs | | 24013639 |
| Geuvadis | QTLs (eQTL, mirQTL, trQTL) from lymphoblastoid cells | | 24037378 |
mirQTL, miRNA QTL; trQTL, transcript ratio QTL; dsQTL, DNase I sensitivity QTL.
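At its simplest, a cis-eQTL association of the kind collected in these databases is a regression of a gene's expression level on allele dosage at a nearby SNP. The Python sketch below fits ordinary least squares by hand and returns the slope (effect per allele), its t statistic and the degrees of freedom. It is an assumption-laden illustration, not the pipeline used by any listed resource, which typically adds covariates, expression normalization and multiple-testing correction; the function name and data are hypothetical.

```python
import math


def eqtl_regression(dosage, expression):
    """Regress expression on allele dosage (0, 1, 2) by ordinary least squares.

    Returns (slope, t_statistic, df): slope is the change in expression per
    copy of the alternate allele; df = n - 2.
    """
    n = len(dosage)
    if n != len(expression) or n < 3:
        raise ValueError("need at least 3 paired observations")
    mx = sum(dosage) / n
    my = sum(expression) / n
    sxx = sum((x - mx) ** 2 for x in dosage)
    if sxx == 0:
        raise ValueError("SNP is monomorphic in this sample")
    sxy = sum((x - mx) * (y - my) for x, y in zip(dosage, expression))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(dosage, expression))
    df = n - 2
    se = math.sqrt(rss / df / sxx)       # standard error of the slope
    t = slope / se if se > 0 else math.copysign(math.inf, slope)
    return slope, t, df


# Hypothetical dosages and normalized expression values for six individuals.
slope, t, df = eqtl_regression([0, 0, 1, 1, 2, 2], [1.0, 1.2, 2.0, 2.2, 3.0, 3.2])
print(round(slope, 3), round(t, 2), df)
```

A two-sided P value can then be obtained from a Student t distribution with `df` degrees of freedom, for example `2 * scipy.stats.t.sf(abs(t), df)` if SciPy is available; its −log10 is the score typically shown in eQTL browser plots.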