Stephanie A Rosse, Paul L Auer, Christopher S Carlson.
Abstract
Most cancer-associated genetic variants identified from genome-wide association studies (GWAS) do not obviously change protein structure, leading to the hypothesis that the associations are attributable to regulatory polymorphisms. Translating genetic associations into mechanistic insights can be facilitated by knowledge of the causal regulatory variant (or variants) responsible for the statistical signal. Experimental validation of candidate functional variants is onerous, making bioinformatic approaches necessary to prioritize candidates for laboratory analysis. Thus, a systematic approach for recognizing functional (and, therefore, likely causal) variants in noncoding regions is an important step toward interpreting cancer risk loci. This review provides a detailed introduction to current regulatory variant annotations, followed by an overview of how to leverage these resources to prioritize candidate functional polymorphisms in regulatory regions.
Keywords: bioinformatics; functional follow-up; regulatory prediction; variant annotation
Year: 2014 PMID: 25288875 PMCID: PMC4179605 DOI: 10.4137/CIN.S13789
Source DB: PubMed Journal: Cancer Inform ISSN: 1176-9351
Figure 1. Tools for bioinformatic annotation.
Note: *Combined evidence across histone modification, open chromatin, ChIP-seq protein and motif annotations provides finer demarcation of functional elements and stronger evidence for the regulatory potential.
Description of bioinformatics tools used for functional follow-up of noncoding regions.
| DATASET | GENOMIC CLASS | DESCRIPTION | DATA SOURCE/PROGRAM |
|---|---|---|---|
| 1 | Non-synonymous coding | Exonic positions leading to amino acid replacement | Nonsense-Ensembl |
| 2 | Promoter | 1kb regions upstream of annotated transcription start sites | RefSeq, UCSC Genome Browser |
| 3 | TFBS | Transcription factor binding sites (TFBS) predicted in promoter & non-promoter regulatory elements | UCSC Genome Browser |
| 4 | Non-coding RNA | All types of experimentally supported non-coding RNA, including microRNAs | RNAdb 2.0 |
| 5 | MicroRNA target site | Computationally predicted microRNA target sites within 3′ UTRs | TargetScanS 5.2 |
| 6 | Enhancer element | Experimentally supported enhancer elements in any tissue | VISTA Enhancer Browser, UCSC Table Browser |
| 7 | Candidate non-specific regulatory element | Open chromatin loci in at least one human cell type, as assessed by DNase I hypersensitivity (DHS) mapping | UCSC Table Browser |
| 8 | Insulator elements | CTCF binding sites assessed by ChIP-seq technology | UCSC Table Browser |
| 9 | eQTL | Allele-specific differences in expression levels | GTEx eQTL Browser |
| 10 | Conserved Element | | UCSC Table Browser |
| 11 | Splice Site | | BDGP |
Note: PWM-scan was applied using position weight matrices (PWMs) from the TRANSFAC database.
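As an illustration of what a PWM scan involves, the following minimal Python sketch converts a count matrix into log-odds scores, scans a sequence for the best-scoring window, and compares the reference and alternate alleles of a variant. It is not the PWM-scan program itself. The count matrix, the flanking sequence, the uniform background of 0.25, the pseudocount, and all function names are assumptions made for the example, and no TRANSFAC matrix is reproduced here.

```python
import math

BASES = "ACGT"

# Hypothetical 4-position count matrix favouring the motif TGCA.
# The counts are invented for illustration and are not a TRANSFAC entry.
TOY_COUNTS = [
    {"A": 1, "C": 1, "G": 1, "T": 17},
    {"A": 1, "C": 1, "G": 17, "T": 1},
    {"A": 1, "C": 17, "G": 1, "T": 1},
    {"A": 17, "C": 1, "G": 1, "T": 1},
]


def log_odds_matrix(counts, pseudocount=1.0, background=0.25):
    """Convert per-position base counts into log2 odds against a uniform background."""
    pwm = []
    for column in counts:
        total = sum(column[b] for b in BASES) + pseudocount * len(BASES)
        pwm.append({
            b: math.log2(((column[b] + pseudocount) / total) / background)
            for b in BASES
        })
    return pwm


def score_window(pwm, window):
    """Sum the log-odds contribution of each base in a window as wide as the PWM."""
    return sum(column[base] for column, base in zip(pwm, window))


def best_hit(pwm, sequence):
    """Return (score, offset) of the best forward-strand window; input must be upper-case ACGT."""
    width = len(pwm)
    if len(sequence) < width:
        raise ValueError("sequence is shorter than the motif")
    return max(
        (score_window(pwm, sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    )


def allele_effect(pwm, left_flank, ref, alt, right_flank):
    """Best-hit score with the reference allele minus best-hit score with the alternate allele."""
    ref_score, _ = best_hit(pwm, left_flank + ref + right_flank)
    alt_score, _ = best_hit(pwm, left_flank + alt + right_flank)
    return ref_score - alt_score


toy_pwm = log_odds_matrix(TOY_COUNTS)
toy_hit = best_hit(toy_pwm, "AATGCAAA")
toy_delta = allele_effect(toy_pwm, "AAT", "G", "A", "CAAA")
```

A positive `toy_delta` means the alternate allele weakens the best predicted site. A fuller scan would also score the reverse strand and use an empirical background, both omitted here for brevity.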
Figure 3. DHS footprinting annotation.
Notes: The rs6983267 variant is a laboratory-confirmed functional variant in a distal MYC enhancer associated with colorectal cancer risk. Within the DHS peaks (shown in green) and the broad ChIP-seq signal for TCF4 (shown in red), there is a slight valley between two higher DHS peaks. This valley corresponds to the region that actually binds TCF4 and can be detected through DHS footprinting (shown in blue). Motif analysis reveals that rs6983267 falls within the conserved binding motif of TCF4. Because a number of TFs bind to this region, as is the case with many enhancers, the valley is less pronounced than would be expected for a strong regulatory element binding a single TF.
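The "valley between two higher DHS peaks" can be made concrete with a simple depletion score that compares cleavage in a central window with cleavage in its flanks. The sketch below is a simplified stand-in under stated assumptions, not the algorithm behind the footprinting track in the figure. The per-base cut counts are invented, and the function names, window sizes, and the +1 stabilizer are illustrative choices.

```python
def footprint_depletion(cuts, start, width, flank):
    """Mean flank cleavage divided by (mean center cleavage + 1); values above 1 suggest a valley."""
    if start - flank < 0 or start + width + flank > len(cuts):
        raise ValueError("window plus flanks must fit inside the signal")
    center = cuts[start:start + width]
    flanks = cuts[start - flank:start] + cuts[start + width:start + width + flank]
    center_mean = sum(center) / len(center)
    flank_mean = sum(flanks) / len(flanks)
    # Adding 1 to the denominator avoids division by zero in fully protected windows.
    return flank_mean / (center_mean + 1.0)


def strongest_footprint(cuts, width, flank):
    """Slide the window across the signal and return (score, start) of the deepest valley."""
    last_start = len(cuts) - width - flank
    return max(
        (footprint_depletion(cuts, s, width, flank), s)
        for s in range(flank, last_start + 1)
    )


# Invented per-base DNase I cut counts: two shoulders around a protected stretch.
toy_cuts = [3, 9, 10, 11, 2, 1, 2, 1, 10, 12, 9, 4, 3]
toy_score, toy_start = strongest_footprint(toy_cuts, width=4, flank=3)
```

With this kind of score, a region bound by several TFs, as described for this enhancer, would be expected to give a ratio closer to 1 than a site occupied by a single strongly bound factor.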
Figure 2. Bioinformatics annotation of a promoter element using UCSC Genome Browser.
Notes: Figure 2 shows bioinformatics annotation using the UCSC Genome Browser for the 8q24 index SNP rs6983267. In the first panel, the index SNP is shown to be located in an intergenic region over 200 kb upstream of MYC. The Roadmap tracks in the second panel zoom in on 8q24, showing the genomic position chr8:128,406,826–128,419,067. The chromosomal positions of three variants in LD (r² > 0.8) with rs6983267 are shown in the first custom BED file track. The next two plots are biological replicates of normal rectal enhancer ChIP-seq signal. The following two show DHS peak enrichment in fetal intestine. Shown below the DHS plots are genomic footprinting tracks, and below those is the aggregate ChIP-seq signal from 161 TFs. Genomic segmentation by ChromHMM in normal colon and rectal tissue labels the region containing rs6983267 as an enhancer. In the third panel, we see strong TCF7L2 and CTCF binding across the rs6983267 variant. While rs6983267 has a strong bioinformatic functional annotation, the other two variants in strong LD, rs10505477 and rs12682374, do not align to predicted regulatory elements. In vitro evaluation of this region has shown that rs6983267 is a functional variant that disrupts the binding of TCF7L2 and expression of the target gene MYC (refs. 56, 84).
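The comparison made in this caption, an index SNP and its LD proxies checked against several regulatory tracks, can be sketched as a simple interval-overlap tally. The Python example below is a minimal sketch under stated assumptions. The variant names, coordinates, r² values, and track intervals are placeholders rather than the real 8q24 data, and the helper names are hypothetical. Intervals are assumed to follow the BED convention (0-based, half-open), while variant positions are 1-based.

```python
from collections import namedtuple

# BED-style interval: 0-based start, half-open end.
Interval = namedtuple("Interval", "chrom start end")


def overlaps(interval, chrom, pos):
    """True if the 1-based position pos lies inside the 0-based half-open interval."""
    return interval.chrom == chrom and interval.start < pos <= interval.end


def annotate(variants, tracks, ld_r2, min_r2=0.8):
    """For each variant with r2 above min_r2, list the annotation tracks that cover it."""
    results = {}
    for name, (chrom, pos) in variants.items():
        if ld_r2.get(name, 0.0) <= min_r2:
            continue  # not in strong LD with the index SNP
        results[name] = [
            track for track, intervals in tracks.items()
            if any(overlaps(iv, chrom, pos) for iv in intervals)
        ]
    return results


def rank(results):
    """Order variants by number of supporting tracks (most first), then by name."""
    return sorted(results, key=lambda name: (-len(results[name]), name))


# Placeholder data: positions and r2 values are invented for illustration.
toy_variants = {
    "index_snp": ("chr8", 1500),
    "ld_snp_1": ("chr8", 900),
    "ld_snp_2": ("chr8", 2600),
    "weak_ld_snp": ("chr8", 1490),
}
toy_r2 = {"index_snp": 1.0, "ld_snp_1": 0.92, "ld_snp_2": 0.85, "weak_ld_snp": 0.3}
toy_tracks = {
    "DHS": [Interval("chr8", 1400, 1600)],
    "enhancer_histone": [Interval("chr8", 1000, 2000)],
    "TF_ChIP": [Interval("chr8", 1450, 1550)],
}

toy_results = annotate(toy_variants, toy_tracks, toy_r2)
toy_ranking = rank(toy_results)
```

A tally of this kind only orders candidates for follow-up. The variant with the most supporting tracks is not guaranteed to be the causal one.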
Examples of variants with laboratory follow-up.
| VARIANT | REGULATORY REGION | GENE | TRAIT | PUBLICATION | YEAR | BEFORE GWAS? |
|---|---|---|---|---|---|---|
| rs6983267 | Distal Enhancer 300 kb downstream | | Colorectal Cancer | Pomerantz et al. | 2009 | Follow-up |
| rs16888589 | Enhancer 20 kb downstream | | Colorectal Cancer | Pittman et al. | 2010 | Follow-up |
| rs10822013 | Intronic | | Breast Cancer | Cai et al. | 2011 | Follow-up |
| rs2735940 | Promoter | | Telomere Length | Matsubara et al. | 2006 | Pre-GWAS |
| rs2735940 | Promoter | | Acute Lymphoblastic Leukemia (ALL) | Sheng et al. | 2013 | Follow-up |
| rs2736108 | Promoter | | Ovarian and Breast Cancer | Beesley et al. | 2011 | Follow-up |
| rs1512268 | Enhancer 14 kb downstream | | Prostate Cancer | Akamatsu et al. | 2010 | Follow-up |
| rs6913578 | Regulates unknown target | | Breast Cancer | Cai et al. | 2011 | Follow-up |
| rs4784227 | Possible enhancer of | | Breast Cancer | Long et al. | 2010 | Follow-up |
| rs2239632 | Promoter | | ALL | Ryoo et al. | 2013 | Follow-up |
| rs11730582 | Promoter | | Gastric Cancer | Zhao et al. | 2012 | Follow-up |
| rs11730582 | Promoter | | Melanoma | Schultz et al. | 2009 | Follow-up |
| rs4590952 | Intronic | | Testicular Cancer | Zeron-Medina et al. | 2013 | Follow-up |
| rs944289 | Enhancer | | Papillary thyroid carcinoma (PTC) | Jendrzejewski et al. | 2012 | Follow-up |
| rs1859961 | Distal Enhancer 1 Mb upstream | | Prostate Cancer | Zhang et al. | 2012 | Follow-up |
| rs12194974 | Promoter | | Ovarian Cancer | Permuth-Wey et al. | 2011 | Follow-up |
| rs8506 | miRNA binding in exon | | Gastric Cancer | Fan et al. | 2014 | Follow-up |
| rs10993994 | Promoter | | NA | Buckland et al. | 2005 | Pre-GWAS |
| rs10993994 | Promoter | | Prostate Cancer | Chang et al. | 2009 | Follow-up |
| rs10993994 | Promoter | | Prostate Cancer | Lou et al. | 2009 | Follow-up |
Figure 4. Limitation of bioinformatics annotation.
Notes: Figure 4 outlines an example where the variant with the strongest annotation (shown in red) is not the underlying functional variant (shown in green). The rs16892766 locus is located in an intergenic region over 20 kb downstream of EIF3H. Although both rs16888589 and rs16892766 fall in a poised enhancer based on histone modification, DHS, ChromHMM segmentation, and footprinting evidence, only the index SNP rs16892766 falls in the highly conserved ChIP-seq binding site for the three TFs shown. Laboratory follow-up of this locus revealed that the true underlying causal variant is rs16888589 and not the index SNP, even though the index SNP had the stronger bioinformatic annotation. This illustrates the importance of laboratory follow-up, particularly when there is more than one predicted functional variant.