Jeffrey M Bhasin, Angela H Ting.
Abstract
Bioinformatic analysis often produces large sets of genomic ranges that can be difficult to interpret in the absence of genomic context. Goldmine annotates genomic ranges from any source with gene model and feature contexts to facilitate global descriptions and candidate loci discovery. We demonstrate the value of genomic context by using Goldmine to elucidate context dynamics in transcription factor binding and to reveal differentially methylated regions (DMRs) with context-specific functional correlations. The open source R package and documentation for Goldmine are available at http://jeffbhasin.github.io/goldmine.
Year: 2016 PMID: 27257071 PMCID: PMC4937336 DOI: 10.1093/nar/gkw477
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Goldmine automates the annotation of gene model and feature contexts for any set of genomic ranges. (A) Schematic of Goldmine's annotation approach. For gene context annotation, promoter and gene 3′ end regions are user-specified flanks surrounding annotated transcription start and end sites from gene databases that the tool can automatically download and synchronize. In cases of overlapping contexts, regions are classified using the priority order of promoter > 3′ end > exon > intron > intergenic. For feature contexts, Goldmine can take as input any number of user-specified feature sets of ranges or automatically download any table from the UCSC genome browser, including the ENCODE supertracks and GWAS catalog, and reports the percent overlap with these sets. (B) Proportion of ENCODE supertrack ChIP-seq peaks that annotate into the Goldmine gene contexts defined in (A). Each row is a proportional bar graph for an individual factor. (C) The proportion of REST ChIP-seq peaks across the named cell lines within each Goldmine gene model context. The total number of peaks for the factor in a cell line is given in the column next to the graph. (D) Each heatmap square is valued with the fraction of binding sites for CTCF that overlap with each co-binding partner given on the heatmap rows. Fractional overlaps are computed between the unions of all peaks across all available cell lines in ENCODE for each factor. Each column stratifies this relationship across the Goldmine genomic contexts.
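The priority rule in panel (A) can be illustrated with a minimal Python sketch. This is not Goldmine's R implementation: the function names, the half-open coordinate convention, and the toy gene model below are all assumptions made for illustration, and the sketch assumes a range is assigned to a context on any overlap.

```python
from typing import Dict, List, Tuple

# Assumption: half-open [start, end) intervals on a single chromosome.
Interval = Tuple[int, int]

# Priority order taken from the caption; "intergenic" is the fallback.
PRIORITY = ["promoter", "3' end", "exon", "intron"]


def overlaps(a: Interval, b: Interval) -> bool:
    """True when two half-open intervals share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def classify(query: Interval, contexts: Dict[str, List[Interval]]) -> str:
    """Return the highest-priority context that the query overlaps."""
    for name in PRIORITY:
        if any(overlaps(query, iv) for iv in contexts.get(name, [])):
            return name
    return "intergenic"


# Hypothetical toy gene model (coordinates are invented for illustration).
contexts = {
    "promoter": [(900, 1100)],
    "3' end": [(4900, 5100)],
    "exon": [(1000, 1500), (4000, 5000)],
    "intron": [(1500, 4000)],
}

for query in [(1050, 1200), (1400, 1600), (2000, 2100), (6000, 6100)]:
    print(query, classify(query, contexts))
```

A range that touches both an exon and an intron, such as (1400, 1600) here, resolves to exon because exon ranks higher in the priority list.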
Figure 2. Goldmine gene annotation links genomic ranges to known gene models. (A) DMRs were detected between CD4+ and CD8+ T cells. Each heatmap row represents a DMR, each column a donor, and each value the fold change between the two cell types for paired samples from a given donor. (B) Percent of DMRs that fall in gene model contexts as compared to a length-matched random genomic null region set. (C) Proportion of DMRs between CD4+ and CD8+ T cells that overlap with CpG island-centric features as reported by Goldmine. CpG islands are annotated in the ‘cpgIslandExt’ table of the UCSC genome browser, shores are ±2 kb from these islands, and shelves are ±2 kb from shores. (D) Proportion of DMRs between CD4+ and CD8+ T cells that overlap with ENCODE ChIP-seq peaks (‘TFBS’) or DNaseI hypersensitive sites (‘DNaseI’) as reported by Goldmine. (E) Regional perspective of a promoter DMR for key lineage factor gene CD4 that was identified using Goldmine's annotation. (F) GO term enrichment for promoter DMR genes. ENSEMBL gene IDs were directly copied from Goldmine's gene-level table and pasted into GeneMANIA (http://www.genemania.org/). (G) An intergenic CD4+ hypermethylation DMR (chr8:2,162,901-2,163,500) with hypothesized function based on Goldmine annotation. This DMR correlates with the activity of an enhancer as predicted by ChromHMM segmentation and the presence of H3K27ac (data from the Roadmap Epigenomics Project). A cluster of ENCODE ChIP-seq peaks (‘TFBS’) and a DNaseI hypersensitive site (‘DNaseI’) that overlap with the DMR as reported by Goldmine are shown. (H) Variable enrichment of ENCODE supertrack ChIP-seq peaks in CD4+ hypermethylation DMRs across the contexts as compared to when the DMR set is not stratified by context (‘All’). The background set used for the enrichment calculation is the set of all-methylated regions genome-wide that also fall in the given genomic context.
Significance was assigned when >5% of base pairs in a DMR overlapped with peaks of a given factor, and the lower bound of the 95% confidence interval (CI) of the odds ratio between the DMRs and all non-DMR methylated regions was above 1. Non-significant comparisons are plotted as white, and significant comparisons are colored by the value of the lower bound of the 95% CI of the odds ratio.
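The significance rule for panel (H) can be sketched in Python as follows. The caption does not say how the confidence interval was computed, so this sketch substitutes a Wald interval on the log odds ratio; that choice is an assumption, not the authors' method. It also assumes that the 5% base-pair threshold decides whether an individual region counts as overlapping a factor. All function names and counts are hypothetical.

```python
import math
from typing import List, Tuple

Interval = Tuple[int, int]  # assumption: half-open [start, end)


def covered_fraction(region: Interval, peaks: List[Interval]) -> float:
    """Fraction of the region's base pairs covered by the union of peaks."""
    start, end = region
    clipped = sorted(
        (max(s, start), min(e, end)) for s, e in peaks if s < end and e > start
    )
    covered, cur_start, cur_end = 0, None, None
    for s, e in clipped:
        if cur_end is None or s > cur_end:
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = s, e
        else:
            cur_end = max(cur_end, e)  # merge overlapping peaks
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / (end - start)


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.959964):
    """Odds ratio with a Wald 95% CI (assumed method; all counts must be > 0).

    a: DMRs overlapping the factor      b: DMRs not overlapping
    c: background regions overlapping   d: background regions not overlapping
    """
    odds_ratio = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


# Toy example: a region counts as overlapping when > 5% of its bases are covered.
overlapping = covered_fraction((0, 100), [(10, 14), (12, 18)]) > 0.05

# Hypothetical counts, invented for illustration only.
odds_ratio, lower, upper = odds_ratio_ci(a=40, b=60, c=100, d=900)
significant = lower > 1
print(overlapping, round(odds_ratio, 2), significant)
```

Coloring by the lower bound rather than the point estimate, as the caption describes, keeps factors with few overlapping regions and wide intervals from dominating the heatmap.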