Daniel R Zerbino, Steven P Wilder, Nathan Johnson, Thomas Juettemann, Paul R Flicek.
Abstract
Most genomic variants associated with phenotypic traits or disease do not fall within gene coding regions but in regulatory regions, which makes them difficult to interpret. We collected public data on epigenetic marks and transcription factor binding in human cell types and used them to construct an intuitive summary of regulatory regions in the human genome. We verified this summary against independent assays for sensitivity. The Ensembl Regulatory Build will be progressively enriched as more data are made available. It is freely available on the Ensembl browser, from the Ensembl Regulation MySQL database server and in a dedicated track hub.
Year: 2015 PMID: 25887522 PMCID: PMC4407537 DOI: 10.1186/s13059-015-0621-5
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
Figure 1. The Regulatory Build process. In a first step we run segmentation software across multiple cell types. For each cell type and at each base pair, the genome is assigned a state, identified by an arbitrary number assigned by the segmentation software. We assign to each state a non-unique functional label, represented by its color on the browser, as shown at the top. For each state at each base pair, we compute the number of cell types sharing that state at that position, as shown in the center of the figure. Having selected relevant states and set some thresholds, we define regions of interest, which are the foundation of the regulatory build. These regions are then complemented with unannotated ChIP-Seq transcription factor binding site peaks and unannotated DNase1 hypersensitivity sites.
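The counting and thresholding steps described in the Figure 1 caption can be sketched in a few lines of Python. This is only a minimal illustration, not the Ensembl pipeline: the function names, the toy segmentation matrix and the threshold value of 2 are all assumptions made for the example, and the actual build selects its states and thresholds by its own criteria.

```python
import numpy as np


def count_cell_types_per_state(segmentations, n_states):
    """Count, for every state and position, how many cell types are in that state.

    segmentations: integer array of shape (n_cell_types, n_positions) holding
    the state number assigned to each base pair in each cell type.
    Returns an array of shape (n_states, n_positions).
    """
    seg = np.asarray(segmentations)
    counts = np.zeros((n_states, seg.shape[1]), dtype=int)
    for state in range(n_states):
        # Sum over cell types (axis 0) of the indicator "cell type is in this state".
        counts[state] = (seg == state).sum(axis=0)
    return counts


def regions_above_threshold(signal, threshold):
    """Return half-open (start, end) intervals where signal >= threshold."""
    mask = np.asarray(signal) >= threshold
    # Pad with False so that runs touching either end are still closed.
    padded = np.concatenate(([False], mask, [False]))
    changes = np.flatnonzero(padded[1:] != padded[:-1])
    # Changes alternate between run starts and run ends.
    return [(int(start), int(end)) for start, end in zip(changes[::2], changes[1::2])]


# Toy input (hypothetical): 3 cell types, 6 base pairs, states numbered 0 to 2.
toy_segmentations = [
    [0, 1, 1, 1, 0, 2],
    [0, 1, 1, 0, 0, 2],
    [1, 1, 1, 0, 0, 0],
]
counts = count_cell_types_per_state(toy_segmentations, n_states=3)
print(counts[1])                               # [1 3 3 1 0 0]
print(regions_above_threshold(counts[1], 2))   # [(1, 3)]
```

Here state 1 stands in for a "relevant" state; positions 1 and 2 are covered by at least two cell types, so they form the single region of interest.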
Figure 2. Experimental marks associated with different labels. This heatmap represents the experimental marks and the label associated with each state. The states were defined by Segway, and the labels assigned by the Ensembl Regulatory Build a posteriori. Although the label assignment relies mainly on overlaps with known features, the states with the same labels co-cluster based on their experimental marks. The main exceptions are the promoter flanking states, which cluster either with promoters or with distal cis-regulatory elements. In effect, these states tend to represent a mixture of the other two.
Summary details for the regulatory build in Ensembl release 76
| Feature type | Count | Mean length (bp) | | Total length (Mb) | Genome coverage |
|---|---|---|---|---|---|
| Promoters | 16,488 | 4,369 | 2,746 | 72 | 2.3% |
| Proximal enhancers | 85,526 | 1,876 | 1,741 | 160 | 5.2% |
| Distal enhancers | 127,786 | 547 | 482 | 70 | 2.3% |
| CTCF binding | 117,711 | 622 | 1,206 | 73 | 2.4% |
| Unannotated transcription factor binding site | 27,523 | 528 | 628 | 15 | 0.5% |
| Unannotated open chromatin | 71,568 | 502 | 346 | 36 | 1.2% |
| Total | 446,602 | | | 399 | 12.9% |
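The figures in each row can be cross-checked with simple arithmetic. The sketch below assumes that the first numeric column is the feature count, the second is the mean feature length in base pairs, and the fourth is the total length in megabases. These column meanings are inferred from the numbers themselves, not stated in the extracted table, and the helper name `total_length_mb` is hypothetical.

```python
# (label, count, assumed mean length in bp, reported total in Mb)
TABLE_ROWS = [
    ("Promoters", 16_488, 4_369, 72),
    ("Proximal enhancers", 85_526, 1_876, 160),
    ("Distal enhancers", 127_786, 547, 70),
    ("CTCF binding", 117_711, 622, 73),
    ("Unannotated transcription factor binding site", 27_523, 528, 15),
    ("Unannotated open chromatin", 71_568, 502, 36),
]


def total_length_mb(count, mean_length_bp):
    """Total length in megabases implied by a count and a mean length in bp."""
    return count * mean_length_bp / 1_000_000


for label, count, mean_bp, reported_mb in TABLE_ROWS:
    computed = total_length_mb(count, mean_bp)
    print(f"{label}: computed {computed:.1f} Mb, reported {reported_mb} Mb")

# The per-label counts add up to the count in the Total row.
total_count = sum(row[1] for row in TABLE_ROWS)
print(total_count)  # 446602
```

Under these assumptions every computed value rounds to the reported one. The per-label totals sum to 426 Mb, more than the 399 Mb in the Total row, which would be expected if features with different labels overlap, although the table itself does not say so.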
Figure 3. Decision tree assigning labels to unsupervised segmentation states.