Hani Z Girgis, Alfredo Velasco, Zachary E Reyes.
Abstract
BACKGROUND: Histone modifications play important roles in gene regulation, heredity, imprinting, and many human diseases. The histone code is complex and consists of more than 100 marks. Therefore, biologists need computational tools to characterize general signatures representing the distributions of tens of chromatin marks around thousands of regions.
Keywords: Artificial neural networks; Associative learning; Chromatin modifications; Epigenetic signatures; Hebbian learning; Histone marks; Visualization
Year: 2018 PMID: 30176808 PMCID: PMC6122555 DOI: 10.1186/s12859-018-2312-1
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Representations of a group of chromatin marks overlapping a region. (a) Horizontal double lines represent a region of interest. Horizontal single lines represent the marks. Vertical lines are spaced equally and bounded by the region. (b) The intersections between the marks and the vertical lines are encoded as a matrix where rows represent the marks and columns represent the vertical lines. If a vertical line intersects a mark, the corresponding entry in the matrix is 1; otherwise it is -1
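The encoding described in the Fig. 1 caption can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `encode_region`, the interval representation (half-open `(start, end)` pairs for marks), and the example coordinates are hypothetical and not taken from the paper.

```python
def encode_region(region, marks, n_lines):
    """Encode marks overlapping a region as rows of 1 / -1.

    region:  (start, end) bounds of the region of interest (assumed layout).
    marks:   dict of mark name -> list of half-open (start, end) intervals.
    n_lines: number of equally spaced vertical lines bounded by the region.
    """
    if n_lines < 2:
        raise ValueError("need at least two vertical lines")
    start, end = region
    # Equally spaced positions; the first and last lines sit on the bounds.
    step = (end - start) / (n_lines - 1)
    positions = [start + i * step for i in range(n_lines)]

    matrix = {}
    for name, intervals in marks.items():
        # 1 if the vertical line intersects the mark, otherwise -1.
        matrix[name] = [
            1 if any(s <= x < e for s, e in intervals) else -1
            for x in positions
        ]
    return matrix


# Hypothetical example: three marks around one region, five vertical lines.
example = encode_region(
    region=(100, 200),
    marks={
        "H3K4me1": [(90, 130)],
        "H3K27ac": [(140, 180)],
        "H3K9me3": [],
    },
    n_lines=5,
)
```

Each row of `example` corresponds to one mark and each column to one vertical line, matching the matrix layout in panel (b).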
Fig. 2 Unsupervised Hebb’s network: w is the weight vector, which represents the learned signature; b is an epigenetic vector; p is the ones vector; satlins is the activation/transformation function (Eq. 2); o is the output of the network; and n is the size of p, b, w, and o
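A rough sketch of how a network like the one in Fig. 2 could learn a signature is given below. The `satlins` function is the standard symmetric saturating linear function (values clipped to the range -1 to 1). Everything else is an assumption made for illustration: the output formula, the Hebbian update with decay, and the `alpha` and `decay` parameters are not taken from the paper's equations, which are not reproduced on this page.

```python
def satlins(x):
    """Symmetric saturating linear function: clip x to [-1, 1]."""
    return max(-1.0, min(1.0, x))


def train_hebb(vectors, alpha=0.1, decay=0.1):
    """Learn a weight vector w from epigenetic vectors b (entries 1 / -1).

    Assumed rule (not the paper's): o = satlins(b + w * p), then
    w <- w + alpha * o * p - decay * w, applied element-wise.
    """
    n = len(vectors[0])
    w = [0.0] * n   # learned signature
    p = [1.0] * n   # the ones vector
    for b in vectors:
        o = [satlins(b[i] + w[i] * p[i]) for i in range(n)]
        w = [w[i] + alpha * o[i] * p[i] - decay * w[i] for i in range(n)]
    return w


# Hypothetical data: position 0 is always marked, position 1 never,
# and position 2 alternates between marked and unmarked.
data = [[1, -1, 1 if k % 2 == 0 else -1] for k in range(20)]
signature = train_hebb(data)
```

With equal `alpha` and `decay`, the weights in this sketch stay within -1 and 1: consistently present marks drift toward 1, consistently absent marks toward -1, and inconsistent marks stay near 0.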
Fig. 3 Retrieving the chromatin signature of the H1-specific enhancers. Three examples of enhancers are shown in Parts a–c. A row in one of these plots represents the distribution of one mark around a region; red (blue) color indicates the presence (absence) of a mark. It is hard to see a common pattern in these three examples. The signature learned by the Hebbian network is captured by the HebbPlot shown in Part d. A row in the HebbPlot represents the distribution of a mark around all enhancers in the data set. The closer the color to red, the higher the certainty of the presence of a mark around the corresponding sub-region. The HebbPlot is characterized by four zones. The topmost zone represents chromatin marks that are absent from the enhancer regions, whereas the next three zones represent the present marks with increasing certainty. A conventional plot of the intensities of all marks around every region in the data set is shown in Part e. Many marks show depressions near the center of the plot; however, some peaks are mixed with these depressions in the conventional plot. In contrast, these depressions correspond to the clearly visible ellipse in the middle of the third zone of the HebbPlot. Further, marks of similar intensities obstruct one another in the conventional plot. This is not the case with HebbPlot because every mark is represented by a separate row. An average plot is displayed in Part f. This plot shows a pattern similar to the one found by the network, but fuzzier
Fig. 4 Hierarchical clustering of histone marks around 5899 H1-specific enhancers. The clustered vectors are the epigenetic vectors, filled row-wise rather than column-wise. This figure shows that certain marks have a clear, consistent pattern around these regions. However, the specific signature of these marks is not easily interpreted
Fig. 5 Liver chromatin signatures representing (a) active enhancers, (b) active promoters, and (c) coding regions of active genes. The three signatures have similarities and differences. They are similar in that H3K9me3 and H3K27me3 are absent from all of them. H3K36me3 is the strongest mark of coding regions, whereas H3K27ac is the strongest mark of promoters and enhancers. H3K4me1 is stronger than H3K4me3 in enhancers; this relation is reversed in promoters, where H3K4me1 is weak around transcription start sites
Fig. 6 HebbPlots of active promoters in the HeLa-S3 cervical carcinoma cell line. These promoters were separated into two groups according to their strands. The size of a promoter is 4400 nucleotides. The two HebbPlots of the promoters on the positive and the negative strands are mirror images of each other. Multiple marks including H3K36me3, H3K79me2, H3K4me1, H2A.Z, H3K27ac, H3K9ac, H3K4me3, and H3K4me2 are distributed in a direction-specific way. H2A.Z tends to stretch upstream, whereas the rest of these directional marks tend to stretch downstream from the promoters toward their coding regions. (a) Promoters on the positive strand, (b) Promoters on the negative strand
Promoters, each 4400 nucleotides long, were separated by strand into positive and negative groups
| Mark | Known | Directional | Percentage (%) |
|---|---|---|---|
| H3K4me3 | 57 | 41 | 72 |
| H3K79me2 | 14 | 10 | 71 |
| H3K4me2 | 16 | 11 | 69 |
| H2AK5ac | 6 | 4 | 67 |
| H3K18ac | 6 | 4 | 67 |
| H2A.Z | 14 | 9 | 64 |
| H3K4me1 | 57 | 35 | 61 |
| H2BK12ac | 5 | 3 | 60 |
| H3K14ac | 5 | 3 | 60 |
| H3K9ac | 24 | 13 | 54 |
| H2BK5ac | 6 | 3 | 50 |
| H3K23ac | 6 | 3 | 50 |
| H3K4ac | 6 | 3 | 50 |
| H3K79me1 | 6 | 3 | 50 |
| H3K27ac | 49 | 22 | 45 |
| H4K91ac | 5 | 2 | 40 |
| H4K8ac | 6 | 2 | 33 |
| H2BK120ac | 6 | 1 | 17 |
| H4K20me1 | 12 | 2 | 17 |
| H3K36me3 | 57 | 6 | 11 |
Mark vectors over the upstream and the downstream thirds of the promoters on the positive strand were compared. A mark is considered directional if these two vectors have a negative dotsim value. The number of cell types, for which a mark was determined, is listed under “Known.” The number of cell types, in which a mark has directional preference around the promoter regions, is listed under “Directional.” The percentage of times a mark showed directional preference is listed under “Percentage.” Only marks determined for at least five tissues were considered
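The directionality test and the Percentage column described in the note above can be sketched as follows. The `dotsim` function is not defined on this page, so the version here (dot product divided by the larger of the two squared norms) is an assumption; the helper names `is_directional` and `directional_percentage` and the example signature are likewise hypothetical.

```python
def dotsim(x, y):
    """Assumed similarity: dot product scaled by the larger squared norm."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = max(sum(a * a for a in x), sum(b * b for b in y))
    return dot / norm if norm else 0.0


def is_directional(signature):
    """Compare the upstream and downstream thirds of one mark's signature."""
    third = len(signature) // 3
    if third == 0:
        raise ValueError("signature needs at least three entries")
    upstream = signature[:third]
    downstream = signature[-third:]
    # A negative value means the two thirds carry opposite signals.
    return dotsim(upstream, downstream) < 0


def directional_percentage(directional, known):
    """Share of cell types in which a mark is directional, as in the table."""
    return round(100 * directional / known)


# Hypothetical signature of a mark that stretches downstream.
downstream_mark = [-0.8, -0.6, -0.7, 0.1, 0.0, 0.2, 0.9, 0.7, 0.8]
print(is_directional(downstream_mark))
print(directional_percentage(41, 57))  # H3K4me3 row: 41 of 57 cell types
```

Applying `directional_percentage` to the Directional and Known columns reproduces the rounded values listed under Percentage.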
Fig. 7 Promoters active in skeletal muscle myoblast cells were separated into high- and low-CpG groups. A HebbPlot was generated from each group. Clearly, the two signatures are different. Specifically, H3K4me3, H3K9ac, and H3K27ac are present around the high-CpG promoters, whereas they are very weak or absent from the low-CpG promoters. In contrast, H3K36me3 is absent from the high-CpG group, but present around the low-CpG promoters. In general, marks present around the high-CpG promoters are stronger than those present around the low-CpG ones. (a) High-CpG promoters, (b) Low-CpG promoters
High-CpG promoters have a different signature from that of low-CpG promoters
| Mark | Known | Average dotsim |
|---|---|---|
| H3K4me3 | 57 | -0.98452 |
| H3K9ac | 24 | -0.82137 |
| H3K27ac | 49 | -0.72655 |
| H2BK120ac | 6 | -0.53278 |
| H4K91ac | 5 | -0.48083 |
| H3K4me2 | 16 | -0.33263 |
| H3K23ac | 6 | -0.32737 |
| H2A.Z | 14 | -0.27855 |
| H2BK12ac | 5 | -0.20927 |
| H2BK5ac | 6 | -0.15632 |
| H3K4ac | 6 | -0.15405 |
| H4K8ac | 6 | -0.12716 |
| H2AK5ac | 6 | -0.11522 |
| H3K14ac | 5 | -0.03981 |
| H3K18ac | 6 | 0.14699 |
| H3K4me1 | 57 | 0.24636 |
| H3K79me1 | 6 | 0.35168 |
| H3K79me2 | 14 | 0.62139 |
| H3K36me3 | 57 | 0.65545 |
| H4K20me1 | 12 | 0.82929 |
| H3K27me3 | 57 | 0.92651 |
| H3K9me3 | 57 | 0.97729 |
Active promoters in 57 tissues/cell types were divided into two groups according to their CpG contents. Then two networks were trained on the two groups, producing two signatures for each tissue/cell type. The two signatures of a mark in the same tissue were compared using the dotsim function. The average dotsim values are listed under “Average dotsim.” Not all marks were determined for all tissues. The number of tissues/cell types, for which a mark was determined, is listed under the column titled “Known”
Fig. 8 Signatures of active enhancers. Enhancers were collected from a study by Rajagopal et al. [54] and from the Fantom Project. A HebbPlot was generated from the enhancers of each tissue. The HebbPlots of H1 and IMR90, for which more than 20 marks are known, show that several marks are present around active enhancers. Usually, H3K4me1 has a stronger signal around enhancers than H3K4me3; however, there are some exceptions, e.g. foetal brain. H3K9ac and H3K27ac are present around enhancers, but H3K9me3, H3K27me3, and H3K36me3 are very weak or absent from enhancers. These plots show that chromatin signatures of enhancers active in different tissues are similar, but not identical. (a) H1, (b) IMR90, (c) Liver, (d) Foetal brain, (e) Foetal small intestine, (f) Left ventricle, (g) Lung, (h) Pancreas
Fig. 9 Histone marks are highly associated with gene expression levels in IMR90. Genes were divided into nine groups according to their expression levels. A HebbPlot was generated from the coding regions of each group. In general, a HebbPlot cools down (becomes bluer) as the expression level decreases. The redder a row is, the more consistently its mark is distributed around the set of regions. H3K36me3 and H3K79me1 mark the coding regions of active genes in IMR90, whereas the repressive modification, H3K27me3, marks the inactive coding regions. H2A.Z is ubiquitous. (a) First group, (b) Second group, (c) Third group, (d) Fourth group, (e) Fifth group, (f) Sixth group, (g) Seventh group, (h) Eighth group, (i) Ninth group
A catalog of functions of histone marks in this study
| Mark | Function | Literature support |
|---|---|---|
| H2A.Z | Directional around promoters stretching upstream. | Associated with transcription start sites [ |
| H2AK5ac | Directional around promoters stretching downstream and weakly associated with coding regions of inactive genes. | – |
| H2BK5ac | Directional around promoters stretching downstream. | Associated with promoters [ |
| H2BK12ac | Directional around promoters stretching downstream. | – |
| H2BK120ac | Associated with high-CpG promoters. | Associated with promoters and CpG islands [ |
| H3K4ac | Directional around promoters stretching downstream. | Associated with promoters [ |
| H3K4me1 | Directional around promoters stretching downstream, absent around transcription start sites, and associated with enhancers. | Associated with enhancers [ |
| H3K4me2 | Directional around promoters stretching downstream and associated with enhancers. | Associated with promoters [ |
| H3K4me3 | Directional around promoters stretching downstream, associated with high-CpG promoters, and associated with enhancers (usually weaker than H3K4me1). | Associated with transcription start sites [ |
| H3K8ac | Weakly associated with coding regions of active genes. | – |
| H3K9ac | Directional around promoters stretching downstream, associated with high-CpG promoters, and associated with enhancers. | Associated with promoters [ |
| H3K9me3 | Weakly associated with coding regions of inactive genes, very weak/absent from enhancers, and very weak/absent from promoters. | Associated with “repressed regions” [ |
| H3K14ac | Directional around promoters stretching downstream and weakly associated with coding regions of inactive genes. | – |
| H3K18ac | Directional around promoters stretching downstream. | – |
| H3K23ac | Directional around promoters stretching downstream. | – |
| H3K27ac | Associated with high-CpG promoters and enhancers. | Associated with transcription start sites [ |
| H3K27me3 | Weakly associated with coding regions of inactive genes, very weak/absent from enhancers, and very weak/absent from promoters. | “Repressive mark” [ |
| H3K36me3 | Associated with coding regions and very weak/absent from enhancers. | Associated with and directional around “transcribed gene bodies” [ |
| H3K79me1 | Directional around promoters stretching downstream and associated with coding regions of active genes. | Associated with promoters active in CD4+ [ |
| H3K79me2 | Directional around promoters stretching downstream and associated with coding regions of active genes. | Associated with “transcribed regions” [ |
| H4K12ac | Associated with coding regions of inactive genes — this mark is known in one tissue only. | – |
| H4K91ac | Associated with high-CpG promoters. | Associated with promoters [ |
Fig. 10 The advantage of HebbPlot is clear when looking at variable-sized regions. Each triangle represents the distributions of chromatin marks around a region. The three equally-spaced samples (X) obtained from each region give rise to a pattern of low signal (-1), high signal (1), and low signal (-1). Conventional plots would not detect this pattern because of the differences in length. HebbPlot, however, will rescale these triangles and present the correct signature
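The rescaling idea in the Fig. 10 caption can be illustrated with a small Python sketch. The triangular signal, the 0.5 threshold, and the function names `triangle_signal` and `sample_region` are assumptions chosen for the illustration; they are not the paper's implementation.

```python
def triangle_signal(length):
    """Hypothetical signal rising linearly from 0 at the edges to 1 at the
    center of a region (length must be at least 3)."""
    mid = (length - 1) / 2
    return [1 - abs(i - mid) / mid for i in range(length)]


def sample_region(signal, n_samples, threshold=0.5):
    """Take equally spaced samples across the whole region and binarize
    them: 1 for high signal, -1 for low signal."""
    last = len(signal) - 1
    samples = []
    for k in range(n_samples):
        idx = round(k * last / (n_samples - 1))
        samples.append(1 if signal[idx] >= threshold else -1)
    return samples


# Regions of very different lengths yield the same three-sample pattern.
patterns = {n: sample_region(triangle_signal(n), 3) for n in (11, 50, 201)}

# By contrast, a fixed offset (position 5) lands on the peak of the short
# region but near the edge of the long one.
fixed_short = triangle_signal(11)[5]
fixed_long = triangle_signal(201)[5]
```

Because the samples are placed relative to each region's own length, every region contributes the same low, high, low pattern, whereas a fixed offset reads unrelated parts of regions with different lengths.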