Fereshteh Chitsazian, Mehdi Sadeghi, Elahe Elahi.
Abstract
BACKGROUND: The histones in the core of nucleosomes may be subject to covalent post-translational modifications. These modifications are thought to correlate with, and possibly affect, various genomic functions, including transcription. Each modification may, alone or in combination with other modifications, influence or be influenced by transcription. We aimed to identify correlations between transcription activity and single modifications or combinations of modifications at specific nucleosome-sized gene regions, based on global histone modification and transcription data of human CD4+ T cells and three other human cell lines. Transcription activity was defined in a binary fashion as either on or off. The analysis was done using the Classification and Regression Tree (CART) data mining protocol, and the Multifactor Dimensionality Reduction (MDR) method was applied to confirm the CART results. These powerful methods have not previously been used for analysis of histone modification data.
Keywords: CART; H2BK5ac; Histone modifications; MDR; Transcription
Year: 2017 PMID: 28122488 PMCID: PMC5264486 DOI: 10.1186/s12859-016-1418-6
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Frequencies of histone modifications at TSS and TTS of genes with different transcription levels. a TSS. b TTS. Each curve is defined by 19 points, one for each of 19 gene groups; each group consists of approximately 1000 genes grouped on the basis of their transcription values. The 39 histone modifications are distinguished by color. Data on control nonspecific goat and rabbit antibodies are also presented.
Correlations between active/inactive designations of genes based on CART trees (a) with increasing number of nodes
| No. nodes in tree 1 | No. nodes in tree 2 | Correlation (b) |
|---|---|---|
| 1 | 2 | 0.8 |
| 2 | 3 | 0.81 |
| 3 | 4 | 0.82 |
| 4 | 5 | 0.85 |
| 5 | 6 | 0.95 |
| 6 | 7 | 0.98 |
a: Based on the 3000 genes with highest transcription values and the 3000 genes with lowest values. b: Pearson's correlation.
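The correlation column above compares the active/inactive designations produced by two trees. As a minimal sketch of that comparison, assuming designations are coded 1 (active) and 0 (inactive), Pearson's correlation can be computed as follows. The two designation lists are hypothetical illustration data, not values from the study.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("sequences must be non-empty and of equal length")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical designations for eight genes (1 = active, 0 = inactive)
# as assigned by a smaller and a larger tree.
tree_small = [1, 1, 1, 0, 0, 0, 1, 0]
tree_large = [1, 1, 0, 0, 0, 0, 1, 0]
r = pearson(tree_small, tree_large)
print(round(r, 3))
```

Two trees that designate every gene identically give a correlation of 1, so values approaching 1 indicate that adding a node changes few designations.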
Fig. 2 Use of CART tree based on 6000 genes for assessment of active/inactive status of genes. a and b Respectively, the seven and five node trees generated by CART based on histone modification patterns of the 3000 genes with highest transcription values (designated active) and the 3000 with the lowest values (designated inactive). + and - signify, respectively, presence or absence of the modification in the preceding node. See Additional file 1: Figure S4. c Plot showing the pattern of Yg values of 18,729 genes ordered from left to right by decreasing transcription values.
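Each node of such a tree splits genes on the presence or absence of one modification at one gene region. The sketch below shows how a single split might be chosen by minimizing weighted Gini impurity, a common CART criterion for classification. The source does not state which criterion or software settings were used, so this is an assumption; the feature names and toy data are likewise hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of binary labels (1 = active, 0 = inactive)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def best_split(rows, labels):
    """Return (feature, weighted_gini) for the presence/absence split with lowest impurity.

    rows: list of dicts mapping feature name -> 1 (modification present) or 0 (absent).
    labels: list of 1 (active) or 0 (inactive), one per row.
    """
    n = len(labels)
    best = None
    for feature in sorted(rows[0]):
        present = [y for row, y in zip(rows, labels) if row[feature] == 1]
        absent = [y for row, y in zip(rows, labels) if row[feature] == 0]
        score = (len(present) * gini(present) + len(absent) * gini(absent)) / n
        if best is None or score < best[1]:
            best = (feature, score)
    return best

# Hypothetical toy data: six genes, two modification/region features.
rows = [
    {"H2BK5ac@TSS": 1, "H3K27me3@TSS": 0},
    {"H2BK5ac@TSS": 1, "H3K27me3@TSS": 0},
    {"H2BK5ac@TSS": 1, "H3K27me3@TSS": 1},
    {"H2BK5ac@TSS": 0, "H3K27me3@TSS": 1},
    {"H2BK5ac@TSS": 0, "H3K27me3@TSS": 1},
    {"H2BK5ac@TSS": 0, "H3K27me3@TSS": 0},
]
labels = [1, 1, 1, 0, 0, 0]
print(best_split(rows, labels))
```

A full tree would apply the same selection recursively to the present and absent subsets until the desired number of nodes is reached.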
Fig. 3 Gene activity predictions based on histone modification patterns at single gene regions for all 24 gene regions. a Histone modification patterns in CART trees generated using single gene region data were used to ascribe index values of +1 and -1 to each of the 18,729 genes, and the percent of genes with index values of +1 and -1 that were, respectively, among genes designated active and inactive based on the Ymax threshold was calculated. This percent was considered the prediction accuracy frequency of the tree. The histone modifications at nodes of each gene region are shown with two (for acetylations) or three (for methylations) digits separated by dots. The first and second digits indicate the histone number and the amino acid number. The third digit, when present, represents the number of methyl groups. b and c Respectively, three node CART trees generated on the basis of histone modification patterns at TSS and TSS-1 of 18,729 genes. + and - signify, respectively, presence or absence of the modification in the preceding node. See Additional file 1: Figure S4. Prediction accuracy frequencies of modification patterns at TSS+2 (not shown) were much lower than those at TSS+1.
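The dotted notation described in this legend can be unpacked mechanically. The following sketch parses a code such as `3.27.3` into its fields; the function name and returned keys are hypothetical. The notation as described does not distinguish histone variants (for example H2A from H2B) or residue type, so the parser deliberately does not reconstruct a full name such as H2BK5ac.

```python
def parse_modification_code(code):
    """Parse a dotted code: two fields for an acetylation, three for a methylation."""
    parts = code.strip().split(".")
    if len(parts) not in (2, 3) or not all(p.isdigit() for p in parts):
        raise ValueError(f"not a dotted modification code: {code!r}")
    histone = int(parts[0])   # first field: histone number
    residue = int(parts[1])   # second field: amino acid number
    if len(parts) == 2:
        return {"histone": histone, "residue": residue,
                "kind": "acetylation", "methyl_groups": 0}
    return {"histone": histone, "residue": residue,
            "kind": "methylation", "methyl_groups": int(parts[2])}

print(parse_modification_code("2.5"))
print(parse_modification_code("3.27.3"))
```

Entries that are not written in dotted form, such as H2AZ, raise a `ValueError` and would need to be handled separately.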
Fig. 4 Gene activity predictions on test genes based on CART trees generated with training gene data. a Histone modification patterns in CART trees with various numbers of nodes were generated using data on all 24 gene regions of approximately 10,000 training genes, and the trees were used to ascribe index values of +1 and -1 to each of the same (left columns; training) or a different (right columns; test) set of genes. The percent of genes with index values of +1 and -1 that were, respectively, among genes designated active and inactive based on the Ymax threshold was calculated. This percent was considered the prediction accuracy frequency of the tree. b and c Respectively, seven and five node CART trees generated using data on all 24 gene regions of approximately 10,000 training genes. + and - signify, respectively, presence or absence of the modification in the preceding node. See Additional file 1: Figure S4.
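The prediction accuracy frequency defined in this legend is the percent of genes whose tree-derived index (+1 or -1) agrees with the threshold-based designation (active or inactive). A minimal sketch of that calculation follows, assuming designations are supplied as the strings `"active"` and `"inactive"`; the function name and example data are hypothetical.

```python
def prediction_accuracy(index_values, designations):
    """Percent of genes whose index (+1/-1) matches the active/inactive designation."""
    if len(index_values) != len(designations) or not index_values:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(
        1
        for idx, status in zip(index_values, designations)
        if (idx == 1 and status == "active") or (idx == -1 and status == "inactive")
    )
    return 100.0 * hits / len(index_values)

# Hypothetical example: five genes, one of which is misclassified.
index_values = [1, 1, -1, -1, 1]
designations = ["active", "inactive", "inactive", "inactive", "active"]
print(prediction_accuracy(index_values, designations))
```

Computing the same quantity once on the training genes and once on the held-out test genes gives the paired columns described for panel a.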
Fig. 5 CART tree based on the single modification H2BK5ac. The five node CART tree generated on the basis of presence or absence of the single modification H2BK5ac at all 24 gene regions of all 18,729 genes, and the activity status of the genes based on the Ymax threshold, is shown. + and - signify, respectively, presence or absence of the modification in the preceding node. See Additional file 1: Figure S4.
Gene activity prediction frequencies for CD4+ T cells based on five node CART trees for each of 39 histone modifications on the 24 nucleosome sized regions
| Histone modification | H2BK5ac | H4K91ac | H3K9ac | H2BK120ac | H3K27ac | H3K4ac | H3K79me3 | H3K79me2 | H4K5ac | H3K36ac |
|---|---|---|---|---|---|---|---|---|---|---|
| Prediction accuracy | 0.753 | 0.723 | 0.720 | 0.718 | 0.713 | 0.713 | 0.711 | 0.707 | 0.701 | 0.695 |
| Histone modification | H4K8ac | H3K18ac | H2BK20ac | H2AZ | H3K79me1 | H2AK9ac | H3K4me2 | H4K20me1 | H4K16ac | H4K12ac |
| Prediction accuracy | 0.693 | 0.687 | 0.681 | 0.674 | 0.670 | 0.666 | 0.662 | 0.661 | 0.653 | 0.645 |
| Histone modification | H3K9me1 | H3K4me3 | H3K36me3 | H3K27me1 | H2BK5me1 | H3K27me3 | H3K4me1 | H2AK5ac | H3K9me3 | H3K23ac |
| Prediction accuracy | 0.644 | 0.642 | 0.637 | 0.633 | 0.629 | 0.627 | 0.610 | 0.603 | 0.600 | 0.594 |
| Histone modification | H3K27me2 | H3K36me1 | H3K14ac | H3K9me2 | H4K20me3 | H4R3me2 | H3R2me1 | H3R2me2 | H2BK12ac | |
| Prediction accuracy | 0.572 | 0.566 | 0.543 | 0.542 | 0.537 | 0.531 | 0.531 | 0.524 | 0.486 | |
Comparison of MDR and CART assessments of combinations of histone modifications for correlation with transcription
| Region | MDR: single (α) | Tba (b) | Root (c) | Identity (d) | MDR: 5 modifications (e) | Tba (b) | 5 node tree (f) | Identity (d) |
|---|---|---|---|---|---|---|---|---|
| TSS-10 | H3K79me1 | 0.6642 | H3K79me1 | + | 2.20,4.91,3.27.3,3.79.1 | 0.719 | 3.27.3,2.20,3.79.1,4.91 | 100% |
| TSS-9 | H3K79me1 | 0.6634 | H3K79me1 | + | 2.20,3.27.3,3.79.2,3.79.1 | 0.7233 | 2.20,3.27.3,4.91,3.79.1 | 75% |
| TSS-8 | H3K79me1 | 0.6717 | H3K79me1 | + | 4.91,3.27.3,3.79.2,3.79.1 | 0.7327 | 3.27.3,2.20,3.79.1,4.91 | 75% |
| TSS-7 | H2BK20ac | 0.6686 | H3K79me1 | - | 2.20,3.27.3,3.79.2,3.79.1 | 0.7368 | 3.27.3,2.20,3.79.1,3.79.2 | 100% |
| TSS-6 | H2BK20ac | 0.6856 | H2BK120ac | - | 2.20,2.5,3.27.3,3.79.1 | 0.7427 | 3.27.3,2.5,2.120,3.79.1 | 75% |
| TSS-5 | H2BK120ac | 0.7078 | H2BK5ac | - | 2.5,2.120,3.27.3,3.79.1 | 0.7638 | 3.27.3,3.79.1,2.120,2.5 | 100% |
| TSS-4 | H2BK5ac | 0.7252 | H2BK5ac | + | 2.5,4.91,3.27.3,3.79.1 | 0.7761 | 2.5,3.27.3,2.20,4.8 | 50% |
| TSS-3 | H2BK5ac | 0.7673 | H2BK5ac | + | 2.5,2.120,3.79.1,3.79.2 | 0.7945 | 2.5,3.27.3,2.120,3.27.3 | 50% |
| TSS-2 | H2BK5ac | 0.7893 | H2BK5ac | + | 2.5,2.120,3.27,3.27.3 | 0.8093 | 2.5,2.120,3.27,2.9 | 75% |
| TSS-1 | H2BK5ac | 0.7903 | H2BK5ac | + | 2.5,3.27,4.91,3.79.2 | 0.8125 | 2.5,3.27.3,3.79.2,2.120 | 50% |
| TSS | H2BK5ac | 0.7923 | H2BK5ac | + | 2.5,4.91,3.79.2,3.79.3 | 0.8181 | 2.5,3.79.2,4.91,3.79.3 | 100% |
| TSS+1 | H4K91ac | 0.6963 | H2BK5ac | - | 2.5,4.91,4.20.1,3.79.2 | 0.7322 | 2.5,3.27.3,3.36.3,4.20.1 | 50% |
| TTS | H4K20me1 | 0.6731 | H4K20me1 | + | 3.27.1,3.27.3,3.36.3,4.20.1 | 0.7238 | 3.27.3,4.20.1,3.36.3,3.27.3 | 75% |
| TTS+1 | H4K20me1 | 0.6757 | H4K20me1 | + | H2AZ,3.27.3,3.36.3,4.20.1 | 0.7239 | H2AZ,3.27.3,4.20.1,3.36.3 | 100% |
| TTS+2 | H4K20me1 | 0.6677 | H4K20me1 | + | 4.91,3.27.3,2.5.1,4.20.1 | 0.6986 | 3.27.3,4.20.1,3.36.3,3.27.3 | 50% |
| TTS+3 | H4K20me1 | 0.6673 | H4K20me1 | + | 3.27.1,3.27.3,3.36.3,4.20.1 | 0.7133 | 2.5.1,3.27.3,4.20.1,3.27.3 | 50% |
| TTS+4 | H4K20me1 | 0.6635 | H4K20me1 | + | 3.27.1,3.36.3,3.27.3,4.20.1 | 0.7084 | 3.27.3,4.20.1,3.79.1,3.27.3 | 50% |
| TTS+5 | H4K20me1 | 0.6617 | H4K20me1 | + | 2.5.1,3.27.3,4.20.1,3.79.1 | 0.7088 | 3.27.3,4.20.1,3.79.1,3.27.3 | 75% |
α: MDR most informative single modification; b: Tba, testing balanced accuracy; c: root of the 5 node CART tree based on data presented in Fig. 3a; d: identity between MDR and CART; e: MDR most informative combination of 5 modifications, designated as described in the legend to Fig. 3; f: modifications in the 5 node CART tree based on data presented in Fig. 3a.
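The source does not spell out how the second identity column is computed. One reading that reproduces the tabulated percentages is the share of distinct MDR modifications that also appear in the CART tree. The sketch below implements that reading as an assumption; the function name is hypothetical and the inputs are copied from the TSS-10 and TSS-9 rows.

```python
def identity_percent(mdr_codes, cart_codes):
    """Percent of distinct MDR modification codes that also occur in the CART list."""
    mdr = {c.strip() for c in mdr_codes.split(",") if c.strip()}
    cart = {c.strip() for c in cart_codes.split(",") if c.strip()}
    if not mdr:
        raise ValueError("MDR list is empty")
    return 100.0 * len(mdr & cart) / len(mdr)

# TSS-10 row: all four MDR codes appear in the CART tree.
tss_minus_10 = identity_percent("2.20,4.91,3.27.3,3.79.1", "3.27.3,2.20,3.79.1,4.91")
# TSS-9 row: 3.79.2 is absent from the CART tree.
tss_minus_9 = identity_percent("2.20,3.27.3,3.79.2,3.79.1", "2.20,3.27.3,4.91,3.79.1")
print(tss_minus_10, tss_minus_9)
```

Because the comparison uses sets, the order in which modifications are listed in either column does not affect the result.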
Results of SVM analyses
| No. components in analysis | Best component(s) | Accuracy |
|---|---|---|
| One | H2BK5ac at TSS | 0.72 |
| Two | H2BK5ac at TSS-1 | 0.74 |
| | H2BK5ac at TSS-2 | |
| Three | H2BK5ac at TSS-1 | 0.75 |
| | H2BK5ac at TSS-2 | |
Fig. 6 CART trees based on data from IMR-90, hESC-h1, and MSC cells. a Five node trees based on data from all histone modifications. b Five node trees based only on H2BK5ac data. c Two node trees based only on H2BK5ac data. See Additional file 1: Figure S4.
Gene activity prediction capacity in IMR-90, hESC-h1, and MSC cells
| Cell type | Two node CART tree based: gene regions at nodes | Two node CART tree based: prediction capacity | Five node CART tree based: gene regions at nodes | Five node CART tree based: prediction capacity | Five node CART tree based on: modifications & gene regions at nodes | Five node CART tree based on: prediction capacity |
|---|---|---|---|---|---|---|
| IMR-90 cells | TSS+1, TSS-2 | 78.10% | TSS+1, TSS-2, TTS+4, TTS, TSS-10 | 78.10% | H3K27ac/TSS, H2BK5ac/TSS-2 | 78.69% |
| hESC-h1 cells | TSS, TSS-2 | 69.70% | TSS, TSS-2, TSS-1, TTS+1, TSS-3 | 70.00% | H3K27ac/TSS-1, H3K4me3/TSS, | 74.42% |
| MSC cells | TSS, TSS+1 | 63.83% | TSS, TSS+1, TSS-1 | 63.85% | H3K27me3/TSS+1, H3K36me3/TTS+1, | 68.38% |