Alain L Gervais, Luc Gaudreau.
Abstract
BACKGROUND: Nucleosomes are nucleoprotein complexes formed of eight histone molecules and DNA, and they are responsible for the compaction of the eukaryotic genome. Their presence on DNA influences many cellular processes, such as transcription, DNA replication, and DNA repair. The evolutionarily conserved histone variant H2A.Z alters nucleosome stability and is highly enriched at gene promoters. Its localization to specific genomic loci in human cells is presumed to depend either on the underlying DNA sequence or on a particular pattern of epigenetic modifications.
Year: 2009 PMID: 19261190 PMCID: PMC2660331 DOI: 10.1186/1471-2199-10-18
Source DB: PubMed Journal: BMC Mol Biol ISSN: 1471-2199 Impact factor: 2.946
Summary of the datasets used in this study

| Source | Data used |
|---|---|
| Barski et al. | H2A.Z, H2A-H4R3me2 and 19 histone methylations (H2BK5me1, H3K27me1, H3K27me2, H3K27me3, H3K36me1, H3K36me3, H3K4me1, H3K4me2, H3K4me3, H3K79me1, H3K79me2, H3K79me3, H3K9me1, H3K9me2, H3K9me3, H3R2me1, H3R2me2, H4K20me1, H4K20me3) |
| Wang et al. | 18 histone acetylations (H2AK5ac, H2AK9ac, H2BK120ac, H2BK12ac, H2BK20ac, H2BK5ac, H3K14ac, H3K18ac, H3K23ac, H3K27ac, H3K36ac, H3K4ac, H3K9ac, H4K12ac, H4K16ac, H4K5ac, H4K8ac, H4K91ac) |
Figure 1. Epigenetic information can be used to predict whether a nucleosome is likely to contain H2A.Z. A. Histone post-translational modifications neighbouring randomly selected genomic regions where an H2A.Z- or H2A-containing nucleosome was found. Red indicates presence of a modification and green indicates absence. B. Accuracies of classifiers trained on a single post-translational modification using the C4.5 algorithm. Post-translational modifications vary greatly in their ability to predict the H2A.Z status of a nucleosome; most of the best-performing modifications are acetylations. C. Accuracies of the best classifier trained on a combination of the specified number of post-translational modifications. Combining multiple post-translational modifications improves overall classification accuracy. D. Best decision tree inferred by the C4.5 algorithm from four post-translational modifications. Three modifications in this tree (H3K18ac, H4K5ac and H4K8ac), when present in a genomic region, guide the classification toward H2A.Z, while the fourth (H4K20me1) guides it toward H2A.
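Panel B scores each modification on its own. The following is a rough illustration. It assumes each region is encoded as a presence/absence flag per modification. A one-level tree on a single binary attribute predicts the majority class in each branch, and C4.5 ranks candidate splits by gain ratio. The sketch computes both quantities using hypothetical helper names (`single_feature_accuracy`, `gain_ratio`, `rank_modifications`). It reports training-set accuracy and omits C4.5's pruning and multi-level tree growth. It is not the authors' implementation or evaluation protocol.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (bits) of a list of class counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c)


def single_feature_accuracy(present, labels):
    """Training accuracy of a one-split tree on one binary modification.

    present: list of bools, True if the modification is found near the region.
    labels:  matching class labels, e.g. "H2A.Z" or "H2A".
    Each branch (present / absent) predicts its own majority class.
    """
    branches = {True: Counter(), False: Counter()}
    for p, y in zip(present, labels):
        branches[p][y] += 1
    # Skip an empty branch (all regions share the same presence value).
    correct = sum(max(c.values()) for c in branches.values() if c)
    return correct / len(labels)


def gain_ratio(present, labels):
    """C4.5-style gain ratio of splitting on one binary modification."""
    n = len(labels)
    base = entropy(list(Counter(labels).values()))
    conditional = 0.0
    branch_sizes = []
    for value in (True, False):
        subset = [y for p, y in zip(present, labels) if p == value]
        if subset:
            conditional += len(subset) / n * entropy(list(Counter(subset).values()))
            branch_sizes.append(len(subset))
    split_info = entropy(branch_sizes)
    return (base - conditional) / split_info if split_info > 0 else 0.0


def rank_modifications(features, labels):
    """Rank modifications (name -> list of bools) by single-feature accuracy."""
    scores = {name: single_feature_accuracy(v, labels) for name, v in features.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy example with invented data (not from the study).
labels = ["H2A.Z", "H2A.Z", "H2A", "H2A"]
features = {
    "mod_a": [True, True, False, False],  # perfectly separates the classes
    "mod_b": [True, False, True, False],  # carries no information
}
print(rank_modifications(features, labels))  # [('mod_a', 1.0), ('mod_b', 0.5)]
```

Gain ratio, rather than raw accuracy, is what C4.5 uses to choose among splits when growing multi-level trees like the one in panel D.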
Figure 2. Genetic information can also be used to predict whether a nucleosome is likely to contain H2A.Z. A. Top panel: histogram of the log-odds of all eight-base-pair words in the H2A.Z dataset compared with the H2A dataset. Bottom panel: the same analysis carried out on the randomized dataset. B. Log-odds of the most enriched and most depleted words in the H2A.Z versus the H2A dataset. C. Flexibility profiles of H2A.Z- and H2A-containing nucleosome sequences. These curves show the positional log-odds of flexible dinucleotides under the flexibility models described in the text. The models were trained on all H2A.Z and H2A sequences together with their reverse complements, using a background probability computed from the input sequences without regard to position. H2A.Z-associated sequences are slightly more rigid than their H2A counterparts. D. Classification results for some of the sequence-based classifiers investigated. True positives in light green, true negatives in dark green, false positives in light red and false negatives in dark red. GC%: a model based solely on the GC content of the sequences. MarkovX: a model based on a positional Markov model of order X (see text). NPMarkovX: a model based on a non-positional Markov model of order X. Flexibility: a model based on dinucleotide flexibility (see text). SVM: a model based on a support vector machine.
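Panels A and B compare eight-base-pair word frequencies between the two datasets. Below is a minimal sketch of such a comparison. It assumes the log-odds of a word is log2 of its pseudocount-smoothed frequency in the H2A.Z sequences divided by its frequency in the H2A sequences. The pseudocount, the log base and the helper names (`kmer_counts`, `kmer_log_odds`, `extremes`) are assumptions, not details given in the caption.

```python
import math
from collections import Counter

ALPHABET = set("ACGT")


def kmer_counts(seqs, k=8):
    """Count overlapping k-mers, skipping any word containing a non-ACGT base."""
    counts = Counter()
    for s in seqs:
        s = s.upper()
        for i in range(len(s) - k + 1):
            word = s[i:i + k]
            if set(word) <= ALPHABET:
                counts[word] += 1
    return counts


def kmer_log_odds(fg_seqs, bg_seqs, k=8, pseudocount=1.0):
    """log2(P_fg(word) / P_bg(word)) for every word seen in either set.

    Frequencies are smoothed with a pseudocount over all 4**k possible words,
    so words absent from one set still get a finite score.
    """
    fg, bg = kmer_counts(fg_seqs, k), kmer_counts(bg_seqs, k)
    fg_total, bg_total = sum(fg.values()), sum(bg.values())
    n_words = 4 ** k
    scores = {}
    for word in set(fg) | set(bg):
        p_fg = (fg[word] + pseudocount) / (fg_total + pseudocount * n_words)
        p_bg = (bg[word] + pseudocount) / (bg_total + pseudocount * n_words)
        scores[word] = math.log2(p_fg / p_bg)
    return scores


def extremes(scores, n=5):
    """Return the n most enriched and the n most depleted words."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1])
    return ranked[::-1][:n], ranked[:n]


# Toy sequences (invented), using k=2 to keep the example small.
scores = kmer_log_odds(["AAAA"], ["CCCC"], k=2)
print(scores["AA"], scores["CC"])  # 2.0 -2.0
```

The caption does not say whether reverse complements were included when counting words for panels A and B. Passing them as extra sequences would make the counts strand-symmetric.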
Figure 3. The flexibility model can recapitulate the pattern of H2A.Z presence bordering transcriptional start sites observed in vivo. A. H2A.Z occupancy calculated from the data of Barski et al. Occupancy is calculated by counting, for each base pair, how many nucleosomes cover it. B. H2A occupancy, calculated as in panel A. C. Scores of the H2A.Z and H2A flexibility models over all human transcriptional start sites. These regions clearly fit the H2A.Z model better. D. Model and occupancy ratios. The occupancy ratio curve is calculated by dividing the occupancy counts of the H2A dataset by those of the H2A.Z dataset. The score ratio curve is calculated in the same way from the scores in panel C. The normalized curve is obtained by rescaling the values of the score ratio curve to the range 0 to 1. The occupancy ratio curve was not normalized in any way.
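Panels A, B and D rely on simple per-base arithmetic. Below is a minimal sketch that assumes nucleosome calls are half-open (start, end) intervals in one coordinate system around a transcriptional start site. Occupancy is accumulated with a difference array, and the ratio is taken position by position. Min-max scaling maps a curve into 0 to 1. The function names and the choice to return NaN where the denominator is zero are assumptions, not details from the study.

```python
import math
from itertools import accumulate


def occupancy(nucleosomes, region_start, region_end):
    """Number of nucleosomes covering each base pair of [region_start, region_end).

    nucleosomes: iterable of half-open (start, end) intervals.
    """
    length = region_end - region_start
    diff = [0] * (length + 1)
    for start, end in nucleosomes:
        # Clip each interval to the region, in region-relative coordinates.
        s = max(start, region_start) - region_start
        e = min(end, region_end) - region_start
        if s < e:  # skip intervals that fall entirely outside the region
            diff[s] += 1
            diff[e] -= 1
    # Running sum of the difference array gives per-base coverage.
    return list(accumulate(diff[:length]))


def ratio(numerator, denominator):
    """Element-wise ratio; NaN where the denominator is zero."""
    return [n / d if d else math.nan for n, d in zip(numerator, denominator)]


def min_max_normalize(values):
    """Rescale values linearly to [0, 1], ignoring NaNs when finding the range."""
    finite = [v for v in values if not math.isnan(v)]
    lo, hi = min(finite), max(finite)
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in values]


# Toy H2A and H2A.Z calls (invented) over a 6 bp window.
h2a = occupancy([(0, 3), (2, 5)], 0, 6)   # [1, 1, 2, 1, 1, 0]
h2az = occupancy([(1, 6)], 0, 6)          # [0, 1, 1, 1, 1, 1]
occ_ratio = ratio(h2a, h2az)              # H2A / H2A.Z, as in panel D
```

Following the caption, only the score ratio curve would be passed through `min_max_normalize`. The occupancy ratio is left unnormalized.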