Jia Zeng, Brian D Kirk, Yufeng Gou, Qinghua Wang, Jianpeng Ma.
Abstract
As key epigenetic regulators, polycomb group (PcG) proteins are responsible for the control of cell proliferation and differentiation as well as stem cell pluripotency and self-renewal. Aberrant epigenetic modification by PcG is strongly correlated with the severity and invasiveness of many types of cancers. Unfortunately, the molecular mechanism of PcG-mediated epigenetic regulation has remained elusive, partly due to the extremely limited pool of experimentally confirmed PcG target genes. To facilitate experimental identification of PcG target genes, here we propose a novel computational method, EpiPredictor, that achieved significantly higher matching ratios with several recent chromatin immunoprecipitation (ChIP) studies than jPREdictor, an existing computational method. We further validated a subset of genes uniquely predicted by EpiPredictor by cross-referencing the existing literature and by experimental means. Our data suggest that networking among multiple transcription factors at cis-regulatory elements is critical for PcG recruitment, while high GC content and high conservation level are also important features of PcG target genes. EpiPredictor should substantially expedite experimental discovery of PcG target genes by providing an effective initial screening tool. From a computational standpoint, our strategy of modelling transcription factor interaction with a non-linear kernel is original, effective and transferable to many other applications.
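Why a non-linear kernel can capture transcription factor interaction is easiest to see with a standard kernel identity. The sketch below uses a homogeneous degree-2 polynomial kernel purely as an illustration (which kernel EpiPredictor actually selected is stated in its kernel evaluation table, not assumed here): on motif-count vectors, (x · y)² equals a plain dot product over all pairwise motif products, so an SVM with this kernel implicitly scores motif co-occurrence without building the pairwise features. The functions `poly2_kernel` and `pairwise_features` and the count vectors are hypothetical.

```python
import itertools


def poly2_kernel(x, y):
    """Homogeneous degree-2 polynomial kernel k(x, y) = (x . y)^2."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot ** 2


def pairwise_features(x):
    """Explicit feature map for (x . y)^2: every ordered product x_i * x_j."""
    n = len(x)
    return [x[i] * x[j] for i, j in itertools.product(range(n), repeat=2)]


# Hypothetical motif-count vectors for two sequence windows
# (counts of three transcription factor binding motifs).
u = [2, 0, 1]
v = [1, 3, 1]

# The kernel value equals the dot product in the pairwise-interaction space.
explicit = sum(a * b for a, b in zip(pairwise_features(u), pairwise_features(v)))
print(poly2_kernel(u, v), explicit)  # 9 9
```

The kernel route costs one dot product per pair of windows, whereas the explicit map grows quadratically with the number of motifs.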
Year: 2012 PMID: 22416065 PMCID: PMC3401425 DOI: 10.1093/nar/gks209
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Our EpiPredictor system. (A) Architecture of EpiPredictor. The Motif Analyzer, PRE Classifier and GC Analyzer modules are dedicated to the prediction of PRE sites, whereas the PRE-to-gene Mapper, Conservation Level Analyzer and Comparative Genomics Analyzer modules focus on the prediction of PcG target genes. (B) Flowchart of the PRE site prediction modules of EpiPredictor.
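The GC Analyzer's exact scoring is not described in this record. As a rough, hedged illustration of the kind of quantity such a module works with (the abstract lists high GC content as a feature of PcG targets), the sketch below computes GC fraction over sliding windows of a candidate region. The window and step sizes, the function names and the sequence are all hypothetical.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G/C among A/C/G/T bases; N and other symbols are ignored."""
    acgt = [b for b in seq.upper() if b in "ACGT"]
    if not acgt:
        return 0.0
    return sum(b in "GC" for b in acgt) / len(acgt)


def sliding_gc(seq: str, window: int, step: int):
    """Yield (start, gc_fraction) for every full-length window along seq."""
    for start in range(0, len(seq) - window + 1, step):
        yield start, gc_fraction(seq[start:start + window])


# Hypothetical candidate region; window/step values are illustrative only.
region = "ATGCGCGCAATTGCGC"
for start, gc in sliding_gc(region, window=8, step=4):
    print(start, gc)
# 0 0.75
# 4 0.5
# 8 0.5
```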
SVM kernel evaluation
| Metric | Kernel | ||||
|---|---|---|---|---|---|
| Linear | Polynomial ( | Polynomial ( | RBF | Sigmoid | |
| Sensitivity | 0.80 ± 0.05 | 0.80 ± 0.05 | 0.60 ± 0.05 | 0.00 ± 0.00 | |
| Specificity | 0.91 ± 0.01 | 0.96 ± 0.01 | 0.99 ± 0.02 | 0.84 ± 0.03 | |
Sensitivity = TP/(TP + FN); Specificity = TN/(TN + FP),
where TP, TN, FP and FN correspond to true positives, true negatives, false positives and false negatives, respectively. We performed three independent runs of 10-fold cross-validation on the training collection and report the mean sensitivity and specificity together with their standard deviations. The kernel with the best performance in both sensitivity and specificity is highlighted in bold; this is also the kernel we used throughout our analyses.
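The evaluation protocol above (sensitivity and specificity from three independent runs of 10-fold cross-validation) can be sketched as follows. This is a minimal sketch under stated assumptions: each run pools the held-out predictions of all folds before scoring, and the standard deviation is taken across runs; the source does not say which convention was used. `fit_predict` is a hypothetical hook standing in for the SVM, and `threshold_classifier` is a toy classifier used only to make the example runnable.

```python
import random
import statistics


def sens_spec(y_true, y_pred):
    """Return (TP/(TP+FN), TN/(TN+FP)) for labels 1 = PRE, 0 = non-PRE."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def repeated_kfold(X, y, fit_predict, k=10, runs=3):
    """Return ((mean, SD) of sensitivity, (mean, SD) of specificity) over runs.

    fit_predict(train_X, train_y, test_X) -> list of 0/1 predictions (hypothetical hook).
    """
    sens, spec = [], []
    for run in range(runs):
        idx = list(range(len(y)))
        random.Random(run).shuffle(idx)          # independent split per run
        pred = [None] * len(y)
        for f in range(k):
            test = idx[f::k]                     # fold f of k
            test_set = set(test)
            train = [i for i in idx if i not in test_set]
            out = fit_predict([X[i] for i in train], [y[i] for i in train],
                              [X[i] for i in test])
            for i, p in zip(test, out):
                pred[i] = p                      # pool held-out predictions
        se, sp = sens_spec(y, pred)
        sens.append(se)
        spec.append(sp)
    return ((statistics.mean(sens), statistics.stdev(sens)),
            (statistics.mean(spec), statistics.stdev(spec)))


def threshold_classifier(train_X, train_y, test_X):
    """Toy 1-D classifier: cut halfway between the two class means."""
    pos = [x for x, t in zip(train_X, train_y) if t == 1]
    neg = [x for x, t in zip(train_X, train_y) if t == 0]
    cut = (statistics.mean(pos) + statistics.mean(neg)) / 2
    return [1 if x > cut else 0 for x in test_X]


# Hypothetical, perfectly separable 1-D data: 10 negatives, 10 positives.
X = [0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 15, 16, 17, 18, 19, 15, 16, 17, 18, 19]
y = [0] * 10 + [1] * 10
result = repeated_kfold(X, y, threshold_classifier)
print(result)  # ((1.0, 0.0), (1.0, 0.0))
```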
Evaluation of the performance of individual EpiPredictor components against three genome-wide ChIP studies in D. melanogaster and their intersection
| Number of top genes | Components | Schwartz | Tolhuis | Schuettengruber | Intersection |
|---|---|---|---|---|---|
| 243 | (a) | 14.20% | 5.33% | 12.09% | 2.63% |
| | (a,b) | 22.73% | 9.78% | 19.53% | 23.68% |
| | (a,b,c) | 26.14% | 10.22% | 25.12% | 23.68% |
| | (a,b,c,d) | | | | |
| 322 | (a,b,c,d) | 32.39% | 14.22% | 30.70% | 34.21% |
| | (a,b,c,d,e) | | | | |
(a): Motif Analyzer; (b): SVM Classifier; (c): GC Analyzer; (d): Conservation Level Analyzer; (e): Comparative Genomics Analyzer.
ᵃ Overlap with the genes predicted by Schwartz et al. (4).
ᵇ Overlap with the genes predicted by Tolhuis et al. (15).
ᶜ Overlap with the genes predicted by Schuettengruber et al. (14).
ᵈ Overlap with the genes in the intersection of the Schwartz et al., Tolhuis et al. and Schuettengruber et al. sets.
ᵉ The number of top genes retrieved from the EpiPredictor-Basic analysis.
ᶠ Suppose the validation set contains V genes and W of the top N genes predicted by our system match it; the overlap is then reported as W/V.
ᵍ The EpiPredictor-Basic module.
ʰ The number of top genes retrieved from the EpiPredictor-CG analysis.
ⁱ The EpiPredictor-CG module. The results corresponding to the EpiPredictor-Basic and EpiPredictor-CG models are highlighted in bold.
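The overlap metric W/V defined in the footnotes can be written out directly. In this sketch the gene identifiers are hypothetical; the point is that the denominator is the size of the validation set V, not the number of predictions N, so the figure behaves like a recall of the validation set within the top N predictions.

```python
def overlap_ratio(ranked_genes, validation_set, top_n):
    """W/V: W = validation genes found among the top_n predictions, V = |validation_set|."""
    top = set(ranked_genes[:top_n])
    w = len(top & set(validation_set))
    return w / len(validation_set)


# Hypothetical ranked predictions and validation set.
ranked = ["g1", "g2", "g3", "g4", "g5", "g6"]
validation = {"g2", "g5", "g9", "g10"}
print(f"{overlap_ratio(ranked, validation, top_n=3):.2%}")  # 25.00% (1 of 4)
print(f"{overlap_ratio(ranked, validation, top_n=6):.2%}")  # 50.00% (2 of 4)
```

Because V is fixed for a given validation study, increasing N (for example, from 243 to 322 genes) can only keep W/V the same or raise it.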
Evaluation of the performance of our system using SVM-based PRE classifier vs BART-based PRE classifier
| Method | Components | Schwartz | Tolhuis | Schuettengruber | Intersection |
|---|---|---|---|---|---|
| SVM | (a, b) | | | | |
| | (a, b, c) | | | | |
| BART | (a, d) | 21.59% | 8.44% | 19.07% | 21.05% |
| | (a, d, c) | 22.73% | 9.33% | 22.79% | 21.05% |
(a): Motif Analyzer; (b): SVM-based Classifier; (c): GC Analyzer; (d): BART-based Classifier.
ᵃ Overlap between the top 243 predicted genes and the genes predicted by Schwartz et al. (4).
ᵇ Overlap between the top 243 predicted genes and the genes predicted by Tolhuis et al. (15).
ᶜ Overlap between the top 243 predicted genes and the genes predicted by Schuettengruber et al. (14).
ᵈ Overlap between the top 243 predicted genes and the genes in the intersection of the Schwartz et al., Tolhuis et al. and Schuettengruber et al. sets. The results of the SVM-based classifier are highlighted in bold.
Comparison of the overlaps between the PRE genes predicted by EpiPredictor and jPREdictor and three genome-wide ChIP studies in D. melanogaster and their intersection
| Scheme | Approach | Schwartz | Tolhuis | Schuettengruber | Intersection |
|---|---|---|---|---|---|
| Original (243 genes) | | | | | |
| | | 21.02% | 8.00% | 20.00% | 21.05% |
| Comparative genomics (322 genes) | | | | | |
| | | 27.84% | 12.44% | 22.79% | 26.32% |
ᵃ Overlap with the genes detected by Schwartz et al. (4).
ᵇ Overlap with the genes detected by Tolhuis et al. (15).
ᶜ Overlap with the genes detected by Schuettengruber et al. (14).
ᵈ Overlap with the genes in the intersection of the Schwartz et al., Tolhuis et al. and Schuettengruber et al. sets.
ᵉ Data reported in the original publication (51).
ᶠ The results of EpiPredictor-Basic and EpiPredictor-CG are highlighted in bold.
Figure 2. ROC curves of the PRE genes predicted by EpiPredictor and jPREdictor. Shown are overlaps with the genes predicted by Schwartz et al. (A), Tolhuis et al. (B) and Schuettengruber et al. (C), and with the genes in the intersection of all three sets (D). The AUCs on the four validation sets are 0.61, 0.61, 0.58 and 0.60 for EpiPredictor-Basic; 0.62, 0.57, 0.62 and 0.53 for EpiPredictor-CG; 0.64, 0.56, 0.59 and 0.67 for jPREdictor (static); and 0.56, 0.49, 0.55 and 0.59 for jPREdictor (dynamic), respectively.
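The caption does not say how the ROC curves were constructed. One common reading, assumed here rather than taken from the paper, treats membership in a ChIP validation set as the positive label and the predictor's score as the ranking; the AUC is then the probability that a randomly chosen positive gene outscores a randomly chosen negative one (the Mann–Whitney formulation), with 0.5 corresponding to a random ranking. Scores and labels below are hypothetical.

```python
def roc_auc(scores, labels):
    """AUC = P(random positive outscores random negative); ties count 1/2."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical gene scores; label 1 = gene appears in a ChIP validation set.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
labels = [1, 0, 1, 0, 0, 1]
print(round(roc_auc(scores, labels), 3))  # 0.556 (5 of 9 positive/negative pairs)
```

The pairwise loop is O(P·N), which is fine for a sketch; a rank-sum formulation gives the same value in O(n log n).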
Figure 3. Gene ontology analysis of genes predicted by EpiPredictor and jPREdictor. Shown are the top 10 gene ontology terms related to the genes predicted by: (A) EpiPredictor-CG; (B) EpiPredictor-CG but not jPREdictor (dynamic); (C) jPREdictor (dynamic); (D) jPREdictor (dynamic) but not EpiPredictor-CG; (E) both EpiPredictor-CG and jPREdictor (dynamic); (F) EpiPredictor-CG, excluding the seven annotated genes.
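The six gene sets analysed in panels A–F follow from plain set algebra on the two prediction lists. The sketch below only builds those input sets; the gene ontology enrichment step itself (and whichever tool was used for it) is not shown. The function name and the gene identifiers are hypothetical.

```python
def go_panel_sets(epi_cg, jpred_dynamic, annotated):
    """Gene sets behind panels A-F, built from the two prediction lists."""
    epi, jpd, ann = set(epi_cg), set(jpred_dynamic), set(annotated)
    return {
        "A": epi,           # EpiPredictor-CG
        "B": epi - jpd,     # EpiPredictor-CG but not jPREdictor (dynamic)
        "C": jpd,           # jPREdictor (dynamic)
        "D": jpd - epi,     # jPREdictor (dynamic) but not EpiPredictor-CG
        "E": epi & jpd,     # predicted by both
        "F": epi - ann,     # EpiPredictor-CG excluding the annotated genes
    }


# Hypothetical gene identifiers.
panels = go_panel_sets({"g1", "g2", "g3", "g4"}, {"g3", "g4", "g5"}, {"g1"})
for name in sorted(panels):
    print(name, sorted(panels[name]))
```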
Annotation of a set of seven genes uniquely identified by EpiPredictor-CG
| Gene | Verified function in … | Vertebrate homologue |
|---|---|---|
| | A newly experimentally validated PRE was found to exist in the … | |
| | Embryogenesis (Wingless/Wnt signaling pathway) (…) | WNT1: predicted as PcG target in human (…) |
| | Embryogenesis, neurogenesis (…) | |
| | Important for a variety of cell fate decisions in development (…) | |
| | Induces ectopic eye development (…) | DACH-1: predicted as PcG target in human (…) |
| | | GSC: predicted as PcG target in human (…) |
| | Imaginal disc development (…) | ISL1: predicted as PcG target in human (…) |
Figure 4. ChIP-qPCR verification of EpiPredictor predictions. Shown is the enrichment of each genomic region (predicted PRE site) in S2 cell ChIP samples immunoprecipitated with anti-E(z) antibody relative to samples immunoprecipitated with an anti-FLAG mock antibody. The horizontal line marks an enrichment of 1 (no enrichment). The gene symbols listed are those of the genes closest to the tested genomic regions. For specific coordinates, please refer to Supplementary Table S10.
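The caption does not state how enrichment was normalised. A common convention, assumed here and not confirmed by the source, is the ΔCt method: with equal template input for both immunoprecipitations and near-perfect amplification (doubling per cycle), enrichment of the specific IP over the mock IP is 2^(Ct_mock − Ct_specific), so identical Ct values give the "no enrichment" value of 1 marked by the horizontal line. The function name and Ct values are hypothetical.

```python
def fold_enrichment(ct_specific, ct_mock, efficiency=2.0):
    """Specific-vs-mock IP enrichment: efficiency ** (Ct_mock - Ct_specific).

    Assumes equal template input for both IPs and an amplification factor of
    `efficiency` per PCR cycle (2.0 = perfect doubling).
    """
    return efficiency ** (ct_mock - ct_specific)


# Hypothetical Ct values for two predicted PRE regions.
print(fold_enrichment(ct_specific=26.0, ct_mock=29.0))  # 8.0 (3 cycles earlier)
print(fold_enrichment(ct_specific=28.0, ct_mock=28.0))  # 1.0, i.e. no enrichment
```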