Grzegorz M Burzynski, Xylena Reed, Leila Taher, Zachary E Stine, Takeshi Matsui, Ivan Ovcharenko, Andrew S McCallion.
Abstract
Illuminating the primary sequence encryption of enhancers is central to understanding the regulatory architecture of genomes. We have developed a machine learning approach to decipher motif patterns of hindbrain enhancers and identify 40,000 sequences in the human genome that we predict to exert regulatory control in domains that include the hindbrain. Consistent with their roles in hindbrain patterning, MEIS1, NKX6-1, and HOX and POU family binding motifs contributed strongly to this enhancer model. Predicted hindbrain enhancers are overrepresented at genes expressed in hindbrain and associated with nervous system development, and reside primarily in regions of open chromatin. In addition, 77 (0.2%) of these predictions are identified as hindbrain enhancers on the VISTA Enhancer Browser, and 26,000 (60%) overlap enhancer marks (H3K4me1 or H3K27ac). To validate these putative hindbrain enhancers, we selected 55 elements distributed throughout our predictions and six low-scoring controls for evaluation in a zebrafish transgenic assay. When assayed in mosaic transgenic embryos, 51/55 elements directed expression in the central nervous system. Furthermore, 30/34 (88%) predicted enhancers analyzed in stable zebrafish transgenic lines directed expression in the larval zebrafish hindbrain. Subsequent analysis of sequence fragments selected based upon motif clustering further confirmed the critical role of the motifs contributing to the classifier. Our results demonstrate the existence of a primary sequence code characteristic of hindbrain enhancers. This code can be accurately extracted using machine-learning approaches and applied successfully for de novo identification of hindbrain enhancers. This study represents a critical step toward the dissection of regulatory control in specific neuronal subtypes.
Year: 2012 PMID: 22759862 PMCID: PMC3483557 DOI: 10.1101/gr.139717.112
Source DB: PubMed Journal: Genome Res ISSN: 1088-9051 Impact factor: 9.043
Figure 1. Hindbrain enhancers can be accurately predicted from DNA sequence. (A) Area under the ROC curve (AUC) for three Hb enhancer classifiers trained on three highly overlapping data sets (enhancers with activity in the anterior Hb, posterior Hb, and whole Hb). AUC values range from 0.5 (random discrimination) to a theoretical maximum of 1. We tested the performance of the classifiers in a cross-validation setting and obtained values of 0.89 (anterior Hb), 0.92 (posterior Hb), and 0.89 (combined Hb). (B) Overlap among the top-scoring 5% Hb enhancer predictions produced by all three Hb classifiers. (C) Fold-enrichment in 787 genes involved in Hb function in the neighborhood of positive predictions or putative Hb enhancers. Putative Hb enhancers were associated with the closest gene. P-values were computed using Fisher's exact test.
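The two statistics named in the Figure 1 legend, AUC and Fisher's exact test, can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names `auc` and `fisher_enrichment_p`, the one-sided form of the test, and all scores and counts below are hypothetical and chosen only for illustration. The sketch assumes Python 3.8+ (for `math.comb`).

```python
from math import comb


def auc(pos_scores, neg_scores):
    """Area under the ROC curve, computed as the probability that a randomly
    chosen positive outscores a randomly chosen negative (ties count 0.5)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def fisher_enrichment_p(a, b, c, d):
    """One-sided Fisher's exact test for enrichment on the 2x2 table
    [[a, b], [c, d]]: probability of observing a count of at least `a`
    in the top-left cell when all row and column totals are fixed."""
    row1 = a + b            # e.g. genes in the set of interest
    col1 = a + c            # e.g. genes nearest to a predicted enhancer
    total = a + b + c + d
    upper = min(row1, col1)
    tail = sum(
        comb(row1, k) * comb(total - row1, col1 - k)
        for k in range(a, upper + 1)
    )
    return tail / comb(total, col1)


# Hypothetical classifier scores (assumption, not data from the study).
enhancer_scores = [0.9, 0.4]
background_scores = [0.5, 0.1]
print(auc(enhancer_scores, background_scores))   # 0.75

# Hypothetical gene counts (assumption, not data from the study).
print(fisher_enrichment_p(3, 0, 0, 3))           # 0.05
```

An AUC of 0.5 here would mean the scores do not separate the two groups at all, matching the "random discrimination" baseline in the legend. The pairwise loop is quadratic in the number of sequences; a rank-based formulation would be preferable for genome-scale score lists.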
Figure 2. Experimental validation of tissue-specific enhancer candidates in transgenic zebrafish and mouse assays. Our computational approach trained on small empirical data sets (red bars) resulted in validation rates comparable to those for ChIP-seq-derived data sets using an EP300 antibody (gray bars) for the heart. Similarly, the validation rates of computational Hb classifiers trained on small empirical data sets were also comparable to those obtained with EP300 ChIP-seq experiments in other brain tissues.
Figure 3. Predicted enhancers display pleiotropic expression patterns in the hindbrain. (A–H) GFP reporter expression from eight stable lines corresponding to Hb predictions showing expression across the Hb as well as in some non-Hb domains. Dorsal view images were taken at 3 dpf (for lateral images, see Supplemental Figures), anterior to the left. (A) HB41, (B) HB34, (C) HB02, (D) HB15, (E) HB25, (F) HB51, (G) HB10, (H) HB50. (Cb) cerebellum; (OV) otic vesicle; (Hb) hindbrain; (L) lens; (My) myotome; (dDi) dorsal diencephalon; (Tm) tegmentum; (CG) cranial ganglia; (fb) fin bud.
Figure 4. TF clustering reveals functional sequence domains. (A,H) UCSC Genome Browser custom track showing injected construct, classifier predicted HB sequence, and fragments tested for Hb expression (black bars, top to bottom). Colored bars mark TFBS for various factors. (B–G, I–P) GFP reporter expression observed with each sequence (lateral view, top; dorsal view, bottom). All images taken at 2 dpf, anterior to the left. (A) HB01 custom track with two subcloned fragments, (B) full-length HB01, lateral view, (C) HB01_I, lateral view, (D) HB01_II, no G0 GFP reporter expression observed, (E) full-length HB01, dorsal view, (F) HB01_I, dorsal view, (G) HB01_II, no G0 GFP reporter expression observed. (H) HB16 custom track with three subcloned fragments, (I) full-length HB16, lateral view, (J) HB16_I, lateral view, (K) HB16_II, lateral view, (L) HB16_III, lateral view, (M) full-length HB16, dorsal view, (N) HB16_I, dorsal view, (O) HB16_II, dorsal view, (P) HB16_III, dorsal view. (CG) cranial ganglia; (M-H) midbrain hindbrain boundary; (Hb) hindbrain; (Fb) forebrain; (Tm) tegmentum; (My) myotome; (L) lens.