
Datasets on the genomic positions of the MLL1 morphemes, the ZFP57 binding site, and ZFBS-Morph overlaps in the build mm9 of the mouse genome.

Minou Bina, Phillip Wyss, Xiaohui C Song.

Abstract

While MLL1 activates gene expression in most tissues, ZFP57 represses transcription. MLL1 selectively interacts with a group of nonmethylated DNA sequences known as the MLL1 morphemes. ZFP57 associates with a methylated hexamer (ZFBS), dispersed in the genomic DNA segments known as Imprinted Control Regions (ICRs) and germline Differentially Methylated Regions (gDMRs), to maintain allele-specific gene repression. We have identified a set of composite DNA elements (ZFBS-Morph overlaps) that provides the sequence context of ZFBS in the canonical ICRs/gDMRs. This report provides tables listing the nucleotide sequences of the MLL1 morphemes and ZFBS-Morph overlaps. The report also offers links to the data repository at Purdue University, for downloading the positions of the MLL1 morphemes, the ZFP57 binding site, and the ZFBS-Morph overlaps in the mouse genome.

Keywords:  CpG-rich motifs; Gene regulation; Genomic imprinting; KMT2A; MLL1 morphemes; Mouse genome; ZFP57 binding site

Year:  2017        PMID: 28616452      PMCID: PMC5458072          DOI: 10.1016/j.dib.2017.05.050

Source DB:  PubMed          Journal:  Data Brief        ISSN: 2352-3409


Value of the data

Two tables and three datasets are offered to the scientific community. One table lists the nucleotide sequences of the MLL1 morphemes, the other the nucleotide sequences of ZFBS-Morph overlaps.

Three datasets were created to provide the genomic positions of functionally important DNA sequence-motifs: the MLL1 morphemes, the ZFP57 binding site, and ZFBS-Morph overlaps.

The datasets consist of two bed files that could be uploaded onto the UCSC genome browser (build mm9 of the mouse genome), to create custom tracks. One file contains the genomic positions of the MLL1 morphemes, the other includes the genomic positions of ZFP57 binding site and ZFBS-Morph overlaps.

Availability of these datasets facilitates viewing and analyzing genomic positions of functionally important sequence-motifs in the context of the ENCODE data and mapped landmarks including the position of protein-coding genes and CpG Islands.

Data

Mixed Lineage Leukemia 1 (MLL or MLL1) is an essential regulator of transcription [1], [2]. MLL1 selectively interacts with a group of nonmethylated DNA sequences known as the MLL1 morphemes: the smallest ‘words’ in DNA that selectively bind the MT-domain in MLL1 [3]. The MLL1 gene is one of the mammalian orthologs of the Drosophila Trithorax [4]. In human cells, functions of MLL1 include gene bookmarking during mitosis, in a manner favoring genes that were highly transcribed during interphase [5]. Gene bookmarking may involve interactions of MLL1 with morphemes that are localized in CGIs: the CpG islands [3]. The MLL1 morphemes contain 2–3 CpGs and occur in both the forward and the reverse orientation in genomic DNA (Table 1). Even though the MLL1 morphemes are dispersed along the chromosomal DNA, often they are clustered in CGIs [3], [6]. Examples include two CGIs (CpG36 and CpG72) associated with the Plagl1/Zac1 loci (Fig. 1). As a consequence of length-variability of CGIs [7], morpheme-frequencies in the islands vary: for examples, see Refs. [3], [6].
Table 1

MLL1 morphemes.

CGACG CGTCG
CGCCG CGGCG
CGCGCG
CGTGCG CGCACG
CGCCCG CGGGCG
CGGACG CGTCCG
CGTACG
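The pairing in Table 1 can be checked computationally: each two-sequence row lists a morpheme and its reverse complement, and each single-sequence row is its own reverse complement. The Python sketch below is illustrative only. The authors' scripts were written in Perl and are not reproduced here, and the names `TABLE_1`, `reverse_complement`, and `all_orientations` are hypothetical.

```python
# Rows of Table 1, written 5' to 3'. Single-sequence rows are palindromic.
TABLE_1 = [
    ("CGACG", "CGTCG"),
    ("CGCCG", "CGGCG"),
    ("CGCGCG",),
    ("CGTGCG", "CGCACG"),
    ("CGCCCG", "CGGGCG"),
    ("CGGACG", "CGTCCG"),
    ("CGTACG",),
]

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def all_orientations(rows):
    """Return the sorted set of every listed sequence in both orientations."""
    motifs = set()
    for row in rows:
        for seq in row:
            motifs.add(seq)
            motifs.add(reverse_complement(seq))
    return sorted(motifs)


# Check that each row is a reverse-complement pair (or a palindrome).
for row in TABLE_1:
    partner = row[1] if len(row) == 2 else row[0]
    assert reverse_complement(row[0]) == partner

MOTIFS = all_orientations(TABLE_1)
print(len(MOTIFS), "distinct sequences")
```

Because the table already lists both orientations, scanning only the forward strand of a chromosome with the full motif set covers occurrences on either strand.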
Fig. 1

A cluster of ZFBS-Morph overlaps localizes the possible peak position of the Zac1 gDMR. Box 1 marks the position of CpG72, a conserved CGI that is methylated in oocyte DNA [11]. CpG72 includes a cluster of 5 ZFBS-Morph overlaps, marked by Box 2. As expected, a cluster of ZFBS also is present in CpG72 (Box 3). Random occurrences of ZFBS are marked by Box 4. A cluster of ZFBS also maps to a region that is not part of the gDMR (Box 5). That region includes a single, isolated, ZFBS-Morph overlap. The CGI that is not imprinted (CpG36) does not contain ZFBS-Morph overlaps. The track labeled MLL1 sites shows the position of the MLL1 morphemes in the displayed chromosomal location (chr10:12,749,001–12,879,000). In this relatively long genomic DNA segment (130,000 bps), closely-spaced MLL1 morphemes appear as thick vertical bars, isolated occurrences as thin vertical lines. Clustering of the MLL1 morphemes in CGIs is more apparent in shorter DNA segments; for examples see Refs. [3], [6].

In contrast to MLL1, ZFP57 represses transcription [8]. Even though the ZFP57 binding site (ZFBS), a methylated hexamer, is dispersed in many loci, the site occurs often in ICRs to maintain allele-specific gene repression [9]. To identify the sequence context of ZFBS in ICRs, we extended the ZFBS length to include a subset of the MLL1 morphemes (Table 2), producing ZFBS-Morph overlaps [10]. Clusters of 2 or more ZFBS-Morph overlaps correctly localized ~90% of the known germline ICRs in the mouse genome [10], Table 3. As an example, Fig. 1 shows a cluster of 5 ZFBS-Morph overlaps in the gDMR of Zac1. This cluster is within CpG72, a conserved CGI that is methylated in oocyte DNA [11].
Table 2

ZFBS-Morph overlaps.

TGCCGCG CGCGGCA
TGCCGCCG CGGCGGCA
TGCCGCGCG CGCGCGGCA
TGCCGCCCG CGGGCGGCA
TGCCGCACG CGTGCGGCA
Table 3

Closely-spaced ZFBS-Morph overlaps in the canonical ICRs in the mouse genome. Identical genes that are displayed in 2 rows contain closely-spaced ZFBS-Morph overlaps at two different genomic positions.

Genomic positions (mm9)        Genes    ZFBS-Morph overlaps
chr1:63,246,711-63,246,910     Gpr1     TGCCGCCG, CGCGGCA
chr2:157,385,801-157,387,500   Nnat     TGCCGCG, CGGGCGGCA, TGCCGCG
chr2:152,512,591-152,512,650   Mcts2    TGCCGCG, TGCCGCGCG
chr2:174,121,336-174,121,660   Gnas     TGCCGCG, CGCGGCA, TGCCGCG, CGCGCGGCA
chr2:174,124,701-174,125,300   Gnas     CGCGGCA, TGCCGCCCG, TGCCGCCCG, TGCCGCCG
chr2:174,152,536-174,154,195   Gnas_Ex  TGCCGCCG, CGGCGGCA, TGCCGCCG, TGCCGCCCG
chr2:174,155,591-174,156,025   Gnas_Ex  CGGCGGCA, TGCCGCG
chr6:4,697,131-4,698,550       Peg10    TGCCGCG, TGCCGCG
chr6:30,687,491-30,688,825     Mest     TGCCGCG, CGCGGCA, TGCCGCG, CGGGCGGCA, TGCCGCG, TGCCGCG
chr6:58,856,861-58,857,170     Nap1l5   CGCGGCA, CGCGGCA
chr7:67,148,966-67,149,720     Snrpn    CGCGGCA, CGCGGCA
chr7:6,681,601-6,683,200       Peg3     CGTGCGGCA, CGCGGCA, TGCCGCG, CGCGGCA
chr7:135,831,441-135,832,095   Inpp5f   CGCGGCA, TGCCGCG, CGCGGCA, TGCCGCG
chr7:149,765,896-149,766,315   H19      CGCGGCA, TGCCGCG, CGCGGCA
chr7:149,767,676-149,767,975   H19      TGCCGCCG, CGTGCGGCA, CGCGGCA
chr7:150,481,306-150,481,730   KvDMR1   CGCGGCA, TGCCGCG
chr8:125,388,921-125,389,390   Cdh15    TGCCGCG, TGCCGCG
chr9:89,774,326-89,775,050     Rasgrf1  TGCCGCG, TGCCGCG
chr10:12,810,341-12,811,120    Zac1     CGCGGCA, CGCGGCA, TGCCGCG, TGCCGCG, TGCCGCG
chr11:11,925,501-11,926,400    Grb10    CGCGGCA, CGCGGCA
chr12:110,764,761-110,766,795  IG-DMR   CGCGGCA, CGCGGCA, TGCCGCG, TGCCGCG, TGCCGCG
chr15:72,640,121-72,641,650    Peg13    CGCGGCA, CGCGGCA
chr17:12,934,306-12,935,515    Igf2r    CGCGGCA, TGCCGCG, CGCGGCA, CGCGGCA, CGCGGCA, TGCCGCG, TGCCGCG
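The positions in Table 3 use the browser's display notation (1-based, inclusive, with thousands separators), whereas bed files store 0-based starts with exclusive ends. The Python sketch below is one possible way to convert between the two and to compute the length of a region. The function names `parse_position`, `to_bed_interval`, and `span_length` are hypothetical and are not part of the published datasets.

```python
import re


def parse_position(text):
    """Parse 'chrN:start-end' (commas allowed, hyphen or en dash) into a tuple."""
    match = re.fullmatch(r"(chr\w+):([\d,]+)[-\u2013]([\d,]+)", text.strip())
    if match is None:
        raise ValueError("not a chrom:start-end position: " + repr(text))
    chrom = match.group(1)
    start = int(match.group(2).replace(",", ""))
    end = int(match.group(3).replace(",", ""))
    return chrom, start, end


def to_bed_interval(chrom, start, end):
    """Convert a 1-based inclusive position to a 0-based half-open bed interval."""
    return chrom, start - 1, end


def span_length(start, end):
    """Number of bases in a 1-based inclusive position."""
    return end - start + 1


# Zac1 row of Table 3 and the region displayed in Fig. 1.
zac1 = parse_position("chr10:12,810,341-12,811,120")
fig1 = parse_position("chr10:12,749,001\u201312,879,000")
print(to_bed_interval(*zac1))
print(span_length(fig1[1], fig1[2]))  # 130000, the length quoted in the Fig. 1 caption
```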

Methods

We created two text files: one consisting of the MLL1 morphemes (Table 1; for details see Ref. [3]), the other containing the ZFBS-Morph overlaps (Table 2; for details see Ref. [10]). Each table has 2 columns displaying complementary pairs of sequences. Both sequences in a pair are written in the 5′ to 3′ direction; a single sequence is shown for complementary pairs with identical sequences.

Subsequently, we downloaded from the UCSC genome browser the nucleotide sequences of the mouse chromosomes in build mm9 [12]. We wrote 2 Perl scripts [3]. Script 1 read the sequences in Table 1 and scanned the nucleotide sequence of a specified chromosome; its output listed the positions of the MLL1 morphemes along the analyzed chromosome. Script 2 read the output of Script 1 to create a bed file. We combined the per-chromosome bed files to obtain the positions of the MLL1 morphemes for the complete set of the mouse chromosomes, and added a ‘header’ to the combined file.

The final bed file can be uploaded to the UCSC genome browser to create a custom track displaying the genomic positions of the MLL1 morphemes along the mouse chromosomes. The Specifications Table provides a link for downloading the file that contains the positions of the MLL1 morphemes in the mouse genome. After you upload the file to create a custom track, the page may display an entire chromosome. To go to a specific region, type the name of a gene or a desired chromosomal location in the query box; for examples see Table 3 and Refs. [13], [14].
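The scan-and-convert steps described above can be sketched in Python as follows. This is not the authors' Perl code. The function names, the toy input sequence, and the content of the track line are assumptions made for illustration, and the header in the published file may differ. The sketch writes bed coordinates with a 0-based start and an exclusive end.

```python
import re

# The 12 sequences of Table 1 (both orientations, 5' to 3').
MLL1_MORPHEMES = [
    "CGACG", "CGTCG", "CGCCG", "CGGCG", "CGCGCG", "CGTGCG", "CGCACG",
    "CGCCCG", "CGGGCG", "CGGACG", "CGTCCG", "CGTACG",
]


def find_motifs(sequence, motifs):
    """Return sorted (start, end, motif) hits; 0-based start, exclusive end."""
    seq = sequence.upper()  # UCSC FASTA files soft-mask repeats in lowercase
    hits = []
    for motif in motifs:
        # Zero-width lookahead so that overlapping occurrences are all reported.
        for match in re.finditer("(?=" + re.escape(motif) + ")", seq):
            hits.append((match.start(), match.start() + len(motif), motif))
    return sorted(hits)


def to_bed(chrom, hits, track_name="MLL1 sites"):
    """Format hits as bed text with a hypothetical custom-track header line."""
    lines = ['track name="{}"'.format(track_name)]
    for start, end, motif in hits:
        lines.append("{}\t{}\t{}\t{}".format(chrom, start, end, motif))
    return "\n".join(lines) + "\n"


# Toy sequence standing in for a chromosome; real input would be read from FASTA.
toy = "AACGACGTTcgcgcgAA"
hits = find_motifs(toy, MLL1_MORPHEMES)
print(to_bed("chr10", hits), end="")
```

Running the same function once per chromosome and concatenating the data lines under a single track line mirrors the combining step described in the text.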

Specifications Table

Subject area                 Genomics
More specific subject area   Gene regulation
Type of data                 Tables and text files (in bed format, for display at the UCSC genome browser)
How data was acquired        Analyzing the mouse chromosomes using Perl Scripts
Data format                  Tables and text files
Experimental features        None
Data accessibility           Two links to files deposited at the Purdue University Research Repository:
1) Bina, M., Wyss, P.J., Wang, D., Song, X.C. (2014). Localization of MLL1 morphemes in mouse mm9 genomic DNA. Purdue University Research Repository. doi:10.4231/R7KW5CXF
https://purr.purdue.edu/publications/1648/1
2) Bina, M., Wyss, P.J., Wang, D., Song, X.C. (2014). Localization of MLL1 morphemes in mouse mm9 genomic DNA. Purdue University Research Repository. doi:10.4231/R7KW5CXF
https://purr.purdue.edu/publications/2473/1

Subsequently, we followed a similar approach to obtain additional bed files for display at the UCSC genome browser. Specifically, we applied a modified form of Script 1, using as input a file containing the ZFP57 binding site, as a complementary pair of sequences, and the nucleotide sequence of a specified chromosome. Likewise, we applied the modified form of Script 1 to a file containing the ZFBS-Morph overlaps (Table 2) and the nucleotide sequence of a specified chromosome. The subsequent steps were done as above. The Specifications Table provides a link for downloading the bed file that contains the genomic positions of both ZFBS and the ZFBS-Morph overlaps.

You can upload several datasets to create custom tracks at the UCSC genome browser. To increase the font size, use the pull-down menu under ‘view’ at the top of the browser page to configure the browser; for an example see Fig. 1. Under the same menu, you can select PDF to obtain a snapshot for your records or for a publication.

For data validation, we analyzed results of ChIP assays reporting allele-specific binding of ZFP57 to ICRs/gDMRs [15]. Our approach localized the likely peak-positions of the canonical ICRs/gDMRs in the mouse genome (Table 3); for details see Ref. [10].
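Clusters of 2 or more closely-spaced ZFBS-Morph overlaps, as listed in Table 3, could be extracted from the bed positions along the lines of the Python sketch below. The source does not state the spacing used to define "closely-spaced", so `max_gap` is a hypothetical parameter. The function name `cluster_hits` and the example positions are also invented for illustration.

```python
def cluster_hits(starts, max_gap, min_hits=2):
    """Group start positions on one chromosome into clusters.

    Consecutive hits join the same cluster when they are at most max_gap
    bases apart. Returns (first_start, last_start, n_hits) for each cluster
    that contains at least min_hits hits.
    """
    clusters = []
    current = []
    for pos in sorted(starts):
        if current and pos - current[-1] > max_gap:
            # Gap too large: close the running cluster and start a new one.
            if len(current) >= min_hits:
                clusters.append((current[0], current[-1], len(current)))
            current = []
        current.append(pos)
    if len(current) >= min_hits:
        clusters.append((current[0], current[-1], len(current)))
    return clusters


# Invented start positions; max_gap=500 is an assumption, not a published value.
example_starts = [100, 250, 400, 5000, 9000, 9100]
print(cluster_hits(example_starts, max_gap=500))
```

Isolated hits, such as the single ZFBS-Morph overlap outside the Zac1 gDMR in Fig. 1, fall below `min_hits` and are not reported as clusters.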
References (10 of 14 shown)

1. Ann S Zweig; Donna Karolchik; Robert M Kuhn; David Haussler; W James Kent. UCSC genome browser tutorial. Genomics, 2008-06-02.

2. Minou Bina. Gene regulation. Methods Mol Biol, 2013.

3. Minou Bina; Phillip Wyss. Impact of the MLL1 morphemes on codon utilization and preservation in CpG islands. Biopolymers, 2015-09.

4. Minou Bina. Imprinted control regions include composite DNA elements consisting of the ZFP57 binding site overlapping MLL1 morphemes. Genomics, 2017-05-02.

5. Zhongming Zhao; Fengkai Zhang. Sequence context analysis in the mouse genome: single nucleotide polymorphisms and CpG island sequences. Genomics, 2005-11-28.

6. Gerd A Blobel; Stephan Kadauke; Eric Wang; Alan W Lau; Johannes Zuber; Margaret M Chou; Christopher R Vakoc. A reconfigured pattern of MLL occupancy within mitotic chromatin promotes rapid transcriptional reactivation following mitotic exit. Mol Cell, 2009-12-25.

7. Rachel J Smith; Philippe Arnaud; Galia Konfortova; Wendy L Dean; Colin V Beechey; Gavin Kelsey. The mouse Zac1 locus: basis for imprinting and comparison with human ZAC. Gene, 2002-06-12.

8. Simon Quenneville; Gaetano Verde; Andrea Corsinotti; Adamandia Kapopoulou; Johan Jakobsson; Sandra Offner; Ilaria Baglivo; Paolo V Pedone; Giovanna Grimaldi; Andrea Riccio; Didier Trono. In embryonic stem cells, ZFP57/KAP1 recognize a methylated hexanucleotide to affect chromatin and DNA methylation of imprinting control regions. Mol Cell, 2011-11-04.

9. Ruslan Strogantsev; Felix Krueger; Kazuki Yamazawa; Hui Shi; Poppy Gould; Megan Goldman-Roberts; Kirsten McEwen; Bowen Sun; Roger Pedersen; Anne C Ferguson-Smith. Allele-specific binding of ZFP57 in the epigenetic regulation of imprinted and non-imprinted monoallelic expression. Genome Biol, 2015-05-30.

10. Minou Bina; Phillip Wyss; Elise Novorolsky; Noorfatin Zulkelfi; Jing Xue; Randi Price; Matthew Fay; Zach Gutmann; Brian Fogler; Daidong Wang. Discovery of MLL1 binding units, their localization to CpG Islands, and their potential function in mitotic chromatin. BMC Genomics, 2013-12-28.

