Mitsutaka Kadota, Yuichiro Hara, Kaori Tanaka, Wataru Takagi, Chiharu Tanegashima, Osamu Nishimura, Shigehiro Kuraku.
Abstract
The nuclear protein CCCTC-binding factor (CTCF) contributes as an insulator to chromatin organization in animal genomes. Currently, our knowledge of its binding properties is confined mainly to mammals. In this study, we identified CTCF homologs in extant jawless fishes and performed ChIP-seq for the CTCF protein in the Arctic lamprey. Our phylogenetic analysis suggests that the lamprey lineage experienced a gene duplication that gave rise to its unique paralog, designated CTCF2, which is independent of the previously recognized duplication between CTCF and CTCFL. The ChIP-seq analysis detected comparable numbers of CTCF binding sites in lamprey, chicken, and human, and revealed that the lamprey CTCF protein binds to the two-part motif, consisting of core and upstream motifs, previously reported for mammals. These findings suggest that this mode of CTCF binding was established in the last common ancestor of extant vertebrates (more than 500 million years ago). We analyzed CTCF binding inside Hox clusters, which revealed a reinforcement of CTCF binding in the region spanning Hox1-4 genes that is unique to lamprey. Our study provides not only biological insights into the antiquity of CTCF-based epigenomic regulation known in mammals but also a technical basis for comparative epigenomic studies encompassing the whole taxon Vertebrata.
Year: 2017 PMID: 28694486 PMCID: PMC5504073 DOI: 10.1038/s41598-017-04506-x
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Properties of L. camtschaticum CTCF and CTCF2 genes. (a) Protein domain structures, in comparison with human CTCF and CTCFL. The Zn finger domains (ZF) were identified by MOTIF Search (http://www.genome.jp/tools/motif/). See SI Materials and Methods for detailed procedures of LjCTCF and LjCTCF2 sequence determination by cDNA cloning. (b) Expression levels of LjCTCF (white) and LjCTCF2 (black) in embryos and adult tissues. FPKM, fragments per kilobase of exon per million mapped sequence reads. (c) Whole-mount in situ hybridization on stage 26.5 embryos. Riboprobes were designed in non-conserved regions downstream of the Zn finger domains to avoid cross-hybridization (for the magnified view and the result with upstream riboprobes, see Supplementary Fig. S10). Note that the difference in the intensities of the expression signals between the two genes does not correspond to the difference in actual expression levels indicated by our RNA-seq data in (b). Scale bars: 500 μm. (d) Molecular phylogenetic tree. This tree was inferred with the maximum-likelihood approach using 208 aligned amino acid sites (see Supplementary Tables S8 and S9 for the list of sequences used). Two stickleback sequences are included (upper, Ensembl ENSGACP00000003270; lower, ENSGACP00000020939). At each branch node in the tree, bootstrap values are shown only when they are no less than 60, together with the posterior probabilities inferred with the Bayesian approach.
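The FPKM unit defined in panel (b) can be illustrated with a minimal sketch. The function name, argument names, and the toy numbers are assumptions made for illustration; they are not taken from the authors' RNA-seq pipeline, which is not described in this record.

```python
def fpkm(fragments: int, exon_length_bp: int, total_mapped_fragments: int) -> float:
    """Fragments per kilobase of exon per million mapped fragments.

    All three arguments are hypothetical inputs: the fragment count for one
    gene, the summed exon length of that gene in bp, and the total number of
    mapped fragments in the library.
    """
    if exon_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("exon length and library size must be positive")
    kilobases = exon_length_bp / 1_000            # normalize by transcript length
    millions = total_mapped_fragments / 1_000_000  # normalize by sequencing depth
    return fragments / kilobases / millions


# Toy numbers only: 500 fragments on 2,000 bp of exon, 10 million mapped fragments.
print(fpkm(500, 2_000, 10_000_000))  # 25.0
```

Because the value is normalized by both gene length and library size, it allows the kind of between-gene comparison made in panel (b), which signal intensity in panel (c) does not.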
Tree topology test for the phylogenetic relationship between CTCFL and CTCF2.
| Rank by log-likelihood (lnL) | Tree topology* | ΔlnL | pAU† | pKH‡ |
|---|---|---|---|---|
| 1 | ((CTCFL,Gn),((La-CTCF2,La),Hf),OG); | ML | 0.906 | 0.726 |
| 2 | ((Gn,((La,La-CTCF2),Hf)),CTCFL,OG); | 0.9 | 0.501 | 0.274 |
| 28 | ((Gn,(CTCFL,La-CTCF2)),(La,Hf),OG); | 10.3 | 1.00 × 10⁻⁵ | 0.034 |
| 29 | (Gn,((CTCFL,La-CTCF2),(La,Hf)),OG); | 10.3 | 5.00 × 10⁻⁶ | 0.034 |
| 31 | ((Gn,(La,Hf)),(CTCFL,La-CTCF2),OG); | 10.3 | 3.00 × 10⁻⁶ | 0.034 |
| 39§ | (((Gn,(CTCFL,La-CTCF2)),Hf),La,OG); | 15.6 | 0.024 | 0.035 |
*Gn, gnathostome CTCF; La, lamprey CTCF; Hf, hagfish CTCF; OG, outgroup CTCF.
†pAU, p-value of the approximately unbiased test[47].
‡pKH, p-value of the Kishino-Hasegawa test[48].
§The phylogenetic tree supporting the CTCFL-CTCF2 monophyly with the largest pAU among all the tree topologies examined.
Abbreviation: ML, maximum likelihood tree.
Tree topology test for the duplication timing between CTCF and CTCFL
| Rank by log-likelihood (lnL) | Tree topology* | ΔlnL | pAU | pKH |
|---|---|---|---|---|
| 1 | (((((Amn,Sa),Amp),Te),Ch),CTCFL,Cy); | ML | 0.924 | 0.499 |
| 2 | ((((Amn,(Sa,Amp)),Te),Ch),CTCFL,Cy); | 0 | 0.925 | 0.501 |
| 3 | (((((Amn,Amp),Sa),Te),Ch),CTCFL,Cy); | 0 | 0.925 | 0.501 |
| 4 | ((((Amn,Amp),Sa),(Te,Ch)),CTCFL,Cy); | 1.6 | 0.667 | 0.22 |
| 5 | ((((Amn,Sa),Amp),(Te,Ch)),CTCFL,Cy); | 1.6 | 0.531 | 0.22 |
| 6 | (((Amn,(Amp,Sa)),(Te,Ch)),CTCFL,Cy); | 1.6 | 0.527 | 0.22 |
| 7 | (((((Amn,Amp),Sa),Ch),Te),CTCFL,Cy); | 1.6 | 0.376 | 0.212 |
| 8 | ((((Amn,(Amp,Sa)),Ch),Te),CTCFL,Cy); | 1.6 | 0.378 | 0.212 |
| 9 | (((((Amn,Sa),Amp),Ch),Te),CTCFL,Cy); | 1.6 | 0.375 | 0.212 |
| 10 | (((Amn,(Sa,(Ch,Te))),Amp),CTCFL,Cy); | 1.9 | 0.55 | 0.267 |
| 106 | (((((Amn,Amp),Sa),Te),CTCFL),Ch,Cy); | 6.7 | 0.513 | 0.139 |
| 111 | (((((Amn,Amp),Sa),CTCFL),Te),Ch,Cy); | 7.3 | 0.496 | 0.134 |
| 122† | ((((Amn,CTCFL),(Amp,Sa)),Te),Ch,Cy); | 7.9 | 0.299 | 0.124 |
| 130 | (((((Amn,CTCFL),Amp),Sa),Te),Ch,Cy); | 8.8 | 0.02 | 0.095 |
| 139 | (((((Amn,Amp),CTCFL),Sa),Te),Ch,Cy); | 9.1 | 0.028 | 0.091 |
| 327‡ | (((Amn,Amp),CTCFL),(Te,Ch),(Sa,Cy)); | 12.2 | 0.069 | 0.046 |
*Amn, amniote CTCF; Amp, amphibian CTCF; Sa, sarcopterygian CTCF; Te, teleost CTCF; Ch, chondrichthyan CTCF; Cy, cyclostome CTCF.
†The phylogenetic tree supporting an amniote CTCF-CTCFL monophyly with the highest pAU.
‡The phylogenetic tree supporting a tetrapod CTCF-CTCFL monophyly with the highest pAU.
Abbreviation: ML, maximum likelihood tree.
Figure 2. Identification of CTCF proteins. (a) Western blotting of CTCF proteins, using an anti-CTCF antibody. Protein extracts from human GM12878 cells, stage 32 chicken embryo, adult lamprey liver, and stage 27 lamprey embryos were used for the analysis. β-actin (ACTB) or histone H3 was used as a loading control protein. (b) Immunoprecipitation. Silver-stained SDS-PAGE gel of immunoprecipitated (IP) proteins showing chicken CTCF protein of approximately 140 kDa and lamprey CTCF protein of >250 kDa. Detailed procedures of western blotting and immunoprecipitation are described in Supplementary Materials and Methods. Note that the band positions may not be accurate, possibly because of posttranslational modification.
Figure 3. ChIP-seq peaks and binding motifs in lamprey, chicken, and human cells. (a) ChIP-seq results for lamprey and chicken CTCF. Shown are the lamprey Hox α cluster and chicken Hox B cluster. (b) Numbers of peaks identified by MACS2. The “consensus peaks” (black bar) were identified by taking the intersection of peaks called in each replicate and in the merged replicates (see Materials and Methods for details about peak selection). (c) Overlap of consensus peaks in the lamprey embryo and adult liver tissue. (d) CTCF core and upstream motifs identified by MEME. The motifs were identified in two parts and are almost identical between the different species analyzed (also see Supplementary Fig. S11). (e) Peaks in various fold enrichment ranges harboring the core motif and core + upstream motifs. Numbers of ChIP-seq peaks are shown with white bars. Proportions of peaks containing a core motif (▪) and those containing a core + upstream motif (▴) are indicated as lines.
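The consensus-peak selection mentioned in panel (b) can be sketched as follows. This is only one plausible reading, assuming that a merged-replicate peak is kept when it overlaps at least one peak in every individual replicate; the exact criteria are given in the paper's Materials and Methods, not here. The names `Peak`, `overlaps`, and `consensus_peaks`, and all coordinates, are hypothetical.

```python
from typing import List, Tuple

# (scaffold or chromosome, start, end), half-open coordinates; illustrative type.
Peak = Tuple[str, int, int]


def overlaps(a: Peak, b: Peak) -> bool:
    """True if two intervals lie on the same sequence and share at least 1 bp."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def consensus_peaks(merged: List[Peak], replicates: List[List[Peak]]) -> List[Peak]:
    """Keep merged-replicate peaks supported by a peak in every replicate.

    Assumed rule for illustration only, not the authors' documented procedure.
    """
    return [
        peak
        for peak in merged
        if all(any(overlaps(peak, rep_peak) for rep_peak in rep) for rep in replicates)
    ]


# Toy intervals, not real data.
merged = [("scaf1", 100, 300), ("scaf1", 500, 700), ("scaf2", 50, 150)]
rep1 = [("scaf1", 120, 280), ("scaf1", 520, 690)]
rep2 = [("scaf1", 90, 310), ("scaf2", 60, 140)]
print(consensus_peaks(merged, [rep1, rep2]))  # [('scaf1', 100, 300)]
```

The quadratic scan is adequate for a toy example; for genome-scale peak sets, sorted intervals or an interval tree would be the usual choice.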
Association of CTCF binding with repetitive elements.
| Repeat ID* | Repeat class/subclass* | Number of regions: whole genome† | Number of regions: ChIP-seq peak associated‡ | Number of regions: control§ | Odds ratio | p-value¶ |
|---|---|---|---|---|---|---|
| 743 | LTR/Gypsy | 166 | 96 | 2 | 58.04 | 1.87E-29 |
| 274 | DNA/hAT-Tip100 | 586 | 489 | 12 | 49.73 | 9.83E-147 |
| 771 | DNA/hAT-Charlie | 145 | 59 | 2 | 35.69 | 2.47E-17 |
| 738 | DNA/hAT-Charlie | 187 | 92 | 4 | 27.82 | 7.92E-26 |
| 697 | LTR/Gypsy | 359 | 137 | 7 | 23.70 | 6.85E-37 |
| 239 | Unknown | 1509 | 470 | 26 | 22.06 | 2.26E-124 |
| 749 | DNA/hAT-Charlie | 198 | 45 | 3 | 18.13 | 8.70E-12 |
| 828 | SINE/tRNA-V | 602 | 59 | 4 | 17.83 | 2.65E-15 |
| 337 | Unknown | 1553 | 345 | 24 | 17.50 | 9.98E-87 |
*See Materials and Methods for repeat identification and (sub)class assignment.
†Total number of individual repeat regions in the whole genome identified by RepeatMasker.
‡Number of repeat regions overlapping with ChIP-seq peaks (summit ± 50 bp).
§Number of repeat regions overlapping with control regions (50,000 regions of 100 bp each).
¶p-values from Fisher’s exact test, subjected to multiple-testing correction with the Q-value package in R.
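The odds ratio and Fisher's exact test behind this table can be illustrated with a minimal standard-library sketch. Several things here are assumptions: the 2 × 2 layout (peak windows versus control windows, overlapping versus not overlapping a given repeat), the one-sided alternative, and above all `N_PEAKS_PLACEHOLDER`, because the total number of peak windows is not stated in this record. The sketch also omits the multiple-testing correction described in the footnote, so it is not expected to reproduce the table's values.

```python
import math


def odds_ratio(peak_hits: int, n_peaks: int, control_hits: int, n_controls: int) -> float:
    """Odds of overlapping a repeat in peak windows relative to control windows."""
    a, b = peak_hits, n_peaks - peak_hits
    c, d = control_hits, n_controls - control_hits
    if b * c == 0:
        return math.inf
    return (a * d) / (b * c)


def _log_comb(n: int, k: int) -> float:
    """Natural log of the binomial coefficient C(n, k)."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def fisher_enrichment_p(peak_hits: int, n_peaks: int, control_hits: int, n_controls: int) -> float:
    """One-sided Fisher's exact p-value for enrichment in peaks (uncorrected)."""
    total = n_peaks + n_controls
    hits = peak_hits + control_hits
    log_denom = _log_comb(total, n_peaks)
    p = 0.0
    # Sum hypergeometric probabilities for tables at least as extreme as observed.
    for k in range(peak_hits, min(hits, n_peaks) + 1):
        if n_peaks - k > total - hits:
            continue  # impossible table: not enough non-overlapping windows
        p += math.exp(_log_comb(hits, k) + _log_comb(total - hits, n_peaks - k) - log_denom)
    return min(p, 1.0)


# Counts 96 and 2 are from the first table row; 50,000 controls is from the footnote.
N_PEAKS_PLACEHOLDER = 40_000  # hypothetical: the real peak total is not given here
print(odds_ratio(96, N_PEAKS_PLACEHOLDER, 2, 50_000))
print(fisher_enrichment_p(96, N_PEAKS_PLACEHOLDER, 2, 50_000))
```

Working in log space with `math.lgamma` avoids overflow for counts in the tens of thousands; a statistics library would normally be used in practice.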
Figure 4. Distribution of CTCF binding sites in Hox clusters. Coding regions of genes are indicated with gray boxes. CTCF binding sites are indicated with green arrowheads and bars. Arrowheads indicate the orientations of core motifs inferred by the FIMO program. Green bars indicate CTCF binding sites without a core motif. CTCF binding sites of the Arctic lamprey that overlap repeats are indicated with asterisks (see Results for details). Arrowheads in dashed boxes represent shared relative positions of CTCF binding sites between multiple Hox clusters. This figure includes only “significant peaks” as defined in Materials and Methods. See Supplementary Fig. S6 for an equivalent scheme in mouse, dog, and opossum, and Supplementary Fig. S7 for detailed locations of all the peaks in Hox clusters. For the Arctic lamprey, we analyzed only the Hox α-ε clusters that were identified in continuous sequences harboring multiple Hox genes.
Figure 5. Possible scenarios of the establishment of CTCF binding patterns in vertebrate Hox clusters. Alternative evolutionary scenarios of CTCF gene duplication and CTCF binding patterns in Hox clusters are depicted. CTCF-based Hox regulation spanning Hox1-4 genes was established either in the ancestor of bilaterians (Scenario A) or in the cyclostome lineage (Scenario B).