Mitsutaka Kadota¹, Kazuaki Yamaguchi¹, Yuichiro Hara¹,², Shigehiro Kuraku³.
Abstract
The nuclear protein CCCTC-binding factor (CTCF) acts as an insulator that contributes to chromatin organization in diverse animals. The gene encoding this protein has a paralog, CTCFL (also called BORIS), which was first identified in mammals as being expressed exclusively in the testis. Because CTCFL orthologs had been reported only among amniotes, CTCFL was once thought to have arisen in the amniote lineage. In this study, we identified elasmobranch CTCFL orthologs and investigated their origin with the aid of a shark genome assembly improved by proximity-guided scaffolding. Our evolutionary interpretation of syntenic gene locations suggested that the gene duplication that gave rise to CTCF and CTCFL occurred earlier than previously thought, around the common ancestor of extant vertebrates. In addition, our transcriptome sequencing revealed testis-biased expression of the catshark CTCFL, suggesting that the testis-specific expression seen in mammals originated more than 400 million years ago. To understand how the long-standing chromatin regulator CTCF consolidated its functions over evolutionary history, the additional CTCF paralogs with spatially restricted expression that persist in some descendant lineages should be taken into consideration.
Year: 2020 PMID: 32884037 PMCID: PMC7471279 DOI: 10.1038/s41598-020-71602-w
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Treemap comparing the continuity of the existing and improved brownbanded bamboo shark genome assemblies. Scaffolds longer than the N50 scaffold length of each assembly are shown as rectangles whose sizes reflect scaffold length. Detailed properties of the individual genome assemblies are given in Table 1.
Table 1. Improvement of the brownbanded bamboo shark genome assembly.
| Metric | Cpunctatum_v1.0 | Cpunctatum_v2.0 | Cpunctatum_v2.1 |
|---|---|---|---|
| N50 scaffold length (Kbp) | 1,963 | 6,171 | 9,192 |
| Max. length (Mbp) | 17.15 | 38.70 | 56.09 |
| Min. length (bp) | 500 | 500 | 500 |
| # scaffolds > 10 Mbp | 14 | 72 | 82 |
| # scaffolds > 1 Mbp | 769 | 584 | 495 |
| # scaffolds > 100 Kbp | 2,797 | 1,372 | 1,253 |
| # scaffolds > 10 Kbp | 6,176 | 3,424 | 3,085 |
| % gaps (‘N’) | 9.83 | 10.05 | 10.39 |
| # (%) of reference orthologs detected as ‘complete’ | 209 (89.70%) | 210 (90.13%) | 208 (89.27%) |
| # (%) of reference orthologs detected as ‘complete’ + ‘fragmented’ | 219 (93.99%) | 221 (94.85%) | 221 (94.85%) |
| # (%) of reference orthologs recognized as ‘missing’ | 14 (6.01%) | 12 (5.15%) | 12 (5.15%) |
Sequences shorter than 500 bp were excluded. Gene space completeness was estimated with BUSCO v3 using CVG (Core Vertebrate Genes), a set of 233 single-copy reference orthologs[36].
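The contiguity and gap metrics in Table 1 follow standard definitions. As a rough illustration only, and not the authors' pipeline, the Python sketch below computes N50, maximum and minimum lengths, scaffold counts above the length thresholds, and the percentage of gap ('N') bases after discarding sequences shorter than 500 bp, and converts BUSCO counts into percentages of the 233 CVG orthologs. The function names, the in-memory dictionary of sequences, and the reading of "> X" as a strict comparison are assumptions.

```python
from typing import Dict, List

# Length thresholds (bp) used for the scaffold counts in Table 1.
THRESHOLDS = (10_000, 100_000, 1_000_000, 10_000_000)


def n50(lengths: List[int]) -> int:
    """Length of the shortest sequence in the smallest set of longest
    sequences whose combined length reaches half of the total."""
    half = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half:
            return length
    return 0  # empty input


def assembly_summary(scaffolds: Dict[str, str], min_len: int = 500) -> Dict[str, float]:
    """Summarize scaffolds, ignoring sequences shorter than min_len (assumed 500 bp)."""
    kept = [seq.upper() for seq in scaffolds.values() if len(seq) >= min_len]
    lengths = [len(seq) for seq in kept]
    total = sum(lengths)
    gaps = sum(seq.count("N") for seq in kept)
    summary: Dict[str, float] = {
        "n50": n50(lengths),
        "max_len": max(lengths, default=0),
        "min_len": min(lengths, default=0),
        "pct_gaps": 100.0 * gaps / total if total else 0.0,
    }
    for t in THRESHOLDS:
        # Assumption: "> X" in Table 1 is read as a strict comparison.
        summary[f"n_over_{t}"] = sum(1 for x in lengths if x > t)
    return summary


def busco_percent(count: int, total: int = 233) -> float:
    """Percentage of the 233 CVG reference orthologs, rounded to 2 decimals."""
    return round(100.0 * count / total, 2)
```

For example, `busco_percent(209)` gives approximately 89.70, the ‘complete’ percentage listed for Cpunctatum_v1.0, and ‘complete’ + ‘fragmented’ plus ‘missing’ adds up to 233 in every column.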
Figure 2. Structural and phylogenetic properties of the shark CTCF homologs. (A) Protein domain structures of the cloudy catshark CTCF and CTCFL compared with their homologs in human (CTCF and CTCFL) and Arctic lamprey (LjCTCF and LjCTCF2). Zinc finger (ZF) domains were identified with the MOTIF Search web server (https://www.genome.jp/tools/motif/). (B) Molecular phylogenetic tree of the CTCF genes and their relatives, inferred with the maximum-likelihood method using 230 aligned amino acid sites. Support values at nodes indicate bootstrap values (maximum-likelihood method) and posterior probabilities (Bayesian inference), in that order. See the “Methods” section for details. (C) Pairwise amino acid sequence alignment of the cloudy catshark CTCF (top) and CTCFL (bottom), generated with MAFFT[28] ver. 7.471 using the iterative refinement method (L-INS-i). Asterisks indicate identical amino acid residues. ZFs (1–11) identified by MOTIF Search are indicated with colored boxes. See Supplementary Fig. S2 for a multiple alignment including more species.
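Panel C marks identical residues with asterisks under an existing pairwise alignment. A minimal Python sketch of that annotation, assuming two already aligned sequences of equal length with '-' as the gap character, is shown below; the sequences are toy fragments rather than the real CTCF/CTCFL sequences, and percent identity is computed over gap-free columns, which is only one of several common definitions. The function names are hypothetical.

```python
def identity_track(top: str, bottom: str) -> str:
    """Mark with '*' each column where both aligned sequences carry the
    same residue; gap columns ('-') are never marked."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have equal length")
    return "".join(
        "*" if a == b and a != "-" else " " for a, b in zip(top, bottom)
    )


def percent_identity(top: str, bottom: str) -> float:
    """Identical residues as a percentage of columns without a gap in
    either sequence (one of several common definitions)."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have equal length")
    ungapped = [(a, b) for a, b in zip(top, bottom) if a != "-" and b != "-"]
    if not ungapped:
        return 0.0
    same = sum(1 for a, b in ungapped if a == b)
    return 100.0 * same / len(ungapped)


if __name__ == "__main__":
    # Toy aligned fragments, not real CTCF/CTCFL sequences.
    top, bottom = "MKV-LE", "MRV-LD"
    print(top)
    print(bottom)
    print(identity_track(top, bottom))
    print(percent_identity(top, bottom))
```

The alignment itself would come from a separate aligner such as MAFFT; this sketch only annotates its output.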
Evaluation of tree topologies with the maximum-likelihood method.
| Rank by lnL | Tree topology | ΔlnL | p (AU)ᵃ | p (KH)ᵇ | p (SH)ᶜ |
|---|---|---|---|---|---|
| 1 | (((Ost,Cho),(Tet-L,Ela-L)),((Lam,Lam-2),Hag),OG) | ML | 0.954 (0.003) | 0.691 (0.005) | 1.000 (0.000) |
| 2 | ((((Ost,Cho),Tet-L),Ela-L),((Lam,Lam-2),Hag),OG) | 1.093172 | 0.837 (0.013) | 0.309 (0.005) | 0.996 (0.001) |
| 3 | (((Ost,Cho),(Tet-L,Ela-L)),(Lam,(Lam-2,Hag)),OG) | 1.118902 | 0.782 (0.013) | 0.237 (0.004) | 0.994 (0.001) |
| 4 | (((Ost,Cho),(Tet-L,Ela-L)),((Lam,Hag),Lam-2),OG) | 1.118911 | 0.685 (0.022) | 0.237 (0.004) | 0.994 (0.001) |
| 5 | ((((Ost,Cho),Ela-L),Tet-L),((Lam,Lam-2),Hag),OG) | 1.566155 | 0.491 (0.020) | 0.246 (0.004) | 0.989 (0.001) |
| 6 | ((((Ost,Cho),Ela-L),Tet-L),((Lam,Lam-2),Hag),OG) | 2.182305 | 0.593 (0.027) | 0.234 (0.004) | 0.988 (0.001) |
| 7 | ((((Ost,Cho),Tet-L),Ela-L),((Lam,Hag),Lam-2),OG) | 2.182311 | 0.593 (0.027) | 0.234 (0.004) | 0.988 (0.001) |
| 8 | (((Ost,Cho),((Lam,Lam-2),Hag)),(Tet-L,Ela-L),OG) | 2.248093 | 0.507 (0.018) | 0.186 (0.004) | 0.983 (0.001) |
| 9 | ((Ost,Cho),((Tet-L,Ela-L),((Lam,Lam-2),Hag)),OG) | 2.248145 | 0.506 (0.018) | 0.186 (0.004) | 0.983 (0.001) |
| 10 | (((((Ost,Cho),(Tet-L,Ela-L)),Hag),Lam),Lam-2,OG) | 2.325971 | 0.739 (0.016) | 0.231 (0.004) | 0.983 (0.001) |
| 82 | (((Ost,Cho),Tet-L),(( | 8.286610 | 0.254 (0.056) | 0.099 (0.003) | 0.847 (0.004) |
| 101 | (((((Ost,Cho),Tet-L),Hag),( | 9.115000 | 0.283 (0.029) | 0.157 (0.004) | 0.829 (0.004) |
| 134 | (((Ost,( | 10.332498 | 0.227 (0.043) | 0.094 (0.003) | 0.799 (0.004) |
| 163 | (((Ost,Cho),Tet-L),(( | 11.212458 | 0.066 (0.027) | 0.069 (0.003) | 0.738 (0.004) |
| 414 | (((Ost,( | 15.082485 | 0.126 (0.016) | 0.015 (0.001) | 0.630 (0.005) |
| 416 | (((( | 15.177589 | 0.004 (0.009) | 0.034 (0.002) | 0.592 (0.005) |
| 417 | (((Ost,( | 15.177689 | 0.004 (0.009) | 0.034 (0.002) | 0.592 (0.005) |
| 3,202 | ((((Ost,( | 37.521879 | 0.020 (0.006) | 0.006 (0.001) | 0.038 (0.002) |
| 3,874 | (((Ost,( | 48.528860 | 0.000 (0.000) | 0.000 (0.000) | 0.005 (0.001) |
Cho, chondrichthyan CTCF; Ost, osteichthyan CTCF; Tet-L, tetrapod CTCFL; Ela-L, elasmobranch CTCFL; Lam, lamprey CTCF; Lam-2, lamprey CTCF2; Hag, hagfish CTCF; OG, outgroup; lnL, log-likelihood; ΔlnL, difference in log-likelihood from the ML tree; SE, standard error of log-likelihood.
ᵃ p value of the approximately unbiased (AU) test[32,37].
ᵇ p value of the Kishino-Hasegawa (KH) test[38].
ᶜ p value of the Shimodaira-Hasegawa (SH) test[39,40]. Standard errors are given in parentheses. The underlined items in the tree topologies refer to the top-ranked tree that supports their proximal clustering.
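ΔlnL and the KH test both compare trees through their site-wise log-likelihoods. The Python sketch below illustrates, for a single pair of trees, ΔlnL and a simplified one-sided KH test under a normal approximation, assuming per-site log-likelihoods have already been computed by a phylogenetics program. The table's p values, which also include values for the ML tree itself, were not necessarily computed this way, and this sketch will not reproduce them. The function names are hypothetical.

```python
import math
from typing import List


def delta_lnl(site_lnl_best: List[float], site_lnl_alt: List[float]) -> float:
    """ΔlnL: total log-likelihood of the ML tree minus that of the alternative tree."""
    return sum(site_lnl_best) - sum(site_lnl_alt)


def kh_test_normal(site_lnl_best: List[float], site_lnl_alt: List[float]) -> float:
    """One-sided p value for the alternative tree under a normal
    approximation of the Kishino-Hasegawa test (simplified sketch)."""
    n = len(site_lnl_best)
    if n != len(site_lnl_alt) or n < 2:
        raise ValueError("need two equal-length site lists with >= 2 sites")
    diffs = [a - b for a, b in zip(site_lnl_best, site_lnl_alt)]
    total = sum(diffs)
    mean = total / n
    # Variance of the summed difference: n times the sample variance per site.
    var_total = n / (n - 1) * sum((d - mean) ** 2 for d in diffs)
    if var_total == 0:
        # Identical site patterns: no evidence against the alternative tree
        # unless the totals differ deterministically.
        return 1.0 if total <= 0 else 0.0
    z = total / math.sqrt(var_total)
    # Upper-tail probability P(Z >= z) of the standard normal distribution.
    return 0.5 * math.erfc(z / math.sqrt(2))
```

A small p value indicates that the alternative tree fits significantly worse than the ML tree; a large one, as for most of the top-ranked topologies in the table, means it cannot be rejected.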
Figure 3. Synteny conservation in the genomic regions containing CTCFL orthologs. (A) Continuity of the bamboo shark genome assembly Cpunctatum_v2.1, improved by Dovetail Chicago scaffolding, in comparison with the earlier version Cpunctatum_v1.0. The ORF sequence of the CTCFL gene was derived from the scaffold ccg_chipu00000311 through manual curation. (B) Conserved synteny involving the CTCFL gene loci among human, softshell turtle, and bamboo shark. Only orthologs confirmed by molecular phylogenetic inference to be shared between the scaffold ccg_chipu00000311 of the bamboo shark genome assembly Cpunctatum_v2.1 and the human chromosome region 20q13 are shown, together with their orthologs on the scaffold JH209331.1 of the softshell turtle assembly PelSin1.0. Orthologous genes are placed at the same vertical level. CTCFL orthologs are shown as orange boxes and PCK1 orthologs (see C) as light green boxes. Black dots indicate scaffold ends. See Supplementary Fig. S3 for a genomic landscape of these species that takes relative distances between genes into account. (C) Molecular phylogenetic tree of the PCK1 gene and its relatives, inferred with the maximum-likelihood method using 616 aligned amino acid sites. Support values at nodes indicate bootstrap values (maximum-likelihood method) and posterior probabilities (Bayesian inference), in that order.
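Panel B shows only genes whose orthology between the shark scaffold and human 20q13 was confirmed. A minimal Python sketch of that filtering step is given below, assuming orthology has already been resolved (in the study, by molecular phylogeny) and is supplied as a dictionary from gene ID to ortholog-group label. All gene IDs and group labels in the example are hypothetical.

```python
from typing import Dict, List, Tuple


def shared_orthologs(
    shark_genes: List[str],
    human_genes: List[str],
    ortholog_group: Dict[str, str],
) -> List[Tuple[str, str, str]]:
    """Return (group, shark_gene, human_gene) for ortholog groups present
    in both regions, listed in shark scaffold order."""
    human_by_group: Dict[str, str] = {}
    for gene in human_genes:
        group = ortholog_group.get(gene)
        if group is not None:
            # Keep the first human gene seen for each group.
            human_by_group.setdefault(group, gene)
    shared = []
    for gene in shark_genes:
        group = ortholog_group.get(gene)
        if group is not None and group in human_by_group:
            shared.append((group, gene, human_by_group[group]))
    return shared


if __name__ == "__main__":
    # Hypothetical gene IDs and ortholog-group labels.
    ortho = {"s1": "OG_CTCFL", "h1": "OG_CTCFL", "s3": "OG_PCK1",
             "h3": "OG_PCK1", "h9": "OG_X", "s2": "OG_Y"}
    print(shared_orthologs(["s1", "s2", "s3"], ["h1", "h3", "h9"], ortho))
```

Genes without a counterpart in the other region (s2 and h9 here) are dropped, mirroring how the figure omits non-shared genes.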
Figure 4. Large-scale chromosomal duplication between the CTCF-associated paralogons in the chicken genome. Diagonal lines indicate the positions of the boxed genes in the chicken genome assembly GRCg6a, and the vertical lines represent entire chromosomes 11, 20, and 2, in that order. Members of the same gene families that were confirmed by molecular phylogenetic inference to be derived from the two rounds of whole-genome duplication (WGD) are aligned at the same vertical levels. Dashed boxes indicate genes (including CTCFL) missing from this genome assembly, probably because of their secondary loss during evolution.
Figure 5. Expression profiles of CTCF and CTCFL in cloudy catshark tissues. Expression levels of the cloudy catshark CTCF and CTCFL in adult tissues and in embryos at different developmental stages were quantified in TPM (transcripts per million) with the program eXpress, using reads mapped to the coding nucleotide sequences of the cloudy catshark (see the “Methods” section). Note that the scales differ between the two genes. Cloudy catshark embryos were staged according to the existing literature[35]. Details of the RNA-seq data used for the analysis are given in Supplementary Table S2. The equivalent expression profiles of the brownbanded bamboo shark CTCF and CTCFL genes are shown in Supplementary Fig. S4.
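TPM normalizes for both transcript length and sequencing depth. The Python sketch below shows the basic calculation, assuming read counts and effective transcript lengths have already been estimated (eXpress estimates these itself, including the assignment of ambiguously mapped reads, which this sketch does not attempt). The function name and the transcript IDs in the example are hypothetical.

```python
from typing import Dict


def tpm(counts: Dict[str, float], eff_lengths: Dict[str, float]) -> Dict[str, float]:
    """Transcripts per million from (estimated) read counts and effective lengths."""
    # Reads per base of effective length for each transcript.
    rates = {
        t: counts[t] / eff_lengths[t]
        for t in counts
        if eff_lengths.get(t, 0) > 0
    }
    total = sum(rates.values())
    if total == 0:
        return {t: 0.0 for t in rates}
    # Rescale so that the values over all transcripts sum to one million.
    return {t: 1e6 * r / total for t, r in rates.items()}


if __name__ == "__main__":
    # Hypothetical counts and effective lengths (bp).
    print(tpm({"tx1": 100, "tx2": 50}, {"tx1": 2000, "tx2": 1000}))
```

Because the values within a sample always sum to one million, TPM is comparable across transcripts in one sample; it does not by itself make genes with very different expression ranges, such as CTCF and CTCFL here, share a common plotting scale.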
Figure 6. Evolutionary scenario of the CTCF and CTCFL genes. Timings of gene duplication and loss are indicated with dashed arrows. The numbers in the colored boxes on the right give the number of genes in each genome, and the symbol ‘X’ indicates absence of the gene from the currently available genome assembly. The letter ‘T’ in the box of a CTCFL gene indicates testis-specific expression.