Mb-level CpG and TFBS islands visualized by AI and their roles in the nuclear organization of the human genome.

Kennosuke Wada, Yoshiko Wada, Toshimichi Ikemura.

Abstract

Unsupervised machine learning that can discover novel knowledge from big sequence data without prior knowledge or particular models is highly desirable for current genome study. We previously established a batch-learning self-organizing map (BLSOM) for oligonucleotide compositions, which can reveal various novel genome characteristics from big sequence data, and found that transcription factor binding sequences (TFBSs) and CpG-containing oligonucleotides are enriched in human centromeric and pericentromeric regions, which support centromere clustering and form the condensed heterochromatin "chromocenter" in interphase nuclei. The number and size of chromocenters, as well as the type of centromeres gathered in individual chromocenters, vary depending on cell type. To study molecular mechanisms of cell type-dependent chromocenter formation, we analyzed distribution patterns of occurrence per Mb of hexa- and heptanucleotide TFBSs, which have been compiled by the SwissRegulon Portal, and of CpG-containing oligonucleotides. We found Mb-level islands enriched for TFBSs and CpG-containing oligonucleotides in centromeric and pericentromeric regions on all human chromosomes except chrY. Considering molecular mechanisms for cell type-dependent centromere clustering, the chromosome-dependent enrichment of a set of TFBSs and CpG-containing oligonucleotides is of particular interest, since the cellular content of TFs and methyl-CpG-binding proteins exhibits cell type-dependent regulation. A newly introduced BLSOM, which analyzed occurrences of a total of 3,946 octanucleotide TFBSs compiled by the SwissRegulon Portal, has self-organized (separated) the sequences that are characteristically enriched in TFBSs and shown that these sequences are derived primarily from centromeric and pericentromeric constitutive heterochromatin regions. Furthermore, the BLSOM identified and visualized characteristic TFBSs that are enriched in these regions. 
By analyzing Hi-C data for interchromosomal interactions, the present study showed that the chromatin segments supporting these interactions are located primarily in Mb-level TFBS and CpG islands and are thus enriched for a wide variety of TFBSs and CpG-containing oligonucleotides.
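
The per-Mb occurrence counting mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name, the toy sequence, and the motif list are hypothetical placeholders (no real SwissRegulon entries are used), a motif is assigned to the window containing its start position, and only the given strand is scanned. For genome-scale data the window size would be 1,000,000 bases rather than the toy value used here.

```python
from collections import Counter


def count_motifs_per_window(sequence, motifs, window_size):
    """Count overlapping motif occurrences in consecutive, non-overlapping windows.

    Assumption: an occurrence belongs to the window that contains its first base.
    Returns one Counter per window, keyed by motif.
    """
    sequence = sequence.upper()
    motif_set = {m.upper() for m in motifs}
    lengths = sorted({len(m) for m in motif_set})
    n_windows = (len(sequence) + window_size - 1) // window_size
    counts = [Counter() for _ in range(n_windows)]
    for k in lengths:
        # Slide a k-base frame along the sequence and tally matching words.
        for start in range(len(sequence) - k + 1):
            word = sequence[start:start + k]
            if word in motif_set:
                counts[start // window_size][word] += 1
    return counts


# Toy example (hypothetical data): two 6-base windows, two placeholder motifs.
toy_counts = count_motifs_per_window("CGCGAATTCGCG", ["CG", "AATT"], window_size=6)
for index, window in enumerate(toy_counts):
    print(index, dict(window))
```

Windows whose counts for many motifs are jointly high would be candidates for the Mb-level islands the abstract describes; how such windows are thresholded or clustered is left open here.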

Keywords:  Hi-C; Self-Organizing Map; big data; oligonucleotide composition; unsupervised machine learning

Year: 2020   PMID: 32161227   DOI: 10.1266/ggs.19-00027

Source DB: PubMed   Journal: Genes Genet Syst   ISSN: 1341-7568   Impact factor: 1.517


  3 in total

1.  Comparative genomic analysis of the human genome and six bat genomes using unsupervised machine learning: Mb-level CpG and TFBS islands.

Authors:  Yuki Iwasaki; Toshimichi Ikemura; Kennosuke Wada; Yoshiko Wada; Takashi Abe
Journal: BMC Genomics   Date: 2022-07-08   Impact factor: 4.547

2.  Implication of a new function of human tDNAs in chromatin organization.

Authors:  Yuki Iwasaki; Toshimichi Ikemura; Ken Kurokawa; Norihiro Okada
Journal: Sci Rep   Date: 2020-10-15   Impact factor: 4.379

3.  Comparative genomics of Glandirana rugosa using unsupervised AI reveals a high CG frequency.

Authors:  Yukako Katsura; Toshimichi Ikemura; Rei Kajitani; Atsushi Toyoda; Takehiko Itoh; Mitsuaki Ogata; Ikuo Miura; Kennosuke Wada; Yoshiko Wada; Yoko Satta
Journal: Life Sci Alliance   Date: 2021-03-12
