Yun Jia, Hong Li, Jingfeng Wang, Hu Meng, Zhenhua Yang.
Abstract
The spectra of k-mer frequencies can reveal the structure and evolution of genome sequences. We confirmed that the trimodal 8-mer spectrum of human genome sequences is distinguished solely by the CG2, CG1 and CG0 8-mer sets, which contain 2, 1 or 0 CpG dinucleotides, respectively. This phenomenon is termed the independent selection law. The three types of CG 8-mers were considered to be distinct functional elements. We conjectured that (1) nucleosome binding motifs are mainly characterized by CG1 8-mers and (2) the core structural units of CpG island (CGI) sequences are predominantly characterized by CG2 8-mers. To test these conjectures, nucleosome-occupied sequences and CGI sequences were extracted, and sequence parameters were then constructed from each of the three CG 8-mer sets separately. ROC analysis showed that CG1 8-mers are preferentially found in nucleosome-occupied segments (AUC > 0.7) and CG2 8-mers in CGI sequences (AUC > 0.99). This validates our conjectures in principle.
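The abstract does not specify how its sequence parameters are built, so the following is only a minimal Python sketch, under stated assumptions, of the two steps it describes: sorting 8-mers into the CG0/CG1/CG2 sets by CpG count, and scoring sequences against one set with ROC analysis. Note that an 8-mer can hold up to four CpG sites (e.g. CGCGCGCG); lumping 8-mers with three or four CpGs into CG2 is an assumption made here, not something the abstract states. The score `cg_fraction` (the share of a sequence's 8-mers that fall in one class) is a hypothetical stand-in for the authors' parameters, and all function names are illustrative.

```python
from collections import Counter
from itertools import product

K = 8  # k-mer length used in the abstract
CLASSES = ("CG0", "CG1", "CG2")


def cpg_count(kmer: str) -> int:
    """Number of CpG dinucleotides (a C immediately followed by a G) in a k-mer."""
    return sum(1 for i in range(len(kmer) - 1) if kmer[i:i + 2] == "CG")


def cg_class(kmer: str) -> str:
    """Assign a k-mer to CG0, CG1 or CG2.

    Assumption: k-mers with 3 or 4 CpGs are lumped into CG2.
    """
    return f"CG{min(cpg_count(kmer), 2)}"


def kmer_counts(seq: str, k: int = K) -> Counter:
    """Count overlapping k-mers, skipping windows with non-ACGT bases (e.g. N)."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if set(window) <= set("ACGT"):
            counts[window] += 1
    return counts


def spectrum_by_class(counts: Counter, k: int = K) -> dict:
    """Per-class frequency spectrum: class -> Counter{occurrence count: number of distinct k-mers}.

    All 4**k possible k-mers are enumerated, so absent k-mers contribute to count 0.
    """
    spec = {c: Counter() for c in CLASSES}
    for letters in product("ACGT", repeat=k):
        kmer = "".join(letters)
        spec[cg_class(kmer)][counts.get(kmer, 0)] += 1
    return spec


def cg_fraction(seq: str, cls: str, k: int = K) -> float:
    """Hypothetical sequence parameter: share of a sequence's k-mers that belong to class `cls`."""
    counts = kmer_counts(seq, k)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(n for kmer, n in counts.items() if cg_class(kmer) == cls) / total


def roc_auc(pos_scores, neg_scores) -> float:
    """ROC AUC as the probability that a random positive outscores a random negative (ties = 1/2)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


if __name__ == "__main__":
    # Toy sequences only; real inputs would be nucleosome-occupied or CGI sequences vs. controls.
    positives = ["CGCGACGTCGGCGC", "ACGCGTCGACGCGT"]
    negatives = ["ATATTTAAATTTAAAT", "ACGTTTTAAATTTAAG"]
    pos = [cg_fraction(s, "CG2") for s in positives]
    neg = [cg_fraction(s, "CG2") for s in negatives]
    print("AUC:", roc_auc(pos, neg))
```

In this sketch, testing conjecture (1) would mean scoring nucleosome-occupied segments and a control set with `cg_fraction(seq, "CG1")` and passing both score lists to `roc_auc`; conjecture (2) would use `"CG2"` with CGI sequences. The pairwise AUC loop is quadratic in the number of sequences, so for large sets a rank-based implementation such as scikit-learn's `roc_auc_score` is the more practical choice.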
Keywords: 8-Mer spectrum; CGI sequences; Functions of CG 8-mers; Human genome; Independent selection law; Nucleosome occupied segments
Year: 2018 PMID: 29522801 DOI: 10.1016/j.ygeno.2018.03.006
Source DB: PubMed Journal: Genomics ISSN: 0888-7543 Impact factor: 5.736