Yan Zheng, Hong Li, Yue Wang, Hu Meng, Qiang Zhang, Xiaoqing Zhao.
Abstract
The rules governing non-random k-mer usage, and the biological functions behind them, deserve special attention. First, we studied human 8-mer spectra and found that, when the 8-mers were classified into three subsets under each of the 16 dinucleotide classifications, only the spectra under the cytosine-guanine (CG) dinucleotide classification formed independent unimodal distributions. Second, the same distribution rules were reproduced in seven other species, including yeast, indicating that this evolutionary phenomenon is universal across species. We therefore proposed two theoretical conjectures: (1) CG1 motifs (8-mers containing one CG) are the nucleosome-binding motifs; (2) CG2 motifs (8-mers containing two or more CGs) are the modular units of CpG islands. The conjectures were confirmed in yeast by the following results: when nucleosome core sequences and linker sequences were distinguished using the three CG subsets, the maximum average area under the receiver operating characteristic curve (AUC) was obtained from CG1 information; there was a one-to-one relationship between regions of abundant CG1 signal and histone positions; sequence changes in squeezed nucleosomes were related to the strength of the CG1 signals; and when CpG islands and non-CpG islands were distinguished using the three CG subsets, an AUC of 0.986 was obtained from CG2 information.
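The classification described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes that the three subsets are defined by the number of CG dinucleotides in an 8-mer (none, exactly one, two or more), that a "spectrum" is the number of distinct 8-mers observed at each occurrence frequency, and that 8-mers are counted on a single strand with a sliding window. The label `CG0` and all function names are hypothetical; the abstract names only CG1 and CG2.

```python
from collections import Counter


def count_dinucleotide(kmer: str, dinuc: str = "CG") -> int:
    """Count positions in `kmer` where `dinuc` starts (overlaps allowed)."""
    return sum(1 for i in range(len(kmer) - 1) if kmer[i:i + 2] == dinuc)


def cg_subset(kmer: str, dinuc: str = "CG") -> str:
    """Assign a k-mer to one of three subsets by its dinucleotide count.

    "CG0" is an assumed label for k-mers without the dinucleotide;
    "CG1" and "CG2" follow the definitions given in the abstract.
    """
    n = count_dinucleotide(kmer, dinuc)
    if n == 0:
        return "CG0"
    if n == 1:
        return "CG1"
    return "CG2"


def kmer_counts(sequence: str, k: int = 8) -> Counter:
    """Count every k-mer in a sliding window, skipping windows with non-ACGT symbols."""
    seq = sequence.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    return counts


def spectra_by_subset(counts: Counter, dinuc: str = "CG") -> dict:
    """For each subset, map occurrence frequency -> number of distinct k-mers."""
    spectra = {"CG0": Counter(), "CG1": Counter(), "CG2": Counter()}
    for kmer, freq in counts.items():
        spectra[cg_subset(kmer, dinuc)][freq] += 1
    return spectra


if __name__ == "__main__":
    toy = "ACGTACGTACGT"  # toy sequence, not real genomic data
    for name, spectrum in spectra_by_subset(kmer_counts(toy)).items():
        print(name, dict(spectrum))
```

Passing a different value for `dinuc` (for example `"TA"`) gives the analogous three-way split under another of the 16 dinucleotide classifications. For dinucleotides such as `"AA"`, overlapping occurrences are counted separately in this sketch, which is also an assumption.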
Keywords: 8-mer biological functions; CGIs; Nucleosome binding motifs; Rules of 8-mer usage; Theoretical conjectures; Validation
Year: 2017 PMID: 28181048 DOI: 10.1007/s10577-017-9554-z
Source DB: PubMed Journal: Chromosome Res ISSN: 0967-3849 Impact factor: 5.239