Dirk Holste, Ivo Grosse, Stephan Beirer, Patrick Schieg, Hanspeter Herzel.
Abstract
We study the nucleotide-nucleotide mutual information function I(k) of the DNA sequences of the three completely sequenced human chromosomes 20, 21, and 22. We find in each human chromosome (i) the absence of the k=3 base pair (bp) sequence periodicity characteristic of protein coding regions, (ii) the absence of the k=10-11 bp sequence periodicity characteristic of both protein secondary structure and DNA bendability, and (iii) the presence of significant statistical dependencies at about k=135 bp and at about k=165 bp. We investigate to what degree the density and composition of interspersed repeats might explain these observed statistical patterns in all three human chromosomes. We use simple stochastic models to substitute known interspersed repeats and find by numerical studies that (iv) the presence of interspersed repeats dominates short-range correlations as measured by I(k) on the scale of several hundred base pairs in human chromosomes 20, 21, and 22. On the other hand, we find that (v) interspersed repeats contribute only weakly to long-range correlations due to the clustering of highly abundant Alu repeats.
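The abstract does not spell out how I(k) is computed. As a minimal sketch, assuming the usual definition I(k) = Σ p_ab(k) log2[p_ab(k) / (p_a p_b)], where p_ab(k) is the frequency of finding symbols a and b separated by k positions, a plug-in estimate could look as follows. The function name `mutual_information` and the toy sequence are illustrative assumptions, not taken from the paper, and no finite-sample bias correction is applied.

```python
import math
from collections import Counter

def mutual_information(seq, k):
    """Plug-in estimate of I(k) in bits for symbols k positions apart.

    Hypothetical helper for illustration only; it assumes the standard
    definition of mutual information and applies no bias correction.
    """
    n = len(seq) - k  # number of symbol pairs at distance k
    if k < 1 or n < 1:
        raise ValueError("k must satisfy 1 <= k < len(seq)")
    # Joint counts of (symbol at i, symbol at i + k)
    pairs = Counter(zip(seq[:n], seq[k:]))
    # Marginal counts taken over the same n pairs, so the estimate is >= 0
    left = Counter(seq[:n])
    right = Counter(seq[k:])
    total = 0.0
    for (a, b), count in pairs.items():
        p_ab = count / n
        # p_ab / (p_a * p_b), with p_a = left[a] / n and p_b = right[b] / n
        total += p_ab * math.log2(p_ab * n * n / (left[a] * right[b]))
    return total

if __name__ == "__main__":
    toy = "ACG" * 100  # toy sequence with exact period 3
    for k in (1, 2, 3, 10):
        print(k, round(mutual_information(toy, k), 3))
```

For a perfectly periodic toy sequence every distance carries strong dependence. On real chromosomal DNA one would instead look for peaks in I(k) over a slowly decaying background, such as the dependencies near k=135 bp and k=165 bp reported above.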
Year: 2003 PMID: 16241267 DOI: 10.1103/PhysRevE.67.061913
Source DB: PubMed Journal: Phys Rev E Stat Nonlin Soft Matter Phys ISSN: 1539-3755