Elena Fimmel, Markus Gumbel, Ali Karpuzoglu, Sergey Petoukhov.
Abstract
Revealing the compositional principles that organize long DNA sequences is one of the crucial tasks in the study of biosystems. This paper analyzes compositional differences between real DNA sequences and similar sequences randomly generated by a Markov-like procedure. Among other things, we formulate a generalization of Chargaff's second rule and verify it empirically on DNA sequences of five model organisms taken from GenBank. We then apply the same frequency analysis to the simulated sequences. Comparing the real and random sequences reveals significant similarities on the one hand and essential differences on the other; both are described. The significance and possible origin of these differences are discussed, including from the viewpoint of maximum informativeness of genetic texts. The paper also addresses the question of what counts as a "long" DNA sequence and quantifies the choice of length: the standard deviations of the relative frequencies of bases stabilize from a length of approximately 100 000 bases, whereas at a length of approximately 25 000 bases they are about three times as large.
Keywords: Chargaff's parity rules; DNA; Genetic information
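The three measurements named in the abstract can be illustrated with a small sketch. It is not the authors' code, and it rests on three labeled assumptions. First, the "Markov-like" generator is taken to be a first-order Markov chain whose transition weights are dinucleotide counts from a template sequence plus a pseudocount of 1. Second, Chargaff's second rule is generalized here in the commonly used way: within one strand, each k-mer occurs about as often as its reverse complement. This is not necessarily the generalization formulated in the paper. Third, the standard deviation of a base's relative frequency is computed across non-overlapping windows of a fixed length. All function names are hypothetical, and the input is assumed to contain only A, C, G and T.

```python
import random
from collections import Counter
from itertools import accumulate, product
from statistics import pstdev

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string over the alphabet ACGT."""
    return seq.translate(COMPLEMENT)[::-1]


def kmer_frequencies(seq: str, k: int) -> dict[str, float]:
    """Relative frequencies of all overlapping k-mers (seq assumed ACGT-only)."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {"".join(p): counts["".join(p)] / total for p in product(BASES, repeat=k)}


def max_parity_deviation(seq: str, k: int) -> float:
    """Largest |f(w) - f(revcomp(w))| over all k-mers w; 0 means perfect parity."""
    freqs = kmer_frequencies(seq, k)
    return max(abs(f - freqs[reverse_complement(w)]) for w, f in freqs.items())


def markov_like_sequence(template: str, length: int, seed: int = 0) -> str:
    """Random sequence from a first-order Markov chain fitted to template.

    Assumption: transition weights are dinucleotide counts plus 1 (Laplace
    pseudocount), so every transition stays possible.
    """
    rng = random.Random(seed)
    pair_counts = Counter(zip(template, template[1:]))
    cum = {a: list(accumulate(pair_counts[(a, b)] + 1 for b in BASES)) for a in BASES}
    current = rng.choice(template)  # start state drawn from the template's bases
    out = [current]
    for _ in range(length - 1):
        current = rng.choices(BASES, cum_weights=cum[current])[0]
        out.append(current)
    return "".join(out)


def window_frequency_sd(seq: str, window: int, base: str = "A") -> float:
    """Population SD of base's relative frequency across non-overlapping windows."""
    freqs = [seq[i:i + window].count(base) / window
             for i in range(0, len(seq) - window + 1, window)]
    return pstdev(freqs)


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical stand-in for a real GenBank sequence; substitute actual data.
    template = "".join(rng.choice(BASES) for _ in range(400_000))
    simulated = markov_like_sequence(template, 400_000, seed=2)
    for k in (1, 2, 3):
        print("k =", k, "max parity deviation:", round(max_parity_deviation(simulated, k), 4))
    for w in (25_000, 100_000):
        print("window =", w, "SD of f(A):", round(window_frequency_sd(simulated, w), 5))
```

A real analysis would run on actual genomic sequences and use many more windows than this toy input provides. For comparison, if bases were drawn independently with probability p, the SD of a relative frequency over L bases would be sqrt(p(1 - p)/L). Under that model, the SD only doubles when L drops from 100 000 to 25 000.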
Year: 2019 PMID: 30978376 DOI: 10.1016/j.biosystems.2019.04.003
Source DB: PubMed Journal: Biosystems ISSN: 0303-2647 Impact factor: 1.973