
Statistical and linguistic features of DNA sequences.

S Havlin, S V Buldyrev, A L Goldberger, R N Mantegna, C K Peng, M Simons, H E Stanley.

Abstract

We present evidence supporting the idea that the DNA sequence in genes containing noncoding regions is correlated, and that the correlation is remarkably long range: base pairs separated by thousands of base pairs are correlated. We do not find such long-range correlation in the coding regions of the gene. We resolve the problem of the "nonstationary" character of the base-pair sequence by applying a new algorithm called Detrended Fluctuation Analysis (DFA). We address Voss's claim that coding and noncoding regions of DNA do not differ in their statistical properties by systematically applying the DFA algorithm, as well as standard FFT analysis, to all eukaryotic DNA sequences (33 301 coding and 29 453 noncoding) in the entire GenBank database. We describe a simple model, based on a generalization of the classic Lévy walk, that accounts for the presence of long-range power-law correlations. Finally, we briefly describe recent work showing that noncoding sequences share certain statistical features with natural languages. Specifically, we adapt to DNA the Zipf approach to analyzing linguistic texts and the Shannon approach to quantifying the "redundancy" of a linguistic text in terms of a measurable entropy function. We suggest that noncoding regions in plants and invertebrates may display smaller entropy and larger redundancy than coding regions, further supporting the possibility that noncoding regions of DNA may carry biological information.
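The abstract names Detrended Fluctuation Analysis but does not spell it out. The following is a minimal sketch of first-order DFA applied to a "DNA walk", written only to illustrate the idea. Assumptions not stated in the abstract: bases are mapped by the purine/pyrimidine rule (A, G to +1; C, T to -1), boxes are non-overlapping, and the local trend in each box is a least-squares straight line. The helper names (`dna_walk`, `fluctuation`, `dfa_alpha`) are hypothetical. The exponent alpha is the slope of log F(n) against log n; a value near 0.5 indicates no long-range correlation, and a value above 0.5 indicates long-range correlation.

```python
import math
import random

# Assumption: purine/pyrimidine mapping for the DNA walk (A, G -> +1; C, T -> -1).
# The abstract does not state which mapping is used; other rules are possible.
STEP = {"A": 1, "G": 1, "C": -1, "T": -1}


def dna_walk(seq):
    """Cumulative DNA walk y(i): running sum of +1/-1 steps; non-ACGT symbols are skipped."""
    walk, position = [], 0
    for base in seq.upper():
        step = STEP.get(base)
        if step is None:
            continue  # ambiguous symbols such as N are ignored
        position += step
        walk.append(position)
    return walk


def _linear_residuals(ys):
    """Residuals of ys after removing its least-squares straight line (the local trend)."""
    n = len(ys)
    x_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(ys))
    slope = sxy / sxx
    return [y - (y_mean + slope * (x - x_mean)) for x, y in enumerate(ys)]


def fluctuation(profile, box_size):
    """F(n): root-mean-square deviation of the profile from its linear trend in boxes of size n."""
    n_boxes = len(profile) // box_size
    if box_size < 3 or n_boxes == 0:
        raise ValueError("need box_size >= 3 and at least one full box")
    total = 0.0
    for b in range(n_boxes):
        box = profile[b * box_size:(b + 1) * box_size]
        total += sum(r * r for r in _linear_residuals(box))
    return math.sqrt(total / (n_boxes * box_size))


def dfa_alpha(profile, box_sizes):
    """Scaling exponent alpha: least-squares slope of log F(n) against log n."""
    xs = [math.log(n) for n in box_sizes]
    ys = [math.log(fluctuation(profile, n)) for n in box_sizes]
    x_mean, y_mean = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    den = sum((x - x_mean) ** 2 for x in xs)
    return num / den


if __name__ == "__main__":
    rng = random.Random(0)
    seq = "".join(rng.choice("ACGT") for _ in range(20000))
    # For an uncorrelated random sequence, alpha should come out close to 0.5.
    print(round(dfa_alpha(dna_walk(seq), [8, 16, 32, 64, 128, 256]), 2))
```

Detrending box by box is what lets the method tolerate slowly varying base composition (the "nonstationarity" mentioned above), because a local drift in the walk is absorbed by the fitted line instead of inflating F(n).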
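The abstract mentions a model based on a generalization of the classic Lévy walk but gives no details. The sketch below is a hypothetical illustration of the general idea, not the authors' model. It builds a sequence of purine (+1) and pyrimidine (-1) steps from "domains" whose lengths follow a power law with density roughly proportional to l^(-mu). Each domain has a random orientation, and within a domain a step follows that orientation with a tunable probability. The parameter names `mu` and `bias` and the function `levy_domain_steps` are assumptions. For a classic Lévy walk with 2 < mu < 3, the mean-square displacement grows as t^(4 - mu). The cumulative sum of these steps should therefore show, at large scales, a fluctuation exponent near (4 - mu)/2, which is above the uncorrelated value 0.5.

```python
import random


def levy_domain_steps(length, mu=2.5, bias=1.0, seed=None):
    """Hypothetical generator of +1/-1 steps (purine = +1, pyrimidine = -1) built from biased domains.

    Illustrative assumptions, not the paper's exact model:
      * domain lengths l >= 1 have a power-law density ~ l**(-mu), with 2 < mu < 3;
      * each domain has a random orientation s = +1 or -1;
      * inside a domain each step equals s with probability (1 + bias) / 2.
    """
    if not 2.0 < mu < 3.0:
        raise ValueError("this sketch only covers 2 < mu < 3")
    if not 0.0 <= bias <= 1.0:
        raise ValueError("bias must lie in [0, 1]")
    rng = random.Random(seed)
    p_same = (1.0 + bias) / 2.0
    steps = []
    while len(steps) < length:
        # random.paretovariate(a) has tail P(L > l) = l**(-a), i.e. density ~ l**(-(a + 1)),
        # so a = mu - 1 gives density exponent mu. Lengths are capped at what is still needed.
        domain_len = min(int(rng.paretovariate(mu - 1.0)), length - len(steps))
        s = rng.choice((1, -1))
        steps.extend(s if rng.random() < p_same else -s for _ in range(domain_len))
    return steps
```

With `bias=1.0`, every domain is a pure run of one step type, which is the Lévy-walk limit. Smaller `bias` values mix in uncorrelated noise, which mainly affects short scales.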
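The linguistic analyses named in the abstract (Zipf rank-frequency analysis and Shannon redundancy) can be illustrated by treating overlapping n-tuples of bases as "words". The sketch below rests on assumptions not given in the abstract. Words overlap, and any word containing a symbol outside ACGT is discarded. The Zipf exponent is read off as the least-squares slope of log frequency against log rank. Redundancy uses the common definition R_n = 1 - H_n / (2n), that is, the n-tuple entropy H_n (in bits) relative to its 2n-bit maximum for a four-letter alphabet. The function names are hypothetical.

```python
import math
from collections import Counter

ALPHABET = frozenset("ACGT")


def word_counts(seq, n):
    """Counts of overlapping n-letter 'words' (n-tuples); words with non-ACGT symbols are dropped."""
    seq = seq.upper()
    words = (seq[i:i + n] for i in range(len(seq) - n + 1))
    return Counter(w for w in words if ALPHABET.issuperset(w))


def zipf_slope(counts):
    """Least-squares slope of log(frequency) vs log(rank); Zipf's law predicts a straight line."""
    freqs = sorted(counts.values(), reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two distinct words")
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    x_mean, y_mean = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    den = sum((x - x_mean) ** 2 for x in xs)
    return num / den


def entropy_bits(counts):
    """Shannon entropy H_n = -sum p log2 p of the n-tuple distribution, in bits."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def redundancy(seq, n):
    """R_n = 1 - H_n / (2n): 0 when all 4**n words are equally likely, 1 when one word dominates."""
    return 1.0 - entropy_bits(word_counts(seq, n)) / (2 * n)
```

Under these definitions, a smaller entropy and a larger redundancy, as suggested for noncoding regions, both mean that the sequence's n-tuple usage departs further from uniform.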

Keywords:  NASA Discipline Cardiopulmonary; NASA Discipline Number 14-10; NASA Program Space Physiology and Countermeasures; Non-NASA Center

Year:  1995        PMID: 11539281     DOI: 10.1142/s0218348x95000229

Source DB:  PubMed          Journal:  Fractals        ISSN: 0218-348X            Impact factor:   3.665


  5 in total

Review 1.  Ingestion-controlling network: what's language got to do with it?

Authors:  Michael Myslobodsky; Richard Coppola
Journal:  Rev Neurosci       Date:  2010       Impact factor: 4.353

2.  Effect of extreme data loss on long-range correlated and anticorrelated signals quantified by detrended fluctuation analysis.

Authors:  Qianli D Y Ma; Ronny P Bartsch; Pedro Bernaola-Galván; Mitsuru Yoneyama; Plamen Ch Ivanov
Journal:  Phys Rev E Stat Nonlin Soft Matter Phys       Date:  2010-03-02

3.  Wavelet Analysis of DNA Bending Profiles reveals Structural Constraints on the Evolution of Genomic Sequences.

Authors:  Benjamin Audit; Cédric Vaillant; Alain Arnéodo; Yves d'Aubenton-Carafa; Claude Thermes
Journal:  J Biol Phys       Date:  2004-03       Impact factor: 1.365

4.  Small-Angle Scattering and Multifractal Analysis of DNA Sequences.

Authors:  Eugen Mircea Anitas
Journal:  Int J Mol Sci       Date:  2020-06-30       Impact factor: 5.923

Review 5.  ALUminating the Path of Atherosclerosis Progression: Chaos Theory Suggests a Role for Alu Repeats in the Development of Atherosclerotic Vascular Disease.

Authors:  Miguel Hueso; Josep M Cruzado; Joan Torras; Estanislao Navarro
Journal:  Int J Mol Sci       Date:  2018-06-12       Impact factor: 5.923
