
Estimating the entropy of DNA sequences.

A O Schmitt, H Herzel.

Abstract

The Shannon entropy is a standard measure of the order state of symbol sequences such as DNA sequences. To incorporate correlations between symbols, the entropy of n-mers (consecutive strands of n symbols) has to be determined. Here, a method is presented to estimate such higher-order entropies (block entropies) for DNA sequences when the actual number of observations is small compared with the number of possible outcomes. The n-mer probability distribution underlying the dynamical process is reconstructed using elementary statistical principles: the theorem of asymptotic equi-distribution and the Maximum Entropy Principle. Constraints are set to force the constructed distributions to adopt features that are characteristic of the real probability distribution. Of the many solutions compatible with these constraints, the one with the highest entropy is the most likely according to the Maximum Entropy Principle. An algorithm performing this procedure is described. It is tested by applying it to various DNA model sequences whose exact entropies are known. Finally, results for a real DNA sequence, the complete genome of the Epstein-Barr virus, are presented and compared with those of other information carriers (texts, computer source code, music). DNA sequences appear to possess much more freedom in combining the symbols of their alphabet than written language or computer source code. Copyright 1997 Academic Press Limited.
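To make the quantity discussed in the abstract concrete, the following is a minimal sketch of the naive "plug-in" block entropy, which simply inserts observed n-mer frequencies into the Shannon formula H_n = -sum_i p_i log2 p_i. It is not the maximum-entropy reconstruction the abstract describes; it is the uncorrected baseline whose failure for small samples motivates such methods. The function names `block_entropy` and `max_block_entropy`, the use of overlapping n-mers, and the choice of bits as the unit are assumptions made for this illustration.

```python
import math
from collections import Counter


def block_entropy(sequence: str, n: int) -> float:
    """Naive plug-in estimate of the n-mer (block) Shannon entropy, in bits.

    Assumption for this sketch: n-mers are counted with overlap, i.e. a
    window of width n slides along the sequence one symbol at a time.
    """
    if n < 1 or n > len(sequence):
        raise ValueError("n must satisfy 1 <= n <= len(sequence)")
    # Count every overlapping n-mer in the sequence.
    counts = Counter(sequence[i:i + n] for i in range(len(sequence) - n + 1))
    total = sum(counts.values())
    # H_n = -sum_i p_i * log2(p_i), with p_i the observed relative frequency.
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def max_block_entropy(n: int, alphabet_size: int = 4) -> float:
    """Largest possible n-mer entropy: all alphabet_size**n words equally likely."""
    return n * math.log2(alphabet_size)
```

The undersampling problem is visible directly in this estimator: a sequence of length N contains only N - n + 1 overlapping n-mers, so the estimate can never exceed log2(N - n + 1). For N = 1000 and n = 8, that cap is about 9.96 bits, whereas 4^8 = 65536 equally likely 8-mers would give 16 bits.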

Year:  1997        PMID: 9344742     DOI: 10.1006/jtbi.1997.0493

Source DB:  PubMed          Journal:  J Theor Biol        ISSN: 0022-5193            Impact factor:   2.691


