
Word organization in coding DNA: a mathematical model.

Indranil Mukhopadhyay, Anup Som, Satyabrata Sahoo.

Abstract

This article deals with the relationship between vocabulary (the total number of distinct oligomers, or "words") and text length (the total number of oligomers, or "words") for a coding DNA sequence (CDS). For natural human languages, Heaps established a mathematical formula, known as Heaps' law, that relates vocabulary to text length. Our analysis shows that Heaps' law fails to model this relationship for CDSs. Here we develop a mathematical model that relates the number of word types (vocabulary) to the number of words sampled (text length) for CDSs, when non-overlapping nucleotide strings of equal length are treated as words. We use a hyperbolic tangent function, which captures the saturation property of vocabulary. Based on the parameters of the model, we formulate a mathematical equation, called the "equation of word organization", whose parameters essentially indicate that the nucleotide organization differs from one coding sequence to another. We also compare the word organization of CDSs with a random word distribution and conclude that a CDS resembles neither a natural human language nor a random sequence. Moreover, each of these sequences has its own unique nucleotide organization, which is completely structured for specific biological functions.
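The vocabulary/text-length relationship described above can be illustrated with a minimal sketch. It reads a sequence as consecutive non-overlapping words of length k, records how many distinct words V have been seen after N words, and fits two curves: Heaps' law, V = K·N^β (via least squares on log V versus log N), and a saturating curve assumed here to be V(N) = a·tanh(b·N). The abstract does not state the exact parameterization of the authors' model or their "equation of word organization", so the tanh form, the profile least-squares grid search, the shuffled-sequence baseline, and every function name (`vocabulary_curve`, `fit_heaps`, `fit_tanh`, `shuffled_copy`) are hypothetical illustrations, not the method of the paper.

```python
import math
import random


def vocabulary_curve(seq, k):
    """Return [(N, V), ...]: after N non-overlapping k-letter words, V distinct words seen."""
    seen = set()
    curve = []
    for start in range(0, len(seq) - k + 1, k):  # step by k => non-overlapping words
        seen.add(seq[start:start + k])
        curve.append((len(curve) + 1, len(seen)))
    return curve


def fit_heaps(curve):
    """Fit Heaps' law V = K * N**beta by least squares on log V = log K + beta * log N."""
    xs = [math.log(n) for n, _ in curve]
    ys = [math.log(v) for _, v in curve]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    beta = sxy / sxx
    return math.exp(my - beta * mx), beta


def fit_tanh(curve, b_grid=None):
    """Fit the ASSUMED saturating form V = a * tanh(b * N).

    For each candidate b, the least-squares a has a closed form; the b with the
    smallest squared error wins. Returns (a, b, sse).
    """
    if b_grid is None:
        b_grid = [10 ** (e / 50) for e in range(-300, 1)]  # b from 1e-6 to 1, log-spaced
    best = None
    for b in b_grid:
        t = [math.tanh(b * n) for n, _ in curve]
        a = sum(ti * v for ti, (_, v) in zip(t, curve)) / sum(ti * ti for ti in t)
        err = sum((v - a * ti) ** 2 for ti, (_, v) in zip(t, curve))
        if best is None or err < best[2]:
            best = (a, b, err)
    return best


def sse(curve, model):
    """Sum of squared errors of model(N) against the observed vocabulary V."""
    return sum((v - model(n)) ** 2 for n, v in curve)


def shuffled_copy(seq, seed=0):
    """One possible random baseline: same base composition, order shuffled."""
    letters = list(seq)
    random.Random(seed).shuffle(letters)
    return "".join(letters)


if __name__ == "__main__":
    # Toy input only: a pseudo-random string, NOT a real coding sequence.
    rng = random.Random(0)
    toy = "".join(rng.choice("ACGT") for _ in range(3000))
    k = 3
    for label, seq in (("toy", toy), ("shuffled", shuffled_copy(toy))):
        curve = vocabulary_curve(seq, k)
        K, beta = fit_heaps(curve)
        a, b, tanh_err = fit_tanh(curve)
        heaps_err = sse(curve, lambda n: K * n ** beta)
        print(f"{label}: words={len(curve)} vocabulary={curve[-1][1]} (max {4 ** k})")
        print(f"  Heaps: K={K:.3f} beta={beta:.3f} SSE={heaps_err:.1f}")
        print(f"  tanh:  a={a:.3f} b={b:.5f} SSE={tanh_err:.1f}")
```

The tanh curve is bounded above by a, which mirrors the fact that k-letter words over {A, C, G, T} can have at most 4^k types, whereas Heaps' law grows without bound. Running the same fits on a real CDS and on a shuffled copy is one way to contrast a sequence with a random word distribution.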

Substances:
Nucleotides
DNA

Year: 2006 PMID: 17046370 DOI: 10.1016/j.thbio.2006.03.002

Source DB: PubMed Journal: Theory Biosci ISSN: 1431-7613 Impact factor: 1.919

Cited by: 2 in total

1. Distinguishing Functional DNA Words; A Method for Measuring Clustering Levels.

Authors: Hanieh Moghaddasi; Khosrow Khalifeh; Amir Hossein Darooneh
Journal: Sci Rep Date: 2017-01-27 Impact factor: 4.379

2. Peptide vocabulary analysis reveals ultra-conservation and homonymity in protein sequences.

Authors: Derek Gatherer
Journal: Bioinform Biol Insights Date: 2009-11-24
