A global compositional complexity measure for biological sequences: AT-rich and GC-rich genomes encode less complex proteins.

Abstract

Different local regions of natural amino acid or nucleotide sequences show remarkable heterogeneity in residue composition, reflecting diversity in evolutionary history and physicochemical constraints. Compositional complexity measures are helpful for describing and understanding this variegation. Motivated by some open problems in comparative genomics and protein folding, we have developed a new 'global' compositional complexity measure, G1, which overcomes a crucial limitation of earlier methods. The 'local' measures used in previous research resemble entropy functions and are inherently dependent on an underlying probability distribution. Local measures cannot rigorously compare complexity across sequences of substantially different size, because real sequences show very irregular heterogeneity and do not have the necessary ergodicity in scaling and asymptotic properties. G1 is a member of a new class of scale-independent, distribution-independent complexity functions. For a sequence S of length L on an N-letter alphabet, G1 is derived from ratios in the integer partition lattice, P(L,N), of L with N parts, where the elements of P(L,N) are the state vectors of S, (n1, n2, ..., nN), ranked by an order principle. We present theorems and proofs relating to the metric properties of G1 and its relationship to other state-vector-dependent compositional complexity functions, together with a fully efficient O(L) algorithm to compute G1. The distributions of G1 were calculated for the entire sets of translated proteins encoded by extensively sequenced genomes. The results establish the existence of a clear evolutionary principle, common to bacteria, archaea and eukaryotes, that the proteins encoded by more extreme AT-rich and GC-rich genomes have generally lower compositional complexity than those of more typical organisms.
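As a rough illustration of two ingredients named in the abstract, the following Python sketch computes a state vector for a sequence and an entropy-like 'local' measure from it. It does not compute G1, whose definition is not given in the abstract. The function names `state_vector` and `shannon_entropy` are hypothetical, and treating the state vector as the letter counts sorted in descending order (zeros included, so that the entries sum to L over N parts) is an assumption made here for illustration.

```python
from collections import Counter
from math import log2


def state_vector(seq, alphabet):
    """Return the letter counts of seq, sorted in descending order.

    Assumption: the state vector (n1, n2, ..., nN) is the sorted
    composition of the sequence, padded with zeros for unused letters.
    Counting is a single pass over seq, so it runs in O(L) time.
    """
    counts = Counter(seq)
    unknown = set(counts) - set(alphabet)
    if unknown:
        raise ValueError(f"letters outside the alphabet: {sorted(unknown)}")
    return tuple(sorted((counts.get(a, 0) for a in alphabet), reverse=True))


def shannon_entropy(vector):
    """Entropy in bits of the composition described by a state vector.

    This is an example of an entropy-like measure, not G1.
    """
    total = sum(vector)
    if total == 0:
        raise ValueError("empty sequence")
    return -sum((n / total) * log2(n / total) for n in vector if n > 0)


if __name__ == "__main__":
    v = state_vector("AACG", "ACGT")
    print(v, shannon_entropy(v))
```

Because the vector is sorted, sequences with the same composition up to a relabelling of letters share a state vector, which is what lets a measure defined on state vectors ignore which particular residues are abundant.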

Substances:
Proteins
DNA

Year: 2000 PMID: 10642881 DOI: 10.1016/s0097-8485(99)00048-0

Source DB: PubMed Journal: Comput Chem ISSN: 0097-8485

Cited

11 in total

1. The compositional adjustment of amino acid substitution matrices.

2. The Ensembl analysis pipeline.

3. Compositional adjustment of Dirichlet mixture priors.

Review 4. Protein database searches using compositionally adjusted substitution matrices.

5. The genetic code is nearly optimal for allowing additional information within protein-coding sequences.

6. Low-complexity regions in Plasmodium falciparum proteins.

Review 7. Substitution scoring matrices for proteins - An overview.

8. Association of SLC34A2 variation and sodium-lithium countertransport activity in humans and baboons.

9. MACSIMS: multiple alignment of complete sequences information management system.

10. Retrieval accuracy, statistical significance and compositional similarity in protein sequence database searches.