Information content of protein sequences.

O Weiss, M A Jiménez-Montaño, H Herzel.

Abstract

The complexity of large sets of non-redundant protein sequences is measured. This is done by estimating the Shannon entropy as well as applying compression algorithms to estimate the algorithmic complexity. The estimators are also applied to randomly generated surrogates of the protein data. Our results show that proteins are fairly close to random sequences. The entropy reduction due to correlations is only about 1%. However, precise estimations of the entropy of the source are not possible due to finite sample effects. Compression algorithms also indicate that the redundancy is in the order of 1%. These results confirm the idea that protein sequences can be regarded as slightly edited random strings. We discuss secondary structure and low-complexity regions as causes of the redundancy observed. The findings are related to numerical and biochemical experiments with random polypeptides. Copyright 2000 Academic Press.
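The two kinds of estimator named in the abstract can be illustrated with a minimal sketch. It is not the authors' code: `zlib` stands in for whichever compression algorithms the paper used, the surrogate is a plain residue shuffle, and the function names and the toy sequence are hypothetical. The plug-in block entropy shown here is biased downward when the block length is large relative to the sample, which is the finite sample effect the abstract mentions.

```python
import math
import random
import zlib
from collections import Counter


def block_entropy(seq, n=1):
    """Plug-in Shannon entropy (in bits) of overlapping length-n blocks."""
    blocks = [seq[i:i + n] for i in range(len(seq) - n + 1)]
    counts = Counter(blocks)
    total = len(blocks)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def shuffled_surrogate(seq, rng):
    """Shuffle residues: same composition, correlations destroyed."""
    residues = list(seq)
    rng.shuffle(residues)
    return "".join(residues)


def compressed_size(seq):
    """Bytes after zlib compression (an assumed stand-in compressor)."""
    return len(zlib.compress(seq.encode("ascii"), 9))


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical toy input; real use would load non-redundant protein data.
    toy = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ" * 30
    surrogate = shuffled_surrogate(toy, rng)
    for name, s in (("toy", toy), ("surrogate", surrogate)):
        h1 = block_entropy(s, 1)
        # Conditional entropy H(2) - H(1): drops below H(1) when
        # neighbouring residues are correlated.
        h_cond = block_entropy(s, 2) - h1
        print(name, round(h1, 3), round(h_cond, 3), compressed_size(s))
```

Comparing a sequence with its shuffled surrogate separates the effect of correlations from the effect of amino acid composition, since shuffling leaves the single-residue entropy unchanged.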

Substances: Peptide Library

Year: 2000 PMID: 10988023 DOI: 10.1006/jtbi.2000.2138

Source DB: PubMed Journal: J Theor Biol ISSN: 0022-5193 Impact factor: 2.691

Cited by: 29 in total

1. Protein aggregation/folding: the role of deterministic singularities of sequence hydrophobicity as determined by nonlinear signal analysis of acylphosphatase and Abeta(1-40).

Authors: Joseph P Zbilut; Alfredo Colosimo; Filippo Conti; Mauro Colafranceschi; Cesare Manetti; MariaCristina Valerio; Charles L Webber; Alessandro Giuliani
Journal: Biophys J Date: 2003-12 Impact factor: 4.033

2. Phylogeny of prokaryotes and chloroplasts revealed by a simple composition approach on all protein sequences from complete genomes without sequence alignment.

Authors: Z G Yu; L Q Zhou; V V Anh; K H Chu; S C Long; J Q Deng
Journal: J Mol Evol Date: 2005-04 Impact factor: 2.395

3. A study of residue correlation within protein sequences and its application to sequence classification.

Authors: Chris Hemmerich; Sun Kim
Journal: EURASIP J Bioinform Syst Biol Date: 2007

4. Compressing proteomes: the relevance of medium range correlations.

Authors: Dario Benedetto; Emanuele Caglioti; Claudia Chica
Journal: EURASIP J Bioinform Syst Biol Date: 2007

5. Globally, unrelated protein sequences appear random.

Authors: Daniel T Lavelle; William R Pearson
Journal: Bioinformatics Date: 2009-11-30 Impact factor: 6.937

6. A sequence-compatible amount of native burial information is sufficient for determining the structure of small globular proteins.

Authors: Antonio F Pereira de Araujo; José N Onuchic
Journal: Proc Natl Acad Sci U S A Date: 2009-10-26 Impact factor: 11.205

7. Nonlinear analysis of tRNAs nucleotide sequences by random walks: randomness and order in the primitive informational polymers.

Authors: G Bianciardi; L Borruso
Journal: J Mol Evol Date: 2015-01-11 Impact factor: 2.395