Information content of protein sequences.

O Weiss, M A Jiménez-Montaño, H Herzel.

Abstract

The complexity of large sets of non-redundant protein sequences is measured by estimating the Shannon entropy and by applying compression algorithms to estimate the algorithmic complexity. The same estimators are applied to randomly generated surrogates of the protein data. Our results show that proteins are fairly close to random sequences: the entropy reduction due to correlations is only about 1%. However, the entropy of the source cannot be estimated precisely because of finite sample effects. Compression algorithms also indicate that the redundancy is on the order of 1%. These results support the idea that protein sequences can be regarded as slightly edited random strings. We discuss secondary structure and low-complexity regions as causes of the observed redundancy, and relate the findings to numerical and biochemical experiments with random polypeptides. Copyright 2000 Academic Press.
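The comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' method: the plug-in block-entropy estimator, the shuffle-based surrogate, the use of `zlib` as the compressor, and all function names are assumptions chosen for illustration. The idea is to compute the same statistic on a sequence and on a shuffled copy with identical amino-acid composition; any gap between the two is attributable to correlations between residues.

```python
import math
import random
import zlib
from collections import Counter


def block_entropy(seq: str, n: int = 1) -> float:
    """Plug-in (maximum-likelihood) Shannon entropy of length-n blocks, in bits."""
    blocks = [seq[i:i + n] for i in range(len(seq) - n + 1)]
    counts = Counter(blocks)
    total = len(blocks)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def conditional_entropy(seq: str, n: int = 2) -> float:
    """H_n - H_(n-1): uncertainty of a residue given the n-1 preceding ones."""
    return block_entropy(seq, n) - block_entropy(seq, n - 1)


def shuffled_surrogate(seq: str, seed: int = 0) -> str:
    """Random permutation of seq: same composition, correlations destroyed."""
    chars = list(seq)
    random.Random(seed).shuffle(chars)
    return "".join(chars)


def compressed_bits_per_symbol(seq: str) -> float:
    """Crude upper bound on complexity: zlib output bits per input symbol."""
    return 8 * len(zlib.compress(seq.encode("ascii"), 9)) / len(seq)


if __name__ == "__main__":
    # Hypothetical toy input; a real analysis would use a large
    # non-redundant protein set, as in the abstract.
    protein = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRVGDGTQDNLSGAEKAVQVKVKALPDAQFEVV" * 20
    surrogate = shuffled_surrogate(protein)
    for name, s in (("original", protein), ("surrogate", surrogate)):
        print(name, block_entropy(s, 1), conditional_entropy(s, 2),
              compressed_bits_per_symbol(s))
```

For a 20-letter alphabet the single-residue entropy is bounded by log2(20) ≈ 4.32 bits. The plug-in estimator is biased downward once the number of possible blocks approaches the sample size, which is one form of the finite sample effects the abstract mentions. A general-purpose compressor such as `zlib` gives only a loose upper bound on algorithmic complexity, so its absolute values matter less than the difference between the sequence and its surrogate.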

Year: 2000; PMID: 10988023; DOI: 10.1006/jtbi.2000.2138

Source DB: PubMed; Journal: J Theor Biol; ISSN: 0022-5193; Impact factor: 2.691

