The Shannon information entropy of protein sequences.

B J Strait, T G Dewey.

Abstract

A comprehensive database is analyzed to determine the Shannon information content of a protein sequence. This information entropy is estimated by three methods: a k-tuplet analysis, a generalized Zipf analysis, and a "Chou-Fasman gambler." The k-tuplet analysis is a "letter" analysis, based on conditional sequence probabilities. The generalized Zipf analysis demonstrates the statistical linguistic qualities of protein sequences and uses the "word" frequency to determine the Shannon entropy. The Zipf analysis and k-tuplet analysis give Shannon entropies of approximately 2.5 bits/amino acid. This entropy is much smaller than the value of 4.18 bits/amino acid obtained from the nonuniform composition of amino acids in proteins. The "Chou-Fasman gambler" is an algorithm based on the Chou-Fasman rules for protein structure. It uses both sequence and secondary structure information to guess at the number of possible amino acids that could appropriately substitute into a sequence. As is the case for the English language, the gambler algorithm gives significantly lower entropies than the k-tuplet analysis. Using these entropies, the number of most probable protein sequences can be calculated. The number of most probable protein sequences is much less than the number of possible sequences but is still much larger than the number of sequences thought to have existed throughout evolution. Implications of these results for mutagenesis experiments are discussed.
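
The k-tuplet idea and the count of most probable sequences can be illustrated with a minimal sketch. The abstract does not give the authors' exact procedure, so the code below is only a generic plug-in estimate: it assumes the conditional entropy is taken as the difference of successive block entropies, and that the number of most probable sequences of length L follows the standard Shannon estimate of roughly 2^(H·L). The function names are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def block_entropy(seq, k):
    """Shannon entropy (bits) of the empirical distribution of overlapping k-tuplets."""
    if k == 0:
        return 0.0
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def conditional_entropy(seq, k):
    """Bits per residue given the preceding k-1 residues, estimated as H_k - H_(k-1).

    Assumption: a simple plug-in estimate; it is biased for large k on short data.
    """
    return block_entropy(seq, k) - block_entropy(seq, k - 1)


def log10_typical_sequences(entropy_bits, length):
    """log10 of 2**(H * L), the approximate number of most probable sequences."""
    return entropy_bits * length * math.log10(2)


if __name__ == "__main__":
    # Toy input only: a real analysis would use a large, representative database.
    toy = "ACDEFGHIKLMNPQRSTVWY" * 50
    print(block_entropy(toy, 1))        # composition entropy of the toy sequence
    print(conditional_entropy(toy, 2))  # near zero: each residue fixes the next
    # Compare 2.5 bits/residue with the 20**100 possible 100-residue sequences.
    print(log10_typical_sequences(2.5, 100), 100 * math.log10(20))
```

For a 100-residue protein, an entropy of 2.5 bits/amino acid corresponds to roughly 10^75 most probable sequences, against about 10^130 possible sequences, which is the kind of gap the abstract describes.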

Year:  1996        PMID: 8804598      PMCID: PMC1233466          DOI: 10.1016/S0006-3495(96)79210-X

Source DB:  PubMed          Journal:  Biophys J        ISSN: 0006-3495            Impact factor:   4.033


References:  12 in total

Review 1.  Applied molecular evolution.

Authors:  S A Kauffman
Journal:  J Theor Biol       Date:  1992-07-07       Impact factor: 2.691

2.  On the information content of cytochrome c.

Authors:  H P Yockey
Journal:  J Theor Biol       Date:  1977-08-07       Impact factor: 2.691

3.  Selection of representative protein data sets.

Authors:  U Hobohm; M Scharf; R Schneider; C Sander
Journal:  Protein Sci       Date:  1992-03       Impact factor: 6.725

4.  Multifractal analysis of solvent accessibilities in proteins.

Authors: 
Journal:  Phys Rev E Stat Phys Plasmas Fluids Relat Interdiscip Topics       Date:  1995-07

5.  Multifractals and decoded walks: Applications to protein sequence correlations.

Authors: 
Journal:  Phys Rev E Stat Phys Plasmas Fluids Relat Interdiscip Topics       Date:  1995-12

6.  Multifractals, encoded walks and the ergodicity of protein sequences.

Authors:  T G Dewey; B J Strait
Journal:  Pac Symp Biocomput       Date:  1996

Review 7.  Empirical predictions of protein conformation.

Authors:  P Y Chou; G D Fasman
Journal:  Annu Rev Biochem       Date:  1978       Impact factor: 23.643

8.  Combinatorial cassette mutagenesis as a probe of the informational content of protein sequences.

Authors:  J F Reidhaar-Olson; R T Sauer
Journal:  Science       Date:  1988-07-01       Impact factor: 47.728

9.  LINUS: a hierarchic procedure to predict the fold of a protein.

Authors:  R Srinivasan; G D Rose
Journal:  Proteins       Date:  1995-06

10.  Nonrandomness in protein sequences: evidence for a physically driven stage of evolution?

Authors:  V S Pande; A Y Grosberg; T Tanaka
Journal:  Proc Natl Acad Sci U S A       Date:  1994-12-20       Impact factor: 11.205

Cited by:  35 in total

1.  Nonlinear methods in the analysis of protein sequences: a case study in rubredoxins.

Authors:  A Giuliani; R Benigni; P Sirabella; J P Zbilut; A Colosimo
Journal:  Biophys J       Date:  2000-01       Impact factor: 4.033

2.  Frequencies of amino acid strings in globular protein sequences indicate suppression of blocks of consecutive hydrophobic residues.

Authors:  R Schwartz; S Istrail; J King
Journal:  Protein Sci       Date:  2001-05       Impact factor: 6.725

3.  Protein aggregation/folding: the role of deterministic singularities of sequence hydrophobicity as determined by nonlinear signal analysis of acylphosphatase and Abeta(1-40).

Authors:  Joseph P Zbilut; Alfredo Colosimo; Filippo Conti; Mauro Colafranceschi; Cesare Manetti; MariaCristina Valerio; Charles L Webber; Alessandro Giuliani
Journal:  Biophys J       Date:  2003-12       Impact factor: 4.033

4.  An information theoretic approach to macromolecular modeling: I. Sequence alignments.

Authors:  Tiba Aynechi; Irwin D Kuntz
Journal:  Biophys J       Date:  2005-11       Impact factor: 4.033

5.  Frequencies of hydrophobic and hydrophilic runs and alternations in proteins of known structure.

Authors:  Russell Schwartz; Jonathan King
Journal:  Protein Sci       Date:  2006-01       Impact factor: 6.725

6.  Structural diversity of protein segments follows a power-law distribution.

Authors:  Yoshito Sawada; Shinya Honda
Journal:  Biophys J       Date:  2006-05-26       Impact factor: 4.033

7.  How are model protein structures distributed in sequence space?

Authors:  E Bornberg-Bauer
Journal:  Biophys J       Date:  1997-11       Impact factor: 4.033

8.  Contribution of genome-wide HCV genetic differences to outcome of interferon-based therapy in Caucasian American and African American patients.

Authors:  Maureen J Donlin; Nathan A Cannon; Rajeev Aurora; Jia Li; Abdus S Wahed; Adrian M Di Bisceglie; John E Tavis
Journal:  PLoS One       Date:  2010-02-03       Impact factor: 3.240

9.  Self-organization and entropy reduction in a living cell.

Authors:  Paul C W Davies; Elisabeth Rieper; Jack A Tuszynski
Journal:  Biosystems       Date:  2012-11-15       Impact factor: 1.973

Review 10.  Folding by numbers: primary sequence statistics and their use in studying protein folding.

Authors:  Brent Wathen; Zongchao Jia
Journal:  Int J Mol Sci       Date:  2009-04-08       Impact factor: 6.208
