Information-theoretical entropy as a measure of sequence variability.

P S Shenkin, B Erman, L D Mastrandrea.

Abstract

We propose the use of the information-theoretical entropy, $S = -\sum_{i=1}^{n} p_i \log_2 p_i$, as a measure of variability at a given position in a set of aligned sequences. Here $p_i$ is the fraction of times the i-th type appears at that position. For protein sequences the sum has up to 20 terms; for nucleotide sequences, up to 4 terms; and for codon sequences, up to 61 terms. We compare S and Vs, a related measure, in detail with Vk, the traditional measure of immunoglobulin sequence variability, both in the abstract and as applied to the immunoglobulins. We conclude that S has desirable mathematical properties that Vk lacks and has intuitive and statistical meanings that accord well with the notion of variability. We find that Vk and the S-based measures are highly correlated for the immunoglobulins. We show by analysis of sequence data and by means of a mathematical model that this correlation is due to a strong tendency for the frequency of occurrence of amino acid types at a given position to be log-linear. It is not known whether the immunoglobulins are typical or atypical of protein families in this regard, nor is the origin of the observed rank-frequency distribution obvious, although we discuss several possible etiologies.
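As an illustration of the entropy measure defined in the abstract, the following is a minimal Python sketch that computes S for each column of a small alignment. The function names `column_entropy` and `alignment_entropy` and the toy sequences are hypothetical, chosen only for this example; the sketch assumes equal-length, gap-free sequences and is not the authors' implementation.

```python
from collections import Counter
from math import log2


def column_entropy(column):
    """Shannon entropy in bits, S = -sum(p_i * log2(p_i)), for one alignment column.

    `column` is any iterable of symbols (amino acids, nucleotides, codons).
    Assumption: every symbol counts as a type; gaps are not treated specially.
    """
    counts = Counter(column)
    total = sum(counts.values())
    # p * log2(1/p) equals -p * log2(p) and avoids printing -0.0 for invariant columns.
    return sum((c / total) * log2(total / c) for c in counts.values())


def alignment_entropy(sequences):
    """Per-position entropy for a list of equal-length aligned sequences."""
    return [column_entropy(col) for col in zip(*sequences)]


# Hypothetical toy alignment: two invariant positions, one fully variable, one split 50/50.
aligned = ["ACDA", "ACEA", "ACFC", "ACGC"]
print(alignment_entropy(aligned))  # → [0.0, 0.0, 2.0, 1.0]
```

An invariant position gives S = 0, and a position at which all n observed types are equally frequent gives the maximum S = log2(n), which is about 4.32 bits when all 20 amino acids are equally represented.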

Year:  1991        PMID: 1758884     DOI: 10.1002/prot.340110408

Source DB:  PubMed          Journal:  Proteins        ISSN: 0887-3585


  68 in total

1.  The identification of conserved interactions within the SH3 domain by alignment of sequences and structures.

Authors:  S M Larson; A R Davidson
Journal:  Protein Sci       Date:  2000-11       Impact factor: 6.725

2.  Thoroughly sampling sequence space: large-scale protein design of structural ensembles.

Authors:  Stefan M Larson; Jeremy L England; John R Desjarlais; Vijay S Pande
Journal:  Protein Sci       Date:  2002-12       Impact factor: 6.725

3.  Definition of the tempo of sequence diversity across an alignment and automatic identification of sequence motifs: Application to protein homologous families and superfamilies.

Authors:  Alex C W May
Journal:  Protein Sci       Date:  2002-12       Impact factor: 6.725

4.  Sequence and structure continuity of evolutionary importance improves protein functional site discovery and annotation.

Authors:  A D Wilkins; R Lua; S Erdin; R M Ward; O Lichtarge
Journal:  Protein Sci       Date:  2010-07       Impact factor: 6.725

5.  Hierarchical clustering analysis of flexible GBR 12909 dialkyl piperazine and piperidine analogs.

Authors:  Kathleen M Gilbert; Carol A Venanzi
Journal:  J Comput Aided Mol Des       Date:  2006-07-20       Impact factor: 3.686

6.  Molecular basis for specificity in the druggable kinome: sequence-based analysis.

Authors:  Jianping Chen; Xi Zhang; Ariel Fernández
Journal:  Bioinformatics       Date:  2007-01-25       Impact factor: 6.937

7.  Effects of experimental choices and analysis noise on surveys of the "rare biosphere".

Authors:  Timothy J Hamp; W Joe Jones; Anthony A Fodor
Journal:  Appl Environ Microbiol       Date:  2009-03-06       Impact factor: 4.792

8.  Prediction of catalytic residues using the variation of stereochemical properties.

Authors:  Yongchao Dou; Xiaoqi Zheng; Jun Wang
Journal:  Protein J       Date:  2009-01       Impact factor: 2.371

9.  Sequence conservation in the prediction of catalytic sites.

Authors:  Yongchao Dou; Xingbo Geng; Hongyun Gao; Jialiang Yang; Xiaoqi Zheng; Jun Wang
Journal:  Protein J       Date:  2011-04       Impact factor: 2.371

10.  Structural and functional restraints on the occurrence of single amino acid variations in human proteins.

Authors:  Sungsam Gong; Tom L Blundell
Journal:  PLoS One       Date:  2010-02-12       Impact factor: 3.240
