
Globally, unrelated protein sequences appear random.

Daniel T Lavelle, William R Pearson.

Abstract

MOTIVATION: To test whether protein folding constraints and secondary structure sequence preferences significantly reduce the space of amino acid words in proteins, we compared the frequencies of four- and five-amino acid word clumps (independent words) in proteins to the frequencies predicted by four random sequence models.
RESULTS: While the human proteome has many overrepresented word clumps, these words come from large protein families with biased compositions (e.g. Zn-fingers). In contrast, in a non-redundant sample of Pfam-AB, only 1% of four-amino acid word clumps (4.7% of 5mer words) are 2-fold overrepresented compared with our simplest random model [MC(0)], and 0.1% (4mers) to 0.5% (5mers) are 2-fold overrepresented compared with a window-shuffled random model. Using a false discovery rate q-value analysis, the number of exceptional four- or five-letter words in real proteins is similar to the number found when comparing words from one random model to another. Consensus overrepresented words are not enriched in conserved regions of proteins, but four-letter words are enriched 1.18- to 1.56-fold in alpha-helical secondary structures (but not beta-strands). Five-residue consensus exceptional words are enriched for alpha-helix 1.43- to 1.61-fold. Protein word preferences in regular secondary structure do not appear to significantly restrict the use of sequence words in unrelated proteins, although the consensus exceptional words have a secondary structure bias for alpha-helix. Globally, words in protein sequences appear to be under very few constraints; for the most part, they appear to be random.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
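
The comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that MC(0) means a composition-only model in which the expected count of a word is the number of word positions times the product of its residue frequencies, and it counts every overlapping word occurrence rather than the word clumps used in the paper. The function names, the `fold` parameter, and the `window` size are hypothetical; the abstract does not state the window length used for the window-shuffled model.

```python
from collections import Counter
from math import prod
import random


def count_words(seqs, k):
    """Count every overlapping k-letter word across the sequences.

    Assumption: plain overlapping counts, not the paper's word clumps.
    """
    counts = Counter()
    for seq in seqs:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def residue_frequencies(seqs):
    """Pooled single-residue frequencies, the only parameters of the model."""
    residues = Counter("".join(seqs))
    total = sum(residues.values())
    return {aa: n / total for aa, n in residues.items()}


def overrepresented_words(seqs, k, fold=2.0):
    """Return {word: observed/expected} for words at or above `fold`.

    Expected counts come from an assumed composition-only (zero-order) model.
    """
    freq = residue_frequencies(seqs)
    n_positions = sum(max(len(seq) - k + 1, 0) for seq in seqs)
    ratios = {}
    for word, observed in count_words(seqs, k).items():
        expected = n_positions * prod(freq[aa] for aa in word)
        if observed / expected >= fold:
            ratios[word] = observed / expected
    return ratios


def window_shuffle(seq, window, rng):
    """Shuffle residues inside consecutive windows, keeping local composition.

    `window` is a hypothetical parameter; the source does not give its value.
    """
    pieces = []
    for start in range(0, len(seq), window):
        chunk = list(seq[start:start + window])
        rng.shuffle(chunk)
        pieces.append("".join(chunk))
    return "".join(pieces)
```

Running `overrepresented_words` on real sequences and again on their `window_shuffle` counterparts gives a rough sense of how many words exceed the 2-fold threshold by chance. The false discovery rate q-value analysis mentioned in the abstract is not covered by this sketch.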

Year:  2009        PMID: 19948773      PMCID: PMC2852211          DOI: 10.1093/bioinformatics/btp660

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


