Globally, unrelated protein sequences appear random.

Abstract

MOTIVATION: To test whether protein folding constraints and secondary structure sequence preferences significantly reduce the space of amino acid words in proteins, we compared the frequencies of four- and five-amino acid word clumps (independent words) in proteins to the frequencies predicted by four random sequence models.
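The comparison described in the motivation can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' pipeline. A "clump" is taken to be a maximal run of overlapping occurrences of the same word. MC(0), the simplest random model named in the results, is read as a zero-order Markov chain that draws residues independently from the pooled amino acid composition. The expected count is estimated by simulation rather than analytically. The function names, replicate count, pseudocount and toy sequences are all hypothetical.

```python
import random
from collections import Counter


def count_clumps(seqs, k):
    """Count clumps of every k-letter word over a list of sequences.

    Assumption: a clump is a maximal run of overlapping occurrences of the
    same word, so "AAAAA" holds one clump of "AAAA", not two occurrences.
    """
    clumps = Counter()
    for seq in seqs:
        last_start = {}  # word -> start of its latest occurrence in this sequence
        for i in range(len(seq) - k + 1):
            word = seq[i:i + k]
            prev = last_start.get(word)
            if prev is None or i - prev >= k:
                clumps[word] += 1  # does not overlap the previous occurrence: new clump
            last_start[word] = i
    return clumps


def mc0_sequences(seqs, rng):
    """MC(0)-style background (assumed reading): each residue drawn independently
    from the pooled amino acid composition, keeping every sequence's length."""
    pool = "".join(seqs)
    return ["".join(rng.choices(pool, k=len(s))) for s in seqs]


def expected_clumps(seqs, k, n_rep=10, seed=0):
    """Mean clump count per word over n_rep simulated background proteomes."""
    rng = random.Random(seed)
    total = Counter()
    for _ in range(n_rep):
        total.update(count_clumps(mc0_sequences(seqs, rng), k))
    return {word: count / n_rep for word, count in total.items()}


def overrepresented(observed, expected, fold=2.0, pseudo=1.0):
    """Words whose observed clump count is at least `fold` times the expected one.

    The pseudocount (a hypothetical choice) keeps words never seen in the
    background from causing a division by zero.
    """
    hits = {}
    for word, obs in observed.items():
        ratio = (obs + pseudo) / (expected.get(word, 0.0) + pseudo)
        if ratio >= fold:
            hits[word] = ratio
    return hits


if __name__ == "__main__":
    # Made-up toy sequences, for illustration only.
    proteins = ["MKTAYIAKQRQISFVKSHFSRQLEERLGLIE", "MSHHWGYGKHNGPEHWHKDFPIAKGERQSPV"]
    obs = count_clumps(proteins, 4)
    exp = expected_clumps(proteins, 4, n_rep=50)
    for word, ratio in sorted(overrepresented(obs, exp).items()):
        print(word, round(ratio, 2))
```

On a toy input like this one, expected counts are close to zero, so most observed words clear the 2-fold bar. The comparison only becomes meaningful on proteome-scale data, where each word has a non-trivial background count.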
RESULTS: While the human proteome has many overrepresented word clumps, these words come from large protein families with biased compositions (e.g. Zn-fingers). In contrast, in a non-redundant sample of Pfam-AB, only 1% of four-amino acid word clumps (4.7% of 5mer words) are 2-fold overrepresented compared with our simplest random model [MC(0)], and 0.1% (4mers) to 0.5% (5mers) are 2-fold overrepresented compared with a window-shuffled random model. Using a false discovery rate q-value analysis, the number of exceptional four- or five-letter words in real proteins is similar to the number found when comparing words from one random model to another. Consensus overrepresented words are not enriched in conserved regions of proteins, but four-letter words are enriched 1.18- to 1.56-fold in alpha-helical secondary structures (but not beta-strands). Five-residue consensus exceptional words are enriched for alpha-helix 1.43- to 1.61-fold. Protein word preferences in regular secondary structure do not appear to significantly restrict the use of sequence words in unrelated proteins, although the consensus exceptional words have a secondary structure bias for alpha-helix. Globally, words in protein sequences appear to be under very few constraints; for the most part, they appear to be random.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
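Two further parts of the results can be sketched in the same spirit. The window-shuffled model is assumed here to shuffle residues within consecutive, non-overlapping windows. The window size is a free, hypothetical parameter, because the abstract does not give one. The q-values are computed with the Benjamini-Hochberg procedure, which is one common way to control the false discovery rate. The paper may have used a different estimator, such as Storey's q-value. How the per-word p-values are obtained is left open.

```python
import random


def window_shuffle(seq, window, rng):
    """Shuffle residues within consecutive, non-overlapping windows.

    Local composition is kept window by window, but word order inside each
    window is destroyed. `window` is a hypothetical free parameter.
    """
    parts = []
    for start in range(0, len(seq), window):
        chunk = list(seq[start:start + window])
        rng.shuffle(chunk)
        parts.append("".join(chunk))
    return "".join(parts)


def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p-values, one common way to report q-values."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest p first
    q = [0.0] * m
    running_min = 1.0  # also caps every q-value at 1
    # Walk from the largest p-value down, so each q is the minimum of m*p/rank
    # over itself and all larger p-values (this keeps q monotone in p).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q


if __name__ == "__main__":
    rng = random.Random(0)
    print(window_shuffle("MKTAYIAKQRQISFVKSHFSRQ", 10, rng))
    print(bh_qvalues([0.01, 0.04, 0.03, 0.5]))
```

In a fuller analysis, each word would receive a p-value for its observed clump count under a random model. Words with q below a chosen threshold would then be called exceptional. Running the same procedure on one random model against another gives the null comparison that the results refer to.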

Entities: Chemical Species

Substances: Proteins

Year: 2009 PMID: 19948773 PMCID: PMC2852211 DOI: 10.1093/bioinformatics/btp660

Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937