Twilight zone of protein sequence alignments.

B Rost.

Abstract

Sequence alignments unambiguously distinguish between protein pairs of similar and non-similar structure when the pairwise sequence identity is high (>40% for long alignments). The signal gets blurred in the twilight zone of 20-35% sequence identity. Here, more than a million sequence alignments were analysed between protein pairs of known structures to re-define a line distinguishing between true and false positives for low levels of similarity. Four results stood out. (i) The transition from the safe zone of sequence alignment into the twilight zone is described by an explosion of false negatives. More than 95% of all pairs detected in the twilight zone had different structures. More precisely, above a cut-off roughly corresponding to 30% sequence identity, 90% of the pairs were homologous; below 25% less than 10% were. (ii) Whether or not sequence homology implied structural identity depended crucially on the alignment length. For example, if 10 residues were similar in an alignment of length 16 (>60%), structural similarity could not be inferred. (iii) The 'more similar than identical' rule (discarding all pairs for which percentage similarity was lower than percentage identity) reduced false positives significantly. (iv) Using intermediate sequences for finding links between more distant families was almost as successful: pairs were predicted to be homologous when the respective sequence families had proteins in common. All findings are applicable to automatic database searches.
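Results (iii) and (iv) are simple enough to sketch in code. The following is a minimal illustration in Python, not the paper's implementation: the `AlignedPair` record, its field names, and both function names are hypothetical, and the percentage identity and similarity values are assumed to come from whatever alignment program is in use.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class AlignedPair:
    """One pairwise alignment hit (hypothetical record, for illustration only)."""
    query: str
    target: str
    pct_identity: float    # percentage of identical aligned residues
    pct_similarity: float  # percentage of similar aligned residues


def more_similar_than_identical(pairs: Iterable[AlignedPair]) -> List[AlignedPair]:
    """Apply the 'more similar than identical' rule from result (iii).

    Pairs whose percentage similarity is lower than their percentage
    identity are discarded; all other pairs are kept.
    """
    return [p for p in pairs if p.pct_similarity >= p.pct_identity]


def linked_by_intermediate(family_a: Iterable[str], family_b: Iterable[str]) -> bool:
    """Result (iv), simplified: two sequence families are taken to be linked
    when they have at least one protein in common."""
    return not set(family_a).isdisjoint(family_b)


# Example input with made-up values, not data from the paper.
hits = [
    AlignedPair("q1", "t1", pct_identity=30.0, pct_similarity=45.0),
    AlignedPair("q1", "t2", pct_identity=28.0, pct_similarity=22.0),
]
kept = more_similar_than_identical(hits)
```

Here `kept` retains only the `t1` hit, because the `t2` hit has lower similarity than identity. The sketch deliberately leaves out the alignment-length dependence from result (ii), since the abstract does not give the threshold curve.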

Year:  1999        PMID: 10195279     DOI: 10.1093/protein/12.2.85

Source DB:  PubMed          Journal:  Protein Eng        ISSN: 0269-2139

