Intermediate sequences increase the detection of homology between sequences.

J Park, S A Teichmann, T Hubbard, C Chothia.

Abstract

Two homologous sequences, which have diverged beyond the point where their homology can be recognised by a simple direct comparison, can be related through a third sequence that is suitably intermediate between the two. High scores for a sequence match between the first and third sequences, and between the second and third sequences, imply that the first and second sequences are related even though their own match score is low. We have tested the usefulness of this idea using a database that contains the sequences of 971 protein domains whose structures are known and whose residue identities with each other are some 40% or less (PDB40D). On the basis of sequence and structural information, 2143 pairs of these sequences are known to have an evolutionary relationship. FASTA, in an all-against-all comparison of the sequences in the database, detected 320 (15%) of these relationships as well as three false positives (i.e. a 1% error rate). Using intermediate sequences found by FASTA matches of PDB40D sequences to those in the large non-redundant OWL database, we could detect 550 evolutionary relationships with an error rate of 1%. This means the intermediate sequence procedure increases the ability to recognise the evolutionary relationships amongst the PDB40D sequences by 70%.
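The intermediate-sequence idea described in the abstract amounts to a transitive link: two database sequences are paired if each one matches the same sequence in a larger database. The sketch below is a simplified illustration, not the authors' procedure. It assumes search hits have already been computed (for example by FASTA) and are given as hypothetical `(query_id, target_id, e_value)` tuples with a single E-value cutoff; the function names `direct_pairs`, `intermediate_pairs` and `relative_gain` are hypothetical, and the sketch ignores details such as whether the two matches overlap on the intermediate sequence or how the cutoff is calibrated to a given error rate.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical hit format: (query_id, target_id, e_value), e.g. parsed from an
# all-against-all search. Running FASTA and parsing its output is not shown.


def direct_pairs(hits: list[tuple[str, str, float]], threshold: float) -> set[frozenset]:
    """Unordered pairs of database sequences whose direct match passes the cutoff."""
    pairs = set()
    for query, target, e_value in hits:
        if query != target and e_value <= threshold:  # skip self-matches
            pairs.add(frozenset((query, target)))
    return pairs


def intermediate_pairs(hits_to_large_db: list[tuple[str, str, float]],
                       threshold: float) -> set[frozenset]:
    """Pairs (a, b) linked because both match the same intermediate sequence.

    Simplification (assumption): any two significant matches to the same
    intermediate are accepted, regardless of which region of it they cover.
    """
    queries_by_intermediate = defaultdict(set)
    for query, intermediate, e_value in hits_to_large_db:
        if e_value <= threshold:
            queries_by_intermediate[intermediate].add(query)
    pairs = set()
    for queries in queries_by_intermediate.values():
        for a, b in combinations(sorted(queries), 2):
            pairs.add(frozenset((a, b)))
    return pairs


def relative_gain(baseline_count: int, improved_count: int) -> float:
    """Fractional increase in detected relationships over the baseline."""
    return (improved_count - baseline_count) / baseline_count


if __name__ == "__main__":
    # Toy data (illustrative only): d1-d2 is missed directly but linked via I1.
    direct_hits = [("d1", "d2", 0.5), ("d1", "d3", 1e-6)]
    large_db_hits = [("d1", "I1", 1e-8), ("d2", "I1", 1e-5), ("d3", "I2", 1e-4)]
    cutoff = 1e-3
    found = direct_pairs(direct_hits, cutoff) | intermediate_pairs(large_db_hits, cutoff)
    print(sorted(tuple(sorted(p)) for p in found))  # [('d1', 'd2'), ('d1', 'd3')]
    # Figures from the abstract: 320 direct vs 550 with intermediates.
    print(round(relative_gain(320, 550), 3))  # 0.719
```

Applied to the abstract's counts, `relative_gain(320, 550)` gives 230/320 ≈ 0.72, which is the reported increase of about 70%; likewise 320/2143 ≈ 15% and 550/2143 ≈ 26% of the known relationships.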


Year:  1997        PMID: 9367767     DOI: 10.1006/jmbi.1997.1288

Source DB:  PubMed          Journal:  J Mol Biol        ISSN: 0022-2836            Impact factor:   5.469


Related articles (59 in total; the first 10 are listed):

1.  The ASTRAL compendium for protein structure and sequence analysis.

Authors:  S E Brenner; P Koehl; M Levitt
Journal:  Nucleic Acids Res       Date:  2000-01-01       Impact factor: 16.971

2.  Assigning genomic sequences to CATH.

Authors:  F M Pearl; D Lee; J E Bray; I Sillitoe; A E Todd; A P Harrison; J M Thornton; C A Orengo
Journal:  Nucleic Acids Res       Date:  2000-01-01       Impact factor: 16.971

3.  Identification of related proteins with weak sequence identity using secondary structure information.

Authors:  C Geourjon; C Combet; C Blanchet; G Deléage
Journal:  Protein Sci       Date:  2001-04       Impact factor: 6.725

4.  Comparison of sequence profiles. Strategies for structural predictions using sequence information.

Authors:  L Rychlewski; L Jaroszewski; W Li; A Godzik
Journal:  Protein Sci       Date:  2000-02       Impact factor: 6.725

5.  Genome analysis: Assigning protein coding regions to three-dimensional structures.

Authors:  A A Salamov; M Suwa; C A Orengo; M B Swindells
Journal:  Protein Sci       Date:  1999-04       Impact factor: 6.725

6.  The CATH extended protein-family database: providing structural annotations for genome sequences.

Authors:  Frances M G Pearl; David Lee; James E Bray; Daniel W A Buchan; Adrian J Shepherd; Christine A Orengo
Journal:  Protein Sci       Date:  2002-02       Impact factor: 6.725

7.  Improved detection of homologous membrane proteins by inclusion of information from topology predictions.

Authors:  Maria Hedman; Hans Deloof; Gunnar Von Heijne; Arne Elofsson
Journal:  Protein Sci       Date:  2002-03       Impact factor: 6.725

8.  Gene3D: structural assignment for whole genes and genomes using the CATH domain structure database.

Authors:  Daniel W A Buchan; Adrian J Shepherd; David Lee; Frances M G Pearl; Stuart C G Rison; Janet M Thornton; Christine A Orengo
Journal:  Genome Res       Date:  2002-03       Impact factor: 9.043

9.  Classification of protein folds. (Review)

Authors:  Robert B Russell
Journal:  Mol Biotechnol       Date:  2002-01       Impact factor: 2.695

10.  Pcons: a neural-network-based consensus predictor that improves fold recognition.

Authors:  J Lundström; L Rychlewski; J Bujnicki; A Elofsson
Journal:  Protein Sci       Date:  2001-11       Impact factor: 6.725
