J C Ison, M J Blades, A J Bleasby, S C Daniel, J H Parish, J B Findlay.
Abstract
We extend the concept of the motif as a tool for characterizing protein families and explore the feasibility of a sparse "motif" that is the length of the protein sequence itself. The type of motif discussed is a sparse family signature consisting of a set of N key residue positions (A1, A2, ..., AN), each preceded by a gap (G), thus: G1A1G2A2...GNAN. Both the residue and the gap can be variable. A signature is matched to a protein sequence and scored using a dynamic programming algorithm which permits variability in gap distance and residue type. Generating a signature involves identifying residues associated with points of contact in interactions between secondary structure elements. A raw signature consists of a set of positions with potential key structural roles sampled from a sequence alignment constructed with reference to this contact data. Raw signatures are refined by sampling different gap-residue pairs until the specificity of a signature for the family cannot be further improved. We summarize signatures for nine protein families of diverse fold and function and present results of scans against the OWL protein sequence database. The implications of such signatures are discussed. Copyright 2000 Wiley-Liss, Inc.
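The matching step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the scoring scheme, so the per-gap and per-residue scores, the `mismatch` penalty, the function name `score_signature`, and the example signature below are all hypothetical. The sketch also assumes that the first gap is counted from the start of the sequence and that residues after the last key position are ignored.

```python
from typing import Dict, List, Tuple

NEG_INF = float("-inf")

# One signature element G_i A_i (hypothetical encoding):
#   gaps:     allowed gap length -> score for that gap
#   residues: allowed residue    -> score for that residue
Element = Tuple[Dict[int, float], Dict[str, float]]


def score_signature(seq: str, signature: List[Element],
                    mismatch: float = -1.0) -> float:
    """Best score for placing every key position of the signature on seq.

    Dynamic programming over (signature element, sequence position):
    cur[j] is the best score with the current key residue placed at seq[j].
    Returns -inf when no placement satisfies the allowed gap lengths.
    """
    n = len(seq)
    prev = None  # scores for the previous key residue, indexed by position
    for gaps, residues in signature:
        cur = [NEG_INF] * n
        for j in range(n):
            best = NEG_INF
            for gap, gap_score in gaps.items():
                k = j - gap - 1  # position of the previous key residue
                if prev is None:
                    # First element: the gap is measured from the sequence start.
                    if k == -1:
                        best = max(best, gap_score)
                elif k >= 0 and prev[k] > NEG_INF:
                    best = max(best, prev[k] + gap_score)
            if best > NEG_INF:
                # Residues not listed for this position get the mismatch penalty.
                cur[j] = best + residues.get(seq[j], mismatch)
        prev = cur
    return max(prev) if prev else NEG_INF


# Hypothetical two-position signature: a Cys within the first three residues,
# then a His (or Lys) two or three residues further on.
EXAMPLE_SIGNATURE: List[Element] = [
    ({0: 0.0, 1: 0.0, 2: 0.0}, {"C": 2.0}),
    ({2: 1.0, 3: 0.5}, {"H": 2.0, "K": 1.0}),
]
```

For the hypothetical signature above, `score_signature("ACDEH", EXAMPLE_SIGNATURE)` places Cys at position 1 and His at position 4 with a gap of two. Tolerating a mismatch rather than forbidding it is a design choice made here so that variable residue types degrade the score gradually.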
Year: 2000 PMID: 10842345
Source DB: PubMed Journal: Proteins ISSN: 0887-3585