Automatic generation and evaluation of sparse protein signatures for families of protein structural domains.

Matthew J Blades, Jon C Ison, Ranjeeva Ranasinghe, John B C Findlay.

Abstract

We identified key residues from the structural alignment of families of protein domains from SCOP which we represented in the form of sparse protein signatures. A signature-generating algorithm (SigGen) was developed and used to automatically identify key residues based on several structural and sequence-based criteria. The capacity of the signatures to detect related sequences from the SWISSPROT database was assessed by receiver operator characteristic (ROC) analysis and jack-knife testing. Test signatures for families from each of the main SCOP classes are described in relation to the quality of the structural alignments, the SigGen parameters used, and their diagnostic performance. We show that automatically generated signatures are potently diagnostic for their family (ROC50 scores typically >0.8), consistently outperform random signatures, and can identify sequence relationships in the "twilight zone" of protein sequence similarity (<40%). Signatures based on 15%-30% of alignment positions occurred most frequently among the best-performing signatures. When alignment quality is poor, sparser signatures perform better, whereas signatures generated from higher-quality alignments of fewer structures require more positions to be diagnostic. Our validation of signatures from the Globin family shows that when sequences from the structural alignment are removed and new signatures generated, the omitted sequences are still detected. The positions highlighted by the signature often correspond (alignment specificity >0.7) to the key positions in the original (non-jack-knifed) alignment. We discuss potential applications of sparse signatures in sequence annotation and homology modeling.
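The ROC50 scores quoted above can be illustrated with a minimal sketch of a truncated ROC score, following the general ROC_n idea used for evaluating sequence matching. This is not SigGen or the authors' evaluation code. The function name `roc_n`, the `(score, is_true)` hit representation, and the padding rule for hit lists containing fewer than `n` false positives are all assumptions made for illustration.

```python
from typing import Iterable, Tuple


def roc_n(hits: Iterable[Tuple[float, bool]], total_true: int, n: int = 50) -> float:
    """Truncated ROC score over the first n false positives (hypothetical helper).

    hits       -- (score, is_true) pairs, one per database hit; higher score = better
    total_true -- number of true family members in the searched database
    n          -- number of false positives at which the curve is truncated
    """
    if total_true <= 0 or n <= 0:
        raise ValueError("total_true and n must be positive")

    # Rank hits from best to worst score.
    ranked = sorted(hits, key=lambda h: h[0], reverse=True)

    true_seen = 0   # true positives ranked so far
    false_seen = 0  # false positives ranked so far
    area = 0        # sum of true positives ranked above each false positive

    for _score, is_true in ranked:
        if is_true:
            true_seen += 1
        else:
            false_seen += 1
            area += true_seen
            if false_seen == n:
                break

    # Assumption: if the hit list ends before n false positives are seen,
    # the remaining false positives are treated as ranked below all hits.
    area += (n - false_seen) * true_seen

    # Normalise so that a perfect ranking scores 1.0.
    return area / (n * total_true)


if __name__ == "__main__":
    example = [(9.0, True), (8.0, True), (7.0, False), (6.0, True), (5.0, False)]
    print(roc_n(example, total_true=3, n=2))
```

A score of 1.0 means every true family member was ranked above the first false positive. Under this sketch, a signature reported with ROC50 > 0.8 would recover most family members before 50 unrelated sequences appear in the ranked hit list.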

Year: 2005 · PMID: 15608116 · PMCID: PMC2253312 · DOI: 10.1110/ps.04929005

Source DB: PubMed · Journal: Protein Sci · ISSN: 0961-8368 · Impact factor: 6.725

