Adam Zemla, Brian Geisbrecht, Jason Smith, Marisa Lam, Bonnie Kirkpatrick, Mark Wagner, Tom Slezak, Carol Ecale Zhou.
Abstract
Protein structural annotation and classification is an important and challenging problem in bioinformatics. Research towards analysis of sequence-structure correspondences is critical for better understanding of a protein's structure, function, and its interaction with other molecules. Clustering of protein domains based on their structural similarities provides valuable information for protein classification schemes. In this article, we attempt to determine whether structure information alone is sufficient to adequately classify protein structures. We present an algorithm that identifies regions of structural similarity within a given set of protein structures, and uses those regions for clustering. In our approach, called STRALCP (STRucture ALignment-based Clustering of Proteins), we generate detailed information about global and local similarities between pairs of protein structures, identify fragments (spans) that are structurally conserved among proteins, and use these spans to group the structures accordingly. We also provide a web server at http://as2ts.llnl.gov/AS2TS/STRALCP/ for selecting protein structures, calculating structurally conserved regions, and performing automated clustering.
Year: 2007 PMID: 18039711 PMCID: PMC2190701 DOI: 10.1093/nar/gkm1049
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 5. (a) Dendrogram showing the results of an LGA_S-based (single measure) clustering of 24 SCOP domains from fold a.8. Each code (entry_family) represents one protein from the SCOP classification: entry and family number. We used the R package (version 2.1.1; http://www.r-project.org/) for the hierarchical clustering and visualization of calculated LGA_S results from all-against-all structure comparisons. (b) Clustering created using the STRALCP algorithm with the default cutoff LGA_S = 60%.
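The idea of grouping structures from an all-against-all table of LGA_S scores can be illustrated with a minimal Python sketch. This is not the STRALCP algorithm and not the R procedure used for the dendrogram: it assumes average-linkage agglomerative clustering that stops merging once the best inter-cluster similarity drops below the 60% cutoff. The function name `cluster_by_cutoff`, the domain labels, and all scores are hypothetical.

```python
from itertools import combinations


def cluster_by_cutoff(names, lga_s, cutoff=60.0):
    """Average-linkage agglomerative clustering on similarity scores.

    Merging stops when the most similar pair of clusters scores below
    `cutoff`. This is an illustrative assumption, not STRALCP itself.
    """
    clusters = [[n] for n in names]

    def similarity(a, b):
        # Mean pairwise LGA_S between the members of two clusters.
        scores = [lga_s[frozenset((x, y))] for x in a for y in b]
        return sum(scores) / len(scores)

    while len(clusters) > 1:
        # Find the pair of clusters with the highest mean similarity.
        (i, j), best = max(
            (((i, j), similarity(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best < cutoff:
            break
        # combinations() guarantees i < j, so deleting j keeps i valid.
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


# Hypothetical LGA_S scores (percent) for four made-up domains.
toy_names = ["A", "B", "C", "D"]
toy_scores = {
    frozenset(("A", "B")): 85.0,
    frozenset(("A", "C")): 30.0,
    frozenset(("A", "D")): 25.0,
    frozenset(("B", "C")): 35.0,
    frozenset(("B", "D")): 20.0,
    frozenset(("C", "D")): 70.0,
}

if __name__ == "__main__":
    print(cluster_by_cutoff(toy_names, toy_scores))
```

With these toy scores, A and B merge first, then C and D; the two resulting clusters are too dissimilar to merge at the 60% cutoff. A dendrogram such as the one in panel (a) corresponds to running the merge loop to completion and recording the similarity at each merge.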
Figure 2. A 3D plot of the structural superposition between 1yn4_A and 1m4v_A (SCOP domain: d1m4va2), which corresponds to the fourth colored bar in Figure 1. The sequence identity between the two proteins (Seq_ID) is ∼14%, and the structure similarity (LGA_S) is ∼75%.
Figure 1. Structure similarities between EAP domains from S. aureus (PDB: 1yn3, 1yn4 and 1yn5) and 17 protein domains from the SCOP superfamily comprising superantigen toxins. All proteins were compared to the structure of EapH1 (1yn4_A), which serves as a frame of reference. Colored bars represent the Calpha–Calpha distance deviation between 1yn4_A [99 residues; from the left (N-terminal) to the right (C-terminal)] superimposed with 20 structures from PDB (the first bar represents a 1yn4_A–1yn4_A self-comparison). Colors represent distances between aligned residues and range from green (below 2 Å) to red (above 6 Å). The columns at the right give the level of sequence identity (Seq_ID) and structure similarity (LGA_S).
Figure 3. The results from the analysis of structure similarities between EAP domains from S. aureus and proteins from the SCOP superfamily of superantigen toxins (same domains as in Figure 1). SCOP domain d1f77a2 serves as a frame of reference for this comparison. The coloring scheme is the same as in Figures 1 and 2.
Figure 4. STRALCP clustering applied to the same set of 20 structures as in Figures 1 and 3. STRALCP calculations were performed using default parameters (LGA_S = 60%, DIST = 5 Å). Each row begins with the cluster number, followed by the domain name and the set of amino acids extracted from the detected structurally conserved spans. Dots indicate regions that structurally deviate in at least one pairwise comparison between members of the cluster. Note: dots do not indicate the actual number of residue pairs between detected spans. They are introduced for formatting purposes only.
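The row format described in this caption can be sketched in Python under simplifying assumptions. The sketch treats a residue as conserved only if its aligned distance is within the DIST cutoff in every pairwise comparison, and it joins the resulting spans with a fixed run of dots. It assumes all comparisons are indexed on one reference sequence, which is a simplification rather than the published procedure. The names `conserved_spans` and `render_row`, the sequence, and the distances are hypothetical.

```python
def conserved_spans(distances, dist_cutoff=5.0):
    """Return inclusive (start, end) index pairs of conserved residue runs.

    `distances` holds one list per pairwise comparison, giving the distance
    (in Angstroms) for each reference residue, or None where the residue is
    unaligned. A residue counts as conserved only if it is aligned and within
    `dist_cutoff` in every comparison.
    """
    n = len(distances[0])
    conserved = [
        all(row[k] is not None and row[k] <= dist_cutoff for row in distances)
        for k in range(n)
    ]
    spans, start = [], None
    for k, ok in enumerate(conserved):
        if ok and start is None:
            start = k  # a new span opens here
        elif not ok and start is not None:
            spans.append((start, k - 1))  # the span closed at the previous residue
            start = None
    if start is not None:
        spans.append((start, n - 1))
    return spans


def render_row(sequence, spans, gap="..."):
    """Join span residues with a fixed gap marker.

    The marker length is constant, so it does not encode how many
    residues separate two spans.
    """
    return gap.join(sequence[s:e + 1] for s, e in spans)


# Hypothetical 10-residue reference sequence and two pairwise comparisons.
toy_sequence = "MKTAYIAKQR"
toy_distances = [
    [1.2, 0.9, 1.5, 6.3, 7.1, 2.0, 1.8, 1.1, None, 3.0],
    [0.8, 1.1, 2.2, 4.0, 8.5, 2.5, 1.9, 5.5, 2.0, 2.8],
]

if __name__ == "__main__":
    spans = conserved_spans(toy_distances)
    print(spans, render_row(toy_sequence, spans))
```

In the toy data, residue 3 deviates in only one of the two comparisons, yet it is still excluded, which mirrors the caption's statement that dots mark regions deviating in at least one pairwise comparison.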