Arun S Konagurthu, Lloyd Allison, Peter J Stuckey, Arthur M Lesk.
Abstract
Simple and concise representations of protein-folding patterns provide powerful abstractions for the visualization, comparison, classification, searching and alignment of structural data. Structures are often abstracted by replacing standard secondary structural features, that is, helices and strands of sheet, by vectors or linear segments. Relying solely on standard secondary structure may result in a significant loss of structural information. Further, traditional methods of simplification crucially depend on the consistency and accuracy of external methods that assign secondary structure to protein coordinate data. Although many methods exist to identify secondary structure automatically, the imprecision of definitions, along with errors and inconsistencies in experimental structure data, drastically limits their ability to generate reliable simplified representations, especially for structural comparison. This article introduces a mathematically rigorous algorithm to delineate protein structure using the elegant statistical and inductive inference framework of minimum message length (MML). Our method generates consistent and statistically robust piecewise linear explanations of protein coordinate data, resulting in a powerful and concise representation of the structure. The delineation is completely independent of the approaches used by current methods, namely hydrogen-bonding patterns or inspection of local substructural geometry. Indeed, as is common with applications of the MML criterion, this method is free of parameters and thresholds, in striking contrast to existing programs, which are often beset by them. Analysis of results over a large number of proteins suggests that the method produces a consistent delineation of structures that encompasses, among others, the segments corresponding to standard secondary structure. AVAILABILITY: http://www.csse.monash.edu.au/~karun/pmml
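As a rough illustration of the idea in the abstract, the sketch below chooses breakpoints for a piecewise linear explanation of a chain of 3-D points by minimizing a two-part message length with dynamic programming. It is a toy under stated assumptions, not PMML's actual encoding. The cost model (endpoints stated outright to a fixed precision, every intermediate point stated as an isotropic Gaussian offset from its segment) is hypothetical, and so are the constants `EPSILON`, `RANGE` and `SIGMA`. PMML is described as free of parameters and thresholds, whereas this sketch fixes them by hand for simplicity.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]

# Hypothetical constants for this toy model (not values used by PMML).
EPSILON = 0.001  # assumed coordinate precision (Å)
RANGE = 100.0    # assumed extent of the coordinate box (Å)
SIGMA = 0.5      # assumed spread (Å) of points about their segment

# Bits to state one endpoint outright: 3 coordinates, each to precision EPSILON.
ENDPOINT_BITS = 3 * math.log2(RANGE / EPSILON)


def distance_to_segment(p: Point, a: Point, b: Point) -> float:
    """Euclidean distance from p to the closed segment ab."""
    ab = [b[k] - a[k] for k in range(3)]
    ap = [p[k] - a[k] for k in range(3)]
    denom = sum(c * c for c in ab)
    t = 0.0
    if denom > 0:
        t = max(0.0, min(1.0, sum(ap[k] * ab[k] for k in range(3)) / denom))
    closest = tuple(a[k] + t * ab[k] for k in range(3))
    return math.dist(p, closest)


def intermediate_bits(d: float) -> float:
    """Bits for a point stated as a 3-D Gaussian offset (std SIGMA) of size d."""
    norm = 3 * math.log2(math.sqrt(2 * math.pi) * SIGMA / EPSILON)
    return norm + d * d / (2 * SIGMA ** 2 * math.log(2))


def segment_bits(points: Sequence[Point], i: int, j: int) -> float:
    """Bits to state endpoint j plus every point strictly between i and j."""
    bits = ENDPOINT_BITS
    for k in range(i + 1, j):
        bits += intermediate_bits(distance_to_segment(points[k], points[i], points[j]))
    return bits


def mml_segment(points: Sequence[Point]) -> Tuple[List[int], float]:
    """Dynamic programming over breakpoints; returns breakpoint indices and total bits."""
    n = len(points)
    best = [math.inf] * n
    prev = [-1] * n
    best[0] = ENDPOINT_BITS  # the first point is always stated outright
    for j in range(1, n):
        for i in range(j):
            cost = best[i] + segment_bits(points, i, j)
            if cost < best[j]:
                best[j], prev[j] = cost, i
    # Walk back from the last point to recover the chosen breakpoints.
    breaks = [n - 1]
    while breaks[-1] != 0:
        breaks.append(prev[breaks[-1]])
    return breaks[::-1], best[-1]


if __name__ == "__main__":
    l_shape = [(float(k), 0.0, 0.0) for k in range(6)] + [(5.0, float(k), 0.0) for k in range(1, 6)]
    print(mml_segment(l_shape)[0])  # [0, 5, 10]
```

On the L-shaped chain, the extra endpoint at the corner costs far fewer bits than the deviations it removes, so the chain is split at index 5. On a straight chain, any extra breakpoint only adds bits, so a single segment wins.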
Year: 2011 PMID: 21685100 PMCID: PMC3117365 DOI: 10.1093/bioinformatics/btr240
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. Deviations Δs, Δt and Δu of the intermediate points from a line segment joining two endpoints. (See main text.)
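The caption does not define Δs, Δt and Δu here, so the following is only a sketch under an assumption. It expresses each intermediate point's offset in an orthonormal frame attached to the segment (one axis along it, two across it), which is one common way to parameterize deviations from a line. The frame construction and all function names are hypothetical, not taken from PMML.

```python
import math
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]


def _sub(u: Vec, v: Vec) -> Vec:
    return (u[0] - v[0], u[1] - v[1], u[2] - v[2])


def _dot(u: Vec, v: Vec) -> float:
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]


def _cross(u: Vec, v: Vec) -> Vec:
    return (u[1] * v[2] - u[2] * v[1], u[2] * v[0] - u[0] * v[2], u[0] * v[1] - u[1] * v[0])


def _unit(u: Vec) -> Vec:
    n = math.sqrt(_dot(u, u))
    return (u[0] / n, u[1] / n, u[2] / n)


def segment_frame(a: Vec, b: Vec) -> Tuple[Vec, Vec, Vec]:
    """Orthonormal frame: e1 along a->b; e2, e3 span the plane across it (assumes a != b)."""
    e1 = _unit(_sub(b, a))
    # Pick a helper axis that is not nearly parallel to e1, then orthogonalize it.
    helper: Vec = (1.0, 0.0, 0.0) if abs(e1[0]) < 0.9 else (0.0, 1.0, 0.0)
    h = _dot(helper, e1)
    e2 = _unit((helper[0] - h * e1[0], helper[1] - h * e1[1], helper[2] - h * e1[2]))
    e3 = _cross(e1, e2)
    return e1, e2, e3


def deviations(points: Sequence[Vec]) -> List[Tuple[float, float, float]]:
    """(along, across_1, across_2) for each point strictly between points[0] and points[-1]."""
    a, b = points[0], points[-1]
    e1, e2, e3 = segment_frame(a, b)
    out = []
    for p in points[1:-1]:
        v = _sub(p, a)
        out.append((_dot(v, e1), _dot(v, e2), _dot(v, e3)))
    return out
```

For endpoints (0, 0, 0) and (2, 0, 0), the point (1, 3, 4) lies 1 Å along the segment with across-components 3 and 4, so its perpendicular distance is `math.hypot(3, 4)`, that is, 5 Å.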
Fig. 2. Distribution of the ratio of the number of line segments to the number of residues per structure in the dataset. Ratios are expressed as percentages and rounded to the nearest integer.
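The quantity plotted in Fig. 2 is simple arithmetic, and a minimal sketch follows. Any (segments, residues) pairs passed to it are made-up inputs, not data from the paper. The caption does not say how halves were rounded, so half-up rounding is an assumption.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Tuple


def segment_ratio_percent(num_segments: int, num_residues: int) -> int:
    """Segments per residue as a percentage, rounded half-up to the nearest integer."""
    # Python's round() rounds halves to even; floor(x + 0.5) rounds halves up instead.
    return math.floor(100.0 * num_segments / num_residues + 0.5)


def ratio_histogram(structures: Iterable[Tuple[int, int]]) -> Dict[int, int]:
    """Map each rounded percentage to the number of structures that have it."""
    return dict(Counter(segment_ratio_percent(s, r) for s, r in structures))
```

For example, a hypothetical structure with 7 segments over 45 residues has a ratio of 15.6%, which is reported as 16.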
Geometric profiles of ideal secondary structures, used to coarsely classify the delineation identified by the program. ϕ and ψ are average backbone dihedral angles (°), n is the periodicity of the local structure (residues per turn), ρ is the rise (Å per residue) and p is the pitch (Å per turn).
| Type | ϕ (°) | ψ (°) | n | ρ (Å) | p (Å) |
|---|---|---|---|---|---|
| 3₁₀-Helix | −57.1 | −69.7 | 3.0 | 2.0 | 6.0 |
| α-Helix | −57.8 | −47.0 | 3.6 | 1.5 | 5.5 |
| π-Helix | −74.0 | −4.0 | 4.4 | 1.1 | 5.0 |
| β-Strand | −139.0 | 135.0 | 2.0 | 3.4 | 6.8 |
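The table gives ideal profiles but not the exact classification rule. The sketch below is therefore a hypothetical stand-in: it estimates a segment's average rise per residue from its length and residue count, then picks the profile with the closest rise ρ. Only the numbers in `IDEAL_PROFILES` come from the table. The nearest-rise rule and the function names are assumptions, not PMML's documented procedure.

```python
from typing import Dict, Tuple

# (n residues per turn, rise ρ in Å per residue, pitch p in Å per turn), from the table above.
IDEAL_PROFILES: Dict[str, Tuple[float, float, float]] = {
    "3_10-helix": (3.0, 2.0, 6.0),
    "alpha-helix": (3.6, 1.5, 5.5),
    "pi-helix": (4.4, 1.1, 5.0),
    "beta-strand": (2.0, 3.4, 6.8),
}


def rise_per_residue(length_angstrom: float, num_residues: int) -> float:
    """Average rise along a linear segment that spans num_residues consecutive residues."""
    return length_angstrom / (num_residues - 1)


def coarse_class(length_angstrom: float, num_residues: int) -> str:
    """Hypothetical rule: choose the ideal profile whose rise ρ is closest to the segment's."""
    rise = rise_per_residue(length_angstrom, num_residues)
    return min(IDEAL_PROFILES, key=lambda name: abs(IDEAL_PROFILES[name][1] - rise))
```

A 15.0 Å segment over 11 residues has a rise of 1.5 Å per residue and maps to the α-helix profile. The listed pitch is roughly n × ρ (for example, 3.6 × 1.5 = 5.4 Å against 5.5 Å for the α-helix), so either ρ or p could serve as the discriminating feature in a rule like this.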
Percentage agreement of helix and strand assignments between methods.
| Comparison | Helices (%) | Strands (%) |
|---|---|---|
| PMML (coarse) vs DSSP | 79.0 | 83.3 |
| PMML (coarse) vs STRIDE | 79.3 | 83.1 |
| PMML (refine) vs DSSP | 92.6 | 92.4 |
| PMML (refine) vs STRIDE | 91.3 | 92.1 |
| STRIDE vs DSSP | 95.7 | 96.9 |
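The table does not state how agreement was computed. One plausible per-residue definition is sketched below: of the residues that a reference method labels as helix (or strand), the percentage that the other method also labels that way. The one-letter encoding (`"H"`, `"E"`, anything else for other states) and the directional definition are assumptions.

```python
def percent_agreement(reference: str, other: str, label: str) -> float:
    """Percentage of residues labelled `label` in `reference` that `other` also labels `label`."""
    if len(reference) != len(other):
        raise ValueError("assignments must cover the same residues")
    ref_positions = [k for k, c in enumerate(reference) if c == label]
    if not ref_positions:
        return float("nan")  # no residues of this class in the reference
    matched = sum(1 for k in ref_positions if other[k] == label)
    return 100.0 * matched / len(ref_positions)
```

Because this definition is directional, swapping the reference and the compared method can change the figure. A symmetric alternative would divide by the number of residues that either method assigns to the class.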
Fig. 3. Wall-eye stereo image of the 1.8 Å crystal structure of oxidized Clostridium beijerinckii flavodoxin. Each segment delineated by PMML is shown in a different color. The secondary structure elements (helices and strands of sheet) were taken from wwPDB entry 5NLL and are shown as thick ribbons, together with their labels. The bound FMN cofactor is shown as thin lines at the top of the structure.
The residue ranges of secondary structure elements (SSEs) in the structure of flavodoxin shown in Fig. 3.
| SSE | wwPDB | PMML |
|---|---|---|
| β1 | Lys2-Trp6 | Met1-Tyr5 |
| α | Asn11-Glu25 | Asn11-Gly27 |
| β2 | Asn31-Asn34 | Gly27-Ile33 |
| α | Ile40-Asn45 | Asn39-Glu46 |
| β3 | Ile48-Cys53 | Asp47-Cys53 |
| α | Phe66-Lys76 | Glu65-Thr75 |
| β4 | Lys81-Tyr88 | Gly79-Ser87 |
| α | Lys94-Gly105 | Gly91-Gly107 |
| β5 | Leu115-Gln118 | Glu112-Gln118 |
| α | Asp122-Ile136 | Glu120-Gln126, Gln126-Ile136 |
Rows list the SSEs in order of appearance along the chain, from the N- to the C-terminus. The wwPDB column gives the residue ranges of the SSEs as annotated in wwPDB entry 5NLL; the PMML column gives the corresponding residue ranges of the segmentation produced by PMML.
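To compare the two columns of the flavodoxin table programmatically, the residue ranges can be parsed and their boundaries compared. The sketch below is a hypothetical helper, not part of PMML. It reads ranges written as three-letter residue name plus number, separated by a hyphen or an en dash, and reports how far the PMML start and end lie from the wwPDB start and end. "Boundary shift" is a measure chosen here for illustration.

```python
import re
from typing import List, Tuple

# Matches ranges such as "Lys2-Trp6" or "Asn31–Asn34" (hyphen or en dash).
RANGE_RE = re.compile(r"[A-Z][a-z]{2}(\d+)\s*[-\u2013]\s*[A-Z][a-z]{2}(\d+)")


def parse_ranges(text: str) -> List[Tuple[int, int]]:
    """Return (start, end) residue numbers for every range in a table cell."""
    return [(int(s), int(e)) for s, e in RANGE_RE.findall(text)]


def boundary_shift(reference: str, segmentation: str) -> Tuple[int, int]:
    """Shift of the first start and last end of `segmentation` relative to `reference`."""
    ref = parse_ranges(reference)
    seg = parse_ranges(segmentation)
    return (seg[0][0] - ref[0][0], seg[-1][1] - ref[-1][1])
```

For the β1 row, `boundary_shift("Lys2-Trp6", "Met1-Tyr5")` gives `(-1, -1)`: the PMML segment starts and ends one residue earlier than the wwPDB strand. A cell holding two PMML segments, such as the last α row, is measured from the start of the first segment to the end of the last.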