Eric di Luccio, Patrice Koehl.
Abstract
BACKGROUND: Drug discovery typically starts with the identification of a potential target that is then tested and validated either through high-throughput screening against a library of drug compounds or by rational drug design. When the putative target is a protein, the latter approach requires knowledge of its structure. Finding the structure of a protein is, however, a difficult task. Significant progress has come from high-resolution techniques such as X-ray crystallography and NMR; there are, however, many proteins whose structures have not yet been solved. Computational techniques for structure prediction are viable alternatives to experimental techniques in these cases. However, the proper validation of the structural models they generate remains an issue.
Year: 2012 PMID: 23121764 PMCID: PMC3502507 DOI: 10.1186/2043-9113-2-18
Source DB: PubMed Journal: J Clin Bioinforma ISSN: 2043-9113
Figure 1. Flowchart for computing the H-factor. A. The scoring functions (1) and (2) are sequence-based. Score (1) compares the secondary structure prediction for the target sequence with the actual secondary structure assignment of the template protein. As an example, an extract of a sequence alignment between a template and a target is shown. The secondary structure of the template is indicated underneath the template sequence (C: coil, H: helix). The secondary structure prediction for the target, along with the confidence factor returned by PSIPRED, is indicated above the target sequence. Score (2) evaluates the sequence similarity between the target and template sequences. B. The scoring functions (3) and (4) evaluate the structural models: score (3) quantifies the structural diversity among the models, while score (4) identifies the Pfam domains in the target protein, collects the structures of these domains from the models to be tested, and compares these structures with those observed for the same domains in the PDB.
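The comparison behind score (1) can be illustrated with a small sketch. This is not the formula used for the H-factor, which the caption does not give; it is one plausible, confidence-weighted agreement measure, assuming that the predicted and template secondary structures are available as aligned strings and that the PSIPRED confidence is given as one integer (0 to 9) per position. The function name `ss_agreement` and the weighting scheme are assumptions made for illustration only.

```python
def ss_agreement(pred_ss, confidence, template_ss, gap="-"):
    """Confidence-weighted agreement between predicted and template
    secondary structure, in [0, 1].

    Hypothetical illustration only: the weighting is an assumption,
    not the published definition of score (1).
    """
    if not (len(pred_ss) == len(confidence) == len(template_ss)):
        raise ValueError("all inputs must cover the same aligned positions")

    total = 0.0   # sum of confidences over comparable positions
    agree = 0.0   # sum of confidences where the two states match
    for pred, conf, tmpl in zip(pred_ss, confidence, template_ss):
        if pred == gap or tmpl == gap:
            continue  # skip alignment gaps: nothing to compare
        total += conf
        if pred == tmpl:
            agree += conf
    return agree / total if total else 0.0


# Toy alignment extract (C: coil, H: helix), invented for the example.
score = ss_agreement("CHHHC", [9, 8, 7, 2, 9], "CHHCC")
print(round(score, 3))
```

With this weighting, a mismatch at a low-confidence position penalizes the agreement less than a mismatch at a position predicted with high confidence.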
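Comparison between the H-factor, cRMS, DOPE and QMEAN scores

| Target | Score (1) (a) | Score (2) (a) | Score (3) (a) | Score (4) (a) | H-factor | cRMS (Å) (b) | Sequence identity (%) (c) | DOPE (d) | QMEAN (e) |
|---|---|---|---|---|---|---|---|---|---|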
| T0295 | 1.0 | 1.9 | 1.3 | 3.8 | 19 | 1.67 | 46 | -33940 | 0.735 |
| T0295* (f) | 1.0 | 1.9 | 1.9 | 3.8 | 21 | 1.71 | 46 | -24317 | 0.196 |
| T0522 | 2.0 | 6.2 | 1.5 | 5.0 | 37 | 2.19 | 38 | -14185 | 0.771 |
| T0522* (f) | 2.0 | 6.9 | 2.0 | 5.2 | 40 | 2.71 | 38 | -12687 | 0.681 |
| T0521 | 4.2 | 7.5 | 3.4 | 5.4 | 51 | 4.09 | 24 | -18484 | 0.695 |
| T0521# (g) | 6.1 | 7.6 | 3.6 | 6.8 | 60 | 4.24 | 23 | -15968 | 0.508 |
| T0544 | 7.0 | 8.3 | 4.0 | 8.3 | 69 | 6.11 | 17 | -10652 | 0.192 |
(a) The four scoring functions included in the H-factor measure the quality of the secondary structure prediction (score (1)), the diversity of the sequence alignment (score (2)), the structural diversity of the models generated (score (3)), and the similarity of the predicted structures for the functional domains in the target, compared to the structures of the same domains found in the PDB (score (4)).
(b) Average cRMS (over Cα) between the 10 different models generated and the actual experimental structure for the target.
(c) Sequence identity between the target sequence and the template sequence.
(d) DOPE scores from MODELLER [24].
(e) QMEAN normalized score [25].
(f) T0295* and T0522* are experiments in which the sequence alignment between the target and its framework has been deliberately modified with a shift of a single amino acid in the sequence alignment (see text for details).
(g) T0521# is an experiment in which a less suitable template has been deliberately chosen compared to T0521 (see text for details).
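The two reference quantities in footnotes (b) and (c) can be sketched as follows. This is a minimal illustration, not the code used in the study: it assumes NumPy is available, that each model and the experimental structure are given as arrays of matching Cα coordinates, and that the alignment is given as two gapped strings of equal length. The function names are hypothetical, and the identity denominator (positions aligned without a gap) is one common convention among several.

```python
import numpy as np


def kabsch_crms(model, reference):
    """cRMS between two (N, 3) coordinate sets after optimal superposition."""
    P = np.asarray(model, dtype=float)
    Q = np.asarray(reference, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("expected two (N, 3) arrays of matching atoms")

    # Remove translation by centering both sets on their centroids.
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)

    # Kabsch algorithm: optimal rotation from the SVD of the covariance.
    U, _, Vt = np.linalg.svd(Pc.T @ Qc)
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0  # avoid reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T

    diff = Pc @ R.T - Qc
    return float(np.sqrt((diff ** 2).sum() / len(P)))


def average_crms(models, reference):
    """Mean cRMS of several models against one reference structure."""
    return float(np.mean([kabsch_crms(m, reference) for m in models]))


def sequence_identity(aligned_target, aligned_template, gap="-"):
    """Percent identity over positions aligned without a gap (assumed convention)."""
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aligned_target, aligned_template)
             if a != gap and b != gap]
    if not pairs:
        return 0.0
    return 100.0 * sum(a == b for a, b in pairs) / len(pairs)


# Toy data, invented for the example.
ref = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [1.5, 1.5, 0.0], [0.0, 1.5, 1.0]])
noisy = ref + np.array([[0.1, 0.0, 0.0], [0.0, -0.1, 0.0], [0.0, 0.0, 0.1], [-0.1, 0.0, 0.0]])
print(average_crms([ref, noisy], ref))
print(sequence_identity("ACD-EF", "ACDGEY"))
```

Because the superposition removes rigid-body translation and rotation, a model that differs from the reference only by its placement in space has a cRMS of zero.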
Figure 2. Comparison of experimental structures with their respective homology models. Left side panel: experimental X-ray structures of selected CASP7 (A) and CASP9 (B, C, D) targets. Right side panel: structural overlay of ten models for each selected CASP target, built with the closest available template in the Protein Data Bank. A and B depict “easy” modelling cases, while C and D are dramatically more challenging.
Figure 3. Case study: modelling of the SET domain of MMSET/NSD2. A. Structural overlay of a set of 10 models of NSD2-SET modelled using PDB 3OOI (NSD1-SET) as a template. B. Structural overlay of a set of 10 models of NSD2-SET modelled using PDB 4FMU (SETD2-SET) as a template. The models are represented as ribbons (top) and with the amino acid side-chains (bottom). The sequence identity between the template and NSD2-SET is indicated along with the H-factor.