Abstract
We describe an approach to the structure determination of large proteins that relies on experimental NMR chemical shifts, plus sparse nuclear Overhauser effect (NOE) data if available. Our alignment method, POMONA (protein alignments obtained by matching of NMR assignments), directly exploits pre-existing bioinformatics algorithms to match experimental chemical shifts to values predicted for the crystallographic database. Protein templates generated by POMONA are subsequently used as input for chemical shift-based Rosetta comparative modeling (CS-RosettaCM) to generate reliable full-atom models.
Year: 2015 PMID: 26053889 PMCID: PMC4521993 DOI: 10.1038/nmeth.3437
Source DB: PubMed Journal: Nat Methods ISSN: 1548-7091 Impact factor: 28.547
Figure 1. POMONA/CS-RosettaCM structure generation. (a) Flowchart of the POMONA/CS-RosettaCM structure generation protocol. (b–e) Results of POMONA/CS-RosettaCM structure generation for four representative test proteins: nsp1, sensory rhodopsin, maxacal and maltose binding protein (mbp). (b) For each of these, the POMONA alignment scores (H′, Eq. 5) of the top 1,000 protein chains in the PDB are plotted versus the Cα-RMSD, calculated over the aligned residues between the query and the database protein. Grey and black dots correspond to sequence identities <20% and ≥20%, respectively, between the query and database protein. After clustering analysis of the alignments with <20% sequence identity, alignments contained in the ten highest-scoring clusters are colored according to the cluster number, i.e., red, green, blue, magenta, dark-green, yellow, cyan, orange, grey and brown for clusters 1–10, respectively. Only the two highest-scoring alignments from each of these ten clusters are used as structural templates for CS-RosettaCM modeling. (c) Rosetta all-atom energy, including the experimental chemical shift score, for the CS-RosettaCM models versus their Cα-RMSD relative to the experimental structure. Colors correspond to those of the starting template. For comparison, the horizontal line and the graph at the bottom of each panel represent the lowest Rosetta all-atom energy and the normalized number of structures, respectively, obtained by CS-Rosetta. (d) Same as c but for POMONA/CS-RosettaCM modeling with additional sparse 1H-1H NOE data. (e) Ribbon models of the lowest-energy CS-RosettaCM structure (red), calculated without sparse NOEs, superimposed on the corresponding experimental structure (blue).
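The template selection described in panel (b) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the POMONA implementation: the `Alignment` record, its field names, and the `select_templates` function are hypothetical, cluster labels are assumed to come from a separate clustering step, a higher score is assumed to be better, and clusters are ranked here by their best-scoring member, which the caption does not specify.

```python
from collections import defaultdict
from typing import NamedTuple


class Alignment(NamedTuple):
    """Hypothetical record for one query-to-database alignment."""
    chain_id: str        # database chain identifier (illustrative)
    score: float         # alignment score, standing in for H'
    seq_identity: float  # sequence identity to the query, as a fraction
    cluster: int         # label assigned by a prior clustering step


def select_templates(alignments, max_identity=0.20, n_clusters=10, per_cluster=2):
    """Pick the top-scoring alignments from the top-scoring clusters."""
    by_cluster = defaultdict(list)
    for a in alignments:
        # Keep only remote hits (sequence identity below the cutoff).
        if a.seq_identity < max_identity:
            by_cluster[a.cluster].append(a)

    # Sort each cluster so its best-scoring member comes first.
    for members in by_cluster.values():
        members.sort(key=lambda a: a.score, reverse=True)

    # Assumption: rank clusters by the score of their best member.
    ranked = sorted(by_cluster.values(), key=lambda m: m[0].score, reverse=True)

    return [a for members in ranked[:n_clusters] for a in members[:per_cluster]]
```

With the defaults this returns at most 20 templates (ten clusters, two alignments each), matching the "up to 20 templates" mentioned in the table footnotes.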
Figure 2. Comparison of protein structure alignments obtained by different methods for the 16 proteins listed in Table 1. (a) Histogram of protein structure alignment quality, represented by the MaxSub score, for the top 1,000 alignments identified by POMONA (red bars), the sequence alignment method HHsearch (black), and the structure alignment method DALI (blue). Results are shown only for PDB proteins with <20% sequence identity to the target protein; DALI and HHsearch results correspond to the default thresholds of Z ≥ 2 and Prob ≥ 10%, respectively, used by these programs to identify homologues. The DALI histogram indicates the upper limit of how well any search program could possibly perform. Positive POMONA alignments are taken from the top ten clusters (solid red bars) within the top 1,000 alignments (solid + transparent red), as identified by the highest H′ score (Eq. 5). (b) Comparison of alignment quality obtained by the DALI and POMONA methods. For each of the positive alignments identified by both DALI and POMONA, the MaxSub scores are compared, with color representing sequence identity to the query protein (grey: ≥30%, blue: 20–30%, red: <20%) as observed in the DALI alignments.
Table 1. Performance of POMONA alignment and CS-RosettaCM structure generation for 16 test proteins.
| Name | Size (residues) | PDB/BMRB# | Fold | Homologues (DALI) | Alignments (POMONA) | CS-RosettaCM RMSDmean (Å) | CS-RosettaCM RMSDexp (Å) | CS-Rosetta RMSDexp (Å) |
|---|---|---|---|---|---|---|---|---|
| nsp1 | 113 | 2gdtA/7014 | α/β | 2/0/2(0.50) | 0.30/0.30 | 2.18±0.63 | 3.30±0.73 | 12.1±1.3 |
| HR2876B | 117 | 2ltmA | α/β | 2/4/75(0.47) | 0.41/0.41 | 2.89±0.72 | 4.21±0.55 | 6.41±2.76 |
| YR313A | 119 | 2ltlA | α/β | 1/2/52(0.45) | 0.26/0.25 | 1.60±0.27 | 3.67±0.45 | 2.80±0.68 |
| OR36 | 134 | 2lciA | α/β | 4/5/799(0.50) | 0.36/0.34 | 2.19±0.73 | 4.32±0.56 | 3.05±0.35 |
| OR135 | 83 | 2ln3A | α/β | 1/1/651(0.70) | 0.52/0.40 | 1.35±0.49 | 1.88±0.42 | 1.21±0.13 |
| HR2876C | 87 | 2m5oA | α/β | 4/4/723(0.57) | 0.35/0.33 | 1.77±0.27 | 2.24±0.42 | 1.17±0.20 |
| MTH1958 | 153 | 1tvgA/6344 | β | 5/14/147(0.76) | 0.53/0.51 | 1.30±0.18 | 2.35±0.17 | 10.4±4.9 |
| sgr145 | 173 | 3merA/16806 | α/β | 3/43/896(0.72) | 0.66/0.52 | 2.30±0.58 | 3.05±0.74 | 8.2±2.8 |
| fgf2 | 125 | 1basA/4091 | β | 262/23/449(0.83) | 0.74/0.65 | 1.06±0.18 | 1.56±0.19 | 11.7±1.5 |
| tpx | 167 | 2jszA | α/β | 49/376/389(0.70) | 0.69/0.68 | 1.60±0.23 | 2.32±0.22 | 17.7±2.0 |
| YwIE | 150 | 1zggA/6460 | α/β | 17/66/308(0.65) | 0.69/0.69 | 1.19±0.18 | 1.86±0.23 | 11.0±3.7 |
| fluA | 184 | 1n0sA/5756 | β/α | 11/38/413(0.63) | 0.51/0.51 | 2.01±0.49 | 3.46±0.34 | 8.5±1.5 |
| mad2 | 196 | 1go4C | α/β | 42/10/3(0.44) | 0.13/0.11 | 12.74±4.45 | 19.81±1.01 | 15.8±2.6 |
| s. rhodopsin | 222 | 2ksyA | α | 23/153/149(0.64) | 0.62/0.62 | 2.32±0.43 | 3.09±0.51 | 17.8±3.5 |
| maxacal | 269 | 1svnA | α/β | 273/79/4(0.51) | 0.50/0.50 | 3.29±0.57 | 4.51±0.85 | 19.4±2.6 |
| mbp | 370 | 1dmbA | α/β | 276/31/182(0.52) | 0.52/0.51 | 2.73±0.50 | 4.24±0.73 | 26.3±2.1 |
The PDB code for proteins with an NMR-derived structure as the reference.
DALI column: number of alignment hits with sequence identity of ≥30%, 20–30% and <20%, respectively, and an alignment length of at least 2/3 of the total number of target residues; the highest MaxSub value observed for the alignments with a sequence identity of <20% is listed in parentheses.
POMONA column: highest MaxSub value observed among all top 1,000 POMONA alignments (sequence identity <20%) and highest MaxSub score among the up to 20 templates used for subsequent CS-RosettaCM modeling.
Cα-RMSD value calculated for all non-flexible residues (identified by an RCI-S2 value ≥0.6; ref. 20). RMSDmean is the Cα-RMSD between the ten lowest-energy models and their mean coordinates. RMSDexp is the Cα-RMSD between the ten lowest-energy models (derived using database proteins with sequence identity <20%) and the experimental reference structure.
CS-RosettaCM and CS-Rosetta structures that met the acceptance criterion (see Online Methods). To convert a calculated RMSD value to its corresponding RMSD100 value (used in our work to evaluate convergence, see Online Methods), use RMSD100 = RMSD / (1 + ln √(N/100)), where N is the number of residues of the protein.
CS-Rosetta models with a lower Rosetta energy than obtained with the POMONA/CS-RosettaCM approach.
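As a worked illustration of the RMSD100 conversion mentioned in the footnotes, the following minimal Python sketch assumes the commonly used length normalization RMSD100 = RMSD / (1 + ln √(N/100)). The function name `rmsd100` and its argument names are hypothetical and are not taken from the original work.

```python
import math


def rmsd100(rmsd: float, n_residues: int) -> float:
    """Normalize an RMSD to that expected for a 100-residue protein.

    Assumes RMSD100 = RMSD / (1 + ln(sqrt(N / 100))).
    """
    if n_residues <= 0:
        raise ValueError("n_residues must be positive")
    denom = 1.0 + math.log(math.sqrt(n_residues / 100.0))
    # The normalization is only meaningful while the denominator is positive,
    # which fails for very short chains (roughly N < 14).
    if denom <= 0.0:
        raise ValueError("protein too short for RMSD100 normalization")
    return rmsd / denom


# A 100-residue protein is unchanged by the conversion.
print(rmsd100(2.0, 100))
# Larger proteins are scaled down, smaller ones scaled up.
print(rmsd100(4.24, 370), rmsd100(1.88, 83))
```

The normalization makes RMSD values comparable across the test set, where protein sizes range from 83 to 370 residues.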