Zita Harmat, Dániel Dudola, Zoltán Gáspári.
Abstract
Ensemble-based structural modeling of flexible protein segments such as intrinsically disordered regions is a complex task often solved by selection of conformers from an initial pool based on their conformity to experimental data. However, the properties of the conformational pool are crucial, as the sampling of the conformational space should be sufficient and, in the optimal case, relatively uniform. In other words, the ideal sampling is both efficient and exhaustive. To achieve this, specialized tools are usually necessary, which might not be maintained in the long term, available on all platforms or flexible enough to be tweaked to individual needs. Here, we present an open-source and extendable pipeline to generate initial protein structure pools for use with selection-based tools to obtain ensemble models of flexible protein segments. Our method is implemented in Python and uses ChimeraX, Scwrl4, Gromacs and neighbor-dependent backbone distributions compiled and published previously by the Dunbrack lab. All these tools and data are publicly available and maintained. Our basic premise is that by using residue-specific, neighbor-dependent Ramachandran distributions, we can enhance the efficient exploration of the relevant region of the conformational space. We have also provided a straightforward way to bias the sampling towards specific conformations for selected residues by combining different conformational distributions. This allows the consideration of a priori known conformational preferences such as in the case of preformed structural elements. The open-source and modular nature of the pipeline allows easy adaptation for specific problems. We tested the pipeline on an intrinsically disordered segment of the protein Cd3ϵ and also a single-alpha helical (SAH) region by generating conformational pools and selecting ensembles matching experimental data using the CoNSEnsX+ server.
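The idea of biasing the sampling by combining conformational distributions can be illustrated with a minimal sketch. This is not the DIPEND code: the Gaussian toy grids stand in for the Dunbrack neighbor-dependent distributions, and the function names, the 5-degree grid, the basin centers and the 0.7 mixing weight are all assumptions chosen only for illustration.

```python
import numpy as np

def make_grid_distribution(center_phi, center_psi, width, step=5):
    """Toy (phi, psi) probability grid: one periodic Gaussian basin (hypothetical stand-in)."""
    angles = np.arange(-180, 180, step)
    phi, psi = np.meshgrid(angles, angles, indexing="ij")
    # Wrap angular differences into [-180, 180) so the basin is periodic.
    dphi = (phi - center_phi + 180) % 360 - 180
    dpsi = (psi - center_psi + 180) % 360 - 180
    density = np.exp(-(dphi**2 + dpsi**2) / (2 * width**2))
    return angles, density / density.sum()

def combine(p_base, p_bias, weight):
    """Weighted mixture of two normalized grids; weight is the fraction given to the bias."""
    mixed = (1.0 - weight) * p_base + weight * p_bias
    return mixed / mixed.sum()

def sample_dihedrals(angles, prob, n, rng):
    """Draw n (phi, psi) pairs from a normalized grid."""
    flat = rng.choice(prob.size, size=n, p=prob.ravel())
    i, j = np.unravel_index(flat, prob.shape)
    return angles[i], angles[j]

rng = np.random.default_rng(0)
angles, p_coil = make_grid_distribution(-75, 145, width=40)  # broad, extended-like basin
_, p_helix = make_grid_distribution(-60, -45, width=15)      # narrow, helix-like basin
p_mixed = combine(p_coil, p_helix, weight=0.7)               # bias one residue towards helix
phi, psi = sample_dihedrals(angles, p_mixed, n=5000, rng=rng)

# Fraction of sampled psi values that fall in a helix-like window.
helical_fraction = float(np.mean((psi > -120) & (psi < 30)))
print(f"helix-like fraction: {helical_fraction:.2f}")
```

With a linear mixture, the weight directly controls how often the biased basin is visited, so a residue in a preformed helix can be pushed towards helical dihedrals while still sampling the background distribution.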
Keywords: dihedral angle; intrinsically disordered proteins; local interaction; principal component analysis; protein ensemble model; structure prediction
Year: 2021 PMID: 34680137 PMCID: PMC8534045 DOI: 10.3390/biom11101505
Source DB: PubMed Journal: Biomolecules ISSN: 2218-273X
Figure 1. (A) Flowchart of the steps of the pipeline DIPEND (DIsordered Protein Ensembles from Neighbor-dependent Distributions). (B) Flowchart of the unknotting part of the DIPEND pipeline.
RMSD and correlation of the back-calculated chemical shifts of the generated ensembles of the Cd3 segment.
| | Generated 5000 / rmsd | Generated 5000 / corr. | Selected by CA+CB rmsd / rmsd | Selected by CA+CB rmsd / corr. | Selected by CA rmsd / rmsd | Selected by CA rmsd / corr. | Selected from MD (CA rmsd, / rmsd | Selected from MD (CA rmsd, / corr. |
|---|---|---|---|---|---|---|---|---|
| CA full | 0.458 | 0.996 | 0.211 | 0.999 | 0.073 | 1.000 | 0.400 | 0.997 |
| CA secondary | | 0.296 | | 0.696 | | 0.953 | | 0.451 |
| CB full | 0.550 | 0.999 | 0.137 | 1.000 | 0.601 | 0.998 | 0.831 | 0.997 |
| CB secondary | | 0.233 | | 0.764 | | 0.322 | | 0.301 |
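The two agreement measures in the table, rmsd and correlation between observed and back-calculated chemical shifts, can be sketched as follows. This is an illustration only, not the CoNSEnsX+ implementation: the use of Pearson correlation is an assumption, the calculated values are assumed to be already averaged over the ensemble, and all numbers are made up. The sketch also shows why full shifts tend to correlate almost perfectly while secondary shifts (observed minus random-coil value) are a stricter test: subtracting the same random-coil value from both sides leaves the rmsd unchanged but removes the residue-type spread that dominates the correlation.

```python
import numpy as np

def rmsd(observed, calculated):
    """Root-mean-square deviation between two equally long value lists."""
    diff = np.asarray(observed, float) - np.asarray(calculated, float)
    return float(np.sqrt(np.mean(diff**2)))

def pearson(observed, calculated):
    """Pearson correlation coefficient between two value lists."""
    return float(np.corrcoef(observed, calculated)[0, 1])

# Made-up CA shifts (ppm) for five residues; not data from the paper.
random_coil = np.array([58.2, 56.4, 62.1, 52.3, 55.9])
observed = random_coil + np.array([0.4, -0.3, 0.9, -0.1, 0.2])
calculated = random_coil + np.array([0.1, -0.5, 0.6, 0.3, 0.0])

full = (rmsd(observed, calculated), pearson(observed, calculated))
secondary = (
    rmsd(observed - random_coil, calculated - random_coil),
    pearson(observed - random_coil, calculated - random_coil),
)
print("full:      rmsd=%.3f corr=%.3f" % full)
print("secondary: rmsd=%.3f corr=%.3f" % secondary)
```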
Figure 2. Overview of the Cd3 ensemble. (A) Disorder propensity as predicted by the IUPred3 server. (B) Observed and calculated CA secondary chemical shifts. (C) Secondary structure logo for the ensemble selected based on CA chemical shifts. DSSPcont states are averaged for all models. Figure prepared with Weblogo. (D) PCA of the simulated, generated and selected (sub)ensembles. (E) Ramachandran plot of all residues in the ITAM1 (dark-green) and ITAM2 (purple) motifs in all structures of the selected ensemble.
RMSD and correlation of selected back-calculated NMR parameters of MYO VI SAH ensembles.
| | 6OBI (10 Models) / rmsd | 6OBI (10 Models) / corr. | Generated 5000 / rmsd | Generated 5000 / corr. | Selected (37 Models) / rmsd | Selected (37 Models) / corr. |
|---|---|---|---|---|---|---|
| N-H RDC | 4.021 | 0.746 | 7.304 | 0.756 | 3.676 | 0.917 |
| H-C RDC | 1.709 | 0.456 | 1.440 | 0.671 | 1.058 | 0.807 |
| N-C RDC | 0.822 | 0.118 | 0.541 | 0.576 | 0.351 | 0.838 |
| 3JHNHA | 0.602 | 0.789 | 0.765 | 0.625 | 0.518 | 0.903 |
| CA secondary | 0.969 | 0.712 | 0.866 | 0.705 | 0.688 | 0.903 |
| CB secondary | 0.915 | 0.389 | 0.967 | 0.561 | 0.970 | 0.587 |
| N-H S2 | 0.201 | 0.488 | 0.258 | 0.871 | ||
Figure 3. Measured and back-calculated NMR parameters and structural characteristics for the selected SAH ensemble. (A) Measured (red) and calculated (blue) values of some NMR parameters for the selected SAH ensemble. Calculated values were obtained with CoNSEnsX+. (B) Ribbon representation of the selected 37 conformers of the MYO VI SAH domain, superimposed for residues 28–42. Rainbow coloring from N to C terminus. Figure prepared with UCSF Chimera. (C) Secondary structure logo generated from averaging all DSSP state probabilities calculated with DSSPcont for all 37 models. Figure prepared with Weblogo. (D) PCA plot showing the distribution along modes 1–2 of the generated 5000 (orange), the selected 37 (black) structures and the 10 deposited conformers in PDB entry 6OBI (purple). Mode 1 corresponds to the end-to-end distance of the structures.
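A PCA of a conformer pool, of the kind shown in panel (D), can be sketched as below. This record does not state which structural features the authors used, so the choice of pairwise CA-CA distances, the toy chain model and all function names are assumptions made for illustration. In the toy pool the only systematic variation is how strongly each chain bends, so the first mode is expected to track the end-to-end distance.

```python
import numpy as np

def pairwise_distance_features(coords):
    """coords: (n_models, n_atoms, 3) -> (n_models, n_pairs) interatomic distances."""
    diff = coords[:, :, None, :] - coords[:, None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    iu = np.triu_indices(coords.shape[1], k=1)
    return dist[:, iu[0], iu[1]]

def pca(features, n_modes=2):
    """PCA via SVD of the mean-centered feature matrix."""
    centered = features - features.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_modes] * s[:n_modes]   # projection of each model on each mode
    explained = s**2 / np.sum(s**2)         # fraction of variance per mode
    return scores, explained[:n_modes]

# Toy pool: planar chains of 12 pseudo-CA atoms bent by a random, constant curvature.
rng = np.random.default_rng(1)
n_models, n_atoms, bond = 200, 12, 3.8
curvature = rng.uniform(0.02, 0.25, size=n_models)   # radians of turning per bond
theta = curvature[:, None] * np.arange(n_atoms)[None, :]
x = np.cumsum(bond * np.cos(theta), axis=1)
y = np.cumsum(bond * np.sin(theta), axis=1)
coords = np.stack([x, y, np.zeros_like(x)], axis=-1)
coords += rng.normal(scale=0.1, size=coords.shape)   # small positional noise

scores, explained = pca(pairwise_distance_features(coords))
end_to_end = np.linalg.norm(coords[:, -1] - coords[:, 0], axis=1)
mode1_vs_distance = float(np.corrcoef(scores[:, 0], end_to_end)[0, 1])
print("explained variance (modes 1-2):", np.round(explained, 3))
print("|corr(mode 1, end-to-end distance)|:", round(abs(mode1_vs_distance), 3))
```

The sign of a principal component is arbitrary, so only the magnitude of the correlation between mode 1 and the end-to-end distance is meaningful.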