| Subject area | Chemistry, Biology, Computer Science |
| More specific subject area | Structural molecular biology, Protein modeling, Molecular structure prediction |
| Type of data | Protein structures (PDB format), tables, charts, figures, etc. |
| How data was acquired | Input for the integrative modeling pipeline was acquired from two public databases: the Protein Data Bank and the Electron Microscopy Data Bank. Preliminary stage models were produced using Swiss-Model and I-TASSER software. The structural quality of the models was evaluated using several software tools, including MolProbity, ERRAT, PROSA II, ModEval, PDB validation suite, Verify3D, ProCheck, and PSVS, a suite that computes a set of quality scores including some of those mentioned here. Energy minimization and minor structural refinement were done using the KoBaMin server. Conformational search, assembly, and assessment of quaternary contact quality were performed using the F2Dock and MolEnergy software suites. Correlation of the atomic model with electron microscopy data was carried out using the PF2Fit software. We used R for statistical analysis, and TexMol and PyMol for visualization. Detailed citations for these software packages are included in the article. |
| Data format | Raw: computationally predicted models (PDB format). Analyzed: molecular properties of models; charts, figures, etc. |
| Experimental factors | The protocol used to resolve the complete structure of gp120 in complex with CD4 and 17b involved multiple steps, each involving computational clustering and pruning of data. Please see the main article [26] for details, as well as the brief description in the body of this article. Note that this is a computational modeling protocol, and there was no experimental pretreatment of samples in the traditional sense. |
| Experimental features | We pose the problem of computational modeling of a protein as follows: given the primary sequence of a protein, a set of available partial structures at atomic resolution, and additional data including possible binding sites, electron microscopy (EM) maps, etc., report an atomic structure of the entire protein that explains (fits) the given data and maximizes a scoring function. The scoring function is designed to reflect structural quality at the secondary, tertiary, and quaternary levels. |
| Data source location | Not applicable |
| Data accessibility | All data is publicly accessible with no restrictions. All data necessary to understand and replicate the entire pipeline, or any part of it, is provided as a compressed folder with this paper. |
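
The modeling problem stated under "Experimental features" can be written compactly as a constrained optimization. The following is only a sketch of one possible formalization: all symbols ($s$, $P$, $D$, $\mathcal{X}$, $F$, $F_2$, $F_3$, $F_4$, and the weights $w_i$) are notation assumed here for illustration, and the weighted-sum form of the scoring function is an assumption, not the form used in the main article [26].

```latex
% Assumed notation (not from the source):
%   s : primary sequence of the protein
%   P : set of available partial structures at atomic resolution
%   D : additional data (possible binding sites, EM maps, etc.)
%   \mathcal{X}(s, P, D) : candidate full atomic structures with sequence s
%                          that are consistent with (fit) P and D
\[
  X^{*} \;=\; \operatorname*{arg\,max}_{X \,\in\, \mathcal{X}(s,\,P,\,D)} F(X),
\]
% One hypothetical decomposition of the score into terms reflecting
% secondary (F_2), tertiary (F_3), and quaternary (F_4) structural quality:
\[
  F(X) \;=\; w_{2}\,F_{2}(X) \;+\; w_{3}\,F_{3}(X) \;+\; w_{4}\,F_{4}(X),
  \qquad w_{2},\, w_{3},\, w_{4} \,\ge\, 0 .
\]
```

In this reading, "explains (fits) the given data" restricts the feasible set $\mathcal{X}(s, P, D)$, while the scoring function ranks the structures within it; the fit to the data could equally be folded into $F$ as an additional term.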