
Au courant computation of the PDB to audit diffraction anisotropy of soluble and membrane proteins.

Xavier Robert1, Josiane Kassis-Sahyoun1, Nicoletta Ceres1, Juliette Martin1, Michael R Sawaya2,3, Randy J Read4, Patrice Gouet1, Pierre Falson1, Vincent Chaptal1.   

Abstract

This data article makes available an informed computation over the whole Protein Data Bank (PDB), designed to investigate diffraction anisotropy on a large scale and to perform statistics on it. These data have been analysed in detail in "X-ray diffraction reveals the intrinsic difference in the physical properties of membrane and soluble proteins" [1]. Diffraction anisotropy is traditionally associated with an absence of contacts between macromolecules within the crystal along a given direction of space. There are, however, many cases that do not follow this empirical rule. To investigate and resolve this discrepancy, we computed diffraction anisotropy for every entry of the PDB and placed it in the context of relevant metrics, comparing X-ray diffraction in reciprocal space with crystal packing in real space. These metrics were either extracted from PDB files when available (resolution, space group, cell parameters, solvent content) or calculated using standard procedures (anisotropy, crystal contacts, presence of ligands). More specifically, we separated entries to compare soluble and membrane proteins, and further divided the latter into subcategories according to their insertion in the membrane, function, or type of crystallization (Type I vs Type II crystal packing). This informed database is made available to investigators in raw and curated formats that can be reused for further downstream studies. The dataset is useful for testing ideas and for assessing hypotheses through statistical analysis.

Keywords:  Diffraction anisotropy; Macromolecule crystals; Membrane proteins; X-ray diffraction

Year:  2018        PMID: 30225276      PMCID: PMC6139481          DOI: 10.1016/j.dib.2018.05.072

Source DB:  PubMed          Journal:  Data Brief        ISSN: 2352-3409


Value of the data

This dataset provides the first broad analysis of the spread of diffraction anisotropy across the entire Protein Data Bank. It allows researchers to compare the anisotropy of their own data with that of all other available entries and thus better gauge their data. These data provide a basis for challenging established ideas and for further investigating diffraction data from macromolecule crystals.

Data

In these data, X-ray diffraction anisotropy is calculated for each entry of the Protein Data Bank and put into perspective with relevant structural information (solvent content, resolution, crystal contacts, space group, presence of ligands, etc.) to investigate correlations. Because the aim is to investigate differences between soluble and membrane proteins, these two types of proteins were identified and separated. Membrane proteins were further divided into subclasses according to their insertion in the membrane, fold, or function, in order to relate the observed differences to biology.

Experimental design, materials and methods

Data mining and computation

As of February 24th, 2016, a local copy of the RCSB Protein Data Bank (PDB) was made, including all deposited structures as PDB-formatted coordinate files and all crystallographic structure factors in mmCIF format. At that date, out of 115,888 available structures, 103,530 had been solved by X-ray crystallography and 92,995 related structure factor files were accessible. For further processing, the structure factor files were converted from mmCIF to CCP4 MTZ format with the sf-convert software version 1.204 (developed at RCSB and downloadable at http://deposit.pdb.org/software). In this way, 92,930 structure factor files were successfully converted, while 65 were not, owing to various file format issues.
We then developed an automated Linux Bash script that sequentially performed the following tasks for each PDB/MTZ pair of files obtained previously. Several items were extracted directly from the PDB file header: the most recent deposition/revision year (PDB ‘REVDAT’ record), the resolution in angstroms (‘REMARK 2 RESOLUTION’ record), the space group and unit cell parameters (‘CRYST1’ record), the data collection temperature (‘REMARK 200 TEMPERATURE’ record), all compounds/ligands sorted by their three-letter hetID codes (‘HETNAM’ records), and the list of terms relevant to the entry (‘KEYWDS’ record). In addition to these keywords, for each membrane protein entry we extracted the name of the protein from the ‘Membrane proteins of known 3D structure’ database (http://blanco.biomol.uci.edu/mpstruc/) maintained by S.H. White (University of California, Irvine). When available, the solvent content (in percent) was also retrieved (‘REMARK 280 SOLVENT CONTENT’ record); otherwise, it was calculated using the program matthews_coef from the CCP4 software suite version 7.0 [3].
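The header extraction and the solvent-content fallback described above can be sketched in Python. This is a minimal illustration, not the authors' Bash script: it assumes the fixed-column layout of the PDB format (version 3.3) for the CRYST1 and HETNAM records and the usual wording of the REMARK 2, 200 and 280 records, and the helper names (parse_header, matthews_solvent_percent, unit_cell_volume) are hypothetical. Instead of calling CCP4's matthews_coef, the fallback applies the classical Matthews relation (solvent fraction = 1 - 1.23/V_M, with V_M in Å³/Da), which needs the asymmetric-unit mass and the number of asymmetric units per cell as extra inputs. KEYWDS parsing is omitted for brevity.

```python
import math
import re
from typing import Optional

# Free-text header records, matched by regular expression (typical PDB wording;
# unusual variants would need extra patterns).
RESOLUTION_RE = re.compile(r"^REMARK\s+2\s+RESOLUTION\.\s+(\d+(?:\.\d+)?)")
TEMPERATURE_RE = re.compile(r"^REMARK 200\s+TEMPERATURE\s+\(KELVIN\)\s*:\s*(\d+(?:\.\d+)?)")
SOLVENT_RE = re.compile(r"^REMARK 280\s+SOLVENT CONTENT.*?:\s*(\d+(?:\.\d+)?)")
REVDAT_RE = re.compile(r"^REVDAT\s+\d+\s+\d{2}-[A-Z]{3}-(\d{2})")
# CRYST1 fixed columns (a, b, c, alpha, beta, gamma) as 0-based Python slices.
CRYST1_FIELDS = ((6, 15), (15, 24), (24, 33), (33, 40), (40, 47), (47, 54))


def first_float(regex: re.Pattern, lines: list) -> Optional[float]:
    """Return the first number captured by regex, or None if no line matches."""
    for line in lines:
        match = regex.match(line)
        if match:
            return float(match.group(1))
    return None


def unit_cell_volume(a, b, c, alpha, beta, gamma) -> float:
    """General (triclinic) unit-cell volume in cubic angstroms; angles in degrees."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    return a * b * c * math.sqrt(1 - ca**2 - cb**2 - cg**2 + 2 * ca * cb * cg)


def matthews_solvent_percent(volume: float, mass_per_asu: float, n_asu: int) -> float:
    """Matthews estimate: Vm = V / (M * n_asu) in A^3/Da, solvent = 1 - 1.23 / Vm."""
    vm = volume / (mass_per_asu * n_asu)
    return 100.0 * (1.0 - 1.23 / vm)


def parse_header(text: str, mass_per_asu=None, n_asu=None) -> dict:
    lines = text.splitlines()
    info = {
        "resolution": first_float(RESOLUTION_RE, lines),
        "temperature": first_float(TEMPERATURE_RE, lines),
        "solvent": first_float(SOLVENT_RE, lines),
        # HETNAM hetID codes sit in columns 12-14; continuation lines repeat them.
        "het_ids": sorted({ln[11:14].strip() for ln in lines if ln.startswith("HETNAM")}),
    }
    # Latest REVDAT year; two-digit years below 70 are read as 20xx (assumption).
    years = [int(m.group(1)) for ln in lines if (m := REVDAT_RE.match(ln))]
    info["year"] = max(2000 + y if y < 70 else 1900 + y for y in years) if years else None
    for ln in lines:
        if ln.startswith("CRYST1"):
            info["cell"] = tuple(float(ln[s:e]) for s, e in CRYST1_FIELDS)
            info["space_group"] = ln[55:66].strip()
    # Fallback when REMARK 280 is missing: needs the ASU mass (Da) and the number
    # of asymmetric units per cell, which the header alone does not provide.
    if info["solvent"] is None and "cell" in info and mass_per_asu and n_asu:
        volume = unit_cell_volume(*info["cell"])
        info["solvent"] = round(matthews_solvent_percent(volume, mass_per_asu, n_asu), 1)
    return info
```

For example, a 52 × 58 × 63 Å orthorhombic cell (190,008 Å³) holding four 20 kDa asymmetric units gives V_M ≈ 2.38 Å³/Da and an estimated solvent content of about 48%.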
When applicable, the percentages of four classes of protein secondary structure elements (helices, strands, turns and coils) were calculated using the program mkdssp version 2.2.7 included in CCP4. A crystal contacts ratio was determined by dividing the number of crystal contacts in the unit cell (computed using ncont from CCP4 with a maximum distance cutoff of 4.0 Å) by the total number of atoms, including heteroatoms and solvent (i.e. all ‘ATOM’ and ‘HETATM’ PDB records). We used the ‘UCLA Diffraction Anisotropy Server’ [4] script, which we modified to take advantage of the latest revisions of CCP4. For each PDB entry, the anisotropic delta-B value was computed with Phaser [5] from both amplitude and intensity data, when available. Furthermore, the resolution limits at which F/σ(F) drops below 3.0 were determined with the program Truncate from CCP4 along each of the three principal axes of the anisotropic ellipsoid, and a ‘delta_res’ value was derived as the difference between the largest and smallest of these three limits. In addition, the Wilson B-factor was computed with Phaser from amplitude and intensity data, when available, and the ratio between the anisotropic delta-B value and the Wilson B-factor was calculated, again for both amplitude and intensity data when available. Finally, the total number of reflections was extracted from the structure factor file, together with the number of reflections rejected during the anisotropy correction cycles performed by Phaser, which gave the percentage of reflections rejected during this process. Thus, from the starting set of 92,930 entries, we were able to compute 92,218 aniso_b values based on amplitude data, 26,319 based on intensities, and 92,154 delta_res values. The differences arise because a number of structure factor files did not contain intensity data and/or accurate information (e.g. missing or null σ(F) or σ(I) values).
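Once the upstream programs have run, the derived per-entry metrics are simple arithmetic. The sketch below assumes the raw numbers (contact count, atom count, principal components of the anisotropic B tensor, Wilson B, the three Truncate resolution limits, and reflection counts) have already been parsed from the ncont, Phaser and Truncate outputs; the container EntryMeasurements and its field names are hypothetical, and taking delta-B as the range of the principal components is an assumption about how the value is defined.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class EntryMeasurements:
    """Hypothetical container for per-entry values produced upstream."""
    n_contacts: int                            # ncont contacts within 4.0 A in the unit cell
    n_atoms: int                               # all ATOM + HETATM records
    b_principal: Tuple[float, float, float]    # principal components of anisotropic B (A^2)
    wilson_b: float                            # Wilson B-factor (A^2)
    res_limits: Tuple[float, float, float]     # F/sigma(F) < 3 limits on the 3 principal axes (A)
    n_reflections: int                         # reflections in the structure factor file
    n_rejected: int                            # reflections rejected during anisotropy correction


def derived_metrics(m: EntryMeasurements) -> dict:
    # Assumption: delta-B is the range (max - min) of the principal components.
    aniso_b = max(m.b_principal) - min(m.b_principal)
    return {
        "contacts_ratio": m.n_contacts / m.n_atoms,
        "aniso_b": aniso_b,
        "delta_res": max(m.res_limits) - min(m.res_limits),
        "aniso_b_over_wilson_b": aniso_b / m.wilson_b,
        "pct_rejected": 100.0 * m.n_rejected / m.n_reflections,
    }
```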
All these data were joined, sorted by PDB entry code and imported into an Excel 2013 (Microsoft Corporation) spreadsheet.
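Joining the per-entry results before import into a spreadsheet could look like the following sketch. It assumes each processing step produced a dictionary keyed by PDB ID (a hypothetical intermediate format) and emits CSV text, which Excel can open, rather than a native workbook; values missing for an entry are left as empty cells.

```python
import csv
import io
from typing import Dict, Iterable


def join_to_csv(tables: Iterable[Dict[str, dict]]) -> str:
    """Merge per-step results keyed by PDB ID; return CSV text sorted by PDB ID."""
    merged: Dict[str, dict] = {}
    for table in tables:
        for pdb_id, fields in table.items():
            merged.setdefault(pdb_id.upper(), {}).update(fields)
    columns = sorted({key for row in merged.values() for key in row})
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["pdb_id", *columns])  # missing cells -> ""
    writer.writeheader()
    for pdb_id in sorted(merged):
        writer.writerow({"pdb_id": pdb_id, **merged[pdb_id]})
    return buffer.getvalue()
```

The returned text can be written to a file opened with newline="" so that the csv module controls line endings.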

Curation

For the reasons described in [6], and given the behavior of anisotropy over the years (Fig. S6D in [6]), we retained only structures obtained after 2005, so as to compare entries of similar difficulty that are likely to show comparable anisotropic behavior. To compare reasonably well-behaved structures, we also kept only data diffracting to 5 Å resolution or better, and rejected anisotropic delta-B values on amplitudes above 150 Å². In addition, all entries with a crystal contacts ratio above 1 were removed from the analysis. Our final dataset thus consisted of 76,458 entries, with 74,928 and 1411 anisotropic delta-B values calculated on amplitudes (soluble and membrane proteins, respectively), and 23,125 and 487 values calculated on intensities.
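The curation rules translate into a simple filter. In this sketch the field names are hypothetical, "after 2005" is read strictly (year > 2005), and an entry lacking an anisotropy or contacts value is kept rather than discarded; both readings are assumptions, since the text does not spell them out.

```python
from typing import Iterable, List, Optional

MIN_YEAR_EXCLUSIVE = 2005   # "after 2005", read here as year > 2005 (assumption)
MAX_RESOLUTION = 5.0        # angstroms; keep 5 A or better
MAX_ANISO_B = 150.0         # A^2, delta-B computed on amplitudes
MAX_CONTACTS_RATIO = 1.0


def over(value: Optional[float], limit: float) -> bool:
    """True only when a value is present and exceeds the limit."""
    return value is not None and value > limit


def curate(entries: Iterable[dict]) -> List[dict]:
    kept = []
    for e in entries:
        if e["year"] is None or e["year"] <= MIN_YEAR_EXCLUSIVE:
            continue
        if e["resolution"] is None or e["resolution"] > MAX_RESOLUTION:
            continue
        # Missing anisotropy or contact values do not by themselves exclude an entry.
        if over(e.get("aniso_b"), MAX_ANISO_B) or over(e.get("contacts_ratio"), MAX_CONTACTS_RATIO):
            continue
        kept.append(e)
    return kept
```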

Subsets extraction

From this curated database, 13 subsets were then extracted based on distinct structural or biological criteria, derived from the classification provided by the ‘Membrane proteins of known 3D structure’ database (http://blanco.biomol.uci.edu/mpstruc/). These subsets are: soluble proteins; membrane proteins; membrane protein structures solved in detergents, in lipidic cubic phase (extracted as described by M. Caffrey [7]) or in bicelles; α-helical or β-barrel transmembrane proteins; monotopic membrane proteins; and membrane ATPase, electron-transfer, channel, receptor and transporter proteins. Finally, two further subsets (embedded membrane proteins and proteins with extramembranous domains) were extracted by visual inspection of their spatial arrangement in the membrane, as visualized using the ‘Orientations of Proteins in Membranes’ (OPM) database [8].
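Once each entry carries its annotations, splitting the curated table into the tabs of the supplementary file amounts to matching rules. The annotation fields (membrane, method, topology, function, opm_class) and their values are hypothetical stand-ins for the mpstruc and OPM classifications; only the tab labels come from the supplementary Excel file.

```python
from typing import Dict, List

# (tab label, annotation field, required value); field names and values are hypothetical.
SUBSET_RULES = [
    ("SOLUBLE", "membrane", False),
    ("MEMBRANE", "membrane", True),
    ("DETERGENT", "method", "detergent"),
    ("LCP", "method", "lcp"),
    ("BICELLES", "method", "bicelles"),
    ("ALPHA", "topology", "alpha"),
    ("BETA", "topology", "beta"),
    ("MONOTOPIC", "topology", "monotopic"),
    ("ATPASE", "function", "atpase"),
    ("E-TRANSFER", "function", "electron-transfer"),
    ("CHANNEL", "function", "channel"),
    ("RECEPTOR", "function", "receptor"),
    ("TRANSPORTER", "function", "transporter"),
    ("EMBEDDED", "opm_class", "embedded"),
    ("EXTRAMB", "opm_class", "extramembranous"),
]


def extract_subsets(entries: List[dict]) -> Dict[str, List[str]]:
    """Map each tab label to the sorted PDB IDs whose annotation matches its rule."""
    tabs = {"ALL": sorted(e["pdb_id"] for e in entries)}
    for label, field, value in SUBSET_RULES:
        tabs[label] = sorted(e["pdb_id"] for e in entries if e.get(field) == value)
    return tabs
```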

Code availability

The present database was generated using an automated Linux Bash script that we developed (tested on CentOS 7.x). The script is available without restriction upon request to the corresponding author. It can be run on any Linux distribution on which the CCP4 software suite version 7.0 (or later) is installed. A local copy of the PDB is also required, including all deposited structures as PDB-formatted coordinate files and all crystallographic structure factors in mmCIF format, to be converted to MTZ format. An additional Linux Bash script performing these mirroring and conversion steps is also available upon request.
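Before conversion and analysis, each coordinate file has to be matched with its structure factor file. The sketch assumes the wwPDB mirror naming convention (pdbXXXX.ent.gz for coordinates, rXXXXsf.ent.gz for mmCIF structure factors) and keeps only entries for which both files exist; it illustrates the pairing step and is not the authors' mirroring script. The function name pair_mirror_files is hypothetical.

```python
import re
from pathlib import PurePosixPath
from typing import Dict, Iterable, Tuple

# Assumed wwPDB mirror naming: pdbXXXX.ent[.gz] for coordinates and
# rXXXXsf.ent[.gz] for mmCIF structure factors (XXXX = 4-character PDB ID).
COORD_RE = re.compile(r"^pdb([0-9a-z]{4})\.ent(?:\.gz)?$")
SF_RE = re.compile(r"^r([0-9a-z]{4})sf\.ent(?:\.gz)?$")


def pair_mirror_files(paths: Iterable[str]) -> Dict[str, Tuple[str, str]]:
    """Return {PDB ID: (coordinate file, structure factor file)} for IDs having both."""
    coords, sfs = {}, {}
    for path in paths:
        name = PurePosixPath(path).name.lower()
        if (m := COORD_RE.match(name)):
            coords[m.group(1)] = path
        elif (m := SF_RE.match(name)):
            sfs[m.group(1)] = path
    return {pid.upper(): (coords[pid], sfs[pid]) for pid in sorted(coords.keys() & sfs.keys())}
```

Entries with coordinates but no structure factors (for example, structures not determined by X-ray crystallography) simply drop out of the pairing.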

Specifications Table

Subject area: Biology
More specific subject area: Crystallography
Type of data: Excel sheet document
How data was acquired: Advanced computation on Protein Data Bank [2] data
Data format: Raw and curated
Experimental factors: Each Protein Data Bank entry was retrieved for both its experimental diffraction data and its deposited model, then further processed and classified according to biologically driven criteria.
Experimental features: Separation between soluble and membrane proteins, with membrane proteins further separated into different subclasses. For each entry, diffraction anisotropy was calculated and compared with many parameters to investigate the cause of the phenomenon.
Data source location: All entries were retrieved from the Protein Data Bank [2].
Data accessibility: The data are made available as supplementary information to this article.

Tab label     Description
ALL           The complete curated database
SOLUBLE       Soluble proteins
MEMBRANE      Membrane proteins
DETERGENT     Membrane protein structures solved in detergents
LCP           Membrane protein structures solved in lipidic cubic phase
BICELLES      Membrane protein structures solved in bicelles
ALPHA         α-helical transmembrane part of membrane proteins
BETA          β-barrel transmembrane part of membrane proteins
MONOTOPIC     Monotopic membrane proteins
ATPASE        Membrane proteins with ATPase function
E-TRANSFER    Membrane proteins with electron transfer function
CHANNEL       Membrane proteins with channel function
RECEPTOR      Membrane proteins with receptor function
TRANSPORTER   Membrane proteins with transporter function
EMBEDDED      Membrane proteins fully embedded within the membrane
EXTRAMB       Membrane proteins with extramembranous domains
References

1.  The Protein Data Bank.

Authors: H M Berman; J Westbrook; Z Feng; G Gilliland; T N Bhat; H Weissig; I N Shindyalov; P E Bourne
Journal: Nucleic Acids Res; Date: 2000-01-01

2.  Toward the structural genomics of complexes: crystal structure of a PE/PPE protein complex from Mycobacterium tuberculosis.

Authors: Michael Strong; Michael R Sawaya; Shuishu Wang; Martin Phillips; Duilio Cascio; David Eisenberg
Journal: Proc Natl Acad Sci U S A; Date: 2006-05-11

3.  Structural adaptations of proteins to different biological membranes.

Authors: Irina D Pogozheva; Stephanie Tristram-Nagle; Henry I Mosberg; Andrei L Lomize
Journal: Biochim Biophys Acta; Date: 2013-06-27

4.  Overview of the CCP4 suite and current developments.

Authors: Martyn D Winn; Charles C Ballard; Kevin D Cowtan; Eleanor J Dodson; Paul Emsley; Phil R Evans; Ronan M Keegan; Eugene B Krissinel; Andrew G W Leslie; Airlie McCoy; Stuart J McNicholas; Garib N Murshudov; Navraj S Pannu; Elizabeth A Potterton; Harold R Powell; Randy J Read; Alexei Vagin; Keith S Wilson
Journal: Acta Crystallogr D Biol Crystallogr; Date: 2011-03-18

5.  A comprehensive review of the lipid cubic phase or in meso method for crystallizing membrane and soluble proteins and complexes.

Authors: Martin Caffrey
Journal: Acta Crystallogr F Struct Biol Commun; Date: 2015-01-01

6.  A log-likelihood-gain intensity target for crystallographic phasing that accounts for experimental error.

Authors: Randy J Read; Airlie J McCoy
Journal: Acta Crystallogr D Struct Biol; Date: 2016-03-01

7.  X-ray diffraction reveals the intrinsic difference in the physical properties of membrane and soluble proteins.

Authors: Xavier Robert; Josiane Kassis-Sahyoun; Nicoletta Ceres; Juliette Martin; Michael R Sawaya; Randy J Read; Patrice Gouet; Pierre Falson; Vincent Chaptal
Journal: Sci Rep; Date: 2017-12-05

