Robert A Nicholls (1), Marcin Wojdyr (2), Robbie P Joosten (3), Lucrezia Catapano (1), Fei Long (1), Marcus Fischer (4), Paul Emsley (1), Garib N Murshudov (1).
1. Structural Studies, MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge CB2 0QH, United Kingdom.
2. Global Phasing Limited, Sheraton House, Castle Park, Cambridge CB3 0AX, United Kingdom.
3. Netherlands Cancer Institute, Plesmanlaan 121, 1066 CX Amsterdam, The Netherlands.
4. Chemical Biology and Therapeutics and Structural Biology, St Jude Children's Research Hospital, 262 Danny Thomas Place, Memphis, TN 38105-3678, USA.
Abstract
Covalent linkages between constituent blocks of macromolecules and ligands have been subject to inconsistent treatment during the model-building, refinement and deposition process. This may stem from a number of sources, including difficulties with initially detecting the covalent linkage, identifying the correct chemistry, obtaining an appropriate restraint dictionary and ensuring its correct application. The analysis presented herein assesses the extent of problems involving covalent linkages in the Protein Data Bank (PDB). Not only will this facilitate the remediation of existing models, but also, more importantly, it will inform and thus improve the quality of future linkages. By considering linkages of known type in the CCP4 Monomer Library (CCP4-ML), failure to model a covalent linkage is found to result in inaccurate (systematically longer) interatomic distances. Scanning the PDB for proximal atom pairs that do not have a corresponding type in the CCP4-ML reveals a large number of commonly occurring types of unannotated potential linkages; in general, these may or may not be covalently linked. Manual consideration of the most commonly occurring cases identifies a number of genuine classes of covalent linkages. The recent expansion of the CCP4-ML, which has involved the addition of over 16 000 and the replacement of over 11 000 component dictionaries using AceDRG, is discussed. As part of this effort, the CCP4-ML has also been extended using AceDRG link dictionaries for the linkage types identified in this analysis. This will facilitate the identification of such linkage types in future modelling efforts, whilst also easing their application. The need is emphasized for a universal standard for maintaining, following deposition to the PDB, link records corresponding to covalent linkages and references to the associated dictionaries used during modelling and refinement.
The importance of correctly modelling covalent linkages is demonstrated using a case study involving the covalent linkage of an inhibitor to the main protease in various viral species, including SARS-CoV-2. This example shows the value of modelling covalent linkages with a comprehensive restraint dictionary, as opposed to using only a single interatomic distance restraint or failing to model the covalent linkage at all.
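The idea of scanning a model for proximal atom pairs that might be unannotated covalent linkages can be illustrated with a minimal sketch. It is not the procedure used in the analysis: the distance criterion (sum of approximate covalent radii plus a tolerance), the `Atom` record, the `candidate_links` function and the residue labels are all assumptions introduced here for illustration.

```python
from dataclasses import dataclass
from itertools import combinations
from math import dist

# Approximate single-bond covalent radii in angstroms (illustrative values only).
COVALENT_RADII = {"C": 0.76, "N": 0.71, "O": 0.66, "S": 1.05}


@dataclass(frozen=True)
class Atom:
    residue: str   # hypothetical residue label, e.g. "A/CYS10"
    name: str      # atom name within the residue
    element: str   # chemical element symbol
    xyz: tuple     # Cartesian coordinates in angstroms


def candidate_links(atoms, tolerance=0.4):
    """Return (atom_a, atom_b, distance) for inter-residue pairs that are
    close enough to be covalently bonded under the assumed criterion."""
    hits = []
    # Brute-force all pairs; a real scan would use a spatial index.
    for a, b in combinations(atoms, 2):
        if a.residue == b.residue:
            continue  # intra-residue bonds come from the component dictionary
        cutoff = COVALENT_RADII[a.element] + COVALENT_RADII[b.element] + tolerance
        d = dist(a.xyz, b.xyz)
        if d <= cutoff:
            hits.append((a, b, d))
    return hits
```

Each pair returned is only a potential linkage. As the abstract notes, such proximal pairs may or may not be covalently linked, so a hit would still need checking against known link types and the underlying chemistry.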
Authors: Mihaela Atanasova; Robert A Nicholls; Robbie P Joosten; Jon Agirre Journal: Acta Crystallogr D Struct Biol Date: 2022-03-04 Impact factor: 7.652
Authors: Keitaro Yamashita; Colin M Palmer; Tom Burnley; Garib N Murshudov Journal: Acta Crystallogr D Struct Biol Date: 2021-09-29 Impact factor: 7.652
Authors: Ida de Vries; Tim Kwakman; Xiang Jun Lu; Maarten L Hekkelman; Mandar Deshpande; Sameer Velankar; Anastassis Perrakis; Robbie P Joosten Journal: Acta Crystallogr D Struct Biol Date: 2021-08-24 Impact factor: 7.652