
Finding enzyme cofactors in Protein Data Bank.

Abhik Mukhopadhyay¹, Neera Borkakoti¹, Lukáš Pravda¹, Jonathan D Tyzack¹, Janet M Thornton¹, Sameer Velankar¹.

Abstract

MOTIVATION: Cofactors are essential for many enzyme reactions. The Protein Data Bank (PDB) contains >67 000 entries with enzyme structures, many of which have bound cofactor or cofactor-like molecules. This work aims to identify and categorize these small molecules in the PDB and to make them easier to find.
RESULTS: The Protein Data Bank in Europe (PDBe; pdbe.org) has implemented a pipeline to identify enzyme cofactor and cofactor-like molecules, which are now part of the PDBe weekly release process.
AVAILABILITY AND IMPLEMENTATION: Information is made available on the individual PDBe entry pages at pdbe.org and programmatically through the PDBe REST API (pdbe.org/api).
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Year: 2019 PMID: 30759194 PMCID: PMC6748742 DOI: 10.1093/bioinformatics/btz115

Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937

1 Introduction

Almost 80% of entries (115 274 out of 144 464 as of September 2018) in the Protein Data Bank (PDB) (wwPDB Consortium, 2019) contain at least one small molecule bound to protein or nucleic acids. At present, the reasons for the presence of these molecules are not well described by the PDB annotation procedure (Young et al.): some may be biologically relevant, such as substrates, products or inhibitors, whereas others are added to increase protein stability or to facilitate crystallization. To identify the roles of these molecules, we have first focussed on enzymes and their small-molecule organic cofactors. The PDB has >67 000 structures of enzymes, and over 30% of these enzymes require cofactors to function, making this a non-trivial problem. Relying on knowledge of the reactions catalysed by these enzymes, the classification of cofactors into 27 classes in the CoFactor database (Fischer et al.) and a new method to measure molecular similarity (Tyzack et al.), we have developed a protocol to identify cofactors and cofactor-like molecules in cofactor-dependent enzymes. The information is updated weekly with each PDB release and is stored in the PDBe database (Mir et al.). The up-to-date information is made available via the PDBe REST API, the PDBe query system and the PDBe entry pages.

2 Materials and methods

2.1 Sources of information

The initial data were obtained from the CoFactor database (Fischer et al.), including manually curated information for 27 cofactor classes (Supplementary Table S1) and a list of the Enzyme Commission (EC) numbers of cofactor-binding enzymes known to require each of these cofactor classes. A template molecule is defined for each of the 27 cofactor classes. The initial list of manually curated EC numbers from the CoFactor database has been expanded to include the EC numbers of all non-metal cofactor-binding enzymes available in the BRENDA database (Placzek et al.) (Supplementary Table S2). The process also uses the wwPDB Chemical Component Dictionary (Westbrook et al.), which contains descriptions of all unique chemical components in the PDB. The SIFTS resource (Dana et al.) is used to obtain an up-to-date mapping of PDB entries to EC numbers.
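The sources above fit together simply: each cofactor class pairs a template molecule with a curated set of EC numbers, and SIFTS supplies the EC numbers for each PDB entry, so deciding whether an entry's enzymes depend on a class comes down to a set-intersection test. The Python sketch below illustrates only that idea and is not PDBe's implementation. The class contents, the template ID `COA`, the helper names and the EC numbers shown for the thiolase (4xl4) and pantothenate kinase 3 (3mk6) entries discussed in Section 3.2 are assumptions chosen for illustration, not values read from the CoFactor, BRENDA or SIFTS data.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class CofactorClass:
    """A cofactor class: its template molecule and the EC numbers of enzymes that require it."""

    name: str
    template_id: str  # chemical component ID of the class template (assumed here)
    ec_numbers: frozenset[str] = field(default_factory=frozenset)


# Hypothetical curated data: a single class with a single illustrative EC number.
COENZYME_A = CofactorClass("Coenzyme A", "COA", frozenset({"2.3.1.9"}))

# Hypothetical PDB entry -> EC mapping standing in for SIFTS (assignments assumed).
ENTRY_TO_EC: dict[str, set[str]] = {
    "4xl4": {"2.3.1.9"},   # thiolase (acetyl-CoA C-acetyltransferase)
    "3mk6": {"2.7.1.33"},  # pantothenate kinase
}


def classes_required_by_entry(pdb_id: str, classes: list[CofactorClass]) -> set[str]:
    """Names of cofactor classes required by at least one enzyme mapped to the entry."""
    entry_ec = ENTRY_TO_EC.get(pdb_id.lower(), set())
    return {c.name for c in classes if not entry_ec.isdisjoint(c.ec_numbers)}


print(classes_required_by_entry("4xl4", [COENZYME_A]))  # {'Coenzyme A'}
print(classes_required_by_entry("3mk6", [COENZYME_A]))  # set()
```

Because the test is made per entry, the same molecule can be treated as a cofactor in one structure and not in another.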

2.2 Identification and categorizing procedure

A semi-automated process (Fig. 1) has been integrated into PDBe's weekly release pipeline. The main steps of this procedure are:

1. Newly released small molecules are identified from the chemical component dictionary.
2. Small molecules that are structurally similar to the cofactor template molecules (Supplementary Table S3) are identified. A chemical structure similarity score is calculated using the RDKit-based similarity-searching method PARITY (Tyzack et al.) together with SiteBinder (Sehnal et al.).
3. Small molecules whose similarity score exceeds the predefined cut-off for a particular cofactor class are tentatively selected for manual inspection of their structural equivalence with the template molecule (Supplementary Table S3); those that pass are added to the list of small molecules in the corresponding cofactor class.
4. An automatic process obtains the list of PDB entries that contain a newly identified cofactor-like small molecule, together with the enzymes associated with the corresponding cofactor class.
5. If an enzyme identified in a PDB entry is on the curated list of cofactor-binding enzymes, the small molecule is identified as a cofactor-like molecule in the context of that PDB entry.

Fig. 1.

The steps implemented to identify newly released bound molecules that are cofactors or cofactor-like.
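Steps 2 and 3 amount to scoring every newly released component against every class template and keeping those above that class's cut-off for review. The sketch below shows only this screening logic. It deliberately swaps PARITY and SiteBinder for a plain Tanimoto (Jaccard) coefficient over hypothetical feature sets, and the component IDs `LIG1` and `LIG2`, the feature names, the function names and the 0.7 cut-offs are all invented for the example.

```python
from __future__ import annotations


def tanimoto(a: frozenset[str], b: frozenset[str]) -> float:
    """Tanimoto (Jaccard) coefficient of two feature sets; 0.0 when both are empty."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def screen_new_components(
    new_components: dict[str, frozenset[str]],
    templates: dict[str, frozenset[str]],
    cutoffs: dict[str, float],
) -> list[tuple[str, str, float]]:
    """List (component ID, cofactor class, score) for scores above the class cut-off.

    The results are candidates for manual inspection, not final assignments.
    """
    candidates = []
    for comp_id, features in sorted(new_components.items()):
        for cofactor_class, template in sorted(templates.items()):
            score = tanimoto(features, template)
            if score > cutoffs[cofactor_class]:
                candidates.append((comp_id, cofactor_class, round(score, 2)))
    return candidates


# Hypothetical feature sets standing in for real chemical structures.
TEMPLATES = {
    "Coenzyme A": frozenset({"adenine", "ribose", "phosphate", "pantetheine", "thiol"}),
    "NAD": frozenset({"adenine", "ribose", "phosphate", "nicotinamide"}),
}
CUTOFFS = {"Coenzyme A": 0.7, "NAD": 0.7}  # hypothetical per-class thresholds
NEW_COMPONENTS = {
    "LIG1": frozenset({"adenine", "ribose", "phosphate", "pantetheine", "thiol", "acetyl"}),
    "LIG2": frozenset({"benzene", "carboxylate"}),
}

print(screen_new_components(NEW_COMPONENTS, TEMPLATES, CUTOFFS))
# [('LIG1', 'Coenzyme A', 0.83)]
```

Only `LIG1` passes, and only for the coenzyme A class (5 shared features out of 6, about 0.83; against NAD it scores 3/7). In the procedure described above such a hit would still go to manual inspection before being added to the class, and would be annotated in a given entry only if that entry's enzymes are on the curated list.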

3 Results

As of September 21, 2018, we have identified 364 unique cofactor and cofactor-like molecules found in 11% of PDB entries (16 022 out of 144 464 entries). The distribution of these small molecules across enzyme classes is shown in Supplementary Fig. S1. All new cofactor or cofactor-like molecules and associated PDB entries are processed weekly.

3.1 Cofactor annotation on PDBe web pages

Cofactor and cofactor-like molecules are now clearly identified in the ligands and environments section of a PDBe entry page (Supplementary Fig. S2). Additional details on cofactor class and the similarity to the template molecule are shown on the ligand page (Supplementary Fig. S3). It is also possible to find all the cofactor-like molecules and associated PDB entries for a specific cofactor class using PDBe’s advanced search.

3.2 Data retrieval using the cofactor API

Three calls have been designed to retrieve cofactor data through PDBe's REST API (Supplementary Table S4). The most general call retrieves information on all cofactor-like molecules, organized into the 27 cofactor classes, along with the EC numbers of the enzymes that require them. The cofactor information specific to a PDB entry can be obtained by providing the PDB entry ID. For example, coenzyme A acts as a cofactor in Clostridium acetobutylicum thiolase (PDB ID: 4xl4; pdbe.org/4xl4), but not in pantothenate kinase 3 (PDB ID: 3mk6; pdbe.org/3mk6). The last call takes a PDB Chemical Component ID as input and lists the cofactor classes the small molecule belongs to, the template molecule of each class, all structurally similar cofactor-like small molecules in the identified class and their similarity to the template molecule.
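The three calls can be scripted with the Python standard library alone. The sketch below rests on several assumptions that this note does not state: the base URL `https://www.ebi.ac.uk/pdbe/api`, the three endpoint paths, the convention that entry-level responses are keyed by the lower-case PDB ID, and the treatment of HTTP 404 as "no annotation". The helper names are hypothetical; check every path and the response layout against Supplementary Table S4 and the documentation at pdbe.org/api before relying on them.

```python
import json
import urllib.error
import urllib.request

# Assumed base URL and endpoint paths; confirm against Supplementary Table S4 and pdbe.org/api.
BASE_URL = "https://www.ebi.ac.uk/pdbe/api"


def all_cofactor_classes_url() -> str:
    """Most general call: cofactor-like molecules grouped into classes (assumed path)."""
    return f"{BASE_URL}/pdb/compound/cofactors"


def entry_cofactor_url(pdb_id: str) -> str:
    """Cofactor annotation for one PDB entry (assumed path; ID sent in lower case)."""
    return f"{BASE_URL}/pdb/entry/cofactor/{pdb_id.lower()}"


def component_cofactor_url(comp_id: str) -> str:
    """Cofactor classes for one chemical component ID (assumed path; ID sent in upper case)."""
    return f"{BASE_URL}/pdb/compound/cofactors/het/{comp_id.upper()}"


def fetch_json(url: str, timeout: float = 30.0) -> dict:
    """GET a URL and decode its JSON body; HTTP 404 is treated as 'no data' (assumed)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return json.load(response)
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return {}
        raise


def entry_records(payload: dict, pdb_id: str) -> list:
    """Records stored under the entry's lower-case ID (assumed layout), or [] if absent."""
    return payload.get(pdb_id.lower(), [])
```

For example, `entry_records(fetch_json(entry_cofactor_url("4xl4")), "4xl4")` would request the annotation for the thiolase entry, provided the assumed paths and layout hold. The URL builders are kept separate from the network call so they can be checked offline.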

4 Conclusion and future directions

This work provides a new way of finding cofactor-related information from experimentally determined enzyme structures in the PDB. The information is also made available for integration into other biomedical data resources. Work is under way to extend the cofactor classes to include other cofactor molecules such as ATP. The details of the method and a complete analysis of the results will be presented elsewhere (in preparation).

References

1. The chemical component dictionary: complete descriptions of constituent molecules in experimentally determined 3D macromolecules in the Protein Data Bank.

Authors: John D Westbrook; Chenghua Shao; Zukang Feng; Marina Zhuravleva; Sameer Velankar; Jasmine Young
Journal: Bioinformatics Date: 2014-12-02 Impact factor: 6.937

2. SiteBinder: an improved approach for comparing multiple protein structural motifs.

Authors: David Sehnal; Radka Svobodová Vařeková; Heinrich J Huber; Stanislav Geidl; Crina-Maria Ionescu; Michaela Wimmerová; Jaroslav Koča
Journal: J Chem Inf Model Date: 2012-02-08 Impact factor: 4.956

3. The CoFactor database: organic cofactors in enzyme catalysis.

Authors: Julia D Fischer; Gemma L Holliday; Janet M Thornton
Journal: Bioinformatics Date: 2010-08-02 Impact factor: 6.937

4. PDBe: towards reusable data delivery infrastructure at protein data bank in Europe.

Authors: Saqib Mir; Younes Alhroub; Stephen Anyango; David R Armstrong; John M Berrisford; Alice R Clark; Matthew J Conroy; Jose M Dana; Mandar Deshpande; Deepti Gupta; Aleksandras Gutmanas; Pauline Haslam; Lora Mak; Abhik Mukhopadhyay; Nurul Nadzirin; Typhaine Paysan-Lafosse; David Sehnal; Sanchayita Sen; Oliver S Smart; Mihaly Varadi; Gerard J Kleywegt; Sameer Velankar
Journal: Nucleic Acids Res Date: 2018-01-04 Impact factor: 16.971

5. BRENDA in 2017: new perspectives and new tools in BRENDA.

Authors: Sandra Placzek; Ida Schomburg; Antje Chang; Lisa Jeske; Marcus Ulbrich; Jana Tillack; Dietmar Schomburg
Journal: Nucleic Acids Res Date: 2016-10-19 Impact factor: 16.971

6. Worldwide Protein Data Bank biocuration supporting open access to high-quality 3D structural biology data.

Authors: Jasmine Y Young; John D Westbrook; Zukang Feng; Ezra Peisach; Irina Persikova; Raul Sala; Sanchayita Sen; John M Berrisford; G Jawahar Swaminathan; Thomas J Oldfield; Aleksandras Gutmanas; Reiko Igarashi; David R Armstrong; Kumaran Baskaran; Li Chen; Minyu Chen; Alice R Clark; Luigi Di Costanzo; Dimitris Dimitropoulos; Guanghua Gao; Sutapa Ghosh; Swanand Gore; Vladimir Guranovic; Pieter M S Hendrickx; Brian P Hudson; Yasuyo Ikegawa; Yumiko Kengaku; Catherine L Lawson; Yuhe Liang; Lora Mak; Abhik Mukhopadhyay; Buvaneswari Narayanan; Kayoko Nishiyama; Ardan Patwardhan; Gaurav Sahni; Eduardo Sanz-García; Junko Sato; Monica R Sekharan; Chenghua Shao; Oliver S Smart; Lihua Tan; Glen van Ginkel; Huanwang Yang; Marina A Zhuravleva; John L Markley; Haruki Nakamura; Genji Kurisu; Gerard J Kleywegt; Sameer Velankar; Helen M Berman; Stephen K Burley
Journal: Database (Oxford) Date: 2018-01-01 Impact factor: 3.451

7. Ranking Enzyme Structures in the PDB by Bound Ligand Similarity to Biological Substrates.

Authors: Jonathan D Tyzack; Laurent Fernando; Antonio J M Ribeiro; Neera Borkakoti; Janet M Thornton
Journal: Structure Date: 2018-03-15 Impact factor: 5.006

8. SIFTS: updated Structure Integration with Function, Taxonomy and Sequences resource allows 40-fold increase in coverage of structure-based annotations for proteins.

Authors: Jose M Dana; Aleksandras Gutmanas; Nidhi Tyagi; Guoying Qi; Claire O'Donovan; Maria Martin; Sameer Velankar
Journal: Nucleic Acids Res Date: 2019-01-08 Impact factor: 16.971

9. Protein Data Bank: the single global archive for 3D macromolecular structure data.

Authors: wwPDB Consortium
Journal: Nucleic Acids Res Date: 2019-01-08 Impact factor: 16.971
