
PPDMs-a resource for mapping small molecule bioactivities from ChEMBL to Pfam-A protein domains.

Felix A Kruger, Anna Gaulton, Michal Nowotka, John P Overington.

Abstract

PPDMs is a resource that maps small molecule bioactivities to protein domains from the Pfam-A collection of protein families. Small molecule bioactivities mapped to protein domains add important precision to approaches that use protein sequence searches and alignments to assist applications in computational drug discovery and systems and chemical biology. We have previously proposed a heuristic that maps a subset of the bioactivities stored in ChEMBL to the Pfam-A domain most likely to mediate small molecule binding. We have since refined this mapping using a manual procedure. Here, we present a resource that provides up-to-date mappings and the possibility to review assigned mappings as well as to participate in their assignment and curation. We also describe how mappings provided through the PPDMs resource are made accessible through the main schema of the ChEMBL database.
AVAILABILITY AND IMPLEMENTATION: The PPDMs resource and curation interface are available at https://www.ebi.ac.uk/chembl/research/ppdms/pfam_maps. The source code for PPDMs is available under the Apache license at https://github.com/chembl/pfam_maps. Source code demonstrating the integration process with the main schema of ChEMBL is available at https://github.com/chembl/pfam_map_loader.
© The Author 2014. Published by Oxford University Press.

Year:  2014        PMID: 25348214      PMCID: PMC4341065          DOI: 10.1093/bioinformatics/btu711

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


1 Introduction

Systematic analyses of bioactive small molecules and their molecular targets and homologues form the basis of a number of novel applications in computational drug discovery and systems and chemical biology, including methods for target prediction (Martínez-Jiménez et al.) and for establishing functional relationships between proteins (Kruger and Overington, 2012; Lin et al.; van der Horst et al.). To add precision to these methods, we have previously proposed a simple heuristic for mapping small molecule bioactivities to protein domains (Kruger et al.). Here, we present a full implementation of the mapping for a relevant subset of biological assays stored in the current version of the ChEMBL database (Bento et al.). This implementation also accommodates edge cases that were unaddressed in the original implementation, specifically cases where more than one Pfam-A domain could mediate small molecule binding. In the refined implementation, such cases were resolved manually. The PPDMs server provides a platform to review and contribute manual assignments. Computational approaches that link small molecule binding to specific protein domains already exist, but these approaches use information extracted from crystal structures of protein-ligand complexes to transfer binding annotations between homologous proteins (Davis and Šali, 2010; Finn et al.; Snyder et al.). More recently, such approaches have been used to associate protein domains based on ligands shared between them (Li et al.; Moya-García and Ranea, 2013). PPDMs provides an alternative approach that associates small molecule binding and protein domains based on empirical evidence from literature-reported measurements in biological assays. PPDMs has generated binding-domain annotations for ∼770 k small molecule bioactivities, which can be obtained from the main schema of the ChEMBL database.

2 PPDMs enables improved mapping of ligand-binding domains

The objective of the mapping heuristic is to annotate biological assays reported in ChEMBL with the protein domain that mediates small molecule binding. The heuristic is based on protein domain annotations provided through the Pfam-A collection of sequence-based protein domains (Punta et al.). As a first step in the heuristic, a catalogue of Pfam-A domains capable of small molecule binding was constructed from small molecule bioactivities measured against single-domain proteins from ChEMBL. PPDMs offers a facility to refine the original catalogue by adding Pfam-A domains that are known from other sources to interact with small molecules but which are missing from the catalogue in the original implementation. Conversely, Pfam-A domains can be removed from the catalogue if evidence for small molecule binding is deemed insufficient. For example, we adjusted the previously applied potency threshold of 50 µM to a more stringent threshold of 10 µM, corresponding to a pChEMBL value of 5, where pChEMBL is defined as −log10(molar IC50, XC50, EC50, AC50, Ki, Kd or Potency); see Bento et al. As a consequence, we removed a number of domains associated with weak and potentially non-specific binding. The catalogue and associated evidence for small molecule binding can be reviewed in the ‘Evidence’ section of the PPDMs resource. In a second step, this catalogue was mapped to proteins that are defined as targets in binding or functional assays where the target is either a single protein or a protein complex (defined through a relationship of type ‘D’) and a pChEMBL value is assigned. This resulted in three possible categories of outcomes (see also Fig. 1):
i) A successful mapping, if exactly one of the Pfam-A domain models from the catalogue matches the sequence;
ii) No mapping, if none of the Pfam-A domain models from the catalogue matches the sequence;
iii) A conflicting mapping, if multiple domain models from the catalogue match the sequence.

Fig. 1. Schematic illustration of homology-based transfer of binding domain annotation. The schematic shows how a catalogue of Pfam-A domains with known small molecule interactions was mapped to protein sequences in the ChEMBL target dictionary.

Table 1 summarizes the distribution of measured activities across these three categories. Despite their relatively small contribution to the total of measured activities, protein architectures associated with category iii-type outcomes form a subset of high relevance to drug discovery, for example, many tyrosine kinases and ligand-gated ion channels. In the ‘Conflicts’ section, PPDMs provides a facility to manually assign mappings for such architectures on a per-assay basis. For each assay, PPDMs provides an overview of the assay details, domain architecture of the associated target and a form to submit a manual assignment. Assignments can be reviewed in the ‘Logs’ section, with an option of revoking a previous curation decision. User profiles ensure that accidental or deliberate assignment errors can be rolled back on a per-user basis if necessary.
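The three outcome categories can be illustrated with a minimal Python sketch. This is not the PPDMs implementation: the function name `classify_mapping` and the catalogue contents are assumptions made for illustration. The identifiers are Pfam-A names, but their membership in the catalogue is hypothetical and not taken from PPDMs.

```python
from typing import FrozenSet, Iterable, List, Tuple

# Illustrative catalogue only: membership is an assumption for this
# example, not an export of the PPDMs 'Evidence' section.
CATALOGUE: FrozenSet[str] = frozenset({"7tm_1", "Pkinase_Tyr", "SH2"})


def classify_mapping(
    target_domains: Iterable[str],
    catalogue: FrozenSet[str] = CATALOGUE,
) -> Tuple[str, List[str]]:
    """Classify one target by how many catalogue domains match its sequence.

    Returns the outcome label together with the matching catalogue domains.
    """
    hits = sorted(set(target_domains) & catalogue)
    if len(hits) == 1:
        return "successful", hits   # category i: exactly one match
    if not hits:
        return "not_mapped", hits   # category ii: no match
    return "conflict", hits         # category iii: needs manual curation


if __name__ == "__main__":
    # Hypothetical domain architectures for three targets.
    print(classify_mapping(["7tm_1"]))
    print(classify_mapping(["Ig", "fn3"]))
    print(classify_mapping(["SH2", "Pkinase_Tyr"]))
```

Returning the list of matching domains alongside the label keeps the candidates available for a curator when the outcome is a conflict.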
Table 1. The table below summarizes how activities in the current release distribute over the three possible outcomes

Outcome                  # All    % All    # Active    % Active
i) Successful map      750 653     53.5     269 128      76.2
ii) Not mapped         625 135     44.5      63 010      17.9
iii) Conflicting map    28 327      2.0      20 839       5.9
Total                1 404 115    100       352 977     100

Columns headed ‘All’ represent all activities, whereas columns headed ‘Active’ represent activities from binding assays where pChEMBL is >5. %, percentage relative to total.

a Total count.
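The relationship between the potency thresholds and pChEMBL quoted in Section 2 can be checked with a short worked example. The sketch below assumes only the definition given in the text, pChEMBL = −log10(molar potency); the function names are hypothetical.

```python
import math


def pchembl_from_molar(molar: float) -> float:
    """Return -log10 of a potency value expressed in mol/L."""
    if molar <= 0:
        raise ValueError("potency must be a positive molar concentration")
    return -math.log10(molar)


def pchembl_from_micromolar(micromolar: float) -> float:
    """Convert a potency in micromolar to pChEMBL."""
    return pchembl_from_molar(micromolar / 1e6)


if __name__ == "__main__":
    # The refined 10 µM threshold corresponds to a pChEMBL of about 5;
    # the earlier 50 µM threshold corresponds to about 4.3.
    for um in (10.0, 50.0):
        print(f"{um} µM -> pChEMBL {pchembl_from_micromolar(um):.2f}")
```

Because the scale is logarithmic and inverted, the more stringent (lower) concentration threshold gives the higher pChEMBL value.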

3 Integration with the ChEMBL database

The PPDMs workflow is decoupled from the release cycle of the ChEMBL database. Assigned mappings can be exported from PPDMs by downloading the pfam_maps table using a link in the ‘Logs’ section. Equally, an up-to-date version of the catalogue (table name: valid_domains) can be downloaded from the ‘Evidence’ section. A standardized procedure exists for integrating mappings assigned using PPDMs into the main schema of the ChEMBL database. Prior to each ChEMBL release, the most recent version of the catalogue is obtained from PPDMs. In a second step, it is applied to proteins that are defined as targets in assays meeting the required criteria. Finally, the set of manually assigned mappings is obtained from the PPDMs resource and used to override mappings that have been assigned by the default procedure.
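The final override step can be sketched as a simple merge in which manual assignments take precedence over default ones. This is a minimal illustration, not the code in pfam_map_loader: the function name, the use of integer assay identifiers as keys and the example values are all assumptions, and the actual columns of the pfam_maps table are not described here.

```python
from typing import Dict


def apply_manual_overrides(
    default_maps: Dict[int, str],
    manual_maps: Dict[int, str],
) -> Dict[int, str]:
    """Combine default and manual mappings, keyed by a hypothetical assay id.

    Manual assignments replace default ones for the same key and add
    entries for keys the default procedure left unassigned.
    """
    merged = dict(default_maps)   # copy so the inputs stay unchanged
    merged.update(manual_maps)    # manual curation wins on conflict
    return merged


if __name__ == "__main__":
    # Hypothetical identifiers and domain names, for illustration only.
    default_maps = {101: "7tm_1", 102: "Pkinase_Tyr"}
    manual_maps = {102: "SH2", 103: "Pkinase_Tyr"}
    print(apply_manual_overrides(default_maps, manual_maps))
```

Keeping the two inputs separate until this last step mirrors the decoupling described above: the default mappings can be regenerated for each release while curated decisions are preserved.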

4 Outlook

PPDMs provides a richer, domain-level perspective of small molecule binding and enriches annotation of small molecule bioactivities stored in the ChEMBL database. We anticipate that this type of annotation will improve the precision of target prediction and efficacy modelling approaches, and interpretation of the effects of natural genetic variation. PPDMs enables the refinement of domain-level annotation of small molecule bioactivities in a facile and transparent manner. The curation of conflicting mappings in PPDMs is ongoing and we are hopeful that PPDMs can engage the community in reviewing and improving domain-level annotations of small molecule bioactivities.
References

1.  Insights into polypharmacology from drug-domain associations.

Authors:  Aurelio A Moya-García; Juan A G Ranea
Journal:  Bioinformatics       Date:  2013-06-05       Impact factor: 6.937

2.  A novel chemogenomics analysis of G protein-coupled receptors (GPCRs) and their ligands: a potential strategy for receptor de-orphanization.

Authors:  Eelke van der Horst; Julio E Peironcely; Adriaan P Ijzerman; Margot W Beukers; Jonathan R Lane; Herman W T van Vlijmen; Michael T M Emmerich; Yasushi Okuno; Andreas Bender
Journal:  BMC Bioinformatics       Date:  2010-06-10       Impact factor: 3.169

3.  The overlap of small molecule and protein binding sites within families of protein structures.

Authors:  Fred P Davis; Andrej Sali
Journal:  PLoS Comput Biol       Date:  2010-02-05       Impact factor: 4.475

4.  A pharmacological organization of G protein-coupled receptors.

Authors:  Henry Lin; Maria F Sassano; Bryan L Roth; Brian K Shoichet
Journal:  Nat Methods       Date:  2013-01-06       Impact factor: 28.547

5.  Global analysis of small molecule binding to related protein targets.

Authors:  Felix A Kruger; John P Overington
Journal:  PLoS Comput Biol       Date:  2012-01-12       Impact factor: 4.475

6.  The Pfam protein families database.

Authors:  Marco Punta; Penny C Coggill; Ruth Y Eberhardt; Jaina Mistry; John Tate; Chris Boursnell; Ningze Pang; Kristoffer Forslund; Goran Ceric; Jody Clements; Andreas Heger; Liisa Holm; Erik L L Sonnhammer; Sean R Eddy; Alex Bateman; Robert D Finn
Journal:  Nucleic Acids Res       Date:  2011-11-29       Impact factor: 16.971

7.  Domain-based small molecule binding site annotation.

Authors:  Kevin A Snyder; Howard J Feldman; Michel Dumontier; John J Salama; Christopher W V Hogue
Journal:  BMC Bioinformatics       Date:  2006-03-17       Impact factor: 3.169

8.  Mapping small molecule binding data to structural domains.

Authors:  Felix A Kruger; Raghd Rostom; John P Overington
Journal:  BMC Bioinformatics       Date:  2012-12-13       Impact factor: 3.169

9.  Target prediction for an open access set of compounds active against Mycobacterium tuberculosis.

Authors:  Francisco Martínez-Jiménez; George Papadatos; Lun Yang; Iain M Wallace; Vinod Kumar; Ursula Pieper; Andrej Sali; James R Brown; John P Overington; Marc A Marti-Renom
Journal:  PLoS Comput Biol       Date:  2013-10-03       Impact factor: 4.475

10.  iPfam: a database of protein family and domain interactions found in the Protein Data Bank.

Authors:  Robert D Finn; Benjamin L Miller; Jody Clements; Alex Bateman
Journal:  Nucleic Acids Res       Date:  2013-12-01       Impact factor: 16.971
